Compare commits
1 Commits
feat/certi...adr/017-st

| Author | SHA1 | Date |
|---|---|---|
|  | adb0e7014d |  |
@@ -0,0 +1,95 @@
# Architecture Decision Record: Staleness-Based Failover Mechanism & Observability

**Status:** Proposed

**Date:** 2026-01-09

**Preceded by:** [016-Harmony-Agent-And-Global-Mesh-For-Decentralized-Workload-Management.md](https://git.nationtech.io/NationTech/harmony/raw/branch/master/adr/016-Harmony-Agent-And-Global-Mesh-For-Decentralized-Workload-Management.md)

## Context

In ADR 016, we established the **Harmony Agent** and the **Global Orchestration Mesh** (powered by NATS JetStream) as the foundation for our decentralized infrastructure. We defined the high-level need for a `FailoverStrategy` that can support both financial consistency (CP) and AI availability (AP).

However, a specific implementation challenge remains: **How do we reliably detect node failure without losing the ability to debug the event later?**

Standard distributed systems often use "Key Expiration" (TTL) for heartbeats. If a key disappears, the node is presumed dead. While simple, this approach is catastrophic for post-mortem analysis. When the key expires, the evidence of *when* and *how* the failure occurred evaporates.

For NationTech’s vision of **Humane Computing**—where micro datacenters might be heating a family home or running a local business—reliability and diagnosability are paramount. If a cluster fails over, we owe it to the user to provide a clear, historical log of exactly what happened. We cannot build a "wonderful future for computers" on ephemeral, untraceable errors.

## Decision

We will implement a **Staleness Detection** mechanism rather than a Key Expiration mechanism. We will leverage NATS JetStream Key-Value (KV) stores with **History Enabled** to create an immutable audit trail of cluster health.

### 1. The "Black Box" Flight Recorder (NATS Configuration)

We will utilize a persistent NATS KV bucket named `harmony_failover`.

* **Storage:** File (Persistent).
* **History:** Set to `64` (or higher). This allows us to query the last 64 heartbeat entries to visualize the exact degradation of the primary node before failure.
* **TTL:** None. Data never disappears; it only becomes "stale."
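
For illustration, a minimal sketch of creating this bucket with the `async-nats` crate follows. The crate choice, function names, and error handling are assumptions of the sketch, not part of this decision.

```rust
// Sketch only: create the `harmony_failover` bucket with file storage,
// 64 revisions of history and no TTL (assumes the `async-nats` crate).
use async_nats::jetstream::{self, kv, stream::StorageType};

async fn create_failover_bucket(
    client: async_nats::Client,
) -> Result<kv::Store, async_nats::Error> {
    let js = jetstream::new(client);
    let store = js
        .create_key_value(kv::Config {
            bucket: "harmony_failover".to_string(),
            history: 64,                        // keep the last 64 revisions per key
            storage: StorageType::File,         // persistent "black box" storage
            max_age: std::time::Duration::ZERO, // no TTL: entries only go stale
            ..Default::default()
        })
        .await?;
    Ok(store)
}
```
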
### 2. Data Structures

We will define two primary schemas to manage the state.

**A. The Rules of Engagement (`cluster_config`)**

This persistent key defines the behavior of the mesh. It allows us to tune failover sensitivity dynamically without redeploying the Agent binary.

```json
{
  "primary_site_id": "site-a-basement",
  "replica_site_id": "site-b-cloud",
  "failover_timeout_ms": 5000,    // Time before Replica takes over
  "heartbeat_interval_ms": 1000   // Frequency of Primary updates
}
```

> **Note:** The location for this configuration data structure is TBD. See https://git.nationtech.io/NationTech/harmony/issues/206

**B. The Heartbeat (`primary_heartbeat`)**

The Primary writes this; the Replica watches it.

```json
{
  "site_id": "site-a-basement",
  "status": "HEALTHY",
  "counter": 10452,
  "timestamp": 1704661549000
}
```
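
For reference, these two schemas map naturally onto plain serde structs inside the Agent. The type and field names below are illustrative only (the final location of the config type is still TBD per the note above):

```rust
// Illustrative serde mapping of the two schemas (names are examples, not decisions).
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ClusterConfig {
    pub primary_site_id: String,
    pub replica_site_id: String,
    pub failover_timeout_ms: u64,   // time before the Replica takes over
    pub heartbeat_interval_ms: u64, // frequency of Primary updates
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PrimaryHeartbeat {
    pub site_id: String,
    pub status: String, // e.g. "HEALTHY"
    pub counter: u64,   // monotonically increasing write counter
    pub timestamp: u64, // unix epoch milliseconds (client clock, debug only)
}
```
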
### 3. The Failover Algorithm

**The Primary (Site A) Logic:**

The Primary's ability to write to the mesh is its "License to Operate."

1. **Write Loop:** Attempts to write `primary_heartbeat` every `heartbeat_interval_ms`.
2. **Self-Preservation (Fencing):** If the write fails (NATS Ack timeout or NATS unreachable), the Primary **immediately self-demotes**. It assumes it is network-isolated. This prevents Split Brain scenarios where a partitioned Primary continues to accept writes while the Replica promotes itself.
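
A minimal sketch of this loop, assuming the `async-nats` KV API and the illustrative types above (`self_demote` is a hypothetical hook, not an existing Harmony function):

```rust
// Sketch only: heartbeat write loop with write-failure fencing.
use std::time::{Duration, SystemTime, UNIX_EPOCH};

use async_nats::jetstream::kv::Store;

async fn primary_heartbeat_loop(store: Store, cfg: ClusterConfig) {
    let mut counter: u64 = 0;
    loop {
        counter += 1;
        let hb = PrimaryHeartbeat {
            site_id: cfg.primary_site_id.clone(),
            status: "HEALTHY".to_string(),
            counter,
            timestamp: SystemTime::now()
                .duration_since(UNIX_EPOCH)
                .expect("system clock before unix epoch")
                .as_millis() as u64,
        };
        let payload = serde_json::to_vec(&hb).expect("heartbeat serializes").into();

        // Fencing: if NATS does not acknowledge the write in time, assume we are
        // partitioned and step down instead of risking Split Brain.
        let ack = tokio::time::timeout(
            Duration::from_millis(cfg.heartbeat_interval_ms),
            store.put("primary_heartbeat", payload),
        )
        .await;
        if !matches!(ack, Ok(Ok(_))) {
            self_demote(); // hypothetical hook: stop accepting writes, step down
            return;
        }

        tokio::time::sleep(Duration::from_millis(cfg.heartbeat_interval_ms)).await;
    }
}

fn self_demote() {
    // Site-specific: stop the workload, release the VIP, mark this site as follower, ...
}
```
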
**The Replica (Site B) Logic:**

The Replica acts as the watchdog.

1. **Watch:** Subscribes to updates on `primary_heartbeat`.
2. **Staleness Check:** Maintains a local timer. Every time a heartbeat arrives, the timer resets.
3. **Promotion:** If the timer exceeds `failover_timeout_ms`, the Replica declares the Primary dead and promotes itself to Leader.
4. **Yielding:** If the Replica is Leader, but suddenly receives a valid, new heartbeat from the configured `primary_site_id` (indicating the Primary has recovered), the Replica will voluntarily **demote** itself to restore the preferred topology.
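
A corresponding sketch of the watchdog, again assuming the `async-nats` KV watch API and the illustrative types above:

```rust
// Sketch only: staleness watchdog with promotion and yielding.
use std::time::Duration;

use async_nats::jetstream::kv::Store;
use futures::StreamExt;

async fn replica_watchdog(store: Store, cfg: ClusterConfig) -> Result<(), async_nats::Error> {
    let staleness = Duration::from_millis(cfg.failover_timeout_ms);
    let mut watch = store.watch("primary_heartbeat").await?;
    let mut is_leader = false;

    loop {
        match tokio::time::timeout(staleness, watch.next()).await {
            // Fresh heartbeat arrived in time: the "timer" resets simply by looping again.
            Ok(Some(Ok(entry))) => {
                let hb: PrimaryHeartbeat = serde_json::from_slice(&entry.value)?;
                if is_leader && hb.site_id == cfg.primary_site_id {
                    is_leader = false; // yield: restore the preferred topology
                }
            }
            // Watch interrupted or errored: re-establish it rather than guessing.
            Ok(Some(Err(_))) | Ok(None) => {
                watch = store.watch("primary_heartbeat").await?;
            }
            // No heartbeat within failover_timeout_ms: declare the Primary dead.
            Err(_stale) => {
                is_leader = true; // promotion: take over the workload here
            }
        }
    }
}
```
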
## Rationale

**Observability as a First-Class Citizen**

By keeping the last 64 heartbeats, we can run `nats kv history` to see the exact timeline. Did the Primary stop suddenly (a crash), or did the heartbeats become erratic and slow before stopping (network congestion)? This data is critical for optimizing the "Micro Data Centers" described in our vision, where internet connections in residential areas may vary in quality.
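
As an illustration of what that post-mortem looks like programmatically (the method names are assumptions of this sketch; the `nats` CLI gives the same view):

```rust
// Sketch only: replay the retained heartbeat revisions for a post-mortem.
use async_nats::jetstream::kv::Store;
use futures::StreamExt;

async fn dump_heartbeat_history(store: &Store) -> Result<(), async_nats::Error> {
    let mut history = store.history("primary_heartbeat").await?;
    while let Some(entry) = history.next().await {
        let entry = entry?;
        // Each retained revision shows how the Primary looked in the run-up to failure.
        println!(
            "rev {} @ {}: {}",
            entry.revision,
            entry.created,
            String::from_utf8_lossy(&entry.value)
        );
    }
    Ok(())
}
```
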

**Energy Efficiency & Resource Optimization**

NationTech aims to "maximize the value of our energy." A "flapping" cluster (constantly failing over and back) wastes immense energy in data re-synchronization and startup costs. By making the `failover_timeout_ms` configurable via `cluster_config`, we can tune a cluster heating a greenhouse to be less sensitive (slower failover is fine) compared to a cluster running a payment gateway.

**Decentralized Trust**

This architecture relies on NATS as the consensus engine. If the Primary is part of the NATS majority, it lives. If it isn't, it dies. This removes ambiguity and allows us to scale to thousands of independent sites without a central "God mode" controller managing every single failover.

## Consequences

**Positive**

* **Auditability:** Every failover event leaves a permanent trace in the KV history.
* **Safety:** The "Write Ack" check on the Primary provides a strong guarantee against Split Brain in `AbsoluteConsistency` mode.
* **Dynamic Tuning:** We can adjust timeouts for specific environments (e.g., high-latency satellite links) by updating a JSON key, requiring no downtime.

**Negative**

* **Storage Overhead:** Keeping history requires marginally more disk space on the NATS servers, though for 64 small JSON payloads, this is negligible.
* **Clock Skew:** While we rely on NATS server-side timestamps for ordering, extreme clock skew on the client side could confuse the debug logs (though not the failover logic itself).

## Alignment with Vision

This architecture supports the NationTech goal of a **"Beautifully Integrated Design."** It takes the complex, high-stakes problem of distributed consensus and wraps it in a mechanism that is robust enough for enterprise banking yet flexible enough to manage a basement server heating a swimming pool. It bridges the gap between the reliability of Web2 clouds and the decentralized nature of Web3 infrastructure.

@@ -1,19 +0,0 @@
[package]
name = "cert_manager"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
publish = false

[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_types = { path = "../../harmony_types" }
cidr = { workspace = true }
tokio = { workspace = true }
harmony_macros = { path = "../../harmony_macros" }
log = { workspace = true }
env_logger = { workspace = true }
url = { workspace = true }
assert_cmd = "2.0.16"

@@ -1,26 +0,0 @@
use harmony::{
    inventory::Inventory,
    modules::{
        cert_manager::{
            capability::CertificateManagementConfig, score_k8s::CertificateManagementScore,
        },
        postgresql::{PostgreSQLScore, capability::PostgreSQLConfig},
    },
    topology::K8sAnywhereTopology,
};

#[tokio::main]
async fn main() {
    let cert_manager = CertificateManagementScore {
        config: CertificateManagementConfig {},
    };

    harmony_cli::run(
        Inventory::autoload(),
        K8sAnywhereTopology::from_env(),
        vec![Box::new(cert_manager)],
        None,
    )
    .await
    .unwrap();
}

@@ -17,10 +17,6 @@ use crate::{
    interpret::InterpretStatus,
    inventory::Inventory,
    modules::{
-        cert_manager::{
-            capability::{CertificateManagement, CertificateManagementConfig},
-            operator::CertManagerOperatorScore,
-        },
        k3d::K3DInstallationScore,
        k8s::ingress::{K8sIngressScore, PathType},
        monitoring::{

@@ -363,27 +359,6 @@ impl Serialize for K8sAnywhereTopology {
    }
}

-#[async_trait]
-impl CertificateManagement for K8sAnywhereTopology {
-    async fn install(
-        &self,
-        config: &CertificateManagementConfig,
-    ) -> Result<PreparationOutcome, PreparationError> {
-        let cert_management_operator = CertManagerOperatorScore::default();
-
-        cert_management_operator
-            .interpret(&Inventory::empty(), self)
-            .await
-            .map_err(|e| PreparationError { msg: e.to_string() })?;
-        Ok(PreparationOutcome::Success {
-            details: format!(
-                "Installed cert-manager into ns: {}",
-                cert_management_operator.namespace
-            ),
-        })
-    }
-}
-
impl K8sAnywhereTopology {
    pub fn from_env() -> Self {
        Self {

@@ -1,18 +0,0 @@
use async_trait::async_trait;
use serde::Serialize;

use crate::{
    interpret::Outcome,
    topology::{PreparationError, PreparationOutcome},
};

#[async_trait]
pub trait CertificateManagement: Send + Sync {
    async fn install(
        &self,
        config: &CertificateManagementConfig,
    ) -> Result<PreparationOutcome, PreparationError>;
}

#[derive(Debug, Clone, Serialize)]
pub struct CertificateManagementConfig {}

@@ -1,6 +1,3 @@
-pub mod capability;
pub mod cluster_issuer;
mod helm;
-pub mod operator;
-pub mod score_k8s;
pub use helm::*;

@@ -1,64 +0,0 @@
use kube::api::ObjectMeta;
use serde::Serialize;

use crate::{
    interpret::Interpret,
    modules::k8s::{
        apps::crd::{Subscription, SubscriptionSpec},
        resource::K8sResourceScore,
    },
    score::Score,
    topology::{K8sclient, Topology, k8s::K8sClient},
};

/// Install the Cert-Manager Operator via RedHat Community Operators registry.redhat.io/redhat/community-operator-index:v4.19
/// This Score creates a Subscription CR in the specified namespace

#[derive(Debug, Clone, Serialize)]
pub struct CertManagerOperatorScore {
    pub namespace: String,
    pub channel: String,
    pub install_plan_approval: String,
    pub source: String,
    pub source_namespace: String,
}

impl Default for CertManagerOperatorScore {
    fn default() -> Self {
        Self {
            namespace: "openshift-operators".to_string(),
            channel: "stable".to_string(),
            install_plan_approval: "Automatic".to_string(),
            source: "community-operators".to_string(),
            source_namespace: "openshift-marketplace".to_string(),
        }
    }
}

impl<T: Topology + K8sclient> Score<T> for CertManagerOperatorScore {
    fn name(&self) -> String {
        "CertManagerOperatorScore".to_string()
    }

    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        let metadata = ObjectMeta {
            name: Some("cert-manager".to_string()),
            namespace: Some(self.namespace.clone()),
            ..ObjectMeta::default()
        };

        let spec = SubscriptionSpec {
            channel: Some(self.channel.clone()),
            config: None,
            install_plan_approval: Some(self.install_plan_approval.clone()),
            name: "cert-manager".to_string(),
            source: self.source.clone(),
            source_namespace: self.source_namespace.clone(),
            starting_csv: None,
        };

        let subscription = Subscription { metadata, spec };

        K8sResourceScore::single(subscription, Some(self.namespace.clone())).create_interpret()
    }
}

@@ -1,66 +0,0 @@
use async_trait::async_trait;
use harmony_types::id::Id;
use serde::Serialize;

use crate::{
    data::Version,
    interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
    inventory::Inventory,
    modules::cert_manager::capability::{CertificateManagement, CertificateManagementConfig},
    score::Score,
    topology::Topology,
};

#[derive(Debug, Clone, Serialize)]
pub struct CertificateManagementScore {
    pub config: CertificateManagementConfig,
}

impl<T: Topology + CertificateManagement> Score<T> for CertificateManagementScore {
    fn name(&self) -> String {
        "CertificateManagementScore".to_string()
    }

    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(CertificateManagementInterpret {
            config: self.config.clone(),
        })
    }
}

#[derive(Debug)]
struct CertificateManagementInterpret {
    config: CertificateManagementConfig,
}

#[async_trait]
impl<T: Topology + CertificateManagement> Interpret<T> for CertificateManagementInterpret {
    async fn execute(
        &self,
        inventory: &Inventory,
        topology: &T,
    ) -> Result<Outcome, InterpretError> {
        let cert_management = topology
            .install(&self.config)
            .await
            .map_err(|e| InterpretError::new(e.to_string()))?;

        Ok(Outcome::success(format!("Installed CertificateManagement")))
    }

    fn get_name(&self) -> InterpretName {
        InterpretName::Custom("CertificateManagementInterpret")
    }

    fn get_version(&self) -> Version {
        todo!()
    }

    fn get_status(&self) -> InterpretStatus {
        todo!()
    }

    fn get_children(&self) -> Vec<Id> {
        todo!()
    }
}