Compare commits
2 Commits
adr/decent
...
doc-and-br
| Author | SHA1 | Date | |
|---|---|---|---|
| b885c35706 | |||
|
|
bb6b4b7f88 |
15
Cargo.lock
generated
15
Cargo.lock
generated
@@ -6049,21 +6049,6 @@ version = "0.5.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "8f50febec83f5ee1df3015341d8bd429f2d1cc62bcba7ea2076759d315084683"
|
||||
|
||||
[[package]]
|
||||
name = "test-score"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"base64 0.22.1",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "thiserror"
|
||||
version = "1.0.69"
|
||||
|
||||
87
README.md
87
README.md
@@ -1,4 +1,6 @@
|
||||
# Harmony : Open-source infrastructure orchestration that treats your platform like first-class code
|
||||
# Harmony
|
||||
|
||||
Open-source infrastructure orchestration that treats your platform like first-class code.
|
||||
|
||||
_By [NationTech](https://nationtech.io)_
|
||||
|
||||
@@ -18,9 +20,7 @@ All in **one strongly-typed Rust codebase**.
|
||||
|
||||
From a **developer laptop** to a **global production cluster**, a single **source of truth** drives the **full software lifecycle.**
|
||||
|
||||
---
|
||||
|
||||
## 1 · The Harmony Philosophy
|
||||
## The Harmony Philosophy
|
||||
|
||||
Infrastructure is essential, but it shouldn’t be your core business. Harmony is built on three guiding principles that make modern platforms reliable, repeatable, and easy to reason about.
|
||||
|
||||
@@ -32,9 +32,18 @@ Infrastructure is essential, but it shouldn’t be your core business. Harmony i
|
||||
|
||||
These principles surface as simple, ergonomic Rust APIs that let teams focus on their product while trusting the platform underneath.
|
||||
|
||||
---
|
||||
## Where to Start
|
||||
|
||||
## 2 · Quick Start
|
||||
We have a comprehensive set of documentation right here in the repository.
|
||||
|
||||
| I want to... | Start Here |
|
||||
| ----------------- | ------------------------------------------------------------------ |
|
||||
| Get Started | [Getting Started Guide](./docs/guides/getting-started.md) |
|
||||
| See an Example | [Use Case: Deploy a Rust Web App](./docs/use-cases/rust-webapp.md) |
|
||||
| Explore | [Documentation Hub](./docs/README.md) |
|
||||
| See Core Concepts | [Core Concepts Explained](./docs/concepts.md) |
|
||||
|
||||
## Quick Look: Deploy a Rust Webapp
|
||||
|
||||
The snippet below spins up a complete **production-grade Rust + Leptos Webapp** with monitoring. Swap it for your own scores to deploy anything from microservices to machine-learning pipelines.
|
||||
|
||||
@@ -92,63 +101,33 @@ async fn main() {
|
||||
}
|
||||
```
|
||||
|
||||
Run it:
|
||||
To run this:
|
||||
|
||||
```bash
|
||||
cargo run
|
||||
```
|
||||
- Clone the repository: `git clone https://git.nationtech.io/nationtech/harmony`
|
||||
- Install dependencies: `cargo build --release`
|
||||
- Run the example: `cargo run --example try_rust_webapp`
|
||||
|
||||
Harmony analyses the code, shows an execution plan in a TUI, and applies it once you confirm. Same code, same binary—every environment.
|
||||
## Documentation
|
||||
|
||||
---
|
||||
All documentation is in the `/docs` directory.
|
||||
|
||||
## 3 · Core Concepts
|
||||
- [Documentation Hub](./docs/README.md): The main entry point for all documentation.
|
||||
- [Core Concepts](./docs/concepts.md): A detailed look at Score, Topology, Capability, Inventory, and Interpret.
|
||||
- [Component Catalogs](./docs/catalogs/README.md): Discover all available Scores, Topologies, and Capabilities.
|
||||
- [Developer Guide](./docs/guides/developer-guide.md): Learn how to write your own Scores and Topologies.
|
||||
|
||||
| Term | One-liner |
|
||||
| ---------------- | ---------------------------------------------------------------------------------------------------- |
|
||||
| **Score<T>** | Declarative description of the desired state (e.g., `LAMPScore`). |
|
||||
| **Interpret<T>** | Imperative logic that realises a `Score` on a specific environment. |
|
||||
| **Topology** | An environment (local k3d, AWS, bare-metal) exposing verified _Capabilities_ (Kubernetes, DNS, …). |
|
||||
| **Maestro** | Orchestrator that compiles Scores + Topology, ensuring all capabilities line up **at compile-time**. |
|
||||
| **Inventory** | Optional catalogue of physical assets for bare-metal and edge deployments. |
|
||||
## Architectural Decision Records
|
||||
|
||||
A visual overview is in the diagram below.
|
||||
- [ADR-001 · Why Rust](adr/001-rust.md)
|
||||
- [ADR-003 · Infrastructure Abstractions](adr/003-infrastructure-abstractions.md)
|
||||
- [ADR-006 · Secret Management](adr/006-secret-management.md)
|
||||
- [ADR-011 · Multi-Tenant Cluster](adr/011-multi-tenant-cluster.md)
|
||||
|
||||
[Harmony Core Architecture](docs/diagrams/Harmony_Core_Architecture.drawio.svg)
|
||||
## Contribute
|
||||
|
||||
---
|
||||
Discussions and roadmap live in [Issues](https://git.nationtech.io/nationtech/harmony/-/issues). PRs, ideas, and feedback are welcome!
|
||||
|
||||
## 4 · Install
|
||||
|
||||
Prerequisites:
|
||||
|
||||
- Rust
|
||||
- Docker (if you deploy locally)
|
||||
- `kubectl` / `helm` for Kubernetes-based topologies
|
||||
|
||||
```bash
|
||||
git clone https://git.nationtech.io/nationtech/harmony
|
||||
cd harmony
|
||||
cargo build --release # builds the CLI, TUI and libraries
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5 · Learning More
|
||||
|
||||
- **Architectural Decision Records** – dive into the rationale
|
||||
- [ADR-001 · Why Rust](adr/001-rust.md)
|
||||
- [ADR-003 · Infrastructure Abstractions](adr/003-infrastructure-abstractions.md)
|
||||
- [ADR-006 · Secret Management](adr/006-secret-management.md)
|
||||
- [ADR-011 · Multi-Tenant Cluster](adr/011-multi-tenant-cluster.md)
|
||||
|
||||
- **Extending Harmony** – write new Scores / Interprets, add hardware like OPNsense firewalls, or embed Harmony in your own tooling (`/docs`).
|
||||
|
||||
- **Community** – discussions and roadmap live in [GitLab issues](https://git.nationtech.io/nationtech/harmony/-/issues). PRs, ideas, and feedback are welcome!
|
||||
|
||||
---
|
||||
|
||||
## 6 · License
|
||||
## License
|
||||
|
||||
Harmony is released under the **GNU AGPL v3**.
|
||||
|
||||
|
||||
@@ -1,114 +0,0 @@
|
||||
# Architecture Decision Record: Higher-Order Topologies
|
||||
|
||||
**Initial Author:** Jean-Gabriel Gill-Couture
|
||||
**Initial Date:** 2025-12-08
|
||||
**Last Updated Date:** 2025-12-08
|
||||
|
||||
## Status
|
||||
|
||||
Implemented
|
||||
|
||||
## Context
|
||||
|
||||
Harmony models infrastructure as **Topologies** (deployment targets like `K8sAnywhereTopology`, `LinuxHostTopology`) implementing **Capabilities** (tech traits like `PostgreSQL`, `Docker`).
|
||||
|
||||
**Higher-Order Topologies** (e.g., `FailoverTopology<T>`) compose/orchestrate capabilities *across* multiple underlying topologies (e.g., primary+replica `T`).
|
||||
|
||||
Naive design requires manual `impl Capability for HigherOrderTopology<T>` *per T per capability*, causing:
|
||||
- **Impl explosion**: N topologies × M capabilities = N×M boilerplate.
|
||||
- **ISP violation**: Topologies forced to impl unrelated capabilities.
|
||||
- **Maintenance hell**: New topology needs impls for *all* orchestrated capabilities; new capability needs impls for *all* topologies/higher-order.
|
||||
- **Barrier to extension**: Users can't easily add topologies without todos/panics.
|
||||
|
||||
This makes scaling Harmony impractical as ecosystem grows.
|
||||
|
||||
## Decision
|
||||
|
||||
Use **blanket trait impls** on higher-order topologies to *automatically* derive orchestration:
|
||||
|
||||
````rust
|
||||
/// Higher-Order Topology: Orchestrates capabilities across sub-topologies.
|
||||
pub struct FailoverTopology<T> {
|
||||
/// Primary sub-topology.
|
||||
primary: T,
|
||||
/// Replica sub-topology.
|
||||
replica: T,
|
||||
}
|
||||
|
||||
/// Automatically provides PostgreSQL failover for *any* `T: PostgreSQL`.
|
||||
/// Delegates to primary for queries; orchestrates deploy across both.
|
||||
#[async_trait]
|
||||
impl<T: PostgreSQL> PostgreSQL for FailoverTopology<T> {
|
||||
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String> {
|
||||
// Deploy primary; extract certs/endpoint;
|
||||
// deploy replica with pg_basebackup + TLS passthrough.
|
||||
// (Full impl logged/elaborated.)
|
||||
}
|
||||
|
||||
// Delegate queries to primary.
|
||||
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String> {
|
||||
self.primary.get_replication_certs(cluster_name).await
|
||||
}
|
||||
// ...
|
||||
}
|
||||
|
||||
/// Similarly for other capabilities.
|
||||
#[async_trait]
|
||||
impl<T: Docker> Docker for FailoverTopology<T> {
|
||||
// Failover Docker orchestration.
|
||||
}
|
||||
````
|
||||
|
||||
**Key properties:**
|
||||
- **Auto-derivation**: `Failover<K8sAnywhere>` gets `PostgreSQL` iff `K8sAnywhere: PostgreSQL`.
|
||||
- **No boilerplate**: One blanket impl per capability *per higher-order type*.
|
||||
|
||||
## Rationale
|
||||
|
||||
- **Composition via generics**: Rust trait solver auto-selects impls; zero runtime cost.
|
||||
- **Compile-time safety**: Missing `T: Capability` → compile error (no panics).
|
||||
- **Scalable**: O(capabilities) impls per higher-order; new `T` auto-works.
|
||||
- **ISP-respecting**: Capabilities only surface if sub-topology provides.
|
||||
- **Centralized logic**: Orchestration (e.g., cert propagation) in one place.
|
||||
|
||||
**Example usage:**
|
||||
````rust
|
||||
// ✅ Works: K8sAnywhere: PostgreSQL → Failover provides failover PG
|
||||
let pg_failover: FailoverTopology<K8sAnywhereTopology> = ...;
|
||||
pg_failover.deploy_pg(config).await;
|
||||
|
||||
// ✅ Works: LinuxHost: Docker → Failover provides failover Docker
|
||||
let docker_failover: FailoverTopology<LinuxHostTopology> = ...;
|
||||
docker_failover.deploy_docker(...).await;
|
||||
|
||||
// ❌ Compile fail: K8sAnywhere !: Docker
|
||||
let invalid: FailoverTopology<K8sAnywhereTopology>;
|
||||
invalid.deploy_docker(...); // `T: Docker` bound unsatisfied
|
||||
````
|
||||
|
||||
## Consequences
|
||||
|
||||
**Pros:**
|
||||
- **Extensible**: New topology `AWSTopology: PostgreSQL` → instant `Failover<AWSTopology>: PostgreSQL`.
|
||||
- **Lean**: No useless impls (e.g., no `K8sAnywhere: Docker`).
|
||||
- **Observable**: Logs trace every step.
|
||||
|
||||
**Cons:**
|
||||
- **Monomorphization**: Generics generate code per T (mitigated: few Ts).
|
||||
- **Delegation opacity**: Relies on rustdoc/logs for internals.
|
||||
|
||||
## Alternatives considered
|
||||
|
||||
| Approach | Pros | Cons |
|
||||
|----------|------|------|
|
||||
| **Manual per-T impls**<br>`impl PG for Failover<K8s> {..}`<br>`impl PG for Failover<Linux> {..}` | Explicit control | N×M explosion; violates ISP; hard to extend. |
|
||||
| **Dynamic trait objects**<br>`Box<dyn AnyCapability>` | Runtime flex | Perf hit; type erasure; error-prone dispatch. |
|
||||
| **Mega-topology trait**<br>All-in-one `OrchestratedTopology` | Simple wiring | Monolithic; poor composition. |
|
||||
| **Registry dispatch**<br>Runtime capability lookup | Decoupled | Complex; no compile safety; perf/debug overhead. |
|
||||
|
||||
**Selected**: Blanket impls leverage Rust generics for safe, zero-cost composition.
|
||||
|
||||
## Additional Notes
|
||||
|
||||
- Applies to `MultisiteTopology<T>`, `ShardedTopology<T>`, etc.
|
||||
- `FailoverTopology` in `failover.rs` is first implementation.
|
||||
@@ -1,153 +0,0 @@
|
||||
//! Example of Higher-Order Topologies in Harmony.
|
||||
//! Demonstrates how `FailoverTopology<T>` automatically provides failover for *any* capability
|
||||
//! supported by a sub-topology `T` via blanket trait impls.
|
||||
//!
|
||||
//! Key insight: No manual impls per T or capability -- scales effortlessly.
|
||||
//! Users can:
|
||||
//! - Write new `Topology` (impl capabilities on a struct).
|
||||
//! - Compose with `FailoverTopology` (gets capabilities if T has them).
|
||||
//! - Compile fails if capability missing (safety).
|
||||
|
||||
use async_trait::async_trait;
|
||||
use tokio;
|
||||
|
||||
/// Capability trait: Deploy and manage PostgreSQL.
|
||||
#[async_trait]
|
||||
pub trait PostgreSQL {
|
||||
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String>;
|
||||
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String>;
|
||||
}
|
||||
|
||||
/// Capability trait: Deploy Docker.
|
||||
#[async_trait]
|
||||
pub trait Docker {
|
||||
async fn deploy_docker(&self) -> Result<String, String>;
|
||||
}
|
||||
|
||||
/// Configuration for PostgreSQL deployments.
|
||||
#[derive(Clone)]
|
||||
pub struct PostgreSQLConfig;
|
||||
|
||||
/// Replication certificates.
|
||||
#[derive(Clone)]
|
||||
pub struct ReplicationCerts;
|
||||
|
||||
/// Concrete topology: Kubernetes Anywhere (supports PostgreSQL).
|
||||
#[derive(Clone)]
|
||||
pub struct K8sAnywhereTopology;
|
||||
|
||||
#[async_trait]
|
||||
impl PostgreSQL for K8sAnywhereTopology {
|
||||
async fn deploy(&self, _config: &PostgreSQLConfig) -> Result<String, String> {
|
||||
// Real impl: Use k8s helm chart, operator, etc.
|
||||
Ok("K8sAnywhere PostgreSQL deployed".to_string())
|
||||
}
|
||||
|
||||
async fn get_replication_certs(&self, _cluster_name: &str) -> Result<ReplicationCerts, String> {
|
||||
Ok(ReplicationCerts)
|
||||
}
|
||||
}
|
||||
|
||||
/// Concrete topology: Linux Host (supports Docker).
|
||||
#[derive(Clone)]
|
||||
pub struct LinuxHostTopology;
|
||||
|
||||
#[async_trait]
|
||||
impl Docker for LinuxHostTopology {
|
||||
async fn deploy_docker(&self) -> Result<String, String> {
|
||||
// Real impl: Install/configure Docker on host.
|
||||
Ok("LinuxHost Docker deployed".to_string())
|
||||
}
|
||||
}
|
||||
|
||||
/// Higher-Order Topology: Composes multiple sub-topologies (primary + replica).
|
||||
/// Automatically derives *all* capabilities of `T` with failover orchestration.
|
||||
///
|
||||
/// - If `T: PostgreSQL`, then `FailoverTopology<T>: PostgreSQL` (blanket impl).
|
||||
/// - Same for `Docker`, etc. No boilerplate!
|
||||
/// - Compile-time safe: Missing `T: Capability` → error.
|
||||
#[derive(Clone)]
|
||||
pub struct FailoverTopology<T> {
|
||||
/// Primary sub-topology.
|
||||
pub primary: T,
|
||||
/// Replica sub-topology.
|
||||
pub replica: T,
|
||||
}
|
||||
|
||||
/// Blanket impl: Failover PostgreSQL if T provides PostgreSQL.
|
||||
/// Delegates reads to primary; deploys to both.
|
||||
#[async_trait]
|
||||
impl<T: PostgreSQL + Send + Sync + Clone> PostgreSQL for FailoverTopology<T> {
|
||||
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String> {
|
||||
// Orchestrate: Deploy primary first, then replica (e.g., via pg_basebackup).
|
||||
let primary_result = self.primary.deploy(config).await?;
|
||||
let replica_result = self.replica.deploy(config).await?;
|
||||
Ok(format!("Failover PG deployed: {} | {}", primary_result, replica_result))
|
||||
}
|
||||
|
||||
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String> {
|
||||
// Delegate to primary (replica follows).
|
||||
self.primary.get_replication_certs(cluster_name).await
|
||||
}
|
||||
}
|
||||
|
||||
/// Blanket impl: Failover Docker if T provides Docker.
|
||||
#[async_trait]
|
||||
impl<T: Docker + Send + Sync + Clone> Docker for FailoverTopology<T> {
|
||||
async fn deploy_docker(&self) -> Result<String, String> {
|
||||
// Orchestrate across primary + replica.
|
||||
let primary_result = self.primary.deploy_docker().await?;
|
||||
let replica_result = self.replica.deploy_docker().await?;
|
||||
Ok(format!("Failover Docker deployed: {} | {}", primary_result, replica_result))
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::main]
|
||||
async fn main() {
|
||||
let config = PostgreSQLConfig;
|
||||
|
||||
println!("=== ✅ PostgreSQL Failover (K8sAnywhere supports PG) ===");
|
||||
let pg_failover = FailoverTopology {
|
||||
primary: K8sAnywhereTopology,
|
||||
replica: K8sAnywhereTopology,
|
||||
};
|
||||
let result = pg_failover.deploy(&config).await.unwrap();
|
||||
println!("Result: {}", result);
|
||||
|
||||
println!("\n=== ✅ Docker Failover (LinuxHost supports Docker) ===");
|
||||
let docker_failover = FailoverTopology {
|
||||
primary: LinuxHostTopology,
|
||||
replica: LinuxHostTopology,
|
||||
};
|
||||
let result = docker_failover.deploy_docker().await.unwrap();
|
||||
println!("Result: {}", result);
|
||||
|
||||
println!("\n=== ❌ Would fail to compile (K8sAnywhere !: Docker) ===");
|
||||
// let invalid = FailoverTopology {
|
||||
// primary: K8sAnywhereTopology,
|
||||
// replica: K8sAnywhereTopology,
|
||||
// };
|
||||
// invalid.deploy_docker().await.unwrap(); // Error: `K8sAnywhereTopology: Docker` not satisfied!
|
||||
// Very clear error message :
|
||||
// error[E0599]: the method `deploy_docker` exists for struct `FailoverTopology<K8sAnywhereTopology>`, but its trait bounds were not satisfied
|
||||
// --> src/main.rs:90:9
|
||||
// |
|
||||
// 4 | pub struct FailoverTopology<T> {
|
||||
// | ------------------------------ method `deploy_docker` not found for this struct because it doesn't satisfy `FailoverTopology<K8sAnywhereTopology>: Docker`
|
||||
// ...
|
||||
// 37 | struct K8sAnywhereTopology;
|
||||
// | -------------------------- doesn't satisfy `K8sAnywhereTopology: Docker`
|
||||
// ...
|
||||
// 90 | invalid.deploy_docker(); // `T: Docker` bound unsatisfied
|
||||
// | ^^^^^^^^^^^^^ method cannot be called on `FailoverTopology<K8sAnywhereTopology>` due to unsatisfied trait bounds
|
||||
// |
|
||||
// note: trait bound `K8sAnywhereTopology: Docker` was not satisfied
|
||||
// --> src/main.rs:61:9
|
||||
// |
|
||||
// 61 | impl<T: Docker + Send + Sync> Docker for FailoverTopology<T> {
|
||||
// | ^^^^^^ ------ -------------------
|
||||
// | |
|
||||
// | unsatisfied trait bound introduced here
|
||||
// note: the trait `Docker` must be implemented
|
||||
}
|
||||
|
||||
@@ -1,90 +0,0 @@
|
||||
# Architecture Decision Record: Global Orchestration Mesh & The Harmony Agent
|
||||
|
||||
**Status:** Proposed
|
||||
**Date:** 2025-12-19
|
||||
|
||||
## Context
|
||||
|
||||
Harmony is designed to enable a truly decentralized infrastructure where independent clusters—owned by different organizations or running on diverse hardware—can collaborate reliably. This vision combines the decentralization of Web3 with the performance and capabilities of Web2.
|
||||
|
||||
Currently, Harmony operates as a stateless CLI tool, invoked manually or via CI runners. While effective for deployment, this model presents a critical limitation: **a CLI cannot react to real-time events.**
|
||||
|
||||
To achieve automated failover and dynamic workload management, we need a system that is "always on." Relying on manual intervention or scheduled CI jobs to recover from a cluster failure creates unacceptable latency and prevents us from scaling to thousands of nodes.
|
||||
|
||||
Furthermore, we face a challenge in serving diverse workloads:
|
||||
* **Financial workloads** require absolute consistency (CP - Consistency/Partition Tolerance).
|
||||
* **AI/Inference workloads** require maximum availability (AP - Availability/Partition Tolerance).
|
||||
|
||||
There are many more use cases, but those are the two extremes.
|
||||
|
||||
We need a unified architecture that automates cluster coordination and supports both consistency models without requiring a complete re-architecture in the future.
|
||||
|
||||
## Decision
|
||||
|
||||
We propose a fundamental architectural evolution. It has been clear since the start of Harmony that it would be necessary to transition Harmony from a purely ephemeral CLI tool to a system that includes a persistent **Harmony Agent**. This Agent will connect to a **Global Orchestration Mesh** based on a strongly consistent protocol.
|
||||
|
||||
The proposal consists of four key pillars:
|
||||
|
||||
### 1. The Harmony Agent (New Component)
|
||||
We will develop a long-running process (Daemon/Agent) to be deployed alongside workloads.
|
||||
* **Shift from CLI:** Unlike the CLI, which applies configuration and exits, the Agent maintains a persistent connection to the mesh.
|
||||
* **Responsibility:** It actively monitors cluster health, participates in consensus, and executes lifecycle commands (start/stop/fence) instantly when the mesh dictates a state change.
|
||||
|
||||
### 2. The Technology: NATS JetStream
|
||||
We will utilize **NATS JetStream** as the underlying transport and consensus layer for the Agent and the Mesh.
|
||||
* **Why not raw Raft?** Implementing a raw Raft library requires building and maintaining the transport layer, log compaction, snapshotting, and peer discovery manually. NATS JetStream provides a battle-tested, distributed log and Key-Value store (based on Raft) out of the box, along with a high-performance pub/sub system for event propagation.
|
||||
* **Role:** It will act as the "source of truth" for the cluster state.
|
||||
|
||||
### 3. Strong Consistency at the Mesh Layer
|
||||
The mesh will operate with **Strong Consistency** by default.
|
||||
* All critical cluster state changes (topology updates, lease acquisitions, leadership elections) will require consensus among the Agents.
|
||||
* This ensures that in the event of a network partition, we have a mathematical guarantee of which side holds the valid state, preventing data corruption.
|
||||
|
||||
### 4. Public UX: The `FailoverStrategy` Abstraction
|
||||
To keep the user experience stable and simple, we will expose the complexity of the mesh through a high-level configuration API, tentatively called `FailoverStrategy`.
|
||||
|
||||
The user defines the *intent* in their config, and the Harmony Agent automates the *execution*:
|
||||
|
||||
* **`FailoverStrategy::AbsoluteConsistency`**:
|
||||
* *Use Case:* Banking, Transactional DBs.
|
||||
* *Behavior:* If the mesh detects a partition, the Agent on the minority side immediately halts workloads. No split-brain is ever allowed.
|
||||
* **`FailoverStrategy::SplitBrainAllowed`**:
|
||||
* *Use Case:* LLM Inference, Stateless Web Servers.
|
||||
* *Behavior:* If a partition occurs, the Agent keeps workloads running to maximize uptime. State is reconciled when connectivity returns.
|
||||
|
||||
## Rationale
|
||||
|
||||
**The Necessity of an Agent**
|
||||
You cannot automate what you do not monitor. Moving to an Agent-based model is the only way to achieve sub-second reaction times to infrastructure failures. It transforms Harmony from a deployment tool into a self-healing platform.
|
||||
|
||||
**Scaling & Decentralization**
|
||||
To allow independent clusters to collaborate, they need a shared language. A strongly consistent mesh allows Cluster A (Organization X) and Cluster B (Organization Y) to agree on workload placement without a central authority.
|
||||
|
||||
**Why Strong Consistency First?**
|
||||
It is technically feasible to relax a strongly consistent system to allow for "Split Brain" behavior (AP) when the user requests it. However, it is nearly impossible to take an eventually consistent system and force it to be strongly consistent (CP) later. By starting with strict constraints, we cover the hardest use cases (Finance) immediately.
|
||||
|
||||
**Future Topologies**
|
||||
While our immediate need is `FailoverTopology` (Multi-site), this architecture supports any future topology logic:
|
||||
* **`CostTopology`**: Agents negotiate to route workloads to the cluster with the cheapest spot instances.
|
||||
* **`HorizontalTopology`**: Spreading a single workload across 100 clusters for massive scale.
|
||||
* **`GeoTopology`**: Ensuring data stays within specific legal jurisdictions.
|
||||
|
||||
The mesh provides the *capability* (consensus and messaging); the topology provides the *logic*.
|
||||
|
||||
## Consequences
|
||||
|
||||
**Positive**
|
||||
* **Automation:** Eliminates manual failover, enabling massive scale.
|
||||
* **Reliability:** Guarantees data safety for critical workloads by default.
|
||||
* **Flexibility:** A single codebase serves both high-frequency trading and AI inference.
|
||||
* **Stability:** The public API remains abstract, allowing us to optimize the mesh internals without breaking user code.
|
||||
|
||||
**Negative**
|
||||
* **Deployment Complexity:** Users must now deploy and maintain a running service (the Agent) rather than just downloading a binary.
|
||||
* **Engineering Complexity:** Integrating NATS JetStream and handling distributed state machines is significantly more complex than the current CLI logic.
|
||||
|
||||
## Implementation Plan (Short Term)
|
||||
1. **Agent Bootstrap:** Create the initial scaffold for the Harmony Agent (daemon).
|
||||
2. **Mesh Integration:** Prototype NATS JetStream embedding within the Agent.
|
||||
3. **Strategy Implementation:** Add `FailoverStrategy` to the configuration schema and implement the logic in the Agent to read and act on it.
|
||||
4. **Migration:** Transition the current manual failover scripts into event-driven logic handled by the Agent.
|
||||
@@ -1 +1,33 @@
|
||||
Not much here yet, see the `adr` folder for now. More to come in time!
|
||||
# Harmony Documentation Hub
|
||||
|
||||
Welcome to the Harmony documentation. This is the main entry point for learning everything from core concepts to building your own Score, Topologies, and Capabilities.
|
||||
|
||||
## 1. Getting Started
|
||||
|
||||
If you're new to Harmony, start here:
|
||||
|
||||
- [**Getting Started Guide**](./guides/getting-started.md): A step-by-step tutorial that takes you from an empty project to deploying your first application.
|
||||
- [**Core Concepts**](./concepts.md): A high-level overview of the key concepts in Harmony: `Score`, `Topology`, `Capability`, `Inventory`, `Interpret`, ...
|
||||
|
||||
## 2. Use Cases & Examples
|
||||
|
||||
See how to use Harmony to solve real-world problems.
|
||||
|
||||
- [**OKD on Bare Metal**](./use-cases/okd-on-bare-metal.md): A detailed walkthrough of bootstrapping a high-availability OKD cluster from physical hardware.
|
||||
- [**Deploy a Rust Web App**](./use-cases/deploy-rust-webapp.md): A quick guide to deploying a monitored, containerized web application to a Kubernetes cluster.
|
||||
|
||||
## 3. Component Catalogs
|
||||
|
||||
Discover existing, reusable components you can use in your Harmony projects.
|
||||
|
||||
- [**Scores Catalog**](./catalogs/scores.md): A categorized list of all available `Scores` (the "what").
|
||||
- [**Topologies Catalog**](./catalogs/topologies.md): A list of all available `Topologies` (the "where").
|
||||
- [**Capabilities Catalog**](./catalogs/capabilities.md): A list of all available `Capabilities` (the "how").
|
||||
|
||||
## 4. Developer Guides
|
||||
|
||||
Ready to build your own components? These guides show you how.
|
||||
|
||||
- [**Writing a Score**](./guides/writing-a-score.md): Learn how to create your own `Score` and `Interpret` logic to define a new desired state.
|
||||
- [**Writing a Topology**](./guides/writing-a-topology.md): Learn how to model a new environment (like AWS, GCP, or custom hardware) as a `Topology`.
|
||||
- [**Adding Capabilities**](./guides/adding-capabilities.md): See how to add a `Capability` to your custom `Topology`.
|
||||
|
||||
7
docs/catalogs/README.md
Normal file
7
docs/catalogs/README.md
Normal file
@@ -0,0 +1,7 @@
|
||||
# Component Catalogs
|
||||
|
||||
This section is the "dictionary" for Harmony. It lists all the reusable components available out-of-the-box.
|
||||
|
||||
- [**Scores Catalog**](./scores.md): Discover all available `Scores` (the "what").
|
||||
- [**Topologies Catalog**](./topologies.md): A list of all available `Topologies` (the "where").
|
||||
- [**Capabilities Catalog**](./capabilities.md): A list of all available `Capabilities` (the "how").
|
||||
40
docs/catalogs/capabilities.md
Normal file
40
docs/catalogs/capabilities.md
Normal file
@@ -0,0 +1,40 @@
|
||||
# Capabilities Catalog
|
||||
|
||||
A `Capability` is a specific feature or API that a `Topology` offers. `Interpret` logic uses these capabilities to execute a `Score`.
|
||||
|
||||
This list is primarily for developers **writing new Topologies or Scores**. As a user, you just need to know that the `Topology` you pick (like `K8sAnywhereTopology`) provides the capabilities your `Scores` (like `ApplicationScore`) need.
|
||||
|
||||
<!--toc:start-->
|
||||
|
||||
- [Capabilities Catalog](#capabilities-catalog)
|
||||
- [Kubernetes & Application](#kubernetes-application)
|
||||
- [Monitoring & Observability](#monitoring-observability)
|
||||
- [Networking (Core Services)](#networking-core-services)
|
||||
- [Networking (Hardware & Host)](#networking-hardware-host)
|
||||
|
||||
<!--toc:end-->
|
||||
|
||||
## Kubernetes & Application
|
||||
|
||||
- **K8sClient**: Provides an authenticated client to interact with a Kubernetes API (create/read/update/delete resources).
|
||||
- **HelmCommand**: Provides the ability to execute Helm commands (install, upgrade, template).
|
||||
- **TenantManager**: Provides methods for managing tenants in a multi-tenant cluster.
|
||||
- **Ingress**: Provides an interface for managing ingress controllers and resources.
|
||||
|
||||
## Monitoring & Observability
|
||||
|
||||
- **Grafana**: Provides an API for configuring Grafana (datasources, dashboards).
|
||||
- **Monitoring**: A general capability for configuring monitoring (e.g., creating Prometheus rules).
|
||||
|
||||
## Networking (Core Services)
|
||||
|
||||
- **DnsServer**: Provides an interface for creating and managing DNS records.
|
||||
- **LoadBalancer**: Provides an interface for configuring a load balancer (e.g., OPNsense, MetalLB).
|
||||
- **DhcpServer**: Provides an interface for managing DHCP leases and host bindings.
|
||||
- **TftpServer**: Provides an interface for managing files on a TFTP server (e.g., iPXE boot files).
|
||||
|
||||
## Networking (Hardware & Host)
|
||||
|
||||
- **Router**: Provides an interface for configuring routing rules, typically on a firewall like OPNsense.
|
||||
- **Switch**: Provides an interface for configuring a physical network switch (e.g., managing VLANs and port channels).
|
||||
- **NetworkManager**: Provides an interface for configuring host-level networking (e.g., creating bonds and bridges on a node).
|
||||
102
docs/catalogs/scores.md
Normal file
102
docs/catalogs/scores.md
Normal file
@@ -0,0 +1,102 @@
|
||||
# Scores Catalog
|
||||
|
||||
A `Score` is a declarative description of a desired state. Find the Score you need and add it to your `harmony!` block's `scores` array.
|
||||
|
||||
<!--toc:start-->
|
||||
|
||||
- [Scores Catalog](#scores-catalog)
|
||||
- [Application Deployment](#application-deployment)
|
||||
- [OKD / Kubernetes Cluster Setup](#okd-kubernetes-cluster-setup)
|
||||
- [Cluster Services & Management](#cluster-services-management)
|
||||
- [Monitoring & Alerting](#monitoring-alerting)
|
||||
- [Infrastructure & Networking (Bare Metal)](#infrastructure-networking-bare-metal)
|
||||
- [Infrastructure & Networking (Cluster)](#infrastructure-networking-cluster)
|
||||
- [Tenant Management](#tenant-management)
|
||||
- [Utility](#utility)
|
||||
|
||||
<!--toc:end-->
|
||||
|
||||
## Application Deployment
|
||||
|
||||
Scores for deploying and managing end-user applications.
|
||||
|
||||
- **ApplicationScore**: The primary score for deploying a web application. Describes the application, its framework, and the features it requires (e.g., monitoring, CI/CD).
|
||||
- **HelmChartScore**: Deploys a generic Helm chart to a Kubernetes cluster.
|
||||
- **ArgoHelmScore**: Deploys an application using an ArgoCD Helm chart.
|
||||
- **LAMPScore**: A specialized score for deploying a classic LAMP (Linux, Apache, MySQL, PHP) stack.
|
||||
|
||||
## OKD / Kubernetes Cluster Setup
|
||||
|
||||
This collection of Scores is used to provision an entire OKD cluster from bare metal. They are typically used in order.
|
||||
|
||||
- **OKDSetup01InventoryScore**: Discovers and catalogs the physical hardware.
|
||||
- **OKDSetup02BootstrapScore**: Configures the bootstrap node, renders iPXE files, and kicks off the SCOS installation.
|
||||
- **OKDSetup03ControlPlaneScore**: Renders iPXE configurations for the control plane nodes.
|
||||
- **OKDSetupPersistNetworkBondScore**: Configures network bonds on the nodes and port channels on the switches.
|
||||
- **OKDSetup04WorkersScore**: Renders iPXE configurations for the worker nodes.
|
||||
- **OKDSetup06InstallationReportScore**: Runs post-installation checks and generates a report.
|
||||
- **OKDUpgradeScore**: Manages the upgrade process for an existing OKD cluster.
|
||||
|
||||
## Cluster Services & Management
|
||||
|
||||
Scores for installing and managing services _inside_ a Kubernetes cluster.
|
||||
|
||||
- **K3DInstallationScore**: Installs and configes a local K3D (k3s-in-docker) cluster. Used by `K8sAnywhereTopology`.
|
||||
- **CertManagerHelmScore**: Deploys the `cert-manager` Helm chart.
|
||||
- **ClusterIssuerScore**: Configures a `ClusterIssuer` for `cert-manager`, (e.g., for Let's Encrypt).
|
||||
- **K8sNamespaceScore**: Ensures a Kubernetes namespace exists.
|
||||
- **K8sDeploymentScore**: Deploys a generic `Deployment` resource to Kubernetes.
|
||||
- **K8sIngressScore**: Configures an `Ingress` resource for a service.
|
||||
|
||||
## Monitoring & Alerting
|
||||
|
||||
Scores for configuring observability, dashboards, and alerts.
|
||||
|
||||
- **ApplicationMonitoringScore**: A generic score to set up monitoring for an application.
|
||||
- **ApplicationRHOBMonitoringScore**: A specialized score for setting up monitoring via the Red Hat Observability stack.
|
||||
- **HelmPrometheusAlertingScore**: Configures Prometheus alerts via a Helm chart.
|
||||
- **K8sPrometheusCRDAlertingScore**: Configures Prometheus alerts using the `PrometheusRule` CRD.
|
||||
- **PrometheusAlertScore**: A generic score for creating a Prometheus alert.
|
||||
- **RHOBAlertingScore**: Configures alerts specifically for the Red Hat Observability stack.
|
||||
- **NtfyScore**: Configures alerts to be sent to a `ntfy.sh` server.
|
||||
|
||||
## Infrastructure & Networking (Bare Metal)
|
||||
|
||||
Low-level scores for managing physical hardware and network services.
|
||||
|
||||
- **DhcpScore**: Configures a DHCP server.
|
||||
- **OKDDhcpScore**: A specialized DHCP configuration for the OKD bootstrap process.
|
||||
- **OKDBootstrapDhcpScore**: Configures DHCP specifically for the bootstrap node.
|
||||
- **DhcpHostBindingScore**: Creates a specific MAC-to-IP binding in the DHCP server.
|
||||
- **DnsScore**: Configures a DNS server.
|
||||
- **OKDDnsScore**: A specialized DNS configuration for the OKD cluster (e.g., `api.*`, `*.apps.*`).
|
||||
- **StaticFilesHttpScore**: Serves a directory of static files (e.g., a documentation site) over HTTP.
|
||||
- **TftpScore**: Configures a TFTP server, typically for serving iPXE boot files.
|
||||
- **IPxeMacBootFileScore**: Assigns a specific iPXE boot file to a MAC address in the TFTP server.
|
||||
- **OKDIpxeScore**: A specialized score for generating the iPXE boot scripts for OKD.
|
||||
- **OPNsenseShellCommandScore**: Executes a shell command on an OPNsense firewall.
|
||||
|
||||
## Infrastructure & Networking (Cluster)
|
||||
|
||||
Network services that run inside the cluster or as part of the topology.
|
||||
|
||||
- **LoadBalancerScore**: Configures a general-purpose load balancer.
|
||||
- **OKDLoadBalancerScore**: Configures the high-availability load balancers for the OKD API and ingress.
|
||||
- **OKDBootstrapLoadBalancerScore**: Configures the load balancer specifically for the bootstrap-time API endpoint.
|
||||
- **K8sIngressScore**: Configures an Ingress controller or resource.
|
||||
- [HighAvailabilityHostNetworkScore](../../harmony/src/modules/okd/host_network.rs): Configures network bonds on a host and the corresponding port-channels on the switch stack for high-availability.
|
||||
|
||||
## Tenant Management
|
||||
|
||||
Scores for managing multi-tenancy within a cluster.
|
||||
|
||||
- **TenantScore**: Creates a new tenant (e.g., a namespace, quotas, network policies).
|
||||
- **TenantCredentialScore**: Generates and provisions credentials for a new tenant.
|
||||
|
||||
## Utility
|
||||
|
||||
Helper scores for discovery and inspection.
|
||||
|
||||
- **LaunchDiscoverInventoryAgentScore**: Launches the agent responsible for the `OKDSetup01InventoryScore`.
|
||||
- **DiscoverHostForRoleScore**: A utility score to find a host matching a specific role in the inventory.
|
||||
- **InspectInventoryScore**: Dumps the discovered inventory for inspection.
|
||||
59
docs/catalogs/topologies.md
Normal file
59
docs/catalogs/topologies.md
Normal file
@@ -0,0 +1,59 @@
|
||||
# Topologies Catalog
|
||||
|
||||
A `Topology` is the logical representation of your infrastructure and its `Capabilities`. You select a `Topology` in your Harmony project to define _where_ your `Scores` will be applied.
|
||||
|
||||
<!--toc:start-->
|
||||
|
||||
- [Topologies Catalog](#topologies-catalog)
|
||||
- [HAClusterTopology](#haclustertopology)
|
||||
- [K8sAnywhereTopology](#k8sanywheretopology)
|
||||
|
||||
<!--toc:end-->
|
||||
|
||||
### HAClusterTopology
|
||||
|
||||
- **`HAClusterTopology::autoload()`**
|
||||
|
||||
This `Topology` represents a high-availability, bare-metal cluster. It is designed for production-grade deployments like OKD.
|
||||
|
||||
It models an environment consisting of:
|
||||
|
||||
- At least 3 cluster nodes (for control plane/workers)
|
||||
- 2 redundant firewalls (e.g., OPNsense)
|
||||
- 2 redundant network switches
|
||||
|
||||
**Provided Capabilities:**
|
||||
This topology provides a rich set of capabilities required for bare-metal provisioning and cluster management, including:
|
||||
|
||||
- `K8sClient` (once the cluster is bootstrapped)
|
||||
- `DnsServer`
|
||||
- `LoadBalancer`
|
||||
- `DhcpServer`
|
||||
- `TftpServer`
|
||||
- `Router` (via the firewalls)
|
||||
- `Switch`
|
||||
- `NetworkManager` (for host-level network config)
|
||||
|
||||
---
|
||||
|
||||
### K8sAnywhereTopology
|
||||
|
||||
- **`K8sAnywhereTopology::from_env()`**
|
||||
|
||||
This `Topology` is designed for development and application deployment. It provides a simple, abstract way to deploy to _any_ Kubernetes cluster.
|
||||
|
||||
**How it works:**
|
||||
|
||||
1. By default (`from_env()` with no env vars), it automatically provisions a **local K3D (k3s-in-docker) cluster** on your machine. This is perfect for local development and testing.
|
||||
2. If you provide a `KUBECONFIG` environment variable, it will instead connect to that **existing Kubernetes cluster** (e.g., your staging or production OKD cluster).
|
||||
|
||||
This allows you to use the _exact same code_ to deploy your application locally as you do to deploy it to production.
|
||||
|
||||
**Provided Capabilities:**
|
||||
|
||||
- `K8sClient`
|
||||
- `HelmCommand`
|
||||
- `TenantManager`
|
||||
- `Ingress`
|
||||
- `Monitoring`
|
||||
- ...and more.
|
||||
40
docs/concepts.md
Normal file
40
docs/concepts.md
Normal file
@@ -0,0 +1,40 @@
|
||||
# Core Concepts
|
||||
|
||||
Harmony's design is based on a few key concepts. Understanding them is the key to unlocking the framework's power.
|
||||
|
||||
### 1. Score
|
||||
|
||||
- **What it is:** A **Score** is a declarative description of a desired state. It's a "resource" that defines _what_ you want to achieve, not _how_ to do it.
|
||||
- **Example:** `ApplicationScore` declares "I want this web application to be running and monitored."
|
||||
|
||||
### 2. Topology
|
||||
|
||||
- **What it is:** A **Topology** is the logical representation of your infrastructure and its abilities. It's the "where" your Scores will be applied.
|
||||
- **Key Job:** A Topology's most important job is to expose which `Capabilities` it supports.
|
||||
- **Example:** `HAClusterTopology` represents a bare-metal cluster and exposes `Capabilities` like `NetworkManager` and `Switch`. `K8sAnywhereTopology` represents a Kubernetes cluster and exposes the `K8sClient` `Capability`.
|
||||
|
||||
### 3. Capability
|
||||
|
||||
- **What it is:** A **Capability** is a specific feature or API that a `Topology` offers. It's the "how" a `Topology` can fulfill a `Score`'s request.
|
||||
- **Example:** The `K8sClient` capability offers a way to interact with a Kubernetes API. The `Switch` capability offers a way to configure a physical network switch.
|
||||
|
||||
### 4. Interpret
|
||||
|
||||
- **What it is:** An **Interpret** is the execution logic that makes a `Score` a reality. It's the "glue" that connects the _desired state_ (`Score`) to the _environment's abilities_ (`Topology`'s `Capabilities`).
|
||||
- **How it works:** When you apply a `Score`, Harmony finds the matching `Interpret` for your `Topology`. This `Interpret` then uses the `Capabilities` provided by the `Topology` to execute the necessary steps.
|
||||
|
||||
### 5. Inventory
|
||||
|
||||
- **What it is:** An **Inventory** is the physical material (the "what") used in a cluster. This is most relevant for bare-metal or on-premise topologies.
|
||||
- **Example:** A list of nodes with their roles (control plane, worker), CPU, RAM, and network interfaces. For the `K8sAnywhereTopology`, the inventory might be empty or autoloaded, as the infrastructure is more abstract.
|
||||
|
||||
---
|
||||
|
||||
### How They Work Together (The Compile-Time Check)
|
||||
|
||||
1. You **write a `Score`** (e.g., `ApplicationScore`).
|
||||
2. Your `Score`'s `Interpret` logic requires certain **`Capabilities`** (e.g., `K8sClient` and `Ingress`).
|
||||
3. You choose a **`Topology`** to run it on (e.g., `HAClusterTopology`).
|
||||
4. **At compile-time**, Harmony checks: "Does `HAClusterTopology` provide the `K8sClient` and `Ingress` capabilities that `ApplicationScore` needs?"
|
||||
- **If Yes:** Your code compiles. You can be confident it will run.
|
||||
- **If No:** The compiler gives you an error. You've just prevented a "config-is-valid-but-platform-is-wrong" runtime error before you even deployed.
|
||||
42
docs/guides/getting-started.md
Normal file
42
docs/guides/getting-started.md
Normal file
@@ -0,0 +1,42 @@
|
||||
# Getting Started Guide
|
||||
|
||||
Welcome to Harmony! This guide will walk you through installing the Harmony framework, setting up a new project, and deploying your first application.
|
||||
|
||||
We will build and deploy the "Rust Web App" example, which automatically:
|
||||
|
||||
1. Provisions a local K3D (Kubernetes in Docker) cluster.
|
||||
2. Deploys a sample Rust web application.
|
||||
3. Sets up monitoring for the application.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before you begin, you'll need a few tools installed on your system:
|
||||
|
||||
- **Rust & Cargo:** [Install Rust](https://www.rust-lang.org/tools/install)
|
||||
- **Docker:** [Install Docker](https://docs.docker.com/get-docker/) (Required for the K3D local cluster)
|
||||
- **kubectl:** [Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) (For inspecting the cluster)
|
||||
|
||||
## 1. Install Harmony
|
||||
|
||||
First, clone the Harmony repository and build the project. This gives you the `harmony` CLI and all the core libraries.
|
||||
|
||||
```bash
|
||||
# Clone the main repository
|
||||
git clone https://git.nationtech.io/nationtech/harmony
|
||||
cd harmony
|
||||
|
||||
# Build the project (this may take a few minutes)
|
||||
cargo build --release
|
||||
```
|
||||
|
||||
...
|
||||
|
||||
## Next Steps
|
||||
|
||||
Congratulations, you've just deployed an application using true infrastructure-as-code!
|
||||
|
||||
From here, you can:
|
||||
|
||||
- [Explore the Catalogs](../catalogs/README.md): See what other [Scores](../catalogs/scores.md) and [Topologies](../catalogs/topologies.md) are available.
|
||||
- [Read the Use Cases](../use-cases/README.md): Check out the [OKD on Bare Metal](./use-cases/okd-on-bare-metal.md) guide for a more advanced scenario.
|
||||
- [Write your own Score](../guides/writing-a-score.md): Dive into the [Developer Guide](./guides/developer-guide.md) to start building your own components.
|
||||
@@ -5,6 +5,10 @@ version.workspace = true
|
||||
readme.workspace = true
|
||||
license.workspace = true
|
||||
|
||||
[[example]]
|
||||
name = "try_rust_webapp"
|
||||
path = "src/main.rs"
|
||||
|
||||
[dependencies]
|
||||
harmony = { path = "../../harmony" }
|
||||
harmony_cli = { path = "../../harmony_cli" }
|
||||
|
||||
@@ -1,6 +1,4 @@
|
||||
mod repository;
|
||||
use std::fmt;
|
||||
|
||||
pub use repository::*;
|
||||
|
||||
#[derive(Debug, new, Clone)]
|
||||
@@ -71,14 +69,5 @@ pub enum HostRole {
|
||||
Bootstrap,
|
||||
ControlPlane,
|
||||
Worker,
|
||||
}
|
||||
|
||||
impl fmt::Display for HostRole {
|
||||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
|
||||
match self {
|
||||
HostRole::Bootstrap => write!(f, "Bootstrap"),
|
||||
HostRole::ControlPlane => write!(f, "ControlPlane"),
|
||||
HostRole::Worker => write!(f, "Worker"),
|
||||
}
|
||||
}
|
||||
Storage,
|
||||
}
|
||||
|
||||
@@ -17,12 +17,6 @@ use crate::{
|
||||
topology::{HostNetworkConfig, NetworkError, NetworkManager, k8s::K8sClient},
|
||||
};
|
||||
|
||||
/// TODO document properly the non-intuitive behavior or "roll forward only" of nmstate in general
|
||||
/// It is documented in nmstate official doc, but worth mentionning here :
|
||||
///
|
||||
/// - You create a bond, nmstate will apply it
|
||||
/// - You delete de bond from nmstate, it will NOT delete it
|
||||
/// - To delete it you have to update it with configuration set to null
|
||||
pub struct OpenShiftNmStateNetworkManager {
|
||||
k8s_client: Arc<K8sClient>,
|
||||
}
|
||||
@@ -37,7 +31,6 @@ impl std::fmt::Debug for OpenShiftNmStateNetworkManager {
|
||||
impl NetworkManager for OpenShiftNmStateNetworkManager {
|
||||
async fn ensure_network_manager_installed(&self) -> Result<(), NetworkError> {
|
||||
debug!("Installing NMState controller...");
|
||||
// TODO use operatorhub maybe?
|
||||
self.k8s_client.apply_url(url::Url::parse("https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.84.0/nmstate.io_nmstates.yaml
|
||||
").unwrap(), Some("nmstate"))
|
||||
.await?;
|
||||
|
||||
@@ -1,8 +1,20 @@
|
||||
use crate::{
|
||||
interpret::Interpret, inventory::HostRole, modules::okd::bootstrap_okd_node::OKDNodeInterpret,
|
||||
score::Score, topology::HAClusterTopology,
|
||||
data::Version,
|
||||
hardware::PhysicalHost,
|
||||
infra::inventory::InventoryRepositoryFactory,
|
||||
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
|
||||
inventory::{HostRole, Inventory},
|
||||
modules::{
|
||||
dhcp::DhcpHostBindingScore, http::IPxeMacBootFileScore,
|
||||
inventory::DiscoverHostForRoleScore, okd::templates::BootstrapIpxeTpl,
|
||||
},
|
||||
score::Score,
|
||||
topology::{HAClusterTopology, HostBinding},
|
||||
};
|
||||
use async_trait::async_trait;
|
||||
use derive_new::new;
|
||||
use harmony_types::id::Id;
|
||||
use log::{debug, info};
|
||||
use serde::Serialize;
|
||||
|
||||
// -------------------------------------------------------------------------------------------------
|
||||
@@ -16,13 +28,226 @@ pub struct OKDSetup03ControlPlaneScore {}
|
||||
|
||||
impl Score<HAClusterTopology> for OKDSetup03ControlPlaneScore {
|
||||
fn create_interpret(&self) -> Box<dyn Interpret<HAClusterTopology>> {
|
||||
// TODO: Implement a step to wait for the control plane nodes to join the cluster
|
||||
// and for the cluster operators to become available. This would be similar to
|
||||
// the `wait-for bootstrap-complete` command.
|
||||
Box::new(OKDNodeInterpret::new(HostRole::ControlPlane))
|
||||
Box::new(OKDSetup03ControlPlaneInterpret::new())
|
||||
}
|
||||
|
||||
fn name(&self) -> String {
|
||||
"OKDSetup03ControlPlaneScore".to_string()
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct OKDSetup03ControlPlaneInterpret {
|
||||
version: Version,
|
||||
status: InterpretStatus,
|
||||
}
|
||||
|
||||
impl OKDSetup03ControlPlaneInterpret {
|
||||
pub fn new() -> Self {
|
||||
let version = Version::from("1.0.0").unwrap();
|
||||
Self {
|
||||
version,
|
||||
status: InterpretStatus::QUEUED,
|
||||
}
|
||||
}
|
||||
|
||||
/// Ensures that three physical hosts are discovered and available for the ControlPlane role.
|
||||
/// It will trigger discovery if not enough hosts are found.
|
||||
async fn get_nodes(
|
||||
&self,
|
||||
inventory: &Inventory,
|
||||
topology: &HAClusterTopology,
|
||||
) -> Result<Vec<PhysicalHost>, InterpretError> {
|
||||
const REQUIRED_HOSTS: usize = 3;
|
||||
let repo = InventoryRepositoryFactory::build().await?;
|
||||
let mut control_plane_hosts = repo.get_host_for_role(&HostRole::ControlPlane).await?;
|
||||
|
||||
while control_plane_hosts.len() < REQUIRED_HOSTS {
|
||||
info!(
|
||||
"Discovery of {} control plane hosts in progress, current number {}",
|
||||
REQUIRED_HOSTS,
|
||||
control_plane_hosts.len()
|
||||
);
|
||||
// This score triggers the discovery agent for a specific role.
|
||||
DiscoverHostForRoleScore {
|
||||
role: HostRole::ControlPlane,
|
||||
}
|
||||
.interpret(inventory, topology)
|
||||
.await?;
|
||||
control_plane_hosts = repo.get_host_for_role(&HostRole::ControlPlane).await?;
|
||||
}
|
||||
|
||||
if control_plane_hosts.len() < REQUIRED_HOSTS {
|
||||
Err(InterpretError::new(format!(
|
||||
"OKD Requires at least {} control plane hosts, but only found {}. Cannot proceed.",
|
||||
REQUIRED_HOSTS,
|
||||
control_plane_hosts.len()
|
||||
)))
|
||||
} else {
|
||||
// Take exactly the number of required hosts to ensure consistency.
|
||||
Ok(control_plane_hosts
|
||||
.into_iter()
|
||||
.take(REQUIRED_HOSTS)
|
||||
.collect())
|
||||
}
|
||||
}
|
||||
|
||||
/// Configures DHCP host bindings for all control plane nodes.
|
||||
async fn configure_host_binding(
|
||||
&self,
|
||||
inventory: &Inventory,
|
||||
topology: &HAClusterTopology,
|
||||
nodes: &Vec<PhysicalHost>,
|
||||
) -> Result<(), InterpretError> {
|
||||
info!("[ControlPlane] Configuring host bindings for control plane nodes.");
|
||||
|
||||
// Ensure the topology definition matches the number of physical nodes found.
|
||||
if topology.control_plane.len() != nodes.len() {
|
||||
return Err(InterpretError::new(format!(
|
||||
"Mismatch between logical control plane hosts defined in topology ({}) and physical nodes found ({}).",
|
||||
topology.control_plane.len(),
|
||||
nodes.len()
|
||||
)));
|
||||
}
|
||||
|
||||
// Create a binding for each physical host to its corresponding logical host.
|
||||
let bindings: Vec<HostBinding> = topology
|
||||
.control_plane
|
||||
.iter()
|
||||
.zip(nodes.iter())
|
||||
.map(|(logical_host, physical_host)| {
|
||||
info!(
|
||||
"Creating binding: Logical Host '{}' -> Physical Host ID '{}'",
|
||||
logical_host.name, physical_host.id
|
||||
);
|
||||
HostBinding {
|
||||
logical_host: logical_host.clone(),
|
||||
physical_host: physical_host.clone(),
|
||||
}
|
||||
})
|
||||
.collect();
|
||||
|
||||
DhcpHostBindingScore {
|
||||
host_binding: bindings,
|
||||
domain: Some(topology.domain_name.clone()),
|
||||
}
|
||||
.interpret(inventory, topology)
|
||||
.await?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Renders and deploys a per-MAC iPXE boot file for each control plane node.
|
||||
async fn configure_ipxe(
|
||||
&self,
|
||||
inventory: &Inventory,
|
||||
topology: &HAClusterTopology,
|
||||
nodes: &Vec<PhysicalHost>,
|
||||
) -> Result<(), InterpretError> {
|
||||
info!("[ControlPlane] Rendering per-MAC iPXE configurations.");
|
||||
|
||||
// The iPXE script content is the same for all control plane nodes,
|
||||
// pointing to the 'master.ign' ignition file.
|
||||
let content = BootstrapIpxeTpl {
|
||||
http_ip: &topology.http_server.get_ip().to_string(),
|
||||
scos_path: "scos",
|
||||
ignition_http_path: "okd_ignition_files",
|
||||
installation_device: "/dev/sda", // This might need to be configurable per-host in the future
|
||||
ignition_file_name: "master.ign", // Control plane nodes use the master ignition file
|
||||
}
|
||||
.to_string();
|
||||
|
||||
debug!("[ControlPlane] iPXE content template:\n{content}");
|
||||
|
||||
// Create and apply an iPXE boot file for each node.
|
||||
for node in nodes {
|
||||
let mac_address = node.get_mac_address();
|
||||
if mac_address.is_empty() {
|
||||
return Err(InterpretError::new(format!(
|
||||
"Physical host with ID '{}' has no MAC addresses defined.",
|
||||
node.id
|
||||
)));
|
||||
}
|
||||
info!(
|
||||
"[ControlPlane] Applying iPXE config for node ID '{}' with MACs: {:?}",
|
||||
node.id, mac_address
|
||||
);
|
||||
|
||||
IPxeMacBootFileScore {
|
||||
mac_address,
|
||||
content: content.clone(),
|
||||
}
|
||||
.interpret(inventory, topology)
|
||||
.await?;
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Prompts the user to reboot the target control plane nodes.
|
||||
async fn reboot_targets(&self, nodes: &Vec<PhysicalHost>) -> Result<(), InterpretError> {
|
||||
let node_ids: Vec<String> = nodes.iter().map(|n| n.id.to_string()).collect();
|
||||
info!("[ControlPlane] Requesting reboot for control plane nodes: {node_ids:?}",);
|
||||
|
||||
let confirmation = inquire::Confirm::new(
|
||||
&format!("Please reboot the {} control plane nodes ({}) to apply their PXE configuration. Press enter when ready.", nodes.len(), node_ids.join(", ")),
|
||||
)
|
||||
.prompt()
|
||||
.map_err(|e| InterpretError::new(format!("User prompt failed: {e}")))?;
|
||||
|
||||
if !confirmation {
|
||||
return Err(InterpretError::new(
|
||||
"User aborted the operation.".to_string(),
|
||||
));
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl Interpret<HAClusterTopology> for OKDSetup03ControlPlaneInterpret {
|
||||
fn get_name(&self) -> InterpretName {
|
||||
InterpretName::Custom("OKDSetup03ControlPlane")
|
||||
}
|
||||
|
||||
fn get_version(&self) -> Version {
|
||||
self.version.clone()
|
||||
}
|
||||
|
||||
fn get_status(&self) -> InterpretStatus {
|
||||
self.status.clone()
|
||||
}
|
||||
|
||||
fn get_children(&self) -> Vec<Id> {
|
||||
vec![]
|
||||
}
|
||||
|
||||
async fn execute(
|
||||
&self,
|
||||
inventory: &Inventory,
|
||||
topology: &HAClusterTopology,
|
||||
) -> Result<Outcome, InterpretError> {
|
||||
// 1. Ensure we have 3 physical hosts for the control plane.
|
||||
let nodes = self.get_nodes(inventory, topology).await?;
|
||||
|
||||
// 2. Create DHCP reservations for the control plane nodes.
|
||||
self.configure_host_binding(inventory, topology, &nodes)
|
||||
.await?;
|
||||
|
||||
// 3. Create iPXE files for each control plane node to boot from the master ignition.
|
||||
self.configure_ipxe(inventory, topology, &nodes).await?;
|
||||
|
||||
// 4. Reboot the nodes to start the OS installation.
|
||||
self.reboot_targets(&nodes).await?;
|
||||
|
||||
// TODO: Implement a step to wait for the control plane nodes to join the cluster
|
||||
// and for the cluster operators to become available. This would be similar to
|
||||
// the `wait-for bootstrap-complete` command.
|
||||
info!("[ControlPlane] Provisioning initiated. Monitor the cluster convergence manually.");
|
||||
|
||||
Ok(Outcome::success(
|
||||
"Control plane provisioning has been successfully initiated.".into(),
|
||||
))
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,9 +1,15 @@
|
||||
use async_trait::async_trait;
|
||||
use derive_new::new;
|
||||
use harmony_types::id::Id;
|
||||
use log::info;
|
||||
use serde::Serialize;
|
||||
|
||||
use crate::{
|
||||
interpret::Interpret, inventory::HostRole, modules::okd::bootstrap_okd_node::OKDNodeInterpret,
|
||||
score::Score, topology::HAClusterTopology,
|
||||
data::Version,
|
||||
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
|
||||
inventory::Inventory,
|
||||
score::Score,
|
||||
topology::HAClusterTopology,
|
||||
};
|
||||
|
||||
// -------------------------------------------------------------------------------------------------
|
||||
@@ -17,10 +23,61 @@ pub struct OKDSetup04WorkersScore {}
|
||||
|
||||
impl Score<HAClusterTopology> for OKDSetup04WorkersScore {
|
||||
fn create_interpret(&self) -> Box<dyn Interpret<HAClusterTopology>> {
|
||||
Box::new(OKDNodeInterpret::new(HostRole::Worker))
|
||||
Box::new(OKDSetup04WorkersInterpret::new(self.clone()))
|
||||
}
|
||||
|
||||
fn name(&self) -> String {
|
||||
"OKDSetup04WorkersScore".to_string()
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct OKDSetup04WorkersInterpret {
|
||||
score: OKDSetup04WorkersScore,
|
||||
version: Version,
|
||||
status: InterpretStatus,
|
||||
}
|
||||
|
||||
impl OKDSetup04WorkersInterpret {
|
||||
pub fn new(score: OKDSetup04WorkersScore) -> Self {
|
||||
let version = Version::from("1.0.0").unwrap();
|
||||
Self {
|
||||
version,
|
||||
score,
|
||||
status: InterpretStatus::QUEUED,
|
||||
}
|
||||
}
|
||||
|
||||
async fn render_and_reboot(&self) -> Result<(), InterpretError> {
|
||||
info!("[Workers] Rendering per-MAC PXE for workers and rebooting");
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl Interpret<HAClusterTopology> for OKDSetup04WorkersInterpret {
|
||||
fn get_name(&self) -> InterpretName {
|
||||
InterpretName::Custom("OKDSetup04Workers")
|
||||
}
|
||||
|
||||
fn get_version(&self) -> Version {
|
||||
self.version.clone()
|
||||
}
|
||||
|
||||
fn get_status(&self) -> InterpretStatus {
|
||||
self.status.clone()
|
||||
}
|
||||
|
||||
fn get_children(&self) -> Vec<Id> {
|
||||
vec![]
|
||||
}
|
||||
|
||||
async fn execute(
|
||||
&self,
|
||||
_inventory: &Inventory,
|
||||
_topology: &HAClusterTopology,
|
||||
) -> Result<Outcome, InterpretError> {
|
||||
self.render_and_reboot().await?;
|
||||
Ok(Outcome::success("Workers provisioned".into()))
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,303 +0,0 @@
|
||||
use async_trait::async_trait;
|
||||
use derive_new::new;
|
||||
use harmony_types::id::Id;
|
||||
use log::{debug, info};
|
||||
use serde::Serialize;
|
||||
|
||||
use crate::{
|
||||
data::Version,
|
||||
hardware::PhysicalHost,
|
||||
infra::inventory::InventoryRepositoryFactory,
|
||||
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
|
||||
inventory::{HostRole, Inventory},
|
||||
modules::{
|
||||
dhcp::DhcpHostBindingScore,
|
||||
http::IPxeMacBootFileScore,
|
||||
inventory::DiscoverHostForRoleScore,
|
||||
okd::{
|
||||
okd_node::{
|
||||
BootstrapRole, ControlPlaneRole, OKDRoleProperties, StorageRole, WorkerRole,
|
||||
},
|
||||
templates::BootstrapIpxeTpl,
|
||||
},
|
||||
},
|
||||
score::Score,
|
||||
topology::{HAClusterTopology, HostBinding, LogicalHost},
|
||||
};
|
||||
|
||||
#[derive(Debug, Clone, Serialize, new)]
|
||||
pub struct OKDNodeInstallationScore {
|
||||
host_role: HostRole,
|
||||
}
|
||||
|
||||
impl Score<HAClusterTopology> for OKDNodeInstallationScore {
|
||||
fn name(&self) -> String {
|
||||
"OKDNodeScore".to_string()
|
||||
}
|
||||
|
||||
fn create_interpret(&self) -> Box<dyn Interpret<HAClusterTopology>> {
|
||||
Box::new(OKDNodeInterpret::new(self.host_role.clone()))
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct OKDNodeInterpret {
|
||||
host_role: HostRole,
|
||||
}
|
||||
|
||||
impl OKDNodeInterpret {
|
||||
pub fn new(host_role: HostRole) -> Self {
|
||||
Self { host_role }
|
||||
}
|
||||
|
||||
fn okd_role_properties(&self, role: &HostRole) -> &'static dyn OKDRoleProperties {
|
||||
match role {
|
||||
HostRole::Bootstrap => &BootstrapRole,
|
||||
HostRole::ControlPlane => &ControlPlaneRole,
|
||||
HostRole::Worker => &WorkerRole,
|
||||
}
|
||||
}
|
||||
|
||||
async fn get_nodes(
|
||||
&self,
|
||||
inventory: &Inventory,
|
||||
topology: &HAClusterTopology,
|
||||
) -> Result<Vec<PhysicalHost>, InterpretError> {
|
||||
let repo = InventoryRepositoryFactory::build().await?;
|
||||
|
||||
let mut hosts = repo.get_host_for_role(&self.host_role).await?;
|
||||
|
||||
let okd_host_properties = self.okd_role_properties(&self.host_role);
|
||||
|
||||
let required_hosts: usize = okd_host_properties.required_hosts();
|
||||
|
||||
while hosts.len() < required_hosts {
|
||||
info!(
|
||||
"Discovery of {} {} hosts in progress, current number {}",
|
||||
required_hosts,
|
||||
self.host_role,
|
||||
hosts.len()
|
||||
);
|
||||
// This score triggers the discovery agent for a specific role.
|
||||
DiscoverHostForRoleScore {
|
||||
role: self.host_role.clone(),
|
||||
}
|
||||
.interpret(inventory, topology)
|
||||
.await?;
|
||||
hosts = repo.get_host_for_role(&self.host_role).await?;
|
||||
}
|
||||
|
||||
if hosts.len() < required_hosts {
|
||||
Err(InterpretError::new(format!(
|
||||
"OKD Requires at least {} {} hosts, but only found {}. Cannot proceed.",
|
||||
required_hosts,
|
||||
self.host_role,
|
||||
hosts.len()
|
||||
)))
|
||||
} else {
|
||||
// Take exactly the number of required hosts to ensure consistency.
|
||||
Ok(hosts.into_iter().take(required_hosts).collect())
|
||||
}
|
||||
}
|
||||
|
||||
/// Configures DHCP host bindings for all nodes.
|
||||
async fn configure_host_binding(
|
||||
&self,
|
||||
inventory: &Inventory,
|
||||
topology: &HAClusterTopology,
|
||||
nodes: &Vec<PhysicalHost>,
|
||||
) -> Result<(), InterpretError> {
|
||||
info!(
|
||||
"[{}] Configuring host bindings for {} plane nodes.",
|
||||
self.host_role, self.host_role,
|
||||
);
|
||||
|
||||
let host_properties = self.okd_role_properties(&self.host_role);
|
||||
|
||||
self.validate_host_node_match(nodes, host_properties.logical_hosts(topology))?;
|
||||
|
||||
let bindings: Vec<HostBinding> =
|
||||
self.host_bindings(nodes, host_properties.logical_hosts(topology));
|
||||
|
||||
DhcpHostBindingScore {
|
||||
host_binding: bindings,
|
||||
domain: Some(topology.domain_name.clone()),
|
||||
}
|
||||
.interpret(inventory, topology)
|
||||
.await?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
// Ensure the topology definition matches the number of physical nodes found.
|
||||
fn validate_host_node_match(
|
||||
&self,
|
||||
nodes: &Vec<PhysicalHost>,
|
||||
hosts: &Vec<LogicalHost>,
|
||||
) -> Result<(), InterpretError> {
|
||||
if hosts.len() != nodes.len() {
|
||||
return Err(InterpretError::new(format!(
|
||||
"Mismatch between logical hosts defined in topology ({}) and physical nodes found ({}).",
|
||||
hosts.len(),
|
||||
nodes.len()
|
||||
)));
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
// Create a binding for each physical host to its corresponding logical host.
|
||||
fn host_bindings(
|
||||
&self,
|
||||
nodes: &Vec<PhysicalHost>,
|
||||
hosts: &Vec<LogicalHost>,
|
||||
) -> Vec<HostBinding> {
|
||||
hosts
|
||||
.iter()
|
||||
.zip(nodes.iter())
|
||||
.map(|(logical_host, physical_host)| {
|
||||
info!(
|
||||
"Creating binding: Logical Host '{}' -> Physical Host ID '{}'",
|
||||
logical_host.name, physical_host.id
|
||||
);
|
||||
HostBinding {
|
||||
logical_host: logical_host.clone(),
|
||||
physical_host: physical_host.clone(),
|
||||
}
|
||||
})
|
||||
.collect()
|
||||
}
|
||||
|
||||
/// Renders and deploys a per-MAC iPXE boot file for each node.
|
||||
async fn configure_ipxe(
|
||||
&self,
|
||||
inventory: &Inventory,
|
||||
topology: &HAClusterTopology,
|
||||
nodes: &Vec<PhysicalHost>,
|
||||
) -> Result<(), InterpretError> {
|
||||
info!(
|
||||
"[{}] Rendering per-MAC iPXE configurations.",
|
||||
self.host_role
|
||||
);
|
||||
|
||||
let okd_role_properties = self.okd_role_properties(&self.host_role);
|
||||
// The iPXE script content is the same for all control plane nodes,
|
||||
// pointing to the 'master.ign' ignition file.
|
||||
let content = BootstrapIpxeTpl {
|
||||
http_ip: &topology.http_server.get_ip().to_string(),
|
||||
scos_path: "scos",
|
||||
ignition_http_path: "okd_ignition_files",
|
||||
//TODO must be refactored to not only use /dev/sda
|
||||
installation_device: "/dev/sda", // This might need to be configurable per-host in the future
|
||||
ignition_file_name: okd_role_properties.ignition_file(),
|
||||
}
|
||||
.to_string();
|
||||
|
||||
debug!("[{}] iPXE content template:\n{content}", self.host_role);
|
||||
|
||||
// Create and apply an iPXE boot file for each node.
|
||||
for node in nodes {
|
||||
let mac_address = node.get_mac_address();
|
||||
if mac_address.is_empty() {
|
||||
return Err(InterpretError::new(format!(
|
||||
"Physical host with ID '{}' has no MAC addresses defined.",
|
||||
node.id
|
||||
)));
|
||||
}
|
||||
info!(
|
||||
"[{}] Applying iPXE config for node ID '{}' with MACs: {:?}",
|
||||
self.host_role, node.id, mac_address
|
||||
);
|
||||
|
||||
IPxeMacBootFileScore {
|
||||
mac_address,
|
||||
content: content.clone(),
|
||||
}
|
||||
.interpret(inventory, topology)
|
||||
.await?;
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Prompts the user to reboot the target control plane nodes.
|
||||
async fn reboot_targets(&self, nodes: &Vec<PhysicalHost>) -> Result<(), InterpretError> {
|
||||
let node_ids: Vec<String> = nodes.iter().map(|n| n.id.to_string()).collect();
|
||||
info!(
|
||||
"[{}] Requesting reboot for control plane nodes: {node_ids:?}",
|
||||
self.host_role
|
||||
);
|
||||
|
||||
let confirmation = inquire::Confirm::new(
|
||||
&format!("Please reboot the {} {} nodes ({}) to apply their PXE configuration. Press enter when ready.", nodes.len(), self.host_role, node_ids.join(", ")),
|
||||
)
|
||||
.prompt()
|
||||
.map_err(|e| InterpretError::new(format!("User prompt failed: {e}")))?;
|
||||
|
||||
if !confirmation {
|
||||
return Err(InterpretError::new(
|
||||
"User aborted the operation.".to_string(),
|
||||
));
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl Interpret<HAClusterTopology> for OKDNodeInterpret {
|
||||
async fn execute(
|
||||
&self,
|
||||
inventory: &Inventory,
|
||||
topology: &HAClusterTopology,
|
||||
) -> Result<Outcome, InterpretError> {
|
||||
// 1. Ensure we have the specfied number of physical hosts.
|
||||
let nodes = self.get_nodes(inventory, topology).await?;
|
||||
|
||||
// 2. Create DHCP reservations for the nodes.
|
||||
self.configure_host_binding(inventory, topology, &nodes)
|
||||
.await?;
|
||||
|
||||
// 3. Create iPXE files for each node to boot from the ignition.
|
||||
self.configure_ipxe(inventory, topology, &nodes).await?;
|
||||
|
||||
// 4. Reboot the nodes to start the OS installation.
|
||||
self.reboot_targets(&nodes).await?;
|
||||
// TODO: Implement a step to validate that the installation of the nodes is
|
||||
// complete and for the cluster operators to become available.
|
||||
//
|
||||
// The OpenShift installer only provides two wait commands which currently need to be
|
||||
// run manually:
|
||||
// - `openshift-install wait-for bootstrap-complete`
|
||||
// - `openshift-install wait-for install-complete`
|
||||
//
|
||||
// There is no installer command that waits specifically for worker node
|
||||
// provisioning. Worker nodes join asynchronously (via ignition + CSR approval),
|
||||
// and the cluster becomes fully functional only once all nodes are Ready and the
|
||||
// cluster operators report Available=True.
|
||||
info!(
|
||||
"[{}] Provisioning initiated. Monitor the cluster convergence manually.",
|
||||
self.host_role
|
||||
);
|
||||
|
||||
Ok(Outcome::success(format!(
|
||||
"{} provisioning has been successfully initiated.",
|
||||
self.host_role
|
||||
)))
|
||||
}
|
||||
|
||||
fn get_name(&self) -> InterpretName {
|
||||
InterpretName::Custom("OKDNodeSetup".into())
|
||||
}
|
||||
|
||||
fn get_version(&self) -> Version {
|
||||
todo!()
|
||||
}
|
||||
|
||||
fn get_status(&self) -> InterpretStatus {
|
||||
todo!()
|
||||
}
|
||||
|
||||
fn get_children(&self) -> Vec<Id> {
|
||||
todo!()
|
||||
}
|
||||
}
|
||||
@@ -12,6 +12,74 @@ use crate::{
|
||||
topology::{HostNetworkConfig, NetworkInterface, NetworkManager, Switch, SwitchPort, Topology},
|
||||
};
|
||||
|
||||
/// Configures high-availability networking for a set of physical hosts.
|
||||
///
|
||||
/// This is an opinionated Score that creates a resilient network configuration.
|
||||
/// It assumes hosts have at least two network interfaces connected
|
||||
/// to redundant switches for high availability.
|
||||
///
|
||||
/// The Score's `Interpret` logic will:
|
||||
/// 1. Setup the switch with sane defaults (e.g. mark interfaces as switchports for discoverability).
|
||||
/// 2. Discover which switch ports each host's interfaces are connected to (via MAC address).
|
||||
/// 3. Create a network bond (e.g. LACP) on the host itself using these interfaces.
|
||||
/// 4. Configure a corresponding port-channel on the switch(es) for those ports.
|
||||
///
|
||||
/// This ensures that both the host and the switch are configured to treat the
|
||||
/// multiple links as a single, aggregated, and redundant connection.
|
||||
///
|
||||
/// Hosts with 0 or 1 detected interfaces will be skipped, as bonding is not
|
||||
/// applicable.
|
||||
///
|
||||
/// <div class="warning">
|
||||
/// The implementation is currently _not_ idempotent, even though it should be.
|
||||
/// Running it more than once on the same host might result in duplicated bond configurations.
|
||||
/// </div>
|
||||
///
|
||||
/// <div class="warning">
|
||||
/// This Score is not named well. A better name would be
|
||||
/// `HighAvailabilityHostNetworkScore`, or something similar to better express the intent.
|
||||
/// </div>
|
||||
///
|
||||
/// # Requirements
|
||||
///
|
||||
/// This Score can only be applied to a [Topology] that implements both the
|
||||
/// [NetworkManager] (to configure the host-side bond) and [Switch]
|
||||
/// (to configure the switch-side port-channel) capabilities.
|
||||
///
|
||||
/// # Current limitations
|
||||
///
|
||||
/// ## 1. No rollback logic & limited idempotency
|
||||
///
|
||||
/// If any of the steps described above fails, the Score will not attempt to revert any changes
|
||||
/// already applied. Which could render the host or switch in an inconsistent state.
|
||||
///
|
||||
/// ## 2. Propagation delays on the switch
|
||||
///
|
||||
/// It might take some time for the sane defaults in step 1) to be applied. In some cases,
|
||||
/// it was observed that the switch takes up to 5min to actually apply the config.
|
||||
///
|
||||
/// But this Score's Interpret doesn't wait and directly proceeds to step 2) to discover
|
||||
/// the MAC addresses. Which could result interfaces being skipped because their corresponding port
|
||||
/// on the switch couldn't be found.
|
||||
///
|
||||
/// TODO: Validate that the switch is in the expected state before continuing.
|
||||
///
|
||||
/// ## 3. Bond configuration
|
||||
///
|
||||
/// To find the next available bond id, the current
|
||||
/// [NetworkManager](crate::infra::network_manager::OpenShiftNmStateNetworkManager) implementation
|
||||
/// simply checks for existing bonds named `bond[n]` and take the next available `n` number.
|
||||
///
|
||||
/// It doesn't check that there are already a bond for the interfaces that should be bonded. Which
|
||||
/// might result in a duplicate bond being created.
|
||||
///
|
||||
/// TODO: Make sure the interfaces to aggregate are not already bonded.
|
||||
///
|
||||
/// # Future improvements
|
||||
///
|
||||
/// Along with the `TODO` items above, splitting this Score into multiple smaller ones would be
|
||||
/// beneficial. It has a lot of moving parts and some of them could be used on their own to make
|
||||
/// operations on a cluster easier.
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
pub struct HostNetworkConfigurationScore {
|
||||
pub hosts: Vec<PhysicalHost>,
|
||||
|
||||
@@ -6,14 +6,12 @@ mod bootstrap_05_sanity_check;
|
||||
mod bootstrap_06_installation_report;
|
||||
pub mod bootstrap_dhcp;
|
||||
pub mod bootstrap_load_balancer;
|
||||
pub mod bootstrap_okd_node;
|
||||
mod bootstrap_persist_network_bond;
|
||||
pub mod dhcp;
|
||||
pub mod dns;
|
||||
pub mod installation;
|
||||
pub mod ipxe;
|
||||
pub mod load_balancer;
|
||||
pub mod okd_node;
|
||||
pub mod templates;
|
||||
pub mod upgrade;
|
||||
pub use bootstrap_01_prepare::*;
|
||||
|
||||
@@ -1,54 +0,0 @@
|
||||
use crate::topology::{HAClusterTopology, LogicalHost};
|
||||
|
||||
pub trait OKDRoleProperties {
|
||||
fn ignition_file(&self) -> &'static str;
|
||||
fn required_hosts(&self) -> usize;
|
||||
fn logical_hosts<'a>(&self, t: &'a HAClusterTopology) -> &'a Vec<LogicalHost>;
|
||||
}
|
||||
|
||||
pub struct BootstrapRole;
|
||||
pub struct ControlPlaneRole;
|
||||
pub struct WorkerRole;
|
||||
pub struct StorageRole;
|
||||
|
||||
impl OKDRoleProperties for BootstrapRole {
|
||||
fn ignition_file(&self) -> &'static str {
|
||||
"bootstrap.ign"
|
||||
}
|
||||
|
||||
fn required_hosts(&self) -> usize {
|
||||
1
|
||||
}
|
||||
|
||||
fn logical_hosts<'a>(&self, t: &'a HAClusterTopology) -> &'a Vec<LogicalHost> {
|
||||
todo!()
|
||||
}
|
||||
}
|
||||
|
||||
impl OKDRoleProperties for ControlPlaneRole {
|
||||
fn ignition_file(&self) -> &'static str {
|
||||
"master.ign"
|
||||
}
|
||||
|
||||
fn required_hosts(&self) -> usize {
|
||||
3
|
||||
}
|
||||
|
||||
fn logical_hosts<'a>(&self, t: &'a HAClusterTopology) -> &'a Vec<LogicalHost> {
|
||||
&t.control_plane
|
||||
}
|
||||
}
|
||||
|
||||
impl OKDRoleProperties for WorkerRole {
|
||||
fn ignition_file(&self) -> &'static str {
|
||||
"worker.ign"
|
||||
}
|
||||
|
||||
fn required_hosts(&self) -> usize {
|
||||
2
|
||||
}
|
||||
|
||||
fn logical_hosts<'a>(&self, t: &'a HAClusterTopology) -> &'a Vec<LogicalHost> {
|
||||
&t.workers
|
||||
}
|
||||
}
|
||||
@@ -106,37 +106,11 @@ pub struct HAProxy {
|
||||
pub groups: MaybeString,
|
||||
pub users: MaybeString,
|
||||
pub cpus: MaybeString,
|
||||
pub resolvers: HAProxyResolvers,
|
||||
pub resolvers: MaybeString,
|
||||
pub mailers: MaybeString,
|
||||
pub maintenance: Maintenance,
|
||||
}
|
||||
|
||||
#[derive(Default, PartialEq, Debug, YaSerialize, YaDeserialize)]
|
||||
pub struct HAProxyResolvers {
|
||||
#[yaserde(rename = "resolver")]
|
||||
pub resolver: Option<Resolver>,
|
||||
}
|
||||
|
||||
#[derive(Default, PartialEq, Debug, YaSerialize, YaDeserialize)]
|
||||
pub struct Resolver {
|
||||
pub id: String,
|
||||
pub enabled: i32,
|
||||
pub name: String,
|
||||
pub description: MaybeString,
|
||||
pub nameservers: String,
|
||||
pub parse_resolv_conf: String,
|
||||
pub resolve_retries: i32,
|
||||
pub timeout_resolve: String,
|
||||
pub timeout_retry: String,
|
||||
pub accepted_payload_size: MaybeString,
|
||||
pub hold_valid: MaybeString,
|
||||
pub hold_obsolete: MaybeString,
|
||||
pub hold_refused: MaybeString,
|
||||
pub hold_nx: MaybeString,
|
||||
pub hold_timeout: MaybeString,
|
||||
pub hold_other: MaybeString,
|
||||
}
|
||||
|
||||
#[derive(Default, PartialEq, Debug, YaSerialize, YaDeserialize)]
|
||||
pub struct Maintenance {
|
||||
#[yaserde(rename = "cronjobs")]
|
||||
|
||||
@@ -136,7 +136,6 @@ pub struct Rule {
|
||||
pub updated: Option<Updated>,
|
||||
pub created: Option<Created>,
|
||||
pub disabled: Option<MaybeString>,
|
||||
pub log: Option<u32>,
|
||||
}
|
||||
|
||||
#[derive(Default, PartialEq, Debug, YaSerialize, YaDeserialize)]
|
||||
@@ -217,7 +216,7 @@ pub struct System {
|
||||
pub maximumfrags: Option<MaybeString>,
|
||||
pub aliasesresolveinterval: Option<MaybeString>,
|
||||
pub maximumtableentries: Option<MaybeString>,
|
||||
pub language: Option<String>,
|
||||
pub language: String,
|
||||
pub dnsserver: Option<MaybeString>,
|
||||
pub dns1gw: Option<String>,
|
||||
pub dns2gw: Option<String>,
|
||||
@@ -1141,7 +1140,6 @@ pub struct UnboundGeneral {
|
||||
pub local_zone_type: String,
|
||||
pub outgoing_interface: MaybeString,
|
||||
pub enable_wpad: MaybeString,
|
||||
pub safesearch: MaybeString,
|
||||
}
|
||||
|
||||
#[derive(Default, PartialEq, Debug, YaSerialize, YaDeserialize)]
|
||||
@@ -1195,15 +1193,15 @@ pub struct Acls {
|
||||
|
||||
#[derive(Default, PartialEq, Debug, YaSerialize, YaDeserialize)]
|
||||
pub struct Dnsbl {
|
||||
pub enabled: Option<i32>,
|
||||
pub safesearch: Option<MaybeString>,
|
||||
pub enabled: i32,
|
||||
pub safesearch: MaybeString,
|
||||
#[yaserde(rename = "type")]
|
||||
pub r#type: Option<MaybeString>,
|
||||
pub lists: Option<MaybeString>,
|
||||
pub whitelists: Option<MaybeString>,
|
||||
pub blocklists: Option<MaybeString>,
|
||||
pub wildcards: Option<MaybeString>,
|
||||
pub address: Option<MaybeString>,
|
||||
pub r#type: MaybeString,
|
||||
pub lists: MaybeString,
|
||||
pub whitelists: MaybeString,
|
||||
pub blocklists: MaybeString,
|
||||
pub wildcards: MaybeString,
|
||||
pub address: MaybeString,
|
||||
pub nxdomain: Option<i32>,
|
||||
}
|
||||
|
||||
@@ -1231,7 +1229,6 @@ pub struct Host {
|
||||
pub ttl: Option<MaybeString>,
|
||||
pub server: String,
|
||||
pub description: Option<String>,
|
||||
pub txtdata: MaybeString,
|
||||
}
|
||||
|
||||
impl Host {
|
||||
@@ -1247,7 +1244,6 @@ impl Host {
|
||||
ttl: Some(MaybeString::default()),
|
||||
mx: MaybeString::default(),
|
||||
description: None,
|
||||
txtdata: MaybeString::default(),
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1295,7 +1291,6 @@ pub struct WireguardServerItem {
|
||||
pub gateway: MaybeString,
|
||||
pub carp_depend_on: MaybeString,
|
||||
pub peers: String,
|
||||
pub debug: MaybeString,
|
||||
pub endpoint: MaybeString,
|
||||
pub peer_dns: MaybeString,
|
||||
}
|
||||
|
||||
@@ -612,7 +612,6 @@
|
||||
<local_zone_type>transparent</local_zone_type>
|
||||
<outgoing_interface/>
|
||||
<enable_wpad>0</enable_wpad>
|
||||
<safesearch/>
|
||||
</general>
|
||||
<advanced>
|
||||
<hideidentity>0</hideidentity>
|
||||
|
||||
@@ -2003,7 +2003,6 @@
|
||||
<cacheflush/>
|
||||
<local_zone_type>transparent</local_zone_type>
|
||||
<outgoing_interface/>
|
||||
<safesearch/>
|
||||
<enable_wpad/>
|
||||
</general>
|
||||
<advanced>
|
||||
@@ -2072,7 +2071,6 @@
|
||||
<mx/>
|
||||
<server>192.168.20.161</server>
|
||||
<description>Some app local</description>
|
||||
<txtdata/>
|
||||
</host>
|
||||
<host uuid="dd593e95-02bc-476f-8610-fa1ee454e950">
|
||||
<enabled>1</enabled>
|
||||
@@ -2083,7 +2081,6 @@
|
||||
<mx/>
|
||||
<server>192.168.20.161</server>
|
||||
<description>Some app local</description>
|
||||
<txtdata/>
|
||||
</host>
|
||||
<host uuid="e1606f96-dd38-471f-a3d7-ad25e41e810d">
|
||||
<enabled>1</enabled>
|
||||
@@ -2094,7 +2091,6 @@
|
||||
<mx/>
|
||||
<server>192.168.20.161</server>
|
||||
<description>Some app local</description>
|
||||
<txtdata/>
|
||||
</host>
|
||||
</hosts>
|
||||
<aliases/>
|
||||
@@ -2121,7 +2117,6 @@
|
||||
<endpoint/>
|
||||
<peer_dns/>
|
||||
<carp_depend_on/>
|
||||
<debug/>
|
||||
<peers>03031aec-2e84-462e-9eab-57762dde667a,98e6ca3d-1de9-449b-be80-77022221b509,67c0ace5-e802-4d2b-a536-f8b7a2db6f99,74b60fff-7844-4097-9966-f1c2b1ad29ff,3de82ad5-bc1b-4b91-9598-f906e58ac937,a95e6b5e-24a4-40b5-bb41-b79e784f6f1c,6c9a12c6-c1ca-4c14-866b-975406a30590,c33b308b-7125-4688-9561-989ace8787b5,e43f004a-23bf-4027-8fb0-953fbb40479f</peers>
|
||||
</server>
|
||||
</servers>
|
||||
|
||||
@@ -614,7 +614,6 @@
|
||||
<local_zone_type>transparent</local_zone_type>
|
||||
<outgoing_interface/>
|
||||
<enable_wpad>0</enable_wpad>
|
||||
<safesearch/>
|
||||
</general>
|
||||
<advanced>
|
||||
<hideidentity>0</hideidentity>
|
||||
|
||||
@@ -750,7 +750,6 @@
|
||||
<local_zone_type>transparent</local_zone_type>
|
||||
<outgoing_interface/>
|
||||
<enable_wpad>0</enable_wpad>
|
||||
<safesearch/>
|
||||
</general>
|
||||
<advanced>
|
||||
<hideidentity>0</hideidentity>
|
||||
|
||||
@@ -709,7 +709,6 @@
|
||||
<local_zone_type>transparent</local_zone_type>
|
||||
<outgoing_interface/>
|
||||
<enable_wpad>0</enable_wpad>
|
||||
<safesearch/>
|
||||
</general>
|
||||
<advanced>
|
||||
<hideidentity>0</hideidentity>
|
||||
|
||||
@@ -951,7 +951,6 @@
|
||||
<local_zone_type>transparent</local_zone_type>
|
||||
<outgoing_interface/>
|
||||
<enable_wpad/>
|
||||
<safesearch/>
|
||||
</general>
|
||||
<advanced>
|
||||
<hideidentity>0</hideidentity>
|
||||
|
||||
@@ -808,7 +808,6 @@
|
||||
<local_zone_type>transparent</local_zone_type>
|
||||
<outgoing_interface/>
|
||||
<enable_wpad/>
|
||||
<safesearch/>
|
||||
</general>
|
||||
<advanced>
|
||||
<hideidentity/>
|
||||
|
||||
@@ -726,7 +726,6 @@
|
||||
<local_zone_type>transparent</local_zone_type>
|
||||
<outgoing_interface/>
|
||||
<enable_wpad/>
|
||||
<safesearch/>
|
||||
</general>
|
||||
<advanced>
|
||||
<hideidentity>0</hideidentity>
|
||||
@@ -794,7 +793,6 @@
|
||||
<mx/>
|
||||
<server>192.168.20.161</server>
|
||||
<description>Some app local</description>
|
||||
<txtdata/>
|
||||
</host>
|
||||
<host uuid="dd593e95-02bc-476f-8610-fa1ee454e950">
|
||||
<enabled>1</enabled>
|
||||
@@ -805,7 +803,6 @@
|
||||
<mx/>
|
||||
<server>192.168.20.161</server>
|
||||
<description>Some app local</description>
|
||||
<txtdata/>
|
||||
</host>
|
||||
<host uuid="e1606f96-dd38-471f-a3d7-ad25e41e810d">
|
||||
<enabled>1</enabled>
|
||||
@@ -816,7 +813,6 @@
|
||||
<mx/>
|
||||
<server>192.168.20.161</server>
|
||||
<description>Some app local</description>
|
||||
<txtdata/>
|
||||
</host>
|
||||
</hosts>
|
||||
<aliases/>
|
||||
@@ -842,7 +838,6 @@
|
||||
<gateway/>
|
||||
<carp_depend_on/>
|
||||
<peers>03031aec-2e84-462e-9eab-57762dde667a,98e6ca3d-1de9-449b-be80-77022221b509,67c0ace5-e802-4d2b-a536-f8b7a2db6f99,74b60fff-7844-4097-9966-f1c2b1ad29ff,3de82ad5-bc1b-4b91-9598-f906e58ac937,a95e6b5e-24a4-40b5-bb41-b79e784f6f1c,6c9a12c6-c1ca-4c14-866b-975406a30590,c33b308b-7125-4688-9561-989ace8787b5,e43f004a-23bf-4027-8fb0-953fbb40479f</peers>
|
||||
<debug/>
|
||||
<endpoint/>
|
||||
<peer_dns/>
|
||||
</server>
|
||||
|
||||
@@ -718,7 +718,6 @@
|
||||
<local_zone_type>transparent</local_zone_type>
|
||||
<outgoing_interface/>
|
||||
<enable_wpad/>
|
||||
<safesearch/>
|
||||
</general>
|
||||
<advanced>
|
||||
<hideidentity>0</hideidentity>
|
||||
@@ -786,7 +785,6 @@
|
||||
<mx/>
|
||||
<server>192.168.20.161</server>
|
||||
<description>Some app local</description>
|
||||
<txtdata/>
|
||||
</host>
|
||||
<host uuid="dd593e95-02bc-476f-8610-fa1ee454e950">
|
||||
<enabled>1</enabled>
|
||||
@@ -797,7 +795,6 @@
|
||||
<mx/>
|
||||
<server>192.168.20.161</server>
|
||||
<description>Some app local</description>
|
||||
<txtdata/>
|
||||
</host>
|
||||
<host uuid="e1606f96-dd38-471f-a3d7-ad25e41e810d">
|
||||
<enabled>1</enabled>
|
||||
@@ -808,7 +805,6 @@
|
||||
<mx/>
|
||||
<server>192.168.20.161</server>
|
||||
<description>Some app local</description>
|
||||
<txtdata/>
|
||||
</host>
|
||||
</hosts>
|
||||
<aliases/>
|
||||
@@ -836,7 +832,6 @@
|
||||
<gateway/>
|
||||
<carp_depend_on/>
|
||||
<peers>03031aec-2e84-462e-9eab-57762dde667a,98e6ca3d-1de9-449b-be80-77022221b509,67c0ace5-e802-4d2b-a536-f8b7a2db6f99,74b60fff-7844-4097-9966-f1c2b1ad29ff,3de82ad5-bc1b-4b91-9598-f906e58ac937,a95e6b5e-24a4-40b5-bb41-b79e784f6f1c,6c9a12c6-c1ca-4c14-866b-975406a30590,c33b308b-7125-4688-9561-989ace8787b5,e43f004a-23bf-4027-8fb0-953fbb40479f</peers>
|
||||
<debug/>
|
||||
</server>
|
||||
</servers>
|
||||
</server>
|
||||
|
||||
@@ -869,7 +869,6 @@
|
||||
<local_zone_type>transparent</local_zone_type>
|
||||
<outgoing_interface/>
|
||||
<enable_wpad/>
|
||||
<safesearch/>
|
||||
</general>
|
||||
<advanced>
|
||||
<hideidentity/>
|
||||
|
||||
@@ -862,7 +862,6 @@
|
||||
<local_zone_type>transparent</local_zone_type>
|
||||
<outgoing_interface/>
|
||||
<enable_wpad/>
|
||||
<safesearch/>
|
||||
</general>
|
||||
<advanced>
|
||||
<hideidentity/>
|
||||
|
||||
@@ -869,7 +869,6 @@
|
||||
<local_zone_type>transparent</local_zone_type>
|
||||
<outgoing_interface/>
|
||||
<enable_wpad/>
|
||||
<safesearch/>
|
||||
</general>
|
||||
<advanced>
|
||||
<hideidentity/>
|
||||
|
||||
Reference in New Issue
Block a user