Compare commits

..

6 Commits

Author SHA1 Message Date
bfdb11b217 Merge pull request 'feat(OKDInstallation): Implemented bootstrap of okd worker node, added features to allow both control plane and worker node to use the same bootstrap_okd_node score' (#198) from feat/okd-nodes into master
Some checks failed
Run Check Script / check (push) Successful in 1m57s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m59s
Reviewed-on: #198
Reviewed-by: johnride <jg@nationtech.io>
2025-12-10 19:27:51 +00:00
d5fadf4f44 fix: deleted storage node role, fixed erroneous comment, modified score name to be in line with clean code naming conventions, fixed how the OKDNodeInstallationScore is called via OKDSetup03ControlPlaneScore and OKDSetup04WorkersScore
All checks were successful
Run Check Script / check (pull_request) Successful in 1m45s
2025-12-10 14:20:24 -05:00
9617e1cfde Merge pull request 'adr: Higher order topologies' (#197) from adr/015-higher-order-topologies into master
Some checks failed
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m54s
Run Check Script / check (push) Successful in 1m47s
Reviewed-on: #197
2025-12-10 18:12:23 +00:00
50bd5c5bba feat(OKDInstallation): Implemented bootstrap of okd worker node, added features to allow both control plane and worker node to use the same bootstrap_okd_node score
All checks were successful
Run Check Script / check (pull_request) Successful in 1m46s
2025-12-10 12:15:07 -05:00
a953284386 doc: Add note about counter-intuitive behavior of nmstate
Some checks failed
Run Check Script / check (push) Successful in 1m33s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m28s
2025-12-09 23:04:15 -05:00
bfde5f58ed adr: Higher order topologies
All checks were successful
Run Check Script / check (pull_request) Successful in 1m33s
These types of Topologies will orchestrate behavior in regular Topologies.

For example, a FailoverTopology is a Higher Order, it will orchestrate its capabilities between a primary and a replica topology. A great use case for this is a database deployment. The FailoverTopology will deploy both instances, connect them, and the able to execute the appropriate actions to promote de replica to primary and revert back to original state.

Other use cases are ShardedTopology, DecentralizedTopology, etc.
2025-12-09 11:23:30 -05:00
12 changed files with 670 additions and 384 deletions

15
Cargo.lock generated
View File

@@ -6049,6 +6049,21 @@ version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8f50febec83f5ee1df3015341d8bd429f2d1cc62bcba7ea2076759d315084683"
[[package]]
name = "test-score"
version = "0.1.0"
dependencies = [
"base64 0.22.1",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "thiserror"
version = "1.0.69"

View File

@@ -0,0 +1,114 @@
# Architecture Decision Record: Higher-Order Topologies
**Initial Author:** Jean-Gabriel Gill-Couture
**Initial Date:** 2025-12-08
**Last Updated Date:** 2025-12-08
## Status
Implemented
## Context
Harmony models infrastructure as **Topologies** (deployment targets like `K8sAnywhereTopology`, `LinuxHostTopology`) implementing **Capabilities** (tech traits like `PostgreSQL`, `Docker`).
**Higher-Order Topologies** (e.g., `FailoverTopology<T>`) compose/orchestrate capabilities *across* multiple underlying topologies (e.g., primary+replica `T`).
Naive design requires manual `impl Capability for HigherOrderTopology<T>` *per T per capability*, causing:
- **Impl explosion**: N topologies × M capabilities = N×M boilerplate.
- **ISP violation**: Topologies forced to impl unrelated capabilities.
- **Maintenance hell**: New topology needs impls for *all* orchestrated capabilities; new capability needs impls for *all* topologies/higher-order.
- **Barrier to extension**: Users can't easily add topologies without todos/panics.
This makes scaling Harmony impractical as ecosystem grows.
## Decision
Use **blanket trait impls** on higher-order topologies to *automatically* derive orchestration:
````rust
/// Higher-Order Topology: Orchestrates capabilities across sub-topologies.
pub struct FailoverTopology<T> {
/// Primary sub-topology.
primary: T,
/// Replica sub-topology.
replica: T,
}
/// Automatically provides PostgreSQL failover for *any* `T: PostgreSQL`.
/// Delegates to primary for queries; orchestrates deploy across both.
#[async_trait]
impl<T: PostgreSQL> PostgreSQL for FailoverTopology<T> {
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String> {
// Deploy primary; extract certs/endpoint;
// deploy replica with pg_basebackup + TLS passthrough.
// (Full impl logged/elaborated.)
}
// Delegate queries to primary.
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String> {
self.primary.get_replication_certs(cluster_name).await
}
// ...
}
/// Similarly for other capabilities.
#[async_trait]
impl<T: Docker> Docker for FailoverTopology<T> {
// Failover Docker orchestration.
}
````
**Key properties:**
- **Auto-derivation**: `Failover<K8sAnywhere>` gets `PostgreSQL` iff `K8sAnywhere: PostgreSQL`.
- **No boilerplate**: One blanket impl per capability *per higher-order type*.
## Rationale
- **Composition via generics**: Rust trait solver auto-selects impls; zero runtime cost.
- **Compile-time safety**: Missing `T: Capability` → compile error (no panics).
- **Scalable**: O(capabilities) impls per higher-order; new `T` auto-works.
- **ISP-respecting**: Capabilities only surface if sub-topology provides.
- **Centralized logic**: Orchestration (e.g., cert propagation) in one place.
**Example usage:**
````rust
// ✅ Works: K8sAnywhere: PostgreSQL → Failover provides failover PG
let pg_failover: FailoverTopology<K8sAnywhereTopology> = ...;
pg_failover.deploy_pg(config).await;
// ✅ Works: LinuxHost: Docker → Failover provides failover Docker
let docker_failover: FailoverTopology<LinuxHostTopology> = ...;
docker_failover.deploy_docker(...).await;
// ❌ Compile fail: K8sAnywhere !: Docker
let invalid: FailoverTopology<K8sAnywhereTopology>;
invalid.deploy_docker(...); // `T: Docker` bound unsatisfied
````
## Consequences
**Pros:**
- **Extensible**: New topology `AWSTopology: PostgreSQL` → instant `Failover<AWSTopology>: PostgreSQL`.
- **Lean**: No useless impls (e.g., no `K8sAnywhere: Docker`).
- **Observable**: Logs trace every step.
**Cons:**
- **Monomorphization**: Generics generate code per T (mitigated: few Ts).
- **Delegation opacity**: Relies on rustdoc/logs for internals.
## Alternatives considered
| Approach | Pros | Cons |
|----------|------|------|
| **Manual per-T impls**<br>`impl PG for Failover<K8s> {..}`<br>`impl PG for Failover<Linux> {..}` | Explicit control | N×M explosion; violates ISP; hard to extend. |
| **Dynamic trait objects**<br>`Box<dyn AnyCapability>` | Runtime flex | Perf hit; type erasure; error-prone dispatch. |
| **Mega-topology trait**<br>All-in-one `OrchestratedTopology` | Simple wiring | Monolithic; poor composition. |
| **Registry dispatch**<br>Runtime capability lookup | Decoupled | Complex; no compile safety; perf/debug overhead. |
**Selected**: Blanket impls leverage Rust generics for safe, zero-cost composition.
## Additional Notes
- Applies to `MultisiteTopology<T>`, `ShardedTopology<T>`, etc.
- `FailoverTopology` in `failover.rs` is first implementation.

View File

@@ -0,0 +1,153 @@
//! Example of Higher-Order Topologies in Harmony.
//! Demonstrates how `FailoverTopology<T>` automatically provides failover for *any* capability
//! supported by a sub-topology `T` via blanket trait impls.
//!
//! Key insight: No manual impls per T or capability -- scales effortlessly.
//! Users can:
//! - Write new `Topology` (impl capabilities on a struct).
//! - Compose with `FailoverTopology` (gets capabilities if T has them).
//! - Compile fails if capability missing (safety).
use async_trait::async_trait;
use tokio;
/// Capability trait: Deploy and manage PostgreSQL.
#[async_trait]
pub trait PostgreSQL {
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String>;
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String>;
}
/// Capability trait: Deploy Docker.
#[async_trait]
pub trait Docker {
async fn deploy_docker(&self) -> Result<String, String>;
}
/// Configuration for PostgreSQL deployments.
#[derive(Clone)]
pub struct PostgreSQLConfig;
/// Replication certificates.
#[derive(Clone)]
pub struct ReplicationCerts;
/// Concrete topology: Kubernetes Anywhere (supports PostgreSQL).
#[derive(Clone)]
pub struct K8sAnywhereTopology;
#[async_trait]
impl PostgreSQL for K8sAnywhereTopology {
async fn deploy(&self, _config: &PostgreSQLConfig) -> Result<String, String> {
// Real impl: Use k8s helm chart, operator, etc.
Ok("K8sAnywhere PostgreSQL deployed".to_string())
}
async fn get_replication_certs(&self, _cluster_name: &str) -> Result<ReplicationCerts, String> {
Ok(ReplicationCerts)
}
}
/// Concrete topology: Linux Host (supports Docker).
#[derive(Clone)]
pub struct LinuxHostTopology;
#[async_trait]
impl Docker for LinuxHostTopology {
async fn deploy_docker(&self) -> Result<String, String> {
// Real impl: Install/configure Docker on host.
Ok("LinuxHost Docker deployed".to_string())
}
}
/// Higher-Order Topology: Composes multiple sub-topologies (primary + replica).
/// Automatically derives *all* capabilities of `T` with failover orchestration.
///
/// - If `T: PostgreSQL`, then `FailoverTopology<T>: PostgreSQL` (blanket impl).
/// - Same for `Docker`, etc. No boilerplate!
/// - Compile-time safe: Missing `T: Capability` → error.
#[derive(Clone)]
pub struct FailoverTopology<T> {
/// Primary sub-topology.
pub primary: T,
/// Replica sub-topology.
pub replica: T,
}
/// Blanket impl: Failover PostgreSQL if T provides PostgreSQL.
/// Delegates reads to primary; deploys to both.
#[async_trait]
impl<T: PostgreSQL + Send + Sync + Clone> PostgreSQL for FailoverTopology<T> {
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String> {
// Orchestrate: Deploy primary first, then replica (e.g., via pg_basebackup).
let primary_result = self.primary.deploy(config).await?;
let replica_result = self.replica.deploy(config).await?;
Ok(format!("Failover PG deployed: {} | {}", primary_result, replica_result))
}
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String> {
// Delegate to primary (replica follows).
self.primary.get_replication_certs(cluster_name).await
}
}
/// Blanket impl: Failover Docker if T provides Docker.
#[async_trait]
impl<T: Docker + Send + Sync + Clone> Docker for FailoverTopology<T> {
async fn deploy_docker(&self) -> Result<String, String> {
// Orchestrate across primary + replica.
let primary_result = self.primary.deploy_docker().await?;
let replica_result = self.replica.deploy_docker().await?;
Ok(format!("Failover Docker deployed: {} | {}", primary_result, replica_result))
}
}
#[tokio::main]
async fn main() {
let config = PostgreSQLConfig;
println!("=== ✅ PostgreSQL Failover (K8sAnywhere supports PG) ===");
let pg_failover = FailoverTopology {
primary: K8sAnywhereTopology,
replica: K8sAnywhereTopology,
};
let result = pg_failover.deploy(&config).await.unwrap();
println!("Result: {}", result);
println!("\n=== ✅ Docker Failover (LinuxHost supports Docker) ===");
let docker_failover = FailoverTopology {
primary: LinuxHostTopology,
replica: LinuxHostTopology,
};
let result = docker_failover.deploy_docker().await.unwrap();
println!("Result: {}", result);
println!("\n=== ❌ Would fail to compile (K8sAnywhere !: Docker) ===");
// let invalid = FailoverTopology {
// primary: K8sAnywhereTopology,
// replica: K8sAnywhereTopology,
// };
// invalid.deploy_docker().await.unwrap(); // Error: `K8sAnywhereTopology: Docker` not satisfied!
// Very clear error message :
// error[E0599]: the method `deploy_docker` exists for struct `FailoverTopology<K8sAnywhereTopology>`, but its trait bounds were not satisfied
// --> src/main.rs:90:9
// |
// 4 | pub struct FailoverTopology<T> {
// | ------------------------------ method `deploy_docker` not found for this struct because it doesn't satisfy `FailoverTopology<K8sAnywhereTopology>: Docker`
// ...
// 37 | struct K8sAnywhereTopology;
// | -------------------------- doesn't satisfy `K8sAnywhereTopology: Docker`
// ...
// 90 | invalid.deploy_docker(); // `T: Docker` bound unsatisfied
// | ^^^^^^^^^^^^^ method cannot be called on `FailoverTopology<K8sAnywhereTopology>` due to unsatisfied trait bounds
// |
// note: trait bound `K8sAnywhereTopology: Docker` was not satisfied
// --> src/main.rs:61:9
// |
// 61 | impl<T: Docker + Send + Sync> Docker for FailoverTopology<T> {
// | ^^^^^^ ------ -------------------
// | |
// | unsatisfied trait bound introduced here
// note: the trait `Docker` must be implemented
}

View File

@@ -1,4 +1,6 @@
mod repository;
use std::fmt;
pub use repository::*;
#[derive(Debug, new, Clone)]
@@ -69,5 +71,14 @@ pub enum HostRole {
Bootstrap,
ControlPlane,
Worker,
Storage,
}
impl fmt::Display for HostRole {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
HostRole::Bootstrap => write!(f, "Bootstrap"),
HostRole::ControlPlane => write!(f, "ControlPlane"),
HostRole::Worker => write!(f, "Worker"),
}
}
}

View File

@@ -27,7 +27,7 @@ use kube::{
};
use log::{debug, error, trace, warn};
use serde::{Serialize, de::DeserializeOwned};
use serde_json::{json, Value};
use serde_json::json;
use similar::TextDiff;
use tokio::{io::AsyncReadExt, time::sleep};
use url::Url;
@@ -64,10 +64,6 @@ impl K8sClient {
})
}
pub async fn patch_resource(&self, patch: Value, gvk: &GroupVersionKind) -> Result<(), Error> {
}
pub async fn service_account_api(&self, namespace: &str) -> Api<ServiceAccount> {
let api: Api<ServiceAccount> = Api::namespaced(self.client.clone(), namespace);
api

View File

@@ -17,6 +17,12 @@ use crate::{
topology::{HostNetworkConfig, NetworkError, NetworkManager, k8s::K8sClient},
};
/// TODO document properly the non-intuitive behavior or "roll forward only" of nmstate in general
/// It is documented in nmstate official doc, but worth mentionning here :
///
/// - You create a bond, nmstate will apply it
/// - You delete de bond from nmstate, it will NOT delete it
/// - To delete it you have to update it with configuration set to null
pub struct OpenShiftNmStateNetworkManager {
k8s_client: Arc<K8sClient>,
}
@@ -31,6 +37,7 @@ impl std::fmt::Debug for OpenShiftNmStateNetworkManager {
impl NetworkManager for OpenShiftNmStateNetworkManager {
async fn ensure_network_manager_installed(&self) -> Result<(), NetworkError> {
debug!("Installing NMState controller...");
// TODO use operatorhub maybe?
self.k8s_client.apply_url(url::Url::parse("https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.84.0/nmstate.io_nmstates.yaml
").unwrap(), Some("nmstate"))
.await?;

View File

@@ -1,20 +1,8 @@
use crate::{
data::Version,
hardware::PhysicalHost,
infra::inventory::InventoryRepositoryFactory,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::{HostRole, Inventory},
modules::{
dhcp::DhcpHostBindingScore, http::IPxeMacBootFileScore,
inventory::DiscoverHostForRoleScore, okd::templates::BootstrapIpxeTpl,
},
score::Score,
topology::{HAClusterTopology, HostBinding},
interpret::Interpret, inventory::HostRole, modules::okd::bootstrap_okd_node::OKDNodeInterpret,
score::Score, topology::HAClusterTopology,
};
use async_trait::async_trait;
use derive_new::new;
use harmony_types::id::Id;
use log::{debug, info};
use serde::Serialize;
// -------------------------------------------------------------------------------------------------
@@ -28,226 +16,13 @@ pub struct OKDSetup03ControlPlaneScore {}
impl Score<HAClusterTopology> for OKDSetup03ControlPlaneScore {
fn create_interpret(&self) -> Box<dyn Interpret<HAClusterTopology>> {
Box::new(OKDSetup03ControlPlaneInterpret::new())
// TODO: Implement a step to wait for the control plane nodes to join the cluster
// and for the cluster operators to become available. This would be similar to
// the `wait-for bootstrap-complete` command.
Box::new(OKDNodeInterpret::new(HostRole::ControlPlane))
}
fn name(&self) -> String {
"OKDSetup03ControlPlaneScore".to_string()
}
}
#[derive(Debug, Clone)]
pub struct OKDSetup03ControlPlaneInterpret {
version: Version,
status: InterpretStatus,
}
impl OKDSetup03ControlPlaneInterpret {
pub fn new() -> Self {
let version = Version::from("1.0.0").unwrap();
Self {
version,
status: InterpretStatus::QUEUED,
}
}
/// Ensures that three physical hosts are discovered and available for the ControlPlane role.
/// It will trigger discovery if not enough hosts are found.
async fn get_nodes(
&self,
inventory: &Inventory,
topology: &HAClusterTopology,
) -> Result<Vec<PhysicalHost>, InterpretError> {
const REQUIRED_HOSTS: usize = 3;
let repo = InventoryRepositoryFactory::build().await?;
let mut control_plane_hosts = repo.get_host_for_role(&HostRole::ControlPlane).await?;
while control_plane_hosts.len() < REQUIRED_HOSTS {
info!(
"Discovery of {} control plane hosts in progress, current number {}",
REQUIRED_HOSTS,
control_plane_hosts.len()
);
// This score triggers the discovery agent for a specific role.
DiscoverHostForRoleScore {
role: HostRole::ControlPlane,
}
.interpret(inventory, topology)
.await?;
control_plane_hosts = repo.get_host_for_role(&HostRole::ControlPlane).await?;
}
if control_plane_hosts.len() < REQUIRED_HOSTS {
Err(InterpretError::new(format!(
"OKD Requires at least {} control plane hosts, but only found {}. Cannot proceed.",
REQUIRED_HOSTS,
control_plane_hosts.len()
)))
} else {
// Take exactly the number of required hosts to ensure consistency.
Ok(control_plane_hosts
.into_iter()
.take(REQUIRED_HOSTS)
.collect())
}
}
/// Configures DHCP host bindings for all control plane nodes.
async fn configure_host_binding(
&self,
inventory: &Inventory,
topology: &HAClusterTopology,
nodes: &Vec<PhysicalHost>,
) -> Result<(), InterpretError> {
info!("[ControlPlane] Configuring host bindings for control plane nodes.");
// Ensure the topology definition matches the number of physical nodes found.
if topology.control_plane.len() != nodes.len() {
return Err(InterpretError::new(format!(
"Mismatch between logical control plane hosts defined in topology ({}) and physical nodes found ({}).",
topology.control_plane.len(),
nodes.len()
)));
}
// Create a binding for each physical host to its corresponding logical host.
let bindings: Vec<HostBinding> = topology
.control_plane
.iter()
.zip(nodes.iter())
.map(|(logical_host, physical_host)| {
info!(
"Creating binding: Logical Host '{}' -> Physical Host ID '{}'",
logical_host.name, physical_host.id
);
HostBinding {
logical_host: logical_host.clone(),
physical_host: physical_host.clone(),
}
})
.collect();
DhcpHostBindingScore {
host_binding: bindings,
domain: Some(topology.domain_name.clone()),
}
.interpret(inventory, topology)
.await?;
Ok(())
}
/// Renders and deploys a per-MAC iPXE boot file for each control plane node.
async fn configure_ipxe(
&self,
inventory: &Inventory,
topology: &HAClusterTopology,
nodes: &Vec<PhysicalHost>,
) -> Result<(), InterpretError> {
info!("[ControlPlane] Rendering per-MAC iPXE configurations.");
// The iPXE script content is the same for all control plane nodes,
// pointing to the 'master.ign' ignition file.
let content = BootstrapIpxeTpl {
http_ip: &topology.http_server.get_ip().to_string(),
scos_path: "scos",
ignition_http_path: "okd_ignition_files",
installation_device: "/dev/sda", // This might need to be configurable per-host in the future
ignition_file_name: "master.ign", // Control plane nodes use the master ignition file
}
.to_string();
debug!("[ControlPlane] iPXE content template:\n{content}");
// Create and apply an iPXE boot file for each node.
for node in nodes {
let mac_address = node.get_mac_address();
if mac_address.is_empty() {
return Err(InterpretError::new(format!(
"Physical host with ID '{}' has no MAC addresses defined.",
node.id
)));
}
info!(
"[ControlPlane] Applying iPXE config for node ID '{}' with MACs: {:?}",
node.id, mac_address
);
IPxeMacBootFileScore {
mac_address,
content: content.clone(),
}
.interpret(inventory, topology)
.await?;
}
Ok(())
}
/// Prompts the user to reboot the target control plane nodes.
async fn reboot_targets(&self, nodes: &Vec<PhysicalHost>) -> Result<(), InterpretError> {
let node_ids: Vec<String> = nodes.iter().map(|n| n.id.to_string()).collect();
info!("[ControlPlane] Requesting reboot for control plane nodes: {node_ids:?}",);
let confirmation = inquire::Confirm::new(
&format!("Please reboot the {} control plane nodes ({}) to apply their PXE configuration. Press enter when ready.", nodes.len(), node_ids.join(", ")),
)
.prompt()
.map_err(|e| InterpretError::new(format!("User prompt failed: {e}")))?;
if !confirmation {
return Err(InterpretError::new(
"User aborted the operation.".to_string(),
));
}
Ok(())
}
}
#[async_trait]
impl Interpret<HAClusterTopology> for OKDSetup03ControlPlaneInterpret {
fn get_name(&self) -> InterpretName {
InterpretName::Custom("OKDSetup03ControlPlane")
}
fn get_version(&self) -> Version {
self.version.clone()
}
fn get_status(&self) -> InterpretStatus {
self.status.clone()
}
fn get_children(&self) -> Vec<Id> {
vec![]
}
async fn execute(
&self,
inventory: &Inventory,
topology: &HAClusterTopology,
) -> Result<Outcome, InterpretError> {
// 1. Ensure we have 3 physical hosts for the control plane.
let nodes = self.get_nodes(inventory, topology).await?;
// 2. Create DHCP reservations for the control plane nodes.
self.configure_host_binding(inventory, topology, &nodes)
.await?;
// 3. Create iPXE files for each control plane node to boot from the master ignition.
self.configure_ipxe(inventory, topology, &nodes).await?;
// 4. Reboot the nodes to start the OS installation.
self.reboot_targets(&nodes).await?;
// TODO: Implement a step to wait for the control plane nodes to join the cluster
// and for the cluster operators to become available. This would be similar to
// the `wait-for bootstrap-complete` command.
info!("[ControlPlane] Provisioning initiated. Monitor the cluster convergence manually.");
Ok(Outcome::success(
"Control plane provisioning has been successfully initiated.".into(),
))
}
}

View File

@@ -1,15 +1,9 @@
use async_trait::async_trait;
use derive_new::new;
use harmony_types::id::Id;
use log::info;
use serde::Serialize;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
score::Score,
topology::HAClusterTopology,
interpret::Interpret, inventory::HostRole, modules::okd::bootstrap_okd_node::OKDNodeInterpret,
score::Score, topology::HAClusterTopology,
};
// -------------------------------------------------------------------------------------------------
@@ -23,61 +17,10 @@ pub struct OKDSetup04WorkersScore {}
impl Score<HAClusterTopology> for OKDSetup04WorkersScore {
fn create_interpret(&self) -> Box<dyn Interpret<HAClusterTopology>> {
Box::new(OKDSetup04WorkersInterpret::new(self.clone()))
Box::new(OKDNodeInterpret::new(HostRole::Worker))
}
fn name(&self) -> String {
"OKDSetup04WorkersScore".to_string()
}
}
#[derive(Debug, Clone)]
pub struct OKDSetup04WorkersInterpret {
score: OKDSetup04WorkersScore,
version: Version,
status: InterpretStatus,
}
impl OKDSetup04WorkersInterpret {
pub fn new(score: OKDSetup04WorkersScore) -> Self {
let version = Version::from("1.0.0").unwrap();
Self {
version,
score,
status: InterpretStatus::QUEUED,
}
}
async fn render_and_reboot(&self) -> Result<(), InterpretError> {
info!("[Workers] Rendering per-MAC PXE for workers and rebooting");
Ok(())
}
}
#[async_trait]
impl Interpret<HAClusterTopology> for OKDSetup04WorkersInterpret {
fn get_name(&self) -> InterpretName {
InterpretName::Custom("OKDSetup04Workers")
}
fn get_version(&self) -> Version {
self.version.clone()
}
fn get_status(&self) -> InterpretStatus {
self.status.clone()
}
fn get_children(&self) -> Vec<Id> {
vec![]
}
async fn execute(
&self,
_inventory: &Inventory,
_topology: &HAClusterTopology,
) -> Result<Outcome, InterpretError> {
self.render_and_reboot().await?;
Ok(Outcome::success("Workers provisioned".into()))
}
}

View File

@@ -0,0 +1,303 @@
use async_trait::async_trait;
use derive_new::new;
use harmony_types::id::Id;
use log::{debug, info};
use serde::Serialize;
use crate::{
data::Version,
hardware::PhysicalHost,
infra::inventory::InventoryRepositoryFactory,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::{HostRole, Inventory},
modules::{
dhcp::DhcpHostBindingScore,
http::IPxeMacBootFileScore,
inventory::DiscoverHostForRoleScore,
okd::{
okd_node::{
BootstrapRole, ControlPlaneRole, OKDRoleProperties, StorageRole, WorkerRole,
},
templates::BootstrapIpxeTpl,
},
},
score::Score,
topology::{HAClusterTopology, HostBinding, LogicalHost},
};
#[derive(Debug, Clone, Serialize, new)]
pub struct OKDNodeInstallationScore {
host_role: HostRole,
}
impl Score<HAClusterTopology> for OKDNodeInstallationScore {
fn name(&self) -> String {
"OKDNodeScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<HAClusterTopology>> {
Box::new(OKDNodeInterpret::new(self.host_role.clone()))
}
}
#[derive(Debug, Clone)]
pub struct OKDNodeInterpret {
host_role: HostRole,
}
impl OKDNodeInterpret {
pub fn new(host_role: HostRole) -> Self {
Self { host_role }
}
fn okd_role_properties(&self, role: &HostRole) -> &'static dyn OKDRoleProperties {
match role {
HostRole::Bootstrap => &BootstrapRole,
HostRole::ControlPlane => &ControlPlaneRole,
HostRole::Worker => &WorkerRole,
}
}
async fn get_nodes(
&self,
inventory: &Inventory,
topology: &HAClusterTopology,
) -> Result<Vec<PhysicalHost>, InterpretError> {
let repo = InventoryRepositoryFactory::build().await?;
let mut hosts = repo.get_host_for_role(&self.host_role).await?;
let okd_host_properties = self.okd_role_properties(&self.host_role);
let required_hosts: usize = okd_host_properties.required_hosts();
while hosts.len() < required_hosts {
info!(
"Discovery of {} {} hosts in progress, current number {}",
required_hosts,
self.host_role,
hosts.len()
);
// This score triggers the discovery agent for a specific role.
DiscoverHostForRoleScore {
role: self.host_role.clone(),
}
.interpret(inventory, topology)
.await?;
hosts = repo.get_host_for_role(&self.host_role).await?;
}
if hosts.len() < required_hosts {
Err(InterpretError::new(format!(
"OKD Requires at least {} {} hosts, but only found {}. Cannot proceed.",
required_hosts,
self.host_role,
hosts.len()
)))
} else {
// Take exactly the number of required hosts to ensure consistency.
Ok(hosts.into_iter().take(required_hosts).collect())
}
}
/// Configures DHCP host bindings for all nodes.
async fn configure_host_binding(
&self,
inventory: &Inventory,
topology: &HAClusterTopology,
nodes: &Vec<PhysicalHost>,
) -> Result<(), InterpretError> {
info!(
"[{}] Configuring host bindings for {} plane nodes.",
self.host_role, self.host_role,
);
let host_properties = self.okd_role_properties(&self.host_role);
self.validate_host_node_match(nodes, host_properties.logical_hosts(topology))?;
let bindings: Vec<HostBinding> =
self.host_bindings(nodes, host_properties.logical_hosts(topology));
DhcpHostBindingScore {
host_binding: bindings,
domain: Some(topology.domain_name.clone()),
}
.interpret(inventory, topology)
.await?;
Ok(())
}
// Ensure the topology definition matches the number of physical nodes found.
fn validate_host_node_match(
&self,
nodes: &Vec<PhysicalHost>,
hosts: &Vec<LogicalHost>,
) -> Result<(), InterpretError> {
if hosts.len() != nodes.len() {
return Err(InterpretError::new(format!(
"Mismatch between logical hosts defined in topology ({}) and physical nodes found ({}).",
hosts.len(),
nodes.len()
)));
}
Ok(())
}
// Create a binding for each physical host to its corresponding logical host.
fn host_bindings(
&self,
nodes: &Vec<PhysicalHost>,
hosts: &Vec<LogicalHost>,
) -> Vec<HostBinding> {
hosts
.iter()
.zip(nodes.iter())
.map(|(logical_host, physical_host)| {
info!(
"Creating binding: Logical Host '{}' -> Physical Host ID '{}'",
logical_host.name, physical_host.id
);
HostBinding {
logical_host: logical_host.clone(),
physical_host: physical_host.clone(),
}
})
.collect()
}
/// Renders and deploys a per-MAC iPXE boot file for each node.
async fn configure_ipxe(
&self,
inventory: &Inventory,
topology: &HAClusterTopology,
nodes: &Vec<PhysicalHost>,
) -> Result<(), InterpretError> {
info!(
"[{}] Rendering per-MAC iPXE configurations.",
self.host_role
);
let okd_role_properties = self.okd_role_properties(&self.host_role);
// The iPXE script content is the same for all control plane nodes,
// pointing to the 'master.ign' ignition file.
let content = BootstrapIpxeTpl {
http_ip: &topology.http_server.get_ip().to_string(),
scos_path: "scos",
ignition_http_path: "okd_ignition_files",
//TODO must be refactored to not only use /dev/sda
installation_device: "/dev/sda", // This might need to be configurable per-host in the future
ignition_file_name: okd_role_properties.ignition_file(),
}
.to_string();
debug!("[{}] iPXE content template:\n{content}", self.host_role);
// Create and apply an iPXE boot file for each node.
for node in nodes {
let mac_address = node.get_mac_address();
if mac_address.is_empty() {
return Err(InterpretError::new(format!(
"Physical host with ID '{}' has no MAC addresses defined.",
node.id
)));
}
info!(
"[{}] Applying iPXE config for node ID '{}' with MACs: {:?}",
self.host_role, node.id, mac_address
);
IPxeMacBootFileScore {
mac_address,
content: content.clone(),
}
.interpret(inventory, topology)
.await?;
}
Ok(())
}
/// Prompts the user to reboot the target control plane nodes.
async fn reboot_targets(&self, nodes: &Vec<PhysicalHost>) -> Result<(), InterpretError> {
let node_ids: Vec<String> = nodes.iter().map(|n| n.id.to_string()).collect();
info!(
"[{}] Requesting reboot for control plane nodes: {node_ids:?}",
self.host_role
);
let confirmation = inquire::Confirm::new(
&format!("Please reboot the {} {} nodes ({}) to apply their PXE configuration. Press enter when ready.", nodes.len(), self.host_role, node_ids.join(", ")),
)
.prompt()
.map_err(|e| InterpretError::new(format!("User prompt failed: {e}")))?;
if !confirmation {
return Err(InterpretError::new(
"User aborted the operation.".to_string(),
));
}
Ok(())
}
}
#[async_trait]
impl Interpret<HAClusterTopology> for OKDNodeInterpret {
async fn execute(
&self,
inventory: &Inventory,
topology: &HAClusterTopology,
) -> Result<Outcome, InterpretError> {
// 1. Ensure we have the specfied number of physical hosts.
let nodes = self.get_nodes(inventory, topology).await?;
// 2. Create DHCP reservations for the nodes.
self.configure_host_binding(inventory, topology, &nodes)
.await?;
// 3. Create iPXE files for each node to boot from the ignition.
self.configure_ipxe(inventory, topology, &nodes).await?;
// 4. Reboot the nodes to start the OS installation.
self.reboot_targets(&nodes).await?;
// TODO: Implement a step to validate that the installation of the nodes is
// complete and for the cluster operators to become available.
//
// The OpenShift installer only provides two wait commands which currently need to be
// run manually:
// - `openshift-install wait-for bootstrap-complete`
// - `openshift-install wait-for install-complete`
//
// There is no installer command that waits specifically for worker node
// provisioning. Worker nodes join asynchronously (via ignition + CSR approval),
// and the cluster becomes fully functional only once all nodes are Ready and the
// cluster operators report Available=True.
info!(
"[{}] Provisioning initiated. Monitor the cluster convergence manually.",
self.host_role
);
Ok(Outcome::success(format!(
"{} provisioning has been successfully initiated.",
self.host_role
)))
}
fn get_name(&self) -> InterpretName {
InterpretName::Custom("OKDNodeSetup".into())
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}

View File

@@ -1,87 +0,0 @@
use std::sync::Arc;
use async_trait::async_trait;
use harmony_types::id::Id;
use kube::api::GroupVersionKind;
use serde::Serialize;
use serde_json::json;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
score::Score,
topology::{K8sclient, Topology, k8s::K8sClient},
};
#[derive(Debug, Clone, Serialize)]
pub struct ControlPlaneConfig {}
impl<T: Topology + K8sclient> Score<T> for ControlPlaneConfig {
fn name(&self) -> String {
"ControlPlaneConfig".to_string()
}
#[doc(hidden)]
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
todo!()
}
}
#[derive(Debug, Clone, Serialize)]
pub struct ControlPlaneConfigInterpret {
score: ControlPlaneConfig,
}
#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for ControlPlaneConfigInterpret {
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
let client = topology.k8s_client().await.unwrap();
self.control_plane_unschedulable(&client).await
}
fn get_name(&self) -> InterpretName {
todo!()
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
impl ControlPlaneConfigInterpret {
async fn control_plane_unschedulable(
&self,
client: &Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
let patch = json!({
"spec": {
"mastersSchedulable": false
}
});
let resource = GroupVersionKind {
group: "config.openshift.io".to_string(),
version: "v1".to_string(),
kind: "Scheduler".to_string(),
};
client.patch_resource(patch, &resource).await?;
Ok(Outcome::success(
"control planes are no longer schedulable".to_string(),
))
}
}

View File

@@ -6,12 +6,14 @@ mod bootstrap_05_sanity_check;
mod bootstrap_06_installation_report;
pub mod bootstrap_dhcp;
pub mod bootstrap_load_balancer;
pub mod bootstrap_okd_node;
mod bootstrap_persist_network_bond;
pub mod dhcp;
pub mod dns;
pub mod installation;
pub mod ipxe;
pub mod load_balancer;
pub mod okd_node;
pub mod templates;
pub mod upgrade;
pub use bootstrap_01_prepare::*;

View File

@@ -0,0 +1,54 @@
use crate::topology::{HAClusterTopology, LogicalHost};
pub trait OKDRoleProperties {
fn ignition_file(&self) -> &'static str;
fn required_hosts(&self) -> usize;
fn logical_hosts<'a>(&self, t: &'a HAClusterTopology) -> &'a Vec<LogicalHost>;
}
pub struct BootstrapRole;
pub struct ControlPlaneRole;
pub struct WorkerRole;
pub struct StorageRole;
impl OKDRoleProperties for BootstrapRole {
fn ignition_file(&self) -> &'static str {
"bootstrap.ign"
}
fn required_hosts(&self) -> usize {
1
}
fn logical_hosts<'a>(&self, t: &'a HAClusterTopology) -> &'a Vec<LogicalHost> {
todo!()
}
}
impl OKDRoleProperties for ControlPlaneRole {
fn ignition_file(&self) -> &'static str {
"master.ign"
}
fn required_hosts(&self) -> usize {
3
}
fn logical_hosts<'a>(&self, t: &'a HAClusterTopology) -> &'a Vec<LogicalHost> {
&t.control_plane
}
}
impl OKDRoleProperties for WorkerRole {
fn ignition_file(&self) -> &'static str {
"worker.ign"
}
fn required_hosts(&self) -> usize {
2
}
fn logical_hosts<'a>(&self, t: &'a HAClusterTopology) -> &'a Vec<LogicalHost> {
&t.workers
}
}