Compare commits
3 Commits
feat/worke
...
9617e1cfde
| Author | SHA1 | Date | |
|---|---|---|---|
| | 9617e1cfde | | |
| | a953284386 | | |
| | bfde5f58ed | | |
114
adr/015-higher-order-topologies.md
Normal file
@@ -0,0 +1,114 @@
|
|||||||
|
# Architecture Decision Record: Higher-Order Topologies
|
||||||
|
|
||||||
|
**Initial Author:** Jean-Gabriel Gill-Couture
|
||||||
|
**Initial Date:** 2025-12-08
|
||||||
|
**Last Updated Date:** 2025-12-08
|
||||||
|
|
||||||
|
## Status
|
||||||
|
|
||||||
|
Implemented
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
Harmony models infrastructure as **Topologies** (deployment targets like `K8sAnywhereTopology`, `LinuxHostTopology`) implementing **Capabilities** (tech traits like `PostgreSQL`, `Docker`).
|
||||||
|
|
||||||
|
**Higher-Order Topologies** (e.g., `FailoverTopology<T>`) compose/orchestrate capabilities *across* multiple underlying topologies (e.g., primary+replica `T`).
|
||||||
|
|
||||||
|
A naive design requires a manual `impl Capability for HigherOrderTopology<T>` *per T per capability*, which causes:
|
||||||
|
- **Impl explosion**: N topologies × M capabilities = N×M boilerplate.
|
||||||
|
- **ISP violation**: Topologies forced to impl unrelated capabilities.
|
||||||
|
- **Maintenance hell**: New topology needs impls for *all* orchestrated capabilities; new capability needs impls for *all* topologies/higher-order.
|
||||||
|
- **Barrier to extension**: Users can't easily add topologies without resorting to `todo!()` stubs or runtime panics.
|
||||||
|
|
||||||
|
This makes scaling Harmony impractical as the ecosystem grows.
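
To make the cost of the naive design concrete, here is a minimal sketch of what it forces on maintainers. The traits are simplified to synchronous, argument-free versions, and `LinuxHostTopology: PostgreSQL` is assumed purely for illustration; none of this is Harmony's actual code.

```rust
// Simplified, synchronous stand-ins for the ADR's types (illustrative assumptions).
pub trait PostgreSQL {
    fn deploy(&self) -> Result<String, String>;
}

pub struct K8sAnywhereTopology;
pub struct LinuxHostTopology;

impl PostgreSQL for K8sAnywhereTopology {
    fn deploy(&self) -> Result<String, String> {
        Ok("k8s pg".to_string())
    }
}

// Assumption for this sketch only: the Linux host also provides PostgreSQL.
impl PostgreSQL for LinuxHostTopology {
    fn deploy(&self) -> Result<String, String> {
        Ok("linux pg".to_string())
    }
}

pub struct FailoverTopology<T> {
    pub primary: T,
    pub replica: T,
}

// Naive approach: one hand-written impl per concrete `T`.
impl PostgreSQL for FailoverTopology<K8sAnywhereTopology> {
    fn deploy(&self) -> Result<String, String> {
        let primary = self.primary.deploy()?;
        let replica = self.replica.deploy()?;
        Ok(format!("{} | {}", primary, replica))
    }
}

// The same orchestration, copied verbatim for the next topology.
impl PostgreSQL for FailoverTopology<LinuxHostTopology> {
    fn deploy(&self) -> Result<String, String> {
        let primary = self.primary.deploy()?;
        let replica = self.replica.deploy()?;
        Ok(format!("{} | {}", primary, replica))
    }
}
```

Every additional topology or capability adds another near-identical block, which is the N×M growth described above.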
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
Use **blanket trait impls** on higher-order topologies to *automatically* derive orchestration:
|
||||||
|
|
||||||
|
````rust
|
||||||
|
/// Higher-Order Topology: Orchestrates capabilities across sub-topologies.
|
||||||
|
pub struct FailoverTopology<T> {
|
||||||
|
/// Primary sub-topology.
|
||||||
|
primary: T,
|
||||||
|
/// Replica sub-topology.
|
||||||
|
replica: T,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Automatically provides PostgreSQL failover for *any* `T: PostgreSQL`.
|
||||||
|
/// Delegates to primary for queries; orchestrates deploy across both.
|
||||||
|
#[async_trait]
|
||||||
|
impl<T: PostgreSQL> PostgreSQL for FailoverTopology<T> {
|
||||||
|
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String> {
|
||||||
|
// Deploy primary; extract certs/endpoint;
|
||||||
|
// deploy replica with pg_basebackup + TLS passthrough.
|
||||||
|
// (Full impl logged/elaborated.)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Delegate queries to primary.
|
||||||
|
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String> {
|
||||||
|
self.primary.get_replication_certs(cluster_name).await
|
||||||
|
}
|
||||||
|
// ...
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Similarly for other capabilities.
|
||||||
|
#[async_trait]
|
||||||
|
impl<T: Docker> Docker for FailoverTopology<T> {
|
||||||
|
// Failover Docker orchestration.
|
||||||
|
}
|
||||||
|
````
|
||||||
|
|
||||||
|
**Key properties:**
|
||||||
|
- **Auto-derivation**: `Failover<K8sAnywhere>` gets `PostgreSQL` iff `K8sAnywhere: PostgreSQL`.
|
||||||
|
- **No boilerplate**: One blanket impl per capability *per higher-order type*.
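
The two properties above can be sketched with a synchronous, simplified version of the trait. `deploy_database` is a hypothetical helper introduced only to show the bound being checked at compile time; the real traits are async and take a config.

```rust
// Simplified capability trait (the real one is async and takes a config).
pub trait PostgreSQL {
    fn deploy(&self) -> Result<String, String>;
}

pub struct K8sAnywhereTopology;

impl PostgreSQL for K8sAnywhereTopology {
    fn deploy(&self) -> Result<String, String> {
        Ok("K8sAnywhere PostgreSQL deployed".to_string())
    }
}

pub struct FailoverTopology<T> {
    pub primary: T,
    pub replica: T,
}

// One blanket impl covers every `T: PostgreSQL`, present or future.
impl<T: PostgreSQL> PostgreSQL for FailoverTopology<T> {
    fn deploy(&self) -> Result<String, String> {
        let primary = self.primary.deploy()?;
        let replica = self.replica.deploy()?;
        Ok(format!("{} | {}", primary, replica))
    }
}

// Hypothetical helper: accepts any topology that provides PostgreSQL.
// Passing a topology without the capability is rejected by the compiler.
pub fn deploy_database<T: PostgreSQL>(topology: &T) -> Result<String, String> {
    topology.deploy()
}
```

Because `FailoverTopology<T>` itself satisfies `PostgreSQL`, it can be passed anywhere a plain topology is accepted, and can even be nested inside another higher-order topology.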
|
||||||
|
|
||||||
|
## Rationale
|
||||||
|
|
||||||
|
- **Composition via generics**: The Rust trait solver selects impls at compile time (static dispatch), so the composition itself adds no runtime lookup cost.
|
||||||
|
- **Compile-time safety**: Missing `T: Capability` → compile error (no panics).
|
||||||
|
- **Scalable**: O(capabilities) impls per higher-order; new `T` auto-works.
|
||||||
|
- **ISP-respecting**: Capabilities only surface if sub-topology provides.
|
||||||
|
- **Centralized logic**: Orchestration (e.g., cert propagation) in one place.
|
||||||
|
|
||||||
|
**Example usage:**
|
||||||
|
````rust
|
||||||
|
// ✅ Works: K8sAnywhere: PostgreSQL → Failover provides failover PG
|
||||||
|
let pg_failover: FailoverTopology<K8sAnywhereTopology> = ...;
|
||||||
|
pg_failover.deploy(&config).await;
|
||||||
|
|
||||||
|
// ✅ Works: LinuxHost: Docker → Failover provides failover Docker
|
||||||
|
let docker_failover: FailoverTopology<LinuxHostTopology> = ...;
|
||||||
|
docker_failover.deploy_docker(...).await;
|
||||||
|
|
||||||
|
// ❌ Compile fail: K8sAnywhere !: Docker
|
||||||
|
let invalid: FailoverTopology<K8sAnywhereTopology>;
|
||||||
|
invalid.deploy_docker(...); // `T: Docker` bound unsatisfied
|
||||||
|
````
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
**Pros:**
|
||||||
|
- **Extensible**: New topology `AWSTopology: PostgreSQL` → instant `Failover<AWSTopology>: PostgreSQL`.
|
||||||
|
- **Lean**: No useless impls (e.g., no `K8sAnywhere: Docker`).
|
||||||
|
- **Observable**: Logs trace every step.
|
||||||
|
|
||||||
|
**Cons:**
|
||||||
|
- **Monomorphization**: Generics generate code per T (mitigated: few Ts).
|
||||||
|
- **Delegation opacity**: Relies on rustdoc/logs for internals.
|
||||||
|
|
||||||
|
## Alternatives considered
|
||||||
|
|
||||||
|
| Approach | Pros | Cons |
|
||||||
|
|----------|------|------|
|
||||||
|
| **Manual per-T impls**<br>`impl PG for Failover<K8s> {..}`<br>`impl PG for Failover<Linux> {..}` | Explicit control | N×M explosion; violates ISP; hard to extend. |
|
||||||
|
| **Dynamic trait objects**<br>`Box<dyn AnyCapability>` | Runtime flex | Perf hit; type erasure; error-prone dispatch. |
|
||||||
|
| **Mega-topology trait**<br>All-in-one `OrchestratedTopology` | Simple wiring | Monolithic; poor composition. |
|
||||||
|
| **Registry dispatch**<br>Runtime capability lookup | Decoupled | Complex; no compile safety; perf/debug overhead. |
|
||||||
|
|
||||||
|
**Selected**: Blanket impls leverage Rust generics for safe, zero-cost composition.
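
For contrast, the registry-dispatch alternative from the table might look roughly like the sketch below. `CapabilityRegistry`, `k8s_registry`, and the string keys are hypothetical names, not part of Harmony. The point is that a missing capability only surfaces as a runtime `Err`, whereas the blanket impl rejects it at compile time.

```rust
use std::collections::HashMap;

// Hypothetical runtime registry: capability name -> deploy function.
pub struct CapabilityRegistry {
    deployers: HashMap<&'static str, Box<dyn Fn() -> Result<String, String>>>,
}

impl CapabilityRegistry {
    pub fn new() -> Self {
        CapabilityRegistry { deployers: HashMap::new() }
    }

    pub fn register<F>(&mut self, capability: &'static str, deploy: F)
    where
        F: Fn() -> Result<String, String> + 'static,
    {
        self.deployers.insert(capability, Box::new(deploy));
    }

    // A missing capability is only discovered here, at runtime.
    pub fn deploy(&self, capability: &str) -> Result<String, String> {
        match self.deployers.get(capability) {
            Some(deploy) => deploy(),
            None => Err(format!("capability `{}` is not registered", capability)),
        }
    }
}

// Registry for a topology that only provides PostgreSQL.
pub fn k8s_registry() -> CapabilityRegistry {
    let mut registry = CapabilityRegistry::new();
    registry.register("postgresql", || {
        Ok("K8sAnywhere PostgreSQL deployed".to_string())
    });
    registry
}
```

Calling `k8s_registry().deploy("docker")` compiles without complaint and fails only when executed, which is the "no compile safety" drawback listed in the table.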
|
||||||
|
|
||||||
|
## Additional Notes
|
||||||
|
|
||||||
|
- The same pattern applies to other higher-order topologies such as `MultisiteTopology<T>`, `ShardedTopology<T>`, etc.
|
||||||
|
- `FailoverTopology` in `failover.rs` is the first implementation.
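
As an illustration of how the pattern could carry over to another higher-order topology, here is a sketch of a `MultisiteTopology<T>`. Its shape (a `sites: Vec<T>` field) and the sequential deploy strategy are assumptions, since the ADR only names the type; the trait is again simplified to a synchronous version.

```rust
// Simplified capability trait (the real one is async).
pub trait Docker {
    fn deploy_docker(&self) -> Result<String, String>;
}

pub struct LinuxHostTopology;

impl Docker for LinuxHostTopology {
    fn deploy_docker(&self) -> Result<String, String> {
        Ok("LinuxHost Docker deployed".to_string())
    }
}

// Hypothetical shape: one sub-topology per site.
pub struct MultisiteTopology<T> {
    pub sites: Vec<T>,
}

// Blanket impl: any `T: Docker` yields a multisite Docker capability.
impl<T: Docker> Docker for MultisiteTopology<T> {
    fn deploy_docker(&self) -> Result<String, String> {
        let mut results = Vec::with_capacity(self.sites.len());
        // Deploy site by site; stop at the first failure.
        for site in &self.sites {
            results.push(site.deploy_docker()?);
        }
        Ok(results.join(" | "))
    }
}
```

As with `FailoverTopology<T>`, a single impl per capability is enough, and the orchestration policy (here, sequential with early exit) lives in one place.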
|
||||||
153
adr/015-higher-order-topologies/example.rs
Normal file
@@ -0,0 +1,153 @@
|
|||||||
|
//! Example of Higher-Order Topologies in Harmony.
|
||||||
|
//! Demonstrates how `FailoverTopology<T>` automatically provides failover for *any* capability
|
||||||
|
//! supported by a sub-topology `T` via blanket trait impls.
|
||||||
|
//!
|
||||||
|
//! Key insight: No manual impls per T or capability -- scales effortlessly.
|
||||||
|
//! Users can:
|
||||||
|
//! - Write new `Topology` (impl capabilities on a struct).
|
||||||
|
//! - Compose with `FailoverTopology` (gets capabilities if T has them).
|
||||||
|
//! - Compile fails if capability missing (safety).
|
||||||
|
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use tokio;
|
||||||
|
|
||||||
|
/// Capability trait: Deploy and manage PostgreSQL.
|
||||||
|
#[async_trait]
|
||||||
|
pub trait PostgreSQL {
|
||||||
|
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String>;
|
||||||
|
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String>;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Capability trait: Deploy Docker.
|
||||||
|
#[async_trait]
|
||||||
|
pub trait Docker {
|
||||||
|
async fn deploy_docker(&self) -> Result<String, String>;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Configuration for PostgreSQL deployments.
|
||||||
|
#[derive(Clone)]
|
||||||
|
pub struct PostgreSQLConfig;
|
||||||
|
|
||||||
|
/// Replication certificates.
|
||||||
|
#[derive(Clone)]
|
||||||
|
pub struct ReplicationCerts;
|
||||||
|
|
||||||
|
/// Concrete topology: Kubernetes Anywhere (supports PostgreSQL).
|
||||||
|
#[derive(Clone)]
|
||||||
|
pub struct K8sAnywhereTopology;
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl PostgreSQL for K8sAnywhereTopology {
|
||||||
|
async fn deploy(&self, _config: &PostgreSQLConfig) -> Result<String, String> {
|
||||||
|
// Real impl: Use k8s helm chart, operator, etc.
|
||||||
|
Ok("K8sAnywhere PostgreSQL deployed".to_string())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_replication_certs(&self, _cluster_name: &str) -> Result<ReplicationCerts, String> {
|
||||||
|
Ok(ReplicationCerts)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Concrete topology: Linux Host (supports Docker).
|
||||||
|
#[derive(Clone)]
|
||||||
|
pub struct LinuxHostTopology;
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl Docker for LinuxHostTopology {
|
||||||
|
async fn deploy_docker(&self) -> Result<String, String> {
|
||||||
|
// Real impl: Install/configure Docker on host.
|
||||||
|
Ok("LinuxHost Docker deployed".to_string())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Higher-Order Topology: Composes multiple sub-topologies (primary + replica).
|
||||||
|
/// Automatically derives *all* capabilities of `T` with failover orchestration.
|
||||||
|
///
|
||||||
|
/// - If `T: PostgreSQL`, then `FailoverTopology<T>: PostgreSQL` (blanket impl).
|
||||||
|
/// - Same for `Docker`, etc. No boilerplate!
|
||||||
|
/// - Compile-time safe: Missing `T: Capability` → error.
|
||||||
|
#[derive(Clone)]
|
||||||
|
pub struct FailoverTopology<T> {
|
||||||
|
/// Primary sub-topology.
|
||||||
|
pub primary: T,
|
||||||
|
/// Replica sub-topology.
|
||||||
|
pub replica: T,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Blanket impl: Failover PostgreSQL if T provides PostgreSQL.
|
||||||
|
/// Delegates reads to primary; deploys to both.
|
||||||
|
#[async_trait]
|
||||||
|
impl<T: PostgreSQL + Send + Sync + Clone> PostgreSQL for FailoverTopology<T> {
|
||||||
|
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String> {
|
||||||
|
// Orchestrate: Deploy primary first, then replica (e.g., via pg_basebackup).
|
||||||
|
let primary_result = self.primary.deploy(config).await?;
|
||||||
|
let replica_result = self.replica.deploy(config).await?;
|
||||||
|
Ok(format!("Failover PG deployed: {} | {}", primary_result, replica_result))
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String> {
|
||||||
|
// Delegate to primary (replica follows).
|
||||||
|
self.primary.get_replication_certs(cluster_name).await
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Blanket impl: Failover Docker if T provides Docker.
|
||||||
|
#[async_trait]
|
||||||
|
impl<T: Docker + Send + Sync + Clone> Docker for FailoverTopology<T> {
|
||||||
|
async fn deploy_docker(&self) -> Result<String, String> {
|
||||||
|
// Orchestrate across primary + replica.
|
||||||
|
let primary_result = self.primary.deploy_docker().await?;
|
||||||
|
let replica_result = self.replica.deploy_docker().await?;
|
||||||
|
Ok(format!("Failover Docker deployed: {} | {}", primary_result, replica_result))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::main]
|
||||||
|
async fn main() {
|
||||||
|
let config = PostgreSQLConfig;
|
||||||
|
|
||||||
|
println!("=== ✅ PostgreSQL Failover (K8sAnywhere supports PG) ===");
|
||||||
|
let pg_failover = FailoverTopology {
|
||||||
|
primary: K8sAnywhereTopology,
|
||||||
|
replica: K8sAnywhereTopology,
|
||||||
|
};
|
||||||
|
let result = pg_failover.deploy(&config).await.unwrap();
|
||||||
|
println!("Result: {}", result);
|
||||||
|
|
||||||
|
println!("\n=== ✅ Docker Failover (LinuxHost supports Docker) ===");
|
||||||
|
let docker_failover = FailoverTopology {
|
||||||
|
primary: LinuxHostTopology,
|
||||||
|
replica: LinuxHostTopology,
|
||||||
|
};
|
||||||
|
let result = docker_failover.deploy_docker().await.unwrap();
|
||||||
|
println!("Result: {}", result);
|
||||||
|
|
||||||
|
println!("\n=== ❌ Would fail to compile (K8sAnywhere !: Docker) ===");
|
||||||
|
// let invalid = FailoverTopology {
|
||||||
|
// primary: K8sAnywhereTopology,
|
||||||
|
// replica: K8sAnywhereTopology,
|
||||||
|
// };
|
||||||
|
// invalid.deploy_docker().await.unwrap(); // Error: `K8sAnywhereTopology: Docker` not satisfied!
|
||||||
|
// The compiler error message is very clear:
|
||||||
|
// error[E0599]: the method `deploy_docker` exists for struct `FailoverTopology<K8sAnywhereTopology>`, but its trait bounds were not satisfied
|
||||||
|
// --> src/main.rs:90:9
|
||||||
|
// |
|
||||||
|
// 4 | pub struct FailoverTopology<T> {
|
||||||
|
// | ------------------------------ method `deploy_docker` not found for this struct because it doesn't satisfy `FailoverTopology<K8sAnywhereTopology>: Docker`
|
||||||
|
// ...
|
||||||
|
// 37 | struct K8sAnywhereTopology;
|
||||||
|
// | -------------------------- doesn't satisfy `K8sAnywhereTopology: Docker`
|
||||||
|
// ...
|
||||||
|
// 90 | invalid.deploy_docker(); // `T: Docker` bound unsatisfied
|
||||||
|
// | ^^^^^^^^^^^^^ method cannot be called on `FailoverTopology<K8sAnywhereTopology>` due to unsatisfied trait bounds
|
||||||
|
// |
|
||||||
|
// note: trait bound `K8sAnywhereTopology: Docker` was not satisfied
|
||||||
|
// --> src/main.rs:61:9
|
||||||
|
// |
|
||||||
|
// 61 | impl<T: Docker + Send + Sync> Docker for FailoverTopology<T> {
|
||||||
|
// | ^^^^^^ ------ -------------------
|
||||||
|
// | |
|
||||||
|
// | unsatisfied trait bound introduced here
|
||||||
|
// note: the trait `Docker` must be implemented
|
||||||
|
}
|
||||||
|
|
||||||
@@ -17,6 +17,12 @@ use crate::{
|
|||||||
topology::{HostNetworkConfig, NetworkError, NetworkManager, k8s::K8sClient},
|
topology::{HostNetworkConfig, NetworkError, NetworkManager, k8s::K8sClient},
|
||||||
};
|
};
|
||||||
|
|
||||||
|
/// TODO: properly document the non-intuitive "roll forward only" behavior of nmstate in general.
|
||||||
|
/// It is documented in the official nmstate docs, but worth mentioning here:
|
||||||
|
///
|
||||||
|
/// - You create a bond, nmstate will apply it
|
||||||
|
/// - You delete the bond from nmstate, it will NOT delete it
|
||||||
|
/// - To delete it you have to update it with configuration set to null
|
||||||
pub struct OpenShiftNmStateNetworkManager {
|
pub struct OpenShiftNmStateNetworkManager {
|
||||||
k8s_client: Arc<K8sClient>,
|
k8s_client: Arc<K8sClient>,
|
||||||
}
|
}
|
||||||
@@ -31,6 +37,7 @@ impl std::fmt::Debug for OpenShiftNmStateNetworkManager {
|
|||||||
impl NetworkManager for OpenShiftNmStateNetworkManager {
|
impl NetworkManager for OpenShiftNmStateNetworkManager {
|
||||||
async fn ensure_network_manager_installed(&self) -> Result<(), NetworkError> {
|
async fn ensure_network_manager_installed(&self) -> Result<(), NetworkError> {
|
||||||
debug!("Installing NMState controller...");
|
debug!("Installing NMState controller...");
|
||||||
|
// TODO use operatorhub maybe?
|
||||||
self.k8s_client.apply_url(url::Url::parse("https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.84.0/nmstate.io_nmstates.yaml
|
self.k8s_client.apply_url(url::Url::parse("https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.84.0/nmstate.io_nmstates.yaml
|
||||||
").unwrap(), Some("nmstate"))
|
").unwrap(), Some("nmstate"))
|
||||||
.await?;
|
.await?;
|
||||||
|
|||||||
@@ -1,21 +1,15 @@
|
|||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use derive_new::new;
|
use derive_new::new;
|
||||||
use harmony_types::id::Id;
|
use harmony_types::id::Id;
|
||||||
use log::{debug, info};
|
use log::info;
|
||||||
use serde::Serialize;
|
use serde::Serialize;
|
||||||
|
|
||||||
use crate::{
|
use crate::{
|
||||||
data::Version,
|
data::Version,
|
||||||
hardware::PhysicalHost,
|
|
||||||
infra::inventory::InventoryRepositoryFactory,
|
|
||||||
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
|
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
|
||||||
inventory::{HostRole, Inventory},
|
inventory::Inventory,
|
||||||
modules::{
|
|
||||||
dhcp::DhcpHostBindingScore, http::IPxeMacBootFileScore,
|
|
||||||
inventory::DiscoverHostForRoleScore, okd::templates::BootstrapIpxeTpl,
|
|
||||||
},
|
|
||||||
score::Score,
|
score::Score,
|
||||||
topology::{HAClusterTopology, HostBinding},
|
topology::HAClusterTopology,
|
||||||
};
|
};
|
||||||
|
|
||||||
// -------------------------------------------------------------------------------------------------
|
// -------------------------------------------------------------------------------------------------
|
||||||
@@ -58,159 +52,6 @@ impl OKDSetup04WorkersInterpret {
|
|||||||
info!("[Workers] Rendering per-MAC PXE for workers and rebooting");
|
info!("[Workers] Rendering per-MAC PXE for workers and rebooting");
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Ensures that three physical hosts are discovered and available for the ControlPlane role.
|
|
||||||
/// It will trigger discovery if not enough hosts are found.
|
|
||||||
async fn get_nodes(
|
|
||||||
&self,
|
|
||||||
inventory: &Inventory,
|
|
||||||
topology: &HAClusterTopology,
|
|
||||||
) -> Result<Vec<PhysicalHost>, InterpretError> {
|
|
||||||
const REQUIRED_HOSTS: usize = 2;
|
|
||||||
let repo = InventoryRepositoryFactory::build().await?;
|
|
||||||
let mut control_plane_hosts = repo.get_host_for_role(&HostRole::Worker).await?;
|
|
||||||
|
|
||||||
while control_plane_hosts.len() < REQUIRED_HOSTS {
|
|
||||||
info!(
|
|
||||||
"Discovery of {} control plane hosts in progress, current number {}",
|
|
||||||
REQUIRED_HOSTS,
|
|
||||||
control_plane_hosts.len()
|
|
||||||
);
|
|
||||||
// This score triggers the discovery agent for a specific role.
|
|
||||||
DiscoverHostForRoleScore {
|
|
||||||
role: HostRole::Worker,
|
|
||||||
}
|
|
||||||
.interpret(inventory, topology)
|
|
||||||
.await?;
|
|
||||||
control_plane_hosts = repo.get_host_for_role(&HostRole::Worker).await?;
|
|
||||||
}
|
|
||||||
|
|
||||||
if control_plane_hosts.len() < REQUIRED_HOSTS {
|
|
||||||
Err(InterpretError::new(format!(
|
|
||||||
"OKD Requires at least {} control plane hosts, but only found {}. Cannot proceed.",
|
|
||||||
REQUIRED_HOSTS,
|
|
||||||
control_plane_hosts.len()
|
|
||||||
)))
|
|
||||||
} else {
|
|
||||||
// Take exactly the number of required hosts to ensure consistency.
|
|
||||||
Ok(control_plane_hosts
|
|
||||||
.into_iter()
|
|
||||||
.take(REQUIRED_HOSTS)
|
|
||||||
.collect())
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Configures DHCP host bindings for all control plane nodes.
|
|
||||||
async fn configure_host_binding(
|
|
||||||
&self,
|
|
||||||
inventory: &Inventory,
|
|
||||||
topology: &HAClusterTopology,
|
|
||||||
nodes: &Vec<PhysicalHost>,
|
|
||||||
) -> Result<(), InterpretError> {
|
|
||||||
info!("[Worker] Configuring host bindings for worker nodes.");
|
|
||||||
|
|
||||||
// Ensure the topology definition matches the number of physical nodes found.
|
|
||||||
if topology.control_plane.len() != nodes.len() {
|
|
||||||
return Err(InterpretError::new(format!(
|
|
||||||
"Mismatch between logical control plane hosts defined in topology ({}) and physical nodes found ({}).",
|
|
||||||
topology.control_plane.len(),
|
|
||||||
nodes.len()
|
|
||||||
)));
|
|
||||||
}
|
|
||||||
|
|
||||||
// Create a binding for each physical host to its corresponding logical host.
|
|
||||||
let bindings: Vec<HostBinding> = topology
|
|
||||||
.control_plane
|
|
||||||
.iter()
|
|
||||||
.zip(nodes.iter())
|
|
||||||
.map(|(logical_host, physical_host)| {
|
|
||||||
info!(
|
|
||||||
"Creating binding: Logical Host '{}' -> Physical Host ID '{}'",
|
|
||||||
logical_host.name, physical_host.id
|
|
||||||
);
|
|
||||||
HostBinding {
|
|
||||||
logical_host: logical_host.clone(),
|
|
||||||
physical_host: physical_host.clone(),
|
|
||||||
}
|
|
||||||
})
|
|
||||||
.collect();
|
|
||||||
|
|
||||||
DhcpHostBindingScore {
|
|
||||||
host_binding: bindings,
|
|
||||||
domain: Some(topology.domain_name.clone()),
|
|
||||||
}
|
|
||||||
.interpret(inventory, topology)
|
|
||||||
.await?;
|
|
||||||
|
|
||||||
Ok(())
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Renders and deploys a per-MAC iPXE boot file for each control plane node.
|
|
||||||
async fn configure_ipxe(
|
|
||||||
&self,
|
|
||||||
inventory: &Inventory,
|
|
||||||
topology: &HAClusterTopology,
|
|
||||||
nodes: &Vec<PhysicalHost>,
|
|
||||||
) -> Result<(), InterpretError> {
|
|
||||||
info!("[Worker] Rendering per-MAC iPXE configurations.");
|
|
||||||
|
|
||||||
// The iPXE script content is the same for all control plane nodes,
|
|
||||||
// pointing to the 'master.ign' ignition file.
|
|
||||||
let content = BootstrapIpxeTpl {
|
|
||||||
http_ip: &topology.http_server.get_ip().to_string(),
|
|
||||||
scos_path: "scos",
|
|
||||||
ignition_http_path: "okd_ignition_files",
|
|
||||||
installation_device: "/dev/sda", // This might need to be configurable per-host in the future
|
|
||||||
ignition_file_name: "worker.ign", // Worker nodes use the worker ignition file
|
|
||||||
}
|
|
||||||
.to_string();
|
|
||||||
|
|
||||||
debug!("[Worker] iPXE content template:\n{content}");
|
|
||||||
|
|
||||||
// Create and apply an iPXE boot file for each node.
|
|
||||||
for node in nodes {
|
|
||||||
let mac_address = node.get_mac_address();
|
|
||||||
if mac_address.is_empty() {
|
|
||||||
return Err(InterpretError::new(format!(
|
|
||||||
"Physical host with ID '{}' has no MAC addresses defined.",
|
|
||||||
node.id
|
|
||||||
)));
|
|
||||||
}
|
|
||||||
info!(
|
|
||||||
"[Worker] Applying iPXE config for node ID '{}' with MACs: {:?}",
|
|
||||||
node.id, mac_address
|
|
||||||
);
|
|
||||||
|
|
||||||
IPxeMacBootFileScore {
|
|
||||||
mac_address,
|
|
||||||
content: content.clone(),
|
|
||||||
}
|
|
||||||
.interpret(inventory, topology)
|
|
||||||
.await?;
|
|
||||||
}
|
|
||||||
|
|
||||||
Ok(())
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Prompts the user to reboot the target control plane nodes.
|
|
||||||
async fn reboot_targets(&self, nodes: &Vec<PhysicalHost>) -> Result<(), InterpretError> {
|
|
||||||
let node_ids: Vec<String> = nodes.iter().map(|n| n.id.to_string()).collect();
|
|
||||||
info!("[Worker] Requesting reboot for control plane nodes: {node_ids:?}",);
|
|
||||||
|
|
||||||
let confirmation = inquire::Confirm::new(
|
|
||||||
&format!("Please reboot the {} worker nodes ({}) to apply their PXE configuration. Press enter when ready.", nodes.len(), node_ids.join(", ")),
|
|
||||||
)
|
|
||||||
.prompt()
|
|
||||||
.map_err(|e| InterpretError::new(format!("User prompt failed: {e}")))?;
|
|
||||||
|
|
||||||
if !confirmation {
|
|
||||||
return Err(InterpretError::new(
|
|
||||||
"User aborted the operation.".to_string(),
|
|
||||||
));
|
|
||||||
}
|
|
||||||
|
|
||||||
Ok(())
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
#[async_trait]
|
#[async_trait]
|
||||||
@@ -233,23 +74,10 @@ impl Interpret<HAClusterTopology> for OKDSetup04WorkersInterpret {
|
|||||||
|
|
||||||
async fn execute(
|
async fn execute(
|
||||||
&self,
|
&self,
|
||||||
inventory: &Inventory,
|
_inventory: &Inventory,
|
||||||
topology: &HAClusterTopology,
|
_topology: &HAClusterTopology,
|
||||||
) -> Result<Outcome, InterpretError> {
|
) -> Result<Outcome, InterpretError> {
|
||||||
self.render_and_reboot().await?;
|
self.render_and_reboot().await?;
|
||||||
// 1. Ensure we have 2 physical hosts for the worker nodes.
|
|
||||||
let nodes = self.get_nodes(inventory, topology).await?;
|
|
||||||
|
|
||||||
// 2. Create DHCP reservations for the worker nodes.
|
|
||||||
self.configure_host_binding(inventory, topology, &nodes)
|
|
||||||
.await?;
|
|
||||||
|
|
||||||
// 3. Create iPXE files for each worker node to boot from the worker ignition.
|
|
||||||
self.configure_ipxe(inventory, topology, &nodes).await?;
|
|
||||||
|
|
||||||
// 4. Reboot the nodes to start the OS installation.
|
|
||||||
self.reboot_targets(&nodes).await?;
|
|
||||||
|
|
||||||
Ok(Outcome::success("Workers provisioned".into()))
|
Ok(Outcome::success("Workers provisioned".into()))
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||