feat/drain_k8s_node #232

Cargo.lock (generated, 34 changes)
@@ -1828,6 +1828,40 @@ dependencies = [
  "url",
 ]
 
+[[package]]
+name = "example-k8s-drain-node"
+version = "0.1.0"
+dependencies = [
+ "assert_cmd",
+ "cidr",
+ "env_logger",
+ "harmony",
+ "harmony_cli",
+ "harmony_macros",
+ "harmony_types",
+ "inquire 0.7.5",
+ "log",
+ "tokio",
+ "url",
+]
+
+[[package]]
+name = "example-k8s-write-file-on-node"
+version = "0.1.0"
+dependencies = [
+ "assert_cmd",
+ "cidr",
+ "env_logger",
+ "harmony",
+ "harmony_cli",
+ "harmony_macros",
+ "harmony_types",
+ "inquire 0.7.5",
+ "log",
+ "tokio",
+ "url",
+]
+
 [[package]]
 name = "example-kube-rs"
 version = "0.1.0"
adr/019-Network-bond-setup.md (new file, 65 lines)
@@ -0,0 +1,65 @@

# Architecture Decision Record: Network Bonding Configuration via External Automation

Initial Author: Jean-Gabriel Gill-Couture & Sylvain Tremblay

Initial Date: 2026-02-13

Last Updated Date: 2026-02-13

## Status

Accepted

## Context

We need to configure LACP bonds on 10GbE interfaces across all worker nodes in the OpenShift cluster. A significant challenge is that interface names (e.g., `enp1s0f0` vs `ens1f0`) vary across different hardware nodes.

The standard OpenShift mechanism (MachineConfig) applies identical configurations to all nodes in a MachineConfigPool. Since the interface names differ, a single static MachineConfig cannot target specific physical devices across the entire cluster without complex workarounds.

## Decision

We will use the existing "Harmony" automation tool to generate and apply host-specific NetworkManager configuration files directly to the nodes.

1. Harmony will generate the specific `.nmconnection` files for the bond and slaves based on its inventory of interface names.
2. Files will be pushed to `/etc/NetworkManager/system-connections/` on each node.
3. Configuration will be applied via `nmcli` reload or a node reboot.
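For illustration, a generated pair could look like the following NetworkManager keyfiles. The interface name, connection ids, and IP method here are hypothetical example values, not output of Harmony; only the slave file differs per host.

```ini
# /etc/NetworkManager/system-connections/bond0.nmconnection
[connection]
id=bond0
type=bond
interface-name=bond0

[bond]
; 802.3ad is the LACP mode referenced in the Context section
mode=802.3ad

[ipv4]
method=auto

# /etc/NetworkManager/system-connections/bond0-slave-enp1s0f0.nmconnection
# (per-host file: interface-name matches this host's inventoried NIC)
[connection]
id=bond0-slave-enp1s0f0
type=ethernet
interface-name=enp1s0f0
master=bond0
slave-type=bond
```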

## Rationale

* **Inventory Awareness:** Harmony already possesses the specific interface mapping data for each host.
* **Persistence:** Fedora CoreOS/SCOS allows writing to `/etc`, and these files persist across reboots and OS upgrades (rpm-ostree updates).
* **Avoids Complexity:** This approach avoids the operational overhead of creating unique MachineConfigPools for every single host or hardware variant.
* **Safety:** Unlike wildcard matching, this ensures explicit interface selection, preventing accidental bonding of reserved interfaces (e.g., future separation of Ceph storage traffic).

## Consequences

**Pros:**

* Precise, per-host configuration without polluting the Kubernetes API with hundreds of MachineConfigs.
* Standard Linux networking behavior; easy to debug locally.
* Prevents accidental interface capture (unlike wildcards).

**Cons:**

* **Loss of Declarative K8s State:** The network config is not managed by the Machine Config Operator (MCO).
* **Node Replacement Friction:** Newly provisioned nodes (replacements) will boot with default config. Harmony must be run against new nodes manually or via a hook before they can fully join the cluster workload.

## Alternatives considered

1. **Wildcard Matching in NetworkManager (e.g., `interface-name=enp*`):**
   * *Pros:* Single MachineConfig for the whole cluster.
   * *Cons:* Rejected because it is too broad. It risks capturing interfaces intended for other purposes (e.g., splitting storage and cluster networks later).
2. **"Kitchen Sink" Configuration:**
   * *Pros:* Single file listing every possible interface name as a slave.
   * *Cons:* "Dirty" configuration; results in many inactive connections on every host; brittle if new naming schemes appear.
3. **Per-Host MachineConfig:**
   * *Pros:* Fully declarative within OpenShift.
   * *Cons:* Requires a unique `MachineConfigPool` per host, which is an anti-pattern and unmaintainable at scale.
4. **On-boot Generation Script:**
   * *Pros:* Dynamic detection.
   * *Cons:* Increases boot complexity; harder to debug if the script fails during startup.

## Additional Notes

While `/etc` is writable and persistent on CoreOS, this configuration falls outside the "Day 1" Ignition process. Operational runbooks must be updated to ensure Harmony runs on any node replacement events.
@@ -1,7 +1,7 @@
 use std::net::{IpAddr, Ipv4Addr};
 
 use brocade::{BrocadeOptions, ssh};
-use harmony_secret::Secret;
+use harmony_secret::{Secret, SecretManager};
 use harmony_types::switch::PortLocation;
 use schemars::JsonSchema;
 use serde::{Deserialize, Serialize};
@@ -21,16 +21,14 @@ async fn main() {
     // let ip = IpAddr::V4(Ipv4Addr::new(192, 168, 4, 11)); // brocade @ st
     let switch_addresses = vec![ip];
 
-    // let config = SecretManager::get_or_prompt::<BrocadeSwitchAuth>()
-    //     .await
-    //     .unwrap();
+    let config = SecretManager::get_or_prompt::<BrocadeSwitchAuth>()
+        .await
+        .unwrap();
 
     let brocade = brocade::init(
         &switch_addresses,
-        // &config.username,
-        // &config.password,
-        "admin",
-        "password",
+        &config.username,
+        &config.password,
         BrocadeOptions {
             dry_run: true,
             ssh: ssh::SshOptions {

examples/k8s_drain_node/Cargo.toml (new file, 20 lines)
@@ -0,0 +1,20 @@

[package]
name = "example-k8s-drain-node"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
publish = false

[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_types = { path = "../../harmony_types" }
cidr.workspace = true
tokio.workspace = true
harmony_macros = { path = "../../harmony_macros" }
log.workspace = true
env_logger.workspace = true
url.workspace = true
assert_cmd = "2.0.16"
inquire.workspace = true
examples/k8s_drain_node/src/main.rs (new file, 61 lines)
@@ -0,0 +1,61 @@

use std::time::Duration;

use harmony::topology::k8s::{DrainOptions, K8sClient};
use log::{info, trace};

#[tokio::main]
async fn main() {
    env_logger::init();
    let k8s = K8sClient::try_default().await.unwrap();
    let nodes = k8s.get_nodes(None).await.unwrap();
    trace!("Got nodes : {nodes:#?}");
    let node_names = nodes
        .iter()
        .map(|n| n.metadata.name.as_ref().unwrap())
        .collect::<Vec<&String>>();

    info!("Got nodes : {:?}", node_names);

    let node_name = inquire::Select::new("What node do you want to operate on?", node_names)
        .prompt()
        .unwrap();

    let drain = inquire::Confirm::new("Do you wish to drain the node now ?")
        .prompt()
        .unwrap();

    if drain {
        let mut options = DrainOptions::default_ignore_daemonset_delete_emptydir_data();
        options.timeout = Duration::from_secs(1);
        k8s.drain_node(&node_name, &options).await.unwrap();

        info!("Node {node_name} successfully drained");
    }

    let uncordon =
        inquire::Confirm::new("Do you wish to uncordon node to resume scheduling workloads now?")
            .prompt()
            .unwrap();

    if uncordon {
        info!("Uncordoning node {node_name}");
        k8s.uncordon_node(node_name).await.unwrap();
        info!("Node {node_name} uncordoned");
    }

    let reboot = inquire::Confirm::new("Do you wish to reboot node now?")
        .prompt()
        .unwrap();

    if reboot {
        k8s.reboot_node(
            &node_name,
            &DrainOptions::default_ignore_daemonset_delete_emptydir_data(),
            Duration::from_secs(3600),
        )
        .await
        .unwrap();
    }

    info!("All done playing with nodes, happy harmonizing!");
}
examples/k8s_write_file_on_node/Cargo.toml (new file, 20 lines)
@@ -0,0 +1,20 @@

[package]
name = "example-k8s-write-file-on-node"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
publish = false

[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_types = { path = "../../harmony_types" }
cidr.workspace = true
tokio.workspace = true
harmony_macros = { path = "../../harmony_macros" }
log.workspace = true
env_logger.workspace = true
url.workspace = true
assert_cmd = "2.0.16"
inquire.workspace = true
examples/k8s_write_file_on_node/src/main.rs (new file, 45 lines)
@@ -0,0 +1,45 @@

use harmony::topology::k8s::{DrainOptions, K8sClient, NodeFile};
use log::{info, trace};

#[tokio::main]
async fn main() {
    env_logger::init();
    let k8s = K8sClient::try_default().await.unwrap();
    let nodes = k8s.get_nodes(None).await.unwrap();
    trace!("Got nodes : {nodes:#?}");
    let node_names = nodes
        .iter()
        .map(|n| n.metadata.name.as_ref().unwrap())
        .collect::<Vec<&String>>();

    info!("Got nodes : {:?}", node_names);

    let node = inquire::Select::new("What node do you want to write file to?", node_names)
        .prompt()
        .unwrap();

    let path = inquire::Text::new("File path on node").prompt().unwrap();
    let content = inquire::Text::new("File content").prompt().unwrap();

    let node_file = NodeFile {
        path: path,
        content: content,
        mode: 0o600,
    };

    k8s.write_files_to_node(&node, &vec![node_file.clone()])
        .await
        .unwrap();

    let cmd = inquire::Text::new("Command to run on node")
        .prompt()
        .unwrap();
    k8s.run_privileged_command_on_node(&node, &cmd)
        .await
        .unwrap();

    info!(
        "File {} mode {} written in node {node}",
        node_file.path, node_file.mode
    );
}
@@ -1,5 +1,4 @@
 use async_trait::async_trait;
-use brocade::PortOperatingMode;
 use harmony_macros::ip;
 use harmony_types::{
     id::Id,
@@ -301,10 +300,10 @@ impl Switch for HAClusterTopology {
         Ok(())
     }
 
-    async fn clear_port_channel(&self, ids: &Vec<Id>) -> Result<(), SwitchError> {
+    async fn clear_port_channel(&self, _ids: &Vec<Id>) -> Result<(), SwitchError> {
         todo!()
     }
-    async fn configure_interface(&self, ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
+    async fn configure_interface(&self, _ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
         todo!()
     }
 }
@@ -322,7 +321,15 @@ impl NetworkManager for HAClusterTopology {
         self.network_manager().await.configure_bond(config).await
     }
 
-    //TODO add snmp here
+    async fn configure_bond_on_primary_interface(
+        &self,
+        config: &HostNetworkConfig,
+    ) -> Result<(), NetworkError> {
+        self.network_manager()
+            .await
+            .configure_bond_on_primary_interface(config)
+            .await
+    }
 }
 
 #[async_trait]
@@ -562,10 +569,10 @@ impl SwitchClient for DummyInfra {
     ) -> Result<u8, SwitchError> {
         unimplemented!("{}", UNIMPLEMENTED_DUMMY_INFRA)
     }
-    async fn clear_port_channel(&self, ids: &Vec<Id>) -> Result<(), SwitchError> {
+    async fn clear_port_channel(&self, _ids: &Vec<Id>) -> Result<(), SwitchError> {
         todo!()
     }
-    async fn configure_interface(&self, ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
+    async fn configure_interface(&self, _ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
         todo!()
     }
 }
File diff suppressed because it is too large.
harmony/src/domain/topology/k8s/bundle.rs (new file, 133 lines)
@@ -0,0 +1,133 @@

//! Resource Bundle Pattern Implementation
//!
//! This module implements the Resource Bundle pattern for managing groups of
//! Kubernetes resources that form a logical unit of work.
//!
//! ## Purpose
//!
//! The ResourceBundle pattern addresses the need to manage ephemeral privileged
//! pods along with their platform-specific security requirements (e.g., OpenShift
//! Security Context Constraints).
//!
//! ## Use Cases
//!
//! - Writing files to node filesystems (e.g., NetworkManager configurations for
//!   network bonding as described in ADR-019)
//! - Running privileged commands on nodes (e.g., reboots, system configuration)
//!
//! ## Benefits
//!
//! - **Separation of Concerns**: Client code doesn't need to know about
//!   platform-specific RBAC requirements
//! - **Atomic Operations**: Resources are applied and deleted as a unit
//! - **Clean Abstractions**: Privileged operations are encapsulated in bundles
//!   rather than scattered throughout client methods
//!
//! ## Example
//!
//! ```rust,no_run
//! use harmony::topology::k8s::{K8sClient, helper};
//! use harmony::topology::KubernetesDistribution;
//!
//! async fn write_network_config(client: &K8sClient, node: &str) {
//!     // Create a bundle with platform-specific RBAC
//!     let bundle = helper::build_privileged_bundle(
//!         helper::PrivilegedPodConfig {
//!             name: "network-config".to_string(),
//!             namespace: "default".to_string(),
//!             node_name: node.to_string(),
//!             // ... other config
//!             ..Default::default()
//!         },
//!         &KubernetesDistribution::OpenshiftFamily,
//!     );
//!
//!     // Apply all resources (RBAC + Pod) atomically
//!     bundle.apply(client).await.unwrap();
//!
//!     // ... wait for completion ...
//!
//!     // Cleanup all resources
//!     bundle.delete(client).await.unwrap();
//! }
//! ```

use kube::{Error, Resource, ResourceExt, api::DynamicObject};
use serde::Serialize;
use serde_json;

use crate::domain::topology::k8s::K8sClient;

/// A ResourceBundle represents a logical unit of work consisting of multiple
/// Kubernetes resources that should be applied or deleted together.
///
/// This pattern is useful for managing ephemeral privileged pods along with
/// their required RBAC bindings (e.g., OpenShift SCC bindings).
#[derive(Debug)]
pub struct ResourceBundle {
    pub resources: Vec<DynamicObject>,
}

impl ResourceBundle {
    pub fn new() -> Self {
        Self {
            resources: Vec::new(),
        }
    }

    /// Add a Kubernetes resource to this bundle.
    /// The resource is converted to a DynamicObject for generic handling.
    pub fn add<K>(&mut self, resource: K)
    where
        K: Resource + Serialize,
        <K as Resource>::DynamicType: Default,
    {
        // Convert the typed resource to JSON, then to DynamicObject
        let json = serde_json::to_value(&resource).expect("Failed to serialize resource");
        let mut obj: DynamicObject =
            serde_json::from_value(json).expect("Failed to convert to DynamicObject");

        // Ensure type metadata is set
        if obj.types.is_none() {
            let api_version = Default::default();
            let kind = Default::default();
            let gvk = K::api_version(&api_version);
            let kind = K::kind(&kind);
            obj.types = Some(kube::api::TypeMeta {
                api_version: gvk.to_string(),
                kind: kind.to_string(),
            });
        }

        self.resources.push(obj);
    }

    /// Apply all resources in this bundle to the cluster.
    /// Resources are applied in the order they were added.
    pub async fn apply(&self, client: &K8sClient) -> Result<(), Error> {
        for res in &self.resources {
            let namespace = res.namespace();
            client
                .apply_dynamic(res, namespace.as_deref(), true)
                .await?;
        }
        Ok(())
    }

    /// Delete all resources in this bundle from the cluster.
    /// Resources are deleted in reverse order to respect dependencies.
    pub async fn delete(&self, client: &K8sClient) -> Result<(), Error> {
        // FIXME delete all in parallel and retry using kube::client::retry::RetryPolicy
        for res in self.resources.iter().rev() {
            let api = client.get_api_for_dynamic_object(res, res.namespace().as_deref())?;
            let name = res.name_any();
            // FIXME this swallows all errors. Swallowing a 404 is ok but other errors must be
            // handled properly (such as retrying). A normal error case is when we delete a
            // resource bundle with dependencies between various resources. Such as a pod with a
            // dependency on a ClusterRoleBinding. Trying to delete the ClusterRoleBinding first
            // is expected to fail
            let _ = api.delete(&name, &kube::api::DeleteParams::default()).await;
        }
        Ok(())
    }
}
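The ordering contract the bundle relies on (apply in insertion order, delete in reverse so dependents such as the Pod go before what they depend on, such as the ClusterRoleBinding) can be sketched with a std-only illustration. These free functions are hypothetical and not part of the harmony API:

```rust
// Std-only sketch of the ResourceBundle ordering contract, using resource
// names as stand-ins for DynamicObjects.

// Resources are applied in the order they were added to the bundle.
fn apply_order(resources: &[&str]) -> Vec<String> {
    resources.iter().map(|r| r.to_string()).collect()
}

// Resources are deleted in reverse order, so a Pod depending on a
// ClusterRoleBinding is removed before the binding itself.
fn delete_order(resources: &[&str]) -> Vec<String> {
    resources.iter().rev().map(|r| r.to_string()).collect()
}

fn main() {
    let resources = ["ClusterRoleBinding", "Pod"];
    // apply:  ["ClusterRoleBinding", "Pod"]
    println!("apply:  {:?}", apply_order(&resources));
    // delete: ["Pod", "ClusterRoleBinding"]
    println!("delete: {:?}", delete_order(&resources));
}
```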

harmony/src/domain/topology/k8s/config.rs (new file, 1 line)
@@ -0,0 +1 @@

pub const PRIVILEGED_POD_IMAGE: &str = "hub.nationtech.io/redhat/ubi10:latest";
harmony/src/domain/topology/k8s/helper.rs (new file, 601 lines)
@@ -0,0 +1,601 @@

use std::collections::BTreeMap;
use std::time::Duration;

use crate::topology::KubernetesDistribution;

use super::bundle::ResourceBundle;
use super::config::PRIVILEGED_POD_IMAGE;
use k8s_openapi::api::core::v1::{
    Container, HostPathVolumeSource, Pod, PodSpec, SecurityContext, Volume, VolumeMount,
};
use k8s_openapi::api::rbac::v1::{ClusterRoleBinding, RoleRef, Subject};
use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;
use kube::error::DiscoveryError;
use log::{debug, error, info, warn};

#[derive(Debug)]
pub struct PrivilegedPodConfig {
    pub name: String,
    pub namespace: String,
    pub node_name: String,
    pub container_name: String,
    pub command: Vec<String>,
    pub volumes: Vec<Volume>,
    pub volume_mounts: Vec<VolumeMount>,
    pub host_pid: bool,
    pub host_network: bool,
}

impl Default for PrivilegedPodConfig {
    fn default() -> Self {
        Self {
            name: "privileged-pod".to_string(),
            namespace: "harmony".to_string(),
            node_name: "".to_string(),
            container_name: "privileged-container".to_string(),
            command: vec![],
            volumes: vec![],
            volume_mounts: vec![],
            host_pid: false,
            host_network: false,
        }
    }
}

pub fn build_privileged_pod(
    config: PrivilegedPodConfig,
    k8s_distribution: &KubernetesDistribution,
) -> Pod {
    let annotations = match k8s_distribution {
        KubernetesDistribution::OpenshiftFamily => Some(BTreeMap::from([
            ("openshift.io/scc".to_string(), "privileged".to_string()),
            (
                "openshift.io/required-scc".to_string(),
                "privileged".to_string(),
            ),
        ])),
        _ => None,
    };

    Pod {
        metadata: ObjectMeta {
            name: Some(config.name),
            namespace: Some(config.namespace),
            annotations,
            ..Default::default()
        },
        spec: Some(PodSpec {
            node_name: Some(config.node_name),
            restart_policy: Some("Never".to_string()),
            host_pid: Some(config.host_pid),
            host_network: Some(config.host_network),
            containers: vec![Container {
                name: config.container_name,
                image: Some(PRIVILEGED_POD_IMAGE.to_string()),
                command: Some(config.command),
                security_context: Some(SecurityContext {
                    privileged: Some(true),
                    ..Default::default()
                }),
                volume_mounts: Some(config.volume_mounts),
                ..Default::default()
            }],
            volumes: Some(config.volumes),
            ..Default::default()
        }),
        ..Default::default()
    }
}

pub fn host_root_volume() -> (Volume, VolumeMount) {
    (
        Volume {
            name: "host".to_string(),
            host_path: Some(HostPathVolumeSource {
                path: "/".to_string(),
                ..Default::default()
            }),
            ..Default::default()
        },
        VolumeMount {
            name: "host".to_string(),
            mount_path: "/host".to_string(),
            ..Default::default()
        },
    )
}

/// Build a ResourceBundle containing a privileged pod and any required RBAC.
///
/// This function implements the Resource Bundle pattern to encapsulate platform-specific
/// security requirements for running privileged operations on nodes.
///
/// # Platform-Specific Behavior
///
/// - **OpenShift**: Creates a ClusterRoleBinding to grant the default ServiceAccount
///   access to the `system:openshift:scc:privileged` ClusterRole, which allows the pod
///   to use the privileged Security Context Constraint (SCC).
/// - **Standard Kubernetes/K3s**: Only creates the Pod resource, as these distributions
///   use standard PodSecurityPolicy or don't enforce additional security constraints.
///
/// # Arguments
///
/// * `config` - Configuration for the privileged pod (name, namespace, command, etc.)
/// * `k8s_distribution` - The detected Kubernetes distribution to determine RBAC requirements
///
/// # Returns
///
/// A `ResourceBundle` containing 1-2 resources:
/// - ClusterRoleBinding (OpenShift only)
/// - Pod (all distributions)
///
/// # Example
///
/// ```rust,no_run
/// # use harmony::topology::k8s::helper::{build_privileged_bundle, PrivilegedPodConfig};
/// # use harmony::topology::KubernetesDistribution;
/// let bundle = build_privileged_bundle(
///     PrivilegedPodConfig {
///         name: "network-setup".to_string(),
///         namespace: "default".to_string(),
///         node_name: "worker-01".to_string(),
///         container_name: "setup".to_string(),
///         command: vec!["nmcli".to_string(), "connection".to_string(), "reload".to_string()],
///         ..Default::default()
///     },
///     &KubernetesDistribution::OpenshiftFamily,
/// );
/// // Bundle now contains ClusterRoleBinding + Pod
/// ```

> [review thread]
> wjro: Maybe the above comment doesn't matter right now since this will only work if the distribution is OpenshiftFamily. It feels to me like this should also be a match over the KubernetesDistributions.
> johnride: There is just nothing to do on other distros than openshift family to get a privileged pod. No need for additional permissions mapping/binding like in openshift because of SCC.

pub fn build_privileged_bundle(
    config: PrivilegedPodConfig,
    k8s_distribution: &KubernetesDistribution,
) -> ResourceBundle {
    debug!(
        "Building privileged bundle for config {config:#?} on distribution {k8s_distribution:?}"
    );
    let mut bundle = ResourceBundle::new();
    let pod_name = config.name.clone();
    let namespace = config.namespace.clone();

    // 1. On OpenShift, create RBAC binding to privileged SCC
    if let KubernetesDistribution::OpenshiftFamily = k8s_distribution {
        // The default ServiceAccount needs to be bound to the privileged SCC
        // via the system:openshift:scc:privileged ClusterRole
        let crb = ClusterRoleBinding {
            metadata: ObjectMeta {
                name: Some(format!("{}-scc-binding", pod_name)),
                ..Default::default()
            },
            role_ref: RoleRef {
                api_group: "rbac.authorization.k8s.io".to_string(),
                kind: "ClusterRole".to_string(),
                name: "system:openshift:scc:privileged".to_string(),
            },
            subjects: Some(vec![Subject {
                kind: "ServiceAccount".to_string(),
                name: "default".to_string(),
                namespace: Some(namespace.clone()),
                api_group: None,
                ..Default::default()
            }]),
        };
        bundle.add(crb);
    }

    // 2. Build the privileged pod
    let pod = build_privileged_pod(config, k8s_distribution);
    bundle.add(pod);

    bundle
}

/// Action to take when a drain operation times out.
pub enum DrainTimeoutAction {
    /// Accept the partial drain and continue
    Accept,
    /// Retry the drain for another timeout period
    Retry,
    /// Abort the drain operation
    Abort,
}

/// Prompts the user to confirm acceptance of a partial drain.
///
/// Returns the action selected by the user (`Accept`, `Retry`, or `Abort`),
/// and `Err` if the prompt system fails entirely.
pub fn prompt_drain_timeout_action(
    node_name: &str,
    pending_count: usize,
    timeout_duration: Duration,
) -> Result<DrainTimeoutAction, kube::Error> {
    let prompt_msg = format!(
        "Drain operation timed out on node '{}' with {} pod(s) remaining. What would you like to do?",
        node_name, pending_count
    );

    loop {
        let choices = vec![
            "Accept drain failure (requires confirmation)".to_string(),
            format!("Retry drain for another {:?}", timeout_duration),
            "Abort operation".to_string(),
        ];

        let selection = inquire::Select::new(&prompt_msg, choices)
            .with_help_message("Use arrow keys to navigate, Enter to select")
            .prompt()
            .map_err(|e| {
                kube::Error::Discovery(DiscoveryError::MissingResource(format!(
                    "Prompt failed: {}",
                    e
                )))
            })?;

        if selection.starts_with("Accept") {
            // Require typed confirmation - retry until correct or user cancels
            let required_confirmation = format!("yes-accept-drain:{}={}", node_name, pending_count);

            let confirmation_prompt = format!(
                "To accept this partial drain, type exactly: {}",
                required_confirmation
            );

            match inquire::Text::new(&confirmation_prompt)
                .with_help_message(&format!(
                    "This action acknowledges {} pods will remain on the node",
                    pending_count
                ))
                .prompt()
            {
                Ok(input) if input == required_confirmation => {
                    warn!(
                        "User accepted partial drain of node '{}' with {} pods remaining (confirmation: {})",
                        node_name, pending_count, required_confirmation
                    );
                    return Ok(DrainTimeoutAction::Accept);
                }
                Ok(input) => {
                    warn!(
                        "Confirmation failed. Expected '{}', got '{}'. Please try again.",
                        required_confirmation, input
                    );
                }
                Err(e) => {
                    // User cancelled (Ctrl+C) or prompt system failed
                    error!("Confirmation prompt cancelled or failed: {}", e);
                    return Ok(DrainTimeoutAction::Abort);
                }
            }
        } else if selection.starts_with("Retry") {
            info!(
                "User chose to retry drain operation for another {:?}",
                timeout_duration
            );
            return Ok(DrainTimeoutAction::Retry);
        } else {
            error!("Drain operation aborted by user");
            return Ok(DrainTimeoutAction::Abort);
        }
    }
}
|
||||||
|
|
||||||
|
#[cfg(test)]
mod tests {
    use super::*;
    use pretty_assertions::assert_eq;

    #[test]
    fn test_host_root_volume() {
        let (volume, mount) = host_root_volume();

        assert_eq!(volume.name, "host");
        assert_eq!(volume.host_path.as_ref().unwrap().path, "/");

        assert_eq!(mount.name, "host");
        assert_eq!(mount.mount_path, "/host");
    }

    #[test]
    fn test_build_privileged_pod_minimal() {
        let pod = build_privileged_pod(
            PrivilegedPodConfig {
                name: "minimal-pod".to_string(),
                namespace: "kube-system".to_string(),
                node_name: "node-123".to_string(),
                container_name: "debug-container".to_string(),
                command: vec!["sleep".to_string(), "3600".to_string()],
                ..Default::default()
            },
            &KubernetesDistribution::Default,
        );

        assert_eq!(pod.metadata.name, Some("minimal-pod".to_string()));
        assert_eq!(pod.metadata.namespace, Some("kube-system".to_string()));

        let spec = pod.spec.as_ref().expect("Pod spec should be present");
        assert_eq!(spec.node_name, Some("node-123".to_string()));
        assert_eq!(spec.restart_policy, Some("Never".to_string()));
        assert_eq!(spec.host_pid, Some(false));
        assert_eq!(spec.host_network, Some(false));

        assert_eq!(spec.containers.len(), 1);
        let container = &spec.containers[0];
        assert_eq!(container.name, "debug-container");
        assert_eq!(container.image, Some(PRIVILEGED_POD_IMAGE.to_string()));
        assert_eq!(
            container.command,
            Some(vec!["sleep".to_string(), "3600".to_string()])
        );

        // Security context check
        let sec_ctx = container
            .security_context
            .as_ref()
            .expect("Security context missing");
        assert_eq!(sec_ctx.privileged, Some(true));
    }

    #[test]
    fn test_build_privileged_pod_with_volumes_and_host_access() {
        let (host_vol, host_mount) = host_root_volume();

        let pod = build_privileged_pod(
            PrivilegedPodConfig {
                name: "full-pod".to_string(),
                namespace: "default".to_string(),
                node_name: "node-1".to_string(),
                container_name: "runner".to_string(),
                command: vec!["/bin/sh".to_string()],
                volumes: vec![host_vol.clone()],
                volume_mounts: vec![host_mount.clone()],
                host_pid: true,
                host_network: true,
            },
            &KubernetesDistribution::Default,
        );

        let spec = pod.spec.as_ref().expect("Pod spec should be present");
        assert_eq!(spec.host_pid, Some(true));
        assert_eq!(spec.host_network, Some(true));

        // Check volumes in Spec
        let volumes = spec.volumes.as_ref().expect("Volumes should be present");
        assert_eq!(volumes.len(), 1);
        assert_eq!(volumes[0].name, "host");

        // Check mounts in Container
        let container = &spec.containers[0];
        let mounts = container
            .volume_mounts
            .as_ref()
            .expect("Mounts should be present");
        assert_eq!(mounts.len(), 1);
        assert_eq!(mounts[0].name, "host");
        assert_eq!(mounts[0].mount_path, "/host");
    }

    #[test]
    fn test_build_privileged_pod_structure_correctness() {
        // This test validates that the construction logic puts things in the right places,
        // effectively validating the "template".

        let custom_vol = Volume {
            name: "custom-vol".to_string(),
            ..Default::default()
        };
        let custom_mount = VolumeMount {
            name: "custom-vol".to_string(),
            mount_path: "/custom".to_string(),
            ..Default::default()
        };

        let pod = build_privileged_pod(
            PrivilegedPodConfig {
                name: "structure-test".to_string(),
                namespace: "test-ns".to_string(),
                node_name: "test-node".to_string(),
                container_name: "test-container".to_string(),
                command: vec!["cmd".to_string()],
                volumes: vec![custom_vol],
                volume_mounts: vec![custom_mount],
                ..Default::default()
            },
            &KubernetesDistribution::Default,
        );

        // Validate structure depth
        let spec = pod.spec.as_ref().unwrap();

        // 1. Spec level fields
        assert!(spec.node_name.is_some());
        assert!(spec.volumes.is_some());

        // 2. Container level fields
        let container = &spec.containers[0];
        assert!(container.security_context.is_some());
        assert!(container.volume_mounts.is_some());

        // 3. Nested fields
        assert!(
            container
                .security_context
                .as_ref()
                .unwrap()
                .privileged
                .unwrap()
        );
        assert_eq!(spec.volumes.as_ref().unwrap()[0].name, "custom-vol");
        assert_eq!(
            container.volume_mounts.as_ref().unwrap()[0].mount_path,
            "/custom"
        );
    }

    #[test]
    fn test_build_privileged_bundle_default_distribution() {
        let bundle = build_privileged_bundle(
            PrivilegedPodConfig {
                name: "test-bundle".to_string(),
                namespace: "test-ns".to_string(),
                node_name: "node-1".to_string(),
                container_name: "test-container".to_string(),
                command: vec!["echo".to_string(), "hello".to_string()],
                ..Default::default()
            },
            &KubernetesDistribution::Default,
        );

        // For Default distribution, only the Pod should be in the bundle
        assert_eq!(bundle.resources.len(), 1);

        let pod_obj = &bundle.resources[0];
        assert_eq!(pod_obj.metadata.name.as_deref(), Some("test-bundle"));
        assert_eq!(pod_obj.metadata.namespace.as_deref(), Some("test-ns"));
    }

    #[test]
    fn test_build_privileged_bundle_openshift_distribution() {
        let bundle = build_privileged_bundle(
            PrivilegedPodConfig {
                name: "test-bundle-ocp".to_string(),
                namespace: "test-ns".to_string(),
                node_name: "node-1".to_string(),
                container_name: "test-container".to_string(),
                command: vec!["echo".to_string(), "hello".to_string()],
                ..Default::default()
            },
            &KubernetesDistribution::OpenshiftFamily,
        );

        // For OpenShift, both ClusterRoleBinding and Pod should be in the bundle
        assert_eq!(bundle.resources.len(), 2);

        // First resource should be the ClusterRoleBinding
        let crb_obj = &bundle.resources[0];
        assert_eq!(
            crb_obj.metadata.name.as_deref(),
            Some("test-bundle-ocp-scc-binding")
        );

        // Verify it's targeting the privileged SCC
        if let Some(role_ref) = crb_obj.data.get("roleRef") {
            assert_eq!(
                role_ref.get("name").and_then(|v| v.as_str()),
                Some("system:openshift:scc:privileged")
            );
        }

        // Second resource should be the Pod
        let pod_obj = &bundle.resources[1];
        assert_eq!(pod_obj.metadata.name.as_deref(), Some("test-bundle-ocp"));
        assert_eq!(pod_obj.metadata.namespace.as_deref(), Some("test-ns"));
    }

    #[test]
    fn test_build_privileged_bundle_k3s_distribution() {
        let bundle = build_privileged_bundle(
            PrivilegedPodConfig {
                name: "test-bundle-k3s".to_string(),
                namespace: "test-ns".to_string(),
                node_name: "node-1".to_string(),
                container_name: "test-container".to_string(),
                command: vec!["echo".to_string(), "hello".to_string()],
                ..Default::default()
            },
            &KubernetesDistribution::K3sFamily,
        );

        // For K3s, only the Pod should be in the bundle (no special SCC)
        assert_eq!(bundle.resources.len(), 1);

        let pod_obj = &bundle.resources[0];
        assert_eq!(pod_obj.metadata.name.as_deref(), Some("test-bundle-k3s"));
    }

    #[test]
    fn test_pod_yaml_rendering_expected() {
        let pod = build_privileged_pod(
            PrivilegedPodConfig {
                name: "pod_name".to_string(),
                namespace: "pod_namespace".to_string(),
                node_name: "node name".to_string(),
                container_name: "container name".to_string(),
                command: vec!["command".to_string(), "argument".to_string()],
                host_pid: true,
                host_network: true,
                ..Default::default()
            },
            &KubernetesDistribution::Default,
        );

        assert_eq!(
            &serde_yaml::to_string(&pod).unwrap(),
            "apiVersion: v1
kind: Pod
metadata:
  name: pod_name
  namespace: pod_namespace
spec:
  containers:
  - command:
    - command
    - argument
    image: hub.nationtech.io/redhat/ubi10:latest
    name: container name
    securityContext:
      privileged: true
    volumeMounts: []
  hostNetwork: true
  hostPID: true
  nodeName: node name
  restartPolicy: Never
  volumes: []
"
        );
    }

    #[test]
    fn test_pod_yaml_rendering_openshift() {
        let pod = build_privileged_pod(
            PrivilegedPodConfig {
                name: "pod_name".to_string(),
                namespace: "pod_namespace".to_string(),
                node_name: "node name".to_string(),
                container_name: "container name".to_string(),
                command: vec!["command".to_string(), "argument".to_string()],
                host_pid: true,
                host_network: true,
                ..Default::default()
            },
            &KubernetesDistribution::OpenshiftFamily,
        );

        assert_eq!(
            &serde_yaml::to_string(&pod).unwrap(),
            "apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/required-scc: privileged
    openshift.io/scc: privileged
  name: pod_name
  namespace: pod_namespace
spec:
  containers:
  - command:
    - command
    - argument
    image: hub.nationtech.io/redhat/ubi10:latest
    name: container name
    securityContext:
      privileged: true
    volumeMounts: []
  hostNetwork: true
  hostPID: true
  nodeName: node name
  restartPolicy: Never
  volumes: []
"
        );
    }
}
harmony/src/domain/topology/k8s/mod.rs (new file, 2586 lines)
File diff suppressed because it is too large
@@ -3,18 +3,12 @@ use std::{collections::BTreeMap, process::Command, sync::Arc, time::Duration};
 use async_trait::async_trait;
 use base64::{Engine, engine::general_purpose};
 use harmony_types::rfc1123::Rfc1123Name;
-use k8s_openapi::{
-    ByteString,
-    api::{
-        core::v1::{Pod, Secret},
-        rbac::v1::{ClusterRoleBinding, RoleRef, Subject},
-    },
-};
+use k8s_openapi::api::{
+    core::v1::{Pod, Secret},
+    rbac::v1::{ClusterRoleBinding, RoleRef, Subject},
+};
-use kube::{
-    api::{DynamicObject, GroupVersionKind, ObjectMeta},
-    runtime::conditions,
-};
-use log::{debug, error, info, trace, warn};
+use kube::api::{DynamicObject, GroupVersionKind, ObjectMeta};
+use log::{debug, info, trace, warn};
 use serde::Serialize;
 use tokio::sync::OnceCell;
 
@@ -34,10 +28,7 @@ use crate::{
         score_cert_management::CertificateManagementScore,
     },
     k3d::K3DInstallationScore,
-    k8s::{
-        ingress::{K8sIngressScore, PathType},
-        resource::K8sResourceScore,
-    },
+    k8s::ingress::{K8sIngressScore, PathType},
     monitoring::{
         grafana::{grafana::Grafana, helm::helm_grafana::grafana_helm_chart_score},
         kube_prometheus::crd::{
@@ -54,7 +45,6 @@ use crate::{
             service_monitor::ServiceMonitor,
         },
     },
-    nats::capability::NatsCluster,
     okd::{crd::ingresses_config::Ingress as IngressResource, route::OKDTlsPassthroughScore},
     prometheus::{
         k8s_prometheus_alerting_score::K8sPrometheusCRDAlertingScore,
@@ -103,7 +93,6 @@ enum K8sSource {
 pub struct K8sAnywhereTopology {
     k8s_state: Arc<OnceCell<Option<K8sState>>>,
     tenant_manager: Arc<OnceCell<K8sTenantManager>>,
-    k8s_distribution: Arc<OnceCell<KubernetesDistribution>>,
     config: Arc<K8sAnywhereConfig>,
 }
 
@@ -554,7 +543,6 @@ impl K8sAnywhereTopology {
         Self {
             k8s_state: Arc::new(OnceCell::new()),
             tenant_manager: Arc::new(OnceCell::new()),
-            k8s_distribution: Arc::new(OnceCell::new()),
            config: Arc::new(K8sAnywhereConfig::from_env()),
        }
    }
@@ -563,7 +551,6 @@ impl K8sAnywhereTopology {
         Self {
             k8s_state: Arc::new(OnceCell::new()),
             tenant_manager: Arc::new(OnceCell::new()),
-            k8s_distribution: Arc::new(OnceCell::new()),
            config: Arc::new(config),
        }
    }
@@ -600,41 +587,6 @@ impl K8sAnywhereTopology {
         }
     }
 
-    pub async fn get_k8s_distribution(&self) -> Result<&KubernetesDistribution, PreparationError> {
-        self.k8s_distribution
-            .get_or_try_init(async || {
-                debug!("Trying to detect k8s distribution");
-                let client = self.k8s_client().await.unwrap();
-
-                let discovery = client.discovery().await.map_err(|e| {
-                    PreparationError::new(format!("Could not discover API groups: {}", e))
-                })?;
-
-                let version = client.get_apiserver_version().await.map_err(|e| {
-                    PreparationError::new(format!("Could not get server version: {}", e))
-                })?;
-
-                // OpenShift / OKD
-                if discovery
-                    .groups()
-                    .any(|g| g.name() == "project.openshift.io")
-                {
-                    info!("Found KubernetesDistribution OpenshiftFamily");
-                    return Ok(KubernetesDistribution::OpenshiftFamily);
-                }
-
-                // K3d / K3s
-                if version.git_version.contains("k3s") {
-                    info!("Found KubernetesDistribution K3sFamily");
-                    return Ok(KubernetesDistribution::K3sFamily);
-                }
-
-                info!("Could not identify KubernetesDistribution, using Default");
-                return Ok(KubernetesDistribution::Default);
-            })
-            .await
-    }
-
     fn extract_and_normalize_token(&self, secret: &DynamicObject) -> Option<String> {
         let token_b64 = secret
             .data
@@ -652,6 +604,16 @@ impl K8sAnywhereTopology {
         Some(cleaned)
     }
 
+    pub async fn get_k8s_distribution(&self) -> Result<KubernetesDistribution, PreparationError> {
+        self.k8s_client()
+            .await?
+            .get_k8s_distribution()
+            .await
+            .map_err(|e| {
+                PreparationError::new(format!("Failed to get k8s distribution from client : {e}"))
+            })
+    }
+
     pub fn build_cluster_rolebinding(
         &self,
         service_account_name: &str,
@@ -1,7 +1,6 @@
 use std::{net::SocketAddr, str::FromStr};
 
 use async_trait::async_trait;
-use log::debug;
 use serde::Serialize;
 
 use super::LogicalHost;
@@ -188,6 +188,10 @@ impl FromStr for DnsRecordType {
 pub trait NetworkManager: Debug + Send + Sync {
     async fn ensure_network_manager_installed(&self) -> Result<(), NetworkError>;
     async fn configure_bond(&self, config: &HostNetworkConfig) -> Result<(), NetworkError>;
+    async fn configure_bond_on_primary_interface(
+        &self,
+        config: &HostNetworkConfig,
+    ) -> Result<(), NetworkError>;
 }
 
 #[derive(Debug, Clone, new)]
@@ -3,6 +3,7 @@ use std::{
     sync::Arc,
 };
 
+use askama::Template;
 use async_trait::async_trait;
 use harmony_types::id::Id;
 use k8s_openapi::api::core::v1::Node;
@@ -10,13 +11,71 @@ use kube::{
     ResourceExt,
     api::{ObjectList, ObjectMeta},
 };
-use log::{debug, info};
+use log::{debug, info, warn};
 
 use crate::{
     modules::okd::crd::nmstate,
-    topology::{HostNetworkConfig, NetworkError, NetworkManager, k8s::K8sClient},
+    topology::{
+        HostNetworkConfig, NetworkError, NetworkManager,
+        k8s::{DrainOptions, K8sClient, NodeFile},
+    },
 };
 
+/// NetworkManager bond configuration template
+#[derive(Template)]
+#[template(
+    source = r#"[connection]
+id={{ bond_name }}
+uuid={{ bond_uuid }}
+type=bond
+autoconnect-slaves=1
+interface-name={{ bond_name }}
+
+[bond]
+lacp_rate=fast
+mode=802.3ad
+xmit_hash_policy=layer2
+
+[ipv4]
+method=auto
+
+[ipv6]
+addr-gen-mode=default
+method=auto
+
+[proxy]
+"#,
+    ext = "txt"
+)]
+struct BondConfigTemplate {
+    bond_name: String,
+    bond_uuid: String,
+}
+
+/// NetworkManager bond slave configuration template
+#[derive(Template)]
+#[template(
+    source = r#"[connection]
+id={{ slave_id }}
+uuid={{ slave_uuid }}
+type=ethernet
+interface-name={{ interface_name }}
+master={{ bond_name }}
+slave-type=bond
+
+[ethernet]
+
+[bond-port]
+"#,
+    ext = "txt"
+)]
+struct BondSlaveConfigTemplate {
+    slave_id: String,
+    slave_uuid: String,
+    interface_name: String,
+    bond_name: String,
+}
+
 /// TODO document properly the non-intuitive behavior or "roll forward only" of nmstate in general
 /// It is documented in nmstate official doc, but worth mentionning here :
 ///
@@ -87,6 +146,117 @@ impl NetworkManager for OpenShiftNmStateNetworkManager {
         Ok(())
     }
 
+    /// Configures bonding on the primary network interface of a node.
+    ///
+    /// Changing the *primary* network interface (making it a bond
+    /// slave) will disrupt node connectivity mid-change, so the
+    /// procedure is:
+    ///
+    /// 1. Generate NetworkManager .nmconnection files
+    /// 2. Drain the node (includes cordon)
+    /// 3. Write configuration files to `/etc/NetworkManager/system-connections/`
+    /// 4. Attempt to reload NetworkManager (optional, best-effort)
+    /// 5. Reboot the node with full verification (drain, boot_id check, uncordon)
+    ///
+    /// The reboot procedure includes:
+    /// - Recording boot_id before reboot
+    /// - Fire-and-forget reboot command
+    /// - Waiting for NotReady status
+    /// - Waiting for Ready status
+    /// - Verifying boot_id changed
+    /// - Uncordoning the node
+    ///
+    /// See ADR-019 for context and rationale.
+    async fn configure_bond_on_primary_interface(
+        &self,
+        config: &HostNetworkConfig,
+    ) -> Result<(), NetworkError> {
+        use std::time::Duration;
+
+        let node_name = self.get_node_name_for_id(&config.host_id).await?;
+        let hostname = self.get_hostname(&config.host_id).await?;
+
+        info!(
+            "Configuring bond on primary interface for host '{}' (node '{}')",
+            config.host_id, node_name
+        );
+
+        // 1. Generate .nmconnection files
+        let files = self.generate_nmconnection_files(&hostname, config)?;
+        debug!(
+            "Generated {} NetworkManager configuration files",
+            files.len()
+        );
+
+        // 2. Write configuration files to the node (before draining)
+        // We do this while the node is still running for faster operation
+        info!(
+            "Writing NetworkManager configuration files to node '{}'...",
+            node_name
+        );
+        self.k8s_client
+            .write_files_to_node(&node_name, &files)
+            .await
+            .map_err(|e| {
+                NetworkError::new(format!(
+                    "Failed to write configuration files to node '{}': {}",
+                    node_name, e
+                ))
+            })?;
+
+        // 3. Reload NetworkManager configuration (best-effort)
+        // This won't activate the bond yet since the primary interface would lose connectivity,
+        // but it validates the configuration files are correct
+        info!(
+            "Reloading NetworkManager configuration on node '{}'...",
+            node_name
+        );
+        match self
+            .k8s_client
+            .run_privileged_command_on_node(&node_name, "chroot /host nmcli connection reload")
+            .await
+        {
+            Ok(output) => {
+                debug!("NetworkManager reload output: {}", output.trim());
+            }
+            Err(e) => {
+                warn!(
+                    "Failed to reload NetworkManager configuration: {}. Proceeding with reboot.",
+                    e
+                );
+                // Don't fail here - reboot will pick up the config anyway
+            }
+        }
+
+        // 4. Reboot the node with full verification
+        // The reboot_node function handles: drain, boot_id capture, reboot, NotReady wait,
+        // Ready wait, boot_id verification, and uncordon
+        // 60 minutes timeout for bare-metal environments (drain can take 20-30 mins)
+        let reboot_timeout = Duration::from_secs(3600);
+        info!(
+            "Rebooting node '{}' to apply network configuration (timeout: {:?})...",
+            node_name, reboot_timeout
+        );
+
+        self.k8s_client
+            .reboot_node(
+                &node_name,
+                &DrainOptions::default_ignore_daemonset_delete_emptydir_data(),
+                reboot_timeout,
+            )
+            .await
+            .map_err(|e| {
+                NetworkError::new(format!("Failed to reboot node '{}': {}", node_name, e))
+            })?;
+
+        info!(
+            "Successfully configured bond on primary interface for host '{}' (node '{}')",
+            config.host_id, node_name
+        );
+
+        Ok(())
+    }
+
     async fn configure_bond(&self, config: &HostNetworkConfig) -> Result<(), NetworkError> {
         let hostname = self.get_hostname(&config.host_id).await.map_err(|e| {
             NetworkError::new(format!(
@@ -208,14 +378,14 @@ impl OpenShiftNmStateNetworkManager {
         }
     }
 
-    async fn get_hostname(&self, host_id: &Id) -> Result<String, String> {
+    async fn get_node_for_id(&self, host_id: &Id) -> Result<Node, String> {
         let nodes: ObjectList<Node> = self
             .k8s_client
             .list_resources(None, None)
             .await
             .map_err(|e| format!("Failed to list nodes: {e}"))?;
 
-        let Some(node) = nodes.iter().find(|n| {
+        let Some(node) = nodes.into_iter().find(|n| {
             n.status
                 .as_ref()
                 .and_then(|s| s.node_info.as_ref())
@@ -225,6 +395,20 @@ impl OpenShiftNmStateNetworkManager {
             return Err(format!("No node found for host '{host_id}'"));
         };
 
+        Ok(node)
+    }
+
+    async fn get_node_name_for_id(&self, host_id: &Id) -> Result<String, String> {
+        let node = self.get_node_for_id(host_id).await?;
+
+        node.metadata.name.ok_or(format!(
+            "A node should always have a name, node for host_id {host_id} has no name"
+        ))
+    }
+
+    async fn get_hostname(&self, host_id: &Id) -> Result<String, String> {
+        let node = self.get_node_for_id(host_id).await?;
+
         node.labels()
             .get("kubernetes.io/hostname")
             .ok_or(format!(
@@ -261,4 +445,82 @@ impl OpenShiftNmStateNetworkManager {
|
|||||||
let next_id = (0..).find(|id| !used_ids.contains(id)).unwrap();
|
let next_id = (0..).find(|id| !used_ids.contains(id)).unwrap();
|
||||||
Ok(format!("bond{next_id}"))
|
Ok(format!("bond{next_id}"))
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Generates NetworkManager .nmconnection files for bonding configuration.
|
||||||
|
///
|
||||||
|
/// Creates:
|
||||||
|
/// - One bond master configuration file (bond0.nmconnection)
|
||||||
|
/// - One slave configuration file per interface (bond0-<iface>.nmconnection)
|
||||||
|
///
|
||||||
|
/// All files are placed in `/etc/NetworkManager/system-connections/` with
|
||||||
|
/// mode 0o600 (required by NetworkManager).
|
||||||
|
fn generate_nmconnection_files(
|
||||||
|
&self,
|
||||||
|
hostname: &str,
|
||||||
|
config: &HostNetworkConfig,
|
||||||
|
) -> Result<Vec<NodeFile>, NetworkError> {
|
        let mut files = Vec::new();
        let bond_name = "bond0";
        let bond_uuid = uuid::Uuid::new_v4().to_string();

        // Generate bond master configuration
        let bond_template = BondConfigTemplate {
            bond_name: bond_name.to_string(),
            bond_uuid: bond_uuid.clone(),
        };

        let bond_content = bond_template.render().map_err(|e| {
            NetworkError::new(format!("Failed to render bond configuration template: {}", e))
        })?;

        files.push(NodeFile {
            path: format!(
                "/etc/NetworkManager/system-connections/{}.nmconnection",
                bond_name
            ),
            content: bond_content,
            mode: 0o600,
        });

        // Generate slave configurations for each interface
        for switch_port in &config.switch_ports {
            let interface_name = &switch_port.interface.name;
            let slave_id = format!("{}-{}", bond_name, interface_name);
            let slave_uuid = uuid::Uuid::new_v4().to_string();

            let slave_template = BondSlaveConfigTemplate {
                slave_id: slave_id.clone(),
                slave_uuid,
                interface_name: interface_name.clone(),
                bond_name: bond_name.to_string(),
            };

            let slave_content = slave_template.render().map_err(|e| {
                NetworkError::new(format!(
                    "Failed to render slave configuration template for interface '{}': {}",
                    interface_name, e
                ))
            })?;

            files.push(NodeFile {
                path: format!(
                    "/etc/NetworkManager/system-connections/{}.nmconnection",
                    slave_id
                ),
                content: slave_content,
                mode: 0o600,
            });
        }

        debug!(
            "Generated {} NetworkManager configuration files for host '{}'",
            files.len(),
            hostname
        );

        Ok(files)
    }
}
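The rendered template output is not part of this diff; for illustration only, a NetworkManager keyfile for an LACP bond master typically looks like the following (all field values here are assumptions, not the actual `BondConfigTemplate` output):

```ini
# Illustrative bond0.nmconnection (assumed content, not the real template)
[connection]
id=bond0
uuid=1f2e3d4c-0000-0000-0000-000000000000
type=bond
interface-name=bond0

[bond]
# 802.3ad is the NetworkManager mode name for LACP bonding
mode=802.3ad

[ipv4]
method=auto

[ipv6]
method=auto
```

The 0o600 file mode above matches NetworkManager's requirement that connection profiles not be world-readable.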
@@ -142,9 +142,13 @@ impl HostNetworkConfigurationInterpret {
         );

         info!("[Host {current_host}/{total_hosts}] Configuring host network...");
-        topology.configure_bond(&config).await.map_err(|e| {
-            InterpretError::new(format!("Failed to configure host network: {e}"))
-        })?;
+        topology
+            .configure_bond_on_primary_interface(&config)
+            .await
+            .map_err(|e| {
+                InterpretError::new(format!("Failed to configure host network: {e}"))
+            })?;

         topology
             .configure_port_channel(&config)
             .await
@@ -731,6 +735,16 @@ mod tests {

         Ok(())
     }
+
+    async fn configure_bond_on_primary_interface(
+        &self,
+        config: &HostNetworkConfig,
+    ) -> Result<(), NetworkError> {
+        let mut configured_bonds = self.configured_bonds.lock().unwrap();
+        configured_bonds.push((config.host_id.clone(), config.clone()));
+
+        Ok(())
+    }
 }

 #[async_trait]
@@ -68,7 +68,7 @@ impl<'a> DhcpConfigDnsMasq<'a> {
     ///
     /// This function implements specific logic to handle existing entries:
     /// - If no host exists for the given IP or hostname, a new entry is created.
-    /// - If exactly one host exists for the IP and/or hostname, the new MAC is appended to it.
+    /// - If exactly one host exists for the IP and/or hostname, the new MAC is set; old MAC addresses are dropped.
     /// - It will error if the IP and hostname exist but point to two different host entries,
     ///   as this represents an unresolvable conflict.
     /// - It will also error if multiple entries are found for the IP or hostname, indicating an
@@ -146,40 +146,24 @@ impl<'a> DhcpConfigDnsMasq<'a> {
                 let host_to_modify_ip = host_to_modify.ip.content_string();
                 if host_to_modify_ip != ip_str {
                     warn!(
-                        "Hostname '{}' already exists with a different IP ({}). Setting new IP {ip_str}. Appending MAC {}.",
-                        hostname, host_to_modify_ip, mac_list
+                        "Hostname '{}' already exists with a different IP ({}). Setting new IP {ip_str}.",
+                        hostname, host_to_modify_ip,
                     );
                     host_to_modify.ip.content = Some(ip_str);
                 } else if host_to_modify.host != hostname {
                     warn!(
-                        "IP {} already exists with a different hostname ('{}'). Setting hostname to {hostname}. Appending MAC {}.",
-                        ipaddr, host_to_modify.host, mac_list
+                        "IP {} already exists with a different hostname ('{}'). Setting hostname to {hostname}",
+                        ipaddr, host_to_modify.host
                     );
                     host_to_modify.host = hostname.to_string();
                 }

-                for single_mac in mac.iter() {
-                    if !host_to_modify
-                        .hwaddr
-                        .content_string()
-                        .split(',')
-                        .any(|m| m.eq_ignore_ascii_case(single_mac))
-                    {
-                        info!(
-                            "Appending MAC {} to existing static host for {} ({})",
-                            single_mac, host_to_modify.host, host_to_modify_ip
-                        );
-                        let mut updated_macs = host_to_modify.hwaddr.content_string().to_string();
-                        updated_macs.push(',');
-                        updated_macs.push_str(single_mac);
-                        host_to_modify.hwaddr.content = updated_macs.into();
-                    } else {
-                        debug!(
-                            "MAC {} already present in static host entry for {} ({}). No changes made.",
-                            single_mac, host_to_modify.host, host_to_modify_ip
-                        );
-                    }
-                }
+                info!(
+                    "Replacing previous MAC addresses {:?} with new {}",
+                    host_to_modify.hwaddr, mac_list
+                );
+                host_to_modify.hwaddr.content = Some(mac_list);
             }
             _ => {
                 return Err(DhcpError::Configuration(format!(
@@ -397,7 +381,7 @@ mod test {
     }

     #[test]
-    fn test_add_mac_to_existing_host_by_ip_and_hostname() {
+    fn test_replace_mac_on_existing_host_by_ip_and_hostname() {
         let initial_host = create_host(
             "uuid-1",
             "existing-host",
@@ -416,14 +400,11 @@ mod test {
         let hosts = &dhcp_config.opnsense.dnsmasq.as_ref().unwrap().hosts;
         assert_eq!(hosts.len(), 1);
         let host = &hosts[0];
-        assert_eq!(
-            host.hwaddr.content_string(),
-            "AA:BB:CC:DD:EE:FF,00:11:22:33:44:55"
-        );
+        assert_eq!(host.hwaddr.content_string(), "00:11:22:33:44:55");
     }

     #[test]
-    fn test_add_mac_to_existing_host_by_ip_only() {
+    fn test_replace_mac_on_existing_host_by_ip_only() {
         let initial_host = create_host(
             "uuid-1",
             "existing-host",
@@ -443,10 +424,7 @@ mod test {
         let hosts = &dhcp_config.opnsense.dnsmasq.as_ref().unwrap().hosts;
         assert_eq!(hosts.len(), 1);
         let host = &hosts[0];
-        assert_eq!(
-            host.hwaddr.content_string(),
-            "AA:BB:CC:DD:EE:FF,00:11:22:33:44:55"
-        );
+        assert_eq!(host.hwaddr.content_string(), "00:11:22:33:44:55");
         assert_eq!(host.host, new_hostname); // hostname should be updated
     }
@@ -474,10 +452,7 @@ mod test {
         let hosts = &dhcp_config.opnsense.dnsmasq.as_ref().unwrap().hosts;
         assert_eq!(hosts.len(), 1);
         let host = &hosts[0];
-        assert_eq!(
-            host.hwaddr.content_string(),
-            "AA:BB:CC:DD:EE:FF,00:11:22:33:44:55"
-        );
+        assert_eq!(host.hwaddr.content_string(), "00:11:22:33:44:55");
         assert_eq!(host.ip.content_string(), "192.168.1.99"); // Original IP should be preserved.
     }
Deleting resources in sequential reverse order can deadlock the deletion process. One possible solution is to launch all deletions in parallel.