Compare commits

..

31 Commits

Author SHA1 Message Date
7b542c9865 feat: OPNSense Topology useful to interact with only an opnsense instance.
All checks were successful
Run Check Script / check (pull_request) Successful in 1m11s
With this work, no need to initialize a full HAClusterTopology to run
opnsense scores.

Also added an example showing how to use it and perform basic
operations.

Made a video out of it, might publish it at some point!
2025-11-05 10:02:45 -05:00
c80ede706b fix(host_network): adjust bond & port-channel configuration (partial) (#175)
Some checks failed
Run Check Script / check (push) Successful in 1m20s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m21s
## Description
* Replace the CatalogSource approach to install the OperatorHub.io catalog by a more simple & straightforward way to install NMState
* Improve logging
* Add report summarizing the host network configuration that was applied (which host, bonds, port-channels)
* Fix command to find next available port channel id

## Extra info
Using the `apply_url` approach to install the NMState operator isn't the best approach: it's harder to maintain and upgrade. But it helps us achieve waht we wanted for now: install the NMState Operator to configure bonds on a host.

The preferred approach, installing an operator from the OperatorHub.io catalog, didn't work for now. We had a timeout error with DeadlineExceeded probably caused by an insufficient CPU/Memory allocation to query such a big catalog, even though we tweaked the RAM allocation (we couldn't find a way to do it for CPU).

Spent too much time on this so we stopped these efforts for now. It would be good to get back to it when we need to install something else from a custom catalog.

Reviewed-on: #175
2025-10-29 17:09:16 +00:00
b2825ec1ef Merge pull request 'feat/impl_installable_crd_prometheus' (#170) from feat/impl_installable_crd_prometheus into master
Some checks failed
Run Check Script / check (push) Successful in 1m25s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m20s
Reviewed-on: #170
2025-10-24 16:42:54 +00:00
609d7acb5d feat: impl clone_box for ScrapeTarget<CRDPrometheus>
All checks were successful
Run Check Script / check (pull_request) Successful in 1m25s
2025-10-24 12:05:54 -04:00
de761cf538 Merge branch 'master' into feat/impl_installable_crd_prometheus 2025-10-24 11:23:56 -04:00
c069207f12 Merge pull request 'refactor(ha_cluster): inject switch client for better testability' (#174) from switch-client into master
Some checks failed
Run Check Script / check (push) Successful in 1m44s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m43s
Reviewed-on: #174
2025-10-23 15:05:17 +00:00
Ian Letourneau
7368184917 fix(ha_cluster): inject switch client for better testability
All checks were successful
Run Check Script / check (pull_request) Successful in 1m30s
2025-10-22 15:12:53 -04:00
05205f4ac1 Merge pull request 'feat: scrape targets to be able to get snmp alerts from machines to prometheus' (#171) from feat/scrape_target into master
Some checks are pending
Run Check Script / check (push) Waiting to run
Compile and package harmony_composer / package_harmony_composer (push) Waiting to run
Reviewed-on: #171
2025-10-22 15:33:24 +00:00
3174645c97 Merge branch 'master' into feat/scrape_target
All checks were successful
Run Check Script / check (pull_request) Successful in 1m32s
2025-10-22 15:33:01 +00:00
7536f4ec4b Merge pull request 'fix: fixed merge error that somehow got missed' (#172) from fix/merge_error into master
Some checks are pending
Run Check Script / check (push) Waiting to run
Compile and package harmony_composer / package_harmony_composer (push) Waiting to run
Reviewed-on: #172
2025-10-21 16:02:39 +00:00
464347d3e5 fix: fixed merge error that somehow got missed
Some checks failed
Run Check Script / check (pull_request) Has been cancelled
2025-10-21 12:01:31 -04:00
7f415f5b98 Merge pull request 'feat: K8sFlavour' (#161) from feat/detect_k8s_flavour into master
Some checks are pending
Run Check Script / check (push) Waiting to run
Compile and package harmony_composer / package_harmony_composer (push) Waiting to run
Reviewed-on: #161
2025-10-21 15:56:47 +00:00
2a520a1d7c Merge branch 'master' into feat/detect_k8s_flavour
Some checks failed
Run Check Script / check (pull_request) Has been cancelled
2025-10-21 15:56:18 +00:00
987f195e2f feat(cert-manager): add cluster issuer to okd cluster score (#157)
Some checks are pending
Run Check Script / check (push) Waiting to run
Compile and package harmony_composer / package_harmony_composer (push) Waiting to run
added score to install okd cluster issuer

Reviewed-on: #157
2025-10-21 15:55:55 +00:00
14d1823d15 fix: remove ceph osd deletes and purges osd from ceph osd tree\ (#120)
Some checks are pending
Run Check Script / check (push) Waiting to run
Compile and package harmony_composer / package_harmony_composer (push) Waiting to run
k8s returns None rather than zero when checking deployment for replicas
exec_app requires commands 's' and '-c' to run correctly

Reviewed-on: #120
Co-authored-by: Willem <wrolleman@nationtech.io>
Co-committed-by: Willem <wrolleman@nationtech.io>
2025-10-21 15:54:51 +00:00
2a48d51479 fix: naming of k8s distribution
Some checks failed
Run Check Script / check (pull_request) Has been cancelled
2025-10-21 11:09:45 -04:00
20a227bb41 Merge branch 'master' into feat/detect_k8s_flavour
Some checks failed
Run Check Script / check (pull_request) Has been cancelled
2025-10-21 15:02:15 +00:00
ce91ee0168 fix: removed dead code, mapped error from grafana operator to preparation error rather than ignoring it, modified k8sprometheus score to unwrap_or_default() service monitors
Some checks failed
Run Check Script / check (pull_request) Has been cancelled
2025-10-20 15:31:06 -04:00
cb66b7592e fix: made targets plural and changed scrape targets to option in AlertingInterpret
Some checks failed
Run Check Script / check (pull_request) Has been cancelled
2025-10-20 14:44:37 -04:00
a815f6ac9c feat: scrape targets to be able to get snmp alerts from machines to prometheus
Some checks failed
Run Check Script / check (pull_request) Has been cancelled
2025-10-20 11:44:11 -04:00
c0d54a4466 Merge remote-tracking branch 'origin/master' into feat/impl_installable_crd_prometheus
Some checks failed
Run Check Script / check (pull_request) Has been cancelled
2025-10-16 14:17:32 -04:00
fc384599a1 feat: implementation of Installable for CRDPrometheusIntroduction of Grafana trait and its impl for k8sanywhereallows for CRDPrometheus to be installed via AlertingInterpret which standardizes the installation of alert receivers, alerting rules, and alert senders 2025-10-16 14:07:23 -04:00
7dff70edcf wip: fixed token expiration and configured grafana dashboard 2025-10-15 15:26:36 -04:00
06a0c44c3c wip: connected the thanos-datasource to grafana, need to complete connecting the openshift-userworkload-monitoring as well 2025-10-14 15:53:42 -04:00
85bec66e58 wip: fixing grafana datasource for openshift which requires creating a token, sa, secret and inserting them into the grafanadatasource 2025-10-10 12:09:26 -04:00
1f3796f503 refactor(prometheus): modified crd prometheus to impl the installable trait 2025-10-09 12:26:05 -04:00
5f78300d78 Merge branch 'master' into feat/detect_k8s_flavour
All checks were successful
Run Check Script / check (pull_request) Successful in 1m20s
2025-10-02 17:14:30 -04:00
2d3c32469c chore: Simplify k8s flavour detection algorithm and do not unwrap when it cannot be detected, just return Err 2025-09-30 22:59:50 -04:00
1cec398d4d fix: modifed naming scheme to OpenshiftFamily, K3sFamily, and defaultswitched discovery of openshiftfamily to look for projet.openshift.io 2025-09-29 11:29:34 -04:00
58b6268989 wip: moving the install steps for grafana and prometheus into the trait installable<T> 2025-09-29 10:46:29 -04:00
f073b7e5fb feat:added k8s flavour to k8s_aywhere topology to be able to get the type of cluster
All checks were successful
Run Check Script / check (pull_request) Successful in 33s
2025-09-24 13:28:46 -04:00
66 changed files with 2229 additions and 570 deletions

18
Cargo.lock generated
View File

@@ -1780,6 +1780,7 @@ dependencies = [
name = "example-nanodc"
version = "0.1.0"
dependencies = [
"brocade",
"cidr",
"env_logger",
"harmony",
@@ -1788,6 +1789,7 @@ dependencies = [
"harmony_tui",
"harmony_types",
"log",
"serde",
"tokio",
"url",
]
@@ -1806,6 +1808,7 @@ dependencies = [
name = "example-okd-install"
version = "0.1.0"
dependencies = [
"brocade",
"cidr",
"env_logger",
"harmony",
@@ -1836,13 +1839,16 @@ dependencies = [
name = "example-opnsense"
version = "0.1.0"
dependencies = [
"brocade",
"cidr",
"env_logger",
"harmony",
"harmony_macros",
"harmony_secret",
"harmony_tui",
"harmony_types",
"log",
"serde",
"tokio",
"url",
]
@@ -1851,6 +1857,7 @@ dependencies = [
name = "example-pxe"
version = "0.1.0"
dependencies = [
"brocade",
"cidr",
"env_logger",
"harmony",
@@ -1865,6 +1872,15 @@ dependencies = [
"url",
]
[[package]]
name = "example-remove-rook-osd"
version = "0.1.0"
dependencies = [
"harmony",
"harmony_cli",
"tokio",
]
[[package]]
name = "example-rust"
version = "0.1.0"
@@ -1918,8 +1934,6 @@ dependencies = [
"env_logger",
"harmony",
"harmony_macros",
"harmony_secret",
"harmony_secret_derive",
"harmony_tui",
"harmony_types",
"log",

View File

@@ -10,6 +10,7 @@ use log::{debug, info};
use regex::Regex;
use std::{collections::HashSet, str::FromStr};
#[derive(Debug)]
pub struct FastIronClient {
shell: BrocadeShell,
version: BrocadeInfo,

View File

@@ -31,6 +31,7 @@ pub struct BrocadeOptions {
pub struct TimeoutConfig {
pub shell_ready: Duration,
pub command_execution: Duration,
pub command_output: Duration,
pub cleanup: Duration,
pub message_wait: Duration,
}
@@ -40,6 +41,7 @@ impl Default for TimeoutConfig {
Self {
shell_ready: Duration::from_secs(10),
command_execution: Duration::from_secs(60), // Commands like `deploy` (for a LAG) can take a while
command_output: Duration::from_secs(5), // Delay to start logging "waiting for command output"
cleanup: Duration::from_secs(10),
message_wait: Duration::from_millis(500),
}
@@ -162,7 +164,7 @@ pub async fn init(
}
#[async_trait]
pub trait BrocadeClient {
pub trait BrocadeClient: std::fmt::Debug {
/// Retrieves the operating system and version details from the connected Brocade switch.
///
/// This is typically the first call made after establishing a connection to determine

View File

@@ -3,6 +3,7 @@ use std::str::FromStr;
use async_trait::async_trait;
use harmony_types::switch::{PortDeclaration, PortLocation};
use log::{debug, info};
use regex::Regex;
use crate::{
BrocadeClient, BrocadeInfo, Error, ExecutionMode, InterSwitchLink, InterfaceInfo,
@@ -10,6 +11,7 @@ use crate::{
parse_brocade_mac_address, shell::BrocadeShell,
};
#[derive(Debug)]
pub struct NetworkOperatingSystemClient {
shell: BrocadeShell,
version: BrocadeInfo,
@@ -102,13 +104,37 @@ impl NetworkOperatingSystemClient {
};
Some(Ok(InterfaceInfo {
name: format!("{} {}", interface_type, port_location),
name: format!("{interface_type} {port_location}"),
port_location,
interface_type,
operating_mode,
status,
}))
}
fn map_configure_interfaces_error(&self, err: Error) -> Error {
debug!("[Brocade] {err}");
if let Error::CommandError(message) = &err {
if message.contains("switchport")
&& message.contains("Cannot configure aggregator member")
{
let re = Regex::new(r"\(conf-if-([a-zA-Z]+)-([\d/]+)\)#").unwrap();
if let Some(caps) = re.captures(message) {
let interface_type = &caps[1];
let port_location = &caps[2];
let interface = format!("{interface_type} {port_location}");
return Error::CommandError(format!(
"Cannot configure interface '{interface}', it is a member of a port-channel (LAG)"
));
}
}
}
err
}
}
#[async_trait]
@@ -196,11 +222,10 @@ impl BrocadeClient for NetworkOperatingSystemClient {
commands.push("exit".into());
}
commands.push("write memory".into());
self.shell
.run_commands(commands, ExecutionMode::Regular)
.await?;
.await
.map_err(|err| self.map_configure_interfaces_error(err))?;
info!("[Brocade] Interfaces configured.");
@@ -212,7 +237,7 @@ impl BrocadeClient for NetworkOperatingSystemClient {
let output = self
.shell
.run_command("show port-channel", ExecutionMode::Regular)
.run_command("show port-channel summary", ExecutionMode::Regular)
.await?;
let used_ids: Vec<u8> = output
@@ -247,7 +272,12 @@ impl BrocadeClient for NetworkOperatingSystemClient {
ports: &[PortLocation],
) -> Result<(), Error> {
info!(
"[Brocade] Configuring port-channel '{channel_name} {channel_id}' with ports: {ports:?}"
"[Brocade] Configuring port-channel '{channel_id} {channel_name}' with ports: {}",
ports
.iter()
.map(|p| format!("{p}"))
.collect::<Vec<String>>()
.join(", ")
);
let interfaces = self.get_interfaces().await?;
@@ -275,8 +305,6 @@ impl BrocadeClient for NetworkOperatingSystemClient {
commands.push("exit".into());
}
commands.push("write memory".into());
self.shell
.run_commands(commands, ExecutionMode::Regular)
.await?;
@@ -293,7 +321,6 @@ impl BrocadeClient for NetworkOperatingSystemClient {
"configure terminal".into(),
format!("no interface port-channel {}", channel_name),
"exit".into(),
"write memory".into(),
];
self.shell

View File

@@ -13,6 +13,7 @@ use log::info;
use russh::ChannelMsg;
use tokio::time::timeout;
#[derive(Debug)]
pub struct BrocadeShell {
ip: IpAddr,
port: u16,
@@ -210,7 +211,7 @@ impl BrocadeSession {
let mut output = Vec::new();
let start = Instant::now();
let read_timeout = Duration::from_millis(500);
let log_interval = Duration::from_secs(3);
let log_interval = Duration::from_secs(5);
let mut last_log = Instant::now();
loop {
@@ -220,7 +221,9 @@ impl BrocadeSession {
));
}
if start.elapsed() > Duration::from_secs(5) && last_log.elapsed() > log_interval {
if start.elapsed() > self.options.timeouts.command_output
&& last_log.elapsed() > log_interval
{
info!("[Brocade] Waiting for command output...");
last_log = Instant::now();
}
@@ -275,7 +278,7 @@ impl BrocadeSession {
let output_lower = output.to_lowercase();
if ERROR_PATTERNS.iter().any(|&p| output_lower.contains(p)) {
return Err(Error::CommandError(format!(
"Command '{command}' failed: {}",
"Command error: {}",
output.trim()
)));
}

View File

@@ -0,0 +1,21 @@
[package]
name = "example-ha-cluster"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
publish = false
[dependencies]
harmony = { path = "../../harmony" }
harmony_tui = { path = "../../harmony_tui" }
harmony_types = { path = "../../harmony_types" }
cidr = { workspace = true }
tokio = { workspace = true }
harmony_macros = { path = "../../harmony_macros" }
log = { workspace = true }
env_logger = { workspace = true }
url = { workspace = true }
harmony_secret = { path = "../../harmony_secret" }
brocade = { path = "../../brocade" }
serde = { workspace = true }

View File

@@ -0,0 +1,15 @@
## OPNSense demo
Download the virtualbox snapshot from {{TODO URL}}
Start the virtualbox image
This virtualbox image is configured to use a bridge on the host's physical interface, make sure the bridge is up and the virtual machine can reach internet.
Credentials are opnsense default (root/opnsense)
Run the project with the correct ip address on the command line :
```bash
cargo run -p example-opnsense -- 192.168.5.229
```

View File

@@ -0,0 +1,141 @@
use std::{
net::{IpAddr, Ipv4Addr},
sync::Arc,
};
use brocade::BrocadeOptions;
use cidr::Ipv4Cidr;
use harmony::{
hardware::{HostCategory, Location, PhysicalHost, SwitchGroup},
infra::{brocade::BrocadeSwitchClient, opnsense::OPNSenseManagementInterface},
inventory::Inventory,
modules::{
dummy::{ErrorScore, PanicScore, SuccessScore},
http::StaticFilesHttpScore,
okd::{dhcp::OKDDhcpScore, dns::OKDDnsScore, load_balancer::OKDLoadBalancerScore},
opnsense::OPNsenseShellCommandScore,
tftp::TftpScore,
},
topology::{LogicalHost, UnmanagedRouter},
};
use harmony_macros::{ip, mac_address};
use harmony_secret::{Secret, SecretManager};
use harmony_types::net::Url;
use serde::{Deserialize, Serialize};
#[tokio::main]
async fn main() {
let firewall = harmony::topology::LogicalHost {
ip: ip!("192.168.5.229"),
name: String::from("opnsense-1"),
};
let switch_auth = SecretManager::get_or_prompt::<BrocadeSwitchAuth>()
.await
.expect("Failed to get credentials");
let switches: Vec<IpAddr> = vec![ip!("192.168.5.101")]; // TODO: Adjust me
let brocade_options = Some(BrocadeOptions {
dry_run: *harmony::config::DRY_RUN,
..Default::default()
});
let switch_client = BrocadeSwitchClient::init(
&switches,
&switch_auth.username,
&switch_auth.password,
brocade_options,
)
.await
.expect("Failed to connect to switch");
let switch_client = Arc::new(switch_client);
let opnsense = Arc::new(
harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, "root", "opnsense").await,
);
let lan_subnet = Ipv4Addr::new(10, 100, 8, 0);
let gateway_ipv4 = Ipv4Addr::new(10, 100, 8, 1);
let gateway_ip = IpAddr::V4(gateway_ipv4);
let topology = harmony::topology::HAClusterTopology {
kubeconfig: None,
domain_name: "demo.harmony.mcd".to_string(),
router: Arc::new(UnmanagedRouter::new(
gateway_ip,
Ipv4Cidr::new(lan_subnet, 24).unwrap(),
)),
load_balancer: opnsense.clone(),
firewall: opnsense.clone(),
tftp_server: opnsense.clone(),
http_server: opnsense.clone(),
dhcp_server: opnsense.clone(),
dns_server: opnsense.clone(),
control_plane: vec![LogicalHost {
ip: ip!("10.100.8.20"),
name: "cp0".to_string(),
}],
bootstrap_host: LogicalHost {
ip: ip!("10.100.8.20"),
name: "cp0".to_string(),
},
workers: vec![],
switch_client: switch_client.clone(),
};
let inventory = Inventory {
location: Location::new(
"232 des Éperviers, Wendake, Qc, G0A 4V0".to_string(),
"wk".to_string(),
),
switch: SwitchGroup::from([]),
firewall_mgmt: Box::new(OPNSenseManagementInterface::new()),
storage_host: vec![],
worker_host: vec![],
control_plane_host: vec![
PhysicalHost::empty(HostCategory::Server)
.mac_address(mac_address!("08:00:27:62:EC:C3")),
],
};
// TODO regroup smaller scores in a larger one such as this
// let okd_boostrap_preparation();
let dhcp_score = OKDDhcpScore::new(&topology, &inventory);
let dns_score = OKDDnsScore::new(&topology);
let load_balancer_score = OKDLoadBalancerScore::new(&topology);
let tftp_score = TftpScore::new(Url::LocalFolder("./data/watchguard/tftpboot".to_string()));
let http_score = StaticFilesHttpScore {
folder_to_serve: Some(Url::LocalFolder(
"./data/watchguard/pxe-http-files".to_string(),
)),
files: vec![],
remote_path: None,
};
harmony_tui::run(
inventory,
topology,
vec![
Box::new(dns_score),
Box::new(dhcp_score),
Box::new(load_balancer_score),
Box::new(tftp_score),
Box::new(http_score),
Box::new(OPNsenseShellCommandScore {
opnsense: opnsense.get_opnsense_config(),
command: "touch /tmp/helloharmonytouching".to_string(),
}),
Box::new(SuccessScore {}),
Box::new(ErrorScore {}),
Box::new(PanicScore {}),
],
)
.await
.unwrap();
}
#[derive(Secret, Serialize, Deserialize, Debug)]
pub struct BrocadeSwitchAuth {
pub username: String,
pub password: String,
}

View File

@@ -17,3 +17,5 @@ harmony_secret = { path = "../../harmony_secret" }
log = { workspace = true }
env_logger = { workspace = true }
url = { workspace = true }
serde = { workspace = true }
brocade = { path = "../../brocade" }

View File

@@ -3,12 +3,13 @@ use std::{
sync::Arc,
};
use brocade::BrocadeOptions;
use cidr::Ipv4Cidr;
use harmony::{
config::secret::SshKeyPair,
data::{FileContent, FilePath},
hardware::{HostCategory, Location, PhysicalHost, SwitchGroup},
infra::opnsense::OPNSenseManagementInterface,
infra::{brocade::BrocadeSwitchClient, opnsense::OPNSenseManagementInterface},
inventory::Inventory,
modules::{
http::StaticFilesHttpScore,
@@ -22,8 +23,9 @@ use harmony::{
topology::{LogicalHost, UnmanagedRouter},
};
use harmony_macros::{ip, mac_address};
use harmony_secret::SecretManager;
use harmony_secret::{Secret, SecretManager};
use harmony_types::net::Url;
use serde::{Deserialize, Serialize};
#[tokio::main]
async fn main() {
@@ -32,6 +34,26 @@ async fn main() {
name: String::from("fw0"),
};
let switch_auth = SecretManager::get_or_prompt::<BrocadeSwitchAuth>()
.await
.expect("Failed to get credentials");
let switches: Vec<IpAddr> = vec![ip!("192.168.33.101")];
let brocade_options = Some(BrocadeOptions {
dry_run: *harmony::config::DRY_RUN,
..Default::default()
});
let switch_client = BrocadeSwitchClient::init(
&switches,
&switch_auth.username,
&switch_auth.password,
brocade_options,
)
.await
.expect("Failed to connect to switch");
let switch_client = Arc::new(switch_client);
let opnsense = Arc::new(
harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, "root", "opnsense").await,
);
@@ -39,6 +61,7 @@ async fn main() {
let gateway_ipv4 = Ipv4Addr::new(192, 168, 33, 1);
let gateway_ip = IpAddr::V4(gateway_ipv4);
let topology = harmony::topology::HAClusterTopology {
kubeconfig: None,
domain_name: "ncd0.harmony.mcd".to_string(), // TODO this must be set manually correctly
// when setting up the opnsense firewall
router: Arc::new(UnmanagedRouter::new(
@@ -83,7 +106,7 @@ async fn main() {
name: "wk2".to_string(),
},
],
switch: vec![],
switch_client: switch_client.clone(),
};
let inventory = Inventory {
@@ -166,3 +189,9 @@ async fn main() {
.await
.unwrap();
}
#[derive(Secret, Serialize, Deserialize, Debug)]
pub struct BrocadeSwitchAuth {
pub username: String,
pub password: String,
}

View File

@@ -19,3 +19,4 @@ log = { workspace = true }
env_logger = { workspace = true }
url = { workspace = true }
serde.workspace = true
brocade = { path = "../../brocade" }

View File

@@ -1,7 +1,8 @@
use brocade::BrocadeOptions;
use cidr::Ipv4Cidr;
use harmony::{
hardware::{Location, SwitchGroup},
infra::opnsense::OPNSenseManagementInterface,
infra::{brocade::BrocadeSwitchClient, opnsense::OPNSenseManagementInterface},
inventory::Inventory,
topology::{HAClusterTopology, LogicalHost, UnmanagedRouter},
};
@@ -22,6 +23,26 @@ pub async fn get_topology() -> HAClusterTopology {
name: String::from("opnsense-1"),
};
let switch_auth = SecretManager::get_or_prompt::<BrocadeSwitchAuth>()
.await
.expect("Failed to get credentials");
let switches: Vec<IpAddr> = vec![ip!("192.168.1.101")]; // TODO: Adjust me
let brocade_options = Some(BrocadeOptions {
dry_run: *harmony::config::DRY_RUN,
..Default::default()
});
let switch_client = BrocadeSwitchClient::init(
&switches,
&switch_auth.username,
&switch_auth.password,
brocade_options,
)
.await
.expect("Failed to connect to switch");
let switch_client = Arc::new(switch_client);
let config = SecretManager::get_or_prompt::<OPNSenseFirewallConfig>().await;
let config = config.unwrap();
@@ -38,6 +59,7 @@ pub async fn get_topology() -> HAClusterTopology {
let gateway_ipv4 = ipv4!("192.168.1.1");
let gateway_ip = IpAddr::V4(gateway_ipv4);
harmony::topology::HAClusterTopology {
kubeconfig: None,
domain_name: "demo.harmony.mcd".to_string(),
router: Arc::new(UnmanagedRouter::new(
gateway_ip,
@@ -58,7 +80,7 @@ pub async fn get_topology() -> HAClusterTopology {
name: "bootstrap".to_string(),
},
workers: vec![],
switch: vec![],
switch_client: switch_client.clone(),
}
}
@@ -75,3 +97,9 @@ pub fn get_inventory() -> Inventory {
control_plane_host: vec![],
}
}
#[derive(Secret, Serialize, Deserialize, Debug)]
pub struct BrocadeSwitchAuth {
pub username: String,
pub password: String,
}

View File

@@ -19,3 +19,4 @@ log = { workspace = true }
env_logger = { workspace = true }
url = { workspace = true }
serde.workspace = true
brocade = { path = "../../brocade" }

View File

@@ -1,13 +1,15 @@
use brocade::BrocadeOptions;
use cidr::Ipv4Cidr;
use harmony::{
config::secret::OPNSenseFirewallCredentials,
hardware::{Location, SwitchGroup},
infra::opnsense::OPNSenseManagementInterface,
infra::{brocade::BrocadeSwitchClient, opnsense::OPNSenseManagementInterface},
inventory::Inventory,
topology::{HAClusterTopology, LogicalHost, UnmanagedRouter},
};
use harmony_macros::{ip, ipv4};
use harmony_secret::SecretManager;
use harmony_secret::{Secret, SecretManager};
use serde::{Deserialize, Serialize};
use std::{net::IpAddr, sync::Arc};
pub async fn get_topology() -> HAClusterTopology {
@@ -16,6 +18,26 @@ pub async fn get_topology() -> HAClusterTopology {
name: String::from("opnsense-1"),
};
let switch_auth = SecretManager::get_or_prompt::<BrocadeSwitchAuth>()
.await
.expect("Failed to get credentials");
let switches: Vec<IpAddr> = vec![ip!("192.168.1.101")]; // TODO: Adjust me
let brocade_options = Some(BrocadeOptions {
dry_run: *harmony::config::DRY_RUN,
..Default::default()
});
let switch_client = BrocadeSwitchClient::init(
&switches,
&switch_auth.username,
&switch_auth.password,
brocade_options,
)
.await
.expect("Failed to connect to switch");
let switch_client = Arc::new(switch_client);
let config = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>().await;
let config = config.unwrap();
@@ -32,6 +54,7 @@ pub async fn get_topology() -> HAClusterTopology {
let gateway_ipv4 = ipv4!("192.168.1.1");
let gateway_ip = IpAddr::V4(gateway_ipv4);
harmony::topology::HAClusterTopology {
kubeconfig: None,
domain_name: "demo.harmony.mcd".to_string(),
router: Arc::new(UnmanagedRouter::new(
gateway_ip,
@@ -52,7 +75,7 @@ pub async fn get_topology() -> HAClusterTopology {
name: "cp0".to_string(),
},
workers: vec![],
switch: vec![],
switch_client: switch_client.clone(),
}
}
@@ -69,3 +92,9 @@ pub fn get_inventory() -> Inventory {
control_plane_host: vec![],
}
}
#[derive(Secret, Serialize, Deserialize, Debug)]
pub struct BrocadeSwitchAuth {
pub username: String,
pub password: String,
}

View File

@@ -8,7 +8,7 @@ publish = false
[dependencies]
harmony = { path = "../../harmony" }
harmony_tui = { path = "../../harmony_tui" }
harmony_cli = { path = "../../harmony_cli" }
harmony_types = { path = "../../harmony_types" }
cidr = { workspace = true }
tokio = { workspace = true }
@@ -16,3 +16,6 @@ harmony_macros = { path = "../../harmony_macros" }
log = { workspace = true }
env_logger = { workspace = true }
url = { workspace = true }
harmony_secret = { path = "../../harmony_secret" }
brocade = { path = "../../brocade" }
serde = { workspace = true }

3
examples/opnsense/env.sh Normal file
View File

@@ -0,0 +1,3 @@
export HARMONY_SECRET_NAMESPACE=example-opnsense
export HARMONY_SECRET_STORE=file
export RUST_LOG=info

View File

@@ -1,111 +1,77 @@
use std::{
net::{IpAddr, Ipv4Addr},
sync::Arc,
};
use cidr::Ipv4Cidr;
use harmony::{
hardware::{HostCategory, Location, PhysicalHost, SwitchGroup},
infra::opnsense::OPNSenseManagementInterface,
config::secret::OPNSenseFirewallCredentials,
infra::opnsense::OPNSenseFirewall,
inventory::Inventory,
modules::{
dummy::{ErrorScore, PanicScore, SuccessScore},
http::StaticFilesHttpScore,
okd::{dhcp::OKDDhcpScore, dns::OKDDnsScore, load_balancer::OKDLoadBalancerScore},
opnsense::OPNsenseShellCommandScore,
tftp::TftpScore,
},
topology::{LogicalHost, UnmanagedRouter},
modules::{dhcp::DhcpScore, opnsense::OPNsenseShellCommandScore},
topology::LogicalHost,
};
use harmony_macros::{ip, mac_address};
use harmony_types::net::Url;
use harmony_macros::{ip, ipv4};
use harmony_secret::{Secret, SecretManager};
use serde::{Deserialize, Serialize};
#[tokio::main]
async fn main() {
let firewall = harmony::topology::LogicalHost {
ip: ip!("192.168.5.229"),
let firewall = LogicalHost {
ip: ip!("192.168.55.1"),
name: String::from("opnsense-1"),
};
let opnsense = Arc::new(
harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, "root", "opnsense").await,
);
let lan_subnet = Ipv4Addr::new(10, 100, 8, 0);
let gateway_ipv4 = Ipv4Addr::new(10, 100, 8, 1);
let gateway_ip = IpAddr::V4(gateway_ipv4);
let topology = harmony::topology::HAClusterTopology {
domain_name: "demo.harmony.mcd".to_string(),
router: Arc::new(UnmanagedRouter::new(
gateway_ip,
Ipv4Cidr::new(lan_subnet, 24).unwrap(),
)),
load_balancer: opnsense.clone(),
firewall: opnsense.clone(),
tftp_server: opnsense.clone(),
http_server: opnsense.clone(),
dhcp_server: opnsense.clone(),
dns_server: opnsense.clone(),
control_plane: vec![LogicalHost {
ip: ip!("10.100.8.20"),
name: "cp0".to_string(),
}],
bootstrap_host: LogicalHost {
ip: ip!("10.100.8.20"),
name: "cp0".to_string(),
},
workers: vec![],
switch: vec![],
};
let opnsense_auth = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>()
.await
.expect("Failed to get credentials");
let inventory = Inventory {
location: Location::new(
"232 des Éperviers, Wendake, Qc, G0A 4V0".to_string(),
"wk".to_string(),
let opnsense = OPNSenseFirewall::new(
firewall,
None,
&opnsense_auth.username,
&opnsense_auth.password,
)
.await;
let dhcp_score = DhcpScore {
dhcp_range: (
ipv4!("192.168.55.100").into(),
ipv4!("192.168.55.150").into(),
),
switch: SwitchGroup::from([]),
firewall_mgmt: Box::new(OPNSenseManagementInterface::new()),
storage_host: vec![],
worker_host: vec![],
control_plane_host: vec![
PhysicalHost::empty(HostCategory::Server)
.mac_address(mac_address!("08:00:27:62:EC:C3")),
],
host_binding: vec![],
next_server: None,
boot_filename: None,
filename: None,
filename64: None,
filenameipxe: Some("filename.ipxe".to_string()),
domain: None,
};
// let dns_score = OKDDnsScore::new(&topology);
// let load_balancer_score = OKDLoadBalancerScore::new(&topology);
//
// let tftp_score = TftpScore::new(Url::LocalFolder("./data/watchguard/tftpboot".to_string()));
// let http_score = StaticFilesHttpScore {
// folder_to_serve: Some(Url::LocalFolder(
// "./data/watchguard/pxe-http-files".to_string(),
// )),
// files: vec![],
// remote_path: None,
// };
let opnsense_config = opnsense.get_opnsense_config();
// TODO regroup smaller scores in a larger one such as this
// let okd_boostrap_preparation();
let dhcp_score = OKDDhcpScore::new(&topology, &inventory);
let dns_score = OKDDnsScore::new(&topology);
let load_balancer_score = OKDLoadBalancerScore::new(&topology);
let tftp_score = TftpScore::new(Url::LocalFolder("./data/watchguard/tftpboot".to_string()));
let http_score = StaticFilesHttpScore {
folder_to_serve: Some(Url::LocalFolder(
"./data/watchguard/pxe-http-files".to_string(),
)),
files: vec![],
remote_path: None,
};
harmony_tui::run(
inventory,
topology,
harmony_cli::run(
Inventory::autoload(),
opnsense,
vec![
Box::new(dns_score),
Box::new(dhcp_score),
Box::new(load_balancer_score),
Box::new(tftp_score),
Box::new(http_score),
Box::new(OPNsenseShellCommandScore {
opnsense: opnsense.get_opnsense_config(),
command: "touch /tmp/helloharmonytouching".to_string(),
opnsense: opnsense_config,
command: "touch /tmp/helloharmonytouching_2".to_string(),
}),
Box::new(SuccessScore {}),
Box::new(ErrorScore {}),
Box::new(PanicScore {}),
],
None,
)
.await
.unwrap();
}
#[derive(Secret, Serialize, Deserialize, Debug)]
pub struct BrocadeSwitchAuth {
pub username: String,
pub password: String,
}

View File

@@ -0,0 +1,11 @@
[package]
name = "example-remove-rook-osd"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
[dependencies]
harmony = { version = "0.1.0", path = "../../harmony" }
harmony_cli = { version = "0.1.0", path = "../../harmony_cli" }
tokio.workspace = true

View File

@@ -0,0 +1,18 @@
use harmony::{
inventory::Inventory, modules::storage::ceph::ceph_remove_osd_score::CephRemoveOsd,
topology::K8sAnywhereTopology,
};
#[tokio::main]
async fn main() {
let ceph_score = CephRemoveOsd {
osd_deployment_name: "rook-ceph-osd-2".to_string(),
rook_ceph_namespace: "rook-ceph".to_string(),
};
let topology = K8sAnywhereTopology::from_env();
let inventory = Inventory::autoload();
harmony_cli::run(inventory, topology, vec![Box::new(ceph_score)], None)
.await
.unwrap();
}

View File

@@ -3,7 +3,7 @@ use harmony::{
modules::{
application::{
ApplicationScore, RustWebFramework, RustWebapp,
features::{PackagingDeployment, rhob_monitoring::Monitoring},
features::{Monitoring, PackagingDeployment},
},
monitoring::alert_channel::discord_alert_channel::DiscordWebhook,
},

View File

@@ -30,6 +30,7 @@ pub enum InterpretName {
Lamp,
ApplicationMonitoring,
K8sPrometheusCrdAlerting,
CephRemoveOsd,
DiscoverInventoryAgent,
CephClusterHealth,
Custom(&'static str),
@@ -61,6 +62,7 @@ impl std::fmt::Display for InterpretName {
InterpretName::Lamp => f.write_str("LAMP"),
InterpretName::ApplicationMonitoring => f.write_str("ApplicationMonitoring"),
InterpretName::K8sPrometheusCrdAlerting => f.write_str("K8sPrometheusCrdAlerting"),
InterpretName::CephRemoveOsd => f.write_str("CephRemoveOsd"),
InterpretName::DiscoverInventoryAgent => f.write_str("DiscoverInventoryAgent"),
InterpretName::CephClusterHealth => f.write_str("CephClusterHealth"),
InterpretName::Custom(name) => f.write_str(name),

View File

@@ -67,16 +67,16 @@ impl<T: Topology> Maestro<T> {
}
}
pub fn register_all(&mut self, mut scores: ScoreVec<T>) {
let mut score_mut = self.scores.write().expect("Should acquire lock");
score_mut.append(&mut scores);
}
fn is_topology_initialized(&self) -> bool {
self.topology_state.status == TopologyStatus::Success
|| self.topology_state.status == TopologyStatus::Noop
}
pub fn register_all(&mut self, mut scores: ScoreVec<T>) {
let mut score_mut = self.scores.write().expect("Should acquire lock");
score_mut.append(&mut scores);
}
pub async fn interpret(&self, score: Box<dyn Score<T>>) -> Result<Outcome, InterpretError> {
if !self.is_topology_initialized() {
warn!(

View File

@@ -1,26 +1,19 @@
use async_trait::async_trait;
use brocade::BrocadeOptions;
use harmony_macros::ip;
use harmony_secret::SecretManager;
use harmony_types::{
net::{MacAddress, Url},
switch::PortLocation,
};
use k8s_openapi::api::core::v1::Namespace;
use kube::api::ObjectMeta;
use log::debug;
use log::info;
use crate::data::FileContent;
use crate::executors::ExecutorError;
use crate::hardware::PhysicalHost;
use crate::infra::brocade::BrocadeSwitchAuth;
use crate::infra::brocade::BrocadeSwitchClient;
use crate::modules::okd::crd::{
InstallPlanApproval, OperatorGroup, OperatorGroupSpec, Subscription, SubscriptionSpec,
nmstate::{self, NMState, NodeNetworkConfigurationPolicy, NodeNetworkConfigurationPolicySpec},
};
use crate::modules::okd::crd::nmstate::{self, NodeNetworkConfigurationPolicy};
use crate::topology::PxeOptions;
use crate::{data::FileContent, modules::okd::crd::nmstate::NMState};
use crate::{
executors::ExecutorError, modules::okd::crd::nmstate::NodeNetworkConfigurationPolicySpec,
};
use super::{
DHCPStaticEntry, DhcpServer, DnsRecord, DnsRecordType, DnsServer, Firewall, HostNetworkConfig,
@@ -30,7 +23,6 @@ use super::{
};
use std::collections::BTreeMap;
use std::net::IpAddr;
use std::sync::Arc;
#[derive(Debug, Clone)]
@@ -43,10 +35,11 @@ pub struct HAClusterTopology {
pub tftp_server: Arc<dyn TftpServer>,
pub http_server: Arc<dyn HttpServer>,
pub dns_server: Arc<dyn DnsServer>,
pub switch_client: Arc<dyn SwitchClient>,
pub bootstrap_host: LogicalHost,
pub control_plane: Vec<LogicalHost>,
pub workers: Vec<LogicalHost>,
pub switch: Vec<LogicalHost>,
pub kubeconfig: Option<String>,
}
#[async_trait]
@@ -65,9 +58,17 @@ impl Topology for HAClusterTopology {
#[async_trait]
impl K8sclient for HAClusterTopology {
async fn k8s_client(&self) -> Result<Arc<K8sClient>, String> {
Ok(Arc::new(
K8sClient::try_default().await.map_err(|e| e.to_string())?,
))
match &self.kubeconfig {
None => Ok(Arc::new(
K8sClient::try_default().await.map_err(|e| e.to_string())?,
)),
Some(kubeconfig) => {
let Some(client) = K8sClient::from_kubeconfig(&kubeconfig).await else {
return Err("Failed to create k8s client".to_string());
};
Ok(Arc::new(client))
}
}
}
}
@@ -93,60 +94,48 @@ impl HAClusterTopology {
}
async fn ensure_nmstate_operator_installed(&self) -> Result<(), String> {
// FIXME: Find a way to check nmstate is already available (get pod -n openshift-nmstate)
debug!("Installing NMState operator...");
let k8s_client = self.k8s_client().await?;
let nmstate_namespace = Namespace {
metadata: ObjectMeta {
name: Some("openshift-nmstate".to_string()),
finalizers: Some(vec!["kubernetes".to_string()]),
..Default::default()
},
..Default::default()
};
debug!("Creating NMState namespace: {nmstate_namespace:#?}");
k8s_client
.apply(&nmstate_namespace, None)
debug!("Installing NMState controller...");
k8s_client.apply_url(url::Url::parse("https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.84.0/nmstate.io_nmstates.yaml
").unwrap(), Some("nmstate"))
.await
.map_err(|e| e.to_string())?;
let nmstate_operator_group = OperatorGroup {
metadata: ObjectMeta {
name: Some("openshift-nmstate".to_string()),
namespace: Some("openshift-nmstate".to_string()),
..Default::default()
},
spec: OperatorGroupSpec {
target_namespaces: vec!["openshift-nmstate".to_string()],
},
};
debug!("Creating NMState operator group: {nmstate_operator_group:#?}");
k8s_client
.apply(&nmstate_operator_group, None)
debug!("Creating NMState namespace...");
k8s_client.apply_url(url::Url::parse("https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.84.0/namespace.yaml
").unwrap(), Some("nmstate"))
.await
.map_err(|e| e.to_string())?;
let nmstate_subscription = Subscription {
metadata: ObjectMeta {
name: Some("kubernetes-nmstate-operator".to_string()),
namespace: Some("openshift-nmstate".to_string()),
..Default::default()
},
spec: SubscriptionSpec {
channel: Some("stable".to_string()),
install_plan_approval: Some(InstallPlanApproval::Automatic),
name: "kubernetes-nmstate-operator".to_string(),
source: "redhat-operators".to_string(),
source_namespace: "openshift-marketplace".to_string(),
},
};
debug!("Subscribing to NMState Operator: {nmstate_subscription:#?}");
k8s_client
.apply(&nmstate_subscription, None)
debug!("Creating NMState service account...");
k8s_client.apply_url(url::Url::parse("https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.84.0/service_account.yaml
").unwrap(), Some("nmstate"))
.await
.map_err(|e| e.to_string())?;
debug!("Creating NMState role...");
k8s_client.apply_url(url::Url::parse("https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.84.0/role.yaml
").unwrap(), Some("nmstate"))
.await
.map_err(|e| e.to_string())?;
debug!("Creating NMState role binding...");
k8s_client.apply_url(url::Url::parse("https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.84.0/role_binding.yaml
").unwrap(), Some("nmstate"))
.await
.map_err(|e| e.to_string())?;
debug!("Creating NMState operator...");
k8s_client.apply_url(url::Url::parse("https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.84.0/operator.yaml
").unwrap(), Some("nmstate"))
.await
.map_err(|e| e.to_string())?;
k8s_client
.wait_until_deployment_ready("nmstate-operator", Some("nmstate"), None)
.await?;
let nmstate = NMState {
metadata: ObjectMeta {
name: Some("nmstate".to_string()),
@@ -167,11 +156,7 @@ impl HAClusterTopology {
42 // FIXME: Find a better way to declare the bond id
}
async fn configure_bond(
&self,
host: &PhysicalHost,
config: &HostNetworkConfig,
) -> Result<(), SwitchError> {
async fn configure_bond(&self, config: &HostNetworkConfig) -> Result<(), SwitchError> {
self.ensure_nmstate_operator_installed()
.await
.map_err(|e| {
@@ -180,29 +165,33 @@ impl HAClusterTopology {
))
})?;
let bond_config = self.create_bond_configuration(host, config);
debug!("Configuring bond for host {host:?}: {bond_config:#?}");
let bond_config = self.create_bond_configuration(config);
debug!(
"Applying NMState bond config for host {}: {bond_config:#?}",
config.host_id
);
self.k8s_client()
.await
.unwrap()
.apply(&bond_config, None)
.await
.unwrap();
.map_err(|e| SwitchError::new(format!("Failed to configure bond: {e}")))?;
todo!()
Ok(())
}
fn create_bond_configuration(
&self,
host: &PhysicalHost,
config: &HostNetworkConfig,
) -> NodeNetworkConfigurationPolicy {
let host_name = host.id.clone();
let host_name = &config.host_id;
let bond_id = self.get_next_bond_id();
let bond_name = format!("bond{bond_id}");
info!("Configuring bond '{bond_name}' for host '{host_name}'...");
let mut bond_mtu: Option<u32> = None;
let mut bond_mac_address: Option<String> = None;
let mut copy_mac_from: Option<String> = None;
let mut bond_ports = Vec::new();
let mut interfaces: Vec<nmstate::InterfaceSpec> = Vec::new();
@@ -228,14 +217,14 @@ impl HAClusterTopology {
..Default::default()
});
bond_ports.push(interface_name);
bond_ports.push(interface_name.clone());
// Use the first port's details for the bond mtu and mac address
if bond_mtu.is_none() {
bond_mtu = Some(switch_port.interface.mtu);
}
if bond_mac_address.is_none() {
bond_mac_address = Some(switch_port.interface.mac_address.to_string());
if copy_mac_from.is_none() {
copy_mac_from = Some(interface_name);
}
}
@@ -244,8 +233,7 @@ impl HAClusterTopology {
description: Some(format!("Network bond for host {host_name}")),
r#type: "bond".to_string(),
state: "up".to_string(),
mtu: bond_mtu,
mac_address: bond_mac_address,
copy_mac_from,
ipv4: Some(nmstate::IpStackSpec {
dhcp: Some(true),
enabled: Some(true),
@@ -280,37 +268,12 @@ impl HAClusterTopology {
}
}
async fn get_switch_client(&self) -> Result<Box<dyn SwitchClient>, SwitchError> {
let auth = SecretManager::get_or_prompt::<BrocadeSwitchAuth>()
.await
.map_err(|e| SwitchError::new(format!("Failed to get credentials: {e}")))?;
// FIXME: We assume Brocade switches
let switches: Vec<IpAddr> = self.switch.iter().map(|s| s.ip).collect();
let brocade_options = Some(BrocadeOptions {
dry_run: *crate::config::DRY_RUN,
..Default::default()
});
let client =
BrocadeSwitchClient::init(&switches, &auth.username, &auth.password, brocade_options)
.await
.map_err(|e| SwitchError::new(format!("Failed to connect to switch: {e}")))?;
Ok(Box::new(client))
}
async fn configure_port_channel(
&self,
host: &PhysicalHost,
config: &HostNetworkConfig,
) -> Result<(), SwitchError> {
async fn configure_port_channel(&self, config: &HostNetworkConfig) -> Result<(), SwitchError> {
debug!("Configuring port channel: {config:#?}");
let client = self.get_switch_client().await?;
let switch_ports = config.switch_ports.iter().map(|s| s.port.clone()).collect();
client
.configure_port_channel(&format!("Harmony_{}", host.id), switch_ports)
self.switch_client
.configure_port_channel(&format!("Harmony_{}", config.host_id), switch_ports)
.await
.map_err(|e| SwitchError::new(format!("Failed to configure switch: {e}")))?;
@@ -325,6 +288,7 @@ impl HAClusterTopology {
};
Self {
kubeconfig: None,
domain_name: "DummyTopology".to_string(),
router: dummy_infra.clone(),
load_balancer: dummy_infra.clone(),
@@ -333,10 +297,10 @@ impl HAClusterTopology {
tftp_server: dummy_infra.clone(),
http_server: dummy_infra.clone(),
dns_server: dummy_infra.clone(),
switch_client: dummy_infra.clone(),
bootstrap_host: dummy_host,
control_plane: vec![],
workers: vec![],
switch: vec![],
}
}
}
@@ -494,8 +458,7 @@ impl HttpServer for HAClusterTopology {
#[async_trait]
impl Switch for HAClusterTopology {
async fn setup_switch(&self) -> Result<(), SwitchError> {
let client = self.get_switch_client().await?;
client.setup().await?;
self.switch_client.setup().await?;
Ok(())
}
@@ -503,18 +466,13 @@ impl Switch for HAClusterTopology {
&self,
mac_address: &MacAddress,
) -> Result<Option<PortLocation>, SwitchError> {
let client = self.get_switch_client().await?;
let port = client.find_port(mac_address).await?;
let port = self.switch_client.find_port(mac_address).await?;
Ok(port)
}
async fn configure_host_network(
&self,
host: &PhysicalHost,
config: HostNetworkConfig,
) -> Result<(), SwitchError> {
self.configure_bond(host, &config).await?;
self.configure_port_channel(host, &config).await
async fn configure_host_network(&self, config: &HostNetworkConfig) -> Result<(), SwitchError> {
self.configure_bond(config).await?;
self.configure_port_channel(config).await
}
}
@@ -704,3 +662,25 @@ impl DnsServer for DummyInfra {
unimplemented!("{}", UNIMPLEMENTED_DUMMY_INFRA)
}
}
#[async_trait]
impl SwitchClient for DummyInfra {
async fn setup(&self) -> Result<(), SwitchError> {
unimplemented!("{}", UNIMPLEMENTED_DUMMY_INFRA)
}
async fn find_port(
&self,
_mac_address: &MacAddress,
) -> Result<Option<PortLocation>, SwitchError> {
unimplemented!("{}", UNIMPLEMENTED_DUMMY_INFRA)
}
async fn configure_port_channel(
&self,
_channel_name: &str,
_switch_ports: Vec<PortLocation>,
) -> Result<u8, SwitchError> {
unimplemented!("{}", UNIMPLEMENTED_DUMMY_INFRA)
}
}

View File

@@ -5,14 +5,16 @@ use k8s_openapi::{
ClusterResourceScope, NamespaceResourceScope,
api::{
apps::v1::Deployment,
core::v1::{Pod, PodStatus},
core::v1::{Pod, ServiceAccount},
},
apimachinery::pkg::version::Info,
};
use kube::{
Client, Config, Error, Resource,
Client, Config, Discovery, Error, Resource,
api::{Api, AttachParams, DeleteParams, ListParams, Patch, PatchParams, ResourceExt},
config::{KubeConfigOptions, Kubeconfig},
core::ErrorResponse,
discovery::{ApiCapabilities, Scope},
error::DiscoveryError,
runtime::reflector::Lookup,
};
@@ -21,11 +23,12 @@ use kube::{
api::{ApiResource, GroupVersionKind},
runtime::wait::await_condition,
};
use log::{debug, error, trace};
use log::{debug, error, info, trace, warn};
use serde::{Serialize, de::DeserializeOwned};
use serde_json::{Value, json};
use serde_json::json;
use similar::TextDiff;
use tokio::{io::AsyncReadExt, time::sleep};
use url::Url;
#[derive(new, Clone)]
pub struct K8sClient {
@@ -59,6 +62,22 @@ impl K8sClient {
})
}
pub async fn service_account_api(&self, namespace: &str) -> Api<ServiceAccount> {
let api: Api<ServiceAccount> = Api::namespaced(self.client.clone(), namespace);
api
}
pub async fn get_apiserver_version(&self) -> Result<Info, Error> {
let client: Client = self.client.clone();
let version_info: Info = client.apiserver_version().await?;
Ok(version_info)
}
pub async fn discovery(&self) -> Result<Discovery, Error> {
let discovery: Discovery = Discovery::new(self.client.clone()).run().await?;
Ok(discovery)
}
pub async fn get_resource_json_value(
&self,
name: &str,
@@ -71,7 +90,8 @@ impl K8sClient {
} else {
Api::default_namespaced_with(self.client.clone(), &gvk)
};
Ok(resource.get(name).await?)
resource.get(name).await
}
pub async fn get_deployment(
@@ -80,11 +100,15 @@ impl K8sClient {
namespace: Option<&str>,
) -> Result<Option<Deployment>, Error> {
let deps: Api<Deployment> = if let Some(ns) = namespace {
debug!("getting namespaced deployment");
Api::namespaced(self.client.clone(), ns)
} else {
debug!("getting default namespace deployment");
Api::default_namespaced(self.client.clone())
};
Ok(deps.get_opt(name).await?)
debug!("getting deployment {} in ns {}", name, namespace.unwrap());
deps.get_opt(name).await
}
pub async fn get_pod(&self, name: &str, namespace: Option<&str>) -> Result<Option<Pod>, Error> {
@@ -93,7 +117,8 @@ impl K8sClient {
} else {
Api::default_namespaced(self.client.clone())
};
Ok(pods.get_opt(name).await?)
pods.get_opt(name).await
}
pub async fn scale_deployment(
@@ -114,7 +139,7 @@ impl K8sClient {
}
});
let pp = PatchParams::default();
let scale = Patch::Apply(&patch);
let scale = Patch::Merge(&patch);
deployments.patch_scale(name, &pp, &scale).await?;
Ok(())
}
@@ -136,9 +161,9 @@ impl K8sClient {
pub async fn wait_until_deployment_ready(
&self,
name: String,
name: &str,
namespace: Option<&str>,
timeout: Option<u64>,
timeout: Option<Duration>,
) -> Result<(), String> {
let api: Api<Deployment>;
@@ -148,9 +173,9 @@ impl K8sClient {
api = Api::default_namespaced(self.client.clone());
}
let establish = await_condition(api, name.as_str(), conditions::is_deployment_completed());
let t = timeout.unwrap_or(300);
let res = tokio::time::timeout(std::time::Duration::from_secs(t), establish).await;
let establish = await_condition(api, name, conditions::is_deployment_completed());
let timeout = timeout.unwrap_or(Duration::from_secs(120));
let res = tokio::time::timeout(timeout, establish).await;
if res.is_ok() {
Ok(())
@@ -240,7 +265,7 @@ impl K8sClient {
if let Some(s) = status.status {
let mut stdout_buf = String::new();
if let Some(mut stdout) = process.stdout().take() {
if let Some(mut stdout) = process.stdout() {
stdout
.read_to_string(&mut stdout_buf)
.await
@@ -346,14 +371,14 @@ impl K8sClient {
Ok(current) => {
trace!("Received current value {current:#?}");
// The resource exists, so we calculate and display a diff.
println!("\nPerforming dry-run for resource: '{}'", name);
println!("\nPerforming dry-run for resource: '{name}'");
let mut current_yaml = serde_yaml::to_value(&current).unwrap_or_else(|_| {
panic!("Could not serialize current value : {current:#?}")
});
if current_yaml.is_mapping() && current_yaml.get("status").is_some() {
let map = current_yaml.as_mapping_mut().unwrap();
let removed = map.remove_entry("status");
trace!("Removed status {:?}", removed);
trace!("Removed status {removed:?}");
} else {
trace!(
"Did not find status entry for current object {}/{}",
@@ -382,14 +407,14 @@ impl K8sClient {
similar::ChangeTag::Insert => "+",
similar::ChangeTag::Equal => " ",
};
print!("{}{}", sign, change);
print!("{sign}{change}");
}
// In a dry run, we return the new resource state that would have been applied.
Ok(resource.clone())
}
Err(Error::Api(ErrorResponse { code: 404, .. })) => {
// The resource does not exist, so the "diff" is the entire new resource.
println!("\nPerforming dry-run for new resource: '{}'", name);
println!("\nPerforming dry-run for new resource: '{name}'");
println!(
"Resource does not exist. It would be created with the following content:"
);
@@ -398,14 +423,14 @@ impl K8sClient {
// Print each line of the new resource with a '+' prefix.
for line in new_yaml.lines() {
println!("+{}", line);
println!("+{line}");
}
// In a dry run, we return the new resource state that would have been created.
Ok(resource.clone())
}
Err(e) => {
// Another API error occurred.
error!("Failed to get resource '{}': {}", name, e);
error!("Failed to get resource '{name}': {e}");
Err(e)
}
}
@@ -420,7 +445,7 @@ impl K8sClient {
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned + serde::Serialize,
<K as Resource>::Scope: ApplyStrategy<K>,
<K as kube::Resource>::DynamicType: Default,
<K as Resource>::DynamicType: Default,
{
let mut result = Vec::new();
for r in resource.iter() {
@@ -485,10 +510,7 @@ impl K8sClient {
// 6. Apply the object to the cluster using Server-Side Apply.
// This will create the resource if it doesn't exist, or update it if it does.
println!(
"Applying Argo Application '{}' in namespace '{}'...",
name, namespace
);
println!("Applying '{name}' in namespace '{namespace}'...",);
let patch_params = PatchParams::apply("harmony"); // Use a unique field manager name
let result = api.patch(name, &patch_params, &Patch::Apply(&obj)).await?;
@@ -497,6 +519,51 @@ impl K8sClient {
Ok(())
}
/// Apply a resource from a URL
///
/// It is the equivalent of `kubectl apply -f <url>`
pub async fn apply_url(&self, url: Url, ns: Option<&str>) -> Result<(), Error> {
let patch_params = PatchParams::apply("harmony");
let discovery = kube::Discovery::new(self.client.clone()).run().await?;
let yaml = reqwest::get(url)
.await
.expect("Could not get URL")
.text()
.await
.expect("Could not get content from URL");
for doc in multidoc_deserialize(&yaml).expect("failed to parse YAML from file") {
let obj: DynamicObject =
serde_yaml::from_value(doc).expect("cannot apply without valid YAML");
let namespace = obj.metadata.namespace.as_deref().or(ns);
let type_meta = obj
.types
.as_ref()
.expect("cannot apply object without valid TypeMeta");
let gvk = GroupVersionKind::try_from(type_meta)
.expect("cannot apply object without valid GroupVersionKind");
let name = obj.name_any();
if let Some((ar, caps)) = discovery.resolve_gvk(&gvk) {
let api = get_dynamic_api(ar, caps, self.client.clone(), namespace, false);
trace!(
"Applying {}: \n{}",
gvk.kind,
serde_yaml::to_string(&obj).expect("Failed to serialize YAML")
);
let data: serde_json::Value =
serde_json::to_value(&obj).expect("Failed to serialize JSON");
let _r = api.patch(&name, &patch_params, &Patch::Apply(data)).await?;
debug!("applied {} {}", gvk.kind, name);
} else {
warn!("Cannot apply document for unknown {gvk:?}");
}
}
Ok(())
}
pub(crate) async fn from_kubeconfig(path: &str) -> Option<K8sClient> {
let k = match Kubeconfig::read_from(path) {
Ok(k) => k,
@@ -516,6 +583,31 @@ impl K8sClient {
}
}
fn get_dynamic_api(
resource: ApiResource,
capabilities: ApiCapabilities,
client: Client,
ns: Option<&str>,
all: bool,
) -> Api<DynamicObject> {
if capabilities.scope == Scope::Cluster || all {
Api::all_with(client, &resource)
} else if let Some(namespace) = ns {
Api::namespaced_with(client, namespace, &resource)
} else {
Api::default_namespaced_with(client, &resource)
}
}
fn multidoc_deserialize(data: &str) -> Result<Vec<serde_yaml::Value>, serde_yaml::Error> {
use serde::Deserialize;
let mut docs = vec![];
for de in serde_yaml::Deserializer::from_str(data) {
docs.push(serde_yaml::Value::deserialize(de)?);
}
Ok(docs)
}
pub trait ApplyStrategy<K: Resource> {
fn get_api(client: &Client, ns: Option<&str>) -> Api<K>;
}

View File

@@ -1,7 +1,12 @@
use std::{process::Command, sync::Arc};
use std::{collections::BTreeMap, process::Command, sync::Arc, time::Duration};
use async_trait::async_trait;
use kube::api::GroupVersionKind;
use base64::{Engine, engine::general_purpose};
use k8s_openapi::api::{
core::v1::Secret,
rbac::v1::{ClusterRoleBinding, RoleRef, Subject},
};
use kube::api::{DynamicObject, GroupVersionKind, ObjectMeta};
use log::{debug, info, warn};
use serde::Serialize;
use tokio::sync::OnceCell;
@@ -12,14 +17,26 @@ use crate::{
inventory::Inventory,
modules::{
k3d::K3DInstallationScore,
monitoring::kube_prometheus::crd::{
crd_alertmanager_config::CRDPrometheus,
prometheus_operator::prometheus_operator_helm_chart_score,
rhob_alertmanager_config::RHOBObservability,
k8s::ingress::{K8sIngressScore, PathType},
monitoring::{
grafana::{grafana::Grafana, helm::helm_grafana::grafana_helm_chart_score},
kube_prometheus::crd::{
crd_alertmanager_config::CRDPrometheus,
crd_grafana::{
Grafana as GrafanaCRD, GrafanaCom, GrafanaDashboard,
GrafanaDashboardDatasource, GrafanaDashboardSpec, GrafanaDatasource,
GrafanaDatasourceConfig, GrafanaDatasourceJsonData,
GrafanaDatasourceSecureJsonData, GrafanaDatasourceSpec, GrafanaSpec,
},
crd_prometheuses::LabelSelector,
prometheus_operator::prometheus_operator_helm_chart_score,
rhob_alertmanager_config::RHOBObservability,
service_monitor::ServiceMonitor,
},
},
prometheus::{
k8s_prometheus_alerting_score::K8sPrometheusCRDAlertingScore,
prometheus::PrometheusApplicationMonitoring, rhob_alerting_score::RHOBAlertingScore,
prometheus::PrometheusMonitoring, rhob_alerting_score::RHOBAlertingScore,
},
},
score::Score,
@@ -47,6 +64,13 @@ struct K8sState {
message: String,
}
#[derive(Debug, Clone)]
pub enum KubernetesDistribution {
OpenshiftFamily,
K3sFamily,
Default,
}
#[derive(Debug, Clone)]
enum K8sSource {
LocalK3d,
@@ -57,6 +81,7 @@ enum K8sSource {
pub struct K8sAnywhereTopology {
k8s_state: Arc<OnceCell<Option<K8sState>>>,
tenant_manager: Arc<OnceCell<K8sTenantManager>>,
k8s_distribution: Arc<OnceCell<KubernetesDistribution>>,
config: Arc<K8sAnywhereConfig>,
}
@@ -78,41 +103,172 @@ impl K8sclient for K8sAnywhereTopology {
}
#[async_trait]
impl PrometheusApplicationMonitoring<CRDPrometheus> for K8sAnywhereTopology {
impl Grafana for K8sAnywhereTopology {
async fn ensure_grafana_operator(
&self,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
debug!("ensure grafana operator");
let client = self.k8s_client().await.unwrap();
let grafana_gvk = GroupVersionKind {
group: "grafana.integreatly.org".to_string(),
version: "v1beta1".to_string(),
kind: "Grafana".to_string(),
};
let name = "grafanas.grafana.integreatly.org";
let ns = "grafana";
let grafana_crd = client
.get_resource_json_value(name, Some(ns), &grafana_gvk)
.await;
match grafana_crd {
Ok(_) => {
return Ok(PreparationOutcome::Success {
details: "Found grafana CRDs in cluster".to_string(),
});
}
Err(_) => {
return self
.install_grafana_operator(inventory, Some("grafana"))
.await;
}
};
}
async fn install_grafana(&self) -> Result<PreparationOutcome, PreparationError> {
let ns = "grafana";
let mut label = BTreeMap::new();
label.insert("dashboards".to_string(), "grafana".to_string());
let label_selector = LabelSelector {
match_labels: label.clone(),
match_expressions: vec![],
};
let client = self.k8s_client().await?;
let grafana = self.build_grafana(ns, &label);
client.apply(&grafana, Some(ns)).await?;
//TODO change this to a ensure ready or something better than just a timeout
client
.wait_until_deployment_ready(
"grafana-grafana-deployment",
Some("grafana"),
Some(Duration::from_secs(30)),
)
.await?;
let sa_name = "grafana-grafana-sa";
let token_secret_name = "grafana-sa-token-secret";
let sa_token_secret = self.build_sa_token_secret(token_secret_name, sa_name, ns);
client.apply(&sa_token_secret, Some(ns)).await?;
let secret_gvk = GroupVersionKind {
group: "".to_string(),
version: "v1".to_string(),
kind: "Secret".to_string(),
};
let secret = client
.get_resource_json_value(token_secret_name, Some(ns), &secret_gvk)
.await?;
let token = format!(
"Bearer {}",
self.extract_and_normalize_token(&secret).unwrap()
);
debug!("creating grafana clusterrole binding");
let clusterrolebinding =
self.build_cluster_rolebinding(sa_name, "cluster-monitoring-view", ns);
client.apply(&clusterrolebinding, Some(ns)).await?;
debug!("creating grafana datasource crd");
let thanos_url = format!(
"https://{}",
self.get_domain("thanos-querier-openshift-monitoring")
.await
.unwrap()
);
let thanos_openshift_datasource = self.build_grafana_datasource(
"thanos-openshift-monitoring",
ns,
&label_selector,
&thanos_url,
&token,
);
client.apply(&thanos_openshift_datasource, Some(ns)).await?;
debug!("creating grafana dashboard crd");
let dashboard = self.build_grafana_dashboard(ns, &label_selector);
client.apply(&dashboard, Some(ns)).await?;
debug!("creating grafana ingress");
let grafana_ingress = self.build_grafana_ingress(ns).await;
grafana_ingress
.interpret(&Inventory::empty(), self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
Ok(PreparationOutcome::Success {
details: "Installed grafana composants".to_string(),
})
}
}
#[async_trait]
impl PrometheusMonitoring<CRDPrometheus> for K8sAnywhereTopology {
async fn install_prometheus(
&self,
sender: &CRDPrometheus,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<CRDPrometheus>>>>,
_inventory: &Inventory,
_receivers: Option<Vec<Box<dyn AlertReceiver<CRDPrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let client = self.k8s_client().await?;
for monitor in sender.service_monitor.iter() {
client
.apply(monitor, Some(&sender.namespace))
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
}
Ok(PreparationOutcome::Success {
details: "successfuly installed prometheus components".to_string(),
})
}
async fn ensure_prometheus_operator(
&self,
sender: &CRDPrometheus,
_inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let po_result = self.ensure_prometheus_operator(sender).await?;
if po_result == PreparationOutcome::Noop {
debug!("Skipping Prometheus CR installation due to missing operator.");
return Ok(po_result);
}
let result = self
.get_k8s_prometheus_application_score(sender.clone(), receivers)
.await
.interpret(inventory, self)
.await;
match result {
Ok(outcome) => match outcome.status {
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
details: outcome.message,
}),
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
_ => Err(PreparationError::new(outcome.message)),
},
Err(err) => Err(PreparationError::new(err.to_string())),
match po_result {
PreparationOutcome::Success { details: _ } => {
debug!("Detected prometheus crds operator present in cluster.");
return Ok(po_result);
}
PreparationOutcome::Noop => {
debug!("Skipping Prometheus CR installation due to missing operator.");
return Ok(po_result);
}
}
}
}
#[async_trait]
impl PrometheusApplicationMonitoring<RHOBObservability> for K8sAnywhereTopology {
impl PrometheusMonitoring<RHOBObservability> for K8sAnywhereTopology {
async fn install_prometheus(
&self,
sender: &RHOBObservability,
@@ -146,6 +302,14 @@ impl PrometheusApplicationMonitoring<RHOBObservability> for K8sAnywhereTopology
Err(err) => Err(PreparationError::new(err.to_string())),
}
}
async fn ensure_prometheus_operator(
&self,
sender: &RHOBObservability,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
}
impl Serialize for K8sAnywhereTopology {
@@ -162,6 +326,7 @@ impl K8sAnywhereTopology {
Self {
k8s_state: Arc::new(OnceCell::new()),
tenant_manager: Arc::new(OnceCell::new()),
k8s_distribution: Arc::new(OnceCell::new()),
config: Arc::new(K8sAnywhereConfig::from_env()),
}
}
@@ -170,10 +335,216 @@ impl K8sAnywhereTopology {
Self {
k8s_state: Arc::new(OnceCell::new()),
tenant_manager: Arc::new(OnceCell::new()),
k8s_distribution: Arc::new(OnceCell::new()),
config: Arc::new(config),
}
}
pub async fn get_k8s_distribution(&self) -> Result<&KubernetesDistribution, PreparationError> {
self.k8s_distribution
.get_or_try_init(async || {
let client = self.k8s_client().await.unwrap();
let discovery = client.discovery().await.map_err(|e| {
PreparationError::new(format!("Could not discover API groups: {}", e))
})?;
let version = client.get_apiserver_version().await.map_err(|e| {
PreparationError::new(format!("Could not get server version: {}", e))
})?;
// OpenShift / OKD
if discovery
.groups()
.any(|g| g.name() == "project.openshift.io")
{
return Ok(KubernetesDistribution::OpenshiftFamily);
}
// K3d / K3s
if version.git_version.contains("k3s") {
return Ok(KubernetesDistribution::K3sFamily);
}
return Ok(KubernetesDistribution::Default);
})
.await
}
fn extract_and_normalize_token(&self, secret: &DynamicObject) -> Option<String> {
let token_b64 = secret
.data
.get("token")
.or_else(|| secret.data.get("data").and_then(|d| d.get("token")))
.and_then(|v| v.as_str())?;
let bytes = general_purpose::STANDARD.decode(token_b64).ok()?;
let s = String::from_utf8(bytes).ok()?;
let cleaned = s
.trim_matches(|c: char| c.is_whitespace() || c == '\0')
.to_string();
Some(cleaned)
}
pub fn build_cluster_rolebinding(
&self,
service_account_name: &str,
clusterrole_name: &str,
ns: &str,
) -> ClusterRoleBinding {
ClusterRoleBinding {
metadata: ObjectMeta {
name: Some(format!("{}-view-binding", service_account_name)),
..Default::default()
},
role_ref: RoleRef {
api_group: "rbac.authorization.k8s.io".into(),
kind: "ClusterRole".into(),
name: clusterrole_name.into(),
},
subjects: Some(vec![Subject {
kind: "ServiceAccount".into(),
name: service_account_name.into(),
namespace: Some(ns.into()),
..Default::default()
}]),
}
}
pub fn build_sa_token_secret(
&self,
secret_name: &str,
service_account_name: &str,
ns: &str,
) -> Secret {
let mut annotations = BTreeMap::new();
annotations.insert(
"kubernetes.io/service-account.name".to_string(),
service_account_name.to_string(),
);
Secret {
metadata: ObjectMeta {
name: Some(secret_name.into()),
namespace: Some(ns.into()),
annotations: Some(annotations),
..Default::default()
},
type_: Some("kubernetes.io/service-account-token".to_string()),
..Default::default()
}
}
fn build_grafana_datasource(
&self,
name: &str,
ns: &str,
label_selector: &LabelSelector,
url: &str,
token: &str,
) -> GrafanaDatasource {
let mut json_data = BTreeMap::new();
json_data.insert("timeInterval".to_string(), "5s".to_string());
GrafanaDatasource {
metadata: ObjectMeta {
name: Some(name.to_string()),
namespace: Some(ns.to_string()),
..Default::default()
},
spec: GrafanaDatasourceSpec {
instance_selector: label_selector.clone(),
allow_cross_namespace_import: Some(true),
values_from: None,
datasource: GrafanaDatasourceConfig {
access: "proxy".to_string(),
name: name.to_string(),
r#type: "prometheus".to_string(),
url: url.to_string(),
database: None,
json_data: Some(GrafanaDatasourceJsonData {
time_interval: Some("60s".to_string()),
http_header_name1: Some("Authorization".to_string()),
tls_skip_verify: Some(true),
oauth_pass_thru: Some(true),
}),
secure_json_data: Some(GrafanaDatasourceSecureJsonData {
http_header_value1: Some(format!("Bearer {token}")),
}),
is_default: Some(false),
editable: Some(true),
},
},
}
}
fn build_grafana_dashboard(
&self,
ns: &str,
label_selector: &LabelSelector,
) -> GrafanaDashboard {
let graf_dashboard = GrafanaDashboard {
metadata: ObjectMeta {
name: Some(format!("grafana-dashboard-{}", ns)),
namespace: Some(ns.to_string()),
..Default::default()
},
spec: GrafanaDashboardSpec {
resync_period: Some("30s".to_string()),
instance_selector: label_selector.clone(),
datasources: Some(vec![GrafanaDashboardDatasource {
input_name: "DS_PROMETHEUS".to_string(),
datasource_name: "thanos-openshift-monitoring".to_string(),
}]),
json: None,
grafana_com: Some(GrafanaCom {
id: 17406,
revision: None,
}),
},
};
graf_dashboard
}
fn build_grafana(&self, ns: &str, labels: &BTreeMap<String, String>) -> GrafanaCRD {
let grafana = GrafanaCRD {
metadata: ObjectMeta {
name: Some(format!("grafana-{}", ns)),
namespace: Some(ns.to_string()),
labels: Some(labels.clone()),
..Default::default()
},
spec: GrafanaSpec {
config: None,
admin_user: None,
admin_password: None,
ingress: None,
persistence: None,
resources: None,
},
};
grafana
}
async fn build_grafana_ingress(&self, ns: &str) -> K8sIngressScore {
let domain = self.get_domain(&format!("grafana-{}", ns)).await.unwrap();
let name = format!("{}-grafana", ns);
let backend_service = format!("grafana-{}-service", ns);
K8sIngressScore {
name: fqdn::fqdn!(&name),
host: fqdn::fqdn!(&domain),
backend_service: fqdn::fqdn!(&backend_service),
port: 3000,
path: Some("/".to_string()),
path_type: Some(PathType::Prefix),
namespace: Some(fqdn::fqdn!(&ns)),
ingress_class_name: Some("openshift-default".to_string()),
}
}
async fn get_cluster_observability_operator_prometheus_application_score(
&self,
sender: RHOBObservability,
@@ -191,13 +562,14 @@ impl K8sAnywhereTopology {
&self,
sender: CRDPrometheus,
receivers: Option<Vec<Box<dyn AlertReceiver<CRDPrometheus>>>>,
service_monitors: Option<Vec<ServiceMonitor>>,
) -> K8sPrometheusCRDAlertingScore {
K8sPrometheusCRDAlertingScore {
return K8sPrometheusCRDAlertingScore {
sender,
receivers: receivers.unwrap_or_default(),
service_monitors: vec![],
service_monitors: service_monitors.unwrap_or_default(),
prometheus_rules: vec![],
}
};
}
async fn openshift_ingress_operator_available(&self) -> Result<(), PreparationError> {
@@ -465,6 +837,30 @@ impl K8sAnywhereTopology {
details: "prometheus operator present in cluster".into(),
})
}
async fn install_grafana_operator(
&self,
inventory: &Inventory,
ns: Option<&str>,
) -> Result<PreparationOutcome, PreparationError> {
let namespace = ns.unwrap_or("grafana");
info!("installing grafana operator in ns {namespace}");
let tenant = self.get_k8s_tenant_manager()?.get_tenant_config().await;
let mut namespace_scope = false;
if tenant.is_some() {
namespace_scope = true;
}
let _grafana_operator_score = grafana_helm_chart_score(namespace, namespace_scope)
.interpret(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()));
Ok(PreparationOutcome::Success {
details: format!(
"Successfully installed grafana operator in ns {}",
ns.unwrap()
),
})
}
}
#[derive(Clone, Debug)]

View File

@@ -1,5 +1,6 @@
mod ha_cluster;
pub mod ingress;
pub mod opnsense;
use harmony_types::net::IpAddress;
mod host_binding;
mod http;

View File

@@ -1,8 +1,15 @@
use std::{error::Error, net::Ipv4Addr, str::FromStr, sync::Arc};
use std::{
error::Error,
fmt::{self, Debug},
net::Ipv4Addr,
str::FromStr,
sync::Arc,
};
use async_trait::async_trait;
use derive_new::new;
use harmony_types::{
id::Id,
net::{IpAddress, MacAddress},
switch::PortLocation,
};
@@ -19,8 +26,8 @@ pub struct DHCPStaticEntry {
pub ip: Ipv4Addr,
}
impl std::fmt::Display for DHCPStaticEntry {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
impl fmt::Display for DHCPStaticEntry {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let mac = self
.mac
.iter()
@@ -42,8 +49,8 @@ pub trait Firewall: Send + Sync {
fn get_host(&self) -> LogicalHost;
}
impl std::fmt::Debug for dyn Firewall {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
impl Debug for dyn Firewall {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_fmt(format_args!("Firewall {}", self.get_ip()))
}
}
@@ -65,7 +72,7 @@ pub struct PxeOptions {
}
#[async_trait]
pub trait DhcpServer: Send + Sync + std::fmt::Debug {
pub trait DhcpServer: Send + Sync + Debug {
async fn add_static_mapping(&self, entry: &DHCPStaticEntry) -> Result<(), ExecutorError>;
async fn remove_static_mapping(&self, mac: &MacAddress) -> Result<(), ExecutorError>;
async fn list_static_mappings(&self) -> Vec<(MacAddress, IpAddress)>;
@@ -104,8 +111,8 @@ pub trait DnsServer: Send + Sync {
}
}
impl std::fmt::Debug for dyn DnsServer {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
impl Debug for dyn DnsServer {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_fmt(format_args!("DnsServer {}", self.get_ip()))
}
}
@@ -141,8 +148,8 @@ pub enum DnsRecordType {
TXT,
}
impl std::fmt::Display for DnsRecordType {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
impl fmt::Display for DnsRecordType {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
DnsRecordType::A => write!(f, "A"),
DnsRecordType::AAAA => write!(f, "AAAA"),
@@ -185,15 +192,12 @@ pub trait Switch: Send + Sync {
mac_address: &MacAddress,
) -> Result<Option<PortLocation>, SwitchError>;
async fn configure_host_network(
&self,
host: &PhysicalHost,
config: HostNetworkConfig,
) -> Result<(), SwitchError>;
async fn configure_host_network(&self, config: &HostNetworkConfig) -> Result<(), SwitchError>;
}
#[derive(Clone, Debug, PartialEq)]
pub struct HostNetworkConfig {
pub host_id: Id,
pub switch_ports: Vec<SwitchPort>,
}
@@ -216,8 +220,8 @@ pub struct SwitchError {
msg: String,
}
impl std::fmt::Display for SwitchError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
impl fmt::Display for SwitchError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_str(&self.msg)
}
}
@@ -225,7 +229,7 @@ impl std::fmt::Display for SwitchError {
impl Error for SwitchError {}
#[async_trait]
pub trait SwitchClient: Send + Sync {
pub trait SwitchClient: Debug + Send + Sync {
/// Executes essential, idempotent, one-time initial configuration steps.
///
/// This is an opiniated procedure that setups a switch to provide high availability

View File

@@ -21,6 +21,7 @@ pub struct AlertingInterpret<S: AlertSender> {
pub sender: S,
pub receivers: Vec<Box<dyn AlertReceiver<S>>>,
pub rules: Vec<Box<dyn AlertRule<S>>>,
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
}
#[async_trait]
@@ -30,6 +31,7 @@ impl<S: AlertSender + Installable<T>, T: Topology> Interpret<T> for AlertingInte
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
debug!("hit sender configure for AlertingInterpret");
self.sender.configure(inventory, topology).await?;
for receiver in self.receivers.iter() {
receiver.install(&self.sender).await?;
@@ -38,6 +40,12 @@ impl<S: AlertSender + Installable<T>, T: Topology> Interpret<T> for AlertingInte
debug!("installing rule: {:#?}", rule);
rule.install(&self.sender).await?;
}
if let Some(targets) = &self.scrape_targets {
for target in targets.iter() {
debug!("installing scrape_target: {:#?}", target);
target.install(&self.sender).await?;
}
}
self.sender.ensure_installed(inventory, topology).await?;
Ok(Outcome::success(format!(
"successfully installed alert sender {}",
@@ -77,6 +85,7 @@ pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
}
#[async_trait]
pub trait ScrapeTarget<S: AlertSender> {
async fn install(&self, sender: &S) -> Result<(), InterpretError>;
pub trait ScrapeTarget<S: AlertSender>: std::fmt::Debug + Send + Sync {
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
fn clone_box(&self) -> Box<dyn ScrapeTarget<S>>;
}

View File

@@ -0,0 +1,23 @@
use async_trait::async_trait;
use log::info;
use crate::{
infra::opnsense::OPNSenseFirewall,
topology::{PreparationError, PreparationOutcome, Topology},
};
#[async_trait]
impl Topology for OPNSenseFirewall {
async fn ensure_ready(&self) -> Result<PreparationOutcome, PreparationError> {
// FIXME we should be initializing the opnsense config here instead of
// OPNSenseFirewall::new as this causes the config to be loaded too early in
// harmony initialization process
let details = "OPNSenseFirewall topology is ready".to_string();
info!("{}", details);
Ok(PreparationOutcome::Success { details })
}
fn name(&self) -> &str {
"OPNSenseFirewall"
}
}

View File

@@ -1,15 +1,14 @@
use async_trait::async_trait;
use brocade::{BrocadeClient, BrocadeOptions, InterSwitchLink, InterfaceStatus, PortOperatingMode};
use harmony_secret::Secret;
use harmony_types::{
net::{IpAddress, MacAddress},
switch::{PortDeclaration, PortLocation},
};
use option_ext::OptionExt;
use serde::{Deserialize, Serialize};
use crate::topology::{SwitchClient, SwitchError};
#[derive(Debug)]
pub struct BrocadeSwitchClient {
brocade: Box<dyn BrocadeClient + Send + Sync>,
}
@@ -114,12 +113,6 @@ impl SwitchClient for BrocadeSwitchClient {
}
}
#[derive(Secret, Serialize, Deserialize, Debug)]
pub struct BrocadeSwitchAuth {
pub username: String,
pub password: String,
}
#[cfg(test)]
mod tests {
use std::sync::{Arc, Mutex};
@@ -235,7 +228,7 @@ mod tests {
assert_that!(*configured_interfaces).is_empty();
}
#[derive(Clone)]
#[derive(Debug, Clone)]
struct FakeBrocadeClient {
stack_topology: Vec<InterSwitchLink>,
interfaces: Vec<InterfaceInfo>,

View File

@@ -11,7 +11,7 @@ pub struct InventoryRepositoryFactory;
impl InventoryRepositoryFactory {
pub async fn build() -> Result<Box<dyn InventoryRepository>, RepoError> {
Ok(Box::new(
SqliteInventoryRepository::new(&(*DATABASE_URL)).await?,
SqliteInventoryRepository::new(&DATABASE_URL).await?,
))
}
}

View File

@@ -25,6 +25,8 @@ impl OPNSenseFirewall {
self.host.ip
}
/// panics : if the opnsense config file cannot be loaded by the underlying opnsense_config
/// crate
pub async fn new(host: LogicalHost, port: Option<u16>, username: &str, password: &str) -> Self {
Self {
opnsense_config: Arc::new(RwLock::new(

View File

@@ -2,7 +2,11 @@ use crate::modules::application::{
Application, ApplicationFeature, InstallationError, InstallationOutcome,
};
use crate::modules::monitoring::application_monitoring::application_monitoring_score::ApplicationMonitoringScore;
use crate::modules::monitoring::grafana::grafana::Grafana;
use crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::CRDPrometheus;
use crate::modules::monitoring::kube_prometheus::crd::service_monitor::{
ServiceMonitor, ServiceMonitorSpec,
};
use crate::topology::MultiTargetTopology;
use crate::topology::ingress::Ingress;
use crate::{
@@ -14,7 +18,7 @@ use crate::{
topology::{HelmCommand, K8sclient, Topology, tenant::TenantManager},
};
use crate::{
modules::prometheus::prometheus::PrometheusApplicationMonitoring,
modules::prometheus::prometheus::PrometheusMonitoring,
topology::oberservability::monitoring::AlertReceiver,
};
use async_trait::async_trait;
@@ -22,6 +26,7 @@ use base64::{Engine as _, engine::general_purpose};
use harmony_secret::SecretManager;
use harmony_secret_derive::Secret;
use harmony_types::net::Url;
use kube::api::ObjectMeta;
use log::{debug, info};
use serde::{Deserialize, Serialize};
use std::sync::Arc;
@@ -40,7 +45,8 @@ impl<
+ TenantManager
+ K8sclient
+ MultiTargetTopology
+ PrometheusApplicationMonitoring<CRDPrometheus>
+ PrometheusMonitoring<CRDPrometheus>
+ Grafana
+ Ingress
+ std::fmt::Debug,
> ApplicationFeature<T> for Monitoring
@@ -57,10 +63,20 @@ impl<
.unwrap_or_else(|| self.application.name());
let domain = topology.get_domain("ntfy").await.unwrap();
let app_service_monitor = ServiceMonitor {
metadata: ObjectMeta {
name: Some(self.application.name()),
namespace: Some(namespace.clone()),
..Default::default()
},
spec: ServiceMonitorSpec::default(),
};
let mut alerting_score = ApplicationMonitoringScore {
sender: CRDPrometheus {
namespace: namespace.clone(),
client: topology.k8s_client().await.unwrap(),
service_monitor: vec![app_service_monitor],
},
application: self.application.clone(),
receivers: self.alert_receiver.clone(),

View File

@@ -18,7 +18,7 @@ use crate::{
topology::{HelmCommand, K8sclient, Topology, tenant::TenantManager},
};
use crate::{
modules::prometheus::prometheus::PrometheusApplicationMonitoring,
modules::prometheus::prometheus::PrometheusMonitoring,
topology::oberservability::monitoring::AlertReceiver,
};
use async_trait::async_trait;
@@ -42,7 +42,7 @@ impl<
+ MultiTargetTopology
+ Ingress
+ std::fmt::Debug
+ PrometheusApplicationMonitoring<RHOBObservability>,
+ PrometheusMonitoring<RHOBObservability>,
> ApplicationFeature<T> for Monitoring
{
async fn ensure_installed(

View File

@@ -0,0 +1,209 @@
use std::sync::Arc;
use async_trait::async_trait;
use harmony_types::id::Id;
use kube::{CustomResource, api::ObjectMeta};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
score::Score,
topology::{K8sclient, Topology, k8s::K8sClient},
};
#[derive(Clone, Debug, Serialize)]
pub struct ClusterIssuerScore {
email: String,
server: String,
issuer_name: String,
namespace: String,
}
impl<T: Topology + K8sclient> Score<T> for ClusterIssuerScore {
fn name(&self) -> String {
"ClusterIssuerScore".to_string()
}
#[doc(hidden)]
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(ClusterIssuerInterpret {
score: self.clone(),
})
}
}
#[derive(Debug, Clone)]
pub struct ClusterIssuerInterpret {
score: ClusterIssuerScore,
}
#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for ClusterIssuerInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
self.apply_cluster_issuer(topology.k8s_client().await.unwrap())
.await
}
fn get_name(&self) -> InterpretName {
InterpretName::Custom("ClusterIssuer")
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
impl ClusterIssuerInterpret {
async fn validate_cert_manager(
&self,
client: &Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
let cert_manager = "cert-manager".to_string();
let operator_namespace = "openshift-operators".to_string();
match client
.get_deployment(&cert_manager, Some(&operator_namespace))
.await
{
Ok(Some(deployment)) => {
if let Some(status) = deployment.status {
let ready_count = status.ready_replicas.unwrap_or(0);
if ready_count >= 1 {
return Ok(Outcome::success(format!(
"'{}' is ready with {} replica(s).",
&cert_manager, ready_count
)));
} else {
return Err(InterpretError::new(
"cert-manager operator not ready in cluster".to_string(),
));
}
} else {
Err(InterpretError::new(format!(
"failed to get deployment status {} in ns {}",
&cert_manager, &operator_namespace
)))
}
}
Ok(None) => Err(InterpretError::new(format!(
"Deployment '{}' not found in namespace '{}'.",
&cert_manager, &operator_namespace
))),
Err(e) => Err(InterpretError::new(format!(
"Failed to query for deployment '{}': {}",
&cert_manager, e
))),
}
}
fn build_cluster_issuer(&self) -> Result<ClusterIssuer, InterpretError> {
let issuer_name = &self.score.issuer_name;
let email = &self.score.email;
let server = &self.score.server;
let namespace = &self.score.namespace;
let cluster_issuer = ClusterIssuer {
metadata: ObjectMeta {
name: Some(issuer_name.to_string()),
namespace: Some(namespace.to_string()),
..Default::default()
},
spec: ClusterIssuerSpec {
acme: AcmeSpec {
email: email.to_string(),
private_key_secret_ref: PrivateKeySecretRef {
name: issuer_name.to_string(),
},
server: server.to_string(),
solvers: vec![SolverSpec {
http01: Some(Http01Solver {
ingress: Http01Ingress {
class: "nginx".to_string(),
},
}),
}],
},
},
};
Ok(cluster_issuer)
}
pub async fn apply_cluster_issuer(
&self,
client: Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
let namespace = self.score.namespace.clone();
self.validate_cert_manager(&client).await?;
let cluster_issuer = self.build_cluster_issuer().unwrap();
client
.apply_yaml(
&serde_yaml::to_value(cluster_issuer).unwrap(),
Some(&namespace),
)
.await?;
Ok(Outcome::success(format!(
"successfully deployed cluster operator: {} in namespace: {}",
self.score.issuer_name, self.score.namespace
)))
}
}
#[derive(CustomResource, Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[kube(
group = "cert-manager.io",
version = "v1",
kind = "ClusterIssuer",
plural = "clusterissuers"
)]
#[serde(rename_all = "camelCase")]
pub struct ClusterIssuerSpec {
pub acme: AcmeSpec,
}
#[derive(Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct AcmeSpec {
pub email: String,
pub private_key_secret_ref: PrivateKeySecretRef,
pub server: String,
pub solvers: Vec<SolverSpec>,
}
#[derive(Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct PrivateKeySecretRef {
pub name: String,
}
#[derive(Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct SolverSpec {
pub http01: Option<Http01Solver>,
// Other solver types (e.g., dns01) would go here as Options
}
#[derive(Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct Http01Solver {
pub ingress: Http01Ingress,
}
#[derive(Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct Http01Ingress {
pub class: String,
}

View File

@@ -1,2 +1,3 @@
pub mod cluster_issuer;
mod helm;
pub use helm::*;

View File

@@ -38,13 +38,15 @@ impl<
+ 'static
+ Send
+ Clone,
T: Topology,
T: Topology + K8sclient,
> Score<T> for K8sResourceScore<K>
where
<K as kube::Resource>::DynamicType: Default,
{
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
todo!()
Box::new(K8sResourceInterpret {
score: self.clone(),
})
}
fn name(&self) -> String {

View File

@@ -1,21 +1,23 @@
use std::sync::Arc;
use async_trait::async_trait;
use log::debug;
use serde::Serialize;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
interpret::Interpret,
modules::{
application::Application,
monitoring::kube_prometheus::crd::crd_alertmanager_config::CRDPrometheus,
prometheus::prometheus::PrometheusApplicationMonitoring,
monitoring::{
grafana::grafana::Grafana, kube_prometheus::crd::crd_alertmanager_config::CRDPrometheus,
},
prometheus::prometheus::PrometheusMonitoring,
},
score::Score,
topology::{PreparationOutcome, Topology, oberservability::monitoring::AlertReceiver},
topology::{
K8sclient, Topology,
oberservability::monitoring::{AlertReceiver, AlertingInterpret, ScrapeTarget},
},
};
use harmony_types::id::Id;
#[derive(Debug, Clone, Serialize)]
pub struct ApplicationMonitoringScore {
@@ -24,12 +26,16 @@ pub struct ApplicationMonitoringScore {
pub receivers: Vec<Box<dyn AlertReceiver<CRDPrometheus>>>,
}
impl<T: Topology + PrometheusApplicationMonitoring<CRDPrometheus>> Score<T>
impl<T: Topology + PrometheusMonitoring<CRDPrometheus> + K8sclient + Grafana> Score<T>
for ApplicationMonitoringScore
{
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(ApplicationMonitoringInterpret {
score: self.clone(),
debug!("creating alerting interpret");
Box::new(AlertingInterpret {
sender: self.sender.clone(),
receivers: self.receivers.clone(),
rules: vec![],
scrape_targets: None,
})
}
@@ -40,55 +46,3 @@ impl<T: Topology + PrometheusApplicationMonitoring<CRDPrometheus>> Score<T>
)
}
}
#[derive(Debug)]
pub struct ApplicationMonitoringInterpret {
score: ApplicationMonitoringScore,
}
#[async_trait]
impl<T: Topology + PrometheusApplicationMonitoring<CRDPrometheus>> Interpret<T>
for ApplicationMonitoringInterpret
{
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
let result = topology
.install_prometheus(
&self.score.sender,
inventory,
Some(self.score.receivers.clone()),
)
.await;
match result {
Ok(outcome) => match outcome {
PreparationOutcome::Success { details: _ } => {
Ok(Outcome::success("Prometheus installed".into()))
}
PreparationOutcome::Noop => {
Ok(Outcome::noop("Prometheus installation skipped".into()))
}
},
Err(err) => Err(InterpretError::from(err)),
}
}
fn get_name(&self) -> InterpretName {
InterpretName::ApplicationMonitoring
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}

View File

@@ -12,7 +12,7 @@ use crate::{
monitoring::kube_prometheus::crd::{
crd_alertmanager_config::CRDPrometheus, rhob_alertmanager_config::RHOBObservability,
},
prometheus::prometheus::PrometheusApplicationMonitoring,
prometheus::prometheus::PrometheusMonitoring,
},
score::Score,
topology::{PreparationOutcome, Topology, oberservability::monitoring::AlertReceiver},
@@ -26,7 +26,7 @@ pub struct ApplicationRHOBMonitoringScore {
pub receivers: Vec<Box<dyn AlertReceiver<RHOBObservability>>>,
}
impl<T: Topology + PrometheusApplicationMonitoring<RHOBObservability>> Score<T>
impl<T: Topology + PrometheusMonitoring<RHOBObservability>> Score<T>
for ApplicationRHOBMonitoringScore
{
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
@@ -49,7 +49,7 @@ pub struct ApplicationRHOBMonitoringInterpret {
}
#[async_trait]
impl<T: Topology + PrometheusApplicationMonitoring<RHOBObservability>> Interpret<T>
impl<T: Topology + PrometheusMonitoring<RHOBObservability>> Interpret<T>
for ApplicationRHOBMonitoringInterpret
{
async fn execute(

View File

@@ -0,0 +1,17 @@
use async_trait::async_trait;
use k8s_openapi::Resource;
use crate::{
inventory::Inventory,
topology::{PreparationError, PreparationOutcome},
};
#[async_trait]
pub trait Grafana {
async fn ensure_grafana_operator(
&self,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError>;
async fn install_grafana(&self) -> Result<PreparationOutcome, PreparationError>;
}

View File

@@ -1,27 +1,28 @@
use harmony_macros::hurl;
use non_blank_string_rs::NonBlankString;
use std::str::FromStr;
use std::{collections::HashMap, str::FromStr};
use crate::modules::helm::chart::HelmChartScore;
pub fn grafana_helm_chart_score(ns: &str) -> HelmChartScore {
let values = r#"
rbac:
namespaced: true
sidecar:
dashboards:
enabled: true
"#
.to_string();
use crate::modules::helm::chart::{HelmChartScore, HelmRepository};
pub fn grafana_helm_chart_score(ns: &str, namespace_scope: bool) -> HelmChartScore {
let mut values_overrides = HashMap::new();
values_overrides.insert(
NonBlankString::from_str("namespaceScope").unwrap(),
namespace_scope.to_string(),
);
HelmChartScore {
namespace: Some(NonBlankString::from_str(ns).unwrap()),
release_name: NonBlankString::from_str("grafana").unwrap(),
chart_name: NonBlankString::from_str("oci://ghcr.io/grafana/helm-charts/grafana").unwrap(),
release_name: NonBlankString::from_str("grafana-operator").unwrap(),
chart_name: NonBlankString::from_str("grafana/grafana-operator").unwrap(),
chart_version: None,
values_overrides: None,
values_yaml: Some(values.to_string()),
values_overrides: Some(values_overrides),
values_yaml: None,
create_namespace: true,
install_only: true,
repository: None,
repository: Some(HelmRepository::new(
"grafana".to_string(),
hurl!("https://grafana.github.io/helm-charts"),
true,
)),
}
}

View File

@@ -1 +1,2 @@
pub mod grafana;
pub mod helm;

View File

@@ -1,12 +1,25 @@
use std::sync::Arc;
use async_trait::async_trait;
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use crate::topology::{
k8s::K8sClient,
oberservability::monitoring::{AlertReceiver, AlertSender},
use crate::{
interpret::{InterpretError, Outcome},
inventory::Inventory,
modules::{
monitoring::{
grafana::grafana::Grafana, kube_prometheus::crd::service_monitor::ServiceMonitor,
},
prometheus::prometheus::PrometheusMonitoring,
},
topology::{
K8sclient, Topology,
installable::Installable,
k8s::K8sClient,
oberservability::monitoring::{AlertReceiver, AlertSender, ScrapeTarget},
},
};
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
@@ -26,6 +39,7 @@ pub struct AlertmanagerConfigSpec {
pub struct CRDPrometheus {
pub namespace: String,
pub client: Arc<K8sClient>,
pub service_monitor: Vec<ServiceMonitor>,
}
impl AlertSender for CRDPrometheus {
@@ -40,6 +54,12 @@ impl Clone for Box<dyn AlertReceiver<CRDPrometheus>> {
}
}
impl Clone for Box<dyn ScrapeTarget<CRDPrometheus>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
impl Serialize for Box<dyn AlertReceiver<CRDPrometheus>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
@@ -48,3 +68,24 @@ impl Serialize for Box<dyn AlertReceiver<CRDPrometheus>> {
todo!()
}
}
#[async_trait]
impl<T: Topology + K8sclient + PrometheusMonitoring<CRDPrometheus> + Grafana> Installable<T>
for CRDPrometheus
{
async fn configure(&self, inventory: &Inventory, topology: &T) -> Result<(), InterpretError> {
topology.ensure_grafana_operator(inventory).await?;
topology.ensure_prometheus_operator(self, inventory).await?;
Ok(())
}
async fn ensure_installed(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<(), InterpretError> {
topology.install_grafana().await?;
topology.install_prometheus(&self, inventory, None).await?;
Ok(())
}
}

View File

@@ -103,9 +103,34 @@ pub struct GrafanaDashboardSpec {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub resync_period: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub datasources: Option<Vec<GrafanaDashboardDatasource>>,
pub instance_selector: LabelSelector,
pub json: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub json: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub grafana_com: Option<GrafanaCom>,
}
#[derive(Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct GrafanaDashboardDatasource {
pub input_name: String,
pub datasource_name: String,
}
// ------------------------------------------------------------------------------------------------
#[derive(Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct GrafanaCom {
pub id: u32,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub revision: Option<u32>,
}
// ------------------------------------------------------------------------------------------------
@@ -126,20 +151,79 @@ pub struct GrafanaDatasourceSpec {
pub allow_cross_namespace_import: Option<bool>,
pub datasource: GrafanaDatasourceConfig,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub values_from: Option<Vec<GrafanaValueFrom>>,
}
#[derive(Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct GrafanaValueFrom {
pub target_path: String,
pub value_from: GrafanaValueSource,
}
#[derive(Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct GrafanaValueSource {
pub secret_key_ref: GrafanaSecretKeyRef,
}
#[derive(Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct GrafanaSecretKeyRef {
pub name: String,
pub key: String,
}
#[derive(Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct GrafanaDatasourceConfig {
pub access: String,
pub database: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub json_data: Option<BTreeMap<String, String>>,
pub database: Option<String>,
pub name: String,
pub r#type: String,
pub url: String,
/// Represents jsonData in the GrafanaDatasource spec
#[serde(default, skip_serializing_if = "Option::is_none")]
pub json_data: Option<GrafanaDatasourceJsonData>,
/// Represents secureJsonData (secrets)
#[serde(default, skip_serializing_if = "Option::is_none")]
pub secure_json_data: Option<GrafanaDatasourceSecureJsonData>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub is_default: Option<bool>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub editable: Option<bool>,
}
#[derive(Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct GrafanaDatasourceJsonData {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub time_interval: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub http_header_name1: Option<String>,
/// Disable TLS skip verification (false = verify)
#[serde(default, skip_serializing_if = "Option::is_none")]
pub tls_skip_verify: Option<bool>,
/// Auth type - set to "forward" for OpenShift OAuth identity
#[serde(default, skip_serializing_if = "Option::is_none")]
pub oauth_pass_thru: Option<bool>,
}
#[derive(Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct GrafanaDatasourceSecureJsonData {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub http_header_value1: Option<String>,
}
// ------------------------------------------------------------------------------------------------
#[derive(Serialize, Deserialize, Debug, Clone, JsonSchema, Default)]

View File

@@ -0,0 +1,187 @@
use std::net::IpAddr;
use async_trait::async_trait;
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use crate::{
modules::monitoring::kube_prometheus::crd::{
crd_alertmanager_config::CRDPrometheus, crd_prometheuses::LabelSelector,
},
topology::oberservability::monitoring::ScrapeTarget,
};
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[kube(
group = "monitoring.coreos.com",
version = "v1alpha1",
kind = "ScrapeConfig",
plural = "scrapeconfigs",
namespaced
)]
#[serde(rename_all = "camelCase")]
pub struct ScrapeConfigSpec {
/// List of static configurations.
pub static_configs: Option<Vec<StaticConfig>>,
/// Kubernetes service discovery.
pub kubernetes_sd_configs: Option<Vec<KubernetesSDConfig>>,
/// HTTP-based service discovery.
pub http_sd_configs: Option<Vec<HttpSDConfig>>,
/// File-based service discovery.
pub file_sd_configs: Option<Vec<FileSDConfig>>,
/// DNS-based service discovery.
pub dns_sd_configs: Option<Vec<DnsSDConfig>>,
/// Consul service discovery.
pub consul_sd_configs: Option<Vec<ConsulSDConfig>>,
/// Relabeling configuration applied to discovered targets.
pub relabel_configs: Option<Vec<RelabelConfig>>,
/// Metric relabeling configuration applied to scraped samples.
pub metric_relabel_configs: Option<Vec<RelabelConfig>>,
/// Path to scrape metrics from (defaults to `/metrics`).
pub metrics_path: Option<String>,
/// Interval at which Prometheus scrapes targets (e.g., "30s").
pub scrape_interval: Option<String>,
/// Timeout for scraping (e.g., "10s").
pub scrape_timeout: Option<String>,
/// Optional job name override.
pub job_name: Option<String>,
/// Optional scheme (http or https).
pub scheme: Option<String>,
/// Authorization paramaters for snmp walk
pub params: Option<Params>,
}
/// Static configuration section of a ScrapeConfig.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct StaticConfig {
pub targets: Vec<String>,
pub labels: Option<LabelSelector>,
}
/// Relabeling configuration for target or metric relabeling.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct RelabelConfig {
pub source_labels: Option<Vec<String>>,
pub separator: Option<String>,
pub target_label: Option<String>,
pub regex: Option<String>,
pub modulus: Option<u64>,
pub replacement: Option<String>,
pub action: Option<String>,
}
/// Kubernetes service discovery configuration.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct KubernetesSDConfig {
///"pod", "service", "endpoints"pub role: String,
pub namespaces: Option<NamespaceSelector>,
pub selectors: Option<Vec<LabelSelector>>,
pub api_server: Option<String>,
pub bearer_token_file: Option<String>,
pub tls_config: Option<TLSConfig>,
}
/// Namespace selector for Kubernetes service discovery.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct NamespaceSelector {
pub any: Option<bool>,
pub match_names: Option<Vec<String>>,
}
/// HTTP-based service discovery configuration.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct HttpSDConfig {
pub url: String,
pub refresh_interval: Option<String>,
pub basic_auth: Option<BasicAuth>,
pub authorization: Option<Authorization>,
pub tls_config: Option<TLSConfig>,
}
/// File-based service discovery configuration.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct FileSDConfig {
pub files: Vec<String>,
pub refresh_interval: Option<String>,
}
/// DNS-based service discovery configuration.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct DnsSDConfig {
pub names: Vec<String>,
pub refresh_interval: Option<String>,
pub type_: Option<String>, // SRV, A, AAAA
pub port: Option<u16>,
}
/// Consul service discovery configuration.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct ConsulSDConfig {
pub server: String,
pub services: Option<Vec<String>>,
pub scheme: Option<String>,
pub datacenter: Option<String>,
pub tag_separator: Option<String>,
pub refresh_interval: Option<String>,
pub tls_config: Option<TLSConfig>,
}
/// Basic authentication credentials.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct BasicAuth {
pub username: String,
pub password: Option<String>,
pub password_file: Option<String>,
}
/// Bearer token or other auth mechanisms.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct Authorization {
pub credentials: Option<String>,
pub credentials_file: Option<String>,
pub type_: Option<String>,
}
/// TLS configuration for secure scraping.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct TLSConfig {
pub ca_file: Option<String>,
pub cert_file: Option<String>,
pub key_file: Option<String>,
pub server_name: Option<String>,
pub insecure_skip_verify: Option<bool>,
}
/// Authorization parameters for SNMP walk.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct Params {
pub auth: Option<Vec<String>>,
pub module: Option<Vec<String>>,
}

View File

@@ -4,6 +4,7 @@ pub mod crd_default_rules;
pub mod crd_grafana;
pub mod crd_prometheus_rules;
pub mod crd_prometheuses;
pub mod crd_scrape_config;
pub mod grafana_default_dashboard;
pub mod grafana_operator;
pub mod prometheus_operator;

View File

@@ -31,6 +31,7 @@ impl<T: Topology + HelmCommand + TenantManager> Score<T> for HelmPrometheusAlert
sender: KubePrometheus { config },
receivers: self.receivers.clone(),
rules: self.rules.clone(),
scrape_targets: None,
})
}
fn name(&self) -> String {

View File

@@ -6,3 +6,4 @@ pub mod kube_prometheus;
pub mod ntfy;
pub mod okd;
pub mod prometheus;
pub mod scrape_target;

View File

@@ -100,11 +100,7 @@ impl<T: Topology + HelmCommand + K8sclient + MultiTargetTopology> Interpret<T> f
info!("deploying ntfy...");
client
.wait_until_deployment_ready(
"ntfy".to_string(),
Some(self.score.namespace.as_str()),
None,
)
.wait_until_deployment_ready("ntfy", Some(self.score.namespace.as_str()), None)
.await?;
info!("ntfy deployed");

View File

@@ -114,7 +114,7 @@ impl Prometheus {
};
if let Some(ns) = namespace.as_deref() {
grafana_helm_chart_score(ns)
grafana_helm_chart_score(ns, false)
.interpret(inventory, topology)
.await
} else {

View File

@@ -0,0 +1 @@
pub mod server;

View File

@@ -0,0 +1,80 @@
use std::net::IpAddr;
use async_trait::async_trait;
use kube::api::ObjectMeta;
use serde::Serialize;
use crate::{
interpret::{InterpretError, Outcome},
modules::monitoring::kube_prometheus::crd::{
crd_alertmanager_config::CRDPrometheus,
crd_scrape_config::{Params, RelabelConfig, ScrapeConfig, ScrapeConfigSpec, StaticConfig},
},
topology::oberservability::monitoring::ScrapeTarget,
};
#[derive(Debug, Clone, Serialize)]
pub struct Server {
pub name: String,
pub ip: IpAddr,
pub auth: String,
pub module: String,
pub domain: String,
}
#[async_trait]
impl ScrapeTarget<CRDPrometheus> for Server {
async fn install(&self, sender: &CRDPrometheus) -> Result<Outcome, InterpretError> {
let scrape_config_spec = ScrapeConfigSpec {
static_configs: Some(vec![StaticConfig {
targets: vec![self.ip.to_string()],
labels: None,
}]),
scrape_interval: Some("2m".to_string()),
kubernetes_sd_configs: None,
http_sd_configs: None,
file_sd_configs: None,
dns_sd_configs: None,
params: Some(Params {
auth: Some(vec![self.auth.clone()]),
module: Some(vec![self.module.clone()]),
}),
consul_sd_configs: None,
relabel_configs: Some(vec![RelabelConfig {
action: None,
source_labels: Some(vec!["__address__".to_string()]),
separator: None,
target_label: Some("__param_target".to_string()),
regex: None,
replacement: Some(format!("snmp.{}:31080", self.domain.clone())),
modulus: None,
}]),
metric_relabel_configs: None,
metrics_path: Some("/snmp".to_string()),
scrape_timeout: Some("2m".to_string()),
job_name: Some(format!("snmp_exporter/cloud/{}", self.name.clone())),
scheme: None,
};
let scrape_config = ScrapeConfig {
metadata: ObjectMeta {
name: Some(self.name.clone()),
namespace: Some(sender.namespace.clone()),
..Default::default()
},
spec: scrape_config_spec,
};
sender
.client
.apply(&scrape_config, Some(&sender.namespace.clone()))
.await?;
Ok(Outcome::success(format!(
"installed scrape target {}",
self.name.clone()
)))
}
fn clone_box(&self) -> Box<dyn ScrapeTarget<CRDPrometheus>> {
Box::new(self.clone())
}
}

View File

@@ -5,10 +5,8 @@ use crate::{
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::{HostRole, Inventory},
modules::{
dhcp::DhcpHostBindingScore,
http::IPxeMacBootFileScore,
inventory::DiscoverHostForRoleScore,
okd::{host_network::HostNetworkConfigurationScore, templates::BootstrapIpxeTpl},
dhcp::DhcpHostBindingScore, http::IPxeMacBootFileScore,
inventory::DiscoverHostForRoleScore, okd::templates::BootstrapIpxeTpl,
},
score::Score,
topology::{HAClusterTopology, HostBinding},
@@ -205,28 +203,6 @@ impl OKDSetup03ControlPlaneInterpret {
Ok(())
}
/// Placeholder for automating network bonding configuration.
async fn persist_network_bond(
&self,
inventory: &Inventory,
topology: &HAClusterTopology,
hosts: &Vec<PhysicalHost>,
) -> Result<(), InterpretError> {
info!("[ControlPlane] Ensuring persistent bonding");
let score = HostNetworkConfigurationScore {
hosts: hosts.clone(),
};
score.interpret(inventory, topology).await?;
inquire::Confirm::new(
"Network configuration for control plane nodes is not automated yet. Configure it manually if needed.",
)
.prompt()
.map_err(|e| InterpretError::new(format!("User prompt failed: {e}")))?;
Ok(())
}
}
#[async_trait]
@@ -265,10 +241,6 @@ impl Interpret<HAClusterTopology> for OKDSetup03ControlPlaneInterpret {
// 4. Reboot the nodes to start the OS installation.
self.reboot_targets(&nodes).await?;
// 5. Placeholder for post-boot network configuration (e.g., bonding).
self.persist_network_bond(inventory, topology, &nodes)
.await?;
// TODO: Implement a step to wait for the control plane nodes to join the cluster
// and for the cluster operators to become available. This would be similar to
// the `wait-for bootstrap-complete` command.

View File

@@ -0,0 +1,130 @@
use crate::{
data::Version,
hardware::PhysicalHost,
infra::inventory::InventoryRepositoryFactory,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::{HostRole, Inventory},
modules::okd::host_network::HostNetworkConfigurationScore,
score::Score,
topology::HAClusterTopology,
};
use async_trait::async_trait;
use derive_new::new;
use harmony_types::id::Id;
use log::info;
use serde::Serialize;
// -------------------------------------------------------------------------------------------------
// Persist Network Bond
// - Persist bonding via NMState
// - Persist port channels on the Switch
// -------------------------------------------------------------------------------------------------
#[derive(Debug, Clone, Serialize, new)]
pub struct OKDSetupPersistNetworkBondScore {}
impl Score<HAClusterTopology> for OKDSetupPersistNetworkBondScore {
fn create_interpret(&self) -> Box<dyn Interpret<HAClusterTopology>> {
Box::new(OKDSetupPersistNetworkBondInterpet::new())
}
fn name(&self) -> String {
"OKDSetupPersistNetworkBondScore".to_string()
}
}
#[derive(Debug, Clone)]
pub struct OKDSetupPersistNetworkBondInterpet {
version: Version,
status: InterpretStatus,
}
impl OKDSetupPersistNetworkBondInterpet {
pub fn new() -> Self {
let version = Version::from("1.0.0").unwrap();
Self {
version,
status: InterpretStatus::QUEUED,
}
}
/// Ensures that three physical hosts are discovered and available for the ControlPlane role.
/// It will trigger discovery if not enough hosts are found.
async fn get_nodes(
&self,
_inventory: &Inventory,
_topology: &HAClusterTopology,
) -> Result<Vec<PhysicalHost>, InterpretError> {
const REQUIRED_HOSTS: usize = 3;
let repo = InventoryRepositoryFactory::build().await?;
let control_plane_hosts = repo.get_host_for_role(&HostRole::ControlPlane).await?;
if control_plane_hosts.len() < REQUIRED_HOSTS {
Err(InterpretError::new(format!(
"OKD Requires at least {} control plane hosts, but only found {}. Cannot proceed.",
REQUIRED_HOSTS,
control_plane_hosts.len()
)))
} else {
// Take exactly the number of required hosts to ensure consistency.
Ok(control_plane_hosts
.into_iter()
.take(REQUIRED_HOSTS)
.collect())
}
}
async fn persist_network_bond(
&self,
inventory: &Inventory,
topology: &HAClusterTopology,
hosts: &Vec<PhysicalHost>,
) -> Result<(), InterpretError> {
info!("Ensuring persistent bonding");
let score = HostNetworkConfigurationScore {
hosts: hosts.clone(),
};
score.interpret(inventory, topology).await?;
Ok(())
}
}
#[async_trait]
impl Interpret<HAClusterTopology> for OKDSetupPersistNetworkBondInterpet {
fn get_name(&self) -> InterpretName {
InterpretName::Custom("OKDSetupPersistNetworkBondInterpet")
}
fn get_version(&self) -> Version {
self.version.clone()
}
fn get_status(&self) -> InterpretStatus {
self.status.clone()
}
fn get_children(&self) -> Vec<Id> {
vec![]
}
async fn execute(
&self,
inventory: &Inventory,
topology: &HAClusterTopology,
) -> Result<Outcome, InterpretError> {
let nodes = self.get_nodes(inventory, topology).await?;
let res = self.persist_network_bond(inventory, topology, &nodes).await;
match res {
Ok(_) => Ok(Outcome::success(
"Network bond successfully persisted".into(),
)),
Err(_) => Err(InterpretError::new(
"Failed to persist network bond".to_string(),
)),
}
}
}

View File

@@ -1,41 +1 @@
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
pub mod nmstate;
#[derive(CustomResource, Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[kube(
group = "operators.coreos.com",
version = "v1",
kind = "OperatorGroup",
namespaced
)]
#[serde(rename_all = "camelCase")]
pub struct OperatorGroupSpec {
pub target_namespaces: Vec<String>,
}
#[derive(CustomResource, Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[kube(
group = "operators.coreos.com",
version = "v1alpha1",
kind = "Subscription",
namespaced
)]
#[serde(rename_all = "camelCase")]
pub struct SubscriptionSpec {
pub name: String,
pub source: String,
pub source_namespace: String,
pub channel: Option<String>,
pub install_plan_approval: Option<InstallPlanApproval>,
}
#[derive(Deserialize, Serialize, Clone, Debug, JsonSchema)]
pub enum InstallPlanApproval {
#[serde(rename = "Automatic")]
Automatic,
#[serde(rename = "Manual")]
Manual,
}

View File

@@ -6,9 +6,16 @@ use serde::{Deserialize, Serialize};
use serde_json::Value;
#[derive(CustomResource, Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[kube(group = "nmstate.io", version = "v1", kind = "NMState", namespaced)]
#[kube(
group = "nmstate.io",
version = "v1",
kind = "NMState",
plural = "nmstates",
namespaced = false
)]
#[serde(rename_all = "camelCase")]
pub struct NMStateSpec {
#[serde(skip_serializing_if = "Option::is_none")]
pub probe_configuration: Option<ProbeConfig>,
}
@@ -44,6 +51,7 @@ pub struct ProbeDns {
)]
#[serde(rename_all = "camelCase")]
pub struct NodeNetworkConfigurationPolicySpec {
#[serde(skip_serializing_if = "Option::is_none")]
pub node_selector: Option<BTreeMap<String, String>>,
pub desired_state: DesiredStateSpec,
}
@@ -58,37 +66,64 @@ pub struct DesiredStateSpec {
#[serde(rename_all = "kebab-case")]
pub struct InterfaceSpec {
pub name: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub description: Option<String>,
pub r#type: String,
pub state: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub mac_address: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub copy_mac_from: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub mtu: Option<u32>,
#[serde(skip_serializing_if = "Option::is_none")]
pub controller: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub ipv4: Option<IpStackSpec>,
#[serde(skip_serializing_if = "Option::is_none")]
pub ipv6: Option<IpStackSpec>,
#[serde(skip_serializing_if = "Option::is_none")]
pub ethernet: Option<EthernetSpec>,
#[serde(skip_serializing_if = "Option::is_none")]
pub link_aggregation: Option<BondSpec>,
#[serde(skip_serializing_if = "Option::is_none")]
pub vlan: Option<VlanSpec>,
#[serde(skip_serializing_if = "Option::is_none")]
pub vxlan: Option<VxlanSpec>,
#[serde(skip_serializing_if = "Option::is_none")]
pub mac_vtap: Option<MacVtapSpec>,
#[serde(skip_serializing_if = "Option::is_none")]
pub mac_vlan: Option<MacVlanSpec>,
#[serde(skip_serializing_if = "Option::is_none")]
pub infiniband: Option<InfinibandSpec>,
#[serde(skip_serializing_if = "Option::is_none")]
pub linux_bridge: Option<LinuxBridgeSpec>,
#[serde(skip_serializing_if = "Option::is_none")]
pub ovs_bridge: Option<OvsBridgeSpec>,
#[serde(skip_serializing_if = "Option::is_none")]
pub ethtool: Option<EthtoolSpec>,
}
#[derive(Deserialize, Serialize, Clone, Debug, Default, JsonSchema)]
#[serde(rename_all = "kebab-case")]
pub struct IpStackSpec {
#[serde(skip_serializing_if = "Option::is_none")]
pub enabled: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub dhcp: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub autoconf: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub address: Option<Vec<IpAddressSpec>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub auto_dns: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub auto_gateway: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub auto_routes: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub dhcp_client_id: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub dhcp_duid: Option<String>,
}
@@ -102,8 +137,11 @@ pub struct IpAddressSpec {
#[derive(Deserialize, Serialize, Clone, Debug, Default, JsonSchema)]
#[serde(rename_all = "kebab-case")]
pub struct EthernetSpec {
#[serde(skip_serializing_if = "Option::is_none")]
pub speed: Option<u32>,
#[serde(skip_serializing_if = "Option::is_none")]
pub duplex: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub auto_negotiation: Option<bool>,
}
@@ -112,6 +150,7 @@ pub struct EthernetSpec {
pub struct BondSpec {
pub mode: String,
pub ports: Vec<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub options: Option<BTreeMap<String, Value>>,
}
@@ -120,6 +159,7 @@ pub struct BondSpec {
pub struct VlanSpec {
pub base_iface: String,
pub id: u16,
#[serde(skip_serializing_if = "Option::is_none")]
pub protocol: Option<String>,
}
@@ -129,8 +169,11 @@ pub struct VxlanSpec {
pub base_iface: String,
pub id: u32,
pub remote: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub local: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub learning: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub destination_port: Option<u16>,
}
@@ -139,6 +182,7 @@ pub struct VxlanSpec {
pub struct MacVtapSpec {
pub base_iface: String,
pub mode: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub promiscuous: Option<bool>,
}
@@ -147,6 +191,7 @@ pub struct MacVtapSpec {
pub struct MacVlanSpec {
pub base_iface: String,
pub mode: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub promiscuous: Option<bool>,
}
@@ -161,25 +206,35 @@ pub struct InfinibandSpec {
#[derive(Deserialize, Serialize, Clone, Debug, Default, JsonSchema)]
#[serde(rename_all = "kebab-case")]
pub struct LinuxBridgeSpec {
#[serde(skip_serializing_if = "Option::is_none")]
pub options: Option<LinuxBridgeOptions>,
#[serde(skip_serializing_if = "Option::is_none")]
pub ports: Option<Vec<LinuxBridgePort>>,
}
#[derive(Deserialize, Serialize, Clone, Debug, Default, JsonSchema)]
#[serde(rename_all = "kebab-case")]
pub struct LinuxBridgeOptions {
#[serde(skip_serializing_if = "Option::is_none")]
pub mac_ageing_time: Option<u32>,
#[serde(skip_serializing_if = "Option::is_none")]
pub multicast_snooping: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub stp: Option<StpOptions>,
}
#[derive(Deserialize, Serialize, Clone, Debug, Default, JsonSchema)]
#[serde(rename_all = "kebab-case")]
pub struct StpOptions {
#[serde(skip_serializing_if = "Option::is_none")]
pub enabled: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub forward_delay: Option<u16>,
#[serde(skip_serializing_if = "Option::is_none")]
pub hello_time: Option<u16>,
#[serde(skip_serializing_if = "Option::is_none")]
pub max_age: Option<u16>,
#[serde(skip_serializing_if = "Option::is_none")]
pub priority: Option<u16>,
}
@@ -187,15 +242,20 @@ pub struct StpOptions {
#[serde(rename_all = "kebab-case")]
pub struct LinuxBridgePort {
pub name: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub vlan: Option<LinuxBridgePortVlan>,
}
#[derive(Deserialize, Serialize, Clone, Debug, Default, JsonSchema)]
#[serde(rename_all = "kebab-case")]
pub struct LinuxBridgePortVlan {
#[serde(skip_serializing_if = "Option::is_none")]
pub mode: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub trunk_tags: Option<Vec<VlanTag>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub tag: Option<u16>,
#[serde(skip_serializing_if = "Option::is_none")]
pub enable_native: Option<bool>,
}
@@ -203,6 +263,7 @@ pub struct LinuxBridgePortVlan {
#[serde(rename_all = "kebab-case")]
pub struct VlanTag {
pub id: u16,
#[serde(skip_serializing_if = "Option::is_none")]
pub id_range: Option<VlanIdRange>,
}
@@ -216,15 +277,20 @@ pub struct VlanIdRange {
#[derive(Deserialize, Serialize, Clone, Debug, Default, JsonSchema)]
#[serde(rename_all = "kebab-case")]
pub struct OvsBridgeSpec {
#[serde(skip_serializing_if = "Option::is_none")]
pub options: Option<OvsBridgeOptions>,
#[serde(skip_serializing_if = "Option::is_none")]
pub ports: Option<Vec<OvsPortSpec>>,
}
#[derive(Deserialize, Serialize, Clone, Debug, Default, JsonSchema)]
#[serde(rename_all = "kebab-case")]
pub struct OvsBridgeOptions {
#[serde(skip_serializing_if = "Option::is_none")]
pub stp: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub rstp: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub mcast_snooping_enable: Option<bool>,
}
@@ -232,8 +298,11 @@ pub struct OvsBridgeOptions {
#[serde(rename_all = "kebab-case")]
pub struct OvsPortSpec {
pub name: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub link_aggregation: Option<BondSpec>,
#[serde(skip_serializing_if = "Option::is_none")]
pub vlan: Option<LinuxBridgePortVlan>,
#[serde(skip_serializing_if = "Option::is_none")]
pub r#type: Option<String>,
}
@@ -246,6 +315,8 @@ pub struct EthtoolSpec {
#[derive(Deserialize, Serialize, Clone, Debug, Default, JsonSchema)]
#[serde(rename_all = "kebab-case")]
pub struct EthtoolFecSpec {
#[serde(skip_serializing_if = "Option::is_none")]
pub auto: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub mode: Option<String>,
}

View File

@@ -39,30 +39,70 @@ impl HostNetworkConfigurationInterpret {
&self,
topology: &T,
host: &PhysicalHost,
) -> Result<(), InterpretError> {
let switch_ports = self.collect_switch_ports_for_host(topology, host).await?;
if !switch_ports.is_empty() {
topology
.configure_host_network(host, HostNetworkConfig { switch_ports })
.await
.map_err(|e| InterpretError::new(format!("Failed to configure host: {e}")))?;
current_host: &usize,
total_hosts: &usize,
) -> Result<HostNetworkConfig, InterpretError> {
if host.network.is_empty() {
info!("[Host {current_host}/{total_hosts}] No interfaces to configure, skipping");
return Ok(HostNetworkConfig {
host_id: host.id.clone(),
switch_ports: vec![],
});
}
Ok(())
let switch_ports = self
.collect_switch_ports_for_host(topology, host, current_host, total_hosts)
.await?;
let config = HostNetworkConfig {
host_id: host.id.clone(),
switch_ports,
};
if !config.switch_ports.is_empty() {
info!(
"[Host {current_host}/{total_hosts}] Found {} ports for {} interfaces",
config.switch_ports.len(),
host.network.len()
);
info!("[Host {current_host}/{total_hosts}] Configuring host network...");
topology
.configure_host_network(&config)
.await
.map_err(|e| InterpretError::new(format!("Failed to configure host: {e}")))?;
} else {
info!(
"[Host {current_host}/{total_hosts}] No ports found for {} interfaces, skipping",
host.network.len()
);
}
Ok(config)
}
async fn collect_switch_ports_for_host<T: Topology + Switch>(
&self,
topology: &T,
host: &PhysicalHost,
current_host: &usize,
total_hosts: &usize,
) -> Result<Vec<SwitchPort>, InterpretError> {
let mut switch_ports = vec![];
if host.network.is_empty() {
return Ok(switch_ports);
}
info!("[Host {current_host}/{total_hosts}] Collecting ports on switch...");
for network_interface in &host.network {
let mac_address = network_interface.mac_address;
match topology.get_port_for_mac_address(&mac_address).await {
Ok(Some(port)) => {
info!(
"[Host {current_host}/{total_hosts}] Found port '{port}' for '{mac_address}'"
);
switch_ports.push(SwitchPort {
interface: NetworkInterface {
name: network_interface.name.clone(),
@@ -73,7 +113,7 @@ impl HostNetworkConfigurationInterpret {
port,
});
}
Ok(None) => debug!("No port found for host '{}', skipping", host.id),
Ok(None) => debug!("No port found for '{mac_address}', skipping"),
Err(e) => {
return Err(InterpretError::new(format!(
"Failed to get port for host '{}': {}",
@@ -85,6 +125,47 @@ impl HostNetworkConfigurationInterpret {
Ok(switch_ports)
}
fn format_host_configuration(&self, configs: Vec<HostNetworkConfig>) -> Vec<String> {
let mut report = vec![
"Network Configuration Report".to_string(),
"------------------------------------------------------------------".to_string(),
];
for config in configs {
let host = self
.score
.hosts
.iter()
.find(|h| h.id == config.host_id)
.unwrap();
println!("[Host] {host}");
if config.switch_ports.is_empty() {
report.push(format!(
"⏭️ Host {}: SKIPPED (No matching switch ports found)",
config.host_id
));
} else {
let mappings: Vec<String> = config
.switch_ports
.iter()
.map(|p| format!("[{} -> {}]", p.interface.name, p.port))
.collect();
report.push(format!(
"✅ Host {}: Bonded {} port(s) {}",
config.host_id,
config.switch_ports.len(),
mappings.join(", ")
));
}
}
report
.push("------------------------------------------------------------------".to_string());
report
}
}
#[async_trait]
@@ -114,27 +195,38 @@ impl<T: Topology + Switch> Interpret<T> for HostNetworkConfigurationInterpret {
return Ok(Outcome::noop("No hosts to configure".into()));
}
info!(
"Started network configuration for {} host(s)...",
self.score.hosts.len()
);
let host_count = self.score.hosts.len();
info!("Started network configuration for {host_count} host(s)...",);
info!("Setting up switch with sane defaults...");
topology
.setup_switch()
.await
.map_err(|e| InterpretError::new(format!("Switch setup failed: {e}")))?;
info!("Switch ready");
let mut current_host = 1;
let mut host_configurations = vec![];
let mut configured_host_count = 0;
for host in &self.score.hosts {
self.configure_network_for_host(topology, host).await?;
configured_host_count += 1;
}
let host_configuration = self
.configure_network_for_host(topology, host, &current_host, &host_count)
.await?;
if configured_host_count > 0 {
Ok(Outcome::success(format!(
"Configured {configured_host_count}/{} host(s)",
self.score.hosts.len()
)))
host_configurations.push(host_configuration);
current_host += 1;
}
if current_host > 1 {
let details = self.format_host_configuration(host_configurations);
Ok(Outcome::success_with_details(
format!(
"Configured {}/{} host(s)",
current_host - 1,
self.score.hosts.len()
),
details,
))
} else {
Ok(Outcome::noop("No hosts configured".into()))
}
@@ -209,6 +301,7 @@ mod tests {
assert_that!(*configured_host_networks).contains_exactly(vec![(
HOST_ID.clone(),
HostNetworkConfig {
host_id: HOST_ID.clone(),
switch_ports: vec![SwitchPort {
interface: EXISTING_INTERFACE.clone(),
port: PORT.clone(),
@@ -234,6 +327,7 @@ mod tests {
assert_that!(*configured_host_networks).contains_exactly(vec![(
HOST_ID.clone(),
HostNetworkConfig {
host_id: HOST_ID.clone(),
switch_ports: vec![
SwitchPort {
interface: EXISTING_INTERFACE.clone(),
@@ -263,6 +357,7 @@ mod tests {
(
HOST_ID.clone(),
HostNetworkConfig {
host_id: HOST_ID.clone(),
switch_ports: vec![SwitchPort {
interface: EXISTING_INTERFACE.clone(),
port: PORT.clone(),
@@ -272,6 +367,7 @@ mod tests {
(
ANOTHER_HOST_ID.clone(),
HostNetworkConfig {
host_id: ANOTHER_HOST_ID.clone(),
switch_ports: vec![SwitchPort {
interface: ANOTHER_EXISTING_INTERFACE.clone(),
port: ANOTHER_PORT.clone(),
@@ -382,11 +478,10 @@ mod tests {
async fn configure_host_network(
&self,
host: &PhysicalHost,
config: HostNetworkConfig,
config: &HostNetworkConfig,
) -> Result<(), SwitchError> {
let mut configured_host_networks = self.configured_host_networks.lock().unwrap();
configured_host_networks.push((host.id.clone(), config.clone()));
configured_host_networks.push((config.host_id.clone(), config.clone()));
Ok(())
}

View File

@@ -50,7 +50,7 @@
use crate::{
modules::okd::{
OKDSetup01InventoryScore, OKDSetup02BootstrapScore, OKDSetup03ControlPlaneScore,
OKDSetup04WorkersScore, OKDSetup05SanityCheckScore,
OKDSetup04WorkersScore, OKDSetup05SanityCheckScore, OKDSetupPersistNetworkBondScore,
bootstrap_06_installation_report::OKDSetup06InstallationReportScore,
},
score::Score,
@@ -65,6 +65,7 @@ impl OKDInstallationPipeline {
Box::new(OKDSetup01InventoryScore::new()),
Box::new(OKDSetup02BootstrapScore::new()),
Box::new(OKDSetup03ControlPlaneScore::new()),
Box::new(OKDSetupPersistNetworkBondScore::new()),
Box::new(OKDSetup04WorkersScore::new()),
Box::new(OKDSetup05SanityCheckScore::new()),
Box::new(OKDSetup06InstallationReportScore::new()),

View File

@@ -6,6 +6,7 @@ mod bootstrap_05_sanity_check;
mod bootstrap_06_installation_report;
pub mod bootstrap_dhcp;
pub mod bootstrap_load_balancer;
mod bootstrap_persist_network_bond;
pub mod dhcp;
pub mod dns;
pub mod installation;
@@ -19,5 +20,6 @@ pub use bootstrap_03_control_plane::*;
pub use bootstrap_04_workers::*;
pub use bootstrap_05_sanity_check::*;
pub use bootstrap_06_installation_report::*;
pub use bootstrap_persist_network_bond::*;
pub mod crd;
pub mod host_network;

View File

@@ -12,7 +12,8 @@ use crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::C
use crate::modules::monitoring::kube_prometheus::crd::crd_default_rules::build_default_application_rules;
use crate::modules::monitoring::kube_prometheus::crd::crd_grafana::{
Grafana, GrafanaDashboard, GrafanaDashboardSpec, GrafanaDatasource, GrafanaDatasourceConfig,
GrafanaDatasourceSpec, GrafanaSpec,
GrafanaDatasourceJsonData, GrafanaDatasourceSpec, GrafanaSecretKeyRef, GrafanaSpec,
GrafanaValueFrom, GrafanaValueSource,
};
use crate::modules::monitoring::kube_prometheus::crd::crd_prometheus_rules::{
PrometheusRule, PrometheusRuleSpec, RuleGroup,
@@ -39,7 +40,7 @@ use crate::{
};
use harmony_types::id::Id;
use super::prometheus::PrometheusApplicationMonitoring;
use super::prometheus::PrometheusMonitoring;
#[derive(Clone, Debug, Serialize)]
pub struct K8sPrometheusCRDAlertingScore {
@@ -49,7 +50,7 @@ pub struct K8sPrometheusCRDAlertingScore {
pub prometheus_rules: Vec<RuleGroup>,
}
impl<T: Topology + K8sclient + PrometheusApplicationMonitoring<CRDPrometheus>> Score<T>
impl<T: Topology + K8sclient + PrometheusMonitoring<CRDPrometheus>> Score<T>
for K8sPrometheusCRDAlertingScore
{
fn create_interpret(&self) -> Box<dyn crate::interpret::Interpret<T>> {
@@ -75,7 +76,7 @@ pub struct K8sPrometheusCRDAlertingInterpret {
}
#[async_trait]
impl<T: Topology + K8sclient + PrometheusApplicationMonitoring<CRDPrometheus>> Interpret<T>
impl<T: Topology + K8sclient + PrometheusMonitoring<CRDPrometheus>> Interpret<T>
for K8sPrometheusCRDAlertingInterpret
{
async fn execute(
@@ -466,10 +467,13 @@ impl K8sPrometheusCRDAlertingInterpret {
match_labels: label.clone(),
match_expressions: vec![],
};
let mut json_data = BTreeMap::new();
json_data.insert("timeInterval".to_string(), "5s".to_string());
let namespace = self.sender.namespace.clone();
let json_data = GrafanaDatasourceJsonData {
time_interval: Some("5s".to_string()),
http_header_name1: None,
tls_skip_verify: Some(true),
oauth_pass_thru: Some(true),
};
let json = build_default_dashboard(&namespace);
let graf_data_source = GrafanaDatasource {
@@ -495,7 +499,11 @@ impl K8sPrometheusCRDAlertingInterpret {
"http://prometheus-operated.{}.svc.cluster.local:9090",
self.sender.namespace.clone()
),
secure_json_data: None,
is_default: None,
editable: None,
},
values_from: None,
},
};
@@ -516,7 +524,9 @@ impl K8sPrometheusCRDAlertingInterpret {
spec: GrafanaDashboardSpec {
resync_period: Some("30s".to_string()),
instance_selector: labels.clone(),
json,
json: Some(json),
grafana_com: None,
datasources: None,
},
};

View File

@@ -9,11 +9,17 @@ use crate::{
};
#[async_trait]
pub trait PrometheusApplicationMonitoring<S: AlertSender> {
pub trait PrometheusMonitoring<S: AlertSender> {
async fn install_prometheus(
&self,
sender: &S,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<S>>>>,
) -> Result<PreparationOutcome, PreparationError>;
async fn ensure_prometheus_operator(
&self,
sender: &S,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError>;
}

View File

@@ -38,7 +38,7 @@ use crate::{
};
use harmony_types::id::Id;
use super::prometheus::PrometheusApplicationMonitoring;
use super::prometheus::PrometheusMonitoring;
#[derive(Clone, Debug, Serialize)]
pub struct RHOBAlertingScore {
@@ -48,8 +48,8 @@ pub struct RHOBAlertingScore {
pub prometheus_rules: Vec<RuleGroup>,
}
impl<T: Topology + K8sclient + Ingress + PrometheusApplicationMonitoring<RHOBObservability>>
Score<T> for RHOBAlertingScore
impl<T: Topology + K8sclient + Ingress + PrometheusMonitoring<RHOBObservability>> Score<T>
for RHOBAlertingScore
{
fn create_interpret(&self) -> Box<dyn crate::interpret::Interpret<T>> {
Box::new(RHOBAlertingInterpret {
@@ -74,8 +74,8 @@ pub struct RHOBAlertingInterpret {
}
#[async_trait]
impl<T: Topology + K8sclient + Ingress + PrometheusApplicationMonitoring<RHOBObservability>>
Interpret<T> for RHOBAlertingInterpret
impl<T: Topology + K8sclient + Ingress + PrometheusMonitoring<RHOBObservability>> Interpret<T>
for RHOBAlertingInterpret
{
async fn execute(
&self,

View File

@@ -4,7 +4,7 @@ use std::{
};
use async_trait::async_trait;
use log::{info, warn};
use log::{debug, warn};
use serde::{Deserialize, Serialize};
use tokio::time::sleep;
@@ -19,8 +19,8 @@ use harmony_types::id::Id;
#[derive(Debug, Clone, Serialize)]
pub struct CephRemoveOsd {
osd_deployment_name: String,
rook_ceph_namespace: String,
pub osd_deployment_name: String,
pub rook_ceph_namespace: String,
}
impl<T: Topology + K8sclient> Score<T> for CephRemoveOsd {
@@ -54,18 +54,17 @@ impl<T: Topology + K8sclient> Interpret<T> for CephRemoveOsdInterpret {
self.verify_deployment_scaled(client.clone()).await?;
self.delete_deployment(client.clone()).await?;
self.verify_deployment_deleted(client.clone()).await?;
let osd_id_full = self.get_ceph_osd_id().unwrap();
self.purge_ceph_osd(client.clone(), &osd_id_full).await?;
self.verify_ceph_osd_removal(client.clone(), &osd_id_full)
.await?;
self.purge_ceph_osd(client.clone()).await?;
self.verify_ceph_osd_removal(client.clone()).await?;
let osd_id_full = self.get_ceph_osd_id().unwrap();
Ok(Outcome::success(format!(
"Successfully removed OSD {} from rook-ceph cluster by deleting deployment {}",
osd_id_full, self.score.osd_deployment_name
)))
}
fn get_name(&self) -> InterpretName {
todo!()
InterpretName::CephRemoveOsd
}
fn get_version(&self) -> Version {
@@ -82,7 +81,7 @@ impl<T: Topology + K8sclient> Interpret<T> for CephRemoveOsdInterpret {
}
impl CephRemoveOsdInterpret {
pub fn get_ceph_osd_id(&self) -> Result<String, InterpretError> {
pub fn get_ceph_osd_id_numeric(&self) -> Result<String, InterpretError> {
let osd_id_numeric = self
.score
.osd_deployment_name
@@ -94,9 +93,14 @@ impl CephRemoveOsdInterpret {
self.score.osd_deployment_name
))
})?;
Ok(osd_id_numeric.to_string())
}
pub fn get_ceph_osd_id(&self) -> Result<String, InterpretError> {
let osd_id_numeric = self.get_ceph_osd_id_numeric().unwrap();
let osd_id_full = format!("osd.{}", osd_id_numeric);
info!(
debug!(
"Targeting Ceph OSD: {} (parsed from deployment {})",
osd_id_full, self.score.osd_deployment_name
);
@@ -108,6 +112,7 @@ impl CephRemoveOsdInterpret {
&self,
client: Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
debug!("verifying toolbox exists");
let toolbox_dep = "rook-ceph-tools".to_string();
match client
@@ -149,7 +154,7 @@ impl CephRemoveOsdInterpret {
&self,
client: Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
info!(
debug!(
"Scaling down OSD deployment: {}",
self.score.osd_deployment_name
);
@@ -172,7 +177,7 @@ impl CephRemoveOsdInterpret {
) -> Result<Outcome, InterpretError> {
let (timeout, interval, start) = self.build_timer();
info!("Waiting for OSD deployment to scale down to 0 replicas");
debug!("Waiting for OSD deployment to scale down to 0 replicas");
loop {
let dep = client
.get_deployment(
@@ -180,11 +185,9 @@ impl CephRemoveOsdInterpret {
Some(&self.score.rook_ceph_namespace),
)
.await?;
if let Some(deployment) = dep {
if let Some(status) = deployment.status {
if status.replicas.unwrap_or(1) == 0 && status.ready_replicas.unwrap_or(1) == 0
{
if status.replicas == None && status.ready_replicas == None {
return Ok(Outcome::success(
"Deployment successfully scaled down.".to_string(),
));
@@ -212,7 +215,7 @@ impl CephRemoveOsdInterpret {
&self,
client: Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
info!(
debug!(
"Deleting OSD deployment: {}",
self.score.osd_deployment_name
);
@@ -234,7 +237,7 @@ impl CephRemoveOsdInterpret {
) -> Result<Outcome, InterpretError> {
let (timeout, interval, start) = self.build_timer();
info!("Waiting for OSD deployment to scale down to 0 replicas");
debug!("Verifying OSD deployment deleted");
loop {
let dep = client
.get_deployment(
@@ -244,7 +247,7 @@ impl CephRemoveOsdInterpret {
.await?;
if dep.is_none() {
info!(
debug!(
"Deployment {} successfully deleted.",
self.score.osd_deployment_name
);
@@ -276,12 +279,10 @@ impl CephRemoveOsdInterpret {
Ok(tree)
}
pub async fn purge_ceph_osd(
&self,
client: Arc<K8sClient>,
osd_id_full: &str,
) -> Result<Outcome, InterpretError> {
info!(
pub async fn purge_ceph_osd(&self, client: Arc<K8sClient>) -> Result<Outcome, InterpretError> {
let osd_id_numeric = self.get_ceph_osd_id_numeric().unwrap();
let osd_id_full = self.get_ceph_osd_id().unwrap();
debug!(
"Purging OSD {} from Ceph cluster and removing its auth key",
osd_id_full
);
@@ -291,8 +292,9 @@ impl CephRemoveOsdInterpret {
"app".to_string(),
Some(&self.score.rook_ceph_namespace),
vec![
format!("ceph osd purge {osd_id_full} --yes-i-really-mean-it").as_str(),
format!("ceph auth del osd.{osd_id_full}").as_str(),
"sh",
"-c",
format!("ceph osd purge {osd_id_numeric} --yes-i-really-mean-it && ceph auth del {osd_id_full}").as_str(),
],
)
.await?;
@@ -305,10 +307,10 @@ impl CephRemoveOsdInterpret {
pub async fn verify_ceph_osd_removal(
&self,
client: Arc<K8sClient>,
osd_id_full: &str,
) -> Result<Outcome, InterpretError> {
let (timeout, interval, start) = self.build_timer();
info!(
let osd_id_full = self.get_ceph_osd_id().unwrap();
debug!(
"Verifying OSD {} has been removed from the Ceph tree...",
osd_id_full
);
@@ -318,7 +320,7 @@ impl CephRemoveOsdInterpret {
"rook-ceph-tools".to_string(),
"app".to_string(),
Some(&self.score.rook_ceph_namespace),
vec!["ceph osd tree -f json"],
vec!["sh", "-c", "ceph osd tree -f json"],
)
.await?;
let tree =

View File

@@ -1,2 +1,2 @@
pub mod ceph_osd_replacement_score;
pub mod ceph_remove_osd_score;
pub mod ceph_validate_health_score;

View File

@@ -40,7 +40,7 @@ pub fn init() {
HarmonyEvent::HarmonyFinished => {
if !details.is_empty() {
println!(
"\n{} All done! Here's what's next for you:",
"\n{} All done! Here's a few info for you:",
theme::EMOJI_SUMMARY
);
for detail in details.iter() {

View File

@@ -216,7 +216,7 @@ pub struct System {
pub maximumfrags: Option<MaybeString>,
pub aliasesresolveinterval: Option<MaybeString>,
pub maximumtableentries: Option<MaybeString>,
pub language: String,
pub language: Option<MaybeString>,
pub dnsserver: Option<MaybeString>,
pub dns1gw: Option<String>,
pub dns2gw: Option<String>,