Compare commits

..

27 Commits

Author SHA1 Message Date
a9fe4ab267 fix: cargo fmt
All checks were successful
Run Check Script / check (pull_request) Successful in 1m0s
2025-08-25 13:33:36 -04:00
65cc9befeb mod.rs
Some checks failed
Run Check Script / check (pull_request) Failing after 20s
2025-08-25 13:31:39 -04:00
d456a1f9ee feat: score to validate whether the ceph cluster is healthy 2025-08-25 13:30:32 -04:00
d36c574590 Merge pull request 'feat/inventory_agent' (#119) from feat/inventory_agent into master
Some checks failed
Run Check Script / check (push) Failing after 38s
Compile and package harmony_composer / package_harmony_composer (push) Successful in 5m48s
Reviewed-on: #119
2025-08-22 01:55:52 +00:00
bfca9cf163 Merge pull request 'feat/ceph-osd-score' (#116) from feat/ceph-osd-score into master
Some checks failed
Run Check Script / check (push) Failing after 36s
Compile and package harmony_composer / package_harmony_composer (push) Successful in 15m5s
Reviewed-on: #116
Reviewed-by: johnride <jg@nationtech.io>
2025-08-20 18:19:42 +00:00
cd3ea6fc10 fix: added check to ensure that rook-ceph-tools is available in the designated namespace
All checks were successful
Run Check Script / check (pull_request) Successful in 1m16s
2025-08-20 12:54:19 -04:00
89eb88d10e feat: socre to remove an osd from the ceph osd tree using K8sClient to interact with rook-ceph-toolbox pod 2025-08-20 12:09:55 -04:00
72fb05b5cc fix(inventory_agent) : Agent now retreives correct dmidecode fields, fixed uuid generation which is unacceptable, fixed storage drive parsing, much better error handling, much more strict behavior which also leads to more complete output as missing fields will raise errors unless explicitely optional 2025-08-19 17:56:06 -04:00
6685b05cc5 wip(inventory_agent): Refactoring for better error handling in progress 2025-08-19 17:05:23 -04:00
07116eb8a6 Merge pull request 'feat: Harmony inventory agent crate that exposes an endpoint listing the host hardware. Has to be reviewed, generated 99% by GLM-4.5' (#115) from feat/inventory_agent into master
Some checks failed
Run Check Script / check (push) Failing after 27s
Compile and package harmony_composer / package_harmony_composer (push) Successful in 5m34s
Reviewed-on: #115
2025-08-19 16:58:00 +00:00
3f34f868eb Merge remote-tracking branch 'origin/master' into feat/inventory_agent
Some checks failed
Run Check Script / check (pull_request) Failing after 29s
2025-08-19 12:56:10 -04:00
bc6f7336d2 feat(inventory_agent): use HARMONY_INVENTORY_AGENT_PORT as environment variable to set port
Some checks failed
Run Check Script / check (pull_request) Failing after 25s
2025-08-19 12:55:03 -04:00
01da8631da chore(inventory_agent): Cargo fmt
Some checks failed
Run Check Script / check (pull_request) Failing after 24s
2025-08-19 12:44:49 -04:00
67b5c2df07 Merge pull request 'feat: Add iobench project and python dashboard' (#112) from feat/iobench into master
All checks were successful
Run Check Script / check (push) Successful in 1m11s
Compile and package harmony_composer / package_harmony_composer (push) Successful in 5m41s
Reviewed-on: #112
2025-08-19 16:24:31 +00:00
1eaf63417b Merge pull request 'feat/secrets' (#111) from feat/secrets into master
Some checks failed
Compile and package harmony_composer / package_harmony_composer (push) Waiting to run
Run Check Script / check (push) Has been cancelled
Reviewed-on: #111

This pull request introduces a comprehensive and ergonomic secret management system via a new harmony-secret crate.
What's Done

    New harmony-secret Crate:
        A new crate dedicated to secret management, providing a clean, static API: SecretManager::get::<MySecret>() and SecretManager::set(&my_secret).
        A #[derive(Secret)] procedural macro that automatically uses the struct's name as the secret key, simplifying usage.
        An async SecretStore trait to support various backend implementations.

    Two Secret Store Implementations:
        LocalFileSecretStore: A simple file-based store that saves secrets as JSON in the user's data directory. Ideal for local development and testing.
        InfisicalSecretStore: A production-ready implementation that integrates with Infisical for centralized, secure secret management.

    Configuration via Environment Variables:
        The secret store is selected at runtime via the HARMONY_SECRET_STORE environment variable (file or infisical).
        Infisical integration is configured through HARMONY_SECRET_INFISICAL_* variables.

What's Not Done (Future Work)

    Automated Infisical Setup: The initial configuration for the Infisical backend is currently manual. Developers must create a project and a Universal Auth identity in Infisical and set the corresponding environment variables to run tests or use the backend. The new test_harmony_secret_infisical.sh script serves as a clear example of the required variables.

This new secrets module provides a solid and secure foundation for managing credentials for components like OPNsense, Kubernetes, and other infrastructure services going forward. Even with the manual first-time setup for Infisical, this architecture is robust enough to serve our needs for the foreseeable future.
2025-08-19 16:23:45 +00:00
5e7803d2ba chore(iobench-dash): Delete older revisions and rename to iobench-dash.py for clarity
All checks were successful
Run Check Script / check (pull_request) Successful in 1m3s
2025-08-19 12:21:42 -04:00
9a610661c7 chore: Add description and license fields to Cargo.toml to allow publishing the crate
All checks were successful
Run Check Script / check (pull_request) Successful in 1m1s
2025-08-19 12:12:41 -04:00
70a65ed5d0 Merge remote-tracking branch 'origin/master' into feat/secrets
All checks were successful
Run Check Script / check (pull_request) Successful in 1m9s
2025-08-19 12:00:19 -04:00
26e8e386b9 feat: Secret module works with infisical and local file storage backends
All checks were successful
Run Check Script / check (pull_request) Successful in 1m9s
2025-08-19 11:59:21 -04:00
19cb7f73bc feat: Harmony inventory agent crate that exposes an endpoint listing the host hardware. Has to be reviewed, generated 99% by GLM-4.5
Some checks failed
Run Check Script / check (pull_request) Failing after 29s
2025-08-19 11:24:20 -04:00
84f38974b1 Merge pull request 'fix: bring back the TUI' (#110) from fix-tui into master
All checks were successful
Run Check Script / check (push) Successful in 1m15s
Compile and package harmony_composer / package_harmony_composer (push) Successful in 5m34s
Reviewed-on: #110
2025-08-15 20:01:59 +00:00
7d027bcfc4 Merge pull request 'fix: remove indicatif in harmony_cli to simplify logging and fixing interactions' (#109) from rip-indicatif into master
Some checks failed
Compile and package harmony_composer / package_harmony_composer (push) Waiting to run
Run Check Script / check (push) Has been cancelled
Reviewed-on: #109
2025-08-15 20:01:13 +00:00
d1a274b705 fix: checks deployment status ready replicas rather than pod name since the pod name is not necessarily matching the deployment name and often has a random generated number in it 2025-08-15 15:44:06 -04:00
b43ca7c740 feat: score for preparing rook ceph cluster to remove drive based on rook-ceph-osd deployment name added functions to K8sclient to be able to scale deployment to a desired replicaset number and get pod based on name and namespace 2025-08-15 14:51:16 -04:00
Ian Letourneau
610ce84280 fix: bring back to TUI
All checks were successful
Run Check Script / check (pull_request) Successful in 1m20s
2025-08-15 12:47:36 -04:00
Ian Letourneau
8bb4a9d3f6 fix: remove indicatif in harmony_cli to simplify logging and fixing interactions
All checks were successful
Run Check Script / check (pull_request) Successful in 1m7s
2025-08-15 11:26:54 -04:00
fd8f643a8f feat: Add iobench project and python dashboard
All checks were successful
Run Check Script / check (pull_request) Successful in 1m3s
2025-08-14 10:37:30 -04:00
32 changed files with 2848 additions and 275 deletions

412
Cargo.lock generated
View File

@@ -2,6 +2,189 @@
# It is not intended for manual editing.
version = 4
[[package]]
name = "actix-codec"
version = "0.5.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5f7b0a21988c1bf877cf4759ef5ddaac04c1c9fe808c9142ecb78ba97d97a28a"
dependencies = [
"bitflags 2.9.1",
"bytes",
"futures-core",
"futures-sink",
"memchr",
"pin-project-lite",
"tokio",
"tokio-util",
"tracing",
]
[[package]]
name = "actix-http"
version = "3.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "44dfe5c9e0004c623edc65391dfd51daa201e7e30ebd9c9bedf873048ec32bc2"
dependencies = [
"actix-codec",
"actix-rt",
"actix-service",
"actix-utils",
"base64 0.22.1",
"bitflags 2.9.1",
"brotli",
"bytes",
"bytestring",
"derive_more",
"encoding_rs",
"flate2",
"foldhash",
"futures-core",
"h2 0.3.26",
"http 0.2.12",
"httparse",
"httpdate",
"itoa",
"language-tags",
"local-channel",
"mime",
"percent-encoding",
"pin-project-lite",
"rand 0.9.1",
"sha1",
"smallvec",
"tokio",
"tokio-util",
"tracing",
"zstd",
]
[[package]]
name = "actix-macros"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e01ed3140b2f8d422c68afa1ed2e85d996ea619c988ac834d255db32138655cb"
dependencies = [
"quote",
"syn",
]
[[package]]
name = "actix-router"
version = "0.5.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "13d324164c51f63867b57e73ba5936ea151b8a41a1d23d1031eeb9f70d0236f8"
dependencies = [
"bytestring",
"cfg-if",
"http 0.2.12",
"regex",
"regex-lite",
"serde",
"tracing",
]
[[package]]
name = "actix-rt"
version = "2.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "24eda4e2a6e042aa4e55ac438a2ae052d3b5da0ecf83d7411e1a368946925208"
dependencies = [
"futures-core",
"tokio",
]
[[package]]
name = "actix-server"
version = "2.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a65064ea4a457eaf07f2fba30b4c695bf43b721790e9530d26cb6f9019ff7502"
dependencies = [
"actix-rt",
"actix-service",
"actix-utils",
"futures-core",
"futures-util",
"mio 1.0.4",
"socket2 0.5.10",
"tokio",
"tracing",
]
[[package]]
name = "actix-service"
version = "2.0.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9e46f36bf0e5af44bdc4bdb36fbbd421aa98c79a9bce724e1edeb3894e10dc7f"
dependencies = [
"futures-core",
"pin-project-lite",
]
[[package]]
name = "actix-utils"
version = "3.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "88a1dcdff1466e3c2488e1cb5c36a71822750ad43839937f85d2f4d9f8b705d8"
dependencies = [
"local-waker",
"pin-project-lite",
]
[[package]]
name = "actix-web"
version = "4.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a597b77b5c6d6a1e1097fddde329a83665e25c5437c696a3a9a4aa514a614dea"
dependencies = [
"actix-codec",
"actix-http",
"actix-macros",
"actix-router",
"actix-rt",
"actix-server",
"actix-service",
"actix-utils",
"actix-web-codegen",
"bytes",
"bytestring",
"cfg-if",
"cookie",
"derive_more",
"encoding_rs",
"foldhash",
"futures-core",
"futures-util",
"impl-more",
"itoa",
"language-tags",
"log",
"mime",
"once_cell",
"pin-project-lite",
"regex",
"regex-lite",
"serde",
"serde_json",
"serde_urlencoded",
"smallvec",
"socket2 0.5.10",
"time",
"tracing",
"url",
]
[[package]]
name = "actix-web-codegen"
version = "4.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f591380e2e68490b5dfaf1dd1aa0ebe78d84ba7067078512b4ea6e4492d622b8"
dependencies = [
"actix-router",
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "addr2line"
version = "0.24.2"
@@ -75,6 +258,21 @@ dependencies = [
"memchr",
]
[[package]]
name = "alloc-no-stdlib"
version = "2.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cc7bb162ec39d46ab1ca8c77bf72e890535becd1751bb45f64c597edb4c8c6b3"
[[package]]
name = "alloc-stdlib"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "94fb8275041c72129eb51b7d0322c29b8387a0386127718b096429201a5d6ece"
dependencies = [
"alloc-no-stdlib",
]
[[package]]
name = "allocator-api2"
version = "0.2.21"
@@ -398,6 +596,27 @@ dependencies = [
"serde_with",
]
[[package]]
name = "brotli"
version = "8.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4bd8b9603c7aa97359dbd97ecf258968c95f3adddd6db2f7e7a5bef101c84560"
dependencies = [
"alloc-no-stdlib",
"alloc-stdlib",
"brotli-decompressor",
]
[[package]]
name = "brotli-decompressor"
version = "5.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "874bb8112abecc98cbd6d81ea4fa7e94fb9449648c93cc89aa40c81c24d7de03"
dependencies = [
"alloc-no-stdlib",
"alloc-stdlib",
]
[[package]]
name = "bstr"
version = "1.12.0"
@@ -427,6 +646,15 @@ version = "1.10.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d71b6127be86fdcfddb610f7182ac57211d4b18a3e9c82eb2d17662f2227ad6a"
[[package]]
name = "bytestring"
version = "1.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e465647ae23b2823b0753f50decb2d5a86d2bb2cac04788fafd1f80e45378e5f"
dependencies = [
"bytes",
]
[[package]]
name = "camino"
version = "1.1.10"
@@ -506,6 +734,8 @@ version = "1.2.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d487aa071b5f64da6f19a3e848e3578944b726ee5a4854b82172f02aa876bfdc"
dependencies = [
"jobserver",
"libc",
"shlex",
]
@@ -710,6 +940,17 @@ dependencies = [
"unicode-segmentation",
]
[[package]]
name = "cookie"
version = "0.16.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e859cd57d0710d9e06c381b550c06e76992472a8c6d527aecd2fc673dcc231fb"
dependencies = [
"percent-encoding",
"time",
"version_check",
]
[[package]]
name = "core-foundation"
version = "0.9.4"
@@ -763,6 +1004,25 @@ dependencies = [
"crossbeam-utils",
]
[[package]]
name = "crossbeam-deque"
version = "0.8.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9dd111b7b7f7d55b72c0a6ae361660ee5853c9af73f70c3c2ef6858b950e2e51"
dependencies = [
"crossbeam-epoch",
"crossbeam-utils",
]
[[package]]
name = "crossbeam-epoch"
version = "0.9.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5b82ac4a3c2ca9c3460964f020e1402edd5753411d7737aa39c3714ad1b5420e"
dependencies = [
"crossbeam-utils",
]
[[package]]
name = "crossbeam-utils"
version = "0.8.21"
@@ -973,6 +1233,7 @@ dependencies = [
"proc-macro2",
"quote",
"syn",
"unicode-xid",
]
[[package]]
@@ -1872,6 +2133,7 @@ name = "harmony_cli"
version = "0.1.0"
dependencies = [
"assert_cmd",
"chrono",
"clap",
"console",
"env_logger",
@@ -1906,6 +2168,18 @@ dependencies = [
"tokio",
]
[[package]]
name = "harmony_inventory_agent"
version = "0.1.0"
dependencies = [
"actix-web",
"env_logger",
"log",
"serde",
"serde_json",
"sysinfo",
]
[[package]]
name = "harmony_macros"
version = "0.1.0"
@@ -2345,7 +2619,7 @@ dependencies = [
"js-sys",
"log",
"wasm-bindgen",
"windows-core",
"windows-core 0.61.2",
]
[[package]]
@@ -2470,6 +2744,12 @@ dependencies = [
"icu_properties",
]
[[package]]
name = "impl-more"
version = "0.1.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e8a5a9a0ff0086c7a148acb942baaabeadf9504d10400b5a05645853729b9cd2"
[[package]]
name = "indenter"
version = "0.3.3"
@@ -2654,6 +2934,16 @@ dependencies = [
"syn",
]
[[package]]
name = "jobserver"
version = "0.1.33"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "38f262f097c174adebe41eb73d66ae9c06b2844fb0da69969647bbddd9b0538a"
dependencies = [
"getrandom 0.3.3",
"libc",
]
[[package]]
name = "js-sys"
version = "0.3.77"
@@ -2856,6 +3146,12 @@ dependencies = [
"tracing",
]
[[package]]
name = "language-tags"
version = "0.3.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d4345964bb142484797b161f473a503a434de77149dd8c7427788c6e13379388"
[[package]]
name = "lazy_static"
version = "1.5.0"
@@ -2919,6 +3215,23 @@ version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "241eaef5fd12c88705a01fc1066c48c4b36e0dd4377dcdc7ec3942cea7a69956"
[[package]]
name = "local-channel"
version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b6cbc85e69b8df4b8bb8b89ec634e7189099cea8927a276b7384ce5488e53ec8"
dependencies = [
"futures-core",
"futures-sink",
"local-waker",
]
[[package]]
name = "local-waker"
version = "0.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4d873d7c67ce09b42110d801813efbc9364414e356be9935700d368351657487"
[[package]]
name = "lock_api"
version = "0.4.13"
@@ -3054,6 +3367,15 @@ dependencies = [
"serde",
]
[[package]]
name = "ntapi"
version = "0.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e8a3895c6391c39d7fe7ebc444a87eb2991b2a0bc718fdabd071eec617fc68e4"
dependencies = [
"winapi",
]
[[package]]
name = "num-bigint"
version = "0.4.6"
@@ -3838,6 +4160,26 @@ dependencies = [
"unicode-width 0.2.0",
]
[[package]]
name = "rayon"
version = "1.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "368f01d005bf8fd9b1206fb6fa653e6c4a81ceb1466406b81792d87c5677a58f"
dependencies = [
"either",
"rayon-core",
]
[[package]]
name = "rayon-core"
version = "1.13.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "22e18b0f0062d30d4230b2e85ff77fdfe4326feb054b9783a3460d8435c8ab91"
dependencies = [
"crossbeam-deque",
"crossbeam-utils",
]
[[package]]
name = "redox_syscall"
version = "0.5.13"
@@ -3901,6 +4243,12 @@ dependencies = [
"regex-syntax",
]
[[package]]
name = "regex-lite"
version = "0.1.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "53a49587ad06b26609c52e423de037e7f57f20d53535d66e08c695f347df952a"
[[package]]
name = "regex-syntax"
version = "0.8.5"
@@ -4955,6 +5303,21 @@ dependencies = [
"syn",
]
[[package]]
name = "sysinfo"
version = "0.30.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0a5b4ddaee55fb2bea2bf0e5000747e5f5c0de765e5a5ff87f4cd106439f4bb3"
dependencies = [
"cfg-if",
"core-foundation-sys",
"libc",
"ntapi",
"once_cell",
"rayon",
"windows",
]
[[package]]
name = "system-configuration"
version = "0.5.1"
@@ -5758,6 +6121,25 @@ version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"
[[package]]
name = "windows"
version = "0.52.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e48a53791691ab099e5e2ad123536d0fff50652600abaf43bbf952894110d0be"
dependencies = [
"windows-core 0.52.0",
"windows-targets 0.52.6",
]
[[package]]
name = "windows-core"
version = "0.52.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "33ab640c8d7e35bf8ba19b884ba838ceb4fba93a4e8c65a9059d08afcfc683d9"
dependencies = [
"windows-targets 0.52.6",
]
[[package]]
name = "windows-core"
version = "0.61.2"
@@ -6241,3 +6623,31 @@ dependencies = [
"quote",
"syn",
]
[[package]]
name = "zstd"
version = "0.13.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e91ee311a569c327171651566e07972200e76fcfe2242a4fa446149a3881c08a"
dependencies = [
"zstd-safe",
]
[[package]]
name = "zstd-safe"
version = "7.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8f49c4d5f0abb602a93fb8736af2a4f4dd9512e36f7f570d66e65ff867ed3b9d"
dependencies = [
"zstd-sys",
]
[[package]]
name = "zstd-sys"
version = "2.0.15+zstd.1.5.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "eb81183ddd97d0c74cedf1d50d85c8d08c1b8b68ee863bdee9e706eedba1a237"
dependencies = [
"cc",
"pkg-config",
]

View File

@@ -12,6 +12,7 @@ members = [
"harmony_cli",
"k3d",
"harmony_composer",
"harmony_inventory_agent",
"harmony_secret_derive",
"harmony_secret",
]
@@ -22,7 +23,7 @@ readme = "README.md"
license = "GNU AGPL v3"
[workspace.dependencies]
log = "0.4"
log = { version = "0.4", features = ["kv"] }
env_logger = "0.11"
derive-new = "0.7"
async-trait = "0.1"
@@ -62,3 +63,5 @@ tar = "0.4.44"
lazy_static = "1.5.0"
directories = "6.0.0"
thiserror = "2.0.14"
serde = { version = "1.0.209", features = ["derive", "rc"] }
serde_json = "1.0.127"

Binary file not shown.

View File

@@ -8,7 +8,6 @@ use harmony::{
hardware::{FirewallGroup, HostCategory, Location, PhysicalHost, SwitchGroup},
infra::opnsense::OPNSenseManagementInterface,
inventory::Inventory,
maestro::Maestro,
modules::{
http::StaticFilesHttpScore,
ipxe::IpxeScore,
@@ -130,16 +129,21 @@ async fn main() {
"./data/watchguard/pxe-http-files".to_string(),
));
let ipxe_score = IpxeScore::new();
let mut maestro = Maestro::initialize(inventory, topology).await.unwrap();
maestro.register_all(vec![
Box::new(dns_score),
Box::new(bootstrap_dhcp_score),
Box::new(bootstrap_load_balancer_score),
Box::new(load_balancer_score),
Box::new(tftp_score),
Box::new(http_score),
Box::new(ipxe_score),
Box::new(dhcp_score),
]);
harmony_tui::init(maestro).await.unwrap();
harmony_tui::run(
inventory,
topology,
vec![
Box::new(dns_score),
Box::new(bootstrap_dhcp_score),
Box::new(bootstrap_load_balancer_score),
Box::new(load_balancer_score),
Box::new(tftp_score),
Box::new(http_score),
Box::new(ipxe_score),
Box::new(dhcp_score),
],
)
.await
.unwrap();
}

View File

@@ -8,7 +8,6 @@ use harmony::{
hardware::{FirewallGroup, HostCategory, Location, PhysicalHost, SwitchGroup},
infra::opnsense::OPNSenseManagementInterface,
inventory::Inventory,
maestro::Maestro,
modules::{
dummy::{ErrorScore, PanicScore, SuccessScore},
http::StaticFilesHttpScore,
@@ -84,20 +83,25 @@ async fn main() {
let http_score = StaticFilesHttpScore::new(Url::LocalFolder(
"./data/watchguard/pxe-http-files".to_string(),
));
let mut maestro = Maestro::initialize(inventory, topology).await.unwrap();
maestro.register_all(vec![
Box::new(dns_score),
Box::new(dhcp_score),
Box::new(load_balancer_score),
Box::new(tftp_score),
Box::new(http_score),
Box::new(OPNsenseShellCommandScore {
opnsense: opnsense.get_opnsense_config(),
command: "touch /tmp/helloharmonytouching".to_string(),
}),
Box::new(SuccessScore {}),
Box::new(ErrorScore {}),
Box::new(PanicScore {}),
]);
harmony_tui::init(maestro).await.unwrap();
harmony_tui::run(
inventory,
topology,
vec![
Box::new(dns_score),
Box::new(dhcp_score),
Box::new(load_balancer_score),
Box::new(tftp_score),
Box::new(http_score),
Box::new(OPNsenseShellCommandScore {
opnsense: opnsense.get_opnsense_config(),
command: "touch /tmp/helloharmonytouching".to_string(),
}),
Box::new(SuccessScore {}),
Box::new(ErrorScore {}),
Box::new(PanicScore {}),
],
)
.await
.unwrap();
}

View File

@@ -2,7 +2,6 @@ use std::net::{SocketAddr, SocketAddrV4};
use harmony::{
inventory::Inventory,
maestro::Maestro,
modules::{
dns::DnsScore,
dummy::{ErrorScore, PanicScore, SuccessScore},
@@ -16,18 +15,19 @@ use harmony_macros::ipv4;
#[tokio::main]
async fn main() {
let inventory = Inventory::autoload();
let topology = DummyInfra {};
let mut maestro = Maestro::initialize(inventory, topology).await.unwrap();
maestro.register_all(vec![
Box::new(SuccessScore {}),
Box::new(ErrorScore {}),
Box::new(PanicScore {}),
Box::new(DnsScore::new(vec![], None)),
Box::new(build_large_score()),
]);
harmony_tui::init(maestro).await.unwrap();
harmony_tui::run(
Inventory::autoload(),
DummyInfra {},
vec![
Box::new(SuccessScore {}),
Box::new(ErrorScore {}),
Box::new(PanicScore {}),
Box::new(DnsScore::new(vec![], None)),
Box::new(build_large_score()),
],
)
.await
.unwrap();
}
fn build_large_score() -> LoadBalancerScore {

View File

@@ -0,0 +1,11 @@
[package]
name = "example_validate_ceph_cluster_health"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
[dependencies]
harmony = { version = "0.1.0", path = "../../harmony" }
harmony_cli = { version = "0.1.0", path = "../../harmony_cli" }
tokio.workspace = true

View File

@@ -0,0 +1,18 @@
use harmony::{
inventory::Inventory,
modules::storage::ceph::ceph_validate_health_score::CephVerifyClusterHealth,
topology::K8sAnywhereTopology,
};
#[tokio::main]
async fn main() {
let ceph_health_score = CephVerifyClusterHealth {
rook_ceph_namespace: "rook-ceph".to_string(),
};
let topology = K8sAnywhereTopology::from_env();
let inventory = Inventory::autoload();
harmony_cli::run(inventory, topology, vec![Box::new(ceph_health_score)], None)
.await
.unwrap();
}

View File

@@ -16,8 +16,8 @@ reqwest = { version = "0.11", features = ["blocking", "json"] }
russh = "0.45.0"
rust-ipmi = "0.1.1"
semver = "1.0.23"
serde = { version = "1.0.209", features = ["derive", "rc"] }
serde_json = "1.0.127"
serde.workspace = true
serde_json.workspace = true
tokio.workspace = true
derive-new.workspace = true
log.workspace = true

View File

@@ -32,6 +32,7 @@ pub enum InterpretName {
Lamp,
ApplicationMonitoring,
K8sPrometheusCrdAlerting,
CephClusterHealth,
}
impl std::fmt::Display for InterpretName {
@@ -58,6 +59,7 @@ impl std::fmt::Display for InterpretName {
InterpretName::Lamp => f.write_str("LAMP"),
InterpretName::ApplicationMonitoring => f.write_str("ApplicationMonitoring"),
InterpretName::K8sPrometheusCrdAlerting => f.write_str("K8sPrometheusCrdAlerting"),
InterpretName::CephClusterHealth => f.write_str("CephClusterHealth"),
}
}
}

View File

@@ -241,7 +241,7 @@ pub struct DummyInfra;
#[async_trait]
impl Topology for DummyInfra {
fn name(&self) -> &str {
todo!()
"DummyInfra"
}
async fn ensure_ready(&self) -> Result<PreparationOutcome, PreparationError> {

View File

@@ -5,7 +5,7 @@ use k8s_openapi::{
};
use kube::{
Client, Config, Error, Resource,
api::{Api, AttachParams, ListParams, Patch, PatchParams, ResourceExt},
api::{Api, AttachParams, DeleteParams, ListParams, Patch, PatchParams, ResourceExt},
config::{KubeConfigOptions, Kubeconfig},
core::ErrorResponse,
runtime::reflector::Lookup,
@@ -17,7 +17,9 @@ use kube::{
};
use log::{debug, error, trace};
use serde::{Serialize, de::DeserializeOwned};
use serde_json::json;
use similar::TextDiff;
use tokio::io::AsyncReadExt;
#[derive(new, Clone)]
pub struct K8sClient {
@@ -51,6 +53,66 @@ impl K8sClient {
})
}
pub async fn get_deployment(
&self,
name: &str,
namespace: Option<&str>,
) -> Result<Option<Deployment>, Error> {
let deps: Api<Deployment> = if let Some(ns) = namespace {
Api::namespaced(self.client.clone(), ns)
} else {
Api::default_namespaced(self.client.clone())
};
Ok(deps.get_opt(name).await?)
}
pub async fn get_pod(&self, name: &str, namespace: Option<&str>) -> Result<Option<Pod>, Error> {
let pods: Api<Pod> = if let Some(ns) = namespace {
Api::namespaced(self.client.clone(), ns)
} else {
Api::default_namespaced(self.client.clone())
};
Ok(pods.get_opt(name).await?)
}
pub async fn scale_deployment(
&self,
name: &str,
namespace: Option<&str>,
replicas: u32,
) -> Result<(), Error> {
let deployments: Api<Deployment> = if let Some(ns) = namespace {
Api::namespaced(self.client.clone(), ns)
} else {
Api::default_namespaced(self.client.clone())
};
let patch = json!({
"spec": {
"replicas": replicas
}
});
let pp = PatchParams::default();
let scale = Patch::Apply(&patch);
deployments.patch_scale(name, &pp, &scale).await?;
Ok(())
}
pub async fn delete_deployment(
&self,
name: &str,
namespace: Option<&str>,
) -> Result<(), Error> {
let deployments: Api<Deployment> = if let Some(ns) = namespace {
Api::namespaced(self.client.clone(), ns)
} else {
Api::default_namespaced(self.client.clone())
};
let delete_params = DeleteParams::default();
deployments.delete(name, &delete_params).await?;
Ok(())
}
pub async fn wait_until_deployment_ready(
&self,
name: String,
@@ -76,6 +138,68 @@ impl K8sClient {
}
}
/// Will execute a commond in the first pod found that matches the specified label
/// '{label}={name}'
pub async fn exec_app_capture_output(
&self,
name: String,
label: String,
namespace: Option<&str>,
command: Vec<&str>,
) -> Result<String, String> {
let api: Api<Pod>;
if let Some(ns) = namespace {
api = Api::namespaced(self.client.clone(), ns);
} else {
api = Api::default_namespaced(self.client.clone());
}
let pod_list = api
.list(&ListParams::default().labels(format!("{label}={name}").as_str()))
.await
.expect("couldn't get list of pods");
let res = api
.exec(
pod_list
.items
.first()
.expect("couldn't get pod")
.name()
.expect("couldn't get pod name")
.into_owned()
.as_str(),
command,
&AttachParams::default().stdout(true).stderr(true),
)
.await;
match res {
Err(e) => Err(e.to_string()),
Ok(mut process) => {
let status = process
.take_status()
.expect("Couldn't get status")
.await
.expect("Couldn't unwrap status");
if let Some(s) = status.status {
let mut stdout_buf = String::new();
if let Some(mut stdout) = process.stdout().take() {
stdout.read_to_string(&mut stdout_buf).await;
}
debug!("Status: {} - {:?}", s, status.details);
if s == "Success" {
Ok(stdout_buf)
} else {
Err(s)
}
} else {
Err("Couldn't get inner status of pod exec".to_string())
}
}
}
}
/// Will execute a command in the first pod found that matches the label `app.kubernetes.io/name={name}`
pub async fn exec_app(
&self,

View File

@@ -22,18 +22,12 @@ pub struct OPNSenseFirewall {
host: LogicalHost,
}
// TODO figure out a design to have a unique identifiere for this firewall
// I think a project identifier would be good enough, then the secrets module configuration will
// point to the project's vault and this opnsense modules doesn't need to know anything about it
const OPNSENSE_CREDENTIALS: &str = "OPNSENSE_CREDENTIALS";
impl OPNSenseFirewall {
pub fn get_ip(&self) -> IpAddress {
self.host.ip
}
pub async fn new(host: LogicalHost, port: Option<u16>, username: &str, password: &str) -> Self {
// let credentials = Secrets::get_by_name(OPNSENSE_CREDENTIALS)
Self {
opnsense_config: Arc::new(RwLock::new(
opnsense_config::Config::from_credentials(host.ip, port, username, password).await,

View File

@@ -14,5 +14,6 @@ pub mod monitoring;
pub mod okd;
pub mod opnsense;
pub mod prometheus;
pub mod storage;
pub mod tenant;
pub mod tftp;

View File

@@ -0,0 +1,419 @@
use std::{
process::Command,
sync::Arc,
time::{Duration, Instant},
};
use async_trait::async_trait;
use log::{info, warn};
use serde::{Deserialize, Serialize};
use tokio::time::sleep;
use crate::{
data::{Id, Version},
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
score::Score,
topology::{K8sclient, Topology, k8s::K8sClient},
};
#[derive(Debug, Clone, Serialize)]
pub struct CephRemoveOsd {
osd_deployment_name: String,
rook_ceph_namespace: String,
}
impl<T: Topology + K8sclient> Score<T> for CephRemoveOsd {
fn name(&self) -> String {
format!("CephRemoveOsdScore")
}
#[doc(hidden)]
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(CephRemoveOsdInterpret {
score: self.clone(),
})
}
}
#[derive(Debug, Clone)]
pub struct CephRemoveOsdInterpret {
score: CephRemoveOsd,
}
#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for CephRemoveOsdInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
let client = topology.k8s_client().await.unwrap();
self.verify_ceph_toolbox_exists(client.clone()).await?;
self.scale_deployment(client.clone()).await?;
self.verify_deployment_scaled(client.clone()).await?;
self.delete_deployment(client.clone()).await?;
self.verify_deployment_deleted(client.clone()).await?;
let osd_id_full = self.get_ceph_osd_id().unwrap();
self.purge_ceph_osd(client.clone(), &osd_id_full).await?;
self.verify_ceph_osd_removal(client.clone(), &osd_id_full)
.await?;
Ok(Outcome::success(format!(
"Successfully removed OSD {} from rook-ceph cluster by deleting deployment {}",
osd_id_full, self.score.osd_deployment_name
)))
}
fn get_name(&self) -> InterpretName {
todo!()
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
impl CephRemoveOsdInterpret {
pub fn get_ceph_osd_id(&self) -> Result<String, InterpretError> {
let osd_id_numeric = self
.score
.osd_deployment_name
.split('-')
.nth(3)
.ok_or_else(|| {
InterpretError::new(format!(
"Could not parse OSD id from deployment name {}",
self.score.osd_deployment_name
))
})?;
let osd_id_full = format!("osd.{}", osd_id_numeric);
info!(
"Targeting Ceph OSD: {} (parsed from deployment {})",
osd_id_full, self.score.osd_deployment_name
);
Ok(osd_id_full)
}
pub async fn verify_ceph_toolbox_exists(
&self,
client: Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
let toolbox_dep = "rook-ceph-tools".to_string();
match client
.get_deployment(&toolbox_dep, Some(&self.score.rook_ceph_namespace))
.await
{
Ok(Some(deployment)) => {
if let Some(status) = deployment.status {
let ready_count = status.ready_replicas.unwrap_or(0);
if ready_count >= 1 {
return Ok(Outcome::success(format!(
"'{}' is ready with {} replica(s).",
&toolbox_dep, ready_count
)));
} else {
return Err(InterpretError::new(
"ceph-tool-box not ready in cluster".to_string(),
));
}
} else {
Err(InterpretError::new(format!(
"failed to get deployment status {}",
&toolbox_dep
)))
}
}
Ok(None) => Err(InterpretError::new(format!(
"Deployment '{}' not found in namespace '{}'.",
&toolbox_dep, self.score.rook_ceph_namespace
))),
Err(e) => Err(InterpretError::new(format!(
"Failed to query for deployment '{}': {}",
&toolbox_dep, e
))),
}
}
pub async fn scale_deployment(
&self,
client: Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
info!(
"Scaling down OSD deployment: {}",
self.score.osd_deployment_name
);
client
.scale_deployment(
&self.score.osd_deployment_name,
Some(&self.score.rook_ceph_namespace),
0,
)
.await?;
Ok(Outcome::success(format!(
"Scaled down deployment {}",
self.score.osd_deployment_name
)))
}
pub async fn verify_deployment_scaled(
&self,
client: Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
let (timeout, interval, start) = self.build_timer();
info!("Waiting for OSD deployment to scale down to 0 replicas");
loop {
let dep = client
.get_deployment(
&self.score.osd_deployment_name,
Some(&self.score.rook_ceph_namespace),
)
.await?;
if let Some(deployment) = dep {
if let Some(status) = deployment.status {
if status.replicas.unwrap_or(1) == 0 && status.ready_replicas.unwrap_or(1) == 0
{
return Ok(Outcome::success(
"Deployment successfully scaled down.".to_string(),
));
}
}
}
if start.elapsed() > timeout {
return Err(InterpretError::new(format!(
"Timed out waiting for deployment {} to scale down",
self.score.osd_deployment_name
)));
}
sleep(interval).await;
}
}
fn build_timer(&self) -> (Duration, Duration, Instant) {
let timeout = Duration::from_secs(120);
let interval = Duration::from_secs(5);
let start = Instant::now();
(timeout, interval, start)
}
pub async fn delete_deployment(
&self,
client: Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
info!(
"Deleting OSD deployment: {}",
self.score.osd_deployment_name
);
client
.delete_deployment(
&self.score.osd_deployment_name,
Some(&self.score.rook_ceph_namespace),
)
.await?;
Ok(Outcome::success(format!(
"deployment {} deleted",
self.score.osd_deployment_name
)))
}
pub async fn verify_deployment_deleted(
&self,
client: Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
let (timeout, interval, start) = self.build_timer();
info!("Waiting for OSD deployment to scale down to 0 replicas");
loop {
let dep = client
.get_deployment(
&self.score.osd_deployment_name,
Some(&self.score.rook_ceph_namespace),
)
.await?;
if dep.is_none() {
info!(
"Deployment {} successfully deleted.",
self.score.osd_deployment_name
);
return Ok(Outcome::success(format!(
"Deployment {} deleted.",
self.score.osd_deployment_name
)));
}
if start.elapsed() > timeout {
return Err(InterpretError::new(format!(
"Timed out waiting for deployment {} to be deleted",
self.score.osd_deployment_name
)));
}
sleep(interval).await;
}
}
fn get_osd_tree(&self, json: serde_json::Value) -> Result<CephOsdTree, InterpretError> {
let nodes = json.get("nodes").ok_or_else(|| {
InterpretError::new("Missing 'nodes' field in ceph osd tree JSON".to_string())
})?;
let tree: CephOsdTree = CephOsdTree {
nodes: serde_json::from_value(nodes.clone()).map_err(|e| {
InterpretError::new(format!("Failed to parse ceph osd tree JSON: {}", e))
})?,
};
Ok(tree)
}
pub async fn purge_ceph_osd(
&self,
client: Arc<K8sClient>,
osd_id_full: &str,
) -> Result<Outcome, InterpretError> {
info!(
"Purging OSD {} from Ceph cluster and removing its auth key",
osd_id_full
);
client
.exec_app_capture_output(
"rook-ceph-tools".to_string(),
"app".to_string(),
Some(&self.score.rook_ceph_namespace),
vec![
format!("ceph osd purge {osd_id_full} --yes-i-really-mean-it").as_str(),
format!("ceph auth del osd.{osd_id_full}").as_str(),
],
)
.await?;
Ok(Outcome::success(format!(
"osd id {} removed from osd tree",
osd_id_full
)))
}
pub async fn verify_ceph_osd_removal(
&self,
client: Arc<K8sClient>,
osd_id_full: &str,
) -> Result<Outcome, InterpretError> {
let (timeout, interval, start) = self.build_timer();
info!(
"Verifying OSD {} has been removed from the Ceph tree...",
osd_id_full
);
loop {
let output = client
.exec_app_capture_output(
"rook-ceph-tools".to_string(),
"app".to_string(),
Some(&self.score.rook_ceph_namespace),
vec!["ceph osd tree -f json"],
)
.await?;
let tree =
self.get_osd_tree(serde_json::from_str(&output).expect("could not extract json"));
let osd_found = tree
.unwrap()
.nodes
.iter()
.any(|node| node.name == osd_id_full);
if !osd_found {
return Ok(Outcome::success(format!(
"Successfully verified that OSD {} is removed from the Ceph cluster.",
osd_id_full,
)));
}
if start.elapsed() > timeout {
return Err(InterpretError::new(format!(
"Timed out waiting for OSD {} to be removed from Ceph tree",
osd_id_full
)));
}
warn!(
"OSD {} still found in Ceph tree, retrying in {:?}...",
osd_id_full, interval
);
sleep(interval).await;
}
}
}
#[derive(Debug, Deserialize, PartialEq)]
pub struct CephOsdTree {
pub nodes: Vec<CephNode>,
}
#[derive(Debug, Deserialize, PartialEq)]
pub struct CephNode {
pub id: i32,
pub name: String,
#[serde(rename = "type")]
pub node_type: String,
pub type_id: Option<i32>,
pub children: Option<Vec<i32>>,
pub exists: Option<i32>,
pub status: Option<String>,
}
#[cfg(test)]
mod tests {
use serde_json::json;
use super::*;
#[test]
fn test_get_osd_tree() {
let json_data = json!({
"nodes": [
{"id": 1, "name": "osd.1", "type": "osd", "primary_affinity":"1"},
{"id": 2, "name": "osd.2", "type": "osd", "crush_weight": 1.22344}
]
});
let interpret = CephRemoveOsdInterpret {
score: CephRemoveOsd {
osd_deployment_name: "osd-1".to_string(),
rook_ceph_namespace: "dummy_ns".to_string(),
},
};
let json = interpret.get_osd_tree(json_data).unwrap();
let expected = CephOsdTree {
nodes: vec![
CephNode {
id: 1,
name: "osd.1".to_string(),
node_type: "osd".to_string(),
type_id: None,
children: None,
exists: None,
status: None,
},
CephNode {
id: 2,
name: "osd.2".to_string(),
node_type: "osd".to_string(),
type_id: None,
children: None,
exists: None,
status: None,
},
],
};
assert_eq!(json, expected);
}
}

View File

@@ -0,0 +1,136 @@
use std::{sync::Arc, time::Duration};
use async_trait::async_trait;
use log::debug;
use serde::Serialize;
use tokio::time::Instant;
use crate::{
data::{Id, Version},
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
score::Score,
topology::{K8sclient, Topology, k8s::K8sClient},
};
#[derive(Clone, Debug, Serialize)]
pub struct CephVerifyClusterHealth {
pub rook_ceph_namespace: String,
}
impl<T: Topology + K8sclient> Score<T> for CephVerifyClusterHealth {
fn name(&self) -> String {
format!("CephValidateClusterHealth")
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(CephVerifyClusterHealthInterpret {
score: self.clone(),
})
}
}
#[derive(Clone, Debug)]
pub struct CephVerifyClusterHealthInterpret {
score: CephVerifyClusterHealth,
}
#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for CephVerifyClusterHealthInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
let client = topology.k8s_client().await.unwrap();
self.verify_ceph_toolbox_exists(client.clone()).await?;
self.validate_ceph_cluster_health(client.clone()).await?;
Ok(Outcome::success("Ceph cluster healthy".to_string()))
}
fn get_name(&self) -> InterpretName {
InterpretName::CephClusterHealth
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
impl CephVerifyClusterHealthInterpret {
pub async fn verify_ceph_toolbox_exists(
&self,
client: Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
let toolbox_dep = "rook-ceph-tools".to_string();
match client
.get_deployment(&toolbox_dep, Some(&self.score.rook_ceph_namespace))
.await
{
Ok(Some(deployment)) => {
if let Some(status) = deployment.status {
let ready_count = status.ready_replicas.unwrap_or(0);
if ready_count >= 1 {
return Ok(Outcome::success(format!(
"'{}' is ready with {} replica(s).",
&toolbox_dep, ready_count
)));
} else {
return Err(InterpretError::new(
"ceph-tool-box not ready in cluster".to_string(),
));
}
} else {
Err(InterpretError::new(format!(
"failed to get deployment status {}",
&toolbox_dep
)))
}
}
Ok(None) => Err(InterpretError::new(format!(
"Deployment '{}' not found in namespace '{}'.",
&toolbox_dep, self.score.rook_ceph_namespace
))),
Err(e) => Err(InterpretError::new(format!(
"Failed to query for deployment '{}': {}",
&toolbox_dep, e
))),
}
}
pub async fn validate_ceph_cluster_health(
&self,
client: Arc<K8sClient>,
) -> Result<Outcome, InterpretError> {
debug!("Verifying ceph cluster is in healthy state");
let health = client
.exec_app_capture_output(
"rook-ceph-tools".to_string(),
"app".to_string(),
Some(&self.score.rook_ceph_namespace),
vec!["sh", "-c", "ceph health"],
)
.await?;
if health.contains("HEALTH_OK") {
return Ok(Outcome::success(
"Ceph Cluster in healthy state".to_string(),
));
} else {
Err(InterpretError::new(format!(
"Ceph cluster unhealthy {}",
health
)))
}
}
}

View File

@@ -0,0 +1,2 @@
pub mod ceph_osd_replacement_score;
pub mod ceph_validate_health_score;

View File

@@ -0,0 +1 @@
pub mod ceph;

View File

@@ -1,51 +0,0 @@
use async_trait::async_trait;
use chrono::{DateTime, Utc};
use serde::Serialize;
use crate::{interpret::InterpretError, score::Score, topology::Topology};
/// Create and manage Tenant Credentials.
///
/// This is meant to be used by cluster administrators who need to provide their tenant users and
/// services with credentials to access their resources.
#[derive(Debug, Clone, Serialize)]
pub struct TenantCredentialScore;
impl<T: Topology + TenantCredentialManager> Score<T> for TenantCredentialScore {
fn create_interpret(&self) -> Box<dyn crate::interpret::Interpret<T>> {
todo!()
}
fn name(&self) -> String {
todo!()
}
}
#[async_trait]
pub trait TenantCredentialManager {
async fn create_user(&self) -> Result<TenantCredentialBundle, InterpretError>;
}
#[derive(Debug, Clone)]
pub struct CredentialMetadata {
pub tenant_id: String,
pub credential_id: String,
pub description: String,
pub created_at: DateTime<Utc>,
pub expires_at: Option<DateTime<Utc>>,
}
#[derive(Debug, Clone)]
pub enum CredentialData {
/// Used to store login instructions destined to a human. Akin to AWS login instructions email
/// upon new console user creation.
PlainText(String),
}
pub struct TenantCredentialBundle {
_metadata: CredentialMetadata,
_content: CredentialData,
}
impl TenantCredentialBundle {}

View File

@@ -22,6 +22,7 @@ indicatif = "0.18.0"
lazy_static = "1.5.0"
log.workspace = true
indicatif-log-bridge = "0.2.3"
chrono.workspace = true
[dev-dependencies]
harmony = { path = "../harmony", features = ["testing"] }

View File

@@ -1,22 +1,17 @@
use chrono::Local;
use console::style;
use harmony::{
instrumentation::{self, HarmonyEvent},
modules::application::ApplicationFeatureStatus,
topology::TopologyStatus,
};
use indicatif::MultiProgress;
use indicatif_log_bridge::LogWrapper;
use log::error;
use std::{
sync::{Arc, Mutex},
thread,
time::Duration,
};
use crate::progress::{IndicatifProgressTracker, ProgressTracker};
use log::{error, info, log_enabled};
use std::io::Write;
use std::sync::{Arc, Mutex};
pub fn init() -> tokio::task::JoinHandle<()> {
let base_progress = configure_logger();
let handle = tokio::spawn(handle_events(base_progress));
configure_logger();
let handle = tokio::spawn(handle_events());
loop {
if instrumentation::instrument(HarmonyEvent::HarmonyStarted).is_ok() {
@@ -27,28 +22,76 @@ pub fn init() -> tokio::task::JoinHandle<()> {
handle
}
fn configure_logger() -> MultiProgress {
let logger =
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).build();
let level = logger.filter();
let progress = MultiProgress::new();
fn configure_logger() {
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info"))
.format(|buf, record| {
let debug_mode = log_enabled!(log::Level::Debug);
let timestamp = Local::now().format("%Y-%m-%d %H:%M:%S");
LogWrapper::new(progress.clone(), logger)
.try_init()
.unwrap();
log::set_max_level(level);
progress
let level = match record.level() {
log::Level::Error => style("ERROR").red(),
log::Level::Warn => style("WARN").yellow(),
log::Level::Info => style("INFO").green(),
log::Level::Debug => style("DEBUG").blue(),
log::Level::Trace => style("TRACE").magenta(),
};
if let Some(status) = record.key_values().get(log::kv::Key::from("status")) {
let status = status.to_borrowed_str().unwrap();
let emoji = match status {
"finished" => style(crate::theme::EMOJI_SUCCESS.to_string()).green(),
"skipped" => style(crate::theme::EMOJI_SKIP.to_string()).yellow(),
"failed" => style(crate::theme::EMOJI_ERROR.to_string()).red(),
_ => style("".into()),
};
if debug_mode {
writeln!(
buf,
"[{} {:<5} {}] {} {}",
timestamp,
level,
record.target(),
emoji,
record.args()
)
} else {
writeln!(buf, "[{:<5}] {} {}", level, emoji, record.args())
}
} else if let Some(emoji) = record.key_values().get(log::kv::Key::from("emoji")) {
if debug_mode {
writeln!(
buf,
"[{} {:<5} {}] {} {}",
timestamp,
level,
record.target(),
emoji,
record.args()
)
} else {
writeln!(buf, "[{:<5}] {} {}", level, emoji, record.args())
}
} else if debug_mode {
writeln!(
buf,
"[{} {:<5} {}] {}",
timestamp,
level,
record.target(),
record.args()
)
} else {
writeln!(buf, "[{:<5}] {}", level, record.args())
}
})
.init();
}
async fn handle_events(base_progress: MultiProgress) {
let progress_tracker = Arc::new(IndicatifProgressTracker::new(base_progress.clone()));
async fn handle_events() {
let preparing_topology = Arc::new(Mutex::new(false));
let current_score: Arc<Mutex<Option<String>>> = Arc::new(Mutex::new(None));
instrumentation::subscribe("Harmony CLI Logger", {
move |event| {
let progress_tracker = Arc::clone(&progress_tracker);
let preparing_topology = Arc::clone(&preparing_topology);
let current_score = Arc::clone(&current_score);
@@ -59,90 +102,57 @@ async fn handle_events(base_progress: MultiProgress) {
match event {
HarmonyEvent::HarmonyStarted => {}
HarmonyEvent::HarmonyFinished => {
progress_tracker.add_section(
"harmony-summary",
&format!("\n{} Harmony completed\n\n", crate::theme::EMOJI_HARMONY),
);
progress_tracker.add_section("harmony-finished", "\n\n");
thread::sleep(Duration::from_millis(200));
let emoji = crate::theme::EMOJI_HARMONY.to_string();
info!(emoji = emoji.as_str(); "Harmony completed");
return false;
}
HarmonyEvent::TopologyStateChanged {
topology,
status,
message,
} => {
let section_key = topology_key(&topology);
match status {
TopologyStatus::Queued => {}
TopologyStatus::Preparing => {
progress_tracker.add_section(
&section_key,
&format!(
"\n{} Preparing environment: {topology}...",
crate::theme::EMOJI_TOPOLOGY
),
);
(*preparing_topology) = true;
}
TopologyStatus::Success => {
(*preparing_topology) = false;
progress_tracker.add_task(&section_key, "topology-success", "");
progress_tracker
.finish_task("topology-success", &message.unwrap_or("".into()));
}
TopologyStatus::Noop => {
(*preparing_topology) = false;
progress_tracker.add_task(&section_key, "topology-skip", "");
progress_tracker
.skip_task("topology-skip", &message.unwrap_or("".into()));
}
TopologyStatus::Error => {
progress_tracker.add_task(&section_key, "topology-error", "");
(*preparing_topology) = false;
progress_tracker
.fail_task("topology-error", &message.unwrap_or("".into()));
} => match status {
TopologyStatus::Queued => {}
TopologyStatus::Preparing => {
let emoji = format!("{}", style(crate::theme::EMOJI_TOPOLOGY.to_string()).yellow());
info!(emoji = emoji.as_str(); "Preparing environment: {topology}...");
(*preparing_topology) = true;
}
TopologyStatus::Success => {
(*preparing_topology) = false;
if let Some(message) = message {
info!(status = "finished"; "{message}");
}
}
}
TopologyStatus::Noop => {
(*preparing_topology) = false;
if let Some(message) = message {
info!(status = "skipped"; "{message}");
}
}
TopologyStatus::Error => {
(*preparing_topology) = false;
if let Some(message) = message {
error!(status = "failed"; "{message}");
}
}
},
HarmonyEvent::InterpretExecutionStarted {
execution_id: task_key,
topology,
execution_id: _,
topology: _,
interpret: _,
score,
message,
} => {
let is_key_topology = (*preparing_topology)
&& progress_tracker.contains_section(&topology_key(&topology));
let is_key_current_score = current_score.is_some()
&& progress_tracker
.contains_section(&score_key(&current_score.clone().unwrap()));
let is_key_score = progress_tracker.contains_section(&score_key(&score));
let section_key = if is_key_topology {
topology_key(&topology)
} else if is_key_current_score {
score_key(&current_score.clone().unwrap())
} else if is_key_score {
score_key(&score)
if *preparing_topology || current_score.is_some() {
info!("{message}");
} else {
(*current_score) = Some(score.clone());
let key = score_key(&score);
progress_tracker.add_section(
&key,
&format!(
"{} Interpreting score: {score}...",
crate::theme::EMOJI_SCORE
),
);
key
};
progress_tracker.add_task(&section_key, &task_key, &message);
let emoji = format!("{}", style(crate::theme::EMOJI_SCORE).blue());
info!(emoji = emoji.as_str(); "Interpreting score: {score}...");
}
}
HarmonyEvent::InterpretExecutionFinished {
execution_id: task_key,
execution_id: _,
topology: _,
interpret: _,
score,
@@ -155,16 +165,17 @@ async fn handle_events(base_progress: MultiProgress) {
match outcome {
Ok(outcome) => match outcome.status {
harmony::interpret::InterpretStatus::SUCCESS => {
progress_tracker.finish_task(&task_key, &outcome.message);
info!(status = "finished"; "{}", outcome.message);
}
harmony::interpret::InterpretStatus::NOOP => {
progress_tracker.skip_task(&task_key, &outcome.message);
info!(status = "skipped"; "{}", outcome.message);
}
_ => {
error!(status = "failed"; "{}", outcome.message);
}
_ => progress_tracker.fail_task(&task_key, &outcome.message),
},
Err(err) => {
error!("Interpret error: {err}");
progress_tracker.fail_task(&task_key, &err.to_string());
error!(status = "failed"; "{}", err);
}
}
}
@@ -173,30 +184,17 @@ async fn handle_events(base_progress: MultiProgress) {
application,
feature,
status,
} => {
if let Some(score) = &(*current_score) {
let section_key = score_key(score);
let task_key = app_feature_key(&application, &feature);
match status {
ApplicationFeatureStatus::Installing => {
let message = format!("Feature '{}' installing...", feature);
progress_tracker.add_task(&section_key, &task_key, &message);
}
ApplicationFeatureStatus::Installed => {
let message = format!("Feature '{}' installed", feature);
progress_tracker.finish_task(&task_key, &message);
}
ApplicationFeatureStatus::Failed { details } => {
let message = format!(
"Feature '{}' installation failed: {}",
feature, details
);
progress_tracker.fail_task(&task_key, &message);
}
}
} => match status {
ApplicationFeatureStatus::Installing => {
info!("Installing feature '{}' for '{}'...", feature, application);
}
}
ApplicationFeatureStatus::Installed => {
info!(status = "finished"; "Feature '{}' installed", feature);
}
ApplicationFeatureStatus::Failed { details } => {
error!(status = "failed"; "Feature '{}' installation failed: {}", feature, details);
}
},
}
true
}
@@ -204,15 +202,3 @@ async fn handle_events(base_progress: MultiProgress) {
})
.await;
}
fn topology_key(topology: &str) -> String {
format!("topology-{topology}")
}
fn score_key(score: &str) -> String {
format!("score-{score}")
}
fn app_feature_key(application: &str, feature: &str) -> String {
format!("app-{application}-{feature}")
}

View File

@@ -90,13 +90,37 @@ pub async fn run<T: Topology + Send + Sync + 'static>(
topology: T,
scores: Vec<Box<dyn Score<T>>>,
args_struct: Option<Args>,
) -> Result<(), Box<dyn std::error::Error>> {
let args = match args_struct {
Some(args) => args,
None => Args::parse(),
};
#[cfg(not(feature = "tui"))]
if args.interactive {
return Err("Not compiled with interactive support".into());
}
#[cfg(feature = "tui")]
if args.interactive {
return harmony_tui::run(inventory, topology, scores).await;
}
run_cli(inventory, topology, scores, args).await
}
pub async fn run_cli<T: Topology + Send + Sync + 'static>(
inventory: Inventory,
topology: T,
scores: Vec<Box<dyn Score<T>>>,
args: Args,
) -> Result<(), Box<dyn std::error::Error>> {
let cli_logger_handle = cli_logger::init();
let mut maestro = Maestro::initialize(inventory, topology).await.unwrap();
maestro.register_all(scores);
let result = init(maestro, args_struct).await;
let result = init(maestro, args).await;
instrumentation::instrument(instrumentation::HarmonyEvent::HarmonyFinished).unwrap();
let _ = tokio::try_join!(cli_logger_handle);
@@ -105,23 +129,8 @@ pub async fn run<T: Topology + Send + Sync + 'static>(
async fn init<T: Topology + Send + Sync + 'static>(
maestro: harmony::maestro::Maestro<T>,
args_struct: Option<Args>,
args: Args,
) -> Result<(), Box<dyn std::error::Error>> {
let args = match args_struct {
Some(args) => args,
None => Args::parse(),
};
#[cfg(feature = "tui")]
if args.interactive {
return harmony_tui::init(maestro).await;
}
#[cfg(not(feature = "tui"))]
if args.interactive {
return Err("Not compiled with interactive support".into());
}
let _ = env_logger::builder().try_init();
let scores_vec = maestro_scores_filter(&maestro, args.all, args.filter, args.number);
@@ -193,14 +202,14 @@ mod tests {
let maestro = init_test_maestro();
let res = crate::init(
maestro,
Some(crate::Args {
crate::Args {
yes: true,
filter: Some("SuccessScore".to_owned()),
interactive: false,
all: true,
number: 0,
list: false,
}),
},
)
.await;
@@ -213,14 +222,14 @@ mod tests {
let res = crate::init(
maestro,
Some(crate::Args {
crate::Args {
yes: true,
filter: Some("ErrorScore".to_owned()),
interactive: false,
all: true,
number: 0,
list: false,
}),
},
)
.await;
@@ -233,14 +242,14 @@ mod tests {
let res = crate::init(
maestro,
Some(crate::Args {
crate::Args {
yes: true,
filter: None,
interactive: false,
all: false,
number: 0,
list: false,
}),
},
)
.await;

View File

@@ -0,0 +1,12 @@
[package]
name = "harmony_inventory_agent"
version = "0.1.0"
edition = "2024"
[dependencies]
actix-web = "4.4"
sysinfo = "0.30"
serde.workspace = true
serde_json.workspace = true
log.workspace = true
env_logger.workspace = true

View File

@@ -0,0 +1,825 @@
use log::debug;
use serde::{Deserialize, Serialize};
use serde_json::Value;
use std::fs;
use std::path::Path;
use std::process::Command;
use sysinfo::System;
#[derive(Serialize, Deserialize, Debug)]
pub struct PhysicalHost {
pub storage_drives: Vec<StorageDrive>,
pub storage_controller: StorageController,
pub memory_modules: Vec<MemoryModule>,
pub cpus: Vec<CPU>,
pub chipset: Chipset,
pub network_interfaces: Vec<NetworkInterface>,
pub management_interface: Option<ManagementInterface>,
pub host_uuid: String,
}
#[derive(Serialize, Deserialize, Debug)]
pub struct StorageDrive {
pub name: String,
pub model: String,
pub serial: String,
pub size_bytes: u64,
pub logical_block_size: u32,
pub physical_block_size: u32,
pub rotational: bool,
pub wwn: Option<String>,
pub interface_type: String,
pub smart_status: Option<String>,
}
#[derive(Serialize, Deserialize, Debug)]
pub struct StorageController {
pub name: String,
pub driver: String,
}
#[derive(Serialize, Deserialize, Debug)]
pub struct MemoryModule {
pub size_bytes: u64,
pub speed_mhz: Option<u32>,
pub manufacturer: Option<String>,
pub part_number: Option<String>,
pub serial_number: Option<String>,
pub rank: Option<u8>,
}
#[derive(Serialize, Deserialize, Debug)]
pub struct CPU {
pub model: String,
pub vendor: String,
pub cores: u32,
pub threads: u32,
pub frequency_mhz: u64,
}
#[derive(Serialize, Deserialize, Debug)]
pub struct Chipset {
pub name: String,
pub vendor: String,
}
#[derive(Serialize, Deserialize, Debug)]
pub struct NetworkInterface {
pub name: String,
pub mac_address: String,
pub speed_mbps: Option<u32>,
pub is_up: bool,
pub mtu: u32,
pub ipv4_addresses: Vec<String>,
pub ipv6_addresses: Vec<String>,
pub driver: String,
pub firmware_version: Option<String>,
}
#[derive(Serialize, Deserialize, Debug)]
pub struct ManagementInterface {
pub kind: String,
pub address: Option<String>,
pub firmware: Option<String>,
}
impl PhysicalHost {
pub fn gather() -> Result<Self, String> {
let mut sys = System::new_all();
sys.refresh_all();
Self::all_tools_available()?;
Ok(Self {
storage_drives: Self::gather_storage_drives()?,
storage_controller: Self::gather_storage_controller()?,
memory_modules: Self::gather_memory_modules()?,
cpus: Self::gather_cpus(&sys)?,
chipset: Self::gather_chipset()?,
network_interfaces: Self::gather_network_interfaces()?,
management_interface: Self::gather_management_interface()?,
host_uuid: Self::get_host_uuid()?,
})
}
fn all_tools_available() -> Result<(), String> {
let required_tools = [
("lsblk", "--version"),
("lspci", "--version"),
("lsmod", "--version"),
("dmidecode", "--version"),
("smartctl", "--version"),
("ip", "route"), // No version flag available
];
let mut missing_tools = Vec::new();
for (tool, tool_arg) in required_tools.iter() {
// First check if tool exists in PATH using which(1)
let exists = if let Ok(output) = Command::new("which").arg(tool).output() {
output.status.success()
} else {
// Fallback: manual PATH search if which(1) is unavailable
if let Ok(path_var) = std::env::var("PATH") {
path_var.split(':').any(|dir| {
let tool_path = std::path::Path::new(dir).join(tool);
tool_path.exists() && Self::is_executable(&tool_path)
})
} else {
false
}
};
if !exists {
missing_tools.push(*tool);
continue;
}
// Verify tool is functional by checking version/help output
let mut cmd = Command::new(tool);
cmd.arg(tool_arg);
cmd.stdout(std::process::Stdio::null());
cmd.stderr(std::process::Stdio::null());
if let Ok(status) = cmd.status() {
if !status.success() {
missing_tools.push(*tool);
}
} else {
missing_tools.push(*tool);
}
}
if !missing_tools.is_empty() {
let missing_str = missing_tools
.iter()
.map(|s| s.to_string())
.collect::<Vec<String>>()
.join(", ");
return Err(format!(
"The following required tools are not available: {}. Please install these tools to use PhysicalHost::gather()",
missing_str
));
}
Ok(())
}
#[cfg(unix)]
fn is_executable(path: &std::path::Path) -> bool {
use std::os::unix::fs::PermissionsExt;
match std::fs::metadata(path) {
Ok(meta) => meta.permissions().mode() & 0o111 != 0,
Err(_) => false,
}
}
#[cfg(not(unix))]
fn is_executable(_path: &std::path::Path) -> bool {
// On non-Unix systems, we assume existence implies executability
true
}
fn gather_storage_drives() -> Result<Vec<StorageDrive>, String> {
let mut drives = Vec::new();
// Use lsblk with JSON output for robust parsing
let output = Command::new("lsblk")
.args([
"-d",
"-o",
"NAME,MODEL,SERIAL,SIZE,ROTA,WWN",
"-n",
"-e",
"7",
"--json",
])
.output()
.map_err(|e| format!("Failed to execute lsblk: {}", e))?;
if !output.status.success() {
return Err(format!(
"lsblk command failed: {}",
String::from_utf8_lossy(&output.stderr)
));
}
let json: Value = serde_json::from_slice(&output.stdout)
.map_err(|e| format!("Failed to parse lsblk JSON output: {}", e))?;
let blockdevices = json
.get("blockdevices")
.and_then(|v| v.as_array())
.ok_or("Invalid lsblk JSON: missing 'blockdevices' array")?;
for device in blockdevices {
let name = device
.get("name")
.and_then(|v| v.as_str())
.ok_or("Missing 'name' in lsblk device")?
.to_string();
if name.is_empty() {
continue;
}
let model = device
.get("model")
.and_then(|v| v.as_str())
.map(|s| s.trim().to_string())
.unwrap_or_default();
let serial = device
.get("serial")
.and_then(|v| v.as_str())
.map(|s| s.trim().to_string())
.unwrap_or_default();
let size_str = device
.get("size")
.and_then(|v| v.as_str())
.ok_or("Missing 'size' in lsblk device")?;
let size_bytes = Self::parse_size(size_str)?;
let rotational = device
.get("rota")
.and_then(|v| v.as_bool())
.ok_or("Missing 'rota' in lsblk device")?;
let wwn = device
.get("wwn")
.and_then(|v| v.as_str())
.map(|s| s.trim().to_string())
.filter(|s| !s.is_empty() && s != "null");
let device_path = Path::new("/sys/block").join(&name);
let logical_block_size = Self::read_sysfs_u32(
&device_path.join("queue/logical_block_size"),
)
.map_err(|e| format!("Failed to read logical block size for {}: {}", name, e))?;
let physical_block_size = Self::read_sysfs_u32(
&device_path.join("queue/physical_block_size"),
)
.map_err(|e| format!("Failed to read physical block size for {}: {}", name, e))?;
let interface_type = Self::get_interface_type(&name, &device_path)?;
let smart_status = Self::get_smart_status(&name)?;
let mut drive = StorageDrive {
name: name.clone(),
model,
serial,
size_bytes,
logical_block_size,
physical_block_size,
rotational,
wwn,
interface_type,
smart_status,
};
// Enhance with additional sysfs info if available
if device_path.exists() {
if drive.model.is_empty() {
drive.model = Self::read_sysfs_string(&device_path.join("device/model"))
.map_err(|e| format!("Failed to read model for {}: {}", name, e))?;
}
if drive.serial.is_empty() {
drive.serial = Self::read_sysfs_string(&device_path.join("device/serial"))
.map_err(|e| format!("Failed to read serial for {}: {}", name, e))?;
}
}
drives.push(drive);
}
Ok(drives)
}
fn gather_storage_controller() -> Result<StorageController, String> {
let mut controller = StorageController {
name: "Unknown".to_string(),
driver: "Unknown".to_string(),
};
// Use lspci with JSON output if available
let output = Command::new("lspci")
.args(["-nn", "-d", "::0100", "-J"]) // Storage controllers class with JSON
.output()
.map_err(|e| format!("Failed to execute lspci: {}", e))?;
if output.status.success() {
let json: Value = serde_json::from_slice(&output.stdout)
.map_err(|e| format!("Failed to parse lspci JSON output: {}", e))?;
if let Some(devices) = json.as_array() {
for device in devices {
if let Some(device_info) = device.as_object()
&& let Some(name) = device_info
.get("device")
.and_then(|v| v.as_object())
.and_then(|v| v.get("name"))
.and_then(|v| v.as_str())
{
controller.name = name.to_string();
break;
}
}
}
}
// Fallback to text output if JSON fails or no device found
if controller.name == "Unknown" {
let output = Command::new("lspci")
.args(["-nn", "-d", "::0100"]) // Storage controllers class
.output()
.map_err(|e| format!("Failed to execute lspci (fallback): {}", e))?;
if output.status.success() {
let output_str = String::from_utf8_lossy(&output.stdout);
if let Some(line) = output_str.lines().next() {
let parts: Vec<&str> = line.split(':').collect();
if parts.len() > 2 {
controller.name = parts[2].trim().to_string();
}
}
}
}
// Try to get driver info from lsmod
let output = Command::new("lsmod")
.output()
.map_err(|e| format!("Failed to execute lsmod: {}", e))?;
if output.status.success() {
let output_str = String::from_utf8_lossy(&output.stdout);
for line in output_str.lines() {
if line.contains("ahci")
|| line.contains("nvme")
|| line.contains("megaraid")
|| line.contains("mpt3sas")
{
let parts: Vec<&str> = line.split_whitespace().collect();
if !parts.is_empty() {
controller.driver = parts[0].to_string();
break;
}
}
}
}
Ok(controller)
}
fn gather_memory_modules() -> Result<Vec<MemoryModule>, String> {
let mut modules = Vec::new();
let output = Command::new("dmidecode")
.arg("--type")
.arg("17")
.output()
.map_err(|e| format!("Failed to execute dmidecode: {}", e))?;
if !output.status.success() {
return Err(format!(
"dmidecode command failed: {}",
String::from_utf8_lossy(&output.stderr)
));
}
let output_str = String::from_utf8(output.stdout)
.map_err(|e| format!("Failed to parse dmidecode output: {}", e))?;
let sections: Vec<&str> = output_str.split("Memory Device").collect();
for section in sections.into_iter().skip(1) {
let mut module = MemoryModule {
size_bytes: 0,
speed_mhz: None,
manufacturer: None,
part_number: None,
serial_number: None,
rank: None,
};
for line in section.lines() {
let line = line.trim();
if let Some(size_str) = line.strip_prefix("Size: ") {
if size_str != "No Module Installed"
&& let Some((num, unit)) = size_str.split_once(' ')
&& let Ok(num) = num.parse::<u64>()
{
module.size_bytes = match unit {
"MB" => num * 1024 * 1024,
"GB" => num * 1024 * 1024 * 1024,
"KB" => num * 1024,
_ => 0,
};
}
} else if let Some(speed_str) = line.strip_prefix("Speed: ") {
if let Some((num, _unit)) = speed_str.split_once(' ') {
module.speed_mhz = num.parse().ok();
}
} else if let Some(man) = line.strip_prefix("Manufacturer: ") {
module.manufacturer = Some(man.to_string());
} else if let Some(part) = line.strip_prefix("Part Number: ") {
module.part_number = Some(part.to_string());
} else if let Some(serial) = line.strip_prefix("Serial Number: ") {
module.serial_number = Some(serial.to_string());
} else if let Some(rank) = line.strip_prefix("Rank: ") {
module.rank = rank.parse().ok();
}
}
if module.size_bytes > 0 {
modules.push(module);
}
}
Ok(modules)
}
fn gather_cpus(sys: &System) -> Result<Vec<CPU>, String> {
let mut cpus = Vec::new();
let global_cpu = sys.global_cpu_info();
cpus.push(CPU {
model: global_cpu.brand().to_string(),
vendor: global_cpu.vendor_id().to_string(),
cores: sys.physical_core_count().unwrap_or(1) as u32,
threads: sys.cpus().len() as u32,
frequency_mhz: global_cpu.frequency(),
});
Ok(cpus)
}
fn gather_chipset() -> Result<Chipset, String> {
Ok(Chipset {
name: Self::read_dmi("baseboard-product-name")?,
vendor: Self::read_dmi("baseboard-manufacturer")?,
})
}
fn gather_network_interfaces() -> Result<Vec<NetworkInterface>, String> {
let mut interfaces = Vec::new();
let sys_net_path = Path::new("/sys/class/net");
let entries = fs::read_dir(sys_net_path)
.map_err(|e| format!("Failed to read /sys/class/net: {}", e))?;
for entry in entries {
let entry = entry.map_err(|e| format!("Failed to read directory entry: {}", e))?;
let iface_name = entry
.file_name()
.into_string()
.map_err(|_| "Invalid UTF-8 in interface name")?;
let iface_path = entry.path();
// Skip virtual interfaces
if iface_name.starts_with("lo")
|| iface_name.starts_with("docker")
|| iface_name.starts_with("virbr")
|| iface_name.starts_with("veth")
|| iface_name.starts_with("br-")
|| iface_name.starts_with("tun")
|| iface_name.starts_with("wg")
{
continue;
}
// Check if it's a physical interface by looking for device directory
if !iface_path.join("device").exists() {
continue;
}
let mac_address = Self::read_sysfs_string(&iface_path.join("address"))
.map_err(|e| format!("Failed to read MAC address for {}: {}", iface_name, e))?;
let speed_mbps = if iface_path.join("speed").exists() {
match Self::read_sysfs_u32(&iface_path.join("speed")) {
Ok(speed) => Some(speed),
Err(e) => {
debug!(
"Failed to read speed for {}: {} . This is expected to fail on wifi interfaces.",
iface_name, e
);
None
}
}
} else {
None
};
let operstate = Self::read_sysfs_string(&iface_path.join("operstate"))
.map_err(|e| format!("Failed to read operstate for {}: {}", iface_name, e))?;
let mtu = Self::read_sysfs_u32(&iface_path.join("mtu"))
.map_err(|e| format!("Failed to read MTU for {}: {}", iface_name, e))?;
let driver =
Self::read_sysfs_symlink_basename(&iface_path.join("device/driver/module"))
.map_err(|e| format!("Failed to read driver for {}: {}", iface_name, e))?;
let firmware_version = Self::read_sysfs_opt_string(
&iface_path.join("device/firmware_version"),
)
.map_err(|e| format!("Failed to read firmware version for {}: {}", iface_name, e))?;
// Get IP addresses using ip command with JSON output
let (ipv4_addresses, ipv6_addresses) = Self::get_interface_ips_json(&iface_name)
.map_err(|e| format!("Failed to get IP addresses for {}: {}", iface_name, e))?;
interfaces.push(NetworkInterface {
name: iface_name,
mac_address,
speed_mbps,
is_up: operstate == "up",
mtu,
ipv4_addresses,
ipv6_addresses,
driver,
firmware_version,
});
}
Ok(interfaces)
}
fn gather_management_interface() -> Result<Option<ManagementInterface>, String> {
if Path::new("/dev/ipmi0").exists() {
Ok(Some(ManagementInterface {
kind: "IPMI".to_string(),
address: None,
firmware: Some(Self::read_dmi("bios-version")?),
}))
} else if Path::new("/sys/class/misc/mei").exists() {
Ok(Some(ManagementInterface {
kind: "Intel ME".to_string(),
address: None,
firmware: None,
}))
} else {
Ok(None)
}
}
fn get_host_uuid() -> Result<String, String> {
Self::read_dmi("system-uuid")
}
// Helper methods
fn read_sysfs_string(path: &Path) -> Result<String, String> {
fs::read_to_string(path)
.map(|s| s.trim().to_string())
.map_err(|e| format!("Failed to read {}: {}", path.display(), e))
}
fn read_sysfs_opt_string(path: &Path) -> Result<Option<String>, String> {
match fs::read_to_string(path) {
Ok(s) => {
let s = s.trim().to_string();
Ok(if s.is_empty() { None } else { Some(s) })
}
Err(e) if e.kind() == std::io::ErrorKind::NotFound => Ok(None),
Err(e) => Err(format!("Failed to read {}: {}", path.display(), e)),
}
}
fn read_sysfs_u32(path: &Path) -> Result<u32, String> {
fs::read_to_string(path)
.map_err(|e| format!("Failed to read {}: {}", path.display(), e))?
.trim()
.parse()
.map_err(|e| format!("Failed to parse {}: {}", path.display(), e))
}
fn read_sysfs_symlink_basename(path: &Path) -> Result<String, String> {
match fs::read_link(path) {
Ok(target_path) => match target_path.file_name() {
Some(name_osstr) => match name_osstr.to_str() {
Some(name_str) => Ok(name_str.to_string()),
None => Err(format!(
"Symlink target basename is not valid UTF-8: {}",
target_path.display()
)),
},
None => Err(format!(
"Symlink target has no basename: {} -> {}",
path.display(),
target_path.display()
)),
},
Err(e) if e.kind() == std::io::ErrorKind::NotFound => Err(format!(
"Could not resolve symlink for path : {}",
path.display()
)),
Err(e) => Err(format!("Failed to read symlink {}: {}", path.display(), e)),
}
}
fn read_dmi(field: &str) -> Result<String, String> {
let output = Command::new("dmidecode")
.arg("-s")
.arg(field)
.output()
.map_err(|e| format!("Failed to execute dmidecode for field {}: {}", field, e))?;
if !output.status.success() {
return Err(format!(
"dmidecode command failed for field {}: {}",
field,
String::from_utf8_lossy(&output.stderr)
));
}
String::from_utf8(output.stdout)
.map(|s| s.trim().to_string())
.map_err(|e| {
format!(
"Failed to parse dmidecode output for field {}: {}",
field, e
)
})
}
fn get_interface_type(device_name: &str, device_path: &Path) -> Result<String, String> {
if device_name.starts_with("nvme") {
Ok("NVMe".to_string())
} else if device_name.starts_with("sd") {
Ok("SATA".to_string())
} else if device_name.starts_with("hd") {
Ok("IDE".to_string())
} else if device_name.starts_with("vd") {
Ok("VirtIO".to_string())
} else {
// Try to determine from device path
let subsystem = Self::read_sysfs_string(&device_path.join("device/subsystem"))?;
Ok(subsystem
.split('/')
.next_back()
.unwrap_or("Unknown")
.to_string())
}
}
fn get_smart_status(device_name: &str) -> Result<Option<String>, String> {
let output = Command::new("smartctl")
.arg("-H")
.arg(format!("/dev/{}", device_name))
.output()
.map_err(|e| format!("Failed to execute smartctl for {}: {}", device_name, e))?;
if !output.status.success() {
return Ok(None);
}
let stdout = String::from_utf8(output.stdout)
.map_err(|e| format!("Failed to parse smartctl output for {}: {}", device_name, e))?;
for line in stdout.lines() {
if line.contains("SMART overall-health self-assessment") {
if let Some(status) = line.split(':').nth(1) {
return Ok(Some(status.trim().to_string()));
}
}
}
Ok(None)
}
fn parse_size(size_str: &str) -> Result<u64, String> {
debug!("Parsing size_str '{size_str}'");
let size;
if size_str.ends_with('T') {
size = size_str[..size_str.len() - 1]
.parse::<f64>()
.map(|t| t * 1024.0 * 1024.0 * 1024.0 * 1024.0)
.map_err(|e| format!("Failed to parse T size '{}': {}", size_str, e))
} else if size_str.ends_with('G') {
size = size_str[..size_str.len() - 1]
.parse::<f64>()
.map(|g| g * 1024.0 * 1024.0 * 1024.0)
.map_err(|e| format!("Failed to parse G size '{}': {}", size_str, e))
} else if size_str.ends_with('M') {
size = size_str[..size_str.len() - 1]
.parse::<f64>()
.map(|m| m * 1024.0 * 1024.0)
.map_err(|e| format!("Failed to parse M size '{}': {}", size_str, e))
} else if size_str.ends_with('K') {
size = size_str[..size_str.len() - 1]
.parse::<f64>()
.map(|k| k * 1024.0)
.map_err(|e| format!("Failed to parse K size '{}': {}", size_str, e))
} else if size_str.ends_with('B') {
size = size_str[..size_str.len() - 1]
.parse::<f64>()
.map_err(|e| format!("Failed to parse B size '{}': {}", size_str, e))
} else {
size = size_str
.parse::<f64>()
.map_err(|e| format!("Failed to parse size '{}': {}", size_str, e))
}
size.map(|s| s as u64)
}
fn get_interface_ips_json(iface_name: &str) -> Result<(Vec<String>, Vec<String>), String> {
let mut ipv4 = Vec::new();
let mut ipv6 = Vec::new();
// Get IPv4 addresses using JSON output
let output = Command::new("ip")
.args(["-j", "-4", "addr", "show", iface_name])
.output()
.map_err(|e| {
format!(
"Failed to execute ip command for IPv4 on {}: {}",
iface_name, e
)
})?;
if !output.status.success() {
return Err(format!(
"ip command for IPv4 on {} failed: {}",
iface_name,
String::from_utf8_lossy(&output.stderr)
));
}
let json: Value = serde_json::from_slice(&output.stdout).map_err(|e| {
format!(
"Failed to parse ip JSON output for IPv4 on {}: {}",
iface_name, e
)
})?;
if let Some(addrs) = json.as_array() {
for addr_info in addrs {
if let Some(addr_info_obj) = addr_info.as_object()
&& let Some(addr_info) =
addr_info_obj.get("addr_info").and_then(|v| v.as_array())
{
for addr in addr_info {
if let Some(addr_obj) = addr.as_object()
&& let Some(ip) = addr_obj.get("local").and_then(|v| v.as_str())
{
ipv4.push(ip.to_string());
}
}
}
}
}
// Get IPv6 addresses using JSON output
let output = Command::new("ip")
.args(["-j", "-6", "addr", "show", iface_name])
.output()
.map_err(|e| {
format!(
"Failed to execute ip command for IPv6 on {}: {}",
iface_name, e
)
})?;
if !output.status.success() {
return Err(format!(
"ip command for IPv6 on {} failed: {}",
iface_name,
String::from_utf8_lossy(&output.stderr)
));
}
let json: Value = serde_json::from_slice(&output.stdout).map_err(|e| {
format!(
"Failed to parse ip JSON output for IPv6 on {}: {}",
iface_name, e
)
})?;
if let Some(addrs) = json.as_array() {
for addr_info in addrs {
if let Some(addr_info_obj) = addr_info.as_object()
&& let Some(addr_info) =
addr_info_obj.get("addr_info").and_then(|v| v.as_array())
{
for addr in addr_info {
if let Some(addr_obj) = addr.as_object()
&& let Some(ip) = addr_obj.get("local").and_then(|v| v.as_str())
{
// Skip link-local addresses
if !ip.starts_with("fe80::") {
ipv6.push(ip.to_string());
}
}
}
}
}
}
Ok((ipv4, ipv6))
}
}

View File

@@ -0,0 +1,37 @@
// src/main.rs
use actix_web::{App, HttpServer, Responder, get};
use hwinfo::PhysicalHost;
use std::env;
mod hwinfo;
#[get("/inventory")]
async fn inventory() -> impl Responder {
log::info!("Received inventory request");
let host = PhysicalHost::gather();
match host {
Ok(host) => {
log::info!("Inventory data gathered successfully");
actix_web::HttpResponse::Ok().json(host)
}
Err(error) => {
log::error!("Inventory data gathering FAILED");
actix_web::HttpResponse::InternalServerError().json(error)
}
}
}
#[actix_web::main]
async fn main() -> std::io::Result<()> {
env_logger::init();
let port = env::var("HARMONY_INVENTORY_AGENT_PORT").unwrap_or_else(|_| "8080".to_string());
let bind_addr = format!("0.0.0.0:{}", port);
log::info!("Starting inventory agent on {}", bind_addr);
HttpServer::new(|| App::new().service(inventory))
.bind(&bind_addr)?
.run()
.await
}

View File

@@ -9,7 +9,13 @@ use widget::{help::HelpWidget, score::ScoreListWidget};
use std::{panic, sync::Arc, time::Duration};
use crossterm::event::{Event, EventStream, KeyCode, KeyEventKind};
use harmony::{maestro::Maestro, score::Score, topology::Topology};
use harmony::{
instrumentation::{self, HarmonyEvent},
inventory::Inventory,
maestro::Maestro,
score::Score,
topology::Topology,
};
use ratatui::{
self, Frame,
layout::{Constraint, Layout, Position},
@@ -39,22 +45,62 @@ pub mod tui {
///
/// #[tokio::main]
/// async fn main() {
/// let inventory = Inventory::autoload();
/// let topology = HAClusterTopology::autoload();
/// let mut maestro = Maestro::new_without_initialization(inventory, topology);
///
/// maestro.register_all(vec![
/// Box::new(SuccessScore {}),
/// Box::new(ErrorScore {}),
/// Box::new(PanicScore {}),
/// ]);
/// harmony_tui::init(maestro).await.unwrap();
/// harmony_tui::run(
/// Inventory::autoload(),
/// HAClusterTopology::autoload(),
/// vec![
/// Box::new(SuccessScore {}),
/// Box::new(ErrorScore {}),
/// Box::new(PanicScore {}),
/// ]
/// ).await.unwrap();
/// }
/// ```
pub async fn init<T: Topology + Send + Sync + 'static>(
pub async fn run<T: Topology + Send + Sync + 'static>(
inventory: Inventory,
topology: T,
scores: Vec<Box<dyn Score<T>>>,
) -> Result<(), Box<dyn std::error::Error>> {
let handle = init_instrumentation().await;
let mut maestro = Maestro::initialize(inventory, topology).await.unwrap();
maestro.register_all(scores);
let result = init(maestro).await;
let _ = tokio::try_join!(handle);
result
}
async fn init<T: Topology + Send + Sync + 'static>(
maestro: Maestro<T>,
) -> Result<(), Box<dyn std::error::Error>> {
HarmonyTUI::new(maestro).init().await
let result = HarmonyTUI::new(maestro).init().await;
instrumentation::instrument(HarmonyEvent::HarmonyFinished).unwrap();
result
}
async fn init_instrumentation() -> tokio::task::JoinHandle<()> {
let handle = tokio::spawn(handle_harmony_events());
loop {
if instrumentation::instrument(HarmonyEvent::HarmonyStarted).is_ok() {
break;
}
}
handle
}
async fn handle_harmony_events() {
instrumentation::subscribe("Harmony TUI Logger", async |event| {
if let HarmonyEvent::HarmonyFinished = event {
return false;
};
true
})
.await;
}
pub struct HarmonyTUI<T: Topology> {

17
iobench/Cargo.toml Normal file
View File

@@ -0,0 +1,17 @@
[package]
name = "iobench"
edition = "2024"
version = "1.0.0"
license = "AGPL-3.0-or-later"
description = "A small command line utility to run fio benchmarks on localhost or remote ssh or kubernetes host. Was born out of a need to benchmark various ceph configurations!"
[dependencies]
clap = { version = "4.0", features = ["derive"] }
chrono = "0.4"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
csv = "1.1"
num_cpus = "1.13"
[workspace]

10
iobench/dash/README.md Normal file
View File

@@ -0,0 +1,10 @@
This project was generated mostly by Gemini but it works so... :)
## To run iobench dashboard
```bash
virtualenv venv
source venv/bin/activate
pip install -r requirements_freeze.txt
python iobench-dash-v4.py
```

View File

@@ -0,0 +1,229 @@
import dash
from dash import dcc, html, Input, Output, State, clientside_callback, ClientsideFunction
import plotly.express as px
import pandas as pd
import dash_bootstrap_components as dbc
import io
# --- Data Loading and Preparation ---
# csv_data = """label,test_name,iops,bandwidth_kibps,latency_mean_ms,latency_stddev_ms
# Ceph HDD Only,read-4k-sync-test,1474.302,5897,0.673,0.591
# Ceph HDD Only,write-4k-sync-test,14.126,56,27.074,7.046
# Ceph HDD Only,randread-4k-sync-test,225.140,900,4.436,6.918
# Ceph HDD Only,randwrite-4k-sync-test,13.129,52,34.891,10.859
# Ceph HDD Only,multiread-4k-sync-test,6873.675,27494,0.578,0.764
# Ceph HDD Only,multiwrite-4k-sync-test,57.135,228,38.660,11.293
# Ceph HDD Only,multirandread-4k-sync-test,2451.376,9805,1.626,2.515
# Ceph HDD Only,multirandwrite-4k-sync-test,54.642,218,33.492,13.111
# Ceph 2 Hosts WAL+DB SSD and 1 Host HDD,read-4k-sync-test,1495.700,5982,0.664,1.701
# Ceph 2 Hosts WAL+DB SSD and 1 Host HDD,write-4k-sync-test,16.990,67,17.502,9.908
# Ceph 2 Hosts WAL+DB SSD and 1 Host HDD,randread-4k-sync-test,159.256,637,6.274,9.232
# Ceph 2 Hosts WAL+DB SSD and 1 Host HDD,randwrite-4k-sync-test,16.693,66,24.094,16.099
# Ceph 2 Hosts WAL+DB SSD and 1 Host HDD,multiread-4k-sync-test,7305.559,29222,0.544,1.338
# Ceph 2 Hosts WAL+DB SSD and 1 Host HDD,multiwrite-4k-sync-test,52.260,209,34.891,17.576
# Ceph 2 Hosts WAL+DB SSD and 1 Host HDD,multirandread-4k-sync-test,700.606,2802,5.700,10.429
# Ceph 2 Hosts WAL+DB SSD and 1 Host HDD,multirandwrite-4k-sync-test,52.723,210,29.709,25.829
# Ceph 2 Hosts WAL+DB SSD Only,randwrite-4k-sync-test,90.037,360,3.617,8.321
# Ceph WAL+DB SSD During Rebuild,randwrite-4k-sync-test,41.008,164,10.138,19.333
# Ceph WAL+DB SSD OSD HDD,read-4k-sync-test,1520.299,6081,0.654,1.539
# Ceph WAL+DB SSD OSD HDD,write-4k-sync-test,78.528,314,4.074,9.101
# Ceph WAL+DB SSD OSD HDD,randread-4k-sync-test,153.303,613,6.518,9.036
# Ceph WAL+DB SSD OSD HDD,randwrite-4k-sync-test,48.677,194,8.785,20.356
# Ceph WAL+DB SSD OSD HDD,multiread-4k-sync-test,6804.880,27219,0.584,1.422
# Ceph WAL+DB SSD OSD HDD,multiwrite-4k-sync-test,311.513,1246,4.978,9.458
# Ceph WAL+DB SSD OSD HDD,multirandread-4k-sync-test,581.756,2327,6.869,10.204
# Ceph WAL+DB SSD OSD HDD,multirandwrite-4k-sync-test,120.556,482,13.463,25.440
# """
#
# df = pd.read_csv(io.StringIO(csv_data))
df = pd.read_csv("iobench.csv") # Replace with the actual file path
df['bandwidth_mbps'] = df['bandwidth_kibps'] / 1024
# --- App Initialization and Global Settings ---
app = dash.Dash(__name__, external_stylesheets=[dbc.themes.FLATLY])
# Create master lists of options for checklists
unique_labels = sorted(df['label'].unique())
unique_tests = sorted(df['test_name'].unique())
# Create a consistent color map for each unique label
color_map = {label: color for label, color in zip(unique_labels, px.colors.qualitative.Plotly)}
# --- App Layout ---
app.layout = dbc.Container([
# Header
dbc.Row(dbc.Col(html.H1("Ceph iobench Performance Dashboard", className="text-primary"),), className="my-4 text-center"),
# Controls and Graphs Row
dbc.Row([
# Control Panel Column
dbc.Col([
dbc.Card([
dbc.CardBody([
html.H4("Control Panel", className="card-title"),
html.Hr(),
# Metric Selection
dbc.Label("1. Select Metrics to Display:", html_for="metric-checklist", className="fw-bold"),
dcc.Checklist(
id='metric-checklist',
options=[
{'label': 'IOPS', 'value': 'iops'},
{'label': 'Latency (ms)', 'value': 'latency_mean_ms'},
{'label': 'Bandwidth (MB/s)', 'value': 'bandwidth_mbps'}
],
value=['iops', 'latency_mean_ms', 'bandwidth_mbps'], # Default selection
labelClassName="d-block"
),
html.Hr(),
# Configuration Selection
dbc.Label("2. Select Configurations:", html_for="config-checklist", className="fw-bold"),
dbc.ButtonGroup([
dbc.Button("All", id="config-select-all", n_clicks=0, color="primary", outline=True, size="sm"),
dbc.Button("None", id="config-select-none", n_clicks=0, color="primary", outline=True, size="sm"),
], className="mb-2"),
dcc.Checklist(
id='config-checklist',
options=[{'label': label, 'value': label} for label in unique_labels],
value=unique_labels, # Select all by default
labelClassName="d-block"
),
html.Hr(),
# Test Name Selection
dbc.Label("3. Select Tests:", html_for="test-checklist", className="fw-bold"),
dbc.ButtonGroup([
dbc.Button("All", id="test-select-all", n_clicks=0, color="primary", outline=True, size="sm"),
dbc.Button("None", id="test-select-none", n_clicks=0, color="primary", outline=True, size="sm"),
], className="mb-2"),
dcc.Checklist(
id='test-checklist',
options=[{'label': test, 'value': test} for test in unique_tests],
value=unique_tests, # Select all by default
labelClassName="d-block"
),
])
], className="mb-4")
], width=12, lg=4),
# Graph Display Column
dbc.Col(id='graph-container', width=12, lg=8)
])
], fluid=True)
# --- Callbacks ---
# Callback to handle "Select All" / "Select None" for configurations
@app.callback(
Output('config-checklist', 'value'),
Input('config-select-all', 'n_clicks'),
Input('config-select-none', 'n_clicks'),
prevent_initial_call=True
)
def select_all_none_configs(all_clicks, none_clicks):
ctx = dash.callback_context
if not ctx.triggered:
return dash.no_update
button_id = ctx.triggered[0]['prop_id'].split('.')[0]
if button_id == 'config-select-all':
return unique_labels
elif button_id == 'config-select-none':
return []
return dash.no_update
# Callback to handle "Select All" / "Select None" for tests
@app.callback(
Output('test-checklist', 'value'),
Input('test-select-all', 'n_clicks'),
Input('test-select-none', 'n_clicks'),
prevent_initial_call=True
)
def select_all_none_tests(all_clicks, none_clicks):
ctx = dash.callback_context
if not ctx.triggered:
return dash.no_update
button_id = ctx.triggered[0]['prop_id'].split('.')[0]
if button_id == 'test-select-all':
return unique_tests
elif button_id == 'test-select-none':
return []
return dash.no_update
# Main callback to update graphs based on all selections
@app.callback(
Output('graph-container', 'children'),
[Input('metric-checklist', 'value'),
Input('config-checklist', 'value'),
Input('test-checklist', 'value')]
)
def update_graphs(selected_metrics, selected_configs, selected_tests):
"""
This function is triggered when any control's value changes.
It generates and returns a list of graphs based on all user selections.
"""
# Handle cases where no selection is made to prevent errors and show a helpful message
if not all([selected_metrics, selected_configs, selected_tests]):
return dbc.Alert(
"Please select at least one item from each category (Metric, Configuration, and Test) to view data.",
color="info",
className="mt-4"
)
# Filter the DataFrame based on all selected criteria
filtered_df = df[df['label'].isin(selected_configs) & df['test_name'].isin(selected_tests)]
# If the filtered data is empty after selection, inform the user
if filtered_df.empty:
return dbc.Alert("No data available for the current selection.", color="warning", className="mt-4")
graph_list = []
metric_titles = {
'iops': 'IOPS Comparison (Higher is Better)',
'latency_mean_ms': 'Mean Latency (ms) Comparison (Lower is Better)',
'bandwidth_mbps': 'Bandwidth (MB/s) Comparison (Higher is Better)'
}
for metric in selected_metrics:
sort_order = 'total ascending' if metric == 'latency_mean_ms' else 'total descending'
error_y_param = 'latency_stddev_ms' if metric == 'latency_mean_ms' else None
fig = px.bar(
filtered_df,
x='test_name',
y=metric,
color='label',
barmode='group',
color_discrete_map=color_map,
error_y=error_y_param,
title=metric_titles.get(metric, metric),
labels={
"test_name": "Benchmark Test Name",
"iops": "IOPS",
"latency_mean_ms": "Mean Latency (ms)",
"bandwidth_mbps": "Bandwidth (MB/s)",
"label": "Cluster Configuration"
}
)
fig.update_layout(
height=500,
xaxis_title=None,
legend_title="Configuration",
title_x=0.5,
xaxis={'categoryorder': sort_order},
xaxis_tickangle=-45,
margin=dict(b=120) # Add bottom margin to prevent tick labels from being cut off
)
graph_list.append(dbc.Row(dbc.Col(dcc.Graph(figure=fig)), className="mb-4"))
return graph_list
# --- Run the App ---
if __name__ == '__main__':
app.run(debug=True)

View File

@@ -0,0 +1,29 @@
blinker==1.9.0
certifi==2025.7.14
charset-normalizer==3.4.2
click==8.2.1
dash==3.2.0
dash-bootstrap-components==2.0.3
Flask==3.1.1
idna==3.10
importlib_metadata==8.7.0
itsdangerous==2.2.0
Jinja2==3.1.6
MarkupSafe==3.0.2
narwhals==2.0.1
nest-asyncio==1.6.0
numpy==2.3.2
packaging==25.0
pandas==2.3.1
plotly==6.2.0
python-dateutil==2.9.0.post0
pytz==2025.2
requests==2.32.4
retrying==1.4.1
setuptools==80.9.0
six==1.17.0
typing_extensions==4.14.1
tzdata==2025.2
urllib3==2.5.0
Werkzeug==3.1.3
zipp==3.23.0

41
iobench/deployment.yaml Normal file
View File

@@ -0,0 +1,41 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: iobench
labels:
app: iobench
spec:
replicas: 1
selector:
matchLabels:
app: iobench
template:
metadata:
labels:
app: iobench
spec:
containers:
- name: fio
image: juicedata/fio:latest # Replace with your preferred fio image
imagePullPolicy: IfNotPresent
command: [ "sleep", "infinity" ] # Keeps the container running for kubectl exec
volumeMounts:
- name: iobench-pvc
mountPath: /data # Mount the PVC at /data
volumes:
- name: iobench-pvc
persistentVolumeClaim:
claimName: iobench-pvc # Matches your PVC name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: iobench-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: ceph-block

253
iobench/src/main.rs Normal file
View File

@@ -0,0 +1,253 @@
use std::fs;
use std::io::{self, Write};
use std::process::{Command, Stdio};
use std::thread;
use std::time::Duration;
use chrono::Local;
use clap::Parser;
use serde::{Deserialize, Serialize};
/// A simple yet powerful I/O benchmarking tool using fio.
#[derive(Parser, Debug)]
#[command(author, version, about, long_about = None)]
struct Args {
/// Target for the benchmark.
/// Formats:
/// - localhost (default)
/// - ssh/{user}@{host}
/// - ssh/{user}@{host}:{port}
/// - k8s/{namespace}/{pod}
#[arg(short, long, default_value = "localhost")]
target: String,
#[arg(short, long, default_value = ".")]
benchmark_dir: String,
/// Comma-separated list of tests to run.
/// Available tests: read, write, randread, randwrite,
/// multiread, multiwrite, multirandread, multirandwrite.
#[arg(long, default_value = "read,write,randread,randwrite,multiread,multiwrite,multirandread,multirandwrite")]
tests: String,
/// Duration of each test in seconds.
#[arg(long, default_value_t = 15)]
duration: u64,
/// Output directory for results.
/// Defaults to ./iobench-{current_datetime}.
#[arg(long)]
output_dir: Option<String>,
/// The size of the test file for fio.
#[arg(long, default_value = "1G")]
size: String,
/// The block size for I/O operations.
#[arg(long, default_value = "4k")]
block_size: String,
}
#[derive(Debug, Serialize, Deserialize)]
struct FioOutput {
jobs: Vec<FioJobResult>,
}
#[derive(Debug, Serialize, Deserialize)]
struct FioJobResult {
jobname: String,
read: FioMetrics,
write: FioMetrics,
}
#[derive(Debug, Serialize, Deserialize)]
struct FioMetrics {
bw: f64,
iops: f64,
clat_ns: LatencyMetrics,
}
#[derive(Debug, Serialize, Deserialize)]
struct LatencyMetrics {
mean: f64,
stddev: f64,
}
#[derive(Debug, Serialize)]
struct BenchmarkResult {
test_name: String,
iops: f64,
bandwidth_kibps: f64,
latency_mean_ms: f64,
latency_stddev_ms: f64,
}
fn main() -> io::Result<()> {
let args = Args::parse();
let output_dir = args.output_dir.unwrap_or_else(|| {
format!("./iobench-{}", Local::now().format("%Y-%m-%d-%H%M%S"))
});
fs::create_dir_all(&output_dir)?;
let tests_to_run: Vec<&str> = args.tests.split(',').collect();
let mut results = Vec::new();
for test in tests_to_run {
println!("--------------------------------------------------");
println!("Running test: {}", test);
let (rw, numjobs) = match test {
"read" => ("read", 1),
"write" => ("write", 1),
"randread" => ("randread", 1),
"randwrite" => ("randwrite", 1),
"multiread" => ("read", 4),
"multiwrite" => ("write", 4),
"multirandread" => ("randread", 4),
"multirandwrite" => ("randwrite", 4),
_ => {
eprintln!("Unknown test: {}. Skipping.", test);
continue;
}
};
let test_name = format!("{}-{}-sync-test", test, args.block_size);
let fio_command = format!(
"fio --filename={}/iobench_testfile --direct=1 --fsync=1 --rw={} --bs={} --numjobs={} --iodepth=1 --runtime={} --time_based --group_reporting --name={} --size={} --output-format=json",
args.benchmark_dir, rw, args.block_size, numjobs, args.duration, test_name, args.size
);
println!("Executing command:\n{}\n", fio_command);
let output = match run_command(&args.target, &fio_command) {
Ok(out) => out,
Err(e) => {
eprintln!("Failed to execute command for test {}: {}", test, e);
continue;
}
};
let result = parse_fio_output(&output, &test_name, rw);
// TODO store raw fio output and print it
match result {
Ok(res) => {
results.push(res);
}
Err(e) => {
eprintln!("Error parsing fio output for test {}: {}", test, e);
eprintln!("Raw output:\n{}", output);
}
}
println!("{output}");
println!("Test {} completed.", test);
// A brief pause to let the system settle before the next test.
thread::sleep(Duration::from_secs(2));
}
// Cleanup the test file on the target
println!("--------------------------------------------------");
println!("Cleaning up test file on target...");
let cleanup_command = "rm -f ./iobench_testfile";
if let Err(e) = run_command(&args.target, cleanup_command) {
eprintln!("Warning: Failed to clean up test file on target: {}", e);
} else {
println!("Cleanup successful.");
}
if results.is_empty() {
println!("\nNo benchmark results to display.");
return Ok(());
}
// Output results to a CSV file for easy analysis
let csv_path = format!("{}/summary.csv", output_dir);
let mut wtr = csv::Writer::from_path(&csv_path)?;
for result in &results {
wtr.serialize(result)?;
}
wtr.flush()?;
println!("\nBenchmark summary saved to {}", csv_path);
println!("\n--- Benchmark Results Summary ---");
println!("{:<25} {:>10} {:>18} {:>20} {:>22}", "Test Name", "IOPS", "Bandwidth (KiB/s)", "Latency Mean (ms)", "Latency StdDev (ms)");
println!("{:-<98}", "");
for result in results {
println!("{:<25} {:>10.2} {:>18.2} {:>20.4} {:>22.4}", result.test_name, result.iops, result.bandwidth_kibps, result.latency_mean_ms, result.latency_stddev_ms);
}
Ok(())
}
fn run_command(target: &str, command: &str) -> io::Result<String> {
let (program, args) = if target == "localhost" {
("sudo", vec!["sh".to_string(), "-c".to_string(), command.to_string()])
} else if target.starts_with("ssh/") {
let target_str = target.strip_prefix("ssh/").unwrap();
let ssh_target;
let mut ssh_args = vec!["-o".to_string(), "StrictHostKeyChecking=no".to_string()];
let port_parts: Vec<&str> = target_str.split(':').collect();
if port_parts.len() == 2 {
ssh_target = port_parts[0].to_string();
ssh_args.push("-p".to_string());
ssh_args.push(port_parts[1].to_string());
} else {
ssh_target = target_str.to_string();
}
ssh_args.push(ssh_target);
ssh_args.push(format!("sudo sh -c '{}'", command));
("ssh", ssh_args)
} else if target.starts_with("k8s/") {
let parts: Vec<&str> = target.strip_prefix("k8s/").unwrap().split('/').collect();
if parts.len() != 2 {
return Err(io::Error::new(io::ErrorKind::InvalidInput, "Invalid k8s target format. Expected k8s/{namespace}/{pod}"));
}
let namespace = parts[0];
let pod = parts[1];
("kubectl", vec!["exec".to_string(), "-n".to_string(), namespace.to_string(), pod.to_string(), "--".to_string(), "sh".to_string(), "-c".to_string(), command.to_string()])
} else {
return Err(io::Error::new(io::ErrorKind::InvalidInput, "Invalid target format"));
};
let mut cmd = Command::new(program);
cmd.args(&args);
cmd.stdout(Stdio::piped()).stderr(Stdio::piped());
let child = cmd.spawn()?;
let output = child.wait_with_output()?;
if !output.status.success() {
eprintln!("Command failed with status: {}", output.status);
io::stderr().write_all(&output.stderr)?;
return Err(io::Error::new(io::ErrorKind::Other, "Command execution failed"));
}
String::from_utf8(output.stdout)
.map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
fn parse_fio_output(output: &str, test_name: &str, rw: &str) -> Result<BenchmarkResult, String> {
let fio_data: FioOutput = serde_json::from_str(output)
.map_err(|e| format!("Failed to deserialize fio JSON: {}", e))?;
let job_result = fio_data.jobs.iter()
.find(|j| j.jobname == test_name)
.ok_or_else(|| format!("Could not find job result for '{}' in fio output", test_name))?;
let metrics = if rw.contains("read") {
&job_result.read
} else {
&job_result.write
};
Ok(BenchmarkResult {
test_name: test_name.to_string(),
iops: metrics.iops,
bandwidth_kibps: metrics.bw,
latency_mean_ms: metrics.clat_ns.mean / 1_000_000.0,
latency_stddev_ms: metrics.clat_ns.stddev / 1_000_000.0,
})
}