Compare commits

..

9 Commits

All nine commits failed the `Run Check Script / check (pull_request)` CI check.

| SHA | Message | CI result | Date |
|---|---|---|---|
| fb72e94dbb | fix: when creating a port-channel with forced speed, it needs to be set on the port-channel and its member interfaces | Failing after 11m29s | 2026-04-10 15:41:09 -04:00 |
| a646f1f4d0 | feat: use an enum for interface types, add logging | Failing after 23s | 2026-03-25 12:35:54 -04:00 |
| 2728fc8989 | feat: add possibility to configure interface speed | Failing after 26s | 2026-03-25 12:10:39 -04:00 |
| 8c8baaf9cc | feat: create new brocade configuration score | Failing after 24s | 2026-03-25 11:45:57 -04:00 |
| a1c9bfeabd | feat: add a 'reset_interface' function | Failing after 24s | 2026-03-25 09:58:56 -04:00 |
| d8dab12834 | set the description of the port-channel interface | Failing after 28s | 2026-03-25 09:35:02 -04:00 |
| 7422534018 | feat: require to specify port-channel ID instead of finding an available one | Failing after 24s | 2026-03-25 09:21:09 -04:00 |
| b67275662d | fix: use Vlan struct instance everywhere, never use an u16 to reference a Vlan | Failing after 26s | 2026-03-24 15:48:16 -04:00 |
| 6237e1d877 | feat: brocade module now support vlans | Failing after 27s | 2026-03-24 15:24:32 -04:00 |
235 changed files with 4159 additions and 52168 deletions

.gitmodules (vendored, 6 lines changed)

@@ -1,6 +1,12 @@
[submodule "examples/try_rust_webapp/tryrust.org"]
path = examples/try_rust_webapp/tryrust.org
url = https://github.com/rust-dd/tryrust.org.git
[submodule "/home/jeangab/work/nationtech/harmony2/opnsense-codegen/vendor/core"]
path = /home/jeangab/work/nationtech/harmony2/opnsense-codegen/vendor/core
url = https://github.com/opnsense/core.git
[submodule "/home/jeangab/work/nationtech/harmony2/opnsense-codegen/vendor/plugins"]
path = /home/jeangab/work/nationtech/harmony2/opnsense-codegen/vendor/plugins
url = https://github.com/opnsense/plugins.git
[submodule "opnsense-codegen/vendor/core"]
path = opnsense-codegen/vendor/core
url = https://github.com/opnsense/core.git

CLAUDE.md (146 lines changed)

@@ -1,146 +0,0 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Build & Test Commands
```bash
# Full CI check (check + fmt + clippy + test)
./build/check.sh
# Individual commands
cargo check --all-targets --all-features --keep-going
cargo fmt --check # Check formatting
cargo clippy # Lint
cargo test # Run all tests
# Run a single test
cargo test -p <crate_name> <test_name>
# Run a specific example
cargo run -p <example_crate_name>
# Build the mdbook documentation
mdbook build
```
## What Harmony Is
Harmony is the orchestration framework powering NationTech's vision of **decentralized micro datacenters** — small computing clusters deployed in homes, offices, and community spaces instead of hyperscaler facilities. The goal: make computing cleaner, more resilient, locally beneficial, and resistant to centralized points of failure (including geopolitical threats).
Harmony exists because existing IaC tools (Terraform, Ansible, Helm) are trapped in a **YAML mud pit**: static configuration files validated only at runtime, fragmented across tools, with errors surfacing at 3 AM instead of at compile time. Harmony replaces this entire class of tools with a single Rust codebase where **the compiler catches infrastructure misconfigurations before anything is deployed**.
This is not a wrapper around existing tools. It is a paradigm shift: infrastructure-as-real-code with compile-time safety guarantees that no YAML/HCL/DSL-based tool can provide.
## The Score-Topology-Interpret Pattern
This is the core design pattern. Understand it before touching the codebase.
**Score** — declarative desired state. A Rust struct generic over `T: Topology` that describes *what* you want (e.g., "a PostgreSQL cluster", "DNS records for these hosts"). Scores are serializable, cloneable, idempotent.
**Topology** — infrastructure capabilities. Represents *where* things run and *what the environment can do*. Exposes capabilities as traits (`DnsServer`, `K8sclient`, `HelmCommand`, `LoadBalancer`, `Firewall`, etc.). Examples: `K8sAnywhereTopology` (local K3D or any K8s cluster), `HAClusterTopology` (bare-metal HA with redundant firewalls/switches).
**Interpret** — execution glue. Translates a Score into concrete operations against a Topology's capabilities. Returns an `Outcome` (SUCCESS, NOOP, FAILURE, RUNNING, QUEUED, BLOCKED).
**The key insight — compile-time safety through trait bounds:**
```rust
impl<T: Topology + DnsServer + DhcpServer> Score<T> for DnsScore { ... }
```
The compiler rejects any attempt to use `DnsScore` with a Topology that doesn't implement `DnsServer` and `DhcpServer`. Invalid infrastructure configurations become compilation errors, not runtime surprises.
**Higher-order topologies** compose transparently:
- `FailoverTopology<T>` — primary/replica orchestration
- `DecentralizedTopology<T>` — multi-site coordination
If `T: PostgreSQL`, then `FailoverTopology<T>: PostgreSQL` automatically via blanket impls. Zero boilerplate.
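A minimal sketch of how such a blanket impl can work, using simplified, hypothetical trait shapes (not Harmony's actual definitions):

```rust
// Simplified sketch of the blanket-impl pattern (hypothetical trait shapes,
// not Harmony's actual definitions).
trait PostgreSQL {
    fn connection_url(&self) -> String;
}

struct FailoverTopology<T> {
    primary: T,
    _replica: T,
}

// Blanket impl: whenever T implements PostgreSQL, FailoverTopology<T>
// implements it too, with no per-capability boilerplate.
impl<T: PostgreSQL> PostgreSQL for FailoverTopology<T> {
    fn connection_url(&self) -> String {
        // Delegate to the primary; real failover logic elided.
        self.primary.connection_url()
    }
}

struct LocalPg;
impl PostgreSQL for LocalPg {
    fn connection_url(&self) -> String {
        "postgres://localhost:5432".to_string()
    }
}
```

Any Score bound on `PostgreSQL` then accepts the wrapped topology unchanged.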
## Architecture (Hexagonal)
```
harmony/src/
├── domain/ # Core domain — the heart of the framework
│ ├── score.rs # Score trait (desired state)
│ ├── topology/ # Topology trait + implementations
│ ├── interpret/ # Interpret trait + InterpretName enum (25+ variants)
│ ├── inventory/ # Physical infrastructure metadata (hosts, switches, mgmt interfaces)
│ ├── executors/ # Executor trait definitions
│ └── maestro/ # Orchestration engine (registers scores, manages topology state, executes)
├── infra/ # Infrastructure adapters (driven ports)
│ ├── opnsense/ # OPNsense firewall adapter
│ ├── brocade.rs # Brocade switch adapter
│ ├── kube.rs # Kubernetes executor
│ └── sqlx.rs # Database executor
└── modules/ # Concrete deployment modules (23+)
├── k8s/ # Kubernetes (namespaces, deployments, ingress)
├── postgresql/ # CloudNativePG clusters + multi-site failover
├── okd/ # OpenShift bare-metal from scratch
├── helm/ # Helm chart inflation → vanilla K8s YAML
├── opnsense/ # OPNsense (DHCP, DNS, etc.)
├── monitoring/ # Prometheus, Alertmanager, Grafana
├── kvm/ # KVM virtual machine management
├── network/ # Network services (iPXE, TFTP, bonds)
└── ...
```
Domain types to know: `Inventory` (read-only physical infra context), `Maestro<T>` (orchestrator — calls `topology.ensure_ready()` then executes scores), `Outcome` / `InterpretError` (execution results).
## Key Crates
| Crate | Purpose |
|---|---|
| `harmony` | Core framework: domain, infra adapters, deployment modules |
| `harmony_cli` | CLI + optional TUI (`--features tui`) |
| `harmony_config` | Unified config+secret management (env → SQLite → OpenBao → interactive prompt) |
| `harmony_secret` / `harmony_secret_derive` | Secret backends (LocalFile, OpenBao, Infisical) |
| `harmony_execution` | Execution engine |
| `harmony_agent` / `harmony_inventory_agent` | Persistent agent framework (NATS JetStream mesh), hardware discovery |
| `harmony_assets` | Asset management (URLs, local cache, S3) |
| `harmony_composer` | Infrastructure composition tool |
| `harmony-k8s` | Kubernetes utilities |
| `k3d` | Local K3D cluster management |
| `brocade` | Brocade network switch integration |
## OPNsense Crates
The `opnsense-codegen` and `opnsense-api` crates exist because OPNsense's automation ecosystem is poor — no typed API client exists. These are support crates, not the core of Harmony.
- `opnsense-codegen`: XML model files → IR → Rust structs with serde helpers for OPNsense wire format quirks (`opn_bool` for "0"/"1" strings, `opn_u16`/`opn_u32` for string-encoded numbers). Vendor sources are git submodules under `opnsense-codegen/vendor/`.
- `opnsense-api`: Hand-written `OpnsenseClient` + generated model types in `src/generated/`.
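As a rough illustration of the wire-format quirks those helpers absorb (function names here are illustrative; the generated code implements them as serde deserializers):

```rust
// Illustrative parsers for OPNsense's string-encoded wire format
// (names hypothetical; the generated code uses serde deserializers).
fn parse_opn_bool(s: &str) -> Result<bool, String> {
    match s {
        "1" => Ok(true),
        "0" => Ok(false),
        other => Err(format!("expected \"0\" or \"1\", got {other:?}")),
    }
}

fn parse_opn_u16(s: &str) -> Result<u16, String> {
    // OPNsense sends numbers as strings, e.g. "4094" for a VLAN tag.
    s.parse::<u16>().map_err(|e| format!("invalid u16 {s:?}: {e}"))
}
```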
## Key Design Decisions (ADRs in docs/adr/)
- **ADR-001**: Rust chosen for type system, refactoring safety, and performance
- **ADR-002**: Hexagonal architecture — domain isolated from adapters
- **ADR-003**: Infrastructure abstractions at domain level, not provider level (no vendor lock-in)
- **ADR-005**: Custom Rust DSL over YAML/Score-spec — real language, Cargo deps, composable
- **ADR-007**: K3D as default runtime (K8s-certified, lightweight, cross-platform)
- **ADR-009**: Helm charts inflated to vanilla K8s YAML, then deployed via existing code paths
- **ADR-015**: Higher-order topologies via blanket trait impls (zero-cost composition)
- **ADR-016**: Agent-based architecture with NATS JetStream for real-time failover and distributed consensus
- **ADR-020**: Unified config+secret management — Rust struct is the schema, resolution chain: env → store → prompt
## Capability and Score Design Rules
**Capabilities are industry concepts, not tools.** A capability trait represents a standard infrastructure need (e.g., `DnsServer`, `LoadBalancer`, `Router`, `CertificateManagement`) that can be fulfilled by different products. OPNsense provides `DnsServer` today; CoreDNS or Route53 could provide it tomorrow. Scores must not break when the backend changes.
**Exception:** When the developer fundamentally needs to know the implementation. `PostgreSQL` is a capability (not `Database`) because the developer writes PostgreSQL-specific SQL and replication configs. Swapping to MariaDB would break the application, not just the infrastructure.
**Test:** If you could swap the underlying tool without rewriting any Score that uses the capability, the boundary is correct.
**Don't name capabilities after tools.** `SecretVault` not `OpenbaoStore`. `IdentityProvider` not `ZitadelAuth`. Think: what is the core developer need that leads to using this tool?
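The swap test can be sketched concretely. Assuming simplified trait shapes (not the real Harmony traits), a Score written against the capability keeps working when the backend changes:

```rust
// Hypothetical, simplified capability trait named after the need, not a tool.
trait DnsServer {
    fn add_record(&mut self, name: &str, ip: &str);
    fn records(&self) -> usize;
}

struct OpnsenseDns { entries: Vec<(String, String)> }
struct CoreDns { entries: Vec<(String, String)> }

impl DnsServer for OpnsenseDns {
    fn add_record(&mut self, name: &str, ip: &str) {
        self.entries.push((name.into(), ip.into()));
    }
    fn records(&self) -> usize { self.entries.len() }
}

impl DnsServer for CoreDns {
    fn add_record(&mut self, name: &str, ip: &str) {
        self.entries.push((name.into(), ip.into()));
    }
    fn records(&self) -> usize { self.entries.len() }
}

// A "Score" written against the capability: unchanged when the backend swaps.
fn apply_dns_score<T: DnsServer>(dns: &mut T) {
    dns.add_record("api.cluster.local", "10.0.0.10");
}
```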
**Scores encapsulate operational complexity.** Move procedural knowledge (init sequences, retry logic, distribution-specific config) into Scores. A high-level example should be ~15 lines, not ~400 lines of imperative orchestration.
**Scores must be idempotent.** Running twice = same result as once. Use create-or-update, handle "already exists" gracefully.
**Scores must not depend on execution order.** Declare capability requirements via trait bounds, don't assume another Score ran first. If Score B needs what Score A provides, Score B should declare that capability as a trait bound.
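A minimal sketch of the create-or-update shape (hypothetical in-memory store, not Harmony's API; real Scores report an `Outcome`): running the same ensure twice yields the same state, and the second run reports no work done.

```rust
use std::collections::HashMap;

// Hypothetical idempotent "ensure" helper: create when absent, update when
// drifted, report NOOP when already in the desired state.
fn ensure_record(store: &mut HashMap<String, String>, key: &str, value: &str) -> &'static str {
    let outcome = match store.get(key) {
        Some(existing) if existing == value => return "NOOP",
        Some(_) => "UPDATED",
        None => "CREATED",
    };
    store.insert(key.to_string(), value.to_string());
    outcome
}
```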
See `docs/guides/writing-a-score.md` for the full guide.
## Conventions
- **Rust edition 2024**, resolver v2
- **Conventional commits**: `feat:`, `fix:`, `chore:`, `docs:`, `refactor:`
- **Small PRs**: max ~200 lines (excluding generated code), single-purpose
- **License**: GNU AGPL v3
- **Quality bar**: This framework demands high-quality engineering. The type system is a feature, not a burden. Leverage it. Prefer compile-time guarantees over runtime checks. Abstractions should be domain-level, not provider-specific.

Cargo.lock (generated, 203 lines changed)

@@ -148,7 +148,7 @@ dependencies = [
"bytes",
"bytestring",
"cfg-if",
"cookie 0.16.2",
"cookie",
"derive_more",
"encoding_rs",
"foldhash",
@@ -1262,6 +1262,22 @@ dependencies = [
"url",
]
[[package]]
name = "brocade-switch-configuration"
version = "0.1.0"
dependencies = [
"async-trait",
"brocade",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"serde",
"tokio",
]
[[package]]
name = "brotli"
version = "8.0.2"
@@ -1681,34 +1697,6 @@ dependencies = [
"version_check",
]
[[package]]
name = "cookie"
version = "0.17.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7efb37c3e1ccb1ff97164ad95ac1606e8ccd35b3fa0a7d99a304c7f4a428cc24"
dependencies = [
"percent-encoding",
"time",
"version_check",
]
[[package]]
name = "cookie_store"
version = "0.20.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "387461abbc748185c3a6e1673d826918b450b87ff22639429c694619a83b6cf6"
dependencies = [
"cookie 0.17.0",
"idna 0.3.0",
"log",
"publicsuffix",
"serde",
"serde_derive",
"serde_json",
"time",
"url",
]
[[package]]
name = "core-foundation"
version = "0.9.4"
@@ -2288,15 +2276,6 @@ dependencies = [
"dirs-sys",
]
[[package]]
name = "dirs"
version = "6.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c3e8aa94d75141228480295a7d0e7feb620b1a5ad9f12bc40be62411e38cce4e"
dependencies = [
"dirs-sys",
]
[[package]]
name = "dirs-sys"
version = "0.5.0"
@@ -2639,23 +2618,18 @@ name = "example-harmony-sso"
version = "0.1.0"
dependencies = [
"anyhow",
"clap",
"directories",
"env_logger",
"harmony",
"harmony-k8s",
"harmony_cli",
"harmony_config",
"harmony_macros",
"harmony_secret",
"harmony_types",
"interactive-parse",
"k3d-rs",
"k8s-openapi",
"kube",
"log",
"reqwest 0.12.28",
"schemars 0.8.22",
"serde",
"serde_json",
"tokio",
@@ -3551,7 +3525,6 @@ dependencies = [
"helm-wrapper-rs",
"hex",
"http 1.4.0",
"httptest",
"inquire 0.7.5",
"k3d-rs",
"k8s-openapi",
@@ -3561,7 +3534,6 @@ dependencies = [
"log",
"non-blank-string-rs",
"once_cell",
"opnsense-api",
"opnsense-config",
"opnsense-config-xml",
"option-ext",
@@ -3678,7 +3650,6 @@ dependencies = [
"blake3",
"clap",
"directories",
"env_logger",
"futures-util",
"httptest",
"indicatif",
@@ -3777,13 +3748,6 @@ dependencies = [
"thiserror 2.0.18",
]
[[package]]
name = "harmony_i18n"
version = "0.1.0"
dependencies = [
"serde",
]
[[package]]
name = "harmony_inventory_agent"
version = "0.1.0"
@@ -4399,16 +4363,6 @@ version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b9e0384b61958566e926dc50660321d12159025e767c18e043daf26b70104c39"
[[package]]
name = "idna"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e14ddfc70884202db2244c223200c204c2bda1bc6e0998d11b5e024d657209e6"
dependencies = [
"unicode-bidi",
"unicode-normalization",
]
[[package]]
name = "idna"
version = "1.1.0"
@@ -4686,6 +4640,26 @@ dependencies = [
"thiserror 1.0.69",
]
[[package]]
name = "json-prompt"
version = "0.1.0"
dependencies = [
"brocade",
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_secret",
"harmony_secret_derive",
"harmony_types",
"log",
"schemars 0.8.22",
"serde",
"tokio",
"url",
]
[[package]]
name = "jsonpath-rust"
version = "0.7.5"
@@ -4866,17 +4840,6 @@ dependencies = [
"tracing",
]
[[package]]
name = "kvm-vm-examples"
version = "0.1.0"
dependencies = [
"clap",
"env_logger",
"harmony",
"log",
"tokio",
]
[[package]]
name = "language-tags"
version = "0.3.2"
@@ -5132,29 +5095,6 @@ dependencies = [
"syn 2.0.117",
]
[[package]]
name = "network_stress_test"
version = "0.1.0"
dependencies = [
"actix-web",
"askama",
"async-stream",
"async-trait",
"brocade",
"chrono",
"env_logger",
"harmony_types",
"log",
"opnsense-api",
"rand 0.9.2",
"russh",
"russh-keys",
"serde",
"serde_json",
"sqlx",
"tokio",
]
[[package]]
name = "newline-converter"
version = "0.2.2"
@@ -5371,14 +5311,11 @@ checksum = "7c87def4c32ab89d880effc9e097653c8da5d6ef28e6b539d313baaacfbafcbe"
name = "opnsense-api"
version = "0.1.0"
dependencies = [
"async-trait",
"base64 0.22.1",
"env_logger",
"harmony_types",
"http 1.4.0",
"inquire 0.7.5",
"log",
"opnsense-config",
"pretty_assertions",
"reqwest 0.12.28",
"serde",
@@ -5398,7 +5335,6 @@ dependencies = [
"log",
"pretty_assertions",
"quick-xml",
"regex",
"serde",
"serde_json",
"thiserror 2.0.18",
@@ -5413,10 +5349,8 @@ dependencies = [
"async-trait",
"chrono",
"env_logger",
"harmony_types",
"httptest",
"log",
"opnsense-api",
"opnsense-config-xml",
"pretty_assertions",
"russh",
"russh-keys",
@@ -5427,7 +5361,6 @@ dependencies = [
"thiserror 1.0.69",
"tokio",
"tokio-stream",
"tokio-test",
"tokio-util",
"uuid",
]
@@ -5450,46 +5383,6 @@ dependencies = [
"yaserde_derive",
]
[[package]]
name = "opnsense-pair-integration"
version = "0.1.0"
dependencies = [
"dirs",
"env_logger",
"harmony",
"harmony_cli",
"harmony_inventory_agent",
"harmony_macros",
"harmony_types",
"log",
"opnsense-api",
"opnsense-config",
"reqwest 0.12.28",
"russh",
"serde_json",
"tokio",
]
[[package]]
name = "opnsense-vm-integration"
version = "0.1.0"
dependencies = [
"dirs",
"env_logger",
"harmony",
"harmony_cli",
"harmony_inventory_agent",
"harmony_macros",
"harmony_types",
"log",
"opnsense-api",
"opnsense-config",
"reqwest 0.12.28",
"russh",
"serde_json",
"tokio",
]
[[package]]
name = "option-ext"
version = "0.2.0"
@@ -5952,22 +5845,6 @@ dependencies = [
"unicode-ident",
]
[[package]]
name = "psl-types"
version = "2.0.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "33cb294fe86a74cbcf50d4445b37da762029549ebeea341421c7c70370f86cac"
[[package]]
name = "publicsuffix"
version = "2.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6f42ea446cab60335f76979ec15e12619a2165b5ae2c12166bef27d283a9fadf"
dependencies = [
"idna 1.1.0",
"psl-types",
]
[[package]]
name = "punycode"
version = "0.4.1"
@@ -6275,8 +6152,6 @@ checksum = "dd67538700a17451e7cba03ac727fb961abb7607553461627b97de0b89cf4a62"
dependencies = [
"base64 0.21.7",
"bytes",
"cookie 0.17.0",
"cookie_store",
"encoding_rs",
"futures-core",
"futures-util",
@@ -8429,7 +8304,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ff67a8a4397373c3ef660812acab3268222035010ab8680ec4215f38ba3d0eed"
dependencies = [
"form_urlencoded",
"idna 1.1.0",
"idna",
"percent-encoding",
"serde",
"serde_derive",

Cargo.toml

@@ -16,10 +16,8 @@ members = [
"harmony_inventory_agent",
"harmony_secret_derive",
"harmony_secret",
"network_stress_test",
"examples/kvm_okd_ha_cluster",
"examples/example_linux_vm",
"harmony_i18n",
"harmony_config_derive",
"harmony_config",
"brocade",
@@ -45,7 +43,6 @@ tokio = { version = "1.40", features = [
"io-util",
"fs",
"macros",
"net",
"rt-multi-thread",
] }
tokio-retry = "0.3.0"

Dockerfile

@@ -1,4 +1,4 @@
FROM docker.io/rust:1.94 AS build
FROM docker.io/rust:1.89.0 AS build
WORKDIR /app
@@ -6,7 +6,7 @@ COPY . .
RUN cargo build --release --bin harmony_composer
FROM docker.io/rust:1.94
FROM docker.io/rust:1.89.0
WORKDIR /app

ROADMAP.md

@@ -1,6 +1,6 @@
# Harmony Roadmap
Eight phases to take Harmony from working prototype to production-ready open-source project.
Six phases to take Harmony from working prototype to production-ready open-source project.
| # | Phase | Status | Depends On | Detail |
|---|-------|--------|------------|--------|
@@ -10,29 +10,17 @@ Eight phases to take Harmony from working prototype to production-ready open-sou
| 4 | [Publish to GitHub](ROADMAP/04-publish-github.md) | Not started | 3 | Clean history, set up GitHub as community hub, CI on self-hosted runners |
| 5 | [E2E tests: PostgreSQL & RustFS](ROADMAP/05-e2e-tests-simple.md) | Not started | 1 | k3d-based test harness, two passing E2E tests, CI job |
| 6 | [E2E tests: OKD HA on KVM](ROADMAP/06-e2e-tests-kvm.md) | Not started | 5 | KVM test infrastructure, full OKD installation test, nightly CI |
| 7 | [OPNsense & Bare-Metal Network Automation](ROADMAP/07-opnsense-bare-metal.md) | **In progress** | — | Full OPNsense API coverage, Brocade switch integration, HA cluster network provisioning |
| 8 | [HA OKD Production Deployment](ROADMAP/08-ha-okd-production.md) | Not started | 7 | LAGG/CARP/multi-WAN/BINAT cluster with UpdateHostScore, end-to-end bare-metal automation |
| 9 | [SSO + Config Hardening](ROADMAP/09-sso-config-hardening.md) | **In progress** | 1 | Builder pattern for OpenbaoSecretStore, ZitadelScore PG fix, CoreDNSRewriteScore, integration tests |
## Current State (as of branch `feat/opnsense-codegen`)
## Current State (as of branch `feature/kvm-module`)
- `harmony_config` crate exists with `EnvSource`, `LocalFileSource`, `PromptSource`, `StoreSource`. 12 unit tests. **Zero consumers** in workspace — everything still uses `harmony_secret::SecretManager` directly (19 call sites).
- `harmony_assets` crate exists with `Asset`, `LocalCache`, `LocalStore`, `S3Store`. **No tests. Zero consumers.** The `k3d` crate has its own `DownloadableAsset` with identical functionality and full test coverage.
- `harmony_secret` has `LocalFileSecretStore`, `OpenbaoSecretStore` (token/userpass/OIDC device flow + JWT exchange), `InfisicalSecretStore`. Zitadel OIDC integration **implemented** with session caching.
- **SSO example** (`examples/harmony_sso/`): deploys Zitadel + OpenBao on k3d, provisions identity resources, authenticates via device flow, stores config in OpenBao. `OpenbaoSetupScore` and `ZitadelSetupScore` encapsulate day-two operations.
- `harmony_secret` has `LocalFileSecretStore`, `OpenbaoSecretStore` (token/userpass only), `InfisicalSecretStore`. Works but no Zitadel OIDC integration.
- KVM module exists on this branch with `KvmExecutor`, VM lifecycle, ISO download, two examples (`example_linux_vm`, `kvm_okd_ha_cluster`).
- RustFS module exists on `feat/rustfs` branch (2 commits ahead of master).
- 39 example crates, **zero E2E tests**. Unit tests pass across workspace (~240 tests).
- CI runs `cargo check`, `fmt`, `clippy`, `test` on Gitea. No E2E job.
### OPNsense & Bare-Metal (as of branch `feat/opnsense-codegen`)
- **9 OPNsense Scores** implemented: VlanScore, LaggScore, VipScore, DnatScore, FirewallRuleScore, OutboundNatScore, BinatScore, NodeExporterScore, OPNsenseShellCommandScore. All tested against a 4-NIC VM.
- **opnsense-codegen** pipeline operational: XML → IR → typed Rust structs with serde helpers. 11 generated API modules (26.5K lines).
- **opnsense-config** has 13 modules: DHCP (dnsmasq), DNS, firewall, LAGG, VIP, VLAN, load balancer (HAProxy), Caddy, TFTP, node exporter, and legacy DHCP.
- **Brocade switch integration** on `feat/brocade-client-add-vlans`: full VLAN CRUD, interface speed config, port-channel management, new `BrocadeSwitchConfigurationScore`. Breaking API changes (InterfaceConfig replaces tuples).
- **Missing for production**: `UpdateHostScore` (update MAC in DHCP for PXE boot + host network setup for LAGG LACP 802.3ad), `HostNetworkConfigurationScore` needs rework for LAGG/LACP (currently only creates bonds, doesn't configure LAGG on OPNsense side), brocade branch needs merge and API adaptation in `harmony/src/infra/brocade.rs`.
## Guiding Principles
- **Zero-setup first**: A new user clones, runs `cargo run`, gets prompted for config, values persist to local SQLite. No env vars, no external services required.
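The resolution chain that principle implies (env first, then store, then interactive prompt) can be sketched like this; names are illustrative, not `harmony_config`'s actual API:

```rust
use std::collections::HashMap;

// Illustrative env → store → prompt resolution (not harmony_config's real API).
fn resolve_config(
    key: &str,
    env: &HashMap<String, String>,
    store: &HashMap<String, String>,
    prompt: impl Fn(&str) -> String,
) -> String {
    env.get(key)
        .or_else(|| store.get(key))
        .cloned()
        .unwrap_or_else(|| prompt(key)) // last resort: ask interactively
}
```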

ROADMAP/07-opnsense-bare-metal.md

@@ -1,57 +0,0 @@
# Phase 7: OPNsense & Bare-Metal Network Automation
## Goal
Complete the OPNsense API coverage and Brocade switch integration to enable fully automated bare-metal HA cluster provisioning with LAGG, CARP VIP, multi-WAN, and BINAT.
## Status: In Progress
### Done
- opnsense-codegen pipeline: XML model parsing, IR generation, Rust code generation with serde helpers
- 11 generated API modules covering firewall, interfaces (VLAN, LAGG, VIP), HAProxy, DNSMasq, Caddy, WireGuard
- 9 OPNsense Scores: VlanScore, LaggScore, VipScore, DnatScore, FirewallRuleScore, OutboundNatScore, BinatScore, NodeExporterScore, OPNsenseShellCommandScore
- 13 opnsense-config modules with high-level Rust APIs
- E2E tests for DNSMasq CRUD, HAProxy service lifecycle, interface settings
- Brocade branch with VLAN CRUD, interface speed config, port-channel management
### Remaining
#### UpdateHostScore (new)
A Score that updates a host's configuration in the DHCP server and prepares it for PXE boot. Core responsibilities:
1. **Update MAC address in DHCP**: When hardware is replaced or NICs are swapped, update the DHCP static mapping with the new MAC address(es). This is the most critical function — without it, PXE boot targets the wrong hardware.
2. **Configure PXE boot options**: Set next-server, boot filename (BIOS/UEFI/iPXE) for the specific host.
3. **Host network setup for LAGG LACP 802.3ad**: Configure the host's network interfaces for link aggregation. This replaces the current `HostNetworkConfigurationScore` approach which only handles bond creation on the host side — the new approach must also create the corresponding LAGG interface on OPNsense and configure the Brocade switch port-channel with LACP.
The existing `DhcpHostBindingScore` handles bulk MAC-to-IP registration but lacks the ability to _update_ an existing mapping (the `remove_static_mapping` and `list_static_mappings` methods on `OPNSenseFirewall` are still `todo!()`).
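A sketch of the idempotent update flow this implies once listing exists (all names hypothetical; the real `OPNSenseFirewall` methods are still `todo!()`):

```rust
// Hypothetical static-mapping model and MAC-update flow for UpdateHostScore.
struct StaticMapping {
    hostname: String,
    mac: String,
    ip: String,
}

fn update_host_mac(
    mappings: &mut Vec<StaticMapping>,
    hostname: &str,
    new_mac: &str,
) -> &'static str {
    match mappings.iter_mut().find(|m| m.hostname == hostname) {
        // Already points at the new hardware: idempotent no-op.
        Some(m) if m.mac == new_mac => "NOOP",
        // Hardware was swapped: keep the IP, replace the MAC.
        Some(m) => {
            m.mac = new_mac.to_string();
            "UPDATED"
        }
        // No mapping to update; the caller decides whether to create one.
        None => "MISSING",
    }
}
```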
#### Merge Brocade branch
The `feat/brocade-client-add-vlans` branch has breaking API changes:
- `configure_interfaces` now takes `Vec<InterfaceConfig>` instead of `Vec<(String, PortOperatingMode)>`
- `InterfaceType` changed from `Ethernet(String)` to specific variants (TenGigabitEthernet, FortyGigabitEthernet)
- `harmony/src/infra/brocade.rs` needs adaptation to the new API
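The shape of the breaking change, as a hedged sketch (the real types live in the `brocade` crate; field names here are guesses for illustration only):

```rust
// Hypothetical approximations of the brocade crate's new types.
#[derive(Clone, Debug, PartialEq)]
enum InterfaceType {
    TenGigabitEthernet,
    FortyGigabitEthernet,
}

#[derive(Clone, Debug, PartialEq)]
enum PortOperatingMode {
    Access,
    Trunk,
}

#[derive(Clone, Debug)]
struct InterfaceConfig {
    name: String,
    interface_type: InterfaceType,
    mode: PortOperatingMode,
}

// The old signature took Vec<(String, PortOperatingMode)>; the new one takes
// the richer struct, so call sites in harmony/src/infra/brocade.rs must adapt.
fn configure_interfaces(configs: Vec<InterfaceConfig>) -> usize {
    configs.len() // stand-in for issuing switch CLI commands per interface
}
```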
#### HostNetworkConfigurationScore rework
The current implementation (`harmony/src/modules/okd/host_network.rs`) has documented limitations:
- Not idempotent (running twice may duplicate bond configs)
- No rollback logic
- Doesn't wait for switch config propagation
- All tests are `#[ignore]` due to requiring interactive TTY (inquire prompts)
- Doesn't create LAGG on OPNsense — only bonds on the host and port-channels on the switch
For LAGG LACP 802.3ad the flow needs to be:
1. Create LAGG interface on OPNsense (LaggScore already exists)
2. Create port-channel on Brocade switch (BrocadeSwitchConfigurationScore)
3. Configure bond on host via NMState (existing NetworkManager)
4. All three must be coordinated and idempotent
#### Fill remaining OPNsense `todo!()` stubs
- `OPNSenseFirewall::remove_static_mapping` — needed by UpdateHostScore
- `OPNSenseFirewall::list_static_mappings` — needed for idempotent updates
- `OPNSenseFirewall::Firewall` trait (add_rule, remove_rule, list_rules) — stub only
- `OPNSenseFirewall::dns::register_dhcp_leases` — stub only

ROADMAP/08-ha-okd-production.md

@@ -1,56 +0,0 @@
# Phase 8: HA OKD Production Deployment
## Goal
Deploy a production HAClusterTopology OKD cluster in UPI mode with full LAGG LACP 802.3ad, CARP VIP, multi-WAN, and BINAT for customer traffic — entirely automated through Harmony Scores.
## Status: Not Started
## Prerequisites
- Phase 7 (OPNsense & Bare-Metal) substantially complete
- Brocade branch merged and adapted
- UpdateHostScore implemented and tested
## Deployment Stack
### Network Layer (OPNsense)
- **LAGG interfaces** (802.3ad LACP) for all cluster hosts — redundant links via LaggScore
- **CARP VIPs** for high availability — failover IPs via VipScore
- **Multi-WAN** configuration — multiple uplinks with gateway groups
- **BINAT** for customer-facing IPs — 1:1 NAT via BinatScore
- **Firewall rules** per-customer with proper source/dest filtering via FirewallRuleScore
- **Outbound NAT** for cluster egress via OutboundNatScore
### Switch Layer (Brocade)
- **VLAN** per network segment (management, cluster, customer, storage)
- **Port-channels** (LACP) matching OPNsense LAGG interfaces
- **Interface speed** configuration for 10G/40G links
### Host Layer
- **PXE boot** via UpdateHostScore (MAC → DHCP → TFTP → iPXE → SCOS)
- **Network bonds** (LACP) via reworked HostNetworkConfigurationScore
- **NMState** for persistent bond configuration on OpenShift nodes
### Cluster Layer
- OKD UPI installation via existing OKDSetup01-04 Scores
- HAProxy load balancer for API and ingress via LoadBalancerScore
- DNS via OKDDnsScore
- Monitoring via NodeExporterScore + Prometheus stack
## New Scores Needed
1. **UpdateHostScore** — Update MAC in DHCP, configure PXE boot, prepare host network for LAGG LACP
2. **MultiWanScore** — Configure OPNsense gateway groups for multi-WAN failover
3. **CustomerBinatScore** (optional) — Higher-level Score combining BinatScore + FirewallRuleScore + DnatScore per customer
## Validation Checklist
- [ ] All hosts PXE boot successfully after MAC update
- [ ] LAGG/LACP active on all host links (verify via `teamdctl` or `nmcli`)
- [ ] CARP VIPs fail over within expected time window
- [ ] BINAT customers reachable from external networks
- [ ] Multi-WAN failover tested (pull one uplink, verify traffic shifts)
- [ ] Full OKD installation completes end-to-end
- [ ] Cluster API accessible via CARP VIP
- [ ] Customer workloads routable via BINAT

ROADMAP/09-sso-config-hardening.md

@@ -1,125 +0,0 @@
# Phase 9: SSO + Config System Hardening
## Goal
Make the Zitadel + OpenBao SSO config management stack production-ready, well-tested, and reusable across deployments. The `harmony_sso` example demonstrates the full loop: deploy infrastructure, authenticate via SSO, store and retrieve config -- all in one `cargo run`.
## Current State (as of `feat/opnsense-codegen`)
The SSO example works end-to-end:
- k3d cluster + OpenBao + Zitadel deployed via Scores
- `OpenbaoSetupScore`: init, unseal, policies, userpass, JWT auth
- `ZitadelSetupScore`: project + device-code app provisioning via Management API (PAT auth)
- JWT exchange: Zitadel id_token → OpenBao client token via `/v1/auth/jwt/login`
- Device flow triggers in terminal, user logs in via browser, config stored in OpenBao KV v2
- CoreDNS patched for in-cluster hostname resolution (K3sFamily only)
- Discovery cache invalidation after CRD installation
- Session caching with TTL
### What's solid
- **Score composition**: 4 Scores orchestrate the full stack in ~280 lines
- **Config trait**: clean `Serialize + Deserialize + JsonSchema`, developer doesn't see OpenBao or Zitadel
- **Auth chain transparency**: token → cached → OIDC device flow → userpass, right thing happens
- **Idempotency**: all Scores safe to re-run, cached sessions skip login
### What needs work
See tasks below.
## Tasks
### 9.1 Builder pattern for `OpenbaoSecretStore` — HIGH
**Problem**: `OpenbaoSecretStore::new()` has 11 positional arguments. Adding JWT params made it worse. Callers pass `None, None, None, None` for unused options.
**Fix**: Replace with a builder:
```rust
OpenbaoSecretStore::builder()
.url("http://127.0.0.1:8200")
.kv_mount("secret")
.skip_tls(true)
.zitadel_sso("http://sso.harmony.local:8080", "client-id-123")
.jwt_auth("harmony-developer", "jwt")
.build()
.await?
```
**Impact**: All callers updated (lib.rs, openbao_chain example, harmony_sso example). Breaking API change.
**Files**: `harmony_secret/src/store/openbao.rs`, all callers
### 9.2 Fix ZitadelScore PG readiness — HIGH
**Problem**: `ZitadelScore` calls `topology.get_endpoint()` immediately after deploying the CNPG Cluster CR. The PG `-rw` service takes 15-30s to appear. This forces a retry loop in the caller (the example).
**Fix**: Add a wait loop inside `ZitadelScore`'s interpret, after `topology.deploy(&pg_config)`, that polls for the `-rw` service to exist before calling `get_endpoint()`. Use `K8sClient::get_resource::<Service>()` with a poll loop.
**Impact**: Eliminates the retry wrapper in the harmony_sso example and any other Zitadel consumer.
**Files**: `harmony/src/modules/zitadel/mod.rs`
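The proposed wait loop, sketched synchronously with std types for brevity (the real fix would be async, polling `K8sClient::get_resource::<Service>()`; names here are illustrative):

```rust
use std::time::{Duration, Instant};

// Generic readiness wait: poll `check` until it yields Some(..) or the
// deadline passes. Stand-in for polling the CNPG `-rw` Service.
fn wait_for<T>(
    mut check: impl FnMut() -> Option<T>,
    timeout: Duration,
    interval: Duration,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(found) = check() {
            return Some(found);
        }
        if Instant::now() >= deadline {
            return None;
        }
        std::thread::sleep(interval);
    }
}
```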
### 9.3 `CoreDNSRewriteScore` — MEDIUM
**Problem**: CoreDNS patching logic lives in the harmony_sso example. It's a general pattern: any service with ingress-based Host routing needs in-cluster DNS resolution.
**Fix**: Extract into `harmony/src/modules/k8s/coredns.rs` as a proper Score:
```rust
pub struct CoreDNSRewriteScore {
pub rewrites: Vec<(String, String)>, // (hostname, service FQDN)
}
impl<T: Topology + K8sclient> Score<T> for CoreDNSRewriteScore { ... }
```
K3sFamily only. No-op on OpenShift. Idempotent.
**Files**: `harmony/src/modules/k8s/coredns.rs` (new), `harmony/src/modules/k8s/mod.rs`
### 9.4 Integration tests for Scores — MEDIUM
**Problem**: Zero tests for `OpenbaoSetupScore`, `ZitadelSetupScore`, `CoreDNSRewriteScore`. The Scores are testable against a running k3d cluster.
**Fix**: Add `#[ignore]` integration tests that require a running cluster:
- `test_openbao_setup_score`: deploy OpenBao + run setup, verify KV works
- `test_zitadel_setup_score`: deploy Zitadel + run setup, verify project/app exist
- `test_config_round_trip`: store + retrieve config via SSO-authenticated OpenBao
Run with `cargo test -- --ignored` after deploying the example.
**Files**: `harmony/tests/integration/` (new directory)
### 9.5 Remove `resolve()` DNS hack — LOW
**Problem**: `ZitadelOidcAuth::http_client()` hardcodes `resolve(host, 127.0.0.1:port)`. This only works for local k3d development.
**Fix**: Make it configurable. Add an optional `resolve_to: Option<SocketAddr>` field to `ZitadelOidcAuth`. The example passes `Some(127.0.0.1:8080)` for k3d; production passes `None` (uses real DNS). Or better: detect whether the host resolves and only apply the override if it doesn't.
**Files**: `harmony_secret/src/store/zitadel.rs`
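A minimal sketch of the configurable override, assuming a `resolve_to` field and a `resolve_override` helper (both names are illustrative, not the crate's current API):

```rust
use std::net::SocketAddr;

// Assumed shape of the configurable DNS override on ZitadelOidcAuth.
pub struct ZitadelOidcAuth {
    pub host: String,
    pub resolve_to: Option<SocketAddr>, // Some(..) for local k3d, None in production
}

impl ZitadelOidcAuth {
    // Returns the (host, addr) pair to feed the HTTP client's resolve()
    // only when an override is configured; with None, real DNS is used.
    pub fn resolve_override(&self) -> Option<(&str, SocketAddr)> {
        self.resolve_to.map(|addr| (self.host.as_str(), addr))
    }
}
```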
### 9.6 Typed Zitadel API client — LOW
**Problem**: `ZitadelSetupScore` uses hand-written JSON with string parsing for Management API calls. No type safety on request/response.
**Fix**: Create typed request/response structs for the Management API v1 endpoints used (projects, apps, users). Use `serde` for serialization. This doesn't need to be a full API client -- just the endpoints we use.
**Files**: `harmony/src/modules/zitadel/api.rs` (new)
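A sketch of what the typed payloads could look like (struct and field names are assumptions; in the real crate they would carry serde's `Serialize`/`Deserialize` derives — plain derives are used here to keep the sketch dependency-free):

```rust
// Hypothetical typed payloads for the Management API v1 project endpoint.
#[derive(Debug, Clone, PartialEq)]
pub struct CreateProjectRequest {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CreateProjectResponse {
    pub id: String,
}

impl CreateProjectRequest {
    pub fn new(name: impl Into<String>) -> Self {
        Self { name: name.into() }
    }
}
```

Replacing the hand-written JSON with structs like these moves request/response mismatches from runtime string parsing to compile-time errors.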
### 9.7 Capability traits for secret vault + identity — FUTURE
**Problem**: `OpenbaoScore` and `ZitadelScore` are tool-specific. No capability abstraction for "I need a secret vault" or "I need an identity provider".
**Fix**: Design `SecretVault` and `IdentityProvider` capability traits on topologies. This is a significant architectural decision that needs an ADR.
**Blocked by**: Real-world use of a second implementation (e.g., HashiCorp Vault, Keycloak) to validate the abstraction boundary.
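One possible shape for the traits, pending the ADR (all names are assumptions, and the real traits would be async). The toy in-memory implementation shows how a second backend would slot in behind the same trait — which is exactly what the second real implementation needs to validate:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical capability traits; sync here for brevity.
pub trait SecretVault {
    fn put(&self, path: &str, value: &str);
    fn get(&self, path: &str) -> Option<String>;
}

pub trait IdentityProvider {
    fn ensure_client(&self, client_id: &str) -> Result<(), String>;
}

// Toy in-memory vault standing in for OpenBao / HashiCorp Vault.
#[derive(Default)]
pub struct InMemoryVault {
    entries: RefCell<HashMap<String, String>>,
}

impl SecretVault for InMemoryVault {
    fn put(&self, path: &str, value: &str) {
        self.entries.borrow_mut().insert(path.into(), value.into());
    }
    fn get(&self, path: &str) -> Option<String> {
        self.entries.borrow().get(path).cloned()
    }
}
```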
### 9.8 Auto-unseal for OpenBao — FUTURE
**Problem**: Every pod restart requires manual unseal. `OpenbaoSetupScore` handles this, but requires re-running the Score.
**Fix**: Configure Transit auto-unseal (using a second OpenBao/Vault instance) or cloud KMS auto-unseal. This is an operational concern that should be configurable in `OpenbaoSetupScore`.
## Relationship to Other Phases
- **Phase 1** (config crate): SSO flow builds directly on `harmony_config` + `StoreSource<OpenbaoSecretStore>`. Phase 1 task 1.4 is now **complete** via the harmony_sso example.
- **Phase 2** (migrate to harmony_config): The 19 `SecretManager` call sites should migrate to `ConfigManager` with the OpenbaoSecretStore backend. The SSO flow validates this pattern works.
- **Phase 5** (E2E tests): The harmony_sso example is a candidate for the first E2E test -- it deploys k3d, exercises multiple Scores, and verifies config storage.

View File

@@ -1,49 +0,0 @@
# Phase 10: Firewall Pair Topology & HA Firewall Automation
## Goal
Provide first-class support for managing OPNsense (and future) HA firewall pairs through a higher-order topology, including CARP VIP orchestration, per-device config differentiation, and integration testing.
## Current State
`FirewallPairTopology` is implemented as a concrete wrapper around two `OPNSenseFirewall` instances. It applies uniform scores to both firewalls and differentiates CARP VIP advskew (primary=0, backup=configurable). All existing OPNsense scores (Lagg, Vlan, Firewall Rules, DNAT, BINAT, Outbound NAT, DHCP) work with the pair topology. QC1 uses it for its NT firewall pair.
## Tasks
### 10.1 Generic FirewallPair over a capability trait
**Priority**: MEDIUM
**Status**: Not started
`FirewallPairTopology` is currently concrete over `OPNSenseFirewall`. This breaks extensibility — a pfSense or VyOS firewall pair would need a separate type. Introduce a `FirewallAppliance` capability trait that `OPNSenseFirewall` implements, and make `FirewallPairTopology<T: FirewallAppliance>` generic. The blanket-impl pattern from ADR-015 then gives automatic pair support for any appliance type.
Key challenge: the trait needs to expose enough for `CarpVipScore` to configure VIPs with per-device advskew, without leaking OPNsense-specific APIs.
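A sketch of the proposed trait and generic pair (all names are assumptions; the real trait must stay free of OPNsense-specific APIs, per the key challenge above). The `RecordingFirewall` is a test double illustrating per-device advskew differentiation:

```rust
// Hypothetical capability trait for any HA-capable firewall appliance.
pub trait FirewallAppliance {
    // Per-device CARP advskew: 0 on primary, configurable on backup.
    fn set_carp_advskew(&mut self, vip: &str, advskew: u8);
}

pub struct FirewallPairTopology<T: FirewallAppliance> {
    pub primary: T,
    pub backup: T,
    pub backup_advskew: u8,
}

impl<T: FirewallAppliance> FirewallPairTopology<T> {
    pub fn configure_vip(&mut self, vip: &str) {
        self.primary.set_carp_advskew(vip, 0);
        self.backup.set_carp_advskew(vip, self.backup_advskew);
    }
}

// Minimal test double standing in for OPNSenseFirewall.
#[derive(Default)]
pub struct RecordingFirewall {
    pub vips: Vec<(String, u8)>,
}

impl FirewallAppliance for RecordingFirewall {
    fn set_carp_advskew(&mut self, vip: &str, advskew: u8) {
        self.vips.push((vip.to_string(), advskew));
    }
}
```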
### 10.2 Delegation macro for higher-order topologies
**Priority**: MEDIUM
**Status**: Not started
The "delegate to both" pattern used by uniform pair scores is pure boilerplate. Every `Score<FirewallPairTopology>` impl for uniform scores follows the same structure: create the inner `Score<OPNSenseFirewall>` interpret, execute against primary, then backup.
Design a proc macro (e.g., `#[derive(DelegatePair)]` or `delegate_score_to_pair!`) that generates these impls automatically. This would also apply to `DecentralizedTopology` (delegate to all sites) and future higher-order topologies.
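The delegation pattern can be sketched declaratively before committing to a proc macro (macro and struct names below are assumptions; the real macro would build the inner `Score<OPNSenseFirewall>` interpret and await it against primary, then backup):

```rust
// Declarative sketch of the boilerplate a #[derive(DelegatePair)] proc
// macro would generate: apply the same operation to primary, then backup.
macro_rules! delegate_to_pair {
    ($pair:expr, $apply:expr) => {{
        $apply(&mut $pair.primary);
        $apply(&mut $pair.backup);
    }};
}

// Stand-in pair; each Vec<String> plays the role of one firewall's config.
pub struct Pair {
    pub primary: Vec<String>,
    pub backup: Vec<String>,
}
```

The same expansion shape generalizes to `DecentralizedTopology` by iterating over all sites instead of a fixed two.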
### 10.3 XMLRPC sync support
**Priority**: LOW
**Status**: Not started
Add optional `FirewallPairTopology::sync_from_primary()` that triggers OPNsense XMLRPC config sync from primary to backup. Useful for settings that must be identical and don't need per-device differentiation. Not blocking — independent application to both firewalls achieves the same config state.
### 10.4 Integration test with CARP/LACP failover
**Priority**: LOW
**Status**: Not started
Extend the existing OPNsense example deployment to create a firewall pair test fixture:
- Two OPNsense VMs in CARP configuration
- A third VM as a client verifying connectivity
- Automated failover testing: disconnect primary's virtual NIC, verify CARP failover to backup, reconnect, verify failback
- LACP failover: disconnect one LAGG member, verify traffic continues on remaining member
This builds on the KVM test harness from Phase 6.

View File

@@ -1,77 +0,0 @@
# Phase 11: Named Config Instances & Cross-Namespace Access
## Goal
Allow multiple instances of the same config type within a single namespace, identified by name. Also allow explicit namespace specification when retrieving config items, enabling cross-deployment orchestration.
## Context
The current `harmony_config` system identifies config items by type only (`T::KEY` from `#[derive(Config)]`). This works for singletons but breaks when you need multiple instances of the same type:
- **Firewall pair**: primary and backup need separate `OPNSenseApiCredentials` (different API keys for different devices)
- **Worker nodes**: each BMC has its own `IpmiCredentials` with different username/password
- **Firewall administrators**: multiple `OPNSenseApiCredentials` with different permission levels
- **Multi-tenant**: customer firewalls vs. NationTech infrastructure firewalls need separate credential sets
Using separate namespaces per device is not the answer — a firewall pair belongs to a single deployment, and forcing namespace switches for each device in a pair adds unnecessary friction.
Cross-namespace access is a separate but related need: the NT firewall pair and C1 customer firewall pair live in separate namespaces (the customer manages their own firewall), but NationTech needs read access to the C1 namespace for BINAT coordination.
## Tasks
### 11.1 Named config instances within a namespace
**Priority**: HIGH
**Status**: Not started
Extend the `Config` trait and `ConfigManager` to support an optional instance name:
```rust
// Current (singleton): gets "OPNSenseApiCredentials" from the active namespace
let creds = ConfigManager::get::<OPNSenseApiCredentials>().await?;
// New (named): gets "OPNSenseApiCredentials/fw-primary" from the active namespace
let primary_creds = ConfigManager::get_named::<OPNSenseApiCredentials>("fw-primary").await?;
let backup_creds = ConfigManager::get_named::<OPNSenseApiCredentials>("fw-backup").await?;
```
Storage key becomes `{T::KEY}/{instance_name}` (or similar). The unnamed `get()` remains unchanged for backward compatibility.
This needs to work across all config sources:
- `EnvSource`: `HARMONY_CONFIG_{KEY}_{NAME}` (e.g., `HARMONY_CONFIG_OPNSENSE_API_CREDENTIALS_FW_PRIMARY`)
- `SqliteSource`: composite key `{key}/{name}`
- `StoreSource` (OpenBao): path `{namespace}/{key}/{name}`
- `PromptSource`: prompt includes the instance name for clarity
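The key composition above can be sketched as a single helper (the function name and exact separator are design assumptions):

```rust
// Hypothetical storage-key derivation for named instances; the unnamed
// form stays identical to today's key for backward compatibility.
fn storage_key(type_key: &str, instance: Option<&str>) -> String {
    match instance {
        Some(name) => format!("{type_key}/{name}"),
        None => type_key.to_string(),
    }
}
```

Each source then maps this composite key onto its own namespace: a KV path segment for OpenBao, a composite column for SQLite, a suffix for env vars.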
### 11.2 Cross-namespace config access
**Priority**: MEDIUM
**Status**: Not started
Allow specifying an explicit namespace when retrieving a config item:
```rust
// Get from the active namespace (current behavior)
let nt_creds = ConfigManager::get::<OPNSenseApiCredentials>().await?;
// Get from a specific namespace
let c1_creds = ConfigManager::get_from_namespace::<OPNSenseApiCredentials>("c1").await?;
```
This enables orchestration across deployments: the NT deployment can read C1's firewall credentials for BINAT coordination without switching the global namespace.
For the `StoreSource` (OpenBao), this maps to reading from a different KV path prefix. For `SqliteSource`, it maps to a different database file or a namespace column. For `EnvSource`, it could use a different prefix (`HARMONY_CONFIG_C1_{KEY}`).
### 11.3 Update FirewallPairTopology to use named configs
**Priority**: MEDIUM
**Status**: Blocked by 11.1
Once named config instances are available, update `FirewallPairTopology::opnsense_from_config()` to use them:
```rust
let primary_creds = ConfigManager::get_named::<OPNSenseApiCredentials>("fw-primary").await?;
let backup_creds = ConfigManager::get_named::<OPNSenseApiCredentials>("fw-backup").await?;
```
This removes the current limitation of shared credentials between primary and backup.

brocade/examples/env.sh Normal file
View File

@@ -0,0 +1,4 @@
export HARMONY_SECRET_NAMESPACE=brocade-example
export HARMONY_SECRET_STORE=file
export HARMONY_DATABASE_URL=sqlite://harmony_brocade_example.sqlite
export RUST_LOG=info

View File

@@ -1,6 +1,6 @@
use std::net::{IpAddr, Ipv4Addr};
use brocade::{BrocadeOptions, ssh};
use brocade::{BrocadeOptions, Vlan, ssh};
use harmony_secret::{Secret, SecretManager};
use harmony_types::switch::PortLocation;
use schemars::JsonSchema;
@@ -17,9 +17,12 @@ async fn main() {
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
// let ip = IpAddr::V4(Ipv4Addr::new(10, 0, 0, 250)); // old brocade @ ianlet
let ip = IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)); // brocade @ sto1
// let ip = IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)); // brocade @ sto1
// let ip = IpAddr::V4(Ipv4Addr::new(192, 168, 4, 11)); // brocade @ st
let switch_addresses = vec![ip];
//let switch_addresses = vec![ip];
let ip0 = IpAddr::V4(Ipv4Addr::new(192, 168, 12, 147)); // brocade @ test
let ip1 = IpAddr::V4(Ipv4Addr::new(192, 168, 12, 109)); // brocade @ test
let switch_addresses = vec![ip0, ip1];
let config = SecretManager::get_or_prompt::<BrocadeSwitchAuth>()
.await
@@ -32,7 +35,7 @@ async fn main() {
&BrocadeOptions {
dry_run: true,
ssh: ssh::SshOptions {
port: 2222,
port: 22,
..Default::default()
},
..Default::default()
@@ -58,18 +61,38 @@ async fn main() {
}
println!("--------------");
todo!();
println!("Creating VLAN 100 (test-vlan)...");
brocade
.create_vlan(&Vlan {
id: 100,
name: "test-vlan".to_string(),
})
.await
.unwrap();
println!("--------------");
println!("Deleting VLAN 100...");
brocade
.delete_vlan(&Vlan {
id: 100,
name: "test-vlan".to_string(),
})
.await
.unwrap();
println!("--------------");
todo!("STOP!");
let channel_name = "1";
brocade.clear_port_channel(channel_name).await.unwrap();
println!("--------------");
let channel_id = brocade.find_available_channel_id().await.unwrap();
let channel_id = 1;
println!("--------------");
let channel_name = "HARMONY_LAG";
let ports = [PortLocation(2, 0, 35)];
brocade
.create_port_channel(channel_id, channel_name, &ports)
.create_port_channel(channel_id, channel_name, &ports, None)
.await
.unwrap();
}

View File

@@ -0,0 +1,242 @@
use std::io::{self, Write};
use brocade::{
BrocadeOptions, InterfaceConfig, InterfaceSpeed, InterfaceType, PortOperatingMode,
SwitchInterface, Vlan, VlanList, ssh,
};
use harmony_secret::{Secret, SecretManager};
use harmony_types::switch::PortLocation;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
#[derive(Secret, Clone, Debug, JsonSchema, Serialize, Deserialize)]
struct BrocadeSwitchAuth {
username: String,
password: String,
}
fn wait_for_enter() {
println!("\n--- Press ENTER to continue ---");
io::stdout().flush().unwrap();
io::stdin().read_line(&mut String::new()).unwrap();
}
#[tokio::main]
async fn main() {
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
let ip0 = std::net::IpAddr::V4(std::net::Ipv4Addr::new(192, 168, 12, 147));
let ip1 = std::net::IpAddr::V4(std::net::Ipv4Addr::new(192, 168, 12, 109));
let switch_addresses = vec![ip0, ip1];
let config = SecretManager::get_or_prompt::<BrocadeSwitchAuth>()
.await
.unwrap();
let brocade = brocade::init(
&switch_addresses,
&config.username,
&config.password,
&BrocadeOptions {
dry_run: false,
ssh: ssh::SshOptions {
port: 22,
..Default::default()
},
..Default::default()
},
)
.await
.expect("Brocade client failed to connect");
println!("=== Connecting to Brocade switches ===");
let version = brocade.version().await.unwrap();
println!("Version: {version:?}");
let entries = brocade.get_stack_topology().await.unwrap();
println!("Stack topology: {entries:#?}");
println!("\n=== Creating VLANs 100, 200, 300 ===");
brocade
.create_vlan(&Vlan {
id: 100,
name: "vlan100".to_string(),
})
.await
.unwrap();
println!("Created VLAN 100 (vlan100)");
brocade
.create_vlan(&Vlan {
id: 200,
name: "vlan200".to_string(),
})
.await
.unwrap();
println!("Created VLAN 200 (vlan200)");
brocade
.create_vlan(&Vlan {
id: 300,
name: "vlan300".to_string(),
})
.await
.unwrap();
println!("Created VLAN 300 (vlan300)");
    println!("\n=== Press ENTER to continue to port configuration tests ===");
wait_for_enter();
println!("\n=== TEST 1: Trunk port (all VLANs, speed 10Gbps) on TenGigabitEthernet 1/0/1 ===");
println!("Configuring port as trunk with all VLANs and speed 10Gbps...");
let configs = vec![InterfaceConfig {
interface: SwitchInterface::Ethernet(
InterfaceType::TenGigabitEthernet,
PortLocation(1, 0, 1),
),
mode: PortOperatingMode::Trunk,
access_vlan: None,
trunk_vlans: Some(VlanList::All),
speed: Some(InterfaceSpeed::Gbps10),
}];
brocade.configure_interfaces(&configs).await.unwrap();
println!("Querying interfaces...");
let interfaces = brocade.get_interfaces().await.unwrap();
for iface in &interfaces {
if iface.name.contains("1/0/1") {
println!(" {iface:?}");
}
}
wait_for_enter();
println!("\n=== TEST 2: Trunk port (specific VLANs) on TenGigabitEthernet 1/0/2 ===");
println!("Configuring port as trunk with VLANs 100, 200...");
let configs = vec![InterfaceConfig {
interface: SwitchInterface::Ethernet(
InterfaceType::TenGigabitEthernet,
PortLocation(1, 0, 2),
),
mode: PortOperatingMode::Trunk,
access_vlan: None,
trunk_vlans: Some(VlanList::Specific(vec![
Vlan {
id: 100,
name: "vlan100".to_string(),
},
Vlan {
id: 200,
name: "vlan200".to_string(),
},
])),
speed: None,
}];
brocade.configure_interfaces(&configs).await.unwrap();
println!("Querying interfaces...");
let interfaces = brocade.get_interfaces().await.unwrap();
for iface in &interfaces {
if iface.name.contains("1/0/2") {
println!(" {iface:?}");
}
}
wait_for_enter();
println!("\n=== TEST 3: Access port (default VLAN 1) on TenGigabitEthernet 1/0/3 ===");
println!("Configuring port as access (default VLAN 1)...");
let configs = vec![InterfaceConfig {
interface: SwitchInterface::Ethernet(
InterfaceType::TenGigabitEthernet,
PortLocation(1, 0, 3),
),
mode: PortOperatingMode::Access,
access_vlan: None,
trunk_vlans: None,
speed: None,
}];
brocade.configure_interfaces(&configs).await.unwrap();
println!("Querying interfaces...");
let interfaces = brocade.get_interfaces().await.unwrap();
for iface in &interfaces {
if iface.name.contains("1/0/3") {
println!(" {iface:?}");
}
}
wait_for_enter();
println!("\n=== TEST 4: Access port (custom VLAN 100) on TenGigabitEthernet 1/0/4 ===");
println!("Configuring port as access with VLAN 100...");
let configs = vec![InterfaceConfig {
interface: SwitchInterface::Ethernet(
InterfaceType::TenGigabitEthernet,
PortLocation(1, 0, 4),
),
mode: PortOperatingMode::Access,
access_vlan: Some(100),
trunk_vlans: None,
speed: None,
}];
brocade.configure_interfaces(&configs).await.unwrap();
println!("Querying interfaces...");
let interfaces = brocade.get_interfaces().await.unwrap();
for iface in &interfaces {
if iface.name.contains("1/0/4") {
println!(" {iface:?}");
}
}
wait_for_enter();
println!("\n=== TEST 5: Port-channel on TenGigabitEthernet 1/0/5 and 1/0/6 ===");
let channel_id = 1;
println!("Using channel ID: {channel_id}");
println!("Creating port-channel with ports 1/0/5 and 1/0/6...");
let ports = [PortLocation(1, 0, 5), PortLocation(1, 0, 6)];
brocade
.create_port_channel(channel_id, "HARMONY_LAG", &ports, None)
.await
.unwrap();
println!("Port-channel created.");
println!("Querying port-channel summary...");
let interfaces = brocade.get_interfaces().await.unwrap();
for iface in &interfaces {
if iface.name.contains("1/0/5") || iface.name.contains("1/0/6") {
println!(" {iface:?}");
}
}
wait_for_enter();
println!("\n=== TEARDOWN: Clearing port-channels and deleting VLANs ===");
println!("Clearing port-channel {channel_id}...");
brocade
.clear_port_channel(&channel_id.to_string())
.await
.unwrap();
println!("Resetting interfaces...");
for port in 1..=6 {
let interface = format!("TenGigabitEthernet 1/0/{port}");
println!(" Resetting {interface}...");
brocade.reset_interface(&interface).await.unwrap();
}
println!("Deleting VLAN 100...");
brocade
.delete_vlan(&Vlan {
id: 100,
name: "vlan100".to_string(),
})
.await
.unwrap();
println!("Deleting VLAN 200...");
brocade
.delete_vlan(&Vlan {
id: 200,
name: "vlan200".to_string(),
})
.await
.unwrap();
println!("Deleting VLAN 300...");
brocade
.delete_vlan(&Vlan {
id: 300,
name: "vlan300".to_string(),
})
.await
.unwrap();
println!("\n=== DONE ===");
}

View File

@@ -1,7 +1,8 @@
use super::BrocadeClient;
use crate::{
BrocadeInfo, Error, ExecutionMode, InterSwitchLink, InterfaceInfo, MacAddressEntry,
PortChannelId, PortOperatingMode, parse_brocade_mac_address, shell::BrocadeShell,
BrocadeInfo, Error, ExecutionMode, InterSwitchLink, InterfaceConfig, InterfaceInfo,
InterfaceSpeed, MacAddressEntry, PortChannelId, PortOperatingMode, Vlan,
parse_brocade_mac_address, shell::BrocadeShell,
};
use async_trait::async_trait;
@@ -138,10 +139,15 @@ impl BrocadeClient for FastIronClient {
todo!()
}
async fn configure_interfaces(
&self,
_interfaces: &Vec<(String, PortOperatingMode)>,
) -> Result<(), Error> {
async fn configure_interfaces(&self, _interfaces: &Vec<InterfaceConfig>) -> Result<(), Error> {
todo!()
}
async fn create_vlan(&self, _vlan: &Vlan) -> Result<(), Error> {
todo!()
}
async fn delete_vlan(&self, _vlan: &Vlan) -> Result<(), Error> {
todo!()
}
@@ -180,11 +186,18 @@ impl BrocadeClient for FastIronClient {
channel_id: PortChannelId,
channel_name: &str,
ports: &[PortLocation],
speed: Option<&InterfaceSpeed>,
) -> Result<(), Error> {
info!(
"[Brocade] Configuring port-channel '{channel_name} {channel_id}' with ports: {ports:?}"
);
if let Some(speed) = speed {
log::warn!(
"[Brocade] FastIron: speed override ({speed}) on port-channel is not yet implemented; ignoring"
);
}
let commands = self.build_port_channel_commands(channel_id, channel_name, ports);
self.shell
.run_commands(commands, ExecutionMode::Privileged)
@@ -194,6 +207,25 @@ impl BrocadeClient for FastIronClient {
Ok(())
}
async fn reset_interface(&self, interface: &str) -> Result<(), Error> {
info!("[Brocade] Resetting interface: {interface}");
let commands = vec![
"configure terminal".into(),
format!("interface {interface}"),
"no switchport".into(),
"no speed".into(),
"exit".into(),
];
self.shell
.run_commands(commands, ExecutionMode::Privileged)
.await?;
info!("[Brocade] Interface '{interface}' reset.");
Ok(())
}
async fn clear_port_channel(&self, channel_name: &str) -> Result<(), Error> {
info!("[Brocade] Clearing port-channel: {channel_name}");

View File

@@ -76,6 +76,74 @@ pub struct MacAddressEntry {
pub type PortChannelId = u8;
#[derive(Debug, Clone, PartialEq, Eq, Serialize)]
pub struct Vlan {
pub id: u16,
pub name: String,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize)]
pub enum VlanList {
All,
Specific(Vec<Vlan>),
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize)]
pub enum SwitchInterface {
Ethernet(InterfaceType, PortLocation),
PortChannel(PortChannelId),
}
impl fmt::Display for SwitchInterface {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
SwitchInterface::Ethernet(itype, loc) => write!(f, "{itype} {loc}"),
SwitchInterface::PortChannel(id) => write!(f, "port-channel {id}"),
}
}
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize)]
pub enum InterfaceSpeed {
Mbps100,
Gbps1,
Gbps1Auto,
Gbps10,
Auto,
}
impl fmt::Display for InterfaceSpeed {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
InterfaceSpeed::Mbps100 => write!(f, "100"),
InterfaceSpeed::Gbps1 => write!(f, "1000"),
InterfaceSpeed::Gbps1Auto => write!(f, "1000-auto"),
InterfaceSpeed::Gbps10 => write!(f, "10000"),
InterfaceSpeed::Auto => write!(f, "auto"),
}
}
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize)]
pub struct InterfaceConfig {
pub interface: SwitchInterface,
pub mode: PortOperatingMode,
pub access_vlan: Option<u16>,
pub trunk_vlans: Option<VlanList>,
pub speed: Option<InterfaceSpeed>,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize)]
pub struct PortChannelConfig {
pub id: PortChannelId,
pub name: String,
pub ports: Vec<PortLocation>,
pub mode: PortOperatingMode,
pub access_vlan: Option<Vlan>,
pub trunk_vlans: Option<VlanList>,
pub speed: Option<InterfaceSpeed>,
}
/// Represents a single physical or logical link connecting two switches within a stack or fabric.
///
/// This structure provides a standardized view of the topology regardless of the
@@ -104,16 +172,17 @@ pub struct InterfaceInfo {
}
/// Categorizes the functional type of a switch interface.
#[derive(Debug, PartialEq, Eq, Clone)]
#[derive(Debug, PartialEq, Eq, Clone, Serialize)]
pub enum InterfaceType {
/// Physical or virtual Ethernet interface (e.g., TenGigabitEthernet, FortyGigabitEthernet).
Ethernet(String),
TenGigabitEthernet,
FortyGigabitEthernet,
}
impl fmt::Display for InterfaceType {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
InterfaceType::Ethernet(name) => write!(f, "{name}"),
InterfaceType::TenGigabitEthernet => write!(f, "TenGigabitEthernet"),
InterfaceType::FortyGigabitEthernet => write!(f, "FortyGigabitEthernet"),
}
}
}
@@ -206,10 +275,13 @@ pub trait BrocadeClient: std::fmt::Debug {
async fn get_interfaces(&self) -> Result<Vec<InterfaceInfo>, Error>;
/// Configures a set of interfaces to be operated with a specified mode (access ports, ISL, etc.).
async fn configure_interfaces(
&self,
interfaces: &Vec<(String, PortOperatingMode)>,
) -> Result<(), Error>;
async fn configure_interfaces(&self, interfaces: &Vec<InterfaceConfig>) -> Result<(), Error>;
/// Creates a new VLAN on the switch.
async fn create_vlan(&self, vlan: &Vlan) -> Result<(), Error>;
/// Deletes a VLAN from the switch.
async fn delete_vlan(&self, vlan: &Vlan) -> Result<(), Error>;
/// Scans the existing configuration to find the next available (unused)
/// Port-Channel ID (`lag` or `trunk`) for assignment.
@@ -230,11 +302,16 @@ pub trait BrocadeClient: std::fmt::Debug {
/// * `channel_id`: The ID (e.g., 1-128) for the logical port channel.
/// * `channel_name`: A descriptive name for the LAG (used in configuration context).
/// * `ports`: A slice of `PortLocation` structs defining the physical member ports.
/// * `speed`: Optional speed override applied to both the logical port-channel
/// interface and each member port. Required on Brocade when forcing a
/// non-default speed (e.g. 1G on 10G-capable ports), otherwise the LAG
/// members and the logical interface end up inconsistent.
async fn create_port_channel(
&self,
channel_id: PortChannelId,
channel_name: &str,
ports: &[PortLocation],
speed: Option<&InterfaceSpeed>,
) -> Result<(), Error>;
/// Enables Simple Network Management Protocol (SNMP) server for switch
@@ -246,6 +323,9 @@ pub trait BrocadeClient: std::fmt::Debug {
/// * `des`: The Data Encryption Standard algorithm key
async fn enable_snmp(&self, user_name: &str, auth: &str, des: &str) -> Result<(), Error>;
/// Resets an interface to its default state by removing switchport configuration.
async fn reset_interface(&self, interface: &str) -> Result<(), Error>;
/// Removes all configuration associated with the specified Port-Channel name.
///
/// This operation should be idempotent; attempting to clear a non-existent

View File

@@ -6,9 +6,10 @@ use log::{debug, info};
use regex::Regex;
use crate::{
BrocadeClient, BrocadeInfo, Error, ExecutionMode, InterSwitchLink, InterfaceInfo,
InterfaceStatus, InterfaceType, MacAddressEntry, PortChannelId, PortOperatingMode,
parse_brocade_mac_address, shell::BrocadeShell,
BrocadeClient, BrocadeInfo, Error, ExecutionMode, InterSwitchLink, InterfaceConfig,
InterfaceInfo, InterfaceSpeed, InterfaceStatus, InterfaceType, MacAddressEntry, PortChannelId,
PortOperatingMode, SwitchInterface, Vlan, VlanList, parse_brocade_mac_address,
shell::BrocadeShell,
};
#[derive(Debug)]
@@ -84,8 +85,8 @@ impl NetworkOperatingSystemClient {
}
let interface_type = match parts[0] {
"Fo" => InterfaceType::Ethernet("FortyGigabitEthernet".to_string()),
"Te" => InterfaceType::Ethernet("TenGigabitEthernet".to_string()),
"Fo" => InterfaceType::FortyGigabitEthernet,
"Te" => InterfaceType::TenGigabitEthernet,
_ => return None,
};
let port_location = PortLocation::from_str(parts[1]).ok()?;
@@ -185,18 +186,20 @@ impl BrocadeClient for NetworkOperatingSystemClient {
.collect()
}
async fn configure_interfaces(
&self,
interfaces: &Vec<(String, PortOperatingMode)>,
) -> Result<(), Error> {
async fn configure_interfaces(&self, interfaces: &Vec<InterfaceConfig>) -> Result<(), Error> {
info!("[Brocade] Configuring {} interface(s)...", interfaces.len());
let mut commands = vec!["configure terminal".to_string()];
for interface in interfaces {
commands.push(format!("interface {}", interface.0));
debug!(
"[Brocade] Configuring interface {} as {:?}",
interface.interface, interface.mode
);
match interface.1 {
commands.push(format!("interface {}", interface.interface));
match interface.mode {
PortOperatingMode::Fabric => {
commands.push("fabric isl enable".into());
commands.push("fabric trunk enable".into());
@@ -204,23 +207,50 @@ impl BrocadeClient for NetworkOperatingSystemClient {
PortOperatingMode::Trunk => {
commands.push("switchport".into());
commands.push("switchport mode trunk".into());
commands.push("switchport trunk allowed vlan all".into());
match &interface.trunk_vlans {
Some(VlanList::All) => {
commands.push("switchport trunk allowed vlan all".into());
}
Some(VlanList::Specific(vlans)) => {
for vlan in vlans {
commands.push(format!("switchport trunk allowed vlan add {}", vlan.id));
}
}
None => {
commands.push("switchport trunk allowed vlan all".into());
}
}
commands.push("no switchport trunk tag native-vlan".into());
commands.push("spanning-tree shutdown".into());
commands.push("no fabric isl enable".into());
commands.push("no fabric trunk enable".into());
commands.push("no shutdown".into());
if matches!(interface.interface, SwitchInterface::Ethernet(..)) {
commands.push("spanning-tree shutdown".into());
commands.push("no fabric isl enable".into());
commands.push("no fabric trunk enable".into());
}
}
PortOperatingMode::Access => {
commands.push("switchport".into());
commands.push("switchport mode access".into());
commands.push("switchport access vlan 1".into());
commands.push("no spanning-tree shutdown".into());
commands.push("no fabric isl enable".into());
commands.push("no fabric trunk enable".into());
let access_vlan = interface.access_vlan.unwrap_or(1);
commands.push(format!("switchport access vlan {access_vlan}"));
if matches!(interface.interface, SwitchInterface::Ethernet(..)) {
commands.push("no spanning-tree shutdown".into());
commands.push("no fabric isl enable".into());
commands.push("no fabric trunk enable".into());
}
}
}
if let Some(speed) = &interface.speed {
info!(
"[Brocade] Overriding speed on {} to {speed}",
interface.interface
);
if matches!(interface.interface, SwitchInterface::PortChannel(..)) {
commands.push("shutdown".into());
}
commands.push(format!("speed {speed}"));
}
commands.push("no shutdown".into());
commands.push("exit".into());
}
@@ -235,6 +265,40 @@ impl BrocadeClient for NetworkOperatingSystemClient {
Ok(())
}
async fn create_vlan(&self, vlan: &Vlan) -> Result<(), Error> {
info!("[Brocade] Creating VLAN {} ({})", vlan.id, vlan.name);
let commands = vec![
"configure terminal".into(),
format!("interface Vlan {}", vlan.id),
format!("name {}", vlan.name),
"exit".into(),
];
self.shell
.run_commands(commands, ExecutionMode::Regular)
.await?;
info!("[Brocade] VLAN {} ({}) created.", vlan.id, vlan.name);
Ok(())
}
async fn delete_vlan(&self, vlan: &Vlan) -> Result<(), Error> {
info!("[Brocade] Deleting VLAN {}", vlan.id);
let commands = vec![
"configure terminal".into(),
format!("no interface Vlan {}", vlan.id),
];
self.shell
.run_commands(commands, ExecutionMode::Regular)
.await?;
info!("[Brocade] VLAN {} deleted.", vlan.id);
Ok(())
}
async fn find_available_channel_id(&self) -> Result<PortChannelId, Error> {
info!("[Brocade] Finding next available channel id...");
@@ -273,6 +337,7 @@ impl BrocadeClient for NetworkOperatingSystemClient {
channel_id: PortChannelId,
channel_name: &str,
ports: &[PortLocation],
speed: Option<&InterfaceSpeed>,
) -> Result<(), Error> {
info!(
"[Brocade] Configuring port-channel '{channel_id} {channel_name}' with ports: {}",
@@ -283,27 +348,34 @@ impl BrocadeClient for NetworkOperatingSystemClient {
.join(", ")
);
let interfaces = self.get_interfaces().await?;
let mut commands = vec![
"configure terminal".into(),
format!("interface port-channel {}", channel_id),
"no shutdown".into(),
"exit".into(),
format!("description {channel_name}"),
];
if let Some(speed) = speed {
commands.push("shutdown".into());
commands.push(format!("speed {speed}"));
commands.push("no shutdown".into());
}
commands.push("exit".into());
for port in ports {
let interface = interfaces.iter().find(|i| i.port_location == *port);
let Some(interface) = interface else {
continue;
};
commands.push(format!("interface {}", interface.name));
debug!(
"[Brocade] Adding port TenGigabitEthernet {} to channel-group {}",
port, channel_id
);
commands.push(format!("interface TenGigabitEthernet {}", port));
commands.push("no switchport".into());
commands.push("no ip address".into());
commands.push("no fabric isl enable".into());
commands.push("no fabric trunk enable".into());
commands.push(format!("channel-group {channel_id} mode active"));
commands.push("lacp timeout short".into());
if let Some(speed) = speed {
commands.push(format!("speed {speed}"));
}
commands.push("no shutdown".into());
commands.push("exit".into());
}
@@ -317,6 +389,25 @@ impl BrocadeClient for NetworkOperatingSystemClient {
Ok(())
}
async fn reset_interface(&self, interface: &str) -> Result<(), Error> {
info!("[Brocade] Resetting interface: {interface}");
let commands = vec![
"configure terminal".into(),
format!("interface {interface}"),
"no switchport".into(),
"no speed".into(),
"exit".into(),
];
self.shell
.run_commands(commands, ExecutionMode::Regular)
.await?;
info!("[Brocade] Interface '{interface}' reset.");
Ok(())
}
async fn clear_port_channel(&self, channel_name: &str) -> Result<(), Error> {
info!("[Brocade] Clearing port-channel: {channel_name}");

View File

@@ -3,8 +3,8 @@ set -e
cd "$(dirname "$0")/.."
# Ensure vendor submodules are present (needed by opnsense-codegen tests)
git submodule update --init --depth 1 opnsense-codegen/vendor/core opnsense-codegen/vendor/plugins
git submodule init
git submodule update
rustc --version
cargo check --all-targets --all-features --keep-going

View File

@@ -1,54 +0,0 @@
#!/bin/sh
# OPNsense end-to-end integration test.
#
# Boots an OPNsense VM via KVM, runs all Harmony OPNsense Scores twice,
# and verifies idempotency (second run produces zero duplicates).
#
# Requirements:
# - libvirtd running (systemctl start libvirtd)
# - User in libvirt group (see examples/opnsense_vm_integration/setup-libvirt.sh)
# - OPNsense nano image downloaded (--download flag on first run)
#
# Usage:
# ./build/opnsense-e2e.sh # full test (boot if needed + run)
# ./build/opnsense-e2e.sh --download # download OPNsense image first
# ./build/opnsense-e2e.sh --clean # tear down VM after testing
set -e
cd "$(dirname "$0")/.."
PKG="opnsense-vm-integration"
echo "=== OPNsense E2E Integration Test ==="
echo ""
# Handle --download flag
if [ "$1" = "--download" ]; then
echo "--- Downloading OPNsense image ---"
cargo run -p "$PKG" -- --download
echo ""
fi
# Handle --clean flag
if [ "$1" = "--clean" ]; then
echo "--- Cleaning up VM ---"
cargo run -p "$PKG" -- --clean
echo "=== Clean complete ==="
exit 0
fi
# Check prerequisites
echo "--- Checking prerequisites ---"
cargo run -p "$PKG" -- --check
echo ""
# Boot VM if not already running
echo "--- Ensuring VM is running ---"
cargo run -p "$PKG" -- --boot
echo ""
# Run integration tests (includes idempotency verification)
echo "--- Running integration tests ---"
cargo run -p "$PKG"
echo ""
echo "=== OPNsense E2E Integration Test PASSED ==="

View File

@@ -13,7 +13,6 @@ If you're new to Harmony, start here:
See how to use Harmony to solve real-world problems.
- [**OPNsense VM Integration**](./use-cases/opnsense-vm-integration.md): Boot a real OPNsense firewall in a local KVM VM and configure it entirely through Harmony. Fully automated, zero manual steps — the flashiest demo. Requires Linux with KVM.
- [**PostgreSQL on Local K3D**](./use-cases/postgresql-on-local-k3d.md): Deploy a production-grade PostgreSQL cluster on a local K3D cluster. The fastest way to get started.
- [**OKD on Bare Metal**](./use-cases/okd-on-bare-metal.md): A detailed walkthrough of bootstrapping a high-availability OKD cluster from physical hardware.


@@ -8,7 +8,6 @@
## Use Cases
- [PostgreSQL on Local K3D](./use-cases/postgresql-on-local-k3d.md)
- [OPNsense VM Integration](./use-cases/opnsense-vm-integration.md)
- [OKD on Bare Metal](./use-cases/okd-on-bare-metal.md)
## Component Catalogs


@@ -1,117 +0,0 @@
# ADR-021: Agent Desired-State Convergence — Problem Statement and Initial Proposal
**Status:** Proposed (under review — see ADR-022 for alternatives)
**Date:** 2026-04-09
> This document was originally drafted as an "Accepted" ADR describing a shell-command executor. On review, the team was not convinced that the shell-executor shape is the right one. It has been re-framed as a **problem statement + one candidate proposal (Alternative A)**. Alternative designs — including a mini-kubelet model and an embedded-Score model — are explored in [ADR-022](./022-agent-desired-state-alternatives.md). A final decision has **not** been made.
## Context
The Harmony Agent (ADR-016) currently handles a single use case: PostgreSQL HA failover via `DeploymentConfig::FailoverPostgreSQL`. For the IoT fleet management platform (Raspberry Pi clusters deployed in homes, offices, and community spaces), we need the agent to become a general-purpose desired-state convergence engine.
Concretely, the central Harmony control plane must be able to:
1. Express the *desired state* of an individual Pi (or a class of Pis) in a typed, serializable form.
2. Ship that desired state to the device over the existing NATS JetStream mesh (ADR-017-1).
3. Have the on-device agent reconcile toward it — idempotently, observably, and without manual intervention.
4. Read back an authoritative, typed *actual state* so the control plane can report convergence, surface errors, and drive a fleet dashboard.
The existing heartbeat / failover machinery (ADR-017-3) remains valuable — it proves the agent can maintain persistent NATS connections, do CAS writes against KV, and react to state changes. Whatever desired-state mechanism we add **extends** that foundation rather than replacing it.
### Design forces
- **Coherence with the rest of Harmony.** Harmony's entire identity is Score-Topology-Interpret with compile-time safety. A desired-state mechanism that reintroduces stringly-typed, runtime-validated blobs on the edge would be a regression from our own design rules (see `CLAUDE.md`: "Capabilities are industry concepts, not tools", "Scores encapsulate operational complexity", "Scores must be idempotent").
- **The "mini-kubelet" framing.** The team is converging on a mental model where the agent is a stripped-down kubelet: it owns a set of local reconcilers, maintains a PLEG-like state machine per managed resource, and converges toward a declarative manifest. ADR-017-3 is already explicitly Kubernetes-inspired for staleness detection. This framing should inform the desired-state design, not fight it.
- **Speed to IoT MVP.** We need something shippable soon enough that real Pi fleets can be demoed. Over-engineering the v1 risks never shipping; under-engineering it risks a rewrite once the wrong abstraction is entrenched on hundreds of devices in the field.
- **Security.** Whatever lands on the device is, by construction, running with the agent's privileges. A mechanism that reduces to "run this shell string as root" is a very wide blast radius.
- **Serializability.** Today, Harmony Scores are *not* uniformly serializable across the wire — many hold trait objects, closures, or references to live topologies. Any design that assumes "just send a Score" needs to confront this.
## Initial Proposal (Alternative A — Shell Command Executor)
This is the first-pass design, implemented as a happy-path scaffold on this branch. **It is presented here for critique, not as a settled decision.**
### Desired-State Model
Each agent watches a NATS KV key `desired-state.<agent-id>` for its workload definition. When the value changes, the agent executes the workload and reports the result to `actual-state.<agent-id>`. This is a pull-based convergence loop: the control plane writes intent, the agent converges, the control plane reads the result.
A `DesiredState` is a serializable description of what should be running on the device. For this first iteration, it is a shell command plus a monotonic generation counter.
```rust
enum DeploymentConfig {
FailoverPostgreSQL(FailoverCNPGConfig), // existing
DesiredState(DesiredStateConfig), // new
}
struct DesiredStateConfig {
command: String,
generation: u64,
}
```
### Config Flow
```
Central Platform NATS JetStream Agent (Pi)
================ ============== ==========
1. Write desired state -------> KV: desired-state.<agent-id>
2. Watch detects change
3. Execute workload
4. Write result --------> KV: actual-state.<agent-id>
5. Read actual state <------- KV: actual-state.<agent-id>
```
The agent's heartbeat loop continues independently. The desired-state watcher runs as a separate async task, sharing the same NATS connection. This separation means a slow command execution does not block heartbeats.
### State Reporting
```rust
struct ActualState {
agent_id: Id,
generation: u64, // mirrors the desired-state generation
status: ExecutionStatus, // Success, Failed, Running
stdout: String,
stderr: String,
exit_code: Option<i32>,
executed_at: u64,
}
```
The control plane reads this key to determine convergence. If `actual_state.generation == desired_state.generation` and `status == Success`, the device has converged.
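The convergence check is a pure comparison and can be sketched in a few lines. This is an illustration using trimmed-down versions of the types sketched above (field sets reduced for brevity; not the actual agent code):

```rust
// Simplified stand-ins for the ADR's sketched types.
#[derive(Debug, PartialEq)]
enum ExecutionStatus {
    Success,
    Failed,
    Running,
}

struct DesiredStateConfig {
    command: String,
    generation: u64,
}

struct ActualState {
    generation: u64,
    status: ExecutionStatus,
}

/// The device has converged when the reported generation matches the
/// desired one and the last execution succeeded.
fn has_converged(desired: &DesiredStateConfig, actual: &ActualState) -> bool {
    actual.generation == desired.generation && actual.status == ExecutionStatus::Success
}

fn main() {
    let desired = DesiredStateConfig { command: "true".into(), generation: 3 };
    let actual = ActualState { generation: 3, status: ExecutionStatus::Success };
    assert!(has_converged(&desired, &actual));

    // A stale generation means the agent has not yet processed the latest intent.
    let stale = ActualState { generation: 2, status: ExecutionStatus::Success };
    assert!(!has_converged(&desired, &stale));
}
```

Note that this check cannot distinguish "converged" from "ran once and exited 0" — which is exactly concern #5 below.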
### Why this shape was chosen first
- Dirt cheap to implement (≈200 lines, done on this branch).
- Works for literally any task a human would type into a Pi shell.
- Reuses the existing NATS KV infrastructure and CAS write idiom already proven by the heartbeat loop.
- Provides an end-to-end demo path in under a day.
## Open Questions and Concerns
The following concerns block promoting this to an "Accepted" decision:
1. **Wrong abstraction level.** `sh -c "<string>"` is the *opposite* of what Harmony stands for. Harmony exists because IaC tools drown in stringly-typed, runtime-validated config. Shipping arbitrary shell to the edge recreates that problem inside our own agent — at the worst possible place (the device).
2. **No idempotency.** `systemctl start foo` and `apt install foo` are not idempotent by themselves. Every Score in Harmony is required to be idempotent. A shell executor pushes that burden onto whoever writes the commands, where we cannot check it.
3. **No resource model.** There is no notion of "this manifest owns this systemd unit". When desired state changes, we cannot compute a diff, we cannot garbage-collect the old resource, and we cannot surface "drift" meaningfully. We know generation N was "run"; we do not know what it left behind.
4. **No typed status.** `stdout`/`stderr`/`exit_code` is not enough to drive a fleet dashboard. We want typed `Status { container: Running { since, restarts }, unit: Active, file: PresentAt(sha256) }`.
5. **No lifecycle.** Shell commands are fire-and-forget. A kubelet-shaped agent needs to know whether a resource is *still* healthy after it was created — liveness and readiness are first-class concerns, not a post-hoc `exit_code` check.
6. **Security.** The ADR hand-waves "NATS ACLs + future signing". In practice, v1 lets anyone with write access to the KV bucket execute anything as the agent user. Even with NATS ACLs, the *shape* of the API invites abuse; a typed manifest with an allowlist of resource types has a much narrower attack surface by construction.
7. **Generational model is too coarse.** A single `generation: u64` per agent means we can only describe one monolithic "job". Real fleet state is a *set* of resources (this container, this unit, this file). We need per-resource generations, or a manifest-level generation with a sub-resource status map.
8. **Incoherent with ADR-017-3's kubelet framing.** That ADR deliberately borrowed K8s vocabulary (staleness, fencing, leader promotion) because kubelet-like semantics are the right ones for resilient edge workloads. Shell-exec abandons that lineage at the first opportunity.
9. **Coherence with the Score-Topology-Interpret pattern.** Today's proposal introduces a parallel concept ("DesiredStateConfig") that has nothing to do with Score or Topology. If a Pi is just "a topology with a small capability set" (systemd, podman, files, network), then the right thing to ship is a Score, not a shell string.
## Status of the Implementation on this Branch
The happy-path code in `harmony_agent/src/desired_state.rs` (≈250 lines, fully tested) implements Alternative A. It is **scaffolding**, not a committed design:
- It is useful as a vehicle to prove out the NATS KV watch + typed `ActualState` CAS write pattern, both of which are reusable regardless of which alternative we pick.
- It should **not** be wired into user-facing tooling until the architectural decision in ADR-022 is made.
- If we adopt Alternative B (mini-kubelet) or C (embedded Scores), the shell executor either becomes one *variant* of a typed `Resource` enum (a `ShellJob` resource, clearly labeled as an escape hatch) or is deleted outright.
## Next Steps
1. Review ADR-022 (alternatives + recommendation).
2. Pick a target design.
3. Either:
- Rework `desired_state.rs` to match the chosen target, **or**
- Keep it behind a feature flag as a demo fallback while the real design is built.
4. Re-file this ADR as "Superseded by ADR-022" or update it in place with the accepted design.


@@ -1,218 +0,0 @@
# ADR-022: Agent Desired-State — Alternatives and Recommendation
**Status:** Proposed
**Date:** 2026-04-09
**Supersedes (candidate):** ADR-021 shell-executor proposal
## Context
ADR-021 drafted a first-pass "desired-state convergence" mechanism for the Harmony Agent (ADR-016) in the form of a shell-command executor. On review, that shape raised serious concerns (see ADR-021 §"Open Questions and Concerns"): it is incoherent with Harmony's Score-Topology-Interpret pattern, it is not idempotent, it has no resource model, no typed status, no lifecycle, and it weakens the agent's security posture.
Separately, the team has been converging on a **"mini-kubelet" framing** for the IoT agent:
- The agent owns a small, fixed set of *reconcilers*, one per resource type it can manage (systemd unit, container, file, network interface, overlay config...).
- The desired state is a *typed manifest* — a bag of resources with identities, generations, and typed status.
- The agent runs reconcile loops similar to kubelet's Pod Lifecycle Event Generator (PLEG): for each managed resource, observe actual, compare to desired, apply the minimum delta, update typed status.
- Failure and drift are first-class. "I tried, it failed, here is why" is a valid steady state.
ADR-017-3 already borrows Kubernetes vocabulary (staleness, fencing, promotion) on purpose. Doubling down on the kubelet metaphor at the desired-state layer is the natural continuation, not a tangent.
This ADR enumerates the candidate designs, argues their tradeoffs honestly, and recommends a path.
## Alternatives
### Alternative A — Shell Command Executor (ADR-021 as-is)
**Shape:** `DesiredState { command: String, generation: u64 }`, agent does `sh -c $command`, pipes stdout/stderr/exit into `ActualState`.
**Pros:**
- Trivial to implement. ~200 LOC, already on this branch.
- Works for any task that can be expressed as a shell pipeline — maximum flexibility at v1.
- Zero new abstractions: reuses existing NATS KV watch + CAS patterns.
- End-to-end demo-able in an afternoon.
**Cons:**
- **Wrong abstraction level.** Harmony's entire thesis is "no more stringly-typed YAML/shell mud pits". This design ships that mud pit *to the edge*.
- **Not idempotent.** The burden of idempotency falls on whoever writes the command string. `systemctl start foo` run twice is fine; `apt install foo && echo "done" >> /etc/state` run twice is broken. We cannot enforce correctness.
- **No resource model.** No concept of "this manifest owns X". No diffing, no GC, no drift detection, no "what does this agent currently run?".
- **No typed status.** stdout/stderr/exit_code does not tell a fleet dashboard "container nginx is running, restarted 3 times, last healthy 2s ago". It tells it "this bash ran and exited 0 once, three minutes ago".
- **No lifecycle.** Fire-and-forget; post-exit the agent has no notion of whether the resource is still healthy.
- **Security.** Even with NATS ACLs, the API's *shape* invites abuse. Any bug in the control plane that lets a user influence a desired-state write equals RCE on every Pi.
- **Incoherent with ADR-017-3 and Score-Topology-Interpret.** Introduces a parallel concept that has nothing to do with the rest of Harmony.
**Verdict:** Acceptable only as a *named escape hatch* inside a richer design (a `ShellJob` resource variant, explicitly labeled as such and audited). Not acceptable as the whole design.
---
### Alternative B — Mini-Kubelet with Typed Resource Manifests
**Shape:** The agent owns a fixed set of `Resource` variants and one reconciler per variant.
```rust
/// The unit of desired state shipped to an agent.
/// Serialized to JSON, pushed via NATS KV to `desired-state.<agent-id>`.
struct AgentManifest {
generation: u64, // monotonic, control-plane assigned
resources: Vec<ManagedResource>,
}
struct ManagedResource {
/// Stable, manifest-unique identity. Used for diffing across generations.
id: ResourceId,
spec: ResourceSpec,
}
enum ResourceSpec {
SystemdUnit(SystemdUnitSpec), // ensure unit exists, enabled, active
Container(ContainerSpec), // podman/docker run with image, env, volumes
File(FileSpec), // path, mode, owner, content (hash or inline)
NetworkConfig(NetworkConfigSpec), // interface, addresses, routes
ShellJob(ShellJobSpec), // explicit escape hatch, audited separately
// ...extend carefully
}
/// What the agent reports back.
struct AgentStatus {
manifest_generation: u64, // which desired-state gen this reflects
observed_generation: u64, // highest gen the agent has *processed*
resources: HashMap<ResourceId, ResourceStatus>,
conditions: Vec<AgentCondition>, // Ready, Degraded, Reconciling, ...
}
enum ResourceStatus {
Pending,
Reconciling { since: Timestamp },
Ready { since: Timestamp, details: ResourceReadyDetails },
Failed { since: Timestamp, error: String, retry_after: Option<Timestamp> },
}
```
Each reconciler implements a small trait:
```rust
trait Reconciler {
type Spec;
type Status;
async fn observe(&self, id: &ResourceId) -> Result<Self::Status>;
async fn reconcile(&self, id: &ResourceId, spec: &Self::Spec) -> Result<Self::Status>;
async fn delete(&self, id: &ResourceId) -> Result<()>;
}
```
The agent loop becomes:
1. Watch `desired-state.<agent-id>` for the latest `AgentManifest`.
2. On change, compute diff vs. observed set: additions, updates, deletions.
3. Dispatch each resource to its reconciler. Reconcilers are idempotent by contract.
4. Aggregate per-resource status into `AgentStatus`, write to `actual-state.<agent-id>` via CAS.
5. Re-run periodically to detect drift even when desired state has not changed (PLEG-equivalent).
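Step 2's diff is ordinary set arithmetic over resource identities. A minimal sketch, assuming `ResourceId` is hashable and specs are comparable (type aliases are placeholders, not the real `ResourceSpec` enum):

```rust
use std::collections::HashMap;

// Placeholder aliases; the real types would be the ADR's ResourceId / ResourceSpec.
type ResourceId = String;
type ResourceSpec = String;

struct Diff {
    to_create: Vec<ResourceId>,
    to_update: Vec<ResourceId>,
    to_delete: Vec<ResourceId>,
}

/// Compare the incoming manifest against the currently observed set:
/// ids only in `desired` are created, ids present in both with a changed
/// spec are updated, ids only in `observed` are garbage-collected.
fn diff(
    desired: &HashMap<ResourceId, ResourceSpec>,
    observed: &HashMap<ResourceId, ResourceSpec>,
) -> Diff {
    let mut d = Diff { to_create: vec![], to_update: vec![], to_delete: vec![] };
    for (id, spec) in desired {
        match observed.get(id) {
            None => d.to_create.push(id.clone()),
            Some(old) if old != spec => d.to_update.push(id.clone()),
            _ => {} // unchanged: nothing to do
        }
    }
    for id in observed.keys() {
        if !desired.contains_key(id) {
            d.to_delete.push(id.clone());
        }
    }
    d
}

fn main() {
    let desired: HashMap<ResourceId, ResourceSpec> = HashMap::from([
        ("nginx".to_string(), "v2".to_string()),
        ("ntp".to_string(), "v1".to_string()),
    ]);
    let observed: HashMap<ResourceId, ResourceSpec> = HashMap::from([
        ("nginx".to_string(), "v1".to_string()),
        ("old-job".to_string(), "v1".to_string()),
    ]);
    let d = diff(&desired, &observed);
    assert_eq!(d.to_create, vec!["ntp".to_string()]);
    assert_eq!(d.to_update, vec!["nginx".to_string()]);
    assert_eq!(d.to_delete, vec!["old-job".to_string()]);
}
```

This is the piece Alternative A structurally cannot have: without stable resource identities there is nothing to diff against.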
**Pros:**
- **Declarative and idempotent by construction.** Reconcilers are required to be idempotent; the contract is enforced in Rust traits, not in docs.
- **Typed status.** Dashboards, alerts, and the control plane get structured data.
- **Drift detection.** Periodic re-observation catches "someone SSH'd in and stopped the service".
- **Lifecycle.** Each resource has a clear state machine; health is a first-class concept.
- **Coherent with ADR-017-3.** The kubelet framing becomes literal, not metaphorical.
- **Narrow attack surface.** The agent only knows how to do a handful of well-audited things. Adding a new capability is an explicit code change, not a new shell string.
- **Composable with Harmony's existing philosophy.** `ManagedResource` is to the agent what a Score is to a Topology, at a smaller scale.
**Cons:**
- More upfront design. Each reconciler needs to be written and tested.
- Requires us to *commit* to a resource type set and its status schema. Adding a new kind is a versioned change to the wire format.
- Duplicates, at the edge, some of the vocabulary already present in Harmony's Score layer (e.g., `FileDeployment`, container deployments). Risk of two parallel abstractions evolving in tension.
- Harder to demo in a single afternoon.
**Verdict:** Strong candidate. Matches the team's mini-kubelet intuition directly.
---
### Alternative C — Embedded Score Interpreter on the Agent
**Shape:** The desired state *is* a Harmony Score (or a set of Scores), serialized and pushed via NATS. The agent hosts a local `PiTopology` that exposes a small, carefully chosen set of capabilities (`SystemdHost`, `ContainerRuntime`, `FileSystemHost`, `NetworkConfigurator`, ...). The agent runs the Score's `interpret` against that local topology.
```rust
// On the control plane:
let score = SystemdServiceScore { ... };
let wire: SerializedScore = score.to_wire()?;
nats.put(format!("desired-state.{agent_id}"), wire).await?;
// On the agent:
let score = SerializedScore::decode(payload)?;
let topology = PiTopology::new();
let outcome = score.interpret(&inventory, &topology).await?;
// Outcome is already a typed Harmony result (SUCCESS/NOOP/FAILURE/RUNNING/...).
```
**Pros:**
- **Zero new abstractions.** The agent becomes "a Harmony executor that happens to run on a Pi". Everything we already know how to do in Harmony works, for free.
- **Maximum coherence.** There is exactly one way to describe desired state in the whole system: a Score. The type system enforces that a score requesting `K8sclient` cannot be shipped to a Pi topology that does not offer it — at compile time on the control plane, at deserialization time on the agent.
- **Composability.** Higher-order topologies (ADR-015) work unchanged: `FailoverTopology<PiTopology>` gets you HA at the edge for free.
- **Single mental model for the whole team.** "Write a Score" is already the Harmony primitive; no one needs to learn a second one.
**Cons:**
- **Serializability.** This is the hard one. Harmony Scores today hold trait objects, references to live topology state, and embedded closures in places. Making them uniformly serde-serializable is a non-trivial refactor that touches dozens of modules. We would be gating the IoT MVP on a cross-cutting refactor.
- **Agent binary size.** If "the agent can run any Score", it links every module. On a Pi Zero 2 W, that matters. We can mitigate with feature flags, but then we are back to "which scores does *this* agent support?" — i.e., we have reinvented resource-type registration, just spelled differently.
- **Capability scoping is subtle.** We have to be extremely careful about which capabilities `PiTopology` exposes. "A Pi can run containers" is true; "a Pi can run arbitrary k8s clusters" is not. Getting that boundary wrong opens the same attack surface as Alternative A, just hidden behind a Score.
- **Control-plane UX.** The central platform now needs to instantiate Scores for specific Pis, handle their inventories, and ship them. That is heavier than "push a JSON blob".
**Verdict:** The principled end state, almost certainly where we want to be in 18 months. Not shippable for the IoT MVP.
---
### Alternative D — Hybrid: Typed Manifests Now, Scores Later
**Shape:** Ship Alternative B (typed `AgentManifest` with a fixed set of reconcilers). Keep the Score ambition (Alternative C) as an explicit roadmap item. When Scores become uniformly wire-serializable and `PiTopology` is mature, migrate by adding a `ResourceSpec::Score(SerializedScore)` variant. Eventually that variant may subsume the others.
**Pros:**
- **Shippable soon.** Alternative B is the implementable core; we can have a fleet demo in weeks, not months.
- **On a path to the ideal.** We do not dead-end. The `ResourceSpec` enum becomes the migration seam.
- **De-risks the Score serialization refactor.** We learn what resource types we *actually* need on the edge before we refactor the Score layer.
- **Lets us delete Alternative A cleanly.** The shell executor either disappears or survives as a narrow, explicitly-audited `ResourceSpec::ShellJob` variant that documents itself as an escape hatch.
**Cons:**
- Temporarily maintains two vocabularies (`ResourceSpec` at the edge, `Score` in the core). There is a risk they drift before they reconverge.
- Requires team discipline to actually do the C migration and not leave B as the permanent design.
**Verdict:** Recommended.
---
## Recommendation
**Adopt Alternative D (Hybrid: typed manifests now, Scores later).**
Reasoning:
1. **Speed to IoT MVP** is real. Alternative C is a 3-6 month refactor of the Score layer before we can deploy anything; Alternative B can ship within the current iteration.
2. **Long-term coherence with Harmony's design philosophy** is preserved because D has an explicit migration seam to C. We do not paint ourselves into a corner.
3. **The mini-kubelet framing is directly satisfied by B.** Typed resources, reconciler loops, observed-generation pattern, PLEG-style drift detection. This is exactly what the team has been describing.
4. **Capability-trait discipline carries over cleanly.** `Reconciler` is the agent-side analog of a capability trait (`DnsServer`, `K8sclient`, etc.). The rule "capabilities are industry concepts, not tools" applies to `ResourceSpec` too: we name it `Container`, not `Podman`; `SystemdUnit`, not `Systemctl`.
5. **The shell executor is not wasted work.** It proved the NATS KV watch + typed CAS write pattern that Alternative B will also need. It becomes either `ResourceSpec::ShellJob` (audited escape hatch) or gets deleted.
6. **Security posture improves immediately.** A fixed resource-type allowlist is dramatically tighter than "run any shell", even before we add signing or sandboxing.
7. **The IoT product use case actually is "deploy simple workloads to Pi fleets".** Containers, systemd services, config files, network config. That is a short list, and it maps to four or five resource types. We do not need the full expressive power of a Score layer to hit the product milestone.
## Specific Findings on the Current Implementation
`harmony_agent/src/desired_state.rs` (≈250 lines, implemented on this branch):
- **Keep as scaffolding**, do not wire into user tooling.
- The NATS KV watch loop, the `ActualState` CAS write, and the generation-tracking skeleton are all reusable by Alternative B. They are the only parts worth keeping.
- The `execute_command` function (shelling out via `Command::new("sh").arg("-c")`) is the part that bakes in the wrong abstraction. It should be:
1. **Moved behind a `ResourceSpec::ShellJob` reconciler** if we decide to keep shell as an explicit, audited escape hatch, **or**
2. **Deleted** when the first two real reconcilers (Container, SystemdUnit) land.
- The `DesiredStateConfig` / `ActualState` types in `harmony_agent/src/agent/config.rs` are too narrow. They should be replaced by `AgentManifest` / `AgentStatus` as sketched above. `generation: u64` at the manifest level stays; per-resource status is added.
- The existing tests (`executes_command_and_reports_result`, `reports_failure_for_bad_command`) are testing the shell executor specifically; they will be deleted or repurposed when the resource model lands.
## Open Questions (to resolve before implementing B)
1. **What is the minimum viable resource type set for the IoT MVP?** Proposal: `Container`, `SystemdUnit`, `File`. Defer `NetworkConfig`, `ShellJob` until a concrete use case appears.
2. **Where does `AgentManifest` live in the crate graph?** It is consumed by both the control plane and the agent. Likely `harmony_agent_types` (new) or an existing shared types crate.
3. **How are images, files, and secrets referenced?** By content hash + asset store URL (ADR: `harmony_assets`)? By inline payload under a size cap?
4. **What is the reconcile cadence?** On NATS KV change + periodic drift check every N seconds? What is N on a Pi?
5. **How does `AgentStatus` interact with the heartbeat loop?** Is the status written on every reconcile, or aggregated into the heartbeat payload? The heartbeat cares about liveness; the status cares about workload health. They are probably separate KV keys, coupled by generation.
6. **How do we handle partial failures and retry?** Exponential backoff per resource? Global pause on repeated failures? Surface to the control plane via `conditions`?
7. **Can the agent refuse a manifest it does not understand?** (Forward compatibility: new `ResourceSpec` variant rolled out before the agent upgrade.) Proposal: fail loudly and report a typed `UnknownResource` status so the control plane can detect version skew.
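Question 7's "fail loudly" behavior can be prototyped without committing to a wire format: decode the resource kind tag, and map unknown kinds to a typed variant instead of rejecting the whole manifest. A hypothetical sketch (variant names and the `decode` helper are illustrative, not an agreed schema):

```rust
#[derive(Debug, PartialEq)]
enum ResourceSpec {
    Container(String),
    SystemdUnit(String),
    /// Hypothetical forward-compat variant: keep the raw kind so the
    /// control plane can detect version skew from the reported status.
    Unknown { kind: String },
}

/// Map a wire-level kind tag to a spec; unrecognized kinds become
/// `Unknown` rather than a deserialization error.
fn decode(kind: &str, payload: &str) -> ResourceSpec {
    match kind {
        "container" => ResourceSpec::Container(payload.to_string()),
        "systemd-unit" => ResourceSpec::SystemdUnit(payload.to_string()),
        other => ResourceSpec::Unknown { kind: other.to_string() },
    }
}

fn main() {
    // An agent running old code receives a kind it has never seen:
    let spec = decode("wasm-job", "{}");
    assert_eq!(spec, ResourceSpec::Unknown { kind: "wasm-job".into() });
}
```

The reconciler dispatch would then report a typed `UnknownResource` status for such entries while still converging the kinds it does understand.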
## Decision
**None yet.** This ADR is explicitly a proposal to adopt **Alternative D**, pending team review. If approved, a follow-up ADR-023 will specify the concrete `AgentManifest` / `AgentStatus` schema and the initial reconciler set.


@@ -1,189 +0,0 @@
# Harmony Architecture — Three Open Challenges
Three problems that, if solved well, would make Harmony the most capable infrastructure automation framework in existence.
## 1. Topology Evolution During Deployment
### The problem
A bare-metal OKD deployment is a multi-hour process where the infrastructure's capabilities change as the deployment progresses:
```
Phase 0: Network only → OPNsense reachable, Brocade reachable, no hosts
Phase 1: Discovery → PXE boots work, hosts appear via mDNS, no k8s
Phase 2: Bootstrap → openshift-install running, API partially available
Phase 3: Control plane → k8s API available, operators converging, no workers
Phase 4: Workers → Full cluster, apps can be deployed
Phase 5: Day-2 → Monitoring, alerting, tenant onboarding
```
Today, `HAClusterTopology` implements _all_ capability traits from the start. If a Score calls `k8s_client()` during Phase 0, it hits `DummyInfra` which panics. The type system says "this is valid" but the runtime says "this will crash."
### Why it matters
- Scores that require k8s compile and register happily at Phase 0, then panic if accidentally executed too early
- The pipeline is ordered by convention (Stage 01 → 02 → 03 → ...) but nothing enforces that Stage 04 can't run before Stage 02
- Adding new capabilities (like "cluster has monitoring installed") requires editing the topology struct, not declaring the capability was acquired
### Design direction
The topology should evolve through **phases** where capabilities are _acquired_, not assumed. Two possible approaches:
**A. Phase-gated topology (runtime)**
The topology tracks which phase it's in. Capability methods check the phase before executing and return a meaningful error instead of panicking:
```rust
impl K8sclient for HAClusterTopology {
async fn k8s_client(&self) -> Result<Arc<K8sClient>, String> {
if self.phase < Phase::ControlPlaneReady {
            return Err(format!("k8s API not available yet (current phase: {:?})", self.phase));
}
// ... actual implementation
}
}
```
Scores that fail due to phase mismatch get a clear error message, not a panic. The Maestro can validate phase requirements before executing a Score.
**B. Typestate topology (compile-time)**
Use Rust's type system to make invalid phase transitions unrepresentable:
```rust
struct Topology<P: Phase> { ... }
impl Topology<NetworkReady> {
fn bootstrap(self) -> Topology<Bootstrapping> { ... }
}
impl Topology<Bootstrapping> {
fn promote(self) -> Topology<ClusterReady> { ... }
}
// Only ClusterReady implements K8sclient
impl K8sclient for Topology<ClusterReady> { ... }
```
This is the "correct" Rust approach but requires significant refactoring and may be too rigid for real deployments where phases overlap.
**Recommendation**: Start with (A) — runtime phase tracking. It's additive (no breaking changes), catches the DummyInfra panic problem immediately, and provides the data needed for (B) later.
---
## 2. Runtime Plan & Validation Phase
### The problem
Harmony validates Scores at compile time: if a Score requires `DhcpServer + TftpServer`, the topology must implement both traits or the program won't compile. This is powerful but insufficient.
What compile-time _cannot_ check:
- Is the OPNsense API actually reachable right now?
- Does VLAN 100 already exist (so we can skip creating it)?
- Is there already a DHCP entry for this MAC address?
- Will this firewall rule conflict with an existing one?
- Is there enough disk space on the TFTP server for the boot images?
Today, these are discovered at execution time, deep inside an Interpret's `execute()` method. A failure at minute 45 of a deployment is expensive.
### Why it matters
- No way to preview what Harmony will do before it does it
- No way to detect conflicts or precondition failures early
- Operators must read logs to understand what happened — there's no structured "here's what I did" report
- Re-running a deployment is scary because you don't know what will be re-applied vs skipped
### Design direction
Add a **validate** phase to the Score/Interpret lifecycle:
```rust
#[async_trait]
pub trait Interpret<T>: Debug + Send {
/// Check preconditions and return what this interpret WOULD do.
/// Default implementation returns "will execute" (opt-in validation).
async fn validate(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<ValidationReport, InterpretError> {
Ok(ValidationReport::will_execute(self.get_name()))
}
/// Execute the interpret (existing method, unchanged).
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError>;
// ... existing methods
}
```
A `ValidationReport` would contain:
- **Status**: `WillCreate`, `WillUpdate`, `WillDelete`, `AlreadyApplied`, `Blocked(reason)`
- **Details**: human-readable description of planned changes
- **Preconditions**: list of checks performed and their results
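The bullets above map naturally onto an enum plus a struct. A hypothetical shape, not a committed API (the `WillExecute` variant is an assumption added here to model the opt-in default; the doc only names the other five statuses):

```rust
/// What an interpret predicts it would do.
#[derive(Debug, PartialEq)]
enum ValidationStatus {
    WillCreate,
    WillUpdate,
    WillDelete,
    AlreadyApplied,
    Blocked(String), // reason
    /// Assumed variant: the opt-in default when no validate() is implemented.
    WillExecute,
}

#[derive(Debug)]
struct PreconditionCheck {
    name: String,
    passed: bool,
}

#[derive(Debug)]
struct ValidationReport {
    interpret_name: String,
    status: ValidationStatus,
    details: String,
    preconditions: Vec<PreconditionCheck>,
}

impl ValidationReport {
    /// The default from the trait sketch: no validation logic, so report
    /// only that the interpret will run.
    fn will_execute(name: String) -> Self {
        ValidationReport {
            interpret_name: name,
            status: ValidationStatus::WillExecute,
            details: "no validate() implemented; will execute".into(),
            preconditions: vec![],
        }
    }
}

fn main() {
    let report = ValidationReport::will_execute("OPNsenseDhcpScore".into());
    assert_eq!(report.status, ValidationStatus::WillExecute);
    println!("{report:?}");
}
```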
The Maestro would run validation for all registered Scores before executing any of them, producing a plan that the operator reviews.
This is opt-in: Scores that don't implement `validate()` get a default "will execute" report. Over time, each Score adds validation logic. The OPNsense Scores are ideal first candidates since they can query current state via the API.
### Relationship to state
This approach does _not_ require a state file. Validation queries the infrastructure directly — the same philosophy Harmony already follows. The "plan" is computed fresh every time by asking the infrastructure what exists right now.
### Concrete use case: WebGuiConfigScore → LoadBalancerScore
`LoadBalancerScore` configures HAProxy to bind on port 443. But OPNsense's webgui defaults to port 443 — creating a port conflict. `WebGuiConfigScore` moves the webgui to 9443 first.
Today this is solved by ordering convention: `WebGuiConfigScore` is registered before `LoadBalancerScore` in the Score list. If someone reorders them, HAProxy silently fails to bind.
This is the simplest example of an implicit Score dependency that the current system cannot express or enforce. The `score_with_dep.rs` sketch explores declaring these dependencies at the type level, and Challenge #1 (phase-gated topology) would also help — a topology in "webgui on 443" phase could reject `LoadBalancerScore` at validation time.
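As a rough illustration of the phase-gated idea, assuming hypothetical `WebGuiOn443`/`WebGuiOn9443` marker types: consuming the topology on phase change turns the wrong Score ordering into a compile error instead of a silent bind failure.

```rust
use std::marker::PhantomData;

// Hypothetical phase markers; real Harmony types may differ.
struct WebGuiOn443;
struct WebGuiOn9443;

struct OpnsenseTopology<Phase> {
    _phase: PhantomData<Phase>,
}

impl OpnsenseTopology<WebGuiOn443> {
    fn new() -> Self {
        OpnsenseTopology { _phase: PhantomData }
    }

    // Consuming `self` means no code can keep using the 443-phase topology.
    fn apply_webgui_config(self) -> OpnsenseTopology<WebGuiOn9443> {
        // ... move the web UI to 9443 here ...
        OpnsenseTopology { _phase: PhantomData }
    }
}

// A LoadBalancerScore-like step only accepts a topology whose webgui has
// left port 443, so running it before WebGuiConfigScore no longer type-checks.
fn apply_load_balancer(_t: &OpnsenseTopology<WebGuiOn9443>) -> &'static str {
    "haproxy bound on 443"
}
```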
---
## 3. TUI as Primary Interface
### The problem
The TUI (`harmony_tui`) exists with ratatui, crossterm, and tui-logger, but it's underused. The CLI (`harmony_cli`) is the primary interface. During a multi-hour deployment, operators watch scrolling log output with no structure, no ability to drill into a specific Score's progress, and no overview of where they are in the pipeline.
### Why it matters
- Log output during interactive prompts corrupts the terminal
- No way to see "I'm on Stage 3 of 7, 2 hours elapsed, 3 Scores completed successfully"
- No way to inspect a Score's configuration or outcome without reading logs
- The pipeline feels like a black box during execution
### Design direction
The TUI should provide three views:
**Pipeline view** — the default. Shows the ordered list of Scores with their status:
```
OKD HA Cluster Deployment [Stage 3/7 — 1h 42m elapsed]
──────────────────────────────────────────────────────────────────
✅ OKDIpxeScore 2m 14s
✅ OKDSetup01InventoryScore 8m 03s
✅ OKDSetup02BootstrapScore 34m 21s
▶ OKDSetup03ControlPlaneScore ... running
⏳ OKDSetupPersistNetworkBondScore
⏳ OKDSetup04WorkersScore
⏳ OKDSetup06InstallationReportScore
```
**Detail view** — press Enter on a Score to see its Outcome details, sub-score executions, and logs.
**Log view** — the current tui-logger panel, filtered to the selected Score.
The TUI already has the Score widget and log integration. What's missing is the pipeline-level orchestration view and the duration/status data — which the recently added `Score::interpret` timing now provides.
### Immediate enablers
The instrumentation event system (`HarmonyEvent`) already captures start/finish with execution IDs. The TUI subscriber just needs to:
1. Track the ordered list of Scores from the Maestro
2. Update status as `InterpretExecutionStarted`/`Finished` events arrive
3. Render the pipeline view using ratatui
This doesn't require architectural changes — it's a TUI feature built on existing infrastructure.
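Steps 1 and 2 can be sketched with plain std types. `PipelineEvent` below is a simplified stand-in for the real `HarmonyEvent` variants, which also carry execution IDs and timestamps; step 3 (ratatui rendering) would read this state each frame.

```rust
// Simplified stand-ins for the instrumentation events.
enum PipelineEvent {
    InterpretExecutionStarted { score: String },
    InterpretExecutionFinished { score: String, success: bool },
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum ScoreStatus {
    Pending,
    Running,
    Succeeded,
    Failed,
}

// Per-Score status for the pipeline view, in Maestro registration order.
struct PipelineState {
    scores: Vec<(String, ScoreStatus)>,
}

impl PipelineState {
    fn new(names: &[&str]) -> Self {
        Self {
            scores: names
                .iter()
                .map(|n| (n.to_string(), ScoreStatus::Pending))
                .collect(),
        }
    }

    fn set(&mut self, name: &str, status: ScoreStatus) {
        if let Some(entry) = self.scores.iter_mut().find(|(n, _)| n == name) {
            entry.1 = status;
        }
    }

    fn on_event(&mut self, event: PipelineEvent) {
        match event {
            PipelineEvent::InterpretExecutionStarted { score } => {
                self.set(&score, ScoreStatus::Running)
            }
            PipelineEvent::InterpretExecutionFinished { score, success } => {
                let status = if success {
                    ScoreStatus::Succeeded
                } else {
                    ScoreStatus::Failed
                };
                self.set(&score, status)
            }
        }
    }
}
```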
@@ -156,56 +156,9 @@ impl<T: Topology + K8sclient> Interpret<T> for MyInterpret {
}
```
## Design Principles
### Capabilities are industry concepts, not tools
A capability trait must represent a **standard infrastructure need** that could be fulfilled by multiple tools. The developer who writes a Score should not need to know which product provides the capability.
Good capabilities: `DnsServer`, `LoadBalancer`, `DhcpServer`, `CertificateManagement`, `Router`
These are industry-standard concepts. OPNsense provides `DnsServer` via Unbound; a future topology could provide it via CoreDNS or AWS Route53. The Score doesn't care.
The one exception is when the developer fundamentally needs to know the implementation: `PostgreSQL` is a capability (not `Database`) because the developer writes PostgreSQL-specific SQL, replication configs, and connection strings. Swapping it for MariaDB would break the application, not just the infrastructure.
**Test:** If you could swap the underlying tool without breaking any Score that uses the capability, you've drawn the boundary correctly. If swapping would require rewriting Scores, the capability is too tool-specific.
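The boundary test can be made concrete with a toy sketch. These simplified types are illustrative, not Harmony's actual API: the Score-side function compiles unchanged against either provider because it only sees the capability trait.

```rust
// A capability trait representing an industry concept, not a tool.
trait DnsServer {
    fn add_a_record(&mut self, name: &str, ip: &str);
    fn record_count(&self) -> usize;
}

#[derive(Default)]
struct UnboundDns {
    records: Vec<(String, String)>,
}

#[derive(Default)]
struct CoreDns {
    records: Vec<(String, String)>,
}

impl DnsServer for UnboundDns {
    fn add_a_record(&mut self, name: &str, ip: &str) {
        self.records.push((name.to_string(), ip.to_string()));
    }
    fn record_count(&self) -> usize {
        self.records.len()
    }
}

impl DnsServer for CoreDns {
    fn add_a_record(&mut self, name: &str, ip: &str) {
        self.records.push((name.to_string(), ip.to_string()));
    }
    fn record_count(&self) -> usize {
        self.records.len()
    }
}

// A Score-like function: swapping the underlying tool cannot break it.
fn register_api_host<T: DnsServer>(dns: &mut T) {
    dns.add_a_record("api.example.com", "10.0.0.10");
}
```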
### One Score per concern, one capability per concern
A Score should express a single infrastructure intent. A capability should expose a single infrastructure concept.
If you're building a deployment that combines multiple concerns (e.g., "deploy Zitadel" requires PostgreSQL + Helm + K8s + Ingress), the Score **declares all of them as trait bounds** and the Topology provides them:
```rust
impl<T: Topology + K8sclient + HelmCommand + PostgreSQL> Score<T> for ZitadelScore
```
If you're building a tool that provides multiple capabilities (e.g., OpenBao provides secret storage, KV versioning, JWT auth, policy management), each capability should be a **separate trait** that can be implemented independently. This way, a Score that only needs secret storage doesn't pull in JWT auth machinery.
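A minimal sketch of that split, with illustrative trait names (`SecretStorage`, `JwtAuth`) standing in for whatever the real capability traits would be: one tool implements both traits independently, and a Score that only stores secrets bounds on `SecretStorage` alone.

```rust
use std::collections::HashMap;

// Each concern is its own capability trait.
trait SecretStorage {
    fn put(&mut self, key: &str, value: &str);
    fn get(&self, key: &str) -> Option<&str>;
}

trait JwtAuth {
    fn login_with_jwt(&self, jwt: &str) -> bool;
}

// One tool can provide both capabilities, implemented independently.
#[derive(Default)]
struct OpenBao {
    kv: HashMap<String, String>,
}

impl SecretStorage for OpenBao {
    fn put(&mut self, key: &str, value: &str) {
        self.kv.insert(key.to_string(), value.to_string());
    }
    fn get(&self, key: &str) -> Option<&str> {
        self.kv.get(key).map(|s| s.as_str())
    }
}

impl JwtAuth for OpenBao {
    fn login_with_jwt(&self, jwt: &str) -> bool {
        !jwt.is_empty() // placeholder check for the sketch
    }
}

// This Score-like function pulls in no JWT machinery at all.
fn store_db_password<T: SecretStorage>(store: &mut T) {
    store.put("db/password", "s3cret");
}
```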
### Scores encapsulate operational complexity
The value of a Score is turning tribal knowledge into compiled, type-checked infrastructure. The `ZitadelScore` knows that you need to create a namespace, deploy a PostgreSQL cluster via CNPG, wait for the cluster to be ready, create a masterkey secret, generate a secure admin password, detect the K8s distribution, build distribution-specific Helm values, and deploy the chart. A developer using it writes:
```rust
let zitadel = ZitadelScore { host: "sso.example.com".to_string(), ..Default::default() };
```
Move procedural complexity into opinionated Scores. This makes them easy to test against various topologies (k3d, OpenShift, kubeadm, bare metal) and easy to compose in high-level examples.
### Scores must be idempotent
Running a Score twice should produce the same result as running it once. Use create-or-update semantics, check for existing state before acting, and handle "already exists" responses gracefully.
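Create-or-update semantics can be sketched in a few lines; here an in-memory VLAN map stands in for real switch or firewall state queried over the API.

```rust
use std::collections::HashMap;

// Idempotent apply: the end state after two runs equals the end state
// after one. Returns true only when something actually changed.
fn ensure_vlan(state: &mut HashMap<u16, String>, tag: u16, name: &str) -> bool {
    match state.get(&tag) {
        Some(existing) if existing == name => false, // already applied, no-op
        _ => {
            state.insert(tag, name.to_string()); // create or update
            true
        }
    }
}
```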
### Scores must not depend on other Scores running first
A Score declares its capability requirements via trait bounds. It does **not** assume that another Score has run before it. If your Score needs PostgreSQL, it declares `T: PostgreSQL` and lets the Topology handle whether PostgreSQL needs to be installed first.
If you find yourself writing "run Score A, then run Score B", consider whether Score B should declare the capability that Score A provides, or whether both should be orchestrated by a higher-level Score that composes them.
## Best Practices
- **Keep Scores focused** — one Score per concern (deployment, monitoring, networking)
- **Use `..Default::default()`** for optional fields so callers only need to specify what they care about
- **Return `Outcome`** — use `Outcome::success`, `Outcome::failure`, or `Outcome::success_with_details` to communicate results clearly
- **Handle errors gracefully** — return meaningful `InterpretError` messages that help operators debug issues
- **Design capabilities around the developer's need** — not around the tool that fulfills it. Ask: "what is the core need that leads a developer to use this tool?"
- **Don't name capabilities after tools** — `SecretVault` not `OpenbaoStore`, `IdentityProvider` not `ZitadelAuth`
@@ -4,13 +4,9 @@ Real-world scenarios demonstrating Harmony in action.
## Available Use Cases
### [OPNsense VM Integration](./opnsense-vm-integration.md)
Boot a real OPNsense firewall in a local KVM VM and configure it entirely through Harmony — load balancer, DHCP, TFTP, VLANs, firewall rules, NAT, VIPs, and link aggregation. Fully automated, zero manual steps. The best way to see Harmony in action.
### [PostgreSQL on Local K3D](./postgresql-on-local-k3d.md)
Deploy a fully functional PostgreSQL cluster on a local K3D cluster in under 10 minutes. The quickest way to see Harmony's Kubernetes capabilities.
Deploy a fully functional PostgreSQL cluster on a local K3D cluster in under 10 minutes. The quickest way to see Harmony in action.
### [OKD on Bare Metal](./okd-on-bare-metal.md)
@@ -1,234 +0,0 @@
# Use Case: OPNsense VM Integration
Boot a real OPNsense firewall in a local KVM virtual machine and configure it entirely through Harmony — load balancer, DHCP, TFTP, VLANs, firewall rules, NAT, VIPs, and link aggregation. Fully automated, zero manual steps, CI-friendly.
This is the best way to discover Harmony: you'll see 11 different Scores configure a production firewall through type-safe Rust code and the OPNsense REST API.
## What you'll have at the end
A local OPNsense VM fully configured by Harmony with:
- HAProxy load balancer with health-checked backends
- DHCP server with static host bindings and PXE boot options
- TFTP server serving boot files
- Prometheus node exporter enabled
- 2 VLANs on the LAN interface
- Firewall filter rules, outbound NAT, and bidirectional NAT
- Virtual IPs (IP aliases)
- Port forwarding (DNAT) rules
- LAGG interface (link aggregation)
All applied idempotently through the OPNsense REST API — the same Scores used in production bare-metal deployments.
## Prerequisites
- **Linux** with KVM support (Intel VT-x/AMD-V enabled in BIOS)
- **libvirt + QEMU** installed and running (`libvirtd` service active)
- **~10 GB** free disk space
- **~15 minutes** for the first run (image download + OPNsense firmware update)
- Docker running (if installed — the setup handles compatibility)
Supported distributions: Arch, Manjaro, Fedora, Ubuntu, Debian.
## Quick start (single command)
```bash
# One-time: install libvirt and configure permissions
./examples/opnsense_vm_integration/setup-libvirt.sh
newgrp libvirt
# Verify
cargo run -p opnsense-vm-integration -- --check
# Boot + bootstrap + run all 11 Scores (fully unattended)
cargo run -p opnsense-vm-integration -- --full
```
That's it. No browser clicks, no manual SSH configuration, no wizard interaction.
## What happens step by step
### Phase 1: Boot the VM
Downloads the OPNsense 26.1 nano image (~350 MB, cached after first run), injects a `config.xml` with virtio NIC assignments, creates a 4 GiB qcow2 disk, and boots the VM with 4 NICs:
```
vtnet0 = LAN (192.168.1.1/24) -- management
vtnet1 = WAN (DHCP) -- internet access
vtnet2 = LAGG member 1 -- for aggregation test
vtnet3 = LAGG member 2 -- for aggregation test
```
### Phase 2: Automated bootstrap
Once the web UI responds (~20 seconds after boot), `OPNsenseBootstrap` takes over:
1. **Logs in** to the web UI (root/opnsense) with automatic CSRF token handling
2. **Aborts the initial setup wizard** via the OPNsense API
3. **Enables SSH** with root login and password authentication
4. **Changes the web GUI port** to 9443 (prevents HAProxy conflicts on standard ports)
5. **Restarts lighttpd** via SSH to apply the port change
No browser, no Playwright, no expect scripts — just HTTP requests with session cookies and SSH commands.
### Phase 3: Run 11 Scores
Creates an API key via SSH, then configures the entire firewall:
| # | Score | What it configures |
|---|-------|--------------------|
| 1 | `LoadBalancerScore` | HAProxy with 2 frontends (ports 16443 and 18443), backends with health checks |
| 2 | `DhcpScore` | DHCP range, 2 static host bindings (MAC-to-IP), PXE boot options |
| 3 | `TftpScore` | TFTP server serving PXE boot files |
| 4 | `NodeExporterScore` | Prometheus node exporter on OPNsense |
| 5 | `VlanScore` | 2 test VLANs (tags 100 and 200) on vtnet0 |
| 6 | `FirewallRuleScore` | Firewall filter rules (allow/block with logging) |
| 7 | `OutboundNatScore` | Source NAT rule for outbound traffic |
| 8 | `BinatScore` | Bidirectional 1:1 NAT |
| 9 | `VipScore` | Virtual IPs (IP aliases for CARP/HA) |
| 10 | `DnatScore` | Port forwarding rules |
| 11 | `LaggScore` | Link aggregation group (failover on vtnet2+vtnet3) |
Each Score reports its status:
```
[LoadBalancerScore] SUCCESS in 2.2s -- Load balancer configured 2 services
[DhcpScore] SUCCESS in 1.4s -- Dhcp Interpret execution successful
[VlanScore] SUCCESS in 0.2s -- Configured 2 VLANs
...
PASSED -- All OPNsense integration tests successful
```
### Phase 4: Verify
After all Scores run, the integration test verifies each configuration via the REST API:
- HAProxy has 2+ frontends
- Dnsmasq has 2+ static hosts and a DHCP range
- TFTP is enabled
- Node exporter is enabled
- 2+ VLANs exist
- Firewall filter rules are present
- VIPs, DNAT, BINAT, SNAT rules are configured
- LAGG interface exists
## Explore in the web UI
After the test completes, open https://192.168.1.1:9443 (login: root/opnsense) and explore:
- **Services > HAProxy > Settings** -- frontends, backends, servers with health checks
- **Services > Dnsmasq DNS > Settings** -- host overrides (static DHCP entries)
- **Services > TFTP** -- enabled with uploaded files
- **Interfaces > Other Types > VLAN** -- two tagged VLANs
- **Firewall > Automation > Filter** -- filter rules created by Harmony
- **Firewall > NAT > Port Forward** -- DNAT rules
- **Firewall > NAT > Outbound** -- SNAT rules
- **Firewall > NAT > One-to-One** -- BINAT rules
- **Interfaces > Virtual IPs > Settings** -- IP aliases
- **Interfaces > Other Types > LAGG** -- link aggregation group
## Clean up
```bash
cargo run -p opnsense-vm-integration -- --clean
```
Destroys the VM and virtual networks. The cached OPNsense image is kept for next time.
## How it works
### Architecture
```
Your workstation OPNsense VM (KVM)
+--------------------+ +---------------------+
| Harmony | | OPNsense 26.1 |
| +---------------+ | REST API | +---------------+ |
| | OPNsense |----(HTTPS:9443)---->| | API + Plugins | |
| | Scores | | | +---------------+ |
| +---------------+ | SSH | +---------------+ |
| +---------------+ |----(port 22)----->| | FreeBSD Shell | |
| | OPNsense- | | | +---------------+ |
| | Bootstrap | | HTTP session | |
| +---------------+ |----(HTTPS:443)--->| (first-boot only) |
| +---------------+ | | |
| | opnsense- | | | LAN: 192.168.1.1 |
| | config | | | WAN: DHCP |
| +---------------+ | +---------------------+
+--------------------+
```
The stack has four layers:
1. **`opnsense-api`** -- auto-generated typed Rust client from OPNsense XML model files
2. **`opnsense-config`** -- high-level configuration modules (DHCP, firewall, load balancer, etc.)
3. **`OPNsenseBootstrap`** -- first-boot automation via HTTP session auth (login, wizard, SSH, webgui port)
4. **Harmony Scores** -- declarative desired-state descriptions that make the firewall match
### The Score pattern
```rust
// 1. Declare desired state
let score = VlanScore {
vlans: vec![
VlanDef { parent: "vtnet0", tag: 100, description: "management" },
VlanDef { parent: "vtnet0", tag: 200, description: "storage" },
],
};
// 2. Execute against topology -- queries current state, applies diff
score.interpret(&inventory, &topology).await?;
// Output: [VlanScore] SUCCESS in 0.9s -- Created 2 VLANs
```
Scores are idempotent: running the same Score twice produces the same result.
## Network architecture
```
Host (192.168.1.10) --- virbr-opn bridge --- OPNsense LAN (192.168.1.1)
192.168.1.0/24 vtnet0
NAT to internet
--- virbr0 (default) --- OPNsense WAN (DHCP)
192.168.122.0/24 vtnet1
NAT to internet
```
## Available commands
| Command | Description |
|---------|-------------|
| `--check` | Verify prerequisites (libvirtd, virsh, qemu-img) |
| `--download` | Download the OPNsense image (cached) |
| `--boot` | Create VM + automated bootstrap |
| (default) | Run integration test (assumes VM is bootstrapped) |
| `--full` | Boot + bootstrap + integration test (CI mode) |
| `--status` | Show VM state, ports, and connectivity |
| `--clean` | Destroy VM and networks |
## Environment variables
| Variable | Default | Description |
|----------|---------|-------------|
| `RUST_LOG` | (unset) | Log level: `info`, `debug`, `trace` |
| `HARMONY_KVM_URI` | `qemu:///system` | Libvirt connection URI |
| `HARMONY_KVM_IMAGE_DIR` | `~/.local/share/harmony/kvm/images` | Cached disk images |
## Troubleshooting
**VM won't start / permission denied**
Ensure your user is in the `libvirt` group and that the image directory is traversable by the qemu user. Run `setup-libvirt.sh` to fix.
**192.168.1.0/24 conflict**
If your host network already uses this subnet, the VM will be unreachable. Edit the constants in `src/main.rs` to use a different subnet.
**Web GUI didn't come up after bootstrap**
The bootstrap runs `diagnose_via_ssh()` automatically when the web UI doesn't respond. Check the diagnostic output for lighttpd status and listening ports. You can also access the serial console: `virsh -c qemu:///system console opn-integration`
**HAProxy install fails**
OPNsense may need a firmware update. The integration test handles this automatically but it may take a few minutes for the update + reboot cycle.
## What's next
- **[OPNsense Firewall Pair](../../examples/opnsense_pair_integration/README.md)** -- boot two VMs, configure CARP HA failover with `FirewallPairTopology` and `CarpVipScore`. Uses NIC link control to bootstrap both VMs sequentially despite sharing the same default IP.
- [OKD on Bare Metal](./okd-on-bare-metal.md) -- the full 7-stage OKD installation pipeline using OPNsense as the infrastructure backbone
- [PostgreSQL on Local K3D](./postgresql-on-local-k3d.md) -- a simpler starting point using Kubernetes
@@ -18,8 +18,6 @@ This directory contains runnable examples demonstrating Harmony's capabilities.
| `remove_rook_osd` | Remove a Rook OSD | — | ✅ | Rook/Ceph |
| `brocade_snmp_server` | Configure Brocade switch SNMP | — | ✅ | Brocade switch |
| `opnsense_node_exporter` | Node exporter on OPNsense | — | ✅ | OPNsense firewall |
| `opnsense_vm_integration` | Full OPNsense firewall automation (11 Scores) | ✅ | — | KVM/libvirt |
| `opnsense_pair_integration` | OPNsense HA pair with CARP failover | ✅ | — | KVM/libvirt |
| `okd_pxe` | PXE boot configuration for OKD | — | — | ✅ |
| `okd_installation` | Full OKD bare-metal install | — | — | ✅ |
| `okd_cluster_alerts` | OKD cluster monitoring alerts | — | ✅ | OKD cluster |
@@ -77,8 +75,6 @@ This directory contains runnable examples demonstrating Harmony's capabilities.
- **`application_monitoring_with_tenant`** — App monitoring with tenant isolation
### Infrastructure & Bare Metal
- **`opnsense_vm_integration`** — **Recommended demo.** Boot an OPNsense VM and configure it with 11 Scores (load balancer, DHCP, TFTP, VLANs, firewall rules, NAT, VIPs, LAGG). Fully automated, requires only KVM. See the [detailed guide](../docs/use-cases/opnsense-vm-integration.md).
- **`opnsense_pair_integration`** — Boot two OPNsense VMs and configure a CARP HA firewall pair with `FirewallPairTopology` and `CarpVipScore`. Demonstrates NIC link control for sequential bootstrap.
- **`okd_installation`** — Full OKD cluster from scratch
- **`okd_pxe`** — PXE boot configuration for OKD
- **`sttest`** — Full OKD stack test with specific hardware
@@ -27,7 +27,6 @@ async fn main() {
};
let application = Arc::new(RustWebapp {
name: "example-monitoring".to_string(),
version: "0.1.0".to_string(),
dns: "example-monitoring.harmony.mcd".to_string(),
project_root: PathBuf::from("./examples/rust/webapp"),
framework: Some(RustWebFramework::Leptos),
@@ -1,6 +1,6 @@
use std::str::FromStr;
use brocade::{BrocadeOptions, PortOperatingMode};
use brocade::{BrocadeOptions, InterfaceConfig, InterfaceType, PortOperatingMode, SwitchInterface, VlanList};
use harmony::{
infra::brocade::BrocadeSwitchConfig,
inventory::Inventory,
@@ -9,6 +9,13 @@ use harmony::{
use harmony_macros::ip;
use harmony_types::{id::Id, switch::PortLocation};
fn tengig(stack: u8, slot: u8, port: u8) -> SwitchInterface {
SwitchInterface::Ethernet(
InterfaceType::TenGigabitEthernet,
PortLocation(stack, slot, port),
)
}
fn get_switch_config() -> BrocadeSwitchConfig {
let mut options = BrocadeOptions::default();
options.ssh.port = 2222;
@@ -33,9 +40,27 @@ async fn main() {
Id::from_str("18").unwrap(),
],
ports_to_configure: vec![
(PortLocation(2, 0, 17), PortOperatingMode::Trunk),
(PortLocation(2, 0, 19), PortOperatingMode::Trunk),
(PortLocation(1, 0, 18), PortOperatingMode::Trunk),
InterfaceConfig {
interface: tengig(2, 0, 17),
mode: PortOperatingMode::Trunk,
access_vlan: None,
trunk_vlans: Some(VlanList::All),
speed: None,
},
InterfaceConfig {
interface: tengig(2, 0, 19),
mode: PortOperatingMode::Trunk,
access_vlan: None,
trunk_vlans: Some(VlanList::All),
speed: None,
},
InterfaceConfig {
interface: tengig(1, 0, 18),
mode: PortOperatingMode::Trunk,
access_vlan: None,
trunk_vlans: Some(VlanList::All),
speed: None,
},
],
};
@@ -0,0 +1,18 @@
[package]
name = "brocade-switch-configuration"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_macros = { path = "../../harmony_macros" }
harmony_types = { path = "../../harmony_types" }
tokio.workspace = true
async-trait.workspace = true
serde.workspace = true
log.workspace = true
env_logger.workspace = true
brocade = { path = "../../brocade" }
@@ -0,0 +1,4 @@
export HARMONY_SECRET_NAMESPACE=brocade-example
export HARMONY_SECRET_STORE=file
export HARMONY_DATABASE_URL=sqlite://harmony_brocade_example.sqlite
export RUST_LOG=info
@@ -0,0 +1,144 @@
use brocade::{
BrocadeOptions, InterfaceConfig, InterfaceSpeed, InterfaceType, PortChannelConfig,
PortOperatingMode, SwitchInterface, Vlan, VlanList,
};
use harmony::{
infra::brocade::BrocadeSwitchConfig,
inventory::Inventory,
modules::brocade::{BrocadeSwitchAuth, BrocadeSwitchConfigurationScore, SwitchTopology},
};
use harmony_macros::ip;
use harmony_types::switch::PortLocation;
fn tengig(stack: u8, slot: u8, port: u8) -> SwitchInterface {
SwitchInterface::Ethernet(
InterfaceType::TenGigabitEthernet,
PortLocation(stack, slot, port),
)
}
fn get_switch_config() -> BrocadeSwitchConfig {
let auth = BrocadeSwitchAuth {
username: "admin".to_string(),
password: "password".to_string(),
};
BrocadeSwitchConfig {
// ips: vec![ip!("192.168.12.147"), ip!("192.168.12.109")],
ips: vec![ip!("192.168.4.12"), ip!("192.168.4.11")],
auth,
options: BrocadeOptions {
dry_run: false,
ssh: brocade::ssh::SshOptions {
port: 22,
..Default::default()
},
..Default::default()
},
}
}
#[tokio::main]
async fn main() {
harmony_cli::cli_logger::init();
// ===================================================
// Step 1: Define VLANs once, use them everywhere
// ===================================================
let mgmt = Vlan {
id: 100,
name: "MGMT".to_string(),
};
let data = Vlan {
id: 200,
name: "DATA".to_string(),
};
let storage = Vlan {
id: 300,
name: "STORAGE".to_string(),
};
let backup = Vlan {
id: 400,
name: "BACKUP".to_string(),
};
// ===================================================
// Step 2: Build the score
// ===================================================
let score = BrocadeSwitchConfigurationScore {
// All VLANs that need to exist on the switch
vlans: vec![mgmt.clone(), data.clone(), storage.clone(), backup.clone()],
// Standalone interfaces (not part of any port-channel)
interfaces: vec![
// Trunk port with ALL VLANs, forced to 10Gbps
InterfaceConfig {
interface: tengig(1, 0, 20),
mode: PortOperatingMode::Trunk,
access_vlan: None,
trunk_vlans: Some(VlanList::All),
speed: Some(InterfaceSpeed::Gbps10),
},
// Trunk port with specific VLANs (MGMT + DATA only)
InterfaceConfig {
interface: tengig(1, 0, 21),
mode: PortOperatingMode::Trunk,
access_vlan: None,
trunk_vlans: Some(VlanList::Specific(vec![mgmt.clone(), data.clone()])),
speed: None,
},
// Access port on the MGMT VLAN
InterfaceConfig {
interface: tengig(1, 0, 22),
mode: PortOperatingMode::Access,
access_vlan: Some(mgmt.id),
trunk_vlans: None,
speed: None,
},
// Access port on the STORAGE VLAN
InterfaceConfig {
interface: tengig(1, 0, 23),
mode: PortOperatingMode::Access,
access_vlan: Some(storage.id),
trunk_vlans: None,
speed: None,
},
],
// Port-channels: member ports are bundled, L2 config goes on the port-channel
port_channels: vec![
// Port-channel 1: trunk with DATA + STORAGE VLANs, forced to 1Gbps
PortChannelConfig {
id: 1,
name: "SERVER_BOND".to_string(),
ports: vec![PortLocation(1, 0, 24), PortLocation(1, 0, 25)],
mode: PortOperatingMode::Trunk,
access_vlan: None,
trunk_vlans: Some(VlanList::Specific(vec![data.clone(), storage.clone()])),
speed: Some(InterfaceSpeed::Gbps1),
},
// Port-channel 2: trunk with all VLANs, default speed
PortChannelConfig {
id: 2,
name: "BACKUP_BOND".to_string(),
ports: vec![PortLocation(1, 0, 26), PortLocation(1, 0, 27)],
mode: PortOperatingMode::Trunk,
access_vlan: None,
trunk_vlans: Some(VlanList::All),
speed: None,
},
],
};
// ===================================================
// Step 3: Run
// ===================================================
harmony_cli::run(
Inventory::autoload(),
SwitchTopology::new(get_switch_config()).await,
vec![Box::new(score)],
None,
)
.await
.unwrap();
}
@@ -12,9 +12,7 @@ harmony_config = { path = "../../harmony_config" }
harmony_macros = { path = "../../harmony_macros" }
harmony_secret = { path = "../../harmony_secret" }
harmony_types = { path = "../../harmony_types" }
harmony-k8s = { path = "../../harmony-k8s" }
k3d-rs = { path = "../../k3d" }
k8s-openapi.workspace = true
kube.workspace = true
tokio.workspace = true
url.workspace = true
@@ -24,7 +22,4 @@ serde.workspace = true
serde_json.workspace = true
anyhow.workspace = true
reqwest.workspace = true
clap = { version = "4", features = ["derive"] }
schemars = "0.8"
interactive-parse = "0.1.5"
directories = "6.0.0"
@@ -1,90 +0,0 @@
# Harmony SSO Example
Deploys Zitadel (identity provider) and OpenBao (secrets management) on a local k3d cluster, then demonstrates using them as `harmony_config` backends for shared config and secret management.
## Prerequisites
- Docker running
- Ports 8080 and 8200 free
- `/etc/hosts` entries (or use a local DNS resolver):
```
127.0.0.1 sso.harmony.local
127.0.0.1 bao.harmony.local
```
## Usage
### Full deployment
```bash
# Deploy everything (OpenBao + Zitadel)
cargo run -p example-harmony-sso
# OpenBao only (faster, skip Zitadel)
cargo run -p example-harmony-sso -- --skip-zitadel
```
### Config storage demo (token auth)
After deployment, run the config demo to verify `harmony_config` works with OpenBao:
```bash
cargo run -p example-harmony-sso -- --demo
```
This writes and reads a `SsoExampleConfig` through the `ConfigManager` chain (`EnvSource -> StoreSource<OpenbaoSecretStore>`), demonstrating environment variable overrides and persistent storage in OpenBao KV v2.
### SSO device flow demo
Requires a Zitadel application configured for device code grant:
```bash
HARMONY_SSO_CLIENT_ID=<zitadel-app-client-id> \
cargo run -p example-harmony-sso -- --sso-demo
```
### Cleanup
```bash
cargo run -p example-harmony-sso -- --cleanup
```
## What gets deployed
| Component | Namespace | Access |
|---|---|---|
| OpenBao (standalone, file storage) | `openbao` | `http://bao.harmony.local:8200` |
| Zitadel (with CNPG PostgreSQL) | `zitadel` | `http://sso.harmony.local:8080` |
### OpenBao configuration
- **Auth methods:** userpass, JWT
- **Secrets engine:** KV v2 at `secret/`
- **Policy:** `harmony-dev` grants CRUD on `secret/data/harmony/*`
- **Userpass credentials:** `harmony` / `harmony-dev-password`
- **JWT auth:** configured with Zitadel as OIDC provider, role `harmony-developer`
- **Unseal keys:** saved to `~/.local/share/harmony/openbao/unseal-keys.json`
## Architecture
```
Developer CLI
|
|-- harmony_config::ConfigManager
| |-- EnvSource (HARMONY_CONFIG_* env vars)
| |-- StoreSource<OpenbaoSecretStore>
| |-- Token auth (OPENBAO_TOKEN)
| |-- Cached token validation
| |-- Zitadel OIDC device flow (RFC 8628)
| |-- Userpass fallback
|
v
k3d cluster (harmony-example)
|-- OpenBao (KV v2 secrets engine)
| |-- JWT auth -> validates Zitadel id_tokens
| |-- userpass auth -> dev credentials
|
|-- Zitadel (OpenID Connect IdP)
|-- Device authorization grant
|-- Federated login (Google, GitHub, Entra ID)
```
@@ -1,155 +0,0 @@
# Harmony SSO Plan
## Context
Deploy Zitadel and OpenBao on a local k3d cluster, use them as `harmony_config` backends, and demonstrate end-to-end config storage authenticated via SSO. The goal: rock-solid deployment so teams and collaborators can reliably share config and secrets through OpenBao with Zitadel SSO authentication.
## Status
### Phase A: MVP with Token Auth -- DONE
- [x] A.1 -- CLI argument parsing (`--demo`, `--sso-demo`, `--skip-zitadel`, `--cleanup`)
- [x] A.2 -- Zitadel deployment via `ZitadelScore` (`external_secure: false` for k3d)
- [x] A.3 -- OpenBao JWT auth method + `harmony-dev` policy configuration
- [x] A.4 -- `--demo` flag: config storage demo with token auth via `ConfigManager`
- [x] A.5 -- Hardening: retry loops for pod readiness, HTTP readiness checks, `--cleanup`
- [x] A.6 -- README with prerequisites, usage, and architecture
Verified end-to-end: fresh `k3d cluster delete` -> `cargo run -p example-harmony-sso` -> `--demo` succeeds.
### Phase B: OIDC Device Flow + JWT Exchange -- TODO
The Zitadel OIDC device flow code exists (`harmony_secret/src/store/zitadel.rs`) but the **JWT exchange** step is missing: `process_token_response()` stores the OIDC `access_token` as `openbao_token` directly, but per ADR 020-1 the `id_token` should be exchanged with OpenBao's `/v1/auth/jwt/login` endpoint.
**B.1 -- Implement JWT exchange in `harmony_secret/src/store/zitadel.rs`:**
- Add `openbao_url`, `jwt_auth_mount`, `jwt_role` fields to `ZitadelOidcAuth`
- Add `exchange_jwt_for_openbao_token(id_token)` using raw `reqwest` (vaultrs 0.7.4 has no JWT auth module)
- POST `{openbao_url}/v1/auth/{jwt_auth_mount}/login` with `{"role": "...", "jwt": "..."}`
- Modify `process_token_response()` to use exchange when `openbao_url` is set
**B.2 -- Wire JWT params through `harmony_secret/src/store/openbao.rs`:**
- Pass `base_url`, `jwt_auth_mount`, `jwt_role` to `ZitadelOidcAuth::new()` in `authenticate_zitadel_oidc()`
- Update `OpenbaoSecretStore::new()` signature for optional `jwt_role` and `jwt_auth_mount`
**B.3 -- Add env vars to `harmony_secret/src/config.rs`:**
- `OPENBAO_JWT_AUTH_MOUNT` (default: `jwt`)
- `OPENBAO_JWT_ROLE` (default: `harmony-developer`)
**B.4 -- Silent refresh:**
- Add `refresh_token()` method to `ZitadelOidcAuth`
- Update auth chain in `openbao.rs`: cached session -> silent refresh -> device flow
**B.5 -- `--sso-demo` flag:**
- Already stubbed in `examples/harmony_sso/src/main.rs`
- Requires a Zitadel device code application (manual setup, accept `HARMONY_SSO_CLIENT_ID` env var)
**B.6 -- Solve in-cluster DNS for JWT auth config:**
- OpenBao JWT auth needs `oidc_discovery_url` to fetch Zitadel's JWKS
- Zitadel requires `Host` header matching `ExternalDomain` on ALL endpoints (including `/oauth/v2/keys`)
- So `oidc_discovery_url=http://zitadel.zitadel.svc.cluster.local:8080` gets 404 from Zitadel
- Options: (a) CoreDNS rewrite rule mapping `sso.harmony.local` -> `zitadel.zitadel.svc`, (b) Kubernetes ExternalName service, (c) `Zitadel.AdditionalDomains` Helm config to accept the internal hostname
- Currently non-fatal (warning only), needed before `--sso-demo` can work
### Phase C: Testing & Automation -- TODO
**C.1 -- Integration tests** (`examples/harmony_sso/tests/integration.rs`, `#[ignore]`):
- `test_openbao_health` -- health endpoint
- `test_zitadel_openid_config` -- OIDC discovery
- `test_openbao_userpass_auth` -- write/read secret
- `test_config_manager_openbao_backend` -- full ConfigManager chain
- `test_openbao_jwt_auth_configured` -- verify JWT auth method + role exist
**C.2 -- Zitadel application automation** (`examples/harmony_sso/src/zitadel_setup.rs`):
- Automate project + device code app creation via Zitadel Management API
- Extract and save `client_id`
---
## Tricky Things / Lessons Learned
### ZitadelScore on k3d -- security context
The Zitadel container image (`ghcr.io/zitadel/zitadel`) defines `User: "zitadel"` (non-numeric string). With `runAsNonRoot: true` and `runAsUser: null`, kubelet can't verify the user is non-root and fails with `CreateContainerConfigError`. **Fix:** set `runAsUser: 1000` explicitly (that's the UID for `zitadel` in `/etc/passwd`). This applies to all security contexts: `podSecurityContext`, `securityContext`, `initJob`, `setupJob`, and `login`.
Changed in `harmony/src/modules/zitadel/mod.rs` for the `K3sFamily | Default` branch.
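The resulting security context, sketched as Helm values (field names per the Zitadel chart; shown here only to illustrate the shape of the fix):

```yaml
# runAsUser must be numeric -- kubelet cannot verify a string user
# ("zitadel") against runAsNonRoot: true.
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000   # UID of 'zitadel' in the image's /etc/passwd
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
```

The same `runAsUser: 1000` is repeated for the `initJob`, `setupJob`, and `login` contexts.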
### ZitadelScore on k3d -- ingress class
The K3sFamily Helm values had `kubernetes.io/ingress.class: nginx` annotations. k3d ships with traefik, not nginx. The nginx annotation caused traefik to ignore the ingress entirely (404 on all routes). **Fix:** removed the explicit ingress class annotations -- traefik picks up ingresses without an explicit class by default.
Changed in `harmony/src/modules/zitadel/mod.rs` for the `K3sFamily | Default` branch.
### CNPG CRD registration race
After `helm install cloudnative-pg`, the operator deployment becomes ready but the CRD (`clusters.postgresql.cnpg.io`) is not yet registered in the API server's discovery cache. The kube client caches API discovery at init time, so even after the CRD registers, a reused client won't see it. **Fix:** the example creates a **fresh topology** (and therefore fresh kube client) on each retry attempt. Up to 5 retries with 15s delay.
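The fresh-state retry pattern can be sketched generically (this is an illustration of the approach, not the example's actual code; in the real example `attempt_fn` rebuilds the whole topology, and hence the kube client, on every call):

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry `attempt_fn` up to `max_attempts` times. Because the closure
/// reconstructs all of its state on each call, stale caches (like the kube
/// client's API discovery cache) are discarded between attempts.
fn retry_with_fresh_state<T, E: std::fmt::Display>(
    max_attempts: u32,
    delay: Duration,
    mut attempt_fn: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut last_err = None;
    for attempt in 1..=max_attempts {
        match attempt_fn() {
            Ok(v) => return Ok(v),
            Err(e) => {
                eprintln!("attempt {attempt}/{max_attempts} failed: {e}");
                last_err = Some(e);
                if attempt < max_attempts {
                    sleep(delay);
                }
            }
        }
    }
    Err(last_err.expect("max_attempts must be >= 1"))
}

fn main() {
    // Simulate the CNPG race: the first two attempts fail because the CRD
    // is "not yet registered"; the third attempt, with fresh state, succeeds.
    let mut calls = 0u32;
    let result = retry_with_fresh_state(5, Duration::from_millis(10), || {
        calls += 1;
        if calls < 3 {
            Err("clusters.postgresql.cnpg.io not found in discovery")
        } else {
            Ok(calls)
        }
    });
    println!("{}", result.unwrap());
}
```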
### CNPG PostgreSQL cluster readiness
After the CNPG `Cluster` CR is created, the PostgreSQL pods and the `-rw` service take 15-30s to come up. `ZitadelScore` immediately calls `topology.get_endpoint()` which looks for the `zitadel-pg-rw` service. If the service doesn't exist yet, it fails with "not found for cluster". **Fix:** same retry loop catches this error pattern.
### Zitadel Helm init job timing
The Zitadel Helm chart runs a `zitadel-init` pre-install/pre-upgrade Job that connects to PostgreSQL. If the PG cluster isn't fully ready (primary not accepting connections), the init job hangs until Helm's 5-minute timeout. On a cold start from scratch, the sequence is: CNPG operator install -> CRD registration (5-15s) -> PG cluster creation -> PG pod scheduling + init (~30s) -> PG primary ready -> Zitadel init job can connect. The retry loop handles this by allowing the full sequence to settle between attempts.
### Zitadel Host header validation
Zitadel validates the `Host` header on **all** HTTP endpoints against its `ExternalDomain` config (`sso.harmony.local`). This means:
- The OIDC discovery endpoint (`/.well-known/openid-configuration`) returns 404 if called via the internal service URL without the correct Host header
- The JWKS endpoint (`/oauth/v2/keys`) also requires the correct Host
- OpenBao's JWT auth `oidc_discovery_url` can't use `http://zitadel.zitadel.svc.cluster.local:8080` because Zitadel rejects the Host
- From outside the cluster, use `127.0.0.1:8080` with `Host: sso.harmony.local` header (or add /etc/hosts entry)
- Phase B needs to solve in-cluster DNS resolution for `sso.harmony.local`
### Both services share one port
Both Zitadel and OpenBao are exposed through traefik ingress on port 80 (mapped to host port 8080). Traefik routes by `Host` header: `sso.harmony.local` -> Zitadel, `bao.harmony.local` -> OpenBao. The original plan had separate port mappings (8080 for Zitadel, 8200 for OpenBao) but the 8200 mapping was useless since traefik only listens on 80/443.
For `--demo` mode, the port-forward bypasses traefik and connects directly to the OpenBao service on port 8200 (no Host header needed).
### `run_bao_command` and shell escaping
The `run_bao_command` function runs `kubectl exec ... -- sh -c "export VAULT_TOKEN=xxx && bao ..."`. Two gotchas:
1. Must use `export VAULT_TOKEN=...` (not just `VAULT_TOKEN=...` prefix) because piped commands after `|` don't inherit the prefix env var
2. The policy creation uses `printf '...' | bao policy write harmony-dev -` which needs careful quoting inside the `sh -c` wrapper. Using `run_bao_command_raw()` avoids double-wrapping.
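Gotcha (1) is plain POSIX shell semantics and is easy to reproduce outside kubectl (`FOO` here is just a stand-in for `VAULT_TOKEN`):

```shell
# A prefix assignment is scoped to the first command of the pipeline only,
# so the stage after `|` never sees the variable:
sh -c 'FOO=bar true | printenv FOO' || echo "FOO not visible past the pipe"

# `export` puts the variable in the shell's own environment, so every
# pipeline stage inherits it:
sh -c 'export FOO=bar && true | printenv FOO'   # prints: bar
```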
### FIXMEs for future refactoring
The user flagged several areas that should use `harmony-k8s` instead of raw `kubectl`:
- `wait_for_pod_running()` -- harmony-k8s has pod wait functionality
- `init_openbao()`, `unseal_openbao()` -- exec into pods via kubectl
- `get_k3d_binary_path()`, `get_openbao_data_path()` -- leaking implementation details from k3d/openbao crates
- `configure_openbao()` -- future candidate for an OpenBao/Vault capability trait
---
## Files Modified (Phase A)
| File | Change |
|---|---|
| `examples/harmony_sso/Cargo.toml` | Added clap, schemars, interactive-parse |
| `examples/harmony_sso/src/main.rs` | Complete rewrite: CLI args, Zitadel deploy, JWT auth config, demo modes, hardening |
| `examples/harmony_sso/README.md` | New: prerequisites, usage, architecture |
| `harmony/src/modules/zitadel/mod.rs` | Fixed K3s security context (`runAsUser: 1000`), removed nginx ingress annotations |
## Files to Modify (Phase B)
| File | Change |
|---|---|
| `harmony_secret/src/store/zitadel.rs` | JWT exchange, silent refresh |
| `harmony_secret/src/store/openbao.rs` | Wire JWT params, refresh in auth chain |
| `harmony_secret/src/config.rs` | OPENBAO_JWT_AUTH_MOUNT, OPENBAO_JWT_ROLE env vars |
## Verification
**Phase A (verified 2026-03-28):**
- `cargo run -p example-harmony-sso` -> deploys k3d + OpenBao + Zitadel (with retry for CNPG CRD + PG readiness)
- `curl -H "Host: bao.harmony.local" http://127.0.0.1:8080/v1/sys/health` -> OpenBao healthy (initialized, unsealed)
- `curl -H "Host: sso.harmony.local" http://127.0.0.1:8080/.well-known/openid-configuration` -> Zitadel OIDC config with device_authorization_endpoint
- `cargo run -p example-harmony-sso -- --demo` -> writes/reads config via ConfigManager + OpenbaoSecretStore, env override works
**Phase B:**
- `HARMONY_SSO_URL=http://sso.harmony.local HARMONY_SSO_CLIENT_ID=<id> cargo run -p example-harmony-sso -- --sso-demo`
- Device code appears, login in browser, config stored via SSO-authenticated OpenBao token
**Phase C:**
- `cargo test -p example-harmony-sso -- --ignored` -> integration tests pass


@@ -1,407 +1,395 @@
use anyhow::Context;
use clap::Parser;
use harmony::inventory::Inventory;
use harmony::modules::k8s::coredns::{CoreDNSRewrite, CoreDNSRewriteScore};
use harmony::modules::openbao::{
OpenbaoJwtAuth, OpenbaoPolicy, OpenbaoScore, OpenbaoSetupScore, OpenbaoUser,
};
use harmony::modules::zitadel::{
ZitadelAppType, ZitadelApplication, ZitadelClientConfig, ZitadelScore, ZitadelSetupScore,
};
use harmony::modules::openbao::OpenbaoScore;
use harmony::score::Score;
use harmony::topology::{K8sclient, Topology};
use harmony_config::{Config, ConfigManager, EnvSource, StoreSource};
use harmony_k8s::K8sClient;
use harmony_secret::OpenbaoSecretStore;
use harmony::topology::Topology;
use k3d_rs::{K3d, PortMapping};
use log::info;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use serde::Deserialize;
use serde::Serialize;
use std::path::PathBuf;
use std::sync::Arc;
use std::process::Command;
const CLUSTER_NAME: &str = "harmony-example";
const ZITADEL_HOST: &str = "sso.harmony.local";
const OPENBAO_HOST: &str = "bao.harmony.local";
const HTTP_PORT: u32 = 8080;
const OPENBAO_NAMESPACE: &str = "openbao";
const OPENBAO_POD: &str = "openbao-0";
const APP_NAME: &str = "harmony-cli";
const PROJECT_NAME: &str = "harmony";
#[derive(Parser)]
#[command(
name = "harmony-sso",
about = "Deploy Zitadel + OpenBao on k3d, authenticate via SSO, store config"
)]
struct Args {
/// Skip Zitadel deployment (OpenBao only, faster iteration)
#[arg(long)]
skip_zitadel: bool,
const ZITADEL_PORT: u32 = 8080;
const OPENBAO_PORT: u32 = 8200;
/// Delete the k3d cluster and exit
#[arg(long)]
cleanup: bool,
}
// ---------------------------------------------------------------------------
// Config type stored via SSO-authenticated OpenBao
// ---------------------------------------------------------------------------
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, PartialEq)]
struct SsoExampleConfig {
team_name: String,
environment: String,
max_replicas: u16,
}
impl Default for SsoExampleConfig {
fn default() -> Self {
Self {
team_name: "platform-team".to_string(),
environment: "staging".to_string(),
max_replicas: 3,
}
}
}
impl Config for SsoExampleConfig {
const KEY: &'static str = "SsoExampleConfig";
}
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
fn harmony_data_dir() -> PathBuf {
fn get_k3d_binary_path() -> PathBuf {
directories::BaseDirs::new()
.map(|dirs| dirs.data_dir().join("harmony"))
.unwrap_or_else(|| PathBuf::from("/tmp/harmony"))
.map(|dirs| dirs.data_dir().join("harmony").join("k3d"))
.unwrap_or_else(|| PathBuf::from("/tmp/harmony-k3d"))
}
fn create_k3d() -> K3d {
let base_dir = harmony_data_dir().join("k3d");
std::fs::create_dir_all(&base_dir).expect("Failed to create k3d data directory");
K3d::new(base_dir, Some(CLUSTER_NAME.to_string()))
.with_port_mappings(vec![PortMapping::new(HTTP_PORT, 80)])
fn get_openbao_data_path() -> PathBuf {
directories::BaseDirs::new()
.map(|dirs| dirs.data_dir().join("harmony").join("openbao"))
.unwrap_or_else(|| PathBuf::from("/tmp/harmony-openbao"))
}
fn create_topology(k3d: &K3d) -> harmony::topology::K8sAnywhereTopology {
let context = k3d
.context_name()
.unwrap_or_else(|| format!("k3d-{}", CLUSTER_NAME));
unsafe {
std::env::set_var("HARMONY_USE_LOCAL_K3D", "false");
std::env::set_var("HARMONY_AUTOINSTALL", "false");
std::env::set_var("HARMONY_K8S_CONTEXT", &context);
}
harmony::topology::K8sAnywhereTopology::from_env()
}
async fn ensure_k3d_cluster() -> anyhow::Result<()> {
let base_dir = get_k3d_binary_path();
std::fs::create_dir_all(&base_dir).context("Failed to create k3d data directory")?;
fn harmony_dev_policy() -> OpenbaoPolicy {
OpenbaoPolicy {
name: "harmony-dev".to_string(),
hcl: r#"path "secret/data/harmony/*" { capabilities = ["create","read","update","delete","list"] }
path "secret/metadata/harmony/*" { capabilities = ["list","read"] }"#
.to_string(),
}
}
info!(
"Ensuring k3d cluster '{}' is running with port mappings",
CLUSTER_NAME
);
// ---------------------------------------------------------------------------
// Zitadel deployment (with CNPG retry)
// ---------------------------------------------------------------------------
let k3d = K3d::new(base_dir.clone(), Some(CLUSTER_NAME.to_string())).with_port_mappings(vec![
PortMapping::new(ZITADEL_PORT, 80),
PortMapping::new(OPENBAO_PORT, 8200),
]);
async fn deploy_zitadel(k3d: &K3d) -> anyhow::Result<()> {
info!("Deploying Zitadel (this may take several minutes)...");
let zitadel = ZitadelScore {
host: ZITADEL_HOST.to_string(),
zitadel_version: "v4.12.1".to_string(),
external_secure: false,
};
let topology = create_topology(k3d);
topology
.ensure_ready()
.await
.context("Topology init failed")?;
zitadel
.interpret(&Inventory::autoload(), &topology)
.await
.context("Zitadel deployment failed")?;
info!("Zitadel deployed successfully");
Ok(())
}
async fn wait_for_zitadel_ready() -> anyhow::Result<()> {
info!("Waiting for Zitadel to be ready...");
let client = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(5))
.build()?;
for attempt in 1..=90 {
match client
.get(format!(
"http://127.0.0.1:{}/.well-known/openid-configuration",
HTTP_PORT
))
.header("Host", ZITADEL_HOST)
.send()
.await
{
Ok(resp) if resp.status().is_success() => {
info!("Zitadel is ready");
return Ok(());
}
Ok(resp) if attempt % 10 == 0 => {
info!("Zitadel HTTP {}, attempt {}/90", resp.status(), attempt);
}
Err(e) if attempt % 10 == 0 => {
info!("Zitadel not reachable: {}, attempt {}/90", e, attempt);
}
_ => {}
}
tokio::time::sleep(tokio::time::Duration::from_secs(2)).await;
}
anyhow::bail!("Timed out waiting for Zitadel")
}
// ---------------------------------------------------------------------------
// Cluster lifecycle
// ---------------------------------------------------------------------------
async fn ensure_k3d_cluster(k3d: &K3d) -> anyhow::Result<()> {
info!("Ensuring k3d cluster '{}' is running...", CLUSTER_NAME);
k3d.ensure_installed()
.await
.map_err(|e| anyhow::anyhow!("k3d setup failed: {}", e))?;
.map_err(|e| anyhow::anyhow!("Failed to ensure k3d installed: {}", e))?;
info!("k3d cluster '{}' is ready", CLUSTER_NAME);
Ok(())
}
fn cleanup_cluster(k3d: &K3d) -> anyhow::Result<()> {
let name = k3d
.cluster_name()
.ok_or_else(|| anyhow::anyhow!("No cluster name"))?;
info!("Deleting k3d cluster '{}'...", name);
k3d.run_k3d_command(["cluster", "delete", name])
.map_err(|e| anyhow::anyhow!("{}", e))?;
info!("Cluster '{}' deleted", name);
Ok(())
fn create_topology() -> harmony::topology::K8sAnywhereTopology {
unsafe {
std::env::set_var("HARMONY_USE_LOCAL_K3D", "false");
std::env::set_var("HARMONY_AUTOINSTALL", "false");
std::env::set_var("HARMONY_K8S_CONTEXT", "k3d-harmony-example");
}
harmony::topology::K8sAnywhereTopology::from_env()
}
async fn cleanup_openbao_webhook(k8s: &K8sClient) -> anyhow::Result<()> {
use k8s_openapi::api::admissionregistration::v1::MutatingWebhookConfiguration;
if k8s
.get_resource::<MutatingWebhookConfiguration>("openbao-agent-injector-cfg", None)
.await?
.is_some()
{
async fn cleanup_openbao_webhook() -> anyhow::Result<()> {
let output = Command::new("kubectl")
.args([
"--context",
"k3d-harmony-example",
"get",
"mutatingwebhookconfigurations",
])
.output()
.context("Failed to check webhooks")?;
if String::from_utf8_lossy(&output.stdout).contains("openbao-agent-injector-cfg") {
info!("Deleting conflicting OpenBao webhook...");
k8s.delete_resource::<MutatingWebhookConfiguration>("openbao-agent-injector-cfg", None)
.await?;
let _ = Command::new("kubectl")
.args([
"--context",
"k3d-harmony-example",
"delete",
"mutatingwebhookconfiguration",
"openbao-agent-injector-cfg",
"--ignore-not-found=true",
])
.output();
}
Ok(())
}
// ---------------------------------------------------------------------------
// Main
// ---------------------------------------------------------------------------
async fn deploy_openbao(topology: &harmony::topology::K8sAnywhereTopology) -> anyhow::Result<()> {
info!("Deploying OpenBao...");
let openbao = OpenbaoScore {
host: OPENBAO_HOST.to_string(),
openshift: false,
};
let inventory = Inventory::autoload();
openbao
.interpret(&inventory, topology)
.await
.context("OpenBao deployment failed")?;
info!("OpenBao deployed successfully");
Ok(())
}
async fn wait_for_openbao_running() -> anyhow::Result<()> {
info!("Waiting for OpenBao pods to be running...");
let output = Command::new("kubectl")
.args([
"--context",
"k3d-harmony-example",
"wait",
"-n",
"openbao",
"--for=condition=podinitialized",
"pod/openbao-0",
"--timeout=120s",
])
.output()
.context("Failed to wait for OpenBao pod")?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
info!(
"Pod initialized wait failed, trying alternative approach: {}",
stderr
);
}
tokio::time::sleep(tokio::time::Duration::from_secs(5)).await;
info!("OpenBao pod is running (may be sealed)");
Ok(())
}
#[derive(Debug, Serialize, Deserialize)]
struct OpenBaoInitOutput {
#[serde(rename = "unseal_keys_b64")]
keys: Vec<String>,
#[serde(rename = "root_token")]
root_token: String,
}
async fn init_openbao() -> anyhow::Result<String> {
let data_path = get_openbao_data_path();
std::fs::create_dir_all(&data_path).context("Failed to create openbao data directory")?;
let keys_file = data_path.join("unseal-keys.json");
if keys_file.exists() {
info!("OpenBao already initialized, loading existing keys");
let content = std::fs::read_to_string(&keys_file)?;
let init_output: OpenBaoInitOutput = serde_json::from_str(&content)?;
return Ok(init_output.root_token);
}
info!("Initializing OpenBao...");
let output = Command::new("kubectl")
.args([
"--context",
"k3d-harmony-example",
"exec",
"-n",
"openbao",
"openbao-0",
"--",
"bao",
"operator",
"init",
"-format=json",
])
.output()
.context("Failed to initialize OpenBao")?;
let stderr = String::from_utf8_lossy(&output.stderr);
let stdout = String::from_utf8_lossy(&output.stdout);
if stderr.contains("already initialized") {
info!("OpenBao is already initialized");
return Err(anyhow::anyhow!(
"OpenBao is already initialized but no keys file found. \
Please delete the cluster and try again: k3d cluster delete harmony-example"
));
}
if !output.status.success() {
return Err(anyhow::anyhow!(
"OpenBao init failed with status {}: {}",
output.status,
stderr
));
}
if stdout.trim().is_empty() {
return Err(anyhow::anyhow!(
"OpenBao init returned empty output. stderr: {}",
stderr
));
}
let init_output: OpenBaoInitOutput = serde_json::from_str(&stdout)?;
std::fs::write(&keys_file, serde_json::to_string_pretty(&init_output)?)?;
info!("OpenBao initialized successfully");
info!("Unseal keys saved to {:?}", keys_file);
Ok(init_output.root_token)
}
async fn unseal_openbao(root_token: &str) -> anyhow::Result<()> {
info!("Unsealing OpenBao...");
let status_output = Command::new("kubectl")
.args([
"--context",
"k3d-harmony-example",
"exec",
"-n",
"openbao",
"openbao-0",
"--",
"bao",
"status",
"-format=json",
])
.output()
.context("Failed to get OpenBao status")?;
#[derive(Deserialize)]
struct StatusOutput {
sealed: bool,
}
if status_output.status.success() {
if let Ok(status) =
serde_json::from_str::<StatusOutput>(&String::from_utf8_lossy(&status_output.stdout))
{
if !status.sealed {
info!("OpenBao is already unsealed");
return Ok(());
}
}
}
let data_path = get_openbao_data_path();
let keys_file = data_path.join("unseal-keys.json");
let content = std::fs::read_to_string(&keys_file)?;
let init_output: OpenBaoInitOutput = serde_json::from_str(&content)?;
for key in &init_output.keys[0..3] {
let output = Command::new("kubectl")
.args([
"--context",
"k3d-harmony-example",
"exec",
"-n",
"openbao",
"openbao-0",
"--",
"bao",
"operator",
"unseal",
key,
])
.output()
.context("Failed to unseal OpenBao")?;
if !output.status.success() {
return Err(anyhow::anyhow!(
"Unseal failed: {}",
String::from_utf8_lossy(&output.stderr)
));
}
}
info!("OpenBao unsealed successfully");
Ok(())
}
async fn run_bao_command(root_token: &str, args: &[&str]) -> anyhow::Result<String> {
let command = args.join(" ");
let shell_command = format!("VAULT_TOKEN={} {}", root_token, command);
let output = Command::new("kubectl")
.args([
"--context",
"k3d-harmony-example",
"exec",
"-n",
"openbao",
"openbao-0",
"--",
"sh",
"-c",
&shell_command,
])
.output()
.context("Failed to run bao command")?;
let stdout = String::from_utf8_lossy(&output.stdout);
let stderr = String::from_utf8_lossy(&output.stderr);
if !output.status.success() {
return Err(anyhow::anyhow!("bao command failed: {}", stderr));
}
Ok(stdout.to_string())
}
async fn configure_openbao_admin_user(root_token: &str) -> anyhow::Result<()> {
info!("Configuring OpenBao with userpass auth...");
let _ = run_bao_command(root_token, &["bao", "auth", "enable", "userpass"]).await;
let _ = run_bao_command(
root_token,
&["bao", "secrets", "enable", "-path=secret", "kv-v2"],
)
.await;
run_bao_command(
root_token,
&[
"bao",
"write",
"auth/userpass/users/harmony",
"password=harmony-dev-password",
"policies=default",
],
)
.await?;
info!("OpenBao configured with userpass auth");
info!(" Username: harmony");
info!(" Password: harmony-dev-password");
info!(" Root token: {}", root_token);
Ok(())
}
#[tokio::main]
async fn main() -> anyhow::Result<()> {
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
let args = Args::parse();
let k3d = create_k3d();
if args.cleanup {
return cleanup_cluster(&k3d);
}
info!("===========================================");
info!("Harmony SSO Example");
info!("Deploys Zitadel + OpenBao on k3d");
info!("===========================================");
// --- Phase 1: Infrastructure ---
ensure_k3d_cluster().await?;
ensure_k3d_cluster(&k3d).await?;
info!("===========================================");
info!("Cluster '{}' is ready", CLUSTER_NAME);
info!(
"Zitadel will be available at: http://{}:{}",
ZITADEL_HOST, ZITADEL_PORT
);
info!(
"OpenBao will be available at: http://{}:{}",
OPENBAO_HOST, OPENBAO_PORT
);
info!("===========================================");
let topology = create_topology(&k3d);
let topology = create_topology();
topology
.ensure_ready()
.await
.context("Topology init failed")?;
.context("Failed to initialize topology")?;
let k8s = topology
.k8s_client()
.await
.map_err(|e| anyhow::anyhow!("K8s client: {}", e))?;
cleanup_openbao_webhook().await?;
deploy_openbao(&topology).await?;
wait_for_openbao_running().await?;
// Deploy + configure OpenBao (no JWT auth yet -- Zitadel isn't up)
cleanup_openbao_webhook(&k8s).await?;
OpenbaoScore {
host: OPENBAO_HOST.to_string(),
openshift: false,
}
.interpret(&Inventory::autoload(), &topology)
.await
.context("OpenBao deploy failed")?;
OpenbaoSetupScore {
policies: vec![harmony_dev_policy()],
users: vec![OpenbaoUser {
username: "harmony".to_string(),
password: "harmony-dev-password".to_string(),
policies: vec!["harmony-dev".to_string()],
}],
jwt_auth: None, // Phase 2 adds JWT after Zitadel is ready
..Default::default()
}
.interpret(&Inventory::autoload(), &topology)
.await
.context("OpenBao setup failed")?;
if args.skip_zitadel {
info!("=== Skipping Zitadel (--skip-zitadel) ===");
info!("OpenBao: http://{}:{}", OPENBAO_HOST, HTTP_PORT);
return Ok(());
}
// --- Phase 2: Identity + SSO Wiring ---
CoreDNSRewriteScore {
rewrites: vec![
CoreDNSRewrite {
hostname: ZITADEL_HOST.to_string(),
target: "zitadel.zitadel.svc.cluster.local".to_string(),
},
CoreDNSRewrite {
hostname: OPENBAO_HOST.to_string(),
target: "openbao.openbao.svc.cluster.local".to_string(),
},
],
}
.interpret(&Inventory::autoload(), &topology)
.await
.context("CoreDNS rewrite failed")?;
deploy_zitadel(&k3d).await?;
wait_for_zitadel_ready().await?;
// Provision Zitadel project + device-code application
ZitadelSetupScore {
host: ZITADEL_HOST.to_string(),
port: HTTP_PORT as u16,
skip_tls: true,
applications: vec![ZitadelApplication {
project_name: PROJECT_NAME.to_string(),
app_name: APP_NAME.to_string(),
app_type: ZitadelAppType::DeviceCode,
}],
machine_users: vec![],
}
.interpret(&Inventory::autoload(), &topology)
.await
.context("Zitadel setup failed")?;
// Read the client_id from the cache written by ZitadelSetupScore
let zitadel_config =
ZitadelClientConfig::load().context("ZitadelSetupScore did not produce a client config")?;
let client_id = zitadel_config
.client_id(APP_NAME)
.context("No client_id for harmony-cli app")?
.clone();
info!("Zitadel app '{}' client_id: {}", APP_NAME, client_id);
// Now configure OpenBao JWT auth with the real client_id
OpenbaoSetupScore {
policies: vec![harmony_dev_policy()],
users: vec![OpenbaoUser {
username: "harmony".to_string(),
password: "harmony-dev-password".to_string(),
policies: vec!["harmony-dev".to_string()],
}],
jwt_auth: Some(OpenbaoJwtAuth {
oidc_discovery_url: format!("http://{}:{}", ZITADEL_HOST, HTTP_PORT),
bound_issuer: format!("http://{}:{}", ZITADEL_HOST, HTTP_PORT),
role_name: "harmony-developer".to_string(),
bound_audiences: client_id.clone(),
user_claim: "email".to_string(),
policies: vec!["harmony-dev".to_string()],
ttl: "4h".to_string(),
max_ttl: "24h".to_string(),
}),
..Default::default()
}
.interpret(&Inventory::autoload(), &topology)
.await
.context("OpenBao JWT auth setup failed")?;
// --- Phase 3: Config via SSO ---
let root_token = init_openbao().await?;
unseal_openbao(&root_token).await?;
configure_openbao_admin_user(&root_token).await?;
info!("===========================================");
info!("Storing config via SSO-authenticated OpenBao");
info!("OpenBao initialized and configured!");
info!("===========================================");
let _pf = k8s
.port_forward(OPENBAO_POD, OPENBAO_NAMESPACE, 8200, 8200)
.await
.context("Port-forward to OpenBao failed")?;
tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
let openbao_url = format!("http://127.0.0.1:{}", _pf.port());
let sso_url = format!("http://{}:{}", ZITADEL_HOST, HTTP_PORT);
let store = OpenbaoSecretStore::new(
openbao_url,
"secret".to_string(),
"jwt".to_string(),
true,
None,
None,
None,
Some(sso_url),
Some(client_id),
Some("harmony-developer".to_string()),
Some("jwt".to_string()),
)
.await
.context("SSO authentication failed")?;
let manager = ConfigManager::new(vec![
Arc::new(EnvSource) as Arc<dyn harmony_config::ConfigSource>,
Arc::new(StoreSource::new("harmony".to_string(), store)),
]);
// Try to load existing config (succeeds on re-run)
match manager.get::<SsoExampleConfig>().await {
Ok(config) => {
info!("Config loaded from OpenBao: {:?}", config);
}
Err(harmony_config::ConfigError::NotFound { .. }) => {
info!("No config found, storing default...");
let config = SsoExampleConfig::default();
manager.set(&config).await?;
info!("Config stored: {:?}", config);
let retrieved: SsoExampleConfig = manager.get().await?;
info!("Config verified: {:?}", retrieved);
assert_eq!(config, retrieved);
}
Err(e) => return Err(e.into()),
}
info!("Zitadel: http://{}:{}", ZITADEL_HOST, ZITADEL_PORT);
info!("OpenBao: http://{}:{}", OPENBAO_HOST, OPENBAO_PORT);
info!("===========================================");
info!("Success! Config managed via Zitadel SSO + OpenBao");
info!("OpenBao credentials:");
info!(" Username: harmony");
info!(" Password: harmony-dev-password");
info!("===========================================");
info!("OpenBao: http://{}:{}", OPENBAO_HOST, HTTP_PORT);
info!("Zitadel: http://{}:{}", ZITADEL_HOST, HTTP_PORT);
info!("Run again to verify cached session works.");
info!("cargo run -p example-harmony-sso -- --cleanup # teardown");
Ok(())
}


@@ -1,16 +0,0 @@
[package]
name = "kvm-vm-examples"
version.workspace = true
edition = "2024"
license.workspace = true
[[bin]]
name = "kvm-vm-examples"
path = "src/main.rs"
[dependencies]
harmony = { path = "../../harmony" }
tokio.workspace = true
log.workspace = true
env_logger.workspace = true
clap = { version = "4", features = ["derive"] }


@@ -1,47 +0,0 @@
# KVM VM Examples
Demonstrates creating VMs with various configurations using harmony's KVM module. These examples exercise the same infrastructure primitives needed for the full OKD HA cluster with OPNsense, control plane, and workers with Ceph.
## Prerequisites
A working KVM/libvirt setup:
```bash
# Manjaro / Arch
sudo pacman -S qemu-full libvirt virt-install dnsmasq ebtables
sudo systemctl enable --now libvirtd
sudo usermod -aG libvirt $USER
# Log out and back in for group membership to take effect
```
## Scenarios
| Scenario | VMs | Disks | NICs | Purpose |
|----------|-----|-------|------|---------|
| `alpine` | 1 | 1x2G | 1 | Minimal VM, fast boot (~5s) |
| `ubuntu` | 1 | 1x25G | 1 | Standard server setup |
| `worker` | 1 | 3 (60G+100G+100G) | 1 | Multi-disk for Ceph OSD |
| `gateway` | 1 | 1x10G | 2 (WAN+LAN) | Dual-NIC firewall |
| `ha-cluster` | 7 | mixed | 1 each | Full HA: gateway + 3 CP + 3 workers |
## Usage
```bash
# Deploy a scenario
cargo run -p kvm-vm-examples -- alpine
cargo run -p kvm-vm-examples -- ubuntu
cargo run -p kvm-vm-examples -- worker
cargo run -p kvm-vm-examples -- gateway
cargo run -p kvm-vm-examples -- ha-cluster
# Check status
cargo run -p kvm-vm-examples -- status alpine
# Clean up
cargo run -p kvm-vm-examples -- clean alpine
```
## Environment variables
- `HARMONY_KVM_URI`: libvirt URI (default: `qemu:///system`)
- `HARMONY_KVM_IMAGE_DIR`: where disk images and ISOs are stored


@@ -1,358 +0,0 @@
//! KVM VM examples demonstrating various configurations.
//!
//! Each subcommand creates a different VM setup. All VMs are managed
//! via libvirt — you need a working KVM hypervisor on the host.
//!
//! # Prerequisites
//!
//! ```bash
//! # Manjaro / Arch
//! sudo pacman -S qemu-full libvirt virt-install dnsmasq ebtables
//! sudo systemctl enable --now libvirtd
//! sudo usermod -aG libvirt $USER
//! ```
//!
//! # Environment variables
//!
//! - `HARMONY_KVM_URI`: libvirt URI (default: `qemu:///system`)
//! - `HARMONY_KVM_IMAGE_DIR`: disk image directory (default: `~/.local/share/harmony/kvm/images`)
//!
//! # Usage
//!
//! ```bash
//! # Simple Alpine VM (tiny, boots in seconds — great for testing)
//! cargo run -p kvm-vm-examples -- alpine
//!
//! # Ubuntu Server with cloud-init
//! cargo run -p kvm-vm-examples -- ubuntu
//!
//! # Multi-disk worker node (Ceph OSD style)
//! cargo run -p kvm-vm-examples -- worker
//!
//! # Multi-NIC gateway (OPNsense style: WAN + LAN)
//! cargo run -p kvm-vm-examples -- gateway
//!
//! # Full HA cluster: 1 gateway + 3 control plane + 3 workers
//! cargo run -p kvm-vm-examples -- ha-cluster
//!
//! # Clean up all VMs and networks from a scenario
//! cargo run -p kvm-vm-examples -- clean <scenario>
//! ```
use clap::{Parser, Subcommand};
use harmony::modules::kvm::config::init_executor;
use harmony::modules::kvm::{
BootDevice, ForwardMode, KvmExecutor, NetworkConfig, NetworkRef, VmConfig, VmStatus,
};
use log::info;
#[derive(Parser)]
#[command(name = "kvm-vm-examples")]
#[command(about = "KVM VM examples for various infrastructure setups")]
struct Cli {
#[command(subcommand)]
command: Commands,
}
#[derive(Subcommand)]
enum Commands {
/// Minimal Alpine Linux VM — fast boot, ~150MB ISO
Alpine,
/// Ubuntu Server 24.04 — standard server with 1 disk
Ubuntu,
/// Worker node with multiple disks (OS + Ceph OSD storage)
Worker,
/// Gateway/firewall with 2 NICs (WAN + LAN)
Gateway,
/// Full HA cluster: gateway + 3 control plane + 3 worker nodes
HaCluster,
/// Tear down all VMs and networks for a scenario
Clean {
/// Scenario to clean: alpine, ubuntu, worker, gateway, ha-cluster
scenario: String,
},
/// Show status of all VMs in a scenario
Status {
/// Scenario: alpine, ubuntu, worker, gateway, ha-cluster
scenario: String,
},
}
const ALPINE_ISO: &str =
"https://dl-cdn.alpinelinux.org/alpine/v3.21/releases/x86_64/alpine-virt-3.21.3-x86_64.iso";
const UBUNTU_ISO: &str = "https://releases.ubuntu.com/24.04.2/ubuntu-24.04.2-live-server-amd64.iso";
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
let cli = Cli::parse();
let executor = init_executor()?;
match cli.command {
Commands::Alpine => deploy_alpine(&executor).await?,
Commands::Ubuntu => deploy_ubuntu(&executor).await?,
Commands::Worker => deploy_worker(&executor).await?,
Commands::Gateway => deploy_gateway(&executor).await?,
Commands::HaCluster => deploy_ha_cluster(&executor).await?,
Commands::Clean { scenario } => clean(&executor, &scenario).await?,
Commands::Status { scenario } => status(&executor, &scenario).await?,
}
Ok(())
}
// ── Alpine: minimal VM ──────────────────────────────────────────────────
async fn deploy_alpine(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
let net = NetworkConfig::builder("alpine-net")
.subnet("192.168.110.1", 24)
.forward(ForwardMode::Nat)
.build();
executor.ensure_network(net).await?;
let vm = VmConfig::builder("alpine-vm")
.vcpus(1)
.memory_mib(512)
.disk(2)
.network(NetworkRef::named("alpine-net"))
.cdrom(ALPINE_ISO)
.boot_order([BootDevice::Cdrom, BootDevice::Disk])
.build();
executor.ensure_vm(vm.clone()).await?;
executor.start_vm(&vm.name).await?;
info!("Alpine VM running. Connect: virsh console {}", vm.name);
info!("Login: root (no password). Install: setup-alpine");
Ok(())
}
// ── Ubuntu Server: standard setup ───────────────────────────────────────
async fn deploy_ubuntu(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
let net = NetworkConfig::builder("ubuntu-net")
.subnet("192.168.120.1", 24)
.forward(ForwardMode::Nat)
.build();
executor.ensure_network(net).await?;
let vm = VmConfig::builder("ubuntu-server")
.vcpus(2)
.memory_gb(4)
.disk(25)
.network(NetworkRef::named("ubuntu-net"))
.cdrom(UBUNTU_ISO)
.boot_order([BootDevice::Cdrom, BootDevice::Disk])
.build();
executor.ensure_vm(vm.clone()).await?;
executor.start_vm(&vm.name).await?;
info!(
"Ubuntu Server VM running. Connect: virsh console {}",
vm.name
);
info!("Follow the interactive installer to complete setup.");
Ok(())
}
// ── Worker: multi-disk for Ceph ─────────────────────────────────────────
async fn deploy_worker(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
let net = NetworkConfig::builder("worker-net")
.subnet("192.168.130.1", 24)
.forward(ForwardMode::Nat)
.build();
executor.ensure_network(net).await?;
let vm = VmConfig::builder("worker-node")
.vcpus(4)
.memory_gb(8)
.disk(60) // vda: OS
.disk(100) // vdb: Ceph OSD 1
.disk(100) // vdc: Ceph OSD 2
.network(NetworkRef::named("worker-net"))
.cdrom(ALPINE_ISO) // Use Alpine for fast testing
.boot_order([BootDevice::Cdrom, BootDevice::Disk])
.build();
executor.ensure_vm(vm.clone()).await?;
executor.start_vm(&vm.name).await?;
info!("Worker node running with 3 disks (vda=60G OS, vdb=100G OSD, vdc=100G OSD)");
info!("Connect: virsh console {}", vm.name);
Ok(())
}
// ── Gateway: dual-NIC firewall ──────────────────────────────────────────
async fn deploy_gateway(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
// WAN: NAT network (internet access)
let wan = NetworkConfig::builder("gw-wan")
.subnet("192.168.140.1", 24)
.forward(ForwardMode::Nat)
.build();
// LAN: isolated network (no internet, internal only)
let lan = NetworkConfig::builder("gw-lan")
.subnet("10.100.0.1", 24)
.isolated()
.build();
executor.ensure_network(wan).await?;
executor.ensure_network(lan).await?;
let vm = VmConfig::builder("gateway-vm")
.vcpus(2)
.memory_gb(2)
.disk(10)
.network(NetworkRef::named("gw-wan")) // First NIC = WAN
.network(NetworkRef::named("gw-lan")) // Second NIC = LAN
.cdrom(ALPINE_ISO)
.boot_order([BootDevice::Cdrom, BootDevice::Disk])
.build();
executor.ensure_vm(vm.clone()).await?;
executor.start_vm(&vm.name).await?;
info!("Gateway VM running with 2 NICs: WAN (gw-wan) + LAN (gw-lan)");
info!("Connect: virsh console {}", vm.name);
Ok(())
}
// ── HA Cluster: full OKD-style deployment ───────────────────────────────
async fn deploy_ha_cluster(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
// Network: NAT for external access, all nodes on the same subnet
let cluster_net = NetworkConfig::builder("ha-cluster")
.bridge("virbr-ha")
.subnet("10.200.0.1", 24)
.forward(ForwardMode::Nat)
.build();
executor.ensure_network(cluster_net).await?;
// Gateway / firewall / load balancer
let gateway = VmConfig::builder("ha-gateway")
.vcpus(2)
.memory_gb(2)
.disk(10)
.network(NetworkRef::named("ha-cluster"))
.boot_order([BootDevice::Network, BootDevice::Disk])
.build();
executor.ensure_vm(gateway.clone()).await?;
info!("Defined: {} (gateway/firewall)", gateway.name);
// Control plane nodes
for i in 1..=3 {
let cp = VmConfig::builder(format!("ha-cp-{i}"))
.vcpus(4)
.memory_gb(16)
.disk(120)
.network(NetworkRef::named("ha-cluster"))
.boot_order([BootDevice::Network, BootDevice::Disk])
.build();
executor.ensure_vm(cp.clone()).await?;
info!("Defined: {} (control plane)", cp.name);
}
// Worker nodes with Ceph storage
for i in 1..=3 {
let worker = VmConfig::builder(format!("ha-worker-{i}"))
.vcpus(8)
.memory_gb(32)
.disk(120) // vda: OS
.disk(200) // vdb: Ceph OSD
.network(NetworkRef::named("ha-cluster"))
.boot_order([BootDevice::Network, BootDevice::Disk])
.build();
executor.ensure_vm(worker.clone()).await?;
info!("Defined: {} (worker + Ceph)", worker.name);
}
info!("HA cluster defined (7 VMs). Start individually or use PXE boot.");
info!(
"To start all: for vm in ha-gateway ha-cp-{{1..3}} ha-worker-{{1..3}}; do virsh start $vm; done"
);
Ok(())
}
// ── Clean up ────────────────────────────────────────────────────────────
async fn clean(executor: &KvmExecutor, scenario: &str) -> Result<(), Box<dyn std::error::Error>> {
let (vms, nets) = match scenario {
"alpine" => (vec!["alpine-vm"], vec!["alpine-net"]),
"ubuntu" => (vec!["ubuntu-server"], vec!["ubuntu-net"]),
"worker" => (vec!["worker-node"], vec!["worker-net"]),
"gateway" => (vec!["gateway-vm"], vec!["gw-wan", "gw-lan"]),
"ha-cluster" => (
vec![
"ha-gateway",
"ha-cp-1",
"ha-cp-2",
"ha-cp-3",
"ha-worker-1",
"ha-worker-2",
"ha-worker-3",
],
vec!["ha-cluster"],
),
other => {
eprintln!("Unknown scenario: {other}");
eprintln!("Available: alpine, ubuntu, worker, gateway, ha-cluster");
std::process::exit(1);
}
};
for vm in &vms {
info!("Cleaning up VM: {vm}");
let _ = executor.destroy_vm(vm).await;
let _ = executor.undefine_vm(vm).await;
}
for net in &nets {
info!("Cleaning up network: {net}");
let _ = executor.delete_network(net).await;
}
info!("Cleanup complete for scenario: {scenario}");
Ok(())
}
// ── Status ──────────────────────────────────────────────────────────────
async fn status(executor: &KvmExecutor, scenario: &str) -> Result<(), Box<dyn std::error::Error>> {
let vms: Vec<&str> = match scenario {
"alpine" => vec!["alpine-vm"],
"ubuntu" => vec!["ubuntu-server"],
"worker" => vec!["worker-node"],
"gateway" => vec!["gateway-vm"],
"ha-cluster" => vec![
"ha-gateway",
"ha-cp-1",
"ha-cp-2",
"ha-cp-3",
"ha-worker-1",
"ha-worker-2",
"ha-worker-3",
],
other => {
eprintln!("Unknown scenario: {other}");
std::process::exit(1);
}
};
println!("{:<20} {}", "VM", "STATUS");
println!("{}", "-".repeat(35));
for vm in &vms {
let status = match executor.vm_status(vm).await {
Ok(s) => format!("{s:?}"),
Err(_) => "not found".to_string(),
};
println!("{:<20} {}", vm, status);
}
Ok(())
}

View File

@@ -1,7 +1,6 @@
use brocade::BrocadeOptions;
use cidr::Ipv4Cidr;
use harmony::{
config::secret::{OPNSenseApiCredentials, OPNSenseFirewallCredentials},
hardware::{Location, SwitchGroup},
infra::{
brocade::{BrocadeSwitchClient, BrocadeSwitchConfig},
@@ -12,12 +11,20 @@ use harmony::{
topology::{HAClusterTopology, LogicalHost, UnmanagedRouter},
};
use harmony_macros::{ip, ipv4};
use harmony_secret::SecretManager;
use harmony_secret::{Secret, SecretManager};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::{
net::IpAddr,
sync::{Arc, OnceLock},
};
#[derive(Secret, Serialize, Deserialize, JsonSchema, Debug, PartialEq)]
struct OPNSenseFirewallConfig {
username: String,
password: String,
}
pub async fn get_topology() -> HAClusterTopology {
let firewall = harmony::topology::LogicalHost {
ip: ip!("192.168.1.1"),
@@ -43,16 +50,17 @@ pub async fn get_topology() -> HAClusterTopology {
let switch_client = Arc::new(switch_client);
let ssh_creds = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>()
.await
.unwrap();
let api_creds = SecretManager::get_or_prompt::<OPNSenseApiCredentials>()
.await
.unwrap();
let config = SecretManager::get_or_prompt::<OPNSenseFirewallConfig>().await;
let config = config.unwrap();
let opnsense = Arc::new(
harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, &api_creds, &ssh_creds)
.await,
harmony::infra::opnsense::OPNSenseFirewall::new(
firewall,
None,
&config.username,
&config.password,
)
.await,
);
let lan_subnet = ipv4!("192.168.1.0");
let gateway_ipv4 = ipv4!("192.168.1.1");

View File

@@ -43,17 +43,17 @@ pub async fn get_topology() -> HAClusterTopology {
let switch_client = Arc::new(switch_client);
let ssh_creds = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>()
.await
.unwrap();
let api_creds =
SecretManager::get_or_prompt::<harmony::config::secret::OPNSenseApiCredentials>()
.await
.unwrap();
let config = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>().await;
let config = config.unwrap();
let opnsense = Arc::new(
harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, &api_creds, &ssh_creds)
.await,
harmony::infra::opnsense::OPNSenseFirewall::new(
firewall,
None,
&config.username,
&config.password,
)
.await,
);
let lan_subnet = ipv4!("192.168.1.0");
let gateway_ipv4 = ipv4!("192.168.1.1");

View File

@@ -1,29 +1,15 @@
# OPNsense Basic Example
## OPNSense demo
Demonstrates connecting to an existing OPNsense firewall and running Harmony Scores against it.
Download the virtualbox snapshot from {{TODO URL}}
## Prerequisites
Start the virtualbox image
- An OPNsense firewall accessible via SSH and REST API
- API key + secret created in OPNsense (System > Access > Users > API keys)
This virtualbox image is configured to use a bridge on the host's physical interface; make sure the bridge is up and the virtual machine can reach the internet.
## Usage
Credentials are the OPNsense defaults (root/opnsense)
Run the project with the correct IP address on the command line:
```bash
# Set credentials (or use env.sh)
export OPNSENSE_API_KEY=your_key
export OPNSENSE_API_SECRET=your_secret
# Run against a specific OPNsense host
cargo run -p example-opnsense -- 192.168.1.1
cargo run -p example-opnsense -- 192.168.5.229
```
## Scores applied
| Score | What it configures |
|-------|--------------------|
| `DhcpScore` | DHCP range and static host bindings via dnsmasq |
## For automated VM-based testing
See [`examples/opnsense_vm_integration/`](../opnsense_vm_integration/) which boots a fresh OPNsense VM via KVM, runs 11 Scores, and verifies idempotency automatically.

View File

@@ -1,5 +1,5 @@
use harmony::{
config::secret::{OPNSenseApiCredentials, OPNSenseFirewallCredentials},
config::secret::OPNSenseFirewallCredentials,
infra::opnsense::OPNSenseFirewall,
inventory::Inventory,
modules::{dhcp::DhcpScore, opnsense::OPNsenseShellCommandScore},
@@ -17,14 +17,17 @@ async fn main() {
name: String::from("opnsense-1"),
};
let ssh_creds = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>()
let opnsense_auth = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>()
.await
.expect("Failed to get SSH credentials");
let api_creds = SecretManager::get_or_prompt::<OPNSenseApiCredentials>()
.await
.expect("Failed to get API credentials");
.expect("Failed to get credentials");
let opnsense = OPNSenseFirewall::new(firewall, None, &api_creds, &ssh_creds).await;
let opnsense = OPNSenseFirewall::new(
firewall,
None,
&opnsense_auth.username,
&opnsense_auth.password,
)
.await;
let dhcp_score = DhcpScore {
dhcp_range: (

View File

@@ -48,17 +48,8 @@ async fn main() {
name: String::from("fw0"),
};
let api_creds = harmony::config::secret::OPNSenseApiCredentials {
key: "root".to_string(),
secret: "opnsense".to_string(),
};
let ssh_creds = harmony::config::secret::OPNSenseFirewallCredentials {
username: "root".to_string(),
password: "opnsense".to_string(),
};
let opnsense = Arc::new(
harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, &api_creds, &ssh_creds)
.await,
harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, "root", "opnsense").await,
);
let topology = OpnSenseTopology {

View File

@@ -1,25 +0,0 @@
[package]
name = "opnsense-pair-integration"
version.workspace = true
edition = "2024"
license.workspace = true
[[bin]]
name = "opnsense-pair-integration"
path = "src/main.rs"
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_inventory_agent = { path = "../../harmony_inventory_agent" }
harmony_macros = { path = "../../harmony_macros" }
harmony_types = { path = "../../harmony_types" }
opnsense-api = { path = "../../opnsense-api" }
opnsense-config = { path = "../../opnsense-config" }
tokio.workspace = true
log.workspace = true
env_logger.workspace = true
reqwest.workspace = true
russh.workspace = true
serde_json.workspace = true
dirs = "6"

View File

@@ -1,64 +0,0 @@
# OPNsense Firewall Pair Integration Example
Boots two OPNsense VMs, bootstraps both with automated SSH/API setup, then configures a CARP HA firewall pair using `FirewallPairTopology` and `CarpVipScore`. Fully automated, CI-friendly.
## Quick start
```bash
# Prerequisites (same as single-VM example)
./examples/opnsense_vm_integration/setup-libvirt.sh
# Boot + bootstrap + pair test (fully unattended)
cargo run -p opnsense-pair-integration -- --full
```
## What it does
1. Creates a shared LAN network + 2 OPNsense VMs (2 NICs each: LAN + WAN)
2. Bootstraps both VMs sequentially using NIC link control to avoid IP conflicts:
- Disables backup's LAN NIC
- Bootstraps primary on .1 (login, SSH, webgui port 9443)
- Changes primary's LAN IP from .1 to .2
- Swaps NICs (disable primary, enable backup)
- Bootstraps backup on .1
- Changes backup's LAN IP from .1 to .3
- Re-enables all NICs
3. Applies pair scores via `FirewallPairTopology`:
- `CarpVipScore` — CARP VIP at .1 (primary advskew=0, backup advskew=100)
- `VlanScore` — VLAN 100 on both
- `FirewallRuleScore` — ICMP allow on both
4. Verifies CARP VIPs and VLANs via REST API on both firewalls
## Network topology
```
Host (192.168.1.10)
|
+--- virbr-pair (192.168.1.0/24, NAT)
| | |
| fw-primary fw-backup
| vtnet0=.2 vtnet0=.3
| (CARP VIP: .1)
|
+--- virbr0 (default, DHCP)
| |
fw-primary fw-backup
vtnet1=dhcp vtnet1=dhcp (WAN)
```
Both VMs boot with OPNsense's default LAN IP of 192.168.1.1. The NIC juggling sequence ensures only one VM has its LAN NIC active at a time during bootstrap, avoiding address conflicts.
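Under the hood, the executor's `set_interface_link` calls correspond roughly to libvirt's `virsh domif-setlink`. A dry-run sketch of the juggling sequence (the MAC addresses are placeholders — the real ones are read via `list_interfaces()`; `VIRSH` defaults to `echo virsh` so the commands are printed rather than executed):

```bash
# Dry-run by default: set VIRSH="virsh -c qemu:///system" to run for real.
VIRSH=${VIRSH:-"echo virsh"}
PRIMARY_MAC="52:54:00:aa:bb:01"   # placeholder MAC
BACKUP_MAC="52:54:00:aa:bb:02"    # placeholder MAC

$VIRSH domif-setlink opn-pair-backup "$BACKUP_MAC" down    # backup NIC off; primary bootstraps on .1
# ...bootstrap primary, move its LAN IP to .2...
$VIRSH domif-setlink opn-pair-primary "$PRIMARY_MAC" down  # swap: primary NIC off,
$VIRSH domif-setlink opn-pair-backup "$BACKUP_MAC" up      # backup bootstraps on .1
# ...bootstrap backup, move its LAN IP to .3...
$VIRSH domif-setlink opn-pair-primary "$PRIMARY_MAC" up    # both LAN NICs back up
```

Only one VM ever has its LAN link up while still holding the default .1 address, which is what prevents the conflict.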
## Requirements
Same as the single-VM example: Linux with KVM, libvirt, ~20 GB disk space, ~20 minutes first run.
## Commands
| Command | Description |
|---------|-------------|
| `--check` | Verify prerequisites |
| `--boot` | Boot + bootstrap both VMs |
| (default) | Run pair integration test |
| `--full` | Boot + bootstrap + test (CI mode) |
| `--status` | Show both VMs' status |
| `--clean` | Destroy both VMs and networks |

View File

@@ -1,690 +0,0 @@
//! OPNsense firewall pair integration example.
//!
//! Boots two OPNsense VMs, bootstraps both (login, SSH, webgui port),
//! then applies `FirewallPairTopology` + `CarpVipScore` for CARP HA testing.
//!
//! Both VMs share a LAN bridge but boot with the same default IP (.1).
//! The bootstrap sequence disables one VM's LAN NIC while bootstrapping
//! the other, then changes IPs via the API to avoid conflicts.
//!
//! # Usage
//!
//! ```bash
//! cargo run -p opnsense-pair-integration -- --check # verify prerequisites
//! cargo run -p opnsense-pair-integration -- --boot # boot + bootstrap both VMs
//! cargo run -p opnsense-pair-integration # run pair integration test
//! cargo run -p opnsense-pair-integration -- --full # boot + bootstrap + test (CI mode)
//! cargo run -p opnsense-pair-integration -- --status # check both VMs
//! cargo run -p opnsense-pair-integration -- --clean # tear down everything
//! ```
use std::net::IpAddr;
use std::path::{Path, PathBuf};
use std::sync::Arc;
use harmony::config::secret::{OPNSenseApiCredentials, OPNSenseFirewallCredentials};
use harmony::infra::opnsense::OPNSenseFirewall;
use harmony::inventory::Inventory;
use harmony::modules::kvm::config::init_executor;
use harmony::modules::kvm::{
BootDevice, ForwardMode, KvmExecutor, NetworkConfig, NetworkRef, VmConfig,
};
use harmony::modules::opnsense::bootstrap::OPNsenseBootstrap;
use harmony::modules::opnsense::firewall::{FilterRuleDef, FirewallRuleScore};
use harmony::modules::opnsense::vip::VipDef;
use harmony::modules::opnsense::vlan::{VlanDef, VlanScore};
use harmony::score::Score;
use harmony::topology::{CarpVipScore, FirewallPairTopology, LogicalHost};
use harmony_types::firewall::{Direction, FirewallAction, IpProtocol, NetworkProtocol, VipMode};
use log::info;
const OPNSENSE_IMG_URL: &str =
"https://mirror.ams1.nl.leaseweb.net/opnsense/releases/26.1/OPNsense-26.1-nano-amd64.img.bz2";
const OPNSENSE_IMG_NAME: &str = "OPNsense-26.1-nano-amd64.img";
const VM_PRIMARY: &str = "opn-pair-primary";
const VM_BACKUP: &str = "opn-pair-backup";
const NET_LAN: &str = "opn-pair-lan";
/// Both VMs boot on this IP (OPNsense default, ignores injected config.xml).
/// We bootstrap one at a time by toggling LAN NICs, then change IPs via the API.
const BOOT_IP: &str = "192.168.1.1";
const HOST_IP: &str = "192.168.1.10";
/// After bootstrap, primary gets .2, backup gets .3, CARP VIP stays at .1
const PRIMARY_IP: &str = "192.168.1.2";
const BACKUP_IP: &str = "192.168.1.3";
const CARP_VIP: &str = "192.168.1.1";
const API_PORT: u16 = 9443;
const CARP_PASSWORD: &str = "pair-test-carp";
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
harmony_cli::cli_logger::init();
let args: Vec<String> = std::env::args().collect();
if args.iter().any(|a| a == "--check") {
return check_prerequisites();
}
if args.iter().any(|a| a == "--download") {
download_image().await?;
return Ok(());
}
let executor = init_executor()?;
if args.iter().any(|a| a == "--clean") {
return clean(&executor).await;
}
if args.iter().any(|a| a == "--status") {
return status(&executor).await;
}
if args.iter().any(|a| a == "--boot") {
let img_path = download_image().await?;
return boot_pair(&executor, &img_path).await;
}
if args.iter().any(|a| a == "--full") {
let img_path = download_image().await?;
boot_pair(&executor, &img_path).await?;
return run_pair_test().await;
}
// Default: run pair test (assumes VMs are bootstrapped)
check_prerequisites()?;
run_pair_test().await
}
// ── Phase 1: Boot and bootstrap both VMs ───────────────────────────
async fn boot_pair(
executor: &KvmExecutor,
img_path: &Path,
) -> Result<(), Box<dyn std::error::Error>> {
info!("Creating shared LAN network and two OPNsense VMs...");
// Create the shared LAN network
let network = NetworkConfig::builder(NET_LAN)
.bridge("virbr-pair")
.subnet(HOST_IP, 24)
.forward(ForwardMode::Nat)
.build();
executor.ensure_network(network).await?;
// Prepare disk images for both VMs
for vm_name in [VM_PRIMARY, VM_BACKUP] {
prepare_vm_disk(vm_name, img_path)?;
}
// Define and start both VMs (2 NICs each: LAN + WAN)
for vm_name in [VM_PRIMARY, VM_BACKUP] {
let disk = image_dir().join(format!("{vm_name}.qcow2"));
let vm = VmConfig::builder(vm_name)
.vcpus(1)
.memory_mib(1024)
.disk_from_path(disk.to_string_lossy().to_string())
.network(NetworkRef::named(NET_LAN)) // vtnet0 = LAN
.network(NetworkRef::named("default")) // vtnet1 = WAN
.boot_order([BootDevice::Disk])
.build();
executor.ensure_vm(vm).await?;
executor.start_vm(vm_name).await?;
}
// Get MAC addresses for LAN NICs (first interface on each VM)
let primary_interfaces = executor.list_interfaces(VM_PRIMARY).await?;
let backup_interfaces = executor.list_interfaces(VM_BACKUP).await?;
let primary_lan_mac = &primary_interfaces[0].mac;
let backup_lan_mac = &backup_interfaces[0].mac;
info!("Primary LAN MAC: {primary_lan_mac}, Backup LAN MAC: {backup_lan_mac}");
// ── Sequential bootstrap with NIC juggling ─────────────────────
//
// Both VMs boot on .1 (OPNsense default). We disable backup's LAN
// NIC so primary gets exclusive access to .1, bootstrap it, change
// its IP, then do the same for backup.
// Step 1: Disable backup's LAN NIC
info!("Disabling backup LAN NIC for primary bootstrap...");
executor
.set_interface_link(VM_BACKUP, backup_lan_mac, false)
.await?;
// Step 2: Wait for primary web UI and bootstrap
info!("Waiting for primary web UI at https://{BOOT_IP}...");
wait_for_https(BOOT_IP, 443).await?;
bootstrap_vm("primary", BOOT_IP).await?;
// Step 3: Change primary's LAN IP from .1 to .2 via API
info!("Changing primary LAN IP to {PRIMARY_IP}...");
change_lan_ip_via_ssh(BOOT_IP, PRIMARY_IP, 24).await?;
// Step 4: Wait for primary to come back on new IP
info!("Waiting for primary on new IP {PRIMARY_IP}:{API_PORT}...");
OPNsenseBootstrap::wait_for_ready(
&format!("https://{PRIMARY_IP}:{API_PORT}"),
std::time::Duration::from_secs(60),
)
.await?;
// Step 5: Disable primary's LAN NIC, enable backup's
info!("Swapping NICs: disabling primary, enabling backup...");
executor
.set_interface_link(VM_PRIMARY, primary_lan_mac, false)
.await?;
executor
.set_interface_link(VM_BACKUP, backup_lan_mac, true)
.await?;
// Step 6: Wait for backup web UI and bootstrap
info!("Waiting for backup web UI at https://{BOOT_IP}...");
wait_for_https(BOOT_IP, 443).await?;
bootstrap_vm("backup", BOOT_IP).await?;
// Step 7: Change backup's LAN IP from .1 to .3 via API
info!("Changing backup LAN IP to {BACKUP_IP}...");
change_lan_ip_via_ssh(BOOT_IP, BACKUP_IP, 24).await?;
// Step 8: Re-enable primary's LAN NIC
info!("Re-enabling primary LAN NIC...");
executor
.set_interface_link(VM_PRIMARY, primary_lan_mac, true)
.await?;
// Step 9: Wait for both to be reachable on their final IPs
info!("Waiting for both VMs on final IPs...");
OPNsenseBootstrap::wait_for_ready(
&format!("https://{PRIMARY_IP}:{API_PORT}"),
std::time::Duration::from_secs(60),
)
.await?;
OPNsenseBootstrap::wait_for_ready(
&format!("https://{BACKUP_IP}:{API_PORT}"),
std::time::Duration::from_secs(60),
)
.await?;
println!();
println!("OPNsense firewall pair is running and bootstrapped:");
println!(" Primary: https://{PRIMARY_IP}:{API_PORT} (root/opnsense)");
println!(" Backup: https://{BACKUP_IP}:{API_PORT} (root/opnsense)");
println!(" CARP VIP: {CARP_VIP} (will be configured by pair scores)");
println!();
println!("Run the pair integration test:");
println!(" cargo run -p opnsense-pair-integration");
Ok(())
}
async fn bootstrap_vm(role: &str, ip: &str) -> Result<(), Box<dyn std::error::Error>> {
info!("Bootstrapping {role} firewall at {ip}...");
let bootstrap = OPNsenseBootstrap::new(&format!("https://{ip}"));
bootstrap.login("root", "opnsense").await?;
bootstrap.abort_wizard().await?;
bootstrap.enable_ssh(true, true).await?;
bootstrap.set_webgui_port(API_PORT, ip, false).await?;
// Wait for webgui on new port
OPNsenseBootstrap::wait_for_ready(
&format!("https://{ip}:{API_PORT}"),
std::time::Duration::from_secs(120),
)
.await?;
// Verify SSH
for _ in 0..15 {
if check_tcp_port(ip, 22).await {
break;
}
tokio::time::sleep(std::time::Duration::from_secs(2)).await;
}
if !check_tcp_port(ip, 22).await {
return Err(format!("SSH not reachable on {role} after bootstrap").into());
}
info!("{role} bootstrap complete");
Ok(())
}
/// Change the LAN interface IP via SSH (using OPNsense's ifconfig + config edit).
async fn change_lan_ip_via_ssh(
current_ip: &str,
new_ip: &str,
subnet: u8,
) -> Result<(), Box<dyn std::error::Error>> {
use opnsense_config::config::{OPNsenseShell, SshCredentials, SshOPNSenseShell};
let ssh_config = Arc::new(russh::client::Config {
inactivity_timeout: None,
..<_>::default()
});
let credentials = SshCredentials::Password {
username: "root".to_string(),
password: "opnsense".to_string(),
};
let ip: IpAddr = current_ip.parse()?;
let shell = SshOPNSenseShell::new((ip, 22), credentials, ssh_config);
// Use a PHP script to update config.xml and apply
let php_script = format!(
r#"<?php
require_once '/usr/local/etc/inc/config.inc';
$config = OPNsense\Core\Config::getInstance();
$config->object()->interfaces->lan->ipaddr = '{new_ip}';
$config->object()->interfaces->lan->subnet = '{subnet}';
$config->save();
echo "OK\n";
"#
);
shell
.write_content_to_file(&php_script, "/tmp/change_ip.php")
.await?;
let output = shell
.exec("php /tmp/change_ip.php && rm /tmp/change_ip.php && configctl interface reconfigure lan")
.await?;
info!("IP change result: {}", output.trim());
Ok(())
}
// ── Phase 2: Pair integration test ─────────────────────────────────
async fn run_pair_test() -> Result<(), Box<dyn std::error::Error>> {
// Verify both VMs are reachable
info!("Checking primary at {PRIMARY_IP}:{API_PORT}...");
if !check_tcp_port(PRIMARY_IP, API_PORT).await {
return Err(format!("Primary not reachable at {PRIMARY_IP}:{API_PORT}").into());
}
info!("Checking backup at {BACKUP_IP}:{API_PORT}...");
if !check_tcp_port(BACKUP_IP, API_PORT).await {
return Err(format!("Backup not reachable at {BACKUP_IP}:{API_PORT}").into());
}
// Create API keys on both
info!("Creating API keys...");
let primary_ip: IpAddr = PRIMARY_IP.parse()?;
let backup_ip: IpAddr = BACKUP_IP.parse()?;
let (primary_key, primary_secret) = create_api_key_ssh(&primary_ip).await?;
let (backup_key, backup_secret) = create_api_key_ssh(&backup_ip).await?;
info!("API keys created for both firewalls");
// Build FirewallPairTopology
let primary_host = LogicalHost {
ip: primary_ip.into(),
name: VM_PRIMARY.to_string(),
};
let backup_host = LogicalHost {
ip: backup_ip.into(),
name: VM_BACKUP.to_string(),
};
let primary_api_creds = OPNSenseApiCredentials {
key: primary_key.clone(),
secret: primary_secret.clone(),
};
let backup_api_creds = OPNSenseApiCredentials {
key: backup_key.clone(),
secret: backup_secret.clone(),
};
let ssh_creds = OPNSenseFirewallCredentials {
username: "root".to_string(),
password: "opnsense".to_string(),
};
let primary_fw = OPNSenseFirewall::with_api_port(
primary_host,
None,
API_PORT,
&primary_api_creds,
&ssh_creds,
)
.await;
let backup_fw =
OPNSenseFirewall::with_api_port(backup_host, None, API_PORT, &backup_api_creds, &ssh_creds)
.await;
let pair = FirewallPairTopology {
primary: primary_fw,
backup: backup_fw,
};
// Build pair scores
let carp_score = CarpVipScore {
vips: vec![VipDef {
mode: VipMode::Carp,
interface: "lan".to_string(),
subnet: CARP_VIP.to_string(),
subnet_bits: 24,
vhid: Some(1),
advbase: Some(1),
advskew: None, // handled by CarpVipScore (primary=0, backup=100)
password: Some(CARP_PASSWORD.to_string()),
peer: None,
}],
backup_advskew: Some(100),
};
let vlan_score = VlanScore {
vlans: vec![VlanDef {
parent_interface: "vtnet0".to_string(),
tag: 100,
description: "pair-test-vlan-100".to_string(),
}],
};
let fw_rule_score = FirewallRuleScore {
rules: vec![FilterRuleDef {
action: FirewallAction::Pass,
direction: Direction::In,
interface: "lan".to_string(),
ip_protocol: IpProtocol::Inet,
protocol: NetworkProtocol::Icmp,
source_net: "any".to_string(),
destination_net: "any".to_string(),
destination_port: None,
gateway: None,
description: "pair-test-allow-icmp".to_string(),
log: false,
}],
};
// Run pair scores
info!("Running pair scores...");
let scores: Vec<Box<dyn Score<FirewallPairTopology>>> = vec![
Box::new(carp_score),
Box::new(vlan_score),
Box::new(fw_rule_score),
];
let args = harmony_cli::Args {
yes: true,
filter: None,
interactive: false,
all: true,
number: 0,
list: false,
};
harmony_cli::run_cli(Inventory::autoload(), pair, scores, args).await?;
// Verify CARP VIPs via API
info!("Verifying CARP VIPs...");
let primary_client = opnsense_api::OpnsenseClient::builder()
.base_url(format!("https://{PRIMARY_IP}:{API_PORT}/api"))
.auth_from_key_secret(&primary_key, &primary_secret)
.skip_tls_verify()
.timeout_secs(60)
.build()?;
let backup_client = opnsense_api::OpnsenseClient::builder()
.base_url(format!("https://{BACKUP_IP}:{API_PORT}/api"))
.auth_from_key_secret(&backup_key, &backup_secret)
.skip_tls_verify()
.timeout_secs(60)
.build()?;
let primary_vips: serde_json::Value = primary_client
.get_typed("interfaces", "vip_settings", "searchItem")
.await?;
let backup_vips: serde_json::Value = backup_client
.get_typed("interfaces", "vip_settings", "searchItem")
.await?;
let primary_vip_count = primary_vips["rowCount"].as_i64().unwrap_or(0);
let backup_vip_count = backup_vips["rowCount"].as_i64().unwrap_or(0);
info!(" Primary VIPs: {primary_vip_count}");
info!(" Backup VIPs: {backup_vip_count}");
assert!(primary_vip_count >= 1, "Primary should have at least 1 VIP");
assert!(backup_vip_count >= 1, "Backup should have at least 1 VIP");
// Verify VLANs on both
let primary_vlans: serde_json::Value = primary_client
.get_typed("interfaces", "vlan_settings", "get")
.await?;
let backup_vlans: serde_json::Value = backup_client
.get_typed("interfaces", "vlan_settings", "get")
.await?;
let p_vlan_count = primary_vlans["vlan"]["vlan"]
.as_object()
.map(|m| m.len())
.unwrap_or(0);
let b_vlan_count = backup_vlans["vlan"]["vlan"]
.as_object()
.map(|m| m.len())
.unwrap_or(0);
info!(" Primary VLANs: {p_vlan_count}");
info!(" Backup VLANs: {b_vlan_count}");
assert!(p_vlan_count >= 1, "Primary should have at least 1 VLAN");
assert!(b_vlan_count >= 1, "Backup should have at least 1 VLAN");
println!();
println!("PASSED - OPNsense firewall pair integration test:");
println!(
" - CarpVipScore: CARP VIP {CARP_VIP} on both (primary advskew=0, backup advskew=100)"
);
println!(" - VlanScore: VLAN 100 on both");
println!(" - FirewallRuleScore: ICMP allow on both");
println!();
println!("VMs are running. Use --clean to tear down.");
Ok(())
}
// ── Helpers ────────────────────────────────────────────────────────
fn prepare_vm_disk(vm_name: &str, img_path: &Path) -> Result<(), Box<dyn std::error::Error>> {
let vm_raw = image_dir().join(format!("{vm_name}.img"));
if !vm_raw.exists() {
info!("Copying nano image for {vm_name}...");
std::fs::copy(img_path, &vm_raw)?;
info!("Injecting config.xml for {vm_name}...");
let config =
harmony::modules::opnsense::image::minimal_config_xml("vtnet1", "vtnet0", BOOT_IP, 24);
harmony::modules::opnsense::image::replace_config_xml(&vm_raw, &config)?;
}
let vm_disk = image_dir().join(format!("{vm_name}.qcow2"));
if !vm_disk.exists() {
info!("Converting {vm_name} to qcow2...");
run_cmd(
"qemu-img",
&[
"convert",
"-f",
"raw",
"-O",
"qcow2",
&vm_raw.to_string_lossy(),
&vm_disk.to_string_lossy(),
],
)?;
run_cmd("qemu-img", &["resize", &vm_disk.to_string_lossy(), "4G"])?;
}
Ok(())
}
fn check_prerequisites() -> Result<(), Box<dyn std::error::Error>> {
let mut ok = true;
for (cmd, test_args) in [
("virsh", vec!["-c", "qemu:///system", "version"]),
("qemu-img", vec!["--version"]),
("bunzip2", vec!["--help"]),
] {
match std::process::Command::new(cmd).args(&test_args).output() {
Ok(out) if out.status.success() => println!("[ok] {cmd}"),
_ => {
println!("[FAIL] {cmd}");
ok = false;
}
}
}
if !ok {
return Err("Prerequisites not met".into());
}
println!("All prerequisites met.");
Ok(())
}
fn run_cmd(cmd: &str, args: &[&str]) -> Result<(), Box<dyn std::error::Error>> {
let status = std::process::Command::new(cmd).args(args).status()?;
if !status.success() {
return Err(format!("{cmd} failed").into());
}
Ok(())
}
fn image_dir() -> PathBuf {
let dir = std::env::var("HARMONY_KVM_IMAGE_DIR").unwrap_or_else(|_| {
dirs::data_dir()
.unwrap_or_else(|| PathBuf::from("/tmp"))
.join("harmony")
.join("kvm")
.join("images")
.to_string_lossy()
.to_string()
});
PathBuf::from(dir)
}
async fn download_image() -> Result<PathBuf, Box<dyn std::error::Error>> {
let dir = image_dir();
std::fs::create_dir_all(&dir)?;
let img_path = dir.join(OPNSENSE_IMG_NAME);
if img_path.exists() {
info!("Image cached: {}", img_path.display());
return Ok(img_path);
}
let bz2_path = dir.join(format!("{OPNSENSE_IMG_NAME}.bz2"));
if !bz2_path.exists() {
info!("Downloading OPNsense nano image (~350MB)...");
let response = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(600))
.build()?
.get(OPNSENSE_IMG_URL)
.send()
.await?;
if !response.status().is_success() {
return Err(format!("Download failed: HTTP {}", response.status()).into());
}
let bytes = response.bytes().await?;
std::fs::write(&bz2_path, &bytes)?;
}
info!("Decompressing...");
run_cmd("bunzip2", &["--keep", &bz2_path.to_string_lossy()])?;
Ok(img_path)
}
async fn clean(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
info!("Cleaning up pair integration...");
for vm_name in [VM_PRIMARY, VM_BACKUP] {
let _ = executor.destroy_vm(vm_name).await;
let _ = executor.undefine_vm(vm_name).await;
for ext in ["img", "qcow2"] {
let path = image_dir().join(format!("{vm_name}.{ext}"));
if path.exists() {
std::fs::remove_file(&path)?;
info!("Removed: {}", path.display());
}
}
}
let _ = executor.delete_network(NET_LAN).await;
info!("Done.");
Ok(())
}
async fn status(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
for (vm_name, ip) in [(VM_PRIMARY, PRIMARY_IP), (VM_BACKUP, BACKUP_IP)] {
match executor.vm_status(vm_name).await {
Ok(s) => {
let api = check_tcp_port(ip, API_PORT).await;
let ssh = check_tcp_port(ip, 22).await;
println!("{vm_name}: {s:?}");
println!(" LAN IP: {ip}");
println!(
" API: {}",
if api { "responding" } else { "not responding" }
);
println!(
" SSH: {}",
if ssh { "responding" } else { "not responding" }
);
}
Err(_) => println!("{vm_name}: not found"),
}
}
Ok(())
}
async fn wait_for_https(ip: &str, port: u16) -> Result<(), Box<dyn std::error::Error>> {
let client = reqwest::Client::builder()
.danger_accept_invalid_certs(true)
.timeout(std::time::Duration::from_secs(5))
.build()?;
let url = format!("https://{ip}:{port}");
for i in 0..60 {
if client.get(&url).send().await.is_ok() {
info!("Web UI responding at {url} (attempt {i})");
return Ok(());
}
if i % 10 == 0 {
info!("Waiting for {url}... (attempt {i})");
}
tokio::time::sleep(std::time::Duration::from_secs(5)).await;
}
Err(format!("{url} did not respond within 5 minutes").into())
}
async fn check_tcp_port(ip: &str, port: u16) -> bool {
tokio::time::timeout(
std::time::Duration::from_secs(3),
tokio::net::TcpStream::connect(format!("{ip}:{port}")),
)
.await
.map(|r| r.is_ok())
.unwrap_or(false)
}
async fn create_api_key_ssh(ip: &IpAddr) -> Result<(String, String), Box<dyn std::error::Error>> {
use opnsense_config::config::{OPNsenseShell, SshCredentials, SshOPNSenseShell};
let ssh_config = Arc::new(russh::client::Config {
inactivity_timeout: None,
..<_>::default()
});
let credentials = SshCredentials::Password {
username: "root".to_string(),
password: "opnsense".to_string(),
};
let shell = SshOPNSenseShell::new((*ip, 22), credentials, ssh_config);
let php_script = r#"<?php
require_once '/usr/local/etc/inc/config.inc';
$key = bin2hex(random_bytes(20));
$secret = bin2hex(random_bytes(40));
$config = OPNsense\Core\Config::getInstance();
foreach ($config->object()->system->user as $user) {
if ((string)$user->name === 'root') {
if (!isset($user->apikeys)) { $user->addChild('apikeys'); }
$item = $user->apikeys->addChild('item');
$item->addChild('key', $key);
$item->addChild('secret', crypt($secret, '$6$' . bin2hex(random_bytes(8)) . '$'));
$config->save();
echo $key . "\n" . $secret . "\n";
exit(0);
}
}
echo "ERROR: root user not found\n";
exit(1);
"#;
shell
.write_content_to_file(php_script, "/tmp/create_api_key.php")
.await?;
let output = shell
.exec("php /tmp/create_api_key.php && rm /tmp/create_api_key.php")
.await?;
let lines: Vec<&str> = output.trim().lines().collect();
if lines.len() >= 2 && !lines[0].starts_with("ERROR") {
Ok((lines[0].to_string(), lines[1].to_string()))
} else {
Err(format!("API key creation failed: {output}").into())
}
}

View File

@@ -1,25 +0,0 @@
[package]
name = "opnsense-vm-integration"
version.workspace = true
edition = "2024"
license.workspace = true
[[bin]]
name = "opnsense-vm-integration"
path = "src/main.rs"
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_inventory_agent = { path = "../../harmony_inventory_agent" }
harmony_macros = { path = "../../harmony_macros" }
harmony_types = { path = "../../harmony_types" }
opnsense-api = { path = "../../opnsense-api" }
opnsense-config = { path = "../../opnsense-config" }
tokio.workspace = true
log.workspace = true
env_logger.workspace = true
reqwest.workspace = true
russh.workspace = true
serde_json.workspace = true
dirs = "6"

View File

@@ -1,160 +0,0 @@
# OPNsense VM Integration Example
Fully automated end-to-end integration test: boots an OPNsense VM via KVM, bootstraps SSH and API access without any manual browser interaction, installs packages, runs 11 Harmony Scores, and verifies idempotency (runs all Scores twice, asserts zero duplicates). CI-friendly.
## Quick start
```bash
# 1. One-time setup (libvirt, Docker compatibility)
./examples/opnsense_vm_integration/setup-libvirt.sh
# 2. Verify prerequisites
cargo run -p opnsense-vm-integration -- --check
# 3. Boot + bootstrap + integration test (fully unattended)
cargo run -p opnsense-vm-integration -- --full
# 4. Clean up
cargo run -p opnsense-vm-integration -- --clean
```
Or use the build script:
```bash
./build/opnsense-e2e.sh # check + boot + test
./build/opnsense-e2e.sh --download # download image first
./build/opnsense-e2e.sh --clean # tear down
```
That's it. No browser clicks, no manual SSH setup, no wizard interaction.
## What happens during `--full`
1. Downloads the OPNsense 26.1 nano image (~350 MB, cached after the first download)
2. Injects `config.xml` with virtio interface assignments (vtnet0=LAN, vtnet1=WAN)
3. Creates a 4 GiB qcow2 disk and boots via KVM (1 vCPU, 1GB RAM, 4 NICs)
4. Waits for web UI to respond (~20s)
5. **Automated bootstrap** via `OPNsenseBootstrap`:
- Logs in (root/opnsense) with CSRF token handling
- Aborts the initial setup wizard
- Enables SSH with root login and password auth
- Changes web GUI port to 9443 (avoids HAProxy conflicts)
- Restarts lighttpd via SSH to apply the port change
6. Creates OPNsense API key via SSH (PHP script)
7. Installs `os-haproxy` via firmware API
8. Runs 11 Scores configuring the entire firewall
9. Verifies all configurations via REST API assertions
10. **Idempotency test**: runs all 11 Scores again, asserts entity counts are unchanged
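Step 4's readiness wait is a plain poll loop. A shell sketch of the same logic (`probe` is a hypothetical stand-in for the real HTTPS check, e.g. `curl -ksf --max-time 5 https://192.168.1.1:9443`):

```shell
# probe() stands in for the real HTTPS check against the web UI;
# here it succeeds immediately so the sketch is self-contained.
probe() { return 0; }

ready=false
for i in $(seq 1 60); do
  if probe; then
    echo "web UI responding (attempt $i)"
    ready=true
    break
  fi
  sleep 5
done
$ready || { echo "web UI did not respond within 5 minutes" >&2; exit 1; }
```

The Rust implementation (`wait_for_https`) follows the same shape: 60 attempts, 5 seconds apart, for a 5-minute ceiling.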
## Step-by-step mode
If you prefer to separate boot and test:
```bash
# Boot + bootstrap (creates VM, enables SSH, sets port)
cargo run -p opnsense-vm-integration -- --boot
# Run integration test (assumes VM is bootstrapped)
cargo run -p opnsense-vm-integration
# Check VM status at any time
cargo run -p opnsense-vm-integration -- --status
```
## Prerequisites
### System requirements
- **Linux** with KVM support (Intel VT-x/AMD-V)
- **~10 GB** free disk space
- **~15 minutes** for first run (image download + firmware update)
- Subsequent runs: ~2 minutes
### Required packages
**Arch/Manjaro:**
```bash
sudo pacman -S libvirt qemu-full dnsmasq
```
**Fedora:**
```bash
sudo dnf install libvirt qemu-kvm dnsmasq
```
**Ubuntu/Debian:**
```bash
sudo apt install libvirt-daemon-system qemu-kvm dnsmasq
```
### Automated setup
```bash
./examples/opnsense_vm_integration/setup-libvirt.sh
```
This handles user group membership, libvirtd startup, the default storage pool, and the Docker FORWARD policy conflict.
After running setup, apply group membership:
```bash
newgrp libvirt
```
### Docker + libvirt compatibility
Docker sets the iptables FORWARD policy to DROP, which blocks libvirt's NAT networking. The setup script detects this and switches libvirt to the iptables firewall backend so both coexist.
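To see which backend libvirt is currently using before running the script (the config path matches the one the setup script edits; no `firewall_backend` line means the nftables default):

```shell
# Read libvirt's configured firewall backend. If the config file is
# missing or has no firewall_backend line, libvirt uses nftables.
NETCONF=/etc/libvirt/network.conf
if grep -q '^firewall_backend *= *"iptables"' "$NETCONF" 2>/dev/null; then
  backend=iptables
else
  backend="nftables (default)"
fi
echo "libvirt firewall backend: $backend"
```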
## Scores applied
| # | Score | What it configures |
|---|-------|--------------------|
| 1 | `LoadBalancerScore` | HAProxy with 2 frontends, backends with TCP health checks |
| 2 | `DhcpScore` | DHCP range, 2 static host bindings, PXE boot options |
| 3 | `TftpScore` | TFTP server serving boot files |
| 4 | `NodeExporterScore` | Prometheus node exporter |
| 5 | `VlanScore` | 2 VLANs (tags 100, 200) on vtnet0 |
| 6 | `FirewallRuleScore` | Firewall filter rules with logging |
| 7 | `OutboundNatScore` | Source NAT for outbound traffic |
| 8 | `BinatScore` | Bidirectional 1:1 NAT |
| 9 | `VipScore` | Virtual IPs (IP aliases) |
| 10 | `DnatScore` | Port forwarding rules |
| 11 | `LaggScore` | Link aggregation (vtnet2+vtnet3) |
All Scores are idempotent: the test runs them twice and asserts entity counts are unchanged. This catches duplicate creation bugs that mock tests cannot detect.
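The idempotency assertion itself is conceptually simple and can be sketched in shell; `count_entities` is a hypothetical stand-in for the REST API query the test performs after each pass:

```shell
# Hypothetical stand-in for querying OPNsense for the number of
# configured entities (VLANs, rules, VIPs, ...) via the REST API.
count_entities() { echo 5; }

before=$(count_entities)
# ... apply all 11 Scores a second time here ...
after=$(count_entities)

if [ "$before" -eq "$after" ]; then
  echo "idempotent: $before entities before and after"
else
  echo "duplicate creation detected: $before -> $after" >&2
  exit 1
fi
```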
## Network architecture
```
Host (192.168.1.10) --- virbr-opn bridge --- OPNsense LAN (192.168.1.1)
                        192.168.1.0/24       vtnet0
                        NAT to internet

                    --- virbr0 (default) --- OPNsense WAN (DHCP)
                        192.168.122.0/24     vtnet1
                        NAT to internet
```
## Environment variables
| Variable | Default | Description |
|----------|---------|-------------|
| `RUST_LOG` | (unset) | Log level: `info`, `debug`, `trace` |
| `HARMONY_KVM_URI` | `qemu:///system` | Libvirt connection URI |
| `HARMONY_KVM_IMAGE_DIR` | `~/.local/share/harmony/kvm/images` | Cached disk images |
## Troubleshooting
**VM won't start / permission denied**
Ensure your user is in the `libvirt` group and that the image directory is traversable by the qemu user. The setup script handles this.
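A quick way to confirm membership in the current session (the group is assumed to be named `libvirt`, as created by the distro packages):

```shell
# Check whether the current session's groups include 'libvirt'.
in_group=no
if id -nG | tr ' ' '\n' | grep -qx libvirt; then
  in_group=yes
fi
echo "libvirt group membership: $in_group"
```

Note that `usermod -aG` only affects new sessions; if this prints `no` right after running the setup script, `newgrp libvirt` or a re-login is still needed.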
**192.168.1.0/24 conflict**
If your host network already uses this subnet, the VM will be unreachable. Edit the constants in `src/main.rs` to use a different subnet.
**HAProxy install fails**
OPNsense may need a firmware update first. The integration test attempts this automatically. If it fails, connect to the web UI at https://192.168.1.1:9443 and update manually.
**Serial console access**
```bash
virsh -c qemu:///system console opn-integration
# Press Ctrl+] to exit
```

View File

@@ -1,140 +0,0 @@
#!/bin/bash
set -euo pipefail
# Set up sudo-less libvirt access for KVM-based harmony examples.
#
# Run once on a fresh machine. After this, all KVM operations work
# without sudo — libvirt authenticates via group membership.
#
# Usage:
# ./setup-libvirt.sh # interactive, asks before each step
# ./setup-libvirt.sh --yes # non-interactive, runs everything
USER="${USER:-$(whoami)}"
AUTO_YES=false
[[ "${1:-}" == "--yes" ]] && AUTO_YES=true
green() { printf '\033[32m%s\033[0m\n' "$*"; }
red() { printf '\033[31m%s\033[0m\n' "$*"; }
bold() { printf '\033[1m%s\033[0m\n' "$*"; }
confirm() {
if $AUTO_YES; then return 0; fi
read -rp "$1 [Y/n] " answer
[[ -z "$answer" || "$answer" =~ ^[Yy] ]]
}
bold "Harmony KVM/libvirt setup"
echo
# ── Step 1: Install packages ────────────────────────────────────────────
echo "Checking required packages..."
MISSING=()
for pkg in qemu-full libvirt dnsmasq ebtables; do
if ! pacman -Qi "$pkg" &>/dev/null; then
MISSING+=("$pkg")
fi
done
if [[ ${#MISSING[@]} -gt 0 ]]; then
echo "Missing packages: ${MISSING[*]}"
if confirm "Install them?"; then
sudo pacman -S --needed "${MISSING[@]}"
else
red "Skipped package installation"
fi
else
green "[ok] All packages installed"
fi
# ── Step 2: Add user to libvirt group ────────────────────────────────────
if groups "$USER" 2>/dev/null | grep -qw libvirt; then
green "[ok] $USER is in libvirt group"
else
echo "$USER is NOT in the libvirt group"
if confirm "Add $USER to libvirt group?"; then
sudo usermod -aG libvirt "$USER"
green "[ok] Added $USER to libvirt group"
echo " Note: you need to log out and back in (or run 'newgrp libvirt') for this to take effect"
fi
fi
# ── Step 3: Start libvirtd ───────────────────────────────────────────────
if systemctl is-active --quiet libvirtd; then
green "[ok] libvirtd is running"
else
echo "libvirtd is not running"
if confirm "Enable and start libvirtd?"; then
sudo systemctl enable --now libvirtd
green "[ok] libvirtd started"
fi
fi
# ── Step 4: Default storage pool ─────────────────────────────────────────
if virsh -c qemu:///system pool-info default &>/dev/null; then
green "[ok] Default storage pool exists"
else
echo "Default storage pool does not exist"
if confirm "Create default storage pool at /var/lib/libvirt/images?"; then
sudo virsh pool-define-as default dir --target /var/lib/libvirt/images
sudo virsh pool-autostart default
sudo virsh pool-start default
green "[ok] Default storage pool created"
fi
fi
# ── Step 5: Fix Docker + libvirt FORWARD conflict ────────────────────────
# Docker sets iptables FORWARD policy to DROP, which blocks libvirt NAT.
# Libvirt defaults to nftables, which doesn't interact with Docker's iptables.
# Fix: switch libvirt to iptables backend so rules coexist with Docker.
if docker info &>/dev/null; then
echo "Docker detected."
NETCONF="/etc/libvirt/network.conf"
if grep -q '^firewall_backend' "$NETCONF" 2>/dev/null; then
CURRENT=$(grep '^firewall_backend' "$NETCONF" | head -1)
if echo "$CURRENT" | grep -q 'iptables'; then
green "[ok] libvirt firewall_backend is already iptables"
else
echo "libvirt firewall_backend is: $CURRENT"
echo "Docker's iptables FORWARD DROP will block libvirt NAT."
if confirm "Switch libvirt to iptables backend?"; then
sudo sed -i 's/^firewall_backend.*/firewall_backend = "iptables"/' "$NETCONF"
echo "Restarting libvirtd to apply..."
sudo systemctl restart libvirtd
green "[ok] Switched to iptables backend"
fi
fi
else
echo "libvirt uses nftables (default), but Docker's iptables FORWARD DROP blocks NAT."
if confirm "Set libvirt to use iptables backend (recommended with Docker)?"; then
echo 'firewall_backend = "iptables"' | sudo tee -a "$NETCONF" >/dev/null
echo "Restarting libvirtd to apply..."
sudo systemctl restart libvirtd
# Re-activate networks so they get iptables rules
for net in $(virsh -c qemu:///system net-list --name 2>/dev/null); do
virsh -c qemu:///system net-destroy "$net" 2>/dev/null || true
virsh -c qemu:///system net-start "$net" 2>/dev/null || true
done
green "[ok] Switched to iptables backend and restarted networks"
fi
fi
else
green "[ok] Docker not detected, no FORWARD conflict"
fi
# ── Done ─────────────────────────────────────────────────────────────────
echo
bold "Setup complete."
echo
echo "If you were added to the libvirt group, apply it now:"
echo " newgrp libvirt"
echo
echo "Then verify:"
echo " cargo run -p opnsense-vm-integration -- --check"

File diff suppressed because it is too large

View File

@@ -16,7 +16,6 @@ use harmony_types::{k8s_name::K8sName, net::Url};
async fn main() {
let application = Arc::new(RustWebapp {
name: "test-rhob-monitoring".to_string(),
version: "0.1.0".to_string(),
dns: "test-rhob-monitoring.harmony.mcd".to_string(),
project_root: PathBuf::from("./webapp"), // Relative from 'harmony-path' param
framework: Some(RustWebFramework::Leptos),

View File

@@ -20,7 +20,6 @@ use harmony_types::k8s_name::K8sName;
async fn main() {
let application = Arc::new(RustWebapp {
name: "harmony-example-rust-webapp".to_string(),
version: "0.1.0".to_string(),
dns: "harmony-example-rust-webapp.harmony.mcd".to_string(),
project_root: PathBuf::from("./webapp"),
framework: Some(RustWebFramework::Leptos),

View File

@@ -32,21 +32,17 @@ pub async fn get_topology() -> HAClusterTopology {
let switch_client = Arc::new(switch_client);
let config = SecretManager::get_or_prompt::<OPNSenseFirewallConfig>()
.await
.unwrap();
let api_creds = harmony::config::secret::OPNSenseApiCredentials {
key: config.username.clone(),
secret: config.password.clone(),
};
let ssh_creds = harmony::config::secret::OPNSenseFirewallCredentials {
username: config.username,
password: config.password,
};
let config = SecretManager::get_or_prompt::<OPNSenseFirewallConfig>().await;
let config = config.unwrap();
let opnsense = Arc::new(
harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, &api_creds, &ssh_creds)
.await,
harmony::infra::opnsense::OPNSenseFirewall::new(
firewall,
None,
&config.username,
&config.password,
)
.await,
);
let lan_subnet = ipv4!("192.168.40.0");
let gateway_ipv4 = ipv4!("192.168.40.1");

View File

@@ -17,7 +17,6 @@ use std::{path::PathBuf, sync::Arc};
async fn main() {
let application = Arc::new(RustWebapp {
name: "harmony-example-tryrust".to_string(),
version: "0.1.0".to_string(),
dns: "tryrust.example.harmony.mcd".to_string(),
project_root: PathBuf::from("./tryrust.org"), // <== Project root, in this case it is a
// submodule

View File

@@ -69,6 +69,5 @@ fn build_large_score() -> LoadBalancerScore {
lb_service.clone(),
lb_service.clone(),
],
wan_firewall_ports: vec![],
}
}

View File

@@ -4,7 +4,7 @@ use kube::config::{KubeConfigOptions, Kubeconfig};
use kube::{Client, Config, Discovery, Error};
use log::error;
use serde::Serialize;
use tokio::sync::{OnceCell, RwLock};
use tokio::sync::OnceCell;
use crate::types::KubernetesDistribution;
@@ -23,9 +23,7 @@ pub struct K8sClient {
/// to stdout instead. Initialised from the `DRY_RUN` environment variable.
pub(crate) dry_run: bool,
pub(crate) k8s_distribution: Arc<OnceCell<KubernetesDistribution>>,
/// API discovery cache. Wrapped in `RwLock` so it can be invalidated
/// after installing CRDs or operators that register new API groups.
pub(crate) discovery: Arc<RwLock<Option<Arc<Discovery>>>>,
pub(crate) discovery: Arc<OnceCell<Discovery>>,
}
impl Serialize for K8sClient {
@@ -54,7 +52,7 @@ impl K8sClient {
dry_run: read_dry_run_from_env(),
client,
k8s_distribution: Arc::new(OnceCell::new()),
discovery: Arc::new(RwLock::new(None)),
discovery: Arc::new(OnceCell::new()),
}
}

View File

@@ -1,4 +1,3 @@
use std::sync::Arc;
use std::time::Duration;
use kube::{Discovery, Error};
@@ -16,55 +15,38 @@ impl K8sClient {
self.client.clone().apiserver_version().await
}
/// Runs API discovery, caching the result. Call [`invalidate_discovery`]
/// after installing CRDs or operators to force a refresh on the next call.
pub async fn discovery(&self) -> Result<Arc<Discovery>, Error> {
// Fast path: return cached discovery
{
let guard = self.discovery.read().await;
if let Some(d) = guard.as_ref() {
return Ok(Arc::clone(d));
}
}
// Slow path: run discovery with retries
/// Runs (and caches) Kubernetes API discovery with exponential-backoff retries.
pub async fn discovery(&self) -> Result<&Discovery, Error> {
let retry_strategy = ExponentialBackoff::from_millis(1000)
.max_delay(Duration::from_secs(32))
.take(6);
let attempt = Mutex::new(0u32);
let d = Retry::spawn(retry_strategy, || async {
Retry::spawn(retry_strategy, || async {
let mut n = attempt.lock().await;
*n += 1;
debug!("Running Kubernetes API discovery (attempt {})", *n);
Discovery::new(self.client.clone())
.run()
.await
.map_err(|e| {
warn!("Kubernetes API discovery failed (attempt {}): {}", *n, e);
e
match self
.discovery
.get_or_try_init(async || {
debug!("Running Kubernetes API discovery (attempt {})", *n);
let d = Discovery::new(self.client.clone()).run().await?;
debug!("Kubernetes API discovery completed");
Ok(d)
})
.await
{
Ok(d) => Ok(d),
Err(e) => {
warn!("Kubernetes API discovery failed (attempt {}): {}", *n, e);
Err(e)
}
}
})
.await
.map_err(|e| {
error!("Kubernetes API discovery failed after all retries: {}", e);
e
})?;
debug!("Kubernetes API discovery completed");
let d = Arc::new(d);
let mut guard = self.discovery.write().await;
*guard = Some(Arc::clone(&d));
Ok(d)
}
/// Clears the cached API discovery so the next call to [`discovery`]
/// re-fetches from the API server. Call this after installing CRDs or
/// operators that register new API groups.
pub async fn invalidate_discovery(&self) {
let mut guard = self.discovery.write().await;
*guard = None;
debug!("API discovery cache invalidated");
})
}
/// Detect which Kubernetes distribution is running. Result is cached for

View File

@@ -6,10 +6,8 @@ pub mod discovery;
pub mod helper;
pub mod node;
pub mod pod;
pub mod port_forward;
pub mod resources;
pub mod types;
pub use client::K8sClient;
pub use port_forward::PortForwardHandle;
pub use types::{DrainOptions, KubernetesDistribution, NodeFile, ScopeResolver, WriteMode};

View File

@@ -190,77 +190,4 @@ impl K8sClient {
}
}
}
/// Execute a command in a specific pod by name, capturing stdout.
///
/// Returns the captured stdout on success. On failure, the error string
/// includes stderr output from the remote command.
pub async fn exec_pod_capture_output(
&self,
pod_name: &str,
namespace: Option<&str>,
command: Vec<&str>,
) -> Result<String, String> {
let api: Api<Pod> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
match api
.exec(
pod_name,
command,
&AttachParams::default().stdout(true).stderr(true),
)
.await
{
Err(e) => Err(e.to_string()),
Ok(mut process) => {
let status = process
.take_status()
.expect("No status handle")
.await
.expect("Status channel closed");
let mut stdout_buf = String::new();
if let Some(mut stdout) = process.stdout() {
stdout
.read_to_string(&mut stdout_buf)
.await
.map_err(|e| format!("Failed to read stdout: {e}"))?;
}
let mut stderr_buf = String::new();
if let Some(mut stderr) = process.stderr() {
stderr
.read_to_string(&mut stderr_buf)
.await
.map_err(|e| format!("Failed to read stderr: {e}"))?;
}
if let Some(s) = status.status {
debug!("exec_pod status: {} - {:?}", s, status.details);
if s == "Success" {
Ok(stdout_buf)
} else {
Err(format!("{stderr_buf}"))
}
} else {
Err("No inner status from pod exec".to_string())
}
}
}
}
/// Execute a command in a specific pod by name (no output capture).
pub async fn exec_pod(
&self,
pod_name: &str,
namespace: Option<&str>,
command: Vec<&str>,
) -> Result<(), String> {
self.exec_pod_capture_output(pod_name, namespace, command)
.await
.map(|_| ())
}
}

View File

@@ -1,133 +0,0 @@
use std::net::SocketAddr;
use k8s_openapi::api::core::v1::Pod;
use kube::{Api, Error, error::DiscoveryError};
use log::{debug, error, info};
use tokio::net::TcpListener;
use crate::client::K8sClient;
/// Handle to a running port-forward. The forward is stopped when the handle is
/// dropped (or when [`abort`](Self::abort) is called explicitly).
pub struct PortForwardHandle {
local_addr: SocketAddr,
abort_handle: tokio::task::AbortHandle,
}
impl PortForwardHandle {
/// The local address the listener is bound to.
pub fn local_addr(&self) -> SocketAddr {
self.local_addr
}
/// The local port (convenience for `local_addr().port()`).
pub fn port(&self) -> u16 {
self.local_addr.port()
}
/// Stop the port-forward and close the listener.
pub fn abort(&self) {
self.abort_handle.abort();
}
}
impl Drop for PortForwardHandle {
fn drop(&mut self) {
self.abort_handle.abort();
}
}
impl K8sClient {
/// Forward a pod port to a local TCP listener.
///
/// Binds `127.0.0.1:{local_port}` (pass 0 to let the OS pick a free port)
/// and proxies every incoming TCP connection to the pod's `remote_port`
/// through the Kubernetes API server's portforward subresource (WebSocket).
///
/// Returns a [`PortForwardHandle`] whose [`port()`](PortForwardHandle::port)
/// gives the actual bound port. The forward runs in a background task and
/// is automatically stopped when the handle is dropped.
pub async fn port_forward(
&self,
pod_name: &str,
namespace: &str,
local_port: u16,
remote_port: u16,
) -> Result<PortForwardHandle, Error> {
let listener = TcpListener::bind(SocketAddr::from(([127, 0, 0, 1], local_port)))
.await
.map_err(|e| {
Error::Discovery(DiscoveryError::MissingResource(format!(
"Failed to bind 127.0.0.1:{local_port}: {e}"
)))
})?;
let local_addr = listener.local_addr().map_err(|e| {
Error::Discovery(DiscoveryError::MissingResource(format!(
"Failed to get local address: {e}"
)))
})?;
info!(
"Port-forward {} -> {}/{}:{}",
local_addr, namespace, pod_name, remote_port
);
let client = self.client.clone();
let ns = namespace.to_string();
let pod = pod_name.to_string();
let task = tokio::spawn(async move {
let api: Api<Pod> = Api::namespaced(client, &ns);
loop {
let (mut tcp_stream, peer) = match listener.accept().await {
Ok(conn) => conn,
Err(e) => {
debug!("Port-forward listener accept error: {e}");
break;
}
};
debug!("Port-forward connection from {peer}");
let api = api.clone();
let pod = pod.clone();
tokio::spawn(async move {
let mut pf = match api.portforward(&pod, &[remote_port]).await {
Ok(pf) => pf,
Err(e) => {
error!("Port-forward WebSocket setup failed: {e}");
return;
}
};
let mut kube_stream = match pf.take_stream(remote_port) {
Some(s) => s,
None => {
error!("Port-forward: no stream for port {remote_port}");
return;
}
};
match tokio::io::copy_bidirectional(&mut tcp_stream, &mut kube_stream).await {
Ok((from_client, from_pod)) => {
debug!(
"Port-forward connection closed ({from_client} bytes sent, {from_pod} bytes received)"
);
}
Err(e) => {
debug!("Port-forward copy error: {e}");
}
}
drop(pf);
});
}
});
Ok(PortForwardHandle {
local_addr,
abort_handle: task.abort_handle(),
})
}
}

View File

@@ -151,28 +151,6 @@ impl K8sClient {
Ok(!crds.items.is_empty())
}
/// Polls until a CRD is registered in the API server.
pub async fn wait_for_crd(&self, name: &str, timeout: Option<Duration>) -> Result<(), Error> {
let timeout = timeout.unwrap_or(Duration::from_secs(60));
let start = std::time::Instant::now();
let poll = Duration::from_secs(2);
loop {
if self.has_crd(name).await? {
return Ok(());
}
if start.elapsed() > timeout {
return Err(Error::Discovery(
kube::error::DiscoveryError::MissingResource(format!(
"CRD '{name}' not registered within {}s",
timeout.as_secs()
)),
));
}
tokio::time::sleep(poll).await;
}
}
pub async fn service_account_api(&self, namespace: &str) -> Api<ServiceAccount> {
Api::namespaced(self.client.clone(), namespace)
}
@@ -292,23 +270,6 @@ impl K8sClient {
api.get_opt(name).await
}
/// Deletes a single named resource. Returns `Ok(())` on success or if the
/// resource was already absent (idempotent).
pub async fn delete_resource<K>(&self, name: &str, namespace: Option<&str>) -> Result<(), Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned,
<K as Resource>::Scope: ScopeResolver<K>,
<K as Resource>::DynamicType: Default,
{
let api: Api<K> =
<<K as Resource>::Scope as ScopeResolver<K>>::get_api(&self.client, namespace);
match api.delete(name, &kube::api::DeleteParams::default()).await {
Ok(_) => Ok(()),
Err(Error::Api(ErrorResponse { code: 404, .. })) => Ok(()),
Err(e) => Err(e),
}
}
pub async fn list_resources<K>(
&self,
namespace: Option<&str>,

View File

@@ -12,7 +12,6 @@ testing = []
hex = "0.4"
reqwest = { version = "0.11", features = [
"blocking",
"cookies",
"json",
"rustls-tls",
], default-features = false }
@@ -29,7 +28,6 @@ log.workspace = true
env_logger.workspace = true
async-trait.workspace = true
cidr.workspace = true
opnsense-api = { path = "../opnsense-api" }
opnsense-config = { path = "../opnsense-config" }
opnsense-config-xml = { path = "../opnsense-config-xml" }
harmony_macros = { path = "../harmony_macros" }
@@ -91,4 +89,3 @@ virt = "0.4.3"
[dev-dependencies]
pretty_assertions.workspace = true
assertor.workspace = true
httptest = "0.16"

View File

@@ -8,12 +8,6 @@ pub struct OPNSenseFirewallCredentials {
pub password: String,
}
#[derive(Secret, Serialize, Deserialize, JsonSchema, Debug, PartialEq)]
pub struct OPNSenseApiCredentials {
pub key: String,
pub secret: String,
}
// TODO we need a better way to handle multiple "instances" of the same secret structure.
#[derive(Secret, Serialize, Deserialize, JsonSchema, Debug, PartialEq)]
pub struct SshKeyPair {

View File

@@ -32,7 +32,6 @@ pub enum InterpretName {
K8sPrometheusCrdAlerting,
CephRemoveOsd,
DiscoverInventoryAgent,
DeployInventoryAgent,
CephClusterHealth,
Custom(&'static str),
RHOBAlerting,
@@ -65,7 +64,6 @@ impl std::fmt::Display for InterpretName {
InterpretName::K8sPrometheusCrdAlerting => f.write_str("K8sPrometheusCrdAlerting"),
InterpretName::CephRemoveOsd => f.write_str("CephRemoveOsd"),
InterpretName::DiscoverInventoryAgent => f.write_str("DiscoverInventoryAgent"),
InterpretName::DeployInventoryAgent => f.write_str("DeployInventoryAgent"),
InterpretName::CephClusterHealth => f.write_str("CephClusterHealth"),
InterpretName::Custom(name) => f.write_str(name),
InterpretName::RHOBAlerting => f.write_str("RHOBAlerting"),

View File

@@ -2,7 +2,6 @@ use harmony_types::id::Id;
use std::collections::BTreeMap;
use async_trait::async_trait;
use log::info;
use serde::Serialize;
use serde_value::Value;
@@ -13,18 +12,6 @@ use super::{
topology::Topology,
};
/// Format a duration in a human-readable way.
fn format_duration(d: std::time::Duration) -> String {
let secs = d.as_secs();
if secs < 60 {
format!("{:.1}s", d.as_secs_f64())
} else if secs < 3600 {
format!("{}m {}s", secs / 60, secs % 60)
} else {
format!("{}h {}m {}s", secs / 3600, (secs % 3600) / 60, secs % 60)
}
}
#[async_trait]
pub trait Score<T: Topology>:
std::fmt::Debug + ScoreToString<T> + Send + Sync + CloneBoxScore<T> + SerializeScore<T>
@@ -36,47 +23,22 @@ pub trait Score<T: Topology>:
) -> Result<Outcome, InterpretError> {
let id = Id::default();
let interpret = self.create_interpret();
let score_name = self.name();
let interpret_name = interpret.get_name().to_string();
instrumentation::instrument(HarmonyEvent::InterpretExecutionStarted {
execution_id: id.clone().to_string(),
topology: topology.name().into(),
interpret: interpret_name.clone(),
score: score_name.clone(),
message: format!("{} running...", interpret_name),
interpret: interpret.get_name().to_string(),
score: self.name(),
message: format!("{} running...", interpret.get_name()),
})
.unwrap();
let start = std::time::Instant::now();
let result = interpret.execute(inventory, topology).await;
let elapsed = start.elapsed();
match &result {
Ok(outcome) => {
info!(
"[{}] {} in {} — {}",
score_name,
outcome.status,
format_duration(elapsed),
outcome.message
);
}
Err(e) => {
info!(
"[{}] FAILED after {} — {}",
score_name,
format_duration(elapsed),
e
);
}
}
instrumentation::instrument(HarmonyEvent::InterpretExecutionFinished {
execution_id: id.clone().to_string(),
topology: topology.name().into(),
interpret: interpret_name,
score: score_name,
interpret: interpret.get_name().to_string(),
score: self.name(),
outcome: result.clone(),
})
.unwrap();

View File

@@ -1,844 +0,0 @@
//! Higher-order topology for managing an OPNsense firewall HA pair.
//!
//! Wraps a primary and backup `OPNSenseFirewall` instance. Most scores are
//! applied identically to both; CARP VIPs get differentiated advskew values
//! (primary=0, backup=configurable) to establish correct failover priority.
//!
//! See ROADMAP/10-firewall-pair-topology.md for future work (generic trait,
//! delegation macro, XMLRPC sync, integration tests).
//! See ROADMAP/11-named-config-instances.md for per-device credential support.
use std::net::IpAddr;
use std::str::FromStr;
use async_trait::async_trait;
use harmony_types::firewall::VipMode;
use harmony_types::id::Id;
use harmony_types::net::{IpAddress, MacAddress};
use log::info;
use serde::Serialize;
use crate::config::secret::{OPNSenseApiCredentials, OPNSenseFirewallCredentials};
use crate::data::Version;
use crate::executors::ExecutorError;
use crate::infra::opnsense::OPNSenseFirewall;
use crate::interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome};
use crate::inventory::Inventory;
use crate::modules::opnsense::dnat::DnatScore;
use crate::modules::opnsense::firewall::{BinatScore, FirewallRuleScore, OutboundNatScore};
use crate::modules::opnsense::lagg::LaggScore;
use crate::modules::opnsense::vip::VipDef;
use crate::modules::opnsense::vlan::VlanScore;
use crate::score::Score;
use crate::topology::{
DHCPStaticEntry, DhcpServer, LogicalHost, PreparationError, PreparationOutcome, PxeOptions,
Topology,
};
use harmony_secret::SecretManager;
// ── FirewallPairTopology ───────────────────────────────────────────
/// An OPNsense HA firewall pair managed via CARP.
///
/// Configuration is applied independently to both firewalls (not via XMLRPC
/// sync), since some settings like CARP advskew intentionally differ between
/// primary and backup.
#[derive(Debug, Clone)]
pub struct FirewallPairTopology {
pub primary: OPNSenseFirewall,
pub backup: OPNSenseFirewall,
}
impl FirewallPairTopology {
/// Construct a firewall pair from the harmony config system.
///
/// Reads the following environment variables:
/// - `OPNSENSE_PRIMARY_IP` — IP address of the primary firewall
/// - `OPNSENSE_BACKUP_IP` — IP address of the backup firewall
/// - `OPNSENSE_API_PORT` — API/web GUI port (default: 443)
///
/// Credentials are loaded via `SecretManager::get_or_prompt`.
pub async fn opnsense_from_config() -> Self {
// TODO: both firewalls share the same credentials. Once named config
// instances are available (ROADMAP/11), use per-device credentials:
// ConfigManager::get_named::<OPNSenseApiCredentials>("fw-primary")
let ssh_creds = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>()
.await
.expect("Failed to get SSH credentials");
let api_creds = SecretManager::get_or_prompt::<OPNSenseApiCredentials>()
.await
.expect("Failed to get API credentials");
Self::opnsense_with_credentials(&ssh_creds, &api_creds, &ssh_creds, &api_creds).await
}
pub async fn opnsense_with_credentials(
primary_ssh_creds: &OPNSenseFirewallCredentials,
primary_api_creds: &OPNSenseApiCredentials,
backup_ssh_creds: &OPNSenseFirewallCredentials,
backup_api_creds: &OPNSenseApiCredentials,
) -> Self {
let primary_ip =
std::env::var("OPNSENSE_PRIMARY_IP").expect("OPNSENSE_PRIMARY_IP must be set");
let backup_ip =
std::env::var("OPNSENSE_BACKUP_IP").expect("OPNSENSE_BACKUP_IP must be set");
let api_port: u16 = std::env::var("OPNSENSE_API_PORT")
.ok()
.map(|p| {
p.parse()
.expect("OPNSENSE_API_PORT must be a valid port number")
})
.unwrap_or(443);
let primary_host = LogicalHost {
ip: IpAddr::from_str(&primary_ip).expect("OPNSENSE_PRIMARY_IP must be a valid IP"),
name: "fw-primary".to_string(),
};
let backup_host = LogicalHost {
ip: IpAddr::from_str(&backup_ip).expect("OPNSENSE_BACKUP_IP must be a valid IP"),
name: "fw-backup".to_string(),
};
info!("Connecting to primary firewall at {primary_ip}:{api_port}");
let primary = OPNSenseFirewall::with_api_port(
primary_host,
None,
api_port,
&primary_api_creds,
&primary_ssh_creds,
)
.await;
info!("Connecting to backup firewall at {backup_ip}:{api_port}");
let backup = OPNSenseFirewall::with_api_port(
backup_host,
None,
api_port,
&backup_api_creds,
&backup_ssh_creds,
)
.await;
Self { primary, backup }
}
}
#[async_trait]
impl Topology for FirewallPairTopology {
fn name(&self) -> &str {
"FirewallPairTopology"
}
async fn ensure_ready(&self) -> Result<PreparationOutcome, PreparationError> {
let primary_outcome = self.primary.ensure_ready().await?;
let backup_outcome = self.backup.ensure_ready().await?;
match (primary_outcome, backup_outcome) {
(PreparationOutcome::Noop, PreparationOutcome::Noop) => Ok(PreparationOutcome::Noop),
(p, b) => {
let mut details = Vec::new();
if let PreparationOutcome::Success { details: d } = p {
details.push(format!("Primary: {}", d));
}
if let PreparationOutcome::Success { details: d } = b {
details.push(format!("Backup: {}", d));
}
Ok(PreparationOutcome::Success {
details: details.join(", "),
})
}
}
}
}
// ── DhcpServer delegation ──────────────────────────────────────────
//
// Required so that DhcpScore (which uses `impl<T: Topology + DhcpServer> Score<T>`)
// automatically works with FirewallPairTopology.
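//
// Hypothetical usage sketch: any code bounded on `Topology + DhcpServer`
// can take the pair directly, and every call fans out to both firewalls:
//
//     let pair = FirewallPairTopology::opnsense_from_config().await;
//     pair.set_dhcp_range(&start, &end).await?; // `start`/`end` defined by the caller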
#[async_trait]
impl DhcpServer for FirewallPairTopology {
async fn commit_config(&self) -> Result<(), ExecutorError> {
self.primary.commit_config().await?;
self.backup.commit_config().await
}
async fn add_static_mapping(&self, entry: &DHCPStaticEntry) -> Result<(), ExecutorError> {
self.primary.add_static_mapping(entry).await?;
self.backup.add_static_mapping(entry).await
}
async fn remove_static_mapping(&self, mac: &MacAddress) -> Result<(), ExecutorError> {
self.primary.remove_static_mapping(mac).await?;
self.backup.remove_static_mapping(mac).await
}
async fn list_static_mappings(&self) -> Vec<(MacAddress, IpAddress)> {
// Return primary's view — both should be identical
self.primary.list_static_mappings().await
}
/// Returns the primary firewall's IP. In a CARP setup, callers
/// typically want the CARP VIP instead — use the VIP address directly.
fn get_ip(&self) -> IpAddress {
self.primary.get_ip()
}
/// Returns the primary firewall's host. See `get_ip()` note.
fn get_host(&self) -> LogicalHost {
self.primary.get_host()
}
async fn set_pxe_options(&self, options: PxeOptions) -> Result<(), ExecutorError> {
// PXE options are the same on both; construct a second copy for backup
let backup_options = PxeOptions {
ipxe_filename: options.ipxe_filename.clone(),
bios_filename: options.bios_filename.clone(),
efi_filename: options.efi_filename.clone(),
tftp_ip: options.tftp_ip,
};
self.primary.set_pxe_options(options).await?;
self.backup.set_pxe_options(backup_options).await
}
async fn set_dhcp_range(
&self,
start: &IpAddress,
end: &IpAddress,
) -> Result<(), ExecutorError> {
self.primary.set_dhcp_range(start, end).await?;
self.backup.set_dhcp_range(start, end).await
}
}
// ── Helper for uniform score delegation ────────────────────────────
/// Standard boilerplate for Interpret methods on pair scores.
macro_rules! pair_interpret_boilerplate {
($name:expr) => {
fn get_name(&self) -> InterpretName {
InterpretName::Custom($name)
}
fn get_version(&self) -> Version {
Version::from("1.0.0").unwrap()
}
fn get_status(&self) -> InterpretStatus {
InterpretStatus::QUEUED
}
fn get_children(&self) -> Vec<Id> {
vec![]
}
};
}
// ── LaggScore for FirewallPairTopology ──────────────────────────────
impl Score<FirewallPairTopology> for LaggScore {
fn name(&self) -> String {
"LaggScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
Box::new(LaggPairInterpret {
score: self.clone(),
})
}
}
#[derive(Debug, Clone, Serialize)]
struct LaggPairInterpret {
score: LaggScore,
}
#[async_trait]
impl Interpret<FirewallPairTopology> for LaggPairInterpret {
async fn execute(
&self,
inventory: &Inventory,
topology: &FirewallPairTopology,
) -> Result<Outcome, InterpretError> {
let inner = self.score.create_interpret();
info!("Applying LaggScore to primary firewall");
inner.execute(inventory, &topology.primary).await?;
info!("Applying LaggScore to backup firewall");
inner.execute(inventory, &topology.backup).await
}
pair_interpret_boilerplate!("LaggScore (pair)");
}
// ── VlanScore for FirewallPairTopology ──────────────────────────────
impl Score<FirewallPairTopology> for VlanScore {
fn name(&self) -> String {
"VlanScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
Box::new(VlanPairInterpret {
score: self.clone(),
})
}
}
#[derive(Debug, Clone, Serialize)]
struct VlanPairInterpret {
score: VlanScore,
}
#[async_trait]
impl Interpret<FirewallPairTopology> for VlanPairInterpret {
async fn execute(
&self,
inventory: &Inventory,
topology: &FirewallPairTopology,
) -> Result<Outcome, InterpretError> {
let inner = self.score.create_interpret();
info!("Applying VlanScore to primary firewall");
inner.execute(inventory, &topology.primary).await?;
info!("Applying VlanScore to backup firewall");
inner.execute(inventory, &topology.backup).await
}
pair_interpret_boilerplate!("VlanScore (pair)");
}
// ── FirewallRuleScore for FirewallPairTopology ─────────────────────
impl Score<FirewallPairTopology> for FirewallRuleScore {
fn name(&self) -> String {
"FirewallRuleScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
Box::new(FirewallRulePairInterpret {
score: self.clone(),
})
}
}
#[derive(Debug, Clone, Serialize)]
struct FirewallRulePairInterpret {
score: FirewallRuleScore,
}
#[async_trait]
impl Interpret<FirewallPairTopology> for FirewallRulePairInterpret {
async fn execute(
&self,
inventory: &Inventory,
topology: &FirewallPairTopology,
) -> Result<Outcome, InterpretError> {
let inner = self.score.create_interpret();
info!("Applying FirewallRuleScore to primary firewall");
inner.execute(inventory, &topology.primary).await?;
info!("Applying FirewallRuleScore to backup firewall");
inner.execute(inventory, &topology.backup).await
}
pair_interpret_boilerplate!("FirewallRuleScore (pair)");
}
// ── BinatScore for FirewallPairTopology ────────────────────────────
impl Score<FirewallPairTopology> for BinatScore {
fn name(&self) -> String {
"BinatScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
Box::new(BinatPairInterpret {
score: self.clone(),
})
}
}
#[derive(Debug, Clone, Serialize)]
struct BinatPairInterpret {
score: BinatScore,
}
#[async_trait]
impl Interpret<FirewallPairTopology> for BinatPairInterpret {
async fn execute(
&self,
inventory: &Inventory,
topology: &FirewallPairTopology,
) -> Result<Outcome, InterpretError> {
let inner = self.score.create_interpret();
info!("Applying BinatScore to primary firewall");
inner.execute(inventory, &topology.primary).await?;
info!("Applying BinatScore to backup firewall");
inner.execute(inventory, &topology.backup).await
}
pair_interpret_boilerplate!("BinatScore (pair)");
}
// ── OutboundNatScore for FirewallPairTopology ──────────────────────
impl Score<FirewallPairTopology> for OutboundNatScore {
fn name(&self) -> String {
"OutboundNatScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
Box::new(OutboundNatPairInterpret {
score: self.clone(),
})
}
}
#[derive(Debug, Clone, Serialize)]
struct OutboundNatPairInterpret {
score: OutboundNatScore,
}
#[async_trait]
impl Interpret<FirewallPairTopology> for OutboundNatPairInterpret {
async fn execute(
&self,
inventory: &Inventory,
topology: &FirewallPairTopology,
) -> Result<Outcome, InterpretError> {
let inner = self.score.create_interpret();
info!("Applying OutboundNatScore to primary firewall");
inner.execute(inventory, &topology.primary).await?;
info!("Applying OutboundNatScore to backup firewall");
inner.execute(inventory, &topology.backup).await
}
pair_interpret_boilerplate!("OutboundNatScore (pair)");
}
// ── DnatScore for FirewallPairTopology ─────────────────────────────
impl Score<FirewallPairTopology> for DnatScore {
fn name(&self) -> String {
"DnatScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
Box::new(DnatPairInterpret {
score: self.clone(),
})
}
}
#[derive(Debug, Clone, Serialize)]
struct DnatPairInterpret {
score: DnatScore,
}
#[async_trait]
impl Interpret<FirewallPairTopology> for DnatPairInterpret {
async fn execute(
&self,
inventory: &Inventory,
topology: &FirewallPairTopology,
) -> Result<Outcome, InterpretError> {
let inner = self.score.create_interpret();
info!("Applying DnatScore to primary firewall");
inner.execute(inventory, &topology.primary).await?;
info!("Applying DnatScore to backup firewall");
inner.execute(inventory, &topology.backup).await
}
pair_interpret_boilerplate!("DnatScore (pair)");
}
// ── CarpVipScore ───────────────────────────────────────────────────
/// CARP-aware VIP score for firewall pairs.
///
/// Applies VIPs to both firewalls with differentiated CARP priority:
/// - Primary always gets `advskew=0` (highest priority, becomes CARP master)
/// - Backup gets `backup_advskew` (default 100, lower priority)
///
/// Non-CARP VIPs (IP alias, ProxyARP) are applied identically to both.
///
/// This is a distinct type from `VipScore` because the caller does not
/// specify advskew per-firewall — the pair semantics enforce it.
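///
/// A minimal sketch (field values are illustrative):
///
/// ```ignore
/// let score = CarpVipScore {
///     vips: vec![VipDef {
///         mode: VipMode::Carp,
///         interface: "lan".to_string(),
///         subnet: "192.168.1.1".to_string(),
///         subnet_bits: 24,
///         vhid: Some(1),
///         advbase: Some(1),
///         advskew: None, // ignored for CARP VIPs; the pair semantics set it
///         password: Some("secret".to_string()),
///         peer: None,
///     }],
///     backup_advskew: None, // backup defaults to advskew=100
/// };
/// ```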
#[derive(Debug, Clone, Serialize)]
pub struct CarpVipScore {
pub vips: Vec<VipDef>,
/// advskew applied to backup firewall for CARP VIPs (default 100).
/// Primary always gets advskew=0.
pub backup_advskew: Option<u16>,
}
impl Score<FirewallPairTopology> for CarpVipScore {
fn name(&self) -> String {
"CarpVipScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
Box::new(CarpVipInterpret {
score: self.clone(),
})
}
}
#[derive(Debug, Clone, Serialize)]
struct CarpVipInterpret {
score: CarpVipScore,
}
impl CarpVipInterpret {
async fn apply_vips_to(
&self,
firewall: &OPNSenseFirewall,
role: &str,
carp_advskew: u16,
) -> Result<(), InterpretError> {
let vip_config = firewall.get_opnsense_config().vip();
for vip in &self.score.vips {
let advskew = if vip.mode == VipMode::Carp {
Some(carp_advskew)
} else {
vip.advskew
};
info!(
"Ensuring VIP {} on {} {} (advskew={:?})",
vip.subnet, role, vip.interface, advskew
);
vip_config
.ensure_vip_from(
&vip.mode,
&vip.interface,
&vip.subnet,
vip.subnet_bits,
vip.vhid,
vip.advbase,
advskew,
vip.password.as_deref(),
vip.peer.as_deref(),
)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
}
Ok(())
}
}
#[async_trait]
impl Interpret<FirewallPairTopology> for CarpVipInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &FirewallPairTopology,
) -> Result<Outcome, InterpretError> {
let backup_skew = self.score.backup_advskew.unwrap_or(100);
self.apply_vips_to(&topology.primary, "primary", 0).await?;
self.apply_vips_to(&topology.backup, "backup", backup_skew)
.await?;
Ok(Outcome::success(format!(
"Configured {} VIPs on pair (primary advskew=0, backup advskew={})",
self.score.vips.len(),
backup_skew
)))
}
pair_interpret_boilerplate!("CarpVipScore");
}
#[cfg(test)]
mod tests {
use super::*;
use httptest::{Expectation, Server, matchers::request, responders::*};
use opnsense_api::OpnsenseClient;
use std::sync::Arc;
/// Dummy SSH shell for tests — never called, satisfies the `OPNsenseShell` trait.
#[derive(Debug)]
struct NoopShell;
#[async_trait]
impl opnsense_config::config::OPNsenseShell for NoopShell {
async fn exec(&self, _cmd: &str) -> Result<String, opnsense_config::Error> {
unimplemented!("test-only shell")
}
async fn write_content_to_temp_file(
&self,
_content: &str,
) -> Result<String, opnsense_config::Error> {
unimplemented!("test-only shell")
}
async fn write_content_to_file(
&self,
_content: &str,
_filename: &str,
) -> Result<String, opnsense_config::Error> {
unimplemented!("test-only shell")
}
async fn upload_folder(
&self,
_source: &str,
_destination: &str,
) -> Result<String, opnsense_config::Error> {
unimplemented!("test-only shell")
}
}
fn mock_opnsense_config(server: &Server) -> opnsense_config::Config {
let url = server.url("/api").to_string();
let client = OpnsenseClient::builder()
.base_url(url)
.auth_from_key_secret("test_key", "test_secret")
.build()
.unwrap();
let shell: Arc<dyn opnsense_config::config::OPNsenseShell> = Arc::new(NoopShell);
opnsense_config::Config::new(client, shell)
}
fn mock_firewall(server: &Server, name: &str) -> OPNSenseFirewall {
let host = LogicalHost {
ip: "127.0.0.1".parse().unwrap(),
name: name.to_string(),
};
OPNSenseFirewall::from_config(host, mock_opnsense_config(server))
}
fn mock_pair(primary_server: &Server, backup_server: &Server) -> FirewallPairTopology {
FirewallPairTopology {
primary: mock_firewall(primary_server, "fw-primary"),
backup: mock_firewall(backup_server, "fw-backup"),
}
}
fn vip_search_empty() -> serde_json::Value {
serde_json::json!({ "rows": [] })
}
fn vip_add_ok() -> serde_json::Value {
serde_json::json!({ "uuid": "new-uuid" })
}
fn vip_reconfigure_ok() -> serde_json::Value {
serde_json::json!({ "status": "ok" })
}
/// Set up a mock server to expect a VIP creation (search → add → reconfigure).
fn expect_vip_creation(server: &Server) {
server.expect(
Expectation::matching(request::method_path(
"GET",
"/api/interfaces/vip_settings/searchItem",
))
.respond_with(json_encoded(vip_search_empty())),
);
server.expect(
Expectation::matching(request::method_path(
"POST",
"/api/interfaces/vip_settings/addItem",
))
.respond_with(json_encoded(vip_add_ok())),
);
server.expect(
Expectation::matching(request::method_path(
"POST",
"/api/interfaces/vip_settings/reconfigure",
))
.respond_with(json_encoded(vip_reconfigure_ok())),
);
}
// ── ensure_ready tests ─────────────────────────────────────────
#[tokio::test]
async fn ensure_ready_merges_both_success() {
let s1 = Server::run();
let s2 = Server::run();
let pair = mock_pair(&s1, &s2);
let result = pair.ensure_ready().await.unwrap();
match result {
PreparationOutcome::Success { details } => {
assert!(details.contains("Primary"));
assert!(details.contains("Backup"));
}
PreparationOutcome::Noop => panic!("Expected Success, got Noop"),
}
}
// ── CarpVipScore tests ─────────────────────────────────────────
#[tokio::test]
async fn carp_vip_score_applies_to_both_firewalls() {
let primary_server = Server::run();
let backup_server = Server::run();
// Both firewalls should receive VIP creation calls
expect_vip_creation(&primary_server);
expect_vip_creation(&backup_server);
let pair = mock_pair(&primary_server, &backup_server);
let inventory = Inventory::empty();
let score = CarpVipScore {
vips: vec![VipDef {
mode: VipMode::Carp,
interface: "lan".to_string(),
subnet: "192.168.1.1".to_string(),
subnet_bits: 24,
vhid: Some(1),
advbase: Some(1),
advskew: None,
password: Some("secret".to_string()),
peer: None,
}],
backup_advskew: Some(100),
};
let result = score.interpret(&inventory, &pair).await;
assert!(result.is_ok(), "CarpVipScore should succeed: {:?}", result);
let outcome = result.unwrap();
assert!(
outcome.message.contains("primary advskew=0"),
"Message should mention primary advskew: {}",
outcome.message
);
assert!(
outcome.message.contains("backup advskew=100"),
"Message should mention backup advskew: {}",
outcome.message
);
}
#[tokio::test]
async fn carp_vip_score_sends_to_both_and_reports_advskew() {
let primary_server = Server::run();
let backup_server = Server::run();
// Both firewalls should receive VIP creation calls
expect_vip_creation(&primary_server);
expect_vip_creation(&backup_server);
let pair = mock_pair(&primary_server, &backup_server);
let inventory = Inventory::empty();
let score = CarpVipScore {
vips: vec![VipDef {
mode: VipMode::Carp,
interface: "lan".to_string(),
subnet: "10.0.0.1".to_string(),
subnet_bits: 32,
vhid: Some(1),
advbase: Some(1),
advskew: None,
password: Some("pass".to_string()),
peer: None,
}],
backup_advskew: Some(50),
};
let result = score.interpret(&inventory, &pair).await;
assert!(result.is_ok(), "CarpVipScore should succeed: {:?}", result);
let outcome = result.unwrap();
assert!(
outcome.message.contains("backup advskew=50"),
"Custom backup_advskew should be respected: {}",
outcome.message
);
// httptest verifies both servers received exactly the expected API calls
}
#[tokio::test]
async fn carp_vip_score_default_backup_advskew_is_100() {
let primary_server = Server::run();
let backup_server = Server::run();
expect_vip_creation(&primary_server);
expect_vip_creation(&backup_server);
let pair = mock_pair(&primary_server, &backup_server);
let inventory = Inventory::empty();
// backup_advskew is None — should default to 100
let score = CarpVipScore {
vips: vec![VipDef {
mode: VipMode::Carp,
interface: "lan".to_string(),
subnet: "10.0.0.1".to_string(),
subnet_bits: 32,
vhid: Some(1),
advbase: Some(1),
advskew: None,
password: None,
peer: None,
}],
backup_advskew: None,
};
let result = score.interpret(&inventory, &pair).await;
assert!(result.is_ok());
let outcome = result.unwrap();
assert!(
outcome.message.contains("backup advskew=100"),
"Default backup advskew should be 100: {}",
outcome.message
);
}
// ── Uniform score delegation tests ─────────────────────────────
#[tokio::test]
async fn vlan_score_applies_to_both_firewalls() {
let primary_server = Server::run();
let backup_server = Server::run();
// VLAN API: GET .../get to list, POST .../addItem to create, POST .../reconfigure to apply
fn expect_vlan_creation(server: &Server) {
server.expect(
Expectation::matching(request::method_path(
"GET",
"/api/interfaces/vlan_settings/get",
))
.respond_with(json_encoded(serde_json::json!({
"vlan": { "vlan": [] }
}))),
);
server.expect(
Expectation::matching(request::method_path(
"POST",
"/api/interfaces/vlan_settings/addItem",
))
.respond_with(json_encoded(serde_json::json!({ "uuid": "vlan-uuid" }))),
);
server.expect(
Expectation::matching(request::method_path(
"POST",
"/api/interfaces/vlan_settings/reconfigure",
))
.respond_with(json_encoded(serde_json::json!({ "status": "ok" }))),
);
}
expect_vlan_creation(&primary_server);
expect_vlan_creation(&backup_server);
let pair = mock_pair(&primary_server, &backup_server);
let inventory = Inventory::empty();
let score = VlanScore {
vlans: vec![crate::modules::opnsense::vlan::VlanDef {
parent_interface: "lagg0".to_string(),
tag: 50,
description: "test_vlan".to_string(),
}],
};
let result = score.interpret(&inventory, &pair).await;
assert!(result.is_ok(), "VlanScore should succeed: {:?}", result);
// httptest verifies both servers received the expected calls
}
}

View File

@@ -1,4 +1,5 @@
use async_trait::async_trait;
use brocade::{InterfaceConfig, InterfaceSpeed, PortChannelConfig, PortChannelId, Vlan};
use harmony_k8s::K8sClient;
use harmony_macros::ip;
use harmony_types::{
@@ -11,7 +12,7 @@ use log::info;
use crate::topology::{HelmCommand, PxeOptions};
use crate::{data::FileContent, executors::ExecutorError, topology::node_exporter::NodeExporter};
use crate::{infra::network_manager::OpenShiftNmStateNetworkManager, topology::PortConfig};
use crate::infra::network_manager::OpenShiftNmStateNetworkManager;
use super::{
DHCPStaticEntry, DhcpServer, DnsRecord, DnsRecordType, DnsServer, Firewall, HostNetworkConfig,
@@ -204,9 +205,6 @@ impl LoadBalancer for HAClusterTopology {
async fn reload_restart(&self) -> Result<(), ExecutorError> {
self.load_balancer.reload_restart().await
}
async fn ensure_wan_access(&self, port: u16) -> Result<(), ExecutorError> {
self.load_balancer.ensure_wan_access(port).await
}
}
#[async_trait]
@@ -319,23 +317,57 @@ impl Switch for HAClusterTopology {
self.switch_client.find_port(mac_address).await
}
async fn configure_port_channel(&self, config: &HostNetworkConfig) -> Result<(), SwitchError> {
async fn configure_port_channel(
&self,
channel_id: PortChannelId,
config: &HostNetworkConfig,
) -> Result<(), SwitchError> {
debug!("Configuring port channel: {config:#?}");
let switch_ports = config.switch_ports.iter().map(|s| s.port.clone()).collect();
self.switch_client
.configure_port_channel(&format!("Harmony_{}", config.host_id), switch_ports)
.configure_port_channel(
channel_id,
&format!("Harmony_{}", config.host_id),
switch_ports,
None,
)
.await
.map_err(|e| SwitchError::new(format!("Failed to configure port-channel: {e}")))?;
Ok(())
}
async fn configure_port_channel_from_config(
&self,
config: &PortChannelConfig,
) -> Result<(), SwitchError> {
self.switch_client
.configure_port_channel(
config.id,
&config.name,
config.ports.clone(),
config.speed.as_ref(),
)
.await
.map_err(|e| SwitchError::new(format!("Failed to create port-channel: {e}")))?;
Ok(())
}
async fn clear_port_channel(&self, _ids: &Vec<Id>) -> Result<(), SwitchError> {
todo!()
}
async fn configure_interface(&self, _ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
todo!()
async fn configure_interfaces(
&self,
interfaces: &Vec<InterfaceConfig>,
) -> Result<(), SwitchError> {
self.switch_client.configure_interfaces(interfaces).await
}
async fn create_vlan(&self, vlan: &Vlan) -> Result<(), SwitchError> {
self.switch_client.create_vlan(vlan).await
}
async fn delete_vlan(&self, vlan: &Vlan) -> Result<(), SwitchError> {
self.switch_client.delete_vlan(vlan).await
}
}
@@ -595,15 +627,26 @@ impl SwitchClient for DummyInfra {
async fn configure_port_channel(
&self,
_channel_id: PortChannelId,
_channel_name: &str,
_switch_ports: Vec<PortLocation>,
_speed: Option<&InterfaceSpeed>,
) -> Result<u8, SwitchError> {
unimplemented!("{}", UNIMPLEMENTED_DUMMY_INFRA)
}
async fn clear_port_channel(&self, _ids: &Vec<Id>) -> Result<(), SwitchError> {
todo!()
}
async fn configure_interface(&self, _ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
async fn configure_interfaces(
&self,
_interfaces: &Vec<InterfaceConfig>,
) -> Result<(), SwitchError> {
todo!()
}
async fn create_vlan(&self, _vlan: &Vlan) -> Result<(), SwitchError> {
todo!()
}
async fn delete_vlan(&self, _vlan: &Vlan) -> Result<(), SwitchError> {
todo!()
}
}

View File

@@ -30,18 +30,6 @@ pub trait LoadBalancer: Send + Sync {
self.add_service(service).await?;
Ok(())
}
/// Ensure a TCP port is open for inbound WAN traffic.
///
/// This creates a firewall rule to accept traffic on the given port
/// from the WAN interface. Used by load balancers that need to receive
/// external traffic (e.g., OKD ingress on ports 80/443).
///
/// Default implementation is a no-op for topologies that don't manage
/// firewall rules (e.g., cloud environments with security groups).
async fn ensure_wan_access(&self, _port: u16) -> Result<(), ExecutorError> {
Ok(())
}
}
#[derive(Debug, PartialEq, Clone, Serialize)]

View File

@@ -1,12 +1,10 @@
pub mod decentralized;
mod failover;
pub mod firewall_pair;
mod ha_cluster;
pub mod ingress;
pub mod node_exporter;
pub mod opnsense;
pub use failover::*;
pub use firewall_pair::*;
use harmony_types::net::IpAddress;
mod host_binding;
mod http;

View File

@@ -7,7 +7,7 @@ use std::{
};
use async_trait::async_trait;
use brocade::PortOperatingMode;
use brocade::{InterfaceConfig, InterfaceSpeed, PortChannelConfig, PortChannelId, Vlan};
use derive_new::new;
use harmony_k8s::K8sClient;
use harmony_types::{
@@ -220,8 +220,6 @@ impl From<String> for NetworkError {
}
}
pub type PortConfig = (PortLocation, PortOperatingMode);
#[async_trait]
pub trait Switch: Send + Sync {
async fn setup_switch(&self) -> Result<(), SwitchError>;
@@ -231,9 +229,24 @@ pub trait Switch: Send + Sync {
mac_address: &MacAddress,
) -> Result<Option<PortLocation>, SwitchError>;
async fn configure_port_channel(&self, config: &HostNetworkConfig) -> Result<(), SwitchError>;
async fn configure_port_channel(
&self,
channel_id: PortChannelId,
config: &HostNetworkConfig,
) -> Result<(), SwitchError>;
/// Creates a port-channel from a PortChannelConfig (id, name, member ports).
/// Does NOT configure L2 mode — use configure_interfaces for that.
async fn configure_port_channel_from_config(
&self,
config: &PortChannelConfig,
) -> Result<(), SwitchError>;
async fn clear_port_channel(&self, ids: &Vec<Id>) -> Result<(), SwitchError>;
async fn configure_interface(&self, ports: &Vec<PortConfig>) -> Result<(), SwitchError>;
async fn configure_interfaces(
&self,
interfaces: &Vec<InterfaceConfig>,
) -> Result<(), SwitchError>;
async fn create_vlan(&self, vlan: &Vlan) -> Result<(), SwitchError>;
async fn delete_vlan(&self, vlan: &Vlan) -> Result<(), SwitchError>;
}
#[derive(Clone, Debug, PartialEq)]
@@ -290,12 +303,19 @@ pub trait SwitchClient: Debug + Send + Sync {
async fn configure_port_channel(
&self,
channel_id: PortChannelId,
channel_name: &str,
switch_ports: Vec<PortLocation>,
speed: Option<&InterfaceSpeed>,
) -> Result<u8, SwitchError>;
async fn clear_port_channel(&self, ids: &Vec<Id>) -> Result<(), SwitchError>;
async fn configure_interface(&self, ports: &Vec<PortConfig>) -> Result<(), SwitchError>;
async fn configure_interfaces(
&self,
interfaces: &Vec<InterfaceConfig>,
) -> Result<(), SwitchError>;
async fn create_vlan(&self, vlan: &Vlan) -> Result<(), SwitchError>;
async fn delete_vlan(&self, vlan: &Vlan) -> Result<(), SwitchError>;
}
#[cfg(test)]

View File

@@ -1,16 +1,20 @@
use async_trait::async_trait;
use brocade::{BrocadeClient, BrocadeOptions, InterSwitchLink, InterfaceStatus, PortOperatingMode};
use brocade::{
BrocadeClient, BrocadeOptions, InterSwitchLink, InterfaceConfig, InterfaceSpeed,
InterfaceStatus, PortChannelId, PortOperatingMode, Vlan,
};
use harmony_types::{
id::Id,
net::{IpAddress, MacAddress},
switch::{PortDeclaration, PortLocation},
};
use log::{info, warn};
use log::info;
use option_ext::OptionExt;
use crate::{
modules::brocade::BrocadeSwitchAuth,
topology::{PortConfig, SwitchClient, SwitchError},
topology::{SwitchClient, SwitchError},
};
#[derive(Debug, Clone)]
@@ -54,7 +58,7 @@ impl SwitchClient for BrocadeSwitchClient {
info!("Brocade found interfaces {interfaces:#?}");
let interfaces: Vec<(String, PortOperatingMode)> = interfaces
let interfaces: Vec<InterfaceConfig> = interfaces
.into_iter()
.filter(|interface| {
interface.operating_mode.is_none() && interface.status == InterfaceStatus::Connected
@@ -65,7 +69,16 @@ impl SwitchClient for BrocadeSwitchClient {
|| link.remote_port.contains(&interface.port_location)
})
})
.map(|interface| (interface.name.clone(), PortOperatingMode::Trunk))
.map(|interface| InterfaceConfig {
interface: brocade::SwitchInterface::Ethernet(
interface.interface_type.clone(),
interface.port_location.clone(),
),
mode: PortOperatingMode::Trunk,
access_vlan: None,
trunk_vlans: None,
speed: None,
})
.collect();
if interfaces.is_empty() {
@@ -114,50 +127,19 @@ impl SwitchClient for BrocadeSwitchClient {
async fn configure_port_channel(
&self,
channel_id: PortChannelId,
channel_name: &str,
switch_ports: Vec<PortLocation>,
speed: Option<&InterfaceSpeed>,
) -> Result<u8, SwitchError> {
let mut channel_id = self
.brocade
.find_available_channel_id()
self.brocade
.create_port_channel(channel_id, channel_name, &switch_ports, speed)
.await
.map_err(|e| SwitchError::new(format!("{e}")))?;
info!("Found next available channel id : {channel_id}");
loop {
match self
.brocade
.create_port_channel(channel_id, channel_name, &switch_ports)
.await
.map_err(|e| SwitchError::new(format!("{e}")))
{
Ok(_) => {
info!(
"Successfully configured port channel {channel_id} {channel_name} for ports {switch_ports:?}"
);
break;
}
Err(e) => {
warn!(
"Could not configure port channel {channel_id} {channel_name} for ports {switch_ports:?}"
);
let previous_id = channel_id;
while previous_id == channel_id {
channel_id = inquire::Text::new(
"Type the port channel number to use (or CTRL+C to exit) :",
)
.prompt()
.map_err(|e| {
SwitchError::new(format!("Failed to prompt for channel id : {e}"))
})?
.parse()
.unwrap_or(channel_id);
}
}
}
}
info!(
"Successfully configured port channel {channel_id} {channel_name} for ports {switch_ports:?}"
);
Ok(channel_id)
}
@@ -170,14 +152,28 @@ impl SwitchClient for BrocadeSwitchClient {
}
Ok(())
}
async fn configure_interface(&self, ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
// FIXME hardcoded TenGigabitEthernet = bad
let ports = ports
.iter()
.map(|p| (format!("TenGigabitEthernet {}", p.0), p.1.clone()))
.collect();
async fn configure_interfaces(
&self,
interfaces: &Vec<InterfaceConfig>,
) -> Result<(), SwitchError> {
self.brocade
.configure_interfaces(&ports)
.configure_interfaces(interfaces)
.await
.map_err(|e| SwitchError::new(e.to_string()))?;
Ok(())
}
async fn create_vlan(&self, vlan: &Vlan) -> Result<(), SwitchError> {
self.brocade
.create_vlan(vlan)
.await
.map_err(|e| SwitchError::new(e.to_string()))?;
Ok(())
}
async fn delete_vlan(&self, vlan: &Vlan) -> Result<(), SwitchError> {
self.brocade
.delete_vlan(vlan)
.await
.map_err(|e| SwitchError::new(e.to_string()))?;
Ok(())
@@ -208,8 +204,10 @@ impl SwitchClient for UnmanagedSwitch {
async fn configure_port_channel(
&self,
channel_name: &str,
switch_ports: Vec<PortLocation>,
_channel_id: PortChannelId,
_channel_name: &str,
_switch_ports: Vec<PortLocation>,
_speed: Option<&InterfaceSpeed>,
) -> Result<u8, SwitchError> {
todo!("unmanaged switch. Nothing to do.")
}
@@ -217,8 +215,19 @@ impl SwitchClient for UnmanagedSwitch {
async fn clear_port_channel(&self, ids: &Vec<Id>) -> Result<(), SwitchError> {
todo!("unmanged switch. Nothing to do.")
}
async fn configure_interface(&self, ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
todo!("unmanged switch. Nothing to do.")
async fn configure_interfaces(
&self,
_interfaces: &Vec<InterfaceConfig>,
) -> Result<(), SwitchError> {
todo!("unmanaged switch. Nothing to do.")
}
async fn create_vlan(&self, _vlan: &Vlan) -> Result<(), SwitchError> {
todo!("unmanaged switch. Nothing to do.")
}
async fn delete_vlan(&self, _vlan: &Vlan) -> Result<(), SwitchError> {
todo!("unmanaged switch. Nothing to do.")
}
}
@@ -229,8 +238,9 @@ mod tests {
use assertor::*;
use async_trait::async_trait;
use brocade::{
BrocadeClient, BrocadeInfo, Error, InterSwitchLink, InterfaceInfo, InterfaceStatus,
InterfaceType, MacAddressEntry, PortChannelId, PortOperatingMode, SecurityLevel,
BrocadeClient, BrocadeInfo, Error, InterSwitchLink, InterfaceConfig, InterfaceInfo,
InterfaceSpeed, InterfaceStatus, InterfaceType, MacAddressEntry, PortChannelId,
PortOperatingMode, SecurityLevel, Vlan,
};
use harmony_types::switch::PortLocation;
@@ -258,8 +268,26 @@ mod tests {
//TODO not sure about this
let configured_interfaces = brocade.configured_interfaces.lock().unwrap();
assert_that!(*configured_interfaces).contains_exactly(vec![
(first_interface.name.clone(), PortOperatingMode::Trunk),
(second_interface.name.clone(), PortOperatingMode::Trunk),
InterfaceConfig {
interface: brocade::SwitchInterface::Ethernet(
InterfaceType::TenGigabitEthernet,
PortLocation(1, 0, 1),
),
mode: PortOperatingMode::Trunk,
access_vlan: None,
trunk_vlans: None,
speed: None,
},
InterfaceConfig {
interface: brocade::SwitchInterface::Ethernet(
InterfaceType::TenGigabitEthernet,
PortLocation(1, 0, 4),
),
mode: PortOperatingMode::Trunk,
access_vlan: None,
trunk_vlans: None,
speed: None,
},
]);
}
@@ -343,7 +371,7 @@ mod tests {
struct FakeBrocadeClient {
stack_topology: Vec<InterSwitchLink>,
interfaces: Vec<InterfaceInfo>,
configured_interfaces: Arc<Mutex<Vec<(String, PortOperatingMode)>>>,
configured_interfaces: Arc<Mutex<Vec<InterfaceConfig>>>,
}
#[async_trait]
@@ -366,7 +394,7 @@ mod tests {
async fn configure_interfaces(
&self,
interfaces: &Vec<(String, PortOperatingMode)>,
interfaces: &Vec<InterfaceConfig>,
) -> Result<(), Error> {
let mut configured_interfaces = self.configured_interfaces.lock().unwrap();
*configured_interfaces = interfaces.clone();
@@ -374,6 +402,14 @@ mod tests {
Ok(())
}
async fn create_vlan(&self, _vlan: &Vlan) -> Result<(), Error> {
todo!()
}
async fn delete_vlan(&self, _vlan: &Vlan) -> Result<(), Error> {
todo!()
}
async fn find_available_channel_id(&self) -> Result<PortChannelId, Error> {
todo!()
}
@@ -383,10 +419,15 @@ mod tests {
_channel_id: PortChannelId,
_channel_name: &str,
_ports: &[PortLocation],
_speed: Option<&InterfaceSpeed>,
) -> Result<(), Error> {
todo!()
}
async fn reset_interface(&self, _interface: &str) -> Result<(), Error> {
todo!()
}
async fn clear_port_channel(&self, _channel_name: &str) -> Result<(), Error> {
todo!()
}
@@ -418,7 +459,7 @@ mod tests {
let interface_type = self
.interface_type
.clone()
.unwrap_or(InterfaceType::Ethernet("TenGigabitEthernet".into()));
.unwrap_or(InterfaceType::TenGigabitEthernet);
let port_location = self.port_location.clone().unwrap_or(PortLocation(1, 0, 1));
let name = format!("{interface_type} {port_location}");
let status = self.status.clone().unwrap_or(InterfaceStatus::Connected);


@@ -1,6 +1,6 @@
use async_trait::async_trait;
use harmony_types::net::MacAddress;
use log::{info, warn};
use log::info;
use crate::{
executors::ExecutorError,
@@ -19,46 +19,24 @@ impl DhcpServer for OPNSenseFirewall {
async fn add_static_mapping(&self, entry: &DHCPStaticEntry) -> Result<(), ExecutorError> {
let mac: Vec<String> = entry.mac.iter().map(MacAddress::to_string).collect();
self.opnsense_config
.dhcp()
.add_static_mapping(&mac, &entry.ip, &entry.name)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
{
let mut writable_opnsense = self.opnsense_config.write().await;
writable_opnsense
.dhcp()
.add_static_mapping(&mac, &entry.ip, &entry.name)
.unwrap();
}
info!("Registered {:?}", entry);
Ok(())
}
async fn remove_static_mapping(&self, mac: &MacAddress) -> Result<(), ExecutorError> {
self.opnsense_config
.dhcp()
.remove_static_mapping(&mac.to_string())
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
info!("Removed static mapping for MAC {}", mac);
Ok(())
async fn remove_static_mapping(&self, _mac: &MacAddress) -> Result<(), ExecutorError> {
todo!()
}
async fn list_static_mappings(&self) -> Vec<(MacAddress, IpAddress)> {
match self.opnsense_config.dhcp().list_static_mappings().await {
Ok(mappings) => mappings
.into_iter()
.filter_map(|(mac_str, ipv4)| {
let mac = MacAddress::try_from(mac_str.clone())
.map_err(|e| {
warn!("Skipping invalid MAC '{}': {}", mac_str, e);
e
})
.ok()?;
Some((mac, IpAddress::V4(ipv4)))
})
.collect(),
Err(e) => {
warn!("Failed to list static mappings: {}", e);
vec![]
}
}
todo!()
}
fn get_ip(&self) -> IpAddress {
@@ -70,13 +48,14 @@ impl DhcpServer for OPNSenseFirewall {
}
async fn set_pxe_options(&self, options: PxeOptions) -> Result<(), ExecutorError> {
let mut writable_opnsense = self.opnsense_config.write().await;
let PxeOptions {
ipxe_filename,
bios_filename,
efi_filename,
tftp_ip,
} = options;
self.opnsense_config
writable_opnsense
.dhcp()
.set_pxe_options(
tftp_ip.map(|i| i.to_string()),
@@ -95,7 +74,8 @@ impl DhcpServer for OPNSenseFirewall {
start: &IpAddress,
end: &IpAddress,
) -> Result<(), ExecutorError> {
self.opnsense_config
let mut writable_opnsense = self.opnsense_config.write().await;
writable_opnsense
.dhcp()
.set_dhcp_range(&start.to_string(), &end.to_string())
.await


@@ -11,7 +11,22 @@ use super::OPNSenseFirewall;
#[async_trait]
impl DnsServer for OPNSenseFirewall {
async fn register_hosts(&self, _hosts: Vec<DnsRecord>) -> Result<(), ExecutorError> {
todo!("Refactor this to use dnsmasq API")
todo!("Refactor this to use dnsmasq")
// let mut writable_opnsense = self.opnsense_config.write().await;
// let mut dns = writable_opnsense.dns();
// let hosts = hosts
// .iter()
// .map(|h| {
// Host::new(
// h.host.clone(),
// h.domain.clone(),
// h.record_type.to_string(),
// h.value.to_string(),
// )
// })
// .collect();
// dns.add_static_mapping(hosts);
// Ok(())
}
fn remove_record(
@@ -23,7 +38,26 @@ impl DnsServer for OPNSenseFirewall {
}
async fn list_records(&self) -> Vec<crate::topology::DnsRecord> {
todo!("Refactor this to use dnsmasq API")
todo!("Refactor this to use dnsmasq")
// self.opnsense_config
// .write()
// .await
// .dns()
// .get_hosts()
// .iter()
// .map(|h| DnsRecord {
// host: h.hostname.clone(),
// domain: h.domain.clone(),
// record_type: h
// .rr
// .parse()
// .expect("received invalid record type {h.rr} from opnsense"),
// value: h
// .server
// .parse()
// .expect("received invalid ipv4 record from opnsense {h.server}"),
// })
// .collect()
}
fn get_ip(&self) -> IpAddress {
@@ -35,11 +69,23 @@ impl DnsServer for OPNSenseFirewall {
}
async fn register_dhcp_leases(&self, _register: bool) -> Result<(), ExecutorError> {
todo!("Refactor this to use dnsmasq API")
todo!("Refactor this to use dnsmasq")
// let mut writable_opnsense = self.opnsense_config.write().await;
// let mut dns = writable_opnsense.dns();
// dns.register_dhcp_leases(register);
//
// Ok(())
}
async fn commit_config(&self) -> Result<(), ExecutorError> {
self.opnsense_config
let opnsense = self.opnsense_config.read().await;
opnsense
.save()
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
opnsense
.restart_dns()
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))


@@ -8,53 +8,6 @@ use harmony_types::net::IpAddress;
use harmony_types::net::Url;
const OPNSENSE_HTTP_ROOT_PATH: &str = "/usr/local/http";
/// Download a remote URL into a temporary directory, returning the temp dir path.
///
/// The file is saved with its original filename (extracted from the URL path).
/// The caller can then use `upload_files` to SFTP the whole temp dir contents
/// to the OPNsense appliance.
pub(in crate::infra::opnsense) async fn download_url_to_temp_dir(
url: &url::Url,
) -> Result<String, ExecutorError> {
let client = reqwest::Client::new();
let response =
client.get(url.as_str()).send().await.map_err(|e| {
ExecutorError::UnexpectedError(format!("Failed to download {url}: {e}"))
})?;
if !response.status().is_success() {
return Err(ExecutorError::UnexpectedError(format!(
"HTTP {} downloading {url}",
response.status()
)));
}
let file_name = url
.path_segments()
.and_then(|s| s.last())
.filter(|s| !s.is_empty())
.unwrap_or("download");
let temp_dir = std::env::temp_dir().join("harmony_url_downloads");
tokio::fs::create_dir_all(&temp_dir)
.await
.map_err(|e| ExecutorError::UnexpectedError(format!("Failed to create temp dir: {e}")))?;
let dest = temp_dir.join(file_name);
let bytes = response
.bytes()
.await
.map_err(|e| ExecutorError::UnexpectedError(format!("Failed to read response: {e}")))?;
tokio::fs::write(&dest, &bytes)
.await
.map_err(|e| ExecutorError::UnexpectedError(format!("Failed to write temp file: {e}")))?;
info!("Downloaded {} to {:?} ({} bytes)", url, dest, bytes.len());
Ok(temp_dir.to_string_lossy().to_string())
}
#[async_trait]
impl HttpServer for OPNSenseFirewall {
async fn serve_files(
@@ -62,6 +15,7 @@ impl HttpServer for OPNSenseFirewall {
url: &Url,
remote_path: &Option<String>,
) -> Result<(), ExecutorError> {
let config = self.opnsense_config.read().await;
info!("Uploading files from url {url} to {OPNSENSE_HTTP_ROOT_PATH}");
let remote_upload_path = remote_path
.clone()
@@ -69,18 +23,12 @@ impl HttpServer for OPNSenseFirewall {
.unwrap_or(OPNSENSE_HTTP_ROOT_PATH.to_string());
match url {
Url::LocalFolder(path) => {
self.opnsense_config
config
.upload_files(path, &remote_upload_path)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
}
Url::Url(remote_url) => {
let local_dir = download_url_to_temp_dir(remote_url).await?;
self.opnsense_config
.upload_files(&local_dir, &remote_upload_path)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
}
Url::Url(_url) => todo!(),
}
Ok(())
}
@@ -97,8 +45,9 @@ impl HttpServer for OPNSenseFirewall {
}
};
let config = self.opnsense_config.read().await;
info!("Uploading file content to {}", path);
self.opnsense_config
config
.upload_file_content(&path, &file.content)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
@@ -115,6 +64,8 @@ impl HttpServer for OPNSenseFirewall {
async fn reload_restart(&self) -> Result<(), ExecutorError> {
self.opnsense_config
.write()
.await
.caddy()
.reload_restart()
.await
@@ -122,20 +73,20 @@ impl HttpServer for OPNSenseFirewall {
}
async fn ensure_initialized(&self) -> Result<(), ExecutorError> {
if !self.opnsense_config.caddy().is_installed().await {
info!("Http config not available, installing os-caddy package");
self.opnsense_config
.install_package("os-caddy")
.await
.map_err(|e| {
ExecutorError::UnexpectedError(format!("Failed to install os-caddy: {e:?}"))
})?;
let mut config = self.opnsense_config.write().await;
let caddy = config.caddy();
if caddy.get_full_config().is_none() {
info!("Http config not available in opnsense config, installing package");
config.install_package("os-caddy").await.map_err(|e| {
ExecutorError::UnexpectedError(format!(
"Executor failed when trying to install os-caddy package with error {e:?}"
))
})?;
} else {
info!("Http config available, assuming Caddy is already installed");
info!("Http config available in opnsense config, assuming it is already installed");
}
info!("Adding custom caddy config files");
self.opnsense_config
config
.upload_files(
"./data/watchguard/caddy_config",
"/usr/local/etc/caddy/caddy.d/",
@@ -144,11 +95,7 @@ impl HttpServer for OPNSenseFirewall {
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
info!("Enabling http server");
self.opnsense_config
.caddy()
.enable(true)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
config.caddy().enable(true);
Ok(())
}


@@ -1,8 +1,9 @@
use async_trait::async_trait;
use log::{debug, error, info, warn};
use opnsense_config::modules::load_balancer::{
HaproxyService, LbBackend, LbFrontend, LbHealthCheck, LbServer,
use opnsense_config_xml::{
Frontend, HAProxy, HAProxyBackend, HAProxyHealthCheck, HAProxyServer, MaybeString,
};
use uuid::Uuid;
use crate::{
executors::ExecutorError,
@@ -11,7 +12,6 @@ use crate::{
LogicalHost, SSL,
},
};
use harmony_types::firewall::{Direction, FirewallAction, IpProtocol, NetworkProtocol};
use harmony_types::net::IpAddress;
use super::OPNSenseFirewall;
@@ -26,13 +26,15 @@ impl LoadBalancer for OPNSenseFirewall {
}
async fn add_service(&self, service: &LoadBalancerService) -> Result<(), ExecutorError> {
let (frontend, backend, servers, healthcheck) = harmony_service_to_lb_types(service);
let mut config = self.opnsense_config.write().await;
let mut load_balancer = config.load_balancer();
self.opnsense_config
.load_balancer()
.configure_service(frontend, backend, servers, healthcheck)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))
let (frontend, backend, servers, healthcheck) =
harmony_load_balancer_service_to_haproxy_xml(service);
load_balancer.configure_service(frontend, backend, servers, healthcheck);
Ok(())
}
async fn remove_service(&self, service: &LoadBalancerService) -> Result<(), ExecutorError> {
@@ -45,6 +47,8 @@ impl LoadBalancer for OPNSenseFirewall {
async fn reload_restart(&self) -> Result<(), ExecutorError> {
self.opnsense_config
.write()
.await
.load_balancer()
.reload_restart()
.await
@@ -52,214 +56,455 @@ impl LoadBalancer for OPNSenseFirewall {
}
async fn ensure_initialized(&self) -> Result<(), ExecutorError> {
let lb = self.opnsense_config.load_balancer();
if lb.is_installed().await {
debug!("HAProxy is installed");
let mut config = self.opnsense_config.write().await;
let load_balancer = config.load_balancer();
if let Some(config) = load_balancer.get_full_config() {
debug!(
"HAProxy config available in opnsense config, assuming it is already installed, {config:?}"
);
} else {
self.opnsense_config
.install_package("os-haproxy")
.await
.map_err(|e| {
ExecutorError::UnexpectedError(format!("Failed to install os-haproxy: {e:?}"))
})?;
config.install_package("os-haproxy").await.map_err(|e| {
ExecutorError::UnexpectedError(format!(
"Executor failed when trying to install os-haproxy package with error {e:?}"
))
})?;
}
self.opnsense_config
.load_balancer()
.enable(true)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
config.load_balancer().enable(true);
Ok(())
}
async fn list_services(&self) -> Vec<LoadBalancerService> {
match self.opnsense_config.load_balancer().list_services().await {
Ok(services) => services
.into_iter()
.filter_map(|svc| haproxy_service_to_harmony(&svc))
.collect(),
Err(e) => {
warn!("Failed to list HAProxy services: {e}");
vec![]
}
}
}
async fn ensure_wan_access(&self, port: u16) -> Result<(), ExecutorError> {
info!("Ensuring WAN firewall rule for TCP port {port}");
let fw = self.opnsense_config.firewall();
fw.ensure_filter_rule(
&FirewallAction::Pass,
&Direction::In,
"wan",
&IpProtocol::Inet,
&NetworkProtocol::Tcp,
"any",
"any",
Some(&port.to_string()),
None,
&format!("LB: Allow TCP/{port} ingress on WAN"),
false,
)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
fw.apply()
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
Ok(())
let mut config = self.opnsense_config.write().await;
let load_balancer = config.load_balancer();
let haproxy_xml_config = load_balancer.get_full_config();
haproxy_xml_config_to_harmony_loadbalancer(haproxy_xml_config)
}
}
fn haproxy_service_to_harmony(svc: &HaproxyService) -> Option<LoadBalancerService> {
let listening_port = svc.bind.parse().unwrap_or_else(|_| {
panic!(
"HAProxy frontend address should be a valid SocketAddr, got {}",
svc.bind
)
});
pub(crate) fn haproxy_xml_config_to_harmony_loadbalancer(
haproxy: &Option<HAProxy>,
) -> Vec<LoadBalancerService> {
let haproxy = match haproxy {
Some(haproxy) => haproxy,
None => return vec![],
};
let backend_servers: Vec<BackendServer> = svc
haproxy
.frontends
.frontend
.iter()
.map(|frontend| {
let mut backend_servers = vec![];
let matching_backend = haproxy
.backends
.backends
.iter()
.find(|b| Some(b.uuid.clone()) == frontend.default_backend);
let mut health_check = None;
match matching_backend {
Some(backend) => {
backend_servers.append(&mut get_servers_for_backend(backend, haproxy));
health_check = get_health_check_for_backend(backend, haproxy);
}
None => {
warn!(
"HAProxy config could not find a matching backend for frontend {frontend:?}"
);
}
}
LoadBalancerService {
backend_servers,
listening_port: frontend.bind.parse().unwrap_or_else(|_| {
panic!(
"HAProxy frontend address should be a valid SocketAddr, got {}",
frontend.bind
)
}),
health_check,
}
})
.collect()
}
pub(crate) fn get_servers_for_backend(
backend: &HAProxyBackend,
haproxy: &HAProxy,
) -> Vec<BackendServer> {
let backend_servers: Vec<&str> = match &backend.linked_servers.content {
Some(linked_servers) => linked_servers.split(',').collect(),
None => {
info!("No server defined for HAProxy backend {:?}", backend);
return vec![];
}
};
haproxy
.servers
.servers
.iter()
.map(|s| BackendServer {
address: s.address.clone(),
port: s.port,
.filter_map(|server| {
let address = server.address.clone()?;
let port = server.port?;
if backend_servers.contains(&server.uuid.as_str()) {
return Some(BackendServer { address, port });
}
None
})
.collect();
let health_check = svc
.health_check
.as_ref()
.and_then(|hc| match hc.check_type.as_str() {
"TCP" => Some(HealthCheck::TCP(hc.checkport)),
"HTTP" => {
let path = hc.http_uri.clone().unwrap_or_default();
let method: HttpMethod = hc.http_method.clone().unwrap_or_default().into();
let ssl = match hc.ssl.as_deref().unwrap_or("").to_uppercase().as_str() {
"SSL" => SSL::SSL,
"SSLNI" => SSL::SNI,
"NOSSL" => SSL::Disabled,
"" => SSL::Default,
other => {
error!("Unknown haproxy health check ssl config {other}");
SSL::Other(other.to_string())
}
};
Some(HealthCheck::HTTP(
hc.checkport,
path,
method,
HttpStatusCode::Success2xx,
ssl,
))
}
_ => {
warn!("Unsupported health check type: {}", hc.check_type);
None
}
});
Some(LoadBalancerService {
backend_servers,
listening_port,
health_check,
})
.collect()
}
pub(crate) fn harmony_service_to_lb_types(
service: &LoadBalancerService,
) -> (LbFrontend, LbBackend, Vec<LbServer>, Option<LbHealthCheck>) {
let healthcheck = service.health_check.as_ref().map(|hc| match hc {
HealthCheck::HTTP(port, path, http_method, _status_code, ssl) => {
let ssl_str = match ssl {
SSL::SSL => Some("ssl".to_string()),
SSL::SNI => Some("sslni".to_string()),
SSL::Disabled => Some("nossl".to_string()),
SSL::Default => Some(String::new()),
SSL::Other(other) => Some(other.clone()),
pub(crate) fn get_health_check_for_backend(
backend: &HAProxyBackend,
haproxy: &HAProxy,
) -> Option<HealthCheck> {
let health_check_uuid = match &backend.health_check.content {
Some(uuid) => uuid,
None => return None,
};
let haproxy_health_check = haproxy
.healthchecks
.healthchecks
.iter()
.find(|h| &h.uuid == health_check_uuid)?;
let binding = haproxy_health_check.health_check_type.to_uppercase();
let uppercase = binding.as_str();
match uppercase {
"TCP" => {
if let Some(checkport) = haproxy_health_check.checkport.content.as_ref() {
if !checkport.is_empty() {
return Some(HealthCheck::TCP(Some(checkport.parse().unwrap_or_else(
|_| {
panic!(
"HAProxy check port should be a valid port number, got {checkport}"
)
},
))));
}
}
Some(HealthCheck::TCP(None))
}
"HTTP" => {
let path: String = haproxy_health_check
.http_uri
.content
.clone()
.unwrap_or_default();
let method: HttpMethod = haproxy_health_check
.http_method
.content
.clone()
.unwrap_or_default()
.into();
let status_code: HttpStatusCode = HttpStatusCode::Success2xx;
let ssl = match haproxy_health_check
.ssl
.content_string()
.to_uppercase()
.as_str()
{
"SSL" => SSL::SSL,
"SSLNI" => SSL::SNI,
"NOSSL" => SSL::Disabled,
"" => SSL::Default,
other => {
error!("Unknown haproxy health check ssl config {other}");
SSL::Other(other.to_string())
}
};
let path_without_query = path.split_once('?').map_or(path.as_str(), |(p, _)| p);
let port_name = port
.map(|p| p.to_string())
.unwrap_or("serverport".to_string());
LbHealthCheck {
name: format!("HTTP_{http_method}_{path_without_query}_{port_name}"),
check_type: "http".to_string(),
interval: "2s".to_string(),
http_method: Some(http_method.to_string().to_lowercase()),
http_uri: Some(path.clone()),
ssl: ssl_str,
checkport: port.map(|p| p.to_string()),
let port = haproxy_health_check
.checkport
.content_string()
.parse::<u16>()
.ok();
debug!("Found haproxy healthcheck port {port:?}");
Some(HealthCheck::HTTP(port, path, method, status_code, ssl))
}
_ => panic!("Received unsupported health check type {}", uppercase),
}
}
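Both conversion directions map the plugin's SSL flag strings to the harmony `SSL` enum with the same case-insensitive match. A minimal standalone sketch of that mapping (the `Ssl` name here is a local stand-in for the harmony enum; the string values are the ones seen in the match arms above):

```rust
// Stand-in for the harmony SSL enum used by the health-check conversion.
#[derive(Debug, Clone, PartialEq)]
pub enum Ssl {
    Ssl,
    Sni,
    Disabled,
    Default,
    Other(String),
}

/// Map an HAProxy plugin SSL flag string to the enum, case-insensitively.
/// Unknown values are preserved (uppercased) in `Other` rather than dropped,
/// matching the `error!` fallback branch above.
pub fn parse_ssl(raw: &str) -> Ssl {
    match raw.to_uppercase().as_str() {
        "SSL" => Ssl::Ssl,
        "SSLNI" => Ssl::Sni,
        "NOSSL" => Ssl::Disabled,
        "" => Ssl::Default,
        other => Ssl::Other(other.to_string()),
    }
}
```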
pub(crate) fn harmony_load_balancer_service_to_haproxy_xml(
service: &LoadBalancerService,
) -> (
Frontend,
HAProxyBackend,
Vec<HAProxyServer>,
Option<HAProxyHealthCheck>,
) {
// Here we have to build:
// One frontend
// One backend
// One Option<healthcheck>
// One Vec of servers
//
// Then merge them with the haproxy config individually.
//
// We also have to take into account that it is entirely possible that a backend uses a server
// with the same definition as in another backend. So when creating a new backend, we must not
// blindly create new servers just because the backend does not exist yet. Even if it is a new
// backend, it may very well reuse existing servers.
//
// We also need to support router integration for port forwarding on WAN as a strategy to
// handle dyndns.
//
// Ownership chain: servers are standalone; the backend points to its servers
// and to its health check; the frontend points to the backend.
let healthcheck = if let Some(health_check) = &service.health_check {
match health_check {
HealthCheck::HTTP(port, path, http_method, _http_status_code, ssl) => {
let ssl: MaybeString = match ssl {
SSL::SSL => "ssl".into(),
SSL::SNI => "sslni".into(),
SSL::Disabled => "nossl".into(),
SSL::Default => "".into(),
SSL::Other(other) => other.as_str().into(),
};
let path_without_query = path.split_once('?').map_or(path.as_str(), |(p, _)| p);
let (port, port_name) = match port {
Some(port) => (Some(port.to_string()), port.to_string()),
None => (None, "serverport".to_string()),
};
let haproxy_check = HAProxyHealthCheck {
name: format!("HTTP_{http_method}_{path_without_query}_{port_name}"),
uuid: Uuid::new_v4().to_string(),
http_method: http_method.to_string().to_lowercase().into(),
health_check_type: "http".to_string(),
http_uri: path.clone().into(),
interval: "2s".to_string(),
ssl,
checkport: MaybeString::from(port.map(|p| p.to_string())),
..Default::default()
};
Some(haproxy_check)
}
HealthCheck::TCP(port) => {
let (port, port_name) = match port {
Some(port) => (Some(port.to_string()), port.to_string()),
None => (None, "serverport".to_string()),
};
let haproxy_check = HAProxyHealthCheck {
name: format!("TCP_{port_name}"),
uuid: Uuid::new_v4().to_string(),
health_check_type: "tcp".to_string(),
checkport: port.into(),
interval: "2s".to_string(),
..Default::default()
};
Some(haproxy_check)
}
}
HealthCheck::TCP(port) => {
let port_name = port
.map(|p| p.to_string())
.unwrap_or("serverport".to_string());
LbHealthCheck {
name: format!("TCP_{port_name}"),
check_type: "tcp".to_string(),
interval: "2s".to_string(),
http_method: None,
http_uri: None,
ssl: None,
checkport: port.map(|p| p.to_string()),
}
}
});
} else {
None
};
debug!("Built healthcheck {healthcheck:?}");
let servers: Vec<LbServer> = service
let servers: Vec<HAProxyServer> = service
.backend_servers
.iter()
.map(|s| LbServer {
name: format!("{}_{}", &s.address, &s.port),
address: s.address.clone(),
port: s.port,
enabled: true,
mode: "active".to_string(),
server_type: "static".to_string(),
})
.map(server_to_haproxy_server)
.collect();
debug!("Built servers {servers:?}");
let bind_str = service.listening_port.to_string();
let safe_name = bind_str.replace(':', "_");
let backend = LbBackend {
name: format!("backend_{safe_name}"),
mode: "tcp".to_string(),
let mut backend = HAProxyBackend {
uuid: Uuid::new_v4().to_string(),
enabled: 1,
name: format!(
"backend_{}",
service.listening_port.to_string().replace(':', "_")
),
algorithm: "roundrobin".to_string(),
enabled: true,
health_check_enabled: healthcheck.is_some(),
random_draws: Some(2),
stickiness_expire: Some("30m".to_string()),
stickiness_size: Some("50k".to_string()),
stickiness_conn_rate_period: Some("10s".to_string()),
stickiness_sess_rate_period: Some("10s".to_string()),
stickiness_http_req_rate_period: Some("10s".to_string()),
stickiness_http_err_rate_period: Some("10s".to_string()),
stickiness_bytes_in_rate_period: Some("1m".to_string()),
stickiness_bytes_out_rate_period: Some("1m".to_string()),
stickiness_expire: "30m".to_string(),
stickiness_size: "50k".to_string(),
stickiness_conn_rate_period: "10s".to_string(),
stickiness_sess_rate_period: "10s".to_string(),
stickiness_http_req_rate_period: "10s".to_string(),
stickiness_http_err_rate_period: "10s".to_string(),
stickiness_bytes_in_rate_period: "1m".to_string(),
stickiness_bytes_out_rate_period: "1m".to_string(),
mode: "tcp".to_string(), // TODO do not depend on health check here
..Default::default()
};
info!("HAProxy backend algorithm is currently hardcoded to roundrobin");
info!("HAPRoxy backend algorithm is currently hardcoded to roundrobin");
let frontend = LbFrontend {
name: format!("frontend_{safe_name}"),
bind: bind_str,
mode: "tcp".to_string(),
enabled: true,
default_backend: None, // Set by configure_service after creating backend
stickiness_expire: Some("30m".to_string()),
stickiness_size: Some("50k".to_string()),
stickiness_conn_rate_period: Some("10s".to_string()),
stickiness_sess_rate_period: Some("10s".to_string()),
stickiness_http_req_rate_period: Some("10s".to_string()),
stickiness_http_err_rate_period: Some("10s".to_string()),
stickiness_bytes_in_rate_period: Some("1m".to_string()),
stickiness_bytes_out_rate_period: Some("1m".to_string()),
ssl_hsts_max_age: Some(15768000),
if let Some(hcheck) = &healthcheck {
backend.health_check_enabled = 1;
backend.health_check = hcheck.uuid.clone().into();
}
backend.linked_servers = servers
.iter()
.map(|s| s.uuid.as_str())
.collect::<Vec<&str>>()
.join(",")
.into();
debug!("Built backend {backend:?}");
let frontend = Frontend {
uuid: uuid::Uuid::new_v4().to_string(),
enabled: 1,
name: format!(
"frontend_{}",
service.listening_port.to_string().replace(':', "_")
),
bind: service.listening_port.to_string(),
mode: "tcp".to_string(), // TODO do not depend on health check here
default_backend: Some(backend.uuid.clone()),
stickiness_expire: "30m".to_string().into(),
stickiness_size: "50k".to_string().into(),
stickiness_conn_rate_period: "10s".to_string().into(),
stickiness_sess_rate_period: "10s".to_string().into(),
stickiness_http_req_rate_period: "10s".to_string().into(),
stickiness_http_err_rate_period: "10s".to_string().into(),
stickiness_bytes_in_rate_period: "1m".to_string().into(),
stickiness_bytes_out_rate_period: "1m".to_string().into(),
ssl_hsts_max_age: 15768000,
..Default::default()
};
info!("HAProxy frontend and backend mode currently hardcoded to tcp");
info!("HAPRoxy frontend and backend mode currently hardcoded to tcp");
debug!("Built frontend {frontend:?}");
(frontend, backend, servers, healthcheck)
}
fn server_to_haproxy_server(server: &BackendServer) -> HAProxyServer {
HAProxyServer {
uuid: Uuid::new_v4().to_string(),
name: format!("{}_{}", &server.address, &server.port),
enabled: 1,
address: Some(server.address.clone()),
port: Some(server.port),
mode: "active".to_string(),
server_type: "static".to_string(),
..Default::default()
}
}
#[cfg(test)]
mod tests {
use opnsense_config_xml::HAProxyServer;
use super::*;
#[test]
fn test_get_servers_for_backend_with_linked_servers() {
// Create a backend with linked servers
let mut backend = HAProxyBackend::default();
backend.linked_servers.content = Some("server1,server2".to_string());
// Create an HAProxy instance with servers
let mut haproxy = HAProxy::default();
let server = HAProxyServer {
uuid: "server1".to_string(),
address: Some("192.168.1.1".to_string()),
port: Some(80),
..Default::default()
};
haproxy.servers.servers.push(server);
// Call the function
let result = get_servers_for_backend(&backend, &haproxy);
// Check the result
assert_eq!(
result,
vec![BackendServer {
address: "192.168.1.1".to_string(),
port: 80,
},]
);
}
#[test]
fn test_get_servers_for_backend_no_linked_servers() {
// Create a backend with no linked servers
let backend = HAProxyBackend::default();
// Create an HAProxy instance with servers
let mut haproxy = HAProxy::default();
let server = HAProxyServer {
uuid: "server1".to_string(),
address: Some("192.168.1.1".to_string()),
port: Some(80),
..Default::default()
};
haproxy.servers.servers.push(server);
// Call the function
let result = get_servers_for_backend(&backend, &haproxy);
// Check the result
assert_eq!(result, vec![]);
}
#[test]
fn test_get_servers_for_backend_no_matching_servers() {
// Create a backend with linked servers that do not match any in HAProxy
let mut backend = HAProxyBackend::default();
backend.linked_servers.content = Some("server4,server5".to_string());
// Create an HAProxy instance with servers
let mut haproxy = HAProxy::default();
let server = HAProxyServer {
uuid: "server1".to_string(),
address: Some("192.168.1.1".to_string()),
port: Some(80),
..Default::default()
};
haproxy.servers.servers.push(server);
// Call the function
let result = get_servers_for_backend(&backend, &haproxy);
// Check the result
assert_eq!(result, vec![]);
}
#[test]
fn test_get_servers_for_backend_multiple_linked_servers() {
// Create a backend with multiple linked servers
#[allow(clippy::field_reassign_with_default)]
let mut backend = HAProxyBackend::default();
backend.linked_servers.content = Some("server1,server2".to_string());
//
// Create an HAProxy instance with matching servers
let mut haproxy = HAProxy::default();
let server = HAProxyServer {
uuid: "server1".to_string(),
address: Some("some-hostname.test.mcd".to_string()),
port: Some(80),
..Default::default()
};
haproxy.servers.servers.push(server);
let server = HAProxyServer {
uuid: "server2".to_string(),
address: Some("192.168.1.2".to_string()),
port: Some(8080),
..Default::default()
};
haproxy.servers.servers.push(server);
// Call the function
let result = get_servers_for_backend(&backend, &haproxy);
// Check the result
assert_eq!(
result,
vec![
BackendServer {
address: "some-hostname.test.mcd".to_string(),
port: 80,
},
BackendServer {
address: "192.168.1.2".to_string(),
port: 8080,
},
]
);
}
}
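The conversion above wires frontend, backend, and servers together purely through UUID references: each server gets a fresh UUID, the backend stores a comma-joined list of server UUIDs in `linked_servers`, and the frontend stores the backend's UUID in `default_backend`. A reduced sketch of that linking scheme (struct shapes are simplified stand-ins, not the `opnsense_config_xml` definitions):

```rust
// Simplified stand-ins for the HAProxy XML structs; only the linking
// fields are modeled.
struct Server {
    uuid: String,
}

struct Backend {
    uuid: String,
    linked_servers: String, // comma-joined server UUIDs
}

struct Frontend {
    default_backend: Option<String>, // backend UUID
}

/// Link servers -> backend -> frontend by UUID, as the conversion above does.
fn link(servers: &[Server], backend_uuid: &str) -> (Backend, Frontend) {
    let backend = Backend {
        uuid: backend_uuid.to_string(),
        linked_servers: servers
            .iter()
            .map(|s| s.uuid.as_str())
            .collect::<Vec<_>>()
            .join(","),
    };
    let frontend = Frontend {
        default_backend: Some(backend.uuid.clone()),
    };
    (backend, frontend)
}
```

This is also why `get_servers_for_backend` resolves servers by splitting `linked_servers` on commas and matching UUIDs, as exercised in the tests above.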


@@ -9,17 +9,14 @@ mod tftp;
use std::sync::Arc;
pub use management::*;
use tokio::sync::RwLock;
use cidr::Ipv4Cidr;
use crate::config::secret::{OPNSenseApiCredentials, OPNSenseFirewallCredentials};
use crate::topology::Router;
use crate::{executors::ExecutorError, topology::LogicalHost};
use harmony_types::net::IpAddress;
#[derive(Debug, Clone)]
pub struct OPNSenseFirewall {
opnsense_config: Arc<opnsense_config::Config>,
opnsense_config: Arc<RwLock<opnsense_config::Config>>,
host: LogicalHost,
}
@@ -28,87 +25,27 @@ impl OPNSenseFirewall {
self.host.ip
}
/// Create a new OPNSenseFirewall.
///
/// Requires both API credentials (for configuration CRUD) and SSH
/// credentials (for file uploads, PXE config).
///
/// API port defaults to 443
pub async fn new(
host: LogicalHost,
ssh_port: Option<u16>,
api_creds: &OPNSenseApiCredentials,
ssh_creds: &OPNSenseFirewallCredentials,
) -> Self {
Self::with_api_port(host, ssh_port, 443, api_creds, ssh_creds).await
}
/// Like [`new`] but with a custom API/web GUI port.
pub async fn with_api_port(
host: LogicalHost,
port: Option<u16>,
api_port: u16,
api_creds: &OPNSenseApiCredentials,
ssh_creds: &OPNSenseFirewallCredentials,
) -> Self {
let config = opnsense_config::Config::from_credentials_with_api_port(
host.ip,
port,
api_port,
&api_creds.key,
&api_creds.secret,
&ssh_creds.username,
&ssh_creds.password,
)
.await
.expect("Failed to create OPNsense config");
/// panics : if the opnsense config file cannot be loaded by the underlying opnsense_config
/// crate
pub async fn new(host: LogicalHost, port: Option<u16>, username: &str, password: &str) -> Self {
Self {
opnsense_config: Arc::new(config),
opnsense_config: Arc::new(RwLock::new(
opnsense_config::Config::from_credentials(host.ip, port, username, password).await,
)),
host,
}
}
pub fn get_opnsense_config(&self) -> Arc<opnsense_config::Config> {
pub fn get_opnsense_config(&self) -> Arc<RwLock<opnsense_config::Config>> {
self.opnsense_config.clone()
}
/// Test-only constructor from a pre-built `Config`.
///
/// Allows creating an `OPNSenseFirewall` backed by a mock HTTP server
/// without needing real credentials or SSH connections.
#[cfg(test)]
pub fn from_config(host: LogicalHost, config: opnsense_config::Config) -> Self {
Self {
opnsense_config: Arc::new(config),
host,
}
}
async fn commit_config(&self) -> Result<(), ExecutorError> {
// With the API backend, mutations are applied per-call.
// This is now a no-op for backward compatibility.
self.opnsense_config
.read()
.await
.apply()
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))
}
}
impl Router for OPNSenseFirewall {
fn get_gateway(&self) -> IpAddress {
self.host.ip
}
fn get_cidr(&self) -> Ipv4Cidr {
let ipv4 = match self.host.ip {
IpAddress::V4(ip) => ip,
IpAddress::V6(_) => panic!("IPv6 not supported for OPNSense router"),
};
Ipv4Cidr::new(ipv4, 24).unwrap()
}
fn get_host(&self) -> LogicalHost {
self.host.clone()
}
}


@@ -9,33 +9,36 @@ use crate::{
#[async_trait]
impl NodeExporter for OPNSenseFirewall {
async fn ensure_initialized(&self) -> Result<(), ExecutorError> {
if self.opnsense_config.node_exporter().is_installed().await {
debug!("Node exporter is installed");
let mut config = self.opnsense_config.write().await;
let node_exporter = config.node_exporter();
if let Some(config) = node_exporter.get_full_config() {
debug!(
"Node exporter available in opnsense config, assuming it is already installed. {config:?}"
);
} else {
self.opnsense_config
config
.install_package("os-node_exporter")
.await
.map_err(|e| {
ExecutorError::UnexpectedError(format!(
"Failed to install os-node_exporter: {e:?}"
))
})?;
ExecutorError::UnexpectedError(format!("Executor failed when trying to install os-node_exporter package with error {e:?}"
))
})?;
}
self.opnsense_config
config
.node_exporter()
.enable(true)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
Ok(())
}
async fn commit_config(&self) -> Result<(), ExecutorError> {
OPNSenseFirewall::commit_config(self).await
}
async fn reload_restart(&self) -> Result<(), ExecutorError> {
self.opnsense_config
.write()
.await
.node_exporter()
.reload_restart()
.await


@@ -12,21 +12,16 @@ impl TftpServer for OPNSenseFirewall {
async fn serve_files(&self, url: &Url) -> Result<(), ExecutorError> {
let tftp_root_path = "/usr/local/tftp";
let config = self.opnsense_config.read().await;
info!("Uploading files from url {url} to {tftp_root_path}");
match url {
Url::LocalFolder(path) => {
self.opnsense_config
config
.upload_files(path, tftp_root_path)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
}
Url::Url(url) => {
let local_dir = super::http::download_url_to_temp_dir(url).await?;
self.opnsense_config
.upload_files(&local_dir, tftp_root_path)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
}
Url::Url(url) => todo!("This url is not supported yet {url}"),
}
Ok(())
}
@@ -38,10 +33,11 @@ impl TftpServer for OPNSenseFirewall {
async fn set_ip(&self, ip: IpAddress) -> Result<(), ExecutorError> {
info!("Setting listen_ip to {}", &ip);
self.opnsense_config
.tftp()
.listen_ip(&ip.to_string())
.write()
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))
.tftp()
.listen_ip(&ip.to_string());
Ok(())
}
async fn commit_config(&self) -> Result<(), ExecutorError> {
@@ -50,6 +46,8 @@ impl TftpServer for OPNSenseFirewall {
async fn reload_restart(&self) -> Result<(), ExecutorError> {
self.opnsense_config
.write()
.await
.tftp()
.reload_restart()
.await
@@ -57,23 +55,22 @@ impl TftpServer for OPNSenseFirewall {
}
async fn ensure_initialized(&self) -> Result<(), ExecutorError> {
if !self.opnsense_config.tftp().is_installed().await {
info!("TFTP not installed, installing os-tftp package");
self.opnsense_config
.install_package("os-tftp")
.await
.map_err(|e| {
ExecutorError::UnexpectedError(format!("Failed to install os-tftp: {e:?}"))
})?;
let mut config = self.opnsense_config.write().await;
let tftp = config.tftp();
if tftp.get_full_config().is_none() {
info!("Tftp config not available in opnsense config, installing package");
config.install_package("os-tftp").await.map_err(|e| {
ExecutorError::UnexpectedError(format!(
"Executor failed when trying to install os-tftp package with error {e:?}"
))
})?;
} else {
info!("TFTP config available, assuming it is already installed");
info!("Tftp config available in opnsense config, assuming it is already installed");
}
info!("Enabling tftp server");
self.opnsense_config
.tftp()
.enable(true)
.await
.map_err(|e| ExecutorError::UnexpectedError(e.to_string()))
config.tftp().enable(true);
Ok(())
}
}

View File

@@ -57,7 +57,6 @@ pub enum RustWebFramework {
#[derive(Debug, Clone, Serialize)]
pub struct RustWebapp {
pub name: String,
pub version: String,
/// The path to the root of the Rust project to be containerized.
pub project_root: PathBuf,
pub service_port: u32,
@@ -466,7 +465,6 @@ impl RustWebapp {
let app_name = &self.name;
let service_port = self.service_port;
let chart_version = &self.version;
// Create Chart.yaml
let chart_yaml = format!(
r#"
@@ -474,7 +472,7 @@ apiVersion: v2
name: {chart_name}
description: A Helm chart for the {app_name} web application.
type: application
version: {chart_version}
version: 0.2.1
appVersion: "{image_tag}"
"#,
);

View File

@@ -1,5 +1,5 @@
use async_trait::async_trait;
use brocade::{BrocadeOptions, PortOperatingMode};
use brocade::{BrocadeOptions, InterfaceConfig, PortChannelConfig, PortChannelId, PortOperatingMode, Vlan};
use crate::{
data::Version,
@@ -8,7 +8,7 @@ use crate::{
inventory::Inventory,
score::Score,
topology::{
HostNetworkConfig, PortConfig, PreparationError, PreparationOutcome, Switch, SwitchClient,
HostNetworkConfig, PreparationError, PreparationOutcome, Switch, SwitchClient,
SwitchError, Topology,
},
};
@@ -20,7 +20,7 @@ use serde::Serialize;
#[derive(Clone, Debug, Serialize)]
pub struct BrocadeSwitchScore {
pub port_channels_to_clear: Vec<Id>,
pub ports_to_configure: Vec<PortConfig>,
pub ports_to_configure: Vec<InterfaceConfig>,
}
impl<T: Topology + Switch> Score<T> for BrocadeSwitchScore {
@@ -59,7 +59,7 @@ impl<T: Topology + Switch> Interpret<T> for BrocadeSwitchInterpret {
.map_err(|e| InterpretError::new(e.to_string()))?;
debug!("Configuring interfaces {:?}", self.score.ports_to_configure);
topology
.configure_interface(&self.score.ports_to_configure)
.configure_interfaces(&self.score.ports_to_configure)
.await
.map_err(|e| InterpretError::new(e.to_string()))?;
Ok(Outcome::success("switch configured".to_string()))
@@ -126,13 +126,43 @@ impl Switch for SwitchTopology {
todo!()
}
async fn configure_port_channel(&self, _config: &HostNetworkConfig) -> Result<(), SwitchError> {
async fn configure_port_channel(
&self,
_channel_id: PortChannelId,
_config: &HostNetworkConfig,
) -> Result<(), SwitchError> {
todo!()
}
async fn configure_port_channel_from_config(
&self,
config: &PortChannelConfig,
) -> Result<(), SwitchError> {
self.client
.configure_port_channel(
config.id,
&config.name,
config.ports.clone(),
config.speed.as_ref(),
)
.await
.map_err(|e| SwitchError::new(format!("Failed to create port-channel: {e}")))?;
Ok(())
}
async fn clear_port_channel(&self, ids: &Vec<Id>) -> Result<(), SwitchError> {
self.client.clear_port_channel(ids).await
}
async fn configure_interface(&self, ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
self.client.configure_interface(ports).await
async fn configure_interfaces(
&self,
interfaces: &Vec<InterfaceConfig>,
) -> Result<(), SwitchError> {
self.client.configure_interfaces(interfaces).await
}
async fn create_vlan(&self, vlan: &Vlan) -> Result<(), SwitchError> {
self.client.create_vlan(vlan).await
}
async fn delete_vlan(&self, vlan: &Vlan) -> Result<(), SwitchError> {
self.client.delete_vlan(vlan).await
}
}

View File

@@ -0,0 +1,179 @@
use async_trait::async_trait;
use brocade::{InterfaceConfig, PortChannelConfig, Vlan};
use harmony_types::id::Id;
use log::{debug, info};
use serde::Serialize;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
score::Score,
topology::{Switch, SwitchError, Topology},
};
#[derive(Clone, Debug, Serialize)]
pub struct BrocadeSwitchConfigurationScore {
/// VLANs to create on the switch. Define once, reference everywhere.
pub vlans: Vec<Vlan>,
/// Standalone interfaces (NOT members of a port-channel).
/// Each has its own VLAN/mode configuration.
pub interfaces: Vec<InterfaceConfig>,
/// Port-channels: bundles of ports with VLAN/mode config
/// applied on the logical port-channel interface, not on the members.
pub port_channels: Vec<PortChannelConfig>,
}
impl<T: Topology + Switch> Score<T> for BrocadeSwitchConfigurationScore {
fn name(&self) -> String {
"BrocadeSwitchConfigurationScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(BrocadeSwitchConfigurationInterpret {
score: self.clone(),
})
}
}
#[derive(Debug)]
struct BrocadeSwitchConfigurationInterpret {
score: BrocadeSwitchConfigurationScore,
}
#[async_trait]
impl<T: Topology + Switch> Interpret<T> for BrocadeSwitchConfigurationInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
self.create_vlans(topology).await?;
self.create_port_channels(topology).await?;
self.configure_port_channel_interfaces(topology).await?;
self.configure_standalone_interfaces(topology).await?;
Ok(Outcome::success(
"Switch configuration applied successfully".to_string(),
))
}
fn get_name(&self) -> InterpretName {
InterpretName::Custom("BrocadeSwitchConfigurationInterpret")
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
impl BrocadeSwitchConfigurationInterpret {
async fn create_vlans<T: Topology + Switch>(
&self,
topology: &T,
) -> Result<(), InterpretError> {
for vlan in &self.score.vlans {
info!("Creating VLAN {} ({})", vlan.id, vlan.name);
topology
.create_vlan(vlan)
.await
.map_err(|e| InterpretError::new(format!("Failed to create VLAN {}: {e}", vlan.id)))?;
}
Ok(())
}
async fn create_port_channels<T: Topology + Switch>(
&self,
topology: &T,
) -> Result<(), InterpretError> {
for pc in &self.score.port_channels {
info!(
"Creating port-channel {} ({}) with ports: {:?}",
pc.id, pc.name, pc.ports
);
topology
.configure_port_channel_from_config(pc)
.await
.map_err(|e| {
InterpretError::new(format!(
"Failed to create port-channel {} ({}): {e}",
pc.id, pc.name
))
})?;
}
Ok(())
}
async fn configure_port_channel_interfaces<T: Topology + Switch>(
&self,
topology: &T,
) -> Result<(), InterpretError> {
let pc_interfaces: Vec<InterfaceConfig> = self
.score
.port_channels
.iter()
.map(|pc| InterfaceConfig {
interface: brocade::SwitchInterface::PortChannel(pc.id),
mode: pc.mode.clone(),
access_vlan: pc.access_vlan.as_ref().map(|v| v.id),
trunk_vlans: pc.trunk_vlans.clone(),
speed: pc.speed.clone(),
})
.collect();
if !pc_interfaces.is_empty() {
info!(
"Configuring L2 mode on {} port-channel interface(s)",
pc_interfaces.len()
);
for pc in &self.score.port_channels {
debug!(
" port-channel {} ({}): mode={:?}, vlans={:?}, speed={:?}",
pc.id, pc.name, pc.mode, pc.trunk_vlans, pc.speed
);
}
topology
.configure_interfaces(&pc_interfaces)
.await
.map_err(|e| {
InterpretError::new(format!(
"Failed to configure port-channel interfaces: {e}"
))
})?;
}
Ok(())
}
async fn configure_standalone_interfaces<T: Topology + Switch>(
&self,
topology: &T,
) -> Result<(), InterpretError> {
if !self.score.interfaces.is_empty() {
info!(
"Configuring {} standalone interface(s)",
self.score.interfaces.len()
);
for iface in &self.score.interfaces {
debug!(
" {}: mode={:?}, speed={:?}",
iface.interface, iface.mode, iface.speed
);
}
topology
.configure_interfaces(&self.score.interfaces)
.await
.map_err(|e| {
InterpretError::new(format!("Failed to configure interfaces: {e}"))
})?;
}
Ok(())
}
}
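The mapping in `configure_port_channel_interfaces` (one logical `InterfaceConfig` per port-channel, carrying the VLAN/mode config onto the channel rather than its member ports) is pure data transformation and can be exercised standalone. A minimal sketch with simplified stand-in types; the real `brocade` structs carry more fields (mode, speed, a `Vlan` struct instead of a bare id), so this only mirrors the shape of the mapping:

```rust
// Stand-ins for the brocade crate's types (assumption: simplified field
// shapes for illustration, not the real definitions).
#[derive(Debug, Clone, PartialEq)]
enum SwitchInterface {
    PortChannel(u8),
}

#[derive(Debug, Clone)]
struct PortChannelConfig {
    id: u8,
    access_vlan: Option<u16>,
    trunk_vlans: Vec<u16>,
}

#[derive(Debug)]
struct InterfaceConfig {
    interface: SwitchInterface,
    access_vlan: Option<u16>,
    trunk_vlans: Vec<u16>,
}

/// Derive one logical port-channel interface per channel; the VLAN
/// configuration lands on the port-channel itself, not on the members.
fn pc_interfaces(port_channels: &[PortChannelConfig]) -> Vec<InterfaceConfig> {
    port_channels
        .iter()
        .map(|pc| InterfaceConfig {
            interface: SwitchInterface::PortChannel(pc.id),
            access_vlan: pc.access_vlan,
            trunk_vlans: pc.trunk_vlans.clone(),
        })
        .collect()
}

fn main() {
    let pcs = vec![PortChannelConfig {
        id: 1,
        access_vlan: None,
        trunk_vlans: vec![10, 20],
    }];
    let ifaces = pc_interfaces(&pcs);
    assert_eq!(ifaces.len(), 1);
    assert_eq!(ifaces[0].interface, SwitchInterface::PortChannel(1));
    assert_eq!(ifaces[0].trunk_vlans, vec![10, 20]);
}
```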

View File

@@ -3,3 +3,6 @@ pub use brocade::*;
pub mod brocade_snmp;
pub use brocade_snmp::*;
pub mod brocade_switch_configuration;
pub use brocade_switch_configuration::*;

View File

@@ -192,7 +192,7 @@ impl DhcpHostBindingInterpret {
for entry in dhcp_entries.into_iter() {
match dhcp_server.add_static_mapping(&entry).await {
Ok(_) => info!("Successfully registered DHCPStaticEntry {}", entry),
Err(e) => return Err(InterpretError::from(e)),
Err(_) => todo!(),
}
}

View File

@@ -4,24 +4,12 @@ use std::net::Ipv4Addr;
use cidr::{Ipv4Cidr, Ipv4Inet};
pub use discovery::*;
use k8s_openapi::api::{
apps::v1::{DaemonSet, DaemonSetSpec},
core::v1::{
Container, EnvVar, Namespace, PodSpec, PodTemplateSpec, ResourceRequirements,
SecurityContext, ServiceAccount, Toleration,
},
rbac::v1::{PolicyRule, Role, RoleBinding, RoleRef, Subject},
};
use k8s_openapi::apimachinery::pkg::api::resource::Quantity;
use k8s_openapi::apimachinery::pkg::apis::meta::v1::LabelSelector;
use kube::api::ObjectMeta;
use tokio::time::{Duration, timeout};
use async_trait::async_trait;
use harmony_inventory_agent::local_presence::DiscoveryEvent;
use log::{debug, info, trace};
use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;
use crate::{
data::Version,
@@ -29,9 +17,8 @@ use crate::{
infra::inventory::InventoryRepositoryFactory,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
modules::k8s::resource::K8sResourceScore,
score::Score,
topology::{K8sclient, Topology},
topology::Topology,
};
use harmony_types::id::Id;
@@ -303,224 +290,3 @@ impl DiscoverInventoryAgentInterpret {
info!("CIDR discovery completed");
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DeployInventoryAgentScore {
pub image: Option<String>,
}
impl Default for DeployInventoryAgentScore {
fn default() -> Self {
Self {
image: Some("hub.nationtech.io/harmony/harmony_inventory_agent:latest".to_string()),
}
}
}
impl<T: Topology + K8sclient> Score<T> for DeployInventoryAgentScore {
fn name(&self) -> String {
"DeployInventoryAgentScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(DeployInventoryAgentInterpret {
score: self.clone(),
})
}
}
#[derive(Debug)]
struct DeployInventoryAgentInterpret {
score: DeployInventoryAgentScore,
}
#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for DeployInventoryAgentInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
let namespace_name = "harmony-inventory-agent".to_string();
let image = self.score.image.as_ref().unwrap();
let mut ns_labels = BTreeMap::new();
ns_labels.insert(
"pod-security.kubernetes.io/enforce".to_string(),
"privileged".to_string(),
);
ns_labels.insert(
"pod-security.kubernetes.io/audit".to_string(),
"privileged".to_string(),
);
ns_labels.insert(
"pod-security.kubernetes.io/warn".to_string(),
"privileged".to_string(),
);
let namespace = Namespace {
metadata: ObjectMeta {
name: Some(namespace_name.clone()),
labels: Some(ns_labels),
..ObjectMeta::default()
},
..Namespace::default()
};
let service_account_name = "harmony-inventory-agent".to_string();
let service_account = ServiceAccount {
metadata: ObjectMeta {
name: Some(service_account_name.clone()),
namespace: Some(namespace_name.clone()),
..ObjectMeta::default()
},
..ServiceAccount::default()
};
let role = Role {
metadata: ObjectMeta {
name: Some("use-privileged-scc".to_string()),
namespace: Some(namespace_name.clone()),
..ObjectMeta::default()
},
rules: Some(vec![PolicyRule {
api_groups: Some(vec!["security.openshift.io".to_string()]),
resources: Some(vec!["securitycontextconstraints".to_string()]),
resource_names: Some(vec!["privileged".to_string()]),
verbs: vec!["use".to_string()],
..PolicyRule::default()
}]),
..Role::default()
};
let role_binding = RoleBinding {
metadata: ObjectMeta {
name: Some("use-privileged-scc".to_string()),
namespace: Some(namespace_name.clone()),
..ObjectMeta::default()
},
subjects: Some(vec![Subject {
kind: "ServiceAccount".to_string(),
name: service_account_name.clone(),
namespace: Some(namespace_name.clone()),
..Subject::default()
}]),
role_ref: RoleRef {
api_group: "rbac.authorization.k8s.io".to_string(),
kind: "Role".to_string(),
name: "use-privileged-scc".to_string(),
},
};
let mut daemonset_labels = BTreeMap::new();
daemonset_labels.insert("app".to_string(), "harmony-inventory-agent".to_string());
let daemon_set = DaemonSet {
metadata: ObjectMeta {
name: Some("harmony-inventory-agent".to_string()),
namespace: Some(namespace_name.clone()),
labels: Some(daemonset_labels.clone()),
..ObjectMeta::default()
},
spec: Some(DaemonSetSpec {
selector: LabelSelector {
match_labels: Some(daemonset_labels.clone()),
..LabelSelector::default()
},
template: PodTemplateSpec {
metadata: Some(ObjectMeta {
labels: Some(daemonset_labels),
..ObjectMeta::default()
}),
spec: Some(PodSpec {
service_account_name: Some(service_account_name.clone()),
host_network: Some(true),
dns_policy: Some("ClusterFirstWithHostNet".to_string()),
tolerations: Some(vec![Toleration {
key: Some("node-role.kubernetes.io/master".to_string()),
operator: Some("Exists".to_string()),
effect: Some("NoSchedule".to_string()),
..Toleration::default()
}]),
containers: vec![Container {
name: "inventory-agent".to_string(),
image: Some(image.to_string()),
image_pull_policy: Some("Always".to_string()),
env: Some(vec![EnvVar {
name: "RUST_LOG".to_string(),
value: Some("harmony_inventory_agent=trace,info".to_string()),
..EnvVar::default()
}]),
resources: Some(ResourceRequirements {
limits: Some({
let mut limits = BTreeMap::new();
limits.insert("cpu".to_string(), Quantity("200m".to_string()));
limits.insert(
"memory".to_string(),
Quantity("256Mi".to_string()),
);
limits
}),
requests: Some({
let mut requests = BTreeMap::new();
requests
.insert("cpu".to_string(), Quantity("100m".to_string()));
requests.insert(
"memory".to_string(),
Quantity("128Mi".to_string()),
);
requests
}),
..ResourceRequirements::default()
}),
security_context: Some(SecurityContext {
privileged: Some(true),
..SecurityContext::default()
}),
..Container::default()
}],
..PodSpec::default()
}),
},
..DaemonSetSpec::default()
}),
..DaemonSet::default()
};
K8sResourceScore::single(namespace, None)
.interpret(_inventory, topology)
.await?;
K8sResourceScore::single(service_account, Some(namespace_name.clone()))
.interpret(_inventory, topology)
.await?;
K8sResourceScore::single(role, Some(namespace_name.clone()))
.interpret(_inventory, topology)
.await?;
K8sResourceScore::single(role_binding, Some(namespace_name.clone()))
.interpret(_inventory, topology)
.await?;
K8sResourceScore::single(daemon_set, Some(namespace_name.clone()))
.interpret(_inventory, topology)
.await?;
Ok(Outcome::success(
"Harmony inventory agent successfully deployed".to_string(),
))
}
fn get_name(&self) -> InterpretName {
InterpretName::DeployInventoryAgent
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}

View File

@@ -1,181 +0,0 @@
use async_trait::async_trait;
use k8s_openapi::api::core::v1::{ConfigMap, Pod};
use kube::api::ListParams;
use log::{debug, info};
use serde::Serialize;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
score::Score,
topology::{K8sclient, Topology},
};
use harmony_types::id::Id;
/// A DNS rewrite rule mapping a hostname to a cluster service FQDN.
#[derive(Debug, Clone, Serialize)]
pub struct CoreDNSRewrite {
/// The hostname to intercept (e.g., `"sso.harmony.local"`).
pub hostname: String,
/// The cluster service FQDN to resolve to (e.g., `"zitadel.zitadel.svc.cluster.local"`).
pub target: String,
}
/// Score that patches CoreDNS to add `rewrite name` rules.
///
/// Useful when in-cluster pods need to reach services by their external
/// hostnames (e.g., for Zitadel Host header validation, or OpenBao JWT
/// auth fetching JWKS from Zitadel).
///
/// Only applies to K3sFamily and Default distributions. No-op on OpenShift
/// (which uses a different DNS operator).
///
/// Idempotent: existing rules are detected and skipped. CoreDNS pods are
/// restarted only when new rules are added.
#[derive(Debug, Clone, Serialize)]
pub struct CoreDNSRewriteScore {
pub rewrites: Vec<CoreDNSRewrite>,
}
impl<T: Topology + K8sclient> Score<T> for CoreDNSRewriteScore {
fn name(&self) -> String {
"CoreDNSRewriteScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(CoreDNSRewriteInterpret {
rewrites: self.rewrites.clone(),
})
}
}
#[derive(Debug, Clone)]
struct CoreDNSRewriteInterpret {
rewrites: Vec<CoreDNSRewrite>,
}
#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for CoreDNSRewriteInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
let k8s = topology
.k8s_client()
.await
.map_err(|e| InterpretError::new(format!("Failed to get K8s client: {e}")))?;
let distro = k8s
.get_k8s_distribution()
.await
.map_err(|e| InterpretError::new(format!("Failed to detect distribution: {e}")))?;
if !matches!(
distro,
harmony_k8s::KubernetesDistribution::K3sFamily
| harmony_k8s::KubernetesDistribution::Default
) {
return Ok(Outcome::noop(
"Skipping CoreDNS patch (not K3sFamily)".to_string(),
));
}
let cm: ConfigMap = k8s
.get_resource::<ConfigMap>("coredns", Some("kube-system"))
.await
.map_err(|e| InterpretError::new(format!("Failed to get coredns ConfigMap: {e}")))?
.ok_or_else(|| {
InterpretError::new("CoreDNS ConfigMap not found in kube-system".to_string())
})?;
let corefile = cm
.data
.as_ref()
.and_then(|d| d.get("Corefile"))
.ok_or_else(|| InterpretError::new("CoreDNS ConfigMap has no Corefile key".into()))?;
let mut new_rules = Vec::new();
for r in &self.rewrites {
if !corefile.contains(&format!("rewrite name {} {}", r.hostname, r.target)) {
new_rules.push(format!(" rewrite name {} {}", r.hostname, r.target));
}
}
if new_rules.is_empty() {
return Ok(Outcome::noop(
"CoreDNS rewrite rules already present".to_string(),
));
}
let patched = corefile.replacen(
".:53 {\n",
&format!(".:53 {{\n{}\n", new_rules.join("\n")),
1,
);
debug!("[CoreDNS] Patched Corefile:\n{}", patched);
// Use apply_dynamic with force_conflicts since the ConfigMap is
// owned by the cluster deployer (e.g., k3d) and server-side apply
// would conflict without force.
let patch_obj: kube::api::DynamicObject = serde_json::from_value(serde_json::json!({
"apiVersion": "v1",
"kind": "ConfigMap",
"metadata": { "name": "coredns", "namespace": "kube-system" },
"data": { "Corefile": patched }
}))
.map_err(|e| InterpretError::new(format!("Failed to build patch: {e}")))?;
k8s.apply_dynamic(&patch_obj, Some("kube-system"), true)
.await
.map_err(|e| InterpretError::new(format!("Failed to apply CoreDNS patch: {e}")))?;
// Restart CoreDNS pods to pick up the new config
let pods = k8s
.list_resources::<Pod>(
Some("kube-system"),
Some(ListParams::default().labels("k8s-app=kube-dns")),
)
.await
.map_err(|e| InterpretError::new(format!("Failed to list CoreDNS pods: {e}")))?;
for pod in pods.items {
if let Some(name) = &pod.metadata.name {
let _ = k8s.delete_resource::<Pod>(name, Some("kube-system")).await;
}
}
// Brief pause for pods to restart
tokio::time::sleep(tokio::time::Duration::from_secs(3)).await;
info!("[CoreDNS] Patched with {} rewrite rule(s)", new_rules.len());
Ok(Outcome {
status: InterpretStatus::SUCCESS,
message: format!("{} CoreDNS rewrite rule(s) applied", new_rules.len()),
details: self
.rewrites
.iter()
.map(|r| format!("{} -> {}", r.hostname, r.target))
.collect(),
})
}
fn get_name(&self) -> InterpretName {
InterpretName::Custom("CoreDNSRewrite")
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
vec![]
}
}
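The Corefile patching in the interpret being removed here is plain string manipulation: skip rules already present, then splice the new `rewrite name` lines right after the `.:53 {` server-block opener. A self-contained sketch of that logic (`patch_corefile` is a hypothetical helper, not an API from this codebase):

```rust
/// Idempotently insert `rewrite name` rules after the `.:53 {` opener.
fn patch_corefile(corefile: &str, rewrites: &[(&str, &str)]) -> String {
    // Only add rules that are not already present in the Corefile.
    let new_rules: Vec<String> = rewrites
        .iter()
        .filter(|(host, target)| !corefile.contains(&format!("rewrite name {host} {target}")))
        .map(|(host, target)| format!("    rewrite name {host} {target}"))
        .collect();
    if new_rules.is_empty() {
        return corefile.to_string(); // already patched, no-op
    }
    // Splice the rules immediately after the server-block opener.
    corefile.replacen(
        ".:53 {\n",
        &format!(".:53 {{\n{}\n", new_rules.join("\n")),
        1,
    )
}

fn main() {
    let corefile = ".:53 {\n    forward . /etc/resolv.conf\n}\n";
    let patched = patch_corefile(
        corefile,
        &[("sso.harmony.local", "zitadel.zitadel.svc.cluster.local")],
    );
    assert!(patched.contains("rewrite name sso.harmony.local zitadel.zitadel.svc.cluster.local"));
    // Applying the same rewrite again leaves the Corefile unchanged.
    let again = patch_corefile(
        &patched,
        &[("sso.harmony.local", "zitadel.zitadel.svc.cluster.local")],
    );
    assert_eq!(again, patched);
}
```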

View File

@@ -1,5 +1,4 @@
pub mod apps;
pub mod coredns;
pub mod deployment;
mod failover;
pub mod ingress;

View File

@@ -31,9 +31,6 @@ pub enum KvmError {
#[error("ISO download failed: {0}")]
IsoDownload(String),
#[error("command failed: {0}")]
CommandFailed(String),
#[error("libvirt error: {0}")]
Libvirt(#[from] virt::error::Error),

View File

@@ -1,5 +1,4 @@
use log::{debug, info, warn};
use std::net::IpAddr;
use virt::connect::Connect;
use virt::domain::Domain;
use virt::network::Network;
@@ -8,7 +7,7 @@ use virt::storage_vol::StorageVol;
use virt::sys;
use super::error::KvmError;
use super::types::{CdromConfig, NetworkConfig, VmConfig, VmInterface, VmStatus};
use super::types::{CdromConfig, NetworkConfig, VmConfig, VmStatus};
use super::xml;
/// A handle to a libvirt hypervisor.
@@ -200,11 +199,6 @@ impl KvmExecutor {
let dom = Domain::lookup_by_name(&conn, name).map_err(|_| KvmError::VmNotFound {
name: name.to_string(),
})?;
let (state, _) = dom.get_state()?;
if state == sys::VIR_DOMAIN_RUNNING || state == sys::VIR_DOMAIN_BLOCKED {
debug!("VM '{name}' is already running, skipping start");
return Ok(());
}
dom.create()?;
info!("VM '{name}' started");
Ok(())
@@ -298,154 +292,12 @@ impl KvmExecutor {
Ok(status)
}
/// Returns the first IPv4 address of a running VM, or `None` if no
/// address has been assigned yet.
///
/// Uses the libvirt lease/agent source to discover the IP. This requires
/// the VM to have obtained an address via DHCP from the libvirt network.
pub async fn vm_ip(&self, name: &str) -> Result<Option<IpAddr>, KvmError> {
let executor = self.clone();
let name = name.to_string();
tokio::task::spawn_blocking(move || executor.vm_ip_blocking(&name))
.await
.expect("blocking task panicked")
}
fn vm_ip_blocking(&self, name: &str) -> Result<Option<IpAddr>, KvmError> {
let conn = self.open_connection()?;
let dom = Domain::lookup_by_name(&conn, name).map_err(|_| KvmError::VmNotFound {
name: name.to_string(),
})?;
// Try lease-based source first (works with libvirt's built-in DHCP)
let interfaces = dom
.interface_addresses(sys::VIR_DOMAIN_INTERFACE_ADDRESSES_SRC_LEASE, 0)
.unwrap_or_default();
for iface in &interfaces {
for addr in &iface.addrs {
// typed == 0 means IPv4 (AF_INET)
if addr.typed == 0 {
if let Ok(ip) = addr.addr.parse::<IpAddr>() {
return Ok(Some(ip));
}
}
}
}
Ok(None)
}
/// Polls until a VM has an IP address, with a timeout.
///
/// Returns the IP once available, or an error if the timeout is reached.
pub async fn wait_for_ip(
&self,
name: &str,
timeout: std::time::Duration,
) -> Result<IpAddr, KvmError> {
let deadline = tokio::time::Instant::now() + timeout;
loop {
if let Some(ip) = self.vm_ip(name).await? {
info!("VM '{name}' has IP: {ip}");
return Ok(ip);
}
if tokio::time::Instant::now() > deadline {
return Err(KvmError::Io(std::io::Error::new(
std::io::ErrorKind::TimedOut,
format!("VM '{name}' did not obtain an IP within {timeout:?}"),
)));
}
tokio::time::sleep(std::time::Duration::from_secs(3)).await;
}
}
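The removed `wait_for_ip` follows a standard deadline-polling shape: probe, return on success, error past the deadline, otherwise sleep and retry. A synchronous sketch of the same pattern using only `std` (the original is async on tokio; this stand-in omits it for self-containment):

```rust
use std::time::{Duration, Instant};

/// Poll `probe` until it yields `Some`, or give up once `timeout` elapses.
fn wait_for<T>(
    mut probe: impl FnMut() -> Option<T>,
    timeout: Duration,
) -> Result<T, &'static str> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(value) = probe() {
            return Ok(value);
        }
        if Instant::now() > deadline {
            return Err("timed out");
        }
        std::thread::sleep(Duration::from_millis(10));
    }
}

fn main() {
    // Simulate an IP that only becomes available on the third probe.
    let mut attempts = 0;
    let ip = wait_for(
        || {
            attempts += 1;
            (attempts >= 3).then(|| "10.50.0.2")
        },
        Duration::from_secs(1),
    );
    assert_eq!(ip, Ok("10.50.0.2"));
}
```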
// -------------------------------------------------------------------------
// NIC link control
// -------------------------------------------------------------------------
/// Set the link state of a VM's network interface.
///
/// Brings a NIC up or down by MAC address. Useful for preventing IP
/// conflicts when multiple VMs boot with the same default IP — disable
/// all NICs, then enable one at a time for sequential bootstrapping.
///
/// Uses `virsh domif-setlink` under the hood.
pub async fn set_interface_link(
&self,
vm_name: &str,
mac: &str,
up: bool,
) -> Result<(), KvmError> {
let state = if up { "up" } else { "down" };
info!("Setting {vm_name} interface {mac} link {state}");
let output = tokio::process::Command::new("virsh")
.args(["-c", &self.uri, "domif-setlink", vm_name, mac, state])
.output()
.await?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
return Err(KvmError::CommandFailed(format!(
"domif-setlink failed: {}",
stderr.trim()
)));
}
Ok(())
}
/// List all network interfaces of a VM with their MAC addresses.
///
/// Returns a list of `(interface_type, source, mac, model)` tuples.
pub async fn list_interfaces(&self, vm_name: &str) -> Result<Vec<VmInterface>, KvmError> {
let output = tokio::process::Command::new("virsh")
.args(["-c", &self.uri, "domiflist", vm_name])
.output()
.await?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
return Err(KvmError::CommandFailed(format!(
"domiflist failed: {}",
stderr.trim()
)));
}
let stdout = String::from_utf8_lossy(&output.stdout);
let mut interfaces = Vec::new();
for line in stdout.lines().skip(2) {
// virsh domiflist columns: Interface, Type, Source, Model, MAC
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 5 {
interfaces.push(VmInterface {
interface_type: parts[1].to_string(),
source: parts[2].to_string(),
model: parts[3].to_string(),
mac: parts[4].to_string(),
});
}
}
Ok(interfaces)
}
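The `domiflist` parsing being removed above is a simple whitespace-column split: skip the two header lines, then take the Type/Source/Model/MAC columns from each remaining row. A standalone sketch against sample output, assuming the standard five-column `virsh domiflist` layout:

```rust
#[derive(Debug, PartialEq)]
struct VmInterface {
    interface_type: String,
    source: String,
    model: String,
    mac: String,
}

/// Parse `virsh domiflist` output: two header lines, then
/// Interface / Type / Source / Model / MAC columns.
fn parse_domiflist(stdout: &str) -> Vec<VmInterface> {
    stdout
        .lines()
        .skip(2)
        .filter_map(|line| {
            let parts: Vec<&str> = line.split_whitespace().collect();
            (parts.len() >= 5).then(|| VmInterface {
                interface_type: parts[1].to_string(),
                source: parts[2].to_string(),
                model: parts[3].to_string(),
                mac: parts[4].to_string(),
            })
        })
        .collect()
}

fn main() {
    let sample = " Interface   Type      Source    Model    MAC\n\
-----------------------------------------------------------\n\
 vnet0       network   default   virtio   52:54:00:00:50:01\n";
    let ifaces = parse_domiflist(sample);
    assert_eq!(ifaces.len(), 1);
    assert_eq!(ifaces[0].interface_type, "network");
    assert_eq!(ifaces[0].mac, "52:54:00:00:50:01");
}
```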
// -------------------------------------------------------------------------
// Storage
// -------------------------------------------------------------------------
fn create_volumes_blocking(&self, conn: &Connect, config: &VmConfig) -> Result<(), KvmError> {
for disk in &config.disks {
// Skip volume creation for disks with an existing source path
if disk.source_path.is_some() {
debug!(
"Disk '{}' uses existing source, skipping volume creation",
disk.device
);
continue;
}
let pool = StoragePool::lookup_by_name(conn, &disk.pool).map_err(|_| {
KvmError::StoragePoolNotFound {
name: disk.pool.clone(),

View File

@@ -8,6 +8,6 @@ pub mod types;
pub use error::KvmError;
pub use executor::KvmExecutor;
pub use types::{
BootDevice, CdromConfig, DhcpHost, DiskConfig, ForwardMode, NetworkConfig,
NetworkConfigBuilder, NetworkRef, VmConfig, VmConfigBuilder, VmInterface, VmStatus,
BootDevice, CdromConfig, DiskConfig, ForwardMode, NetworkConfig, NetworkConfigBuilder,
NetworkRef, VmConfig, VmConfigBuilder, VmStatus,
};

View File

@@ -1,18 +1,5 @@
use serde::{Deserialize, Serialize};
/// Information about a VM's network interface, as reported by `virsh domiflist`.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct VmInterface {
/// Interface type (e.g. "network", "bridge")
pub interface_type: String,
/// Source network or bridge name
pub source: String,
/// Device model (e.g. "virtio")
pub model: String,
/// MAC address
pub mac: String,
}
/// Specifies how a KVM host is accessed.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum KvmConnectionUri {
@@ -37,14 +24,12 @@ impl KvmConnectionUri {
/// Configuration for a virtual disk attached to a VM.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DiskConfig {
/// Disk size in gigabytes. Ignored when `source_path` is set.
/// Disk size in gigabytes.
pub size_gb: u32,
/// Target device name in the guest (e.g. `vda`, `vdb`).
pub device: String,
/// Storage pool to allocate the volume from. Defaults to `"default"`.
pub pool: String,
/// When set, use this existing disk image instead of creating a new volume.
pub source_path: Option<String>,
}
/// Configuration for a CD-ROM/ISO device attached to a VM.
@@ -66,18 +51,6 @@ impl DiskConfig {
size_gb,
device,
pool: "default".to_string(),
source_path: None,
}
}
/// Use an existing disk image file instead of creating a new volume.
pub fn from_path(path: impl Into<String>, index: u8) -> Self {
let device = format!("vd{}", (b'a' + index) as char);
Self {
size_gb: 0,
device,
pool: String::new(),
source_path: Some(path.into()),
}
}
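The removed `from_path` helper derives the guest device name from a zero-based disk index; the mapping is trivial to check in isolation:

```rust
/// Map a zero-based disk index to a virtio device name (vda, vdb, ...).
fn device_name(index: u8) -> String {
    format!("vd{}", (b'a' + index) as char)
}

fn main() {
    assert_eq!(device_name(0), "vda");
    assert_eq!(device_name(1), "vdb");
    assert_eq!(device_name(25), "vdz");
}
```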
@@ -206,13 +179,6 @@ impl VmConfigBuilder {
self
}
/// Appends a disk backed by an existing qcow2/raw image file.
pub fn disk_from_path(mut self, path: impl Into<String>) -> Self {
let idx = self.disks.len() as u8;
self.disks.push(DiskConfig::from_path(path, idx));
self
}
/// Appends a disk with an explicit pool override.
pub fn disk_from_pool(mut self, size_gb: u32, pool: impl Into<String>) -> Self {
let idx = self.disks.len() as u8;
@@ -256,17 +222,6 @@ impl VmConfigBuilder {
}
}
/// A DHCP static host entry for a libvirt network.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DhcpHost {
/// MAC address (e.g. `"52:54:00:00:50:01"`).
pub mac: String,
/// IP to assign (e.g. `"10.50.0.2"`).
pub ip: String,
/// Optional hostname.
pub name: Option<String>,
}
/// Configuration for an isolated virtual network.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NetworkConfig {
@@ -280,11 +235,6 @@ pub struct NetworkConfig {
pub prefix_len: u8,
/// Forward mode. When `None`, the network is fully isolated.
pub forward_mode: Option<ForwardMode>,
/// Optional DHCP range (start, end). When set, libvirt's built-in
/// DHCP server hands out addresses in this range.
pub dhcp_range: Option<(String, String)>,
/// Static DHCP host entries for fixed IP assignment by MAC.
pub dhcp_hosts: Vec<DhcpHost>,
}
/// Libvirt network forward mode.
@@ -308,8 +258,6 @@ pub struct NetworkConfigBuilder {
gateway_ip: String,
prefix_len: u8,
forward_mode: Option<ForwardMode>,
dhcp_range: Option<(String, String)>,
dhcp_hosts: Vec<DhcpHost>,
}
impl NetworkConfigBuilder {
@@ -320,8 +268,6 @@ impl NetworkConfigBuilder {
gateway_ip: "192.168.100.1".to_string(),
prefix_len: 24,
forward_mode: Some(ForwardMode::Nat),
dhcp_range: None,
dhcp_hosts: vec![],
}
}
@@ -347,27 +293,6 @@ impl NetworkConfigBuilder {
self
}
/// Enable libvirt's built-in DHCP server with the given range.
pub fn dhcp_range(mut self, start: impl Into<String>, end: impl Into<String>) -> Self {
self.dhcp_range = Some((start.into(), end.into()));
self
}
/// Add a static DHCP host entry (MAC → fixed IP).
pub fn dhcp_host(
mut self,
mac: impl Into<String>,
ip: impl Into<String>,
name: Option<String>,
) -> Self {
self.dhcp_hosts.push(DhcpHost {
mac: mac.into(),
ip: ip.into(),
name,
});
self
}
pub fn build(self) -> NetworkConfig {
NetworkConfig {
bridge: self
@@ -377,8 +302,6 @@ impl NetworkConfigBuilder {
gateway_ip: self.gateway_ip,
prefix_len: self.prefix_len,
forward_mode: self.forward_mode,
dhcp_range: self.dhcp_range,
dhcp_hosts: self.dhcp_hosts,
}
}
}

View File

@@ -1,40 +1,3 @@
//! Libvirt XML generation via string templates.
//!
//! # Why string templates?
//!
//! These functions build libvirt domain, network, and volume XML as formatted
//! strings rather than typed structs. This is fragile — there is no compile-time
//! guarantee that the output is valid XML, and tests rely on substring matching
//! rather than structural validation.
//!
//! We investigated typed alternatives (evaluated 2026-03-24):
//!
//! - **`libvirt-rust-xml`** (gen branch, by Marc-André Lureau / Red Hat):
//! <https://gitlab.com/marcandre.lureau/libvirt-rust-xml/-/tree/gen>
//! Uses `relaxng-gen` (<https://github.com/elmarco/relaxng-rust>) to generate
//! Rust structs from libvirt's official RelaxNG schemas. This is the correct
//! long-term solution — zero maintenance burden, schema-validated, round-trip
//! serialization. However, as of commit `baca481`, `virtxml-domain` and
//! `virtxml-storage-volume` do not compile (missing modules + type inference
//! errors in the generated code). Only `virtxml-network` compiles.
//!
//! - **`libvirt-go-xml-module`** (Go, official libvirt project):
//! <https://gitlab.com/libvirt/libvirt-go-xml-module>
//! 572 hand-maintained typed structs for domain XML alone. MIT licensed.
//! Could be ported to Rust, but maintaining a manual port is the burden we
//! want to avoid.
//!
//! - **`virt` crate** (0.4.3, already in use):
//! C bindings to libvirt. Handles API calls but provides no XML typing —
//! `Domain::define_xml()` takes `&str`. This stays regardless of XML approach.
//!
//! # When to revisit
//!
//! Track the `libvirt-rust-xml` gen branch. When `virtxml-domain` compiles,
//! replace these templates with typed struct construction + `quick-xml`
//! serialization. The `VmConfig`/`NetworkConfig` builder API stays unchanged —
//! only the internal XML generation changes.
use super::types::{CdromConfig, DiskConfig, ForwardMode, NetworkConfig, VmConfig};
/// Renders the libvirt domain XML for a VM definition.
@@ -99,10 +62,7 @@ fn cdrom_devices(vm: &VmConfig) -> String {
}
fn format_disk(vm: &VmConfig, disk: &DiskConfig, image_dir: &str) -> String {
let path = disk
.source_path
.clone()
.unwrap_or_else(|| format!("{image_dir}/{}-{}.qcow2", vm.name, disk.device));
let path = format!("{image_dir}/{}-{}.qcow2", vm.name, disk.device);
format!(
r#" <disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
@@ -118,15 +78,21 @@ fn format_disk(vm: &VmConfig, disk: &DiskConfig, image_dir: &str) -> String {
fn format_cdrom(cdrom: &CdromConfig) -> String {
let source = &cdrom.source;
let dev = &cdrom.device;
let device_type = if source.starts_with("http://") || source.starts_with("https://") {
"cdrom"
} else {
"cdrom"
};
format!(
r#" <disk type='file' device='cdrom'>
r#" <disk type='file' device='{device_type}'>
<driver name='qemu' type='raw'/>
<source file='{source}'/>
<target dev='{dev}' bus='sata'/>
<target dev='{dev}' bus='ide'/>
</disk>
"#,
source = source,
dev = dev,
device_type = device_type,
)
}
@@ -160,42 +126,17 @@ pub fn network_xml(cfg: &NetworkConfig) -> String {
None => "",
};
let dhcp = if cfg.dhcp_range.is_some() || !cfg.dhcp_hosts.is_empty() {
let mut dhcp_xml = String::from(" <dhcp>\n");
if let Some((start, end)) = &cfg.dhcp_range {
dhcp_xml.push_str(&format!(" <range start='{start}' end='{end}'/>\n"));
}
for host in &cfg.dhcp_hosts {
let name_attr = host
.name
.as_deref()
.map(|n| format!(" name='{n}'"))
.unwrap_or_default();
dhcp_xml.push_str(&format!(
" <host mac='{mac}'{name_attr} ip='{ip}'/>\n",
mac = host.mac,
ip = host.ip,
));
}
dhcp_xml.push_str(" </dhcp>\n");
dhcp_xml
} else {
String::new()
};
format!(
r#"<network>
<name>{name}</name>
<bridge name='{bridge}' stp='on' delay='0'/>
{forward} <ip address='{gateway}' prefix='{prefix}'>
{dhcp} </ip>
{forward} <ip address='{gateway}' prefix='{prefix}'/>
</network>"#,
name = cfg.name,
bridge = cfg.bridge,
forward = forward,
gateway = cfg.gateway_ip,
prefix = cfg.prefix_len,
dhcp = dhcp,
)
}
@@ -218,11 +159,7 @@ pub fn volume_xml(name: &str, size_gb: u32) -> String {
#[cfg(test)]
mod tests {
use super::*;
use crate::modules::kvm::types::{
BootDevice, ForwardMode, NetworkConfig, NetworkRef, VmConfig,
};
// ── Domain XML ──────────────────────────────────────────────────────
use crate::modules::kvm::types::{BootDevice, NetworkRef, VmConfig};
#[test]
fn domain_xml_contains_vm_name() {
@@ -241,101 +178,10 @@ mod tests {
assert!(xml.contains("boot dev='hd'"));
}
#[test]
fn domain_xml_memory_conversion() {
let vm = VmConfig::builder("mem-test").memory_gb(8).build();
let xml = domain_xml(&vm, "/tmp");
// 8 GB = 8 * 1024 MiB = 8192 MiB = 8388608 KiB
assert!(xml.contains("<memory unit='KiB'>8388608</memory>"));
}
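The unit conversion this test pins down is easy to get wrong (GB vs GiB, MiB vs KiB). A stdlib-only sketch of the arithmetic, using an illustrative helper name rather than the module's actual API:

```rust
/// libvirt domain XML expresses <memory> in KiB, and the builder treats
/// "GB" as GiB: 8 GB = 8 * 1024 MiB = 8192 MiB = 8_388_608 KiB.
/// (Sketch only; `gb_to_kib` is not a function in this crate.)
fn gb_to_kib(gb: u64) -> u64 {
    gb * 1024 * 1024
}

fn main() {
    assert_eq!(gb_to_kib(8), 8_388_608);
}
```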
#[test]
fn domain_xml_multiple_disks() {
let vm = VmConfig::builder("multi-disk")
.disk(120) // vda
.disk(200) // vdb
.disk(500) // vdc
.build();
let xml = domain_xml(&vm, "/images");
assert!(xml.contains("multi-disk-vda.qcow2"));
assert!(xml.contains("multi-disk-vdb.qcow2"));
assert!(xml.contains("multi-disk-vdc.qcow2"));
assert!(xml.contains("dev='vda'"));
assert!(xml.contains("dev='vdb'"));
assert!(xml.contains("dev='vdc'"));
}
#[test]
fn domain_xml_multiple_nics() {
let vm = VmConfig::builder("multi-nic")
.network(NetworkRef::named("default"))
.network(NetworkRef::named("management"))
.network(NetworkRef::named("storage"))
.build();
let xml = domain_xml(&vm, "/tmp");
assert!(xml.contains("source network='default'"));
assert!(xml.contains("source network='management'"));
assert!(xml.contains("source network='storage'"));
// All NICs should be virtio
assert_eq!(xml.matches("model type='virtio'").count(), 3);
}
#[test]
fn domain_xml_nic_with_mac_address() {
let vm = VmConfig::builder("mac-test")
.network(NetworkRef::named("mynet").with_mac("52:54:00:AA:BB:CC"))
.build();
let xml = domain_xml(&vm, "/tmp");
assert!(xml.contains("mac address='52:54:00:AA:BB:CC'"));
}
#[test]
fn domain_xml_cdrom_device() {
let vm = VmConfig::builder("iso-test")
.cdrom("/path/to/image.iso")
.boot_order([BootDevice::Cdrom, BootDevice::Disk])
.build();
let xml = domain_xml(&vm, "/tmp");
assert!(xml.contains("device='cdrom'"));
assert!(xml.contains("source file='/path/to/image.iso'"));
assert!(xml.contains("bus='sata'"));
assert!(xml.contains("boot dev='cdrom'"));
}
#[test]
fn domain_xml_q35_machine_type() {
let vm = VmConfig::builder("q35-test").build();
let xml = domain_xml(&vm, "/tmp");
assert!(xml.contains("machine='q35'"));
assert!(xml.contains("<acpi/>"));
assert!(xml.contains("<apic/>"));
assert!(xml.contains("mode='host-model'"));
}
#[test]
fn domain_xml_serial_console() {
let vm = VmConfig::builder("console-test").build();
let xml = domain_xml(&vm, "/tmp");
assert!(xml.contains("<serial type='pty'>"));
assert!(xml.contains("<console type='pty'>"));
}
#[test]
fn domain_xml_empty_boot_order() {
let vm = VmConfig::builder("no-boot").build();
let xml = domain_xml(&vm, "/tmp");
// No boot entries should be present
assert!(!xml.contains("boot dev="));
}
// ── Network XML ─────────────────────────────────────────────────────
#[test]
fn network_xml_isolated_has_no_forward() {
use crate::modules::kvm::types::NetworkConfig;
let cfg = NetworkConfig::builder("testnet")
.subnet("10.0.0.1", 24)
.isolated()
@@ -344,144 +190,5 @@ mod tests {
let xml = network_xml(&cfg);
assert!(!xml.contains("<forward"));
assert!(xml.contains("10.0.0.1"));
assert!(xml.contains("prefix='24'"));
}
#[test]
fn network_xml_nat_mode() {
let cfg = NetworkConfig::builder("natnet")
.subnet("192.168.200.1", 24)
.forward(ForwardMode::Nat)
.build();
let xml = network_xml(&cfg);
assert!(xml.contains("<forward mode='nat'/>"));
assert!(xml.contains("192.168.200.1"));
}
#[test]
fn network_xml_route_mode() {
let cfg = NetworkConfig::builder("routenet")
.subnet("10.10.0.1", 16)
.forward(ForwardMode::Route)
.build();
let xml = network_xml(&cfg);
assert!(xml.contains("<forward mode='route'/>"));
assert!(xml.contains("prefix='16'"));
}
#[test]
fn network_xml_custom_bridge() {
let cfg = NetworkConfig::builder("custom")
.bridge("br-custom")
.subnet("172.16.0.1", 24)
.build();
let xml = network_xml(&cfg);
assert!(xml.contains("name='br-custom'"));
}
#[test]
fn network_xml_auto_bridge_name() {
let cfg = NetworkConfig::builder("harmony-test").isolated().build();
// Bridge auto-generated: virbr-{name} with hyphens removed from name
assert_eq!(cfg.bridge, "virbr-harmonytest");
}
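The auto-generated bridge name asserted here follows a simple rule: prefix with `virbr-` and strip hyphens from the network name. A stdlib sketch (the helper name is illustrative; a real implementation may also need to respect the kernel's 15-character interface-name limit):

```rust
/// Derive a bridge name from a libvirt network name:
/// "harmony-test" -> "virbr-harmonytest".
/// (Sketch only; `auto_bridge_name` is not this crate's API.)
fn auto_bridge_name(network: &str) -> String {
    format!("virbr-{}", network.replace('-', ""))
}

fn main() {
    assert_eq!(auto_bridge_name("harmony-test"), "virbr-harmonytest");
}
```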
// ── Volume XML ──────────────────────────────────────────────────────
#[test]
fn volume_xml_size_calculation() {
let xml = volume_xml("test-vol", 100);
// 100 GB = 100 * 1024^3 bytes = 107374182400
assert!(xml.contains("<capacity unit='bytes'>107374182400</capacity>"));
assert!(xml.contains("<name>test-vol.qcow2</name>"));
assert!(xml.contains("type='qcow2'"));
}
// ── Builder defaults ────────────────────────────────────────────────
#[test]
fn vm_builder_defaults() {
let vm = VmConfig::builder("defaults").build();
assert_eq!(vm.name, "defaults");
assert_eq!(vm.vcpus, 2);
assert_eq!(vm.memory_mib, 4096);
assert!(vm.disks.is_empty());
assert!(vm.networks.is_empty());
assert!(vm.cdroms.is_empty());
assert!(vm.boot_order.is_empty());
}
#[test]
fn network_builder_defaults() {
let net = NetworkConfig::builder("testnet").build();
assert_eq!(net.name, "testnet");
assert_eq!(net.gateway_ip, "192.168.100.1");
assert_eq!(net.prefix_len, 24);
assert!(matches!(net.forward_mode, Some(ForwardMode::Nat)));
}
#[test]
fn disk_sequential_naming() {
let vm = VmConfig::builder("seq")
.disk(10)
.disk(20)
.disk(30)
.disk(40)
.build();
assert_eq!(vm.disks[0].device, "vda");
assert_eq!(vm.disks[1].device, "vdb");
assert_eq!(vm.disks[2].device, "vdc");
assert_eq!(vm.disks[3].device, "vdd");
assert_eq!(vm.disks[0].size_gb, 10);
assert_eq!(vm.disks[3].size_gb, 40);
}
#[test]
fn network_xml_with_dhcp_range() {
let cfg = NetworkConfig::builder("dhcpnet")
.subnet("10.50.0.1", 24)
.dhcp_range("10.50.0.100", "10.50.0.200")
.build();
let xml = network_xml(&cfg);
assert!(xml.contains("<dhcp>"));
assert!(xml.contains("range start='10.50.0.100' end='10.50.0.200'"));
}
#[test]
fn network_xml_with_dhcp_host() {
let cfg = NetworkConfig::builder("hostnet")
.subnet("10.50.0.1", 24)
.dhcp_range("10.50.0.100", "10.50.0.200")
.dhcp_host(
"52:54:00:00:50:01",
"10.50.0.2",
Some("opnsense".to_string()),
)
.build();
let xml = network_xml(&cfg);
assert!(xml.contains("host mac='52:54:00:00:50:01'"));
assert!(xml.contains("name='opnsense'"));
assert!(xml.contains("ip='10.50.0.2'"));
}
#[test]
fn network_xml_no_dhcp_by_default() {
let cfg = NetworkConfig::builder("nodhcp").build();
let xml = network_xml(&cfg);
assert!(!xml.contains("<dhcp>"));
}
#[test]
fn disk_custom_pool() {
let vm = VmConfig::builder("pool-test")
.disk_from_pool(100, "ssd-pool")
.build();
assert_eq!(vm.disks[0].pool, "ssd-pool");
}
}


@@ -19,12 +19,6 @@ pub struct LoadBalancerScore {
// (listen_interface, LoadBalancerService) tuples or something like that
// I am not sure what to use as listen_interface, should it be interface name, ip address,
// uuid?
/// TCP ports that must be open for inbound WAN traffic.
///
/// The load balancer interpret will call `ensure_wan_access` for each port
/// before configuring services, so that the load balancer is reachable
/// from outside the LAN.
pub wan_firewall_ports: Vec<u16>,
}
impl<T: Topology + LoadBalancer> Score<T> for LoadBalancerScore {
@@ -66,11 +60,6 @@ impl<T: Topology + LoadBalancer> Interpret<T> for LoadBalancerInterpret {
load_balancer.ensure_initialized().await?
);
for port in &self.score.wan_firewall_ports {
info!("Ensuring WAN access for port {port}");
load_balancer.ensure_wan_access(*port).await?;
}
for service in self.score.public_services.iter() {
info!("Ensuring service exists {service:?}");


@@ -20,7 +20,6 @@ use async_trait::async_trait;
use derive_new::new;
use harmony_secret::SecretManager;
use harmony_types::id::Id;
use harmony_types::net::Url;
use log::{debug, info};
use serde::Serialize;
use std::path::PathBuf;
@@ -104,7 +103,7 @@ impl OKDSetup02BootstrapInterpret {
)));
} else {
info!(
"[Stage 02/Bootstrap] Created OKD installation directory {}",
"Created OKD installation directory {}",
okd_installation_path.to_string_lossy()
);
}
@@ -136,7 +135,7 @@ impl OKDSetup02BootstrapInterpret {
self.create_file(&install_config_backup, install_config_yaml.as_bytes())
.await?;
info!("[Stage 02/Bootstrap] Creating manifest files with openshift-install");
info!("Creating manifest files with openshift-install");
let output = Command::new(okd_bin_path.join("openshift-install"))
.args([
"create",
@@ -148,19 +147,10 @@ impl OKDSetup02BootstrapInterpret {
.await
.map_err(|e| InterpretError::new(format!("Failed to create okd manifest : {e}")))?;
let stdout = String::from_utf8(output.stdout).unwrap();
info!(
"[Stage 02/Bootstrap] openshift-install stdout :\n\n{}",
stdout
);
info!("openshift-install stdout :\n\n{}", stdout);
let stderr = String::from_utf8(output.stderr).unwrap();
info!(
"[Stage 02/Bootstrap] openshift-install stderr :\n\n{}",
stderr
);
info!(
"[Stage 02/Bootstrap] openshift-install exit status : {}",
output.status
);
info!("openshift-install stderr :\n\n{}", stderr);
info!("openshift-install exit status : {}", output.status);
if !output.status.success() {
return Err(InterpretError::new(format!(
"Failed to create okd manifest, exit code {} : {}",
@@ -168,7 +158,7 @@ impl OKDSetup02BootstrapInterpret {
)));
}
info!("[Stage 02/Bootstrap] Creating ignition files with openshift-install");
info!("Creating ignition files with openshift-install");
let output = Command::new(okd_bin_path.join("openshift-install"))
.args([
"create",
@@ -182,19 +172,10 @@ impl OKDSetup02BootstrapInterpret {
InterpretError::new(format!("Failed to create okd ignition config : {e}"))
})?;
let stdout = String::from_utf8(output.stdout).unwrap();
info!(
"[Stage 02/Bootstrap] openshift-install stdout :\n\n{}",
stdout
);
info!("openshift-install stdout :\n\n{}", stdout);
let stderr = String::from_utf8(output.stderr).unwrap();
info!(
"[Stage 02/Bootstrap] openshift-install stderr :\n\n{}",
stderr
);
info!(
"[Stage 02/Bootstrap] openshift-install exit status : {}",
output.status
);
info!("openshift-install stderr :\n\n{}", stderr);
info!("openshift-install exit status : {}", output.status);
if !output.status.success() {
return Err(InterpretError::new(format!(
"Failed to create okd manifest, exit code {} : {}",
@@ -208,7 +189,7 @@ impl OKDSetup02BootstrapInterpret {
let remote_path = ignition_files_http_path.join(filename);
info!(
"[Stage 02/Bootstrap] Preparing ignition file : {} -> {}",
"Preparing file content for local file : {} to remote : {}",
local_path.to_string_lossy(),
remote_path.to_string_lossy()
);
@@ -239,27 +220,25 @@ impl OKDSetup02BootstrapInterpret {
.interpret(inventory, topology)
.await?;
info!("[Stage 02/Bootstrap] Successfully prepared ignition files for OKD installation");
info!("Successfully prepared ignition files for OKD installation");
// ignition_files_http_path // = PathBuf::from("okd_ignition_files");
info!(
"[Stage 02/Bootstrap] Uploading SCOS installer images from {} to HTTP server",
okd_images_path.to_string_lossy()
);
info!(
r#"[Stage 02/Bootstrap] Images can be refreshed with: openshift-install coreos print-stream-json | grep -Eo '"https.*(kernel.|initramfs.|rootfs.)\w+(\.img)?"' | grep x86_64 | xargs -n 1 curl -LO"#
r#"Uploading images, they can be refreshed with a command similar to this one: openshift-install coreos print-stream-json | grep -Eo '"https.*(kernel.|initramfs.|rootfs.)\w+(\.img)?"' | grep x86_64 | xargs -n 1 curl -LO"#
);
StaticFilesHttpScore {
folder_to_serve: Some(Url::LocalFolder(
okd_images_path.to_string_lossy().to_string(),
)),
remote_path: Some("scos".to_string()),
files: vec![],
}
.interpret(inventory, topology)
.await?;
inquire::Confirm::new(
&format!("push installer image files with `scp -r {}/* root@{}:/usr/local/http/scos/` until performance issue is resolved", okd_images_path.to_string_lossy(), topology.http_server.get_ip())).prompt().expect("Prompt error");
info!("[Stage 02/Bootstrap] SCOS images uploaded successfully");
// let scos_http_path = PathBuf::from("scos");
// StaticFilesHttpScore {
// folder_to_serve: Some(Url::LocalFolder(
// okd_images_path.to_string_lossy().to_string(),
// )),
// remote_path: Some(scos_http_path.to_string_lossy().to_string()),
// files: vec![],
// }
// .interpret(inventory, topology)
// .await?;
Ok(())
}
@@ -276,7 +255,7 @@ impl OKDSetup02BootstrapInterpret {
physical_host,
host_config,
};
info!("[Stage 02/Bootstrap] Configuring host binding for bootstrap node {binding:?}");
info!("Configuring host binding for bootstrap node {binding:?}");
DhcpHostBindingScore {
host_binding: vec![binding],
@@ -329,7 +308,7 @@ impl OKDSetup02BootstrapInterpret {
let outcome = OKDBootstrapLoadBalancerScore::new(topology)
.interpret(inventory, topology)
.await?;
info!("[Stage 02/Bootstrap] Load balancer configured: {outcome:?}");
info!("Successfully executed OKDBootstrapLoadBalancerScore : {outcome:?}");
Ok(())
}
@@ -346,52 +325,10 @@ impl OKDSetup02BootstrapInterpret {
Ok(())
}
async fn wait_for_bootstrap_complete(
&self,
inventory: &Inventory,
) -> Result<(), InterpretError> {
info!("[Stage 02/Bootstrap] Waiting for bootstrap to complete...");
info!("[Stage 02/Bootstrap] Running: openshift-install wait-for bootstrap-complete");
let okd_installation_path =
format!("./data/okd/installation_files_{}", inventory.location.name);
let output = Command::new("./data/okd/bin/openshift-install")
.args([
"wait-for",
"bootstrap-complete",
"--dir",
&okd_installation_path,
"--log-level=info",
])
.output()
.await
.map_err(|e| {
InterpretError::new(format!(
"[Stage 02/Bootstrap] Failed to run openshift-install wait-for bootstrap-complete: {e}"
))
})?;
let stdout = String::from_utf8_lossy(&output.stdout);
let stderr = String::from_utf8_lossy(&output.stderr);
if !stdout.is_empty() {
info!("[Stage 02/Bootstrap] openshift-install stdout:\n{stdout}");
}
if !stderr.is_empty() {
info!("[Stage 02/Bootstrap] openshift-install stderr:\n{stderr}");
}
if !output.status.success() {
return Err(InterpretError::new(format!(
"[Stage 02/Bootstrap] bootstrap-complete failed (exit {}): {}",
output.status,
stderr.lines().last().unwrap_or("unknown error")
)));
}
info!("[Stage 02/Bootstrap] Bootstrap complete!");
Ok(())
async fn wait_for_bootstrap_complete(&self) -> Result<(), InterpretError> {
// Placeholder: wait-for bootstrap-complete
info!("[Bootstrap] Waiting for bootstrap-complete …");
todo!("[Bootstrap] Waiting for bootstrap-complete …")
}
async fn create_file(&self, path: &PathBuf, content: &[u8]) -> Result<(), InterpretError> {
@@ -444,7 +381,7 @@ impl Interpret<HAClusterTopology> for OKDSetup02BootstrapInterpret {
// self.validate_dns_config(inventory, topology).await?;
self.reboot_target().await?;
self.wait_for_bootstrap_complete(inventory).await?;
self.wait_for_bootstrap_complete().await?;
Ok(Outcome::success("Bootstrap phase complete".into()))
}


@@ -1,4 +1,4 @@
use std::net::{IpAddr, Ipv4Addr, SocketAddr};
use std::net::SocketAddr;
use serde::Serialize;
@@ -19,30 +19,27 @@ pub struct OKDBootstrapLoadBalancerScore {
impl OKDBootstrapLoadBalancerScore {
pub fn new(topology: &HAClusterTopology) -> Self {
// Bind on 0.0.0.0 instead of the LAN IP to avoid CARP VIP race
// conditions where HAProxy fails to bind when the interface
// transitions back to master.
let bind_addr = IpAddr::V4(Ipv4Addr::UNSPECIFIED);
let private_ip = topology.router.get_gateway();
let private_services = vec![
LoadBalancerService {
backend_servers: Self::topology_to_backend_server(topology, 80),
listening_port: SocketAddr::new(bind_addr, 80),
listening_port: SocketAddr::new(private_ip, 80),
health_check: Some(HealthCheck::TCP(None)),
},
LoadBalancerService {
backend_servers: Self::topology_to_backend_server(topology, 443),
listening_port: SocketAddr::new(bind_addr, 443),
listening_port: SocketAddr::new(private_ip, 443),
health_check: Some(HealthCheck::TCP(None)),
},
LoadBalancerService {
backend_servers: Self::topology_to_backend_server(topology, 22623),
listening_port: SocketAddr::new(bind_addr, 22623),
listening_port: SocketAddr::new(private_ip, 22623),
health_check: Some(HealthCheck::TCP(None)),
},
LoadBalancerService {
backend_servers: Self::topology_to_backend_server(topology, 6443),
listening_port: SocketAddr::new(bind_addr, 6443),
listening_port: SocketAddr::new(private_ip, 6443),
health_check: Some(HealthCheck::HTTP(
None,
"/readyz".to_string(),
@@ -56,7 +53,6 @@ impl OKDBootstrapLoadBalancerScore {
load_balancer_score: LoadBalancerScore {
public_services: vec![],
private_services,
wan_firewall_ports: vec![80, 443],
},
}
}


@@ -78,9 +78,9 @@ impl OKDNodeInterpret {
let required_hosts: i16 = okd_host_properties.required_hosts();
info!(
"[{}] Discovery of {} hosts in progress, {} found so far",
self.host_role,
"Discovery of {} {} hosts in progress, current number {}",
required_hosts,
self.host_role,
hosts.len()
);
// This score triggers the discovery agent for a specific role.
@@ -118,9 +118,8 @@ impl OKDNodeInterpret {
nodes: &Vec<(PhysicalHost, HostConfig)>,
) -> Result<(), InterpretError> {
info!(
"[{}] Configuring DHCP host bindings for {} nodes",
self.host_role,
nodes.len()
"[{}] Configuring host bindings for {} plane nodes.",
self.host_role, self.host_role,
);
let host_properties = self.okd_role_properties(&self.host_role);
@@ -297,18 +296,14 @@ impl Interpret<HAClusterTopology> for OKDNodeInterpret {
// and the cluster becomes fully functional only once all nodes are Ready and the
// cluster operators report Available=True.
info!(
"[{}] Provisioning initiated for {} nodes. Monitor cluster convergence with: oc get nodes && oc get co",
self.host_role,
nodes.len()
"[{}] Provisioning initiated. Monitor the cluster convergence manually.",
self.host_role
);
Ok(Outcome::success_with_details(
format!("{} provisioning initiated", self.host_role),
nodes
.iter()
.map(|(host, _)| format!(" {} (MACs: {:?})", host.id, host.get_mac_address()))
.collect(),
))
Ok(Outcome::success(format!(
"{} provisioning has been successfully initiated.",
self.host_role
)))
}
fn get_name(&self) -> InterpretName {


@@ -1,6 +1,7 @@
use std::str::FromStr;
use async_trait::async_trait;
use brocade::{InterfaceConfig, PortChannelConfig, PortChannelId, Vlan};
use harmony_types::{id::Id, switch::PortLocation};
use log::{error, info, warn};
use serde::Serialize;
@@ -11,7 +12,10 @@ use crate::{
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
score::Score,
topology::{HostNetworkConfig, NetworkInterface, NetworkManager, Switch, SwitchPort, Topology},
topology::{
HostNetworkConfig, NetworkInterface, NetworkManager, Switch, SwitchPort,
Topology,
},
};
/// Configures high-availability networking for a set of physical hosts.
@@ -152,8 +156,9 @@ impl HostNetworkConfigurationInterpret {
InterpretError::new(format!("Failed to configure host network: {e}"))
})?;
let channel_id = todo!("Determine port-channel ID for this host");
topology
.configure_port_channel(&config)
.configure_port_channel(channel_id, &config)
.await
.map_err(|e| {
InterpretError::new(format!("Failed to configure host network: {e}"))
@@ -389,7 +394,7 @@ mod tests {
use crate::{
hardware::HostCategory,
topology::{
HostNetworkConfig, NetworkError, PortConfig, PreparationError, PreparationOutcome,
HostNetworkConfig, NetworkError, PreparationError, PreparationOutcome,
SwitchError, SwitchPort,
},
};
@@ -836,6 +841,7 @@ mod tests {
async fn configure_port_channel(
&self,
_channel_id: PortChannelId,
config: &HostNetworkConfig,
) -> Result<(), SwitchError> {
let mut configured_port_channels = self.configured_port_channels.lock().unwrap();
@@ -843,14 +849,26 @@ mod tests {
Ok(())
}
async fn configure_port_channel_from_config(
&self,
_config: &PortChannelConfig,
) -> Result<(), SwitchError> {
todo!()
}
async fn clear_port_channel(&self, ids: &Vec<Id>) -> Result<(), SwitchError> {
todo!()
}
async fn configure_interface(
async fn configure_interfaces(
&self,
port_config: &Vec<PortConfig>,
_interfaces: &Vec<InterfaceConfig>,
) -> Result<(), SwitchError> {
todo!()
}
async fn create_vlan(&self, _vlan: &Vlan) -> Result<(), SwitchError> {
todo!()
}
async fn delete_vlan(&self, _vlan: &Vlan) -> Result<(), SwitchError> {
todo!()
}
}
}


@@ -74,7 +74,14 @@ impl<T: Topology + DhcpServer + TftpServer + HttpServer + Router> Interpret<T>
}),
Box::new(StaticFilesHttpScore {
remote_path: None,
folder_to_serve: Some(Url::LocalFolder("./data/pxe/okd/http_files/".to_string())),
// TODO The current russh based copy is way too slow, check for a lib update or use scp
// when available
//
// For now just run :
// scp -r data/pxe/okd/http_files/* root@192.168.1.1:/usr/local/http/
//
folder_to_serve: None,
// folder_to_serve: Some(Url::LocalFolder("./data/pxe/okd/http_files/".to_string())),
files: vec![
FileContent {
path: FilePath::Relative("boot.ipxe".to_string()),
@@ -116,9 +123,9 @@ impl<T: Topology + DhcpServer + TftpServer + HttpServer + Router> Interpret<T>
Err(e) => return Err(e),
};
}
Ok(Outcome::success(
"iPXE boot infrastructure installed".to_string(),
))
inquire::Confirm::new(&format!("Execute the copy : `scp -r data/pxe/okd/http_files/* root@{}:/usr/local/http/` and confirm when done to continue", HttpServer::get_ip(topology))).prompt().expect("Prompt error");
Ok(Outcome::success("Ipxe installed".to_string()))
}
fn get_name(&self) -> InterpretName {


@@ -1,4 +1,4 @@
use std::net::{IpAddr, Ipv4Addr, SocketAddr};
use std::net::SocketAddr;
use serde::Serialize;
@@ -8,7 +8,7 @@ use crate::{
score::Score,
topology::{
BackendServer, HAClusterTopology, HealthCheck, HttpMethod, HttpStatusCode, LoadBalancer,
LoadBalancerService, SSL, Topology,
LoadBalancerService, LogicalHost, Router, SSL, Topology,
},
};
@@ -53,19 +53,16 @@ pub struct OKDLoadBalancerScore {
/// ```
impl OKDLoadBalancerScore {
pub fn new(topology: &HAClusterTopology) -> Self {
// Bind on 0.0.0.0 instead of the LAN IP to avoid CARP VIP race
// conditions where HAProxy fails to bind when the interface
// transitions back to master.
let bind_addr = IpAddr::V4(Ipv4Addr::UNSPECIFIED);
let public_ip = topology.router.get_gateway();
let public_services = vec![
LoadBalancerService {
backend_servers: Self::nodes_to_backend_server(topology, 80),
listening_port: SocketAddr::new(bind_addr, 80),
listening_port: SocketAddr::new(public_ip, 80),
health_check: None,
},
LoadBalancerService {
backend_servers: Self::nodes_to_backend_server(topology, 443),
listening_port: SocketAddr::new(bind_addr, 443),
listening_port: SocketAddr::new(public_ip, 443),
health_check: None,
},
];
@@ -73,7 +70,7 @@ impl OKDLoadBalancerScore {
let private_services = vec![
LoadBalancerService {
backend_servers: Self::nodes_to_backend_server(topology, 80),
listening_port: SocketAddr::new(bind_addr, 80),
listening_port: SocketAddr::new(public_ip, 80),
health_check: Some(HealthCheck::HTTP(
Some(25001),
"/health?check=okd_router_1936,node_ready".to_string(),
@@ -84,7 +81,7 @@ impl OKDLoadBalancerScore {
},
LoadBalancerService {
backend_servers: Self::nodes_to_backend_server(topology, 443),
listening_port: SocketAddr::new(bind_addr, 443),
listening_port: SocketAddr::new(public_ip, 443),
health_check: Some(HealthCheck::HTTP(
Some(25001),
"/health?check=okd_router_1936,node_ready".to_string(),
@@ -95,12 +92,12 @@ impl OKDLoadBalancerScore {
},
LoadBalancerService {
backend_servers: Self::control_plane_to_backend_server(topology, 22623),
listening_port: SocketAddr::new(bind_addr, 22623),
listening_port: SocketAddr::new(public_ip, 22623),
health_check: Some(HealthCheck::TCP(None)),
},
LoadBalancerService {
backend_servers: Self::control_plane_to_backend_server(topology, 6443),
listening_port: SocketAddr::new(bind_addr, 6443),
listening_port: SocketAddr::new(public_ip, 6443),
health_check: Some(HealthCheck::HTTP(
None,
"/readyz".to_string(),
@@ -114,7 +111,6 @@ impl OKDLoadBalancerScore {
load_balancer_score: LoadBalancerScore {
public_services,
private_services,
wan_firewall_ports: vec![80, 443],
},
}
}
@@ -169,7 +165,7 @@ mod tests {
use std::sync::{Arc, OnceLock};
use super::*;
use crate::topology::{DummyInfra, LogicalHost, Router};
use crate::topology::DummyInfra;
use harmony_macros::ip;
use harmony_types::net::IpAddress;
@@ -300,30 +296,6 @@ mod tests {
assert_eq!(public_service_443.backend_servers.len(), 5);
}
#[test]
fn test_all_services_bind_on_unspecified_address() {
let topology = create_test_topology();
let score = OKDLoadBalancerScore::new(&topology);
let unspecified = IpAddr::V4(Ipv4Addr::UNSPECIFIED);
for svc in &score.load_balancer_score.public_services {
assert_eq!(
svc.listening_port.ip(),
unspecified,
"Public service on port {} should bind on 0.0.0.0",
svc.listening_port.port()
);
}
for svc in &score.load_balancer_score.private_services {
assert_eq!(
svc.listening_port.ip(),
unspecified,
"Private service on port {} should bind on 0.0.0.0",
svc.listening_port.port()
);
}
}
#[test]
fn test_private_service_port_22623_only_control_plane() {
let topology = create_test_topology();
@@ -339,13 +311,6 @@ mod tests {
assert_eq!(private_service_22623.backend_servers.len(), 3);
}
#[test]
fn test_wan_firewall_ports_include_http_and_https() {
let topology = create_test_topology();
let score = OKDLoadBalancerScore::new(&topology);
assert_eq!(score.load_balancer_score.wan_firewall_ports, vec![80, 443]);
}
#[test]
fn test_all_backend_servers_have_correct_port() {
let topology = create_test_topology();

Some files were not shown because too many files have changed in this diff.