Compare commits
1 commit: feat/opnse... → worktree-b (SHA1 82d1f87ff8)
.gitmodules (12 lines changed, vendored)
@@ -1,15 +1,3 @@
[submodule "examples/try_rust_webapp/tryrust.org"]
	path = examples/try_rust_webapp/tryrust.org
	url = https://github.com/rust-dd/tryrust.org.git
[submodule "/home/jeangab/work/nationtech/harmony2/opnsense-codegen/vendor/core"]
	path = /home/jeangab/work/nationtech/harmony2/opnsense-codegen/vendor/core
	url = https://github.com/opnsense/core.git
[submodule "/home/jeangab/work/nationtech/harmony2/opnsense-codegen/vendor/plugins"]
	path = /home/jeangab/work/nationtech/harmony2/opnsense-codegen/vendor/plugins
	url = https://github.com/opnsense/plugins.git
[submodule "opnsense-codegen/vendor/core"]
	path = opnsense-codegen/vendor/core
	url = https://github.com/opnsense/core.git
[submodule "opnsense-codegen/vendor/plugins"]
	path = opnsense-codegen/vendor/plugins
	url = https://github.com/opnsense/plugins.git
CLAUDE.md (146 lines removed)
@@ -1,146 +0,0 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Build & Test Commands

```bash
# Full CI check (check + fmt + clippy + test)
./build/check.sh

# Individual commands
cargo check --all-targets --all-features --keep-going
cargo fmt --check    # Check formatting
cargo clippy         # Lint
cargo test           # Run all tests

# Run a single test
cargo test -p <crate_name> <test_name>

# Run a specific example
cargo run -p <example_crate_name>

# Build the mdbook documentation
mdbook build
```
## What Harmony Is

Harmony is the orchestration framework powering NationTech's vision of **decentralized micro datacenters** — small computing clusters deployed in homes, offices, and community spaces instead of hyperscaler facilities. The goal: make computing cleaner, more resilient, locally beneficial, and resistant to centralized points of failure (including geopolitical threats).

Harmony exists because existing IaC tools (Terraform, Ansible, Helm) are trapped in a **YAML mud pit**: static configuration files validated only at runtime, fragmented across tools, with errors surfacing at 3 AM instead of at compile time. Harmony replaces this entire class of tools with a single Rust codebase where **the compiler catches infrastructure misconfigurations before anything is deployed**.

This is not a wrapper around existing tools. It is a paradigm shift: infrastructure-as-real-code with compile-time safety guarantees that no YAML/HCL/DSL-based tool can provide.
## The Score-Topology-Interpret Pattern
This is the core design pattern. Understand it before touching the codebase.

**Score** — declarative desired state. A Rust struct generic over `T: Topology` that describes *what* you want (e.g., "a PostgreSQL cluster", "DNS records for these hosts"). Scores are serializable, cloneable, idempotent.

**Topology** — infrastructure capabilities. Represents *where* things run and *what the environment can do*. Exposes capabilities as traits (`DnsServer`, `K8sclient`, `HelmCommand`, `LoadBalancer`, `Firewall`, etc.). Examples: `K8sAnywhereTopology` (local K3D or any K8s cluster), `HAClusterTopology` (bare-metal HA with redundant firewalls/switches).

**Interpret** — execution glue. Translates a Score into concrete operations against a Topology's capabilities. Returns an `Outcome` (SUCCESS, NOOP, FAILURE, RUNNING, QUEUED, BLOCKED).

**The key insight — compile-time safety through trait bounds:**

```rust
impl<T: Topology + DnsServer + DhcpServer> Score<T> for DnsScore { ... }
```
The compiler rejects any attempt to use `DnsScore` with a Topology that doesn't implement `DnsServer` and `DhcpServer`. Invalid infrastructure configurations become compilation errors, not runtime surprises.

**Higher-order topologies** compose transparently:

- `FailoverTopology<T>` — primary/replica orchestration
- `DecentralizedTopology<T>` — multi-site coordination

If `T: PostgreSQL`, then `FailoverTopology<T>: PostgreSQL` automatically via blanket impls. Zero boilerplate.
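A minimal, self-contained sketch of how such a blanket impl works. The trait shapes here are simplified and synchronous for illustration; the real Harmony traits are async and carry much more behavior:

```rust
// Simplified, synchronous trait shapes for illustration only; the real
// Harmony capability traits are async and richer.
trait PostgreSQL {
    fn connection_string(&self) -> String;
}

// A concrete topology that provides the PostgreSQL capability.
struct BareMetal;

impl PostgreSQL for BareMetal {
    fn connection_string(&self) -> String {
        "postgres://primary:5432".to_string()
    }
}

// Higher-order topology wrapping any inner topology T.
struct FailoverTopology<T> {
    primary: T,
    // replica, health checks, etc. elided
}

// The blanket impl: if the inner topology provides PostgreSQL,
// the failover wrapper provides it too, with zero per-capability boilerplate.
impl<T: PostgreSQL> PostgreSQL for FailoverTopology<T> {
    fn connection_string(&self) -> String {
        self.primary.connection_string()
    }
}

fn main() {
    let topo = FailoverTopology { primary: BareMetal };
    // The capability passes through the wrapper transparently.
    assert_eq!(topo.connection_string(), "postgres://primary:5432");
    println!("ok");
}
```

Because the impl is generic over `T: PostgreSQL`, adding a new capability to an inner topology automatically surfaces on every wrapper.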
## Architecture (Hexagonal)
```
harmony/src/
├── domain/            # Core domain — the heart of the framework
│   ├── score.rs       # Score trait (desired state)
│   ├── topology/      # Topology trait + implementations
│   ├── interpret/     # Interpret trait + InterpretName enum (25+ variants)
│   ├── inventory/     # Physical infrastructure metadata (hosts, switches, mgmt interfaces)
│   ├── executors/     # Executor trait definitions
│   └── maestro/       # Orchestration engine (registers scores, manages topology state, executes)
├── infra/             # Infrastructure adapters (driven ports)
│   ├── opnsense/      # OPNsense firewall adapter
│   ├── brocade.rs     # Brocade switch adapter
│   ├── kube.rs        # Kubernetes executor
│   └── sqlx.rs        # Database executor
└── modules/           # Concrete deployment modules (23+)
    ├── k8s/           # Kubernetes (namespaces, deployments, ingress)
    ├── postgresql/    # CloudNativePG clusters + multi-site failover
    ├── okd/           # OpenShift bare-metal from scratch
    ├── helm/          # Helm chart inflation → vanilla K8s YAML
    ├── opnsense/      # OPNsense (DHCP, DNS, etc.)
    ├── monitoring/    # Prometheus, Alertmanager, Grafana
    ├── kvm/           # KVM virtual machine management
    ├── network/       # Network services (iPXE, TFTP, bonds)
    └── ...
```

Domain types to know: `Inventory` (read-only physical infra context), `Maestro<T>` (orchestrator — calls `topology.ensure_ready()` then executes scores), `Outcome` / `InterpretError` (execution results).
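The orchestration flow above can be sketched in miniature. This is not the real `Maestro` API, just the shape it describes: ready the topology, then interpret each registered score (the `Outcome` enum is reduced to two variants here):

```rust
// Illustrative sketch of the orchestration flow, not Harmony's real API.
#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    Noop,
}

trait Topology {
    fn ensure_ready(&self) -> Result<(), String>;
}

struct Maestro<T: Topology> {
    topology: T,
    scores: Vec<Box<dyn Fn(&T) -> Outcome>>,
}

impl<T: Topology> Maestro<T> {
    fn interpret_all(&self) -> Result<Vec<Outcome>, String> {
        // The topology must be ready before any score is interpreted.
        self.topology.ensure_ready()?;
        Ok(self.scores.iter().map(|score| score(&self.topology)).collect())
    }
}

struct LocalK3d;
impl Topology for LocalK3d {
    fn ensure_ready(&self) -> Result<(), String> {
        Ok(()) // a real topology would provision the cluster here
    }
}

fn main() {
    let maestro = Maestro {
        topology: LocalK3d,
        scores: vec![Box::new(|_| Outcome::Success), Box::new(|_| Outcome::Noop)],
    };
    assert_eq!(maestro.interpret_all(), Ok(vec![Outcome::Success, Outcome::Noop]));
    println!("ok");
}
```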
## Key Crates
| Crate | Purpose |
|---|---|
| `harmony` | Core framework: domain, infra adapters, deployment modules |
| `harmony_cli` | CLI + optional TUI (`--features tui`) |
| `harmony_config` | Unified config+secret management (env → SQLite → OpenBao → interactive prompt) |
| `harmony_secret` / `harmony_secret_derive` | Secret backends (LocalFile, OpenBao, Infisical) |
| `harmony_execution` | Execution engine |
| `harmony_agent` / `harmony_inventory_agent` | Persistent agent framework (NATS JetStream mesh), hardware discovery |
| `harmony_assets` | Asset management (URLs, local cache, S3) |
| `harmony_composer` | Infrastructure composition tool |
| `harmony-k8s` | Kubernetes utilities |
| `k3d` | Local K3D cluster management |
| `brocade` | Brocade network switch integration |
## OPNsense Crates
The `opnsense-codegen` and `opnsense-api` crates exist because OPNsense's automation ecosystem is poor — no typed API client exists. These are support crates, not the core of Harmony.

- `opnsense-codegen`: XML model files → IR → Rust structs with serde helpers for OPNsense wire format quirks (`opn_bool` for "0"/"1" strings, `opn_u16`/`opn_u32` for string-encoded numbers). Vendor sources are git submodules under `opnsense-codegen/vendor/`.
- `opnsense-api`: Hand-written `OpnsenseClient` + generated model types in `src/generated/`.
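The wire-format quirk those helpers encode can be shown in a stdlib-only sketch. The real `opn_bool`/`opn_u16` helpers are serde (de)serializers; only the conversion rule is illustrated here:

```rust
// Illustrative stdlib-only sketch: OPNsense encodes booleans as the
// strings "0"/"1" and numbers as strings. Not the real serde helpers.
fn opn_bool(raw: &str) -> Result<bool, String> {
    match raw {
        "0" => Ok(false),
        "1" => Ok(true),
        other => Err(format!("expected \"0\" or \"1\", got {other:?}")),
    }
}

fn opn_u16(raw: &str) -> Result<u16, String> {
    raw.parse::<u16>().map_err(|e| e.to_string())
}

fn main() {
    assert_eq!(opn_bool("1"), Ok(true));
    assert_eq!(opn_bool("0"), Ok(false));
    assert!(opn_bool("yes").is_err());
    assert_eq!(opn_u16("8443"), Ok(8443));
    println!("ok");
}
```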
## Key Design Decisions (ADRs in docs/adr/)
- **ADR-001**: Rust chosen for type system, refactoring safety, and performance
- **ADR-002**: Hexagonal architecture — domain isolated from adapters
- **ADR-003**: Infrastructure abstractions at domain level, not provider level (no vendor lock-in)
- **ADR-005**: Custom Rust DSL over YAML/Score-spec — real language, Cargo deps, composable
- **ADR-007**: K3D as default runtime (K8s-certified, lightweight, cross-platform)
- **ADR-009**: Helm charts inflated to vanilla K8s YAML, then deployed via existing code paths
- **ADR-015**: Higher-order topologies via blanket trait impls (zero-cost composition)
- **ADR-016**: Agent-based architecture with NATS JetStream for real-time failover and distributed consensus
- **ADR-020**: Unified config+secret management — Rust struct is the schema, resolution chain: env → store → prompt
## Capability and Score Design Rules
**Capabilities are industry concepts, not tools.** A capability trait represents a standard infrastructure need (e.g., `DnsServer`, `LoadBalancer`, `Router`, `CertificateManagement`) that can be fulfilled by different products. OPNsense provides `DnsServer` today; CoreDNS or Route53 could provide it tomorrow. Scores must not break when the backend changes.

**Exception:** When the developer fundamentally needs to know the implementation. `PostgreSQL` is a capability (not `Database`) because the developer writes PostgreSQL-specific SQL and replication configs. Swapping to MariaDB would break the application, not just the infrastructure.

**Test:** If you could swap the underlying tool without rewriting any Score that uses the capability, the boundary is correct.

**Don't name capabilities after tools.** `SecretVault` not `OpenbaoStore`. `IdentityProvider` not `ZitadelAuth`. Think: what is the core developer need that leads to using this tool?

**Scores encapsulate operational complexity.** Move procedural knowledge (init sequences, retry logic, distribution-specific config) into Scores. A high-level example should be ~15 lines, not ~400 lines of imperative orchestration.

**Scores must be idempotent.** Running twice = same result as once. Use create-or-update, handle "already exists" gracefully.

**Scores must not depend on execution order.** Declare capability requirements via trait bounds, don't assume another Score ran first. If Score B needs what Score A provides, Score B should declare that capability as a trait bound.

See `docs/guides/writing-a-score.md` for the full guide.
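The idempotency rule reduces to a create-or-update shape. A hypothetical sketch (the toy DNS map and `ensure_record` are illustrative, not Harmony's API):

```rust
use std::collections::HashMap;

// Hypothetical create-or-update shape for an idempotent Score interpret.
#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    Noop,
}

fn ensure_record(dns: &mut HashMap<String, String>, name: &str, ip: &str) -> Outcome {
    match dns.get(name) {
        // Already in the desired state: converge to NOOP, don't fail.
        Some(existing) if existing == ip => Outcome::Noop,
        // Missing or different: create-or-update.
        _ => {
            dns.insert(name.to_string(), ip.to_string());
            Outcome::Success
        }
    }
}

fn main() {
    let mut dns = HashMap::new();
    assert_eq!(ensure_record(&mut dns, "app", "10.0.0.1"), Outcome::Success);
    // The second run is a NOOP instead of an "already exists" error.
    assert_eq!(ensure_record(&mut dns, "app", "10.0.0.1"), Outcome::Noop);
    println!("ok");
}
```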
## Conventions
- **Rust edition 2024**, resolver v2
- **Conventional commits**: `feat:`, `fix:`, `chore:`, `docs:`, `refactor:`
- **Small PRs**: max ~200 lines (excluding generated code), single-purpose
- **License**: GNU AGPL v3
- **Quality bar**: This framework demands high-quality engineering. The type system is a feature, not a burden. Leverage it. Prefer compile-time guarantees over runtime checks. Abstractions should be domain-level, not provider-specific.
Cargo.lock (1154 lines changed, generated): file diff suppressed because it is too large
Cargo.toml (10 lines changed)

@@ -16,9 +16,6 @@ members = [
    "harmony_inventory_agent",
    "harmony_secret_derive",
    "harmony_secret",
    "network_stress_test",
    "examples/kvm_okd_ha_cluster",
    "examples/example_linux_vm",
    "harmony_config_derive",
    "harmony_config",
    "brocade",
@@ -26,7 +23,6 @@ members = [
    "harmony_agent/deploy",
    "harmony_node_readiness",
    "harmony-k8s",
    "harmony_assets", "opnsense-codegen", "opnsense-api",
]

[workspace.package]
@@ -41,10 +37,8 @@ derive-new = "0.7"
async-trait = "0.1"
tokio = { version = "1.40", features = [
    "io-std",
    "io-util",
    "fs",
    "macros",
    "net",
    "rt-multi-thread",
] }
tokio-retry = "0.3.0"
@@ -79,7 +73,6 @@ base64 = "0.22.1"
tar = "0.4.44"
lazy_static = "1.5.0"
directories = "6.0.0"
futures-util = "0.3"
thiserror = "2.0.14"
serde = { version = "1.0.209", features = ["derive", "rc"] }
serde_json = "1.0.127"
@@ -93,6 +86,3 @@ reqwest = { version = "0.12", features = [
    "json",
], default-features = false }
assertor = "0.0.4"
tokio-test = "0.4"
anyhow = "1.0"
clap = { version = "4", features = ["derive"] }
ROADMAP.md (41 lines removed)
@@ -1,41 +0,0 @@

# Harmony Roadmap

Nine phases to take Harmony from working prototype to production-ready open-source project.

| # | Phase | Status | Depends On | Detail |
|---|-------|--------|------------|--------|
| 1 | [Harden `harmony_config`](ROADMAP/01-config-crate.md) | Not started | — | Test every source, add SQLite backend, wire Zitadel + OpenBao, validate zero-setup UX |
| 2 | [Migrate to `harmony_config`](ROADMAP/02-refactor-harmony-config.md) | Not started | 1 | Replace all 19 `SecretManager` call sites, deprecate direct `harmony_secret` usage |
| 3 | [Complete `harmony_assets`](ROADMAP/03-assets-crate.md) | Not started | 1, 2 | Test, refactor k3d and OKD to use it, implement `Url::Url`, remove LFS |
| 4 | [Publish to GitHub](ROADMAP/04-publish-github.md) | Not started | 3 | Clean history, set up GitHub as community hub, CI on self-hosted runners |
| 5 | [E2E tests: PostgreSQL & RustFS](ROADMAP/05-e2e-tests-simple.md) | Not started | 1 | k3d-based test harness, two passing E2E tests, CI job |
| 6 | [E2E tests: OKD HA on KVM](ROADMAP/06-e2e-tests-kvm.md) | Not started | 5 | KVM test infrastructure, full OKD installation test, nightly CI |
| 7 | [OPNsense & Bare-Metal Network Automation](ROADMAP/07-opnsense-bare-metal.md) | **In progress** | — | Full OPNsense API coverage, Brocade switch integration, HA cluster network provisioning |
| 8 | [HA OKD Production Deployment](ROADMAP/08-ha-okd-production.md) | Not started | 7 | LAGG/CARP/multi-WAN/BINAT cluster with UpdateHostScore, end-to-end bare-metal automation |
| 9 | [SSO + Config Hardening](ROADMAP/09-sso-config-hardening.md) | **In progress** | 1 | Builder pattern for OpenbaoSecretStore, ZitadelScore PG fix, CoreDNSRewriteScore, integration tests |
## Current State (as of branch `feat/opnsense-codegen`)
- `harmony_config` crate exists with `EnvSource`, `LocalFileSource`, `PromptSource`, `StoreSource`. 12 unit tests. **Zero consumers** in workspace — everything still uses `harmony_secret::SecretManager` directly (19 call sites).
- `harmony_assets` crate exists with `Asset`, `LocalCache`, `LocalStore`, `S3Store`. **No tests. Zero consumers.** The `k3d` crate has its own `DownloadableAsset` with identical functionality and full test coverage.
- `harmony_secret` has `LocalFileSecretStore`, `OpenbaoSecretStore` (token/userpass/OIDC device flow + JWT exchange), `InfisicalSecretStore`. Zitadel OIDC integration **implemented** with session caching.
- **SSO example** (`examples/harmony_sso/`): deploys Zitadel + OpenBao on k3d, provisions identity resources, authenticates via device flow, stores config in OpenBao. `OpenbaoSetupScore` and `ZitadelSetupScore` encapsulate day-two operations.
- KVM module exists on this branch with `KvmExecutor`, VM lifecycle, ISO download, two examples (`example_linux_vm`, `kvm_okd_ha_cluster`).
- RustFS module exists on `feat/rustfs` branch (2 commits ahead of master).
- 39 example crates, **zero E2E tests**. Unit tests pass across workspace (~240 tests).
- CI runs `cargo check`, `fmt`, `clippy`, `test` on Gitea. No E2E job.
### OPNsense & Bare-Metal (as of branch `feat/opnsense-codegen`)
- **9 OPNsense Scores** implemented: VlanScore, LaggScore, VipScore, DnatScore, FirewallRuleScore, OutboundNatScore, BinatScore, NodeExporterScore, OPNsenseShellCommandScore. All tested against a 4-NIC VM.
- **opnsense-codegen** pipeline operational: XML → IR → typed Rust structs with serde helpers. 11 generated API modules (26.5K lines).
- **opnsense-config** has 13 modules: DHCP (dnsmasq), DNS, firewall, LAGG, VIP, VLAN, load balancer (HAProxy), Caddy, TFTP, node exporter, and legacy DHCP.
- **Brocade switch integration** on `feat/brocade-client-add-vlans`: full VLAN CRUD, interface speed config, port-channel management, new `BrocadeSwitchConfigurationScore`. Breaking API changes (`InterfaceConfig` replaces tuples).
- **Missing for production**: `UpdateHostScore` (update MAC in DHCP for PXE boot + host network setup for LAGG LACP 802.3ad), `HostNetworkConfigurationScore` needs rework for LAGG/LACP (currently only creates bonds, doesn't configure LAGG on the OPNsense side), and the brocade branch needs merge and API adaptation in `harmony/src/infra/brocade.rs`.
## Guiding Principles
- **Zero-setup first**: A new user clones, runs `cargo run`, gets prompted for config, values persist to local SQLite. No env vars, no external services required.
- **Progressive disclosure**: Local SQLite → OpenBao → Zitadel SSO. Each layer is opt-in.
- **Test what ships**: Every example that works should have an E2E test proving it works.
- **Community over infrastructure**: GitHub for engagement, self-hosted runners for CI.
@@ -1,623 +0,0 @@

# Phase 1: Harden `harmony_config`, Validate UX, Zero-Setup Starting Point

## Goal

Make `harmony_config` production-ready with a seamless first-run experience: clone, run, get prompted, values persist locally. Then progressively add team-scale backends (OpenBao, Zitadel SSO) without changing any calling code.

## Current State

`harmony_config` now has:

- `Config` trait + `#[derive(Config)]` macro
- `ConfigManager` with ordered source chain
- Five `ConfigSource` implementations:
  - `EnvSource` — reads `HARMONY_CONFIG_{KEY}` env vars
  - `LocalFileSource` — reads/writes `{key}.json` files from a directory
  - `SqliteSource` — **NEW** reads/writes to a SQLite database
  - `PromptSource` — returns `None` / no-op on set (placeholder for TUI integration)
  - `StoreSource<S: SecretStore>` — wraps any `harmony_secret::SecretStore` backend
- 26 unit tests (mock source, env, local file, sqlite, prompt, integration, store graceful fallback)
- Global `CONFIG_MANAGER` static with `init()`, `get()`, `get_or_prompt()`, `set()`
- Two examples: `basic` and `prompting` in `harmony_config/examples/`
- **Zero workspace consumers** — nothing calls `harmony_config` yet
## Tasks
### 1.1 Add `SqliteSource` as the default zero-setup backend ✅

**Status**: Implemented

**Implementation Details**:

- Database location: `~/.local/share/harmony/config/config.db` (directory is auto-created)
- Schema: `config(key TEXT PRIMARY KEY, value TEXT NOT NULL, updated_at TEXT NOT NULL DEFAULT (datetime('now')))`
- Uses `sqlx` with the SQLite runtime
- `SqliteSource::open(path)` - opens/creates the database at the given path
- `SqliteSource::default()` - uses the default Harmony data directory

**Files**:

- `harmony_config/src/source/sqlite.rs` - new file
- `harmony_config/Cargo.toml` - added `sqlx = { workspace = true, features = ["runtime-tokio", "sqlite"] }`
- `Cargo.toml` - added `anyhow = "1.0"` to workspace dependencies

**Tests** (all passing):

- `test_sqlite_set_and_get` — round-trip a `TestConfig` struct
- `test_sqlite_get_returns_none_when_missing` — key not in DB
- `test_sqlite_overwrites_on_set` — set twice, get returns latest
- `test_sqlite_concurrent_access` — two tasks writing different keys simultaneously
### 1.1.1 Add Config example to show exact DX and confirm functionality ✅
**Status**: Implemented

**Examples created**:

1. `harmony_config/examples/basic.rs` - demonstrates:
   - Zero-setup SQLite backend (auto-creates directory)
   - Using the `#[derive(Config)]` macro
   - Environment variable override (`HARMONY_CONFIG_TestConfig` overrides SQLite)
   - Direct set/get operations
   - Persistence verification

2. `harmony_config/examples/prompting.rs` - demonstrates:
   - Config with no defaults (requires user input via `inquire`)
   - `get()` flow: env > sqlite > prompt fallback
   - `get_or_prompt()` for interactive configuration
   - Full resolution chain
   - Persistence of prompted values
### 1.2 Make `PromptSource` functional ✅
**Status**: Implemented with design improvement

**Key Finding - Bug Fixed During Implementation**:

The original design had a critical bug in `get_or_prompt()`:

```rust
// OLD (BUGGY) - breaks on the first source where set() returns Ok(())
for source in &self.sources {
    if source.set(T::KEY, &value).await.is_ok() {
        break;
    }
}
```

Since `EnvSource.set()` returns `Ok(())` (it successfully sets the env var), the loop would break immediately and never write to `SqliteSource`. Prompted values were never persisted!

**Solution - Added a `should_persist()` method to the `ConfigSource` trait**:

```rust
#[async_trait]
pub trait ConfigSource: Send + Sync {
    async fn get(&self, key: &str) -> Result<Option<serde_json::Value>, ConfigError>;
    async fn set(&self, key: &str, value: &serde_json::Value) -> Result<(), ConfigError>;
    fn should_persist(&self) -> bool {
        true
    }
}
```

- `EnvSource::should_persist()` returns `false` - shouldn't persist prompted values to env vars
- `PromptSource::should_persist()` returns `false` - doesn't persist anyway
- `get_or_prompt()` now skips sources where `should_persist()` is `false`

**Updated `get_or_prompt()`**:

```rust
for source in &self.sources {
    if !source.should_persist() {
        continue;
    }
    if source.set(T::KEY, &value).await.is_ok() {
        break;
    }
}
```

**Tests**:

- `test_prompt_source_always_returns_none`
- `test_prompt_source_set_is_noop`
- `test_prompt_source_does_not_persist`
- `test_full_chain_with_prompt_source_falls_through_to_prompt`
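The bug and its fix can be modeled in a runnable, synchronous toy (`Src` is illustrative; the real `ConfigSource` trait is async):

```rust
use std::cell::Cell;

// Synchronous toy model of the persistence bug and fix above.
struct Src {
    should_persist: bool,
    persisted: Cell<bool>,
}

impl Src {
    fn set(&self) -> Result<(), ()> {
        self.persisted.set(true);
        Ok(())
    }
}

// Old loop: stops at the first source whose set() succeeds.
fn persist_buggy(sources: &[Src]) {
    for s in sources {
        if s.set().is_ok() {
            break;
        }
    }
}

// Fixed loop: sources that should not persist are skipped first.
fn persist_fixed(sources: &[Src]) {
    for s in sources {
        if !s.should_persist {
            continue;
        }
        if s.set().is_ok() {
            break;
        }
    }
}

fn chain() -> [Src; 2] {
    [
        Src { should_persist: false, persisted: Cell::new(false) }, // EnvSource-like
        Src { should_persist: true, persisted: Cell::new(false) },  // SqliteSource-like
    ]
}

fn main() {
    let buggy = chain();
    persist_buggy(&buggy);
    // Bug: the env-like source "succeeded", so sqlite never saw the value.
    assert!(!buggy[1].persisted.get());

    let fixed = chain();
    persist_fixed(&fixed);
    assert!(fixed[1].persisted.get()); // the value now lands in sqlite
    println!("ok");
}
```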
### 1.3 Integration test: full resolution chain ✅
**Status**: Implemented

**Tests**:

- `test_full_resolution_chain_sqlite_fallback` — env not set, sqlite has value, `get()` returns sqlite
- `test_full_resolution_chain_env_overrides_sqlite` — env set, sqlite has value, `get()` returns env
- `test_branch_switching_scenario_deserialization_error` — old struct shape in sqlite returns a Deserialization error
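The chain behavior these tests exercise is simply "first source with a value wins". A synchronous toy model (`Source` and `resolve` are illustrative names, not `harmony_config`'s API):

```rust
// Toy model of the resolution chain: sources queried in order, first hit wins.
struct Source {
    name: &'static str,
    value: Option<&'static str>,
}

fn resolve(sources: &[Source]) -> Option<(&'static str, &'static str)> {
    sources.iter().find_map(|s| s.value.map(|v| (s.name, v)))
}

fn main() {
    // Env unset, sqlite has a value: sqlite wins.
    let chain = [
        Source { name: "env", value: None },
        Source { name: "sqlite", value: Some("from-sqlite") },
    ];
    assert_eq!(resolve(&chain), Some(("sqlite", "from-sqlite")));

    // Env set: it shadows sqlite because it sits earlier in the chain.
    let chain = [
        Source { name: "env", value: Some("from-env") },
        Source { name: "sqlite", value: Some("from-sqlite") },
    ];
    assert_eq!(resolve(&chain), Some(("env", "from-env")));
    println!("ok");
}
```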
### 1.4 Validate Zitadel + OpenBao integration path ⏳
**Status**: Planning phase - detailed execution plan below

**Background**: ADR 020-1 documents the target architecture for Zitadel OIDC + OpenBao integration. This task validates the full chain by deploying Zitadel and OpenBao on a local k3d cluster and demonstrating an end-to-end example.

**Architecture Overview**:

```
┌─────────────────────────────────────────────────────────────────────┐
│ Harmony CLI / App                                                   │
│                                                                     │
│ ConfigManager:                                                      │
│   1. EnvSource    ← HARMONY_CONFIG_* env vars (highest priority)    │
│   2. SqliteSource ← ~/.local/share/harmony/config/config.db         │
│   3. StoreSource  ← OpenBao (team-scale, via Zitadel OIDC)          │
│                                                                     │
│ When StoreSource fails (OpenBao unreachable):                       │
│   → returns Ok(None), chain falls through to SqliteSource           │
└─────────────────────────────────────────────────────────────────────┘

┌──────────────────┐          ┌──────────────────┐
│ Zitadel          │          │ OpenBao          │
│ (IdP + OIDC)     │          │ (Secret Store)   │
│                  │          │                  │
│ Device Auth      │───JWT───▶│ JWT Auth         │
│ Flow (RFC 8628)  │          │ Method           │
└──────────────────┘          └──────────────────┘
```

**Prerequisites**:

- Docker running (for k3d)
- Rust toolchain (edition 2024)
- Network access to download Helm charts
- `kubectl` (installed automatically with k3d, or pre-installed)
**Step-by-Step Execution Plan**:
#### Step 1: Create k3d cluster for local development

When you run `cargo run -p example-zitadel` (or any example using `K8sAnywhereTopology::from_env()`), Harmony automatically provisions a k3d cluster if one does not exist. By default:

- `use_local_k3d = true` (env: `HARMONY_USE_LOCAL_K3D`, default `true`)
- `autoinstall = true` (env: `HARMONY_AUTOINSTALL`, default `true`)
- Cluster name: **`harmony`** (hardcoded in `K3DInstallationScore::default()`)
- k3d binary is downloaded to `~/.local/share/harmony/k3d/`
- Kubeconfig is merged into `~/.kube/config`, context set to `k3d-harmony`

No manual `k3d cluster create` is needed. If you want to create the cluster manually first:

```bash
# Install k3d (requires sudo or install to user path)
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash

# Create the cluster with the same name Harmony expects
k3d cluster create harmony
kubectl cluster-info --context k3d-harmony
```

**Validation**: `kubectl get nodes --context k3d-harmony` shows 1 server node (k3d default)

**Note**: The existing examples use hardcoded external hostnames (e.g., `sso.sto1.nationtech.io`) for ingress. On a local k3d cluster, these hostnames are not routable. For local development you must either:

- Use `kubectl port-forward` to access services directly
- Configure `/etc/hosts` entries pointing to `127.0.0.1`
- Use a k3d loadbalancer with `--port` mappings
#### Step 2: Deploy Zitadel
Zitadel requires the topology to implement `Topology + K8sclient + HelmCommand + PostgreSQL`. The `K8sAnywhereTopology` satisfies all four.

```bash
cargo run -p example-zitadel
```

**What happens internally** (see `harmony/src/modules/zitadel/mod.rs`):

1. Creates the `zitadel` namespace via `K8sResourceScore`
2. Deploys a CNPG PostgreSQL cluster:
   - Name: `zitadel-pg`
   - Instances: **2** (not 1)
   - Storage: 10Gi
   - Namespace: `zitadel`
3. Resolves the internal DB endpoint (`host:port`) from the CNPG cluster
4. Generates a 32-byte alphanumeric masterkey, stores it as Kubernetes Secret `zitadel-masterkey` (idempotent: skips if it already exists)
5. Generates a 16-char admin password (guaranteed 1+ uppercase, lowercase, digit, symbol)
6. Deploys the Zitadel Helm chart (`zitadel/zitadel` from `https://charts.zitadel.com`):
   - `chart_version: None` -- **uses latest chart version** (not pinned)
   - No `--wait` flag -- returns before pods are ready
   - Ingress annotations are **OpenShift-oriented** (`route.openshift.io/termination: edge`, `cert-manager.io/cluster-issuer: letsencrypt-prod`). On k3d these annotations are silently ignored.
   - Ingress includes TLS config with `secretName: "{host}-tls"`, which requires cert-manager. Without cert-manager, TLS termination does not happen at the ingress level.

**Key Helm values set by ZitadelScore**:

- `zitadel.configmapConfig.ExternalDomain`: the `host` field (e.g., `sso.sto1.nationtech.io`)
- `zitadel.configmapConfig.ExternalSecure: true`
- `zitadel.configmapConfig.TLS.Enabled: false` (TLS at ingress, not in Zitadel)
- Admin user: `UserName: "admin"`, Email: **`admin@zitadel.example.com`** (hardcoded, not derived from host)
- Database credentials: injected via `env[].valueFrom.secretKeyRef` from secret `zitadel-pg-superuser` (both user and admin use the same superuser -- there is a TODO to fix this)

**Expected output**:

```
===== ZITADEL DEPLOYMENT COMPLETE =====
Login URL: https://sso.sto1.nationtech.io
Username: admin@zitadel.sso.sto1.nationtech.io
Password: <generated 16-char password>
```

**Note on the success message**: The printed username `admin@zitadel.{host}` does not match the actual configured email `admin@zitadel.example.com`. The actual login username in Zitadel is `admin` (the `UserName` field). This discrepancy exists in the current code.

**Validation on k3d**:

```bash
# Wait for pods to be ready (Helm returns before readiness)
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=zitadel -n zitadel --timeout=300s

# Port-forward to access Zitadel (ingress won't work without proper DNS/TLS on k3d)
kubectl port-forward svc/zitadel -n zitadel 8080:8080

# Access at http://localhost:8080 (note: ExternalSecure=true may cause redirect issues)
```

**Known issues for k3d deployment**:

- `ExternalSecure: true` tells Zitadel to expect HTTPS, but k3d port-forward is HTTP. This may cause redirect loops. To work around it, modify the example to set `ExternalSecure: false` for local dev.
- The CNPG operator must be installed on the cluster. `K8sAnywhereTopology` handles this via the `PostgreSQL` trait implementation, which deploys the operator first.
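The admin-password rule from step 5 above (16 chars, at least one uppercase, lowercase, digit, and symbol) can be sketched in stdlib-only Rust. This is not Harmony's actual implementation, and the toy LCG is for demonstration only; real code must use a CSPRNG:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Illustrative sketch only: guarantee one char per class, fill the rest.
// The LCG below is NOT cryptographically secure.
fn generate_password(len: usize) -> String {
    const CLASSES: [&[u8]; 4] = [
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZ",
        b"abcdefghijklmnopqrstuvwxyz",
        b"0123456789",
        b"!@#$%^&*",
    ];

    let mut state = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_nanos() as u64
        | 1;
    let mut next = move || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (state >> 33) as usize
    };

    // One char from each class guarantees all four classes appear.
    let mut chars: Vec<u8> = CLASSES.iter().map(|set| set[next() % set.len()]).collect();
    while chars.len() < len {
        let set = CLASSES[next() % CLASSES.len()];
        chars.push(set[next() % set.len()]);
    }
    // A real implementation would also shuffle so class positions vary.
    String::from_utf8(chars).unwrap()
}

fn main() {
    let pw = generate_password(16);
    assert_eq!(pw.len(), 16);
    assert!(pw.bytes().any(|b| b.is_ascii_uppercase()));
    assert!(pw.bytes().any(|b| b.is_ascii_lowercase()));
    assert!(pw.bytes().any(|b| b.is_ascii_digit()));
    assert!(pw.bytes().any(|b| !b.is_ascii_alphanumeric()));
    println!("ok");
}
```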
#### Step 3: Deploy OpenBao
|
||||
|
||||
OpenBao requires only `Topology + K8sclient + HelmCommand` (no PostgreSQL dependency).
|
||||
|
||||
```bash
|
||||
cargo run -p example-openbao
|
||||
```
|
||||
|
||||
**What happens internally** (see `harmony/src/modules/openbao/mod.rs`):
|
||||
|
||||
1. `OpenbaoScore` directly delegates to `HelmChartScore.create_interpret()` -- there is no custom `execute()` logic, no namespace creation step, no secret generation
|
||||
2. Deploys OpenBao Helm chart (`openbao/openbao` from `https://openbao.github.io/openbao-helm`):
|
||||
- `chart_version: None` -- **uses latest chart version** (not pinned)
|
||||
- `create_namespace: true` -- the `openbao` namespace is created by Helm
|
||||
- `install_only: false` -- uses `helm upgrade --install`
|
||||
|
||||
**Exact Helm values set by OpenbaoScore**:
|
||||
```yaml
|
||||
global:
|
||||
openshift: true # <-- PROBLEM: hardcoded, see below
|
||||
server:
|
||||
standalone:
|
||||
enabled: true
|
||||
config: |
|
||||
ui = true
|
||||
listener "tcp" {
|
||||
tls_disable = true
|
||||
address = "[::]:8200"
|
||||
cluster_address = "[::]:8201"
|
||||
}
|
||||
storage "file" {
|
||||
path = "/openbao/data"
|
||||
}
|
||||
service:
|
||||
enabled: true
|
||||
ingress:
|
||||
enabled: true
|
||||
hosts:
|
||||
- host: <host field> # e.g., openbao.sebastien.sto1.nationtech.io
|
||||
dataStorage:
|
||||
enabled: true
|
||||
size: 10Gi
|
||||
storageClass: null # uses cluster default
|
||||
accessMode: ReadWriteOnce
|
||||
auditStorage:
|
||||
enabled: true
|
||||
size: 10Gi
|
||||
storageClass: null
|
||||
accessMode: ReadWriteOnce
|
||||
ui:
|
||||
enabled: true
|
||||
```
|
||||
|
||||
**Critical issue: `global.openshift: true` is hardcoded.** The OpenBao Helm chart default is `global.openshift: false`. When set to `true`, the chart adjusts security contexts and may create OpenShift Routes instead of standard Kubernetes Ingress resources. **On k3d (vanilla k8s), this will produce resources that may not work correctly.** Before deploying on k3d, this must be overridden.

**Fix required for k3d**: Either:

1. Modify `OpenbaoScore` to accept an `openshift: bool` field (preferred long-term fix)
2. Or, for this example, create a custom example that passes `values_overrides` with `global.openshift=false`
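A minimal override sketch for the second option (a hypothetical values file; `ingressClassName: nginx` is an assumption about the k3d cluster's ingress controller -- adjust to match the actual setup):

```yaml
# values-k3d.yaml -- hypothetical override for vanilla Kubernetes / k3d
global:
  openshift: false   # revert the hardcoded OpenShift mode
server:
  ingress:
    enabled: true
    ingressClassName: nginx   # assumption: nginx ingress controller on k3d
```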
**Post-deployment initialization** (manual -- the TODO in `mod.rs` acknowledges this is not automated):

OpenBao starts in a sealed state. You must initialize and unseal it manually. See https://openbao.org/docs/platform/k8s/helm/run/

```bash
# Initialize OpenBao (generates unseal keys + root token)
kubectl exec -n openbao openbao-0 -- bao operator init

# Save the output! It contains 5 unseal keys and the root token.
# Example output:
# Unseal Key 1: abc123...
# Unseal Key 2: def456...
# ...
# Initial Root Token: hvs.xxxxx

# Unseal (requires 3 of 5 keys by default)
kubectl exec -n openbao openbao-0 -- bao operator unseal <key1>
kubectl exec -n openbao openbao-0 -- bao operator unseal <key2>
kubectl exec -n openbao openbao-0 -- bao operator unseal <key3>
```

**Validation**:

```bash
kubectl exec -n openbao openbao-0 -- bao status
# Should show "Sealed: false"
```

**Note**: The ingress has **no TLS configuration** (unlike Zitadel's ingress). Access is HTTP-only unless you configure TLS separately.

#### Step 4: Configure OpenBao for Harmony
Two paths are available depending on the authentication method:

##### Path A: Userpass auth (simpler, for local dev)

The current `OpenbaoSecretStore` supports **token** and **userpass** authentication. It does NOT yet implement the JWT/OIDC device flow described in ADR 020-1.

```bash
# Port-forward to access the OpenBao API
kubectl port-forward svc/openbao -n openbao 8200:8200 &

export BAO_ADDR="http://127.0.0.1:8200"
export BAO_TOKEN="<root token from init>"

# Enable the KV v2 secrets engine (default mount "secret")
bao secrets enable -path=secret kv-v2

# Enable the userpass auth method
bao auth enable userpass

# Create a policy granting read/write on harmony/* paths
cat <<'EOF' | bao policy write harmony-dev -
path "secret/data/harmony/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}
path "secret/metadata/harmony/*" {
  capabilities = ["list", "read", "delete"]
}
EOF

# Create the user for Harmony with the policy attached
# (note: users are created at auth/userpass/users/<name>;
# auth/userpass/login/<name> is the login endpoint, not a creation path)
bao write auth/userpass/users/harmony \
  password="harmony-dev-password" \
  policies="harmony-dev"
```
|
||||
**Bug in `OpenbaoSecretStore::authenticate_userpass()`**: The `kv_mount` parameter (default `"secret"`) is passed to `vaultrs::auth::userpass::login()` as the auth mount path. This means it calls `POST /v1/auth/secret/login/{username}` instead of the correct `POST /v1/auth/userpass/login/{username}`. **The auth mount and KV mount are conflated into one parameter.**
|
||||
|
||||
**Workaround**: Set `OPENBAO_KV_MOUNT=userpass` so the auth call hits the correct mount path. But then KV operations would use mount `userpass` instead of `secret`, which is wrong.
|
||||
|
||||
**Proper fix needed**: Split `kv_mount` into two separate parameters: one for the KV v2 engine mount (`secret`) and one for the auth mount (`userpass`). This is a bug in `harmony_secret/src/store/openbao.rs:234`.
|
||||
|
||||
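The shape of that fix can be sketched with plain types (all names here are hypothetical stand-ins; the real change lives in the `vaultrs` call sites inside `OpenbaoSecretStore`):

```rust
/// Hypothetical sketch: keep the KV v2 engine mount and the auth
/// method mount as two independent parameters instead of one.
struct OpenbaoMounts {
    kv_mount: String,   // e.g. "secret" -- where KV v2 data lives
    auth_mount: String, // e.g. "userpass" -- where the auth method is mounted
}

impl OpenbaoMounts {
    /// Path for userpass login: POST /v1/auth/{auth_mount}/login/{username}
    fn userpass_login_path(&self, username: &str) -> String {
        format!("auth/{}/login/{}", self.auth_mount, username)
    }

    /// Path for reading a KV v2 secret: GET /v1/{kv_mount}/data/{key}
    fn kv_data_path(&self, key: &str) -> String {
        format!("{}/data/{}", self.kv_mount, key)
    }
}

fn main() {
    let mounts = OpenbaoMounts {
        kv_mount: "secret".to_string(),
        auth_mount: "userpass".to_string(),
    };
    // With the split, login no longer hits auth/secret/... by mistake.
    println!("{}", mounts.userpass_login_path("harmony")); // auth/userpass/login/harmony
    println!("{}", mounts.kv_data_path("harmony/AppConfig")); // secret/data/harmony/AppConfig
}
```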
**For this example**: Use **token auth** instead of userpass to sidestep the bug:

```bash
# Set env vars for the example
export OPENBAO_URL="http://127.0.0.1:8200"
export OPENBAO_TOKEN="<root token from init>"
export OPENBAO_KV_MOUNT="secret"
```

##### Path B: JWT auth with Zitadel (target architecture, per ADR 020-1)

This is the production path described in the ADR. It requires the device-flow code that is **not yet implemented** in `OpenbaoSecretStore`. The current code only supports token and userpass.

When implemented, the flow will be:

1. Enable the JWT auth method in OpenBao
2. Configure it to trust Zitadel's OIDC discovery URL
3. Create a role that maps Zitadel JWT claims to OpenBao policies

```bash
# Enable JWT auth
bao auth enable jwt

# Configure JWT auth to trust Zitadel
bao write auth/jwt/config \
  oidc_discovery_url="https://<zitadel-host>" \
  bound_issuer="https://<zitadel-host>"

# Create a role for Harmony developers
bao write auth/jwt/role/harmony-developer \
  role_type="jwt" \
  bound_audiences="<harmony_client_id>" \
  user_claim="email" \
  groups_claim="urn:zitadel:iam:org:project:roles" \
  policies="harmony-dev" \
  ttl="4h" \
  max_ttl="24h" \
  token_type="service"
```

**Zitadel application setup** (in the Zitadel console):

1. Create project: `Harmony`
2. Add application: `Harmony CLI` (Native app type)
3. Enable the Device Authorization grant type
4. Set scopes: `openid email profile offline_access`
5. Note the `client_id`

This path is deferred until the device flow is implemented in `OpenbaoSecretStore`.
#### Step 5: Write end-to-end example

The example uses `StoreSource<OpenbaoSecretStore>` with token auth to avoid the userpass mount bug.

**Environment variables required** (from `harmony_secret/src/config.rs`):

| Variable | Required | Default | Notes |
|---|---|---|---|
| `OPENBAO_URL` | Yes | None | Falls back to `VAULT_ADDR` |
| `OPENBAO_TOKEN` | For token auth | None | Root or user token |
| `OPENBAO_USERNAME` | For userpass | None | Requires `OPENBAO_PASSWORD` too |
| `OPENBAO_PASSWORD` | For userpass | None | |
| `OPENBAO_KV_MOUNT` | No | `"secret"` | KV v2 engine mount path. **Also used as the userpass auth mount -- this is a bug.** |
| `OPENBAO_SKIP_TLS` | No | `false` | Set `"true"` to disable TLS verification |

**Note**: `OpenbaoSecretStore::new()` is `async` and **requires a running OpenBao** at construction time (it validates the token if using cached auth). If OpenBao is unreachable during construction, the call will fail. The graceful fallback only applies to `StoreSource::get()` calls after construction -- the `ConfigManager` must be built with a live store, or the store must be wrapped in a lazy-initialization pattern.
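One way to express that lazy-initialization pattern, using only the standard library (all type names here are illustrative stand-ins, not the real `harmony_secret` API):

```rust
use std::sync::OnceLock;

/// Stand-in for the real store; construction may fail if OpenBao is down.
struct Store {
    url: String,
}

fn connect(url: &str) -> Result<Store, String> {
    if url.is_empty() {
        Err("OpenBao unreachable".to_string())
    } else {
        Ok(Store { url: url.to_string() })
    }
}

/// Lazy wrapper: construction is deferred until the first get() call,
/// so building the ConfigManager never requires a live OpenBao.
struct LazyStore {
    url: String,
    inner: OnceLock<Result<Store, String>>,
}

impl LazyStore {
    fn new(url: &str) -> Self {
        Self { url: url.to_string(), inner: OnceLock::new() }
    }

    /// Returns None (fall through the chain) when the store cannot connect.
    fn get(&self, key: &str) -> Option<String> {
        match self.inner.get_or_init(|| connect(&self.url)) {
            Ok(store) => Some(format!("{}/{}", store.url, key)),
            Err(_) => None,
        }
    }
}

fn main() {
    let up = LazyStore::new("http://127.0.0.1:8200");
    let down = LazyStore::new("");
    assert!(up.get("AppConfig").is_some());
    assert!(down.get("AppConfig").is_none()); // no panic at construction time
    println!("lazy store ok");
}
```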
```rust
// harmony_config/examples/openbao_chain.rs
use harmony_config::{ConfigManager, EnvSource, SqliteSource, StoreSource};
use harmony_secret::OpenbaoSecretStore;
use serde::{Deserialize, Serialize};
use std::sync::Arc;

#[derive(Debug, Clone, Serialize, Deserialize, schemars::JsonSchema, PartialEq)]
struct AppConfig {
    host: String,
    port: u16,
}

impl harmony_config::Config for AppConfig {
    const KEY: &'static str = "AppConfig";
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    env_logger::init();

    // Build the source chain
    let env_source: Arc<dyn harmony_config::ConfigSource> = Arc::new(EnvSource);

    let sqlite = Arc::new(
        SqliteSource::default()
            .await
            .expect("Failed to open SQLite"),
    );

    // OpenBao store -- requires OPENBAO_URL and OPENBAO_TOKEN env vars.
    // Falls back gracefully if OpenBao is unreachable at query time.
    let openbao_url = std::env::var("OPENBAO_URL")
        .or_else(|_| std::env::var("VAULT_ADDR"))
        .ok();

    let sources: Vec<Arc<dyn harmony_config::ConfigSource>> = if let Some(url) = openbao_url {
        let kv_mount = std::env::var("OPENBAO_KV_MOUNT")
            .unwrap_or_else(|_| "secret".to_string());
        let skip_tls = std::env::var("OPENBAO_SKIP_TLS")
            .map(|v| v == "true")
            .unwrap_or(false);

        match OpenbaoSecretStore::new(
            url,
            kv_mount,
            skip_tls,
            std::env::var("OPENBAO_TOKEN").ok(),
            std::env::var("OPENBAO_USERNAME").ok(),
            std::env::var("OPENBAO_PASSWORD").ok(),
        )
        .await
        {
            Ok(store) => {
                let store_source = Arc::new(StoreSource::new("harmony".to_string(), store));
                vec![env_source, Arc::clone(&sqlite) as _, store_source]
            }
            Err(e) => {
                eprintln!("Warning: OpenBao unavailable ({e}), using local sources only");
                vec![env_source, sqlite as _]
            }
        }
    } else {
        println!("No OPENBAO_URL set, using local sources only");
        vec![env_source, sqlite as _]
    };

    let manager = ConfigManager::new(sources);

    // Scenario 1: get() with nothing stored -- returns NotFound
    let result = manager.get::<AppConfig>().await;
    println!("Get (empty): {:?}", result);

    // Scenario 2: set() then get()
    let config = AppConfig {
        host: "production.example.com".to_string(),
        port: 443,
    };
    manager.set(&config).await?;
    println!("Set: {:?}", config);

    let retrieved = manager.get::<AppConfig>().await?;
    println!("Get (after set): {:?}", retrieved);
    assert_eq!(config, retrieved);

    println!("End-to-end chain validated!");
    Ok(())
}
```
**Key behaviors demonstrated**:

1. **Graceful construction fallback**: If `OPENBAO_URL` is not set or OpenBao is unreachable at startup, the chain is built without it
2. **Graceful query fallback**: `StoreSource::get()` returns `Ok(None)` on any error, so the chain continues to SQLite
3. **Environment override**: `HARMONY_CONFIG_AppConfig='{"host":"env-host","port":9090}'` bypasses all backends
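The environment-override lookup in behavior 3 can be modeled with the standard library alone (the `HARMONY_CONFIG_{KEY}` naming scheme comes from the source above; the helper functions are hypothetical):

```rust
use std::env;

/// Hypothetical helper: derive the override env var name for a config key.
fn env_var_name(key: &str) -> String {
    format!("HARMONY_CONFIG_{}", key)
}

/// Returns the raw JSON payload if the override is set, else None.
/// (The real EnvSource would deserialize this into the config struct.)
fn env_override(key: &str) -> Option<String> {
    env::var(env_var_name(key)).ok()
}

fn main() {
    assert_eq!(env_var_name("AppConfig"), "HARMONY_CONFIG_AppConfig");
    // With no override set, the env source reports a miss and the
    // chain falls through to SQLite / OpenBao.
    assert!(env_override("DefinitelyUnsetKey123").is_none());
    println!("env override helper ok");
}
```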
#### Step 6: Validate graceful fallback

Already validated via unit tests (26 tests pass):

- `test_store_source_error_falls_through_to_sqlite` -- `StoreSource` with `AlwaysErrorStore` returns a connection error, and the chain falls through to `SqliteSource`
- `test_store_source_not_found_falls_through_to_sqlite` -- `StoreSource` returns `NotFound`, and the chain falls through to `SqliteSource`

**Code path (FIXED in `harmony_config/src/source/store.rs`)**:

```rust
// StoreSource::get() -- returns Ok(None) on ANY error, allowing the chain to continue
match self.store.get_raw(&self.namespace, key).await {
    Ok(bytes) => { /* deserialize and return */ Ok(Some(value)) }
    Err(SecretStoreError::NotFound { .. }) => Ok(None),
    Err(_) => Ok(None), // Connection errors, timeouts, etc.
}
```
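A self-contained model of that fall-through behavior (all names here are simplified stand-ins for the real `ConfigSource` trait and sources):

```rust
/// Stand-in for ConfigSource: Ok(Some) = hit, Ok(None) = miss or unavailable.
trait Source {
    fn get(&self, key: &str) -> Result<Option<String>, String>;
}

/// Models StoreSource when OpenBao is unreachable: swallows the error.
struct DownStore;
impl Source for DownStore {
    fn get(&self, _key: &str) -> Result<Option<String>, String> {
        // Connection refused, timeout, etc. -> Ok(None), never Err
        Ok(None)
    }
}

/// Models SqliteSource with one stored value.
struct Sqlite;
impl Source for Sqlite {
    fn get(&self, key: &str) -> Result<Option<String>, String> {
        if key == "AppConfig" { Ok(Some("from-sqlite".to_string())) } else { Ok(None) }
    }
}

/// The first source that returns Some wins; Ok(None) falls through.
fn resolve(sources: &[&dyn Source], key: &str) -> Result<Option<String>, String> {
    for source in sources {
        if let Some(value) = source.get(key)? {
            return Ok(Some(value));
        }
    }
    Ok(None)
}

fn main() {
    let chain: Vec<&dyn Source> = vec![&DownStore, &Sqlite];
    assert_eq!(resolve(&chain, "AppConfig").unwrap(), Some("from-sqlite".to_string()));
    assert_eq!(resolve(&chain, "Missing").unwrap(), None);
    println!("fallback chain ok");
}
```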
#### Step 7: Known issues and blockers

| Issue | Location | Severity | Status |
|---|---|---|---|
| `global.openshift: true` hardcoded | `harmony/src/modules/openbao/mod.rs:32` | **Blocker for k3d** | ✅ Fixed: Added `openshift: bool` field to `OpenbaoScore` (defaults to `false`) |
| `kv_mount` used as auth mount path | `harmony_secret/src/store/openbao.rs:234` | **Bug** | ✅ Fixed: Added separate `auth_mount` parameter; added `OPENBAO_AUTH_MOUNT` env var |
| Admin email hardcoded `admin@zitadel.example.com` | `harmony/src/modules/zitadel/mod.rs:314` | Minor | Cosmetic mismatch with success message |
| `ExternalSecure: true` hardcoded | `harmony/src/modules/zitadel/mod.rs:306` | **Issue for k3d** | ✅ Fixed: Zitadel now detects the Kubernetes distribution and uses appropriate settings (OpenShift = TLS + cert-manager annotations, k3d = plain nginx ingress without TLS) |
| No Helm chart version pinning | Both modules | Risk | Non-deterministic deploys |
| No `--wait` on Helm install | `harmony/src/modules/helm/chart.rs` | UX | Must manually wait for readiness |
| `get_version()`/`get_status()` are `todo!()` | Both modules | Panic risk | Do not call these methods |
| JWT/OIDC device flow not implemented | `harmony_secret/src/store/openbao.rs` | **Gap** | ✅ Implemented: `ZitadelOidcAuth` in `harmony_secret/src/store/zitadel.rs` |
| `HARMONY_SECRET_NAMESPACE` panics if not set | `harmony_secret/src/config.rs:5` | Runtime panic | Only affects `SecretManager`, not `StoreSource` directly |

**Remaining work**:

- [x] `StoreSource<OpenbaoSecretStore>` integration validates compilation
- [x] `StoreSource` returns `Ok(None)` on connection error (not `Err`)
- [x] Graceful fallback tests pass when OpenBao is unreachable (2 new tests)
- [x] Fix `global.openshift: true` in `OpenbaoScore` for k3d compatibility
- [x] Fix the `kv_mount` / auth mount conflation bug in `OpenbaoSecretStore`
- [x] Create and test `harmony_config/examples/openbao_chain.rs` against a real k3d deployment
- [x] Implement the JWT/OIDC device flow in `OpenbaoSecretStore` (ADR 020-1) — `ZitadelOidcAuth` implemented and wired into the `OpenbaoSecretStore::new()` auth chain
- [x] Fix Zitadel distribution detection — Zitadel now uses `k8s_client.get_k8s_distribution()` to detect OpenShift vs k3d and applies appropriate Helm values (TLS + cert-manager for OpenShift, plain nginx for k3d)

### 1.5 UX validation checklist ⏳

**Status**: Partially complete - manual verification needed

- [ ] `cargo run --example postgresql` with no env vars → prompts for nothing
- [ ] An example that uses `SecretManager` today (e.g., `brocade_snmp_server`) → when migrated to `harmony_config`, the first run prompts and the second run reads from SQLite
- [ ] Setting `HARMONY_CONFIG_BrocadeSwitchAuth='{"host":"...","user":"...","password":"..."}'` → skips the prompt, uses the env value
- [ ] Deleting the `~/.local/share/harmony/config/` directory → re-prompts on the next run

## Deliverables

- [x] `SqliteSource` implementation with tests
- [x] Functional `PromptSource` with `should_persist()` design
- [x] Fix `get_or_prompt` to persist to the first writable source (via `should_persist()`), not all sources
- [x] Integration tests for the full resolution chain
- [x] Branch-switching deserialization failure test
- [x] `StoreSource<OpenbaoSecretStore>` integration validated (compiles, graceful fallback)
- [x] ADR for the Zitadel OIDC target architecture
- [ ] Update docs to reflect final implementation and behavior

## Key Implementation Notes

1. **SQLite path**: `~/.local/share/harmony/config/config.db` (not `~/.local/share/harmony/config.db`)

2. **Auto-create directory**: `SqliteSource::open()` creates parent directories if they don't exist

3. **Default path**: `SqliteSource::default()` uses `directories::ProjectDirs` to find the correct data directory

4. **Env var precedence**: Environment variables always take precedence over SQLite in the resolution chain

5. **Testing**: All tests use `tempfile::NamedTempFile` for temporary database paths, ensuring test isolation

6. **Graceful fallback**: `StoreSource::get()` returns `Ok(None)` on any error (connection refused, timeout, etc.), allowing the chain to fall through to the next source. This ensures OpenBao unavailability doesn't break the config chain.

7. **StoreSource errors don't block the chain**: When OpenBao is unreachable, `StoreSource::get()` returns `Ok(None)` and the `ConfigManager` continues to the next source (typically `SqliteSource`). This is validated by `test_store_source_error_falls_through_to_sqlite` and `test_store_source_not_found_falls_through_to_sqlite`.
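The path layout from notes 1 and 3 can be sketched as a pure helper. This is a simplified version assuming a Unix-style `$HOME`; the real `SqliteSource::default()` uses `directories::ProjectDirs` as stated above:

```rust
use std::path::PathBuf;

/// Simplified sketch of the default DB path. The real SqliteSource uses
/// directories::ProjectDirs; this stand-in assumes $HOME is set.
fn default_db_path() -> PathBuf {
    let home = std::env::var("HOME").unwrap_or_else(|_| ".".to_string());
    PathBuf::from(home)
        .join(".local/share/harmony/config") // note the extra `config/` directory
        .join("config.db")
}

fn main() {
    let path = default_db_path();
    // Matches note 1: .../harmony/config/config.db, not .../harmony/config.db
    assert!(path.ends_with("harmony/config/config.db"));
    println!("{}", path.display());
}
```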
@@ -1,112 +0,0 @@
# Phase 2: Migrate Workspace to `harmony_config`

## Goal

Replace every direct `harmony_secret::SecretManager` call with `harmony_config` equivalents. After this phase, modules and examples depend only on `harmony_config`; `harmony_secret` becomes an internal implementation detail behind `StoreSource`.

## Current State

19 call sites use `SecretManager::get_or_prompt::<T>()` across:

| Location | Secret Types | Call Sites |
|----------|-------------|------------|
| `harmony/src/modules/brocade/brocade_snmp.rs` | `BrocadeSnmpAuth`, `BrocadeSwitchAuth` | 2 |
| `harmony/src/modules/nats/score_nats_k8s.rs` | `NatsAdmin` | 1 |
| `harmony/src/modules/okd/bootstrap_02_bootstrap.rs` | `RedhatSecret`, `SshKeyPair` | 2 |
| `harmony/src/modules/application/features/monitoring.rs` | `NtfyAuth` | 1 |
| `brocade/examples/main.rs` | `BrocadeSwitchAuth` | 1 |
| `examples/okd_installation/src/main.rs` + `topology.rs` | `SshKeyPair`, `BrocadeSwitchAuth`, `OPNSenseFirewallConfig` | 3 |
| `examples/okd_pxe/src/main.rs` + `topology.rs` | `SshKeyPair`, `BrocadeSwitchAuth`, `OPNSenseFirewallCredentials` | 3 |
| `examples/opnsense/src/main.rs` | `OPNSenseFirewallCredentials` | 1 |
| `examples/sttest/src/main.rs` + `topology.rs` | `SshKeyPair`, `OPNSenseFirewallConfig` | 2 |
| `examples/opnsense_node_exporter/` | (has dep but unclear usage) | ~1 |
| `examples/okd_cluster_alerts/` | (has dep but unclear usage) | ~1 |
| `examples/brocade_snmp_server/` | (has dep but unclear usage) | ~1 |
## Tasks

### 2.1 Bootstrap `harmony_config` in CLI and TUI entry points

Add `harmony_config::init()` as the first thing that happens in `harmony_cli::run()` and `harmony_tui::run()`.

```rust
// harmony_cli/src/lib.rs — inside run()
pub async fn run<T: Topology + Send + Sync + 'static>(
    inventory: Inventory,
    topology: T,
    scores: Vec<Box<dyn Score<T>>>,
    args_struct: Option<Args>,
) -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the config system with the default source chain
    let sqlite = Arc::new(SqliteSource::default().await?);
    let env = Arc::new(EnvSource);
    harmony_config::init(vec![env, sqlite]).await;

    // ... rest of run()
}
```

This replaces the implicit `SecretManager` lazy initialization that currently happens on the first `get_or_prompt` call.
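A minimal sketch of what a process-wide `init()` could look like with the standard library. The `init()` call shape comes from the snippet above; the global-storage mechanism and the first-call-wins policy are assumptions, not the real crate's behavior:

```rust
use std::sync::OnceLock;

/// Stand-in for the source chain held by the real ConfigManager.
type SourceChain = Vec<String>;

static MANAGER: OnceLock<SourceChain> = OnceLock::new();

/// Idempotent global initialization: the first call wins, later calls
/// are ignored (one possible policy; the real crate may instead panic).
fn init(sources: SourceChain) {
    let _ = MANAGER.set(sources);
}

fn manager() -> &'static SourceChain {
    MANAGER.get().expect("harmony_config::init() must run before any get")
}

fn main() {
    init(vec!["env".to_string(), "sqlite".to_string()]);
    init(vec!["ignored".to_string()]); // second call is a no-op
    assert_eq!(manager().len(), 2);
    assert_eq!(manager()[0], "env");
    println!("init ok");
}
```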
### 2.2 Migrate each secret type from `Secret` to `Config`

For each secret struct, change:

```rust
// Before
use harmony_secret::Secret;
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, InteractiveParse, Secret)]
struct BrocadeSwitchAuth { ... }

// After
use harmony_config::Config;
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, InteractiveParse, Config)]
struct BrocadeSwitchAuth { ... }
```

At each call site, change:

```rust
// Before
let config = SecretManager::get_or_prompt::<BrocadeSwitchAuth>().await.unwrap();

// After
let config = harmony_config::get_or_prompt::<BrocadeSwitchAuth>().await.unwrap();
```
### 2.3 Migration order (low risk to high risk)

1. **`brocade/examples/main.rs`** — 1 call site, isolated example, easy to test manually
2. **`examples/opnsense/src/main.rs`** — 1 call site, isolated
3. **`harmony/src/modules/brocade/brocade_snmp.rs`** — 2 call sites, core module but straightforward
4. **`harmony/src/modules/nats/score_nats_k8s.rs`** — 1 call site
5. **`harmony/src/modules/application/features/monitoring.rs`** — 1 call site
6. **`examples/sttest/`** — 2 call sites, has both main.rs and topology.rs patterns
7. **`examples/okd_installation/`** — 3 call sites, complex topology setup
8. **`examples/okd_pxe/`** — 3 call sites, similar to okd_installation
9. **`harmony/src/modules/okd/bootstrap_02_bootstrap.rs`** — 2 call sites, critical OKD bootstrap path

### 2.4 Remove `harmony_secret` from direct dependencies

After all call sites are migrated:

1. Remove `harmony_secret` from the `Cargo.toml` of `harmony`, `brocade`, and all examples that had it
2. `harmony_config` keeps `harmony_secret` as a dependency (for `StoreSource`)
3. The `Secret` trait and `SecretManager` remain in `harmony_secret` but are no longer used directly

### 2.5 Backward compatibility for existing local secrets

Users who already have secrets stored via `LocalFileSecretStore` (JSON files in `~/.local/share/harmony/secrets/`) need a migration path:

- On the first run after upgrade, if SQLite has no entry for a key but the old JSON file exists, read from JSON and write to SQLite
- Or: add `LocalFileSource` as a read-only fallback source at the end of the chain for one release cycle
- Log a deprecation warning when reading from old JSON files
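The read-through migration described in the first bullet can be sketched with in-memory maps standing in for SQLite and the legacy JSON files (all names here are hypothetical):

```rust
use std::collections::HashMap;

/// Stand-ins: `sqlite` is the new store, `legacy_json` the old JSON files.
fn get_with_migration(
    sqlite: &mut HashMap<String, String>,
    legacy_json: &HashMap<String, String>,
    key: &str,
) -> Option<String> {
    if let Some(value) = sqlite.get(key) {
        return Some(value.clone());
    }
    // Miss in SQLite: fall back to the old JSON store, then copy forward.
    if let Some(value) = legacy_json.get(key) {
        eprintln!("deprecated: read `{key}` from legacy JSON store, migrating to SQLite");
        sqlite.insert(key.to_string(), value.clone());
        return Some(value.clone());
    }
    None
}

fn main() {
    let mut sqlite = HashMap::new();
    let legacy =
        HashMap::from([("BrocadeSwitchAuth".to_string(), "{\"host\":\"sw1\"}".to_string())]);

    // First read migrates the value; subsequent reads hit SQLite directly.
    assert!(get_with_migration(&mut sqlite, &legacy, "BrocadeSwitchAuth").is_some());
    assert!(sqlite.contains_key("BrocadeSwitchAuth"));
    println!("migration ok");
}
```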
## Deliverables

- [ ] `harmony_config::init()` called in `harmony_cli::run()` and `harmony_tui::run()`
- [ ] All 19 call sites migrated from `SecretManager` to `harmony_config`
- [ ] `harmony_secret` removed from the direct dependencies of `harmony`, `brocade`, and all examples
- [ ] Backward compatibility for existing local JSON secrets
- [ ] All existing unit tests still pass
- [ ] Manual verification: one migrated example works end-to-end (prompt → persist → read)
@@ -1,141 +0,0 @@
# Phase 3: Complete `harmony_assets`, Refactor Consumers

## Goal

Make `harmony_assets` the single way to manage downloadable binaries and images across Harmony. Eliminate the `k3d::DownloadableAsset` duplication, implement `Url::Url` in the OPNsense infra layer, and remove LFS-tracked files from git.

## Current State

- `harmony_assets` exists with `Asset`, `LocalCache`, `LocalStore`, `S3Store` (behind a feature flag), and a CLI with `upload`, `download`, `checksum`, `verify` commands. **No tests. Zero consumers.**
- `k3d/src/downloadable_asset.rs` has the same functionality with full test coverage (httptest mock server, checksum verification, cache hit, 404 handling, checksum failure).
- The `Url::Url` variant in `harmony_types/src/net.rs` exists but is `todo!()` in the OPNsense TFTP and HTTP infra layers.
- OKD modules hardcode `./data/...` paths (`bootstrap_02_bootstrap.rs:84-88`, `ipxe.rs:73`).
- The `data/` directory contains ~3GB of LFS-tracked files (OKD binaries, PXE images, SCOS images).

## Tasks

### 3.1 Port k3d tests to `harmony_assets`

The k3d crate has 5 well-written tests in `downloadable_asset.rs`. Port them to test `harmony_assets::LocalStore`:

```rust
// harmony_assets/tests/local_store.rs (or in src/ as unit tests)

#[tokio::test]
async fn test_fetch_downloads_and_verifies_checksum() {
    // Start an httptest server serving a known file
    // Create an Asset with a URL pointing to the mock server
    // Fetch via LocalStore
    // Assert the file exists at the expected cache path
    // Assert the checksum matches
}

#[tokio::test]
async fn test_fetch_returns_cached_file_when_present() {
    // Pre-populate the cache with the correct file
    // Fetch — assert no HTTP request is made (mock server not hit)
}

#[tokio::test]
async fn test_fetch_fails_on_404() { ... }

#[tokio::test]
async fn test_fetch_fails_on_checksum_mismatch() { ... }

#[tokio::test]
async fn test_fetch_with_progress_callback() {
    // Assert the progress callback is called with (bytes_received, total_size)
}
```

Add `httptest` to the `[dev-dependencies]` of `harmony_assets`.

### 3.2 Refactor `k3d` to use `harmony_assets`

Replace `k3d/src/downloadable_asset.rs` with calls to `harmony_assets`:

```rust
// k3d/src/lib.rs — in download_latest_release()
use harmony_assets::{Asset, LocalCache, LocalStore, ChecksumAlgo};

let asset = Asset::new(
    binary_url,
    checksum,
    ChecksumAlgo::SHA256,
    K3D_BIN_FILE_NAME.to_string(),
);
let cache = LocalCache::new(self.base_dir.clone());
let store = LocalStore::new();
let path = store.fetch(&asset, &cache, None).await
    .map_err(|e| format!("Failed to download k3d: {}", e))?;
```

Delete `k3d/src/downloadable_asset.rs` and update k3d's `Cargo.toml` to depend on `harmony_assets`.

### 3.3 Define asset metadata as config structs

Following `plan.md` Phase 2, create a typed config for OKD assets using `harmony_config`:

```rust
// harmony/src/modules/okd/config.rs
#[derive(Config, Serialize, Deserialize, JsonSchema, InteractiveParse)]
struct OkdInstallerConfig {
    pub openshift_install_url: String,
    pub openshift_install_sha256: String,
    pub scos_kernel_url: String,
    pub scos_kernel_sha256: String,
    pub scos_initramfs_url: String,
    pub scos_initramfs_sha256: String,
    pub scos_rootfs_url: String,
    pub scos_rootfs_sha256: String,
}
```

The first run prompts for URLs/checksums (or uses compiled-in defaults). Values persist to SQLite and can be overridden via env vars or OpenBao.
### 3.4 Implement `Url::Url` in the OPNsense infra layer

In `harmony/src/infra/opnsense/http.rs` and `tftp.rs`, implement the `Url::Url(url)` match arm:

```rust
// Instead of SCP-ing files to OPNsense:
// SSH into OPNsense, run: fetch -o /usr/local/http/{path} {url}
// (FreeBSD-native HTTP client, no extra deps on OPNsense)
```

This eliminates the manual `scp` workaround and the `inquire::Confirm` prompts in `ipxe.rs:126` and `bootstrap_02_bootstrap.rs:230`.
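Composing that remote command can be sketched as a pure helper. The `fetch -o` flag matches FreeBSD's stock fetch(1); the helper name, the `/usr/local/http/` layout, and the example URL are assumptions for illustration:

```rust
/// Hypothetical helper: compose the command run over SSH on OPNsense so
/// the firewall downloads the file itself instead of receiving an SCP.
fn opnsense_fetch_cmd(url: &str, served_path: &str) -> String {
    // fetch -o <output file> <url> -- FreeBSD's built-in HTTP client
    format!("fetch -o /usr/local/http/{} '{}'", served_path, url)
}

fn main() {
    let cmd = opnsense_fetch_cmd(
        "https://assets.example.com/scos/kernel", // hypothetical asset URL
        "pxe/kernel",
    );
    assert_eq!(
        cmd,
        "fetch -o /usr/local/http/pxe/kernel 'https://assets.example.com/scos/kernel'"
    );
    println!("{cmd}");
}
```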
### 3.5 Refactor OKD modules to use assets + config

In `bootstrap_02_bootstrap.rs`:
- `openshift-install`: Resolve `OkdInstallerConfig` from `harmony_config`, download via `harmony_assets`, invoke from the cache.
- SCOS images: Pass `Url::Url(scos_kernel_url)` etc. to `StaticFilesHttpScore`. OPNsense fetches from S3 directly.
- Remove `oc` and `kubectl` from `data/okd/bin/` (never used by code).

In `ipxe.rs`:
- Replace the folder-to-serve SCP workaround with individual `Url::Url` entries.
- Remove the `inquire::Confirm` SCP prompts.

### 3.6 Upload assets to S3

- Upload all current `data/` binaries to the Ceph S3 bucket with the path scheme `harmony-assets/okd/v{version}/openshift-install`, `harmony-assets/pxe/centos-stream-9/install.img`, etc.
- Set a public-read ACL or configure presigned URL generation.
- Record the S3 URLs and SHA256 checksums as defaults in the config structs.

### 3.7 Remove LFS, clean git

- Remove all LFS-tracked files from the repo.
- Update `.gitattributes` to remove the LFS filters.
- Keep `data/` in `.gitignore` (it becomes a local cache directory).
- Optionally use `git filter-repo` or BFG to strip LFS objects from history (required before the Phase 4 GitHub publish).

## Deliverables

- [ ] `harmony_assets` has tests ported from the k3d pattern (5+ tests with httptest)
- [ ] `k3d::DownloadableAsset` replaced by `harmony_assets` usage
- [ ] `OkdInstallerConfig` struct using `harmony_config`
- [ ] `Url::Url` implemented in the OPNsense HTTP and TFTP infra
- [ ] OKD bootstrap refactored to use the lazy-download pattern
- [ ] Assets uploaded to S3 with documented URLs/checksums
- [ ] LFS removed, git history cleaned
- [ ] Repo size small enough for GitHub (~code + templates only)
@@ -1,110 +0,0 @@
# Phase 4: Publish to GitHub

## Goal

Make Harmony publicly available on GitHub as the primary community hub for issues, pull requests, and discussions. CI runs on self-hosted runners.

## Prerequisites

- Phase 3 complete: LFS removed, git history cleaned, repo is small
- README polished with a quick-start, architecture overview, and examples
- All existing tests pass

## Tasks

### 4.1 Clean git history

```bash
# Option A: git filter-repo (preferred)
git filter-repo --strip-blobs-bigger-than 10M

# Option B: BFG Repo Cleaner
bfg --strip-blobs-bigger-than 10M
git reflog expire --expire=now --all
git gc --prune=now --aggressive
```

Verify the final repo size is reasonable (target: <50MB including all code, docs, templates).

### 4.2 Create GitHub repository

- Create `NationTech/harmony` (or the chosen org/name) on GitHub
- Push the cleaned repo as the initial commit
- Set the default branch to `main` (rename from `master` if desired)

### 4.3 Set up CI on self-hosted runners

GitHub is the community hub, but CI runs on your own infrastructure. Options:

**Option A: GitHub Actions with self-hosted runners**
- Register your Gitea runner machines as GitHub Actions self-hosted runners
- Port `.gitea/workflows/check.yml` to `.github/workflows/check.yml`
- Same Docker image (`hub.nationtech.io/harmony/harmony_composer:latest`), same commands
- Pro: native GitHub PR checks, no external service needed
- Con: runners need outbound access to the GitHub API

**Option B: External CI (Woodpecker, Drone, Jenkins)**
- Use any CI that supports webhooks from GitHub
- Report status back to GitHub via the commit status API / checks API
- Pro: fully self-hosted, no GitHub dependency for builds
- Con: extra integration work

**Option C: Keep Gitea CI, mirror from GitHub**
- The GitHub repo has a webhook that triggers Gitea CI on push
- Gitea reports back to GitHub via the commit status API
- Pro: no migration of CI config
- Con: fragile webhook chain

**Recommendation**: Option A. GitHub Actions self-hosted runners are straightforward and give the best contributor UX (native PR checks). The workflow files are nearly identical to Gitea workflows.

```yaml
# .github/workflows/check.yml
name: Check
on: [push, pull_request]
jobs:
  check:
    runs-on: self-hosted
    container:
      image: hub.nationtech.io/harmony/harmony_composer:latest
    steps:
      - uses: actions/checkout@v4
      - run: bash build/check.sh
```

### 4.4 Polish documentation

- **README.md**: Quick-start (clone → run → get prompted → see result), architecture diagram (Score → Interpret → Topology), links to docs and examples
- **CONTRIBUTING.md**: Already exists. Review for GitHub-specific guidance (fork workflow, PR template)
- **docs/**: Already comprehensive. Verify links work in GitHub rendering
- **Examples**: Ensure each example has a one-line description in its `Cargo.toml` and a comment block in `main.rs`

### 4.5 License and legal

- Verify the workspace `license` field in the root `Cargo.toml` is set correctly
- Add a `LICENSE` file at the repo root if not present
- Scan for any proprietary dependencies or hardcoded internal URLs

### 4.6 GitHub repository configuration

- Branch protection on `main`: require PR review, require CI to pass
- Issue templates: bug report, feature request
- PR template: checklist (tests pass, docs updated, etc.)
- Topics/tags: `rust`, `infrastructure-as-code`, `kubernetes`, `orchestration`, `bare-metal`
- Repository description: "Infrastructure orchestration framework. Declare what you want (Score), describe your infrastructure (Topology), let Harmony figure out how."

### 4.7 Gitea as internal mirror

- Set up Gitea to mirror from GitHub (pull mirror)
- Internal CI can continue running on Gitea for private/experimental branches
- Public contributions flow through GitHub

## Deliverables

- [ ] Git history cleaned, repo size <50MB
- [ ] Public GitHub repository created
- [ ] CI running on self-hosted runners with GitHub Actions
- [ ] Branch protection enabled
- [ ] README polished with a quick-start guide
- [ ] Issue and PR templates created
- [ ] LICENSE file present
- [ ] Gitea configured as a mirror
# Phase 5: E2E Tests for PostgreSQL & RustFS

## Goal

Establish an automated E2E test pipeline that proves working examples actually work. Start with the two simplest k8s-based examples: PostgreSQL and RustFS.

## Prerequisites

- Phase 1 complete (config crate works, bootstrap is clean)
- `feat/rustfs` branch merged

## Architecture

### Test harness: `tests/e2e/`

A dedicated workspace member crate at `tests/e2e/` that contains:

1. **Shared k3d utilities** — create/destroy clusters, wait for readiness
2. **Per-example test modules** — each example gets a `#[tokio::test]` function
3. **Assertion helpers** — wait for pods, check CRDs exist, verify services
```
tests/
  e2e/
    Cargo.toml
    src/
      lib.rs          # Shared test utilities
      k3d.rs          # k3d cluster lifecycle
      k8s_assert.rs   # K8s assertion helpers
    tests/
      postgresql.rs   # PostgreSQL E2E test
      rustfs.rs       # RustFS E2E test
```

### k3d cluster lifecycle

```rust
// tests/e2e/src/k3d.rs
use std::path::PathBuf;

use k3d_rs::K3d;

pub struct TestCluster {
    pub name: String,
    pub k3d: K3d,
    pub client: kube::Client,
    reuse: bool,
}

impl TestCluster {
    /// Creates a k3d cluster for testing.
    /// If HARMONY_E2E_REUSE_CLUSTER=1, reuses an existing cluster.
    pub async fn ensure(name: &str) -> Result<Self, String> {
        let reuse = std::env::var("HARMONY_E2E_REUSE_CLUSTER")
            .map(|v| v == "1")
            .unwrap_or(false);

        let base_dir = PathBuf::from("/tmp/harmony-e2e");
        let k3d = K3d::new(base_dir, Some(name.to_string()));

        let client = k3d.ensure_installed().await?;

        Ok(Self { name: name.to_string(), k3d, client, reuse })
    }

    /// Returns the kubeconfig path for this cluster.
    pub fn kubeconfig_path(&self) -> String { ... }
}

impl Drop for TestCluster {
    fn drop(&mut self) {
        if !self.reuse {
            // Best-effort cleanup
            let _ = self.k3d.run_k3d_command(["cluster", "delete", &self.name]);
        }
    }
}
```

### K8s assertion helpers

```rust
// tests/e2e/src/k8s_assert.rs
use std::time::Duration;

/// Wait until a pod matching the label selector is Running in the namespace.
/// Times out after `timeout` duration.
pub async fn wait_for_pod_running(
    client: &kube::Client,
    namespace: &str,
    label_selector: &str,
    timeout: Duration,
) -> Result<(), String>

/// Assert a CRD instance exists.
pub async fn assert_resource_exists<K: kube::Resource>(
    client: &kube::Client,
    name: &str,
    namespace: Option<&str>,
) -> Result<(), String>

/// Install a Helm chart. Returns when all pods in the release are running.
pub async fn helm_install(
    release_name: &str,
    chart: &str,
    namespace: &str,
    repo_url: Option<&str>,
    timeout: Duration,
) -> Result<(), String>
```

## Tasks

### 5.1 Create the `tests/e2e/` crate

Add to workspace `Cargo.toml`:

```toml
[workspace]
members = [
    # ... existing members
    "tests/e2e",
]
```

`tests/e2e/Cargo.toml`:

```toml
[package]
name = "harmony-e2e-tests"
edition = "2024"
publish = false

[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_types = { path = "../../harmony_types" }
k3d_rs = { path = "../../k3d", package = "k3d_rs" }
kube = { workspace = true }
k8s-openapi = { workspace = true }
tokio = { workspace = true }
log = { workspace = true }
env_logger = { workspace = true }

[dev-dependencies]
pretty_assertions = { workspace = true }
```
### 5.2 PostgreSQL E2E test

```rust
// tests/e2e/tests/postgresql.rs
use std::time::Duration;

use harmony::modules::postgresql::{PostgreSQLScore, capability::PostgreSQLConfig};
use harmony::topology::{K8sAnywhereConfig, K8sAnywhereTopology};
use harmony::inventory::Inventory;
use harmony::maestro::Maestro;

#[tokio::test]
async fn test_postgresql_deploys_on_k3d() {
    let cluster = TestCluster::ensure("harmony-e2e-pg").await.unwrap();

    // Install CNPG operator via Helm
    // (K8sAnywhereTopology::ensure_ready() now handles this since
    // commit e1183ef "K8s postgresql score now ensures cnpg is installed")
    // But we may need the Helm chart for non-OKD:
    helm_install(
        "cnpg",
        "cloudnative-pg",
        "cnpg-system",
        Some("https://cloudnative-pg.github.io/charts"),
        Duration::from_secs(120),
    ).await.unwrap();

    // Configure topology pointing to the test cluster
    let config = K8sAnywhereConfig {
        kubeconfig: Some(cluster.kubeconfig_path()),
        use_local_k3d: false,
        autoinstall: false,
        use_system_kubeconfig: false,
        harmony_profile: "dev".to_string(),
        k8s_context: None,
    };
    let topology = K8sAnywhereTopology::with_config(config);

    // Create and run the score
    let score = PostgreSQLScore {
        config: PostgreSQLConfig {
            cluster_name: "e2e-test-pg".to_string(),
            namespace: "e2e-pg-test".to_string(),
            ..Default::default()
        },
    };

    let mut maestro = Maestro::initialize(Inventory::autoload(), topology).await.unwrap();
    maestro.register_all(vec![Box::new(score)]);

    let score = maestro.scores().read().unwrap().first().unwrap().clone_box();
    let result = maestro.interpret(score).await;
    assert!(result.is_ok(), "PostgreSQL score failed: {:?}", result.err());

    // Assert: CNPG Cluster resource exists
    // (the Cluster CRD is applied — pod readiness may take longer)
    let client = cluster.client.clone();
    // ... assert Cluster resource exists in the e2e-pg-test namespace
}
```

### 5.3 RustFS E2E test

Similar structure. Details depend on what the RustFS score deploys (likely a Helm chart or k8s resources for MinIO/RustFS).

```rust
#[tokio::test]
async fn test_rustfs_deploys_on_k3d() {
    let cluster = TestCluster::ensure("harmony-e2e-rustfs").await.unwrap();
    // ... similar pattern: configure topology, create score, interpret, assert
}
```

### 5.4 CI job for E2E tests

New workflow file (Gitea or GitHub Actions):

```yaml
# .gitea/workflows/e2e.yml (or .github/workflows/e2e.yml)
name: E2E Tests
on:
  push:
    branches: [master, main]
  # Don't run on every PR — too slow. Run on label or manual trigger.
  workflow_dispatch:

jobs:
  e2e:
    runs-on: self-hosted # Must have Docker available for k3d
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4

      - name: Install k3d
        run: curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash

      - name: Run E2E tests
        run: cargo test -p harmony-e2e-tests -- --test-threads=1
        env:
          RUST_LOG: info
```

Note `--test-threads=1`: E2E tests create k3d clusters and should not run in parallel (port conflicts, resource contention).

## Deliverables

- [ ] `tests/e2e/` crate added to workspace
- [ ] Shared test utilities: `TestCluster`, `wait_for_pod_running`, `helm_install`
- [ ] PostgreSQL E2E test passing
- [ ] RustFS E2E test passing (after `feat/rustfs` merge)
- [ ] CI job running E2E tests on push to main
- [ ] `HARMONY_E2E_REUSE_CLUSTER=1` documented for fast local iteration
# Phase 6: E2E Tests for OKD HA Cluster on KVM

## Goal

Prove the full OKD bare-metal installation flow works end-to-end using KVM virtual machines. This is the ultimate validation of Harmony's core value proposition: declare an OKD cluster, point it at infrastructure, watch it materialize.

## Prerequisites

- Phase 5 complete (test harness exists, k3d tests passing)
- `feature/kvm-module` merged to main
- A CI runner with libvirt/KVM access and nested virtualization support

## Architecture

The KVM branch already has a `kvm_okd_ha_cluster` example that creates:
```
          Host bridge (WAN)
                 |
        +--------+-----------+
        |  OPNsense          |  192.168.100.1
        |  gateway + PXE     |
        +--------+-----------+
                 |
      harmonylan (192.168.100.0/24)
    +---------+---------+---------+---------+
    |         |         |         |         |
+---+----+ +--+----+ +--+----+ +--+----+ +--+----+
|  cp0   | |  cp1  | |  cp2  | |worker0| |worker1|
|  .10   | |  .11  | |  .12  | |  .20  | |  .21  |
+--------+ +-------+ +-------+ +-------+ +---+---+
                                             |
                                        +----+-----+
                                        | worker2  |
                                        |   .22    |
                                        +----------+
```

The test needs to orchestrate this entire setup, wait for OKD to converge, and assert the cluster is healthy.

## Tasks

### 6.1 Start with `example_linux_vm` — the simplest KVM test

Before tackling the full OKD stack, validate the KVM module itself with the simplest possible test:

```rust
// tests/e2e/tests/kvm_linux_vm.rs
// (imports from the KVM module — KvmExecutor, NetworkConfig, VmConfig,
// NetworkRef, BootDevice, VmStatus — elided in this sketch)

#[tokio::test]
#[ignore] // Requires libvirt access — run with: cargo test -- --ignored
async fn test_linux_vm_boots_from_iso() {
    let executor = KvmExecutor::from_env().unwrap();

    // Create isolated network
    let network = NetworkConfig {
        name: "e2e-test-net".to_string(),
        bridge: "virbr200".to_string(),
        // ...
    };
    executor.ensure_network(&network).await.unwrap();

    // Define and start VM
    let vm_config = VmConfig::builder("e2e-linux-test")
        .vcpus(1)
        .memory_gb(1)
        .disk(5)
        .network(NetworkRef::named("e2e-test-net"))
        .cdrom("https://releases.ubuntu.com/24.04/ubuntu-24.04-live-server-amd64.iso")
        .boot_order([BootDevice::Cdrom, BootDevice::Disk])
        .build();

    executor.ensure_vm(&vm_config).await.unwrap();
    executor.start_vm("e2e-linux-test").await.unwrap();

    // Assert VM is running
    let status = executor.vm_status("e2e-linux-test").await.unwrap();
    assert_eq!(status, VmStatus::Running);

    // Cleanup
    executor.destroy_vm("e2e-linux-test").await.unwrap();
    executor.undefine_vm("e2e-linux-test").await.unwrap();
    executor.delete_network("e2e-test-net").await.unwrap();
}
```

This test validates:

- ISO download works (via `harmony_assets` if refactored, or built-in KVM module download)
- libvirt XML generation is correct
- VM lifecycle (define → start → status → destroy → undefine)
- Network creation/deletion

### 6.2 OKD HA Cluster E2E test

The full integration test. This is long-running (30-60 minutes) and should only run nightly or on-demand.

```rust
// tests/e2e/tests/kvm_okd_ha.rs

#[tokio::test]
#[ignore] // Requires KVM + significant resources. Run nightly.
async fn test_okd_ha_cluster_on_kvm() {
    // 1. Create virtual infrastructure
    //    - OPNsense gateway VM
    //    - 3 control plane VMs
    //    - 3 worker VMs
    //    - Virtual network (harmonylan)

    // 2. Run OKD installation scores
    //    (the kvm_okd_ha_cluster example, but as a test)

    // 3. Wait for OKD API server to become reachable
    //    - Poll https://api.okd.harmonylan:6443 until it responds
    //    - Timeout: 30 minutes

    // 4. Assert cluster health
    //    - All nodes in Ready state
    //    - ClusterVersion reports Available=True
    //    - Sample workload (nginx) deploys and pod reaches Running

    // 5. Cleanup
    //    - Destroy all VMs
    //    - Delete virtual networks
    //    - Clean up disk images
}
```

### 6.3 CI runner requirements

The KVM E2E test needs a runner with:

- **Hardware**: 32GB+ RAM, 8+ CPU cores, 100GB+ disk
- **Software**: libvirt, QEMU/KVM, `virsh`, nested virtualization enabled
- **Network**: Outbound internet access (to download ISOs, OKD images)
- **Permissions**: User in `libvirt` group, or root access

Options:
- **Dedicated bare-metal machine** registered as a self-hosted GitHub Actions runner
- **Cloud VM with nested virt** (e.g., GCP n2-standard-8 with `--enable-nested-virtualization`)
- **Manual trigger only** — developer runs locally, CI just tracks pass/fail

### 6.4 Nightly CI job

```yaml
# .github/workflows/e2e-kvm.yml
name: E2E KVM Tests
on:
  schedule:
    - cron: '0 2 * * *' # 2 AM daily
  workflow_dispatch: # Manual trigger

jobs:
  kvm-tests:
    runs-on: [self-hosted, kvm] # Label for KVM-capable runners
    timeout-minutes: 90
    steps:
      - uses: actions/checkout@v4

      - name: Run KVM E2E tests
        run: cargo test -p harmony-e2e-tests -- --ignored --test-threads=1
        env:
          RUST_LOG: info
          HARMONY_KVM_URI: qemu:///system

      - name: Cleanup VMs on failure
        if: failure()
        run: |
          virsh list --all --name | grep e2e | xargs -I {} virsh destroy {} || true
          virsh list --all --name | grep e2e | xargs -I {} virsh undefine {} --remove-all-storage || true
```

### 6.5 Test resource management

KVM tests create real resources that must be cleaned up even on failure. Implement a test fixture pattern:

```rust
struct KvmTestFixture {
    executor: KvmExecutor,
    vms: Vec<String>,
    networks: Vec<String>,
}

impl KvmTestFixture {
    fn track_vm(&mut self, name: &str) { self.vms.push(name.to_string()); }
    fn track_network(&mut self, name: &str) { self.networks.push(name.to_string()); }
}

impl Drop for KvmTestFixture {
    fn drop(&mut self) {
        // Best-effort cleanup of all tracked resources
        for vm in &self.vms {
            let _ = std::process::Command::new("virsh")
                .args(["destroy", vm]).output();
            let _ = std::process::Command::new("virsh")
                .args(["undefine", vm, "--remove-all-storage"]).output();
        }
        for net in &self.networks {
            let _ = std::process::Command::new("virsh")
                .args(["net-destroy", net]).output();
            let _ = std::process::Command::new("virsh")
                .args(["net-undefine", net]).output();
        }
    }
}
```

## Deliverables

- [ ] `test_linux_vm_boots_from_iso` — passing KVM smoke test
- [ ] `test_okd_ha_cluster_on_kvm` — full OKD installation test
- [ ] `KvmTestFixture` with resource cleanup on test failure
- [ ] Nightly CI job on KVM-capable runner
- [ ] Force-cleanup script for leaked VMs/networks
- [ ] Documentation: how to set up a KVM runner for E2E tests
# Phase 7: OPNsense & Bare-Metal Network Automation

## Goal

Complete the OPNsense API coverage and Brocade switch integration to enable fully automated bare-metal HA cluster provisioning with LAGG, CARP VIP, multi-WAN, and BINAT.

## Status: In Progress

### Done

- opnsense-codegen pipeline: XML model parsing, IR generation, Rust code generation with serde helpers
- 11 generated API modules covering firewall, interfaces (VLAN, LAGG, VIP), HAProxy, DNSMasq, Caddy, WireGuard
- 9 OPNsense Scores: VlanScore, LaggScore, VipScore, DnatScore, FirewallRuleScore, OutboundNatScore, BinatScore, NodeExporterScore, OPNsenseShellCommandScore
- 13 opnsense-config modules with high-level Rust APIs
- E2E tests for DNSMasq CRUD, HAProxy service lifecycle, interface settings
- Brocade branch with VLAN CRUD, interface speed config, port-channel management

### Remaining

#### UpdateHostScore (new)

A Score that updates a host's configuration in the DHCP server and prepares it for PXE boot. Core responsibilities:

1. **Update MAC address in DHCP**: When hardware is replaced or NICs are swapped, update the DHCP static mapping with the new MAC address(es). This is the most critical function — without it, PXE boot targets the wrong hardware.
2. **Configure PXE boot options**: Set next-server and the boot filename (BIOS/UEFI/iPXE) for the specific host.
3. **Host network setup for LAGG LACP 802.3ad**: Configure the host's network interfaces for link aggregation. This replaces the current `HostNetworkConfigurationScore` approach, which only handles bond creation on the host side — the new approach must also create the corresponding LAGG interface on OPNsense and configure the Brocade switch port-channel with LACP.

The existing `DhcpHostBindingScore` handles bulk MAC-to-IP registration but lacks the ability to _update_ an existing mapping (the `remove_static_mapping` and `list_static_mappings` methods on `OPNSenseFirewall` are still `todo!()`).

#### Merge Brocade branch

The `feat/brocade-client-add-vlans` branch has breaking API changes:
- `configure_interfaces` now takes `Vec<InterfaceConfig>` instead of `Vec<(String, PortOperatingMode)>`
- `InterfaceType` changed from `Ethernet(String)` to specific variants (TenGigabitEthernet, FortyGigabitEthernet)
- `harmony/src/infra/brocade.rs` needs adaptation to the new API

#### HostNetworkConfigurationScore rework

The current implementation (`harmony/src/modules/okd/host_network.rs`) has documented limitations:
- Not idempotent (running twice may duplicate bond configs)
- No rollback logic
- Doesn't wait for switch config propagation
- All tests are `#[ignore]` due to requiring interactive TTY (inquire prompts)
- Doesn't create LAGG on OPNsense — only bonds on the host and port-channels on the switch

For LAGG LACP 802.3ad the flow needs to be:
1. Create LAGG interface on OPNsense (LaggScore already exists)
2. Create port-channel on Brocade switch (BrocadeSwitchConfigurationScore)
3. Configure bond on host via NMState (existing NetworkManager)
4. All three must be coordinated and idempotent

#### Fill remaining OPNsense `todo!()` stubs

- `OPNSenseFirewall::remove_static_mapping` — needed by UpdateHostScore
- `OPNSenseFirewall::list_static_mappings` — needed for idempotent updates
- `OPNSenseFirewall::Firewall` trait (add_rule, remove_rule, list_rules) — stub only
- `OPNSenseFirewall::dns::register_dhcp_leases` — stub only
# Phase 8: HA OKD Production Deployment

## Goal

Deploy a production HAClusterTopology OKD cluster in UPI mode with full LAGG LACP 802.3ad, CARP VIP, multi-WAN, and BINAT for customer traffic — entirely automated through Harmony Scores.

## Status: Not Started

## Prerequisites

- Phase 7 (OPNsense & Bare-Metal) substantially complete
- Brocade branch merged and adapted
- UpdateHostScore implemented and tested

## Deployment Stack

### Network Layer (OPNsense)
- **LAGG interfaces** (802.3ad LACP) for all cluster hosts — redundant links via LaggScore
- **CARP VIPs** for high availability — failover IPs via VipScore
- **Multi-WAN** configuration — multiple uplinks with gateway groups
- **BINAT** for customer-facing IPs — 1:1 NAT via BinatScore
- **Firewall rules** per-customer with proper source/dest filtering via FirewallRuleScore
- **Outbound NAT** for cluster egress via OutboundNatScore

### Switch Layer (Brocade)
- **VLAN** per network segment (management, cluster, customer, storage)
- **Port-channels** (LACP) matching OPNsense LAGG interfaces
- **Interface speed** configuration for 10G/40G links

### Host Layer
- **PXE boot** via UpdateHostScore (MAC → DHCP → TFTP → iPXE → SCOS)
- **Network bonds** (LACP) via reworked HostNetworkConfigurationScore
- **NMState** for persistent bond configuration on OpenShift nodes

### Cluster Layer
- OKD UPI installation via existing OKDSetup01-04 Scores
- HAProxy load balancer for API and ingress via LoadBalancerScore
- DNS via OKDDnsScore
- Monitoring via NodeExporterScore + Prometheus stack

## New Scores Needed

1. **UpdateHostScore** — Update MAC in DHCP, configure PXE boot, prepare host network for LAGG LACP
2. **MultiWanScore** — Configure OPNsense gateway groups for multi-WAN failover
3. **CustomerBinatScore** (optional) — Higher-level Score combining BinatScore + FirewallRuleScore + DnatScore per customer
## Validation Checklist

- [ ] All hosts PXE boot successfully after MAC update
- [ ] LAGG/LACP active on all host links (verify via `teamdctl` or `nmcli`)
- [ ] CARP VIPs fail over within expected time window
- [ ] BINAT customers reachable from external networks
- [ ] Multi-WAN failover tested (pull one uplink, verify traffic shifts)
- [ ] Full OKD installation completes end-to-end
- [ ] Cluster API accessible via CARP VIP
- [ ] Customer workloads routable via BINAT
# Phase 9: SSO + Config System Hardening

## Goal

Make the Zitadel + OpenBao SSO config management stack production-ready, well-tested, and reusable across deployments. The `harmony_sso` example demonstrates the full loop: deploy infrastructure, authenticate via SSO, store and retrieve config -- all in one `cargo run`.

## Current State (as of `feat/opnsense-codegen`)

The SSO example works end-to-end:
- k3d cluster + OpenBao + Zitadel deployed via Scores
- `OpenbaoSetupScore`: init, unseal, policies, userpass, JWT auth
- `ZitadelSetupScore`: project + device-code app provisioning via Management API (PAT auth)
- JWT exchange: Zitadel id_token → OpenBao client token via `/v1/auth/jwt/login`
- Device flow triggers in terminal, user logs in via browser, config stored in OpenBao KV v2
- CoreDNS patched for in-cluster hostname resolution (K3sFamily only)
- Discovery cache invalidation after CRD installation
- Session caching with TTL

### What's solid

- **Score composition**: 4 Scores orchestrate the full stack in ~280 lines
- **Config trait**: clean `Serialize + Deserialize + JsonSchema`; the developer never sees OpenBao or Zitadel
- **Auth chain transparency**: token → cached → OIDC device flow → userpass, and the right thing happens automatically
- **Idempotency**: all Scores safe to re-run, cached sessions skip login

### What needs work

See tasks below.

## Tasks

### 9.1 Builder pattern for `OpenbaoSecretStore` — HIGH

**Problem**: `OpenbaoSecretStore::new()` has 11 positional arguments. Adding JWT params made it worse. Callers pass `None, None, None, None` for unused options.

**Fix**: Replace with a builder:

```rust
OpenbaoSecretStore::builder()
    .url("http://127.0.0.1:8200")
    .kv_mount("secret")
    .skip_tls(true)
    .zitadel_sso("http://sso.harmony.local:8080", "client-id-123")
    .jwt_auth("harmony-developer", "jwt")
    .build()
    .await?
```

**Impact**: All callers updated (lib.rs, openbao_chain example, harmony_sso example). Breaking API change.

**Files**: `harmony_secret/src/store/openbao.rs`, all callers

### 9.2 Fix ZitadelScore PG readiness — HIGH

**Problem**: `ZitadelScore` calls `topology.get_endpoint()` immediately after deploying the CNPG Cluster CR. The PG `-rw` service takes 15-30s to appear. This forces a retry loop in the caller (the example).

**Fix**: Add a wait loop inside `ZitadelScore`'s interpret, after `topology.deploy(&pg_config)`, that polls for the `-rw` service to exist before calling `get_endpoint()`. Use `K8sClient::get_resource::<Service>()` with a poll loop.

**Impact**: Eliminates the retry wrapper in the harmony_sso example and any other Zitadel consumer.

**Files**: `harmony/src/modules/zitadel/mod.rs`
### 9.3 `CoreDNSRewriteScore` — MEDIUM

**Problem**: CoreDNS patching logic lives in the harmony_sso example. It's a general pattern: any service with ingress-based Host routing needs in-cluster DNS resolution.

**Fix**: Extract into `harmony/src/modules/k8s/coredns.rs` as a proper Score:

```rust
pub struct CoreDNSRewriteScore {
    pub rewrites: Vec<(String, String)>, // (hostname, service FQDN)
}

impl<T: Topology + K8sclient> Score<T> for CoreDNSRewriteScore { ... }
```

K3sFamily only. No-op on OpenShift. Idempotent.

**Files**: `harmony/src/modules/k8s/coredns.rs` (new), `harmony/src/modules/k8s/mod.rs`

### 9.4 Integration tests for Scores — MEDIUM

**Problem**: Zero tests for `OpenbaoSetupScore`, `ZitadelSetupScore`, `CoreDNSRewriteScore`. The Scores are testable against a running k3d cluster.

**Fix**: Add `#[ignore]` integration tests that require a running cluster:
- `test_openbao_setup_score`: deploy OpenBao + run setup, verify KV works
- `test_zitadel_setup_score`: deploy Zitadel + run setup, verify project/app exist
- `test_config_round_trip`: store + retrieve config via SSO-authenticated OpenBao

Run with `cargo test -- --ignored` after deploying the example.

**Files**: `harmony/tests/integration/` (new directory)

### 9.5 Remove `resolve()` DNS hack — LOW

**Problem**: `ZitadelOidcAuth::http_client()` hardcodes `resolve(host, 127.0.0.1:port)`. This only works for local k3d development.

**Fix**: Make it configurable. Add an optional `resolve_to: Option<SocketAddr>` field to `ZitadelOidcAuth`. The example passes `Some(127.0.0.1:8080)` for k3d; production passes `None` (uses real DNS). Or better: detect whether the host resolves and only apply the override if it doesn't.

**Files**: `harmony_secret/src/store/zitadel.rs`
### 9.6 Typed Zitadel API client — LOW

**Problem**: `ZitadelSetupScore` uses hand-written JSON with string parsing for Management API calls. No type safety on request/response.

**Fix**: Create typed request/response structs for the Management API v1 endpoints used (projects, apps, users). Use `serde` for serialization. This doesn't need to be a full API client -- just the endpoints we use.

**Files**: `harmony/src/modules/zitadel/api.rs` (new)

### 9.7 Capability traits for secret vault + identity — FUTURE

**Problem**: `OpenbaoScore` and `ZitadelScore` are tool-specific. No capability abstraction for "I need a secret vault" or "I need an identity provider".

**Fix**: Design `SecretVault` and `IdentityProvider` capability traits on topologies. This is a significant architectural decision that needs an ADR.

**Blocked by**: Real-world use of a second implementation (e.g., HashiCorp Vault, Keycloak) to validate the abstraction boundary.

### 9.8 Auto-unseal for OpenBao — FUTURE

**Problem**: Every pod restart requires manual unseal. `OpenbaoSetupScore` handles this, but requires re-running the Score.

**Fix**: Configure Transit auto-unseal (using a second OpenBao/Vault instance) or cloud KMS auto-unseal. This is an operational concern that should be configurable in `OpenbaoSetupScore`.

## Relationship to Other Phases

- **Phase 1** (config crate): SSO flow builds directly on `harmony_config` + `StoreSource<OpenbaoSecretStore>`. Phase 1 task 1.4 is now **complete** via the harmony_sso example.
- **Phase 2** (migrate to harmony_config): The 19 `SecretManager` call sites should migrate to `ConfigManager` with the OpenbaoSecretStore backend. The SSO flow validates this pattern works.
- **Phase 5** (E2E tests): The harmony_sso example is a candidate for the first E2E test -- it deploys k3d, exercises multiple Scores, and verifies config storage.
# Phase 10: Firewall Pair Topology & HA Firewall Automation

## Goal

Provide first-class support for managing OPNsense (and future) HA firewall pairs through a higher-order topology, including CARP VIP orchestration, per-device config differentiation, and integration testing.

## Current State

`FirewallPairTopology` is implemented as a concrete wrapper around two `OPNSenseFirewall` instances. It applies uniform scores to both firewalls and differentiates CARP VIP advskew (primary=0, backup=configurable). All existing OPNsense scores (Lagg, Vlan, Firewall Rules, DNAT, BINAT, Outbound NAT, DHCP) work with the pair topology. QC1 uses it for its NT firewall pair.

## Tasks

### 10.1 Generic FirewallPair over a capability trait

**Priority**: MEDIUM
**Status**: Not started

`FirewallPairTopology` is currently concrete over `OPNSenseFirewall`. This breaks extensibility — a pfSense or VyOS firewall pair would need a separate type. Introduce a `FirewallAppliance` capability trait that `OPNSenseFirewall` implements, and make `FirewallPairTopology<T: FirewallAppliance>` generic. The blanket-impl pattern from ADR-015 then gives automatic pair support for any appliance type.

Key challenge: the trait needs to expose enough for `CarpVipScore` to configure VIPs with per-device advskew, without leaking OPNsense-specific APIs.

### 10.2 Delegation macro for higher-order topologies

**Priority**: MEDIUM
**Status**: Not started

The "delegate to both" pattern used by uniform pair scores is pure boilerplate. Every `Score<FirewallPairTopology>` impl for uniform scores follows the same structure: create the inner `Score<OPNSenseFirewall>` interpret, execute against primary, then backup.

Design a proc macro (e.g., `#[derive(DelegatePair)]` or `delegate_score_to_pair!`) that generates these impls automatically. This would also apply to `DecentralizedTopology` (delegate to all sites) and future higher-order topologies.
### 10.3 XMLRPC sync support

**Priority**: LOW
**Status**: Not started

Add optional `FirewallPairTopology::sync_from_primary()` that triggers OPNsense XMLRPC config sync from primary to backup. Useful for settings that must be identical and don't need per-device differentiation. Not blocking — independent application to both firewalls achieves the same config state.

### 10.4 Integration test with CARP/LACP failover

**Priority**: LOW
**Status**: Not started

Extend the existing OPNsense example deployment to create a firewall pair test fixture:

- Two OPNsense VMs in CARP configuration
- A third VM as a client verifying connectivity
- Automated failover testing: disconnect primary's virtual NIC, verify CARP failover to backup, reconnect, verify failback
- LACP failover: disconnect one LAGG member, verify traffic continues on remaining member

This builds on the KVM test harness from Phase 6.

@@ -1,77 +0,0 @@
# Phase 11: Named Config Instances & Cross-Namespace Access

## Goal

Allow multiple instances of the same config type within a single namespace, identified by name. Also allow explicit namespace specification when retrieving config items, enabling cross-deployment orchestration.

## Context

The current `harmony_config` system identifies config items by type only (`T::KEY` from `#[derive(Config)]`). This works for singletons but breaks when you need multiple instances of the same type:

- **Firewall pair**: primary and backup need separate `OPNSenseApiCredentials` (different API keys for different devices)
- **Worker nodes**: each BMC has its own `IpmiCredentials` with different username/password
- **Firewall administrators**: multiple `OPNSenseApiCredentials` with different permission levels
- **Multi-tenant**: customer firewalls vs. NationTech infrastructure firewalls need separate credential sets

Using separate namespaces per device is not the answer — a firewall pair belongs to a single deployment, and forcing namespace switches for each device in a pair adds unnecessary friction.

Cross-namespace access is a separate but related need: the NT firewall pair and C1 customer firewall pair live in separate namespaces (the customer manages their own firewall), but NationTech needs read access to the C1 namespace for BINAT coordination.

## Tasks

### 11.1 Named config instances within a namespace

**Priority**: HIGH
**Status**: Not started

Extend the `Config` trait and `ConfigManager` to support an optional instance name:

```rust
// Current (singleton): gets "OPNSenseApiCredentials" from the active namespace
let creds = ConfigManager::get::<OPNSenseApiCredentials>().await?;

// New (named): gets "OPNSenseApiCredentials/fw-primary" from the active namespace
let primary_creds = ConfigManager::get_named::<OPNSenseApiCredentials>("fw-primary").await?;
let backup_creds = ConfigManager::get_named::<OPNSenseApiCredentials>("fw-backup").await?;
```

Storage key becomes `{T::KEY}/{instance_name}` (or similar). The unnamed `get()` remains unchanged for backward compatibility.

This needs to work across all config sources:

- `EnvSource`: `HARMONY_CONFIG_{KEY}_{NAME}` (e.g., `HARMONY_CONFIG_OPNSENSE_API_CREDENTIALS_FW_PRIMARY`)
- `SqliteSource`: composite key `{key}/{name}`
- `StoreSource` (OpenBao): path `{namespace}/{key}/{name}`
- `PromptSource`: prompt includes the instance name for clarity

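The per-source key shapes above can be sketched as plain composition helpers. The function names and the `qc1` namespace are illustrative assumptions, not the real `harmony_config` plumbing:

```rust
// Sketch of composite-key layout for named config instances across sources.

// EnvSource: HARMONY_CONFIG_{KEY}_{NAME}, with the instance name normalized
// to the env-var alphabet.
fn env_key(env_key_base: &str, instance: &str) -> String {
    format!(
        "HARMONY_CONFIG_{}_{}",
        env_key_base,
        instance.to_ascii_uppercase().replace('-', "_")
    )
}

// SqliteSource: composite key {key}/{name}.
fn sqlite_key(key: &str, instance: &str) -> String {
    format!("{key}/{instance}")
}

// StoreSource (OpenBao): path {namespace}/{key}/{name}.
fn store_path(namespace: &str, key: &str, instance: &str) -> String {
    format!("{namespace}/{key}/{instance}")
}

fn main() {
    assert_eq!(
        env_key("OPNSENSE_API_CREDENTIALS", "fw-primary"),
        "HARMONY_CONFIG_OPNSENSE_API_CREDENTIALS_FW_PRIMARY"
    );
    assert_eq!(
        sqlite_key("OPNSenseApiCredentials", "fw-primary"),
        "OPNSenseApiCredentials/fw-primary"
    );
    assert_eq!(
        store_path("qc1", "OPNSenseApiCredentials", "fw-primary"),
        "qc1/OPNSenseApiCredentials/fw-primary"
    );
}
```

One open detail the helpers surface: the instance name must be restricted (or escaped) so that `/` and characters outside the env-var alphabet cannot produce colliding keys.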
### 11.2 Cross-namespace config access

**Priority**: MEDIUM
**Status**: Not started

Allow specifying an explicit namespace when retrieving a config item:

```rust
// Get from the active namespace (current behavior)
let nt_creds = ConfigManager::get::<OPNSenseApiCredentials>().await?;

// Get from a specific namespace
let c1_creds = ConfigManager::get_from_namespace::<OPNSenseApiCredentials>("c1").await?;
```

This enables orchestration across deployments: the NT deployment can read C1's firewall credentials for BINAT coordination without switching the global namespace.

For the `StoreSource` (OpenBao), this maps to reading from a different KV path prefix. For `SqliteSource`, it maps to a different database file or a namespace column. For `EnvSource`, it could use a different prefix (`HARMONY_CONFIG_C1_{KEY}`).

### 11.3 Update FirewallPairTopology to use named configs

**Priority**: MEDIUM
**Status**: Blocked by 11.1

Once named config instances are available, update `FirewallPairTopology::opnsense_from_config()` to use them:

```rust
let primary_creds = ConfigManager::get_named::<OPNSenseApiCredentials>("fw-primary").await?;
let backup_creds = ConfigManager::get_named::<OPNSenseApiCredentials>("fw-backup").await?;
```

This removes the current limitation of shared credentials between primary and backup.

@@ -13,7 +13,6 @@ If you're new to Harmony, start here:
See how to use Harmony to solve real-world problems.

- [**OPNsense VM Integration**](./use-cases/opnsense-vm-integration.md): Boot a real OPNsense firewall in a local KVM VM and configure it entirely through Harmony. Fully automated, zero manual steps — the flashiest demo. Requires Linux with KVM.
- [**PostgreSQL on Local K3D**](./use-cases/postgresql-on-local-k3d.md): Deploy a production-grade PostgreSQL cluster on a local K3D cluster. The fastest way to get started.
- [**OKD on Bare Metal**](./use-cases/okd-on-bare-metal.md): A detailed walkthrough of bootstrapping a high-availability OKD cluster from physical hardware.

@@ -8,7 +8,6 @@
## Use Cases

- [PostgreSQL on Local K3D](./use-cases/postgresql-on-local-k3d.md)
- [OPNsense VM Integration](./use-cases/opnsense-vm-integration.md)
- [OKD on Bare Metal](./use-cases/okd-on-bare-metal.md)

## Component Catalogs

@@ -1,238 +0,0 @@
Here are some rough notes on the previous design:

- We found an issue where there could be primary flapping when network latency is larger than the primary self-fencing timeout.
  - e.g. network latency to get the NATS ack is 30 seconds (extreme, but it can happen), and self-fencing happens after 50 seconds. At second 50 self-fencing occurs, and at second 60 the ack comes in. We reject the ack as already failed because of the timeout, so fencing stands. But then network latency comes back down to 5 seconds and lets one successful heartbeat through, the primary returns to healthy, and the same sequence repeats — the primary flaps.
  - At least this does not cause split brain, since the replica never times out and wins the leadership write: we validate strict write ordering and force consensus on writes.

Also, we were seeing that the implementation became more complex. There are a lot of timers to handle, and that becomes hard to reason about for edge cases.

So, we came up with a slightly different approach, inspired by k8s liveness probes.

We now want to use failure and success threshold counters. However, on the replica side, all we can do is use a timer. The timer we can use is the time since the last primary heartbeat's JetStream metadata timestamp. We could also try to mitigate clock skew by measuring the gap between the internal clock and the JetStream metadata timestamp when writing our own heartbeat (not for now, but worth thinking about, though it may be useless).

So the current working design is this:

configure:
- number of consecutive successes to mark the node as UP
- number of consecutive failures to mark the node as DOWN
- note that successes/failures must be consecutive: one success in a run of failures is enough to keep the service up. This allows for various configuration profiles, from very strict availability to very lenient, depending on the number of failures tolerated and successes required to keep the service up.
  - failure_threshold at 100 will let a service fail (or time out) 99/100 and stay up
  - success_threshold at 100 will not bring a service back up until it has succeeded 100 heartbeats in a row
  - failure threshold at 1 will fail the service at the slightest network latency spike/packet loss
  - success threshold at 1 will bring the service up very quickly and may cause flapping in unstable network conditions

```
# heartbeat session log
# failure threshold : 3
# success threshold : 2

STATUS UP :
t=1 probe : fail f=1 s=0
t=2 probe : fail f=2 s=0
t=3 probe : ok   f=0 s=1
t=4 probe : fail f=1 s=0
```

Scenario:

failure threshold = 2
heartbeat timeout = 1s
total before fencing = 2 * 1 = 2s

staleness detection timer = 2 * total before fencing

Can we use this simple multiplication, i.e. set the staleness detection timer (the time the replica waits since the last primary heartbeat before promoting itself) to double the time the primary takes before starting the fencing process?

---

### Context
We are designing a **Staleness-Based Failover Algorithm** for the Harmony Agent. The goal is to manage High Availability (HA) for stateful workloads (like PostgreSQL) across decentralized, variable-quality networks ("Micro Data Centers").

We are moving away from complex, synchronized clocks in favor of a **Counter-Based Liveness** approach (inspired by Kubernetes probes) for the Primary, and a **Time-Based Watchdog** for the Replica.

### 1. The Algorithm

#### The Primary (Self-Health & Fencing)
The Primary validates its own "License to Operate" via a heartbeat loop.
* **Loop:** Every `heartbeat_interval` (e.g., 1s), it attempts to write a heartbeat to NATS and check the local DB.
* **Counters:** It maintains `consecutive_failures` and `consecutive_successes`.
* **State Transition:**
    * **To UNHEALTHY:** If `consecutive_failures >= failure_threshold`, the Primary **Fences Self** (stops DB, releases locks).
    * **To HEALTHY:** If `consecutive_successes >= success_threshold`, the Primary **Un-fences** (starts DB, acquires locks).
* **Reset Logic:** A single success resets the failure counter to 0, and vice versa.

#### The Replica (Staleness Detection)
The Replica acts as a passive watchdog observing the NATS stream.
* **Calculation:** It calculates a `MaxStaleness` timeout.

$$ \text{MaxStaleness} = (\text{failure\_threshold} \times \text{heartbeat\_interval}) \times \text{SafetyMultiplier} $$

*(We use a SafetyMultiplier of 2 to ensure the Primary has definitely fenced itself before we take over.)*
* **Action:** If `Time.now() - LastPrimaryHeartbeat > MaxStaleness`, the Replica assumes the Primary is dead and **Promotes Self**.

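The `MaxStaleness` calculation and promotion check reduce to a few lines. A sketch with illustrative function names, using the earlier scenario's numbers (failure threshold 2, 1s interval, safety multiplier 2):

```rust
use std::time::Duration;

// MaxStaleness = (failure_threshold * heartbeat_interval) * SafetyMultiplier
fn max_staleness(
    failure_threshold: u32,
    heartbeat_interval: Duration,
    safety_multiplier: u32,
) -> Duration {
    heartbeat_interval * failure_threshold * safety_multiplier
}

// The replica promotes only once the last primary heartbeat is older than MaxStaleness.
fn should_promote(since_last_primary_heartbeat: Duration, max_staleness: Duration) -> bool {
    since_last_primary_heartbeat > max_staleness
}

fn main() {
    // failure_threshold = 2, interval = 1s, multiplier = 2 -> 4s
    let ms = max_staleness(2, Duration::from_secs(1), 2);
    assert_eq!(ms, Duration::from_secs(4));
    // At 3s the primary may still be fencing; at 5s it has certainly fenced.
    assert!(!should_promote(Duration::from_secs(3), ms));
    assert!(should_promote(Duration::from_secs(5), ms));
}
```
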
---

### 2. Configuration Trade-offs

The separation of `success` and `failure` thresholds allows us to tune the "personality" of the cluster.

#### Scenario A: The "Nervous" Cluster (High Sensitivity)
* **Config:** `failure_threshold: 1`, `success_threshold: 1`
* **Behavior:** Fails over immediately upon a single missed packet or slow disk write.
* **Pros:** Maximum availability for perfect networks.
* **Cons:** **High Flapping Risk.** In a residential network, a microwave turning on might cause a failover.

#### Scenario B: The "Tank" Cluster (High Stability)
* **Config:** `failure_threshold: 10`, `success_threshold: 1`
* **Behavior:** The node must be consistently broken for 10 seconds (assuming 1s interval) to give up.
* **Pros:** Extremely stable on bad networks (e.g., Starlink, 4G). Ignores transient spikes.
* **Cons:** **Slow Failover.** Users experience 10+ seconds of downtime before the Replica even *thinks* about taking over.

#### Scenario C: The "Sticky" Cluster (Hysteresis)
* **Config:** `failure_threshold: 5`, `success_threshold: 5`
* **Behavior:** Hard to kill, hard to bring back.
* **Pros:** Prevents "Yo-Yo" effects. If a node fails, it must prove it is *really* stable (5 clean checks in a row) before re-joining the cluster.

---

### 3. Failure Modes & Behavior Analysis

Here is how the algorithm handles specific edge cases:

#### Case 1: Immediate Outage (Power Cut / Kernel Panic)
* **Event:** Primary vanishes instantly. No more writes to NATS.
* **Primary:** Does nothing (it's dead).
* **Replica:** Sees the `LastPrimaryHeartbeat` timestamp age. Once it crosses `MaxStaleness`, it promotes itself.
* **Outcome:** Clean failover after the timeout duration.

#### Case 2: Network Instability (Packet Loss / Jitter)
* **Event:** The Primary fails to write to NATS for 2 cycles due to Wi-Fi interference, then succeeds on the 3rd.
* **Config:** `failure_threshold: 5`.
* **Primary:**
    * $t=1$: Fail (Counter=1)
    * $t=2$: Fail (Counter=2)
    * $t=3$: Success (Counter resets to 0). **State remains HEALTHY.**
* **Replica:** Sees a gap in heartbeats but the timestamp never exceeds `MaxStaleness`.
* **Outcome:** No downtime, no failover. The system correctly identified this as noise, not failure.

#### Case 3: High Latency (The "Slow Death")
* **Event:** Primary is under heavy load; heartbeats take 1.5s to complete (interval is 1s).
* **Primary:** The `timeout` on the heartbeat logic triggers. `consecutive_failures` rises. Eventually, it hits `failure_threshold` and fences itself to prevent data corruption.
* **Replica:** Sees the heartbeats stop (or arrive too late). The timestamp ages out.
* **Outcome:** Primary fences self -> Replica waits for safety buffer -> Replica promotes. **Split-brain is avoided** because the Primary killed itself *before* the Replica acted (due to the SafetyMultiplier).

#### Case 4: Replica Network Partition
* **Event:** Replica loses internet connection; Primary is fine.
* **Replica:** Sees `LastPrimaryHeartbeat` age out (because it can't reach NATS). It *wants* to promote itself.
* **Constraint:** To promote, the Replica must write to NATS. Since it is partitioned, the NATS write fails.
* **Outcome:** The Replica remains in Standby (or fails to promote). The Primary continues serving traffic. **Cluster integrity is preserved.**

----

### Context & Use Case
We are implementing a High Availability (HA) Failover Strategy for decentralized "Micro Data Centers." The core challenge is managing stateful workloads (PostgreSQL) over unreliable networks.

We solve this using a **Local Fencing First** approach, backed by **NATS JetStream Strict Ordering** for the final promotion authority.

In CAP theorem terms, we are developing a CP system, intentionally sacrificing availability. In practical terms, we expect an average of two primary outages per year, with a failover delay of around 2 minutes. This translates to an uptime of over five nines. To be precise, 2 outages * 2 minutes = 4 minutes per year = 99.99924% uptime.

### The Algorithm: Local Fencing & Remote Promotion

The safety (data consistency) of the system relies on the time gap between the **Primary giving up (Fencing)** and the **Replica taking over (Promotion)**.

To avoid clock skew issues between agents and the datastore (NATS), all timestamp comparisons will be done using JetStream metadata. I.e., a Harmony agent will never use `Instant::now()` to get a timestamp; it will use `my_last_heartbeat.metadata.timestamp` (conceptually).

#### 1. Configuration
* `heartbeat_timeout` (e.g., 1s): Max time allowed for a NATS write/DB check.
* `failure_threshold` (e.g., 2): Consecutive failures before self-fencing.
* `failover_timeout` (e.g., 5s): Time since the last NATS update of the Primary heartbeat before the Replica promotes.
    * This timeout must be carefully configured to allow enough time for the primary to fence itself (after `heartbeat_timeout * failure_threshold`) BEFORE the replica gets promoted, to avoid a split brain with two primaries.
    * Implementing this will rely on the actual deployment configuration. For example, a CNPG based PostgreSQL cluster might require a longer gap (such as 30s) than other technologies.
    * Expires when `replica_heartbeat.metadata.timestamp - primary_heartbeat.metadata.timestamp > failover_timeout`

#### 2. The Primary (Self-Preservation)

The Primary is aggressive about killing itself.

* It attempts a heartbeat.
* If the network latency > `heartbeat_timeout`, the attempt is **cancelled locally** because the heartbeat did not make it back in time.
    * This counts as a failure and increments the `consecutive_failures` counter.
* If `consecutive_failures` hits the threshold, **FENCING occurs immediately**. The database is stopped.

This means that the Primary will fence itself after `heartbeat_timeout * failure_threshold`.

#### 3. The Replica (The Watchdog)

The Replica is patient.

* It watches the NATS stream to measure if `replica_heartbeat.metadata.timestamp - primary_heartbeat.metadata.timestamp > failover_timeout`
* It only attempts promotion if the `failover_timeout` (5s) has passed.
* **Crucial:** Careful configuration of the `failover_timeout` is required. This is the only way to avoid a split brain in case of a network partition where the Primary cannot write its heartbeats in time anymore.
    * In short, `failover_timeout` should be tuned to be `heartbeat_timeout * failure_threshold + safety_margin`. This `safety_margin` will vary by use case. For example, a CNPG cluster may need 30 seconds to demote a Primary to Replica when fencing is triggered, so `safety_margin` should be at least 30s in that setup.

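The tuning rule above is simple arithmetic. A sketch with illustrative names, using the example values (1s timeout, threshold of 2, a CNPG-style 30s margin):

```rust
use std::time::Duration;

// failover_timeout >= heartbeat_timeout * failure_threshold + safety_margin,
// where safety_margin is deployment-specific (e.g. ~30s for a CNPG demotion).
fn failover_timeout(
    heartbeat_timeout: Duration,
    failure_threshold: u32,
    safety_margin: Duration,
) -> Duration {
    heartbeat_timeout * failure_threshold + safety_margin
}

fn main() {
    // 1s * 2 + 30s = 32s before the replica may promote in a CNPG-style setup.
    let t = failover_timeout(Duration::from_secs(1), 2, Duration::from_secs(30));
    assert_eq!(t, Duration::from_secs(32));
}
```
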
Since we forcibly fail timeouts after `heartbeat_timeout`, we are guaranteed that the primary will have **started** the fencing process after `heartbeat_timeout * failure_threshold`.

But, in a network split scenario where the failed primary is still accessible by clients but cannot write its heartbeat successfully, there is no way to know if the demotion has actually **completed**.

For example, in a CNPG cluster, the failed Primary agent will attempt to change the CNPG cluster state to read-only. But if anything fails after that attempt (permission error, k8s API failure, CNPG bug, etc.) it is possible that the PostgreSQL instance keeps accepting writes.

While this is not a theoretical failure of the agent's algorithm, it is a practical failure where data corruption occurs.

This can be fixed by detecting the demotion failure and escalating the fencing procedure's aggressiveness. Harmony being an infrastructure orchestrator, it can easily exert radical measures if given the proper credentials, such as forcibly powering off a server, disconnecting its network in the switch configuration, or forcibly killing a pod/container/process.

However, these details are out of scope of this algorithm, as they simply fall under the "fencing procedure".

The implementation of the fencing procedure itself is not relevant here. This algorithm's responsibility stops at calling the fencing procedure in the appropriate situation.

#### 4. The Demotion Handshake (Return to Normalcy)

When the original Primary recovers:

1. It becomes healthy locally but sees `current_primary = Replica`. It waits.
2. The Replica (current leader) detects the Original Primary is back (via NATS heartbeats).
3. The Replica performs a **Clean Demotion**:
    * Stops DB.
    * Writes `current_primary = None` to NATS.
4. The Original Primary sees `current_primary = None` and can launch the promotion procedure.

Depending on the implementation, the promotion procedure may require a transition phase. Typically, for a PostgreSQL use case, the promoting primary will make sure it has caught up on WAL replication before starting to accept writes.

---

### Failure Modes & Behavior Analysis

#### Case 1: Immediate Outage (Power Cut)

* **Primary:** Dies instantly. Fencing is implicit (machine is off).
* **Replica:** Waits for `failover_timeout` (5s). Sees staleness. Promotes self.
* **Outcome:** Clean failover after 5s.

// TODO detail what happens when the primary comes back up. We will likely have to tie PostgreSQL's lifecycle (liveness/readiness probes) with the agent to ensure it does not come back up as primary.

#### Case 2: High Network Latency on the Primary (The "Split Brain" Trap)

* **Scenario:** Network latency spikes to 5s on the Primary, still below `heartbeat_timeout` on the Replica.
* **T=0 to T=2 (Primary):** Tries to write. Latency (5s) > Timeout (1s). Fails twice.
* **T=2 (Primary):** `consecutive_failures` = 2. **Primary Fences Self.** (Service is DOWN).
* **T=2 to T=5 (Cluster):** **Read-Only Phase.** No Primary exists.
* **T=5 (Replica):** `failover_timeout` reached. Replica promotes self.
* **Outcome:** Safe failover. The "Read-Only Gap" (T=2 to T=5) ensures no Split Brain occurred.

#### Case 3: Replica Network Lag (False Positive)

* **Scenario:** Replica has high latency, greater than `failover_timeout`; Primary is fine.
* **Replica:** Thinks Primary is dead. Tries to promote by setting `cluster_state.current_primary = replica_id`.
* **NATS:** Rejects the write because the Primary is still updating the sequence numbers successfully.
* **Outcome:** Promotion denied. Primary stays leader.

#### Case 4: Network Instability (Flapping)

* **Scenario:** Intermittent packet loss.
* **Primary:** Fails 1 heartbeat, succeeds the next. `consecutive_failures` resets.
* **Replica:** Sees a slight delay in updates, but never reaches `failover_timeout`.
* **Outcome:** No Fencing, No Promotion. System rides out the noise.

## Contextual notes

* Clock skew: Tokio relies on monotonic clocks. This means that `tokio::time::sleep(...)` will not be affected by system clock corrections (such as NTP). But monotonic clocks are known to jump forward in some cases, such as VM live migrations. This could mean a false timeout of a single heartbeat. If `failure_threshold = 1`, this can mean a false negative on the node's health, and a potentially useless demotion.

@@ -1,107 +0,0 @@
|
||||
### Context & Use Case
|
||||
We are implementing a High Availability (HA) Failover Strategy for decentralized "Micro Data Centers." The core challenge is managing stateful workloads (PostgreSQL) over unreliable networks.
|
||||
|
||||
We solve this using a **Local Fencing First** approach, backed by **NATS JetStream Strict Ordering** for the final promotion authority.
|
||||
|
||||
In CAP theorem terms, we are developing a CP system, intentionally sacrificing availability. In practical terms, we expect an average of two primary outages per year, with a failover delay of around 2 minutes. This translates to an uptime of over five nines. To be precise, 2 outages * 2 minutes = 4 minutes per year = 99.99924% uptime.
|
||||
|
||||
### The Algorithm: Local Fencing & Remote Promotion
|
||||
|
||||
The safety (data consistency) of the system relies on the time gap between the **Primary giving up (Fencing)** and the **Replica taking over (Promotion)**.
|
||||
|
||||
To avoid clock skew issues between agents and datastore (nats), all timestamps comparisons will be done using jetstream metadata. I.E. a harmony agent will never use `Instant::now()` to get a timestamp, it will use `my_last_heartbeat.metadata.timestamp` (conceptually).
|
||||
|
||||
#### 1. Configuration
|
||||
* `heartbeat_timeout` (e.g., 1s): Max time allowed for a NATS write/DB check.
|
||||
* `failure_threshold` (e.g., 2): Consecutive failures before self-fencing.
|
||||
* `failover_timeout` (e.g., 5s): Time since last NATS update of Primary heartbeat before Replica promotes.
|
||||
* This timeout must be carefully configured to allow enough time for the primary to fence itself (after `heartbeat_timeout * failure_threshold`) BEFORE the replica gets promoted to avoid a split brain with two primaries.
|
||||
* Implementing this will rely on the actual deployment configuration. For example, a CNPG based PostgreSQL cluster might require a longer gap (such as 30s) than other technologies.
|
||||
* Expires when `replica_heartbeat.metadata.timestamp - primary_heartbeat.metadata.timestamp > failover_timeout`
|
||||
|
||||
#### 2. The Primary (Self-Preservation)
|
||||
|
||||
The Primary is aggressive about killing itself.
|
||||
|
||||
* It attempts a heartbeat.
|
||||
* If the network latency > `heartbeat_timeout`, the attempt is **cancelled locally** because the heartbeat did not make it back in time.
|
||||
* This counts as a failure and increments the `consecutive_failures` counter.
|
||||
* If `consecutive_failures` hit the threshold, **FENCING occurs immediately**. The database is stopped.
|
||||
|
||||
This means that the Primary will fence itself after `heartbeat_timeout * failure_threshold`.
|
||||
|
||||
#### 3. The Replica (The Watchdog)
|
||||
|
||||
The Replica is patient.
|
||||
|
||||
* It watches the NATS stream to measure if `replica_heartbeat.metadata.timestamp - primary_heartbeat.metadata.timestamp > failover_timeout`
|
||||
* It only attempts promotion if the `failover_timeout` (5s) has passed.
|
||||
* **Crucial:** Careful configuration of the failover_timeout is required. This is the only way to avoid a split brain in case of a network partition where the Primary cannot write its heartbeats in time anymore.
|
||||
* In short, `failover_timeout` should be tuned to be `heartbeat_timeout * failure_threshold + safety_margin`. This `safety_margin` will vary by use case. For example, a CNPG cluster may need 30 seconds to demote a Primary to Replica when fencing is triggered, so `safety_margin` should be at least 30s in that setup.
|
||||
|
||||
Since we forcibly fail timeouts after `heartbeat_timeout`, we are guaranteed that the primary will have **started** the fencing process after `heartbeat_timeout * failure_threshold`.
|
||||
|
||||
But, in a network split scenario where the failed primary is still accessible by clients but cannot write its heartbeat successfully, there is no way to know if the demotion has actually **completed**.
|
||||
|
||||
For example, in a CNPG cluster, the failed Primary agent will attempt to change the CNPG cluster state to read-only. But if anything fails after that attempt (permission error, k8s api failure, CNPG bug, etc) it is possible that the PostgreSQL instance keeps accepting writes.
|
||||
|
||||
While this is not a theoretical failure of the agent's algorithm, this is a practical failure where data corruption occurs.
|
||||
|
||||
This can be fixed by detecting the demotion failure and escalating the fencing procedure aggressiveness. Harmony being an infrastructure orchestrator, it can easily exert radical measures if given the proper credentials, such as forcibly powering off a server, disconnecting its network in the switch configuration, forcibly kill a pod/container/process, etc.
|
||||
|
||||
However, these details are out of scope of this algorithm, as they simply fall under the "fencing procedure".
|
||||
|
||||
The implementation of the fencing procedure itself is not relevant. This algorithm's responsibility stops at calling the fencing procedure in the appropriate situation.
|
||||
|
||||
#### 4. The Demotion Handshake (Return to Normalcy)
|
||||
|
||||
When the original Primary recovers:
|
||||
|
||||
1. It becomes healthy locally but sees `current_primary = Replica`. It waits.
|
||||
2. The Replica (current leader) detects the Original Primary is back (via NATS heartbeats).
|
||||
3. Replica performs a **Clean Demotion**:
|
||||
* Stops DB.
|
||||
* Writes `current_primary = None` to NATS.
|
||||
4. Original Primary sees `current_primary = None` and can launch the promotion procedure.
|
||||
|
||||
Depending on the implementation, the promotion procedure may require a transition phase. Typically, for a PostgreSQL use case, the promoting primary will make sure it has caught up on WAL replication before starting to accept writes.

---

### Failure Modes & Behavior Analysis

#### Case 1: Immediate Outage (Power Cut)

* **Primary:** Dies instantly. Fencing is implicit (machine is off).
* **Replica:** Waits for `failover_timeout` (5s). Sees staleness. Promotes self.
* **Outcome:** Clean failover after 5s.

// TODO detail what happens when the primary comes back up. We will likely have to tie PostgreSQL's lifecycle (liveness/readiness probes) with the agent to ensure it does not come back up as primary.

#### Case 2: High Network Latency on the Primary (The "Split Brain" Trap)

* **Scenario:** Network latency spikes to 5s on the Primary, still below `heartbeat_timeout` on the Replica.
* **T=0 to T=2 (Primary):** Tries to write. Latency (5s) > Timeout (1s). Fails twice.
* **T=2 (Primary):** `consecutive_failures` = 2. **Primary Fences Self.** (Service is DOWN).
* **T=2 to T=5 (Cluster):** **Read-Only Phase.** No Primary exists.
* **T=5 (Replica):** `failover_timeout` reached. Replica promotes self.
* **Outcome:** Safe failover. The "Read-Only Gap" (T=2 to T=5) ensures no Split Brain occurred.

#### Case 3: Replica Network Lag (False Positive)

* **Scenario:** Replica has high latency, greater than `failover_timeout`; Primary is fine.
* **Replica:** Thinks Primary is dead. Tries to promote by setting `cluster_state.current_primary = replica_id`.
* **NATS:** Rejects the write because the Primary is still updating the sequence numbers successfully.
* **Outcome:** Promotion denied. Primary stays leader.

#### Case 4: Network Instability (Flapping)

* **Scenario:** Intermittent packet loss.
* **Primary:** Fails 1 heartbeat, succeeds the next. `consecutive_failures` resets.
* **Replica:** Sees a slight delay in updates, but never reaches `failover_timeout`.
* **Outcome:** No Fencing, No Promotion. System rides out the noise.

## Contextual notes

* Clock skew: Tokio relies on monotonic clocks, so `tokio::time::sleep(...)` is not affected by system clock corrections (such as NTP). However, monotonic clocks are known to jump forward in some cases, such as VM live migrations. This can cause a false timeout of a single heartbeat. If `failure_threshold = 1`, a single false timeout can mark a healthy node as failed and trigger a useless demotion.

* `heartbeat_timeout == heartbeat_interval`: We intentionally do not provide two separate settings for the timeout before a heartbeat is considered failed and the interval between heartbeats. In some configurations where low network latency is required, it could make sense to pair a small `heartbeat_timeout = 50ms` with a larger `heartbeat_interval = 2s`, but we do not have a practical use case for it yet. And a timeout larger than the interval does not make sense in any situation we can think of at the moment. So we decided to use a single value for both, which makes the algorithm easier to reason about and implement.
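The self-fencing rule described in Cases 2 and 4 reduces to pure counter logic. The following is an illustrative sketch only — `FencingState` and its methods are invented names, not the Harmony Agent's actual API; the real loop would drive this state from tokio timers and NATS write acks:

```rust
/// Tracks consecutive heartbeat-write failures on the Primary.
struct FencingState {
    consecutive_failures: u32,
    failure_threshold: u32,
}

impl FencingState {
    fn new(failure_threshold: u32) -> Self {
        Self { consecutive_failures: 0, failure_threshold }
    }

    /// Feed in the result of each heartbeat write attempt.
    /// Returns true when the Primary must fence (self-demote).
    fn record(&mut self, write_succeeded: bool) -> bool {
        if write_succeeded {
            // A single success resets the counter (Case 4: flapping).
            self.consecutive_failures = 0;
            false
        } else {
            self.consecutive_failures += 1;
            self.consecutive_failures >= self.failure_threshold
        }
    }
}
```

With `failure_threshold = 2`, this reproduces Case 2 (two straight failures fence the Primary) and Case 4 (alternating failure/success never fences).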
@@ -1,95 +0,0 @@

# Architecture Decision Record: Staleness-Based Failover Mechanism & Observability

**Status:** Proposed

**Date:** 2026-01-09

**Follows:** [016-Harmony-Agent-And-Global-Mesh-For-Decentralized-Workload-Management.md](https://git.nationtech.io/NationTech/harmony/raw/branch/master/adr/016-Harmony-Agent-And-Global-Mesh-For-Decentralized-Workload-Management.md)

## Context

In ADR 016, we established the **Harmony Agent** and the **Global Orchestration Mesh** (powered by NATS JetStream) as the foundation for our decentralized infrastructure. We defined the high-level need for a `FailoverStrategy` that can support both financial consistency (CP) and AI availability (AP).

However, a specific implementation challenge remains: **How do we reliably detect node failure without losing the ability to debug the event later?**

Standard distributed systems often use "Key Expiration" (TTL) for heartbeats. If a key disappears, the node is presumed dead. While simple, this approach is catastrophic for post-mortem analysis. When the key expires, the evidence of *when* and *how* the failure occurred evaporates.

For NationTech’s vision of **Humane Computing**—where micro datacenters might be heating a family home or running a local business—reliability and diagnosability are paramount. If a cluster fails over, we owe it to the user to provide a clear, historical log of exactly what happened. We cannot build a "wonderful future for computers" on ephemeral, untraceable errors.

## Decision

We will implement a **Staleness Detection** mechanism rather than a Key Expiration mechanism. We will leverage NATS JetStream Key-Value (KV) stores with **History Enabled** to create an immutable audit trail of cluster health.

### 1. The "Black Box" Flight Recorder (NATS Configuration)

We will utilize a persistent NATS KV bucket named `harmony_failover`.

* **Storage:** File (Persistent).
* **History:** Set to `64` (or higher). This allows us to query the last 64 heartbeat entries to visualize the exact degradation of the primary node before failure.
* **TTL:** None. Data never disappears; it only becomes "stale."

### 2. Data Structures

We will define two primary schemas to manage the state.

**A. The Rules of Engagement (`cluster_config`)**

This persistent key defines the behavior of the mesh. It allows us to tune failover sensitivity dynamically without redeploying the Agent binary.

```json
{
  "primary_site_id": "site-a-basement",
  "replica_site_id": "site-b-cloud",
  "failover_timeout_ms": 5000,   // Time before Replica takes over
  "heartbeat_interval_ms": 1000  // Frequency of Primary updates
}
```

> **Note:** The location for this configuration data structure is TBD. See https://git.nationtech.io/NationTech/harmony/issues/206

**B. The Heartbeat (`primary_heartbeat`)**

The Primary writes this; the Replica watches it.

```json
{
  "site_id": "site-a-basement",
  "status": "HEALTHY",
  "counter": 10452,
  "timestamp": 1704661549000
}
```
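In Rust, these two schemas map naturally onto plain structs. This is a hedged sketch: the real agent would add `serde::{Serialize, Deserialize}` derives for the JSON round-trip, omitted here to keep the snippet dependency-free, and the type/field names are ours, not a confirmed Harmony API:

```rust
/// Mirrors the `cluster_config` KV entry.
struct ClusterConfig {
    primary_site_id: String,
    replica_site_id: String,
    /// Time before the Replica takes over, in milliseconds.
    failover_timeout_ms: u64,
    /// Frequency of Primary updates, in milliseconds.
    heartbeat_interval_ms: u64,
}

/// Mirrors the `primary_heartbeat` KV entry.
struct PrimaryHeartbeat {
    site_id: String,
    status: String,
    counter: u64,
    /// Unix epoch milliseconds.
    timestamp: u64,
}
```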

### 3. The Failover Algorithm

**The Primary (Site A) Logic:**

The Primary's ability to write to the mesh is its "License to Operate."

1. **Write Loop:** Attempts to write `primary_heartbeat` every `heartbeat_interval_ms`.
2. **Self-Preservation (Fencing):** If the write fails (NATS ack timeout or NATS unreachable), the Primary **immediately self-demotes**. It assumes it is network-isolated. This prevents Split Brain scenarios where a partitioned Primary continues to accept writes while the Replica promotes itself.

**The Replica (Site B) Logic:**

The Replica acts as the watchdog.

1. **Watch:** Subscribes to updates on `primary_heartbeat`.
2. **Staleness Check:** Maintains a local timer. Every time a heartbeat arrives, the timer resets.
3. **Promotion:** If the timer exceeds `failover_timeout_ms`, the Replica declares the Primary dead and promotes itself to Leader.
4. **Yielding:** If the Replica is Leader, but suddenly receives a valid, new heartbeat from the configured `primary_site_id` (indicating the Primary has recovered), the Replica will voluntarily **demote** itself to restore the preferred topology.
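The Replica's staleness check (steps 2–3) is a timer comparison. A minimal sketch with `now` passed in explicitly so the rule is easy to test; the names are illustrative, and the real watchdog would feed this from a `tokio::time` interval and the NATS KV watch:

```rust
use std::time::{Duration, Instant};

/// Sketch of the Replica's staleness watchdog.
struct ReplicaWatchdog {
    last_heartbeat: Instant,
    failover_timeout: Duration,
}

impl ReplicaWatchdog {
    /// Step 2: every arriving heartbeat resets the timer.
    fn on_heartbeat(&mut self, now: Instant) {
        self.last_heartbeat = now;
    }

    /// Step 3: promote once the timer exceeds `failover_timeout`.
    fn should_promote(&self, now: Instant) -> bool {
        now.duration_since(self.last_heartbeat) > self.failover_timeout
    }
}
```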
## Rationale

**Observability as a First-Class Citizen**

By keeping the last 64 heartbeats, we can run `nats kv history` to see the exact timeline. Did the Primary stop suddenly (crash), or did the heartbeats become erratic and slow before stopping (network congestion)? This data is critical for optimizing the "Micro Data Centers" described in our vision, where internet connections in residential areas may vary in quality.

**Energy Efficiency & Resource Optimization**

NationTech aims to "maximize the value of our energy." A "flapping" cluster (constantly failing over and back) wastes immense energy in data re-synchronization and startup costs. By making `failover_timeout_ms` configurable via `cluster_config`, we can tune a cluster heating a greenhouse to be less sensitive (slower failover is fine) compared to a cluster running a payment gateway.

**Decentralized Trust**

This architecture relies on NATS as the consensus engine. If the Primary is part of the NATS majority, it lives. If it isn't, it dies. This removes ambiguity and allows us to scale to thousands of independent sites without a central "God mode" controller managing every single failover.

## Consequences

**Positive**

* **Auditability:** Every failover event leaves a permanent trace in the KV history.
* **Safety:** The "Write Ack" check on the Primary provides a strong guarantee against Split Brain in `AbsoluteConsistency` mode.
* **Dynamic Tuning:** We can adjust timeouts for specific environments (e.g., high-latency satellite links) by updating a JSON key, requiring no downtime.

**Negative**

* **Storage Overhead:** Keeping history requires marginally more disk space on the NATS servers, though for 64 small JSON payloads this is negligible.
* **Clock Skew:** While we rely on NATS server-side timestamps for ordering, extreme clock skew on the client side could confuse the debug logs (though not the failover logic itself).

## Alignment with Vision

This architecture supports the NationTech goal of a **"Beautifully Integrated Design."** It takes the complex, high-stakes problem of distributed consensus and wraps it in a mechanism that is robust enough for enterprise banking yet flexible enough to manage a basement server heating a swimming pool. It bridges the gap between the reliability of Web2 clouds and the decentralized nature of Web3 infrastructure.
@@ -1,181 +0,0 @@

# Harmony Architecture — Three Open Challenges

Three problems that, if solved well, would make Harmony the most capable infrastructure automation framework in existence.

## 1. Topology Evolution During Deployment

### The problem

A bare-metal OKD deployment is a multi-hour process where the infrastructure's capabilities change as the deployment progresses:

```
Phase 0: Network only   → OPNsense reachable, Brocade reachable, no hosts
Phase 1: Discovery      → PXE boots work, hosts appear via mDNS, no k8s
Phase 2: Bootstrap      → openshift-install running, API partially available
Phase 3: Control plane  → k8s API available, operators converging, no workers
Phase 4: Workers        → Full cluster, apps can be deployed
Phase 5: Day-2          → Monitoring, alerting, tenant onboarding
```

Today, `HAClusterTopology` implements _all_ capability traits from the start. If a Score calls `k8s_client()` during Phase 0, it hits `DummyInfra`, which panics. The type system says "this is valid" but the runtime says "this will crash."

### Why it matters

- Scores that require k8s compile and register happily at Phase 0, then panic if accidentally executed too early
- The pipeline is ordered by convention (Stage 01 → 02 → 03 → ...) but nothing enforces that Stage 04 can't run before Stage 02
- Adding new capabilities (like "cluster has monitoring installed") requires editing the topology struct, not declaring that the capability was acquired

### Design direction

The topology should evolve through **phases** where capabilities are _acquired_, not assumed. Two possible approaches:

**A. Phase-gated topology (runtime)**

The topology tracks which phase it's in. Capability methods check the phase before executing and return a meaningful error instead of panicking:

```rust
impl K8sclient for HAClusterTopology {
    async fn k8s_client(&self) -> Result<Arc<K8sClient>, String> {
        if self.phase < Phase::ControlPlaneReady {
            return Err(format!(
                "k8s API not available yet (current phase: {:?})",
                self.phase
            ));
        }
        // ... actual implementation
    }
}
```

Scores that fail due to phase mismatch get a clear error message, not a panic. The Maestro can validate phase requirements before executing a Score.

**B. Typestate topology (compile-time)**

Use Rust's type system to make invalid phase transitions unrepresentable:

```rust
struct Topology<P: Phase> { ... }

impl Topology<NetworkReady> {
    fn bootstrap(self) -> Topology<Bootstrapping> { ... }
}
impl Topology<Bootstrapping> {
    fn promote(self) -> Topology<ClusterReady> { ... }
}

// Only ClusterReady implements K8sclient
impl K8sclient for Topology<ClusterReady> { ... }
```

This is the "correct" Rust approach but requires significant refactoring and may be too rigid for real deployments where phases overlap.

**Recommendation**: Start with (A) — runtime phase tracking. It's additive (no breaking changes), catches the DummyInfra panic problem immediately, and provides the data needed for (B) later.
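Approach (A) hinges on phases being ordered. Rust's derived `Ord` on a C-like enum follows declaration order, which gives the phase gate almost for free. A sketch under the assumption that phases are linear — the variant names mirror the phase list above, but the actual Harmony types may differ:

```rust
/// One possible shape for runtime phase tracking (approach A).
/// Derived `Ord` on a C-like enum follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Phase {
    NetworkOnly,
    Discovery,
    Bootstrap,
    ControlPlaneReady,
    WorkersReady,
    Day2,
}

/// The check every phase-gated capability method would run first.
fn require_phase(current: Phase, needed: Phase) -> Result<(), String> {
    if current < needed {
        Err(format!(
            "capability not available yet (current phase: {current:?}, needs {needed:?})"
        ))
    } else {
        Ok(())
    }
}
```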
---

## 2. Runtime Plan & Validation Phase

### The problem

Harmony validates Scores at compile time: if a Score requires `DhcpServer + TftpServer`, the topology must implement both traits or the program won't compile. This is powerful but insufficient.

What compile time _cannot_ check:

- Is the OPNsense API actually reachable right now?
- Does VLAN 100 already exist (so we can skip creating it)?
- Is there already a DHCP entry for this MAC address?
- Will this firewall rule conflict with an existing one?
- Is there enough disk space on the TFTP server for the boot images?

Today, these are discovered at execution time, deep inside an Interpret's `execute()` method. A failure at minute 45 of a deployment is expensive.

### Why it matters

- No way to preview what Harmony will do before it does it
- No way to detect conflicts or precondition failures early
- Operators must read logs to understand what happened — there's no structured "here's what I did" report
- Re-running a deployment is scary because you don't know what will be re-applied vs skipped

### Design direction

Add a **validate** phase to the Score/Interpret lifecycle:

```rust
#[async_trait]
pub trait Interpret<T>: Debug + Send {
    /// Check preconditions and return what this interpret WOULD do.
    /// Default implementation returns "will execute" (opt-in validation).
    async fn validate(
        &self,
        inventory: &Inventory,
        topology: &T,
    ) -> Result<ValidationReport, InterpretError> {
        Ok(ValidationReport::will_execute(self.get_name()))
    }

    /// Execute the interpret (existing method, unchanged).
    async fn execute(
        &self,
        inventory: &Inventory,
        topology: &T,
    ) -> Result<Outcome, InterpretError>;

    // ... existing methods
}
```

A `ValidationReport` would contain:

- **Status**: `WillCreate`, `WillUpdate`, `WillDelete`, `AlreadyApplied`, `Blocked(reason)`
- **Details**: human-readable description of planned changes
- **Preconditions**: list of checks performed and their results
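One possible shape for that report type, matching the status list above. The field and method names are illustrative assumptions; the real type would live alongside `Interpret` and likely derive `Serialize` so the plan can be rendered for the operator:

```rust
#[derive(Debug, Clone, PartialEq)]
enum ValidationStatus {
    WillCreate,
    WillUpdate,
    WillDelete,
    AlreadyApplied,
    Blocked(String),
}

#[derive(Debug, Clone)]
struct ValidationReport {
    interpret_name: String,
    status: ValidationStatus,
    /// Human-readable description of planned changes.
    details: Vec<String>,
    /// (check name, passed) pairs for each precondition evaluated.
    preconditions: Vec<(String, bool)>,
}

impl ValidationReport {
    /// Default report for Interprets that opt out of validation,
    /// modeled here as a `WillCreate` with no precondition checks.
    fn will_execute(name: String) -> Self {
        Self {
            interpret_name: name,
            status: ValidationStatus::WillCreate,
            details: vec![],
            preconditions: vec![],
        }
    }

    fn is_blocked(&self) -> bool {
        matches!(self.status, ValidationStatus::Blocked(_))
    }
}
```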
The Maestro would run validation for all registered Scores before executing any of them, producing a plan that the operator reviews.

This is opt-in: Scores that don't implement `validate()` get a default "will execute" report. Over time, each Score adds validation logic. The OPNsense Scores are ideal first candidates since they can query current state via the API.

### Relationship to state

This approach does _not_ require a state file. Validation queries the infrastructure directly — the same philosophy Harmony already follows. The "plan" is computed fresh every time by asking the infrastructure what exists right now.

---

## 3. TUI as Primary Interface

### The problem

The TUI (`harmony_tui`) exists with ratatui, crossterm, and tui-logger, but it's underused. The CLI (`harmony_cli`) is the primary interface. During a multi-hour deployment, operators watch scrolling log output with no structure, no ability to drill into a specific Score's progress, and no overview of where they are in the pipeline.

### Why it matters

- Log output during interactive prompts corrupts the terminal
- No way to see "I'm on Stage 3 of 7, 2 hours elapsed, 3 Scores completed successfully"
- No way to inspect a Score's configuration or outcome without reading logs
- The pipeline feels like a black box during execution

### Design direction

The TUI should provide three views:

**Pipeline view** — the default. Shows the ordered list of Scores with their status:

```
OKD HA Cluster Deployment            [Stage 3/7 — 1h 42m elapsed]
──────────────────────────────────────────────────────────────────
✅ OKDIpxeScore                        2m 14s
✅ OKDSetup01InventoryScore            8m 03s
✅ OKDSetup02BootstrapScore           34m 21s
▶  OKDSetup03ControlPlaneScore        ... running
⏳ OKDSetupPersistNetworkBondScore
⏳ OKDSetup04WorkersScore
⏳ OKDSetup06InstallationReportScore
```

**Detail view** — press Enter on a Score to see its Outcome details, sub-score executions, and logs.

**Log view** — the current tui-logger panel, filtered to the selected Score.

The TUI already has the Score widget and log integration. What's missing is the pipeline-level orchestration view and the duration/status data — which the `Score::interpret` timing we just added now provides.

### Immediate enablers

The instrumentation event system (`HarmonyEvent`) already captures start/finish with execution IDs. The TUI subscriber just needs to:

1. Track the ordered list of Scores from the Maestro
2. Update status as `InterpretExecutionStarted`/`Finished` events arrive
3. Render the pipeline view using ratatui

This doesn't require architectural changes — it's a TUI feature built on existing infrastructure.
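The event-folding step (2) above can be sketched independently of ratatui. The enum below is a hypothetical stand-in for the real `HarmonyEvent` type, which carries execution IDs and richer payloads; only the status-tracking idea is shown:

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the instrumentation events the TUI would consume.
enum PipelineEvent {
    InterpretExecutionStarted { score: String },
    InterpretExecutionFinished { score: String, success: bool },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ScoreStatus {
    Pending,
    Running,
    Succeeded,
    Failed,
}

/// Fold one event into the per-Score status map the pipeline view renders.
fn apply_event(statuses: &mut BTreeMap<String, ScoreStatus>, event: PipelineEvent) {
    match event {
        PipelineEvent::InterpretExecutionStarted { score } => {
            statuses.insert(score, ScoreStatus::Running);
        }
        PipelineEvent::InterpretExecutionFinished { score, success } => {
            let status = if success { ScoreStatus::Succeeded } else { ScoreStatus::Failed };
            statuses.insert(score, status);
        }
    }
}
```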
@@ -1,229 +0,0 @@

# Harmony Coding Guide

Harmony is an infrastructure automation framework. It is **code-first and code-only**: operators write Rust programs to declare and drive infrastructure, rather than YAML files or DSL configs. Good code here means a good operator experience.

### Concrete context

This guide uses the KVM module as its running example. The style is easy to follow in that context and translates well to the other modules and contexts Harmony manages, such as OPNsense and Kubernetes.

## Core Philosophy

### High-level functions over raw primitives

Callers should not need to know about underlying protocols, XML schemas, or API quirks. A function that deploys a VM should accept meaningful parameters like CPU count, memory, and network name — not XML strings.

```rust
// Bad: caller constructs XML and passes it to a thin wrapper
let xml = format!(r#"<domain type='kvm'>...</domain>"#, name, memory_kb, ...);
executor.create_vm(&xml).await?;

// Good: caller describes intent, the module handles representation
executor.define_vm(&VmConfig::builder("my-vm")
    .cpu(4)
    .memory_gb(8)
    .disk(DiskConfig::new(50))
    .network(NetworkRef::named("mylan"))
    .boot_order([BootDevice::Network, BootDevice::Disk])
    .build())
    .await?;
```

The module owns the XML, the virsh invocations, the API calls — not the caller.

### Use the right abstraction layer

Prefer native library bindings over shelling out to CLI tools. The `virt` crate provides direct libvirt bindings and should be used instead of spawning `virsh` subprocesses.

- CLI subprocess calls are fragile: stdout/stderr parsing, exit codes, quoting, PATH differences
- Native bindings give typed errors, no temp files, no shell escaping
- `virt::connect::Connect` opens a connection; `virt::domain::Domain` manages VMs; `virt::network::Network` manages virtual networks

### Keep functions small and well-named

Each function should do one thing. If a function is doing two conceptually separate things, split it. Function names should read like plain English: `ensure_network_active`, `define_vm`, `vm_is_running`.

### Prefer short modules over large files

Group related types and functions by concept. A module that handles one resource (e.g., network, domain, storage) is better than a single file for everything.

---

## Error Handling

### Use `thiserror` for all error types

Define error types with `thiserror::Error`. This removes the boilerplate of implementing `Display` and `std::error::Error` by hand, keeps error messages close to their variants, and makes types easy to extend.

```rust
// Bad: hand-rolled Display + std::error::Error
#[derive(Debug)]
pub enum KVMError {
    ConnectionError(String),
    VMNotFound(String),
}

impl std::fmt::Display for KVMError { ... }
impl std::error::Error for KVMError {}

// Good: derive Display via thiserror
#[derive(thiserror::Error, Debug)]
pub enum KVMError {
    #[error("connection failed: {0}")]
    ConnectionFailed(String),
    #[error("VM not found: {name}")]
    VmNotFound { name: String },
}
```

### Make bubbling errors easy with `?` and `From`

`?` works on any error type for which there is a `From` impl. Add `From` conversions from lower-level errors into your module's error type so callers can use `?` without boilerplate.

With `thiserror`, wrapping a foreign error is one line:

```rust
#[derive(thiserror::Error, Debug)]
pub enum KVMError {
    #[error("libvirt error: {0}")]
    Libvirt(#[from] virt::error::Error),

    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
}
```

This means a call that returns `virt::error::Error` can be `?`-propagated into a `Result<_, KVMError>` without any `.map_err(...)`.
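The same propagation works with a hand-written `From` impl, which is exactly what `#[from]` expands to. A minimal sketch — the `read_domain_xml` helper is invented for illustration:

```rust
use std::io;

#[derive(Debug)]
enum KvmError {
    Io(io::Error),
}

// Equivalent of `#[from]`: one conversion impl is all `?` needs.
impl From<io::Error> for KvmError {
    fn from(e: io::Error) -> Self {
        KvmError::Io(e)
    }
}

/// Hypothetical helper: the `?` converts the io::Error into KvmError
/// through the From impl above, with no `.map_err(...)` at the call site.
fn read_domain_xml(path: &str) -> Result<String, KvmError> {
    let xml = std::fs::read_to_string(path)?;
    Ok(xml)
}
```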
### Typed errors over stringly-typed errors

Avoid `Box<dyn Error>` or `String` as error return types in library code. Callers need to distinguish errors programmatically — `KVMError::VmAlreadyExists` is actionable; `"VM already exists: foo"` as a `String` is not.

At binary entry points (e.g., `main`) it is acceptable to convert to `String` or `anyhow::Error` for display.

---

## Logging

### Use the `log` crate macros

All log output must go through the `log` crate. Never use `println!`, `eprintln!`, or `dbg!` in library code. This makes output compatible with any logging backend (env_logger, tracing, structured logging, etc.).

```rust
// Bad
println!("Creating VM: {}", name);

// Good
use log::{info, debug, warn};
info!("Creating VM: {name}");
debug!("VM XML:\n{xml}");
warn!("Network already active, skipping creation");
```

Use the right level:

| Level | When to use |
|---------|-------------|
| `error` | Unrecoverable failures (before returning Err) |
| `warn` | Recoverable issues, skipped steps |
| `info` | High-level progress events visible in normal operation |
| `debug` | Detailed operational info useful for debugging |
| `trace` | Very granular, per-iteration or per-call data |

Log before significant operations and after unexpected conditions. Do not log inside tight loops at `info` level.

---

## Types and Builders

### Derive `Serialize` on all public domain types

All public structs and enums that represent configuration or state should derive `serde::Serialize`. Add `Deserialize` when round-trip serialization is needed.

### Builder pattern for complex configs

When a type has more than three fields or optional fields, provide a builder. The builder pattern allows named, incremental construction without positional arguments.

```rust
let config = VmConfig::builder("bootstrap")
    .cpu(4)
    .memory_gb(8)
    .disk(DiskConfig::new(50).labeled("os"))
    .disk(DiskConfig::new(100).labeled("data"))
    .network(NetworkRef::named("harmonylan"))
    .boot_order([BootDevice::Network, BootDevice::Disk])
    .build();
```
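A minimal builder skeleton behind that kind of call site might look like the following — trimmed to three fields as an illustration; the real `VmConfig` carries disks, networks, and boot order:

```rust
/// Trimmed sketch of a config type with a consuming builder.
#[derive(Debug, Clone, PartialEq)]
pub struct VmConfig {
    name: String,
    cpu: u32,
    memory_gb: u32,
}

pub struct VmConfigBuilder {
    name: String,
    cpu: u32,
    memory_gb: u32,
}

impl VmConfig {
    pub fn builder(name: &str) -> VmConfigBuilder {
        // Defaults for everything the caller does not set explicitly.
        VmConfigBuilder { name: name.to_string(), cpu: 1, memory_gb: 1 }
    }
}

impl VmConfigBuilder {
    pub fn cpu(mut self, cpu: u32) -> Self {
        self.cpu = cpu;
        self
    }

    pub fn memory_gb(mut self, memory_gb: u32) -> Self {
        self.memory_gb = memory_gb;
        self
    }

    pub fn build(self) -> VmConfig {
        VmConfig { name: self.name, cpu: self.cpu, memory_gb: self.memory_gb }
    }
}
```

Each setter takes `self` by value and returns it, which is what makes the chained `.cpu(4).memory_gb(8).build()` style work without mutation at the call site.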
### Avoid `pub` fields on config structs

Expose data through methods or the builder, not raw field access. This preserves the ability to validate, rename, or change representation without breaking callers.

---

## Async

### Use `tokio` for all async runtime needs

All async code runs on tokio. Use `tokio::spawn`, `tokio::time`, etc. Use `#[async_trait]` for traits with async methods.

### No blocking in async context

Never call blocking I/O (file I/O, network, process spawn) directly in an async function. Use `tokio::fs`, `tokio::process`, or `tokio::task::spawn_blocking` as appropriate.

---

## Module Structure

### Follow the `Score` / `Interpret` pattern

Modules that represent deployable infrastructure should implement `Score<T: Topology>` and `Interpret<T>`:

- `Score` is the serializable, clonable configuration declaring *what* to deploy
- `Interpret` does the actual work when `execute()` is called

```rust
pub struct KvmScore {
    network: NetworkConfig,
    vms: Vec<VmConfig>,
}

impl<T: Topology + KvmHost> Score<T> for KvmScore {
    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(KvmInterpret::new(self.clone()))
    }

    fn name(&self) -> String {
        "KvmScore".to_string()
    }
}
```

### Flatten the public API in `mod.rs`

Internal submodules are implementation detail. Re-export what callers need at the module root:

```rust
// modules/kvm/mod.rs
mod connection;
mod domain;
mod network;
mod error;
mod xml;

pub use connection::KvmConnection;
pub use domain::{VmConfig, VmConfigBuilder, VmStatus, DiskConfig, BootDevice};
pub use error::KvmError;
pub use network::NetworkConfig;
```

---

## Commit Style

Follow [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/):

```
feat(kvm): add network isolation support
fix(kvm): correct memory unit conversion for libvirt
refactor(kvm): replace virsh subprocess calls with virt crate bindings
docs: add coding guide
```

Keep pull requests small and single-purpose (under ~200 lines excluding generated code). Do not mix refactoring, bug fixes, and new features in one PR.
@@ -1,158 +0,0 @@

# Ingress Resources in Harmony

Harmony generates standard Kubernetes `networking.k8s.io/v1` Ingress resources. This ensures your deployments are portable across any Kubernetes distribution (vanilla K8s, OKD/OpenShift, K3s, etc.) without requiring vendor-specific configurations.

By default, Harmony does **not** set `spec.ingressClassName`. This allows the cluster's default ingress controller to automatically claim the resource, which is the correct approach for most single-controller clusters.

---

## TLS Configurations

There are two portable TLS modes for Ingress resources. Use only these in your Harmony deployments.

### 1. Plain HTTP (No TLS)

Omit the `tls` block entirely. The Ingress serves traffic over plain HTTP. Use this for local development or when TLS is terminated elsewhere (e.g., by a service mesh or external load balancer).

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-ns
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 8080
```

### 2. HTTPS with a Named TLS Secret

Provide a `tls` block with both `hosts` and a `secretName`. The ingress controller will use that Secret for TLS termination. The Secret must be of type `kubernetes.io/tls` and live in the same namespace as the Ingress.

There are two ways to provide this Secret.

#### Option A: Manual Secret

Create the TLS Secret yourself before deploying the Ingress. This is suitable when certificates are issued outside the cluster or managed by another system.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-ns
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 8080
  tls:
    - hosts:
        - app.example.com
      secretName: app-example-com-tls
```

#### Option B: Automated via cert-manager (Recommended)

Add the `cert-manager.io/cluster-issuer` annotation to the Ingress. cert-manager will automatically perform the ACME challenge, generate the certificate, store it in the named Secret, and handle renewal. You do not create the Secret yourself.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-ns
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 8080
  tls:
    - hosts:
        - app.example.com
      secretName: app-example-com-tls
```

If you use a namespace-scoped `Issuer` instead of a `ClusterIssuer`, replace the annotation with `cert-manager.io/issuer: <name>`.

---

## Do Not Use: TLS Without `secretName`

Avoid TLS entries that omit `secretName`:

```yaml
# ⚠️ Non-portable — do not use
tls:
  - hosts:
      - app.example.com
```

Behavior for this pattern is **controller-specific and not portable**. On OKD/OpenShift, the ingress-to-route translation rejects it as incomplete. On other controllers, it may silently serve a self-signed fallback or fail in unpredictable ways. Harmony does not support this pattern.

---

## Prerequisites for cert-manager

To use automated certificates (Option B above):

1. **cert-manager** must be installed on the cluster.
2. A `ClusterIssuer` or `Issuer` must exist. A typical Let's Encrypt production issuer:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: team@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress: {}
```

3. **DNS must already resolve** to the cluster's ingress endpoint before the Ingress is created. The HTTP01 challenge requires this routing to be active.

For wildcard certificates (e.g. `*.example.com`), HTTP01 cannot be used — configure a DNS01 solver with credentials for your DNS provider instead.

---

## OKD / OpenShift Notes

On OKD, standard Ingress resources are automatically translated into OpenShift `Route` objects. The default TLS termination mode is `edge`, which is correct for most HTTP applications. To control this explicitly, add:

```yaml
annotations:
  route.openshift.io/termination: edge # or passthrough / reencrypt
```

This annotation is ignored on non-OpenShift clusters and is safe to include unconditionally.
@@ -156,56 +156,9 @@ impl<T: Topology + K8sclient> Interpret<T> for MyInterpret {
}
```

## Design Principles

### Capabilities are industry concepts, not tools

A capability trait must represent a **standard infrastructure need** that could be fulfilled by multiple tools. The developer who writes a Score should not need to know which product provides the capability.

Good capabilities: `DnsServer`, `LoadBalancer`, `DhcpServer`, `CertificateManagement`, `Router`
These are industry-standard concepts. OPNsense provides `DnsServer` via Unbound; a future topology could provide it via CoreDNS or AWS Route53. The Score doesn't care.

The one exception is when the developer fundamentally needs to know the implementation: `PostgreSQL` is a capability (not `Database`) because the developer writes PostgreSQL-specific SQL, replication configs, and connection strings. Swapping it for MariaDB would break the application, not just the infrastructure.

**Test:** If you could swap the underlying tool without breaking any Score that uses the capability, you've drawn the boundary correctly. If swapping would require rewriting Scores, the capability is too tool-specific.
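The swap test can be sketched as a tiny standalone example (hypothetical trait and method names for illustration, not Harmony's actual API):

```rust
// A capability named after the industry concept, not the product.
trait DnsServer {
    fn register_record(&self, name: &str, ip: &str) -> String;
}

// Two interchangeable providers: one OPNsense/Unbound-style, one cloud-backed.
struct UnboundDns;
struct Route53Dns;

impl DnsServer for UnboundDns {
    fn register_record(&self, name: &str, ip: &str) -> String {
        format!("unbound: {name} -> {ip}")
    }
}

impl DnsServer for Route53Dns {
    fn register_record(&self, name: &str, ip: &str) -> String {
        format!("route53: {name} -> {ip}")
    }
}

// A Score only sees the capability, so swapping providers cannot break it.
fn announce<T: DnsServer>(topology: &T) -> String {
    topology.register_record("app.example.com", "10.0.0.10")
}
```

Both `announce(&UnboundDns)` and `announce(&Route53Dns)` compile against the same Score code; only the Topology changes.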

### One Score per concern, one capability per concern

A Score should express a single infrastructure intent. A capability should expose a single infrastructure concept.

If you're building a deployment that combines multiple concerns (e.g., "deploy Zitadel" requires PostgreSQL + Helm + K8s + Ingress), the Score **declares all of them as trait bounds** and the Topology provides them:

```rust
impl<T: Topology + K8sclient + HelmCommand + PostgreSQL> Score<T> for ZitadelScore
```

If you're building a tool that provides multiple capabilities (e.g., OpenBao provides secret storage, KV versioning, JWT auth, policy management), each capability should be a **separate trait** that can be implemented independently. This way, a Score that only needs secret storage doesn't pull in JWT auth machinery.
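The split can be sketched as follows (trait and type names here are illustrative stand-ins, not the real OpenBao integration):

```rust
use std::collections::HashMap;

// One trait per capability: a Score that only stores secrets
// bounds on SecretStorage and never sees the JWT auth machinery.
trait SecretStorage {
    fn put(&mut self, key: &str, value: &str);
    fn get(&self, key: &str) -> Option<String>;
}

trait JwtAuth {
    fn login_with_jwt(&self, jwt: &str) -> Result<String, String>;
}

// In-memory stand-in for the real backend.
#[derive(Default)]
struct OpenBaoStub {
    kv: HashMap<String, String>,
}

impl SecretStorage for OpenBaoStub {
    fn put(&mut self, key: &str, value: &str) {
        self.kv.insert(key.to_string(), value.to_string());
    }
    fn get(&self, key: &str) -> Option<String> {
        self.kv.get(key).cloned()
    }
}

impl JwtAuth for OpenBaoStub {
    fn login_with_jwt(&self, jwt: &str) -> Result<String, String> {
        if jwt.is_empty() { Err("empty jwt".into()) } else { Ok("client-token".into()) }
    }
}

// Bounds on exactly one concern.
fn store_admin_password<T: SecretStorage>(store: &mut T) {
    store.put("harmony/admin", "s3cret");
}
```

`store_admin_password` compiles against any `SecretStorage` implementor, whether or not it also implements `JwtAuth`.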
### Scores encapsulate operational complexity

The value of a Score is turning tribal knowledge into compiled, type-checked infrastructure. The `ZitadelScore` knows that you need to create a namespace, deploy a PostgreSQL cluster via CNPG, wait for the cluster to be ready, create a masterkey secret, generate a secure admin password, detect the K8s distribution, build distribution-specific Helm values, and deploy the chart. A developer using it writes:

```rust
let zitadel = ZitadelScore { host: "sso.example.com".to_string(), ..Default::default() };
```

Move procedural complexity into opinionated Scores. This makes them easy to test against various topologies (k3d, OpenShift, kubeadm, bare metal) and easy to compose in high-level examples.

### Scores must be idempotent

Running a Score twice should produce the same result as running it once. Use create-or-update semantics, check for existing state before acting, and handle "already exists" responses gracefully.
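A minimal create-or-update sketch, with an in-memory map standing in for the device's real state:

```rust
use std::collections::HashMap;

/// Applies a desired VLAN description; returns what happened.
/// Calling it again with the same input is a no-op.
fn apply_vlan(state: &mut HashMap<u16, String>, tag: u16, desc: &str) -> &'static str {
    match state.get(&tag) {
        // Already in the desired state: nothing to do.
        Some(existing) if existing == desc => "unchanged",
        // Exists but drifted: update in place instead of failing on "already exists".
        Some(_) => {
            state.insert(tag, desc.to_string());
            "updated"
        }
        // Missing: create it.
        None => {
            state.insert(tag, desc.to_string());
            "created"
        }
    }
}
```

The second application reports `unchanged` and leaves the state identical, which is exactly the property the principle demands.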

### Scores must not depend on other Scores running first

A Score declares its capability requirements via trait bounds. It does **not** assume that another Score has run before it. If your Score needs PostgreSQL, it declares `T: PostgreSQL` and lets the Topology handle whether PostgreSQL needs to be installed first.

If you find yourself writing "run Score A, then run Score B", consider whether Score B should declare the capability that Score A provides, or whether both should be orchestrated by a higher-level Score that composes them.
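The composition pattern can be sketched with stand-in traits (the names mirror the text; the signatures are invented for illustration):

```rust
// Capabilities as trait bounds; the Topology decides how each is fulfilled.
trait PostgreSQL {
    fn connection_string(&self) -> String;
}

trait K8sclient {
    fn apply(&self, manifest: &str) -> String;
}

struct LocalTopology;

impl PostgreSQL for LocalTopology {
    fn connection_string(&self) -> String {
        "postgres://localhost:5432/app".to_string()
    }
}

impl K8sclient for LocalTopology {
    fn apply(&self, manifest: &str) -> String {
        format!("applied: {manifest}")
    }
}

// A higher-level deployment declares both bounds instead of assuming
// that a "PostgreSQL Score" already ran before it.
fn deploy_app<T: PostgreSQL + K8sclient>(topology: &T) -> (String, String) {
    (topology.connection_string(), topology.apply("app-deployment.yaml"))
}
```

If the Topology cannot provide `PostgreSQL`, the call fails to compile; there is no hidden runtime ordering dependency.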
## Best Practices

- **Keep Scores focused** — one Score per concern (deployment, monitoring, networking)
- **Use `..Default::default()`** for optional fields so callers only need to specify what they care about
- **Return `Outcome`** — use `Outcome::success`, `Outcome::failure`, or `Outcome::success_with_details` to communicate results clearly
- **Handle errors gracefully** — return meaningful `InterpretError` messages that help operators debug issues
- **Design capabilities around the developer's need** — not around the tool that fulfills it. Ask: "what is the core need that leads a developer to use this tool?"
- **Don't name capabilities after tools** — `SecretVault` not `OpenbaoStore`, `IdentityProvider` not `ZitadelAuth`
@@ -1,16 +0,0 @@

# Handy one-liners for infrastructure management

### Delete all evicted pods from a cluster

```sh
kubectl get po -A | grep Evic | awk '{ print "-n " $1 " " $2 }' | xargs -L 1 kubectl delete po
```

> Pods are evicted when the node they are running on lacks the resources to keep them going. The most common case is ephemeral storage filling up, for example because a log file grew too big.
>
> Eviction can also happen because of memory or CPU pressure from unpredictable workloads.
>
> This means it is generally OK to delete them.
>
> However, in a perfectly configured deployment and cluster, pods should rarely, if ever, get evicted. For example, a log file that grows too big should be reconfigured not to use so much space, or the deployment should be configured to reserve the correct amount of ephemeral storage.
>
> Note that deleting evicted pods does not solve the underlying issue; make sure to understand why the pod was evicted in the first place and put the proper solution in place.
@@ -4,13 +4,9 @@ Real-world scenarios demonstrating Harmony in action.

## Available Use Cases

### [OPNsense VM Integration](./opnsense-vm-integration.md)

Boot a real OPNsense firewall in a local KVM VM and configure it entirely through Harmony — load balancer, DHCP, TFTP, VLANs, firewall rules, NAT, VIPs, and link aggregation. Fully automated, zero manual steps. The best way to see Harmony in action.

### [PostgreSQL on Local K3D](./postgresql-on-local-k3d.md)

Deploy a fully functional PostgreSQL cluster on a local K3D cluster in under 10 minutes. The quickest way to see Harmony's Kubernetes capabilities.

### [OKD on Bare Metal](./okd-on-bare-metal.md)
@@ -1,234 +0,0 @@

# Use Case: OPNsense VM Integration

Boot a real OPNsense firewall in a local KVM virtual machine and configure it entirely through Harmony — load balancer, DHCP, TFTP, VLANs, firewall rules, NAT, VIPs, and link aggregation. Fully automated, zero manual steps, CI-friendly.

This is the best way to discover Harmony: you'll see 11 different Scores configure a production firewall through type-safe Rust code and the OPNsense REST API.

## What you'll have at the end

A local OPNsense VM fully configured by Harmony with:

- HAProxy load balancer with health-checked backends
- DHCP server with static host bindings and PXE boot options
- TFTP server serving boot files
- Prometheus node exporter enabled
- 2 VLANs on the LAN interface
- Firewall filter rules, outbound NAT, and bidirectional NAT
- Virtual IPs (IP aliases)
- Port forwarding (DNAT) rules
- LAGG interface (link aggregation)

All applied idempotently through the OPNsense REST API — the same Scores used in production bare-metal deployments.

## Prerequisites

- **Linux** with KVM support (Intel VT-x/AMD-V enabled in BIOS)
- **libvirt + QEMU** installed and running (`libvirtd` service active)
- **~10 GB** free disk space
- **~15 minutes** for the first run (image download + OPNsense firmware update)
- Docker running (if installed — the setup handles compatibility)

Supported distributions: Arch, Manjaro, Fedora, Ubuntu, Debian.

## Quick start (single command)

```bash
# One-time: install libvirt and configure permissions
./examples/opnsense_vm_integration/setup-libvirt.sh
newgrp libvirt

# Verify
cargo run -p opnsense-vm-integration -- --check

# Boot + bootstrap + run all 11 Scores (fully unattended)
cargo run -p opnsense-vm-integration -- --full
```

That's it. No browser clicks, no manual SSH configuration, no wizard interaction.
## What happens step by step

### Phase 1: Boot the VM

Downloads the OPNsense 26.1 nano image (~350 MB, cached after first run), injects a `config.xml` with virtio NIC assignments, creates a 4 GiB qcow2 disk, and boots the VM with 4 NICs:

```
vtnet0 = LAN (192.168.1.1/24)   -- management
vtnet1 = WAN (DHCP)             -- internet access
vtnet2 = LAGG member 1          -- for aggregation test
vtnet3 = LAGG member 2          -- for aggregation test
```

### Phase 2: Automated bootstrap

Once the web UI responds (~20 seconds after boot), `OPNsenseBootstrap` takes over:

1. **Logs in** to the web UI (root/opnsense) with automatic CSRF token handling
2. **Aborts the initial setup wizard** via the OPNsense API
3. **Enables SSH** with root login and password authentication
4. **Changes the web GUI port** to 9443 (prevents HAProxy conflicts on standard ports)
5. **Restarts lighttpd** via SSH to apply the port change

No browser, no Playwright, no expect scripts — just HTTP requests with session cookies and SSH commands.

### Phase 3: Run 11 Scores

Creates an API key via SSH, then configures the entire firewall:

| # | Score | What it configures |
|---|-------|--------------------|
| 1 | `LoadBalancerScore` | HAProxy with 2 frontends (ports 16443 and 18443), backends with health checks |
| 2 | `DhcpScore` | DHCP range, 2 static host bindings (MAC-to-IP), PXE boot options |
| 3 | `TftpScore` | TFTP server serving PXE boot files |
| 4 | `NodeExporterScore` | Prometheus node exporter on OPNsense |
| 5 | `VlanScore` | 2 test VLANs (tags 100 and 200) on vtnet0 |
| 6 | `FirewallRuleScore` | Firewall filter rules (allow/block with logging) |
| 7 | `OutboundNatScore` | Source NAT rule for outbound traffic |
| 8 | `BinatScore` | Bidirectional 1:1 NAT |
| 9 | `VipScore` | Virtual IPs (IP aliases for CARP/HA) |
| 10 | `DnatScore` | Port forwarding rules |
| 11 | `LaggScore` | Link aggregation group (failover on vtnet2+vtnet3) |

Each Score reports its status:

```
[LoadBalancerScore] SUCCESS in 2.2s -- Load balancer configured 2 services
[DhcpScore] SUCCESS in 1.4s -- Dhcp Interpret execution successful
[VlanScore] SUCCESS in 0.2s -- Configured 2 VLANs
...
PASSED -- All OPNsense integration tests successful
```

### Phase 4: Verify

After all Scores run, the integration test verifies each configuration via the REST API:

- HAProxy has 2+ frontends
- Dnsmasq has 2+ static hosts and a DHCP range
- TFTP is enabled
- Node exporter is enabled
- 2+ VLANs exist
- Firewall filter rules are present
- VIPs, DNAT, BINAT, SNAT rules are configured
- LAGG interface exists
## Explore in the web UI

After the test completes, open https://192.168.1.1:9443 (login: root/opnsense) and explore:

- **Services > HAProxy > Settings** -- frontends, backends, servers with health checks
- **Services > Dnsmasq DNS > Settings** -- host overrides (static DHCP entries)
- **Services > TFTP** -- enabled with uploaded files
- **Interfaces > Other Types > VLAN** -- two tagged VLANs
- **Firewall > Automation > Filter** -- filter rules created by Harmony
- **Firewall > NAT > Port Forward** -- DNAT rules
- **Firewall > NAT > Outbound** -- SNAT rules
- **Firewall > NAT > One-to-One** -- BINAT rules
- **Interfaces > Virtual IPs > Settings** -- IP aliases
- **Interfaces > Other Types > LAGG** -- link aggregation group

## Clean up

```bash
cargo run -p opnsense-vm-integration -- --clean
```

Destroys the VM and virtual networks. The cached OPNsense image is kept for next time.
## How it works

### Architecture

```
Your workstation                        OPNsense VM (KVM)
+---------------------+                 +---------------------+
| Harmony             |                 | OPNsense 26.1       |
|  +---------------+  |   REST API      |  +---------------+  |
|  | OPNsense      |  |--(HTTPS:9443)-->|  | API + Plugins |  |
|  | Scores        |  |                 |  +---------------+  |
|  +---------------+  |   SSH           |  +---------------+  |
|  +---------------+  |--(port 22)----->|  | FreeBSD Shell |  |
|  | OPNsense-     |  |                 |  +---------------+  |
|  | Bootstrap     |  |   HTTP session  |                     |
|  +---------------+  |--(HTTPS:443)--->|  (first-boot only)  |
|  +---------------+  |                 |                     |
|  | opnsense-     |  |                 |  LAN: 192.168.1.1   |
|  | config        |  |                 |  WAN: DHCP          |
|  +---------------+  |                 +---------------------+
+---------------------+
```

The stack has four layers:

1. **`opnsense-api`** -- auto-generated typed Rust client from OPNsense XML model files
2. **`opnsense-config`** -- high-level configuration modules (DHCP, firewall, load balancer, etc.)
3. **`OPNsenseBootstrap`** -- first-boot automation via HTTP session auth (login, wizard, SSH, webgui port)
4. **Harmony Scores** -- declarative desired-state descriptions that make the firewall match

### The Score pattern

```rust
// 1. Declare desired state
let score = VlanScore {
    vlans: vec![
        VlanDef { parent: "vtnet0", tag: 100, description: "management" },
        VlanDef { parent: "vtnet0", tag: 200, description: "storage" },
    ],
};

// 2. Execute against topology -- queries current state, applies diff
score.interpret(&inventory, &topology).await?;
// Output: [VlanScore] SUCCESS in 0.9s -- Created 2 VLANs
```

Scores are idempotent: running the same Score twice produces the same result.
## Network architecture

```
Host (192.168.1.10) --- virbr-opn bridge --- OPNsense LAN (192.168.1.1)
                        192.168.1.0/24       vtnet0
                        NAT to internet

                    --- virbr0 (default) --- OPNsense WAN (DHCP)
                        192.168.122.0/24     vtnet1
                        NAT to internet
```

## Available commands

| Command | Description |
|---------|-------------|
| `--check` | Verify prerequisites (libvirtd, virsh, qemu-img) |
| `--download` | Download the OPNsense image (cached) |
| `--boot` | Create VM + automated bootstrap |
| (default) | Run integration test (assumes VM is bootstrapped) |
| `--full` | Boot + bootstrap + integration test (CI mode) |
| `--status` | Show VM state, ports, and connectivity |
| `--clean` | Destroy VM and networks |

## Environment variables

| Variable | Default | Description |
|----------|---------|-------------|
| `RUST_LOG` | (unset) | Log level: `info`, `debug`, `trace` |
| `HARMONY_KVM_URI` | `qemu:///system` | Libvirt connection URI |
| `HARMONY_KVM_IMAGE_DIR` | `~/.local/share/harmony/kvm/images` | Cached disk images |

## Troubleshooting

**VM won't start / permission denied**
Ensure your user is in the `libvirt` group and that the image directory is traversable by the qemu user. Run `setup-libvirt.sh` to fix.

**192.168.1.0/24 conflict**
If your host network already uses this subnet, the VM will be unreachable. Edit the constants in `src/main.rs` to use a different subnet.

**Web GUI didn't come up after bootstrap**
The bootstrap runs `diagnose_via_ssh()` automatically when the web UI doesn't respond. Check the diagnostic output for lighttpd status and listening ports. You can also access the serial console: `virsh -c qemu:///system console opn-integration`

**HAProxy install fails**
OPNsense may need a firmware update. The integration test handles this automatically, but it may take a few minutes for the update + reboot cycle.

## What's next

- **[OPNsense Firewall Pair](../../examples/opnsense_pair_integration/README.md)** -- boot two VMs, configure CARP HA failover with `FirewallPairTopology` and `CarpVipScore`. Uses NIC link control to bootstrap both VMs sequentially despite sharing the same default IP.
- [OKD on Bare Metal](./okd-on-bare-metal.md) -- the full 7-stage OKD installation pipeline using OPNsense as the infrastructure backbone
- [PostgreSQL on Local K3D](./postgresql-on-local-k3d.md) -- a simpler starting point using Kubernetes
@@ -18,8 +18,6 @@ This directory contains runnable examples demonstrating Harmony's capabilities.

| `remove_rook_osd` | Remove a Rook OSD | — | ✅ | Rook/Ceph |
| `brocade_snmp_server` | Configure Brocade switch SNMP | — | ✅ | Brocade switch |
| `opnsense_node_exporter` | Node exporter on OPNsense | — | ✅ | OPNsense firewall |
| `opnsense_vm_integration` | Full OPNsense firewall automation (11 Scores) | ✅ | — | KVM/libvirt |
| `opnsense_pair_integration` | OPNsense HA pair with CARP failover | ✅ | — | KVM/libvirt |
| `okd_pxe` | PXE boot configuration for OKD | — | — | ✅ |
| `okd_installation` | Full OKD bare-metal install | — | — | ✅ |
| `okd_cluster_alerts` | OKD cluster monitoring alerts | — | ✅ | OKD cluster |
@@ -77,8 +75,6 @@ This directory contains runnable examples demonstrating Harmony's capabilities.

- **`application_monitoring_with_tenant`** — App monitoring with tenant isolation

### Infrastructure & Bare Metal

- **`opnsense_vm_integration`** — **Recommended demo.** Boot an OPNsense VM and configure it with 11 Scores (load balancer, DHCP, TFTP, VLANs, firewall rules, NAT, VIPs, LAGG). Fully automated, requires only KVM. See the [detailed guide](../docs/use-cases/opnsense-vm-integration.md).
- **`opnsense_pair_integration`** — Boot two OPNsense VMs and configure a CARP HA firewall pair with `FirewallPairTopology` and `CarpVipScore`. Demonstrates NIC link control for sequential bootstrap.
- **`okd_installation`** — Full OKD cluster from scratch
- **`okd_pxe`** — PXE boot configuration for OKD
- **`sttest`** — Full OKD stack test with specific hardware
@@ -1,15 +0,0 @@

[package]
name = "example_linux_vm"
version.workspace = true
edition = "2024"
license.workspace = true

[[bin]]
name = "example_linux_vm"
path = "src/main.rs"

[dependencies]
harmony = { path = "../../harmony" }
tokio.workspace = true
log.workspace = true
env_logger.workspace = true
@@ -1,43 +0,0 @@

# Example: Linux VM from ISO

This example deploys a simple Linux virtual machine from an ISO URL.

## What it creates

- One isolated virtual network (`linuxvm-net`, 192.168.101.0/24)
- One Ubuntu Server VM with the ISO attached as a CD-ROM
- The VM is configured to boot from the CD-ROM first, allowing installation
- After installation, the VM can be rebooted to boot from disk

## Prerequisites

- A running KVM hypervisor (local or remote)
- `HARMONY_KVM_URI` environment variable pointing to the hypervisor (defaults to `qemu:///system`)
- `HARMONY_KVM_IMAGE_DIR` environment variable for storing VM images (defaults to the harmony data dir)

## Usage

```bash
cargo run -p example_linux_vm
```

## After deployment

Once the VM is running, you can connect to its console:

```bash
virsh -c qemu:///system console linux-vm
```

To access the VM via SSH after installation, you'll need to configure a bridged network or port forwarding.

## Clean up

To remove the VM and network:

```bash
virsh -c qemu:///system destroy linux-vm
virsh -c qemu:///system undefine linux-vm
virsh -c qemu:///system net-destroy linuxvm-net
virsh -c qemu:///system net-undefine linuxvm-net
```
@@ -1,63 +0,0 @@

use harmony::modules::kvm::config::init_executor;
use harmony::modules::kvm::{BootDevice, NetworkConfig, NetworkRef, VmConfig};
use log::info;

const NETWORK_NAME: &str = "linuxvm-net";
const NETWORK_GATEWAY: &str = "192.168.101.1";
const NETWORK_PREFIX: u8 = 24;

const UBUNTU_ISO_URL: &str =
    "https://releases.ubuntu.com/24.04/ubuntu-24.04.3-live-server-amd64.iso";

pub async fn deploy_linux_vm() -> Result<(), String> {
    let executor = init_executor().map_err(|e| format!("KVM initialization failed: {e}"))?;

    let network = NetworkConfig::builder(NETWORK_NAME)
        .bridge("virbr101")
        .subnet(NETWORK_GATEWAY, NETWORK_PREFIX)
        .build();

    info!("Ensuring network '{NETWORK_NAME}' ({NETWORK_GATEWAY}/{NETWORK_PREFIX}) exists");
    executor
        .ensure_network(network)
        .await
        .map_err(|e| format!("Network setup failed: {e}"))?;

    let vm = linux_vm();
    info!("Defining Linux VM '{}'", vm.name);
    executor
        .ensure_vm(vm.clone())
        .await
        .map_err(|e| format!("Linux VM setup failed: {e}"))?;

    info!("Starting VM '{}'", vm.name);
    executor
        .start_vm(&vm.name)
        .await
        .map_err(|e| format!("Failed to start VM: {e}"))?;

    info!(
        "Linux VM '{}' is running. \
         Connect to the console using: virsh -c qemu:///system console {}",
        vm.name, vm.name
    );

    Ok(())
}

fn linux_vm() -> VmConfig {
    VmConfig::builder("linux-vm")
        .vcpus(2)
        .memory_gb(4)
        .disk(20)
        .network(NetworkRef::named(NETWORK_NAME))
        .cdrom(UBUNTU_ISO_URL)
        .boot_order([BootDevice::Cdrom, BootDevice::Disk])
        .build()
}

#[tokio::main]
async fn main() -> Result<(), String> {
    env_logger::init();
    deploy_linux_vm().await
}
@@ -1,30 +0,0 @@

[package]
name = "example-harmony-sso"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true

[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_config = { path = "../../harmony_config" }
harmony_macros = { path = "../../harmony_macros" }
harmony_secret = { path = "../../harmony_secret" }
harmony_types = { path = "../../harmony_types" }
harmony-k8s = { path = "../../harmony-k8s" }
k3d-rs = { path = "../../k3d" }
k8s-openapi.workspace = true
kube.workspace = true
tokio.workspace = true
url.workspace = true
log.workspace = true
env_logger.workspace = true
serde.workspace = true
serde_json.workspace = true
anyhow.workspace = true
reqwest.workspace = true
clap = { version = "4", features = ["derive"] }
schemars = "0.8"
interactive-parse = "0.1.5"
directories = "6.0.0"
@@ -1,90 +0,0 @@

# Harmony SSO Example

Deploys Zitadel (identity provider) and OpenBao (secrets management) on a local k3d cluster, then demonstrates using them as `harmony_config` backends for shared config and secret management.

## Prerequisites

- Docker running
- Ports 8080 and 8200 free
- `/etc/hosts` entries (or use a local DNS resolver):
  ```
  127.0.0.1 sso.harmony.local
  127.0.0.1 bao.harmony.local
  ```

## Usage

### Full deployment

```bash
# Deploy everything (OpenBao + Zitadel)
cargo run -p example-harmony-sso

# OpenBao only (faster, skip Zitadel)
cargo run -p example-harmony-sso -- --skip-zitadel
```

### Config storage demo (token auth)

After deployment, run the config demo to verify `harmony_config` works with OpenBao:

```bash
cargo run -p example-harmony-sso -- --demo
```

This writes and reads a `SsoExampleConfig` through the `ConfigManager` chain (`EnvSource -> StoreSource<OpenbaoSecretStore>`), demonstrating environment variable overrides and persistent storage in OpenBao KV v2.

### SSO device flow demo

Requires a Zitadel application configured for the device code grant:

```bash
HARMONY_SSO_CLIENT_ID=<zitadel-app-client-id> \
cargo run -p example-harmony-sso -- --sso-demo
```

### Cleanup

```bash
cargo run -p example-harmony-sso -- --cleanup
```

## What gets deployed

| Component | Namespace | Access |
|---|---|---|
| OpenBao (standalone, file storage) | `openbao` | `http://bao.harmony.local:8200` |
| Zitadel (with CNPG PostgreSQL) | `zitadel` | `http://sso.harmony.local:8080` |

### OpenBao configuration

- **Auth methods:** userpass, JWT
- **Secrets engine:** KV v2 at `secret/`
- **Policy:** `harmony-dev` grants CRUD on `secret/data/harmony/*`
- **Userpass credentials:** `harmony` / `harmony-dev-password`
- **JWT auth:** configured with Zitadel as OIDC provider, role `harmony-developer`
- **Unseal keys:** saved to `~/.local/share/harmony/openbao/unseal-keys.json`
## Architecture

```
Developer CLI
 |
 |-- harmony_config::ConfigManager
 |     |-- EnvSource (HARMONY_CONFIG_* env vars)
 |     |-- StoreSource<OpenbaoSecretStore>
 |           |-- Token auth (OPENBAO_TOKEN)
 |           |-- Cached token validation
 |           |-- Zitadel OIDC device flow (RFC 8628)
 |           |-- Userpass fallback
 |
 v
k3d cluster (harmony-example)
 |-- OpenBao (KV v2 secrets engine)
 |     |-- JWT auth -> validates Zitadel id_tokens
 |     |-- userpass auth -> dev credentials
 |
 |-- Zitadel (OpenID Connect IdP)
       |-- Device authorization grant
       |-- Federated login (Google, GitHub, Entra ID)
```
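The first-hit-wins resolution in the chain above can be sketched with stand-in types (this is not the `harmony_config` API, just an illustration of the EnvSource-over-StoreSource precedence):

```rust
use std::collections::HashMap;

trait ConfigSource {
    fn get(&self, key: &str) -> Option<String>;
}

// Stand-in for HARMONY_CONFIG_* environment variables.
struct EnvSource(HashMap<String, String>);
// Stand-in for values persisted in OpenBao KV v2.
struct StoreSource(HashMap<String, String>);

impl ConfigSource for EnvSource {
    fn get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
}

impl ConfigSource for StoreSource {
    fn get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
}

// Earlier sources win: env vars override what the store holds.
fn resolve(sources: &[&dyn ConfigSource], key: &str) -> Option<String> {
    sources.iter().find_map(|source| source.get(key))
}
```

With `[env, store]` ordering, a key set in both places resolves to the env value, while keys only in the store fall through to it.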
@@ -1,155 +0,0 @@

# Harmony SSO Plan

## Context

Deploy Zitadel and OpenBao on a local k3d cluster, use them as `harmony_config` backends, and demonstrate end-to-end config storage authenticated via SSO. The goal: a rock-solid deployment so teams and collaborators can reliably share config and secrets through OpenBao with Zitadel SSO authentication.

## Status

### Phase A: MVP with Token Auth -- DONE

- [x] A.1 -- CLI argument parsing (`--demo`, `--sso-demo`, `--skip-zitadel`, `--cleanup`)
- [x] A.2 -- Zitadel deployment via `ZitadelScore` (`external_secure: false` for k3d)
- [x] A.3 -- OpenBao JWT auth method + `harmony-dev` policy configuration
- [x] A.4 -- `--demo` flag: config storage demo with token auth via `ConfigManager`
- [x] A.5 -- Hardening: retry loops for pod readiness, HTTP readiness checks, `--cleanup`
- [x] A.6 -- README with prerequisites, usage, and architecture

Verified end-to-end: fresh `k3d cluster delete` -> `cargo run -p example-harmony-sso` -> `--demo` succeeds.

### Phase B: OIDC Device Flow + JWT Exchange -- TODO

The Zitadel OIDC device flow code exists (`harmony_secret/src/store/zitadel.rs`) but the **JWT exchange** step is missing: `process_token_response()` stores the OIDC `access_token` as `openbao_token` directly, but per ADR 020-1 the `id_token` should be exchanged with OpenBao's `/v1/auth/jwt/login` endpoint.

**B.1 -- Implement JWT exchange in `harmony_secret/src/store/zitadel.rs`:**
- Add `openbao_url`, `jwt_auth_mount`, `jwt_role` fields to `ZitadelOidcAuth`
- Add `exchange_jwt_for_openbao_token(id_token)` using raw `reqwest` (vaultrs 0.7.4 has no JWT auth module)
- POST `{openbao_url}/v1/auth/{jwt_auth_mount}/login` with `{"role": "...", "jwt": "..."}`
- Modify `process_token_response()` to use the exchange when `openbao_url` is set
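The B.1 request shape can be sketched by building the URL and payload only (the real code would send this with `reqwest` and should serialize via `serde_json` rather than string formatting, which skips JSON escaping; endpoint path and field names are taken from the plan above):

```rust
/// Builds the OpenBao JWT login request described in B.1:
/// POST {openbao_url}/v1/auth/{jwt_auth_mount}/login with {"role": ..., "jwt": ...}.
/// Illustration only: does not JSON-escape its inputs.
fn jwt_login_request(
    openbao_url: &str,
    jwt_auth_mount: &str,
    jwt_role: &str,
    id_token: &str,
) -> (String, String) {
    let url = format!("{openbao_url}/v1/auth/{jwt_auth_mount}/login");
    let body = format!(r#"{{"role":"{jwt_role}","jwt":"{id_token}"}}"#);
    (url, body)
}
```

With the defaults from B.3 (`jwt` mount, `harmony-developer` role), the URL resolves to `{openbao_url}/v1/auth/jwt/login`.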
**B.2 -- Wire JWT params through `harmony_secret/src/store/openbao.rs`:**
|
||||
- Pass `base_url`, `jwt_auth_mount`, `jwt_role` to `ZitadelOidcAuth::new()` in `authenticate_zitadel_oidc()`
|
||||
- Update `OpenbaoSecretStore::new()` signature for optional `jwt_role` and `jwt_auth_mount`
|
||||
|
||||
**B.3 -- Add env vars to `harmony_secret/src/config.rs`:**
|
||||
- `OPENBAO_JWT_AUTH_MOUNT` (default: `jwt`)
|
||||
- `OPENBAO_JWT_ROLE` (default: `harmony-developer`)
|
||||
|
||||
**B.4 -- Silent refresh:**
|
||||
- Add `refresh_token()` method to `ZitadelOidcAuth`
|
||||
- Update auth chain in `openbao.rs`: cached session -> silent refresh -> device flow
|
||||
|
||||
**B.5 -- `--sso-demo` flag:**
|
||||
- Already stubbed in `examples/harmony_sso/src/main.rs`
|
||||
- Requires a Zitadel device code application (manual setup, accept `HARMONY_SSO_CLIENT_ID` env var)
|
||||
|
||||
**B.6 -- Solve in-cluster DNS for JWT auth config:**
|
||||
- OpenBao JWT auth needs `oidc_discovery_url` to fetch Zitadel's JWKS
|
||||
- Zitadel requires `Host` header matching `ExternalDomain` on ALL endpoints (including `/oauth/v2/keys`)
|
||||
- So `oidc_discovery_url=http://zitadel.zitadel.svc.cluster.local:8080` gets 404 from Zitadel
|
||||
- Options: (a) CoreDNS rewrite rule mapping `sso.harmony.local` -> `zitadel.zitadel.svc`, (b) Kubernetes ExternalName service, (c) `Zitadel.AdditionalDomains` Helm config to accept the internal hostname
|
||||
- Currently non-fatal (warning only), needed before `--sso-demo` can work
|
||||
|
||||
### Phase C: Testing & Automation -- TODO
|
||||
|
||||
**C.1 -- Integration tests** (`examples/harmony_sso/tests/integration.rs`, `#[ignore]`):
|
||||
- `test_openbao_health` -- health endpoint
|
||||
- `test_zitadel_openid_config` -- OIDC discovery
|
||||
- `test_openbao_userpass_auth` -- write/read secret
|
||||
- `test_config_manager_openbao_backend` -- full ConfigManager chain
|
||||
- `test_openbao_jwt_auth_configured` -- verify JWT auth method + role exist
|
||||
|
||||
**C.2 -- Zitadel application automation** (`examples/harmony_sso/src/zitadel_setup.rs`):
|
||||
- Automate project + device code app creation via Zitadel Management API
|
||||
- Extract and save `client_id`
|
||||
|
||||
---

## Tricky Things / Lessons Learned

### ZitadelScore on k3d -- security context

The Zitadel container image (`ghcr.io/zitadel/zitadel`) defines `User: "zitadel"` (a non-numeric string). With `runAsNonRoot: true` and `runAsUser: null`, kubelet can't verify the user is non-root and fails with `CreateContainerConfigError`. **Fix:** set `runAsUser: 1000` explicitly (that's the UID for `zitadel` in `/etc/passwd`). This applies to all security contexts: `podSecurityContext`, `securityContext`, `initJob`, `setupJob`, and `login`.

Changed in `harmony/src/modules/zitadel/mod.rs` for the `K3sFamily | Default` branch.
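For illustration, the effective values look roughly like this (a sketch following the chart's conventional securityContext layout; the exact structure emitted by the module may differ):

```yaml
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000  # UID of the `zitadel` user in the image's /etc/passwd

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
```

The same `runAsUser` has to be repeated for `initJob`, `setupJob`, and `login`, since each runs its own pod.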

### ZitadelScore on k3d -- ingress class

The K3sFamily Helm values had `kubernetes.io/ingress.class: nginx` annotations. k3d ships with traefik, not nginx. The nginx annotation caused traefik to ignore the ingress entirely (404 on all routes). **Fix:** removed the explicit ingress class annotations -- traefik picks up ingresses without an explicit class by default.

Changed in `harmony/src/modules/zitadel/mod.rs` for the `K3sFamily | Default` branch.

### CNPG CRD registration race

After `helm install cloudnative-pg`, the operator deployment becomes ready but the CRD (`clusters.postgresql.cnpg.io`) is not yet registered in the API server's discovery cache. The kube client caches API discovery at init time, so even after the CRD registers, a reused client won't see it. **Fix:** the example creates a **fresh topology** (and therefore a fresh kube client) on each retry attempt. Up to 5 retries with a 15s delay.
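The retry pattern can be sketched as follows (names are illustrative; the real code rebuilds the topology, and therefore the kube client, inside the closure so that every attempt starts with a fresh discovery cache):

```rust
use std::{thread, time::Duration};

/// Retry `attempt_fn` up to `max_retries` times, sleeping `delay` between
/// attempts. The closure is expected to build a *fresh* client internally,
/// so a stale API-discovery cache can never hide a newly registered CRD.
fn deploy_with_retry<E>(
    max_retries: u32,
    delay: Duration,
    mut attempt_fn: impl FnMut(u32) -> Result<(), E>,
) -> Result<(), E> {
    let mut attempt = 1;
    loop {
        match attempt_fn(attempt) {
            Ok(()) => return Ok(()),
            // Out of retries: surface the last error to the caller
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => {
                attempt += 1;
                thread::sleep(delay);
            }
        }
    }
}
```

The example uses 5 retries with a 15-second delay, which leaves enough time for the CNPG operator to register its CRDs and for the API server's discovery documents to settle between attempts.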

### CNPG PostgreSQL cluster readiness

After the CNPG `Cluster` CR is created, the PostgreSQL pods and the `-rw` service take 15-30s to come up. `ZitadelScore` immediately calls `topology.get_endpoint()`, which looks for the `zitadel-pg-rw` service. If the service doesn't exist yet, it fails with "not found for cluster". **Fix:** the same retry loop catches this error pattern.

### Zitadel Helm init job timing

The Zitadel Helm chart runs a `zitadel-init` pre-install/pre-upgrade Job that connects to PostgreSQL. If the PG cluster isn't fully ready (primary not accepting connections), the init job hangs until Helm's 5-minute timeout. On a cold start from scratch, the sequence is: CNPG operator install -> CRD registration (5-15s) -> PG cluster creation -> PG pod scheduling + init (~30s) -> PG primary ready -> Zitadel init job can connect. The retry loop handles this by allowing the full sequence to settle between attempts.

### Zitadel Host header validation

Zitadel validates the `Host` header on **all** HTTP endpoints against its `ExternalDomain` config (`sso.harmony.local`). This means:
- The OIDC discovery endpoint (`/.well-known/openid-configuration`) returns 404 if called via the internal service URL without the correct Host header
- The JWKS endpoint (`/oauth/v2/keys`) also requires the correct Host
- OpenBao's JWT auth `oidc_discovery_url` can't use `http://zitadel.zitadel.svc.cluster.local:8080` because Zitadel rejects the Host
- From outside the cluster, use `127.0.0.1:8080` with a `Host: sso.harmony.local` header (or add an /etc/hosts entry)
- Phase B needs to solve in-cluster DNS resolution for `sso.harmony.local`

### Both services share one port

Both Zitadel and OpenBao are exposed through traefik ingress on port 80 (mapped to host port 8080). Traefik routes by `Host` header: `sso.harmony.local` -> Zitadel, `bao.harmony.local` -> OpenBao. The original plan had separate port mappings (8080 for Zitadel, 8200 for OpenBao), but the 8200 mapping was useless since traefik only listens on 80/443.

For `--demo` mode, the port-forward bypasses traefik and connects directly to the OpenBao service on port 8200 (no Host header needed).

### `run_bao_command` and shell escaping

The `run_bao_command` function runs `kubectl exec ... -- sh -c "export VAULT_TOKEN=xxx && bao ..."`. Two gotchas:
1. Must use `export VAULT_TOKEN=...` (not just a `VAULT_TOKEN=...` prefix) because piped commands after `|` don't inherit the prefix env var
2. The policy creation uses `printf '...' | bao policy write harmony-dev -`, which needs careful quoting inside the `sh -c` wrapper. Using `run_bao_command_raw()` avoids double-wrapping.
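The first gotcha is plain POSIX shell behaviour and easy to demonstrate standalone (no cluster required; the variable name just mirrors the one above):

```shell
#!/bin/sh
# A prefix assignment only applies to the first command of a pipeline,
# so the command after `|` never sees VAULT_TOKEN:
prefix=$(sh -c 'VAULT_TOKEN=secret true | printf "%s" "$VAULT_TOKEN"')

# `export` puts the variable in the shell's environment, so every
# command in the pipeline inherits it:
exported=$(sh -c 'export VAULT_TOKEN=secret; true | printf "%s" "$VAULT_TOKEN"')

echo "prefix:   '$prefix'"     # prefix:   ''
echo "exported: '$exported'"   # exported: 'secret'
```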

### FIXMEs for future refactoring

The user flagged several areas that should use `harmony-k8s` instead of raw `kubectl`:
- `wait_for_pod_running()` -- harmony-k8s has pod wait functionality
- `init_openbao()`, `unseal_openbao()` -- exec into pods via kubectl
- `get_k3d_binary_path()`, `get_openbao_data_path()` -- leaking implementation details from the k3d/openbao crates
- `configure_openbao()` -- future candidate for an OpenBao/Vault capability trait

---

## Files Modified (Phase A)

| File | Change |
|---|---|
| `examples/harmony_sso/Cargo.toml` | Added clap, schemars, interactive-parse |
| `examples/harmony_sso/src/main.rs` | Complete rewrite: CLI args, Zitadel deploy, JWT auth config, demo modes, hardening |
| `examples/harmony_sso/README.md` | New: prerequisites, usage, architecture |
| `harmony/src/modules/zitadel/mod.rs` | Fixed K3s security context (`runAsUser: 1000`), removed nginx ingress annotations |

## Files to Modify (Phase B)

| File | Change |
|---|---|
| `harmony_secret/src/store/zitadel.rs` | JWT exchange, silent refresh |
| `harmony_secret/src/store/openbao.rs` | Wire JWT params, refresh in auth chain |
| `harmony_secret/src/config.rs` | `OPENBAO_JWT_AUTH_MOUNT`, `OPENBAO_JWT_ROLE` env vars |

## Verification

**Phase A (verified 2026-03-28):**
- `cargo run -p example-harmony-sso` -> deploys k3d + OpenBao + Zitadel (with retry for CNPG CRD + PG readiness)
- `curl -H "Host: bao.harmony.local" http://127.0.0.1:8080/v1/sys/health` -> OpenBao healthy (initialized, unsealed)
- `curl -H "Host: sso.harmony.local" http://127.0.0.1:8080/.well-known/openid-configuration` -> Zitadel OIDC config with device_authorization_endpoint
- `cargo run -p example-harmony-sso -- --demo` -> writes/reads config via ConfigManager + OpenbaoSecretStore, env override works

**Phase B:**
- `HARMONY_SSO_URL=http://sso.harmony.local HARMONY_SSO_CLIENT_ID=<id> cargo run -p example-harmony-sso -- --sso-demo`
- Device code appears, login in browser, config stored via SSO-authenticated OpenBao token

**Phase C:**
- `cargo test -p example-harmony-sso -- --ignored` -> integration tests pass
@@ -1,407 +0,0 @@
use anyhow::Context;
use clap::Parser;
use harmony::inventory::Inventory;
use harmony::modules::k8s::coredns::{CoreDNSRewrite, CoreDNSRewriteScore};
use harmony::modules::openbao::{
    OpenbaoJwtAuth, OpenbaoPolicy, OpenbaoScore, OpenbaoSetupScore, OpenbaoUser,
};
use harmony::modules::zitadel::{
    ZitadelAppType, ZitadelApplication, ZitadelClientConfig, ZitadelScore, ZitadelSetupScore,
};
use harmony::score::Score;
use harmony::topology::{K8sclient, Topology};
use harmony_config::{Config, ConfigManager, EnvSource, StoreSource};
use harmony_k8s::K8sClient;
use harmony_secret::OpenbaoSecretStore;
use k3d_rs::{K3d, PortMapping};
use log::info;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::path::PathBuf;
use std::sync::Arc;

const CLUSTER_NAME: &str = "harmony-example";
const ZITADEL_HOST: &str = "sso.harmony.local";
const OPENBAO_HOST: &str = "bao.harmony.local";
const HTTP_PORT: u32 = 8080;
const OPENBAO_NAMESPACE: &str = "openbao";
const OPENBAO_POD: &str = "openbao-0";
const APP_NAME: &str = "harmony-cli";
const PROJECT_NAME: &str = "harmony";

#[derive(Parser)]
#[command(
    name = "harmony-sso",
    about = "Deploy Zitadel + OpenBao on k3d, authenticate via SSO, store config"
)]
struct Args {
    /// Skip Zitadel deployment (OpenBao only, faster iteration)
    #[arg(long)]
    skip_zitadel: bool,

    /// Delete the k3d cluster and exit
    #[arg(long)]
    cleanup: bool,
}

// ---------------------------------------------------------------------------
// Config type stored via SSO-authenticated OpenBao
// ---------------------------------------------------------------------------

#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, PartialEq)]
struct SsoExampleConfig {
    team_name: String,
    environment: String,
    max_replicas: u16,
}

impl Default for SsoExampleConfig {
    fn default() -> Self {
        Self {
            team_name: "platform-team".to_string(),
            environment: "staging".to_string(),
            max_replicas: 3,
        }
    }
}

impl Config for SsoExampleConfig {
    const KEY: &'static str = "SsoExampleConfig";
}

// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------

fn harmony_data_dir() -> PathBuf {
    directories::BaseDirs::new()
        .map(|dirs| dirs.data_dir().join("harmony"))
        .unwrap_or_else(|| PathBuf::from("/tmp/harmony"))
}

fn create_k3d() -> K3d {
    let base_dir = harmony_data_dir().join("k3d");
    std::fs::create_dir_all(&base_dir).expect("Failed to create k3d data directory");
    K3d::new(base_dir, Some(CLUSTER_NAME.to_string()))
        .with_port_mappings(vec![PortMapping::new(HTTP_PORT, 80)])
}

fn create_topology(k3d: &K3d) -> harmony::topology::K8sAnywhereTopology {
    let context = k3d
        .context_name()
        .unwrap_or_else(|| format!("k3d-{}", CLUSTER_NAME));
    unsafe {
        std::env::set_var("HARMONY_USE_LOCAL_K3D", "false");
        std::env::set_var("HARMONY_AUTOINSTALL", "false");
        std::env::set_var("HARMONY_K8S_CONTEXT", &context);
    }
    harmony::topology::K8sAnywhereTopology::from_env()
}

fn harmony_dev_policy() -> OpenbaoPolicy {
    OpenbaoPolicy {
        name: "harmony-dev".to_string(),
        hcl: r#"path "secret/data/harmony/*" { capabilities = ["create","read","update","delete","list"] }
path "secret/metadata/harmony/*" { capabilities = ["list","read"] }"#
            .to_string(),
    }
}

// ---------------------------------------------------------------------------
// Zitadel deployment (with CNPG retry)
// ---------------------------------------------------------------------------

async fn deploy_zitadel(k3d: &K3d) -> anyhow::Result<()> {
    info!("Deploying Zitadel (this may take several minutes)...");

    let zitadel = ZitadelScore {
        host: ZITADEL_HOST.to_string(),
        zitadel_version: "v4.12.1".to_string(),
        external_secure: false,
    };

    let topology = create_topology(k3d);
    topology
        .ensure_ready()
        .await
        .context("Topology init failed")?;

    zitadel
        .interpret(&Inventory::autoload(), &topology)
        .await
        .context("Zitadel deployment failed")?;

    info!("Zitadel deployed successfully");
    Ok(())
}

async fn wait_for_zitadel_ready() -> anyhow::Result<()> {
    info!("Waiting for Zitadel to be ready...");
    let client = reqwest::Client::builder()
        .timeout(std::time::Duration::from_secs(5))
        .build()?;

    for attempt in 1..=90 {
        match client
            .get(format!(
                "http://127.0.0.1:{}/.well-known/openid-configuration",
                HTTP_PORT
            ))
            .header("Host", ZITADEL_HOST)
            .send()
            .await
        {
            Ok(resp) if resp.status().is_success() => {
                info!("Zitadel is ready");
                return Ok(());
            }
            Ok(resp) if attempt % 10 == 0 => {
                info!("Zitadel HTTP {}, attempt {}/90", resp.status(), attempt);
            }
            Err(e) if attempt % 10 == 0 => {
                info!("Zitadel not reachable: {}, attempt {}/90", e, attempt);
            }
            _ => {}
        }
        tokio::time::sleep(tokio::time::Duration::from_secs(2)).await;
    }

    anyhow::bail!("Timed out waiting for Zitadel")
}

// ---------------------------------------------------------------------------
// Cluster lifecycle
// ---------------------------------------------------------------------------

async fn ensure_k3d_cluster(k3d: &K3d) -> anyhow::Result<()> {
    info!("Ensuring k3d cluster '{}' is running...", CLUSTER_NAME);
    k3d.ensure_installed()
        .await
        .map_err(|e| anyhow::anyhow!("k3d setup failed: {}", e))?;
    info!("k3d cluster '{}' is ready", CLUSTER_NAME);
    Ok(())
}

fn cleanup_cluster(k3d: &K3d) -> anyhow::Result<()> {
    let name = k3d
        .cluster_name()
        .ok_or_else(|| anyhow::anyhow!("No cluster name"))?;
    info!("Deleting k3d cluster '{}'...", name);
    k3d.run_k3d_command(["cluster", "delete", name])
        .map_err(|e| anyhow::anyhow!("{}", e))?;
    info!("Cluster '{}' deleted", name);
    Ok(())
}

async fn cleanup_openbao_webhook(k8s: &K8sClient) -> anyhow::Result<()> {
    use k8s_openapi::api::admissionregistration::v1::MutatingWebhookConfiguration;
    if k8s
        .get_resource::<MutatingWebhookConfiguration>("openbao-agent-injector-cfg", None)
        .await?
        .is_some()
    {
        info!("Deleting conflicting OpenBao webhook...");
        k8s.delete_resource::<MutatingWebhookConfiguration>("openbao-agent-injector-cfg", None)
            .await?;
    }
    Ok(())
}

// ---------------------------------------------------------------------------
// Main
// ---------------------------------------------------------------------------

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
    let args = Args::parse();
    let k3d = create_k3d();

    if args.cleanup {
        return cleanup_cluster(&k3d);
    }

    info!("===========================================");
    info!("Harmony SSO Example");
    info!("===========================================");

    // --- Phase 1: Infrastructure ---

    ensure_k3d_cluster(&k3d).await?;

    let topology = create_topology(&k3d);
    topology
        .ensure_ready()
        .await
        .context("Topology init failed")?;

    let k8s = topology
        .k8s_client()
        .await
        .map_err(|e| anyhow::anyhow!("K8s client: {}", e))?;

    // Deploy + configure OpenBao (no JWT auth yet -- Zitadel isn't up)
    cleanup_openbao_webhook(&k8s).await?;
    OpenbaoScore {
        host: OPENBAO_HOST.to_string(),
        openshift: false,
    }
    .interpret(&Inventory::autoload(), &topology)
    .await
    .context("OpenBao deploy failed")?;

    OpenbaoSetupScore {
        policies: vec![harmony_dev_policy()],
        users: vec![OpenbaoUser {
            username: "harmony".to_string(),
            password: "harmony-dev-password".to_string(),
            policies: vec!["harmony-dev".to_string()],
        }],
        jwt_auth: None, // Phase 2 adds JWT after Zitadel is ready
        ..Default::default()
    }
    .interpret(&Inventory::autoload(), &topology)
    .await
    .context("OpenBao setup failed")?;

    if args.skip_zitadel {
        info!("=== Skipping Zitadel (--skip-zitadel) ===");
        info!("OpenBao: http://{}:{}", OPENBAO_HOST, HTTP_PORT);
        return Ok(());
    }

    // --- Phase 2: Identity + SSO Wiring ---

    CoreDNSRewriteScore {
        rewrites: vec![
            CoreDNSRewrite {
                hostname: ZITADEL_HOST.to_string(),
                target: "zitadel.zitadel.svc.cluster.local".to_string(),
            },
            CoreDNSRewrite {
                hostname: OPENBAO_HOST.to_string(),
                target: "openbao.openbao.svc.cluster.local".to_string(),
            },
        ],
    }
    .interpret(&Inventory::autoload(), &topology)
    .await
    .context("CoreDNS rewrite failed")?;

    deploy_zitadel(&k3d).await?;
    wait_for_zitadel_ready().await?;

    // Provision Zitadel project + device-code application
    ZitadelSetupScore {
        host: ZITADEL_HOST.to_string(),
        port: HTTP_PORT as u16,
        skip_tls: true,
        applications: vec![ZitadelApplication {
            project_name: PROJECT_NAME.to_string(),
            app_name: APP_NAME.to_string(),
            app_type: ZitadelAppType::DeviceCode,
        }],
        machine_users: vec![],
    }
    .interpret(&Inventory::autoload(), &topology)
    .await
    .context("Zitadel setup failed")?;

    // Read the client_id from the cache written by ZitadelSetupScore
    let zitadel_config =
        ZitadelClientConfig::load().context("ZitadelSetupScore did not produce a client config")?;
    let client_id = zitadel_config
        .client_id(APP_NAME)
        .context("No client_id for harmony-cli app")?
        .clone();

    info!("Zitadel app '{}' client_id: {}", APP_NAME, client_id);

    // Now configure OpenBao JWT auth with the real client_id
    OpenbaoSetupScore {
        policies: vec![harmony_dev_policy()],
        users: vec![OpenbaoUser {
            username: "harmony".to_string(),
            password: "harmony-dev-password".to_string(),
            policies: vec!["harmony-dev".to_string()],
        }],
        jwt_auth: Some(OpenbaoJwtAuth {
            oidc_discovery_url: format!("http://{}:{}", ZITADEL_HOST, HTTP_PORT),
            bound_issuer: format!("http://{}:{}", ZITADEL_HOST, HTTP_PORT),
            role_name: "harmony-developer".to_string(),
            bound_audiences: client_id.clone(),
            user_claim: "email".to_string(),
            policies: vec!["harmony-dev".to_string()],
            ttl: "4h".to_string(),
            max_ttl: "24h".to_string(),
        }),
        ..Default::default()
    }
    .interpret(&Inventory::autoload(), &topology)
    .await
    .context("OpenBao JWT auth setup failed")?;

    // --- Phase 3: Config via SSO ---

    info!("===========================================");
    info!("Storing config via SSO-authenticated OpenBao");
    info!("===========================================");

    let _pf = k8s
        .port_forward(OPENBAO_POD, OPENBAO_NAMESPACE, 8200, 8200)
        .await
        .context("Port-forward to OpenBao failed")?;
    tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;

    let openbao_url = format!("http://127.0.0.1:{}", _pf.port());
    let sso_url = format!("http://{}:{}", ZITADEL_HOST, HTTP_PORT);

    let store = OpenbaoSecretStore::new(
        openbao_url,
        "secret".to_string(),
        "jwt".to_string(),
        true,
        None,
        None,
        None,
        Some(sso_url),
        Some(client_id),
        Some("harmony-developer".to_string()),
        Some("jwt".to_string()),
    )
    .await
    .context("SSO authentication failed")?;

    let manager = ConfigManager::new(vec![
        Arc::new(EnvSource) as Arc<dyn harmony_config::ConfigSource>,
        Arc::new(StoreSource::new("harmony".to_string(), store)),
    ]);

    // Try to load existing config (succeeds on re-run)
    match manager.get::<SsoExampleConfig>().await {
        Ok(config) => {
            info!("Config loaded from OpenBao: {:?}", config);
        }
        Err(harmony_config::ConfigError::NotFound { .. }) => {
            info!("No config found, storing default...");
            let config = SsoExampleConfig::default();
            manager.set(&config).await?;
            info!("Config stored: {:?}", config);

            let retrieved: SsoExampleConfig = manager.get().await?;
            info!("Config verified: {:?}", retrieved);
            assert_eq!(config, retrieved);
        }
        Err(e) => return Err(e.into()),
    }

    info!("===========================================");
    info!("Success! Config managed via Zitadel SSO + OpenBao");
    info!("===========================================");
    info!("OpenBao: http://{}:{}", OPENBAO_HOST, HTTP_PORT);
    info!("Zitadel: http://{}:{}", ZITADEL_HOST, HTTP_PORT);
    info!("Run again to verify cached session works.");
    info!("cargo run -p example-harmony-sso -- --cleanup # teardown");

    Ok(())
}
@@ -1,15 +0,0 @@
[package]
name = "example-kvm-okd-ha-cluster"
version.workspace = true
edition = "2024"
license.workspace = true

[[bin]]
name = "kvm_okd_ha_cluster"
path = "src/main.rs"

[dependencies]
harmony = { path = "../../harmony" }
tokio.workspace = true
log.workspace = true
env_logger.workspace = true
@@ -1,100 +0,0 @@
# OKD HA Cluster on KVM

Deploys a complete OKD high-availability cluster on a KVM hypervisor using
Harmony's KVM module. All infrastructure is defined in Rust — no YAML, no
shell scripts, no hand-crafted XML.

## What it creates

| Resource          | Details                                     |
|-------------------|---------------------------------------------|
| Virtual network   | `harmonylan` — 192.168.100.0/24, NAT        |
| OPNsense VM       | 2 vCPU / 4 GiB RAM — gateway + PXE          |
| Control plane ×3  | 4 vCPU / 16 GiB RAM — `cp0` … `cp2`         |
| Worker ×3         | 8 vCPU / 32 GiB RAM — `worker0` … `worker2` |

## Architecture

All VMs share the same `harmonylan` virtual network. OPNsense sits on both
that network and the host bridge, acting as the gateway and PXE server.

```
Host network (bridge)
        │
┌───────┴───────┐
│   OPNsense    │  192.168.100.1 (gateway + PXE)
└───────┬───────┘
        │  harmonylan (192.168.100.0/24)
        ├── cp0      192.168.100.10
        ├── cp1      192.168.100.11
        ├── cp2      192.168.100.12
        ├── worker0  192.168.100.20
        ├── worker1  192.168.100.21
        └── worker2  192.168.100.22
```

All nodes PXE boot from the network interface. OPNsense serves the OKD
bootstrap images via TFTP/iPXE and handles DHCP for the whole subnet.

## Prerequisites

- Linux host with KVM/QEMU and libvirt installed
- `libvirt-dev` headers (for building the `virt` crate)
- A `default` storage pool configured in libvirt
- Sufficient disk space (~550 GiB for all VM images)

## Running

```bash
cargo run --bin kvm_okd_ha_cluster
```

Set `RUST_LOG=info` (or `debug`) to control verbosity.

## Configuration

| Environment variable    | Default          | Description                     |
|-------------------------|------------------|---------------------------------|
| `HARMONY_KVM_URI`       | `qemu:///system` | Libvirt connection URI          |
| `HARMONY_KVM_IMAGE_DIR` | harmony data dir | Directory for qcow2 disk images |

For a remote KVM host over SSH:

```bash
export HARMONY_KVM_URI="qemu+ssh://user@myhost/system"
```

## What happens after `cargo run`

The program defines all resources in libvirt but does not start any VMs.
Next steps:

1. Start OPNsense: `virsh start opnsense-harmony`
2. Connect to the OPNsense web UI at `https://192.168.100.1`
3. Configure DHCP, TFTP, and the iPXE menu for OKD
4. Start the control plane and worker nodes — they will PXE boot and begin
   the OKD installation automatically

## Cleanup

```bash
for vm in opnsense-harmony cp0-harmony cp1-harmony cp2-harmony \
          worker0-harmony worker1-harmony worker2-harmony; do
  virsh destroy "$vm" 2>/dev/null || true
  virsh undefine "$vm" --remove-all-storage 2>/dev/null || true
done
virsh net-destroy harmonylan 2>/dev/null || true
virsh net-undefine harmonylan 2>/dev/null || true
```

@@ -1,132 +0,0 @@
use harmony::modules::kvm::{
    BootDevice, NetworkConfig, NetworkRef, VmConfig, config::init_executor,
};
use log::info;

const NETWORK_NAME: &str = "harmonylan";
const NETWORK_GATEWAY: &str = "192.168.100.1";
const NETWORK_PREFIX: u8 = 24;

const OPNSENSE_IP: &str = "192.168.100.1";

/// Deploys a full OKD HA cluster on a local or remote KVM hypervisor.
///
/// # What it creates
///
/// - One isolated virtual network (`harmonylan`, 192.168.100.0/24)
/// - One OPNsense VM acting as the cluster gateway and PXE server
/// - Three OKD control-plane nodes
/// - Three OKD worker nodes
///
/// All nodes are configured to PXE boot from the network so that OPNsense
/// can drive unattended OKD installation via TFTP/iPXE.
///
/// # Configuration
///
/// | Environment variable      | Default               | Description                       |
/// |---------------------------|-----------------------|-----------------------------------|
/// | `HARMONY_KVM_URI`         | `qemu:///system`      | Libvirt connection URI            |
/// | `HARMONY_KVM_IMAGE_DIR`   | harmony data dir      | Directory for qcow2 disk images   |
pub async fn deploy_okd_ha_cluster() -> Result<(), String> {
    let executor = init_executor().map_err(|e| format!("KVM initialisation failed: {e}"))?;

    // -------------------------------------------------------------------------
    // Network
    // -------------------------------------------------------------------------
    let network = NetworkConfig::builder(NETWORK_NAME)
        .bridge("virbr100")
        .subnet(NETWORK_GATEWAY, NETWORK_PREFIX)
        .build();

    info!("Ensuring network '{NETWORK_NAME}' ({NETWORK_GATEWAY}/{NETWORK_PREFIX}) exists");
    executor
        .ensure_network(network)
        .await
        .map_err(|e| format!("Network setup failed: {e}"))?;

    // -------------------------------------------------------------------------
    // OPNsense gateway / PXE server
    // -------------------------------------------------------------------------
    let opnsense = opnsense_vm();
    info!("Defining OPNsense VM '{}'", opnsense.name);
    executor
        .ensure_vm(opnsense)
        .await
        .map_err(|e| format!("OPNsense VM setup failed: {e}"))?;

    // -------------------------------------------------------------------------
    // Control plane nodes
    // -------------------------------------------------------------------------
    for i in 0u8..3 {
        let vm = control_plane_vm(i);
        info!("Defining control plane VM '{}'", vm.name);
        executor
            .ensure_vm(vm)
            .await
            .map_err(|e| format!("Control plane VM setup failed: {e}"))?;
    }

    // -------------------------------------------------------------------------
    // Worker nodes
    // -------------------------------------------------------------------------
    for i in 0u8..3 {
        let vm = worker_vm(i);
        info!("Defining worker VM '{}'", vm.name);
        executor
            .ensure_vm(vm)
            .await
            .map_err(|e| format!("Worker VM setup failed: {e}"))?;
    }

    info!(
        "OKD HA cluster infrastructure ready. \
         Connect OPNsense at https://{OPNSENSE_IP} to configure DHCP, TFTP, and PXE \
         before starting the nodes."
    );
    Ok(())
}

// -----------------------------------------------------------------------------
// VM definitions
// -----------------------------------------------------------------------------

/// OPNsense firewall — gateway and PXE server for the cluster.
///
/// Connected to both the host bridge (WAN) and `harmonylan` (LAN). It manages
/// DHCP, TFTP, and the PXE menu that drives OKD installation on all other VMs.
fn opnsense_vm() -> VmConfig {
    VmConfig::builder("opnsense-harmony")
        .vcpus(2)
        .memory_gb(4)
        .disk(20) // OS disk: vda
        .network(NetworkRef::named(NETWORK_NAME))
        .boot_order([BootDevice::Cdrom, BootDevice::Disk])
        .build()
}

/// One OKD control-plane node. Indexed 0..2 → `cp0-harmony` … `cp2-harmony`.
///
/// Boots from network so OPNsense can serve the OKD bootstrap image via PXE.
fn control_plane_vm(index: u8) -> VmConfig {
    VmConfig::builder(format!("cp{index}-harmony"))
        .vcpus(4)
        .memory_gb(16)
        .disk(120) // OS + etcd: vda
        .network(NetworkRef::named(NETWORK_NAME))
        .boot_order([BootDevice::Network, BootDevice::Disk])
        .build()
}

/// One OKD worker node. Indexed 0..2 → `worker0-harmony` … `worker2-harmony`.
///
/// Boots from network for automated OKD installation.
fn worker_vm(index: u8) -> VmConfig {
    VmConfig::builder(format!("worker{index}-harmony"))
        .vcpus(8)
        .memory_gb(32)
        .disk(120) // OS: vda
        .disk(200) // Persistent storage (ODF/Rook): vdb
        .network(NetworkRef::named(NETWORK_NAME))
        .boot_order([BootDevice::Network, BootDevice::Disk])
        .build()
}

@@ -1,7 +0,0 @@
use example_kvm_okd_ha_cluster::deploy_okd_ha_cluster;

#[tokio::main]
async fn main() -> Result<(), String> {
    env_logger::init();
    deploy_okd_ha_cluster().await
}

@@ -1,16 +0,0 @@
[package]
name = "kvm-vm-examples"
version.workspace = true
edition = "2024"
license.workspace = true

[[bin]]
name = "kvm-vm-examples"
path = "src/main.rs"

[dependencies]
harmony = { path = "../../harmony" }
tokio.workspace = true
log.workspace = true
env_logger.workspace = true
clap = { version = "4", features = ["derive"] }

@@ -1,47 +0,0 @@
# KVM VM Examples

Demonstrates creating VMs with various configurations using harmony's KVM module. These examples exercise the same infrastructure primitives needed for the full OKD HA cluster with OPNsense, control plane, and workers with Ceph.

## Prerequisites

A working KVM/libvirt setup:

```bash
# Manjaro / Arch
sudo pacman -S qemu-full libvirt virt-install dnsmasq ebtables
sudo systemctl enable --now libvirtd
sudo usermod -aG libvirt $USER
# Log out and back in for group membership to take effect
```

## Scenarios

| Scenario | VMs | Disks | NICs | Purpose |
|----------|-----|-------|------|---------|
| `alpine` | 1 | 1x2G | 1 | Minimal VM, fast boot (~5s) |
| `ubuntu` | 1 | 1x25G | 1 | Standard server setup |
| `worker` | 1 | 3 (60G+100G+100G) | 1 | Multi-disk for Ceph OSD |
| `gateway` | 1 | 1x10G | 2 (WAN+LAN) | Dual-NIC firewall |
| `ha-cluster` | 7 | mixed | 1 each | Full HA: gateway + 3 CP + 3 workers |

## Usage

```bash
# Deploy a scenario
cargo run -p kvm-vm-examples -- alpine
cargo run -p kvm-vm-examples -- ubuntu
cargo run -p kvm-vm-examples -- worker
cargo run -p kvm-vm-examples -- gateway
cargo run -p kvm-vm-examples -- ha-cluster

# Check status
cargo run -p kvm-vm-examples -- status alpine

# Clean up
cargo run -p kvm-vm-examples -- clean alpine
```

## Environment variables

- `HARMONY_KVM_URI`: libvirt URI (default: `qemu:///system`)
- `HARMONY_KVM_IMAGE_DIR`: where disk images and ISOs are stored
|
||||
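Both variables are plain environment overrides read at startup. A minimal sketch of overriding them per invocation (the values here are hypothetical; adjust for your host):

```bash
# HARMONY_KVM_URI selects which libvirt instance to talk to;
# HARMONY_KVM_IMAGE_DIR redirects where disk images and ISOs are cached.
export HARMONY_KVM_URI="qemu:///session"
export HARMONY_KVM_IMAGE_DIR="$HOME/harmony-images"
# Any scenario then picks these up, e.g.:
#   cargo run -p kvm-vm-examples -- alpine
```

Using `qemu:///session` runs VMs under your user without root, at the cost of more limited networking than the `qemu:///system` default.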
@@ -1,358 +0,0 @@
//! KVM VM examples demonstrating various configurations.
//!
//! Each subcommand creates a different VM setup. All VMs are managed
//! via libvirt — you need a working KVM hypervisor on the host.
//!
//! # Prerequisites
//!
//! ```bash
//! # Manjaro / Arch
//! sudo pacman -S qemu-full libvirt virt-install dnsmasq ebtables
//! sudo systemctl enable --now libvirtd
//! sudo usermod -aG libvirt $USER
//! ```
//!
//! # Environment variables
//!
//! - `HARMONY_KVM_URI`: libvirt URI (default: `qemu:///system`)
//! - `HARMONY_KVM_IMAGE_DIR`: disk image directory (default: `~/.local/share/harmony/kvm/images`)
//!
//! # Usage
//!
//! ```bash
//! # Simple Alpine VM (tiny, boots in seconds — great for testing)
//! cargo run -p kvm-vm-examples -- alpine
//!
//! # Ubuntu Server with cloud-init
//! cargo run -p kvm-vm-examples -- ubuntu
//!
//! # Multi-disk worker node (Ceph OSD style)
//! cargo run -p kvm-vm-examples -- worker
//!
//! # Multi-NIC gateway (OPNsense style: WAN + LAN)
//! cargo run -p kvm-vm-examples -- gateway
//!
//! # Full HA cluster: 1 gateway + 3 control plane + 3 workers
//! cargo run -p kvm-vm-examples -- ha-cluster
//!
//! # Clean up all VMs and networks from a scenario
//! cargo run -p kvm-vm-examples -- clean <scenario>
//! ```

use clap::{Parser, Subcommand};
use harmony::modules::kvm::config::init_executor;
use harmony::modules::kvm::{
    BootDevice, ForwardMode, KvmExecutor, NetworkConfig, NetworkRef, VmConfig, VmStatus,
};
use log::info;

#[derive(Parser)]
#[command(name = "kvm-vm-examples")]
#[command(about = "KVM VM examples for various infrastructure setups")]
struct Cli {
    #[command(subcommand)]
    command: Commands,
}

#[derive(Subcommand)]
enum Commands {
    /// Minimal Alpine Linux VM — fast boot, ~150MB ISO
    Alpine,
    /// Ubuntu Server 24.04 — standard server with 1 disk
    Ubuntu,
    /// Worker node with multiple disks (OS + Ceph OSD storage)
    Worker,
    /// Gateway/firewall with 2 NICs (WAN + LAN)
    Gateway,
    /// Full HA cluster: gateway + 3 control plane + 3 worker nodes
    HaCluster,
    /// Tear down all VMs and networks for a scenario
    Clean {
        /// Scenario to clean: alpine, ubuntu, worker, gateway, ha-cluster
        scenario: String,
    },
    /// Show status of all VMs in a scenario
    Status {
        /// Scenario: alpine, ubuntu, worker, gateway, ha-cluster
        scenario: String,
    },
}

const ALPINE_ISO: &str =
    "https://dl-cdn.alpinelinux.org/alpine/v3.21/releases/x86_64/alpine-virt-3.21.3-x86_64.iso";
const UBUNTU_ISO: &str = "https://releases.ubuntu.com/24.04.2/ubuntu-24.04.2-live-server-amd64.iso";

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();

    let cli = Cli::parse();
    let executor = init_executor()?;

    match cli.command {
        Commands::Alpine => deploy_alpine(&executor).await?,
        Commands::Ubuntu => deploy_ubuntu(&executor).await?,
        Commands::Worker => deploy_worker(&executor).await?,
        Commands::Gateway => deploy_gateway(&executor).await?,
        Commands::HaCluster => deploy_ha_cluster(&executor).await?,
        Commands::Clean { scenario } => clean(&executor, &scenario).await?,
        Commands::Status { scenario } => status(&executor, &scenario).await?,
    }

    Ok(())
}

// ── Alpine: minimal VM ──────────────────────────────────────────────────

async fn deploy_alpine(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
    let net = NetworkConfig::builder("alpine-net")
        .subnet("192.168.110.1", 24)
        .forward(ForwardMode::Nat)
        .build();

    executor.ensure_network(net).await?;

    let vm = VmConfig::builder("alpine-vm")
        .vcpus(1)
        .memory_mib(512)
        .disk(2)
        .network(NetworkRef::named("alpine-net"))
        .cdrom(ALPINE_ISO)
        .boot_order([BootDevice::Cdrom, BootDevice::Disk])
        .build();

    executor.ensure_vm(vm.clone()).await?;
    executor.start_vm(&vm.name).await?;

    info!("Alpine VM running. Connect: virsh console {}", vm.name);
    info!("Login: root (no password). Install: setup-alpine");
    Ok(())
}

// ── Ubuntu Server: standard setup ───────────────────────────────────────

async fn deploy_ubuntu(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
    let net = NetworkConfig::builder("ubuntu-net")
        .subnet("192.168.120.1", 24)
        .forward(ForwardMode::Nat)
        .build();

    executor.ensure_network(net).await?;

    let vm = VmConfig::builder("ubuntu-server")
        .vcpus(2)
        .memory_gb(4)
        .disk(25)
        .network(NetworkRef::named("ubuntu-net"))
        .cdrom(UBUNTU_ISO)
        .boot_order([BootDevice::Cdrom, BootDevice::Disk])
        .build();

    executor.ensure_vm(vm.clone()).await?;
    executor.start_vm(&vm.name).await?;

    info!(
        "Ubuntu Server VM running. Connect: virsh console {}",
        vm.name
    );
    info!("Follow the interactive installer to complete setup.");
    Ok(())
}

// ── Worker: multi-disk for Ceph ─────────────────────────────────────────

async fn deploy_worker(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
    let net = NetworkConfig::builder("worker-net")
        .subnet("192.168.130.1", 24)
        .forward(ForwardMode::Nat)
        .build();

    executor.ensure_network(net).await?;

    let vm = VmConfig::builder("worker-node")
        .vcpus(4)
        .memory_gb(8)
        .disk(60) // vda: OS
        .disk(100) // vdb: Ceph OSD 1
        .disk(100) // vdc: Ceph OSD 2
        .network(NetworkRef::named("worker-net"))
        .cdrom(ALPINE_ISO) // Use Alpine for fast testing
        .boot_order([BootDevice::Cdrom, BootDevice::Disk])
        .build();

    executor.ensure_vm(vm.clone()).await?;
    executor.start_vm(&vm.name).await?;

    info!("Worker node running with 3 disks (vda=60G OS, vdb=100G OSD, vdc=100G OSD)");
    info!("Connect: virsh console {}", vm.name);
    Ok(())
}

// ── Gateway: dual-NIC firewall ──────────────────────────────────────────

async fn deploy_gateway(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
    // WAN: NAT network (internet access)
    let wan = NetworkConfig::builder("gw-wan")
        .subnet("192.168.140.1", 24)
        .forward(ForwardMode::Nat)
        .build();

    // LAN: isolated network (no internet, internal only)
    let lan = NetworkConfig::builder("gw-lan")
        .subnet("10.100.0.1", 24)
        .isolated()
        .build();

    executor.ensure_network(wan).await?;
    executor.ensure_network(lan).await?;

    let vm = VmConfig::builder("gateway-vm")
        .vcpus(2)
        .memory_gb(2)
        .disk(10)
        .network(NetworkRef::named("gw-wan")) // First NIC = WAN
        .network(NetworkRef::named("gw-lan")) // Second NIC = LAN
        .cdrom(ALPINE_ISO)
        .boot_order([BootDevice::Cdrom, BootDevice::Disk])
        .build();

    executor.ensure_vm(vm.clone()).await?;
    executor.start_vm(&vm.name).await?;

    info!("Gateway VM running with 2 NICs: WAN (gw-wan) + LAN (gw-lan)");
    info!("Connect: virsh console {}", vm.name);
    Ok(())
}

// ── HA Cluster: full OKD-style deployment ───────────────────────────────

async fn deploy_ha_cluster(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
    // Network: NAT for external access, all nodes on the same subnet
    let cluster_net = NetworkConfig::builder("ha-cluster")
        .bridge("virbr-ha")
        .subnet("10.200.0.1", 24)
        .forward(ForwardMode::Nat)
        .build();

    executor.ensure_network(cluster_net).await?;

    // Gateway / firewall / load balancer
    let gateway = VmConfig::builder("ha-gateway")
        .vcpus(2)
        .memory_gb(2)
        .disk(10)
        .network(NetworkRef::named("ha-cluster"))
        .boot_order([BootDevice::Network, BootDevice::Disk])
        .build();
    executor.ensure_vm(gateway.clone()).await?;
    info!("Defined: {} (gateway/firewall)", gateway.name);

    // Control plane nodes
    for i in 1..=3 {
        let cp = VmConfig::builder(format!("ha-cp-{i}"))
            .vcpus(4)
            .memory_gb(16)
            .disk(120)
            .network(NetworkRef::named("ha-cluster"))
            .boot_order([BootDevice::Network, BootDevice::Disk])
            .build();
        executor.ensure_vm(cp.clone()).await?;
        info!("Defined: {} (control plane)", cp.name);
    }

    // Worker nodes with Ceph storage
    for i in 1..=3 {
        let worker = VmConfig::builder(format!("ha-worker-{i}"))
            .vcpus(8)
            .memory_gb(32)
            .disk(120) // vda: OS
            .disk(200) // vdb: Ceph OSD
            .network(NetworkRef::named("ha-cluster"))
            .boot_order([BootDevice::Network, BootDevice::Disk])
            .build();
        executor.ensure_vm(worker.clone()).await?;
        info!("Defined: {} (worker + Ceph)", worker.name);
    }

    info!("HA cluster defined (7 VMs). Start individually or use PXE boot.");
    info!(
        "To start all: for vm in ha-gateway ha-cp-{{1..3}} ha-worker-{{1..3}}; do virsh start $vm; done"
    );
    Ok(())
}

// ── Clean up ────────────────────────────────────────────────────────────

async fn clean(executor: &KvmExecutor, scenario: &str) -> Result<(), Box<dyn std::error::Error>> {
    let (vms, nets) = match scenario {
        "alpine" => (vec!["alpine-vm"], vec!["alpine-net"]),
        "ubuntu" => (vec!["ubuntu-server"], vec!["ubuntu-net"]),
        "worker" => (vec!["worker-node"], vec!["worker-net"]),
        "gateway" => (vec!["gateway-vm"], vec!["gw-wan", "gw-lan"]),
        "ha-cluster" => (
            vec![
                "ha-gateway",
                "ha-cp-1",
                "ha-cp-2",
                "ha-cp-3",
                "ha-worker-1",
                "ha-worker-2",
                "ha-worker-3",
            ],
            vec!["ha-cluster"],
        ),
        other => {
            eprintln!("Unknown scenario: {other}");
            eprintln!("Available: alpine, ubuntu, worker, gateway, ha-cluster");
            std::process::exit(1);
        }
    };

    for vm in &vms {
        info!("Cleaning up VM: {vm}");
        let _ = executor.destroy_vm(vm).await;
        let _ = executor.undefine_vm(vm).await;
    }
    for net in &nets {
        info!("Cleaning up network: {net}");
        let _ = executor.delete_network(net).await;
    }

    info!("Cleanup complete for scenario: {scenario}");
    Ok(())
}

// ── Status ──────────────────────────────────────────────────────────────

async fn status(executor: &KvmExecutor, scenario: &str) -> Result<(), Box<dyn std::error::Error>> {
    let vms: Vec<&str> = match scenario {
        "alpine" => vec!["alpine-vm"],
        "ubuntu" => vec!["ubuntu-server"],
        "worker" => vec!["worker-node"],
        "gateway" => vec!["gateway-vm"],
        "ha-cluster" => vec![
            "ha-gateway",
            "ha-cp-1",
            "ha-cp-2",
            "ha-cp-3",
            "ha-worker-1",
            "ha-worker-2",
            "ha-worker-3",
        ],
        other => {
            eprintln!("Unknown scenario: {other}");
            std::process::exit(1);
        }
    };

    println!("{:<20} {}", "VM", "STATUS");
    println!("{}", "-".repeat(35));
    for vm in &vms {
        let status = match executor.vm_status(vm).await {
            Ok(s) => format!("{s:?}"),
            Err(_) => "not found".to_string(),
        };
        println!("{:<20} {}", vm, status);
    }
    Ok(())
}
@@ -1,7 +1,6 @@
use brocade::BrocadeOptions;
use cidr::Ipv4Cidr;
use harmony::{
    config::secret::{OPNSenseApiCredentials, OPNSenseFirewallCredentials},
    hardware::{Location, SwitchGroup},
    infra::{
        brocade::{BrocadeSwitchClient, BrocadeSwitchConfig},
@@ -12,12 +11,20 @@ use harmony::{
    topology::{HAClusterTopology, LogicalHost, UnmanagedRouter},
};
use harmony_macros::{ip, ipv4};
use harmony_secret::SecretManager;
use harmony_secret::{Secret, SecretManager};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::{
    net::IpAddr,
    sync::{Arc, OnceLock},
};

#[derive(Secret, Serialize, Deserialize, JsonSchema, Debug, PartialEq)]
struct OPNSenseFirewallConfig {
    username: String,
    password: String,
}

pub async fn get_topology() -> HAClusterTopology {
    let firewall = harmony::topology::LogicalHost {
        ip: ip!("192.168.1.1"),
@@ -43,16 +50,17 @@ pub async fn get_topology() -> HAClusterTopology {

    let switch_client = Arc::new(switch_client);

    let ssh_creds = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>()
        .await
        .unwrap();
    let api_creds = SecretManager::get_or_prompt::<OPNSenseApiCredentials>()
        .await
        .unwrap();
    let config = SecretManager::get_or_prompt::<OPNSenseFirewallConfig>().await;
    let config = config.unwrap();

    let opnsense = Arc::new(
        harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, &api_creds, &ssh_creds)
            .await,
        harmony::infra::opnsense::OPNSenseFirewall::new(
            firewall,
            None,
            &config.username,
            &config.password,
        )
        .await,
    );
    let lan_subnet = ipv4!("192.168.1.0");
    let gateway_ipv4 = ipv4!("192.168.1.1");

@@ -43,17 +43,17 @@ pub async fn get_topology() -> HAClusterTopology {

    let switch_client = Arc::new(switch_client);

    let ssh_creds = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>()
        .await
        .unwrap();
    let api_creds =
        SecretManager::get_or_prompt::<harmony::config::secret::OPNSenseApiCredentials>()
            .await
            .unwrap();
    let config = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>().await;
    let config = config.unwrap();

    let opnsense = Arc::new(
        harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, &api_creds, &ssh_creds)
            .await,
        harmony::infra::opnsense::OPNSenseFirewall::new(
            firewall,
            None,
            &config.username,
            &config.password,
        )
        .await,
    );
    let lan_subnet = ipv4!("192.168.1.0");
    let gateway_ipv4 = ipv4!("192.168.1.1");

@@ -6,7 +6,6 @@ use harmony::{
async fn main() {
    let openbao = OpenbaoScore {
        host: "openbao.sebastien.sto1.nationtech.io".to_string(),
        openshift: false,
    };

    harmony_cli::run(

@@ -1,5 +1,5 @@
use harmony::{
    config::secret::{OPNSenseApiCredentials, OPNSenseFirewallCredentials},
    config::secret::OPNSenseFirewallCredentials,
    infra::opnsense::OPNSenseFirewall,
    inventory::Inventory,
    modules::{dhcp::DhcpScore, opnsense::OPNsenseShellCommandScore},
@@ -17,14 +17,17 @@ async fn main() {
        name: String::from("opnsense-1"),
    };

    let ssh_creds = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>()
    let opnsense_auth = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>()
        .await
        .expect("Failed to get SSH credentials");
    let api_creds = SecretManager::get_or_prompt::<OPNSenseApiCredentials>()
        .await
        .expect("Failed to get API credentials");
        .expect("Failed to get credentials");

    let opnsense = OPNSenseFirewall::new(firewall, None, &api_creds, &ssh_creds).await;
    let opnsense = OPNSenseFirewall::new(
        firewall,
        None,
        &opnsense_auth.username,
        &opnsense_auth.password,
    )
    .await;

    let dhcp_score = DhcpScore {
        dhcp_range: (

@@ -48,17 +48,8 @@ async fn main() {
        name: String::from("fw0"),
    };

    let api_creds = harmony::config::secret::OPNSenseApiCredentials {
        key: "root".to_string(),
        secret: "opnsense".to_string(),
    };
    let ssh_creds = harmony::config::secret::OPNSenseFirewallCredentials {
        username: "root".to_string(),
        password: "opnsense".to_string(),
    };
    let opnsense = Arc::new(
        harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, &api_creds, &ssh_creds)
            .await,
        harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, "root", "opnsense").await,
    );

    let topology = OpnSenseTopology {

@@ -1,25 +0,0 @@
[package]
name = "opnsense-pair-integration"
version.workspace = true
edition = "2024"
license.workspace = true

[[bin]]
name = "opnsense-pair-integration"
path = "src/main.rs"

[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_inventory_agent = { path = "../../harmony_inventory_agent" }
harmony_macros = { path = "../../harmony_macros" }
harmony_types = { path = "../../harmony_types" }
opnsense-api = { path = "../../opnsense-api" }
opnsense-config = { path = "../../opnsense-config" }
tokio.workspace = true
log.workspace = true
env_logger.workspace = true
reqwest.workspace = true
russh.workspace = true
serde_json.workspace = true
dirs = "6"
@@ -1,64 +0,0 @@
# OPNsense Firewall Pair Integration Example

Boots two OPNsense VMs, bootstraps both with automated SSH/API setup, then configures a CARP HA firewall pair using `FirewallPairTopology` and `CarpVipScore`. Fully automated, CI-friendly.

## Quick start

```bash
# Prerequisites (same as single-VM example)
./examples/opnsense_vm_integration/setup-libvirt.sh

# Boot + bootstrap + pair test (fully unattended)
cargo run -p opnsense-pair-integration -- --full
```

## What it does

1. Creates a shared LAN network + 2 OPNsense VMs (2 NICs each: LAN + WAN)
2. Bootstraps both VMs sequentially using NIC link control to avoid IP conflicts:
   - Disables backup's LAN NIC
   - Bootstraps primary on .1 (login, SSH, webgui port 9443)
   - Changes primary's LAN IP from .1 to .2
   - Swaps NICs (disable primary, enable backup)
   - Bootstraps backup on .1
   - Changes backup's LAN IP from .1 to .3
   - Re-enables all NICs
3. Applies pair scores via `FirewallPairTopology`:
   - `CarpVipScore` — CARP VIP at .1 (primary advskew=0, backup advskew=100)
   - `VlanScore` — VLAN 100 on both
   - `FirewallRuleScore` — ICMP allow on both
4. Verifies CARP VIPs and VLANs via REST API on both firewalls
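The NIC link control in step 2 can also be driven by hand with `virsh`, which is what the KVM executor wraps. A rough manual equivalent, assuming the VM names from this example (the MAC addresses below are placeholders; read the real ones from `virsh domiflist <vm>`):

```bash
# Take the backup's LAN NIC down so the primary owns 192.168.1.1
virsh domif-setlink opn-pair-backup 52:54:00:00:00:02 down
# ...bootstrap the primary, move it to .2, then swap:
virsh domif-setlink opn-pair-primary 52:54:00:00:00:01 down
virsh domif-setlink opn-pair-backup 52:54:00:00:00:02 up
# ...bootstrap the backup, move it to .3, then restore:
virsh domif-setlink opn-pair-primary 52:54:00:00:00:01 up
```

This is useful when debugging a failed bootstrap: you can replay any step of the sequence without re-running the whole example.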

## Network topology

```
Host (192.168.1.10)
  |
  +--- virbr-pair (192.168.1.0/24, NAT)
  |         |               |
  |     fw-primary      fw-backup
  |     vtnet0=.2       vtnet0=.3
  |        (CARP VIP: .1)
  |
  +--- virbr0 (default, DHCP)
            |               |
        fw-primary      fw-backup
        vtnet1=dhcp     vtnet1=dhcp   (WAN)
```

Both VMs boot with OPNsense's default LAN IP of 192.168.1.1. The NIC juggling sequence ensures only one VM has its LAN NIC active at a time during bootstrap, avoiding address conflicts.

## Requirements

Same as the single-VM example: Linux with KVM, libvirt, ~20 GB disk space, ~20 minutes first run.

## Commands

| Command | Description |
|---------|-------------|
| `--check` | Verify prerequisites |
| `--boot` | Boot + bootstrap both VMs |
| (default) | Run pair integration test |
| `--full` | Boot + bootstrap + test (CI mode) |
| `--status` | Show both VMs' status |
| `--clean` | Destroy both VMs and networks |
@@ -1,690 +0,0 @@
//! OPNsense firewall pair integration example.
//!
//! Boots two OPNsense VMs, bootstraps both (login, SSH, webgui port),
//! then applies `FirewallPairTopology` + `CarpVipScore` for CARP HA testing.
//!
//! Both VMs share a LAN bridge but boot with the same default IP (.1).
//! The bootstrap sequence disables one VM's LAN NIC while bootstrapping
//! the other, then changes IPs via the API to avoid conflicts.
//!
//! # Usage
//!
//! ```bash
//! cargo run -p opnsense-pair-integration -- --check   # verify prerequisites
//! cargo run -p opnsense-pair-integration -- --boot    # boot + bootstrap both VMs
//! cargo run -p opnsense-pair-integration              # run pair integration test
//! cargo run -p opnsense-pair-integration -- --full    # boot + bootstrap + test (CI mode)
//! cargo run -p opnsense-pair-integration -- --status  # check both VMs
//! cargo run -p opnsense-pair-integration -- --clean   # tear down everything
//! ```

use std::net::IpAddr;
use std::path::{Path, PathBuf};
use std::sync::Arc;

use harmony::config::secret::{OPNSenseApiCredentials, OPNSenseFirewallCredentials};
use harmony::infra::opnsense::OPNSenseFirewall;
use harmony::inventory::Inventory;
use harmony::modules::kvm::config::init_executor;
use harmony::modules::kvm::{
    BootDevice, ForwardMode, KvmExecutor, NetworkConfig, NetworkRef, VmConfig,
};
use harmony::modules::opnsense::bootstrap::OPNsenseBootstrap;
use harmony::modules::opnsense::firewall::{FilterRuleDef, FirewallRuleScore};
use harmony::modules::opnsense::vip::VipDef;
use harmony::modules::opnsense::vlan::{VlanDef, VlanScore};
use harmony::score::Score;
use harmony::topology::{CarpVipScore, FirewallPairTopology, LogicalHost};
use harmony_types::firewall::{Direction, FirewallAction, IpProtocol, NetworkProtocol, VipMode};
use log::info;

const OPNSENSE_IMG_URL: &str =
    "https://mirror.ams1.nl.leaseweb.net/opnsense/releases/26.1/OPNsense-26.1-nano-amd64.img.bz2";
const OPNSENSE_IMG_NAME: &str = "OPNsense-26.1-nano-amd64.img";

const VM_PRIMARY: &str = "opn-pair-primary";
const VM_BACKUP: &str = "opn-pair-backup";
const NET_LAN: &str = "opn-pair-lan";

/// Both VMs boot on this IP (OPNsense default, ignores injected config.xml).
/// We bootstrap one at a time by toggling LAN NICs, then change IPs via the API.
const BOOT_IP: &str = "192.168.1.1";
const HOST_IP: &str = "192.168.1.10";

/// After bootstrap, primary gets .2, backup gets .3, CARP VIP stays at .1
const PRIMARY_IP: &str = "192.168.1.2";
const BACKUP_IP: &str = "192.168.1.3";
const CARP_VIP: &str = "192.168.1.1";

const API_PORT: u16 = 9443;
const CARP_PASSWORD: &str = "pair-test-carp";

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    harmony_cli::cli_logger::init();

    let args: Vec<String> = std::env::args().collect();

    if args.iter().any(|a| a == "--check") {
        return check_prerequisites();
    }
    if args.iter().any(|a| a == "--download") {
        download_image().await?;
        return Ok(());
    }

    let executor = init_executor()?;

    if args.iter().any(|a| a == "--clean") {
        return clean(&executor).await;
    }
    if args.iter().any(|a| a == "--status") {
        return status(&executor).await;
    }
    if args.iter().any(|a| a == "--boot") {
        let img_path = download_image().await?;
        return boot_pair(&executor, &img_path).await;
    }
    if args.iter().any(|a| a == "--full") {
        let img_path = download_image().await?;
        boot_pair(&executor, &img_path).await?;
        return run_pair_test().await;
    }

    // Default: run pair test (assumes VMs are bootstrapped)
    check_prerequisites()?;
    run_pair_test().await
}

// ── Phase 1: Boot and bootstrap both VMs ───────────────────────────

async fn boot_pair(
    executor: &KvmExecutor,
    img_path: &Path,
) -> Result<(), Box<dyn std::error::Error>> {
    info!("Creating shared LAN network and two OPNsense VMs...");

    // Create the shared LAN network
    let network = NetworkConfig::builder(NET_LAN)
        .bridge("virbr-pair")
        .subnet(HOST_IP, 24)
        .forward(ForwardMode::Nat)
        .build();
    executor.ensure_network(network).await?;

    // Prepare disk images for both VMs
    for vm_name in [VM_PRIMARY, VM_BACKUP] {
        prepare_vm_disk(vm_name, img_path)?;
    }

    // Define and start both VMs (2 NICs each: LAN + WAN)
    for vm_name in [VM_PRIMARY, VM_BACKUP] {
        let disk = image_dir().join(format!("{vm_name}.qcow2"));
        let vm = VmConfig::builder(vm_name)
            .vcpus(1)
            .memory_mib(1024)
            .disk_from_path(disk.to_string_lossy().to_string())
            .network(NetworkRef::named(NET_LAN)) // vtnet0 = LAN
            .network(NetworkRef::named("default")) // vtnet1 = WAN
            .boot_order([BootDevice::Disk])
            .build();
        executor.ensure_vm(vm).await?;
        executor.start_vm(vm_name).await?;
    }

    // Get MAC addresses for LAN NICs (first interface on each VM)
    let primary_interfaces = executor.list_interfaces(VM_PRIMARY).await?;
    let backup_interfaces = executor.list_interfaces(VM_BACKUP).await?;
    let primary_lan_mac = &primary_interfaces[0].mac;
    let backup_lan_mac = &backup_interfaces[0].mac;
    info!("Primary LAN MAC: {primary_lan_mac}, Backup LAN MAC: {backup_lan_mac}");

    // ── Sequential bootstrap with NIC juggling ─────────────────────
    //
    // Both VMs boot on .1 (OPNsense default). We disable backup's LAN
    // NIC so primary gets exclusive access to .1, bootstrap it, change
    // its IP, then do the same for backup.

    // Step 1: Disable backup's LAN NIC
    info!("Disabling backup LAN NIC for primary bootstrap...");
    executor
        .set_interface_link(VM_BACKUP, backup_lan_mac, false)
        .await?;

    // Step 2: Wait for primary web UI and bootstrap
    info!("Waiting for primary web UI at https://{BOOT_IP}...");
    wait_for_https(BOOT_IP, 443).await?;
    bootstrap_vm("primary", BOOT_IP).await?;

    // Step 3: Change primary's LAN IP from .1 to .2 via API
    info!("Changing primary LAN IP to {PRIMARY_IP}...");
    change_lan_ip_via_ssh(BOOT_IP, PRIMARY_IP, 24).await?;

    // Step 4: Wait for primary to come back on new IP
    info!("Waiting for primary on new IP {PRIMARY_IP}:{API_PORT}...");
    OPNsenseBootstrap::wait_for_ready(
        &format!("https://{PRIMARY_IP}:{API_PORT}"),
        std::time::Duration::from_secs(60),
    )
    .await?;

    // Step 5: Disable primary's LAN NIC, enable backup's
    info!("Swapping NICs: disabling primary, enabling backup...");
    executor
        .set_interface_link(VM_PRIMARY, primary_lan_mac, false)
        .await?;
    executor
        .set_interface_link(VM_BACKUP, backup_lan_mac, true)
        .await?;

    // Step 6: Wait for backup web UI and bootstrap
    info!("Waiting for backup web UI at https://{BOOT_IP}...");
    wait_for_https(BOOT_IP, 443).await?;
    bootstrap_vm("backup", BOOT_IP).await?;

    // Step 7: Change backup's LAN IP from .1 to .3 via API
    info!("Changing backup LAN IP to {BACKUP_IP}...");
    change_lan_ip_via_ssh(BOOT_IP, BACKUP_IP, 24).await?;

    // Step 8: Re-enable primary's LAN NIC
    info!("Re-enabling primary LAN NIC...");
    executor
        .set_interface_link(VM_PRIMARY, primary_lan_mac, true)
        .await?;

    // Step 9: Wait for both to be reachable on their final IPs
    info!("Waiting for both VMs on final IPs...");
    OPNsenseBootstrap::wait_for_ready(
        &format!("https://{PRIMARY_IP}:{API_PORT}"),
        std::time::Duration::from_secs(60),
    )
    .await?;
    OPNsenseBootstrap::wait_for_ready(
        &format!("https://{BACKUP_IP}:{API_PORT}"),
        std::time::Duration::from_secs(60),
    )
    .await?;

    println!();
    println!("OPNsense firewall pair is running and bootstrapped:");
    println!("  Primary:  https://{PRIMARY_IP}:{API_PORT} (root/opnsense)");
    println!("  Backup:   https://{BACKUP_IP}:{API_PORT} (root/opnsense)");
    println!("  CARP VIP: {CARP_VIP} (will be configured by pair scores)");
    println!();
    println!("Run the pair integration test:");
    println!("  cargo run -p opnsense-pair-integration");

    Ok(())
}

async fn bootstrap_vm(role: &str, ip: &str) -> Result<(), Box<dyn std::error::Error>> {
    info!("Bootstrapping {role} firewall at {ip}...");
    let bootstrap = OPNsenseBootstrap::new(&format!("https://{ip}"));
    bootstrap.login("root", "opnsense").await?;
    bootstrap.abort_wizard().await?;
    bootstrap.enable_ssh(true, true).await?;
    bootstrap.set_webgui_port(API_PORT, ip, false).await?;

    // Wait for webgui on new port
    OPNsenseBootstrap::wait_for_ready(
        &format!("https://{ip}:{API_PORT}"),
        std::time::Duration::from_secs(120),
    )
    .await?;

    // Verify SSH
    for _ in 0..15 {
        if check_tcp_port(ip, 22).await {
            break;
        }
        tokio::time::sleep(std::time::Duration::from_secs(2)).await;
    }
    if !check_tcp_port(ip, 22).await {
        return Err(format!("SSH not reachable on {role} after bootstrap").into());
    }

    info!("{role} bootstrap complete");
    Ok(())
}

/// Change the LAN interface IP via SSH (using OPNsense's ifconfig + config edit).
async fn change_lan_ip_via_ssh(
    current_ip: &str,
    new_ip: &str,
    subnet: u8,
) -> Result<(), Box<dyn std::error::Error>> {
    use opnsense_config::config::{OPNsenseShell, SshCredentials, SshOPNSenseShell};

    let ssh_config = Arc::new(russh::client::Config {
        inactivity_timeout: None,
        ..<_>::default()
    });
    let credentials = SshCredentials::Password {
        username: "root".to_string(),
        password: "opnsense".to_string(),
    };
    let ip: IpAddr = current_ip.parse()?;
    let shell = SshOPNSenseShell::new((ip, 22), credentials, ssh_config);

    // Use a PHP script to update config.xml and apply
    let php_script = format!(
        r#"<?php
require_once '/usr/local/etc/inc/config.inc';
$config = OPNsense\Core\Config::getInstance();
$config->object()->interfaces->lan->ipaddr = '{new_ip}';
$config->object()->interfaces->lan->subnet = '{subnet}';
$config->save();
echo "OK\n";
"#
    );

    shell
        .write_content_to_file(&php_script, "/tmp/change_ip.php")
        .await?;
    let output = shell
        .exec("php /tmp/change_ip.php && rm /tmp/change_ip.php && configctl interface reconfigure lan")
        .await?;
    info!("IP change result: {}", output.trim());

    Ok(())
}

// ── Phase 2: Pair integration test ─────────────────────────────────

async fn run_pair_test() -> Result<(), Box<dyn std::error::Error>> {
    // Verify both VMs are reachable
    info!("Checking primary at {PRIMARY_IP}:{API_PORT}...");
    if !check_tcp_port(PRIMARY_IP, API_PORT).await {
        return Err(format!("Primary not reachable at {PRIMARY_IP}:{API_PORT}").into());
    }
    info!("Checking backup at {BACKUP_IP}:{API_PORT}...");
    if !check_tcp_port(BACKUP_IP, API_PORT).await {
        return Err(format!("Backup not reachable at {BACKUP_IP}:{API_PORT}").into());
    }

    // Create API keys on both
    info!("Creating API keys...");
    let primary_ip: IpAddr = PRIMARY_IP.parse()?;
    let backup_ip: IpAddr = BACKUP_IP.parse()?;
    let (primary_key, primary_secret) = create_api_key_ssh(&primary_ip).await?;
    let (backup_key, backup_secret) = create_api_key_ssh(&backup_ip).await?;
    info!("API keys created for both firewalls");

    // Build FirewallPairTopology
    let primary_host = LogicalHost {
        ip: primary_ip.into(),
        name: VM_PRIMARY.to_string(),
    };
    let backup_host = LogicalHost {
        ip: backup_ip.into(),
        name: VM_BACKUP.to_string(),
|
||||
};
|
||||
let primary_api_creds = OPNSenseApiCredentials {
|
||||
key: primary_key.clone(),
|
||||
secret: primary_secret.clone(),
|
||||
};
|
||||
let backup_api_creds = OPNSenseApiCredentials {
|
||||
key: backup_key.clone(),
|
||||
secret: backup_secret.clone(),
|
||||
};
|
||||
let ssh_creds = OPNSenseFirewallCredentials {
|
||||
username: "root".to_string(),
|
||||
password: "opnsense".to_string(),
|
||||
};
|
||||
let primary_fw = OPNSenseFirewall::with_api_port(
|
||||
primary_host,
|
||||
None,
|
||||
API_PORT,
|
||||
&primary_api_creds,
|
||||
&ssh_creds,
|
||||
)
|
||||
.await;
|
||||
let backup_fw =
|
||||
OPNSenseFirewall::with_api_port(backup_host, None, API_PORT, &backup_api_creds, &ssh_creds)
|
||||
.await;
|
||||
let pair = FirewallPairTopology {
|
||||
primary: primary_fw,
|
||||
backup: backup_fw,
|
||||
};
|
||||
|
||||
// Build pair scores
|
||||
let carp_score = CarpVipScore {
|
||||
vips: vec![VipDef {
|
||||
mode: VipMode::Carp,
|
||||
interface: "lan".to_string(),
|
||||
subnet: CARP_VIP.to_string(),
|
||||
subnet_bits: 24,
|
||||
vhid: Some(1),
|
||||
advbase: Some(1),
|
||||
advskew: None, // handled by CarpVipScore (primary=0, backup=100)
|
||||
password: Some(CARP_PASSWORD.to_string()),
|
||||
peer: None,
|
||||
}],
|
||||
backup_advskew: Some(100),
|
||||
};
|
||||
|
||||
let vlan_score = VlanScore {
|
||||
vlans: vec![VlanDef {
|
||||
parent_interface: "vtnet0".to_string(),
|
||||
tag: 100,
|
||||
description: "pair-test-vlan-100".to_string(),
|
||||
}],
|
||||
};
|
||||
|
||||
let fw_rule_score = FirewallRuleScore {
|
||||
rules: vec![FilterRuleDef {
|
||||
action: FirewallAction::Pass,
|
||||
direction: Direction::In,
|
||||
interface: "lan".to_string(),
|
||||
ip_protocol: IpProtocol::Inet,
|
||||
protocol: NetworkProtocol::Icmp,
|
||||
source_net: "any".to_string(),
|
||||
destination_net: "any".to_string(),
|
||||
destination_port: None,
|
||||
gateway: None,
|
||||
description: "pair-test-allow-icmp".to_string(),
|
||||
log: false,
|
||||
}],
|
||||
};
|
||||
|
||||
// Run pair scores
|
||||
info!("Running pair scores...");
|
||||
let scores: Vec<Box<dyn Score<FirewallPairTopology>>> = vec![
|
||||
Box::new(carp_score),
|
||||
Box::new(vlan_score),
|
||||
Box::new(fw_rule_score),
|
||||
];
|
||||
let args = harmony_cli::Args {
|
||||
yes: true,
|
||||
filter: None,
|
||||
interactive: false,
|
||||
all: true,
|
||||
number: 0,
|
||||
list: false,
|
||||
};
|
||||
harmony_cli::run_cli(Inventory::autoload(), pair, scores, args).await?;
|
||||
|
||||
// Verify CARP VIPs via API
|
||||
info!("Verifying CARP VIPs...");
|
||||
let primary_client = opnsense_api::OpnsenseClient::builder()
|
||||
.base_url(format!("https://{PRIMARY_IP}:{API_PORT}/api"))
|
||||
.auth_from_key_secret(&primary_key, &primary_secret)
|
||||
.skip_tls_verify()
|
||||
.timeout_secs(60)
|
||||
.build()?;
|
||||
let backup_client = opnsense_api::OpnsenseClient::builder()
|
||||
.base_url(format!("https://{BACKUP_IP}:{API_PORT}/api"))
|
||||
.auth_from_key_secret(&backup_key, &backup_secret)
|
||||
.skip_tls_verify()
|
||||
.timeout_secs(60)
|
||||
.build()?;
|
||||
|
||||
let primary_vips: serde_json::Value = primary_client
|
||||
.get_typed("interfaces", "vip_settings", "searchItem")
|
||||
.await?;
|
||||
let backup_vips: serde_json::Value = backup_client
|
||||
.get_typed("interfaces", "vip_settings", "searchItem")
|
||||
.await?;
|
||||
|
||||
let primary_vip_count = primary_vips["rowCount"].as_i64().unwrap_or(0);
|
||||
let backup_vip_count = backup_vips["rowCount"].as_i64().unwrap_or(0);
|
||||
info!(" Primary VIPs: {primary_vip_count}");
|
||||
info!(" Backup VIPs: {backup_vip_count}");
|
||||
assert!(primary_vip_count >= 1, "Primary should have at least 1 VIP");
|
||||
assert!(backup_vip_count >= 1, "Backup should have at least 1 VIP");
|
||||
|
||||
// Verify VLANs on both
|
||||
let primary_vlans: serde_json::Value = primary_client
|
||||
.get_typed("interfaces", "vlan_settings", "get")
|
||||
.await?;
|
||||
let backup_vlans: serde_json::Value = backup_client
|
||||
.get_typed("interfaces", "vlan_settings", "get")
|
||||
.await?;
|
||||
let p_vlan_count = primary_vlans["vlan"]["vlan"]
|
||||
.as_object()
|
||||
.map(|m| m.len())
|
||||
.unwrap_or(0);
|
||||
let b_vlan_count = backup_vlans["vlan"]["vlan"]
|
||||
.as_object()
|
||||
.map(|m| m.len())
|
||||
.unwrap_or(0);
|
||||
info!(" Primary VLANs: {p_vlan_count}");
|
||||
info!(" Backup VLANs: {b_vlan_count}");
|
||||
assert!(p_vlan_count >= 1, "Primary should have at least 1 VLAN");
|
||||
assert!(b_vlan_count >= 1, "Backup should have at least 1 VLAN");
|
||||
|
||||
println!();
|
||||
println!("PASSED - OPNsense firewall pair integration test:");
|
||||
println!(
|
||||
" - CarpVipScore: CARP VIP {CARP_VIP} on both (primary advskew=0, backup advskew=100)"
|
||||
);
|
||||
println!(" - VlanScore: VLAN 100 on both");
|
||||
println!(" - FirewallRuleScore: ICMP allow on both");
|
||||
println!();
|
||||
println!("VMs are running. Use --clean to tear down.");
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
// ── Helpers ────────────────────────────────────────────────────────
|
||||
|
||||
fn prepare_vm_disk(vm_name: &str, img_path: &Path) -> Result<(), Box<dyn std::error::Error>> {
|
||||
let vm_raw = image_dir().join(format!("{vm_name}.img"));
|
||||
if !vm_raw.exists() {
|
||||
info!("Copying nano image for {vm_name}...");
|
||||
std::fs::copy(img_path, &vm_raw)?;
|
||||
|
||||
info!("Injecting config.xml for {vm_name}...");
|
||||
let config =
|
||||
harmony::modules::opnsense::image::minimal_config_xml("vtnet1", "vtnet0", BOOT_IP, 24);
|
||||
harmony::modules::opnsense::image::replace_config_xml(&vm_raw, &config)?;
|
||||
}
|
||||
|
||||
let vm_disk = image_dir().join(format!("{vm_name}.qcow2"));
|
||||
if !vm_disk.exists() {
|
||||
info!("Converting {vm_name} to qcow2...");
|
||||
run_cmd(
|
||||
"qemu-img",
|
||||
&[
|
||||
"convert",
|
||||
"-f",
|
||||
"raw",
|
||||
"-O",
|
||||
"qcow2",
|
||||
&vm_raw.to_string_lossy(),
|
||||
&vm_disk.to_string_lossy(),
|
||||
],
|
||||
)?;
|
||||
run_cmd("qemu-img", &["resize", &vm_disk.to_string_lossy(), "4G"])?;
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn check_prerequisites() -> Result<(), Box<dyn std::error::Error>> {
|
||||
let mut ok = true;
|
||||
for (cmd, test_args) in [
|
||||
("virsh", vec!["-c", "qemu:///system", "version"]),
|
||||
("qemu-img", vec!["--version"]),
|
||||
("bunzip2", vec!["--help"]),
|
||||
] {
|
||||
match std::process::Command::new(cmd).args(&test_args).output() {
|
||||
Ok(out) if out.status.success() => println!("[ok] {cmd}"),
|
||||
_ => {
|
||||
println!("[FAIL] {cmd}");
|
||||
ok = false;
|
||||
}
|
||||
}
|
||||
}
|
||||
if !ok {
|
||||
return Err("Prerequisites not met".into());
|
||||
}
|
||||
println!("All prerequisites met.");
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn run_cmd(cmd: &str, args: &[&str]) -> Result<(), Box<dyn std::error::Error>> {
|
||||
let status = std::process::Command::new(cmd).args(args).status()?;
|
||||
if !status.success() {
|
||||
return Err(format!("{cmd} failed").into());
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn image_dir() -> PathBuf {
|
||||
let dir = std::env::var("HARMONY_KVM_IMAGE_DIR").unwrap_or_else(|_| {
|
||||
dirs::data_dir()
|
||||
.unwrap_or_else(|| PathBuf::from("/tmp"))
|
||||
.join("harmony")
|
||||
.join("kvm")
|
||||
.join("images")
|
||||
.to_string_lossy()
|
||||
.to_string()
|
||||
});
|
||||
PathBuf::from(dir)
|
||||
}
|
||||
|
||||
async fn download_image() -> Result<PathBuf, Box<dyn std::error::Error>> {
|
||||
let dir = image_dir();
|
||||
std::fs::create_dir_all(&dir)?;
|
||||
let img_path = dir.join(OPNSENSE_IMG_NAME);
|
||||
if img_path.exists() {
|
||||
info!("Image cached: {}", img_path.display());
|
||||
return Ok(img_path);
|
||||
}
|
||||
let bz2_path = dir.join(format!("{OPNSENSE_IMG_NAME}.bz2"));
|
||||
if !bz2_path.exists() {
|
||||
info!("Downloading OPNsense nano image (~350MB)...");
|
||||
let response = reqwest::Client::builder()
|
||||
.timeout(std::time::Duration::from_secs(600))
|
||||
.build()?
|
||||
.get(OPNSENSE_IMG_URL)
|
||||
.send()
|
||||
.await?;
|
||||
if !response.status().is_success() {
|
||||
return Err(format!("Download failed: HTTP {}", response.status()).into());
|
||||
}
|
||||
let bytes = response.bytes().await?;
|
||||
std::fs::write(&bz2_path, &bytes)?;
|
||||
}
|
||||
info!("Decompressing...");
|
||||
run_cmd("bunzip2", &["--keep", &bz2_path.to_string_lossy()])?;
|
||||
Ok(img_path)
|
||||
}
|
||||
|
||||
async fn clean(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
|
||||
info!("Cleaning up pair integration...");
|
||||
for vm_name in [VM_PRIMARY, VM_BACKUP] {
|
||||
let _ = executor.destroy_vm(vm_name).await;
|
||||
let _ = executor.undefine_vm(vm_name).await;
|
||||
for ext in ["img", "qcow2"] {
|
||||
let path = image_dir().join(format!("{vm_name}.{ext}"));
|
||||
if path.exists() {
|
||||
std::fs::remove_file(&path)?;
|
||||
info!("Removed: {}", path.display());
|
||||
}
|
||||
}
|
||||
}
|
||||
let _ = executor.delete_network(NET_LAN).await;
|
||||
info!("Done.");
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn status(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
|
||||
for (vm_name, ip) in [(VM_PRIMARY, PRIMARY_IP), (VM_BACKUP, BACKUP_IP)] {
|
||||
match executor.vm_status(vm_name).await {
|
||||
Ok(s) => {
|
||||
let api = check_tcp_port(ip, API_PORT).await;
|
||||
let ssh = check_tcp_port(ip, 22).await;
|
||||
println!("{vm_name}: {s:?}");
|
||||
println!(" LAN IP: {ip}");
|
||||
println!(
|
||||
" API: {}",
|
||||
if api { "responding" } else { "not responding" }
|
||||
);
|
||||
println!(
|
||||
" SSH: {}",
|
||||
if ssh { "responding" } else { "not responding" }
|
||||
);
|
||||
}
|
||||
Err(_) => println!("{vm_name}: not found"),
|
||||
}
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn wait_for_https(ip: &str, port: u16) -> Result<(), Box<dyn std::error::Error>> {
|
||||
let client = reqwest::Client::builder()
|
||||
.danger_accept_invalid_certs(true)
|
||||
.timeout(std::time::Duration::from_secs(5))
|
||||
.build()?;
|
||||
let url = format!("https://{ip}:{port}");
|
||||
for i in 0..60 {
|
||||
if client.get(&url).send().await.is_ok() {
|
||||
info!("Web UI responding at {url} (attempt {i})");
|
||||
return Ok(());
|
||||
}
|
||||
if i % 10 == 0 {
|
||||
info!("Waiting for {url}... (attempt {i})");
|
||||
}
|
||||
tokio::time::sleep(std::time::Duration::from_secs(5)).await;
|
||||
}
|
||||
Err(format!("{url} did not respond within 5 minutes").into())
|
||||
}
|
||||
|
||||
async fn check_tcp_port(ip: &str, port: u16) -> bool {
|
||||
tokio::time::timeout(
|
||||
std::time::Duration::from_secs(3),
|
||||
tokio::net::TcpStream::connect(format!("{ip}:{port}")),
|
||||
)
|
||||
.await
|
||||
.map(|r| r.is_ok())
|
||||
.unwrap_or(false)
|
||||
}
|
||||
|
||||
async fn create_api_key_ssh(ip: &IpAddr) -> Result<(String, String), Box<dyn std::error::Error>> {
|
||||
use opnsense_config::config::{OPNsenseShell, SshCredentials, SshOPNSenseShell};
|
||||
|
||||
let ssh_config = Arc::new(russh::client::Config {
|
||||
inactivity_timeout: None,
|
||||
..<_>::default()
|
||||
});
|
||||
let credentials = SshCredentials::Password {
|
||||
username: "root".to_string(),
|
||||
password: "opnsense".to_string(),
|
||||
};
|
||||
let shell = SshOPNSenseShell::new((*ip, 22), credentials, ssh_config);
|
||||
|
||||
let php_script = r#"<?php
|
||||
require_once '/usr/local/etc/inc/config.inc';
|
||||
$key = bin2hex(random_bytes(20));
|
||||
$secret = bin2hex(random_bytes(40));
|
||||
$config = OPNsense\Core\Config::getInstance();
|
||||
foreach ($config->object()->system->user as $user) {
|
||||
if ((string)$user->name === 'root') {
|
||||
if (!isset($user->apikeys)) { $user->addChild('apikeys'); }
|
||||
$item = $user->apikeys->addChild('item');
|
||||
$item->addChild('key', $key);
|
||||
$item->addChild('secret', crypt($secret, '$6$' . bin2hex(random_bytes(8)) . '$'));
|
||||
$config->save();
|
||||
echo $key . "\n" . $secret . "\n";
|
||||
exit(0);
|
||||
}
|
||||
}
|
||||
echo "ERROR: root user not found\n";
|
||||
exit(1);
|
||||
"#;
|
||||
|
||||
shell
|
||||
.write_content_to_file(php_script, "/tmp/create_api_key.php")
|
||||
.await?;
|
||||
let output = shell
|
||||
.exec("php /tmp/create_api_key.php && rm /tmp/create_api_key.php")
|
||||
.await?;
|
||||
let lines: Vec<&str> = output.trim().lines().collect();
|
||||
if lines.len() >= 2 && !lines[0].starts_with("ERROR") {
|
||||
Ok((lines[0].to_string(), lines[1].to_string()))
|
||||
} else {
|
||||
Err(format!("API key creation failed: {output}").into())
|
||||
}
|
||||
}
|
||||
@@ -1,25 +0,0 @@
[package]
name = "opnsense-vm-integration"
version.workspace = true
edition = "2024"
license.workspace = true

[[bin]]
name = "opnsense-vm-integration"
path = "src/main.rs"

[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_inventory_agent = { path = "../../harmony_inventory_agent" }
harmony_macros = { path = "../../harmony_macros" }
harmony_types = { path = "../../harmony_types" }
opnsense-api = { path = "../../opnsense-api" }
opnsense-config = { path = "../../opnsense-config" }
tokio.workspace = true
log.workspace = true
env_logger.workspace = true
reqwest.workspace = true
russh.workspace = true
serde_json.workspace = true
dirs = "6"
@@ -1,151 +0,0 @@
# OPNsense VM Integration Example

Fully automated end-to-end integration test: boots an OPNsense VM via KVM, bootstraps SSH and API access without any manual browser interaction, installs packages, and runs 11 Harmony Scores against it. CI-friendly.

## Quick start

```bash
# 1. One-time setup (libvirt, Docker compatibility)
./examples/opnsense_vm_integration/setup-libvirt.sh

# 2. Verify prerequisites
cargo run -p opnsense-vm-integration -- --check

# 3. Boot + bootstrap + integration test (fully unattended)
cargo run -p opnsense-vm-integration -- --full

# 4. Clean up
cargo run -p opnsense-vm-integration -- --clean
```

That's it. No browser clicks, no manual SSH setup, no wizard interaction.

## What happens during `--full`

1. Downloads OPNsense 26.1 nano image (~350MB, cached after first download)
2. Injects `config.xml` with virtio interface assignments (vtnet0=LAN, vtnet1=WAN)
3. Creates a 4 GiB qcow2 disk and boots via KVM (1 vCPU, 1GB RAM, 4 NICs)
4. Waits for web UI to respond (~20s)
5. **Automated bootstrap** via `OPNsenseBootstrap`:
   - Logs in (root/opnsense) with CSRF token handling
   - Aborts the initial setup wizard
   - Enables SSH with root login and password auth
   - Changes web GUI port to 9443 (avoids HAProxy conflicts)
   - Restarts lighttpd via SSH to apply the port change
6. Creates OPNsense API key via SSH (PHP script)
7. Installs `os-haproxy` via firmware API
8. Runs 11 Scores configuring the entire firewall
9. Verifies all configurations via REST API assertions

## Step-by-step mode

If you prefer to separate boot and test:

```bash
# Boot + bootstrap (creates VM, enables SSH, sets port)
cargo run -p opnsense-vm-integration -- --boot

# Run integration test (assumes VM is bootstrapped)
cargo run -p opnsense-vm-integration

# Check VM status at any time
cargo run -p opnsense-vm-integration -- --status
```

## Prerequisites

### System requirements

- **Linux** with KVM support (Intel VT-x/AMD-V)
- **~10 GB** free disk space
- **~15 minutes** for first run (image download + firmware update)
- Subsequent runs: ~2 minutes

### Required packages

**Arch/Manjaro:**
```bash
sudo pacman -S libvirt qemu-full dnsmasq
```

**Fedora:**
```bash
sudo dnf install libvirt qemu-kvm dnsmasq
```

**Ubuntu/Debian:**
```bash
sudo apt install libvirt-daemon-system qemu-kvm dnsmasq
```

### Automated setup

```bash
./examples/opnsense_vm_integration/setup-libvirt.sh
```

This handles: user group membership, libvirtd startup, default storage pool, Docker FORWARD policy conflict.

After running setup, apply group membership:
```bash
newgrp libvirt
```

### Docker + libvirt compatibility

Docker sets the iptables FORWARD policy to DROP, which blocks libvirt's NAT networking. The setup script detects this and switches libvirt to the iptables firewall backend so both coexist.
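What the setup script changes can also be applied by hand. A minimal sketch of the relevant setting (same file and key the setup script edits; restart `libvirtd` afterwards):

```ini
# /etc/libvirt/network.conf
# Make libvirt program its NAT rules via iptables so they coexist
# with Docker's FORWARD DROP policy
firewall_backend = "iptables"
```

Apply with `sudo systemctl restart libvirtd`, then restart any active libvirt networks so they pick up iptables rules.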

## Scores applied

| # | Score | What it configures |
|---|-------|--------------------|
| 1 | `LoadBalancerScore` | HAProxy with 2 frontends, backends with TCP health checks |
| 2 | `DhcpScore` | DHCP range, 2 static host bindings, PXE boot options |
| 3 | `TftpScore` | TFTP server serving boot files |
| 4 | `NodeExporterScore` | Prometheus node exporter |
| 5 | `VlanScore` | 2 VLANs (tags 100, 200) on vtnet0 |
| 6 | `FirewallRuleScore` | Firewall filter rules with logging |
| 7 | `OutboundNatScore` | Source NAT for outbound traffic |
| 8 | `BinatScore` | Bidirectional 1:1 NAT |
| 9 | `VipScore` | Virtual IPs (IP aliases) |
| 10 | `DnatScore` | Port forwarding rules |
| 11 | `LaggScore` | Link aggregation (vtnet2+vtnet3) |

All Scores are idempotent: running them twice produces the same result.

## Network architecture

```
Host (192.168.1.10) --- virbr-opn bridge --- OPNsense LAN (192.168.1.1)
                        192.168.1.0/24       vtnet0
                        NAT to internet

                    --- virbr0 (default) --- OPNsense WAN (DHCP)
                        192.168.122.0/24     vtnet1
                        NAT to internet
```

## Environment variables

| Variable | Default | Description |
|----------|---------|-------------|
| `RUST_LOG` | (unset) | Log level: `info`, `debug`, `trace` |
| `HARMONY_KVM_URI` | `qemu:///system` | Libvirt connection URI |
| `HARMONY_KVM_IMAGE_DIR` | `~/.local/share/harmony/kvm/images` | Cached disk images |

## Troubleshooting

**VM won't start / permission denied**
Ensure your user is in the `libvirt` group and that the image directory is traversable by the qemu user. The setup script handles this.

**192.168.1.0/24 conflict**
If your host network already uses this subnet, the VM will be unreachable. Edit the constants in `src/main.rs` to use a different subnet.
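If you are unsure whether an address on your host collides with the example's LAN subnet, a minimal self-contained sketch (the `in_subnet` helper is hypothetical, not part of this crate) that checks an address against 192.168.1.0/24:

```rust
use std::net::Ipv4Addr;

/// Returns true when `addr` falls inside `net`/`prefix` (prefix 1..=32).
fn in_subnet(addr: Ipv4Addr, net: Ipv4Addr, prefix: u32) -> bool {
    let mask = u32::MAX << (32 - prefix);
    (u32::from(addr) & mask) == (u32::from(net) & mask)
}

fn main() {
    let lan: Ipv4Addr = "192.168.1.0".parse().unwrap();
    // Replace with an address from `ip -o -4 addr` on your host
    let host: Ipv4Addr = "192.168.1.42".parse().unwrap();
    // prints "true" -> the host address conflicts with the example subnet
    println!("{}", in_subnet(host, lan, 24));
}
```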

**HAProxy install fails**
OPNsense may need a firmware update first. The integration test attempts this automatically. If it fails, connect to the web UI at https://192.168.1.1:9443 and update manually.

**Serial console access**
```bash
virsh -c qemu:///system console opn-integration
# Press Ctrl+] to exit
```
@@ -1,140 +0,0 @@
#!/bin/bash
set -euo pipefail

# Setup sudo-less libvirt access for KVM-based harmony examples.
#
# Run once on a fresh machine. After this, all KVM operations work
# without sudo — libvirt authenticates via group membership.
#
# Usage:
#   ./setup-libvirt.sh        # interactive, asks before each step
#   ./setup-libvirt.sh --yes  # non-interactive, runs everything

USER="${USER:-$(whoami)}"
AUTO_YES=false
[[ "${1:-}" == "--yes" ]] && AUTO_YES=true

green() { printf '\033[32m%s\033[0m\n' "$*"; }
red()   { printf '\033[31m%s\033[0m\n' "$*"; }
bold()  { printf '\033[1m%s\033[0m\n' "$*"; }

confirm() {
  if $AUTO_YES; then return 0; fi
  read -rp "$1 [Y/n] " answer
  [[ -z "$answer" || "$answer" =~ ^[Yy] ]]
}

bold "Harmony KVM/libvirt setup"
echo

# ── Step 1: Install packages ────────────────────────────────────────────

echo "Checking required packages..."
MISSING=()
for pkg in qemu-full libvirt dnsmasq ebtables; do
  if ! pacman -Qi "$pkg" &>/dev/null; then
    MISSING+=("$pkg")
  fi
done

if [[ ${#MISSING[@]} -gt 0 ]]; then
  echo "Missing packages: ${MISSING[*]}"
  if confirm "Install them?"; then
    sudo pacman -S --needed "${MISSING[@]}"
  else
    red "Skipped package installation"
  fi
else
  green "[ok] All packages installed"
fi

# ── Step 2: Add user to libvirt group ────────────────────────────────────

if groups "$USER" 2>/dev/null | grep -qw libvirt; then
  green "[ok] $USER is in libvirt group"
else
  echo "$USER is NOT in the libvirt group"
  if confirm "Add $USER to libvirt group?"; then
    sudo usermod -aG libvirt "$USER"
    green "[ok] Added $USER to libvirt group"
    echo "  Note: you need to log out and back in (or run 'newgrp libvirt') for this to take effect"
  fi
fi

# ── Step 3: Start libvirtd ───────────────────────────────────────────────

if systemctl is-active --quiet libvirtd; then
  green "[ok] libvirtd is running"
else
  echo "libvirtd is not running"
  if confirm "Enable and start libvirtd?"; then
    sudo systemctl enable --now libvirtd
    green "[ok] libvirtd started"
  fi
fi

# ── Step 4: Default storage pool ─────────────────────────────────────────

if virsh -c qemu:///system pool-info default &>/dev/null; then
  green "[ok] Default storage pool exists"
else
  echo "Default storage pool does not exist"
  if confirm "Create default storage pool at /var/lib/libvirt/images?"; then
    sudo virsh pool-define-as default dir --target /var/lib/libvirt/images
    sudo virsh pool-autostart default
    sudo virsh pool-start default
    green "[ok] Default storage pool created"
  fi
fi

# ── Step 5: Fix Docker + libvirt FORWARD conflict ────────────────────────

# Docker sets iptables FORWARD policy to DROP, which blocks libvirt NAT.
# Libvirt defaults to nftables which doesn't interact with Docker's iptables.
# Fix: switch libvirt to iptables backend so rules coexist with Docker.

if docker info &>/dev/null; then
  echo "Docker detected."
  NETCONF="/etc/libvirt/network.conf"
  if grep -q '^firewall_backend' "$NETCONF" 2>/dev/null; then
    CURRENT=$(grep '^firewall_backend' "$NETCONF" | head -1)
    if echo "$CURRENT" | grep -q 'iptables'; then
      green "[ok] libvirt firewall_backend is already iptables"
    else
      echo "libvirt firewall_backend is: $CURRENT"
      echo "Docker's iptables FORWARD DROP will block libvirt NAT."
      if confirm "Switch libvirt to iptables backend?"; then
        sudo sed -i 's/^firewall_backend.*/firewall_backend = "iptables"/' "$NETCONF"
        echo "Restarting libvirtd to apply..."
        sudo systemctl restart libvirtd
        green "[ok] Switched to iptables backend"
      fi
    fi
  else
    echo "libvirt uses nftables (default), but Docker's iptables FORWARD DROP blocks NAT."
    if confirm "Set libvirt to use iptables backend (recommended with Docker)?"; then
      echo 'firewall_backend = "iptables"' | sudo tee -a "$NETCONF" >/dev/null
      echo "Restarting libvirtd to apply..."
      sudo systemctl restart libvirtd
      # Re-activate networks so they get iptables rules; `|| true` keeps
      # `set -e` from aborting when a network is already inactive
      for net in $(virsh -c qemu:///system net-list --name 2>/dev/null); do
        virsh -c qemu:///system net-destroy "$net" 2>/dev/null || true
        virsh -c qemu:///system net-start "$net" 2>/dev/null || true
      done
      green "[ok] Switched to iptables backend and restarted networks"
    fi
  fi
else
  green "[ok] Docker not detected, no FORWARD conflict"
fi

# ── Done ─────────────────────────────────────────────────────────────────

echo
bold "Setup complete."
echo
echo "If you were added to the libvirt group, apply it now:"
echo "  newgrp libvirt"
echo
echo "Then verify:"
echo "  cargo run -p opnsense-vm-integration -- --check"
@@ -1,937 +0,0 @@
//! OPNsense VM integration example.
//!
//! Fully unattended workflow — no manual browser interaction required:
//!
//! 1. `--boot` — creates a KVM VM, waits for web UI, bootstraps SSH + webgui port
//! 2. (default run) — creates API key via SSH, installs packages, runs Scores
//! 3. `--full` — does both in a single invocation (CI-friendly)
//!
//! # Usage
//!
//! ```bash
//! cargo run -p opnsense-vm-integration -- --check     # verify prerequisites
//! cargo run -p opnsense-vm-integration -- --download  # download OPNsense image
//! cargo run -p opnsense-vm-integration -- --boot      # create VM + automated bootstrap
//! cargo run -p opnsense-vm-integration                # run integration test
//! cargo run -p opnsense-vm-integration -- --full      # boot + bootstrap + test (CI mode)
//! cargo run -p opnsense-vm-integration -- --status    # check VM state
//! cargo run -p opnsense-vm-integration -- --clean     # tear down everything
//! ```

use std::net::IpAddr;
use std::path::{Path, PathBuf};
use std::sync::Arc;

use harmony::config::secret::{OPNSenseApiCredentials, OPNSenseFirewallCredentials};
use harmony::hardware::{HostCategory, PhysicalHost};
use harmony::infra::opnsense::OPNSenseFirewall;
use harmony::inventory::Inventory;
use harmony::modules::dhcp::DhcpScore;
use harmony::modules::kvm::config::init_executor;
use harmony::modules::kvm::{
    BootDevice, ForwardMode, KvmExecutor, NetworkConfig, NetworkRef, VmConfig,
};
use harmony::modules::load_balancer::LoadBalancerScore;
use harmony::modules::opnsense::bootstrap::OPNsenseBootstrap;
use harmony::modules::opnsense::dnat::{DnatRuleDef, DnatScore};
use harmony::modules::opnsense::firewall::{
    BinatRuleDef, BinatScore, FilterRuleDef, FirewallRuleScore, OutboundNatScore, SnatRuleDef,
};
use harmony::modules::opnsense::lagg::{LaggDef, LaggScore};
use harmony::modules::opnsense::node_exporter::NodeExporterScore;
use harmony::modules::opnsense::vip::{VipDef, VipScore};
use harmony::modules::opnsense::vlan::{VlanDef, VlanScore};
use harmony::modules::tftp::TftpScore;
use harmony::score::Score;
use harmony::topology::{
    BackendServer, HealthCheck, HostBinding, HostConfig, LoadBalancerService, LogicalHost,
};
use harmony_inventory_agent::hwinfo::NetworkInterface;
use harmony_macros::ip;
use harmony_types::firewall::{
    Direction, FirewallAction, IpProtocol, LaggProtocol, NetworkProtocol, VipMode,
};
use harmony_types::id::Id;
use harmony_types::net::{MacAddress, Url};
use log::{info, warn};

const OPNSENSE_IMG_URL: &str =
    "https://mirror.ams1.nl.leaseweb.net/opnsense/releases/26.1/OPNsense-26.1-nano-amd64.img.bz2";
const OPNSENSE_IMG_NAME: &str = "OPNsense-26.1-nano-amd64.img";

const VM_NAME: &str = "opn-integration";
const NET_NAME: &str = "opn-test";
// OPNsense nano defaults LAN to 192.168.1.1/24.
// The libvirt network uses the same subnet so the host can reach the VM.
const HOST_IP: &str = "192.168.1.10";
const OPN_LAN_IP: &str = "192.168.1.1";
/// Web GUI/API port — moved from 443 to avoid HAProxy conflicts.
/// Set automatically during bootstrap (System > Settings > Administration > TCP Port).
const OPN_API_PORT: u16 = 9443;
|
||||
|
||||
#[tokio::main]
|
||||
async fn main() -> Result<(), Box<dyn std::error::Error>> {
|
||||
harmony_cli::cli_logger::init();
|
||||
|
||||
let args: Vec<String> = std::env::args().collect();
|
||||
|
||||
if args.iter().any(|a| a == "--setup") {
|
||||
print_setup();
|
||||
return Ok(());
|
||||
}
|
||||
if args.iter().any(|a| a == "--check") {
|
||||
return check_prerequisites();
|
||||
}
|
||||
if args.iter().any(|a| a == "--download") {
|
||||
download_image().await?;
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
let executor = init_executor()?;
|
||||
|
||||
if args.iter().any(|a| a == "--clean") {
|
||||
return clean(&executor).await;
|
||||
}
|
||||
if args.iter().any(|a| a == "--status") {
|
||||
return status(&executor).await;
|
||||
}
|
||||
if args.iter().any(|a| a == "--boot") {
|
||||
let img_path = download_image().await?;
|
||||
return boot_vm(&executor, &img_path).await;
|
||||
}
|
||||
|
||||
if args.iter().any(|a| a == "--full") {
|
||||
// CI mode: boot + bootstrap + integration test in one shot
|
||||
let img_path = download_image().await?;
|
||||
boot_vm(&executor, &img_path).await?;
|
||||
return run_integration().await;
|
||||
}
|
||||
|
||||
// Default: run the integration test (assumes VM is booted + bootstrapped)
|
||||
check_prerequisites()?;
|
||||
run_integration().await
|
||||
}
|
||||
|
||||
// ── Phase 1: Boot VM ────────────────────────────────────────────────────

async fn boot_vm(
    executor: &KvmExecutor,
    img_path: &Path,
) -> Result<(), Box<dyn std::error::Error>> {
    info!("Creating network and OPNsense VM...");

    let network = NetworkConfig::builder(NET_NAME)
        .bridge("virbr-opn")
        .subnet(HOST_IP, 24)
        .forward(ForwardMode::Nat)
        .build();
    executor.ensure_network(network).await?;

    // Copy and convert the nano image
    let vm_raw = image_dir().join(format!("{VM_NAME}-boot.img"));
    if !vm_raw.exists() {
        info!("Copying nano image...");
        std::fs::copy(img_path, &vm_raw)?;

        // Inject config.xml with virtio interface names
        info!("Injecting config.xml for virtio NICs...");
        let config = harmony::modules::opnsense::image::minimal_config_xml(
            "vtnet1", "vtnet0", OPN_LAN_IP, 24,
        );
        harmony::modules::opnsense::image::replace_config_xml(&vm_raw, &config)?;
    }

    let vm_disk = image_dir().join(format!("{VM_NAME}-boot.qcow2"));
    if !vm_disk.exists() {
        info!("Converting to qcow2...");
        run_cmd(
            "qemu-img",
            &[
                "convert",
                "-f",
                "raw",
                "-O",
                "qcow2",
                &vm_raw.to_string_lossy(),
                &vm_disk.to_string_lossy(),
            ],
        )?;
        run_cmd("qemu-img", &["resize", &vm_disk.to_string_lossy(), "4G"])?;
    }

    let vm = VmConfig::builder(VM_NAME)
        .vcpus(1)
        .memory_mib(1024)
        .disk_from_path(vm_disk.to_string_lossy().to_string())
        .network(NetworkRef::named(NET_NAME)) // vtnet0 = LAN
        .network(NetworkRef::named("default")) // vtnet1 = WAN
        .network(NetworkRef::named(NET_NAME)) // vtnet2 = LAGG member 1
        .network(NetworkRef::named(NET_NAME)) // vtnet3 = LAGG member 2
        .boot_order([BootDevice::Disk])
        .build();

    executor.ensure_vm(vm).await?;
    executor.start_vm(VM_NAME).await?;
    info!("VM started. Waiting for web UI at https://{OPN_LAN_IP} ...");

    wait_for_https(OPN_LAN_IP, 443).await?;

    // ── Automated bootstrap (replaces manual browser interaction) ───
    info!("Bootstrapping OPNsense: login, abort wizard, enable SSH, set webgui port...");
    let bootstrap = OPNsenseBootstrap::new(&format!("https://{OPN_LAN_IP}"));
    bootstrap.login("root", "opnsense").await?;
    bootstrap.abort_wizard().await?;
    bootstrap.enable_ssh(true, true).await?;
    bootstrap
        .set_webgui_port(OPN_API_PORT, OPN_LAN_IP, false)
        .await?;

    // Wait for the web UI to come back on the new port
    info!("Waiting for web UI on new port {OPN_API_PORT}...");
    if let Err(e) = OPNsenseBootstrap::wait_for_ready(
        &format!("https://{OPN_LAN_IP}:{OPN_API_PORT}"),
        std::time::Duration::from_secs(120),
    )
    .await
    {
        warn!("Web UI did not come up on port {OPN_API_PORT}: {e}");
        info!("Running diagnostics via SSH...");
        match OPNsenseBootstrap::diagnose_via_ssh(OPN_LAN_IP).await {
            Ok(report) => {
                info!("Diagnostic report:\n{}", report);
            }
            Err(diag_err) => warn!("Diagnostics failed: {diag_err}"),
        }
        return Err(e.into());
    }

    // Verify SSH is reachable
    info!("Verifying SSH is reachable...");
    for _ in 0..30 {
        if check_tcp_port(OPN_LAN_IP, 22).await {
            break;
        }
        tokio::time::sleep(std::time::Duration::from_secs(2)).await;
    }
    if !check_tcp_port(OPN_LAN_IP, 22).await {
        return Err("SSH did not become reachable after bootstrap".into());
    }

    println!();
    println!("OPNsense VM is running and fully bootstrapped:");
    println!(" Web UI: https://{OPN_LAN_IP}:{OPN_API_PORT}");
    println!(" SSH: root@{OPN_LAN_IP} (password: opnsense)");
    println!(" Login: root / opnsense");
    println!();
    println!("Run the integration test:");
    println!(" cargo run -p opnsense-vm-integration");
    println!();
    println!("Or use --full to boot + test in one shot (CI mode):");
    println!(" cargo run -p opnsense-vm-integration -- --full");

    Ok(())
}

// ── Phase 2: Integration test ───────────────────────────────────────────

async fn run_integration() -> Result<(), Box<dyn std::error::Error>> {
    let vm_ip: IpAddr = OPN_LAN_IP.parse().unwrap();

    // Verify SSH is reachable (bootstrap should have enabled it)
    info!("Checking SSH at {OPN_LAN_IP}:22...");
    if !check_tcp_port(OPN_LAN_IP, 22).await {
        eprintln!("SSH is not reachable at {OPN_LAN_IP}:22");
        eprintln!("Run '--boot' first (it will automatically enable SSH).");
        return Err("SSH not available".into());
    }
    info!("SSH is reachable");

    // Create API key
    info!("Creating API key via SSH...");
    let (api_key, api_secret) = create_api_key_ssh(&vm_ip).await?;
    info!("API key created: {}...", &api_key[..api_key.len().min(12)]);

    // Build topology
    let firewall_host = LogicalHost {
        ip: vm_ip.into(),
        name: VM_NAME.to_string(),
    };
    let api_creds = OPNSenseApiCredentials {
        key: api_key.clone(),
        secret: api_secret.clone(),
    };
    let ssh_creds = OPNSenseFirewallCredentials {
        username: "root".to_string(),
        password: "opnsense".to_string(),
    };
    let opnsense =
        OPNSenseFirewall::with_api_port(firewall_host, None, OPN_API_PORT, &api_creds, &ssh_creds)
            .await;

    // Install packages
    let config = opnsense.get_opnsense_config();
    if !config.is_package_installed("os-haproxy").await {
        info!("Installing os-haproxy (may need firmware update first)...");
        match config.install_package("os-haproxy").await {
            Ok(()) => info!("os-haproxy installed"),
            Err(e) => {
                warn!("os-haproxy install failed: {e}");
                info!("Attempting firmware update...");
                // Trigger firmware update then retry
                let _: serde_json::Value = config
                    .client()
                    .post_typed("core", "firmware", "update", None::<&()>)
                    .await
                    .map_err(|e| format!("firmware update failed: {e}"))?;
                // Poll for completion
                for _ in 0..120 {
                    tokio::time::sleep(std::time::Duration::from_secs(5)).await;
                    let status: serde_json::Value = match config
                        .client()
                        .get_typed("core", "firmware", "upgradestatus")
                        .await
                    {
                        Ok(s) => s,
                        Err(_) => continue, // VM may be rebooting
                    };
                    if status["status"].as_str() == Some("done")
                        || status["status"].as_str() == Some("reboot")
                    {
                        break;
                    }
                }
                info!("Firmware updated, retrying package install...");
                // Wait for API to come back if reboot needed
                wait_for_https(OPN_LAN_IP, 443).await?;
                config.install_package("os-haproxy").await?;
            }
        }
    } else {
        info!("os-haproxy already installed");
    }

    // ── Build all Scores ──────────────────────────────────────────────

    // 1. LoadBalancerScore — HAProxy with 2 frontends
    let lb_score = LoadBalancerScore {
        public_services: vec![
            LoadBalancerService {
                listening_port: format!("{OPN_LAN_IP}:16443").parse()?,
                backend_servers: vec![
                    BackendServer {
                        address: "10.50.0.10".into(),
                        port: 6443,
                    },
                    BackendServer {
                        address: "10.50.0.11".into(),
                        port: 6443,
                    },
                    BackendServer {
                        address: "10.50.0.12".into(),
                        port: 6443,
                    },
                ],
                health_check: Some(HealthCheck::TCP(None)),
            },
            LoadBalancerService {
                listening_port: format!("{OPN_LAN_IP}:18443").parse()?,
                backend_servers: vec![
                    BackendServer {
                        address: "10.50.0.10".into(),
                        port: 443,
                    },
                    BackendServer {
                        address: "10.50.0.11".into(),
                        port: 443,
                    },
                ],
                health_check: Some(HealthCheck::TCP(None)),
            },
        ],
        private_services: vec![],
        wan_firewall_ports: vec![],
    };

    // 2. DhcpScore — DHCP range + 2 static host bindings
    let dhcp_score = DhcpScore::new(
        vec![
            make_host_binding(
                "node1",
                ip!("192.168.1.50"),
                [0x52, 0x54, 0x00, 0xAA, 0xBB, 0x01],
            ),
            make_host_binding(
                "node2",
                ip!("192.168.1.51"),
                [0x52, 0x54, 0x00, 0xAA, 0xBB, 0x02],
            ),
        ],
        None, // next_server
        None, // boot_filename
        None, // filename (BIOS)
        None, // filename64 (EFI)
        None, // filenameipxe
        (ip!("192.168.1.100"), ip!("192.168.1.200")), // dhcp_range
        Some("test.local".to_string()), // domain
    );

    // 3. TftpScore — install os-tftp, configure, serve a dummy file
    let tftp_dir = std::env::temp_dir().join("harmony-tftp-test");
    std::fs::create_dir_all(&tftp_dir)?;
    std::fs::write(tftp_dir.join("test.txt"), "harmony integration test\n")?;
    let tftp_score = TftpScore::new(Url::LocalFolder(tftp_dir.to_string_lossy().to_string()));

    // 4. NodeExporterScore — install + enable Prometheus node exporter
    let node_exporter_score = NodeExporterScore {};

    // 5. VlanScore — create test VLANs on vtnet0
    let vlan_score = VlanScore {
        vlans: vec![
            VlanDef {
                parent_interface: "vtnet0".to_string(),
                tag: 100,
                description: "test-vlan-100".to_string(),
            },
            VlanDef {
                parent_interface: "vtnet0".to_string(),
                tag: 200,
                description: "test-vlan-200".to_string(),
            },
        ],
    };

    // 6. FirewallRuleScore — create test filter rules
    let fw_rule_score = FirewallRuleScore {
        rules: vec![FilterRuleDef {
            action: FirewallAction::Pass,
            direction: Direction::In,
            interface: "lan".to_string(),
            ip_protocol: IpProtocol::Inet,
            protocol: NetworkProtocol::Tcp,
            source_net: "any".to_string(),
            destination_net: "any".to_string(),
            destination_port: Some("8080".to_string()),
            gateway: None,
            description: "harmony-test-allow-8080".to_string(),
            log: true,
        }],
    };

    // 7. OutboundNatScore — create test SNAT rule
    let snat_score = OutboundNatScore {
        rules: vec![SnatRuleDef {
            interface: "wan".to_string(),
            ip_protocol: IpProtocol::Inet,
            protocol: NetworkProtocol::Any,
            source_net: "192.168.1.0/24".to_string(),
            destination_net: "any".to_string(),
            target: "wanip".to_string(),
            description: "harmony-test-snat-lan".to_string(),
            log: false,
            nonat: false,
        }],
    };

    // 8. BinatScore — create test 1:1 NAT rule
    let binat_score = BinatScore {
        rules: vec![BinatRuleDef {
            interface: "wan".to_string(),
            source_net: "192.168.1.50".to_string(),
            external: "10.0.0.50".to_string(),
            description: "harmony-test-binat".to_string(),
            log: false,
        }],
    };

    // 9. VipScore — IP alias on LAN
    let vip_score = VipScore {
        vips: vec![VipDef {
            mode: VipMode::IpAlias,
            interface: "lan".to_string(),
            subnet: "192.168.1.250".to_string(),
            subnet_bits: 32,
            vhid: None,
            advbase: None,
            advskew: None,
            password: None,
            peer: None,
        }],
    };

    // 10. DnatScore — port forward 8443 → 192.168.1.50:443
    let dnat_score = DnatScore {
        rules: vec![DnatRuleDef {
            interface: "wan".to_string(),
            ip_protocol: IpProtocol::Inet,
            protocol: NetworkProtocol::Tcp,
            destination: "wanip".to_string(),
            destination_port: "8443".to_string(),
            target: "192.168.1.50".to_string(),
            local_port: Some("443".to_string()),
            description: "harmony-test-dnat-8443".to_string(),
            log: false,
            register_rule: true,
        }],
    };

    // 11. LaggScore — failover LAGG with vtnet2 + vtnet3
    let lagg_score = LaggScore {
        laggs: vec![LaggDef {
            members: vec!["vtnet2".to_string(), "vtnet3".to_string()],
            protocol: LaggProtocol::Failover,
            description: "harmony-test-lagg".to_string(),
            mtu: None,
            lacp_fast_timeout: false,
        }],
    };

    // ── Run all Scores ──────────────────────────────────────────────
    info!("Running all Scores...");
    let scores: Vec<Box<dyn Score<OPNSenseFirewall>>> = vec![
        Box::new(lb_score),
        Box::new(dhcp_score),
        Box::new(tftp_score),
        Box::new(node_exporter_score),
        Box::new(vlan_score),
        Box::new(fw_rule_score),
        Box::new(snat_score),
        Box::new(binat_score),
        Box::new(vip_score),
        Box::new(dnat_score),
        Box::new(lagg_score),
    ];
    let args = harmony_cli::Args {
        yes: true,
        filter: None,
        interactive: false,
        all: true,
        number: 0,
        list: false,
    };
    harmony_cli::run_cli(Inventory::autoload(), opnsense, scores, args).await?;

    // ── Verify via API ──────────────────────────────────────────────
    info!("Verifying all Scores via API...");
    let client = opnsense_api::OpnsenseClient::builder()
        .base_url(format!("https://{OPN_LAN_IP}:{OPN_API_PORT}/api"))
        .auth_from_key_secret(&api_key, &api_secret)
        .skip_tls_verify()
        .timeout_secs(60)
        .build()?;

    // Verify HAProxy
    let haproxy: serde_json::Value = client.get_typed("haproxy", "settings", "get").await?;
    let frontends = haproxy["haproxy"]["frontends"]["frontend"]
        .as_object()
        .map(|m| m.len())
        .unwrap_or(0);
    info!(" HAProxy frontends: {frontends}");
    assert!(
        frontends >= 2,
        "Expected at least 2 HAProxy frontends, got {frontends}"
    );

    // Verify DHCP (dnsmasq hosts)
    let dnsmasq: serde_json::Value = client.get_typed("dnsmasq", "settings", "get").await?;
    let hosts = dnsmasq["dnsmasq"]["hosts"]
        .as_object()
        .map(|m| m.len())
        .unwrap_or(0);
    info!(" Dnsmasq hosts: {hosts}");
    assert!(hosts >= 2, "Expected at least 2 dnsmasq hosts, got {hosts}");

    // Verify DHCP range
    let ranges = dnsmasq["dnsmasq"]["dhcp_ranges"]
        .as_object()
        .map(|m| m.len())
        .unwrap_or(0);
    info!(" Dnsmasq DHCP ranges: {ranges}");
    assert!(ranges >= 1, "Expected at least 1 DHCP range, got {ranges}");

    // Verify TFTP
    let tftp: serde_json::Value = client.get_typed("tftp", "general", "get").await?;
    let tftp_enabled = tftp["general"]["enabled"].as_str() == Some("1");
    info!(" TFTP enabled: {tftp_enabled}");
    assert!(tftp_enabled, "TFTP should be enabled");

    // Verify Node Exporter
    let ne: serde_json::Value = client.get_typed("nodeexporter", "general", "get").await?;
    let ne_enabled = ne["general"]["enabled"].as_str() == Some("1");
    info!(" Node Exporter enabled: {ne_enabled}");
    assert!(ne_enabled, "Node Exporter should be enabled");

    // Verify VLANs
    let vlans: serde_json::Value = client
        .get_typed("interfaces", "vlan_settings", "get")
        .await?;
    let vlan_count = vlans["vlan"]["vlan"]
        .as_object()
        .map(|m| m.len())
        .unwrap_or(0);
    info!(" VLANs: {vlan_count}");
    assert!(
        vlan_count >= 2,
        "Expected at least 2 VLANs, got {vlan_count}"
    );

    // Verify firewall rules (search endpoint returns rows)
    let fw_rules: serde_json::Value = client.get_typed("firewall", "filter", "searchRule").await?;
    let fw_count = fw_rules["rowCount"].as_i64().unwrap_or(0);
    info!(" Firewall rules: {fw_count}");
    assert!(
        fw_count >= 1,
        "Expected at least 1 firewall rule, got {fw_count}"
    );

    // Verify VIPs
    let vips: serde_json::Value = client
        .get_typed("interfaces", "vip_settings", "searchItem")
        .await?;
    let vip_count = vips["rowCount"].as_i64().unwrap_or(0);
    info!(" VIPs: {vip_count}");
    assert!(vip_count >= 1, "Expected at least 1 VIP, got {vip_count}");

    // Verify DNat rules
    let dnat_rules: serde_json::Value = client.get_typed("firewall", "d_nat", "searchRule").await?;
    let dnat_count = dnat_rules["rowCount"].as_i64().unwrap_or(0);
    info!(" DNat rules: {dnat_count}");
    assert!(
        dnat_count >= 1,
        "Expected at least 1 DNat rule, got {dnat_count}"
    );

    // Verify LAGGs
    let laggs: serde_json::Value = client
        .get_typed("interfaces", "lagg_settings", "get")
        .await?;
    let lagg_count = laggs["lagg"]["lagg"]
        .as_object()
        .map(|m| m.len())
        .unwrap_or(0);
    info!(" LAGGs: {lagg_count}");
    assert!(
        lagg_count >= 1,
        "Expected at least 1 LAGG, got {lagg_count}"
    );

    // Clean up temp files
    let _ = std::fs::remove_dir_all(&tftp_dir);

    println!();
    println!("PASSED — All OPNsense integration tests successful:");
    println!(" - LoadBalancerScore: {frontends} HAProxy frontends configured");
    println!(" - DhcpScore: {hosts} static hosts, {ranges} DHCP range(s)");
    println!(" - TftpScore: TFTP server enabled");
    println!(" - NodeExporterScore: Node Exporter enabled");
    println!(" - VlanScore: {vlan_count} VLANs configured");
    println!(" - FirewallRuleScore: {fw_count} filter rules");
    println!(" - OutboundNatScore: SNAT rule configured");
    println!(" - BinatScore: 1:1 NAT rule configured");
    println!(" - VipScore: {vip_count} VIPs configured");
    println!(" - DnatScore: {dnat_count} DNat rules");
    println!(" - LaggScore: {lagg_count} LAGGs configured");
    println!();
    println!("VM is running at {OPN_LAN_IP}. Use --clean to tear down.");
    Ok(())
}

// ── Helpers ─────────────────────────────────────────────────────────────

fn print_setup() {
    println!("Run the setup script for sudo-less libvirt access:");
    println!(" ./examples/opnsense_vm_integration/setup-libvirt.sh");
    println!();
    println!("Verify with:");
    println!(" cargo run -p opnsense-vm-integration -- --check");
}

fn check_prerequisites() -> Result<(), Box<dyn std::error::Error>> {
    let mut ok = true;

    let libvirtd = std::process::Command::new("systemctl")
        .args(["is-active", "libvirtd"])
        .output();
    match libvirtd {
        Ok(out) if out.status.success() => println!("[ok] libvirtd is running"),
        _ => {
            println!("[FAIL] libvirtd is not running");
            ok = false;
        }
    }

    let virsh = std::process::Command::new("virsh")
        .args(["-c", "qemu:///system", "version"])
        .output();
    match virsh {
        Ok(out) if out.status.success() => {
            let v = String::from_utf8_lossy(&out.stdout);
            println!("[ok] virsh connects: {}", v.lines().next().unwrap_or("?"));
        }
        _ => {
            println!("[FAIL] Cannot connect to qemu:///system");
            ok = false;
        }
    }

    let pool = std::process::Command::new("virsh")
        .args(["-c", "qemu:///system", "pool-info", "default"])
        .output();
    match pool {
        Ok(out) if out.status.success() => println!("[ok] Default storage pool exists"),
        _ => {
            println!("[FAIL] Default storage pool not found");
            ok = false;
        }
    }

    if which("bunzip2") {
        println!("[ok] bunzip2 available");
    } else {
        println!("[FAIL] bunzip2 not found");
        ok = false;
    }

    if which("qemu-img") {
        println!("[ok] qemu-img available");
    } else {
        println!("[FAIL] qemu-img not found");
        ok = false;
    }

    // Check Docker + libvirt FORWARD conflict
    if which("docker") {
        let fw_backend = std::fs::read_to_string("/etc/libvirt/network.conf").unwrap_or_default();
        if fw_backend
            .lines()
            .any(|l| l.trim().starts_with("firewall_backend") && l.contains("iptables"))
        {
            println!("[ok] libvirt uses iptables backend (Docker compatible)");
        } else {
            println!("[WARN] Docker detected but libvirt uses nftables backend");
            println!(" VM NAT may not work. Run setup-libvirt.sh to fix.");
        }
    }

    if !ok {
        println!("\nRun --setup for setup instructions.");
        return Err("Prerequisites not met".into());
    }
    println!("\nAll prerequisites met.");
    Ok(())
}

fn which(cmd: &str) -> bool {
    std::process::Command::new("which")
        .arg(cmd)
        .output()
        .map(|o| o.status.success())
        .unwrap_or(false)
}

fn run_cmd(cmd: &str, args: &[&str]) -> Result<(), Box<dyn std::error::Error>> {
    let status = std::process::Command::new(cmd).args(args).status()?;
    if !status.success() {
        return Err(format!("{cmd} failed").into());
    }
    Ok(())
}

fn image_dir() -> PathBuf {
    let dir = std::env::var("HARMONY_KVM_IMAGE_DIR").unwrap_or_else(|_| {
        dirs::data_dir()
            .unwrap_or_else(|| PathBuf::from("/tmp"))
            .join("harmony")
            .join("kvm")
            .join("images")
            .to_string_lossy()
            .to_string()
    });
    PathBuf::from(dir)
}

/// FIXME this should be using the harmony-asset crate
async fn download_image() -> Result<PathBuf, Box<dyn std::error::Error>> {
    let dir = image_dir();
    std::fs::create_dir_all(&dir)?;
    let img_path = dir.join(OPNSENSE_IMG_NAME);

    if img_path.exists() {
        info!("Image cached: {}", img_path.display());
        return Ok(img_path);
    }

    let bz2_path = dir.join(format!("{OPNSENSE_IMG_NAME}.bz2"));
    if !bz2_path.exists() {
        info!("Downloading OPNsense nano image (~350MB)...");
        let response = reqwest::Client::builder()
            .timeout(std::time::Duration::from_secs(600))
            .build()?
            .get(OPNSENSE_IMG_URL)
            .send()
            .await?;
        if !response.status().is_success() {
            return Err(format!("Download failed: HTTP {}", response.status()).into());
        }
        let bytes = response.bytes().await?;
        std::fs::write(&bz2_path, &bytes)?;
        info!("Downloaded {} bytes", bytes.len());
    }

    info!("Decompressing...");
    run_cmd("bunzip2", &["--keep", &bz2_path.to_string_lossy()])?;
    info!("Image ready: {}", img_path.display());
    Ok(img_path)
}

async fn clean(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
    info!("Cleaning up...");
    let _ = executor.destroy_vm(VM_NAME).await;
    let _ = executor.undefine_vm(VM_NAME).await;
    let _ = executor.delete_network(NET_NAME).await;
    for ext in ["img", "qcow2"] {
        let path = image_dir().join(format!("{VM_NAME}-boot.{ext}"));
        if path.exists() {
            std::fs::remove_file(&path)?;
            info!("Removed: {}", path.display());
        }
    }
    info!("Done. (Original image cached at {})", image_dir().display());
    Ok(())
}

async fn status(executor: &KvmExecutor) -> Result<(), Box<dyn std::error::Error>> {
    match executor.vm_status(VM_NAME).await {
        Ok(s) => {
            println!("{VM_NAME}: {s:?}");
            if let Ok(Some(ip)) = executor.vm_ip(VM_NAME).await {
                println!(" WAN IP: {ip}");
            }
            println!(" LAN IP: {OPN_LAN_IP} (static)");
            let https_default = check_tcp_port(OPN_LAN_IP, 443).await;
            let https_custom = check_tcp_port(OPN_LAN_IP, OPN_API_PORT).await;
            let ssh = check_tcp_port(OPN_LAN_IP, 22).await;
            if https_custom {
                println!(" API: responding on port {OPN_API_PORT}");
            } else if https_default {
                println!(" API: responding on port 443 (change to {OPN_API_PORT} in web UI)");
            } else {
                println!(" API: not responding");
            }
            println!(
                " SSH: {}",
                if ssh { "responding" } else { "not responding" }
            );
        }
        Err(_) => println!("{VM_NAME}: not found (run --boot first)"),
    }
    Ok(())
}

async fn wait_for_https(ip: &str, port: u16) -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::builder()
        .danger_accept_invalid_certs(true)
        .timeout(std::time::Duration::from_secs(5))
        .build()?;
    let url = format!("https://{ip}:{port}");

    for i in 0..60 {
        if client.get(&url).send().await.is_ok() {
            info!("Web UI responding (attempt {i})");
            return Ok(());
        }
        if i % 10 == 0 {
            info!("Waiting for OPNsense... (attempt {i})");
        }
        tokio::time::sleep(std::time::Duration::from_secs(5)).await;
    }
    Err("OPNsense web UI did not respond within 5 minutes".into())
}

async fn check_tcp_port(ip: &str, port: u16) -> bool {
    tokio::time::timeout(
        std::time::Duration::from_secs(3),
        tokio::net::TcpStream::connect(format!("{ip}:{port}")),
    )
    .await
    .map(|r| r.is_ok())
    .unwrap_or(false)
}

/// Build a HostBinding from a name, IP, and MAC bytes for use with DhcpScore.
fn make_host_binding(name: &str, ip: IpAddr, mac: [u8; 6]) -> HostBinding {
    let logical = LogicalHost {
        ip,
        name: name.to_string(),
    };
    let physical = PhysicalHost {
        id: Id::from(name.to_string()),
        category: HostCategory::Server,
        network: vec![NetworkInterface {
            name: "eth0".to_string(),
            mac_address: MacAddress(mac),
            speed_mbps: None,
            is_up: true,
            mtu: 1500,
            ipv4_addresses: vec![ip.to_string()],
            ipv6_addresses: vec![],
            driver: String::new(),
            firmware_version: None,
        }],
        storage: vec![],
        labels: vec![],
        memory_modules: vec![],
        cpus: vec![],
    };
    HostBinding::new(logical, physical, HostConfig::new(None))
}

async fn create_api_key_ssh(ip: &IpAddr) -> Result<(String, String), Box<dyn std::error::Error>> {
    use opnsense_config::config::{OPNsenseShell, SshCredentials, SshOPNSenseShell};

    let ssh_config = Arc::new(russh::client::Config {
        inactivity_timeout: None,
        ..<_>::default()
    });
    let credentials = SshCredentials::Password {
        username: "root".to_string(),
        password: "opnsense".to_string(),
    };
    let shell = SshOPNSenseShell::new((*ip, 22), credentials, ssh_config);

    let php_script = r#"<?php
require_once '/usr/local/etc/inc/config.inc';
$key = bin2hex(random_bytes(20));
$secret = bin2hex(random_bytes(40));
$config = OPNsense\Core\Config::getInstance();
foreach ($config->object()->system->user as $user) {
    if ((string)$user->name === 'root') {
        if (!isset($user->apikeys)) { $user->addChild('apikeys'); }
        $item = $user->apikeys->addChild('item');
        $item->addChild('key', $key);
        $item->addChild('secret', crypt($secret, '$6$' . bin2hex(random_bytes(8)) . '$'));
        $config->save();
        echo $key . "\n" . $secret . "\n";
        exit(0);
    }
}
echo "ERROR: root user not found\n";
exit(1);
"#;

    info!("Writing API key script...");
    shell
        .write_content_to_file(php_script, "/tmp/create_api_key.php")
        .await?;

    info!("Executing API key generation...");
    let output = shell
        .exec("php /tmp/create_api_key.php && rm /tmp/create_api_key.php")
        .await?;

    let lines: Vec<&str> = output.trim().lines().collect();
    if lines.len() >= 2 && !lines[0].starts_with("ERROR") {
        Ok((lines[0].to_string(), lines[1].to_string()))
    } else {
        Err(format!("API key creation failed: {output}").into())
    }
}
@@ -1,14 +0,0 @@
-[package]
-name = "example-penpot"
-edition = "2024"
-version.workspace = true
-readme.workspace = true
-license.workspace = true
-
-[dependencies]
-harmony = { path = "../../harmony" }
-harmony_cli = { path = "../../harmony_cli" }
-harmony_macros = { path = "../../harmony_macros" }
-harmony_types = { path = "../../harmony_types" }
-tokio.workspace = true
-url.workspace = true
@@ -1,41 +0,0 @@
-use std::{collections::HashMap, str::FromStr};
-
-use harmony::{
-    inventory::Inventory,
-    modules::helm::chart::{HelmChartScore, HelmRepository, NonBlankString},
-    topology::K8sAnywhereTopology,
-};
-use harmony_macros::hurl;
-
-#[tokio::main]
-async fn main() {
-    // let mut chart_values = HashMap::new();
-    // chart_values.insert(
-    //     NonBlankString::from_str("persistence.assets.enabled").unwrap(),
-    //     "true".into(),
-    // );
-    // let penpot_chart = HelmChartScore {
-    //     namespace: Some(NonBlankString::from_str("penpot").unwrap()),
-    //     release_name: NonBlankString::from_str("penpot").unwrap(),
-    //     chart_name: NonBlankString::from_str("penpot/penpot").unwrap(),
-    //     chart_version: None,
-    //     values_overrides: Some(chart_values),
-    //     values_yaml: None,
-    //     create_namespace: true,
-    //     install_only: true,
-    //     repository: Some(HelmRepository::new(
-    //         "penpot".to_string(),
-    //         hurl!("http://helm.penpot.app"),
-    //         true,
-    //     )),
-    // };
-    //
-    // harmony_cli::run(
-    //     Inventory::autoload(),
-    //     K8sAnywhereTopology::from_env(),
-    //     vec![Box::new(penpot_chart)],
-    //     None,
-    // )
-    // .await
-    // .unwrap();
-}
@@ -32,21 +32,17 @@ pub async fn get_topology() -> HAClusterTopology {

     let switch_client = Arc::new(switch_client);

-    let config = SecretManager::get_or_prompt::<OPNSenseFirewallConfig>()
-        .await
-        .unwrap();
-    let api_creds = harmony::config::secret::OPNSenseApiCredentials {
-        key: config.username.clone(),
-        secret: config.password.clone(),
-    };
-    let ssh_creds = harmony::config::secret::OPNSenseFirewallCredentials {
-        username: config.username,
-        password: config.password,
-    };
+    let config = SecretManager::get_or_prompt::<OPNSenseFirewallConfig>().await;
+    let config = config.unwrap();

     let opnsense = Arc::new(
-        harmony::infra::opnsense::OPNSenseFirewall::new(firewall, None, &api_creds, &ssh_creds)
-            .await,
+        harmony::infra::opnsense::OPNSenseFirewall::new(
+            firewall,
+            None,
+            &config.username,
+            &config.password,
+        )
+        .await,
     );
     let lan_subnet = ipv4!("192.168.40.0");
     let gateway_ipv4 = ipv4!("192.168.40.1");

@@ -69,6 +69,5 @@ fn build_large_score() -> LoadBalancerScore {
-            lb_service.clone(),
             lb_service.clone(),
         ],
         wan_firewall_ports: vec![],
     }
 }

@@ -7,7 +7,6 @@ async fn main() {
    let zitadel = ZitadelScore {
        host: "sso.sto1.nationtech.io".to_string(),
        zitadel_version: "v4.12.1".to_string(),
        external_secure: true,
    };

    harmony_cli::run(

BIN  examples/zitadel/zitadel-9.24.0.tgz  Normal file
Binary file not shown.
@@ -52,7 +52,7 @@
//! }
//! ```

use kube::{Error, Resource, ResourceExt, api::DynamicObject};
use kube::{Error, Resource, ResourceExt, api::DynamicObject, core::ErrorResponse};
use serde::Serialize;
use serde_json;

@@ -117,16 +117,13 @@ impl ResourceBundle {
    /// Delete all resources in this bundle from the cluster.
    /// Resources are deleted in reverse order to respect dependencies.
    pub async fn delete(&self, client: &K8sClient) -> Result<(), Error> {
        // FIXME delete all in parallel and retry using kube::client::retry::RetryPolicy
        for res in self.resources.iter().rev() {
            let api = client.get_api_for_dynamic_object(res, res.namespace().as_deref())?;
            let name = res.name_any();
            // FIXME this swallows all errors. Swallowing a 404 is ok but other errors must be
            // handled properly (such as retrying). A normal error case is when we delete a
            // resource bundle with dependencies between various resources. Such as a pod with a
            // dependency on a ClusterRoleBinding. Trying to delete the ClusterRoleBinding first
            // is expected to fail
            let _ = api.delete(&name, &kube::api::DeleteParams::default()).await;
            match api.delete(&name, &kube::api::DeleteParams::default()).await {
                Ok(_) | Err(Error::Api(ErrorResponse { code: 404, .. })) => {}
                Err(e) => return Err(e),
            }
        }
        Ok(())
    }

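The new match arm makes bundle deletion idempotent: a 404 means the resource is already gone and is treated as success, while any other API error now propagates instead of being swallowed. A minimal sketch of that pattern, using a hypothetical `DeleteError` enum in place of kube's actual error type:

```rust
// Hypothetical stand-in for kube::Error / ErrorResponse, for illustration only.
#[derive(Debug, PartialEq)]
pub enum DeleteError {
    Api { code: u16 },
    Other(String),
}

/// Treat "already absent" (HTTP 404) as success; propagate everything else.
pub fn normalize_delete(result: Result<(), DeleteError>) -> Result<(), DeleteError> {
    match result {
        Ok(()) | Err(DeleteError::Api { code: 404 }) => Ok(()),
        Err(e) => Err(e),
    }
}

fn main() {
    assert!(normalize_delete(Ok(())).is_ok());
    assert!(normalize_delete(Err(DeleteError::Api { code: 404 })).is_ok());
    assert!(normalize_delete(Err(DeleteError::Api { code: 403 })).is_err());
    assert!(normalize_delete(Err(DeleteError::Other("timeout".into()))).is_err());
}
```

The `Ok(_) | Err(...404...)` or-pattern keeps both "deleted" and "was never there" on the success path, which is exactly what a retryable teardown wants.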
@@ -4,7 +4,7 @@ use kube::config::{KubeConfigOptions, Kubeconfig};
use kube::{Client, Config, Discovery, Error};
use log::error;
use serde::Serialize;
use tokio::sync::{OnceCell, RwLock};
use tokio::sync::OnceCell;

use crate::types::KubernetesDistribution;

@@ -23,9 +23,7 @@ pub struct K8sClient {
    /// to stdout instead. Initialised from the `DRY_RUN` environment variable.
    pub(crate) dry_run: bool,
    pub(crate) k8s_distribution: Arc<OnceCell<KubernetesDistribution>>,
    /// API discovery cache. Wrapped in `RwLock` so it can be invalidated
    /// after installing CRDs or operators that register new API groups.
    pub(crate) discovery: Arc<RwLock<Option<Arc<Discovery>>>>,
    pub(crate) discovery: Arc<OnceCell<Discovery>>,
}

impl Serialize for K8sClient {
@@ -54,7 +52,7 @@ impl K8sClient {
            dry_run: read_dry_run_from_env(),
            client,
            k8s_distribution: Arc::new(OnceCell::new()),
            discovery: Arc::new(RwLock::new(None)),
            discovery: Arc::new(OnceCell::new()),
        }
    }

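The struct change above trades the invalidatable `RwLock<Option<Arc<Discovery>>>` cache for a write-once `OnceCell<Discovery>`: the initializer runs at most once and every later caller gets the cached value for free, but the cache can no longer be cleared (which is why `invalidate_discovery` disappears later in this diff). Std's synchronous `OnceLock` demonstrates the same contract in miniature:

```rust
use std::sync::OnceLock;
use std::sync::atomic::{AtomicUsize, Ordering};

static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);
static CACHE: OnceLock<u32> = OnceLock::new();

/// Stand-in for `K8sClient::discovery()`: the closure plays the role of the
/// expensive API discovery run and executes at most once per process.
fn discovery() -> u32 {
    *CACHE.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        42 // pretend this is the discovered API surface
    })
}

fn main() {
    assert_eq!(discovery(), 42);
    assert_eq!(discovery(), 42);
    // The second call hit the cache: the initializer ran exactly once.
    assert_eq!(INIT_CALLS.load(Ordering::SeqCst), 1);
}
```

Tokio's `OnceCell::get_or_try_init` adds async and fallible initialization on top of this shape, but the once-only caching semantics are the same.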
@@ -1,4 +1,3 @@
use std::sync::Arc;
use std::time::Duration;

use kube::{Discovery, Error};
@@ -16,55 +15,38 @@ impl K8sClient {
        self.client.clone().apiserver_version().await
    }

    /// Runs API discovery, caching the result. Call [`invalidate_discovery`]
    /// after installing CRDs or operators to force a refresh on the next call.
    pub async fn discovery(&self) -> Result<Arc<Discovery>, Error> {
        // Fast path: return cached discovery
        {
            let guard = self.discovery.read().await;
            if let Some(d) = guard.as_ref() {
                return Ok(Arc::clone(d));
            }
        }

        // Slow path: run discovery with retries
    /// Runs (and caches) Kubernetes API discovery with exponential-backoff retries.
    pub async fn discovery(&self) -> Result<&Discovery, Error> {
        let retry_strategy = ExponentialBackoff::from_millis(1000)
            .max_delay(Duration::from_secs(32))
            .take(6);

        let attempt = Mutex::new(0u32);
        let d = Retry::spawn(retry_strategy, || async {
        Retry::spawn(retry_strategy, || async {
            let mut n = attempt.lock().await;
            *n += 1;
            debug!("Running Kubernetes API discovery (attempt {})", *n);
            Discovery::new(self.client.clone())
                .run()
                .await
                .map_err(|e| {
                    warn!("Kubernetes API discovery failed (attempt {}): {}", *n, e);
                    e
            match self
                .discovery
                .get_or_try_init(async || {
                    debug!("Running Kubernetes API discovery (attempt {})", *n);
                    let d = Discovery::new(self.client.clone()).run().await?;
                    debug!("Kubernetes API discovery completed");
                    Ok(d)
                })
                .await
            {
                Ok(d) => Ok(d),
                Err(e) => {
                    warn!("Kubernetes API discovery failed (attempt {}): {}", *n, e);
                    Err(e)
                }
            }
        })
        .await
        .map_err(|e| {
            error!("Kubernetes API discovery failed after all retries: {}", e);
            e
        })?;

        debug!("Kubernetes API discovery completed");
        let d = Arc::new(d);
        let mut guard = self.discovery.write().await;
        *guard = Some(Arc::clone(&d));
        Ok(d)
    }

    /// Clears the cached API discovery so the next call to [`discovery`]
    /// re-fetches from the API server. Call this after installing CRDs or
    /// operators that register new API groups.
    pub async fn invalidate_discovery(&self) {
        let mut guard = self.discovery.write().await;
        *guard = None;
        debug!("API discovery cache invalidated");
        })
    }

    /// Detect which Kubernetes distribution is running. Result is cached for

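The retry policy above caps each delay at 32 s and gives up after six attempts. As an illustration only — tokio-retry's `ExponentialBackoff` has its own growth arithmetic, so this is a hedged sketch of a plain doubling schedule, not the library's exact output:

```rust
/// Illustrative doubling backoff: 1 s base, doubled per attempt, capped at 32 s.
/// This sketches the intent of the `retry_strategy` in the diff; it is not
/// tokio-retry's actual delay computation.
fn backoff_delay_ms(attempt: u32) -> u64 {
    let base: u64 = 1_000;
    base.saturating_mul(1u64 << attempt.min(16)).min(32_000)
}

fn main() {
    let schedule: Vec<u64> = (0..6).map(backoff_delay_ms).collect();
    assert_eq!(schedule, vec![1_000, 2_000, 4_000, 8_000, 16_000, 32_000]);
}
```

Capping the delay bounds worst-case latency per attempt, and `.take(6)` bounds total wait, so a permanently unreachable API server fails fast instead of retrying forever.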
@@ -6,10 +6,8 @@ pub mod discovery;
pub mod helper;
pub mod node;
pub mod pod;
pub mod port_forward;
pub mod resources;
pub mod types;

pub use client::K8sClient;
pub use port_forward::PortForwardHandle;
pub use types::{DrainOptions, KubernetesDistribution, NodeFile, ScopeResolver, WriteMode};

@@ -190,77 +190,4 @@ impl K8sClient {
            }
        }
    }

    /// Execute a command in a specific pod by name, capturing stdout.
    ///
    /// Returns the captured stdout on success. On failure, the error string
    /// includes stderr output from the remote command.
    pub async fn exec_pod_capture_output(
        &self,
        pod_name: &str,
        namespace: Option<&str>,
        command: Vec<&str>,
    ) -> Result<String, String> {
        let api: Api<Pod> = match namespace {
            Some(ns) => Api::namespaced(self.client.clone(), ns),
            None => Api::default_namespaced(self.client.clone()),
        };

        match api
            .exec(
                pod_name,
                command,
                &AttachParams::default().stdout(true).stderr(true),
            )
            .await
        {
            Err(e) => Err(e.to_string()),
            Ok(mut process) => {
                let status = process
                    .take_status()
                    .expect("No status handle")
                    .await
                    .expect("Status channel closed");

                let mut stdout_buf = String::new();
                if let Some(mut stdout) = process.stdout() {
                    stdout
                        .read_to_string(&mut stdout_buf)
                        .await
                        .map_err(|e| format!("Failed to read stdout: {e}"))?;
                }

                let mut stderr_buf = String::new();
                if let Some(mut stderr) = process.stderr() {
                    stderr
                        .read_to_string(&mut stderr_buf)
                        .await
                        .map_err(|e| format!("Failed to read stderr: {e}"))?;
                }

                if let Some(s) = status.status {
                    debug!("exec_pod status: {} - {:?}", s, status.details);
                    if s == "Success" {
                        Ok(stdout_buf)
                    } else {
                        Err(format!("{stderr_buf}"))
                    }
                } else {
                    Err("No inner status from pod exec".to_string())
                }
            }
        }
    }

    /// Execute a command in a specific pod by name (no output capture).
    pub async fn exec_pod(
        &self,
        pod_name: &str,
        namespace: Option<&str>,
        command: Vec<&str>,
    ) -> Result<(), String> {
        self.exec_pod_capture_output(pod_name, namespace, command)
            .await
            .map(|_| ())
    }
}

@@ -1,133 +0,0 @@
use std::net::SocketAddr;

use k8s_openapi::api::core::v1::Pod;
use kube::{Api, Error, error::DiscoveryError};
use log::{debug, error, info};
use tokio::net::TcpListener;

use crate::client::K8sClient;

/// Handle to a running port-forward. The forward is stopped when the handle is
/// dropped (or when [`abort`](Self::abort) is called explicitly).
pub struct PortForwardHandle {
    local_addr: SocketAddr,
    abort_handle: tokio::task::AbortHandle,
}

impl PortForwardHandle {
    /// The local address the listener is bound to.
    pub fn local_addr(&self) -> SocketAddr {
        self.local_addr
    }

    /// The local port (convenience for `local_addr().port()`).
    pub fn port(&self) -> u16 {
        self.local_addr.port()
    }

    /// Stop the port-forward and close the listener.
    pub fn abort(&self) {
        self.abort_handle.abort();
    }
}

impl Drop for PortForwardHandle {
    fn drop(&mut self) {
        self.abort_handle.abort();
    }
}

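The `Drop` impl above is the handle's whole contract: letting the handle go out of scope aborts the background forwarding task. A tiny RAII sketch of that shape, with an `AtomicBool` standing in for `tokio::task::AbortHandle` (an assumption made so the sketch runs without a runtime):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};

// RAII sketch of the handle contract: dropping the handle stops the forward.
struct ForwardHandle {
    stopped: Arc<AtomicBool>, // stand-in for tokio's AbortHandle
}

impl ForwardHandle {
    fn abort(&self) {
        self.stopped.store(true, Ordering::SeqCst);
    }
}

impl Drop for ForwardHandle {
    fn drop(&mut self) {
        // Explicit abort() and implicit drop converge on the same path.
        self.abort();
    }
}

fn main() {
    let flag = Arc::new(AtomicBool::new(false));
    {
        let _h = ForwardHandle { stopped: flag.clone() };
    } // handle dropped here; Drop fires
    assert!(flag.load(Ordering::SeqCst));
}
```

Routing both the explicit `abort()` and `Drop` through one method guarantees the forward cannot leak past the handle's lifetime.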
impl K8sClient {
    /// Forward a pod port to a local TCP listener.
    ///
    /// Binds `127.0.0.1:{local_port}` (pass 0 to let the OS pick a free port)
    /// and proxies every incoming TCP connection to the pod's `remote_port`
    /// through the Kubernetes API server's portforward subresource (WebSocket).
    ///
    /// Returns a [`PortForwardHandle`] whose [`port()`](PortForwardHandle::port)
    /// gives the actual bound port. The forward runs in a background task and
    /// is automatically stopped when the handle is dropped.
    pub async fn port_forward(
        &self,
        pod_name: &str,
        namespace: &str,
        local_port: u16,
        remote_port: u16,
    ) -> Result<PortForwardHandle, Error> {
        let listener = TcpListener::bind(SocketAddr::from(([127, 0, 0, 1], local_port)))
            .await
            .map_err(|e| {
                Error::Discovery(DiscoveryError::MissingResource(format!(
                    "Failed to bind 127.0.0.1:{local_port}: {e}"
                )))
            })?;

        let local_addr = listener.local_addr().map_err(|e| {
            Error::Discovery(DiscoveryError::MissingResource(format!(
                "Failed to get local address: {e}"
            )))
        })?;

        info!(
            "Port-forward {} -> {}/{}:{}",
            local_addr, namespace, pod_name, remote_port
        );

        let client = self.client.clone();
        let ns = namespace.to_string();
        let pod = pod_name.to_string();

        let task = tokio::spawn(async move {
            let api: Api<Pod> = Api::namespaced(client, &ns);
            loop {
                let (mut tcp_stream, peer) = match listener.accept().await {
                    Ok(conn) => conn,
                    Err(e) => {
                        debug!("Port-forward listener accept error: {e}");
                        break;
                    }
                };

                debug!("Port-forward connection from {peer}");

                let api = api.clone();
                let pod = pod.clone();
                tokio::spawn(async move {
                    let mut pf = match api.portforward(&pod, &[remote_port]).await {
                        Ok(pf) => pf,
                        Err(e) => {
                            error!("Port-forward WebSocket setup failed: {e}");
                            return;
                        }
                    };

                    let mut kube_stream = match pf.take_stream(remote_port) {
                        Some(s) => s,
                        None => {
                            error!("Port-forward: no stream for port {remote_port}");
                            return;
                        }
                    };

                    match tokio::io::copy_bidirectional(&mut tcp_stream, &mut kube_stream).await {
                        Ok((from_client, from_pod)) => {
                            debug!(
                                "Port-forward connection closed ({from_client} bytes sent, {from_pod} bytes received)"
                            );
                        }
                        Err(e) => {
                            debug!("Port-forward copy error: {e}");
                        }
                    }

                    drop(pf);
                });
            }
        });

        Ok(PortForwardHandle {
            local_addr,
            abort_handle: task.abort_handle(),
        })
    }
}
@@ -151,28 +151,6 @@ impl K8sClient {
        Ok(!crds.items.is_empty())
    }

    /// Polls until a CRD is registered in the API server.
    pub async fn wait_for_crd(&self, name: &str, timeout: Option<Duration>) -> Result<(), Error> {
        let timeout = timeout.unwrap_or(Duration::from_secs(60));
        let start = std::time::Instant::now();
        let poll = Duration::from_secs(2);

        loop {
            if self.has_crd(name).await? {
                return Ok(());
            }
            if start.elapsed() > timeout {
                return Err(Error::Discovery(
                    kube::error::DiscoveryError::MissingResource(format!(
                        "CRD '{name}' not registered within {}s",
                        timeout.as_secs()
                    )),
                ));
            }
            tokio::time::sleep(poll).await;
        }
    }

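`wait_for_crd` is a plain poll-until-deadline loop: check, bail on timeout, sleep, repeat. A synchronous sketch of the same shape (the `wait_until` helper is hypothetical, not part of the crate):

```rust
use std::time::{Duration, Instant};

/// Poll `check` until it returns true or `timeout` elapses.
/// Synchronous sketch of the same loop shape as the async `wait_for_crd`.
fn wait_until(
    mut check: impl FnMut() -> bool,
    timeout: Duration,
    poll: Duration,
) -> Result<(), String> {
    let start = Instant::now();
    loop {
        if check() {
            return Ok(());
        }
        if start.elapsed() > timeout {
            return Err(format!("condition not met within {}s", timeout.as_secs()));
        }
        std::thread::sleep(poll);
    }
}

fn main() {
    let mut calls = 0;
    let outcome = wait_until(
        || {
            calls += 1;
            calls >= 3 // condition becomes true on the third poll
        },
        Duration::from_secs(1),
        Duration::from_millis(1),
    );
    assert!(outcome.is_ok());
    assert_eq!(calls, 3);
}
```

Checking the condition before checking the deadline (as the diff's loop does) guarantees at least one poll even with a zero timeout.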
    pub async fn service_account_api(&self, namespace: &str) -> Api<ServiceAccount> {
        Api::namespaced(self.client.clone(), namespace)
    }
@@ -292,23 +270,6 @@ impl K8sClient {
        api.get_opt(name).await
    }

    /// Deletes a single named resource. Returns `Ok(())` on success or if the
    /// resource was already absent (idempotent).
    pub async fn delete_resource<K>(&self, name: &str, namespace: Option<&str>) -> Result<(), Error>
    where
        K: Resource + Clone + std::fmt::Debug + DeserializeOwned,
        <K as Resource>::Scope: ScopeResolver<K>,
        <K as Resource>::DynamicType: Default,
    {
        let api: Api<K> =
            <<K as Resource>::Scope as ScopeResolver<K>>::get_api(&self.client, namespace);
        match api.delete(name, &kube::api::DeleteParams::default()).await {
            Ok(_) => Ok(()),
            Err(Error::Api(ErrorResponse { code: 404, .. })) => Ok(()),
            Err(e) => Err(e),
        }
    }

    pub async fn list_resources<K>(
        &self,
        namespace: Option<&str>,

@@ -12,7 +12,6 @@ testing = []
hex = "0.4"
reqwest = { version = "0.11", features = [
    "blocking",
    "cookies",
    "json",
    "rustls-tls",
], default-features = false }
@@ -29,7 +28,6 @@ log.workspace = true
env_logger.workspace = true
async-trait.workspace = true
cidr.workspace = true
opnsense-api = { path = "../opnsense-api" }
opnsense-config = { path = "../opnsense-config" }
opnsense-config-xml = { path = "../opnsense-config-xml" }
harmony_macros = { path = "../harmony_macros" }
@@ -80,15 +78,12 @@ harmony_inventory_agent = { path = "../harmony_inventory_agent" }
harmony_secret_derive = { path = "../harmony_secret_derive" }
harmony_secret = { path = "../harmony_secret" }
askama.workspace = true
sha2 = "0.10"
sqlx.workspace = true
inquire.workspace = true
brocade = { path = "../brocade" }
option-ext = "0.2.0"
rand.workspace = true
virt = "0.4.3"

[dev-dependencies]
pretty_assertions.workspace = true
assertor.workspace = true
httptest = "0.16"

@@ -8,12 +8,6 @@ pub struct OPNSenseFirewallCredentials {
    pub password: String,
}

#[derive(Secret, Serialize, Deserialize, JsonSchema, Debug, PartialEq)]
pub struct OPNSenseApiCredentials {
    pub key: String,
    pub secret: String,
}

// TODO we need a better way to handle multiple "instances" of the same secret structure.
#[derive(Secret, Serialize, Deserialize, JsonSchema, Debug, PartialEq)]
pub struct SshKeyPair {

@@ -2,7 +2,6 @@ use harmony_types::id::Id;
use std::collections::BTreeMap;

use async_trait::async_trait;
use log::info;
use serde::Serialize;
use serde_value::Value;

@@ -13,18 +12,6 @@ use super::{
    topology::Topology,
};

/// Format a duration in a human-readable way.
fn format_duration(d: std::time::Duration) -> String {
    let secs = d.as_secs();
    if secs < 60 {
        format!("{:.1}s", d.as_secs_f64())
    } else if secs < 3600 {
        format!("{}m {}s", secs / 60, secs % 60)
    } else {
        format!("{}h {}m {}s", secs / 3600, (secs % 3600) / 60, secs % 60)
    }
}

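For reference, the helper's output at each branch boundary — this reproduces `format_duration` verbatim from the diff so the expected strings can be checked directly:

```rust
use std::time::Duration;

// Copied verbatim from the diff above.
fn format_duration(d: Duration) -> String {
    let secs = d.as_secs();
    if secs < 60 {
        format!("{:.1}s", d.as_secs_f64())
    } else if secs < 3600 {
        format!("{}m {}s", secs / 60, secs % 60)
    } else {
        format!("{}h {}m {}s", secs / 3600, (secs % 3600) / 60, secs % 60)
    }
}

fn main() {
    assert_eq!(format_duration(Duration::from_millis(5500)), "5.5s");
    assert_eq!(format_duration(Duration::from_secs(75)), "1m 15s");
    assert_eq!(format_duration(Duration::from_secs(3725)), "1h 2m 5s");
}
```

Sub-minute durations keep fractional precision; above a minute, whole seconds suffice for score-execution logs.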
#[async_trait]
pub trait Score<T: Topology>:
    std::fmt::Debug + ScoreToString<T> + Send + Sync + CloneBoxScore<T> + SerializeScore<T>
@@ -36,47 +23,22 @@ pub trait Score<T: Topology>:
    ) -> Result<Outcome, InterpretError> {
        let id = Id::default();
        let interpret = self.create_interpret();
        let score_name = self.name();
        let interpret_name = interpret.get_name().to_string();

        instrumentation::instrument(HarmonyEvent::InterpretExecutionStarted {
            execution_id: id.clone().to_string(),
            topology: topology.name().into(),
            interpret: interpret_name.clone(),
            score: score_name.clone(),
            message: format!("{} running...", interpret_name),
            interpret: interpret.get_name().to_string(),
            score: self.name(),
            message: format!("{} running...", interpret.get_name()),
        })
        .unwrap();

        let start = std::time::Instant::now();
        let result = interpret.execute(inventory, topology).await;
        let elapsed = start.elapsed();

        match &result {
            Ok(outcome) => {
                info!(
                    "[{}] {} in {} — {}",
                    score_name,
                    outcome.status,
                    format_duration(elapsed),
                    outcome.message
                );
            }
            Err(e) => {
                info!(
                    "[{}] FAILED after {} — {}",
                    score_name,
                    format_duration(elapsed),
                    e
                );
            }
        }

        instrumentation::instrument(HarmonyEvent::InterpretExecutionFinished {
            execution_id: id.clone().to_string(),
            topology: topology.name().into(),
            interpret: interpret_name,
            score: score_name,
            interpret: interpret.get_name().to_string(),
            score: self.name(),
            outcome: result.clone(),
        })
        .unwrap();

@@ -1,844 +0,0 @@
//! Higher-order topology for managing an OPNsense firewall HA pair.
//!
//! Wraps a primary and backup `OPNSenseFirewall` instance. Most scores are
//! applied identically to both; CARP VIPs get differentiated advskew values
//! (primary=0, backup=configurable) to establish correct failover priority.
//!
//! See ROADMAP/10-firewall-pair-topology.md for future work (generic trait,
//! delegation macro, XMLRPC sync, integration tests).
//! See ROADMAP/11-named-config-instances.md for per-device credential support.

use std::net::IpAddr;
use std::str::FromStr;

use async_trait::async_trait;
use harmony_types::firewall::VipMode;
use harmony_types::id::Id;
use harmony_types::net::{IpAddress, MacAddress};
use log::info;
use serde::Serialize;

use crate::config::secret::{OPNSenseApiCredentials, OPNSenseFirewallCredentials};
use crate::data::Version;
use crate::executors::ExecutorError;
use crate::infra::opnsense::OPNSenseFirewall;
use crate::interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome};
use crate::inventory::Inventory;
use crate::modules::opnsense::dnat::DnatScore;
use crate::modules::opnsense::firewall::{BinatScore, FirewallRuleScore, OutboundNatScore};
use crate::modules::opnsense::lagg::LaggScore;
use crate::modules::opnsense::vip::VipDef;
use crate::modules::opnsense::vlan::VlanScore;
use crate::score::Score;
use crate::topology::{
    DHCPStaticEntry, DhcpServer, LogicalHost, PreparationError, PreparationOutcome, PxeOptions,
    Topology,
};

use harmony_secret::SecretManager;

// ── FirewallPairTopology ───────────────────────────────────────────

/// An OPNsense HA firewall pair managed via CARP.
///
/// Configuration is applied independently to both firewalls (not via XMLRPC
/// sync), since some settings like CARP advskew intentionally differ between
/// primary and backup.
#[derive(Debug, Clone)]
pub struct FirewallPairTopology {
    pub primary: OPNSenseFirewall,
    pub backup: OPNSenseFirewall,
}

impl FirewallPairTopology {
    /// Construct a firewall pair from the harmony config system.
    ///
    /// Reads the following environment variables:
    /// - `OPNSENSE_PRIMARY_IP` — IP address of the primary firewall
    /// - `OPNSENSE_BACKUP_IP` — IP address of the backup firewall
    /// - `OPNSENSE_API_PORT` — API/web GUI port (default: 443)
    ///
    /// Credentials are loaded via `SecretManager::get_or_prompt`.
    pub async fn opnsense_from_config() -> Self {
        // TODO: both firewalls share the same credentials. Once named config
        // instances are available (ROADMAP/11), use per-device credentials:
        // ConfigManager::get_named::<OPNSenseApiCredentials>("fw-primary")
        let ssh_creds = SecretManager::get_or_prompt::<OPNSenseFirewallCredentials>()
            .await
            .expect("Failed to get SSH credentials");

        let api_creds = SecretManager::get_or_prompt::<OPNSenseApiCredentials>()
            .await
            .expect("Failed to get API credentials");

        Self::opnsense_with_credentials(&ssh_creds, &api_creds, &ssh_creds, &api_creds).await
    }

    pub async fn opnsense_with_credentials(
        primary_ssh_creds: &OPNSenseFirewallCredentials,
        primary_api_creds: &OPNSenseApiCredentials,
        backup_ssh_creds: &OPNSenseFirewallCredentials,
        backup_api_creds: &OPNSenseApiCredentials,
    ) -> Self {
        let primary_ip =
            std::env::var("OPNSENSE_PRIMARY_IP").expect("OPNSENSE_PRIMARY_IP must be set");
        let backup_ip =
            std::env::var("OPNSENSE_BACKUP_IP").expect("OPNSENSE_BACKUP_IP must be set");
        let api_port: u16 = std::env::var("OPNSENSE_API_PORT")
            .ok()
            .map(|p| {
                p.parse()
                    .expect("OPNSENSE_API_PORT must be a valid port number")
            })
            .unwrap_or(443);

        let primary_host = LogicalHost {
            ip: IpAddr::from_str(&primary_ip).expect("OPNSENSE_PRIMARY_IP must be a valid IP"),
            name: "fw-primary".to_string(),
        };
        let backup_host = LogicalHost {
            ip: IpAddr::from_str(&backup_ip).expect("OPNSENSE_BACKUP_IP must be a valid IP"),
            name: "fw-backup".to_string(),
        };

        info!("Connecting to primary firewall at {primary_ip}:{api_port}");
        let primary = OPNSenseFirewall::with_api_port(
            primary_host,
            None,
            api_port,
            &primary_api_creds,
            &primary_ssh_creds,
        )
        .await;

        info!("Connecting to backup firewall at {backup_ip}:{api_port}");
        let backup = OPNSenseFirewall::with_api_port(
            backup_host,
            None,
            api_port,
            &backup_api_creds,
            &backup_ssh_creds,
        )
        .await;

        Self { primary, backup }
    }
}

#[async_trait]
impl Topology for FirewallPairTopology {
    fn name(&self) -> &str {
        "FirewallPairTopology"
    }

    async fn ensure_ready(&self) -> Result<PreparationOutcome, PreparationError> {
        let primary_outcome = self.primary.ensure_ready().await?;
        let backup_outcome = self.backup.ensure_ready().await?;

        match (primary_outcome, backup_outcome) {
            (PreparationOutcome::Noop, PreparationOutcome::Noop) => Ok(PreparationOutcome::Noop),
            (p, b) => {
                let mut details = Vec::new();
                if let PreparationOutcome::Success { details: d } = p {
                    details.push(format!("Primary: {}", d));
                }
                if let PreparationOutcome::Success { details: d } = b {
                    details.push(format!("Backup: {}", d));
                }
                Ok(PreparationOutcome::Success {
                    details: details.join(", "),
                })
            }
        }
    }
}

// ── DhcpServer delegation ──────────────────────────────────────────
//
// Required so that DhcpScore (which uses `impl<T: Topology + DhcpServer> Score<T>`)
// automatically works with FirewallPairTopology.

#[async_trait]
impl DhcpServer for FirewallPairTopology {
    async fn commit_config(&self) -> Result<(), ExecutorError> {
        self.primary.commit_config().await?;
        self.backup.commit_config().await
    }

    async fn add_static_mapping(&self, entry: &DHCPStaticEntry) -> Result<(), ExecutorError> {
        self.primary.add_static_mapping(entry).await?;
        self.backup.add_static_mapping(entry).await
    }

    async fn remove_static_mapping(&self, mac: &MacAddress) -> Result<(), ExecutorError> {
        self.primary.remove_static_mapping(mac).await?;
        self.backup.remove_static_mapping(mac).await
    }

    async fn list_static_mappings(&self) -> Vec<(MacAddress, IpAddress)> {
        // Return primary's view — both should be identical
        self.primary.list_static_mappings().await
    }

    /// Returns the primary firewall's IP. In a CARP setup, callers
    /// typically want the CARP VIP instead — use the VIP address directly.
    fn get_ip(&self) -> IpAddress {
        self.primary.get_ip()
    }

    /// Returns the primary firewall's host. See `get_ip()` note.
    fn get_host(&self) -> LogicalHost {
        self.primary.get_host()
    }

    async fn set_pxe_options(&self, options: PxeOptions) -> Result<(), ExecutorError> {
        // PXE options are the same on both; construct a second copy for backup
        let backup_options = PxeOptions {
            ipxe_filename: options.ipxe_filename.clone(),
            bios_filename: options.bios_filename.clone(),
            efi_filename: options.efi_filename.clone(),
            tftp_ip: options.tftp_ip,
        };
        self.primary.set_pxe_options(options).await?;
        self.backup.set_pxe_options(backup_options).await
    }

    async fn set_dhcp_range(
        &self,
        start: &IpAddress,
        end: &IpAddress,
    ) -> Result<(), ExecutorError> {
        self.primary.set_dhcp_range(start, end).await?;
        self.backup.set_dhcp_range(start, end).await
    }
}

// ── Helper for uniform score delegation ────────────────────────────

/// Standard boilerplate for Interpret methods on pair scores.
macro_rules! pair_interpret_boilerplate {
    ($name:expr) => {
        fn get_name(&self) -> InterpretName {
            InterpretName::Custom($name)
        }

        fn get_version(&self) -> Version {
            Version::from("1.0.0").unwrap()
        }

        fn get_status(&self) -> InterpretStatus {
            InterpretStatus::QUEUED
        }

        fn get_children(&self) -> Vec<Id> {
            vec![]
        }
    };
}

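`pair_interpret_boilerplate!` expands to trait-method items inside each `impl`, so the six pair scores below differ only in their name string. The same `macro_rules!` trick in miniature, with a hypothetical `Named` trait standing in for `Interpret`:

```rust
// Hypothetical trait standing in for Interpret<FirewallPairTopology>.
trait Named {
    fn get_name(&self) -> String;
    fn get_children(&self) -> Vec<u32>;
}

// Expands to trait-method items; invoked inside an impl block, like
// pair_interpret_boilerplate! in the diff.
macro_rules! named_boilerplate {
    ($name:expr) => {
        fn get_name(&self) -> String {
            $name.to_string()
        }

        fn get_children(&self) -> Vec<u32> {
            vec![]
        }
    };
}

struct LaggPair;

impl Named for LaggPair {
    named_boilerplate!("LaggScore (pair)");
}

fn main() {
    assert_eq!(LaggPair.get_name(), "LaggScore (pair)");
    assert!(LaggPair.get_children().is_empty());
}
```

Each score's `execute` still differs (it logs its own name), which is presumably why the macro covers only the four identical accessors and not `execute` itself.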
// ── LaggScore for FirewallPairTopology ──────────────────────────────
|
||||
|
||||
impl Score<FirewallPairTopology> for LaggScore {
|
||||
fn name(&self) -> String {
|
||||
"LaggScore".to_string()
|
||||
}
|
||||
|
||||
fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
|
||||
Box::new(LaggPairInterpret {
|
||||
score: self.clone(),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
struct LaggPairInterpret {
|
||||
score: LaggScore,
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl Interpret<FirewallPairTopology> for LaggPairInterpret {
|
||||
async fn execute(
|
||||
&self,
|
||||
inventory: &Inventory,
|
||||
topology: &FirewallPairTopology,
|
||||
) -> Result<Outcome, InterpretError> {
|
||||
let inner = self.score.create_interpret();
|
||||
info!("Applying LaggScore to primary firewall");
|
||||
inner.execute(inventory, &topology.primary).await?;
|
||||
info!("Applying LaggScore to backup firewall");
|
||||
inner.execute(inventory, &topology.backup).await
|
||||
}
|
||||
|
||||
pair_interpret_boilerplate!("LaggScore (pair)");
|
||||
}
|
||||
|
||||
// ── VlanScore for FirewallPairTopology ──────────────────────────────
|
||||
|
||||
impl Score<FirewallPairTopology> for VlanScore {
|
||||
fn name(&self) -> String {
|
||||
"VlanScore".to_string()
|
||||
}
|
||||
|
||||
fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
|
||||
Box::new(VlanPairInterpret {
|
||||
score: self.clone(),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
struct VlanPairInterpret {
|
||||
score: VlanScore,
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl Interpret<FirewallPairTopology> for VlanPairInterpret {
|
||||
async fn execute(
|
||||
&self,
|
||||
inventory: &Inventory,
|
||||
topology: &FirewallPairTopology,
|
||||
) -> Result<Outcome, InterpretError> {
|
||||
let inner = self.score.create_interpret();
|
||||
info!("Applying VlanScore to primary firewall");
|
||||
inner.execute(inventory, &topology.primary).await?;
|
||||
info!("Applying VlanScore to backup firewall");
|
||||
inner.execute(inventory, &topology.backup).await
|
||||
}
|
||||
|
||||
pair_interpret_boilerplate!("VlanScore (pair)");
|
||||
}
|
||||
|
||||
// ── FirewallRuleScore for FirewallPairTopology ─────────────────────
|
||||
|
||||
impl Score<FirewallPairTopology> for FirewallRuleScore {
|
||||
fn name(&self) -> String {
|
||||
"FirewallRuleScore".to_string()
|
||||
}
|
||||
|
||||
fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
|
||||
Box::new(FirewallRulePairInterpret {
|
||||
score: self.clone(),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
struct FirewallRulePairInterpret {
|
||||
score: FirewallRuleScore,
|
||||
}
|
||||
|
#[async_trait]
impl Interpret<FirewallPairTopology> for FirewallRulePairInterpret {
    async fn execute(
        &self,
        inventory: &Inventory,
        topology: &FirewallPairTopology,
    ) -> Result<Outcome, InterpretError> {
        let inner = self.score.create_interpret();
        info!("Applying FirewallRuleScore to primary firewall");
        inner.execute(inventory, &topology.primary).await?;
        info!("Applying FirewallRuleScore to backup firewall");
        inner.execute(inventory, &topology.backup).await
    }

    pair_interpret_boilerplate!("FirewallRuleScore (pair)");
}

// ── BinatScore for FirewallPairTopology ────────────────────────────

impl Score<FirewallPairTopology> for BinatScore {
    fn name(&self) -> String {
        "BinatScore".to_string()
    }

    fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
        Box::new(BinatPairInterpret {
            score: self.clone(),
        })
    }
}

#[derive(Debug, Clone, Serialize)]
struct BinatPairInterpret {
    score: BinatScore,
}

#[async_trait]
impl Interpret<FirewallPairTopology> for BinatPairInterpret {
    async fn execute(
        &self,
        inventory: &Inventory,
        topology: &FirewallPairTopology,
    ) -> Result<Outcome, InterpretError> {
        let inner = self.score.create_interpret();
        info!("Applying BinatScore to primary firewall");
        inner.execute(inventory, &topology.primary).await?;
        info!("Applying BinatScore to backup firewall");
        inner.execute(inventory, &topology.backup).await
    }

    pair_interpret_boilerplate!("BinatScore (pair)");
}

// ── OutboundNatScore for FirewallPairTopology ──────────────────────

impl Score<FirewallPairTopology> for OutboundNatScore {
    fn name(&self) -> String {
        "OutboundNatScore".to_string()
    }

    fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
        Box::new(OutboundNatPairInterpret {
            score: self.clone(),
        })
    }
}

#[derive(Debug, Clone, Serialize)]
struct OutboundNatPairInterpret {
    score: OutboundNatScore,
}

#[async_trait]
impl Interpret<FirewallPairTopology> for OutboundNatPairInterpret {
    async fn execute(
        &self,
        inventory: &Inventory,
        topology: &FirewallPairTopology,
    ) -> Result<Outcome, InterpretError> {
        let inner = self.score.create_interpret();
        info!("Applying OutboundNatScore to primary firewall");
        inner.execute(inventory, &topology.primary).await?;
        info!("Applying OutboundNatScore to backup firewall");
        inner.execute(inventory, &topology.backup).await
    }

    pair_interpret_boilerplate!("OutboundNatScore (pair)");
}

// ── DnatScore for FirewallPairTopology ─────────────────────────────
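The pair interprets above all repeat one delegate-twice pattern: run the wrapped interpret against the primary, short-circuit on failure, then run it against the backup. A minimal std-only sketch of that ordering; the names (`apply`, `apply_to_pair`, `PairError`) are illustrative, not Harmony API:

```rust
// Hypothetical sketch of the delegate-twice pattern used by the pair
// interprets: primary first, short-circuit on error, then backup.
#[derive(Debug, PartialEq)]
enum PairError {
    Primary(String),
    Backup(String),
}

// Stand-in for a single-firewall interpret execution.
fn apply(target: &str, fail: bool) -> Result<String, String> {
    if fail {
        Err(format!("{target} failed"))
    } else {
        Ok(format!("{target} ok"))
    }
}

fn apply_to_pair(primary_fails: bool, backup_fails: bool) -> Result<(String, String), PairError> {
    // Primary first: if it fails, the backup is never touched, so the pair
    // cannot end up with a configured backup and an unconfigured primary.
    let p = apply("primary", primary_fails).map_err(PairError::Primary)?;
    let b = apply("backup", backup_fails).map_err(PairError::Backup)?;
    Ok((p, b))
}

fn main() {
    assert!(apply_to_pair(false, false).is_ok());
    assert_eq!(
        apply_to_pair(true, false),
        Err(PairError::Primary("primary failed".to_string()))
    );
}
```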
impl Score<FirewallPairTopology> for DnatScore {
    fn name(&self) -> String {
        "DnatScore".to_string()
    }

    fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
        Box::new(DnatPairInterpret {
            score: self.clone(),
        })
    }
}

#[derive(Debug, Clone, Serialize)]
struct DnatPairInterpret {
    score: DnatScore,
}

#[async_trait]
impl Interpret<FirewallPairTopology> for DnatPairInterpret {
    async fn execute(
        &self,
        inventory: &Inventory,
        topology: &FirewallPairTopology,
    ) -> Result<Outcome, InterpretError> {
        let inner = self.score.create_interpret();
        info!("Applying DnatScore to primary firewall");
        inner.execute(inventory, &topology.primary).await?;
        info!("Applying DnatScore to backup firewall");
        inner.execute(inventory, &topology.backup).await
    }

    pair_interpret_boilerplate!("DnatScore (pair)");
}

// ── CarpVipScore ───────────────────────────────────────────────────
/// CARP-aware VIP score for firewall pairs.
///
/// Applies VIPs to both firewalls with differentiated CARP priority:
/// - Primary always gets `advskew=0` (highest priority, becomes CARP master)
/// - Backup gets `backup_advskew` (default 100, lower priority)
///
/// Non-CARP VIPs (IP alias, ProxyARP) are applied identically to both.
///
/// This is a distinct type from `VipScore` because the caller does not
/// specify advskew per-firewall — the pair semantics enforce it.
#[derive(Debug, Clone, Serialize)]
pub struct CarpVipScore {
    pub vips: Vec<VipDef>,
    /// advskew applied to backup firewall for CARP VIPs (default 100).
    /// Primary always gets advskew=0.
    pub backup_advskew: Option<u16>,
}

impl Score<FirewallPairTopology> for CarpVipScore {
    fn name(&self) -> String {
        "CarpVipScore".to_string()
    }

    fn create_interpret(&self) -> Box<dyn Interpret<FirewallPairTopology>> {
        Box::new(CarpVipInterpret {
            score: self.clone(),
        })
    }
}

#[derive(Debug, Clone, Serialize)]
struct CarpVipInterpret {
    score: CarpVipScore,
}

impl CarpVipInterpret {
    async fn apply_vips_to(
        &self,
        firewall: &OPNSenseFirewall,
        role: &str,
        carp_advskew: u16,
    ) -> Result<(), InterpretError> {
        let vip_config = firewall.get_opnsense_config().vip();
        for vip in &self.score.vips {
            let advskew = if vip.mode == VipMode::Carp {
                Some(carp_advskew)
            } else {
                vip.advskew
            };
            info!(
                "Ensuring VIP {} on {} {} (advskew={:?})",
                vip.subnet, role, vip.interface, advskew
            );
            vip_config
                .ensure_vip_from(
                    &vip.mode,
                    &vip.interface,
                    &vip.subnet,
                    vip.subnet_bits,
                    vip.vhid,
                    vip.advbase,
                    advskew,
                    vip.password.as_deref(),
                    vip.peer.as_deref(),
                )
                .await
                .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
        }
        Ok(())
    }
}
#[async_trait]
impl Interpret<FirewallPairTopology> for CarpVipInterpret {
    async fn execute(
        &self,
        _inventory: &Inventory,
        topology: &FirewallPairTopology,
    ) -> Result<Outcome, InterpretError> {
        let backup_skew = self.score.backup_advskew.unwrap_or(100);

        self.apply_vips_to(&topology.primary, "primary", 0).await?;
        self.apply_vips_to(&topology.backup, "backup", backup_skew)
            .await?;

        Ok(Outcome::success(format!(
            "Configured {} VIPs on pair (primary advskew=0, backup advskew={})",
            self.score.vips.len(),
            backup_skew
        )))
    }

    pair_interpret_boilerplate!("CarpVipScore");
}
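The advskew rule documented on `CarpVipScore` can be captured in a few lines: CARP VIPs get the role-specific skew (0 on the primary, `backup_advskew` on the backup, defaulting to 100), while non-CARP VIPs keep whatever their definition says. A standalone sketch; the enum and helper below are simplified stand-ins, not the real Harmony types:

```rust
// Simplified stand-in for the VIP mode handled by CarpVipScore.
#[derive(PartialEq)]
enum VipMode {
    Carp,
    IpAlias,
}

// For CARP VIPs the pair semantics enforce the skew; other VIP kinds
// (IP alias, ProxyARP) are applied identically to both firewalls.
fn effective_advskew(mode: &VipMode, def_advskew: Option<u16>, carp_advskew: u16) -> Option<u16> {
    if *mode == VipMode::Carp {
        Some(carp_advskew)
    } else {
        def_advskew
    }
}

fn main() {
    // Backup skew defaults to 100 when the score leaves it unset.
    let backup_skew = None::<u16>.unwrap_or(100);
    assert_eq!(effective_advskew(&VipMode::Carp, None, 0), Some(0)); // primary
    assert_eq!(effective_advskew(&VipMode::Carp, None, backup_skew), Some(100)); // backup
    assert_eq!(effective_advskew(&VipMode::IpAlias, Some(7), backup_skew), Some(7));
}
```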
#[cfg(test)]
mod tests {
    use super::*;
    use httptest::{Expectation, Server, matchers::request, responders::*};
    use opnsense_api::OpnsenseClient;
    use std::sync::Arc;

    /// Dummy SSH shell for tests — never called, satisfies the `OPNsenseShell` trait.
    #[derive(Debug)]
    struct NoopShell;

    #[async_trait]
    impl opnsense_config::config::OPNsenseShell for NoopShell {
        async fn exec(&self, _cmd: &str) -> Result<String, opnsense_config::Error> {
            unimplemented!("test-only shell")
        }
        async fn write_content_to_temp_file(
            &self,
            _content: &str,
        ) -> Result<String, opnsense_config::Error> {
            unimplemented!("test-only shell")
        }
        async fn write_content_to_file(
            &self,
            _content: &str,
            _filename: &str,
        ) -> Result<String, opnsense_config::Error> {
            unimplemented!("test-only shell")
        }
        async fn upload_folder(
            &self,
            _source: &str,
            _destination: &str,
        ) -> Result<String, opnsense_config::Error> {
            unimplemented!("test-only shell")
        }
    }

    fn mock_opnsense_config(server: &Server) -> opnsense_config::Config {
        let url = server.url("/api").to_string();
        let client = OpnsenseClient::builder()
            .base_url(url)
            .auth_from_key_secret("test_key", "test_secret")
            .build()
            .unwrap();
        let shell: Arc<dyn opnsense_config::config::OPNsenseShell> = Arc::new(NoopShell);
        opnsense_config::Config::new(client, shell)
    }

    fn mock_firewall(server: &Server, name: &str) -> OPNSenseFirewall {
        let host = LogicalHost {
            ip: "127.0.0.1".parse().unwrap(),
            name: name.to_string(),
        };
        OPNSenseFirewall::from_config(host, mock_opnsense_config(server))
    }

    fn mock_pair(primary_server: &Server, backup_server: &Server) -> FirewallPairTopology {
        FirewallPairTopology {
            primary: mock_firewall(primary_server, "fw-primary"),
            backup: mock_firewall(backup_server, "fw-backup"),
        }
    }

    fn vip_search_empty() -> serde_json::Value {
        serde_json::json!({ "rows": [] })
    }

    fn vip_add_ok() -> serde_json::Value {
        serde_json::json!({ "uuid": "new-uuid" })
    }

    fn vip_reconfigure_ok() -> serde_json::Value {
        serde_json::json!({ "status": "ok" })
    }

    /// Set up a mock server to expect a VIP creation (search → add → reconfigure).
    fn expect_vip_creation(server: &Server) {
        server.expect(
            Expectation::matching(request::method_path(
                "GET",
                "/api/interfaces/vip_settings/searchItem",
            ))
            .respond_with(json_encoded(vip_search_empty())),
        );
        server.expect(
            Expectation::matching(request::method_path(
                "POST",
                "/api/interfaces/vip_settings/addItem",
            ))
            .respond_with(json_encoded(vip_add_ok())),
        );
        server.expect(
            Expectation::matching(request::method_path(
                "POST",
                "/api/interfaces/vip_settings/reconfigure",
            ))
            .respond_with(json_encoded(vip_reconfigure_ok())),
        );
    }
    // ── ensure_ready tests ─────────────────────────────────────────

    #[tokio::test]
    async fn ensure_ready_merges_both_success() {
        let s1 = Server::run();
        let s2 = Server::run();
        let pair = mock_pair(&s1, &s2);

        let result = pair.ensure_ready().await.unwrap();
        match result {
            PreparationOutcome::Success { details } => {
                assert!(details.contains("Primary"));
                assert!(details.contains("Backup"));
            }
            PreparationOutcome::Noop => panic!("Expected Success, got Noop"),
        }
    }

    // ── CarpVipScore tests ─────────────────────────────────────────

    #[tokio::test]
    async fn carp_vip_score_applies_to_both_firewalls() {
        let primary_server = Server::run();
        let backup_server = Server::run();

        // Both firewalls should receive VIP creation calls
        expect_vip_creation(&primary_server);
        expect_vip_creation(&backup_server);

        let pair = mock_pair(&primary_server, &backup_server);
        let inventory = Inventory::empty();

        let score = CarpVipScore {
            vips: vec![VipDef {
                mode: VipMode::Carp,
                interface: "lan".to_string(),
                subnet: "192.168.1.1".to_string(),
                subnet_bits: 24,
                vhid: Some(1),
                advbase: Some(1),
                advskew: None,
                password: Some("secret".to_string()),
                peer: None,
            }],
            backup_advskew: Some(100),
        };

        let result = score.interpret(&inventory, &pair).await;
        assert!(result.is_ok(), "CarpVipScore should succeed: {:?}", result);

        let outcome = result.unwrap();
        assert!(
            outcome.message.contains("primary advskew=0"),
            "Message should mention primary advskew: {}",
            outcome.message
        );
        assert!(
            outcome.message.contains("backup advskew=100"),
            "Message should mention backup advskew: {}",
            outcome.message
        );
    }
    #[tokio::test]
    async fn carp_vip_score_sends_to_both_and_reports_advskew() {
        let primary_server = Server::run();
        let backup_server = Server::run();

        // Both firewalls should receive VIP creation calls
        expect_vip_creation(&primary_server);
        expect_vip_creation(&backup_server);

        let pair = mock_pair(&primary_server, &backup_server);
        let inventory = Inventory::empty();

        let score = CarpVipScore {
            vips: vec![VipDef {
                mode: VipMode::Carp,
                interface: "lan".to_string(),
                subnet: "10.0.0.1".to_string(),
                subnet_bits: 32,
                vhid: Some(1),
                advbase: Some(1),
                advskew: None,
                password: Some("pass".to_string()),
                peer: None,
            }],
            backup_advskew: Some(50),
        };

        let result = score.interpret(&inventory, &pair).await;
        assert!(result.is_ok(), "CarpVipScore should succeed: {:?}", result);

        let outcome = result.unwrap();
        assert!(
            outcome.message.contains("backup advskew=50"),
            "Custom backup_advskew should be respected: {}",
            outcome.message
        );
        // httptest verifies both servers received exactly the expected API calls
    }

    #[tokio::test]
    async fn carp_vip_score_default_backup_advskew_is_100() {
        let primary_server = Server::run();
        let backup_server = Server::run();

        expect_vip_creation(&primary_server);
        expect_vip_creation(&backup_server);

        let pair = mock_pair(&primary_server, &backup_server);
        let inventory = Inventory::empty();

        // backup_advskew is None — should default to 100
        let score = CarpVipScore {
            vips: vec![VipDef {
                mode: VipMode::Carp,
                interface: "lan".to_string(),
                subnet: "10.0.0.1".to_string(),
                subnet_bits: 32,
                vhid: Some(1),
                advbase: Some(1),
                advskew: None,
                password: None,
                peer: None,
            }],
            backup_advskew: None,
        };

        let result = score.interpret(&inventory, &pair).await;
        assert!(result.is_ok());
        let outcome = result.unwrap();
        assert!(
            outcome.message.contains("backup advskew=100"),
            "Default backup advskew should be 100: {}",
            outcome.message
        );
    }
    // ── Uniform score delegation tests ─────────────────────────────

    #[tokio::test]
    async fn vlan_score_applies_to_both_firewalls() {
        let primary_server = Server::run();
        let backup_server = Server::run();

        // VLAN API: GET .../get to list, POST .../addItem to create, POST .../reconfigure to apply
        fn expect_vlan_creation(server: &Server) {
            server.expect(
                Expectation::matching(request::method_path(
                    "GET",
                    "/api/interfaces/vlan_settings/get",
                ))
                .respond_with(json_encoded(serde_json::json!({
                    "vlan": { "vlan": [] }
                }))),
            );
            server.expect(
                Expectation::matching(request::method_path(
                    "POST",
                    "/api/interfaces/vlan_settings/addItem",
                ))
                .respond_with(json_encoded(serde_json::json!({ "uuid": "vlan-uuid" }))),
            );
            server.expect(
                Expectation::matching(request::method_path(
                    "POST",
                    "/api/interfaces/vlan_settings/reconfigure",
                ))
                .respond_with(json_encoded(serde_json::json!({ "status": "ok" }))),
            );
        }

        expect_vlan_creation(&primary_server);
        expect_vlan_creation(&backup_server);

        let pair = mock_pair(&primary_server, &backup_server);
        let inventory = Inventory::empty();

        let score = VlanScore {
            vlans: vec![crate::modules::opnsense::vlan::VlanDef {
                parent_interface: "lagg0".to_string(),
                tag: 50,
                description: "test_vlan".to_string(),
            }],
        };

        let result = score.interpret(&inventory, &pair).await;
        assert!(result.is_ok(), "VlanScore should succeed: {:?}", result);
        // httptest verifies both servers received the expected calls
    }
}
@@ -204,9 +204,6 @@ impl LoadBalancer for HAClusterTopology {
    async fn reload_restart(&self) -> Result<(), ExecutorError> {
        self.load_balancer.reload_restart().await
    }
    async fn ensure_wan_access(&self, port: u16) -> Result<(), ExecutorError> {
        self.load_balancer.ensure_wan_access(port).await
    }
}

#[async_trait]

@@ -30,18 +30,6 @@ pub trait LoadBalancer: Send + Sync {
        self.add_service(service).await?;
        Ok(())
    }

    /// Ensure a TCP port is open for inbound WAN traffic.
    ///
    /// This creates a firewall rule to accept traffic on the given port
    /// from the WAN interface. Used by load balancers that need to receive
    /// external traffic (e.g., OKD ingress on ports 80/443).
    ///
    /// Default implementation is a no-op for topologies that don't manage
    /// firewall rules (e.g., cloud environments with security groups).
    async fn ensure_wan_access(&self, _port: u16) -> Result<(), ExecutorError> {
        Ok(())
    }
}

#[derive(Debug, PartialEq, Clone, Serialize)]

@@ -1,12 +1,10 @@
pub mod decentralized;
mod failover;
pub mod firewall_pair;
mod ha_cluster;
pub mod ingress;
pub mod node_exporter;
pub mod opnsense;
pub use failover::*;
pub use firewall_pair::*;
use harmony_types::net::IpAddress;
mod host_binding;
mod http;

@@ -1,6 +1,6 @@
use async_trait::async_trait;
use harmony_types::net::MacAddress;
use log::{info, warn};
use log::info;

use crate::{
    executors::ExecutorError,
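The `ensure_wan_access` default above relies on Rust's trait default methods: topologies without a managed firewall inherit the no-op, while firewall-backed implementations override it. A simplified, synchronous sketch of the pattern; `CloudLb` and `FirewallLb` are hypothetical stand-ins, not Harmony types:

```rust
// Trait-default pattern: ship a no-op so implementors that have nothing
// to do (e.g. cloud security groups already allow ingress) compile unchanged.
trait LoadBalancer {
    fn ensure_wan_access(&self, _port: u16) -> Result<String, String> {
        Ok("noop".to_string())
    }
}

struct CloudLb;
impl LoadBalancer for CloudLb {} // inherits the no-op default

struct FirewallLb;
impl LoadBalancer for FirewallLb {
    // Firewall-backed override: would create the actual pass rule.
    fn ensure_wan_access(&self, port: u16) -> Result<String, String> {
        Ok(format!("pass in on wan tcp port {port}"))
    }
}

fn main() {
    assert_eq!(CloudLb.ensure_wan_access(443).unwrap(), "noop");
    assert_eq!(
        FirewallLb.ensure_wan_access(443).unwrap(),
        "pass in on wan tcp port 443"
    );
}
```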
@@ -19,46 +19,24 @@ impl DhcpServer for OPNSenseFirewall {
    async fn add_static_mapping(&self, entry: &DHCPStaticEntry) -> Result<(), ExecutorError> {
        let mac: Vec<String> = entry.mac.iter().map(MacAddress::to_string).collect();

        self.opnsense_config
            .dhcp()
            .add_static_mapping(&mac, &entry.ip, &entry.name)
            .await
            .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
        {
            let mut writable_opnsense = self.opnsense_config.write().await;
            writable_opnsense
                .dhcp()
                .add_static_mapping(&mac, &entry.ip, &entry.name)
                .unwrap();
        }

        info!("Registered {:?}", entry);
        Ok(())
    }

    async fn remove_static_mapping(&self, mac: &MacAddress) -> Result<(), ExecutorError> {
        self.opnsense_config
            .dhcp()
            .remove_static_mapping(&mac.to_string())
            .await
            .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;

        info!("Removed static mapping for MAC {}", mac);
        Ok(())
    async fn remove_static_mapping(&self, _mac: &MacAddress) -> Result<(), ExecutorError> {
        todo!()
    }

    async fn list_static_mappings(&self) -> Vec<(MacAddress, IpAddress)> {
        match self.opnsense_config.dhcp().list_static_mappings().await {
            Ok(mappings) => mappings
                .into_iter()
                .filter_map(|(mac_str, ipv4)| {
                    let mac = MacAddress::try_from(mac_str.clone())
                        .map_err(|e| {
                            warn!("Skipping invalid MAC '{}': {}", mac_str, e);
                            e
                        })
                        .ok()?;
                    Some((mac, IpAddress::V4(ipv4)))
                })
                .collect(),
            Err(e) => {
                warn!("Failed to list static mappings: {}", e);
                vec![]
            }
        }
        todo!()
    }

    fn get_ip(&self) -> IpAddress {
@@ -70,13 +48,14 @@ impl DhcpServer for OPNSenseFirewall {
    }

    async fn set_pxe_options(&self, options: PxeOptions) -> Result<(), ExecutorError> {
        let mut writable_opnsense = self.opnsense_config.write().await;
        let PxeOptions {
            ipxe_filename,
            bios_filename,
            efi_filename,
            tftp_ip,
        } = options;
        self.opnsense_config
        writable_opnsense
            .dhcp()
            .set_pxe_options(
                tftp_ip.map(|i| i.to_string()),
@@ -95,7 +74,8 @@ impl DhcpServer for OPNSenseFirewall {
        start: &IpAddress,
        end: &IpAddress,
    ) -> Result<(), ExecutorError> {
        self.opnsense_config
        let mut writable_opnsense = self.opnsense_config.write().await;
        writable_opnsense
            .dhcp()
            .set_dhcp_range(&start.to_string(), &end.to_string())
            .await
@@ -11,7 +11,22 @@ use super::OPNSenseFirewall;
#[async_trait]
impl DnsServer for OPNSenseFirewall {
    async fn register_hosts(&self, _hosts: Vec<DnsRecord>) -> Result<(), ExecutorError> {
        todo!("Refactor this to use dnsmasq API")
        todo!("Refactor this to use dnsmasq")
        // let mut writable_opnsense = self.opnsense_config.write().await;
        // let mut dns = writable_opnsense.dns();
        // let hosts = hosts
        //     .iter()
        //     .map(|h| {
        //         Host::new(
        //             h.host.clone(),
        //             h.domain.clone(),
        //             h.record_type.to_string(),
        //             h.value.to_string(),
        //         )
        //     })
        //     .collect();
        // dns.add_static_mapping(hosts);
        // Ok(())
    }

    fn remove_record(
@@ -23,7 +38,26 @@ impl DnsServer for OPNSenseFirewall {
    }

    async fn list_records(&self) -> Vec<crate::topology::DnsRecord> {
        todo!("Refactor this to use dnsmasq API")
        todo!("Refactor this to use dnsmasq")
        // self.opnsense_config
        //     .write()
        //     .await
        //     .dns()
        //     .get_hosts()
        //     .iter()
        //     .map(|h| DnsRecord {
        //         host: h.hostname.clone(),
        //         domain: h.domain.clone(),
        //         record_type: h
        //             .rr
        //             .parse()
        //             .expect("received invalid record type {h.rr} from opnsense"),
        //         value: h
        //             .server
        //             .parse()
        //             .expect("received invalid ipv4 record from opnsense {h.server}"),
        //     })
        //     .collect()
    }

    fn get_ip(&self) -> IpAddress {
@@ -35,11 +69,23 @@ impl DnsServer for OPNSenseFirewall {
    }

    async fn register_dhcp_leases(&self, _register: bool) -> Result<(), ExecutorError> {
        todo!("Refactor this to use dnsmasq API")
        todo!("Refactor this to use dnsmasq")
        // let mut writable_opnsense = self.opnsense_config.write().await;
        // let mut dns = writable_opnsense.dns();
        // dns.register_dhcp_leases(register);
        //
        // Ok(())
    }

    async fn commit_config(&self) -> Result<(), ExecutorError> {
        self.opnsense_config
        let opnsense = self.opnsense_config.read().await;

        opnsense
            .save()
            .await
            .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;

        opnsense
            .restart_dns()
            .await
            .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))
@@ -8,53 +8,6 @@ use harmony_types::net::IpAddress;
use harmony_types::net::Url;
const OPNSENSE_HTTP_ROOT_PATH: &str = "/usr/local/http";

/// Download a remote URL into a temporary directory, returning the temp dir path.
///
/// The file is saved with its original filename (extracted from the URL path).
/// The caller can then use `upload_files` to SFTP the whole temp dir contents
/// to the OPNsense appliance.
pub(in crate::infra::opnsense) async fn download_url_to_temp_dir(
    url: &url::Url,
) -> Result<String, ExecutorError> {
    let client = reqwest::Client::new();
    let response =
        client.get(url.as_str()).send().await.map_err(|e| {
            ExecutorError::UnexpectedError(format!("Failed to download {url}: {e}"))
        })?;

    if !response.status().is_success() {
        return Err(ExecutorError::UnexpectedError(format!(
            "HTTP {} downloading {url}",
            response.status()
        )));
    }

    let file_name = url
        .path_segments()
        .and_then(|s| s.last())
        .filter(|s| !s.is_empty())
        .unwrap_or("download");

    let temp_dir = std::env::temp_dir().join("harmony_url_downloads");
    tokio::fs::create_dir_all(&temp_dir)
        .await
        .map_err(|e| ExecutorError::UnexpectedError(format!("Failed to create temp dir: {e}")))?;

    let dest = temp_dir.join(file_name);
    let bytes = response
        .bytes()
        .await
        .map_err(|e| ExecutorError::UnexpectedError(format!("Failed to read response: {e}")))?;

    tokio::fs::write(&dest, &bytes)
        .await
        .map_err(|e| ExecutorError::UnexpectedError(format!("Failed to write temp file: {e}")))?;

    info!("Downloaded {} to {:?} ({} bytes)", url, dest, bytes.len());
    Ok(temp_dir.to_string_lossy().to_string())
}
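The filename fallback in `download_url_to_temp_dir` (last path segment, kept only if non-empty, else `"download"`) can be sketched without the `url` crate. A std-only mirror of that logic; the helper name `file_name_from_path` is illustrative:

```rust
// Mirrors the extraction above: take the last path segment, discard it if
// empty (trailing slash), and fall back to a fixed name.
fn file_name_from_path(path: &str) -> &str {
    path.split('/')
        .last()
        .filter(|s| !s.is_empty())
        .unwrap_or("download")
}

fn main() {
    assert_eq!(file_name_from_path("/images/installer.iso"), "installer.iso");
    // A trailing slash yields an empty last segment, so the fallback is used.
    assert_eq!(file_name_from_path("/dir/with/trailing/"), "download");
    assert_eq!(file_name_from_path("/"), "download");
}
```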
#[async_trait]
impl HttpServer for OPNSenseFirewall {
    async fn serve_files(
@@ -62,6 +15,7 @@ impl HttpServer for OPNSenseFirewall {
        url: &Url,
        remote_path: &Option<String>,
    ) -> Result<(), ExecutorError> {
        let config = self.opnsense_config.read().await;
        info!("Uploading files from url {url} to {OPNSENSE_HTTP_ROOT_PATH}");
        let remote_upload_path = remote_path
            .clone()
@@ -69,18 +23,12 @@ impl HttpServer for OPNSenseFirewall {
            .unwrap_or(OPNSENSE_HTTP_ROOT_PATH.to_string());
        match url {
            Url::LocalFolder(path) => {
                self.opnsense_config
                config
                    .upload_files(path, &remote_upload_path)
                    .await
                    .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
            }
            Url::Url(remote_url) => {
                let local_dir = download_url_to_temp_dir(remote_url).await?;
                self.opnsense_config
                    .upload_files(&local_dir, &remote_upload_path)
                    .await
                    .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
            }
            Url::Url(_url) => todo!(),
        }
        Ok(())
    }
@@ -97,8 +45,9 @@ impl HttpServer for OPNSenseFirewall {
            }
        };

        let config = self.opnsense_config.read().await;
        info!("Uploading file content to {}", path);
        self.opnsense_config
        config
            .upload_file_content(&path, &file.content)
            .await
            .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
@@ -115,6 +64,8 @@ impl HttpServer for OPNSenseFirewall {

    async fn reload_restart(&self) -> Result<(), ExecutorError> {
        self.opnsense_config
            .write()
            .await
            .caddy()
            .reload_restart()
            .await
@@ -122,20 +73,20 @@ impl HttpServer for OPNSenseFirewall {
    }

    async fn ensure_initialized(&self) -> Result<(), ExecutorError> {
        if !self.opnsense_config.caddy().is_installed().await {
            info!("Http config not available, installing os-caddy package");
            self.opnsense_config
                .install_package("os-caddy")
                .await
                .map_err(|e| {
                    ExecutorError::UnexpectedError(format!("Failed to install os-caddy: {e:?}"))
                })?;
        let mut config = self.opnsense_config.write().await;
        let caddy = config.caddy();
        if caddy.get_full_config().is_none() {
            info!("Http config not available in opnsense config, installing package");
            config.install_package("os-caddy").await.map_err(|e| {
                ExecutorError::UnexpectedError(format!(
                    "Executor failed when trying to install os-caddy package with error {e:?}"
                ))
            })?;
        } else {
            info!("Http config available, assuming Caddy is already installed");
            info!("Http config available in opnsense config, assuming it is already installed");
        }

        info!("Adding custom caddy config files");
        self.opnsense_config
        config
            .upload_files(
                "./data/watchguard/caddy_config",
                "/usr/local/etc/caddy/caddy.d/",
@@ -144,11 +95,7 @@ impl HttpServer for OPNSenseFirewall {
            .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;

        info!("Enabling http server");
        self.opnsense_config
            .caddy()
            .enable(true)
            .await
            .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
        config.caddy().enable(true);

        Ok(())
    }
@@ -1,8 +1,9 @@
use async_trait::async_trait;
use log::{debug, error, info, warn};
use opnsense_config::modules::load_balancer::{
    HaproxyService, LbBackend, LbFrontend, LbHealthCheck, LbServer,
use opnsense_config_xml::{
    Frontend, HAProxy, HAProxyBackend, HAProxyHealthCheck, HAProxyServer, MaybeString,
};
use uuid::Uuid;

use crate::{
    executors::ExecutorError,
@@ -11,7 +12,6 @@ use crate::{
        LogicalHost, SSL,
    },
};
use harmony_types::firewall::{Direction, FirewallAction, IpProtocol, NetworkProtocol};
use harmony_types::net::IpAddress;

use super::OPNSenseFirewall;
@@ -26,13 +26,15 @@ impl LoadBalancer for OPNSenseFirewall {
    }

    async fn add_service(&self, service: &LoadBalancerService) -> Result<(), ExecutorError> {
        let (frontend, backend, servers, healthcheck) = harmony_service_to_lb_types(service);
        let mut config = self.opnsense_config.write().await;
        let mut load_balancer = config.load_balancer();

        self.opnsense_config
            .load_balancer()
            .configure_service(frontend, backend, servers, healthcheck)
            .await
            .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))
        let (frontend, backend, servers, healthcheck) =
            harmony_load_balancer_service_to_haproxy_xml(service);

        load_balancer.configure_service(frontend, backend, servers, healthcheck);

        Ok(())
    }

    async fn remove_service(&self, service: &LoadBalancerService) -> Result<(), ExecutorError> {
@@ -45,6 +47,8 @@ impl LoadBalancer for OPNSenseFirewall {

    async fn reload_restart(&self) -> Result<(), ExecutorError> {
        self.opnsense_config
            .write()
            .await
            .load_balancer()
            .reload_restart()
            .await
@@ -52,214 +56,455 @@ impl LoadBalancer for OPNSenseFirewall {
    }
    async fn ensure_initialized(&self) -> Result<(), ExecutorError> {
        let lb = self.opnsense_config.load_balancer();
        if lb.is_installed().await {
            debug!("HAProxy is installed");
        let mut config = self.opnsense_config.write().await;
        let load_balancer = config.load_balancer();
        if let Some(config) = load_balancer.get_full_config() {
            debug!(
                "HAProxy config available in opnsense config, assuming it is already installed, {config:?}"
            );
        } else {
            self.opnsense_config
                .install_package("os-haproxy")
                .await
                .map_err(|e| {
                    ExecutorError::UnexpectedError(format!("Failed to install os-haproxy: {e:?}"))
                })?;
            config.install_package("os-haproxy").await.map_err(|e| {
                ExecutorError::UnexpectedError(format!(
                    "Executor failed when trying to install os-haproxy package with error {e:?}"
                ))
            })?;
        }

        self.opnsense_config
            .load_balancer()
            .enable(true)
            .await
            .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
        config.load_balancer().enable(true);
        Ok(())
    }

    async fn list_services(&self) -> Vec<LoadBalancerService> {
        match self.opnsense_config.load_balancer().list_services().await {
            Ok(services) => services
                .into_iter()
                .filter_map(|svc| haproxy_service_to_harmony(&svc))
                .collect(),
            Err(e) => {
                warn!("Failed to list HAProxy services: {e}");
                vec![]
            }
        }
    }

    async fn ensure_wan_access(&self, port: u16) -> Result<(), ExecutorError> {
        info!("Ensuring WAN firewall rule for TCP port {port}");
        let fw = self.opnsense_config.firewall();
        fw.ensure_filter_rule(
            &FirewallAction::Pass,
            &Direction::In,
            "wan",
            &IpProtocol::Inet,
            &NetworkProtocol::Tcp,
            "any",
            "any",
            Some(&port.to_string()),
            None,
            &format!("LB: Allow TCP/{port} ingress on WAN"),
            false,
        )
        .await
        .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
        fw.apply()
            .await
            .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
        Ok(())
        let mut config = self.opnsense_config.write().await;
        let load_balancer = config.load_balancer();
        let haproxy_xml_config = load_balancer.get_full_config();
        haproxy_xml_config_to_harmony_loadbalancer(haproxy_xml_config)
    }
}
pub(crate) fn haproxy_xml_config_to_harmony_loadbalancer(
    haproxy: &Option<HAProxy>,
) -> Vec<LoadBalancerService> {
    let haproxy = match haproxy {
        Some(haproxy) => haproxy,
        None => return vec![],
    };

    haproxy
        .frontends
        .frontend
        .iter()
        .map(|frontend| {
            let mut backend_servers = vec![];
            let matching_backend = haproxy
                .backends
                .backends
                .iter()
                .find(|b| Some(b.uuid.clone()) == frontend.default_backend);

            let mut health_check = None;
            match matching_backend {
                Some(backend) => {
                    backend_servers.append(&mut get_servers_for_backend(backend, haproxy));
                    health_check = get_health_check_for_backend(backend, haproxy);
                }
                None => {
                    warn!(
                        "HAProxy config could not find a matching backend for frontend {frontend:?}"
                    );
                }
            }

            LoadBalancerService {
                backend_servers,
                listening_port: frontend.bind.parse().unwrap_or_else(|_| {
                    panic!(
                        "HAProxy frontend address should be a valid SocketAddr, got {}",
                        frontend.bind
                    )
                }),
                health_check,
            }
        })
        .collect()
}

pub(crate) fn get_servers_for_backend(
    backend: &HAProxyBackend,
    haproxy: &HAProxy,
) -> Vec<BackendServer> {
    let backend_servers: Vec<&str> = match &backend.linked_servers.content {
        Some(linked_servers) => linked_servers.split(',').collect(),
        None => {
            info!("No server defined for HAProxy backend {:?}", backend);
            return vec![];
        }
    };
    haproxy
        .servers
        .servers
        .iter()
        .filter_map(|server| {
            let address = server.address.clone()?;
            let port = server.port?;

            if backend_servers.contains(&server.uuid.as_str()) {
                return Some(BackendServer { address, port });
            }
            None
        })
        .collect()
}
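`get_servers_for_backend` resolves a backend's comma-separated `linked_servers` UUID list against the global HAProxy server table. A self-contained sketch of that matching step — the tuples here are illustrative stand-ins, not the crate's types:

```rust
fn main() {
    // A backend stores its servers as a comma-separated UUID list.
    let linked_servers = "server1,server2";
    let wanted: Vec<&str> = linked_servers.split(',').collect();

    // (uuid, address, port) stand-ins for the global HAProxy server table.
    let servers = [
        ("server1", "192.168.1.1", 80u16),
        ("server3", "192.168.1.3", 8080),
    ];

    // Keep only servers whose uuid appears in the backend's list.
    let matched: Vec<(&str, u16)> = servers
        .iter()
        .filter(|(uuid, _, _)| wanted.contains(uuid))
        .map(|(_, addr, port)| (*addr, *port))
        .collect();

    assert_eq!(matched, vec![("192.168.1.1", 80)]);
}
```

Note the asymmetry this design allows: a UUID may appear in `linked_servers` without a matching server row (it is silently skipped), which is exactly what the `test_get_servers_for_backend_no_matching_servers` test below exercises.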
pub(crate) fn get_health_check_for_backend(
    backend: &HAProxyBackend,
    haproxy: &HAProxy,
) -> Option<HealthCheck> {
    let health_check_uuid = match &backend.health_check.content {
        Some(uuid) => uuid,
        None => return None,
    };

    let haproxy_health_check = haproxy
        .healthchecks
        .healthchecks
        .iter()
        .find(|h| &h.uuid == health_check_uuid)?;

    let binding = haproxy_health_check.health_check_type.to_uppercase();
    let uppercase = binding.as_str();
    match uppercase {
        "TCP" => {
            if let Some(checkport) = haproxy_health_check.checkport.content.as_ref() {
                if !checkport.is_empty() {
                    return Some(HealthCheck::TCP(Some(checkport.parse().unwrap_or_else(
                        |_| {
                            panic!(
                                "HAProxy check port should be a valid port number, got {checkport}"
                            )
                        },
                    ))));
                }
            }
            Some(HealthCheck::TCP(None))
        }
        "HTTP" => {
            let path: String = haproxy_health_check
                .http_uri
                .content
                .clone()
                .unwrap_or_default();
            let method: HttpMethod = haproxy_health_check
                .http_method
                .content
                .clone()
                .unwrap_or_default()
                .into();
            let status_code: HttpStatusCode = HttpStatusCode::Success2xx;
            let ssl = match haproxy_health_check
                .ssl
                .content_string()
                .to_uppercase()
                .as_str()
            {
                "SSL" => SSL::SSL,
                "SSLNI" => SSL::SNI,
                "NOSSL" => SSL::Disabled,
                "" => SSL::Default,
                other => {
                    error!("Unknown haproxy health check ssl config {other}");
                    SSL::Other(other.to_string())
                }
            };
            let port = haproxy_health_check
                .checkport
                .content_string()
                .parse::<u16>()
                .ok();
            debug!("Found haproxy healthcheck port {port:?}");

            Some(HealthCheck::HTTP(port, path, method, status_code, ssl))
        }
        _ => panic!("Received unsupported health check type {}", uppercase),
    }
}
pub(crate) fn harmony_load_balancer_service_to_haproxy_xml(
    service: &LoadBalancerService,
) -> (
    Frontend,
    HAProxyBackend,
    Vec<HAProxyServer>,
    Option<HAProxyHealthCheck>,
) {
    // Here we have to build:
    // - one frontend
    // - one backend
    // - one Option<healthcheck>
    // - a Vec of servers
    //
    // Then merge them with the HAProxy config individually.
    //
    // We also have to take into account that it is entirely possible that a backend uses a
    // server with the same definition as another backend. So when creating a new backend, we
    // must not blindly create new servers just because the backend does not exist yet. Even if
    // it is a new backend, it may very well reuse existing servers.
    //
    // Also we need to support router integration for port forwarding on WAN as a strategy to
    // handle dyndns.
    //
    // A server is standalone; a backend points to servers and to a health check; a frontend
    // points to a backend.
    let healthcheck = if let Some(health_check) = &service.health_check {
        match health_check {
            HealthCheck::HTTP(port, path, http_method, _http_status_code, ssl) => {
                let ssl: MaybeString = match ssl {
                    SSL::SSL => "ssl".into(),
                    SSL::SNI => "sslni".into(),
                    SSL::Disabled => "nossl".into(),
                    SSL::Default => "".into(),
                    SSL::Other(other) => other.as_str().into(),
                };
                let path_without_query = path.split_once('?').map_or(path.as_str(), |(p, _)| p);
                let (port, port_name) = match port {
                    Some(port) => (Some(port.to_string()), port.to_string()),
                    None => (None, "serverport".to_string()),
                };

                let haproxy_check = HAProxyHealthCheck {
                    name: format!("HTTP_{http_method}_{path_without_query}_{port_name}"),
                    uuid: Uuid::new_v4().to_string(),
                    http_method: http_method.to_string().to_lowercase().into(),
                    health_check_type: "http".to_string(),
                    http_uri: path.clone().into(),
                    interval: "2s".to_string(),
                    ssl,
                    checkport: MaybeString::from(port),
                    ..Default::default()
                };

                Some(haproxy_check)
            }
            HealthCheck::TCP(port) => {
                let (port, port_name) = match port {
                    Some(port) => (Some(port.to_string()), port.to_string()),
                    None => (None, "serverport".to_string()),
                };

                let haproxy_check = HAProxyHealthCheck {
                    name: format!("TCP_{port_name}"),
                    uuid: Uuid::new_v4().to_string(),
                    health_check_type: "tcp".to_string(),
                    checkport: port.into(),
                    interval: "2s".to_string(),
                    ..Default::default()
                };

                Some(haproxy_check)
            }
        }
    } else {
        None
    };
    debug!("Built healthcheck {healthcheck:?}");

    let servers: Vec<HAProxyServer> = service
        .backend_servers
        .iter()
        .map(server_to_haproxy_server)
        .collect();
    debug!("Built servers {servers:?}");

    let mut backend = HAProxyBackend {
        uuid: Uuid::new_v4().to_string(),
        enabled: 1,
        name: format!(
            "backend_{}",
            service.listening_port.to_string().replace(':', "_")
        ),
        algorithm: "roundrobin".to_string(),
        stickiness_expire: "30m".to_string(),
        stickiness_size: "50k".to_string(),
        stickiness_conn_rate_period: "10s".to_string(),
        stickiness_sess_rate_period: "10s".to_string(),
        stickiness_http_req_rate_period: "10s".to_string(),
        stickiness_http_err_rate_period: "10s".to_string(),
        stickiness_bytes_in_rate_period: "1m".to_string(),
        stickiness_bytes_out_rate_period: "1m".to_string(),
        mode: "tcp".to_string(), // TODO do not depend on health check here
        ..Default::default()
    };
    info!("HAProxy backend algorithm is currently hardcoded to roundrobin");

    if let Some(hcheck) = &healthcheck {
        backend.health_check_enabled = 1;
        backend.health_check = hcheck.uuid.clone().into();
    }

    backend.linked_servers = servers
        .iter()
        .map(|s| s.uuid.as_str())
        .collect::<Vec<&str>>()
        .join(",")
        .into();
    debug!("Built backend {backend:?}");

    let frontend = Frontend {
        uuid: uuid::Uuid::new_v4().to_string(),
        enabled: 1,
        name: format!(
            "frontend_{}",
            service.listening_port.to_string().replace(':', "_")
        ),
        bind: service.listening_port.to_string(),
        mode: "tcp".to_string(), // TODO do not depend on health check here
        default_backend: Some(backend.uuid.clone()),
        stickiness_expire: "30m".to_string().into(),
        stickiness_size: "50k".to_string().into(),
        stickiness_conn_rate_period: "10s".to_string().into(),
        stickiness_sess_rate_period: "10s".to_string().into(),
        stickiness_http_req_rate_period: "10s".to_string().into(),
        stickiness_http_err_rate_period: "10s".to_string().into(),
        stickiness_bytes_in_rate_period: "1m".to_string().into(),
        stickiness_bytes_out_rate_period: "1m".to_string().into(),
        ssl_hsts_max_age: 15768000,
        ..Default::default()
    };
    info!("HAProxy frontend and backend mode currently hardcoded to tcp");

    debug!("Built frontend {frontend:?}");
    (frontend, backend, servers, healthcheck)
}
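The health-check naming above strips any query string from the probe path with `split_once('?')` before embedding it in the generated name, and falls back to the literal `"serverport"` when no check port is set. A small self-contained sketch of both steps:

```rust
fn main() {
    let path = "/healthz?verbose=1";

    // Keep only the part before '?'; paths without a query pass through unchanged.
    let path_without_query = path.split_once('?').map_or(path, |(p, _)| p);
    assert_eq!(path_without_query, "/healthz");

    // Name format mirrors the diff: HTTP_{method}_{path}_{port-or-"serverport"}.
    let port: Option<u16> = None;
    let port_name = port
        .map(|p| p.to_string())
        .unwrap_or("serverport".to_string());
    let name = format!("HTTP_{}_{}_{}", "get", path_without_query, port_name);
    assert_eq!(name, "HTTP_get_/healthz_serverport");
}
```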
fn server_to_haproxy_server(server: &BackendServer) -> HAProxyServer {
    HAProxyServer {
        uuid: Uuid::new_v4().to_string(),
        name: format!("{}_{}", &server.address, &server.port),
        enabled: 1,
        address: Some(server.address.clone()),
        port: Some(server.port),
        mode: "active".to_string(),
        server_type: "static".to_string(),
        ..Default::default()
    }
}
#[cfg(test)]
mod tests {
    use opnsense_config_xml::HAProxyServer;

    use super::*;

    #[test]
    fn test_get_servers_for_backend_with_linked_servers() {
        // Create a backend with linked servers
        let mut backend = HAProxyBackend::default();
        backend.linked_servers.content = Some("server1,server2".to_string());

        // Create an HAProxy instance with servers
        let mut haproxy = HAProxy::default();
        let server = HAProxyServer {
            uuid: "server1".to_string(),
            address: Some("192.168.1.1".to_string()),
            port: Some(80),
            ..Default::default()
        };
        haproxy.servers.servers.push(server);

        // Call the function
        let result = get_servers_for_backend(&backend, &haproxy);

        // Check the result
        assert_eq!(
            result,
            vec![BackendServer {
                address: "192.168.1.1".to_string(),
                port: 80,
            }]
        );
    }

    #[test]
    fn test_get_servers_for_backend_no_linked_servers() {
        // Create a backend with no linked servers
        let backend = HAProxyBackend::default();
        // Create an HAProxy instance with servers
        let mut haproxy = HAProxy::default();
        let server = HAProxyServer {
            uuid: "server1".to_string(),
            address: Some("192.168.1.1".to_string()),
            port: Some(80),
            ..Default::default()
        };
        haproxy.servers.servers.push(server);
        // Call the function
        let result = get_servers_for_backend(&backend, &haproxy);
        // Check the result
        assert_eq!(result, vec![]);
    }

    #[test]
    fn test_get_servers_for_backend_no_matching_servers() {
        // Create a backend with linked servers that do not match any in HAProxy
        let mut backend = HAProxyBackend::default();
        backend.linked_servers.content = Some("server4,server5".to_string());
        // Create an HAProxy instance with servers
        let mut haproxy = HAProxy::default();
        let server = HAProxyServer {
            uuid: "server1".to_string(),
            address: Some("192.168.1.1".to_string()),
            port: Some(80),
            ..Default::default()
        };
        haproxy.servers.servers.push(server);
        // Call the function
        let result = get_servers_for_backend(&backend, &haproxy);
        // Check the result
        assert_eq!(result, vec![]);
    }

    #[test]
    fn test_get_servers_for_backend_multiple_linked_servers() {
        // Create a backend with multiple linked servers
        #[allow(clippy::field_reassign_with_default)]
        let mut backend = HAProxyBackend::default();
        backend.linked_servers.content = Some("server1,server2".to_string());

        // Create an HAProxy instance with matching servers
        let mut haproxy = HAProxy::default();
        let server = HAProxyServer {
            uuid: "server1".to_string(),
            address: Some("some-hostname.test.mcd".to_string()),
            port: Some(80),
            ..Default::default()
        };
        haproxy.servers.servers.push(server);

        let server = HAProxyServer {
            uuid: "server2".to_string(),
            address: Some("192.168.1.2".to_string()),
            port: Some(8080),
            ..Default::default()
        };
        haproxy.servers.servers.push(server);

        // Call the function
        let result = get_servers_for_backend(&backend, &haproxy);
        // Check the result
        assert_eq!(
            result,
            vec![
                BackendServer {
                    address: "some-hostname.test.mcd".to_string(),
                    port: 80,
                },
                BackendServer {
                    address: "192.168.1.2".to_string(),
                    port: 8080,
                },
            ]
        );
    }
}
@@ -9,17 +9,14 @@ mod tftp;
use std::sync::Arc;

pub use management::*;
use tokio::sync::RwLock;

use cidr::Ipv4Cidr;

use crate::topology::Router;
use crate::{executors::ExecutorError, topology::LogicalHost};
use harmony_types::net::IpAddress;

#[derive(Debug, Clone)]
pub struct OPNSenseFirewall {
    opnsense_config: Arc<RwLock<opnsense_config::Config>>,
    host: LogicalHost,
}

@@ -28,87 +25,27 @@ impl OPNSenseFirewall {
        self.host.ip
    }

    /// Panics if the opnsense config file cannot be loaded by the underlying opnsense_config
    /// crate.
    pub async fn new(host: LogicalHost, port: Option<u16>, username: &str, password: &str) -> Self {
        Self {
            opnsense_config: Arc::new(RwLock::new(
                opnsense_config::Config::from_credentials(host.ip, port, username, password).await,
            )),
            host,
        }
    }

    pub fn get_opnsense_config(&self) -> Arc<RwLock<opnsense_config::Config>> {
        self.opnsense_config.clone()
    }

    async fn commit_config(&self) -> Result<(), ExecutorError> {
        self.opnsense_config
            .read()
            .await
            .apply()
            .await
            .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))
    }
}

impl Router for OPNSenseFirewall {
    fn get_gateway(&self) -> IpAddress {
        self.host.ip
    }

    fn get_cidr(&self) -> Ipv4Cidr {
        let ipv4 = match self.host.ip {
            IpAddress::V4(ip) => ip,
            IpAddress::V6(_) => panic!("IPv6 not supported for OPNSense router"),
        };
        Ipv4Cidr::new(ipv4, 24).unwrap()
    }

    fn get_host(&self) -> LogicalHost {
        self.host.clone()
    }
}
@@ -9,33 +9,36 @@ use crate::{
#[async_trait]
impl NodeExporter for OPNSenseFirewall {
    async fn ensure_initialized(&self) -> Result<(), ExecutorError> {
        let mut config = self.opnsense_config.write().await;
        let node_exporter = config.node_exporter();
        if let Some(config) = node_exporter.get_full_config() {
            debug!(
                "Node exporter available in opnsense config, assuming it is already installed. {config:?}"
            );
        } else {
            config
                .install_package("os-node_exporter")
                .await
                .map_err(|e| {
                    ExecutorError::UnexpectedError(format!(
                        "Executor failed when trying to install os-node_exporter package with error {e:?}"
                    ))
                })?;
        }

        config
            .node_exporter()
            .enable(true)
            .await
            .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
        Ok(())
    }

    async fn commit_config(&self) -> Result<(), ExecutorError> {
        OPNSenseFirewall::commit_config(self).await
    }

    async fn reload_restart(&self) -> Result<(), ExecutorError> {
        self.opnsense_config
            .write()
            .await
            .node_exporter()
            .reload_restart()
            .await
@@ -12,21 +12,16 @@ impl TftpServer for OPNSenseFirewall {
    async fn serve_files(&self, url: &Url) -> Result<(), ExecutorError> {
        let tftp_root_path = "/usr/local/tftp";

        let config = self.opnsense_config.read().await;
        info!("Uploading files from url {url} to {tftp_root_path}");
        match url {
            Url::LocalFolder(path) => {
                config
                    .upload_files(path, tftp_root_path)
                    .await
                    .map_err(|e| ExecutorError::UnexpectedError(e.to_string()))?;
            }
            Url::Url(url) => todo!("This url is not supported yet {url}"),
        }
        Ok(())
    }
@@ -38,10 +33,11 @@ impl TftpServer for OPNSenseFirewall {
    async fn set_ip(&self, ip: IpAddress) -> Result<(), ExecutorError> {
        info!("Setting listen_ip to {}", &ip);
        self.opnsense_config
            .write()
            .await
            .tftp()
            .listen_ip(&ip.to_string());
        Ok(())
    }

    async fn commit_config(&self) -> Result<(), ExecutorError> {
@@ -50,6 +46,8 @@ impl TftpServer for OPNSenseFirewall {

    async fn reload_restart(&self) -> Result<(), ExecutorError> {
        self.opnsense_config
            .write()
            .await
            .tftp()
            .reload_restart()
            .await
@@ -57,23 +55,22 @@ impl TftpServer for OPNSenseFirewall {
    }

    async fn ensure_initialized(&self) -> Result<(), ExecutorError> {
        let mut config = self.opnsense_config.write().await;
        let tftp = config.tftp();
        if tftp.get_full_config().is_none() {
            info!("Tftp config not available in opnsense config, installing package");
            config.install_package("os-tftp").await.map_err(|e| {
                ExecutorError::UnexpectedError(format!(
                    "Executor failed when trying to install os-tftp package with error {e:?}"
                ))
            })?;
        } else {
            info!("Tftp config available in opnsense config, assuming it is already installed");
        }

        info!("Enabling tftp server");
        config.tftp().enable(true);

        Ok(())
    }
}
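Both `ensure_initialized` implementations above follow the same idempotent shape: probe the plugin config, install the package only when it is absent, then enable. A self-contained sketch of that check-then-install flow — the package names are the real ones from the diff, everything else is illustrative:

```rust
// Returns true when an install was actually performed, false when it was a no-op.
fn ensure_installed(installed: &mut Vec<String>, pkg: &str) -> bool {
    if installed.iter().any(|p| p == pkg) {
        // Config already present: assume the package is installed, do nothing.
        return false;
    }
    // Config missing: "install" the package.
    installed.push(pkg.to_string());
    true
}

fn main() {
    let mut packages = vec!["os-node_exporter".to_string()];

    // Re-running for an installed package is a no-op.
    assert!(!ensure_installed(&mut packages, "os-node_exporter"));
    // A missing package triggers an install.
    assert!(ensure_installed(&mut packages, "os-tftp"));
    assert_eq!(packages.len(), 2);
}
```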
@@ -192,7 +192,7 @@ impl DhcpHostBindingInterpret {
        for entry in dhcp_entries.into_iter() {
            match dhcp_server.add_static_mapping(&entry).await {
                Ok(_) => info!("Successfully registered DHCPStaticEntry {}", entry),
                Err(e) => return Err(InterpretError::from(e)),
            }
        }

@@ -1,181 +0,0 @@
|
||||
use async_trait::async_trait;
|
||||
use k8s_openapi::api::core::v1::{ConfigMap, Pod};
|
||||
use kube::api::ListParams;
|
||||
use log::{debug, info};
|
||||
use serde::Serialize;
|
||||
|
||||
use crate::{
|
||||
data::Version,
|
||||
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
|
||||
inventory::Inventory,
|
||||
score::Score,
|
||||
topology::{K8sclient, Topology},
|
||||
};
|
||||
use harmony_types::id::Id;
|
||||
|
||||
/// A DNS rewrite rule mapping a hostname to a cluster service FQDN.
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
pub struct CoreDNSRewrite {
|
||||
/// The hostname to intercept (e.g., `"sso.harmony.local"`).
|
||||
pub hostname: String,
|
||||
/// The cluster service FQDN to resolve to (e.g., `"zitadel.zitadel.svc.cluster.local"`).
|
||||
pub target: String,
|
||||
}
|
||||
|
||||
/// Score that patches CoreDNS to add `rewrite name` rules.
|
||||
///
|
||||
/// Useful when in-cluster pods need to reach services by their external
|
||||
/// hostnames (e.g., for Zitadel Host header validation, or OpenBao JWT
|
||||
/// auth fetching JWKS from Zitadel).
|
||||
///
|
||||
/// Only applies to K3sFamily and Default distributions. No-op on OpenShift
|
||||
/// (which uses a different DNS operator).
|
||||
///
|
||||
/// Idempotent: existing rules are detected and skipped. CoreDNS pods are
|
||||
/// restarted only when new rules are added.
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
pub struct CoreDNSRewriteScore {
|
||||
pub rewrites: Vec<CoreDNSRewrite>,
|
||||
}
|
||||
|
||||
impl<T: Topology + K8sclient> Score<T> for CoreDNSRewriteScore {
|
||||
fn name(&self) -> String {
|
||||
"CoreDNSRewriteScore".to_string()
|
||||
}
|
||||
|
||||
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
||||
Box::new(CoreDNSRewriteInterpret {
|
||||
rewrites: self.rewrites.clone(),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
struct CoreDNSRewriteInterpret {
|
||||
rewrites: Vec<CoreDNSRewrite>,
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl<T: Topology + K8sclient> Interpret<T> for CoreDNSRewriteInterpret {
|
||||
async fn execute(
|
||||
&self,
|
||||
_inventory: &Inventory,
|
||||
topology: &T,
|
||||
) -> Result<Outcome, InterpretError> {
|
||||
let k8s = topology
|
||||
.k8s_client()
|
||||
.await
|
||||
.map_err(|e| InterpretError::new(format!("Failed to get K8s client: {e}")))?;
|
||||
|
||||
let distro = k8s
|
||||
.get_k8s_distribution()
|
||||
.await
|
||||
.map_err(|e| InterpretError::new(format!("Failed to detect distribution: {e}")))?;
|
||||
|
||||
if !matches!(
|
||||
distro,
|
||||
harmony_k8s::KubernetesDistribution::K3sFamily
|
||||
| harmony_k8s::KubernetesDistribution::Default
|
||||
) {
|
||||
return Ok(Outcome::noop(
|
||||
"Skipping CoreDNS patch (not K3sFamily)".to_string(),
|
||||
));
|
||||
}
|
||||
|
||||
let cm: ConfigMap = k8s
|
||||
            .get_resource::<ConfigMap>("coredns", Some("kube-system"))
            .await
            .map_err(|e| InterpretError::new(format!("Failed to get coredns ConfigMap: {e}")))?
            .ok_or_else(|| {
                InterpretError::new("CoreDNS ConfigMap not found in kube-system".to_string())
            })?;

        let corefile = cm
            .data
            .as_ref()
            .and_then(|d| d.get("Corefile"))
            .ok_or_else(|| InterpretError::new("CoreDNS ConfigMap has no Corefile key".into()))?;

        let mut new_rules = Vec::new();
        for r in &self.rewrites {
            if !corefile.contains(&format!("rewrite name {} {}", r.hostname, r.target)) {
                new_rules.push(format!(" rewrite name {} {}", r.hostname, r.target));
            }
        }

        if new_rules.is_empty() {
            return Ok(Outcome::noop(
                "CoreDNS rewrite rules already present".to_string(),
            ));
        }

        let patched = corefile.replacen(
            ".:53 {\n",
            &format!(".:53 {{\n{}\n", new_rules.join("\n")),
            1,
        );

        debug!("[CoreDNS] Patched Corefile:\n{}", patched);

        // Use apply_dynamic with force_conflicts since the ConfigMap is
        // owned by the cluster deployer (e.g., k3d) and server-side apply
        // would conflict without force.
        let patch_obj: kube::api::DynamicObject = serde_json::from_value(serde_json::json!({
            "apiVersion": "v1",
            "kind": "ConfigMap",
            "metadata": { "name": "coredns", "namespace": "kube-system" },
            "data": { "Corefile": patched }
        }))
        .map_err(|e| InterpretError::new(format!("Failed to build patch: {e}")))?;

        k8s.apply_dynamic(&patch_obj, Some("kube-system"), true)
            .await
            .map_err(|e| InterpretError::new(format!("Failed to apply CoreDNS patch: {e}")))?;

        // Restart CoreDNS pods to pick up the new config
        let pods = k8s
            .list_resources::<Pod>(
                Some("kube-system"),
                Some(ListParams::default().labels("k8s-app=kube-dns")),
            )
            .await
            .map_err(|e| InterpretError::new(format!("Failed to list CoreDNS pods: {e}")))?;

        for pod in pods.items {
            if let Some(name) = &pod.metadata.name {
                let _ = k8s.delete_resource::<Pod>(name, Some("kube-system")).await;
            }
        }

        // Brief pause for pods to restart
        tokio::time::sleep(tokio::time::Duration::from_secs(3)).await;

        info!("[CoreDNS] Patched with {} rewrite rule(s)", new_rules.len());

        Ok(Outcome {
            status: InterpretStatus::SUCCESS,
            message: format!("{} CoreDNS rewrite rule(s) applied", new_rules.len()),
            details: self
                .rewrites
                .iter()
                .map(|r| format!("{} -> {}", r.hostname, r.target))
                .collect(),
        })
    }

    fn get_name(&self) -> InterpretName {
        InterpretName::Custom("CoreDNSRewrite")
    }

    fn get_version(&self) -> Version {
        todo!()
    }

    fn get_status(&self) -> InterpretStatus {
        todo!()
    }

    fn get_children(&self) -> Vec<Id> {
        vec![]
    }
}
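The splice-and-skip logic above (append only rules not already in the Corefile, insert them right after the `.:53 {` opener, no-op when nothing changed) can be exercised in isolation. A minimal sketch; the `patch_corefile` helper is hypothetical and not part of the codebase:

```rust
/// Splice rewrite rules into a Corefile right after the ".:53 {" opener,
/// skipping rules that are already present (hypothetical helper).
fn patch_corefile(corefile: &str, rewrites: &[(&str, &str)]) -> Option<String> {
    let new_rules: Vec<String> = rewrites
        .iter()
        .filter(|(host, target)| !corefile.contains(&format!("rewrite name {host} {target}")))
        .map(|(host, target)| format!("    rewrite name {host} {target}"))
        .collect();
    if new_rules.is_empty() {
        return None; // nothing to do: same condition as the Outcome::noop branch
    }
    Some(corefile.replacen(
        ".:53 {\n",
        &format!(".:53 {{\n{}\n", new_rules.join("\n")),
        1,
    ))
}

fn main() {
    let corefile = ".:53 {\n    forward . /etc/resolv.conf\n}\n";
    let patched = patch_corefile(corefile, &[("api.local", "ingress.local")]).unwrap();
    assert!(patched.contains("rewrite name api.local ingress.local"));
    // A second application is a no-op, mirroring the idempotency check above.
    assert!(patch_corefile(&patched, &[("api.local", "ingress.local")]).is_none());
    println!("{patched}");
}
```

Doing the containment check before building the patch is what makes re-running the interpret safe: the ConfigMap is only rewritten when at least one rule is missing.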
@@ -1,5 +1,4 @@
pub mod apps;
pub mod coredns;
pub mod deployment;
mod failover;
pub mod ingress;
@@ -1,54 +0,0 @@
use log::{debug, info};

use crate::domain::config::HARMONY_DATA_DIR;

use super::error::KvmError;
use super::executor::KvmExecutor;

const DEFAULT_IMAGE_DIR: &str = "/var/lib/libvirt/images";

/// Creates a [`KvmExecutor`] from environment variables.
///
/// | Variable                | Description                                                          |
/// |-------------------------|----------------------------------------------------------------------|
/// | `HARMONY_KVM_URI`       | Full libvirt URI. Defaults to `qemu:///system`.                      |
/// | `HARMONY_KVM_IMAGE_DIR` | Directory for VM disk images. Defaults to `/var/lib/libvirt/images`. |
///
/// For backwards compatibility, `HARMONY_KVM_CONNECTION` is also accepted as
/// an alias for `HARMONY_KVM_URI`.
pub fn init_executor() -> Result<KvmExecutor, KvmError> {
    let uri = std::env::var("HARMONY_KVM_URI")
        .or_else(|_| std::env::var("HARMONY_KVM_CONNECTION"))
        .unwrap_or_else(|_| "qemu:///system".to_string());

    let image_dir = std::env::var("HARMONY_KVM_IMAGE_DIR").unwrap_or_else(|_| {
        // Fall back to the harmony data dir if available, else the system default.
        let data_dir = HARMONY_DATA_DIR.join("kvm").join("images");
        let path = data_dir.to_string_lossy().to_string();
        debug!("HARMONY_KVM_IMAGE_DIR not set; using {path}");
        path
    });

    if uri.starts_with("qemu+ssh://") {
        validate_ssh_uri(&uri)?;
    }

    info!("KVM executor initialised: uri={uri}, image_dir={image_dir}");
    Ok(KvmExecutor::new(uri, image_dir))
}

/// Validates that an SSH URI looks structurally correct and returns an error
/// with a helpful message when it does not.
fn validate_ssh_uri(uri: &str) -> Result<(), KvmError> {
    // Expected form: qemu+ssh://user@host/system
    let without_scheme = uri
        .strip_prefix("qemu+ssh://")
        .ok_or_else(|| KvmError::InvalidUri(uri.to_string()))?;

    if !without_scheme.contains('@') || !without_scheme.contains('/') {
        return Err(KvmError::InvalidUri(format!(
            "expected qemu+ssh://user@host/system, got: {uri}"
        )));
    }
    Ok(())
}
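The `or_else` / `unwrap_or_else` chain above resolves the URI from a primary variable, a legacy alias, then a default. A self-contained sketch of the same chain, with a `lookup` closure standing in for `std::env::var` so the example does not touch real environment variables:

```rust
use std::env::VarError;

/// Resolve a URI from a primary key, then a legacy alias, then a default.
/// Same or_else / unwrap_or_else shape as init_executor; `lookup` is a
/// stand-in for std::env::var.
fn resolve_uri(lookup: impl Fn(&str) -> Result<String, VarError>) -> String {
    lookup("HARMONY_KVM_URI")
        .or_else(|_| lookup("HARMONY_KVM_CONNECTION"))
        .unwrap_or_else(|_| "qemu:///system".to_string())
}

fn main() {
    // Nothing set: falls through to the default.
    assert_eq!(resolve_uri(|_| Err(VarError::NotPresent)), "qemu:///system");

    // Legacy alias only: picked up by or_else.
    let legacy = |k: &str| match k {
        "HARMONY_KVM_CONNECTION" => Ok("qemu+ssh://admin@host/system".to_string()),
        _ => Err(VarError::NotPresent),
    };
    assert_eq!(resolve_uri(legacy), "qemu+ssh://admin@host/system");

    // Primary wins over the alias because or_else never runs on Ok.
    let both = |k: &str| match k {
        "HARMONY_KVM_URI" => Ok("qemu:///session".to_string()),
        _ => Ok("qemu+ssh://admin@host/system".to_string()),
    };
    assert_eq!(resolve_uri(both), "qemu:///session");
}
```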
@@ -1,42 +0,0 @@
use std::io::Error as IoError;
use thiserror::Error;

#[derive(Error, Debug)]
pub enum KvmError {
    #[error("connection failed to '{uri}': {source}")]
    ConnectionFailed {
        uri: String,
        #[source]
        source: virt::error::Error,
    },

    #[error("invalid connection URI: {0}")]
    InvalidUri(String),

    #[error("VM '{name}' already exists")]
    VmAlreadyExists { name: String },

    #[error("VM '{name}' not found")]
    VmNotFound { name: String },

    #[error("network '{name}' already exists")]
    NetworkAlreadyExists { name: String },

    #[error("network '{name}' not found")]
    NetworkNotFound { name: String },

    #[error("storage pool '{name}' not found")]
    StoragePoolNotFound { name: String },

    #[error("ISO download failed: {0}")]
    IsoDownload(String),

    #[error("command failed: {0}")]
    CommandFailed(String),

    #[error("libvirt error: {0}")]
    Libvirt(#[from] virt::error::Error),

    #[error("IO error: {0}")]
    Io(#[from] IoError),
}
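The derive above turns each `#[error(...)]` string into a `Display` impl and each `#[from]` attribute into a `From` impl, which is what lets `?` convert `std::io::Error` into `KvmError`. Roughly what that expands to for two of the variants, hand-written with std only (a sketch, not the actual macro output):

```rust
use std::fmt;

// Hand-written approximation of what #[derive(thiserror::Error)] generates
// for two of the KvmError variants (sketch; std only).
#[derive(Debug)]
enum KvmError {
    InvalidUri(String),
    Io(std::io::Error),
}

impl fmt::Display for KvmError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KvmError::InvalidUri(uri) => write!(f, "invalid connection URI: {uri}"),
            KvmError::Io(e) => write!(f, "IO error: {e}"),
        }
    }
}

// `#[from]` generates this; it is the impl the `?` operator relies on.
impl From<std::io::Error> for KvmError {
    fn from(e: std::io::Error) -> Self {
        KvmError::Io(e)
    }
}

fn main() {
    let err = KvmError::InvalidUri("bogus".to_string());
    assert_eq!(err.to_string(), "invalid connection URI: bogus");

    let io: KvmError = std::io::Error::new(std::io::ErrorKind::TimedOut, "late").into();
    assert!(io.to_string().starts_with("IO error:"));
    println!("{err}");
}
```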
@@ -1,515 +0,0 @@
use log::{debug, info, warn};
use std::net::IpAddr;
use virt::connect::Connect;
use virt::domain::Domain;
use virt::network::Network;
use virt::storage_pool::StoragePool;
use virt::storage_vol::StorageVol;
use virt::sys;

use super::error::KvmError;
use super::types::{CdromConfig, NetworkConfig, VmConfig, VmInterface, VmStatus};
use super::xml;

/// A handle to a libvirt hypervisor.
///
/// Wraps a [`virt::connect::Connect`] and provides high-level operations for
/// virtual machines, networks, and storage volumes. All methods that call
/// libvirt are dispatched to a blocking thread via
/// [`tokio::task::spawn_blocking`] to avoid blocking the async executor.
#[derive(Clone)]
pub struct KvmExecutor {
    /// Libvirt connection URI (e.g. `qemu:///system`).
    uri: String,
    /// Path used as the base image directory for new VM disks.
    image_dir: String,
}

impl KvmExecutor {
    /// Creates an executor that will open a libvirt connection on each
    /// blocking call. Connection is not held across calls to keep `Clone`
    /// and `Send` simple.
    pub fn new(uri: impl Into<String>, image_dir: impl Into<String>) -> Self {
        Self {
            uri: uri.into(),
            image_dir: image_dir.into(),
        }
    }

    fn open_connection(&self) -> Result<Connect, KvmError> {
        let uri = self.uri.clone();
        Connect::open(Some(&uri)).map_err(|e| KvmError::ConnectionFailed {
            uri: uri.clone(),
            source: e,
        })
    }

    // -------------------------------------------------------------------------
    // Networks
    // -------------------------------------------------------------------------
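Every public method below follows the same clone-and-move dispatch: clone the cheap `KvmExecutor`, detach borrowed arguments with `to_string`, and run the libvirt call on a blocking thread. The shape can be sketched with `std::thread::spawn` standing in for `tokio::task::spawn_blocking` (the `Executor` type and its stubbed lookup are stand-ins, not the real implementation):

```rust
use std::thread;

// Clone-and-move dispatch, as used by KvmExecutor, with std::thread::spawn
// standing in for tokio::task::spawn_blocking.
#[derive(Clone)]
struct Executor {
    uri: String,
}

impl Executor {
    fn vm_exists(&self, name: &str) -> bool {
        let executor = self.clone(); // cheap: only owned Strings
        let name = name.to_string(); // detach from the caller's borrow
        thread::spawn(move || executor.vm_exists_blocking(&name))
            .join()
            .expect("blocking task panicked")
    }

    fn vm_exists_blocking(&self, name: &str) -> bool {
        // Stand-in for the libvirt lookup; the real code opens a connection here.
        !self.uri.is_empty() && name == "known-vm"
    }
}

fn main() {
    let exec = Executor { uri: "qemu:///system".to_string() };
    assert!(exec.vm_exists("known-vm"));
    assert!(!exec.vm_exists("missing-vm"));
}
```

Cloning per call is what lets the `move` closure be `'static` and `Send` without holding a libvirt connection (which is neither) inside the struct.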
    /// Ensures the given network exists and is active.
    ///
    /// If the network already exists, it is started if not already active.
    /// If it does not exist, it is defined and started.
    pub async fn ensure_network(&self, cfg: NetworkConfig) -> Result<(), KvmError> {
        let executor = self.clone();
        tokio::task::spawn_blocking(move || executor.ensure_network_blocking(&cfg))
            .await
            .expect("blocking task panicked")
    }

    fn ensure_network_blocking(&self, cfg: &NetworkConfig) -> Result<(), KvmError> {
        let conn = self.open_connection()?;
        match Network::lookup_by_name(&conn, &cfg.name) {
            Ok(net) => {
                if !net.is_active()? {
                    info!("Network '{}' exists but is inactive; starting it", cfg.name);
                    net.create()?;
                } else {
                    debug!("Network '{}' already active", cfg.name);
                }
                if !net.get_autostart()? {
                    net.set_autostart(true)?;
                }
            }
            Err(_) => {
                info!("Defining network '{}'", cfg.name);
                let xml = xml::network_xml(cfg);
                debug!("Network XML:\n{xml}");
                let net = Network::define_xml(&conn, &xml)?;
                net.create()?;
                net.set_autostart(true)?;
                info!("Network '{}' created and active", cfg.name);
            }
        }
        Ok(())
    }

    /// Stops and removes a network. No-ops if the network does not exist.
    pub async fn delete_network(&self, name: &str) -> Result<(), KvmError> {
        let executor = self.clone();
        let name = name.to_string();
        tokio::task::spawn_blocking(move || executor.delete_network_blocking(&name))
            .await
            .expect("blocking task panicked")
    }

    fn delete_network_blocking(&self, name: &str) -> Result<(), KvmError> {
        let conn = self.open_connection()?;
        match Network::lookup_by_name(&conn, name) {
            Ok(net) => {
                if net.is_active()? {
                    info!("Destroying network '{name}'");
                    net.destroy()?;
                }
                net.undefine()?;
                info!("Network '{name}' removed");
                Ok(())
            }
            Err(_) => {
                warn!("delete_network: network '{name}' not found, skipping");
                Ok(())
            }
        }
    }
    // -------------------------------------------------------------------------
    // Domains (VMs)
    // -------------------------------------------------------------------------

    /// Returns `true` if a domain with `name` is known to libvirt.
    pub async fn vm_exists(&self, name: &str) -> Result<bool, KvmError> {
        let executor = self.clone();
        let name = name.to_string();
        tokio::task::spawn_blocking(move || executor.vm_exists_blocking(&name))
            .await
            .expect("blocking task panicked")
    }

    fn vm_exists_blocking(&self, name: &str) -> Result<bool, KvmError> {
        let conn = self.open_connection()?;
        match Domain::lookup_by_name(&conn, name) {
            Ok(_) => Ok(true),
            Err(_) => Ok(false),
        }
    }

    /// Defines a VM from `config`, creating storage volumes as needed.
    ///
    /// Fails if the VM already exists. Use [`KvmExecutor::ensure_vm`] for
    /// idempotent behaviour.
    pub async fn define_vm(&self, config: VmConfig) -> Result<(), KvmError> {
        let executor = self.clone();
        tokio::task::spawn_blocking(move || executor.define_vm_blocking(&config))
            .await
            .expect("blocking task panicked")
    }

    fn define_vm_blocking(&self, config: &VmConfig) -> Result<(), KvmError> {
        let conn = self.open_connection()?;

        if Domain::lookup_by_name(&conn, &config.name).is_ok() {
            return Err(KvmError::VmAlreadyExists {
                name: config.name.clone(),
            });
        }

        self.create_volumes_blocking(&conn, config)?;

        let xml = xml::domain_xml(config, &self.image_dir);
        debug!("Defining domain '{}' with XML:\n{xml}", config.name);
        Domain::define_xml(&conn, &xml)?;
        info!("VM '{}' defined", config.name);
        Ok(())
    }
    /// Idempotent: defines the VM if it does not already exist.
    pub async fn ensure_vm(&self, config: VmConfig) -> Result<(), KvmError> {
        let executor = self.clone();
        tokio::task::spawn_blocking(move || executor.ensure_vm_blocking(&config))
            .await
            .expect("blocking task panicked")
    }

    fn ensure_vm_blocking(&self, config: &VmConfig) -> Result<(), KvmError> {
        let conn = self.open_connection()?;
        if Domain::lookup_by_name(&conn, &config.name).is_ok() {
            debug!("VM '{}' already defined, skipping", config.name);
            return Ok(());
        }
        self.create_volumes_blocking(&conn, config)?;
        let xml = xml::domain_xml(config, &self.image_dir);
        debug!("Defining domain '{}' with XML:\n{xml}", config.name);
        Domain::define_xml(&conn, &xml)?;
        info!("VM '{}' defined", config.name);
        Ok(())
    }

    /// Starts a defined VM.
    pub async fn start_vm(&self, name: &str) -> Result<(), KvmError> {
        let executor = self.clone();
        let name = name.to_string();
        tokio::task::spawn_blocking(move || executor.start_vm_blocking(&name))
            .await
            .expect("blocking task panicked")
    }

    fn start_vm_blocking(&self, name: &str) -> Result<(), KvmError> {
        let conn = self.open_connection()?;
        let dom = Domain::lookup_by_name(&conn, name).map_err(|_| KvmError::VmNotFound {
            name: name.to_string(),
        })?;
        let (state, _) = dom.get_state()?;
        if state == sys::VIR_DOMAIN_RUNNING || state == sys::VIR_DOMAIN_BLOCKED {
            debug!("VM '{name}' is already running, skipping start");
            return Ok(());
        }
        dom.create()?;
        info!("VM '{name}' started");
        Ok(())
    }

    /// Gracefully shuts down a VM.
    pub async fn shutdown_vm(&self, name: &str) -> Result<(), KvmError> {
        let executor = self.clone();
        let name = name.to_string();
        tokio::task::spawn_blocking(move || executor.shutdown_vm_blocking(&name))
            .await
            .expect("blocking task panicked")
    }

    fn shutdown_vm_blocking(&self, name: &str) -> Result<(), KvmError> {
        let conn = self.open_connection()?;
        let dom = Domain::lookup_by_name(&conn, name).map_err(|_| KvmError::VmNotFound {
            name: name.to_string(),
        })?;
        dom.shutdown()?;
        info!("VM '{name}' shutdown requested");
        Ok(())
    }

    /// Forcibly powers off a VM without a graceful shutdown.
    pub async fn destroy_vm(&self, name: &str) -> Result<(), KvmError> {
        let executor = self.clone();
        let name = name.to_string();
        tokio::task::spawn_blocking(move || executor.destroy_vm_blocking(&name))
            .await
            .expect("blocking task panicked")
    }

    fn destroy_vm_blocking(&self, name: &str) -> Result<(), KvmError> {
        let conn = self.open_connection()?;
        let dom = Domain::lookup_by_name(&conn, name).map_err(|_| KvmError::VmNotFound {
            name: name.to_string(),
        })?;
        dom.destroy()?;
        info!("VM '{name}' forcibly destroyed");
        Ok(())
    }
    /// Undefines (removes) a VM. The VM must not be running.
    pub async fn undefine_vm(&self, name: &str) -> Result<(), KvmError> {
        let executor = self.clone();
        let name = name.to_string();
        tokio::task::spawn_blocking(move || executor.undefine_vm_blocking(&name))
            .await
            .expect("blocking task panicked")
    }

    fn undefine_vm_blocking(&self, name: &str) -> Result<(), KvmError> {
        let conn = self.open_connection()?;
        match Domain::lookup_by_name(&conn, name) {
            Ok(dom) => {
                dom.undefine()?;
                info!("VM '{name}' undefined");
                Ok(())
            }
            Err(_) => {
                warn!("undefine_vm: VM '{name}' not found, skipping");
                Ok(())
            }
        }
    }

    /// Returns the current [`VmStatus`] of a VM.
    pub async fn vm_status(&self, name: &str) -> Result<VmStatus, KvmError> {
        let executor = self.clone();
        let name = name.to_string();
        tokio::task::spawn_blocking(move || executor.vm_status_blocking(&name))
            .await
            .expect("blocking task panicked")
    }

    fn vm_status_blocking(&self, name: &str) -> Result<VmStatus, KvmError> {
        let conn = self.open_connection()?;
        let dom = Domain::lookup_by_name(&conn, name).map_err(|_| KvmError::VmNotFound {
            name: name.to_string(),
        })?;
        let (state, _reason) = dom.get_state()?;
        let status = match state {
            sys::VIR_DOMAIN_RUNNING | sys::VIR_DOMAIN_BLOCKED => VmStatus::Running,
            sys::VIR_DOMAIN_PAUSED => VmStatus::Paused,
            sys::VIR_DOMAIN_SHUTDOWN | sys::VIR_DOMAIN_SHUTOFF => VmStatus::Shutoff,
            sys::VIR_DOMAIN_CRASHED => VmStatus::Crashed,
            sys::VIR_DOMAIN_PMSUSPENDED => VmStatus::PMSuspended,
            _ => VmStatus::Other,
        };
        Ok(status)
    }

    /// Returns the first IPv4 address of a running VM, or `None` if no
    /// address has been assigned yet.
    ///
    /// Uses the libvirt lease/agent source to discover the IP. This requires
    /// the VM to have obtained an address via DHCP from the libvirt network.
    pub async fn vm_ip(&self, name: &str) -> Result<Option<IpAddr>, KvmError> {
        let executor = self.clone();
        let name = name.to_string();
        tokio::task::spawn_blocking(move || executor.vm_ip_blocking(&name))
            .await
            .expect("blocking task panicked")
    }

    fn vm_ip_blocking(&self, name: &str) -> Result<Option<IpAddr>, KvmError> {
        let conn = self.open_connection()?;
        let dom = Domain::lookup_by_name(&conn, name).map_err(|_| KvmError::VmNotFound {
            name: name.to_string(),
        })?;

        // Try lease-based source first (works with libvirt's built-in DHCP)
        let interfaces = dom
            .interface_addresses(sys::VIR_DOMAIN_INTERFACE_ADDRESSES_SRC_LEASE, 0)
            .unwrap_or_default();

        for iface in &interfaces {
            for addr in &iface.addrs {
                // typed == 0 means IPv4 (AF_INET)
                if addr.typed == 0 {
                    if let Ok(ip) = addr.addr.parse::<IpAddr>() {
                        return Ok(Some(ip));
                    }
                }
            }
        }

        Ok(None)
    }

    /// Polls until a VM has an IP address, with a timeout.
    ///
    /// Returns the IP once available, or an error if the timeout is reached.
    pub async fn wait_for_ip(
        &self,
        name: &str,
        timeout: std::time::Duration,
    ) -> Result<IpAddr, KvmError> {
        let deadline = tokio::time::Instant::now() + timeout;
        loop {
            if let Some(ip) = self.vm_ip(name).await? {
                info!("VM '{name}' has IP: {ip}");
                return Ok(ip);
            }
            if tokio::time::Instant::now() > deadline {
                return Err(KvmError::Io(std::io::Error::new(
                    std::io::ErrorKind::TimedOut,
                    format!("VM '{name}' did not obtain an IP within {timeout:?}"),
                )));
            }
            tokio::time::sleep(std::time::Duration::from_secs(3)).await;
        }
    }
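The deadline loop in `wait_for_ip` (probe, check the deadline, sleep, repeat) is a general poll-until-timeout pattern. A synchronous std-only sketch; `poll_until` is a hypothetical helper, not part of the codebase:

```rust
use std::time::{Duration, Instant};

/// Poll `probe` until it yields a value or `timeout` elapses — the same
/// deadline loop wait_for_ip uses, sketched without async.
fn poll_until<T>(
    mut probe: impl FnMut() -> Option<T>,
    timeout: Duration,
    interval: Duration,
) -> Result<T, String> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(v) = probe() {
            return Ok(v);
        }
        if Instant::now() > deadline {
            return Err(format!("no result within {timeout:?}"));
        }
        std::thread::sleep(interval);
    }
}

fn main() {
    // Succeeds on the third attempt, well before the deadline.
    let mut attempts = 0;
    let ip = poll_until(
        || {
            attempts += 1;
            (attempts >= 3).then(|| "10.50.0.2".to_string())
        },
        Duration::from_millis(200),
        Duration::from_millis(10),
    );
    assert_eq!(ip.unwrap(), "10.50.0.2");

    // Never succeeds: the deadline error path fires.
    let timed_out = poll_until(|| None::<()>, Duration::from_millis(30), Duration::from_millis(10));
    assert!(timed_out.is_err());
}
```

Checking the deadline after a failed probe (rather than at the top of the loop) guarantees at least one probe even with a zero timeout, which matches the structure of `wait_for_ip`.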
    // -------------------------------------------------------------------------
    // NIC link control
    // -------------------------------------------------------------------------

    /// Sets the link state of a VM's network interface.
    ///
    /// Brings a NIC up or down by MAC address. Useful for preventing IP
    /// conflicts when multiple VMs boot with the same default IP: disable
    /// all NICs, then enable one at a time for sequential bootstrapping.
    ///
    /// Uses `virsh domif-setlink` under the hood.
    pub async fn set_interface_link(
        &self,
        vm_name: &str,
        mac: &str,
        up: bool,
    ) -> Result<(), KvmError> {
        let state = if up { "up" } else { "down" };
        info!("Setting {vm_name} interface {mac} link {state}");

        let output = tokio::process::Command::new("virsh")
            .args(["-c", &self.uri, "domif-setlink", vm_name, mac, state])
            .output()
            .await?;

        if !output.status.success() {
            let stderr = String::from_utf8_lossy(&output.stderr);
            return Err(KvmError::CommandFailed(format!(
                "domif-setlink failed: {}",
                stderr.trim()
            )));
        }

        Ok(())
    }

    /// Lists all network interfaces of a VM with their MAC addresses.
    ///
    /// Returns one [`VmInterface`] (type, source, model, MAC) per NIC.
    pub async fn list_interfaces(&self, vm_name: &str) -> Result<Vec<VmInterface>, KvmError> {
        let output = tokio::process::Command::new("virsh")
            .args(["-c", &self.uri, "domiflist", vm_name])
            .output()
            .await?;

        if !output.status.success() {
            let stderr = String::from_utf8_lossy(&output.stderr);
            return Err(KvmError::CommandFailed(format!(
                "domiflist failed: {}",
                stderr.trim()
            )));
        }

        let stdout = String::from_utf8_lossy(&output.stdout);
        let mut interfaces = Vec::new();

        for line in stdout.lines().skip(2) {
            // virsh domiflist columns: Interface, Type, Source, Model, MAC
            let parts: Vec<&str> = line.split_whitespace().collect();
            if parts.len() >= 5 {
                interfaces.push(VmInterface {
                    interface_type: parts[1].to_string(),
                    source: parts[2].to_string(),
                    model: parts[3].to_string(),
                    mac: parts[4].to_string(),
                });
            }
        }

        Ok(interfaces)
    }
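The `domiflist` parsing above (skip the two header lines, split on whitespace, read five columns) can be tested against canned output. A sketch with a local copy of the parsing logic and an illustrative sample; real `virsh` output varies in column widths, which whitespace splitting tolerates:

```rust
#[derive(Debug, PartialEq)]
struct VmInterface {
    interface_type: String,
    source: String,
    model: String,
    mac: String,
}

/// Parse `virsh domiflist` output: skip the two header lines, then read the
/// Interface/Type/Source/Model/MAC columns (same logic as list_interfaces).
fn parse_domiflist(stdout: &str) -> Vec<VmInterface> {
    stdout
        .lines()
        .skip(2)
        .filter_map(|line| {
            let parts: Vec<&str> = line.split_whitespace().collect();
            (parts.len() >= 5).then(|| VmInterface {
                interface_type: parts[1].to_string(),
                source: parts[2].to_string(),
                model: parts[3].to_string(),
                mac: parts[4].to_string(),
            })
        })
        .collect()
}

fn main() {
    // Illustrative sample of virsh domiflist output.
    let sample = "\
 Interface   Type      Source       Model    MAC
-------------------------------------------------------
 vnet0       network   harmonylan   virtio   52:54:00:00:50:01
";
    let ifaces = parse_domiflist(sample);
    assert_eq!(ifaces.len(), 1);
    assert_eq!(ifaces[0].source, "harmonylan");
    assert_eq!(ifaces[0].mac, "52:54:00:00:50:01");
}
```

The `parts.len() >= 5` guard also silently drops the blank trailer line virsh prints, so no separate empty-line check is needed.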
    // -------------------------------------------------------------------------
    // Storage
    // -------------------------------------------------------------------------

    fn create_volumes_blocking(&self, conn: &Connect, config: &VmConfig) -> Result<(), KvmError> {
        for disk in &config.disks {
            // Skip volume creation for disks with an existing source path
            if disk.source_path.is_some() {
                debug!(
                    "Disk '{}' uses existing source, skipping volume creation",
                    disk.device
                );
                continue;
            }
            let pool = StoragePool::lookup_by_name(conn, &disk.pool).map_err(|_| {
                KvmError::StoragePoolNotFound {
                    name: disk.pool.clone(),
                }
            })?;

            let vol_name = format!("{}-{}", config.name, disk.device);
            match StorageVol::lookup_by_name(&pool, &format!("{vol_name}.qcow2")) {
                Ok(_) => {
                    debug!(
                        "Volume '{vol_name}.qcow2' already exists in pool '{}'",
                        disk.pool
                    );
                }
                Err(_) => {
                    info!(
                        "Creating volume '{vol_name}.qcow2' ({} GiB) in pool '{}'",
                        disk.size_gb, disk.pool
                    );
                    let xml = xml::volume_xml(&vol_name, disk.size_gb);
                    StorageVol::create_xml(&pool, &xml, 0)?;
                }
            }
        }

        for cdrom in &config.cdroms {
            self.prepare_iso_blocking(cdrom)?;
        }

        Ok(())
    }

    fn prepare_iso_blocking(&self, cdrom: &CdromConfig) -> Result<(), KvmError> {
        let source = &cdrom.source;

        if source.starts_with("http://") || source.starts_with("https://") {
            let file_name = source.split('/').last().unwrap_or("downloaded.iso");
            let target_path = format!("{}/{}", self.image_dir, file_name);

            if std::path::Path::new(&target_path).exists() {
                info!("ISO '{}' already downloaded, skipping", file_name);
                return Ok(());
            }

            info!("Downloading ISO '{}' to '{}'", file_name, target_path);
            self.download_iso_blocking(source, &target_path)?;
            info!("ISO '{}' downloaded successfully", file_name);
        }

        Ok(())
    }

    fn download_iso_blocking(&self, url: &str, target_path: &str) -> Result<(), KvmError> {
        let response =
            reqwest::blocking::get(url).map_err(|e| KvmError::IsoDownload(e.to_string()))?;

        let mut file = std::fs::File::create(target_path)?;

        let content = response
            .bytes()
            .map_err(|e| KvmError::IsoDownload(e.to_string()))?;

        std::io::copy(&mut content.as_ref(), &mut file)?;

        Ok(())
    }
}
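`prepare_iso_blocking` caches by file name: derive the target from the last URL segment and skip the download when it already exists. A sketch of that decision with the network fetch stubbed out; `prepare_iso` and its `fetch` parameter are hypothetical, introduced only to make the example self-contained:

```rust
use std::fs;
use std::path::Path;

/// Cache-by-filename: derive the target from the URL's last segment and
/// skip work when it already exists (mirrors prepare_iso_blocking; the
/// network fetch is stubbed via the `fetch` closure).
fn prepare_iso(
    source: &str,
    image_dir: &str,
    fetch: impl Fn(&str) -> Vec<u8>,
) -> std::io::Result<bool> {
    let file_name = source.rsplit('/').next().unwrap_or("downloaded.iso");
    let target = format!("{image_dir}/{file_name}");
    if Path::new(&target).exists() {
        return Ok(false); // already downloaded, skipped
    }
    fs::write(&target, fetch(source))?;
    Ok(true)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("iso-cache-demo");
    fs::create_dir_all(&dir)?;
    let dir = dir.to_string_lossy().to_string();
    let url = "https://example.invalid/images/boot.iso";
    let fetch = |_: &str| b"fake iso bytes".to_vec();

    // Remove any leftover from earlier runs so the first call downloads.
    let _ = fs::remove_file(format!("{dir}/boot.iso"));
    assert!(prepare_iso(url, &dir, fetch)?); // first call writes the file
    assert!(!prepare_iso(url, &dir, fetch)?); // second call hits the cache
    Ok(())
}
```

One caveat the real code shares: caching purely by file name means two URLs ending in the same segment collide on the same target path.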
@@ -1,13 +0,0 @@
mod xml;

pub mod config;
pub mod error;
pub mod executor;
pub mod types;

pub use error::KvmError;
pub use executor::KvmExecutor;
pub use types::{
    BootDevice, CdromConfig, DhcpHost, DiskConfig, ForwardMode, NetworkConfig,
    NetworkConfigBuilder, NetworkRef, VmConfig, VmConfigBuilder, VmInterface, VmStatus,
};
@@ -1,395 +0,0 @@
use serde::{Deserialize, Serialize};

/// Information about a VM's network interface, as reported by `virsh domiflist`.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct VmInterface {
    /// Interface type (e.g. "network", "bridge")
    pub interface_type: String,
    /// Source network or bridge name
    pub source: String,
    /// Device model (e.g. "virtio")
    pub model: String,
    /// MAC address
    pub mac: String,
}

/// Specifies how a KVM host is accessed.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum KvmConnectionUri {
    /// Local hypervisor via UNIX socket. Equivalent to `qemu:///system`.
    Local,
    /// Remote hypervisor over SSH. Equivalent to `qemu+ssh://user@host/system`.
    RemoteSsh { host: String, username: String },
}

impl KvmConnectionUri {
    /// Returns the libvirt URI string for this connection.
    pub fn as_uri(&self) -> String {
        match self {
            KvmConnectionUri::Local => "qemu:///system".to_string(),
            KvmConnectionUri::RemoteSsh { host, username } => {
                format!("qemu+ssh://{username}@{host}/system")
            }
        }
    }
}

/// Configuration for a virtual disk attached to a VM.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DiskConfig {
    /// Disk size in gigabytes. Ignored when `source_path` is set.
    pub size_gb: u32,
    /// Target device name in the guest (e.g. `vda`, `vdb`).
    pub device: String,
    /// Storage pool to allocate the volume from. Defaults to `"default"`.
    pub pool: String,
    /// When set, use this existing disk image instead of creating a new volume.
    pub source_path: Option<String>,
}

/// Configuration for a CD-ROM/ISO device attached to a VM.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CdromConfig {
    /// Path or URL to the ISO image. If it starts with `http` or `https`, it will be downloaded.
    pub source: String,
    /// Target device name in the guest (e.g. `hda`, `hdb`). Defaults to `hda`.
    pub device: String,
}

impl DiskConfig {
    /// Creates a new disk config with sequential virtio device naming.
    ///
    /// `index` maps 0 → `vda`, 1 → `vdb`, etc.
    pub fn new(size_gb: u32, index: u8) -> Self {
        let device = format!("vd{}", (b'a' + index) as char);
        Self {
            size_gb,
            device,
            pool: "default".to_string(),
            source_path: None,
        }
    }

    /// Use an existing disk image file instead of creating a new volume.
    pub fn from_path(path: impl Into<String>, index: u8) -> Self {
        let device = format!("vd{}", (b'a' + index) as char);
        Self {
            size_gb: 0,
            device,
            pool: String::new(),
            source_path: Some(path.into()),
        }
    }

    /// Override the storage pool.
    pub fn from_pool(mut self, pool: impl Into<String>) -> Self {
        self.pool = pool.into();
        self
    }
}

/// A reference to a libvirt virtual network by name.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NetworkRef {
    /// Libvirt network name (e.g. `"harmonylan"`).
    pub name: String,
    /// Optional fixed MAC address for this interface. When `None`, libvirt
    /// assigns one automatically.
    pub mac: Option<String>,
}

impl NetworkRef {
    pub fn named(name: impl Into<String>) -> Self {
        Self {
            name: name.into(),
            mac: None,
        }
    }

    pub fn with_mac(mut self, mac: impl Into<String>) -> Self {
        self.mac = Some(mac.into());
        self
    }
}

/// Boot device priority entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum BootDevice {
    /// Boot from first hard disk (vda)
    Disk,
    /// Boot from network (PXE)
    Network,
    /// Boot from CD-ROM/ISO
    Cdrom,
}

impl BootDevice {
    pub(crate) fn as_xml_dev(&self) -> &'static str {
        match self {
            BootDevice::Disk => "hd",
            BootDevice::Network => "network",
            BootDevice::Cdrom => "cdrom",
        }
    }
}

/// Full configuration for a KVM virtual machine.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct VmConfig {
    /// VM name, must be unique on the host.
    pub name: String,
    /// Number of virtual CPUs.
    pub vcpus: u32,
    /// Memory in mebibytes (MiB).
    pub memory_mib: u64,
    /// Disks to attach, in order.
    pub disks: Vec<DiskConfig>,
    /// Network interfaces to attach, in order.
    pub networks: Vec<NetworkRef>,
    /// CD-ROM/ISO devices to attach.
    pub cdroms: Vec<CdromConfig>,
    /// Boot order. First entry has highest priority.
    pub boot_order: Vec<BootDevice>,
}
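The index-to-device mapping used by `DiskConfig::new` and `DiskConfig::from_path` is a one-line byte-arithmetic trick worth pinning down:

```rust
/// Sequential virtio device naming, as in DiskConfig::new: 0 → vda, 1 → vdb, …
/// Only valid for index 0..=25; beyond 'z' the arithmetic leaves the a-z range.
fn virtio_device(index: u8) -> String {
    format!("vd{}", (b'a' + index) as char)
}

fn main() {
    assert_eq!(virtio_device(0), "vda");
    assert_eq!(virtio_device(1), "vdb");
    assert_eq!(virtio_device(25), "vdz");
    println!("{}", virtio_device(2)); // prints "vdc"
}
```

Since the builder derives `index` from `disks.len()`, device names stay unique and ordered as long as a VM has at most 26 disks.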
impl VmConfig {
    pub fn builder(name: impl Into<String>) -> VmConfigBuilder {
        VmConfigBuilder::new(name)
    }
}

/// Builder for [`VmConfig`].
#[derive(Debug)]
pub struct VmConfigBuilder {
    name: String,
    vcpus: u32,
    memory_mib: u64,
    disks: Vec<DiskConfig>,
    networks: Vec<NetworkRef>,
    cdroms: Vec<CdromConfig>,
    boot_order: Vec<BootDevice>,
}

impl VmConfigBuilder {
    pub fn new(name: impl Into<String>) -> Self {
        Self {
            name: name.into(),
            vcpus: 2,
            memory_mib: 4096,
            disks: vec![],
            networks: vec![],
            cdroms: vec![],
            boot_order: vec![],
        }
    }

    pub fn vcpus(mut self, vcpus: u32) -> Self {
        self.vcpus = vcpus;
        self
    }

    /// Convenience shorthand: sets memory in whole gigabytes.
    pub fn memory_gb(mut self, gb: u32) -> Self {
        self.memory_mib = gb as u64 * 1024;
        self
    }

    pub fn memory_mib(mut self, mib: u64) -> Self {
        self.memory_mib = mib;
        self
    }

    /// Appends a disk. Devices are named sequentially: `vda`, `vdb`, …
    pub fn disk(mut self, size_gb: u32) -> Self {
        let idx = self.disks.len() as u8;
        self.disks.push(DiskConfig::new(size_gb, idx));
        self
    }

    /// Appends a disk backed by an existing qcow2/raw image file.
    pub fn disk_from_path(mut self, path: impl Into<String>) -> Self {
        let idx = self.disks.len() as u8;
        self.disks.push(DiskConfig::from_path(path, idx));
        self
    }

    /// Appends a disk with an explicit pool override.
    pub fn disk_from_pool(mut self, size_gb: u32, pool: impl Into<String>) -> Self {
        let idx = self.disks.len() as u8;
        self.disks
            .push(DiskConfig::new(size_gb, idx).from_pool(pool));
        self
    }

    pub fn network(mut self, net: NetworkRef) -> Self {
        self.networks.push(net);
        self
    }

    /// Attaches a CD-ROM with the given ISO source.
    ///
    /// The source can be a local path or an HTTP/HTTPS URL that will be
    /// downloaded to the image directory.
    pub fn cdrom(mut self, source: impl Into<String>) -> Self {
        self.cdroms.push(CdromConfig {
            source: source.into(),
            device: "hda".to_string(),
        });
        self
    }

    pub fn boot_order(mut self, order: impl IntoIterator<Item = BootDevice>) -> Self {
        self.boot_order = order.into_iter().collect();
        self
    }

    pub fn build(self) -> VmConfig {
        VmConfig {
            name: self.name,
            vcpus: self.vcpus,
            memory_mib: self.memory_mib,
            disks: self.disks,
            networks: self.networks,
            cdroms: self.cdroms,
            boot_order: self.boot_order,
        }
    }
}
/// A DHCP static host entry for a libvirt network.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DhcpHost {
    /// MAC address (e.g. `"52:54:00:00:50:01"`).
    pub mac: String,
    /// IP to assign (e.g. `"10.50.0.2"`).
    pub ip: String,
    /// Optional hostname.
    pub name: Option<String>,
}

/// Configuration for an isolated virtual network.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NetworkConfig {
    /// Libvirt network name.
    pub name: String,
    /// Bridge device name (e.g. `"virbr100"`).
    pub bridge: String,
    /// Gateway IP address of the network.
    pub gateway_ip: String,
    /// Network prefix length (e.g. `24`).
    pub prefix_len: u8,
    /// Forward mode. When `None`, the network is fully isolated.
    pub forward_mode: Option<ForwardMode>,
    /// Optional DHCP range (start, end). When set, libvirt's built-in
    /// DHCP server hands out addresses in this range.
    pub dhcp_range: Option<(String, String)>,
    /// Static DHCP host entries for fixed IP assignment by MAC.
    pub dhcp_hosts: Vec<DhcpHost>,
}

/// Libvirt network forward mode.
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub enum ForwardMode {
    Nat,
    Route,
}

impl NetworkConfig {
    pub fn builder(name: impl Into<String>) -> NetworkConfigBuilder {
        NetworkConfigBuilder::new(name)
    }
}

/// Builder for [`NetworkConfig`].
#[derive(Debug)]
pub struct NetworkConfigBuilder {
    name: String,
    bridge: Option<String>,
    gateway_ip: String,
    prefix_len: u8,
    forward_mode: Option<ForwardMode>,
    dhcp_range: Option<(String, String)>,
    dhcp_hosts: Vec<DhcpHost>,
}

impl NetworkConfigBuilder {
    pub fn new(name: impl Into<String>) -> Self {
        Self {
            name: name.into(),
            bridge: None,
            gateway_ip: "192.168.100.1".to_string(),
            prefix_len: 24,
            forward_mode: Some(ForwardMode::Nat),
            dhcp_range: None,
            dhcp_hosts: vec![],
}
|
||||
}
|
||||
|
||||
pub fn bridge(mut self, bridge: impl Into<String>) -> Self {
|
||||
self.bridge = Some(bridge.into());
|
||||
self
|
||||
}
|
||||
|
||||
/// Sets the gateway IP and prefix length (e.g. `"192.168.100.1"`, `24`).
|
||||
pub fn subnet(mut self, gateway_ip: impl Into<String>, prefix_len: u8) -> Self {
|
||||
self.gateway_ip = gateway_ip.into();
|
||||
self.prefix_len = prefix_len;
|
||||
self
|
||||
}
|
||||
|
||||
pub fn isolated(mut self) -> Self {
|
||||
self.forward_mode = None;
|
||||
self
|
||||
}
|
||||
|
||||
pub fn forward(mut self, mode: ForwardMode) -> Self {
|
||||
self.forward_mode = Some(mode);
|
||||
self
|
||||
}
|
||||
|
||||
/// Enable libvirt's built-in DHCP server with the given range.
|
||||
pub fn dhcp_range(mut self, start: impl Into<String>, end: impl Into<String>) -> Self {
|
||||
self.dhcp_range = Some((start.into(), end.into()));
|
||||
self
|
||||
}
|
||||
|
||||
/// Add a static DHCP host entry (MAC → fixed IP).
|
||||
pub fn dhcp_host(
|
||||
mut self,
|
||||
mac: impl Into<String>,
|
||||
ip: impl Into<String>,
|
||||
name: Option<String>,
|
||||
) -> Self {
|
||||
self.dhcp_hosts.push(DhcpHost {
|
||||
mac: mac.into(),
|
||||
ip: ip.into(),
|
||||
name,
|
||||
});
|
||||
self
|
||||
}
|
||||
|
||||
pub fn build(self) -> NetworkConfig {
|
||||
NetworkConfig {
|
||||
bridge: self
|
||||
.bridge
|
||||
.unwrap_or_else(|| format!("virbr-{}", self.name.replace('-', ""))),
|
||||
name: self.name,
|
||||
gateway_ip: self.gateway_ip,
|
||||
prefix_len: self.prefix_len,
|
||||
forward_mode: self.forward_mode,
|
||||
dhcp_range: self.dhcp_range,
|
||||
dhcp_hosts: self.dhcp_hosts,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Current state of a VM as returned by libvirt.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize)]
|
||||
pub enum VmStatus {
|
||||
Running,
|
||||
Paused,
|
||||
Shutoff,
|
||||
Crashed,
|
||||
PMSuspended,
|
||||
Other,
|
||||
}
|
||||
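The consuming-builder pattern above (`mut self` in, `Self` out) is what makes the chained `.disk(…).disk(…)` calls work and is also where the sequential `vda`/`vdb` naming comes from. A minimal self-contained sketch, using hypothetical stand-in types rather than the crate's own `DiskConfig`/`VmConfig`:

```rust
// Minimal sketch of the consuming-builder pattern used by `VmConfigBuilder`.
// `Disk` and `VmSketch` are stand-ins invented for this example; the real
// crate uses `DiskConfig` and `VmConfig`.
#[derive(Debug)]
struct Disk {
    device: String,
    size_gb: u32,
}

#[derive(Debug, Default)]
struct VmSketch {
    disks: Vec<Disk>,
}

impl VmSketch {
    // Each call consumes the builder and returns it, so calls chain.
    fn disk(mut self, size_gb: u32) -> Self {
        // Insertion index 0 -> "vda", 1 -> "vdb", and so on.
        let letter = (b'a' + self.disks.len() as u8) as char;
        self.disks.push(Disk {
            device: format!("vd{letter}"),
            size_gb,
        });
        self
    }
}

fn main() {
    let vm = VmSketch::default().disk(20).disk(40);
    assert_eq!(vm.disks[0].device, "vda");
    assert_eq!(vm.disks[1].device, "vdb");
    assert_eq!(vm.disks[1].size_gb, 40);
}
```

One trade-off of this style: because each method takes `self` by value, the builder cannot be reused after a call chain, which is exactly why `build(self)` can move its fields into the final config without cloning.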
@@ -1,487 +0,0 @@
//! Libvirt XML generation via string templates.
//!
//! # Why string templates?
//!
//! These functions build libvirt domain, network, and volume XML as formatted
//! strings rather than typed structs. This is fragile — there is no compile-time
//! guarantee that the output is valid XML, and tests rely on substring matching
//! rather than structural validation.
//!
//! We investigated typed alternatives (evaluated 2026-03-24):
//!
//! - **`libvirt-rust-xml`** (gen branch, by Marc-André Lureau / Red Hat):
//!   <https://gitlab.com/marcandre.lureau/libvirt-rust-xml/-/tree/gen>
//!   Uses `relaxng-gen` (<https://github.com/elmarco/relaxng-rust>) to generate
//!   Rust structs from libvirt's official RelaxNG schemas. This is the correct
//!   long-term solution — zero maintenance burden, schema-validated, round-trip
//!   serialization. However, as of commit `baca481`, `virtxml-domain` and
//!   `virtxml-storage-volume` do not compile (missing modules + type inference
//!   errors in the generated code). Only `virtxml-network` compiles.
//!
//! - **`libvirt-go-xml-module`** (Go, official libvirt project):
//!   <https://gitlab.com/libvirt/libvirt-go-xml-module>
//!   572 hand-maintained typed structs for domain XML alone. MIT licensed.
//!   Could be ported to Rust, but maintaining a manual port is the burden we
//!   want to avoid.
//!
//! - **`virt` crate** (0.4.3, already in use):
//!   C bindings to libvirt. Handles API calls but provides no XML typing —
//!   `Domain::define_xml()` takes `&str`. This stays regardless of XML approach.
//!
//! # When to revisit
//!
//! Track the `libvirt-rust-xml` gen branch. When `virtxml-domain` compiles,
//! replace these templates with typed struct construction + `quick-xml`
//! serialization. The `VmConfig`/`NetworkConfig` builder API stays unchanged —
//! only the internal XML generation changes.

use super::types::{CdromConfig, DiskConfig, ForwardMode, NetworkConfig, VmConfig};

/// Renders the libvirt domain XML for a VM definition.
///
/// The caller passes the image directory where qcow2 volumes are stored.
pub fn domain_xml(vm: &VmConfig, image_dir: &str) -> String {
    let memory_kib = vm.memory_mib * 1024;

    let os_boot = vm
        .boot_order
        .iter()
        .map(|b| format!("    <boot dev='{}'/>\n", b.as_xml_dev()))
        .collect::<String>();

    let devices = {
        let disks = disk_devices(vm, image_dir);
        let cdroms = cdrom_devices(vm);
        let nics = nic_devices(vm);
        format!("{disks}{cdroms}{nics}")
    };

    format!(
        r#"<domain type='kvm'>
  <name>{name}</name>
  <memory unit='KiB'>{memory_kib}</memory>
  <vcpu>{vcpus}</vcpu>
  <os>
    <type arch='x86_64' machine='q35'>hvm</type>
{os_boot}  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'/>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
{devices}    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
  </devices>
</domain>"#,
        name = vm.name,
        memory_kib = memory_kib,
        vcpus = vm.vcpus,
        os_boot = os_boot,
        devices = devices,
    )
}

fn disk_devices(vm: &VmConfig, image_dir: &str) -> String {
    vm.disks
        .iter()
        .map(|d| format_disk(vm, d, image_dir))
        .collect()
}

fn cdrom_devices(vm: &VmConfig) -> String {
    vm.cdroms.iter().map(|c| format_cdrom(c)).collect()
}

fn format_disk(vm: &VmConfig, disk: &DiskConfig, image_dir: &str) -> String {
    let path = disk
        .source_path
        .clone()
        .unwrap_or_else(|| format!("{image_dir}/{}-{}.qcow2", vm.name, disk.device));
    format!(
        r#"    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='{path}'/>
      <target dev='{dev}' bus='virtio'/>
    </disk>
"#,
        path = path,
        dev = disk.device,
    )
}

fn format_cdrom(cdrom: &CdromConfig) -> String {
    let source = &cdrom.source;
    let dev = &cdrom.device;
    format!(
        r#"    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='{source}'/>
      <target dev='{dev}' bus='sata'/>
    </disk>
"#,
        source = source,
        dev = dev,
    )
}

fn nic_devices(vm: &VmConfig) -> String {
    vm.networks
        .iter()
        .map(|net| {
            let mac_line = net
                .mac
                .as_deref()
                .map(|m| format!("\n      <mac address='{m}'/>"))
                .unwrap_or_default();
            format!(
                r#"    <interface type='network'>
      <source network='{network}'/>{mac}
      <model type='virtio'/>
    </interface>
"#,
                network = net.name,
                mac = mac_line,
            )
        })
        .collect()
}

/// Renders the libvirt network XML for a virtual network definition.
pub fn network_xml(cfg: &NetworkConfig) -> String {
    let forward = match cfg.forward_mode {
        Some(ForwardMode::Nat) => "  <forward mode='nat'/>\n",
        Some(ForwardMode::Route) => "  <forward mode='route'/>\n",
        None => "",
    };

    let dhcp = if cfg.dhcp_range.is_some() || !cfg.dhcp_hosts.is_empty() {
        let mut dhcp_xml = String::from("    <dhcp>\n");
        if let Some((start, end)) = &cfg.dhcp_range {
            dhcp_xml.push_str(&format!("      <range start='{start}' end='{end}'/>\n"));
        }
        for host in &cfg.dhcp_hosts {
            let name_attr = host
                .name
                .as_deref()
                .map(|n| format!(" name='{n}'"))
                .unwrap_or_default();
            dhcp_xml.push_str(&format!(
                "      <host mac='{mac}'{name_attr} ip='{ip}'/>\n",
                mac = host.mac,
                ip = host.ip,
            ));
        }
        dhcp_xml.push_str("    </dhcp>\n");
        dhcp_xml
    } else {
        String::new()
    };

    format!(
        r#"<network>
  <name>{name}</name>
  <bridge name='{bridge}' stp='on' delay='0'/>
{forward}  <ip address='{gateway}' prefix='{prefix}'>
{dhcp}  </ip>
</network>"#,
        name = cfg.name,
        bridge = cfg.bridge,
        forward = forward,
        gateway = cfg.gateway_ip,
        prefix = cfg.prefix_len,
        dhcp = dhcp,
    )
}

/// Renders the libvirt storage volume XML for a qcow2 disk.
pub fn volume_xml(name: &str, size_gb: u32) -> String {
    let capacity_bytes: u64 = size_gb as u64 * 1024 * 1024 * 1024;
    format!(
        r#"<volume>
  <name>{name}.qcow2</name>
  <capacity unit='bytes'>{capacity}</capacity>
  <target>
    <format type='qcow2'/>
  </target>
</volume>"#,
        name = name,
        capacity = capacity_bytes,
    )
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::modules::kvm::types::{
        BootDevice, ForwardMode, NetworkConfig, NetworkRef, VmConfig,
    };

    // ── Domain XML ──────────────────────────────────────────────────────

    #[test]
    fn domain_xml_contains_vm_name() {
        let vm = VmConfig::builder("test-vm")
            .vcpus(2)
            .memory_gb(4)
            .disk(20)
            .network(NetworkRef::named("mynet"))
            .boot_order([BootDevice::Network, BootDevice::Disk])
            .build();

        let xml = domain_xml(&vm, "/var/lib/libvirt/images");
        assert!(xml.contains("<name>test-vm</name>"));
        assert!(xml.contains("source network='mynet'"));
        assert!(xml.contains("boot dev='network'"));
        assert!(xml.contains("boot dev='hd'"));
    }

    #[test]
    fn domain_xml_memory_conversion() {
        let vm = VmConfig::builder("mem-test").memory_gb(8).build();
        let xml = domain_xml(&vm, "/tmp");
        // 8 GB = 8 * 1024 MiB = 8192 MiB = 8388608 KiB
        assert!(xml.contains("<memory unit='KiB'>8388608</memory>"));
    }

    #[test]
    fn domain_xml_multiple_disks() {
        let vm = VmConfig::builder("multi-disk")
            .disk(120) // vda
            .disk(200) // vdb
            .disk(500) // vdc
            .build();

        let xml = domain_xml(&vm, "/images");
        assert!(xml.contains("multi-disk-vda.qcow2"));
        assert!(xml.contains("multi-disk-vdb.qcow2"));
        assert!(xml.contains("multi-disk-vdc.qcow2"));
        assert!(xml.contains("dev='vda'"));
        assert!(xml.contains("dev='vdb'"));
        assert!(xml.contains("dev='vdc'"));
    }

    #[test]
    fn domain_xml_multiple_nics() {
        let vm = VmConfig::builder("multi-nic")
            .network(NetworkRef::named("default"))
            .network(NetworkRef::named("management"))
            .network(NetworkRef::named("storage"))
            .build();

        let xml = domain_xml(&vm, "/tmp");
        assert!(xml.contains("source network='default'"));
        assert!(xml.contains("source network='management'"));
        assert!(xml.contains("source network='storage'"));
        // All NICs should be virtio
        assert_eq!(xml.matches("model type='virtio'").count(), 3);
    }

    #[test]
    fn domain_xml_nic_with_mac_address() {
        let vm = VmConfig::builder("mac-test")
            .network(NetworkRef::named("mynet").with_mac("52:54:00:AA:BB:CC"))
            .build();

        let xml = domain_xml(&vm, "/tmp");
        assert!(xml.contains("mac address='52:54:00:AA:BB:CC'"));
    }

    #[test]
    fn domain_xml_cdrom_device() {
        let vm = VmConfig::builder("iso-test")
            .cdrom("/path/to/image.iso")
            .boot_order([BootDevice::Cdrom, BootDevice::Disk])
            .build();

        let xml = domain_xml(&vm, "/tmp");
        assert!(xml.contains("device='cdrom'"));
        assert!(xml.contains("source file='/path/to/image.iso'"));
        assert!(xml.contains("bus='sata'"));
        assert!(xml.contains("boot dev='cdrom'"));
    }

    #[test]
    fn domain_xml_q35_machine_type() {
        let vm = VmConfig::builder("q35-test").build();
        let xml = domain_xml(&vm, "/tmp");
        assert!(xml.contains("machine='q35'"));
        assert!(xml.contains("<acpi/>"));
        assert!(xml.contains("<apic/>"));
        assert!(xml.contains("mode='host-model'"));
    }

    #[test]
    fn domain_xml_serial_console() {
        let vm = VmConfig::builder("console-test").build();
        let xml = domain_xml(&vm, "/tmp");
        assert!(xml.contains("<serial type='pty'>"));
        assert!(xml.contains("<console type='pty'>"));
    }

    #[test]
    fn domain_xml_empty_boot_order() {
        let vm = VmConfig::builder("no-boot").build();
        let xml = domain_xml(&vm, "/tmp");
        // No boot entries should be present
        assert!(!xml.contains("boot dev="));
    }

    // ── Network XML ─────────────────────────────────────────────────────

    #[test]
    fn network_xml_isolated_has_no_forward() {
        let cfg = NetworkConfig::builder("testnet")
            .subnet("10.0.0.1", 24)
            .isolated()
            .build();

        let xml = network_xml(&cfg);
        assert!(!xml.contains("<forward"));
        assert!(xml.contains("10.0.0.1"));
        assert!(xml.contains("prefix='24'"));
    }

    #[test]
    fn network_xml_nat_mode() {
        let cfg = NetworkConfig::builder("natnet")
            .subnet("192.168.200.1", 24)
            .forward(ForwardMode::Nat)
            .build();

        let xml = network_xml(&cfg);
        assert!(xml.contains("<forward mode='nat'/>"));
        assert!(xml.contains("192.168.200.1"));
    }

    #[test]
    fn network_xml_route_mode() {
        let cfg = NetworkConfig::builder("routenet")
            .subnet("10.10.0.1", 16)
            .forward(ForwardMode::Route)
            .build();

        let xml = network_xml(&cfg);
        assert!(xml.contains("<forward mode='route'/>"));
        assert!(xml.contains("prefix='16'"));
    }

    #[test]
    fn network_xml_custom_bridge() {
        let cfg = NetworkConfig::builder("custom")
            .bridge("br-custom")
            .subnet("172.16.0.1", 24)
            .build();

        let xml = network_xml(&cfg);
        assert!(xml.contains("name='br-custom'"));
    }

    #[test]
    fn network_xml_auto_bridge_name() {
        let cfg = NetworkConfig::builder("harmony-test").isolated().build();

        // Bridge auto-generated: virbr-{name} with hyphens removed from name
        assert_eq!(cfg.bridge, "virbr-harmonytest");
    }

    // ── Volume XML ──────────────────────────────────────────────────────

    #[test]
    fn volume_xml_size_calculation() {
        let xml = volume_xml("test-vol", 100);
        // 100 GB = 100 * 1024^3 bytes = 107374182400
        assert!(xml.contains("<capacity unit='bytes'>107374182400</capacity>"));
        assert!(xml.contains("<name>test-vol.qcow2</name>"));
        assert!(xml.contains("type='qcow2'"));
    }

    // ── Builder defaults ────────────────────────────────────────────────

    #[test]
    fn vm_builder_defaults() {
        let vm = VmConfig::builder("defaults").build();
        assert_eq!(vm.name, "defaults");
        assert_eq!(vm.vcpus, 2);
        assert_eq!(vm.memory_mib, 4096);
        assert!(vm.disks.is_empty());
        assert!(vm.networks.is_empty());
        assert!(vm.cdroms.is_empty());
        assert!(vm.boot_order.is_empty());
    }

    #[test]
    fn network_builder_defaults() {
        let net = NetworkConfig::builder("testnet").build();
        assert_eq!(net.name, "testnet");
        assert_eq!(net.gateway_ip, "192.168.100.1");
        assert_eq!(net.prefix_len, 24);
        assert!(matches!(net.forward_mode, Some(ForwardMode::Nat)));
    }

    #[test]
    fn disk_sequential_naming() {
        let vm = VmConfig::builder("seq")
            .disk(10)
            .disk(20)
            .disk(30)
            .disk(40)
            .build();
        assert_eq!(vm.disks[0].device, "vda");
        assert_eq!(vm.disks[1].device, "vdb");
        assert_eq!(vm.disks[2].device, "vdc");
        assert_eq!(vm.disks[3].device, "vdd");
        assert_eq!(vm.disks[0].size_gb, 10);
        assert_eq!(vm.disks[3].size_gb, 40);
    }

    #[test]
    fn network_xml_with_dhcp_range() {
        let cfg = NetworkConfig::builder("dhcpnet")
            .subnet("10.50.0.1", 24)
            .dhcp_range("10.50.0.100", "10.50.0.200")
            .build();

        let xml = network_xml(&cfg);
        assert!(xml.contains("<dhcp>"));
        assert!(xml.contains("range start='10.50.0.100' end='10.50.0.200'"));
    }

    #[test]
    fn network_xml_with_dhcp_host() {
        let cfg = NetworkConfig::builder("hostnet")
            .subnet("10.50.0.1", 24)
            .dhcp_range("10.50.0.100", "10.50.0.200")
            .dhcp_host(
                "52:54:00:00:50:01",
                "10.50.0.2",
                Some("opnsense".to_string()),
            )
            .build();

        let xml = network_xml(&cfg);
        assert!(xml.contains("host mac='52:54:00:00:50:01'"));
        assert!(xml.contains("name='opnsense'"));
        assert!(xml.contains("ip='10.50.0.2'"));
    }

    #[test]
    fn network_xml_no_dhcp_by_default() {
        let cfg = NetworkConfig::builder("nodhcp").build();
        let xml = network_xml(&cfg);
        assert!(!xml.contains("<dhcp>"));
    }

    #[test]
    fn disk_custom_pool() {
        let vm = VmConfig::builder("pool-test")
            .disk_from_pool(100, "ssd-pool")
            .build();
        assert_eq!(vm.disks[0].pool, "ssd-pool");
    }
}
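The module doc above argues that substring assertions on templated XML are weaker than structural validation. A minimal self-contained sketch (not the crate's actual code) makes the failure mode concrete: `format!` interpolation performs no escaping, so an input containing a quote corrupts the document while substring checks still pass.

```rust
// Hypothetical stand-in for a template function like `network_xml`:
// plain string interpolation with no XML escaping.
fn network_xml_sketch(name: &str, bridge: &str) -> String {
    format!(
        "<network>\n  <name>{name}</name>\n  <bridge name='{bridge}'/>\n</network>"
    )
}

fn main() {
    // Benign input: substring checks pass and the output is well-formed.
    let xml = network_xml_sketch("testnet", "virbr100");
    assert!(xml.contains("<name>testnet</name>"));
    assert!(xml.contains("<bridge name='virbr100'/>"));

    // Hostile input: a stray quote injects an extra attribute, producing
    // malformed intent, yet the same substring check still succeeds.
    let broken = network_xml_sketch("testnet", "virbr100' foo='bar");
    assert!(broken.contains("name='virbr100"));
}
```

A schema-derived typed layer (the `libvirt-rust-xml` approach discussed above) would reject or escape such values at construction time instead of at `virsh define` time.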
@@ -19,12 +19,6 @@ pub struct LoadBalancerScore {
|
||||
// (listen_interface, LoadBalancerService) tuples or something like that
|
||||
// I am not sure what to use as listen_interface, should it be interface name, ip address,
|
||||
// uuid?
|
||||
/// TCP ports that must be open for inbound WAN traffic.
|
||||
///
|
||||
/// The load balancer interpret will call `ensure_wan_access` for each port
|
||||
/// before configuring services, so that the load balancer is reachable
|
||||
/// from outside the LAN.
|
||||
pub wan_firewall_ports: Vec<u16>,
|
||||
}
|
||||
|
||||
impl<T: Topology + LoadBalancer> Score<T> for LoadBalancerScore {
|
||||
@@ -66,11 +60,6 @@ impl<T: Topology + LoadBalancer> Interpret<T> for LoadBalancerInterpret {
|
||||
load_balancer.ensure_initialized().await?
|
||||
);
|
||||
|
||||
for port in &self.score.wan_firewall_ports {
|
||||
info!("Ensuring WAN access for port {port}");
|
||||
load_balancer.ensure_wan_access(*port).await?;
|
||||
}
|
||||
|
||||
for service in self.score.public_services.iter() {
|
||||
info!("Ensuring service exists {service:?}");
|
||||
|
||||
|
||||
@@ -10,7 +10,6 @@ pub mod http;
|
||||
pub mod inventory;
|
||||
pub mod k3d;
|
||||
pub mod k8s;
|
||||
pub mod kvm;
|
||||
pub mod lamp;
|
||||
pub mod load_balancer;
|
||||
pub mod monitoring;
|
||||
|
||||
@@ -20,7 +20,6 @@ use async_trait::async_trait;
|
||||
use derive_new::new;
|
||||
use harmony_secret::SecretManager;
|
||||
use harmony_types::id::Id;
|
||||
use harmony_types::net::Url;
|
||||
use log::{debug, info};
|
||||
use serde::Serialize;
|
||||
use std::path::PathBuf;
|
||||
@@ -104,7 +103,7 @@ impl OKDSetup02BootstrapInterpret {
|
||||
)));
|
||||
} else {
|
||||
info!(
|
||||
"[Stage 02/Bootstrap] Created OKD installation directory {}",
|
||||
"Created OKD installation directory {}",
|
||||
okd_installation_path.to_string_lossy()
|
||||
);
|
||||
}
|
||||
@@ -136,7 +135,7 @@ impl OKDSetup02BootstrapInterpret {
|
||||
self.create_file(&install_config_backup, install_config_yaml.as_bytes())
|
||||
.await?;
|
||||
|
||||
info!("[Stage 02/Bootstrap] Creating manifest files with openshift-install");
|
||||
info!("Creating manifest files with openshift-install");
|
||||
let output = Command::new(okd_bin_path.join("openshift-install"))
|
||||
.args([
|
||||
"create",
|
||||
@@ -148,19 +147,10 @@ impl OKDSetup02BootstrapInterpret {
|
||||
.await
|
||||
.map_err(|e| InterpretError::new(format!("Failed to create okd manifest : {e}")))?;
|
||||
let stdout = String::from_utf8(output.stdout).unwrap();
|
||||
info!(
|
||||
"[Stage 02/Bootstrap] openshift-install stdout :\n\n{}",
|
||||
stdout
|
||||
);
|
||||
info!("openshift-install stdout :\n\n{}", stdout);
|
||||
let stderr = String::from_utf8(output.stderr).unwrap();
|
||||
info!(
|
||||
"[Stage 02/Bootstrap] openshift-install stderr :\n\n{}",
|
||||
stderr
|
||||
);
|
||||
info!(
|
||||
"[Stage 02/Bootstrap] openshift-install exit status : {}",
|
||||
output.status
|
||||
);
|
||||
info!("openshift-install stderr :\n\n{}", stderr);
|
||||
info!("openshift-install exit status : {}", output.status);
|
||||
if !output.status.success() {
|
||||
return Err(InterpretError::new(format!(
|
||||
"Failed to create okd manifest, exit code {} : {}",
|
||||
@@ -168,7 +158,7 @@ impl OKDSetup02BootstrapInterpret {
|
||||
)));
|
||||
}
|
||||
|
||||
info!("[Stage 02/Bootstrap] Creating ignition files with openshift-install");
|
||||
info!("Creating ignition files with openshift-install");
|
||||
let output = Command::new(okd_bin_path.join("openshift-install"))
|
||||
.args([
|
||||
"create",
|
||||
@@ -182,19 +172,10 @@ impl OKDSetup02BootstrapInterpret {
|
||||
InterpretError::new(format!("Failed to create okd ignition config : {e}"))
|
||||
})?;
|
||||
let stdout = String::from_utf8(output.stdout).unwrap();
|
||||
info!(
|
||||
"[Stage 02/Bootstrap] openshift-install stdout :\n\n{}",
|
||||
stdout
|
||||
);
|
||||
info!("openshift-install stdout :\n\n{}", stdout);
|
||||
let stderr = String::from_utf8(output.stderr).unwrap();
|
||||
info!(
|
||||
"[Stage 02/Bootstrap] openshift-install stderr :\n\n{}",
|
||||
stderr
|
||||
);
|
||||
info!(
|
||||
"[Stage 02/Bootstrap] openshift-install exit status : {}",
|
||||
output.status
|
||||
);
|
||||
info!("openshift-install stderr :\n\n{}", stderr);
|
||||
info!("openshift-install exit status : {}", output.status);
|
||||
if !output.status.success() {
|
||||
return Err(InterpretError::new(format!(
|
||||
"Failed to create okd manifest, exit code {} : {}",
|
||||
@@ -208,7 +189,7 @@ impl OKDSetup02BootstrapInterpret {
|
||||
let remote_path = ignition_files_http_path.join(filename);
|
||||
|
||||
info!(
|
||||
"[Stage 02/Bootstrap] Preparing ignition file : {} -> {}",
|
||||
"Preparing file content for local file : {} to remote : {}",
|
||||
local_path.to_string_lossy(),
|
||||
remote_path.to_string_lossy()
|
||||
);
|
||||
@@ -239,27 +220,25 @@ impl OKDSetup02BootstrapInterpret {
|
||||
.interpret(inventory, topology)
|
||||
.await?;
|
||||
|
||||
info!("[Stage 02/Bootstrap] Successfully prepared ignition files for OKD installation");
|
||||
|
||||
info!("Successfully prepared ignition files for OKD installation");
|
||||
// ignition_files_http_path // = PathBuf::from("okd_ignition_files");
|
||||
info!(
|
||||
"[Stage 02/Bootstrap] Uploading SCOS installer images from {} to HTTP server",
|
||||
okd_images_path.to_string_lossy()
|
||||
);
|
||||
info!(
|
||||
r#"[Stage 02/Bootstrap] Images can be refreshed with: openshift-install coreos print-stream-json | grep -Eo '"https.*(kernel.|initramfs.|rootfs.)\w+(\.img)?"' | grep x86_64 | xargs -n 1 curl -LO"#
|
||||
r#"Uploading images, they can be refreshed with a command similar to this one: openshift-install coreos print-stream-json | grep -Eo '"https.*(kernel.|initramfs.|rootfs.)\w+(\.img)?"' | grep x86_64 | xargs -n 1 curl -LO"#
|
||||
);
|
||||
|
||||
StaticFilesHttpScore {
|
||||
folder_to_serve: Some(Url::LocalFolder(
|
||||
okd_images_path.to_string_lossy().to_string(),
|
||||
)),
|
||||
remote_path: Some("scos".to_string()),
|
||||
files: vec![],
|
||||
}
|
||||
.interpret(inventory, topology)
|
||||
.await?;
|
||||
inquire::Confirm::new(
|
||||
&format!("push installer image files with `scp -r {}/* root@{}:/usr/local/http/scos/` until performance issue is resolved", okd_images_path.to_string_lossy(), topology.http_server.get_ip())).prompt().expect("Prompt error");
|
||||
|
||||
info!("[Stage 02/Bootstrap] SCOS images uploaded successfully");
|
||||
// let scos_http_path = PathBuf::from("scos");
|
||||
// StaticFilesHttpScore {
|
||||
// folder_to_serve: Some(Url::LocalFolder(
|
||||
// okd_images_path.to_string_lossy().to_string(),
|
||||
// )),
|
||||
// remote_path: Some(scos_http_path.to_string_lossy().to_string()),
|
||||
// files: vec![],
|
||||
// }
|
||||
// .interpret(inventory, topology)
|
||||
// .await?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
@@ -276,7 +255,7 @@ impl OKDSetup02BootstrapInterpret {
|
||||
physical_host,
|
||||
host_config,
|
||||
};
|
||||
info!("[Stage 02/Bootstrap] Configuring host binding for bootstrap node {binding:?}");
|
||||
info!("Configuring host binding for bootstrap node {binding:?}");
|
||||
|
||||
DhcpHostBindingScore {
|
||||
host_binding: vec![binding],
|
||||
@@ -329,7 +308,7 @@ impl OKDSetup02BootstrapInterpret {
|
||||
let outcome = OKDBootstrapLoadBalancerScore::new(topology)
|
||||
.interpret(inventory, topology)
|
||||
.await?;
|
||||
info!("[Stage 02/Bootstrap] Load balancer configured: {outcome:?}");
|
||||
info!("Successfully executed OKDBootstrapLoadBalancerScore : {outcome:?}");
|
||||
Ok(())
|
||||
}
|
||||
|
||||
@@ -346,52 +325,10 @@ impl OKDSetup02BootstrapInterpret {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn wait_for_bootstrap_complete(
|
||||
&self,
|
||||
inventory: &Inventory,
|
||||
) -> Result<(), InterpretError> {
|
||||
info!("[Stage 02/Bootstrap] Waiting for bootstrap to complete...");
|
||||
info!("[Stage 02/Bootstrap] Running: openshift-install wait-for bootstrap-complete");
|
||||
|
||||
let okd_installation_path =
|
||||
format!("./data/okd/installation_files_{}", inventory.location.name);
|
||||
|
||||
let output = Command::new("./data/okd/bin/openshift-install")
|
||||
.args([
|
||||
"wait-for",
|
||||
"bootstrap-complete",
|
||||
"--dir",
|
||||
&okd_installation_path,
|
||||
"--log-level=info",
|
||||
])
|
||||
.output()
|
||||
.await
|
||||
.map_err(|e| {
|
||||
InterpretError::new(format!(
|
||||
"[Stage 02/Bootstrap] Failed to run openshift-install wait-for bootstrap-complete: {e}"
|
||||
))
|
||||
})?;
|
||||
|
||||
let stdout = String::from_utf8_lossy(&output.stdout);
|
||||
let stderr = String::from_utf8_lossy(&output.stderr);
|
||||
|
||||
if !stdout.is_empty() {
|
||||
info!("[Stage 02/Bootstrap] openshift-install stdout:\n{stdout}");
|
||||
}
|
||||
if !stderr.is_empty() {
|
||||
info!("[Stage 02/Bootstrap] openshift-install stderr:\n{stderr}");
|
||||
}
|
||||
|
||||
if !output.status.success() {
|
||||
return Err(InterpretError::new(format!(
|
||||
"[Stage 02/Bootstrap] bootstrap-complete failed (exit {}): {}",
|
||||
output.status,
|
||||
stderr.lines().last().unwrap_or("unknown error")
|
||||
)));
|
||||
}
|
||||
|
||||
info!("[Stage 02/Bootstrap] Bootstrap complete!");
|
||||
Ok(())
|
||||
async fn wait_for_bootstrap_complete(&self) -> Result<(), InterpretError> {
|
||||
// Placeholder: wait-for bootstrap-complete
|
||||
info!("[Bootstrap] Waiting for bootstrap-complete …");
|
||||
todo!("[Bootstrap] Waiting for bootstrap-complete …")
|
||||
}
|
||||
|
||||
async fn create_file(&self, path: &PathBuf, content: &[u8]) -> Result<(), InterpretError> {
|
||||
@@ -444,7 +381,7 @@ impl Interpret<HAClusterTopology> for OKDSetup02BootstrapInterpret {
|
||||
// self.validate_dns_config(inventory, topology).await?;
|
||||
|
||||
self.reboot_target().await?;
|
||||
self.wait_for_bootstrap_complete(inventory).await?;
|
||||
self.wait_for_bootstrap_complete().await?;
|
||||
|
||||
Ok(Outcome::success("Bootstrap phase complete".into()))
|
||||
}
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
use std::net::{IpAddr, Ipv4Addr, SocketAddr};
|
||||
use std::net::SocketAddr;
|
||||
|
||||
use serde::Serialize;
|
||||
|
||||
@@ -19,30 +19,27 @@ pub struct OKDBootstrapLoadBalancerScore {
 
 impl OKDBootstrapLoadBalancerScore {
     pub fn new(topology: &HAClusterTopology) -> Self {
-        // Bind on 0.0.0.0 instead of the LAN IP to avoid CARP VIP race
-        // conditions where HAProxy fails to bind when the interface
-        // transitions back to master.
-        let bind_addr = IpAddr::V4(Ipv4Addr::UNSPECIFIED);
+        let private_ip = topology.router.get_gateway();
 
         let private_services = vec![
             LoadBalancerService {
                 backend_servers: Self::topology_to_backend_server(topology, 80),
-                listening_port: SocketAddr::new(bind_addr, 80),
+                listening_port: SocketAddr::new(private_ip, 80),
                 health_check: Some(HealthCheck::TCP(None)),
             },
             LoadBalancerService {
                 backend_servers: Self::topology_to_backend_server(topology, 443),
-                listening_port: SocketAddr::new(bind_addr, 443),
+                listening_port: SocketAddr::new(private_ip, 443),
                 health_check: Some(HealthCheck::TCP(None)),
             },
             LoadBalancerService {
                 backend_servers: Self::topology_to_backend_server(topology, 22623),
-                listening_port: SocketAddr::new(bind_addr, 22623),
+                listening_port: SocketAddr::new(private_ip, 22623),
                 health_check: Some(HealthCheck::TCP(None)),
             },
             LoadBalancerService {
                 backend_servers: Self::topology_to_backend_server(topology, 6443),
-                listening_port: SocketAddr::new(bind_addr, 6443),
+                listening_port: SocketAddr::new(private_ip, 6443),
                 health_check: Some(HealthCheck::HTTP(
                     None,
                     "/readyz".to_string(),
@@ -56,7 +53,6 @@ impl OKDBootstrapLoadBalancerScore {
             load_balancer_score: LoadBalancerScore {
                 public_services: vec![],
                 private_services,
-                wan_firewall_ports: vec![80, 443],
             },
         }
     }
 
@@ -78,9 +78,9 @@ impl OKDNodeInterpret {
         let required_hosts: i16 = okd_host_properties.required_hosts();
 
         info!(
-            "[{}] Discovery of {} hosts in progress, {} found so far",
-            self.host_role,
+            "Discovery of {} {} hosts in progress, current number {}",
+            required_hosts,
+            self.host_role,
             hosts.len()
         );
         // This score triggers the discovery agent for a specific role.
@@ -118,9 +118,8 @@ impl OKDNodeInterpret {
         nodes: &Vec<(PhysicalHost, HostConfig)>,
     ) -> Result<(), InterpretError> {
         info!(
-            "[{}] Configuring DHCP host bindings for {} nodes",
-            self.host_role,
-            nodes.len()
+            "[{}] Configuring host bindings for {} plane nodes.",
+            self.host_role, self.host_role,
         );
 
         let host_properties = self.okd_role_properties(&self.host_role);
@@ -297,18 +296,14 @@ impl Interpret<HAClusterTopology> for OKDNodeInterpret {
         // and the cluster becomes fully functional only once all nodes are Ready and the
         // cluster operators report Available=True.
         info!(
-            "[{}] Provisioning initiated for {} nodes. Monitor cluster convergence with: oc get nodes && oc get co",
-            self.host_role,
-            nodes.len()
+            "[{}] Provisioning initiated. Monitor the cluster convergence manually.",
+            self.host_role
         );
 
-        Ok(Outcome::success_with_details(
-            format!("{} provisioning initiated", self.host_role),
-            nodes
-                .iter()
-                .map(|(host, _)| format!(" {} (MACs: {:?})", host.id, host.get_mac_address()))
-                .collect(),
-        ))
+        Ok(Outcome::success(format!(
+            "{} provisioning has been successfully initiated.",
+            self.host_role
+        )))
     }
 
     fn get_name(&self) -> InterpretName {
 
@@ -74,7 +74,14 @@ impl<T: Topology + DhcpServer + TftpServer + HttpServer + Router> Interpret<T>
             }),
             Box::new(StaticFilesHttpScore {
                 remote_path: None,
-                folder_to_serve: Some(Url::LocalFolder("./data/pxe/okd/http_files/".to_string())),
+                // TODO The current russh based copy is way too slow, check for a lib update or use scp
+                // when available
+                //
+                // For now just run :
+                // scp -r data/pxe/okd/http_files/* root@192.168.1.1:/usr/local/http/
+                //
+                folder_to_serve: None,
+                // folder_to_serve: Some(Url::LocalFolder("./data/pxe/okd/http_files/".to_string())),
                 files: vec![
                     FileContent {
                         path: FilePath::Relative("boot.ipxe".to_string()),
@@ -116,9 +123,9 @@ impl<T: Topology + DhcpServer + TftpServer + HttpServer + Router> Interpret<T>
                 Err(e) => return Err(e),
            };
        }
-        Ok(Outcome::success(
-            "iPXE boot infrastructure installed".to_string(),
-        ))
+        inquire::Confirm::new(&format!("Execute the copy : `scp -r data/pxe/okd/http_files/* root@{}:/usr/local/http/` and confirm when done to continue", HttpServer::get_ip(topology))).prompt().expect("Prompt error");
+
+        Ok(Outcome::success("Ipxe installed".to_string()))
     }
 
     fn get_name(&self) -> InterpretName {
 
@@ -1,4 +1,4 @@
-use std::net::{IpAddr, Ipv4Addr, SocketAddr};
+use std::net::SocketAddr;
 
 use serde::Serialize;
 
@@ -8,7 +8,7 @@ use crate::{
     score::Score,
     topology::{
         BackendServer, HAClusterTopology, HealthCheck, HttpMethod, HttpStatusCode, LoadBalancer,
-        LoadBalancerService, SSL, Topology,
+        LoadBalancerService, LogicalHost, Router, SSL, Topology,
     },
 };
 
@@ -53,19 +53,16 @@ pub struct OKDLoadBalancerScore {
 /// ```
 impl OKDLoadBalancerScore {
     pub fn new(topology: &HAClusterTopology) -> Self {
-        // Bind on 0.0.0.0 instead of the LAN IP to avoid CARP VIP race
-        // conditions where HAProxy fails to bind when the interface
-        // transitions back to master.
-        let bind_addr = IpAddr::V4(Ipv4Addr::UNSPECIFIED);
+        let public_ip = topology.router.get_gateway();
         let public_services = vec![
             LoadBalancerService {
                 backend_servers: Self::nodes_to_backend_server(topology, 80),
-                listening_port: SocketAddr::new(bind_addr, 80),
+                listening_port: SocketAddr::new(public_ip, 80),
                 health_check: None,
             },
             LoadBalancerService {
                 backend_servers: Self::nodes_to_backend_server(topology, 443),
-                listening_port: SocketAddr::new(bind_addr, 443),
+                listening_port: SocketAddr::new(public_ip, 443),
                 health_check: None,
             },
         ];
@@ -73,7 +70,7 @@ impl OKDLoadBalancerScore {
         let private_services = vec![
             LoadBalancerService {
                 backend_servers: Self::nodes_to_backend_server(topology, 80),
-                listening_port: SocketAddr::new(bind_addr, 80),
+                listening_port: SocketAddr::new(public_ip, 80),
                 health_check: Some(HealthCheck::HTTP(
                     Some(25001),
                     "/health?check=okd_router_1936,node_ready".to_string(),
@@ -84,7 +81,7 @@ impl OKDLoadBalancerScore {
             },
             LoadBalancerService {
                 backend_servers: Self::nodes_to_backend_server(topology, 443),
-                listening_port: SocketAddr::new(bind_addr, 443),
+                listening_port: SocketAddr::new(public_ip, 443),
                 health_check: Some(HealthCheck::HTTP(
                     Some(25001),
                     "/health?check=okd_router_1936,node_ready".to_string(),
@@ -95,12 +92,12 @@ impl OKDLoadBalancerScore {
             },
             LoadBalancerService {
                 backend_servers: Self::control_plane_to_backend_server(topology, 22623),
-                listening_port: SocketAddr::new(bind_addr, 22623),
+                listening_port: SocketAddr::new(public_ip, 22623),
                 health_check: Some(HealthCheck::TCP(None)),
             },
             LoadBalancerService {
                 backend_servers: Self::control_plane_to_backend_server(topology, 6443),
-                listening_port: SocketAddr::new(bind_addr, 6443),
+                listening_port: SocketAddr::new(public_ip, 6443),
                 health_check: Some(HealthCheck::HTTP(
                     None,
                     "/readyz".to_string(),
@@ -114,7 +111,6 @@ impl OKDLoadBalancerScore {
             load_balancer_score: LoadBalancerScore {
                 public_services,
                 private_services,
-                wan_firewall_ports: vec![80, 443],
             },
         }
     }
@@ -169,7 +165,7 @@ mod tests {
     use std::sync::{Arc, OnceLock};
 
     use super::*;
-    use crate::topology::{DummyInfra, LogicalHost, Router};
+    use crate::topology::DummyInfra;
     use harmony_macros::ip;
     use harmony_types::net::IpAddress;
 
@@ -300,30 +296,6 @@ mod tests {
         assert_eq!(public_service_443.backend_servers.len(), 5);
     }
 
-    #[test]
-    fn test_all_services_bind_on_unspecified_address() {
-        let topology = create_test_topology();
-        let score = OKDLoadBalancerScore::new(&topology);
-        let unspecified = IpAddr::V4(Ipv4Addr::UNSPECIFIED);
-
-        for svc in &score.load_balancer_score.public_services {
-            assert_eq!(
-                svc.listening_port.ip(),
-                unspecified,
-                "Public service on port {} should bind on 0.0.0.0",
-                svc.listening_port.port()
-            );
-        }
-        for svc in &score.load_balancer_score.private_services {
-            assert_eq!(
-                svc.listening_port.ip(),
-                unspecified,
-                "Private service on port {} should bind on 0.0.0.0",
-                svc.listening_port.port()
-            );
-        }
-    }
-
     #[test]
     fn test_private_service_port_22623_only_control_plane() {
         let topology = create_test_topology();
@@ -339,13 +311,6 @@ mod tests {
         assert_eq!(private_service_22623.backend_servers.len(), 3);
     }
 
-    #[test]
-    fn test_wan_firewall_ports_include_http_and_https() {
-        let topology = create_test_topology();
-        let score = OKDLoadBalancerScore::new(&topology);
-        assert_eq!(score.load_balancer_score.wan_firewall_ports, vec![80, 443]);
-    }
-
     #[test]
     fn test_all_backend_servers_have_correct_port() {
         let topology = create_test_topology();
 
@@ -1,5 +1,3 @@
-pub mod setup;
-
 use std::str::FromStr;
 
 use harmony_macros::hurl;
@@ -13,15 +11,10 @@ use crate::{
     topology::{HelmCommand, K8sclient, Topology},
 };
 
-pub use setup::{OpenbaoJwtAuth, OpenbaoPolicy, OpenbaoSetupScore, OpenbaoUser};
-
 #[derive(Debug, Serialize, Clone)]
 pub struct OpenbaoScore {
     /// Host used for external access (ingress)
     pub host: String,
-    /// Set to true when deploying to OpenShift. Defaults to false for k3d/Kubernetes.
-    #[serde(default)]
-    pub openshift: bool,
 }
 
 impl<T: Topology + K8sclient + HelmCommand> Score<T> for OpenbaoScore {
@@ -31,12 +24,12 @@ impl<T: Topology + K8sclient + HelmCommand> Score<T> for OpenbaoScore {
 
     #[doc(hidden)]
     fn create_interpret(&self) -> Box<dyn Interpret<T>> {
         // TODO exec pod commands to initialize secret store if not already done
         let host = &self.host;
-        let openshift = self.openshift;
 
         let values_yaml = Some(format!(
             r#"global:
-  openshift: {openshift}
+  openshift: true
 server:
   standalone:
     enabled: true
 
@@ -1,527 +0,0 @@
-use std::path::PathBuf;
-
-use async_trait::async_trait;
-use log::{info, warn};
-use serde::{Deserialize, Serialize};
-
-use crate::{
-    data::Version,
-    interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
-    inventory::Inventory,
-    score::Score,
-    topology::{K8sclient, Topology},
-};
-use harmony_types::id::Id;
-
-const DEFAULT_NAMESPACE: &str = "openbao";
-const DEFAULT_POD: &str = "openbao-0";
-const DEFAULT_KV_MOUNT: &str = "secret";
-
-/// A policy to create in OpenBao.
-#[derive(Debug, Clone, Serialize)]
-pub struct OpenbaoPolicy {
-    pub name: String,
-    pub hcl: String,
-}
-
-/// A userpass user to create in OpenBao.
-#[derive(Debug, Clone, Serialize)]
-pub struct OpenbaoUser {
-    pub username: String,
-    pub password: String,
-    pub policies: Vec<String>,
-}
-
-/// JWT auth method configuration for OpenBao.
-#[derive(Debug, Clone, Serialize)]
-pub struct OpenbaoJwtAuth {
-    pub oidc_discovery_url: String,
-    pub bound_issuer: String,
-    pub role_name: String,
-    pub bound_audiences: String,
-    pub user_claim: String,
-    pub policies: Vec<String>,
-    pub ttl: String,
-    pub max_ttl: String,
-}
-
-/// Score that initializes, unseals, and configures an already-deployed OpenBao
-/// instance.
-///
-/// This Score handles the operational lifecycle that follows the Helm
-/// deployment (handled by [`OpenbaoScore`]):
-///
-/// 1. **Init** — `bao operator init`, stores unseal keys locally
-/// 2. **Unseal** — applies stored unseal keys (3 of 5 by default)
-/// 3. **KV v2** — enables the versioned KV secrets engine
-/// 4. **Policies** — creates configurable access policies
-/// 5. **Userpass** — creates dev/operator users with assigned policies
-/// 6. **JWT auth** — (optional) configures JWT auth for OIDC-based access
-///
-/// All steps are idempotent: re-running skips already-completed work.
-///
-/// Unseal keys are cached at `~/.local/share/harmony/openbao/unseal-keys.json`
-/// (with `0600` permissions on Unix). This is a development convenience; production
-/// deployments should use auto-unseal (Transit, cloud KMS, etc.).
-#[derive(Debug, Clone, Serialize)]
-pub struct OpenbaoSetupScore {
-    /// Kubernetes namespace where OpenBao is deployed.
-    #[serde(default = "default_namespace")]
-    pub namespace: String,
-
-    /// StatefulSet pod name to exec into.
-    #[serde(default = "default_pod")]
-    pub pod: String,
-
-    /// KV v2 mount path to enable.
-    #[serde(default = "default_kv_mount")]
-    pub kv_mount: String,
-
-    /// Policies to create.
-    #[serde(default)]
-    pub policies: Vec<OpenbaoPolicy>,
-
-    /// Userpass users to create.
-    #[serde(default)]
-    pub users: Vec<OpenbaoUser>,
-
-    /// Optional JWT auth configuration (e.g., for Zitadel OIDC).
-    #[serde(default)]
-    pub jwt_auth: Option<OpenbaoJwtAuth>,
-}
-
-fn default_namespace() -> String {
-    DEFAULT_NAMESPACE.to_string()
-}
-fn default_pod() -> String {
-    DEFAULT_POD.to_string()
-}
-fn default_kv_mount() -> String {
-    DEFAULT_KV_MOUNT.to_string()
-}
-
-impl Default for OpenbaoSetupScore {
-    fn default() -> Self {
-        Self {
-            namespace: default_namespace(),
-            pod: default_pod(),
-            kv_mount: default_kv_mount(),
-            policies: Vec::new(),
-            users: Vec::new(),
-            jwt_auth: None,
-        }
-    }
-}
-
-impl<T: Topology + K8sclient> Score<T> for OpenbaoSetupScore {
-    fn name(&self) -> String {
-        "OpenbaoSetupScore".to_string()
-    }
-
-    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
-        Box::new(OpenbaoSetupInterpret {
-            score: self.clone(),
-        })
-    }
-}
-
-// ---------------------------------------------------------------------------
-// Interpret
-// ---------------------------------------------------------------------------
-
-#[derive(Debug, Clone)]
-struct OpenbaoSetupInterpret {
-    score: OpenbaoSetupScore,
-}
-
-#[derive(Debug, Serialize, Deserialize)]
-struct InitOutput {
-    #[serde(rename = "unseal_keys_b64")]
-    keys: Vec<String>,
-    root_token: String,
-}
-
-fn keys_dir() -> PathBuf {
-    directories::BaseDirs::new()
-        .map(|dirs| dirs.data_dir().join("harmony").join("openbao"))
-        .unwrap_or_else(|| PathBuf::from("/tmp/harmony-openbao"))
-}
-
-fn keys_file() -> PathBuf {
-    keys_dir().join("unseal-keys.json")
-}
-
-impl OpenbaoSetupInterpret {
-    async fn exec(
-        &self,
-        k8s: &harmony_k8s::K8sClient,
-        command: Vec<&str>,
-    ) -> Result<String, String> {
-        k8s.exec_pod_capture_output(&self.score.pod, Some(&self.score.namespace), command)
-            .await
-    }
-
-    async fn bao_command(
-        &self,
-        k8s: &harmony_k8s::K8sClient,
-        root_token: &str,
-        shell_cmd: &str,
-    ) -> Result<String, String> {
-        let full = format!("export VAULT_TOKEN={} && {}", root_token, shell_cmd);
-        self.exec(k8s, vec!["sh", "-c", &full]).await
-    }
-
-    async fn bao(
-        &self,
-        k8s: &harmony_k8s::K8sClient,
-        root_token: &str,
-        args: &[&str],
-    ) -> Result<String, String> {
-        self.bao_command(k8s, root_token, &args.join(" ")).await
-    }
-
-    // -- Step 1: Init ---------------------------------------------------------
-
-    async fn init(&self, k8s: &harmony_k8s::K8sClient) -> Result<String, InterpretError> {
-        let dir = keys_dir();
-        std::fs::create_dir_all(&dir).map_err(|e| {
-            InterpretError::new(format!("Failed to create keys directory {:?}: {}", dir, e))
-        })?;
-
-        let path = keys_file();
-        if path.exists() {
-            // Verify the vault is actually initialized before trusting cached keys.
-            // If the cluster was recreated, the vault has a fresh PVC but the local
-            // keys file is stale.
-            let status = self.exec(k8s, vec!["bao", "status", "-format=json"]).await;
-            let is_initialized = match &status {
-                Ok(stdout) => !stdout.contains("\"initialized\":false"),
-                Err(e) => !e.contains("not initialized"),
-            };
-
-            if is_initialized {
-                info!("[OpenbaoSetup] Already initialized, loading existing keys");
-                let content = std::fs::read_to_string(&path)
-                    .map_err(|e| InterpretError::new(format!("Failed to read keys: {e}")))?;
-                let init: InitOutput = serde_json::from_str(&content)
-                    .map_err(|e| InterpretError::new(format!("Failed to parse keys: {e}")))?;
-                return Ok(init.root_token);
-            }
-
-            warn!(
-                "[OpenbaoSetup] Vault not initialized but stale keys file exists, re-initializing"
-            );
-            let _ = std::fs::remove_file(&path);
-        }
-
-        info!("[OpenbaoSetup] Initializing OpenBao...");
-        let output = self
-            .exec(k8s, vec!["bao", "operator", "init", "-format=json"])
-            .await;
-
-        match output {
-            Ok(stdout) => {
-                let init: InitOutput = serde_json::from_str(&stdout).map_err(|e| {
-                    InterpretError::new(format!("Failed to parse init output: {e}"))
-                })?;
-                let json = serde_json::to_string_pretty(&init)
-                    .map_err(|e| InterpretError::new(format!("Failed to serialize keys: {e}")))?;
-                std::fs::write(&path, json).map_err(|e| {
-                    InterpretError::new(format!("Failed to write keys to {:?}: {e}", path))
-                })?;
-
-                #[cfg(unix)]
-                {
-                    use std::os::unix::fs::PermissionsExt;
-                    let _ = std::fs::set_permissions(&path, std::fs::Permissions::from_mode(0o600));
-                }
-
-                info!("[OpenbaoSetup] Initialized, keys saved to {:?}", path);
-                Ok(init.root_token)
-            }
-            Err(e) if e.contains("already initialized") => Err(InterpretError::new(format!(
-                "OpenBao already initialized but no local keys file at {:?}. \
-                 Delete the cluster or restore the keys file.",
-                path
-            ))),
-            Err(e) => Err(InterpretError::new(format!(
-                "OpenBao operator init failed: {e}"
-            ))),
-        }
-    }
-
-    // -- Step 2: Unseal -------------------------------------------------------
-
-    async fn unseal(&self, k8s: &harmony_k8s::K8sClient) -> Result<(), InterpretError> {
-        #[derive(Deserialize)]
-        struct Status {
-            sealed: bool,
-        }
-
-        // bao status exits 2 when sealed — treat exec error as "sealed"
-        let sealed = match self.exec(k8s, vec!["bao", "status", "-format=json"]).await {
-            Ok(stdout) => serde_json::from_str::<Status>(&stdout)
-                .map(|s| s.sealed)
-                .unwrap_or(true),
-            Err(_) => true,
-        };
-
-        if !sealed {
-            info!("[OpenbaoSetup] Already unsealed");
-            return Ok(());
-        }
-
-        info!("[OpenbaoSetup] Unsealing...");
-        let path = keys_file();
-        let content = std::fs::read_to_string(&path)
-            .map_err(|e| InterpretError::new(format!("Failed to read keys: {e}")))?;
-        let init: InitOutput = serde_json::from_str(&content)
-            .map_err(|e| InterpretError::new(format!("Failed to parse keys: {e}")))?;
-
-        for key in &init.keys[0..3] {
-            self.exec(k8s, vec!["bao", "operator", "unseal", key])
-                .await
-                .map_err(|e| InterpretError::new(format!("Unseal failed: {e}")))?;
-        }
-
-        info!("[OpenbaoSetup] Unsealed successfully");
-        Ok(())
-    }
-
-    // -- Step 3: Enable KV v2 -------------------------------------------------
-
-    async fn enable_kv(
-        &self,
-        k8s: &harmony_k8s::K8sClient,
-        root_token: &str,
-    ) -> Result<(), InterpretError> {
-        let mount = &self.score.kv_mount;
-        let _ = self
-            .bao(
-                k8s,
-                root_token,
-                &[
-                    "bao",
-                    "secrets",
-                    "enable",
-                    &format!("-path={mount}"),
-                    "kv-v2",
-                ],
-            )
-            .await; // ignore "already enabled"
-        Ok(())
-    }
-
-    // -- Step 4: Enable userpass auth -----------------------------------------
-
-    async fn enable_userpass(
-        &self,
-        k8s: &harmony_k8s::K8sClient,
-        root_token: &str,
-    ) -> Result<(), InterpretError> {
-        let _ = self
-            .bao(k8s, root_token, &["bao", "auth", "enable", "userpass"])
-            .await;
-        Ok(())
-    }
-
-    // -- Step 5: Policies -----------------------------------------------------
-
-    async fn apply_policies(
-        &self,
-        k8s: &harmony_k8s::K8sClient,
-        root_token: &str,
-    ) -> Result<(), InterpretError> {
-        for policy in &self.score.policies {
-            let escaped_hcl = policy.hcl.replace('\'', "'\\''");
-            let cmd = format!(
-                "printf '{}' | bao policy write {} -",
-                escaped_hcl, policy.name
-            );
-            self.bao_command(k8s, root_token, &cmd).await.map_err(|e| {
-                InterpretError::new(format!("Failed to create policy '{}': {e}", policy.name))
-            })?;
-            info!("[OpenbaoSetup] Policy '{}' applied", policy.name);
-        }
-        Ok(())
-    }
-
-    // -- Step 6: Users --------------------------------------------------------
-
-    async fn create_users(
-        &self,
-        k8s: &harmony_k8s::K8sClient,
-        root_token: &str,
-    ) -> Result<(), InterpretError> {
-        for user in &self.score.users {
-            let policies = user.policies.join(",");
-            self.bao(
-                k8s,
-                root_token,
-                &[
-                    "bao",
-                    "write",
-                    &format!("auth/userpass/users/{}", user.username),
-                    &format!("password={}", user.password),
-                    &format!("policies={}", policies),
-                ],
-            )
-            .await
-            .map_err(|e| {
-                InterpretError::new(format!("Failed to create user '{}': {e}", user.username))
-            })?;
-            info!(
-                "[OpenbaoSetup] User '{}' created (policies: {})",
-                user.username, policies
-            );
-        }
-        Ok(())
-    }
-
-    // -- Step 7: JWT auth -----------------------------------------------------
-
-    async fn configure_jwt(
-        &self,
-        k8s: &harmony_k8s::K8sClient,
-        root_token: &str,
-    ) -> Result<(), InterpretError> {
-        let jwt = match &self.score.jwt_auth {
-            Some(j) => j,
-            None => return Ok(()),
-        };
-
-        let _ = self
-            .bao(k8s, root_token, &["bao", "auth", "enable", "jwt"])
-            .await;
-
-        // Configure JWT discovery. This may fail if the discovery URL is not
-        // reachable from inside the cluster (e.g., Zitadel's ExternalDomain
-        // isn't resolvable). Non-fatal — warn and continue.
-        let config_result = self
-            .bao(
-                k8s,
-                root_token,
-                &[
-                    "bao",
-                    "write",
-                    "auth/jwt/config",
-                    &format!("oidc_discovery_url={}", jwt.oidc_discovery_url),
-                    &format!("bound_issuer={}", jwt.bound_issuer),
-                ],
-            )
-            .await;
-
-        match config_result {
-            Ok(_) => {
-                info!(
-                    "[OpenbaoSetup] JWT auth configured (issuer: {})",
-                    jwt.bound_issuer
-                );
-            }
-            Err(e) => {
-                warn!(
-                    "[OpenbaoSetup] JWT auth config failed (non-fatal): {}. \
-                     Ensure '{}' resolves from inside the cluster.",
-                    e, jwt.oidc_discovery_url
-                );
-            }
-        }
-
-        let policies = jwt.policies.join(",");
-        self.bao(
-            k8s,
-            root_token,
-            &[
-                "bao",
-                "write",
-                &format!("auth/jwt/role/{}", jwt.role_name),
-                "role_type=jwt",
-                &format!("bound_audiences={}", jwt.bound_audiences),
-                &format!("user_claim={}", jwt.user_claim),
-                &format!("policies={}", policies),
-                &format!("ttl={}", jwt.ttl),
-                &format!("max_ttl={}", jwt.max_ttl),
-                "token_type=service",
-            ],
-        )
-        .await
-        .map_err(|e| {
-            InterpretError::new(format!(
-                "Failed to create JWT role '{}': {e}",
-                jwt.role_name
-            ))
-        })?;
-
-        info!(
-            "[OpenbaoSetup] JWT role '{}' created (policies: {})",
-            jwt.role_name, policies
-        );
-        Ok(())
-    }
-}
-
-#[async_trait]
-impl<T: Topology + K8sclient> Interpret<T> for OpenbaoSetupInterpret {
-    async fn execute(
-        &self,
-        _inventory: &Inventory,
-        topology: &T,
-    ) -> Result<Outcome, InterpretError> {
-        let k8s = topology
-            .k8s_client()
-            .await
-            .map_err(|e| InterpretError::new(format!("Failed to get K8s client: {e}")))?;
-
-        // Wait for the pod to be running before attempting any operations.
-        k8s.wait_for_pod_ready(&self.score.pod, Some(&self.score.namespace))
-            .await
-            .map_err(|e| {
-                InterpretError::new(format!(
-                    "Pod {}/{} not ready: {e}",
-                    self.score.namespace, self.score.pod
-                ))
-            })?;
-
-        let root_token = self.init(&k8s).await?;
-        self.unseal(&k8s).await?;
-        self.enable_kv(&k8s, &root_token).await?;
-
-        if !self.score.users.is_empty() {
-            self.enable_userpass(&k8s, &root_token).await?;
-        }
-
-        self.apply_policies(&k8s, &root_token).await?;
-        self.create_users(&k8s, &root_token).await?;
-        self.configure_jwt(&k8s, &root_token).await?;
-
-        let mut details = vec![
-            format!("root_token={}", root_token),
-            format!("kv_mount={}", self.score.kv_mount),
-        ];
-        for user in &self.score.users {
-            details.push(format!("user={}", user.username));
-        }
-
-        Ok(Outcome {
-            status: InterpretStatus::SUCCESS,
-            message: "OpenBao initialized, unsealed, and configured".to_string(),
-            details,
-        })
-    }
-
-    fn get_name(&self) -> InterpretName {
-        InterpretName::Custom("OpenbaoSetup")
-    }
-
-    fn get_version(&self) -> Version {
-        todo!()
-    }
-
-    fn get_status(&self) -> InterpretStatus {
-        todo!()
-    }
-
-    fn get_children(&self) -> Vec<Id> {
-        vec![]
-    }
-}
Some files were not shown because too many files have changed in this diff.