Code review findings:
- WebGuiConfigScore: write the PHP script to a temp file via SSH instead of
running inline php -r with shell escaping. This eliminates the shell-quoting
attack surface entirely (the port is a u16, so the old code was safe, but
using format!() to build shell commands is a code smell)
- parser.rs: prefix unused parameter with underscore
The web UI responds before the API backend (configd/PHP) is fully
ready after a firmware update reboot. Adding 10s settle time between
web UI detection and package install retry fixes the timeout.
A --full cold start now completes in ~174 seconds:
Boot + bootstrap: 48s
Firmware update + reboot: 60s
Package installs + 12 Scores x2 + idempotency check: 66s
The firmware update path waited on port 443 after reboot, but the
webgui config persists across reboots (config.xml stays at 9443).
Changed to use OPN_API_PORT which goes through wait_for_https with
port fallback (tries 9443 first, then 443).
Also increase VM resources from 1 vCPU / 1GB to 3 vCPU / 2GB —
cold boot drops from >10 minutes to ~48 seconds.
- Increase VM boot wait from 5 to 10 minutes (cold OPNsense nano first
boot with filesystem expansion can exceed 5 minutes)
- wait_for_https now tries target port first, then falls back to 443
on each attempt (handles both fresh VMs on port 443 and already-
bootstrapped VMs on custom port)
- cargo fmt on network_stress_test and webgui.rs
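The per-attempt port fallback described for wait_for_https can be sketched roughly as follows. This is a minimal illustration under assumptions, not the actual implementation: `candidate_ports`, `wait_for_port`, and the `probe` closure are hypothetical names, and the closure stands in for whatever HTTPS check the real function performs.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Ports to try on one attempt: the target port first, then 443 as a
/// fallback (skipped when the target already is 443).
fn candidate_ports(target: u16) -> Vec<u16> {
    if target == 443 {
        vec![443]
    } else {
        vec![target, 443]
    }
}

/// Hypothetical polling loop. `probe` stands in for the real HTTPS check.
/// Returns the first port that answered, or None once attempts run out.
fn wait_for_port(
    target: u16,
    attempts: u32,
    delay: Duration,
    mut probe: impl FnMut(u16) -> bool,
) -> Option<u16> {
    for attempt in 0..attempts {
        for port in candidate_ports(target) {
            if probe(port) {
                return Some(port);
            }
        }
        // Do not sleep after the final attempt.
        if attempt + 1 < attempts {
            sleep(delay);
        }
    }
    None
}
```

Returning the port that answered lets the caller tell a fresh VM (still on 443) from an already-bootstrapped one (on the custom port).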
Split the OPNsense webgui port change from OPNsenseBootstrap into a
proper idempotent Score:
- WebGuiConfigScore reads current port via SSH before changing
- Returns NOOP if already on the target port
- Modifies config.xml via PHP and restarts webgui via configctl
- Runs before LoadBalancerScore to free port 443 for HAProxy
Also:
- Add Config::shell() accessor for SSH access from Scores
- Add WebGuiConfigScore to VM integration example (12 Scores now)
- Document the WebGuiConfig → LoadBalancer ordering dependency
as a concrete use case in docs/architecture-challenges.md
(ties into Challenge #2: Runtime Plan & Validation)
The implicit dependency (LoadBalancerScore needs 443 free, which
requires WebGuiConfigScore to run first) remains a convention-based
ordering. This is tracked in architecture-challenges.md alongside
the score_with_dep.rs design sketch.
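The read-before-write behaviour of WebGuiConfigScore (return NOOP when the port already matches) might look like the sketch below. `Outcome`, `ensure_port`, and both closures are hypothetical stand-ins: in the real Score the read and the write go over SSH, and the outcome type is harmony's own.

```rust
/// Result of applying a Score-like step (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Outcome {
    Noop,
    Changed { from: u16, to: u16 },
}

/// Read the current port first and only call `apply` when it differs
/// from the target, so a second run performs no write at all.
fn ensure_port<E>(
    target: u16,
    read_current: impl FnOnce() -> Result<u16, E>,
    apply: impl FnOnce(u16) -> Result<(), E>,
) -> Result<Outcome, E> {
    let current = read_current()?;
    if current == target {
        return Ok(Outcome::Noop);
    }
    apply(target)?;
    Ok(Outcome::Changed { from: current, to: target })
}
```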
- Run cargo fmt across opnsense-api, opnsense-config, opnsense-codegen
(fixes formatting in generated files and hand-written modules)
- Update examples/opnsense/README.md: replace stale VirtualBox docs
with current API key + cargo run instructions
- Update examples/opnsense_vm_integration/README.md: document
idempotency test (run twice, assert zero duplicates), add
build/opnsense-e2e.sh usage instructions
The E2E test revealed that OPNsense validation failures were being
silently swallowed: add/set operations returned {"result": "failed",
"validations": {...}} but the code treated them as success.
Critical fixes:
- add_item/set_item now return Error::Validation on failure instead
of silently returning empty/failed responses
- VLAN: set pcp (PriorityCodePoint) — required in OPNsense 26.1
- Firewall filter: set sequence and statetype (KeepState)
- SNAT: set sequence
- BINAT: set sequence and destination_net ("any")
- DNAT: set sequence
- VIP: default advbase=1 and advskew=0 (required even for IP aliases)
- HAProxy backend: set mode, algorithm, persistence_cookiemode enums
- HAProxy frontend: set mode, connectionBehaviour enums
E2E test now passes: all 11 Scores run successfully against a real
OPNsense VM, and the idempotency test (run twice, verify counts
unchanged) confirms zero duplicates.
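A rough sketch of the add_item/set_item fix: a "failed" result becomes an error instead of being passed on as success. The struct and function names here are assumptions for illustration; the real code deserializes the response with serde and has its own Error type.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the body OPNsense returns from add/set calls.
struct MutationResponse {
    result: String,
    uuid: String,
    validations: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq)]
enum Error {
    /// Field name -> message pairs reported by OPNsense.
    Validation(BTreeMap<String, String>),
}

/// Return the UUID on success, or Error::Validation when OPNsense
/// reported a failed result or any validation messages.
fn check_mutation(resp: MutationResponse) -> Result<String, Error> {
    if resp.result == "failed" || !resp.validations.is_empty() {
        return Err(Error::Validation(resp.validations));
    }
    Ok(resp.uuid)
}
```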
- Set mode, algorithm, persistence_cookiemode on HAProxy backend struct
(OPNsense requires these fields, empty string causes validation failure)
- Set mode, connectionBehaviour on HAProxy frontend struct (same issue)
- Switch verify_state() to use rowCount from raw JSON search responses
instead of typed SearchRow deserialization (more reliable with OPNsense
search API pagination)
Found by running E2E tests against real OPNsense VM.
- Make UuidResponse.uuid default to empty string so validation failures
({"result": "failed", "validations": {...}}) don't cause deserialization
errors. Add is_failed() helper method.
- Fix HAProxy healthcheck construction: map check_type string to
HealthcheckType enum (was sending empty string, OPNsense rejected it)
- Fix HAProxy server construction: set mode (ServerMode) and type
(ServerType) enum fields (were defaulting to empty, OPNsense rejected)
Discovered by running E2E tests against real OPNsense VM — the typed
structs with ..Default::default() sent empty strings for required enum
fields, which OPNsense rejected as validation errors.
Still needed: HAProxy backend mode/algorithm and frontend mode/
connectionBehaviour enums, and fixing search API pagination for
filter/snat/vip verification counts.
Add idempotency verification to the OPNsense VM integration example:
- Extract verify_state() that queries all entity counts via typed API
(uses DNatApi, FilterApi, SourceNatApi, VipSettingsApi)
- Extract build_all_scores() for reuse across runs
- Run all Scores twice, assert entity counts are unchanged on 2nd run
- This catches duplicate creation in VLAN, LAGG, firewall rules, etc.
Add build/opnsense-e2e.sh — CI-friendly script that:
- Checks prerequisites (libvirtd, user groups)
- Boots OPNsense VM via KVM (idempotent — skips if running)
- Runs full Score suite with idempotency verification
- Supports --download, --clean flags
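The run-twice idempotency check can be sketched as below. `idempotency_violations` and the two closures are hypothetical: `apply` stands in for running all Scores and `count` for verify_state(), which in the example queries the OPNsense API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Entity kind -> number of entities currently configured.
type Counts = BTreeMap<&'static str, usize>;

/// Run `apply` twice and return every entity kind whose count changed
/// between the first and the second run as (kind, before, after).
fn idempotency_violations(
    mut apply: impl FnMut(),
    mut count: impl FnMut() -> Counts,
) -> Vec<(&'static str, usize, usize)> {
    apply();
    let first = count();
    apply();
    let second = count();

    // Compare the union of keys so kinds that only appear in one
    // snapshot are reported as well.
    let kinds: BTreeSet<&'static str> =
        first.keys().chain(second.keys()).copied().collect();
    kinds
        .into_iter()
        .filter_map(|kind| {
            let before = first.get(kind).copied().unwrap_or(0);
            let after = second.get(kind).copied().unwrap_or(0);
            (before != after).then_some((kind, before, after))
        })
        .collect()
}
```

An empty result means the second run created nothing new.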
- Remove stale FIXME/TODO comments on VlanScore and LaggScore — both
already use idempotent ensure_* methods that check before creating
- Fix DNAT apply pattern: remove per-rule apply() from ensure_rule(),
add single apply() call in DnatInterpret::execute() after all rules
(matching the pattern used by FirewallRuleScore and OutboundNatScore)
- Make DnatConfig::apply() public so callers can batch applies
- Add typed ensure_binat_rule_from() to FirewallFilterConfig, removing
the last json!() construction in the harmony Score layer
- BinatInterpret now uses typed method instead of manual json!()
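The DNAT batching change (no apply() inside ensure_rule(), one apply() after all rules) could be pictured like this. `FakeDnat` is a hypothetical in-memory stand-in that only counts calls, and whether the real code still applies when nothing changed is an assumption of this sketch.

```rust
/// Hypothetical stand-in for a config handle that counts apply calls.
#[derive(Default)]
struct FakeDnat {
    rules: Vec<String>,
    applies: usize,
}

impl FakeDnat {
    /// Create the rule only if no rule with this description exists.
    /// Deliberately does not apply; returns true when a rule was created.
    fn ensure_rule(&mut self, description: &str) -> bool {
        if self.rules.iter().any(|r| r == description) {
            return false;
        }
        self.rules.push(description.to_string());
        true
    }

    /// Push pending changes to the running firewall (only counted here).
    fn apply(&mut self) {
        self.applies += 1;
    }
}

/// Ensure every rule, then apply once at the end.
fn execute(cfg: &mut FakeDnat, descriptions: &[&str]) -> usize {
    let mut created = 0;
    for d in descriptions {
        if cfg.ensure_rule(d) {
            created += 1;
        }
    }
    cfg.apply();
    created
}
```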
Complete the refactoring of all opnsense-config modules:
- dnsmasq.rs: uses DnsmasqHost, DnsmasqDhcpRang, DhcpRangDomainType
structs + SettingsApi typed client for all CRUD operations
- load_balancer.rs: uses OpNsenseHaProxyServersServer,
OpNsenseHaProxyBackendsBackend, OpNsenseHaProxyFrontendsFrontend,
OpNsenseHaProxyHealthchecksHealthcheck structs with correct field
types (required String vs Option, u16 vs String, bool vs Option)
All 10 opnsense-config modules now have zero json!() in production
code. The only remaining json!() calls are in test mock responses.
Six more modules migrated to typed APIs:
- vlan.rs: uses VlansVlan struct + VlanSettingsApi, reads VlansResponse
- lagg.rs: uses LaggsLagg struct + LaggSettingsApi, reads LaggsResponse
- vip.rs: uses VirtualipVip struct + VipSettingsApi with typed enums
- caddy.rs: typed CaddyGeneralSettings struct instead of json!()
- tftp.rs: typed TftpSettings struct instead of json!()
- node_exporter.rs: typed NodeExporterSettings struct instead of json!()
All six modules now have zero json!() in production code.
Rewrite api_codegen to generate proper envelope-wrapping methods that
accept model structs directly. Callers no longer need to manually
construct RuleBody wrappers or extract UUIDs from raw JSON.
Key changes:
- Generated API clients wrap request bodies internally via serde rename
(e.g., add_rule(&my_rule) serializes as {"rule": {...}})
- Add shared SearchRow type to response.rs with label() and is_enabled()
helpers, eliminating per-module RuleSearchRow type conflicts
- Extract body_key from PHP controller addBase/setBase calls
- Rewrite dnat.rs and firewall.rs to use the typed API end-to-end:
search returns SearchResponse<SearchRow>, add returns UuidResponse,
set/del return StatusResponse — zero raw JSON in production code
- Add EnsureApi trait in firewall.rs for generic find-or-create pattern
The only remaining json!() calls in dnat.rs and firewall.rs are in test
mock responses, which is expected.
Replace manual json!() construction in ensure_filter_rule and
ensure_snat_rule_from with generated typed structs
(FirewallFilterRulesRule, FirewallFilterSnatrulesRule) and their
associated enums for action, direction, ip protocol.
Also generate typed API clients for FilterController, SourceNatController,
and OneToOneController. Add parse_controller_with_defaults for controllers
that inherit model fields from a parent class.
BINAT (one-to-one) ensure still uses a JSON Value because the generated
OnetooneRule struct needs further validation against the wire format.
Replace all manual json!() construction and raw Value parsing in
opnsense-config's dnat module with generated typed structs (NatRuleRule,
NatRuleRuleDestination, RuleIpprotocol, RulePass) and the typed DNatApi
client.
Also fix a regression in the parser where the custom *Field handler
trigger condition was too broad, causing InterfaceField/ProtocolField
to be incorrectly treated as ArrayField subclasses. Reverted the trigger
to only match children with type attributes.
The XML parser silently skipped container elements (like <source>,
<destination>) nested inside ArrayField nodes because it only processed
children with a type attribute. This caused generated structs to be
missing nested fields, forcing opnsense-config to use json!() macros
instead of typed structs.
- Add container handling in ArrayField and custom *Field child loops
- Add serialize function to opn_map serde helper (was deserialize-only)
- Change opn_map serde attribute from deserialize_with to with
- Regenerate all 9 model files with the fixes
NatRuleRule now correctly has source/destination/created/updated
container structs with all child fields.
- Remove unused `warn` import in pair integration example
- Add TODO comment for shared credentials limitation (ROADMAP/11)
- Add doc comments on DhcpServer::get_ip/get_host noting they return
primary's address, not the CARP VIP
- Fixed network topology diagram in pair README: 192.168.10.x -> 192.168.1.x
to match the actual code (OPNsense boots on .1 of 192.168.1.0/24)
- Added explanation of NIC juggling to the diagram section
- Updated single-VM "What's next" to link to pair example (was "in progress")
- Added opnsense_pair_integration to examples/README.md table and category
- Fixed VmInterface parsing: virsh domiflist has 5 columns (Interface,
Type, Source, Model, MAC), not 4. MAC is at index 4, not 3.
- Changed pair integration subnet to 192.168.1.0/24 to match OPNsense's
hard-coded default boot IP of .1.
Tested: full --full pair integration passes end-to-end with CARP VIP
configured on both firewalls (primary advskew=0, backup advskew=100).
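For reference, the five-column virsh domiflist layout behind the VmInterface fix can be parsed along these lines. The struct fields and `parse_domiflist` are illustrative names, not necessarily those used in the real code.

```rust
#[derive(Debug, PartialEq)]
struct VmInterface {
    name: String,
    kind: String,
    source: String,
    model: String,
    mac: String,
}

/// Parse `virsh domiflist` output, which has five columns:
/// Interface, Type, Source, Model, MAC (MAC is at index 4).
fn parse_domiflist(output: &str) -> Vec<VmInterface> {
    output
        .lines()
        .filter_map(|line| {
            let cols: Vec<&str> = line.split_whitespace().collect();
            // Skips blank lines, the dashed separator, and short rows.
            if cols.len() < 5 {
                return None;
            }
            // Skip the header row.
            if cols[0] == "Interface" && cols[4] == "MAC" {
                return None;
            }
            Some(VmInterface {
                name: cols[0].to_string(),
                kind: cols[1].to_string(),
                source: cols[2].to_string(),
                model: cols[3].to_string(),
                mac: cols[4].to_lowercase(),
            })
        })
        .collect()
}
```

Rows whose interface name is "-" (as virsh prints for a VM that is not running) are kept, since the MAC is still usable as an identifier.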
Boots two OPNsense VMs, bootstraps both with NIC juggling to handle
the .1 IP conflict, then applies FirewallPairTopology with CarpVipScore.
The bootstrap sequence:
1. Boot both VMs on shared LAN bridge
2. Disable backup's LAN NIC
3. Bootstrap primary on .1, change IP to .2
4. Swap NICs (disable primary, enable backup)
5. Bootstrap backup on .1, change IP to .3
6. Re-enable all NICs
7. Apply pair scores (CARP VIP, VLANs, firewall rules)
8. Verify via API on both firewalls
Supports --full flag for single-shot CI execution.
Adds set_interface_link() and list_interfaces() to KvmExecutor,
enabling programmatic up/down control of VM network interfaces by
MAC address.
This is essential for bootstrapping multiple VMs that boot with the
same default IP (e.g., OPNsense on 192.168.1.1) — disable all LAN
NICs, then enable and bootstrap one at a time.
Uses virsh domif-setlink and domiflist under the hood. Tested against
a live KVM VM.
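A minimal sketch of the virsh call behind set_interface_link(), assuming the interface is addressed by MAC. `LinkState` and `set_link_command` are hypothetical names; the real KvmExecutor method may have a different signature and may pass extra flags. The command is only built here, not executed.

```rust
use std::process::Command;

#[derive(Clone, Copy)]
enum LinkState {
    Up,
    Down,
}

/// Build `virsh domif-setlink <domain> <mac> up|down`.
/// virsh accepts either the target device name or the MAC address as
/// the interface argument; this sketch always passes the MAC.
fn set_link_command(domain: &str, mac: &str, state: LinkState) -> Command {
    let mut cmd = Command::new("virsh");
    cmd.arg("domif-setlink")
        .arg(domain)
        .arg(mac)
        .arg(match state {
            LinkState::Up => "up",
            LinkState::Down => "down",
        });
    cmd
}
```

Calling `.status()` on the returned Command would run it against the local libvirt daemon.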
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Major rewrite of OPNsense documentation to reflect the new unattended
workflow — no manual browser interaction required.
- Rewrote examples/opnsense_vm_integration/README.md: highlights --full
CI mode, documents OPNsenseBootstrap automated steps, lists system
requirements by distro
- Rewrote docs/use-cases/opnsense-vm-integration.md: removed manual
Step 3 (SSH/webgui), added Phase 2 bootstrap description, updated
architecture diagram with OPNsenseBootstrap layer
- Added OPNsense VM Integration to docs/README.md (was missing)
- Added OPNsense VM Integration to docs/use-cases/README.md (was missing)
- Added opnsense_vm_integration to examples/README.md quick reference
table and Infrastructure category (was missing, marked as recommended)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes:
- CSRF token parser now extracts <input> tags individually instead of
parsing whole lines, fixing the bug where <form name="iform"> on the
same line as the CSRF hidden input caused the wrong name to be extracted
- Add extract_selected_option() for <select> dropdowns (webguiproto,
ssl-certref), which extract_input_value() couldn't handle
- After webgui port change, explicitly restart lighttpd via SSH
(configctl webgui restart) as a safety net — the PHP configd call
can fail if lighttpd dies before executing it
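To illustrate the CSRF parser fix, here is a small std-only sketch that scans the page one <input> tag at a time, so a <form name="iform"> on the same line cannot supply the name. It assumes the token is the first hidden input with both name and value attributes, which is a simplification; `attr_value` and `extract_csrf` are hypothetical names.

```rust
/// Return the value of `attr="..."` inside a single tag, if present.
fn attr_value<'a>(tag: &'a str, attr: &str) -> Option<&'a str> {
    let needle = format!("{attr}=\"");
    let mut search = 0;
    while let Some(pos) = tag[search..].find(&needle) {
        let start = search + pos;
        let value_start = start + needle.len();
        // Require whitespace before the attribute so that `name=` does
        // not match inside something like `data-name=`.
        if tag[..start].ends_with(|c: char| c.is_whitespace()) {
            let end = tag[value_start..].find('"')?;
            return Some(&tag[value_start..value_start + end]);
        }
        search = value_start;
    }
    None
}

/// Walk the page tag by tag and return (name, value) of the first
/// hidden <input> that carries both attributes.
fn extract_csrf(html: &str) -> Option<(String, String)> {
    let mut rest = html;
    while let Some(open) = rest.find("<input") {
        let after = &rest[open..];
        let close = after.find('>')?;
        let tag = &after[..=close];
        if attr_value(tag, "type") == Some("hidden") {
            if let (Some(n), Some(v)) = (attr_value(tag, "name"), attr_value(tag, "value")) {
                return Some((n.to_string(), v.to_string()));
            }
        }
        rest = &after[close + 1..];
    }
    None
}
```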
Adds:
- diagnose_via_ssh() reports webgui config, listening ports, lighttpd
process status, and configctl status — invaluable for troubleshooting
- Diagnostic output is shown automatically when wait_for_ready() fails
Tested: full --boot + integration test passes end-to-end with zero
manual interaction on a fresh OPNsense 26.1 VM.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the manual browser steps (wizard, SSH, webgui port) with
automated OPNsenseBootstrap calls. Adds --full flag for CI-friendly
single-shot boot + test.
Working: login, wizard abort, SSH enable with root+password auth.
In progress: webgui port change (lighttpd falls back to port 80 —
needs fix for <select> dropdown extraction and CSRF token refresh).
Also adds:
- diagnose_via_ssh() for troubleshooting webgui status
- restart_webgui_via_ssh() safety net after port changes
- CSRF parser fix for same-line form+input HTML (real OPNsense layout)
- cookie_store(true) for reliable session management
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Automates OPNsense initial setup via HTTP session authentication,
eliminating manual browser interaction. The module:
- Logs in with username/password (handles CSRF token extraction)
- Aborts the initial setup wizard via /api/core/initial_setup/abort
- Enables SSH with root login and password auth
- Changes the web GUI port (fire-and-forget, handles server restart)
- Provides wait_for_ready() polling helper
Uses reqwest with cookie jar for session management. No browser or
external dependencies needed — pure Rust HTTP client approach.
Includes unit tests for CSRF token extraction and HTML parsing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests cover:
- ensure_ready outcome merging (both Success)
- CarpVipScore applies VIPs to both firewalls with correct advskew
- CarpVipScore custom backup_advskew is respected
- CarpVipScore defaults backup_advskew to 100 when unset
- VlanScore uniform delegation applies to both firewalls
Uses httptest mock HTTP servers to intercept OPNsense API calls
without requiring real firewall devices. Adds httptest dev-dependency
to harmony crate and a #[cfg(test)] from_config constructor on
OPNSenseFirewall for test-friendly instantiation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces a higher-order topology that wraps two OPNSenseFirewall
instances (primary + backup) and orchestrates score application across
both. CARP VIPs get differentiated advskew values (primary=0,
backup=configurable) while all other scores apply identically to both
firewalls.
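The advskew differentiation boils down to something like the following sketch. `Role` and `advskew_for` are hypothetical names; the values (primary 0, backup defaulting to 100) are the ones used by the pair topology.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Role {
    Primary,
    Backup,
}

/// Default advskew for the backup firewall when none is configured.
const DEFAULT_BACKUP_ADVSKEW: u8 = 100;

/// Pick the CARP advskew for one member of the pair. The lower value
/// wins the master election, so the primary always gets 0.
fn advskew_for(role: Role, backup_advskew: Option<u8>) -> u8 {
    match role {
        Role::Primary => 0,
        Role::Backup => backup_advskew.unwrap_or(DEFAULT_BACKUP_ADVSKEW),
    }
}
```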
Includes CarpVipScore, DhcpServer delegation, pair Score impls for all
existing OPNsense scores, and opnsense_from_config() factory method.
Also adds ROADMAP entries for generic firewall trait (10), delegation
macro, integration tests, and named config instances (11).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add wait_for_ready field (default: true) to PostgreSQLConfig. When
enabled, K8sPostgreSQLInterpret waits for the cluster's -rw service
to exist after applying the Cluster CR, ensuring callers like
get_endpoint() succeed immediately.
This eliminates the retry loop in the harmony_sso example's
deploy_zitadel() -- ZitadelScore now deploys in a single pass because
the PG service is guaranteed to exist before Zitadel's Helm chart
init job tries to connect.
The deploy_zitadel function shrinks from a 5-attempt retry loop to a
simple score.interpret() call.
Move CoreDNS rewrite logic into a reusable Score at
harmony/src/modules/k8s/coredns.rs. The Score patches CoreDNS on
K3sFamily clusters to add name rewrite rules (e.g., mapping
sso.harmony.local to the in-cluster service FQDN).
Applies to K3sFamily/Default only and is a no-op on OpenShift. Idempotent.
The harmony_sso example now uses CoreDNSRewriteScore.interpret()
instead of an inline function.
- Remove unused serde default functions in ZitadelSetupScore
- Replace redundant closures with function references (InterpretError::new)
- Allow dead_code on AppSearchEntry.id (needed for deserialization)
- Fix empty line after doc comment in ZitadelScore
- Remove unneeded return statement in generate_secure_password
New roadmap phase covering the hardening path for the SSO config
management stack: builder pattern for OpenbaoSecretStore, ZitadelScore
PG readiness fix, CoreDNSRewriteScore, integration tests, and future
capability traits.
Updates current state to reflect implemented Zitadel OIDC integration
and harmony_sso example.
- OpenbaoSetupScore: verify vault init state before trusting cached
keys (handles cluster recreation with stale local keys file)
- ZitadelSetupScore: trim PAT whitespace (K8s secret had trailing
newline that corrupted the Authorization header)
- ZitadelOidcAuth: resolve SSO hostname to 127.0.0.1 via reqwest
resolve() so device flow works without /etc/hosts entries
- Fix OIDC discovery URL to include port (Zitadel issuer is
http://sso.harmony.local:8080, not http://sso.harmony.local)
The full SSO flow now works end-to-end: deploy, provision identity,
configure JWT auth, trigger device flow. User sees verification URL
and code in the terminal.
The example now deploys the complete SSO stack and uses it:
Phase 1: Deploy OpenBao + basic setup (init, unseal, policies, users)
Phase 2: CoreDNS patch + Deploy Zitadel + ZitadelSetupScore (creates
project + device-code app) + OpenBao JWT auth (with real client_id)
Phase 3: Store config via SSO-authenticated OpenBao (triggers device
flow on first run, uses cached session on re-run)
Removed --demo and --sso-demo flags. The default run IS the demo.
Kept --skip-zitadel and --cleanup.
On re-run: all deployments are idempotent, cached OIDC session is
reused, config is loaded from OpenBao without login prompt.
Fix the core SSO authentication flow: instead of storing the Zitadel
access_token as the OpenBao token (which OpenBao doesn't recognize),
exchange the id_token with OpenBao's JWT auth method via
POST /v1/auth/{mount}/login to get a real OpenBao client token.
Changes:
- ZitadelOidcAuth: add openbao_url, jwt_auth_mount, jwt_role fields
- New exchange_jwt_for_openbao_token() method using reqwest (vaultrs
0.7.4 has no JWT auth module)
- process_token_response() now exchanges id_token when openbao_url is
set, falls back to access_token for backward compat
- OpenbaoSecretStore::new() accepts optional jwt_role + jwt_auth_mount
- All callers updated (lib.rs, openbao_chain example, harmony_sso)
This implements ADR 020-1 Step 6 (OpenBao JWT exchange).
New Score that provisions identity resources in a deployed Zitadel
instance via the Management API v1:
- Create projects
- Create OIDC applications (device-code grant for CLI/headless)
- Provision machine users (stubbed for future iteration)
Authenticates using the admin PAT from the iam-admin-pat K8s secret
(provisioned automatically by the Zitadel Helm chart). No password
extraction or deprecated grant types needed.
All operations are idempotent: checks for existing resources before
creating. Results cached at ~/.local/share/harmony/zitadel/client-config.json.
This is the "day two" counterpart to ZitadelScore, enabling enterprise
automation of identity management (users, machines, applications, groups).
Replace OnceCell<Discovery> with RwLock<Option<Arc<Discovery>>> so the
cache can be cleared after installing CRDs or operators that register
new API groups.
Add invalidate_discovery() method. Call it in ensure_cnpg_operator()
after confirming the Cluster CRD is registered, so the subsequent
apply() call sees the new CRD without needing a fresh client.
This eliminates the "Cannot resolve GVK" retry loop -- PostgreSQL
Cluster resources now apply on the first attempt after CNPG operator
installation.
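The invalidatable cache pattern can be sketched with std types as follows. This is a blocking illustration only: `Discovery` here is a dummy struct, `DiscoveryCache` and `get_or_load` are hypothetical names, and the real client is presumably async and may use a different lock type.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for the API discovery data held by the client.
#[derive(Debug)]
struct Discovery {
    groups: Vec<String>,
}

struct DiscoveryCache {
    inner: RwLock<Option<Arc<Discovery>>>,
}

impl DiscoveryCache {
    fn new() -> Self {
        Self { inner: RwLock::new(None) }
    }

    /// Return the cached discovery, running `load` only when empty.
    fn get_or_load(&self, load: impl FnOnce() -> Discovery) -> Arc<Discovery> {
        // Clone the Option<Arc<..>> so the read guard is released at
        // the end of this statement.
        let cached = self.inner.read().unwrap().clone();
        if let Some(d) = cached {
            return d;
        }
        let mut slot = self.inner.write().unwrap();
        // Another thread may have filled the slot in the meantime.
        if let Some(d) = slot.as_ref() {
            return Arc::clone(d);
        }
        let fresh = Arc::new(load());
        *slot = Some(Arc::clone(&fresh));
        fresh
    }

    /// Drop the cached value, e.g. after a new CRD was registered.
    fn invalidate_discovery(&self) {
        *self.inner.write().unwrap() = None;
    }
}
```

Unlike a OnceCell, the Option inside the lock can be reset, and readers holding an old Arc keep a consistent snapshot until they fetch again.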
Patch CoreDNS on K3sFamily to add rewrite rules that map external
hostnames (sso.harmony.local, bao.harmony.local) to cluster service
FQDNs. This allows OpenBao's JWT auth to fetch Zitadel's JWKS from
inside the cluster, where Zitadel validates Host headers against its
ExternalDomain.
Uses apply_dynamic with force_conflicts since the CoreDNS ConfigMap
is owned by the k3d deployer. Restarts CoreDNS pods after patching.
No-op on non-K3sFamily distributions (OpenShift, etc.).
Idempotent: skips patching if rewrite rules already present.
The CNPG operator deployment being ready does not guarantee that the
Cluster CRD is registered in the API server's discovery cache. This
caused intermittent "Cannot resolve GVK: postgresql.cnpg.io/v1/Cluster"
errors when applying PostgreSQL Cluster resources immediately after
operator installation.
Add wait_for_crd() to harmony-k8s that polls has_crd() until the CRD
appears (2s interval, 60s timeout). Call it in ensure_cnpg_operator()
after the deployment readiness check.
This eliminates the need for retry loops in callers like harmony_sso.
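The polling behaviour of wait_for_crd() (check, sleep for the interval, give up at the timeout) is roughly the loop below. This is a blocking sketch with a hypothetical `wait_until` helper; the real method is presumably async and calls has_crd() where the closure is invoked.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `check` every `interval` until it returns true or `timeout`
/// would be exceeded. Ok carries the elapsed time on success, Err the
/// elapsed time when giving up.
fn wait_until(
    mut check: impl FnMut() -> bool,
    interval: Duration,
    timeout: Duration,
) -> Result<Duration, Duration> {
    let start = Instant::now();
    loop {
        if check() {
            return Ok(start.elapsed());
        }
        // Stop if the next sleep would take us past the deadline.
        if start.elapsed() + interval > timeout {
            return Err(start.elapsed());
        }
        sleep(interval);
    }
}
```

With the values from this commit the call would be `wait_until(check, Duration::from_secs(2), Duration::from_secs(60))`.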
Replace ~200 lines of manual init/unseal/configure/jwt-auth code with
a single OpenbaoSetupScore invocation. The deployment path is now:
1. OpenbaoScore (Helm deploy)
2. OpenbaoSetupScore (init, unseal, policies, users, JWT auth)
3. ZitadelScore (CNPG + Helm, with retry)
The example main.rs goes from ~800 lines to ~370 lines. The removed
imperative logic now lives in the reusable OpenbaoSetupScore which can
be tested against any topology.
New Score that handles the operational complexity of bringing a deployed
OpenBao instance into a usable state:
- Init (operator init) with local key storage (~/.local/share/harmony/openbao/)
- Unseal (3 of 5 keys)
- Enable KV v2 secrets engine
- Create configurable policies (HCL)
- Enable userpass auth and create users
- Optional JWT auth configuration for OIDC integration
All steps are idempotent. Requires T: Topology + K8sclient.
This encapsulates the tribal knowledge of OpenBao lifecycle management
into a compiled, type-checked Score that can be tested against any
topology (k3d, OpenShift, kubeadm, bare metal).
docs/guides/writing-a-score.md:
- Add Design Principles section: capabilities are industry concepts not
tools, Scores encapsulate operational complexity, idempotency rules,
no execution order dependencies
CLAUDE.md:
- Add Capability and Score Design Rules section with the swap test:
if swapping the underlying tool breaks Scores, the capability
boundary is wrong
Replace all Command::new("kubectl") calls with harmony-k8s K8sClient
methods:
- wait_for_pod_ready() instead of kubectl get pod jsonpath
- exec_pod_capture_output() for OpenBao init/unseal/configure
- delete_resource<MutatingWebhookConfiguration>() for webhook cleanup
- port_forward() instead of kubectl port-forward subprocess
Thread K3d and K8sClient through all functions instead of
reconstructing context strings. Consolidate path helpers into
harmony_data_dir().
Add Zitadel deployment via ZitadelScore with retry logic for CNPG CRD
registration race and PostgreSQL cluster readiness timing.
Add CLI flags: --demo, --sso-demo, --skip-zitadel, --cleanup.
Add --demo mode: ConfigManager with EnvSource + StoreSource<OpenbaoSecretStore>.
Configure OpenBao with harmony-dev policy, userpass auth, and JWT auth.
harmony-k8s:
- exec_pod() and exec_pod_capture_output(): exec commands in pods by
name (not just label), with proper stdout/stderr capture
- delete_resource<K>(): generic typed delete using ScopeResolver,
idempotent (404 = Ok)
- port_forward(): native port forwarding via kube-rs Portforwarder +
tokio TcpListener, replacing kubectl subprocess. Returns
PortForwardHandle that auto-aborts on drop.
k3d:
- base_dir(), cluster_name(), context_name() public getters
Also adds tokio "net" feature to workspace for TcpListener.
VIP: Fix subnet matching from starts_with() to exact equality. Previously
"192.168.1.10" would wrongly match a request for "192.168.1.100".
LAGG: Add config diff detection when updating existing LAGGs. Logs a
warning with previous config when protocol, description, or MTU differs
from desired state.
Firewall: Detect duplicate rules with same description and warn. When
multiple rules share a description, updates the first one and logs a
warning suggesting unique descriptions.
7 new tests proving:
- VIP exact subnet match (rejects prefix match, finds exact, mode check)
- Firewall create/update/duplicate/different-description scenarios
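The VIP matching bug is easy to reproduce in isolation. The sketch below contrasts a prefix comparison with exact equality; the function names are hypothetical and the argument order of the original starts_with() call is an assumption, chosen so that it reproduces the case quoted in this commit.

```rust
/// Buggy variant: treats an existing VIP as a match when the requested
/// address merely starts with it.
fn find_vip_prefix<'a>(existing: &[&'a str], requested: &str) -> Option<&'a str> {
    existing
        .iter()
        .copied()
        .find(|addr| requested.starts_with(*addr))
}

/// Fixed variant: only an identical address counts as a match.
fn find_vip_exact<'a>(existing: &[&'a str], requested: &str) -> Option<&'a str> {
    existing.iter().copied().find(|addr| *addr == requested)
}
```

With an existing VIP of 192.168.1.10, a request for 192.168.1.100 is matched by the prefix variant and would update the wrong entry, while the exact variant reports no match and lets a new VIP be created.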
Move vendor-neutral firewall and network types (FirewallAction, Direction,
IpProtocol, NetworkProtocol, VipMode, LaggProtocol) from harmony Score
modules to harmony_types::firewall as industry-standard IaC types.
Display impls use human-readable names (IPv4, CARP, LACP) — not wire
format. OPNsense-specific wire translations live in opnsense-api::wire
via the ToOPNsenseValue trait ("inet", "carp", "lacp").
Dependency chain: harmony_types → opnsense-api → opnsense-config → harmony.
Users import types from harmony_types, translations happen transparently
in the infrastructure layer.
Includes 6 new tests verifying all wire value translations.
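The split between human-readable Display output and OPNsense wire values can be sketched like this. The method name `to_opnsense_value` and the Inet6 strings ("IPv6", "inet6") are assumptions for illustration; only the trait name and the IPv4/"inet" pair are taken from this commit.

```rust
use std::fmt;

/// Vendor-neutral type, as it would live in harmony_types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum IpProtocol {
    Inet,
    Inet6,
}

/// Human-readable names for logs and UIs, not the wire format.
impl fmt::Display for IpProtocol {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            IpProtocol::Inet => "IPv4",
            IpProtocol::Inet6 => "IPv6",
        })
    }
}

/// Vendor-specific translation, as it would live in opnsense-api::wire.
trait ToOPNsenseValue {
    fn to_opnsense_value(&self) -> &'static str;
}

impl ToOPNsenseValue for IpProtocol {
    fn to_opnsense_value(&self) -> &'static str {
        match self {
            IpProtocol::Inet => "inet",
            IpProtocol::Inet6 => "inet6",
        }
    }
}
```

Keeping the trait in the vendor crate means another firewall backend could define its own translation without touching the shared types.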
Add shared enums for firewall, NAT, and LAGG Score definitions:
- FirewallAction (Pass, Block, Reject)
- Direction (In, Out)
- IpProtocol (Inet, Inet6) — shared across filter, SNAT, DNAT
- NetworkProtocol (Tcp, Udp, TcpUdp, Icmp, Any) — shared across all rule types
- LaggProtocol (Lacp, Failover, LoadBalance, RoundRobin, None)
Combined with the VipMode enum from the previous commit, all OPNsense
Score definitions now use proper types instead of raw strings. Typos in
mode/action/direction/protocol fields are now compile-time errors.
Replace the stringly-typed mode field in VipDef with a VipMode enum
(IpAlias, Carp, ProxyArp). Prevents typos and makes the API discoverable
through IDE autocompletion. The as_api_str() method converts to the wire
format expected by OPNsense.
New use-case tutorial walking newcomers through the full OPNsense VM
integration test: system setup, VM boot, SSH config, running all 11
Scores, and understanding the three-layer architecture.
Add architecture-challenges.md analyzing topology evolution during
deployment, runtime plan/validation phase, and TUI as primary interface.
Add Phase 7 (OPNsense & Bare-Metal Network Automation) tracking current
progress on OPNsense Scores, codegen, and Brocade integration. Details
the UpdateHostScore requirement and HostNetworkConfigurationScore rework
needed for LAGG LACP 802.3ad.
Add Phase 8 (HA OKD Production Deployment) describing the target
architecture with LAGG/CARP/multi-WAN/BINAT and validation checklist.
Update current state section to reflect opnsense-codegen branch progress.
Every Score execution now logs its status and elapsed time after
completion. The timing is measured in Score::interpret (the central
execution path) so it applies to all Scores automatically.
Example output:
[VlanScore] SUCCESS in 0.9s — Created 2 VLANs
[DhcpScore] SUCCESS in 1.8s — Dhcp execution successful
[LoadBalancerScore] FAILED after 45.3s — connection refused
Replace manual scp prompts in bootstrap_02 and ipxe with automated
StaticFilesHttpScore uploads. SCOS installer images and HTTP boot files
now upload via SFTP without operator intervention.
Implement wait_for_bootstrap_complete by shelling out to
openshift-install wait-for bootstrap-complete with stdout/stderr logging.
Previously this was a todo!() that would panic and crash mid-deployment.
Add [Stage 02/Bootstrap] prefixes to all bootstrap_02 log messages.
Improve bootstrap_okd_node outcome to include per-host details with
MAC addresses.
Replace todo!() in OPNSenseFirewall HTTP and TFTP serve_files with
download-then-upload logic. When a Url::Url is provided, download the
remote file to a temp directory via reqwest, then upload to OPNsense
via the existing SFTP path.
Enables StaticFilesHttpScore and TftpScore to serve files from remote
URLs (e.g. S3) in addition to local folders.
Wire the existing dnsmasq remove_static_mapping through the OPNSenseFirewall
infra layer. Add list_static_mappings at both config and infra layers for
querying current DHCP host entries. Includes 6 new unit tests with httptest
mocks covering empty, single/multi-MAC, multiple hosts, and skip edge cases.
Foundation for the upcoming UpdateHostScore.
Add VIP (IP alias / CARP) and destination NAT (port forwarding) Scores.
Update VM to 4 NICs (LAN, WAN, LAGG member 1, LAGG member 2) so LAGG
can be tested with failover protocol on vtnet2+vtnet3.
All 11 Scores pass end-to-end against OPNsense VM:
- LoadBalancerScore, DhcpScore, TftpScore, NodeExporterScore
- VlanScore (2 VLANs on vtnet0)
- FirewallRuleScore (filter rule with gateway support)
- OutboundNatScore (SNAT), BinatScore (1:1 NAT)
- VipScore (IP alias on LAN)
- DnatScore (port forward 8443→192.168.1.50:443)
- LaggScore (failover LAGG on vtnet2+vtnet3)
Add Scores for managing OPNsense new-generation firewall filter rules,
outbound NAT (SNAT), and 1:1 NAT (BINAT) via the REST API.
- opnsense-config: firewall.rs module with idempotent CRUD for filter
rules, SNAT rules, and BINAT rules (match by description)
- harmony: FirewallRuleScore (with gateway support for multi-WAN),
OutboundNatScore, BinatScore
- All 3 tested end-to-end against OPNsense VM, idempotent on re-run
- Integration test now exercises 8 Scores total
Fix codegen to handle FilterRuleField, SourceNatRuleField, and other
custom *Field types that extend ArrayField. When an XML element has
a custom type AND child elements with type attributes, recursively
parse children into struct fields instead of falling back to
Option<String> stubs.
Also fix hyphenated field names (state-policy → state_policy with
serde rename) and avoid enum name collisions by using the full struct
name as prefix for custom *Field enums.
Regenerated firewall_filter.rs: now has full FirewallFilterRulesRule
(60+ fields including action, direction, gateway, source/dest nets),
FirewallFilterSnatrulesRule, FirewallFilterNptRule,
FirewallFilterOnetooneRule.
New generated modules:
- vip.rs — Virtual IPs (CARP, IP aliases, ProxyARP)
- firewall_alias.rs — Firewall aliases (host, network, port, URL, GeoIP)
- firewall_dnat.rs — Destination NAT / port forwarding rules
Add VLAN and LAGG management via the OPNsense REST API:
- opnsense-config: vlan.rs and lagg.rs modules with idempotent CRUD
- harmony: VlanScore and LaggScore with OPNSenseFirewall integration
- VlanScore tested end-to-end against OPNsense VM (2 VLANs on vtnet0)
- LaggScore implemented but not VM-testable (needs physical NICs)
- Handle OPNsense select widget fields in VLAN interface responses
- Use direct post_typed calls (addItem/setItem/delItem/reconfigure)
Run LoadBalancerScore, DhcpScore, TftpScore, and NodeExporterScore
against a real OPNsense VM to prove the XML→API migration works.
- Add Router impl for OPNSenseFirewall (gateway + /24 CIDR)
- Fix TFTP/NodeExporter API controller paths (general, not settings)
- Fix TFTP/NodeExporter body wrapper key (general, not module name)
- Fix dnsmasq DHCP range API endpoint (Range, not DhcpRang)
- Fix dnsmasq deserialization for OPNsense select widgets and empty []
- Fix DhcpHostBindingInterpret error propagation (was todo!())
- Expand VM integration example with all 4 Scores + API verification
Add Config::from_credentials_with_api_port() and
OPNSenseFirewall::with_api_port() so the API port is not hardcoded
to 443. This allows running HAProxy on standard ports without
conflicting with the OPNsense web UI.
The integration example now instructs users to change the web GUI
port to 9443 (System > Settings > Administration > TCP Port) as
part of the manual setup, alongside enabling SSH.
The --status command detects whether the API is on 443 or 9443
and advises accordingly.
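A rough sketch of how such port detection could work, using only the standard library. The function names are hypothetical and the probe is a plain TCP connect; the real --status command may check HTTPS or the API itself.

```rust
use std::net::{IpAddr, SocketAddr, TcpStream};
use std::time::Duration;

/// Return the first candidate port for which `probe` reports success.
/// The probe is injected so the selection logic can be tested offline.
pub fn first_reachable_port<F>(candidates: &[u16], mut probe: F) -> Option<u16>
where
    F: FnMut(u16) -> bool,
{
    candidates.iter().copied().find(|&port| probe(port))
}

/// TCP-level probe: true if something accepts connections on `ip:port`.
/// This does not verify TLS or that the OPNsense API is actually ready.
pub fn tcp_probe(ip: IpAddr, port: u16, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&SocketAddr::new(ip, port), timeout).is_ok()
}

/// Try the custom port (for example 9443) first, then fall back to 443.
pub fn detect_api_port(ip: IpAddr, custom_port: u16) -> Option<u16> {
    first_reachable_port(&[custom_port, 443], |port| {
        tcp_probe(ip, port, Duration::from_secs(2))
    })
}
```

Passing the probe as a closure keeps the ordering logic independent of the network, which is why the first function carries no I/O.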
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document the full workflow, network architecture, manual SSH step,
Docker compatibility, known issues, and future improvements.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When OPNsense is on a base version that needs updating before packages
can install, attempt a firmware update and retry. Use high ports
(16443/18443) for test HAProxy services to avoid conflicting with
the OPNsense web UI on port 443.
Known issue: firmware update on a fresh 26.1 nano image may need
a manual reboot cycle before packages install successfully.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The hand-written HaproxyGetResponse structs used HashMap which fails
when OPNsense returns [] for empty collections. The generated types
in opnsense-api handle this via opn_map, but opnsense-config had
duplicated structs without that fix.
Replace all hand-written HAProxy response types with serde_json::Value
traversal. This avoids the duplication and handles the []/{} duality.
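The []/{} handling can be illustrated with a small std-only sketch. The `Value` enum below is a deliberately reduced stand-in for `serde_json::Value` (with the real crate, `Value::as_object` plays the same role), and `collection_entries` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Reduced stand-in for `serde_json::Value`, just enough for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// List the (uuid, item) pairs of an OPNsense collection node.
/// A populated collection arrives as an object keyed by UUID; an empty one
/// may arrive as `[]`. Anything that is not an object is treated as empty
/// instead of failing the way a strict HashMap deserializer would.
pub fn collection_entries(node: &Value) -> Vec<(&str, &Value)> {
    match node {
        Value::Object(map) => map.iter().map(|(k, v)| (k.as_str(), v)).collect(),
        _ => Vec::new(),
    }
}
```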
Also fix integration example:
- Use high ports (16443, 18443) to avoid conflicting with web UI on 443
- Skip package install if already installed
- Use harmony_cli::cli_logger::init() instead of env_logger (safe to
call multiple times)
- Increase verification timeout to 60s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docker sets the iptables FORWARD policy to DROP, which blocks libvirt's
NAT networking: libvirt defaults to the nftables backend, whose rules
do not interact with Docker's iptables chains.
Fix: setup-libvirt.sh now detects Docker and offers to switch libvirt
to the iptables firewall backend, so both sets of rules coexist.
The --check command warns about this mismatch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restructure the example into two clear phases:
Phase 1 (--boot): creates KVM network + VM, waits for web UI,
prints instructions for enabling SSH via the OPNsense GUI.
Phase 2 (default run): checks SSH is reachable, creates API key,
installs HAProxy, runs LoadBalancerScore, verifies via API.
The config.xml injection sets vtnet0=LAN (192.168.1.1) and
vtnet1=WAN (DHCP). SSH must be enabled manually in the web UI
because OPNsense has no REST API for SSH management and the
config.xml injection doesn't reliably enable sshd.
Future: use a pre-customized OPNsense image on S3 for CI.
Also add show_ssh_config example to opnsense-api crate.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add opnsense::image module for customizing OPNsense nano disk images:
- find_config_offset(): scans raw image for config.xml location
- replace_config_xml(): overwrites config with null-padded replacement
- minimal_config_xml(): generates WAN+LAN config for virtio NICs
- Supports auto-scanning for unknown images
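A minimal sketch of the scan-and-overwrite idea on an in-memory image. The names `find_config_span` and `replace_in_slot` and their signatures are assumptions for illustration, not the module's actual API; the sketch also assumes the config is delimited by the `<opnsense>` root element and that the caller knows the slot length.

```rust
/// Byte offset of the first occurrence of `needle` in `haystack`.
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() || needle.len() > haystack.len() {
        return None;
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Locate the embedded config: returns (offset, length) of the span from
/// `<opnsense>` through the end of `</opnsense>`.
pub fn find_config_span(image: &[u8]) -> Option<(usize, usize)> {
    const OPEN: &[u8] = b"<opnsense>";
    const CLOSE: &[u8] = b"</opnsense>";
    let start = find_bytes(image, OPEN)?;
    let close_rel = find_bytes(&image[start..], CLOSE)?;
    Some((start, close_rel + CLOSE.len()))
}

/// Overwrite a fixed-size slot with `new_config`, padding the rest with
/// NUL bytes so the image size and surrounding data are unchanged.
pub fn replace_in_slot(
    image: &mut [u8],
    offset: usize,
    slot_len: usize,
    new_config: &[u8],
) -> Result<(), String> {
    let end = offset
        .checked_add(slot_len)
        .filter(|&e| e <= image.len())
        .ok_or("slot exceeds image bounds")?;
    if new_config.len() > slot_len {
        return Err(format!(
            "config is {} bytes but the slot holds only {}",
            new_config.len(),
            slot_len
        ));
    }
    let slot = &mut image[offset..end];
    slot[..new_config.len()].copy_from_slice(new_config);
    slot[new_config.len()..].fill(0);
    Ok(())
}
```

Refusing an oversized replacement matters here: writing past the slot would corrupt whatever the filesystem stores next to the config.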
KVM improvements:
- disk_from_path(): attach existing disk images (not just new volumes)
- start_vm() now idempotent (skips if already running)
- cdrom uses SATA bus instead of IDE (q35 compatibility)
Integration example updates:
- LAN on 192.168.1.0/24 (matches OPNsense defaults, host reachable)
- WAN on libvirt default network (internet access)
- Config.xml injection replaces em0/em1 with vtnet0/vtnet1
- API key creation via PHP script (writes to file, avoids escaping)
Status: VM boots, web UI responds at 192.168.1.1, interfaces assigned.
Remaining: SSH enablement in config.xml, API key creation, WAN subnet.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add an interactive setup script that installs packages, adds the user
to the libvirt group, starts libvirtd, and creates the default storage
pool. It asks before each step (or run with --yes for non-interactive
mode).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add prerequisite checking (libvirtd, group membership, storage pool,
bunzip2) with clear error messages and fix suggestions.
Add --setup to print the exact sudo commands needed for initial setup.
Add --download to pre-fetch and decompress the OPNsense nano image.
Full flow: download image → create network with DHCP → boot VM →
discover IP via libvirt lease → wait for API → create API key via
SSH → install HAProxy + Caddy → run LoadBalancerScore → verify.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
KVM module enhancements:
- Add vm_ip() and wait_for_ip() to KvmExecutor using
Domain::interface_addresses() for DHCP IP discovery
- Add DHCP range and static host entries to NetworkConfig/NetworkConfigBuilder
- Generate DHCP XML in network definitions for libvirt's built-in DHCP
- Export DhcpHost type
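For illustration, the DHCP fragment of a libvirt network definition could be rendered with a string template like the one below. The field layout of `DhcpHost` and the function name `dhcp_xml` are assumptions made for this sketch; only the `<dhcp>`, `<range>`, and `<host>` element shapes come from libvirt's network XML format.

```rust
/// Static DHCP reservation. The fields are assumed for this sketch and
/// may differ from the exported `DhcpHost` type.
pub struct DhcpHost {
    pub mac: String,
    pub name: String,
    pub ip: String,
}

/// Render the `<dhcp>` element that sits inside a network's `<ip>` element.
/// Values are interpolated as-is, so callers must pass trusted input
/// (no XML escaping is performed here).
pub fn dhcp_xml(range_start: &str, range_end: &str, hosts: &[DhcpHost]) -> String {
    let mut xml = String::from("<dhcp>\n");
    xml.push_str(&format!(
        "  <range start='{range_start}' end='{range_end}'/>\n"
    ));
    for host in hosts {
        xml.push_str(&format!(
            "  <host mac='{}' name='{}' ip='{}'/>\n",
            host.mac, host.name, host.ip
        ));
    }
    xml.push_str("</dhcp>\n");
    xml
}
```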
OPNsense VM integration example (opnsense-vm-integration):
- Boots OPNsense nano VM via KVM
- Discovers IP via libvirt DHCP lease query
- Creates API key via SSH
- Installs HAProxy + Caddy via firmware API
- Runs LoadBalancerScore (2 services: K8s API + HTTPS)
- Verifies HAProxy configuration via API
22 KVM unit tests pass (3 new DHCP tests).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Explain why we use string templates for libvirt XML generation and
what the path to typed structs looks like. The best candidate is
libvirt-rust-xml (gen branch) which generates Rust structs from
libvirt's RelaxNG schemas via relaxng-gen, but it doesn't compile
yet (virtxml-domain has 6 errors as of baca481).
Also fix dead code in format_cdrom (redundant device_type branch).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add comprehensive XML generation tests covering: multi-disk VMs,
multi-NIC configurations, MAC addresses, boot order, memory conversion,
sequential disk naming, custom storage pools, NAT/route/isolated
networks, volume sizing, builder defaults, q35 machine type, and
serial console.
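Two of the behaviours under test, sequential disk naming and memory conversion, can be sketched as below. Both helper names are hypothetical, and the sketch assumes virtio targets (vda, vdb, ...) and a MiB-to-KiB conversion for the domain XML; the module's actual naming scheme and units may differ.

```rust
/// Target device name for the disk at `index` (0-based), following the
/// usual Linux pattern: vda..vdz, then vdaa, vdab, and so on.
pub fn disk_target(index: usize) -> String {
    let mut n = index + 1;
    let mut suffix = Vec::new();
    // Bijective base-26: there is no "zero" letter, hence the decrement.
    while n > 0 {
        n -= 1;
        suffix.push(b'a' + (n % 26) as u8);
        n /= 26;
    }
    suffix.reverse();
    format!("vd{}", String::from_utf8(suffix).expect("ASCII letters only"))
}

/// Convert a memory size in MiB to KiB.
pub fn mib_to_kib(mib: u64) -> u64 {
    mib * 1024
}
```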
Add kvm-vm-examples binary with 5 scenarios:
- alpine: minimal 512MB VM, fast boot for testing
- ubuntu: standard server with 25GB disk
- worker: multi-disk (60G OS + 2x100G Ceph OSD) for storage nodes
- gateway: dual-NIC (WAN NAT + LAN isolated) for firewall/router
- ha-cluster: full 7-VM deployment (gateway + 3 CP + 3 workers)
Each scenario has clean and status subcommands.
19 KVM unit tests pass (17 new + 2 existing).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add integration tests that verify the full stack against a real OPNsense
VM. Tests are #[ignore]d by default — run with:
OPNSENSE_TEST_URL=https://10.99.99.1/api \
OPNSENSE_TEST_KEY=key OPNSENSE_TEST_SECRET=secret \
cargo test -p opnsense-api --test e2e_test -- --ignored
Tests cover:
- Firmware: status, package list
- Dnsmasq: settings/get, CRUD host lifecycle, add_static_mapping via config
- HAProxy: settings/get, CRUD server, configure_service + idempotency
- VLAN, WireGuard, Firewall: settings/get
Each test cleans up after itself. Do NOT run against production.
Also make DhcpConfigDnsMasq::new and LoadBalancerConfig::new pub for
external test usage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 14 unit tests covering the critical business logic:
Dnsmasq (11 tests):
- add_static_mapping: create new, update by IP, update by hostname,
hostname/domain splitting, duplicate MAC handling
- Conflict detection: IP/hostname in different entries, multiple matches
- remove_static_mapping: partial remove, full delete, case insensitivity
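As a rough model of the logic these tests exercise, the sketch below splits an FQDN and decides between create, update, and conflict. All names (`HostEntry`, `MappingPlan`, `plan_mapping`, `split_fqdn`) are hypothetical, and the matching rules (exact IP, case-insensitive hostname) are assumptions, not the crate's actual behaviour.

```rust
/// Existing host entry, reduced to the two fields the sketch matches on.
pub struct HostEntry {
    pub ip: String,
    pub hostname: String,
}

#[derive(Debug, PartialEq)]
pub enum MappingPlan {
    /// No existing entry matches: add a new one.
    Create,
    /// Exactly one entry matches by IP or hostname: update it in place.
    Update(usize),
    /// IP and hostname point at different entries: refuse to guess.
    Conflict(Vec<usize>),
}

/// Split "host.example.com" into ("host", Some("example.com")).
pub fn split_fqdn(name: &str) -> (&str, Option<&str>) {
    match name.split_once('.') {
        Some((host, domain)) if !domain.is_empty() => (host, Some(domain)),
        Some((host, _)) => (host, None),
        None => (name, None),
    }
}

/// Decide what to do with a requested (ip, hostname) mapping.
pub fn plan_mapping(entries: &[HostEntry], ip: &str, hostname: &str) -> MappingPlan {
    let matches: Vec<usize> = entries
        .iter()
        .enumerate()
        .filter(|(_, e)| e.ip == ip || e.hostname.eq_ignore_ascii_case(hostname))
        .map(|(i, _)| i)
        .collect();
    match matches.len() {
        0 => MappingPlan::Create,
        1 => MappingPlan::Update(matches[0]),
        _ => MappingPlan::Conflict(matches),
    }
}
```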
Load balancer (3 tests):
- configure_service creates all components (healthcheck→server→backend→frontend)
- Idempotent replacement on same bind address (cascade delete then re-create)
- Isolation between services on different bind addresses
Tests use httptest to mock the OPNsense API — no VM or real firewall needed.
All 100 tests pass across the workspace (0 failures).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace opnsense-config-xml dependency with opnsense-api. All
configuration CRUD now goes through the OPNsense REST API instead
of SSH + XML editing of /conf/config.xml.
Key changes:
- Config struct holds OpnsenseClient + SSH shell (for file ops only)
- Module handlers (dnsmasq, haproxy, caddy, tftp, node_exporter) are
now API-backed with async methods
- apply()/save() are no-ops — each module calls reconfigure after mutations
- install_package uses firmware API with polling
- LoadBalancer uses new domain types (LbFrontend, LbBackend, LbServer,
LbHealthCheck) instead of XML types, with UUID chaining via API
- Dnsmasq conflict detection logic preserved, adapted for API HashMap
- RwLock<Config> replaced with Arc<Config> — Config is now stateless
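The "firmware API with polling" pattern can be sketched as a generic poll loop. This is a blocking, std-only illustration with hypothetical names (`poll_until`, `PollError`); the real handlers are async and would await a timer instead of sleeping the thread.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum PollError<E> {
    /// The deadline passed before the check reported completion.
    Timeout,
    /// The check itself failed (for example an API error).
    Failed(E),
}

/// Call `check` until it returns `Ok(Some(value))`, an error, or the
/// timeout elapses. `Ok(None)` means "still in progress, ask again".
pub fn poll_until<T, E, F>(
    timeout: Duration,
    interval: Duration,
    mut check: F,
) -> Result<T, PollError<E>>
where
    F: FnMut() -> Result<Option<T>, E>,
{
    let deadline = Instant::now() + timeout;
    loop {
        match check() {
            Ok(Some(value)) => return Ok(value),
            Ok(None) => {}
            Err(err) => return Err(PollError::Failed(err)),
        }
        if Instant::now() >= deadline {
            return Err(PollError::Timeout);
        }
        sleep(interval);
    }
}
```

In an install flow, `check` would query the firmware status endpoint and return `Some(..)` once the package reports as installed.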
Benefits over XML approach:
- Per-module soft reload instead of "reload all services"
- Server-side validation of all changes
- No more hash-based race condition detection
- No more fragile XML schema coupling
SSH retained for: file uploads, PXE config writing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Pin vendor/core submodule to 26.1.5 tag (matches running firewall)
- Regenerate dnsmasq from model v1.0.9 (migrated during firmware upgrade)
- Handle array-style select widgets in enum deserialization: OPNsense
sometimes returns [{value, selected}, ...] instead of {key: {value, selected}}
- Add firmware_upgrade and reboot examples for managing OPNsense updates
- All 7 modules validated against live OPNsense 26.1.5:
dnsmasq, haproxy, caddy, vlan, lagg, wireguard, firewall
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace lossy enum deserialization (unknown variants → None) with
Other(String) catch-all variant. This ensures unknown wire values
survive round-trips: reading an object and POSTing it back will not
silently destroy field values that the codegen doesn't recognize.
This is critical for data integrity — in a read-modify-write cycle,
dropping an unknown enum value would overwrite it with empty on the
next POST.
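A minimal sketch of the catch-all pattern, written without serde so it stays self-contained. The `Direction` enum and its wire values are illustrative only; the generated enums and their (de)serializer hooks will look different.

```rust
/// Illustrative enum with a catch-all variant for unrecognized wire values.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Direction {
    In,
    Out,
    /// Any value the codegen did not know about, preserved verbatim.
    Other(String),
}

impl Direction {
    /// Parse a wire value; unknown strings are kept instead of dropped.
    pub fn from_wire(value: &str) -> Self {
        match value {
            "in" => Direction::In,
            "out" => Direction::Out,
            other => Direction::Other(other.to_string()),
        }
    }

    /// Produce the wire value; `Other` writes back exactly what was read.
    pub fn as_wire(&self) -> &str {
        match self {
            Direction::In => "in",
            Direction::Out => "out",
            Direction::Other(raw) => raw.as_str(),
        }
    }
}
```

With an `Option`-based design, an unknown value would parse to `None` and serialize as empty, which is exactly the silent overwrite described above.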
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generate typed API models for HAProxy, Caddy, Firewall, VLAN, LAGG,
WireGuard (client/server/general), and regenerate Dnsmasq. All core
modules validated against a live OPNsense 26.1.2 instance.
Codegen improvements:
- Add --module-name and --api-key CLI flags for controlling output
filenames and API response envelope keys
- Fix enum variant names starting with digits (prefix with V)
- Use value="" XML attribute for wire values instead of element names
- Handle unknown *Field types as opn_string (select widget safe)
- Forgiving enum deserialization (warn instead of error on unknown)
- Handle empty arrays in opn_string deserializer
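The digit-prefix rule for enum variants might look like the sketch below. `variant_ident` is a hypothetical name, and the PascalCase conversion shown is an assumption about how the codegen normalizes wire values.

```rust
/// Convert a wire value into a valid Rust enum variant identifier:
/// drop separators, capitalize each segment, and prefix with `V` when the
/// result would start with a digit (or would otherwise be empty).
pub fn variant_ident(wire: &str) -> String {
    let mut ident = String::new();
    let mut upper_next = true;
    for c in wire.chars() {
        if c.is_ascii_alphanumeric() {
            if upper_next {
                ident.push(c.to_ascii_uppercase());
                upper_next = false;
            } else {
                ident.push(c);
            }
        } else {
            // Separators such as '-', '_' or '.' start a new segment.
            upper_next = true;
        }
    }
    // Rust identifiers cannot begin with a digit.
    if ident.chars().next().map_or(true, |c| c.is_ascii_digit()) {
        ident.insert(0, 'V');
    }
    ident
}
```

The original wire string still has to be kept alongside the identifier (for example in a rename attribute), since the conversion is lossy.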
Add per-module examples (list_haproxy, list_caddy, list_vlan, etc.)
and utility examples (raw_get, check_package, install_and_wait).
Extract shared client setup into examples/common/mod.rs.
Fix post_typed sending empty JSON body ({}) instead of no body,
which was causing 400 errors on firmware endpoints.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add entity-level CRUD operations (get_item, add_item, set_item,
del_item, search_items) and service management (reconfigure,
service_status) to OpnsenseClient. These map directly to OPNsense's
MVC controller patterns.
Add response module with UuidResponse, StatusResponse, and
SearchResponse<T> covering the standard OPNsense API response shapes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- KVM module with connection configuration (local/SSH)
- VM lifecycle management (create/start/stop/destroy/delete)
- Network management (create/delete isolated virtual networks)
- Volume management (create/delete storage volumes)
- Example: OKD HA cluster deployment with OPNsense firewall
- All VMs configured for PXE boot with isolated network
The KVM module uses virsh command-line tools for management and is
fully integrated with Harmony's architecture. It provides a clean Rust
API for defining VMs, networks, and volumes. The example demonstrates
deploying a complete OKD high-availability cluster (3 control planes,
3 workers) plus OPNsense firewall on an isolated network.