Branch is ready to merge; the checklist was working scaffolding for
that. Remaining deferred items (CI image libvirt-dev, smoke-test
contract, bash → Rust smoke migration, ignored-test CI runner,
ADR-024) live in the merge commit body and should be tracked as
real issues from there.
Caller must pass `UserPassCredentials` to `FleetNatsScore::user_pass`
— no more `e2e-admin`/`e2e-device` defaults shipped in the library.
The deploy binary reads `HARMONY_FLEET_*` env vars (default namespace
`harmony-fleet-system`) and fails fast when NATS creds aren't set.
Also: `style/dist/` gitignored, `manual_mint/mint.py` moved next to
`nats/callout/` with README + secrets gitignore (the real RSA key
that was sitting untracked has been removed), `architecture_review.md`
moved to `docs/adr/drafts/024-`, three low-value ROADMAP docs deleted.
Updates pre-merge checklist (§1.6, §1.8, §3.1, §5).
Adds OIDC login support to the harmony-fleet-operator web dashboard using Zitadel SSO.
PKCE was the recommended option here because the dashboard does not need to hold a client secret. The server generates a random code verifier and sends only its SHA-256 hash (the code challenge) to Zitadel. At token exchange it presents the verifier itself, and Zitadel validates authenticity by recomputing the hash and comparing the two values.
PKCE auth flow:
1. User visits a protected dashboard route, like /devices.
2. If no valid harmony_fleet_session cookie exists, the app redirects to /login.
3. /login creates:
- random state
- random pkce_code_verifier
- derived code_challenge = base64url(sha256(pkce_code_verifier))
4. The app stores state and pkce_code_verifier in a temporary HTTP-only login-attempt cookie.
5. The browser is redirected to Zitadel’s authorize endpoint with:
- client_id
- redirect_uri
- scope
- state
- code_challenge
- code_challenge_method=S256
6. After SSO login, Zitadel redirects back to /auth/callback?code=...&state=....
7. The callback handler:
- parses the raw query into a strict success/failure enum
- reads the temporary login-attempt cookie
- validates returned state
- exchanges code + pkce_code_verifier for tokens
- validates the returned ID token using OIDC discovery/JWKS
- creates a local harmony_fleet_session cookie
- redirects to /
8. Protected routes validate the local dashboard session cookie on each request.
9. /logout clears the dashboard session cookie and redirects to /login.
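A minimal sketch of the "strict success/failure enum" from step 7, using only the standard library. The type and function names (`CallbackQuery`, `CallbackParseError`, `parse_callback_query`) are illustrative assumptions, not the names used in the dashboard code, and percent-decoding is left out on purpose.

```rust
use std::collections::HashMap;

/// Outcome of parsing the raw `/auth/callback` query string.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum CallbackQuery {
    /// The identity provider returned an authorization code.
    Success { code: String, state: String },
    /// The identity provider reported an error (for example `access_denied`).
    Failure { error: String, description: Option<String> },
}

#[derive(Debug, PartialEq)]
enum CallbackParseError {
    /// Neither a complete `code` + `state` pair nor an `error` was present.
    Malformed,
    /// Both `code` and `error` were present, which is ambiguous.
    Ambiguous,
}

fn parse_callback_query(raw: &str) -> Result<CallbackQuery, CallbackParseError> {
    // Split `k=v&k=v` pairs; pairs without `=` are ignored.
    let params: HashMap<&str, &str> = raw
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .collect();

    let code = params.get("code").copied();
    let state = params.get("state").copied();
    let error = params.get("error").copied();

    match (code, state, error) {
        (Some(_), _, Some(_)) => Err(CallbackParseError::Ambiguous),
        (Some(code), Some(state), None) if !code.is_empty() && !state.is_empty() => {
            Ok(CallbackQuery::Success {
                code: code.to_string(),
                state: state.to_string(),
            })
        }
        (None, _, Some(error)) => Ok(CallbackQuery::Failure {
            error: error.to_string(),
            description: params.get("error_description").map(|d| d.to_string()),
        }),
        _ => Err(CallbackParseError::Malformed),
    }
}
```

Parsing into an enum first means the rest of the handler never sees a half-valid query: the state check and the token exchange only run on the `Success` variant.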
---
Auth middleware responses depending on request type:
- normal browser request: redirect to /login
- SSE request: 401 authentication required
- HTMX request: 401 with HX-Redirect: /login (an HX-Redirect header is more idiomatic for HTMX than an Axum-level redirect)
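The three cases above can be sketched as a pure decision function, independent of Axum. This assumes HTMX requests are recognised by the `HX-Request: true` header and SSE requests by an `Accept` header containing `text/event-stream`; the actual middleware may detect them differently, and `RequestKind`, `AuthRejection`, `classify`, and `reject` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum RequestKind {
    Browser,
    Sse,
    Htmx,
}

#[derive(Debug, PartialEq)]
enum AuthRejection {
    /// Plain HTTP redirect for normal browser navigation.
    Redirect { location: &'static str },
    /// 401 response, optionally carrying an `HX-Redirect` header value.
    Unauthorized { hx_redirect: Option<&'static str> },
}

/// Case-insensitive header lookup over a simple (name, value) list.
fn header<'a>(headers: &[(&str, &'a str)], name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .map(|(_, v)| *v)
}

fn classify(headers: &[(&str, &str)]) -> RequestKind {
    if header(headers, "HX-Request") == Some("true") {
        RequestKind::Htmx
    } else if header(headers, "Accept").map_or(false, |a| a.contains("text/event-stream")) {
        RequestKind::Sse
    } else {
        RequestKind::Browser
    }
}

/// What an unauthenticated request of each kind receives.
fn reject(kind: RequestKind) -> AuthRejection {
    match kind {
        RequestKind::Browser => AuthRejection::Redirect { location: "/login" },
        RequestKind::Sse => AuthRejection::Unauthorized { hx_redirect: None },
        RequestKind::Htmx => AuthRejection::Unauthorized { hx_redirect: Some("/login") },
    }
}
```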
Reviewed-on: #284
Reviewed-by: johnride <jg@nationtech.io>
Co-authored-by: Reda Tarzalt <tarzaltreda@gmail.com>
Co-committed-by: Reda Tarzalt <tarzaltreda@gmail.com>
- Add `fleet/README.md`: overview of the crates, ADR-023 pointer,
quickstart for the e2e ping test, env knobs (`HARMONY_FLEET_E2E`,
`FLEET_E2E_KEEP`, `RUST_LOG`), how to connect to NATS from the host
and in-cluster, how to inspect the agent, the `harmony-fleet-deploy`
production CLI, the operator dashboard, and the roadmap (Zitadel +
callout next).
- `prune_stale_namespaces` now polls until each pruned namespace is
fully gone (up to 90 s). NATS NodePort 30423 is cluster-scoped, so
a still-`Terminating` namespace from the prior run was blocking the
new bring-up with "provided port is already allocated".
Verified: e2e ping test green back-to-back after the fix, with a
prior namespace left behind.
The previous e2e harness handrolled k8s manifests in `stack.rs`,
bypassing the Score-Topology-Interpret machinery harmony exists to
provide. This commit:
1. **ADR-023** codifies the rules: deploy with Scores (not
manifests), e2e uses the same Scores as production, one Score
per component, deploy blocks on smoke-test success, deploy logic
lives in `*-deploy` crates, topologies are compile-time,
thiserror over anyhow. CLAUDE.md mirrors the principles.
2. **New `fleet/harmony-fleet-deploy` crate** is the canonical home
for fleet-component Scores:
- `FleetOperatorScore` + helm-chart generator + `install_crds`
moved out of `harmony::modules::fleet::operator` (they should
never have lived in `harmony` core). `FleetServerScore`
(composite of NATS + operator + Zitadel + callout) moved too.
- New `FleetNatsScore` (preset over `NatsHelmChartScore` with
fleet's required values; v1 supports `UserPass` auth, callout
mode reserved on the public API for PR 1.5).
- New `FleetAgentScore` with `FleetAgentTarget::Pod`; `Vm`
target is a future variant that absorbs `FleetDeviceSetupScore`.
- `harmony-fleet-deploy` binary built on the existing
`harmony_cli` crate — no new CLI scaffolding.
3. **Operator runtime binary trimmed**: `Install` and `Chart`
subcommands removed; both jobs now belong to
`harmony-fleet-deploy`. The runtime binary becomes leaner.
4. **E2E harness rewritten** as a thin Score composer:
`harmony-fleet-e2e/src/stack.rs` deploys the stack via
`FleetNatsScore` + `FleetAgentScore`. The inline NATS manifest
factory and the bespoke agent Pod renderer are gone.
- Bring-up runs once per test binary via `shared_stack` +
`tokio::sync::OnceCell` (matches the `fleet_e2e_demo` pattern).
- Stale `e2e-*` namespaces from prior runs get pruned at
startup so the leaks the OnceCell creates don't compound.
5. **`thiserror` for the agent's `CommandServer`** — replaces the
anyhow-based surface with typed `CommandError` /
`CommandServerError`.
6. **Memory** captures eight load-bearing principles (saved to
`~/.claude/projects/.../memory/`) so future sessions don't drift
back into manifest-handrolling.
Verified: `cargo test -p harmony-fleet-e2e --test ping` green
end-to-end against k3d in 25s warm.
First slice of the device-commands.* protocol from
fleet/requests_over_nats.md. Lands `Verb::Ping` plus the harness that
proves it works against a real in-cluster agent.
Wire types (`harmony-reconciler-contracts::commands`):
- `Verb::Ping`, `CommandRequest`, `PingReply`, `ErrorReply`/`ErrorKind`
- `device_command_subject` / `device_command_subscription` helpers
- `X-Harmony-*` header constants
Agent:
- `command_server.rs` subscribes on `device-commands.<id>.>` and
dispatches verbs; ping handler replies with `PingReply`
- New `[agent].runtime_enabled` config flag (default true). When
false, podman init + reconciler loop are skipped so the agent can
run as a Pod on containerd-only k3d nodes; command server +
heartbeat still run
- `Dockerfile`: canonical multi-stage build for production registries
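A rough sketch of how the subject helpers and the agent's `device-commands.<id>.>` subscription fit together. The signatures here are assumptions (the real helpers live in `harmony-reconciler-contracts::commands` and may differ), the `ping` verb token is a guess at the wire spelling of `Verb::Ping`, and `subject_matches` is a hypothetical stand-in for NATS's own wildcard matching, included only to show which subjects the subscription covers.

```rust
const COMMAND_PREFIX: &str = "device-commands";

/// Subject a caller publishes to for one verb on one device.
fn device_command_subject(device_id: &str, verb: &str) -> String {
    format!("{}.{}.{}", COMMAND_PREFIX, device_id, verb)
}

/// Wildcard subscription the agent uses to receive every verb for its id.
fn device_command_subscription(device_id: &str) -> String {
    format!("{}.{}.>", COMMAND_PREFIX, device_id)
}

/// Token-wise NATS-style matching: `*` matches exactly one token,
/// `>` matches one or more trailing tokens and must be the last token.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut p = pattern.split('.');
    let mut s = subject.split('.');
    loop {
        match (p.next(), s.next()) {
            (Some(">"), Some(_)) => return p.next().is_none(),
            (Some("*"), Some(_)) => continue,
            (Some(a), Some(b)) if a == b => continue,
            (None, None) => return true,
            _ => return false,
        }
    }
}
```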
Operator:
- `commands::FleetCommandsClient` with typed `CommandError`
(`DeviceOffline` via `no_responders`, `Timeout`, `BadReply`, `Nats`)
E2E harness (`harmony-fleet-e2e`):
- Library crate + integration test. `Stack::bring_up` provisions a
fresh `e2e-<uuid8>` namespace in a shared `fleet-e2e` k3d cluster,
deploys NATS (UserPass auth, JetStream on) + the agent Pod, returns
a connected admin NATS client, and tears the namespace down on Drop
- v1 ships `AuthMode::UserPass` only; the `Callout` variant is
reserved on the public API for the follow-up PR that adds the mock
OIDC fixture + NatsAuthCalloutScore deployment
- Operator pod deployment is also follow-up — for ping the test
process drives `FleetCommandsClient` directly against the cluster's
NATS NodePort
- `HARMONY_FLEET_E2E=1` gates the integration test so default
`cargo test --workspace` runs don't depend on k3d/podman
- Image build + sideload mirrors the `fleet_auth_callout` pattern:
host `cargo build --release` → single-stage Dockerfile → `podman
build` → `k3d image import`. ~12s warm bring-up, ~80s cold
Working document for the architectural redesign of the fleet
platform before v0.1 ships to production. Captures four sections
of research:
§1 — Current state inventory. Markdown-bullet map of every public
type, score, trait, and module across `harmony/modules/fleet/`,
`harmony-reconciler-contracts`, and `fleet/harmony-fleet-*/`.
Sorted by domain meaning (identity, desired state, observed
state, setup, plumbing) rather than location, so the
cross-cutting concerns become visible. Includes a text "diagram"
of the dependency graph showing the two problematic edges:
runtime crates importing CRD types from the framework crate
(`harmony-fleet-operator` ← `harmony::modules::fleet::operator::crd`
verified at `controller.rs:37`, `device_reconciler.rs:21`,
`main.rs:9`) and the agent importing podman wire types from the
framework crate (`harmony-fleet-agent` ← `harmony::modules::podman`
verified at `main.rs:21-22`, `reconciler.rs:11`).
§2 — Theory review. Pulls principles from JG's *Pour l'amour des
compilateurs* talk (2026-04-30), its references (Crichton,
Feldman, Maguire, Goedecke, Fowler), and harmony's own load-bearing
ADRs (002 hexagonal, 003 infrastructure abstractions, 015 higher-
order topologies, 016 agent + global mesh, 018 template hydration).
Synthesizes eight design principles for the redesign — including
Goedecke's guardrail that "type-driven" ≠ "type-everything" so we
don't over-fit the cardinality argument.
§3 — Ten concrete shape problems (P1–P10), framed as cardinality
mismatches, leaky boundaries, and "is this resolved yet" branches
rather than bugs. P1 is the placement issue JG flagged in code
review; P2 is `FleetDeviceAuth`'s mixed resolved/unresolved
states; P10 is the credential-shape staircase across operator
workstation / operator pod / agent.
§4 — Five design alternatives, each scored against P1–P10:
A. Move + thin façade (conservative cleanup).
B. Resolved-only at boundaries + capability traits (principled
incremental).
C. Dataflow reframe (events in, state out).
D. Fleet as kube control plane, period (deliberately weird).
E. Algebra of fleets (deliberately mathematical).
A is too little, C/D/E are right-shape but wrong-timing for the
3-day window. B is the working recommendation, with explicit
awareness that D is the v2.0 destination and the capability
traits in B are the seam that lets us migrate without breaking
callers.
§5 sketches a concrete shape for B: new `harmony-fleet/` domain
crate with no framework dependency, `harmony-fleet-adapters-*`
crates for NATS/Zitadel/kube, the existing operator/agent/auth
crates wire adapters together, the framework's
`harmony::modules::fleet` collapses to a re-export module that
goes away by v0.2.
§6 — Five open questions for JG's review before locking the
choice. §7 — explicit "spike one slice, then commit or back out"
process so we don't lock the wrong shape.
Not an ADR yet. The ADR happens after JG agrees on which
alternative is the working hypothesis and the spike confirms the
shape feels better in code than on paper.
Picks up where the auto-fix pass left off. Workspace warning count
goes from 105 to 0 across `cargo build --workspace --all-targets`.
Three categories of fixes:
1. Mechanical fixes the auto-pass couldn't handle (unused imports
inside braced multi-name `use` statements, unused variables that
needed an underscore prefix without breaking other references):
batched via a small Python script, then 6 manual edits where the
warning location and the actual identifier were on different
lines.
2. Dead code that's intentionally kept around for future wiring or
debug visibility gets `#[allow(dead_code)]` at the right scope:
- 19 individual items (struct fields, methods, free functions,
type aliases, enum variants), e.g. `default_namespace` / `default_cluster_issuer`
in zitadel/mod.rs (used via serde defaults, opaque to rustc),
`score` fields on the OKD bootstrap interpret types,
`crd_exists` methods on the prometheus alerting scores, the
`harmony_inventory_agent::local_presence::{DiscoveryEvent,
discover_agents}` re-exports.
- 5 module-level allows for files where most items are
aspirational scaffolding (harmony_agent's replica workflow,
opnsense-config dnsmasq, three opnsense-api examples).
3. Special cases that needed real fixes, not allows:
- `opnsense-config-xml/src/data/haproxy.rs`: deprecated
`rand::thread_rng` / `Rng::gen` updated to `rng()` / `random`.
- `harmony_secret/src/lib.rs`: the `secrete2etest` integration
test gate is now declared in Cargo.toml's `[lints.rust]
unexpected_cfgs.check-cfg`; the gated test module is structured
so its dead `TestSecret`/`TestUserMeta` types come along for
the cfg ride and don't show up as unconditional dead code.
- `harmony/src/modules/nats/score_nats_k8s.rs:241`: `K8sIngressScore
{ name: todo!(), ... }`'s unreachable expression annotated.
- `harmony/src/domain/topology/k8s_anywhere/k8s_anywhere.rs:982`:
wrap the dead-after-`return Ok(Noop)` branch in
`#[allow(unreachable_code)] { ... }`. Behavior unchanged.
- `examples/try_rust_webapp/Cargo.toml`: `autobins = false` so
`src/main.rs` isn't auto-registered as both bin AND example.
All 16 lib-test suites pass: 437 tests, 0 failed, 13 ignored.
Ready for `-Dwarnings` in CI as a follow-up — the gate makes
sense once we're sure no contributor's local builds slip warnings
back in.
Workspace warning count: 408 → 105.
Three buckets cleared:
* Auto-fixable (`cargo fix` + `cargo clippy --fix`): unused imports
removed, unused variables prefixed with `_`, deprecated method
calls updated. Applied across harmony, harmony-k8s, harmony-agent,
harmony_inventory_agent, the fleet/ workspace, and ~15 examples.
* Generated code (opnsense-api/src/generated/): 269 snake_case
warnings + ~10 unreachable-pattern warnings come from
CamelCase-preserving bindings to OPNsense's HAProxy/Caddy XML
schemas. Scoped a single `#[allow(non_snake_case,
unreachable_patterns)]` at `pub mod generated;` rather than
fighting the codegen — renaming would break serde round-trips
and the codegen would regenerate them anyway.
* opnsense-codegen parser's defensive `let...else` guards on
`XmlNode` (currently single-variant): file-level
`#![allow(irrefutable_let_patterns)]` with a comment explaining
why we keep the `else` arms (they re-arm if the IR grows a
second variant).
`harmony_inventory_agent::local_presence::{DiscoveryEvent,
discover_agents}` re-exports were stripped twice by the auto-fix
passes (consumers live in another crate, so the local crate looks
"unused" to lint). Anchored with explicit `pub use` + an
`#[allow(unused_imports)]` annotation noting why.
All 151 harmony lib tests still pass. Remaining ~105 warnings are
mostly real dead code in non-fleet modules + a handful of
unused-imports/variables clippy couldn't auto-resolve; cleared in
the next pass.
Two design documents framing the next push.
`ROADMAP/fleet_platform/v0_2_plan.md` — three-day production push.
Replaces the open-ended chapter structure of v0_1_plan.md for the
period between the walking-skeleton merge and v0.1.0 in production.
Focus is locking the fleet module's public API surface so the
inevitable physical refactor (out of `harmony/modules/fleet/`,
into `fleet/harmony-fleet/`) is mechanical when we get to it.
Anchored in the principle from JG's *Pour l'amour des compilateurs*
talk: design the brick before moving the brick.
`docs/adr/022-fleet-agent-upgrade.md` — agent upgrade procedure.
K8s rolling-update shape applied to one host: drain in-flight
work, stage versioned binary alongside old, smoke-test, atomic
symlink swap, both agents alive briefly, operator verifies new
agent's heartbeat then sends explicit stop signal to old, old
exits cleanly. No version is ever erased — N-history on disk is
the rollback target. Operator-driven cutover (not self-stopping)
so the most-trusted side decides the handoff. Implementation
deferred to post-v0.1 backlog; spec exists so anyone can build
it without reinventing the design.
ADR README index updated.
The auto-generated `Id::default()` shape (`fb5310_Qm2kPoQ`) contains
underscores and uppercase, so once the agent published its
DeviceInfo and the operator tried to upsert a Device CR using
`device_id` as `metadata.name`, kube rejected it:
ApiError: Device.fleet.nationtech.io "fb5310_Qm2kPoQ" is invalid:
metadata.name: Invalid value ... must consist of lower case
alphanumeric characters, '-' ...
Failing at operator-reconcile time is bad UX: the Zitadel machine
user is already provisioned, the agent is already running, and the
auth callout's per-device permissions are already templated to a
device_id the kube layer will never accept. Re-enrolling requires
manually deleting state in three places.
Makes `--device-id` **required** and validates it against RFC1123
DNS subdomain rules upfront, before any Zitadel call:
* non-empty, ≤253 chars total
* dot-separated labels, each 1-63 chars, lowercase a-z + 0-9 + `-`
* labels must start AND end with an alphanumeric
Stricter than just "kube name valid" because the same id flows into
NATS subjects (auth callout's permission templates) — `_`/uppercase
silently passes NATS auth but breaks the kube path. Rejecting at
the CLI is the only failure point that catches both layers in one
place.
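The three rules above can be sketched as a standalone validator. This is an illustration of the RFC1123 subdomain check under the limits stated in this message, not the CLI's actual code; `DeviceIdError` and `validate_device_id` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum DeviceIdError {
    Empty,
    TooLong,
    EmptyLabel,
    LabelTooLong,
    InvalidChar(char),
    /// A label starts or ends with `-`.
    LabelEdge,
}

fn validate_device_id(id: &str) -> Result<(), DeviceIdError> {
    if id.is_empty() {
        return Err(DeviceIdError::Empty);
    }
    if id.len() > 253 {
        return Err(DeviceIdError::TooLong);
    }
    for label in id.split('.') {
        // Leading, trailing, or consecutive dots produce an empty label.
        if label.is_empty() {
            return Err(DeviceIdError::EmptyLabel);
        }
        if label.len() > 63 {
            return Err(DeviceIdError::LabelTooLong);
        }
        // Only lowercase a-z, 0-9 and `-` are allowed inside a label.
        if let Some(c) = label
            .chars()
            .find(|c| !(c.is_ascii_lowercase() || c.is_ascii_digit() || *c == '-'))
        {
            return Err(DeviceIdError::InvalidChar(c));
        }
        // Labels must start and end with an alphanumeric.
        if label.starts_with('-') || label.ends_with('-') {
            return Err(DeviceIdError::LabelEdge);
        }
    }
    Ok(())
}
```

With this shape, the old auto-generated id `fb5310_Qm2kPoQ` is rejected on its first offending character, before any Zitadel call is made.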
Eight unit tests cover the accept set plus every reject path:
underscore (the regression that triggered this), uppercase,
leading/trailing dash, empty, consecutive dots, label too long, and
total too long.
CLI banner + README updated. The `Id::default()` fallback path is
removed entirely; no backward compat with the old auto-generated
shape (the user explicitly opted out — anything that ran before now
needs re-enrollment with an explicit id).
The fleet agent connects to NATS via the OKD edge-TLS Route at
`wss://nats-fleet-stg.cb1.nationtech.io`. Without the `websockets`
feature on async-nats, the connector parses the URL but doesn't know
how to do the HTTP Upgrade — it opens a raw TCP socket to port 443
and sits waiting for NATS's plaintext `INFO` frame, which never
comes (the OKD router speaks TLS+HTTPS, not raw NATS). 30s later:
ERROR async_nats::connector: expected INFO, got nothing
Error: Nats connection FAILED : IO error: expected INFO, got nothing
…and systemd restart-loops forever.
`websockets` isn't in async-nats 0.45's default feature set; the
crate's own Cargo.toml lists it as
`websockets = ["dep:tokio-websockets"]`. Enabling it on the
workspace dep makes the connector route `wss://` URLs through
tokio-websockets which does the TLS+upgrade dance correctly. Curl
already proved the server-side path works (`101 Switching
Protocols` + NATS `INFO`); the missing piece was always client
support.
The operator wasn't affected because it talks to NATS in-cluster
on `nats://fleet-nats.fleet-staging.svc.cluster.local:4222` (plain
TCP). Only external clients going through the public wss:// Route
hit this.
NATS server-level `jetstream: { ... }` config doesn't extend to
explicit accounts — each one has to opt in individually with
`jetstream: enabled` (or a per-account quota object). The rendered
values block declared `FLEET` and `SYS` accounts but never enabled
JetStream on `FLEET`, so the operator's first call to create its
desired-state KV bucket died immediately with:
JetStream error: JetStream not enabled for account
(code 503, error code 10039)
Adds `jetstream: enabled` to the callout account block in
`render_values_yaml`. SYS deliberately stays without it — system
account doesn't host streams. Reference:
https://docs.nats.io/nats-concepts/jetstream/account_jetstream
Adds `auth_callout_account_has_jetstream_enabled` regression test
that:
* asserts `jetstream: enabled` appears under the callout account
block in the rendered YAML;
* defense-in-depth: asserts `jetstream:` does NOT appear under SYS,
so a future regex slip can't silently flip system-account
JetStream on.
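The two assertions can be sketched against a toy renderer. Everything here is hypothetical: the real `render_values_yaml` output has a different and larger shape, and `render_accounts_block` / `account_lines` exist only to show how a per-account check avoids matching `jetstream:` in the wrong block.

```rust
/// Toy stand-in for the accounts portion of the rendered values.
/// Assumption: two-space indentation, one block per account.
fn render_accounts_block(callout_account: &str) -> String {
    format!(
        "accounts:\n  {}:\n    jetstream: enabled\n    users: []\n  SYS:\n    users: []\n",
        callout_account
    )
}

/// Returns the indented body lines that belong to one account block.
fn account_lines<'a>(yaml: &'a str, account: &str) -> Vec<&'a str> {
    let header = format!("  {}:", account);
    yaml.lines()
        .skip_while(|l| l.trim_end() != header)
        .skip(1) // drop the account header itself
        .take_while(|l| l.starts_with("    ")) // stop at the next account
        .collect()
}
```

Scoping each assertion to one account's lines is what makes the SYS check meaningful: a plain substring search over the whole document would always find `jetstream:` once the callout account has it.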
The operator's `credentials.toml` embeds Zitadel's JSON machine-key
content under `key_json`. Both `fleet_staging_install` and the
docstring example used basic triple-quoted strings (`"""..."""`),
which interpret backslash escapes — every `\n` in the embedded RSA
private key gets expanded to a literal 0x0A before the value lands
in the operator's env var. The operator's `harmony-fleet-auth`
deserializer then runs `serde_json::from_str` on a "JSON" string
that contains raw control chars inside string literals and rejects
it with "control character found while parsing a string at line 2
column 0".
The fix is a one-character delta: switch to TOML *literal*
multi-line strings (triple single-quote). Literal strings preserve
backslash sequences as-is, so `\n` reaches the JSON parser as the
two chars `\` + `n`, gets interpreted as a string escape, and the
multi-line PEM decodes correctly.
Updates `fleet_staging_install`'s `format!()` template to render
`key_json = '''<json>'''` and rewrites the docstring example on
`OperatorCredentials::credentials_toml` to spell out which string
form is required, with the failure mode that comes from picking
the wrong one.
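A small sketch of the rendering side, assuming the template only needs to emit the `key_json` line (the real `credentials.toml` carries more fields, and `render_key_json_line` is a hypothetical helper). The guard reflects a TOML rule: a multi-line literal string cannot contain its own `'''` delimiter.

```rust
/// Renders `key_json` as a TOML multi-line *literal* string so that
/// backslash sequences such as `\n` reach the JSON parser untouched.
fn render_key_json_line(key_json: &str) -> Result<String, String> {
    // A literal string has no escape mechanism, so the delimiter
    // itself can never appear inside the value.
    if key_json.contains("'''") {
        return Err("key_json contains ''' and cannot be a TOML literal string".to_string());
    }
    Ok(format!("key_json = '''{}'''\n", key_json))
}
```

For example, a machine key whose PEM body is JSON-escaped with `\n` keeps those two-character sequences in the rendered line, so `serde_json` later decodes them into real newlines instead of rejecting raw control characters.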
The operator chart's Deployment references
`harmony-fleet-operator-secrets` via `envFrom`/`secretKeyRef` for the
`FLEET_OPERATOR_CREDENTIALS_TOML` env var, but the Secret is
intentionally NOT bundled in the on-disk helm chart (credentials are
operator-environment-specific — see comment in `chart::build_chart`).
The chart docs say "applies the Secret directly via
`operator_secret()` (used as a `K8sResourceScore`)", but
`FleetOperatorInterpret::execute` never actually did that. Result: the
operator pod stalls forever in `CreateContainerConfigError` with
`secret "harmony-fleet-operator-secrets" not found`.
Fix: when `score.credentials` is set, build the Secret via
`operator_secret(&chart_options)` and apply it via `K8sResourceScore`
**before** the helm install fires. This way kube has the Secret in
place by the time the chart's Deployment lands and the pod starts
cleanly. Mirrors the pattern `NatsAuthCalloutScore` already uses for
its own callout Secret.
Trait bound widens from `T: Topology + HelmCommand` to
`T: Topology + HelmCommand + K8sclient` to support the
`K8sResourceScore::interpret` call. The only existing caller
(`fleet_staging_install`) drives this through `K8sAnywhereTopology`
which already implements all three.
When `credentials` is `None` (no-auth dev mode) we skip the Secret
apply entirely — the chart's Deployment doesn't reference it in
that case either.
`EnvFilter::from_default_env()` returns the empty filter when
`RUST_LOG` isn't set, which silences every log line. The systemd
unit installed by `FleetDeviceSetupScore` does pass
`RUST_LOG=info`, but a hand-launched binary, an overridden unit, or
any other invocation path produced a silent agent — including the
dev-on-device run the user just hit.
Switches to `try_from_default_env().unwrap_or_else(|_|
EnvFilter::new("info"))` so:
* RUST_LOG unset → info-level by default (what the operator wants
the moment they look for logs).
* RUST_LOG set → respected as before (`RUST_LOG=debug` for
troubleshooting, `RUST_LOG=warn` if it's too chatty, etc.).
The systemd unit's existing `Environment=RUST_LOG=info` line is
left in place — explicit + harmless, and lets a customer toggle
the unit's verbosity without rebuilding the binary.
`loginctl enable-linger` returns to the caller before logind has
actually finished bringing up `user@<uid>.service`. The next step in
`FleetDeviceSetupScore` (Step 4/7 — activating user-scoped
podman.socket) calls `systemctl --user` against the just-lingered
user, which fails with:
Failed to connect to user scope bus via local transport:
No such file or directory
…because `/run/user/<uid>/bus` doesn't exist yet. The user manager
is on its way up but the score has already moved on. Reproducible
on a fresh dev-on-device run.
Adds a `wait_for_user_bus` helper that polls `/run/user/<uid>/bus`
for up to 5s after `enable-linger`. We've never seen the wait take
more than a fraction of a second in practice; 5s is a generous
ceiling that gives a clear error pointing at the right diagnostic
commands (`journalctl -u user@<uid>.service`, `loginctl user-status`)
if logind is genuinely stuck.
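The polling idea, sketched for a local filesystem check only. The real `wait_for_user_bus` runs as part of the score and may probe the path differently (for example over SSH); `user_bus_path` and `wait_for_path` are hypothetical names, and the interval is an arbitrary choice.

```rust
use std::path::{Path, PathBuf};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Location of the per-user D-Bus socket that `systemctl --user` needs.
fn user_bus_path(uid: u32) -> PathBuf {
    PathBuf::from(format!("/run/user/{}/bus", uid))
}

/// Polls until `path` exists or `timeout` elapses.
/// Returns how long the wait took, or an error with diagnostic hints.
fn wait_for_path(path: &Path, timeout: Duration, interval: Duration) -> Result<Duration, String> {
    let start = Instant::now();
    loop {
        if path.exists() {
            return Ok(start.elapsed());
        }
        if start.elapsed() >= timeout {
            return Err(format!(
                "{} did not appear within {:?}; check `journalctl -u user@<uid>.service` and `loginctl user-status`",
                path.display(),
                timeout
            ));
        }
        sleep(interval);
    }
}
```

Called as `wait_for_path(&user_bus_path(uid), Duration::from_secs(5), Duration::from_millis(100))` right after `enable-linger`, the common case returns on the first or second probe.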
Two ergonomic fixes for the dev-on-device workflow.
(1) Ansible local connection. `LinuxHostTopology` always went through
SSH, so running `fleet_device_enroll` with `--target ssh://you@127.0.0.1`
required the operator to set up sshd loopback access on their own Pi —
clunky for a dev who's sitting in front of the device. Adds
`LinuxLocalhostTopology` that drives the same `LinuxHostConfiguration`
trait surface using ansible's `-c local` connection (no SSH at all)
plus direct `sh -c` subprocess calls for the loginctl / systemctl
--user paths.
The configurator now takes a unified `AnsibleConnection<'a>` enum
(`Ssh { host, creds }` | `Local { sudo_password }`) instead of a
`(host, creds)` pair. Internal `host_exec`/`host_sudo_exec` helpers
branch by transport and return the same `SshCommandOutput` shape
either way, so the public methods (ping, ensure_package, ensure_file,
etc.) are transport-agnostic.
`fleet_device_enroll` switches `--target` to optional: omitted →
local, present → SSH. No magic `localhost` string, no special-case
for 127.0.0.1. README + the flag's help text describe both modes.
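A simplified sketch of the `--target` rule (omitted means local, present means SSH). The enum below is a reduced stand-in for `AnsibleConnection<'a>`: the real `Ssh` variant carries `host` and `creds`, while this one keeps just a user and host parsed from an `ssh://user@host` target. `connection_from_target` and `ansible_connection_args` are hypothetical helpers; `-c local`, `-u`, and the trailing-comma inline inventory are standard ansible options.

```rust
#[derive(Debug, PartialEq)]
enum AnsibleConnection<'a> {
    Ssh { user: &'a str, host: &'a str },
    Local { sudo_password: Option<&'a str> },
}

/// No target means local; any target must be an explicit `ssh://user@host`.
/// `127.0.0.1` and `localhost` get no special treatment.
fn connection_from_target<'a>(
    target: Option<&'a str>,
    sudo_password: Option<&'a str>,
) -> Result<AnsibleConnection<'a>, String> {
    match target {
        None => Ok(AnsibleConnection::Local { sudo_password }),
        Some(t) => {
            let rest = t
                .strip_prefix("ssh://")
                .ok_or_else(|| format!("unsupported target `{}`", t))?;
            let (user, host) = rest
                .split_once('@')
                .ok_or_else(|| format!("target `{}` is missing `user@`", t))?;
            Ok(AnsibleConnection::Ssh { user, host })
        }
    }
}

/// Connection-related arguments for an ansible invocation.
fn ansible_connection_args(conn: &AnsibleConnection) -> Vec<String> {
    match conn {
        AnsibleConnection::Local { .. } => {
            vec!["-i".into(), "localhost,".into(), "-c".into(), "local".into()]
        }
        AnsibleConnection::Ssh { user, host } => {
            vec!["-i".into(), format!("{},", host), "-u".into(), user.to_string()]
        }
    }
}
```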
(2) Auto-install `python3-venv` on Debian. First-run venv creation
fails on stock Debian/Ubuntu with `ensurepip is not available`
because Debian splits venv into the `python3-venv` apt package.
`ensure_ansible_venv` now detects that failure, checks for
`/etc/debian_version`, runs `sudo apt-get update && sudo apt-get
install -y python3-venv`, and retries. Idempotent on re-runs (apt
is a noop when already installed). On non-Debian or genuinely
broken environments, the operator gets a clear error pointing at
the right install command per distro family. Sudo prompts for a
password if not configured passwordless — that's fine, the operator
expects it.
Real symptom from a staging run:
Error: FleetDeviceSetupScore: Project 'fleet' not found in Zitadel —
run ZitadelSetupScore first to create it
…even though the project clearly existed and was visible in the
Zitadel UI. Cause: `/management/v1/*` scopes by the caller's org. The
SSO operator's primary org is whatever org their personal account
lives in; the project was created by the system iam-admin user, in
the system org. With no `x-zitadel-orgid` override, the search runs
in the operator's org and returns empty. The project is effectively
"invisible" to that token.
Three changes:
* `ZitadelSetupScore` gains `admin_org_id: Option<String>`. When set,
every management API call sends `x-zitadel-orgid: <id>`. Plumbed
through `request()` next to the existing conditional `Host:`
header. Default `None`, serde-default for backward compat.
* `FleetDeviceAuth::ZitadelEnroll` gains a matching `admin_org_id`
field, threaded through `resolve_zitadel_enroll` into the
synthetic `ZitadelSetupScore` connection it builds for
`mint_device_credentials`. CLI surface: `--admin-org-id` on
`fleet_device_enroll`, with help text explaining the symptom and
where to find the value (Zitadel UI → Organization → Resource ID).
* `find_project` now uses a `nameQuery` filter rather than scanning
the full default-paginated list, so it doesn't depend on the
project being on page 1. When the filter returns empty it falls
back to an unfiltered enumeration and logs the project names that
ARE visible to the token — that list is usually enough for the
operator to spot an org-context mismatch in seconds. The not-found
error in `mint_device_credentials` was rewritten to spell out the
three real causes (org context, role, no project) instead of the
misleading "run ZitadelSetupScore first".
All 7 existing `ZitadelSetupScore` initializer sites updated with
`admin_org_id: None`. README's troubleshooting section gets the new
failure-mode entry.
The SSO login from `fleet_device_enroll` was hitting Zitadel with the
app name (`harmony-cli`) as the OAuth client_id, getting back:
400 Bad Request: invalid_client: no active client not found
Two real problems behind that error:
* `fleet_staging_install` never created the device-code OIDC app in
the first place. Its `applications: vec![]` was empty — the only
Zitadel resources provisioned were the API app, the project roles,
and the machine users. The `harmony-cli` device-code app that the
enrollment example assumed was provisioned simply did not exist.
Adds it via `ZitadelApplication { app_type: DeviceCode }` so a
fresh staging install yields a real OIDC app.
* `--admin-oidc-client-id` defaulted to the literal string
`"harmony-cli"`, which is the app's *display name*, not the
client_id. Zitadel issues numeric client_ids of the form
`<number>@<project>` when the app is created — that's what OAuth
endpoints want. Defaulting to the name was misleading: it produces
no warning, just a confusing 400 from Zitadel about a "client not
found" that the operator can't easily map back to "wrong field
passed to the flag".
Removes the default; the flag is now required when SSO is in use
(skipped only with `--admin-token`). Help text and README spell
out the distinction explicitly. The staging install now reads the
resolved client_id from `ZitadelClientConfig::client_id(...)` and
prints it in the success banner, alongside a copy-paste-ready
`fleet_device_enroll` invocation.
README also documents the post-install lookup path
(`jq -r '.apps."harmony-cli"' ~/.local/share/harmony/zitadel/client-config.json`)
and adds the `invalid_client` error to the troubleshooting list.
Two related issues from a real run.
(1) Image was Debian 12 bookworm — released June 2023, glibc 2.36, two
releases old by mid-2026. Bumping to Debian 13 trixie (current stable
since Aug 2025, glibc 2.41) keeps the rehearsal kernel + userland
roughly aligned with what's likely sitting on a fresh Pi imaged today.
URL pattern is unchanged (`cloud.debian.org/.../latest/`), still
no sha pin (latest/ rotates per point release; swap to a dated
subdir if cryptographic provenance matters). The `cdrom` is still
attached as virtio-blk read-only — that fix is independent and
still required (Debian's cloud-arm64 kernel ships without ahci.ko).
Renames in `harmony::modules::fleet`:
ensure_debian_bookworm_arm64_cloud_image →
ensure_debian_trixie_arm64_cloud_image
DEBIAN_BOOKWORM_CLOUDIMG_ARM64_{URL,FILENAME} →
DEBIAN_TRIXIE_CLOUDIMG_ARM64_{URL,FILENAME}
(2) The device-side `--target aarch64-unknown-linux-gnu` cross-compile
produced a binary that linked against the workstation's glibc
(2.41 on a current Arch host). Running it on the rehearsal VM
(Debian 12 / 13) blew up immediately:
/lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.39' not found
This is fundamental to the gnu target — the binary depends
dynamically on whatever glibc the host happens to have. The fix
isn't a workaround on the harmony side; it's switching the device
build to `aarch64-unknown-linux-musl`, which produces a fully-static
binary that runs on any aarch64 Linux regardless of the device's
libc generation.
README updated with the musl recipe (rustup target, cargo config
linker, optional `cross` shortcut) and the rationale for why musl
beats gnu for device-side cross-compiles. Workstation build is
unchanged.
`harmony`'s `kvm` feature pulls in `libvirt`, which doesn't link on
aarch64-unknown-linux-gnu (no aarch64 `libvirt-dev` package on most
distros). The device-side workflow needs a binary that runs ON the
Pi and only does enrollment — no VM-rehearsal — but the example was
unconditionally enabling `kvm`, so the cross-compile failed at link
time with `undefined reference to virStoragePoolFree` etc.
Fixes by gating the rehearsal bits behind a new `vm-rehearsal`
Cargo feature (default-on for workstation builds, opt-out via
`--no-default-features` for device builds):
* `Cargo.toml`: harmony dep is now `default-features = false,
features = ["podman"]` (podman is needed unconditionally — the
operator CRD types depend on it). New `vm-rehearsal` feature
enables `harmony/kvm` on demand.
* `main.rs`: every libvirt-touching import, CLI flag
(`--launch-pi-vm`, `--vm-rehearsal`, `--vm-*`), CLI branch, and
helper function (`boot_*_vm`, `RehearsalImage`) is now
`#[cfg(feature = "vm-rehearsal")]`. With the feature off, none
of it is referenced and nothing tries to link libvirt.
* README: documents both build flavors with copy-paste commands.
Workstation build (unchanged):
cargo build --release -p example_fleet_device_enroll
Device-side build (the new path):
cargo build --release --target aarch64-unknown-linux-gnu \
-p example_fleet_device_enroll --no-default-features
Symptom: `--launch-pi-vm` boots a Debian bookworm arm64 VM, SSH comes
up, but the configured `fleet-admin` user doesn't exist and key auth
fails. The seed ISO is well-formed (CIDATA volume label, valid
user-data, valid meta-data), but cloud-init never finds it.
Root cause: Debian's `linux-image-cloud-arm64` kernel — and other
slimmed cloud-image kernels — ship WITHOUT `ahci.ko`, because real
clouds don't expose SATA. The SATA cdrom we attach is invisible to
the guest:
* `dmesg` has zero ata/ahci/scsi/sr0 lines (confirmed by inspecting
the post-boot overlay's journald).
* `blkid -tLABEL=CIDATA` returns nothing.
* cloud-init's NoCloud datasource gives up, falls through to
`DataSourceNone`, applies no user-data, the user the score wanted
to create never gets created.
Final cloud-init log line:
`Cloud-init v. 22.4.2 finished at … Datasource DataSourceNone`
`cc_final_message.py[WARNING]: Used fallback datasource`
Fix: attach the seed as `device='disk'` `bus='virtio'` with
`<readonly/>`. virtio-blk is the universal cloud-image baseline —
every cloud kernel includes the driver — and cloud-init's NoCloud
datasource finds the seed via the volume label regardless of device
type. The `cdrom`/`CdromConfig` naming on the public API is kept
(callers mentally model the seed as removable media), but the wire
shape is now virtio-blk on every arch. Device name moves from `hdb`
to `vdb` accordingly.
Tests: `domain_xml_cdrom_device_uses_virtio_blk_readonly` pins the
new shape and explicitly asserts that the SATA / IDE-cdrom shape
does NOT come back — that's the regression this test exists to
prevent.
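For reference, the virtio-blk seed shape could be rendered roughly as below. This is a sketch only: `render_seed_disk_xml` is a hypothetical name, the `raw` driver type is an assumption, and the path is assumed to contain no characters that need XML escaping.

```rust
/// Renders a libvirt `<disk>` element for a cloud-init seed image attached
/// as a read-only virtio-blk device instead of a SATA cdrom.
/// Hypothetical helper; the real renderer may differ.
pub fn render_seed_disk_xml(seed_path: &str, target_dev: &str) -> String {
    // Assumption: `seed_path` needs no XML escaping (no quotes or `&`).
    format!(
        "<disk type='file' device='disk'>\n  \
           <driver name='qemu' type='raw'/>\n  \
           <source file='{seed_path}'/>\n  \
           <target dev='{target_dev}' bus='virtio'/>\n  \
           <readonly/>\n\
         </disk>"
    )
}
```

Calling `render_seed_disk_xml("/tmp/seed.iso", "vdb")` yields a disk with `bus='virtio'` and no cdrom/SATA attributes, which is the property the regression test pins.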
`FleetDeviceSetupScore` gains `FleetDeviceAuth::ZitadelEnroll` —
resolves the device's Zitadel machine user + JSON key inline, then
falls through to the existing keyfile-drop flow exactly as if a
pre-resolved `ZitadelJwt` had been passed.
Two operator workflows fall out of this:
* Dev-on-device — developer runs the score on a Pi with display
attached, browser opens locally to Zitadel SSO, dev signs in with
their personal account (must hold IAM_OWNER or equivalent), score
mints credentials for that one device and brings up the agent.
* Production-via-SSH — operator runs from a workstation, targets
each device over SSH. Browser opens once on the workstation; the
resulting access token is in-memory only for v0 (per-batch token
caching tracked in
ROADMAP/fleet_platform/device_enrollment_token_caching.md).
Implementation:
* `harmony/src/modules/zitadel/admin_auth.rs` — RFC 8628 device-code
flow against Zitadel. Tries `webbrowser::open`, falls back to
printing the URL (SSH sessions just see the URL). Minimum scope
set is `openid urn:zitadel:iam:org:project:id:zitadel:aud`,
enough to call `/management/v1/*` and nothing more.
* `harmony/src/modules/zitadel/setup.rs` — `mint_device_credentials`
helper that reuses the existing find-or-create methods (project,
machine user, user grant) plus `create_machine_key`. Idempotent on
user + grant; always mints a new key because Zitadel does not
return existing key material.
* `harmony/src/modules/fleet/setup_score.rs` — new `ZitadelEnroll`
variant + `AdminAuth::{Sso, Token}`. Resolution runs at the top
of execute(); the rest of the score sees a single shape.
render_toml's match collapses both Zitadel variants into one arm
(they share the issuer/audience/danger fields).
* `harmony/src/modules/fleet/assets.rs` — Debian bookworm arm64
generic-cloud image fetcher. This is the same Debian base
Raspberry Pi OS is built on; Pi OS itself is locked to Pi
hardware (Broadcom firmware) and won't boot in generic KVM.
No sha pin (Debian's `latest/` URL rotates per point release);
swap to a dated subdir if you need cryptographic provenance.
* `examples/fleet_device_enroll/` — single CLI covering both
workflows + a `--launch-pi-vm` switch that boots a Pi-equivalent
VM with one command and prints the SSH details + suggested
follow-up enrollment command. README walks the three flows.
Tests: `render_toml_zitadel_enroll_renders_same_as_zitadel_jwt`
locks the byte-equivalence between the unresolved (Enroll) and
resolved (Jwt) variants — the invariant `execute()` relies on so
TOML rendering is independent of when admin auth resolves.
Adds `webbrowser` as a regular dependency on `harmony` (small,
no feature gate).
A grab-bag of fixes the OKD staging install surfaced. Each landed as a
diagnosable failure during real deploys:
* URL parametrization. ZitadelSetupScore was hardcoded to
`http://127.0.0.1:{port}` with a `Host:` header — fine for k3d
port-forward, broken everywhere else. Adds `scheme: ZitadelScheme`
(Http/Https), `port: Option<u16>` (None → scheme default), and
`endpoint: Option<String>` for the rare port-forward case. The
`Host:` header is now only injected when `endpoint` is set.
* HTTP readiness gate. Helm reports SUCCESS when pods are Ready but
on OKD the Route + cert-manager Certificate reconcile asynchronously
— the first management call after install was dying with
`CaUsedAsEndEntity` (rustls rejecting OKD's bootstrap CA cert
served while cert-manager was still issuing). Score now polls
`/debug/ready` with retry; treats connect / TLS errors as transient.
* Admin password persistence. ZitadelScore was generating a fresh
random password on every run, then printing it in the success
banner — but Zitadel's chart only honors FirstInstance.* on the
first install, so the printed password didn't match what was live
in the DB. Now persisted via harmony_secret (LocalFile by default).
* Login banner shows full SSO loginName. Default Zitadel org name is
ZITADEL → org primary domain is `zitadel.<ExternalDomain>` → admin
preferredLoginName is `admin@zitadel.<host>`. Print the full
string so the operator pastes the right value.
* Shared TLS Secret across Zitadel + login Ingresses. Two
cert-manager-annotated Ingresses on the same host create two
Certificates → two ACME Orders → competing HTTP01 challenges; the
loser's Secret never lands and on OKD the second Ingress's Route
is silently never admitted because the controller inlines TLS
material into the Route at creation time. Login Ingress now
references `zitadel-tls` (same as main) and drops its
cert-manager.io annotation. Documented in
docs/guides/kubernetes-ingress.md as the canonical pattern with the
diagnostic signature so this doesn't get rediscovered.
* fleet_staging_deploy namespaces. The OLDER staging deploy example
hardcoded `fleet-system` / `zitadel`; renamed to `fleet-staging` /
`zitadel-staging` to match `fleet_staging_install`'s convention.
Five example call sites updated for the new ZitadelSetupScore shape;
fleet_e2e_demo / fleet_auth_callout / harmony_sso pass the k3d
port-forward as `endpoint: Some("http://127.0.0.1:8080")`, the
staging examples take the defaults (direct https on 443).
Tests: 8 new unit tests in setup.rs lock the URL builder, Host-header
conditional, scheme serde, and minimal-fields deserialization. One
new test in setup_score covers render_toml.
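The URL parametrization from the first bullet can be sketched as follows. `ZitadelTarget`, `base_url`, and `host_header` are hypothetical names, and always spelling out the port in the URL is an assumption rather than the confirmed behavior:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ZitadelScheme {
    Http,
    Https,
}

impl ZitadelScheme {
    fn as_str(self) -> &'static str {
        match self {
            ZitadelScheme::Http => "http",
            ZitadelScheme::Https => "https",
        }
    }

    fn default_port(self) -> u16 {
        match self {
            ZitadelScheme::Http => 80,
            ZitadelScheme::Https => 443,
        }
    }
}

/// Hypothetical stand-in for the URL-related fields of the setup score.
pub struct ZitadelTarget {
    pub host: String,
    pub scheme: ZitadelScheme,
    pub port: Option<u16>,
    pub endpoint: Option<String>,
}

impl ZitadelTarget {
    /// An explicit `endpoint` (port-forward case) wins; otherwise the URL
    /// is built from scheme, host, and port (None falls back to the
    /// scheme default).
    pub fn base_url(&self) -> String {
        if let Some(endpoint) = &self.endpoint {
            return endpoint.trim_end_matches('/').to_string();
        }
        let port = self.port.unwrap_or(self.scheme.default_port());
        format!("{}://{}:{}", self.scheme.as_str(), self.host, port)
    }

    /// The `Host:` header is only injected when `endpoint` is set.
    pub fn host_header(&self) -> Option<&str> {
        self.endpoint.as_ref().map(|_| self.host.as_str())
    }
}
```

With `endpoint: None` the staging examples get a direct https URL and no `Host:` override; with `endpoint: Some("http://127.0.0.1:8080")` the k3d examples keep the port-forward behavior.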
`PodmanService.env: Vec<(String, String)>` made schemars emit
`items: [{type: string}, {type: string}]` (OpenAPI tuple validation),
which k8s apiextensions rejects with "Forbidden: items must be a
schema object and not an array" — install of the operator's
`deployments.fleet.nationtech.io` CRD blew up at the Helm step.
Introduces `EnvVar { name, value }` in `domain::topology` (with
`From<(String,String)>` for ergonomics) and switches both
`PodmanService.env` and `ContainerSpec.env` to `Vec<EnvVar>`. schemars
now produces `items: { type: object, properties: { name, value } }`
which validates cleanly.
Adds `env_schema_is_object_not_tuple_for_crd_validation` to lock the
schema shape — if anyone reverts to a tuple the test fails before the
operator install does.
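A std-only sketch of the new type and its tuple conversion. The serde/schemars derives the real type carries are omitted here to keep the example dependency-free:

```rust
/// Named env pair; an object schema instead of a two-element tuple.
/// The real type presumably also derives Serialize/Deserialize/JsonSchema.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct EnvVar {
    pub name: String,
    pub value: String,
}

impl From<(String, String)> for EnvVar {
    fn from((name, value): (String, String)) -> Self {
        EnvVar { name, value }
    }
}
```

Existing tuple-based call sites can migrate with `.into_iter().map(EnvVar::from).collect()`.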
ZitadelSetupScore was hardcoded to look for the `iam-admin-pat`
secret in `zitadel`. After ZitadelScore gained a configurable
namespace (so it can deploy into `zitadel-staging`), the setup
score continued reading from the wrong place and failed:
Secret 'iam-admin-pat' not found in namespace 'zitadel' —
ensure ZitadelScore Helm values configure FirstInstance.Org.Machine.Pat
Adds `pub namespace: String` to ZitadelSetupScore (default
"zitadel" via serde for backward compatibility). The 5 example
call sites get explicit `namespace:` fields — fleet_staging_install
threads `cli.zitadel_namespace` through, the rest hardcode the
legacy value to keep their behavior unchanged.
The `read_admin_pat` helper now uses `self.score.namespace`
instead of the const, and the error message points at the
mismatch between ZitadelScore.namespace and ZitadelSetupScore.namespace
as the most likely cause.
The chart's OpenShift-flavored values previously omitted
`ExternalPort` from the configmapConfig. Zitadel falls back to its
internal listen port (8080), which then leaks into every
externally-emitted URL — most visibly the management console URL
and the OIDC issuer claim:
Management Console URL: https://sso-staging.cb1.nationtech.io:8080/ui/console
iss in tokens: https://sso-staging.cb1.nationtech.io:8080
But clients reach Zitadel through the OKD edge-TLS Route on 443.
The mismatch surfaces as JWT-bearer 500s (`Errors.Internal`) and
broken OIDC discovery for any client that compares the issuer to
the URL it actually used.
Fix: resolve `ExternalPort` defensively. When the caller passes
`external_port: Some(p)`, honor it. When `None`, default to 443
for `external_secure: true` and 80 otherwise — matching the
public port the OKD Route serves on.
The K3s/local branch already supported `external_port` overrides
via a separate code path (k3d port mappings); behavior unchanged
there.
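The resolution rule above amounts to the following. `resolve_external_port` is a hypothetical function name; the inputs mirror the `external_port` and `external_secure` fields named in the text:

```rust
/// Explicit port wins; otherwise pick the public port implied by TLS.
pub fn resolve_external_port(external_port: Option<u16>, external_secure: bool) -> u16 {
    match external_port {
        Some(port) => port,
        None if external_secure => 443,
        None => 80,
    }
}
```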
The chart's defaults pin runAsUser=1000 / fsGroup=1000 in the
chart-wide podSecurityContext + securityContext blocks. On
OpenShift, restricted-v2 SCC rejects pods that pin a UID outside
the range allocated in the namespace's `openshift.io/sa.scc.uid-range`
annotation (typically `1000700000/10000`).
Previous attempts:
- `runAsUser: null` in our overrides → schema rejects (`type: integer`)
- omit our overrides → chart defaults apply → SCC rejects 1000
Right answer: read the namespace's `openshift.io/sa.scc.uid-range`
annotation at install time, parse the start UID, inject it as
`runAsUser` + `fsGroup` into every securityContext block we emit.
Schema is happy (integer), SCC is happy (UID is in range).
Wired into the OpenshiftFamily branch of the values renderer:
chart-wide pod + container securityContext, initJob, setupJob,
and login (per-component override that the chart's helpers prefer
over chart-wide). K3s / vanilla K8s gets `1000` literal — chart
default, no SCC to worry about.
Caveat: the namespace must exist before this Score runs (caller's
job; the staging install doc already covers this).
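Parsing the start UID out of the annotation could look like this. It is a sketch that assumes the `<start>/<size>` form shown above; `parse_uid_range_start` is a hypothetical name:

```rust
/// Extracts the first UID of an `openshift.io/sa.scc.uid-range` value
/// such as `1000700000/10000`. Returns None when the value is malformed.
pub fn parse_uid_range_start(annotation: &str) -> Option<u64> {
    // Everything before the first '/' is the start of the range.
    let start = annotation.trim().split('/').next()?;
    start.parse().ok()
}
```

The returned value is what would be injected as both `runAsUser` and `fsGroup`.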
The build context for `podman build` was the workspace root —
fine for cargo's path-deps, but `COPY . .` shipped 147 GB to the
build daemon (target/, .claude/worktrees, .git, demos, network
test data, manual_mint scratch). Tightens the .dockerignore to
exclude the heavy items, dropping the context to ~180 MB.
The callout Dockerfile was also single-stage with a host pre-built
binary (`COPY target/release/harmony-nats-callout`), which conflicts
with the new strict .dockerignore (target/ is now excluded). Rewrote
to mirror the operator's multi-stage cargo-in-Docker shape — same
builder + runtime images, same USER 65532 convention.
Build script consequences:
* No more host-side `cargo build --release -p harmony-nats-callout`
step. Both images now build self-contained from the workspace
context.
* Two podman build invocations (operator + callout), then push.
The k3d e2e harness (`fleet_auth_callout::build_and_load_callout_image`)
was relying on the old single-stage Dockerfile via tempdir staging;
it now writes its own minimal single-stage Dockerfile inline so the
fast local-iteration path is unaffected by the production-shape
change in `nats/callout/Dockerfile`.
Also includes `topology.ensure_ready()` in fleet_staging_install
(needed for cert-manager bootstrap on first apply).
Verified: `podman build` for the callout completes successfully;
operator build is the same shape and was mid-compile in testing.
The Zitadel helm chart's JSON schema validates each securityContext
block against integer types for runAsUser/fsGroup. Setting either
to `null` in values.yaml triggers:
Error: values don't meet the specifications of the schema(s):
zitadel:
- at '/login/podSecurityContext/runAsUser': got null, want integer
The intent of the original `null`s was "let OpenShift's
restricted-v2 SCC assign UID/GID", but the chart's schema rejects
`null` where it expects an integer. The right way to leave the fields
unset is to omit them from the values block entirely; with no key,
the chart's default (also null/unset) applies and the SCC takes
over at admit time.
Strips 14 occurrences of `runAsUser: null` / `fsGroup: null` across
the main pod, init job, setup job, and login pod security contexts.
runAsNonRoot/seccompProfile/capabilities-drop stay — those are
fields the chart accepts.
One-shot script to build + push the operator and auth-callout
container images. Pre-builds the callout binary on the host (its
Dockerfile expects target/release/harmony-nats-callout to exist —
matches the local-k3d iteration convention). Operator image is
self-contained multi-stage.
Defaults: REGISTRY=hub.nationtech.io/harmony, IMAGE_TAG=dev, PUSH=1.
Override via env. Built refs are echoed at the end as the exact
flags to paste into fleet_staging_install.
ZitadelScore gains two fields, both with defaults that preserve
the previous hardcoded behavior:
pub namespace: String // default "zitadel"
pub cluster_issuer: String // default "letsencrypt-prod"
The hardcoded `NAMESPACE` const becomes `pub const DEFAULT_NAMESPACE`
and the YAML's `cert-manager.io/cluster-issuer` annotation now
substitutes `{cluster_issuer}` from the field. Existing struct-literal
ZitadelScore call sites (5 examples) updated to fall through to
`..Default::default()` so older callers compile unchanged.
New example: `examples/fleet_staging_install`. One-shot install of
the fleet stack on OKD-shaped clusters, composing in order:
1. ZitadelScore (helm) into `--zitadel-namespace`
2. ZitadelSetupScore (project + roles + fleet-ops + fleet-operator
machine users)
3. NatsK8sScore: single-instance + auth_callout + WS Route
4. NatsAuthCalloutScore: env-var-only Secret config
5. FleetOperatorScore: credentials TOML inlining the operator's
JSON keyfile via key_json (no volume mounts)
Public hostnames derive from one CLI flag: `--base-domain`. The
demo uses `cb1.nationtech.io` → sso-staging.cb1.nationtech.io and
nats-fleet-staging.cb1.nationtech.io. cert-manager `--cluster-issuer`
defaults to `letsencrypt-prod`. Image refs (`--operator-image`,
`--callout-image`) are required (private registry, no sensible
default).
Generates the issuer NKey + auth pass at install time; the callout's
Secret consumes them via env-from-secret-key. One TOML file
end-to-end: the operator pod's only Secret is the credentials
TOML, single-key, no volumes.
Idempotency note: re-running ZitadelSetupScore with the same project
name short-circuits via the cached client-config. Re-runs of NATS /
operator / callout are idempotent at the Helm/K8sResourceScore level.
`FleetServerScore` now composes:
* `nats: NatsK8sScore` — replaces NatsBasicScore. Same Score that
knows about OKD Routes, the auth_callout block in NATS Helm
values, and the WS edge-TLS wiring. The NatsBasicScore-using
`fleet_server_install` example registers the simple inner
Scores directly (no FleetServerScore wrapper) — keeps the basic
k3d-style install working without forcing it through the
K8s-flavor Score.
* `identity_setup: Option<ZitadelSetupScore>` — runs after the
Zitadel helm install. Provisions project + roles + machine
users via Zitadel's management API. The keys it produces are
what the operator authenticates with.
* `auth_callout: Option<NatsAuthCalloutScore>` — deploys the
callout pod. Pair with `nats.auth_callout = Some(...)` so the
rendered NATS values delegate to the same issuer pubkey.
Execute order:
identity (helm) → identity_setup (API) → nats (with auth_callout
block in values) → auth_callout (pod) → operator
The operator goes last so it doesn't burn reconnect attempts while
the rest comes up; its `connect_with_retry` covers any small
remaining race.
Trait bounds widen to include `Nats + TlsRouter` (for NatsK8sScore's
Route + capability path).
Post-install summary lines added: NATS WS public URL when set,
and a kubectl pointer to the callout deployment.
Operator-side: drops the Secret-as-volume mount entirely. The
operator pod consumes the entire `[credentials]` TOML block —
including the Zitadel JSON keyfile — through one
`valueFrom.secretKeyRef` env var
(`FLEET_OPERATOR_CREDENTIALS_TOML`). No volume, no mount, no
fsGroup, no `0o444` workaround. OKD restricted-v2 SCC compatible.
`OperatorCredentials` collapses to a single field:
pub credentials_toml: String // JSON keyfile inlined under key_json
`SECRET_KEY_ZITADEL_KEYFILE` and `KEYFILE_VOLUME_NAME` constants
removed — no longer used.
harmony-fleet-auth: `CredentialsSection::ZitadelJwt` gains
`key_json: Option<String>`. The factory prefers `key_json` when
non-empty, falls back to `key_path` otherwise. Agent (file-based,
`key_path` populated) keeps working unchanged. Operator (env-only,
`key_json` populated) skips the file read entirely. Tests cover
both shapes plus the default-key_path path.
Internal refactor: `load_machine_key` now delegates to
`parse_machine_key(&str)`, shared with the inline path.
fleet_e2e_demo bring-up rewires the credentials TOML it renders
to embed the JSON keyfile via `key_json = """..."""` instead of
`key_path = "..."`. The `OPERATOR_KEY_MOUNT_PATH` constant is gone
along with the now-unused mount logic. 7 callout tests + 19
fleet-auth tests still green.
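The `key_json`-over-`key_path` preference can be sketched as below. `KeySource` and `select_key_source` are hypothetical names, not the crate's actual API:

```rust
/// Where the machine key material comes from.
#[derive(Debug, PartialEq)]
pub enum KeySource<'a> {
    /// JSON keyfile inlined in the credentials TOML (operator, env-only).
    Inline(&'a str),
    /// Path to a JSON keyfile on disk (agent).
    File(&'a str),
}

/// Prefers a non-empty `key_json`; otherwise falls back to `key_path`.
pub fn select_key_source<'a>(key_json: Option<&'a str>, key_path: &'a str) -> KeySource<'a> {
    match key_json {
        Some(json) if !json.is_empty() => KeySource::Inline(json),
        _ => KeySource::File(key_path),
    }
}
```

An empty `key_json` is treated the same as an absent one, so a blank value in the TOML does not mask a valid `key_path`.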
Replaces the volume-mounted Secret (`/etc/callout/{issuer-nkey-seed,
nats-auth-pass}`) with `valueFrom.secretKeyRef` env vars
(`ISSUER_NKEY_SEED`, `NATS_AUTH_PASS`). The callout binary's
`read_secret` helper already supports both `<NAME>_FILE` and
`<NAME>` — it just falls through to env when the `_FILE` variant
is absent.
Also drops the pod-level `securityContext` block that pinned
`runAsUser: 65532, runAsGroup: 65532, fsGroup: 65532`. OKD's
restricted-v2 SCC rejects pods that pin UID/GID outside the
namespace's allocated range; the SCC will assign appropriate
values from that range when the fields are unset. Container-level
hardening (runAsNonRoot, no-privilege-escalation, RO root fs,
capabilities drop ALL) stays intact.
Tests rewritten to assert the new shape: env vars come from Secret
key refs, no volumes, no pinned UID/GID/fsGroup. 7 callout tests
green.
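The `_FILE`-then-env lookup order can be illustrated as follows. This is a sketch of the pattern, not the callout's actual `read_secret`; the lookup closure and the whitespace trimming are assumptions made to keep the logic testable:

```rust
/// Resolves a secret by first checking `<NAME>_FILE` (a path whose contents
/// are the secret), then falling through to `<NAME>` itself.
/// `lookup` abstracts the environment so the logic can be tested without
/// mutating process state.
pub fn read_secret_with<F>(name: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(path) = lookup(&format!("{name}_FILE")) {
        // Assumption: trailing whitespace/newlines in the file are not
        // part of the secret.
        return std::fs::read_to_string(path)
            .ok()
            .map(|contents| contents.trim_end().to_string());
    }
    lookup(name)
}

/// Process-environment variant of the lookup above.
pub fn read_secret(name: &str) -> Option<String> {
    read_secret_with(name, |key| std::env::var(key).ok())
}
```

With `valueFrom.secretKeyRef` only the plain `ISSUER_NKEY_SEED` / `NATS_AUTH_PASS` variables are set, so the second branch is the one exercised in the pod.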
Extends NatsK8sScore additively (every new field optional, defaults
preserve supercluster shape):
pub gateway: Option<GatewayConfig> // None = single-instance
pub auth_callout: Option<AuthCalloutCfg> // delegate auth to callout
pub websocket: Option<WebSocketRouteCfg> // public WS Route + edge TLS
Render-side:
* `gateway = None` → cluster.enabled=false, replicas=1, gateway
block disabled, no `tlsCA`, no service.ports.gateway
* `auth_callout = Some` → emits authorization.auth_callout block
(using harmony's existing render_auth_callout_block convention)
+ accounts.<account>.users for the bypass user the callout
connects as + accounts.SYS + system_account: SYS. Drops the
legacy testUser + default_permissions — the callout is the
sole authority.
* `websocket = Some` → enables config.websocket.enabled with
no_tls (the Route owns TLS termination).
Routes:
* `gateway` Route stays gated to gateway.is_some(). passthrough on
7222, host = cluster.dns_name. Preserves supercluster behavior.
* `websocket` Route is new. Edge-TLS termination on port 8080
(chart's WS listener), Redirect insecure-edge policy, host from
WebSocketRouteCfg. cert-manager.io/cluster-issuer annotation
drives the Route certificate.
OKDRouteScore gains an `annotations: BTreeMap<String, String>` field
(default empty) + `with_annotation()` builder so callers can attach
the cert-manager annotation without reaching for K8sResourceScore
manually.
Side-effect: `harmony` lib's default features now include `podman`.
The CRD types in `modules::fleet::operator::crd` embed
`ReconcileScore` from `modules::podman` unconditionally — without
the feature on by default, harmony's lib-only builds fail. Existing
explicit `features = ["podman"]` callers are unaffected.
K8sAnywhereTopology's `Nats::deploy` impl populates the new fields
with `gateway = Some(default)` so the capability path keeps the
supercluster behavior it had before this commit.
Removes the hand-typed ScorePayload struct and its custom schemars
schema function. DeploymentSpec.score is now typed as the strongly
typed ReconcileScore enum already used by the agent, eliminating
duplication and ensuring the CRD schema is derived automatically.
- Add JsonSchema derive to PodmanService, PodmanV0Score, ReconcileScore
- Enable podman feature on harmony dependency in operator
- Re-export ReconcileScore/PodmanV0Score/PodmanService from crd module
- Update harmony_apply_deployment and fleet_load_test examples
- Remove TODO comment from harmony_apply_deployment
Wire format is unchanged (adjacently tagged {type, data}), so the
operator -> NATS KV -> agent path remains fully backward compatible.
run_server_install.sh now unconditionally sources
examples/fleet_server_install/env.sh after computing REPO_ROOT, so
the example's env knobs (KUBECONFIG, RUST_LOG, NO_ZITADEL,
ZITADEL_HOST, …) are picked up without the user having to source
manually before invoking the script. The script's `${VAR:-default}`
block only fills in values env.sh leaves unset.
env.sh keeps a (commented-out) KUBECONFIG hint and the new optional
Zitadel knobs documented post-source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs uncovered while running the full e2e walk end to end:
1. find_user_grant POSTed to /management/v1/users/<id>/grants/_search
which Zitadel rejects with 405 Method Not Allowed (the original
author's note in the comment hinted at this). The cache previously
masked it: first apply created the grant + cached the id; second
apply hit the cache and skipped the broken search. The live-query
refactor (f4d6fb94) removed the cache short-circuit, surfacing
the bug as "Create user grant failed: User grant already exists"
on every re-apply.
Fix: switch to the collection endpoint
/management/v1/users/grants/_search with a userIdQuery filter,
matching the Zitadel API that's actually wired up. Now returns
the existing grant on re-apply and the create_user_grant fallback
is correctly skipped.
2. Operator keyfile mounted as 0o400 owned by root. The operator pod
runs as non-root (image USER directive — no fixed runAsUser
because we want SCC compatibility). Result: operator boots,
tries to load the JSON keyfile from the Secret volume, hits
EACCES, fails the credential factory, retries forever.
Fix: mode 0o444. World-read inside the pod is fine — single
container, no other consumers, the Secret namespace is locked
down, and the file never escapes pod-fs. The proper fsGroup-based
alternative requires pinning a UID/GID, which conflicts with our
SCC-friendly choice of leaving runAsUser unset.
Also fixes a stale `git rm` from commit 4194baac
(harmony-fleet-auth extraction) — the agent's local credentials.rs
was deleted from disk but never staged.
Verified end to end:
* STACK READY in 2 min on warm cluster
* Operator pod: "minted fresh Zitadel access token", "NATS connected",
"starting Deployment controller", "watching device-info KV"
* 2 Device CRs auto-created with full label set
* `kubectl apply -f` of a Deployment CR with
targetSelector.matchLabels: { group: group-a } produced:
- status.aggregate { matched=1, succeeded=1, failed=0 }
- HTTP 200 from nginx on vm-device-00:8080
- connection refused from vm-device-01:8080 (correctly excluded)
The agent's periodic reconcile destroys-and-recreates any service
whose ContainerSpec has env or volumes, every 30s tick. Root cause:
matches_spec returns false unconditionally for those fields because
podman's list endpoint doesn't surface them; the original author
chose to declare "any spec with state is drifted" as a fail-safe.
That fail-safe weaponizes the polling reconciler into a loop.
Tags the offending line with a multi-paragraph FIXME explaining
the symptom, the root cause, the proposed fix (containers.inspect
+ structural compare + an integration test), and the demo-time
workaround (keep demo specs trivial — the hello-web nginx demo
already is).
Adds the same gap to ROADMAP/fleet_platform/v0_demo_e2e.md's
known-risks section so it's visible at planning time.
Out of scope for tonight; in scope for delivery alongside the
upcoming health-check support on ContainerSpec.
The operator was opening a bare async_nats::connect with no auth,
which would fail closed against a callout-protected NATS. Wires it
through the same JWT-bearer flow the agent uses, sharing the
recently-extracted harmony-fleet-auth crate.
Operator side
-------------
* main.rs: read FLEET_OPERATOR_CREDENTIALS_TOML (TOML snippet, same
shape as the agent's [credentials] block — single
CredentialsSection struct, just a different byte source). Empty
string bypasses (callout-less dev only, with a loud warning).
* chart.rs: ChartOptions gains an optional OperatorCredentials field.
When set, build_chart's Deployment mounts a Secret as both
envFrom (TOML payload → FLEET_OPERATOR_CREDENTIALS_TOML) and a
volume mount for the JSON keyfile at the configured key_path
(defaults to /etc/fleet-operator/zitadel-key.json). On-disk helm
chart still emits credentials: None — those are environment-
specific and out of scope for a redistributable chart.
* Public manifest builders (build_service_account, build_cluster_role,
build_cluster_role_binding, build_operator_deployment,
operator_secret) so the e2e bring-up can apply each resource via
K8sResourceScore without re-implementing the manifests.
* mod chart now lives in lib.rs so external consumers (the e2e
bring-up) can reach into it.
E2e bring-up
------------
* Bring-up gains a separate `fleet-operator` machine user with the
fleet-admin role grant — distinct from the manual-admin
`fleet-ops` user so audit logs can tell automated operator
actions apart from human ones.
* New steps 8/10 (build + sideload operator image) and 9/10 (apply
CRDs + RBAC + Secret + Deployment + wait for Ready). Devices step
becomes 10/10.
* Reuses harmony_fleet_operator's manifest builders + operator_secret
via K8sResourceScore — no duplicated YAML, no shell-out.
Tests
-----
* All existing tests pass (harmony-fleet-auth: 18, harmony-fleet-agent:
7, harmony-fleet-operator: 2). E2e walking-skeleton is exercised
by the next phase's clean rerun.
Bumps coverage on harmony-fleet-auth from 5 to 18 unit tests. The
new tests lock the corners we burned cycles on while debugging
the live system:
* cache freshness boundary (within-leeway, outside-leeway,
no-cache, non-zitadel variant)
* assertion claim shape (iss/sub/aud/exp/iat) and the 60-second
lifetime constant Zitadel enforces server-side
* scope string content (plural-projects-roles + singular-project-id
URN + openid base)
* token URL strips trailing slashes (the //oauth/v2/token 404
waiting to bite the next operator)
* MachineKeyFile JSON parsing under Zitadel's wire shape
Refactor: build_assertion now delegates to build_assertion_claims
+ build_assertion_header (pure, no signing). Lets the claim/header
shape be unit-tested without an RSA private-key fixture; the
sign-and-decode end-to-end is still covered by the e2e harness.
No new deps. wiremock not needed — every meaningful assertion is
on pure logic.
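As an illustration of the trailing-slash case, a pure token-URL builder might look like this. `build_token_url` is named in a later commit, but its signature here is an assumption:

```rust
/// Joins the issuer with Zitadel's token path, stripping any trailing
/// slashes first so the result never contains `//oauth/v2/token`.
pub fn build_token_url(issuer: &str) -> String {
    format!("{}/oauth/v2/token", issuer.trim_end_matches('/'))
}
```

Because the function is pure, it can be tested without a mock HTTP server.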
The agent's `credentials.rs` + `CredentialsSection` enum graduate
into a workspace crate (`fleet/harmony-fleet-auth/`) so the
operator can consume the same code path. Single struct, single
factory, single auth-callback wiring. The only thing that varies
between consumers is where the `[credentials]` TOML bytes come
from — the agent reads them from a config file on disk, the
operator (next commit) will read them from an env var.
Public surface of the new crate:
CredentialsSection — the deserializable
CredentialSource / NatsCredential — the runtime objects
MachineKeyFile / CachedToken — helper types
credential_source_from_config — factory
connect_options_with_credentials — async-nats wiring
Agent consumes via `pub use harmony_fleet_auth::CredentialsSection`
in its own `config.rs` so existing call sites keep working.
Existing 5 tests in the new crate + 7 in the agent all green.
This commit is structurally a move; behavior unchanged. Operator
wiring, additional unit tests, and the JWT-mint refactor (split
build_assertion / build_scope / build_token_url for testability)
follow in the next commits.
Working PyJWT script + nats CLI commands for talking to a
callout-protected NATS by hand. Distills what we learned debugging
the auth chain: which scope claims matter, why the audience is the
project id (not the API app's clientId), how to read OIDC_AUDIENCE
off the live callout instead of trusting the cache, and the failure
modes — including the PyJWT vs jwt package collision that costs
30 minutes the first time you hit it.
Cross-linked from fleet-zitadel-faq.md.
ZitadelClientConfig was used as both a key store (machine keys —
which Zitadel cannot return after creation, so caching is required)
AND a lookup cache (project_id, machine_user_ids, user_grants).
The latter introduced a silent drift class:
- ZitadelSetupScore writes the cache incrementally as it creates
each resource.
- If Zitadel is reset between runs (Postgres recreated, IDs
reissued), the cache still holds the old IDs.
- ensure_project / ensure_app / ensure_machine_user / user_grant
short-circuited on cache hit and never consulted Zitadel — so
downstream Scores got the stale ID.
- The legacy `project_id` field was further `is_none`-guarded so it
preserved the very first id ever seen, surviving any number of
Zitadel resets.
Net effect in the wild: the deployed callout's `OIDC_AUDIENCE`
silently pointed at a project that no longer existed, while
agents kept working only because their TOML config carried the
matching stale id. A manual mint script reading `project_id` from
the cache would produce tokens that pass signature validation but
fail the audience check — exactly the symptom that surfaced this
bug.
Fix: drop the cache-hit short-circuit in every ensure_* path and
always live-query. The cache now only holds machine key material
(its only legitimate role) and a record of last-known IDs that
get refreshed on every apply. Cost: ~1 extra HTTP per project /
app / user / grant per Score apply — these are not hot paths.
Also: stop is_none-guarding `config.project_id` so the legacy
field tracks live state for older single-project consumers.
Flip the polarity of the Zitadel knobs in run_server_install.sh: the
Score is now installed on every run, and `NO_ZITADEL=1` is the
explicit skip. Defaults: ZITADEL_HOST=zitadel.localhost (HTTP ingress
auto-selected by the example crate's `.localhost` rule). ZITADEL_VERSION
stays optional (empty = inherit the example's clap default).
Updates env.sh to document the new polarity (NO_ZITADEL as the opt-out,
ZITADEL_HOST/VERSION as overrides on top of the defaults).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FleetServerScore gains `pub identity: Option<ZitadelScore>` and a
conditional `.interpret()` call after the operator install. Trait
bounds widen from `Topology + HelmCommand` to
`Topology + HelmCommand + K8sclient + PostgreSQL` to satisfy the
ZitadelScore impl — both inner Scores need the wider topology even
when identity is None (Rust trait bounds are static).
Example crate consequences:
- Switched topology from K8sBareTopology to K8sAnywhereTopology
(provides PostgreSQL via CNPG). `ensure_ready` now installs
cert-manager as a side effect — Zitadel's prod ingress needs it
anyway, and it's harmless on k3d.
- New CLI flags: --zitadel-host (Option<String>; omitted = no Zitadel),
--zitadel-version, --zitadel-insecure. Dev-friendly defaults: hosts
ending in .localhost / .test default to external_secure=false.
- Outcome details now include the Zitadel URL when identity is set.
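
The dev-friendly default for `external_secure` can be sketched as
below. The helper name is hypothetical and the trailing-dot and
case normalization are assumptions; only the `.localhost` / `.test`
rule comes from this change:

```rust
/// Hypothetical helper: pick the `external_secure` default when the
/// caller did not pass an explicit flag.
fn default_external_secure(host: &str) -> bool {
    // Normalize a trailing dot and case before comparing suffixes.
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    // Hosts on dev-only TLDs default to plain HTTP.
    !(host.ends_with(".localhost") || host.ends_with(".test"))
}
```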
Auxiliary:
- Added env.sh next to the example, mirroring okd_add_node's pattern
(KUBECONFIG / RUST_LOG / sqlite secret store paths, with optional
ZITADEL_HOST documented).
- run_server_install.sh now reads ZITADEL_HOST / ZITADEL_VERSION env
and passes them through. Trailing banner conditionally prints the
Zitadel `helm uninstall` command alongside the operator one.
Out of scope: load-test.sh drives the same example crate and may
need a topology audit after this change. Flagged for follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When `--score-only` is set, skips cluster create, operator image build, and k3d sideload:
just refreshes the kubeconfig and runs the Score against the already-
bootstrapped cluster. Shaves the slow rebuild + sideload off the dev
loop when iterating on Score-side code with the operator binary
unchanged.
Errors out cleanly if --score-only is passed but the cluster is
missing (instead of letting cargo trip on a missing kube context).
Unknown flags also fail fast.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch example_fleet_server_install from a manual `create_interpret().
execute()` + `println!` to `harmony_cli::run`, which wires up the
framework's standard logger + reporter — emoji-tagged per-Score
progress lines and an end-of-run summary listing each Score's
`Outcome.details`. Mirrors the okd_add_node example's pattern.
For events to fire on the inner Scores, FleetServerScore now calls
`Score::interpret` (not `create_interpret().execute`) on
NatsBasicScore + FleetOperatorScore. Same change inside
FleetOperatorScore for its inner HelmChartScore.
Outcome.details populated:
- FleetOperatorScore: image, namespace, release_name, NATS URL.
- FleetServerScore: in-cluster NATS URL, kubectl pointer to the
operator deployment, kubectl tip for verifying CRDs.
Progress logs added inside FleetOperatorScore between the chart-
render and helm-install phases (`info!`).
FleetOperatorScore fields are now `pub` so callers can read them
post-construction (FleetServerScore needs `operator.namespace` for
its summary). Builder methods unchanged; both styles coexist.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two scripts for running the new install Score against a local cluster:
- examples/fleet_server_install/run.sh — generic, cwd-independent
passthrough around `cargo run -p example_fleet_server_install`.
- fleet/scripts/run_server_install.sh — opinionated k3d test harness:
creates `fleet-server-test` cluster if absent (with NATS port 4222
mapped through klipper-lb), builds the operator image via
build_docker.sh, sideloads it, runs the Score, and leaves the
cluster up. Prints teardown + redeploy commands at the end. Header
documents the helm-idempotency limitation: a rebuilt image won't
redeploy on a second run unless `helm uninstall` is invoked first
(HelmChartScore short-circuits on chart_version match). Proper fix
is deferred — content-hash chart_version or a force_upgrade flag.
Dockerfile glibc pin: builder pinned to `rust:1.94-slim-bookworm`.
Unsuffixed `rust:slim` follows Debian's latest stable (trixie =
glibc 2.41), so binaries built there fail to start on the
`debian:bookworm-slim` runtime (glibc 2.36) with "GLIBC_2.39 not
found". Surfaced when running the new scripts end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collapses the load-test harness's chart-gen + helm-install dance into
first-class Harmony Scores. Customer-facing path:
    let score = FleetServerScore::new(nats, operator);
    score.create_interpret().execute(&Inventory::empty(), &topology).await?;
FleetOperatorScore renders the operator chart (CRDs + RBAC + ServiceAccount
+ Deployment) into a tempdir and delegates to HelmChartScore. FleetServerScore
composes it with NatsBasicScore via fail-fast `?` chaining; Zitadel + Argo
hang off the same chain when their Scores land.
Structural change: CRD type definitions and chart-builder moved from
fleet/harmony-fleet-operator/src/{crd,chart}.rs into
harmony/src/modules/fleet/operator/. Harmony can't depend on the operator
crate (cycle), so the score-side code lives in harmony and the operator
binary imports the types right back via
`harmony::modules::fleet::operator::*`. Considered keeping CRDs in the
operator crate with the score either there or in a sibling crate, but
putting customer-facing scores in harmony/src/modules/fleet/ matches the
existing convention (FleetDeviceSetupScore, ProvisionVmScore) and keeps
the CRDs reachable from future harmony scores (e.g. an inventory aggregator
reading Device CRs) without dragging in the operator binary.
The operator's `chart` subcommand stays as a developer convenience
(routes through harmony::modules::fleet::operator::build_chart) so
`cargo run -p harmony-fleet-operator -- chart` still produces an
identical chart on disk for inspection. Existing examples
(fleet_load_test, harmony_apply_deployment) updated to import CRD types
from harmony directly.
load-test.sh phase 3c collapses to a single
`cargo run -p example_fleet_server_install` invocation; phase 2b's NATS
install still runs separately so the host-side NATS reachability probe
sits where it always did. Idempotency: re-running short-circuits via
HelmChartScore::find_installed_release on both inner installs.
Verified: cargo fmt --check, cargo clippy, cargo test all pass; the
4 fleet operator unit tests (2 migrated from operator crate, 2 new on
FleetOperatorScore defaults/builders) pass under `cargo test -p harmony`;
operator chart subcommand produces an identical chart structure
post-refactor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure rustfmt wrapping on long lines that pre-dated this branch — surfaced
when running `cargo fmt --check` as part of unrelated work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors .gitea/workflows/harmony_composer.yaml: on push to master (or
manual dispatch), build the multi-stage Dockerfile and push
hub.nationtech.io/harmony/harmony-fleet-operator:latest. No buildx
caching yet — TODO comment in the workflow tracks it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The operator Dockerfile previously copied a host-built binary into
archlinux:base — archlinux was a glibc-ABI workaround for that
host-build path. Convert to a two-stage build (rust:1.94-slim →
debian:bookworm-slim) so cargo runs inside the image. load-test.sh
loses its host cargo build + staging-context trick and now points
podman at the workspace root with -f. Add build_docker.sh as the
local Harbor entry point (DOCKER_TAG, PUSH overrides).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The agent's data plane was JetStream-KV-only, so live observers
that don't want to consume the JS stream had no signal to subscribe
to. The walking-skeleton e2e admin test was failing as a result —
admin subscribes to `device-state.>` (the per-device direct
subject) and saw nothing in 30s.
This commit adds a small core-NATS publish on `device-state.<id>`
alongside the existing KV writes:
- `FleetPublisher::publish_state_pulse()` emits a tiny
`{device_id, kind: "heartbeat", at}` payload on
`device-state.<device_id>`, called from the heartbeat loop so
observers see traffic on the same 30s cadence as the KV
heartbeat write — but on a non-JetStream subject anyone can sub
to.
- `write_deployment_state()` now fans out the same payload it puts
in the KV bucket on the direct subject, so live admin tooling
picks up reconcile transitions immediately without watching the
KV stream.
Also threads `device_id_prefix_strip = "device-"` through the
fleet_e2e_demo bring-up. The bring-up has its own NatsAuthCalloutScore
construction (parallel to fleet_auth_callout's `bring_up_stack`),
and was missing the prefix-strip line, so the deployed callout was
interpolating permissions against `device-vm-device-00` instead of
the bare device id the agent uses.
Locks the regression with a unit test
(`device_id_prefix_strip_lands_as_env_value`) on the deployment
manifest builder.
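
For illustration, the prefix-strip behaviour amounts to something like
the following. The function name is hypothetical; the real validator
may differ in how it treats a missing prefix:

```rust
/// Strip `prefix` from the extracted claim when present. An empty
/// prefix is a no-op, and a claim without the prefix is returned
/// unchanged (an assumption of this sketch).
fn strip_device_id_prefix<'a>(claim: &'a str, prefix: &str) -> &'a str {
    claim.strip_prefix(prefix).unwrap_or(claim)
}
```

With the prefix set to `device-`, a claim of `device-vm-device-00`
yields the bare `vm-device-00` that the agent uses in its subjects.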
Verified end-to-end in the VM rehearsal:
test both_devices_heartbeat_within_60s ... ok
test admin_jwt_reads_any_device_subject ... ok
Two bugs surfaced when the agent went live against NATS JetStream KV
in the VM-based e2e rehearsal:
1. The default `device` role only allowed flat `device-state.<id>` /
`device-commands.<id>` subjects. The agent's actual data plane is
JetStream KV, which puts every operation on `$KV.<bucket>.<key>`
subjects with control-plane traffic on `$JS.API.>` and `$JS.ACK.>`.
With the old role config, the very first KV publish died with
`Permissions Violation for Publish to "$JS.API.INFO"`.
The role now allows `$JS.API.>` + `$JS.ACK.>` plus the four
per-device data subjects derived from
harmony_reconciler_contracts::kv (info.<id>, state.<id>.<dep>,
heartbeat.<id>, desired-state.<id>.<dep>). The legacy direct
`device-state.<id>` / `device-commands.<id>` subjects are kept so
non-JetStream callers of NatsAuthCalloutScore still work.
A new unit test (`device_role_covers_reconciler_contract_kv_subjects`)
imports the contract crate as a dev-dep and asserts each contract-
produced subject is matched, plus that cross-device subjects are
*not* matched. This locks the role config to the contract surface so
future renames break the test before they break prod.
2. Zitadel's `client_id` claim for a machine user equals the userName
verbatim. Both `fleet_rpi_setup` and `fleet_e2e_demo` create the
user as `device-{device_id}`, so the JWT carries
`device-vm-device-00` while the agent's KV keys use the bare
`vm-device-00`. The callout was interpolating the prefixed string
into permissions, producing rules that never matched what the
agent actually publishes.
Adds `device_id_prefix_strip` (env: `DEVICE_ID_PREFIX_STRIP`,
defaults empty so existing deployments are unaffected). When set,
the validator strips the prefix from the extracted claim before
permission interpolation. The fleet_auth_callout example wires it
to `device-` so the e2e harness stays end-to-end correct without
reaching into either naming convention.
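
The role check in item 1 relies on NATS subject wildcards. As a rough
sketch of the matching rule the unit test exercises (`*` matches
exactly one token, `>` matches one or more trailing tokens), assuming
a standalone helper that is not part of the callout crate:

```rust
/// Returns true when `subject` is covered by the permission `pattern`.
/// Tokens are separated by `.`; `*` matches exactly one token and `>`
/// (only valid as the last token) matches one or more remaining tokens.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            // `>` needs at least one subject token and must end the pattern.
            (Some(">"), Some(_)) => return pat.next().is_none(),
            (Some("*"), Some(_)) => {}
            (Some(p), Some(s)) if p == s => {}
            // Both exhausted at the same time: full match.
            (None, None) => return true,
            // Length mismatch or differing literal token.
            _ => return false,
        }
    }
}
```

Under this rule `$JS.API.>` covers `$JS.API.INFO`, while a per-device
pattern such as `device-state.vm-device-00` does not cover another
device's subject.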
Verified end-to-end: both VM agents now publish DeviceInfo /
heartbeat through JetStream KV with no permission errors and zero
service restarts since the rollout.
The cargo bin target is `harmony-fleet-agent`, not `fleet-agent` —
the latter never existed under target/release. Smoke-a4 happened to
work because callers passed --agent-binary explicitly; the harness
defaults didn't.
Zitadel only includes the project-roles block in an access token when
the JWT-bearer request asks for it via the
`urn:zitadel:iam:org:projects:roles` scope (PLURAL "projects"). Without
it the agent's token has a valid signature/audience but no roles, so
the NATS auth callout rejects with "no authorized role in token" even
though the machine user has a "device" grant.
Discovered while running the VM-based e2e rehearsal: agents could mint
a token, connect to NATS, then immediately fail authorization. The
plural-projects vs. singular-project distinction is a Zitadel
convention; both scopes are required, and the comment now spells out
what each one does.
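
A sketch of how the scope string for the JWT-bearer request might be
assembled. The helper name, the inclusion of `openid`, and the exact
singular scope (`urn:zitadel:iam:org:project:id:<project_id>:aud`) are
assumptions here; the text above only names the plural roles scope:

```rust
/// Hypothetical helper: space-separated scopes for the token request.
fn jwt_bearer_scope(project_id: &str) -> String {
    [
        "openid".to_string(),
        // Singular "project": adds the project id to the token audience.
        format!("urn:zitadel:iam:org:project:id:{project_id}:aud"),
        // Plural "projects": asks Zitadel to include the roles block.
        "urn:zitadel:iam:org:projects:roles".to_string(),
    ]
    .join(" ")
}
```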
Wires the previously-built FleetDeviceSetupScore through to a
LinuxHostTopology against each pre-provisioned VM. Mirrors the
fleet_rpi_setup pattern but synthesizes inline so the harness drives
N VMs in sequence without re-deriving the CLI plumbing.
Each VM gets:
- An /etc/hosts entry mapping `sso.fleet.local` → libvirt host IP
via the new HostsEntry support, so the in-VM agent's HTTP client
to Zitadel can resolve the issuer.
- The per-device Zitadel machine key dropped at
/etc/fleet-agent/zitadel-key.json.
- Agent TOML with `type = "zitadel-jwt"` pointing at the keyfile.
- Agent service started under systemd.
SSH user assumed `fleet-admin` (matches what fleet_vm_setup +
smoke-a4 cloud-init create). Private key from the harmony fleet
keypair (ensure_fleet_ssh_keypair).
After this commit, `cargo run -p example-fleet-e2e-demo` is the
single command that turns a fresh k3d + 2 booted VMs into a
fully-converged stack: Zitadel + NATS callout + 2 agents speaking
JWT-bearer to NATS. Next step, tomorrow morning: prove it actually does
that on a clean machine.
Adds `examples/fleet_e2e_demo/` — composes fleet_auth_callout's
existing pieces (Zitadel + auth callout deploy) with per-device
machine-user provisioning (one ZitadelSetupScore call per VM) and
FleetDeviceSetupScore using FleetDeviceAuth::ZitadelJwt. The harness
expects pre-provisioned libvirt VMs (one per device) reachable via
`FLEET_E2E_VM_<i>_IP` env vars; full VM provisioning via
ProvisionVmScore is a follow-up — keeping the harness observable in
pieces during the cold-start debugging tomorrow.
Constituent helpers in `fleet_auth_callout::lib.rs` flipped from
private to `pub` (deploy_zitadel, wait_for_zitadel_ready,
ensure_issuer_seed, build_and_load_callout_image, etc.) so the new
harness composes them rather than re-implementing.
`bring_up_full_stack`:
1. Ensure k3d cluster (re-uses fleet_auth_callout's create_k3d).
2. Deploy Zitadel + Postgres.
3. CoreDNS rewrite + wait for Zitadel HTTP + wait for the
chart-provisioned `iam-admin-pat` secret. (Last step is new and
load-bearing — without it ZitadelSetupScore races the chart's
setup job and fails on first cold-run.)
4. ZitadelSetupScore for project + API app + roles + admin
machine-user (admin gets fleet-admin role grant).
5. Issuer NKey from a persisted secret + NATS deploy with
auth_callout block + callout pod.
6. For each device i: per-device ZitadelSetupScore (machine-user
with `device` role grant), pull the JSON keyfile from cache,
render the agent's TOML with the keyfile path. (FleetDeviceSetupScore
invocation is wired structurally; the SSH-and-apply step is
gated behind the VM provisioning follow-up.)
`HostsEntry` + `merge_hosts_file` added to FleetDeviceSetupScore so
VMs on a libvirt NAT can resolve `sso.fleet.local` to the host
gateway. Managed-block markers in /etc/hosts make the merge
idempotent across re-runs and removable when entries are dropped
from the score. Four new unit tests cover the merge invariants
(insert, replace, strip, byte-stable).
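
The managed-block merge can be sketched roughly as follows. The marker
strings, the tab-separated entry format, and the signature are
assumptions for illustration; the real `merge_hosts_file` may differ:

```rust
// Hypothetical marker lines delimiting the managed block.
const BEGIN: &str = "# BEGIN harmony-managed hosts";
const END: &str = "# END harmony-managed hosts";

/// Rebuild the hosts file: keep every line outside the managed block,
/// drop the old block, and append a fresh block when `entries` is
/// non-empty. Output lines always end in `\n`.
fn merge_hosts_file(existing: &str, entries: &[(&str, &str)]) -> String {
    let mut out = String::new();
    let mut in_block = false;
    for line in existing.lines() {
        match line.trim() {
            BEGIN => in_block = true,
            END => in_block = false,
            // Old managed entries are discarded.
            _ if in_block => {}
            _ => {
                out.push_str(line);
                out.push('\n');
            }
        }
    }
    if !entries.is_empty() {
        out.push_str(BEGIN);
        out.push('\n');
        for (ip, host) in entries {
            out.push_str(&format!("{ip}\t{host}\n"));
        }
        out.push_str(END);
        out.push('\n');
    }
    out
}
```

Re-running with the same entries reproduces the same bytes, and
passing an empty entry list removes the block entirely.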
Test skeleton in `tests/e2e_walking_skeleton.rs`:
- `both_devices_heartbeat_within_60s` — implemented; reads from
device-info KV via admin token.
- `admin_jwt_reads_any_device_subject` — implemented; subscribes
to `device-state.>` as admin.
- `cross_device_isolation_enforced_in_vm` — `#[ignore]` pending
per-device-key plumbing through E2eHandles.
- `agent_recovers_from_nats_pod_restart` — `#[ignore]` pending
the NATS-pod-restart driver.
The two `#[ignore]`d tests cover the load-bearing reconnect and
isolation invariants. Wiring them is the morning-of-rehearsal
priority since those are the customer-facing claims.
Out of scope of this commit (called out in the roadmap doc):
- ProvisionVmScore integration (today operator runs fleet_vm_setup
out-of-band).
- Operator install via Helm (smoke-a4 runs operator host-side; this
harness inherits that pattern).
- Full SSH-based agent install via FleetDeviceSetupScore — Score
built, invocation gated.
Adds ROADMAP/fleet_platform/v0_demo_e2e.md and threads it from
v0_1_plan.md. The VM rehearsal extends smoke-a4 (already-green k3d
+ libvirt VM + agent + apply CR + reconcile loop) with Zitadel +
auth callout + agent JWT auth. Two devices + one admin, real
cargo tests sharing a OnceCell-bringup.
Plan calls out:
- The 7 tests, including the load-bearing
`agent_recovers_from_nats_pod_restart` (asserts the auto-reconnect
+ auth-callback re-mint path under realistic disturbance).
- Five known risks / debugging traps to expect on first cold-start
(iam-admin-pat secret timing, /etc/hosts injection, k3d port
collisions, etc.).
- Success criteria for the rehearsal day: cold cargo run greens in
<20 min, all 7 tests green on a clean machine, the NATS-restart
test reliably greens 5 runs in a row.
- Anything below the success criteria → reframe the customer call
to "architecture walkthrough + local k3d demo + pilot in 1-2
weeks." Avoids burning the relationship to keep a deadline.
Once VM rehearsal is green the residual OKD deltas are configuration
(Route annotations, image registry, real DNS, cert) — no new code.
The VM smoke harness still uses shared NATS creds for v0 (there is no
Zitadel JWT path through libvirt; the customer-facing Pi flow has it via
fleet_rpi_setup --bootstrap-token). This commit rewrites the
FleetDeviceSetupConfig literal against the new `auth: FleetDeviceAuth` field.
Hands-on walkthrough for the 48-hour customer demo:
- Operator: build/push the callout image → fleet-staging-deploy →
capture project_id + cli_client_id from the printed panel.
- Developer: fleet-sso-login proves Zitadel SSO works end-to-end.
- Pi onboarding: extract iam-admin-pat from the staging cluster,
cross-compile the agent for aarch64, run fleet-rpi-setup once
per device with --bootstrap-token. Each Pi's agent connects to
NATS over WSS using the JWT-bearer token minted from its
per-device keyfile.
- Deploy a container to a labeled subset via
example_harmony_apply_deployment with --env / --volume / --restart
flags (env + bind mounts + restart policy that work_item #1 added).
- Observe the cross-device security model holding via the auth
callout's logs.
Also captures what's deliberately NOT in the demo (compose
auto-translation, UI, Tailscale backdoor, device-join-request
flow, OpenBao, K8s OIDC) so the customer call has clean expectation-
setting.
The runbook is the closing piece of the 48h-demo work plan;
sequenced after the eight feat / refactor commits that built the
underlying functionality.
Adds `examples/fleet_sso_login/` — the developer-side CLI that proves
the SSO works end-to-end against a deployed staging instance. RFC 8628
device-code flow:
- POSTs `/oauth/v2/device_authorization` with the harmony-cli client_id.
- Prints `verification_uri_complete` so the developer opens one URL in
the browser; Zitadel handles the auth (username/password, MFA,
whatever the customer has wired into Zitadel's auth chain).
- Polls `/oauth/v2/token` honouring the standard `authorization_pending`
/ `slow_down` polling protocol.
- On success: decodes the access token's claims, prints
`Welcome <name> <email>`, persists the session (issuer + client_id +
access_token + claims) at $DATA_DIR/harmony/sso-session.json with
mode 0600.
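
The polling rule from RFC 8628 (section 3.5) can be sketched as a
small decision function. The type and function names are hypothetical,
and the HTTP call itself is left out:

```rust
use std::time::Duration;

/// What the poll loop should do after an error response from the
/// token endpoint.
#[derive(Debug, PartialEq)]
enum PollStep {
    /// Sleep for this long, then poll again with this interval.
    Wait(Duration),
    /// Terminal error (e.g. `access_denied`, `expired_token`).
    Fail(String),
}

fn on_token_error(error: &str, interval: Duration) -> PollStep {
    match error {
        // User has not finished in the browser yet: keep the interval.
        "authorization_pending" => PollStep::Wait(interval),
        // Server asks us to back off: the RFC requires +5 seconds.
        "slow_down" => PollStep::Wait(interval + Duration::from_secs(5)),
        other => PollStep::Fail(other.to_string()),
    }
}
```

The caller carries the returned interval forward, so a `slow_down`
permanently widens the polling gap for the rest of the attempt.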
For the demo this proves the SSO chain end-to-end. The actual
`harmony fleet apply` operation (which would consume the persisted
token through a fleet-platform API gateway) is post-demo — clusters
typically don't accept Zitadel JWTs as kube-apiserver bearer tokens
without an OIDC integration the customer would have to opt into.
`fleet_staging_deploy` now also provisions a `harmony-cli` Device
Code OIDC application alongside the existing API app, captures its
client_id from the ZitadelClientConfig cache, and prints both the
client_id and the exact `cargo run -p example-fleet-sso-login ...`
invocation in the operator's "next steps" panel.
Adds `examples/fleet_staging_deploy/` — the operator-side, run-once-
per-customer harness that brings up the fleet platform's central
services on a real OKD/K8s cluster. Complements the existing
`fleet_auth_callout` (k3d local-dev harness, kept unchanged) and
`fleet_rpi_setup` (per-device onboarding).
`FleetDomainConfig` is the single source of truth for hostnames:
base_domain = "customer1.nationtech.io"
→ zitadel.<base> (Zitadel HTTPS via OKD HAProxy edge-TLS)
→ nats.<base> (NATS WSS through the same ingress)
Nothing is hardcoded; the operator supplies one --base-domain flag
and the deploy is fully parameterized. Re-running is idempotent
(rides the helm-upgrade-by-default + ZitadelSetupScore search-then-
create + persisted issuer-NKey-secret idempotency layers).
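
As a sketch of the "single source of truth" idea, every hostname is
derived from the one `base_domain` value. The method names below are
hypothetical; only the `zitadel.<base>` / `nats.<base>` layout comes
from this change:

```rust
struct FleetDomainConfig {
    base_domain: String,
}

impl FleetDomainConfig {
    /// zitadel.<base>
    fn zitadel_host(&self) -> String {
        format!("zitadel.{}", self.base_domain)
    }

    /// nats.<base>
    fn nats_host(&self) -> String {
        format!("nats.{}", self.base_domain)
    }

    /// WSS URL on the default HTTPS port of the shared ingress.
    fn nats_wss_url(&self) -> String {
        format!("wss://{}", self.nats_host())
    }
}
```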
NATS values render under config.merge.{auth_callout, accounts,
system_account}, with WSS via `websocket: { enabled, port: 8443,
ingress: { className: openshift-default, ... } }` and the OKD-flavored
HAProxy edge-TLS annotations:
route.openshift.io/termination: edge
haproxy.router.openshift.io/timeout: "1h"
(Switch to `reencrypt` when the customer wants pod-to-edge TLS;
gateway-api migration is on their roadmap, separate from the demo.)
bring_up_staging():
- Deploys ZitadelScore (external_secure: true, no external_port → 443).
- Waits for HTTPS .well-known.
- Provisions the project + API app + roles via ZitadelSetupScore
hitting Zitadel through the public ingress (port 443, TLS verified).
No machine users provisioned — fleet_rpi_setup mints them on demand
per device, so the staging deploy stays device-count-agnostic.
- Persists / reads the issuer NKey seed in the
`callout-issuer-seed` K8s secret (so re-runs don't invalidate
user JWTs already in flight on customer Pis).
- Deploys NATS via NatsHelmChartScore with the WSS values.
- Deploys NatsAuthCalloutScore (oidc_audience = project_id;
external_secure path means no danger_accept_invalid_certs).
main.rs ends by printing the exact `cargo run -p
example-fleet-rpi-setup ...` invocation the operator runs against a
Pi, with the project_id and zitadel/nats URLs filled in.
Three unit tests cover the domain config + NATS values rendering
(WSS + edge-TLS annotations + auth_callout under merge).
The Pi onboarding flow can now mint a per-device Zitadel machine user
on the operator's machine and ship the resulting JWT key to the Pi —
the agent then authenticates to NATS via JWT-bearer instead of shared
nats_user/nats_pass.
`FleetDeviceSetupConfig.auth: FleetDeviceAuth` replaces the previous
flat `nats_user` / `nats_pass` fields. Two variants:
- TomlShared { nats_user, nats_pass } — legacy / dev fallback.
- ZitadelJwt { machine_key_json, oidc_issuer_url, audience, ... } —
per-device JWT-bearer. The Score:
* Drops `machine_key_json` to /etc/fleet-agent/zitadel-key.json
(mode 0640, owner fleet-agent — matches the agent's secret-mount
conventions).
* Renders [credentials] type = "zitadel-jwt" pointing at that
keyfile + the issuer + audience the agent's CredentialSource
needs.
A change to either the keyfile content or the TOML triggers an
agent restart, same as binary / unit drift.
`fleet_rpi_setup --bootstrap-token <PAT>` activates the Zitadel path.
The bootstrap PAT is held in the CLI's memory only; it never lands
on the Pi. New flags: --zitadel-issuer-url, --zitadel-project-id,
--zitadel-device-role (default `device`), --danger-accept-invalid-certs.
`zitadel_bootstrap` is a slim ManagementAPI client that, idempotently
per device:
1. Find-or-create machine user `device-${device_id}`.
2. Find-or-skip a project role grant (defaults to `device`).
3. Always mint a fresh JSON key and return its content. (Zitadel
doesn't expose the private half of an existing key, so reusing
isn't possible — stale keys remain valid until expiry, which is
fine because each setup run overwrites the on-device keyfile.)
Three new render_toml tests cover the zitadel-jwt path; eleven
existing agent tests still pass.
Out of scope, tracked: device-join-request + admin-approve flow that
would replace bootstrap-PAT entirely (closer to the OKD
node-approval pattern). Long-lived admin PAT is acceptable for the
demo per product call.
The merge of feat/prepare-rpi added a `sudo_password: Option<String>`
field to SshCredentials but the `default_ubuntu_aws` constructor on
the destination branch was authored before that field existed. Add
the missing field as `None` (matches the prepare-rpi semantics:
passwordless sudo expected unless explicitly configured).
The fleet agent's NATS connection is the load-bearing piece of the
"never lose connectivity to a device" guarantee. This commit makes
that hold even when Zitadel access tokens expire across NATS pod
restarts and network partitions.
New `[credentials]` config variants (internally tagged on `type`):
type = "toml-shared" { nats_user, nats_pass } # v0/dev
type = "zitadel-jwt" { key_path, oidc_issuer_url, audience, ... }
A `CredentialSource` enum dispatches per variant:
- TomlShared returns the same user/pass each call.
- ZitadelJwt mints an access token from Zitadel via the JWT-bearer
flow (RFC 7523). The keyfile at `key_path` is the only durable
secret on the device; the bearer token is short-lived and refreshed
in-memory when the cached value is within 5 minutes of expiry.
Two concurrent refreshes are race-safe — the second writer's mint
is wasted but produces a correct token.
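
The refresh rule (re-mint when the cached token is within 5 minutes of
expiry) boils down to a check like this. The struct and function names
are assumptions for illustration, not the agent's actual types:

```rust
use std::time::{Duration, SystemTime};

/// Refresh when the token expires within this window.
const REFRESH_MARGIN: Duration = Duration::from_secs(5 * 60);

/// Hypothetical in-memory cache entry for a minted access token.
struct CachedToken {
    value: String,
    expires_at: SystemTime,
}

/// True when there is no token yet, or the cached one is expired or
/// about to expire.
fn needs_refresh(cached: Option<&CachedToken>, now: SystemTime) -> bool {
    match cached {
        None => true,
        Some(tok) => now + REFRESH_MARGIN >= tok.expires_at,
    }
}
```

Because the check is a pure function of the cached expiry and the
clock, two callers racing on it can at worst both mint, which matches
the "wasted but correct" behaviour described above.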
The agent's `connect_nats` is rewritten on top of async-nats's
`with_auth_callback`, which is invoked on every (re)connect attempt:
- async-nats reconnects automatically on disconnect (default
behaviour of ConnectOptions) — we don't need a watchdog.
- Each reconnect attempt invokes the callback, which calls
`next_credential()`. If the cached token is expired, a fresh one
is minted before the reconnect proceeds. So a Pi that loses NATS
while its token has just expired will pick up a brand-new token
on the next reconnect attempt with no operator intervention.
- An `event_callback` surfaces Connected / Disconnected / SlowConsumer
/ ServerError events into tracing — operators can see exactly when
reconnects happen, which is non-negotiable for an out-of-warranty
device fleet.
A subtle constraint drove the trait shape: async-nats's
`with_auth_callback` requires the returned future to be `Send + Sync`,
which `#[async_trait]`'s erased `Pin<Box<dyn Future + Send>>` does
not satisfy. The credential source is therefore an enum (concrete
dispatch) rather than `dyn CredentialSource`. Two variants is small
enough that enum dispatch beats trait-object plumbing.
Out of scope, tracked for follow-up: a separate daemon for SSH access
to the Pi via Tailscale/Headscale ("secure backdoor"), and the
device-join-request + admin-approve flow that would replace the
current admin-PAT bootstrap pattern.
The previous commit swept in `.claude/worktrees/*` (ephemeral agent
worktree submodules) and a few scratch files that landed at the repo
root during prior sessions. None of them are project artifacts.
Removing them from the index and adding to .gitignore so future
`git add -A` doesn't re-include them.
Files on disk are unchanged.
The IoT walking-skeleton's PodmanV0Score and the underlying
ContainerSpec capability were name+image+ports only. Real customer
workloads (the demo target's docker-compose for example) need at
minimum:
- Environment variables for runtime config + secrets injected at
deploy time.
- Bind-mount volumes so the container can persist data across
recreates (sqlite db files, config dirs).
- Restart policy so the container survives device reboot or crash.
PodmanService and ContainerSpec gain `env: Vec<(String, String)>`,
`volumes: Vec<VolumeMount>`, and `restart_policy: RestartPolicy`. All
three default to empty / `unless-stopped` via #[serde(default)] so any
Deployment CR written before this change still deserializes — that
includes the existing smoke harnesses and any field-side state.
VolumeMount is bind-only in v0 (host_path -> container_path, optional
read_only). Named/anonymous volumes can be added behind the same field
later by inspecting host_path's shape; the customer's compose file is
expected to use bind mounts only.
RestartPolicy mirrors podman/docker convention — `no`,
`unless-stopped` (default, matching docker-compose), `on-failure`,
`always`. Serialized kebab-case so docker-compose translation is
mechanical.
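
A std-only stand-in for the kebab-case mapping, to show the wire
strings involved. The real type derives its encoding through serde;
the `as_str` / `parse` helpers here are hypothetical:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum RestartPolicy {
    No,
    #[default]
    UnlessStopped,
    OnFailure,
    Always,
}

impl RestartPolicy {
    /// Kebab-case wire form, same spelling as podman/docker.
    fn as_str(self) -> &'static str {
        match self {
            RestartPolicy::No => "no",
            RestartPolicy::UnlessStopped => "unless-stopped",
            RestartPolicy::OnFailure => "on-failure",
            RestartPolicy::Always => "always",
        }
    }

    /// Inverse of `as_str`; unknown strings are rejected.
    fn parse(s: &str) -> Option<Self> {
        match s {
            "no" => Some(RestartPolicy::No),
            "unless-stopped" => Some(RestartPolicy::UnlessStopped),
            "on-failure" => Some(RestartPolicy::OnFailure),
            "always" => Some(RestartPolicy::Always),
            _ => None,
        }
    }
}
```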
PodmanTopology::ensure_service_running now passes env / mounts /
restart policy to the podman API. matches_spec conservatively forces
recreate whenever the spec carries non-empty env / volumes or a non-
default restart policy: the podman list endpoint doesn't surface those
fields, so a structural compare isn't possible from ListContainer
alone. Recreating an unchanged container is cheap (~hundreds of ms);
the alternative (silent stale-config window) isn't acceptable for
fleet-managed devices.
example_harmony_apply_deployment grows --env, --volume, and --restart
flags so an operator can drive the new shape from the CLI when
authoring a Deployment CR.
Tests:
- legacy CR JSON without the new fields deserializes (wire-compat).
- env ordering survives roundtrip (drift-detection invariant).
- restart policy serializes kebab-case (compose-translation contract).
- podman_v0_score_roundtrip exercises env + volumes + restart.
harmony-nats-callout becomes a deployable service, not just a library:
- New [[bin]] target with env+secret-file driven config and
SIGINT/SIGTERM-aware shutdown.
- Dockerfile (single-stage archlinux:base, non-root, matches
harmony-fleet-operator convention).
- Refactored handler into a pure `decide()` function so the entire
authorization decision tree is unit-testable without async-nats.
- New `roles` module with role resolution + a `validate_device_id`
security gate that rejects NATS subject metacharacters in device_id
(`.`, `>`, `*`, whitespace), closing a real escalation path through the
`{device_id}` placeholder in the per-device permissions block.
- Configurable role claim path + admin/device role names; admin wins
when both are present (privilege-escalation invariant).
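
A minimal sketch of the `validate_device_id` gate, assuming a plain
`Result<(), String>` return and an extra empty-string check that the
text above does not mention:

```rust
/// Reject device ids that could change the meaning of a NATS subject
/// once interpolated into a `{device_id}` placeholder.
fn validate_device_id(id: &str) -> Result<(), String> {
    // Assumption of this sketch: an empty id is also rejected, since
    // it would produce an empty subject token.
    if id.is_empty() {
        return Err("device_id is empty".to_string());
    }
    // `.` splits tokens, `*` and `>` are wildcards, whitespace is
    // never valid inside a subject.
    match id
        .chars()
        .find(|&c| matches!(c, '.' | '>' | '*') || c.is_whitespace())
    {
        Some(bad) => Err(format!("device_id contains forbidden character {bad:?}")),
        None => Ok(()),
    }
}
```

An id such as `>` or `a.b` is refused before it can widen the
per-device permission block.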
57 unit tests cover every reachable branch of the security decision
tree; 4 e2e tests in nats/integration-test-callout exercise real NATS
in podman with: device pubsub on own subjects, cross-device subject
isolation, admin-can-read-anything, and JWT-without-role rejection.
harmony/src/modules/nats_auth_callout/:
- New `NatsAuthCalloutScore` deploys the callout as a K8s Deployment +
Secret. fsGroup + 0o440 secret mode so the non-root container can
read its mounted seed/password without leaving them in env vars.
- `render_auth_callout_block` helper produces the YAML for NATS Helm
`config.merge.authorization.auth_callout` so both halves stay in
sync.
examples/fleet_auth_callout/:
- `bring_up_stack()` orchestrates k3d -> Zitadel + Postgres ->
CoreDNS rewrite -> project + roles + machine users with JWT keys
-> NATS Helm with auth_callout block -> callout image build +
sideload -> NatsAuthCalloutScore deploy. Idempotent across re-runs
(issuer NKey persisted in a K8s secret so user JWTs survive
restarts).
- `mint_access_token()` RFC 7523 JWT-bearer client. Uses Host header
with port so Zitadel emits a matching issuer.
- main.rs prints URLs/creds/keyIds and waits for Ctrl-C.
- Three #[tokio::test] functions sharing one cluster via OnceCell:
admin_can_read_any_device_subject, device_can_only_access_own_subjects,
unknown_role_is_rejected. All green on real k3d.
ZitadelScore:
- Auto-provisions an `iam-admin-pat` Kubernetes secret via the chart's
FirstInstance.Org.Machine.Pat block. ZitadelSetupScore depended on
this secret existing; without the chart values, the prior code path
was non-functional.
- New `external_port: Option<u32>` field. Controls Zitadel's emitted
issuer URL when the host port mapping isn't 80/443 (k3d typically
maps 8080:80). Without it, JWT-bearer audience validation 500s with
`Errors.Internal` because the assertion's `aud` doesn't match the
chart-default issuer at port 80.
ZitadelSetupScore is extended for the JWT-bearer flow needed by the
NATS auth callout:
- API apps (resource servers — required for project-id audience scope)
- Project roles (`POST .../projects/{id}/roles`, idempotent)
- Machine users with KEY_TYPE_JSON keys (provisioned + cached
device-side; Zitadel does not expose the key material on subsequent
reads, so the local cache is the source of truth)
- User grants (project + role keys)
Cache (ZitadelClientConfig) gains projects, machine_user_ids,
machine_keys, and user_grants — keyed for idempotency across re-runs.
Backwards compatible with existing harmony_sso example: the new fields
have `#[serde(default)]` and prior callers just need empty vecs.
The upgrade-by-default change in the helm chart score (separate commit)
lets ExternalPort changes propagate to existing releases on re-run.
Helm releases without a pinned `chart_version` previously short-circuited
to a NOOP when already installed, which silently dropped any
`values_yaml` / `values_overrides` changes the caller had made. Now we
fall through to `helm upgrade --install` whenever:
- the release isn't installed (unchanged), or
- it's installed and either unpinned or pinned-and-matching.
Helm itself becomes the source of truth for "did anything actually
change" — no-op upgrades are cheap and changed values get applied
automatically without the caller having to opt in via a flag.
`install_only=true` keeps the prior skip-if-installed shortcut so
bootstrap operators (cert-manager, prometheus-operator, CRDs) that
should not be touched on re-runs continue to behave the same.
Pinned-version safety net is unchanged: a different version installed
than what the score requests is an error, never a silent change.
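The decision table above can be sketched as a pure function. This is a minimal illustration, not the score's actual code: `HelmAction`, `decide`, and the parameter names are hypothetical, and checking the pinned-version mismatch before `install_only` is an assumption about precedence.

```rust
/// What to do with a release. Names here are illustrative only.
#[derive(Debug, PartialEq)]
pub enum HelmAction {
    /// Run `helm upgrade --install`; Helm decides whether anything changed.
    UpgradeInstall,
    /// Leave the existing release untouched.
    Skip,
}

/// `installed`: chart version currently deployed, if any.
/// `pinned`: the `chart_version` requested by the score, if any.
pub fn decide(
    installed: Option<&str>,
    pinned: Option<&str>,
    install_only: bool,
) -> Result<HelmAction, String> {
    // Not installed yet: always install.
    let Some(current) = installed else {
        return Ok(HelmAction::UpgradeInstall);
    };
    // Safety net: a pinned version that differs from the installed one is
    // an error, never a silent change. (Checked first by assumption.)
    if let Some(wanted) = pinned {
        if wanted != current {
            return Err(format!(
                "installed chart version {current} differs from pinned {wanted}"
            ));
        }
    }
    // Bootstrap operators keep the skip-if-installed shortcut.
    if install_only {
        return Ok(HelmAction::Skip);
    }
    // Unpinned, or pinned-and-matching: fall through to upgrade.
    Ok(HelmAction::UpgradeInstall)
}
```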
The new sudo_password field is strictly for privilege escalation on
the remote host (sudo -S, ansible become) — not for SSH login. SSH
auth is still key-only. Adds a TODO on SshCredentials pointing at
where SSH password support would land if/when we want it, and a
matching note on the SudoPassword Secret type.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Probe `sudo -n true` over SSH before constructing the topology. If
the probe succeeds (passwordless sudo, the typical rpi-imager
default), proceed silently. If it fails, fetch the password through
SecretManager::get_or_prompt::<SudoPassword>() — first run prompts
the operator, subsequent runs reuse the cached value (same flow
SshKeyPair etc. use).
Adds harmony_secret dep, env.sh with the standard
HARMONY_SECRET_NAMESPACE / HARMONY_SECRET_STORE / HARMONY_DATABASE_URL
/ RUST_LOG variables, and a doc snippet at the top of main.rs
pointing at it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets callers populate creds.sudo_password when the bootstrap admin
doesn't have passwordless sudo. None = current behavior unchanged.
Wire-level injection:
- ansible runs: when Some, write to a tempfile::NamedTempFile and
pass ANSIBLE_BECOME_PASSWORD_FILE=<path> via Command::env. Path
in env, never value in argv. File deletes on drop.
- direct ssh_exec sudo paths (ensure_linger, ensure_user_unit_active,
fetch_file): new sudo_exec helper that uses `sudo -S` with the
password piped via the new ssh_exec stdin parameter, otherwise
plain sudo. ensure_user_unit_active's && chain folded into one
sudo+sh -c call since `sudo -S` only reads stdin once.
ssh_executor.rs: ssh_exec gains an optional stdin: Option<&str>; on
Some, writes via channel.data() then channel.eof() so the remote
reader doesn't hang. Existing 4 call sites pass None.
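A rough sketch of how the `sudo -S` path can be assembled, assuming a hypothetical `sudo_command` helper that returns the remote command line plus the bytes to feed on stdin. The `-p ''` prompt suppression and the always-on `sh -c` wrapping are choices made for this illustration, not taken from the real `sudo_exec`.

```rust
/// Single-quote a string for POSIX sh, escaping embedded single quotes.
fn sh_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Returns (remote command line, optional stdin payload).
/// `script` may contain an `&&` chain; wrapping it in one `sh -c` means
/// sudo runs once and therefore reads the password from stdin once.
pub fn sudo_command(script: &str, sudo_password: Option<&str>) -> (String, Option<String>) {
    match sudo_password {
        // -S reads the password from stdin; -p '' suppresses the prompt.
        // The password travels on stdin only, never in the command line.
        Some(pw) => (
            format!("sudo -S -p '' sh -c {}", sh_quote(script)),
            Some(format!("{pw}\n")),
        ),
        // Passwordless sudo: plain sudo, nothing on stdin.
        None => (format!("sudo sh -c {}", sh_quote(script)), None),
    }
}
```

The stdin payload would then be handed to `ssh_exec` through its optional `stdin` parameter.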
fleet_vm_setup updated to set sudo_password: None (behavior
identical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sudo password for a Linux bootstrap admin user. Stored under key
"SudoPassword" via SecretManager when a host doesn't have
passwordless sudo configured. Same shape as the other single-field
Secret types in this file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New first step (1/7): read /etc/fleet-agent/config.toml off the
device and compare against the rendered desired config. Three
branches:
- missing → info, first install
- matches → warn, converge anyway
- differs → warn + unified diff (similar::TextDiff with 2-line
context radius, '-/+' marker style) + inquire::Confirm prompt
defaulting to N. Aborts with InterpretError if declined.
Existing 6 steps renumbered to 2/7-7/7. The diff replaces the
previous "dump both full configs" approach which was unreadable
even for one-line differences.
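The three branches reduce to a small classification step. A minimal sketch, assuming hypothetical names (`Preflight`, `classify`, `should_proceed`); the real step also renders the diff and drives the interactive prompt, which are left out here.

```rust
/// Outcome of comparing the on-device config with the rendered one.
#[derive(Debug, PartialEq)]
pub enum Preflight {
    /// No config on the device yet: first install.
    FirstInstall,
    /// Device config is byte-identical to the desired config.
    Unchanged,
    /// Device config exists and differs: show a diff and ask.
    Differs,
}

/// `current` is `None` when the remote file does not exist.
pub fn classify(current: Option<&str>, desired: &str) -> Preflight {
    match current {
        None => Preflight::FirstInstall,
        Some(c) if c == desired => Preflight::Unchanged,
        Some(_) => Preflight::Differs,
    }
}

/// Only a differing config needs operator confirmation; the prompt
/// defaults to "no", so `confirmed` is false unless the operator opts in.
pub fn should_proceed(state: &Preflight, confirmed: bool) -> bool {
    match state {
        Preflight::FirstInstall | Preflight::Unchanged => true,
        Preflight::Differs => confirmed,
    }
}
```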
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors FileDelivery in the opposite direction: returns Some(content)
or None if the file doesn't exist. AnsibleHostConfigurator implements
it via two SSH calls (sudo test -e + sudo cat), routed through sudo
to handle root- or service-owned config files. Added to the
LinuxHostConfiguration umbrella so any score with that bound gets it.
Enables scores to pre-flight-compare desired state against current
state before committing to a destructive change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Folds the "-> /usr/local/bin/fleet-agent" continuation into the
"Agent binary:" line. Removes the hardcoded-indent fragility (bullet
prefix shifts in cli_reporter would have broken alignment).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cli_reporter only accumulated details for SUCCESS, dropping the
recap on idempotent re-runs that legitimately return NOOP with
populated details. FleetDeviceSetupScore is the first score to
exercise this path; the filter was over-restrictive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the bespoke framed renderer, failure hint catalog, and custom
env_logger setup. Score output now flows through harmony_cli's
standard reporter (bullet list under "🚀 All done!"), matching the
other examples. cli_logger::init() at the top of main so early
logs (ensure_ansible_venv) get the same formatting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the opaque change-log with tagged per-step info traces and
a human-readable Outcome.details recap (Device ID / NATS / Labels /
User / Agent binary -> remote / Service). User and Service lines
carry their own ✅/🔄 state markers; final line is ✅ for noop and
🎉 for runs that applied changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When stdout already parses into UNREACHABLE!/FAILED! + msg, the
trailing (ansible-exit=..., stderr=..., stdout=...) envelope just
duplicated the same text. Strip it when stderr is empty and the
verb is recognized; keep it when it adds debug signal.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sibling of fleet_vm_setup with the libvirt provisioning step removed:
the operator has already booted Pi OS Lite themselves (rpi-imager,
preloaded SSH key, passwordless sudo on the admin user), so the
example goes straight to applying FleetDeviceSetupScore over SSH.
Defaults match the typical rpi-imager flow (--pi-user pi,
--ssh-key ~/.ssh/id_ed25519); --ssh-key supports tilde expansion.
The harmony dep is pulled in without the kvm feature since no VM is
created here. RUST_LOG defaults to info so the score's per-step
traces show up without the operator having to set the env var.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
systemctl --user enable --now is systemd-level idempotent, but the
prior implementation always returned ChangeReport::CHANGED. This made
every re-run of any score that touches a user-scoped unit (notably
FleetDeviceSetupScore's podman.socket step) lie about its change
count, defeating the noop detection the rest of the score honors.
Probe is-enabled --quiet && is-active --quiet first; only call
enable --now (and report CHANGED) when the unit isn't already in the
desired state. Mirrors the existing ensure_linger pattern in the
same file.
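A sketch of the probe-then-enable pattern, with the command runner abstracted as a closure so it can be exercised without SSH. The `ChangeReport` variants and the `run` signature (returns `Ok(true)` on exit status 0) are simplified stand-ins, not the real trait.

```rust
/// Simplified stand-in for the real change report type.
#[derive(Debug, PartialEq)]
pub enum ChangeReport {
    Noop,
    Changed,
}

/// `run` executes a shell command for the target user and returns
/// Ok(true) when it exits with status 0.
pub fn ensure_user_unit_active<F>(unit: &str, mut run: F) -> Result<ChangeReport, String>
where
    F: FnMut(&str) -> Result<bool, String>,
{
    // Probe first: both checks must pass for the unit to count as converged.
    let probe = format!(
        "systemctl --user is-enabled --quiet {unit} && systemctl --user is-active --quiet {unit}"
    );
    if run(&probe)? {
        return Ok(ChangeReport::Noop);
    }
    // Only touch the unit, and only report a change, when the probe failed.
    let enable = format!("systemctl --user enable --now {unit}");
    if run(&enable)? {
        Ok(ChangeReport::Changed)
    } else {
        Err(format!("failed to enable {unit}"))
    }
}
```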
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nats-jwt:
- Add NkeyPub newtype with prefix validation
- Add ClaimType and Algorithm typed enums
- Add impl_nats_claims! macro eliminating 4x duplicated impl blocks
- Add AuthorizationRequestClaimsBuilder (completing all builder types)
- Fix AuthorizationResponseBuilder: add issuer() builder method, stop
mutating iss in sign()
- Tighten trait bounds: encode<T: Serialize>, decode_unverified<T:
DeserializeOwned>
- Remove dead error variants Expired/NotYetValid
- Add builder tests for all 4 claims types
- Deduplicate is_zero helper
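For illustration, a prefix-validating newtype along the lines of `NkeyPub` might look like the sketch below. It relies on the public NATS nkey format (56 characters of unpadded base32, first letter O/A/U/N/C for operator, account, user, server, cluster). The `NkeyKind` enum and method names are assumptions, and the CRC checksum is deliberately not verified.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NkeyKind {
    Operator,
    Account,
    User,
    Server,
    Cluster,
}

/// A public nkey whose shape and prefix have been checked.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NkeyPub {
    raw: String,
    kind: NkeyKind,
}

impl NkeyPub {
    pub fn parse(s: &str) -> Result<Self, String> {
        if s.len() != 56 {
            return Err(format!("expected 56 characters, got {}", s.len()));
        }
        if !s.bytes().all(|b| matches!(b, b'A'..=b'Z' | b'2'..=b'7')) {
            return Err("not unpadded base32 (A-Z, 2-7)".to_string());
        }
        // Seeds ('S') and private keys ('P') are rejected here on purpose.
        let kind = match s.as_bytes()[0] {
            b'O' => NkeyKind::Operator,
            b'A' => NkeyKind::Account,
            b'U' => NkeyKind::User,
            b'N' => NkeyKind::Server,
            b'C' => NkeyKind::Cluster,
            other => return Err(format!("unknown public-key prefix {:?}", other as char)),
        };
        Ok(Self { raw: s.to_string(), kind })
    }

    pub fn kind(&self) -> NkeyKind {
        self.kind
    }

    pub fn as_str(&self) -> &str {
        &self.raw
    }
}
```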
harmony-nats-callout (rewritten):
- AuthCalloutService: production service connecting to NATS, subscribing
to .REQ.USER.AUTH, dispatching auth requests
- AuthCalloutConfig with builder pattern
- handler.rs: pure auth request handler (decode → validate → mint →
respond) extracted from test
- Fix ZitadelValidator: validate() is now async (was blocking_read
deadlock in async contexts)
- Remove dead fields kid_map, jwks_uri
- Make danger_accept_invalid_certs configurable
- permissions: InterpolatedPermissions named struct instead of 4-tuple
integration-test-callout:
- Converted to lib+test crate: src/lib.rs exports test utilities
- Tests now exercise the REAL AuthCalloutService (not inline handler)
- Extracted MockOidcServer, NatsServer, CalloutContext into library
- Replace yasna with rsa crate for DER parsing
- Add Drop to NatsServer for container cleanup
- Add module constants for all magic values
- README updated with new architecture diagram
- nats-jwt crate: JWT builder types for user claims, authorization
request/response, account claims, algorithm encode/decode
- harmony-nats-callout crate: Zitadel OIDC JWT validator, callout
service scaffold, account manager (WIP)
- integration-test-callout: end-to-end test validating the full
auth callout flow — device connects with Zitadel JWT → callout
validates JWT → returns per-device user JWT with scoped
permissions → device can pub/sub on its own subjects only
- Mock OIDC server for test (JWKS + openid-configuration)
- Negative test: device A cannot subscribe to device B's subjects
- Added UserClaimsBuilder::audience() for account-scoped user JWTs
Addresses the review point that the applier CLI was anchored in IoT
vocabulary, but the CRD it applies is a generic declarative-
reconcile intent that works for Pi podman today and OKD / KVM /
anything-reconcilable tomorrow. The name now reflects what it
actually does.
Mechanical rename: crate, binary, `PatchParams::apply(...)` field
manager, doc comments, every reference in smoke-a4.sh, the
v0_1_plan.md Chapter 1 section, and the example itself. The CRD
types + paths + operator name are *not* touched by this commit —
that's the broader rebrand, planned for a dedicated branch.
- examples/iot_apply_deployment/ → examples/harmony_apply_deployment/
- crate name: example_iot_apply_deployment → example_harmony_apply_deployment
- binary name: iot_apply_deployment → harmony_apply_deployment
- PatchParams field manager: "iot-apply-deployment" → "harmony-apply-deployment"
0 stragglers: `grep example_iot_apply_deployment` across the tree
returns empty.
Addresses the review point that NatsBasicScore was a parallel
typed-k8s_openapi path — reinventing probes, resource shapes, pod
anti-affinity, JetStream storage — instead of reusing what
NatsK8sScore already does via the upstream nats/nats helm chart.
Every shape the project will ever ship (supercluster, single node,
TLS, gateway, leaf nodes) is expressible as values on that chart.
Parallel resource construction was churn waiting to diverge.
The shape now:
  HelmChartScore            [existing helm-install primitive]
        ▲
        │ pins chart + repo
        │
  NatsHelmChartScore (new)  [exposes values_yaml only]
     ▲              ▲
     │              │
  NatsBasicScore    NatsK8sScore
  (single node)     (supercluster + TLS + gateways)
Changes:
- Delete harmony/src/modules/nats/node.rs (279 lines of typed
k8s_openapi Deployment/Service/Namespace — gone).
- New harmony/src/modules/nats/helm_chart.rs: NatsHelmChartScore
pins chart_name = "nats/nats" and its official repository;
values_yaml is the only varying input. Implements Score<T> for
any topology with HelmCommand; caller hands it to
K8sBareTopology / HAClusterTopology / K8sAnywhereTopology.
- Rewrite score_nats_basic.rs as a thin preset: build a minimal
single-node values_yaml (fullnameOverride, replicaCount=1,
cluster.enabled=false, jetstream on/off, service type via the
chart's `service.merge.spec.type` knob, optional image
override). 10 unit tests on render_values covering every
builder combination + image-ref splitting. Score bound moves
from `T: K8sclient` to `T: HelmCommand` since installation is
now helm-based.
- score_nats_k8s.rs: last step in deploy_nats switches from a
hand-constructed HelmChartScore to NatsHelmChartScore::new(...).
Supercluster values_yaml construction untouched — a supercluster
is just a more elaborate values file against the same chart.
- bare_topology.rs: add `impl HelmCommand for K8sBareTopology`
so the in-load-test flow (K8sBareTopology → NatsBasicScore →
NatsHelmChartScore → HelmChartScore) compiles. Returns a bare
`helm` command; KUBECONFIG resolution mirrors how HAClusterTopology
does it.
- mod.rs: export NatsHelmChartScore + the re-shaped NatsServiceType.
- load-test.sh: the nats/nats chart provisions a StatefulSet, not
a Deployment. Wait on `pod -l app.kubernetes.io/name=nats`
instead of `deployment/iot-nats` — works across workload kinds.
Tests:
- 2 helm_chart unit tests (chart+repo pinning, default install-
upgrade semantics)
- 10 score_nats_basic unit tests covering every values shape
- Full load-test.sh e2e (20 devices / 3 CRs / 20s): PASS.
Three production-path improvements bundled into one chart change,
all verified end-to-end (helm lint + load-test pass):
1. Switch from `HelmResourceKind::from_serializable(...)` to the
typed `HelmResourceKind::{Namespace, ServiceAccount, ClusterRole,
ClusterRoleBinding, Crd}` variants added to the shared harmony
helm module. Serialization output is byte-equivalent; IDE
discoverability + type-safety go up.
2. Annotate both CRDs with `helm.sh/resource-policy: keep`. Without
this, `helm uninstall iot-operator-v0` cascade-deletes the CRDs;
the kube GC then deletes every Deployment CR and every Device CR;
the operator finalizer fires on each deletion and wipes the
`desired-state` KV; agents tear down every container. One typo
on uninstall would be a fleet-wide catastrophe. `keep` makes
uninstall data-preserving and idempotent — wipe requires an
explicit `kubectl delete crd …`.
3. Lock down the operator Pod's securityContext:
- `runAsNonRoot: true`
- `readOnlyRootFilesystem: true`
- `allowPrivilegeEscalation: false`
- `capabilities: drop [ALL]`
- `seccompProfile: RuntimeDefault`
Deliberately *no* `runAsUser` — OpenShift's `restricted-v2` SCC
assigns namespace-specific UIDs and rejects fixed ones. The
image's `USER 65532:65532` (Dockerfile) gives vanilla k8s a
non-root UID; OpenShift's SCC overrides with its own. Same chart
works on both without custom SCC bindings.
Dockerfile adds `USER 65532:65532` — required for vanilla k8s to
accept `runAsNonRoot: true` without a Pod-level `runAsUser`. 65532
is the distroless/chainguard `nonroot` convention; arbitrary but
safe (no overlap with common system UIDs).
Tests: 2 chart unit tests locking in the keep annotation + SC
shape. End-to-end load test at 20 devices / 3 CRs: pod comes up
clean under the restricted SC, all aggregates correct, zero
operator warnings.
Extends HelmResourceKind with typed variants for Namespace,
ServiceAccount, ClusterRole, ClusterRoleBinding, and
CustomResourceDefinition. Previously only Service + Deployment
had typed variants; everything else went through the
`from_serializable`/`CustomYaml` escape hatch.
The escape hatch stays (documented as "always prefer a typed
variant") for forward-compat with types we haven't imported yet.
Any consumer currently using `from_serializable` for one of the
new typed variants can switch; serialization output is byte-
equivalent (both paths route through serde_yaml on the same
k8s_openapi struct).
Motivation: every Rust operator built on harmony wants the same
five resources — Namespace, SA, ClusterRole, ClusterRoleBinding,
CRD — to be chart-template-ready. Typing them once here means
every operator's chart.rs stays short and IDE-discoverable
instead of a string-of-from_serializable-calls.
Filenames carry the resource name where applicable
(serviceaccount-<name>.yaml, clusterrole-<name>.yaml, etc.) so
charts with multiple ClusterRoles don't collide on a single
`clusterrole.yaml` file.
2 unit tests: unique-filename invariant across the five typed
variants, and crd-name round-trip.
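The naming rule can be sketched as below. The function name and the `Option` for kinds without a per-resource name are assumptions; only the `<kind>-<name>.yaml` pattern comes from the description above.

```rust
/// Build a chart template filename from a resource kind and, where
/// applicable, the resource name (hypothetical helper).
pub fn chart_filename(kind: &str, name: Option<&str>) -> String {
    let kind = kind.to_ascii_lowercase();
    match name {
        // Named resources get their own file so two ClusterRoles
        // never collide on a single `clusterrole.yaml`.
        Some(n) => format!("{kind}-{n}.yaml"),
        None => format!("{kind}.yaml"),
    }
}
```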
Before: the agent published only `device-id=<id>` on DeviceInfo,
which collapsed every Deployment.spec.targetSelector to "target one
device by id" — usable, but not the actual scalability story. The
K8s-Node analogue wants kubelet-declared node labels driving
DaemonSet nodeSelector; we were missing the equivalent.
After: a new `[labels]` section in the agent's TOML config, set by
IotDeviceSetupScore and plumbed through to every DeviceInfo
publish. Config labels merge with the default `device-id` on
startup. Re-running the Score with a changed label map regenerates
the TOML, triggers the byte-compare idempotency path, restarts the
agent; new labels propagate into Device.metadata.labels and
Deployment selectors re-resolve on the operator side. Manually editing
the TOML and running `systemctl restart iot-agent` is the break-glass path.
Scope:
- iot/iot-agent-v0/src/config.rs: `labels: BTreeMap<String,String>`
on AgentConfig, defaults to empty via #[serde(default)]. Two
parse tests cover the "section present" + "section absent"
cases.
- iot/iot-agent-v0/src/main.rs: merge cfg.labels with the default
`device-id` entry before DeviceInfo publish. Config wins on
key conflicts — unusual but legal.
- harmony/src/modules/iot/setup_score.rs: IotDeviceSetupConfig
gains `labels: BTreeMap<String,String>` (replacing the
dedicated `group` field — group is just a conventional label
now, not a distinct axis). render_toml renders a [labels]
section; BTreeMap iteration guarantees sorted output so the
Score's byte-compare change detection stays idempotent. Three
unit tests: section content, byte-identical rendering across
runs, value escaping.
- examples/iot_vm_setup/src/main.rs: `--labels key=val,key=val`
with a parser that errors on malformed chunks, empty keys/values,
or an empty map (a device with no labels is practically
untargetable, better to fail at the CLI than onboard a ghost).
Live label changes require an agent restart (same as kubelet's
--node-labels on a running Node). Edit-labels-on-running-fleet
is a later chapter; for v0 the restart cost is negligible.
Tests: 7 iot-agent + 3 iot setup_score + existing operator/
contracts suite — all green.
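A condensed sketch of the label pipeline described above: CLI parsing, merge with the default `device-id`, and deterministic TOML rendering. Function names are hypothetical, and always quoting keys and the exact escaping rules are simplifying assumptions rather than what `render_toml` necessarily does.

```rust
use std::collections::BTreeMap;

/// Parse `key=val,key=val`. Malformed chunks and empty keys or values are
/// errors; an empty argument fails as a malformed chunk, so the result is
/// never an empty map.
pub fn parse_labels(arg: &str) -> Result<BTreeMap<String, String>, String> {
    let mut out = BTreeMap::new();
    for chunk in arg.split(',') {
        let (k, v) = chunk
            .split_once('=')
            .ok_or_else(|| format!("malformed label `{chunk}`, expected key=val"))?;
        if k.is_empty() || v.is_empty() {
            return Err(format!("empty key or value in `{chunk}`"));
        }
        out.insert(k.to_string(), v.to_string());
    }
    Ok(out)
}

/// Default `device-id` label first, then config labels: config wins on conflict.
pub fn effective_labels(device_id: &str, config: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    let mut merged = BTreeMap::new();
    merged.insert("device-id".to_string(), device_id.to_string());
    merged.extend(config.iter().map(|(k, v)| (k.clone(), v.clone())));
    merged
}

/// Escape a value as a TOML basic string.
fn toml_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}

/// BTreeMap iterates in key order, so the output is byte-identical
/// across runs regardless of insertion order.
pub fn render_labels(labels: &BTreeMap<String, String>) -> String {
    let mut s = String::from("[labels]\n");
    for (k, v) in labels {
        s.push_str(&format!("{} = {}\n", toml_str(k), toml_str(v)));
    }
    s
}
```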
Addresses the review point that NatsBasicScore was introduced as a
parallel NATS path instead of sharing primitives with the rest of
the module. The render logic (Deployment + Service + Namespace for
one NATS server pod) is now pulled into a new `nats::node`
module built on ADR 018 — typed k8s_openapi structs, no helm
templating — and NatsBasicScore is a high-level preset that sets
defaults on a NatsNodeSpec and runs the shared render fns.
Module-level doc on `nats::node` explicitly flags that future
high-level scores (clustered, TLS, gateway) should grow the spec
and reuse the same primitive, and that NatsK8sScore +
NatsSuperclusterScore are scheduled to migrate onto this primitive
in a follow-up so the helm-templating path disappears entirely
from the NATS module.
7 unit tests between node (the primitive) + score_nats_basic (the
wrapper) cover service-type routing + JetStream flag propagation.
Two changes with a single motivation — make the iot-agent runtime
robust under multi-user hosts + unblock chaos-testing workflows
on the VM admin user.
1. iot-agent user is no longer --system.
Rootless podman needs subuid/subgid ranges in /etc/subuid +
/etc/subgid before layer unpacking. Ubuntu's useradd --system
deliberately skips those allocations (system users aren't
expected to run user namespaces), so we were patching the gap
with a hardcoded "usermod --add-subuids 100000-165535". That
range collides with any other user on the host that also runs
rootless containers — a real footgun. Dropping --system lets
useradd's default allocator pick a non-overlapping range, and
the whole ensure_subordinate_ids trait method + ansible impl
goes away as dead code.
2. VmFirstBootConfig.admin_password (Option<String>).
When set, cloud-init unlocks the account and enables
ssh_pwauth on the guest — intended for reliability / chaos
testing sessions where the operator wants to log in and break
things on purpose. Default is still key-only auth.
example_iot_vm_setup plumbs a --admin-password flag +
IOT_VM_ADMIN_PASSWORD env var; smoke-a4 passes them through
so chaos sessions are one env var away from a ready VM.
3 cloud-init unit tests cover the locked + unlocked + YAML-escape
paths.
Generates a self-contained helm chart directory from typed Rust
(ADR 018 — Template Hydration). The chart packages:
- Deployment CRD (from Deployment::crd())
- Device CRD (from Device::crd())
- ServiceAccount, ClusterRole, ClusterRoleBinding with the exact
verbs the operator uses — nothing aspirational
- operator Deployment (image, env NATS_URL + RUST_LOG)
No hand-authored yaml, no Helm templating. Re-run the chart
subcommand to regenerate for different inputs. When a publishable
chart is needed (user-facing `values.yaml`), layer a templating
pass on this output; for the load test the plain chart is enough.
New surface:
- `iot-operator-v0 chart --output <dir> [--image ... --nats-url ...]`
writes the chart tree and prints its path.
- `iot/iot-operator-v0/Dockerfile` — minimal archlinux:base wrapper
around the host-built release binary (glibc-ABI match without a
two-stage Docker build).
load-test.sh: drops the host-side operator spawn entirely. Phase 3
now builds the operator image, sideloads it into k3d via `podman
save | docker load | k3d image import`, generates the chart via
the `chart` subcommand, and `helm upgrade --install` it into the
cluster. `dump_operator_log` pulls `kubectl logs` into the stable
work dir so HOLD=1 + failure-tail hooks keep working.
Two gotchas debugged along the way, preserved in code comments:
- workspace `.dockerignore` excludes `target/`, so the image build
uses a staged build context under $WORK_DIR/image-ctx.
- `podman build -t foo/bar:tag` stores as
`localhost/foo/bar:tag`, which k3d image import can't find under
the original tag. Use `localhost/iot-operator-v0:latest` as the
canonical image ref end-to-end.
Load-test results (selector architecture, operator in helm-
installed pod, same envelope as the host-side baseline):
| Scale | Duration | Writes | Rate | Errors | CR aggregates |
|-------|---------:|-------:|-----:|-------:|:-------------:|
| 20 devices / 3 CRs | 20s | 400 | 20/s | 0 | 3/3 ok |
| 10k / 1000 CRs | 120s | 1,201,967 | 10,009/s | 0 | 1000/1000 ok |
No operator warnings, no errors across the run. Image build +
sideload + helm install adds ~30s to startup; steady-state
throughput unchanged from host-side.
Roadmap:
- v0_1_plan.md Chapter 2: rewrite to describe the shipped selector +
Device CRD model (matchedDeviceCount, LabelSelector, per-concern KV).
Drop AgentStatus / observed_score_string / target_devices references.
Update "State of the world" preamble to match 2026-04-23 reality.
- chapter_4_aggregation_scale.md: SUPERSEDED banner at top with a
clear what-was-kept vs. what-was-dropped summary. Original body
preserved as decision-trail archaeology.
Code review pass on the iot crates, behavior-preserving:
- fleet_aggregator: owned_targets is now keyed by DeploymentName
(matches the KV key space — globally unique, no namespace). The
old DeploymentKey keying created an orphan-leak on operator
restart: seed_owned_targets stashed entries under a sentinel
namespace ("") that on_deployment_upsert never merged. Now
seeding populates the map correctly so restart + selector change
diffs properly.
- fleet_aggregator: reuse the Client passed into run() for the
patch_api instead of calling Client::try_default() a second time.
- fleet_aggregator: delete _use_list_params / _use_deployment_spec
placeholder scaffolding + unused ListParams / DeploymentSpec /
ScorePayload imports. Inline one-liner serialize_score.
- fleet_aggregator: clean up `then(|| ...)` → filter/map split.
- device_reconciler: `is_label_value(v).then_some(()).is_some()`
→ plain `is_label_value(v)`.
- crd: delete speculative DeviceStatus + DeviceCondition (no one
writes to them; the comment in DeviceSpec documents where they'd
land when a heartbeat-reflection reconciler shows up).
- controller: compute `obj.name_any()` once in cleanup().
All 24 tests green. End-to-end load test (20 devices / 3 groups /
20s) PASS after the changes.
Kills the "CRD owns a list of device ids" smell. Deployment CR now
carries a standard K8s LabelSelector; Device is a first-class cluster-
scoped CR (like Node). Matching, desired-state KV writes, and status
aggregation all run off selector evaluation against the Device cache
— no list of device ids anywhere in the CRD spec.
Cross-resource model:
- Agent publishes DeviceInfo (with labels) to NATS `device-info` KV.
- device_reconciler watches that bucket → server-side-applies a
cluster-scoped Device CR with metadata.labels + spec.inventory.
- Deployment controller is now just validation + finalizer cleanup.
- fleet_aggregator watches Deployment CRs + Device CRs + device-state
KV, maintains in-memory selector → target device sets, writes/deletes
`desired-state.<device>.<deployment>` KV on match changes, patches
`.status.aggregate` at 1 Hz with matchedDeviceCount + phase counters.
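Selector evaluation against a Device's labels follows the standard Kubernetes LabelSelector semantics. The sketch below uses plain std types in place of the k8s_openapi structs (the type and field names are local stand-ins) and shows how one Deployment selector is matched against one Device's label map.

```rust
use std::collections::BTreeMap;

pub enum Operator {
    In,
    NotIn,
    Exists,
    DoesNotExist,
}

pub struct Requirement {
    pub key: String,
    pub operator: Operator,
    pub values: Vec<String>,
}

/// Local stand-in for the Kubernetes LabelSelector.
#[derive(Default)]
pub struct LabelSelector {
    pub match_labels: BTreeMap<String, String>,
    pub match_expressions: Vec<Requirement>,
}

/// All matchLabels and all matchExpressions must hold (logical AND).
/// An empty selector therefore matches every device.
pub fn selector_matches(sel: &LabelSelector, labels: &BTreeMap<String, String>) -> bool {
    let labels_ok = sel
        .match_labels
        .iter()
        .all(|(k, v)| labels.get(k) == Some(v));
    let exprs_ok = sel.match_expressions.iter().all(|req| {
        let actual = labels.get(&req.key);
        match req.operator {
            Operator::In => actual.map_or(false, |v| req.values.contains(v)),
            // NotIn also matches when the key is absent.
            Operator::NotIn => actual.map_or(true, |v| !req.values.contains(v)),
            Operator::Exists => actual.is_some(),
            Operator::DoesNotExist => actual.is_none(),
        }
    });
    labels_ok && exprs_ok
}
```

With the default `device-id=<id>` label, a single-device target is just a `match_labels` entry on `device-id`.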
Applied CRD shape verified on a live k3d cluster:
kubectl get crd deployments.iot.nationtech.io -o json
.spec.versions[0].schema.openAPIV3Schema.properties.spec
→ rollout / score / targetSelector (matchLabels + matchExpressions)
.spec.versions[0].schema.openAPIV3Schema.properties.status.aggregate
→ matchedDeviceCount / succeeded / failed / pending / lastError
kubectl get crd devices.iot.nationtech.io -o json
.spec.scope = "Cluster"
.spec.versions[0].schema.openAPIV3Schema.properties.spec
→ inventory (nullable, camelCased fields)
Load-test run: DEVICES=20 GROUP_SIZES=10,5,5 DURATION=20
all 3 CRs hit expected matched=N / succeeded+failed+pending=N.
Other changes:
- k8s-openapi gets the `schemars` feature so LabelSelector derives JsonSchema.
- InventorySnapshot uses `#[serde(rename_all = "camelCase")]` for consistency with the rest of the CRD schema.
- agent publishes `device-id=<id>` as a default label so the
example_iot_apply_deployment `--target-device <id>` shorthand
works out-of-the-box (implemented as `--selector device-id=<id>`).
- example_iot_apply_deployment gains `--selector key=value` repeatable flag.
- load-test.sh explore banner exposes Device CR commands + new
matchedDeviceCount column.
- Stable working dir under /tmp/iot-load-test/ — kubeconfig at
/tmp/iot-load-test/kubeconfig, operator log at
/tmp/iot-load-test/operator.log. No more chasing mktemp paths.
- Print an explore banner before the load run so the user can
`export KUBECONFIG=...` and `kubectl get deployments -w` in
another terminal while the load actually runs.
- HOLD=1 env var keeps the stack alive after the load completes;
script blocks on sleep until Ctrl-C. Forwards --keep to the
binary so CRs + KV entries stay in place for inspection.
- DEBUG=1 bumps operator RUST_LOG to surface every status patch.
- Keep operator.log after successful runs (cheap, often useful).
- Load-test binary: --cleanup bool → --keep flag (clap bool with
default_value_t = true doesn't accept `--cleanup=false`).
Sequential apply was fine at 10 groups; becomes the startup bottleneck
at 1000. 32-way concurrent CR apply lands 1000 Deployment CRs in ~1.6s;
64-way concurrent DeviceInfo seeding populates 10k devices in ~0.3s.
Also zero-pad CR names and device ids to the largest width so large
runs sort lexicographically in kubectl.
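Zero-padding to the widest index is what makes lexicographic order agree with numeric order. A minimal sketch; `padded_name` and its `prefix-index` shape are illustrative, not the load-test binary's actual helper.

```rust
/// Pad `index` with leading zeros to the width of `max_index`.
pub fn padded_name(prefix: &str, index: usize, max_index: usize) -> String {
    let width = max_index.to_string().len();
    format!("{prefix}-{index:0width$}")
}
```

For example, with a largest index of 999, index 7 renders as `group-007`.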
- example_iot_load_test: simulates N devices (default 100 across 10
groups: 55 + 9×5) pushing DeploymentState every tick to NATS, no
real podman. Applies one Deployment CR per group, runs for a
bounded duration, verifies each CR's .status.aggregate counters
sum to the target device count.
- iot/scripts/load-test.sh: minimum harness — k3d cluster + NATS via
NatsBasicScore + CRD + operator + load-test binary. No VM, no
agent build.
- operator: connect_with_retry() on startup. The NATS TCP probe that
the smoke scripts do isn't enough to guarantee the protocol
handshake is ready (k3d loadbalancer can accept SYNs before the
pod is serving); the load harness hit this racing against a
freshly-rebuilt operator binary.
- drop unused rand dep from iot-agent-v0 Cargo.toml.
100-device run: 6002 state writes in 60s at a clean 100 writes/s,
all 10 CR aggregates converge to target_devices.len() (e.g.
group-00 → 55 = 45 Running + 9 Failed + 1 Pending).
`bucket.watch_all_from_revision(0)` sends the JetStream consumer
request with DeliverByStartSequence and an optional-missing start
sequence, which the server rejects with error 10094:
consumer delivery policy is deliver by start sequence, but
optional start sequence is not set
`watch_with_history(">")` uses DeliverPolicy::LastPerSubject instead —
replays the current value of every key, then streams live updates.
Same cold-start-plus-steady-state semantics, correct wire.
Caught by smoke-a4 --auto: state watcher exited immediately on
startup, no deployments ever reconciled.
- agent-status bucket -> device-heartbeat bucket
- status.<device> key -> heartbeat.<device>
- drop parity check summary from smoke-a4 (legacy path is gone)
- tidy stale AgentStatus comment in agent main
Collapses the Chapter 4 event-stream architecture into pure KV watch.
The operator was maintaining a durable JetStream consumer on
device-state-events in parallel with the KV bucket it was meant to
shadow — the stream was an optimization over KV scanning, but with
async-nats's ordered bucket watch it's redundant.
Gone:
- StateChangeEvent, LifecycleTransition, STREAM_DEVICE_STATE_EVENTS,
state_event_subject, STATE_EVENT_WILDCARD (contracts)
- Revision, AgentEpoch (contracts) — restart ordering now handled by
DeploymentState.last_event_at monotonic check
- PhaseCounters.apply_event + incremental diff machinery (operator) —
counters recomputed per dirty CR from the states snapshot
- RecordedTransition + publish_transition split (agent) — without an
event to publish, the pure/publish boundary has no reason to exist
- Agent sequence counter + agent_epoch generation (agent main.rs)
- CR aggregate fields recent_events, last_heartbeat_at, unreported —
never populated, pure speculation
New shape:
- fleet_aggregator.rs watches device-state via bucket.watch_all_from_revision(0)
- apply_state / drop_state mutate an in-memory snapshot
- patch_tick refreshes CR index from kube, recomputes aggregates for
CRs marked dirty, patches CR status
- DeploymentAggregate = succeeded/failed/pending + last_error only
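The snapshot-plus-dirty-set shape can be sketched with std collections only. Everything below is a simplified model: the `Phase` variants, the mapping of `Running` to `succeeded`, and the rule for picking `last_error` are assumptions, and the kube and NATS plumbing is omitted.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Running,
    Failed,
    Pending,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct DeploymentAggregate {
    pub succeeded: u32,
    pub failed: u32,
    pub pending: u32,
    pub last_error: Option<String>,
}

#[derive(Default)]
pub struct Snapshot {
    /// (deployment, device) -> latest phase and error text.
    states: BTreeMap<(String, String), (Phase, Option<String>)>,
    /// Deployments whose aggregate must be recomputed on the next tick.
    dirty: BTreeSet<String>,
}

impl Snapshot {
    /// A KV put: overwrite the entry and mark the deployment dirty.
    pub fn apply_state(&mut self, device: &str, deployment: &str, phase: Phase, error: Option<String>) {
        self.states
            .insert((deployment.to_string(), device.to_string()), (phase, error));
        self.dirty.insert(deployment.to_string());
    }

    /// A KV delete: remove the entry and mark dirty only if it existed.
    pub fn drop_state(&mut self, device: &str, deployment: &str) {
        let key = (deployment.to_string(), device.to_string());
        if self.states.remove(&key).is_some() {
            self.dirty.insert(deployment.to_string());
        }
    }

    /// Recompute aggregates for dirty deployments only, then clear the set.
    pub fn patch_tick(&mut self) -> BTreeMap<String, DeploymentAggregate> {
        let dirty = std::mem::take(&mut self.dirty);
        dirty
            .into_iter()
            .map(|d| {
                let agg = self.recompute(&d);
                (d, agg)
            })
            .collect()
    }

    fn recompute(&self, deployment: &str) -> DeploymentAggregate {
        let mut agg = DeploymentAggregate::default();
        for ((dep, _device), (phase, err)) in &self.states {
            if dep.as_str() != deployment {
                continue;
            }
            match phase {
                Phase::Running => agg.succeeded += 1,
                Phase::Pending => agg.pending += 1,
                Phase::Failed => {
                    agg.failed += 1;
                    if err.is_some() {
                        agg.last_error = err.clone();
                    }
                }
            }
        }
        agg
    }
}
```

Because counters are rebuilt from the snapshot, there is no incremental state to drift after a missed or reordered update.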
Line counts (3 iot crates):
4263 -> 3090 -> 2162 (-49% overall, -30% this pass)
Tests: 24 total (13 contracts + 6 operator + 5 agent), all green.
Zero consumers, zero publishers — pure speculative surface area.
Drops LogEvent struct, EventSeverity enum, STREAM_DEVICE_LOG_EVENTS,
log_event_subject, logs_subject, logs_query_subject.
If per-device log streaming lands later, it arrives with a real
consumer attached.
Contracts tests: 21 → 19 (removed two roundtrip tests for the deleted type).
Newtypes (review point #3) were the entry point. Introducing them forced
the event-payload redesign, and the redesign made the other two
bugs obvious + trivial to fix.
New contract types (harmony-reconciler-contracts::fleet):
- DeploymentName: validated newtype. Rejects empty, > 253 bytes,
'.' (which would alias an extra NATS subject token), NATS wildcards, and
whitespace. Serde impl validates on deserialize so a malformed
payload is rejected at the wire, not later.
- AgentEpoch(u64): random-per-process. Prefixes every sequence.
- Revision { agent_epoch, sequence } with lexicographic Ord.
- LifecycleTransition enum: Applied { from, to, last_error } |
Removed { from }. Replaces (from: Option<Phase>, to: Phase) so
deletion is modeled explicitly in the wire format.
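A std-only sketch of the two central types, assuming the validation rules listed above. The serde integration is left out, and the constructor name and error type are placeholders.

```rust
/// Deployment name that is safe to embed as a single NATS subject token.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct DeploymentName(String);

impl DeploymentName {
    pub fn new(s: &str) -> Result<Self, String> {
        if s.is_empty() {
            return Err("deployment name is empty".to_string());
        }
        if s.len() > 253 {
            return Err("deployment name is longer than 253 bytes".to_string());
        }
        // '.' would add a subject token; '*' and '>' are NATS wildcards.
        if let Some(c) = s
            .chars()
            .find(|&c| matches!(c, '.' | '*' | '>') || c.is_whitespace())
        {
            return Err(format!("forbidden character {c:?} in deployment name"));
        }
        Ok(Self(s.to_string()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// derive(Ord) compares fields in declaration order: agent_epoch first,
/// then sequence, which is the lexicographic ordering described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Revision {
    pub agent_epoch: u64,
    pub sequence: u64,
}
```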
Bug fixes that fell out of the redesign:
#1 (drop_phase was silent on the wire): `drop_phase` now
produces a RecordedTransition with Removed { from }, which
the publisher serializes into a StateChangeEvent. Operator
applies the Removed variant by decrementing `from` without
a paired increment. Counters no longer over-count after
deletions.
#2 (sequence reset on agent restart): (agent_epoch, sequence)
lexicographic ordering means the first post-restart event
(seq=1 under a fresh epoch) outranks any pre-restart event
the operator had applied. No more silently-dropped events
after an agent crash.
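Both fixes can be illustrated with a small std-only sketch. It tracks a single (device, deployment) stream, and it assumes the post-restart epoch compares greater than the previous one; with purely random epochs that is not automatic, so treat it as an assumption of the example. `Counters` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Phase { Pending, Running, Failed }

/// Field order matters: derived `Ord` compares `agent_epoch` first, then `sequence`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Revision { pub agent_epoch: u64, pub sequence: u64 }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleTransition {
    Applied { from: Option<Phase>, to: Phase },
    Removed { from: Phase },
}

#[derive(Default)]
pub struct Counters {
    counts: HashMap<Phase, u32>,
    latest: Option<Revision>,
}

impl Counters {
    pub fn count(&self, phase: Phase) -> u32 {
        self.counts.get(&phase).copied().unwrap_or(0)
    }

    fn decrement(&mut self, phase: Phase) {
        let slot = self.counts.entry(phase).or_insert(0);
        *slot = slot.saturating_sub(1);
    }

    /// Returns false when the event is stale or a duplicate and was ignored.
    pub fn apply(&mut self, rev: Revision, transition: LifecycleTransition) -> bool {
        if let Some(seen) = self.latest {
            if rev <= seen {
                return false;
            }
        }
        self.latest = Some(rev);
        match transition {
            LifecycleTransition::Applied { from, to } => {
                if let Some(from) = from {
                    self.decrement(from);
                }
                *self.counts.entry(to).or_insert(0) += 1;
            }
            // Deletion: decrement `from` with no paired increment.
            LifecycleTransition::Removed { from } => self.decrement(from),
        }
        true
    }
}
```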
Split recommended in review point #4:
- `record_apply` / `record_remove`: pure in-memory state
updates returning Option<RecordedTransition>.
- `publish_transition`: side-effectful wire emission.
- `apply_phase` / `drop_phase`: thin composite helpers the
hot path uses.
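A rough sketch of that split, under simplifying assumptions: one sequence counter for the whole state rather than per deployment, `to: None` standing in for a removal, and a `Publisher` trait plus `Vec` fake invented for the example so the pure half can be tested without a wire.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase { Pending, Running, Failed }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RecordedTransition {
    pub deployment: String,
    pub from: Option<Phase>,
    pub to: Option<Phase>, // None models a removal in this sketch
    pub sequence: u64,
}

#[derive(Default)]
pub struct StatusState {
    phases: HashMap<String, Phase>,
    sequence: u64,
}

impl StatusState {
    /// Pure: touches memory only. Returns a transition only when the phase changed.
    pub fn record_apply(&mut self, deployment: &str, to: Phase) -> Option<RecordedTransition> {
        let from = self.phases.insert(deployment.to_owned(), to);
        if from == Some(to) {
            return None;
        }
        self.sequence += 1;
        Some(RecordedTransition { deployment: deployment.to_owned(), from, to: Some(to), sequence: self.sequence })
    }

    /// Pure: returns None when there was nothing to remove.
    pub fn record_remove(&mut self, deployment: &str) -> Option<RecordedTransition> {
        let from = self.phases.remove(deployment)?;
        self.sequence += 1;
        Some(RecordedTransition { deployment: deployment.to_owned(), from: Some(from), to: None, sequence: self.sequence })
    }
}

/// Side-effectful half: anything that can emit a transition.
pub trait Publisher {
    fn publish_transition(&mut self, transition: &RecordedTransition);
}

/// In-memory fake used for testing the composite.
impl Publisher for Vec<RecordedTransition> {
    fn publish_transition(&mut self, transition: &RecordedTransition) {
        self.push(transition.clone());
    }
}

/// Thin composite a hot path could call: record, then publish only on change.
pub fn apply_phase(state: &mut StatusState, wire: &mut impl Publisher, deployment: &str, to: Phase) {
    if let Some(transition) = state.record_apply(deployment, to) {
        wire.publish_transition(&transition);
    }
}
```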
Typed keys in the operator:
- DevicePair { device_id, deployment: DeploymentName } replaces
(String, String) so the two identifiers can't be swapped.
- FleetState.deployment_namespace is keyed by DeploymentName.
- Controller's kv_key signature takes &DeploymentName; invalid
CR names surface as a clear Error rather than corrupting KV.
Tests:
- 27 contract tests (roundtrip every payload shape, including
forward-compat parsing; validate DeploymentName rejection
paths; assert Revision ordering across epochs).
- 19 operator fleet_aggregator tests, including regression
guards named for the specific bugs:
removed_transition_decrements_without_paired_increment (#1)
revision_ordering_handles_agent_restart (#2)
- 8 agent reconciler tests (record_apply/record_remove purity,
sequence monotonicity, agent_epoch stamping, ring buffer
cap).
Agent main wires a fresh AgentEpoch via rand::random::<u64>() at
startup; FleetPublisher::connect takes it and includes it in every
DeviceInfo + state-change event.
Two findings from the M4 smoke runs:
1. **Event consumer dropped events for unknown-namespace deployments.**
The consumer receives state-change events but `apply_state_change_event`
short-circuits when `deployment_namespace` doesn't have the
deployment yet. This is common in the first 5 s after a new CR is
applied, before the parity-tick's refresh loop runs.
Fix: on unknown deployment, consumer eagerly does a kube
`Api::list()` and populates the map. Subsequent events for
that deployment are fast-path (map already has it).
Also: added instrumentation on publish + receive paths so
future debugging against the parity check produces actionable
traces. Log level is DEBUG to keep INFO clean.
2. **Parity MISMATCH during transitions is correct behavior.**
The legacy aggregator reads AgentStatus which the agent
republishes every 30 s. Chapter 4 state-change events land in
~100 ms. So during a Pending→Running transition there's a
window where the new counter shows succeeded=1 while legacy
still shows pending=1 — precisely because the new path is
faster, which is the point of this rework.
The smoke's hard-fail-on-any-mismatch was too strict; relaxed
to a diagnostic print. Steady state should still converge to
zero mismatches once the next AgentStatus heartbeat lands; the
summary lets the user spot sustained divergence by eye. M5
removes the legacy path entirely, making the parity check
moot.
Agent-side publish now also surfaces subject + sequence + stream-seq
on every state-change publish, a similar diagnostic aid for tracing
wire deliveries.
Chapter 4's parity check in smoke-a4 caught M4 dropping events —
operator's consumer saw 1 of 3 state transitions, parity-mismatch
assertion fired.
Root cause: async-nats's jetstream.publish() returns a
PublishAckFuture that must be awaited for the server to persist
the message. Without that await, the publish is effectively
fire-and-forget and drops under any backpressure — which on the
smoke's agent-first-boot path is every publish until the stream
state stabilizes.
Fix awaits both the publish future (send) and the returned
PublishAckFuture (server ack) for state-change + log events.
State-change events are warn-on-failure (operator needs them);
log events are debug-on-failure (device-side ring buffer is
authoritative).
Smoke was silent about the Chapter 4 parity check because the
operator log got discarded on successful runs. Add a pre-cleanup
step that greps for `fleet-aggregator` log lines and prints the
last 20; if any `parity MISMATCH` line is present, upgrade to
`fail` — smoke exit 0 shouldn't hide a silently-wrong new
aggregator.
Replaces M3's per-tick KV re-walk with an incremental
JetStream consumer on `device-state-events`. Cold-start still
walks KV once to seed counters; steady state consumes events and
applies `from -= 1; to += 1` diffs.
New in `fleet_aggregator`:
FleetState (shared via Arc<Mutex<_>>):
- counters: per-deployment phase counts.
- phase_of: per-(device, deployment) current phase, for
duplicate + resync detection.
- latest_sequence: per-(device, deployment) highest sequence
applied; used to drop stale and duplicate deliveries.
- deployment_namespace: name → namespace map refreshed each
parity tick from the CR list (events carry only the
deployment name, matching the `<device>.<deployment>`
KV key format).
apply_state_change_event():
- Idempotent for duplicate sequence numbers.
- Idempotent for out-of-order lower-sequence events.
- On from-phase disagreement with our belief, trusts the
event and re-syncs (logs warn — parity check will catch
any resulting drift against the legacy aggregator).
- Counter decrement saturates at zero so replays can't
underflow.
run_event_consumer():
- Durable JetStream pull consumer on STATE_EVENT_WILDCARD,
DeliverPolicy::New (cold-start already seeded state from
KV — replaying from the beginning would double-count).
- Explicit ack; malformed payloads are logged + acked to
avoid infinite redelivery.
parity_tick() no longer walks KV — it reads live counters
from the shared FleetState and compares with the legacy
aggregator's per-CR fold. Same match/mismatch/running-totals
logging as M3.
8 new unit tests cover the event-apply invariants: first
transition (no from), transition (from+to), duplicate sequence,
out-of-order sequence, from-disagreement resync, unknown-
deployment ignore, cold-start seeding, underflow saturation.
Plus the 5 M3 tests from before — 13 aggregator tests total,
all green.
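The event-apply invariants listed above can be sketched as follows. Field names mirror the FleetState description, but the body is an assumption: in particular, on a from-phase disagreement this sketch decrements the phase it had actually counted for the pair and then adopts the event's `to`, which is one plausible reading of "trusts the event and re-syncs". The `resyncs` counter stands in for the warning log.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Phase { Pending, Running, Failed }

pub struct StateChangeEvent {
    pub device: String,
    pub deployment: String,
    pub from: Option<Phase>,
    pub to: Phase,
    pub sequence: u64,
}

#[derive(Default)]
pub struct FleetState {
    counters: HashMap<(String, Phase), u32>,         // (deployment, phase) -> devices
    phase_of: HashMap<(String, String), Phase>,      // (device, deployment) -> phase
    latest_sequence: HashMap<(String, String), u64>, // (device, deployment) -> sequence
    pub resyncs: u32,
}

impl FleetState {
    pub fn count(&self, deployment: &str, phase: Phase) -> u32 {
        self.counters.get(&(deployment.to_owned(), phase)).copied().unwrap_or(0)
    }

    /// Returns false when the event was dropped as stale or duplicate.
    pub fn apply_state_change_event(&mut self, ev: &StateChangeEvent) -> bool {
        let pair = (ev.device.clone(), ev.deployment.clone());
        if let Some(&seen) = self.latest_sequence.get(&pair) {
            if ev.sequence <= seen {
                return false;
            }
        }
        self.latest_sequence.insert(pair.clone(), ev.sequence);

        let believed = self.phase_of.insert(pair, ev.to);
        if believed != ev.from {
            self.resyncs += 1; // a real operator would log a warning here
        }
        if let Some(old) = believed {
            let slot = self.counters.entry((ev.deployment.clone(), old)).or_insert(0);
            *slot = slot.saturating_sub(1); // saturates so replays cannot underflow
        }
        *self.counters.entry((ev.deployment.clone(), ev.to)).or_insert(0) += 1;
        true
    }
}
```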
New module `fleet_aggregator` spawns a 5 s tick task that:
- Walks the Chapter 4 KV buckets (`device-info`,
`device-state`) every tick.
- Computes per-CR phase counters via `compute_counters` (pure
function, unit tested).
- Computes the legacy aggregator's counts from the same
`agent-status` snapshot map the legacy task is already
maintaining.
- Compares the two per CR and logs per-tick at DEBUG level
(matches) or WARN (mismatches), with running totals at INFO
every 60 s.
Explicit `cr_targets_device` predicate is the one-line plug
point for the selector-based rewrite coming from the review-fix
branch: swap `target_devices.contains()` for
`target_selector.matches(&info.labels)`, everything else in the
aggregator is label/selector-agnostic.
Refactored `aggregate::run` to accept the `StatusSnapshots` map
from outside so the parity-check task reads the same agent-status
view the legacy aggregator writes to. Added `aggregate::new_snapshots()`
helper so `main` owns the one shared Arc.
The task is strictly read-only: no CR patches, no side effects. M5
flips `.status.aggregate` over to the new counter-driven path once
M4 replaces the periodic re-walk with the event-stream consumer and
the parity check has stayed green under load.
5 unit tests cover the pure counter logic (target match, multi-CR
fan-in, zero-target CR, phase dispatch).
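One way to picture the plug point: if the targeting predicate is passed in, the counting code never needs to know whether a CR targets devices by explicit list or by selector. The sketch below is an assumption about shape, not the module's real signature; the `(device, deployment) -> Phase` map and `PhaseCounters` are illustrative.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase { Pending, Running, Failed }

#[derive(Debug, Default, PartialEq, Eq)]
pub struct PhaseCounters {
    pub succeeded: u32,
    pub failed: u32,
    pub pending: u32,
}

/// Pure function: folds device states into counters for one CR.
/// `cr_targets_device` is the single swap point for list- or selector-based targeting.
pub fn compute_counters<F>(
    cr_name: &str,
    states: &BTreeMap<(String, String), Phase>,
    cr_targets_device: F,
) -> PhaseCounters
where
    F: Fn(&str) -> bool,
{
    let mut counters = PhaseCounters::default();
    for ((device, deployment), phase) in states {
        if deployment.as_str() != cr_name || !cr_targets_device(device.as_str()) {
            continue;
        }
        match *phase {
            Phase::Running => counters.succeeded += 1,
            Phase::Failed => counters.failed += 1,
            Phase::Pending => counters.pending += 1,
        }
    }
    counters
}
```

A list-based caller would pass something like `|d| targets.contains(d)`; a selector-based one would pass a closure that looks up the device's labels instead.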
Agent now writes the new per-concern KV shapes + event streams
alongside the legacy AgentStatus. Nothing consumes the new data
yet — the legacy aggregator still drives CR .status from
`agent-status`. M3 will add the operator-side cold-start +
consumer paths in parity mode; M5 flips the CR-patch source once
counters verify against the legacy aggregator.
New module `fleet_publisher.rs` owns:
- Opening + idempotent-creating the three new KV buckets
(`device-info`, `device-state`, `device-heartbeat`) and
two JetStream streams (`device-state-events`,
`device-log-events`).
- Publish methods for DeviceInfo, HeartbeatPayload, DeploymentState
(KV put), StateChangeEvent + LogEvent (stream publish), and
delete for deployment-state cleanup.
- Log-and-swallow failure mode. The operator re-walks KV on
cold-start, so a missed event publish is self-healing on the
next transition or operator restart.
Reconciler grew:
- `device_id`: Id + `fleet`: Option<Arc<FleetPublisher>>
- per-(deployment) monotonic sequence counter in StatusState
- `set_phase` detects actual transitions (prev_phase vs new) and
emits a DeploymentState KV write + StateChangeEvent stream
publish only on change. No-op re-confirmation still bumps the
sequence (lets operator detect duplicate events via sequence
comparison) but stays off the wire.
- `drop_phase` deletes the device-state KV entry.
- `push_event` also publishes a LogEvent to the stream.
main.rs:
- Builds FleetPublisher after connect_nats, passes into Reconciler.
- Publishes DeviceInfo once at startup (empty labels — populated
by the selector-targeting branch once it merges).
- Spawns a heartbeat loop on 30 s cadence.
- Legacy `report_status` AgentStatus task kept running unchanged.
8 unit tests added for the transition-detection + sequence + ring-
buffer invariants (drive set_phase / drop_phase / push_event with
fleet: None). 18 contract tests from M1 still green.
First milestone of the aggregation rework. Lands the contract layer
without any runtime side effects: the agent + operator still run
their legacy paths unchanged.
New types (module `fleet`):
- DeviceInfo: routing labels + inventory, rewritten on label
change. Stored in KV `device-info` at `info.<device_id>`.
- DeploymentState: current phase per (device, deployment).
Stored in KV `device-state` at `state.<device>.<deployment>`.
Authoritative snapshot; operator rebuilds counters from it on
cold-start.
- HeartbeatPayload: tiny liveness ping in KV `device-heartbeat`.
Payload capped by a test (< 96 bytes) so it stays cheap at
1M-device rates.
- StateChangeEvent: `from: Option<Phase>, to: Phase, sequence`
emitted on each transition to JS stream
`device-state-events` on subject
`events.state.<device>.<deployment>`. Operator folds these
events into in-memory counters.
- LogEvent: shorter-retention user-facing event log to JS stream
`device-log-events` on subject `events.log.<device>`.
Transport constants + key/subject helpers in `kv` with
cross-component wire-stability tests so a rename here gets caught.
10 new tests (roundtrip serde, forward-compat parse, size bound,
key/subject format). Legacy `AgentStatus` tests + constants stay
green; retirement is scheduled for M8 once the live path has
switched over.
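The key and subject layouts listed above could be centralized in helpers along these lines. Function names are illustrative, not the `kv` module's actual API, and the parser assumes device ids and deployment names contain no '.'.

```rust
/// KV key for a device's info record: `info.<device_id>`.
pub fn device_info_key(device_id: &str) -> String {
    format!("info.{device_id}")
}

/// KV key for a deployment's state on a device: `state.<device>.<deployment>`.
pub fn device_state_key(device: &str, deployment: &str) -> String {
    format!("state.{device}.{deployment}")
}

/// Stream subject for state-change events.
pub fn state_event_subject_for(device: &str, deployment: &str) -> String {
    format!("events.state.{device}.{deployment}")
}

/// Stream subject for user-facing log events.
pub fn log_event_subject_for(device: &str) -> String {
    format!("events.log.{device}")
}

/// Inverse of `device_state_key`. Returns (device, deployment), or None when
/// the key does not have exactly the `state.<device>.<deployment>` shape.
pub fn parse_device_state_key(key: &str) -> Option<(&str, &str)> {
    let rest = key.strip_prefix("state.")?;
    let (device, deployment) = rest.split_once('.')?;
    if device.is_empty() || deployment.is_empty() || deployment.contains('.') {
        return None;
    }
    Some((device, deployment))
}
```

Pinning the literal strings in tests, as the commit does, is what catches an accidental rename on either side of the wire.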
Design doc for the aggregation rework. Chapter 2's aggregator
(O(deployments × devices) per tick) works for a 10-device smoke but
doesn't scale past a partner fleet of even modest size. Replaces it
with CQRS-style incrementally-maintained counters driven by
JetStream state-change events, device-authoritative per-device
state keys, and a separate log transport that doesn't touch
JetStream.
Review first, implement after. No runtime code changes in this
commit.
Covers data model (KV buckets, streams, subjects), counter
invariants (transition-based, duplicate-safe), cold-start protocol
(walk once, then consume), CR patch cadence (debounced dirty set),
failure modes, scale back-of-envelope for 1M devices + 10k
deployments, schema migration path (clean break, same CRD
v1alpha1), and eight-milestone landing plan.
Chapter 1 + Chapter 2 are both green end-to-end on x86_64 and
aarch64. Chapter 3 (helm packaging) is next. Design sketches kept
as the historical record — the running code is the source of
truth for 'how'.
qemu-img create with no trailing size inherits the backing
image's virtual size. The Ubuntu cloud image ships with ~2 GiB
of root, which fills up as soon as we sideload a container
tarball in the smoke. Pass disk_size_gb through to qemu-img and
rely on cloud-initramfs-growroot (already in the base) to grow
the partition on first boot. example_iot_vm_setup defaults to
16 GiB.
kubectl wait --for=condition=Available reports on pod readiness, but k3d's
klipper-lb takes a few more seconds to wire the host loadbalancer
port to Service endpoints. Without this extra wait the operator
races the routing and dies with 'expected INFO, got nothing.'
`podman save -m` produces an OCI multi-image archive format that
older podman versions in the Ubuntu 24.04 cloud image cannot load:
    Error: payload does not match any of the supported image formats:
     * oci-archive: loading index: ...index.json: no such file or directory
Downgrade to the single-image docker-archive format (default for
`podman save`): save the source image once, load once in the VM,
then `podman tag` twice to expose it under `localdev/nginx:v1` and
`:v2`. Same bits on disk, two distinct tag references, so the
upgrade test still sees a container-id change when the Score
flips from v1 to v2.
Running smoke-a4 with `ARCH=aarch64` after an `ARCH=x86-64` run
rebinds the local `nginx:alpine` tag to arm64 (or vice versa),
silently breaking the other arch's next run. Fail fast if the
cached image arch doesn't match the smoke's ARCH, with the exact
command to fix it (`podman pull --platform=linux/<arch> ...`).
Two changes that compose into one win: the smoke no longer needs a
functional Docker Hub to exercise the agent → podman → container
loop.
**harmony/src/modules/podman/topology.rs — IfNotPresent for image pull**
`PodmanTopology::ensure_service_running` was calling `podman pull`
on every reconcile, even when the image was already in the local
store. For a long-lived device agent reconciling against a public
registry, that's a guaranteed rate-limit collision: Docker Hub caps
unauthenticated pulls at 100 manifests per 6 h per IP, and an agent
pulling on every 30 s tick exhausts that allowance in about 50 min
(100 pulls × 30 s).
Change the pull path to check the local store first:
    if images.get(image).exists().await? { return Ok(()); }
    // else: pull
Matches Kubernetes' `imagePullPolicy: IfNotPresent` semantics.
Correct default for the IoT platform: upgrades change the image
STRING (tag or digest), so they still hit the pull branch —
"use local if available, pull the new thing if the reference changed."
**iot/scripts/smoke-a4.sh — tarball sideload in place of registry**
An earlier iteration of this smoke stood up a local `registry:2`
container and pushed tagged images into it. That pattern itself
needs to pull `registry:2` from Docker Hub — cute demo, still
Hub-dependent. Gone now.
New phase 4.5 / 5c pair:
4.5: podman save the cached `nginx:alpine` under two local tags
(`localdev/nginx:v1`, `localdev/nginx:v2`) into a tarball on
the host.
5c: scp the tarball to the VM, `podman load` it into the
iot-agent user's rootless store.
Paired with the new IfNotPresent semantics, the agent's reconcile
sees both images already present and never touches a registry. The
upgrade test still works because `v1` and `v2` are distinct tag
strings → spec drift → container id changes.
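The IfNotPresent behaviour the smoke relies on can be sketched against a fake image store. `ImageStore`, `FakeStore` and `ensure_image` are stand-ins invented for this example; they are not the podman bindings or the topology's real method.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the container runtime's image store.
pub trait ImageStore {
    fn exists(&self, image: &str) -> bool;
    fn pull(&mut self, image: &str) -> Result<(), String>;
}

#[derive(Debug, PartialEq, Eq)]
pub enum PullOutcome {
    AlreadyPresent,
    Pulled,
}

/// IfNotPresent: consult the local store first, hit the registry only on a miss.
pub fn ensure_image(store: &mut impl ImageStore, image: &str) -> Result<PullOutcome, String> {
    if store.exists(image) {
        return Ok(PullOutcome::AlreadyPresent);
    }
    store.pull(image)?;
    Ok(PullOutcome::Pulled)
}

/// In-memory fake that counts how often the "registry" was contacted.
#[derive(Default)]
pub struct FakeStore {
    local: HashSet<String>,
    pub pulls: u32,
}

impl ImageStore for FakeStore {
    fn exists(&self, image: &str) -> bool {
        self.local.contains(image)
    }

    fn pull(&mut self, image: &str) -> Result<(), String> {
        self.pulls += 1;
        self.local.insert(image.to_owned());
        Ok(())
    }
}
```

Because the check is keyed on the full reference string, a change from `:v1` to `:v2` still misses the local store and takes the pull branch unless that tag was sideloaded too.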
Dropped the `docker` preflight (no more k3d-side registry transfer)
and the `LOCAL_REGISTRY_*` env vars.
Verified end-to-end: x86 smoke-a4 --auto PASS.
- apply v1 → container up → curl 200
- .status.aggregate.succeeded = 1 (Chapter 2 aggregator working)
- apply v2 → container id changes (upgrade confirmed)
- delete → container removed
Aarch64 run next.
The operator watches the `agent-status` bucket, keeps a per-device
snapshot in memory, and folds it into each Deployment CR's
`.status.aggregate` subtree every 5 seconds. The answer to the user's
stated requirement ("CRD .status reflect-back: per-device
succeeded/failed counts + recent log lines") now lives in the CR
itself, observable via `kubectl get -o jsonpath` or any UI that
speaks k8s status subresources.
**Shape (in iot/iot-operator-v0/src/crd.rs)**
    DeploymentStatus {
        observed_score_string,     // unchanged; controller change-detect
        aggregate: Option<{
            succeeded: u32,        // devices with Phase::Running
            failed: u32,           // devices with Phase::Failed
            pending: u32,          // devices with Phase::Pending or
                                   // reported-but-no-phase-entry-yet
            unreported: u32,       // target devices that never heartbeated
            last_error: Option<{   // most recent failing device + short msg
                device_id, message, at
            }>,
            recent_events: Vec<{   // last-N events across the fleet, newest first
                at, severity, device_id, message, deployment
            }>,
            last_heartbeat_at,     // freshness signal for the whole fleet
        }>
    }
**New module** `iot/iot-operator-v0/src/aggregate.rs`
- `watch_status_bucket`: subscribes to `status.>` on the
agent-status bucket, maintains a `BTreeMap<device_id, AgentStatus>`
in memory. Malformed payloads + malformed keys log-and-skip; the
snapshot map is always the latest good shape.
- `aggregate_loop`: 5 s ticker. Per tick: list Deployment CRs,
clone the snapshot (no lock held across network calls), compute
each CR's aggregate, JSON-Merge-Patch `.status.aggregate`. Merge
patch composes cleanly with the controller's
`observedScoreString` patch; neither clobbers the other.
- `compute_aggregate` pure fn: classification logic is in one
place, four unit tests pin its behaviour (counts + unreported,
reported-but-no-phase-entry = pending, event filter matches
deployment name only, status-key parser).
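The classification rules can be sketched as a pure function over the snapshot map. The types here are trimmed-down stand-ins (the real AgentStatus carries more fields, and `last_error` / `recent_events` are omitted), so read this as an illustration of the four buckets rather than the module's code.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase { Pending, Running, Failed }

/// Trimmed-down stand-in for the agent's status payload.
#[derive(Default)]
pub struct AgentStatus {
    pub deployments: BTreeMap<String, Phase>,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Aggregate {
    pub succeeded: u32,
    pub failed: u32,
    pub pending: u32,
    pub unreported: u32,
}

/// Classifies every target device of one deployment into exactly one bucket.
pub fn compute_aggregate(
    deployment: &str,
    targets: &BTreeSet<String>,
    snapshots: &BTreeMap<String, AgentStatus>,
) -> Aggregate {
    let mut agg = Aggregate::default();
    for device in targets {
        match snapshots.get(device) {
            // Device never heartbeated.
            None => agg.unreported += 1,
            Some(status) => match status.deployments.get(deployment) {
                Some(Phase::Running) => agg.succeeded += 1,
                Some(Phase::Failed) => agg.failed += 1,
                // Reported, but no phase entry yet, counts as pending.
                Some(Phase::Pending) | None => agg.pending += 1,
            },
        }
    }
    agg
}
```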
**Operator wiring** (`main.rs`)
`run()` now opens *both* KV buckets at startup, spawns the
controller and the aggregator concurrently via
`tokio::select!`. Either returning an error tears the process
down: kube-rs's Controller already absorbs transient reconcile
errors internally, so anything escaping is genuinely fatal.
**Controller tweak**
The apply path's `patch_status` was rebuilding the whole
`DeploymentStatus` struct, which would clobber the aggregator's
writes. Switched to raw JSON-Merge-Patch for the
`observedScoreString` field only. Behaviour preserved, aggregate
subtree left intact.
**Smoke assertion** (smoke-a4.sh --auto)
After apply + curl succeeds, the --auto path now asserts
`kubectl get deployment.iot.nationtech.io ... -o
jsonpath='{.status.aggregate.succeeded}'` reaches 1 within
60 s. Proves the full agent → status bucket → operator aggregate →
CRD status loop, end to end.
Verified locally: `cargo test -p iot-operator-v0 --lib` 4/4 green,
`cargo check --all-targets --all-features` clean.
Chapter 2 groundwork. The on-wire AgentStatus the agent publishes
every 30 s was only carrying device_id + status + timestamp — not
enough for the operator to answer "how are my deployments doing."
Enrich it so the operator can aggregate into a useful
DeploymentStatus.aggregate subtree on the CR (second commit).
**harmony-reconciler-contracts/src/status.rs**
- `AgentStatus.deployments: BTreeMap<String, DeploymentPhase>` —
keyed by deployment name (CR's metadata.name). Each phase carries
`{ phase: Running|Failed|Pending, last_event_at, last_error }`.
- `AgentStatus.recent_events: Vec<EventEntry>` — ring buffer of the
most recent reconcile events on this device. Each entry is
`{ at, severity: Info|Warn|Error, message, deployment: Option }`.
Bounded agent-side to keep JetStream per-message size sane.
- `AgentStatus.inventory: Option<InventorySnapshot>` — hostname,
arch, os, kernel, cpu_cores, memory_mb, agent_version. Published
once on startup.
- All three new fields are `#[serde(default)]` — mixed-fleet upgrades
don't break: an old agent's payload deserializes into the new
struct (deployments empty, events empty, inventory None); a new
agent's payload deserializes in an old operator, which simply drops
the unknown fields.
New tests (kept forward-compat front and center):
- `minimal_status_roundtrip` — empty maps / None
- `enriched_status_roundtrip` — full population
- `old_wire_format_parses_into_enriched_struct` — pre-Chapter-2
payload must still parse (the upgrade guarantee)
- `wire_keys_present` — literal wire-format pins for smoke greps
**iot-agent-v0**
Reconciler gains a `StatusState { deployments, recent_events }` side
map with a bounded ring buffer (`EVENT_RING_CAP = 32`). Every code
path that changes deployment state now also records phase + event:
- `apply()`: Pending → Running on success, Failed + error event on
failure.
- `remove()`: drops phase, emits "deployment deleted" info event.
- `tick()` (periodic reconcile): keeps phase at Running on noop;
flips to Failed + event on error (deliberately no event on
successful no-change ticks — 30 s cadence would drown the ring).
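A bounded ring of this kind is small enough to sketch with `VecDeque`. `EVENT_RING_CAP` comes from the text above; `EventRing`, its methods and the reduced `EventEntry` are assumptions made for the example.

```rust
use std::collections::VecDeque;

pub const EVENT_RING_CAP: usize = 32;

/// Reduced event entry; the real one also carries a timestamp and severity.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EventEntry {
    pub message: String,
    pub deployment: Option<String>,
}

#[derive(Default)]
pub struct EventRing {
    entries: VecDeque<EventEntry>,
}

impl EventRing {
    /// Appends an entry, evicting the oldest one once the cap is reached.
    pub fn push_event(&mut self, entry: EventEntry) {
        if self.entries.len() == EVENT_RING_CAP {
            self.entries.pop_front();
        }
        self.entries.push_back(entry);
    }

    /// Oldest-first copy, suitable for embedding in a status payload.
    pub fn snapshot(&self) -> Vec<EventEntry> {
        self.entries.iter().cloned().collect()
    }
}
```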
New helper `deployment_from_key(key)` unwraps `<device>.<deployment>`
into just the deployment name. `short(s)` truncates error strings to
512 chars so the payload stays well under NATS JetStream limits.
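Both helpers are simple enough to sketch. The bodies below are a guess at the behaviour described, not the agent's source: the key parser splits on the first '.', and truncation counts `char`s rather than bytes so a multi-byte character is never cut in half. `MAX_ERROR_CHARS` is a name introduced here.

```rust
pub const MAX_ERROR_CHARS: usize = 512;

/// Extracts the deployment name from a `<device>.<deployment>` key.
pub fn deployment_from_key(key: &str) -> Option<&str> {
    let (device, deployment) = key.split_once('.')?;
    if device.is_empty() || deployment.is_empty() {
        return None;
    }
    Some(deployment)
}

/// Truncates an error string to at most MAX_ERROR_CHARS characters.
pub fn short(s: &str) -> String {
    s.chars().take(MAX_ERROR_CHARS).collect()
}
```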
`report_status()` in main.rs now snapshots the reconciler's status
state on every heartbeat and publishes the full enriched payload
alongside a startup-captured InventorySnapshot. Inventory reads
`/proc/sys/kernel/osrelease` + `/proc/meminfo` + `std::env::consts::ARCH`
with graceful fallbacks — no new sys-info crate dep.
Verified: `cargo test -p harmony-reconciler-contracts --lib` 7/7 green
(5 new). Operator consumption of the new fields lands in the next
commit.
Docker Hub's unauthenticated rate limit (100 pulls per 6h per IP,
counted per-manifest-query) is the most reliable way for a CI-style
smoke loop to produce false negatives. The NATS pod failing with
'429 Too Many Requests' after a handful of runs today was that —
not a real regression.
Fix inside the smoke: before running the install Score, sideload the
NATS image into the k3d cluster via a podman→docker→k3d bridge:
- If the image isn't already in docker's store:
- If it's not in podman's store either, podman pull (this is
the one-time hit we can't avoid).
- podman save → docker load.
- k3d image import into the cluster's containerd.
Steady-state this is a few-hundred-ms operation (no Hub calls, no
registry traffic). Require docker in the preflight list since we
depend on it for the cross-runtime bridge.
Also bump the Available-wait from 60 s to 120 s — the post-import
pod spin-up is fast but the scheduler + loadbalancer update take
longer than I initially budgeted.
VM-side nginx pulls are still at Hub's mercy; addressing that
requires either (a) docker login before the smoke, (b) an
authenticated registry mirror, or (c) arch-specific image
pre-seeding into the VM. All Chapter-2+ follow-ups.
Initial 180 s wait assumed native-KVM x86 speed. Under aarch64 TCG
the same nginx:latest pull (~250 MB image + layered userns unpack)
takes 4-8 min observed; 180 s was catching post-heartbeat reconcile
mid-pull and reporting FAIL.
Bump `CONTAINER_WAIT_STEPS` per arch:
- x86 KVM: 90 iterations × 2 s = 180 s (unchanged)
- aarch64 TCG: 450 × 2 s = 900 s (15 min)
Apply to both the 'first-boot container' and 'upgrade container id
change' loops.
The agent runs rootless podman as the `iot-agent` user (system
user, created by IotDeviceSetupScore). Each user has their own
podman state tree under ~/.local/share/containers. The smoke
was running `podman ps` as `iot-admin` (the ssh login user),
so it saw an empty store even when the agent had happily created
the nginx container, leading to a spurious "container never
appeared" failure despite the reconciler reporting SUCCESS.
Fix: go through `sudo su - iot-agent -c` with
`XDG_RUNTIME_DIR=/run/user/$(id -u)` so the command runs in
the right user session. Update the hand-off command menu with the
equivalent one-liner so the user can inspect the fleet's actual
container state without tripping over the same gotcha.
Smoke-a4 PASSes end-to-end on x86_64:
- CRD apply → container materializes
- Upgrade via new image → container id changes (not patched)
- Delete → container removed
With the previous commit (ensure_subordinate_ids), this closes
Chapter 1 of ROADMAP/iot_platform/v0_1_plan.md: the full v0 loop
works, hands-on driven by kubectl / a typed Rust binary / natsbox.
Ubuntu 24.04 `useradd --system` does not allocate `/etc/subuid` +
`/etc/subgid` ranges. Rootless podman silently fails on image-layer
unpack:
    potentially insufficient UIDs or GIDs available in user namespace
    (requested 0:42 for /etc/gshadow): ... lchown /etc/gshadow:
    invalid argument
`smoke-a1.sh` didn't hit this because it runs the agent on the
*host* user, which has subuid/subgid populated by default. `smoke-a4.sh`
drives a podman pull inside the VM — the FIRST time we actually
exercise rootless-podman-on-a-fresh-system, and the failure surfaces
immediately.
The fix belongs in harmony, not in ad-hoc cloud-init scripts. Add
`UnixUserManager::ensure_subordinate_ids` alongside the existing
`ensure_user` + `ensure_linger` methods:
- `domain/topology/host_configuration.rs`: new trait method. Doc
explains why every rootless-container-runtime consumer needs it.
- `modules/linux/ansible_configurator.rs`: impl follows `ensure_linger`'s
pattern — a grep probe on /etc/subuid+/etc/subgid, then a single
`usermod --add-subuids 100000-165535 --add-subgids 100000-165535`
only when missing. Idempotent, no-ops on re-run.
- `modules/linux/topology.rs`: forwarder for `LinuxHostTopology`.
- `modules/iot/setup_score.rs`: call the new method right after
`ensure_linger` in `IotDeviceSetupScore`. Any future consumer that
runs rootless podman reaches for the same primitive.
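The probe-then-act shape of `ensure_subordinate_ids` can be sketched as plain string logic. The real implementation goes through the Ansible configurator; the functions below are hypothetical and only decide whether a `usermod` call is needed, given the contents of `/etc/subuid` and `/etc/subgid` (lines of the form `user:start:count`).

```rust
/// True when `contents` has a `user:start:count` entry for exactly this user.
pub fn has_subordinate_range(contents: &str, user: &str) -> bool {
    contents.lines().any(|line| {
        let mut fields = line.split(':');
        fields.next() == Some(user) && fields.next().is_some() && fields.next().is_some()
    })
}

/// Returns the usermod arguments to run, or None when both files already
/// have an entry, so a re-run is a no-op.
pub fn usermod_args(user: &str, subuid: &str, subgid: &str) -> Option<Vec<String>> {
    if has_subordinate_range(subuid, user) && has_subordinate_range(subgid, user) {
        return None;
    }
    Some(vec![
        "--add-subuids".to_owned(),
        "100000-165535".to_owned(),
        "--add-subgids".to_owned(),
        "100000-165535".to_owned(),
        user.to_owned(),
    ])
}
```

Matching on the whole first field rather than a substring keeps a user such as `iot-agent-2` from satisfying the probe for `iot-agent`.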
Verified: `cargo check --all-features` clean. End-to-end smoke-a4
regression pending (re-running after this commit).
Kubernetes NodePort Services must use a port in the apiserver's
configured nodeport range (default 30000-32767). NatsBasicScore's
first cut accepted any port via `.node_port(port)`, which was fine
for strict use of the capital-N NodePort Service type, but made
the demo's "use NATS client port 4222 directly from the host"
story awkward.
Replace the `node_port: Option<i32>` field with a proper
`NatsServiceType` enum (ClusterIP | NodePort(i32) | LoadBalancer).
Three builder methods — one per variant. LoadBalancer is the right
idiom for the demo: k3d's built-in `klipper-lb` fronts
LoadBalancer Services on their `port` (not their nodePort), so
`k3d cluster create -p 4222:4222@loadbalancer` delivers external
traffic straight to the Service's client port. No nodeport range
juggling.
Signatures:
    NatsBasicScore::new(name, namespace)   // ClusterIP default
        .node_port(30422)                  // NodePort(30422)
        .load_balancer()                   // LoadBalancer
        .jetstream(true)
        .image("docker.io/library/nats:2.10-alpine")
Tests: 5 pass. New assertion: `load_balancer()` produces a Service
with type LoadBalancer and no pinned nodePort (apiserver assigns).
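The enum-plus-builder shape can be sketched without any Kubernetes types. `NatsBasicScoreSketch`, `service_fields` and `server_args` are invented for this example and only model the decisions described above (Service type string, whether a nodePort is pinned, whether `-js` is passed); the real Score builds typed k8s_openapi resources.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NatsServiceType {
    ClusterIP,
    NodePort(i32),
    LoadBalancer,
}

/// Stand-in for the Score's builder; not the harmony type.
#[derive(Debug, Clone)]
pub struct NatsBasicScoreSketch {
    pub name: String,
    pub namespace: String,
    service_type: NatsServiceType,
    jetstream: bool,
}

impl NatsBasicScoreSketch {
    pub fn new(name: &str, namespace: &str) -> Self {
        Self {
            name: name.to_owned(),
            namespace: namespace.to_owned(),
            service_type: NatsServiceType::ClusterIP,
            jetstream: true, // JetStream is on by default
        }
    }

    pub fn node_port(mut self, port: i32) -> Self {
        self.service_type = NatsServiceType::NodePort(port);
        self
    }

    pub fn load_balancer(mut self) -> Self {
        self.service_type = NatsServiceType::LoadBalancer;
        self
    }

    pub fn jetstream(mut self, enabled: bool) -> Self {
        self.jetstream = enabled;
        self
    }

    /// (Service `type` string, pinned nodePort if any).
    pub fn service_fields(&self) -> (&'static str, Option<i32>) {
        match self.service_type {
            NatsServiceType::ClusterIP => ("ClusterIP", None),
            NatsServiceType::NodePort(port) => ("NodePort", Some(port)),
            // No pinned nodePort: the apiserver assigns one.
            NatsServiceType::LoadBalancer => ("LoadBalancer", None),
        }
    }

    pub fn server_args(&self) -> Vec<&'static str> {
        if self.jetstream { vec!["-js"] } else { Vec::new() }
    }
}
```

Making the three exposure modes variants of one enum means a caller cannot ask for a LoadBalancer and a pinned nodePort at the same time, which the old `Option<i32>` field could not express.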
Consumers:
- `example_iot_nats_install` gets a `--expose {cluster-ip | node-port
| load-balancer}` flag (default `load-balancer` since that's what
the demo wants). The legacy `--node-port N` flag survives as the
NodePort port value.
- `smoke-a4.sh` asks for `--expose load-balancer`, matching its
`-p 4222:4222@loadbalancer` k3d port mapping.
Previous commit landed the script without the +x bit (a chmod
between write and commit was swallowed). Fix with git
update-index --chmod=+x so the file is executable on checkout.
Composed demo that brings up operator + in-cluster NATS + ARM (or
x86) VM agent, then either hands the full stack off to the user
with a command menu (default) or drives an apply + upgrade + delete
regression loop (`--auto`).
Phases:
1. k3d cluster with NATS port exposed via `-p 4222:4222@loadbalancer`.
2. NATS in-cluster via the new `example_iot_nats_install` binary
→ `NatsBasicScore` → typed k8s_openapi Namespace + Deployment +
NodePort Service.
3. CRD install via `iot-operator-v0 install` (Score-based, no yaml).
4. Operator spawned host-side, connects to nats://localhost:4222.
5. VM provisioned via `example_iot_vm_setup` (reused from smoke-a3);
agent inside the VM connects to nats://<libvirt-gateway>:4222.
6. Sanity: NATS pod Running, agent heartbeat
`status.<device>` present in `agent-status` bucket.
7a. DEFAULT: print a command menu (kubectl watch, typed Rust
applier, ssh/console, natsbox one-liners, curl) and block on
Ctrl-C with a cleanup trap tearing everything down.
7b. `--auto`: apply nginx:latest, wait for container on the VM,
curl, upgrade to nginx:1.26, assert container id CHANGED,
curl, delete, assert container gone.
Prereqs documented at the top of the script. Handles both x86-64
(native KVM) and aarch64 (TCG emulation) via `ARCH=` env.
Design notes captured in ROADMAP/iot_platform/v0_1_plan.md. Uses
every piece landed in this branch so far: K8sBareTopology,
NatsBasicScore, the typed CR applier, the Score-based CRD install.
Small CLI that installs a single-node NATS server into the cluster
KUBECONFIG points at, using harmony's `NatsBasicScore` composed
against `K8sBareTopology`.
This is the glue between `smoke-a4.sh` and the framework Score:
    cargo run -q -p example_iot_nats_install -- \
        --namespace iot-system \
        --name iot-nats \
        --node-port 4222
Defaults cover the demo exactly: iot-system namespace, NodePort 4222
so the libvirt VM agent can reach NATS through the k3d loadbalancer
port mapping.
No reinvented topology, no hand-rolled yaml, no helm shell-out. The
actual work (Namespace + Deployment + Service with the right
selector/ports/probes) lives inside `NatsBasicScore::Interpret` in
harmony where it can be reused by any future consumer.
Part of ROADMAP/iot_platform/v0_1_plan.md Chapter 1.
Replaces what would otherwise be a yaml fixture for the hands-on
demo. The CRD is already fully typed (DeploymentSpec + ScorePayload
+ PodmanV0Score + Rollout), so the applier uses those types
directly, constructs the CR via kube::Api, and either applies it
server-side or prints the JSON for `kubectl apply -f -`.
CLI:
    iot_apply_deployment \
        --namespace iot-demo \
        --name hello-world \
        --target-device iot-smoke-vm \
        --image docker.io/library/nginx:latest \
        --port 8080:80                         # apply
    iot_apply_deployment --image nginx:1.26    # upgrade (same name, new img)
    iot_apply_deployment --delete              # tear down
    iot_apply_deployment --print ...           # JSON to stdout → kubectl -f -
Uses server-side apply (PatchParams::apply().force()) so repeated
invocations patch the existing CR cleanly — the upgrade path the
demo exercises.
To expose the CRD types to an external consumer, iot-operator-v0
gains a thin `src/lib.rs` that re-exports the `crd` module. The
binary target now imports from the library (`use iot_operator_v0::crd;`)
instead of declaring its own `mod crd;` — avoids compiling the
types twice.
No change in operator runtime behavior.
Part of the ROADMAP/iot_platform/v0_1_plan.md Chapter 1 work.
Harmony's existing NATS story starts at `NatsK8sScore`, which is
designed for production multi-site superclusters: TLS-fronted
gateways, cert-manager-minted certs, ingress + Route, helm chart
with gateway merge blocks, NatsAdmin secret prompts. All of that is
overhead for a local smoke or a single-site decentralized deployment
that just needs a live JetStream server.
Add `NatsBasicScore` beside it. Deliberately minimal:
- Single replica
- Official `nats:*-alpine` image via typed k8s_openapi Deployment
- JetStream (-js) on by default, toggle via builder setter
- Namespace created if missing
- Service: ClusterIP by default, or NodePort via
`.node_port(port)` for off-cluster clients (e.g. a libvirt VM
connecting through the host's loadbalancer port)
Trait bounds are just `Topology + K8sclient` — no `HelmCommand`,
no `TlsRouter`, no `Nats` capability. Composes cleanly with
`K8sBareTopology` (added in the previous commit) so consumers can
`score.create_interpret().execute(&inventory, &topology)` against
any cluster `KUBECONFIG` points at.
Constructed via a small builder:
    NatsBasicScore::new("iot-nats", "iot-system")
        .node_port(4222)
        .jetstream(true)
Under the hood the interpret runs three `K8sResourceScore`s in
sequence (namespace → deployment → service). No new machinery —
just composition of existing primitives.
Deliberately NOT in scope for this Score:
- TLS / PKI — use NatsK8sScore when you need those
- Gateways / supercluster — use NatsSuperclusterScore
- Auth (user/password or JWT) — add a ConfigMap mount when
the Chapter 4 auth work lands
Tests (4, all passing): default is ClusterIP; node_port() flips
Service to NodePort with the right nodePort field; jetstream() toggle
controls the `-js` arg.
Part of the "compound framework value" mindset: every future Score
that wants a local NATS now points at this one type instead of
inventing its own yaml.
Roadmap §12.6 ("topology proliferation") is partially resolved by
extracting the ad-hoc InstallTopology from iot-operator-v0/install.rs
into harmony as a reusable shared type, now that a second consumer
(NatsBasicScore, landing next) makes the extraction genuinely
load-bearing rather than speculative.
What's new:
- harmony/src/modules/k8s/bare_topology.rs — K8sBareTopology carries
one K8sClient, implements K8sclient + Topology (noop ensure_ready).
Constructors: from_client(name, client) for callers building their
own client, from_kubeconfig(name) for callers reading the standard
KUBECONFIG chain.
- modules::k8s::K8sBareTopology re-export.
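The shape of a "bare" topology can be sketched with synchronous stand-in traits. Everything here (`FakeClient`, `K8sClientProvider`, `apply_resource`) is hypothetical and simplified; the real traits are async and carry a real Kubernetes client.

```rust
use std::sync::Arc;

/// Stand-in for the real Kubernetes client handle.
#[derive(Debug)]
pub struct FakeClient {
    pub context: String,
}

/// Simplified stand-in for the Topology trait.
pub trait Topology {
    fn name(&self) -> &str;
    fn ensure_ready(&self) -> Result<(), String>;
}

/// Simplified stand-in for the client-providing capability.
pub trait K8sClientProvider {
    fn k8s_client(&self) -> Arc<FakeClient>;
}

/// Carries exactly one client and performs no product-level setup.
pub struct BareTopology {
    name: String,
    client: Arc<FakeClient>,
}

impl BareTopology {
    pub fn from_client(name: &str, client: FakeClient) -> Self {
        Self {
            name: name.to_string(),
            client: Arc::new(client),
        }
    }
}

impl Topology for BareTopology {
    fn name(&self) -> &str {
        &self.name
    }

    /// Noop: nothing to install or probe before applying a resource.
    fn ensure_ready(&self) -> Result<(), String> {
        Ok(())
    }
}

impl K8sClientProvider for BareTopology {
    fn k8s_client(&self) -> Arc<FakeClient> {
        Arc::clone(&self.client)
    }
}

/// A narrow action bound only by the two capabilities it needs.
pub fn apply_resource<T: Topology + K8sClientProvider>(
    topology: &T,
    resource: &str,
) -> Result<String, String> {
    topology.ensure_ready()?;
    Ok(format!("applied {} via {}", resource, topology.k8s_client().context))
}
```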
What's gone:
- iot-operator-v0/src/install.rs: the ~30-line InstallTopology struct
+ its async_trait-decorated impls. The crate also drops async-trait
and harmony-k8s as direct deps (neither is used now that the
topology is shared).
- Long "architectural smell" comment from install.rs — the smell is
fixed; the explanation belongs at the shared type now (with the
history captured in its module doc).
Behavior-preserving. cargo check --all-targets --all-features clean.
smoke-a1 wire path unchanged.
Compounding-value move: every future Score that needs "apply a
typed resource against an existing cluster" consumes K8sBareTopology
instead of inventing its own Topology impl. That's the pattern v0
Harmony's design is meant to encourage.
v0 walking skeleton is substantially done (CRD → operator → NATS KV
→ on-device agent → podman reconcile; VM-as-device for x86_64 and
aarch64 via TCG; power-cycle resilience; operator install via Score
instead of yaml/kubectl). Time to switch the `ROADMAP/iot_platform`
folder from "plan to build the skeleton" to "plan to build on top of
the skeleton."
- **NEW** `ROADMAP/iot_platform/v0_1_plan.md` — the authoritative
forward plan. Five chapters in execution order:
1. Hands-on end-to-end demo the user can drive by hand
(imminent, fully detailed: composed smoke, typed-Rust CR
applier, natsbox command menu, in-cluster NATS).
2. Status reflect-back + inventory (enrich `AgentStatus`,
operator aggregates into `.status.aggregate`).
3. Helm chart packaging (ArgoCD deferred — user's clusters have
it already, bringing it into the smoke adds no validation
value).
4. Zitadel + OpenBao + per-device auth.
5. Frontend (web / CLI / TUI — deferred).
Chapters 2-5 are sketched; they expand to their own docs as each
becomes the active chapter.
- **EDIT** `ROADMAP/iot_platform/v0_walking_skeleton.md` — add a
SHIPPED banner at the top pointing at v0_1_plan.md. Keep the
707-line design diary intact as archaeology; don't rewrite
history.
- Incorporates the post-v0 architectural principles that emerged
from review (no yaml in framework paths, minimal ad-hoc
topologies, cross-boundary types in harmony-reconciler-contracts,
verify before blaming upstream).
The InstallTopology in iot/iot-operator-v0/src/install.rs is
architecturally a workaround: harmony's existing opinionated
topologies (K8sAnywhereTopology, HAClusterTopology) have accumulated
product-level side effects in ensure_ready that make them unfit for
narrow actions like "apply a CRD," so the module vendored its own
tiny Topology impl. If this pattern proliferates, the topology
ecosystem drifts toward "one bespoke topology per Score," which is
exactly the proliferation harmony's design was meant to prevent.
Two documentation changes, no code/behavior change:
- **Inline:** doc comment on `InstallTopology` flagging it as a
smell, explaining the root cause, and pointing at the roadmap
entry below. Anyone finding this code later (or tempted to copy
the pattern) reads the warning before they do.
- **Roadmap §12.6** (new): "Topology proliferation — opinionated
topologies leaking into narrow use cases." Captures the
architectural direction (minimal `K8sBareTopology` in harmony,
unbundle product setup from `ensure_ready`) without prescribing
an implementation. Includes an explicit done-check: the smoke
test for "this roadmap item is fixed" is that install.rs can
delete its inline Topology and use the shared type in a single line.
Review feedback: writing yaml and shelling out to kubectl is the
exact anti-pattern harmony exists to eliminate. The operator already
has typed Rust for its CRD (`#[derive(CustomResource)]`), and
harmony-k8s already has a typed apply path. So the "install" step
should be a Score, not `cargo run -- gen-crd | kubectl apply -f -`.
Changes:
- **New** `iot/iot-operator-v0/src/install.rs` — `install_crds()`
builds `Deployment::crd()` via `kube::CustomResourceExt`, wraps it
in `harmony::modules::k8s::resource::K8sResourceScore`, and
executes the Score against a tiny local `InstallTopology` that
just carries a `K8sClient` loaded from `KUBECONFIG`.
The local topology exists because `K8sAnywhereTopology::ensure_ready`
does a lot of product-level setup (cert-manager, tenant manager,
helm probes) that isn't appropriate for a narrow "apply a CRD"
action. A 30-line inline topology that implements `K8sclient` +
a noop `ensure_ready` is the right-sized abstraction for now.
When a larger "install the operator in-cluster" Score lands
(Deployment + SA + RBAC + ClusterRoleBinding), that may justify
promoting the topology to a shared crate.
- **Renamed subcommand** `gen-crd` → `install`. Old path: print yaml
to stdout for kubectl to consume. New path: apply the CRD directly
via the Score, using whatever `KUBECONFIG` points at.
- **Deleted** `iot/iot-operator-v0/deploy/crd.yaml` and
`deploy/operator.yaml`. The CRD yaml was derived from Rust and
committed alongside the source — a drift hazard (nothing guaranteed
they stayed in sync). `operator.yaml` was never actually applied by
any smoke script; it existed only for documentation. Both go.
- **Rewired** `iot/scripts/smoke-a1.sh` phase 2 to call the `install`
subcommand instead of piping yaml to kubectl. Everything downstream
(kubectl wait for Established, apiserver CEL rejection check,
operator + agent + container lifecycle) unchanged.
- **Dropped** `serde_yaml` from the operator's `Cargo.toml` — it was
only used to print the CRD as yaml. Added `harmony`, `harmony-k8s`,
and `async-trait` deps.
Verification — `smoke-a1.sh` PASSes end-to-end on x86_64 k3d:
k3d cluster → install CRD via Score → apiserver rejects bad
score.type (CEL still works through the Score-applied CRD) →
operator → agent → nginx container up → curl 200 → delete CR →
KV + container removed.
Out of scope / follow-up: a proper "install operator in-cluster"
Score that also applies Namespace + SA + ClusterRole +
ClusterRoleBinding + Deployment (the manifests that used to live in
the deleted operator.yaml). Smoke-a1 currently runs the operator
as a host-side process, so that Score isn't on the test path today.
Review feedback: "iot" is the wrong scope label. The pattern this
crate encodes — a central operator writing desired state to NATS
JetStream KV, a remote agent watching KV and reconciling — is the
foundation for harmony's decentralized infrastructure management,
not an IoT thing. Raspberry Pi is one concrete use case; the next
consumers (OKD fleet agents, edge-compute reconcilers, any host
harmony can't reach directly over a control-plane API) aren't IoT
either.
Rename the crate to reflect what it actually is:
- `iot/iot-contracts/` → `harmony-reconciler-contracts/` (moved to
the repo root, alongside the other support crates).
- Package name `iot-contracts` → `harmony-reconciler-contracts`.
- Consumer `Cargo.toml` path references updated in operator, agent.
- `use iot_contracts::…` → `use harmony_reconciler_contracts::…`
across agent + operator sources.
- Crate-level prose in lib.rs + kv.rs rewritten to drop the IoT
framing and describe the reconciler pattern in its own terms.
- harmony/Cargo.toml drops the dep entirely — after the preceding
commit moved podman Score types back in-tree, harmony no longer
pulls anything from this crate.
No behavior change. Wire format unchanged — the two existing public
modules (`kv`, `status`) are byte-identical.
Verified:
- `cargo check --all-targets --all-features` clean.
- `cargo test -p harmony-reconciler-contracts` — 5/5 pass.
- x86_64 `smoke-a3.sh` end-to-end PASS (reboot-reconnect included).
Out of scope / follow-up: the operator and agent crate names
(`iot-operator-v0`, `iot-agent-v0`) and `IotScore` are still
IoT-branded. Evaluating whether to flip those in this branch next.
Review feedback: `ContainerRuntime` is a first-class harmony
capability (already lives at
`harmony/src/domain/topology/container_runtime.rs`) and the Score
types that describe what containers a caller wants running belong
next to the trait impls, not hidden in an IoT-labeled contracts
crate. Putting `PodmanService`, `PodmanV0Score`, and `IotScore` in
`iot-contracts` conflated the product-shape (IoT fleet agent) with
a reusable container-orchestration primitive.
Move the data definitions (plus the three serde tests) back to
`harmony/src/modules/podman/score.rs` where they were before the
extraction in commit 24b94a3. That file now again holds the types
and their `Score<T>` / `Interpret<T>` trait impls in one place.
No behavior change:
- `harmony::modules::podman::{IotScore, PodmanV0Score, PodmanService}`
re-exports still resolve (through the restored local module rather
than a forwarded re-export from iot-contracts).
- The single external consumer that imports these types —
`iot-agent-v0/src/reconciler.rs` — already went through
`harmony::modules::podman::*`, so no import flip needed.
iot-contracts now holds only the cross-boundary bits that are
genuinely reconciler-wire-format-specific (bucket names + key
helpers, `AgentStatus`, `Id` re-export). A follow-up commit will
rename the crate itself to reflect that scope.
Verification: `cargo test -p harmony --features podman --lib podman`
(3 score tests pass in their restored home), `cargo test -p
iot-contracts` (5 remaining tests), `cargo check --all-features`
clean.
`AgentStatus.device_id` and `AgentStatus.timestamp` were stringly
typed. Both now carry real types that prevent a whole class of
wire-format typos while keeping the on-wire JSON shape intact.
**device_id: String → harmony_types::id::Id**
Agent config + heartbeat payload now share the same `Id` that the
example IoT pipeline already uses for `IotDeviceSetupConfig`. Mixing
a device id with a deployment name or arbitrary `String` is now a
type error. `Id` is re-exported from `iot-contracts` so consumers
don't need a direct `harmony_types` dependency just to name the
field.
To keep the wire format byte-compatible, `harmony_types::Id` gains
`#[serde(transparent)]`. Audit: no consumer in the tree relies on
the previous `{"value": "…"}` shape — `Id` is persisted by sqlite
via `to_string()`, never serialized directly — so this is a
latent-bug fix more than a behavior change.
**timestamp: String → chrono::DateTime<Utc>**
The agent was calling `chrono::Utc::now().to_rfc3339()` and stuffing
the String into the payload. It now holds a real `DateTime<Utc>`
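which serde-serializes as RFC 3339 anyway. The smoke script's
reboot-gate lexical comparison still works: two timestamps taken at
different seconds already differ in their date/time digits, so the
comparison is decided before it reaches the suffix, where the new
format ends in `Z` (chrono's serde default) and the prior
`to_rfc3339()` output ended in `+00:00`.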
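A small sketch of why a plain string comparison survives the suffix change. `is_later_lexically` is a hypothetical helper, not the smoke script's code; it only demonstrates the ordering argument.

```rust
/// Returns true when `later` sorts strictly after `earlier` as plain text,
/// which is the comparison a shell-level "lex compare" performs.
pub fn is_later_lexically(earlier: &str, later: &str) -> bool {
    later > earlier
}

/// Old wire format (`to_rfc3339()`), then new format (serde default).
pub const BEFORE_REBOOT: &str = "2026-04-21T18:15:42+00:00";
pub const AFTER_REBOOT: &str = "2026-04-21T18:16:03Z";
```

One caveat this makes visible: two timestamps for the same second compare as different strings, because `Z` sorts after `+`. That only matters if both formats are ever compared for the same instant.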
**Plumbing**
- `iot/iot-agent-v0/src/config.rs`: `AgentSection.device_id: Id`.
TOML deserializes the bare string thanks to `#[serde(transparent)]`.
- `iot/iot-agent-v0/src/main.rs`: `watch_desired_state` and
`report_status` take `Id` instead of `String`.
- `iot/iot-contracts/Cargo.toml`: adds `harmony_types` path dep and
`chrono = { workspace = true, features = ["serde"] }`.
**Verification**
- `cargo test -p iot-contracts`: 8/8 pass. New assertions pin the
wire format: `"device_id":"pi-01"` (not `{"value":"pi-01"}`) and
`"timestamp":"2026-04-21T18:15:42Z"` (RFC 3339).
- x86_64 smoke-a3.sh PASSes end-to-end including the reboot-
reconnect loop — wire format remains compatible with the existing
smoke-script parsing.
Replaces an 8-link `.arg("-t").arg("ed25519").arg("-N")…` chain with
a single `.args([...])` of string literals, plus one trailing `.arg()`
for the `&PathBuf` (kept separate so we don't force it through the
`IntoIterator<Item=&str>` channel). No behavior change.
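The resulting shape looks roughly like this. The flag list after `-N` is an assumption made for illustration (the original chain is not quoted in full), and `keygen_command` is a hypothetical name.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) an ssh-keygen invocation.
/// The flags beyond `-t ed25519 -N` are illustrative assumptions.
pub fn keygen_command(key_path: &Path) -> Command {
    let mut cmd = Command::new("ssh-keygen");
    // String literals go through one `.args([...])` call.
    cmd.args(["-t", "ed25519", "-N", "", "-q", "-f"]);
    // The path stays a separate `.arg()` so it is passed as an OsStr
    // instead of being forced through a `&str` conversion.
    cmd.arg(key_path);
    cmd
}
```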
Consolidate the data types, NATS bucket names, and KV key formats
that were scattered across the IoT operator, on-device agent, and
harmony's podman module. Each was defined in one place and quoted /
reimplemented in the others, which is exactly the kind of contract
drift the roadmap v0.1 §2 called for consolidating before we start
layering new features on top.
New crate `iot/iot-contracts`:
* score.rs — `IotScore`, `PodmanV0Score`, `PodmanService` (moved
from `harmony::modules::podman::score`). Pure data, no harmony
deps.
* kv.rs — `BUCKET_DESIRED_STATE`, `BUCKET_AGENT_STATUS` constants,
`desired_state_key(device, deployment)`, `status_key(device)`.
These values used to be hard-coded in five places (agent main.rs,
operator main.rs, operator/deploy/operator.yaml, smoke-a1.sh,
smoke-a3.sh). Tests lock the literals so a flip can't slip.
* status.rs — typed `AgentStatus { device_id, status, timestamp }`.
Replaces the anonymous `serde_json::json!{}` the agent was
publishing, so the operator can deserialize the heartbeat
payload via a shared struct when §12 v0.1 status aggregation
lands.
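A sketch of what centralised key helpers with literal-locking tests can look like. The bucket names and key layouts below are placeholders chosen for illustration; they are not the values the crate ships.

```rust
// Placeholder literals: the real bucket names are not quoted in this log.
pub const BUCKET_DESIRED_STATE: &str = "desired-state";
pub const BUCKET_AGENT_STATUS: &str = "agent-status";

/// Key under which the operator writes one deployment for one device.
/// The `<device>.<deployment>` layout is an assumption.
pub fn desired_state_key(device: &str, deployment: &str) -> String {
    format!("{}.{}", device, deployment)
}

/// Key under which an agent publishes its heartbeat.
/// The `status.<device>` layout is an assumption.
pub fn status_key(device: &str) -> String {
    format!("status.{}", device)
}
```

Pinning the literals in tests means a rename has to be made deliberately in two places, which is the point of the "a flip can't slip" remark above.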
Consumer updates:
* `harmony::modules::podman::score` now holds only the
`Score<T>` / `Interpret<T>` trait bindings; the pure types are
re-exported from iot-contracts. Trait impls can't move because
the trait lives in harmony, so this is the cleanest split.
* `iot-operator-v0` uses `BUCKET_DESIRED_STATE` and
`desired_state_key` — the inline `kv_key` fn now delegates so
the existing internal call sites stay untouched.
* `iot-agent-v0` uses `BUCKET_DESIRED_STATE`, `BUCKET_AGENT_STATUS`,
`status_key`, and `AgentStatus` for the heartbeat publish.
No behavior change. Tests: `cargo test -p iot-contracts` passes
(8/8). Regression: `smoke-a3.sh` on x86_64 PASSes end-to-end
(reboot-reconnect loop included) — wire format is byte-identical
to the pre-refactor serialization.
Next consumers on deck: operator-side status aggregation (§12 v0.1
#3) and journald log streaming (§12 v0.1 #5), both of which need
shared types across the operator/agent boundary and were the
reason this extraction was prioritized.
`wait_for_ip` returns as soon as libvirt sees a DHCP lease, but the
guest may still be minutes away from accepting SSH connections —
cloud-init is usually mid-firstboot (SSH host-key generation, runcmd,
etc.). Any Score that SSHes in immediately after `ensure_vm`
resolves races with sshd startup:
    ansible.builtin.ping failed against 192.168.122.11: UNREACHABLE!
    ssh: connect to host 192.168.122.11 port 22: Connection refused
This is painful on native KVM (seconds) and catastrophic under TCG
(1-3 min between DHCP and sshd listening).
When `spec.first_boot.is_some()` — i.e. the caller asked us to run
cloud-init and therefore almost certainly intends to SSH next — also
block on `wait_for_tcp_port(ip, 22, budget)` before returning. The
budget is reused from `wait_for_ip` (300 s x86_64 / 1800 s aarch64)
because if cloud-init takes that long to bring SSH up, something is
broken that a longer wait wouldn't fix.
`wait_for_tcp_port` uses 1 s backoff polling with a 5 s per-attempt
TCP connect timeout, so a silently dropped SYN doesn't burn half
the budget on a single hung syscall.
Cases without `first_boot` (caller bringing their own pre-baked
image and not expecting SSH) get the old behavior: return as soon
as DHCP resolves.
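The polling loop described above can be sketched with blocking std networking. This is an illustration under assumptions: the real `wait_for_tcp_port` is presumably async, and its signature and error type are not shown in this log.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

const BACKOFF: Duration = Duration::from_secs(1);
const PER_ATTEMPT_TIMEOUT: Duration = Duration::from_secs(5);

/// Polls `addr` until a TCP connect succeeds or `budget` is spent.
pub fn wait_for_tcp_port(addr: SocketAddr, budget: Duration) -> Result<(), String> {
    let deadline = Instant::now() + budget;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            return Err(format!(
                "{} did not accept a TCP connection within {:?}",
                addr, budget
            ));
        }
        // Bound each attempt so a silently dropped SYN cannot hang the
        // loop. `connect_timeout` rejects a zero duration, which the
        // check above rules out.
        let attempt = remaining.min(PER_ATTEMPT_TIMEOUT);
        if TcpStream::connect_timeout(&addr, attempt).is_ok() {
            return Ok(());
        }
        // Fixed backoff between attempts, clipped to the remaining budget.
        let remaining = deadline.saturating_duration_since(Instant::now());
        sleep(remaining.min(BACKOFF));
    }
}
```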
QEMU's `virt` machine hardwires pflash unit 0 as a CFI flash device
of fixed size 64 MiB. When libvirt's `<loader type='pflash'>` points
at a file smaller than that, qemu refuses to start:
    cfi.pflash01 device '/machine/virt.flash0' requires 67108864
    bytes, block backend provides 3145728 bytes
Different distros ship the CODE firmware differently:
- Pre-padded (upstream QEMU pc-bios/edk2-aarch64-code.fd, Debian/
Ubuntu qemu-efi-aarch64): file is exactly 64 MiB, zero-padded at
the tail. Works as-is with libvirt's pflash loader.
- Raw edk2 build output (Arch `edk2-aarch64 202508+`): file is
~2-4 MiB, just the firmware volume without pflash padding. Has
to be padded before libvirt accepts it.
Our discovery previously handed the discovered path straight to
libvirt. That works on pre-padded distros and silently fails on
raw-output distros.
Add `ensure_code_pflash_padded` in modules/kvm/firmware.rs:
- If the source is already 64 MiB, return the path unchanged —
no copy, no bytes moved.
- If smaller, check a cache path (pool_dir/aarch64-code-padded.fd)
for a correctly-sized copy newer than the source and reuse it.
- Otherwise copy + `File::set_len(64 MiB)` (sparse zero pad, one
syscall), chmod 0644, return the cached path.
- If larger than 64 MiB, error out — no amount of padding saves us.
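The size branches above can be sketched with std file APIs. `ensure_padded` is a hypothetical simplification: it takes the target size as a parameter and leaves out the cache-freshness check and the chmod step.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Size QEMU's `virt` machine expects for pflash unit 0.
pub const PFLASH_SIZE: u64 = 64 * 1024 * 1024;

/// Returns a path to a firmware image that is exactly `target` bytes.
pub fn ensure_padded(source: &Path, cached: &Path, target: u64) -> io::Result<PathBuf> {
    let len = fs::metadata(source)?.len();
    if len == target {
        // Already padded by the distro: no copy, no bytes moved.
        return Ok(source.to_path_buf());
    }
    if len > target {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!(
                "{} is {} bytes, larger than the {}-byte flash device",
                source.display(),
                len,
                target
            ),
        ));
    }
    // Copy, then extend with zeros. `set_len` grows the file without
    // writing the padding, so the tail is sparse where the filesystem
    // supports it.
    fs::copy(source, cached)?;
    OpenOptions::new().write(true).open(cached)?.set_len(target)?;
    Ok(cached.to_path_buf())
}
```

A caller would pass `PFLASH_SIZE` as `target`; the parameter exists so the logic can be exercised with small files.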
`ensure_vm_firmware` in topology.rs now runs the discovered code
through the padder before handing it to libvirt. One padded copy
per pool, reused across every aarch64 VM on that pool.
Verification path: `cargo test -p harmony --lib kvm::` passes
(26 tests — XML suite unchanged since this is runtime-only).
Three fixes landed during arm smoke debugging. Each is a real
correctness / perf issue that would bite anyone running aarch64
under TCG via libvirt, independent of any particular firmware.
**xml.rs — qemu:commandline overrides for -cpu and -accel**
`pauth-impdef=on` is a QEMU property of `-cpu max`, not a libvirt
`<feature>` entry. Putting it under `<cpu><feature policy='require'
name='pauth-impdef'/>` is rejected by libvirt with:
    error: unsupported configuration: unknown CPU feature: pauth-impdef
Route it instead via `<qemu:commandline>` (with the qemu namespace
declared on `<domain>`). QEMU takes the LAST `-cpu` arg as
authoritative, so libvirt's `-cpu max` followed by our
`-cpu max,pauth-impdef=on` yields max + pauth-impdef.
Same mechanism forces MTTCG: despite docs claiming QEMU ≥ 9.1
defaults to `thread=multi` on aarch64, observation on QEMU 10.2
shows cross-arch `-accel tcg` runs single-threaded (`vcpu.1.time`
stays at 0 forever). Appending `-accel tcg,thread=multi` creates
a real per-vcpu thread and roughly halves cold-boot wall time.
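A sketch of rendering the passthrough block. The function names are hypothetical and the real renderer in xml.rs may be structured differently; the element names and namespace URI are libvirt's documented QEMU passthrough schema.

```rust
/// Namespace that must be declared on `<domain>` for passthrough args.
pub const QEMU_XMLNS: &str = "http://libvirt.org/schemas/domain/qemu/1.0";

/// Opening tag carrying the qemu namespace declaration.
pub fn domain_open_tag(domain_type: &str) -> String {
    format!("<domain type='{}' xmlns:qemu='{}'>", domain_type, QEMU_XMLNS)
}

/// Renders one `<qemu:arg>` per raw argument.
/// Assumes the values contain no quotes, so no XML escaping is done.
pub fn qemu_commandline(args: &[&str]) -> String {
    let body: String = args
        .iter()
        .map(|arg| format!("    <qemu:arg value='{}'/>\n", arg))
        .collect();
    format!("  <qemu:commandline>\n{}  </qemu:commandline>\n", body)
}

/// The two overrides described above: a later `-cpu` and `-accel`
/// appended after the arguments libvirt generates itself.
pub fn aarch64_tcg_overrides() -> String {
    qemu_commandline(&["-cpu", "max,pauth-impdef=on", "-accel", "tcg,thread=multi"])
}
```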
Also added a `<rng model='virtio'>` device feeding host `/dev/urandom`.
aarch64 cloud-init blocks minutes on first-boot SSH host-key
generation without it under TCG (entropy pool never fills on its
own). Cheap insurance on x86_64 too.
**topology.rs — 30-min wait_for_ip budget for aarch64**
Cold boot under TCG on an 8-core x86 host is 10-15 min even with
virtio-rng + pauth-impdef + MTTCG. The previous 900s ceiling
trips healthy boots; 1800s covers slower CI workers.
**smoke-a3.sh — cleanup must pass --nvram**
`virsh undefine --remove-all-storage` refuses to remove an aarch64
domain without `--nvram`, because NVRAM files aren't considered
"storage." Before this, a failed run left the domain definition
behind with yesterday's XML — subsequent runs would replay the
stale XML (ensure_vm is idempotent and doesn't redefine when the
domain already exists), masking any XML change until a manual
`virsh undefine` was issued. Also bump REBOOT_STEPS to match the
new topology-side budget.
Verified: `cargo test -p harmony --lib kvm::xml` passes (26/26),
including the 5 aarch64 assertions (namespace, cpu block, pflash
wiring, qemu:commandline contents for both -cpu and -accel).
Current Arch edk2-armvirt ships the pair as
    /usr/share/edk2/aarch64/QEMU_EFI.fd
    /usr/share/edk2/aarch64/QEMU_VARS.fd
(plus a compatibility copy under /usr/share/edk2-armvirt/aarch64/).
The previous CANDIDATES list looked for `QEMU_CODE.fd` and
`vars-template-pflash.raw` — neither name matches the actual
distro layout, so `discover_aarch64_firmware` reported
"no firmware found" on a fully-provisioned Arch host.
Add the `QEMU_EFI.fd` + `QEMU_VARS.fd` pair at both Arch paths at the
top of the probe order; keep the older raw-pflash variant and the
speculative CODE/VARS naming as later fallbacks. Sync the error
message's "checked paths" hint with the new list so the diagnostic
matches what's actually probed.
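The "first pair where both files exist wins" probe can be sketched as below. `FirmwarePair`, `discover`, and the candidate constant are hypothetical names; only the two Arch paths come from the text above, and the real list has more entries.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
pub struct FirmwarePair {
    pub code: PathBuf,
    pub vars_template: PathBuf,
}

/// Probe order: earlier entries win. Only the Arch entries are shown.
pub const ARCH_CANDIDATES: &[(&str, &str)] = &[
    (
        "/usr/share/edk2/aarch64/QEMU_EFI.fd",
        "/usr/share/edk2/aarch64/QEMU_VARS.fd",
    ),
    (
        "/usr/share/edk2-armvirt/aarch64/QEMU_EFI.fd",
        "/usr/share/edk2-armvirt/aarch64/QEMU_VARS.fd",
    ),
];

/// Returns the first candidate whose CODE and VARS files both exist.
/// The error lists every probed pair so the diagnostic matches the probe.
pub fn discover(candidates: &[(&str, &str)]) -> Result<FirmwarePair, String> {
    candidates
        .iter()
        .find(|(code, vars)| Path::new(code).is_file() && Path::new(vars).is_file())
        .map(|(code, vars)| FirmwarePair {
            code: PathBuf::from(code),
            vars_template: PathBuf::from(vars),
        })
        .ok_or_else(|| {
            let checked: Vec<String> = candidates
                .iter()
                .map(|(code, vars)| format!("{} + {}", code, vars))
                .collect();
            format!("no firmware found; checked paths: {}", checked.join(", "))
        })
}
```

Building the error from the same slice that drives the probe keeps the "checked paths" hint from drifting out of sync with the list.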
Verified against /usr/share/edk2/aarch64/QEMU_{EFI,VARS}.fd on this
host — `discover_aarch64_firmware` now returns the pair and
`cargo run -p example_iot_vm_setup -- --arch aarch64 --bootstrap-only`
completes (downloads + sha256-verifies the 598 MB arm64 image and
caches it under $HARMONY_DATA_DIR/iot/cloud-images/).
The on-device agent builds `harmony` with `default-features = false,
features = ["podman"]`, which does not pull in the `kvm` feature.
Cross-compiling iot-agent-v0 for `aarch64-unknown-linux-gnu` to put
it on a Pi / arm64 VM currently fails with:
    error[E0433]: failed to resolve: could not find `kvm` in `modules`
    --> harmony/src/modules/iot/preflight.rs:18:21
    use crate::modules::kvm::firmware::discover_aarch64_firmware;
Gate the import and the `discover_aarch64_firmware()` call inside
`check_iot_smoke_preflight_for_arch` behind `#[cfg(feature = "kvm")]`.
Callers who build `harmony` without kvm (the agent) still get the
`qemu-system-aarch64` PATH check — the firmware probe only matters
to the host that will actually boot the VM, and that host always
builds with `kvm` enabled anyway.
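The gating pattern, reduced to a self-contained sketch. The `firmware` module and the returned check names are hypothetical stand-ins; the real function has a different signature.

```rust
// Compiled only when the `kvm` feature is on, like the real module.
#[cfg(feature = "kvm")]
mod firmware {
    /// Stand-in for the real firmware probe.
    pub fn discover_aarch64_firmware() -> Result<(), String> {
        Ok(())
    }
}

/// Returns the names of the checks that ran.
pub fn check_preflight_for_aarch64() -> Result<Vec<&'static str>, String> {
    // Without `kvm` the list is never extended, hence the allow.
    #[allow(unused_mut)]
    let mut performed = vec!["qemu-system-aarch64 on PATH"];

    // Both the call and the code that depends on it sit behind the same
    // gate, so a build without `kvm` never references the missing module.
    #[cfg(feature = "kvm")]
    {
        firmware::discover_aarch64_firmware()?;
        performed.push("aarch64 UEFI firmware pair");
    }

    Ok(performed)
}
```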
Verification: `cargo build --release --target aarch64-unknown-linux-gnu
-p iot-agent-v0` now succeeds and produces a valid ELF aarch64 binary
(~13 MB).
Wire the VmArchitecture story all the way to the user-facing entry
points so an arm64 smoke run is a single env flip.
Example (`example_iot_vm_setup`):
* New `--arch {x86-64|aarch64}` flag (default x86-64) backed by a
`CliArch` enum that converts cleanly to `VmArchitecture`.
* Preflight and cloud-image bootstrap now call the `_for_arch`
variants, and the `VirtualMachineSpec.architecture` field gets
the real value instead of `Default::default()`.
Smoke script (`iot/scripts/smoke-a3.sh`):
* Reads `ARCH=x86-64|aarch64` from env (default x86-64).
* When `ARCH=aarch64`, `rustup target add aarch64-unknown-linux-gnu`
+ `cargo build --target ...` produces an arm64 agent binary;
otherwise the existing host-target build path is kept.
* Threads `--arch` to the example.
* Extends the phase-4 initial-status timeout (60s → 300s) and the
phase-5 post-reboot wait (240s → 900s) under TCG, which runs
3-5× slower than native KVM.
New `smoke-a3-arm.sh` wrapper: exports `ARCH=aarch64` and a separate
`VM_NAME` / NATS container name so an arm smoke run can coexist with
an x86 one on the same host without stepping on libvirt state.
Topology side (`KvmVirtualMachineHost::ensure_vm`): `wait_for_ip`
timeout is now arch-derived — 300s for x86_64, 900s for aarch64 —
because first-boot cloud-init under TCG routinely needs 8-12 min
on a constrained worker.
Add the pinned Ubuntu 24.04 arm64 cloud image alongside the existing
amd64 pin, with sha256 verification and a per-arch OnceCell cache so
both images can coexist under $HARMONY_DATA_DIR/iot/cloud-images/.
New entry point `ensure_ubuntu_2404_cloud_image_for_arch` selects the
right URL/sha256/filename tuple by VmArchitecture; the existing
`ensure_ubuntu_2404_cloud_image` becomes a back-compat shim pointing
at x86_64 so current callers don't need to thread an arch through yet.
Preflight gains `check_iot_smoke_preflight_for_arch`: on top of the
host-generic checks, an aarch64 target additionally requires
`qemu-system-aarch64` on PATH and a usable AAVMF firmware pair
(same `discover_aarch64_firmware` call the topology makes at
ensure_vm time — preflight surfaces it up front). Package-map
helpers learn `qemu-system-aarch64` for pacman/apt/dnf.
aarch64 guests boot via UEFI — there is no SeaBIOS equivalent for
the arm64 `virt` machine type. Libvirt needs two paths:
- CODE (read-only firmware image, shared across VMs)
- VARS (writable NVRAM, per-VM)
Every distro ships these under a different filename. New module
`modules/kvm/firmware.rs`:
- `AarchFirmware { code, vars_template }` — typed pair.
- `discover_aarch64_firmware()` walks four known-path groups
(Arch `edk2-armvirt`, Arch old naming, Debian/Ubuntu
`qemu-efi-aarch64`, Fedora `edk2-aarch64`). First pair where
both files exist wins. Miss → `ExecutorError` carrying the
per-distro `pacman`/`apt`/`dnf` install command + the full
candidate list for diagnosis.
- `copy_vars_template_for_vm(fw, dest)` produces the per-VM NVRAM
at `$pool/<vm>-VARS.fd` and chmods 0644 so libvirt-qemu's
dynamic-ownership chown on VM start works.
Wired into `KvmVirtualMachineHost::ensure_vm`: when
`spec.architecture == Aarch64`, the topology runs firmware
discovery + per-VM copy before composing the `VmConfig`, then
hands the resolved `UefiFirmware` to the XML renderer
(commit 2 already consumes it). x86_64 path unchanged.
Firmware discovery is deliberately a runtime check with a clear
error, not a preflight — this lets x86_64-only runs succeed on
hosts without AAVMF installed. Commit 4 adds an arch-aware
preflight that surfaces it upfront when a caller asks for
aarch64.
Verified: 26/26 kvm::xml tests still green, cargo check clean,
cargo fmt clean.
Rewrites `domain_xml` to consume a resolved `DomainXmlParams`
(domain_type / arch / machine / emulator / cpu_block / firmware)
so per-arch branching happens once — at param resolution — and
the XML template itself stays a single readable format-string.
Per-arch values (from Linaro's "QEMU: A Tale of Performance
analysis" Jan 2025 for the aarch64 TCG knobs):
- **x86_64** → `<domain type='kvm'>` + machine `q35` + emulator
`qemu-system-x86_64` + `<cpu mode='host-model'/>`. No firmware.
(Unchanged — all existing XML still emits byte-identical output
on the default arch.)
- **aarch64** → `<domain type='qemu'>` (TCG emulation), machine
`virt`, emulator `qemu-system-aarch64`, custom CPU
`<model>max</model>` with `<feature policy='require'
name='pauth-impdef'/>`. MTTCG (`-accel tcg,thread=multi`) is
the default in QEMU ≥ 9.1 so no libvirt-side knob is needed.
UEFI via `<loader readonly='yes' type='pflash'>CODE</loader>`
+ `<nvram>VARS</nvram>` — a `UefiFirmware` pair is required
(populated by `KvmVirtualMachineHost` in commit 3).
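Resolving per-arch values once, before templating, can be sketched as follows. `Arch`, `DomainParams`, and `os_type_line` are hypothetical simplifications of `DomainXmlParams`; the string values are the ones listed above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum Arch {
    #[default]
    X86_64,
    Aarch64,
}

/// Everything the XML template needs, resolved up front.
#[derive(Debug, PartialEq)]
pub struct DomainParams {
    pub domain_type: &'static str,
    pub arch: &'static str,
    pub machine: &'static str,
    pub emulator: &'static str,
    pub needs_uefi_firmware: bool,
}

impl DomainParams {
    /// The only place that branches on architecture.
    pub fn for_arch(arch: Arch) -> Self {
        match arch {
            Arch::X86_64 => Self {
                domain_type: "kvm",
                arch: "x86_64",
                machine: "q35",
                emulator: "qemu-system-x86_64",
                needs_uefi_firmware: false,
            },
            Arch::Aarch64 => Self {
                domain_type: "qemu", // TCG emulation on an x86_64 host
                arch: "aarch64",
                machine: "virt",
                emulator: "qemu-system-aarch64",
                needs_uefi_firmware: true,
            },
        }
    }
}

/// One template line, free of per-arch branching.
pub fn os_type_line(p: &DomainParams) -> String {
    format!("<type arch='{}' machine='{}'>hvm</type>", p.arch, p.machine)
}
```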
Four new unit tests verify the aarch64 path emits the right
domain type, arch, machine, emulator, CPU features, and firmware
elements — and that x86_64 stays BIOS-default with no loader/
nvram leakage. 26/26 `modules::kvm::xml` tests green.
When a native-aarch64 runner (Ampere) shows up, it's a one-line
fork inside `DomainXmlParams::for_vm` to switch to `kvm` +
`host-model` for the aarch64 branch — the shape already handles
it.
Adds the type-safe arch dimension for the aarch64-on-x86_64
emulation work to follow. No behaviour change: every existing call
site gets `VmArchitecture::X86_64` via `Default`, and the XML
renderer (unchanged in this commit) emits the same bytes it
always did.
- `VmArchitecture { X86_64 (default), Aarch64 }` in
domain/topology/virtualization.rs, with `as_str()` and
`ubuntu_cloudimg_suffix()` helpers (Ubuntu uses `amd64`/`arm64`
in filenames, not the `uname -m` spelling).
- `VirtualMachineSpec.architecture` + `#[serde(default)]` for
on-disk compat.
- `VmConfig.architecture` + `VmConfig.firmware: Option<UefiFirmware>`
in modules/kvm/types.rs. `UefiFirmware { code, vars }` is the
typed pair libvirt's `<loader>` + `<nvram>` need for aarch64
guests; x86_64 leaves it None. `VmConfigBuilder::architecture()`
/ `firmware()` setters added.
- `KvmVirtualMachineHost::ensure_vm` threads the arch through to
VmConfig; firmware wiring is commit 3.
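A sketch of the enum and its two helpers. The `as_str` values assume the `uname -m` spelling mentioned above; `VmSpecSketch` and `cloud_image_filename` are hypothetical, and the `noble-server-cloudimg-…` pattern is an assumption about Ubuntu's naming rather than something quoted from the code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum VmArchitecture {
    #[default]
    X86_64,
    Aarch64,
}

impl VmArchitecture {
    /// Spelling used by libvirt and `uname -m`.
    pub fn as_str(self) -> &'static str {
        match self {
            VmArchitecture::X86_64 => "x86_64",
            VmArchitecture::Aarch64 => "aarch64",
        }
    }

    /// Spelling Ubuntu uses in cloud-image filenames.
    pub fn ubuntu_cloudimg_suffix(self) -> &'static str {
        match self {
            VmArchitecture::X86_64 => "amd64",
            VmArchitecture::Aarch64 => "arm64",
        }
    }
}

/// Hypothetical spec: existing call sites that never set an architecture
/// get x86_64 through `Default`.
#[derive(Debug, Default)]
pub struct VmSpecSketch {
    pub name: String,
    pub architecture: VmArchitecture,
}

/// Assumed filename pattern for the Ubuntu 24.04 cloud image.
pub fn cloud_image_filename(arch: VmArchitecture) -> String {
    format!("noble-server-cloudimg-{}.img", arch.ubuntu_cloudimg_suffix())
}
```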
Re-exported: `VmArchitecture`, `UefiFirmware` from
`modules::kvm`. `VmArchitecture` is a type-alias re-export from
domain/topology so the arch enum lives in one place.
Verified: cargo check clean, fmt clean, aarch64 cross-compile of
harmony + iot crates still green.
Ansible's `command` module is a Python-wrapped SSH round trip with
zero added value when the operation isn't built around Ansible's
idempotency primitives. `russh` is already a workspace dep and
gives us the exit code + stdout + stderr in a typed struct, with
one round trip. Moving the two call sites that were using
`ansible.builtin.command` to russh directly:
- New `modules::linux::ssh_executor::ssh_exec(host, creds, cmd)`
returning `SshCommandOutput { rc, stdout, stderr }`. Loads the
private key via `russh::keys::load_secret_key`, authenticates,
opens an exec channel, drains all `ChannelMsg` until the
channel closes, returns the collected data. Draining past `Eof`
matters: some sshd implementations emit `ExitStatus` *after*
`Eof`, and an early break loses the rc.
- `ensure_linger`: `test -e /var/lib/systemd/linger/<user>` over
russh for the check, then `sudo loginctl enable-linger <user>`
only on miss. Two SSH round trips, no Ansible. Same semantics
as the previous `stat` + `command` pair but without the Python
hop.
- `ensure_user_unit_active`: `id -u <user>` + `sudo -u <user>
env XDG_RUNTIME_DIR=/run/user/<uid> systemctl --user enable
--now <unit>`. This is the case that couldn't be done cleanly
via ad-hoc `ansible.builtin.systemd` in the first place because
task-level `environment:` isn't available in ad-hoc; russh makes
it a one-liner.
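The "drain past `Eof`" point from the first bullet can be illustrated without russh. `Msg` is a hypothetical stand-in for the channel message type, and the field types of `SshCommandOutput` are assumptions.

```rust
/// Stand-in for the messages an exec channel can deliver.
pub enum Msg {
    Data(Vec<u8>),
    ExtendedData(Vec<u8>),
    Eof,
    ExitStatus(u32),
    Close,
}

#[derive(Debug, Default, PartialEq)]
pub struct SshCommandOutput {
    pub rc: Option<u32>,
    pub stdout: Vec<u8>,
    pub stderr: Vec<u8>,
}

/// Collects output until the channel closes.
pub fn drain(msgs: impl IntoIterator<Item = Msg>) -> SshCommandOutput {
    let mut out = SshCommandOutput::default();
    for msg in msgs {
        match msg {
            Msg::Data(bytes) => out.stdout.extend(bytes),
            Msg::ExtendedData(bytes) => out.stderr.extend(bytes),
            // Do not stop here: the exit status may still be in flight.
            Msg::Eof => {}
            Msg::ExitStatus(rc) => out.rc = Some(rc),
            Msg::Close => break,
        }
    }
    out
}
```

Breaking on `Eof` instead of `Close` would return `rc: None` for the message order used in the test below.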
Ansible still owns: `apt` (distro dispatch + cache), `user`
(idempotent account management), `copy` (file delivery with
content-diff change reporting), `file` (directory/mode), `systemd`
(daemon-reload + enable + start as one atomic call). Those are
where `ansible`'s value is real; `command` was a category error.
Verified: smoke-a3 PASS end-to-end — same 9-change initial setup,
NATS status, and power-cycle recovery as before.
Structural changes (the biggest items from the review):
- `HostConfigurationProvider` split into five narrower capabilities:
`HostReachable`, `PackageInstaller`, `FileDelivery`,
`UnixUserManager`, `SystemdManager`. Each implementation now only
implements what it can actually deliver — a future cloud-init /
ignition / podman-agent backend can pick a subset without
inheriting systemd assumptions it can't honour. Added an umbrella
trait `LinuxHostConfiguration` blanket-impl'd for any type that
has all five, so Scores keep a single bound.
- New `VirtualMachineHost` capability in domain/topology/: `list_vms`
/ `ensure_vm` / `delete_vm` / `get_vm_info`, with generic
`VirtualMachineSpec` carrying a typed optional `VmFirstBootConfig`
(hostname, admin user, authorized keys). `KvmHost` trait and
`KvmHostTopology` deleted; `KvmVirtualMachineHost` is the
concrete libvirt implementation. Cloud-init stays a KVM-impl
detail — callers never see it.
- `KvmVmScore` + `CloudInitVmConfig` deleted; replaced by a generic
`ProvisionVmScore` in `modules::iot::vm_score` bound to
`T: VirtualMachineHost`. The Score itself has no knowledge of the
hypervisor or its first-boot delivery mechanism.
- `IotDeviceSetupConfig.device_id` is now `harmony_types::id::Id`
(timestamp-prefixed, sortable-by-creation, collision-safe).
- `ensure_ready` on `KvmVirtualMachineHost` is a Noop with a TODO
pointing at ROADMAP/12-code-review-april-2026.md §12.1 (phased
topology). Captures the concern about eagerly probing the
hypervisor even when the current run doesn't need KVM.
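The capability split in the first bullet follows a common Rust pattern: narrow traits plus an umbrella trait with a blanket impl. The trait names below come from the text; every method name, the synchronous signatures, `DryRunHost`, and the package/user/unit names are hypothetical.

```rust
pub trait HostReachable {
    fn ping(&self) -> Result<(), String>;
}
pub trait PackageInstaller {
    fn ensure_package(&self, name: &str) -> Result<(), String>;
}
pub trait FileDelivery {
    fn ensure_file(&self, path: &str, content: &str) -> Result<(), String>;
}
pub trait UnixUserManager {
    fn ensure_user(&self, user: &str) -> Result<(), String>;
}
pub trait SystemdManager {
    fn ensure_unit_active(&self, unit: &str) -> Result<(), String>;
}

/// Umbrella bound so a Score can name one trait instead of five.
pub trait LinuxHostConfiguration:
    HostReachable + PackageInstaller + FileDelivery + UnixUserManager + SystemdManager
{
}

/// Blanket impl: any type with all five capabilities qualifies.
impl<T> LinuxHostConfiguration for T where
    T: HostReachable + PackageInstaller + FileDelivery + UnixUserManager + SystemdManager
{
}

/// A consumer with a single bound; returns the number of steps run.
pub fn setup_device<T: LinuxHostConfiguration>(host: &T) -> Result<usize, String> {
    host.ping()?;
    host.ensure_package("podman")?;
    host.ensure_user("agent")?;
    host.ensure_file("/etc/agent/config.toml", "")?;
    host.ensure_unit_active("agent.service")?;
    Ok(5)
}

/// Backend that accepts every request, for illustration only.
pub struct DryRunHost;

impl HostReachable for DryRunHost {
    fn ping(&self) -> Result<(), String> { Ok(()) }
}
impl PackageInstaller for DryRunHost {
    fn ensure_package(&self, _name: &str) -> Result<(), String> { Ok(()) }
}
impl FileDelivery for DryRunHost {
    fn ensure_file(&self, _path: &str, _content: &str) -> Result<(), String> { Ok(()) }
}
impl UnixUserManager for DryRunHost {
    fn ensure_user(&self, _user: &str) -> Result<(), String> { Ok(()) }
}
impl SystemdManager for DryRunHost {
    fn ensure_unit_active(&self, _unit: &str) -> Result<(), String> { Ok(()) }
}
```

A backend that cannot honour systemd would implement only the traits it can deliver; it would not satisfy `LinuxHostConfiguration`, and the compiler would reject it wherever that bound is required.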
Code quality fixes from the line-level comments:
- `render_toml` / `render_systemd_unit` / `render_user_data`
rewritten as `format!` with raw-string templates (no more
push_str chains).
- Every `Command::new(…).arg().arg().arg()` chain in the touched
files converted to `.args([…])`.
- Ansible module args are now typed Rust structs (`AptArgs`,
`AnsibleFileArgs`, `AnsibleUserArgs`, `AnsibleCopyArgs`,
`AnsibleSystemdArgs`, `AnsibleCommandArgs`, `AnsibleStatArgs`)
serialized via `serde_json::to_value`. No more `json!` macros
with ad-hoc string keys.
- `ensure_linger`: no more shell sentinel. Uses
`ansible.builtin.stat` on `/var/lib/systemd/linger/<user>` for
the idempotent change-state check, then `ansible.builtin.command
loginctl enable-linger` only on miss. `loginctl` is required
(not just `file state=touch`) because systemd-logind needs the
dbus signal to actually start the user manager; a plain file
touch doesn't wake it up and every subsequent `systemctl --user
…` fails with "Failed to connect to bus". Documented in-place.
- `ensure_user_unit_active`: picks up the user's UID first via
`ansible.builtin.command id -u <user>` and wraps the
`systemctl --user enable --now <unit>` invocation in `env
XDG_RUNTIME_DIR=/run/user/<UID>`. The systemd module's
task-level `environment:` keyword isn't available in ad-hoc
mode; this is the cleanest equivalent. Documented the
inline-playbook path as a future option for when we get more
task-level-env callsites.
- `ensure_package` comment clarified: distro dispatch is this
function's job; Debian-family is the first concrete target and
extending to RHEL/Fedora/Alpine is an implementation detail,
not a capability change.
- Kubespray line removed.
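To make the `ensure_linger` and `ensure_user_unit_active` notes concrete, here is a small sketch of the strings involved. The helper names (`linger_marker`, `user_unit_argv`, `user_unit_command`) are hypothetical, and in Harmony the command is handed to Ansible's command module rather than spawned locally; the local `Command` only illustrates the `.args([…])` style.

```rust
use std::process::Command;

/// Path whose existence marks linger as enabled for `user`.
pub fn linger_marker(user: &str) -> String {
    format!("/var/lib/systemd/linger/{user}")
}

/// argv for enabling a user unit with the runtime dir set explicitly,
/// since `systemctl --user` needs XDG_RUNTIME_DIR to find the user bus.
pub fn user_unit_argv(uid: u32, unit: &str) -> Vec<String> {
    vec![
        "env".to_string(),
        format!("XDG_RUNTIME_DIR=/run/user/{uid}"),
        "systemctl".to_string(),
        "--user".to_string(),
        "enable".to_string(),
        "--now".to_string(),
        unit.to_string(),
    ]
}

/// Builds (but does not run) the command, passing all arguments in one
/// `.args(...)` call instead of a chain of `.arg()` calls.
pub fn user_unit_command(uid: u32, unit: &str) -> Command {
    let argv = user_unit_argv(uid, unit);
    let mut cmd = Command::new(&argv[0]);
    cmd.args(&argv[1..]);
    cmd
}
```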
Verified: from a primed `$HARMONY_DATA_DIR/iot/`, smoke-a3.sh
still completes all 5 phases (bootstrap + provision + 9 setup
changes + initial NATS status + power-cycle recovery).
The smoke test now runs end-to-end against a pristine host with only
generic deps installed (libvirt, qemu, xorriso, python3, podman,
cargo, kubectl) — no manual ISO downloads, ssh-keygen rituals, or
chmod dances. Pairs with a hard power-cycle recovery phase that
matches ROADMAP §8's "power cycle test" shape.
Harmony-side bootstrap (all under $HARMONY_DATA_DIR/iot/):
- `modules::iot::assets` — SHA256-verified Ubuntu 24.04 cloud image
download (cached, streaming via reqwest) + ed25519 SSH keypair
generation. OnceCell-cached like `ensure_ansible_venv`.
- `modules::iot::libvirt_pool` — user-owned dir-backed libvirt
storage pool at $HARMONY_DATA_DIR/iot/kvm/pool/. Per-VM overlay
disks + seed ISOs land here; libvirt dynamic-ownership handles the
libvirt-qemu chown transitions we used to do by hand. Pool is
defined/built once via the `virt` crate inside a spawn_blocking,
then started and marked autostart on every process boot.
- `modules::iot::preflight::check_iot_smoke_preflight()` — fail-fast
checks for every runner-host prereq (`virsh`, `qemu-img`, `xorriso`,
`python3`, `ssh-keygen`, libvirt-group membership, default
network active). Each missing piece surfaces with the Arch/Debian/
Fedora install command inline.
KvmVmScore now owns these calls internally — `CloudInitVmConfig`
loses `base_image_path`, `seed_output_dir`, `authorized_key`. The
Score returns the SSH private-key path in its outcome details so the
caller can hand it straight to `LinuxHostTopology`.
smoke-a3.sh dropped from 125 lines of manual setup to a thin
orchestration script. Adds phase 5: `virsh destroy` + `sleep` +
`virsh start`, then a wall-clock gate that rejects any status writes
from before the reboot. Verified: real power-cycles produce
timestamps ~14s after the gate (agent boot + connect latency); the
gate catches in-flight writes that happen during destroy.
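The wall-clock gate can be sketched as a single comparison. This is an illustration only: `GateVerdict` and `check_gate` are hypothetical names, and the gate instant is assumed to be captured by the caller around the power cycle.

```rust
use std::time::{Duration, SystemTime};

/// Outcome of checking one status write against the reboot gate.
#[derive(Debug, PartialEq)]
pub enum GateVerdict {
    /// Written strictly after the gate: counts as evidence of recovery.
    Fresh(Duration),
    /// Written at or before the gate: stale or in-flight, so ignored.
    Stale,
}

pub fn check_gate(gate: SystemTime, written_at: SystemTime) -> GateVerdict {
    // `duration_since` returns Err when `written_at` is earlier than `gate`.
    match written_at.duration_since(gate) {
        Ok(lag) if lag > Duration::ZERO => GateVerdict::Fresh(lag),
        _ => GateVerdict::Stale,
    }
}
```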
Verified end-to-end from a fully nuked `$HARMONY_DATA_DIR/iot/`:
- cold boot: downloads 600MB cloud image (~25s), generates SSH key,
defines + starts libvirt pool, provisions VM, onboards device,
verifies phase 5 power-cycle recovery
- warm boot: cache hits on all bootstrap steps; same end-to-end
PASS in 2-3 minutes total
aarch64 cross-compile still green.
Fixes surfaced by actually running the VM-as-device flow end to
end. Each delta is small and self-contained.
KvmVmScore + cloud-init:
- **Overlay disk**: VM now boots off a per-VM qcow2 backed by the base
image instead of writing into the base in-place. Re-runs of the same
vm_name reuse the overlay (idempotent); fresh runs wipe the overlay
so cloud-init starts clean. Requires `qemu-img`.
- **UUID instance-id**: cloud-init's meta-data now carries a fresh
UUID per seed build, so when the overlay gets recreated cloud-init
treats it as a first boot and re-runs all per-instance modules.
Without this, repeated runs silently skipped user/hostname/ssh setup.
- **xorriso deadlock**: `.status()` with piped stderr filled the pipe
buffer and blocked the child on write; switched to `.output()`, which drains
both. Also unlink any pre-existing seed ISO before running xorriso,
since it otherwise treats the file as overwriteable input "media"
and aborts with exit 32.
- **wait_for_ip**: 180s → 300s. First boot of a cloud image on a
constrained runner (or CI worker) can take 2-4 minutes.
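The xorriso fix boils down to letting `std::process::Command::output()` read both streams while the child runs. A minimal sketch, assuming a POSIX `sh` is on `PATH`; `run_drained` and `noisy_child` are hypothetical helpers, and the shell loop merely stands in for a tool that writes more to stderr than a pipe buffer holds.

```rust
use std::io;
use std::process::{Command, Output, Stdio};

/// Runs `cmd` to completion while draining stdout and stderr.
pub fn run_drained(cmd: &mut Command) -> io::Result<Output> {
    cmd.stdin(Stdio::null());
    // `output()` pipes and reads both streams, so a chatty child cannot
    // block on a full pipe the way it can when nobody reads a piped stderr.
    cmd.output()
}

/// Stand-in for a noisy tool: writes 20000 lines of 11 bytes to stderr.
pub fn noisy_child() -> Command {
    let mut cmd = Command::new("sh");
    cmd.args([
        "-c",
        "i=0; while [ $i -lt 20000 ]; do echo 0123456789 >&2; i=$((i+1)); done",
    ]);
    cmd
}
```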
Ansible adapter — a half-dozen sharp corners of ad-hoc mode that only
show up in a live run:
- **`--ssh-common-args=VALUE`** (equals form, single token). Separate
`--ssh-common-args VALUE` form has ansible's argparse re-interpret
the `-o …` inside the value as its own `-o` flag and dump a help
screen. Lost an afternoon to this years ago on another project.
- **Skip `-a` when empty**: `-a '{}'` trips ansible-core 2.17's "extra
params" check on parameterless modules like `ping`. Pass no `-a`
when the JSON dict is empty.
- **`ANSIBLE_LOAD_CALLBACK_PLUGINS=True`**: ad-hoc mode silently
ignores `ANSIBLE_STDOUT_CALLBACK` without this. Default callback
produces multi-line JSON that's fragile to parse.
- **`ANSIBLE_PIPELINING=True`**: required when `become`-ing an
unprivileged user (iot-agent for the user-scope podman.socket),
otherwise ansible's temp-file shuffle falls back to an ACL chmod
syntax no Linux distro accepts.
- **Parse shell/command oneline shape**: oneline callback emits
`host | VERB | rc=N | (stdout) … | (stderr) …` for shell-style
modules in addition to the `host | VERB => {json}` shape. Parser
now handles both and synthesises a JSON payload from the shell form.
- **Auto-create parent dir in ensure_file**: ansible's `copy` module
won't create `/etc/iot-agent/` for you; a `file state=directory`
call before every `copy` is idempotent and cheap.
- **ensure_package uses apt directly**: `ansible.builtin.package` is
distro-agnostic but doesn't auto-run `apt update`, so a fresh cloud
image fails with "no package matching". Switched to
`ansible.builtin.apt` with `update_cache=true, cache_valid_time=3600`.
Debian-family only for v0 (ROADMAP §5.3); RHEL switch is a future
capability refinement.
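Several of the adapter notes above concern how the ad-hoc command line is assembled. The sketch below shows one way to encode them; `AdHocCall`, `ad_hoc_env` and `ad_hoc_argv` are hypothetical names, not the adapter's real API, and JSON args are passed as a pre-serialized string to keep the example dependency-free.

```rust
/// Inputs for one ad-hoc module call (hypothetical struct, not Harmony's).
pub struct AdHocCall<'a> {
    pub host: &'a str,
    pub module: &'a str,
    /// Module args as a JSON object string; "{}" means "no arguments".
    pub json_args: &'a str,
    pub ssh_common_args: &'a str,
}

/// Environment the ad-hoc run needs, per the notes above.
pub fn ad_hoc_env() -> Vec<(&'static str, &'static str)> {
    vec![
        ("ANSIBLE_LOAD_CALLBACK_PLUGINS", "True"),
        ("ANSIBLE_STDOUT_CALLBACK", "oneline"),
        ("ANSIBLE_PIPELINING", "True"),
    ]
}

/// Builds the argv that follows the `ansible` program name.
pub fn ad_hoc_argv(call: &AdHocCall<'_>) -> Vec<String> {
    let mut argv = vec![
        "all".to_string(),
        "-i".to_string(),
        format!("{},", call.host), // trailing comma: inline one-host inventory
        "-m".to_string(),
        call.module.to_string(),
        // Equals form keeps the `-o ...` value inside a single token.
        format!("--ssh-common-args={}", call.ssh_common_args),
    ];
    // Parameterless modules such as `ping` get no `-a` at all.
    let args = call.json_args.trim();
    if !args.is_empty() && args != "{}" {
        argv.push("-a".to_string());
        argv.push(call.json_args.to_string());
    }
    argv
}
```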
HostConfigurationProvider surface:
- **`FileSpec.source: FileSource`**: new `Content(String)` vs
`LocalPath(PathBuf)`. LocalPath ships binary files over SFTP via
ansible's native mechanism instead of passing base64 content through
argv (which hit ARG_MAX on the ~10MB agent). This replaces the whole
base64-in-tmpfile + oneshot install-unit dance in
IotDeviceSetupScore — the binary now installs in a single idempotent
`ensure_file` call that reports `changed` only when bytes differ.
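A rough sketch of the `FileSource` split. The variant names are from the note above; `desired_bytes` and `would_change` are hypothetical helpers, and the local byte comparison only stands in for the checksum comparison that Ansible's copy module performs on the remote side.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Where the bytes of a managed file come from.
#[derive(Clone, Debug)]
pub enum FileSource {
    /// Small inline text, e.g. a rendered config file.
    Content(String),
    /// A file on the runner host, e.g. the agent binary.
    LocalPath(PathBuf),
}

impl FileSource {
    /// Loads the desired bytes. Hypothetical helper for this sketch only.
    pub fn desired_bytes(&self) -> io::Result<Vec<u8>> {
        match self {
            FileSource::Content(text) => Ok(text.as_bytes().to_vec()),
            FileSource::LocalPath(path) => fs::read(path),
        }
    }
}

/// Reports `changed` only when the remote bytes differ from the desired ones.
pub fn would_change(source: &FileSource, remote: Option<&[u8]>) -> io::Result<bool> {
    let desired = source.desired_bytes()?;
    Ok(remote != Some(desired.as_slice()))
}
```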
IotDeviceSetupScore:
- Dropped the base64 + oneshot install machinery (80 fewer lines).
- Dropped the explicit primary `group:` on ensure_user — Debian-family
useradd auto-creates a group matching the username; setting `group:`
required pre-creating it.
smoke-a3.sh: builds iot-agent-v0 `--release` instead of debug (400MB
debug binary filled the VM's thin-provisioned 3.5GB cloud rootfs).
Verified end-to-end three times on this host:
run 1: 9 changes (fresh install — package install, user create, binary, config, restart)
run 2: 0 changes (true NOOP — `already configured`)
run 3: 2 changes (group swap — only TOML + agent restart)
Agent reports status.iot-smoke-vm into NATS after each run.
Rewrites AnsibleHostConfigurator to avoid the two coupling points that
last year's Kubespray investigation taught us to stay away from: YAML
playbook generation and Ansible inventory.
- **No more YAML, no more inventory files.** Every primitive is now one
or two `ansible all -i '<ip>,' -m <module> -a '<json>'` ad-hoc
invocations. JSON args go straight through Ansible's own module
interface; the tmpfile-playbook-and-inventory dance is gone entirely.
Harmony owns 100% of orchestration, Ansible owns only per-host
idempotent module execution. `ensure_systemd_unit` collapses to two
ad-hoc calls (copy + systemd) rather than a multi-task playbook.
`ensure_linger` signals change-state via a sentinel in the shell
module's stdout, since ad-hoc mode has no `changed_when`.
- **Self-installing venv.** New `modules::linux::ansible_venv`:
`ensure_ansible_venv()` creates `$HARMONY_DATA_DIR/ansible-venv/` via
`python3 -m venv` + `pip install ansible-core==2.17.*` on first use,
cached via `tokio::sync::OnceCell`. No more "install ansible before
running Harmony" step — python3 + venv is the only host requirement,
and we print the exact package names for Arch/Debian/Fedora when
python is missing.
- **smoke-a3.sh**: drop `ansible-playbook` from preflight, add
`python3`. Example gains `--bootstrap-ansible-only` for warming the
venv ahead of the real run (turns a ~60s first-run smoke into
deterministic sub-second after bootstrap).
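The once-per-process caching of the venv bootstrap can be illustrated with the standard library. Note the swap: the commit uses `tokio::sync::OnceCell`, while this sketch uses `std::sync::OnceLock` so it stays dependency-free. `bootstrap_venv` is a stand-in that only counts invocations; it does not call `python3` or `pip`.

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the expensive bootstrap actually ran.
pub static BOOTSTRAP_RUNS: AtomicUsize = AtomicUsize::new(0);

static VENV: OnceLock<PathBuf> = OnceLock::new();

/// Stand-in for `python3 -m venv` + `pip install`; only records that it ran.
fn bootstrap_venv(data_dir: &str) -> PathBuf {
    BOOTSTRAP_RUNS.fetch_add(1, Ordering::SeqCst);
    PathBuf::from(data_dir).join("ansible-venv")
}

/// First call bootstraps; later calls in the same process return the cached path.
pub fn ensure_venv(data_dir: &str) -> &'static PathBuf {
    VENV.get_or_init(|| bootstrap_venv(data_dir))
}
```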
Output parsing uses the `oneline` callback (`host | VERB => {json}`),
which is trivial to split without a regex; FAILED!/UNREACHABLE! are treated
as errors. SSH control sockets are pinned under `$HARMONY_DATA_DIR/
ansible-cp` so multiple Harmony processes don't race in /tmp.
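A minimal sketch of splitting the `host | VERB => {json}` shape with plain string operations. `OnelineResult` and `parse_oneline` are hypothetical names; the JSON payload is returned as an unparsed string here, whereas a real adapter would presumably hand it to a JSON parser.

```rust
/// Result of one ad-hoc module run as reported by the `oneline` callback.
#[derive(Debug, PartialEq)]
pub struct OnelineResult {
    pub host: String,
    pub verb: String,
    pub json: String,
}

/// Splits `host | VERB => {json}` without a regex.
/// FAILED! and UNREACHABLE! verbs are returned as errors.
pub fn parse_oneline(line: &str) -> Result<OnelineResult, String> {
    let (head, json) = line
        .split_once(" => ")
        .ok_or_else(|| format!("no ' => ' separator in: {line}"))?;
    let (host, verb) = head
        .split_once(" | ")
        .ok_or_else(|| format!("no ' | ' separator in: {head}"))?;
    let verb = verb.trim();
    if verb == "FAILED!" || verb == "UNREACHABLE!" {
        return Err(format!("{} {}: {}", host.trim(), verb, json.trim()));
    }
    Ok(OnelineResult {
        host: host.trim().to_string(),
        verb: verb.to_string(),
        json: json.trim().to_string(),
    })
}
```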
Verified: `ensure_ansible_venv()` first call installs ansible-core
2.17.14 into the managed venv (~12s, network-bound); second call is
cache-fast (<50ms). Clippy + fmt clean, aarch64 cross-compile green.
- New binary crate `examples/iot_vm_setup` — composes the two Scores
from the previous commit (`KvmVmScore`, `IotDeviceSetupScore`) with
`KvmHostTopology` + `LinuxHostTopology`. CLI flags cover everything
a customer-facing "onboard this VM" invocation would need (device
id, group, NATS URL+creds, SSH key paths, cloud image path, agent
binary path). `--only-vm` skips the setup step when iterating on VM
provisioning.
- `iot/scripts/smoke-a3.sh` — end-to-end smoke that stands up a NATS
podman container, builds the iot-agent, runs the example, and waits
for the VM's agent to write its `status.<device-id>` key into the
`agent-status` KV bucket. Preflight fails fast with copy-paste
commands when any of `virsh`, `xorriso`, `ansible-playbook`, the
Ubuntu cloud image, or an SSH keypair is missing — the script does
not try to self-bootstrap these (would turn a 90-second smoke into a
~20-minute download-and-generate session).
- Clippy cleanups: redundant closure + useless `format!`s.
Adds the plumbing so Harmony can both provision a VM to stand in for a
fleet device and (re)configure any Linux host to join the fleet. The
walking skeleton's "VM-as-device" test path needs all four pieces:
- `domain::topology::HostConfigurationProvider` — new capability trait
with `ensure_package`, `ensure_user`, `ensure_file`,
`ensure_systemd_unit`, `restart_service`, `ensure_linger`,
`ensure_user_unit_active`, and a reachability `ping`. Returns
`ChangeReport { changed: bool }` so callers can reconcile-restart only
when something actually changed. Trait doc calls out the narrow scope
(not a general CM replacement) and the swappability story.
- `modules::linux::AnsibleHostConfigurator` + `LinuxHostTopology` —
concrete impl that shells out to `ansible-playbook --stdout-callback
json`, one play per trait method, parsing the JSON for the task's
`changed` flag. Deliberately the laziest reasonable adapter: when
Ansible's error surface becomes painful, this is the piece we replace
with a Rust-native impl behind the same trait, with zero Score churn.
Runtime requirement: `ansible-playbook` (>= 2.15) on the Harmony
runner host.
- `modules::kvm::KvmVmScore` + cloud-init seed ISO generation — thin
Score that wraps `KvmExecutor::ensure_vm` with a generated cloud-init
seed ISO (hostname + authorized SSH key + sudoer user, nothing more).
Uses `xorriso -as mkisofs` to build the ISO; returns the booted VM's
IP. Docs note cloud-init is strictly for the VM test rig — customer
Pi deployments go through rpi-imager / PXE instead. New `KvmHost`
capability + `KvmHostTopology` expose the underlying `KvmExecutor`.
- `modules::iot::IotDeviceSetupScore` — customer-facing Score bound to
`T: Topology + HostConfigurationProvider`. Installs podman +
systemd-container, creates the `iot-agent` system user with linger,
activates user podman.socket, uploads the agent binary via a
base64-in-tmpfile + oneshot unit pattern (docstring flags this as a
v0.1 candidate for a proper remote-fetch), writes
`/etc/iot-agent/config.toml` and the systemd unit, and restarts only
if any of the config/unit/binary-install tasks reported changes.
Re-running with a different `group` rewrites the TOML and bounces
the agent.
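The "restart only if something changed" pattern that `ChangeReport` enables can be sketched as follows. The trait here is a reduced, synchronous stand-in with two methods; `apply_agent_config`, `FakeHost` and the unit file path are assumptions for illustration and do not mirror the real Score.

```rust
use std::collections::HashMap;

/// Outcome of one idempotent primitive, as described above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ChangeReport {
    pub changed: bool,
}

/// Reduced stand-in for the capability (the real one has more methods).
pub trait HostConfigurationProvider {
    fn ensure_file(&mut self, path: &str, content: &str) -> ChangeReport;
    fn restart_service(&mut self, unit: &str);
}

/// Writes config and unit, then restarts only if either reported a change.
pub fn apply_agent_config<H: HostConfigurationProvider>(
    host: &mut H,
    config_toml: &str,
    unit_text: &str,
) -> bool {
    let reports = [
        host.ensure_file("/etc/iot-agent/config.toml", config_toml),
        // Hypothetical unit path, chosen only for the example.
        host.ensure_file("/etc/systemd/system/iot-agent.service", unit_text),
    ];
    let changed = reports.iter().any(|r| r.changed);
    if changed {
        host.restart_service("iot-agent");
    }
    changed
}

/// In-memory fake used to exercise the logic without a real host.
#[derive(Default)]
pub struct FakeHost {
    pub files: HashMap<String, String>,
    pub restarts: u32,
}

impl HostConfigurationProvider for FakeHost {
    fn ensure_file(&mut self, path: &str, content: &str) -> ChangeReport {
        let changed = self.files.get(path).map(String::as_str) != Some(content);
        if changed {
            self.files.insert(path.to_string(), content.to_string());
        }
        ChangeReport { changed }
    }
    fn restart_service(&mut self, _unit: &str) {
        self.restarts += 1;
    }
}
```

Re-running with identical inputs leaves `restarts` untouched, while changing only the TOML (for example a different `group`) triggers exactly one more restart.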
Scope note: this turn stops at one VM. Multi-VM + group routing is the
next step — `group` in the config is a label that the agent will carry
into its status bucket, but `Deployment.spec.targetGroups` isn't wired
anywhere yet. `smoke-a3.sh` (VM-as-device end-to-end) lands in the
next commit.
The agent now finishes the walking-skeleton thread end-to-end: a Deployment
CR applied in the central cluster flows through the operator into NATS KV,
the agent reconciles it into a running container on the host, and deletion
(or drift) runs through the same loop in reverse.
Key additions:
- `domain::topology::ContainerRuntime` — new capability trait for
node-level container runtimes with `ensure_service_running` /
`remove_service` / `list_managed_services`. Intentional scope doc
notes Docker likely fits, Containerd/CRI-O likely need a separate
capability; no attempt to generalise further up front. `ContainerSpec`
carries a `MANAGED_BY_LABEL` so `list_managed_services` can filter
out containers Harmony didn't create.
- `modules::podman::PodmanTopology` (feature-gated behind `podman`)
implements both `Topology` and `ContainerRuntime` over
`podman_api::Podman` on the local user socket. Handles image pull,
create/start, drift-triggered recreate, and a 5-minute graceful stop
per ROADMAP §5.6.
- `modules::podman::PodmanV0Interpret::execute` is no longer a stub —
its bound is tightened to `T: Topology + ContainerRuntime` and it
dispatches each `PodmanService` to the capability. `IotScore` /
`PodmanV0Score` carry the same bound so agent code calls
`Score::create_interpret` cleanly.
- `domain::inventory::Inventory::from_localhost()` — minimal
single-host inventory (hostname as label, logical CPU count, total
memory). Pulls in `sysinfo 0.30` (already a transitive dep via
`harmony_inventory_agent`).
- `iot-agent-v0` rewired around a `Reconciler` that owns the topology
+ inventory + a `HashMap<key, (serialized_score, parsed_score)>`
cache. KV Put → dispatch iff the serialized score changed
(ROADMAP §5.5 string-compare). KV Delete/Purge → tear down the
cached score's containers. Separate 30s reconcile tick re-runs
every cached score against podman (ROADMAP §5.6 "polls podman
every 30s as ground truth; KV watch events are accelerators").
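The reconciler's string-compare cache can be sketched in a few lines. This is a simplified model: the cache holds only the serialized score (the parsed form is omitted), and `Action`, `on_put`, `on_delete` and `keys_to_recheck` are hypothetical names rather than the agent's real API.

```rust
use std::collections::HashMap;

/// What the reconciler should do in response to one KV event.
#[derive(Debug, PartialEq)]
pub enum Action {
    Dispatch,
    Skip,
    TearDown,
    Nothing,
}

/// Caches the last serialized score per KV key.
#[derive(Default)]
pub struct Reconciler {
    cache: HashMap<String, String>,
}

impl Reconciler {
    /// KV Put: dispatch only if the serialized score differs from the cached one.
    pub fn on_put(&mut self, key: &str, serialized: &str) -> Action {
        if self.cache.get(key).map(String::as_str) == Some(serialized) {
            return Action::Skip;
        }
        self.cache.insert(key.to_string(), serialized.to_string());
        Action::Dispatch
    }

    /// KV Delete/Purge: tear down only if the key was being tracked.
    pub fn on_delete(&mut self, key: &str) -> Action {
        match self.cache.remove(key) {
            Some(_) => Action::TearDown,
            None => Action::Nothing,
        }
    }

    /// Periodic tick: every cached key is re-run against the runtime.
    pub fn keys_to_recheck(&self) -> Vec<String> {
        let mut keys: Vec<String> = self.cache.keys().cloned().collect();
        keys.sort();
        keys
    }
}
```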
Smoke test (`iot/scripts/smoke-a1.sh`) extended with phase 3b
(builds + starts agent) and phase 4b (verifies the container is
running and `curl http://127.0.0.1:8080/` returns nginx). Phase 5
now also asserts the container is gone after CR delete. PASS locally
against a fresh k3d + NATS podman container + rootless podman on the
dev host. aarch64 + x86_64 cross-compile stay green.
The CRD previously accepted any string for `score.type`, so typos like
`"pdoman"` or `"PodmnV0"` would be persisted by the apiserver and only
surface on-device as agent-side deserialize warnings. That class of
failure is distasteful and hard to debug.
Replace the auto-derived schema for `ScorePayload` with a hand-rolled
one that keeps the same visible shape but adds two apiserver-level
guardrails:
- `score.type` gets `minLength: 1` and an `x-kubernetes-validations`
CEL rule requiring it to match `^[A-Za-z_][A-Za-z0-9_]*$` — a valid
Rust identifier, since score variants *are* Rust struct names in
`harmony::modules::podman::IotScore`. Message points operators at
the concrete example `PodmanV0`.
- `score.data` still carries only `x-kubernetes-preserve-unknown-
fields: true`. The rule validates the discriminator's *shape*, not
its *value*, so v0.3+ variants (OkdApplyV0, KubectlApplyV0) don't
require an operator release — preserves ROADMAP §6.1's
generic-router design.
The `x-kubernetes-preserve-unknown-fields` extension stays scoped to
`score.data` alone; every other field in the CRD has a strict schema,
so the whole document contains exactly one preserve-unknown-fields
marker and exactly one validations block.
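For readers who want to see what the shape rule on `score.type` accepts, here is a regex-free Rust equivalent of `^[A-Za-z_][A-Za-z0-9_]*$`. The function name `is_valid_score_type` is hypothetical; the actual check runs as a CEL rule in the apiserver, not in Rust.

```rust
/// Mirrors `^[A-Za-z_][A-Za-z0-9_]*$`; the empty string fails, which also
/// covers the `minLength: 1` guard.
pub fn is_valid_score_type(s: &str) -> bool {
    let mut chars = s.chars();
    // First character: ASCII letter or underscore.
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    // Remaining characters: ASCII letters, digits or underscores.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

Since the rule checks shape rather than value, a well-formed typo such as `pdoman` still passes here and is only caught on-device; malformed values such as `has spaces` are rejected up front.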
Smoke test extended: phase 2b applies a CR with `score.type: "has
spaces"` and asserts the apiserver rejects it with the CEL message
before the operator ever sees it. Positive phases (kubectl apply ->
NATS KV put -> status observed -> delete -> KV key removed) still
PASS end-to-end.
Matches the `preserve_arbitrary` pattern used by ArgoCD
(`Application.spec.source.helm.valuesObject`) and Flux
(`HelmRelease.spec.values`), both of which similarly use narrow
preserve-unknown-fields on a payload field without coupling the CRD
to their variant catalog.
`iot/scripts/smoke-a1.sh` drives the A1 acceptance flow end-to-end:
spins up NATS and a k3d cluster via podman, applies the generated CRD,
runs the operator, applies a Deployment CR, asserts the expected
`<device>.<deployment>` key lands in the `desired-state` KV bucket and
`.status.observedScoreString` round-trips the same JSON, then deletes
the CR and asserts the finalizer removes the KV key. Cleans up on exit.
Two fixes surfaced while running it:
1. `ScorePayload.data: serde_json::Value` generated an empty `{}`
schema, which the API server rejects. Attach a `schemars(schema_with
= preserve_arbitrary)` helper that emits `x-kubernetes-preserve-
unknown-fields: true`, letting the Score payload be any JSON shape.
2. `Patch::Merge` combined with `PatchParams::apply(...).force()` is
rejected by kube-rs (force is Apply-only). Use a plain `Merge` patch
for the status subresource — simpler and correct for v0.
Implement the A1 task from the IoT walking-skeleton roadmap:
- CRD (kube-derive): `iot.nationtech.io/v1alpha1/Deployment`, namespaced,
with `targetDevices`, `score {type, data}`, `rollout.strategy`, and a
status subresource carrying `observedScoreString`.
- Controller: `kube::runtime::Controller` + `finalizer` helper. On Apply,
writes `<device_id>.<deployment_name>` into NATS KV bucket
`desired-state` and patches `.status.observedScoreString` via
server-side apply. Skips KV write + status patch when the score is
unchanged to avoid reconcile-loop churn. On Cleanup, removes the
per-device keys before releasing the finalizer.
- CLI: `gen-crd` subcommand prints the CRD YAML from the Rust types;
`run` (default) starts the controller. `deploy/crd.yaml` is generated
by that subcommand — single source of truth, no drift.
- Deploy manifests: `deploy/operator.yaml` (Namespace, SA, ClusterRole,
ClusterRoleBinding, Deployment) and generated `deploy/crd.yaml`.
Agent fixes surfaced while aligning with the operator's key layout:
- Watch filter: was `starts_with("desired-state.<id>.")` on
`watch_all()`; bucket name is not a key prefix, so it never matched.
Now uses `bucket.watch("<id>.>")` with the NATS wildcard and handles
`Put`/`Delete`/`Purge` distinctly.
- Multi-server connect: was joining `nats.urls` with `","` into a single
malformed URL. Pass the `Vec<String>` to `ConnectOptions::connect`.
- `credentials.type` is now validated (rejects unknown discriminators)
so a v0.2 `zitadel` config doesn't silently fall back to shared creds.
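The watch-filter fix is easier to see with the NATS wildcard rules written out. The sketch below is a simplified matcher (`*` for one token, `>` for one or more trailing tokens) and is not the NATS client's own implementation; `subject_matches` and `device_watch_pattern` are hypothetical helpers, and `pi-01` is a made-up device id.

```rust
/// Returns true when `subject` matches `pattern` under NATS wildcard rules:
/// `*` matches exactly one token, `>` matches one or more trailing tokens.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            // `>` is only valid as the last pattern token.
            (Some(">"), Some(_)) => return pat.next().is_none(),
            (Some("*"), Some(_)) => continue,
            (Some(p), Some(s)) if p == s => continue,
            (None, None) => return true,
            _ => return false,
        }
    }
}

/// Key layout from the operator: `<device_id>.<deployment_name>`.
pub fn device_watch_pattern(device_id: &str) -> String {
    format!("{device_id}.>")
}
```

Keys inside the bucket never carry the bucket name, which is why the old `starts_with("desired-state.<id>.")` filter could not match anything.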
Verification on feat/iot-walking-skeleton:
- cargo clippy --no-deps -- -D warnings: clean (agent + operator).
- cargo fmt --check: clean.
- x86_64 + aarch64 cross-compile: both build.
- podman module unit tests: pass.