The production CSP is `script-src 'self'` (no `'unsafe-inline'`), so every inline `<script>` and `on*=` handler was silently dead in prod (it worked in dev, which adds `'unsafe-inline'`). Move that logic to `app.js` and HTMX.

- Logs pop-out did nothing: the modal's inline `showModal()`/`onclick`/`onclose` were CSP-blocked. `app.js` now opens the dialog on `htmx:afterSwap` to `#modal-root` (backdrop-close, clear-on-close, autoscroll).
- Device logs showed fabricated lines: the SSE handler now emits one honest "not implemented yet" notice instead of fake logs.
- Sidebar doubled when opening a device from the dashboard: the attention rows swapped a full page into `closest main`; they now target `body`, like the list.
- Both list action buttons just opened the device (their `onclick` `stopPropagation` was CSP-blocked): removed the redundant no-op/quick-log buttons and the unused checkbox column; rows simply navigate. Filter dropdowns used `onchange`; they now use `hx-trigger="change"`.
- The Deployment Overview tab loaded the home page, and switching tabs left the highlight on Overview: the handler returned the full page for `?tab=overview`, and only the content area was swapped. Tabs now re-render as a unit (bar + content), so Overview returns just its content and the active highlight follows. The same fix is applied to the device-detail tabs.
- Removed the dead Reconcile/Pause/Rollback/Roll-out buttons; replaced the fabricated deployment "manifest" with the real fields we have.
Harmony Fleet
IoT / decentralized-edge orchestration for harmony. A fleet stack is:
| Component | Crate | Role |
|---|---|---|
| Operator | `harmony-fleet-operator` | Watches `Deployment` CRs, writes desired state into NATS JetStream KV, and aggregates device state back into CR status. Runtime binary; no `harmony` dependency. |
| Agent | `harmony-fleet-agent` | One per device. Watches the desired-state KV, drives the local runtime (podman today), publishes heartbeats and per-deployment state, and answers `device-commands.*` request/reply. |
| Auth | `harmony-fleet-auth` | Shared NATS credential plumbing: `TomlShared` (dev) and `ZitadelJwt` (prod, with auth-callout). |
| Deploy | `harmony-fleet-deploy` | The canonical deploy crate. Imports `harmony` and exposes one `*Score` per component (`FleetOperatorScore`, `FleetAgentScore`, `FleetNatsScore`, `FleetServerScore`). Both the production CLI and the e2e harness compose these; see ADR-023. |
| E2E harness | `harmony-fleet-e2e` | Brings the stack up in a fresh k3d namespace and runs integration tests against it. |
The on-the-wire types both ends agree on (KV bucket names, key formats, command-protocol payloads) live in `../harmony-reconciler-contracts`.
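This README names one command subject, `device-commands.<device_id>.ping`, and lists `Verb::Logs`/`Verb::Exec` as the next verbs. A minimal sketch of building and splitting such subjects, assuming the verb is always the last dot-separated token. The `Verb` enum here, the `logs`/`exec` suffixes, and both helper functions are illustrative, not the actual definitions in `harmony-reconciler-contracts`.

```rust
/// Hypothetical mirror of the command verbs. Only `Ping` exists today;
/// `Logs` and `Exec` are planned, and their wire suffixes are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verb {
    Ping,
    Logs,
    Exec,
}

impl Verb {
    fn suffix(self) -> &'static str {
        match self {
            Verb::Ping => "ping",
            Verb::Logs => "logs", // assumption
            Verb::Exec => "exec", // assumption
        }
    }

    fn from_suffix(s: &str) -> Option<Verb> {
        match s {
            "ping" => Some(Verb::Ping),
            "logs" => Some(Verb::Logs),
            "exec" => Some(Verb::Exec),
            _ => None,
        }
    }
}

const COMMAND_PREFIX: &str = "device-commands";

/// Subject an operator-side client would send a request to.
pub fn command_subject(device_id: &str, verb: Verb) -> String {
    format!("{}.{}.{}", COMMAND_PREFIX, device_id, verb.suffix())
}

/// Split a subject back into (device_id, verb), as an agent might when dispatching.
/// Device ids must not contain `.`, because NATS uses it as the token separator.
pub fn parse_command_subject(subject: &str) -> Option<(&str, Verb)> {
    let rest = subject.strip_prefix(COMMAND_PREFIX)?.strip_prefix('.')?;
    let (device_id, verb) = rest.split_once('.')?;
    if device_id.is_empty() {
        return None;
    }
    Some((device_id, Verb::from_suffix(verb)?))
}
```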
Architecture in one line
`FleetOperatorScore`, `FleetAgentScore`, etc. are real Rust types with capability-bound `Topology` parameters. Production deploys, the e2e harness, and any future control-plane tool all compose the same Scores; the only thing that changes is the `Topology` instance. There is no hand-rolled YAML and no imperative manifest factory anywhere. Read ADR-023 before adding deploy logic.
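To make "capability-bound Topology parameters" concrete, here is a deliberately simplified sketch. The `Topology`, `HelmCapable`, and `Score` traits below are hypothetical stand-ins, not harmony's real (async, richer) traits, and `K3dTopology`/`ProdClusterTopology` are made-up names. The point is that one Score value runs against any topology offering the capability it needs, and a topology without it fails at compile time.

```rust
// Hypothetical, simplified stand-ins for harmony's Score/Topology traits.

/// Every topology has a name; capabilities are separate traits.
pub trait Topology {
    fn name(&self) -> String;
}

/// Hypothetical capability: "this topology can install a Helm chart".
pub trait HelmCapable: Topology {
    fn install_chart(&self, release: &str, namespace: &str) -> Result<String, String>;
}

/// A Score says *what* to deploy; the topology decides *where* and *how*.
pub trait Score<T: Topology> {
    fn apply(&self, topology: &T) -> Result<String, String>;
}

/// Trimmed-down illustration of a NATS Score that only needs Helm.
pub struct FleetNatsScore {
    pub namespace: String,
}

// The bound is the capability, not a concrete cluster type.
impl<T: HelmCapable> Score<T> for FleetNatsScore {
    fn apply(&self, topology: &T) -> Result<String, String> {
        topology.install_chart("fleet-nats", &self.namespace)
    }
}

/// Two hypothetical topologies: the e2e k3d cluster and a production cluster.
pub struct K3dTopology;

pub struct ProdClusterTopology {
    pub context: String,
}

impl Topology for K3dTopology {
    fn name(&self) -> String {
        "k3d".to_string()
    }
}

impl Topology for ProdClusterTopology {
    fn name(&self) -> String {
        self.context.clone()
    }
}

impl HelmCapable for K3dTopology {
    fn install_chart(&self, release: &str, namespace: &str) -> Result<String, String> {
        Ok(format!("{}: installed {} in {}", self.name(), release, namespace))
    }
}

impl HelmCapable for ProdClusterTopology {
    fn install_chart(&self, release: &str, namespace: &str) -> Result<String, String> {
        Ok(format!("{}: installed {} in {}", self.name(), release, namespace))
    }
}
```

The same `FleetNatsScore` value can be applied to either topology; only the `Topology` instance changes.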
Quickstart — run the e2e ping test
The fastest path to a green fleet stack on your laptop. Requires podman, kubectl, and helm on $PATH; everything else (k3d, the NATS chart, all images) is fetched / built on demand.
```sh
HARMONY_FLEET_E2E=1 cargo test -p harmony-fleet-e2e --test ping -- --nocapture
```
What it does, in order:
- Ensures a `fleet-e2e` k3d cluster exists (creating one if not). NodePort `30423` on the host forwards to NATS inside the cluster.
- Builds `harmony-fleet-agent` in release mode, packages it into `localhost/harmony-fleet-agent:e2e`, and sideloads the image into the k3d cluster's containerd store.
- Mints a per-bring-up namespace `e2e-<uuid8>` and prunes any leftover `e2e-*` namespaces from prior runs. NodePort `30423` is cluster-scoped, so a namespace stuck in `Terminating` would block the new bring-up; the prune waits up to 90 s for full cleanup before proceeding.
- Deploys NATS via `FleetNatsScore` (Helm chart, JetStream on, static `admin`/`device` users, NodePort Service).
- Waits for NATS to be reachable from the host on `nats://localhost:30423` (`admin`/`e2e-admin`).
- Deploys one `FleetAgentScore { target: Pod }`. It runs with `runtime_enabled = false`, so it skips podman and only runs the command-server + heartbeat loop.
- Waits for the agent Deployment to be Ready.
- The test publishes `device-commands.<device_id>.ping` via `FleetCommandsClient::ping` and asserts that the agent replies with `{ device_id, agent_version, uptime_s }`.
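Several of the steps above (NATS reachable, agent Ready, the prune's 90 s wait) share one shape: poll a condition until it holds or a deadline passes. A minimal blocking sketch of that pattern; `wait_until` is a hypothetical helper, and the harness's real implementation (likely async) is not shown in this README.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `check` every `interval` until it returns true or `timeout` elapses.
/// Returns Ok(attempts) on success and Err(attempts) on timeout.
pub fn wait_until<F: FnMut() -> bool>(
    timeout: Duration,
    interval: Duration,
    mut check: F,
) -> Result<u32, u32> {
    let deadline = Instant::now() + timeout;
    let mut attempts = 0;
    loop {
        attempts += 1;
        if check() {
            return Ok(attempts);
        }
        // Check the deadline only after an attempt, so at least one check always runs.
        if Instant::now() >= deadline {
            return Err(attempts);
        }
        sleep(interval);
    }
}
```

For the prune step this would be called as `wait_until(Duration::from_secs(90), Duration::from_secs(2), || no_terminating_e2e_namespaces())`, where `no_terminating_e2e_namespaces` is a hypothetical check.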
Cold first run: ~80 s (release build of the agent dominates). Warm: ~25 s.
Useful env knobs
| Var | Effect |
|---|---|
| `HARMONY_FLEET_E2E=1` | Required. Without it the test is skipped, which keeps `cargo test --workspace` cheap on machines without k3d. |
| `FLEET_E2E_KEEP=1` | Skips namespace teardown on `Drop`, so you can run `kubectl -n e2e-<…> logs deploy/…` after a failure. The next run prunes the namespace. |
| `RUST_LOG=info` | Or `debug` for the per-message command-dispatch traces inside `harmony-fleet-agent::command_server`. |
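A sketch of how a test might honour the two harness knobs, assuming that only the literal value `1` counts as "on" (the real harness may accept other spellings). `E2eSettings`, `flag_set`, and `settings_from` are hypothetical names; the environment lookup is injected so the logic can be checked without touching the process environment.

```rust
use std::env;

/// True only for the literal "1", matching how the knobs are documented.
pub fn flag_set(value: Option<&str>) -> bool {
    value == Some("1")
}

#[derive(Debug, PartialEq)]
pub struct E2eSettings {
    // HARMONY_FLEET_E2E=1: run the stack-backed tests at all.
    pub enabled: bool,
    // FLEET_E2E_KEEP=1: leave the per-run namespace behind on Drop.
    pub keep_namespace: bool,
}

/// Build settings from any key lookup (real env, or a fake in tests).
pub fn settings_from(get: impl Fn(&str) -> Option<String>) -> E2eSettings {
    E2eSettings {
        enabled: flag_set(get("HARMONY_FLEET_E2E").as_deref()),
        keep_namespace: flag_set(get("FLEET_E2E_KEEP").as_deref()),
    }
}

/// Read the knobs from the real process environment.
pub fn settings_from_env() -> E2eSettings {
    settings_from(|key| env::var(key).ok())
}

// In a test body (sketch):
// if !settings_from_env().enabled { eprintln!("skipping: set HARMONY_FLEET_E2E=1"); return; }
```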
Connecting to NATS while the stack is up
```text
# Host-side, via the NodePort
nats://localhost:30423 # user=admin pass=e2e-admin (full access)
nats://localhost:30423 # user=device pass=e2e-device (device permissions)
# In-cluster, from any Pod in the same namespace
nats://fleet-nats.e2e-<uuid8>.svc.cluster.local:4222
```
`FLEET_E2E_KEEP=1` plus the harness's stdout line `[e2e] NATS: nats://127.0.0.1:30423 …` is the usual path: leave the harness running and point a NATS client at that URL.
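The two URL forms listed above differ only in how the client reaches the NATS Service. A tiny sketch of that choice; `Vantage` and `nats_url` are hypothetical names, while the ports and the `fleet-nats` Service name are the ones listed in this section.

```rust
/// Where the NATS client runs relative to the e2e cluster.
pub enum Vantage<'a> {
    /// On your machine, through the k3d NodePort.
    Host,
    /// In any Pod inside the given bring-up namespace.
    InCluster { namespace: &'a str },
}

const HOST_NODE_PORT: u16 = 30423;
const NATS_CLIENT_PORT: u16 = 4222;

pub fn nats_url(vantage: &Vantage<'_>) -> String {
    match vantage {
        Vantage::Host => format!("nats://localhost:{}", HOST_NODE_PORT),
        // Cluster DNS name of the Service: <service>.<namespace>.svc.cluster.local
        Vantage::InCluster { namespace } => format!(
            "nats://fleet-nats.{}.svc.cluster.local:{}",
            namespace, NATS_CLIENT_PORT
        ),
    }
}
```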
Inspecting the agent
```sh
# Find your namespace
kubectl get ns -l harmony.io/managed-by=fleet-e2e
# Tail the agent
kubectl -n e2e-<uuid8> logs deploy/fleet-agent-<device-id> -f
# Tail NATS (StatefulSet, not Deployment)
kubectl -n e2e-<uuid8> logs sts/fleet-nats -c nats -f
# Send a ping by hand (requires the nats CLI:
# https://github.com/nats-io/natscli/releases)
nats --server nats://localhost:30423 --user admin --password e2e-admin \
  request "device-commands.vm-device-00-<uuid8>.ping" ""
```
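Or, if you don't want to install the `nats` binary, run the CLI from the `nats-box` image instead:

```sh
alias natsbox='podman run --network=host --rm docker.io/natsio/nats-box:latest nats --server nats://localhost:30423 --user admin --password e2e-admin'
natsbox request "device-commands.vm-device-00-<uuid8>.ping" ""
```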
You should see something like {"device_id":"vm-device-00-<uuid8>","agent_version":"0.1.0","uptime_s":12}.
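The reply is a flat JSON object with three fields. A sketch of that wire shape, assuming a `PingReply` struct (hypothetical name) and a hand-rolled encoder so the example stays dependency-free; the real contract type presumably derives its serialization (e.g. via serde), and `vm-device-00-ab12cd34` is a made-up device id.

```rust
/// Wire shape of the ping reply: { device_id, agent_version, uptime_s }.
#[derive(Debug, Clone, PartialEq)]
pub struct PingReply {
    pub device_id: String,
    pub agent_version: String,
    pub uptime_s: u64,
}

/// Minimal JSON string encoder: quotes, backslashes and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl PingReply {
    /// Field order matches the example reply shown in this section.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"device_id\":{},\"agent_version\":{},\"uptime_s\":{}}}",
            json_string(&self.device_id),
            json_string(&self.agent_version),
            self.uptime_s
        )
    }
}
```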
Cleaning up
The shared `OnceCell` in `harmony-fleet-e2e` lives for the test binary's lifetime, so namespaces survive a `cargo test` exit (the static is never explicitly dropped). The next `cargo test` invocation prunes them. To force a manual cleanup:
```sh
# Delete every harness namespace:
kubectl delete ns -l harmony.io/managed-by=fleet-e2e
# Or wipe the whole cluster:
k3d cluster delete fleet-e2e
```
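The label selector above is also what makes the automatic prune safe. A sketch of the selection rule, assuming a namespace counts as a leftover when it carries `harmony.io/managed-by=fleet-e2e`, is named `e2e-*`, and is not the current run's namespace. The `Namespace` struct and `leftovers` function are hypothetical; the harness presumably queries the Kubernetes API instead.

```rust
/// A namespace as the prune step might see it: name plus labels.
pub struct Namespace<'a> {
    pub name: &'a str,
    pub labels: &'a [(&'a str, &'a str)],
}

const MANAGED_BY: (&str, &str) = ("harmony.io/managed-by", "fleet-e2e");

/// Names of namespaces from earlier runs that are safe to delete.
pub fn leftovers<'a>(all: &[Namespace<'a>], current: &str) -> Vec<&'a str> {
    all.iter()
        // Only touch namespaces the harness itself labelled.
        .filter(|ns| ns.labels.contains(&MANAGED_BY))
        // Belt and braces: name prefix check, and never delete our own namespace.
        .filter(|ns| ns.name.starts_with("e2e-") && ns.name != current)
        .map(|ns| ns.name)
        .collect()
}
```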
Production deploys
`harmony-fleet-deploy` puts the published operator chart on a real cluster (OKD, vanilla k8s, anywhere `K8sAnywhereTopology` can reach); this is the `harmony apply` / CD path. It loads `FleetDeploySecrets` from config (Env → OpenBao) and runs one `FleetOperatorScore`; auth is Zitadel-SSO-only. The full bring-up stack (`FleetNatsScore` + `FleetAgentScore` + …) is composed directly by the e2e harness over the same library Scores, not by this binary.
```sh
# Deploy a released tag (version parsed from it in Rust):
cargo run -p harmony-fleet-deploy -- \
  --filter FleetOperatorScore \
  --from-tag harmony-fleet-operator-v0.0.2 \
  --namespace fleet-system --yes
```
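The command passes only a tag, and the version is derived from it. One way that parse could look, assuming tags always have the form `<component>-v<major>.<minor>.<patch>`; `parse_release_tag` is a hypothetical name, and the real parser may treat pre-release suffixes differently.

```rust
/// Split `harmony-fleet-operator-v0.0.2` into ("harmony-fleet-operator", (0, 0, 2)).
pub fn parse_release_tag(tag: &str) -> Option<(&str, (u64, u64, u64))> {
    // rsplit_once: a component name may itself contain "-v" (e.g. "...-vm").
    let (component, version) = tag.rsplit_once("-v")?;
    if component.is_empty() {
        return None;
    }
    let mut parts = version.split('.').map(|p| p.parse::<u64>().ok());
    // Each `??` fails on a missing part or a non-numeric part.
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // more than three numeric parts
    }
    Some((component, (major, minor, patch)))
}
```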
See deployment-process.md for the clickable CD workflow and the in-cluster runner bootstrap.
Connecting to the operator
The operator runs as a single-replica Deployment in the namespace given by `--namespace` (default `fleet-system`).
```sh
# Tail logs
kubectl -n fleet-system logs deploy/harmony-fleet-operator -f
# Port-forward the embedded web dashboard (web-frontend feature)
kubectl -n fleet-system port-forward deploy/harmony-fleet-operator 18080:18080
# Or run the dashboard standalone with seeded fake data: no NATS, no cluster
cargo run -p harmony-fleet-operator --features web-frontend -- serve-web --mock
# browse http://127.0.0.1:18080
```
Existing manual rehearsal — examples/fleet_e2e_demo
examples/fleet_e2e_demo brings up a fuller stack than the e2e harness — real Zitadel, the auth-callout, libvirt VM agents over SSH — at the cost of a 5-min cold start. It's the manual rehearsal flow; not what you want during the dev loop. See the example's RUNBOOK.md.
The harness and the rehearsal will converge: the follow-up PR lifts FleetCalloutScore + a mock-OIDC fixture into harmony-fleet-deploy, at which point the harness can run the full production auth path in ~30 s instead of 5 min, and fleet_e2e_demo thins down to a caller over the same Scores.
What's next
This branch lands the deploy-architecture cleanup (ADR-023), the per-component Scores, and the ping path. Slated immediately after:
- Zitadel + auth callout in `harmony-fleet-deploy`. New `FleetCalloutScore` (preset over `NatsAuthCalloutScore`) plus an in-cluster mock-OIDC fixture, so the e2e harness can exercise the real auth-callout code path without paying Zitadel's 5-min cold-start cost. The harness's `AuthMode::Callout` variant is already on the public API for this.
- Operator pod in the e2e harness. `FleetOperatorScore` is already in the deploy crate; wiring it into the harness gives integration tests against the actual `Deployment`/`Device` reconcile loops.
- `Verb::Logs` and `Verb::Exec`: the next two verbs on the `device-commands.*` protocol. Same harness, same TDD shape as `ping`.
- CRD types out of `harmony` core. `harmony::modules::fleet::operator::crd` is the last fleet-deploy thing still living in `harmony`. The `ReconcileScore` payload coupling is the only blocker.
- Smoke-test contract. ADR-023 principle 4: every Score blocks on a smoke test before `deploy` returns success. Today the e2e suite plays that role; the trait/companion shape lands once it's been validated in practice.
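The smoke-test contract is explicitly not designed yet, so the following is only one possible shape, not the planned API: a hypothetical `SmokeTested` trait and a `deploy` wrapper that refuses to report success until the check passes. `FakeAgentScore` is a made-up implementor whose check is "does the agent answer ping?", which is what the e2e suite verifies today.

```rust
/// Hypothetical error type for a smoke-gated deploy.
#[derive(Debug, PartialEq)]
pub enum DeployError {
    /// Applying the Score itself failed.
    Apply(String),
    /// The Score applied, but the post-deploy check did not pass.
    Smoke(String),
}

/// Hypothetical companion trait: apply, then verify.
pub trait SmokeTested {
    fn apply(&self) -> Result<(), String>;
    fn smoke_test(&self) -> Result<(), String>;
}

/// Principle 4 as code: Ok only once both apply and smoke_test succeed.
pub fn deploy<S: SmokeTested>(score: &S) -> Result<(), DeployError> {
    score.apply().map_err(DeployError::Apply)?;
    score.smoke_test().map_err(DeployError::Smoke)
}

/// Example implementor standing in for an agent Score.
pub struct FakeAgentScore {
    pub answers_ping: bool,
}

impl SmokeTested for FakeAgentScore {
    fn apply(&self) -> Result<(), String> {
        Ok(())
    }

    fn smoke_test(&self) -> Result<(), String> {
        if self.answers_ping {
            Ok(())
        } else {
            Err("ping timed out".to_string())
        }
    }
}
```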
See PLAN_requests_over_nats.md for the full TDD-style plan this branch implements.