Jean-Gabriel Gill-Couture 020ebcb1f9 refactor(fleet): deploy-architecture cleanup per ADR-023 — Scores everywhere, deploy crate, principles in CLAUDE.md
The previous e2e harness handrolled k8s manifests in `stack.rs`,
bypassing the Score-Topology-Interpret machinery harmony exists to
provide. This commit:

1. **ADR-023** codifies the rules: deploy with Scores (not
   manifests), e2e uses the same Scores as production, one Score
   per component, deploy blocks on smoke-test success, deploy logic
   lives in `*-deploy` crates, topologies are compile-time,
   thiserror over anyhow. CLAUDE.md mirrors the principles.

2. **New `fleet/harmony-fleet-deploy` crate** is the canonical home
   for fleet-component Scores:
   - `FleetOperatorScore` + helm-chart generator + `install_crds`
     moved out of `harmony::modules::fleet::operator` (they should
     never have lived in `harmony` core). `FleetServerScore`
     (composite of NATS + operator + Zitadel + callout) moved too.
   - New `FleetNatsScore` (preset over `NatsHelmChartScore` with
     fleet's required values; v1 supports `UserPass` auth, callout
     mode reserved on the public API for PR 1.5).
   - New `FleetAgentScore` with `FleetAgentTarget::Pod`; `Vm`
     target is a future variant that absorbs `FleetDeviceSetupScore`.
   - `harmony-fleet-deploy` binary built on the existing
     `harmony_cli` crate — no new CLI scaffolding.

3. **Operator runtime binary trimmed**: `Install` and `Chart`
   subcommands removed; both jobs now belong to
   `harmony-fleet-deploy`. The runtime binary becomes leaner.

4. **E2E harness rewritten** as a thin Score composer:
   `harmony-fleet-e2e/src/stack.rs` deploys the stack via
   `FleetNatsScore` + `FleetAgentScore`. The inline NATS manifest
   factory and the bespoke agent Pod renderer are gone.
   - Bring-up runs once per test binary via `shared_stack` +
     `tokio::sync::OnceCell` (matches the `fleet_e2e_demo` pattern).
   - Stale `e2e-*` namespaces from prior runs get pruned at
     startup so the leaks the OnceCell creates don't compound.

5. **`thiserror` for the agent's `CommandServer`** — replaces the
   anyhow-based surface with typed `CommandError` /
   `CommandServerError`.

6. **Memory** captures eight load-bearing principles (saved to
   `~/.claude/projects/.../memory/`) so future sessions don't drift
   back into manifest-handrolling.

Verified: `cargo test -p harmony-fleet-e2e --test ping` green
end-to-end against k3d in 25s warm.
2026-05-18 22:54:50 -04:00

harmony-fleet-operator

IoT operator: reconciles Deployment custom resources (CRs) into NATS KV desired state and aggregates device and deployment state back into CR status.

Web frontend (optional)

A small server-side dashboard is built into the operator behind the web-frontend cargo feature. Stack: axum + maud (HTML-in-Rust) + vendored HTMX + Tailwind CSS. No WASM, no cargo-leptos, no JS build toolchain — cargo build --features web-frontend is the whole build.

Why this stack

Every interaction is an HTTP request that returns an HTML fragment, and HTMX swaps it into the DOM. There is no client-side state. The presentation layer is intentionally thin:

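use axum::extract::State;
use maud::Markup;

// AppState, AppError, page, and devices_view are crate-local items.
async fn devices_handler(State(s): State<AppState>) -> Result<Markup, AppError> {
    let devices = s.fleet.list_devices().await?;
    Ok(page("Devices", s.live_reload, devices_view::page(&devices)))
}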

Each handler is extract state → call domain service → render Maud markup. All real work — listing devices, blacklisting, etc. — lives in service::FleetService, a trait the dashboard, tests, and a future CLI all share. Presentation never reaches past that trait.
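
The layering described above can be sketched with the standard library alone. Only `FleetService` and `list_devices` come from this README; every other name below (`DeviceSummary`, `FleetError`, `InMemoryFleet`, `device_report`) is hypothetical, and the trait is shown synchronous for brevity, whereas the handler above awaits the real one.

```rust
use std::fmt;

/// Hypothetical summary type; the real one lives in `service/mod.rs`.
#[derive(Debug, Clone, PartialEq)]
pub struct DeviceSummary {
    pub id: String,
    pub online: bool,
}

/// Hypothetical error type standing in for the service's real error.
#[derive(Debug, PartialEq)]
pub enum FleetError {
    Unavailable(String),
}

impl fmt::Display for FleetError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FleetError::Unavailable(why) => write!(f, "fleet backend unavailable: {why}"),
        }
    }
}

impl std::error::Error for FleetError {}

/// Synchronous stand-in for the domain trait (assumption: the real one is async).
pub trait FleetService {
    fn list_devices(&self) -> Result<Vec<DeviceSummary>, FleetError>;
}

/// In-memory implementation, in the spirit of `MockFleetService`.
pub struct InMemoryFleet {
    pub devices: Vec<DeviceSummary>,
}

impl FleetService for InMemoryFleet {
    fn list_devices(&self) -> Result<Vec<DeviceSummary>, FleetError> {
        Ok(self.devices.clone())
    }
}

/// A consumer that only sees the trait: call the service, format the result.
/// A dashboard handler, a test, or a CLI command could all take this shape.
pub fn device_report(fleet: &dyn FleetService) -> Result<String, FleetError> {
    let devices = fleet.list_devices()?;
    let online = devices.iter().filter(|d| d.online).count();
    Ok(format!("{online}/{} devices online", devices.len()))
}
```

Because `device_report` takes `&dyn FleetService`, swapping the in-memory implementation for one backed by kube and NATS would not change the caller.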

Why Maud instead of Leptos? The dashboard is pure SSR + HTMX and uses none of Leptos's reactivity, so its runtime and macro footprint was dead weight. Maud is a compile-time HTML macro that produces a Markup value: smaller dep tree, faster compiles, same Rust-flavored ergonomics.

Why HTMX + xterm.js for interactivity? A real terminal needs xterm.js in the browser regardless; once that JS exists, HTMX (~14 KB) is a rounding error and lets every other interaction stay declarative in markup (hx-post, hx-target, hx-swap).
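
To make "declarative in markup" concrete, here is a minimal sketch of a row fragment carrying the three HTMX attributes named above. It builds the HTML with plain string formatting rather than Maud so that it stands alone; the endpoint path `/devices/{id}/blacklist`, the `device-{id}` element id, and the function names are assumptions, not the operator's actual routes or markup.

```rust
/// Escape text for use in HTML element content and double-quoted attributes.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            other => out.push(other),
        }
    }
    out
}

/// Render one device row. The button posts to a hypothetical endpoint and
/// asks HTMX to replace the whole row with the fragment the server returns.
/// Assumes ids are simple tokens that are safe in a URL path and a CSS selector.
pub fn device_row(id: &str, status: &str) -> String {
    let id = escape_html(id);
    let status = escape_html(status);
    format!(
        concat!(
            "<tr id=\"device-{id}\">",
            "<td>{id}</td><td>{status}</td>",
            "<td><button hx-post=\"/devices/{id}/blacklist\" ",
            "hx-target=\"#device-{id}\" hx-swap=\"outerHTML\">Blacklist</button></td>",
            "</tr>"
        ),
        id = id,
        status = status,
    )
}
```

With `hx-swap="outerHTML"`, the server's response to the POST only needs to be the same row rendered with its new status; no client-side code tracks the change.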

Why everything bundled? The operator already ships as a single container. Tailwind CSS, HTMX, and the HTMX SSE extension are all embedded via include_bytes! so air-gapped clusters get the dashboard with nothing extra to mount. The only build-time external is the standalone tailwindcss v4 CLI. A missing CLI degrades gracefully (a build warning plus an empty embedded stylesheet), and the dev workflow uses --css-from instead anyway.
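
The graceful degradation for a missing CLI could look roughly like the following build-script helper. This is a sketch under assumptions, not the crate's actual build.rs: `CssBuild` and `build_css` are hypothetical names, and the only Tailwind flags used are the `-i`, `-o`, and `--minify` already shown in this README.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Outcome of the CSS build step.
#[derive(Debug, PartialEq)]
pub enum CssBuild {
    /// The CLI ran and produced the stylesheet.
    Built,
    /// The CLI was missing or failed; an empty stylesheet was written instead.
    EmptyFallback,
}

/// Try to run the Tailwind CLI; on failure, warn and leave an empty file so
/// that a later `include_bytes!` on `output` still compiles.
pub fn build_css(cli: &str, input: &Path, output: &Path) -> io::Result<CssBuild> {
    if let Some(dir) = output.parent() {
        fs::create_dir_all(dir)?;
    }
    let status = Command::new(cli)
        .arg("-i")
        .arg(input)
        .arg("-o")
        .arg(output)
        .arg("--minify")
        .status();
    match status {
        Ok(s) if s.success() => Ok(CssBuild::Built),
        other => {
            // Cargo surfaces `cargo:warning=` lines printed by a build script.
            println!("cargo:warning={cli} unavailable or failed ({other:?}); embedding empty CSS");
            fs::write(output, b"")?;
            Ok(CssBuild::EmptyFallback)
        }
    }
}
```

A build script's `main` would call `build_css("tailwindcss", ...)` with the crate's input and output paths. Writing the empty file is what keeps the build green on machines without the CLI.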

Running it locally (mock data, no NATS, no kube)

# One-time: install the standalone Tailwind v4 CLI (single static binary).
curl -L -o ~/.local/bin/tailwindcss \
  https://github.com/tailwindlabs/tailwindcss/releases/latest/download/tailwindcss-linux-x64
chmod +x ~/.local/bin/tailwindcss

Two terminals for the dev loop:

# Terminal 1 — Tailwind sidecar, regenerates CSS on every class change.
tailwindcss \
  -i fleet/harmony-fleet-operator/style/input.css \
  -o fleet/harmony-fleet-operator/style/dist/tailwind.css \
  --watch

# Terminal 2 — the operator, serving the dashboard against fake data and
# reading CSS from Tailwind's output. `--live-reload` reloads the browser
# tab whenever you restart the server.
cargo run -p harmony-fleet-operator --features web-frontend -- serve-web \
  --mock \
  --css-from fleet/harmony-fleet-operator/style/dist/tailwind.css \
  --live-reload

Open http://localhost:18080.

--mock uses MockFleetService, an in-memory seeded dataset (10 fake devices in mixed states, 4 deployments). You can click "Blacklist" on a row and the row will swap in place to reflect the new status — this exercises the same FleetService API the real impl will satisfy. No NATS, no Kubernetes cluster needed.
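
For illustration, a seeded in-memory dataset with an in-place blacklist operation might be shaped like this. The real `MockFleetService` is not shown in this README, so the type names, the id format, and the state cycle below are all assumptions.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DeviceStatus {
    Online,
    Offline,
    Blacklisted,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Device {
    pub id: String,
    pub status: DeviceStatus,
}

/// Hypothetical mock: a fixed dataset behind a mutex so handlers holding a
/// shared reference can still mutate it.
pub struct SeededFleet {
    devices: Mutex<Vec<Device>>,
}

impl SeededFleet {
    /// Build `count` devices with deterministic ids, cycling through a fixed
    /// pattern of states so every run shows the same mixed fleet.
    pub fn seeded(count: usize) -> Self {
        let cycle = [
            DeviceStatus::Online,
            DeviceStatus::Offline,
            DeviceStatus::Online,
            DeviceStatus::Blacklisted,
        ];
        let devices = (0..count)
            .map(|i| Device {
                id: format!("device-{i:02}"),
                status: cycle[i % cycle.len()],
            })
            .collect();
        Self { devices: Mutex::new(devices) }
    }

    pub fn list(&self) -> Vec<Device> {
        self.devices.lock().expect("mutex poisoned").clone()
    }

    /// Mark a device blacklisted and return the updated record, which is
    /// what a handler would re-render as the replacement row.
    pub fn blacklist(&self, id: &str) -> Option<Device> {
        let mut devices = self.devices.lock().expect("mutex poisoned");
        let device = devices.iter_mut().find(|d| d.id == id)?;
        device.status = DeviceStatus::Blacklisted;
        Some(device.clone())
    }
}
```

Returning the updated record from `blacklist` keeps the handler to one service call followed by one render.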

Iteration cost

| Change | Reload step |
| --- | --- |
| Tailwind class in a Maud template | edit → save → refresh tab (Tailwind sidecar already rebuilt CSS; no Rust compile) |
| Maud template structure / handler logic | edit → cargo run restarts → --live-reload auto-refreshes |
| FleetService types | edit → cargo run restarts → tab auto-refreshes |

The Rust recompile is the actual floor. Tailwind changes never trigger one.

Production builds

# Once, before cargo build: produce the embedded CSS.
tailwindcss \
  -i fleet/harmony-fleet-operator/style/input.css \
  -o fleet/harmony-fleet-operator/style/dist/tailwind.css \
  --minify

cargo build -p harmony-fleet-operator --features web-frontend --release

The release binary serves the embedded CSS unless you pass --css-from at runtime. (build.rs will also run tailwindcss if it's on PATH; the manual step above is just a guarantee that the embedded copy is correct.)
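
The "embedded unless overridden" rule can be sketched as a single lookup. The constant below is a stand-in so the example compiles on its own; in a real build it would presumably be an `include_bytes!` of the generated stylesheet. The `stylesheet` function name is hypothetical.

```rust
use std::borrow::Cow;
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for the embedded stylesheet (assumption: the real binary fills
/// this from `include_bytes!` at compile time).
const EMBEDDED_CSS: &[u8] = b"/* embedded */";

/// Return the CSS to serve: the file named by `--css-from` if one was given,
/// otherwise the bytes compiled into the binary.
pub fn stylesheet(css_from: Option<&Path>) -> io::Result<Cow<'static, [u8]>> {
    match css_from {
        // Read from disk on every call so a watcher's rebuilds show up on refresh.
        Some(path) => Ok(Cow::Owned(fs::read(path)?)),
        // No allocation on the production path: borrow the static bytes.
        None => Ok(Cow::Borrowed(EMBEDDED_CSS)),
    }
}
```

Reading the override on each request is what would let the Tailwind sidecar's output appear without restarting the server, while the embedded path never touches the filesystem.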

Layout

fleet/harmony-fleet-operator/
├── src/
│   ├── service/             ← domain abstraction (FleetService trait + Mock)
│   │   ├── mod.rs           ← trait + summary types
│   │   └── mock.rs          ← in-memory seeded data
│   └── frontend/            ← presentation layer (cfg web-frontend)
│       ├── server.rs        ← axum router + handlers
│       ├── layout.rs        ← page shell (Maud)
│       ├── assets.rs        ← embedded Tailwind/HTMX bytes
│       └── views/
│           ├── dashboard.rs
│           ├── devices.rs   ← also exposes `row()` for HTMX swaps
│           └── deployments.rs
├── style/
│   └── input.css            ← Tailwind v4 entry point
└── vendor/
    ├── htmx.min.js          ← HTMX v2.0.9
    └── htmx-ext-sse.js      ← SSE extension (used by future log-tail views)

What's deferred

  • Real FleetService impl (wraps the kube client + NATS KV the reconcilers already use). serve-web without --mock currently errors out.
  • Zitadel SSO + admin-role check. v1 assumes an oauth2-proxy fronts the dashboard at the cluster edge.
  • Live log tail (SSE-based, HTMX sse-swap) — the wiring is in place.
  • Interactive shell (xterm.js + axum WS + portable-pty) — separate design.