harmony/ROADMAP/fleet_platform/v0_1_plan.md
Jean-Gabriel Gill-Couture fdcc7040dd docs(fleet): chapter 6 — VM-based customer demo rehearsal plan
Adds ROADMAP/fleet_platform/v0_demo_e2e.md and threads it from
v0_1_plan.md. The VM rehearsal extends smoke-a4 (already-green k3d
+ libvirt VM + agent + apply CR + reconcile loop) with Zitadel +
auth callout + agent JWT auth. Two devices + one admin, real
cargo tests sharing a OnceCell-bringup.

Plan calls out:
- The 7 tests, including the load-bearing
  `agent_recovers_from_nats_pod_restart` (asserts the auto-reconnect
  + auth-callback re-mint path under realistic disturbance).
- Five known risks / debugging traps to expect on first cold-start
  (iam-admin-pat secret timing, /etc/hosts injection, k3d port
  collisions, etc.).
- Success criteria for the rehearsal day: cold cargo run greens in
  <20 min, all 7 tests green on a clean machine, the NATS-restart
  test reliably greens 5 runs in a row.
- Anything below the success criteria → reframe the customer call
  to "architecture walkthrough + local k3d demo + pilot in 1-2
  weeks." Avoids burning the relationship to keep a deadline.

Once VM rehearsal is green the residual OKD deltas are configuration
(Route annotations, image registry, real DNS, cert) — no new code.
2026-05-03 16:59:43 -04:00


IoT Platform v0.1 and beyond — forward plan

Authoritative forward plan for the NationTech decentralized-infra / IoT platform, written after the v0 walking skeleton shipped (see v0_walking_skeleton.md for the historical diary). Organized as six chapters: Chapters 1-5 in execution order, plus Chapter 6 (customer demo rehearsal).

State of the world (as of 2026-04-23)

Green, end-to-end:

  • CRD → operator → NATS JetStream KV write path (smoke-a1.sh).
  • Agent watches KV, reconciles podman containers (smoke-a1.sh).
  • VM-as-device provisioning: cloud-init + fleet-agent install + NATS smoke (smoke-a3.sh), x86_64 (native KVM) and aarch64 (TCG).
  • Power-cycle / reboot resilience (smoke-a3.sh phase 5).
  • aarch64 cross-compile of the agent (no Harmony modules need to feature-gate aarch64).
  • Operator installed via a harmony Score (typed Rust, no yaml).
  • harmony-reconciler-contracts crate — cross-boundary types (bucket names, key helpers, DeviceInfo, DeploymentState, HeartbeatPayload, DeploymentName, Id re-export).

Chapter 1 shipped (2026-04-21): composed end-to-end demo (smoke-a4.sh) — operator in k3d + in-cluster NATS + ARM VM + typed-Rust CR applier + hand-off menu + --auto regression. Green on x86_64 (native KVM) and aarch64 (TCG).

Chapter 2 shipped (2026-04-23): selector-based targeting + Device CRD + .status.aggregate reflect-back. Deployment.spec.targetSelector: LabelSelector resolves against cluster-scoped Device CRs materialized from NATS device-info. Operator writes desired-state KV per matched pair, patches .status.aggregate (matchedDeviceCount / succeeded / failed / pending / lastError) at 1 Hz. Load-tested to 10 000 devices × 1 000 Deployments at 10 000 KV writes/s sustained, zero errors.

Not yet wired (real v0.1 work still to go):

  • Helm packaging of the operator (Chapter 3).
  • Zitadel + OpenBao auth (per-device credentials, SSO for operator users). Placeholder CredentialSource trait on the agent side (Chapter 4).
  • Any frontend (Chapter 5).
  • Small quality items (not blockers): agent config-driven labels, matchExpressions in selectors, Device.status.conditions populated from heartbeat staleness.

Verified during planning (so future implementation doesn't have to re-litigate):

  • Upgrade already works. reconciler.rs::apply byte-compares serialized score payloads; drift triggers re-reconcile. PodmanTopology::ensure_service_running removes then re-creates containers on spec drift. No "stale + new" window.
  • The polymorphism stays. ReconcileScore is an externally-tagged enum; adding OkdApplyV0 later is additive.
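
The two verified behaviours above can be illustrated with a minimal sketch. It is not the code in reconciler.rs: the variant fields, the hand-rolled encoder (the real crate presumably uses a serializer), and the `Reconciler` / `Outcome` names are assumptions made for illustration. Only the enum name, the externally-tagged shape, and the byte-compare idea come from the notes above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ReconcileScore`; the fields are assumptions.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum ReconcileScore {
    PodmanV0 { image: String, port: u16 },
    /// Adding a variant later is additive: existing payloads keep their tag.
    OkdApplyV0 { manifest: String },
}

impl ReconcileScore {
    /// Hand-rolled externally-tagged encoding, `{"<Variant>": {...}}`.
    /// No string escaping: illustration only.
    fn to_bytes(&self) -> Vec<u8> {
        let json = match self {
            ReconcileScore::PodmanV0 { image, port } => {
                format!(r#"{{"PodmanV0":{{"image":"{image}","port":{port}}}}}"#)
            }
            ReconcileScore::OkdApplyV0 { manifest } => {
                format!(r#"{{"OkdApplyV0":{{"manifest":"{manifest}"}}}}"#)
            }
        };
        json.into_bytes()
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Unchanged,
    Reconciled,
}

/// Remembers the last serialized payload applied per key.
#[derive(Default)]
struct Reconciler {
    last_applied: HashMap<String, Vec<u8>>,
}

impl Reconciler {
    /// Byte-compares the new payload with the last one; any drift reconciles.
    fn apply(&mut self, key: &str, score: &ReconcileScore) -> Outcome {
        let bytes = score.to_bytes();
        if self.last_applied.get(key) == Some(&bytes) {
            return Outcome::Unchanged;
        }
        // A real agent would remove and re-create the container here.
        self.last_applied.insert(key.to_string(), bytes);
        Outcome::Reconciled
    }
}
```

Applying the same score twice yields `Reconciled` then `Unchanged`; changing only the image tag yields `Reconciled` again, which is the upgrade path.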

Surprises since v0 started (for context, none architectural):

  • Arch edk2-aarch64-202602-2 shipped empty firmware blobs; 202508-1 ships unpadded edk2 that needs 64 MiB pflash padding. Fixed via runtime discovery + padding in modules/kvm/firmware.rs.
  • MTTCG isn't default for cross-arch TCG on QEMU 10.2; force via qemu:commandline override. pauth-impdef=on likewise a qemu:commandline opt-in.
  • ensure_vm is idempotent on "domain exists" — re-apply of a changed XML requires manual undefine --nvram --remove-all-storage. Noted as a follow-up in the code comments.
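
For the firmware item above, a minimal sketch of the padding step, assuming the blob is already in memory. The function name and error strings are hypothetical and this is not the code in modules/kvm/firmware.rs; the 64 MiB target and the empty-blob failure mode come from the note above.

```rust
/// Target pflash size from the note above: 64 MiB.
const PFLASH_SIZE: usize = 64 * 1024 * 1024;

/// Rejects empty or oversized blobs and zero-pads the rest to `PFLASH_SIZE`.
fn pad_firmware(mut blob: Vec<u8>) -> Result<Vec<u8>, String> {
    if blob.is_empty() {
        return Err("firmware blob is empty".to_string());
    }
    if blob.len() > PFLASH_SIZE {
        return Err(format!("firmware blob is {} bytes, larger than pflash", blob.len()));
    }
    // Existing bytes are kept; the tail is filled with zeros.
    blob.resize(PFLASH_SIZE, 0);
    Ok(blob)
}
```

An already-padded blob passes through unchanged, so the step is safe to run on every discovery.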

Chapter 1 — Hands-on end-to-end demo [SHIPPED 2026-04-21]

Goal: the user runs one command, watches operator + NATS + ARM VM come up, then drives a CRD through the full loop by hand: kubectl apply it (manually or via a typed Rust applier), watch the operator log "acquired," check the NATS KV store with natsbox, SSH/console into the VM, curl the running nginx container from the workstation.

User-facing requirements (explicit)

  • No yaml fixtures. Sample Deployment CRs constructed in typed Rust using DeploymentSpec + PodmanV0Score. Same discipline as the install Score that replaced gen-crd | kubectl apply.
  • ArgoCD deferred. User's production clusters have it; bringing it into the smoke harness adds setup overhead without validating anything helm install doesn't. Chapter 3 produces the chart; ArgoCD integration is a later operational concern.
  • Operator logs every CR it acquires. controller.rs already does tracing::info!(%ns, %name, "reconcile"); verify the output reads well in the command-menu hand-off.
  • natsbox debugging is first-class. Script prints exact natsbox one-liners at hand-off so the user can inspect KV state.
  • In-cluster NATS. Not a side-by-side podman container (as smoke-a1 does today). Expose to the libvirt VM via k3d loadbalancer port mapping.

Design decisions

  • Rust CR applier. New binary examples/harmony_apply_deployment/. CLI flags --name --namespace --target-device --image --port --delete. Constructs the Deployment CR via kube::Api<Deployment> + typed DeploymentSpec; calls api.apply(...). Can also --print the CR JSON to stdout so kubectl apply -f - still works from the terminal.
  • smoke-a4.sh orchestration stays bash for now. User agreed this is test-harness scope, not framework path; converting it to Rust is "not as important right now."
  • Hand-off is the default mode, not --keep. The whole point of Chapter 1 is that the user drives the last stage interactively. smoke-a4.sh brings everything up, applies nothing, prints the command menu, waits on INT/TERM to tear down. --auto runs the full apply/curl/upgrade/delete regression for CI.
  • In-cluster NATS path. Preferred: use harmony::modules::nats if it has a lightweight single-node / no-supercluster mode. Fallback: typed K8sResourceScore applying a minimal Deployment + NodePort Service. 15-min research task before committing.
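
The applier's flag surface can be sketched with the standard library alone. This is an illustration, not the actual examples/harmony_apply_deployment code: the `ApplyArgs` struct, the defaults (`demo`, `default`, port 8080), and the error strings are assumptions, and the real binary may well use an argument-parsing crate. Only the flag names come from the design decision above.

```rust
/// Hypothetical parsed form of the applier's CLI flags.
#[derive(Debug, Default, PartialEq)]
struct ApplyArgs {
    name: String,
    namespace: String,
    target_device: String,
    image: String,
    port: u16,
    delete: bool,
    print: bool,
}

fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<ApplyArgs, String> {
    // Defaults are assumptions, not taken from the plan.
    let mut out = ApplyArgs {
        name: "demo".to_string(),
        namespace: "default".to_string(),
        port: 8080,
        ..Default::default()
    };
    let mut it = args.into_iter();
    while let Some(flag) = it.next() {
        match flag.as_str() {
            // Boolean switches take no value.
            "--delete" => out.delete = true,
            "--print" => out.print = true,
            "--name" | "--namespace" | "--target-device" | "--image" | "--port" => {
                let value = it.next().ok_or_else(|| format!("{flag} needs a value"))?;
                match flag.as_str() {
                    "--name" => out.name = value,
                    "--namespace" => out.namespace = value,
                    "--target-device" => out.target_device = value,
                    "--image" => out.image = value,
                    _ => {
                        out.port = value
                            .parse::<u16>()
                            .map_err(|e| format!("bad --port: {e}"))?
                    }
                }
            }
            other => return Err(format!("unknown flag {other}")),
        }
    }
    if out.target_device.is_empty() {
        return Err("--target-device is required".to_string());
    }
    Ok(out)
}
```

With `--print` set, the caller would serialize the CR to stdout instead of calling the API, which keeps the `kubectl apply -f -` path available.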

Composed smoke phases (smoke-a4.sh)

  1. k3d cluster up with -p "4222:4222@loadbalancer" so the host port 4222 forwards into the cluster. Reachable from the libvirt VM via the gateway IP (typically 192.168.122.1:4222).
  2. NATS in-cluster via the chosen path (harmony module or direct K8sResourceScore). Wait for readiness.
  3. Install CRD via the operator's install subcommand (typed Rust).
  4. Spawn operator as a host-side process (same pattern as smoke-a1). Operator connects to nats://localhost:4222.
  5. Provision ARM VM via example_iot_vm_setup (same entry point smoke-a3 uses). Agent configured to connect to nats://<libvirt_gateway>:4222 — discover the gateway IP via virsh net-dumpxml default, as smoke-a3 already does.
  6. Sanity: kubectl wait ... crd Established, operator logged "KV bucket ready", agent logged "watching KV keys", status.<device> present in agent-status bucket.
  7. Hand off. Print the command menu below. Exit 0 with a cleanup trap on INT/TERM.
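
Phase 5 relies on discovering the libvirt gateway IP from the output of virsh net-dumpxml default. The smoke scripts do this in bash; the sketch below shows the same idea in Rust on an already-captured XML string. The function names are hypothetical, and the parser is deliberately naive: it assumes the first `<ip ...>` element carries the IPv4 gateway in its `address` attribute.

```rust
use std::net::Ipv4Addr;

/// Extracts the `address` attribute of the first `<ip ...>` element.
/// Naive string scan, not a real XML parser.
fn gateway_from_net_xml(xml: &str) -> Option<Ipv4Addr> {
    let start = xml.find("<ip ")?;
    let tag_end = start + xml[start..].find('>')?;
    let tag = &xml[start..tag_end];
    let attr = tag.find("address=")? + "address=".len();
    // libvirt emits single quotes, but accept double quotes too.
    let quote = tag[attr..].chars().next()?;
    if quote != '\'' && quote != '"' {
        return None;
    }
    let rest = &tag[attr + 1..];
    let end = rest.find(quote)?;
    rest[..end].parse().ok()
}

/// NATS URL the agent inside the VM would be configured with.
fn nats_url(gateway: Ipv4Addr) -> String {
    format!("nats://{gateway}:4222")
}
```

For a default network with `<ip address='192.168.122.1' ...>`, this produces nats://192.168.122.1:4222, matching the typical gateway mentioned in phase 1.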

Command menu at hand-off

  • kubectl get deployments.fleet.nationtech.io -A -w — watch CR reconcile reactively.
  • cargo run -q -p example_harmony_apply_deployment -- --image nginx:latest --target-device $TARGET_DEVICE — apply an nginx deployment via typed Rust.
  • cargo run -q -p example_harmony_apply_deployment -- --print --image nginx:latest --target-device $TARGET_DEVICE | kubectl apply -f - — same thing, through kubectl.
  • ssh -i $SSH_KEY fleet-admin@$VM_IP — connect to the VM.
  • virsh console $VM_NAME --force — serial console alternative.
  • podman --url unix://$VM_IP:... ps or ssh + podman ps — list containers on the VM from the workstation.
  • podman run --rm docker.io/natsio/nats-box nats --server nats://localhost:4222 kv ls desired-state — list desired state keys (from the host).
  • podman run --rm ... nats kv get desired-state '<device>.<deployment>' --raw — dump a specific desired state.
  • podman run --rm ... nats kv get agent-status 'status.<device>' --raw — dump the heartbeat.
  • curl http://$VM_IP:8080/ — hit the deployed nginx.
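
The key shapes used in the natsbox commands above can be captured in a couple of small helpers. harmony-reconciler-contracts already ships key helpers; the functions below are a hypothetical sketch and not that crate's API. The allowed character set is a conservative assumption, chosen so that a segment can never contain the `.` separator.

```rust
/// Accepts only ASCII alphanumerics, `-` and `_` (a conservative assumption).
fn check_segment(s: &str) -> Result<&str, String> {
    let ok = !s.is_empty()
        && s.chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
    if ok {
        Ok(s)
    } else {
        Err(format!("invalid key segment: {s:?}"))
    }
}

/// Key in the desired-state bucket: `<device>.<deployment>`.
fn desired_state_key(device: &str, deployment: &str) -> Result<String, String> {
    Ok(format!(
        "{}.{}",
        check_segment(device)?,
        check_segment(deployment)?
    ))
}

/// Key in the agent-status bucket: `status.<device>`.
fn status_key(device: &str) -> Result<String, String> {
    Ok(format!("status.{}", check_segment(device)?))
}

/// Splits a desired-state key back into `(device, deployment)`.
fn split_desired_state_key(key: &str) -> Option<(&str, &str)> {
    key.split_once('.')
}
```

Rejecting dots inside a segment keeps the split unambiguous: a key always has exactly one separator between device and deployment.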

--auto path (for regression)

  1. Apply nginx:latest, wait for container on VM, curl 200.
  2. Apply nginx:1.26 (upgrade), wait for container id to change, curl 200 against the new container.
  3. Apply --delete, wait for container gone from VM.
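
Each step in the --auto path is a "poll until a condition holds or time out" check. smoke-a4.sh does this in bash; should the regression ever move to Rust tests, a helper along these lines would cover all three steps. The names `wait_for` and `container_id_changed` are hypothetical.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `probe` until it returns `Some`, or fails once `timeout` has elapsed.
fn wait_for<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Result<T, String> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(value) = probe() {
            return Ok(value);
        }
        if Instant::now() >= deadline {
            return Err(format!("timed out after {timeout:?}"));
        }
        sleep(interval);
    }
}

/// Upgrade assertion: succeeds only once a container exists with a new id.
fn container_id_changed(old_id: &str, current: Option<&str>) -> Option<String> {
    match current {
        Some(id) if id != old_id => Some(id.to_string()),
        _ => None,
    }
}
```

The probe closure would wrap whatever lists containers on the VM (for example ssh + podman ps); returning the new id lets the caller run the curl check against the right container.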

Files

  • NEW examples/harmony_apply_deployment/Cargo.toml + src/main.rs — typed applier.
  • NEW fleet/scripts/smoke-a4.sh.
  • NO yaml fixtures. Rust CLI flags cover the shape.
  • Optional: factor shared smoke phases (NATS up, k3d up, operator spawn, VM provision) into fleet/scripts/lib/ if the duplication across a1/a3/a4 becomes obvious. Don't force it.

NATS exposure — implementation-time notes

  • k3d @loadbalancer port mapping binds the host's 0.0.0.0:4222 by default; libvirt VMs on virbr0 can reach it via the gateway IP. No special NAT config required.
  • Fallback if environmental snag: keep the side-by-side podman container on an opt-in NATS_MODE=podman flag. Don't default to that — user explicitly asked for in-cluster.

Verification

  • Fresh host: ARCH=aarch64 ./fleet/scripts/smoke-a4.sh completes in 8-15 min, prints the command menu.
  • ARCH=aarch64 ./fleet/scripts/smoke-a4.sh --auto PASSes end-to-end including upgrade id-change assertion.
  • x86_64 (ARCH=x86-64) completes in 2-5 min.

Explicitly out of scope

  • AgentStatus / DeploymentStatus enrichment — Chapter 2.
  • Helm chart, ArgoCD, auth, frontend — later chapters.
  • Lifting the applier into a reusable ApplyDeploymentScore — only if a second consumer appears.

Chapter 2 — Status reflect-back + selector-based targeting [SHIPPED 2026-04-23]

Goal: CRD .status reflects fleet reality — per-deployment success/failure/pending counts, last-error surface, freshness. The Deployment CR targets devices by label selector, not by id list.

The shipped design replaces the original AgentStatus + list-of-ids proposal wholesale. See chapter_4_aggregation_scale.md for the superseded design-doc archaeology. Commits: refactor(iot): delete legacy AgentStatus path, refactor(iot): operator watches device-state KV directly; drop event stream, refactor(iot): Deployment.targetSelector + Device CRD (DaemonSet-like).

What shipped

Wire format (in harmony-reconciler-contracts): four per-concern payloads on dedicated NATS KV buckets. No monolithic per-device blob, no separate event stream.

| Type | Bucket | Cadence |
|---|---|---|
| DeviceInfo | device-info | on startup + label/inventory change |
| DeploymentState | device-state | on reconcile phase transition |
| HeartbeatPayload | device-heartbeat | every 30 s |
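
One way to keep each payload tied to its bucket is an associated constant on a shared trait. This is a sketch of the idea, not the definitions in harmony-reconciler-contracts: the `KvPayload` trait, the struct fields, and the `Phase` variants are assumptions. Only the type names and bucket names come from the table above.

```rust
use std::collections::BTreeMap;

/// Hypothetical trait binding a payload type to its NATS KV bucket.
trait KvPayload {
    const BUCKET: &'static str;
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct DeviceInfo {
    device_id: String,
    labels: BTreeMap<String, String>,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Pending,
    Succeeded,
    Failed,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct DeploymentState {
    device_id: String,
    deployment: String,
    phase: Phase,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct HeartbeatPayload {
    device_id: String,
    sent_at_unix_s: u64,
}

impl KvPayload for DeviceInfo {
    const BUCKET: &'static str = "device-info";
}
impl KvPayload for DeploymentState {
    const BUCKET: &'static str = "device-state";
}
impl KvPayload for HeartbeatPayload {
    const BUCKET: &'static str = "device-heartbeat";
}

/// Resolves the bucket from the payload's type, so callers cannot mix them up.
fn bucket_for<T: KvPayload>(_payload: &T) -> &'static str {
    T::BUCKET
}
```

A publish helper generic over `KvPayload` would then pick the bucket at compile time rather than from a string passed by the caller.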

CRDs. Two cluster resources:

  • Deployment (namespaced) — spec.targetSelector: LabelSelector (standard K8s matchLabels / matchExpressions). No device list on spec. .status.aggregate carries matchedDeviceCount, succeeded, failed, pending, lastError.
  • Device (cluster-scoped, like Node) — metadata.labels carries the device's routing labels; spec.inventory holds the hardware/OS snapshot; status.conditions is reserved for liveness (populated lazily by a future heartbeat-freshness reconciler, not every ping).

Operator tasks (three concurrent loops in one process):

  1. controller — validates Deployment CR names, holds the finalizer that cleans desired-state.<device>.<deployment> KV entries on delete. No writes on apply (aggregator handles that).
  2. device_reconciler — watches the device-info KV; server-side-applies a Device CR per DeviceInfo payload, with label sanitization. Agents remain kube-unaware.
  3. fleet_aggregator — three caches driven by watches (Deployment CRs, Device CRs, device-state KV). On any change, resolves each selector against the Device cache, writes/deletes desired-state KV entries for diffed matches, and patches .status.aggregate at 1 Hz for the CRs whose counters moved.

Agents publish device-id=<id> as a default DeviceInfo label, so targeting a single device with matchLabels: {device-id: pi-42} is zero-config. User-defined labels layer on from agent config (scoped out of this chapter; follow-up item).
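
The selector resolution and diffing described for fleet_aggregator, together with the default device-id label, can be sketched as follows. This covers matchLabels only, and the function names and in-memory shapes are assumptions rather than the operator's actual types. An empty selector matches every device here, following the usual Kubernetes LabelSelector semantics.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Labels = BTreeMap<String, String>;

/// Default labels an agent publishes: just `device-id=<id>`.
fn default_labels(device_id: &str) -> Labels {
    [("device-id".to_string(), device_id.to_string())].into()
}

/// matchLabels semantics: every selector pair must be present on the device.
fn selector_matches(match_labels: &Labels, device: &Labels) -> bool {
    match_labels.iter().all(|(k, v)| device.get(k) == Some(v))
}

/// Resolves a selector against the Device cache (device id -> labels).
fn resolve(match_labels: &Labels, devices: &BTreeMap<String, Labels>) -> BTreeSet<String> {
    devices
        .iter()
        .filter(|(_, labels)| selector_matches(match_labels, labels))
        .map(|(id, _)| id.clone())
        .collect()
}

/// Returns `(writes, deletes)`: devices that newly match and devices that no longer do.
fn diff(prev: &BTreeSet<String>, next: &BTreeSet<String>) -> (Vec<String>, Vec<String>) {
    let writes: Vec<String> = next.difference(prev).cloned().collect();
    let deletes: Vec<String> = prev.difference(next).cloned().collect();
    (writes, deletes)
}
```

Only the diffed pairs turn into desired-state KV writes or deletes, which is what keeps steady-state traffic proportional to change rather than to fleet size.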

Scale proof

fleet/scripts/load-test.sh + examples/fleet_load_test simulate N devices across M Deployments, driving device-state KV updates at a configurable cadence while the full operator stack runs against a local k3d apiserver. Verified:

  • 100 devices / 10 groups / 1 Hz / 60 s — 100 writes/s sustained, all 10 CR aggregates converge.
  • 10 000 devices / 1 000 groups / 1 Hz / 120 s — ~10 000 writes/s sustained, 0 errors, all 1 000 CR aggregates correct (matchedDeviceCount == expected, succeeded + failed + pending == matched). Same envelope before and after the selector rewrite.
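
The correctness check used in the load test (matchedDeviceCount == expected, succeeded + failed + pending == matched) can be written down as a small sketch. The types below are assumptions, not the operator's: in particular, counting a matched device with no reported state as pending is a guess at the intended behaviour.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Pending,
    Succeeded,
    Failed,
}

/// Hypothetical per-device report read from the device-state bucket.
#[derive(Debug, Clone)]
struct DeviceReport {
    phase: Phase,
    error: Option<String>,
}

/// Mirrors the fields of `.status.aggregate`.
#[derive(Debug, Default, PartialEq)]
struct Aggregate {
    matched_device_count: u32,
    succeeded: u32,
    failed: u32,
    pending: u32,
    last_error: Option<String>,
}

/// One entry per matched device; `None` means no state reported yet.
fn aggregate(matched: &[Option<DeviceReport>]) -> Aggregate {
    let mut agg = Aggregate {
        matched_device_count: matched.len() as u32,
        ..Default::default()
    };
    for report in matched {
        match report {
            // Assumption: a silent device counts as pending.
            None => agg.pending += 1,
            Some(r) => match r.phase {
                Phase::Pending => agg.pending += 1,
                Phase::Succeeded => agg.succeeded += 1,
                Phase::Failed => {
                    agg.failed += 1;
                    if r.error.is_some() {
                        agg.last_error = r.error.clone();
                    }
                }
            },
        }
    }
    agg
}

/// The invariant asserted by the load test.
fn is_consistent(agg: &Aggregate, expected_matched: u32) -> bool {
    agg.matched_device_count == expected_matched
        && agg.succeeded + agg.failed + agg.pending == agg.matched_device_count
}
```

Because every matched device lands in exactly one counter, the sum invariant holds by construction; the load test then only has to confirm the matched count.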

Out of scope in this chapter (follow-ups)

  • Agent config-driven labels ([labels] in agent toml → DeviceInfo). ~30 lines; deferred until a concrete need lands.
  • matchExpressions evaluator. Operator currently supports matchLabels only and logs a warning for expression-bearing selectors. ~50 lines; deferred.
  • Device.status.conditions populated from heartbeat staleness (Reachable / Stale transitions). Liveness is computable today by reading device-heartbeat directly; CR-side reflection is a convenience. ~100 lines; deferred.
  • Full journald log streaming. The .status.aggregate.lastError surface covers the user's reflect-back requirement for now.
  • Multi-device regression smoke — defer until real hardware or a second VM is around.
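
For the deferred Device.status.conditions item, the liveness computation itself is small. A minimal sketch, assuming a device is considered stale after three missed heartbeats; that threshold and all names below are assumptions, and only the 30 s cadence and the Reachable / Stale states come from this document.

```rust
use std::time::Duration;

/// Heartbeat cadence from the wire-format section.
const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(30);
/// Assumption: tolerate this many missed beats before marking a device stale.
const MISSED_BEATS_BEFORE_STALE: u32 = 3;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Liveness {
    Reachable,
    Stale,
}

/// Classifies a device by the time elapsed since its last heartbeat.
fn liveness(since_last_heartbeat: Duration) -> Liveness {
    if since_last_heartbeat <= HEARTBEAT_INTERVAL * MISSED_BEATS_BEFORE_STALE {
        Liveness::Reachable
    } else {
        Liveness::Stale
    }
}

/// Returns the new state only on a transition, so the CR is not patched on every ping.
fn transition(previous: Liveness, since_last_heartbeat: Duration) -> Option<Liveness> {
    let next = liveness(since_last_heartbeat);
    (next != previous).then_some(next)
}
```

Patching only on transitions matches the note that status.conditions should be populated lazily rather than on every heartbeat.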

Chapter 3 — Helm chart (ArgoCD deferred)

Goal: operator ships as a versioned helm chart with CRD version-locked inside.

User clarified this session: ArgoCD exists in production; all it does is apply resources from the chart. Standing up ArgoCD in the smoke adds setup overhead with no incremental validation value.

Chapter 3 produces the chart + validates helm install / helm upgrade lifecycles. ArgoCD consumption is a user operational concern downstream.

Sketch

  • Chart location: fleet/harmony-fleet-operator/chart/ (or sibling repo — defer decision to implementation time).
  • Templates: Namespace, SA, ClusterRole, ClusterRoleBinding, Deployment (operator pod), CRD.
  • CRD yaml in the chart is generated at chart-publish time from the Rust Deployment::crd(). One-off release artifact, not framework path — consistent with "no yaml in framework code."
  • Values: operator image tag, NATS URL, log level.
  • Smoke: helm install into k3d → CR apply → same assertions as Chapter 1.

Open questions

  • Chart repo: subdir vs. separate git repo.
  • CRD install mechanism: chart hook vs. templates directory. Drives CRD upgrade story.

Chapter 4 — Auth: Zitadel + OpenBao + per-device identity

Goal: per-device granular NATS credentials; SSO for operator users; OpenBao policy per device; JWT bootstrap from Zitadel.

Zitadel + OpenBao are already ~99% integrated in harmony; this chapter is wiring the IoT-specific flows.

Sketch

  • Agent's CredentialSource trait (already abstract in agent config.rs) gets a Zitadel-JWT-backed implementation. Mints short-lived NATS creds via OpenBao auth callout.
  • Remove the shared-credentials toml-shared variant (v0 demo leftover).
  • Availability: auth-callout caches policies, tolerates OpenBao outages.
  • SSO for operator users (separate flow): Zitadel groups → Kubernetes RBAC subjects on the Deployment CRD.
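
The credential flow sketched above could look roughly like this on the agent side. Everything here is hypothetical: the trait signature is not the one in the agent's config.rs, the `mint` closure stands in for the Zitadel JWT to auth-callout exchange, and the fallback to still-valid cached credentials during an outage is an assumption about how availability might be handled.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
struct NatsCreds {
    token: String,
    expires_at: Instant,
}

/// Hypothetical shape of the agent-side credential abstraction.
trait CredentialSource {
    fn credentials(&mut self, now: Instant) -> Result<NatsCreds, String>;
}

/// `mint` returns a fresh token and its time-to-live, or an error on outage.
struct JwtBackedSource<F>
where
    F: FnMut() -> Result<(String, Duration), String>,
{
    mint: F,
    cached: Option<NatsCreds>,
    refresh_margin: Duration,
}

impl<F> CredentialSource for JwtBackedSource<F>
where
    F: FnMut() -> Result<(String, Duration), String>,
{
    fn credentials(&mut self, now: Instant) -> Result<NatsCreds, String> {
        // Reuse cached creds while they are comfortably inside their lifetime.
        if let Some(c) = &self.cached {
            if now + self.refresh_margin < c.expires_at {
                return Ok(c.clone());
            }
        }
        match (self.mint)() {
            Ok((token, ttl)) => {
                let fresh = NatsCreds { token, expires_at: now + ttl };
                self.cached = Some(fresh.clone());
                Ok(fresh)
            }
            // Backend outage: keep serving cached creds until they expire.
            Err(e) => match &self.cached {
                Some(c) if now < c.expires_at => Ok(c.clone()),
                _ => Err(e),
            },
        }
    }
}
```

Passing `now` in explicitly keeps the source deterministic under test; a production implementation would more likely read the clock itself.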

Chapter 5 — Frontend (last)

Goal: operator-friendly UI for the decentralized platform.

Form factor undecided: Leptos web dashboard, CLI extension to harmony_cli, or a TUI. Minimum viable product: read-only view of fleet state (devices + deployments + aggregated status) powered by the CRD .status from Chapter 2. Aspiration: write operations with auth from Chapter 4.


Chapter 6 — Customer demo rehearsal [in progress]

48-hour customer demo prep. PO assessment concluded that promising a real-OKD deployment without first proving the JWT-auth chain is reckless. VM-based rehearsal first, OKD second.

The rehearsal extends smoke-a4 (k3d + libvirt VM + agent + apply CR + reconcile podman) with Zitadel + auth callout + agent JWT auth. Two devices + one admin. Same code paths as production — only the cluster topology differs.

Detailed plan: v0_demo_e2e.md.

Once the VM rehearsal is green (success criteria in that doc), the residual deltas to ship to real OKD are configuration, not new code.


Principles — what we've learned and want to keep doing

  • No yaml in framework code paths. Every kube-rs type is typed; every Score apply goes through typed Rust. Yaml generation happens only at chart-publish time, never at runtime.
  • Scores describe desired state; topologies expose capabilities. Prefer adding capability traits over thickening a single topology.
  • Minimal topologies for ad-hoc Score execution. K8sAnywhereTopology has too many opinions (cert-manager install, tenant-manager bootstrap, helm probes) for narrow apply-a-CRD use cases. See ROADMAP §12.6 — a lean shared K8sBareTopology is the durable fix.
  • Cross-boundary wire types in harmony-reconciler-contracts, everything else in its natural crate.
  • Never ship untested code. Every commit that changes runtime behavior is verified against a smoke script before landing. Cargo check + unit tests aren't enough.
  • Prove claims about upstream before blaming upstream. The Arch edk2 investigation showed this matters; see memory/feedback_prove_before_blaming_upstream.md.