harmony/ROADMAP/fleet_platform/v0_1_plan.md

# IoT Platform v0.1 and beyond — forward plan

Authoritative forward plan for the NationTech decentralized-infra /
IoT platform, written after the v0 walking skeleton shipped
(see `v0_walking_skeleton.md` for the historical diary). Organized as
six chapters: Chapters 1-5 in execution order, plus Chapter 6 (the
in-progress customer demo rehearsal).

## State of the world (as of 2026-04-23)

**Green, end-to-end:**

- CRD → operator → NATS JetStream KV write path (`smoke-a1.sh`).
- Agent watches KV, reconciles podman containers (`smoke-a1.sh`).
- VM-as-device provisioning: cloud-init + fleet-agent install + NATS
  smoke (`smoke-a3.sh`), x86_64 (native KVM) and aarch64 (TCG).
- Power-cycle / reboot resilience (`smoke-a3.sh` phase 5).
- aarch64 cross-compile of the agent (no Harmony modules need to
  feature-gate aarch64).
- Operator installed via a harmony Score (typed Rust, no yaml).
- `harmony-reconciler-contracts` crate — cross-boundary types
  (bucket names, key helpers, `DeviceInfo`, `DeploymentState`,
  `HeartbeatPayload`, `DeploymentName`, `Id` re-export).

**Chapter 1 shipped** (2026-04-21): composed end-to-end demo
(`smoke-a4.sh`) — operator in k3d + in-cluster NATS + ARM VM +
typed-Rust CR applier + hand-off menu + `--auto` regression. Green
on x86_64 (native KVM) and aarch64 (TCG).

**Chapter 2 shipped** (2026-04-23): selector-based targeting +
Device CRD + `.status.aggregate` reflect-back.
`Deployment.spec.targetSelector: LabelSelector` resolves against
cluster-scoped `Device` CRs materialized from NATS `device-info`.
Operator writes
`desired-state` KV per matched pair, patches
`.status.aggregate` (matchedDeviceCount / succeeded / failed /
pending / lastError) at 1 Hz. Load-tested to 10 000 devices ×
1 000 Deployments at 10 000 KV writes/s sustained, zero errors.

**Not yet wired (real v0.1 work still to go):**

- Helm packaging of the operator (Chapter 3).
- Zitadel + OpenBao auth (per-device credentials, SSO for
  operator users). Placeholder `CredentialSource` trait on the
  agent side (Chapter 4).
- Any frontend (Chapter 5).
- Small quality items (not blockers): agent config-driven labels,
  `matchExpressions` in selectors, `Device.status.conditions`
  populated from heartbeat staleness.

**Verified during planning** (so future implementation doesn't
have to re-litigate):

- **Upgrade already works.** `reconciler.rs::apply` byte-compares
  serialized score payloads; drift triggers re-reconcile.
  `PodmanTopology::ensure_service_running` removes then re-creates
  containers on spec drift. No "stale + new" window.
- **The polymorphism stays.** `ReconcileScore` is an externally-tagged
  enum; adding `OkdApplyV0` later is additive.
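
A minimal sketch of the byte-compare drift check described above, assuming a per-key cache of the last-applied serialized payload. `AppliedCache` and `observe` are hypothetical names for illustration, not the actual `reconciler.rs::apply` code. It only works if serialization is deterministic, since two semantically equal payloads with different byte layouts would count as drift.

```rust
use std::collections::HashMap;

/// Hypothetical cache of the last-applied serialized payload per key.
/// (Illustrative only; the real reconciler may store this differently.)
#[derive(Default)]
pub struct AppliedCache {
    last: HashMap<String, Vec<u8>>,
}

impl AppliedCache {
    /// Returns true when `payload` differs byte-for-byte from what was last
    /// recorded for `key` (or nothing was recorded yet), and records the new
    /// bytes. A `true` result means "re-reconcile".
    pub fn observe(&mut self, key: &str, payload: &[u8]) -> bool {
        match self.last.get(key) {
            Some(prev) if prev.as_slice() == payload => false,
            _ => {
                self.last.insert(key.to_string(), payload.to_vec());
                true
            }
        }
    }
}
```

The first observation and any changed payload return `true`; an identical re-delivery returns `false`, so a repeated KV update with the same bytes is a no-op.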

**Surprises since v0 started** (for context, none architectural):

- Arch `edk2-aarch64-202602-2` shipped empty firmware blobs;
  `202508-1` ships unpadded edk2 that needs 64 MiB pflash padding.
  Fixed via runtime discovery + padding in `modules/kvm/firmware.rs`.
- MTTCG isn't default for cross-arch TCG on QEMU 10.2; force via
  `qemu:commandline` override. `pauth-impdef=on` likewise a
  qemu:commandline opt-in.
- `ensure_vm` treats an existing domain as already converged, so it
  never re-applies a changed XML; picking up a change requires a manual
  `undefine --nvram --remove-all-storage` first.
  Noted as a follow-up in the code comments.
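
The firmware padding fix can be sketched as below, assuming zero-byte padding up to the 64 MiB pflash size. `pad_firmware` is a hypothetical helper, not the code in `modules/kvm/firmware.rs`. The target size is a parameter only so the logic is easy to exercise with small buffers.

```rust
/// Size of one pflash image as stated above: 64 MiB.
pub const PFLASH_SIZE: usize = 64 * 1024 * 1024;

/// Pads `image` with trailing zero bytes until it is exactly `target` bytes.
/// Rejects an empty blob (the `202602-2` symptom) and anything larger than
/// `target`, since neither can be fixed by padding.
pub fn pad_firmware(mut image: Vec<u8>, target: usize) -> Result<Vec<u8>, String> {
    if image.is_empty() {
        return Err("firmware blob is empty".to_string());
    }
    if image.len() > target {
        return Err(format!(
            "firmware blob is {} bytes, larger than the {} byte target",
            image.len(),
            target
        ));
    }
    image.resize(target, 0);
    Ok(image)
}
```

In real use the call would be `pad_firmware(bytes, PFLASH_SIZE)`; a blob that is already 64 MiB passes through unchanged.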

---

## Chapter 1 — Hands-on end-to-end demo **[SHIPPED 2026-04-21]**

**Goal:** the user runs one command, watches operator + NATS + ARM
VM come up, then drives a CRD through the full loop by hand:
`kubectl apply` it (manually or via a typed Rust applier), watch the
operator log "acquired," check the NATS KV store with `natsbox`,
SSH/console into the VM, `curl` the running nginx container from
the workstation.

### User-facing requirements (explicit)

- **No yaml fixtures.** Sample `Deployment` CRs constructed in
  typed Rust using `DeploymentSpec` + `PodmanV0Score`. Same
  discipline as the `install` Score that replaced `gen-crd | kubectl
  apply`.
- **ArgoCD deferred.** User's production clusters have it; bringing
  it into the smoke harness adds setup overhead without validating
  anything `helm install` doesn't. Chapter 3 produces the chart;
  ArgoCD integration is a later operational concern.
- **Operator logs every CR it acquires** — `controller.rs` already
  does `tracing::info!(%ns, %name, "reconcile")`; verify the output
  reads well in the command-menu hand-off.
- **natsbox debugging is first-class.** Script prints exact
  natsbox one-liners at hand-off so the user can inspect KV state.
- **In-cluster NATS.** Not a side-by-side podman container (as
  smoke-a1 does today). Expose to the libvirt VM via k3d
  loadbalancer port mapping.

### Design decisions

- **Rust CR applier.** New binary `examples/harmony_apply_deployment/`.
  CLI flags `--name --namespace --target-device --image --port
  --delete`. Constructs the `Deployment` CR via
  `kube::Api<Deployment>` + typed `DeploymentSpec`; calls
  `api.apply(...)`. Can also `--print` the CR JSON to stdout so
  `kubectl apply -f -` still works from the terminal.
- **smoke-a4.sh orchestration stays bash for now.** User agreed
  this is test-harness scope, not framework path; converting it
  to Rust is "not as important right now."
- **Hand-off is the default mode**, not `--keep`. The whole point
  of Chapter 1 is that the user drives the last stage interactively.
  `smoke-a4.sh` brings everything up, applies *nothing*, prints
  the command menu, waits on `INT/TERM` to tear down. `--auto`
  runs the full apply/curl/upgrade/delete regression for CI.
- **In-cluster NATS path.** Preferred: use `harmony::modules::nats`
  if it has a lightweight single-node / no-supercluster mode.
  Fallback: typed `K8sResourceScore` applying a minimal Deployment
  + NodePort Service. 15-min research task before committing.

### Composed smoke phases (`smoke-a4.sh`)

1. k3d cluster up with `-p "4222:4222@loadbalancer"` so the host
   port 4222 forwards into the cluster. Reachable from the
   libvirt VM via the gateway IP (typically `192.168.122.1:4222`).
2. NATS in-cluster via the chosen path (harmony module or direct
   K8sResourceScore). Wait for readiness.
3. Install CRD via the operator's `install` subcommand (typed Rust).
4. Spawn operator as a host-side process (same pattern as
   smoke-a1). Operator connects to `nats://localhost:4222`.
5. Provision ARM VM via `example_iot_vm_setup` (same entry point
   smoke-a3 uses). Agent configured to connect to
   `nats://<libvirt_gateway>:4222` — discover the gateway IP via
   `virsh net-dumpxml default`, as smoke-a3 already does.
6. Sanity: `kubectl wait ... crd Established`, operator logged
   "KV bucket ready", agent logged "watching KV keys",
   `status.<device>` present in `agent-status` bucket.
7. Hand off. Print the command menu below, then block until
   `INT`/`TERM`; the cleanup trap tears everything down and the
   script exits 0.
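
Phase 5's gateway discovery can be sketched as a small parser over `virsh net-dumpxml default` output. This is an illustrative stand-in for what the smoke scripts do in bash. `gateway_ip` and `nats_url` are hypothetical names, and the sketch assumes the first `<ip ...>` element is the IPv4 gateway.

```rust
/// Extracts the `address` attribute of the first `<ip ...>` element in
/// libvirt network XML. Plain string scanning, no XML crate; this assumes
/// the first `<ip>` element is the one we want.
pub fn gateway_ip(xml: &str) -> Option<&str> {
    let start = xml.find("<ip ")?;
    let tag = &xml[start..];
    let tag = &tag[..tag.find('>')?];
    let rest = &tag[tag.find("address=")? + "address=".len()..];
    // The attribute value may be single- or double-quoted.
    let quote = rest.chars().next()?;
    if quote != '\'' && quote != '"' {
        return None;
    }
    let value = &rest[1..];
    let end = value.find(quote)?;
    Some(&value[..end])
}

/// Builds the NATS URL the agent is configured with.
pub fn nats_url(gateway: &str, port: u16) -> String {
    format!("nats://{gateway}:{port}")
}
```

For the default libvirt network this typically yields `nats://192.168.122.1:4222`, matching the address mentioned in phase 1.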

### Command menu at hand-off

- `kubectl get deployments.fleet.nationtech.io -A -w` — watch CR
  reconcile reactively.
- `cargo run -q -p example_harmony_apply_deployment -- --image
   nginx:latest --target-device $TARGET_DEVICE` — apply an nginx
  deployment via typed Rust.
- `cargo run -q -p example_harmony_apply_deployment -- --print
   --image nginx:latest --target-device $TARGET_DEVICE |
   kubectl apply -f -` — same thing, through kubectl.
- `ssh -i $SSH_KEY fleet-admin@$VM_IP` — connect to the VM.
- `virsh console $VM_NAME --force` — serial console alternative.
- `podman --url ssh://fleet-admin@$VM_IP:... ps` or ssh + `podman ps`
  — list containers on the VM from the workstation.
- `podman run --rm docker.io/natsio/nats-box nats --server
   nats://localhost:4222 kv ls desired-state` — list desired
  state keys (from the host).
- `podman run --rm ... nats kv get desired-state
   '<device>.<deployment>' --raw` — dump a specific desired state.
- `podman run --rm ... nats kv get agent-status
   'status.<device>' --raw` — dump the heartbeat.
- `curl http://$VM_IP:8080/` — hit the deployed nginx.
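
The key shapes used by the natsbox commands above can be sketched as follows. The real helpers live in `harmony-reconciler-contracts`; the function and constant names here are hypothetical. The parser assumes device ids contain no `.`, which the source does not state.

```rust
/// Bucket names as they appear in the command menu.
pub const DESIRED_STATE_BUCKET: &str = "desired-state";
pub const AGENT_STATUS_BUCKET: &str = "agent-status";

/// Key for one (device, deployment) pair in the `desired-state` bucket.
pub fn desired_state_key(device: &str, deployment: &str) -> String {
    format!("{device}.{deployment}")
}

/// Key for a device's heartbeat entry in the `agent-status` bucket.
pub fn status_key(device: &str) -> String {
    format!("status.{device}")
}

/// Splits a desired-state key back into (device, deployment).
/// Assumption: the device id contains no '.', so the first dot is the
/// separator.
pub fn parse_desired_state_key(key: &str) -> Option<(&str, &str)> {
    let (device, deployment) = key.split_once('.')?;
    if device.is_empty() || deployment.is_empty() {
        return None;
    }
    Some((device, deployment))
}
```

With these, `nats kv get desired-state 'pi-42.web'` corresponds to `desired_state_key("pi-42", "web")`.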

### `--auto` path (for regression)

1. Apply `nginx:latest`, wait for container on VM, `curl` 200.
2. Apply `nginx:1.26` (upgrade), wait for container *id* to change,
   `curl` 200 against the new container.
3. Apply `--delete`, wait for container gone from VM.
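
The "wait for X" steps above all reduce to polling with a deadline. `smoke-a4.sh` does this in bash; the Rust sketch below shows the same logic. `wait_until` and `wait_for_new_container_id` are hypothetical helpers, and the probe closure stands in for whatever lists containers on the VM.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `probe` every `interval` until it yields a value or `timeout`
/// elapses. The probe always runs at least once.
pub fn wait_until<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Result<T, String> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(value) = probe() {
            return Ok(value);
        }
        if Instant::now() >= deadline {
            return Err(format!("timed out after {timeout:?}"));
        }
        sleep(interval);
    }
}

/// Upgrade assertion: succeeds once a container id is visible AND it
/// differs from `old_id`, i.e. the container was actually re-created.
pub fn wait_for_new_container_id(
    old_id: &str,
    timeout: Duration,
    interval: Duration,
    mut current_id: impl FnMut() -> Option<String>,
) -> Result<String, String> {
    wait_until(timeout, interval, || {
        current_id().filter(|id| id.as_str() != old_id)
    })
}
```

Asserting on the id change rather than on the image tag catches the case where the spec updated but the old container kept running.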

### Files

- **NEW** `examples/harmony_apply_deployment/Cargo.toml` +
  `src/main.rs` — typed applier.
- **NEW** `fleet/scripts/smoke-a4.sh`.
- **NO yaml fixtures.** Rust CLI flags cover the shape.
- Optional: factor shared smoke phases (NATS up, k3d up, operator
  spawn, VM provision) into `fleet/scripts/lib/` if the duplication
  across a1/a3/a4 becomes obvious. Don't force it.

### NATS exposure — implementation-time notes

- k3d `@loadbalancer` port mapping binds the host's `0.0.0.0:4222`
  by default; libvirt VMs on `virbr0` can reach it via the gateway
  IP. No special NAT config required.
- Fallback if environmental snag: keep the side-by-side podman
  container on an opt-in `NATS_MODE=podman` flag. Don't default
  to that — user explicitly asked for in-cluster.

### Verification

- Fresh host: `ARCH=aarch64 ./fleet/scripts/smoke-a4.sh` completes
  in 8-15 min, prints the command menu.
- `ARCH=aarch64 ./fleet/scripts/smoke-a4.sh --auto` PASSes
  end-to-end including upgrade id-change assertion.
- x86_64 (`ARCH=x86-64`) completes in 2-5 min.

### Explicitly out of scope

- `AgentStatus` / `DeploymentStatus` enrichment — Chapter 2.
- Helm chart, ArgoCD, auth, frontend — later chapters.
- Lifting the applier into a reusable `ApplyDeploymentScore` —
  only if a second consumer appears.

---

## Chapter 2 — Status reflect-back + selector-based targeting **[SHIPPED 2026-04-23]**

**Goal:** CRD `.status` reflects fleet reality — per-deployment
success/failure/pending counts, last-error surface, freshness. The
Deployment CR targets devices by label selector, not by id list.

> The shipped design replaces the original `AgentStatus` + list-of-ids
> proposal wholesale. See `chapter_4_aggregation_scale.md` for the
> superseded design-doc archaeology. Commits:
> `refactor(iot): delete legacy AgentStatus path`,
> `refactor(iot): operator watches device-state KV directly; drop event stream`,
> `refactor(iot): Deployment.targetSelector + Device CRD (DaemonSet-like)`.

### What shipped

**Wire format** (in `harmony-reconciler-contracts`): three per-concern
payloads on dedicated NATS KV buckets. No monolithic per-device blob,
no separate event stream.

| Type | Bucket | Cadence |
|------|--------|---------|
| `DeviceInfo` | `device-info` | on startup + label/inventory change |
| `DeploymentState` | `device-state` | on reconcile phase transition |
| `HeartbeatPayload` | `device-heartbeat` | every 30 s |

**CRDs.** Two custom resources:

- `Deployment` (namespaced) — `spec.targetSelector: LabelSelector`
  (standard K8s `matchLabels` / `matchExpressions` schema; only
  `matchLabels` is evaluated today, see the follow-ups below). No
  device list on spec. `.status.aggregate` carries `matchedDeviceCount`,
  `succeeded`, `failed`, `pending`, `lastError`.
- `Device` (cluster-scoped, like `Node`) — `metadata.labels` carries
  the device's routing labels; `spec.inventory` holds the hardware/OS
  snapshot; `status.conditions` is reserved for liveness (populated
  lazily by a future heartbeat-freshness reconciler, not every ping).

**Operator tasks** (three concurrent loops in one process):

1. `controller` — validates Deployment CR names, holds the finalizer
   that cleans `desired-state.<device>.<deployment>` KV entries on
   delete. No writes on apply (aggregator handles that).
2. `device_reconciler` — watches the `device-info` KV; server-side-
   applies a `Device` CR per `DeviceInfo` payload, with label
   sanitization. Agents remain kube-unaware.
3. `fleet_aggregator` — three caches driven by watches (Deployment
   CRs, Device CRs, `device-state` KV). On any change, resolves
   each selector against the Device cache, writes/deletes
   `desired-state` KV entries for diffed matches, and patches
   `.status.aggregate` at 1 Hz for the CRs whose counters moved.
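
The "writes/deletes for diffed matches" step of `fleet_aggregator` can be sketched as a set difference between the previously matched devices and the currently matched ones. `KvOp` and `diff_matches` are hypothetical names. The sketch covers only membership changes; a changed Deployment spec would also need a rewrite for devices that stay matched.

```rust
use std::collections::BTreeSet;

/// One pending operation against the `desired-state` bucket.
#[derive(Debug, PartialEq, Eq)]
pub enum KvOp {
    Put(String),
    Delete(String),
}

/// Compares the previous and current matched-device sets for one
/// deployment and emits only the operations needed to converge:
/// a Put for each newly matched device, a Delete for each device that
/// no longer matches. Keys use the `<device>.<deployment>` shape.
pub fn diff_matches(
    deployment: &str,
    previous: &BTreeSet<String>,
    current: &BTreeSet<String>,
) -> Vec<KvOp> {
    let mut ops = Vec::new();
    for device in current.difference(previous) {
        ops.push(KvOp::Put(format!("{device}.{deployment}")));
    }
    for device in previous.difference(current) {
        ops.push(KvOp::Delete(format!("{device}.{deployment}")));
    }
    ops
}
```

An unchanged match set produces no operations, which keeps steady-state KV traffic proportional to churn rather than fleet size.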

**Agents** publish `device-id=<id>` as a default DeviceInfo label, so
targeting a single device with `matchLabels: {device-id: pi-42}` is
zero-config. User-defined labels layer on from agent config (scoped
out of this chapter; follow-up item).
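
A minimal sketch of `matchLabels` evaluation against a device's labels, following the standard Kubernetes rule that every selector pair must be present on the object. `Labels`, `matches`, and `default_labels` are hypothetical names, not the operator's actual API. Treating an empty selector as "match everything" is the Kubernetes convention and is assumed here.

```rust
use std::collections::BTreeMap;

pub type Labels = BTreeMap<String, String>;

/// `matchLabels` semantics: every (key, value) pair in the selector must
/// appear verbatim in the device's labels. An empty selector matches all.
pub fn matches(match_labels: &Labels, device_labels: &Labels) -> bool {
    match_labels
        .iter()
        .all(|(key, value)| device_labels.get(key) == Some(value))
}

/// The default label set an agent publishes: just `device-id=<id>`.
pub fn default_labels(device_id: &str) -> Labels {
    let mut labels = Labels::new();
    labels.insert("device-id".to_string(), device_id.to_string());
    labels
}
```

With only the default label published, a selector of `{device-id: pi-42}` matches exactly one device, and any selector that adds a user-defined label matches nothing until agents publish it.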

### Scale proof

`fleet/scripts/load-test.sh` + `examples/fleet_load_test` simulate N
devices across M Deployments, driving `device-state` KV updates at a
configurable cadence while the full operator stack runs against a
local k3d apiserver. Verified:

- 100 devices / 10 groups / 1 Hz / 60 s — 100 writes/s sustained,
  all 10 CR aggregates converge.
- 10 000 devices / 1 000 groups / 1 Hz / 120 s — ~10 000 writes/s
  sustained, 0 errors, all 1 000 CR aggregates correct
  (`matchedDeviceCount == expected`, `succeeded + failed + pending
  == matched`). Same envelope before and after the selector rewrite.
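
The correctness check in the load test can be sketched as a fold over per-device phases plus the two invariants quoted above. The types are hypothetical mirrors of `.status.aggregate`, not the operator's real structs. Taking `lastError` from the last failed report in iteration order is an assumption.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase {
    Succeeded,
    Failed,
    Pending,
}

/// One matched device's latest reported state (hypothetical shape).
pub struct Report<'a> {
    pub phase: Phase,
    pub error: Option<&'a str>,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Aggregate {
    pub matched_device_count: u32,
    pub succeeded: u32,
    pub failed: u32,
    pub pending: u32,
    pub last_error: Option<String>,
}

/// Folds per-device reports into the counters carried on the CR status.
pub fn aggregate(reports: &[Report<'_>]) -> Aggregate {
    let mut agg = Aggregate::default();
    for report in reports {
        agg.matched_device_count += 1;
        match report.phase {
            Phase::Succeeded => agg.succeeded += 1,
            Phase::Pending => agg.pending += 1,
            Phase::Failed => {
                agg.failed += 1;
                if let Some(err) = report.error {
                    agg.last_error = Some(err.to_string());
                }
            }
        }
    }
    agg
}

/// The two invariants the load test asserts per CR.
pub fn is_consistent(agg: &Aggregate, expected: u32) -> bool {
    agg.matched_device_count == expected
        && agg.succeeded + agg.failed + agg.pending == agg.matched_device_count
}
```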

### Out of scope in this chapter (follow-ups)

- Agent config-driven labels (`[labels]` in agent toml → DeviceInfo).
  ~30 lines; deferred until a concrete need lands.
- `matchExpressions` evaluator. Operator currently supports
  `matchLabels` only and logs a warning for expression-bearing
  selectors. ~50 lines; deferred.
- `Device.status.conditions` populated from heartbeat staleness
  (Reachable / Stale transitions). Liveness is computable today by
  reading `device-heartbeat` directly; CR-side reflection is a
  convenience. ~100 lines; deferred.
- Full journald log streaming. The `.status.aggregate.lastError`
  surface covers the user's reflect-back requirement for now.
- Multi-device regression smoke — defer until real hardware or a
  second VM is around.
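
The deferred heartbeat-staleness item could look roughly like this, given the 30 s heartbeat cadence. The `missed_allowed` tolerance is a hypothetical knob; the source does not specify a threshold. `Liveness` stands in for the Reachable / Stale condition.

```rust
use std::time::Duration;

/// Heartbeat cadence from the wire-format table.
pub const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(30);

#[derive(Debug, PartialEq, Eq)]
pub enum Liveness {
    Reachable,
    Stale,
}

/// Classifies a device from the age of its last heartbeat.
/// `missed_allowed` is an assumed tolerance: the device is Reachable as
/// long as no more than that many consecutive heartbeats were missed.
pub fn liveness(age: Duration, missed_allowed: u32) -> Liveness {
    if age <= HEARTBEAT_INTERVAL * (missed_allowed + 1) {
        Liveness::Reachable
    } else {
        Liveness::Stale
    }
}
```

A reconciler built on this would only patch `Device.status.conditions` when the classification flips, which matches the "not every ping" intent stated for that field.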

---

## Chapter 3 — Helm chart (ArgoCD deferred)

**Goal:** operator ships as a versioned helm chart with CRD
version-locked inside.

User clarified this session: ArgoCD exists in production; all it
does is apply resources from the chart. Standing up ArgoCD in the
smoke adds setup overhead with no incremental validation value.

Chapter 3 produces the chart + validates `helm install / helm
upgrade` lifecycles. ArgoCD consumption is a user operational
concern downstream.

### Sketch

- Chart location: `fleet/harmony-fleet-operator/chart/` (or sibling repo —
  defer decision to implementation time).
- Templates: Namespace, SA, ClusterRole, ClusterRoleBinding,
  Deployment (operator pod), CRD.
- **CRD yaml in the chart is generated at chart-publish time** from
  the Rust `Deployment::crd()`. One-off release artifact, not
  framework path — consistent with "no yaml in framework code."
- Values: operator image tag, NATS URL, log level.
- Smoke: `helm install` into k3d → CR apply → same assertions as
  Chapter 1.

### Open questions

- Chart repo: subdir vs. separate git repo.
- CRD install mechanism: chart hook vs. templates directory.
  Drives CRD upgrade story.

---

## Chapter 4 — Auth: Zitadel + OpenBao + per-device identity

**Goal:** per-device granular NATS credentials; SSO for operator
users; OpenBao policy per device; JWT bootstrap from Zitadel.

Zitadel + OpenBao are already ~99% integrated in harmony; this
chapter is wiring the IoT-specific flows.

### Sketch

- Agent's `CredentialSource` trait (already abstract in agent
  `config.rs`) gets a Zitadel-JWT-backed implementation. Mints
  short-lived NATS creds via OpenBao auth callout.
- Remove the shared-credentials `toml-shared` variant (v0 demo
  leftover).
- Availability: auth-callout caches policies, tolerates OpenBao
  outages.
- SSO for operator users (separate flow): Zitadel groups →
  Kubernetes RBAC subjects on the `Deployment` CRD.
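
The availability bullet above (auth callout caches policies, tolerates OpenBao outages) can be sketched as a TTL cache that falls back to the last known value when the upstream lookup fails. `PolicyCache` is a hypothetical type, policies are plain strings for brevity, and `fetch` stands in for the OpenBao lookup. Whether serving a stale policy during an outage is acceptable is a security decision this sketch does not settle.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Per-device policy cache with stale-on-error fallback (illustrative).
pub struct PolicyCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl PolicyCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the cached policy while it is fresh. Otherwise calls
    /// `fetch`; on success the cache is refreshed, on failure the last
    /// known policy (if any) is served instead of the error.
    pub fn get(
        &mut self,
        device: &str,
        now: Instant,
        fetch: impl FnOnce(&str) -> Result<String, String>,
    ) -> Result<String, String> {
        if let Some((fetched_at, policy)) = self.entries.get(device) {
            if now.duration_since(*fetched_at) < self.ttl {
                return Ok(policy.clone());
            }
        }
        match fetch(device) {
            Ok(policy) => {
                self.entries
                    .insert(device.to_string(), (now, policy.clone()));
                Ok(policy)
            }
            Err(err) => match self.entries.get(device) {
                Some((_, stale)) => Ok(stale.clone()),
                None => Err(err),
            },
        }
    }
}
```

A device that was never seen before the outage still fails closed, since there is nothing cached to fall back on.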

---

## Chapter 5 — Frontend (last)

**Goal:** operator-friendly UI for the decentralized platform.

Form factor undecided: Leptos web dashboard, CLI extension to
`harmony_cli`, or a TUI. Minimum viable product: read-only view of
fleet state (devices + deployments + aggregated status) powered by
the CRD `.status` from Chapter 2. Aspiration: write operations with
auth from Chapter 4.

---

## Chapter 6 — Customer demo rehearsal **[in progress]**

48-hour customer demo prep. PO assessment concluded that promising a
real-OKD deployment without first proving the JWT-auth chain is
reckless. **VM-based rehearsal first**, OKD second.

The rehearsal extends `smoke-a4` (k3d + libvirt VM + agent + apply
CR + reconcile podman) with **Zitadel + auth callout + agent JWT
auth**. Two devices + one admin. Same code paths as production —
only the cluster topology differs.

Detailed plan: [`v0_demo_e2e.md`](v0_demo_e2e.md).

Once the VM rehearsal is green (success criteria in that doc), the
residual deltas to ship to real OKD are configuration, not new code.

---

## Principles — what we've learned and want to keep doing

- **No yaml in framework code paths.** Every kube-rs type is
  typed; every Score apply goes through typed Rust. Yaml generation
  happens only at chart-publish time, never at runtime.
- **Scores describe desired state; topologies expose capabilities.**
  Prefer adding capability traits over thickening a single topology.
- **Minimal topologies for ad-hoc Score execution.** `K8sAnywhereTopology`
  has too many opinions (cert-manager install, tenant-manager bootstrap,
  helm probes) for narrow apply-a-CRD use cases. See ROADMAP
  §12.6 — a lean shared `K8sBareTopology` is the durable fix.
- **Cross-boundary wire types in `harmony-reconciler-contracts`**,
  everything else in its natural crate.
- **Never ship untested code.** Every commit that changes runtime
  behavior is verified against a smoke script before landing.
  Cargo check + unit tests aren't enough.
- **Prove claims about upstream before blaming upstream.** The
  Arch edk2 investigation showed this matters; see
  `memory/feedback_prove_before_blaming_upstream.md`.