Jean-Gabriel Gill-Couture fdcc7040dd docs(fleet): chapter 6 — VM-based customer demo rehearsal plan
Adds ROADMAP/fleet_platform/v0_demo_e2e.md and threads it from
v0_1_plan.md. The VM rehearsal extends smoke-a4 (already-green k3d
+ libvirt VM + agent + apply CR + reconcile loop) with Zitadel +
auth callout + agent JWT auth. Two devices + one admin, real
cargo tests sharing a OnceCell-bringup.

Plan calls out:
- The 7 tests, including the load-bearing
  `agent_recovers_from_nats_pod_restart` (asserts the auto-reconnect
  + auth-callback re-mint path under realistic disturbance).
- Five known risks / debugging traps to expect on first cold-start
  (iam-admin-pat secret timing, /etc/hosts injection, k3d port
  collisions, etc.).
- Success criteria for the rehearsal day: cold cargo run greens in
  <20 min, all 7 tests green on a clean machine, the NATS-restart
  test reliably greens 5 runs in a row.
- Anything below the success criteria → reframe the customer call
  to "architecture walkthrough + local k3d demo + pilot in 1-2
  weeks." Avoids burning the relationship to keep a deadline.

Once VM rehearsal is green the residual OKD deltas are configuration
(Route annotations, image registry, real DNS, cert) — no new code.
2026-05-03 16:59:43 -04:00


# IoT Platform v0.1 and beyond — forward plan
Authoritative forward plan for the NationTech decentralized-infra /
IoT platform, written after the v0 walking skeleton shipped
(see `v0_walking_skeleton.md` for the historical diary). Organized as
six chapters in execution order.
## State of the world (as of 2026-04-23)
**Green, end-to-end:**
- CRD → operator → NATS JetStream KV write path (`smoke-a1.sh`).
- Agent watches KV, reconciles podman containers (`smoke-a1.sh`).
- VM-as-device provisioning: cloud-init + fleet-agent install + NATS
smoke (`smoke-a3.sh`), x86_64 (native KVM) and aarch64 (TCG).
- Power-cycle / reboot resilience (`smoke-a3.sh` phase 5).
- aarch64 cross-compile of the agent (no Harmony modules need to
feature-gate aarch64).
- Operator installed via a harmony Score (typed Rust, no yaml).
- `harmony-reconciler-contracts` crate — cross-boundary types
(bucket names, key helpers, `DeviceInfo`, `DeploymentState`,
`HeartbeatPayload`, `DeploymentName`, `Id` re-export).

**Chapter 1 shipped** (2026-04-21): composed end-to-end demo
(`smoke-a4.sh`) — operator in k3d + in-cluster NATS + ARM VM +
typed-Rust CR applier + hand-off menu + `--auto` regression. Green
on x86_64 (native KVM) and aarch64 (TCG).

**Chapter 2 shipped** (2026-04-23): selector-based targeting +
Device CRD + `.status.aggregate` reflect-back.
`Deployment.spec.targetSelector: LabelSelector` resolves against
cluster-scoped `Device` CRs materialized from NATS `device-info`.
Operator writes `desired-state` KV per matched pair, patches
`.status.aggregate` (matchedDeviceCount / succeeded / failed /
pending / lastError) at 1 Hz. Load-tested to 10 000 devices ×
1 000 Deployments at 10 000 KV writes/s sustained, zero errors.

**Not yet wired (real v0.1 work still to go):**
- Helm packaging of the operator (Chapter 3).
- Zitadel + OpenBao auth (per-device credentials, SSO for
operator users). Placeholder `CredentialSource` trait on the
agent side (Chapter 4).
- Any frontend (Chapter 5).
- Small quality items (not blockers): agent config-driven labels,
`matchExpressions` in selectors, `Device.status.conditions`
populated from heartbeat staleness.

**Verified during planning** (so future implementation doesn't
have to re-litigate):
- **Upgrade already works.** `reconciler.rs::apply` byte-compares
serialized score payloads; drift triggers re-reconcile.
`PodmanTopology::ensure_service_running` removes then re-creates
containers on spec drift. No "stale + new" window.
- **The polymorphism stays.** `ReconcileScore` is an externally-tagged
enum; adding `OkdApplyV0` later is additive.
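
The byte-compare drift check amounts to remembering the last-applied serialized payload per key. A minimal std-only sketch; `DriftTracker` and its methods are hypothetical names, not the `reconciler.rs` API. It assumes serialization is deterministic: a payload whose field order varies between runs would show up as spurious drift.

```rust
use std::collections::HashMap;

/// Hypothetical tracker: remembers the serialized payload last applied
/// for each KV key so the next reconcile can byte-compare against it.
#[derive(Default)]
pub struct DriftTracker {
    last_applied: HashMap<String, Vec<u8>>,
}

impl DriftTracker {
    /// True if `payload` differs from the last applied bytes for `key`,
    /// or if nothing has been applied for `key` yet.
    pub fn is_drifted(&self, key: &str, payload: &[u8]) -> bool {
        self.last_applied
            .get(key)
            .map_or(true, |prev| prev.as_slice() != payload)
    }

    /// Records `payload` as applied; call only after the apply succeeded.
    pub fn record_applied(&mut self, key: &str, payload: &[u8]) {
        self.last_applied.insert(key.to_owned(), payload.to_vec());
    }
}
```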
**Surprises since v0 started** (for context, none architectural):
- Arch `edk2-aarch64-202602-2` shipped empty firmware blobs;
`202508-1` ships unpadded edk2 that needs 64 MiB pflash padding.
Fixed via runtime discovery + padding in `modules/kvm/firmware.rs`.
- MTTCG isn't the default for cross-arch TCG on QEMU 10.2; force it via
a `qemu:commandline` override. `pauth-impdef=on` is likewise a
`qemu:commandline` opt-in.
- `ensure_vm` short-circuits when the domain already exists, so re-applying
a changed XML requires a manual `virsh undefine --nvram --remove-all-storage`.
Noted as a follow-up in the code comments.
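
A sketch of the padding fix, assuming it amounts to zero-filling the firmware file up to the 64 MiB pflash size (what `truncate -s 64M` does). `pad_pflash` is a hypothetical name; the real discovery + padding lives in `modules/kvm/firmware.rs`.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Pflash size cited above (64 MiB).
pub const PFLASH_SIZE: u64 = 64 * 1024 * 1024;

/// Zero-pads the file at `path` up to `target` bytes.
/// Refuses to shrink: truncating a firmware image would corrupt it.
pub fn pad_pflash(path: &Path, target: u64) -> io::Result<()> {
    let file = OpenOptions::new().write(true).open(path)?;
    let len = file.metadata()?.len();
    if len > target {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("firmware is {len} bytes, larger than pflash size {target}"),
        ));
    }
    // `set_len` extends the file and fills the new region with zeros.
    file.set_len(target)
}
```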
---
## Chapter 1 — Hands-on end-to-end demo (imminent)
**Goal:** the user runs one command, watches operator + NATS + ARM
VM come up, then drives a CRD through the full loop by hand:
`kubectl apply` it (manually or via a typed Rust applier), watch the
operator log "acquired," check the NATS KV store with `natsbox`,
SSH/console into the VM, `curl` the running nginx container from
the workstation.
### User-facing requirements (explicit)
- **No yaml fixtures.** Sample `Deployment` CRs constructed in
typed Rust using `DeploymentSpec` + `PodmanV0Score`. Same
discipline as the `install` Score that replaced `gen-crd | kubectl
apply`.
- **ArgoCD deferred.** User's production clusters have it; bringing
it into the smoke harness adds setup overhead without validating
anything `helm install` doesn't. Chapter 3 produces the chart;
ArgoCD integration is a later operational concern.
- **Operator logs every CR it acquires** — `controller.rs` already
does `tracing::info!(%ns, %name, "reconcile")`; verify the output
reads well in the command-menu hand-off.
- **natsbox debugging is first-class.** Script prints exact
natsbox one-liners at hand-off so the user can inspect KV state.
- **In-cluster NATS.** Not a side-by-side podman container (as
smoke-a1 does today). Expose to the libvirt VM via k3d
loadbalancer port mapping.
### Design decisions
- **Rust CR applier.** New binary `examples/harmony_apply_deployment/`.
CLI flags `--name --namespace --target-device --image --port
--delete`. Constructs the `Deployment` CR via
`kube::Api<Deployment>` + typed `DeploymentSpec`; calls
`api.apply(...)`. Can also `--print` the CR JSON to stdout so
`kubectl apply -f -` still works from the terminal.
- **smoke-a4.sh orchestration stays bash for now.** User agreed
this is test-harness scope, not framework path; converting it
to Rust is "not as important right now."
- **Hand-off is the default mode**, not `--keep`. The whole point
of Chapter 1 is that the user drives the last stage interactively.
`smoke-a4.sh` brings everything up, applies *nothing*, prints
the command menu, waits on `INT/TERM` to tear down. `--auto`
runs the full apply/curl/upgrade/delete regression for CI.
- **In-cluster NATS path.** Preferred: use `harmony::modules::nats`
if it has a lightweight single-node / no-supercluster mode.
Fallback: typed `K8sResourceScore` applying a minimal Deployment
+ NodePort Service. 15-min research task before committing.
### Composed smoke phases (`smoke-a4.sh`)
1. k3d cluster up with `-p "4222:4222@loadbalancer"` so the host
port 4222 forwards into the cluster. Reachable from the
libvirt VM via the gateway IP (typically `192.168.122.1:4222`).
2. NATS in-cluster via the chosen path (harmony module or direct
K8sResourceScore). Wait for readiness.
3. Install CRD via the operator's `install` subcommand (typed Rust).
4. Spawn operator as a host-side process (same pattern as
smoke-a1). Operator connects to `nats://localhost:4222`.
5. Provision ARM VM via `example_iot_vm_setup` (same entry point
smoke-a3 uses). Agent configured to connect to
`nats://<libvirt_gateway>:4222` — discover the gateway IP via
`virsh net-dumpxml default`, as smoke-a3 already does.
6. Sanity: `kubectl wait ... crd Established`, operator logged
"KV bucket ready", agent logged "watching KV keys",
`status.<device>` present in `agent-status` bucket.
7. Hand off. Print the command menu below. Exit 0 with a cleanup
trap on `INT/TERM`.
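
smoke-a3 discovers the gateway from `virsh net-dumpxml default`; if that phase ever moves into Rust, the parse is small. A sketch assuming the network XML carries a single IPv4 `<ip address='...'>` element (typical for libvirt's `default` network). `gateway_ip` and `agent_nats_url` are hypothetical helpers, and the scan is deliberately naive rather than a real XML parser.

```rust
/// Extracts the `address` attribute of the first `<ip ...>` element in
/// `virsh net-dumpxml` output. Naive string scanning, not XML parsing.
pub fn gateway_ip(net_xml: &str) -> Option<&str> {
    // Isolate the first `<ip ...>` opening tag.
    let start = net_xml.find("<ip ")?;
    let tag = &net_xml[start..];
    let tag = &tag[..tag.find('>')?];
    // Locate the `address=` attribute and its quote character.
    let value_start = tag.find("address=")? + "address=".len();
    let rest = &tag[value_start..];
    let quote = rest.chars().next()?;
    if quote != '\'' && quote != '"' {
        return None;
    }
    let value = &rest[1..];
    let end = value.find(quote)?;
    Some(&value[..end])
}

/// NATS URL the agent inside the VM should dial.
pub fn agent_nats_url(gateway: &str, port: u16) -> String {
    format!("nats://{gateway}:{port}")
}
```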
### Command menu at hand-off
- `kubectl get deployments.fleet.nationtech.io -A -w` — watch CR
reconcile reactively.
- `cargo run -q -p example_harmony_apply_deployment -- --image
nginx:latest --target-device $TARGET_DEVICE` — apply an nginx
deployment via typed Rust.
- `cargo run -q -p example_harmony_apply_deployment -- --print
--image nginx:latest --target-device $TARGET_DEVICE |
kubectl apply -f -` — same thing, through kubectl.
- `ssh -i $SSH_KEY fleet-admin@$VM_IP` — connect to the VM.
- `virsh console $VM_NAME --force` — serial console alternative.
- `podman --url ssh://fleet-admin@$VM_IP/... ps` or ssh + `podman ps`
— list containers on the VM from the workstation.
- `podman run --rm --network host docker.io/natsio/nats-box nats --server
nats://localhost:4222 kv ls desired-state` — list desired
state keys (from the host; `--network host` lets the container reach
the host's port 4222).
- `podman run --rm ... nats kv get desired-state
'<device>.<deployment>' --raw` — dump a specific desired state.
- `podman run --rm ... nats kv get agent-status
'status.<device>' --raw` — dump the heartbeat.
- `curl http://$VM_IP:8080/` — hit the deployed nginx.
### `--auto` path (for regression)
1. Apply `nginx:latest`, wait for container on VM, `curl` 200.
2. Apply `nginx:1.26` (upgrade), wait for container *id* to change,
`curl` 200 against the new container.
3. Apply `--delete`, wait for container gone from VM.
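
Each step above is a poll-until-timeout loop (container appears, container id changes, container disappears). A std-only sketch of that shape; `poll_until`, `wait_for_new_container_id`, and the 500 ms interval are assumptions, and the `current_id` closure stands in for whatever actually queries podman on the VM (e.g. over ssh).

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Calls `probe` every `interval` until it returns `Some`, or gives up
/// once `timeout` has elapsed. Returns the first `Some` value, else `None`.
pub fn poll_until<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(v) = probe() {
            return Some(v);
        }
        if Instant::now() >= deadline {
            return None;
        }
        thread::sleep(interval);
    }
}

/// Upgrade assertion: wait until the running container id differs from
/// `old_id`. `current_id` is a stand-in for e.g. `podman ps --format {{.ID}}`.
pub fn wait_for_new_container_id(
    old_id: &str,
    timeout: Duration,
    mut current_id: impl FnMut() -> Option<String>,
) -> Option<String> {
    poll_until(timeout, Duration::from_millis(500), || {
        current_id().filter(|id| id.as_str() != old_id)
    })
}
```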
### Files
- **NEW** `examples/harmony_apply_deployment/Cargo.toml` +
`src/main.rs` — typed applier.
- **NEW** `fleet/scripts/smoke-a4.sh`.
- **NO yaml fixtures.** Rust CLI flags cover the shape.
- Optional: factor shared smoke phases (NATS up, k3d up, operator
spawn, VM provision) into `fleet/scripts/lib/` if the duplication
across a1/a3/a4 becomes obvious. Don't force it.
### NATS exposure — implementation-time notes
- k3d `@loadbalancer` port mapping binds the host's `0.0.0.0:4222`
by default; libvirt VMs on `virbr0` can reach it via the gateway
IP. No special NAT config required.
- Fallback if environmental snag: keep the side-by-side podman
container on an opt-in `NATS_MODE=podman` flag. Don't default
to that — user explicitly asked for in-cluster.
### Verification
- Fresh host: `ARCH=aarch64 ./fleet/scripts/smoke-a4.sh` completes
in 8-15 min, prints the command menu.
- `ARCH=aarch64 ./fleet/scripts/smoke-a4.sh --auto` PASSes
end-to-end including upgrade id-change assertion.
- x86_64 (`ARCH=x86-64`) completes in 2-5 min.
### Explicitly out of scope
- `AgentStatus` / `DeploymentStatus` enrichment — Chapter 2.
- Helm chart, ArgoCD, auth, frontend — later chapters.
- Lifting the applier into a reusable `ApplyDeploymentScore` —
only if a second consumer appears.
---
## Chapter 2 — Status reflect-back + selector-based targeting **[SHIPPED 2026-04-23]**
**Goal:** CRD `.status` reflects fleet reality — per-deployment
success/failure/pending counts, last-error surface, freshness. The
Deployment CR targets devices by label selector, not by id list.
> The shipped design replaces the original `AgentStatus` + list-of-ids
> proposal wholesale. See `chapter_4_aggregation_scale.md` for the
> superseded design-doc archaeology. Commits:
> `refactor(iot): delete legacy AgentStatus path`,
> `refactor(iot): operator watches device-state KV directly; drop event stream`,
> `refactor(iot): Deployment.targetSelector + Device CRD (DaemonSet-like)`.
### What shipped
**Wire format** (in `harmony-reconciler-contracts`): four per-concern
payloads, each on a dedicated NATS KV bucket. The table lists the three
that agents publish; the fourth is the operator-written `desired-state`.
No monolithic per-device blob, no separate event stream.

| Type | Bucket | Cadence |
|------|--------|---------|
| `DeviceInfo` | `device-info` | on startup + label/inventory change |
| `DeploymentState` | `device-state` | on reconcile phase transition |
| `HeartbeatPayload` | `device-heartbeat` | every 30 s |

**CRDs.** Two custom resources:
- `Deployment` (namespaced) — `spec.targetSelector: LabelSelector`
(standard K8s `matchLabels` / `matchExpressions`). No device list
on spec. `.status.aggregate` carries `matchedDeviceCount`,
`succeeded`, `failed`, `pending`, `lastError`.
- `Device` (cluster-scoped, like `Node`) — `metadata.labels` carries
the device's routing labels; `spec.inventory` holds the hardware/OS
snapshot; `status.conditions` is reserved for liveness (populated
lazily by a future heartbeat-freshness reconciler, not every ping).
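
The heartbeat-freshness reconciler is deferred, but the classification it would run is small. A sketch assuming a device turns Stale once three 30 s heartbeat intervals pass without a beat; that threshold, `Liveness`, and `classify` are hypothetical, not shipped code.

```rust
use std::time::{Duration, SystemTime};

/// Heartbeat cadence from the wire-format table.
pub const HEARTBEAT_SECS: u64 = 30;
/// Hypothetical policy: stale once three intervals pass without a beat.
pub const STALE_AFTER: Duration = Duration::from_secs(3 * HEARTBEAT_SECS);

#[derive(Debug, PartialEq, Eq)]
pub enum Liveness {
    Reachable,
    Stale,
}

/// Classifies a device from the timestamp of its last heartbeat.
/// A heartbeat stamped later than `now` (clock skew) counts as fresh.
pub fn classify(last_heartbeat: SystemTime, now: SystemTime) -> Liveness {
    match now.duration_since(last_heartbeat) {
        Ok(age) if age > STALE_AFTER => Liveness::Stale,
        _ => Liveness::Reachable,
    }
}
```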
**Operator tasks** (three concurrent loops in one process):
1. `controller` — validates Deployment CR names, holds the finalizer
that cleans `desired-state.<device>.<deployment>` KV entries on
delete. No writes on apply (aggregator handles that).
2. `device_reconciler` — watches the `device-info` KV; server-side-
applies a `Device` CR per `DeviceInfo` payload, with label
sanitization. Agents remain kube-unaware.
3. `fleet_aggregator` — three caches driven by watches (Deployment
CRs, Device CRs, `device-state` KV). On any change, resolves
each selector against the Device cache, writes/deletes
`desired-state` KV entries for diffed matches, and patches
`.status.aggregate` at 1 Hz for the CRs whose counters moved.
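
The label sanitization in `device_reconciler` has to produce values the apiserver accepts: at most 63 characters, empty or alphanumeric at both ends, with `-`, `_`, `.` and alphanumerics in between. One possible sanitizer against those rules; the replace-with-`-` and trim strategy is an assumption, not necessarily what the operator does.

```rust
/// Kubernetes caps label values at 63 characters.
pub const MAX_LABEL_VALUE_LEN: usize = 63;

/// Hypothetical sanitizer: maps characters outside `[A-Za-z0-9-_.]` to `-`,
/// truncates to 63 characters, then trims non-alphanumerics from both ends
/// so the value starts and ends with an alphanumeric (or is empty).
pub fn sanitize_label_value(raw: &str) -> String {
    let mapped: String = raw
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.') {
                c
            } else {
                '-'
            }
        })
        .take(MAX_LABEL_VALUE_LEN)
        .collect();
    mapped
        .trim_matches(|c: char| !c.is_ascii_alphanumeric())
        .to_string()
}

/// Checks the Kubernetes label-value rules directly.
pub fn is_valid_label_value(v: &str) -> bool {
    if v.is_empty() {
        return true;
    }
    let bytes = v.as_bytes();
    v.len() <= MAX_LABEL_VALUE_LEN
        && bytes[0].is_ascii_alphanumeric()
        && bytes[bytes.len() - 1].is_ascii_alphanumeric()
        && bytes
            .iter()
            .all(|&b| b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.'))
}
```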
**Agents** publish `device-id=<id>` as a default DeviceInfo label, so
targeting a single device with `matchLabels: {device-id: pi-42}` is
zero-config. User-defined labels layer on from agent config (scoped
out of this chapter; follow-up item).
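
The aggregator's per-Deployment step (resolve the selector against the Device cache, diff against what is already written) fits in a few lines. A sketch covering `matchLabels` only, as the operator does today; `matches_labels`, `desired_state_key`, and `diff_matches` are hypothetical names, and the `<device>.<deployment>` key layout follows the natsbox commands in Chapter 1. It diffs match membership only; rewriting entries after a spec change is outside its scope.

```rust
use std::collections::{BTreeMap, BTreeSet};

pub type Labels = BTreeMap<String, String>;

/// `matchLabels` semantics: every selector pair must be present on the
/// device with an equal value. An empty selector matches every device.
pub fn matches_labels(selector: &Labels, device: &Labels) -> bool {
    selector.iter().all(|(k, v)| device.get(k) == Some(v))
}

/// Key layout inside the `desired-state` bucket: `<device>.<deployment>`.
pub fn desired_state_key(device: &str, deployment: &str) -> String {
    format!("{device}.{deployment}")
}

/// Returns (keys to write, keys to delete) for one Deployment, given the
/// device cache and the devices that already have a `desired-state` entry.
pub fn diff_matches(
    deployment: &str,
    selector: &Labels,
    devices: &BTreeMap<String, Labels>,
    previously_matched: &BTreeSet<String>,
) -> (Vec<String>, Vec<String>) {
    let now_matched: BTreeSet<String> = devices
        .iter()
        .filter(|(_, labels)| matches_labels(selector, labels))
        .map(|(id, _)| id.clone())
        .collect();
    let to_write: Vec<String> = now_matched
        .difference(previously_matched)
        .map(|d| desired_state_key(d, deployment))
        .collect();
    let to_delete: Vec<String> = previously_matched
        .difference(&now_matched)
        .map(|d| desired_state_key(d, deployment))
        .collect();
    (to_write, to_delete)
}
```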
### Scale proof
`fleet/scripts/load-test.sh` + `examples/fleet_load_test` simulate N
devices across M Deployments, driving `device-state` KV updates at a
configurable cadence while the full operator stack runs against a
local k3d apiserver. Verified:
- 100 devices / 10 groups / 1 Hz / 60 s — 100 writes/s sustained,
all 10 CR aggregates converge.
- 10 000 devices / 1 000 groups / 1 Hz / 120 s — ~10 000 writes/s
sustained, 0 errors, all 1 000 CR aggregates correct
(`matchedDeviceCount == expected`, `succeeded + failed + pending
== matched`). Same envelope before and after the selector rewrite.
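
The load test's correctness check relies on the invariant `succeeded + failed + pending == matched`. A sketch of a fold that keeps it true by construction, assuming a matched device that has not reported any `DeploymentState` yet counts as pending; `Phase`, `Aggregate`, and `aggregate` are hypothetical stand-ins for the real types, and `lastError` is omitted.

```rust
/// Hypothetical per-device phase, standing in for `DeploymentState`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase {
    Succeeded,
    Failed,
    Pending,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Aggregate {
    pub matched_device_count: usize,
    pub succeeded: usize,
    pub failed: usize,
    pub pending: usize,
}

/// Folds the phases of every matched device. In this sketch a matched
/// device with no reported state (`None`) counts as pending, so every
/// matched device lands in exactly one bucket.
pub fn aggregate(states: &[Option<Phase>]) -> Aggregate {
    let mut agg = Aggregate { matched_device_count: states.len(), ..Default::default() };
    for s in states {
        match s {
            Some(Phase::Succeeded) => agg.succeeded += 1,
            Some(Phase::Failed) => agg.failed += 1,
            Some(Phase::Pending) | None => agg.pending += 1,
        }
    }
    agg
}

/// The per-CR invariant the load test checks.
pub fn invariant_holds(a: &Aggregate) -> bool {
    a.succeeded + a.failed + a.pending == a.matched_device_count
}
```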
### Out of scope in this chapter (follow-ups)
- Agent config-driven labels (`[labels]` in agent toml → DeviceInfo).
~30 lines; deferred until a concrete need lands.
- `matchExpressions` evaluator. Operator currently supports
`matchLabels` only and logs a warning for expression-bearing
selectors. ~50 lines; deferred.
- `Device.status.conditions` populated from heartbeat staleness
(Reachable / Stale transitions). Liveness is computable today by
reading `device-heartbeat` directly; CR-side reflection is a
convenience. ~100 lines; deferred.
- Full journald log streaming. The `.status.aggregate.lastError`
surface covers the user's reflect-back requirement for now.
- Multi-device regression smoke — defer until real hardware or a
second VM is around.
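
The deferred `matchExpressions` evaluator is mostly the four Kubernetes `LabelSelectorRequirement` operators, ANDed with `matchLabels`. A sketch of those semantics (note that `NotIn` also matches devices that lack the key); the type and function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Operators defined for `LabelSelectorRequirement` in Kubernetes.
#[derive(Clone, Copy, Debug)]
pub enum Operator {
    In,
    NotIn,
    Exists,
    DoesNotExist,
}

pub struct Requirement {
    pub key: String,
    pub operator: Operator,
    pub values: Vec<String>,
}

pub fn requirement_matches(req: &Requirement, labels: &BTreeMap<String, String>) -> bool {
    let value = labels.get(&req.key);
    match req.operator {
        Operator::In => value.map_or(false, |v| req.values.contains(v)),
        // NotIn also matches when the key is absent.
        Operator::NotIn => value.map_or(true, |v| !req.values.contains(v)),
        Operator::Exists => value.is_some(),
        Operator::DoesNotExist => value.is_none(),
    }
}

/// `matchLabels` and every `matchExpressions` entry must all hold.
pub fn selector_matches(
    match_labels: &BTreeMap<String, String>,
    match_expressions: &[Requirement],
    labels: &BTreeMap<String, String>,
) -> bool {
    match_labels.iter().all(|(k, v)| labels.get(k) == Some(v))
        && match_expressions.iter().all(|r| requirement_matches(r, labels))
}
```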
---
## Chapter 3 — Helm chart (ArgoCD deferred)
**Goal:** operator ships as a versioned helm chart with CRD
version-locked inside.
User clarified this session: ArgoCD exists in production; all it
does is apply resources from the chart. Standing up ArgoCD in the
smoke adds setup overhead with no incremental validation value.
Chapter 3 produces the chart + validates `helm install / helm
upgrade` lifecycles. ArgoCD consumption is a user operational
concern downstream.
### Sketch
- Chart location: `fleet/harmony-fleet-operator/chart/` (or sibling repo —
defer decision to implementation time).
- Templates: Namespace, SA, ClusterRole, ClusterRoleBinding,
Deployment (operator pod), CRD.
- **CRD yaml in the chart is generated at chart-publish time** from
the Rust `Deployment::crd()`. One-off release artifact, not
framework path — consistent with "no yaml in framework code."
- Values: operator image tag, NATS URL, log level.
- Smoke: `helm install` into k3d → CR apply → same assertions as
Chapter 1.
### Open questions
- Chart repo: subdir vs. separate git repo.
- CRD install mechanism: chart hook vs. templates directory.
Drives CRD upgrade story.
---
## Chapter 4 — Auth: Zitadel + OpenBao + per-device identity
**Goal:** per-device granular NATS credentials; SSO for operator
users; OpenBao policy per device; JWT bootstrap from Zitadel.
Zitadel + OpenBao are already ~99% integrated in harmony; this
chapter is wiring the IoT-specific flows.
### Sketch
- Agent's `CredentialSource` trait (already abstract in agent
`config.rs`) gets a Zitadel-JWT-backed implementation. Mints
short-lived NATS creds via OpenBao auth callout.
- Remove the shared-credentials `toml-shared` variant (v0 demo
leftover).
- Availability: auth-callout caches policies, tolerates OpenBao
outages.
- SSO for operator users (separate flow): Zitadel groups →
Kubernetes RBAC subjects on the `Deployment` CRD.
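
One way the auth callout could tolerate OpenBao outages: cache each device's policy with a TTL, refresh when expired, and keep serving the last known policy if the refresh fails. A std-only sketch under those assumptions; `PolicyCache` is hypothetical, `fetch` stands in for the OpenBao lookup, and whether stale entries should eventually be revoked is a security decision this sketch leaves open.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical policy cache: entries older than `ttl` are refreshed from
/// the backend, but a failed refresh falls back to the cached entry.
pub struct PolicyCache<P> {
    ttl: Duration,
    entries: HashMap<String, (P, Instant)>,
}

impl<P: Clone> PolicyCache<P> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the policy for `device`; `fetch` stands in for OpenBao.
    pub fn get<E>(
        &mut self,
        device: &str,
        now: Instant,
        fetch: impl FnOnce(&str) -> Result<P, E>,
    ) -> Result<P, E> {
        // Fresh cache hit: no backend call.
        if let Some((policy, fetched_at)) = self.entries.get(device) {
            if now.duration_since(*fetched_at) < self.ttl {
                return Ok(policy.clone());
            }
        }
        match fetch(device) {
            Ok(policy) => {
                self.entries.insert(device.to_string(), (policy.clone(), now));
                Ok(policy)
            }
            // Backend unreachable: serve the stale entry if there is one.
            Err(e) => match self.entries.get(device) {
                Some((policy, _)) => Ok(policy.clone()),
                None => Err(e),
            },
        }
    }
}
```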
---
## Chapter 5 — Frontend (last)
**Goal:** operator-friendly UI for the decentralized platform.
Form factor undecided: Leptos web dashboard, CLI extension to
`harmony_cli`, or a TUI. Minimum viable product: read-only view of
fleet state (devices + deployments + aggregated status) powered by
the CRD `.status` from Chapter 2. Aspiration: write operations with
auth from Chapter 4.

---
## Chapter 6 — Customer demo rehearsal **[in progress]**
48-hour customer demo prep. PO assessment concluded that promising a
real-OKD deployment without first proving the JWT-auth chain is
reckless. **VM-based rehearsal first**, OKD second.
The rehearsal extends `smoke-a4` (k3d + libvirt VM + agent + apply
CR + reconcile podman) with **Zitadel + auth callout + agent JWT
auth**. Two devices + one admin. Same code paths as production —
only the cluster topology differs.
Detailed plan: [`v0_demo_e2e.md`](v0_demo_e2e.md).
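
The commit that introduced this chapter describes the rehearsal as real cargo tests sharing a OnceCell bring-up: the expensive cold start runs once and every test reuses it. A std-only sketch with `std::sync::OnceLock`; `Env`, `bring_up`, and the URL are placeholders, and an async suite would more likely use an async cell such as `tokio::sync::OnceCell`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical handle to the shared rehearsal environment
/// (cluster, NATS, Zitadel, VMs) that every test reuses.
#[derive(Debug)]
pub struct Env {
    pub nats_url: String,
}

/// Counts how many times the expensive bring-up actually ran.
pub static BRINGUPS: AtomicUsize = AtomicUsize::new(0);

fn bring_up() -> Env {
    BRINGUPS.fetch_add(1, Ordering::SeqCst);
    // Placeholder for the real, minutes-long cold start.
    Env { nats_url: "nats://127.0.0.1:4222".to_string() }
}

/// Every test calls this. The first caller runs `bring_up`, concurrent
/// callers block until it finishes, later callers get the same `Env`.
pub fn env() -> &'static Env {
    static ENV: OnceLock<Env> = OnceLock::new();
    ENV.get_or_init(bring_up)
}
```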
Once the VM rehearsal is green (success criteria in that doc), the
residual deltas to ship to real OKD are configuration, not new code.

---
## Principles — what we've learned and want to keep doing
- **No yaml in framework code paths.** Every Kubernetes resource is a
typed kube-rs value; every Score apply goes through typed Rust. Yaml generation
happens only at chart-publish time, never at runtime.
- **Scores describe desired state; topologies expose capabilities.**
Prefer adding capability traits over thickening a single topology.
- **Minimal topologies for ad-hoc Score execution.** `K8sAnywhereTopology`
has too many opinions (cert-manager install, tenant-manager bootstrap,
helm probes) for narrow apply-a-CRD use cases. See ROADMAP
§12.6 — a lean shared `K8sBareTopology` is the durable fix.
- **Cross-boundary wire types in `harmony-reconciler-contracts`**,
everything else in its natural crate.
- **Never ship untested code.** Every commit that changes runtime
behavior is verified against a smoke script before landing.
Cargo check + unit tests aren't enough.
- **Prove claims about upstream before blaming upstream.** The
Arch edk2 investigation showed this matters; see
`memory/feedback_prove_before_blaming_upstream.md`.