Adds `ROADMAP/fleet_platform/v0_demo_e2e.md` and threads it from `v0_1_plan.md`. The VM rehearsal extends smoke-a4 (already-green k3d + libvirt VM + agent + apply CR + reconcile loop) with Zitadel + auth callout + agent JWT auth: two devices and one admin, exercised by real cargo tests that share a `OnceCell` bring-up.

The plan calls out:

- The 7 tests, including the load-bearing `agent_recovers_from_nats_pod_restart`, which asserts the auto-reconnect + auth-callout re-mint path under realistic disturbance.
- Five known risks / debugging traps to expect on the first cold start (iam-admin-pat secret timing, /etc/hosts injection, k3d port collisions, etc.).
- Success criteria for the rehearsal day: a cold cargo run goes green in under 20 min, all 7 tests pass on a clean machine, and the NATS-restart test passes reliably 5 runs in a row.
- Anything short of the success criteria → reframe the customer call as "architecture walkthrough + local k3d demo + pilot in 1-2 weeks." That avoids burning the relationship just to hit a deadline.

Once the VM rehearsal is green, the residual OKD deltas are configuration (Route annotations, image registry, real DNS, cert), not new code.
400 lines
17 KiB
Markdown
# IoT Platform v0.1 and beyond — forward plan

Authoritative forward plan for the NationTech decentralized-infra /
IoT platform, written after the v0 walking skeleton shipped
(see `v0_walking_skeleton.md` for the historical diary). Organized as
six chapters: Chapters 1-5 in execution order, plus Chapter 6, the
customer demo rehearsal currently in progress.

## State of the world (as of 2026-04-23)

**Green, end-to-end:**

- CRD → operator → NATS JetStream KV write path (`smoke-a1.sh`).
- Agent watches KV and reconciles podman containers (`smoke-a1.sh`).
- VM-as-device provisioning: cloud-init + fleet-agent install + NATS
  smoke (`smoke-a3.sh`), on x86_64 (native KVM) and aarch64 (TCG).
- Power-cycle / reboot resilience (`smoke-a3.sh` phase 5).
- aarch64 cross-compile of the agent (no Harmony modules need to
  feature-gate aarch64).
- Operator installed via a Harmony Score (typed Rust, no yaml).
- `harmony-reconciler-contracts` crate — cross-boundary types
  (bucket names, key helpers, `DeviceInfo`, `DeploymentState`,
  `HeartbeatPayload`, `DeploymentName`, `Id` re-export).
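The contracts crate is described as holding bucket names and key helpers. As a rough sketch of what such helpers can look like, the block below builds keys in the two layouts used later in this plan (`<device>.<deployment>` in `desired-state`, `status.<device>` in `agent-status`). The function names, the `InvalidToken` type, and the conservative character set are assumptions for illustration, not the crate's actual API.

```rust
/// Bucket names as they appear in this plan (constant names are illustrative).
pub const DESIRED_STATE_BUCKET: &str = "desired-state";
pub const AGENT_STATUS_BUCKET: &str = "agent-status";

/// A device or deployment name that cannot be used as one KV key token.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidToken(pub String);

/// Conservative token check: ASCII alphanumerics, `-` and `_` only.
/// `.` is rejected because it separates tokens in a NATS KV key.
fn validate_token(s: &str) -> Result<&str, InvalidToken> {
    let ok = !s.is_empty()
        && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
    if ok {
        Ok(s)
    } else {
        Err(InvalidToken(s.to_string()))
    }
}

/// `desired-state` key for one (device, deployment) pair: `<device>.<deployment>`.
pub fn desired_state_key(device: &str, deployment: &str) -> Result<String, InvalidToken> {
    Ok(format!("{}.{}", validate_token(device)?, validate_token(deployment)?))
}

/// Prefix shared by every `desired-state` key of one device.
pub fn desired_state_prefix(device: &str) -> Result<String, InvalidToken> {
    Ok(format!("{}.", validate_token(device)?))
}

/// Heartbeat key in the v0 `agent-status` bucket: `status.<device>`.
pub fn agent_status_key(device: &str) -> Result<String, InvalidToken> {
    Ok(format!("status.{}", validate_token(device)?))
}
```

Rejecting `.` inside a name matters because `pi.42` + `nginx` would otherwise produce a three-token key that no longer parses back into a (device, deployment) pair.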

**Chapter 1 shipped** (2026-04-21): composed end-to-end demo
(`smoke-a4.sh`) with the operator in k3d, in-cluster NATS, an ARM VM,
the typed-Rust CR applier, the hand-off menu, and the `--auto`
regression. Green on x86_64 (native KVM) and aarch64 (TCG).

**Chapter 2 shipped** (2026-04-23): selector-based targeting +
Device CRD + `.status.aggregate` reflect-back.
`Deployment.spec.targetSelector: LabelSelector` resolves against
cluster-scoped `Device` CRs materialized from NATS `device-info`.
The operator writes a `desired-state` KV entry per matched pair and
patches `.status.aggregate` (matchedDeviceCount / succeeded / failed /
pending / lastError) at 1 Hz. Load-tested to 10 000 devices ×
1 000 Deployments at 10 000 KV writes/s sustained, zero errors.

**Not yet wired (real v0.1 work still to go):**

- Helm packaging of the operator (Chapter 3).
- Zitadel + OpenBao auth (per-device credentials, SSO for
  operator users). The agent side only has a placeholder
  `CredentialSource` trait so far (Chapter 4).
- Any frontend (Chapter 5).
- Small quality items (not blockers): agent config-driven labels,
  `matchExpressions` in selectors, `Device.status.conditions`
  populated from heartbeat staleness.

**Verified during planning** (so future implementation doesn't
have to re-litigate):

- **Upgrade already works.** `reconciler.rs::apply` byte-compares
  serialized score payloads; drift triggers a re-reconcile.
  `PodmanTopology::ensure_service_running` removes, then re-creates,
  containers on spec drift, so stale and new containers never run
  side by side.
- **The polymorphism stays.** `ReconcileScore` is an externally-tagged
  enum; adding `OkdApplyV0` later is additive.
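The upgrade claim boils down to "re-apply only when the serialized payload changed." A std-only sketch of that comparison, assuming the agent keeps the last successfully applied bytes per KV key in memory; `DriftTracker` and its methods are hypothetical names, not the code in `reconciler.rs`.

```rust
use std::collections::HashMap;

/// Remembers the serialized payload last applied successfully for each KV key.
#[derive(Default)]
pub struct DriftTracker {
    last_applied: HashMap<String, Vec<u8>>,
}

impl DriftTracker {
    /// True for a new key, or when the bytes differ from the last applied payload.
    pub fn needs_apply(&self, key: &str, payload: &[u8]) -> bool {
        self.last_applied
            .get(key)
            .map_or(true, |prev| prev.as_slice() != payload)
    }

    /// Call only after the apply succeeded, so a failed apply is retried.
    pub fn record_applied(&mut self, key: &str, payload: &[u8]) {
        self.last_applied.insert(key.to_string(), payload.to_vec());
    }

    /// Drop state once the desired-state entry is deleted.
    pub fn forget(&mut self, key: &str) {
        self.last_applied.remove(key);
    }
}
```

Splitting the check from the record step is the design choice worth keeping: if the bytes were recorded before the apply, a failed apply would look "unchanged" on the next event and never be retried.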

**Surprises since v0 started** (for context; none architectural):

- Arch `edk2-aarch64-202602-2` shipped empty firmware blobs;
  `202508-1` ships unpadded edk2 that needs 64 MiB pflash padding.
  Fixed via runtime discovery + padding in `modules/kvm/firmware.rs`.
- MTTCG isn't the default for cross-arch TCG on QEMU 10.2; force it
  via a `qemu:commandline` override. `pauth-impdef=on` is likewise a
  `qemu:commandline` opt-in.
- `ensure_vm` is idempotent on "domain exists", so re-applying a
  changed XML requires a manual
  `virsh undefine --nvram --remove-all-storage`. Noted as a
  follow-up in the code comments.
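The padding fix amounts to extending the firmware image with zero bytes until it is exactly as large as the 64 MiB pflash device. A sketch of that step, assuming the blob is padded in memory before libvirt sees it; `pad_firmware` and `PadError` are hypothetical and do not reproduce `modules/kvm/firmware.rs`.

```rust
/// pflash size the padding targets (64 MiB, per the note above).
pub const PFLASH_SIZE: usize = 64 * 1024 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum PadError {
    /// Zero-length blob, as shipped by the broken 202602-2 package.
    Empty,
    /// Blob already larger than the pflash device; padding cannot help.
    TooLarge(usize),
}

/// Zero-pad `blob` to exactly `target` bytes. A blob that is already
/// the right size is returned unchanged.
pub fn pad_firmware(mut blob: Vec<u8>, target: usize) -> Result<Vec<u8>, PadError> {
    if blob.is_empty() {
        return Err(PadError::Empty);
    }
    if blob.len() > target {
        return Err(PadError::TooLarge(blob.len()));
    }
    blob.resize(target, 0);
    Ok(blob)
}
```

Rejecting empty input explicitly turns the `202602-2` failure mode into a clear error instead of a 64 MiB image of zeros that fails to boot.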

---

## Chapter 1 — Hands-on end-to-end demo **[SHIPPED 2026-04-21]**

**Goal:** the user runs one command, watches the operator, NATS, and
the ARM VM come up, then drives a CR through the full loop by hand:
`kubectl apply` it (manually or via a typed Rust applier), watch the
operator log "acquired," check the NATS KV store with `natsbox`,
SSH/console into the VM, and `curl` the running nginx container from
the workstation.

### User-facing requirements (explicit)

- **No yaml fixtures.** Sample `Deployment` CRs are constructed in
  typed Rust using `DeploymentSpec` + `PodmanV0Score`, the same
  discipline as the `install` Score that replaced
  `gen-crd | kubectl apply`.
- **ArgoCD deferred.** The user's production clusters have it, but
  bringing it into the smoke harness adds setup overhead without
  validating anything `helm install` doesn't. Chapter 3 produces the
  chart; ArgoCD integration is a later operational concern.
- **Operator logs every CR it acquires.** `controller.rs` already
  does `tracing::info!(%ns, %name, "reconcile")`; verify the output
  reads well in the command-menu hand-off.
- **natsbox debugging is first-class.** The script prints exact
  natsbox one-liners at hand-off so the user can inspect KV state.
- **In-cluster NATS**, not a side-by-side podman container (as
  smoke-a1 does today), exposed to the libvirt VM via k3d
  loadbalancer port mapping.

### Design decisions

- **Rust CR applier.** New binary `examples/harmony_apply_deployment/`
  with CLI flags `--name --namespace --target-device --image --port
  --delete`. It builds the `Deployment` CR from a typed
  `DeploymentSpec` and server-side applies it through
  `kube::Api<Deployment>` (`Api::patch` with `Patch::Apply`). It can
  also `--print` the CR JSON to stdout so `kubectl apply -f -` still
  works from the terminal.
- **smoke-a4.sh orchestration stays bash for now.** The user agreed
  this is test-harness scope, not framework path; converting it
  to Rust is "not as important right now."
- **Hand-off is the default mode**, not `--keep`. The whole point
  of Chapter 1 is that the user drives the last stage interactively.
  `smoke-a4.sh` brings everything up, applies *nothing*, prints
  the command menu, and waits on `INT/TERM` to tear down. `--auto`
  runs the full apply/curl/upgrade/delete regression for CI.
- **In-cluster NATS path.** Preferred: use `harmony::modules::nats`
  if it has a lightweight single-node / no-supercluster mode.
  Fallback: a typed `K8sResourceScore` applying a minimal Deployment
  and a NodePort Service. Timebox a 15-min research task before
  committing to either.
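The applier's flag surface (including `--print`) is small enough to sketch with nothing but the standard library. The sketch below is hypothetical: the real binary may well use a CLI crate, and the rule that an apply needs `--image` and `--target-device` while `--delete` does not is an assumption drawn from the command menu.

```rust
/// Flags of the typed applier, as listed above. Types and validation are assumptions.
#[derive(Debug, Default, PartialEq)]
pub struct ApplyArgs {
    pub name: Option<String>,
    pub namespace: Option<String>,
    pub target_device: Option<String>,
    pub image: Option<String>,
    pub port: Option<u16>,
    pub delete: bool,
    pub print: bool,
}

/// Parse `--flag value` pairs plus the boolean `--delete` / `--print`.
pub fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<ApplyArgs, String> {
    let mut out = ApplyArgs::default();
    let mut it = args.into_iter();
    while let Some(flag) = it.next() {
        match flag.as_str() {
            "--delete" => out.delete = true,
            "--print" => out.print = true,
            "--name" | "--namespace" | "--target-device" | "--image" | "--port" => {
                let value = it.next().ok_or_else(|| format!("{flag} needs a value"))?;
                match flag.as_str() {
                    "--name" => out.name = Some(value),
                    "--namespace" => out.namespace = Some(value),
                    "--target-device" => out.target_device = Some(value),
                    "--image" => out.image = Some(value),
                    // Only `--port` is left; reject anything that is not a u16.
                    _ => out.port = Some(value.parse::<u16>().map_err(|e| format!("--port: {e}"))?),
                }
            }
            other => return Err(format!("unknown flag {other}")),
        }
    }
    // Applying needs an image and a target; deleting only needs to find the CR.
    if !out.delete && (out.image.is_none() || out.target_device.is_none()) {
        return Err("--image and --target-device are required unless --delete is set".into());
    }
    Ok(out)
}
```

In a `main`, this would be called as `parse_args(std::env::args().skip(1))`, skipping the program name.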

### Composed smoke phases (`smoke-a4.sh`)

1. k3d cluster up with `-p "4222:4222@loadbalancer"` so host
   port 4222 forwards into the cluster. Reachable from the
   libvirt VM via the gateway IP (typically `192.168.122.1:4222`).
2. NATS in-cluster via the chosen path (harmony module or direct
   `K8sResourceScore`). Wait for readiness.
3. Install the CRD via the operator's `install` subcommand (typed Rust).
4. Spawn the operator as a host-side process (same pattern as
   smoke-a1). The operator connects to `nats://localhost:4222`.
5. Provision the ARM VM via `example_iot_vm_setup` (the same entry
   point smoke-a3 uses). The agent is configured to connect to
   `nats://<libvirt_gateway>:4222`; discover the gateway IP via
   `virsh net-dumpxml default`, as smoke-a3 already does.
6. Sanity: `kubectl wait ... crd Established`, operator logged
   "KV bucket ready", agent logged "watching KV keys",
   `status.<device>` present in the `agent-status` bucket.
7. Hand off. Print the command menu below. Exit 0 with a cleanup
   trap on `INT/TERM`.
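Phase 5 reads the gateway address out of `virsh net-dumpxml default`, which reports it as the `address` attribute of the network's `<ip>` element (for the default network, typically `192.168.122.1`). A Rust sketch of that extraction, assuming libvirt's own output format; it is a naive scanner rather than a real XML parser, it skips non-IPv4 `<ip>` entries such as an IPv6 block, and `libvirt_gateway` is a hypothetical name. smoke-a3 does this step in bash.

```rust
use std::net::Ipv4Addr;

/// Return the first IPv4 `address` attribute found on an `<ip ...>` element.
/// Adequate for libvirt's generated network XML, not for arbitrary XML.
pub fn libvirt_gateway(xml: &str) -> Option<Ipv4Addr> {
    let mut rest = xml;
    while let Some(start) = rest.find("<ip ") {
        // The opening tag runs up to the next '>'.
        let tag_end = rest[start..].find('>')? + start;
        let tag = &rest[start..tag_end];
        if let Some(attr) = tag.find("address=") {
            let after = &tag[attr + "address=".len()..];
            let quote = after.chars().next()?;
            if quote == '\'' || quote == '"' {
                let value = &after[1..];
                if let Some(end) = value.find(quote) {
                    // IPv6 addresses fail this parse and are skipped.
                    if let Ok(ip) = value[..end].parse::<Ipv4Addr>() {
                        return Some(ip);
                    }
                }
            }
        }
        rest = &rest[tag_end..];
    }
    None
}
```

The agent's NATS URL is then `format!("nats://{gw}:4222")` for the returned address.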

### Command menu at hand-off

- `kubectl get deployments.fleet.nationtech.io -A -w` — watch the CR
  reconcile reactively.
- `cargo run -q -p example_harmony_apply_deployment -- --image
  nginx:latest --target-device $TARGET_DEVICE` — apply an nginx
  deployment via typed Rust.
- `cargo run -q -p example_harmony_apply_deployment -- --print
  --image nginx:latest --target-device $TARGET_DEVICE |
  kubectl apply -f -` — same thing, through kubectl.
- `ssh -i $SSH_KEY fleet-admin@$VM_IP` — connect to the VM.
- `virsh console $VM_NAME --force` — serial console alternative.
- `podman --url unix://$VM_IP:... ps` or ssh + `podman ps`
  — list containers on the VM from the workstation.
- `podman run --rm --network host docker.io/natsio/nats-box nats
  --server nats://localhost:4222 kv ls desired-state` — list
  desired-state keys from the host (`--network host` makes
  `localhost` the host's loopback rather than the container's).
- `podman run --rm ... nats kv get desired-state
  '<device>.<deployment>' --raw` — dump a specific desired state.
- `podman run --rm ... nats kv get agent-status
  'status.<device>' --raw` — dump the heartbeat.
- `curl http://$VM_IP:8080/` — hit the deployed nginx.

### `--auto` path (for regression)

1. Apply `nginx:latest`, wait for the container on the VM, and expect
   a 200 from `curl`.
2. Apply `nginx:1.26` (upgrade), wait for the container *id* to change,
   and expect a 200 from `curl` against the new container.
3. Apply `--delete` and wait for the container to disappear from the VM.
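Each `--auto` step is a poll-until-true-or-timeout loop: container present, container id changed, container gone. A generic std-only sketch of that loop; `wait_until` and `wait_for_new_container` are hypothetical helpers, the 500 ms poll interval is arbitrary, and `current_id` stands in for whatever actually queries podman on the VM (for example `podman ps` over ssh).

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `check` every `interval` until it yields `Some`, or give up after `timeout`.
pub fn wait_until<T>(
    timeout: Duration,
    interval: Duration,
    mut check: impl FnMut() -> Option<T>,
) -> Result<T, String> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(v) = check() {
            return Ok(v);
        }
        if Instant::now() >= deadline {
            return Err(format!("condition not met within {timeout:?}"));
        }
        sleep(interval);
    }
}

/// Upgrade assertion: wait until the running container id differs from `old_id`.
/// `current_id` returns `None` while no container is running.
pub fn wait_for_new_container(
    old_id: &str,
    timeout: Duration,
    mut current_id: impl FnMut() -> Option<String>,
) -> Result<String, String> {
    wait_until(timeout, Duration::from_millis(500), || {
        current_id().filter(|id| id != old_id)
    })
}
```

Asserting on a *changed id* rather than on "a container exists" is what makes step 2 catch an upgrade that silently kept the old container.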

### Files

- **NEW** `examples/harmony_apply_deployment/Cargo.toml` +
  `src/main.rs` — typed applier.
- **NEW** `fleet/scripts/smoke-a4.sh`.
- **NO yaml fixtures.** Rust CLI flags cover the shape.
- Optional: factor shared smoke phases (NATS up, k3d up, operator
  spawn, VM provision) into `fleet/scripts/lib/` if the duplication
  across a1/a3/a4 becomes obvious. Don't force it.

### NATS exposure — implementation-time notes

- k3d `@loadbalancer` port mapping binds the host's `0.0.0.0:4222`
  by default; libvirt VMs on `virbr0` can reach it via the gateway
  IP. No special NAT config required.
- Fallback if the environment gets in the way: keep the side-by-side
  podman container behind an opt-in `NATS_MODE=podman` flag. Don't
  default to it; the user explicitly asked for in-cluster NATS.

### Verification

- Fresh host: `ARCH=aarch64 ./fleet/scripts/smoke-a4.sh` completes
  in 8-15 min and prints the command menu.
- `ARCH=aarch64 ./fleet/scripts/smoke-a4.sh --auto` PASSes
  end-to-end, including the upgrade id-change assertion.
- x86_64 (`ARCH=x86-64`) completes in 2-5 min.

### Explicitly out of scope

- `AgentStatus` / `DeploymentStatus` enrichment — Chapter 2.
- Helm chart, ArgoCD, auth, frontend — later chapters.
- Lifting the applier into a reusable `ApplyDeploymentScore` —
  only if a second consumer appears.

---

## Chapter 2 — Status reflect-back + selector-based targeting **[SHIPPED 2026-04-23]**

**Goal:** CRD `.status` reflects fleet reality: per-deployment
success/failure/pending counts, a last-error surface, freshness. The
Deployment CR targets devices by label selector, not by id list.

> The shipped design replaces the original `AgentStatus` + list-of-ids
> proposal wholesale. See `chapter_4_aggregation_scale.md` for the
> superseded design-doc archaeology. Commits:
> `refactor(iot): delete legacy AgentStatus path`,
> `refactor(iot): operator watches device-state KV directly; drop event stream`,
> `refactor(iot): Deployment.targetSelector + Device CRD (DaemonSet-like)`.

### What shipped

**Wire format** (in `harmony-reconciler-contracts`): per-concern
payloads on dedicated NATS KV buckets, with no monolithic per-device
blob and no separate event stream. Alongside the operator-written
`desired-state` bucket, agents publish:

| Type | Bucket | Cadence |
|------|--------|---------|
| `DeviceInfo` | `device-info` | on startup + label/inventory change |
| `DeploymentState` | `device-state` | on reconcile phase transition |
| `HeartbeatPayload` | `device-heartbeat` | every 30 s |

**CRDs.** Two custom resources:

- `Deployment` (namespaced) — `spec.targetSelector: LabelSelector`
  (standard K8s `matchLabels` / `matchExpressions`; only `matchLabels`
  is evaluated today, see follow-ups). No device list on spec.
  `.status.aggregate` carries `matchedDeviceCount`, `succeeded`,
  `failed`, `pending`, `lastError`.
- `Device` (cluster-scoped, like `Node`) — `metadata.labels` carries
  the device's routing labels; `spec.inventory` holds the hardware/OS
  snapshot; `status.conditions` is reserved for liveness (populated
  lazily by a future heartbeat-freshness reconciler, not every ping).

**Operator tasks** (three concurrent loops in one process):

1. `controller` — validates Deployment CR names and holds the
   finalizer that cleans `desired-state.<device>.<deployment>` KV
   entries on delete. No writes on apply (the aggregator handles those).
2. `device_reconciler` — watches the `device-info` KV and server-side
   applies a `Device` CR per `DeviceInfo` payload, with label
   sanitization. Agents remain kube-unaware.
3. `fleet_aggregator` — three caches driven by watches (Deployment
   CRs, Device CRs, `device-state` KV). On any change, it resolves
   each selector against the Device cache, writes/deletes
   `desired-state` KV entries for diffed matches, and patches
   `.status.aggregate` at 1 Hz for the CRs whose counters moved.
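The aggregator's "writes/deletes `desired-state` KV entries for diffed matches" step is a set difference between the devices a Deployment matched before and after a change. A std-only sketch, assuming matches are tracked as sorted sets of device ids and keys use the `<device>.<deployment>` layout; `KvOp` and `diff_matches` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// One mutation against the `desired-state` bucket.
#[derive(Debug, PartialEq, Eq)]
pub enum KvOp {
    Put { key: String },
    Delete { key: String },
}

/// Puts for newly matched devices, deletes for devices that stopped matching.
/// Devices in both sets need no write: their entry is already in place.
pub fn diff_matches(
    deployment: &str,
    before: &BTreeSet<String>,
    after: &BTreeSet<String>,
) -> Vec<KvOp> {
    let key = |device: &String| format!("{device}.{deployment}");
    let puts = after.difference(before).map(|d| KvOp::Put { key: key(d) });
    let deletes = before.difference(after).map(|d| KvOp::Delete { key: key(d) });
    puts.chain(deletes).collect()
}
```

In this sketch, KV writes scale with churn rather than with fleet size. A change to the Deployment's spec itself would still require rewriting every matched entry; the sketch covers only selector and device churn.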

**Agents** publish `device-id=<id>` as a default DeviceInfo label, so
targeting a single device with `matchLabels: {device-id: pi-42}` is
zero-config. User-defined labels layer on from agent config (scoped
out of this chapter; follow-up item).
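With only `matchLabels` evaluated today, resolving a selector reduces to a subset test over each Device's labels. A std-only sketch, assuming labels are held in a `BTreeMap<String, String>` rather than the `LabelSelector` type the operator actually consumes; `default_labels`, `matches`, and `resolve` are hypothetical. It follows the Kubernetes convention that an empty selector matches every object, which is worth confirming against the operator's actual behavior.

```rust
use std::collections::BTreeMap;

pub type Labels = BTreeMap<String, String>;

/// The label every agent publishes by default in its DeviceInfo.
pub fn default_labels(device_id: &str) -> Labels {
    let mut labels = Labels::new();
    labels.insert("device-id".to_string(), device_id.to_string());
    labels
}

/// `matchLabels` semantics: every selector pair must be present with an
/// equal value. An empty selector therefore matches every device.
pub fn matches(match_labels: &Labels, device_labels: &Labels) -> bool {
    match_labels
        .iter()
        .all(|(k, v)| device_labels.get(k) == Some(v))
}

/// Resolve a selector against a device cache keyed by device id.
pub fn resolve<'a>(match_labels: &Labels, devices: &'a BTreeMap<String, Labels>) -> Vec<&'a str> {
    devices
        .iter()
        .filter(|(_, labels)| matches(match_labels, labels))
        .map(|(id, _)| id.as_str())
        .collect()
}
```

Usage: a selector built with `default_labels("pi-42")` is exactly `matchLabels: {device-id: pi-42}` and resolves to that one device.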

### Scale proof

`fleet/scripts/load-test.sh` + `examples/fleet_load_test` simulate N
devices across M Deployments, driving `device-state` KV updates at a
configurable cadence while the full operator stack runs against a
local k3d apiserver. Verified:

- 100 devices / 10 groups / 1 Hz / 60 s — 100 writes/s sustained,
  all 10 CR aggregates converge.
- 10 000 devices / 1 000 groups / 1 Hz / 120 s — ~10 000 writes/s
  sustained, 0 errors, all 1 000 CR aggregates correct
  (`matchedDeviceCount == expected`, `succeeded + failed + pending
  == matched`). Same envelope before and after the selector rewrite.
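The correctness check `succeeded + failed + pending == matched` holds by construction if every matched device is counted in exactly one bucket. A sketch of that fold, assuming a matched device with no `DeploymentState` yet counts as pending and that `lastError` is the most recent failure; the `Phase` variants and the `at_unix` timestamp are hypothetical, not the contracts crate's types.

```rust
use std::collections::BTreeMap;

/// Reconcile phase reported per (device, deployment); variant names are assumed.
#[derive(Clone, Debug, PartialEq)]
pub enum Phase {
    Pending,
    Succeeded,
    Failed { error: String, at_unix: u64 },
}

#[derive(Debug, Default, PartialEq)]
pub struct Aggregate {
    pub matched_device_count: usize,
    pub succeeded: usize,
    pub failed: usize,
    pub pending: usize,
    pub last_error: Option<String>,
}

/// Fold the states of the matched devices into `.status.aggregate`.
/// States of devices that no longer match are simply ignored.
pub fn aggregate(matched: &[&str], states: &BTreeMap<&str, Phase>) -> Aggregate {
    let mut agg = Aggregate { matched_device_count: matched.len(), ..Default::default() };
    let mut newest_error: Option<(u64, &str)> = None;
    for device in matched {
        match states.get(device) {
            // No state reported yet counts as pending.
            None | Some(Phase::Pending) => agg.pending += 1,
            Some(Phase::Succeeded) => agg.succeeded += 1,
            Some(Phase::Failed { error, at_unix }) => {
                agg.failed += 1;
                if newest_error.map_or(true, |(t, _)| *at_unix > t) {
                    newest_error = Some((*at_unix, error.as_str()));
                }
            }
        }
    }
    agg.last_error = newest_error.map(|(_, e)| e.to_string());
    // Every matched device lands in exactly one bucket.
    debug_assert_eq!(agg.succeeded + agg.failed + agg.pending, agg.matched_device_count);
    agg
}
```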

### Out of scope in this chapter (follow-ups)

- Agent config-driven labels (`[labels]` in agent toml → DeviceInfo).
  ~30 lines; deferred until a concrete need lands.
- `matchExpressions` evaluator. The operator currently supports
  `matchLabels` only and logs a warning for expression-bearing
  selectors. ~50 lines; deferred.
- `Device.status.conditions` populated from heartbeat staleness
  (Reachable / Stale transitions). Liveness is computable today by
  reading `device-heartbeat` directly; CR-side reflection is a
  convenience. ~100 lines; deferred.
- Full journald log streaming. The `.status.aggregate.lastError`
  surface covers the user's reflect-back requirement for now.
- Multi-device regression smoke — deferred until real hardware or a
  second VM is available.
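The deferred `Device.status.conditions` work is essentially a threshold on heartbeat age plus change detection. A sketch, assuming a device turns `Stale` after three missed 30 s heartbeats (an illustrative threshold, not a decided value) and that a condition is written only on a transition, in line with "not every ping" in the CRD description. `Liveness`, `liveness`, and `transition` are hypothetical names.

```rust
use std::time::Duration;

/// Illustrative threshold: three missed 30 s heartbeats.
pub const STALE_AFTER: Duration = Duration::from_secs(3 * 30);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Liveness {
    Reachable,
    Stale,
}

/// Classify a device by the age of its last `device-heartbeat` entry.
pub fn liveness(age: Duration) -> Liveness {
    if age > STALE_AFTER {
        Liveness::Stale
    } else {
        Liveness::Reachable
    }
}

/// `Some(new state)` only when it differs from what the CR already shows,
/// so the Device CR is patched on flips rather than on every heartbeat.
pub fn transition(previous: Option<Liveness>, age: Duration) -> Option<Liveness> {
    let now = liveness(age);
    if previous == Some(now) {
        None
    } else {
        Some(now)
    }
}
```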

---

## Chapter 3 — Helm chart (ArgoCD deferred)

**Goal:** the operator ships as a versioned Helm chart with the CRDs
version-locked inside.

The user clarified that ArgoCD already exists in production and all it
does is apply resources from the chart. Standing up ArgoCD in the
smoke adds setup overhead with no incremental validation value.

Chapter 3 produces the chart and validates the `helm install` /
`helm upgrade` lifecycles. ArgoCD consumption is a downstream
operational concern for the user.

### Sketch

- Chart location: `fleet/harmony-fleet-operator/chart/` (or a sibling
  repo; defer the decision to implementation time).
- Templates: Namespace, ServiceAccount, ClusterRole,
  ClusterRoleBinding, Deployment (operator pod), CRDs (`Deployment`
  and, since Chapter 2, `Device`).
- **CRD yaml in the chart is generated at chart-publish time** from
  the Rust CRD types (`Deployment::crd()`, and likewise `Device`).
  One-off release artifact, not framework path — consistent with
  "no yaml in framework code."
- Values: operator image tag, NATS URL, log level.
- Smoke: `helm install` into k3d → CR apply → same assertions as
  Chapter 1.
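Chart values usually reach the operator pod as environment variables set by the chart's Deployment template. A sketch of the operator side, assuming a hypothetical `FLEET_NATS_URL` variable, the conventional `RUST_LOG` variable for tracing filters, and an `info` default for the log level. The image tag needs no counterpart here because it only selects the container image.

```rust
/// Operator settings fed from chart values.
#[derive(Debug, PartialEq)]
pub struct OperatorConfig {
    pub nats_url: String,
    pub log_level: String,
}

/// Read settings through `lookup` so tests need not touch the real
/// environment; in `main`, pass `|k| std::env::var(k).ok()`.
pub fn from_env(lookup: impl Fn(&str) -> Option<String>) -> Result<OperatorConfig, String> {
    // Required: without NATS the operator has nothing to reconcile against.
    let nats_url = lookup("FLEET_NATS_URL").ok_or("FLEET_NATS_URL is not set")?;
    // Optional: fall back to `info` when the value is unset or empty.
    let log_level = lookup("RUST_LOG")
        .filter(|v| !v.is_empty())
        .unwrap_or_else(|| "info".to_string());
    Ok(OperatorConfig { nats_url, log_level })
}
```

Failing fast on a missing NATS URL surfaces a bad `values.yaml` at pod start instead of as a silent reconcile stall.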

### Open questions

- Chart repo: subdir vs. separate git repo.
- CRD install mechanism: chart hook vs. templates directory vs.
  Helm's dedicated `crds/` directory (installed by `helm install` but
  never upgraded or deleted by Helm). The choice drives the CRD
  upgrade story.

---

## Chapter 4 — Auth: Zitadel + OpenBao + per-device identity

**Goal:** per-device granular NATS credentials; SSO for operator
users; OpenBao policy per device; JWT bootstrap from Zitadel.

Zitadel + OpenBao are already ~99% integrated in Harmony; this
chapter wires up the IoT-specific flows.

### Sketch

- The agent's `CredentialSource` trait (already abstract in agent
  `config.rs`) gets a Zitadel-JWT-backed implementation that mints
  short-lived NATS creds via the OpenBao auth callout.
- Remove the shared-credentials `toml-shared` variant (a v0 demo
  leftover).
- Availability: the auth callout caches policies so it tolerates
  OpenBao outages.
- SSO for operator users (separate flow): Zitadel groups →
  Kubernetes RBAC subjects on the `Deployment` CRD.
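The Zitadel-backed `CredentialSource` has to hand out short-lived NATS credentials and re-mint them shortly before they expire. A std-only sketch of that caching behavior, with the clock passed in for testability; the trait signature, `Minted`, `Refreshing`, and the refresh skew are assumptions, and the real trait in the agent's `config.rs` may look different.

```rust
use std::time::{Duration, SystemTime};

/// A minted credential and the moment it stops being valid.
#[derive(Clone, Debug, PartialEq)]
pub struct Minted {
    pub jwt: String,
    pub expires_at: SystemTime,
}

/// Hypothetical shape of the agent's credential abstraction.
pub trait CredentialSource {
    fn credentials(&mut self, now: SystemTime) -> Result<Minted, String>;
}

/// Wraps a minting function (e.g. Zitadel JWT exchanged via the auth callout)
/// and re-mints only when the cached credential is within `skew` of expiry.
pub struct Refreshing<F: FnMut() -> Result<Minted, String>> {
    mint: F,
    skew: Duration,
    cached: Option<Minted>,
}

impl<F: FnMut() -> Result<Minted, String>> Refreshing<F> {
    pub fn new(mint: F, skew: Duration) -> Self {
        Self { mint, skew, cached: None }
    }
}

impl<F: FnMut() -> Result<Minted, String>> CredentialSource for Refreshing<F> {
    fn credentials(&mut self, now: SystemTime) -> Result<Minted, String> {
        let fresh_enough = self
            .cached
            .as_ref()
            .map_or(false, |c| now + self.skew < c.expires_at);
        if !fresh_enough {
            self.cached = Some((self.mint)()?);
        }
        Ok(self.cached.clone().expect("cache populated above"))
    }
}
```

Refreshing ahead of expiry by `skew` means a reconnect (for example after a NATS pod restart) presents a credential that is still valid, rather than one that expires mid-handshake.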

---

## Chapter 5 — Frontend (last)

**Goal:** operator-friendly UI for the decentralized platform.

Form factor undecided: Leptos web dashboard, CLI extension to
`harmony_cli`, or a TUI. Minimum viable product: a read-only view of
fleet state (devices + deployments + aggregated status) powered by
the CRD `.status` from Chapter 2. Aspiration: write operations with
auth from Chapter 4.

---

## Chapter 6 — Customer demo rehearsal **[in progress]**

48-hour customer demo prep. The PO assessment concluded that
promising a real-OKD deployment without first proving the JWT-auth
chain is reckless. **VM-based rehearsal first**, OKD second.

The rehearsal extends `smoke-a4` (k3d + libvirt VM + agent + apply
CR + reconcile podman) with **Zitadel + auth callout + agent JWT
auth**. Two devices + one admin. Same code paths as production —
only the cluster topology differs.

Detailed plan: [`v0_demo_e2e.md`](v0_demo_e2e.md).

Once the VM rehearsal is green (success criteria in that doc), the
residual deltas to ship to real OKD are configuration (Route
annotations, image registry, real DNS, cert), not new code.

---

## Principles — what we've learned and want to keep doing

- **No yaml in framework code paths.** Every Kubernetes resource is a
  typed kube-rs struct; every Score apply goes through typed Rust.
  Yaml generation happens only at chart-publish time, never at runtime.
- **Scores describe desired state; topologies expose capabilities.**
  Prefer adding capability traits over thickening a single topology.
- **Minimal topologies for ad-hoc Score execution.** `K8sAnywhereTopology`
  has too many opinions (cert-manager install, tenant-manager bootstrap,
  helm probes) for narrow apply-a-CRD use cases. See ROADMAP
  §12.6 — a lean shared `K8sBareTopology` is the durable fix.
- **Cross-boundary wire types live in `harmony-reconciler-contracts`**;
  everything else lives in its natural crate.
- **Never ship untested code.** Every commit that changes runtime
  behavior is verified against a smoke script before landing.
  Cargo check + unit tests aren't enough.
- **Prove claims about upstream before blaming upstream.** The
  Arch edk2 investigation showed why this matters; see
  `memory/feedback_prove_before_blaming_upstream.md`.
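The capabilities principle can be shown in miniature: a Score is generic over any topology that implements the capability trait it needs, so a lean topology only has to implement what narrow Scores use. All names here (`KvStore`, `Score`, `ApplyDesiredState`, `BareTopology`) are hypothetical, loosely echoing the proposed `K8sBareTopology`, and are not Harmony's actual trait definitions.

```rust
use std::collections::BTreeMap;

/// Capability: somewhere desired state can be stored by key.
pub trait KvStore {
    fn put(&mut self, key: &str, value: &[u8]) -> Result<(), String>;
}

/// A Score states desired state and is applied to a topology `T`.
pub trait Score<T> {
    fn apply(&self, topology: &mut T) -> Result<(), String>;
}

pub struct ApplyDesiredState {
    pub key: String,
    pub payload: Vec<u8>,
}

// Works on *any* topology exposing the KvStore capability, and nothing more.
impl<T: KvStore> Score<T> for ApplyDesiredState {
    fn apply(&self, topology: &mut T) -> Result<(), String> {
        topology.put(&self.key, &self.payload)
    }
}

/// A lean topology implementing one capability and nothing else
/// (no cert-manager install, no tenant bootstrap).
#[derive(Default)]
pub struct BareTopology {
    pub kv: BTreeMap<String, Vec<u8>>,
}

impl KvStore for BareTopology {
    fn put(&mut self, key: &str, value: &[u8]) -> Result<(), String> {
        self.kv.insert(key.to_string(), value.to_vec());
        Ok(())
    }
}
```

Adding a new capability means adding a trait and implementing it where it makes sense, rather than growing one topology that every Score must then drag along.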