Adds ROADMAP/fleet_platform/v0_demo_e2e.md and threads it from v0_1_plan.md. The VM rehearsal extends smoke-a4 (already-green k3d + libvirt VM + agent + apply CR + reconcile loop) with Zitadel + auth callout + agent JWT auth. Two devices + one admin, real cargo tests sharing a OnceCell bring-up.

Plan calls out:
- The 7 tests, including the load-bearing `agent_recovers_from_nats_pod_restart` (asserts the auto-reconnect + auth-callback re-mint path under realistic disturbance).
- Five known risks / debugging traps to expect on first cold-start (iam-admin-pat secret timing, /etc/hosts injection, k3d port collisions, etc.).
- Success criteria for the rehearsal day: cold cargo run greens in under 20 min, all 7 tests green on a clean machine, the NATS-restart test reliably greens 5 runs in a row.
- Anything below the success criteria → reframe the customer call to "architecture walkthrough + local k3d demo + pilot in 1-2 weeks." Avoids burning the relationship to keep a deadline.

Once the VM rehearsal is green, the residual OKD deltas are configuration (Route annotations, image registry, real DNS, cert); no new code.
IoT Platform v0.1 and beyond — forward plan
Authoritative forward plan for the NationTech decentralized-infra /
IoT platform, written after the v0 walking skeleton shipped
(see v0_walking_skeleton.md for the historical diary). Organized as
five chapters in execution order.
State of the world (as of 2026-04-23)
Green, end-to-end:
- CRD → operator → NATS JetStream KV write path (`smoke-a1.sh`).
- Agent watches KV, reconciles podman containers (`smoke-a1.sh`).
- VM-as-device provisioning: cloud-init + fleet-agent install + NATS smoke (`smoke-a3.sh`), x86_64 (native KVM) and aarch64 (TCG).
- Power-cycle / reboot resilience (`smoke-a3.sh` phase 5).
- aarch64 cross-compile of the agent (no Harmony modules need to feature-gate aarch64).
- Operator installed via a harmony Score (typed Rust, no yaml).
- `harmony-reconciler-contracts` crate: cross-boundary types (bucket names, key helpers, `DeviceInfo`, `DeploymentState`, `HeartbeatPayload`, `DeploymentName`, `Id` re-export).
Chapter 1 shipped (2026-04-21): composed end-to-end demo
(smoke-a4.sh) — operator in k3d + in-cluster NATS + ARM VM +
typed-Rust CR applier + hand-off menu + --auto regression. Green
on x86_64 (native KVM) and aarch64 (TCG).
Chapter 2 shipped (2026-04-23): selector-based targeting +
Device CRD + `.status.aggregate` reflect-back. `Deployment.spec.targetSelector: LabelSelector` resolves against cluster-scoped
Device CRs materialized from NATS device-info. Operator writes
desired-state KV per matched pair, patches
.status.aggregate (matchedDeviceCount / succeeded / failed /
pending / lastError) at 1 Hz. Load-tested to 10 000 devices ×
1 000 Deployments at 10 000 KV writes/s sustained, zero errors.
Not yet wired (real v0.1 work still to go):
- Helm packaging of the operator (Chapter 3).
- Zitadel + OpenBao auth (per-device credentials, SSO for operator users). Placeholder `CredentialSource` trait on the agent side (Chapter 4).
- Any frontend (Chapter 5).
- Small quality items (not blockers): agent config-driven labels, `matchExpressions` in selectors, `Device.status.conditions` populated from heartbeat staleness.
Verified during planning (so future implementation doesn't have to re-litigate):
- Upgrade already works. `reconciler.rs::apply` byte-compares serialized score payloads; drift triggers re-reconcile. `PodmanTopology::ensure_service_running` removes then re-creates containers on spec drift. No "stale + new" window.
- The polymorphism stays. `ReconcileScore` is an externally-tagged enum; adding `OkdApplyV0` later is additive.
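The byte-compare upgrade check can be pictured roughly as follows. This is a minimal sketch, not the code in `reconciler.rs`: the `DriftTracker` type, its `observe` method, and the JSON payloads are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical tracker: remembers the last serialized score payload per deployment.
#[derive(Default)]
struct DriftTracker {
    last_applied: HashMap<String, Vec<u8>>,
}

impl DriftTracker {
    /// Returns true when the payload differs byte-for-byte from the last one
    /// recorded for this deployment (or when nothing was recorded yet), and
    /// stores the new payload. Returns false when the bytes are identical.
    fn observe(&mut self, deployment: &str, payload: &[u8]) -> bool {
        match self.last_applied.get(deployment) {
            Some(previous) if previous.as_slice() == payload => false,
            _ => {
                self.last_applied
                    .insert(deployment.to_string(), payload.to_vec());
                true
            }
        }
    }
}

fn main() {
    let mut tracker = DriftTracker::default();
    // First sight of a deployment always triggers a reconcile.
    println!("{}", tracker.observe("web", br#"{"image":"nginx:latest"}"#));
    // Identical bytes: nothing to do.
    println!("{}", tracker.observe("web", br#"{"image":"nginx:latest"}"#));
    // Changed image tag: drift, so the container would be removed and re-created.
    println!("{}", tracker.observe("web", br#"{"image":"nginx:1.26"}"#));
}
```

Comparing serialized bytes keeps the check independent of which score variant is inside the payload, which fits the additive-enum point above.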
Surprises since v0 started (for context, none architectural):
- Arch `edk2-aarch64-202602-2` shipped empty firmware blobs; `202508-1` ships unpadded edk2 that needs 64 MiB pflash padding. Fixed via runtime discovery + padding in `modules/kvm/firmware.rs`.
- MTTCG isn't default for cross-arch TCG on QEMU 10.2; force via `qemu:commandline` override. `pauth-impdef=on` is likewise a `qemu:commandline` opt-in.
- `ensure_vm` is idempotent on "domain exists": re-apply of a changed XML requires manual `undefine --nvram --remove-all-storage`. Noted as a follow-up in the code comments.
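The firmware fix amounts to rejecting empty blobs and zero-padding the rest to the pflash size. A minimal sketch under that reading; `pad_firmware` and `PFLASH_SIZE` are hypothetical names, and the real logic in `modules/kvm/firmware.rs` may differ.

```rust
/// The 64 MiB pflash size named in the note above.
const PFLASH_SIZE: usize = 64 * 1024 * 1024;

/// Zero-pads a firmware image up to PFLASH_SIZE.
/// Rejects empty images (the broken-package case) and oversized images.
fn pad_firmware(mut image: Vec<u8>) -> Result<Vec<u8>, String> {
    if image.is_empty() {
        return Err("firmware blob is empty".to_string());
    }
    if image.len() > PFLASH_SIZE {
        return Err(format!(
            "firmware is {} bytes, larger than the {} byte pflash",
            image.len(),
            PFLASH_SIZE
        ));
    }
    // Vec::resize appends zero bytes until the target length is reached.
    image.resize(PFLASH_SIZE, 0);
    Ok(image)
}

fn main() {
    // A stand-in for an unpadded edk2 image read from disk.
    let unpadded = vec![0xAAu8; 2 * 1024 * 1024];
    match pad_firmware(unpadded) {
        Ok(padded) => println!("padded to {} bytes", padded.len()),
        Err(err) => eprintln!("unusable firmware: {err}"),
    }
}
```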
Chapter 1 — Hands-on end-to-end demo (imminent)
Goal: the user runs one command, watches operator + NATS + ARM
VM come up, then drives a CRD through the full loop by hand:
kubectl apply it (manually or via a typed Rust applier), watch the
operator log "acquired," check the NATS KV store with natsbox,
SSH/console into the VM, curl the running nginx container from
the workstation.
User-facing requirements (explicit)
- No yaml fixtures. Sample `Deployment` CRs constructed in typed Rust using `DeploymentSpec` + `PodmanV0Score`. Same discipline as the `install` Score that replaced `gen-crd | kubectl apply`.
- ArgoCD deferred. User's production clusters have it; bringing it into the smoke harness adds setup overhead without validating anything `helm install` doesn't. Chapter 3 produces the chart; ArgoCD integration is a later operational concern.
- Operator logs every CR it acquires. `controller.rs` already does `tracing::info!(%ns, %name, "reconcile")`; verify the output reads well in the command-menu hand-off.
- natsbox debugging is first-class. Script prints exact natsbox one-liners at hand-off so the user can inspect KV state.
- In-cluster NATS. Not a side-by-side podman container (as smoke-a1 does today). Expose to the libvirt VM via k3d loadbalancer port mapping.
Design decisions
- Rust CR applier. New binary `examples/harmony_apply_deployment/`. CLI flags `--name --namespace --target-device --image --port --delete`. Constructs the `Deployment` CR via `kube::Api<Deployment>` + typed `DeploymentSpec`; calls `api.apply(...)`. Can also `--print` the CR JSON to stdout so `kubectl apply -f -` still works from the terminal.
- smoke-a4.sh orchestration stays bash for now. User agreed this is test-harness scope, not framework path; converting it to Rust is "not as important right now."
- Hand-off is the default mode, not `--keep`. The whole point of Chapter 1 is that the user drives the last stage interactively. `smoke-a4.sh` brings everything up, applies nothing, prints the command menu, waits on `INT`/`TERM` to tear down. `--auto` runs the full apply/curl/upgrade/delete regression for CI.
- In-cluster NATS path. Preferred: use `harmony::modules::nats` if it has a lightweight single-node / no-supercluster mode. Fallback: typed `K8sResourceScore` applying a minimal Deployment + NodePort Service. 15-min research task before committing.
Composed smoke phases (smoke-a4.sh)
- k3d cluster up with `-p "4222:4222@loadbalancer"` so the host port 4222 forwards into the cluster. Reachable from the libvirt VM via the gateway IP (typically `192.168.122.1:4222`).
- NATS in-cluster via the chosen path (harmony module or direct K8sResourceScore). Wait for readiness.
- Install CRD via the operator's `install` subcommand (typed Rust).
- Spawn operator as a host-side process (same pattern as smoke-a1). Operator connects to `nats://localhost:4222`.
- Provision ARM VM via `example_iot_vm_setup` (same entry point smoke-a3 uses). Agent configured to connect to `nats://<libvirt_gateway>:4222`; discover the gateway IP via `virsh net-dumpxml default`, as smoke-a3 already does.
- Sanity: `kubectl wait ... crd Established`, operator logged "KV bucket ready", agent logged "watching KV keys", `status.<device>` present in `agent-status` bucket.
- Hand off. Print the command menu below. Exit 0 with a cleanup trap on `INT`/`TERM`.
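The gateway discovery step is done in bash today. If it ever moves into Rust, it could look something like the sketch below, which pulls the first `<ip address='…'>` attribute out of `virsh net-dumpxml default` output with plain string search. The function names and the sample XML are illustrative assumptions, and a real implementation would likely use an XML parser.

```rust
/// Extracts the value of the first `<ip address=...>` attribute from libvirt
/// network XML. Returns None when no such element or attribute is found.
fn gateway_ip(net_xml: &str) -> Option<&str> {
    let start = net_xml.find("<ip ")?;
    let rest = &net_xml[start..];
    // Restrict the search to the opening tag itself.
    let tag = &rest[..rest.find('>')?];
    let value_start = tag.find("address=")? + "address=".len();
    let quote = tag[value_start..].chars().next()?;
    if quote != '\'' && quote != '"' {
        return None;
    }
    let value = &tag[value_start + 1..];
    let end = value.find(quote)?;
    Some(&value[..end])
}

/// Builds the NATS URL the agent inside the VM would be configured with.
fn agent_nats_url(gateway: &str) -> String {
    format!("nats://{gateway}:4222")
}

fn main() {
    // Trimmed sample of what the default libvirt network definition looks like.
    let xml = r#"<network>
  <name>default</name>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:00:00:01'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'/>
</network>"#;
    match gateway_ip(xml) {
        Some(ip) => println!("{}", agent_nats_url(ip)),
        None => eprintln!("no <ip> element found"),
    }
}
```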
Command menu at hand-off
- `kubectl get deployments.fleet.nationtech.io -A -w`: watch CR reconcile reactively.
- `cargo run -q -p example_harmony_apply_deployment -- --image nginx:latest --target-device $TARGET_DEVICE`: apply an nginx deployment via typed Rust.
- `cargo run -q -p example_harmony_apply_deployment -- --print --image nginx:latest --target-device $TARGET_DEVICE | kubectl apply -f -`: same thing, through kubectl.
- `ssh -i $SSH_KEY fleet-admin@$VM_IP`: connect to the VM.
- `virsh console $VM_NAME --force`: serial console alternative.
- `podman --url unix://$VM_IP:... ps` or ssh + `podman ps`: list containers on the VM from the workstation.
- `podman run --rm docker.io/natsio/nats-box nats --server nats://localhost:4222 kv ls desired-state`: list desired state keys (from the host).
- `podman run --rm ... nats kv get desired-state '<device>.<deployment>' --raw`: dump a specific desired state.
- `podman run --rm ... nats kv get agent-status 'status.<device>' --raw`: dump the heartbeat.
- `curl http://$VM_IP:8080/`: hit the deployed nginx.
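The key shapes used in the natsbox one-liners (`<device>.<deployment>` and `status.<device>`) are the kind of thing the key helpers in `harmony-reconciler-contracts` cover. The sketch below only illustrates those shapes; the function and constant names are hypothetical, and it assumes device ids contain no `.` so that the first dot separates the two parts.

```rust
/// Bucket names as they appear in the natsbox one-liners.
const DESIRED_STATE_BUCKET: &str = "desired-state";
const AGENT_STATUS_BUCKET: &str = "agent-status";

/// Key for one (device, deployment) pair in the desired-state bucket.
fn desired_state_key(device: &str, deployment: &str) -> String {
    format!("{device}.{deployment}")
}

/// Key for a device's entry in the agent-status bucket.
fn status_key(device: &str) -> String {
    format!("status.{device}")
}

/// Splits a desired-state key back into (device, deployment).
/// Assumption: the device id has no dot, so the first dot is the separator.
fn split_desired_state_key(key: &str) -> Option<(&str, &str)> {
    let (device, deployment) = key.split_once('.')?;
    if device.is_empty() || deployment.is_empty() {
        return None;
    }
    Some((device, deployment))
}

fn main() {
    let key = desired_state_key("pi-42", "web");
    println!("nats kv get {DESIRED_STATE_BUCKET} '{key}' --raw");
    println!("nats kv get {AGENT_STATUS_BUCKET} '{}' --raw", status_key("pi-42"));
    println!("{:?}", split_desired_state_key(&key));
}
```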
--auto path (for regression)
- Apply `nginx:latest`, wait for container on VM, `curl` 200.
- Apply `nginx:1.26` (upgrade), wait for container id to change, `curl` 200 against the new container.
- Apply `--delete`, wait for container gone from VM.
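The "wait for container id to change" step is a poll-with-deadline loop. The smoke script does this in bash; the sketch below shows the same idea in Rust with a caller-supplied probe, so every name in it (`wait_until`, `wait_for_new_container_id`) is hypothetical.

```rust
use std::time::{Duration, Instant};

/// Calls `probe` until it yields a value or the timeout elapses.
/// The probe always runs at least once.
fn wait_until<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(value) = probe() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        std::thread::sleep(interval);
    }
}

/// Waits until `current_id` reports a container id different from `old_id`.
/// `current_id` returns None while no container is running.
fn wait_for_new_container_id(
    old_id: &str,
    timeout: Duration,
    interval: Duration,
    mut current_id: impl FnMut() -> Option<String>,
) -> Option<String> {
    wait_until(timeout, interval, || {
        current_id().filter(|id| id.as_str() != old_id)
    })
}

fn main() {
    // Simulated probe: the old container is seen twice, then the new one appears.
    let mut observed = vec!["new456", "old123", "old123"];
    let result = wait_for_new_container_id(
        "old123",
        Duration::from_secs(2),
        Duration::from_millis(10),
        || observed.pop().map(str::to_string),
    );
    println!("{result:?}");
}
```

In the real harness the probe would shell out to `podman ps` over SSH; the loop shape stays the same.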
Files
- NEW `examples/harmony_apply_deployment/Cargo.toml` + `src/main.rs`: typed applier.
- NEW `fleet/scripts/smoke-a4.sh`.
- NO yaml fixtures. Rust CLI flags cover the shape.
- Optional: factor shared smoke phases (NATS up, k3d up, operator spawn, VM provision) into `fleet/scripts/lib/` if the duplication across a1/a3/a4 becomes obvious. Don't force it.
NATS exposure — implementation-time notes
- k3d `@loadbalancer` port mapping binds the host's `0.0.0.0:4222` by default; libvirt VMs on `virbr0` can reach it via the gateway IP. No special NAT config required.
- Fallback if environmental snag: keep the side-by-side podman container on an opt-in `NATS_MODE=podman` flag. Don't default to that; the user explicitly asked for in-cluster.
Verification
- Fresh host: `ARCH=aarch64 ./fleet/scripts/smoke-a4.sh` completes in 8-15 min, prints the command menu.
- `ARCH=aarch64 ./fleet/scripts/smoke-a4.sh --auto` PASSes end-to-end including upgrade id-change assertion.
- x86_64 (`ARCH=x86-64`) completes in 2-5 min.
Explicitly out of scope
- `AgentStatus`/`DeploymentStatus` enrichment: Chapter 2.
- Helm chart, ArgoCD, auth, frontend: later chapters.
- Lifting the applier into a reusable `ApplyDeploymentScore`: only if a second consumer appears.
Chapter 2 — Status reflect-back + selector-based targeting [SHIPPED 2026-04-23]
Goal: CRD .status reflects fleet reality — per-deployment
success/failure/pending counts, last-error surface, freshness. The
Deployment CR targets devices by label selector, not by id list.
The shipped design replaces the original `AgentStatus` + list-of-ids proposal wholesale. See `chapter_4_aggregation_scale.md` for the superseded design-doc archaeology. Commits: `refactor(iot): delete legacy AgentStatus path`, `refactor(iot): operator watches device-state KV directly; drop event stream`, `refactor(iot): Deployment.targetSelector + Device CRD (DaemonSet-like)`.
What shipped
Wire format (in harmony-reconciler-contracts): four per-concern
payloads on dedicated NATS KV buckets. No monolithic per-device blob,
no separate event stream.
| Type | Bucket | Cadence |
|---|---|---|
| `DeviceInfo` | `device-info` | on startup + label/inventory change |
| `DeploymentState` | `device-state` | on reconcile phase transition |
| `HeartbeatPayload` | `device-heartbeat` | every 30 s |
CRDs. Two cluster resources:
- `Deployment` (namespaced): `spec.targetSelector: LabelSelector` (standard K8s `matchLabels` / `matchExpressions`). No device list on spec. `.status.aggregate` carries `matchedDeviceCount`, `succeeded`, `failed`, `pending`, `lastError`.
- `Device` (cluster-scoped, like `Node`): `metadata.labels` carries the device's routing labels; `spec.inventory` holds the hardware/OS snapshot; `status.conditions` is reserved for liveness (populated lazily by a future heartbeat-freshness reconciler, not every ping).
Operator tasks (three concurrent loops in one process):
- `controller`: validates Deployment CR names, holds the finalizer that cleans `desired-state.<device>.<deployment>` KV entries on delete. No writes on apply (aggregator handles that).
- `device_reconciler`: watches the `device-info` KV; server-side-applies a `Device` CR per `DeviceInfo` payload, with label sanitization. Agents remain kube-unaware.
- `fleet_aggregator`: three caches driven by watches (Deployment CRs, Device CRs, `device-state` KV). On any change, resolves each selector against the Device cache, writes/deletes `desired-state` KV entries for diffed matches, and patches `.status.aggregate` at 1 Hz for the CRs whose counters moved.
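The "writes/deletes for diffed matches" step of the aggregator reduces to a set difference between the previously matched (device, deployment) pairs and the freshly resolved ones. A minimal sketch with hypothetical names (`KvPlan`, `diff_matches`); it covers membership changes only, and a spec change on an already-matched pair would need a rewrite that is not modelled here.

```rust
use std::collections::BTreeSet;

/// (device id, deployment name)
type Pair = (String, String);

/// KV operations needed to move from the previous match set to the current one.
#[derive(Debug, PartialEq)]
struct KvPlan {
    writes: Vec<Pair>,
    deletes: Vec<Pair>,
}

/// Pairs that newly match get a desired-state write; pairs that stopped
/// matching get their desired-state entry deleted. Unchanged pairs are skipped.
fn diff_matches(previous: &BTreeSet<Pair>, current: &BTreeSet<Pair>) -> KvPlan {
    KvPlan {
        writes: current.difference(previous).cloned().collect(),
        deletes: previous.difference(current).cloned().collect(),
    }
}

fn pair(device: &str, deployment: &str) -> Pair {
    (device.to_string(), deployment.to_string())
}

fn main() {
    let previous: BTreeSet<Pair> = [pair("pi-1", "web"), pair("pi-2", "web")].into();
    // pi-2 lost its label, pi-3 gained it.
    let current: BTreeSet<Pair> = [pair("pi-1", "web"), pair("pi-3", "web")].into();
    let plan = diff_matches(&previous, &current);
    println!("write:  {:?}", plan.writes);
    println!("delete: {:?}", plan.deletes);
}
```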
Agents publish device-id=<id> as a default DeviceInfo label, so
targeting a single device with matchLabels: {device-id: pi-42} is
zero-config. User-defined labels layer on from agent config (scoped
out of this chapter; follow-up item).
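For reference, `matchLabels` semantics are simple: every key/value pair in the selector must be present on the device, and an empty selector matches everything. The sketch below illustrates that rule with plain maps; `matches` and `labels` are hypothetical helpers, not the operator's actual functions.

```rust
use std::collections::BTreeMap;

type Labels = BTreeMap<String, String>;

/// True when every selector entry is present on the device with the same value.
/// An empty selector matches every device, as in Kubernetes.
fn matches(match_labels: &Labels, device_labels: &Labels) -> bool {
    match_labels
        .iter()
        .all(|(key, value)| device_labels.get(key) == Some(value))
}

/// Convenience constructor for the examples below.
fn labels(pairs: &[(&str, &str)]) -> Labels {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

fn main() {
    // The agent publishes device-id by default; "site" stands in for a user label.
    let device = labels(&[("device-id", "pi-42"), ("site", "lab")]);

    println!("{}", matches(&labels(&[("device-id", "pi-42")]), &device));
    println!("{}", matches(&labels(&[("device-id", "pi-7")]), &device));
    println!("{}", matches(&labels(&[]), &device));
}
```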
Scale proof
fleet/scripts/load-test.sh + examples/fleet_load_test simulate N
devices across M Deployments, driving device-state KV updates at a
configurable cadence while the full operator stack runs against a
local k3d apiserver. Verified:
- 100 devices / 10 groups / 1 Hz / 60 s: 100 writes/s sustained, all 10 CR aggregates converge.
- 10 000 devices / 1 000 groups / 1 Hz / 120 s: ~10 000 writes/s sustained, 0 errors, all 1 000 CR aggregates correct (`matchedDeviceCount == expected`, `succeeded + failed + pending == matched`). Same envelope before and after the selector rewrite.
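The invariant the load test checks (`succeeded + failed + pending == matched`) falls out naturally if the aggregate is computed by folding over the matched devices. A minimal sketch, assuming a three-variant `Phase` enum and that a matched device with no report yet counts as pending; both are assumptions, as is the choice of which error ends up in `last_error`.

```rust
use std::collections::HashMap;

/// Reconcile phase reported per device; the variant names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Phase {
    Pending,
    Succeeded,
    Failed,
}

/// Mirrors the `.status.aggregate` fields.
#[derive(Debug, Default, PartialEq)]
struct Aggregate {
    matched_device_count: u32,
    succeeded: u32,
    failed: u32,
    pending: u32,
    last_error: Option<String>,
}

/// Folds the reported states of the matched devices into one aggregate.
/// Each matched device lands in exactly one counter, so the three counters
/// always sum to matched_device_count.
fn aggregate(matched: &[&str], states: &HashMap<&str, (Phase, Option<String>)>) -> Aggregate {
    let mut agg = Aggregate::default();
    for device in matched {
        agg.matched_device_count += 1;
        match states.get(device) {
            Some((Phase::Succeeded, _)) => agg.succeeded += 1,
            Some((Phase::Failed, err)) => {
                agg.failed += 1;
                // Keeps the error of the last failed device in iteration order.
                if err.is_some() {
                    agg.last_error = err.clone();
                }
            }
            // No report yet is treated the same as an explicit Pending.
            Some((Phase::Pending, _)) | None => agg.pending += 1,
        }
    }
    agg
}

fn main() {
    let mut states = HashMap::new();
    states.insert("pi-1", (Phase::Succeeded, None));
    states.insert("pi-2", (Phase::Failed, Some("image pull failed".to_string())));
    let agg = aggregate(&["pi-1", "pi-2", "pi-3"], &states);
    println!("{agg:?}");
}
```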
Out of scope in this chapter (follow-ups)
- Agent config-driven labels (`[labels]` in agent toml → DeviceInfo). ~30 lines; deferred until a concrete need lands.
- `matchExpressions` evaluator. Operator currently supports `matchLabels` only and logs a warning for expression-bearing selectors. ~50 lines; deferred.
- `Device.status.conditions` populated from heartbeat staleness (Reachable / Stale transitions). Liveness is computable today by reading `device-heartbeat` directly; CR-side reflection is a convenience. ~100 lines; deferred.
- Full journald log streaming. The `.status.aggregate.lastError` surface covers the user's reflect-back requirement for now.
- Multi-device regression smoke: defer until real hardware or a second VM is around.
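To make the deferred heartbeat-staleness item concrete: given the 30 s heartbeat cadence, liveness is a threshold on the age of the last heartbeat, and the CR only needs a patch when the condition flips. The sketch below assumes a tolerance of three missed beats, which is a placeholder rather than a decided value, and all names are hypothetical.

```rust
use std::time::Duration;

/// Heartbeat cadence from the wire-format table.
const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(30);
/// Assumed tolerance: beats that may be missed before a device is Stale.
const MISSED_BEATS_ALLOWED: u32 = 3;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Liveness {
    Reachable,
    Stale,
}

/// Classifies a device by the age of its most recent heartbeat.
fn liveness(since_last_heartbeat: Duration) -> Liveness {
    if since_last_heartbeat <= HEARTBEAT_INTERVAL * MISSED_BEATS_ALLOWED {
        Liveness::Reachable
    } else {
        Liveness::Stale
    }
}

/// Returns the new condition only when it differs from the previous one, so a
/// reconciler would patch the CR on transitions rather than on every ping.
fn transition(previous: Liveness, since_last_heartbeat: Duration) -> Option<Liveness> {
    let now = liveness(since_last_heartbeat);
    if now == previous {
        None
    } else {
        Some(now)
    }
}

fn main() {
    println!("{:?}", transition(Liveness::Reachable, Duration::from_secs(45)));
    println!("{:?}", transition(Liveness::Reachable, Duration::from_secs(120)));
    println!("{:?}", transition(Liveness::Stale, Duration::from_secs(5)));
}
```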
Chapter 3 — Helm chart (ArgoCD deferred)
Goal: operator ships as a versioned helm chart with CRD version-locked inside.
User clarified this session: ArgoCD exists in production; all it does is apply resources from the chart. Standing up ArgoCD in the smoke adds setup overhead with no incremental validation value.
Chapter 3 produces the chart + validates helm install / helm upgrade lifecycles. ArgoCD consumption is a user operational
concern downstream.
Sketch
- Chart location: `fleet/harmony-fleet-operator/chart/` (or sibling repo; defer decision to implementation time).
- Templates: Namespace, SA, ClusterRole, ClusterRoleBinding, Deployment (operator pod), CRD.
- CRD yaml in the chart is generated at chart-publish time from the Rust `Deployment::crd()`. One-off release artifact, not framework path, consistent with "no yaml in framework code."
- Values: operator image tag, NATS URL, log level.
- Smoke: `helm install` into k3d → CR apply → same assertions as Chapter 1.
Open questions
- Chart repo: subdir vs. separate git repo.
- CRD install mechanism: chart hook vs. templates directory. Drives CRD upgrade story.
Chapter 4 — Auth: Zitadel + OpenBao + per-device identity
Goal: per-device granular NATS credentials; SSO for operator users; OpenBao policy per device; JWT bootstrap from Zitadel.
Zitadel + OpenBao are already ~99% integrated in harmony; this chapter is wiring the IoT-specific flows.
Sketch
- Agent's `CredentialSource` trait (already abstract in agent `config.rs`) gets a Zitadel-JWT-backed implementation. Mints short-lived NATS creds via OpenBao auth callout.
- Remove the shared-credentials `toml-shared` variant (v0 demo leftover).
- Availability: auth-callout caches policies, tolerates OpenBao outages.
- SSO for operator users (separate flow): Zitadel groups → Kubernetes RBAC subjects on the `Deployment` CRD.
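One way to read "caches policies, tolerates OpenBao outages" is a last-known-good cache in front of the policy lookup. The sketch below is purely illustrative: `PolicyCache`, the closure-based fetch, and the policy strings are hypothetical, no OpenBao API is modelled, and a production version would also need expiry so revoked policies do not live forever.

```rust
use std::collections::HashMap;

/// Hypothetical last-known-good cache in front of a fallible policy lookup.
struct PolicyCache<F> {
    fetch: F,
    last_good: HashMap<String, String>,
}

impl<F> PolicyCache<F>
where
    F: FnMut(&str) -> Result<String, String>,
{
    fn new(fetch: F) -> Self {
        Self {
            fetch,
            last_good: HashMap::new(),
        }
    }

    /// Tries the backend first and refreshes the cache on success. On failure,
    /// serves the cached policy if one exists, otherwise returns the error.
    fn policy_for(&mut self, device: &str) -> Result<String, String> {
        match (self.fetch)(device) {
            Ok(policy) => {
                self.last_good.insert(device.to_string(), policy.clone());
                Ok(policy)
            }
            Err(err) => self.last_good.get(device).cloned().ok_or(err),
        }
    }
}

fn main() {
    // A Cell lets the example flip the simulated backend between up and down.
    let backend_up = std::cell::Cell::new(true);
    let mut cache = PolicyCache::new(|device: &str| {
        if backend_up.get() {
            Ok(format!("policy-for-{device}"))
        } else {
            Err("policy backend unreachable".to_string())
        }
    });

    println!("{:?}", cache.policy_for("pi-42")); // fetched and cached
    backend_up.set(false);
    println!("{:?}", cache.policy_for("pi-42")); // served from cache
    println!("{:?}", cache.policy_for("pi-7")); // never seen: error surfaces
}
```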
Chapter 5 — Frontend (last)
Goal: operator-friendly UI for the decentralized platform.
Form factor undecided: Leptos web dashboard, CLI extension to
harmony_cli, or a TUI. Minimum viable product: read-only view of
fleet state (devices + deployments + aggregated status) powered by
the CRD .status from Chapter 2. Aspiration: write operations with
auth from Chapter 4.
Chapter 6 — Customer demo rehearsal [in progress]
48-hour customer demo prep. PO assessment concluded that promising a real-OKD deployment without first proving the JWT-auth chain is reckless. VM-based rehearsal first, OKD second.
The rehearsal extends smoke-a4 (k3d + libvirt VM + agent + apply
CR + reconcile podman) with Zitadel + auth callout + agent JWT
auth. Two devices + one admin. Same code paths as production —
only the cluster topology differs.
Detailed plan: v0_demo_e2e.md.
Once the VM rehearsal is green (success criteria in that doc), the residual deltas to ship to real OKD are configuration, not new code.
Principles — what we've learned and want to keep doing
- No yaml in framework code paths. Every kube-rs type is typed; every Score apply goes through typed Rust. Yaml generation happens only at chart-publish time, never at runtime.
- Scores describe desired state; topologies expose capabilities. Prefer adding capability traits over thickening a single topology.
- Minimal topologies for ad-hoc Score execution. `K8sAnywhereTopology` has too many opinions (cert-manager install, tenant-manager bootstrap, helm probes) for narrow apply-a-CRD use cases. See ROADMAP §12.6; a lean shared `K8sBareTopology` is the durable fix.
- Cross-boundary wire types in `harmony-reconciler-contracts`, everything else in its natural crate.
- Never ship untested code. Every commit that changes runtime behavior is verified against a smoke script before landing. Cargo check + unit tests aren't enough.
- Prove claims about upstream before blaming upstream. The Arch edk2 investigation showed this matters; see `memory/feedback_prove_before_blaming_upstream.md`.