feat: scaffold IoT walking skeleton — podman module, operator, and agent #264

Open
johnride wants to merge 4 commits from feat/iot-walking-skeleton into master
  • Add PodmanV0Score/IotScore (adjacent-tagged serde) and PodmanV0Interpret stub
  • Gate virt behind kvm feature and podman-api behind podman feature
  • Scaffold iot-operator-v0 (kube-rs operator stub) and iot-agent-v0 (NATS KV watch)
  • Add PodmanV0 to InterpretName enum
  • Fix aarch64 cross-compilation by making kvm/podman optional features
  • Align async-nats across workspace, add workspace deps for tracing/toml/tracing-subscriber
  • Remove unused deps (serde_yaml from agent, schemars from operator)
  • Add Send+Sync to CredentialSource, fix &PathBuf → &Path, remove dead_code allow
  • Update 5 KVM example Cargo.tomls with explicit features = ["kvm"]
johnride added 1 commit 2026-04-18 02:37:58 +00:00
feat: scaffold IoT walking skeleton — podman module, operator, and agent
Some checks are pending
Run Check Script / check (pull_request) Waiting to run
65ef540b97
johnride added 2 commits 2026-04-18 14:08:04 +00:00
Implement the A1 task from the IoT walking-skeleton roadmap:

- CRD (kube-derive): `iot.nationtech.io/v1alpha1/Deployment`, namespaced,
  with `targetDevices`, `score {type, data}`, `rollout.strategy`, and a
  status subresource carrying `observedScoreString`.
- Controller: `kube::runtime::Controller` + `finalizer` helper. On Apply,
  writes `<device_id>.<deployment_name>` into NATS KV bucket
  `desired-state` and patches `.status.observedScoreString` via
  server-side apply. Skips KV write + status patch when the score is
  unchanged to avoid reconcile-loop churn. On Cleanup, removes the
  per-device keys before releasing the finalizer.
- CLI: `gen-crd` subcommand prints the CRD YAML from the Rust types;
  `run` (default) starts the controller. `deploy/crd.yaml` is generated
  by that subcommand — single source of truth, no drift.
- Deploy manifests: `deploy/operator.yaml` (Namespace, SA, ClusterRole,
  ClusterRoleBinding, Deployment) and generated `deploy/crd.yaml`.
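The Apply path described above (per-device keys, skip when the score is unchanged) can be sketched roughly as follows. This is a minimal, std-only illustration and not the operator's actual code: `desired_state_key`, `ApplyAction`, and `plan_apply` are hypothetical names, and the real controller performs the NATS KV write and status patch through `async-nats` and `kube`.

```rust
/// Hypothetical helper: builds the KV key for one device/deployment pair,
/// following the `<device_id>.<deployment_name>` layout from the commit message.
fn desired_state_key(device_id: &str, deployment_name: &str) -> String {
    format!("{device_id}.{deployment_name}")
}

/// Hypothetical summary of what the reconciler should do on Apply.
#[derive(Debug, PartialEq)]
enum ApplyAction {
    /// Score matches `.status.observedScoreString`: no KV write, no status patch.
    Skip,
    /// Score changed or was never observed: write these keys, then patch status.
    Write(Vec<String>),
}

/// Decides whether an Apply needs any work, and for which keys.
fn plan_apply(
    deployment_name: &str,
    target_devices: &[&str],
    desired_score: &str,
    observed_score: Option<&str>,
) -> ApplyAction {
    // Comparing against the observed string is what avoids reconcile-loop churn:
    // a status patch triggers another reconcile, which must then be a no-op.
    if observed_score == Some(desired_score) {
        return ApplyAction::Skip;
    }
    let keys = target_devices
        .iter()
        .map(|device| desired_state_key(device, deployment_name))
        .collect();
    ApplyAction::Write(keys)
}
```

For a first reconcile with devices `pi-01` and `pi-02` and a deployment named `hello`, this plans writes to `pi-01.hello` and `pi-02.hello`; once the status carries the same score string, the next reconcile returns `Skip`.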

Agent fixes surfaced while aligning with the operator's key layout:

- Watch filter: was `starts_with("desired-state.<id>.")` on
  `watch_all()`; bucket name is not a key prefix, so it never matched.
  Now uses `bucket.watch("<id>.>")` with the NATS wildcard and handles
  `Put`/`Delete`/`Purge` distinctly.
- Multi-server connect: was joining `nats.urls` with `","` into a single
  malformed URL. Pass the `Vec<String>` to `ConnectOptions::connect`.
- `credentials.type` is now validated (rejects unknown discriminators)
  so a v0.2 `zitadel` config doesn't silently fall back to shared creds.
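To illustrate why the old watch filter never matched, here is a small std-only sketch. `old_filter` mirrors the previous `starts_with` check, and `subject_matches` is a simplified, hypothetical re-implementation of NATS wildcard semantics (`*` for one token, trailing `>` for one or more tokens); the agent itself relies on the server-side filtering behind `bucket.watch`, not on code like this.

```rust
/// The previous (broken) filter: it expects the bucket name as a key prefix,
/// but keys inside the bucket look like `<device_id>.<deployment_name>`.
fn old_filter(device_id: &str, key: &str) -> bool {
    key.starts_with(&format!("desired-state.{device_id}."))
}

/// Simplified NATS-style match, for illustration only:
/// `*` matches exactly one token, `>` as the last token matches one or more.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            // `>` needs at least one remaining token and must end the pattern.
            (Some(">"), Some(_)) => return pat.next().is_none(),
            (Some("*"), Some(_)) => continue,
            (Some(p), Some(s)) if p == s => continue,
            // Both exhausted at the same time: exact match.
            (None, None) => return true,
            _ => return false,
        }
    }
}
```

With the key `pi-01.hello`, `old_filter("pi-01", ...)` is false because the key carries no `desired-state.` prefix, while the pattern `pi-01.>` matches it and does not match keys belonging to `pi-02`.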

Verification on feat/iot-walking-skeleton:
- cargo clippy --no-deps -- -D warnings: clean (agent + operator).
- cargo fmt --check: clean.
- x86_64 + aarch64 cross-compile: both build.
- podman module unit tests: pass.
test(iot-operator): A1 end-to-end smoke test + CRD/patch fixes
1c916340f1
`iot/scripts/smoke-a1.sh` drives the A1 acceptance flow end-to-end:
spins up NATS and a k3d cluster via podman, applies the generated CRD,
runs the operator, applies a Deployment CR, asserts the expected
`<device>.<deployment>` key lands in the `desired-state` KV bucket and
`.status.observedScoreString` round-trips the same JSON, then deletes
the CR and asserts the finalizer removes the KV key. Cleans up on exit.

Two fixes surfaced while running it:

1. `ScorePayload.data: serde_json::Value` generated an empty `{}`
   schema, which the API server rejects. Attach a `schemars(schema_with
   = preserve_arbitrary)` helper that emits `x-kubernetes-preserve-
   unknown-fields: true`, letting the Score payload be any JSON shape.
2. `Patch::Merge` combined with `PatchParams::apply(...).force()` is
   rejected by kube-rs (force is Apply-only). Use a plain `Merge` patch
   for the status subresource — simpler and correct for v0.
johnride added 1 commit 2026-04-18 14:37:10 +00:00
feat(iot-operator): CEL-validate score.type as a Rust identifier
d21bdef050
The CRD previously accepted any string for `score.type`, so typos like
`"pdoman"` or `"PodmnV0"` would be persisted by the apiserver and only
surface on-device as agent-side deserialize warnings. That class of
failure shows up late, far from its cause, and is hard to debug.

Replace the auto-derived schema for `ScorePayload` with a hand-rolled
one that keeps the same visible shape but adds two apiserver-level
guardrails:

- `score.type` gets `minLength: 1` and an `x-kubernetes-validations`
  CEL rule requiring it to match `^[A-Za-z_][A-Za-z0-9_]*$` — a valid
  Rust identifier, since score variants *are* Rust struct names in
  `harmony::modules::podman::IotScore`. Message points operators at
  the concrete example `PodmanV0`.
- `score.data` still carries only `x-kubernetes-preserve-unknown-
  fields: true`. The rule validates the discriminator's *shape*, not
  its *value*, so v0.3+ variants (OkdApplyV0, KubectlApplyV0) don't
  require an operator release — preserves ROADMAP §6.1's
  generic-router design.
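For readers who want to see what the two `score.type` guardrails accept, the same shape check can be expressed in plain Rust. This is only an illustrative equivalent of `minLength: 1` plus the `^[A-Za-z_][A-Za-z0-9_]*$` pattern; `is_valid_score_type` is a hypothetical name, and the real enforcement happens in the apiserver through the CEL rule, not in Rust.

```rust
/// Returns true when `s` is non-empty, starts with an ASCII letter or `_`,
/// and continues with ASCII letters, digits, or `_` only.
fn is_valid_score_type(s: &str) -> bool {
    let mut chars = s.chars();
    // First character: letter or underscore. An empty string fails here,
    // which corresponds to the `minLength: 1` constraint.
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    // Remaining characters: letters, digits, or underscore.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

Note that a typo such as `pdoman` still passes, because the rule checks the discriminator's shape and not its value; `has spaces`, an empty string, or a leading digit are rejected.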

The `x-kubernetes-preserve-unknown-fields` extension stays scoped to
`score.data` alone. Every other field in the CRD has a strict schema,
so the whole document contains exactly one preserve-unknown-fields
marker and exactly one validations block.

Smoke test extended: phase 2b applies a CR with `score.type: "has
spaces"` and asserts the apiserver rejects it with the CEL message
before the operator ever sees it. Positive phases (kubectl apply ->
NATS KV put -> status observed -> delete -> KV key removed) still
PASS end-to-end.

Matches the `preserve_arbitrary` pattern used by ArgoCD
(`Application.spec.source.helm.valuesObject`) and Flux
(`HelmRelease.spec.values`), both of which similarly use narrow
preserve-unknown-fields on a payload field without coupling the CRD
to their variant catalog.
Reference: NationTech/harmony#264