feat: scaffold IoT walking skeleton — podman module, operator, and agent #264
Open
johnride wants to merge 4 commits from feat/iot-walking-skeleton into master
4 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
| d21bdef050 |
feat(iot-operator): CEL-validate score.type as a Rust identifier
Some checks are pending
Run Check Script / check (pull_request) Waiting to run
The CRD previously accepted any string for `score.type`, so typos like `"pdoman"` or `"PodmnV0"` would be persisted by the apiserver and only surface on-device as agent-side deserialize warnings. That class of failure is distasteful and hard to debug.
Replace the auto-derived schema for `ScorePayload` with a hand-rolled one that keeps the same visible shape but adds two apiserver-level guardrails:
- `score.type` gets `minLength: 1` and an `x-kubernetes-validations` CEL rule requiring it to match `^[A-Za-z_][A-Za-z0-9_]*$`, that is, a valid Rust identifier, since score variants *are* Rust struct names in `harmony::modules::podman::IotScore`. The message points operators at the concrete example `PodmanV0`.
- `score.data` still carries only `x-kubernetes-preserve-unknown-fields: true`.
The rule validates the discriminator's *shape*, not its *value*, so v0.3+ variants (OkdApplyV0, KubectlApplyV0) don't require an operator release. This preserves ROADMAP §6.1's generic-router design.
The `x-kubernetes-preserve-unknown-fields` extension stays scoped to `score.data` alone. Every other field in the CRD has a strict schema, and the whole document contains exactly one preserve-unknown-fields marker and exactly one validations block.
Smoke test extended: phase 2b applies a CR with `score.type: "has spaces"` and asserts the apiserver rejects it with the CEL message before the operator ever sees it. Positive phases (kubectl apply -> NATS KV put -> status observed -> delete -> KV key removed) still PASS end-to-end.
Matches the `preserve_arbitrary` pattern used by ArgoCD (`Application.spec.source.helm.valuesObject`) and Flux (`HelmRelease.spec.values`), both of which similarly use narrow preserve-unknown-fields on a payload field without coupling the CRD to their variant catalog.
|
|||
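As a rough illustration of what the CEL rule in d21bdef050 accepts and rejects, the same shape check can be sketched in plain Rust. This is not the operator's code: `has_identifier_shape` is a hypothetical helper name, and the real check runs in the apiserver as a CEL expression.

```rust
/// Returns true when `s` matches `^[A-Za-z_][A-Za-z0-9_]*$`, the shape the
/// commit message says the CEL rule enforces on `score.type`.
/// Hypothetical helper for illustration only.
pub fn has_identifier_shape(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        // First character: ASCII letter or underscore.
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        // Empty string (also covered by `minLength: 1`) or bad first character.
        _ => return false,
    }
    // Remaining characters: ASCII letters, digits, or underscores.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

Under this sketch `PodmanV0` passes and `has spaces` fails, matching smoke-test phase 2b. A misspelling such as `pdoman` still passes, which is consistent with the rule checking the discriminator's shape rather than its value.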
| 1c916340f1 |
test(iot-operator): A1 end-to-end smoke test + CRD/patch fixes
Some checks are pending
Run Check Script / check (pull_request) Waiting to run
`iot/scripts/smoke-a1.sh` drives the A1 acceptance flow end-to-end:
spins up NATS and a k3d cluster via podman, applies the generated CRD,
runs the operator, applies a Deployment CR, asserts the expected
`<device>.<deployment>` key lands in the `desired-state` KV bucket and
`.status.observedScoreString` round-trips the same JSON, then deletes
the CR and asserts the finalizer removes the KV key. Cleans up on exit.
Two fixes surfaced while running it:
1. `ScorePayload.data: serde_json::Value` generated an empty `{}`
schema, which the API server rejects. Attach a
`schemars(schema_with = preserve_arbitrary)` helper that emits
`x-kubernetes-preserve-unknown-fields: true`, letting the Score payload
be any JSON shape.
2. `Patch::Merge` combined with `PatchParams::apply(...).force()` is
rejected by kube-rs (force is Apply-only). Use a plain `Merge` patch
for the status subresource — simpler and correct for v0.
|
|||
| e50ab741fc |
feat(iot-operator): Deployment CRD controller writing to NATS KV
Implement the A1 task from the IoT walking-skeleton roadmap:
- CRD (kube-derive): `iot.nationtech.io/v1alpha1/Deployment`, namespaced,
with `targetDevices`, `score {type, data}`, `rollout.strategy`, and a
status subresource carrying `observedScoreString`.
- Controller: `kube::runtime::Controller` + `finalizer` helper. On Apply,
writes `<device_id>.<deployment_name>` into NATS KV bucket
`desired-state` and patches `.status.observedScoreString` via
server-side apply. Skips KV write + status patch when the score is
unchanged to avoid reconcile-loop churn. On Cleanup, removes the
per-device keys before releasing the finalizer.
- CLI: `gen-crd` subcommand prints the CRD YAML from the Rust types;
`run` (default) starts the controller. `deploy/crd.yaml` is generated
by that subcommand — single source of truth, no drift.
- Deploy manifests: `deploy/operator.yaml` (Namespace, SA, ClusterRole,
ClusterRoleBinding, Deployment) and generated `deploy/crd.yaml`.
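The controller's key layout and its skip-when-unchanged behaviour can be sketched with an in-memory map standing in for the `desired-state` bucket. Everything here is an assumption made for illustration: the function names, the `HashMap` stand-in, and the `observed` slot that plays the role of `.status.observedScoreString`. The real controller talks to NATS KV and the Kubernetes API.

```rust
use std::collections::HashMap;

/// Builds one key per target device in the `<device_id>.<deployment_name>` layout.
pub fn desired_state_keys(devices: &[&str], deployment: &str) -> Vec<String> {
    devices.iter().map(|d| format!("{d}.{deployment}")).collect()
}

/// Apply path: writes `score_json` under every per-device key, unless it equals
/// the last observed score. Returns the number of keys written.
/// `kv` is a hypothetical in-memory stand-in for the KV bucket.
pub fn apply(
    kv: &mut HashMap<String, String>,
    observed: &mut Option<String>,
    devices: &[&str],
    deployment: &str,
    score_json: &str,
) -> usize {
    if observed.as_deref() == Some(score_json) {
        return 0; // unchanged score: skip both the KV write and the status update
    }
    let keys = desired_state_keys(devices, deployment);
    for key in &keys {
        kv.insert(key.clone(), score_json.to_string());
    }
    *observed = Some(score_json.to_string());
    keys.len()
}

/// Cleanup path: removes the per-device keys before the finalizer is released.
pub fn cleanup(kv: &mut HashMap<String, String>, devices: &[&str], deployment: &str) {
    for key in desired_state_keys(devices, deployment) {
        kv.remove(&key);
    }
}
```

Calling `apply` twice with the same score writes on the first call and returns 0 on the second, which is the reconcile-loop churn the commit message says it avoids.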
Agent fixes surfaced while aligning with the operator's key layout:
- Watch filter: was `starts_with("desired-state.<id>.")` on
`watch_all()`; bucket name is not a key prefix, so it never matched.
Now uses `bucket.watch("<id>.>")` with the NATS wildcard and handles
`Put`/`Delete`/`Purge` distinctly.
- Multi-server connect: was joining `nats.urls` with `","` into a single
malformed URL. Pass the `Vec<String>` to `ConnectOptions::connect`.
- `credentials.type` is now validated (rejects unknown discriminators)
so a v0.2 `zitadel` config doesn't silently fall back to shared creds.
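To make the watch-filter fix concrete, here is a minimal sketch of NATS-style subject matching, where `*` matches exactly one token and `>` matches one or more trailing tokens. It is a simplified stand-in and not the agent's code or the async-nats implementation; `subject_matches` and the device ID `pi-01` are hypothetical.

```rust
/// Minimal NATS-style subject matcher for illustration.
/// `*` matches exactly one dot-separated token; `>` matches one or more
/// trailing tokens. Does not validate that `>` is the final pattern token.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            // `>` consumes the rest of the subject, but needs at least one token.
            (Some(">"), Some(_)) => return true,
            // `*` consumes exactly one token.
            (Some("*"), Some(_)) => {}
            // Literal tokens must be equal.
            (Some(p), Some(s)) if p == s => {}
            // Both exhausted together: full match.
            (None, None) => return true,
            // Length mismatch or differing literal token.
            _ => return false,
        }
    }
}
```

With keys laid out as `<device_id>.<deployment_name>`, the pattern `pi-01.>` matches `pi-01.web` but not `pi-02.web`. The old check, `starts_with("desired-state.pi-01.")`, can never match `pi-01.web` because the bucket name is not part of the key.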
Verification on feat/iot-walking-skeleton:
- cargo clippy --no-deps -D warnings: clean (agent + operator).
- cargo fmt --check: clean.
- x86_64 + aarch64 cross-compile: both build.
- podman module unit tests: pass.
|
|||
| 65ef540b97 |
feat: scaffold IoT walking skeleton — podman module, operator, and agent
Some checks are pending
Run Check Script / check (pull_request) Waiting to run
- Add PodmanV0Score/IotScore (adjacent-tagged serde) and PodmanV0Interpret stub
- Gate virt behind kvm feature and podman-api behind podman feature
- Scaffold iot-operator-v0 (kube-rs operator stub) and iot-agent-v0 (NATS KV watch)
- Add PodmanV0 to InterpretName enum
- Fix aarch64 cross-compilation by making kvm/podman optional features
- Align async-nats across workspace, add workspace deps for tracing/toml/tracing-subscriber
- Remove unused deps (serde_yaml from agent, schemars from operator)
- Add Send+Sync to CredentialSource, fix &PathBuf → &Path, remove dead_code allow
- Update 5 KVM example Cargo.tomls with explicit features = ["kvm"]
|