Some checks failed
Run Check Script / check (pull_request) Failing after -44h57m23s
Two bugs surfaced when the agent went live against NATS JetStream KV
in the VM-based e2e rehearsal:
1. The default `device` role only allowed flat `device-state.<id>` /
`device-commands.<id>` subjects. The agent's actual data plane is
JetStream KV, which puts every operation on `$KV.<bucket>.<key>`
subjects with control-plane traffic on `$JS.API.>` and `$JS.ACK.>`.
With the old role config, the very first KV publish died with
`Permissions Violation for Publish to "$JS.API.INFO"`.
The role now allows `$JS.API.>` + `$JS.ACK.>` plus the four
per-device data subjects derived from
harmony_reconciler_contracts::kv (info.<id>, state.<id>.<dep>,
heartbeat.<id>, desired-state.<id>.<dep>). The legacy direct
`device-state.<id>` / `device-commands.<id>` subjects are kept so
non-JetStream callers of NatsAuthCalloutScore still work.
A new unit test (`device_role_covers_reconciler_contract_kv_subjects`)
imports the contract crate as a dev-dep and asserts each contract-
produced subject is matched, plus that cross-device subjects are
*not* matched. This locks the role config to the contract surface so
future renames break the test before they break prod.
2. Zitadel's `client_id` claim for a machine user equals the userName
verbatim. Both `fleet_rpi_setup` and `fleet_e2e_demo` create the
user as `device-{device_id}`, so the JWT carries
`device-vm-device-00` while the agent's KV keys use the bare
`vm-device-00`. The callout was interpolating the prefixed string
into permissions, producing rules that never matched what the
agent actually publishes.
Adds `device_id_prefix_strip` (env: `DEVICE_ID_PREFIX_STRIP`,
defaults empty so existing deployments are unaffected). When set,
the validator strips the prefix from the extracted claim before
permission interpolation. The fleet_auth_callout example wires it
to `device-` so the e2e harness stays end-to-end correct without
reaching into either naming convention.
Verified end-to-end: both VM agents now publish DeviceInfo /
heartbeat through JetStream KV with no permission errors and zero
service restarts since the rollout.