harmony/nats/callout/Cargo.toml
Jean-Gabriel Gill-Couture d4fd4859ec
fix(callout): align device permissions with KV key formats and machine-user prefix
Two bugs surfaced when the agent went live against NATS JetStream KV
in the VM-based e2e rehearsal:

1. The default `device` role only allowed flat `device-state.<id>` /
   `device-commands.<id>` subjects. The agent's actual data plane is
   JetStream KV, which puts every operation on `$KV.<bucket>.<key>`
   subjects with control-plane traffic on `$JS.API.>` and `$JS.ACK.>`.
   With the old role config, the very first KV publish died with
   `Permissions Violation for Publish to "$JS.API.INFO"`.

   The role now allows `$JS.API.>` + `$JS.ACK.>` plus the four
   per-device data subjects derived from
   `harmony_reconciler_contracts::kv` (`info.<id>`, `state.<id>.<dep>`,
   `heartbeat.<id>`, `desired-state.<id>.<dep>`). The legacy direct
   `device-state.<id>` / `device-commands.<id>` subjects are kept so
   non-JetStream callers of `NatsAuthCalloutScore` still work.

   A new unit test (`device_role_covers_reconciler_contract_kv_subjects`)
   imports the contract crate as a dev-dep and asserts each contract-
   produced subject is matched, plus that cross-device subjects are
   *not* matched. This locks the role config to the contract surface so
   future renames break the test before they break prod.

2. Zitadel's `client_id` claim for a machine user equals the userName
   verbatim. Both `fleet_rpi_setup` and `fleet_e2e_demo` create the
   user as `device-{device_id}`, so the JWT carries
   `device-vm-device-00` while the agent's KV keys use the bare
   `vm-device-00`. The callout was interpolating the prefixed string
   into permissions, producing rules that never matched what the
   agent actually publishes.

   Adds `device_id_prefix_strip` (env: `DEVICE_ID_PREFIX_STRIP`,
   defaults empty so existing deployments are unaffected). When set,
   the validator strips the prefix from the extracted claim before
   permission interpolation. The `fleet_auth_callout` example sets it
   to `device-`, so the e2e harness works end to end without changing
   either naming convention.
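
The two fixes above can be sketched together as a minimal, self-contained
illustration. It is not the crate's actual code: the function names, the
`{id}` placeholder syntax, and the use of `*` for the bucket and deployment
tokens are assumptions made for this example. The real subject formats come
from `harmony_reconciler_contracts::kv`. The matcher follows standard NATS
wildcard rules (`*` matches one token, a trailing `>` matches one or more).

```rust
/// Strip the configured prefix from the identity claim.
/// An empty prefix (the default) leaves the claim unchanged.
fn strip_device_prefix<'a>(claim: &'a str, prefix: &str) -> &'a str {
    claim.strip_prefix(prefix).unwrap_or(claim)
}

/// Expand permission templates for one device.
/// The templates are hypothetical stand-ins for the role config.
fn device_permissions(device_id: &str) -> Vec<String> {
    const TEMPLATES: [&str; 8] = [
        "$JS.API.>",                  // JetStream control plane
        "$JS.ACK.>",                  // JetStream acknowledgements
        "$KV.*.info.{id}",            // bucket token wildcarded (assumption)
        "$KV.*.state.{id}.*",
        "$KV.*.heartbeat.{id}",
        "$KV.*.desired-state.{id}.*",
        "device-state.{id}",          // legacy direct subjects
        "device-commands.{id}",
    ];
    TEMPLATES
        .iter()
        .map(|t| t.replace("{id}", device_id))
        .collect()
}

/// Token-wise NATS subject match: `*` matches exactly one token,
/// `>` matches one or more remaining tokens.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut p = pattern.split('.');
    let mut s = subject.split('.');
    loop {
        match (p.next(), s.next()) {
            (Some(">"), Some(_)) => return true,
            (Some("*"), Some(_)) => continue,
            (Some(a), Some(b)) if a == b => continue,
            (None, None) => return true,
            _ => return false,
        }
    }
}

/// True if any rule in the expanded permission set matches the subject.
fn is_allowed(rules: &[String], subject: &str) -> bool {
    rules.iter().any(|r| subject_matches(r, subject))
}
```

With the prefix set to `device-`, the claim `device-vm-device-00` becomes
`vm-device-00`, and the expanded rules match what the agent publishes. If the
prefix is left in place, the rules name a device ID that no KV key ever uses,
which is the second bug described above.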

Verified end-to-end: both VM agents now publish DeviceInfo /
heartbeat through JetStream KV with no permission errors and zero
service restarts since the rollout.
2026-05-03 17:49:48 -04:00


[package]
name = "harmony-nats-callout"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
description = "NATS auth callout service for Zitadel SSO with per-device permissions"
rust-version = "1.85"

[lib]
name = "harmony_nats_callout"
path = "src/lib.rs"

[[bin]]
name = "harmony-nats-callout"
path = "src/main.rs"

[dependencies]
nats-jwt = { path = "../jwt" }
async-nats.workspace = true
nkeys = "0.4"
jsonwebtoken = "9"
reqwest = { workspace = true }
serde = { workspace = true, features = ["derive"] }
serde_json.workspace = true
tracing.workspace = true
tracing-subscriber.workspace = true
thiserror.workspace = true
anyhow.workspace = true
tokio = { workspace = true, features = ["rt", "rt-multi-thread", "macros", "signal", "sync", "time"] }
futures-util.workspace = true

[dev-dependencies]
harmony-reconciler-contracts = { path = "../../harmony-reconciler-contracts" }