Two bugs surfaced when the agent went live against NATS JetStream KV
in the VM-based e2e rehearsal:
1. The default `device` role only allowed flat `device-state.<id>` /
`device-commands.<id>` subjects. The agent's actual data plane is
JetStream KV, which puts every operation on `$KV.<bucket>.<key>`
subjects with control-plane traffic on `$JS.API.>` and `$JS.ACK.>`.
With the old role config, the very first KV publish died with
`Permissions Violation for Publish to "$JS.API.INFO"`.
The role now allows `$JS.API.>` and `$JS.ACK.>`, plus the four
per-device data subjects derived from
`harmony_reconciler_contracts::kv` (`info.<id>`, `state.<id>.<dep>`,
`heartbeat.<id>`, `desired-state.<id>.<dep>`). The legacy direct
`device-state.<id>` / `device-commands.<id>` subjects are kept so
non-JetStream callers of `NatsAuthCalloutScore` still work.
A new unit test (`device_role_covers_reconciler_contract_kv_subjects`)
imports the contract crate as a dev-dep and asserts each contract-
produced subject is matched, plus that cross-device subjects are
*not* matched. This locks the role config to the contract surface so
future renames break the test before they break prod.
2. Zitadel's `client_id` claim for a machine user equals the userName
verbatim. Both `fleet_rpi_setup` and `fleet_e2e_demo` create the
user as `device-{device_id}`, so the JWT carries
`device-vm-device-00` while the agent's KV keys use the bare
`vm-device-00`. The callout was interpolating the prefixed string
into permissions, producing rules that never matched what the
agent actually publishes.
Adds `device_id_prefix_strip` (env: `DEVICE_ID_PREFIX_STRIP`,
defaults empty so existing deployments are unaffected). When set,
the validator strips the prefix from the extracted claim before
permission interpolation. The fleet_auth_callout example wires it
to `device-` so the e2e harness stays end-to-end correct without
reaching into either naming convention.
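The prefix handling described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the validator's actual code: the function names `strip_device_prefix` and `interpolate`, the `{device_id}` placeholder, and the `$KV.*.heartbeat.` template are all assumptions made for the example. Only the `device-` prefix and the two device id strings come from the description above.

```rust
/// Hypothetical helper: remove the configured prefix from the extracted
/// claim. An empty prefix (the default) leaves the claim unchanged, as
/// does a claim that does not start with the prefix.
fn strip_device_prefix<'a>(claim: &'a str, prefix: &str) -> &'a str {
    claim.strip_prefix(prefix).unwrap_or(claim)
}

/// Hypothetical permission template interpolation. The `{device_id}`
/// placeholder syntax is an assumption for this sketch.
fn interpolate(template: &str, device_id: &str) -> String {
    template.replace("{device_id}", device_id)
}

fn main() {
    // Zitadel hands back the machine user's userName as `client_id`.
    let claim = "device-vm-device-00";

    // Default (empty prefix): the rule carries the prefixed id and does
    // not line up with the bare id the agent uses in its KV keys.
    let old = interpolate(
        "$KV.*.heartbeat.{device_id}",
        strip_device_prefix(claim, ""),
    );

    // With the prefix set to `device-`: the rule uses the bare id.
    let new = interpolate(
        "$KV.*.heartbeat.{device_id}",
        strip_device_prefix(claim, "device-"),
    );

    println!("without strip: {old}");
    println!("with strip:    {new}");
}
```

Falling back to the unmodified claim when the prefix is absent keeps the empty default a no-op, which is consistent with existing deployments being unaffected.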
Verified end-to-end: both VM agents now publish DeviceInfo /
heartbeat through JetStream KV with no permission errors and zero
service restarts since the rollout.
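To make the role change in item 1 concrete, here is a rough sketch of how a per-device allow-list behaves under NATS wildcard matching (`*` matches exactly one token, `>` matches one or more trailing tokens). It is not the crate's real role config or its unit test: `subject_matches`, `device_rules`, and `is_allowed` are hypothetical names, and the `$KV.*.` bucket wildcard is an assumption, since the actual bucket names are not given here.

```rust
/// Token-wise NATS subject matching: `*` matches exactly one token,
/// `>` matches one or more remaining tokens and must be the last token.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut p = pattern.split('.');
    let mut s = subject.split('.');
    loop {
        match (p.next(), s.next()) {
            (Some(">"), Some(_)) => return p.next().is_none(),
            (Some("*"), Some(_)) => continue,
            (Some(a), Some(b)) if a == b => continue,
            (None, None) => return true,
            _ => return false,
        }
    }
}

/// Hypothetical allow-list for one device. The `$KV.*.` prefix (any
/// bucket) is an assumption made for this sketch.
fn device_rules(id: &str) -> Vec<String> {
    vec![
        // JetStream control plane.
        "$JS.API.>".to_string(),
        "$JS.ACK.>".to_string(),
        // The four per-device KV data subjects.
        format!("$KV.*.info.{id}"),
        format!("$KV.*.state.{id}.>"),
        format!("$KV.*.heartbeat.{id}"),
        format!("$KV.*.desired-state.{id}.>"),
        // Legacy flat subjects kept for non-JetStream callers.
        format!("device-state.{id}"),
        format!("device-commands.{id}"),
    ]
}

fn is_allowed(rules: &[String], subject: &str) -> bool {
    rules.iter().any(|rule| subject_matches(rule, subject))
}

fn main() {
    let rules = device_rules("vm-device-00");

    // The request that failed under the old role config.
    assert!(is_allowed(&rules, "$JS.API.INFO"));
    // Own KV key is covered.
    assert!(is_allowed(&rules, "$KV.some-bucket.heartbeat.vm-device-00"));
    // Another device's key is not.
    assert!(!is_allowed(&rules, "$KV.some-bucket.heartbeat.vm-device-01"));

    println!("role sketch covers own subjects and rejects cross-device ones");
}
```

The negative assertion matters as much as the positive ones: a rule that is too broad (for example `$KV.>`) would pass every positive check while letting one device write another device's keys.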
35 lines
919 B
TOML
[package]
name = "harmony-nats-callout"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
description = "NATS auth callout service for Zitadel SSO with per-device permissions"
rust-version = "1.85"

[lib]
name = "harmony_nats_callout"
path = "src/lib.rs"

[[bin]]
name = "harmony-nats-callout"
path = "src/main.rs"

[dependencies]
nats-jwt = { path = "../jwt" }
async-nats.workspace = true
nkeys = "0.4"
jsonwebtoken = "9"
reqwest = { workspace = true }
serde = { workspace = true, features = ["derive"] }
serde_json.workspace = true
tracing.workspace = true
tracing-subscriber.workspace = true
thiserror.workspace = true
anyhow.workspace = true
tokio = { workspace = true, features = ["rt", "rt-multi-thread", "macros", "signal", "sync", "time"] }
futures-util.workspace = true

[dev-dependencies]
harmony-reconciler-contracts = { path = "../../harmony-reconciler-contracts" }