The fleet agent's NATS connection is the load-bearing piece of the
"never lose connectivity to a device" guarantee. This commit makes
that hold even when Zitadel access tokens expire across NATS pod
restarts and network partitions.
New `[credentials]` config variants (externally-tagged):
type = "toml-shared" { nats_user, nats_pass } # v0/dev
type = "zitadel-jwt" { key_path, oidc_issuer_url, audience, ... }
A `CredentialSource` enum dispatches per variant:
- TomlShared returns the same user/pass each call.
- ZitadelJwt mints an access token from Zitadel via the JWT-bearer
flow (RFC 7523). The keyfile at `key_path` is the only durable
secret on the device; the bearer token is short-lived and refreshed
in-memory when the cached value is within 5 minutes of expiry.
Two concurrent refreshes are race-safe — the second writer's mint
is wasted but produces a correct token.
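
A minimal sketch of the refresh rule described above, assuming a `Mutex`-guarded cache and a caller-supplied `mint` closure that stands in for the Zitadel exchange. `TokenCache`, `CachedToken` and `get_or_refresh` are illustrative names, not the agent's actual types, and `Instant` is used for expiry only to keep the example std-only:

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Refresh once the cached token is within 5 minutes of expiry.
const REFRESH_MARGIN: Duration = Duration::from_secs(5 * 60);

/// Hypothetical cached bearer token; field names are illustrative.
#[derive(Clone, Debug, PartialEq)]
struct CachedToken {
    value: String,
    expires_at: Instant,
}

impl CachedToken {
    /// True once `now` is inside the refresh margin or past expiry.
    fn needs_refresh(&self, now: Instant) -> bool {
        now + REFRESH_MARGIN >= self.expires_at
    }
}

/// Hypothetical in-memory cache; nothing here is persisted to disk.
struct TokenCache {
    slot: Mutex<Option<CachedToken>>,
}

impl TokenCache {
    fn new() -> Self {
        Self { slot: Mutex::new(None) }
    }

    /// Returns the cached token, or calls `mint` when a refresh is due.
    /// `mint` is a placeholder for the JWT-bearer exchange with Zitadel.
    fn get_or_refresh<F>(&self, now: Instant, mint: F) -> Result<String, String>
    where
        F: FnOnce() -> Result<CachedToken, String>,
    {
        {
            let guard = self.slot.lock().unwrap();
            if let Some(tok) = guard.as_ref() {
                if !tok.needs_refresh(now) {
                    return Ok(tok.value.clone());
                }
            }
        } // lock released before minting

        // Two callers can both reach this point. Each mints a valid token
        // and the later write simply replaces the earlier one.
        let fresh = mint()?;
        let value = fresh.value.clone();
        *self.slot.lock().unwrap() = Some(fresh);
        Ok(value)
    }
}
```

The lock is deliberately not held while minting, which is why overlapping refreshes can both mint: one exchange is redundant, but neither caller ever receives a stale token.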
The agent's `connect_nats` is rewritten on top of async-nats's
`with_auth_callback`, which is invoked on every (re)connect attempt:
- async-nats reconnects automatically on disconnect (default
behaviour of ConnectOptions) — we don't need a watchdog.
- Each reconnect attempt invokes the callback, which calls
`next_credential()`. If the cached token is expired, a fresh one
is minted before the reconnect proceeds. So a Pi that loses NATS
while its token has just expired will pick up a brand-new token
on the next reconnect attempt with no operator intervention.
- An `event_callback` surfaces Connected / Disconnected / SlowConsumer
/ ServerError events into tracing — operators can see exactly when
reconnects happen, which is non-negotiable for an out-of-warranty
device fleet.
A subtle constraint drove the trait shape: async-nats's
`with_auth_callback` requires the returned future to be `Send + Sync`,
which `#[async_trait]`'s erased `Pin<Box<dyn Future + Send>>` does
not satisfy. The credential source is therefore an enum (concrete
dispatch) rather than `dyn CredentialSource`. Two variants is small
enough that enum dispatch beats trait-object plumbing.
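
A std-only sketch of why enum dispatch satisfies the bound, assuming Rust edition 2018 or later. The `Credential` type, the variant fields, `assert_send_sync` and the tiny `block_on` executor are all illustrative and not taken from the agent's code. The future returned by an `async fn` on a concrete enum gets its auto traits inferred from what it captures, so it can be `Send + Sync`; a `Pin<Box<dyn Future + Send>>` would be rejected by the same check because the erased type carries no `Sync` bound:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical credential shape; the real type may differ.
#[derive(Debug, PartialEq)]
enum Credential {
    UserPass { user: String, pass: String },
    Token(String),
}

/// Two variants matched directly: no `dyn`, no boxed future.
enum CredentialSource {
    TomlShared { user: String, pass: String },
    /// Stand-in for the Zitadel-backed source; it only holds a cached token here.
    ZitadelJwt { cached: Mutex<String> },
}

impl CredentialSource {
    async fn next_credential(&self) -> Result<Credential, String> {
        match self {
            Self::TomlShared { user, pass } => Ok(Credential::UserPass {
                user: user.clone(),
                pass: pass.clone(),
            }),
            Self::ZitadelJwt { cached } => {
                // The guard is dropped at the end of this statement.
                let token = cached.lock().map_err(|e| e.to_string())?.clone();
                Ok(Credential::Token(token))
            }
        }
    }
}

/// Compile-time check mirroring the `Send + Sync` bound on the callback's future.
fn assert_send_sync<T: Send + Sync>(value: T) -> T {
    value
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny executor, sufficient for futures that never actually suspend.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

Passing `source.next_credential()` through `assert_send_sync` compiles only because every captured value is itself `Send + Sync`.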
Out of scope, tracked for follow-up: a separate daemon for SSH access
to the Pi via Tailscale/Headscale ("secure backdoor"), and the
device-join-request + admin-approve flow that would replace the
current admin-PAT bootstrap pattern.
-- documentation :
https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/nkey_auth
https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/jwt
https://docs.nats.io/running-a-nats-service/nats_admin/security/jwt
-- context : OpenBao can integrate with Zitadel directly (through JWKS or whichever protocol is required), but NATS cannot. See the documentation above and the analysis below.
These notes are taken from this video and the example code that goes with it:
https://www.youtube.com/watch?v=VvGxrT-jv64
https://github.com/synadia-io/rethink_connectivity/tree/main/19-auth-callout
- nsc generate nkey --account
  Generates the nkey pair for the auth callout service.
- nats.conf, add:
authorization {
  auth_callout {
    issuer: <pubkey of the new nsc key pair>
    auth_users: [ auth, user ] # users in the account that bypass the callout; the auth service itself connects as one of these. Everyone else is resolved dynamically by the callout, which is where JWT-driven user management happens.
    account: CHAT # account the auth service and its auth_users belong to; this account exists in the accounts block
  }
}
- Write the auth callout service, full code example here https://github.com/synadia-io/rethink_connectivity/tree/main/19-auth-callout
3.1 This service is the app authorized by the SSO provider (Google in the example, Zitadel in our case).
3.2 Load the nkey seed (the private key from the pair generated above).
3.3 Connect to NATS. The service talks to the NATS server over the NATS protocol itself to handle auth callout requests.
3.4 Subscribe to the KV workspace (not sure why yet).
3.5 Start building the NATS user JWT from the nkey in the request (each new client connection comes with an nkey that is used for the session).
3.6 Set the audience (the NATS account from above, CHAT in the example).
3.7 Validate and decode the SSO JWT (NATS passes the user's JWT as the token in the request's connection options).
3.8 Add the user to the workspace (so this is completely dynamic? how do we remove a user?).
3.9 Attach permissions inside the NATS JWT, such as
Allow: [ "$JS.API.INFO", format!("chat.*.{userId}") ], where userId is read from the Google JWT (the Zitadel JWT in our case).
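
Step 3.9 can be sketched in Rust as below. `allowed_subjects` is a hypothetical helper, not part of any NATS SDK, and the validation is an added assumption: since `.` separates subject tokens and `*` and `>` are wildcards in NATS subjects, an unchecked user id taken from the SSO JWT could widen the permission beyond that single user.

```rust
/// Hypothetical helper: builds the allow-list for one user.
/// The subjects mirror the example above; the validation rule is an assumption.
fn allowed_subjects(user_id: &str) -> Result<Vec<String>, String> {
    // Reject anything that would not be a single literal subject token.
    let unsafe_char = |c: char| c == '.' || c == '*' || c == '>' || c.is_whitespace();
    if user_id.is_empty() || user_id.chars().any(unsafe_char) {
        return Err(format!("user id {user_id:?} is not a safe subject token"));
    }
    Ok(vec![
        "$JS.API.INFO".to_string(),
        format!("chat.*.{user_id}"),
    ])
}
```

For a user id of `alice` this yields `$JS.API.INFO` and `chat.*.alice`; an id such as `*` or `a.b` is rejected before it reaches the NATS JWT.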
Synadia provides a small SDK that eases writing auth callout services in Go, but we are in Rust. It might be worth writing this service in Go to benefit from Synadia's tooling, but from what I gathered, minting the NATS JWT is probably the only part where the SDK would help us a lot. Then again, crafting a JWT should be fairly standard?
The interaction with Zitadel and everything else is likely the same amount of work or more for us in Go, since our entire ecosystem is in Rust. Let's analyze this properly.
https://github.com/synadia-io/callout.go/tree/main
https://github.com/synadia-io/callout.go/tree/main/examples/dynamic_accounts