Working PyJWT script + nats CLI commands for talking to a callout-protected NATS by hand. Distills what we learned debugging the auth chain: which scope claims matter, why the audience is the project id (not the API app's clientId), how to read OIDC_AUDIENCE off the live callout instead of trusting the cache, and the failure modes — including the PyJWT vs jwt package collision that costs 30 minutes the first time you hit it. Cross-linked from fleet-zitadel-faq.md.
Manual Zitadel token mint + NATS write
Operator-side recipe for talking to a callout-protected NATS by
hand: sign a JWT-bearer assertion with a Zitadel machine user's
private key, exchange it for an access token, drive nats CLI
commands with the token. Useful for debugging the auth chain,
poking the desired-state KV without the operator running, and
validating that a deployed callout is actually accepting what
you think it should.
Read fleet-zitadel-faq.md first for the underlying mechanism (RFC 7523 JWT-bearer flow, why we sign locally, what each claim means).
Inputs you need
Five strings:
| Input | Where to find it |
|---|---|
| `OIDC_ISSUER_URL` (the Zitadel base URL) | callout Deployment env: `kubectl exec -n fleet-system deploy/fleet-callout -- printenv OIDC_ISSUER_URL` |
| `project_id` (becomes the access token's `aud`) | callout Deployment env: `OIDC_AUDIENCE` |
| Machine user's `userId` | the JSON keyfile's `userId` field |
| Machine user's `keyId` | the JSON keyfile's `keyId` field |
| Private RSA key (PEM) | the JSON keyfile's `key` field |
Get the fleet-ops (admin role) JSON keyfile from the cache:
jq -r '.machine_keys["fleet-ops"]' \
~/.local/share/harmony/zitadel/client-config.json \
> /tmp/fleet-ops.json
jq -r '.userId' /tmp/fleet-ops.json # → user_id
jq -r '.keyId' /tmp/fleet-ops.json # → key_id
jq -r '.key' /tmp/fleet-ops.json > /tmp/fleet-ops.pem
The cache may drift from the deployed Zitadel state if Zitadel has
been re-seeded; always pull OIDC_AUDIENCE from the running
callout, not from the cache. The cache fix landed in commit
f4d6fb94 but older entries can still trip you up.
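The jq extraction above can also be done from Python, so the mint script can read the keyfile directly instead of carrying pasted values. This is a minimal sketch: `load_machine_key` is a hypothetical helper name, and it assumes only the three fields the table above names (`userId`, `keyId`, `key`).

```python
import json
from pathlib import Path


def load_machine_key(path):
    """Return (user_id, key_id, pem) from a machine-user JSON keyfile.

    Assumes the keyfile carries the userId, keyId and key fields
    described in the inputs table; anything else is ignored.
    """
    keyfile = json.loads(Path(path).read_text())
    missing = [field for field in ("userId", "keyId", "key") if not keyfile.get(field)]
    if missing:
        raise ValueError(f"{path}: keyfile is missing {', '.join(missing)}")
    return keyfile["userId"], keyfile["keyId"], keyfile["key"]
```

With this in place, `USER_ID, KEY_ID, key = load_machine_key("/tmp/fleet-ops.json")` would replace the hand-copied values and the separate PEM file. The project id still has to come from the running callout.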
Mint script (PyJWT)
# pip install PyJWT requests ← MUST be PyJWT, not the `jwt` package.
# The two share `import jwt`; `jwt` (the package) refuses raw PEM
# strings and demands an AbstractJWKBase wrapper. PyJWT takes PEM
# directly. If you ever see `TypeError: key must be an instance of
# a class implements jwt.AbstractJWKBase`, you have the wrong one.
import jwt, time, requests
# These come from the running callout + Zitadel. Don't reuse stale
# values from a checked-in note; verify against the live cluster.
OIDC_ISSUER_URL = "http://sso.fleet.local:8080"
PROJECT_ID = "371158654839160853" # = OIDC_AUDIENCE on callout
USER_ID = "..." # from machine keyfile
KEY_ID = "..." # from machine keyfile
key = open("/tmp/fleet-ops.pem").read()
now = int(time.time())
assertion = jwt.encode(
    {
        "iss": USER_ID,
        "sub": USER_ID,
        "aud": OIDC_ISSUER_URL,  # for Zitadel itself, NOT the project_id
        "exp": now + 60,         # Zitadel rejects exp - iat > 60s
        "iat": now,
    },
    key,
    algorithm="RS256",
    headers={"kid": KEY_ID},  # PyJWT spelling: `headers=`, not `optional_headers=`
)
r = requests.post(
    f"{OIDC_ISSUER_URL}/oauth/v2/token",
    data={
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": assertion,
        # Three scopes:
        #   openid: base OIDC
        #   urn:zitadel:iam:org:projects:roles: PLURAL.
        #     Without this, Zitadel omits the role claim and the
        #     callout rejects with "no authorized role in token".
        #   urn:zitadel:iam:org:project:id:<id>:aud: singular.
        #     Tells Zitadel to put <id> into the access token's
        #     `aud` claim, which the callout's audience check
        #     compares against OIDC_AUDIENCE.
        "scope": (
            "openid "
            "urn:zitadel:iam:org:projects:roles "
            f"urn:zitadel:iam:org:project:id:{PROJECT_ID}:aud"
        ),
    },
)
r.raise_for_status()
token = r.json()["access_token"]
# Sanity check — decode without verifying signature so you can see
# what Zitadel actually emitted. If anything below is wrong, the
# callout will reject your token.
print(jwt.decode(token, options={"verify_signature": False}))
print(token)
Expected decoded claims (the parts the callout will check):
| Claim | What it should be | Why |
|---|---|---|
| `iss` | `OIDC_ISSUER_URL` (byte-equal) | Callout: `validation.set_issuer(&[&self.issuer_url])` |
| `aud` | `["<PROJECT_ID>"]` | Callout: `validation.set_audience(&[&self.audience])`; the array form is Zitadel's default |
| `exp` | ~now + 12h | Zitadel default access token TTL |
| `client_id` | the machine user's username (`fleet-ops`, `device-vm-device-00`, …) | Callout uses this as `device_id_claim` (with optional `DEVICE_ID_PREFIX_STRIP` applied) |
| `urn:zitadel:iam:org:project:<PROJECT_ID>:roles` | object with role names as keys (e.g. `{"fleet-admin": {"<orgId>": "<orgName>"}}`) | Callout uses this as `roles_claim` and admits the role if `fleet-admin` or `device` is present |
If any of these is wrong, fix the script before bothering with NATS.
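The checks in the table above can be run locally against the decoded claims before touching NATS. The sketch below is a rough approximation, not the callout's code: `check_claims` is a hypothetical helper, and the allowed role names are assumed from the table (`fleet-admin`, `device`). The callout's log remains the authority on what it actually rejects.

```python
def check_claims(claims, issuer_url, project_id, allowed_roles=("fleet-admin", "device")):
    """Return a list of problems found in decoded access-token claims.

    An empty list means the claims match what the table above expects.
    This mirrors the documented checks only; it verifies no signature.
    """
    problems = []

    # Issuer must be byte-equal, so a trailing slash counts as a mismatch.
    if claims.get("iss") != issuer_url:
        problems.append(f"iss {claims.get('iss')!r} != {issuer_url!r}")

    # aud is normally a list, but tolerate a bare string as well.
    aud = claims.get("aud", [])
    if isinstance(aud, str):
        aud = [aud]
    if project_id not in aud:
        problems.append(f"aud {aud!r} does not contain project id {project_id!r}")

    # Roles live under a project-scoped claim, keyed by role name.
    roles = claims.get(f"urn:zitadel:iam:org:project:{project_id}:roles") or {}
    if not any(role in roles for role in allowed_roles):
        problems.append(f"no authorized role in token (found {sorted(roles)!r})")

    return problems
```

In the mint script this could be called as `check_claims(jwt.decode(token, options={"verify_signature": False}), OIDC_ISSUER_URL, PROJECT_ID)`, printing any problems before the token is handed to the nats CLI.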
Drive NATS with the token
nats --token=<bearer> puts the value into the CONNECT frame's
auth_token, which is what the callout expects.
NATS_SERVER=192.168.122.1:30422 # libvirt host's port mapping
TOKEN=$(python3 mint.py | tail -1) # last line is the raw token
# Read everything (admin role allows >):
nats --server "$NATS_SERVER" --token "$TOKEN" kv ls device-info
nats --server "$NATS_SERVER" --token "$TOKEN" kv get device-info info.vm-device-00
# Write a desired state — agent's KV watcher fires within 1s,
# reconciler creates the podman container.
nats --server "$NATS_SERVER" --token "$TOKEN" \
  kv put desired-state vm-device-00.hello-web '{
    "name": "hello-web",
    "type": "PodmanV0",
    "data": {
      "services": [{
        "name": "testnginx",
        "image": "docker.io/nginx:latest",
        "ports": ["8080:80"]
      }]
    }
  }'
The exact JSON shape comes from
harmony-reconciler-contracts/src/fleet.rs — read that crate when
in doubt about field names, NOT this doc; this doc is a worked
example and may drift.
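When scripting several writes, building the key and value in Python avoids shell-quoting mistakes in the JSON. This sketch only reproduces the shape of the worked example above; `desired_state_entry` is a hypothetical helper, the `<device>.<name>` key layout is inferred from that one example, and the contracts crate remains the reference for field names.

```python
import json


def desired_state_entry(device_id, name, services, kind="PodmanV0"):
    """Build the (key, value) pair for a desired-state KV put.

    Field names are copied from the worked example in this doc and
    may drift from harmony-reconciler-contracts; check the crate.
    """
    key = f"{device_id}.{name}"
    value = json.dumps({"name": name, "type": kind, "data": {"services": services}})
    return key, value


if __name__ == "__main__":
    key, value = desired_state_entry(
        "vm-device-00",
        "hello-web",
        [{"name": "testnginx", "image": "docker.io/nginx:latest", "ports": ["8080:80"]}],
    )
    print(key)
    print(value)
```

The resulting pair can be passed as the last two arguments of the `kv put desired-state` command shown earlier.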
Common failures and what they mean
| Symptom | Likely cause |
|---|---|
| `TypeError: key must be an instance of … AbstractJWKBase` | Wrong PyPI package. `pip uninstall jwt && pip install PyJWT`. |
| HTTP 400 from `/oauth/v2/token`: "invalid_grant_type" | The body was not sent form-encoded (`application/x-www-form-urlencoded`), or the `grant_type` value is mistyped. The full URN is `urn:ietf:params:oauth:grant-type:jwt-bearer`. |
| HTTP 400: "jwt: token is expired" | Your assertion's `exp` is in the past, usually because of wall-clock skew between your laptop and the cluster. Sync NTP. |
| Token mints but no `urn:zitadel:…:roles` claim | Missing the plural `urn:zitadel:iam:org:projects:roles` in `scope`. |
| Token mints but `aud` is the issuer URL instead of the project id | Forgot the `urn:zitadel:iam:org:project:id:<id>:aud` scope. |
| NATS CLI: `nats: Authorization Violation` | The token minted fine but the callout rejected it. Check `kubectl logs -n fleet-system -l app=fleet-callout` for the actual reason. The most common ones are "InvalidAudience" (your `aud` ≠ deployed `OIDC_AUDIENCE`) and "no authorized role in token". |
| Callout log: `JWT validation failed: InvalidIssuer` | Trailing slash drift. `OIDC_ISSUER_URL=http://sso.fleet.local:8080/` ≠ `http://sso.fleet.local:8080`. Match exactly. |
When the callout rejects, its log is the source of truth, not your decoded claims. The validation error includes which check failed; work backwards from there.
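If the `import jwt` collision makes you unsure which package is doing the decoding, the claims can be inspected with the standard library alone. The helper below is a small sketch (`decode_jwt_payload` is a hypothetical name): it base64url-decodes the middle segment of a compact JWT and verifies nothing, so it is for inspection only.

```python
import base64
import json


def decode_jwt_payload(token):
    """Return the claims of a compact JWT without verifying the signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWT (expected three dot-separated segments)")
    payload = parts[1]
    # JWT segments drop base64 padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Comparing its output for `iss`, `aud`, and the roles claim against the callout's log message usually shows which value drifted.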
Rotating the deployed OIDC_AUDIENCE
If Zitadel was re-seeded and OIDC_AUDIENCE on the callout now
points at a non-existent project:
# 1. Confirm the live project id
oc -n zitadel exec -ti deploy/zitadel -- /bin/sh -c \
  'curl -s -X POST -H "Authorization: Bearer $PAT" \
    $ZITADEL_URL/management/v1/projects/_search \
    | jq ".result[] | select(.name == \"fleet\") | .id"'
# 2. Re-run the bring-up — the live-query fix in f4d6fb94 will
# refresh OIDC_AUDIENCE on the next NatsAuthCalloutScore apply.
The shape of mint.py doesn't change between regular operation
and post-recovery; you just set PROJECT_ID to the fresh project
id, which is also the callout's new OIDC_AUDIENCE.