# Manual Zitadel token mint + NATS write

Operator-side recipe for talking to a callout-protected NATS by
hand: sign a JWT-bearer assertion with a Zitadel machine user's
private key, exchange it for an access token, then drive `nats` CLI
commands with the token. Useful for debugging the auth chain,
poking the desired-state KV without the operator running, and
validating that a deployed callout is actually accepting what
you think it should.

Read [fleet-zitadel-faq.md](./fleet-zitadel-faq.md) first for the
underlying mechanism (RFC 7523 JWT-bearer flow, why we sign
locally, what each claim means).

## Inputs you need

Five strings:

| Input | Where to find it |
| --- | --- |
| `OIDC_ISSUER_URL` (the Zitadel base URL) | callout Deployment env: `kubectl exec -n fleet-system deploy/fleet-callout -- printenv OIDC_ISSUER_URL` |
| `project_id` (becomes the access token's `aud`) | callout Deployment env: `OIDC_AUDIENCE` (same `printenv` command, with `OIDC_AUDIENCE`) |
| Machine user's `userId` | the JSON keyfile's `userId` field |
| Machine user's `keyId` | the JSON keyfile's `keyId` field |
| Private RSA key (PEM) | the JSON keyfile's `key` field |

Get the `fleet-ops` (admin role) JSON keyfile from the cache:

```bash
jq -r '.machine_keys["fleet-ops"]' \
  ~/.local/share/harmony/zitadel/client-config.json \
  > /tmp/fleet-ops.json

jq -r '.userId' /tmp/fleet-ops.json   # → user_id
jq -r '.keyId'  /tmp/fleet-ops.json   # → key_id
jq -r '.key'    /tmp/fleet-ops.json > /tmp/fleet-ops.pem
```

The cache can drift from the deployed Zitadel state if Zitadel has
been re-seeded, so **always pull `OIDC_AUDIENCE` from the running
callout**, not from the cache. The cache fix landed in commit
`f4d6fb94`, but older entries can still trip you up.

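
The `jq` extraction above can also be done in Python, which lets a script read the keyfile directly instead of copying `userId` and `keyId` by hand. A minimal sketch, assuming the keyfile carries exactly the three fields listed in the table (`userId`, `keyId`, `key`); `parse_machine_key` and `load_machine_key` are hypothetical helper names, not part of any existing tooling:

```python
import json
from pathlib import Path

# Field names taken from the inputs table; anything else in the file is ignored.
REQUIRED_FIELDS = ("userId", "keyId", "key")


def parse_machine_key(text):
    """Parse keyfile JSON text and return (user_id, key_id, pem)."""
    doc = json.loads(text)
    missing = [name for name in REQUIRED_FIELDS if not doc.get(name)]
    if missing:
        raise ValueError(f"keyfile is missing fields: {missing}")
    return doc["userId"], doc["keyId"], doc["key"]


def load_machine_key(path="/tmp/fleet-ops.json"):
    """Read the keyfile written by the jq command and parse it."""
    return parse_machine_key(Path(path).read_text())
```

With this, `USER_ID, KEY_ID, key = load_machine_key()` could stand in for the hard-coded placeholders in the mint script.
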
## Mint script (PyJWT)

```python
# Save as mint.py.
#
# pip install "PyJWT[crypto]" requests   ← MUST be PyJWT, not the `jwt` package.
# (The [crypto] extra pulls in `cryptography`, which PyJWT needs for RS256.)
# The two share `import jwt`; `jwt` (the package) refuses raw PEM
# strings and demands an AbstractJWKBase wrapper. PyJWT takes PEM
# directly. If you ever see `TypeError: key must be an instance of
# a class implements jwt.AbstractJWKBase`, you have the wrong one.

import time

import jwt
import requests

# These come from the running callout + Zitadel. Don't reuse stale
# values from a checked-in note; verify against the live cluster.
OIDC_ISSUER_URL = "http://sso.fleet.local:8080"
PROJECT_ID = "371158654839160853"  # = OIDC_AUDIENCE on callout
USER_ID = "..."                    # from machine keyfile
KEY_ID = "..."                     # from machine keyfile

with open("/tmp/fleet-ops.pem") as f:
    key = f.read()
now = int(time.time())

assertion = jwt.encode(
    {
        "iss": USER_ID,
        "sub": USER_ID,
        "aud": OIDC_ISSUER_URL,  # for Zitadel itself, NOT the project_id
        "exp": now + 60,         # Zitadel rejects exp - iat > 60s
        "iat": now,
    },
    key,
    algorithm="RS256",
    headers={"kid": KEY_ID},  # PyJWT spelling: `headers=`, not `optional_headers=`
)

r = requests.post(
    f"{OIDC_ISSUER_URL}/oauth/v2/token",
    # `data=` sends application/x-www-form-urlencoded, which the token
    # endpoint expects.
    data={
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": assertion,
        # Three scopes:
        #   openid: base OIDC
        #   urn:zitadel:iam:org:projects:roles (PLURAL).
        #     Without this, Zitadel omits the role claim and the
        #     callout rejects with "no authorized role in token".
        #   urn:zitadel:iam:org:project:id:<id>:aud (singular).
        #     Tells Zitadel to put <id> into the access token's
        #     `aud` claim, which the callout's audience check
        #     compares against OIDC_AUDIENCE.
        "scope": (
            "openid "
            "urn:zitadel:iam:org:projects:roles "
            f"urn:zitadel:iam:org:project:id:{PROJECT_ID}:aud"
        ),
    },
    timeout=10,
)
r.raise_for_status()
token = r.json()["access_token"]

# Sanity check: decode without verifying the signature so you can see
# what Zitadel actually emitted. If anything below is wrong, the
# callout will reject your token.
print(jwt.decode(token, options={"verify_signature": False}))
print(token)  # keep this as the LAST line of output
```

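
If PyJWT is not importable (for instance while the `jwt` package is still shadowing it), the same unverified peek at the claims works with the standard library alone. A minimal sketch; `peek_claims` is a hypothetical helper, it assumes the access token is a compact JWT and not an opaque token, and like the `verify_signature: False` call in the script it says nothing about whether the token is authentic:

```python
import base64
import json


def peek_claims(token):
    """Decode a JWT's payload WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWT (expected 3 dot-separated parts)")
    payload = parts[1]
    # JWTs strip base64url padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```
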
Expected decoded claims (the parts the callout will check):

| Claim | What it should be | Why |
| --- | --- | --- |
| `iss` | `OIDC_ISSUER_URL` (byte-equal) | Callout: `validation.set_issuer(&[&self.issuer_url])` |
| `aud` | `["<PROJECT_ID>"]` | Callout: `validation.set_audience(&[&self.audience])`; the array form is Zitadel's default |
| `exp` | ~now + 12h | Zitadel default access token TTL |
| `client_id` | the machine user's username (`fleet-ops`, `device-vm-device-00`, …) | Callout uses this as `device_id_claim` (with optional `DEVICE_ID_PREFIX_STRIP` applied) |
| `urn:zitadel:iam:org:project:<PROJECT_ID>:roles` | object with role names as keys (e.g. `{"fleet-admin": {"<orgId>": "<orgName>"}}`) | Callout uses this as `roles_claim` and admits the role if `fleet-admin` or `device` is present |

If any of these is wrong, fix the script before bothering with NATS.

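
The table can be turned into a quick pre-flight check on the decoded claims. This is a rough sketch that only approximates the callout's checks as described in the table; it is not the callout's code, and `check_claims` is a hypothetical name. The callout's own log remains authoritative:

```python
def check_claims(claims, issuer_url, project_id,
                 allowed_roles=("fleet-admin", "device")):
    """Return a list of problems; an empty list means the claims look right."""
    problems = []

    # iss must match the configured issuer exactly (a trailing slash counts).
    if claims.get("iss") != issuer_url:
        problems.append(f"iss {claims.get('iss')!r} != {issuer_url!r}")

    # aud is normally an array, but tolerate a bare string too.
    aud = claims.get("aud")
    aud_list = [aud] if isinstance(aud, str) else list(aud or [])
    if project_id not in aud_list:
        problems.append(f"aud {aud_list!r} does not contain {project_id!r}")

    # The roles claim is an object keyed by role name.
    roles = claims.get(f"urn:zitadel:iam:org:project:{project_id}:roles") or {}
    if not any(role in roles for role in allowed_roles):
        problems.append("no authorized role in token")

    return problems
```

An empty list from `check_claims(decoded, OIDC_ISSUER_URL, PROJECT_ID)` suggests the token is worth trying against NATS; anything else points at the scope or issuer settings in the mint script.
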
## Drive NATS with the token

`nats --token=<bearer>` puts the value into the CONNECT frame's
`auth_token`, which is what the callout expects.

```bash
NATS_SERVER=192.168.122.1:30422       # libvirt host's port mapping
# mint.py is the mint script above; its last output line is the raw token
TOKEN=$(python3 mint.py | tail -1)

# Read everything (admin role allows >):
nats --server "$NATS_SERVER" --token "$TOKEN" kv ls device-info
nats --server "$NATS_SERVER" --token "$TOKEN" kv get device-info info.vm-device-00

# Write a desired state. The agent's KV watcher fires within 1s and
# the reconciler creates the podman container.
nats --server "$NATS_SERVER" --token "$TOKEN" \
  kv put desired-state vm-device-00.hello-web '{
    "name": "hello-web",
    "type": "PodmanV0",
    "data": {
      "services": [{
        "name": "testnginx",
        "image": "docker.io/nginx:latest",
        "ports": ["8080:80"]
      }]
    }
  }'
```

The exact JSON shape comes from
`harmony-reconciler-contracts/src/fleet.rs`. Read that crate when
in doubt about field names, NOT this doc; this doc is a worked
example and may drift.

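
When the payload is generated rather than typed, building it with `json.dumps` avoids shell-quoting mistakes in the inline JSON. A minimal sketch: `kv_put_argv` is a hypothetical helper, the `<device>.<name>` key layout is inferred from the single `vm-device-00.hello-web` example above, and the payload simply mirrors that worked example, not the authoritative contract in the crate:

```python
import json


def kv_put_argv(server, token, device, deployment):
    """Build the argv for `nats kv put desired-state <device>.<name> <json>`."""
    key = f"{device}.{deployment['name']}"  # assumed key layout
    return [
        "nats", "--server", server, "--token", token,
        "kv", "put", "desired-state", key,
        json.dumps(deployment),
    ]


# Same payload as the shell example above.
deployment = {
    "name": "hello-web",
    "type": "PodmanV0",
    "data": {
        "services": [{
            "name": "testnginx",
            "image": "docker.io/nginx:latest",
            "ports": ["8080:80"],
        }],
    },
}

argv = kv_put_argv("192.168.122.1:30422", "<token>", "vm-device-00", deployment)
```

Passing `argv` to `subprocess.run(argv, check=True)` would run the same command as the shell version, with the JSON delivered as a single argument.
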
## Common failures and what they mean

| Symptom | Likely cause |
| --- | --- |
| `TypeError: key must be an instance of … AbstractJWKBase` | Wrong PyPI package. `pip uninstall jwt && pip install PyJWT`. |
| HTTP 400 from `/oauth/v2/token`: `"invalid_grant_type"` | Request body not sent as `application/x-www-form-urlencoded` (with `requests`, use `data=`, not `json=`), OR `grant_type` value mistyped. The full URN is `urn:ietf:params:oauth:grant-type:jwt-bearer`. |
| HTTP 400: `"jwt: token is expired"` | Your assertion's `exp` is in the past, usually from wall-clock skew between your laptop and the cluster. Sync NTP. |
| Token mints but no `urn:zitadel:…:roles` claim | Missing the **plural** `urn:zitadel:iam:org:projects:roles` in scope. |
| Token mints but `aud` is the issuer URL instead of the project id | Forgot the `urn:zitadel:iam:org:project:id:<id>:aud` scope. |
| NATS CLI: `nats: Authorization Violation` | The token minted fine but the callout rejected it. Check `kubectl logs -n fleet-system -l app=fleet-callout` for the actual reason. The most common ones are "InvalidAudience" (your `aud` ≠ deployed `OIDC_AUDIENCE`) and "no authorized role in token". |
| Callout log: `JWT validation failed: InvalidIssuer` | Trailing slash drift. `OIDC_ISSUER_URL=http://sso.fleet.local:8080/` ≠ `http://sso.fleet.local:8080`. Match exactly. |

When the callout rejects, **its log is the source of truth**, not
your decoded claims. The validation error names the check that
failed; work backwards from there.

## Rotating the deployed `OIDC_AUDIENCE`

If Zitadel was re-seeded and `OIDC_AUDIENCE` on the callout now
points at a non-existent project:

```bash
# 1. Confirm the live project id ($PAT and $ZITADEL_URL are expanded
#    inside the container; the _search endpoint expects a POST)
oc -n zitadel exec -ti deploy/zitadel -- /bin/sh -c \
  'curl -s -X POST -H "Authorization: Bearer $PAT" \
     $ZITADEL_URL/management/v1/projects/_search \
   | jq ".result[] | select(.name == \"fleet\") | .id"'

# 2. Re-run the bring-up: the live-query fix in f4d6fb94 will
#    refresh OIDC_AUDIENCE on the next NatsAuthCalloutScore apply.
```

The shape of `mint.py` doesn't change between regular operation
and post-recovery; you just plug in the fresh project id as
`PROJECT_ID`, which must equal the callout's new `OIDC_AUDIENCE`.