harmony/docs/guides/fleet-manual-token-mint.md
Jean-Gabriel Gill-Couture 612d934ad4 docs(fleet): manual JWT-bearer mint + NATS write recipe
Working PyJWT script + nats CLI commands for talking to a
callout-protected NATS by hand. Distills what we learned debugging
the auth chain: which scope claims matter, why the audience is the
project id (not the API app's clientId), how to read OIDC_AUDIENCE
off the live callout instead of trusting the cache, and the failure
modes — including the PyJWT vs jwt package collision that costs
30 minutes the first time you hit it.

Cross-linked from fleet-zitadel-faq.md.
2026-05-05 01:43:36 -04:00


Manual Zitadel token mint + NATS write

Operator-side recipe for talking to a callout-protected NATS by hand: sign a JWT-bearer assertion with a Zitadel machine user's private key, exchange it for an access token, then drive nats CLI commands with the token. Useful for debugging the auth chain, poking the desired-state KV without the operator running, and validating that a deployed callout accepts what you think it should.

Read fleet-zitadel-faq.md first for the underlying mechanism (RFC 7523 JWT-bearer flow, why we sign locally, what each claim means).

Inputs you need

Five strings:

| Input | Where to find it |
| --- | --- |
| OIDC_ISSUER_URL (the Zitadel base URL) | callout Deployment env: `kubectl exec -n fleet-system deploy/fleet-callout -- printenv OIDC_ISSUER_URL` |
| project_id (becomes the access token's `aud`) | callout Deployment env: `OIDC_AUDIENCE` |
| Machine user's userId | the JSON keyfile's `userId` field |
| Machine user's keyId | the JSON keyfile's `keyId` field |
| Private RSA key (PEM) | the JSON keyfile's `key` field |

Get the fleet-ops (admin role) JSON keyfile from the cache:

jq -r '.machine_keys["fleet-ops"]' \
  ~/.local/share/harmony/zitadel/client-config.json \
  > /tmp/fleet-ops.json

jq -r '.userId' /tmp/fleet-ops.json    # → user_id
jq -r '.keyId'  /tmp/fleet-ops.json    # → key_id
jq -r '.key'    /tmp/fleet-ops.json    > /tmp/fleet-ops.pem

The cache may drift from the deployed Zitadel state if Zitadel has been re-seeded; always pull OIDC_AUDIENCE from the running callout, not from the cache. The cache fix landed in commit f4d6fb94 but older entries can still trip you up.
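
If you would rather pull the three keyfile values from Python than shell out to jq, the extraction above can be sketched as follows. The function name `load_machine_key` is hypothetical, and the code assumes the cache entry is either a nested object or a JSON-encoded string (the jq recipe works for both, so this mirrors it).

```python
import json
from pathlib import Path


def load_machine_key(cache_path, machine_user):
    """Return (user_id, key_id, pem) for one machine user in the cache file."""
    cache = json.loads(Path(cache_path).expanduser().read_text())
    entry = cache["machine_keys"][machine_user]
    # Assumption: the keyfile may be stored as an object or as a JSON string.
    if isinstance(entry, str):
        entry = json.loads(entry)
    return entry["userId"], entry["keyId"], entry["key"]
```

Typical use would be `user_id, key_id, pem = load_machine_key("~/.local/share/harmony/zitadel/client-config.json", "fleet-ops")`. The same drift caveat applies: this reads the cache, so OIDC_AUDIENCE still has to come from the running callout.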

Mint script (PyJWT)

# pip install PyJWT requests   ← MUST be PyJWT, not the `jwt` package.
# The two share `import jwt`; `jwt` (the package) refuses raw PEM
# strings and demands an AbstractJWKBase wrapper. PyJWT takes PEM
# directly. If you ever see `TypeError: key must be an instance of
# a class implements jwt.AbstractJWKBase`, you have the wrong one.

import jwt, time, requests

# These come from the running callout + Zitadel. Don't reuse stale
# values from a checked-in note; verify against the live cluster.
OIDC_ISSUER_URL = "http://sso.fleet.local:8080"
PROJECT_ID      = "371158654839160853"   # = OIDC_AUDIENCE on callout
USER_ID         = "..."                  # from machine keyfile
KEY_ID          = "..."                  # from machine keyfile

key = open("/tmp/fleet-ops.pem").read()
now = int(time.time())

assertion = jwt.encode(
    {
        "iss": USER_ID,
        "sub": USER_ID,
        "aud": OIDC_ISSUER_URL,   # for Zitadel itself, NOT the project_id
        "exp": now + 60,          # Zitadel rejects exp - iat > 60s
        "iat": now,
    },
    key,
    algorithm="RS256",
    headers={"kid": KEY_ID},      # PyJWT spelling — `headers=`, not `optional_headers=`
)

r = requests.post(
    f"{OIDC_ISSUER_URL}/oauth/v2/token",
    data={
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion":  assertion,
        # Three scopes:
        #   openid                                     — base OIDC
        #   urn:zitadel:iam:org:projects:roles         — PLURAL.
        #     Without this, Zitadel omits the role claim and the
        #     callout rejects with "no authorized role in token".
        #   urn:zitadel:iam:org:project:id:<id>:aud    — singular.
        #     Tells Zitadel to put <id> into the access token's
        #     `aud` claim, which the callout's audience check
        #     compares against OIDC_AUDIENCE.
        "scope": (
            "openid "
            "urn:zitadel:iam:org:projects:roles "
            f"urn:zitadel:iam:org:project:id:{PROJECT_ID}:aud"
        ),
    },
    timeout=10,   # fail fast if Zitadel is unreachable instead of hanging
)
r.raise_for_status()
token = r.json()["access_token"]

# Sanity check — decode without verifying signature so you can see
# what Zitadel actually emitted. If anything below is wrong, the
# callout will reject your token.
print(jwt.decode(token, options={"verify_signature": False}))
print(token)

Expected decoded claims (the parts the callout will check):

| Claim | What it should be | Why |
| --- | --- | --- |
| `iss` | OIDC_ISSUER_URL (byte-equal) | Callout: `validation.set_issuer(&[&self.issuer_url])` |
| `aud` | `["<PROJECT_ID>"]` | Callout: `validation.set_audience(&[&self.audience])`; the array form is Zitadel's default |
| `exp` | ~now + 12h | Zitadel default access token TTL |
| `client_id` | the machine user's username (fleet-ops, device-vm-device-00, …) | Callout uses this as device_id_claim (with optional DEVICE_ID_PREFIX_STRIP applied) |
| `urn:zitadel:iam:org:project:<PROJECT_ID>:roles` | object with role names as keys (e.g. `{"fleet-admin": {"<orgId>": "<orgName>"}}`) | Callout uses this as roles_claim and admits the role if fleet-admin or device is present |

If any of these is wrong, fix the script before bothering with NATS.
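
The table can be turned into a quick local check on the decoded claims. This is a sketch that mirrors the table, not the callout's actual validation code; `check_claims` is a hypothetical helper, and the allowed role names are the two the table mentions.

```python
def check_claims(claims, issuer_url, project_id,
                 allowed_roles=("fleet-admin", "device")):
    """Return a list of problems; an empty list means the claims look right."""
    problems = []
    # iss must be byte-equal to the issuer URL, trailing slash included.
    if claims.get("iss") != issuer_url:
        problems.append(f"iss is {claims.get('iss')!r}, expected {issuer_url!r}")
    # aud is normally a list, but tolerate a bare string as well.
    aud = claims.get("aud", [])
    if isinstance(aud, str):
        aud = [aud]
    if project_id not in aud:
        problems.append(f"aud {aud!r} does not contain {project_id!r}")
    # The roles claim is an object keyed by role name.
    roles = claims.get(f"urn:zitadel:iam:org:project:{project_id}:roles") or {}
    if not any(role in roles for role in allowed_roles):
        problems.append(f"no role from {allowed_roles!r} in roles claim")
    if not claims.get("client_id"):
        problems.append("client_id is missing")
    return problems
```

Feed it the dict that `jwt.decode(..., options={"verify_signature": False})` returns in the mint script. A clean result only means the claims match this table; the callout's log remains the authority on what it accepts.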

Drive NATS with the token

nats --token=<bearer> puts the value into the CONNECT frame's auth_token, which is what the callout expects.
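
To make the CONNECT frame concrete, here is a rough sketch of the line a client sends after the server's INFO. It is illustrative only: a real client such as the nats CLI adds more fields (name, lang, version, protocol, and so on), and `connect_frame` is a hypothetical helper.

```python
import json


def connect_frame(token):
    """Build a minimal NATS CONNECT line that carries a bearer token."""
    options = {
        "verbose": False,
        "pedantic": False,
        "auth_token": token,  # the field the callout reads the token from
    }
    # NATS protocol lines are terminated by CRLF.
    return "CONNECT " + json.dumps(options) + "\r\n"
```

The point is that the token travels as an opaque string in `auth_token`; the server hands the connection details to the callout, which does the JWT validation.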

NATS_SERVER=192.168.122.1:30422       # libvirt host's port mapping
TOKEN=$(python3 mint.py | tail -1)    # last line is the raw token

# Read everything (admin role allows >):
nats --server "$NATS_SERVER" --token "$TOKEN" kv ls device-info
nats --server "$NATS_SERVER" --token "$TOKEN" kv get device-info info.vm-device-00

# Write a desired state — agent's KV watcher fires within 1s,
# reconciler creates the podman container.
nats --server "$NATS_SERVER" --token "$TOKEN" \
  kv put desired-state vm-device-00.hello-web '{
    "name": "hello-web",
    "type": "PodmanV0",
    "data": {
      "services": [{
        "name":  "testnginx",
        "image": "docker.io/nginx:latest",
        "ports": ["8080:80"]
      }]
    }
  }'

The exact JSON shape comes from harmony-reconciler-contracts/src/fleet.rs — read that crate when in doubt about field names, NOT this doc; this doc is a worked example and may drift.
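
Shell quoting around multi-line JSON is easy to get wrong, so it can help to build the payload and the command line in Python. This sketch copies the field names from the worked example above (so it can drift the same way), assumes the KV key is `<device>.<name>` as in that example, and uses hypothetical helper names.

```python
import json


def desired_state_payload(name, service_name, image, ports):
    """Serialize a deployment in the shape of the worked example."""
    return json.dumps(
        {
            "name": name,
            "type": "PodmanV0",
            "data": {
                "services": [
                    {"name": service_name, "image": image, "ports": list(ports)}
                ]
            },
        }
    )


def kv_put_argv(server, token, device, name, payload):
    """Build the argv for `nats kv put` without executing anything."""
    key = f"{device}.{name}"  # assumption: key layout taken from the example
    return ["nats", "--server", server, "--token", token,
            "kv", "put", "desired-state", key, payload]
```

Pass the resulting list to `subprocess.run(argv, check=True)` to perform the write; because it is an argv list rather than a shell string, the JSON needs no extra quoting.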

Common failures and what they mean

| Symptom | Likely cause |
| --- | --- |
| `TypeError: key must be an instance of … AbstractJWKBase` | Wrong PyPI package. `pip uninstall jwt && pip install PyJWT`. |
| HTTP 400 from /oauth/v2/token: "invalid_grant_type" | Request body was not sent form-encoded (application/x-www-form-urlencoded; for example, `json=` instead of `data=` in requests), OR grant_type value mistyped. The full URN is `urn:ietf:params:oauth:grant-type:jwt-bearer`. |
| HTTP 400: "jwt: token is expired" | Your assertion's exp is in the past. Usually wall-clock skew between your laptop and the cluster; sync NTP. |
| Token mints but no `urn:zitadel:…:roles` claim | Missing the plural `urn:zitadel:iam:org:projects:roles` in scope. |
| Token mints but aud is the issuer URL instead of the project id | Forgot the `urn:zitadel:iam:org:project:id:<id>:aud` scope. |
| NATS CLI: `nats: Authorization Violation` | Token is good but callout rejected it. Check `kubectl logs -n fleet-system -l app=fleet-callout` for the actual reason. The most common ones are "InvalidAudience" (your aud ≠ deployed OIDC_AUDIENCE) and "no authorized role in token". |
| Callout log: `JWT validation failed: InvalidIssuer` | Trailing slash drift: `OIDC_ISSUER_URL=http://sso.fleet.local:8080/` does not match `http://sso.fleet.local:8080`. Match exactly. |

When the callout rejects, its log is the source of truth, not your decoded claims. The validation error includes which check failed; work backwards from there.
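
For the InvalidIssuer row in particular, a byte-for-byte comparison makes trailing-slash drift obvious. This is a small sketch with a hypothetical helper name; it compares the token's `iss` with the OIDC_ISSUER_URL you read off the callout.

```python
def issuer_mismatch(token_iss, callout_issuer):
    """Return None if the strings are byte-equal, else a short diagnosis."""
    if token_iss == callout_issuer:
        return None
    # Equal once trailing slashes are stripped: the classic drift case.
    if token_iss.rstrip("/") == callout_issuer.rstrip("/"):
        return "trailing-slash drift"
    return "different issuer"
```

A "trailing-slash drift" result means one side has to change so both strings match exactly; normalizing only in your script will not help, since the callout compares the raw values.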

Rotating the deployed OIDC_AUDIENCE

If Zitadel was re-seeded and OIDC_AUDIENCE on the callout now points at a non-existent project:

# 1. Confirm the live project id (_search is a POST endpoint in the
#    Zitadel management API, so curl needs -X POST)
oc -n zitadel exec -ti deploy/zitadel -- /bin/sh -c \
  'curl -s -X POST -H "Authorization: Bearer $PAT" \
        $ZITADEL_URL/management/v1/projects/_search \
   | jq ".result[] | select(.name == \"fleet\") | .id"'

# 2. Re-run the bring-up — the live-query fix in f4d6fb94 will
#    refresh OIDC_AUDIENCE on the next NatsAuthCalloutScore apply.

The shape of mint.py doesn't change between regular operation and post-recovery; you just set PROJECT_ID to the fresh OIDC_AUDIENCE value read from the running callout.
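
If you capture the project search response in Python rather than piping it through jq, the filter from step 1 can be mirrored as below. The response shape (`result` list of objects with `name` and `id`) is taken from that jq expression, and `find_project_id` is a hypothetical helper that returns the first match only.

```python
def find_project_id(search_response, name="fleet"):
    """Return the id of the first project with the given name, or None."""
    for project in search_response.get("result", []):
        if project.get("name") == name:
            return project.get("id")
    return None
```

The returned id is the value to compare against the callout's OIDC_AUDIENCE and to plug into PROJECT_ID.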