harmony/examples/harmony_sso/harmony_sso_plan.md
Jean-Gabriel Gill-Couture 5415452f15 refactor(harmony-sso): replace kubectl with typed K8s APIs, add Zitadel deployment
Replace all Command::new("kubectl") calls with harmony-k8s K8sClient
methods:
- wait_for_pod_ready() instead of kubectl get pod jsonpath
- exec_pod_capture_output() for OpenBao init/unseal/configure
- delete_resource<MutatingWebhookConfiguration>() for webhook cleanup
- port_forward() instead of kubectl port-forward subprocess

Thread K3d and K8sClient through all functions instead of
reconstructing context strings. Consolidate path helpers into
harmony_data_dir().

Add Zitadel deployment via ZitadelScore with retry logic for CNPG CRD
registration race and PostgreSQL cluster readiness timing.

Add CLI flags: --demo, --sso-demo, --skip-zitadel, --cleanup.
Add --demo mode: ConfigManager with EnvSource + StoreSource<OpenbaoSecretStore>.
Configure OpenBao with harmony-dev policy, userpass auth, and JWT auth.
2026-03-28 23:48:00 -04:00

Harmony SSO Plan

Context

Deploy Zitadel and OpenBao on a local k3d cluster, use them as harmony_config backends, and demonstrate end-to-end config storage authenticated via SSO. The goal: rock-solid deployment so teams and collaborators can reliably share config and secrets through OpenBao with Zitadel SSO authentication.

Status

Phase A: MVP with Token Auth -- DONE

  • A.1 -- CLI argument parsing (--demo, --sso-demo, --skip-zitadel, --cleanup)
  • A.2 -- Zitadel deployment via ZitadelScore (external_secure: false for k3d)
  • A.3 -- OpenBao JWT auth method + harmony-dev policy configuration
  • A.4 -- --demo flag: config storage demo with token auth via ConfigManager
  • A.5 -- Hardening: retry loops for pod readiness, HTTP readiness checks, --cleanup
  • A.6 -- README with prerequisites, usage, and architecture

Verified end-to-end from a freshly deleted k3d cluster: cargo run -p example-harmony-sso deploys the stack, then cargo run -p example-harmony-sso -- --demo succeeds.

Phase B: OIDC Device Flow + JWT Exchange -- TODO

The Zitadel OIDC device flow code exists (harmony_secret/src/store/zitadel.rs), but the JWT exchange step is missing. process_token_response() currently stores the OIDC access_token directly as openbao_token; per ADR 020-1, the id_token should instead be exchanged for an OpenBao token at OpenBao's /v1/auth/jwt/login endpoint.

B.1 -- Implement JWT exchange in harmony_secret/src/store/zitadel.rs:

  • Add openbao_url, jwt_auth_mount, jwt_role fields to ZitadelOidcAuth
  • Add exchange_jwt_for_openbao_token(id_token) using raw reqwest (vaultrs 0.7.4 has no JWT auth module)
  • POST {openbao_url}/v1/auth/{jwt_auth_mount}/login with {"role": "...", "jwt": "..."}
  • Modify process_token_response() to use exchange when openbao_url is set
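
A minimal sketch of the request that B.1 describes, using only the standard library so it stays independent of the HTTP client. The function names jwt_login_request and json_escape are hypothetical and not part of harmony_secret; the actual call would go through reqwest as noted above. In the Vault-compatible API the resulting token is expected under auth.client_token in the response, which should be confirmed against the OpenBao version in use.

```rust
/// Escape a string for embedding inside a JSON string literal.
/// Covers quotes, backslashes, and control characters only.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: build the URL and JSON body for
/// POST {openbao_url}/v1/auth/{jwt_auth_mount}/login.
fn jwt_login_request(
    openbao_url: &str,
    jwt_auth_mount: &str,
    jwt_role: &str,
    id_token: &str,
) -> (String, String) {
    // Tolerate a trailing slash on the base URL and stray slashes on the mount.
    let url = format!(
        "{}/v1/auth/{}/login",
        openbao_url.trim_end_matches('/'),
        jwt_auth_mount.trim_matches('/')
    );
    let body = format!(
        "{{\"role\":\"{}\",\"jwt\":\"{}\"}}",
        json_escape(jwt_role),
        json_escape(id_token)
    );
    (url, body)
}
```

Keeping request construction separate from the HTTP call makes the mount and role handling testable without a running OpenBao.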

B.2 -- Wire JWT params through harmony_secret/src/store/openbao.rs:

  • Pass base_url, jwt_auth_mount, jwt_role to ZitadelOidcAuth::new() in authenticate_zitadel_oidc()
  • Update OpenbaoSecretStore::new() signature for optional jwt_role and jwt_auth_mount

B.3 -- Add env vars to harmony_secret/src/config.rs:

  • OPENBAO_JWT_AUTH_MOUNT (default: jwt)
  • OPENBAO_JWT_ROLE (default: harmony-developer)
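
One way the two variables and their defaults could be read, as a sketch only. JwtAuthSettings and both functions are hypothetical names, and treating an empty value as unset is an assumption rather than something the plan specifies.

```rust
use std::env;

/// Hypothetical container for the two JWT auth settings from B.3.
struct JwtAuthSettings {
    mount: String,
    role: String,
}

/// Resolve settings through an arbitrary lookup so the defaulting logic
/// can be exercised without touching the process environment.
fn jwt_settings_from<F>(lookup: F) -> JwtAuthSettings
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: an empty value is treated the same as an unset variable.
    let get = |key: &str, default: &str| {
        lookup(key)
            .filter(|v| !v.is_empty())
            .unwrap_or_else(|| default.to_string())
    };
    JwtAuthSettings {
        mount: get("OPENBAO_JWT_AUTH_MOUNT", "jwt"),
        role: get("OPENBAO_JWT_ROLE", "harmony-developer"),
    }
}

/// Read the settings from the real process environment.
fn jwt_settings_from_env() -> JwtAuthSettings {
    jwt_settings_from(|key| env::var(key).ok())
}
```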

B.4 -- Silent refresh:

  • Add refresh_token() method to ZitadelOidcAuth
  • Update auth chain in openbao.rs: cached session -> silent refresh -> device flow
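
The fallback order in B.4 can be sketched as a chain of closures. This is an illustration under assumptions, not the planned openbao.rs code: AuthSource and authenticate are hypothetical names, and tokens and errors are simplified to strings.

```rust
/// Which step of the chain produced the token (hypothetical type).
#[derive(Debug, PartialEq)]
enum AuthSource {
    CachedSession,
    SilentRefresh,
    DeviceFlow,
}

/// Try each step in order: cached session -> silent refresh -> device flow.
/// Later steps run only when the earlier ones yield nothing.
fn authenticate<C, R, D>(
    cached: C,
    refresh: R,
    device_flow: D,
) -> Result<(String, AuthSource), String>
where
    C: FnOnce() -> Option<String>,
    R: FnOnce() -> Result<String, String>,
    D: FnOnce() -> Result<String, String>,
{
    if let Some(token) = cached() {
        return Ok((token, AuthSource::CachedSession));
    }
    // A failed refresh is not fatal: fall through to the interactive flow.
    if let Ok(token) = refresh() {
        return Ok((token, AuthSource::SilentRefresh));
    }
    device_flow().map(|token| (token, AuthSource::DeviceFlow))
}
```

Because each step is a closure, the interactive device flow is invoked only when both non-interactive options have failed.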

B.5 -- --sso-demo flag:

  • Already stubbed in examples/harmony_sso/src/main.rs
  • Requires a Zitadel device code application (created manually for now); accept its client ID via the HARMONY_SSO_CLIENT_ID env var

B.6 -- Solve in-cluster DNS for JWT auth config:

  • OpenBao JWT auth needs oidc_discovery_url to fetch Zitadel's JWKS
  • Zitadel requires Host header matching ExternalDomain on ALL endpoints (including /oauth/v2/keys)
  • So oidc_discovery_url=http://zitadel.zitadel.svc.cluster.local:8080 gets 404 from Zitadel
  • Options: (a) CoreDNS rewrite rule mapping sso.harmony.local -> zitadel.zitadel.svc, (b) Kubernetes ExternalName service, (c) Zitadel.AdditionalDomains Helm config to accept the internal hostname
  • Currently non-fatal (warning only), but it must be solved before --sso-demo can work

Phase C: Testing & Automation -- TODO

C.1 -- Integration tests (examples/harmony_sso/tests/integration.rs, #[ignore]):

  • test_openbao_health -- health endpoint
  • test_zitadel_openid_config -- OIDC discovery
  • test_openbao_userpass_auth -- write/read secret
  • test_config_manager_openbao_backend -- full ConfigManager chain
  • test_openbao_jwt_auth_configured -- verify JWT auth method + role exist

C.2 -- Zitadel application automation (examples/harmony_sso/src/zitadel_setup.rs):

  • Automate project + device code app creation via Zitadel Management API
  • Extract and save client_id

Tricky Things / Lessons Learned

ZitadelScore on k3d -- security context

The Zitadel container image (ghcr.io/zitadel/zitadel) defines User: "zitadel" (non-numeric string). With runAsNonRoot: true and runAsUser: null, kubelet can't verify the user is non-root and fails with CreateContainerConfigError. Fix: set runAsUser: 1000 explicitly (that's the UID for zitadel in /etc/passwd). This applies to all security contexts: podSecurityContext, securityContext, initJob, setupJob, and login.

Changed in harmony/src/modules/zitadel/mod.rs for the K3sFamily | Default branch.

ZitadelScore on k3d -- ingress class

The K3sFamily Helm values had kubernetes.io/ingress.class: nginx annotations. k3d ships with traefik, not nginx. The nginx annotation caused traefik to ignore the ingress entirely (404 on all routes). Fix: removed the explicit ingress class annotations -- traefik picks up ingresses without an explicit class by default.

Changed in harmony/src/modules/zitadel/mod.rs for the K3sFamily | Default branch.

CNPG CRD registration race

After helm install cloudnative-pg, the operator deployment becomes ready but the CRD (clusters.postgresql.cnpg.io) is not yet registered in the API server's discovery cache. The kube client caches API discovery at init time, so even after the CRD registers, a reused client won't see it. Fix: the example creates a fresh topology (and therefore fresh kube client) on each retry attempt. Up to 5 retries with 15s delay.
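
The retry pattern described above, where every attempt builds a new client so stale discovery data is never reused, might look roughly like the sketch below. The function name and its closure parameters are hypothetical; the real example constructs a topology rather than a generic client.

```rust
use std::thread;
use std::time::Duration;

/// Run `op` against a freshly built client on every attempt.
/// Retries only while `is_transient` accepts the error and attempts remain.
/// `op` always runs at least once.
fn retry_with_fresh_client<C, T, E, M, O, P>(
    max_attempts: u32,
    delay: Duration,
    mut make_client: M,
    mut op: O,
    is_transient: P,
) -> Result<T, E>
where
    M: FnMut() -> C,
    O: FnMut(&C) -> Result<T, E>,
    P: Fn(&E) -> bool,
{
    let mut attempt = 1;
    loop {
        // A new client per attempt avoids reusing cached API discovery.
        let client = make_client();
        match op(&client) {
            Ok(value) => return Ok(value),
            Err(err) if attempt < max_attempts && is_transient(&err) => {
                attempt += 1;
                thread::sleep(delay);
            }
            Err(err) => return Err(err),
        }
    }
}
```

With the values from the text, a caller would pass Duration::from_secs(15) as the delay and an attempt limit matching the 5 retries, together with a predicate that recognises the transient error messages.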

CNPG PostgreSQL cluster readiness

After the CNPG Cluster CR is created, the PostgreSQL pods and the -rw service take 15-30s to come up. ZitadelScore immediately calls topology.get_endpoint() which looks for the zitadel-pg-rw service. If the service doesn't exist yet, it fails with "not found for cluster". Fix: same retry loop catches this error pattern.

Zitadel Helm init job timing

The Zitadel Helm chart runs a zitadel-init pre-install/pre-upgrade Job that connects to PostgreSQL. If the PG cluster isn't fully ready (primary not accepting connections), the init job hangs until Helm's 5-minute timeout. On a cold start from scratch, the sequence is: CNPG operator install -> CRD registration (5-15s) -> PG cluster creation -> PG pod scheduling + init (~30s) -> PG primary ready -> Zitadel init job can connect. The retry loop handles this by allowing the full sequence to settle between attempts.

Zitadel Host header validation

Zitadel validates the Host header on all HTTP endpoints against its ExternalDomain config (sso.harmony.local). This means:

  • The OIDC discovery endpoint (/.well-known/openid-configuration) returns 404 if called via the internal service URL without the correct Host header
  • The JWKS endpoint (/oauth/v2/keys) also requires the correct Host
  • OpenBao's JWT auth oidc_discovery_url can't use http://zitadel.zitadel.svc.cluster.local:8080 because Zitadel rejects the Host
  • From outside the cluster, use 127.0.0.1:8080 with Host: sso.harmony.local header (or add /etc/hosts entry)
  • Phase B needs to solve in-cluster DNS resolution for sso.harmony.local
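
To illustrate the Host header requirement from outside the cluster, here is a small standard-library sketch that connects to one address while sending a different Host value. Both function names are hypothetical, and the code handles plain HTTP only (no TLS, redirects, or chunked decoding).

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;

/// Build a plain HTTP/1.1 GET request whose Host header is set explicitly,
/// independent of the address the socket connects to.
fn build_get_request(host: &str, path: &str) -> String {
    format!(
        "GET {} HTTP/1.1\r\nHost: {}\r\nAccept: application/json\r\nConnection: close\r\n\r\n",
        path, host
    )
}

/// Connect to `addr` (for example "127.0.0.1:8080") and send the request
/// with the given Host header. Returns the raw response, headers included.
fn get_with_host(addr: &str, host: &str, path: &str) -> io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_get_request(host, path).as_bytes())?;
    let mut raw = Vec::new();
    // "Connection: close" lets us read until the server closes the socket.
    stream.read_to_end(&mut raw)?;
    Ok(String::from_utf8_lossy(&raw).into_owned())
}
```

Calling get_with_host("127.0.0.1:8080", "sso.harmony.local", "/.well-known/openid-configuration") corresponds to the curl invocation with an explicit Host header.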

Both services share one port

Both Zitadel and OpenBao are exposed through traefik ingress on port 80 (mapped to host port 8080). Traefik routes by Host header: sso.harmony.local -> Zitadel, bao.harmony.local -> OpenBao. The original plan had separate port mappings (8080 for Zitadel, 8200 for OpenBao) but the 8200 mapping was useless since traefik only listens on 80/443.

For --demo mode, the port-forward bypasses traefik and connects directly to the OpenBao service on port 8200 (no Host header needed).

run_bao_command and shell escaping

The run_bao_command function runs kubectl exec ... -- sh -c "export VAULT_TOKEN=xxx && bao ...". Two gotchas:

  1. Must use export VAULT_TOKEN=... (not just a VAULT_TOKEN=... prefix), because a prefix assignment applies only to the first command of a pipeline; commands after | would not see the variable
  2. The policy creation uses printf '...' | bao policy write harmony-dev - which needs careful quoting inside the sh -c wrapper. Using run_bao_command_raw() avoids double-wrapping.
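
A sketch of how the script passed to sh -c could be assembled so that both gotchas are handled. The helper names are hypothetical and do not mirror the actual run_bao_command implementation; the point is that the token is exported once for the whole script and quoted exactly once.

```rust
/// Quote a value for POSIX sh by wrapping it in single quotes and
/// rewriting each embedded single quote as '\''.
fn sh_single_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', "'\\''"))
}

/// Hypothetical helper: argv for running a bao command line through sh -c.
/// The token is exported so every stage of a pipeline can see it.
fn bao_exec_argv(token: &str, bao_command_line: &str) -> Vec<String> {
    let script = format!(
        "export VAULT_TOKEN={} && {}",
        sh_single_quote(token),
        bao_command_line
    );
    // The script travels as a single argv element, so it needs no
    // additional layer of quoting around it.
    vec!["sh".to_string(), "-c".to_string(), script]
}
```

Passing the script as one argument, rather than embedding it in another quoted string, is what avoids the double-wrapping problem.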

FIXMEs for future refactoring

The user flagged several areas that should use harmony-k8s instead of raw kubectl:

  • wait_for_pod_running() -- harmony-k8s has pod wait functionality
  • init_openbao(), unseal_openbao() -- exec into pods via kubectl
  • get_k3d_binary_path(), get_openbao_data_path() -- leaking implementation details from k3d/openbao crates
  • configure_openbao() -- future candidate for an OpenBao/Vault capability trait

Files Modified (Phase A)

| File | Change |
| --- | --- |
| examples/harmony_sso/Cargo.toml | Added clap, schemars, interactive-parse |
| examples/harmony_sso/src/main.rs | Complete rewrite: CLI args, Zitadel deploy, JWT auth config, demo modes, hardening |
| examples/harmony_sso/README.md | New: prerequisites, usage, architecture |
| harmony/src/modules/zitadel/mod.rs | Fixed K3s security context (runAsUser: 1000), removed nginx ingress annotations |

Files to Modify (Phase B)

| File | Change |
| --- | --- |
| harmony_secret/src/store/zitadel.rs | JWT exchange, silent refresh |
| harmony_secret/src/store/openbao.rs | Wire JWT params, refresh in auth chain |
| harmony_secret/src/config.rs | OPENBAO_JWT_AUTH_MOUNT, OPENBAO_JWT_ROLE env vars |

Verification

Phase A (verified 2026-03-28):

  • cargo run -p example-harmony-sso -> deploys k3d + OpenBao + Zitadel (with retry for CNPG CRD + PG readiness)
  • curl -H "Host: bao.harmony.local" http://127.0.0.1:8080/v1/sys/health -> OpenBao healthy (initialized, unsealed)
  • curl -H "Host: sso.harmony.local" http://127.0.0.1:8080/.well-known/openid-configuration -> Zitadel OIDC config with device_authorization_endpoint
  • cargo run -p example-harmony-sso -- --demo -> writes/reads config via ConfigManager + OpenbaoSecretStore, env override works

Phase B:

  • HARMONY_SSO_URL=http://sso.harmony.local HARMONY_SSO_CLIENT_ID=<id> cargo run -p example-harmony-sso -- --sso-demo
  • Device code appears, login in browser, config stored via SSO-authenticated OpenBao token

Phase C:

  • cargo test -p example-harmony-sso -- --ignored -> integration tests pass