Replace all Command::new("kubectl") calls with harmony-k8s K8sClient
methods:
- wait_for_pod_ready() instead of kubectl get pod jsonpath
- exec_pod_capture_output() for OpenBao init/unseal/configure
- delete_resource<MutatingWebhookConfiguration>() for webhook cleanup
- port_forward() instead of kubectl port-forward subprocess
Thread K3d and K8sClient through all functions instead of
reconstructing context strings. Consolidate path helpers into
harmony_data_dir().
Add Zitadel deployment via ZitadelScore with retry logic for CNPG CRD
registration race and PostgreSQL cluster readiness timing.
Add CLI flags: --demo, --sso-demo, --skip-zitadel, --cleanup.
Add --demo mode: ConfigManager with EnvSource + StoreSource<OpenbaoSecretStore>.
Configure OpenBao with harmony-dev policy, userpass auth, and JWT auth.
Harmony SSO Plan
Context
Deploy Zitadel and OpenBao on a local k3d cluster, use them as harmony_config backends, and demonstrate end-to-end config storage authenticated via SSO. The goal: rock-solid deployment so teams and collaborators can reliably share config and secrets through OpenBao with Zitadel SSO authentication.
Status
Phase A: MVP with Token Auth -- DONE
- A.1 -- CLI argument parsing (--demo, --sso-demo, --skip-zitadel, --cleanup)
- A.2 -- Zitadel deployment via ZitadelScore (external_secure: false for k3d)
- A.3 -- OpenBao JWT auth method + harmony-dev policy configuration
- A.4 -- --demo flag: config storage demo with token auth via ConfigManager
- A.5 -- Hardening: retry loops for pod readiness, HTTP readiness checks, --cleanup
- A.6 -- README with prerequisites, usage, and architecture
Verified end-to-end: fresh k3d cluster delete -> cargo run -p example-harmony-sso -> --demo succeeds.
Phase B: OIDC Device Flow + JWT Exchange -- TODO
The Zitadel OIDC device flow code exists (harmony_secret/src/store/zitadel.rs) but the JWT exchange step is missing: process_token_response() stores the OIDC access_token as openbao_token directly, but per ADR 020-1 the id_token should be exchanged with OpenBao's /v1/auth/jwt/login endpoint.
B.1 -- Implement JWT exchange in harmony_secret/src/store/zitadel.rs:
- Add openbao_url, jwt_auth_mount, jwt_role fields to ZitadelOidcAuth
- Add exchange_jwt_for_openbao_token(id_token) using raw reqwest (vaultrs 0.7.4 has no JWT auth module)
- POST {openbao_url}/v1/auth/{jwt_auth_mount}/login with {"role": "...", "jwt": "..."}
- Modify process_token_response() to use exchange when openbao_url is set
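The request described in B.1 could be assembled roughly as follows. This is a minimal std-only sketch, not the planned implementation: the helper names (jwt_login_url, jwt_login_body, json_escape) are hypothetical, and the actual HTTP call via reqwest is left out so the snippet stays dependency-free.

```rust
/// Hypothetical helper: builds the JWT login URL from B.1,
/// `{openbao_url}/v1/auth/{jwt_auth_mount}/login`.
fn jwt_login_url(openbao_url: &str, jwt_auth_mount: &str) -> String {
    format!(
        "{}/v1/auth/{}/login",
        openbao_url.trim_end_matches('/'),
        jwt_auth_mount.trim_matches('/')
    )
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
/// A real implementation would more likely use serde_json.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: builds the `{"role": "...", "jwt": "..."}` body,
/// where `id_token` is the OIDC id_token returned by Zitadel.
fn jwt_login_body(role: &str, id_token: &str) -> String {
    format!(
        r#"{{"role":"{}","jwt":"{}"}}"#,
        json_escape(role),
        json_escape(id_token)
    )
}
```

exchange_jwt_for_openbao_token(id_token) would then POST jwt_login_body(...) to jwt_login_url(...) and read the client token from the response.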
B.2 -- Wire JWT params through harmony_secret/src/store/openbao.rs:
- Pass base_url, jwt_auth_mount, jwt_role to ZitadelOidcAuth::new() in authenticate_zitadel_oidc()
- Update OpenbaoSecretStore::new() signature for optional jwt_role and jwt_auth_mount
B.3 -- Add env vars to harmony_secret/src/config.rs:
- OPENBAO_JWT_AUTH_MOUNT (default: jwt)
- OPENBAO_JWT_ROLE (default: harmony-developer)
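One way to read these two variables with their defaults is sketched below. The struct and function names are hypothetical and not taken from harmony_secret/src/config.rs; treating an empty value the same as an unset one is also an assumption.

```rust
use std::env;

/// Returns the value of `name`, or `default` when the variable is unset,
/// empty, or not valid Unicode. (Empty-as-unset is an assumption.)
fn env_or_default(name: &str, default: &str) -> String {
    match env::var(name) {
        Ok(value) if !value.is_empty() => value,
        _ => default.to_string(),
    }
}

/// Hypothetical container for the two settings listed in B.3.
#[derive(Debug, PartialEq)]
struct JwtAuthSettings {
    mount: String,
    role: String,
}

/// Reads OPENBAO_JWT_AUTH_MOUNT and OPENBAO_JWT_ROLE with the defaults
/// named in the plan (jwt and harmony-developer).
fn jwt_auth_settings_from_env() -> JwtAuthSettings {
    JwtAuthSettings {
        mount: env_or_default("OPENBAO_JWT_AUTH_MOUNT", "jwt"),
        role: env_or_default("OPENBAO_JWT_ROLE", "harmony-developer"),
    }
}
```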
B.4 -- Silent refresh:
- Add refresh_token() method to ZitadelOidcAuth
- Update auth chain in openbao.rs: cached session -> silent refresh -> device flow
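The intended fallback order (cached session -> silent refresh -> device flow) can be sketched as a chain of closures. All names here are hypothetical; the real steps in openbao.rs would be async and would return richer error types than String.

```rust
/// Which step of the chain produced the token (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
enum AuthSource {
    CachedSession,
    SilentRefresh,
    DeviceFlow,
}

/// Tries each step in order and stops at the first one that yields a token.
/// Later steps are never invoked once an earlier step succeeds.
fn authenticate<C, R, D>(
    cached_session: C,
    silent_refresh: R,
    device_flow: D,
) -> Result<(String, AuthSource), String>
where
    C: FnOnce() -> Option<String>,
    R: FnOnce() -> Result<String, String>,
    D: FnOnce() -> Result<String, String>,
{
    if let Some(token) = cached_session() {
        return Ok((token, AuthSource::CachedSession));
    }
    // A failed refresh (for example an expired refresh token) is not fatal:
    // it just means the interactive device flow is needed.
    if let Ok(token) = silent_refresh() {
        return Ok((token, AuthSource::SilentRefresh));
    }
    device_flow().map(|token| (token, AuthSource::DeviceFlow))
}
```

Only a device flow failure surfaces as an error, so the interactive login is reached only when both non-interactive paths have been exhausted.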
B.5 -- --sso-demo flag:
- Already stubbed in examples/harmony_sso/src/main.rs
- Requires a Zitadel device code application (manual setup, accept HARMONY_SSO_CLIENT_ID env var)
B.6 -- Solve in-cluster DNS for JWT auth config:
- OpenBao JWT auth needs oidc_discovery_url to fetch Zitadel's JWKS
- Zitadel requires Host header matching ExternalDomain on ALL endpoints (including /oauth/v2/keys)
- So oidc_discovery_url=http://zitadel.zitadel.svc.cluster.local:8080 gets 404 from Zitadel
- Options: (a) CoreDNS rewrite rule mapping sso.harmony.local -> zitadel.zitadel.svc, (b) Kubernetes ExternalName service, (c) Zitadel.AdditionalDomains Helm config to accept the internal hostname
- Currently non-fatal (warning only), needed before --sso-demo can work
Phase C: Testing & Automation -- TODO
C.1 -- Integration tests (examples/harmony_sso/tests/integration.rs, #[ignore]):
- test_openbao_health -- health endpoint
- test_zitadel_openid_config -- OIDC discovery
- test_openbao_userpass_auth -- write/read secret
- test_config_manager_openbao_backend -- full ConfigManager chain
- test_openbao_jwt_auth_configured -- verify JWT auth method + role exist
C.2 -- Zitadel application automation (examples/harmony_sso/src/zitadel_setup.rs):
- Automate project + device code app creation via Zitadel Management API
- Extract and save client_id
Tricky Things / Lessons Learned
ZitadelScore on k3d -- security context
The Zitadel container image (ghcr.io/zitadel/zitadel) defines User: "zitadel" (non-numeric string). With runAsNonRoot: true and runAsUser: null, kubelet can't verify the user is non-root and fails with CreateContainerConfigError. Fix: set runAsUser: 1000 explicitly (that's the UID for zitadel in /etc/passwd). This applies to all security contexts: podSecurityContext, securityContext, initJob, setupJob, and login.
Changed in harmony/src/modules/zitadel/mod.rs for the K3sFamily | Default branch.
ZitadelScore on k3d -- ingress class
The K3sFamily Helm values had kubernetes.io/ingress.class: nginx annotations. k3d ships with traefik, not nginx. The nginx annotation caused traefik to ignore the ingress entirely (404 on all routes). Fix: removed the explicit ingress class annotations -- traefik picks up ingresses without an explicit class by default.
Changed in harmony/src/modules/zitadel/mod.rs for the K3sFamily | Default branch.
CNPG CRD registration race
After helm install cloudnative-pg, the operator deployment becomes ready but the CRD (clusters.postgresql.cnpg.io) is not yet registered in the API server's discovery cache. The kube client caches API discovery at init time, so even after the CRD registers, a reused client won't see it. Fix: the example creates a fresh topology (and therefore fresh kube client) on each retry attempt. Up to 5 retries with 15s delay.
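The shape of that retry loop, with a fresh client built on every attempt so a stale discovery cache is never reused, might look like the sketch below. The function name and the generic client type are hypothetical. The example would call it with 5 attempts and a 15s delay, reading "up to 5 retries" as 5 attempts in total.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Runs `op` up to `max_attempts` times (always at least once), constructing
/// a new client via `make_client` before each attempt. Returns the first
/// success, or the last error once the attempts are exhausted.
fn retry_with_fresh_client<C, T, E>(
    max_attempts: u32,
    delay: Duration,
    mut make_client: impl FnMut() -> C,
    mut op: impl FnMut(&C) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        // Fresh client each time: a reused client would keep its cached
        // API discovery and still not see the newly registered CRD.
        let client = make_client();
        match op(&client) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}
```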
CNPG PostgreSQL cluster readiness
After the CNPG Cluster CR is created, the PostgreSQL pods and the -rw service take 15-30s to come up. ZitadelScore immediately calls topology.get_endpoint() which looks for the zitadel-pg-rw service. If the service doesn't exist yet, it fails with "not found for cluster". Fix: same retry loop catches this error pattern.
Zitadel Helm init job timing
The Zitadel Helm chart runs a zitadel-init pre-install/pre-upgrade Job that connects to PostgreSQL. If the PG cluster isn't fully ready (primary not accepting connections), the init job hangs until Helm's 5-minute timeout. On a cold start from scratch, the sequence is: CNPG operator install -> CRD registration (5-15s) -> PG cluster creation -> PG pod scheduling + init (~30s) -> PG primary ready -> Zitadel init job can connect. The retry loop handles this by allowing the full sequence to settle between attempts.
Zitadel Host header validation
Zitadel validates the Host header on all HTTP endpoints against its ExternalDomain config (sso.harmony.local). This means:
- The OIDC discovery endpoint (/.well-known/openid-configuration) returns 404 if called via the internal service URL without the correct Host header
- The JWKS endpoint (/oauth/v2/keys) also requires the correct Host
- OpenBao's JWT auth oidc_discovery_url can't use http://zitadel.zitadel.svc.cluster.local:8080 because Zitadel rejects the Host
- From outside the cluster, use 127.0.0.1:8080 with Host: sso.harmony.local header (or add /etc/hosts entry)
- Phase B needs to solve in-cluster DNS resolution for sso.harmony.local
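To make the "connect to one address, present another Host" pattern concrete, here is a std-only sketch that connects to 127.0.0.1:8080 while sending Host: sso.harmony.local. The function names are hypothetical, and the project itself would more likely use reqwest with an explicit Host header.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Builds a plain HTTP/1.1 GET request whose Host header is independent of
/// the address the socket connects to.
fn build_get_request(host_header: &str, path: &str) -> String {
    format!("GET {path} HTTP/1.1\r\nHost: {host_header}\r\nConnection: close\r\n\r\n")
}

/// Connects to `addr` (for example "127.0.0.1:8080"), sends the request with
/// the given Host header, and returns the raw response (status line,
/// headers, and body) as text.
fn get_with_host(addr: &str, host_header: &str, path: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_get_request(host_header, path).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Calling get_with_host("127.0.0.1:8080", "sso.harmony.local", "/.well-known/openid-configuration") from the host machine corresponds to the curl -H "Host: ..." checks in the Verification section.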
Both services share one port
Both Zitadel and OpenBao are exposed through traefik ingress on port 80 (mapped to host port 8080). Traefik routes by Host header: sso.harmony.local -> Zitadel, bao.harmony.local -> OpenBao. The original plan had separate port mappings (8080 for Zitadel, 8200 for OpenBao) but the 8200 mapping was useless since traefik only listens on 80/443.
For --demo mode, the port-forward bypasses traefik and connects directly to the OpenBao service on port 8200 (no Host header needed).
run_bao_command and shell escaping
The run_bao_command function runs kubectl exec ... -- sh -c "export VAULT_TOKEN=xxx && bao ...". Two gotchas:
- Must use export VAULT_TOKEN=... (not just VAULT_TOKEN=... prefix) because piped commands after | don't inherit the prefix env var
- The policy creation uses printf '...' | bao policy write harmony-dev - which needs careful quoting inside the sh -c wrapper. Using run_bao_command_raw() avoids double-wrapping.
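Both gotchas come down to how the script string handed to sh -c is assembled. The sketch below shows one way to build it safely. The helper names are hypothetical and are not the actual run_bao_command code. It assumes the script is passed to sh -c as a single argv element, so no outer layer of quoting is needed.

```rust
/// Wraps `s` in single quotes for POSIX sh, escaping embedded single quotes
/// with the usual '\'' sequence.
fn sh_single_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Builds a script that exports the token first, so every command in a
/// later pipeline sees it (a `VAULT_TOKEN=... cmd` prefix would only apply
/// to the first command).
fn bao_script(token: &str, bao_command: &str) -> String {
    format!("export VAULT_TOKEN={} && {}", sh_single_quote(token), bao_command)
}

/// Builds the policy-write script: the policy text is quoted exactly once
/// and piped to `bao policy write <name> -`, which reads it from stdin.
fn policy_write_script(token: &str, policy_name: &str, policy_hcl: &str) -> String {
    format!(
        "export VAULT_TOKEN={} && printf '%s' {} | bao policy write {} -",
        sh_single_quote(token),
        sh_single_quote(policy_hcl),
        sh_single_quote(policy_name)
    )
}
```

Using printf '%s' with the policy as a separate argument keeps any % characters in the policy from being interpreted as format directives.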
FIXMEs for future refactoring
The user flagged several areas that should use harmony-k8s instead of raw kubectl:
- wait_for_pod_running() -- harmony-k8s has pod wait functionality
- init_openbao(), unseal_openbao() -- exec into pods via kubectl
- get_k3d_binary_path(), get_openbao_data_path() -- leaking implementation details from k3d/openbao crates
- configure_openbao() -- future candidate for an OpenBao/Vault capability trait
Files Modified (Phase A)
| File | Change |
|---|---|
| examples/harmony_sso/Cargo.toml | Added clap, schemars, interactive-parse |
| examples/harmony_sso/src/main.rs | Complete rewrite: CLI args, Zitadel deploy, JWT auth config, demo modes, hardening |
| examples/harmony_sso/README.md | New: prerequisites, usage, architecture |
| harmony/src/modules/zitadel/mod.rs | Fixed K3s security context (runAsUser: 1000), removed nginx ingress annotations |
Files to Modify (Phase B)
| File | Change |
|---|---|
| harmony_secret/src/store/zitadel.rs | JWT exchange, silent refresh |
| harmony_secret/src/store/openbao.rs | Wire JWT params, refresh in auth chain |
| harmony_secret/src/config.rs | OPENBAO_JWT_AUTH_MOUNT, OPENBAO_JWT_ROLE env vars |
Verification
Phase A (verified 2026-03-28):
- cargo run -p example-harmony-sso -> deploys k3d + OpenBao + Zitadel (with retry for CNPG CRD + PG readiness)
- curl -H "Host: bao.harmony.local" http://127.0.0.1:8080/v1/sys/health -> OpenBao healthy (initialized, unsealed)
- curl -H "Host: sso.harmony.local" http://127.0.0.1:8080/.well-known/openid-configuration -> Zitadel OIDC config with device_authorization_endpoint
- cargo run -p example-harmony-sso -- --demo -> writes/reads config via ConfigManager + OpenbaoSecretStore, env override works
Phase B:
- HARMONY_SSO_URL=http://sso.harmony.local HARMONY_SSO_CLIENT_ID=<id> cargo run -p example-harmony-sso -- --sso-demo
- Device code appears, login in browser, config stored via SSO-authenticated OpenBao token
Phase C:
- cargo test -p example-harmony-sso -- --ignored -> integration tests pass