harmony/examples/harmony_sso/harmony_sso_plan.md
Jean-Gabriel Gill-Couture 5415452f15 refactor(harmony-sso): replace kubectl with typed K8s APIs, add Zitadel deployment
Replace all Command::new("kubectl") calls with harmony-k8s K8sClient
methods:
- wait_for_pod_ready() instead of kubectl get pod jsonpath
- exec_pod_capture_output() for OpenBao init/unseal/configure
- delete_resource<MutatingWebhookConfiguration>() for webhook cleanup
- port_forward() instead of kubectl port-forward subprocess

Thread K3d and K8sClient through all functions instead of
reconstructing context strings. Consolidate path helpers into
harmony_data_dir().

Add Zitadel deployment via ZitadelScore with retry logic for CNPG CRD
registration race and PostgreSQL cluster readiness timing.

Add CLI flags: --demo, --sso-demo, --skip-zitadel, --cleanup.
Add --demo mode: ConfigManager with EnvSource + StoreSource<OpenbaoSecretStore>.
Configure OpenBao with harmony-dev policy, userpass auth, and JWT auth.
2026-03-28 23:48:00 -04:00

# Harmony SSO Plan
## Context
Deploy Zitadel and OpenBao on a local k3d cluster, use them as `harmony_config` backends, and demonstrate end-to-end config storage authenticated via SSO. The goal is a reliable deployment that lets teams and collaborators share config and secrets through OpenBao, authenticated via Zitadel SSO.
## Status
### Phase A: MVP with Token Auth -- DONE
- [x] A.1 -- CLI argument parsing (`--demo`, `--sso-demo`, `--skip-zitadel`, `--cleanup`)
- [x] A.2 -- Zitadel deployment via `ZitadelScore` (`external_secure: false` for k3d)
- [x] A.3 -- OpenBao JWT auth method + `harmony-dev` policy configuration
- [x] A.4 -- `--demo` flag: config storage demo with token auth via `ConfigManager`
- [x] A.5 -- Hardening: retry loops for pod readiness, HTTP readiness checks, `--cleanup`
- [x] A.6 -- README with prerequisites, usage, and architecture
Verified end-to-end: fresh `k3d cluster delete` -> `cargo run -p example-harmony-sso` -> `--demo` succeeds.
### Phase B: OIDC Device Flow + JWT Exchange -- TODO
The Zitadel OIDC device flow code exists (`harmony_secret/src/store/zitadel.rs`), but the **JWT exchange** step is missing. `process_token_response()` currently stores the OIDC `access_token` directly as `openbao_token`; per ADR 020-1, the `id_token` should instead be exchanged for an OpenBao token at OpenBao's `/v1/auth/jwt/login` endpoint.
**B.1 -- Implement JWT exchange in `harmony_secret/src/store/zitadel.rs`:**
- Add `openbao_url`, `jwt_auth_mount`, `jwt_role` fields to `ZitadelOidcAuth`
- Add `exchange_jwt_for_openbao_token(id_token)` using raw `reqwest` (vaultrs 0.7.4 has no JWT auth module)
- POST `{openbao_url}/v1/auth/{jwt_auth_mount}/login` with `{"role": "...", "jwt": "..."}`
- Modify `process_token_response()` to use exchange when `openbao_url` is set
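
The request described in B.1 can be sketched with the standard library alone. This is a minimal illustration, not the planned implementation: `jwt_login_request` and `json_escape` are hypothetical helpers, the real code is expected to send the request with `reqwest`, and the address and token in `main` are placeholders.

```rust
/// Minimal JSON string escaping (hypothetical helper; a real implementation
/// would more likely rely on a JSON library).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: builds the URL and JSON body for the JWT login POST.
/// It only constructs the request; it does not send it.
fn jwt_login_request(
    openbao_url: &str,
    jwt_auth_mount: &str,
    jwt_role: &str,
    id_token: &str,
) -> (String, String) {
    let url = format!(
        "{}/v1/auth/{}/login",
        openbao_url.trim_end_matches('/'),
        jwt_auth_mount
    );
    let body = format!(
        r#"{{"role":"{}","jwt":"{}"}}"#,
        json_escape(jwt_role),
        json_escape(id_token)
    );
    (url, body)
}

fn main() {
    // Placeholder values for illustration only.
    let (url, body) = jwt_login_request(
        "http://127.0.0.1:8200",
        "jwt",
        "harmony-developer",
        "header.payload.signature",
    );
    println!("POST {url}");
    println!("{body}");
}
```

Keeping request construction separate from the HTTP call makes the mount and role wiring easy to unit test without a running OpenBao.
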
**B.2 -- Wire JWT params through `harmony_secret/src/store/openbao.rs`:**
- Pass `base_url`, `jwt_auth_mount`, `jwt_role` to `ZitadelOidcAuth::new()` in `authenticate_zitadel_oidc()`
- Update `OpenbaoSecretStore::new()` signature for optional `jwt_role` and `jwt_auth_mount`
**B.3 -- Add env vars to `harmony_secret/src/config.rs`:**
- `OPENBAO_JWT_AUTH_MOUNT` (default: `jwt`)
- `OPENBAO_JWT_ROLE` (default: `harmony-developer`)
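
A minimal sketch of how these two variables and their defaults could be resolved. `JwtAuthSettings` and `jwt_settings_from` are hypothetical names, not the existing structure of `config.rs`; the lookup is passed in as a closure so the defaults can be tested without mutating the process environment.

```rust
use std::env;

/// Hypothetical settings struct; the real layout of `config.rs` may differ.
#[derive(Debug, PartialEq)]
struct JwtAuthSettings {
    mount: String,
    role: String,
}

/// Resolves the settings through `lookup`, falling back to the defaults
/// listed in the plan when a variable is unset.
fn jwt_settings_from(lookup: impl Fn(&str) -> Option<String>) -> JwtAuthSettings {
    JwtAuthSettings {
        mount: lookup("OPENBAO_JWT_AUTH_MOUNT").unwrap_or_else(|| "jwt".to_string()),
        role: lookup("OPENBAO_JWT_ROLE").unwrap_or_else(|| "harmony-developer".to_string()),
    }
}

fn main() {
    // In production the lookup is simply the process environment.
    let settings = jwt_settings_from(|key| env::var(key).ok());
    println!("{settings:?}");
}
```
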
**B.4 -- Silent refresh:**
- Add `refresh_token()` method to `ZitadelOidcAuth`
- Update auth chain in `openbao.rs`: cached session -> silent refresh -> device flow
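
The fallback order in B.4 can be illustrated as follows. This is a sketch under assumptions: `AuthSource` and `authenticate` are hypothetical, and each step is modeled as a closure returning a token string rather than the real `ZitadelOidcAuth` calls.

```rust
/// Which step of the chain produced the token (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum AuthSource {
    CachedSession,
    SilentRefresh,
    DeviceFlow,
}

/// Tries each step in order and stops at the first one that yields a token.
/// The interactive device flow runs only when both cheaper steps fail.
fn authenticate(
    cached: impl FnOnce() -> Option<String>,
    refresh: impl FnOnce() -> Result<String, String>,
    device_flow: impl FnOnce() -> Result<String, String>,
) -> Result<(AuthSource, String), String> {
    if let Some(token) = cached() {
        return Ok((AuthSource::CachedSession, token));
    }
    if let Ok(token) = refresh() {
        return Ok((AuthSource::SilentRefresh, token));
    }
    device_flow().map(|token| (AuthSource::DeviceFlow, token))
}

fn main() {
    // No cached session and an expired refresh token: falls through to device flow.
    let outcome = authenticate(
        || None,
        || Err("refresh token expired".to_string()),
        || Ok("token-from-device-flow".to_string()),
    );
    println!("{outcome:?}");
}
```

A failed silent refresh is deliberately swallowed here so the user is sent to the device flow instead of seeing an error; whether the real chain should log that failure is left open.
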
**B.5 -- `--sso-demo` flag:**
- Already stubbed in `examples/harmony_sso/src/main.rs`
- Requires a Zitadel device code application (manual setup, accept `HARMONY_SSO_CLIENT_ID` env var)
**B.6 -- Solve in-cluster DNS for JWT auth config:**
- OpenBao JWT auth needs `oidc_discovery_url` to fetch Zitadel's JWKS
- Zitadel requires `Host` header matching `ExternalDomain` on ALL endpoints (including `/oauth/v2/keys`)
- So `oidc_discovery_url=http://zitadel.zitadel.svc.cluster.local:8080` gets 404 from Zitadel
- Options: (a) CoreDNS rewrite rule mapping `sso.harmony.local` -> `zitadel.zitadel.svc`, (b) Kubernetes ExternalName service, (c) `Zitadel.AdditionalDomains` Helm config to accept the internal hostname
- Currently non-fatal (warning only), but must be solved before `--sso-demo` can work
### Phase C: Testing & Automation -- TODO
**C.1 -- Integration tests** (`examples/harmony_sso/tests/integration.rs`, `#[ignore]`):
- `test_openbao_health` -- health endpoint
- `test_zitadel_openid_config` -- OIDC discovery
- `test_openbao_userpass_auth` -- write/read secret
- `test_config_manager_openbao_backend` -- full ConfigManager chain
- `test_openbao_jwt_auth_configured` -- verify JWT auth method + role exist
**C.2 -- Zitadel application automation** (`examples/harmony_sso/src/zitadel_setup.rs`):
- Automate project + device code app creation via Zitadel Management API
- Extract and save `client_id`
---
## Tricky Things / Lessons Learned
### ZitadelScore on k3d -- security context
The Zitadel container image (`ghcr.io/zitadel/zitadel`) defines `User: "zitadel"` (non-numeric string). With `runAsNonRoot: true` and `runAsUser: null`, kubelet can't verify the user is non-root and fails with `CreateContainerConfigError`. **Fix:** set `runAsUser: 1000` explicitly (that's the UID for `zitadel` in `/etc/passwd`). This applies to all security contexts: `podSecurityContext`, `securityContext`, `initJob`, `setupJob`, and `login`.
Changed in `harmony/src/modules/zitadel/mod.rs` for the `K3sFamily | Default` branch.
### ZitadelScore on k3d -- ingress class
The K3sFamily Helm values had `kubernetes.io/ingress.class: nginx` annotations. k3d ships with traefik, not nginx. The nginx annotation caused traefik to ignore the ingress entirely (404 on all routes). **Fix:** removed the explicit ingress class annotations -- traefik picks up ingresses without an explicit class by default.
Changed in `harmony/src/modules/zitadel/mod.rs` for the `K3sFamily | Default` branch.
### CNPG CRD registration race
After `helm install cloudnative-pg`, the operator deployment becomes ready but the CRD (`clusters.postgresql.cnpg.io`) is not yet registered in the API server's discovery cache. The kube client caches API discovery at init time, so even after the CRD registers, a reused client won't see it. **Fix:** the example creates a **fresh topology** (and therefore fresh kube client) on each retry attempt. Up to 5 retries with 15s delay.
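
The retry pattern above can be sketched generically. `retry_with_fresh_client` is a hypothetical helper, not the code in the example; the "client" in `main` is just a counter standing in for a freshly built topology, and the sketch treats "up to 5 retries" as five attempts in total.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Runs `op` with a newly built client on every attempt, so no attempt reuses
/// a stale API discovery cache. Always makes at least one attempt.
fn retry_with_fresh_client<C, T, E>(
    max_attempts: u32,
    delay: Duration,
    mut make_client: impl FnMut() -> C,
    mut op: impl FnMut(&C) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        // Rebuild the client each time instead of hoisting it out of the loop.
        let client = make_client();
        match op(&client) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}

fn main() {
    let mut generation = 0u32;
    // The plan uses a 15s delay; zero keeps this demo fast.
    let result: Result<u32, String> = retry_with_fresh_client(
        5,
        Duration::from_secs(0),
        || {
            generation += 1;
            generation
        },
        // Pretend the CRD only becomes visible to the third client.
        |client: &u32| {
            if *client >= 3 {
                Ok(*client)
            } else {
                Err("CRD not registered yet".to_string())
            }
        },
    );
    println!("{result:?}");
}
```
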
### CNPG PostgreSQL cluster readiness
After the CNPG `Cluster` CR is created, the PostgreSQL pods and the `-rw` service take 15-30s to come up. `ZitadelScore` immediately calls `topology.get_endpoint()` which looks for the `zitadel-pg-rw` service. If the service doesn't exist yet, it fails with "not found for cluster". **Fix:** same retry loop catches this error pattern.
### Zitadel Helm init job timing
The Zitadel Helm chart runs a `zitadel-init` pre-install/pre-upgrade Job that connects to PostgreSQL. If the PG cluster isn't fully ready (primary not accepting connections), the init job hangs until Helm's 5-minute timeout. On a cold start from scratch, the sequence is: CNPG operator install -> CRD registration (5-15s) -> PG cluster creation -> PG pod scheduling + init (~30s) -> PG primary ready -> Zitadel init job can connect. The retry loop handles this by allowing the full sequence to settle between attempts.
### Zitadel Host header validation
Zitadel validates the `Host` header on **all** HTTP endpoints against its `ExternalDomain` config (`sso.harmony.local`). This means:
- The OIDC discovery endpoint (`/.well-known/openid-configuration`) returns 404 if called via the internal service URL without the correct Host header
- The JWKS endpoint (`/oauth/v2/keys`) also requires the correct Host
- OpenBao's JWT auth `oidc_discovery_url` can't use `http://zitadel.zitadel.svc.cluster.local:8080` because Zitadel rejects the Host
- From outside the cluster, use `127.0.0.1:8080` with a `Host: sso.harmony.local` header (or add an `/etc/hosts` entry)
- Phase B needs to solve in-cluster DNS resolution for `sso.harmony.local`
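
To make the Host header requirement concrete, here is a minimal std-only sketch that sends a raw HTTP/1.1 request to the host-mapped traefik port with an explicit `Host` header. `build_get_request` and `fetch` are hypothetical helpers for illustration; a real client would normally use an HTTP library.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Builds a GET request whose Host header is independent of the address
/// the socket actually connects to.
fn build_get_request(path: &str, host: &str) -> String {
    format!(
        "GET {path} HTTP/1.1\r\nHost: {host}\r\nAccept: application/json\r\nConnection: close\r\n\r\n"
    )
}

/// Sends the request and returns the raw response, with timeouts so a
/// missing or unresponsive cluster does not hang the caller.
fn fetch(addr: SocketAddr, request: &str) -> std::io::Result<String> {
    let timeout = Duration::from_secs(5);
    let mut stream = TcpStream::connect_timeout(&addr, timeout)?;
    stream.set_read_timeout(Some(timeout))?;
    stream.write_all(request.as_bytes())?;
    let mut raw = Vec::new();
    stream.read_to_end(&mut raw)?;
    Ok(String::from_utf8_lossy(&raw).into_owned())
}

fn main() {
    let request = build_get_request("/.well-known/openid-configuration", "sso.harmony.local");
    let addr: SocketAddr = "127.0.0.1:8080".parse().expect("valid socket address");
    match fetch(addr, &request) {
        // Print only the status line of the response.
        Ok(response) => println!("{}", response.lines().next().unwrap_or("")),
        Err(err) => eprintln!("request failed (is the k3d cluster running?): {err}"),
    }
}
```
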
### Both services share one port
Both Zitadel and OpenBao are exposed through traefik ingress on port 80 (mapped to host port 8080). Traefik routes by `Host` header: `sso.harmony.local` -> Zitadel, `bao.harmony.local` -> OpenBao. The original plan had separate port mappings (8080 for Zitadel, 8200 for OpenBao) but the 8200 mapping was useless since traefik only listens on 80/443.
For `--demo` mode, the port-forward bypasses traefik and connects directly to the OpenBao service on port 8200 (no Host header needed).
### `run_bao_command` and shell escaping
The `run_bao_command` function runs `kubectl exec ... -- sh -c "export VAULT_TOKEN=xxx && bao ..."`. Two gotchas:
1. Must use `export VAULT_TOKEN=...` (not just a `VAULT_TOKEN=...` prefix): a prefix assignment applies only to the first command of a pipeline, so commands after `|` never see the variable
2. The policy creation uses `printf '...' | bao policy write harmony-dev -` which needs careful quoting inside the `sh -c` wrapper. Using `run_bao_command_raw()` avoids double-wrapping.
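
Both gotchas can be handled by building the script with explicit single-quote escaping. The sketch below is an assumption-laden illustration rather than the actual `run_bao_command` code: `sh_single_quote` and `policy_write_script` are hypothetical helpers, and it passes the policy to `printf '%s'` as an argument (a variation on the `printf '...'` form above) so that `%` or backslashes in the policy are not interpreted.

```rust
/// Wraps `s` in single quotes for POSIX sh, turning each embedded `'`
/// into `'\''` (close quote, escaped quote, reopen quote).
fn sh_single_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Builds the script to hand to `sh -c`. `export` makes the token visible
/// to `bao`, which runs on the right-hand side of the pipe.
fn policy_write_script(token: &str, policy_name: &str, policy_hcl: &str) -> String {
    format!(
        "export VAULT_TOKEN={} && printf '%s' {} | bao policy write {} -",
        sh_single_quote(token),
        sh_single_quote(policy_hcl),
        sh_single_quote(policy_name)
    )
}

fn main() {
    // Placeholder token and policy for illustration only.
    let policy = "path \"secret/data/harmony/*\" { capabilities = [\"read\"] }";
    println!("{}", policy_write_script("example-token", "harmony-dev", policy));
}
```

Because the script is passed as a single argument to `sh -c` (not interpolated into another shell string), only one layer of quoting is needed.
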
### FIXMEs for future refactoring
The user flagged several areas that should use `harmony-k8s` instead of raw `kubectl`:
- `wait_for_pod_running()` -- harmony-k8s has pod wait functionality
- `init_openbao()`, `unseal_openbao()` -- exec into pods via kubectl
- `get_k3d_binary_path()`, `get_openbao_data_path()` -- leaking implementation details from k3d/openbao crates
- `configure_openbao()` -- future candidate for an OpenBao/Vault capability trait
---
## Files Modified (Phase A)
| File | Change |
|---|---|
| `examples/harmony_sso/Cargo.toml` | Added clap, schemars, interactive-parse |
| `examples/harmony_sso/src/main.rs` | Complete rewrite: CLI args, Zitadel deploy, JWT auth config, demo modes, hardening |
| `examples/harmony_sso/README.md` | New: prerequisites, usage, architecture |
| `harmony/src/modules/zitadel/mod.rs` | Fixed K3s security context (`runAsUser: 1000`), removed nginx ingress annotations |
## Files to Modify (Phase B)
| File | Change |
|---|---|
| `harmony_secret/src/store/zitadel.rs` | JWT exchange, silent refresh |
| `harmony_secret/src/store/openbao.rs` | Wire JWT params, refresh in auth chain |
| `harmony_secret/src/config.rs` | OPENBAO_JWT_AUTH_MOUNT, OPENBAO_JWT_ROLE env vars |
## Verification
**Phase A (verified 2026-03-28):**
- `cargo run -p example-harmony-sso` -> deploys k3d + OpenBao + Zitadel (with retry for CNPG CRD + PG readiness)
- `curl -H "Host: bao.harmony.local" http://127.0.0.1:8080/v1/sys/health` -> OpenBao healthy (initialized, unsealed)
- `curl -H "Host: sso.harmony.local" http://127.0.0.1:8080/.well-known/openid-configuration` -> Zitadel OIDC config with device_authorization_endpoint
- `cargo run -p example-harmony-sso -- --demo` -> writes/reads config via ConfigManager + OpenbaoSecretStore, env override works
**Phase B:**
- `HARMONY_SSO_URL=http://sso.harmony.local HARMONY_SSO_CLIENT_ID=<id> cargo run -p example-harmony-sso -- --sso-demo`
- Device code appears, login in browser, config stored via SSO-authenticated OpenBao token
**Phase C:**
- `cargo test -p example-harmony-sso -- --ignored` -> integration tests pass