Replace all Command::new("kubectl") calls with harmony-k8s K8sClient
methods:
- wait_for_pod_ready() instead of kubectl get pod jsonpath
- exec_pod_capture_output() for OpenBao init/unseal/configure
- delete_resource<MutatingWebhookConfiguration>() for webhook cleanup
- port_forward() instead of kubectl port-forward subprocess
Thread K3d and K8sClient through all functions instead of
reconstructing context strings. Consolidate path helpers into
harmony_data_dir().
Add Zitadel deployment via ZitadelScore with retry logic for CNPG CRD
registration race and PostgreSQL cluster readiness timing.
Add CLI flags: --demo, --sso-demo, --skip-zitadel, --cleanup.
Add --demo mode: ConfigManager with EnvSource + StoreSource<OpenbaoSecretStore>.
Configure OpenBao with harmony-dev policy, userpass auth, and JWT auth.
# Harmony SSO Plan

## Context

Deploy Zitadel and OpenBao on a local k3d cluster, use them as `harmony_config` backends, and demonstrate end-to-end config storage authenticated via SSO. The goal: rock-solid deployment so teams and collaborators can reliably share config and secrets through OpenBao with Zitadel SSO authentication.

## Status

### Phase A: MVP with Token Auth -- DONE

- [x] A.1 -- CLI argument parsing (`--demo`, `--sso-demo`, `--skip-zitadel`, `--cleanup`)
- [x] A.2 -- Zitadel deployment via `ZitadelScore` (`external_secure: false` for k3d)
- [x] A.3 -- OpenBao JWT auth method + `harmony-dev` policy configuration
- [x] A.4 -- `--demo` flag: config storage demo with token auth via `ConfigManager`
- [x] A.5 -- Hardening: retry loops for pod readiness, HTTP readiness checks, `--cleanup`
- [x] A.6 -- README with prerequisites, usage, and architecture

Verified end-to-end: fresh `k3d cluster delete` -> `cargo run -p example-harmony-sso` -> `--demo` succeeds.

### Phase B: OIDC Device Flow + JWT Exchange -- TODO

The Zitadel OIDC device flow code exists (`harmony_secret/src/store/zitadel.rs`), but the **JWT exchange** step is missing. `process_token_response()` currently stores the OIDC `access_token` directly as `openbao_token`, whereas ADR 020-1 calls for exchanging the `id_token` at OpenBao's `/v1/auth/jwt/login` endpoint.

**B.1 -- Implement JWT exchange in `harmony_secret/src/store/zitadel.rs`:**

- Add `openbao_url`, `jwt_auth_mount`, `jwt_role` fields to `ZitadelOidcAuth`
- Add `exchange_jwt_for_openbao_token(id_token)` using raw `reqwest` (vaultrs 0.7.4 has no JWT auth module)
- POST `{openbao_url}/v1/auth/{jwt_auth_mount}/login` with `{"role": "...", "jwt": "..."}`
- Modify `process_token_response()` to use exchange when `openbao_url` is set

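The request shape described in B.1 can be sketched with the standard library alone. This is a minimal illustration, not the planned implementation: the helper names `jwt_login_url`, `jwt_login_body`, and `json_escape` are hypothetical, and the real code would presumably send the request with `reqwest` and build the body with a JSON library. For Vault-compatible login endpoints the resulting client token is returned under `auth.client_token` in the response.

```rust
/// Escape a string for embedding in a JSON string literal
/// (minimal: quotes, backslashes, and control characters).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the login URL `{openbao_url}/v1/auth/{jwt_auth_mount}/login`,
/// tolerating a trailing slash on the base URL.
fn jwt_login_url(openbao_url: &str, jwt_auth_mount: &str) -> String {
    format!(
        "{}/v1/auth/{}/login",
        openbao_url.trim_end_matches('/'),
        jwt_auth_mount.trim_matches('/')
    )
}

/// Build the JSON body `{"role": "...", "jwt": "..."}` for the login POST.
fn jwt_login_body(jwt_role: &str, id_token: &str) -> String {
    format!(
        "{{\"role\":\"{}\",\"jwt\":\"{}\"}}",
        json_escape(jwt_role),
        json_escape(id_token)
    )
}
```
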
**B.2 -- Wire JWT params through `harmony_secret/src/store/openbao.rs`:**

- Pass `base_url`, `jwt_auth_mount`, `jwt_role` to `ZitadelOidcAuth::new()` in `authenticate_zitadel_oidc()`
- Update `OpenbaoSecretStore::new()` signature for optional `jwt_role` and `jwt_auth_mount`

**B.3 -- Add env vars to `harmony_secret/src/config.rs`:**

- `OPENBAO_JWT_AUTH_MOUNT` (default: `jwt`)
- `OPENBAO_JWT_ROLE` (default: `harmony-developer`)

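One way the two variables and their defaults could be resolved is sketched below. The `JwtAuthSettings` struct and `jwt_auth_settings` function are hypothetical names, and treating an empty value like an unset one is an assumption of this sketch, not something the plan specifies. The lookup is passed in as a closure so the logic can be tested without touching the process environment.

```rust
/// JWT auth settings resolved from the environment.
#[derive(Debug, PartialEq)]
struct JwtAuthSettings {
    mount: String,
    role: String,
}

/// Resolve the settings through `lookup`, falling back to the documented
/// defaults. Production code would pass `|k| std::env::var(k).ok()`.
fn jwt_auth_settings(lookup: impl Fn(&str) -> Option<String>) -> JwtAuthSettings {
    // Assumption of this sketch: an empty value counts as unset.
    let get = |key: &str, default: &str| {
        lookup(key)
            .filter(|v| !v.is_empty())
            .unwrap_or_else(|| default.to_string())
    };
    JwtAuthSettings {
        mount: get("OPENBAO_JWT_AUTH_MOUNT", "jwt"),
        role: get("OPENBAO_JWT_ROLE", "harmony-developer"),
    }
}
```
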
**B.4 -- Silent refresh:**

- Add `refresh_token()` method to `ZitadelOidcAuth`
- Update auth chain in `openbao.rs`: cached session -> silent refresh -> device flow

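The fallback order of the auth chain (cached session, then silent refresh, then device flow) can be sketched as follows. The `authenticate` function, the `AuthSource` enum, and the closure-based signature are hypothetical and stand in for whatever `openbao.rs` ends up using; the point is only that each step runs solely when the previous one yields no token.

```rust
/// Which step of the chain produced the token.
#[derive(Debug, PartialEq)]
enum AuthSource {
    CachedSession,
    SilentRefresh,
    DeviceFlow,
}

/// Try each auth step in order and stop at the first one that yields a token.
/// The interactive device flow is the last resort and the only step whose
/// failure is reported to the caller.
fn authenticate(
    cached_session: impl FnOnce() -> Option<String>,
    silent_refresh: impl FnOnce() -> Option<String>,
    device_flow: impl FnOnce() -> Result<String, String>,
) -> Result<(AuthSource, String), String> {
    if let Some(token) = cached_session() {
        return Ok((AuthSource::CachedSession, token));
    }
    if let Some(token) = silent_refresh() {
        return Ok((AuthSource::SilentRefresh, token));
    }
    device_flow().map(|token| (AuthSource::DeviceFlow, token))
}
```
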
**B.5 -- `--sso-demo` flag:**

- Already stubbed in `examples/harmony_sso/src/main.rs`
- Requires a Zitadel device code application (manual setup, accept `HARMONY_SSO_CLIENT_ID` env var)

**B.6 -- Solve in-cluster DNS for JWT auth config:**

- OpenBao JWT auth needs `oidc_discovery_url` to fetch Zitadel's JWKS
- Zitadel requires a `Host` header matching `ExternalDomain` on ALL endpoints (including `/oauth/v2/keys`)
- As a result, `oidc_discovery_url=http://zitadel.zitadel.svc.cluster.local:8080` gets a 404 from Zitadel
- Options: (a) CoreDNS rewrite rule mapping `sso.harmony.local` -> `zitadel.zitadel.svc`, (b) Kubernetes ExternalName service, (c) `Zitadel.AdditionalDomains` Helm config to accept the internal hostname
- The failure is currently non-fatal (warning only), but it must be solved before `--sso-demo` can work

### Phase C: Testing & Automation -- TODO

**C.1 -- Integration tests** (`examples/harmony_sso/tests/integration.rs`, `#[ignore]`):

- `test_openbao_health` -- health endpoint
- `test_zitadel_openid_config` -- OIDC discovery
- `test_openbao_userpass_auth` -- write/read secret
- `test_config_manager_openbao_backend` -- full ConfigManager chain
- `test_openbao_jwt_auth_configured` -- verify JWT auth method + role exist

**C.2 -- Zitadel application automation** (`examples/harmony_sso/src/zitadel_setup.rs`):

- Automate project + device code app creation via Zitadel Management API
- Extract and save `client_id`

---

## Tricky Things / Lessons Learned

### ZitadelScore on k3d -- security context

The Zitadel container image (`ghcr.io/zitadel/zitadel`) defines `User: "zitadel"` (non-numeric string). With `runAsNonRoot: true` and `runAsUser: null`, kubelet can't verify the user is non-root and fails with `CreateContainerConfigError`. **Fix:** set `runAsUser: 1000` explicitly (that's the UID for `zitadel` in `/etc/passwd`). This applies to all security contexts: `podSecurityContext`, `securityContext`, `initJob`, `setupJob`, and `login`.

Changed in `harmony/src/modules/zitadel/mod.rs` for the `K3sFamily | Default` branch.

### ZitadelScore on k3d -- ingress class

The K3sFamily Helm values had `kubernetes.io/ingress.class: nginx` annotations. k3d ships with traefik, not nginx. The nginx annotation caused traefik to ignore the ingress entirely (404 on all routes). **Fix:** removed the explicit ingress class annotations -- traefik picks up ingresses without an explicit class by default.

Changed in `harmony/src/modules/zitadel/mod.rs` for the `K3sFamily | Default` branch.

### CNPG CRD registration race

After `helm install cloudnative-pg`, the operator deployment becomes ready but the CRD (`clusters.postgresql.cnpg.io`) is not yet registered in the API server's discovery cache. The kube client caches API discovery at init time, so even after the CRD registers, a reused client won't see it. **Fix:** the example creates a **fresh topology** (and therefore fresh kube client) on each retry attempt. Up to 5 retries with 15s delay.

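The essence of the fix is that the client is rebuilt inside the retry loop rather than once outside it. A generic sketch of that pattern follows; `retry_with_fresh_client` is a hypothetical helper, not the example's actual code, and reading "up to 5 retries" as five attempts in total is an assumption.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op` up to `max_attempts` times, building a new client with
/// `make_client` before every attempt so cached API discovery is discarded.
/// At least one attempt is always made; the last error is returned on failure.
fn retry_with_fresh_client<C, T, E>(
    max_attempts: u32,
    delay: Duration,
    mut make_client: impl FnMut() -> C,
    mut op: impl FnMut(&C) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        // A fresh client per attempt is the whole point of this helper.
        let client = make_client();
        match op(&client) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}
```

With the values from the note above, a call would look like `retry_with_fresh_client(5, Duration::from_secs(15), make_topology, deploy)`, where `make_topology` and `deploy` are placeholders for the example's own functions.
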
### CNPG PostgreSQL cluster readiness

After the CNPG `Cluster` CR is created, the PostgreSQL pods and the `-rw` service take 15-30s to come up. `ZitadelScore` immediately calls `topology.get_endpoint()` which looks for the `zitadel-pg-rw` service. If the service doesn't exist yet, it fails with "not found for cluster". **Fix:** same retry loop catches this error pattern.

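Matching on the error text could look like the sketch below. Only the "not found for cluster" fragment comes from the note above; the `is_transient` helper and the pattern table are hypothetical, and any additional fragments (for example for the CRD race) would have to be taken from the errors actually observed.

```rust
/// Error fragments that indicate a startup race rather than a real failure.
const TRANSIENT_ERROR_PATTERNS: &[&str] = &[
    // The `-rw` service for the PostgreSQL cluster does not exist yet.
    "not found for cluster",
];

/// Return true when the error message matches a known transient pattern,
/// meaning the deployment should be retried instead of aborted.
fn is_transient(error_message: &str) -> bool {
    TRANSIENT_ERROR_PATTERNS
        .iter()
        .any(|pattern| error_message.contains(*pattern))
}
```
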
### Zitadel Helm init job timing

The Zitadel Helm chart runs a `zitadel-init` pre-install/pre-upgrade Job that connects to PostgreSQL. If the PG cluster isn't fully ready (primary not accepting connections), the init job hangs until Helm's 5-minute timeout. On a cold start from scratch, the sequence is: CNPG operator install -> CRD registration (5-15s) -> PG cluster creation -> PG pod scheduling + init (~30s) -> PG primary ready -> Zitadel init job can connect. The retry loop handles this by allowing the full sequence to settle between attempts.

### Zitadel Host header validation

Zitadel validates the `Host` header on **all** HTTP endpoints against its `ExternalDomain` config (`sso.harmony.local`). This means:

- The OIDC discovery endpoint (`/.well-known/openid-configuration`) returns 404 if called via the internal service URL without the correct Host header
- The JWKS endpoint (`/oauth/v2/keys`) also requires the correct Host
- OpenBao's JWT auth `oidc_discovery_url` can't use `http://zitadel.zitadel.svc.cluster.local:8080` because Zitadel rejects the Host
- From outside the cluster, use `127.0.0.1:8080` with `Host: sso.harmony.local` header (or add /etc/hosts entry)
- Phase B needs to solve in-cluster DNS resolution for `sso.harmony.local`

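To make the Host header requirement concrete, the sketch below sends a plain HTTP/1.1 request to one address while presenting a different `Host`. The function names `build_get_request` and `fetch_with_host` are hypothetical; a real client would more likely set the header through an HTTP library, and the raw socket is used here only to keep the example dependency-free.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a minimal HTTP/1.1 GET request with an explicit `Host` header.
fn build_get_request(host_header: &str, path: &str) -> String {
    format!("GET {path} HTTP/1.1\r\nHost: {host_header}\r\nConnection: close\r\n\r\n")
}

/// Connect to `addr` (for example "127.0.0.1:8080"), send the request with
/// the given `Host` header, and return the raw response (status line,
/// headers, and body) as text.
fn fetch_with_host(addr: &str, host_header: &str, path: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_get_request(host_header, path).as_bytes())?;
    let mut raw = Vec::new();
    stream.read_to_end(&mut raw)?;
    Ok(String::from_utf8_lossy(&raw).into_owned())
}
```

Against a running cluster, `fetch_with_host("127.0.0.1:8080", "sso.harmony.local", "/.well-known/openid-configuration")` would correspond to the `curl -H "Host: ..."` checks used elsewhere in this plan.
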
### Both services share one port

Both Zitadel and OpenBao are exposed through traefik ingress on port 80 (mapped to host port 8080). Traefik routes by `Host` header: `sso.harmony.local` -> Zitadel, `bao.harmony.local` -> OpenBao. The original plan had separate port mappings (8080 for Zitadel, 8200 for OpenBao) but the 8200 mapping was useless since traefik only listens on 80/443.

For `--demo` mode, the port-forward bypasses traefik and connects directly to the OpenBao service on port 8200 (no Host header needed).

### `run_bao_command` and shell escaping

The `run_bao_command` function runs `kubectl exec ... -- sh -c "export VAULT_TOKEN=xxx && bao ..."`. Two gotchas:

1. Must use `export VAULT_TOKEN=...` (not just a `VAULT_TOKEN=...` prefix) because a prefix assignment applies only to the first command, so commands after a `|` don't inherit the variable
2. The policy creation uses `printf '...' | bao policy write harmony-dev -` which needs careful quoting inside the `sh -c` wrapper. Using `run_bao_command_raw()` avoids double-wrapping.

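Both gotchas come down to how the script string handed to `sh -c` is assembled. A minimal sketch, with hypothetical helper names `sh_single_quote` and `bao_script`: the token is exported so every command in a pipeline sees it, and it is single-quoted so the shell does not reinterpret its contents.

```rust
/// Quote `value` for a POSIX shell by wrapping it in single quotes and
/// rewriting each embedded single quote as `'\''`.
fn sh_single_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', r"'\''"))
}

/// Build the script passed as the single argument to `sh -c` inside the pod.
/// `export` makes the token visible to every command in `bao_command`,
/// including those on the right-hand side of a pipe.
fn bao_script(token: &str, bao_command: &str) -> String {
    format!(
        "export VAULT_TOKEN={} && {}",
        sh_single_quote(token),
        bao_command
    )
}
```

Because the result is passed as one argument after `sh -c`, no further layer of quoting is needed when the exec call takes an argument vector rather than a single command line.
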
### FIXMEs for future refactoring

The user flagged several areas that should use `harmony-k8s` instead of raw `kubectl`:

- `wait_for_pod_running()` -- harmony-k8s has pod wait functionality
- `init_openbao()`, `unseal_openbao()` -- exec into pods via kubectl
- `get_k3d_binary_path()`, `get_openbao_data_path()` -- leaking implementation details from k3d/openbao crates
- `configure_openbao()` -- future candidate for an OpenBao/Vault capability trait

---

## Files Modified (Phase A)

| File | Change |
|---|---|
| `examples/harmony_sso/Cargo.toml` | Added clap, schemars, interactive-parse |
| `examples/harmony_sso/src/main.rs` | Complete rewrite: CLI args, Zitadel deploy, JWT auth config, demo modes, hardening |
| `examples/harmony_sso/README.md` | New: prerequisites, usage, architecture |
| `harmony/src/modules/zitadel/mod.rs` | Fixed K3s security context (`runAsUser: 1000`), removed nginx ingress annotations |

## Files to Modify (Phase B)

| File | Change |
|---|---|
| `harmony_secret/src/store/zitadel.rs` | JWT exchange, silent refresh |
| `harmony_secret/src/store/openbao.rs` | Wire JWT params, refresh in auth chain |
| `harmony_secret/src/config.rs` | `OPENBAO_JWT_AUTH_MOUNT`, `OPENBAO_JWT_ROLE` env vars |

## Verification

**Phase A (verified 2026-03-28):**

- `cargo run -p example-harmony-sso` -> deploys k3d + OpenBao + Zitadel (with retry for CNPG CRD + PG readiness)
- `curl -H "Host: bao.harmony.local" http://127.0.0.1:8080/v1/sys/health` -> OpenBao healthy (initialized, unsealed)
- `curl -H "Host: sso.harmony.local" http://127.0.0.1:8080/.well-known/openid-configuration` -> Zitadel OIDC config with device_authorization_endpoint
- `cargo run -p example-harmony-sso -- --demo` -> writes/reads config via ConfigManager + OpenbaoSecretStore, env override works

**Phase B:**

- `HARMONY_SSO_URL=http://sso.harmony.local HARMONY_SSO_CLIENT_ID=<id> cargo run -p example-harmony-sso -- --sso-demo`
- Device code appears, login in browser, config stored via SSO-authenticated OpenBao token

**Phase C:**

- `cargo test -p example-harmony-sso -- --ignored` -> integration tests pass