Phase 9: SSO + Config System Hardening
Goal
Make the Zitadel + OpenBao SSO config management stack production-ready, well-tested, and reusable across deployments. The harmony_sso example demonstrates the full loop: deploy infrastructure, authenticate via SSO, store and retrieve config -- all in one cargo run.
Current State (as of feat/opnsense-codegen)
The SSO example works end-to-end:
- k3d cluster + OpenBao + Zitadel deployed via Scores
- OpenbaoSetupScore: init, unseal, policies, userpass, JWT auth
- ZitadelSetupScore: project + device-code app provisioning via Management API (PAT auth)
- JWT exchange: Zitadel id_token → OpenBao client token via /v1/auth/jwt/login
- Device flow triggers in terminal, user logs in via browser, config stored in OpenBao KV v2
- CoreDNS patched for in-cluster hostname resolution (K3sFamily only)
- Discovery cache invalidation after CRD installation
- Session caching with TTL
What's solid
- Score composition: 4 Scores orchestrate the full stack in ~280 lines
- Config trait: clean Serialize + Deserialize + JsonSchema, developer doesn't see OpenBao or Zitadel
- Auth chain transparency: token → cached → OIDC device flow → userpass, right thing happens
- Idempotency: all Scores safe to re-run, cached sessions skip login
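The auth chain listed above can be pictured as an ordered fallback. The sketch below is only an illustration under assumptions: `AuthStep` and `authenticate` are hypothetical names, not the harmony_secret API, and each step is modelled as a closure that either yields a client token or declines.

```rust
/// Hypothetical labels for the four steps named in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthStep {
    Token,
    CachedSession,
    OidcDeviceFlow,
    Userpass,
}

/// Tries each step in order and returns the first one that yields a token.
/// A step returns `None` when it does not apply (for example, no cached session).
pub fn authenticate(
    steps: &[(AuthStep, &dyn Fn() -> Option<String>)],
) -> Option<(AuthStep, String)> {
    steps
        .iter()
        .find_map(|(step, attempt)| attempt().map(|token| (*step, token)))
}
```

Because later steps are only invoked when earlier ones decline, a valid cached session means the device flow is never started, which matches the "cached sessions skip login" behaviour.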
What needs work
See tasks below.
Tasks
9.1 Builder pattern for OpenbaoSecretStore — HIGH
Problem: OpenbaoSecretStore::new() has 11 positional arguments. Adding JWT params made it worse. Callers pass None, None, None, None for unused options.
Fix: Replace with a builder:
OpenbaoSecretStore::builder()
.url("http://127.0.0.1:8200")
.kv_mount("secret")
.skip_tls(true)
.zitadel_sso("http://sso.harmony.local:8080", "client-id-123")
.jwt_auth("harmony-developer", "jwt")
.build()
.await?
Impact: All callers updated (lib.rs, openbao_chain example, harmony_sso example). Breaking API change.
Files: harmony_secret/src/store/openbao.rs, all callers
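A minimal sketch of what the builder's internals could look like, assuming it only collects settings. `OpenbaoSettings` and `OpenbaoSettingsBuilder` are hypothetical names, the field meanings are a reading of the call chain above, and the `"secret"` default for the KV mount is an assumption. The real `build()` is async and presumably also authenticates, which this sketch leaves out.

```rust
/// Hypothetical settings collected by the builder; the real store may hold more.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OpenbaoSettings {
    pub url: String,
    pub kv_mount: String,
    pub skip_tls: bool,
    pub zitadel_sso: Option<(String, String)>, // assumed: (SSO URL, client id)
    pub jwt_auth: Option<(String, String)>,    // assumed: (role, auth mount)
}

#[derive(Debug, Default)]
pub struct OpenbaoSettingsBuilder {
    url: Option<String>,
    kv_mount: Option<String>,
    skip_tls: bool,
    zitadel_sso: Option<(String, String)>,
    jwt_auth: Option<(String, String)>,
}

impl OpenbaoSettingsBuilder {
    pub fn url(mut self, url: &str) -> Self {
        self.url = Some(url.to_string());
        self
    }

    pub fn kv_mount(mut self, mount: &str) -> Self {
        self.kv_mount = Some(mount.to_string());
        self
    }

    pub fn skip_tls(mut self, skip: bool) -> Self {
        self.skip_tls = skip;
        self
    }

    pub fn zitadel_sso(mut self, sso_url: &str, client_id: &str) -> Self {
        self.zitadel_sso = Some((sso_url.to_string(), client_id.to_string()));
        self
    }

    pub fn jwt_auth(mut self, role: &str, mount: &str) -> Self {
        self.jwt_auth = Some((role.to_string(), mount.to_string()));
        self
    }

    /// Validates required fields and fills in defaults for the rest.
    pub fn build(self) -> Result<OpenbaoSettings, String> {
        Ok(OpenbaoSettings {
            url: self.url.ok_or("url is required")?,
            // Assumed default mount name; not confirmed by the source.
            kv_mount: self.kv_mount.unwrap_or_else(|| "secret".to_string()),
            skip_tls: self.skip_tls,
            zitadel_sso: self.zitadel_sso,
            jwt_auth: self.jwt_auth,
        })
    }
}
```

Unused options simply stay `None`, so callers no longer pass a run of positional `None` arguments.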
9.2 Fix ZitadelScore PG readiness — HIGH
Problem: ZitadelScore calls topology.get_endpoint() immediately after deploying the CNPG Cluster CR. The PG -rw service takes 15-30s to appear. This forces a retry loop in the caller (the example).
Fix: Add a wait loop inside ZitadelScore's interpret, after topology.deploy(&pg_config), that polls for the -rw service to exist before calling get_endpoint(). Use K8sClient::get_resource::<Service>() with a poll loop.
Impact: Eliminates the retry wrapper in the harmony_sso example and any other Zitadel consumer.
Files: harmony/src/modules/zitadel/mod.rs
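The wait loop could follow a generic poll-until-ready shape. This is a synchronous sketch using only the standard library; `wait_for` is a hypothetical helper, and the real interpret is async, so it would use an async sleep and a probe wrapping the Service lookup instead.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `probe` until it returns `Some`, sleeping `interval` between attempts.
/// Gives up with `None` once another sleep would pass the `timeout` deadline.
pub fn wait_for<T>(
    mut probe: impl FnMut() -> Option<T>,
    interval: Duration,
    timeout: Duration,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(found) = probe() {
            return Some(found);
        }
        if Instant::now() + interval > deadline {
            return None;
        }
        sleep(interval);
    }
}
```

With a service that takes 15-30s to appear, an interval of a few seconds and a timeout comfortably above 30s would be a reasonable starting point; the exact values are a tuning choice, not something the source specifies.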
9.3 CoreDNSRewriteScore — MEDIUM
Problem: CoreDNS patching logic lives in the harmony_sso example. It's a general pattern: any service with ingress-based Host routing needs in-cluster DNS resolution.
Fix: Extract into harmony/src/modules/k8s/coredns.rs as a proper Score:
pub struct CoreDNSRewriteScore {
pub rewrites: Vec<(String, String)>, // (hostname, service FQDN)
}
impl<T: Topology + K8sclient> Score<T> for CoreDNSRewriteScore { ... }
K3sFamily only. No-op on OpenShift. Idempotent.
Files: harmony/src/modules/k8s/coredns.rs (new), harmony/src/modules/k8s/mod.rs
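The idempotent part of the patch can be sketched as a pure function over the Corefile text. `rewrite_lines` and `patch_corefile` are hypothetical helpers, and the sketch assumes the directives belong right after the opening line of the first server block; it uses the CoreDNS `rewrite name FROM TO` form and leaves the ConfigMap read/write to the Score.

```rust
/// Renders one CoreDNS `rewrite name` directive per (hostname, service FQDN) pair.
pub fn rewrite_lines(rewrites: &[(String, String)]) -> Vec<String> {
    rewrites
        .iter()
        .map(|(host, fqdn)| format!("    rewrite name {} {}", host, fqdn))
        .collect()
}

/// Inserts directives that are not already present after the first line ending in `{`.
/// Returns the input unchanged when nothing is missing, so re-running is a no-op.
pub fn patch_corefile(corefile: &str, rewrites: &[(String, String)]) -> String {
    let missing: Vec<String> = rewrite_lines(rewrites)
        .into_iter()
        .filter(|line| !corefile.lines().any(|l| l.trim() == line.trim()))
        .collect();
    if missing.is_empty() {
        return corefile.to_string();
    }

    let mut out: Vec<String> = Vec::new();
    let mut inserted = false;
    for line in corefile.lines() {
        out.push(line.to_string());
        if !inserted && line.trim_end().ends_with('{') {
            out.extend(missing.iter().cloned());
            inserted = true;
        }
    }

    let mut patched = out.join("\n");
    if corefile.ends_with('\n') {
        patched.push('\n'); // preserve the trailing newline dropped by `lines()`
    }
    patched
}
```

Keeping the text transformation separate from the cluster call makes the idempotency claim checkable without a running k3d cluster.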
9.4 Integration tests for Scores — MEDIUM
Problem: Zero tests for OpenbaoSetupScore, ZitadelSetupScore, CoreDNSRewriteScore. The Scores are testable against a running k3d cluster.
Fix: Add #[ignore] integration tests that require a running cluster:
- test_openbao_setup_score: deploy OpenBao + run setup, verify KV works
- test_zitadel_setup_score: deploy Zitadel + run setup, verify project/app exist
- test_config_round_trip: store + retrieve config via SSO-authenticated OpenBao
Run with cargo test -- --ignored after deploying the example.
Files: harmony/tests/integration/ (new directory)
9.5 Remove resolve() DNS hack — LOW
Problem: ZitadelOidcAuth::http_client() hardcodes resolve(host, 127.0.0.1:port). This only works for local k3d development.
Fix: Make it configurable. Add an optional resolve_to: Option<SocketAddr> field to ZitadelOidcAuth. The example passes Some(127.0.0.1:8080) for k3d; production passes None (uses real DNS). Or better: detect whether the host resolves and only apply the override if it doesn't.
Files: harmony_secret/src/store/zitadel.rs
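The "only override when the host does not resolve" variant could be decided as below. Both function names are hypothetical; the sketch only picks the address, and wiring the result into the HTTP client's `resolve(...)` call is left to `ZitadelOidcAuth`.

```rust
use std::net::{SocketAddr, ToSocketAddrs};

/// Returns true when the system resolver finds at least one address for `host:port`.
pub fn host_resolves(host: &str, port: u16) -> bool {
    (host, port)
        .to_socket_addrs()
        .map(|mut addrs| addrs.next().is_some())
        .unwrap_or(false)
}

/// Picks the address to pin the host to, if any.
/// The configured override is applied only when normal resolution failed.
pub fn effective_override(
    resolve_to: Option<SocketAddr>,
    resolves: bool,
) -> Option<SocketAddr> {
    if resolves {
        None
    } else {
        resolve_to
    }
}
```

A caller would combine the two as `effective_override(resolve_to, host_resolves(host, port))`. Passing the lookup result in as a plain `bool` keeps the decision testable without touching DNS.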
9.6 Typed Zitadel API client — LOW
Problem: ZitadelSetupScore uses hand-written JSON with string parsing for Management API calls. No type safety on request/response.
Fix: Create typed request/response structs for the Management API v1 endpoints used (projects, apps, users). Use serde for serialization. This doesn't need to be a full API client -- just the endpoints we use.
Files: harmony/src/modules/zitadel/api.rs (new)
9.7 Capability traits for secret vault + identity — FUTURE
Problem: OpenbaoScore and ZitadelScore are tool-specific. No capability abstraction for "I need a secret vault" or "I need an identity provider".
Fix: Design SecretVault and IdentityProvider capability traits on topologies. This is a significant architectural decision that needs an ADR.
Blocked by: Real-world use of a second implementation (e.g., HashiCorp Vault, Keycloak) to validate the abstraction boundary.
9.8 Auto-unseal for OpenBao — FUTURE
Problem: Every pod restart requires manual unseal. OpenbaoSetupScore handles this, but requires re-running the Score.
Fix: Configure Transit auto-unseal (using a second OpenBao/Vault instance) or cloud KMS auto-unseal. This is an operational concern that should be configurable in OpenbaoSetupScore.
Relationship to Other Phases
- Phase 1 (config crate): SSO flow builds directly on harmony_config + StoreSource<OpenbaoSecretStore>. Phase 1 task 1.4 is now complete via the harmony_sso example.
- Phase 2 (migrate to harmony_config): The 19 SecretManager call sites should migrate to ConfigManager with the OpenbaoSecretStore backend. The SSO flow validates this pattern works.
- Phase 5 (E2E tests): The harmony_sso example is a candidate for the first E2E test -- it deploys k3d, exercises multiple Scores, and verifies config storage.