harmony/docs/adr/020-interactive-configuration-crate.md
Jean-Gabriel Gill-Couture 64582caa64
docs: Major rehaul of documentation
2026-03-19 22:38:55 -04:00


# ADR 020: Unified Configuration and Secret Management
Author: Jean-Gabriel Gill-Couture
Date: 2026-03-18
## Status
Proposed
## Context
Harmony's orchestration logic depends on runtime data that falls into two categories:
1. **Secrets** — credentials, tokens, private keys.
2. **Operational configuration** — deployment targets, host selections, port assignments, reboot decisions, and similar contextual choices.
Both categories share the same fundamental lifecycle: a value must be acquired before execution can proceed, it may come from several backends (environment variable, remote store, interactive prompt), and it must be shareable across a team without polluting the Git repository.
Treating these categories as separate subsystems forces developers to choose between a "config API" and a "secret API" at every call site. The only meaningful difference between the two is how the storage backend handles the data (plaintext vs. encrypted, audited vs. unaudited) and how the CLI displays it (visible vs. masked). That difference belongs in the backend, not in the application code.
Three concrete problems drive this change:
- **Async terminal corruption.** `inquire` prompts assume exclusive terminal ownership. Background tokio tasks emitting log output during a prompt corrupt the terminal state. This is inherent to Harmony's concurrent orchestration model.
- **Untestable code paths.** Any function containing an inline `inquire` call requires a real TTY to execute, so it cannot be exercised by automated unit tests; the affected tests are currently ignored.
- **No backend integration.** Inline prompts cannot be answered from a remote store, an environment variable, or a CI pipeline. Every automated deployment that passes through a prompting code path requires a human operator at a terminal.
## Decision
A single workspace crate, `harmony_config`, provides all configuration and secret acquisition for Harmony. It replaces both `harmony_secret` and all inline `inquire` usage.
### Schema in Git, state in the store
The Rust type system serves as the configuration schema. Developers declare what configuration is needed by defining structs:
```rust
#[derive(Config, Serialize, Deserialize, JsonSchema, InteractiveParse)]
struct PostgresConfig {
    pub host: String,
    pub port: u16,
    #[config(secret)]
    pub password: String,
}
```
These structs live in Git and evolve with the code. When a branch introduces a new field, Git tracks that schema change. The actual values live in an external store — OpenBao by default. No `.env` files, no JSON config files, no YAML in the repository.
### Data classification
```rust
/// Tells the storage backend how to handle the data.
pub enum ConfigClass {
    /// Plaintext storage is acceptable.
    Standard,
    /// Must be encrypted at rest, masked in UI, subject to audit logging.
    Secret,
}
```
Classification is determined at the struct level. A struct with no `#[config(secret)]` fields has `ConfigClass::Standard`. A struct with one or more `#[config(secret)]` fields is elevated to `ConfigClass::Secret`. The struct is always stored as a single cohesive JSON blob; field-level splitting across backends is not a concern of the trait.
The `#[config(secret)]` attribute also instructs the `PromptSource` to mask terminal input for that field during interactive prompting.
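The two rules above (struct-level elevation, field-level masking) can be sketched in plain Rust. This is only an illustration: `FieldSpec`, `classify`, `masked_fields`, and `postgres_fields` are hypothetical names, not part of `harmony_config`, and the real derive macro would presumably work on the struct's syntax tree rather than on a runtime list.

```rust
/// Mirrors the ADR's `ConfigClass`; the derives are added for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfigClass {
    Standard,
    Secret,
}

/// Hypothetical description of one struct field, as a derive macro might see it.
pub struct FieldSpec {
    pub name: &'static str,
    /// True when the field carries `#[config(secret)]`.
    pub secret: bool,
}

/// Struct-level rule: a single secret field elevates the whole struct.
pub fn classify(fields: &[FieldSpec]) -> ConfigClass {
    if fields.iter().any(|f| f.secret) {
        ConfigClass::Secret
    } else {
        ConfigClass::Standard
    }
}

/// Field-level rule: only the secret fields are masked at the prompt.
pub fn masked_fields(fields: &[FieldSpec]) -> Vec<&'static str> {
    fields.iter().filter(|f| f.secret).map(|f| f.name).collect()
}

/// Field list corresponding to the `PostgresConfig` example.
pub fn postgres_fields() -> Vec<FieldSpec> {
    vec![
        FieldSpec { name: "host", secret: false },
        FieldSpec { name: "port", secret: false },
        FieldSpec { name: "password", secret: true },
    ]
}
```

For `PostgresConfig`, `classify` yields `ConfigClass::Secret` for the struct as a whole, while `masked_fields` selects only `password` for masked input.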
### The Config trait
```rust
pub trait Config: Serialize + DeserializeOwned + JsonSchema + InteractiveParseObj + Sized {
    /// Stable lookup key. By default, the struct name.
    const KEY: &'static str;
    /// How the backend should treat this data.
    const CLASS: ConfigClass;
}
```
A `#[derive(Config)]` proc macro generates the implementation. The macro inspects field attributes to determine `CLASS`.
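As a rough idea of what the macro output might look like, here is a hand-written equivalent. It is a sketch under assumptions: the trait is reduced to its two constants (the `Serialize`, `DeserializeOwned`, `JsonSchema`, and `InteractiveParseObj` supertraits are dropped so the example needs only the standard library), and `DeploymentTarget` and `describe` are hypothetical names used for illustration.

```rust
/// Mirrors the ADR's `ConfigClass`; the derives are added for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfigClass {
    Standard,
    Secret,
}

/// Simplified stand-in for the ADR's `Config` trait (supertraits omitted).
pub trait Config: Sized {
    const KEY: &'static str;
    const CLASS: ConfigClass;
}

pub struct PostgresConfig {
    pub host: String,
    pub port: u16,
    /// Would carry `#[config(secret)]` in the real struct.
    pub password: String,
}

// Assumed shape of the derive output: the key defaults to the struct name,
// and the class is `Secret` because one field is marked secret.
impl Config for PostgresConfig {
    const KEY: &'static str = "PostgresConfig";
    const CLASS: ConfigClass = ConfigClass::Secret;
}

/// Hypothetical struct with no secret fields.
pub struct DeploymentTarget {
    pub cluster: String,
}

impl Config for DeploymentTarget {
    const KEY: &'static str = "DeploymentTarget";
    const CLASS: ConfigClass = ConfigClass::Standard;
}

/// Generic code can read the key and the class from the type alone,
/// without holding a value of that type.
pub fn describe<T: Config>() -> String {
    format!("{} ({:?})", T::KEY, T::CLASS)
}
```

Because both items are associated constants, a manager can decide where and how to look up a struct before any value of it exists.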
### The ConfigStore trait
```rust
#[async_trait]
pub trait ConfigStore: Send + Sync {
    async fn get(
        &self,
        class: ConfigClass,
        namespace: &str,
        key: &str,
    ) -> Result<Option<serde_json::Value>, ConfigError>;
    async fn set(
        &self,
        class: ConfigClass,
        namespace: &str,
        key: &str,
        value: &serde_json::Value,
    ) -> Result<(), ConfigError>;
}
```
The `class` parameter is a hint. The store implementation decides what to do with it. An OpenBao store may route `Secret` data to a different path prefix or apply stricter ACLs. A future store could split fields across backends — that is an implementation concern, not a trait concern.
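To make the "hint" idea concrete, the following sketch shows an in-memory store of the kind a unit test might use. It deliberately simplifies the trait above: the methods are synchronous and values are raw JSON text instead of `serde_json::Value`, so that the example compiles with the standard library alone. `InMemoryStore`, its `path` layout, and the `config`/`secret` prefixes are assumptions for illustration, not the OpenBao layout.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Mirrors the ADR's `ConfigClass`; the derives are added for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfigClass {
    Standard,
    Secret,
}

/// Minimal error type standing in for the ADR's `ConfigError`.
#[derive(Debug, PartialEq)]
pub struct ConfigError(pub String);

/// Synchronous simplification of the ADR's `ConfigStore` trait.
pub trait ConfigStore: Send + Sync {
    fn get(&self, class: ConfigClass, namespace: &str, key: &str)
        -> Result<Option<String>, ConfigError>;
    fn set(&self, class: ConfigClass, namespace: &str, key: &str, value: &str)
        -> Result<(), ConfigError>;
}

/// Hypothetical in-memory backend, useful as a mock in tests.
#[derive(Default)]
pub struct InMemoryStore {
    entries: Mutex<HashMap<String, String>>,
}

impl InMemoryStore {
    /// The store decides what the class means; here it selects a path prefix.
    pub fn path(class: ConfigClass, namespace: &str, key: &str) -> String {
        let prefix = match class {
            ConfigClass::Standard => "config",
            ConfigClass::Secret => "secret",
        };
        format!("{prefix}/{namespace}/{key}")
    }
}

impl ConfigStore for InMemoryStore {
    fn get(&self, class: ConfigClass, namespace: &str, key: &str)
        -> Result<Option<String>, ConfigError> {
        let entries = self
            .entries
            .lock()
            .map_err(|_| ConfigError("store lock poisoned".into()))?;
        Ok(entries.get(&Self::path(class, namespace, key)).cloned())
    }

    fn set(&self, class: ConfigClass, namespace: &str, key: &str, value: &str)
        -> Result<(), ConfigError> {
        let mut entries = self
            .entries
            .lock()
            .map_err(|_| ConfigError("store lock poisoned".into()))?;
        entries.insert(Self::path(class, namespace, key), value.to_owned());
        Ok(())
    }
}
```

The caller passes the same arguments regardless of backend; only the store knows that `Secret` entries land under a different prefix.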
### Resolution chain
The `ConfigManager` tries sources in priority order:
1. **`EnvSource`** — reads `HARMONY_CONFIG_{KEY}` as a JSON string. Override hatch for CI/CD pipelines and containerized environments.
2. **`StoreSource`** — wraps a `ConfigStore` implementation. For teams, this is the OpenBao backend authenticated via Zitadel OIDC (see ADR 020-1).
3. **`PromptSource`** — presents an `interactive-parse` prompt on the terminal. Acquires a process-wide async mutex before rendering to prevent log output corruption.
When `PromptSource` obtains a value, the `ConfigManager` persists it back to the `StoreSource` so that subsequent runs — by the same developer or any teammate — resolve without prompting.
Callers that do not include `PromptSource` in their source list never block on a TTY. Test code passes empty source lists and constructs config structs directly.
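The priority order and the write-back step can be sketched as follows. This is a simplified, synchronous model and not the `ConfigManager` API: the `Source` enum and `resolve` function are hypothetical, values are raw JSON text, the environment is injected as a lookup closure (a real `EnvSource` would presumably call `std::env::var`), and the key is appended to `HARMONY_CONFIG_` verbatim because the ADR does not specify a casing rule.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-ins for the three sources described above.
pub enum Source<'a> {
    /// Looks up `HARMONY_CONFIG_{KEY}` through an injected function.
    Env(&'a dyn Fn(&str) -> Option<String>),
    /// Shared store, modelled here as an in-memory map.
    Store(&'a RefCell<HashMap<String, String>>),
    /// Interactive prompt, modelled here as a closure.
    Prompt(&'a dyn Fn(&str) -> String),
}

/// Tries each source in list order and returns the first value found.
/// A prompted value is written back to every store in the list.
pub fn resolve(sources: &[Source<'_>], key: &str) -> Option<String> {
    for source in sources {
        match source {
            Source::Env(lookup) => {
                if let Some(value) = lookup(&format!("HARMONY_CONFIG_{key}")) {
                    return Some(value);
                }
            }
            Source::Store(store) => {
                let cached = store.borrow().get(key).cloned();
                if let Some(value) = cached {
                    return Some(value);
                }
            }
            Source::Prompt(ask) => {
                let value = ask(key);
                // Persist so that later runs resolve without prompting.
                for other in sources {
                    if let Source::Store(store) = other {
                        store.borrow_mut().insert(key.to_owned(), value.clone());
                    }
                }
                return Some(value);
            }
        }
    }
    // No source produced a value, and no prompt was configured.
    None
}
```

A list without `Source::Prompt` simply returns `None` on a miss instead of blocking, which is the property that makes headless and test runs safe.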
### Schema versioning
The Rust struct is the schema. When a developer renames a field, removes a field, or changes a type on a branch, the store may contain data shaped for a different version of the struct than the one being compiled. If a team member who does not yet have that commit runs the code, `serde_json::from_value` can fail on the mismatched entry.
In the initial implementation, the resolution chain handles this gracefully: a deserialization failure is treated as a cache miss, and the `PromptSource` fires. The prompted value overwrites the stale entry in the store.
This is sufficient for small teams working on short-lived branches. It is not sufficient at scale, where silent re-prompting could mask real configuration drift.
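The cache-miss fallback described above can be illustrated with a small generic helper. It is a sketch only: `resolve_or_prompt` is a hypothetical name, the `decode` closure stands in for `serde_json::from_value`, and the returned flag marks whether the caller should overwrite the store entry.

```rust
/// Returns the resolved value and whether it must be written back to the store.
///
/// `stored` is the raw entry from the store, if any. `decode` stands in for
/// deserialization into the current struct. `prompt` is called only when the
/// entry is absent or does not decode.
pub fn resolve_or_prompt<T, E>(
    stored: Option<&str>,
    decode: impl Fn(&str) -> Result<T, E>,
    prompt: impl FnOnce() -> T,
) -> (T, bool) {
    match stored.map(decode) {
        // The stored entry matches the current schema: use it as is.
        Some(Ok(value)) => (value, false),
        // Missing entry or decode failure: both are treated as a cache miss.
        Some(Err(_)) | None => (prompt(), true),
    }
}
```

With a decoder such as `|raw: &str| raw.parse::<u16>()`, a stored `"5432"` is returned unchanged, while a stored value of the wrong shape falls through to the prompt exactly as a missing entry would. The decode error is discarded here, which is the "silent re-prompting" risk the paragraph above points out.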
A future iteration will introduce a compile-time schema migration mechanism, similar to how `sqlx` verifies queries against a live database at compile time. The mechanism will:
- Detect schema drift between the Rust struct and the stored JSON.
- Apply named, ordered migration functions to transform stored data forward.
- Reject ambiguous migrations at compile time rather than silently corrupting state.
Until that mechanism exists, teams should treat store entries as soft caches: the struct definition is always authoritative, and the store is best-effort.
## Rationale
**Why merge secrets and config into one crate?** Separate crates with nearly identical trait shapes (`Secret` vs `Config`, `SecretStore` vs `ConfigStore`) force developers to make a classification decision at every call site. A unified crate with a `ConfigClass` discriminator moves that decision to the struct definition, where it belongs.
**Why OpenBao as the default backend?** OpenBao is a fully open-source Vault fork under the Linux Foundation. It runs on-premises with no phone-home requirement — a hard constraint for private cloud and regulated environments. Harmony already deploys OpenBao for clients (`OpenbaoScore`), so no new infrastructure is introduced.
**Why not store values in Git (e.g., encrypted YAML)?** Git-tracked config files create merge conflicts, require re-encryption on team membership changes, and leak metadata (file names, key names) even when values are encrypted. Storing state in OpenBao avoids all of these issues and provides audit logging, access control, and versioned KV out of the box.
**Why keep `PromptSource`?** Removing interactive prompts entirely would break the zero-infrastructure bootstrapping path and eliminate human-confirmation safety gates for destructive operations (interface reconfiguration, node reboot). The problem was never that prompts exist — it is that they were unavoidable and untestable. Making `PromptSource` an explicit, opt-in entry in the source list restores control.
## Consequences
### Positive
- A single API surface for all runtime data acquisition.
- All currently ignored tests become runnable without TTY access.
- Async terminal corruption is eliminated by the process-wide prompt mutex.
- The bootstrapping path requires no infrastructure for a first run; `PromptSource` alone is sufficient.
- The team path (OpenBao + Zitadel) reuses infrastructure Harmony already deploys.
- User offboarding is a single Zitadel action.
### Negative
- Migrating all inline `inquire` and `harmony_secret` call sites is a significant refactoring effort.
- Until the schema migration mechanism is built, store entries for renamed or removed fields become stale and must be re-prompted.
- The Zitadel device flow introduces a browser step on first login per machine.
## Implementation Plan
### Phase 1: Trait design and crate restructure
Refactor `harmony_config` to define the final `Config` and `ConfigStore` traits and the `ConfigClass` enum. Update the derive macro to support `#[config(secret)]` and generate the correct `CLASS` constant. Implement `EnvSource` and `PromptSource` against the new traits. Write comprehensive unit tests using mock stores.
### Phase 2: Absorb `harmony_secret`
Migrate the `OpenbaoSecretStore`, `InfisicalSecretStore`, and `LocalFileSecretStore` implementations from `harmony_secret` into `harmony_config` as `ConfigStore` backends. Update all call sites that use `SecretManager::get`, `SecretManager::get_or_prompt`, or `SecretManager::set` to use `harmony_config` equivalents.
### Phase 3: Migrate inline prompts
Replace all inline `inquire` call sites in the `harmony` crate (`infra/brocade.rs`, `infra/network_manager.rs`, `modules/okd/host_network.rs`, and others) with `harmony_config` structs and `get_or_prompt` calls. Un-ignore the affected tests.
### Phase 4: Zitadel and OpenBao integration
Implement the authentication flow described in ADR 020-1. Wire `StoreSource` to use Zitadel OIDC tokens for OpenBao access. Implement token caching and silent refresh.
### Phase 5: Remove `harmony_secret`
Delete the `harmony_secret` and `harmony_secret_derive` crates from the workspace. All functionality now lives in `harmony_config`.