feat/fleet-staging-openbao #313

Open
johnride wants to merge 11 commits from feat/fleet-staging-openbao into master
18 changed files with 813 additions and 443 deletions

Cargo.lock (generated)

@@ -3213,12 +3213,15 @@ dependencies = [
name = "example-openbao"
version = "0.1.0"
dependencies = [
"anyhow",
"clap",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"harmony_config",
"schemars 0.8.22",
"serde",
"tokio",
"url",
"tracing",
]
[[package]]
@@ -3495,16 +3498,19 @@ version = "0.1.0"
dependencies = [
"anyhow",
"clap",
"env_logger",
"harmony",
"harmony-fleet-deploy",
"harmony-k8s",
"harmony-nats-callout",
"harmony_cli",
"log",
"harmony_config",
"harmony_secret",
"nkeys",
"rand 0.9.4",
"schemars 0.8.22",
"serde",
"tokio",
"tracing",
]
[[package]]
@@ -4308,18 +4314,17 @@ name = "harmony_cli"
version = "0.1.0"
dependencies = [
"assert_cmd",
"chrono",
"async-trait",
"clap",
"console",
"env_logger",
"harmony",
"harmony_tui",
"indicatif",
"indicatif-log-bridge",
"inquire 0.7.5",
"lazy_static",
"log",
"tokio",
"tracing",
"tracing-subscriber",
]
[[package]]

@@ -0,0 +1,134 @@
# Fleet Platform v0.3 — Staging to production-ready
Written 2026-05-31. Picks up once OpenBao, Zitadel, NATS, the auth callout, and the operator are deployed and functional on staging (the deployed versions are 2-3 weeks old).
## Current state
- [x] OpenBao running at `secrets-stg.cb1.nationtech.io`
- [x] Zitadel running at `sso-stg.cb1.nationtech.io`
- [x] NATS + auth callout deployed in `fleet-staging` namespace
- [x] Operator deployed (older version, 2-3 weeks old)
- [x] Config-driven OpenBao installer (`examples/openbao`)
- [x] `harmony-fleet-deploy` binary reads `FleetDeployConfig` + `FleetDeploySecrets` from OpenBao
## Immediate next steps
### 1. Provision operator credentials in OpenBao
- [ ] Fetch existing creds from the running cluster:
```bash
oc -n fleet-staging get secret harmony-fleet-operator-secrets -o jsonpath='{.data.credentials\.toml}' | base64 -d
```
- [ ] Seed into OpenBao at `secret/data/fleet-staging/FleetDeploySecrets`:
```bash
export VAULT_ADDR=https://secrets-stg.cb1.nationtech.io
export VAULT_TOKEN=<root token>
oc -n fleet-staging get secret harmony-fleet-operator-secrets -o jsonpath='{.data.credentials\.toml}' | base64 -d \
| jq -Rs '{value: ({operator_credentials_toml: .} | tojson)}' \
| bao kv put secret/fleet-staging/FleetDeploySecrets -
```
- [ ] Verify the secret is readable: `bao kv get secret/fleet-staging/FleetDeploySecrets`
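The jq filter in the seeding step double-encodes the credentials: it serializes `{operator_credentials_toml: ...}` to a JSON *string* and stores that string under `value`. A minimal, stdlib-only Rust sketch of the same envelope, assuming nothing beyond what the filter shows (`json_string` and `fleet_deploy_secrets_payload` are hypothetical helpers; that `StoreSource` expects exactly this shape is taken from the command above, not verified here):

```rust
/// Encode `s` as a JSON string literal (quotes included), per RFC 8259:
/// `"` and `\` are backslash-escaped and control characters below U+0020
/// become short escapes or `\u00XX`.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Same shape as `{value: ({operator_credentials_toml: .} | tojson)}`:
/// the inner object is serialized first, then embedded as a string.
fn fleet_deploy_secrets_payload(credentials_toml: &str) -> String {
    let inner = format!(
        "{{\"operator_credentials_toml\":{}}}",
        json_string(credentials_toml)
    );
    format!("{{\"value\":{}}}", json_string(&inner))
}

fn main() {
    let toml = "type = \"zitadel-jwt\"\naudience = \"123\"\n";
    println!("{}", fleet_deploy_secrets_payload(toml));
}
```

The output is JSON-equivalent to what the jq pipeline feeds `bao kv put ... -`, though not guaranteed byte-identical (escape spelling can differ); the jq version is the one actually exercised.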
### 2. Private repo deploy script
- [ ] Create `.envrc` with minimal env:
```bash
export OPENBAO_URL=https://secrets-stg.cb1.nationtech.io
export HARMONY_CONFIG_NAMESPACE=fleet-staging
# export OPENBAO_TOKEN=<root token for now; SSO later>
```
- [ ] Write the deploy invocation (a shell script, or just a direct `harmony-fleet-deploy` call):
```bash
harmony-fleet-deploy --from-tag harmony-fleet-operator-vX.Y.Z --yes
```
- [ ] Commit `.envrc` + script to private repo (shared with teammates)
### 3. Execute operator upgrade
- [ ] Run the deploy script from the private repo
- [ ] Verify operator pod starts and connects to NATS
- [ ] Verify operator reconciles existing CRs (check logs)
- [ ] Confirm no regression in existing fleet functionality
### 4. Operator UI ingress (trivial)
- [ ] Expose operator UI with TLS ingress on `fleet-stg.<base_domain>`
- [ ] Verify the UI loads and serves the SPA
- [ ] Confirm no auth gate yet (SSO is next)
### 5. SSO login flow
- [ ] Wire operator UI to Zitadel SSO at `sso-stg.<base_domain>`
- [ ] Test login/logout flow end-to-end
- [ ] Verify session persistence across page reloads
- [ ] Confirm RBAC: only authorized Zitadel users can access the UI
### 6. Real data in UI
- [ ] Replace mock device list with live `device-info` KV data
- [ ] Replace mock deployment list with live `Deployment` CR data
- [ ] Wire per-device drilldown to real `DeviceInfo` + last-heartbeat + agent version
- [ ] NATS tail panel: SSE stream of `device-info` and `device-state` updates (plain text)
- [ ] Verify data refreshes without manual reload
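For the NATS tail panel, each KV update has to be framed as a Server-Sent Events message. A minimal sketch of that framing, assuming the bucket name (`device-info` / `device-state`) is used as the SSE event name and the payload is forwarded as plain text; `sse_frame` is a hypothetical helper, not existing operator code:

```rust
/// Frame one update as an SSE message: an `event:` line, one `data:` line
/// per payload line (an SSE field cannot span lines), and a blank line that
/// tells the browser's `EventSource` to dispatch the event.
fn sse_frame(event: &str, payload: &str) -> String {
    let mut out = format!("event: {event}\n");
    for line in payload.split('\n') {
        // Drop a trailing CR so `\r\n` payloads don't inject a stray line break.
        let line = line.strip_suffix('\r').unwrap_or(line);
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n');
    out
}

fn main() {
    let frame = sse_frame("device-info", "{\"device_id\":\"rpi-042\"}");
    print!("{frame}");
}
```

On the browser side, `EventSource` rejoins multiple `data:` lines with `\n`, so multi-line payloads survive the round trip.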
## Configuration model
### Environment (minimal, committed in private repo)
```bash
OPENBAO_URL=https://secrets-stg.cb1.nationtech.io
HARMONY_CONFIG_NAMESPACE=fleet-staging
# SSO auth or root token (SSO is the goal)
```
### OpenBao (read via ConfigClient)
- `FleetDeployConfig` (k8s namespaces, NATS URL, chart coords) at `secret/data/fleet-staging/FleetDeployConfig`
- `FleetDeploySecrets` (operator creds) at `secret/data/fleet-staging/FleetDeploySecrets`
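Both entries live in a KV v2 mount, where the HTTP API path includes a `/data/` segment that the `bao kv` CLI hides. A sketch of how the two environment variables resolve to read URLs, assuming the `secret` mount and the `<namespace>/<StructName>` layout listed above (`kv2_data_path` and `kv2_read_url` are hypothetical helpers, not `ConfigClient` internals):

```rust
use std::env;

/// KV v2 stores data under `<mount>/data/<path>`; `bao kv get` adds the
/// `/data/` segment for you, the HTTP API does not.
fn kv2_data_path(mount: &str, namespace: &str, name: &str) -> String {
    format!("{mount}/data/{namespace}/{name}")
}

/// Full URL for a KV v2 read (`GET /v1/<mount>/data/<path>`).
fn kv2_read_url(base_url: &str, mount: &str, namespace: &str, name: &str) -> String {
    format!(
        "{}/v1/{}",
        base_url.trim_end_matches('/'),
        kv2_data_path(mount, namespace, name)
    )
}

fn main() {
    // Fall back to the staging values from the env block above.
    let base = env::var("OPENBAO_URL")
        .unwrap_or_else(|_| "https://secrets-stg.cb1.nationtech.io".to_string());
    let ns = env::var("HARMONY_CONFIG_NAMESPACE").unwrap_or_else(|_| "fleet-staging".to_string());
    for name in ["FleetDeployConfig", "FleetDeploySecrets"] {
        println!("{}", kv2_read_url(&base, "secret", &ns, name));
    }
}
```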
## Missing features (post-UI)
### Auth & credentials
- [ ] Per-device OpenBao policies (templated policies, one role per device type)
- [ ] Device identity claim in JWT (Zitadel `client_id` with `device-` prefix)
- [ ] OpenBao JWT auth role granularity (extend `OpenbaoJwtAuth` to a list of roles)
- [x] Move k8s namespaces + chart coords into `ConfigClient` config struct (env = only identifier + auth)
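The staging callout is already configured with `device_id_claim = "client_id"` and `device_id_prefix_strip = "device-"`. A sketch of the claim-to-device-id step those settings imply. Rejecting a missing prefix and validating the remainder as a single NATS subject token are assumptions of this sketch, not confirmed callout behavior; `device_id_from_claim` is hypothetical:

```rust
/// Extract a device id from a JWT claim value (e.g. Zitadel `client_id`)
/// by stripping a required prefix such as `device-`.
///
/// Returns `None` when the prefix is absent (an admin or operator principal,
/// not a device), when nothing follows it, or when the remainder could not
/// be used as one NATS subject token (no `.`, `*`, `>` or whitespace).
fn device_id_from_claim<'a>(claim: &'a str, prefix: &str) -> Option<&'a str> {
    let id = claim.strip_prefix(prefix)?;
    let valid = !id.is_empty()
        && !id
            .chars()
            .any(|c| matches!(c, '.' | '*' | '>') || c.is_whitespace());
    valid.then_some(id)
}

fn main() {
    for claim in ["device-rpi-042", "fleet-operator", "device-a.b"] {
        println!("{claim} -> {:?}", device_id_from_claim(claim, "device-"));
    }
}
```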
### Operator capabilities
- [ ] Agent upgrade path (ADR-022 exists; implementation pending)
- [ ] Device enrollment flow (operator-facing runbook)
- [ ] Revoke device / rotate key operations
- [ ] Fleet-wide rollout strategies (canary, %-based) on top of agent-upgrade primitive
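One common way to layer %-based rollouts on a per-device upgrade primitive is deterministic hash bucketing, so raising the percentage only ever adds devices. A sketch under that assumption; FNV-1a and the `rollout_id/device_id` key are choices made here, not part of ADR-022, and all names are hypothetical:

```rust
/// 64-bit FNV-1a: tiny, dependency-free, and stable across builds, unlike
/// `std::collections::hash_map::DefaultHasher`, whose algorithm is unspecified.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Deterministic bucket in 0..100 for a device within one rollout.
/// Salting with the rollout id changes which devices go first per rollout.
fn bucket(rollout_id: &str, device_id: &str) -> u64 {
    fnv1a64(format!("{rollout_id}/{device_id}").as_bytes()) % 100
}

/// A device is included once `percent` covers its bucket; a device included
/// at 10% is therefore still included at 50%.
fn in_rollout(rollout_id: &str, device_id: &str, percent: u8) -> bool {
    bucket(rollout_id, device_id) < u64::from(percent.min(100))
}

fn main() {
    for device in ["rpi-001", "rpi-002", "rpi-003"] {
        println!(
            "{device}: bucket {} in 25%: {}",
            bucket("agent-v0.4", device),
            in_rollout("agent-v0.4", device, 25)
        );
    }
}
```

A canary is then just a small `percent` (or an explicit device list) run before widening.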
### Observability
- [ ] Operator logs every CR it acquires (verify output reads well)
- [ ] NATS debugging one-liners in hand-off menu
- [ ] Journald log streaming (currently only `.status.aggregate.lastError`)
- [ ] Metrics dashboard (deferred until >100 devices)
### Quality & hardening
- [ ] Agent config-driven labels (`[labels]` in agent toml → DeviceInfo)
- [ ] `matchExpressions` in selectors (currently `matchLabels` only)
- [ ] `Device.status.conditions` populated from heartbeat staleness
- [ ] Operator graceful degradation on bad device_id (log + skip, don't restart-loop)
- [ ] Persist `nats_auth_pass` and the issuer NKey via `harmony_secret` (both are regenerated on every run today, a footgun)
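Adding `matchExpressions` mostly means implementing the four Kubernetes label-selector operators. A minimal sketch of those semantics over a device's label map; `Op`, `Requirement`, and the helpers are hypothetical types, not the operator's selector structs:

```rust
use std::collections::BTreeMap;

/// Operators of a Kubernetes `matchExpressions` requirement.
enum Op {
    In,
    NotIn,
    Exists,
    DoesNotExist,
}

struct Requirement {
    key: String,
    op: Op,
    values: Vec<String>,
}

fn req(key: &str, op: Op, values: &[&str]) -> Requirement {
    Requirement {
        key: key.to_string(),
        op,
        values: values.iter().map(|v| v.to_string()).collect(),
    }
}

fn label_map(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Kubernetes semantics: `NotIn` is also satisfied when the key is absent.
fn requirement_matches(r: &Requirement, labels: &BTreeMap<String, String>) -> bool {
    match r.op {
        Op::In => labels.get(&r.key).is_some_and(|v| r.values.contains(v)),
        Op::NotIn => labels.get(&r.key).map_or(true, |v| !r.values.contains(v)),
        Op::Exists => labels.contains_key(&r.key),
        Op::DoesNotExist => !labels.contains_key(&r.key),
    }
}

/// `matchLabels` and `matchExpressions` are ANDed; an empty selector matches all.
fn selector_matches(
    match_labels: &BTreeMap<String, String>,
    match_expressions: &[Requirement],
    labels: &BTreeMap<String, String>,
) -> bool {
    match_labels.iter().all(|(k, v)| labels.get(k) == Some(v))
        && match_expressions.iter().all(|r| requirement_matches(r, labels))
}

fn main() {
    let device = label_map(&[("arch", "arm64"), ("site", "mtl")]);
    let selected = selector_matches(
        &label_map(&[("site", "mtl")]),
        &[req("arch", Op::In, &["arm64", "amd64"]), req("gpu", Op::DoesNotExist, &[])],
        &device,
    );
    println!("selected: {selected}");
}
```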
### Refactors (deferred, non-blocking)
- [ ] Decompose `FleetServerScore` into independent, ConfigClient-glued Scores
- [ ] Move `harmony/modules/fleet/` → `fleet/harmony-fleet/` (ADR-021 pending)
- [ ] Delete `examples/fleet_staging_deploy` (superseded by `fleet_staging_install`)
- [ ] Drop `K8sAnywhereTopology` for ad-hoc Score execution; introduce `K8sBareTopology`
## Principles (carried forward)
- No yaml in framework code paths
- Scores describe desired state; topologies expose capabilities
- Cross-boundary wire types in `harmony-reconciler-contracts`
- Never ship untested code
- Prove claims about upstream before blaming upstream
- Design the brick before moving the brick

@@ -13,6 +13,8 @@ path = "src/main.rs"
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_config = { path = "../../harmony_config" }
harmony_secret = { path = "../../harmony_secret" }
harmony-k8s = { path = "../../harmony-k8s" }
harmony-nats-callout = { path = "../../nats/callout" }
harmony-fleet-deploy = { path = "../../fleet/harmony-fleet-deploy" }
@@ -21,5 +23,6 @@ rand = "0.9"
anyhow.workspace = true
clap = { version = "4", features = ["derive", "env"] }
tokio.workspace = true
log.workspace = true
env_logger.workspace = true
tracing = { workspace = true }
serde = { workspace = true }
schemars = "0.8"

@@ -1,32 +1,19 @@
//! Production-shape fleet install for OKD (or any cluster with the
//! same capabilities). Composes:
//! Production-shape fleet install for OKD (or any cluster with the same
//! capabilities): Zitadel SSO + NATS (auth-callout) + operator + OpenBao,
//! composed from Scores.
//!
//! 1. Zitadel + Postgres helm install in `--zitadel-namespace`,
//! edge-TLS Route at `sso-staging.<base>` via cert-manager.
//! 2. ZitadelSetupScore in the same call so we have the
//! `fleet-operator` machine key BEFORE the operator pod starts.
//! 3. Single-instance NATS (JetStream) in `--fleet-namespace` with
//! the auth_callout block wired to the callout's issuer NKey
//! pubkey + WebSocket listener (no_tls — Route owns TLS).
//! 4. NATS WebSocket Route at `nats-fleet-staging.<base>`,
//! edge-TLS, cert-manager-managed cert.
//! 5. NatsAuthCalloutScore deployment (Secret-based env vars only,
//! no volume mounts — OKD restricted-v2 SCC compat).
//! 6. FleetOperatorScore with credentials TOML inlining the
//! `fleet-operator` JSON keyfile (env-var-from-Secret only).
//! Tunables come from [`ConfigClient`] (`HARMONY_CONFIG_FleetStagingConfig`
//! env JSON → OpenBao → interactive prompt), not a bespoke CLI. The only
//! flags are `harmony_cli`'s: `--filter`/`--list`/`-y` select which workload
//! Scores to (re)deploy — e.g. `--filter FleetOperatorScore` bumps the
//! operator without touching NATS or the callout.
//!
//! One required CLI flag — `--base-domain` — drives every public
//! hostname. Per-cluster overrides for the cluster issuer name and
//! image refs follow.
//!
//! Usage:
//!
//! ```text
//! KUBECONFIG=$ADMIN_KUBECONFIG cargo run -p example_fleet_staging_install -- \
//! --base-domain cb1.nationtech.io \
//! --operator-image hub.nationtech.io/harmony/harmony-fleet-operator:dev \
//! --callout-image hub.nationtech.io/harmony/harmony-nats-callout:dev
//! ```
//! Zitadel + OpenBao are an idempotent bootstrap: ZitadelSetupScore mints the
//! `project_id` + `fleet-operator` machine key that the callout and operator
//! Scores consume, so it must converge (and cache to disk) before they're
//! built. That data flow is why those two can't sit in the filterable batch.
use std::sync::Arc;
use anyhow::{Context, Result};
use clap::Parser;
@@ -34,390 +21,349 @@ use harmony::inventory::Inventory;
use harmony::modules::nats::capability::NatsCluster;
use harmony::modules::nats::score_nats_k8s::{AuthCalloutCfg, NatsK8sScore, WebSocketRouteCfg};
use harmony::modules::nats_auth_callout::NatsAuthCalloutScore;
use harmony::modules::openbao::{
OpenbaoInstance, OpenbaoPolicy, OpenbaoScore, OpenbaoSetupScore, cached_root_token,
};
use harmony::modules::zitadel::{
MachineKeyType, ZitadelApiApp, ZitadelAppType, ZitadelApplication, ZitadelClientConfig,
ZitadelMachineUser, ZitadelRole, ZitadelScore, ZitadelSetupScore,
};
use harmony::score::Score;
use harmony::topology::{K8sAnywhereTopology, Topology};
use harmony_fleet_deploy::{FleetOperatorScore, OperatorCredentials};
use harmony::topology::{K8sAnywhereTopology, K8sclient, Topology};
use harmony_config::{Config, ConfigClient, StoreSource};
use harmony_fleet_deploy::{FleetDeploySecrets, FleetOperatorScore, OperatorCredentials};
use harmony_k8s::KubernetesDistribution;
use harmony_secret::OpenbaoSecretStore;
use nkeys::KeyPair;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use tracing::info;
#[derive(Parser, Debug)]
#[command(
name = "fleet_staging_install",
about = "Install fleet staging stack (Zitadel + NATS + callout + operator) on OKD"
)]
struct Cli {
/// Cluster's public base domain. Hostnames are derived from it:
/// sso-staging.<base> ← Zitadel
/// nats-fleet-staging.<base> ← NATS WebSocket
///
/// To deploy on a different cluster, change this and re-run.
#[arg(long)]
/// Non-secret install tunables. `base_domain` drives every public hostname;
/// the image refs and `*-stg.<base>` hosts have no safe default, so an empty
/// value is rejected at startup. Everything else defaults to the staging
/// conventions.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Config)]
#[serde(default)]
struct FleetStagingConfig {
base_domain: String,
/// cert-manager `ClusterIssuer` name. Drives the
/// `cert-manager.io/cluster-issuer` annotation on the Zitadel
/// and NATS Routes. Override per cluster if your operator uses
/// a different issuer name.
#[arg(long, default_value = "letsencrypt-prod")]
cluster_issuer: String,
/// Namespace for NATS, callout, operator.
#[arg(long, default_value = "fleet-staging")]
fleet_namespace: String,
/// Namespace for Zitadel + Postgres.
#[arg(long, default_value = "zitadel-staging")]
zitadel_namespace: String,
/// Operator container image (`repository:tag`). Public on
/// hub.nationtech.io for the demo; ImagePullSecret for that
/// registry must already be present in `--fleet-namespace`.
#[arg(long)]
operator_image: String,
/// Auth callout container image (`repository:tag`).
#[arg(long)]
callout_image: String,
/// NATS account name auth-callout-issued users land in. Must
/// match the NATS Helm `auth_callout.account` field. Default
/// `FLEET` matches the rest of the staging conventions.
#[arg(long, default_value = "FLEET")]
cluster_issuer: String,
fleet_namespace: String,
zitadel_namespace: String,
nats_account: String,
/// Zitadel chart version pin.
#[arg(long, default_value = "v4.12.1")]
zitadel_version: String,
/// Project name created inside Zitadel for fleet auth.
#[arg(long, default_value = "fleet")]
project_name: String,
/// Role name granting full admin (operator + manual ops). The
/// callout maps this role to `pub/sub: [">"]`.
#[arg(long, default_value = "fleet-admin")]
admin_role: String,
/// Role name granting per-device scoped permissions.
#[arg(long, default_value = "device")]
device_role: String,
/// Username of the operator's Zitadel machine user. Distinct
/// from `fleet-ops` (manual admin tooling) for audit trail.
#[arg(long, default_value = "fleet-operator")]
operator_username: String,
/// Username of the manual-admin Zitadel machine user (the one
/// you mint tokens with from your laptop).
#[arg(long, default_value = "fleet-ops")]
admin_username: String,
}
impl Default for FleetStagingConfig {
fn default() -> Self {
Self {
base_domain: String::new(),
operator_image: String::new(),
callout_image: String::new(),
cluster_issuer: "letsencrypt-prod".to_string(),
fleet_namespace: "fleet-staging".to_string(),
zitadel_namespace: "zitadel-staging".to_string(),
nats_account: "FLEET".to_string(),
zitadel_version: "v4.12.1".to_string(),
project_name: "fleet".to_string(),
admin_role: "fleet-admin".to_string(),
device_role: "device".to_string(),
operator_username: "fleet-operator".to_string(),
admin_username: "fleet-ops".to_string(),
}
}
}
#[tokio::main]
async fn main() -> Result<()> {
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info"))
.try_init()
.ok();
harmony_cli::cli_logger::init();
let args = harmony_cli::Args::parse();
let cfg: FleetStagingConfig = ConfigClient::for_namespace("harmony")
.await
.get_or_prompt()
.await
.context("loading FleetStagingConfig")?;
anyhow::ensure!(
!cfg.base_domain.is_empty()
&& !cfg.operator_image.is_empty()
&& !cfg.callout_image.is_empty(),
"base_domain, operator_image and callout_image must be set"
);
let cli = Cli::parse();
let topology = K8sAnywhereTopology::from_env();
topology.ensure_ready().await?;
let zitadel_host = format!("sso-stg.{}", cli.base_domain);
let nats_ws_host = format!("nats-fleet-stg.{}", cli.base_domain);
let zitadel_host = format!("sso-stg.{}", cfg.base_domain);
let nats_ws_host = format!("nats-fleet-stg.{}", cfg.base_domain);
let secrets_host = format!("secrets-stg.{}", cfg.base_domain);
let nats_release = "fleet-nats";
let cli_app_name = "harmony-cli";
// ---- 1. Zitadel helm install ----------------------------------------
let zitadel = ZitadelScore {
// ---- Bootstrap (idempotent): Zitadel mints the project + operator key ---
ZitadelScore {
host: zitadel_host.clone(),
zitadel_version: cli.zitadel_version.clone(),
zitadel_version: cfg.zitadel_version.clone(),
external_secure: true,
external_port: None,
namespace: cli.zitadel_namespace.clone(),
cluster_issuer: cli.cluster_issuer.clone(),
namespace: cfg.zitadel_namespace.clone(),
cluster_issuer: cfg.cluster_issuer.clone(),
..Default::default()
};
log::info!(
"[1/6] Zitadel helm: ns={} host={}",
cli.zitadel_namespace,
zitadel_host
);
zitadel
.interpret(&Inventory::empty(), &topology)
.await
.context("Zitadel helm install")?;
}
.interpret(&Inventory::empty(), &topology)
.await
.context("Zitadel helm install")?;
// ---- 2. ZitadelSetupScore: project + roles + machine users ----------
// Run this BEFORE building the operator score so we have the
// `fleet-operator` machine key in hand when filling
// OperatorCredentials. The Score caches keys to
// ZitadelClientConfig on disk; we read them back here.
log::info!(
"[2/6] Zitadel setup: project={} admin={} operator={}",
cli.project_name,
cli.admin_username,
cli.operator_username
);
let api_app_name = "nats";
let cli_app_name = "harmony-cli";
let zitadel_setup = ZitadelSetupScore {
ZitadelSetupScore {
host: zitadel_host.clone(),
scheme: Default::default(),
port: None,
skip_tls: false,
endpoint: None,
admin_org_id: None,
namespace: cli.zitadel_namespace.clone(),
namespace: cfg.zitadel_namespace.clone(),
// Device-code OIDC app for human admin login from
// `fleet_device_enroll`'s SSO flow. Operators sign in here
// with their personal Zitadel account; their resulting
// access token is what `mint_device_credentials` uses to
// create per-device users + keys. The numeric `client_id`
// generated by Zitadel for this app is what gets passed to
// `--admin-oidc-client-id`; we read it back from the
// ZitadelClientConfig cache below and print it in the
// success banner.
// `fleet_device_enroll`'s SSO flow. The numeric `client_id` Zitadel
// generates is read back below and printed for `--admin-oidc-client-id`.
applications: vec![ZitadelApplication {
project_name: cli.project_name.clone(),
project_name: cfg.project_name.clone(),
app_name: cli_app_name.to_string(),
app_type: ZitadelAppType::DeviceCode,
}],
api_apps: vec![ZitadelApiApp {
project_name: cli.project_name.clone(),
app_name: api_app_name.to_string(),
project_name: cfg.project_name.clone(),
app_name: "nats".to_string(),
}],
roles: vec![
ZitadelRole {
project_name: cli.project_name.clone(),
key: cli.admin_role.clone(),
project_name: cfg.project_name.clone(),
key: cfg.admin_role.clone(),
display_name: "Fleet Admin".to_string(),
group: None,
},
ZitadelRole {
project_name: cli.project_name.clone(),
key: cli.device_role.clone(),
project_name: cfg.project_name.clone(),
key: cfg.device_role.clone(),
display_name: "Device".to_string(),
group: None,
},
],
machine_users: vec![
ZitadelMachineUser {
username: cli.admin_username.clone(),
username: cfg.admin_username.clone(),
name: "Fleet Operations".to_string(),
create_pat: false,
machine_key: Some(MachineKeyType::Json),
project_name: Some(cli.project_name.clone()),
grant_roles: vec![cli.admin_role.clone()],
project_name: Some(cfg.project_name.clone()),
grant_roles: vec![cfg.admin_role.clone()],
},
ZitadelMachineUser {
username: cli.operator_username.clone(),
username: cfg.operator_username.clone(),
name: "Fleet Operator (in-cluster)".to_string(),
create_pat: false,
machine_key: Some(MachineKeyType::Json),
project_name: Some(cli.project_name.clone()),
grant_roles: vec![cli.admin_role.clone()],
project_name: Some(cfg.project_name.clone()),
grant_roles: vec![cfg.admin_role.clone()],
},
],
};
zitadel_setup
.interpret(&Inventory::empty(), &topology)
.await
.context("Zitadel setup (project + roles + machine users)")?;
}
.interpret(&Inventory::empty(), &topology)
.await
.context("Zitadel setup (project + roles + machine users)")?;
// Read back the project_id + operator key from cache.
// Read back the project_id + operator key + device-code client_id.
let zcfg = ZitadelClientConfig::load()
.context("ZitadelSetupScore did not produce a client config cache")?;
let project_id = zcfg
.project_id_by_name(&cli.project_name)
.project_id_by_name(&cfg.project_name)
.or(zcfg.project_id.as_ref())
.context("project_id missing from cache after setup")?
.clone();
let operator_machine_key = zcfg
.machine_key(&cli.operator_username)
.machine_key(&cfg.operator_username)
.with_context(|| {
format!(
"machine key for {} missing from cache after setup",
cli.operator_username
"machine key for {} missing from cache",
cfg.operator_username
)
})?
.clone();
let cli_client_id = zcfg
.client_id(cli_app_name)
.with_context(|| {
format!(
"OIDC client_id for app '{cli_app_name}' missing from cache — \
ZitadelSetupScore should have created the app and populated \
ZitadelClientConfig.apps"
)
})?
.with_context(|| format!("OIDC client_id for app '{cli_app_name}' missing from cache"))?
.clone();
log::info!("[2/6] project_id resolved: {project_id}");
log::info!("[2/6] device-code client_id for '{cli_app_name}' resolved: {cli_client_id}");
// ---- 3. Issuer NKey + auth callout pieces ---------------------------
// The callout signs user JWTs with this account NKey. NATS server
// is configured with the matching pubkey via the auth_callout
// block in the helm values rendered by NatsK8sScore.
// ---- OpenBao: deploy + policy, co-located in the fleet namespace --------
// The operator's credentials are seeded here so a later
// `harmony-fleet-deploy --from-tag <tag>` upgrades the operator alone.
let openbao = OpenbaoInstance {
namespace: cfg.fleet_namespace.clone(),
release: "openbao".to_string(),
};
OpenbaoScore {
instance: openbao.clone(),
host: secrets_host.clone(),
openshift: true,
tls_issuer: Some(cfg.cluster_issuer.clone()),
}
.interpret(&Inventory::empty(), &topology)
.await
.context("OpenBao deploy")?;
OpenbaoSetupScore {
instance: openbao.clone(),
policies: vec![OpenbaoPolicy {
name: "fleet-deployer".to_string(),
hcl: r#"path "secret/data/harmony/*" { capabilities = ["read"] }
path "secret/metadata/harmony/*" { capabilities = ["list","read"] }"#
.to_string(),
}],
..Default::default()
}
.interpret(&Inventory::empty(), &topology)
.await
.context("OpenBao setup")?;
// ---- Workload Scores: filterable via `harmony_cli::Args` ----------------
// The callout signs user JWTs with this account NKey; NATS is configured
// with the matching pubkey via the auth_callout block in its helm values.
let issuer_kp = KeyPair::new_account();
let issuer_seed = issuer_kp
.seed()
.map_err(|e| anyhow::anyhow!("issuer NKey seed: {e}"))?;
let issuer_pubkey = issuer_kp.public_key();
let nats_auth_user = "auth";
let nats_auth_pass = generate_alphanum(24);
// ---- 4. NATS install ------------------------------------------------
let nats_release = "fleet-nats";
log::info!(
"[3/6] NATS install: ns={} release={} ws={}",
cli.fleet_namespace,
nats_release,
nats_ws_host
let nats_url = format!(
"nats://{nats_release}.{}.svc.cluster.local:4222",
cfg.fleet_namespace
);
let nats_cluster = NatsCluster {
namespace: cli.fleet_namespace.clone(),
// `domain` is unused in single-instance mode (gateway off).
// Kept here for the legacy supercluster code path which the
// staging install doesn't take.
domain: cli.base_domain.clone(),
replicas: 1,
name: nats_release.to_string(),
gateway_advertise: String::new(),
dns_name: nats_ws_host.clone(),
// Static-string fields the NatsCluster shape requires; only
// referenced when `gateway` is Some, which it isn't here.
supercluster_ca_secret_name: "fleet-nats-supercluster-ca",
tls_cert_name: "fleet-nats-tls",
jetstream_enabled: "true",
};
let nats = NatsK8sScore {
distribution: KubernetesDistribution::OpenshiftFamily,
cluster: nats_cluster,
cluster: NatsCluster {
namespace: cfg.fleet_namespace.clone(),
// `domain` and the static-string fields below are only read in the
// supercluster path (gateway Some), which staging doesn't take.
domain: cfg.base_domain.clone(),
replicas: 1,
name: nats_release.to_string(),
gateway_advertise: String::new(),
dns_name: nats_ws_host.clone(),
supercluster_ca_secret_name: "fleet-nats-supercluster-ca",
tls_cert_name: "fleet-nats-tls",
jetstream_enabled: "true",
},
peers: None,
ca_bundle: None,
gateway: None, // single-instance — drop the gateway block
gateway: None,
auth_callout: Some(AuthCalloutCfg {
issuer_pubkey: issuer_pubkey.clone(),
issuer_pubkey: issuer_kp.public_key(),
auth_user: nats_auth_user.to_string(),
auth_pass: nats_auth_pass.clone(),
account: cli.nats_account.clone(),
account: cfg.nats_account.clone(),
}),
websocket: Some(WebSocketRouteCfg {
host: nats_ws_host.clone(),
cluster_issuer: cli.cluster_issuer.clone(),
cluster_issuer: cfg.cluster_issuer.clone(),
}),
};
nats.interpret(&Inventory::empty(), &topology)
.await
.context("NATS install (single-instance + auth_callout + WS Route)")?;
// ---- 5. Auth callout deployment -------------------------------------
log::info!(
"[4/6] Auth callout: image={} project_id={}",
cli.callout_image,
project_id
);
let mut callout = NatsAuthCalloutScore::new(
"fleet-callout",
&cli.fleet_namespace,
format!(
"nats://{nats_release}.{}.svc.cluster.local:4222",
cli.fleet_namespace
),
&cfg.fleet_namespace,
nats_url.clone(),
format!("https://{zitadel_host}"),
project_id.clone(),
nats_auth_user,
&nats_auth_pass,
&issuer_seed,
)
.image(&cli.callout_image)
.target_account(&cli.nats_account)
.admin_role(&cli.admin_role)
.device_role(&cli.device_role)
.image(&cfg.callout_image)
.target_account(&cfg.nats_account)
.admin_role(&cfg.admin_role)
.device_role(&cfg.device_role)
.danger_accept_invalid_certs(false);
callout.device_id_claim = "client_id".to_string();
callout.device_id_prefix_strip = "device-".to_string();
callout.roles_claim = format!("urn:zitadel:iam:org:project:{project_id}:roles");
callout
.interpret(&Inventory::empty(), &topology)
.await
.context("auth callout deploy")?;
// ---- 6. Operator deployment with credentials ------------------------
log::info!("[5/6] Operator: image={}", cli.operator_image);
// `key_json` MUST use TOML literal multi-line strings (`'''...'''`),
// not basic multi-line (`"""..."""`). Basic strings interpret
// backslash escapes, which corrupts the JSON keyfile: every `\n`
// inside the embedded RSA private key gets expanded to a literal
// newline (0x0A) before JSON parsing sees it, and JSON disallows
// raw control chars inside strings ("control character found while
// parsing a string"). Literal strings preserve `\n` as-is so the
// downstream JSON parser interprets it as an escape and decodes
// the multi-line PEM correctly.
let credentials_toml = format!(
r#"type = "zitadel-jwt"
oidc_issuer_url = "https://{zitadel_host}"
audience = "{project_id}"
key_json = '''{operator_key}'''
"#,
zitadel_host = zitadel_host,
project_id = project_id,
operator_key = operator_machine_key,
let credentials = OperatorCredentials::zitadel_jwt(
&format!("https://{zitadel_host}"),
&project_id,
&operator_machine_key,
);
let mut operator = FleetOperatorScore::new()
.namespace(&cli.fleet_namespace)
.namespace(&cfg.fleet_namespace)
.release_name("harmony-fleet-operator")
.image(&cli.operator_image)
.image(&cfg.operator_image)
.image_pull_policy("Always")
.nats_url(format!(
"nats://{nats_release}.{}.svc.cluster.local:4222",
cli.fleet_namespace
))
.nats_url(nats_url.clone())
.log_level("info,kube_runtime=warn");
operator.credentials = Some(OperatorCredentials { credentials_toml });
operator
.interpret(&Inventory::empty(), &topology)
.await
.context("operator deploy")?;
operator.credentials = Some(credentials.clone());
log::info!("[6/6] Stack installed.");
println!("\n=== fleet-staging install complete ===");
println!("Zitadel: https://{zitadel_host}/");
println!("NATS WS public: wss://{nats_ws_host}/");
println!(
"NATS in-cluster: nats://{nats_release}.{}.svc.cluster.local:4222",
cli.fleet_namespace
let scores: Vec<Box<dyn Score<K8sAnywhereTopology>>> =
vec![Box::new(nats), Box::new(callout), Box::new(operator)];
harmony_cli::run(Inventory::empty(), topology.clone(), scores, Some(args))
.await
.map_err(|e| anyhow::anyhow!("{e}"))?;
// ---- Seed operator credentials as FleetDeploySecrets --------------------
// Reached via port-forward with the cached root token, so it doesn't wait
// on the public route/cert. No kubeconfig — CD callers use their own context.
let k8s = topology
.k8s_client()
.await
.map_err(|e| anyhow::anyhow!(e))?;
let pf = k8s
.port_forward(&openbao.pod(), &openbao.namespace, 8200, 8200)
.await
.context("port-forward to OpenBao")?;
tokio::time::sleep(std::time::Duration::from_secs(1)).await;
let store = OpenbaoSecretStore::new(
format!("http://127.0.0.1:{}", pf.port()),
"secret".to_string(),
"token".to_string(),
true,
Some(cached_root_token(&openbao).map_err(|e| anyhow::anyhow!(e))?),
None,
None,
None,
None,
None,
None,
)
.await
.context("OpenBao client")?;
ConfigClient::new(vec![
Arc::new(StoreSource::new("harmony".to_string(), store))
as Arc<dyn harmony_config::ConfigSource>,
])
.set(&FleetDeploySecrets {
operator_credentials_toml: credentials.credentials_toml.clone(),
kubeconfig: None,
})
.await
.context("seed FleetDeploySecrets")?;
info!("=== fleet-staging install complete ===");
info!(
"Zitadel: https://{zitadel_host}/ (admin user {})",
cfg.admin_username
);
println!(
"Operator: oc -n {} get deploy/harmony-fleet-operator",
cli.fleet_namespace
);
println!(
"Auth callout: oc -n {} get deploy/fleet-callout",
cli.fleet_namespace
);
println!("Project id: {project_id}");
println!(
"Admin user: {} (machine key in ~/.local/share/harmony/zitadel/client-config.json)",
cli.admin_username
);
println!(
"Operator user: {} (machine key embedded in operator's Secret)",
cli.operator_username
);
println!("SSO client_id: {cli_client_id} (app '{cli_app_name}', device-code grant)");
println!();
println!("To enroll a device, pass the SSO client_id explicitly:");
println!(
" fleet_device_enroll \\\n \
--target ssh://<user>@<device> \\\n \
--issuer-url https://{zitadel_host} \\\n \
--audience {project_id} \\\n \
--nats-url wss://{nats_ws_host} \\\n \
--admin-oidc-client-id {cli_client_id} \\\n \
info!("NATS WS public: wss://{nats_ws_host}/");
info!("OpenBao: https://{secrets_host}/");
info!("Project id: {project_id}");
info!("SSO client_id: {cli_client_id} (app '{cli_app_name}', device-code grant)");
info!(
"Enroll a device: fleet_device_enroll --target ssh://<user>@<device> \
--issuer-url https://{zitadel_host} --audience {project_id} \
--nats-url wss://{nats_ws_host} --admin-oidc-client-id {cli_client_id} \
--agent-binary <path>"
);
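The `key_json` comment in the operator-deployment step above explains why the keyfile must sit in a TOML multi-line literal string (`'''...'''`): literal strings do no escape processing, so the keyfile's `\n` sequences reach the JSON parser intact. The one input a literal string cannot hold is `'''` itself. A sketch of a builder that enforces this, mirroring the `format!` shown there (`toml_multiline_literal` and `credentials_toml` are hypothetical; `OperatorCredentials::zitadel_jwt` may be implemented differently):

```rust
/// Render `value` as a TOML multi-line literal string (`'''...'''`).
/// No escape processing happens inside, so a JSON keyfile's `\n` escapes
/// reach the JSON parser unchanged. Three consecutive single quotes cannot
/// appear inside a literal string, so that input is rejected.
fn toml_multiline_literal(value: &str) -> Result<String, String> {
    if value.contains("'''") {
        return Err("value contains ''' and cannot be a TOML literal string".to_string());
    }
    Ok(format!("'''{value}'''"))
}

/// Same shape as the credentials TOML in the install example. `issuer` and
/// `audience` go into basic strings unescaped, which is fine for URLs and
/// numeric project ids but not for arbitrary text.
fn credentials_toml(issuer: &str, audience: &str, key_json: &str) -> Result<String, String> {
    Ok(format!(
        "type = \"zitadel-jwt\"\noidc_issuer_url = \"{issuer}\"\naudience = \"{audience}\"\nkey_json = {}\n",
        toml_multiline_literal(key_json)?
    ))
}

fn main() {
    // Illustrative keyfile, not a real Zitadel key.
    let key = r#"{"key":"-----BEGIN RSA PRIVATE KEY-----\nMIIE\n"}"#;
    match credentials_toml("https://sso-stg.example.com", "123456", key) {
        Ok(toml) => print!("{toml}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

One more TOML rule to keep in mind: a newline immediately after the opening `'''` is trimmed, which is harmless here because the keyfile JSON starts with `{`.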

@@ -8,7 +8,10 @@ license.workspace = true
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_macros = { path = "../../harmony_macros" }
harmony_types = { path = "../../harmony_types" }
harmony_config = { path = "../../harmony_config" }
tokio.workspace = true
url.workspace = true
anyhow.workspace = true
clap = { version = "4", features = ["derive"] }
serde = { workspace = true }
schemars = "0.8"
tracing = { workspace = true }

@@ -1,7 +1,36 @@
To install an openbao instance with harmony simply `cargo run -p example-openbao` .
# example-openbao
Depending on your environement configuration, it will either install a k3d cluster locally and deploy on it, or install to a remote cluster.
Installs a standalone OpenBao instance and makes it immediately usable as a
`harmony_config` store: deploy → init → unseal → KV v2. Depending on your
environment it either spins up a local k3d cluster or targets the remote
cluster `KUBECONFIG` points at.
Then follow the openbao documentation to initialize and unseal, this will make openbao usable.
Configuration comes from `ConfigClient` (`HARMONY_CONFIG_OpenbaoInstallConfig`
env JSON → OpenBao → interactive prompt). The only required field is `host`.
https://openbao.org/docs/platform/k8s/helm/run/
```bash
# Non-interactive: provide the config as JSON.
export HARMONY_CONFIG_OpenbaoInstallConfig='{
"host": "secrets-stg.cb1.nationtech.io",
"namespace": "openbao",
"release": "openbao",
"openshift": true,
"tls_issuer": "letsencrypt-prod"
}'
cargo run -p example-openbao -- --yes
```
`cargo run -p example-openbao -- --list` lists the scores without touching the
cluster. Run without `HARMONY_CONFIG_*` to be prompted for each field.
Optional features compose from config presence:
| Config field(s) | Effect |
|---------------------------------|------------------------------------------------------------|
| `tls_issuer` | cert-manager edge TLS on the ingress (omit for plain HTTP) |
| `oidc_issuer` + `oidc_audience` | JWT auth + a `harmony` role scoped to `secret/harmony/*` |
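After it runs, point `harmony_config` at it with `OPENBAO_URL=https://<host>`
(`http://<host>` if `tls_issuer` is unset) and `OPENBAO_TOKEN=<cached root token>`.
The root token is cached per instance in
`~/.local/share/harmony/openbao/unseal-keys-<namespace>-<release>.json`
(`unseal-keys-openbao-openbao.json` with the defaults). Keys cached by earlier
versions in a shared `unseal-keys.json` are no longer read, so rename that file
before re-running against an existing instance. Once both `oidc_issuer` and
`oidc_audience` are set, SSO callers can authenticate via `HARMONY_SSO_*`
instead of the root token.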

@@ -1,22 +1,140 @@
use harmony::{
inventory::Inventory, modules::openbao::OpenbaoScore, topology::K8sAnywhereTopology,
//! Standalone OpenBao installer, configured entirely from
//! [`ConfigClient`] (`HARMONY_CONFIG_OpenbaoInstallConfig` env JSON →
//! OpenBao → interactive prompt). Deploys the chart, then initializes,
//! unseals, and enables KV v2 — so the result is immediately usable as a
//! `harmony_config` store (point `OPENBAO_URL` at the ingress host and
//! `OPENBAO_TOKEN` at the cached root token).
//!
//! Optional features compose purely from config presence — nothing is wired
//! unless its inputs are set:
//! - `tls_issuer` → cert-manager edge TLS on the ingress.
//! - `oidc_issuer` + `oidc_audience` → JWT auth method plus a `harmony`
//! role scoped to the `harmony-config` policy, letting SSO callers
//! (e.g. `harmony_config` via `HARMONY_SSO_*`) read/write
//! `secret/harmony/*` without the root token.
//!
//! This Score knows nothing about Zitadel: the OIDC issuer/audience are plain
//! config strings, so any OIDC provider works.
use anyhow::{Context, Result};
use clap::Parser;
use harmony::inventory::Inventory;
use harmony::modules::openbao::{
OpenbaoInstance, OpenbaoJwtAuth, OpenbaoPolicy, OpenbaoScore, OpenbaoSetupScore,
};
use harmony::score::Score;
use harmony::topology::K8sAnywhereTopology;
use harmony_config::{Config, ConfigClient};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use tracing::info;
/// Policy granting read/write to the `harmony_config` store path. Bound to the
/// JWT `harmony` role when OIDC is configured.
const HARMONY_CONFIG_POLICY: &str = "harmony-config";
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Config)]
#[serde(default)]
struct OpenbaoInstallConfig {
/// Public ingress host (e.g. `secrets-stg.<base-domain>`). Required.
host: String,
/// Helm release name + namespace the StatefulSet lives in.
namespace: String,
release: String,
/// OKD/OpenShift target — flips the chart's SCC-aware values.
openshift: bool,
/// cert-manager `ClusterIssuer` for edge TLS. Empty → plain HTTP ingress.
tls_issuer: String,
/// OIDC issuer URL for JWT auth (e.g. `https://sso-stg.<base>`). Empty,
/// or `oidc_audience` empty, disables JWT auth entirely.
oidc_issuer: String,
/// Expected `aud` for JWT auth (the OIDC project/client id).
oidc_audience: String,
}
impl Default for OpenbaoInstallConfig {
fn default() -> Self {
Self {
host: String::new(),
namespace: "openbao".to_string(),
release: "openbao".to_string(),
openshift: false,
tls_issuer: String::new(),
oidc_issuer: String::new(),
oidc_audience: String::new(),
}
}
}
#[tokio::main]
async fn main() {
let openbao = OpenbaoScore {
instance: Default::default(),
host: "openbao.sebastien.sto1.nationtech.io".to_string(),
openshift: false,
tls_issuer: None,
async fn main() -> Result<()> {
harmony_cli::cli_logger::init();
let args = harmony_cli::Args::parse();
let config = ConfigClient::for_namespace("harmony").await;
let cfg: OpenbaoInstallConfig = config
.get_or_prompt()
.await
.context("loading OpenbaoInstallConfig")?;
info!("Got full config {cfg:?}");
anyhow::ensure!(
!cfg.host.is_empty(),
"host must be set (e.g. secrets-stg.<base>)"
);
let instance = OpenbaoInstance {
namespace: cfg.namespace.clone(),
release: cfg.release.clone(),
};
let deploy = OpenbaoScore {
instance: instance.clone(),
host: cfg.host.clone(),
openshift: cfg.openshift,
tls_issuer: (!cfg.tls_issuer.is_empty()).then(|| cfg.tls_issuer.clone()),
};
// JWT auth composes in only when both issuer and audience are set; it
// pulls in the harmony-config policy so the role has something to grant.
let jwt_auth =
(!cfg.oidc_issuer.is_empty() && !cfg.oidc_audience.is_empty()).then(|| OpenbaoJwtAuth {
oidc_discovery_url: cfg.oidc_issuer.clone(),
bound_issuer: cfg.oidc_issuer.clone(),
role_name: "harmony".to_string(),
bound_audiences: cfg.oidc_audience.clone(),
user_claim: "sub".to_string(),
policies: vec![HARMONY_CONFIG_POLICY.to_string()],
ttl: "1h".to_string(),
max_ttl: "8h".to_string(),
});
let policies = if jwt_auth.is_some() {
vec![OpenbaoPolicy {
name: HARMONY_CONFIG_POLICY.to_string(),
hcl: r#"path "secret/data/harmony/*" { capabilities = ["create","read","update","delete"] }
path "secret/metadata/harmony/*" { capabilities = ["list","read"] }"#
.to_string(),
}]
} else {
vec![]
};
let setup = OpenbaoSetupScore {
instance,
kv_mount: "secret".to_string(),
policies,
users: vec![],
jwt_auth,
};
let scores: Vec<Box<dyn Score<K8sAnywhereTopology>>> = vec![Box::new(deploy), Box::new(setup)];
harmony_cli::run(
Inventory::autoload(),
Inventory::empty(),
K8sAnywhereTopology::from_env(),
vec![Box::new(openbao)],
None,
scores,
Some(args),
)
.await
.unwrap();
.map_err(|e| anyhow::anyhow!("{e}"))
}

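The feature gating in `main` reduces to a pure function of three strings. The sketch below is a hypothetical distillation for illustration; `OptionalInputs`, `Plan`, and `plan` are not part of the PR. It keeps the example's rules: an empty `tls_issuer` means a plain HTTP ingress, and JWT auth plus the `harmony-config` policy appear only when both OIDC fields are set.

```rust
/// Hypothetical reduction of `OpenbaoInstallConfig` to the three fields
/// that gate optional features; an empty string means "not configured".
struct OptionalInputs<'a> {
    tls_issuer: &'a str,
    oidc_issuer: &'a str,
    oidc_audience: &'a str,
}

#[derive(Debug, PartialEq)]
struct Plan {
    /// `Some(issuer)` turns on cert-manager edge TLS.
    tls_issuer: Option<String>,
    /// Policies to create; non-empty only when JWT auth is on.
    policies: Vec<&'static str>,
    /// Whether the `harmony` JWT role is created.
    jwt_auth: bool,
}

fn plan(c: &OptionalInputs<'_>) -> Plan {
    // Same idiom as the example: an empty string maps to `None`.
    let tls_issuer = (!c.tls_issuer.is_empty()).then(|| c.tls_issuer.to_string());
    // JWT auth needs BOTH issuer and audience; either one alone is ignored.
    let jwt_auth = !c.oidc_issuer.is_empty() && !c.oidc_audience.is_empty();
    // The policy exists only to back the JWT role, so it follows `jwt_auth`.
    let policies = if jwt_auth { vec!["harmony-config"] } else { vec![] };
    Plan { tls_issuer, policies, jwt_auth }
}
```

A half-configured OIDC pair, such as an issuer without an audience, silently produces no JWT auth. If that case should be an error instead, an `anyhow::ensure!` next to the existing `host` check would catch it.
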
@@ -40,5 +40,5 @@ pub use companion::AgentObservation;
pub use nats::{FleetNatsScore, UserPassCredentials};
pub use operator::{FleetOperatorScore, OperatorCredentials, PublishedChart};
pub use release::{release_operator, version_from_tag};
pub use secrets::FleetDeploySecrets;
pub use secrets::{FleetDeployConfig, FleetDeploySecrets};
pub use server::FleetServerScore;

@@ -15,7 +15,9 @@ use harmony::inventory::Inventory;
use harmony::topology::K8sAnywhereTopology;
use harmony_cli::Args as HarmonyCliArgs;
use harmony_config::ConfigClient;
use harmony_fleet_deploy::{FleetDeploySecrets, FleetOperatorScore, version_from_tag};
use harmony_fleet_deploy::{
FleetDeployConfig, FleetDeploySecrets, FleetOperatorScore, version_from_tag,
};
#[derive(Parser, Debug)]
#[command(
@@ -23,12 +25,9 @@ use harmony_fleet_deploy::{FleetDeploySecrets, FleetOperatorScore, version_from_
about = "Deploy the published harmony fleet operator chart"
)]
struct CliConfig {
#[arg(
long,
env = "HARMONY_FLEET_NAMESPACE",
default_value = "harmony-fleet-system"
)]
namespace: String,
/// Override the k8s namespace from config (e.g. `fleet-staging`).
#[arg(long, env = "HARMONY_FLEET_NAMESPACE")]
namespace: Option<String>,
/// Release tag to deploy (e.g. `harmony-fleet-operator-v0.0.2`); the
/// version is parsed from it in Rust so the workflow passes a tag and
@@ -40,22 +39,20 @@ struct CliConfig {
#[arg(long, env = "HARMONY_FLEET_OPERATOR_CHART_VERSION")]
operator_chart_version: Option<String>,
/// Override the OCI chart registry from config.
#[arg(long, env = "HARMONY_FLEET_OPERATOR_CHART_REGISTRY")]
operator_chart_registry: Option<String>,
/// Override the OCI chart project from config.
#[arg(long, env = "HARMONY_FLEET_OPERATOR_CHART_PROJECT")]
operator_chart_project: Option<String>,
/// Config namespace that `FleetDeploySecrets` (Env → OpenBao) and `FleetDeployConfig` (Env → OpenBao → prompt) resolve under.
#[arg(
long,
env = "HARMONY_FLEET_OPERATOR_CHART_REGISTRY",
default_value = "hub.nationtech.io"
env = "HARMONY_CONFIG_NAMESPACE",
default_value = "fleet-staging"
)]
operator_chart_registry: String,
#[arg(
long,
env = "HARMONY_FLEET_OPERATOR_CHART_PROJECT",
default_value = "harmony"
)]
operator_chart_project: String,
/// Config namespace `FleetDeploySecrets` resolves under (Env → OpenBao).
#[arg(long, env = "HARMONY_SECRET_NAMESPACE", default_value = "harmony")]
config_namespace: String,
#[command(flatten)]
@@ -77,12 +74,18 @@ async fn main() -> Result<()> {
let cli = CliConfig::parse();
let version = cli.chart_version()?;
let secrets: FleetDeploySecrets = ConfigClient::for_namespace(&cli.config_namespace)
.await
let config_client = ConfigClient::for_namespace(&cli.config_namespace).await;
let secrets: FleetDeploySecrets = config_client
.get()
.await
.context("loading FleetDeploySecrets (set HARMONY_CONFIG_FleetDeploySecrets or OpenBao)")?;
let config: FleetDeployConfig = config_client
.get_or_prompt()
.await
.context("loading FleetDeployConfig")?;
// Point KUBECONFIG at the scoped deployer credential before the
// topology reads it, so the runner pod needs no standing permissions.
// Held to end of scope so the tempfile outlives the deploy.
@@ -98,14 +101,19 @@ async fn main() -> Result<()> {
None => None,
};
let namespace = cli.namespace.unwrap_or(config.namespace);
let registry = cli
.operator_chart_registry
.unwrap_or(config.operator_chart_registry);
let project = cli
.operator_chart_project
.unwrap_or(config.operator_chart_project);
let operator = FleetOperatorScore::new()
.namespace(cli.namespace)
.namespace(namespace)
.nats_url(config.nats_url)
.credentials(secrets.operator_credentials_toml)
.published_chart(
cli.operator_chart_registry,
cli.operator_chart_project,
version,
);
.published_chart(registry, project, version);
harmony_cli::run(
Inventory::autoload(),

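The deploy binary now prefers an explicit CLI flag (or its env var) and falls back to the value loaded from `FleetDeployConfig`. Here is a minimal sketch of that per-field rule. `DeployConfig` and `CliOverrides` are hypothetical structs that mirror only the three overridable fields.

```rust
/// Hypothetical mirror of the overridable `FleetDeployConfig` fields.
#[derive(Debug, Clone, PartialEq)]
struct DeployConfig {
    namespace: String,
    operator_chart_registry: String,
    operator_chart_project: String,
}

/// Hypothetical mirror of the optional CLI flags; `None` means "not passed".
#[derive(Default)]
struct CliOverrides {
    namespace: Option<String>,
    operator_chart_registry: Option<String>,
    operator_chart_project: Option<String>,
}

/// Per field: the CLI flag (or its env var) wins, otherwise the config value.
fn effective(cli: CliOverrides, config: DeployConfig) -> DeployConfig {
    DeployConfig {
        namespace: cli.namespace.unwrap_or(config.namespace),
        operator_chart_registry: cli
            .operator_chart_registry
            .unwrap_or(config.operator_chart_registry),
        operator_chart_project: cli
            .operator_chart_project
            .unwrap_or(config.operator_chart_project),
    }
}
```

`--config-namespace` defaults to `fleet-staging`, so a run with no flags and no env overrides resolves the staging config. A production deploy therefore presumably needs `HARMONY_CONFIG_NAMESPACE` set explicitly.
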
@@ -102,6 +102,22 @@ pub struct OperatorCredentials {
pub credentials_toml: String,
}
impl OperatorCredentials {
/// Build the auth-callout credentials from a Zitadel machine key.
/// `key_json` is embedded verbatim in a TOML multi-line *literal* string
/// (`'''`-delimited), so it must not itself contain `'''`; see the field
/// docs for why basic strings corrupt the embedded PEM.
pub fn zitadel_jwt(oidc_issuer_url: &str, audience: &str, key_json: &str) -> Self {
Self {
credentials_toml: format!(
"type = \"zitadel-jwt\"\n\
oidc_issuer_url = \"{oidc_issuer_url}\"\n\
audience = \"{audience}\"\n\
key_json = '''{key_json}'''\n"
),
}
}
}
impl Default for ChartOptions {
fn default() -> Self {
Self {

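The following sketch shows the TOML that `OperatorCredentials::zitadel_jwt` produces, written as a free function with the hypothetical name `zitadel_jwt_toml`. It adds one guard that the PR does not have. A multi-line literal string cannot contain `'''`, so the function rejects such input instead of emitting broken TOML. The literal string exists so that the `\n` escapes inside the JSON-encoded PEM reach the JSON parser unchanged. A basic string would turn those escapes into raw newlines, which are not valid inside a JSON string.

```rust
/// Sketch of `OperatorCredentials::zitadel_jwt` plus a hypothetical guard:
/// `'''` would terminate the multi-line literal string early.
fn zitadel_jwt_toml(
    oidc_issuer_url: &str,
    audience: &str,
    key_json: &str,
) -> Result<String, String> {
    if key_json.contains("'''") {
        return Err("key_json contains ''' and cannot go in a TOML literal string".to_string());
    }
    // Literal strings do no escape processing, so the JSON's `\n` sequences
    // (and its `"` characters) pass through untouched.
    Ok(format!(
        "type = \"zitadel-jwt\"\n\
         oidc_issuer_url = \"{oidc_issuer_url}\"\n\
         audience = \"{audience}\"\n\
         key_json = '''{key_json}'''\n"
    ))
}
```

The issuer URL and audience still go into basic strings unescaped. That works for URLs and numeric client ids, but it would break if either value contained `"` or `\`.
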
@@ -1,4 +1,4 @@
//! Secrets for the published-chart (CD) operator deploy, via
//! Secrets and config for the published-chart (CD) operator deploy, via
//! [`harmony_config::ConfigClient`]. SSO-only by construction: no
//! user/pass field exists, so dev-only user/pass auth can't reach a prod
//! deploy. Resolved EnvSource → OpenBao, so the in-cluster runner pulls
@@ -24,6 +24,42 @@ pub struct FleetDeploySecrets {
pub kubeconfig: Option<String>,
}
/// Non-secret deploy config: k8s namespaces, NATS URL, and chart coordinates.
/// Resolved from the same `ConfigClient` namespace as `FleetDeploySecrets`
/// (`--config-namespace`, default `fleet-staging`).
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Config)]
pub struct FleetDeployConfig {
/// K8s namespace where the operator, NATS, and callout live.
pub namespace: String,
/// Full NATS URL the operator connects to (e.g. `nats://fleet-nats.fleet-staging:4222`).
pub nats_url: String,
/// K8s namespace where Zitadel lives (for operator UI SSO).
pub zitadel_namespace: String,
/// K8s namespace where OpenBao lives (for operator secret fetching).
pub openbao_namespace: String,
/// OCI chart registry (e.g. `hub.nationtech.io`).
pub operator_chart_registry: String,
/// OCI chart project (e.g. `harmony`).
pub operator_chart_project: String,
}
impl Default for FleetDeployConfig {
fn default() -> Self {
Self {
namespace: "fleet-staging".to_string(),
nats_url: "nats://fleet-nats.fleet-staging:4222".to_string(),
zitadel_namespace: "zitadel-staging".to_string(),
openbao_namespace: "openbao-staging".to_string(),
operator_chart_registry: "hub.nationtech.io".to_string(),
operator_chart_project: "harmony".to_string(),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
@@ -37,4 +73,17 @@ mod tests {
// Secrets must never land in cleartext SQLite.
assert_eq!(FleetDeploySecrets::CLASS, ConfigClass::Secret);
}
#[test]
fn config_class_is_not_secret() {
assert_eq!(FleetDeployConfig::CLASS, ConfigClass::Standard);
}
#[test]
fn config_defaults() {
let c = FleetDeployConfig::default();
assert_eq!(c.namespace, "fleet-staging");
assert_eq!(c.zitadel_namespace, "zitadel-staging");
assert_eq!(c.openbao_namespace, "openbao-staging");
}
}

@@ -43,7 +43,7 @@ impl<T: Topology> Maestro<T> {
/// Ensures the associated Topology is ready for operations.
/// Delegates the readiness check and potential setup actions to the Topology.
async fn prepare_topology(&mut self) -> Result<PreparationOutcome, PreparationError> {
pub async fn prepare_topology(&mut self) -> Result<PreparationOutcome, PreparationError> {
self.topology_state.prepare();
let result = self.topology.ensure_ready().await;

@@ -13,7 +13,7 @@ use crate::{
topology::{HelmCommand, K8sclient, Topology},
};
pub use setup::{OpenbaoJwtAuth, OpenbaoPolicy, OpenbaoSetupScore, OpenbaoUser};
pub use setup::{OpenbaoJwtAuth, OpenbaoPolicy, OpenbaoSetupScore, OpenbaoUser, cached_root_token};
const DEFAULT_NAMESPACE: &str = "openbao";
const DEFAULT_RELEASE: &str = "openbao";

@@ -143,8 +143,26 @@ fn keys_dir() -> PathBuf {
.unwrap_or_else(|| PathBuf::from("/tmp/harmony-openbao"))
}
fn keys_file() -> PathBuf {
keys_dir().join("unseal-keys.json")
/// Per-instance keys file. Keyed by namespace+release so multiple OpenBao
/// instances don't clobber each other's unseal keys in one shared file —
/// losing them means the instance can never be unsealed again.
fn keys_file(instance: &OpenbaoInstance) -> PathBuf {
keys_dir().join(format!(
"unseal-keys-{}-{}.json",
instance.namespace, instance.release
))
}
/// The root token from the cached unseal-keys file written at init.
/// Dev/staging convenience for callers that need to seed OpenBao right
/// after [`OpenbaoSetupScore`] runs; production uses auto-unseal and
/// wouldn't persist this.
pub fn cached_root_token(instance: &OpenbaoInstance) -> Result<String, String> {
let path = keys_file(instance);
let content = std::fs::read_to_string(&path).map_err(|e| format!("read {path:?}: {e}"))?;
let init: InitOutput =
serde_json::from_str(&content).map_err(|e| format!("parse {path:?}: {e}"))?;
Ok(init.root_token)
}
impl OpenbaoSetupInterpret {
@@ -188,7 +206,7 @@ impl OpenbaoSetupInterpret {
InterpretError::new(format!("Failed to create keys directory {:?}: {}", dir, e))
})?;
let path = keys_file();
let path = keys_file(&self.score.instance);
// Source of truth for "is this vault initialized?" is OpenBao itself,
// not a `bao status` pre-check parsed from stderr — that probe is
@@ -306,7 +324,7 @@ impl OpenbaoSetupInterpret {
}
info!("[OpenbaoSetup] Unsealing...");
let path = keys_file();
let path = keys_file(&self.score.instance);
let content = std::fs::read_to_string(&path)
.map_err(|e| InterpretError::new(format!("Failed to read keys: {e}")))?;
let init: InitOutput = serde_json::from_str(&content)

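The per-instance naming can be checked in isolation. This sketch takes the directory as a parameter because `keys_dir()` is not reproduced here. The `keys_file` below is a simplified stand-in for the PR's function and does not match its exact signature.

```rust
use std::path::{Path, PathBuf};

/// Mirrors the per-instance `keys_file(instance)` naming from this PR,
/// with the directory passed in explicitly.
fn keys_file(dir: &Path, namespace: &str, release: &str) -> PathBuf {
    dir.join(format!("unseal-keys-{namespace}-{release}.json"))
}
```

Kubernetes namespace names and Helm release names both allow `-`, so distinct pairs such as (`a-b`, `c`) and (`a`, `b-c`) map to the same file. Two fixes would rule that out: a separator that neither name allows, such as `_`, or a `<namespace>/<release>` subdirectory.
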
@@ -16,13 +16,12 @@ harmony = { path = "../harmony" }
harmony_tui = { path = "../harmony_tui", optional = true }
inquire.workspace = true
tokio.workspace = true
env_logger.workspace = true
console = "0.16.0"
indicatif = "0.18.0"
lazy_static = "1.5.0"
log.workspace = true
indicatif-log-bridge = "0.2.3"
chrono.workspace = true
tracing.workspace = true
tracing-subscriber.workspace = true
[dev-dependencies]
harmony = { path = "../harmony", features = ["testing"] }
async-trait = "0.1"

@@ -1,13 +1,11 @@
use chrono::Local;
use console::style;
use harmony::{
instrumentation::{self, HarmonyEvent},
modules::application::ApplicationFeatureStatus,
topology::TopologyStatus,
};
use log::{error, info, log_enabled};
use std::io::Write;
use std::sync::{Mutex, OnceLock};
use tracing::{error, info};
use tracing_subscriber::EnvFilter;
pub fn init() {
static INITIALIZED: OnceLock<()> = OnceLock::new();
@@ -17,68 +15,36 @@ pub fn init() {
});
}
// The framework still emits via the `log` crate; tracing-subscriber's default
// `tracing-log` bridge captures those records, so this subscriber covers both.
// Normal runs stay terse (level + message, ANSI-coloured); debug/trace adds the
// timestamp + target needed to actually debug — matching the old env_logger UX.
fn configure_logger() {
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info"))
.format(|buf, record| {
let debug_mode = log_enabled!(log::Level::Debug);
let timestamp = Local::now().format("%Y-%m-%d %H:%M:%S");
let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info"));
let verbose = std::env::var("RUST_LOG")
.map(|v| v.to_lowercase().contains("debug") || v.to_lowercase().contains("trace"))
.unwrap_or(false);
let builder = tracing_subscriber::fmt().with_env_filter(filter);
let _ = if verbose {
builder.with_target(true).try_init()
} else {
builder.without_time().with_target(false).try_init()
};
}
let level = match record.level() {
log::Level::Error => style("ERROR").red(),
log::Level::Warn => style("WARN").yellow(),
log::Level::Info => style("INFO").green(),
log::Level::Debug => style("DEBUG").blue(),
log::Level::Trace => style("TRACE").magenta(),
};
if let Some(status) = record.key_values().get(log::kv::Key::from("status")) {
let status = status.to_borrowed_str().unwrap();
let emoji = match status {
"finished" => style(crate::theme::EMOJI_SUCCESS.to_string()).green(),
"skipped" => style(crate::theme::EMOJI_SKIP.to_string()).yellow(),
"failed" => style(crate::theme::EMOJI_ERROR.to_string()).red(),
_ => style("".into()),
};
if debug_mode {
writeln!(
buf,
"[{} {:<5} {}] {} {}",
timestamp,
level,
record.target(),
emoji,
record.args()
)
} else {
writeln!(buf, "[{:<5}] {} {}", level, emoji, record.args())
}
} else if let Some(emoji) = record.key_values().get(log::kv::Key::from("emoji")) {
if debug_mode {
writeln!(
buf,
"[{} {:<5} {}] {} {}",
timestamp,
level,
record.target(),
emoji,
record.args()
)
} else {
writeln!(buf, "[{:<5}] {} {}", level, emoji, record.args())
}
} else if debug_mode {
writeln!(
buf,
"[{} {:<5} {}] {}",
timestamp,
level,
record.target(),
record.args()
)
} else {
writeln!(buf, "[{:<5}] {}", level, record.args())
}
})
.init();
// Plain emojis — no `console::style` colour codes. The emoji already conveys
// status, the level is coloured by the fmt layer, and embedding ANSI here
// leaks escape codes when console's TTY detection disagrees with the writer.
fn ok() -> String {
crate::theme::EMOJI_SUCCESS.to_string()
}
fn skipped() -> String {
crate::theme::EMOJI_SKIP.to_string()
}
fn failed() -> String {
crate::theme::EMOJI_ERROR.to_string()
}
fn handle_events() {
@@ -93,8 +59,7 @@ fn handle_events() {
match event {
HarmonyEvent::HarmonyStarted => {}
HarmonyEvent::HarmonyFinished => {
let emoji = crate::theme::EMOJI_HARMONY.to_string();
info!(emoji = emoji.as_str(); "Harmony completed");
info!("{} Harmony completed", crate::theme::EMOJI_HARMONY);
}
HarmonyEvent::TopologyStateChanged {
topology,
@@ -103,29 +68,28 @@ fn handle_events() {
} => match status {
TopologyStatus::Queued => {}
TopologyStatus::Preparing => {
let emoji = format!(
"{}",
style(crate::theme::EMOJI_TOPOLOGY.to_string()).yellow()
info!(
"{} Preparing environment: {topology}...",
crate::theme::EMOJI_TOPOLOGY
);
info!(emoji = emoji.as_str(); "Preparing environment: {topology}...");
(*preparing_topology) = true;
}
TopologyStatus::Success => {
(*preparing_topology) = false;
if let Some(message) = message {
info!(status = "finished"; "{message}");
info!("{} {message}", ok());
}
}
TopologyStatus::Noop => {
(*preparing_topology) = false;
if let Some(message) = message {
info!(status = "skipped"; "{message}");
info!("{} {message}", skipped());
}
}
TopologyStatus::Error => {
(*preparing_topology) = false;
if let Some(message) = message {
error!(status = "failed"; "{message}");
error!("{} {message}", failed());
}
}
},
@@ -140,8 +104,10 @@ fn handle_events() {
info!("{message}");
} else {
(*current_score) = Some(score.clone());
let emoji = format!("{}", style(crate::theme::EMOJI_SCORE).blue());
info!(emoji = emoji.as_str(); "Interpreting score: {score}...");
info!(
"{} Interpreting score: {score}...",
crate::theme::EMOJI_SCORE
);
}
}
HarmonyEvent::InterpretExecutionFinished {
@@ -158,17 +124,17 @@ fn handle_events() {
match outcome {
Ok(outcome) => match outcome.status {
harmony::interpret::InterpretStatus::SUCCESS => {
info!(status = "finished"; "{}", outcome.message);
info!("{} {}", ok(), outcome.message);
}
harmony::interpret::InterpretStatus::NOOP => {
info!(status = "skipped"; "{}", outcome.message);
info!("{} {}", skipped(), outcome.message);
}
_ => {
error!(status = "failed"; "{}", outcome.message);
error!("{} {}", failed(), outcome.message);
}
},
Err(err) => {
error!(status = "failed"; "{err}");
error!("{} {err}", failed());
}
}
}
@@ -182,10 +148,13 @@ fn handle_events() {
info!("Installing feature '{feature}' for '{application}'...");
}
ApplicationFeatureStatus::Installed { details: _ } => {
info!(status = "finished"; "Feature '{feature}' installed");
info!("{} Feature '{feature}' installed", ok());
}
ApplicationFeatureStatus::Failed { message: details } => {
error!(status = "failed"; "Feature '{feature}' installation failed: {details}");
error!(
"{} Feature '{feature}' installation failed: {details}",
failed()
);
}
},
}

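The new logger chooses between terse and verbose output based on `RUST_LOG` alone. Here is that check pulled out as a pure function. `is_verbose` is a hypothetical name, and it takes the value as a parameter so it can be tested without mutating the process environment.

```rust
/// Mirrors the `verbose` check in `configure_logger`: any mention of
/// `debug` or `trace` in `RUST_LOG` (case-insensitive) switches on
/// timestamps and targets; an unset variable means terse output.
fn is_verbose(rust_log: Option<&str>) -> bool {
    rust_log
        .map(|v| {
            let v = v.to_lowercase();
            v.contains("debug") || v.contains("trace")
        })
        .unwrap_or(false)
}
```

The check is a substring match, so a directive such as `my_debug_crate=info` also turns on timestamps and targets. That is harmless, but it is worth knowing when the output format changes unexpectedly.
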
@@ -5,7 +5,7 @@ use harmony::inventory::Inventory;
use harmony::maestro::Maestro;
use harmony::{score::Score, topology::Topology};
use inquire::Confirm;
use log::debug;
use tracing::debug;
pub mod cli_logger; // FIXME: Don't make me pub
mod cli_reporter;
@@ -119,7 +119,9 @@ pub async fn run_cli<T: Topology + Send + Sync + 'static>(
cli_logger::init();
cli_reporter::init();
let mut maestro = Maestro::initialize(inventory, topology).await.unwrap();
// Build the maestro WITHOUT preparing the topology — listing scores or
// declining the run must not touch the cluster. Prep is deferred to `init`.
let mut maestro = Maestro::new_without_initialization(inventory, topology);
maestro.register_all(scores);
let result = init(maestro, args).await;
@@ -129,11 +131,9 @@ pub async fn run_cli<T: Topology + Send + Sync + 'static>(
}
async fn init<T: Topology + Send + Sync + 'static>(
maestro: harmony::maestro::Maestro<T>,
mut maestro: harmony::maestro::Maestro<T>,
args: Args,
) -> Result<(), Box<dyn std::error::Error>> {
let _ = env_logger::builder().try_init();
let scores_vec = maestro_scores_filter(&maestro, args.all, args.filter, args.number);
if scores_vec.is_empty() {
@@ -166,6 +166,13 @@ async fn init<T: Topology + Send + Sync + 'static>(
}
}
// We're committed to running — only now prepare the topology (the
// expensive, cluster-touching step) so list/decline paths stay no-ops.
maestro
.prepare_topology()
.await
.map_err(|e| format!("topology preparation failed: {e}"))?;
// Run filtered scores
for s in scores_vec {
debug!("Running: {}", s.name());
@@ -182,8 +189,72 @@ mod tests {
maestro::Maestro,
modules::dummy::{ErrorScore, PanicScore, SuccessScore},
topology::HAClusterTopology,
topology::{PreparationError, PreparationOutcome, Topology},
};
/// Topology whose readiness check always fails. Lets a test assert that
/// `--list` never reaches `prepare_topology` (it would error if it did),
/// while the run path does.
struct ExplodingTopology;
#[async_trait::async_trait]
impl Topology for ExplodingTopology {
fn name(&self) -> &str {
"ExplodingTopology"
}
async fn ensure_ready(&self) -> Result<PreparationOutcome, PreparationError> {
Err(PreparationError::new(
"ensure_ready must not run on the list path".to_string(),
))
}
}
fn exploding_maestro() -> Maestro<ExplodingTopology> {
let mut maestro =
Maestro::new_without_initialization(Inventory::autoload(), ExplodingTopology);
maestro.register_all(vec![Box::new(SuccessScore {})]);
maestro
}
#[tokio::test]
async fn list_does_not_prepare_topology() {
// Topology prep fails; listing must still succeed because it never
// touches the topology.
let res = crate::init(
exploding_maestro(),
crate::Args {
yes: true,
filter: None,
interactive: false,
all: true,
number: 0,
list: true,
},
)
.await;
assert!(res.is_ok(), "--list should not prepare the topology");
}
#[tokio::test]
async fn run_prepares_topology_before_interpreting() {
// Same topology, but actually running: prep runs and its failure
// aborts before any score is interpreted.
let res = crate::init(
exploding_maestro(),
crate::Args {
yes: true,
filter: None,
interactive: false,
all: true,
number: 0,
list: false,
},
)
.await;
assert!(res.is_err(), "run path must prepare the topology first");
}
fn init_test_maestro() -> Maestro<HAClusterTopology> {
let inventory = Inventory::autoload();
let topology = HAClusterTopology::autoload();

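The two new tests pin down an ordering: listing returns before preparation, and a real run prepares once before interpreting anything. That ordering can be sketched without async code or a real topology. `FakeMaestro` and this `init` are hypothetical synchronous stand-ins, not the harmony types.

```rust
/// Hypothetical stand-in for `Maestro<T>`: it only records how often the
/// expensive, cluster-touching preparation step runs.
struct FakeMaestro {
    prepare_calls: u32,
    prep_fails: bool,
}

impl FakeMaestro {
    fn prepare_topology(&mut self) -> Result<(), String> {
        self.prepare_calls += 1;
        if self.prep_fails {
            Err("ensure_ready failed".to_string())
        } else {
            Ok(())
        }
    }
}

/// Same ordering as the reworked `init`: `--list` returns before any
/// preparation, and a real run prepares once, before the first score.
fn init(maestro: &mut FakeMaestro, list: bool, scores: &[&str]) -> Result<Vec<String>, String> {
    if list {
        // Listing only reads registered names; the topology is never touched.
        return Ok(scores.iter().map(|s| s.to_string()).collect());
    }
    maestro
        .prepare_topology()
        .map_err(|e| format!("topology preparation failed: {e}"))?;
    Ok(scores.iter().map(|s| format!("ran {s}")).collect())
}
```

The real `init` also returns before preparation when the user declines the confirmation prompt. The sketch leaves the prompt out.
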
@@ -1,5 +1,6 @@
use crate::{ConfigClass, ConfigError, ConfigSource};
use async_trait::async_trait;
use log::{debug, info};
pub struct EnvSource;
@@ -16,6 +17,7 @@ impl ConfigSource for EnvSource {
) -> Result<Option<serde_json::Value>, ConfigError> {
let env_key = env_key_for(key);
debug!("Loading config from env var {env_key}");
match std::env::var(&env_key) {
Ok(value) => serde_json::from_str(&value).map(Some).map_err(|e| {
ConfigError::EnvError(format!(