harmony/fleet/harmony-fleet-e2e/tests/ping.rs
Jean-Gabriel Gill-Couture 020ebcb1f9 refactor(fleet): deploy-architecture cleanup per ADR-023 — Scores everywhere, deploy crate, principles in CLAUDE.md
The previous e2e harness handrolled k8s manifests in `stack.rs`,
bypassing the Score-Topology-Interpret machinery harmony exists to
provide. This commit:

1. **ADR-023** codifies the rules: deploy with Scores (not
   manifests), e2e uses the same Scores as production, one Score
   per component, deploy blocks on smoke-test success, deploy logic
   lives in `*-deploy` crates, topologies are compile-time,
   thiserror over anyhow. CLAUDE.md mirrors the principles.

2. **New `fleet/harmony-fleet-deploy` crate** is the canonical home
   for fleet-component Scores:
   - `FleetOperatorScore` + helm-chart generator + `install_crds`
     moved out of `harmony::modules::fleet::operator` (they should
     never have lived in `harmony` core). `FleetServerScore`
     (composite of NATS + operator + Zitadel + callout) moved too.
   - New `FleetNatsScore` (preset over `NatsHelmChartScore` with
     fleet's required values; v1 supports `UserPass` auth, callout
     mode reserved on the public API for PR 1.5).
   - New `FleetAgentScore` with `FleetAgentTarget::Pod`; `Vm`
     target is a future variant that absorbs `FleetDeviceSetupScore`.
   - `harmony-fleet-deploy` binary built on the existing
     `harmony_cli` crate — no new CLI scaffolding.

3. **Operator runtime binary trimmed**: `Install` and `Chart`
   subcommands removed; both jobs now belong to
   `harmony-fleet-deploy`. The runtime binary becomes leaner.

4. **E2E harness rewritten** as a thin Score composer:
   `harmony-fleet-e2e/src/stack.rs` deploys the stack via
   `FleetNatsScore` + `FleetAgentScore`. The inline NATS manifest
   factory and the bespoke agent Pod renderer are gone.
   - Bring-up runs once per test binary via `shared_stack` +
     `tokio::sync::OnceCell` (matches the `fleet_e2e_demo` pattern).
   - Stale `e2e-*` namespaces from prior runs get pruned at
     startup so the leaks the OnceCell creates don't compound.

5. **`thiserror` for the agent's `CommandServer`** — replaces the
   anyhow-based surface with typed `CommandError` /
   `CommandServerError`.

6. **Memory** captures eight load-bearing principles (saved to
   `~/.claude/projects/.../memory/`) so future sessions don't drift
   back into manifest-handrolling.
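
The once-per-binary bring-up in item 4 can be sketched with the standard library alone. This is a minimal illustration, not the harness code: it swaps `tokio::sync::OnceCell` for the synchronous `std::sync::OnceLock` so the example has no dependencies. The `Stack` struct, the `BRING_UPS` counter, the namespace naming scheme, and the `stale_namespaces` selection rule are all assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for the real stack handle.
#[derive(Debug)]
pub struct Stack {
    pub namespace: String,
}

/// Counts bring-ups so the once-only behaviour is observable.
pub static BRING_UPS: AtomicUsize = AtomicUsize::new(0);

/// Process-wide slot. Statics are never dropped, so nothing stored
/// here gets a destructor call at exit; that is the source of leaks.
static SHARED: OnceLock<Stack> = OnceLock::new();

/// Returns the shared stack, running the bring-up closure on the
/// first call only. Later calls return the same reference.
pub fn shared_stack() -> &'static Stack {
    SHARED.get_or_init(|| {
        let n = BRING_UPS.fetch_add(1, Ordering::SeqCst);
        Stack {
            // Assumed naming scheme: prefix + process id + counter.
            namespace: format!("e2e-{}-{}", std::process::id(), n),
        }
    })
}

/// Assumed pruning rule: every `e2e-*` namespace except the current
/// one counts as stale.
pub fn stale_namespaces<'a>(all: &[&'a str], current: &str) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|ns| ns.starts_with("e2e-") && *ns != current)
        .collect()
}
```

Because the value lives in a `static`, every test in the binary sees the same stack, and pruning at startup is what keeps namespaces left behind by earlier processes from accumulating.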

Verified: `cargo test -p harmony-fleet-e2e --test ping` green
end-to-end against k3d in 25s warm.
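
For readers unfamiliar with the anyhow-to-typed-error move in item 5, the shape of a typed command error can be sketched as follows. This is a std-only illustration: the variants and the `dispatch` function are hypothetical, since the real `CommandError` is not shown in this commit. The `Display` and `Error` impls are written by hand here; `thiserror`'s `#[derive(Error)]` with `#[error("...")]` attributes generates equivalent impls.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical variants; the real `CommandError` may differ.
#[derive(Debug, PartialEq)]
pub enum CommandError {
    UnknownVerb(String),
    Timeout { secs: u64 },
}

impl fmt::Display for CommandError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CommandError::UnknownVerb(v) => write!(f, "unknown verb: {v}"),
            CommandError::Timeout { secs } => {
                write!(f, "command timed out after {secs}s")
            }
        }
    }
}

impl Error for CommandError {}

/// Hypothetical dispatcher: callers can match on the exact failure
/// instead of inspecting an opaque error string.
pub fn dispatch(verb: &str) -> Result<&'static str, CommandError> {
    match verb {
        "ping" => Ok("pong"),
        other => Err(CommandError::UnknownVerb(other.to_string())),
    }
}
```

The practical gain is that a caller can branch on `CommandError::UnknownVerb` versus `CommandError::Timeout` at compile-checked match arms.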
2026-05-18 22:54:50 -04:00


//! TDD anchor test: an in-cluster agent answers `Verb::Ping` over
//! NATS request/reply, end-to-end. First green test for the
//! `device-commands.*` protocol.
//!
//! Bring-up runs **once** for the whole test binary via
//! `harmony_fleet_e2e::shared_stack` (the same `OnceCell` pattern
//! `fleet_e2e_demo`'s walking skeleton uses). Cleanup is best-effort
//! at process exit; per-bring-up namespace is unique so concurrent
//! test runs don't collide.
//!
//! Skipped automatically unless `HARMONY_FLEET_E2E` is set to `1` or
//! `true`; this keeps `cargo test --workspace` cheap on machines
//! without k3d/podman. Run explicitly:
//!
//! ```bash
//! HARMONY_FLEET_E2E=1 cargo test -p harmony-fleet-e2e --test ping
//! ```

use std::time::Duration;

use harmony_fleet_e2e::{StackOptions, shared_stack};
use harmony_fleet_operator::commands::FleetCommandsClient;

const E2E_ENV: &str = "HARMONY_FLEET_E2E";

fn e2e_enabled() -> bool {
    matches!(std::env::var(E2E_ENV).as_deref(), Ok("1" | "true"))
}

#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn operator_can_ping_agent() -> anyhow::Result<()> {
    if !e2e_enabled() {
        eprintln!(
            "skipping {E2E_ENV}-gated e2e test (set {E2E_ENV}=1 to run; \
             requires k3d + podman on PATH)"
        );
        return Ok(());
    }

    let _ = tracing_subscriber::fmt()
        .with_env_filter(
            tracing_subscriber::EnvFilter::try_from_default_env()
                .unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info")),
        )
        .try_init();

    let stack = shared_stack(StackOptions::default()).await?;
    stack.print_debug_info();

    let device_id = stack.device_ids[0].clone();
    let client = FleetCommandsClient::new(stack.nats_client.clone());

    // Generous outer timeout: bring-up readiness is verified
    // separately; this only guards against an agent that comes up
    // but never subscribes.
    let reply = tokio::time::timeout(Duration::from_secs(15), client.ping(&device_id))
        .await
        .map_err(|_| anyhow::anyhow!("ping outer timeout"))??;

    assert_eq!(
        reply.device_id.to_string(),
        device_id,
        "agent must report back its own device_id"
    );
    assert!(
        !reply.agent_version.is_empty(),
        "agent_version must be non-empty (env!(CARGO_PKG_VERSION) at compile time)"
    );

    Ok(())
}
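
The `.map_err(...)??` in the test flattens two layers of failure: the outer timeout and the inner ping result. The sketch below shows the same shape with the standard library only, using `mpsc::Receiver::recv_timeout` as a stand-in for `tokio::time::timeout` around an async call. The `Reply` alias and the `await_reply` function are hypothetical, and both error layers are plain `String`s here, whereas the test converts them into `anyhow::Error`.

```rust
use std::sync::mpsc;
use std::time::Duration;

/// Hypothetical inner result: the agent either answers with its
/// device id or reports a failure of its own.
pub type Reply = Result<String, String>;

/// Waits for one reply, flattening "no answer in time" and "agent
/// answered with an error" into a single `Result`.
pub fn await_reply(rx: &mpsc::Receiver<Reply>, deadline: Duration) -> Result<String, String> {
    let device_id = rx
        .recv_timeout(deadline)
        // Outer layer: nothing arrived before the deadline.
        .map_err(|_| "ping outer timeout".to_string())??;
    // First `?` unwraps the outer layer, second `?` the inner reply.
    Ok(device_id)
}
```

A reply that arrives in time but carries an error surfaces as the inner error, while a missing reply surfaces as the timeout message, so the two failure modes stay distinguishable in the test output.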