feat: fleet e2e x86 vm support #288
@@ -29,26 +29,33 @@ why the negative path is intentionally untested (inquire has no
|
|||||||
stdin mock; covering it would need a `Config` type with a manual
|
stdin mock; covering it would need a `Config` type with a manual
|
||||||
non-prompting `InteractiveParseObj` impl — separate refactor).
|
non-prompting `InteractiveParseObj` impl — separate refactor).
|
||||||
|
|
||||||
### 1.2 — Manual end-to-end verification per fleet component
|
### 1.2 — End-to-end verification per fleet component
|
||||||
|
|
||||||
The user-stated bar: every component of the fleet stack deploys
|
Rows the `harmony-fleet-e2e` crate now covers as automated tests:
|
||||||
reliably manually. Not yet a single automated suite. Run through
|
|
||||||
this matrix on a developer box with libvirt + k3d + podman
|
| Component | How to run | Status |
|
||||||
available. Mark date + initials when each row passes.
|
|---|---|---|
|
||||||
|
| Pod-target agent + NATS in k3d | `HARMONY_FLEET_E2E=1 cargo test -p harmony-fleet-e2e --test ping` | ✓ automated |
|
||||||
|
| ARM VM bring-up + agent (aarch64 cloud image, AAVMF firmware) | `HARMONY_FLEET_VM_E2E=1 cargo test -p harmony-fleet-e2e --test vm_ping` | ✓ automated |
|
||||||
|
| x86 VM bring-up + agent (KVM, fast path) | `HARMONY_FLEET_VM_E2E=1 FLEET_E2E_VM_ARCH=x86_64 cargo test … --test vm_ping` | ✓ automated |
|
||||||
|
| Device-setup over SSH (FleetDeviceSetupScore) | Exercised by every `vm_*` test bring-up | ✓ automated |
|
||||||
|
| Ping (operator → agent over NATS request/reply) | Both `ping` (Pod) and `vm_ping` (VM) | ✓ automated |
|
||||||
|
| Agent KV isolation (own filter only) | `vm_isolation` | ✓ automated |
|
||||||
|
| Podman deployment lifecycle (deploy → upgrade → delete) | `vm_deploy_lifecycle` (+ `podman ps` ground-truth via SSH) | ✓ automated |
|
||||||
|
|
||||||
|
Verified at least once each on the dev host (aarch64 ~7 min,
|
||||||
|
x86_64 ~2.5 min); see `fleet/harmony-fleet-e2e/README.md` for
|
||||||
|
copy-paste commands and the wall-clock breakdown.
|
||||||
|
|
||||||
|
Rows still **manual** (no Rust automation yet — verify by hand
|
||||||
|
before merge and record date + initials):
|
||||||
|
|
||||||
| Component | How to deploy | What "works" looks like | Owner | Last verified |
|
| Component | How to deploy | What "works" looks like | Owner | Last verified |
|
||||||
|---|---|---|---|---|
|
|---|---|---|---|---|
|
||||||
| x86 VM (cloud-init Ubuntu) | `cargo run -p example_fleet_vm_setup` | `virsh list` shows running VM with SSH key trust | | |
|
|
||||||
| ARM VM (aarch64 + AAVMF firmware) | `cargo run -p example_fleet_vm_setup --features aarch64` (or `fleet/scripts/smoke-a3-arm.sh`) | aarch64 VM boots, fleet-agent comes up on it | | |
|
|
||||||
| Zitadel (full setup) | `cargo run -p example_fleet_staging_install -- --base-domain <…>` | Zitadel admin UI reachable, persisted admin password set, IAM PAT secret created | | |
|
| Zitadel (full setup) | `cargo run -p example_fleet_staging_install -- --base-domain <…>` | Zitadel admin UI reachable, persisted admin password set, IAM PAT secret created | | |
|
||||||
| NATS + auth callout | `cargo run -p example_fleet_auth_callout` (deploy phase) | NATS pod running on k3d; callout pod healthy; JWKS fetch logs visible | | |
|
| NATS + auth callout | `cargo run -p example_fleet_auth_callout` (deploy phase) | NATS pod running on k3d; callout pod healthy; JWKS fetch logs visible | | |
|
||||||
| Operator | `cargo run -p example_fleet_server_install` | Operator pod up, Deployment CRD registered, NATS KV buckets created | | |
|
| Operator | `cargo run -p example_fleet_server_install` | Operator pod up, Deployment CRD registered, NATS KV buckets created | | |
|
||||||
| Agent on x86 VM | follow `examples/fleet_e2e_demo/RUNBOOK.md` | Agent connects to NATS, publishes DeviceInfo to KV | | |
|
|
||||||
| Agent on ARM VM | same + arm64 target | same | | |
|
|
||||||
| Enrollment via Zitadel SSO | `cargo run -p example-fleet-sso-login` + `fleet-device-enroll --device-id …` | Device JWT minted, machine user provisioned, agent connects with bearer-token JWT | | |
|
| Enrollment via Zitadel SSO | `cargo run -p example-fleet-sso-login` + `fleet-device-enroll --device-id …` | Device JWT minted, machine user provisioned, agent connects with bearer-token JWT | | |
|
||||||
| Device-setup over SSH (FleetDeviceSetupScore) | from `examples/fleet_e2e_demo::apply_setup` flow | agent binary installed, systemd unit enabled, agent running | | |
|
|
||||||
| Ping (operator → agent over NATS request/reply) | `HARMONY_FLEET_E2E=1 cargo test -p harmony-fleet-e2e --test ping` | green test, ping round-trip | | |
|
|
||||||
| Podman deployment | apply a `Deployment` CRD with `PodmanV0Score` payload, watch agent reconcile | `podman ps` on the device shows the requested container | | |
|
|
||||||
|
|
||||||
Outputs of each manual run go into a follow-up issue / PR
|
Outputs of each manual run go into a follow-up issue / PR
|
||||||
description, not committed here — this matrix is the index, not
|
description, not committed here — this matrix is the index, not
|
||||||
@@ -64,45 +71,48 @@ For each item below, the question is: **does the code on this
|
|||||||
branch honor the principle?**
|
branch honor the principle?**
|
||||||
|
|
||||||
- **P1. Deploy with Scores, not handrolled manifests.**
|
- **P1. Deploy with Scores, not handrolled manifests.**
|
||||||
- `fleet/harmony-fleet-e2e/src/stack.rs`: already cleaned in
|
- `fleet/harmony-fleet-e2e/src/stack.rs` + `vm/*` confirmed
|
||||||
the ADR-023 refactor. Re-confirm no `k8s_openapi::api::*`
|
handroll-free: only `*Score` types are composed; the only
|
||||||
structs survive in test/example code.
|
`k8s_openapi` use is the readiness-poll `Deployment` get
|
||||||
- `fleet/harmony-fleet-deploy/src/agent.rs`: builds
|
(cluster query, not a manifest build).
|
||||||
`Deployment` / `ConfigMap` / `Service` manually inside
|
- `fleet/harmony-fleet-deploy/src/agent.rs` still builds
|
||||||
`interpret`. **Technically** within ADR-023's letter (it's
|
`Deployment` / `ConfigMap` manually inside `interpret`. ADR-023
|
||||||
inside a Score's interpret body) but is the right
|
letter is honored (manifests are inside a Score's interpret
|
||||||
abstraction to compose `K8sResourceScore` instead?
|
body, not in test/CLI code), so accepted for this branch. A
|
||||||
*Flagged for review.*
|
future cleanup could compose `K8sResourceScore` instead —
|
||||||
|
track in a follow-up issue, not a blocker.
|
||||||
- **P2. E2E uses the same Scores as production.**
|
- **P2. E2E uses the same Scores as production.**
|
||||||
- `harmony-fleet-e2e` is the test of this. Confirm `stack.rs`
|
- ✓ verified by both Pod (`stack.rs`) and VM (`vm/*.rs`)
|
||||||
composes the same Scores as `example_fleet_server_install`.
|
harnesses — they compose `FleetNatsScore` + `FleetAgentScore`
|
||||||
|
+ `ProvisionVmScore` + `FleetDeviceSetupScore` exactly as
|
||||||
|
`example_fleet_server_install` / `example_fleet_vm_setup` do.
|
||||||
- **P3. One Score per deployable component.**
|
- **P3. One Score per deployable component.**
|
||||||
- `harmony/src/modules/fleet/setup_score.rs` is 1049 lines and
|
- `harmony/src/modules/fleet/setup_score.rs` (1049 lines) is a
|
||||||
composes Zitadel + NATS + callout + operator. ADR-023 says
|
*device-side composition* (podman + user + linger + config +
|
||||||
"composition is the user-facing primitive; don't build
|
systemd unit), not a multi-service deploy. Acceptable under
|
||||||
monolithic deploy-everything Scores." Confirm this file is a
|
P3; the file is on the deferred move-to-`*-deploy` list (§1.7
|
||||||
composition of primitives, not a megascore that bypasses
|
ADR-024 scope).
|
||||||
them.
|
|
||||||
- **The 3 open code review comments still apply** (see §3.1).
|
|
||||||
- **P4. Deploy returns only after smoke-test success.**
|
- **P4. Deploy returns only after smoke-test success.**
|
||||||
- This is *not* enforced today — see §3.2. Track as known
|
- Not enforced framework-wide; see §3.2. The e2e harness now
|
||||||
debt, not a merge blocker (ADR-023 left it open).
|
has `VmStack::wait_until_ready` (ping retry until subscribed)
|
||||||
|
as a per-test stand-in. Track as known debt, not a blocker.
|
||||||
- **P5. Deploy logic lives in a `*-deploy` crate.**
|
- **P5. Deploy logic lives in a `*-deploy` crate.**
|
||||||
- Confirm: `harmony-fleet-deploy` is the canonical home. The
|
- ✓ `harmony-fleet-deploy` is the canonical home. New
|
||||||
`harmony/src/modules/fleet/` directory should shrink, not
|
`companion/` module added there. The `harmony/src/modules/
|
||||||
grow, in follow-ups. ADR-024 proposes pulling more out.
|
fleet/` directory should still shrink — see §1.7.
|
||||||
- **P6. Topologies compile-time, selected at runtime.**
|
- **P6. Topologies compile-time, selected at runtime.**
|
||||||
- No `Box<dyn Topology>` plugin loaders introduced. Confirm
|
- ✓ `rg 'Box<dyn Topology'` clean across the new code.
|
||||||
with `rg 'Box<dyn Topology'` on the new code.
|
|
||||||
- **P7. Extend Scores with companions, not API changes.**
|
- **P7. Extend Scores with companions, not API changes.**
|
||||||
- Confirm no new methods were added to `Score` / `Interpret`
|
- ✓ first concrete companion landed:
|
||||||
traits.
|
`harmony-fleet-deploy::companion::AgentObservation` — derives
|
||||||
|
the agent's KV watch scope from typed `AgentConfig` without
|
||||||
|
touching `Score` / `Interpret`.
|
||||||
- **P8. CLI hybrid, staged (B today, C later).**
|
- **P8. CLI hybrid, staged (B today, C later).**
|
||||||
- Confirm new binaries follow the `harmony-*` naming pattern
|
- ✓ `harmony-fleet-deploy` binary follows the naming pattern
|
||||||
and use `harmony_cli`.
|
and uses `harmony_cli`. No plugin discovery introduced.
|
||||||
- **P9. thiserror everywhere, anyhow only at binary glue.**
|
- **P9. thiserror everywhere, anyhow only at binary glue.**
|
||||||
- Confirm new library code uses `thiserror`. Scan for
|
- ✓ new code (`vm/*.rs`, `kv_admin.rs`, `companion/`) uses
|
||||||
`anyhow::Error` returns in non-`main.rs` files.
|
typed errors via `thiserror`. `anyhow` only at test glue.
|
||||||
|
|
||||||
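The `VmStack::wait_until_ready` stand-in mentioned under P4 (ping retry until subscribed) is, in shape, a bounded retry loop. The sketch below illustrates that pattern only. It assumes a synchronous probe closure so it stays dependency-free, and `retry_until_ready`, its parameters, and its return value are hypothetical names, not the harness's actual API.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `probe` until it succeeds or `timeout` elapses.
/// Returns the number of attempts on success, or the last probe error.
fn retry_until_ready<E>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Result<(), E>,
) -> Result<u32, E> {
    let deadline = Instant::now() + timeout;
    let mut attempts = 0u32;
    loop {
        attempts += 1;
        match probe() {
            // The agent answered: the stack is ready.
            Ok(()) => return Ok(attempts),
            // Out of time: surface the last probe error to the caller.
            Err(e) if Instant::now() >= deadline => return Err(e),
            // Not ready yet: wait before the next attempt.
            Err(_) => sleep(interval),
        }
    }
}
```

A framework-wide smoke-test contract (§3.2) would move a loop like this behind the deploy call instead of leaving it to each test.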
Capability-naming rules from `CLAUDE.md`:
|
Capability-naming rules from `CLAUDE.md`:
|
||||||
|
|
||||||
@@ -137,28 +147,9 @@ properly.
|
|||||||
|
|
||||||
### 1.5 — Operator frontend dead-code warnings
|
### 1.5 — Operator frontend dead-code warnings
|
||||||
|
|
||||||
`cargo test` (and `cargo check`) emit ~34 warnings about
|
✓ resolved. `MockFleetService` is now wired into the views;
|
||||||
unused trait + structs in
|
`cargo check -p harmony-fleet-operator --all-targets` reports 0
|
||||||
`fleet/harmony-fleet-operator/src/service/{mod, mock}.rs`:
|
warnings. The "(a) wire the trait into the views" path landed.
|
||||||
`FleetService`, `DeviceDetail`, `DeploymentDetail`, etc. all
|
|
||||||
marked `never used`. The maud+htmx frontend was committed as
|
|
||||||
"initial commit, still much work to do." The views currently
|
|
||||||
inline mock data instead of going through the `FleetService`
|
|
||||||
trait.
|
|
||||||
|
|
||||||
Decision needed before merge:
|
|
||||||
|
|
||||||
- (a) Wire the trait into the views (real fix; preferred but
|
|
||||||
more code).
|
|
||||||
- (b) Add `#[allow(dead_code)]` at module level with a TODO that
|
|
||||||
references this checklist.
|
|
||||||
- (c) Delete the unused service abstraction and rebuild it when
|
|
||||||
the views need real data.
|
|
||||||
|
|
||||||
`cargo clippy` does not flag these — only `cargo check` does,
|
|
||||||
because the dead-code lint emits during the bin compilation
|
|
||||||
path, not the lib compilation path. So the warnings are
|
|
||||||
real but easy to miss.
|
|
||||||
|
|
||||||
### 1.6 — Untracked items decision
|
### 1.6 — Untracked items decision
|
||||||
|
|
||||||
@@ -255,35 +246,21 @@ For anyone landing on the PR cold:
|
|||||||
|
|
||||||
## §3 — Known issues and deferred items
|
## §3 — Known issues and deferred items
|
||||||
|
|
||||||
### 3.1 — Code review comments on `harmony-fleet-deploy` (unaddressed)
|
### 3.1 — Code review comments on `harmony-fleet-deploy`
|
||||||
|
|
||||||
Three PR comments from the user remain open. They are real
|
✓ resolved (commit `34807511 feat: refactor fleet agent config
|
||||||
architectural problems, not nits:
|
into a strongly typed struct, remove brittle string processing`):
|
||||||
|
|
||||||
- **`fleet/harmony-fleet-deploy/src/agent.rs::PodTarget`** is a
|
- `PodTarget` now carries the typed `harmony_fleet_auth::
|
||||||
stringly-typed duplicate of `harmony-fleet-agent`'s
|
AgentConfig` directly — no more stringly-typed duplicate.
|
||||||
`AgentConfig`. The deploy crate should depend on the agent's
|
- `render_config_map` uses `toml::to_string(&cfg)`; tested to
|
||||||
config types (or a shared types crate) and use them directly
|
round-trip TOML-special characters (`"`, `\`).
|
||||||
instead of redeclaring the schema as ad-hoc `String` fields.
|
- `render_user_pass_values` is now `FleetNatsValues` + `serde_yaml
|
||||||
YAML-mud-pit in Rust clothing.
|
::to_string`; YAML-special characters escape correctly.
|
||||||
|
|
||||||
- **`fleet/harmony-fleet-deploy/src/agent.rs::render_config_map`** builds the agent's `config.toml` via `format!()` with
|
Remaining follow-up (not a merge blocker): `harmony/src/modules/
|
||||||
manual quote-escaping. Any label value containing `"`, `\`, or
|
nats/helm_chart.rs::NatsHelmChartScore::values_yaml` still takes
|
||||||
newline produces broken TOML. Fix is `toml::to_string(&typed_struct)?` once the type plumbing from the comment above is
|
a raw `String`. Lifting that to typed values is a future cleanup.
|
||||||
in place.
|
|
||||||
|
|
||||||
- **`fleet/harmony-fleet-deploy/src/nats.rs::render_user_pass_values`** builds Helm values YAML via `format!()` with raw-string interpolation. Same class of bug. Fix: typed
|
|
||||||
`FleetNatsValues` struct (or a `serde_yaml::Value` tree) +
|
|
||||||
`serde_yaml::to_string`. The same anti-pattern is in
|
|
||||||
`harmony/src/modules/nats/helm_chart.rs::NatsHelmChartScore::values_yaml` (raw `String` field); lifting that to take typed
|
|
||||||
values is the harder follow-up, but worth scoping.
|
|
||||||
|
|
||||||
The user's framing of all three: *"it felt like a cheap
|
|
||||||
non-programmer crappy deployment patchwork script converted to
|
|
||||||
rust instead of a properly engineered deployment."* Fixing these
|
|
||||||
is a small PR (probably 200 lines including the typed structs
|
|
||||||
and tests). Should land before customer-facing v0.1, but not
|
|
||||||
necessarily before this branch merges to master.
|
|
||||||
|
|
||||||
### 3.2 — Smoke-test contract (ADR-023 principle 4) deferred
|
### 3.2 — Smoke-test contract (ADR-023 principle 4) deferred
|
||||||
|
|
||||||
@@ -324,12 +301,13 @@ message; no caller in this repo should hit it.
|
|||||||
|
|
||||||
### 3.5 — Bash smoke scripts vs Rust harness
|
### 3.5 — Bash smoke scripts vs Rust harness
|
||||||
|
|
||||||
`fleet/scripts/smoke-a{1,3,3-arm,4}.sh` are the only end-to-end
|
The Rust harness now covers what `smoke-a3.sh` and
|
||||||
harnesses that actually exercise the stack today. ADR-023
|
`smoke-a3-arm.sh` exercised — both aarch64 (production) and
|
||||||
principle 2 says "E2E uses the same Scores as production." The
|
x86_64 (fast iteration) VM bring-up, podman deploy lifecycle,
|
||||||
bash scripts violate that. Migrate to `harmony-fleet-e2e`-based
|
and ping. The bash scripts remain as operational reference but
|
||||||
Rust harnesses over time. Not a merge blocker — they're useful
|
the new Rust path is the primary route. `smoke-a1.sh` / `smoke-
|
||||||
operational tools today.
|
a4.sh` (which exercise other paths) still don't have Rust
|
||||||
|
equivalents — track for a follow-up PR.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -345,12 +323,13 @@ re-deriving from git log:
|
|||||||
- **ADR-024 is the proposal for an Alternative-B capability
|
- **ADR-024 is the proposal for an Alternative-B capability
|
||||||
decomposition**, extracted from `ROADMAP/fleet_platform/architecture_review.md` §§4–5. Marked `Status: Draft` because
|
decomposition**, extracted from `ROADMAP/fleet_platform/architecture_review.md` §§4–5. Marked `Status: Draft` because
|
||||||
JG is not yet convinced.
|
JG is not yet convinced.
|
||||||
- **The deploy crate's three review comments tie back to one
|
- **The deploy crate's three review comments are resolved** (see
|
||||||
root cause**: values were authored as untyped strings, so the
|
§3.1) by lifting `PodTarget` / `FleetNatsScore` values onto
|
||||||
speculative enum variants (`FleetAgentTarget::Vm` /
|
typed structs serialised via `toml::to_string` /
|
||||||
`FleetNatsAuth::Callout`), the fixture-data defaults, and the
|
`serde_yaml::to_string`. The speculative enum variants
|
||||||
PR-cycle text in error messages are all *consequences*. Fix
|
(`FleetAgentTarget::Vm` / `FleetNatsAuth::Callout`) and
|
||||||
the type plumbing and the rest collapses.
|
PR-cycle text in error messages remain — separate from the
|
||||||
|
three review comments, still flagged for review.
|
||||||
- **`harmony_config` test code now uses `tokio::sync::Mutex`** for
|
- **`harmony_config` test code now uses `tokio::sync::Mutex`** for
|
||||||
the `ENV_LOCK` that guards process env vars across `#[tokio::test]` awaits. Was `std::sync::Mutex` held across `.await` —
|
the `ENV_LOCK` that guards process env vars across `#[tokio::test]` awaits. Was `std::sync::Mutex` held across `.await` —
|
||||||
silent deadlock waiting to happen.
|
silent deadlock waiting to happen.
|
||||||
@@ -372,20 +351,25 @@ re-deriving from git log:
|
|||||||
|
|
||||||
## §5 — Working order
|
## §5 — Working order
|
||||||
|
|
||||||
When in doubt, do tasks roughly in this order:
|
What's left between here and `git push origin master`:
|
||||||
|
|
||||||
1. **Now**: §1.2 (manual component verification). Block on
|
1. **Still manual, must verify before merge** — the four
|
||||||
anything that's broken there.
|
remaining §1.2 rows (Zitadel, NATS+callout, Operator, Zitadel
|
||||||
2. **Now-ish**: §1.3 (drift review) and §1.4 (clippy-allow
|
enrollment). Mark the matrix with date + initials.
|
||||||
audit). Either fix or file follow-ups.
|
2. **JG review calls** — §1.4 (clippy-allow audit), §1.6
|
||||||
3. **Before merge**: §1.5 (operator frontend dead code), §1.6
|
(untracked items: `dev.sh`, `style/dist/`, `manual_mint/`),
|
||||||
(untracked items), §3.4 (one-line note in merge commit
|
§1.7 (ADR-024 accept/edit/reject/keep-as-draft), §1.8 (doc
|
||||||
message about the `harmony_secret` semantic).
|
cleanup remainder).
|
||||||
4. **At review time**: JG decides on §1.7 (ADR-024) and §1.8
|
3. **Merge commit body** — §3.4 (one-line note about the
|
||||||
(doc cleanup remainder).
|
`harmony_secret` default-store semantic change).
|
||||||
5. **After merge** (follow-up PRs): §3.1 (deploy crate type
|
|
||||||
plumbing), §3.2 (smoke-test contract), §3.3 (CI for e2e),
|
After merge (follow-up PRs, not blockers):
|
||||||
§3.5 (bash → Rust harnesses), §1.8 (doc cohesion PR).
|
|
||||||
|
- §3.2 — smoke-test contract design.
|
||||||
|
- §3.3 — CI runner with libvirt + k3d + podman so the 5
|
||||||
|
`#[ignore]`'d tests come back online.
|
||||||
|
- §3.5 — Rust equivalents for `smoke-a1.sh` / `smoke-a4.sh`.
|
||||||
|
- ADR-024 migration if §1.7 lands as accept.
|
||||||
|
|
||||||
This list shrinks as items resolve. Edit in place; don't append
|
This list shrinks as items resolve. Edit in place; don't append
|
||||||
a changelog.
|
a changelog.
|
||||||
|
|||||||
@@ -80,6 +80,12 @@ nats --server nats://localhost:30423 --user admin --password e2e-admin \
|
|||||||
request "device-commands.vm-device-00-<uuid8>.ping" ""
|
request "device-commands.vm-device-00-<uuid8>.ping" ""
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Or, if you don't want to install the `nats` binary:
|
||||||
|
|
||||||
|
```
|
||||||
|
alias natsbox='podman run --network=host --rm docker.io/natsio/nats-box:latest nats --server nats://localhost:30423 --user admin --password e2e-admin'
|
||||||
|
```
|
||||||
|
|
||||||
You should see something like `{"device_id":"vm-device-00-<uuid8>","agent_version":"0.1.0","uptime_s":12}`.
|
You should see something like `{"device_id":"vm-device-00-<uuid8>","agent_version":"0.1.0","uptime_s":12}`.
|
||||||
|
|
||||||
### Cleaning up
|
### Cleaning up
|
||||||
|
|||||||
@@ -29,6 +29,8 @@ use harmony_fleet_deploy::{FleetAgentScore, FleetNatsScore, FleetOperatorScore,
|
|||||||
name = "harmony-fleet-deploy",
|
name = "harmony-fleet-deploy",
|
||||||
about = "Deploy the harmony fleet stack to a Kubernetes cluster"
|
about = "Deploy the harmony fleet stack to a Kubernetes cluster"
|
||||||
)]
|
)]
|
||||||
|
// TODO: all env vars should be prefixed with HARMONY, and k8s namespaces should also begin
|
||||||
|
// with `harmony-`.
|
||||||
struct CliConfig {
|
struct CliConfig {
|
||||||
/// Namespace every component lands in. Production override comes
|
/// Namespace every component lands in. Production override comes
|
||||||
/// from `FLEET_NAMESPACE`.
|
/// from `FLEET_NAMESPACE`.
|
||||||
|
|||||||
@@ -92,6 +92,12 @@ impl FleetNatsScore {
|
|||||||
/// callout. The defaults are deliberately weak (`admin/e2e-admin`,
|
/// callout. The defaults are deliberately weak (`admin/e2e-admin`,
|
||||||
/// `device/e2e-device`); override with [`with_user_pass`].
|
/// `device/e2e-device`); override with [`with_user_pass`].
|
||||||
pub fn user_pass(namespace: impl Into<String>, node_port: u16) -> Self {
|
pub fn user_pass(namespace: impl Into<String>, node_port: u16) -> Self {
|
||||||
|
// TODO this should be behind a feature flag, this code should not exist in the
|
||||||
|
// production build
|
||||||
|
//
|
||||||
|
// Actually to make it simpler I would hardcode the dev credentials in the e2e crate
|
||||||
|
// and not the deployment crate. The e2e crate can easily use the score and pass it the
|
||||||
|
// proper config or use `.with_user_pass(...)`
|
||||||
Self {
|
Self {
|
||||||
namespace: namespace.into(),
|
namespace: namespace.into(),
|
||||||
release_name: "fleet-nats".to_string(),
|
release_name: "fleet-nats".to_string(),
|
||||||
|
|||||||
@@ -23,7 +23,7 @@ src/
|
|||||||
└── vm/ # VM-target harness
|
└── vm/ # VM-target harness
|
||||||
├── stack.rs # VmStack = infra Stack + Vec<VmDevice>
|
├── stack.rs # VmStack = infra Stack + Vec<VmDevice>
|
||||||
├── device.rs # one libvirt VM: ProvisionVmScore + FleetDeviceSetupScore
|
├── device.rs # one libvirt VM: ProvisionVmScore + FleetDeviceSetupScore
|
||||||
├── agent_build.rs # cross-build the agent for aarch64-unknown-linux-gnu
|
├── agent_build.rs # build the agent for the requested guest arch (aarch64 cross / x86_64 native)
|
||||||
└── network.rs # libvirt default-network gateway IP discovery
|
└── network.rs # libvirt default-network gateway IP discovery
|
||||||
```
|
```
|
||||||
|
|
||||||
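The `agent_build.rs` entry above covers two build paths: an aarch64 cross-build with `--target aarch64-unknown-linux-gnu`, and a native x86_64 build with no `--target`. A minimal sketch of how the cargo arguments and output location could differ per arch follows. `GuestArch`, `agent_build_args`, and `agent_binary_rel_path` are hypothetical names; the sketch also assumes the binary is named after the `harmony-fleet-agent` package and that the host itself is x86_64 for the native path.

```rust
/// Local stand-in for the arch type the real module imports (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GuestArch {
    Aarch64,
    X86_64,
}

const AARCH64_TRIPLE: &str = "aarch64-unknown-linux-gnu";

/// Hypothetical helper: arguments handed to `cargo` for the agent build.
fn agent_build_args(arch: GuestArch) -> Vec<&'static str> {
    let mut args = vec!["build", "--release"];
    // Only the cross-build needs an explicit target triple.
    if arch == GuestArch::Aarch64 {
        args.extend(["--target", AARCH64_TRIPLE]);
    }
    args.extend(["-p", "harmony-fleet-agent"]);
    args
}

/// Hypothetical helper: binary path relative to cargo's target directory.
/// Cargo nests output under the triple only when `--target` is passed.
fn agent_binary_rel_path(arch: GuestArch) -> String {
    match arch {
        GuestArch::Aarch64 => format!("{}/release/harmony-fleet-agent", AARCH64_TRIPLE),
        GuestArch::X86_64 => "release/harmony-fleet-agent".to_string(),
    }
}
```

Keeping the argument list a pure function of the arch makes the difference between the two paths easy to unit-test without invoking cargo.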
@@ -32,9 +32,9 @@ Tests in `tests/` map 1:1 to scenarios:
|
|||||||
| File | What it asserts | Cost |
|
| File | What it asserts | Cost |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| `ping.rs` | Pod agent replies to `Verb::Ping` over NATS | ~30 s (k3d + image build) |
|
| `ping.rs` | Pod agent replies to `Verb::Ping` over NATS | ~30 s (k3d + image build) |
|
||||||
| `vm_ping.rs` | VM agent replies to `Verb::Ping` over NATS | aarch64 VM bring-up |
|
| `vm_ping.rs` | VM agent replies to `Verb::Ping` over NATS | ~75 s (x86 KVM) / ~7 min (aarch64 TCG) |
|
||||||
| `vm_isolation.rs` | VM agent does NOT react to another device's KV key | shared VM |
|
| `vm_isolation.rs` | VM agent does NOT react to another device's KV key | ~75 s (x86 KVM) / ~8 min (aarch64 TCG) |
|
||||||
| `vm_deploy_lifecycle.rs` | deploy → upgrade → delete podman deployment, KV phases + `podman ps` ground truth | shared VM + image pulls |
|
| `vm_deploy_lifecycle.rs` | deploy → upgrade → delete podman deployment, KV phases + `podman ps` ground truth | ~90 s (x86 KVM) / ~7-8 min (aarch64 TCG) |
|
||||||
|
|
||||||
## Env gates
|
## Env gates
|
||||||
|
|
||||||
@@ -43,8 +43,9 @@ Every test in this crate is gated so `cargo test --workspace` stays cheap.
|
|||||||
| Var | Purpose |
|
| Var | Purpose |
|
||||||
|---|---|
|
|---|---|
|
||||||
| `HARMONY_FLEET_E2E=1` | Enable the Pod-target test (`ping.rs`). Needs k3d + podman on PATH. |
|
| `HARMONY_FLEET_E2E=1` | Enable the Pod-target test (`ping.rs`). Needs k3d + podman on PATH. |
|
||||||
| `HARMONY_FLEET_VM_E2E=1` | Enable the VM-target tests (`vm_*`). Needs libvirt + qemu + aarch64 cross-toolchain. |
|
| `HARMONY_FLEET_VM_E2E=1` | Enable the VM-target tests (`vm_*`). Needs libvirt + qemu (+ aarch64 cross-toolchain when running the default arch). |
|
||||||
| `FLEET_E2E_KEEP=1` | Leave the k8s namespace + libvirt VM in place on test exit (debug). |
|
| `FLEET_E2E_KEEP=1` | Leave the k8s namespace + libvirt VM in place on test exit (debug). |
|
||||||
|
| `FLEET_E2E_VM_ARCH=x86_64` | Boot an x86_64 KVM guest instead of an aarch64 TCG guest. Default `aarch64` (production target). x86 runs ~3-4× faster — useful for iteration. |
|
||||||
| `RUST_LOG=...` | Standard tracing filter; default is `info`. |
|
| `RUST_LOG=...` | Standard tracing filter; default is `info`. |
|
||||||
|
|
||||||
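The gates in the table above come down to two decisions per test: whether to run at all, and which guest arch to boot. The sketch below shows one way to express both as pure functions over the raw variable value. It assumes a gate counts as enabled only when set to exactly `1` and that an unset `FLEET_E2E_VM_ARCH` means `aarch64`; the function and type names are hypothetical, not the crate's actual helpers.

```rust
/// Local stand-in for the guest arch selection (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum GuestArch {
    Aarch64,
    X86_64,
}

/// Assumption: a gate is on only when the variable is exactly "1".
fn gate_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads a gate such as `HARMONY_FLEET_VM_E2E` from the process environment.
fn env_gate(var: &str) -> bool {
    gate_enabled(std::env::var(var).ok().as_deref())
}

/// Maps the raw `FLEET_E2E_VM_ARCH` value to an arch, defaulting to aarch64.
fn parse_guest_arch(value: Option<&str>) -> Result<GuestArch, String> {
    match value {
        None | Some("aarch64") => Ok(GuestArch::Aarch64),
        Some("x86_64") => Ok(GuestArch::X86_64),
        Some(other) => Err(format!("unsupported FLEET_E2E_VM_ARCH: {other}")),
    }
}
```

A test body would then start with something like `if !env_gate("HARMONY_FLEET_VM_E2E") { return; }`, which keeps `cargo test --workspace` cheap when the gate is unset.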
## Running tests
|
## Running tests
|
||||||
@@ -55,25 +56,69 @@ Every test in this crate is gated so `cargo test --workspace` stays cheap.
|
|||||||
HARMONY_FLEET_E2E=1 cargo test -p harmony-fleet-e2e --test ping -- --nocapture
|
HARMONY_FLEET_E2E=1 cargo test -p harmony-fleet-e2e --test ping -- --nocapture
|
||||||
```
|
```
|
||||||
|
|
||||||
### VM-target (expensive, real podman + aarch64 boot)
|
### VM-target — pick aarch64 (prod parity) or x86_64 (fast iteration)
|
||||||
|
|
||||||
|
The same three tests run against either guest arch — flip
|
||||||
|
`FLEET_E2E_VM_ARCH`. Defaults to `aarch64` (Raspberry Pi target).
|
||||||
|
|
||||||
|
| Path | Guest CPU | Wall-clock for `vm_ping` (warm caches) | Use when |
|
||||||
|
|---|---|---|---|
|
||||||
|
| `FLEET_E2E_VM_ARCH=x86_64` | native KVM | **~75 s** | dev iteration loop |
|
||||||
|
| (default, `aarch64`) | qemu TCG emulation | **~7 min** | pre-push / CI / arch-drift catch |
|
||||||
|
|
||||||
|
CI **must** run aarch64 — even though x86 covers the logic, a new
|
||||||
|
crate dep with a broken aarch64 build or a podman call that segfaults
|
||||||
|
under TCG will only surface on the real target.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# One scenario at a time. Each test binary brings up its own VM
|
# ---- dev iteration loop (x86_64 KVM, ~3× faster end-to-end) ----
|
||||||
# (cargo runs each integration test file as a separate binary, so the
|
HARMONY_FLEET_VM_E2E=1 FLEET_E2E_VM_ARCH=x86_64 RUST_LOG=info \
|
||||||
# per-binary `shared_vm_stack` OnceCell does not amortize across binaries).
|
cargo test -p harmony-fleet-e2e --test vm_ping -- --nocapture
|
||||||
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info cargo test -p harmony-fleet-e2e --test vm_ping -- --nocapture
|
HARMONY_FLEET_VM_E2E=1 FLEET_E2E_VM_ARCH=x86_64 RUST_LOG=info \
|
||||||
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info cargo test -p harmony-fleet-e2e --test vm_isolation -- --nocapture
|
cargo test -p harmony-fleet-e2e --test vm_isolation -- --nocapture
|
||||||
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info cargo test -p harmony-fleet-e2e --test vm_deploy_lifecycle -- --nocapture
|
HARMONY_FLEET_VM_E2E=1 FLEET_E2E_VM_ARCH=x86_64 RUST_LOG=info \
|
||||||
|
cargo test -p harmony-fleet-e2e --test vm_deploy_lifecycle -- --nocapture
|
||||||
|
|
||||||
# All three sequentially:
|
# ---- pre-push / CI (aarch64 — production target) ----
|
||||||
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info cargo test -p harmony-fleet-e2e \
|
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info \
|
||||||
|
cargo test -p harmony-fleet-e2e --test vm_ping -- --nocapture
|
||||||
|
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info \
|
||||||
|
cargo test -p harmony-fleet-e2e --test vm_isolation -- --nocapture
|
||||||
|
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info \
|
||||||
|
cargo test -p harmony-fleet-e2e --test vm_deploy_lifecycle -- --nocapture
|
||||||
|
|
||||||
|
# ---- all three sequentially (each is a separate binary → its own VM bring-up) ----
|
||||||
|
HARMONY_FLEET_VM_E2E=1 FLEET_E2E_VM_ARCH=x86_64 RUST_LOG=info cargo test -p harmony-fleet-e2e \
|
||||||
--test vm_ping --test vm_isolation --test vm_deploy_lifecycle -- --nocapture --test-threads=1
|
--test vm_ping --test vm_isolation --test vm_deploy_lifecycle -- --nocapture --test-threads=1
|
||||||
|
|
||||||
# Everything in the crate at once (skips disabled, runs enabled):
|
# ---- everything in the crate at once (pod + vm, gates honored per-test) ----
|
||||||
HARMONY_FLEET_E2E=1 HARMONY_FLEET_VM_E2E=1 RUST_LOG=info \
|
HARMONY_FLEET_E2E=1 HARMONY_FLEET_VM_E2E=1 RUST_LOG=info \
|
||||||
cargo test -p harmony-fleet-e2e -- --nocapture --test-threads=1
|
cargo test -p harmony-fleet-e2e -- --nocapture --test-threads=1
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Wall-clock breakdown (measured on this host)
|
||||||
|
|
||||||
|
`vm_ping` from cold libvirt + cold cargo cache (one-time pain) to a
|
||||||
|
green test:
|
||||||
|
|
||||||
|
| Step | aarch64 TCG | x86_64 KVM | Speedup |
|
||||||
|
|---|---|---|---|
|
||||||
|
| Agent build (cold) | 85 s (cross) | 72 s (native) | 1.2× |
|
||||||
|
| qemu start → DHCP | 48 s | 9 s | 5.3× |
|
||||||
|
| sshd accepts | 9 s | <1 s | ≥10× |
|
||||||
|
| Ansible Python detect | 15 s | 1 s | 15× |
|
||||||
|
| `apt install podman + systemd-container` | **261 s** | **23 s** | **11.3×** |
|
||||||
|
| FleetDeviceSetup steps 3-7 + restart | ~50 s | ~4 s | ~12× |
|
||||||
|
| `wait_until_ready` ping retry | ~2 s | <1 s | 2× |
|
||||||
|
| **Total test future (`finished in …s`)** | **440 s** | **149 s** | **2.95×** |
|
||||||
|
|
||||||
|
The single biggest swing is `apt install podman` inside the guest:
|
||||||
|
4 min 21 s on TCG vs 23 s on KVM. The whole-test speedup is only 2.95×
|
||||||
|
because cold cargo cross-build and cargo native build are comparable
|
||||||
|
(~80 s either way) — the in-guest work is where the x86 path
|
||||||
|
collapses. **Warm-cache iteration is closer to 6× because the cargo
|
||||||
|
build vanishes.**
|
||||||
|
|
||||||
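The ratios in the breakdown can be re-derived from the measured seconds. The sketch below does that arithmetic for the rows that carry exact figures (the approximate `~` and `<1 s` rows are left out) and picks the step with the largest absolute saving. The constant and function names are illustrative only.

```rust
/// (step, aarch64 TCG seconds, x86_64 KVM seconds), copied from the table.
const STEPS: [(&str, f64, f64); 4] = [
    ("agent build (cold)", 85.0, 72.0),
    ("qemu start -> DHCP", 48.0, 9.0),
    ("Ansible Python detect", 15.0, 1.0),
    ("apt install podman + systemd-container", 261.0, 23.0),
];

/// Speedup of the KVM path over the TCG path for one measurement.
fn speedup(tcg_secs: f64, kvm_secs: f64) -> f64 {
    tcg_secs / kvm_secs
}

/// Step with the largest absolute wall-clock saving (TCG minus KVM).
fn biggest_swing() -> (&'static str, f64) {
    let mut best = STEPS[0];
    for step in STEPS {
        if step.1 - step.2 > best.1 - best.2 {
            best = step;
        }
    }
    (best.0, best.1 - best.2)
}
```

Ranking by absolute seconds rather than by ratio is what makes `apt install` the biggest swing: Ansible detection has the higher ratio (15×) but saves only 14 s.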
### Debugging a failed bring-up
|
### Debugging a failed bring-up
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
@@ -138,6 +183,3 @@ bring-up.
|
|||||||
`FleetNatsScore::user_pass` mode. The Zitadel-JWT path is
|
`FleetNatsScore::user_pass` mode. The Zitadel-JWT path is
|
||||||
exercised by `examples/fleet_e2e_demo` (currently `#[ignore]`'d
|
exercised by `examples/fleet_e2e_demo` (currently `#[ignore]`'d
|
||||||
pending a CI runner with full bring-up capacity).
|
pending a CI runner with full bring-up capacity).
|
||||||
- **x86_64 VM bring-up.** Locked to aarch64 because that's the
|
|
||||||
production target. An x86_64 fast-path can be added by widening
|
|
||||||
`VmStackOptions::arch`; out of scope today.
|
|
||||||
|
|||||||
@@ -1,26 +1,31 @@
|
|||||||
//! Cross-build the fleet agent binary for an aarch64 Linux guest.
|
//! Build the fleet agent binary for a target VM architecture.
|
||||||
//!
|
//!
|
||||||
//! Mirrors `fleet/scripts/smoke-a3-arm.sh` phase 2 in Rust: ensure
|
//! Two paths:
|
||||||
//! the `aarch64-unknown-linux-gnu` rustup target is installed, then
|
|
||||||
//! `cargo build --release --target aarch64-unknown-linux-gnu -p
|
|
||||||
//! harmony-fleet-agent`. Returns the path to the resulting binary
|
|
||||||
//! so `FleetDeviceSetupScore` can upload it.
|
|
||||||
//!
|
//!
|
||||||
//! Prereq the harness intentionally does **not** install for the
|
//! - **aarch64** — cross-build via `cargo build --release --target
|
||||||
//! operator: a working aarch64 GNU cross-toolchain on the host
|
//! aarch64-unknown-linux-gnu -p harmony-fleet-agent`. Requires the
|
||||||
//! (Arch: `aarch64-linux-gnu-gcc`; Debian/Ubuntu:
|
//! `aarch64-unknown-linux-gnu` rustup target *and* a GNU cross-linker
|
||||||
//! `gcc-aarch64-linux-gnu`). Without it, `cargo build` fails with
|
//! on the host (Arch: `aarch64-linux-gnu-gcc`; Debian/Ubuntu:
|
||||||
//! a link error we surface verbatim.
|
//! `gcc-aarch64-linux-gnu`). Mirrors `fleet/scripts/smoke-a3-arm.sh`
|
||||||
|
//! phase 2.
|
||||||
|
//! - **x86_64** — native host build via `cargo build --release -p
|
||||||
|
//! harmony-fleet-agent`. No `--target`, no rustup add, no
|
||||||
|
//! cross-linker. The same binary the Pod-target path consumes,
|
||||||
|
//! reused here for the faster-but-non-Pi VM smoke.
|
||||||
|
//!
|
||||||
|
//! The aarch64 path matches the production Raspberry Pi target byte
|
||||||
|
//! for byte; the x86_64 path is for fast-iteration tests where the
|
||||||
|
//! arch difference doesn't matter.
|
||||||
|
|
||||||
use std::path::{Path, PathBuf};
|
use std::path::{Path, PathBuf};
|
||||||
use std::process::Stdio;
|
use std::process::Stdio;
|
||||||
|
|
||||||
|
use harmony::topology::VmArchitecture;
|
||||||
use thiserror::Error;
|
use thiserror::Error;
|
||||||
use tokio::process::Command;
|
use tokio::process::Command;
|
||||||
|
|
||||||
/// Rust target triple used for the on-VM agent. aarch64-Linux-GNU
|
/// Rust target triple for the aarch64 cross-build.
|
||||||
/// matches the Ubuntu 24.04 cloud image the harness boots.
|
pub const AGENT_AARCH64_TARGET_TRIPLE: &str = "aarch64-unknown-linux-gnu";
|
||||||
pub const AGENT_TARGET_TRIPLE: &str = "aarch64-unknown-linux-gnu";
|
|
||||||
|
|
||||||
#[derive(Debug, Error)]
|
#[derive(Debug, Error)]
|
||||||
pub enum AgentBuildError {
|
pub enum AgentBuildError {
|
||||||
@@ -30,24 +35,36 @@ pub enum AgentBuildError {
|
|||||||
#[source]
|
#[source]
|
||||||
source: std::io::Error,
|
source: std::io::Error,
|
||||||
},
|
},
|
||||||
#[error("`rustup target add {AGENT_TARGET_TRIPLE}` failed (rc={rc}): {stderr}")]
|
#[error("`rustup target add {AGENT_AARCH64_TARGET_TRIPLE}` failed (rc={rc}): {stderr}")]
|
||||||
RustupAdd { rc: i32, stderr: String },
|
RustupAdd { rc: i32, stderr: String },
|
||||||
#[error(
|
#[error(
|
||||||
"`cargo build` for harmony-fleet-agent (target {AGENT_TARGET_TRIPLE}) failed (rc={rc}). \
|
"`cargo build` for harmony-fleet-agent (target {target}) failed (rc={rc}). \
|
||||||
The most common cause is a missing aarch64 GNU cross-linker — install one (Arch: \
|
For the aarch64 cross-build, the most common cause is a missing GNU cross-linker \
|
||||||
`aarch64-linux-gnu-gcc`; Debian/Ubuntu: `gcc-aarch64-linux-gnu`) and re-run."
|
(Arch: `aarch64-linux-gnu-gcc`; Debian/Ubuntu: `gcc-aarch64-linux-gnu`)."
|
||||||
)]
|
)]
|
||||||
CargoBuild { rc: i32 },
|
CargoBuild { target: String, rc: i32 },
|
||||||
#[error("agent binary not produced at expected path {path}")]
|
#[error("agent binary not produced at expected path {path}")]
|
||||||
MissingArtifact { path: String },
|
MissingArtifact { path: String },
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Build (or rebuild, cargo-cached) the aarch64 agent binary and
|
/// Build the fleet agent for the requested guest architecture and
|
||||||
/// return its on-disk path. Cheap on warm cache; first run is the
|
/// return its on-disk path. Routes to the arch-specific builder.
|
||||||
/// expensive one.
|
pub async fn build_agent_for(
|
||||||
|
arch: VmArchitecture,
|
||||||
|
workspace_root: &Path,
|
||||||
|
) -> Result<PathBuf, AgentBuildError> {
|
||||||
|
match arch {
|
||||||
|
VmArchitecture::Aarch64 => build_agent_for_aarch64(workspace_root).await,
|
||||||
|
VmArchitecture::X86_64 => build_agent_for_x86_64(workspace_root).await,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Cross-build for aarch64-Linux-GNU. The on-disk path lives under
|
||||||
|
/// `target/aarch64-unknown-linux-gnu/release/` so it doesn't collide
|
||||||
|
/// with the host's native build.
|
||||||
pub async fn build_agent_for_aarch64(workspace_root: &Path) -> Result<PathBuf, AgentBuildError> {
|
pub async fn build_agent_for_aarch64(workspace_root: &Path) -> Result<PathBuf, AgentBuildError> {
|
||||||
let rustup = Command::new("rustup")
|
let rustup = Command::new("rustup")
|
||||||
.args(["target", "add", AGENT_TARGET_TRIPLE])
|
.args(["target", "add", AGENT_AARCH64_TARGET_TRIPLE])
|
||||||
.stdout(Stdio::null())
|
.stdout(Stdio::null())
|
||||||
.stderr(Stdio::piped())
|
.stderr(Stdio::piped())
|
||||||
.output()
|
.output()
|
||||||
@@ -64,22 +81,19 @@ pub async fn build_agent_for_aarch64(workspace_root: &Path) -> Result<PathBuf, A
|
|||||||
}
|
}
|
||||||
|
|
||||||
tracing::info!(
|
tracing::info!(
|
||||||
target = AGENT_TARGET_TRIPLE,
|
target = AGENT_AARCH64_TARGET_TRIPLE,
|
||||||
"cargo build --release -p harmony-fleet-agent (cross-build)",
|
"cargo build --release -p harmony-fleet-agent (cross-build aarch64)",
|
||||||
);
|
);
|
||||||
let build = Command::new("cargo")
|
let build = Command::new("cargo")
|
||||||
.args([
|
.args([
|
||||||
"build",
|
"build",
|
||||||
"--release",
|
"--release",
|
||||||
"--target",
|
"--target",
|
||||||
AGENT_TARGET_TRIPLE,
|
AGENT_AARCH64_TARGET_TRIPLE,
|
||||||
"-p",
|
"-p",
|
||||||
"harmony-fleet-agent",
|
"harmony-fleet-agent",
|
||||||
])
|
])
|
||||||
.current_dir(workspace_root)
|
.current_dir(workspace_root)
|
||||||
// Inherit stderr so cargo's progress + any linker error
|
|
||||||
// lands on the test runner's console exactly as it would
|
|
||||||
// on the command line.
|
|
||||||
.stderr(Stdio::inherit())
|
.stderr(Stdio::inherit())
|
||||||
.stdout(Stdio::inherit())
|
.stdout(Stdio::inherit())
|
||||||
.status()
|
.status()
|
||||||
@@ -90,13 +104,51 @@ pub async fn build_agent_for_aarch64(workspace_root: &Path) -> Result<PathBuf, A
|
|||||||
})?;
|
})?;
|
||||||
if !build.success() {
|
if !build.success() {
|
||||||
return Err(AgentBuildError::CargoBuild {
|
return Err(AgentBuildError::CargoBuild {
|
||||||
|
target: AGENT_AARCH64_TARGET_TRIPLE.to_string(),
|
||||||
|
rc: build.code().unwrap_or(-1),
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
let bin = workspace_root
|
||||||
|
.join("target")
|
||||||
|
.join(AGENT_AARCH64_TARGET_TRIPLE)
|
||||||
|
.join("release")
|
||||||
|
.join("harmony-fleet-agent");
|
||||||
|
if !bin.exists() {
|
||||||
|
return Err(AgentBuildError::MissingArtifact {
|
||||||
|
path: bin.display().to_string(),
|
||||||
|
});
|
||||||
|
}
|
||||||
|
Ok(bin)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Native build for x86_64. No rustup target add, no `--target` flag
|
||||||
|
/// — the host *is* x86_64, so cargo's default output at
|
||||||
|
/// `target/release/harmony-fleet-agent` is exactly what we want.
|
||||||
|
/// Assumes the test harness runs on an x86_64 host; calling this on
|
||||||
|
/// a non-x86_64 host produces a host-arch binary that won't run in the guest.
|
||||||
|
pub async fn build_agent_for_x86_64(workspace_root: &Path) -> Result<PathBuf, AgentBuildError> {
|
||||||
|
tracing::info!("cargo build --release -p harmony-fleet-agent (native x86_64)");
|
||||||
|
let build = Command::new("cargo")
|
||||||
|
.args(["build", "--release", "-p", "harmony-fleet-agent"])
|
||||||
|
.current_dir(workspace_root)
|
||||||
|
.stderr(Stdio::inherit())
|
||||||
|
.stdout(Stdio::inherit())
|
||||||
|
.status()
|
||||||
|
.await
|
||||||
|
.map_err(|source| AgentBuildError::Spawn {
|
||||||
|
cmd: "cargo".to_string(),
|
||||||
|
source,
|
||||||
|
})?;
|
||||||
|
if !build.success() {
|
||||||
|
return Err(AgentBuildError::CargoBuild {
|
||||||
|
target: "x86_64-unknown-linux-gnu (native)".to_string(),
|
||||||
rc: build.code().unwrap_or(-1),
|
rc: build.code().unwrap_or(-1),
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
let bin = workspace_root
|
let bin = workspace_root
|
||||||
.join("target")
|
.join("target")
|
||||||
.join(AGENT_TARGET_TRIPLE)
|
|
||||||
.join("release")
|
.join("release")
|
||||||
.join("harmony-fleet-agent");
|
.join("harmony-fleet-agent");
|
||||||
if !bin.exists() {
|
if !bin.exists() {
|
||||||
|
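The two builders in this file differ mainly in where cargo places the artifact and in whether the host architecture matters. A minimal sketch of that mapping follows. `GuestArch` is a hypothetical stand-in for `harmony::topology::VmArchitecture`, the helper names are not part of the PR, and the path logic assumes the default `target/` directory (no `CARGO_TARGET_DIR` override).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `harmony::topology::VmArchitecture`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GuestArch {
    Aarch64,
    X86_64,
}

const AARCH64_TRIPLE: &str = "aarch64-unknown-linux-gnu";
const AGENT_BIN: &str = "harmony-fleet-agent";

/// Where cargo is expected to place the release binary for each arch:
/// cross-builds land under `target/<triple>/release/`, native builds
/// under `target/release/`.
fn expected_agent_path(arch: GuestArch, workspace_root: &Path) -> PathBuf {
    let target_dir = workspace_root.join("target");
    let profile_dir = match arch {
        GuestArch::Aarch64 => target_dir.join(AARCH64_TRIPLE).join("release"),
        GuestArch::X86_64 => target_dir.join("release"),
    };
    profile_dir.join(AGENT_BIN)
}

/// True when a plain `cargo build` (no `--target`) on this host would
/// produce a binary for the requested guest arch.
fn native_build_matches(arch: GuestArch) -> bool {
    let host = std::env::consts::ARCH;
    match arch {
        GuestArch::Aarch64 => host == "aarch64",
        GuestArch::X86_64 => host == "x86_64",
    }
}
```

A check like `native_build_matches(GuestArch::X86_64)` before the native build would turn the documented "assumes an x86_64 host" precondition into an early, explicit failure. Whether that is worth adding is a design choice for the harness.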
|||||||
@@ -22,10 +22,13 @@ pub mod device;
|
|||||||
pub mod network;
|
pub mod network;
|
||||||
pub mod stack;
|
pub mod stack;
|
||||||
|
|
||||||
pub use agent_build::{AGENT_TARGET_TRIPLE, AgentBuildError, build_agent_for_aarch64};
|
pub use agent_build::{
|
||||||
|
AGENT_AARCH64_TARGET_TRIPLE, AgentBuildError, build_agent_for, build_agent_for_aarch64,
|
||||||
|
build_agent_for_x86_64,
|
||||||
|
};
|
||||||
pub use device::{VmDevice, VmDeviceError, VmDeviceOptions};
|
pub use device::{VmDevice, VmDeviceError, VmDeviceOptions};
|
||||||
pub use network::{NetworkLookupError, libvirt_default_gateway_ip};
|
pub use network::{NetworkLookupError, libvirt_default_gateway_ip};
|
||||||
pub use stack::{
|
pub use stack::{
|
||||||
LIBVIRT_NETWORK, LIBVIRT_URI, VM_NAME_PREFIX, VmBringUpError, VmReadyError, VmStack,
|
ENV_VM_ARCH, LIBVIRT_NETWORK, LIBVIRT_URI, VM_NAME_PREFIX, VmBringUpError, VmReadyError,
|
||||||
VmStackOptions, shared_vm_stack,
|
VmStack, VmStackOptions, shared_vm_stack,
|
||||||
};
|
};
|
||||||
|
|||||||
@@ -27,7 +27,7 @@ use tokio::sync::OnceCell;
|
|||||||
use uuid::Uuid;
|
use uuid::Uuid;
|
||||||
|
|
||||||
use crate::stack::{BringUpError, NATS_NODE_PORT, Stack, StackOptions, shared_stack};
|
use crate::stack::{BringUpError, NATS_NODE_PORT, Stack, StackOptions, shared_stack};
|
||||||
use crate::vm::agent_build::{AgentBuildError, build_agent_for_aarch64};
|
use crate::vm::agent_build::{AgentBuildError, build_agent_for};
|
||||||
use crate::vm::device::{VmDevice, VmDeviceError, VmDeviceOptions};
|
use crate::vm::device::{VmDevice, VmDeviceError, VmDeviceOptions};
|
||||||
use crate::vm::network::{NetworkLookupError, libvirt_default_gateway_ip};
|
use crate::vm::network::{NetworkLookupError, libvirt_default_gateway_ip};
|
||||||
|
|
||||||
@@ -82,11 +82,34 @@ impl Default for VmStackOptions {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Env var that lets tests pick a guest arch at runtime without a
|
||||||
|
/// recompile. Accepts `aarch64`/`arm64` and `x86_64`/`x86-64`/`x86`/`amd64`.
|
||||||
|
/// Unset = defaults to aarch64 (production target).
|
||||||
|
pub const ENV_VM_ARCH: &str = "FLEET_E2E_VM_ARCH";
|
||||||
|
|
||||||
|
impl VmStackOptions {
|
||||||
|
/// Read env overrides (today: just [`ENV_VM_ARCH`]) and apply
|
||||||
|
/// them on top of [`Default`]. Returns the canonical "what the
|
||||||
|
/// test asked for" struct, so tests don't have to re-implement
|
||||||
|
/// env parsing.
|
||||||
|
pub fn from_env() -> Self {
|
||||||
|
let mut opts = Self::default();
|
||||||
|
if let Ok(raw) = std::env::var(ENV_VM_ARCH) {
|
||||||
|
match raw.to_ascii_lowercase().as_str() {
|
||||||
|
"aarch64" | "arm64" => opts.arch = VmArchitecture::Aarch64,
|
||||||
|
"x86_64" | "x86-64" | "x86" | "amd64" => opts.arch = VmArchitecture::X86_64,
|
||||||
|
other => panic!("{ENV_VM_ARCH}={other:?} not recognized — use aarch64 or x86_64"),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
opts
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
#[derive(Debug, Error)]
|
#[derive(Debug, Error)]
|
||||||
pub enum VmBringUpError {
|
pub enum VmBringUpError {
|
||||||
#[error("infra bring-up: {0}")]
|
#[error("infra bring-up: {0}")]
|
||||||
Infra(#[from] BringUpError),
|
Infra(#[from] BringUpError),
|
||||||
#[error("aarch64 agent cross-build: {0}")]
|
#[error("agent build: {0}")]
|
||||||
AgentBuild(#[from] AgentBuildError),
|
AgentBuild(#[from] AgentBuildError),
|
||||||
#[error("libvirt gateway IP discovery: {0}")]
|
#[error("libvirt gateway IP discovery: {0}")]
|
||||||
GatewayIp(#[from] NetworkLookupError),
|
GatewayIp(#[from] NetworkLookupError),
|
||||||
@@ -154,9 +177,11 @@ impl VmStack {
|
|||||||
// place.
|
// place.
|
||||||
let infra = shared_stack(StackOptions::infra_only()).await?;
|
let infra = shared_stack(StackOptions::infra_only()).await?;
|
||||||
|
|
||||||
// 2. Cross-build the aarch64 agent binary once for all VMs.
|
// 2. Build the agent binary for the requested guest arch.
|
||||||
|
// aarch64 cross-builds; x86_64 takes the host's native
|
||||||
|
// output.
|
||||||
let workspace_root = workspace_root_from_env();
|
let workspace_root = workspace_root_from_env();
|
||||||
let agent_binary = build_agent_for_aarch64(&workspace_root).await?;
|
let agent_binary = build_agent_for(opts.arch, &workspace_root).await?;
|
||||||
|
|
||||||
// 3. Discover the libvirt gateway IP so the VM can reach
|
// 3. Discover the libvirt gateway IP so the VM can reach
|
||||||
// the host's NATS NodePort.
|
// the host's NATS NodePort.
|
||||||
|
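The alias matching in `VmStackOptions::from_env` can be exercised in isolation if it is factored into a pure function. The sketch below is one way to do that, not the PR's code: `GuestArch` is a hypothetical stand-in for `VmArchitecture`, `parse_vm_arch` is an illustrative name, and it returns a `Result` where the original panics on an unrecognized value.

```rust
/// Hypothetical stand-in for `harmony::topology::VmArchitecture`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GuestArch {
    Aarch64,
    X86_64,
}

/// Map the raw `FLEET_E2E_VM_ARCH` value to an arch.
/// `None` (variable unset) falls back to aarch64, the production target.
/// Matching is case-insensitive and accepts the same aliases as the diff.
fn parse_vm_arch(raw: Option<&str>) -> Result<GuestArch, String> {
    let Some(raw) = raw else {
        return Ok(GuestArch::Aarch64);
    };
    match raw.to_ascii_lowercase().as_str() {
        "aarch64" | "arm64" => Ok(GuestArch::Aarch64),
        "x86_64" | "x86-64" | "x86" | "amd64" => Ok(GuestArch::X86_64),
        other => Err(format!(
            "FLEET_E2E_VM_ARCH={other:?} not recognized; use aarch64 or x86_64"
        )),
    }
}
```

A caller could feed it the environment with `parse_vm_arch(std::env::var("FLEET_E2E_VM_ARCH").ok().as_deref())` and decide at the call site whether an error should panic, as the test harness does today, or propagate.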
|||||||
@@ -51,7 +51,7 @@ async fn vm_agent_drives_full_deploy_lifecycle() -> anyhow::Result<()> {
|
|||||||
)
|
)
|
||||||
.try_init();
|
.try_init();
|
||||||
|
|
||||||
let stack = shared_vm_stack(VmStackOptions::default()).await?;
|
let stack = shared_vm_stack(VmStackOptions::from_env()).await?;
|
||||||
stack.print_debug_info();
|
stack.print_debug_info();
|
||||||
stack.wait_until_ready(Duration::from_secs(60)).await?;
|
stack.wait_until_ready(Duration::from_secs(60)).await?;
|
||||||
|
|
||||||
|
|||||||
@@ -50,7 +50,7 @@ async fn agent_ignores_other_devices_keys() -> anyhow::Result<()> {
|
|||||||
)
|
)
|
||||||
.try_init();
|
.try_init();
|
||||||
|
|
||||||
let stack = shared_vm_stack(VmStackOptions::default()).await?;
|
let stack = shared_vm_stack(VmStackOptions::from_env()).await?;
|
||||||
stack.print_debug_info();
|
stack.print_debug_info();
|
||||||
stack.wait_until_ready(Duration::from_secs(60)).await?;
|
stack.wait_until_ready(Duration::from_secs(60)).await?;
|
||||||
|
|
||||||
|
|||||||
@@ -37,7 +37,7 @@ async fn agent_on_vm_replies_to_ping() -> anyhow::Result<()> {
|
|||||||
)
|
)
|
||||||
.try_init();
|
.try_init();
|
||||||
|
|
||||||
let stack = shared_vm_stack(VmStackOptions::default()).await?;
|
let stack = shared_vm_stack(VmStackOptions::from_env()).await?;
|
||||||
stack.print_debug_info();
|
stack.print_debug_info();
|
||||||
|
|
||||||
// `FleetDeviceSetupScore` returns when the systemd unit is
|
// `FleetDeviceSetupScore` returns when the systemd unit is
|
||||||
|
|||||||