Phase 6: E2E Tests for OKD HA Cluster on KVM
Goal
Prove the full OKD bare-metal installation flow works end-to-end using KVM virtual machines. This is the ultimate validation of Harmony's core value proposition: declare an OKD cluster, point it at infrastructure, watch it materialize.
Prerequisites
- Phase 5 complete (test harness exists, k3d tests passing)
- feature/kvm-module merged to main
- A CI runner with libvirt/KVM access and nested virtualization support
Architecture
The KVM branch already has a kvm_okd_ha_cluster example that creates:
                 Host bridge (WAN)
                         |
                 +--------------------+
                 |      OPNsense      |  192.168.100.1
                 |   gateway + PXE    |
                 +--------+-----------+
                         |
           harmonylan (192.168.100.0/24)
     +---------+---------+---------+---------+
     |         |         |         |         |
+----+---+ +---+---+ +---+---+ +---+---+ +---+---+
|  cp0   | |  cp1  | |  cp2  | |worker0| |worker1|
|  .10   | |  .11  | |  .12  | |  .20  | |  .21  |
+--------+ +-------+ +-------+ +-------+ +---+---+
                                             |
                                       +-----+----+
                                       | worker2  |
                                       |   .22    |
                                       +----------+
The test needs to orchestrate this entire setup, wait for OKD to converge, and assert the cluster is healthy.
Tasks
6.1 Start with example_linux_vm — the simplest KVM test
Before tackling the full OKD stack, validate the KVM module itself with the simplest possible test:
// tests/e2e/tests/kvm_linux_vm.rs
#[tokio::test]
#[ignore] // Requires libvirt access; run with: cargo test -- --ignored
async fn test_linux_vm_boots_from_iso() {
    let executor = KvmExecutor::from_env().unwrap();

    // Create isolated network
    let network = NetworkConfig {
        name: "e2e-test-net".to_string(),
        bridge: "virbr200".to_string(),
        // ...
    };
    executor.ensure_network(&network).await.unwrap();

    // Define and start VM
    let vm_config = VmConfig::builder("e2e-linux-test")
        .vcpus(1)
        .memory_gb(1)
        .disk(5)
        .network(NetworkRef::named("e2e-test-net"))
        .cdrom("https://releases.ubuntu.com/24.04/ubuntu-24.04-live-server-amd64.iso")
        .boot_order([BootDevice::Cdrom, BootDevice::Disk])
        .build();
    executor.ensure_vm(&vm_config).await.unwrap();
    executor.start_vm("e2e-linux-test").await.unwrap();

    // Assert VM is running
    let status = executor.vm_status("e2e-linux-test").await.unwrap();
    assert_eq!(status, VmStatus::Running);

    // Cleanup
    executor.destroy_vm("e2e-linux-test").await.unwrap();
    executor.undefine_vm("e2e-linux-test").await.unwrap();
    executor.delete_network("e2e-test-net").await.unwrap();
}
This test validates:
- ISO download works (via harmony_assets if refactored, or built-in KVM module download)
- libvirt XML generation is correct
- VM lifecycle (define → start → status → destroy → undefine)
- Network creation/deletion
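The lifecycle in the list above can be read as a small state machine, which also explains why cleanup calls both destroy and undefine: destroy only stops a persistent domain, while undefine removes its definition. The following is a deliberately strict sketch of that model. `DomainState`, `DomainOp`, `apply` and `run` are illustrative names, not types from the KVM module, and real libvirt accepts some transitions this model rejects.

```rust
/// Simplified model of a persistent libvirt domain's lifecycle.
/// Illustrative only: these are not the KVM module's types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DomainState {
    Undefined, // libvirt has no definition for the domain
    ShutOff,   // defined but not running
    Running,   // defined and running
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DomainOp {
    Define,
    Start,
    Destroy,
    Undefine,
}

/// Returns the next state, or an error message when the operation is not
/// allowed in the current state of this simplified model.
pub fn apply(state: DomainState, op: DomainOp) -> Result<DomainState, String> {
    use DomainOp::*;
    use DomainState::*;
    match (state, op) {
        (Undefined, Define) => Ok(ShutOff),
        (ShutOff, Start) => Ok(Running),
        (Running, Destroy) => Ok(ShutOff),
        (ShutOff, Undefine) => Ok(Undefined),
        (s, o) => Err(format!("cannot apply {:?} in state {:?}", o, s)),
    }
}

/// Applies a sequence of operations starting from `Undefined`.
pub fn run(ops: &[DomainOp]) -> Result<DomainState, String> {
    ops.iter()
        .try_fold(DomainState::Undefined, |state, &op| apply(state, op))
}
```

In this model, running define, start, destroy, undefine returns to `Undefined`, whereas stopping after destroy leaves the domain in `ShutOff`, which is the state a leaked VM definition would be in.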
6.2 OKD HA Cluster E2E test
The full integration test. This is long-running (30-60 minutes) and should only run nightly or on-demand.
// tests/e2e/tests/kvm_okd_ha.rs
#[tokio::test]
#[ignore] // Requires KVM + significant resources. Run nightly.
async fn test_okd_ha_cluster_on_kvm() {
    // 1. Create virtual infrastructure
    //    - OPNsense gateway VM
    //    - 3 control plane VMs
    //    - 3 worker VMs
    //    - Virtual network (harmonylan)

    // 2. Run OKD installation scores
    //    (the kvm_okd_ha_cluster example, but as a test)

    // 3. Wait for OKD API server to become reachable
    //    - Poll https://api.okd.harmonylan:6443 until it responds
    //    - Timeout: 30 minutes

    // 4. Assert cluster health
    //    - All nodes in Ready state
    //    - ClusterVersion reports Available=True
    //    - Sample workload (nginx) deploys and pod reaches Running

    // 5. Cleanup
    //    - Destroy all VMs
    //    - Delete virtual networks
    //    - Clean up disk images
}
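Step 3 of the outline is a poll-with-deadline loop. One way it could look is sketched below using only the standard library. `poll_until` is a hypothetical helper, not something the test harness is known to provide, and in the async test `tokio::time::sleep` would take the place of `std::thread::sleep`.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `check` repeatedly, waiting `interval` between attempts, until it
/// returns `Some(value)` or `timeout` has elapsed.
/// Hypothetical helper for illustration; not part of the Harmony harness.
pub fn poll_until<T>(
    timeout: Duration,
    interval: Duration,
    mut check: impl FnMut() -> Option<T>,
) -> Result<T, String> {
    let deadline = Instant::now() + timeout;
    let mut attempts = 0u32;
    loop {
        attempts += 1;
        if let Some(value) = check() {
            return Ok(value);
        }
        // The deadline is checked after each attempt, so `check` always
        // runs at least once, even with a zero timeout.
        if Instant::now() >= deadline {
            return Err(format!("timed out after {} attempts", attempts));
        }
        sleep(interval);
    }
}
```

For the API server check, the closure would perform one HTTPS request and return `Some(())` on any response, with a 30-minute timeout and an interval of a few seconds. Returning a value from the closure also lets the same helper serve step 4, for example by returning the node list once every node reports Ready.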
6.3 CI runner requirements
The KVM E2E test needs a runner with:
- Hardware: 32GB+ RAM, 8+ CPU cores, 100GB+ disk
- Software: libvirt, QEMU/KVM, virsh, nested virtualization enabled
- Network: Outbound internet access (to download ISOs, OKD images)
- Permissions: User in libvirt group, or root access
Options:
- Dedicated bare-metal machine registered as a self-hosted GitHub Actions runner
- Cloud VM with nested virt (e.g., GCP n2-standard-8 with --enable-nested-virtualization)
- Manual trigger only: developer runs locally, CI just tracks pass/fail
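Whichever option is chosen, a preflight check at the start of the test run could fail fast on a runner that lacks KVM, before any ISO is downloaded. The sketch below assumes a Linux host and reads the standard `/dev/kvm` device node and the `nested` parameter of the `kvm_intel` or `kvm_amd` kernel module. The function names are hypothetical and not part of Harmony.

```rust
use std::fs;
use std::path::Path;

/// Interprets the contents of a KVM module's `nested` parameter file.
/// Depending on the module and kernel version, an enabled value is
/// reported as "Y" or "1".
pub fn nested_enabled(contents: &str) -> bool {
    matches!(contents.trim(), "Y" | "y" | "1")
}

/// Best-effort check that this host exposes KVM and has nested
/// virtualization switched on. Hypothetical helper for illustration.
pub fn host_supports_nested_kvm() -> bool {
    // Without /dev/kvm, QEMU would fall back to slow software emulation.
    if !Path::new("/dev/kvm").exists() {
        return false;
    }
    [
        "/sys/module/kvm_intel/parameters/nested",
        "/sys/module/kvm_amd/parameters/nested",
    ]
    .iter()
    .filter_map(|path| fs::read_to_string(path).ok()) // skip modules that are not loaded
    .any(|contents| nested_enabled(&contents))
}
```

A test could call `host_supports_nested_kvm()` first and report a clear "runner not KVM-capable" message instead of failing later inside libvirt.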
6.4 Nightly CI job
# .github/workflows/e2e-kvm.yml
name: E2E KVM Tests

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM daily
  workflow_dispatch:      # Manual trigger

jobs:
  kvm-tests:
    runs-on: [self-hosted, kvm]  # Label for KVM-capable runners
    timeout-minutes: 90
    steps:
      - uses: actions/checkout@v4

      - name: Run KVM E2E tests
        run: cargo test -p harmony-e2e-tests -- --ignored --test-threads=1
        env:
          RUST_LOG: info
          HARMONY_KVM_URI: qemu:///system

      - name: Cleanup VMs on failure
        if: failure()
        run: |
          virsh list --all --name | grep e2e | xargs -I {} virsh destroy {} || true
          virsh list --all --name | grep e2e | xargs -I {} virsh undefine {} --remove-all-storage || true
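The cleanup step above selects VMs with `grep e2e`, which matches the substring anywhere in a name. A force-cleanup tool could instead match on a prefix so that unrelated VMs on a shared runner are left alone. The sketch below assumes every test resource is named with an `e2e-` prefix, as in the smoke test; the function names are hypothetical.

```rust
use std::process::Command;

/// Filters the output of `virsh list --all --name` (one domain name per
/// line, possibly with blank lines) down to names starting with `prefix`.
pub fn leaked_domains(virsh_output: &str, prefix: &str) -> Vec<String> {
    virsh_output
        .lines()
        .map(str::trim)
        .filter(|name| !name.is_empty() && name.starts_with(prefix))
        .map(String::from)
        .collect()
}

/// Best-effort removal of every domain whose name starts with `prefix`.
/// Returns the names it attempted to clean up. Hypothetical helper.
pub fn force_cleanup(prefix: &str) -> std::io::Result<Vec<String>> {
    let listing = Command::new("virsh")
        .args(["list", "--all", "--name"])
        .output()?;
    let names = leaked_domains(&String::from_utf8_lossy(&listing.stdout), prefix);
    for name in &names {
        // Errors are ignored: the domain may already be stopped or gone.
        let _ = Command::new("virsh")
            .args(["destroy", name.as_str()])
            .output();
        let _ = Command::new("virsh")
            .args(["undefine", name.as_str(), "--remove-all-storage"])
            .output();
    }
    Ok(names)
}
```

The same filter could be applied to `virsh net-list --all --name` to find leaked networks.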
6.5 Test resource management
KVM tests create real resources that must be cleaned up even on failure. Implement a test fixture pattern:
struct KvmTestFixture {
    executor: KvmExecutor,
    vms: Vec<String>,
    networks: Vec<String>,
}

impl KvmTestFixture {
    fn track_vm(&mut self, name: &str) { self.vms.push(name.to_string()); }
    fn track_network(&mut self, name: &str) { self.networks.push(name.to_string()); }
}

impl Drop for KvmTestFixture {
    fn drop(&mut self) {
        // Best-effort cleanup of all tracked resources
        for vm in &self.vms {
            let _ = std::process::Command::new("virsh")
                .args(["destroy", vm]).output();
            let _ = std::process::Command::new("virsh")
                .args(["undefine", vm, "--remove-all-storage"]).output();
        }
        for net in &self.networks {
            let _ = std::process::Command::new("virsh")
                .args(["net-destroy", net]).output();
            let _ = std::process::Command::new("virsh")
                .args(["net-undefine", net]).output();
        }
    }
}
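The fixture relies on Rust running `Drop` while a panic unwinds, which is what a failed `assert!` triggers. The sketch below demonstrates that property with a stand-in fixture that records names instead of calling virsh. `RecordingFixture` and `failing_test_body` are illustrative names only.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for KvmTestFixture that records which VMs it would clean up
/// instead of shelling out to virsh. Illustrative only.
pub struct RecordingFixture {
    vms: Vec<String>,
    cleaned: Arc<Mutex<Vec<String>>>,
}

impl RecordingFixture {
    pub fn new(cleaned: Arc<Mutex<Vec<String>>>) -> Self {
        RecordingFixture { vms: Vec::new(), cleaned }
    }

    pub fn track_vm(&mut self, name: &str) {
        self.vms.push(name.to_string());
    }
}

impl Drop for RecordingFixture {
    fn drop(&mut self) {
        // Tolerate a poisoned lock: cleanup must never panic itself.
        let mut log = self.cleaned.lock().unwrap_or_else(|e| e.into_inner());
        for vm in &self.vms {
            log.push(vm.clone());
        }
    }
}

/// Simulates a test that tracks a VM and then fails an assertion.
pub fn failing_test_body(cleaned: Arc<Mutex<Vec<String>>>) {
    let mut fixture = RecordingFixture::new(cleaned);
    fixture.track_vm("e2e-linux-test");
    panic!("simulated assertion failure");
}
```

Running `failing_test_body` under `std::panic::catch_unwind` leaves the VM name in the shared log, showing that cleanup ran despite the panic. `Drop` does not run when the process is killed (for example by the CI job timeout) or when the crate is built with `panic = "abort"`, so the CI cleanup step and a force-cleanup script remain necessary as a backstop.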
Deliverables
- test_linux_vm_boots_from_iso: passing KVM smoke test
- test_okd_ha_cluster_on_kvm: full OKD installation test
- KvmTestFixture with resource cleanup on test failure
- Nightly CI job on KVM-capable runner
- Force-cleanup script for leaked VMs/networks
- Documentation: how to set up a KVM runner for E2E tests