Compare commits

..

9 Commits

Author SHA1 Message Date
18d8ba2210 feat: refactored okd cluster alerting example into an opinionated high level score 2026-03-09 22:44:12 -04:00
c5b292d99b fix: dependencies and formatting 2026-03-09 22:25:16 -04:00
0258b31fd2 e2e tests module ready for review, k3d test works well 2026-03-09 22:17:28 -04:00
4407792bd5 chore: use async trait instead of ugly types 2026-03-09 21:59:57 -04:00
7978a63004 wip: harmony e2e test module coming along 2026-03-09 21:54:12 -04:00
58d00c95bb Review new test module and slightly improve testing roadmap 2026-03-09 21:01:47 -04:00
7d14f7646c fix(e2e): fix compilation errors in multicluster test
- multicluster_postgres test was incomplete, simplified to placeholder
- Added todo!() for multi-cluster PostgreSQL test to be implemented later
2026-03-09 20:15:41 -04:00
69dd763d6e feat(e2e): initial e2e test runner with k3d and cnpg tests
- Add harmony_e2e_tests crate with CLI test runner
- k3d_cluster test: provisions k3d cluster and verifies nodes
- cnpg_postgres test: deploys CNPG operator, creates PostgreSQL
  cluster, waits for readiness, executes SQL query
- multicluster_postgres test: placeholder for next iteration
2026-03-09 19:39:59 -04:00
2e46ac3418 e2e tests wip 2026-03-09 19:29:22 -04:00
20 changed files with 2491 additions and 32 deletions

View File

@@ -0,0 +1,548 @@
# CI and Testing Strategy for Harmony
## Executive Summary
Harmony aims to become a CNCF project, requiring a robust CI pipeline that demonstrates real-world reliability. The goal is to run **all examples** in CI, from simple k3d deployments to full HA OKD clusters on bare metal. This document provides context for designing and implementing this testing infrastructure.
---
## Project Context
### What is Harmony?
Harmony is an infrastructure automation framework that is **code-first and code-only**. Operators write Rust programs to declare and drive infrastructure, rather than YAML files or DSL configs. Key differentiators:
1. **Compile-time safety**: The type system prevents "config-is-valid-but-platform-is-wrong" errors
2. **Topology abstraction**: Write once, deploy to any environment (local k3d, OKD, bare metal, cloud)
3. **Capability-based design**: Scores declare what they need; topologies provide what they have
### Core Abstractions
| Concept | Description |
|---------|-------------|
| **Score** | Declarative description of desired state (the "what") |
| **Topology** | Logical representation of infrastructure (the "where") |
| **Capability** | A feature a topology offers (the "how") |
| **Interpret** | Execution logic connecting Score to Topology |
### Compile-Time Verification
```rust
// This compiles only if K8sAnywhereTopology provides K8sclient + HelmCommand
impl<T: Topology + K8sclient + HelmCommand> Score<T> for MyScore { ... }
// This FAILS to compile when K8sResourceScore is run against LinuxHostTopology,
// because LinuxHostTopology does not implement K8sclient
// (intentionally broken example for testing)
impl<T: Topology + K8sclient> Score<T> for K8sResourceScore { ... }
// error[E0277]: the trait bound `LinuxHostTopology: K8sclient` is not satisfied
```
---
## Current Examples Inventory
### Summary Statistics
| Category | Count | CI Complexity |
|----------|-------|---------------|
| k3d-compatible | 22 | Low - single k3d cluster |
| OKD-specific | 4 | Medium - requires OKD cluster |
| Bare metal | 5 | High - requires physical infra or nested virtualization |
| Multi-cluster | 3 | High - requires multiple K8s clusters |
| No infra needed | 4 | Trivial - local only |
### Detailed Example Classification
#### Tier 1: k3d-Compatible (22 examples)
Can run on a local k3d cluster with minimal setup:
| Example | Topology | Capabilities | Special Notes |
|---------|----------|--------------|---------------|
| zitadel | K8sAnywhereTopology | K8sClient, HelmCommand | SSO/Identity |
| node_health | K8sAnywhereTopology | K8sClient | Health checks |
| public_postgres | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Needs ingress |
| openbao | K8sAnywhereTopology | K8sClient, HelmCommand | Vault alternative |
| rust | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Webapp deployment |
| cert_manager | K8sAnywhereTopology | K8sClient, CertificateManagement | TLS certificates |
| try_rust_webapp | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Full webapp |
| monitoring | K8sAnywhereTopology | K8sClient, HelmCommand, Observability | Prometheus |
| application_monitoring_with_tenant | K8sAnywhereTopology | K8sClient, HelmCommand, TenantManager, Observability | Multi-tenant |
| monitoring_with_tenant | K8sAnywhereTopology | K8sClient, HelmCommand, TenantManager, Observability | Multi-tenant |
| postgresql | K8sAnywhereTopology | K8sClient, HelmCommand | CloudNativePG |
| ntfy | K8sAnywhereTopology | K8sClient, HelmCommand | Notifications |
| tenant | K8sAnywhereTopology | K8sClient, TenantManager | Namespace isolation |
| lamp | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | LAMP stack |
| k8s_drain_node | K8sAnywhereTopology | K8sClient | Node operations |
| k8s_write_file_on_node | K8sAnywhereTopology | K8sClient | Node operations |
| remove_rook_osd | K8sAnywhereTopology | K8sClient | Ceph operations |
| validate_ceph_cluster_health | K8sAnywhereTopology | K8sClient | Ceph health |
| kube-rs | Direct kube | K8sClient | Raw kube-rs demo |
| brocade_snmp_server | K8sAnywhereTopology | K8sClient | SNMP collector |
| harmony_inventory_builder | LocalhostTopology | None | Network scanning |
| cli | LocalhostTopology | None | CLI demo |
#### Tier 2: OKD/OpenShift-Specific (4 examples)
Require OKD/OpenShift features not available in vanilla K8s:
| Example | Topology | OKD-Specific Feature |
|---------|----------|---------------------|
| okd_cluster_alerts | K8sAnywhereTopology | OpenShift Monitoring CRDs |
| operatorhub_catalog | K8sAnywhereTopology | OpenShift OperatorHub |
| rhob_application_monitoring | K8sAnywhereTopology | RHOB (Red Hat Observability) |
| nats-supercluster | K8sAnywhereTopology | OKD Routes (OpenShift Ingress) |
#### Tier 3: Bare Metal Infrastructure (5 examples)
Require physical hardware or full virtualization:
| Example | Topology | Physical Requirements |
|---------|----------|----------------------|
| okd_installation | HAClusterTopology | OPNSense, Brocade switch, PXE boot, 3+ nodes |
| okd_pxe | HAClusterTopology | OPNSense, Brocade switch, PXE infrastructure |
| sttest | HAClusterTopology | Full HA cluster with all network services |
| opnsense | OPNSenseFirewall | OPNSense firewall access |
| opnsense_node_exporter | Custom | OPNSense firewall |
#### Tier 4: Multi-Cluster (3 examples)
Require multiple K8s clusters:
| Example | Topology | Clusters Required |
|---------|----------|-------------------|
| nats | K8sAnywhereTopology × 2 | 2 clusters with NATS gateways |
| nats-module | DecentralizedTopology | 3 clusters for supercluster |
| multisite_postgres | FailoverTopology | 2 clusters for replication |
---
## Testing Categories
### 1. Compile-Time Tests
These tests verify that the type system correctly rejects invalid configurations:
```rust
// Should NOT compile - K8sResourceScore on LinuxHostTopology
#[test]
#[compile_fail]
fn test_k8s_score_on_linux_host() {
let score = K8sResourceScore::new();
let topology = LinuxHostTopology::new();
// This line should fail to compile
harmony_cli::run(Inventory::empty(), topology, vec![Box::new(score)], None);
}
// Should compile - K8sResourceScore on K8sAnywhereTopology
#[test]
fn test_k8s_score_on_k8s_topology() {
let score = K8sResourceScore::new();
let topology = K8sAnywhereTopology::from_env();
// This should compile
harmony_cli::run(Inventory::empty(), topology, vec![Box::new(score)], None);
}
```
**Implementation Options:**
- `trybuild` crate for compile-time failure tests
- Separate `tests/compile_fail/` directory with expected error messages
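With `trybuild`, each case under `tests/compile_fail/` pairs a `.rs` file with a `.stderr` expectation file. A hypothetical expectation for the broken example above (the exact diagnostic text depends on the real trait and type names):

```
error[E0277]: the trait bound `LinuxHostTopology: K8sclient` is not satisfied
 --> tests/compile_fail/k8s_score_on_linux_host.rs:7:5
  |
7 |     harmony_cli::run(Inventory::empty(), topology, vec![Box::new(score)], None);
  |     ^^^^^^^^^^^^^^^^ the trait `K8sclient` is not implemented for `LinuxHostTopology`
```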
### 2. Unit Tests
Pure Rust logic without external dependencies:
- Score serialization/deserialization
- Inventory parsing
- Type conversions
- CRD generation
**Requirements:**
- No external services
- Sub-second execution
- Run on every PR
### 3. Integration Tests (k3d)
Deploy to a local k3d cluster:
**Setup:**
```bash
# Install k3d
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
# Create cluster
k3d cluster create harmony-test \
  --agents 3 \
  --k3s-arg "--disable=traefik@server:0"
# Wait for ready
kubectl wait --for=condition=Ready nodes --all --timeout=120s
```
**Test Matrix:**
| Example | k3d | Test Type |
|---------|-----|-----------|
| zitadel | ✅ | Deploy + health check |
| cert_manager | ✅ | Deploy + certificate issuance |
| monitoring | ✅ | Deploy + metric collection |
| postgresql | ✅ | Deploy + database connectivity |
| tenant | ✅ | Namespace creation + isolation |
### 4. Integration Tests (OKD)
Deploy to OKD/OpenShift cluster:
**Options:**
1. **Nested virtualization**: Run OKD in VMs (slow, expensive)
2. **CRC (CodeReady Containers)**: Single-node OKD (resource intensive)
3. **Managed OpenShift**: AWS/Azure/GCP (costly)
4. **Existing cluster**: Connect to pre-provisioned cluster (fastest)
**Test Matrix:**
| Example | OKD Required | Test Type |
|---------|--------------|-----------|
| okd_cluster_alerts | ✅ | Alert rule deployment |
| rhob_application_monitoring | ✅ | RHOB stack deployment |
| operatorhub_catalog | ✅ | Operator installation |
### 5. End-to-End Tests (Full Infrastructure)
Complete infrastructure deployment including bare metal:
**Options:**
1. **Libvirt + KVM**: Virtual machines on CI runner
2. **Nested KVM**: KVM inside KVM (for cloud CI)
3. **Dedicated hardware**: Physical test lab
4. **Mock/Hybrid**: Mock physical components, real K8s
---
## CI Environment Options
### Option A: GitHub Actions (Current Standard)
**Pros:**
- Native GitHub integration
- Large runner ecosystem
- Free for open source
**Cons:**
- Limited nested virtualization support
- 6-hour job timeout
- Resource constraints on free runners
**Matrix:**
```yaml
strategy:
  matrix:
    os: [ubuntu-latest]
    rust: [stable, beta]
    k8s: [k3d, kind]
    tier: [unit, k3d-integration]
```
### Option B: Self-Hosted Runners
**Pros:**
- Full control over environment
- Can run nested virtualization
- No time limits
- Persistent state between runs
**Cons:**
- Maintenance overhead
- Cost of infrastructure
- Security considerations
**Setup:**
- Bare metal servers with KVM support
- Pre-installed k3d, kind, CRC
- OPNSense VM for network tests
### Option C: Hybrid (GitHub + Self-Hosted)
**Pros:**
- Fast unit tests on GitHub runners
- Heavy tests on self-hosted infrastructure
- Cost-effective
**Cons:**
- Two CI systems to maintain
- Complexity in test distribution
### Option D: Cloud CI (CircleCI, GitLab CI, etc.)
**Pros:**
- Often better resource options
- Docker-in-Docker support
- Better nested virtualization
**Cons:**
- Cost
- Less GitHub-native
---
## Performance Requirements
### Target Execution Times
| Test Category | Target Time | Current (est.) |
|---------------|-------------|----------------|
| Compile-time tests | < 30s | Unknown |
| Unit tests | < 60s | Unknown |
| k3d integration (per example) | < 120s | 60-300s |
| Full k3d matrix | < 15 min | 30-60 min |
| OKD integration | < 30 min | 1-2 hours |
| Full E2E | < 2 hours | 4-8 hours |
### Sub-Second Performance Strategies
1. **Parallel execution**: Run independent tests concurrently
2. **Incremental testing**: Only run affected tests on changes
3. **Cached clusters**: Pre-warm k3d clusters
4. **Layered testing**: Fail fast on cheaper tests
5. **Mock external services**: Fake Discord webhooks, etc.
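The Discord-webhook mock, for instance, needs nothing more than a throwaway HTTP listener that records what the alert sender posts. A minimal std-only sketch (the `spawn_mock_webhook` helper is hypothetical, not part of Harmony):

```rust
use std::io::{Read, Write};
use std::net::TcpListener;
use std::sync::mpsc;
use std::thread;

/// Hypothetical mock webhook: accepts one HTTP POST, replies 204,
/// and hands the raw request back to the test over a channel.
fn spawn_mock_webhook() -> (String, mpsc::Receiver<String>) {
    // Port 0 lets the OS pick a free port, so parallel tests never collide
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind");
    let addr = listener.local_addr().expect("addr");
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        if let Ok((mut stream, _)) = listener.accept() {
            let mut buf = [0u8; 4096];
            let n = stream.read(&mut buf).unwrap_or(0);
            let _ = tx.send(String::from_utf8_lossy(&buf[..n]).into_owned());
            // Discord returns 204 No Content on successful webhook delivery
            let _ = stream.write_all(b"HTTP/1.1 204 No Content\r\ncontent-length: 0\r\n\r\n");
        }
    });
    (format!("http://{addr}/webhook"), rx)
}
```

A test points the alert receiver's webhook URL at the returned address, then asserts on the captured request body instead of hitting the real Discord API.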
---
## Test Data and Secrets Management
### Secrets Required
| Secret | Use | Storage |
|--------|-----|---------|
| Discord webhook URL | Alert receiver tests | GitHub Secrets |
| OPNSense credentials | Network tests | Self-hosted only |
| Cloud provider creds | Multi-cloud tests | Vault / GitHub Secrets |
| TLS certificates | Ingress tests | Generated on-the-fly |
### Test Data
| Data | Source | Strategy |
|------|--------|----------|
| Container images | Public registries | Cache locally |
| Helm charts | Public repos | Vendor in repo |
| K8s manifests | Generated | Dynamic |
---
## Proposed Test Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ harmony_e2e_tests Package │
│ (cargo run -p harmony_e2e_tests) │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ Compile │ │ Unit │ │ Compile-Fail Tests │ │
│ │ Tests │ │ Tests │ │ (trybuild) │ │
│ │ < 30s │ │ < 60s │ │ < 30s │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ k3d Integration Tests │ │
│ │ Self-provisions k3d cluster, runs 22 examples │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ zitadel │ │ cert-mgr│ │ monitor │ │ postgres│ ... │ │
│ │ │ 60s │ │ 90s │ │ 120s │ │ 90s │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ │ Parallel Execution │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ OKD Integration Tests │ │
│ │ Connects to existing OKD cluster or provisions via KVM │ │
│ │ ┌─────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ okd_cluster_ │ │ rhob_application_ │ │ │
│ │ │ alerts (5 min) │ │ monitoring (10 min) │ │ │
│ │ └─────────────────┘ └─────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ KVM-based E2E Tests │ │
│ │ Uses Harmony's KVM module to provision test VMs │ │
│ │ ┌─────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ okd_installation│ │ Full HA cluster deployment │ │ │
│ │ │ (30-60 min) │ │ (60-120 min) │ │ │
│ │ └─────────────────┘ └─────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Any CI system (GitHub Actions, GitLab CI, Jenkins, cron) just runs:
cargo run -p harmony_e2e_tests
```
```
┌─────────────────────────────────────────────────────────────────┐
│                         GitHub Actions                          │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐     │
│ │ Compile     │ │ Unit        │ │ Compile-Fail Tests      │     │
│ │ Tests       │ │ Tests       │ │ (trybuild)              │     │
│ │ < 30s       │ │ < 60s       │ │ < 30s                   │     │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘     │
│ ┌───────────────────────────────────────────────────────────┐   │
│ │                  k3d Integration Tests                    │   │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐           │   │
│ │ │ zitadel │ │ cert-mgr│ │ monitor │ │ postgres│ ...       │   │
│ │ │  60s    │ │  90s    │ │  120s   │ │  90s    │           │   │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘           │   │
│ │                  Parallel Execution                       │   │
│ └───────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                       Self-Hosted Runners                       │
├─────────────────────────────────────────────────────────────────┤
│ ┌───────────────────────────────────────────────────────────┐   │
│ │                  OKD Integration Tests                    │   │
│ │ ┌─────────────────┐ ┌─────────────────────────────┐       │   │
│ │ │ okd_cluster_    │ │ rhob_application_           │       │   │
│ │ │ alerts (5 min)  │ │ monitoring (10 min)         │       │   │
│ │ └─────────────────┘ └─────────────────────────────┘       │   │
│ └───────────────────────────────────────────────────────────┘   │
│ ┌───────────────────────────────────────────────────────────┐   │
│ │        KVM-based E2E Tests (Harmony provisions)           │   │
│ │ ┌─────────────────────────────────────────────────────┐   │   │
│ │ │ Harmony KVM Module provisions test VMs              │   │   │
│ │ │  - OKD HA Cluster (3 control plane, 2 workers)      │   │   │
│ │ │  - OPNSense VM (router/firewall)                    │   │   │
│ │ │  - Brocade simulator VM                             │   │   │
│ │ └─────────────────────────────────────────────────────┘   │   │
│ └───────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
```
---
## Questions for Researchers
### Critical Questions
1. **Self-contained test runner**: How to design `harmony_e2e_tests` package that runs all tests with a single `cargo run` command?
2. **Nested Virtualization**: What are the prerequisites for running KVM inside a test environment?
3. **Cost Optimization**: How to minimize cloud costs while running comprehensive E2E tests?
4. **Test Isolation**: How to ensure test isolation when running parallel k3d tests?
5. **State Management**: Should we persist k3d clusters between test runs, or create fresh each time?
6. **Mocking Strategy**: Which external services (Discord, OPNSense, etc.) should be mocked vs. real?
7. **Compile-Fail Tests**: Best practices for testing Rust compile-time errors?
8. **Multi-Cluster Tests**: How to efficiently provision and connect multiple K8s clusters in tests?
9. **Secrets Management**: How to handle secrets for test environments without external CI dependencies?
10. **Test Flakiness**: Strategies for reducing flakiness in infrastructure tests?
11. **Reporting**: How to present test results for complex multi-environment test matrices?
12. **Prerequisite Detection**: How to detect and validate prerequisites (Docker, k3d, KVM) before running tests?
### Research Areas
1. **CI/CD Tools**: Evaluate GitHub Actions, GitLab CI, CircleCI, Tekton, Prow for Harmony's needs
2. **K8s Test Tools**: Evaluate kind, k3d, minikube, microk8s for local testing
3. **Mock Frameworks**: Evaluate mock-server, wiremock, hoverfly for external service mocking
4. **Test Frameworks**: Evaluate built-in Rust test, nextest, cargo-tarpaulin for performance
---
## Success Criteria
### Week 1 (Agentic Velocity)
- [ ] Compile-time verification tests working
- [ ] Unit tests for monitoring module
- [ ] First 5 k3d examples running in CI
- [ ] Mock framework for Discord webhooks
### Week 2
- [ ] All 22 k3d-compatible examples in CI
- [ ] OKD self-hosted runner operational
- [ ] KVM module reviewed and ready for CI
### Week 3-4
- [ ] Full E2E tests with KVM infrastructure
- [ ] Multi-cluster tests automated
- [ ] All examples tested in CI
### Month 2
- [ ] Sub-15-minute total CI time
- [ ] Weekly E2E tests on bare metal
- [ ] Documentation complete
- [ ] Ready for CNCF submission
---
## Prerequisites
### Hardware Requirements
| Component | Minimum | Recommended |
|-----------|---------|------------|
| CPU | 4 cores | 8+ cores (for parallel tests) |
| RAM | 8 GB | 32 GB (for KVM E2E) |
| Disk | 50 GB SSD | 500 GB NVMe |
### Software Requirements
| Tool | Version |
|------|---------|
| Rust | 1.75+ |
| Docker | 24.0+ |
| k3d | v5.6.0+ |
| kubectl | v1.28+ |
| libvirt | 9.0.0+ (for KVM tests) |
### Installation (One-time)
```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install Docker
curl -fsSL https://get.docker.com | sh
# Install k3d
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
# Install kubectl
curl -LO "https://dl.k8s.io/release/v1.28.0/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
```
---
## Reference Materials
### Existing Code
- Examples: `examples/*/src/main.rs`
- Topologies: `harmony/src/domain/topology/`
- Capabilities: `harmony/src/domain/topology/` (trait definitions)
- Scores: `harmony/src/modules/*/`
### Documentation
- [Coding Guide](docs/coding-guide.md)
- [Core Concepts](docs/concepts.md)
- [Monitoring Architecture](docs/monitoring.md)
- [ADR-020: Monitoring](adr/020-monitoring-alerting-architecture.md)
### Related Projects
- Crossplane (similar abstraction model)
- Pulumi (infrastructure as code)
- Terraform (state management patterns)
- Flux/ArgoCD (GitOps testing patterns)

201
CI_and_testing_roadmap.md Normal file
View File

@@ -0,0 +1,201 @@
# Pragmatic CI and Testing Roadmap for Harmony
**Status**: Active implementation (March 2026)
**Core Principle**: Self-contained test runner — no dependency on centralized CI servers
All tests are executable via one command:
```bash
cargo run -p harmony_e2e_tests
```
The `harmony_e2e_tests` package:
- Provisions its own infrastructure when needed (k3d, KVM VMs)
- Runs all test tiers in sequence or selectively
- Reports results in text, JSON or JUnit XML
- Works identically on developer laptops, any Linux server, GitHub Actions, GitLab CI, Jenkins, cron jobs, etc.
- Is the single source of truth for what "passing CI" means
## Why This Approach
1. **Portability** — same command & behavior everywhere
2. **Harmony tests Harmony** — the framework validates itself
3. **No vendor lock-in** — GitHub Actions / GitLab CI are just triggers
4. **Perfect reproducibility** — developers reproduce any CI failure locally in seconds
5. **Offline capable** — after initial setup, most tiers run without internet
## Architecture: `harmony_e2e_tests` Package
```
harmony_e2e_tests/
├── Cargo.toml
├── src/
│ ├── main.rs # CLI entry point
│ ├── lib.rs # Test runner core logic
│ ├── tiers/
│ │ ├── mod.rs
│ │ ├── compile_fail.rs # trybuild-based compile-time checks
│ │ ├── unit.rs # cargo test --lib --workspace
│ │ ├── k3d.rs # k3d cluster + parallel example runs
│ │ ├── okd.rs # connect to existing OKD cluster
│ │ └── kvm.rs # full E2E via Harmony's own KVM module
│ ├── mocks/
│ │ ├── mod.rs
│ │ ├── discord.rs # mock Discord webhook receiver
│ │ └── opnsense.rs # mock OPNSense firewall API
│ └── infrastructure/
│ ├── mod.rs
│ ├── k3d.rs # k3d cluster lifecycle
│ └── kvm.rs # helper wrappers around KVM score
└── tests/
├── ui/ # trybuild compile-fail cases (*.rs + *.stderr)
└── fixtures/ # static test data / golden files
```
## CLI Interface (clap-based)
```bash
# Run everything (default)
cargo run -p harmony_e2e_tests
# Specific tier
cargo run -p harmony_e2e_tests -- --tier k3d
cargo run -p harmony_e2e_tests -- --tier compile
# Filter to one example
cargo run -p harmony_e2e_tests -- --tier k3d --example monitoring
# Parallelism control (k3d tier)
cargo run -p harmony_e2e_tests -- --parallel 8
# Reporting
cargo run -p harmony_e2e_tests -- --report junit.xml
cargo run -p harmony_e2e_tests -- --format json
# Debug helpers
cargo run -p harmony_e2e_tests -- --verbose --dry-run
```
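The JUnit output requested via `--report junit.xml` is simple enough to emit by hand. A sketch of the reporting shape (the `TestResult` struct is hypothetical, standing in for whatever the runner collects):

```rust
/// Hypothetical per-test result collected by the runner.
struct TestResult {
    name: String,
    seconds: f64,
    failure: Option<String>,
}

/// Render results as minimal JUnit XML that GitHub/GitLab check UIs can ingest.
fn to_junit(suite: &str, results: &[TestResult]) -> String {
    let failures = results.iter().filter(|r| r.failure.is_some()).count();
    let mut xml = format!(
        "<testsuite name=\"{suite}\" tests=\"{}\" failures=\"{failures}\">\n",
        results.len()
    );
    for r in results {
        xml.push_str(&format!("  <testcase name=\"{}\" time=\"{:.1}\"", r.name, r.seconds));
        match &r.failure {
            // Failed test: emit a nested <failure> element
            Some(msg) => xml.push_str(&format!(">\n    <failure message=\"{msg}\"/>\n  </testcase>\n")),
            // Passing test: self-closing element is enough
            None => xml.push_str("/>\n"),
        }
    }
    xml.push_str("</testsuite>\n");
    xml
}
```

(A real implementation would also XML-escape names and messages; this sketch omits that for brevity.)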
## Test Tiers Ordered by Speed & Cost
| Tier | Duration target | Runner type | What it tests | Isolation strategy |
|------------------|------------------|----------------------|----------------------------------------------------|-----------------------------|
| Compile-fail | < 20 s | Any (GitHub free) | Invalid configs don't compile | Per-file trybuild |
| Unit | < 60 s | Any | Pure Rust logic | cargo test |
| k3d              | 8–15 min         | GitHub / self-hosted | 22+ k3d-compatible examples                        | Fresh k3d cluster + ns-per-example |
| OKD              | 10–30 min        | Self-hosted / CRC    | OKD-specific features (Routes, Monitoring CRDs…)   | Existing cluster via KUBECONFIG |
| KVM Full E2E     | 60–180 min       | Self-hosted bare-metal | Full HA OKD install + bare-metal scenarios       | Harmony KVM score provisions VMs |
### Tier Details & Implementation Notes
1. **Compile-fail**
Uses **`trybuild`** crate (standard in Rust ecosystem).
Place intentional compile errors in `tests/ui/*.rs` with matching `*.stderr` expectation files.
One test function replaces the old custom loop:
```rust
#[test]
fn ui() {
    let t = trybuild::TestCases::new();
    t.compile_fail("tests/ui/*.rs");
}
```
2. **Unit**
Simple wrapper: `cargo test --lib --workspace -- --nocapture`
Consider `cargo-nextest` later for 2–3× speedup if test count grows.
3. **k3d**
- Provisions isolated cluster once at start (`k3d cluster create --agents 3 --no-lb --k3s-arg "--disable=traefik@server:0"`)
- Discovers examples via `[package.metadata.harmony.test-tier = "k3d"]` in `Cargo.toml`
- Runs in parallel with tokio semaphore (default 5–8 slots)
- Each example gets its own namespace
- Uses `defer` / `scopeguard` for guaranteed cleanup
- Mocks Discord webhook and OPNSense API
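The bounded-parallelism idea behind the semaphore is easy to sketch. Here it is with std threads and a channel-based counting semaphore instead of tokio, purely for illustration (`run_bounded` and its job strings are hypothetical):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Run `jobs` with at most `slots` running concurrently -- a std-only
/// stand-in for the tokio semaphore the real runner would use.
fn run_bounded(jobs: Vec<String>, slots: usize) -> Vec<String> {
    let (permit_tx, permit_rx) = mpsc::channel();
    // Pre-fill the "semaphore" with `slots` permits
    for _ in 0..slots {
        permit_tx.send(()).unwrap();
    }
    let permit_rx = Arc::new(Mutex::new(permit_rx));
    let (done_tx, done_rx) = mpsc::channel();
    let total = jobs.len();
    for job in jobs {
        let permit_rx = permit_rx.clone();
        let permit_tx = permit_tx.clone();
        let done_tx = done_tx.clone();
        thread::spawn(move || {
            // Acquire a permit; the lock guard drops immediately after recv
            permit_rx.lock().unwrap().recv().unwrap();
            // ... here the real runner would deploy the example into its own namespace ...
            done_tx.send(format!("{job}: ok")).unwrap();
            permit_tx.send(()).unwrap(); // release the permit
        });
    }
    (0..total).map(|_| done_rx.recv().unwrap()).collect()
}
```

The per-example namespace plus guaranteed cleanup then keeps these concurrent runs from stepping on each other.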
4. **OKD**
Connects to pre-provisioned cluster via `KUBECONFIG`.
Validates it is actually OpenShift/OKD before proceeding.
5. **KVM**
Uses **Harmony's own KVM module** to provision test VMs (control-plane + workers + OPNSense).
→ True “dogfooding” — if the E2E fails, the KVM score itself is likely broken.
## CI Integration Patterns
### Fast PR validation (GitHub Actions)
```yaml
name: Fast Tests
on: [push, pull_request]
jobs:
  fast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - name: Install Docker & k3d
        uses: nolar/setup-k3d-k3s@v1
      - run: cargo run -p harmony_e2e_tests -- --tier compile,unit,k3d --report junit.xml
      - uses: actions/upload-artifact@v4
        with: { name: test-results, path: junit.xml }
```
### Nightly / Merge heavy tests (self-hosted runner)
```yaml
name: Full E2E
on:
  schedule: [{ cron: "0 3 * * *" }]
  push: { branches: [main] }
jobs:
  full:
    runs-on: [self-hosted, linux, x64, kvm-capable]
    steps:
      - uses: actions/checkout@v4
      - run: cargo run -p harmony_e2e_tests -- --tier okd,kvm --verbose --report junit.xml
```
## Prerequisites Auto-Check & Install
```rust
// in harmony_e2e_tests/src/infrastructure/prerequisites.rs
async fn ensure_k3d() -> Result<()> { … } // curl | bash if missing
async fn ensure_docker() -> Result<()> { … }
fn check_kvm_support() -> Result<()> { … } // /dev/kvm + libvirt
```
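The shape of such a check is just a probe with `std::process::Command`. A sketch (the function name mirrors the hypothetical ones above; `--version` is a reasonable probe for docker, k3d, and kubectl):

```rust
use std::process::{Command, Stdio};

/// Returns true if `name --version` can be spawned and exits successfully.
/// Used to probe for docker, k3d, kubectl, ... before any tier starts.
fn command_available(name: &str) -> bool {
    Command::new(name)
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|s| s.success())
        .unwrap_or(false) // spawn error => binary not on PATH
}
```

Failing fast here, with a message naming the missing tool, is what makes the single-command runner usable on a fresh laptop.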
## Success Criteria
### Step 1
- [ ] `harmony_e2e_tests` package created & basic CLI working
- [ ] trybuild compile-fail suite passing
- [ ] First 8–10 k3d examples running reliably in CI
- [ ] Mock server for Discord webhook completed
### Step 2
- [ ] All 22 k3d-compatible examples green
- [ ] OKD tier running on dedicated self-hosted runner
- [ ] JUnit reporting + GitHub check integration
- [ ] Namespace isolation + automatic retry on transient k8s errors
### Step 3
- [ ] KVM full E2E green on bare-metal runner (nightly)
- [ ] Multi-cluster examples (nats, multisite-postgres) automated
- [ ] Total fast CI time < 12 minutes on GitHub runners
- [ ] Documentation: “How to add a new tested example”
## Quick Start for New Contributors
```bash
# One-time setup
rustup update stable
cargo install cargo-nextest  # optional but recommended (trybuild is a dev-dependency, not an installable binary)
# Run locally (most common)
cargo run -p harmony_e2e_tests -- --tier k3d --verbose
# Just compile checks + unit
cargo test -p harmony_e2e_tests
```

624
Cargo.lock generated
View File

@@ -297,6 +297,12 @@ dependencies = [
"libc",
]
[[package]]
name = "ansi_term"
version = "0.10.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6b3568b48b7cefa6b8ce125f9bb4989e52fbcc29ebea88df04cc7c5f12f70455"
[[package]]
name = "anstream"
version = "0.6.21"
@@ -718,6 +724,41 @@ dependencies = [
"tokio",
]
[[package]]
name = "brocade-snmp-server"
version = "0.1.0"
dependencies = [
"base64 0.22.1",
"brocade",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_secret",
"harmony_types",
"log",
"serde",
"tokio",
"url",
]
[[package]]
name = "brocade-switch"
version = "0.1.0"
dependencies = [
"async-trait",
"brocade",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"serde",
"tokio",
"url",
]
[[package]]
name = "brotli"
version = "8.0.2"
@@ -871,6 +912,22 @@ dependencies = [
"shlex",
]
[[package]]
name = "cert_manager"
version = "0.1.0"
dependencies = [
"assert_cmd",
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "cfg-if"
version = "1.0.4"
@@ -1853,6 +1910,12 @@ dependencies = [
"regex",
]
[[package]]
name = "env_home"
version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c7f84e12ccf0a7ddc17a6c41c93326024c42920d7ee630d04950e6926645c0fe"
[[package]]
name = "env_logger"
version = "0.11.9"
@@ -1929,6 +1992,457 @@ dependencies = [
name = "example"
version = "0.0.0"
[[package]]
name = "example-application-monitoring-with-tenant"
version = "0.1.0"
dependencies = [
"env_logger",
"harmony",
"harmony_cli",
"harmony_types",
"logging",
"tokio",
"url",
]
[[package]]
name = "example-cli"
version = "0.1.0"
dependencies = [
"assert_cmd",
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "example-k8s-drain-node"
version = "0.1.0"
dependencies = [
"assert_cmd",
"cidr",
"env_logger",
"harmony",
"harmony-k8s",
"harmony_cli",
"harmony_macros",
"harmony_types",
"inquire 0.7.5",
"log",
"tokio",
"url",
]
[[package]]
name = "example-k8s-write-file-on-node"
version = "0.1.0"
dependencies = [
"assert_cmd",
"cidr",
"env_logger",
"harmony",
"harmony-k8s",
"harmony_cli",
"harmony_macros",
"harmony_types",
"inquire 0.7.5",
"log",
"tokio",
"url",
]
[[package]]
name = "example-kube-rs"
version = "0.1.0"
dependencies = [
"cidr",
"env_logger",
"harmony",
"harmony_macros",
"http 1.4.0",
"inquire 0.7.5",
"k8s-openapi",
"kube",
"log",
"serde_yaml",
"tokio",
"url",
]
[[package]]
name = "example-lamp"
version = "0.1.0"
dependencies = [
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "example-monitoring"
version = "0.1.0"
dependencies = [
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"tokio",
"url",
]
[[package]]
name = "example-monitoring-with-tenant"
version = "0.1.0"
dependencies = [
"cidr",
"harmony",
"harmony_cli",
"harmony_types",
"tokio",
"url",
]
[[package]]
name = "example-multisite-postgres"
version = "0.1.0"
dependencies = [
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "example-nats"
version = "0.1.0"
dependencies = [
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "example-nats-module-supercluster"
version = "0.1.0"
dependencies = [
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"k8s-openapi",
"log",
"tokio",
"url",
]
[[package]]
name = "example-nats-supercluster"
version = "0.1.0"
dependencies = [
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"k8s-openapi",
"log",
"tokio",
"url",
]
[[package]]
name = "example-node-health"
version = "0.1.0"
dependencies = [
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
]
[[package]]
name = "example-ntfy"
version = "0.1.0"
dependencies = [
"harmony",
"harmony_cli",
"tokio",
"url",
]
[[package]]
name = "example-okd-cluster-alerts"
version = "0.1.0"
dependencies = [
"brocade",
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_secret",
"harmony_secret_derive",
"harmony_types",
"log",
"serde",
"tokio",
"url",
]
[[package]]
name = "example-okd-install"
version = "0.1.0"
dependencies = [
"brocade",
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_secret",
"harmony_secret_derive",
"harmony_types",
"log",
"schemars 0.8.22",
"serde",
"tokio",
"url",
]
[[package]]
name = "example-openbao"
version = "0.1.0"
dependencies = [
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"tokio",
"url",
]
[[package]]
name = "example-operatorhub-catalogsource"
version = "0.1.0"
dependencies = [
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "example-opnsense"
version = "0.1.0"
dependencies = [
"brocade",
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_secret",
"harmony_types",
"log",
"schemars 0.8.22",
"serde",
"tokio",
"url",
]
[[package]]
name = "example-opnsense-node-exporter"
version = "0.1.0"
dependencies = [
"async-trait",
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_secret",
"harmony_secret_derive",
"harmony_types",
"log",
"serde",
"tokio",
"url",
]
[[package]]
name = "example-postgresql"
version = "0.1.0"
dependencies = [
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "example-public-postgres"
version = "0.1.0"
dependencies = [
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "example-pxe"
version = "0.1.0"
dependencies = [
"brocade",
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_secret",
"harmony_secret_derive",
"harmony_types",
"log",
"schemars 0.8.22",
"serde",
"tokio",
"url",
]
[[package]]
name = "example-remove-rook-osd"
version = "0.1.0"
dependencies = [
"harmony",
"harmony_cli",
"tokio",
]
[[package]]
name = "example-rust"
version = "0.1.0"
dependencies = [
"base64 0.22.1",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "example-tenant"
version = "0.1.0"
dependencies = [
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "example-try-rust-webapp"
version = "0.1.0"
dependencies = [
"base64 0.22.1",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "example-tui"
version = "0.1.0"
dependencies = [
"cidr",
"env_logger",
"harmony",
"harmony_macros",
"harmony_tui",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "example-zitadel"
version = "0.1.0"
dependencies = [
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"tokio",
"url",
]
[[package]]
name = "example_validate_ceph_cluster_health"
version = "0.1.0"
dependencies = [
"harmony",
"harmony_cli",
"tokio",
]
[[package]]
name = "eyre"
version = "0.6.12"
@@ -2540,6 +3054,30 @@ dependencies = [
"tokio",
]
[[package]]
name = "harmony_e2e_tests"
version = "0.1.0"
dependencies = [
"async-trait",
"chrono",
"clap",
"env_logger",
"harmony",
"inventory",
"k3d-rs",
"k8s-openapi",
"kube",
"log",
"serde",
"serde_json",
"sqlx",
"tempfile",
"thiserror 2.0.18",
"tokio",
"tokio-stream",
"which",
]
[[package]]
name = "harmony_execution"
version = "0.1.0"
@@ -2569,6 +3107,19 @@ dependencies = [
"tokio",
]
[[package]]
name = "harmony_inventory_builder"
version = "0.1.0"
dependencies = [
"cidr",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"tokio",
"url",
]
[[package]]
name = "harmony_macros"
version = "0.1.0"
@@ -3333,6 +3884,15 @@ dependencies = [
"thiserror 1.0.69",
]
[[package]]
name = "inventory"
version = "0.3.22"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "009ae045c87e7082cb72dab0ccd01ae075dd00141ddc108f43a0ea150a9e7227"
dependencies = [
"rustversion",
]
[[package]]
name = "ipnet"
version = "2.12.0"
@@ -3732,6 +4292,15 @@ dependencies = [
"log",
]
[[package]]
name = "logging"
version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "461a8beca676e8ab1bd468c92e9b4436d6368e11e96ae038209e520cfe665e46"
dependencies = [
"ansi_term",
]
[[package]]
name = "lru"
version = "0.12.5"
@@ -4954,6 +5523,21 @@ dependencies = [
"subtle",
]
[[package]]
name = "rhob-application-monitoring"
version = "0.1.0"
dependencies = [
"base64 0.22.1",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_types",
"log",
"tokio",
"url",
]
[[package]]
name = "ring"
version = "0.17.14"
@@ -5927,6 +6511,7 @@ dependencies = [
"memchr",
"once_cell",
"percent-encoding",
"rustls 0.23.37",
"serde",
"serde_json",
"sha2",
@@ -5936,6 +6521,7 @@ dependencies = [
"tokio-stream",
"tracing",
"url",
"webpki-roots 0.26.11",
]
[[package]]
@@ -6208,6 +6794,26 @@ dependencies = [
"syn 2.0.117",
]
[[package]]
name = "sttest"
version = "0.1.0"
dependencies = [
"brocade",
"cidr",
"env_logger",
"harmony",
"harmony_cli",
"harmony_macros",
"harmony_secret",
"harmony_secret_derive",
"harmony_types",
"log",
"schemars 0.8.22",
"serde",
"tokio",
"url",
]
[[package]]
name = "subtle"
version = "2.6.1"
@@ -7210,6 +7816,18 @@ dependencies = [
"rustls-pki-types",
]
[[package]]
name = "which"
version = "7.0.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "24d643ce3fd3e5b54854602a080f34fb10ab75e0b813ee32d00ca2b44fa74762"
dependencies = [
"either",
"env_home",
"rustix 1.1.4",
"winsafe",
]
[[package]]
name = "whoami"
version = "1.6.1"
@@ -7585,6 +8203,12 @@ dependencies = [
"windows-sys 0.48.0",
]
[[package]]
name = "winsafe"
version = "0.0.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d135d17ab770252ad95e9a872d365cf3090e3be864a34ab46f48555993efc904"
[[package]]
name = "wit-bindgen"
version = "0.51.0"

View File

@@ -2,6 +2,7 @@
resolver = "2"
members = [
"private_repos/*",
"examples/*",
"harmony",
"harmony_types",
"harmony_macros",
@@ -16,9 +17,12 @@ members = [
"harmony_secret_derive",
"harmony_secret",
"adr/agent_discovery/mdns",
"brocade",
"harmony_agent",
"harmony_agent/deploy", "harmony_node_readiness", "harmony-k8s",
"brocade",
"harmony_agent",
"harmony_agent/deploy",
"harmony_node_readiness",
"harmony-k8s",
"harmony_e2e_tests",
]
[workspace.package]

299
docs/coding-guide.md Normal file
View File

@@ -0,0 +1,299 @@
# Harmony Coding Guide
Harmony is an infrastructure automation framework. It is **code-first and code-only**: operators write Rust programs to declare and drive infrastructure, rather than YAML files or DSL configs. Good code here means a good operator experience.
### Concrete context
This guide uses the KVM module as its running example. The same principles translate directly to the other modules and contexts Harmony manages, such as OPNsense and Kubernetes.
## Core Philosophy
### The Careful Craftsman Principle
Harmony is a powerful framework that does a lot. With that power comes responsibility. Every abstraction, every trait, every module must earn its place. Before adding anything, ask:
1. **Does this solve a real problem users have?** Not a theoretical problem, an actual one encountered in production.
2. **Is this the simplest solution that works?** Complexity is a cost that compounds over time.
3. **Will this make the next developer's life easier or harder?** Code is read far more often than written.
When in doubt, don't abstract. Wait for the pattern to emerge from real usage. A little duplication is better than the wrong abstraction.
### High-level functions over raw primitives
Callers should not need to know about underlying protocols, XML schemas, or API quirks. A function that deploys a VM should accept meaningful parameters like CPU count, memory, and network name — not XML strings.
```rust
// Bad: caller constructs XML and passes it to a thin wrapper
let xml = format!(r#"<domain type='kvm'>...</domain>"#, name, memory_kb, ...);
executor.create_vm(&xml).await?;
// Good: caller describes intent, the module handles representation
executor.define_vm(&VmConfig::builder("my-vm")
.cpu(4)
.memory_gb(8)
.disk(DiskConfig::new(50))
.network(NetworkRef::named("mylan"))
.boot_order([BootDevice::Network, BootDevice::Disk])
.build())
.await?;
```
The module owns the XML, the virsh invocations, the API calls — not the caller.
### Use the right abstraction layer
Prefer native library bindings over shelling out to CLI tools. The `virt` crate provides direct libvirt bindings and should be used instead of spawning `virsh` subprocesses.
- CLI subprocess calls are fragile: stdout/stderr parsing, exit codes, quoting, PATH differences
- Native bindings give typed errors, no temp files, no shell escaping
- `virt::connect::Connect` opens a connection; `virt::domain::Domain` manages VMs; `virt::network::Network` manages virtual networks
### Keep functions small and well-named
Each function should do one thing. If a function is doing two conceptually separate things, split it. Function names should read like plain English: `ensure_network_active`, `define_vm`, `vm_is_running`.
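As a dependency-free sketch of what such a split looks like (the names `parse_memory_spec` and `to_kib` are hypothetical, not actual Harmony APIs), a function that both parses a memory spec string and converts units is doing two jobs; each piece below has one reason to change and can be tested alone:

```rust
/// Parse a human-readable memory spec like "8G" or "512M" into (value, unit).
fn parse_memory_spec(spec: &str) -> Result<(u64, char), String> {
    let unit = spec.chars().last().ok_or("empty spec")?;
    let value: u64 = spec[..spec.len() - 1]
        .parse()
        .map_err(|_| format!("invalid number in '{spec}'"))?;
    Ok((value, unit))
}

/// Convert a parsed (value, unit) pair into KiB, the unit libvirt expects.
fn to_kib(value: u64, unit: char) -> Result<u64, String> {
    match unit {
        'M' => Ok(value * 1024),
        'G' => Ok(value * 1024 * 1024),
        other => Err(format!("unsupported unit '{other}'")),
    }
}

fn main() {
    let (value, unit) = parse_memory_spec("8G").unwrap();
    println!("{}", to_kib(value, unit).unwrap()); // prints 8388608
}
```

Each function now reads like plain English at the call site, and a bug in unit conversion cannot hide behind parsing logic.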
### Prefer short modules over large files
Group related types and functions by concept. A module that handles one resource (e.g., network, domain, storage) is better than a single file for everything.
---
## Error Handling
### Use `thiserror` for all error types
Define error types with `thiserror::Error`. This removes the boilerplate of implementing `Display` and `std::error::Error` by hand, keeps error messages close to their variants, and makes types easy to extend.
```rust
// Bad: hand-rolled Display + std::error::Error
#[derive(Debug)]
pub enum KVMError {
ConnectionError(String),
VMNotFound(String),
}
impl std::fmt::Display for KVMError { ... }
impl std::error::Error for KVMError {}
// Good: derive Display via thiserror
#[derive(thiserror::Error, Debug)]
pub enum KVMError {
#[error("connection failed: {0}")]
ConnectionFailed(String),
#[error("VM not found: {name}")]
VmNotFound { name: String },
}
```
### Make bubbling errors easy with `?` and `From`
`?` works on any error type for which there is a `From` impl. Add `From` conversions from lower-level errors into your module's error type so callers can use `?` without boilerplate.
With `thiserror`, wrapping a foreign error is one line:
```rust
#[derive(thiserror::Error, Debug)]
pub enum KVMError {
#[error("libvirt error: {0}")]
Libvirt(#[from] virt::error::Error),
#[error("IO error: {0}")]
Io(#[from] std::io::Error),
}
```
This means a call that returns `virt::error::Error` can be `?`-propagated into a `Result<_, KVMError>` without any `.map_err(...)`.
### Typed errors over stringly-typed errors
Avoid `Box<dyn Error>` or `String` as error return types in library code. Callers need to distinguish errors programmatically — `KVMError::VmAlreadyExists` is actionable, `"VM already exists: foo"` as a `String` is not.
At binary entry points (e.g., `main`) it is acceptable to convert to `String` or `anyhow::Error` for display.
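To make the "actionable" point concrete, here is a minimal, dependency-free sketch (a plain enum stands in for the `thiserror`-derived type above, and `define_vm` is a hypothetical function, not a real Harmony API) showing a caller matching on a variant instead of scraping an error string:

```rust
// A plain enum stands in for the thiserror-derived KVMError shown above.
#[derive(Debug)]
enum KvmError {
    VmAlreadyExists(String),
    VmNotFound(String),
}

// Hypothetical: fails with a typed variant when the VM name is taken.
fn define_vm(name: &str, existing: &[&str]) -> Result<(), KvmError> {
    if existing.contains(&name) {
        return Err(KvmError::VmAlreadyExists(name.to_string()));
    }
    Ok(())
}

fn main() {
    // The caller can decide to skip, retry, or abort with a single match —
    // impossible if the error were just a String.
    match define_vm("bootstrap", &["bootstrap"]) {
        Err(KvmError::VmAlreadyExists(name)) => println!("skipping existing VM {name}"),
        Err(e) => panic!("unexpected: {e:?}"),
        Ok(()) => println!("defined"),
    }
}
```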
---
## Logging
### Use the `log` crate macros
All log output must go through the `log` crate. Never use `println!`, `eprintln!`, or `dbg!` in library code. This makes output compatible with any logging backend (env_logger, tracing, structured logging, etc.).
```rust
// Bad
println!("Creating VM: {}", name);
// Good
use log::{info, debug, warn};
info!("Creating VM: {name}");
debug!("VM XML:\n{xml}");
warn!("Network already active, skipping creation");
```
Use the right level:
| Level | When to use |
|---------|-------------|
| `error` | Unrecoverable failures (before returning Err) |
| `warn` | Recoverable issues, skipped steps |
| `info` | High-level progress events visible in normal operation |
| `debug` | Detailed operational info useful for debugging |
| `trace` | Very granular, per-iteration or per-call data |
Log before significant operations and after unexpected conditions. Do not log inside tight loops at `info` level.
---
## Types and Builders
### Derive `Serialize` on all public domain types
All public structs and enums that represent configuration or state should derive `serde::Serialize`. Add `Deserialize` when round-trip serialization is needed.
### Builder pattern for complex configs
When a type has more than three fields or optional fields, provide a builder. The builder pattern allows named, incremental construction without positional arguments.
```rust
let config = VmConfig::builder("bootstrap")
.cpu(4)
.memory_gb(8)
.disk(DiskConfig::new(50).labeled("os"))
.disk(DiskConfig::new(100).labeled("data"))
.network(NetworkRef::named("harmonylan"))
.boot_order([BootDevice::Network, BootDevice::Disk])
.build();
```
### Avoid `pub` fields on config structs
Expose data through methods or the builder, not raw field access. This preserves the ability to validate, rename, or change representation without breaking callers.
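A minimal sketch of the idea (the `NetworkConfig` shape here is illustrative, not the real Harmony type): private fields let the constructor validate input and leave room to swap the string for a typed CIDR later without breaking callers.

```rust
pub struct NetworkConfig {
    name: String, // private: exposed via name()
    cidr: String, // private: representation may later become a typed Cidr
}

impl NetworkConfig {
    pub fn new(name: &str, cidr: &str) -> Result<Self, String> {
        // Validation lives here; with pub fields, callers could bypass it.
        if !cidr.contains('/') {
            return Err(format!("'{cidr}' is not CIDR notation"));
        }
        Ok(Self {
            name: name.to_string(),
            cidr: cidr.to_string(),
        })
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn cidr(&self) -> &str {
        &self.cidr
    }
}

fn main() {
    let net = NetworkConfig::new("harmonylan", "10.0.0.0/24").unwrap();
    println!("{} {}", net.name(), net.cidr());
}
```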
---
## Async
### Use `tokio` for all async runtime needs
All async code runs on tokio. Use `tokio::spawn`, `tokio::time`, etc. Use `#[async_trait]` for traits with async methods.
### No blocking in async context
Never call blocking I/O (file I/O, network, process spawn) directly in an async function. Use `tokio::fs`, `tokio::process`, or `tokio::task::spawn_blocking` as appropriate.
---
## Module Structure
### Follow the `Score` / `Interpret` pattern
Modules that represent deployable infrastructure should implement `Score<T: Topology>` and `Interpret<T>`:
- `Score` is the serializable, clonable configuration declaring *what* to deploy
- `Interpret` does the actual work when `execute()` is called
```rust
pub struct KvmScore {
network: NetworkConfig,
vms: Vec<VmConfig>,
}
impl<T: Topology + KvmHost> Score<T> for KvmScore {
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(KvmInterpret::new(self.clone()))
}
fn name(&self) -> String { "KvmScore".to_string() }
}
```
### Flatten the public API in `mod.rs`
Internal submodules are implementation detail. Re-export what callers need at the module root:
```rust
// modules/kvm/mod.rs
mod connection;
mod domain;
mod network;
mod error;
mod xml;
pub use connection::KvmConnection;
pub use domain::{VmConfig, VmConfigBuilder, VmStatus, DiskConfig, BootDevice};
pub use error::KvmError;
pub use network::NetworkConfig;
```
---
## Commit Style
Follow [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/):
```
feat(kvm): add network isolation support
fix(kvm): correct memory unit conversion for libvirt
refactor(kvm): replace virsh subprocess calls with virt crate bindings
docs: add coding guide
```
Keep pull requests small and single-purpose (under ~200 lines excluding generated code). Do not mix refactoring, bug fixes, and new features in one PR.
---
## When to Add Abstractions
Harmony provides powerful abstraction mechanisms: traits, generics, the Score/Interpret pattern, and capabilities. Use them judiciously.
### Add an abstraction when:
- **You have three or more concrete implementations** doing the same thing. Two is often coincidence; three is a pattern.
- **The abstraction provides compile-time safety** that prevents real bugs (e.g., capability bounds on topologies).
- **The abstraction hides genuine complexity** that callers shouldn't need to understand (e.g., XML schema generation for libvirt).
### Don't add an abstraction when:
- **It's just to avoid a few lines of boilerplate**. Copy-paste is sometimes better than a trait hierarchy.
- **You're anticipating future flexibility** that isn't needed today. YAGNI (You Aren't Gonna Need It).
- **The abstraction makes the code harder to understand** for someone unfamiliar with the codebase.
- **You're wrapping a single implementation**. A trait with one implementation is usually over-engineering.
### Signs you've over-abstracted:
- You need to explain the type system to a competent Rust developer for them to understand how to add a simple feature.
- Adding a new concrete type requires changes in multiple trait definitions.
- The word "factory" or "manager" appears in your type names.
- You have more trait definitions than concrete implementations.
### The Rule of Three for Traits
Before creating a new trait, ensure you have:
1. A clear, real use case (not hypothetical)
2. At least one concrete implementation
3. A plan for how callers will use it
Only generalize when the pattern is proven. The monitoring module is a good example: we had multiple alert senders (OKD, KubePrometheus, RHOB) before we introduced the `AlertSender` and `AlertReceiver<S>` traits. The traits emerged from real needs, not design sessions.
---
## Documentation
### Document the "why", not the "what"
Code should be self-explanatory for the "what". Comments and documentation should explain intent, rationale, and gotchas.
```rust
// Bad: restates the code
// Returns the number of VMs
fn vm_count(&self) -> usize { self.vms.len() }
// Good: explains the why
// Returns 0 if connection is lost, rather than erroring,
// because monitoring code uses this for health checks
fn vm_count(&self) -> usize { self.vms.len() }
```
### Keep examples in the `examples/` directory
Working code beats documentation. Every major feature should have a runnable example that demonstrates real usage.

View File

@@ -3,12 +3,10 @@ use harmony::{
modules::monitoring::{
alert_channel::discord_alert_channel::DiscordReceiver,
alert_rule::{
alerts::{
infra::opnsense::high_http_error_rate, k8s::pvc::high_pvc_fill_rate_over_two_days,
},
alerts::infra::opnsense::high_http_error_rate,
prometheus_alert_rule::AlertManagerRuleGroup,
},
okd::openshift_cluster_alerting_score::OpenshiftClusterAlertScore,
cluster_alerting::ClusterAlertingScore,
scrape_target::prometheus_node_exporter::PrometheusNodeExporter,
},
topology::{
@@ -21,22 +19,37 @@ use harmony_macros::{hurl, ip};
#[tokio::main]
async fn main() {
let platform_matcher = AlertMatcher {
label: "prometheus".to_string(),
operator: MatchOp::Eq,
value: "openshift-monitoring/k8s".to_string(),
};
let severity = AlertMatcher {
label: "severity".to_string(),
operator: MatchOp::Eq,
value: "critical".to_string(),
let critical_receiver = DiscordReceiver {
name: "critical-alerts".to_string(),
url: hurl!("https://discord.example.com/webhook/critical"),
route: AlertRoute {
matchers: vec![AlertMatcher {
label: "severity".to_string(),
operator: MatchOp::Eq,
value: "critical".to_string(),
}],
..AlertRoute::default("critical-alerts".to_string())
},
};
let high_http_error_rate = high_http_error_rate();
let warning_receiver = DiscordReceiver {
name: "warning-alerts".to_string(),
url: hurl!("https://discord.example.com/webhook/warning"),
route: AlertRoute {
matchers: vec![AlertMatcher {
label: "severity".to_string(),
operator: MatchOp::Eq,
value: "warning".to_string(),
}],
repeat_interval: Some("30m".to_string()),
..AlertRoute::default("warning-alerts".to_string())
},
};
let additional_rules = AlertManagerRuleGroup::new("test-rule", vec![high_http_error_rate]);
let additional_rules =
AlertManagerRuleGroup::new("infra-alerts", vec![high_http_error_rate()]);
let scrape_target = PrometheusNodeExporter {
let firewall_scraper = PrometheusNodeExporter {
job_name: "firewall".to_string(),
metrics_path: "/metrics".to_string(),
listen_address: ip!("192.168.1.1"),
@@ -44,22 +57,16 @@ async fn main() {
..Default::default()
};
let alerting_score = ClusterAlertingScore::new()
.critical_receiver(Box::new(critical_receiver))
.warning_receiver(Box::new(warning_receiver))
.additional_rule(Box::new(additional_rules))
.scrape_target(Box::new(firewall_scraper));
harmony_cli::run(
Inventory::autoload(),
K8sAnywhereTopology::from_env(),
vec![Box::new(OpenshiftClusterAlertScore {
receivers: vec![Box::new(DiscordReceiver {
name: "crit-wills-discord-channel-example".to_string(),
url: hurl!("https://test.io"),
route: AlertRoute {
matchers: vec![severity],
..AlertRoute::default("crit-wills-discord-channel-example".to_string())
},
})],
sender: harmony::modules::monitoring::okd::OpenshiftClusterAlertSender,
rules: vec![Box::new(additional_rules)],
scrape_targets: Some(vec![Box::new(scrape_target)]),
})],
vec![Box::new(alerting_score)],
None,
)
.await

View File

@@ -46,6 +46,14 @@ impl std::fmt::Debug for K8sClient {
}
impl K8sClient {
pub fn inner_client(&self) -> &Client {
&self.client
}
pub fn inner_client_clone(&self) -> Client {
self.client.clone()
}
/// Create a client, reading `DRY_RUN` from the environment.
pub fn new(client: Client) -> Self {
Self {

View File

@@ -0,0 +1,194 @@
use serde::Serialize;
use crate::{
interpret::Interpret,
modules::monitoring::{
alert_rule::{
alerts::k8s::{
deployment::alert_deployment_unavailable, memory_usage::alert_high_cpu_usage,
memory_usage::alert_high_memory_usage, pod::alert_container_restarting,
pod::alert_pod_not_ready, pod::pod_failed, pvc::high_pvc_fill_rate_over_two_days,
},
prometheus_alert_rule::AlertManagerRuleGroup,
},
okd::OpenshiftClusterAlertSender,
},
score::Score,
topology::{
monitoring::{
AlertReceiver, AlertRoute, AlertRule, AlertingInterpret, MatchOp, Observability,
ScrapeTarget,
},
Topology,
},
};
#[derive(Debug, Clone)]
pub struct ClusterAlertingScore {
pub critical_alerts_receiver: Option<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>,
pub warning_alerts_receiver: Option<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>,
pub additional_rules: Vec<Box<dyn AlertRule<OpenshiftClusterAlertSender>>>,
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>>>>,
pub include_default_rules: bool,
}
impl ClusterAlertingScore {
pub fn new() -> Self {
Self {
critical_alerts_receiver: None,
warning_alerts_receiver: None,
additional_rules: vec![],
scrape_targets: None,
include_default_rules: true,
}
}
pub fn critical_receiver(
mut self,
receiver: Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>,
) -> Self {
self.critical_alerts_receiver = Some(receiver);
self
}
pub fn warning_receiver(
mut self,
receiver: Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>,
) -> Self {
self.warning_alerts_receiver = Some(receiver);
self
}
pub fn additional_rule(
mut self,
rule: Box<dyn AlertRule<OpenshiftClusterAlertSender>>,
) -> Self {
self.additional_rules.push(rule);
self
}
pub fn scrape_target(
mut self,
target: Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>>,
) -> Self {
self.scrape_targets
.get_or_insert_with(Vec::new)
.push(target);
self
}
pub fn with_default_rules(mut self, include: bool) -> Self {
self.include_default_rules = include;
self
}
fn build_default_rules(&self) -> Vec<Box<dyn AlertRule<OpenshiftClusterAlertSender>>> {
if !self.include_default_rules {
return vec![];
}
let critical_rules =
AlertManagerRuleGroup::new("cluster-critical-alerts", vec![pod_failed()]);
let warning_rules = AlertManagerRuleGroup::new(
"cluster-warning-alerts",
vec![
alert_deployment_unavailable(),
alert_container_restarting(),
alert_pod_not_ready(),
alert_high_memory_usage(),
alert_high_cpu_usage(),
high_pvc_fill_rate_over_two_days(),
],
);
vec![Box::new(critical_rules), Box::new(warning_rules)]
}
fn build_receivers(&self) -> Vec<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>> {
let mut receivers = vec![];
if let Some(ref critical_receiver) = self.critical_alerts_receiver {
receivers.push(critical_receiver.clone());
}
if let Some(ref warning_receiver) = self.warning_alerts_receiver {
receivers.push(warning_receiver.clone());
}
receivers
}
}
impl Default for ClusterAlertingScore {
fn default() -> Self {
Self::new()
}
}
impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T> for ClusterAlertingScore {
fn name(&self) -> String {
"ClusterAlertingScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
let mut all_rules = self.build_default_rules();
all_rules.extend(self.additional_rules.clone());
let receivers = self.build_receivers();
Box::new(AlertingInterpret {
sender: OpenshiftClusterAlertSender,
receivers,
rules: all_rules,
scrape_targets: self.scrape_targets.clone(),
})
}
}
impl Serialize for ClusterAlertingScore {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
serde_json::json!({
"name": "ClusterAlertingScore",
"include_default_rules": self.include_default_rules,
"has_critical_receiver": self.critical_alerts_receiver.is_some(),
"has_warning_receiver": self.warning_alerts_receiver.is_some(),
"additional_rules_count": self.additional_rules.len(),
"scrape_targets_count": self.scrape_targets.as_ref().map(|t| t.len()).unwrap_or(0),
})
.serialize(serializer)
}
}
pub fn critical_route() -> AlertRoute {
AlertRoute {
receiver: "critical".to_string(),
matchers: vec![crate::topology::monitoring::AlertMatcher {
label: "severity".to_string(),
operator: MatchOp::Eq,
value: "critical".to_string(),
}],
group_by: vec![],
repeat_interval: Some("5m".to_string()),
continue_matching: false,
children: vec![],
}
}
pub fn warning_route() -> AlertRoute {
AlertRoute {
receiver: "warning".to_string(),
matchers: vec![crate::topology::monitoring::AlertMatcher {
label: "severity".to_string(),
operator: MatchOp::Eq,
value: "warning".to_string(),
}],
group_by: vec![],
repeat_interval: Some("30m".to_string()),
continue_matching: false,
children: vec![],
}
}

View File

@@ -0,0 +1,3 @@
mod cluster_alerting_score;
pub use cluster_alerting_score::{critical_route, warning_route, ClusterAlertingScore};

View File

@@ -1,6 +1,7 @@
pub mod alert_channel;
pub mod alert_rule;
pub mod application_monitoring;
pub mod cluster_alerting;
pub mod grafana;
pub mod kube_prometheus;
pub mod ntfy;

View File

@@ -0,0 +1,32 @@
[package]
name = "harmony_e2e_tests"
version = "0.1.0"
edition = "2021"
description = "Harmony end-to-end test runner"
license = "Apache-2.0"
repository = "https://github.com/nationtech/harmony"
rust-version = "1.75.0"
[dependencies]
clap = { version = "4.4", features = ["derive"] }
chrono = { version = "0.4", features = ["serde"] }
env_logger = "0.11"
kube = { workspace = true }
k8s-openapi = { workspace = true }
log = "0.4"
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = "2.0"
tokio = { workspace = true }
which = "7.0"
inventory = "0.3"
tempfile = { workspace = true }
k3d-rs = { path = "../k3d" }
harmony = { path = "../harmony" }
sqlx = { version = "0.8", features = ["runtime-tokio", "postgres", "tls-rustls"] }
tokio-stream = "0.1"
async-trait.workspace = true
[[bin]]
name = "harmony-e2e"
path = "src/main.rs"

View File

@@ -0,0 +1,68 @@
mod test_harness;
mod tests;
use clap::{Parser, Subcommand};
use test_harness::find_tests;
#[derive(Parser)]
#[command(name = "harmony-e2e")]
#[command(about = "Harmony end-to-end test runner", long_about = None)]
struct Cli {
#[command(subcommand)]
command: Commands,
#[arg(short, long, default_value = "info")]
log_level: String,
}
#[derive(Subcommand)]
enum Commands {
List {
#[arg(short, long)]
filter: Option<String>,
},
Run {
#[arg(short, long)]
filter: Option<String>,
},
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let cli = Cli::parse();
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or(&cli.log_level))
.init();
match cli.command {
Commands::List { filter } => {
let tests = find_tests(filter.as_deref());
if tests.is_empty() {
println!("No tests found matching filter.");
} else {
println!("Available tests:");
for test in tests {
println!(" {} - {}", test.name(), test.description());
}
}
}
Commands::Run { filter } => {
let tests = find_tests(filter.as_deref());
if tests.is_empty() {
return Err("No tests found matching filter.".into());
}
log::info!("Running {} test(s)...", tests.len());
for test in tests {
log::info!("=== Running: {} ===", test.name());
test.run()
.await
.map_err(|e| e as Box<dyn std::error::Error>)?;
log::info!("=== Passed: {} ===", test.name());
}
log::info!("All tests passed!");
}
}
Ok(())
}

View File

@@ -0,0 +1,60 @@
use async_trait::async_trait;
use thiserror::Error;
#[derive(Error, Debug)]
pub enum HarnessError {
#[error("IO error: {0}")]
Io(#[from] std::io::Error),
}
pub struct TestContext {
pub test_name: String,
pub namespace: String,
}
impl TestContext {
pub fn new(test_name: &str) -> Result<Self, HarnessError> {
let namespace = format!("harmony-test-{}", test_name);
Ok(Self {
test_name: test_name.to_string(),
namespace,
})
}
}
#[async_trait]
pub trait Test: Sync {
fn name(&self) -> &'static str;
fn description(&self) -> &'static str;
async fn run(&self) -> Result<(), Box<dyn std::error::Error + Send + Sync>>;
}
pub struct TestEntry {
pub test: &'static dyn Test,
}
inventory::collect!(TestEntry);
#[macro_export]
macro_rules! register_test {
($test:expr) => {
inventory::submit! {
$crate::test_harness::TestEntry { test: $test }
}
};
}
pub fn all_tests() -> impl Iterator<Item = &'static dyn Test> {
inventory::iter::<TestEntry>().map(|entry| entry.test)
}
pub fn find_tests(filter: Option<&str>) -> Vec<&'static dyn Test> {
let filter = filter.map(|f| f.to_lowercase());
all_tests()
.filter(|t| match &filter {
Some(f) => t.name().to_lowercase().contains(f),
None => true,
})
.collect()
}

View File

@@ -0,0 +1,206 @@
use crate::register_test;
use crate::test_harness::{HarnessError, Test, TestContext};
use async_trait::async_trait;
use harmony::{
inventory::Inventory,
modules::postgresql::{capability::PostgreSQLConfig, PostgreSQLScore},
score::Score,
topology::{K8sAnywhereTopology, K8sclient, Topology},
};
use k8s_openapi::api::core::v1::Pod;
use kube::api::{Api, ListParams};
use log::{info, warn};
use std::time::Duration;
use thiserror::Error;
#[derive(Error, Debug)]
pub enum PostgresTestError {
#[error("Failed to create test context: {0}")]
ContextCreation(#[from] HarnessError),
#[error("Failed to initialize topology: {0}")]
TopologyInit(String),
#[error("Failed to interpret postgresql score: {0}")]
InterpretError(String),
#[error("Failed to get k8s client: {0}")]
K8sClient(String),
#[error("PostgreSQL deployment timed out after {timeout_seconds}s in namespace {namespace}")]
DeploymentTimeout {
namespace: String,
timeout_seconds: u64,
},
#[error("PostgreSQL connection verification failed: {0}")]
ConnectionVerification(String),
#[error("SQL query failed: {0}")]
SqlQueryFailed(String),
}
pub struct CnpgPostgresTest;
impl CnpgPostgresTest {
pub const INSTANCE: Self = CnpgPostgresTest;
}
#[async_trait]
impl Test for CnpgPostgresTest {
fn name(&self) -> &'static str {
"cnpg_postgres"
}
fn description(&self) -> &'static str {
"CNPG PostgreSQL deployment using Harmony's PostgreSQL module"
}
async fn run(&self) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
run_impl()
.await
.map_err(|e| Box::new(e) as Box<dyn std::error::Error + Send + Sync>)
}
}
register_test!(&CnpgPostgresTest::INSTANCE);
async fn run_impl() -> Result<(), PostgresTestError> {
let ctx = TestContext::new("cnpg-postgres")?;
info!("=== Test: CNPG PostgreSQL deployment ===");
info!("Step 1: Initializing K8sAnywhereTopology...");
let topology = K8sAnywhereTopology::from_env();
info!("Step 2: Ensuring topology is ready...");
topology
.ensure_ready()
.await
.map_err(|e| PostgresTestError::TopologyInit(e.to_string()))?;
info!("Step 3: Creating PostgreSQL deployment score...");
let pg_score = PostgreSQLScore {
config: PostgreSQLConfig {
cluster_name: format!("{}-pg", ctx.test_name),
namespace: ctx.namespace.clone(),
instances: 1,
..Default::default()
},
};
info!("Step 4: Deploying PostgreSQL using Harmony's PostgreSQL module...");
let outcome = pg_score
.interpret(&Inventory::empty(), &topology)
.await
.map_err(|e| PostgresTestError::InterpretError(e.to_string()))?;
info!("Deployment outcome: {}", outcome.message);
info!("Step 5: Waiting for PostgreSQL cluster to be ready...");
let cluster_name = &pg_score.config.cluster_name;
wait_for_postgres_ready(&topology, &ctx.namespace, cluster_name, 300).await?;
info!("Step 6: Verifying PostgreSQL is working with a SQL query...");
let result = verify_postgres_connection(&ctx.namespace, cluster_name).await?;
info!("Query result: {}", result);
assert!(result.contains("1"), "Expected query to return 1");
info!("=== Test PASSED: CNPG PostgreSQL deployment ===\n");
Ok(())
}
async fn wait_for_postgres_ready(
topology: &K8sAnywhereTopology,
namespace: &str,
cluster_name: &str,
timeout_seconds: u64,
) -> Result<String, PostgresTestError> {
let client = topology
.k8s_client()
.await
.map_err(PostgresTestError::K8sClient)?;
let pods: Api<Pod> = Api::namespaced(client.inner_client_clone(), namespace);
let label_selector = format!("postgresql.cnpg.io/cluster={}", cluster_name);
let deadline = tokio::time::Instant::now() + Duration::from_secs(timeout_seconds);
let mut interval = tokio::time::interval(Duration::from_secs(5));
loop {
interval.tick().await;
if tokio::time::Instant::now() > deadline {
return Err(PostgresTestError::DeploymentTimeout {
namespace: namespace.to_string(),
timeout_seconds,
});
}
let pod_list = pods
.list(&ListParams::default().labels(&label_selector))
.await
.map_err(|e| PostgresTestError::K8sClient(e.to_string()))?;
for pod in pod_list.items {
if let Some(status) = &pod.status {
if status.phase.as_deref() == Some("Running") {
if let Some(conditions) = &status.conditions {
if conditions
.iter()
.any(|c| c.type_ == "Ready" && c.status == "True")
{
let pod_name = pod.metadata.name.clone().unwrap_or_default();
info!("PostgreSQL pod '{}' is ready", pod_name);
return Ok(pod_name);
}
}
}
}
}
warn!(
"Waiting for PostgreSQL pod with label '{}' to be ready...",
label_selector
);
}
}
async fn verify_postgres_connection(
namespace: &str,
cluster_name: &str,
) -> Result<String, PostgresTestError> {
// CNPG names instances "<cluster>-<ordinal>", starting at 1.
let pod_name = format!("{}-1", cluster_name);
let mut cmd = tokio::process::Command::new("kubectl");
cmd.args([
"exec",
"-n",
namespace,
&pod_name,
"--",
"psql",
"-U",
"app",
"-d",
"app",
"-t",
"-c",
"SELECT 1 AS test;",
]);
let output = cmd
.output()
.await
.map_err(|e| PostgresTestError::ConnectionVerification(e.to_string()))?;
if !output.status.success() {
return Err(PostgresTestError::SqlQueryFailed(
String::from_utf8_lossy(&output.stderr).to_string(),
));
}
Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}


@@ -0,0 +1,147 @@
use crate::register_test;
use crate::test_harness::{HarnessError, Test, TestContext};
use async_trait::async_trait;
use harmony::topology::{K8sAnywhereTopology, K8sclient, Topology};
use k8s_openapi::api::core::v1::Node;
use kube::api::{Api, ListParams};
use log::info;
use thiserror::Error;
#[derive(Error, Debug)]
pub enum K3dTestError {
#[error("Failed to create test context: {0}")]
ContextCreation(#[from] HarnessError),
#[error("Failed to initialize topology: {0}")]
TopologyInit(String),
#[error("Failed to get k8s client: {0}")]
K8sClient(String),
#[error("Cluster validation failed: expected {expected_nodes} nodes, found {nodes_count}")]
ClusterValidation {
nodes_count: usize,
expected_nodes: usize,
},
#[error("Node {node_name} is not ready")]
NodeNotReady { node_name: String },
#[error("No nodes found in cluster")]
NoNodesFound,
}
pub struct K3dClusterTest;
impl K3dClusterTest {
pub const INSTANCE: Self = K3dClusterTest;
}
#[async_trait]
impl Test for K3dClusterTest {
fn name(&self) -> &'static str {
"k3d_cluster"
}
fn description(&self) -> &'static str {
"k3d cluster creation with Harmony modules"
}
async fn run(&self) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
run_impl()
.await
.map_err(|e| Box::new(e) as Box<dyn std::error::Error + Send + Sync>)
}
}
register_test!(&K3dClusterTest::INSTANCE);
async fn run_impl() -> Result<(), K3dTestError> {
info!("=== Test: k3d cluster creation with Harmony modules ===");
let _ctx = TestContext::new("k3d-cluster")?;
info!("Step 1: Initializing K8sAnywhereTopology...");
let topology = K8sAnywhereTopology::from_env();
info!("Step 2: Ensuring topology is ready (this installs k3d if needed)...");
topology
.ensure_ready()
.await
.map_err(|e| K3dTestError::TopologyInit(e.to_string()))?;
info!("Step 3: Validating cluster is operational...");
validate_cluster(&topology).await?;
info!("Step 4: Verifying all nodes are ready...");
verify_nodes_ready(&topology).await?;
info!("=== Test PASSED: k3d cluster creation ===");
Ok(())
}
async fn validate_cluster(topology: &K8sAnywhereTopology) -> Result<(), K3dTestError> {
let client = topology
.k8s_client()
.await
.map_err(K3dTestError::K8sClient)?;
let nodes: Api<Node> = Api::all(client.inner_client_clone());
let node_list = nodes
.list(&ListParams::default())
.await
.map_err(|e| K3dTestError::K8sClient(e.to_string()))?;
let nodes_count = node_list.items.len();
if nodes_count == 0 {
return Err(K3dTestError::NoNodesFound);
}
info!("Found {} node(s) in cluster", nodes_count);
for node in &node_list.items {
let node_name = node.metadata.name.as_deref().unwrap_or("unknown");
info!(" - Node: {}", node_name);
}
// A default k3d cluster runs a single server node; raise this constant
// once the test provisions multi-node topologies.
const EXPECTED_NODES: usize = 1;
if nodes_count < EXPECTED_NODES {
return Err(K3dTestError::ClusterValidation {
nodes_count,
expected_nodes: EXPECTED_NODES,
});
}
Ok(())
Ok(())
}
async fn verify_nodes_ready(topology: &K8sAnywhereTopology) -> Result<(), K3dTestError> {
let client = topology
.k8s_client()
.await
.map_err(K3dTestError::K8sClient)?;
let nodes: Api<Node> = Api::all(client.inner_client_clone());
let node_list = nodes
.list(&ListParams::default())
.await
.map_err(|e| K3dTestError::K8sClient(e.to_string()))?;
for node in node_list.items {
let node_name = node.metadata.name.clone().unwrap_or_default();
let conditions = node.status.and_then(|s| s.conditions).unwrap_or_default();
let ready = conditions
.iter()
.any(|c| c.type_ == "Ready" && c.status == "True");
if !ready {
return Err(K3dTestError::NodeNotReady { node_name });
}
info!("Node '{}' is Ready", node_name);
}
Ok(())
}


@@ -0,0 +1,3 @@
pub mod cnpg_postgres;
pub mod k3d_cluster;
pub mod multicluster_postgres;


@@ -0,0 +1,54 @@
use crate::register_test;
use crate::test_harness::{HarnessError, Test, TestContext};
use async_trait::async_trait;
use log::info;
use thiserror::Error;
#[derive(Error, Debug)]
pub enum MulticlusterPostgresTestError {
#[error("Failed to create test context: {0}")]
ContextCreation(#[from] HarnessError),
}
pub struct MulticlusterPostgresTest;
impl MulticlusterPostgresTest {
pub const INSTANCE: Self = MulticlusterPostgresTest;
}
#[async_trait]
impl Test for MulticlusterPostgresTest {
fn name(&self) -> &'static str {
"multicluster_postgres"
}
fn description(&self) -> &'static str {
"Multi-cluster PostgreSQL with failover"
}
async fn run(&self) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
run_impl()
.await
.map_err(|e| Box::new(e) as Box<dyn std::error::Error + Send + Sync>)
}
}
register_test!(&MulticlusterPostgresTest::INSTANCE);
async fn run_impl() -> Result<(), MulticlusterPostgresTestError> {
let _ctx = TestContext::new("multicluster-postgres")?;
info!("=== Test: Multi-cluster PostgreSQL with failover ===");
info!("This test is not yet fully implemented.");
info!("It will:");
info!(" 1. Create two k3d clusters (primary and replica)");
info!(" 2. Deploy CNPG operator on both clusters");
info!(" 3. Deploy primary PostgreSQL with LoadBalancer service");
info!(" 4. Extract replication certificates from primary");
info!(" 5. Deploy replica PostgreSQL configured to replicate from primary");
info!(" 6. Insert test data on primary");
info!(" 7. Verify data is replicated to replica");
info!("=== Test SKIPPED: Multi-cluster PostgreSQL (not implemented) ===\n");
Ok(())
}

infrastructure.rs (new, empty)
mocks.rs (new, empty)
tiers.rs (new, empty)