Compare commits: fix/refact...feat/clust (9 commits)

Commits: 18d8ba2210, c5b292d99b, 0258b31fd2, 4407792bd5, 7978a63004, 58d00c95bb, 7d14f7646c, 69dd763d6e, 2e46ac3418

---

CI_and_testing_harmony_analysis.md (new file, 548 lines) @@ -0,0 +1,548 @@
# CI and Testing Strategy for Harmony

## Executive Summary

Harmony aims to become a CNCF project, requiring a robust CI pipeline that demonstrates real-world reliability. The goal is to run **all examples** in CI, from simple k3d deployments to full HA OKD clusters on bare metal. This document provides context for designing and implementing this testing infrastructure.

---

## Project Context

### What is Harmony?

Harmony is an infrastructure automation framework that is **code-first and code-only**. Operators write Rust programs to declare and drive infrastructure, rather than YAML files or DSL configs. Key differentiators:

1. **Compile-time safety**: The type system prevents "config-is-valid-but-platform-is-wrong" errors
2. **Topology abstraction**: Write once, deploy to any environment (local k3d, OKD, bare metal, cloud)
3. **Capability-based design**: Scores declare what they need; topologies provide what they have

### Core Abstractions

| Concept | Description |
|---------|-------------|
| **Score** | Declarative description of desired state (the "what") |
| **Topology** | Logical representation of infrastructure (the "where") |
| **Capability** | A feature a topology offers (the "how") |
| **Interpret** | Execution logic connecting Score to Topology |
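The capability-based design above can be sketched with plain Rust traits. The trait and type names below (`HelmCommand`, `HelmScore`, `LocalCluster`) are illustrative stand-ins, not Harmony's actual API; the point is how a trait bound turns a missing capability into a compile error.

```rust
// Hypothetical sketch of the Score/Topology/Capability pattern.
// Names are illustrative, not Harmony's real API.

/// A logical infrastructure target (the "where").
trait Topology {
    fn name(&self) -> String;
}

/// A capability a topology can offer (the "how").
trait HelmCommand {
    fn helm_install(&self, chart: &str) -> String;
}

/// A desired state (the "what"), only usable on topologies that carry
/// the capabilities it requires -- enforced at compile time.
trait Score<T: Topology> {
    fn interpret(&self, topology: &T) -> String;
}

struct LocalCluster;

impl Topology for LocalCluster {
    fn name(&self) -> String {
        "local".into()
    }
}

impl HelmCommand for LocalCluster {
    fn helm_install(&self, chart: &str) -> String {
        format!("helm install {chart} on {}", self.name())
    }
}

struct HelmScore {
    chart: String,
}

// The trait bound is the compile-time check: a topology type that does not
// implement HelmCommand simply cannot be used with HelmScore.
impl<T: Topology + HelmCommand> Score<T> for HelmScore {
    fn interpret(&self, topology: &T) -> String {
        topology.helm_install(&self.chart)
    }
}
```

A topology without `HelmCommand` would fail the `interpret` call at compile time, which is exactly the property the compile-fail test tier exercises.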
### Compile-Time Verification

```rust
// This compiles only if the topology T provides K8sclient + HelmCommand
impl<T: Topology + K8sclient + HelmCommand> Score<T> for MyScore { ... }

// K8sResourceScore requires the K8sclient capability...
impl<T: Topology + K8sclient> Score<T> for K8sResourceScore { ... }

// ...so using it with LinuxHostTopology FAILS to compile
// (intentionally broken usage, kept as a compile-fail test case):
// error: LinuxHostTopology does not implement K8sclient
```

---
## Current Examples Inventory

### Summary Statistics

| Category | Count | CI Complexity |
|----------|-------|---------------|
| k3d-compatible | 22 | Low - single k3d cluster |
| OKD-specific | 4 | Medium - requires OKD cluster |
| Bare metal | 5 | High - requires physical infra or nested virtualization |
| Multi-cluster | 3 | High - requires multiple K8s clusters |
| No infra needed | 4 | Trivial - local only |
### Detailed Example Classification

#### Tier 1: k3d-Compatible (22 examples)

Can run on a local k3d cluster with minimal setup:

| Example | Topology | Capabilities | Special Notes |
|---------|----------|--------------|---------------|
| zitadel | K8sAnywhereTopology | K8sClient, HelmCommand | SSO/Identity |
| node_health | K8sAnywhereTopology | K8sClient | Health checks |
| public_postgres | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Needs ingress |
| openbao | K8sAnywhereTopology | K8sClient, HelmCommand | Vault alternative |
| rust | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Webapp deployment |
| cert_manager | K8sAnywhereTopology | K8sClient, CertificateManagement | TLS certificates |
| try_rust_webapp | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Full webapp |
| monitoring | K8sAnywhereTopology | K8sClient, HelmCommand, Observability | Prometheus |
| application_monitoring_with_tenant | K8sAnywhereTopology | K8sClient, HelmCommand, TenantManager, Observability | Multi-tenant |
| monitoring_with_tenant | K8sAnywhereTopology | K8sClient, HelmCommand, TenantManager, Observability | Multi-tenant |
| postgresql | K8sAnywhereTopology | K8sClient, HelmCommand | CloudNativePG |
| ntfy | K8sAnywhereTopology | K8sClient, HelmCommand | Notifications |
| tenant | K8sAnywhereTopology | K8sClient, TenantManager | Namespace isolation |
| lamp | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | LAMP stack |
| k8s_drain_node | K8sAnywhereTopology | K8sClient | Node operations |
| k8s_write_file_on_node | K8sAnywhereTopology | K8sClient | Node operations |
| remove_rook_osd | K8sAnywhereTopology | K8sClient | Ceph operations |
| validate_ceph_cluster_health | K8sAnywhereTopology | K8sClient | Ceph health |
| kube-rs | Direct kube | K8sClient | Raw kube-rs demo |
| brocade_snmp_server | K8sAnywhereTopology | K8sClient | SNMP collector |
| harmony_inventory_builder | LocalhostTopology | None | Network scanning |
| cli | LocalhostTopology | None | CLI demo |
#### Tier 2: OKD/OpenShift-Specific (4 examples)

Require OKD/OpenShift features not available in vanilla K8s:

| Example | Topology | OKD-Specific Feature |
|---------|----------|----------------------|
| okd_cluster_alerts | K8sAnywhereTopology | OpenShift Monitoring CRDs |
| operatorhub_catalog | K8sAnywhereTopology | OpenShift OperatorHub |
| rhob_application_monitoring | K8sAnywhereTopology | RHOB (Red Hat Observability) |
| nats-supercluster | K8sAnywhereTopology | OKD Routes (OpenShift Ingress) |
#### Tier 3: Bare Metal Infrastructure (5 examples)

Require physical hardware or full virtualization:

| Example | Topology | Physical Requirements |
|---------|----------|-----------------------|
| okd_installation | HAClusterTopology | OPNSense, Brocade switch, PXE boot, 3+ nodes |
| okd_pxe | HAClusterTopology | OPNSense, Brocade switch, PXE infrastructure |
| sttest | HAClusterTopology | Full HA cluster with all network services |
| opnsense | OPNSenseFirewall | OPNSense firewall access |
| opnsense_node_exporter | Custom | OPNSense firewall |
#### Tier 4: Multi-Cluster (3 examples)

Require multiple K8s clusters:

| Example | Topology | Clusters Required |
|---------|----------|-------------------|
| nats | K8sAnywhereTopology × 2 | 2 clusters with NATS gateways |
| nats-module | DecentralizedTopology | 3 clusters for supercluster |
| multisite_postgres | FailoverTopology | 2 clusters for replication |

---
## Testing Categories

### 1. Compile-Time Tests

These tests verify that the type system correctly rejects invalid configurations:
```rust
// Should NOT compile - K8sResourceScore on LinuxHostTopology.
// (Rust has no built-in #[compile_fail] attribute; in practice this case
// lives in its own file and is checked by a tool such as trybuild.)
fn test_k8s_score_on_linux_host() {
    let score = K8sResourceScore::new();
    let topology = LinuxHostTopology::new();
    // This line should fail to compile
    harmony_cli::run(Inventory::empty(), topology, vec![Box::new(score)], None);
}

// Should compile - K8sResourceScore on K8sAnywhereTopology
#[test]
fn test_k8s_score_on_k8s_topology() {
    let score = K8sResourceScore::new();
    let topology = K8sAnywhereTopology::from_env();
    // This should compile
    harmony_cli::run(Inventory::empty(), topology, vec![Box::new(score)], None);
}
```

**Implementation Options:**
- `trybuild` crate for compile-time failure tests
- Separate `tests/compile_fail/` directory with expected error messages
### 2. Unit Tests

Pure Rust logic without external dependencies:
- Score serialization/deserialization
- Inventory parsing
- Type conversions
- CRD generation

**Requirements:**
- No external services
- Sub-second execution
- Run on every PR
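A unit test in this category might look like the following parser check. The `name=addr` inventory line format and the `HostEntry` type are hypothetical, chosen only to show the shape of a dependency-free, sub-second test; Harmony's real inventory format may differ.

```rust
// Hypothetical unit under test: parse "name = addr" inventory lines.
// Format and types are illustrative, not Harmony's actual inventory schema.

#[derive(Debug, PartialEq)]
struct HostEntry {
    name: String,
    addr: String,
}

fn parse_inventory_line(line: &str) -> Option<HostEntry> {
    let line = line.trim();
    // Skip blank lines and comments
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // Expect exactly one '=' separating name and address
    let (name, addr) = line.split_once('=')?;
    Some(HostEntry {
        name: name.trim().to_string(),
        addr: addr.trim().to_string(),
    })
}
```

Tests like this need no cluster, no network, and no setup, which is what keeps this tier under a minute.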
### 3. Integration Tests (k3d)

Deploy to a local k3d cluster:

**Setup:**
```bash
# Install k3d
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash

# Create cluster
k3d cluster create harmony-test \
  --agents 3 \
  --k3s-arg "--disable=traefik@server:0"

# Wait for ready
kubectl wait --for=condition=Ready nodes --all --timeout=120s
```
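In the test runner itself, the setup above would be driven from Rust. A sketch that keeps the argument construction pure (and therefore unit-testable without Docker) is shown below; the cluster name and helper names are assumptions, not an existing Harmony API.

```rust
use std::process::Command;

// Builds the k3d arguments mirroring the setup steps above. Pure function,
// so it can be verified without Docker or k3d installed.
fn k3d_create_args(name: &str, agents: u32) -> Vec<String> {
    vec![
        "cluster".into(),
        "create".into(),
        name.into(),
        "--agents".into(),
        agents.to_string(),
        "--k3s-arg".into(),
        "--disable=traefik@server:0".into(),
    ]
}

// Thin wrapper that actually shells out; only meaningful where k3d exists.
#[allow(dead_code)]
fn k3d_create(name: &str, agents: u32) -> std::io::Result<std::process::ExitStatus> {
    Command::new("k3d").args(k3d_create_args(name, agents)).status()
}
```

Separating "build the command" from "run the command" lets the fast unit tier cover most of this code path.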
**Test Matrix:**

| Example | k3d | Test Type |
|---------|-----|-----------|
| zitadel | ✅ | Deploy + health check |
| cert_manager | ✅ | Deploy + certificate issuance |
| monitoring | ✅ | Deploy + metric collection |
| postgresql | ✅ | Deploy + database connectivity |
| tenant | ✅ | Namespace creation + isolation |
### 4. Integration Tests (OKD)

Deploy to an OKD/OpenShift cluster:

**Options:**
1. **Nested virtualization**: Run OKD in VMs (slow, expensive)
2. **CRC (CodeReady Containers)**: Single-node OKD (resource intensive)
3. **Managed OpenShift**: AWS/Azure/GCP (costly)
4. **Existing cluster**: Connect to pre-provisioned cluster (fastest)

**Test Matrix:**

| Example | OKD Required | Test Type |
|---------|--------------|-----------|
| okd_cluster_alerts | ✅ | Alert rule deployment |
| rhob_application_monitoring | ✅ | RHOB stack deployment |
| operatorhub_catalog | ✅ | Operator installation |
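Option 4 (connecting to an existing cluster) needs a sanity check that the target really is OKD/OpenShift before running OKD-specific examples. One cheap signal is the presence of OpenShift-only API groups in `kubectl api-resources` output. The group names below are real OpenShift API groups; the helper itself is an illustrative sketch, not existing Harmony code.

```rust
// Cheap OpenShift/OKD detection: scan `kubectl api-resources` output for
// API groups that only exist on OpenShift-family clusters.
fn looks_like_openshift(api_resources_output: &str) -> bool {
    ["route.openshift.io", "config.openshift.io"]
        .iter()
        .any(|group| api_resources_output.contains(group))
}
```

Failing fast here turns a confusing mid-run deployment error into a clear "wrong cluster" message.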
### 5. End-to-End Tests (Full Infrastructure)

Complete infrastructure deployment including bare metal:

**Options:**
1. **Libvirt + KVM**: Virtual machines on CI runner
2. **Nested KVM**: KVM inside KVM (for cloud CI)
3. **Dedicated hardware**: Physical test lab
4. **Mock/Hybrid**: Mock physical components, real K8s

---
## CI Environment Options

### Option A: GitHub Actions (Current Standard)

**Pros:**
- Native GitHub integration
- Large runner ecosystem
- Free for open source

**Cons:**
- Limited nested virtualization support
- 6-hour job timeout
- Resource constraints on free runners

**Matrix:**
```yaml
strategy:
  matrix:
    os: [ubuntu-latest]
    rust: [stable, beta]
    k8s: [k3d, kind]
    tier: [unit, k3d-integration]
```
### Option B: Self-Hosted Runners

**Pros:**
- Full control over environment
- Can run nested virtualization
- No time limits
- Persistent state between runs

**Cons:**
- Maintenance overhead
- Cost of infrastructure
- Security considerations

**Setup:**
- Bare metal servers with KVM support
- Pre-installed k3d, kind, CRC
- OPNSense VM for network tests
### Option C: Hybrid (GitHub + Self-Hosted)

**Pros:**
- Fast unit tests on GitHub runners
- Heavy tests on self-hosted infrastructure
- Cost-effective

**Cons:**
- Two CI systems to maintain
- Complexity in test distribution

### Option D: Cloud CI (CircleCI, GitLab CI, etc.)

**Pros:**
- Often better resource options
- Docker-in-Docker support
- Better nested virtualization

**Cons:**
- Cost
- Less GitHub-native

---
## Performance Requirements

### Target Execution Times

| Test Category | Target Time | Current (est.) |
|---------------|-------------|----------------|
| Compile-time tests | < 30s | Unknown |
| Unit tests | < 60s | Unknown |
| k3d integration (per example) | < 120s | 60-300s |
| Full k3d matrix | < 15 min | 30-60 min |
| OKD integration | < 30 min | 1-2 hours |
| Full E2E | < 2 hours | 4-8 hours |
### Performance Strategies

1. **Parallel execution**: Run independent tests concurrently
2. **Incremental testing**: Only run affected tests on changes
3. **Cached clusters**: Pre-warm k3d clusters
4. **Layered testing**: Fail fast on cheaper tests
5. **Mock external services**: Fake Discord webhooks, etc.
---
## Test Data and Secrets Management

### Secrets Required

| Secret | Use | Storage |
|--------|-----|---------|
| Discord webhook URL | Alert receiver tests | GitHub Secrets |
| OPNSense credentials | Network tests | Self-hosted only |
| Cloud provider creds | Multi-cloud tests | Vault / GitHub Secrets |
| TLS certificates | Ingress tests | Generated on-the-fly |

### Test Data

| Data | Source | Strategy |
|------|--------|----------|
| Container images | Public registries | Cache locally |
| Helm charts | Public repos | Vendor in repo |
| K8s manifests | Generated | Dynamic |

---
## Proposed Test Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                harmony_e2e_tests Package                        │
│              (cargo run -p harmony_e2e_tests)                   │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐     │
│ │ Compile     │ │ Unit        │ │ Compile-Fail Tests      │     │
│ │ Tests       │ │ Tests       │ │ (trybuild)              │     │
│ │ < 30s       │ │ < 60s       │ │ < 30s                   │     │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘     │
│                                                                 │
│ ┌───────────────────────────────────────────────────────────┐   │
│ │ k3d Integration Tests                                     │   │
│ │ Self-provisions k3d cluster, runs 22 examples             │   │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐           │   │
│ │ │ zitadel │ │ cert-mgr│ │ monitor │ │ postgres│ ...       │   │
│ │ │ 60s     │ │ 90s     │ │ 120s    │ │ 90s     │           │   │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘           │   │
│ │ Parallel Execution                                        │   │
│ └───────────────────────────────────────────────────────────┘   │
│                                                                 │
│ ┌───────────────────────────────────────────────────────────┐   │
│ │ OKD Integration Tests                                     │   │
│ │ Connects to existing OKD cluster or provisions via KVM    │   │
│ │ ┌─────────────────┐ ┌─────────────────────────────┐       │   │
│ │ │ okd_cluster_    │ │ rhob_application_           │       │   │
│ │ │ alerts (5 min)  │ │ monitoring (10 min)         │       │   │
│ │ └─────────────────┘ └─────────────────────────────┘       │   │
│ └───────────────────────────────────────────────────────────┘   │
│                                                                 │
│ ┌───────────────────────────────────────────────────────────┐   │
│ │ KVM-based E2E Tests                                       │   │
│ │ Uses Harmony's KVM module to provision test VMs           │   │
│ │ ┌─────────────────┐ ┌─────────────────────────────┐       │   │
│ │ │ okd_installation│ │ Full HA cluster deployment  │       │   │
│ │ │ (30-60 min)     │ │ (60-120 min)                │       │   │
│ │ └─────────────────┘ └─────────────────────────────┘       │   │
│ └───────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Any CI system (GitHub Actions, GitLab CI, Jenkins, cron) just runs:
    cargo run -p harmony_e2e_tests
```

```
┌─────────────────────────────────────────────────────────────────┐
│                      GitHub Actions                             │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐     │
│ │ Compile     │ │ Unit        │ │ Compile-Fail Tests      │     │
│ │ Tests       │ │ Tests       │ │ (trybuild)              │     │
│ │ < 30s       │ │ < 60s       │ │ < 30s                   │     │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘     │
│                                                                 │
│ ┌───────────────────────────────────────────────────────────┐   │
│ │ k3d Integration Tests                                     │   │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐           │   │
│ │ │ zitadel │ │ cert-mgr│ │ monitor │ │ postgres│ ...       │   │
│ │ │ 60s     │ │ 90s     │ │ 120s    │ │ 90s     │           │   │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘           │   │
│ │ Parallel Execution                                        │   │
│ └───────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                    Self-Hosted Runners                          │
├─────────────────────────────────────────────────────────────────┤
│ ┌───────────────────────────────────────────────────────────┐   │
│ │ OKD Integration Tests                                     │   │
│ │ ┌─────────────────┐ ┌─────────────────────────────┐       │   │
│ │ │ okd_cluster_    │ │ rhob_application_           │       │   │
│ │ │ alerts (5 min)  │ │ monitoring (10 min)         │       │   │
│ │ └─────────────────┘ └─────────────────────────────┘       │   │
│ └───────────────────────────────────────────────────────────┘   │
│                                                                 │
│ ┌───────────────────────────────────────────────────────────┐   │
│ │ KVM-based E2E Tests (Harmony provisions)                  │   │
│ │ ┌─────────────────────────────────────────────────────┐   │   │
│ │ │ Harmony KVM Module provisions test VMs              │   │   │
│ │ │ - OKD HA Cluster (3 control plane, 2 workers)       │   │   │
│ │ │ - OPNSense VM (router/firewall)                     │   │   │
│ │ │ - Brocade simulator VM                              │   │   │
│ │ └─────────────────────────────────────────────────────┘   │   │
│ └───────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
```
---

## Questions for Researchers

### Critical Questions

1. **Self-contained test runner**: How to design the `harmony_e2e_tests` package so it runs all tests with a single `cargo run` command?

2. **Nested Virtualization**: What are the prerequisites for running KVM inside a test environment?

3. **Cost Optimization**: How to minimize cloud costs while running comprehensive E2E tests?

4. **Test Isolation**: How to ensure test isolation when running parallel k3d tests?

5. **State Management**: Should we persist k3d clusters between test runs, or create fresh clusters each time?

6. **Mocking Strategy**: Which external services (Discord, OPNSense, etc.) should be mocked vs. real?

7. **Compile-Fail Tests**: Best practices for testing Rust compile-time errors?

8. **Multi-Cluster Tests**: How to efficiently provision and connect multiple K8s clusters in tests?

9. **Secrets Management**: How to handle secrets for test environments without external CI dependencies?

10. **Test Flakiness**: Strategies for reducing flakiness in infrastructure tests?

11. **Reporting**: How to present test results for complex multi-environment test matrices?

12. **Prerequisite Detection**: How to detect and validate prerequisites (Docker, k3d, KVM) before running tests?
### Research Areas

1. **CI/CD Tools**: Evaluate GitHub Actions, GitLab CI, CircleCI, Tekton, Prow for Harmony's needs
2. **K8s Test Tools**: Evaluate kind, k3d, minikube, microk8s for local testing
3. **Mock Frameworks**: Evaluate mock-server, wiremock, hoverfly for external service mocking
4. **Test Frameworks**: Evaluate built-in Rust test, nextest, cargo-tarpaulin for performance

---
## Success Criteria

### Week 1 (Agentic Velocity)
- [ ] Compile-time verification tests working
- [ ] Unit tests for monitoring module
- [ ] First 5 k3d examples running in CI
- [ ] Mock framework for Discord webhooks

### Week 2
- [ ] All 22 k3d-compatible examples in CI
- [ ] OKD self-hosted runner operational
- [ ] KVM module reviewed and ready for CI

### Week 3-4
- [ ] Full E2E tests with KVM infrastructure
- [ ] Multi-cluster tests automated
- [ ] All examples tested in CI

### Month 2
- [ ] Sub-15-minute total CI time
- [ ] Weekly E2E tests on bare metal
- [ ] Documentation complete
- [ ] Ready for CNCF submission

---
## Prerequisites

### Hardware Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| CPU | 4 cores | 8+ cores (for parallel tests) |
| RAM | 8 GB | 32 GB (for KVM E2E) |
| Disk | 50 GB SSD | 500 GB NVMe |

The required tooling (Docker, k3d, kubectl, libvirt) is listed under Software Requirements below.
### Software Requirements

| Tool | Version |
|------|---------|
| Rust | 1.75+ |
| Docker | 24.0+ |
| k3d | v5.6.0+ |
| kubectl | v1.28+ |
| libvirt | 9.0+ (for KVM tests) |
### Installation (One-time)

```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install Docker
curl -fsSL https://get.docker.com | sh

# Install k3d
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash

# Install kubectl
curl -LO "https://dl.k8s.io/release/v1.28.0/bin/linux/amd64/kubectl"
sudo install -m 0755 kubectl /usr/local/bin/kubectl
```

---
## Reference Materials

### Existing Code

- Examples: `examples/*/src/main.rs`
- Topologies: `harmony/src/domain/topology/`
- Capabilities: `harmony/src/domain/topology/` (trait definitions)
- Scores: `harmony/src/modules/*/`

### Documentation

- [Coding Guide](docs/coding-guide.md)
- [Core Concepts](docs/concepts.md)
- [Monitoring Architecture](docs/monitoring.md)
- [ADR-020: Monitoring](adr/020-monitoring-alerting-architecture.md)

### Related Projects

- Crossplane (similar abstraction model)
- Pulumi (infrastructure as code)
- Terraform (state management patterns)
- Flux/ArgoCD (GitOps testing patterns)
---

CI_and_testing_roadmap.md (new file, 201 lines) @@ -0,0 +1,201 @@
# Pragmatic CI and Testing Roadmap for Harmony

**Status**: Active implementation (March 2026)
**Core Principle**: Self-contained test runner — no dependency on centralized CI servers

All tests are executable via one command:

```bash
cargo run -p harmony_e2e_tests
```

The `harmony_e2e_tests` package:
- Provisions its own infrastructure when needed (k3d, KVM VMs)
- Runs all test tiers in sequence or selectively
- Reports results in text, JSON, or JUnit XML
- Works identically on developer laptops, any Linux server, GitHub Actions, GitLab CI, Jenkins, cron jobs, etc.
- Is the single source of truth for what "passing CI" means
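The JUnit XML reporting can stay dependency-free. The sketch below emits one `<testsuite>` with one `<testcase>` per tier result, which is the subset most CI dashboards consume; the `TierResult` type is a hypothetical stand-in for whatever result struct the runner ends up with, and real JUnit output carries more attributes (timings, skipped counts) than shown here.

```rust
// Hypothetical per-tier result; the real runner's result type may differ.
struct TierResult {
    name: String,
    failure: Option<String>, // None = passed, Some(msg) = failed
}

// Emits a minimal JUnit-style <testsuite> document as a String.
fn to_junit_xml(results: &[TierResult]) -> String {
    let failures = results.iter().filter(|r| r.failure.is_some()).count();
    let mut xml = format!(
        "<testsuite name=\"harmony_e2e_tests\" tests=\"{}\" failures=\"{}\">\n",
        results.len(),
        failures
    );
    for r in results {
        match &r.failure {
            None => xml.push_str(&format!("  <testcase name=\"{}\"/>\n", r.name)),
            Some(msg) => xml.push_str(&format!(
                "  <testcase name=\"{}\"><failure message=\"{}\"/></testcase>\n",
                r.name, msg
            )),
        }
    }
    xml.push_str("</testsuite>\n");
    xml
}
```

A real implementation would also XML-escape names and messages; that detail is omitted here for brevity.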
## Why This Approach

1. **Portability** — same command & behavior everywhere
2. **Harmony tests Harmony** — the framework validates itself
3. **No vendor lock-in** — GitHub Actions / GitLab CI are just triggers
4. **Perfect reproducibility** — developers reproduce any CI failure locally in seconds
5. **Offline capable** — after initial setup, most tiers run without internet
## Architecture: `harmony_e2e_tests` Package

```
harmony_e2e_tests/
├── Cargo.toml
├── src/
│   ├── main.rs              # CLI entry point
│   ├── lib.rs               # Test runner core logic
│   ├── tiers/
│   │   ├── mod.rs
│   │   ├── compile_fail.rs  # trybuild-based compile-time checks
│   │   ├── unit.rs          # cargo test --lib --workspace
│   │   ├── k3d.rs           # k3d cluster + parallel example runs
│   │   ├── okd.rs           # connect to existing OKD cluster
│   │   └── kvm.rs           # full E2E via Harmony's own KVM module
│   ├── mocks/
│   │   ├── mod.rs
│   │   ├── discord.rs       # mock Discord webhook receiver
│   │   └── opnsense.rs      # mock OPNSense firewall API
│   └── infrastructure/
│       ├── mod.rs
│       ├── k3d.rs           # k3d cluster lifecycle
│       └── kvm.rs           # helper wrappers around KVM score
└── tests/
    ├── ui/                  # trybuild compile-fail cases (*.rs + *.stderr)
    └── fixtures/            # static test data / golden files
```
## CLI Interface (clap-based)

```bash
# Run everything (default)
cargo run -p harmony_e2e_tests

# Specific tier
cargo run -p harmony_e2e_tests -- --tier k3d
cargo run -p harmony_e2e_tests -- --tier compile

# Filter to one example
cargo run -p harmony_e2e_tests -- --tier k3d --example monitoring

# Parallelism control (k3d tier)
cargo run -p harmony_e2e_tests -- --parallel 8

# Reporting
cargo run -p harmony_e2e_tests -- --report junit.xml
cargo run -p harmony_e2e_tests -- --format json

# Debug helpers
cargo run -p harmony_e2e_tests -- --verbose --dry-run
```
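The flag surface above would normally be declared with clap derive macros; to keep this sketch self-contained it is shown as a std-only parser with the same semantics. The default values (tier `all`, parallelism 4) are assumptions for illustration, not decisions the roadmap has made.

```rust
// Std-only sketch of the CLI surface above (the real package would use clap).
// Defaults shown here are assumptions, not committed behavior.

#[derive(Debug, PartialEq)]
struct Cli {
    tiers: Vec<String>,      // --tier compile,unit,k3d
    example: Option<String>, // --example monitoring
    parallel: usize,         // --parallel 8
    verbose: bool,           // --verbose
}

fn parse_cli(args: &[&str]) -> Cli {
    let mut cli = Cli {
        tiers: vec!["all".into()],
        example: None,
        parallel: 4,
        verbose: false,
    };
    let mut it = args.iter();
    while let Some(arg) = it.next() {
        match *arg {
            "--tier" => {
                if let Some(v) = it.next() {
                    // Comma-separated tier list, e.g. "compile,unit,k3d"
                    cli.tiers = v.split(',').map(str::to_string).collect();
                }
            }
            "--example" => cli.example = it.next().map(|v| v.to_string()),
            "--parallel" => {
                cli.parallel = it.next().and_then(|v| v.parse().ok()).unwrap_or(cli.parallel)
            }
            "--verbose" => cli.verbose = true,
            _ => {} // unrecognized flags ignored in this sketch
        }
    }
    cli
}
```

With clap, each field becomes a `#[arg(...)]` attribute and the unrecognized-flag case becomes a hard error instead of being ignored.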
## Test Tiers – Ordered by Speed & Cost

| Tier | Duration target | Runner type | What it tests | Isolation strategy |
|------|-----------------|-------------|---------------|--------------------|
| Compile-fail | < 20 s | Any (GitHub free) | Invalid configs don't compile | Per-file trybuild |
| Unit | < 60 s | Any | Pure Rust logic | cargo test |
| k3d | 8–15 min | GitHub / self-hosted | 22+ k3d-compatible examples | Fresh k3d cluster + ns-per-example |
| OKD | 10–30 min | Self-hosted / CRC | OKD-specific features (Routes, Monitoring CRDs…) | Existing cluster via KUBECONFIG |
| KVM Full E2E | 60–180 min | Self-hosted bare-metal | Full HA OKD install + bare-metal scenarios | Harmony KVM score provisions VMs |
### Tier Details & Implementation Notes

1. **Compile-fail**
   Uses the **`trybuild`** crate (standard in the Rust ecosystem).
   Place intentional compile errors in `tests/ui/*.rs` with matching `*.stderr` expectation files.
   One test function replaces the old custom loop:

   ```rust
   #[test]
   fn ui() {
       let t = trybuild::TestCases::new();
       t.compile_fail("tests/ui/*.rs");
   }
   ```

2. **Unit**
   Simple wrapper: `cargo test --lib --workspace -- --nocapture`
   Consider `cargo-nextest` later for a 2–3× speedup if the test count grows.
3. **k3d**
   - Provisions an isolated cluster once at start: `k3d cluster create --agents 3 --no-lb --k3s-arg "--disable=traefik@server:0"` (Traefik is disabled via `--k3s-arg`; k3d has no bare `--disable traefik` flag)
   - Discovers examples via `test-tier = "k3d"` under `[package.metadata.harmony]` in each example's `Cargo.toml`
   - Runs examples in parallel under a tokio semaphore (default 5–8 slots)
   - Each example gets its own namespace
   - Uses `defer` / `scopeguard` for guaranteed cleanup
   - Mocks the Discord webhook and OPNSense API
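The semaphore-bounded parallelism for the k3d tier can be illustrated without tokio: a channel holding N permits gives the same bound with plain threads. This is a sketch of the concurrency pattern only; the real runner would use `tokio::sync::Semaphore` around async example runs.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Runs `jobs` fake workloads with at most `slots` running concurrently,
// and returns the maximum concurrency actually observed.
fn run_bounded(jobs: usize, slots: usize) -> usize {
    // A channel pre-loaded with `slots` permits acts as a semaphore.
    let (permit_tx, permit_rx) = mpsc::channel();
    for _ in 0..slots {
        permit_tx.send(()).unwrap();
    }
    let permit_rx = Arc::new(Mutex::new(permit_rx));
    // (currently running, max ever running)
    let counters = Arc::new(Mutex::new((0usize, 0usize)));

    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let permit_rx = Arc::clone(&permit_rx);
            let permit_tx = permit_tx.clone();
            let counters = Arc::clone(&counters);
            thread::spawn(move || {
                // Acquire a permit before "running the example".
                permit_rx.lock().unwrap().recv().unwrap();
                {
                    let mut c = counters.lock().unwrap();
                    c.0 += 1;
                    c.1 = c.1.max(c.0);
                }
                thread::sleep(std::time::Duration::from_millis(10)); // fake work
                counters.lock().unwrap().0 -= 1;
                permit_tx.send(()).unwrap(); // release the permit
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let max = counters.lock().unwrap().1;
    max
}
```

The per-example namespace and `scopeguard` cleanup mentioned above would wrap the "fake work" section, so a panicking example still releases its permit and deletes its namespace.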
||||
4. **OKD**
   Connects to a pre-provisioned cluster via `KUBECONFIG`.
   Validates that it is actually OpenShift/OKD before proceeding.

5. **KVM**
   Uses **Harmony’s own KVM module** to provision test VMs (control-plane + workers + OPNSense).
   → True “dogfooding” — if the E2E fails, the KVM score itself is likely broken.
## CI Integration Patterns

### Fast PR validation (GitHub Actions)

```yaml
name: Fast Tests
on: [push, pull_request]
jobs:
  fast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - name: Install Docker & k3d
        uses: nolar/setup-k3d-k3s@v1
      - run: cargo run -p harmony_e2e_tests -- --tier compile,unit,k3d --report junit.xml
      - uses: actions/upload-artifact@v4
        with: { name: test-results, path: junit.xml }
```
### Nightly / merge heavy tests (self-hosted runner)

```yaml
name: Full E2E
on:
  schedule: [{ cron: "0 3 * * *" }]
  push: { branches: [main] }
jobs:
  full:
    runs-on: [self-hosted, linux, x64, kvm-capable]
    steps:
      - uses: actions/checkout@v4
      - run: cargo run -p harmony_e2e_tests -- --tier okd,kvm --verbose --report junit.xml
```
## Prerequisites Auto-Check & Install

```rust
// in harmony_e2e_tests/src/infrastructure/prerequisites.rs
async fn ensure_k3d() -> Result<()> { … }    // curl | bash if missing
async fn ensure_docker() -> Result<()> { … }
fn check_kvm_support() -> Result<()> { … }   // /dev/kvm + libvirt
```
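A concrete (and testable) version of `check_kvm_support` can start with the `/dev/kvm` existence check, since that is the one signal that works identically on laptops, servers, and VMs with nested virtualization. Taking the device path as a parameter is an assumption made here purely so the function can be exercised without real KVM; a fuller check would also probe libvirt.

```rust
use std::path::Path;

// Sketch of the /dev/kvm half of check_kvm_support. The path parameter
// exists only to make the function testable; production code would pass
// Path::new("/dev/kvm").
fn kvm_available(dev_kvm: &Path) -> Result<(), String> {
    if dev_kvm.exists() {
        Ok(())
    } else {
        Err(format!(
            "{} not found: enable VT-x/AMD-V in firmware (and nested virt on VMs)",
            dev_kvm.display()
        ))
    }
}
```

Running this before the KVM tier turns an hour-long E2E failure into an immediate, actionable error message.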
## Success Criteria

### Step 1
- [ ] `harmony_e2e_tests` package created & basic CLI working
- [ ] trybuild compile-fail suite passing
- [ ] First 8–10 k3d examples running reliably in CI
- [ ] Mock server for Discord webhook completed

### Step 2
- [ ] All 22 k3d-compatible examples green
- [ ] OKD tier running on dedicated self-hosted runner
- [ ] JUnit reporting + GitHub check integration
- [ ] Namespace isolation + automatic retry on transient k8s errors

### Step 3
- [ ] KVM full E2E green on bare-metal runner (nightly)
- [ ] Multi-cluster examples (nats, multisite-postgres) automated
- [ ] Total fast CI time < 12 minutes on GitHub runners
- [ ] Documentation: “How to add a new tested example”
## Quick Start for New Contributors

```bash
# One-time setup
rustup update stable
cargo install cargo-nextest   # optional but recommended (trybuild is a dev-dependency, not an installable binary)

# Run locally (most common)
cargo run -p harmony_e2e_tests -- --tier k3d --verbose

# Just compile checks + unit
cargo test -p harmony_e2e_tests
```
---

Cargo.lock (generated, 624 lines changed)
@@ -297,6 +297,12 @@ dependencies = [
 "libc",
]

[[package]]
name = "ansi_term"
version = "0.10.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6b3568b48b7cefa6b8ce125f9bb4989e52fbcc29ebea88df04cc7c5f12f70455"

[[package]]
name = "anstream"
version = "0.6.21"

@@ -718,6 +724,41 @@ dependencies = [
 "tokio",
]

[[package]]
name = "brocade-snmp-server"
version = "0.1.0"
dependencies = [
 "base64 0.22.1",
 "brocade",
 "env_logger",
 "harmony",
 "harmony_cli",
 "harmony_macros",
 "harmony_secret",
 "harmony_types",
 "log",
 "serde",
 "tokio",
 "url",
]

[[package]]
name = "brocade-switch"
version = "0.1.0"
dependencies = [
 "async-trait",
 "brocade",
 "env_logger",
 "harmony",
 "harmony_cli",
 "harmony_macros",
 "harmony_types",
 "log",
 "serde",
 "tokio",
 "url",
]

[[package]]
name = "brotli"
version = "8.0.2"

@@ -871,6 +912,22 @@ dependencies = [
 "shlex",
]

[[package]]
name = "cert_manager"
version = "0.1.0"
dependencies = [
 "assert_cmd",
 "cidr",
 "env_logger",
 "harmony",
 "harmony_cli",
 "harmony_macros",
 "harmony_types",
 "log",
 "tokio",
 "url",
]

[[package]]
name = "cfg-if"
version = "1.0.4"

@@ -1853,6 +1910,12 @@ dependencies = [
 "regex",
]

[[package]]
name = "env_home"
version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c7f84e12ccf0a7ddc17a6c41c93326024c42920d7ee630d04950e6926645c0fe"

[[package]]
name = "env_logger"
version = "0.11.9"

@@ -1929,6 +1992,457 @@ dependencies = [
name = "example"
version = "0.0.0"

[[package]]
name = "example-application-monitoring-with-tenant"
version = "0.1.0"
dependencies = [
 "env_logger",
 "harmony",
 "harmony_cli",
 "harmony_types",
 "logging",
 "tokio",
 "url",
]

[[package]]
name = "example-cli"
version = "0.1.0"
dependencies = [
 "assert_cmd",
 "cidr",
 "env_logger",
 "harmony",
 "harmony_cli",
 "harmony_macros",
 "harmony_types",
 "log",
 "tokio",
 "url",
]

[[package]]
name = "example-k8s-drain-node"
version = "0.1.0"
dependencies = [
 "assert_cmd",
 "cidr",
 "env_logger",
 "harmony",
 "harmony-k8s",
 "harmony_cli",
 "harmony_macros",
 "harmony_types",
 "inquire 0.7.5",
 "log",
 "tokio",
 "url",
]

[[package]]
name = "example-k8s-write-file-on-node"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"assert_cmd",
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony-k8s",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"inquire 0.7.5",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-kube-rs"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_macros",
|
||||
"http 1.4.0",
|
||||
"inquire 0.7.5",
|
||||
"k8s-openapi",
|
||||
"kube",
|
||||
"log",
|
||||
"serde_yaml",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-lamp"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-monitoring"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-monitoring-with-tenant"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_types",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-multisite-postgres"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-nats"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-nats-module-supercluster"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"k8s-openapi",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-nats-supercluster"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"k8s-openapi",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-node-health"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-ntfy"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-okd-cluster-alerts"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"brocade",
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_secret",
|
||||
"harmony_secret_derive",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"serde",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-okd-install"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"brocade",
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_secret",
|
||||
"harmony_secret_derive",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"schemars 0.8.22",
|
||||
"serde",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-openbao"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-operatorhub-catalogsource"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-opnsense"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"brocade",
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_secret",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"schemars 0.8.22",
|
||||
"serde",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-opnsense-node-exporter"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"async-trait",
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_secret",
|
||||
"harmony_secret_derive",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"serde",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-postgresql"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-public-postgres"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-pxe"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"brocade",
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_secret",
|
||||
"harmony_secret_derive",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"schemars 0.8.22",
|
||||
"serde",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-remove-rook-osd"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"tokio",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-rust"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"base64 0.22.1",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-tenant"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-try-rust-webapp"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"base64 0.22.1",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-tui"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_macros",
|
||||
"harmony_tui",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example-zitadel"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "example_validate_ceph_cluster_health"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"tokio",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "eyre"
|
||||
version = "0.6.12"
|
||||
@@ -2540,6 +3054,30 @@ dependencies = [
|
||||
"tokio",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "harmony_e2e_tests"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"async-trait",
|
||||
"chrono",
|
||||
"clap",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"inventory",
|
||||
"k3d-rs",
|
||||
"k8s-openapi",
|
||||
"kube",
|
||||
"log",
|
||||
"serde",
|
||||
"serde_json",
|
||||
"sqlx",
|
||||
"tempfile",
|
||||
"thiserror 2.0.18",
|
||||
"tokio",
|
||||
"tokio-stream",
|
||||
"which",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "harmony_execution"
|
||||
version = "0.1.0"
|
||||
@@ -2569,6 +3107,19 @@ dependencies = [
|
||||
"tokio",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "harmony_inventory_builder"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"cidr",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "harmony_macros"
|
||||
version = "0.1.0"
|
||||
@@ -3333,6 +3884,15 @@ dependencies = [
|
||||
"thiserror 1.0.69",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "inventory"
|
||||
version = "0.3.22"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "009ae045c87e7082cb72dab0ccd01ae075dd00141ddc108f43a0ea150a9e7227"
|
||||
dependencies = [
|
||||
"rustversion",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "ipnet"
|
||||
version = "2.12.0"
|
||||
@@ -3732,6 +4292,15 @@ dependencies = [
|
||||
"log",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "logging"
|
||||
version = "0.1.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "461a8beca676e8ab1bd468c92e9b4436d6368e11e96ae038209e520cfe665e46"
|
||||
dependencies = [
|
||||
"ansi_term",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "lru"
|
||||
version = "0.12.5"
|
||||
@@ -4954,6 +5523,21 @@ dependencies = [
|
||||
"subtle",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "rhob-application-monitoring"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"base64 0.22.1",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "ring"
|
||||
version = "0.17.14"
|
||||
@@ -5927,6 +6511,7 @@ dependencies = [
|
||||
"memchr",
|
||||
"once_cell",
|
||||
"percent-encoding",
|
||||
"rustls 0.23.37",
|
||||
"serde",
|
||||
"serde_json",
|
||||
"sha2",
|
||||
@@ -5936,6 +6521,7 @@ dependencies = [
|
||||
"tokio-stream",
|
||||
"tracing",
|
||||
"url",
|
||||
"webpki-roots 0.26.11",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -6208,6 +6794,26 @@ dependencies = [
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "sttest"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"brocade",
|
||||
"cidr",
|
||||
"env_logger",
|
||||
"harmony",
|
||||
"harmony_cli",
|
||||
"harmony_macros",
|
||||
"harmony_secret",
|
||||
"harmony_secret_derive",
|
||||
"harmony_types",
|
||||
"log",
|
||||
"schemars 0.8.22",
|
||||
"serde",
|
||||
"tokio",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "subtle"
|
||||
version = "2.6.1"
|
||||
@@ -7210,6 +7816,18 @@ dependencies = [
|
||||
"rustls-pki-types",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "which"
|
||||
version = "7.0.3"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "24d643ce3fd3e5b54854602a080f34fb10ab75e0b813ee32d00ca2b44fa74762"
|
||||
dependencies = [
|
||||
"either",
|
||||
"env_home",
|
||||
"rustix 1.1.4",
|
||||
"winsafe",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "whoami"
|
||||
version = "1.6.1"
|
||||
@@ -7585,6 +8203,12 @@ dependencies = [
|
||||
"windows-sys 0.48.0",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "winsafe"
|
||||
version = "0.0.19"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "d135d17ab770252ad95e9a872d365cf3090e3be864a34ab46f48555993efc904"
|
||||
|
||||
[[package]]
|
||||
name = "wit-bindgen"
|
||||
version = "0.51.0"
|
||||
|
||||
Cargo.toml: 10 changed lines
@@ -2,6 +2,7 @@
resolver = "2"
members = [
    "private_repos/*",
    "examples/*",
    "harmony",
    "harmony_types",
    "harmony_macros",
@@ -16,9 +17,12 @@ members = [
    "harmony_secret_derive",
    "harmony_secret",
    "adr/agent_discovery/mdns",
    "brocade",
    "harmony_agent",
    "harmony_agent/deploy", "harmony_node_readiness", "harmony-k8s",
    "brocade",
    "harmony_agent",
    "harmony_agent/deploy",
    "harmony_node_readiness",
    "harmony-k8s",
    "harmony_e2e_tests",
]

[workspace.package]
docs/coding-guide.md (new file): 299 lines
@@ -0,0 +1,299 @@
# Harmony Coding Guide

Harmony is an infrastructure automation framework. It is **code-first and code-only**: operators write Rust programs to declare and drive infrastructure, rather than YAML files or DSL configs. Good code here means a good operator experience.

### Concrete context

This guide uses the KVM module as its running example. The KVM context makes the coding style easy to follow, and the principles translate directly to other modules and contexts managed by Harmony, such as OPNsense and Kubernetes.

## Core Philosophy

### The Careful Craftsman Principle

Harmony is a powerful framework that does a lot. With that power comes responsibility. Every abstraction, every trait, every module must earn its place. Before adding anything, ask:

1. **Does this solve a real problem users have?** Not a theoretical problem, an actual one encountered in production.
2. **Is this the simplest solution that works?** Complexity is a cost that compounds over time.
3. **Will this make the next developer's life easier or harder?** Code is read far more often than written.

When in doubt, don't abstract. Wait for the pattern to emerge from real usage. A little duplication is better than the wrong abstraction.

### High-level functions over raw primitives

Callers should not need to know about underlying protocols, XML schemas, or API quirks. A function that deploys a VM should accept meaningful parameters like CPU count, memory, and network name — not XML strings.

```rust
// Bad: caller constructs XML and passes it to a thin wrapper
let xml = format!(r#"<domain type='kvm'>...</domain>"#, name, memory_kb, ...);
executor.create_vm(&xml).await?;

// Good: caller describes intent, the module handles representation
executor.define_vm(&VmConfig::builder("my-vm")
    .cpu(4)
    .memory_gb(8)
    .disk(DiskConfig::new(50))
    .network(NetworkRef::named("mylan"))
    .boot_order([BootDevice::Network, BootDevice::Disk])
    .build())
    .await?;
```

The module owns the XML, the virsh invocations, the API calls — not the caller.

### Use the right abstraction layer

Prefer native library bindings over shelling out to CLI tools. The `virt` crate provides direct libvirt bindings and should be used instead of spawning `virsh` subprocesses.

- CLI subprocess calls are fragile: stdout/stderr parsing, exit codes, quoting, PATH differences
- Native bindings give typed errors, no temp files, no shell escaping
- `virt::connect::Connect` opens a connection; `virt::domain::Domain` manages VMs; `virt::network::Network` manages virtual networks

### Keep functions small and well-named

Each function should do one thing. If a function is doing two conceptually separate things, split it. Function names should read like plain English: `ensure_network_active`, `define_vm`, `vm_is_running`.

### Prefer short modules over large files

Group related types and functions by concept. A module that handles one resource (e.g., network, domain, storage) is better than a single file for everything.

---
## Error Handling

### Use `thiserror` for all error types

Define error types with `thiserror::Error`. This removes the boilerplate of implementing `Display` and `std::error::Error` by hand, keeps error messages close to their variants, and makes types easy to extend.

```rust
// Bad: hand-rolled Display + std::error::Error
#[derive(Debug)]
pub enum KVMError {
    ConnectionError(String),
    VMNotFound(String),
}

impl std::fmt::Display for KVMError { ... }
impl std::error::Error for KVMError {}

// Good: derive Display via thiserror
#[derive(thiserror::Error, Debug)]
pub enum KVMError {
    #[error("connection failed: {0}")]
    ConnectionFailed(String),
    #[error("VM not found: {name}")]
    VmNotFound { name: String },
}
```

### Make bubbling errors easy with `?` and `From`

`?` works on any error type for which there is a `From` impl. Add `From` conversions from lower-level errors into your module's error type so callers can use `?` without boilerplate.

With `thiserror`, wrapping a foreign error is one line:

```rust
#[derive(thiserror::Error, Debug)]
pub enum KVMError {
    #[error("libvirt error: {0}")]
    Libvirt(#[from] virt::error::Error),

    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
}
```

This means a call that returns `virt::error::Error` can be `?`-propagated into a `Result<_, KVMError>` without any `.map_err(...)`.
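As a dependency-free sketch of that propagation, the snippet below hand-rolls the `Display`, `Error`, and `From` impls that `thiserror` would otherwise derive; `KvmError` and `read_domain_xml` are illustrative names, not the real module API:

```rust
use std::fmt;

// Hypothetical module error. Display/Error are written by hand here only so
// the sketch is dependency-free; with thiserror these impls are derived.
#[derive(Debug)]
enum KvmError {
    Io(std::io::Error),
}

impl fmt::Display for KvmError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KvmError::Io(e) => write!(f, "IO error: {e}"),
        }
    }
}

impl std::error::Error for KvmError {}

// The `From` impl is what `?` relies on to convert the lower-level error.
impl From<std::io::Error> for KvmError {
    fn from(e: std::io::Error) -> Self {
        KvmError::Io(e)
    }
}

// A call returning std::io::Error propagates into Result<_, KvmError>
// with a bare `?`; no .map_err(...) needed.
fn read_domain_xml(path: &str) -> Result<String, KvmError> {
    let xml = std::fs::read_to_string(path)?; // io::Error -> KvmError via From
    Ok(xml)
}

fn main() {
    let err = read_domain_xml("/nonexistent/domain.xml").unwrap_err();
    println!("{err}");
}
```

The same shape applies to wrapping `virt::error::Error`: one variant, one `From` impl, and every libvirt call site stays a plain `?`.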
### Typed errors over stringly-typed errors

Avoid `Box<dyn Error>` or `String` as error return types in library code. Callers need to distinguish errors programmatically — `KVMError::VmAlreadyExists` is actionable, `"VM already exists: foo"` as a `String` is not.

At binary entry points (e.g., `main`) it is acceptable to convert to `String` or `anyhow::Error` for display.

---
## Logging

### Use the `log` crate macros

All log output must go through the `log` crate. Never use `println!`, `eprintln!`, or `dbg!` in library code. This makes output compatible with any logging backend (env_logger, tracing, structured logging, etc.).

```rust
// Bad
println!("Creating VM: {}", name);

// Good
use log::{info, debug, warn};
info!("Creating VM: {name}");
debug!("VM XML:\n{xml}");
warn!("Network already active, skipping creation");
```

Use the right level:

| Level   | When to use |
|---------|-------------|
| `error` | Unrecoverable failures (before returning `Err`) |
| `warn`  | Recoverable issues, skipped steps |
| `info`  | High-level progress events visible in normal operation |
| `debug` | Detailed operational info useful for debugging |
| `trace` | Very granular, per-iteration or per-call data |

Log before significant operations and after unexpected conditions. Do not log inside tight loops at `info` level.

---
## Types and Builders

### Derive `Serialize` on all public domain types

All public structs and enums that represent configuration or state should derive `serde::Serialize`. Add `Deserialize` when round-trip serialization is needed.

### Builder pattern for complex configs

When a type has more than three fields or optional fields, provide a builder. The builder pattern allows named, incremental construction without positional arguments.

```rust
let config = VmConfig::builder("bootstrap")
    .cpu(4)
    .memory_gb(8)
    .disk(DiskConfig::new(50).labeled("os"))
    .disk(DiskConfig::new(100).labeled("data"))
    .network(NetworkRef::named("harmonylan"))
    .boot_order([BootDevice::Network, BootDevice::Disk])
    .build();
```

### Avoid `pub` fields on config structs

Expose data through methods or the builder, not raw field access. This preserves the ability to validate, rename, or change representation without breaking callers.
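A minimal sketch of such a builder follows; the field names are illustrative, not the real `VmConfig`. It shows the mandatory name taken at construction, defaults for everything else, and read access through a method rather than `pub` fields:

```rust
// Illustrative config type: fields are private, read access goes through methods.
#[derive(Debug, Clone)]
pub struct VmConfig {
    name: String,
    cpu: u32,
    memory_gb: u32,
}

pub struct VmConfigBuilder {
    name: String,
    cpu: u32,
    memory_gb: u32,
}

impl VmConfig {
    // Entry point takes the only mandatory field; the rest get defaults.
    pub fn builder(name: &str) -> VmConfigBuilder {
        VmConfigBuilder { name: name.to_string(), cpu: 1, memory_gb: 2 }
    }

    pub fn cpu(&self) -> u32 { self.cpu }
}

impl VmConfigBuilder {
    // Consuming setters keep call sites chainable.
    pub fn cpu(mut self, cpu: u32) -> Self { self.cpu = cpu; self }
    pub fn memory_gb(mut self, gb: u32) -> Self { self.memory_gb = gb; self }

    pub fn build(self) -> VmConfig {
        VmConfig { name: self.name, cpu: self.cpu, memory_gb: self.memory_gb }
    }
}

fn main() {
    let config = VmConfig::builder("bootstrap").cpu(4).memory_gb(8).build();
    println!("{config:?}");
}
```

Because the fields stay private, `build()` is also the natural place to add validation later without breaking any caller.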
---
## Async

### Use `tokio` for all async runtime needs

All async code runs on tokio. Use `tokio::spawn`, `tokio::time`, etc. Use `#[async_trait]` for traits with async methods.

### No blocking in async context

Never call blocking I/O (file I/O, network, process spawn) directly in an async function. Use `tokio::fs`, `tokio::process`, or `tokio::task::spawn_blocking` as appropriate.

---
## Module Structure

### Follow the `Score` / `Interpret` pattern

Modules that represent deployable infrastructure should implement `Score<T: Topology>` and `Interpret<T>`:

- `Score` is the serializable, clonable configuration declaring *what* to deploy
- `Interpret` does the actual work when `execute()` is called

```rust
pub struct KvmScore {
    network: NetworkConfig,
    vms: Vec<VmConfig>,
}

impl<T: Topology + KvmHost> Score<T> for KvmScore {
    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(KvmInterpret::new(self.clone()))
    }
    fn name(&self) -> String { "KvmScore".to_string() }
}
```

### Flatten the public API in `mod.rs`

Internal submodules are an implementation detail. Re-export what callers need at the module root:

```rust
// modules/kvm/mod.rs
mod connection;
mod domain;
mod network;
mod error;
mod xml;

pub use connection::KvmConnection;
pub use domain::{VmConfig, VmConfigBuilder, VmStatus, DiskConfig, BootDevice};
pub use error::KvmError;
pub use network::NetworkConfig;
```

---
## Commit Style

Follow [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/):

```
feat(kvm): add network isolation support
fix(kvm): correct memory unit conversion for libvirt
refactor(kvm): replace virsh subprocess calls with virt crate bindings
docs: add coding guide
```

Keep pull requests small and single-purpose (under ~200 lines excluding generated code). Do not mix refactoring, bug fixes, and new features in one PR.

---
## When to Add Abstractions

Harmony provides powerful abstraction mechanisms: traits, generics, the Score/Interpret pattern, and capabilities. Use them judiciously.

### Add an abstraction when:

- **You have three or more concrete implementations** doing the same thing. Two is often coincidence; three is a pattern.
- **The abstraction provides compile-time safety** that prevents real bugs (e.g., capability bounds on topologies).
- **The abstraction hides genuine complexity** that callers shouldn't need to understand (e.g., XML schema generation for libvirt).

### Don't add an abstraction when:

- **It's just to avoid a few lines of boilerplate**. Copy-paste is sometimes better than a trait hierarchy.
- **You're anticipating future flexibility** that isn't needed today. YAGNI (You Aren't Gonna Need It).
- **The abstraction makes the code harder to understand** for someone unfamiliar with the codebase.
- **You're wrapping a single implementation**. A trait with one implementation is usually over-engineering.

### Signs you've over-abstracted:

- You need to explain the type system to a competent Rust developer for them to understand how to add a simple feature.
- Adding a new concrete type requires changes in multiple trait definitions.
- The word "factory" or "manager" appears in your type names.
- You have more trait definitions than concrete implementations.

### The Rule of Three for Traits

Before creating a new trait, ensure you have:

1. A clear, real use case (not hypothetical)
2. At least one concrete implementation
3. A plan for how callers will use it

Only generalize when the pattern is proven. The monitoring module is a good example: we had multiple alert senders (OKD, KubePrometheus, RHOB) before we introduced the `AlertSender` and `AlertReceiver<S>` traits. The traits emerged from real needs, not design sessions.

---
## Documentation

### Document the "why", not the "what"

Code should be self-explanatory for the "what". Comments and documentation should explain intent, rationale, and gotchas.

```rust
// Bad: restates the code
// Returns the number of VMs
fn vm_count(&self) -> usize { self.vms.len() }

// Good: explains the why
// Returns 0 if connection is lost, rather than erroring,
// because monitoring code uses this for health checks
fn vm_count(&self) -> usize { self.vms.len() }
```

### Keep examples in the `examples/` directory

Working code beats documentation. Every major feature should have a runnable example that demonstrates real usage.
@@ -3,12 +3,10 @@ use harmony::{
    modules::monitoring::{
        alert_channel::discord_alert_channel::DiscordReceiver,
        alert_rule::{
            alerts::{
                infra::opnsense::high_http_error_rate, k8s::pvc::high_pvc_fill_rate_over_two_days,
            },
            alerts::infra::opnsense::high_http_error_rate,
            prometheus_alert_rule::AlertManagerRuleGroup,
        },
        okd::openshift_cluster_alerting_score::OpenshiftClusterAlertScore,
        cluster_alerting::ClusterAlertingScore,
        scrape_target::prometheus_node_exporter::PrometheusNodeExporter,
    },
    topology::{
@@ -21,22 +19,37 @@ use harmony_macros::{hurl, ip};

#[tokio::main]
async fn main() {
    let platform_matcher = AlertMatcher {
        label: "prometheus".to_string(),
        operator: MatchOp::Eq,
        value: "openshift-monitoring/k8s".to_string(),
    };
    let severity = AlertMatcher {
        label: "severity".to_string(),
        operator: MatchOp::Eq,
        value: "critical".to_string(),
    let critical_receiver = DiscordReceiver {
        name: "critical-alerts".to_string(),
        url: hurl!("https://discord.example.com/webhook/critical"),
        route: AlertRoute {
            matchers: vec![AlertMatcher {
                label: "severity".to_string(),
                operator: MatchOp::Eq,
                value: "critical".to_string(),
            }],
            ..AlertRoute::default("critical-alerts".to_string())
        },
    };

    let high_http_error_rate = high_http_error_rate();
    let warning_receiver = DiscordReceiver {
        name: "warning-alerts".to_string(),
        url: hurl!("https://discord.example.com/webhook/warning"),
        route: AlertRoute {
            matchers: vec![AlertMatcher {
                label: "severity".to_string(),
                operator: MatchOp::Eq,
                value: "warning".to_string(),
            }],
            repeat_interval: Some("30m".to_string()),
            ..AlertRoute::default("warning-alerts".to_string())
        },
    };

    let additional_rules = AlertManagerRuleGroup::new("test-rule", vec![high_http_error_rate]);
    let additional_rules =
        AlertManagerRuleGroup::new("infra-alerts", vec![high_http_error_rate()]);

    let scrape_target = PrometheusNodeExporter {
    let firewall_scraper = PrometheusNodeExporter {
        job_name: "firewall".to_string(),
        metrics_path: "/metrics".to_string(),
        listen_address: ip!("192.168.1.1"),
@@ -44,22 +57,16 @@ async fn main() {
        ..Default::default()
    };

    let alerting_score = ClusterAlertingScore::new()
        .critical_receiver(Box::new(critical_receiver))
        .warning_receiver(Box::new(warning_receiver))
        .additional_rule(Box::new(additional_rules))
        .scrape_target(Box::new(firewall_scraper));

    harmony_cli::run(
        Inventory::autoload(),
        K8sAnywhereTopology::from_env(),
        vec![Box::new(OpenshiftClusterAlertScore {
            receivers: vec![Box::new(DiscordReceiver {
                name: "crit-wills-discord-channel-example".to_string(),
                url: hurl!("https://test.io"),
                route: AlertRoute {
                    matchers: vec![severity],
                    ..AlertRoute::default("crit-wills-discord-channel-example".to_string())
                },
            })],
            sender: harmony::modules::monitoring::okd::OpenshiftClusterAlertSender,
            rules: vec![Box::new(additional_rules)],
            scrape_targets: Some(vec![Box::new(scrape_target)]),
        })],
        vec![Box::new(alerting_score)],
        None,
    )
    .await
@@ -46,6 +46,14 @@ impl std::fmt::Debug for K8sClient {
}

impl K8sClient {
    pub fn inner_client(&self) -> &Client {
        &self.client
    }

    pub fn inner_client_clone(&self) -> Client {
        self.client.clone()
    }

    /// Create a client, reading `DRY_RUN` from the environment.
    pub fn new(client: Client) -> Self {
        Self {
harmony/src/modules/monitoring/cluster_alerting/cluster_alerting_score.rs (new file, 194 lines)

```rust
use serde::Serialize;

use crate::{
    interpret::Interpret,
    modules::monitoring::{
        alert_rule::{
            alerts::k8s::{
                deployment::alert_deployment_unavailable, memory_usage::alert_high_cpu_usage,
                memory_usage::alert_high_memory_usage, pod::alert_container_restarting,
                pod::alert_pod_not_ready, pod::pod_failed, pvc::high_pvc_fill_rate_over_two_days,
            },
            prometheus_alert_rule::AlertManagerRuleGroup,
        },
        okd::OpenshiftClusterAlertSender,
    },
    score::Score,
    topology::{
        monitoring::{
            AlertReceiver, AlertRoute, AlertRule, AlertingInterpret, MatchOp, Observability,
            ScrapeTarget,
        },
        Topology,
    },
};

#[derive(Debug, Clone)]
pub struct ClusterAlertingScore {
    pub critical_alerts_receiver: Option<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>,
    pub warning_alerts_receiver: Option<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>,
    pub additional_rules: Vec<Box<dyn AlertRule<OpenshiftClusterAlertSender>>>,
    pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>>>>,
    pub include_default_rules: bool,
}

impl ClusterAlertingScore {
    pub fn new() -> Self {
        Self {
            critical_alerts_receiver: None,
            warning_alerts_receiver: None,
            additional_rules: vec![],
            scrape_targets: None,
            include_default_rules: true,
        }
    }

    pub fn critical_receiver(
        mut self,
        receiver: Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>,
    ) -> Self {
        self.critical_alerts_receiver = Some(receiver);
        self
    }

    pub fn warning_receiver(
        mut self,
        receiver: Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>,
    ) -> Self {
        self.warning_alerts_receiver = Some(receiver);
        self
    }

    pub fn additional_rule(
        mut self,
        rule: Box<dyn AlertRule<OpenshiftClusterAlertSender>>,
    ) -> Self {
        self.additional_rules.push(rule);
        self
    }

    pub fn scrape_target(
        mut self,
        target: Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>>,
    ) -> Self {
        self.scrape_targets
            .get_or_insert_with(Vec::new)
            .push(target);
        self
    }

    pub fn with_default_rules(mut self, include: bool) -> Self {
        self.include_default_rules = include;
        self
    }

    fn build_default_rules(&self) -> Vec<Box<dyn AlertRule<OpenshiftClusterAlertSender>>> {
        if !self.include_default_rules {
            return vec![];
        }

        let critical_rules =
            AlertManagerRuleGroup::new("cluster-critical-alerts", vec![pod_failed()]);

        let warning_rules = AlertManagerRuleGroup::new(
            "cluster-warning-alerts",
            vec![
                alert_deployment_unavailable(),
                alert_container_restarting(),
                alert_pod_not_ready(),
                alert_high_memory_usage(),
                alert_high_cpu_usage(),
                high_pvc_fill_rate_over_two_days(),
            ],
        );

        vec![Box::new(critical_rules), Box::new(warning_rules)]
    }

    fn build_receivers(&self) -> Vec<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>> {
        let mut receivers = vec![];

        if let Some(ref critical_receiver) = self.critical_alerts_receiver {
            receivers.push(critical_receiver.clone());
        }

        if let Some(ref warning_receiver) = self.warning_alerts_receiver {
            receivers.push(warning_receiver.clone());
        }

        receivers
    }
}

impl Default for ClusterAlertingScore {
    fn default() -> Self {
        Self::new()
    }
}

impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T> for ClusterAlertingScore {
    fn name(&self) -> String {
        "ClusterAlertingScore".to_string()
    }

    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        let mut all_rules = self.build_default_rules();
        all_rules.extend(self.additional_rules.clone());

        let receivers = self.build_receivers();

        Box::new(AlertingInterpret {
            sender: OpenshiftClusterAlertSender,
            receivers,
            rules: all_rules,
            scrape_targets: self.scrape_targets.clone(),
        })
    }
}

impl Serialize for ClusterAlertingScore {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        serde_json::json!({
            "name": "ClusterAlertingScore",
            "include_default_rules": self.include_default_rules,
            "has_critical_receiver": self.critical_alerts_receiver.is_some(),
            "has_warning_receiver": self.warning_alerts_receiver.is_some(),
            "additional_rules_count": self.additional_rules.len(),
            "scrape_targets_count": self.scrape_targets.as_ref().map(|t| t.len()).unwrap_or(0),
        })
        .serialize(serializer)
    }
}

pub fn critical_route() -> AlertRoute {
    AlertRoute {
        receiver: "critical".to_string(),
        matchers: vec![crate::topology::monitoring::AlertMatcher {
            label: "severity".to_string(),
            operator: MatchOp::Eq,
            value: "critical".to_string(),
        }],
        group_by: vec![],
        repeat_interval: Some("5m".to_string()),
        continue_matching: false,
        children: vec![],
    }
}

pub fn warning_route() -> AlertRoute {
    AlertRoute {
        receiver: "warning".to_string(),
        matchers: vec![crate::topology::monitoring::AlertMatcher {
            label: "severity".to_string(),
            operator: MatchOp::Eq,
            value: "warning".to_string(),
        }],
        group_by: vec![],
        repeat_interval: Some("30m".to_string()),
        continue_matching: false,
        children: vec![],
    }
}
```
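The `scrape_target` builder method above leans on `Option::get_or_insert_with` to lazily create the backing `Vec` on first use. A minimal std-only sketch of that pattern (the `ScoreBuilder` type and string targets here are illustrative stand-ins, not part of Harmony's API):

```rust
// Sketch of the lazily-initialized builder collection used by
// ClusterAlertingScore::scrape_target. Names are illustrative only.
#[derive(Debug, Default)]
struct ScoreBuilder {
    scrape_targets: Option<Vec<String>>,
}

impl ScoreBuilder {
    fn scrape_target(mut self, target: &str) -> Self {
        // Creates the Vec on the first call, then pushes into it.
        self.scrape_targets
            .get_or_insert_with(Vec::new)
            .push(target.to_string());
        self
    }

    fn target_count(&self) -> usize {
        // Mirrors the "scrape_targets_count" logic in the Serialize impl.
        self.scrape_targets.as_ref().map(|t| t.len()).unwrap_or(0)
    }
}

fn main() {
    let score = ScoreBuilder::default()
        .scrape_target("firewall")
        .scrape_target("router");
    assert_eq!(score.target_count(), 2);
    println!("targets: {}", score.target_count());
}
```

Compared with an eagerly-allocated `Vec`, the `Option` lets the serialized score distinguish "no scrape targets configured" from "an empty list".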
harmony/src/modules/monitoring/cluster_alerting/mod.rs (new file, 3 lines)

```rust
mod cluster_alerting_score;

pub use cluster_alerting_score::{critical_route, warning_route, ClusterAlertingScore};
```
harmony/src/modules/monitoring/mod.rs

```diff
@@ -1,6 +1,7 @@
 pub mod alert_channel;
 pub mod alert_rule;
 pub mod application_monitoring;
+pub mod cluster_alerting;
 pub mod grafana;
 pub mod kube_prometheus;
 pub mod ntfy;
```
harmony_e2e_tests/Cargo.toml (new file, 32 lines)

```toml
[package]
name = "harmony_e2e_tests"
version = "0.1.0"
edition = "2021"
description = "Harmony end-to-end test runner"
license = "Apache-2.0"
repository = "https://github.com/nationtech/harmony"
rust-version = "1.75.0"

[dependencies]
clap = { version = "4.4", features = ["derive"] }
chrono = { version = "0.4", features = ["serde"] }
env_logger = "0.11"
kube = { workspace = true }
k8s-openapi = { workspace = true }
log = "0.4"
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = "2.0"
tokio = { workspace = true }
which = "7.0"
inventory = "0.3"
tempfile = { workspace = true }
k3d-rs = { path = "../k3d" }
harmony = { path = "../harmony" }
sqlx = { version = "0.8", features = ["runtime-tokio", "postgres", "tls-rustls"] }
tokio-stream = "0.1"
async-trait.workspace = true

[[bin]]
name = "harmony-e2e"
path = "src/main.rs"
```
harmony_e2e_tests/src/main.rs (new file, 68 lines)

```rust
mod test_harness;
mod tests;

use clap::{Parser, Subcommand};
use test_harness::find_tests;

#[derive(Parser)]
#[command(name = "harmony-e2e")]
#[command(about = "Harmony end-to-end test runner", long_about = None)]
struct Cli {
    #[command(subcommand)]
    command: Commands,

    #[arg(short, long, default_value = "info")]
    log_level: String,
}

#[derive(Subcommand)]
enum Commands {
    List {
        #[arg(short, long)]
        filter: Option<String>,
    },
    Run {
        #[arg(short, long)]
        filter: Option<String>,
    },
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cli = Cli::parse();

    env_logger::Builder::from_env(env_logger::Env::default().default_filter_or(&cli.log_level))
        .init();

    match cli.command {
        Commands::List { filter } => {
            let tests = find_tests(filter.as_deref());
            if tests.is_empty() {
                println!("No tests found matching filter.");
            } else {
                println!("Available tests:");
                for test in tests {
                    println!("  {} - {}", test.name(), test.description());
                }
            }
        }
        Commands::Run { filter } => {
            let tests = find_tests(filter.as_deref());
            if tests.is_empty() {
                return Err("No tests found matching filter.".into());
            }

            log::info!("Running {} test(s)...", tests.len());
            for test in tests {
                log::info!("=== Running: {} ===", test.name());
                test.run()
                    .await
                    .map_err(|e| e as Box<dyn std::error::Error>)?;
                log::info!("=== Passed: {} ===", test.name());
            }
            log::info!("All tests passed!");
        }
    }

    Ok(())
}
```
harmony_e2e_tests/src/test_harness.rs (new file, 60 lines)

```rust
use async_trait::async_trait;
use thiserror::Error;

#[derive(Error, Debug)]
pub enum HarnessError {
    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
}

pub struct TestContext {
    pub test_name: String,
    pub namespace: String,
}

impl TestContext {
    pub fn new(test_name: &str) -> Result<Self, HarnessError> {
        let namespace = format!("harmony-test-{}", test_name);

        Ok(Self {
            test_name: test_name.to_string(),
            namespace,
        })
    }
}

#[async_trait]
pub trait Test: Sync {
    fn name(&self) -> &'static str;
    fn description(&self) -> &'static str;
    async fn run(&self) -> Result<(), Box<dyn std::error::Error + Send + Sync>>;
}

pub struct TestEntry {
    pub test: &'static dyn Test,
}

inventory::collect!(TestEntry);

#[macro_export]
macro_rules! register_test {
    ($test:expr) => {
        inventory::submit! {
            $crate::test_harness::TestEntry { test: $test }
        }
    };
}

pub fn all_tests() -> impl Iterator<Item = &'static dyn Test> {
    inventory::iter::<TestEntry>().map(|entry| entry.test)
}

pub fn find_tests(filter: Option<&str>) -> Vec<&'static dyn Test> {
    let filter = filter.map(|f| f.to_lowercase());
    all_tests()
        .filter(|t| match &filter {
            Some(f) => t.name().to_lowercase().contains(f),
            None => true,
        })
        .collect()
}
```
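`find_tests` above applies a case-insensitive substring filter over the registered test names. The same filtering logic can be sketched std-only, without the `inventory` registry, to show what `--filter` matches (`filter_names` and the sample names are illustrative):

```rust
// Std-only sketch of the case-insensitive substring filter in find_tests.
fn filter_names<'a>(names: &[&'a str], filter: Option<&str>) -> Vec<&'a str> {
    let filter = filter.map(|f| f.to_lowercase());
    names
        .iter()
        .filter(|n| match &filter {
            // A name matches when it contains the lowercased filter string.
            Some(f) => n.to_lowercase().contains(f),
            // No filter means every test is selected.
            None => true,
        })
        .copied()
        .collect()
}

fn main() {
    let all = ["k3d_cluster", "cnpg_postgres", "multicluster_postgres"];
    // "postgres" selects both PostgreSQL tests.
    assert_eq!(filter_names(&all, Some("postgres")).len(), 2);
    // No filter selects all registered tests.
    assert_eq!(filter_names(&all, None).len(), 3);
    println!("matched: {:?}", filter_names(&all, Some("k3d")));
}
```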
harmony_e2e_tests/src/tests/cnpg_postgres.rs (new file, 206 lines)

```rust
use crate::register_test;
use crate::test_harness::{HarnessError, Test, TestContext};
use async_trait::async_trait;
use harmony::{
    inventory::Inventory,
    modules::postgresql::{capability::PostgreSQLConfig, PostgreSQLScore},
    score::Score,
    topology::{K8sAnywhereTopology, K8sclient, Topology},
};
use k8s_openapi::api::core::v1::Pod;
use kube::api::{Api, ListParams};
use log::{info, warn};
use std::time::Duration;
use thiserror::Error;

#[derive(Error, Debug)]
pub enum PostgresTestError {
    #[error("Failed to create test context: {0}")]
    ContextCreation(#[from] HarnessError),

    #[error("Failed to initialize topology: {0}")]
    TopologyInit(String),

    #[error("Failed to interpret postgresql score: {0}")]
    InterpretError(String),

    #[error("Failed to get k8s client: {0}")]
    K8sClient(String),

    #[error("PostgreSQL deployment timed out after {timeout_seconds}s in namespace {namespace}")]
    DeploymentTimeout {
        namespace: String,
        timeout_seconds: u64,
    },

    #[error("PostgreSQL connection verification failed: {0}")]
    ConnectionVerification(String),

    #[error("SQL query failed: {0}")]
    SqlQueryFailed(String),
}

pub struct CnpgPostgresTest;

impl CnpgPostgresTest {
    pub const INSTANCE: Self = CnpgPostgresTest;
}

#[async_trait]
impl Test for CnpgPostgresTest {
    fn name(&self) -> &'static str {
        "cnpg_postgres"
    }

    fn description(&self) -> &'static str {
        "CNPG PostgreSQL deployment using Harmony's PostgreSQL module"
    }

    async fn run(&self) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
        run_impl()
            .await
            .map_err(|e| Box::new(e) as Box<dyn std::error::Error + Send + Sync>)
    }
}

register_test!(&CnpgPostgresTest::INSTANCE);

async fn run_impl() -> Result<(), PostgresTestError> {
    let ctx = TestContext::new("cnpg-postgres")?;

    info!("=== Test: CNPG PostgreSQL deployment ===");

    info!("Step 1: Initializing K8sAnywhereTopology...");
    let topology = K8sAnywhereTopology::from_env();

    info!("Step 2: Ensuring topology is ready...");
    topology
        .ensure_ready()
        .await
        .map_err(|e| PostgresTestError::TopologyInit(e.to_string()))?;

    info!("Step 3: Creating PostgreSQL deployment score...");
    let pg_score = PostgreSQLScore {
        config: PostgreSQLConfig {
            cluster_name: format!("{}-pg", ctx.test_name),
            namespace: ctx.namespace.clone(),
            instances: 1,
            ..Default::default()
        },
    };

    info!("Step 4: Deploying PostgreSQL using Harmony's PostgreSQL module...");
    let outcome = pg_score
        .interpret(&Inventory::empty(), &topology)
        .await
        .map_err(|e| PostgresTestError::InterpretError(e.to_string()))?;

    info!("Deployment outcome: {}", outcome.message);

    info!("Step 5: Waiting for PostgreSQL cluster to be ready...");
    let cluster_name = &pg_score.config.cluster_name;
    wait_for_postgres_ready(&topology, &ctx.namespace, cluster_name, 300).await?;

    info!("Step 6: Verifying PostgreSQL is working with a SQL query...");
    let result = verify_postgres_connection(&ctx.namespace, cluster_name).await?;

    info!("Query result: {}", result);
    assert!(result.contains("1"), "Expected query to return 1");

    info!("=== Test PASSED: CNPG PostgreSQL deployment ===\n");

    Ok(())
}

async fn wait_for_postgres_ready(
    topology: &K8sAnywhereTopology,
    namespace: &str,
    cluster_name: &str,
    timeout_seconds: u64,
) -> Result<String, PostgresTestError> {
    let client = topology
        .k8s_client()
        .await
        .map_err(PostgresTestError::K8sClient)?;

    let pods: Api<Pod> = Api::namespaced(client.inner_client_clone(), namespace);
    let label_selector = format!("postgresql.cnpg.io/cluster={}", cluster_name);

    let deadline = tokio::time::Instant::now() + Duration::from_secs(timeout_seconds);
    let mut interval = tokio::time::interval(Duration::from_secs(5));

    loop {
        interval.tick().await;

        if tokio::time::Instant::now() > deadline {
            return Err(PostgresTestError::DeploymentTimeout {
                namespace: namespace.to_string(),
                timeout_seconds,
            });
        }

        let pod_list = pods
            .list(&ListParams::default().labels(&label_selector))
            .await
            .map_err(|e| PostgresTestError::K8sClient(e.to_string()))?;

        for pod in pod_list.items {
            if let Some(status) = &pod.status {
                if status.phase.as_deref() == Some("Running") {
                    if let Some(conditions) = &status.conditions {
                        if conditions
                            .iter()
                            .any(|c| c.type_ == "Ready" && c.status == "True")
                        {
                            let pod_name = pod.metadata.name.clone().unwrap_or_default();
                            info!("PostgreSQL pod '{}' is ready", pod_name);
                            return Ok(pod_name);
                        }
                    }
                }
            }
        }

        warn!(
            "Waiting for PostgreSQL pod with label '{}' to be ready...",
            label_selector
        );
    }
}

async fn verify_postgres_connection(
    namespace: &str,
    cluster_name: &str,
) -> Result<String, PostgresTestError> {
    let pod_name = format!("{}-1", cluster_name);

    let mut cmd = tokio::process::Command::new("kubectl");
    cmd.args([
        "exec",
        "-n",
        namespace,
        &pod_name,
        "--",
        "psql",
        "-U",
        "app",
        "-d",
        "app",
        "-t",
        "-c",
        "SELECT 1 AS test;",
    ]);

    let output = cmd
        .output()
        .await
        .map_err(|e| PostgresTestError::ConnectionVerification(e.to_string()))?;

    if !output.status.success() {
        return Err(PostgresTestError::SqlQueryFailed(
            String::from_utf8_lossy(&output.stderr).to_string(),
        ));
    }

    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```
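`wait_for_postgres_ready` above is built around a deadline-plus-interval polling loop. The same control flow can be shown in a std-only, synchronous sketch, with `check` standing in for the Kubernetes pod query (`poll_until` is an illustrative helper, not a Harmony API):

```rust
use std::time::{Duration, Instant};

// Std-only sketch of the deadline/poll loop in wait_for_postgres_ready.
// `check` stands in for "list pods and look for a Ready one".
fn poll_until<F: FnMut() -> bool>(mut check: F, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if check() {
            return true; // the condition was met before the deadline
        }
        if Instant::now() + interval > deadline {
            return false; // would map to PostgresTestError::DeploymentTimeout
        }
        std::thread::sleep(interval);
    }
}

fn main() {
    // Simulate a pod that becomes ready on the third poll.
    let mut attempts = 0;
    let ready = poll_until(
        || {
            attempts += 1;
            attempts >= 3
        },
        Duration::from_secs(1),
        Duration::from_millis(10),
    );
    assert!(ready);
    println!("ready after {} attempts", attempts);
}
```

The real implementation uses `tokio::time::interval` so the wait does not block the async runtime, but the deadline arithmetic is identical.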
harmony_e2e_tests/src/tests/k3d_cluster.rs (new file, 147 lines)

```rust
use crate::register_test;
use crate::test_harness::{HarnessError, Test, TestContext};
use async_trait::async_trait;
use harmony::topology::{K8sAnywhereTopology, K8sclient, Topology};
use k8s_openapi::api::core::v1::Node;
use kube::api::{Api, ListParams};
use log::info;
use thiserror::Error;

#[derive(Error, Debug)]
pub enum K3dTestError {
    #[error("Failed to create test context: {0}")]
    ContextCreation(#[from] HarnessError),

    #[error("Failed to initialize topology: {0}")]
    TopologyInit(String),

    #[error("Failed to get k8s client: {0}")]
    K8sClient(String),

    #[error("Cluster validation failed: expected {expected_nodes} nodes, found {nodes_count}")]
    ClusterValidation {
        nodes_count: usize,
        expected_nodes: usize,
    },

    #[error("Node {node_name} is not ready")]
    NodeNotReady { node_name: String },

    #[error("No nodes found in cluster")]
    NoNodesFound,
}

pub struct K3dClusterTest;

impl K3dClusterTest {
    pub const INSTANCE: Self = K3dClusterTest;
}

#[async_trait]
impl Test for K3dClusterTest {
    fn name(&self) -> &'static str {
        "k3d_cluster"
    }

    fn description(&self) -> &'static str {
        "k3d cluster creation with Harmony modules"
    }

    async fn run(&self) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
        run_impl()
            .await
            .map_err(|e| Box::new(e) as Box<dyn std::error::Error + Send + Sync>)
    }
}

register_test!(&K3dClusterTest::INSTANCE);

async fn run_impl() -> Result<(), K3dTestError> {
    info!("=== Test: k3d cluster creation with Harmony modules ===");

    let _ctx = TestContext::new("k3d-cluster")?;

    info!("Step 1: Initializing K8sAnywhereTopology...");
    let topology = K8sAnywhereTopology::from_env();

    info!("Step 2: Ensuring topology is ready (this installs k3d if needed)...");
    topology
        .ensure_ready()
        .await
        .map_err(|e| K3dTestError::TopologyInit(e.to_string()))?;

    info!("Step 3: Validating cluster is operational...");
    validate_cluster(&topology).await?;

    info!("Step 4: Verifying all nodes are ready...");
    verify_nodes_ready(&topology).await?;

    info!("=== Test PASSED: k3d cluster creation ===");
    Ok(())
}

async fn validate_cluster(topology: &K8sAnywhereTopology) -> Result<(), K3dTestError> {
    let client = topology
        .k8s_client()
        .await
        .map_err(K3dTestError::K8sClient)?;

    let nodes: Api<Node> = Api::all(client.inner_client_clone());
    let node_list = nodes
        .list(&ListParams::default())
        .await
        .map_err(|e| K3dTestError::K8sClient(e.to_string()))?;

    let nodes_count = node_list.items.len();

    if nodes_count == 0 {
        return Err(K3dTestError::NoNodesFound);
    }

    info!("Found {} node(s) in cluster", nodes_count);

    for node in &node_list.items {
        let node_name = node.metadata.name.as_deref().unwrap_or("unknown");
        info!("  - Node: {}", node_name);
    }

    if nodes_count < 1 {
        return Err(K3dTestError::ClusterValidation {
            nodes_count,
            expected_nodes: 1,
        });
    }

    Ok(())
}

async fn verify_nodes_ready(topology: &K8sAnywhereTopology) -> Result<(), K3dTestError> {
    let client = topology
        .k8s_client()
        .await
        .map_err(K3dTestError::K8sClient)?;

    let nodes: Api<Node> = Api::all(client.inner_client_clone());
    let node_list = nodes
        .list(&ListParams::default())
        .await
        .map_err(|e| K3dTestError::K8sClient(e.to_string()))?;

    for node in node_list.items {
        let node_name = node.metadata.name.clone().unwrap_or_default();

        let conditions = node.status.and_then(|s| s.conditions).unwrap_or_default();

        let ready = conditions
            .iter()
            .any(|c| c.type_ == "Ready" && c.status == "True");

        if !ready {
            return Err(K3dTestError::NodeNotReady { node_name });
        }

        info!("Node '{}' is Ready", node_name);
    }

    Ok(())
}
```
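Both `verify_nodes_ready` above and the pod wait in the CNPG test decide readiness the same way: the object is ready when its condition list contains a condition of type `Ready` with status `True`. A std-only sketch of that check, with a local `Condition` struct standing in for the k8s-openapi type:

```rust
// Std-only sketch of the Kubernetes readiness-condition check used in
// verify_nodes_ready. `Condition` is a stand-in for the k8s-openapi type.
struct Condition {
    type_: String,
    status: String,
}

fn is_ready(conditions: &[Condition]) -> bool {
    // A node (or pod) is ready iff some condition is `Ready == True`.
    conditions
        .iter()
        .any(|c| c.type_ == "Ready" && c.status == "True")
}

fn main() {
    let conds = vec![
        Condition { type_: "MemoryPressure".into(), status: "False".into() },
        Condition { type_: "Ready".into(), status: "True".into() },
    ];
    assert!(is_ready(&conds));
    // An empty condition list (e.g. a node still registering) is not ready.
    assert!(!is_ready(&[]));
    println!("ready: {}", is_ready(&conds));
}
```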
harmony_e2e_tests/src/tests/mod.rs (new file, 3 lines)

```rust
pub mod cnpg_postgres;
pub mod k3d_cluster;
pub mod multicluster_postgres;
```
harmony_e2e_tests/src/tests/multicluster_postgres.rs (new file, 54 lines)

```rust
use crate::register_test;
use crate::test_harness::{HarnessError, Test, TestContext};
use async_trait::async_trait;
use log::info;
use thiserror::Error;

#[derive(Error, Debug)]
pub enum MulticlusterPostgresTestError {
    #[error("Failed to create test context: {0}")]
    ContextCreation(#[from] HarnessError),
}

pub struct MulticlusterPostgresTest;

impl MulticlusterPostgresTest {
    pub const INSTANCE: Self = MulticlusterPostgresTest;
}

#[async_trait]
impl Test for MulticlusterPostgresTest {
    fn name(&self) -> &'static str {
        "multicluster_postgres"
    }

    fn description(&self) -> &'static str {
        "Multi-cluster PostgreSQL with failover"
    }

    async fn run(&self) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
        run_impl()
            .await
            .map_err(|e| Box::new(e) as Box<dyn std::error::Error + Send + Sync>)
    }
}

register_test!(&MulticlusterPostgresTest::INSTANCE);

async fn run_impl() -> Result<(), MulticlusterPostgresTestError> {
    let _ctx = TestContext::new("multicluster-postgres")?;

    info!("=== Test: Multi-cluster PostgreSQL with failover ===");
    info!("This test is not yet fully implemented.");
    info!("It will:");
    info!("  1. Create two k3d clusters (primary and replica)");
    info!("  2. Deploy CNPG operator on both clusters");
    info!("  3. Deploy primary PostgreSQL with LoadBalancer service");
    info!("  4. Extract replication certificates from primary");
    info!("  5. Deploy replica PostgreSQL configured to replicate from primary");
    info!("  6. Insert test data on primary");
    info!("  7. Verify data is replicated to replica");
    info!("=== Test SKIPPED: Multi-cluster PostgreSQL (not implemented) ===\n");

    Ok(())
}
```
infrastructure.rs (new empty file, 0 lines)