Compare commits

..

64 Commits

Author SHA1 Message Date
c5b292d99b fix: dependencies and formatting 2026-03-09 22:25:16 -04:00
0258b31fd2 e2e tests module ready for review, k3d test works well 2026-03-09 22:17:28 -04:00
4407792bd5 chore: use async trait instead of ugly types 2026-03-09 21:59:57 -04:00
7978a63004 wip: harmony e2e test module coming along 2026-03-09 21:54:12 -04:00
58d00c95bb Review new test module and slightly improve testing roadmap 2026-03-09 21:01:47 -04:00
7d14f7646c fix(e2e): fix compilation errors in multicluster test
- multicluster_postgres test was incomplete, simplified to placeholder
- Added todo!() for multi-cluster PostgreSQL test to be implemented later
2026-03-09 20:15:41 -04:00
69dd763d6e feat(e2e): initial e2e test runner with k3d and cnpg tests
- Add harmony_e2e_tests crate with CLI test runner
- k3d_cluster test: provisions k3d cluster and verifies nodes
- cnpg_postgres test: deploys CNPG operator, creates PostgreSQL
  cluster, waits for readiness, executes SQL query
- multicluster_postgres test: placeholder for next iteration
2026-03-09 19:39:59 -04:00
2e46ac3418 e2e tests wip 2026-03-09 19:29:22 -04:00
af6145afe3 doc: monitoring module documentation
2026-03-09 18:33:35 -04:00
701d86de69 fix: Finish merging k8s refactoring
2026-03-09 17:20:03 -04:00
6db7a780fa chore: Fix some warnings
2026-03-09 17:17:12 -04:00
0df4e3cdee Merge remote-tracking branch 'origin/master' into fix/refactor_alert_receivers 2026-03-09 17:12:39 -04:00
2a7fa466cc Merge pull request 'reafactor/k8sclient' (#243) from reafactor/k8sclient into master
Reviewed-on: #243
2026-03-07 23:05:09 +00:00
f463cd1e94 Fix merge conflict between master and refactor/k8sclient
2026-03-07 17:56:26 -05:00
e1da7949ec Merge pull request 'okd: add worker nodes to load balancer backend pool' (#246) from feat/okd-load-balancer-include-workers into master
Reviewed-on: #246
2026-03-07 22:42:14 +00:00
d0a1a73710 doc: fix example code to use ignore instead of no_run
- `no_run` fails because it cannot be used at module level
- Use `ignore` to skip doc compilation while keeping example visible
2026-03-07 17:30:24 -05:00
bc2b328296 okd: include workers in load balancer backend pool + add tests and docs
- Add nodes_to_backend_server() function to include both control plane and worker nodes
- Update public services (ports 80, 443) to use worker-inclusive backend pool
- Add comprehensive tests covering all backend configurations
- Add documentation with OKD reference link and usage examples
2026-03-07 17:15:24 -05:00
a93896707f okd: add worker nodes to load balancer backend pool
Include both control plane and worker nodes in ports 80 and 443 backend pools
2026-03-07 16:46:47 -05:00
0e9b23a320 Merge branch 'feat/change-node-readiness-strategy'
2026-03-07 16:35:14 -05:00
f532ba2b40 doc: Update node readiness readme and deployed port to 25001
2026-03-07 16:33:28 -05:00
fafca31798 fix: formatting and check script
2026-03-07 16:08:52 -05:00
5412c34957 Merge pull request 'fix: change vlan definition from MaybeString to RawXml' (#245) from feat/opnsense-config-xml-support-vlan into master
Reviewed-on: #245
2026-03-07 20:59:28 +00:00
787cc8feab Fix doc tests for harmony-k8s crate refactoring
- Updated harmony-k8s doc tests to import from harmony_k8s instead of harmony
- Changed CloudNativePgOperatorScore::default() to default_openshift()

This ensures doc tests work correctly after moving K8sClient to the harmony-k8s crate.
2026-03-07 15:50:39 -05:00
ce041f495b fix(zitadel): include admin@zitadel.{host} username, secure password with symbol/number, and cert-manager TLS configuration
Update Zitadel deployment to use correct username format (admin@zitadel.{host}), generate secure passwords with required complexity (uppercase, lowercase, digit, symbol), configure edge TLS termination for OpenShift, and add cert-manager annotations. Also refactor password generation to ensure all complexity requirements are met.
2026-03-07 15:29:26 -05:00
55de206523 fix: change vlan definition from MaybeString to RawXml
2026-03-07 10:03:03 -05:00
64893a84f5 fix(node health endpoint): Setup sane timeouts for usage as a load balancer health check. The default k8s client timeout of 30 seconds caused haproxy health check to fail even though we still returned 200OK after 30 seconds
2026-03-06 16:28:13 -05:00
f941672662 fix: Node readiness always fails open when kube api call fails on note status check
2026-03-06 15:45:38 -05:00
a98113dd40 wip: zitadel ingress https not working yet
2026-03-06 15:28:21 -05:00
5db1a31d33 ... 2026-03-06 15:24:33 -05:00
f5aac67af8 feat: k8s client works fine, added version config in zitadel and fix master key secret existence handling
2026-03-06 15:15:35 -05:00
d7e5bf11d5 removing bad stuff I did this morning and trying to make it simple, and adding a couple tests 2026-03-06 14:41:08 -05:00
2e1f1b8447 feat: Refactor K8sClient into separate, publishable crate, and add zitadel example 2026-03-06 14:21:15 -05:00
2b157ad7fd feat: add a background loop checking the node status every X seconds. If NotReady for Y seconds, kill the router pod if there's one 2026-03-06 11:57:39 -05:00
a0c0905c3b wip: zitadel deployment 2026-03-06 10:56:48 -05:00
fe52f69473 Merge pull request 'feat/openbao_secret_manager' (#239) from feat/openbao_secret_manager into master
Reviewed-on: #239
Reviewed-by: stremblay <stremblay@nationtech.io>
2026-03-04 15:06:15 +00:00
d8338ad12c wip(sso): Openbao deploys fine, not fully tested yet, zitadel wip
2026-03-04 09:53:33 -05:00
ac9fedf853 wip(secret store): Fix openbao, refactor with rust client 2026-03-04 09:33:21 -05:00
fd3705e382 wip(secret store): openbao/vault store implementation 2026-03-04 09:33:21 -05:00
4840c7fdc2 Merge pull request 'feat/node-health-score' (#242) from feat/node-health-score into master
Reviewed-on: #242
Reviewed-by: johnride <jg@nationtech.io>
2026-03-04 14:31:44 +00:00
20172a7801 removing another useless commented line
2026-03-04 09:31:02 -05:00
6bb33c5845 remove useless comment
2026-03-04 09:29:49 -05:00
d9357adad3 format code, fix interpert name
2026-03-04 09:28:32 -05:00
a25ca86bdf wip: happy path is working
2026-03-04 08:21:08 -05:00
646c5e723e feat: implementing node_health 2026-03-04 07:16:25 -05:00
69c382e8c6 Merge pull request 'feat(k8s): Can now apply resources of any scope. Kind of a hack leveraging the dynamic type under the hood but this is due to a limitation of kube-rs' (#241) from feat/k8s_apply_any_scope into master
Reviewed-on: #241
Reviewed-by: stremblay <stremblay@nationtech.io>
2026-03-03 20:06:03 +00:00
dca764395d feat(k8s): Can now apply resources of any scope. Kind of a hack leveraging the dynamic type under the hood but this is due to a limitation of kube-rs
2026-03-03 14:37:52 -05:00
2738985edb Merge pull request 'feat: New harmony node readiness mini project what exposes health of a node on port 25001' (#237) from feat/harmony-node-health-endpoint into master
Reviewed-on: #237
2026-03-02 19:56:39 +00:00
d9a21bf94b feat: node readiness now supports a check query param with node_ready and okd_router_1936 options
2026-03-02 14:55:28 -05:00
5c34d81d28 fix: modified alert receiver trait to allow install plan which provides the topology the ability to apply receiver specfici configurations as required by the underlying alert sender
2026-02-27 11:50:41 -05:00
c4dd0b0cf2 chore: cleaned up some dead code, comments, etc
2026-02-26 16:06:14 -05:00
b14b41d172 refactor: prometheus alert sender
2026-02-26 15:10:28 -05:00
5e861cfc6d refactor: skeleton structure for grafana observability
2026-02-26 14:38:28 -05:00
4fad077eb4 refactor(kubeprometheus): implemented Observability for KubePrometheus
2026-02-26 13:07:28 -05:00
d80561e326 wip(kubeprometheus): created base scores for kubeprometheus alert receivers, scrape_tarets and rules
2026-02-25 16:16:33 -05:00
621aed4903 wip: refactoring kubeprometheus
2026-02-25 15:48:12 -05:00
e68426cc3d feat: added implentation for prometheus node exporter external scrape target for openshift cluster alert sender. added alerting rule to return high http error rate
2026-02-25 14:54:10 -05:00
0c1c8daf13 wip: working alert rule for okd
2026-02-24 16:13:30 -05:00
4b5e3a52a1 feat: working example of enabling and adding an alert receiver for okd_cluster_alerts
2026-02-24 11:14:47 -05:00
c54936d19f fix: added check to verify if cluster monitoring is enabled
2026-02-23 16:07:52 -05:00
699822af74 chore: reorganized file location
2026-02-23 15:03:55 -05:00
554c94f5a9 wip: compiles
2026-02-23 14:48:05 -05:00
836db9e6b1 wip: refactored redhat cluster observability operator
2026-02-23 13:18:40 -05:00
bc6a41d40c wip: removed use of installable trait, added all installation and ensure ready functions to the trait monitor, first impl of AlertReceiver for OpenshiftClusterAlertSender
2026-02-20 12:49:55 -05:00
8d446ec2e4 wip: refactoring monitoring
2026-02-19 16:25:59 -05:00
209 changed files with 12693 additions and 6386 deletions


@@ -0,0 +1,548 @@
# CI and Testing Strategy for Harmony
## Executive Summary
Harmony aims to become a CNCF project, requiring a robust CI pipeline that demonstrates real-world reliability. The goal is to run **all examples** in CI, from simple k3d deployments to full HA OKD clusters on bare metal. This document provides context for designing and implementing this testing infrastructure.
---
## Project Context
### What is Harmony?
Harmony is an infrastructure automation framework that is **code-first and code-only**. Operators write Rust programs to declare and drive infrastructure, rather than YAML files or DSL configs. Key differentiators:
1. **Compile-time safety**: The type system prevents "config-is-valid-but-platform-is-wrong" errors
2. **Topology abstraction**: Write once, deploy to any environment (local k3d, OKD, bare metal, cloud)
3. **Capability-based design**: Scores declare what they need; topologies provide what they have
### Core Abstractions
| Concept | Description |
|---------|-------------|
| **Score** | Declarative description of desired state (the "what") |
| **Topology** | Logical representation of infrastructure (the "where") |
| **Capability** | A feature a topology offers (the "how") |
| **Interpret** | Execution logic connecting Score to Topology |
### Compile-Time Verification
```rust
// This compiles only if K8sAnywhereTopology provides K8sclient + HelmCommand
impl<T: Topology + K8sclient + HelmCommand> Score<T> for MyScore { ... }

// This FAILS to compile when instantiated with LinuxHostTopology, which
// does not provide K8sclient (intentionally broken example for testing)
impl<T: Topology + K8sclient> Score<T> for K8sResourceScore { ... }
// error: the trait bound `LinuxHostTopology: K8sclient` is not satisfied
```
---
## Current Examples Inventory
### Summary Statistics
| Category | Count | CI Complexity |
|----------|-------|---------------|
| k3d-compatible | 22 | Low - single k3d cluster |
| OKD-specific | 4 | Medium - requires OKD cluster |
| Bare metal | 5 | High - requires physical infra or nested virtualization |
| Multi-cluster | 3 | High - requires multiple K8s clusters |
| No infra needed | 4 | Trivial - local only |
### Detailed Example Classification
#### Tier 1: k3d-Compatible (22 examples)
Can run on a local k3d cluster with minimal setup:
| Example | Topology | Capabilities | Special Notes |
|---------|----------|--------------|---------------|
| zitadel | K8sAnywhereTopology | K8sClient, HelmCommand | SSO/Identity |
| node_health | K8sAnywhereTopology | K8sClient | Health checks |
| public_postgres | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Needs ingress |
| openbao | K8sAnywhereTopology | K8sClient, HelmCommand | Vault alternative |
| rust | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Webapp deployment |
| cert_manager | K8sAnywhereTopology | K8sClient, CertificateManagement | TLS certificates |
| try_rust_webapp | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Full webapp |
| monitoring | K8sAnywhereTopology | K8sClient, HelmCommand, Observability | Prometheus |
| application_monitoring_with_tenant | K8sAnywhereTopology | K8sClient, HelmCommand, TenantManager, Observability | Multi-tenant |
| monitoring_with_tenant | K8sAnywhereTopology | K8sClient, HelmCommand, TenantManager, Observability | Multi-tenant |
| postgresql | K8sAnywhereTopology | K8sClient, HelmCommand | CloudNativePG |
| ntfy | K8sAnywhereTopology | K8sClient, HelmCommand | Notifications |
| tenant | K8sAnywhereTopology | K8sClient, TenantManager | Namespace isolation |
| lamp | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | LAMP stack |
| k8s_drain_node | K8sAnywhereTopology | K8sClient | Node operations |
| k8s_write_file_on_node | K8sAnywhereTopology | K8sClient | Node operations |
| remove_rook_osd | K8sAnywhereTopology | K8sClient | Ceph operations |
| validate_ceph_cluster_health | K8sAnywhereTopology | K8sClient | Ceph health |
| kube-rs | Direct kube | K8sClient | Raw kube-rs demo |
| brocade_snmp_server | K8sAnywhereTopology | K8sClient | SNMP collector |
| harmony_inventory_builder | LocalhostTopology | None | Network scanning |
| cli | LocalhostTopology | None | CLI demo |
#### Tier 2: OKD/OpenShift-Specific (4 examples)
Require OKD/OpenShift features not available in vanilla K8s:
| Example | Topology | OKD-Specific Feature |
|---------|----------|---------------------|
| okd_cluster_alerts | K8sAnywhereTopology | OpenShift Monitoring CRDs |
| operatorhub_catalog | K8sAnywhereTopology | OpenShift OperatorHub |
| rhob_application_monitoring | K8sAnywhereTopology | RHOB (Red Hat Observability) |
| nats-supercluster | K8sAnywhereTopology | OKD Routes (OpenShift Ingress) |
#### Tier 3: Bare Metal Infrastructure (5 examples)
Require physical hardware or full virtualization:
| Example | Topology | Physical Requirements |
|---------|----------|----------------------|
| okd_installation | HAClusterTopology | OPNSense, Brocade switch, PXE boot, 3+ nodes |
| okd_pxe | HAClusterTopology | OPNSense, Brocade switch, PXE infrastructure |
| sttest | HAClusterTopology | Full HA cluster with all network services |
| opnsense | OPNSenseFirewall | OPNSense firewall access |
| opnsense_node_exporter | Custom | OPNSense firewall |
#### Tier 4: Multi-Cluster (3 examples)
Require multiple K8s clusters:
| Example | Topology | Clusters Required |
|---------|----------|-------------------|
| nats | K8sAnywhereTopology × 2 | 2 clusters with NATS gateways |
| nats-module | DecentralizedTopology | 3 clusters for supercluster |
| multisite_postgres | FailoverTopology | 2 clusters for replication |
---
## Testing Categories
### 1. Compile-Time Tests
These tests verify that the type system correctly rejects invalid configurations:
```rust
// Should NOT compile - K8sResourceScore on LinuxHostTopology.
// (`#[compile_fail]` is illustrative shorthand; in practice this case lives
// in a separate trybuild UI-test file with an expected .stderr output.)
#[test]
#[compile_fail]
fn test_k8s_score_on_linux_host() {
    let score = K8sResourceScore::new();
    let topology = LinuxHostTopology::new();
    // This line should fail to compile
    harmony_cli::run(Inventory::empty(), topology, vec![Box::new(score)], None);
}

// Should compile - K8sResourceScore on K8sAnywhereTopology
#[test]
fn test_k8s_score_on_k8s_topology() {
    let score = K8sResourceScore::new();
    let topology = K8sAnywhereTopology::from_env();
    // This should compile
    harmony_cli::run(Inventory::empty(), topology, vec![Box::new(score)], None);
}
```
**Implementation Options:**
- `trybuild` crate for compile-time failure tests
- Separate `tests/compile_fail/` directory with expected error messages
### 2. Unit Tests
Pure Rust logic without external dependencies:
- Score serialization/deserialization
- Inventory parsing
- Type conversions
- CRD generation
**Requirements:**
- No external services
- Sub-second execution
- Run on every PR
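As a shape reference, a unit-tier test stays pure Rust with no I/O. The function and values below are hypothetical illustrations, not Harmony's actual API:

```rust
// Hypothetical pure-logic unit test: no cluster, no network, sub-second.
fn parse_port(s: &str) -> Option<u16> {
    s.trim().parse().ok()
}

fn main() {
    // The readiness endpoint port used elsewhere in this document.
    assert_eq!(parse_port("25001"), Some(25001));
    // Out-of-range and junk input must fail cleanly, not panic.
    assert_eq!(parse_port("99999"), None);
    assert_eq!(parse_port("not-a-port"), None);
    println!("unit tier ok");
}
```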
### 3. Integration Tests (k3d)
Deploy to a local k3d cluster:
**Setup:**
```bash
# Install k3d
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
# Create cluster
k3d cluster create harmony-test \
  --agents 3 \
  --k3s-arg "--disable=traefik@server:0"
# Wait for ready
kubectl wait --for=condition=Ready nodes --all --timeout=120s
```
**Test Matrix:**
| Example | k3d | Test Type |
|---------|-----|-----------|
| zitadel | ✅ | Deploy + health check |
| cert_manager | ✅ | Deploy + certificate issuance |
| monitoring | ✅ | Deploy + metric collection |
| postgresql | ✅ | Deploy + database connectivity |
| tenant | ✅ | Namespace creation + isolation |
### 4. Integration Tests (OKD)
Deploy to OKD/OpenShift cluster:
**Options:**
1. **Nested virtualization**: Run OKD in VMs (slow, expensive)
2. **CRC (CodeReady Containers)**: Single-node OKD (resource intensive)
3. **Managed OpenShift**: AWS/Azure/GCP (costly)
4. **Existing cluster**: Connect to pre-provisioned cluster (fastest)
**Test Matrix:**
| Example | OKD Required | Test Type |
|---------|--------------|-----------|
| okd_cluster_alerts | ✅ | Alert rule deployment |
| rhob_application_monitoring | ✅ | RHOB stack deployment |
| operatorhub_catalog | ✅ | Operator installation |
### 5. End-to-End Tests (Full Infrastructure)
Complete infrastructure deployment including bare metal:
**Options:**
1. **Libvirt + KVM**: Virtual machines on CI runner
2. **Nested KVM**: KVM inside KVM (for cloud CI)
3. **Dedicated hardware**: Physical test lab
4. **Mock/Hybrid**: Mock physical components, real K8s
---
## CI Environment Options
### Option A: GitHub Actions (Current Standard)
**Pros:**
- Native GitHub integration
- Large runner ecosystem
- Free for open source
**Cons:**
- Limited nested virtualization support
- 6-hour job timeout
- Resource constraints on free runners
**Matrix:**
```yaml
strategy:
  matrix:
    os: [ubuntu-latest]
    rust: [stable, beta]
    k8s: [k3d, kind]
    tier: [unit, k3d-integration]
```
### Option B: Self-Hosted Runners
**Pros:**
- Full control over environment
- Can run nested virtualization
- No time limits
- Persistent state between runs
**Cons:**
- Maintenance overhead
- Cost of infrastructure
- Security considerations
**Setup:**
- Bare metal servers with KVM support
- Pre-installed k3d, kind, CRC
- OPNSense VM for network tests
### Option C: Hybrid (GitHub + Self-Hosted)
**Pros:**
- Fast unit tests on GitHub runners
- Heavy tests on self-hosted infrastructure
- Cost-effective
**Cons:**
- Two CI systems to maintain
- Complexity in test distribution
### Option D: Cloud CI (CircleCI, GitLab CI, etc.)
**Pros:**
- Often better resource options
- Docker-in-Docker support
- Better nested virtualization
**Cons:**
- Cost
- Less GitHub-native
---
## Performance Requirements
### Target Execution Times
| Test Category | Target Time | Current (est.) |
|---------------|-------------|----------------|
| Compile-time tests | < 30s | Unknown |
| Unit tests | < 60s | Unknown |
| k3d integration (per example) | < 120s | 60-300s |
| Full k3d matrix | < 15 min | 30-60 min |
| OKD integration | < 30 min | 1-2 hours |
| Full E2E | < 2 hours | 4-8 hours |
### Sub-Second Performance Strategies
1. **Parallel execution**: Run independent tests concurrently
2. **Incremental testing**: Only run affected tests on changes
3. **Cached clusters**: Pre-warm k3d clusters
4. **Layered testing**: Fail fast on cheaper tests
5. **Mock external services**: Fake Discord webhooks, etc.
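Strategy 1 can be sketched in plain std Rust: run independent checks concurrently, collect every result, and fail only at the end (test names here are illustrative):

```rust
use std::thread;

// Sketch: run independent test closures on threads and aggregate results.
fn main() {
    let tests: Vec<(&str, fn() -> bool)> = vec![
        ("serde_roundtrip", || true),
        ("inventory_parse", || 2 + 2 == 4),
        ("crd_generation", || "k3d".len() == 3),
    ];
    let handles: Vec<_> = tests
        .into_iter()
        .map(|(name, f)| thread::spawn(move || (name, f())))
        .collect();
    let mut failed = 0;
    for h in handles {
        let (name, ok) = h.join().expect("test thread panicked");
        println!("{name}: {}", if ok { "pass" } else { "FAIL" });
        if !ok {
            failed += 1;
        }
    }
    assert_eq!(failed, 0, "{failed} test(s) failed");
}
```

A real runner would bound parallelism (see the `--parallel` flag in the roadmap) rather than spawning one thread per test.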
---
## Test Data and Secrets Management
### Secrets Required
| Secret | Use | Storage |
|--------|-----|---------|
| Discord webhook URL | Alert receiver tests | GitHub Secrets |
| OPNSense credentials | Network tests | Self-hosted only |
| Cloud provider creds | Multi-cloud tests | Vault / GitHub Secrets |
| TLS certificates | Ingress tests | Generated on-the-fly |
### Test Data
| Data | Source | Strategy |
|------|--------|----------|
| Container images | Public registries | Cache locally |
| Helm charts | Public repos | Vendor in repo |
| K8s manifests | Generated | Dynamic |
---
## Proposed Test Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ harmony_e2e_tests Package │
│ (cargo run -p harmony_e2e_tests) │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ Compile │ │ Unit │ │ Compile-Fail Tests │ │
│ │ Tests │ │ Tests │ │ (trybuild) │ │
│ │ < 30s │ │ < 60s │ │ < 30s │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ k3d Integration Tests │ │
│ │ Self-provisions k3d cluster, runs 22 examples │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ zitadel │ │ cert-mgr│ │ monitor │ │ postgres│ ... │ │
│ │ │ 60s │ │ 90s │ │ 120s │ │ 90s │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ │ Parallel Execution │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ OKD Integration Tests │ │
│ │ Connects to existing OKD cluster or provisions via KVM │ │
│ │ ┌─────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ okd_cluster_ │ │ rhob_application_ │ │ │
│ │ │ alerts (5 min) │ │ monitoring (10 min) │ │ │
│ │ └─────────────────┘ └─────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ KVM-based E2E Tests │ │
│ │ Uses Harmony's KVM module to provision test VMs │ │
│ │ ┌─────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ okd_installation│ │ Full HA cluster deployment │ │ │
│ │ │ (30-60 min) │ │ (60-120 min) │ │ │
│ │ └─────────────────┘ └─────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Any CI system (GitHub Actions, GitLab CI, Jenkins, cron) just runs:
cargo run -p harmony_e2e_tests
```
```
┌─────────────────────────────────────────────────────────────────┐
GitHub Actions
├─────────────────────────────────────────────────────────────────┤
┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐
Compile Unit Compile-Fail Tests
Tests Tests (trybuild)
< 30s < 60s < 30s
└─────────────┘ └─────────────┘ └─────────────────────────┘
┌───────────────────────────────────────────────────────────┐
k3d Integration Tests
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
zitadel cert-mgr monitor postgres ...
60s 90s 120s 90s
└─────────┘ └─────────┘ └─────────┘ └─────────┘
Parallel Execution
└───────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
Self-Hosted Runners
├─────────────────────────────────────────────────────────────────┤
┌───────────────────────────────────────────────────────────┐
OKD Integration Tests
┌─────────────────┐ ┌─────────────────────────────┐
okd_cluster_ rhob_application_
alerts (5 min) monitoring (10 min)
└─────────────────┘ └─────────────────────────────┘
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
KVM-based E2E Tests (Harmony provisions)
┌─────────────────────────────────────────────────────┐
Harmony KVM Module provisions test VMs
- OKD HA Cluster (3 control plane, 2 workers)
- OPNSense VM (router/firewall)
- Brocade simulator VM
└─────────────────────────────────────────────────────┘
└───────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────────┘
```
---
## Questions for Researchers
### Critical Questions
1. **Self-contained test runner**: How to design `harmony_e2e_tests` package that runs all tests with a single `cargo run` command?
2. **Nested Virtualization**: What are the prerequisites for running KVM inside a test environment?
3. **Cost Optimization**: How to minimize cloud costs while running comprehensive E2E tests?
4. **Test Isolation**: How to ensure test isolation when running parallel k3d tests?
5. **State Management**: Should we persist k3d clusters between test runs, or create fresh each time?
6. **Mocking Strategy**: Which external services (Discord, OPNSense, etc.) should be mocked vs. real?
7. **Compile-Fail Tests**: Best practices for testing Rust compile-time errors?
8. **Multi-Cluster Tests**: How to efficiently provision and connect multiple K8s clusters in tests?
9. **Secrets Management**: How to handle secrets for test environments without external CI dependencies?
10. **Test Flakiness**: Strategies for reducing flakiness in infrastructure tests?
11. **Reporting**: How to present test results for complex multi-environment test matrices?
12. **Prerequisite Detection**: How to detect and validate prerequisites (Docker, k3d, KVM) before running tests?
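Question 12 has a cheap baseline answer: probe each required binary before starting a tier and skip (rather than fail) when it is absent. A std-only sketch, with an illustrative tool list:

```rust
use std::process::Command;

// Sketch: prerequisite detection by probing `<tool> --version`.
// A missing binary returns false instead of panicking, so the runner
// can skip the tier with a clear message.
fn has_tool(tool: &str) -> bool {
    Command::new(tool)
        .arg("--version")
        .output()
        .map(|o| o.status.success())
        .unwrap_or(false)
}

fn main() {
    for tool in ["docker", "k3d", "kubectl"] {
        println!("{tool}: {}", if has_tool(tool) { "found" } else { "missing" });
    }
    // Probing a nonexistent binary must never abort the runner.
    assert!(!has_tool("definitely-not-a-real-binary-xyz"));
}
```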
### Research Areas
1. **CI/CD Tools**: Evaluate GitHub Actions, GitLab CI, CircleCI, Tekton, Prow for Harmony's needs
2. **K8s Test Tools**: Evaluate kind, k3d, minikube, microk8s for local testing
3. **Mock Frameworks**: Evaluate mock-server, wiremock, hoverfly for external service mocking
4. **Test Frameworks**: Evaluate built-in Rust test, nextest, cargo-tarpaulin for performance
---
## Success Criteria
### Week 1 (Agentic Velocity)
- [ ] Compile-time verification tests working
- [ ] Unit tests for monitoring module
- [ ] First 5 k3d examples running in CI
- [ ] Mock framework for Discord webhooks
### Week 2
- [ ] All 22 k3d-compatible examples in CI
- [ ] OKD self-hosted runner operational
- [ ] KVM module reviewed and ready for CI
### Week 3-4
- [ ] Full E2E tests with KVM infrastructure
- [ ] Multi-cluster tests automated
- [ ] All examples tested in CI
### Month 2
- [ ] Sub-15-minute total CI time
- [ ] Weekly E2E tests on bare metal
- [ ] Documentation complete
- [ ] Ready for CNCF submission
---
## Prerequisites
### Hardware Requirements
| Component | Minimum | Recommended |
|-----------|---------|------------|
| CPU | 4 cores | 8+ cores (for parallel tests) |
| RAM | 8 GB | 32 GB (for KVM E2E) |
| Disk | 50 GB SSD | 500 GB NVMe |
### Software Requirements
| Tool | Version |
|------|---------|
| Rust | 1.75+ |
| Docker | 24.0+ |
| k3d | v5.6.0+ |
| kubectl | v1.28+ |
| libvirt | 9.0.0 |
### Installation (One-time)
```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install Docker
curl -fsSL https://get.docker.com | sh
# Install k3d
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
# Install kubectl
curl -LO "https://dl.k8s.io/release/v1.28.0/bin/linux/amd64/kubectl"
sudo install -m 0755 kubectl /usr/local/bin/kubectl
```
---
## Reference Materials
### Existing Code
- Examples: `examples/*/src/main.rs`
- Topologies: `harmony/src/domain/topology/`
- Capabilities: `harmony/src/domain/topology/` (trait definitions)
- Scores: `harmony/src/modules/*/`
### Documentation
- [Coding Guide](docs/coding-guide.md)
- [Core Concepts](docs/concepts.md)
- [Monitoring Architecture](docs/monitoring.md)
- [ADR-020: Monitoring](adr/020-monitoring-alerting-architecture.md)
### Related Projects
- Crossplane (similar abstraction model)
- Pulumi (infrastructure as code)
- Terraform (state management patterns)
- Flux/ArgoCD (GitOps testing patterns)

CI_and_testing_roadmap.md Normal file

@@ -0,0 +1,201 @@
# Pragmatic CI and Testing Roadmap for Harmony
**Status**: Active implementation (March 2026)
**Core Principle**: Self-contained test runner — no dependency on centralized CI servers
All tests are executable via one command:
```bash
cargo run -p harmony_e2e_tests
```
The `harmony_e2e_tests` package:
- Provisions its own infrastructure when needed (k3d, KVM VMs)
- Runs all test tiers in sequence or selectively
- Reports results in text, JSON or JUnit XML
- Works identically on developer laptops, any Linux server, GitHub Actions, GitLab CI, Jenkins, cron jobs, etc.
- Is the single source of truth for what "passing CI" means
## Why This Approach
1. **Portability** — same command & behavior everywhere
2. **Harmony tests Harmony** — the framework validates itself
3. **No vendor lock-in** — GitHub Actions / GitLab CI are just triggers
4. **Perfect reproducibility** — developers reproduce any CI failure locally in seconds
5. **Offline capable** — after initial setup, most tiers run without internet
## Architecture: `harmony_e2e_tests` Package
```
harmony_e2e_tests/
├── Cargo.toml
├── src/
│ ├── main.rs # CLI entry point
│ ├── lib.rs # Test runner core logic
│ ├── tiers/
│ │ ├── mod.rs
│ │ ├── compile_fail.rs # trybuild-based compile-time checks
│ │ ├── unit.rs # cargo test --lib --workspace
│ │ ├── k3d.rs # k3d cluster + parallel example runs
│ │ ├── okd.rs # connect to existing OKD cluster
│ │ └── kvm.rs # full E2E via Harmony's own KVM module
│ ├── mocks/
│ │ ├── mod.rs
│ │ ├── discord.rs # mock Discord webhook receiver
│ │ └── opnsense.rs # mock OPNSense firewall API
│ └── infrastructure/
│ ├── mod.rs
│ ├── k3d.rs # k3d cluster lifecycle
│ └── kvm.rs # helper wrappers around KVM score
└── tests/
├── ui/ # trybuild compile-fail cases (*.rs + *.stderr)
└── fixtures/ # static test data / golden files
```
## CLI Interface (clap-based)
```bash
# Run everything (default)
cargo run -p harmony_e2e_tests
# Specific tier
cargo run -p harmony_e2e_tests -- --tier k3d
cargo run -p harmony_e2e_tests -- --tier compile
# Filter to one example
cargo run -p harmony_e2e_tests -- --tier k3d --example monitoring
# Parallelism control (k3d tier)
cargo run -p harmony_e2e_tests -- --parallel 8
# Reporting
cargo run -p harmony_e2e_tests -- --report junit.xml
cargo run -p harmony_e2e_tests -- --format json
# Debug helpers
cargo run -p harmony_e2e_tests -- --verbose --dry-run
```
## Test Tiers Ordered by Speed & Cost
| Tier | Duration target | Runner type | What it tests | Isolation strategy |
|------------------|------------------|----------------------|----------------------------------------------------|-----------------------------|
| Compile-fail | < 20 s | Any (GitHub free) | Invalid configs don't compile | Per-file trybuild |
| Unit | < 60 s | Any | Pure Rust logic | cargo test |
| k3d | 8-15 min | GitHub / self-hosted | 22+ k3d-compatible examples | Fresh k3d cluster + ns-per-example |
| OKD | 10-30 min | Self-hosted / CRC | OKD-specific features (Routes, Monitoring CRDs…) | Existing cluster via KUBECONFIG |
| KVM Full E2E | 60-180 min | Self-hosted bare-metal | Full HA OKD install + bare-metal scenarios | Harmony KVM score provisions VMs |
### Tier Details & Implementation Notes
1. **Compile-fail**
Uses **`trybuild`** crate (standard in Rust ecosystem).
Place intentional compile errors in `tests/ui/*.rs` with matching `*.stderr` expectation files.
One test function replaces the old custom loop:
```rust
#[test]
fn ui() {
let t = trybuild::TestCases::new();
t.compile_fail("tests/ui/*.rs");
}
```
2. **Unit**
Simple wrapper: `cargo test --lib --workspace -- --nocapture`
Consider `cargo-nextest` later for a 2-3× speedup if the test count grows.
3. **k3d**
- Provisions isolated cluster once at start (`k3d cluster create --agents 3 --no-lb --disable traefik`)
- Discovers examples via a `[package.metadata.harmony]` table (`test-tier = "k3d"`) in each example's `Cargo.toml`
- Runs examples in parallel with a tokio semaphore (default 5-8 slots)
- Each example gets its own namespace
- Uses `defer` / `scopeguard` for guaranteed cleanup
- Mocks Discord webhook and OPNSense API
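For instance, an example crate opts into the tier through its package metadata (hypothetical crate name; the key follows the discovery convention described above):

```toml
# examples/monitoring/Cargo.toml (hypothetical example crate)
[package]
name = "example-monitoring"
edition = "2024"

[package.metadata.harmony]
test-tier = "k3d"
```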
4. **OKD**
Connects to pre-provisioned cluster via `KUBECONFIG`.
Validates it is actually OpenShift/OKD before proceeding.
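A minimal sketch of that validation, assuming OKD is detected by the presence of OpenShift-specific API groups (the actual check in the OKD tier may differ):

```rust
/// Returns true if the discovered API groups indicate an OpenShift/OKD
/// cluster rather than vanilla Kubernetes (e.g. `route.openshift.io`).
fn looks_like_openshift(api_groups: &[String]) -> bool {
    api_groups.iter().any(|g| g.ends_with(".openshift.io"))
}
```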
5. **KVM**
Uses **Harmony's own KVM module** to provision test VMs (control-plane + workers + OPNSense).
→ True “dogfooding” — if the E2E fails, the KVM score itself is likely broken.
## CI Integration Patterns
### Fast PR validation (GitHub Actions)
```yaml
name: Fast Tests
on: [push, pull_request]
jobs:
fast:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- name: Install Docker & k3d
uses: nolar/setup-k3d-k3s@v1
- run: cargo run -p harmony_e2e_tests -- --tier compile,unit,k3d --report junit.xml
- uses: actions/upload-artifact@v4
with: { name: test-results, path: junit.xml }
```
### Nightly / Merge heavy tests (self-hosted runner)
```yaml
name: Full E2E
on:
schedule: [{ cron: "0 3 * * *" }]
push: { branches: [main] }
jobs:
full:
runs-on: [self-hosted, linux, x64, kvm-capable]
steps:
- uses: actions/checkout@v4
- run: cargo run -p harmony_e2e_tests -- --tier okd,kvm --verbose --report junit.xml
```
## Prerequisites Auto-Check & Install
```rust
// in harmony_e2e_tests/src/infrastructure/prerequisites.rs
async fn ensure_k3d() -> Result<()> { … } // curl | bash if missing
async fn ensure_docker() -> Result<()> { … }
fn check_kvm_support() -> Result<()> { … } // /dev/kvm + libvirt
```
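The check half of such a helper could be sketched like this (hypothetical `is_installed`; the curl-pipe-bash install path is intentionally elided):

```rust
use std::process::Command;

/// Returns true if `tool` is on PATH and exits successfully for `--version`.
/// The ensure_* helpers would call this first and only install on `false`.
fn is_installed(tool: &str) -> bool {
    Command::new(tool)
        .arg("--version")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}
```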
## Success Criteria
### Step 1
- [ ] `harmony_e2e_tests` package created & basic CLI working
- [ ] trybuild compile-fail suite passing
- [ ] First 8-10 k3d examples running reliably in CI
- [ ] Mock server for Discord webhook completed
### Step 2
- [ ] All 22 k3d-compatible examples green
- [ ] OKD tier running on dedicated self-hosted runner
- [ ] JUnit reporting + GitHub check integration
- [ ] Namespace isolation + automatic retry on transient k8s errors
### Step 3
- [ ] KVM full E2E green on bare-metal runner (nightly)
- [ ] Multi-cluster examples (nats, multisite-postgres) automated
- [ ] Total fast CI time < 12 minutes on GitHub runners
- [ ] Documentation: “How to add a new tested example”
## Quick Start for New Contributors
```bash
# One-time setup
rustup update stable
cargo install trybuild cargo-nextest # optional but recommended
# Run locally (most common)
cargo run -p harmony_e2e_tests -- --tier k3d --verbose
# Just compile checks + unit
cargo test -p harmony_e2e_tests
```

Cargo.lock generated
File diff suppressed because it is too large


@@ -19,7 +19,10 @@ members = [
"adr/agent_discovery/mdns",
"brocade",
"harmony_agent",
"harmony_agent/deploy", "harmony_node_readiness",
"harmony_agent/deploy",
"harmony_node_readiness",
"harmony-k8s",
"harmony_e2e_tests",
]
[workspace.package]
@@ -38,6 +41,8 @@ tokio = { version = "1.40", features = [
"macros",
"rt-multi-thread",
] }
tokio-retry = "0.3.0"
tokio-util = "0.7.15"
cidr = { features = ["serde"], version = "0.2" }
russh = "0.45"
russh-keys = "0.45"


@@ -0,0 +1,318 @@
# Architecture Decision Record: Monitoring and Alerting Architecture
Initial Author: Willem Rolleman, Jean-Gabriel Carrier
Initial Date: March 9, 2026
Last Updated Date: March 9, 2026
## Status
Accepted
Supersedes: [ADR-010](010-monitoring-and-alerting.md)
## Context
Harmony needs a unified approach to monitoring and alerting across different infrastructure targets:
1. **Cluster-level monitoring**: Administrators managing entire Kubernetes/OKD clusters need to define cluster-wide alerts, receivers, and scrape targets.
2. **Tenant-level monitoring**: Multi-tenant clusters where teams are confined to namespaces need monitoring scoped to their resources.
3. **Application-level monitoring**: Developers deploying applications want zero-config monitoring that "just works" for their services.
The monitoring landscape is fragmented:
- **OKD/OpenShift**: Built-in Prometheus with AlertmanagerConfig CRDs
- **KubePrometheus**: Helm-based stack with PrometheusRule CRDs
- **RHOB (Red Hat Observability)**: Operator-based with MonitoringStack CRDs
- **Standalone Prometheus**: Raw Prometheus deployments
Each system has different CRDs, different installation methods, and different configuration APIs.
## Decision
We implement a **trait-based architecture with compile-time capability verification** that provides:
1. **Type-safe abstractions** via parameterized traits: `AlertReceiver<S>`, `AlertRule<S>`, `ScrapeTarget<S>`
2. **Compile-time topology compatibility** via the `Observability<S>` capability bound
3. **Three levels of abstraction**: Cluster, Tenant, and Application monitoring
4. **Pre-built alert rules** as functions that return typed structs
### Core Traits
```rust
// domain/topology/monitoring.rs
/// Marker trait for systems that send alerts (Prometheus, etc.)
pub trait AlertSender: Send + Sync + std::fmt::Debug {
fn name(&self) -> String;
}
/// Defines how a receiver (Discord, Slack, etc.) builds its configuration
/// for a specific sender type
pub trait AlertReceiver<S: AlertSender>: std::fmt::Debug + Send + Sync {
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn AlertReceiver<S>>;
}
/// Defines how an alert rule builds its PrometheusRule configuration
pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
fn build_rule(&self) -> Result<serde_json::Value, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn AlertRule<S>>;
}
/// Capability that topologies implement to support monitoring
pub trait Observability<S: AlertSender> {
async fn install_alert_sender(&self, sender: &S, inventory: &Inventory)
-> Result<PreparationOutcome, PreparationError>;
async fn install_receivers(&self, sender: &S, inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<S>>>>) -> Result<...>;
async fn install_rules(&self, sender: &S, inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<S>>>>) -> Result<...>;
async fn add_scrape_targets(&self, sender: &S, inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>) -> Result<...>;
async fn ensure_monitoring_installed(&self, sender: &S, inventory: &Inventory)
-> Result<...>;
}
```
### Alert Sender Types
Each monitoring stack is a distinct `AlertSender`:
| Sender | Module | Use Case |
|--------|--------|----------|
| `OpenshiftClusterAlertSender` | `monitoring/okd/` | OKD/OpenShift built-in monitoring |
| `KubePrometheus` | `monitoring/kube_prometheus/` | Helm-deployed kube-prometheus-stack |
| `Prometheus` | `monitoring/prometheus/` | Standalone Prometheus via Helm |
| `RedHatClusterObservability` | `monitoring/red_hat_cluster_observability/` | RHOB operator |
| `Grafana` | `monitoring/grafana/` | Grafana-managed alerting |
### Three Levels of Monitoring
#### 1. Cluster-Level Monitoring
For cluster administrators. Full control over monitoring infrastructure.
```rust
// examples/okd_cluster_alerts/src/main.rs
OpenshiftClusterAlertScore {
sender: OpenshiftClusterAlertSender,
receivers: vec![Box::new(DiscordReceiver { ... })],
rules: vec![Box::new(alert_rules)],
scrape_targets: Some(vec![Box::new(external_exporters)]),
}
```
**Characteristics:**
- Cluster-scoped CRDs and resources
- Can add external scrape targets (outside cluster)
- Manages Alertmanager configuration
- Requires cluster-admin privileges
#### 2. Tenant-Level Monitoring
For teams confined to namespaces. The topology determines tenant context.
```rust
// The topology's Observability impl handles namespace scoping
impl Observability<KubePrometheus> for K8sAnywhereTopology {
async fn install_rules(&self, sender: &KubePrometheus, ...) {
// Topology knows if it's tenant-scoped
let namespace = self.get_tenant_config().await
.map(|t| t.name)
.unwrap_or("default");
// Install rules in tenant namespace
}
}
```
**Characteristics:**
- Namespace-scoped resources
- Cannot modify cluster-level monitoring config
- May have restricted receiver types
- Runtime validation of permissions (cannot be fully compile-time)
#### 3. Application-Level Monitoring
For developers. Zero-config, opinionated monitoring.
```rust
// modules/application/features/monitoring.rs
pub struct Monitoring {
pub application: Arc<dyn Application>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<Prometheus>>>,
}
impl<T: Topology + Observability<Prometheus> + TenantManager + ...>
ApplicationFeature<T> for Monitoring
{
async fn ensure_installed(&self, topology: &T) -> Result<...> {
// Auto-creates ServiceMonitor
// Auto-installs Ntfy for notifications
// Handles tenant namespace automatically
// Wires up sensible defaults
}
}
```
**Characteristics:**
- Automatic ServiceMonitor creation
- Opinionated notification channel (Ntfy)
- Tenant-aware via topology
- Minimal configuration required
## Rationale
### Why Generic Traits Instead of Unified Types?
Each monitoring stack (OKD, KubePrometheus, RHOB) has fundamentally different CRDs:
```rust
// OKD uses AlertmanagerConfig with different structure
AlertmanagerConfig { spec: { receivers: [...] } }
// RHOB uses secret references for webhook URLs
MonitoringStack { spec: { alertmanagerConfig: { discordConfigs: [{ apiURL: { key: "..." } }] } } }
// KubePrometheus uses Alertmanager CRD with different field names
Alertmanager { spec: { config: { receivers: [...] } } }
```
A unified type would either:
1. Be a lowest-common-denominator (loses stack-specific features)
2. Be a complex union type (hard to use, easy to misconfigure)
Generic traits let each stack express its configuration naturally while providing a consistent interface.
### Why Compile-Time Capability Bounds?
```rust
impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T>
for OpenshiftClusterAlertScore { ... }
```
This fails at compile time if you try to use `OpenshiftClusterAlertScore` with a topology that doesn't support OKD monitoring. This prevents the "config-is-valid-but-platform-is-wrong" errors that Harmony was designed to eliminate.
### Why Not a MonitoringStack Abstraction (V2 Approach)?
The V2 approach proposed a unified `MonitoringStack` that hides sender selection:
```rust
// V2 approach - rejected
MonitoringStack::new(MonitoringApiVersion::V2CRD)
.add_alert_channel(discord)
```
**Problems:**
1. Hides which sender you're using, losing compile-time guarantees
2. "Version selection" actually chooses between fundamentally different systems
3. Would need to handle all stack-specific features through a generic interface
The current approach is explicit: you choose `OpenshiftClusterAlertSender` and the compiler verifies compatibility.
### Why Runtime Validation for Tenants?
Tenant confinement is determined at runtime by the topology and K8s RBAC. We cannot know at compile time whether a user has cluster-admin or namespace-only access.
Options considered:
1. **Compile-time tenant markers** - Would require modeling entire RBAC hierarchy in types. Over-engineering.
2. **Runtime validation** - Current approach. Fails with clear K8s permission errors if insufficient access.
3. **No tenant support** - Would exclude a major use case.
Runtime validation is the pragmatic choice. The failure mode is clear (K8s API error) and occurs early in execution.
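A sketch of what such a runtime check might look like (hypothetical types; in practice the failure surfaces as a K8s API permission error):

```rust
#[derive(Debug)]
enum Access {
    ClusterAdmin,
    NamespaceOnly(String),
}

#[derive(Debug)]
struct PermissionError(String);

/// Fail early with a clear error if a caller confined to a namespace
/// attempts a cluster-scoped monitoring install.
fn validate_scope(access: &Access, wants_cluster_scope: bool) -> Result<(), PermissionError> {
    match (access, wants_cluster_scope) {
        (Access::NamespaceOnly(ns), true) => Err(PermissionError(format!(
            "cluster-scoped monitoring requires cluster-admin; access is limited to namespace '{ns}'"
        ))),
        _ => Ok(()),
    }
}
```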
> Note: we will eventually have compile-time validation for such things. Rust macros are powerful, and we could discover the actual capabilities we're dealing with at build time, similar to sqlx's approach with its `query!` macros.
## Consequences
### Pros
1. **Type Safety**: Invalid configurations are caught at compile time
2. **Extensibility**: Adding a new monitoring stack requires implementing traits, not modifying core code
3. **Clear Separation**: Cluster/Tenant/Application levels have distinct entry points
4. **Reusable Rules**: Pre-built alert rules as functions (`high_pvc_fill_rate_over_two_days()`)
5. **CRD Accuracy**: Type definitions match actual Kubernetes CRDs exactly
### Cons
1. **Implementation Explosion**: `DiscordReceiver` implements `AlertReceiver<S>` for each sender type (3+ implementations)
2. **Learning Curve**: Understanding the trait hierarchy takes time
3. **clone_box Boilerplate**: Required for trait object cloning (3 lines per impl)
### Mitigations
- Implementation explosion is contained: each receiver type has O(senders) implementations, but receivers are rare compared to rules
- Learning curve is documented with examples at each level
- clone_box boilerplate is minimal and copy-paste
## Alternatives Considered
### Unified MonitoringStack Type
See "Why Not a MonitoringStack Abstraction" above. Rejected for losing compile-time safety.
### Helm-Only Approach
Use `HelmScore` directly for each monitoring deployment. Rejected because:
- No type safety for alert rules
- Cannot compose with application features
- No tenant awareness
### Separate Modules Per Use Case
Have `cluster_monitoring/`, `tenant_monitoring/`, `app_monitoring/` as separate modules. Rejected because:
- Massive code duplication
- No shared abstraction for receivers/rules
- Adding a feature requires three implementations
## Implementation Notes
### Module Structure
```
modules/monitoring/
├── mod.rs # Public exports
├── alert_channel/ # Receivers (Discord, Webhook)
├── alert_rule/ # Rules and pre-built alerts
│ ├── prometheus_alert_rule.rs
│ └── alerts/ # Library of pre-built rules
│ ├── k8s/ # K8s-specific (pvc, pod, memory)
│ └── infra/ # Infrastructure (opnsense, dell)
├── okd/ # OpenshiftClusterAlertSender
├── kube_prometheus/ # KubePrometheus
├── prometheus/ # Prometheus
├── red_hat_cluster_observability/ # RHOB
├── grafana/ # Grafana
├── application_monitoring/ # Application-level scores
└── scrape_target/ # External scrape targets
```
### Adding a New Alert Sender
1. Create sender type: `pub struct MySender; impl AlertSender for MySender { ... }`
2. Implement `Observability<MySender>` for topologies that support it
3. Create CRD types in `crd/` subdirectory
4. Implement `AlertReceiver<MySender>` for existing receivers
5. Implement `AlertRule<MySender>` for `AlertManagerRuleGroup`
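The steps above can be sketched with simplified stand-in traits (the real signatures in `domain/topology/monitoring.rs` take more parameters and return `Result` types):

```rust
// Simplified stand-ins for illustration only.
pub trait AlertSender: Send + Sync + std::fmt::Debug {
    fn name(&self) -> String;
}

// Step 1: the new sender type.
#[derive(Debug)]
pub struct MySender;

impl AlertSender for MySender {
    fn name(&self) -> String {
        "MySender".to_string()
    }
}

pub trait AlertRule<S: AlertSender>: std::fmt::Debug {
    fn build_rule(&self) -> String; // the real version returns serde_json::Value
    fn name(&self) -> String;
}

#[derive(Debug)]
pub struct AlertManagerRuleGroup {
    pub group: String,
}

// Step 5: teach the existing rule group to build config for the new sender.
impl AlertRule<MySender> for AlertManagerRuleGroup {
    fn build_rule(&self) -> String {
        format!(r#"{{"groups":[{{"name":"{}"}}]}}"#, self.group)
    }
    fn name(&self) -> String {
        self.group.clone()
    }
}
```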
### Adding a New Alert Rule
```rust
pub fn my_custom_alert() -> PrometheusAlertRule {
PrometheusAlertRule::new("MyAlert", "up == 0")
.for_duration("5m")
.label("severity", "critical")
.annotation("summary", "Service is down")
}
```
No trait implementation needed - `AlertManagerRuleGroup` already handles conversion.
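For illustration, a hypothetical builder matching the usage above might look like this (a sketch only; the real `PrometheusAlertRule` in `modules/monitoring/alert_rule/` differs):

```rust
/// Minimal builder sketch: collects the rule's fields for later
/// conversion into a PrometheusRule CRD by AlertManagerRuleGroup.
#[derive(Debug, Clone)]
pub struct PrometheusAlertRule {
    pub alert: String,
    pub expr: String,
    pub for_: Option<String>,
    pub labels: Vec<(String, String)>,
    pub annotations: Vec<(String, String)>,
}

impl PrometheusAlertRule {
    pub fn new(alert: &str, expr: &str) -> Self {
        Self {
            alert: alert.into(),
            expr: expr.into(),
            for_: None,
            labels: Vec::new(),
            annotations: Vec::new(),
        }
    }
    pub fn for_duration(mut self, d: &str) -> Self {
        self.for_ = Some(d.into());
        self
    }
    pub fn label(mut self, k: &str, v: &str) -> Self {
        self.labels.push((k.into(), v.into()));
        self
    }
    pub fn annotation(mut self, k: &str, v: &str) -> Self {
        self.annotations.push((k.into(), v.into()));
        self
    }
}
```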
## Related ADRs
- [ADR-013](013-monitoring-notifications.md): Notification channel selection (ntfy)
- [ADR-011](011-multi-tenant-cluster.md): Multi-tenant cluster architecture


@@ -0,0 +1,21 @@
[package]
name = "example-monitoring-v2"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony-k8s = { path = "../../harmony-k8s" }
harmony_types = { path = "../../harmony_types" }
kube = { workspace = true }
schemars = "0.8"
serde = { workspace = true, features = ["derive"] }
serde_json = { workspace = true }
serde_yaml = { workspace = true }
url = { workspace = true }
log = { workspace = true }
async-trait = { workspace = true }
k8s-openapi = { workspace = true }


@@ -0,0 +1,91 @@
# Monitoring v2 - Improved Architecture
This example demonstrates the improved monitoring architecture that addresses the "WTF/minute" issues in the original design.
## Key Improvements
### 1. **Single AlertChannel Trait with Generic Sender**
The original design required 9-12 implementations for each alert channel (Discord, Webhook, etc.) - one for each sender type. The new design uses a single trait with generic sender parameterization:
```rust
pub trait AlertChannel<Sender: AlertSender> {
    async fn install_config(&self, sender: &Sender) -> Result<Outcome, InterpretError>;
    fn name(&self) -> String;
    fn as_any(&self) -> &dyn std::any::Any;
}
```
**Benefits:**
- One Discord implementation works with all sender types
- Type safety at compile time
- No runtime dispatch overhead
### 2. **MonitoringStack Abstraction**
Instead of manually selecting CRDPrometheus vs KubePrometheus vs RHOBObservability, you now have a unified MonitoringStack that handles versioning:
```rust
let monitoring_stack = MonitoringStack::new(MonitoringApiVersion::V2CRD)
    .set_namespace("monitoring")
    .add_alert_channel(discord_receiver)
    .set_scrape_targets(vec![...]);
```
**Benefits:**
- Single source of truth for monitoring configuration
- Easy to switch between monitoring versions
- Automatic version-specific configuration
### 3. **TenantMonitoringScore - True Composition**
The original monitoring_with_tenant example just put tenant and monitoring as separate items in a vec. The new design truly composes them:
```rust
let tenant_score = TenantMonitoringScore::new("test-tenant", monitoring_stack);
```
This creates a single score that:
- Has tenant context
- Has monitoring configuration
- Automatically installs monitoring scoped to tenant namespace
**Benefits:**
- No more "two separate things" confusion
- Automatic tenant namespace scoping
- Clear ownership: tenant owns its monitoring
### 4. **Versioned Monitoring APIs**
Clear versioning makes it obvious which monitoring stack you're using:
```rust
pub enum MonitoringApiVersion {
    V1Helm, // Old Helm charts
    V2CRD,  // Current CRDs
    V3RHOB, // RHOB (future)
}
```
**Benefits:**
- No guessing which API version you're using
- Easy to migrate between versions
- Backward compatibility path
## Comparison
### Original Design (monitoring_with_tenant)
- Manual selection of each component
- Manual installation of both components
- Need to remember to pass both to harmony_cli::run
- Monitoring not scoped to tenant automatically
### New Design (monitoring_v2)
- Single composed score
- One score does it all
## Usage
```bash
cd examples/monitoring_v2
cargo run
```
## Migration Path
To migrate from the old design to the new:
1. Replace individual alert channel implementations with `AlertChannel<Sender>`
2. Use `MonitoringStack` instead of manual `*Prometheus` selection
3. Use `TenantMonitoringScore` instead of separate `TenantScore` + monitoring scores
4. Select the monitoring version via `MonitoringApiVersion`


@@ -0,0 +1,343 @@
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use log::debug;
use serde::{Deserialize, Serialize};
use serde_yaml::{Mapping, Value};
use harmony::data::Version;
use harmony::interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome};
use harmony::inventory::Inventory;
use harmony::score::Score;
use harmony::topology::{Topology, tenant::TenantManager};
use harmony_k8s::K8sClient;
use harmony_types::k8s_name::K8sName;
use harmony_types::net::Url;
pub trait AlertSender: Send + Sync + std::fmt::Debug {
fn name(&self) -> String;
fn namespace(&self) -> String;
}
#[derive(Debug)]
pub struct CRDPrometheus {
pub namespace: String,
pub client: Arc<K8sClient>,
}
impl AlertSender for CRDPrometheus {
fn name(&self) -> String {
"CRDPrometheus".to_string()
}
fn namespace(&self) -> String {
self.namespace.clone()
}
}
#[derive(Debug)]
pub struct RHOBObservability {
pub namespace: String,
pub client: Arc<K8sClient>,
}
impl AlertSender for RHOBObservability {
fn name(&self) -> String {
"RHOBObservability".to_string()
}
fn namespace(&self) -> String {
self.namespace.clone()
}
}
#[derive(Debug)]
pub struct KubePrometheus {
pub config: Arc<Mutex<KubePrometheusConfig>>,
}
impl Default for KubePrometheus {
fn default() -> Self {
Self::new()
}
}
impl KubePrometheus {
pub fn new() -> Self {
Self {
config: Arc::new(Mutex::new(KubePrometheusConfig::new())),
}
}
}
impl AlertSender for KubePrometheus {
fn name(&self) -> String {
"KubePrometheus".to_string()
}
fn namespace(&self) -> String {
self.config.lock().unwrap().namespace.clone().unwrap_or_else(|| "monitoring".to_string())
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct KubePrometheusConfig {
pub namespace: Option<String>,
#[serde(skip)]
pub alert_receiver_configs: Vec<AlertManagerChannelConfig>,
}
impl KubePrometheusConfig {
pub fn new() -> Self {
Self {
namespace: None,
alert_receiver_configs: Vec::new(),
}
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AlertManagerChannelConfig {
pub channel_receiver: serde_yaml::Value,
pub channel_route: serde_yaml::Value,
}
impl Default for AlertManagerChannelConfig {
fn default() -> Self {
Self {
channel_receiver: serde_yaml::Value::Mapping(Default::default()),
channel_route: serde_yaml::Value::Mapping(Default::default()),
}
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ScrapeTargetConfig {
pub service_name: String,
pub port: String,
pub path: String,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum MonitoringApiVersion {
V1Helm,
V2CRD,
V3RHOB,
}
#[derive(Debug, Clone)]
pub struct MonitoringStack {
pub version: MonitoringApiVersion,
pub namespace: String,
pub alert_channels: Vec<Arc<dyn AlertSender>>,
pub scrape_targets: Vec<ScrapeTargetConfig>,
}
impl MonitoringStack {
pub fn new(version: MonitoringApiVersion) -> Self {
Self {
version,
namespace: "monitoring".to_string(),
alert_channels: Vec::new(),
scrape_targets: Vec::new(),
}
}
pub fn set_namespace(mut self, namespace: &str) -> Self {
self.namespace = namespace.to_string();
self
}
pub fn add_alert_channel(mut self, channel: impl AlertSender + 'static) -> Self {
self.alert_channels.push(Arc::new(channel));
self
}
pub fn set_scrape_targets(mut self, targets: Vec<(&str, &str, String)>) -> Self {
self.scrape_targets = targets
.into_iter()
.map(|(name, port, path)| ScrapeTargetConfig {
service_name: name.to_string(),
port: port.to_string(),
path,
})
.collect();
self
}
}
pub trait AlertChannel<Sender: AlertSender> {
fn install_config(&self, sender: &Sender);
fn name(&self) -> String;
}
#[derive(Debug, Clone)]
pub struct DiscordWebhook {
pub name: K8sName,
pub url: Url,
pub selectors: Vec<HashMap<String, String>>,
}
impl DiscordWebhook {
fn get_config(&self) -> AlertManagerChannelConfig {
let mut route = Mapping::new();
route.insert(
Value::String("receiver".to_string()),
Value::String(self.name.to_string()),
);
route.insert(
Value::String("matchers".to_string()),
Value::Sequence(vec![Value::String("alertname!=Watchdog".to_string())]),
);
let mut receiver = Mapping::new();
receiver.insert(
Value::String("name".to_string()),
Value::String(self.name.to_string()),
);
let mut discord_config = Mapping::new();
discord_config.insert(
Value::String("webhook_url".to_string()),
Value::String(self.url.to_string()),
);
receiver.insert(
Value::String("discord_configs".to_string()),
Value::Sequence(vec![Value::Mapping(discord_config)]),
);
AlertManagerChannelConfig {
channel_receiver: Value::Mapping(receiver),
channel_route: Value::Mapping(route),
}
}
}
impl AlertChannel<CRDPrometheus> for DiscordWebhook {
fn install_config(&self, sender: &CRDPrometheus) {
debug!("Installing Discord webhook for CRDPrometheus in namespace: {}", sender.namespace());
debug!("Config: {:?}", self.get_config());
debug!("Installed!");
}
fn name(&self) -> String {
"discord-webhook".to_string()
}
}
impl AlertChannel<RHOBObservability> for DiscordWebhook {
fn install_config(&self, sender: &RHOBObservability) {
debug!("Installing Discord webhook for RHOBObservability in namespace: {}", sender.namespace());
debug!("Config: {:?}", self.get_config());
debug!("Installed!");
}
fn name(&self) -> String {
"webhook-receiver".to_string()
}
}
impl AlertChannel<KubePrometheus> for DiscordWebhook {
fn install_config(&self, sender: &KubePrometheus) {
debug!("Installing Discord webhook for KubePrometheus in namespace: {}", sender.namespace());
// Take the lock once: a second lock() on the same std::sync::Mutex from this thread would deadlock.
let mut config = sender.config.lock().unwrap();
let ns = config.namespace.clone().unwrap_or_else(|| "monitoring".to_string());
debug!("Namespace: {}", ns);
config.alert_receiver_configs.push(self.get_config());
debug!("Installed!");
}
fn name(&self) -> String {
"discord-webhook".to_string()
}
}
fn default_monitoring_stack() -> MonitoringStack {
MonitoringStack::new(MonitoringApiVersion::V2CRD)
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TenantMonitoringScore {
pub tenant_id: harmony_types::id::Id,
pub tenant_name: String,
#[serde(skip)]
#[serde(default = "default_monitoring_stack")]
pub monitoring_stack: MonitoringStack,
}
impl TenantMonitoringScore {
pub fn new(tenant_name: &str, monitoring_stack: MonitoringStack) -> Self {
Self {
tenant_id: harmony_types::id::Id::default(),
tenant_name: tenant_name.to_string(),
monitoring_stack,
}
}
}
impl<T: Topology + TenantManager> Score<T> for TenantMonitoringScore {
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(TenantMonitoringInterpret {
score: self.clone(),
})
}
fn name(&self) -> String {
format!("{} monitoring [TenantMonitoringScore]", self.tenant_name)
}
}
#[derive(Debug)]
pub struct TenantMonitoringInterpret {
pub score: TenantMonitoringScore,
}
#[async_trait::async_trait]
impl<T: Topology + TenantManager> Interpret<T> for TenantMonitoringInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
let tenant_config = topology.get_tenant_config().await.unwrap();
let tenant_ns = tenant_config.name.clone();
match self.score.monitoring_stack.version {
MonitoringApiVersion::V1Helm => {
debug!("Installing Helm monitoring for tenant {}", tenant_ns);
}
MonitoringApiVersion::V2CRD => {
debug!("Installing CRD monitoring for tenant {}", tenant_ns);
}
MonitoringApiVersion::V3RHOB => {
debug!("Installing RHOB monitoring for tenant {}", tenant_ns);
}
}
Ok(Outcome::success(format!(
"Installed monitoring stack for tenant {} with version {:?}",
self.score.tenant_name,
self.score.monitoring_stack.version
)))
}
fn get_name(&self) -> InterpretName {
InterpretName::Custom("TenantMonitoringInterpret")
}
fn get_version(&self) -> Version {
Version::from("1.0.0").unwrap()
}
fn get_status(&self) -> InterpretStatus {
InterpretStatus::SUCCESS
}
fn get_children(&self) -> Vec<harmony_types::id::Id> {
Vec::new()
}
}


@@ -1,8 +1,7 @@
use super::BrocadeClient;
use crate::{
BrocadeInfo, Error, ExecutionMode, InterSwitchLink, InterfaceInfo, MacAddressEntry,
PortChannelId, PortOperatingMode, SecurityLevel, parse_brocade_mac_address,
shell::BrocadeShell,
PortChannelId, PortOperatingMode, parse_brocade_mac_address, shell::BrocadeShell,
};
use async_trait::async_trait;


@@ -8,7 +8,7 @@ use regex::Regex;
use crate::{
BrocadeClient, BrocadeInfo, Error, ExecutionMode, InterSwitchLink, InterfaceInfo,
InterfaceStatus, InterfaceType, MacAddressEntry, PortChannelId, PortOperatingMode,
SecurityLevel, parse_brocade_mac_address, shell::BrocadeShell,
parse_brocade_mac_address, shell::BrocadeShell,
};
#[derive(Debug)]


@@ -31,3 +31,16 @@ Ready to build your own components? These guides show you how.
- [**Writing a Score**](./guides/writing-a-score.md): Learn how to create your own `Score` and `Interpret` logic to define a new desired state.
- [**Writing a Topology**](./guides/writing-a-topology.md): Learn how to model a new environment (like AWS, GCP, or custom hardware) as a `Topology`.
- [**Adding Capabilities**](./guides/adding-capabilities.md): See how to add a `Capability` to your custom `Topology`.
- [**Coding Guide**](./coding-guide.md): Conventions and best practices for writing Harmony code.
## 5. Module Documentation
Deep dives into specific Harmony modules and features.
- [**Monitoring and Alerting**](./monitoring.md): Comprehensive guide to cluster, tenant, and application-level monitoring with support for OKD, KubePrometheus, RHOB, and more.
## 6. Architecture Decision Records
Important architectural decisions are documented in the `adr/` directory:
- [Full ADR Index](../adr/)

docs/coding-guide.md Normal file

@@ -0,0 +1,299 @@
# Harmony Coding Guide
Harmony is an infrastructure automation framework. It is **code-first and code-only**: operators write Rust programs to declare and drive infrastructure, rather than YAML files or DSL configs. Good code here means a good operator experience.
### Concrete context
Throughout this guide we use the KVM module as the concrete context for explaining the coding style. The same ideas translate well to the other modules and contexts Harmony manages, such as OPNSense and Kubernetes.
## Core Philosophy
### The Careful Craftsman Principle
Harmony is a powerful framework that does a lot. With that power comes responsibility. Every abstraction, every trait, every module must earn its place. Before adding anything, ask:
1. **Does this solve a real problem users have?** Not a theoretical problem, an actual one encountered in production.
2. **Is this the simplest solution that works?** Complexity is a cost that compounds over time.
3. **Will this make the next developer's life easier or harder?** Code is read far more often than written.
When in doubt, don't abstract. Wait for the pattern to emerge from real usage. A little duplication is better than the wrong abstraction.
### High-level functions over raw primitives
Callers should not need to know about underlying protocols, XML schemas, or API quirks. A function that deploys a VM should accept meaningful parameters like CPU count, memory, and network name — not XML strings.
```rust
// Bad: caller constructs XML and passes it to a thin wrapper
let xml = format!(r#"<domain type='kvm'>...</domain>"#, name, memory_kb, ...);
executor.create_vm(&xml).await?;

// Good: caller describes intent, the module handles representation
executor.define_vm(&VmConfig::builder("my-vm")
    .cpu(4)
    .memory_gb(8)
    .disk(DiskConfig::new(50))
    .network(NetworkRef::named("mylan"))
    .boot_order([BootDevice::Network, BootDevice::Disk])
    .build())
    .await?;
```
The module owns the XML, the virsh invocations, the API calls — not the caller.
### Use the right abstraction layer
Prefer native library bindings over shelling out to CLI tools. The `virt` crate provides direct libvirt bindings and should be used instead of spawning `virsh` subprocesses.
- CLI subprocess calls are fragile: stdout/stderr parsing, exit codes, quoting, PATH differences
- Native bindings give typed errors, no temp files, no shell escaping
- `virt::connect::Connect` opens a connection; `virt::domain::Domain` manages VMs; `virt::network::Network` manages virtual networks
### Keep functions small and well-named
Each function should do one thing. If a function is doing two conceptually separate things, split it. Function names should read like plain English: `ensure_network_active`, `define_vm`, `vm_is_running`.
### Prefer short modules over large files
Group related types and functions by concept. A module that handles one resource (e.g., network, domain, storage) is better than a single file for everything.
---
## Error Handling
### Use `thiserror` for all error types
Define error types with `thiserror::Error`. This removes the boilerplate of implementing `Display` and `std::error::Error` by hand, keeps error messages close to their variants, and makes types easy to extend.
```rust
// Bad: hand-rolled Display + std::error::Error
#[derive(Debug)]
pub enum KVMError {
    ConnectionError(String),
    VMNotFound(String),
}
impl std::fmt::Display for KVMError { ... }
impl std::error::Error for KVMError {}

// Good: derive Display via thiserror
#[derive(thiserror::Error, Debug)]
pub enum KVMError {
    #[error("connection failed: {0}")]
    ConnectionFailed(String),
    #[error("VM not found: {name}")]
    VmNotFound { name: String },
}
```
### Make bubbling errors easy with `?` and `From`
`?` converts the error automatically via `From`: it works whenever the enclosing function's error type implements `From` for the error being propagated. Add `From` conversions from lower-level errors into your module's error type so callers can use `?` without boilerplate.
With `thiserror`, wrapping a foreign error is one line:
```rust
#[derive(thiserror::Error, Debug)]
pub enum KVMError {
    #[error("libvirt error: {0}")]
    Libvirt(#[from] virt::error::Error),
    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
}
```
This means a call that returns `virt::error::Error` can be `?`-propagated into a `Result<_, KVMError>` without any `.map_err(...)`.
### Typed errors over stringly-typed errors
Avoid `Box<dyn Error>` or `String` as error return types in library code. Callers need to distinguish errors programmatically — `KVMError::VmAlreadyExists` is actionable, `"VM already exists: foo"` as a `String` is not.
At binary entry points (e.g., `main`) it is acceptable to convert to `String` or `anyhow::Error` for display.
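To make the distinction concrete, here is a minimal stdlib-only sketch (the error type and `ensure_vm` function are hypothetical; in Harmony the `Display` impl would come from `thiserror`). The caller matches on the variant to treat "already exists" as success, something a `String` error would only permit via fragile message parsing.

```rust
use std::fmt;

// Hypothetical error type; with thiserror the Display impl below would be derived.
#[derive(Debug)]
enum KvmError {
    VmAlreadyExists(String),
    VmNotFound(String),
}

impl fmt::Display for KvmError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KvmError::VmAlreadyExists(name) => write!(f, "VM already exists: {name}"),
            KvmError::VmNotFound(name) => write!(f, "VM not found: {name}"),
        }
    }
}

// Hypothetical operation: fails with a typed variant when the VM is present.
fn ensure_vm(existing: &[&str], name: &str) -> Result<(), KvmError> {
    if existing.contains(&name) {
        return Err(KvmError::VmAlreadyExists(name.to_string()));
    }
    Ok(())
}

fn main() {
    // The caller treats "already exists" as success (idempotent create) by
    // matching the variant, not by inspecting a formatted message.
    let outcome = match ensure_vm(&["web-1"], "web-1") {
        Ok(()) | Err(KvmError::VmAlreadyExists(_)) => "ok",
        Err(e) => panic!("unexpected error: {e}"),
    };
    println!("{outcome}");
}
```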
---
## Logging
### Use the `log` crate macros
All log output must go through the `log` crate. Never use `println!`, `eprintln!`, or `dbg!` in library code. This makes output compatible with any logging backend (env_logger, tracing, structured logging, etc.).
```rust
// Bad
println!("Creating VM: {}", name);

// Good
use log::{info, debug, warn};
info!("Creating VM: {name}");
debug!("VM XML:\n{xml}");
warn!("Network already active, skipping creation");
```
Use the right level:
| Level | When to use |
|---------|-------------|
| `error` | Unrecoverable failures (before returning Err) |
| `warn` | Recoverable issues, skipped steps |
| `info` | High-level progress events visible in normal operation |
| `debug` | Detailed operational info useful for debugging |
| `trace` | Very granular, per-iteration or per-call data |
Log before significant operations and after unexpected conditions. Do not log inside tight loops at `info` level.
---
## Types and Builders
### Derive `Serialize` on all public domain types
All public structs and enums that represent configuration or state should derive `serde::Serialize`. Add `Deserialize` when round-trip serialization is needed.
### Builder pattern for complex configs
When a type has more than three fields or optional fields, provide a builder. The builder pattern allows named, incremental construction without positional arguments.
```rust
let config = VmConfig::builder("bootstrap")
    .cpu(4)
    .memory_gb(8)
    .disk(DiskConfig::new(50).labeled("os"))
    .disk(DiskConfig::new(100).labeled("data"))
    .network(NetworkRef::named("harmonylan"))
    .boot_order([BootDevice::Network, BootDevice::Disk])
    .build();
```
### Avoid `pub` fields on config structs
Expose data through methods or the builder, not raw field access. This preserves the ability to validate, rename, or change representation without breaking callers.
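As a sketch of how these two points fit together (a hypothetical, trimmed-down `VmConfig`, not the real Harmony type): fields stay private, the builder owns the defaults, and read access goes through methods.

```rust
// Hypothetical config type: private fields, builder construction, method access.
#[derive(Debug, Clone)]
pub struct VmConfig {
    name: String,
    cpu: u32,
    memory_gb: u32,
}

pub struct VmConfigBuilder {
    name: String,
    cpu: u32,
    memory_gb: u32,
}

impl VmConfig {
    pub fn builder(name: &str) -> VmConfigBuilder {
        // Defaults live in one place, so adding a field never breaks callers.
        VmConfigBuilder { name: name.to_string(), cpu: 1, memory_gb: 1 }
    }
    pub fn name(&self) -> &str { &self.name }
    pub fn cpu(&self) -> u32 { self.cpu }
    pub fn memory_gb(&self) -> u32 { self.memory_gb }
}

impl VmConfigBuilder {
    pub fn cpu(mut self, cpu: u32) -> Self { self.cpu = cpu; self }
    pub fn memory_gb(mut self, gb: u32) -> Self { self.memory_gb = gb; self }
    pub fn build(self) -> VmConfig {
        // A real builder could validate here before constructing the config.
        VmConfig { name: self.name, cpu: self.cpu, memory_gb: self.memory_gb }
    }
}

fn main() {
    let config = VmConfig::builder("bootstrap").cpu(4).memory_gb(8).build();
    println!("{} {} {}", config.name(), config.cpu(), config.memory_gb());
}
```

Because callers only ever see `builder()` and the accessor methods, the internal representation can change without breaking them.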
---
## Async
### Use `tokio` for all async runtime needs
All async code runs on tokio. Use `tokio::spawn`, `tokio::time`, etc. Use `#[async_trait]` for traits with async methods.
### No blocking in async context
Never call blocking I/O (file I/O, network, process spawn) directly in an async function. Use `tokio::fs`, `tokio::process`, or `tokio::task::spawn_blocking` as appropriate.
---
## Module Structure
### Follow the `Score` / `Interpret` pattern
Modules that represent deployable infrastructure should implement `Score<T: Topology>` and `Interpret<T>`:
- `Score` is the serializable, clonable configuration declaring *what* to deploy
- `Interpret` does the actual work when `execute()` is called
```rust
pub struct KvmScore {
    network: NetworkConfig,
    vms: Vec<VmConfig>,
}

impl<T: Topology + KvmHost> Score<T> for KvmScore {
    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(KvmInterpret::new(self.clone()))
    }

    fn name(&self) -> String { "KvmScore".to_string() }
}
```
### Flatten the public API in `mod.rs`
Internal submodules are implementation detail. Re-export what callers need at the module root:
```rust
// modules/kvm/mod.rs
mod connection;
mod domain;
mod network;
mod error;
mod xml;
pub use connection::KvmConnection;
pub use domain::{VmConfig, VmConfigBuilder, VmStatus, DiskConfig, BootDevice};
pub use error::KvmError;
pub use network::NetworkConfig;
```
---
## Commit Style
Follow [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/):
```
feat(kvm): add network isolation support
fix(kvm): correct memory unit conversion for libvirt
refactor(kvm): replace virsh subprocess calls with virt crate bindings
docs: add coding guide
```
Keep pull requests small and single-purpose (under ~200 lines excluding generated code). Do not mix refactoring, bug fixes, and new features in one PR.
---
## When to Add Abstractions
Harmony provides powerful abstraction mechanisms: traits, generics, the Score/Interpret pattern, and capabilities. Use them judiciously.
### Add an abstraction when:
- **You have three or more concrete implementations** doing the same thing. Two is often coincidence; three is a pattern.
- **The abstraction provides compile-time safety** that prevents real bugs (e.g., capability bounds on topologies).
- **The abstraction hides genuine complexity** that callers shouldn't need to understand (e.g., XML schema generation for libvirt).
### Don't add an abstraction when:
- **It's just to avoid a few lines of boilerplate**. Copy-paste is sometimes better than a trait hierarchy.
- **You're anticipating future flexibility** that isn't needed today. YAGNI (You Aren't Gonna Need It).
- **The abstraction makes the code harder to understand** for someone unfamiliar with the codebase.
- **You're wrapping a single implementation**. A trait with one implementation is usually over-engineering.
### Signs you've over-abstracted:
- You need to explain the type system to a competent Rust developer for them to understand how to add a simple feature.
- Adding a new concrete type requires changes in multiple trait definitions.
- The word "factory" or "manager" appears in your type names.
- You have more trait definitions than concrete implementations.
### The Rule of Three for Traits
Before creating a new trait, ensure you have:
1. A clear, real use case (not hypothetical)
2. At least one concrete implementation
3. A plan for how callers will use it
Only generalize when the pattern is proven. The monitoring module is a good example: we had multiple alert senders (OKD, KubePrometheus, RHOB) before we introduced the `AlertSender` and `AlertReceiver<S>` traits. The traits emerged from real needs, not design sessions.
---
## Documentation
### Document the "why", not the "what"
Code should be self-explanatory for the "what". Comments and documentation should explain intent, rationale, and gotchas.
```rust
// Bad: restates the code
// Returns the number of VMs
fn vm_count(&self) -> usize { self.vms.len() }

// Good: explains the why
// Returns 0 if connection is lost, rather than erroring,
// because monitoring code uses this for health checks
fn vm_count(&self) -> usize { self.vms.len() }
```
### Keep examples in the `examples/` directory
Working code beats documentation. Every major feature should have a runnable example that demonstrates real usage.

docs/monitoring.md Normal file

@@ -0,0 +1,443 @@
# Monitoring and Alerting in Harmony
Harmony provides a unified, type-safe approach to monitoring and alerting across Kubernetes, OpenShift, and bare-metal infrastructure. This guide explains the architecture and how to use it at different levels of abstraction.
## Overview
Harmony's monitoring module supports three distinct use cases:
| Level | Who Uses It | What It Provides |
|-------|-------------|------------------|
| **Cluster** | Cluster administrators | Full control over monitoring stack, cluster-wide alerts, external scrape targets |
| **Tenant** | Platform teams | Namespace-scoped monitoring in multi-tenant environments |
| **Application** | Application developers | Zero-config monitoring that "just works" |
Each level builds on the same underlying abstractions, ensuring consistency while providing appropriate complexity for each audience.
## Core Concepts
### AlertSender
An `AlertSender` represents the system that evaluates alert rules and sends notifications. Harmony supports multiple monitoring stacks:
| Sender | Description | Use When |
|--------|-------------|----------|
| `OpenshiftClusterAlertSender` | OKD/OpenShift built-in monitoring | Running on OKD/OpenShift |
| `KubePrometheus` | kube-prometheus-stack via Helm | Standard Kubernetes, need full stack |
| `Prometheus` | Standalone Prometheus | Custom Prometheus deployment |
| `RedHatClusterObservability` | RHOB operator | Red Hat managed clusters |
| `Grafana` | Grafana-managed alerting | Grafana as primary alerting layer |
### AlertReceiver
An `AlertReceiver` defines where alerts are sent (Discord, Slack, email, webhook, etc.). Receivers are parameterized by sender type because each monitoring stack has different configuration formats.
```rust
pub trait AlertReceiver<S: AlertSender> {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError>;
    fn name(&self) -> String;
}
```
Built-in receivers:
- `DiscordReceiver` - Discord webhooks
- `WebhookReceiver` - Generic HTTP webhooks
### AlertRule
An `AlertRule` defines a Prometheus alert expression. Rules are also parameterized by sender to handle different CRD formats.
```rust
pub trait AlertRule<S: AlertSender> {
    fn build_rule(&self) -> Result<serde_json::Value, InterpretError>;
    fn name(&self) -> String;
}
```
### Observability Capability
Topologies implement `Observability<S>` to indicate they support a specific alert sender:
```rust
impl Observability<OpenshiftClusterAlertSender> for K8sAnywhereTopology {
    async fn install_receivers(&self, sender, inventory, receivers) { ... }
    async fn install_rules(&self, sender, inventory, rules) { ... }
    // ...
}
```
This provides **compile-time verification**: if you try to use `OpenshiftClusterAlertScore` with a topology that doesn't implement `Observability<OpenshiftClusterAlertSender>`, the code won't compile.
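A stripped-down, self-contained sketch of the mechanism (all names here are simplified stand-ins for the real Harmony traits): a function generic over `Observability<S>` can only be called with a topology that implements the capability for that specific sender, so a mismatch is rejected by the compiler rather than discovered at deploy time.

```rust
// Simplified stand-ins for AlertSender / Observability; not the real Harmony types.
trait AlertSender {
    fn name(&self) -> &'static str;
}

#[allow(dead_code)]
struct OpenshiftSender;
impl AlertSender for OpenshiftSender {
    fn name(&self) -> &'static str { "openshift" }
}

struct KubePrometheusSender;
impl AlertSender for KubePrometheusSender {
    fn name(&self) -> &'static str { "kube-prometheus" }
}

// A topology advertises support for a sender by implementing Observability<S>.
trait Observability<S: AlertSender> {
    fn install(&self, sender: &S) -> String;
}

struct DemoTopology;
impl Observability<KubePrometheusSender> for DemoTopology {
    fn install(&self, sender: &KubePrometheusSender) -> String {
        format!("installing rules via {}", sender.name())
    }
}

// A score-like function can only target topologies with the right capability.
fn deploy<S: AlertSender, T: Observability<S>>(topology: &T, sender: &S) -> String {
    topology.install(sender)
}

fn main() {
    println!("{}", deploy(&DemoTopology, &KubePrometheusSender));
    // deploy(&DemoTopology, &OpenshiftSender); // would not compile:
    // DemoTopology does not implement Observability<OpenshiftSender>
}
```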
---
## Level 1: Cluster Monitoring
Cluster monitoring is for administrators who need full control over the monitoring infrastructure. This includes:
- Installing/managing the monitoring stack
- Configuring cluster-wide alert receivers
- Defining cluster-level alert rules
- Adding external scrape targets (e.g., bare-metal servers, firewalls)
### Example: OKD Cluster Alerts
```rust
use harmony::{
    inventory::Inventory,
    modules::monitoring::{
        alert_channel::discord_alert_channel::DiscordReceiver,
        alert_rule::{alerts::k8s::pvc::high_pvc_fill_rate_over_two_days, prometheus_alert_rule::AlertManagerRuleGroup},
        okd::openshift_cluster_alerting_score::OpenshiftClusterAlertScore,
        scrape_target::prometheus_node_exporter::PrometheusNodeExporter,
    },
    topology::{K8sAnywhereTopology, monitoring::{AlertMatcher, AlertRoute, MatchOp}},
};
use harmony_macros::{hurl, ip};

let severity_matcher = AlertMatcher {
    label: "severity".to_string(),
    operator: MatchOp::Eq,
    value: "critical".to_string(),
};

let rule_group = AlertManagerRuleGroup::new(
    "cluster-rules",
    vec![high_pvc_fill_rate_over_two_days()],
);

let external_exporter = PrometheusNodeExporter {
    job_name: "firewall".to_string(),
    metrics_path: "/metrics".to_string(),
    listen_address: ip!("192.168.1.1"),
    port: 9100,
    ..Default::default()
};

harmony_cli::run(
    Inventory::autoload(),
    K8sAnywhereTopology::from_env(),
    vec![Box::new(OpenshiftClusterAlertScore {
        sender: OpenshiftClusterAlertSender,
        receivers: vec![Box::new(DiscordReceiver {
            name: "critical-alerts".to_string(),
            url: hurl!("https://discord.com/api/webhooks/..."),
            route: AlertRoute {
                matchers: vec![severity_matcher],
                ..AlertRoute::default("critical-alerts".to_string())
            },
        })],
        rules: vec![Box::new(rule_group)],
        scrape_targets: Some(vec![Box::new(external_exporter)]),
    })],
    None,
).await?;
```
### What This Does
1. **Enables cluster monitoring** - Activates OKD's built-in Prometheus
2. **Enables user workload monitoring** - Allows namespace-scoped rules
3. **Configures Alertmanager** - Adds Discord receiver with route matching
4. **Deploys alert rules** - Creates `AlertingRule` CRD with PVC fill rate alert
5. **Adds external scrape target** - Configures Prometheus to scrape the firewall
### Compile-Time Safety
The `OpenshiftClusterAlertScore` requires:
```rust
impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T>
    for OpenshiftClusterAlertScore
```
If `K8sAnywhereTopology` didn't implement `Observability<OpenshiftClusterAlertSender>`, this code would fail to compile. You cannot accidentally deploy OKD alerts to a cluster that doesn't support them.
---
## Level 2: Tenant Monitoring
In multi-tenant clusters, teams are often confined to specific namespaces. Tenant monitoring adapts to this constraint:
- Resources are deployed in the tenant's namespace
- Cannot modify cluster-level monitoring configuration
- The topology determines namespace context at runtime
### How It Works
The topology's `Observability` implementation handles tenant scoping:
```rust
impl Observability<KubePrometheus> for K8sAnywhereTopology {
    async fn install_rules(&self, sender, inventory, rules) {
        // Topology knows if it's tenant-scoped
        let namespace = self.get_tenant_config().await
            .map(|t| t.name)
            .unwrap_or_else(|| "monitoring".to_string());

        // Rules are installed in the appropriate namespace
        for rule in rules.unwrap_or_default() {
            let score = KubePrometheusRuleScore {
                sender: sender.clone(),
                rule,
                namespace: namespace.clone(), // Tenant namespace
            };
            score.create_interpret().execute(inventory, self).await?;
        }
    }
}
```
### Tenant vs Cluster Resources
| Resource | Cluster-Level | Tenant-Level |
|----------|---------------|--------------|
| Alertmanager config | Global receivers | Namespaced receivers (where supported) |
| PrometheusRules | Cluster-wide alerts | Namespace alerts only |
| ServiceMonitors | Any namespace | Own namespace only |
| External scrape targets | Can add | Cannot add (cluster config) |
### Runtime Validation
Tenant constraints are validated at runtime via Kubernetes RBAC. If a tenant-scoped deployment attempts cluster-level operations, it fails with a clear permission error from the Kubernetes API.
This cannot be fully compile-time because tenant context is determined by who's running the code and what permissions they have—information only available at runtime.
---
## Level 3: Application Monitoring
Application monitoring provides zero-config, opinionated monitoring for developers. Just add the `Monitoring` feature to your application and it works.
### Example
```rust
use std::sync::Arc;

use harmony::modules::{
    application::{Application, ApplicationFeature},
    monitoring::alert_channel::webhook_receiver::WebhookReceiver,
};

// Define your application; the Arc is shared between the app and the feature
let my_app = Arc::new(MyApplication::new());

// Add monitoring as a feature
let monitoring = Monitoring {
    application: my_app.clone(),
    alert_receiver: vec![], // Uses defaults
};

// Install with the application
my_app.add_feature(monitoring);
```
### What Application Monitoring Provides
1. **Automatic ServiceMonitor** - Creates a ServiceMonitor for your application's pods
2. **Ntfy Notification Channel** - Auto-installs and configures Ntfy for push notifications
3. **Tenant Awareness** - Automatically scopes to the correct namespace
4. **Sensible Defaults** - Pre-configured alert routes and receivers
### Under the Hood
```rust
impl<T: Topology + Observability<Prometheus> + TenantManager>
    ApplicationFeature<T> for Monitoring
{
    async fn ensure_installed(&self, topology: &T) -> Result<...> {
        // 1. Get tenant namespace (or use app name)
        let namespace = topology.get_tenant_config().await
            .map(|ns| ns.name.clone())
            .unwrap_or_else(|| self.application.name());

        // 2. Create ServiceMonitor for the app
        let app_service_monitor = ServiceMonitor {
            metadata: ObjectMeta {
                name: Some(self.application.name()),
                namespace: Some(namespace.clone()),
                ..Default::default()
            },
            spec: ServiceMonitorSpec::default(),
        };

        // 3. Install Ntfy for notifications
        let ntfy = NtfyScore { namespace, host };
        ntfy.interpret(&Inventory::empty(), topology).await?;

        // 4. Wire up webhook receiver to Ntfy
        let ntfy_receiver = WebhookReceiver { ... };

        // 5. Execute monitoring score
        alerting_score.interpret(&Inventory::empty(), topology).await?;
    }
}
```
---
## Pre-Built Alert Rules
Harmony provides a library of common alert rules in `modules/monitoring/alert_rule/alerts/`:
### Kubernetes Alerts (`alerts/k8s/`)
```rust
use harmony::modules::monitoring::alert_rule::{
    alerts::k8s::{
        pod::pod_failed,
        pvc::high_pvc_fill_rate_over_two_days,
        memory_usage::alert_high_memory_usage,
    },
    prometheus_alert_rule::AlertManagerRuleGroup,
};

let rules = AlertManagerRuleGroup::new("k8s-rules", vec![
    pod_failed(),
    high_pvc_fill_rate_over_two_days(),
    alert_high_memory_usage(),
]);
```
Available rules:
- `pod_failed()` - Pod in failed state
- `alert_container_restarting()` - Container restart loop
- `alert_pod_not_ready()` - Pod not ready for extended period
- `high_pvc_fill_rate_over_two_days()` - PVC will fill within 2 days
- `alert_high_memory_usage()` - Memory usage above threshold
- `alert_high_cpu_usage()` - CPU usage above threshold
### Infrastructure Alerts (`alerts/infra/`)
```rust
use harmony::modules::monitoring::alert_rule::alerts::infra::opnsense::high_http_error_rate;

let rules = AlertManagerRuleGroup::new("infra-rules", vec![
    high_http_error_rate(),
]);
```
### Creating Custom Rules
```rust
use harmony::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;

pub fn my_custom_alert() -> PrometheusAlertRule {
    PrometheusAlertRule::new("MyServiceDown", "up{job=\"my-service\"} == 0")
        .for_duration("5m")
        .label("severity", "critical")
        .annotation("summary", "My service is down")
        .annotation("description", "The my-service job has been down for more than 5 minutes")
}
```
---
## Alert Receivers
### Discord Webhook
```rust
use harmony::modules::monitoring::alert_channel::discord_alert_channel::DiscordReceiver;
use harmony::topology::monitoring::{AlertRoute, AlertMatcher, MatchOp};
use harmony_macros::hurl;

let discord = DiscordReceiver {
    name: "ops-alerts".to_string(),
    url: hurl!("https://discord.com/api/webhooks/123456/abcdef"),
    route: AlertRoute {
        receiver: "ops-alerts".to_string(),
        matchers: vec![AlertMatcher {
            label: "severity".to_string(),
            operator: MatchOp::Eq,
            value: "critical".to_string(),
        }],
        group_by: vec!["alertname".to_string()],
        repeat_interval: Some("30m".to_string()),
        continue_matching: false,
        children: vec![],
    },
};
```
### Generic Webhook
```rust
use harmony::modules::monitoring::alert_channel::webhook_receiver::WebhookReceiver;
use harmony::topology::monitoring::AlertRoute;
use harmony_macros::hurl;

let webhook = WebhookReceiver {
    name: "custom-webhook".to_string(),
    url: hurl!("https://api.example.com/alerts"),
    route: AlertRoute::default("custom-webhook".to_string()),
};
```
---
## Adding a New Monitoring Stack
To add support for a new monitoring stack:
1. **Create the sender type** in `modules/monitoring/my_sender/mod.rs`:
```rust
#[derive(Debug, Clone)]
pub struct MySender;

impl AlertSender for MySender {
    fn name(&self) -> String { "MySender".to_string() }
}
```
2. **Define CRD types** in `modules/monitoring/my_sender/crd/`:
```rust
#[derive(CustomResource, Debug, Serialize, Deserialize, Clone)]
#[kube(group = "monitoring.example.com", version = "v1", kind = "MyAlertRule")]
pub struct MyAlertRuleSpec { ... }
```
3. **Implement Observability** in `domain/topology/k8s_anywhere/observability/my_sender.rs`:
```rust
impl Observability<MySender> for K8sAnywhereTopology {
    async fn install_receivers(&self, sender, inventory, receivers) { ... }
    async fn install_rules(&self, sender, inventory, rules) { ... }
    // ...
}
```
4. **Implement receiver conversions** for existing receivers:
```rust
impl AlertReceiver<MySender> for DiscordReceiver {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
        // Convert DiscordReceiver to MySender's format
    }
}
```
5. **Create score types**:
```rust
pub struct MySenderAlertScore {
    pub sender: MySender,
    pub receivers: Vec<Box<dyn AlertReceiver<MySender>>>,
    pub rules: Vec<Box<dyn AlertRule<MySender>>>,
}
```
---
## Architecture Principles
### Type Safety Over Flexibility
Each monitoring stack has distinct CRDs and configuration formats. Rather than a unified "MonitoringStack" type that loses stack-specific features, we use generic traits that provide type safety while allowing each stack to express its unique configuration.
### Compile-Time Capability Verification
The `Observability<S>` bound ensures you can't deploy OKD alerts to a KubePrometheus cluster. The compiler catches platform mismatches before deployment.
### Explicit Over Implicit
Monitoring stacks are chosen explicitly (`OpenshiftClusterAlertSender` vs `KubePrometheus`). There's no "auto-detection" that could lead to surprising behavior.
### Three Levels, One Foundation
Cluster, tenant, and application monitoring all use the same traits (`AlertSender`, `AlertReceiver`, `AlertRule`). The difference is in how scores are constructed and how topologies interpret them.
---
## Related Documentation
- [ADR-020: Monitoring and Alerting Architecture](../adr/020-monitoring-alerting-architecture.md)
- [ADR-013: Monitoring Notifications (ntfy)](../adr/013-monitoring-notifications.md)
- [ADR-011: Multi-Tenant Cluster Architecture](../adr/011-multi-tenant-cluster.md)
- [Coding Guide](coding-guide.md)
- [Core Concepts](concepts.md)


@@ -7,7 +7,7 @@ use harmony::{
monitoring::alert_channel::webhook_receiver::WebhookReceiver,
tenant::TenantScore,
},
topology::{K8sAnywhereTopology, tenant::TenantConfig},
topology::{K8sAnywhereTopology, monitoring::AlertRoute, tenant::TenantConfig},
};
use harmony_types::id::Id;
use harmony_types::net::Url;
@@ -33,9 +33,14 @@ async fn main() {
service_port: 3000,
});
let receiver_name = "sample-webhook-receiver".to_string();
let webhook_receiver = WebhookReceiver {
name: "sample-webhook-receiver".to_string(),
name: receiver_name.clone(),
url: Url::Url(url::Url::parse("https://webhook-doesnt-exist.com").unwrap()),
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let app = ApplicationScore {


@@ -1,8 +1,8 @@
use harmony::{
inventory::Inventory,
modules::cert_manager::{
capability::CertificateManagementConfig, score_cert_management::CertificateManagementScore,
score_certificate::CertificateScore, score_issuer::CertificateIssuerScore,
capability::CertificateManagementConfig, score_certificate::CertificateScore,
score_issuer::CertificateIssuerScore,
},
topology::K8sAnywhereTopology,
};


@@ -10,9 +10,10 @@ publish = false
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_types = { path = "../../harmony_types" }
harmony_macros = { path = "../../harmony_macros" }
harmony-k8s = { path = "../../harmony-k8s" }
cidr.workspace = true
tokio.workspace = true
harmony_macros = { path = "../../harmony_macros" }
log.workspace = true
env_logger.workspace = true
url.workspace = true


@@ -1,6 +1,6 @@
use std::time::Duration;
use harmony::topology::k8s::{DrainOptions, K8sClient};
use harmony_k8s::{DrainOptions, K8sClient};
use log::{info, trace};
#[tokio::main]


@@ -10,9 +10,10 @@ publish = false
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_types = { path = "../../harmony_types" }
harmony_macros = { path = "../../harmony_macros" }
harmony-k8s = { path = "../../harmony-k8s" }
cidr.workspace = true
tokio.workspace = true
harmony_macros = { path = "../../harmony_macros" }
log.workspace = true
env_logger.workspace = true
url.workspace = true


@@ -1,4 +1,4 @@
use harmony::topology::k8s::{DrainOptions, K8sClient, NodeFile};
use harmony_k8s::{K8sClient, NodeFile};
use log::{info, trace};
#[tokio::main]


@@ -1,37 +1,45 @@
use std::collections::HashMap;
use std::{
collections::HashMap,
sync::{Arc, Mutex},
};
use harmony::{
inventory::Inventory,
modules::{
monitoring::{
alert_channel::discord_alert_channel::DiscordWebhook,
alert_rule::prometheus_alert_rule::AlertManagerRuleGroup,
kube_prometheus::{
helm_prometheus_alert_score::HelmPrometheusAlertingScore,
types::{
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
ServiceMonitorEndpoint,
modules::monitoring::{
alert_channel::discord_alert_channel::DiscordReceiver,
alert_rule::{
alerts::{
infra::dell_server::{
alert_global_storage_status_critical,
alert_global_storage_status_non_recoverable,
global_storage_status_degraded_non_critical,
},
k8s::pvc::high_pvc_fill_rate_over_two_days,
},
prometheus_alert_rule::AlertManagerRuleGroup,
},
prometheus::alerts::{
infra::dell_server::{
alert_global_storage_status_critical, alert_global_storage_status_non_recoverable,
global_storage_status_degraded_non_critical,
kube_prometheus::{
helm::config::KubePrometheusConfig,
kube_prometheus_alerting_score::KubePrometheusAlertingScore,
types::{
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
ServiceMonitorEndpoint,
},
k8s::pvc::high_pvc_fill_rate_over_two_days,
},
},
topology::K8sAnywhereTopology,
topology::{K8sAnywhereTopology, monitoring::AlertRoute},
};
use harmony_types::{k8s_name::K8sName, net::Url};
#[tokio::main]
async fn main() {
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
let receiver_name = "test-discord".to_string();
let discord_receiver = DiscordReceiver {
name: receiver_name.clone(),
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
selectors: vec![],
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let high_pvc_fill_rate_over_two_days_alert = high_pvc_fill_rate_over_two_days();
@@ -70,10 +78,15 @@ async fn main() {
endpoints: vec![service_monitor_endpoint],
..Default::default()
};
let alerting_score = HelmPrometheusAlertingScore {
let config = Arc::new(Mutex::new(KubePrometheusConfig::new()));
let alerting_score = KubePrometheusAlertingScore {
receivers: vec![Box::new(discord_receiver)],
rules: vec![Box::new(additional_rules), Box::new(additional_rules2)],
service_monitors: vec![service_monitor],
scrape_targets: None,
config,
};
harmony_cli::run(


@@ -1,24 +1,32 @@
use std::{collections::HashMap, str::FromStr};
use std::{
collections::HashMap,
str::FromStr,
sync::{Arc, Mutex},
};
use harmony::{
inventory::Inventory,
modules::{
monitoring::{
alert_channel::discord_alert_channel::DiscordWebhook,
alert_rule::prometheus_alert_rule::AlertManagerRuleGroup,
alert_channel::discord_alert_channel::DiscordReceiver,
alert_rule::{
alerts::k8s::pvc::high_pvc_fill_rate_over_two_days,
prometheus_alert_rule::AlertManagerRuleGroup,
},
kube_prometheus::{
helm_prometheus_alert_score::HelmPrometheusAlertingScore,
helm::config::KubePrometheusConfig,
kube_prometheus_alerting_score::KubePrometheusAlertingScore,
types::{
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
ServiceMonitorEndpoint,
},
},
},
prometheus::alerts::k8s::pvc::high_pvc_fill_rate_over_two_days,
tenant::TenantScore,
},
topology::{
K8sAnywhereTopology,
monitoring::AlertRoute,
tenant::{ResourceLimits, TenantConfig, TenantNetworkPolicy},
},
};
@@ -42,10 +50,13 @@ async fn main() {
},
};
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
let receiver_name = "test-discord".to_string();
let discord_receiver = DiscordReceiver {
name: receiver_name.clone(),
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
selectors: vec![],
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let high_pvc_fill_rate_over_two_days_alert = high_pvc_fill_rate_over_two_days();
@@ -74,10 +85,14 @@ async fn main() {
..Default::default()
};
let alerting_score = HelmPrometheusAlertingScore {
let config = Arc::new(Mutex::new(KubePrometheusConfig::new()));
let alerting_score = KubePrometheusAlertingScore {
receivers: vec![Box::new(discord_receiver)],
rules: vec![Box::new(additional_rules)],
service_monitors: vec![service_monitor],
scrape_targets: None,
config,
};
harmony_cli::run(


@@ -0,0 +1,16 @@
[package]
name = "example-node-health"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
publish = false
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_types = { path = "../../harmony_types" }
tokio = { workspace = true }
harmony_macros = { path = "../../harmony_macros" }
log = { workspace = true }
env_logger = { workspace = true }


@@ -0,0 +1,17 @@
use harmony::{
inventory::Inventory, modules::node_health::NodeHealthScore, topology::K8sAnywhereTopology,
};
#[tokio::main]
async fn main() {
let node_health = NodeHealthScore {};
harmony_cli::run(
Inventory::autoload(),
K8sAnywhereTopology::from_env(),
vec![Box::new(node_health)],
None,
)
.await
.unwrap();
}


@@ -1,35 +1,64 @@
use std::collections::HashMap;
use harmony::{
inventory::Inventory,
modules::monitoring::{
alert_channel::discord_alert_channel::DiscordWebhook,
okd::cluster_monitoring::OpenshiftClusterAlertScore,
alert_channel::discord_alert_channel::DiscordReceiver,
alert_rule::{
alerts::{
infra::opnsense::high_http_error_rate, k8s::pvc::high_pvc_fill_rate_over_two_days,
},
prometheus_alert_rule::AlertManagerRuleGroup,
},
okd::openshift_cluster_alerting_score::OpenshiftClusterAlertScore,
scrape_target::prometheus_node_exporter::PrometheusNodeExporter,
},
topology::{
K8sAnywhereTopology,
monitoring::{AlertMatcher, AlertRoute, MatchOp},
},
topology::K8sAnywhereTopology,
};
use harmony_macros::hurl;
use harmony_types::k8s_name::K8sName;
use harmony_macros::{hurl, ip};
#[tokio::main]
async fn main() {
let mut sel = HashMap::new();
sel.insert(
"openshift_io_alert_source".to_string(),
"platform".to_string(),
);
let mut sel2 = HashMap::new();
sel2.insert("openshift_io_alert_source".to_string(), "".to_string());
let selectors = vec![sel, sel2];
let platform_matcher = AlertMatcher {
label: "prometheus".to_string(),
operator: MatchOp::Eq,
value: "openshift-monitoring/k8s".to_string(),
};
let severity = AlertMatcher {
label: "severity".to_string(),
operator: MatchOp::Eq,
value: "critical".to_string(),
};
let high_http_error_rate = high_http_error_rate();
let additional_rules = AlertManagerRuleGroup::new("test-rule", vec![high_http_error_rate]);
let scrape_target = PrometheusNodeExporter {
job_name: "firewall".to_string(),
metrics_path: "/metrics".to_string(),
listen_address: ip!("192.168.1.1"),
port: 9100,
..Default::default()
};
harmony_cli::run(
Inventory::autoload(),
K8sAnywhereTopology::from_env(),
vec![Box::new(OpenshiftClusterAlertScore {
receivers: vec![Box::new(DiscordWebhook {
name: K8sName("wills-discord-webhook-example".to_string()),
url: hurl!("https://something.io"),
selectors: selectors,
receivers: vec![Box::new(DiscordReceiver {
name: "crit-wills-discord-channel-example".to_string(),
url: hurl!("https://test.io"),
route: AlertRoute {
matchers: vec![severity],
..AlertRoute::default("crit-wills-discord-channel-example".to_string())
},
})],
sender: harmony::modules::monitoring::okd::OpenshiftClusterAlertSender,
rules: vec![Box::new(additional_rules)],
scrape_targets: Some(vec![Box::new(scrape_target)]),
})],
None,
)


@@ -1,63 +1,13 @@
use std::str::FromStr;
use harmony::{
inventory::Inventory,
modules::helm::chart::{HelmChartScore, HelmRepository, NonBlankString},
topology::K8sAnywhereTopology,
inventory::Inventory, modules::openbao::OpenbaoScore, topology::K8sAnywhereTopology,
};
use harmony_macros::hurl;
#[tokio::main]
async fn main() {
let values_yaml = Some(
r#"server:
standalone:
enabled: true
config: |
listener "tcp" {
tls_disable = true
address = "[::]:8200"
cluster_address = "[::]:8201"
}
storage "file" {
path = "/openbao/data"
}
service:
enabled: true
dataStorage:
enabled: true
size: 10Gi
storageClass: null
accessMode: ReadWriteOnce
auditStorage:
enabled: true
size: 10Gi
storageClass: null
accessMode: ReadWriteOnce"#
.to_string(),
);
let openbao = HelmChartScore {
namespace: Some(NonBlankString::from_str("openbao").unwrap()),
release_name: NonBlankString::from_str("openbao").unwrap(),
chart_name: NonBlankString::from_str("openbao/openbao").unwrap(),
chart_version: None,
values_overrides: None,
values_yaml,
create_namespace: true,
install_only: true,
repository: Some(HelmRepository::new(
"openbao".to_string(),
hurl!("https://openbao.github.io/openbao-helm"),
true,
)),
let openbao = OpenbaoScore {
host: "openbao.sebastien.sto1.nationtech.io".to_string(),
};
// TODO exec pod commands to initialize secret store if not already done
harmony_cli::run(
Inventory::autoload(),
K8sAnywhereTopology::from_env(),


@@ -1,5 +1,3 @@
use std::str::FromStr;
use harmony::{
inventory::Inventory,
modules::{k8s::apps::OperatorHubCatalogSourceScore, postgresql::CloudNativePgOperatorScore},
@@ -9,7 +7,7 @@ use harmony::{
#[tokio::main]
async fn main() {
let operatorhub_catalog = OperatorHubCatalogSourceScore::default();
let cnpg_operator = CloudNativePgOperatorScore::default();
let cnpg_operator = CloudNativePgOperatorScore::default_openshift();
harmony_cli::run(
Inventory::autoload(),


@@ -1,22 +1,13 @@
use std::{
net::{IpAddr, Ipv4Addr},
sync::Arc,
};
use std::sync::Arc;
use async_trait::async_trait;
use cidr::Ipv4Cidr;
use harmony::{
executors::ExecutorError,
hardware::{HostCategory, Location, PhysicalHost, SwitchGroup},
infra::opnsense::OPNSenseManagementInterface,
inventory::Inventory,
modules::opnsense::node_exporter::NodeExporterScore,
topology::{
HAClusterTopology, LogicalHost, PreparationError, PreparationOutcome, Topology,
UnmanagedRouter, node_exporter::NodeExporter,
},
topology::{PreparationError, PreparationOutcome, Topology, node_exporter::NodeExporter},
};
use harmony_macros::{ip, ipv4, mac_address};
use harmony_macros::ip;
#[derive(Debug)]
struct OpnSenseTopology {


@@ -1,8 +1,7 @@
use harmony::{
inventory::Inventory,
modules::postgresql::{
K8sPostgreSQLScore, PostgreSQLConnectionScore, PublicPostgreSQLScore,
capability::PostgreSQLConfig,
PostgreSQLConnectionScore, PublicPostgreSQLScore, capability::PostgreSQLConfig,
},
topology::K8sAnywhereTopology,
};


@@ -1,4 +1,4 @@
use std::{collections::HashMap, path::PathBuf, sync::Arc};
use std::{path::PathBuf, sync::Arc};
use harmony::{
inventory::Inventory,
@@ -6,9 +6,9 @@ use harmony::{
application::{
ApplicationScore, RustWebFramework, RustWebapp, features::rhob_monitoring::Monitoring,
},
monitoring::alert_channel::discord_alert_channel::DiscordWebhook,
monitoring::alert_channel::discord_alert_channel::DiscordReceiver,
},
topology::K8sAnywhereTopology,
topology::{K8sAnywhereTopology, monitoring::AlertRoute},
};
use harmony_types::{k8s_name::K8sName, net::Url};
@@ -22,18 +22,21 @@ async fn main() {
service_port: 3000,
});
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
let receiver_name = "test-discord".to_string();
let discord_receiver = DiscordReceiver {
name: receiver_name.clone(),
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
selectors: vec![],
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let app = ApplicationScore {
features: vec![
Box::new(Monitoring {
application: application.clone(),
alert_receiver: vec![Box::new(discord_receiver)],
}),
// Box::new(Monitoring {
// application: application.clone(),
// alert_receiver: vec![Box::new(discord_receiver)],
// }),
// TODO add backups, multisite ha, etc
],
application,


@@ -1,4 +1,4 @@
use std::{collections::HashMap, path::PathBuf, sync::Arc};
use std::{path::PathBuf, sync::Arc};
use harmony::{
inventory::Inventory,
@@ -8,13 +8,13 @@ use harmony::{
features::{Monitoring, PackagingDeployment},
},
monitoring::alert_channel::{
discord_alert_channel::DiscordWebhook, webhook_receiver::WebhookReceiver,
discord_alert_channel::DiscordReceiver, webhook_receiver::WebhookReceiver,
},
},
topology::K8sAnywhereTopology,
topology::{K8sAnywhereTopology, monitoring::AlertRoute},
};
use harmony_macros::hurl;
use harmony_types::k8s_name::K8sName;
use harmony_types::{k8s_name::K8sName, net::Url};
#[tokio::main]
async fn main() {
@@ -26,15 +26,23 @@ async fn main() {
service_port: 3000,
});
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
url: hurl!("https://discord.doesnt.exist.com"),
selectors: vec![],
let receiver_name = "test-discord".to_string();
let discord_receiver = DiscordReceiver {
name: receiver_name.clone(),
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let receiver_name = "sample-webhook-receiver".to_string();
let webhook_receiver = WebhookReceiver {
name: "sample-webhook-receiver".to_string(),
name: receiver_name.clone(),
url: hurl!("https://webhook-doesnt-exist.com"),
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let app = ApplicationScore {
@@ -42,10 +50,10 @@ async fn main() {
Box::new(PackagingDeployment {
application: application.clone(),
}),
Box::new(Monitoring {
application: application.clone(),
alert_receiver: vec![Box::new(discord_receiver), Box::new(webhook_receiver)],
}),
// Box::new(Monitoring {
// application: application.clone(),
// alert_receiver: vec![Box::new(discord_receiver), Box::new(webhook_receiver)],
// }),
// TODO add backups, multisite ha, etc
],
application,


@@ -1,11 +1,8 @@
use harmony::{
inventory::Inventory,
modules::{
application::{
ApplicationScore, RustWebFramework, RustWebapp,
features::{Monitoring, PackagingDeployment},
},
monitoring::alert_channel::discord_alert_channel::DiscordWebhook,
modules::application::{
ApplicationScore, RustWebFramework, RustWebapp,
features::{Monitoring, PackagingDeployment},
},
topology::K8sAnywhereTopology,
};
@@ -30,14 +27,14 @@ async fn main() {
Box::new(PackagingDeployment {
application: application.clone(),
}),
Box::new(Monitoring {
application: application.clone(),
alert_receiver: vec![Box::new(DiscordWebhook {
name: K8sName("test-discord".to_string()),
url: hurl!("https://discord.doesnt.exist.com"),
selectors: vec![],
})],
}),
// Box::new(Monitoring {
// application: application.clone(),
// alert_receiver: vec![Box::new(DiscordWebhook {
// name: K8sName("test-discord".to_string()),
// url: hurl!("https://discord.doesnt.exist.com"),
// selectors: vec![],
// })],
// }),
],
application,
};


@@ -0,0 +1,14 @@
[package]
name = "example-zitadel"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_macros = { path = "../../harmony_macros" }
harmony_types = { path = "../../harmony_types" }
tokio.workspace = true
url.workspace = true


@@ -0,0 +1,20 @@
use harmony::{
inventory::Inventory, modules::zitadel::ZitadelScore, topology::K8sAnywhereTopology,
};
#[tokio::main]
async fn main() {
let zitadel = ZitadelScore {
host: "sso.sto1.nationtech.io".to_string(),
zitadel_version: "v4.12.1".to_string(),
};
harmony_cli::run(
Inventory::autoload(),
K8sAnywhereTopology::from_env(),
vec![Box::new(zitadel)],
None,
)
.await
.unwrap();
}

Binary file not shown.

harmony-k8s/Cargo.toml (new file, 23 lines)

@@ -0,0 +1,23 @@
[package]
name = "harmony-k8s"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
[dependencies]
kube.workspace = true
k8s-openapi.workspace = true
tokio.workspace = true
tokio-retry.workspace = true
serde.workspace = true
serde_json.workspace = true
serde_yaml.workspace = true
log.workspace = true
similar.workspace = true
reqwest.workspace = true
url.workspace = true
inquire.workspace = true
[dev-dependencies]
pretty_assertions.workspace = true

harmony-k8s/src/apply.rs (new file, 593 lines)

@@ -0,0 +1,593 @@
use kube::{
Client, Error, Resource,
api::{
Api, ApiResource, DynamicObject, GroupVersionKind, Patch, PatchParams, PostParams,
ResourceExt,
},
core::ErrorResponse,
discovery::Scope,
error::DiscoveryError,
};
use log::{debug, error, trace, warn};
use serde::{Serialize, de::DeserializeOwned};
use serde_json::Value;
use similar::TextDiff;
use url::Url;
use crate::client::K8sClient;
use crate::helper;
use crate::types::WriteMode;
/// The field-manager token sent with every server-side apply request.
pub const FIELD_MANAGER: &str = "harmony-k8s";
// ── Private helpers ──────────────────────────────────────────────────────────
/// Serialise any `Serialize` payload to a [`DynamicObject`] via JSON.
fn to_dynamic<T: Serialize>(payload: &T) -> Result<DynamicObject, Error> {
serde_json::from_value(serde_json::to_value(payload).map_err(Error::SerdeError)?)
.map_err(Error::SerdeError)
}
/// Fetch the current resource and print a unified diff against `payload`
/// to stdout (same output format as the pre-refactor implementation).
///
/// A 404 is treated as "resource would be created" — not an error.
async fn show_dry_run<T: Serialize>(
api: &Api<DynamicObject>,
name: &str,
payload: &T,
) -> Result<(), Error> {
let new_yaml = serde_yaml::to_string(payload)
.unwrap_or_else(|_| "Failed to serialize new resource".to_string());
match api.get(name).await {
Ok(current) => {
println!("\nDry-run for resource: '{name}'");
let mut current_val = serde_yaml::to_value(&current).unwrap_or(serde_yaml::Value::Null);
if let Some(map) = current_val.as_mapping_mut() {
map.remove(&serde_yaml::Value::String("status".to_string()));
}
let current_yaml = serde_yaml::to_string(&current_val)
.unwrap_or_else(|_| "Failed to serialize current resource".to_string());
if current_yaml == new_yaml {
println!("No changes detected.");
} else {
println!("Changes detected:");
let diff = TextDiff::from_lines(&current_yaml, &new_yaml);
for change in diff.iter_all_changes() {
let sign = match change.tag() {
similar::ChangeTag::Delete => "-",
similar::ChangeTag::Insert => "+",
similar::ChangeTag::Equal => " ",
};
print!("{sign}{change}");
}
}
Ok(())
}
Err(Error::Api(ErrorResponse { code: 404, .. })) => {
println!("\nDry-run for new resource: '{name}'");
println!("Resource does not exist. Would be created:");
for line in new_yaml.lines() {
println!("+{line}");
}
Ok(())
}
Err(e) => {
error!("Failed to fetch resource '{name}' for dry-run: {e}");
Err(e)
}
}
}
/// Execute the real (non-dry-run) apply, respecting [`WriteMode`].
async fn do_apply<T: Serialize + std::fmt::Debug>(
api: &Api<DynamicObject>,
name: &str,
payload: &T,
patch_params: &PatchParams,
write_mode: &WriteMode,
) -> Result<DynamicObject, Error> {
match write_mode {
WriteMode::CreateOrUpdate => {
// TODO refactor this arm to perform self.update and if fail with 404 self.create
// This will avoid the repetition of the api.patch and api.create calls within this
// function body. This makes the code more maintainable
match api.patch(name, patch_params, &Patch::Apply(payload)).await {
Ok(obj) => Ok(obj),
Err(Error::Api(ErrorResponse { code: 404, .. })) => {
debug!("Resource '{name}' not found via SSA, falling back to POST");
let dyn_obj = to_dynamic(payload)?;
api.create(&PostParams::default(), &dyn_obj)
.await
.map_err(|e| {
error!("Failed to create '{name}': {e}");
e
})
}
Err(e) => {
error!("Failed to apply '{name}': {e}");
Err(e)
}
}
}
WriteMode::Create => {
let dyn_obj = to_dynamic(payload)?;
api.create(&PostParams::default(), &dyn_obj)
.await
.map_err(|e| {
error!("Failed to create '{name}': {e}");
e
})
}
WriteMode::Update => match api.patch(name, patch_params, &Patch::Apply(payload)).await {
Ok(obj) => Ok(obj),
Err(Error::Api(ErrorResponse { code: 404, .. })) => Err(Error::Api(ErrorResponse {
code: 404,
message: format!("Resource '{name}' not found and WriteMode is UpdateOnly"),
reason: "NotFound".to_string(),
status: "Failure".to_string(),
})),
Err(e) => {
error!("Failed to update '{name}': {e}");
Err(e)
}
},
}
}
// ── Public API ───────────────────────────────────────────────────────────────
impl K8sClient {
/// Server-side apply: create if absent, update if present.
/// Equivalent to `kubectl apply`.
pub async fn apply<K>(&self, resource: &K, namespace: Option<&str>) -> Result<K, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned + Serialize,
<K as Resource>::DynamicType: Default,
{
self.apply_with_strategy(resource, namespace, WriteMode::CreateOrUpdate)
.await
}
/// POST only — returns an error if the resource already exists.
pub async fn create<K>(&self, resource: &K, namespace: Option<&str>) -> Result<K, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned + Serialize,
<K as Resource>::DynamicType: Default,
{
self.apply_with_strategy(resource, namespace, WriteMode::Create)
.await
}
/// Server-side apply only — returns an error if the resource does not exist.
pub async fn update<K>(&self, resource: &K, namespace: Option<&str>) -> Result<K, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned + Serialize,
<K as Resource>::DynamicType: Default,
{
self.apply_with_strategy(resource, namespace, WriteMode::Update)
.await
}
pub async fn apply_with_strategy<K>(
&self,
resource: &K,
namespace: Option<&str>,
write_mode: WriteMode,
) -> Result<K, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned + Serialize,
<K as Resource>::DynamicType: Default,
{
debug!(
"apply_with_strategy: {:?} ns={:?}",
resource.meta().name,
namespace
);
trace!("{:#}", serde_json::to_value(resource).unwrap_or_default());
let dyntype = K::DynamicType::default();
let gvk = GroupVersionKind {
group: K::group(&dyntype).to_string(),
version: K::version(&dyntype).to_string(),
kind: K::kind(&dyntype).to_string(),
};
let discovery = self.discovery().await?;
let (ar, caps) = discovery.resolve_gvk(&gvk).ok_or_else(|| {
Error::Discovery(DiscoveryError::MissingResource(format!(
"Cannot resolve GVK: {gvk:?}"
)))
})?;
let effective_ns = if caps.scope == Scope::Cluster {
None
} else {
namespace.or_else(|| resource.meta().namespace.as_deref())
};
let api: Api<DynamicObject> =
get_dynamic_api(ar, caps, self.client.clone(), effective_ns, false);
let name = resource
.meta()
.name
.as_deref()
.expect("Kubernetes resource must have a name");
if self.dry_run {
show_dry_run(&api, name, resource).await?;
return Ok(resource.clone());
}
let patch_params = PatchParams::apply(FIELD_MANAGER);
do_apply(&api, name, resource, &patch_params, &write_mode)
.await
.and_then(helper::dyn_to_typed)
}
/// Applies resources in order, one at a time
pub async fn apply_many<K>(&self, resources: &[K], ns: Option<&str>) -> Result<Vec<K>, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned + Serialize,
<K as Resource>::DynamicType: Default,
{
let mut result = Vec::new();
for r in resources.iter() {
let res = self.apply(r, ns).await;
if res.is_err() {
// NOTE: this may log sensitive data; downgrade to debug if needed.
warn!(
"Failed to apply k8s resource: {}",
serde_json::to_string_pretty(r).map_err(Error::SerdeError)?
);
}
result.push(res?);
}
Ok(result)
}
/// Apply a [`DynamicObject`] resource using server-side apply.
pub async fn apply_dynamic(
&self,
resource: &DynamicObject,
namespace: Option<&str>,
force_conflicts: bool,
) -> Result<DynamicObject, Error> {
trace!("apply_dynamic {resource:#?} ns={namespace:?} force={force_conflicts}");
let discovery = self.discovery().await?;
let type_meta = resource.types.as_ref().ok_or_else(|| {
Error::BuildRequest(kube::core::request::Error::Validation(
"DynamicObject must have types (apiVersion and kind)".to_string(),
))
})?;
let gvk = GroupVersionKind::try_from(type_meta).map_err(|_| {
Error::BuildRequest(kube::core::request::Error::Validation(format!(
"Invalid GVK in DynamicObject: {type_meta:?}"
)))
})?;
let (ar, caps) = discovery.resolve_gvk(&gvk).ok_or_else(|| {
Error::Discovery(DiscoveryError::MissingResource(format!(
"Cannot resolve GVK: {gvk:?}"
)))
})?;
let effective_ns = if caps.scope == Scope::Cluster {
None
} else {
namespace.or_else(|| resource.metadata.namespace.as_deref())
};
let api = get_dynamic_api(ar, caps, self.client.clone(), effective_ns, false);
let name = resource.metadata.name.as_deref().ok_or_else(|| {
Error::BuildRequest(kube::core::request::Error::Validation(
"DynamicObject must have metadata.name".to_string(),
))
})?;
debug!(
"apply_dynamic kind={:?} name='{name}' ns={effective_ns:?}",
resource.types.as_ref().map(|t| &t.kind),
);
// NOTE would be nice to improve cohesion between the dynamic and typed apis and avoid copy
// pasting the dry_run and some more logic
if self.dry_run {
show_dry_run(&api, name, resource).await?;
return Ok(resource.clone());
}
let mut patch_params = PatchParams::apply(FIELD_MANAGER);
patch_params.force = force_conflicts;
do_apply(
&api,
name,
resource,
&patch_params,
&WriteMode::CreateOrUpdate,
)
.await
}
pub async fn apply_dynamic_many(
&self,
resources: &[DynamicObject],
namespace: Option<&str>,
force_conflicts: bool,
) -> Result<Vec<DynamicObject>, Error> {
let mut result = Vec::new();
for r in resources.iter() {
result.push(self.apply_dynamic(r, namespace, force_conflicts).await?);
}
Ok(result)
}
pub async fn apply_yaml_many(
&self,
#[allow(clippy::ptr_arg)] yaml: &Vec<serde_yaml::Value>,
ns: Option<&str>,
) -> Result<(), Error> {
for y in yaml.iter() {
self.apply_yaml(y, ns).await?;
}
Ok(())
}
pub async fn apply_yaml(
&self,
yaml: &serde_yaml::Value,
ns: Option<&str>,
) -> Result<(), Error> {
// NOTE wouldn't it be possible to parse this into a DynamicObject and simply call
// apply_dynamic instead of reimplementing api interactions?
let obj: DynamicObject =
serde_yaml::from_value(yaml.clone()).expect("YAML must deserialise to DynamicObject");
let name = obj.metadata.name.as_ref().expect("YAML must have a name");
let api_version = yaml["apiVersion"].as_str().expect("missing apiVersion");
let kind = yaml["kind"].as_str().expect("missing kind");
let mut it = api_version.splitn(2, '/');
let first = it.next().unwrap();
let (g, v) = match it.next() {
Some(second) => (first, second),
None => ("", first),
};
let api_resource = ApiResource::from_gvk(&GroupVersionKind::gvk(g, v, kind));
let namespace = ns.unwrap_or_else(|| {
obj.metadata
.namespace
.as_deref()
.expect("YAML must have a namespace when ns is not provided")
});
let api: Api<DynamicObject> =
Api::namespaced_with(self.client.clone(), namespace, &api_resource);
println!("Applying '{name}' in namespace '{namespace}'...");
let patch_params = PatchParams::apply(FIELD_MANAGER);
let result = api.patch(name, &patch_params, &Patch::Apply(&obj)).await?;
println!("Successfully applied '{}'.", result.name_any());
Ok(())
}
/// Equivalent to `kubectl apply -f <url>`.
pub async fn apply_url(&self, url: Url, ns: Option<&str>) -> Result<(), Error> {
let patch_params = PatchParams::apply(FIELD_MANAGER);
let discovery = self.discovery().await?;
let yaml = reqwest::get(url)
.await
.expect("Could not fetch URL")
.text()
.await
.expect("Could not read response body");
for doc in multidoc_deserialize(&yaml).expect("Failed to parse YAML from URL") {
let obj: DynamicObject =
serde_yaml::from_value(doc).expect("YAML document is not a valid object");
let namespace = obj.metadata.namespace.as_deref().or(ns);
let type_meta = obj.types.as_ref().expect("Object is missing TypeMeta");
let gvk =
GroupVersionKind::try_from(type_meta).expect("Object has invalid GroupVersionKind");
let name = obj.name_any();
if let Some((ar, caps)) = discovery.resolve_gvk(&gvk) {
let api = get_dynamic_api(ar, caps, self.client.clone(), namespace, false);
trace!(
"Applying {}:\n{}",
gvk.kind,
serde_yaml::to_string(&obj).unwrap_or_default()
);
let data: Value = serde_json::to_value(&obj).expect("serialisation failed");
let _r = api.patch(&name, &patch_params, &Patch::Apply(data)).await?;
debug!("Applied {} '{name}'", gvk.kind);
} else {
warn!("Skipping document with unknown GVK: {gvk:?}");
}
}
Ok(())
}
/// Build a dynamic API client from a [`DynamicObject`]'s type metadata.
pub(crate) fn get_api_for_dynamic_object(
&self,
object: &DynamicObject,
ns: Option<&str>,
) -> Result<Api<DynamicObject>, Error> {
let ar = object
.types
.as_ref()
.and_then(|t| {
let parts: Vec<&str> = t.api_version.split('/').collect();
match parts.as_slice() {
[version] => Some(ApiResource::from_gvk(&GroupVersionKind::gvk(
"", version, &t.kind,
))),
[group, version] => Some(ApiResource::from_gvk(&GroupVersionKind::gvk(
group, version, &t.kind,
))),
_ => None,
}
})
.ok_or_else(|| {
Error::BuildRequest(kube::core::request::Error::Validation(format!(
"Invalid apiVersion in DynamicObject: {object:#?}"
)))
})?;
Ok(match ns {
Some(ns) => Api::namespaced_with(self.client.clone(), ns, &ar),
None => Api::default_namespaced_with(self.client.clone(), &ar),
})
}
}
// ── Free functions ───────────────────────────────────────────────────────────
pub(crate) fn get_dynamic_api(
resource: kube::api::ApiResource,
capabilities: kube::discovery::ApiCapabilities,
client: Client,
ns: Option<&str>,
all: bool,
) -> Api<DynamicObject> {
if capabilities.scope == Scope::Cluster || all {
Api::all_with(client, &resource)
} else if let Some(namespace) = ns {
Api::namespaced_with(client, namespace, &resource)
} else {
Api::default_namespaced_with(client, &resource)
}
}
pub(crate) fn multidoc_deserialize(
data: &str,
) -> Result<Vec<serde_yaml::Value>, serde_yaml::Error> {
use serde::Deserialize;
let mut docs = vec![];
for de in serde_yaml::Deserializer::from_str(data) {
docs.push(serde_yaml::Value::deserialize(de)?);
}
Ok(docs)
}
// ── Tests ────────────────────────────────────────────────────────────────────
#[cfg(test)]
mod apply_tests {
use std::collections::BTreeMap;
use std::time::{SystemTime, UNIX_EPOCH};
use k8s_openapi::api::core::v1::ConfigMap;
use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;
use kube::api::{DeleteParams, TypeMeta};
use super::*;
#[tokio::test]
#[ignore = "requires kubernetes cluster"]
async fn apply_creates_new_configmap() {
let client = K8sClient::try_default().await.unwrap();
let ns = "default";
let name = format!(
"test-cm-{}",
SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_millis()
);
let cm = ConfigMap {
metadata: ObjectMeta {
name: Some(name.clone()),
namespace: Some(ns.to_string()),
..Default::default()
},
data: Some(BTreeMap::from([("key1".to_string(), "value1".to_string())])),
..Default::default()
};
assert!(client.apply(&cm, Some(ns)).await.is_ok());
let api: Api<ConfigMap> = Api::namespaced(client.client.clone(), ns);
let _ = api.delete(&name, &DeleteParams::default()).await;
}
#[tokio::test]
#[ignore = "requires kubernetes cluster"]
async fn apply_is_idempotent() {
let client = K8sClient::try_default().await.unwrap();
let ns = "default";
let name = format!(
"test-idem-{}",
SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_millis()
);
let cm = ConfigMap {
metadata: ObjectMeta {
name: Some(name.clone()),
namespace: Some(ns.to_string()),
..Default::default()
},
data: Some(BTreeMap::from([("key".to_string(), "value".to_string())])),
..Default::default()
};
assert!(
client.apply(&cm, Some(ns)).await.is_ok(),
"first apply failed"
);
assert!(
client.apply(&cm, Some(ns)).await.is_ok(),
"second apply failed (not idempotent)"
);
let api: Api<ConfigMap> = Api::namespaced(client.client.clone(), ns);
let _ = api.delete(&name, &DeleteParams::default()).await;
}
#[tokio::test]
#[ignore = "requires kubernetes cluster"]
async fn apply_dynamic_creates_new_resource() {
let client = K8sClient::try_default().await.unwrap();
let ns = "default";
let name = format!(
"test-dyn-{}",
SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_millis()
);
let obj = DynamicObject {
types: Some(TypeMeta {
api_version: "v1".to_string(),
kind: "ConfigMap".to_string(),
}),
metadata: ObjectMeta {
name: Some(name.clone()),
namespace: Some(ns.to_string()),
..Default::default()
},
data: serde_json::json!({}),
};
let result = client.apply_dynamic(&obj, Some(ns), false).await;
assert!(result.is_ok(), "apply_dynamic failed: {:?}", result.err());
let api: Api<ConfigMap> = Api::namespaced(client.client.clone(), ns);
let _ = api.delete(&name, &DeleteParams::default()).await;
}
}
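
The `apply_yaml` method above splits `apiVersion` into its group and version parts with `splitn`: a bare `v1` denotes the core group (empty string), while `apps/v1` carries an explicit group. A standalone sketch of that rule, equivalent via `split_once` (the helper name `split_api_version` is ours, not part of the crate):

```rust
/// Split a Kubernetes `apiVersion` string into (group, version).
/// Core resources use a bare version ("v1" -> group ""), while grouped
/// resources use "group/version" ("apps/v1" -> group "apps").
fn split_api_version(api_version: &str) -> (&str, &str) {
    match api_version.split_once('/') {
        Some((group, version)) => (group, version),
        None => ("", api_version),
    }
}

fn main() {
    assert_eq!(split_api_version("v1"), ("", "v1"));
    assert_eq!(split_api_version("apps/v1"), ("apps", "v1"));
    assert_eq!(
        split_api_version("project.openshift.io/v1"),
        ("project.openshift.io", "v1")
    );
}
```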


@@ -25,9 +25,9 @@
//!
//! ## Example
//!
//! ```rust,no_run
//! use harmony::topology::k8s::{K8sClient, helper};
//! use harmony::topology::KubernetesDistribution;
//! ```
//! use harmony_k8s::{K8sClient, helper};
//! use harmony_k8s::KubernetesDistribution;
//!
//! async fn write_network_config(client: &K8sClient, node: &str) {
//! // Create a bundle with platform-specific RBAC
@@ -56,7 +56,7 @@ use kube::{Error, Resource, ResourceExt, api::DynamicObject};
use serde::Serialize;
use serde_json;
use crate::domain::topology::k8s::K8sClient;
use crate::K8sClient;
/// A ResourceBundle represents a logical unit of work consisting of multiple
/// Kubernetes resources that should be applied or deleted together.

harmony-k8s/src/client.rs (new file, 107 lines)

@@ -0,0 +1,107 @@
use std::sync::Arc;
use kube::config::{KubeConfigOptions, Kubeconfig};
use kube::{Client, Config, Discovery, Error};
use log::error;
use serde::Serialize;
use tokio::sync::OnceCell;
use crate::types::KubernetesDistribution;
// TODO not cool, should use a proper configuration mechanism
// cli arg, env var, config file
fn read_dry_run_from_env() -> bool {
std::env::var("DRY_RUN")
.map(|v| v == "true" || v == "1")
.unwrap_or(false)
}
#[derive(Clone)]
pub struct K8sClient {
pub(crate) client: Client,
/// When `true` no mutation is sent to the API server; diffs are printed
/// to stdout instead. Initialised from the `DRY_RUN` environment variable.
pub(crate) dry_run: bool,
pub(crate) k8s_distribution: Arc<OnceCell<KubernetesDistribution>>,
pub(crate) discovery: Arc<OnceCell<Discovery>>,
}
impl Serialize for K8sClient {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!("K8sClient serialization is not meaningful; remove this impl if unused")
}
}
impl std::fmt::Debug for K8sClient {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.write_fmt(format_args!(
"K8sClient {{ namespace: {}, dry_run: {} }}",
self.client.default_namespace(),
self.dry_run,
))
}
}
impl K8sClient {
pub fn inner_client(&self) -> &Client {
&self.client
}
pub fn inner_client_clone(&self) -> Client {
self.client.clone()
}
/// Create a client, reading `DRY_RUN` from the environment.
pub fn new(client: Client) -> Self {
Self {
dry_run: read_dry_run_from_env(),
client,
k8s_distribution: Arc::new(OnceCell::new()),
discovery: Arc::new(OnceCell::new()),
}
}
/// Create a client that always operates in dry-run mode, regardless of
/// the environment variable.
pub fn new_dry_run(client: Client) -> Self {
Self {
dry_run: true,
..Self::new(client)
}
}
/// Returns `true` if this client is operating in dry-run mode.
pub fn is_dry_run(&self) -> bool {
self.dry_run
}
pub async fn try_default() -> Result<Self, Error> {
Ok(Self::new(Client::try_default().await?))
}
pub async fn from_kubeconfig(path: &str) -> Option<Self> {
Self::from_kubeconfig_with_opts(path, &KubeConfigOptions::default()).await
}
pub async fn from_kubeconfig_with_context(path: &str, context: Option<String>) -> Option<Self> {
let opts = KubeConfigOptions {
context,
..Default::default()
};
Self::from_kubeconfig_with_opts(path, &opts).await
}
pub async fn from_kubeconfig_with_opts(path: &str, opts: &KubeConfigOptions) -> Option<Self> {
let k = match Kubeconfig::read_from(path) {
Ok(k) => k,
Err(e) => {
error!("Failed to load kubeconfig from {path}: {e}");
return None;
}
};
match Config::from_custom_kubeconfig(k, opts).await {
Ok(config) => Client::try_from(config)
.inspect_err(|e| error!("Failed to build client from kubeconfig {path}: {e}"))
.ok()
.map(Self::new),
Err(e) => {
error!("Failed to interpret kubeconfig {path}: {e}");
None
}
}
}
}
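
`client.rs` derives dry-run state from the `DRY_RUN` environment variable, accepting only the literal strings `true` or `1`. The acceptance rule in isolation (the helper name `parse_dry_run` is ours, mirroring `read_dry_run_from_env` without touching the environment):

```rust
/// Interpret a DRY_RUN-style value: only "true" or "1" enable dry-run;
/// anything else, including unset and uppercase variants, leaves it off.
fn parse_dry_run(raw: Option<&str>) -> bool {
    matches!(raw, Some("true") | Some("1"))
}

fn main() {
    assert!(parse_dry_run(Some("true")));
    assert!(parse_dry_run(Some("1")));
    assert!(!parse_dry_run(Some("TRUE"))); // case-sensitive, as in the source
    assert!(!parse_dry_run(None));
}
```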


@@ -0,0 +1,83 @@
use std::time::Duration;
use kube::{Discovery, Error};
use log::{debug, error, info, trace, warn};
use tokio::sync::Mutex;
use tokio_retry::{Retry, strategy::ExponentialBackoff};
use crate::client::K8sClient;
use crate::types::KubernetesDistribution;
impl K8sClient {
pub async fn get_apiserver_version(
&self,
) -> Result<k8s_openapi::apimachinery::pkg::version::Info, Error> {
self.client.clone().apiserver_version().await
}
/// Runs (and caches) Kubernetes API discovery with exponential-backoff retries.
pub async fn discovery(&self) -> Result<&Discovery, Error> {
let retry_strategy = ExponentialBackoff::from_millis(1000)
.max_delay(Duration::from_secs(32))
.take(6);
let attempt = Mutex::new(0u32);
Retry::spawn(retry_strategy, || async {
let mut n = attempt.lock().await;
*n += 1;
match self
.discovery
.get_or_try_init(async || {
debug!("Running Kubernetes API discovery (attempt {})", *n);
let d = Discovery::new(self.client.clone()).run().await?;
debug!("Kubernetes API discovery completed");
Ok(d)
})
.await
{
Ok(d) => Ok(d),
Err(e) => {
warn!("Kubernetes API discovery failed (attempt {}): {}", *n, e);
Err(e)
}
}
})
.await
.map_err(|e| {
error!("Kubernetes API discovery failed after all retries: {}", e);
e
})
}
/// Detect which Kubernetes distribution is running. Result is cached for
/// the lifetime of the client.
pub async fn get_k8s_distribution(&self) -> Result<KubernetesDistribution, Error> {
self.k8s_distribution
.get_or_try_init(async || {
debug!("Detecting Kubernetes distribution");
let api_groups = self.client.list_api_groups().await?;
trace!("list_api_groups: {:?}", api_groups);
let version = self.get_apiserver_version().await?;
if api_groups
.groups
.iter()
.any(|g| g.name == "project.openshift.io")
{
info!("Detected distribution: OpenshiftFamily");
return Ok(KubernetesDistribution::OpenshiftFamily);
}
if version.git_version.contains("k3s") {
info!("Detected distribution: K3sFamily");
return Ok(KubernetesDistribution::K3sFamily);
}
info!("Distribution not identified, using Default");
Ok(KubernetesDistribution::Default)
})
.await
.cloned()
}
}
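The retry wrapper around discovery uses a backoff that starts at one second, is capped at 32 seconds, and gives up after six attempts. Note that `tokio_retry`'s `ExponentialBackoff` multiplies the current delay by the base each step rather than doubling, so the real schedule ramps faster before the cap applies; the sketch below models a generic capped-doubling schedule to show the shape, not the crate's exact internals:

```rust
use std::time::Duration;

// Capped-doubling backoff schedule: base * 2^i per attempt, clamped to `cap`.
fn backoff_schedule(base_ms: u64, cap: Duration, attempts: usize) -> Vec<Duration> {
    (0..attempts)
        .map(|i| {
            let ms = base_ms.saturating_mul(2u64.saturating_pow(i as u32));
            Duration::from_millis(ms).min(cap)
        })
        .collect()
}

fn main() {
    // Worst case: 1s + 2s + 4s + 8s + 16s + 32s of waiting before giving up.
    println!("{:?}", backoff_schedule(1000, Duration::from_secs(32), 6));
}
```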


@@ -1,7 +1,7 @@
use std::collections::BTreeMap;
use std::time::Duration;
use crate::KubernetesDistribution;
use super::bundle::ResourceBundle;
use super::config::PRIVILEGED_POD_IMAGE;
@@ -10,8 +10,10 @@ use k8s_openapi::api::core::v1::{
};
use k8s_openapi::api::rbac::v1::{ClusterRoleBinding, RoleRef, Subject};
use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;
use kube::api::DynamicObject;
use kube::error::DiscoveryError;
use log::{debug, error, info, warn};
use serde::de::DeserializeOwned;
#[derive(Debug)]
pub struct PrivilegedPodConfig {
@@ -131,9 +133,9 @@ pub fn host_root_volume() -> (Volume, VolumeMount) {
///
/// # Example
///
/// ```rust,no_run
/// use harmony_k8s::helper::{build_privileged_bundle, PrivilegedPodConfig};
/// use harmony_k8s::KubernetesDistribution;
/// let bundle = build_privileged_bundle(
/// PrivilegedPodConfig {
/// name: "network-setup".to_string(),
@@ -279,6 +281,16 @@ pub fn prompt_drain_timeout_action(
}
}
/// JSON round-trip: DynamicObject → K
///
/// Safe because the DynamicObject was produced by the apiserver from a
/// payload that was originally serialized from K, so the schema is identical.
pub(crate) fn dyn_to_typed<K: DeserializeOwned>(obj: DynamicObject) -> Result<K, kube::Error> {
serde_json::to_value(obj)
.and_then(serde_json::from_value)
.map_err(kube::Error::SerdeError)
}
#[cfg(test)]
mod tests {
use super::*;

harmony-k8s/src/lib.rs Normal file

@@ -0,0 +1,13 @@
pub mod apply;
pub mod bundle;
pub mod client;
pub mod config;
pub mod discovery;
pub mod helper;
pub mod node;
pub mod pod;
pub mod resources;
pub mod types;
pub use client::K8sClient;
pub use types::{DrainOptions, KubernetesDistribution, NodeFile, ScopeResolver, WriteMode};

harmony-k8s/src/main.rs Normal file

@@ -0,0 +1,3 @@
fn main() {
println!("Hello, world!");
}

harmony-k8s/src/node.rs Normal file

@@ -0,0 +1,722 @@
use std::collections::BTreeMap;
use std::time::{Duration, SystemTime, UNIX_EPOCH};
use k8s_openapi::api::core::v1::{
ConfigMap, ConfigMapVolumeSource, Node, Pod, Volume, VolumeMount,
};
use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;
use kube::{
Error,
api::{Api, DeleteParams, EvictParams, ListParams, PostParams},
core::ErrorResponse,
error::DiscoveryError,
};
use log::{debug, error, info, warn};
use tokio::time::sleep;
use crate::client::K8sClient;
use crate::helper::{self, PrivilegedPodConfig};
use crate::types::{DrainOptions, NodeFile};
impl K8sClient {
pub async fn cordon_node(&self, node_name: &str) -> Result<(), Error> {
Api::<Node>::all(self.client.clone())
.cordon(node_name)
.await?;
Ok(())
}
pub async fn uncordon_node(&self, node_name: &str) -> Result<(), Error> {
Api::<Node>::all(self.client.clone())
.uncordon(node_name)
.await?;
Ok(())
}
pub async fn wait_for_node_ready(&self, node_name: &str) -> Result<(), Error> {
self.wait_for_node_ready_with_timeout(node_name, Duration::from_secs(600))
.await
}
async fn wait_for_node_ready_with_timeout(
&self,
node_name: &str,
timeout: Duration,
) -> Result<(), Error> {
let api: Api<Node> = Api::all(self.client.clone());
let start = tokio::time::Instant::now();
let poll = Duration::from_secs(5);
loop {
if start.elapsed() > timeout {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Node '{node_name}' did not become Ready within {timeout:?}"
))));
}
match api.get(node_name).await {
Ok(node) => {
if node
.status
.as_ref()
.and_then(|s| s.conditions.as_ref())
.map(|conds| {
conds
.iter()
.any(|c| c.type_ == "Ready" && c.status == "True")
})
.unwrap_or(false)
{
debug!("Node '{node_name}' is Ready");
return Ok(());
}
}
Err(e) => debug!("Error polling node '{node_name}': {e}"),
}
sleep(poll).await;
}
}
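`wait_for_node_ready_with_timeout` and its NotReady counterpart share one pattern: poll a condition at a fixed interval and fail once total elapsed time exceeds the timeout. A synchronous sketch of that loop, with a counter standing in for the clock and the Kubernetes API (the real code checks the timeout before each poll and sleeps asynchronously, so the ordering differs slightly):

```rust
use std::time::Duration;

// Poll `check` every `poll` until it returns true, or fail after `timeout`.
// Returns the number of attempts it took on success.
fn poll_until<F: FnMut() -> bool>(
    mut check: F,
    poll: Duration,
    timeout: Duration,
) -> Result<u32, String> {
    let mut elapsed = Duration::ZERO;
    let mut attempts = 0u32;
    loop {
        attempts += 1;
        if check() {
            return Ok(attempts);
        }
        elapsed += poll; // stands in for `sleep(poll).await`
        if elapsed > timeout {
            return Err(format!("condition not met within {timeout:?}"));
        }
    }
}

fn main() {
    let mut calls = 0;
    let result = poll_until(
        || { calls += 1; calls >= 3 },
        Duration::from_secs(5),
        Duration::from_secs(600),
    );
    println!("{result:?}");
}
```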
async fn wait_for_node_not_ready(
&self,
node_name: &str,
timeout: Duration,
) -> Result<(), Error> {
let api: Api<Node> = Api::all(self.client.clone());
let start = tokio::time::Instant::now();
let poll = Duration::from_secs(5);
loop {
if start.elapsed() > timeout {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Node '{node_name}' did not become NotReady within {timeout:?}"
))));
}
match api.get(node_name).await {
Ok(node) => {
let is_ready = node
.status
.as_ref()
.and_then(|s| s.conditions.as_ref())
.map(|conds| {
conds
.iter()
.any(|c| c.type_ == "Ready" && c.status == "True")
})
.unwrap_or(false);
if !is_ready {
debug!("Node '{node_name}' is NotReady");
return Ok(());
}
}
Err(e) => debug!("Error polling node '{node_name}': {e}"),
}
sleep(poll).await;
}
}
async fn list_pods_on_node(&self, node_name: &str) -> Result<Vec<Pod>, Error> {
let api: Api<Pod> = Api::all(self.client.clone());
Ok(api
.list(&ListParams::default().fields(&format!("spec.nodeName={node_name}")))
.await?
.items)
}
fn is_mirror_pod(pod: &Pod) -> bool {
pod.metadata
.annotations
.as_ref()
.map(|a| a.contains_key("kubernetes.io/config.mirror"))
.unwrap_or(false)
}
fn is_daemonset_pod(pod: &Pod) -> bool {
pod.metadata
.owner_references
.as_ref()
.map(|refs| refs.iter().any(|r| r.kind == "DaemonSet"))
.unwrap_or(false)
}
fn has_emptydir_volume(pod: &Pod) -> bool {
pod.spec
.as_ref()
.and_then(|s| s.volumes.as_ref())
.map(|vols| vols.iter().any(|v| v.empty_dir.is_some()))
.unwrap_or(false)
}
fn is_completed_pod(pod: &Pod) -> bool {
pod.status
.as_ref()
.and_then(|s| s.phase.as_deref())
.map(|phase| phase == "Succeeded" || phase == "Failed")
.unwrap_or(false)
}
fn classify_pods_for_drain(
pods: &[Pod],
options: &DrainOptions,
) -> Result<(Vec<Pod>, Vec<String>), String> {
let mut evictable = Vec::new();
let mut skipped = Vec::new();
let mut blocking = Vec::new();
for pod in pods {
let name = pod.metadata.name.as_deref().unwrap_or("<unknown>");
let ns = pod.metadata.namespace.as_deref().unwrap_or("<unknown>");
let qualified = format!("{ns}/{name}");
if Self::is_mirror_pod(pod) {
skipped.push(format!("{qualified} (mirror pod)"));
continue;
}
if Self::is_completed_pod(pod) {
skipped.push(format!("{qualified} (completed)"));
continue;
}
if Self::is_daemonset_pod(pod) {
if options.ignore_daemonsets {
skipped.push(format!("{qualified} (DaemonSet-managed)"));
} else {
blocking.push(format!(
"{qualified} is managed by a DaemonSet (set ignore_daemonsets to skip)"
));
}
continue;
}
if Self::has_emptydir_volume(pod) && !options.delete_emptydir_data {
blocking.push(format!(
"{qualified} uses emptyDir volumes (set delete_emptydir_data to allow eviction)"
));
continue;
}
evictable.push(pod.clone());
}
if !blocking.is_empty() {
return Err(format!(
"Cannot drain node — the following pods block eviction:\n - {}",
blocking.join("\n - ")
));
}
Ok((evictable, skipped))
}
async fn evict_pod(&self, pod: &Pod) -> Result<(), Error> {
let name = pod.metadata.name.as_deref().unwrap_or_default();
let ns = pod.metadata.namespace.as_deref().unwrap_or_default();
debug!("Evicting pod {ns}/{name}");
Api::<Pod>::namespaced(self.client.clone(), ns)
.evict(name, &EvictParams::default())
.await
.map(|_| ())
}
/// Drains a node: cordon → classify → evict & wait.
pub async fn drain_node(&self, node_name: &str, options: &DrainOptions) -> Result<(), Error> {
debug!("Cordoning '{node_name}'");
self.cordon_node(node_name).await?;
let pods = self.list_pods_on_node(node_name).await?;
debug!("Found {} pod(s) on '{node_name}'", pods.len());
let (evictable, skipped) =
Self::classify_pods_for_drain(&pods, options).map_err(|msg| {
error!("{msg}");
Error::Discovery(DiscoveryError::MissingResource(msg))
})?;
for s in &skipped {
info!("Skipping pod: {s}");
}
if evictable.is_empty() {
info!("No pods to evict on '{node_name}'");
return Ok(());
}
info!("Evicting {} pod(s) from '{node_name}'", evictable.len());
let mut start = tokio::time::Instant::now();
let poll = Duration::from_secs(5);
let mut pending = evictable;
loop {
for pod in &pending {
match self.evict_pod(pod).await {
Ok(()) => {}
Err(Error::Api(ErrorResponse { code: 404, .. })) => {}
Err(Error::Api(ErrorResponse { code: 429, .. })) => {
warn!(
"PDB blocked eviction of {}/{}; will retry",
pod.metadata.namespace.as_deref().unwrap_or(""),
pod.metadata.name.as_deref().unwrap_or("")
);
}
Err(e) => {
error!(
"Failed to evict {}/{}: {e}",
pod.metadata.namespace.as_deref().unwrap_or(""),
pod.metadata.name.as_deref().unwrap_or("")
);
return Err(e);
}
}
}
sleep(poll).await;
let mut still_present = Vec::new();
for pod in pending {
let ns = pod.metadata.namespace.as_deref().unwrap_or_default();
let name = pod.metadata.name.as_deref().unwrap_or_default();
match self.get_pod(name, Some(ns)).await? {
Some(_) => still_present.push(pod),
None => debug!("Pod {ns}/{name} evicted"),
}
}
pending = still_present;
if pending.is_empty() {
break;
}
if start.elapsed() > options.timeout {
match helper::prompt_drain_timeout_action(
node_name,
pending.len(),
options.timeout,
)? {
helper::DrainTimeoutAction::Accept => break,
helper::DrainTimeoutAction::Retry => {
start = tokio::time::Instant::now();
continue;
}
helper::DrainTimeoutAction::Abort => {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Drain aborted. {} pod(s) remaining on '{node_name}'",
pending.len()
))));
}
}
}
debug!("Waiting for {} pod(s) on '{node_name}'", pending.len());
}
debug!("'{node_name}' drained successfully");
Ok(())
}
/// Safely reboots a node: drain → reboot → wait for Ready → uncordon.
pub async fn reboot_node(
&self,
node_name: &str,
drain_options: &DrainOptions,
timeout: Duration,
) -> Result<(), Error> {
info!("Starting reboot for '{node_name}'");
let node_api: Api<Node> = Api::all(self.client.clone());
let boot_id_before = node_api
.get(node_name)
.await?
.status
.as_ref()
.and_then(|s| s.node_info.as_ref())
.map(|ni| ni.boot_id.clone())
.ok_or_else(|| {
Error::Discovery(DiscoveryError::MissingResource(format!(
"Node '{node_name}' has no boot_id in status"
)))
})?;
info!("Draining '{node_name}'");
self.drain_node(node_name, drain_options).await?;
let start = tokio::time::Instant::now();
info!("Scheduling reboot for '{node_name}'");
let reboot_cmd =
"echo rebooting ; nohup bash -c 'sleep 5 && nsenter -t 1 -m -- systemctl reboot'";
match self
.run_privileged_command_on_node(node_name, reboot_cmd)
.await
{
Ok(_) => debug!("Reboot command dispatched"),
Err(e) => debug!("Reboot command error (expected if node began shutdown): {e}"),
}
info!("Waiting for '{node_name}' to begin shutdown");
self.wait_for_node_not_ready(node_name, timeout.saturating_sub(start.elapsed()))
.await?;
if start.elapsed() > timeout {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Timeout during reboot of '{node_name}' (shutdown phase)"
))));
}
info!("Waiting for '{node_name}' to come back online");
self.wait_for_node_ready_with_timeout(node_name, timeout.saturating_sub(start.elapsed()))
.await?;
if start.elapsed() > timeout {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Timeout during reboot of '{node_name}' (ready phase)"
))));
}
let boot_id_after = node_api
.get(node_name)
.await?
.status
.as_ref()
.and_then(|s| s.node_info.as_ref())
.map(|ni| ni.boot_id.clone())
.ok_or_else(|| {
Error::Discovery(DiscoveryError::MissingResource(format!(
"Node '{node_name}' has no boot_id after reboot"
)))
})?;
if boot_id_before == boot_id_after {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Node '{node_name}' did not actually reboot (boot_id unchanged: {boot_id_before})"
))));
}
info!("'{node_name}' rebooted ({boot_id_before} → {boot_id_after})");
self.uncordon_node(node_name).await?;
info!("'{node_name}' reboot complete ({:?})", start.elapsed());
Ok(())
}
/// Write a set of files to a node's filesystem via a privileged ephemeral pod.
pub async fn write_files_to_node(
&self,
node_name: &str,
files: &[NodeFile],
) -> Result<String, Error> {
let ns = self.client.default_namespace();
let suffix = SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_millis();
let name = format!("harmony-k8s-writer-{suffix}");
debug!("Writing {} file(s) to '{node_name}'", files.len());
let mut data = BTreeMap::new();
let mut script = String::from("set -e\n");
for (i, file) in files.iter().enumerate() {
let key = format!("f{i}");
data.insert(key.clone(), file.content.clone());
script.push_str(&format!("mkdir -p \"$(dirname \"/host{}\")\"\n", file.path));
script.push_str(&format!("cp \"/payload/{key}\" \"/host{}\"\n", file.path));
script.push_str(&format!("chmod {:o} \"/host{}\"\n", file.mode, file.path));
}
let cm = ConfigMap {
metadata: ObjectMeta {
name: Some(name.clone()),
namespace: Some(ns.to_string()),
..Default::default()
},
data: Some(data),
..Default::default()
};
let cm_api: Api<ConfigMap> = Api::namespaced(self.client.clone(), ns);
cm_api.create(&PostParams::default(), &cm).await?;
debug!("Created ConfigMap '{name}'");
let (host_vol, host_mount) = helper::host_root_volume();
let payload_vol = Volume {
name: "payload".to_string(),
config_map: Some(ConfigMapVolumeSource {
name: name.clone(),
..Default::default()
}),
..Default::default()
};
let payload_mount = VolumeMount {
name: "payload".to_string(),
mount_path: "/payload".to_string(),
..Default::default()
};
let bundle = helper::build_privileged_bundle(
PrivilegedPodConfig {
name: name.clone(),
namespace: ns.to_string(),
node_name: node_name.to_string(),
container_name: "writer".to_string(),
command: vec!["/bin/bash".to_string(), "-c".to_string(), script],
volumes: vec![payload_vol, host_vol],
volume_mounts: vec![payload_mount, host_mount],
host_pid: false,
host_network: false,
},
&self.get_k8s_distribution().await?,
);
bundle.apply(self).await?;
debug!("Created privileged pod bundle '{name}'");
let result = self.wait_for_pod_completion(&name, ns).await;
debug!("Cleaning up '{name}'");
let _ = bundle.delete(self).await;
let _ = cm_api.delete(&name, &DeleteParams::default()).await;
result
}
/// Run a privileged command on a node via an ephemeral pod.
pub async fn run_privileged_command_on_node(
&self,
node_name: &str,
command: &str,
) -> Result<String, Error> {
let namespace = self.client.default_namespace();
let suffix = SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_millis();
let name = format!("harmony-k8s-cmd-{suffix}");
debug!("Running privileged command on '{node_name}': {command}");
let (host_vol, host_mount) = helper::host_root_volume();
let bundle = helper::build_privileged_bundle(
PrivilegedPodConfig {
name: name.clone(),
namespace: namespace.to_string(),
node_name: node_name.to_string(),
container_name: "runner".to_string(),
command: vec![
"/bin/bash".to_string(),
"-c".to_string(),
command.to_string(),
],
volumes: vec![host_vol],
volume_mounts: vec![host_mount],
host_pid: true,
host_network: true,
},
&self.get_k8s_distribution().await?,
);
bundle.apply(self).await?;
debug!("Privileged pod '{name}' created");
let result = self.wait_for_pod_completion(&name, namespace).await;
debug!("Cleaning up '{name}'");
let _ = bundle.delete(self).await;
result
}
}
// ── Tests ────────────────────────────────────────────────────────────────────
#[cfg(test)]
mod tests {
use k8s_openapi::api::core::v1::{EmptyDirVolumeSource, PodSpec, PodStatus, Volume};
use k8s_openapi::apimachinery::pkg::apis::meta::v1::{ObjectMeta, OwnerReference};
use super::*;
fn base_pod(name: &str, ns: &str) -> Pod {
Pod {
metadata: ObjectMeta {
name: Some(name.to_string()),
namespace: Some(ns.to_string()),
..Default::default()
},
spec: Some(PodSpec::default()),
status: Some(PodStatus {
phase: Some("Running".to_string()),
..Default::default()
}),
}
}
fn mirror_pod(name: &str, ns: &str) -> Pod {
let mut pod = base_pod(name, ns);
pod.metadata.annotations = Some(std::collections::BTreeMap::from([(
"kubernetes.io/config.mirror".to_string(),
"abc123".to_string(),
)]));
pod
}
fn daemonset_pod(name: &str, ns: &str) -> Pod {
let mut pod = base_pod(name, ns);
pod.metadata.owner_references = Some(vec![OwnerReference {
api_version: "apps/v1".to_string(),
kind: "DaemonSet".to_string(),
name: "some-ds".to_string(),
uid: "uid-ds".to_string(),
..Default::default()
}]);
pod
}
fn emptydir_pod(name: &str, ns: &str) -> Pod {
let mut pod = base_pod(name, ns);
pod.spec = Some(PodSpec {
volumes: Some(vec![Volume {
name: "scratch".to_string(),
empty_dir: Some(EmptyDirVolumeSource::default()),
..Default::default()
}]),
..Default::default()
});
pod
}
fn completed_pod(name: &str, ns: &str, phase: &str) -> Pod {
let mut pod = base_pod(name, ns);
pod.status = Some(PodStatus {
phase: Some(phase.to_string()),
..Default::default()
});
pod
}
fn default_opts() -> DrainOptions {
DrainOptions::default()
}
// All test bodies are identical to the original — only the module path changed.
#[test]
fn empty_pod_list_returns_empty_vecs() {
let (e, s) = K8sClient::classify_pods_for_drain(&[], &default_opts()).unwrap();
assert!(e.is_empty());
assert!(s.is_empty());
}
#[test]
fn normal_pod_is_evictable() {
let pods = vec![base_pod("web", "default")];
let (e, s) = K8sClient::classify_pods_for_drain(&pods, &default_opts()).unwrap();
assert_eq!(e.len(), 1);
assert!(s.is_empty());
}
#[test]
fn mirror_pod_is_skipped() {
let pods = vec![mirror_pod("kube-apiserver", "kube-system")];
let (e, s) = K8sClient::classify_pods_for_drain(&pods, &default_opts()).unwrap();
assert!(e.is_empty());
assert!(s[0].contains("mirror pod"));
}
#[test]
fn completed_pods_are_skipped() {
for phase in ["Succeeded", "Failed"] {
let pods = vec![completed_pod("job", "batch", phase)];
let (e, s) = K8sClient::classify_pods_for_drain(&pods, &default_opts()).unwrap();
assert!(e.is_empty());
assert!(s[0].contains("completed"));
}
}
#[test]
fn daemonset_skipped_when_ignored() {
let pods = vec![daemonset_pod("fluentd", "logging")];
let opts = DrainOptions {
ignore_daemonsets: true,
..default_opts()
};
let (e, s) = K8sClient::classify_pods_for_drain(&pods, &opts).unwrap();
assert!(e.is_empty());
assert!(s[0].contains("DaemonSet-managed"));
}
#[test]
fn daemonset_blocks_when_not_ignored() {
let pods = vec![daemonset_pod("fluentd", "logging")];
let opts = DrainOptions {
ignore_daemonsets: false,
..default_opts()
};
let err = K8sClient::classify_pods_for_drain(&pods, &opts).unwrap_err();
assert!(err.contains("DaemonSet") && err.contains("logging/fluentd"));
}
#[test]
fn emptydir_blocks_without_flag() {
let pods = vec![emptydir_pod("cache", "default")];
let opts = DrainOptions {
delete_emptydir_data: false,
..default_opts()
};
let err = K8sClient::classify_pods_for_drain(&pods, &opts).unwrap_err();
assert!(err.contains("emptyDir") && err.contains("default/cache"));
}
#[test]
fn emptydir_evictable_with_flag() {
let pods = vec![emptydir_pod("cache", "default")];
let opts = DrainOptions {
delete_emptydir_data: true,
..default_opts()
};
let (e, s) = K8sClient::classify_pods_for_drain(&pods, &opts).unwrap();
assert_eq!(e.len(), 1);
assert!(s.is_empty());
}
#[test]
fn multiple_blocking_all_reported() {
let pods = vec![daemonset_pod("ds", "ns1"), emptydir_pod("ed", "ns2")];
let opts = DrainOptions {
ignore_daemonsets: false,
delete_emptydir_data: false,
..default_opts()
};
let err = K8sClient::classify_pods_for_drain(&pods, &opts).unwrap_err();
assert!(err.contains("ns1/ds") && err.contains("ns2/ed"));
}
#[test]
fn mixed_pods_classified_correctly() {
let pods = vec![
base_pod("web", "default"),
mirror_pod("kube-apiserver", "kube-system"),
daemonset_pod("fluentd", "logging"),
completed_pod("job", "batch", "Succeeded"),
base_pod("api", "default"),
];
let (e, s) = K8sClient::classify_pods_for_drain(&pods, &default_opts()).unwrap();
let names: Vec<&str> = e
.iter()
.map(|p| p.metadata.name.as_deref().unwrap())
.collect();
assert_eq!(names, vec!["web", "api"]);
assert_eq!(s.len(), 3);
}
#[test]
fn mirror_checked_before_completed() {
let mut pod = mirror_pod("static-etcd", "kube-system");
pod.status = Some(PodStatus {
phase: Some("Succeeded".to_string()),
..Default::default()
});
let (_, s) = K8sClient::classify_pods_for_drain(&[pod], &default_opts()).unwrap();
assert!(s[0].contains("mirror pod"), "got: {}", s[0]);
}
#[test]
fn completed_checked_before_daemonset() {
let mut pod = daemonset_pod("collector", "monitoring");
pod.status = Some(PodStatus {
phase: Some("Failed".to_string()),
..Default::default()
});
let (_, s) = K8sClient::classify_pods_for_drain(&[pod], &default_opts()).unwrap();
assert!(s[0].contains("completed"), "got: {}", s[0]);
}
}

harmony-k8s/src/pod.rs Normal file

@@ -0,0 +1,193 @@
use std::time::Duration;
use k8s_openapi::api::core::v1::Pod;
use kube::{
Error,
api::{Api, AttachParams, ListParams},
error::DiscoveryError,
runtime::reflector::Lookup,
};
use log::debug;
use tokio::io::AsyncReadExt;
use tokio::time::sleep;
use crate::client::K8sClient;
impl K8sClient {
pub async fn get_pod(&self, name: &str, namespace: Option<&str>) -> Result<Option<Pod>, Error> {
let api: Api<Pod> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
api.get_opt(name).await
}
pub async fn wait_for_pod_ready(
&self,
pod_name: &str,
namespace: Option<&str>,
) -> Result<(), Error> {
let mut elapsed = 0u64;
let interval = 5u64;
let timeout_secs = 120u64;
loop {
if let Some(p) = self.get_pod(pod_name, namespace).await? {
if let Some(phase) = p.status.and_then(|s| s.phase) {
if phase.to_lowercase() == "running" {
return Ok(());
}
}
}
if elapsed >= timeout_secs {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Pod '{}' in '{}' did not become ready within {timeout_secs}s",
pod_name,
namespace.unwrap_or("<default>"),
))));
}
sleep(Duration::from_secs(interval)).await;
elapsed += interval;
}
}
/// Polls a pod until it reaches `Succeeded` or `Failed`, then returns its
/// logs. Used internally by node operations.
pub(crate) async fn wait_for_pod_completion(
&self,
name: &str,
namespace: &str,
) -> Result<String, Error> {
let api: Api<Pod> = Api::namespaced(self.client.clone(), namespace);
let poll_interval = Duration::from_secs(2);
for _ in 0..60 {
sleep(poll_interval).await;
let p = api.get(name).await?;
match p.status.and_then(|s| s.phase).as_deref() {
Some("Succeeded") => {
let logs = api
.logs(name, &Default::default())
.await
.unwrap_or_default();
debug!("Pod {namespace}/{name} succeeded. Logs: {logs}");
return Ok(logs);
}
Some("Failed") => {
let logs = api
.logs(name, &Default::default())
.await
.unwrap_or_default();
debug!("Pod {namespace}/{name} failed. Logs: {logs}");
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Pod '{name}' failed.\n{logs}"
))));
}
_ => {}
}
}
Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Timed out waiting for pod '{name}'"
))))
}
/// Execute a command in the first pod matching `{label}={name}`.
pub async fn exec_app_capture_output(
&self,
name: String,
label: String,
namespace: Option<&str>,
command: Vec<&str>,
) -> Result<String, String> {
let api: Api<Pod> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
let pod_list = api
.list(&ListParams::default().labels(&format!("{label}={name}")))
.await
.expect("Failed to list pods");
let pod_name = pod_list
.items
.first()
.expect("No matching pod")
.name()
.expect("Pod has no name")
.into_owned();
match api
.exec(
&pod_name,
command,
&AttachParams::default().stdout(true).stderr(true),
)
.await
{
Err(e) => Err(e.to_string()),
Ok(mut process) => {
let status = process
.take_status()
.expect("No status handle")
.await
.expect("Status channel closed");
if let Some(s) = status.status {
let mut buf = String::new();
if let Some(mut stdout) = process.stdout() {
stdout
.read_to_string(&mut buf)
.await
.map_err(|e| format!("Failed to read stdout: {e}"))?;
}
debug!("exec status: {} - {:?}", s, status.details);
if s == "Success" { Ok(buf) } else { Err(s) }
} else {
Err("No inner status from pod exec".to_string())
}
}
}
}
/// Execute a command in the first pod matching
/// `app.kubernetes.io/name={name}`.
pub async fn exec_app(
&self,
name: String,
namespace: Option<&str>,
command: Vec<&str>,
) -> Result<(), String> {
let api: Api<Pod> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
let pod_list = api
.list(&ListParams::default().labels(&format!("app.kubernetes.io/name={name}")))
.await
.expect("Failed to list pods");
let pod_name = pod_list
.items
.first()
.expect("No matching pod")
.name()
.expect("Pod has no name")
.into_owned();
match api.exec(&pod_name, command, &AttachParams::default()).await {
Err(e) => Err(e.to_string()),
Ok(mut process) => {
let status = process
.take_status()
.expect("No status handle")
.await
.expect("Status channel closed");
if let Some(s) = status.status {
debug!("exec status: {} - {:?}", s, status.details);
if s == "Success" { Ok(()) } else { Err(s) }
} else {
Err("No inner status from pod exec".to_string())
}
}
}
}
}


@@ -0,0 +1,316 @@
use std::collections::HashMap;
use k8s_openapi::api::{
apps::v1::Deployment,
core::v1::{Node, ServiceAccount},
};
use k8s_openapi::apiextensions_apiserver::pkg::apis::apiextensions::v1::CustomResourceDefinition;
use kube::api::ApiResource;
use kube::{
Error, Resource,
api::{Api, DynamicObject, GroupVersionKind, ListParams, ObjectList},
runtime::conditions,
runtime::wait::await_condition,
};
use log::debug;
use serde::de::DeserializeOwned;
use serde_json::Value;
use std::time::Duration;
use crate::client::K8sClient;
use crate::types::ScopeResolver;
impl K8sClient {
pub async fn has_healthy_deployment_with_label(
&self,
namespace: &str,
label_selector: &str,
) -> Result<bool, Error> {
let api: Api<Deployment> = Api::namespaced(self.client.clone(), namespace);
let list = api
.list(&ListParams::default().labels(label_selector))
.await?;
for d in list.items {
let available = d
.status
.as_ref()
.and_then(|s| s.available_replicas)
.unwrap_or(0);
if available > 0 {
return Ok(true);
}
if let Some(conds) = d.status.as_ref().and_then(|s| s.conditions.as_ref()) {
if conds
.iter()
.any(|c| c.type_ == "Available" && c.status == "True")
{
return Ok(true);
}
}
}
Ok(false)
}
pub async fn list_namespaces_with_healthy_deployments(
&self,
label_selector: &str,
) -> Result<Vec<String>, Error> {
let api: Api<Deployment> = Api::all(self.client.clone());
let list = api
.list(&ListParams::default().labels(label_selector))
.await?;
let mut healthy_ns: HashMap<String, bool> = HashMap::new();
for d in list.items {
let ns = match d.metadata.namespace.clone() {
Some(n) => n,
None => continue,
};
let available = d
.status
.as_ref()
.and_then(|s| s.available_replicas)
.unwrap_or(0);
let is_healthy = if available > 0 {
true
} else {
d.status
.as_ref()
.and_then(|s| s.conditions.as_ref())
.map(|c| {
c.iter()
.any(|c| c.type_ == "Available" && c.status == "True")
})
.unwrap_or(false)
};
if is_healthy {
healthy_ns.insert(ns, true);
}
}
Ok(healthy_ns.into_keys().collect())
}
pub async fn get_controller_service_account_name(
&self,
ns: &str,
) -> Result<Option<String>, Error> {
let api: Api<Deployment> = Api::namespaced(self.client.clone(), ns);
let list = api
.list(&ListParams::default().labels("app.kubernetes.io/component=controller"))
.await?;
if let Some(dep) = list.items.first() {
if let Some(sa) = dep
.spec
.as_ref()
.and_then(|s| s.template.spec.as_ref())
.and_then(|s| s.service_account_name.clone())
{
return Ok(Some(sa));
}
}
Ok(None)
}
pub async fn list_clusterrolebindings_json(&self) -> Result<Vec<Value>, Error> {
let gvk = GroupVersionKind::gvk("rbac.authorization.k8s.io", "v1", "ClusterRoleBinding");
let ar = ApiResource::from_gvk(&gvk);
let api: Api<DynamicObject> = Api::all_with(self.client.clone(), &ar);
let list = api.list(&ListParams::default()).await?;
Ok(list
.items
.into_iter()
.map(|o| serde_json::to_value(&o).unwrap_or(Value::Null))
.collect())
}
pub async fn is_service_account_cluster_wide(&self, sa: &str, ns: &str) -> Result<bool, Error> {
let sa_user = format!("system:serviceaccount:{ns}:{sa}");
for crb in self.list_clusterrolebindings_json().await? {
if let Some(subjects) = crb.get("subjects").and_then(|s| s.as_array()) {
for subj in subjects {
let kind = subj.get("kind").and_then(|v| v.as_str()).unwrap_or("");
let name = subj.get("name").and_then(|v| v.as_str()).unwrap_or("");
let subj_ns = subj.get("namespace").and_then(|v| v.as_str()).unwrap_or("");
if (kind == "ServiceAccount" && name == sa && subj_ns == ns)
|| (kind == "User" && name == sa_user)
{
return Ok(true);
}
}
}
}
Ok(false)
}
pub async fn has_crd(&self, name: &str) -> Result<bool, Error> {
let api: Api<CustomResourceDefinition> = Api::all(self.client.clone());
let crds = api
.list(&ListParams::default().fields(&format!("metadata.name={name}")))
.await?;
Ok(!crds.items.is_empty())
}
pub async fn service_account_api(&self, namespace: &str) -> Api<ServiceAccount> {
Api::namespaced(self.client.clone(), namespace)
}
pub async fn get_resource_json_value(
&self,
name: &str,
namespace: Option<&str>,
gvk: &GroupVersionKind,
) -> Result<DynamicObject, Error> {
let ar = ApiResource::from_gvk(gvk);
let api: Api<DynamicObject> = match namespace {
Some(ns) => Api::namespaced_with(self.client.clone(), ns, &ar),
None => Api::default_namespaced_with(self.client.clone(), &ar),
};
api.get(name).await
}
pub async fn get_secret_json_value(
&self,
name: &str,
namespace: Option<&str>,
) -> Result<DynamicObject, Error> {
self.get_resource_json_value(
name,
namespace,
&GroupVersionKind {
group: String::new(),
version: "v1".to_string(),
kind: "Secret".to_string(),
},
)
.await
}
pub async fn get_deployment(
&self,
name: &str,
namespace: Option<&str>,
) -> Result<Option<Deployment>, Error> {
let api: Api<Deployment> = match namespace {
Some(ns) => {
debug!("Getting namespaced deployment '{name}' in '{ns}'");
Api::namespaced(self.client.clone(), ns)
}
None => {
debug!("Getting deployment '{name}' in default namespace");
Api::default_namespaced(self.client.clone())
}
};
api.get_opt(name).await
}
pub async fn scale_deployment(
&self,
name: &str,
namespace: Option<&str>,
replicas: u32,
) -> Result<(), Error> {
let api: Api<Deployment> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
use kube::api::{Patch, PatchParams};
use serde_json::json;
let patch = json!({ "spec": { "replicas": replicas } });
api.patch_scale(name, &PatchParams::default(), &Patch::Merge(&patch))
.await?;
Ok(())
}
pub async fn delete_deployment(
&self,
name: &str,
namespace: Option<&str>,
) -> Result<(), Error> {
let api: Api<Deployment> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
api.delete(name, &kube::api::DeleteParams::default())
.await?;
Ok(())
}
pub async fn wait_until_deployment_ready(
&self,
name: &str,
namespace: Option<&str>,
timeout: Option<Duration>,
) -> Result<(), String> {
let api: Api<Deployment> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
let timeout = timeout.unwrap_or(Duration::from_secs(120));
let establish = await_condition(api, name, conditions::is_deployment_completed());
tokio::time::timeout(timeout, establish)
.await
.map(|_| ())
.map_err(|_| "Timed out waiting for deployment".to_string())
}
/// Gets a single named resource, using the correct API scope for `K`.
pub async fn get_resource<K>(
&self,
name: &str,
namespace: Option<&str>,
) -> Result<Option<K>, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned,
<K as Resource>::Scope: ScopeResolver<K>,
<K as Resource>::DynamicType: Default,
{
let api: Api<K> =
<<K as Resource>::Scope as ScopeResolver<K>>::get_api(&self.client, namespace);
api.get_opt(name).await
}
pub async fn list_resources<K>(
&self,
namespace: Option<&str>,
list_params: Option<ListParams>,
) -> Result<ObjectList<K>, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned,
<K as Resource>::Scope: ScopeResolver<K>,
<K as Resource>::DynamicType: Default,
{
let api: Api<K> =
<<K as Resource>::Scope as ScopeResolver<K>>::get_api(&self.client, namespace);
api.list(&list_params.unwrap_or_default()).await
}
pub async fn list_all_resources_with_labels<K>(&self, labels: &str) -> Result<Vec<K>, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned,
<K as Resource>::DynamicType: Default,
{
Api::<K>::all(self.client.clone())
.list(&ListParams::default().labels(labels))
.await
.map(|l| l.items)
}
pub async fn get_all_resource_in_all_namespace<K>(&self) -> Result<Vec<K>, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned,
<K as Resource>::Scope: ScopeResolver<K>,
<K as Resource>::DynamicType: Default,
{
Api::<K>::all(self.client.clone())
.list(&Default::default())
.await
.map(|l| l.items)
}
pub async fn get_nodes(
&self,
list_params: Option<ListParams>,
) -> Result<ObjectList<Node>, Error> {
self.list_resources(None, list_params).await
}
}

harmony-k8s/src/types.rs

@@ -0,0 +1,100 @@
use std::time::Duration;
use k8s_openapi::{ClusterResourceScope, NamespaceResourceScope};
use kube::{Api, Client, Resource};
use serde::Serialize;
/// Which Kubernetes distribution is running. Detected once at runtime via
/// [`crate::discovery::K8sClient::get_k8s_distribution`].
#[derive(Debug, Clone, PartialEq, Eq, Serialize)]
pub enum KubernetesDistribution {
Default,
OpenshiftFamily,
K3sFamily,
}
/// A file to be written to a node's filesystem.
#[derive(Debug, Clone)]
pub struct NodeFile {
/// Absolute path on the host where the file should be written.
pub path: String,
/// Content of the file.
pub content: String,
/// UNIX permissions (e.g. `0o600`).
pub mode: u32,
}
/// Options controlling the behaviour of a [`crate::K8sClient::drain_node`] operation.
#[derive(Debug, Clone)]
pub struct DrainOptions {
/// Evict pods that use `emptyDir` volumes (ephemeral data is lost).
/// Equivalent to `kubectl drain --delete-emptydir-data`.
pub delete_emptydir_data: bool,
/// Silently skip DaemonSet-managed pods instead of blocking the drain.
/// Equivalent to `kubectl drain --ignore-daemonsets`.
pub ignore_daemonsets: bool,
/// Maximum wall-clock time to wait for all evictions to complete.
pub timeout: Duration,
}
impl Default for DrainOptions {
fn default() -> Self {
Self {
delete_emptydir_data: false,
ignore_daemonsets: true,
timeout: Duration::from_secs(1),
}
}
}
impl DrainOptions {
pub fn default_ignore_daemonset_delete_emptydir_data() -> Self {
Self {
delete_emptydir_data: true,
ignore_daemonsets: true,
..Self::default()
}
}
}
/// Controls how [`crate::K8sClient::apply_with_strategy`] behaves when the
/// resource already exists (or does not).
pub enum WriteMode {
/// Server-side apply; create if absent, update if present (default).
CreateOrUpdate,
/// POST only; return an error if the resource already exists.
Create,
/// Server-side apply only; return an error if the resource does not exist.
Update,
}
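The three variants map to distinct API verbs depending on whether the resource already exists. A std-only sketch of that decision table (`plan` is a hypothetical helper for illustration, not part of the crate; the actual kube calls are stubbed as labels):

```rust
// Mirrors the WriteMode enum above; "server-side apply" / "POST" stand in
// for the real kube API calls.
enum WriteMode {
    CreateOrUpdate,
    Create,
    Update,
}

// Decide which verb apply_with_strategy would issue, given whether the
// resource currently exists on the cluster.
fn plan(mode: &WriteMode, exists: bool) -> Result<&'static str, &'static str> {
    match (mode, exists) {
        (WriteMode::CreateOrUpdate, _) => Ok("server-side apply"),
        (WriteMode::Create, false) => Ok("POST"),
        (WriteMode::Create, true) => Err("already exists"),
        (WriteMode::Update, true) => Ok("server-side apply"),
        (WriteMode::Update, false) => Err("not found"),
    }
}

fn main() {
    println!("{:?}", plan(&WriteMode::CreateOrUpdate, true));
    println!("{:?}", plan(&WriteMode::Create, true));
}
```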
// ── Scope resolution trait ───────────────────────────────────────────────────
/// Resolves the correct [`kube::Api`] for a resource type based on its scope
/// (cluster-wide vs. namespace-scoped).
pub trait ScopeResolver<K: Resource> {
fn get_api(client: &Client, ns: Option<&str>) -> Api<K>;
}
impl<K> ScopeResolver<K> for ClusterResourceScope
where
K: Resource<Scope = ClusterResourceScope>,
<K as Resource>::DynamicType: Default,
{
fn get_api(client: &Client, _ns: Option<&str>) -> Api<K> {
Api::all(client.clone())
}
}
impl<K> ScopeResolver<K> for NamespaceResourceScope
where
K: Resource<Scope = NamespaceResourceScope>,
<K as Resource>::DynamicType: Default,
{
fn get_api(client: &Client, ns: Option<&str>) -> Api<K> {
match ns {
Some(ns) => Api::namespaced(client.clone(), ns),
None => Api::default_namespaced(client.clone()),
}
}
}
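The two impls above let generic methods like `get_resource` pick cluster-wide vs. namespaced access at compile time from the resource's `Scope` associated type. A std-only sketch of the same dispatch pattern, with mock types standing in for kube's `Client`/`Api` and k8s_openapi's scope markers (all names here are local stand-ins, not the real crate types):

```rust
use std::marker::PhantomData;

// Mock stand-ins for kube's Client and Api.
struct Client;
struct Api<K> {
    scope: String,
    _k: PhantomData<K>,
}

// Scope marker types, mirroring ClusterResourceScope / NamespaceResourceScope.
struct ClusterResourceScope;
struct NamespaceResourceScope;

trait Resource {
    type Scope;
}

// Each scope marker knows how to build the right Api for a resource of that scope.
trait ScopeResolver<K: Resource> {
    fn get_api(client: &Client, ns: Option<&str>) -> Api<K>;
}

impl<K: Resource<Scope = ClusterResourceScope>> ScopeResolver<K> for ClusterResourceScope {
    fn get_api(_client: &Client, _ns: Option<&str>) -> Api<K> {
        Api { scope: "all".into(), _k: PhantomData }
    }
}

impl<K: Resource<Scope = NamespaceResourceScope>> ScopeResolver<K> for NamespaceResourceScope {
    fn get_api(_client: &Client, ns: Option<&str>) -> Api<K> {
        Api {
            scope: format!("namespaced:{}", ns.unwrap_or("default")),
            _k: PhantomData,
        }
    }
}

// Generic accessor: the compiler routes through the correct impl, as in get_resource.
fn api_for<K>(client: &Client, ns: Option<&str>) -> Api<K>
where
    K: Resource,
    K::Scope: ScopeResolver<K>,
{
    <K::Scope as ScopeResolver<K>>::get_api(client, ns)
}

struct Node; // cluster-scoped
impl Resource for Node {
    type Scope = ClusterResourceScope;
}
struct Pod; // namespace-scoped
impl Resource for Pod {
    type Scope = NamespaceResourceScope;
}

fn main() {
    println!("{}", api_for::<Node>(&Client, None).scope); // all
    println!("{}", api_for::<Pod>(&Client, Some("kube-system")).scope); // namespaced:kube-system
}
```

The payoff is that callers never write the `match namespace { ... }` boilerplate seen in the deployment helpers earlier; misusing a namespace on a cluster-scoped type simply has no effect rather than being a runtime error.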


@@ -21,6 +21,8 @@ semver = "1.0.23"
serde.workspace = true
serde_json.workspace = true
tokio.workspace = true
tokio-retry.workspace = true
tokio-util.workspace = true
derive-new.workspace = true
log.workspace = true
env_logger.workspace = true
@@ -31,6 +33,7 @@ opnsense-config-xml = { path = "../opnsense-config-xml" }
harmony_macros = { path = "../harmony_macros" }
harmony_types = { path = "../harmony_types" }
harmony_execution = { path = "../harmony_execution" }
harmony-k8s = { path = "../harmony-k8s" }
uuid.workspace = true
url.workspace = true
kube = { workspace = true, features = ["derive"] }
@@ -60,7 +63,6 @@ temp-dir = "0.1.14"
dyn-clone = "1.0.19"
similar.workspace = true
futures-util = "0.3.31"
tokio-util = "0.7.15"
strum = { version = "0.27.1", features = ["derive"] }
tempfile.workspace = true
serde_with = "3.14.0"
@@ -80,7 +82,7 @@ sqlx.workspace = true
inquire.workspace = true
brocade = { path = "../brocade" }
option-ext = "0.2.0"
tokio-retry = "0.3.0"
rand.workspace = true
[dev-dependencies]
pretty_assertions.workspace = true


@@ -4,8 +4,6 @@ use std::error::Error;
use async_trait::async_trait;
use derive_new::new;
use crate::inventory::HostRole;
use super::{
data::Version, executors::ExecutorError, inventory::Inventory, topology::PreparationError,
};


@@ -1,4 +1,5 @@
use async_trait::async_trait;
use harmony_k8s::K8sClient;
use harmony_macros::ip;
use harmony_types::{
id::Id,
@@ -8,7 +9,7 @@ use harmony_types::{
use log::debug;
use log::info;
use crate::topology::PxeOptions;
use crate::topology::{HelmCommand, PxeOptions};
use crate::{data::FileContent, executors::ExecutorError, topology::node_exporter::NodeExporter};
use crate::{infra::network_manager::OpenShiftNmStateNetworkManager, topology::PortConfig};
@@ -16,9 +17,12 @@ use super::{
DHCPStaticEntry, DhcpServer, DnsRecord, DnsRecordType, DnsServer, Firewall, HostNetworkConfig,
HttpServer, IpAddress, K8sclient, LoadBalancer, LoadBalancerService, LogicalHost, NetworkError,
NetworkManager, PreparationError, PreparationOutcome, Router, Switch, SwitchClient,
SwitchError, TftpServer, Topology, k8s::K8sClient,
SwitchError, TftpServer, Topology,
};
use std::{
process::Command,
sync::{Arc, OnceLock},
};
use std::sync::{Arc, OnceLock};
#[derive(Debug, Clone)]
pub struct HAClusterTopology {
@@ -52,6 +56,30 @@ impl Topology for HAClusterTopology {
}
}
impl HelmCommand for HAClusterTopology {
fn get_helm_command(&self) -> Command {
let mut cmd = Command::new("helm");
if let Some(k) = &self.kubeconfig {
cmd.args(["--kubeconfig", k]);
}
// FIXME we should support context anywhere there is a k8sclient
// This likely belongs in the k8sclient itself and should be extracted to a separate
// crate
//
// I feel like helm could very well be a feature of this external k8s client.
//
// Same for kustomize
//
// if let Some(c) = &self.k8s_context {
// cmd.args(["--kube-context", c]);
// }
info!("Using helm command {cmd:?}");
cmd
}
}
#[async_trait]
impl K8sclient for HAClusterTopology {
async fn k8s_client(&self) -> Result<Arc<K8sClient>, String> {

(File diff suppressed because it is too large)


@@ -1,13 +1,14 @@
use std::{collections::BTreeMap, process::Command, sync::Arc, time::Duration};
use std::{collections::BTreeMap, process::Command, sync::Arc};
use async_trait::async_trait;
use base64::{Engine, engine::general_purpose};
use harmony_k8s::{K8sClient, KubernetesDistribution};
use harmony_types::rfc1123::Rfc1123Name;
use k8s_openapi::api::{
core::v1::{Pod, Secret},
rbac::v1::{ClusterRoleBinding, RoleRef, Subject},
};
use kube::api::{DynamicObject, GroupVersionKind, ObjectMeta};
use kube::api::{GroupVersionKind, ObjectMeta};
use log::{debug, info, trace, warn};
use serde::Serialize;
use tokio::sync::OnceCell;
@@ -28,28 +29,7 @@ use crate::{
score_cert_management::CertificateManagementScore,
},
k3d::K3DInstallationScore,
k8s::ingress::{K8sIngressScore, PathType},
monitoring::{
grafana::{grafana::Grafana, helm::helm_grafana::grafana_helm_chart_score},
kube_prometheus::crd::{
crd_alertmanager_config::CRDPrometheus,
crd_grafana::{
Grafana as GrafanaCRD, GrafanaCom, GrafanaDashboard,
GrafanaDashboardDatasource, GrafanaDashboardSpec, GrafanaDatasource,
GrafanaDatasourceConfig, GrafanaDatasourceJsonData,
GrafanaDatasourceSecureJsonData, GrafanaDatasourceSpec, GrafanaSpec,
},
crd_prometheuses::LabelSelector,
prometheus_operator::prometheus_operator_helm_chart_score,
rhob_alertmanager_config::RHOBObservability,
service_monitor::ServiceMonitor,
},
},
okd::{crd::ingresses_config::Ingress as IngressResource, route::OKDTlsPassthroughScore},
prometheus::{
k8s_prometheus_alerting_score::K8sPrometheusCRDAlertingScore,
prometheus::PrometheusMonitoring, rhob_alerting_score::RHOBAlertingScore,
},
},
score::Score,
topology::{TlsRoute, TlsRouter, ingress::Ingress},
@@ -58,8 +38,6 @@ use crate::{
use super::super::{
DeploymentTarget, HelmCommand, K8sclient, MultiTargetTopology, PreparationError,
PreparationOutcome, Topology,
k8s::K8sClient,
oberservability::monitoring::AlertReceiver,
tenant::{
TenantConfig, TenantManager,
k8s::K8sTenantManager,
@@ -76,13 +54,6 @@ struct K8sState {
message: String,
}
#[derive(Debug, Clone, Serialize)]
pub enum KubernetesDistribution {
OpenshiftFamily,
K3sFamily,
Default,
}
#[derive(Debug, Clone)]
enum K8sSource {
LocalK3d,
@@ -173,216 +144,6 @@ impl TlsRouter for K8sAnywhereTopology {
}
}
#[async_trait]
impl Grafana for K8sAnywhereTopology {
async fn ensure_grafana_operator(
&self,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
debug!("ensure grafana operator");
let client = self.k8s_client().await.unwrap();
let grafana_gvk = GroupVersionKind {
group: "grafana.integreatly.org".to_string(),
version: "v1beta1".to_string(),
kind: "Grafana".to_string(),
};
let name = "grafanas.grafana.integreatly.org";
let ns = "grafana";
let grafana_crd = client
.get_resource_json_value(name, Some(ns), &grafana_gvk)
.await;
match grafana_crd {
Ok(_) => {
return Ok(PreparationOutcome::Success {
details: "Found grafana CRDs in cluster".to_string(),
});
}
Err(_) => {
return self
.install_grafana_operator(inventory, Some("grafana"))
.await;
}
};
}
async fn install_grafana(&self) -> Result<PreparationOutcome, PreparationError> {
let ns = "grafana";
let mut label = BTreeMap::new();
label.insert("dashboards".to_string(), "grafana".to_string());
let label_selector = LabelSelector {
match_labels: label.clone(),
match_expressions: vec![],
};
let client = self.k8s_client().await?;
let grafana = self.build_grafana(ns, &label);
client.apply(&grafana, Some(ns)).await?;
//TODO change this to a ensure ready or something better than just a timeout
client
.wait_until_deployment_ready(
"grafana-grafana-deployment",
Some("grafana"),
Some(Duration::from_secs(30)),
)
.await?;
let sa_name = "grafana-grafana-sa";
let token_secret_name = "grafana-sa-token-secret";
let sa_token_secret = self.build_sa_token_secret(token_secret_name, sa_name, ns);
client.apply(&sa_token_secret, Some(ns)).await?;
let secret_gvk = GroupVersionKind {
group: "".to_string(),
version: "v1".to_string(),
kind: "Secret".to_string(),
};
let secret = client
.get_resource_json_value(token_secret_name, Some(ns), &secret_gvk)
.await?;
let token = format!(
"Bearer {}",
self.extract_and_normalize_token(&secret).unwrap()
);
debug!("creating grafana clusterrole binding");
let clusterrolebinding =
self.build_cluster_rolebinding(sa_name, "cluster-monitoring-view", ns);
client.apply(&clusterrolebinding, Some(ns)).await?;
debug!("creating grafana datasource crd");
let thanos_url = format!(
"https://{}",
self.get_domain("thanos-querier-openshift-monitoring")
.await
.unwrap()
);
let thanos_openshift_datasource = self.build_grafana_datasource(
"thanos-openshift-monitoring",
ns,
&label_selector,
&thanos_url,
&token,
);
client.apply(&thanos_openshift_datasource, Some(ns)).await?;
debug!("creating grafana dashboard crd");
let dashboard = self.build_grafana_dashboard(ns, &label_selector);
client.apply(&dashboard, Some(ns)).await?;
debug!("creating grafana ingress");
let grafana_ingress = self.build_grafana_ingress(ns).await;
grafana_ingress
.interpret(&Inventory::empty(), self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
Ok(PreparationOutcome::Success {
details: "Installed grafana components".to_string(),
})
}
}
#[async_trait]
impl PrometheusMonitoring<CRDPrometheus> for K8sAnywhereTopology {
async fn install_prometheus(
&self,
sender: &CRDPrometheus,
_inventory: &Inventory,
_receivers: Option<Vec<Box<dyn AlertReceiver<CRDPrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let client = self.k8s_client().await?;
for monitor in sender.service_monitor.iter() {
client
.apply(monitor, Some(&sender.namespace))
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
}
Ok(PreparationOutcome::Success {
details: "successfully installed prometheus components".to_string(),
})
}
async fn ensure_prometheus_operator(
&self,
sender: &CRDPrometheus,
_inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let po_result = self.ensure_prometheus_operator(sender).await?;
match po_result {
PreparationOutcome::Success { details: _ } => {
debug!("Detected prometheus crds operator present in cluster.");
return Ok(po_result);
}
PreparationOutcome::Noop => {
debug!("Skipping Prometheus CR installation due to missing operator.");
return Ok(po_result);
}
}
}
}
#[async_trait]
impl PrometheusMonitoring<RHOBObservability> for K8sAnywhereTopology {
async fn install_prometheus(
&self,
sender: &RHOBObservability,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<RHOBObservability>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let po_result = self.ensure_cluster_observability_operator(sender).await?;
if po_result == PreparationOutcome::Noop {
debug!("Skipping Prometheus CR installation due to missing operator.");
return Ok(po_result);
}
let result = self
.get_cluster_observability_operator_prometheus_application_score(
sender.clone(),
receivers,
)
.await
.interpret(inventory, self)
.await;
match result {
Ok(outcome) => match outcome.status {
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
details: outcome.message,
}),
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
_ => Err(PreparationError::new(outcome.message)),
},
Err(err) => Err(PreparationError::new(err.to_string())),
}
}
async fn ensure_prometheus_operator(
&self,
sender: &RHOBObservability,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
}
impl Serialize for K8sAnywhereTopology {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
@@ -587,23 +348,6 @@ impl K8sAnywhereTopology {
}
}
fn extract_and_normalize_token(&self, secret: &DynamicObject) -> Option<String> {
let token_b64 = secret
.data
.get("token")
.or_else(|| secret.data.get("data").and_then(|d| d.get("token")))
.and_then(|v| v.as_str())?;
let bytes = general_purpose::STANDARD.decode(token_b64).ok()?;
let s = String::from_utf8(bytes).ok()?;
let cleaned = s
.trim_matches(|c: char| c.is_whitespace() || c == '\0')
.to_string();
Some(cleaned)
}
pub async fn get_k8s_distribution(&self) -> Result<KubernetesDistribution, PreparationError> {
self.k8s_client()
.await?
@@ -663,141 +407,6 @@ impl K8sAnywhereTopology {
}
}
fn build_grafana_datasource(
&self,
name: &str,
ns: &str,
label_selector: &LabelSelector,
url: &str,
token: &str,
) -> GrafanaDatasource {
let mut json_data = BTreeMap::new();
json_data.insert("timeInterval".to_string(), "5s".to_string());
GrafanaDatasource {
metadata: ObjectMeta {
name: Some(name.to_string()),
namespace: Some(ns.to_string()),
..Default::default()
},
spec: GrafanaDatasourceSpec {
instance_selector: label_selector.clone(),
allow_cross_namespace_import: Some(true),
values_from: None,
datasource: GrafanaDatasourceConfig {
access: "proxy".to_string(),
name: name.to_string(),
r#type: "prometheus".to_string(),
url: url.to_string(),
database: None,
json_data: Some(GrafanaDatasourceJsonData {
time_interval: Some("60s".to_string()),
http_header_name1: Some("Authorization".to_string()),
tls_skip_verify: Some(true),
oauth_pass_thru: Some(true),
}),
secure_json_data: Some(GrafanaDatasourceSecureJsonData {
http_header_value1: Some(format!("Bearer {token}")),
}),
is_default: Some(false),
editable: Some(true),
},
},
}
}
fn build_grafana_dashboard(
&self,
ns: &str,
label_selector: &LabelSelector,
) -> GrafanaDashboard {
let graf_dashboard = GrafanaDashboard {
metadata: ObjectMeta {
name: Some(format!("grafana-dashboard-{}", ns)),
namespace: Some(ns.to_string()),
..Default::default()
},
spec: GrafanaDashboardSpec {
resync_period: Some("30s".to_string()),
instance_selector: label_selector.clone(),
datasources: Some(vec![GrafanaDashboardDatasource {
input_name: "DS_PROMETHEUS".to_string(),
datasource_name: "thanos-openshift-monitoring".to_string(),
}]),
json: None,
grafana_com: Some(GrafanaCom {
id: 17406,
revision: None,
}),
},
};
graf_dashboard
}
fn build_grafana(&self, ns: &str, labels: &BTreeMap<String, String>) -> GrafanaCRD {
let grafana = GrafanaCRD {
metadata: ObjectMeta {
name: Some(format!("grafana-{}", ns)),
namespace: Some(ns.to_string()),
labels: Some(labels.clone()),
..Default::default()
},
spec: GrafanaSpec {
config: None,
admin_user: None,
admin_password: None,
ingress: None,
persistence: None,
resources: None,
},
};
grafana
}
async fn build_grafana_ingress(&self, ns: &str) -> K8sIngressScore {
let domain = self.get_domain(&format!("grafana-{}", ns)).await.unwrap();
let name = format!("{}-grafana", ns);
let backend_service = format!("grafana-{}-service", ns);
K8sIngressScore {
name: fqdn::fqdn!(&name),
host: fqdn::fqdn!(&domain),
backend_service: fqdn::fqdn!(&backend_service),
port: 3000,
path: Some("/".to_string()),
path_type: Some(PathType::Prefix),
namespace: Some(fqdn::fqdn!(&ns)),
ingress_class_name: Some("openshift-default".to_string()),
}
}
async fn get_cluster_observability_operator_prometheus_application_score(
&self,
sender: RHOBObservability,
receivers: Option<Vec<Box<dyn AlertReceiver<RHOBObservability>>>>,
) -> RHOBAlertingScore {
RHOBAlertingScore {
sender,
receivers: receivers.unwrap_or_default(),
service_monitors: vec![],
prometheus_rules: vec![],
}
}
async fn get_k8s_prometheus_application_score(
&self,
sender: CRDPrometheus,
receivers: Option<Vec<Box<dyn AlertReceiver<CRDPrometheus>>>>,
service_monitors: Option<Vec<ServiceMonitor>>,
) -> K8sPrometheusCRDAlertingScore {
return K8sPrometheusCRDAlertingScore {
sender,
receivers: receivers.unwrap_or_default(),
service_monitors: service_monitors.unwrap_or_default(),
prometheus_rules: vec![],
};
}
async fn openshift_ingress_operator_available(&self) -> Result<(), PreparationError> {
let client = self.k8s_client().await?;
let gvk = GroupVersionKind {
@@ -963,137 +572,6 @@ impl K8sAnywhereTopology {
)),
}
}
async fn ensure_cluster_observability_operator(
&self,
sender: &RHOBObservability,
) -> Result<PreparationOutcome, PreparationError> {
let status = Command::new("sh")
.args(["-c", "kubectl get crd -A | grep -i rhobs"])
.status()
.map_err(|e| PreparationError::new(format!("could not connect to cluster: {}", e)))?;
if !status.success() {
if let Some(Some(k8s_state)) = self.k8s_state.get() {
match k8s_state.source {
K8sSource::LocalK3d => {
warn!(
"Installing observability operator is not supported on LocalK3d source"
);
return Ok(PreparationOutcome::Noop);
debug!("installing cluster observability operator");
todo!();
let op_score =
prometheus_operator_helm_chart_score(sender.namespace.clone());
let result = op_score.interpret(&Inventory::empty(), self).await;
return match result {
Ok(outcome) => match outcome.status {
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
details: "installed cluster observability operator".into(),
}),
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
_ => Err(PreparationError::new(
"failed to install cluster observability operator (unknown error)".into(),
)),
},
Err(err) => Err(PreparationError::new(err.to_string())),
};
}
K8sSource::Kubeconfig => {
debug!(
"unable to install cluster observability operator, contact cluster admin"
);
return Ok(PreparationOutcome::Noop);
}
}
} else {
warn!(
"Unable to detect k8s_state. Skipping Cluster Observability Operator install."
);
return Ok(PreparationOutcome::Noop);
}
}
debug!("Cluster Observability Operator is already present, skipping install");
Ok(PreparationOutcome::Success {
details: "cluster observability operator present in cluster".into(),
})
}
async fn ensure_prometheus_operator(
&self,
sender: &CRDPrometheus,
) -> Result<PreparationOutcome, PreparationError> {
let status = Command::new("sh")
.args(["-c", "kubectl get crd -A | grep -i prometheuses"])
.status()
.map_err(|e| PreparationError::new(format!("could not connect to cluster: {}", e)))?;
if !status.success() {
if let Some(Some(k8s_state)) = self.k8s_state.get() {
match k8s_state.source {
K8sSource::LocalK3d => {
debug!("installing prometheus operator");
let op_score =
prometheus_operator_helm_chart_score(sender.namespace.clone());
let result = op_score.interpret(&Inventory::empty(), self).await;
return match result {
Ok(outcome) => match outcome.status {
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
details: "installed prometheus operator".into(),
}),
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
_ => Err(PreparationError::new(
"failed to install prometheus operator (unknown error)".into(),
)),
},
Err(err) => Err(PreparationError::new(err.to_string())),
};
}
K8sSource::Kubeconfig => {
debug!("unable to install prometheus operator, contact cluster admin");
return Ok(PreparationOutcome::Noop);
}
}
} else {
warn!("Unable to detect k8s_state. Skipping Prometheus Operator install.");
return Ok(PreparationOutcome::Noop);
}
}
debug!("Prometheus operator is already present, skipping install");
Ok(PreparationOutcome::Success {
details: "prometheus operator present in cluster".into(),
})
}
async fn install_grafana_operator(
&self,
inventory: &Inventory,
ns: Option<&str>,
) -> Result<PreparationOutcome, PreparationError> {
let namespace = ns.unwrap_or("grafana");
info!("installing grafana operator in ns {namespace}");
let tenant = self.get_k8s_tenant_manager()?.get_tenant_config().await;
let mut namespace_scope = false;
if tenant.is_some() {
namespace_scope = true;
}
let _grafana_operator_score = grafana_helm_chart_score(namespace, namespace_scope)
.interpret(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()));
Ok(PreparationOutcome::Success {
details: format!(
"Successfully installed grafana operator in ns {}",
ns.unwrap()
),
})
}
}
#[derive(Clone, Debug)]


@@ -1,4 +1,5 @@
mod k8s_anywhere;
pub mod nats;
pub mod observability;
mod postgres;
pub use k8s_anywhere::*;


@@ -0,0 +1,147 @@
use async_trait::async_trait;
use crate::{
inventory::Inventory,
modules::monitoring::grafana::{
grafana::Grafana,
k8s::{
score_ensure_grafana_ready::GrafanaK8sEnsureReadyScore,
score_grafana_alert_receiver::GrafanaK8sReceiverScore,
score_grafana_datasource::GrafanaK8sDatasourceScore,
score_grafana_rule::GrafanaK8sRuleScore, score_install_grafana::GrafanaK8sInstallScore,
},
},
score::Score,
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<Grafana> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
sender: &Grafana,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = GrafanaK8sInstallScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Grafana not installed {}", e)))?;
Ok(PreparationOutcome::Success {
details: "Successfully installed grafana alert sender".to_string(),
})
}
async fn install_receivers(
&self,
sender: &Grafana,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<Grafana>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let receivers = match receivers {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for receiver in receivers {
let score = GrafanaK8sReceiverScore {
receiver,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install receiver: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert receivers installed successfully".to_string(),
})
}
async fn install_rules(
&self,
sender: &Grafana,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<Grafana>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let rules = match rules {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for rule in rules {
let score = GrafanaK8sRuleScore {
sender: sender.clone(),
rule,
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert rules installed successfully".to_string(),
})
}
async fn add_scrape_targets(
&self,
sender: &Grafana,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<Grafana>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let scrape_targets = match scrape_targets {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for scrape_target in scrape_targets {
let score = GrafanaK8sDatasourceScore {
scrape_target,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to add DataSource: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All datasources installed successfully".to_string(),
})
}
async fn ensure_monitoring_installed(
&self,
sender: &Grafana,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = GrafanaK8sEnsureReadyScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Grafana not ready {}", e)))?;
Ok(PreparationOutcome::Success {
details: "Grafana Ready".to_string(),
})
}
}


@@ -0,0 +1,142 @@
use async_trait::async_trait;
use crate::{
inventory::Inventory,
modules::monitoring::kube_prometheus::{
KubePrometheus, helm::kube_prometheus_helm_chart::kube_prometheus_helm_chart_score,
score_kube_prometheus_alert_receivers::KubePrometheusReceiverScore,
score_kube_prometheus_ensure_ready::KubePrometheusEnsureReadyScore,
score_kube_prometheus_rule::KubePrometheusRuleScore,
score_kube_prometheus_scrape_target::KubePrometheusScrapeTargetScore,
},
score::Score,
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<KubePrometheus> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
kube_prometheus_helm_chart_score(sender.config.clone())
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
Ok(PreparationOutcome::Success {
details: "Successfully installed kubeprometheus alert sender".to_string(),
})
}
async fn install_receivers(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<KubePrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let receivers = match receivers {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for receiver in receivers {
let score = KubePrometheusReceiverScore {
receiver,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install receiver: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert receivers installed successfully".to_string(),
})
}
async fn install_rules(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<KubePrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let rules = match rules {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for rule in rules {
let score = KubePrometheusRuleScore {
sender: sender.clone(),
rule,
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert rules installed successfully".to_string(),
})
}
async fn add_scrape_targets(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<KubePrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let scrape_targets = match scrape_targets {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for scrape_target in scrape_targets {
let score = KubePrometheusScrapeTargetScore {
scrape_target,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to add scrape target: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All scrape targets installed successfully".to_string(),
})
}
async fn ensure_monitoring_installed(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = KubePrometheusEnsureReadyScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("KubePrometheus not ready {}", e)))?;
Ok(PreparationOutcome::Success {
details: "KubePrometheus Ready".to_string(),
})
}
}


@@ -0,0 +1,5 @@
pub mod grafana;
pub mod kube_prometheus;
pub mod openshift_monitoring;
pub mod prometheus;
pub mod redhat_cluster_observability;


@@ -0,0 +1,142 @@
use async_trait::async_trait;
use log::info;
use crate::score::Score;
use crate::{
inventory::Inventory,
modules::monitoring::okd::{
OpenshiftClusterAlertSender,
score_enable_cluster_monitoring::OpenshiftEnableClusterMonitoringScore,
score_openshift_alert_rule::OpenshiftAlertRuleScore,
score_openshift_receiver::OpenshiftReceiverScore,
score_openshift_scrape_target::OpenshiftScrapeTargetScore,
score_user_workload::OpenshiftUserWorkloadMonitoring,
score_verify_user_workload_monitoring::VerifyUserWorkload,
},
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<OpenshiftClusterAlertSender> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
info!("enabling cluster monitoring");
let cluster_monitoring_score = OpenshiftEnableClusterMonitoringScore {};
cluster_monitoring_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
info!("enabling user workload monitoring");
let user_workload_score = OpenshiftUserWorkloadMonitoring {};
user_workload_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
Ok(PreparationOutcome::Success {
details: "Successfully configured cluster monitoring".to_string(),
})
}
async fn install_receivers(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>>,
) -> Result<PreparationOutcome, PreparationError> {
if let Some(receivers) = receivers {
for receiver in receivers {
info!("Installing receiver {}", receiver.name());
let receiver_score = OpenshiftReceiverScore { receiver };
receiver_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
}
Ok(PreparationOutcome::Success {
details: "Successfully installed receivers for OpenshiftClusterMonitoring"
.to_string(),
})
} else {
Ok(PreparationOutcome::Noop)
}
}
async fn install_rules(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<OpenshiftClusterAlertSender>>>>,
) -> Result<PreparationOutcome, PreparationError> {
if let Some(rules) = rules {
for rule in rules {
info!("Installing rule {}", rule.name());
let rule_score = OpenshiftAlertRuleScore { rule };
rule_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
}
Ok(PreparationOutcome::Success {
details: "Successfully installed rules for OpenshiftClusterMonitoring".to_string(),
})
} else {
Ok(PreparationOutcome::Noop)
}
}
async fn add_scrape_targets(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>>>>,
) -> Result<PreparationOutcome, PreparationError> {
if let Some(scrape_targets) = scrape_targets {
for scrape_target in scrape_targets {
info!("Installing scrape target {}", scrape_target.name());
let scrape_target_score = OpenshiftScrapeTargetScore { scrape_target };
scrape_target_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
}
Ok(PreparationOutcome::Success {
details: "Successfully added scrape targets for OpenshiftClusterMonitoring"
.to_string(),
})
} else {
Ok(PreparationOutcome::Noop)
}
}
async fn ensure_monitoring_installed(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let verify_monitoring_score = VerifyUserWorkload {};
info!("Verifying user workload and cluster monitoring installed");
verify_monitoring_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
Ok(PreparationOutcome::Success {
details: "OpenshiftClusterMonitoring ready".to_string(),
})
}
}

View File

@@ -0,0 +1,147 @@
use async_trait::async_trait;
use crate::{
inventory::Inventory,
modules::monitoring::prometheus::{
Prometheus, score_prometheus_alert_receivers::PrometheusReceiverScore,
score_prometheus_ensure_ready::PrometheusEnsureReadyScore,
score_prometheus_install::PrometheusInstallScore,
score_prometheus_rule::PrometheusRuleScore,
score_prometheus_scrape_target::PrometheusScrapeTargetScore,
},
score::Score,
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<Prometheus> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
sender: &Prometheus,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = PrometheusInstallScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Prometheus not installed: {}", e)))?;
Ok(PreparationOutcome::Success {
details: "Successfully installed Prometheus alert sender".to_string(),
})
}
async fn install_receivers(
&self,
sender: &Prometheus,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<Prometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let receivers = match receivers {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for receiver in receivers {
let score = PrometheusReceiverScore {
receiver,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install receiver: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert receivers installed successfully".to_string(),
})
}
async fn install_rules(
&self,
sender: &Prometheus,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<Prometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let rules = match rules {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for rule in rules {
let score = PrometheusRuleScore {
sender: sender.clone(),
rule,
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert rules installed successfully".to_string(),
})
}
async fn add_scrape_targets(
&self,
sender: &Prometheus,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<Prometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let scrape_targets = match scrape_targets {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for scrape_target in scrape_targets {
let score = PrometheusScrapeTargetScore {
scrape_target,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to add scrape target: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All scrape targets installed successfully".to_string(),
})
}
async fn ensure_monitoring_installed(
&self,
sender: &Prometheus,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = PrometheusEnsureReadyScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Prometheus not ready: {}", e)))?;
Ok(PreparationOutcome::Success {
details: "Prometheus Ready".to_string(),
})
}
}

View File

@@ -0,0 +1,116 @@
use crate::{
modules::monitoring::red_hat_cluster_observability::{
score_alert_receiver::RedHatClusterObservabilityReceiverScore,
score_coo_monitoring_stack::RedHatClusterObservabilityMonitoringStackScore,
},
score::Score,
};
use async_trait::async_trait;
use log::info;
use crate::{
inventory::Inventory,
modules::monitoring::red_hat_cluster_observability::{
RedHatClusterObservability,
score_redhat_cluster_observability_operator::RedHatClusterObservabilityOperatorScore,
},
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<RedHatClusterObservability> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
sender: &RedHatClusterObservability,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
info!("Verifying Redhat Cluster Observability Operator");
let coo_score = RedHatClusterObservabilityOperatorScore::default();
coo_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
info!(
"Installing Cluster Observability Operator Monitoring Stack in ns {}",
sender.namespace
);
let coo_monitoring_stack_score = RedHatClusterObservabilityMonitoringStackScore {
namespace: sender.namespace.clone(),
resource_selector: sender.resource_selector.clone(),
};
coo_monitoring_stack_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
Ok(PreparationOutcome::Success {
details: "Successfully installed RedHatClusterObservability operator and monitoring stack".to_string(),
})
}
async fn install_receivers(
&self,
sender: &RedHatClusterObservability,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<RedHatClusterObservability>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let receivers = match receivers {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for receiver in receivers {
info!("Installing receiver {}", receiver.name());
let receiver_score = RedHatClusterObservabilityReceiverScore {
receiver,
sender: sender.clone(),
};
receiver_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
}
Ok(PreparationOutcome::Success {
details: "Successfully installed receivers for RedHatClusterObservability".to_string(),
})
}
async fn install_rules(
&self,
_sender: &RedHatClusterObservability,
_inventory: &Inventory,
_rules: Option<Vec<Box<dyn AlertRule<RedHatClusterObservability>>>>,
) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
async fn add_scrape_targets(
&self,
_sender: &RedHatClusterObservability,
_inventory: &Inventory,
_scrape_targets: Option<Vec<Box<dyn ScrapeTarget<RedHatClusterObservability>>>>,
) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
async fn ensure_monitoring_installed(
&self,
_sender: &RedHatClusterObservability,
_inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
}

View File

@@ -1,7 +1,6 @@
use async_trait::async_trait;
use crate::{
interpret::Outcome,
inventory::Inventory,
modules::postgresql::{
K8sPostgreSQLScore,

View File

@@ -2,6 +2,7 @@ pub mod decentralized;
mod failover;
mod ha_cluster;
pub mod ingress;
pub mod monitoring;
pub mod node_exporter;
pub mod opnsense;
pub use failover::*;
@@ -11,12 +12,10 @@ mod http;
pub mod installable;
mod k8s_anywhere;
mod localhost;
pub mod oberservability;
pub mod tenant;
use derive_new::new;
pub use k8s_anywhere::*;
pub use localhost::*;
pub mod k8s;
mod load_balancer;
pub mod router;
mod tftp;

View File

@@ -0,0 +1,256 @@
use std::{
any::Any,
collections::{BTreeMap, HashMap},
net::IpAddr,
};
use async_trait::async_trait;
use kube::api::DynamicObject;
use log::{debug, info};
use serde::{Deserialize, Serialize};
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
topology::{PreparationError, PreparationOutcome, Topology, installable::Installable},
};
use harmony_types::id::Id;
/// Defines the application that sends alerts to receivers,
/// for example Prometheus
#[async_trait]
pub trait AlertSender: Send + Sync + std::fmt::Debug {
fn name(&self) -> String;
}
/// Trait defining how an alert sender is implemented for a specific topology
#[async_trait]
pub trait Observability<S: AlertSender> {
async fn install_alert_sender(
&self,
sender: &S,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError>;
async fn install_receivers(
&self,
sender: &S,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<S>>>>,
) -> Result<PreparationOutcome, PreparationError>;
async fn install_rules(
&self,
sender: &S,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<S>>>>,
) -> Result<PreparationOutcome, PreparationError>;
async fn add_scrape_targets(
&self,
sender: &S,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
) -> Result<PreparationOutcome, PreparationError>;
async fn ensure_monitoring_installed(
&self,
sender: &S,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError>;
}
/// Defines the entity that receives the alerts from a sender. For example Discord, Slack, etc
///
pub trait AlertReceiver<S: AlertSender>: std::fmt::Debug + Send + Sync {
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn AlertReceiver<S>>;
}
/// Defines a generic rule that can be applied to a sender, such as a Prometheus alert rule
pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
fn build_rule(&self) -> Result<serde_json::Value, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn AlertRule<S>>;
}
/// A generic scrape target that can be added to a sender to scrape metrics from, for example a
/// server outside of the cluster
pub trait ScrapeTarget<S: AlertSender>: std::fmt::Debug + Send + Sync {
fn build_scrape_target(&self) -> Result<ExternalScrapeTarget, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn ScrapeTarget<S>>;
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExternalScrapeTarget {
pub ip: IpAddr,
pub port: i32,
pub interval: Option<String>,
pub path: Option<String>,
pub labels: Option<BTreeMap<String, String>>,
}
/// Alerting interpret to install an alert sender on a given topology
#[derive(Debug)]
pub struct AlertingInterpret<S: AlertSender> {
pub sender: S,
pub receivers: Vec<Box<dyn AlertReceiver<S>>>,
pub rules: Vec<Box<dyn AlertRule<S>>>,
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
}
#[async_trait]
impl<S: AlertSender, T: Topology + Observability<S>> Interpret<T> for AlertingInterpret<S> {
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
info!("Configuring alert sender {}", self.sender.name());
topology
.install_alert_sender(&self.sender, inventory)
.await?;
info!("Installing receivers");
topology
.install_receivers(&self.sender, inventory, Some(self.receivers.clone()))
.await?;
info!("Installing rules");
topology
.install_rules(&self.sender, inventory, Some(self.rules.clone()))
.await?;
info!("Adding extra scrape targets");
topology
.add_scrape_targets(&self.sender, inventory, self.scrape_targets.clone())
.await?;
info!("Ensuring alert sender {} is ready", self.sender.name());
topology
.ensure_monitoring_installed(&self.sender, inventory)
.await?;
Ok(Outcome::success(format!(
"successfully installed alert sender {}",
self.sender.name()
)))
}
fn get_name(&self) -> InterpretName {
InterpretName::Alerting
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
impl<S: AlertSender> Clone for Box<dyn AlertReceiver<S>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
impl<S: AlertSender> Clone for Box<dyn AlertRule<S>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
impl<S: AlertSender> Clone for Box<dyn ScrapeTarget<S>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
#[derive(Default)]
pub struct ReceiverInstallPlan {
pub install_operation: Option<Vec<InstallOperation>>,
pub route: Option<AlertRoute>,
pub receiver: Option<serde_yaml::Value>,
}
pub enum InstallOperation {
CreateSecret {
name: String,
data: BTreeMap<String, String>,
},
}
/// Generic routing that can map to various alert sender backends
#[derive(Debug, Clone, Serialize)]
pub struct AlertRoute {
pub receiver: String,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub matchers: Vec<AlertMatcher>,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub group_by: Vec<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub repeat_interval: Option<String>,
#[serde(rename = "continue")]
pub continue_matching: bool,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub children: Vec<AlertRoute>,
}
impl AlertRoute {
pub fn default(name: String) -> Self {
Self {
receiver: name,
matchers: vec![],
group_by: vec![],
repeat_interval: Some("30s".to_string()),
continue_matching: true,
children: vec![],
}
}
}
#[derive(Debug, Clone, Serialize)]
pub struct AlertMatcher {
pub label: String,
pub operator: MatchOp,
pub value: String,
}
#[derive(Debug, Clone)]
pub enum MatchOp {
Eq,
NotEq,
Regex,
}
impl Serialize for MatchOp {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
let op = match self {
MatchOp::Eq => "=",
MatchOp::NotEq => "!=",
MatchOp::Regex => "=~",
};
serializer.serialize_str(op)
}
}

View File

@@ -9,6 +9,7 @@ use std::{
use async_trait::async_trait;
use brocade::PortOperatingMode;
use derive_new::new;
use harmony_k8s::K8sClient;
use harmony_types::{
id::Id,
net::{IpAddress, MacAddress},
@@ -18,7 +19,7 @@ use serde::Serialize;
use crate::executors::ExecutorError;
use super::{LogicalHost, k8s::K8sClient};
use super::LogicalHost;
#[derive(Debug)]
pub struct DHCPStaticEntry {

View File

@@ -1 +0,0 @@
pub mod monitoring;

View File

@@ -1,101 +0,0 @@
use std::{any::Any, collections::HashMap};
use async_trait::async_trait;
use kube::api::DynamicObject;
use log::debug;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
topology::{Topology, installable::Installable},
};
use harmony_types::id::Id;
#[async_trait]
pub trait AlertSender: Send + Sync + std::fmt::Debug {
fn name(&self) -> String;
}
#[derive(Debug)]
pub struct AlertingInterpret<S: AlertSender> {
pub sender: S,
pub receivers: Vec<Box<dyn AlertReceiver<S>>>,
pub rules: Vec<Box<dyn AlertRule<S>>>,
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
}
#[async_trait]
impl<S: AlertSender + Installable<T>, T: Topology> Interpret<T> for AlertingInterpret<S> {
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
debug!("hit sender configure for AlertingInterpret");
self.sender.configure(inventory, topology).await?;
for receiver in self.receivers.iter() {
receiver.install(&self.sender).await?;
}
for rule in self.rules.iter() {
debug!("installing rule: {:#?}", rule);
rule.install(&self.sender).await?;
}
if let Some(targets) = &self.scrape_targets {
for target in targets.iter() {
debug!("installing scrape_target: {:#?}", target);
target.install(&self.sender).await?;
}
}
self.sender.ensure_installed(inventory, topology).await?;
Ok(Outcome::success(format!(
"successfully installed alert sender {}",
self.sender.name()
)))
}
fn get_name(&self) -> InterpretName {
InterpretName::Alerting
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
#[async_trait]
pub trait AlertReceiver<S: AlertSender>: std::fmt::Debug + Send + Sync {
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn AlertReceiver<S>>;
fn as_any(&self) -> &dyn Any;
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String>;
}
#[derive(Debug)]
pub struct AlertManagerReceiver {
pub receiver_config: serde_json::Value,
// FIXME we should not leak k8s here. DynamicObject is k8s specific
pub additional_ressources: Vec<DynamicObject>,
pub route_config: serde_json::Value,
}
#[async_trait]
pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
fn clone_box(&self) -> Box<dyn AlertRule<S>>;
}
#[async_trait]
pub trait ScrapeTarget<S: AlertSender>: std::fmt::Debug + Send + Sync {
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
fn clone_box(&self) -> Box<dyn ScrapeTarget<S>>;
}

View File

@@ -1,10 +1,8 @@
use std::sync::Arc;
use crate::{
executors::ExecutorError,
topology::k8s::{ApplyStrategy, K8sClient},
};
use crate::executors::ExecutorError;
use async_trait::async_trait;
use harmony_k8s::K8sClient;
use k8s_openapi::{
api::{
core::v1::{LimitRange, Namespace, ResourceQuota},
@@ -14,7 +12,7 @@ use k8s_openapi::{
},
apimachinery::pkg::util::intstr::IntOrString,
};
use kube::{Resource, api::DynamicObject};
use kube::Resource;
use log::debug;
use serde::de::DeserializeOwned;
use serde_json::json;
@@ -59,7 +57,6 @@ impl K8sTenantManager {
) -> Result<K, ExecutorError>
where
<K as kube::Resource>::DynamicType: Default,
<K as kube::Resource>::Scope: ApplyStrategy<K>,
{
self.apply_labels(&mut resource, config);
self.k8s_client

View File

@@ -5,6 +5,7 @@ use std::{
use askama::Template;
use async_trait::async_trait;
use harmony_k8s::{DrainOptions, K8sClient, NodeFile};
use harmony_types::id::Id;
use k8s_openapi::api::core::v1::Node;
use kube::{
@@ -15,10 +16,7 @@ use log::{debug, info, warn};
use crate::{
modules::okd::crd::nmstate,
topology::{
HostNetworkConfig, NetworkError, NetworkManager,
k8s::{DrainOptions, K8sClient, NodeFile},
},
topology::{HostNetworkConfig, NetworkError, NetworkManager},
};
/// NetworkManager bond configuration template

View File

@@ -1,5 +1,5 @@
use async_trait::async_trait;
use log::{debug, info, trace};
use log::{debug, info};
use serde::Serialize;
use std::path::PathBuf;

View File

@@ -1,4 +1,5 @@
use async_trait::async_trait;
use harmony_k8s::K8sClient;
use harmony_macros::hurl;
use log::{debug, info, trace, warn};
use non_blank_string_rs::NonBlankString;
@@ -14,7 +15,7 @@ use crate::{
helm::chart::{HelmChartScore, HelmRepository},
},
score::Score,
topology::{HelmCommand, K8sclient, Topology, ingress::Ingress, k8s::K8sClient},
topology::{HelmCommand, K8sclient, Topology, ingress::Ingress},
};
use harmony_types::id::Id;

View File

@@ -2,13 +2,15 @@ use crate::modules::application::{
Application, ApplicationFeature, InstallationError, InstallationOutcome,
};
use crate::modules::monitoring::application_monitoring::application_monitoring_score::ApplicationMonitoringScore;
use crate::modules::monitoring::grafana::grafana::Grafana;
use crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::CRDPrometheus;
use crate::modules::monitoring::kube_prometheus::crd::service_monitor::{
ServiceMonitor, ServiceMonitorSpec,
};
use crate::modules::monitoring::prometheus::Prometheus;
use crate::modules::monitoring::prometheus::helm::prometheus_config::PrometheusConfig;
use crate::topology::MultiTargetTopology;
use crate::topology::ingress::Ingress;
use crate::topology::monitoring::Observability;
use crate::topology::monitoring::{AlertReceiver, AlertRoute};
use crate::{
inventory::Inventory,
modules::monitoring::{
@@ -17,10 +19,6 @@ use crate::{
score::Score,
topology::{HelmCommand, K8sclient, Topology, tenant::TenantManager},
};
use crate::{
modules::prometheus::prometheus::PrometheusMonitoring,
topology::oberservability::monitoring::AlertReceiver,
};
use async_trait::async_trait;
use base64::{Engine as _, engine::general_purpose};
use harmony_secret::SecretManager;
@@ -30,12 +28,13 @@ use kube::api::ObjectMeta;
use log::{debug, info};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use std::sync::{Arc, Mutex};
// TODO: test this
#[derive(Debug, Clone)]
pub struct Monitoring {
pub application: Arc<dyn Application>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<CRDPrometheus>>>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<Prometheus>>>,
}
#[async_trait]
@@ -46,8 +45,7 @@ impl<
+ TenantManager
+ K8sclient
+ MultiTargetTopology
+ PrometheusMonitoring<CRDPrometheus>
+ Grafana
+ Observability<Prometheus>
+ Ingress
+ std::fmt::Debug,
> ApplicationFeature<T> for Monitoring
@@ -74,17 +72,15 @@ impl<
};
let mut alerting_score = ApplicationMonitoringScore {
sender: CRDPrometheus {
namespace: namespace.clone(),
client: topology.k8s_client().await.unwrap(),
service_monitor: vec![app_service_monitor],
sender: Prometheus {
config: Arc::new(Mutex::new(PrometheusConfig::new())),
},
application: self.application.clone(),
receivers: self.alert_receiver.clone(),
};
let ntfy = NtfyScore {
namespace: namespace.clone(),
host: domain,
host: domain.clone(),
};
ntfy.interpret(&Inventory::empty(), topology)
.await
@@ -105,20 +101,28 @@ impl<
debug!("ntfy_default_auth_param: {ntfy_default_auth_param}");
let ntfy_receiver = WebhookReceiver {
name: "ntfy-webhook".to_string(),
url: Url::Url(
url::Url::parse(
format!(
"http://ntfy.{}.svc.cluster.local/rust-web-app?auth={ntfy_default_auth_param}",
namespace.clone()
"http://{domain}/{}?auth={ntfy_default_auth_param}",
self.application.name()
)
.as_str(),
)
.unwrap(),
),
route: AlertRoute::default("ntfy-webhook".to_string()),
};
debug!(
"ntfy webhook receiver \n{:#?}\nntfy topic: {}",
ntfy_receiver.clone(),
self.application.name()
);
alerting_score.receivers.push(Box::new(ntfy_receiver));
alerting_score
.interpret(&Inventory::empty(), topology)

View File

@@ -3,11 +3,13 @@ use std::sync::Arc;
use crate::modules::application::{
Application, ApplicationFeature, InstallationError, InstallationOutcome,
};
use crate::modules::monitoring::application_monitoring::rhobs_application_monitoring_score::ApplicationRHOBMonitoringScore;
use crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::RHOBObservability;
use crate::modules::monitoring::red_hat_cluster_observability::RedHatClusterObservability;
use crate::modules::monitoring::red_hat_cluster_observability::redhat_cluster_observability::RedHatClusterObservabilityScore;
use crate::topology::MultiTargetTopology;
use crate::topology::ingress::Ingress;
use crate::topology::monitoring::Observability;
use crate::topology::monitoring::{AlertReceiver, AlertRoute};
use crate::{
inventory::Inventory,
modules::monitoring::{
@@ -16,10 +18,6 @@ use crate::{
score::Score,
topology::{HelmCommand, K8sclient, Topology, tenant::TenantManager},
};
use crate::{
modules::prometheus::prometheus::PrometheusMonitoring,
topology::oberservability::monitoring::AlertReceiver,
};
use async_trait::async_trait;
use base64::{Engine as _, engine::general_purpose};
use harmony_types::net::Url;
@@ -28,9 +26,10 @@ use log::{debug, info};
#[derive(Debug, Clone)]
pub struct Monitoring {
pub application: Arc<dyn Application>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<RHOBObservability>>>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<RedHatClusterObservability>>>,
}
/// TODO: test this
#[async_trait]
impl<
T: Topology
@@ -41,7 +40,7 @@ impl<
+ MultiTargetTopology
+ Ingress
+ std::fmt::Debug
+ PrometheusMonitoring<RHOBObservability>,
+ Observability<RedHatClusterObservability>,
> ApplicationFeature<T> for Monitoring
{
async fn ensure_installed(
@@ -55,13 +54,14 @@ impl<
.map(|ns| ns.name.clone())
.unwrap_or_else(|| self.application.name());
let mut alerting_score = ApplicationRHOBMonitoringScore {
sender: RHOBObservability {
let mut alerting_score = RedHatClusterObservabilityScore {
sender: RedHatClusterObservability {
namespace: namespace.clone(),
client: topology.k8s_client().await.unwrap(),
resource_selector: todo!(),
},
application: self.application.clone(),
receivers: self.alert_receiver.clone(),
rules: vec![],
scrape_targets: None,
};
let domain = topology
.get_domain("ntfy")
@@ -97,12 +97,15 @@ impl<
url::Url::parse(
format!(
"http://{domain}/{}?auth={ntfy_default_auth_param}",
self.application.name()
)
.as_str(),
)
.unwrap(),
),
route: AlertRoute::default("ntfy-webhook".to_string()),
};
debug!(
"ntfy webhook receiver \n{:#?}\nntfy topic: {}",

View File

@@ -1,8 +1,9 @@
use std::sync::Arc;
use harmony_k8s::K8sClient;
use log::{debug, info};
use crate::{interpret::InterpretError, topology::k8s::K8sClient};
use crate::interpret::InterpretError;
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ArgoScope {

View File

@@ -44,6 +44,12 @@ pub struct BrocadeSwitchAuth {
pub password: String,
}
impl BrocadeSwitchAuth {
pub fn user_pass(username: String, password: String) -> Self {
Self { username, password }
}
}
#[derive(Secret, Clone, Debug, JsonSchema, Serialize, Deserialize)]
pub struct BrocadeSnmpAuth {
pub username: String,

View File

@@ -1,3 +1,4 @@
use harmony_k8s::K8sClient;
use std::sync::Arc;
use async_trait::async_trait;
@@ -11,7 +12,7 @@ use crate::{
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
score::Score,
topology::{K8sclient, Topology, k8s::K8sClient},
topology::{K8sclient, Topology},
};
#[derive(Clone, Debug, Serialize)]

View File

@@ -54,6 +54,12 @@ pub enum HarmonyDiscoveryStrategy {
SUBNET { cidr: cidr::Ipv4Cidr, port: u16 },
}
impl Default for HarmonyDiscoveryStrategy {
fn default() -> Self {
HarmonyDiscoveryStrategy::MDNS
}
}
#[async_trait]
impl<T: Topology> Interpret<T> for DiscoverInventoryAgentInterpret {
async fn execute(

View File

@@ -3,7 +3,8 @@ use std::sync::Arc;
use async_trait::async_trait;
use log::warn;
use crate::topology::{FailoverTopology, K8sclient, k8s::K8sClient};
use crate::topology::{FailoverTopology, K8sclient};
use harmony_k8s::K8sClient;
#[async_trait]
impl<T: K8sclient> K8sclient for FailoverTopology<T> {

View File

@@ -1,5 +1,4 @@
use async_trait::async_trait;
use k8s_openapi::NamespaceResourceScope;
use kube::Resource;
use log::info;
use serde::{Serialize, de::DeserializeOwned};
@@ -29,7 +28,7 @@ impl<K: Resource + std::fmt::Debug> K8sResourceScore<K> {
}
impl<
K: Resource<Scope = NamespaceResourceScope>
K: Resource
+ std::fmt::Debug
+ Sync
+ DeserializeOwned
@@ -61,7 +60,7 @@ pub struct K8sResourceInterpret<K: Resource + std::fmt::Debug + Sync + Send> {
#[async_trait]
impl<
K: Resource<Scope = NamespaceResourceScope>
K: Resource
+ Clone
+ std::fmt::Debug
+ DeserializeOwned
@@ -109,7 +108,7 @@ where
topology
.k8s_client()
.await
.expect("Environment should provide enough information to instanciate a client")
.map_err(|e| InterpretError::new(format!("Failed to get k8s client : {e}")))?
.apply_many(&self.score.resource, self.score.namespace.as_deref())
.await?;

View File

@@ -15,10 +15,12 @@ pub mod load_balancer;
pub mod monitoring;
pub mod nats;
pub mod network;
pub mod node_health;
pub mod okd;
pub mod openbao;
pub mod opnsense;
pub mod postgresql;
pub mod prometheus;
pub mod storage;
pub mod tenant;
pub mod tftp;
pub mod zitadel;

View File

@@ -1,99 +1,38 @@
use std::any::Any;
use std::collections::{BTreeMap, HashMap};
use async_trait::async_trait;
use harmony_types::k8s_name::K8sName;
use k8s_openapi::api::core::v1::Secret;
use kube::Resource;
use kube::api::{DynamicObject, ObjectMeta};
use log::{debug, trace};
use crate::modules::monitoring::kube_prometheus::KubePrometheus;
use crate::modules::monitoring::okd::OpenshiftClusterAlertSender;
use crate::modules::monitoring::red_hat_cluster_observability::RedHatClusterObservability;
use crate::topology::monitoring::{AlertRoute, InstallOperation, ReceiverInstallPlan};
use crate::{interpret::InterpretError, topology::monitoring::AlertReceiver};
use harmony_types::net::Url;
use serde::Serialize;
use serde_json::json;
use serde_yaml::{Mapping, Value};
use crate::infra::kube::kube_resource_to_dynamic;
use crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::{
AlertmanagerConfig, AlertmanagerConfigSpec, CRDPrometheus,
};
use crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::RHOBObservability;
use crate::modules::monitoring::okd::OpenshiftClusterAlertSender;
use crate::topology::oberservability::monitoring::AlertManagerReceiver;
use crate::{
interpret::{InterpretError, Outcome},
modules::monitoring::{
kube_prometheus::{
prometheus::{KubePrometheus, KubePrometheusReceiver},
types::{AlertChannelConfig, AlertManagerChannelConfig},
},
prometheus::prometheus::{Prometheus, PrometheusReceiver},
},
topology::oberservability::monitoring::AlertReceiver,
};
use harmony_types::net::Url;
use std::collections::BTreeMap;
#[derive(Debug, Clone, Serialize)]
pub struct DiscordWebhook {
    pub name: K8sName,
    pub url: Url,
    pub selectors: Vec<HashMap<String, String>>,
}
#[derive(Debug, Clone, Serialize)]
pub struct DiscordReceiver {
    pub name: String,
    pub url: Url,
    pub selectors: Vec<HashMap<String, String>>,
    pub route: AlertRoute,
}
impl DiscordWebhook {
    fn get_receiver_config(&self) -> Result<AlertManagerReceiver, String> {
        let secret_name = format!("{}-secret", self.name.clone());
        let webhook_key = format!("{}", self.url.clone());
        let mut string_data = BTreeMap::new();
        string_data.insert("webhook-url".to_string(), webhook_key.clone());
        let secret = Secret {
            metadata: kube::core::ObjectMeta {
                name: Some(secret_name.clone()),
                ..Default::default()
            },
            string_data: Some(string_data),
            type_: Some("Opaque".to_string()),
            ..Default::default()
        };
        let mut matchers: Vec<String> = Vec::new();
        for selector in &self.selectors {
            trace!("selector: {:#?}", selector);
            for (k, v) in selector {
                matchers.push(format!("{} = {}", k, v));
            }
        }
        Ok(AlertManagerReceiver {
            additional_ressources: vec![kube_resource_to_dynamic(&secret)?],
            receiver_config: json!({
                "name": self.name,
                "discord_configs": [
                    {
                        "webhook_url": self.url.clone(),
                        "title": "{{ template \"discord.default.title\" . }}",
                        "message": "{{ template \"discord.default.message\" . }}"
                    }
                ]
            }),
            route_config: json!({
                "receiver": self.name,
                "matchers": matchers,
            }),
        })
    }
}
impl AlertReceiver<OpenshiftClusterAlertSender> for DiscordReceiver {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
        let receiver_block = serde_yaml::to_value(json!({
            "name": self.name,
            "discord_configs": [{
                "webhook_url": format!("{}", self.url),
                "title": "{{ template \"discord.default.title\" . }}",
                "message": "{{ template \"discord.default.message\" . }}"
            }]
        }))
        .map_err(|e| InterpretError::new(e.to_string()))?;
        Ok(ReceiverInstallPlan {
            install_operation: None,
            route: Some(self.route.clone()),
            receiver: Some(receiver_block),
        })
    }
}
#[async_trait]
impl AlertReceiver<OpenshiftClusterAlertSender> for DiscordWebhook {
async fn install(
&self,
sender: &OpenshiftClusterAlertSender,
) -> Result<Outcome, InterpretError> {
todo!()
}
    fn name(&self) -> String {
        self.name.clone().to_string()
    }
@@ -102,309 +41,77 @@ impl AlertReceiver<OpenshiftClusterAlertSender> for DiscordWebhook {
fn clone_box(&self) -> Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
Box::new(self.clone())
}
fn as_any(&self) -> &dyn Any {
todo!()
}
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
self.get_receiver_config()
}
}
#[async_trait]
impl AlertReceiver<RHOBObservability> for DiscordWebhook {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &RHOBObservability) -> Result<Outcome, InterpretError> {
let ns = sender.namespace.clone();
let config = self.get_receiver_config()?;
for resource in config.additional_ressources.iter() {
todo!("can I apply a dynamicresource");
// sender.client.apply(resource, Some(&ns)).await;
}
let spec = crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::AlertmanagerConfigSpec {
data: json!({
"route": {
"receiver": self.name,
},
"receivers": [
config.receiver_config
]
}),
};
let alertmanager_configs = crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::AlertmanagerConfig {
metadata: ObjectMeta {
name: Some(self.name.clone().to_string()),
labels: Some(std::collections::BTreeMap::from([(
"alertmanagerConfig".to_string(),
"enabled".to_string(),
)])),
namespace: Some(sender.namespace.clone()),
..Default::default()
},
spec,
};
debug!(
"alertmanager_configs yaml:\n{:#?}",
serde_yaml::to_string(&alertmanager_configs)
);
debug!(
"alert manager configs: \n{:#?}",
alertmanager_configs.clone()
);
sender
.client
.apply(&alertmanager_configs, Some(&sender.namespace))
.await?;
Ok(Outcome::success(format!(
"installed rhob-alertmanagerconfigs for {}",
self.name
)))
}
fn name(&self) -> String {
"webhook-receiver".to_string()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<RHOBObservability>> {
Box::new(self.clone())
}
fn as_any(&self) -> &dyn Any {
self
}
}
#[async_trait]
impl AlertReceiver<CRDPrometheus> for DiscordWebhook {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
    async fn install(&self, sender: &CRDPrometheus) -> Result<Outcome, InterpretError> {
        let ns = sender.namespace.clone();
        let secret_name = format!("{}-secret", self.name.clone());
        let webhook_key = format!("{}", self.url.clone());
        let mut string_data = BTreeMap::new();
        string_data.insert("webhook-url".to_string(), webhook_key.clone());
        let secret = Secret {
            metadata: kube::core::ObjectMeta {
                name: Some(secret_name.clone()),
                ..Default::default()
            },
            string_data: Some(string_data),
            type_: Some("Opaque".to_string()),
            ..Default::default()
        };
        let _ = sender.client.apply(&secret, Some(&ns)).await;
        let spec = AlertmanagerConfigSpec {
            data: json!({
                "route": {
                    "receiver": self.name,
                },
                "receivers": [
                    {
                        "name": self.name,
                        "discordConfigs": [
                            {
                                "apiURL": {
                                    "name": secret_name,
                                    "key": "webhook-url",
                                },
                                "title": "{{ template \"discord.default.title\" . }}",
                                "message": "{{ template \"discord.default.message\" . }}"
                            }
                        ]
                    }
                ]
            }),
        };
        let alertmanager_configs = AlertmanagerConfig {
            metadata: ObjectMeta {
                name: Some(self.name.clone().to_string()),
                labels: Some(std::collections::BTreeMap::from([(
                    "alertmanagerConfig".to_string(),
                    "enabled".to_string(),
                )])),
                namespace: Some(ns),
                ..Default::default()
            },
            spec,
        };
        sender
            .client
            .apply(&alertmanager_configs, Some(&sender.namespace))
            .await?;
        Ok(Outcome::success(format!(
            "installed crd-alertmanagerconfigs for {}",
            self.name
        )))
    }
    fn name(&self) -> String {
        "discord-webhook".to_string()
    }
    fn clone_box(&self) -> Box<dyn AlertReceiver<CRDPrometheus>> {
        Box::new(self.clone())
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}
impl AlertReceiver<RedHatClusterObservability> for DiscordReceiver {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
        let secret_name = format!("{}-secret", self.name.clone());
        let mut string_data = BTreeMap::new();
        string_data.insert("webhook-url".to_string(), format!("{}", self.url));
        let receiver_config = json!({
            "name": self.name,
            "discordConfigs": [
                {
                    "apiURL": {
                        "key": "webhook-url",
                        "name": format!("{}-secret", self.name)
                    },
                    "title": "{{ template \"discord.default.title\" . }}",
                    "message": "{{ template \"discord.default.message\" . }}"
                }
            ]
        });
        Ok(ReceiverInstallPlan {
            install_operation: Some(vec![InstallOperation::CreateSecret {
                name: secret_name,
                data: string_data,
            }]),
            route: Some(self.route.clone()),
            receiver: Some(
                serde_yaml::to_value(receiver_config)
                    .map_err(|e| InterpretError::new(e.to_string()))?,
            ),
        })
    }
    fn name(&self) -> String {
        self.name.clone()
    }
    fn clone_box(&self) -> Box<dyn AlertReceiver<RedHatClusterObservability>> {
        Box::new(self.clone())
    }
}
#[async_trait]
impl AlertReceiver<Prometheus> for DiscordWebhook {
    fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
        todo!()
    }
    async fn install(&self, sender: &Prometheus) -> Result<Outcome, InterpretError> {
        sender.install_receiver(self).await
    }
    fn name(&self) -> String {
        "discord-webhook".to_string()
    }
    fn clone_box(&self) -> Box<dyn AlertReceiver<Prometheus>> {
        Box::new(self.clone())
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}
impl AlertReceiver<KubePrometheus> for DiscordReceiver {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
        let receiver_block = serde_yaml::to_value(json!({
            "name": self.name,
            "discord_configs": [{
                "webhook_url": format!("{}", self.url),
                "title": "{{ template \"discord.default.title\" . }}",
                "message": "{{ template \"discord.default.message\" . }}"
            }]
        }))
        .map_err(|e| InterpretError::new(e.to_string()))?;
        Ok(ReceiverInstallPlan {
            install_operation: None,
            route: Some(self.route.clone()),
            receiver: Some(receiver_block),
        })
    }
    fn name(&self) -> String {
        self.name.clone()
    }
    fn clone_box(&self) -> Box<dyn AlertReceiver<KubePrometheus>> {
        Box::new(self.clone())
    }
}
#[async_trait]
impl PrometheusReceiver for DiscordWebhook {
fn name(&self) -> String {
self.name.clone().to_string()
}
async fn configure_receiver(&self) -> AlertManagerChannelConfig {
self.get_config().await
}
}
#[async_trait]
impl AlertReceiver<KubePrometheus> for DiscordWebhook {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &KubePrometheus) -> Result<Outcome, InterpretError> {
sender.install_receiver(self).await
}
fn clone_box(&self) -> Box<dyn AlertReceiver<KubePrometheus>> {
Box::new(self.clone())
}
fn name(&self) -> String {
"discord-webhook".to_string()
}
fn as_any(&self) -> &dyn Any {
self
}
}
#[async_trait]
impl KubePrometheusReceiver for DiscordWebhook {
fn name(&self) -> String {
self.name.clone().to_string()
}
async fn configure_receiver(&self) -> AlertManagerChannelConfig {
self.get_config().await
}
}
#[async_trait]
impl AlertChannelConfig for DiscordWebhook {
async fn get_config(&self) -> AlertManagerChannelConfig {
let channel_global_config = None;
let channel_receiver = self.alert_channel_receiver().await;
let channel_route = self.alert_channel_route().await;
AlertManagerChannelConfig {
channel_global_config,
channel_receiver,
channel_route,
}
}
}
impl DiscordWebhook {
async fn alert_channel_route(&self) -> serde_yaml::Value {
let mut route = Mapping::new();
route.insert(
Value::String("receiver".to_string()),
Value::String(self.name.clone().to_string()),
);
route.insert(
Value::String("matchers".to_string()),
Value::Sequence(vec![Value::String("alertname!=Watchdog".to_string())]),
);
route.insert(Value::String("continue".to_string()), Value::Bool(true));
Value::Mapping(route)
}
async fn alert_channel_receiver(&self) -> serde_yaml::Value {
let mut receiver = Mapping::new();
receiver.insert(
Value::String("name".to_string()),
Value::String(self.name.clone().to_string()),
);
let mut discord_config = Mapping::new();
discord_config.insert(
Value::String("webhook_url".to_string()),
Value::String(self.url.to_string()),
);
receiver.insert(
Value::String("discord_configs".to_string()),
Value::Sequence(vec![Value::Mapping(discord_config)]),
);
Value::Mapping(receiver)
}
}
#[cfg(test)]
mod tests {
use super::*;
#[tokio::test]
async fn discord_serialize_should_match() {
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
url: Url::Url(url::Url::parse("https://discord.i.dont.exist.com").unwrap()),
selectors: vec![],
};
let discord_receiver_receiver =
serde_yaml::to_string(&discord_receiver.alert_channel_receiver().await).unwrap();
println!("receiver \n{:#}", discord_receiver_receiver);
let discord_receiver_receiver_yaml = r#"name: test-discord
discord_configs:
- webhook_url: https://discord.i.dont.exist.com/
"#
.to_string();
let discord_receiver_route =
serde_yaml::to_string(&discord_receiver.alert_channel_route().await).unwrap();
println!("route \n{:#}", discord_receiver_route);
let discord_receiver_route_yaml = r#"receiver: test-discord
matchers:
- alertname!=Watchdog
continue: true
"#
.to_string();
assert_eq!(discord_receiver_receiver, discord_receiver_receiver_yaml);
assert_eq!(discord_receiver_route, discord_receiver_route_yaml);
}
}


@@ -1,25 +1,13 @@
use std::any::Any;
use async_trait::async_trait;
use serde::Serialize;
use serde_json::json;
use crate::{
    interpret::InterpretError,
    modules::monitoring::{
        kube_prometheus::KubePrometheus, okd::OpenshiftClusterAlertSender, prometheus::Prometheus,
        red_hat_cluster_observability::RedHatClusterObservability,
    },
    topology::monitoring::{AlertReceiver, AlertRoute, ReceiverInstallPlan},
};
use harmony_types::net::Url;
@@ -27,281 +15,115 @@ use harmony_types::net::Url;
pub struct WebhookReceiver {
    pub name: String,
    pub url: Url,
    pub route: AlertRoute,
}
#[async_trait]
impl AlertReceiver<RHOBObservability> for WebhookReceiver {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &RHOBObservability) -> Result<Outcome, InterpretError> {
let spec = crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::AlertmanagerConfigSpec {
data: json!({
"route": {
"receiver": self.name,
},
"receivers": [
{
"name": self.name,
"webhookConfigs": [
{
"url": self.url,
"httpConfig": {
"tlsConfig": {
"insecureSkipVerify": true
}
}
}
]
}
]
}),
};
let alertmanager_configs = crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::AlertmanagerConfig {
metadata: ObjectMeta {
name: Some(self.name.clone()),
labels: Some(std::collections::BTreeMap::from([(
"alertmanagerConfig".to_string(),
"enabled".to_string(),
)])),
namespace: Some(sender.namespace.clone()),
..Default::default()
},
spec,
};
debug!(
"alert manager configs: \n{:#?}",
alertmanager_configs.clone()
);
sender
.client
.apply(&alertmanager_configs, Some(&sender.namespace))
.await?;
Ok(Outcome::success(format!(
"installed rhob-alertmanagerconfigs for {}",
self.name
)))
}
fn name(&self) -> String {
"webhook-receiver".to_string()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<RHOBObservability>> {
Box::new(self.clone())
}
fn as_any(&self) -> &dyn Any {
self
}
}
#[async_trait]
impl AlertReceiver<CRDPrometheus> for WebhookReceiver {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &CRDPrometheus) -> Result<Outcome, InterpretError> {
let spec = crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::AlertmanagerConfigSpec {
data: json!({
"route": {
"receiver": self.name,
},
"receivers": [
{
"name": self.name,
"webhookConfigs": [
{
"url": self.url,
}
]
}
]
}),
};
let alertmanager_configs = crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::AlertmanagerConfig {
metadata: ObjectMeta {
name: Some(self.name.clone()),
labels: Some(std::collections::BTreeMap::from([(
"alertmanagerConfig".to_string(),
"enabled".to_string(),
)])),
namespace: Some(sender.namespace.clone()),
..Default::default()
},
spec,
};
debug!(
"alert manager configs: \n{:#?}",
alertmanager_configs.clone()
);
sender
.client
.apply(&alertmanager_configs, Some(&sender.namespace))
.await?;
Ok(Outcome::success(format!(
"installed crd-alertmanagerconfigs for {}",
self.name
)))
}
fn name(&self) -> String {
"webhook-receiver".to_string()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<CRDPrometheus>> {
Box::new(self.clone())
}
fn as_any(&self) -> &dyn Any {
self
}
}
#[async_trait]
impl AlertReceiver<Prometheus> for WebhookReceiver {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &Prometheus) -> Result<Outcome, InterpretError> {
sender.install_receiver(self).await
}
fn name(&self) -> String {
"webhook-receiver".to_string()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<Prometheus>> {
Box::new(self.clone())
}
fn as_any(&self) -> &dyn Any {
self
}
}
#[async_trait]
impl PrometheusReceiver for WebhookReceiver {
fn name(&self) -> String {
self.name.clone()
}
async fn configure_receiver(&self) -> AlertManagerChannelConfig {
self.get_config().await
}
}
#[async_trait]
impl AlertReceiver<KubePrometheus> for WebhookReceiver {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &KubePrometheus) -> Result<Outcome, InterpretError> {
sender.install_receiver(self).await
}
fn name(&self) -> String {
"webhook-receiver".to_string()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<KubePrometheus>> {
Box::new(self.clone())
}
fn as_any(&self) -> &dyn Any {
self
}
}
#[async_trait]
impl KubePrometheusReceiver for WebhookReceiver {
fn name(&self) -> String {
self.name.clone()
}
async fn configure_receiver(&self) -> AlertManagerChannelConfig {
self.get_config().await
}
}
#[async_trait]
impl AlertChannelConfig for WebhookReceiver {
async fn get_config(&self) -> AlertManagerChannelConfig {
let channel_global_config = None;
let channel_receiver = self.alert_channel_receiver().await;
let channel_route = self.alert_channel_route().await;
AlertManagerChannelConfig {
channel_global_config,
channel_receiver,
channel_route,
}
}
}
impl WebhookReceiver {
async fn alert_channel_route(&self) -> serde_yaml::Value {
let mut route = Mapping::new();
route.insert(
Value::String("receiver".to_string()),
Value::String(self.name.clone()),
);
route.insert(
Value::String("matchers".to_string()),
Value::Sequence(vec![Value::String("alertname!=Watchdog".to_string())]),
);
route.insert(Value::String("continue".to_string()), Value::Bool(true));
Value::Mapping(route)
    }
    fn build_receiver(&self) -> serde_json::Value {
json!({
"name": self.name,
"webhookConfigs": [
{
"url": self.url,
"httpConfig": {
"tlsConfig": {
"insecureSkipVerify": true
}
}
}
]})
}
async fn alert_channel_receiver(&self) -> serde_yaml::Value {
let mut receiver = Mapping::new();
receiver.insert(
Value::String("name".to_string()),
Value::String(self.name.clone()),
);
let mut webhook_config = Mapping::new();
webhook_config.insert(
Value::String("url".to_string()),
Value::String(self.url.to_string()),
);
receiver.insert(
Value::String("webhook_configs".to_string()),
Value::Sequence(vec![Value::Mapping(webhook_config)]),
);
Value::Mapping(receiver)
    }
    fn build_route(&self) -> serde_json::Value {
json!({
"name": self.name})
}
}
#[cfg(test)]
mod tests {
    use super::*;
    #[tokio::test]
    async fn webhook_serialize_should_match() {
        let webhook_receiver = WebhookReceiver {
            name: "test-webhook".to_string(),
            url: Url::Url(url::Url::parse("https://webhook.i.dont.exist.com").unwrap()),
        };
        let webhook_receiver_receiver =
            serde_yaml::to_string(&webhook_receiver.alert_channel_receiver().await).unwrap();
        println!("receiver \n{:#}", webhook_receiver_receiver);
        let webhook_receiver_receiver_yaml = r#"name: test-webhook
webhook_configs:
- url: https://webhook.i.dont.exist.com/
"#
        .to_string();
        let webhook_receiver_route =
            serde_yaml::to_string(&webhook_receiver.alert_channel_route().await).unwrap();
        println!("route \n{:#}", webhook_receiver_route);
        let webhook_receiver_route_yaml = r#"receiver: test-webhook
matchers:
- alertname!=Watchdog
continue: true
"#
        .to_string();
        assert_eq!(webhook_receiver_receiver, webhook_receiver_receiver_yaml);
        assert_eq!(webhook_receiver_route, webhook_receiver_route_yaml);
    }
}
impl AlertReceiver<OpenshiftClusterAlertSender> for WebhookReceiver {
    fn name(&self) -> String {
        self.name.clone()
    }
    fn clone_box(&self) -> Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
        Box::new(self.clone())
    }
    fn build(&self) -> Result<crate::topology::monitoring::ReceiverInstallPlan, InterpretError> {
        let receiver = self.build_receiver();
        let receiver =
            serde_yaml::to_value(receiver).map_err(|e| InterpretError::new(e.to_string()))?;
        Ok(ReceiverInstallPlan {
            install_operation: None,
            route: Some(self.route.clone()),
            receiver: Some(receiver),
        })
    }
}
impl AlertReceiver<RedHatClusterObservability> for WebhookReceiver {
fn name(&self) -> String {
self.name.clone()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<RedHatClusterObservability>> {
Box::new(self.clone())
}
fn build(&self) -> Result<crate::topology::monitoring::ReceiverInstallPlan, InterpretError> {
let receiver = self.build_receiver();
let receiver =
serde_yaml::to_value(receiver).map_err(|e| InterpretError::new(e.to_string()))?;
Ok(ReceiverInstallPlan {
install_operation: None,
route: Some(self.route.clone()),
receiver: Some(receiver),
})
}
}
impl AlertReceiver<KubePrometheus> for WebhookReceiver {
fn name(&self) -> String {
self.name.clone()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<KubePrometheus>> {
Box::new(self.clone())
}
fn build(&self) -> Result<crate::topology::monitoring::ReceiverInstallPlan, InterpretError> {
let receiver = self.build_receiver();
let receiver =
serde_yaml::to_value(receiver).map_err(|e| InterpretError::new(e.to_string()))?;
Ok(ReceiverInstallPlan {
install_operation: None,
route: Some(self.route.clone()),
receiver: Some(receiver),
})
}
}
impl AlertReceiver<Prometheus> for WebhookReceiver {
fn name(&self) -> String {
self.name.clone()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<Prometheus>> {
Box::new(self.clone())
}
fn build(&self) -> Result<crate::topology::monitoring::ReceiverInstallPlan, InterpretError> {
let receiver = self.build_receiver();
let receiver =
serde_yaml::to_value(receiver).map_err(|e| InterpretError::new(e.to_string()))?;
Ok(ReceiverInstallPlan {
install_operation: None,
route: Some(self.route.clone()),
receiver: Some(receiver),
})
}
}


@@ -0,0 +1,15 @@
use crate::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;
pub fn high_http_error_rate() -> PrometheusAlertRule {
let expression = r#"(
sum(rate(http_requests_total{status=~"5.."}[5m])) by (job, route, service)
/
sum(rate(http_requests_total[5m])) by (job, route, service)
) > 0.05 and sum(rate(http_requests_total[5m])) by (job, route, service) > 10"#;
PrometheusAlertRule::new("HighApplicationErrorRate", expression)
.for_duration("10m")
.label("severity", "warning")
.annotation("summary", "High HTTP error rate on {{ $labels.job }}")
.annotation("description", "Job {{ $labels.job }} (route {{ $labels.route }}) has an error rate > 5% over the last 10m.")
}
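
For reference, a rule built this way corresponds to a standard Prometheus rule group roughly like the following. The group name and exact nesting here are illustrative assumptions; the alert fields simply mirror the builder calls above, and the real output depends on how `PrometheusAlertRule` serializes:

```yaml
# Illustrative sketch only: group name and nesting are assumed,
# the alert fields mirror the builder calls in high_http_error_rate().
groups:
  - name: application-alerts
    rules:
      - alert: HighApplicationErrorRate
        expr: |
          (
            sum(rate(http_requests_total{status=~"5.."}[5m])) by (job, route, service)
            /
            sum(rate(http_requests_total[5m])) by (job, route, service)
          ) > 0.05 and sum(rate(http_requests_total[5m])) by (job, route, service) > 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High HTTP error rate on {{ $labels.job }}"
          description: "Job {{ $labels.job }} (route {{ $labels.route }}) has an error rate > 5% over the last 10m."
```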


@@ -1 +1,2 @@
pub mod alerts;
pub mod prometheus_alert_rule;


@@ -1,79 +1,13 @@
use std::collections::HashMap;
use async_trait::async_trait;
use serde::Serialize;
use crate::{
    interpret::InterpretError,
    modules::monitoring::{kube_prometheus::KubePrometheus, okd::OpenshiftClusterAlertSender},
    topology::monitoring::AlertRule,
};
#[async_trait]
impl AlertRule<KubePrometheus> for AlertManagerRuleGroup {
async fn install(&self, sender: &KubePrometheus) -> Result<Outcome, InterpretError> {
sender.install_rule(self).await
}
fn clone_box(&self) -> Box<dyn AlertRule<KubePrometheus>> {
Box::new(self.clone())
}
}
#[async_trait]
impl AlertRule<Prometheus> for AlertManagerRuleGroup {
async fn install(&self, sender: &Prometheus) -> Result<Outcome, InterpretError> {
sender.install_rule(self).await
}
fn clone_box(&self) -> Box<dyn AlertRule<Prometheus>> {
Box::new(self.clone())
}
}
#[async_trait]
impl PrometheusRule for AlertManagerRuleGroup {
fn name(&self) -> String {
self.name.clone()
}
async fn configure_rule(&self) -> AlertManagerAdditionalPromRules {
let mut additional_prom_rules = BTreeMap::new();
additional_prom_rules.insert(
self.name.clone(),
AlertGroup {
groups: vec![self.clone()],
},
);
AlertManagerAdditionalPromRules {
rules: additional_prom_rules,
}
}
}
#[async_trait]
impl KubePrometheusRule for AlertManagerRuleGroup {
fn name(&self) -> String {
self.name.clone()
}
async fn configure_rule(&self) -> AlertManagerAdditionalPromRules {
let mut additional_prom_rules = BTreeMap::new();
additional_prom_rules.insert(
self.name.clone(),
AlertGroup {
groups: vec![self.clone()],
},
);
AlertManagerAdditionalPromRules {
rules: additional_prom_rules,
}
}
}
impl AlertManagerRuleGroup {
pub fn new(name: &str, rules: Vec<PrometheusAlertRule>) -> AlertManagerRuleGroup {
AlertManagerRuleGroup {
@@ -129,3 +63,55 @@ impl PrometheusAlertRule {
self
}
}
impl AlertRule<OpenshiftClusterAlertSender> for AlertManagerRuleGroup {
fn build_rule(&self) -> Result<serde_json::Value, InterpretError> {
let name = self.name.clone();
let mut rules: Vec<crate::modules::monitoring::okd::crd::alerting_rules::Rule> = vec![];
for rule in self.rules.clone() {
rules.push(rule.into())
}
let rule_groups =
vec![crate::modules::monitoring::okd::crd::alerting_rules::RuleGroup { name, rules }];
Ok(serde_json::to_value(rule_groups).map_err(|e| InterpretError::new(e.to_string()))?)
}
fn name(&self) -> String {
self.name.clone()
}
fn clone_box(&self) -> Box<dyn AlertRule<OpenshiftClusterAlertSender>> {
Box::new(self.clone())
}
}
impl AlertRule<KubePrometheus> for AlertManagerRuleGroup {
fn build_rule(&self) -> Result<serde_json::Value, InterpretError> {
let name = self.name.clone();
let mut rules: Vec<
crate::modules::monitoring::kube_prometheus::crd::crd_prometheus_rules::Rule,
> = vec![];
for rule in self.rules.clone() {
rules.push(rule.into())
}
let rule_groups = vec![
crate::modules::monitoring::kube_prometheus::crd::crd_prometheus_rules::RuleGroup {
name,
rules,
},
];
Ok(serde_json::to_value(rule_groups).map_err(|e| InterpretError::new(e.to_string()))?)
}
fn name(&self) -> String {
self.name.clone()
}
fn clone_box(&self) -> Box<dyn AlertRule<KubePrometheus>> {
Box::new(self.clone())
}
}


@@ -5,32 +5,26 @@ use serde::Serialize;
use crate::{
    interpret::Interpret,
    modules::{application::Application, monitoring::prometheus::Prometheus},
    score::Score,
    topology::{
        K8sclient, Topology,
        monitoring::{AlertReceiver, AlertingInterpret, Observability, ScrapeTarget},
    },
};
#[derive(Debug, Clone, Serialize)]
pub struct ApplicationMonitoringScore {
    pub sender: Prometheus,
    pub application: Arc<dyn Application>,
    pub receivers: Vec<Box<dyn AlertReceiver<Prometheus>>>,
}
impl<T: Topology + Observability<Prometheus> + K8sclient> Score<T> for ApplicationMonitoringScore {
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
debug!("creating alerting interpret");
//TODO will need to use k8sclient to apply service monitors or find a way to pass
//them to the AlertingInterpret potentially via Sender Prometheus
Box::new(AlertingInterpret {
sender: self.sender.clone(),
receivers: self.receivers.clone(),


@@ -9,28 +9,27 @@ use crate::{
    inventory::Inventory,
    modules::{
        application::Application,
        monitoring::red_hat_cluster_observability::RedHatClusterObservability,
    },
    score::Score,
    topology::{
        Topology,
        monitoring::{AlertReceiver, AlertingInterpret, Observability},
    },
};
use harmony_types::id::Id;
#[derive(Debug, Clone, Serialize)]
pub struct ApplicationRedHatClusterMonitoringScore {
    pub sender: RedHatClusterObservability,
    pub application: Arc<dyn Application>,
    pub receivers: Vec<Box<dyn AlertReceiver<RedHatClusterObservability>>>,
}
impl<T: Topology + Observability<RedHatClusterObservability>> Score<T>
    for ApplicationRedHatClusterMonitoringScore
{
    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(ApplicationRedHatClusterMonitoringInterpret {
            score: self.clone(),
        })
    }
@@ -44,38 +43,28 @@ impl<T: Topology + PrometheusMonitoring<RHOBObservability>> Score<T>
}
#[derive(Debug)]
pub struct ApplicationRedHatClusterMonitoringInterpret {
    score: ApplicationRedHatClusterMonitoringScore,
}
#[async_trait]
impl<T: Topology + Observability<RedHatClusterObservability>> Interpret<T>
    for ApplicationRedHatClusterMonitoringInterpret
{
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
        //TODO will need to use k8sclient to apply crd ServiceMonitor or find a way to pass
        //them to the AlertingInterpret potentially via Sender RedHatClusterObservability
        let alerting_interpret = AlertingInterpret {
            sender: self.score.sender.clone(),
            receivers: self.score.receivers.clone(),
            rules: vec![],
            scrape_targets: None,
        };
        alerting_interpret.execute(inventory, topology).await
}
fn get_name(&self) -> InterpretName {


@@ -1,17 +1,41 @@
use async_trait::async_trait;
use k8s_openapi::Resource;
use serde::Serialize;
use crate::{
inventory::Inventory,
topology::{PreparationError, PreparationOutcome},
};
use crate::topology::monitoring::{AlertReceiver, AlertRule, AlertSender, ScrapeTarget};
#[async_trait]
pub trait Grafana {
async fn ensure_grafana_operator(
&self,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError>;
async fn install_grafana(&self) -> Result<PreparationOutcome, PreparationError>;
}
#[derive(Debug, Clone, Serialize)]
pub struct Grafana {
pub namespace: String,
}
impl AlertSender for Grafana {
fn name(&self) -> String {
"grafana".to_string()
}
}
impl Serialize for Box<dyn AlertReceiver<Grafana>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
impl Serialize for Box<dyn AlertRule<Grafana>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
impl Serialize for Box<dyn ScrapeTarget<Grafana>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}


@@ -0,0 +1,32 @@
use serde::Serialize;
use crate::{
modules::monitoring::grafana::grafana::Grafana,
score::Score,
topology::{
Topology,
monitoring::{AlertReceiver, AlertRule, AlertingInterpret, Observability, ScrapeTarget},
},
};
#[derive(Clone, Debug, Serialize)]
pub struct GrafanaAlertingScore {
pub receivers: Vec<Box<dyn AlertReceiver<Grafana>>>,
pub rules: Vec<Box<dyn AlertRule<Grafana>>>,
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<Grafana>>>>,
pub sender: Grafana,
}
impl<T: Topology + Observability<Grafana>> Score<T> for GrafanaAlertingScore {
fn create_interpret(&self) -> Box<dyn crate::interpret::Interpret<T>> {
Box::new(AlertingInterpret {
sender: self.sender.clone(),
receivers: self.receivers.clone(),
rules: self.rules.clone(),
scrape_targets: self.scrape_targets.clone(),
})
}
fn name(&self) -> String {
        "GrafanaAlertingScore".to_string()
}
}


@@ -1,28 +0,0 @@
use harmony_macros::hurl;
use non_blank_string_rs::NonBlankString;
use std::{collections::HashMap, str::FromStr};
use crate::modules::helm::chart::{HelmChartScore, HelmRepository};
pub fn grafana_helm_chart_score(ns: &str, namespace_scope: bool) -> HelmChartScore {
let mut values_overrides = HashMap::new();
values_overrides.insert(
NonBlankString::from_str("namespaceScope").unwrap(),
namespace_scope.to_string(),
);
HelmChartScore {
namespace: Some(NonBlankString::from_str(ns).unwrap()),
release_name: NonBlankString::from_str("grafana-operator").unwrap(),
chart_name: NonBlankString::from_str("grafana/grafana-operator").unwrap(),
chart_version: None,
values_overrides: Some(values_overrides),
values_yaml: None,
create_namespace: true,
install_only: true,
repository: Some(HelmRepository::new(
"grafana".to_string(),
hurl!("https://grafana.github.io/helm-charts"),
true,
)),
}
}


@@ -1 +0,0 @@
pub mod helm_grafana;


@@ -4,7 +4,7 @@ use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use super::crd_prometheuses::LabelSelector;
use crate::modules::monitoring::kube_prometheus::crd::crd_prometheuses::LabelSelector;
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[kube(
