Compare commits

..

99 Commits

Author SHA1 Message Date
c5b292d99b fix: dependencies and formatting 2026-03-09 22:25:16 -04:00
0258b31fd2 e2e tests module ready for review, k3d test works well 2026-03-09 22:17:28 -04:00
4407792bd5 chore: use async trait instead of ugly types 2026-03-09 21:59:57 -04:00
7978a63004 wip: harmony e2e test module coming along 2026-03-09 21:54:12 -04:00
58d00c95bb Review new test module and slightly improve testing roadmap 2026-03-09 21:01:47 -04:00
7d14f7646c fix(e2e): fix compilation errors in multicluster test
- multicluster_postgres test was incomplete, simplified to placeholder
- Added todo!() for multi-cluster PostgreSQL test to be implemented later
2026-03-09 20:15:41 -04:00
69dd763d6e feat(e2e): initial e2e test runner with k3d and cnpg tests
- Add harmony_e2e_tests crate with CLI test runner
- k3d_cluster test: provisions k3d cluster and verifies nodes
- cnpg_postgres test: deploys CNPG operator, creates PostgreSQL
  cluster, waits for readiness, executes SQL query
- multicluster_postgres test: placeholder for next iteration
2026-03-09 19:39:59 -04:00
2e46ac3418 e2e tests wip 2026-03-09 19:29:22 -04:00
af6145afe3 doc: monitoring module documentation
All checks were successful
Run Check Script / check (pull_request) Successful in 1m23s
2026-03-09 18:33:35 -04:00
701d86de69 fix: Finish merging k8s refactoring
All checks were successful
Run Check Script / check (pull_request) Successful in 1m24s
2026-03-09 17:20:03 -04:00
6db7a780fa chore: Fix some warnings
Some checks failed
Run Check Script / check (pull_request) Failing after 40s
2026-03-09 17:17:12 -04:00
0df4e3cdee Merge remote-tracking branch 'origin/master' into fix/refactor_alert_receivers 2026-03-09 17:12:39 -04:00
2a7fa466cc Merge pull request 'reafactor/k8sclient' (#243) from reafactor/k8sclient into master
Some checks failed
Run Check Script / check (push) Successful in 2m57s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m2s
Reviewed-on: #243
2026-03-07 23:05:09 +00:00
f463cd1e94 Fix merge conflict between master and refactor/k8sclient
All checks were successful
Run Check Script / check (pull_request) Successful in 1m28s
2026-03-07 17:56:26 -05:00
e1da7949ec Merge pull request 'okd: add worker nodes to load balancer backend pool' (#246) from feat/okd-load-balancer-include-workers into master
Some checks failed
Run Check Script / check (push) Successful in 1m30s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 1m53s
Reviewed-on: #246
2026-03-07 22:42:14 +00:00
d0a1a73710 doc: fix example code to use ignore instead of no_run
All checks were successful
Run Check Script / check (pull_request) Successful in 1m43s
- `no_run` fails because it cannot be used at module level
- Use `ignore` to skip doc compilation while keeping the example visible
2026-03-07 17:30:24 -05:00
bc2b328296 okd: include workers in load balancer backend pool + add tests and docs
Some checks failed
Run Check Script / check (pull_request) Failing after 24s
- Add nodes_to_backend_server() function to include both control plane and worker nodes
- Update public services (ports 80, 443) to use worker-inclusive backend pool
- Add comprehensive tests covering all backend configurations
- Add documentation with OKD reference link and usage examples
2026-03-07 17:15:24 -05:00
a93896707f okd: add worker nodes to load balancer backend pool
All checks were successful
Run Check Script / check (pull_request) Successful in 1m29s
Include both control plane and worker nodes in ports 80 and 443 backend pools
2026-03-07 16:46:47 -05:00
0e9b23a320 Merge branch 'feat/change-node-readiness-strategy'
Some checks failed
Run Check Script / check (push) Successful in 1m26s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m11s
2026-03-07 16:35:14 -05:00
f532ba2b40 doc: Update node readiness readme and deployed port to 25001
All checks were successful
Run Check Script / check (pull_request) Successful in 1m27s
2026-03-07 16:33:28 -05:00
fafca31798 fix: formatting and check script
All checks were successful
Run Check Script / check (pull_request) Successful in 1m28s
2026-03-07 16:08:52 -05:00
5412c34957 Merge pull request 'fix: change vlan definition from MaybeString to RawXml' (#245) from feat/opnsense-config-xml-support-vlan into master
Some checks failed
Run Check Script / check (push) Successful in 1m47s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m7s
Reviewed-on: #245
2026-03-07 20:59:28 +00:00
787cc8feab Fix doc tests for harmony-k8s crate refactoring
All checks were successful
Run Check Script / check (pull_request) Successful in 2m6s
- Updated harmony-k8s doc tests to import from harmony_k8s instead of harmony
- Changed CloudNativePgOperatorScore::default() to default_openshift()

This ensures doc tests work correctly after moving K8sClient to the harmony-k8s crate.
2026-03-07 15:50:39 -05:00
ce041f495b fix(zitadel): include admin@zitadel.{host} username, secure password with symbol/number, and cert-manager TLS configuration
Some checks failed
Run Check Script / check (pull_request) Failing after 26s
Update Zitadel deployment to use correct username format (admin@zitadel.{host}), generate secure passwords with required complexity (uppercase, lowercase, digit, symbol), configure edge TLS termination for OpenShift, and add cert-manager annotations. Also refactor password generation to ensure all complexity requirements are met.
2026-03-07 15:29:26 -05:00
55de206523 fix: change vlan definition from MaybeString to RawXml
All checks were successful
Run Check Script / check (pull_request) Successful in 1m29s
2026-03-07 10:03:03 -05:00
64893a84f5 fix(node health endpoint): Setup sane timeouts for usage as a load balancer health check. The default k8s client timeout of 30 seconds caused haproxy health check to fail even though we still returned 200 OK after 30 seconds
Some checks failed
Run Check Script / check (pull_request) Failing after 25s
2026-03-06 16:28:13 -05:00
f941672662 fix: Node readiness always fails open when kube api call fails on node status check
Some checks failed
Run Check Script / check (pull_request) Failing after 1m54s
2026-03-06 15:45:38 -05:00
a98113dd40 wip: zitadel ingress https not working yet
Some checks failed
Run Check Script / check (pull_request) Failing after 28s
2026-03-06 15:28:21 -05:00
5db1a31d33 ... 2026-03-06 15:24:33 -05:00
f5aac67af8 feat: k8s client works fine, added version config in zitadel and fix master key secret existence handling
Some checks failed
Run Check Script / check (pull_request) Failing after 32s
2026-03-06 15:15:35 -05:00
d7e5bf11d5 removing bad stuff I did this morning and trying to make it simple, and adding a couple tests 2026-03-06 14:41:08 -05:00
2e1f1b8447 feat: Refactor K8sClient into separate, publishable crate, and add zitadel example 2026-03-06 14:21:15 -05:00
2b157ad7fd feat: add a background loop checking the node status every X seconds. If NotReady for Y seconds, kill the router pod if there's one 2026-03-06 11:57:39 -05:00
a0c0905c3b wip: zitadel deployment 2026-03-06 10:56:48 -05:00
fe52f69473 Merge pull request 'feat/openbao_secret_manager' (#239) from feat/openbao_secret_manager into master
Some checks failed
Run Check Script / check (push) Successful in 1m35s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m36s
Reviewed-on: #239
Reviewed-by: stremblay <stremblay@nationtech.io>
2026-03-04 15:06:15 +00:00
d8338ad12c wip(sso): Openbao deploys fine, not fully tested yet, zitadel wip
All checks were successful
Run Check Script / check (pull_request) Successful in 1m40s
2026-03-04 09:53:33 -05:00
ac9fedf853 wip(secret store): Fix openbao, refactor with rust client 2026-03-04 09:33:21 -05:00
fd3705e382 wip(secret store): openbao/vault store implementation 2026-03-04 09:33:21 -05:00
4840c7fdc2 Merge pull request 'feat/node-health-score' (#242) from feat/node-health-score into master
Some checks failed
Run Check Script / check (push) Successful in 1m51s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 3m16s
Reviewed-on: #242
Reviewed-by: johnride <jg@nationtech.io>
2026-03-04 14:31:44 +00:00
20172a7801 removing another useless commented line
All checks were successful
Run Check Script / check (pull_request) Successful in 2m17s
2026-03-04 09:31:02 -05:00
6bb33c5845 remove useless comment
All checks were successful
Run Check Script / check (pull_request) Successful in 1m43s
2026-03-04 09:29:49 -05:00
d9357adad3 format code, fix interpert name
All checks were successful
Run Check Script / check (pull_request) Successful in 1m33s
2026-03-04 09:28:32 -05:00
a25ca86bdf wip: happy path is working
Some checks failed
Run Check Script / check (pull_request) Failing after 29s
2026-03-04 08:21:08 -05:00
646c5e723e feat: implementing node_health 2026-03-04 07:16:25 -05:00
69c382e8c6 Merge pull request 'feat(k8s): Can now apply resources of any scope. Kind of a hack leveraging the dynamic type under the hood but this is due to a limitation of kube-rs' (#241) from feat/k8s_apply_any_scope into master
Some checks failed
Run Check Script / check (push) Successful in 2m42s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 4m15s
Reviewed-on: #241
Reviewed-by: stremblay <stremblay@nationtech.io>
2026-03-03 20:06:03 +00:00
dca764395d feat(k8s): Can now apply resources of any scope. Kind of a hack leveraging the dynamic type under the hood but this is due to a limitation of kube-rs
Some checks failed
Run Check Script / check (pull_request) Failing after 38s
2026-03-03 14:37:52 -05:00
2738985edb Merge pull request 'feat: New harmony node readiness mini project that exposes health of a node on port 25001' (#237) from feat/harmony-node-health-endpoint into master
Some checks failed
Run Check Script / check (push) Successful in 1m36s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 3m35s
Reviewed-on: #237
2026-03-02 19:56:39 +00:00
d9a21bf94b feat: node readiness now supports a check query param with node_ready and okd_router_1936 options
All checks were successful
Run Check Script / check (pull_request) Successful in 1m51s
2026-03-02 14:55:28 -05:00
5c34d81d28 fix: modified alert receiver trait to allow install plan which provides the topology the ability to apply receiver specific configurations as required by the underlying alert sender
All checks were successful
Run Check Script / check (pull_request) Successful in 1m35s
2026-02-27 11:50:41 -05:00
8f8bd34168 feat: Deployment is now happening in harmony-node-healthcheck namespace
All checks were successful
Run Check Script / check (pull_request) Successful in 1m42s
2026-02-26 16:39:26 -05:00
b5e971b3b6 feat: adding yaml to deploy k8s resources
All checks were successful
Run Check Script / check (pull_request) Successful in 1m37s
2026-02-26 16:36:16 -05:00
a1c0e0e246 fix: build docker default value
All checks were successful
Run Check Script / check (pull_request) Successful in 1m38s
2026-02-26 16:35:38 -05:00
d084cee8d5 doc(node-readiness): Fix README
All checks were successful
Run Check Script / check (pull_request) Successful in 1m38s
2026-02-26 16:33:12 -05:00
63ef1c0ea7 feat: New harmony node readiness mini project that exposes health of a node on port 25001
All checks were successful
Run Check Script / check (pull_request) Successful in 1m36s
2026-02-26 16:23:27 -05:00
c4dd0b0cf2 chore: cleaned up some dead code, comments, etc
All checks were successful
Run Check Script / check (pull_request) Successful in 1m39s
2026-02-26 16:06:14 -05:00
b14b41d172 refactor: prometheus alert sender
All checks were successful
Run Check Script / check (pull_request) Successful in 1m40s
2026-02-26 15:10:28 -05:00
5e861cfc6d refactor: skeleton structure for grafana observability
All checks were successful
Run Check Script / check (pull_request) Successful in 1m36s
2026-02-26 14:38:28 -05:00
4fad077eb4 refactor(kubeprometheus): implemented Observability for KubePrometheus
All checks were successful
Run Check Script / check (pull_request) Successful in 1m38s
2026-02-26 13:07:28 -05:00
d80561e326 wip(kubeprometheus): created base scores for kubeprometheus alert receivers, scrape_targets and rules
Some checks failed
Run Check Script / check (pull_request) Failing after 37s
2026-02-25 16:16:33 -05:00
621aed4903 wip: refactoring kubeprometheus
Some checks failed
Run Check Script / check (pull_request) Failing after 10m18s
2026-02-25 15:48:12 -05:00
e68426cc3d feat: added implementation for prometheus node exporter external scrape target for openshift cluster alert sender. added alerting rule to return high http error rate
Some checks failed
Run Check Script / check (pull_request) Failing after 39s
2026-02-25 14:54:10 -05:00
0c1c8daf13 wip: working alert rule for okd
Some checks failed
Run Check Script / check (pull_request) Failing after 1m31s
2026-02-24 16:13:30 -05:00
4b5e3a52a1 feat: working example of enabling and adding an alert receiver for okd_cluster_alerts
All checks were successful
Run Check Script / check (pull_request) Successful in 1m42s
2026-02-24 11:14:47 -05:00
c54936d19f fix: added check to verify if cluster monitoring is enabled
Some checks failed
Run Check Script / check (pull_request) Failing after 40s
2026-02-23 16:07:52 -05:00
699822af74 chore: reorganized file location
All checks were successful
Run Check Script / check (pull_request) Successful in 2m14s
2026-02-23 15:03:55 -05:00
554c94f5a9 wip: compiles
All checks were successful
Run Check Script / check (pull_request) Successful in 2m9s
2026-02-23 14:48:05 -05:00
836db9e6b1 wip: refactored redhat cluster observability operator
Some checks failed
Run Check Script / check (pull_request) Failing after 41s
2026-02-23 13:18:40 -05:00
bc6a41d40c wip: removed use of installable trait, added all installation and ensure ready functions to the trait monitor, first impl of AlertReceiver for OpenshiftClusterAlertSender
Some checks failed
Run Check Script / check (pull_request) Failing after -22s
2026-02-20 12:49:55 -05:00
8d446ec2e4 wip: refactoring monitoring
Some checks failed
Run Check Script / check (pull_request) Failing after -14s
2026-02-19 16:25:59 -05:00
ff7d2fb89e fix: Complete brocade switch config and auth refactoring
Some checks failed
Run Check Script / check (push) Successful in 1m12s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m57s
2026-02-17 10:31:30 -05:00
9bb38b930a Merge pull request 'reafactor: brocade switch slight improvements' (#233) from fix/brocade into master
Some checks failed
Run Check Script / check (push) Failing after -10s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m54s
Reviewed-on: #233
2026-02-17 15:16:42 +00:00
c677487a5e Merge pull request 'feat/drain_k8s_node' (#232) from feat/drain_k8s_node into master
Some checks failed
Run Check Script / check (push) Successful in 1m0s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 3m11s
Reviewed-on: #232
Reviewed-by: stremblay <stremblay@nationtech.io>
2026-02-17 15:01:08 +00:00
c1d46612ac fix: dnsmasq now replaces mac address
All checks were successful
Run Check Script / check (pull_request) Successful in 1m2s
2026-02-17 10:00:11 -05:00
4fba01338d feat: Reboot k8s node works, good logs and tests
All checks were successful
Run Check Script / check (pull_request) Successful in 1m3s
2026-02-17 09:30:04 -05:00
913ed17453 Merge remote-tracking branch 'origin/master' into feat/drain_k8s_node 2026-02-17 09:15:06 -05:00
9e185cbbd5 chore: cleanup comments 2026-02-17 09:15:03 -05:00
752526f831 fix: reboot node now works with correct command
Some checks failed
Run Check Script / check (pull_request) Failing after 54s
2026-02-16 23:04:18 -05:00
f9bd6ad260 reafactor: brocade switch slight improvements
Some checks failed
Run Check Script / check (pull_request) Failing after -10s
2026-02-16 21:08:56 -05:00
111181c300 wip
Some checks failed
Run Check Script / check (pull_request) Failing after 54s
2026-02-16 20:54:46 -05:00
3257cd9569 wip: Reboot node cleanly via k8s api, copy files on node, run remote command with output, orchestrate network configuration, and some more
All checks were successful
Run Check Script / check (pull_request) Successful in 1m13s
2026-02-15 22:17:43 -05:00
4b1915c594 Merge pull request 'feat: improve output related to storage in the discovery process' (#231) from feat/improve-disk-device-display into master
All checks were successful
Run Check Script / check (push) Successful in 1m6s
Compile and package harmony_composer / package_harmony_composer (push) Successful in 7m17s
Reviewed-on: #231
2026-02-15 18:22:54 +00:00
cf3050ce87 feat(k8s client): K8sClient module now holds the responsibility for the k8s distribution detection, add resource bundle useful for easy create and delete of a bunch of related resources.
First use case is creating a privileged pod allowing writing to nodes on
openshift family clusters. This requires creating the clusterrolebinding
and pod and other resources.
2026-02-15 09:14:49 -05:00
c3e27c60be feat: improve output related to storage in the discovery process
All checks were successful
Run Check Script / check (pull_request) Successful in 1m55s
2026-02-14 15:32:01 -05:00
2d26790c82 wip: K8s copy file on node refactoring to extract helpers and add tests 2026-02-14 10:22:48 -05:00
2e89308b82 wip: Copy files on k8s node via ephemeral pod and configmap 2026-02-14 08:07:03 -05:00
d8936a8307 feat(okd/network_manager): Add get_node_name_for_id and refactor 2026-02-13 15:49:24 -05:00
e2fa12508f feat: Add k8s client drain node functionality with tests and example 2026-02-13 15:19:58 -05:00
bea2a75882 doc(opnsense): Add note that dnsmasq mac addresses will be dropped when
setting static host
2026-02-13 15:18:20 -05:00
a1528665d0 Merge remote-tracking branch 'origin/doc-and-braindump'
All checks were successful
Run Check Script / check (push) Successful in 1m18s
Compile and package harmony_composer / package_harmony_composer (push) Successful in 8m47s
2026-02-12 10:52:38 -05:00
613225a00b chore: push misc formatting and details
Some checks failed
Run Check Script / check (push) Has been cancelled
Compile and package harmony_composer / package_harmony_composer (push) Has been cancelled
2026-02-12 10:50:16 -05:00
dd1c088f0d example: Nats supercluster enable jetstream 2026-02-12 10:42:56 -05:00
b4ef009804 chore: Add note in cargo.toml to replace with serde-saphyr as serde_yaml is deprecated
Some checks failed
Run Check Script / check (push) Successful in 2m17s
Compile and package harmony_composer / package_harmony_composer (push) Has been cancelled
2026-02-12 10:42:16 -05:00
191e92048b feat: git ignore all ignore folders in the project 2026-02-12 10:42:16 -05:00
f4a70d8978 Merge pull request 'feat: integrate-brocade' (#230) from feat/integrate-brocade into master
Some checks failed
Run Check Script / check (push) Has been cancelled
Compile and package harmony_composer / package_harmony_composer (push) Has been cancelled
Reviewed-on: #230
2026-02-12 15:41:15 +00:00
2ddc9c0579 fix:format
All checks were successful
Run Check Script / check (pull_request) Successful in 1m12s
2026-02-12 10:31:06 -05:00
fececc2efd Creating a BrocadeSwitchConfig struct
Some checks failed
Run Check Script / check (pull_request) Failing after 0s
2026-02-09 11:24:29 -05:00
8afcacbd24 feat: integrate brocade 2026-02-09 09:53:16 -05:00
b885c35706 Merge branch 'master' into doc-and-braindump 2025-11-13 23:46:39 +00:00
Ian Letourneau
bb6b4b7f88 docs: New docs structure & rustdoc for HostNetworkConfigScore
All checks were successful
Run Check Script / check (pull_request) Successful in 1m43s
2025-11-13 18:42:26 -05:00
239 changed files with 15129 additions and 5145 deletions

3
.gitignore vendored

@@ -26,3 +26,6 @@ Cargo.lock
*.pdb
.harmony_generated
# Useful to create ignore folders for temp files and notes
ignore


@@ -0,0 +1,548 @@
# CI and Testing Strategy for Harmony
## Executive Summary
Harmony aims to become a CNCF project, requiring a robust CI pipeline that demonstrates real-world reliability. The goal is to run **all examples** in CI, from simple k3d deployments to full HA OKD clusters on bare metal. This document provides context for designing and implementing this testing infrastructure.
---
## Project Context
### What is Harmony?
Harmony is an infrastructure automation framework that is **code-first and code-only**. Operators write Rust programs to declare and drive infrastructure, rather than YAML files or DSL configs. Key differentiators:
1. **Compile-time safety**: The type system prevents "config-is-valid-but-platform-is-wrong" errors
2. **Topology abstraction**: Write once, deploy to any environment (local k3d, OKD, bare metal, cloud)
3. **Capability-based design**: Scores declare what they need; topologies provide what they have
### Core Abstractions
| Concept | Description |
|---------|-------------|
| **Score** | Declarative description of desired state (the "what") |
| **Topology** | Logical representation of infrastructure (the "where") |
| **Capability** | A feature a topology offers (the "how") |
| **Interpret** | Execution logic connecting Score to Topology |
### Compile-Time Verification
```rust
// This compiles only if K8sAnywhereTopology provides K8sclient + HelmCommand
impl<T: Topology + K8sclient + HelmCommand> Score<T> for MyScore { ... }
// This FAILS to compile - LinuxHostTopology doesn't provide K8sclient
// (intentionally broken example for testing)
impl<T: Topology + K8sclient> Score<T> for K8sResourceScore { ... }
// error: LinuxHostTopology does not implement K8sclient
```
---
## Current Examples Inventory
### Summary Statistics
| Category | Count | CI Complexity |
|----------|-------|---------------|
| k3d-compatible | 22 | Low - single k3d cluster |
| OKD-specific | 4 | Medium - requires OKD cluster |
| Bare metal | 4 | High - requires physical infra or nested virtualization |
| Multi-cluster | 3 | High - requires multiple K8s clusters |
| No infra needed | 4 | Trivial - local only |
### Detailed Example Classification
#### Tier 1: k3d-Compatible (22 examples)
Can run on a local k3d cluster with minimal setup:
| Example | Topology | Capabilities | Special Notes |
|---------|----------|--------------|---------------|
| zitadel | K8sAnywhereTopology | K8sClient, HelmCommand | SSO/Identity |
| node_health | K8sAnywhereTopology | K8sClient | Health checks |
| public_postgres | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Needs ingress |
| openbao | K8sAnywhereTopology | K8sClient, HelmCommand | Vault alternative |
| rust | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Webapp deployment |
| cert_manager | K8sAnywhereTopology | K8sClient, CertificateManagement | TLS certificates |
| try_rust_webapp | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Full webapp |
| monitoring | K8sAnywhereTopology | K8sClient, HelmCommand, Observability | Prometheus |
| application_monitoring_with_tenant | K8sAnywhereTopology | K8sClient, HelmCommand, TenantManager, Observability | Multi-tenant |
| monitoring_with_tenant | K8sAnywhereTopology | K8sClient, HelmCommand, TenantManager, Observability | Multi-tenant |
| postgresql | K8sAnywhereTopology | K8sClient, HelmCommand | CloudNativePG |
| ntfy | K8sAnywhereTopology | K8sClient, HelmCommand | Notifications |
| tenant | K8sAnywhereTopology | K8sClient, TenantManager | Namespace isolation |
| lamp | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | LAMP stack |
| k8s_drain_node | K8sAnywhereTopology | K8sClient | Node operations |
| k8s_write_file_on_node | K8sAnywhereTopology | K8sClient | Node operations |
| remove_rook_osd | K8sAnywhereTopology | K8sClient | Ceph operations |
| validate_ceph_cluster_health | K8sAnywhereTopology | K8sClient | Ceph health |
| kube-rs | Direct kube | K8sClient | Raw kube-rs demo |
| brocade_snmp_server | K8sAnywhereTopology | K8sClient | SNMP collector |
| harmony_inventory_builder | LocalhostTopology | None | Network scanning |
| cli | LocalhostTopology | None | CLI demo |
#### Tier 2: OKD/OpenShift-Specific (4 examples)
Require OKD/OpenShift features not available in vanilla K8s:
| Example | Topology | OKD-Specific Feature |
|---------|----------|---------------------|
| okd_cluster_alerts | K8sAnywhereTopology | OpenShift Monitoring CRDs |
| operatorhub_catalog | K8sAnywhereTopology | OpenShift OperatorHub |
| rhob_application_monitoring | K8sAnywhereTopology | RHOB (Red Hat Observability) |
| nats-supercluster | K8sAnywhereTopology | OKD Routes (OpenShift Ingress) |
#### Tier 3: Bare Metal Infrastructure (5 examples)
Require physical hardware or full virtualization:
| Example | Topology | Physical Requirements |
|---------|----------|----------------------|
| okd_installation | HAClusterTopology | OPNSense, Brocade switch, PXE boot, 3+ nodes |
| okd_pxe | HAClusterTopology | OPNSense, Brocade switch, PXE infrastructure |
| sttest | HAClusterTopology | Full HA cluster with all network services |
| opnsense | OPNSenseFirewall | OPNSense firewall access |
| opnsense_node_exporter | Custom | OPNSense firewall |
#### Tier 4: Multi-Cluster (3 examples)
Require multiple K8s clusters:
| Example | Topology | Clusters Required |
|---------|----------|-------------------|
| nats | K8sAnywhereTopology × 2 | 2 clusters with NATS gateways |
| nats-module | DecentralizedTopology | 3 clusters for supercluster |
| multisite_postgres | FailoverTopology | 2 clusters for replication |
---
## Testing Categories
### 1. Compile-Time Tests
These tests verify that the type system correctly rejects invalid configurations:
```rust
// NOTE: `#[compile_fail]` is illustrative pseudocode, not a built-in attribute;
// in practice this case lives in a trybuild `tests/ui/` file with a matching
// `.stderr` expectation file (see Implementation Options below).

// Should NOT compile - K8sResourceScore on LinuxHostTopology
#[test]
#[compile_fail]
fn test_k8s_score_on_linux_host() {
    let score = K8sResourceScore::new();
    let topology = LinuxHostTopology::new();
    // This line should fail to compile
    harmony_cli::run(Inventory::empty(), topology, vec![Box::new(score)], None);
}

// Should compile - K8sResourceScore on K8sAnywhereTopology
#[test]
fn test_k8s_score_on_k8s_topology() {
    let score = K8sResourceScore::new();
    let topology = K8sAnywhereTopology::from_env();
    // This should compile
    harmony_cli::run(Inventory::empty(), topology, vec![Box::new(score)], None);
}
```
**Implementation Options:**
- `trybuild` crate for compile-time failure tests
- Separate `tests/compile_fail/` directory with expected error messages
### 2. Unit Tests
Pure Rust logic without external dependencies:
- Score serialization/deserialization
- Inventory parsing
- Type conversions
- CRD generation
**Requirements:**
- No external services
- Sub-second execution
- Run on every PR
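For illustration, a unit test in this tier might look like the sketch below. `LampScore` is a stand-in struct invented for the example, not Harmony's real type; the test only assumes `serde` derive and `serde_yaml`, which is already a workspace dependency:

```rust
use serde::{Deserialize, Serialize};

// Stand-in score type for the example; Harmony's real scores are richer.
#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct LampScore {
    domain: String,
    replicas: u32,
}

// Round-trip serialization: pure logic, no cluster, sub-second.
#[test]
fn lamp_score_yaml_round_trip() {
    let score = LampScore { domain: "example.test".into(), replicas: 2 };
    let yaml = serde_yaml::to_string(&score).unwrap();
    let parsed: LampScore = serde_yaml::from_str(&yaml).unwrap();
    assert_eq!(score, parsed);
}
```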
### 3. Integration Tests (k3d)
Deploy to a local k3d cluster:
**Setup:**
```bash
# Install k3d
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
# Create cluster
k3d cluster create harmony-test \
  --agents 3 \
  --k3s-arg "--disable=traefik@server:0"
# Wait for ready
kubectl wait --for=condition=Ready nodes --all --timeout=120s
```
**Test Matrix:**
| Example | k3d | Test Type |
|---------|-----|-----------|
| zitadel | ✅ | Deploy + health check |
| cert_manager | ✅ | Deploy + certificate issuance |
| monitoring | ✅ | Deploy + metric collection |
| postgresql | ✅ | Deploy + database connectivity |
| tenant | ✅ | Namespace creation + isolation |
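One simple way to drive each row of this matrix (a sketch, not the project's actual runner; the flags and env handling are assumptions) is to shell out to the example and treat a non-zero exit as failure:

```rust
use std::process::Command;

// Runs one example against the k3d cluster identified by `kubeconfig`.
// Returns Err with a short description on spawn failure or non-zero exit.
fn run_example(name: &str, kubeconfig: &str) -> Result<(), String> {
    let status = Command::new("cargo")
        .args(["run", "--example", name])
        .env("KUBECONFIG", kubeconfig)
        .status()
        .map_err(|e| format!("failed to spawn cargo for {name}: {e}"))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("{name} exited with {status}"))
    }
}
```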
### 4. Integration Tests (OKD)
Deploy to OKD/OpenShift cluster:
**Options:**
1. **Nested virtualization**: Run OKD in VMs (slow, expensive)
2. **CRC (CodeReady Containers)**: Single-node OKD (resource intensive)
3. **Managed OpenShift**: AWS/Azure/GCP (costly)
4. **Existing cluster**: Connect to pre-provisioned cluster (fastest)
**Test Matrix:**
| Example | OKD Required | Test Type |
|---------|--------------|-----------|
| okd_cluster_alerts | ✅ | Alert rule deployment |
| rhob_application_monitoring | ✅ | RHOB stack deployment |
| operatorhub_catalog | ✅ | Operator installation |
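Whichever option is chosen, the runner should confirm the target really is OpenShift/OKD before deploying. A hedged sketch using the `kube` crate (the exact check Harmony uses may differ): the presence of any `*.openshift.io` API group is a reasonable signal:

```rust
use kube::Client;

// Returns true if the cluster advertises any OpenShift API group.
async fn is_openshift(client: &Client) -> kube::Result<bool> {
    let groups = client.list_api_groups().await?;
    Ok(groups
        .groups
        .iter()
        .any(|g| g.name.ends_with("openshift.io")))
}
```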
### 5. End-to-End Tests (Full Infrastructure)
Complete infrastructure deployment including bare metal:
**Options:**
1. **Libvirt + KVM**: Virtual machines on CI runner
2. **Nested KVM**: KVM inside KVM (for cloud CI)
3. **Dedicated hardware**: Physical test lab
4. **Mock/Hybrid**: Mock physical components, real K8s
---
## CI Environment Options
### Option A: GitHub Actions (Current Standard)
**Pros:**
- Native GitHub integration
- Large runner ecosystem
- Free for open source
**Cons:**
- Limited nested virtualization support
- 6-hour job timeout
- Resource constraints on free runners
**Matrix:**
```yaml
strategy:
  matrix:
    os: [ubuntu-latest]
    rust: [stable, beta]
    k8s: [k3d, kind]
    tier: [unit, k3d-integration]
```
### Option B: Self-Hosted Runners
**Pros:**
- Full control over environment
- Can run nested virtualization
- No time limits
- Persistent state between runs
**Cons:**
- Maintenance overhead
- Cost of infrastructure
- Security considerations
**Setup:**
- Bare metal servers with KVM support
- Pre-installed k3d, kind, CRC
- OPNSense VM for network tests
### Option C: Hybrid (GitHub + Self-Hosted)
**Pros:**
- Fast unit tests on GitHub runners
- Heavy tests on self-hosted infrastructure
- Cost-effective
**Cons:**
- Two CI systems to maintain
- Complexity in test distribution
### Option D: Cloud CI (CircleCI, GitLab CI, etc.)
**Pros:**
- Often better resource options
- Docker-in-Docker support
- Better nested virtualization
**Cons:**
- Cost
- Less GitHub-native
---
## Performance Requirements
### Target Execution Times
| Test Category | Target Time | Current (est.) |
|---------------|-------------|----------------|
| Compile-time tests | < 30s | Unknown |
| Unit tests | < 60s | Unknown |
| k3d integration (per example) | < 120s | 60-300s |
| Full k3d matrix | < 15 min | 30-60 min |
| OKD integration | < 30 min | 1-2 hours |
| Full E2E | < 2 hours | 4-8 hours |
### Sub-Second Performance Strategies
1. **Parallel execution**: Run independent tests concurrently
2. **Incremental testing**: Only run affected tests on changes
3. **Cached clusters**: Pre-warm k3d clusters
4. **Layered testing**: Fail fast on cheaper tests
5. **Mock external services**: Fake Discord webhooks, etc.
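As a concrete illustration of strategy 5, a fake Discord webhook endpoint can be a few lines of std-only Rust. This is a minimal sketch (not Harmony's actual mock module): it accepts one POST, captures the payload for assertions, and answers 204 No Content like the real API:

```rust
use std::io::{Read, Write};
use std::net::TcpListener;

// Accepts a single webhook request on `addr` and returns the raw HTTP
// request so the test can assert on the alert payload. A single read is
// enough for the small JSON bodies these tests send.
fn serve_one_webhook(addr: &str) -> std::io::Result<String> {
    let listener = TcpListener::bind(addr)?;
    let (mut stream, _) = listener.accept()?;
    let mut buf = [0u8; 8192];
    let n = stream.read(&mut buf)?;
    stream.write_all(b"HTTP/1.1 204 No Content\r\ncontent-length: 0\r\n\r\n")?;
    Ok(String::from_utf8_lossy(&buf[..n]).into_owned())
}
```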
---
## Test Data and Secrets Management
### Secrets Required
| Secret | Use | Storage |
|--------|-----|---------|
| Discord webhook URL | Alert receiver tests | GitHub Secrets |
| OPNSense credentials | Network tests | Self-hosted only |
| Cloud provider creds | Multi-cloud tests | Vault / GitHub Secrets |
| TLS certificates | Ingress tests | Generated on-the-fly |
### Test Data
| Data | Source | Strategy |
|------|--------|----------|
| Container images | Public registries | Cache locally |
| Helm charts | Public repos | Vendor in repo |
| K8s manifests | Generated | Dynamic |
---
## Proposed Test Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ harmony_e2e_tests Package │
│ (cargo run -p harmony_e2e_tests) │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ Compile │ │ Unit │ │ Compile-Fail Tests │ │
│ │ Tests │ │ Tests │ │ (trybuild) │ │
│ │ < 30s │ │ < 60s │ │ < 30s │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ k3d Integration Tests │ │
│ │ Self-provisions k3d cluster, runs 22 examples │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ zitadel │ │ cert-mgr│ │ monitor │ │ postgres│ ... │ │
│ │ │ 60s │ │ 90s │ │ 120s │ │ 90s │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ │ Parallel Execution │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ OKD Integration Tests │ │
│ │ Connects to existing OKD cluster or provisions via KVM │ │
│ │ ┌─────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ okd_cluster_ │ │ rhob_application_ │ │ │
│ │ │ alerts (5 min) │ │ monitoring (10 min) │ │ │
│ │ └─────────────────┘ └─────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ KVM-based E2E Tests │ │
│ │ Uses Harmony's KVM module to provision test VMs │ │
│ │ ┌─────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ okd_installation│ │ Full HA cluster deployment │ │ │
│ │ │ (30-60 min) │ │ (60-120 min) │ │ │
│ │ └─────────────────┘ └─────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Any CI system (GitHub Actions, GitLab CI, Jenkins, cron) just runs:
cargo run -p harmony_e2e_tests
```
Mapped onto runner types (same tiers, split by where each runs):

```
┌─────────────────────────────────────────────────────────────────┐
GitHub Actions
├─────────────────────────────────────────────────────────────────┤
┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐
Compile Unit Compile-Fail Tests
Tests Tests (trybuild)
< 30s < 60s < 30s
└─────────────┘ └─────────────┘ └─────────────────────────┘
┌───────────────────────────────────────────────────────────┐
k3d Integration Tests
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
zitadel cert-mgr monitor postgres ...
60s 90s 120s 90s
└─────────┘ └─────────┘ └─────────┘ └─────────┘
Parallel Execution
└───────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
Self-Hosted Runners
├─────────────────────────────────────────────────────────────────┤
┌───────────────────────────────────────────────────────────┐
OKD Integration Tests
┌─────────────────┐ ┌─────────────────────────────┐
okd_cluster_ rhob_application_
alerts (5 min) monitoring (10 min)
└─────────────────┘ └─────────────────────────────┘
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
KVM-based E2E Tests (Harmony provisions)
┌─────────────────────────────────────────────────────┐
Harmony KVM Module provisions test VMs
- OKD HA Cluster (3 control plane, 2 workers)
- OPNSense VM (router/firewall)
- Brocade simulator VM
└─────────────────────────────────────────────────────┘
└───────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────────┘
```
---
## Questions for Researchers
### Critical Questions
1. **Self-contained test runner**: How should the `harmony_e2e_tests` package be designed so a single `cargo run` command runs all tests?
2. **Nested Virtualization**: What are the prerequisites for running KVM inside a test environment?
3. **Cost Optimization**: How to minimize cloud costs while running comprehensive E2E tests?
4. **Test Isolation**: How to ensure test isolation when running parallel k3d tests?
5. **State Management**: Should we persist k3d clusters between test runs, or create fresh each time?
6. **Mocking Strategy**: Which external services (Discord, OPNSense, etc.) should be mocked vs. real?
7. **Compile-Fail Tests**: Best practices for testing Rust compile-time errors?
8. **Multi-Cluster Tests**: How to efficiently provision and connect multiple K8s clusters in tests?
9. **Secrets Management**: How to handle secrets for test environments without external CI dependencies?
10. **Test Flakiness**: Strategies for reducing flakiness in infrastructure tests?
11. **Reporting**: How to present test results for complex multi-environment test matrices?
12. **Prerequisite Detection**: How to detect and validate prerequisites (Docker, k3d, KVM) before running tests?
### Research Areas
1. **CI/CD Tools**: Evaluate GitHub Actions, GitLab CI, CircleCI, Tekton, Prow for Harmony's needs
2. **K8s Test Tools**: Evaluate kind, k3d, minikube, microk8s for local testing
3. **Mock Frameworks**: Evaluate mock-server, wiremock, hoverfly for external service mocking
4. **Test Frameworks**: Evaluate built-in Rust test, nextest, cargo-tarpaulin for performance
---
## Success Criteria
### Week 1 (Agentic Velocity)
- [ ] Compile-time verification tests working
- [ ] Unit tests for monitoring module
- [ ] First 5 k3d examples running in CI
- [ ] Mock framework for Discord webhooks
### Week 2
- [ ] All 22 k3d-compatible examples in CI
- [ ] OKD self-hosted runner operational
- [ ] KVM module reviewed and ready for CI
### Week 3-4
- [ ] Full E2E tests with KVM infrastructure
- [ ] Multi-cluster tests automated
- [ ] All examples tested in CI
### Month 2
- [ ] Sub-15-minute total CI time
- [ ] Weekly E2E tests on bare metal
- [ ] Documentation complete
- [ ] Ready for CNCF submission
---
## Prerequisites
### Hardware Requirements
| Component | Minimum | Recommended |
|-----------|---------|------------|
| CPU | 4 cores | 8+ cores (for parallel tests) |
| RAM | 8 GB | 32 GB (for KVM E2E) |
| Disk | 50 GB SSD | 500 GB NVMe |
### Software Requirements
| Tool | Version |
|------|---------|
| Rust | 1.75+ |
| Docker | 24.0+ |
| k3d | v5.6.0+ |
| kubectl | v1.28+ |
| libvirt | 9.0.0 (for KVM tests) |
### Installation (One-time)
```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install Docker
curl -fsSL https://get.docker.com | sh
# Install k3d
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
# Install kubectl
curl -LO "https://dl.k8s.io/release/v1.28.0/bin/linux/amd64/kubectl"
sudo install -m 0755 kubectl /usr/local/bin/kubectl
```
---
## Reference Materials
### Existing Code
- Examples: `examples/*/src/main.rs`
- Topologies: `harmony/src/domain/topology/`
- Capabilities: `harmony/src/domain/topology/` (trait definitions)
- Scores: `harmony/src/modules/*/`
### Documentation
- [Coding Guide](docs/coding-guide.md)
- [Core Concepts](docs/concepts.md)
- [Monitoring Architecture](docs/monitoring.md)
- [ADR-020: Monitoring](adr/020-monitoring-alerting-architecture.md)
### Related Projects
- Crossplane (similar abstraction model)
- Pulumi (infrastructure as code)
- Terraform (state management patterns)
- Flux/ArgoCD (GitOps testing patterns)

201
CI_and_testing_roadmap.md Normal file

@@ -0,0 +1,201 @@
# Pragmatic CI and Testing Roadmap for Harmony
**Status**: Active implementation (March 2026)
**Core Principle**: Self-contained test runner — no dependency on centralized CI servers
All tests are executable via one command:
```bash
cargo run -p harmony_e2e_tests
```
The `harmony_e2e_tests` package:
- Provisions its own infrastructure when needed (k3d, KVM VMs)
- Runs all test tiers in sequence or selectively
- Reports results in text, JSON or JUnit XML
- Works identically on developer laptops, any Linux server, GitHub Actions, GitLab CI, Jenkins, cron jobs, etc.
- Is the single source of truth for what "passing CI" means
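For the JUnit output specifically, a reporter can stay tiny. The struct and function below are assumptions about shape, not the crate's real API; they only show how little is needed to emit something CI systems ingest:

```rust
// Minimal result record for one test; the real runner likely carries more.
struct TestResult {
    name: String,
    passed: bool,
    seconds: f64,
}

// Renders a single JUnit <testsuite> element. No XML escaping: fine for
// the alphanumeric test names used here, not for arbitrary input.
fn to_junit(suite: &str, results: &[TestResult]) -> String {
    let failures = results.iter().filter(|r| !r.passed).count();
    let mut xml = format!(
        "<testsuite name=\"{suite}\" tests=\"{}\" failures=\"{failures}\">\n",
        results.len()
    );
    for r in results {
        xml.push_str(&format!(
            "  <testcase name=\"{}\" time=\"{:.1}\"",
            r.name, r.seconds
        ));
        if r.passed {
            xml.push_str("/>\n");
        } else {
            xml.push_str(">\n    <failure/>\n  </testcase>\n");
        }
    }
    xml.push_str("</testsuite>\n");
    xml
}
```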
## Why This Approach
1. **Portability** — same command & behavior everywhere
2. **Harmony tests Harmony** — the framework validates itself
3. **No vendor lock-in** — GitHub Actions / GitLab CI are just triggers
4. **Perfect reproducibility** — developers reproduce any CI failure locally in seconds
5. **Offline capable** — after initial setup, most tiers run without internet
## Architecture: `harmony_e2e_tests` Package
```
harmony_e2e_tests/
├── Cargo.toml
├── src/
│ ├── main.rs # CLI entry point
│ ├── lib.rs # Test runner core logic
│ ├── tiers/
│ │ ├── mod.rs
│ │ ├── compile_fail.rs # trybuild-based compile-time checks
│ │ ├── unit.rs # cargo test --lib --workspace
│ │ ├── k3d.rs # k3d cluster + parallel example runs
│ │ ├── okd.rs # connect to existing OKD cluster
│ │ └── kvm.rs # full E2E via Harmony's own KVM module
│ ├── mocks/
│ │ ├── mod.rs
│ │ ├── discord.rs # mock Discord webhook receiver
│ │ └── opnsense.rs # mock OPNSense firewall API
│ └── infrastructure/
│ ├── mod.rs
│ ├── k3d.rs # k3d cluster lifecycle
│ └── kvm.rs # helper wrappers around KVM score
└── tests/
├── ui/ # trybuild compile-fail cases (*.rs + *.stderr)
└── fixtures/ # static test data / golden files
```
## CLI Interface (clap-based)
```bash
# Run everything (default)
cargo run -p harmony_e2e_tests
# Specific tier
cargo run -p harmony_e2e_tests -- --tier k3d
cargo run -p harmony_e2e_tests -- --tier compile
# Filter to one example
cargo run -p harmony_e2e_tests -- --tier k3d --example monitoring
# Parallelism control (k3d tier)
cargo run -p harmony_e2e_tests -- --parallel 8
# Reporting
cargo run -p harmony_e2e_tests -- --report junit.xml
cargo run -p harmony_e2e_tests -- --format json
# Debug helpers
cargo run -p harmony_e2e_tests -- --verbose --dry-run
```
## Test Tiers Ordered by Speed & Cost
| Tier | Duration target | Runner type | What it tests | Isolation strategy |
|------------------|------------------|----------------------|----------------------------------------------------|-----------------------------|
| Compile-fail | < 20 s | Any (GitHub free) | Invalid configs don't compile | Per-file trybuild |
| Unit | < 60 s | Any | Pure Rust logic | cargo test |
| k3d | 8–15 min | GitHub / self-hosted | 22+ k3d-compatible examples | Fresh k3d cluster + ns-per-example |
| OKD | 10–30 min | Self-hosted / CRC | OKD-specific features (Routes, Monitoring CRDs…) | Existing cluster via KUBECONFIG |
| KVM Full E2E | 60–180 min | Self-hosted bare-metal | Full HA OKD install + bare-metal scenarios | Harmony KVM score provisions VMs |
### Tier Details & Implementation Notes
1. **Compile-fail**
Uses **`trybuild`** crate (standard in Rust ecosystem).
Place intentional compile errors in `tests/ui/*.rs` with matching `*.stderr` expectation files.
One test function replaces the old custom loop:
```rust
#[test]
fn ui() {
    let t = trybuild::TestCases::new();
    t.compile_fail("tests/ui/*.rs");
}
```
2. **Unit**
Simple wrapper: `cargo test --lib --workspace -- --nocapture`
Consider `cargo-nextest` later for a 2–3× speedup if the test count grows.
3. **k3d**
   - Provisions an isolated cluster once at start (`k3d cluster create --agents 3 --no-lb --k3s-arg "--disable=traefik@server:0"`)
   - Discovers examples via `package.metadata.harmony.test-tier = "k3d"` in each example's `Cargo.toml`
   - Runs examples in parallel with a tokio semaphore (default 5–8 slots); see the sketch after this list
   - Each example gets its own namespace
   - Uses `defer` / `scopeguard` for guaranteed cleanup
   - Mocks the Discord webhook and OPNSense API
4. **OKD**
Connects to pre-provisioned cluster via `KUBECONFIG`.
Validates it is actually OpenShift/OKD before proceeding.
5. **KVM**
Uses **Harmony's own KVM module** to provision test VMs (control-plane + workers + OPNSense).
→ True “dogfooding” — if the E2E fails, the KVM score itself is likely broken.
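The parallel-run-with-cleanup pattern from tier 3 could look roughly like this (a sketch under assumed names; `run_example` stands in for whatever actually deploys one example into its namespace):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Placeholder for the real per-example logic (deploy, assert, clean up).
async fn run_example(name: &str) -> Result<(), String> {
    println!("running {name} in namespace e2e-{name}");
    Ok(())
}

// Runs examples concurrently, at most `slots` at a time, and collects
// failures instead of aborting on the first one.
async fn run_all(examples: Vec<String>, slots: usize) -> Vec<String> {
    let sem = Arc::new(Semaphore::new(slots));
    let mut handles = Vec::new();
    for name in examples {
        let sem = Arc::clone(&sem);
        handles.push(tokio::spawn(async move {
            // Permit is released when this task finishes, even on error.
            let _permit = sem.acquire_owned().await.expect("semaphore closed");
            run_example(&name).await.err().map(|e| format!("{name}: {e}"))
        }));
    }
    let mut failures = Vec::new();
    for h in handles {
        if let Ok(Some(err)) = h.await {
            failures.push(err);
        }
    }
    failures
}
```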
## CI Integration Patterns
### Fast PR validation (GitHub Actions)
```yaml
name: Fast Tests
on: [push, pull_request]
jobs:
  fast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - name: Install Docker & k3d
        uses: nolar/setup-k3d-k3s@v1
      - run: cargo run -p harmony_e2e_tests -- --tier compile,unit,k3d --report junit.xml
      - uses: actions/upload-artifact@v4
        with: { name: test-results, path: junit.xml }
```
### Nightly / Merge heavy tests (self-hosted runner)
```yaml
name: Full E2E
on:
  schedule: [{ cron: "0 3 * * *" }]
  push: { branches: [main] }
jobs:
  full:
    runs-on: [self-hosted, linux, x64, kvm-capable]
    steps:
      - uses: actions/checkout@v4
      - run: cargo run -p harmony_e2e_tests -- --tier okd,kvm --verbose --report junit.xml
```
## Prerequisites Auto-Check & Install
```rust
// in harmony_e2e_tests/src/infrastructure/prerequisites.rs
async fn ensure_k3d() -> Result<()> { … } // curl | bash if missing
async fn ensure_docker() -> Result<()> { … }
fn check_kvm_support() -> Result<()> { … } // /dev/kvm + libvirt
```
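Of these, the KVM probe is the simplest to pin down. A plausible implementation (an assumption; the real check may also query libvirt) just verifies `/dev/kvm` is present:

```rust
use std::path::Path;

// Fails fast with an actionable message when KVM acceleration is absent.
fn check_kvm_support() -> Result<(), String> {
    if !Path::new("/dev/kvm").exists() {
        return Err(
            "KVM not available: /dev/kvm missing. Enable VT-x/AMD-V in \
             firmware, or nested virtualization on the hypervisor."
                .to_string(),
        );
    }
    Ok(())
}
```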
## Success Criteria
### Step 1
- [ ] `harmony_e2e_tests` package created & basic CLI working
- [ ] trybuild compile-fail suite passing
- [ ] First 8–10 k3d examples running reliably in CI
- [ ] Mock server for Discord webhook completed
### Step 2
- [ ] All 22 k3d-compatible examples green
- [ ] OKD tier running on dedicated self-hosted runner
- [ ] JUnit reporting + GitHub check integration
- [ ] Namespace isolation + automatic retry on transient k8s errors
### Step 3
- [ ] KVM full E2E green on bare-metal runner (nightly)
- [ ] Multi-cluster examples (nats, multisite-postgres) automated
- [ ] Total fast CI time < 12 minutes on GitHub runners
- [ ] Documentation: “How to add a new tested example”
## Quick Start for New Contributors
```bash
# One-time setup
rustup update stable
cargo install cargo-nextest  # optional; trybuild is pulled in as a dev-dependency
# Run locally (most common)
cargo run -p harmony_e2e_tests -- --tier k3d --verbose
# Just compile checks + unit
cargo test -p harmony_e2e_tests
```

2200
Cargo.lock generated

File diff suppressed because it is too large


@@ -20,6 +20,9 @@ members = [
"brocade",
"harmony_agent",
"harmony_agent/deploy",
"harmony_node_readiness",
"harmony-k8s",
"harmony_e2e_tests",
]
[workspace.package]
@@ -38,6 +41,8 @@ tokio = { version = "1.40", features = [
"macros",
"rt-multi-thread",
] }
tokio-retry = "0.3.0"
tokio-util = "0.7.15"
cidr = { features = ["serde"], version = "0.2" }
russh = "0.45"
russh-keys = "0.45"
@@ -52,6 +57,7 @@ kube = { version = "1.1.0", features = [
"jsonpatch",
] }
k8s-openapi = { version = "0.25", features = ["v1_30"] }
# TODO replace with https://github.com/bourumir-wyngs/serde-saphyr as serde_yaml is deprecated https://github.com/sebastienrousseau/serde_yml
serde_yaml = "0.9"
serde-value = "0.7"
http = "1.2"


@@ -1,4 +1,6 @@
# Harmony : Open-source infrastructure orchestration that treats your platform like first-class code
# Harmony
Open-source infrastructure orchestration that treats your platform like first-class code.
In other words, Harmony is a **next-generation platform engineering framework**.
@@ -20,9 +22,7 @@ All in **one strongly-typed Rust codebase**.
From a **developer laptop** to a **global production cluster**, a single **source of truth** drives the **full software lifecycle.**
---
## 1 · The Harmony Philosophy
## The Harmony Philosophy
Infrastructure is essential, but it shouldn't be your core business. Harmony is built on three guiding principles that make modern platforms reliable, repeatable, and easy to reason about.
@@ -34,9 +34,18 @@ Infrastructure is essential, but it shouldnt be your core business. Harmony i
These principles surface as simple, ergonomic Rust APIs that let teams focus on their product while trusting the platform underneath.
---
## Where to Start
## 2 · Quick Start
We have a comprehensive set of documentation right here in the repository.
| I want to... | Start Here |
| ----------------- | ------------------------------------------------------------------ |
| Get Started | [Getting Started Guide](./docs/guides/getting-started.md) |
| See an Example | [Use Case: Deploy a Rust Web App](./docs/use-cases/rust-webapp.md) |
| Explore | [Documentation Hub](./docs/README.md) |
| See Core Concepts | [Core Concepts Explained](./docs/concepts.md) |
## Quick Look: Deploy a Rust Webapp
The snippet below spins up a complete **production-grade Rust + Leptos Webapp** with monitoring. Swap it for your own scores to deploy anything from microservices to machine-learning pipelines.
@@ -94,63 +103,33 @@ async fn main() {
}
```
Run it:
To run this:
```bash
cargo run
```
- Clone the repository: `git clone https://git.nationtech.io/nationtech/harmony`
- Install dependencies: `cargo build --release`
- Run the example: `cargo run --example try_rust_webapp`
Harmony analyses the code, shows an execution plan in a TUI, and applies it once you confirm. Same code, same binary—every environment.
## Documentation
---
All documentation is in the `/docs` directory.
## 3 · Core Concepts
- [Documentation Hub](./docs/README.md): The main entry point for all documentation.
- [Core Concepts](./docs/concepts.md): A detailed look at Score, Topology, Capability, Inventory, and Interpret.
- [Component Catalogs](./docs/catalogs/README.md): Discover all available Scores, Topologies, and Capabilities.
- [Developer Guide](./docs/guides/developer-guide.md): Learn how to write your own Scores and Topologies.
| Term | One-liner |
| ---------------- | ---------------------------------------------------------------------------------------------------- |
| **Score<T>** | Declarative description of the desired state (e.g., `LAMPScore`). |
| **Interpret<T>** | Imperative logic that realises a `Score` on a specific environment. |
| **Topology** | An environment (local k3d, AWS, bare-metal) exposing verified _Capabilities_ (Kubernetes, DNS, …). |
| **Maestro** | Orchestrator that compiles Scores + Topology, ensuring all capabilities line up **at compile-time**. |
| **Inventory** | Optional catalogue of physical assets for bare-metal and edge deployments. |
## Architectural Decision Records
A visual overview is in the diagram below.
- [ADR-001 · Why Rust](adr/001-rust.md)
- [ADR-003 · Infrastructure Abstractions](adr/003-infrastructure-abstractions.md)
- [ADR-006 · Secret Management](adr/006-secret-management.md)
- [ADR-011 · Multi-Tenant Cluster](adr/011-multi-tenant-cluster.md)
[Harmony Core Architecture](docs/diagrams/Harmony_Core_Architecture.drawio.svg)
## Contribute
---
Discussions and roadmap live in [Issues](https://git.nationtech.io/nationtech/harmony/-/issues). PRs, ideas, and feedback are welcome!
## 4 · Install
Prerequisites:
- Rust
- Docker (if you deploy locally)
- `kubectl` / `helm` for Kubernetes-based topologies
```bash
git clone https://git.nationtech.io/nationtech/harmony
cd harmony
cargo build --release # builds the CLI, TUI and libraries
```
---
## 5 · Learning More
- **Architectural Decision Records**: dive into the rationale
- [ADR-001 · Why Rust](adr/001-rust.md)
- [ADR-003 · Infrastructure Abstractions](adr/003-infrastructure-abstractions.md)
- [ADR-006 · Secret Management](adr/006-secret-management.md)
- [ADR-011 · Multi-Tenant Cluster](adr/011-multi-tenant-cluster.md)
- **Extending Harmony**: write new Scores / Interprets, add hardware like OPNsense firewalls, or embed Harmony in your own tooling (`/docs`).
- **Community**: discussions and roadmap live in [GitLab issues](https://git.nationtech.io/nationtech/harmony/-/issues). PRs, ideas, and feedback are welcome!
---
## 6 · License
## License
Harmony is released under the **GNU AGPL v3**.


@@ -1,86 +0,0 @@
Initial Date: 2025-02-06
## Status
Proposed
## Context
The Harmony Agent requires a persistent connection to the NATS Supercluster to perform Key-Value (KV) operations (Read/Write/Watch).
Service Requirements: The agent must authenticate with sufficient privileges to manage KV buckets and interact with the JetStream API.
Infrastructure: NATS is deployed as a multi-site Supercluster. Authentication must be consistent across sites to allow for agent failover and data replication.
https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro
Technical Constraint: In NATS, JetStream functionality is not global by default; it must be explicitly enabled and capped at the Account level to allow KV bucket creation and persistence.
## Issues
1. The "System Account" Trap
The Hole: Using the system account for the Harmony Agent.
The Risk: The NATS System Account is for server heartbeat and monitoring. It cannot (and should not) own JetStream KV buckets.
2. Multi-Site Authorization Sync
The Hole: Defining users in local nats.conf files via Helm.
The Risk: If an agent at Site-2 fails over to Site-3, but Site-3's local configuration doesn't have the testUser credentials, the agent will be locked out during an outage.
3. KV Replication Factor
The Hole: Not specifying the Replicas count for the KV bucket.
The Risk: If you create a KV bucket with the default (1 replica), it only exists at the site where it was created. If that site goes down, the data is lost despite having a Supercluster.
4. Subject-Level Permissions
The Hole: Only granting TEST.* permissions.
The Risk: NATS KV uses internal subjects (e.g., $KV.<bucket_name>.>). Without access to these, the agent will get an "Authorization Violation" even if it's logged in.
## Proposed Solution
To enable reliable, secure communication between the Harmony Agent and the NATS Supercluster, we will implement Account-isolated JetStream using NKey Authentication (or mTLS).
1. Dedicated Account Architecture
We will move away from the "Global/Default" account. A dedicated HARMONY account will be defined identically across all sites in the Supercluster. This ensures that the metadata for the KV bucket can replicate across the gateways.
System Account: Reserved for NATS internal health and Supercluster routing.
Harmony Account: Dedicated to Harmony Agent data, with JetStream explicitly enabled.
2. Authentication: Use harmony secret store mounted into nats container
Take advantage of currently implemented solution
3. JetStream & KV Configuration
To ensure the KV bucket is available across the Supercluster, the following configuration must be applied:
Replication Factor (R=3): KV buckets will be created with a replication factor of 3 to ensure data persists across Site-1, Site-2, and Site-3.
Permissions: The agent will be granted scoped access to:
$KV.HARMONY.> (Data operations)
$JS.API.CONSUMER.> and $JS.API.STREAM.> (Management operations)
## Consequence of Decision
Pros
Resilience: Agents can fail over to any site in the Supercluster and find their credentials and data.
Security: By using a dedicated account, the Harmony Agent cannot see or interfere with NATS system traffic.
Scalability: We can add Site-4 or Site-5 simply by copying the HARMONY account definition.
Cons / Risks
Configuration Drift: If one site's ConfigMap is updated without the others, authentication will fail during a site failover.
Complexity: Requires a "Management" step to ensure the account exists on all NATS instances before the agent attempts to connect.


@@ -0,0 +1,65 @@
# Architecture Decision Record: Network Bonding Configuration via External Automation
Initial Author: Jean-Gabriel Gill-Couture & Sylvain Tremblay
Initial Date: 2026-02-13
Last Updated Date: 2026-02-13
## Status
Accepted
## Context
We need to configure LACP bonds on 10GbE interfaces across all worker nodes in the OpenShift cluster. A significant challenge is that interface names (e.g., `enp1s0f0` vs `ens1f0`) vary across different hardware nodes.
The standard OpenShift mechanism (MachineConfig) applies identical configurations to all nodes in a MachineConfigPool. Since the interface names differ, a single static MachineConfig cannot target specific physical devices across the entire cluster without complex workarounds.
## Decision
We will use the existing "Harmony" automation tool to generate and apply host-specific NetworkManager configuration files directly to the nodes.
1. Harmony will generate the specific `.nmconnection` files for the bond and slaves based on its inventory of interface names.
2. Files will be pushed to `/etc/NetworkManager/system-connections/` on each node.
3. Configuration will be applied via `nmcli` reload or a node reboot.
## Rationale
* **Inventory Awareness:** Harmony already possesses the specific interface mapping data for each host.
* **Persistence:** Fedora CoreOS/SCOS allows writing to `/etc`, and these files persist across reboots and OS upgrades (rpm-ostree updates).
* **Avoids Complexity:** This approach avoids the operational overhead of creating unique MachineConfigPools for every single host or hardware variant.
* **Safety:** Unlike wildcard matching, this ensures explicit interface selection, preventing accidental bonding of reserved interfaces (e.g., future separation of Ceph storage traffic).
## Consequences
**Pros:**
* Precise, per-host configuration without polluting the Kubernetes API with hundreds of MachineConfigs.
* Standard Linux networking behavior; easy to debug locally.
* Prevents accidental interface capture (unlike wildcards).
**Cons:**
* **Loss of Declarative K8s State:** The network config is not managed by the Machine Config Operator (MCO).
* **Node Replacement Friction:** Newly provisioned nodes (replacements) will boot with default config. Harmony must be run against new nodes manually or via a hook before they can fully join the cluster workload.
## Alternatives considered
1. **Wildcard Matching in NetworkManager (e.g., `interface-name=enp*`):**
* *Pros:* Single MachineConfig for the whole cluster.
* *Cons:* Rejected because it is too broad. It risks capturing interfaces intended for other purposes (e.g., splitting storage and cluster networks later).
2. **"Kitchen Sink" Configuration:**
* *Pros:* Single file listing every possible interface name as a slave.
* *Cons:* "Dirty" configuration; results in many inactive connections on every host; brittle if new naming schemes appear.
3. **Per-Host MachineConfig:**
* *Pros:* Fully declarative within OpenShift.
* *Cons:* Requires a unique `MachineConfigPool` per host, which is an anti-pattern and unmaintainable at scale.
4. **On-boot Generation Script:**
* *Pros:* Dynamic detection.
* *Cons:* Increases boot complexity; harder to debug if the script fails during startup.
## Additional Notes
While `/etc` is writable and persistent on CoreOS, this configuration falls outside the "Day 1" Ignition process. Operational runbooks must be updated to ensure Harmony runs on any node replacement events.

View File

@@ -0,0 +1,318 @@
# Architecture Decision Record: Monitoring and Alerting Architecture
Initial Author: Willem Rolleman, Jean-Gabriel Carrier
Initial Date: March 9, 2026
Last Updated Date: March 9, 2026
## Status
Accepted
Supersedes: [ADR-010](010-monitoring-and-alerting.md)
## Context
Harmony needs a unified approach to monitoring and alerting across different infrastructure targets:
1. **Cluster-level monitoring**: Administrators managing entire Kubernetes/OKD clusters need to define cluster-wide alerts, receivers, and scrape targets.
2. **Tenant-level monitoring**: Multi-tenant clusters where teams are confined to namespaces need monitoring scoped to their resources.
3. **Application-level monitoring**: Developers deploying applications want zero-config monitoring that "just works" for their services.
The monitoring landscape is fragmented:
- **OKD/OpenShift**: Built-in Prometheus with AlertmanagerConfig CRDs
- **KubePrometheus**: Helm-based stack with PrometheusRule CRDs
- **RHOB (Red Hat Observability)**: Operator-based with MonitoringStack CRDs
- **Standalone Prometheus**: Raw Prometheus deployments
Each system has different CRDs, different installation methods, and different configuration APIs.
## Decision
We implement a **trait-based architecture with compile-time capability verification** that provides:
1. **Type-safe abstractions** via parameterized traits: `AlertReceiver<S>`, `AlertRule<S>`, `ScrapeTarget<S>`
2. **Compile-time topology compatibility** via the `Observability<S>` capability bound
3. **Three levels of abstraction**: Cluster, Tenant, and Application monitoring
4. **Pre-built alert rules** as functions that return typed structs
### Core Traits
```rust
// domain/topology/monitoring.rs

/// Marker trait for systems that send alerts (Prometheus, etc.)
pub trait AlertSender: Send + Sync + std::fmt::Debug {
    fn name(&self) -> String;
}

/// Defines how a receiver (Discord, Slack, etc.) builds its configuration
/// for a specific sender type
pub trait AlertReceiver<S: AlertSender>: std::fmt::Debug + Send + Sync {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError>;
    fn name(&self) -> String;
    fn clone_box(&self) -> Box<dyn AlertReceiver<S>>;
}

/// Defines how an alert rule builds its PrometheusRule configuration
pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
    fn build_rule(&self) -> Result<serde_json::Value, InterpretError>;
    fn name(&self) -> String;
    fn clone_box(&self) -> Box<dyn AlertRule<S>>;
}

/// Capability that topologies implement to support monitoring
pub trait Observability<S: AlertSender> {
    async fn install_alert_sender(&self, sender: &S, inventory: &Inventory)
        -> Result<PreparationOutcome, PreparationError>;

    async fn install_receivers(&self, sender: &S, inventory: &Inventory,
        receivers: Option<Vec<Box<dyn AlertReceiver<S>>>>) -> Result<...>;

    async fn install_rules(&self, sender: &S, inventory: &Inventory,
        rules: Option<Vec<Box<dyn AlertRule<S>>>>) -> Result<...>;

    async fn add_scrape_targets(&self, sender: &S, inventory: &Inventory,
        scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>) -> Result<...>;

    async fn ensure_monitoring_installed(&self, sender: &S, inventory: &Inventory)
        -> Result<...>;
}
```
### Alert Sender Types
Each monitoring stack is a distinct `AlertSender`:
| Sender | Module | Use Case |
|--------|--------|----------|
| `OpenshiftClusterAlertSender` | `monitoring/okd/` | OKD/OpenShift built-in monitoring |
| `KubePrometheus` | `monitoring/kube_prometheus/` | Helm-deployed kube-prometheus-stack |
| `Prometheus` | `monitoring/prometheus/` | Standalone Prometheus via Helm |
| `RedHatClusterObservability` | `monitoring/red_hat_cluster_observability/` | RHOB operator |
| `Grafana` | `monitoring/grafana/` | Grafana-managed alerting |
### Three Levels of Monitoring
#### 1. Cluster-Level Monitoring
For cluster administrators. Full control over monitoring infrastructure.
```rust
// examples/okd_cluster_alerts/src/main.rs
OpenshiftClusterAlertScore {
    sender: OpenshiftClusterAlertSender,
    receivers: vec![Box::new(DiscordReceiver { ... })],
    rules: vec![Box::new(alert_rules)],
    scrape_targets: Some(vec![Box::new(external_exporters)]),
}
```
**Characteristics:**
- Cluster-scoped CRDs and resources
- Can add external scrape targets (outside cluster)
- Manages Alertmanager configuration
- Requires cluster-admin privileges
#### 2. Tenant-Level Monitoring
For teams confined to namespaces. The topology determines tenant context.
```rust
// The topology's Observability impl handles namespace scoping
impl Observability<KubePrometheus> for K8sAnywhereTopology {
    async fn install_rules(&self, sender: &KubePrometheus, ...) {
        // Topology knows if it's tenant-scoped
        let namespace = self.get_tenant_config().await
            .map(|t| t.name)
            .unwrap_or_else(|| "default".to_string());
        // Install rules in tenant namespace
    }
}
```
**Characteristics:**
- Namespace-scoped resources
- Cannot modify cluster-level monitoring config
- May have restricted receiver types
- Runtime validation of permissions (cannot be fully compile-time)
#### 3. Application-Level Monitoring
For developers. Zero-config, opinionated monitoring.
```rust
// modules/application/features/monitoring.rs
pub struct Monitoring {
    pub application: Arc<dyn Application>,
    pub alert_receiver: Vec<Box<dyn AlertReceiver<Prometheus>>>,
}

impl<T: Topology + Observability<Prometheus> + TenantManager + ...>
    ApplicationFeature<T> for Monitoring
{
    async fn ensure_installed(&self, topology: &T) -> Result<...> {
        // Auto-creates ServiceMonitor
        // Auto-installs Ntfy for notifications
        // Handles tenant namespace automatically
        // Wires up sensible defaults
    }
}
```
**Characteristics:**
- Automatic ServiceMonitor creation
- Opinionated notification channel (Ntfy)
- Tenant-aware via topology
- Minimal configuration required
## Rationale
### Why Generic Traits Instead of Unified Types?
Each monitoring stack (OKD, KubePrometheus, RHOB) has fundamentally different CRDs:
```rust
// OKD uses AlertmanagerConfig with different structure
AlertmanagerConfig { spec: { receivers: [...] } }
// RHOB uses secret references for webhook URLs
MonitoringStack { spec: { alertmanagerConfig: { discordConfigs: [{ apiURL: { key: "..." } }] } } }
// KubePrometheus uses Alertmanager CRD with different field names
Alertmanager { spec: { config: { receivers: [...] } } }
```
A unified type would either:
1. Be a lowest-common-denominator (loses stack-specific features)
2. Be a complex union type (hard to use, easy to misconfigure)
Generic traits let each stack express its configuration naturally while providing a consistent interface.
### Why Compile-Time Capability Bounds?
```rust
impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T>
    for OpenshiftClusterAlertScore { ... }
```
This fails at compile time if you try to use `OpenshiftClusterAlertScore` with a topology that doesn't support OKD monitoring. This prevents the "config-is-valid-but-platform-is-wrong" errors that Harmony was designed to eliminate.
### Why Not a MonitoringStack Abstraction (V2 Approach)?
The V2 approach proposed a unified `MonitoringStack` that hides sender selection:
```rust
// V2 approach - rejected
MonitoringStack::new(MonitoringApiVersion::V2CRD)
    .add_alert_channel(discord)
```
**Problems:**
1. Hides which sender you're using, losing compile-time guarantees
2. "Version selection" actually chooses between fundamentally different systems
3. Would need to handle all stack-specific features through a generic interface
The current approach is explicit: you choose `OpenshiftClusterAlertSender` and the compiler verifies compatibility.
### Why Runtime Validation for Tenants?
Tenant confinement is determined at runtime by the topology and K8s RBAC. We cannot know at compile time whether a user has cluster-admin or namespace-only access.
Options considered:
1. **Compile-time tenant markers** - Would require modeling entire RBAC hierarchy in types. Over-engineering.
2. **Runtime validation** - Current approach. Fails with clear K8s permission errors if insufficient access.
3. **No tenant support** - Would exclude a major use case.
Runtime validation is the pragmatic choice. The failure mode is clear (K8s API error) and occurs early in execution.
> Note: we will eventually have compile-time validation for such things. Rust macros are powerful, and we could discover the actual capabilities we're dealing with, similar to the sqlx approach in its `query!` macro.
## Consequences
### Pros
1. **Type Safety**: Invalid configurations are caught at compile time
2. **Extensibility**: Adding a new monitoring stack requires implementing traits, not modifying core code
3. **Clear Separation**: Cluster/Tenant/Application levels have distinct entry points
4. **Reusable Rules**: Pre-built alert rules as functions (`high_pvc_fill_rate_over_two_days()`)
5. **CRD Accuracy**: Type definitions match actual Kubernetes CRDs exactly
### Cons
1. **Implementation Explosion**: `DiscordReceiver` implements `AlertReceiver<S>` for each sender type (3+ implementations)
2. **Learning Curve**: Understanding the trait hierarchy takes time
3. **clone_box Boilerplate**: Required for trait object cloning (3 lines per impl)
### Mitigations
- Implementation explosion is contained: each receiver type has O(senders) implementations, but receivers are rare compared to rules
- Learning curve is documented with examples at each level
- clone_box boilerplate is minimal and copy-paste
## Alternatives Considered
### Unified MonitoringStack Type
See "Why Not a MonitoringStack Abstraction" above. Rejected for losing compile-time safety.
### Helm-Only Approach
Use `HelmScore` directly for each monitoring deployment. Rejected because:
- No type safety for alert rules
- Cannot compose with application features
- No tenant awareness
### Separate Modules Per Use Case
Have `cluster_monitoring/`, `tenant_monitoring/`, `app_monitoring/` as separate modules. Rejected because:
- Massive code duplication
- No shared abstraction for receivers/rules
- Adding a feature requires three implementations
## Implementation Notes
### Module Structure
```
modules/monitoring/
├── mod.rs # Public exports
├── alert_channel/ # Receivers (Discord, Webhook)
├── alert_rule/ # Rules and pre-built alerts
│ ├── prometheus_alert_rule.rs
│ └── alerts/ # Library of pre-built rules
│ ├── k8s/ # K8s-specific (pvc, pod, memory)
│ └── infra/ # Infrastructure (opnsense, dell)
├── okd/ # OpenshiftClusterAlertSender
├── kube_prometheus/ # KubePrometheus
├── prometheus/ # Prometheus
├── red_hat_cluster_observability/ # RHOB
├── grafana/ # Grafana
├── application_monitoring/ # Application-level scores
└── scrape_target/ # External scrape targets
```
### Adding a New Alert Sender
1. Create sender type: `pub struct MySender; impl AlertSender for MySender { ... }`
2. Implement `Observability<MySender>` for topologies that support it
3. Create CRD types in `crd/` subdirectory
4. Implement `AlertReceiver<MySender>` for existing receivers
5. Implement `AlertRule<MySender>` for `AlertManagerRuleGroup`
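A minimal sketch of steps 1 and 2, with method bodies elided (the remaining `Observability` methods follow the same pattern):

```rust
#[derive(Debug)]
pub struct MySender;

impl AlertSender for MySender {
    fn name(&self) -> String {
        "MySender".to_string()
    }
}

impl Observability<MySender> for K8sAnywhereTopology {
    async fn install_alert_sender(
        &self,
        _sender: &MySender,
        _inventory: &Inventory,
    ) -> Result<PreparationOutcome, PreparationError> {
        // Deploy or verify the monitoring stack here, e.g. via the K8sClient capability.
        todo!()
    }

    // ...install_receivers, install_rules, add_scrape_targets and
    // ensure_monitoring_installed are implemented the same way...
}
```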
### Adding a New Alert Rule
```rust
pub fn my_custom_alert() -> PrometheusAlertRule {
    PrometheusAlertRule::new("MyAlert", "up == 0")
        .for_duration("5m")
        .label("severity", "critical")
        .annotation("summary", "Service is down")
}
```
No trait implementation is needed: `AlertManagerRuleGroup` already handles the conversion.
## Related ADRs
- [ADR-013](013-monitoring-notifications.md): Notification channel selection (ntfy)
- [ADR-011](011-multi-tenant-cluster.md): Multi-tenant cluster architecture

View File

@@ -0,0 +1,21 @@
[package]
name = "example-monitoring-v2"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony-k8s = { path = "../../harmony-k8s" }
harmony_types = { path = "../../harmony_types" }
kube = { workspace = true }
schemars = "0.8"
serde = { workspace = true, features = ["derive"] }
serde_json = { workspace = true }
serde_yaml = { workspace = true }
url = { workspace = true }
log = { workspace = true }
async-trait = { workspace = true }
k8s-openapi = { workspace = true }

View File

@@ -0,0 +1,91 @@
# Monitoring v2 - Improved Architecture
This example demonstrates the improved monitoring architecture that addresses the "WTF/minute" issues in the original design.
## Key Improvements
### 1. **Single AlertChannel Trait with Generic Sender**
The original design required 9-12 implementations for each alert channel (Discord, Webhook, etc.), one per sender type. The new design uses a single trait with a generic sender parameter:

```rust
pub trait AlertChannel<Sender: AlertSender> {
    async fn install_config(&self, sender: &Sender) -> Result<Outcome, InterpretError>;
    fn name(&self) -> String;
    fn as_any(&self) -> &dyn std::any::Any;
}
```
**Benefits:**
- One Discord implementation works with all sender types
- Type safety at compile time
- No runtime dispatch overhead
### 2. **MonitoringStack Abstraction**
Instead of manually selecting `CRDPrometheus` vs `KubePrometheus` vs `RHOBObservability`, you now have a unified `MonitoringStack` that handles versioning:

```rust
let monitoring_stack = MonitoringStack::new(MonitoringApiVersion::V2CRD)
    .set_namespace("monitoring")
    .add_alert_channel(discord_receiver)
    .set_scrape_targets(vec![...]);
```
**Benefits:**
- Single source of truth for monitoring configuration
- Easy to switch between monitoring versions
- Automatic version-specific configuration
### 3. **TenantMonitoringScore - True Composition**
The original `monitoring_with_tenant` example just put tenant and monitoring as separate items in a vec. The new design truly composes them:

```rust
let tenant_score = TenantMonitoringScore::new("test-tenant", monitoring_stack);
```
This creates a single score that:
- Has tenant context
- Has monitoring configuration
- Automatically installs monitoring scoped to tenant namespace
**Benefits:**
- No more "two separate things" confusion
- Automatic tenant namespace scoping
- Clear ownership: tenant owns its monitoring
### 4. **Versioned Monitoring APIs**
Clear versioning makes it obvious which monitoring stack you're using:

```rust
pub enum MonitoringApiVersion {
    V1Helm, // Old Helm charts
    V2CRD,  // Current CRDs
    V3RHOB, // RHOB (future)
}
```
**Benefits:**
- No guessing which API version you're using
- Easy to migrate between versions
- Backward compatibility path
## Comparison
### Original Design (monitoring_with_tenant)
- Manual selection of each component
- Manual installation of both components
- Need to remember to pass both to harmony_cli::run
- Monitoring not scoped to tenant automatically
### New Design (monitoring_v2)
- Single composed score
- One score does it all
## Usage
```bash
cd examples/monitoring_v2
cargo run
```
## Migration Path
To migrate from the old design to the new:
1. Replace individual alert channel implementations with `AlertChannel<Sender>`
2. Use `MonitoringStack` instead of manual `*Prometheus` selection
3. Use `TenantMonitoringScore` instead of separate `TenantScore` + monitoring scores
4. Select the monitoring version via `MonitoringApiVersion`

View File

@@ -0,0 +1,343 @@
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use log::debug;
use serde::{Deserialize, Serialize};
use serde_yaml::{Mapping, Value};
use harmony::data::Version;
use harmony::interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome};
use harmony::inventory::Inventory;
use harmony::score::Score;
use harmony::topology::{Topology, tenant::TenantManager};
use harmony_k8s::K8sClient;
use harmony_types::k8s_name::K8sName;
use harmony_types::net::Url;
pub trait AlertSender: Send + Sync + std::fmt::Debug {
    fn name(&self) -> String;
    fn namespace(&self) -> String;
}

#[derive(Debug)]
pub struct CRDPrometheus {
    pub namespace: String,
    pub client: Arc<K8sClient>,
}

impl AlertSender for CRDPrometheus {
    fn name(&self) -> String {
        "CRDPrometheus".to_string()
    }

    fn namespace(&self) -> String {
        self.namespace.clone()
    }
}

#[derive(Debug)]
pub struct RHOBObservability {
    pub namespace: String,
    pub client: Arc<K8sClient>,
}

impl AlertSender for RHOBObservability {
    fn name(&self) -> String {
        "RHOBObservability".to_string()
    }

    fn namespace(&self) -> String {
        self.namespace.clone()
    }
}

#[derive(Debug)]
pub struct KubePrometheus {
    pub config: Arc<Mutex<KubePrometheusConfig>>,
}

impl Default for KubePrometheus {
    fn default() -> Self {
        Self::new()
    }
}

impl KubePrometheus {
    pub fn new() -> Self {
        Self {
            config: Arc::new(Mutex::new(KubePrometheusConfig::new())),
        }
    }
}

impl AlertSender for KubePrometheus {
    fn name(&self) -> String {
        "KubePrometheus".to_string()
    }

    fn namespace(&self) -> String {
        self.config
            .lock()
            .unwrap()
            .namespace
            .clone()
            .unwrap_or_else(|| "monitoring".to_string())
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct KubePrometheusConfig {
    pub namespace: Option<String>,
    #[serde(skip)]
    pub alert_receiver_configs: Vec<AlertManagerChannelConfig>,
}

impl KubePrometheusConfig {
    pub fn new() -> Self {
        Self {
            namespace: None,
            alert_receiver_configs: Vec::new(),
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AlertManagerChannelConfig {
    pub channel_receiver: serde_yaml::Value,
    pub channel_route: serde_yaml::Value,
}

impl Default for AlertManagerChannelConfig {
    fn default() -> Self {
        Self {
            channel_receiver: serde_yaml::Value::Mapping(Default::default()),
            channel_route: serde_yaml::Value::Mapping(Default::default()),
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ScrapeTargetConfig {
    pub service_name: String,
    pub port: String,
    pub path: String,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum MonitoringApiVersion {
    V1Helm,
    V2CRD,
    V3RHOB,
}

#[derive(Debug, Clone)]
pub struct MonitoringStack {
    pub version: MonitoringApiVersion,
    pub namespace: String,
    pub alert_channels: Vec<Arc<dyn AlertSender>>,
    pub scrape_targets: Vec<ScrapeTargetConfig>,
}

impl MonitoringStack {
    pub fn new(version: MonitoringApiVersion) -> Self {
        Self {
            version,
            namespace: "monitoring".to_string(),
            alert_channels: Vec::new(),
            scrape_targets: Vec::new(),
        }
    }

    pub fn set_namespace(mut self, namespace: &str) -> Self {
        self.namespace = namespace.to_string();
        self
    }

    pub fn add_alert_channel(mut self, channel: impl AlertSender + 'static) -> Self {
        self.alert_channels.push(Arc::new(channel));
        self
    }

    pub fn set_scrape_targets(mut self, targets: Vec<(&str, &str, String)>) -> Self {
        self.scrape_targets = targets
            .into_iter()
            .map(|(name, port, path)| ScrapeTargetConfig {
                service_name: name.to_string(),
                port: port.to_string(),
                path,
            })
            .collect();
        self
    }
}
pub trait AlertChannel<Sender: AlertSender> {
    fn install_config(&self, sender: &Sender);
    fn name(&self) -> String;
}

#[derive(Debug, Clone)]
pub struct DiscordWebhook {
    pub name: K8sName,
    pub url: Url,
    pub selectors: Vec<HashMap<String, String>>,
}

impl DiscordWebhook {
    fn get_config(&self) -> AlertManagerChannelConfig {
        let mut route = Mapping::new();
        route.insert(
            Value::String("receiver".to_string()),
            Value::String(self.name.to_string()),
        );
        route.insert(
            Value::String("matchers".to_string()),
            Value::Sequence(vec![Value::String("alertname!=Watchdog".to_string())]),
        );

        let mut receiver = Mapping::new();
        receiver.insert(
            Value::String("name".to_string()),
            Value::String(self.name.to_string()),
        );

        let mut discord_config = Mapping::new();
        discord_config.insert(
            Value::String("webhook_url".to_string()),
            Value::String(self.url.to_string()),
        );
        receiver.insert(
            Value::String("discord_configs".to_string()),
            Value::Sequence(vec![Value::Mapping(discord_config)]),
        );

        AlertManagerChannelConfig {
            channel_receiver: Value::Mapping(receiver),
            channel_route: Value::Mapping(route),
        }
    }
}
impl AlertChannel<CRDPrometheus> for DiscordWebhook {
    fn install_config(&self, sender: &CRDPrometheus) {
        debug!(
            "Installing Discord webhook for CRDPrometheus in namespace: {}",
            sender.namespace()
        );
        debug!("Config: {:?}", self.get_config());
        debug!("Installed!");
    }

    fn name(&self) -> String {
        "discord-webhook".to_string()
    }
}

impl AlertChannel<RHOBObservability> for DiscordWebhook {
    fn install_config(&self, sender: &RHOBObservability) {
        debug!(
            "Installing Discord webhook for RHOBObservability in namespace: {}",
            sender.namespace()
        );
        debug!("Config: {:?}", self.get_config());
        debug!("Installed!");
    }

    fn name(&self) -> String {
        "discord-webhook".to_string()
    }
}

impl AlertChannel<KubePrometheus> for DiscordWebhook {
    fn install_config(&self, sender: &KubePrometheus) {
        debug!(
            "Installing Discord webhook for KubePrometheus in namespace: {}",
            sender.namespace()
        );
        // Take the lock once: locking a std::sync::Mutex a second time while the
        // first guard is still alive in this scope would deadlock.
        let mut config = sender.config.lock().unwrap();
        let ns = config
            .namespace
            .clone()
            .unwrap_or_else(|| "monitoring".to_string());
        debug!("Namespace: {}", ns);
        config.alert_receiver_configs.push(self.get_config());
        debug!("Installed!");
    }

    fn name(&self) -> String {
        "discord-webhook".to_string()
    }
}
fn default_monitoring_stack() -> MonitoringStack {
    MonitoringStack::new(MonitoringApiVersion::V2CRD)
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TenantMonitoringScore {
    pub tenant_id: harmony_types::id::Id,
    pub tenant_name: String,
    #[serde(skip)]
    #[serde(default = "default_monitoring_stack")]
    pub monitoring_stack: MonitoringStack,
}

impl TenantMonitoringScore {
    pub fn new(tenant_name: &str, monitoring_stack: MonitoringStack) -> Self {
        Self {
            tenant_id: harmony_types::id::Id::default(),
            tenant_name: tenant_name.to_string(),
            monitoring_stack,
        }
    }
}

impl<T: Topology + TenantManager> Score<T> for TenantMonitoringScore {
    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(TenantMonitoringInterpret {
            score: self.clone(),
        })
    }

    fn name(&self) -> String {
        format!("{} monitoring [TenantMonitoringScore]", self.tenant_name)
    }
}

#[derive(Debug)]
pub struct TenantMonitoringInterpret {
    pub score: TenantMonitoringScore,
}

#[async_trait::async_trait]
impl<T: Topology + TenantManager> Interpret<T> for TenantMonitoringInterpret {
    async fn execute(
        &self,
        _inventory: &Inventory,
        topology: &T,
    ) -> Result<Outcome, InterpretError> {
        let tenant_config = topology.get_tenant_config().await.unwrap();
        let tenant_ns = tenant_config.name.clone();

        match self.score.monitoring_stack.version {
            MonitoringApiVersion::V1Helm => {
                debug!("Installing Helm monitoring for tenant {}", tenant_ns);
            }
            MonitoringApiVersion::V2CRD => {
                debug!("Installing CRD monitoring for tenant {}", tenant_ns);
            }
            MonitoringApiVersion::V3RHOB => {
                debug!("Installing RHOB monitoring for tenant {}", tenant_ns);
            }
        }

        Ok(Outcome::success(format!(
            "Installed monitoring stack for tenant {} with version {:?}",
            self.score.tenant_name, self.score.monitoring_stack.version
        )))
    }

    fn get_name(&self) -> InterpretName {
        InterpretName::Custom("TenantMonitoringInterpret")
    }

    fn get_version(&self) -> Version {
        Version::from("1.0.0").unwrap()
    }

    fn get_status(&self) -> InterpretStatus {
        InterpretStatus::SUCCESS
    }

    fn get_children(&self) -> Vec<harmony_types::id::Id> {
        Vec::new()
    }
}

View File

@@ -1,7 +1,7 @@
use std::net::{IpAddr, Ipv4Addr};
use brocade::{BrocadeOptions, ssh};
-use harmony_secret::Secret;
+use harmony_secret::{Secret, SecretManager};
use harmony_types::switch::PortLocation;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
@@ -21,17 +21,15 @@ async fn main() {
// let ip = IpAddr::V4(Ipv4Addr::new(192, 168, 4, 11)); // brocade @ st
let switch_addresses = vec![ip];
-    // let config = SecretManager::get_or_prompt::<BrocadeSwitchAuth>()
-    //     .await
-    //     .unwrap();
+    let config = SecretManager::get_or_prompt::<BrocadeSwitchAuth>()
+        .await
+        .unwrap();
let brocade = brocade::init(
&switch_addresses,
-        // &config.username,
-        // &config.password,
-        "admin",
-        "password",
-        BrocadeOptions {
+        &config.username,
+        &config.password,
+        &BrocadeOptions {
dry_run: true,
ssh: ssh::SshOptions {
port: 2222,

View File

@@ -1,8 +1,7 @@
use super::BrocadeClient;
use crate::{
BrocadeInfo, Error, ExecutionMode, InterSwitchLink, InterfaceInfo, MacAddressEntry,
-    PortChannelId, PortOperatingMode, SecurityLevel, parse_brocade_mac_address,
-    shell::BrocadeShell,
+    PortChannelId, PortOperatingMode, parse_brocade_mac_address, shell::BrocadeShell,
};
use async_trait::async_trait;

View File

@@ -144,7 +144,7 @@ pub async fn init(
ip_addresses: &[IpAddr],
username: &str,
password: &str,
-    options: BrocadeOptions,
+    options: &BrocadeOptions,
) -> Result<Box<dyn BrocadeClient + Send + Sync>, Error> {
let shell = BrocadeShell::init(ip_addresses, username, password, options).await?;

View File

@@ -8,7 +8,7 @@ use regex::Regex;
use crate::{
BrocadeClient, BrocadeInfo, Error, ExecutionMode, InterSwitchLink, InterfaceInfo,
InterfaceStatus, InterfaceType, MacAddressEntry, PortChannelId, PortOperatingMode,
-    SecurityLevel, parse_brocade_mac_address, shell::BrocadeShell,
+    parse_brocade_mac_address, shell::BrocadeShell,
};
#[derive(Debug)]

View File

@@ -28,7 +28,7 @@ impl BrocadeShell {
ip_addresses: &[IpAddr],
username: &str,
password: &str,
-    options: BrocadeOptions,
+    options: &BrocadeOptions,
) -> Result<Self, Error> {
let ip = ip_addresses
.first()

View File

@@ -70,7 +70,7 @@ pub async fn try_init_client(
username: &str,
password: &str,
ip: &std::net::IpAddr,
-    base_options: BrocadeOptions,
+    base_options: &BrocadeOptions,
) -> Result<BrocadeOptions, Error> {
let mut default = SshOptions::default();
default.port = base_options.ssh.port;

View File

@@ -1 +1,46 @@
-Not much here yet, see the `adr` folder for now. More to come in time!
# Harmony Documentation Hub
Welcome to the Harmony documentation. This is the main entry point for learning everything from core concepts to building your own Score, Topologies, and Capabilities.
## 1. Getting Started
If you're new to Harmony, start here:
- [**Getting Started Guide**](./guides/getting-started.md): A step-by-step tutorial that takes you from an empty project to deploying your first application.
- [**Core Concepts**](./concepts.md): A high-level overview of the key concepts in Harmony: `Score`, `Topology`, `Capability`, `Inventory`, `Interpret`, ...
## 2. Use Cases & Examples
See how to use Harmony to solve real-world problems.
- [**OKD on Bare Metal**](./use-cases/okd-on-bare-metal.md): A detailed walkthrough of bootstrapping a high-availability OKD cluster from physical hardware.
- [**Deploy a Rust Web App**](./use-cases/deploy-rust-webapp.md): A quick guide to deploying a monitored, containerized web application to a Kubernetes cluster.
## 3. Component Catalogs
Discover existing, reusable components you can use in your Harmony projects.
- [**Scores Catalog**](./catalogs/scores.md): A categorized list of all available `Scores` (the "what").
- [**Topologies Catalog**](./catalogs/topologies.md): A list of all available `Topologies` (the "where").
- [**Capabilities Catalog**](./catalogs/capabilities.md): A list of all available `Capabilities` (the "how").
## 4. Developer Guides
Ready to build your own components? These guides show you how.
- [**Writing a Score**](./guides/writing-a-score.md): Learn how to create your own `Score` and `Interpret` logic to define a new desired state.
- [**Writing a Topology**](./guides/writing-a-topology.md): Learn how to model a new environment (like AWS, GCP, or custom hardware) as a `Topology`.
- [**Adding Capabilities**](./guides/adding-capabilities.md): See how to add a `Capability` to your custom `Topology`.
- [**Coding Guide**](./coding-guide.md): Conventions and best practices for writing Harmony code.
## 5. Module Documentation
Deep dives into specific Harmony modules and features.
- [**Monitoring and Alerting**](./monitoring.md): Comprehensive guide to cluster, tenant, and application-level monitoring with support for OKD, KubePrometheus, RHOB, and more.
## 6. Architecture Decision Records
Important architectural decisions are documented in the `adr/` directory:
- [Full ADR Index](../adr/)

docs/catalogs/README.md Normal file
View File

@@ -0,0 +1,7 @@
# Component Catalogs
This section is the "dictionary" for Harmony. It lists all the reusable components available out-of-the-box.
- [**Scores Catalog**](./scores.md): Discover all available `Scores` (the "what").
- [**Topologies Catalog**](./topologies.md): A list of all available `Topologies` (the "where").
- [**Capabilities Catalog**](./capabilities.md): A list of all available `Capabilities` (the "how").

View File

@@ -0,0 +1,40 @@
# Capabilities Catalog
A `Capability` is a specific feature or API that a `Topology` offers. `Interpret` logic uses these capabilities to execute a `Score`.
This list is primarily for developers **writing new Topologies or Scores**. As a user, you just need to know that the `Topology` you pick (like `K8sAnywhereTopology`) provides the capabilities your `Scores` (like `ApplicationScore`) need.
<!--toc:start-->
- [Capabilities Catalog](#capabilities-catalog)
- [Kubernetes & Application](#kubernetes-application)
- [Monitoring & Observability](#monitoring-observability)
- [Networking (Core Services)](#networking-core-services)
- [Networking (Hardware & Host)](#networking-hardware-host)
<!--toc:end-->
## Kubernetes & Application
- **K8sClient**: Provides an authenticated client to interact with a Kubernetes API (create/read/update/delete resources).
- **HelmCommand**: Provides the ability to execute Helm commands (install, upgrade, template).
- **TenantManager**: Provides methods for managing tenants in a multi-tenant cluster.
- **Ingress**: Provides an interface for managing ingress controllers and resources.
## Monitoring & Observability
- **Grafana**: Provides an API for configuring Grafana (datasources, dashboards).
- **Monitoring**: A general capability for configuring monitoring (e.g., creating Prometheus rules).
## Networking (Core Services)
- **DnsServer**: Provides an interface for creating and managing DNS records.
- **LoadBalancer**: Provides an interface for configuring a load balancer (e.g., OPNsense, MetalLB).
- **DhcpServer**: Provides an interface for managing DHCP leases and host bindings.
- **TftpServer**: Provides an interface for managing files on a TFTP server (e.g., iPXE boot files).
## Networking (Hardware & Host)
- **Router**: Provides an interface for configuring routing rules, typically on a firewall like OPNsense.
- **Switch**: Provides an interface for configuring a physical network switch (e.g., managing VLANs and port channels).
- **NetworkManager**: Provides an interface for configuring host-level networking (e.g., creating bonds and bridges on a node).

docs/catalogs/scores.md Normal file
View File

@@ -0,0 +1,102 @@
# Scores Catalog
A `Score` is a declarative description of a desired state. Find the Score you need and add it to your `harmony!` block's `scores` array.
<!--toc:start-->
- [Scores Catalog](#scores-catalog)
- [Application Deployment](#application-deployment)
- [OKD / Kubernetes Cluster Setup](#okd-kubernetes-cluster-setup)
- [Cluster Services & Management](#cluster-services-management)
- [Monitoring & Alerting](#monitoring-alerting)
- [Infrastructure & Networking (Bare Metal)](#infrastructure-networking-bare-metal)
- [Infrastructure & Networking (Cluster)](#infrastructure-networking-cluster)
- [Tenant Management](#tenant-management)
- [Utility](#utility)
<!--toc:end-->
## Application Deployment
Scores for deploying and managing end-user applications.
- **ApplicationScore**: The primary score for deploying a web application. Describes the application, its framework, and the features it requires (e.g., monitoring, CI/CD).
- **HelmChartScore**: Deploys a generic Helm chart to a Kubernetes cluster.
- **ArgoHelmScore**: Deploys an application using an ArgoCD Helm chart.
- **LAMPScore**: A specialized score for deploying a classic LAMP (Linux, Apache, MySQL, PHP) stack.
## OKD / Kubernetes Cluster Setup
This collection of Scores is used to provision an entire OKD cluster from bare metal. They are typically used in order.
- **OKDSetup01InventoryScore**: Discovers and catalogs the physical hardware.
- **OKDSetup02BootstrapScore**: Configures the bootstrap node, renders iPXE files, and kicks off the SCOS installation.
- **OKDSetup03ControlPlaneScore**: Renders iPXE configurations for the control plane nodes.
- **OKDSetupPersistNetworkBondScore**: Configures network bonds on the nodes and port channels on the switches.
- **OKDSetup04WorkersScore**: Renders iPXE configurations for the worker nodes.
- **OKDSetup06InstallationReportScore**: Runs post-installation checks and generates a report.
- **OKDUpgradeScore**: Manages the upgrade process for an existing OKD cluster.
## Cluster Services & Management
Scores for installing and managing services _inside_ a Kubernetes cluster.
- **K3DInstallationScore**: Installs and configures a local K3D (k3s-in-docker) cluster. Used by `K8sAnywhereTopology`.
- **CertManagerHelmScore**: Deploys the `cert-manager` Helm chart.
- **ClusterIssuerScore**: Configures a `ClusterIssuer` for `cert-manager` (e.g., for Let's Encrypt).
- **K8sNamespaceScore**: Ensures a Kubernetes namespace exists.
- **K8sDeploymentScore**: Deploys a generic `Deployment` resource to Kubernetes.
- **K8sIngressScore**: Configures an `Ingress` resource for a service.
## Monitoring & Alerting
Scores for configuring observability, dashboards, and alerts.
- **ApplicationMonitoringScore**: A generic score to set up monitoring for an application.
- **ApplicationRHOBMonitoringScore**: A specialized score for setting up monitoring via the Red Hat Observability stack.
- **HelmPrometheusAlertingScore**: Configures Prometheus alerts via a Helm chart.
- **K8sPrometheusCRDAlertingScore**: Configures Prometheus alerts using the `PrometheusRule` CRD.
- **PrometheusAlertScore**: A generic score for creating a Prometheus alert.
- **RHOBAlertingScore**: Configures alerts specifically for the Red Hat Observability stack.
- **NtfyScore**: Configures alerts to be sent to a `ntfy.sh` server.
## Infrastructure & Networking (Bare Metal)
Low-level scores for managing physical hardware and network services.
- **DhcpScore**: Configures a DHCP server.
- **OKDDhcpScore**: A specialized DHCP configuration for the OKD bootstrap process.
- **OKDBootstrapDhcpScore**: Configures DHCP specifically for the bootstrap node.
- **DhcpHostBindingScore**: Creates a specific MAC-to-IP binding in the DHCP server.
- **DnsScore**: Configures a DNS server.
- **OKDDnsScore**: A specialized DNS configuration for the OKD cluster (e.g., `api.*`, `*.apps.*`).
- **StaticFilesHttpScore**: Serves a directory of static files (e.g., a documentation site) over HTTP.
- **TftpScore**: Configures a TFTP server, typically for serving iPXE boot files.
- **IPxeMacBootFileScore**: Assigns a specific iPXE boot file to a MAC address in the TFTP server.
- **OKDIpxeScore**: A specialized score for generating the iPXE boot scripts for OKD.
- **OPNsenseShellCommandScore**: Executes a shell command on an OPNsense firewall.
## Infrastructure & Networking (Cluster)
Network services that run inside the cluster or as part of the topology.
- **LoadBalancerScore**: Configures a general-purpose load balancer.
- **OKDLoadBalancerScore**: Configures the high-availability load balancers for the OKD API and ingress.
- **OKDBootstrapLoadBalancerScore**: Configures the load balancer specifically for the bootstrap-time API endpoint.
- **K8sIngressScore**: Configures an Ingress controller or resource.
- [HighAvailabilityHostNetworkScore](../../harmony/src/modules/okd/host_network.rs): Configures network bonds on a host and the corresponding port-channels on the switch stack for high-availability.
## Tenant Management
Scores for managing multi-tenancy within a cluster.
- **TenantScore**: Creates a new tenant (e.g., a namespace, quotas, network policies).
- **TenantCredentialScore**: Generates and provisions credentials for a new tenant.
## Utility
Helper scores for discovery and inspection.
- **LaunchDiscoverInventoryAgentScore**: Launches the agent responsible for the `OKDSetup01InventoryScore`.
- **DiscoverHostForRoleScore**: A utility score to find a host matching a specific role in the inventory.
- **InspectInventoryScore**: Dumps the discovered inventory for inspection.

View File

@@ -0,0 +1,59 @@
# Topologies Catalog
A `Topology` is the logical representation of your infrastructure and its `Capabilities`. You select a `Topology` in your Harmony project to define _where_ your `Scores` will be applied.
<!--toc:start-->
- [Topologies Catalog](#topologies-catalog)
- [HAClusterTopology](#haclustertopology)
- [K8sAnywhereTopology](#k8sanywheretopology)
<!--toc:end-->
### HAClusterTopology
- **`HAClusterTopology::autoload()`**
This `Topology` represents a high-availability, bare-metal cluster. It is designed for production-grade deployments like OKD.
It models an environment consisting of:
- At least 3 cluster nodes (for control plane/workers)
- 2 redundant firewalls (e.g., OPNsense)
- 2 redundant network switches
**Provided Capabilities:**
This topology provides a rich set of capabilities required for bare-metal provisioning and cluster management, including:
- `K8sClient` (once the cluster is bootstrapped)
- `DnsServer`
- `LoadBalancer`
- `DhcpServer`
- `TftpServer`
- `Router` (via the firewalls)
- `Switch`
- `NetworkManager` (for host-level network config)
---
### K8sAnywhereTopology
- **`K8sAnywhereTopology::from_env()`**
This `Topology` is designed for development and application deployment. It provides a simple, abstract way to deploy to _any_ Kubernetes cluster.
**How it works:**
1. By default (`from_env()` with no env vars), it automatically provisions a **local K3D (k3s-in-docker) cluster** on your machine. This is perfect for local development and testing.
2. If you provide a `KUBECONFIG` environment variable, it will instead connect to that **existing Kubernetes cluster** (e.g., your staging or production OKD cluster).
This allows you to use the _exact same code_ to deploy your application locally as you do to deploy it to production.
**Provided Capabilities:**
- `K8sClient`
- `HelmCommand`
- `TenantManager`
- `Ingress`
- `Monitoring`
- ...and more.

docs/coding-guide.md Normal file
View File

@@ -0,0 +1,299 @@
# Harmony Coding Guide
Harmony is an infrastructure automation framework. It is **code-first and code-only**: operators write Rust programs to declare and drive infrastructure, rather than YAML files or DSL configs. Good code here means a good operator experience.
### Concrete context
We use the KVM module as the concrete context for explaining the coding style. This makes the guidance easy to follow, and it translates well to the other modules and contexts Harmony manages, such as OPNsense and Kubernetes.
## Core Philosophy
### The Careful Craftsman Principle
Harmony is a powerful framework that does a lot. With that power comes responsibility. Every abstraction, every trait, every module must earn its place. Before adding anything, ask:
1. **Does this solve a real problem users have?** Not a theoretical problem, an actual one encountered in production.
2. **Is this the simplest solution that works?** Complexity is a cost that compounds over time.
3. **Will this make the next developer's life easier or harder?** Code is read far more often than written.
When in doubt, don't abstract. Wait for the pattern to emerge from real usage. A little duplication is better than the wrong abstraction.
### High-level functions over raw primitives
Callers should not need to know about underlying protocols, XML schemas, or API quirks. A function that deploys a VM should accept meaningful parameters like CPU count, memory, and network name — not XML strings.
```rust
// Bad: caller constructs XML and passes it to a thin wrapper
let xml = format!(r#"<domain type='kvm'>...</domain>"#, name, memory_kb, ...);
executor.create_vm(&xml).await?;

// Good: caller describes intent, the module handles representation
executor.define_vm(&VmConfig::builder("my-vm")
    .cpu(4)
    .memory_gb(8)
    .disk(DiskConfig::new(50))
    .network(NetworkRef::named("mylan"))
    .boot_order([BootDevice::Network, BootDevice::Disk])
    .build())
    .await?;
```
The module owns the XML, the virsh invocations, the API calls — not the caller.
### Use the right abstraction layer
Prefer native library bindings over shelling out to CLI tools. The `virt` crate provides direct libvirt bindings and should be used instead of spawning `virsh` subprocesses.
- CLI subprocess calls are fragile: stdout/stderr parsing, exit codes, quoting, PATH differences
- Native bindings give typed errors, no temp files, no shell escaping
- `virt::connect::Connect` opens a connection; `virt::domain::Domain` manages VMs; `virt::network::Network` manages virtual networks
### Keep functions small and well-named
Each function should do one thing. If a function is doing two conceptually separate things, split it. Function names should read like plain English: `ensure_network_active`, `define_vm`, `vm_is_running`.
### Prefer short modules over large files
Group related types and functions by concept. A module that handles one resource (e.g., network, domain, storage) is better than a single file for everything.
---
## Error Handling
### Use `thiserror` for all error types
Define error types with `thiserror::Error`. This removes the boilerplate of implementing `Display` and `std::error::Error` by hand, keeps error messages close to their variants, and makes types easy to extend.
```rust
// Bad: hand-rolled Display + std::error::Error
#[derive(Debug)]
pub enum KVMError {
    ConnectionError(String),
    VMNotFound(String),
}

impl std::fmt::Display for KVMError { ... }
impl std::error::Error for KVMError {}

// Good: derive Display via thiserror
#[derive(thiserror::Error, Debug)]
pub enum KVMError {
    #[error("connection failed: {0}")]
    ConnectionFailed(String),
    #[error("VM not found: {name}")]
    VmNotFound { name: String },
}
```
### Make bubbling errors easy with `?` and `From`
`?` works on any error type for which there is a `From` impl. Add `From` conversions from lower-level errors into your module's error type so callers can use `?` without boilerplate.
With `thiserror`, wrapping a foreign error is one line:
```rust
#[derive(thiserror::Error, Debug)]
pub enum KVMError {
    #[error("libvirt error: {0}")]
    Libvirt(#[from] virt::error::Error),
    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
}
```
This means a call that returns `virt::error::Error` can be `?`-propagated into a `Result<_, KVMError>` without any `.map_err(...)`.
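For example, with the `Io(#[from] std::io::Error)` variant above, a plain `?` is all a caller needs:

```rust
// std::io::Error converts into KVMError automatically through the From impl,
// so no .map_err(...) is required at the call site.
fn read_domain_xml(path: &str) -> Result<String, KVMError> {
    let xml = std::fs::read_to_string(path)?;
    Ok(xml)
}
```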
### Typed errors over stringly-typed errors
Avoid `Box<dyn Error>` or `String` as error return types in library code. Callers need to distinguish errors programmatically — `KVMError::VmAlreadyExists` is actionable, `"VM already exists: foo"` as a `String` is not.
At binary entry points (e.g., `main`) it is acceptable to convert to `String` or `anyhow::Error` for display.
---
## Logging
### Use the `log` crate macros
All log output must go through the `log` crate. Never use `println!`, `eprintln!`, or `dbg!` in library code. This makes output compatible with any logging backend (env_logger, tracing, structured logging, etc.).
```rust
// Bad
println!("Creating VM: {}", name);
// Good
use log::{info, debug, warn};
info!("Creating VM: {name}");
debug!("VM XML:\n{xml}");
warn!("Network already active, skipping creation");
```
Use the right level:
| Level | When to use |
|---------|-------------|
| `error` | Unrecoverable failures (before returning Err) |
| `warn` | Recoverable issues, skipped steps |
| `info` | High-level progress events visible in normal operation |
| `debug` | Detailed operational info useful for debugging |
| `trace` | Very granular, per-iteration or per-call data |
Log before significant operations and after unexpected conditions. Do not log inside tight loops at `info` level.
---
## Types and Builders
### Derive `Serialize` on all public domain types
All public structs and enums that represent configuration or state should derive `serde::Serialize`. Add `Deserialize` when round-trip serialization is needed.
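For instance (the type and its fields are illustrative):

```rust
use serde::Serialize;

// Serialize always; add Deserialize only when round-trip serialization is needed.
#[derive(Debug, Clone, Serialize)]
pub struct DiskConfig {
    size_gb: u64,
    label: Option<String>,
}
```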
### Builder pattern for complex configs
When a type has more than three fields or optional fields, provide a builder. The builder pattern allows named, incremental construction without positional arguments.
```rust
let config = VmConfig::builder("bootstrap")
    .cpu(4)
    .memory_gb(8)
    .disk(DiskConfig::new(50).labeled("os"))
    .disk(DiskConfig::new(100).labeled("data"))
    .network(NetworkRef::named("harmonylan"))
    .boot_order([BootDevice::Network, BootDevice::Disk])
    .build();
```
### Avoid `pub` fields on config structs
Expose data through methods or the builder, not raw field access. This preserves the ability to validate, rename, or change representation without breaking callers.
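A small sketch of the idea, with an invented validation rule:

```rust
#[derive(Debug, thiserror::Error)]
pub enum NetworkConfigError {
    #[error("network name must not be empty")]
    EmptyName,
}

pub struct NetworkConfig {
    name: String, // private: can be validated or renamed without breaking callers
}

impl NetworkConfig {
    pub fn new(name: impl Into<String>) -> Result<Self, NetworkConfigError> {
        let name = name.into();
        if name.is_empty() {
            return Err(NetworkConfigError::EmptyName);
        }
        Ok(Self { name })
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}
```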
---
## Async
### Use `tokio` for all async runtime needs
All async code runs on tokio. Use `tokio::spawn`, `tokio::time`, etc. Use `#[async_trait]` for traits with async methods.
### No blocking in async context
Never call blocking I/O (file I/O, network, process spawn) directly in an async function. Use `tokio::fs`, `tokio::process`, or `tokio::task::spawn_blocking` as appropriate.
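For example, a blocking file read belongs on the blocking thread pool (the checksum itself is purely illustrative):

```rust
// Confine blocking I/O to spawn_blocking so the async executor keeps making progress.
async fn checksum_disk_image(path: std::path::PathBuf) -> std::io::Result<u64> {
    tokio::task::spawn_blocking(move || {
        let data = std::fs::read(&path)?; // blocking read, kept off the async threads
        Ok(data.iter().map(|b| u64::from(*b)).sum())
    })
    .await
    .expect("blocking task panicked")
}
```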
---
## Module Structure
### Follow the `Score` / `Interpret` pattern
Modules that represent deployable infrastructure should implement `Score<T: Topology>` and `Interpret<T>`:
- `Score` is the serializable, clonable configuration declaring *what* to deploy
- `Interpret` does the actual work when `execute()` is called
```rust
pub struct KvmScore {
    network: NetworkConfig,
    vms: Vec<VmConfig>,
}

impl<T: Topology + KvmHost> Score<T> for KvmScore {
    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(KvmInterpret::new(self.clone()))
    }

    fn name(&self) -> String { "KvmScore".to_string() }
}
```
### Flatten the public API in `mod.rs`
Internal submodules are implementation detail. Re-export what callers need at the module root:
```rust
// modules/kvm/mod.rs
mod connection;
mod domain;
mod network;
mod error;
mod xml;
pub use connection::KvmConnection;
pub use domain::{VmConfig, VmConfigBuilder, VmStatus, DiskConfig, BootDevice};
pub use error::KvmError;
pub use network::NetworkConfig;
```
---
## Commit Style
Follow [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/):
```
feat(kvm): add network isolation support
fix(kvm): correct memory unit conversion for libvirt
refactor(kvm): replace virsh subprocess calls with virt crate bindings
docs: add coding guide
```
Keep pull requests small and single-purpose (under ~200 lines excluding generated code). Do not mix refactoring, bug fixes, and new features in one PR.
---
## When to Add Abstractions
Harmony provides powerful abstraction mechanisms: traits, generics, the Score/Interpret pattern, and capabilities. Use them judiciously.
### Add an abstraction when:
- **You have three or more concrete implementations** doing the same thing. Two is often coincidence; three is a pattern.
- **The abstraction provides compile-time safety** that prevents real bugs (e.g., capability bounds on topologies).
- **The abstraction hides genuine complexity** that callers shouldn't need to understand (e.g., XML schema generation for libvirt).
### Don't add an abstraction when:
- **It's just to avoid a few lines of boilerplate**. Copy-paste is sometimes better than a trait hierarchy.
- **You're anticipating future flexibility** that isn't needed today. YAGNI (You Aren't Gonna Need It).
- **The abstraction makes the code harder to understand** for someone unfamiliar with the codebase.
- **You're wrapping a single implementation**. A trait with one implementation is usually over-engineering.
### Signs you've over-abstracted:
- You need to explain the type system to a competent Rust developer for them to understand how to add a simple feature.
- Adding a new concrete type requires changes in multiple trait definitions.
- The word "factory" or "manager" appears in your type names.
- You have more trait definitions than concrete implementations.
### The Rule of Three for Traits
Before creating a new trait, ensure you have:
1. A clear, real use case (not hypothetical)
2. At least one concrete implementation
3. A plan for how callers will use it
Only generalize when the pattern is proven. The monitoring module is a good example: we had multiple alert senders (OKD, KubePrometheus, RHOB) before we introduced the `AlertSender` and `AlertReceiver<S>` traits. The traits emerged from real needs, not design sessions.
---
## Documentation
### Document the "why", not the "what"
Code should be self-explanatory for the "what". Comments and documentation should explain intent, rationale, and gotchas.
```rust
// Bad: restates the code
// Returns the number of VMs
fn vm_count(&self) -> usize { self.vms.len() }
// Good: explains the why
// Returns 0 if connection is lost, rather than erroring,
// because monitoring code uses this for health checks
fn vm_count(&self) -> usize { self.vms.len() }
```
### Keep examples in the `examples/` directory
Working code beats documentation. Every major feature should have a runnable example that demonstrates real usage.

docs/concepts.md Normal file
View File

@@ -0,0 +1,40 @@
# Core Concepts
Harmony's design is based on a few key concepts. Understanding them is the key to unlocking the framework's power.
### 1. Score
- **What it is:** A **Score** is a declarative description of a desired state. It's a "resource" that defines _what_ you want to achieve, not _how_ to do it.
- **Example:** `ApplicationScore` declares "I want this web application to be running and monitored."
### 2. Topology
- **What it is:** A **Topology** is the logical representation of your infrastructure and its abilities. It's the "where" your Scores will be applied.
- **Key Job:** A Topology's most important job is to expose which `Capabilities` it supports.
- **Example:** `HAClusterTopology` represents a bare-metal cluster and exposes `Capabilities` like `NetworkManager` and `Switch`. `K8sAnywhereTopology` represents a Kubernetes cluster and exposes the `K8sClient` `Capability`.
### 3. Capability
- **What it is:** A **Capability** is a specific feature or API that a `Topology` offers. It's the "how" a `Topology` can fulfill a `Score`'s request.
- **Example:** The `K8sClient` capability offers a way to interact with a Kubernetes API. The `Switch` capability offers a way to configure a physical network switch.
### 4. Interpret
- **What it is:** An **Interpret** is the execution logic that makes a `Score` a reality. It's the "glue" that connects the _desired state_ (`Score`) to the _environment's abilities_ (`Topology`'s `Capabilities`).
- **How it works:** When you apply a `Score`, Harmony finds the matching `Interpret` for your `Topology`. This `Interpret` then uses the `Capabilities` provided by the `Topology` to execute the necessary steps.
### 5. Inventory
- **What it is:** An **Inventory** is the physical material (the "what") used in a cluster. This is most relevant for bare-metal or on-premise topologies.
- **Example:** A list of nodes with their roles (control plane, worker), CPU, RAM, and network interfaces. For the `K8sAnywhereTopology`, the inventory might be empty or autoloaded, as the infrastructure is more abstract.
---
### How They Work Together (The Compile-Time Check)
1. You **write a `Score`** (e.g., `ApplicationScore`).
2. Your `Score`'s `Interpret` logic requires certain **`Capabilities`** (e.g., `K8sClient` and `Ingress`).
3. You choose a **`Topology`** to run it on (e.g., `HAClusterTopology`).
4. **At compile-time**, Harmony checks: "Does `HAClusterTopology` provide the `K8sClient` and `Ingress` capabilities that `ApplicationScore` needs?"
- **If Yes:** Your code compiles. You can be confident it will run.
- **If No:** The compiler gives you an error. You've just prevented a "config-is-valid-but-platform-is-wrong" runtime error before you even deployed.
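A toy sketch of that check, with deliberately simplified (hypothetical) trait and type names:

```rust
// Simplified stand-ins for Harmony's Topology / Capability / Score machinery.
pub trait Topology {}
pub trait K8sClient {} // a capability marker

pub trait Score<T: Topology> {
    fn name(&self) -> String;
}

pub struct ApplicationScore;

// ApplicationScore only exists for topologies that provide the K8sClient capability.
impl<T: Topology + K8sClient> Score<T> for ApplicationScore {
    fn name(&self) -> String {
        "ApplicationScore".to_string()
    }
}

pub struct K8sAnywhereTopology;
impl Topology for K8sAnywhereTopology {}
impl K8sClient for K8sAnywhereTopology {}

pub struct BareMetalOnly;
impl Topology for BareMetalOnly {}

fn apply<T: Topology>(score: &dyn Score<T>, _topology: &T) {
    println!("applying {}", score.name());
}

fn main() {
    apply(&ApplicationScore, &K8sAnywhereTopology); // compiles
    // apply(&ApplicationScore, &BareMetalOnly);    // compile error: K8sClient missing
}
```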

View File

@@ -0,0 +1,42 @@
# Getting Started Guide
Welcome to Harmony! This guide will walk you through installing the Harmony framework, setting up a new project, and deploying your first application.
We will build and deploy the "Rust Web App" example, which automatically:
1. Provisions a local K3D (Kubernetes in Docker) cluster.
2. Deploys a sample Rust web application.
3. Sets up monitoring for the application.
## Prerequisites
Before you begin, you'll need a few tools installed on your system:
- **Rust & Cargo:** [Install Rust](https://www.rust-lang.org/tools/install)
- **Docker:** [Install Docker](https://docs.docker.com/get-docker/) (Required for the K3D local cluster)
- **kubectl:** [Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) (For inspecting the cluster)
## 1. Install Harmony
First, clone the Harmony repository and build the project. This gives you the `harmony` CLI and all the core libraries.
```bash
# Clone the main repository
git clone https://git.nationtech.io/nationtech/harmony
cd harmony
# Build the project (this may take a few minutes)
cargo build --release
```
...
## Next Steps
Congratulations, you've just deployed an application using true infrastructure-as-code!
From here, you can:
- [Explore the Catalogs](../catalogs/README.md): See what other [Scores](../catalogs/scores.md) and [Topologies](../catalogs/topologies.md) are available.
- [Read the Use Cases](../use-cases/README.md): Check out the [OKD on Bare Metal](../use-cases/okd-on-bare-metal.md) guide for a more advanced scenario.
- [Write your own Score](../guides/writing-a-score.md): Dive into the [Developer Guide](../guides/developer-guide.md) to start building your own components.

docs/monitoring.md Normal file
View File

@@ -0,0 +1,443 @@
# Monitoring and Alerting in Harmony
Harmony provides a unified, type-safe approach to monitoring and alerting across Kubernetes, OpenShift, and bare-metal infrastructure. This guide explains the architecture and how to use it at different levels of abstraction.
## Overview
Harmony's monitoring module supports three distinct use cases:
| Level | Who Uses It | What It Provides |
|-------|-------------|------------------|
| **Cluster** | Cluster administrators | Full control over monitoring stack, cluster-wide alerts, external scrape targets |
| **Tenant** | Platform teams | Namespace-scoped monitoring in multi-tenant environments |
| **Application** | Application developers | Zero-config monitoring that "just works" |
Each level builds on the same underlying abstractions, ensuring consistency while providing appropriate complexity for each audience.
## Core Concepts
### AlertSender
An `AlertSender` represents the system that evaluates alert rules and sends notifications. Harmony supports multiple monitoring stacks:
| Sender | Description | Use When |
|--------|-------------|----------|
| `OpenshiftClusterAlertSender` | OKD/OpenShift built-in monitoring | Running on OKD/OpenShift |
| `KubePrometheus` | kube-prometheus-stack via Helm | Standard Kubernetes, need full stack |
| `Prometheus` | Standalone Prometheus | Custom Prometheus deployment |
| `RedHatClusterObservability` | RHOB operator | Red Hat managed clusters |
| `Grafana` | Grafana-managed alerting | Grafana as primary alerting layer |
### AlertReceiver
An `AlertReceiver` defines where alerts are sent (Discord, Slack, email, webhook, etc.). Receivers are parameterized by sender type because each monitoring stack has different configuration formats.
```rust
pub trait AlertReceiver<S: AlertSender> {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError>;
    fn name(&self) -> String;
}
```
Built-in receivers:
- `DiscordReceiver` - Discord webhooks
- `WebhookReceiver` - Generic HTTP webhooks
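A sketch of what an additional receiver could look like; the `PagerReceiver` type, its fields, and the `todo!()` body are hypothetical:

```rust
#[derive(Debug)]
pub struct PagerReceiver {
    pub name: String,
    pub url: Url,
}

impl AlertReceiver<Prometheus> for PagerReceiver {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
        // Translate this receiver into the sender-specific Alertmanager configuration.
        todo!()
    }

    fn name(&self) -> String {
        self.name.clone()
    }
}
```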
### AlertRule
An `AlertRule` defines a Prometheus alert expression. Rules are also parameterized by sender to handle different CRD formats.
```rust
pub trait AlertRule<S: AlertSender> {
    fn build_rule(&self) -> Result<serde_json::Value, InterpretError>;
    fn name(&self) -> String;
}
```
### Observability Capability
Topologies implement `Observability<S>` to indicate they support a specific alert sender:
```rust
impl Observability<OpenshiftClusterAlertSender> for K8sAnywhereTopology {
    async fn install_receivers(&self, sender, inventory, receivers) { ... }
    async fn install_rules(&self, sender, inventory, rules) { ... }
    // ...
}
```
This provides **compile-time verification**: if you try to use `OpenshiftClusterAlertScore` with a topology that doesn't implement `Observability<OpenshiftClusterAlertSender>`, the code won't compile.
---
## Level 1: Cluster Monitoring
Cluster monitoring is for administrators who need full control over the monitoring infrastructure. This includes:
- Installing/managing the monitoring stack
- Configuring cluster-wide alert receivers
- Defining cluster-level alert rules
- Adding external scrape targets (e.g., bare-metal servers, firewalls)
### Example: OKD Cluster Alerts
```rust
use harmony::{
    inventory::Inventory,
    modules::monitoring::{
        alert_channel::discord_alert_channel::DiscordReceiver,
        alert_rule::{alerts::k8s::pvc::high_pvc_fill_rate_over_two_days, prometheus_alert_rule::AlertManagerRuleGroup},
        okd::{OpenshiftClusterAlertSender, openshift_cluster_alerting_score::OpenshiftClusterAlertScore},
        scrape_target::prometheus_node_exporter::PrometheusNodeExporter,
    },
    topology::{K8sAnywhereTopology, monitoring::{AlertMatcher, AlertRoute, MatchOp}},
};
use harmony_macros::{hurl, ip};
let severity_matcher = AlertMatcher {
label: "severity".to_string(),
operator: MatchOp::Eq,
value: "critical".to_string(),
};
let rule_group = AlertManagerRuleGroup::new(
"cluster-rules",
vec![high_pvc_fill_rate_over_two_days()],
);
let external_exporter = PrometheusNodeExporter {
job_name: "firewall".to_string(),
metrics_path: "/metrics".to_string(),
listen_address: ip!("192.168.1.1"),
port: 9100,
..Default::default()
};
harmony_cli::run(
Inventory::autoload(),
K8sAnywhereTopology::from_env(),
vec![Box::new(OpenshiftClusterAlertScore {
sender: OpenshiftClusterAlertSender,
receivers: vec![Box::new(DiscordReceiver {
name: "critical-alerts".to_string(),
url: hurl!("https://discord.com/api/webhooks/..."),
route: AlertRoute {
matchers: vec![severity_matcher],
..AlertRoute::default("critical-alerts".to_string())
},
})],
rules: vec![Box::new(rule_group)],
scrape_targets: Some(vec![Box::new(external_exporter)]),
})],
None,
).await?;
```
### What This Does
1. **Enables cluster monitoring** - Activates OKD's built-in Prometheus
2. **Enables user workload monitoring** - Allows namespace-scoped rules
3. **Configures Alertmanager** - Adds Discord receiver with route matching
4. **Deploys alert rules** - Creates `AlertingRule` CRD with PVC fill rate alert
5. **Adds external scrape target** - Configures Prometheus to scrape the firewall
### Compile-Time Safety
The `OpenshiftClusterAlertScore` requires:
```rust
impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T>
for OpenshiftClusterAlertScore
```
If `K8sAnywhereTopology` didn't implement `Observability<OpenshiftClusterAlertSender>`, this code would fail to compile. You cannot accidentally deploy OKD alerts to a cluster that doesn't support them.
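For illustration, with a hypothetical `BareTopology` that implements `Topology` but not the required capability:
```rust
// Hypothetical type, for illustration only.
struct BareTopology;
// Using the score against it is rejected before anything is deployed:
//
//   error[E0277]: the trait bound `BareTopology:
//       Observability<OpenshiftClusterAlertSender>` is not satisfied
//
// let scores: Vec<Box<dyn Score<BareTopology>>> =
//     vec![Box::new(OpenshiftClusterAlertScore { /* ... */ })];
```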
---
## Level 2: Tenant Monitoring
In multi-tenant clusters, teams are often confined to specific namespaces. Tenant monitoring adapts to this constraint:
- Resources are deployed in the tenant's namespace
- Tenants cannot modify cluster-level monitoring configuration
- The topology determines namespace context at runtime
### How It Works
The topology's `Observability` implementation handles tenant scoping:
```rust
impl Observability<KubePrometheus> for K8sAnywhereTopology {
async fn install_rules(&self, sender, inventory, rules) {
// Topology knows if it's tenant-scoped
let namespace = self.get_tenant_config().await
.map(|t| t.name)
.unwrap_or_else(|| "monitoring".to_string());
// Rules are installed in the appropriate namespace
for rule in rules.unwrap_or_default() {
let score = KubePrometheusRuleScore {
sender: sender.clone(),
rule,
namespace: namespace.clone(), // Tenant namespace
};
score.create_interpret().execute(inventory, self).await?;
}
}
}
```
### Tenant vs Cluster Resources
| Resource | Cluster-Level | Tenant-Level |
|----------|---------------|--------------|
| Alertmanager config | Global receivers | Namespaced receivers (where supported) |
| PrometheusRules | Cluster-wide alerts | Namespace alerts only |
| ServiceMonitors | Any namespace | Own namespace only |
| External scrape targets | Can add | Cannot add (cluster config) |
### Runtime Validation
Tenant constraints are validated at runtime via Kubernetes RBAC. If a tenant-scoped deployment attempts cluster-level operations, it fails with a clear permission error from the Kubernetes API.
This cannot be fully compile-time because tenant context is determined by who's running the code and what permissions they have—information only available at runtime.
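A sketch of how that failure surfaces (the error text is illustrative, not an exact API response):
```rust
// Illustrative: a tenant-scoped run attempting a cluster-level change.
match alerting_score.create_interpret().execute(&inventory, &topology).await {
    Ok(outcome) => log::info!("monitoring applied: {outcome:?}"),
    // The Kubernetes API denies the request with a 403 Forbidden, e.g.
    // `prometheusrules.monitoring.coreos.com is forbidden: User "tenant-a"
    //  cannot create resource in namespace "openshift-monitoring"`.
    Err(e) => log::error!("rejected by cluster RBAC: {e}"),
}
```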
---
## Level 3: Application Monitoring
Application monitoring provides zero-config, opinionated monitoring for developers. Just add the `Monitoring` feature to your application and it works.
### Example
```rust
use std::sync::Arc;
use harmony::modules::{
    application::{Application, ApplicationFeature},
    monitoring::alert_channel::webhook_receiver::WebhookReceiver,
};
// Define your application, wrapped in Arc so the feature can hold a shared reference
let my_app = Arc::new(MyApplication::new());
// Add monitoring as a feature
let monitoring = Monitoring {
    application: my_app.clone(),
    alert_receiver: vec![], // Uses defaults
};
// Install with the application
my_app.add_feature(monitoring);
```
### What Application Monitoring Provides
1. **Automatic ServiceMonitor** - Creates a ServiceMonitor for your application's pods
2. **Ntfy Notification Channel** - Auto-installs and configures Ntfy for push notifications
3. **Tenant Awareness** - Automatically scopes to the correct namespace
4. **Sensible Defaults** - Pre-configured alert routes and receivers
### Under the Hood
```rust
impl<T: Topology + Observability<Prometheus> + TenantManager>
ApplicationFeature<T> for Monitoring
{
async fn ensure_installed(&self, topology: &T) -> Result<...> {
// 1. Get tenant namespace (or use app name)
let namespace = topology.get_tenant_config().await
.map(|ns| ns.name.clone())
.unwrap_or_else(|| self.application.name());
// 2. Create ServiceMonitor for the app
let app_service_monitor = ServiceMonitor {
metadata: ObjectMeta {
name: Some(self.application.name()),
namespace: Some(namespace.clone()),
..Default::default()
},
spec: ServiceMonitorSpec::default(),
};
        // 3. Install Ntfy for notifications (host elided here for brevity)
        let ntfy = NtfyScore { namespace, host };
ntfy.interpret(&Inventory::empty(), topology).await?;
// 4. Wire up webhook receiver to Ntfy
let ntfy_receiver = WebhookReceiver { ... };
// 5. Execute monitoring score
alerting_score.interpret(&Inventory::empty(), topology).await?;
}
}
```
---
## Pre-Built Alert Rules
Harmony provides a library of common alert rules in `modules/monitoring/alert_rule/alerts/`:
### Kubernetes Alerts (`alerts/k8s/`)
```rust
use harmony::modules::monitoring::alert_rule::alerts::k8s::{
pod::pod_failed,
pvc::high_pvc_fill_rate_over_two_days,
memory_usage::alert_high_memory_usage,
};
let rules = AlertManagerRuleGroup::new("k8s-rules", vec![
pod_failed(),
high_pvc_fill_rate_over_two_days(),
alert_high_memory_usage(),
]);
```
Available rules:
- `pod_failed()` - Pod in failed state
- `alert_container_restarting()` - Container restart loop
- `alert_pod_not_ready()` - Pod not ready for extended period
- `high_pvc_fill_rate_over_two_days()` - PVC will fill within 2 days
- `alert_high_memory_usage()` - Memory usage above threshold
- `alert_high_cpu_usage()` - CPU usage above threshold
### Infrastructure Alerts (`alerts/infra/`)
```rust
use harmony::modules::monitoring::alert_rule::alerts::infra::opnsense::high_http_error_rate;
let rules = AlertManagerRuleGroup::new("infra-rules", vec![
high_http_error_rate(),
]);
```
### Creating Custom Rules
```rust
use harmony::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;
pub fn my_custom_alert() -> PrometheusAlertRule {
PrometheusAlertRule::new("MyServiceDown", "up{job=\"my-service\"} == 0")
.for_duration("5m")
.label("severity", "critical")
.annotation("summary", "My service is down")
.annotation("description", "The my-service job has been down for more than 5 minutes")
}
```
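Custom rules plug into the same machinery as the built-in ones:
```rust
let my_rules = AlertManagerRuleGroup::new("my-service-rules", vec![my_custom_alert()]);
// Pass `Box::new(my_rules)` in the `rules` vector of your alerting score.
```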
---
## Alert Receivers
### Discord Webhook
```rust
use harmony::modules::monitoring::alert_channel::discord_alert_channel::DiscordReceiver;
use harmony::topology::monitoring::{AlertRoute, AlertMatcher, MatchOp};
let discord = DiscordReceiver {
name: "ops-alerts".to_string(),
url: hurl!("https://discord.com/api/webhooks/123456/abcdef"),
route: AlertRoute {
receiver: "ops-alerts".to_string(),
matchers: vec![AlertMatcher {
label: "severity".to_string(),
operator: MatchOp::Eq,
value: "critical".to_string(),
}],
group_by: vec!["alertname".to_string()],
repeat_interval: Some("30m".to_string()),
continue_matching: false,
children: vec![],
},
};
```
### Generic Webhook
```rust
use harmony::modules::monitoring::alert_channel::webhook_receiver::WebhookReceiver;
let webhook = WebhookReceiver {
name: "custom-webhook".to_string(),
url: hurl!("https://api.example.com/alerts"),
route: AlertRoute::default("custom-webhook".to_string()),
};
```
---
## Adding a New Monitoring Stack
To add support for a new monitoring stack:
1. **Create the sender type** in `modules/monitoring/my_sender/mod.rs`:
```rust
#[derive(Debug, Clone)]
pub struct MySender;
impl AlertSender for MySender {
fn name(&self) -> String { "MySender".to_string() }
}
```
2. **Define CRD types** in `modules/monitoring/my_sender/crd/`:
```rust
#[derive(CustomResource, Debug, Serialize, Deserialize, Clone)]
#[kube(group = "monitoring.example.com", version = "v1", kind = "MyAlertRule")]
pub struct MyAlertRuleSpec { ... }
```
3. **Implement Observability** in `domain/topology/k8s_anywhere/observability/my_sender.rs`:
```rust
impl Observability<MySender> for K8sAnywhereTopology {
async fn install_receivers(&self, sender, inventory, receivers) { ... }
async fn install_rules(&self, sender, inventory, rules) { ... }
// ...
}
```
4. **Implement receiver conversions** for existing receivers:
```rust
impl AlertReceiver<MySender> for DiscordReceiver {
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
// Convert DiscordReceiver to MySender's format
}
}
```
5. **Create score types**:
```rust
pub struct MySenderAlertScore {
pub sender: MySender,
pub receivers: Vec<Box<dyn AlertReceiver<MySender>>>,
pub rules: Vec<Box<dyn AlertRule<MySender>>>,
}
```
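With those pieces in place, the new stack is driven like any built-in one. A hypothetical end-to-end sketch, using the names defined in the steps above:
```rust
// Hypothetical usage; this compiles only once the chosen topology
// implements Observability<MySender>.
harmony_cli::run(
    Inventory::autoload(),
    K8sAnywhereTopology::from_env(),
    vec![Box::new(MySenderAlertScore {
        sender: MySender,
        receivers: vec![],
        rules: vec![],
    })],
    None,
)
.await?;
```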
---
## Architecture Principles
### Type Safety Over Flexibility
Each monitoring stack has distinct CRDs and configuration formats. Rather than a unified "MonitoringStack" type that loses stack-specific features, we use generic traits that provide type safety while allowing each stack to express its unique configuration.
### Compile-Time Capability Verification
The `Observability<S>` bound ensures you can't deploy OKD alerts to a KubePrometheus cluster. The compiler catches platform mismatches before deployment.
### Explicit Over Implicit
Monitoring stacks are chosen explicitly (`OpenshiftClusterAlertSender` vs `KubePrometheus`). There's no "auto-detection" that could lead to surprising behavior.
### Three Levels, One Foundation
Cluster, tenant, and application monitoring all use the same traits (`AlertSender`, `AlertReceiver`, `AlertRule`). The difference is in how scores are constructed and how topologies interpret them.
---
## Related Documentation
- [ADR-020: Monitoring and Alerting Architecture](../adr/020-monitoring-alerting-architecture.md)
- [ADR-013: Monitoring Notifications (ntfy)](../adr/013-monitoring-notifications.md)
- [ADR-011: Multi-Tenant Cluster Architecture](../adr/011-multi-tenant-cluster.md)
- [Coding Guide](coding-guide.md)
- [Core Concepts](concepts.md)


@@ -7,7 +7,7 @@ use harmony::{
monitoring::alert_channel::webhook_receiver::WebhookReceiver,
tenant::TenantScore,
},
topology::{K8sAnywhereTopology, tenant::TenantConfig},
topology::{K8sAnywhereTopology, monitoring::AlertRoute, tenant::TenantConfig},
};
use harmony_types::id::Id;
use harmony_types::net::Url;
@@ -33,9 +33,14 @@ async fn main() {
service_port: 3000,
});
let receiver_name = "sample-webhook-receiver".to_string();
let webhook_receiver = WebhookReceiver {
name: "sample-webhook-receiver".to_string(),
name: receiver_name.clone(),
url: Url::Url(url::Url::parse("https://webhook-doesnt-exist.com").unwrap()),
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let app = ApplicationScore {


@@ -1,22 +1,28 @@
use std::str::FromStr;
use async_trait::async_trait;
use brocade::{BrocadeOptions, PortOperatingMode};
use harmony::{
data::Version,
infra::brocade::BrocadeSwitchClient,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
infra::brocade::BrocadeSwitchConfig,
inventory::Inventory,
score::Score,
topology::{
HostNetworkConfig, PortConfig, PreparationError, PreparationOutcome, Switch, SwitchClient,
SwitchError, Topology,
},
modules::brocade::{BrocadeSwitchAuth, BrocadeSwitchScore, SwitchTopology},
};
use harmony_macros::ip;
use harmony_types::{id::Id, net::MacAddress, switch::PortLocation};
use log::{debug, info};
use serde::Serialize;
use harmony_types::{id::Id, switch::PortLocation};
fn get_switch_config() -> BrocadeSwitchConfig {
let mut options = BrocadeOptions::default();
options.ssh.port = 2222;
let auth = BrocadeSwitchAuth {
username: "admin".to_string(),
password: "password".to_string(),
};
BrocadeSwitchConfig {
ips: vec![ip!("127.0.0.1")],
auth,
options,
}
}
#[tokio::main]
async fn main() {
@@ -32,126 +38,13 @@ async fn main() {
(PortLocation(1, 0, 18), PortOperatingMode::Trunk),
],
};
harmony_cli::run(
Inventory::autoload(),
SwitchTopology::new().await,
SwitchTopology::new(get_switch_config()).await,
vec![Box::new(switch_score)],
None,
)
.await
.unwrap();
}
#[derive(Clone, Debug, Serialize)]
struct BrocadeSwitchScore {
port_channels_to_clear: Vec<Id>,
ports_to_configure: Vec<PortConfig>,
}
impl<T: Topology + Switch> Score<T> for BrocadeSwitchScore {
fn name(&self) -> String {
"BrocadeSwitchScore".to_string()
}
#[doc(hidden)]
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(BrocadeSwitchInterpret {
score: self.clone(),
})
}
}
#[derive(Debug)]
struct BrocadeSwitchInterpret {
score: BrocadeSwitchScore,
}
#[async_trait]
impl<T: Topology + Switch> Interpret<T> for BrocadeSwitchInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
info!("Applying switch configuration {:?}", self.score);
debug!(
"Clearing port channel {:?}",
self.score.port_channels_to_clear
);
topology
.clear_port_channel(&self.score.port_channels_to_clear)
.await
.map_err(|e| InterpretError::new(e.to_string()))?;
debug!("Configuring interfaces {:?}", self.score.ports_to_configure);
topology
.configure_interface(&self.score.ports_to_configure)
.await
.map_err(|e| InterpretError::new(e.to_string()))?;
Ok(Outcome::success("switch configured".to_string()))
}
fn get_name(&self) -> InterpretName {
InterpretName::Custom("BrocadeSwitchInterpret")
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
struct SwitchTopology {
client: Box<dyn SwitchClient>,
}
#[async_trait]
impl Topology for SwitchTopology {
fn name(&self) -> &str {
"SwitchTopology"
}
async fn ensure_ready(&self) -> Result<PreparationOutcome, PreparationError> {
Ok(PreparationOutcome::Noop)
}
}
impl SwitchTopology {
async fn new() -> Self {
let mut options = BrocadeOptions::default();
options.ssh.port = 2222;
let client =
BrocadeSwitchClient::init(&vec![ip!("127.0.0.1")], &"admin", &"password", options)
.await
.expect("Failed to connect to switch");
let client = Box::new(client);
Self { client }
}
}
#[async_trait]
impl Switch for SwitchTopology {
async fn setup_switch(&self) -> Result<(), SwitchError> {
todo!()
}
async fn get_port_for_mac_address(
&self,
_mac_address: &MacAddress,
) -> Result<Option<PortLocation>, SwitchError> {
todo!()
}
async fn configure_port_channel(&self, _config: &HostNetworkConfig) -> Result<(), SwitchError> {
todo!()
}
async fn clear_port_channel(&self, ids: &Vec<Id>) -> Result<(), SwitchError> {
self.client.clear_port_channel(ids).await
}
async fn configure_interface(&self, ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
self.client.configure_interface(ports).await
}
}


@@ -1,8 +1,8 @@
use harmony::{
inventory::Inventory,
modules::cert_manager::{
capability::CertificateManagementConfig, score_cert_management::CertificateManagementScore,
score_certificate::CertificateScore, score_issuer::CertificateIssuerScore,
capability::CertificateManagementConfig, score_certificate::CertificateScore,
score_issuer::CertificateIssuerScore,
},
topology::K8sAnywhereTopology,
};


@@ -0,0 +1,21 @@
[package]
name = "example-k8s-drain-node"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
publish = false
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_types = { path = "../../harmony_types" }
harmony_macros = { path = "../../harmony_macros" }
harmony-k8s = { path = "../../harmony-k8s" }
cidr.workspace = true
tokio.workspace = true
log.workspace = true
env_logger.workspace = true
url.workspace = true
assert_cmd = "2.0.16"
inquire.workspace = true


@@ -0,0 +1,61 @@
use std::time::Duration;
use harmony_k8s::{DrainOptions, K8sClient};
use log::{info, trace};
#[tokio::main]
async fn main() {
env_logger::init();
let k8s = K8sClient::try_default().await.unwrap();
let nodes = k8s.get_nodes(None).await.unwrap();
trace!("Got nodes : {nodes:#?}");
let node_names = nodes
.iter()
.map(|n| n.metadata.name.as_ref().unwrap())
.collect::<Vec<&String>>();
info!("Got nodes : {:?}", node_names);
let node_name = inquire::Select::new("What node do you want to operate on?", node_names)
.prompt()
.unwrap();
    let drain = inquire::Confirm::new("Do you wish to drain the node now?")
.prompt()
.unwrap();
if drain {
let mut options = DrainOptions::default_ignore_daemonset_delete_emptydir_data();
options.timeout = Duration::from_secs(1);
k8s.drain_node(&node_name, &options).await.unwrap();
info!("Node {node_name} successfully drained");
}
let uncordon =
inquire::Confirm::new("Do you wish to uncordon node to resume scheduling workloads now?")
.prompt()
.unwrap();
if uncordon {
info!("Uncordoning node {node_name}");
k8s.uncordon_node(node_name).await.unwrap();
info!("Node {node_name} uncordoned");
}
    let reboot = inquire::Confirm::new("Do you wish to reboot the node now?")
.prompt()
.unwrap();
if reboot {
k8s.reboot_node(
&node_name,
&DrainOptions::default_ignore_daemonset_delete_emptydir_data(),
Duration::from_secs(3600),
)
.await
.unwrap();
}
info!("All done playing with nodes, happy harmonizing!");
}


@@ -0,0 +1,21 @@
[package]
name = "example-k8s-write-file-on-node"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
publish = false
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_types = { path = "../../harmony_types" }
harmony_macros = { path = "../../harmony_macros" }
harmony-k8s = { path = "../../harmony-k8s" }
cidr.workspace = true
tokio.workspace = true
log.workspace = true
env_logger.workspace = true
url.workspace = true
assert_cmd = "2.0.16"
inquire.workspace = true


@@ -0,0 +1,45 @@
use harmony_k8s::{K8sClient, NodeFile};
use log::{info, trace};
#[tokio::main]
async fn main() {
env_logger::init();
let k8s = K8sClient::try_default().await.unwrap();
let nodes = k8s.get_nodes(None).await.unwrap();
trace!("Got nodes : {nodes:#?}");
let node_names = nodes
.iter()
.map(|n| n.metadata.name.as_ref().unwrap())
.collect::<Vec<&String>>();
info!("Got nodes : {:?}", node_names);
let node = inquire::Select::new("What node do you want to write file to?", node_names)
.prompt()
.unwrap();
let path = inquire::Text::new("File path on node").prompt().unwrap();
let content = inquire::Text::new("File content").prompt().unwrap();
let node_file = NodeFile {
        path,
        content,
mode: 0o600,
};
k8s.write_files_to_node(&node, &vec![node_file.clone()])
.await
.unwrap();
let cmd = inquire::Text::new("Command to run on node")
.prompt()
.unwrap();
k8s.run_privileged_command_on_node(&node, &cmd)
.await
.unwrap();
    info!(
        "File {} (mode {:o}) written on node {node}",
        node_file.path, node_file.mode
    );
}


@@ -1,37 +1,45 @@
use std::collections::HashMap;
use std::{
collections::HashMap,
sync::{Arc, Mutex},
};
use harmony::{
inventory::Inventory,
modules::{
monitoring::{
alert_channel::discord_alert_channel::DiscordWebhook,
alert_rule::prometheus_alert_rule::AlertManagerRuleGroup,
kube_prometheus::{
helm_prometheus_alert_score::HelmPrometheusAlertingScore,
types::{
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
ServiceMonitorEndpoint,
modules::monitoring::{
alert_channel::discord_alert_channel::DiscordReceiver,
alert_rule::{
alerts::{
infra::dell_server::{
alert_global_storage_status_critical,
alert_global_storage_status_non_recoverable,
global_storage_status_degraded_non_critical,
},
k8s::pvc::high_pvc_fill_rate_over_two_days,
},
prometheus_alert_rule::AlertManagerRuleGroup,
},
prometheus::alerts::{
infra::dell_server::{
alert_global_storage_status_critical, alert_global_storage_status_non_recoverable,
global_storage_status_degraded_non_critical,
kube_prometheus::{
helm::config::KubePrometheusConfig,
kube_prometheus_alerting_score::KubePrometheusAlertingScore,
types::{
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
ServiceMonitorEndpoint,
},
k8s::pvc::high_pvc_fill_rate_over_two_days,
},
},
topology::K8sAnywhereTopology,
topology::{K8sAnywhereTopology, monitoring::AlertRoute},
};
use harmony_types::{k8s_name::K8sName, net::Url};
#[tokio::main]
async fn main() {
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
let receiver_name = "test-discord".to_string();
let discord_receiver = DiscordReceiver {
name: receiver_name.clone(),
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
selectors: vec![],
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let high_pvc_fill_rate_over_two_days_alert = high_pvc_fill_rate_over_two_days();
@@ -70,10 +78,15 @@ async fn main() {
endpoints: vec![service_monitor_endpoint],
..Default::default()
};
let alerting_score = HelmPrometheusAlertingScore {
let config = Arc::new(Mutex::new(KubePrometheusConfig::new()));
let alerting_score = KubePrometheusAlertingScore {
receivers: vec![Box::new(discord_receiver)],
rules: vec![Box::new(additional_rules), Box::new(additional_rules2)],
service_monitors: vec![service_monitor],
scrape_targets: None,
config,
};
harmony_cli::run(


@@ -1,24 +1,32 @@
use std::{collections::HashMap, str::FromStr};
use std::{
collections::HashMap,
str::FromStr,
sync::{Arc, Mutex},
};
use harmony::{
inventory::Inventory,
modules::{
monitoring::{
alert_channel::discord_alert_channel::DiscordWebhook,
alert_rule::prometheus_alert_rule::AlertManagerRuleGroup,
alert_channel::discord_alert_channel::DiscordReceiver,
alert_rule::{
alerts::k8s::pvc::high_pvc_fill_rate_over_two_days,
prometheus_alert_rule::AlertManagerRuleGroup,
},
kube_prometheus::{
helm_prometheus_alert_score::HelmPrometheusAlertingScore,
helm::config::KubePrometheusConfig,
kube_prometheus_alerting_score::KubePrometheusAlertingScore,
types::{
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
ServiceMonitorEndpoint,
},
},
},
prometheus::alerts::k8s::pvc::high_pvc_fill_rate_over_two_days,
tenant::TenantScore,
},
topology::{
K8sAnywhereTopology,
monitoring::AlertRoute,
tenant::{ResourceLimits, TenantConfig, TenantNetworkPolicy},
},
};
@@ -42,10 +50,13 @@ async fn main() {
},
};
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
let receiver_name = "test-discord".to_string();
let discord_receiver = DiscordReceiver {
name: receiver_name.clone(),
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
selectors: vec![],
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let high_pvc_fill_rate_over_two_days_alert = high_pvc_fill_rate_over_two_days();
@@ -74,10 +85,14 @@ async fn main() {
..Default::default()
};
let alerting_score = HelmPrometheusAlertingScore {
let config = Arc::new(Mutex::new(KubePrometheusConfig::new()));
let alerting_score = KubePrometheusAlertingScore {
receivers: vec![Box::new(discord_receiver)],
rules: vec![Box::new(additional_rules)],
service_monitors: vec![service_monitor],
scrape_targets: None,
config,
};
harmony_cli::run(


@@ -215,7 +215,7 @@ fn site(
dns_name: format!("{cluster_name}-gw.{domain}"),
supercluster_ca_secret_name: "nats-supercluster-ca-bundle",
tls_cert_name: "nats-gateway",
jetstream_enabled: "false",
jetstream_enabled: "true",
},
}
}


@@ -0,0 +1,16 @@
[package]
name = "example-node-health"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
publish = false
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_types = { path = "../../harmony_types" }
tokio = { workspace = true }
harmony_macros = { path = "../../harmony_macros" }
log = { workspace = true }
env_logger = { workspace = true }


@@ -0,0 +1,17 @@
use harmony::{
inventory::Inventory, modules::node_health::NodeHealthScore, topology::K8sAnywhereTopology,
};
#[tokio::main]
async fn main() {
let node_health = NodeHealthScore {};
harmony_cli::run(
Inventory::autoload(),
K8sAnywhereTopology::from_env(),
vec![Box::new(node_health)],
None,
)
.await
.unwrap();
}


@@ -1,35 +1,64 @@
use std::collections::HashMap;
use harmony::{
inventory::Inventory,
modules::monitoring::{
alert_channel::discord_alert_channel::DiscordWebhook,
okd::cluster_monitoring::OpenshiftClusterAlertScore,
alert_channel::discord_alert_channel::DiscordReceiver,
alert_rule::{
alerts::{
infra::opnsense::high_http_error_rate, k8s::pvc::high_pvc_fill_rate_over_two_days,
},
prometheus_alert_rule::AlertManagerRuleGroup,
},
okd::openshift_cluster_alerting_score::OpenshiftClusterAlertScore,
scrape_target::prometheus_node_exporter::PrometheusNodeExporter,
},
topology::{
K8sAnywhereTopology,
monitoring::{AlertMatcher, AlertRoute, MatchOp},
},
topology::K8sAnywhereTopology,
};
use harmony_macros::hurl;
use harmony_types::k8s_name::K8sName;
use harmony_macros::{hurl, ip};
#[tokio::main]
async fn main() {
let mut sel = HashMap::new();
sel.insert(
"openshift_io_alert_source".to_string(),
"platform".to_string(),
);
let mut sel2 = HashMap::new();
sel2.insert("openshift_io_alert_source".to_string(), "".to_string());
let selectors = vec![sel, sel2];
let platform_matcher = AlertMatcher {
label: "prometheus".to_string(),
operator: MatchOp::Eq,
value: "openshift-monitoring/k8s".to_string(),
};
let severity = AlertMatcher {
label: "severity".to_string(),
operator: MatchOp::Eq,
value: "critical".to_string(),
};
let high_http_error_rate = high_http_error_rate();
let additional_rules = AlertManagerRuleGroup::new("test-rule", vec![high_http_error_rate]);
let scrape_target = PrometheusNodeExporter {
job_name: "firewall".to_string(),
metrics_path: "/metrics".to_string(),
listen_address: ip!("192.168.1.1"),
port: 9100,
..Default::default()
};
harmony_cli::run(
Inventory::autoload(),
K8sAnywhereTopology::from_env(),
vec![Box::new(OpenshiftClusterAlertScore {
receivers: vec![Box::new(DiscordWebhook {
name: K8sName("wills-discord-webhook-example".to_string()),
url: hurl!("https://something.io"),
selectors: selectors,
receivers: vec![Box::new(DiscordReceiver {
name: "crit-wills-discord-channel-example".to_string(),
url: hurl!("https://test.io"),
route: AlertRoute {
matchers: vec![severity],
..AlertRoute::default("crit-wills-discord-channel-example".to_string())
},
})],
sender: harmony::modules::monitoring::okd::OpenshiftClusterAlertSender,
rules: vec![Box::new(additional_rules)],
scrape_targets: Some(vec![Box::new(scrape_target)]),
})],
None,
)


@@ -2,8 +2,12 @@ use brocade::BrocadeOptions;
use cidr::Ipv4Cidr;
use harmony::{
hardware::{Location, SwitchGroup},
infra::{brocade::BrocadeSwitchClient, opnsense::OPNSenseManagementInterface},
infra::{
brocade::{BrocadeSwitchClient, BrocadeSwitchConfig},
opnsense::OPNSenseManagementInterface,
},
inventory::Inventory,
modules::brocade::BrocadeSwitchAuth,
topology::{HAClusterTopology, LogicalHost, UnmanagedRouter},
};
use harmony_macros::{ip, ipv4};
@@ -36,12 +40,11 @@ pub async fn get_topology() -> HAClusterTopology {
dry_run: *harmony::config::DRY_RUN,
..Default::default()
};
let switch_client = BrocadeSwitchClient::init(
&switches,
&switch_auth.username,
&switch_auth.password,
brocade_options,
)
let switch_client = BrocadeSwitchClient::init(BrocadeSwitchConfig {
ips: switches,
auth: switch_auth,
options: brocade_options,
})
.await
.expect("Failed to connect to switch");
@@ -103,9 +106,3 @@ pub fn get_inventory() -> Inventory {
control_plane_host: vec![],
}
}
#[derive(Secret, Serialize, Deserialize, JsonSchema, Debug)]
pub struct BrocadeSwitchAuth {
pub username: String,
pub password: String,
}


@@ -3,14 +3,16 @@ use cidr::Ipv4Cidr;
use harmony::{
config::secret::OPNSenseFirewallCredentials,
hardware::{Location, SwitchGroup},
infra::{brocade::BrocadeSwitchClient, opnsense::OPNSenseManagementInterface},
infra::{
brocade::{BrocadeSwitchClient, BrocadeSwitchConfig},
opnsense::OPNSenseManagementInterface,
},
inventory::Inventory,
modules::brocade::BrocadeSwitchAuth,
topology::{HAClusterTopology, LogicalHost, UnmanagedRouter},
};
use harmony_macros::{ip, ipv4};
use harmony_secret::{Secret, SecretManager};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use harmony_secret::SecretManager;
use std::{
net::IpAddr,
sync::{Arc, OnceLock},
@@ -31,12 +33,11 @@ pub async fn get_topology() -> HAClusterTopology {
dry_run: *harmony::config::DRY_RUN,
..Default::default()
};
let switch_client = BrocadeSwitchClient::init(
&switches,
&switch_auth.username,
&switch_auth.password,
brocade_options,
)
let switch_client = BrocadeSwitchClient::init(BrocadeSwitchConfig {
ips: switches,
auth: switch_auth,
options: brocade_options,
})
.await
.expect("Failed to connect to switch");
@@ -98,9 +99,3 @@ pub fn get_inventory() -> Inventory {
control_plane_host: vec![],
}
}
#[derive(Secret, Serialize, Deserialize, JsonSchema, Debug)]
pub struct BrocadeSwitchAuth {
pub username: String,
pub password: String,
}


@@ -1,63 +1,13 @@
use std::str::FromStr;
use harmony::{
inventory::Inventory,
modules::helm::chart::{HelmChartScore, HelmRepository, NonBlankString},
topology::K8sAnywhereTopology,
inventory::Inventory, modules::openbao::OpenbaoScore, topology::K8sAnywhereTopology,
};
use harmony_macros::hurl;
#[tokio::main]
async fn main() {
let values_yaml = Some(
r#"server:
standalone:
enabled: true
config: |
listener "tcp" {
tls_disable = true
address = "[::]:8200"
cluster_address = "[::]:8201"
}
storage "file" {
path = "/openbao/data"
}
service:
enabled: true
dataStorage:
enabled: true
size: 10Gi
storageClass: null
accessMode: ReadWriteOnce
auditStorage:
enabled: true
size: 10Gi
storageClass: null
accessMode: ReadWriteOnce"#
.to_string(),
);
let openbao = HelmChartScore {
namespace: Some(NonBlankString::from_str("openbao").unwrap()),
release_name: NonBlankString::from_str("openbao").unwrap(),
chart_name: NonBlankString::from_str("openbao/openbao").unwrap(),
chart_version: None,
values_overrides: None,
values_yaml,
create_namespace: true,
install_only: true,
repository: Some(HelmRepository::new(
"openbao".to_string(),
hurl!("https://openbao.github.io/openbao-helm"),
true,
)),
let openbao = OpenbaoScore {
host: "openbao.sebastien.sto1.nationtech.io".to_string(),
};
// TODO exec pod commands to initialize secret store if not already done
harmony_cli::run(
Inventory::autoload(),
K8sAnywhereTopology::from_env(),


@@ -1,5 +1,3 @@
use std::str::FromStr;
use harmony::{
inventory::Inventory,
modules::{k8s::apps::OperatorHubCatalogSourceScore, postgresql::CloudNativePgOperatorScore},
@@ -9,7 +7,7 @@ use harmony::{
#[tokio::main]
async fn main() {
let operatorhub_catalog = OperatorHubCatalogSourceScore::default();
let cnpg_operator = CloudNativePgOperatorScore::default();
let cnpg_operator = CloudNativePgOperatorScore::default_openshift();
harmony_cli::run(
Inventory::autoload(),


@@ -1,22 +1,13 @@
use std::{
net::{IpAddr, Ipv4Addr},
sync::Arc,
};
use std::sync::Arc;
use async_trait::async_trait;
use cidr::Ipv4Cidr;
use harmony::{
executors::ExecutorError,
hardware::{HostCategory, Location, PhysicalHost, SwitchGroup},
infra::opnsense::OPNSenseManagementInterface,
inventory::Inventory,
modules::opnsense::node_exporter::NodeExporterScore,
topology::{
HAClusterTopology, LogicalHost, PreparationError, PreparationOutcome, Topology,
UnmanagedRouter, node_exporter::NodeExporter,
},
topology::{PreparationError, PreparationOutcome, Topology, node_exporter::NodeExporter},
};
use harmony_macros::{ip, ipv4, mac_address};
use harmony_macros::ip;
#[derive(Debug)]
struct OpnSenseTopology {


@@ -1,8 +1,7 @@
use harmony::{
inventory::Inventory,
modules::postgresql::{
K8sPostgreSQLScore, PostgreSQLConnectionScore, PublicPostgreSQLScore,
capability::PostgreSQLConfig,
PostgreSQLConnectionScore, PublicPostgreSQLScore, capability::PostgreSQLConfig,
},
topology::K8sAnywhereTopology,
};


@@ -1,4 +1,4 @@
use std::{collections::HashMap, path::PathBuf, sync::Arc};
use std::{path::PathBuf, sync::Arc};
use harmony::{
inventory::Inventory,
@@ -6,9 +6,9 @@ use harmony::{
application::{
ApplicationScore, RustWebFramework, RustWebapp, features::rhob_monitoring::Monitoring,
},
monitoring::alert_channel::discord_alert_channel::DiscordWebhook,
monitoring::alert_channel::discord_alert_channel::DiscordReceiver,
},
topology::K8sAnywhereTopology,
topology::{K8sAnywhereTopology, monitoring::AlertRoute},
};
use harmony_types::{k8s_name::K8sName, net::Url};
@@ -22,18 +22,21 @@ async fn main() {
service_port: 3000,
});
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
let receiver_name = "test-discord".to_string();
let discord_receiver = DiscordReceiver {
name: receiver_name.clone(),
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
selectors: vec![],
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let app = ApplicationScore {
features: vec![
Box::new(Monitoring {
application: application.clone(),
alert_receiver: vec![Box::new(discord_receiver)],
}),
// Box::new(Monitoring {
// application: application.clone(),
// alert_receiver: vec![Box::new(discord_receiver)],
// }),
// TODO add backups, multisite ha, etc
],
application,


@@ -1,4 +1,4 @@
use std::{collections::HashMap, path::PathBuf, sync::Arc};
use std::{path::PathBuf, sync::Arc};
use harmony::{
inventory::Inventory,
@@ -8,13 +8,13 @@ use harmony::{
features::{Monitoring, PackagingDeployment},
},
monitoring::alert_channel::{
discord_alert_channel::DiscordWebhook, webhook_receiver::WebhookReceiver,
discord_alert_channel::DiscordReceiver, webhook_receiver::WebhookReceiver,
},
},
topology::K8sAnywhereTopology,
topology::{K8sAnywhereTopology, monitoring::AlertRoute},
};
use harmony_macros::hurl;
use harmony_types::k8s_name::K8sName;
use harmony_types::{k8s_name::K8sName, net::Url};
#[tokio::main]
async fn main() {
@@ -26,15 +26,23 @@ async fn main() {
service_port: 3000,
});
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
url: hurl!("https://discord.doesnt.exist.com"),
selectors: vec![],
let receiver_name = "test-discord".to_string();
let discord_receiver = DiscordReceiver {
name: receiver_name.clone(),
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let receiver_name = "sample-webhook-receiver".to_string();
let webhook_receiver = WebhookReceiver {
name: "sample-webhook-receiver".to_string(),
name: receiver_name.clone(),
url: hurl!("https://webhook-doesnt-exist.com"),
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let app = ApplicationScore {
@@ -42,10 +50,10 @@ async fn main() {
Box::new(PackagingDeployment {
application: application.clone(),
}),
Box::new(Monitoring {
application: application.clone(),
alert_receiver: vec![Box::new(discord_receiver), Box::new(webhook_receiver)],
}),
// Box::new(Monitoring {
// application: application.clone(),
// alert_receiver: vec![Box::new(discord_receiver), Box::new(webhook_receiver)],
// }),
// TODO add backups, multisite ha, etc
],
application,


@@ -5,6 +5,10 @@ version.workspace = true
readme.workspace = true
license.workspace = true
[[example]]
name = "try_rust_webapp"
path = "src/main.rs"
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }


@@ -1,11 +1,8 @@
use harmony::{
inventory::Inventory,
modules::{
application::{
ApplicationScore, RustWebFramework, RustWebapp,
features::{Monitoring, PackagingDeployment},
},
monitoring::alert_channel::discord_alert_channel::DiscordWebhook,
modules::application::{
ApplicationScore, RustWebFramework, RustWebapp,
features::{Monitoring, PackagingDeployment},
},
topology::K8sAnywhereTopology,
};
@@ -30,14 +27,14 @@ async fn main() {
Box::new(PackagingDeployment {
application: application.clone(),
}),
Box::new(Monitoring {
application: application.clone(),
alert_receiver: vec![Box::new(DiscordWebhook {
name: K8sName("test-discord".to_string()),
url: hurl!("https://discord.doesnt.exist.com"),
selectors: vec![],
})],
}),
// Box::new(Monitoring {
// application: application.clone(),
// alert_receiver: vec![Box::new(DiscordWebhook {
// name: K8sName("test-discord".to_string()),
// url: hurl!("https://discord.doesnt.exist.com"),
// selectors: vec![],
// })],
// }),
],
application,
};


@@ -0,0 +1,14 @@
[package]
name = "example-zitadel"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_macros = { path = "../../harmony_macros" }
harmony_types = { path = "../../harmony_types" }
tokio.workspace = true
url.workspace = true


@@ -0,0 +1,20 @@
use harmony::{
inventory::Inventory, modules::zitadel::ZitadelScore, topology::K8sAnywhereTopology,
};
#[tokio::main]
async fn main() {
let zitadel = ZitadelScore {
host: "sso.sto1.nationtech.io".to_string(),
zitadel_version: "v4.12.1".to_string(),
};
harmony_cli::run(
Inventory::autoload(),
K8sAnywhereTopology::from_env(),
vec![Box::new(zitadel)],
None,
)
.await
.unwrap();
}

Binary file not shown.

harmony-k8s/Cargo.toml Normal file

@@ -0,0 +1,23 @@
[package]
name = "harmony-k8s"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
[dependencies]
kube.workspace = true
k8s-openapi.workspace = true
tokio.workspace = true
tokio-retry.workspace = true
serde.workspace = true
serde_json.workspace = true
serde_yaml.workspace = true
log.workspace = true
similar.workspace = true
reqwest.workspace = true
url.workspace = true
inquire.workspace = true
[dev-dependencies]
pretty_assertions.workspace = true

harmony-k8s/src/apply.rs Normal file

@@ -0,0 +1,593 @@
use kube::{
Client, Error, Resource,
api::{
Api, ApiResource, DynamicObject, GroupVersionKind, Patch, PatchParams, PostParams,
ResourceExt,
},
core::ErrorResponse,
discovery::Scope,
error::DiscoveryError,
};
use log::{debug, error, trace, warn};
use serde::{Serialize, de::DeserializeOwned};
use serde_json::Value;
use similar::TextDiff;
use url::Url;
use crate::client::K8sClient;
use crate::helper;
use crate::types::WriteMode;
/// The field-manager token sent with every server-side apply request.
pub const FIELD_MANAGER: &str = "harmony-k8s";
// ── Private helpers ──────────────────────────────────────────────────────────
/// Serialise any `Serialize` payload to a [`DynamicObject`] via JSON.
fn to_dynamic<T: Serialize>(payload: &T) -> Result<DynamicObject, Error> {
serde_json::from_value(serde_json::to_value(payload).map_err(Error::SerdeError)?)
.map_err(Error::SerdeError)
}
/// Fetch the current resource, display a unified diff against `payload`, and
/// return `()`. All output goes to stdout (same behaviour as before).
///
/// A 404 is treated as "resource would be created" — not an error.
async fn show_dry_run<T: Serialize>(
api: &Api<DynamicObject>,
name: &str,
payload: &T,
) -> Result<(), Error> {
let new_yaml = serde_yaml::to_string(payload)
.unwrap_or_else(|_| "Failed to serialize new resource".to_string());
match api.get(name).await {
Ok(current) => {
println!("\nDry-run for resource: '{name}'");
let mut current_val = serde_yaml::to_value(&current).unwrap_or(serde_yaml::Value::Null);
if let Some(map) = current_val.as_mapping_mut() {
map.remove(&serde_yaml::Value::String("status".to_string()));
}
let current_yaml = serde_yaml::to_string(&current_val)
.unwrap_or_else(|_| "Failed to serialize current resource".to_string());
if current_yaml == new_yaml {
println!("No changes detected.");
} else {
println!("Changes detected:");
let diff = TextDiff::from_lines(&current_yaml, &new_yaml);
for change in diff.iter_all_changes() {
let sign = match change.tag() {
similar::ChangeTag::Delete => "-",
similar::ChangeTag::Insert => "+",
similar::ChangeTag::Equal => " ",
};
print!("{sign}{change}");
}
}
Ok(())
}
Err(Error::Api(ErrorResponse { code: 404, .. })) => {
println!("\nDry-run for new resource: '{name}'");
println!("Resource does not exist. Would be created:");
for line in new_yaml.lines() {
println!("+{line}");
}
Ok(())
}
Err(e) => {
error!("Failed to fetch resource '{name}' for dry-run: {e}");
Err(e)
}
}
}
/// Execute the real (non-dry-run) apply, respecting [`WriteMode`].
async fn do_apply<T: Serialize + std::fmt::Debug>(
api: &Api<DynamicObject>,
name: &str,
payload: &T,
patch_params: &PatchParams,
write_mode: &WriteMode,
) -> Result<DynamicObject, Error> {
match write_mode {
WriteMode::CreateOrUpdate => {
// TODO refactor this arm to perform self.update and if fail with 404 self.create
// This will avoid the repetition of the api.patch and api.create calls within this
// function body. This makes the code more maintainable
match api.patch(name, patch_params, &Patch::Apply(payload)).await {
Ok(obj) => Ok(obj),
Err(Error::Api(ErrorResponse { code: 404, .. })) => {
debug!("Resource '{name}' not found via SSA, falling back to POST");
let dyn_obj = to_dynamic(payload)?;
api.create(&PostParams::default(), &dyn_obj)
.await
.map_err(|e| {
error!("Failed to create '{name}': {e}");
e
})
}
Err(e) => {
error!("Failed to apply '{name}': {e}");
Err(e)
}
}
}
WriteMode::Create => {
let dyn_obj = to_dynamic(payload)?;
api.create(&PostParams::default(), &dyn_obj)
.await
.map_err(|e| {
error!("Failed to create '{name}': {e}");
e
})
}
WriteMode::Update => match api.patch(name, patch_params, &Patch::Apply(payload)).await {
Ok(obj) => Ok(obj),
Err(Error::Api(ErrorResponse { code: 404, .. })) => Err(Error::Api(ErrorResponse {
code: 404,
message: format!("Resource '{name}' not found and WriteMode is UpdateOnly"),
reason: "NotFound".to_string(),
status: "Failure".to_string(),
})),
Err(e) => {
error!("Failed to update '{name}': {e}");
Err(e)
}
},
}
}
// ── Public API ───────────────────────────────────────────────────────────────
impl K8sClient {
/// Server-side apply: create if absent, update if present.
/// Equivalent to `kubectl apply`.
pub async fn apply<K>(&self, resource: &K, namespace: Option<&str>) -> Result<K, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned + Serialize,
<K as Resource>::DynamicType: Default,
{
self.apply_with_strategy(resource, namespace, WriteMode::CreateOrUpdate)
.await
}
/// POST only — returns an error if the resource already exists.
pub async fn create<K>(&self, resource: &K, namespace: Option<&str>) -> Result<K, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned + Serialize,
<K as Resource>::DynamicType: Default,
{
self.apply_with_strategy(resource, namespace, WriteMode::Create)
.await
}
/// Server-side apply only — returns an error if the resource does not exist.
pub async fn update<K>(&self, resource: &K, namespace: Option<&str>) -> Result<K, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned + Serialize,
<K as Resource>::DynamicType: Default,
{
self.apply_with_strategy(resource, namespace, WriteMode::Update)
.await
}
pub async fn apply_with_strategy<K>(
&self,
resource: &K,
namespace: Option<&str>,
write_mode: WriteMode,
) -> Result<K, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned + Serialize,
<K as Resource>::DynamicType: Default,
{
debug!(
"apply_with_strategy: {:?} ns={:?}",
resource.meta().name,
namespace
);
trace!("{:#}", serde_json::to_value(resource).unwrap_or_default());
let dyntype = K::DynamicType::default();
let gvk = GroupVersionKind {
group: K::group(&dyntype).to_string(),
version: K::version(&dyntype).to_string(),
kind: K::kind(&dyntype).to_string(),
};
let discovery = self.discovery().await?;
let (ar, caps) = discovery.resolve_gvk(&gvk).ok_or_else(|| {
Error::Discovery(DiscoveryError::MissingResource(format!(
"Cannot resolve GVK: {gvk:?}"
)))
})?;
let effective_ns = if caps.scope == Scope::Cluster {
None
} else {
namespace.or_else(|| resource.meta().namespace.as_deref())
};
let api: Api<DynamicObject> =
get_dynamic_api(ar, caps, self.client.clone(), effective_ns, false);
let name = resource
.meta()
.name
.as_deref()
.expect("Kubernetes resource must have a name");
if self.dry_run {
show_dry_run(&api, name, resource).await?;
return Ok(resource.clone());
}
let patch_params = PatchParams::apply(FIELD_MANAGER);
do_apply(&api, name, resource, &patch_params, &write_mode)
.await
.and_then(helper::dyn_to_typed)
}
/// Applies resources in order, one at a time
pub async fn apply_many<K>(&self, resources: &[K], ns: Option<&str>) -> Result<Vec<K>, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned + Serialize,
<K as Resource>::DynamicType: Default,
{
let mut result = Vec::new();
for r in resources.iter() {
let res = self.apply(r, ns).await;
if res.is_err() {
// NOTE: this may log sensitive data; downgrade to debug if needed.
warn!(
"Failed to apply k8s resource: {}",
serde_json::to_string_pretty(r).map_err(Error::SerdeError)?
);
}
result.push(res?);
}
Ok(result)
}
/// Apply a [`DynamicObject`] resource using server-side apply.
pub async fn apply_dynamic(
&self,
resource: &DynamicObject,
namespace: Option<&str>,
force_conflicts: bool,
) -> Result<DynamicObject, Error> {
trace!("apply_dynamic {resource:#?} ns={namespace:?} force={force_conflicts}");
let discovery = self.discovery().await?;
let type_meta = resource.types.as_ref().ok_or_else(|| {
Error::BuildRequest(kube::core::request::Error::Validation(
"DynamicObject must have types (apiVersion and kind)".to_string(),
))
})?;
let gvk = GroupVersionKind::try_from(type_meta).map_err(|_| {
Error::BuildRequest(kube::core::request::Error::Validation(format!(
"Invalid GVK in DynamicObject: {type_meta:?}"
)))
})?;
let (ar, caps) = discovery.resolve_gvk(&gvk).ok_or_else(|| {
Error::Discovery(DiscoveryError::MissingResource(format!(
"Cannot resolve GVK: {gvk:?}"
)))
})?;
let effective_ns = if caps.scope == Scope::Cluster {
None
} else {
namespace.or_else(|| resource.metadata.namespace.as_deref())
};
let api = get_dynamic_api(ar, caps, self.client.clone(), effective_ns, false);
let name = resource.metadata.name.as_deref().ok_or_else(|| {
Error::BuildRequest(kube::core::request::Error::Validation(
"DynamicObject must have metadata.name".to_string(),
))
})?;
debug!(
"apply_dynamic kind={:?} name='{name}' ns={effective_ns:?}",
resource.types.as_ref().map(|t| &t.kind),
);
// NOTE would be nice to improve cohesion between the dynamic and typed apis and avoid copy
// pasting the dry_run and some more logic
if self.dry_run {
show_dry_run(&api, name, resource).await?;
return Ok(resource.clone());
}
let mut patch_params = PatchParams::apply(FIELD_MANAGER);
patch_params.force = force_conflicts;
do_apply(
&api,
name,
resource,
&patch_params,
&WriteMode::CreateOrUpdate,
)
.await
}
pub async fn apply_dynamic_many(
&self,
resources: &[DynamicObject],
namespace: Option<&str>,
force_conflicts: bool,
) -> Result<Vec<DynamicObject>, Error> {
let mut result = Vec::new();
for r in resources.iter() {
result.push(self.apply_dynamic(r, namespace, force_conflicts).await?);
}
Ok(result)
}
pub async fn apply_yaml_many(
&self,
#[allow(clippy::ptr_arg)] yaml: &Vec<serde_yaml::Value>,
ns: Option<&str>,
) -> Result<(), Error> {
for y in yaml.iter() {
self.apply_yaml(y, ns).await?;
}
Ok(())
}
pub async fn apply_yaml(
&self,
yaml: &serde_yaml::Value,
ns: Option<&str>,
) -> Result<(), Error> {
// NOTE wouldn't it be possible to parse this into a DynamicObject and simply call
// apply_dynamic instead of reimplementing api interactions?
let obj: DynamicObject =
serde_yaml::from_value(yaml.clone()).expect("YAML must deserialise to DynamicObject");
let name = obj.metadata.name.as_ref().expect("YAML must have a name");
let api_version = yaml["apiVersion"].as_str().expect("missing apiVersion");
let kind = yaml["kind"].as_str().expect("missing kind");
let mut it = api_version.splitn(2, '/');
let first = it.next().unwrap();
let (g, v) = match it.next() {
Some(second) => (first, second),
None => ("", first),
};
let api_resource = ApiResource::from_gvk(&GroupVersionKind::gvk(g, v, kind));
let namespace = ns.unwrap_or_else(|| {
obj.metadata
.namespace
.as_deref()
.expect("YAML must have a namespace when ns is not provided")
});
let api: Api<DynamicObject> =
Api::namespaced_with(self.client.clone(), namespace, &api_resource);
println!("Applying '{name}' in namespace '{namespace}'...");
let patch_params = PatchParams::apply(FIELD_MANAGER);
let result = api.patch(name, &patch_params, &Patch::Apply(&obj)).await?;
println!("Successfully applied '{}'.", result.name_any());
Ok(())
}
/// Equivalent to `kubectl apply -f <url>`.
pub async fn apply_url(&self, url: Url, ns: Option<&str>) -> Result<(), Error> {
let patch_params = PatchParams::apply(FIELD_MANAGER);
let discovery = self.discovery().await?;
let yaml = reqwest::get(url)
.await
.expect("Could not fetch URL")
.text()
.await
.expect("Could not read response body");
for doc in multidoc_deserialize(&yaml).expect("Failed to parse YAML from URL") {
let obj: DynamicObject =
serde_yaml::from_value(doc).expect("YAML document is not a valid object");
let namespace = obj.metadata.namespace.as_deref().or(ns);
let type_meta = obj.types.as_ref().expect("Object is missing TypeMeta");
let gvk =
GroupVersionKind::try_from(type_meta).expect("Object has invalid GroupVersionKind");
let name = obj.name_any();
if let Some((ar, caps)) = discovery.resolve_gvk(&gvk) {
let api = get_dynamic_api(ar, caps, self.client.clone(), namespace, false);
trace!(
"Applying {}:\n{}",
gvk.kind,
serde_yaml::to_string(&obj).unwrap_or_default()
);
let data: Value = serde_json::to_value(&obj).expect("serialisation failed");
let _r = api.patch(&name, &patch_params, &Patch::Apply(data)).await?;
debug!("Applied {} '{name}'", gvk.kind);
} else {
warn!("Skipping document with unknown GVK: {gvk:?}");
}
}
Ok(())
}
/// Build a dynamic API client from a [`DynamicObject`]'s type metadata.
pub(crate) fn get_api_for_dynamic_object(
&self,
object: &DynamicObject,
ns: Option<&str>,
) -> Result<Api<DynamicObject>, Error> {
let ar = object
.types
.as_ref()
.and_then(|t| {
let parts: Vec<&str> = t.api_version.split('/').collect();
match parts.as_slice() {
[version] => Some(ApiResource::from_gvk(&GroupVersionKind::gvk(
"", version, &t.kind,
))),
[group, version] => Some(ApiResource::from_gvk(&GroupVersionKind::gvk(
group, version, &t.kind,
))),
_ => None,
}
})
.ok_or_else(|| {
Error::BuildRequest(kube::core::request::Error::Validation(format!(
"Invalid apiVersion in DynamicObject: {object:#?}"
)))
})?;
Ok(match ns {
Some(ns) => Api::namespaced_with(self.client.clone(), ns, &ar),
None => Api::default_namespaced_with(self.client.clone(), &ar),
})
}
}
// ── Free functions ───────────────────────────────────────────────────────────
pub(crate) fn get_dynamic_api(
resource: kube::api::ApiResource,
capabilities: kube::discovery::ApiCapabilities,
client: Client,
ns: Option<&str>,
all: bool,
) -> Api<DynamicObject> {
if capabilities.scope == Scope::Cluster || all {
Api::all_with(client, &resource)
} else if let Some(namespace) = ns {
Api::namespaced_with(client, namespace, &resource)
} else {
Api::default_namespaced_with(client, &resource)
}
}
pub(crate) fn multidoc_deserialize(
data: &str,
) -> Result<Vec<serde_yaml::Value>, serde_yaml::Error> {
use serde::Deserialize;
let mut docs = vec![];
for de in serde_yaml::Deserializer::from_str(data) {
docs.push(serde_yaml::Value::deserialize(de)?);
}
Ok(docs)
}
// ── Tests ────────────────────────────────────────────────────────────────────
#[cfg(test)]
mod apply_tests {
use std::collections::BTreeMap;
use std::time::{SystemTime, UNIX_EPOCH};
use k8s_openapi::api::core::v1::ConfigMap;
use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;
use kube::api::{DeleteParams, TypeMeta};
use super::*;
#[tokio::test]
#[ignore = "requires kubernetes cluster"]
async fn apply_creates_new_configmap() {
let client = K8sClient::try_default().await.unwrap();
let ns = "default";
let name = format!(
"test-cm-{}",
SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_millis()
);
let cm = ConfigMap {
metadata: ObjectMeta {
name: Some(name.clone()),
namespace: Some(ns.to_string()),
..Default::default()
},
data: Some(BTreeMap::from([("key1".to_string(), "value1".to_string())])),
..Default::default()
};
assert!(client.apply(&cm, Some(ns)).await.is_ok());
let api: Api<ConfigMap> = Api::namespaced(client.client.clone(), ns);
let _ = api.delete(&name, &DeleteParams::default()).await;
}
#[tokio::test]
#[ignore = "requires kubernetes cluster"]
async fn apply_is_idempotent() {
let client = K8sClient::try_default().await.unwrap();
let ns = "default";
let name = format!(
"test-idem-{}",
SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_millis()
);
let cm = ConfigMap {
metadata: ObjectMeta {
name: Some(name.clone()),
namespace: Some(ns.to_string()),
..Default::default()
},
data: Some(BTreeMap::from([("key".to_string(), "value".to_string())])),
..Default::default()
};
assert!(
client.apply(&cm, Some(ns)).await.is_ok(),
"first apply failed"
);
assert!(
client.apply(&cm, Some(ns)).await.is_ok(),
"second apply failed (not idempotent)"
);
let api: Api<ConfigMap> = Api::namespaced(client.client.clone(), ns);
let _ = api.delete(&name, &DeleteParams::default()).await;
}
#[tokio::test]
#[ignore = "requires kubernetes cluster"]
async fn apply_dynamic_creates_new_resource() {
let client = K8sClient::try_default().await.unwrap();
let ns = "default";
let name = format!(
"test-dyn-{}",
SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_millis()
);
let obj = DynamicObject {
types: Some(TypeMeta {
api_version: "v1".to_string(),
kind: "ConfigMap".to_string(),
}),
metadata: ObjectMeta {
name: Some(name.clone()),
namespace: Some(ns.to_string()),
..Default::default()
},
data: serde_json::json!({}),
};
let result = client.apply_dynamic(&obj, Some(ns), false).await;
assert!(result.is_ok(), "apply_dynamic failed: {:?}", result.err());
let api: Api<ConfigMap> = Api::namespaced(client.client.clone(), ns);
let _ = api.delete(&name, &DeleteParams::default()).await;
}
}

harmony-k8s/src/bundle.rs Normal file

@@ -0,0 +1,133 @@
//! Resource Bundle Pattern Implementation
//!
//! This module implements the Resource Bundle pattern for managing groups of
//! Kubernetes resources that form a logical unit of work.
//!
//! ## Purpose
//!
//! The ResourceBundle pattern addresses the need to manage ephemeral privileged
//! pods along with their platform-specific security requirements (e.g., OpenShift
//! Security Context Constraints).
//!
//! ## Use Cases
//!
//! - Writing files to node filesystems (e.g., NetworkManager configurations for
//! network bonding as described in ADR-019)
//! - Running privileged commands on nodes (e.g., reboots, system configuration)
//!
//! ## Benefits
//!
//! - **Separation of Concerns**: Client code doesn't need to know about
//! platform-specific RBAC requirements
//! - **Atomic Operations**: Resources are applied and deleted as a unit
//! - **Clean Abstractions**: Privileged operations are encapsulated in bundles
//! rather than scattered throughout client methods
//!
//! ## Example
//!
//! ```
//! use harmony_k8s::{K8sClient, helper};
//! use harmony_k8s::KubernetesDistribution;
//!
//! async fn write_network_config(client: &K8sClient, node: &str) {
//! // Create a bundle with platform-specific RBAC
//! let bundle = helper::build_privileged_bundle(
//! helper::PrivilegedPodConfig {
//! name: "network-config".to_string(),
//! namespace: "default".to_string(),
//! node_name: node.to_string(),
//! // ... other config
//! ..Default::default()
//! },
//! &KubernetesDistribution::OpenshiftFamily,
//! );
//!
//! // Apply all resources (RBAC + Pod) atomically
//! bundle.apply(client).await.unwrap();
//!
//! // ... wait for completion ...
//!
//! // Cleanup all resources
//! bundle.delete(client).await.unwrap();
//! }
//! ```
use kube::{Error, Resource, ResourceExt, api::DynamicObject};
use serde::Serialize;
use serde_json;
use crate::K8sClient;
/// A ResourceBundle represents a logical unit of work consisting of multiple
/// Kubernetes resources that should be applied or deleted together.
///
/// This pattern is useful for managing ephemeral privileged pods along with
/// their required RBAC bindings (e.g., OpenShift SCC bindings).
#[derive(Debug)]
pub struct ResourceBundle {
pub resources: Vec<DynamicObject>,
}
impl ResourceBundle {
pub fn new() -> Self {
Self {
resources: Vec::new(),
}
}
/// Add a Kubernetes resource to this bundle.
/// The resource is converted to a DynamicObject for generic handling.
pub fn add<K>(&mut self, resource: K)
where
K: Resource + Serialize,
<K as Resource>::DynamicType: Default,
{
// Convert the typed resource to JSON, then to DynamicObject
let json = serde_json::to_value(&resource).expect("Failed to serialize resource");
let mut obj: DynamicObject =
serde_json::from_value(json).expect("Failed to convert to DynamicObject");
// Ensure type metadata is set
if obj.types.is_none() {
            let dt = <K as Resource>::DynamicType::default();
            obj.types = Some(kube::api::TypeMeta {
                api_version: K::api_version(&dt).to_string(),
                kind: K::kind(&dt).to_string(),
            });
}
self.resources.push(obj);
}
/// Apply all resources in this bundle to the cluster.
/// Resources are applied in the order they were added.
pub async fn apply(&self, client: &K8sClient) -> Result<(), Error> {
for res in &self.resources {
let namespace = res.namespace();
client
.apply_dynamic(res, namespace.as_deref(), true)
.await?;
}
Ok(())
}
/// Delete all resources in this bundle from the cluster.
/// Resources are deleted in reverse order to respect dependencies.
pub async fn delete(&self, client: &K8sClient) -> Result<(), Error> {
// FIXME delete all in parallel and retry using kube::client::retry::RetryPolicy
for res in self.resources.iter().rev() {
let api = client.get_api_for_dynamic_object(res, res.namespace().as_deref())?;
let name = res.name_any();
// FIXME this swallows all errors. Swallowing a 404 is ok but other errors must be
// handled properly (such as retrying). A normal error case is when we delete a
// resource bundle with dependencies between various resources. Such as a pod with a
// dependency on a ClusterRoleBinding. Trying to delete the ClusterRoleBinding first
// is expected to fail
let _ = api.delete(&name, &kube::api::DeleteParams::default()).await;
}
Ok(())
}
}
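// Illustrative sketch, not part of this change set: composing a bundle by hand
// from typed resources. The ConfigMap name here is hypothetical and a reachable
// cluster is assumed.
#[cfg(test)]
mod composition_example {
    use super::*;
    use k8s_openapi::api::core::v1::ConfigMap;
    use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;

    #[tokio::test]
    #[ignore = "requires kubernetes cluster"]
    async fn manual_bundle_roundtrip() {
        let client = K8sClient::try_default().await.unwrap();
        let mut bundle = ResourceBundle::new();
        bundle.add(ConfigMap {
            metadata: ObjectMeta {
                name: Some("bundle-example".to_string()),
                namespace: Some("default".to_string()),
                ..Default::default()
            },
            ..Default::default()
        });
        // Resources are applied in insertion order and deleted in reverse.
        bundle.apply(&client).await.unwrap();
        bundle.delete(&client).await.unwrap();
    }
}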

107
harmony-k8s/src/client.rs Normal file
View File

@@ -0,0 +1,107 @@
use std::sync::Arc;
use kube::config::{KubeConfigOptions, Kubeconfig};
use kube::{Client, Config, Discovery, Error};
use log::error;
use serde::Serialize;
use tokio::sync::OnceCell;
use crate::types::KubernetesDistribution;
// TODO not cool, should use a proper configuration mechanism
// cli arg, env var, config file
fn read_dry_run_from_env() -> bool {
std::env::var("DRY_RUN")
.map(|v| v == "true" || v == "1")
.unwrap_or(false)
}
#[derive(Clone)]
pub struct K8sClient {
pub(crate) client: Client,
/// When `true` no mutation is sent to the API server; diffs are printed
/// to stdout instead. Initialised from the `DRY_RUN` environment variable.
pub(crate) dry_run: bool,
pub(crate) k8s_distribution: Arc<OnceCell<KubernetesDistribution>>,
pub(crate) discovery: Arc<OnceCell<Discovery>>,
}
impl Serialize for K8sClient {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!("K8sClient serialization is not meaningful; remove this impl if unused")
}
}
impl std::fmt::Debug for K8sClient {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.write_fmt(format_args!(
"K8sClient {{ namespace: {}, dry_run: {} }}",
self.client.default_namespace(),
self.dry_run,
))
}
}
impl K8sClient {
pub fn inner_client(&self) -> &Client {
&self.client
}
pub fn inner_client_clone(&self) -> Client {
self.client.clone()
}
/// Create a client, reading `DRY_RUN` from the environment.
pub fn new(client: Client) -> Self {
Self {
dry_run: read_dry_run_from_env(),
client,
k8s_distribution: Arc::new(OnceCell::new()),
discovery: Arc::new(OnceCell::new()),
}
}
/// Create a client that always operates in dry-run mode, regardless of
/// the environment variable.
pub fn new_dry_run(client: Client) -> Self {
Self {
dry_run: true,
..Self::new(client)
}
}
/// Returns `true` if this client is operating in dry-run mode.
pub fn is_dry_run(&self) -> bool {
self.dry_run
}
pub async fn try_default() -> Result<Self, Error> {
Ok(Self::new(Client::try_default().await?))
}
pub async fn from_kubeconfig(path: &str) -> Option<Self> {
Self::from_kubeconfig_with_opts(path, &KubeConfigOptions::default()).await
}
pub async fn from_kubeconfig_with_context(path: &str, context: Option<String>) -> Option<Self> {
        let opts = KubeConfigOptions {
            context,
            ..Default::default()
        };
        Self::from_kubeconfig_with_opts(path, &opts).await
}
pub async fn from_kubeconfig_with_opts(path: &str, opts: &KubeConfigOptions) -> Option<Self> {
let k = match Kubeconfig::read_from(path) {
Ok(k) => k,
Err(e) => {
error!("Failed to load kubeconfig from {path}: {e}");
return None;
}
};
        // Return None (with a logged error) instead of panicking, matching the
        // handling of the kubeconfig read above.
        let config = match Config::from_custom_kubeconfig(k, opts).await {
            Ok(c) => c,
            Err(e) => {
                error!("Failed to build client config from {path}: {e}");
                return None;
            }
        };
        match Client::try_from(config) {
            Ok(client) => Some(Self::new(client)),
            Err(e) => {
                error!("Failed to create client from {path}: {e}");
                None
            }
        }
}
}
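// Hedged usage sketch, not part of the diff: `new_dry_run` forces dry-run mode
// regardless of the DRY_RUN environment variable, making it safe for tooling
// that must never mutate a cluster.
#[cfg(test)]
mod dry_run_example {
    use super::*;

    #[tokio::test]
    #[ignore = "requires kubernetes cluster"]
    async fn dry_run_client_reports_mode() {
        let inner = Client::try_default().await.unwrap();
        let k8s = K8sClient::new_dry_run(inner);
        assert!(k8s.is_dry_run());
    }
}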

1
harmony-k8s/src/config.rs Normal file
View File

@@ -0,0 +1 @@
pub const PRIVILEGED_POD_IMAGE: &str = "hub.nationtech.io/redhat/ubi10:latest";

83
harmony-k8s/src/discovery.rs Normal file
View File

@@ -0,0 +1,83 @@
use std::time::Duration;
use kube::{Discovery, Error};
use log::{debug, error, info, trace, warn};
use tokio::sync::Mutex;
use tokio_retry::{Retry, strategy::ExponentialBackoff};
use crate::client::K8sClient;
use crate::types::KubernetesDistribution;
impl K8sClient {
pub async fn get_apiserver_version(
&self,
) -> Result<k8s_openapi::apimachinery::pkg::version::Info, Error> {
self.client.clone().apiserver_version().await
}
/// Runs (and caches) Kubernetes API discovery with exponential-backoff retries.
pub async fn discovery(&self) -> Result<&Discovery, Error> {
let retry_strategy = ExponentialBackoff::from_millis(1000)
.max_delay(Duration::from_secs(32))
.take(6);
let attempt = Mutex::new(0u32);
Retry::spawn(retry_strategy, || async {
let mut n = attempt.lock().await;
*n += 1;
match self
.discovery
.get_or_try_init(async || {
debug!("Running Kubernetes API discovery (attempt {})", *n);
let d = Discovery::new(self.client.clone()).run().await?;
debug!("Kubernetes API discovery completed");
Ok(d)
})
.await
{
Ok(d) => Ok(d),
Err(e) => {
warn!("Kubernetes API discovery failed (attempt {}): {}", *n, e);
Err(e)
}
}
})
.await
.map_err(|e| {
error!("Kubernetes API discovery failed after all retries: {}", e);
e
})
}
/// Detect which Kubernetes distribution is running. Result is cached for
/// the lifetime of the client.
pub async fn get_k8s_distribution(&self) -> Result<KubernetesDistribution, Error> {
self.k8s_distribution
.get_or_try_init(async || {
debug!("Detecting Kubernetes distribution");
let api_groups = self.client.list_api_groups().await?;
trace!("list_api_groups: {:?}", api_groups);
let version = self.get_apiserver_version().await?;
if api_groups
.groups
.iter()
.any(|g| g.name == "project.openshift.io")
{
info!("Detected distribution: OpenshiftFamily");
return Ok(KubernetesDistribution::OpenshiftFamily);
}
if version.git_version.contains("k3s") {
info!("Detected distribution: K3sFamily");
return Ok(KubernetesDistribution::K3sFamily);
}
info!("Distribution not identified, using Default");
Ok(KubernetesDistribution::Default)
})
.await
.cloned()
}
}
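// Hedged sketch: callers typically branch on the detected distribution, for
// example to decide whether an OpenShift SCC binding is required (see
// helper::build_privileged_bundle). Assumes a reachable cluster.
#[cfg(test)]
mod distribution_example {
    use super::*;

    #[tokio::test]
    #[ignore = "requires kubernetes cluster"]
    async fn detect_and_branch() {
        let client = K8sClient::try_default().await.unwrap();
        match client.get_k8s_distribution().await.unwrap() {
            KubernetesDistribution::OpenshiftFamily => println!("SCC binding required"),
            KubernetesDistribution::K3sFamily | KubernetesDistribution::Default => {
                println!("plain privileged pod is sufficient")
            }
        }
    }
}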

613
harmony-k8s/src/helper.rs Normal file
View File

@@ -0,0 +1,613 @@
use std::collections::BTreeMap;
use std::time::Duration;
use crate::KubernetesDistribution;
use super::bundle::ResourceBundle;
use super::config::PRIVILEGED_POD_IMAGE;
use k8s_openapi::api::core::v1::{
Container, HostPathVolumeSource, Pod, PodSpec, SecurityContext, Volume, VolumeMount,
};
use k8s_openapi::api::rbac::v1::{ClusterRoleBinding, RoleRef, Subject};
use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;
use kube::api::DynamicObject;
use kube::error::DiscoveryError;
use log::{debug, error, info, warn};
use serde::de::DeserializeOwned;
#[derive(Debug)]
pub struct PrivilegedPodConfig {
pub name: String,
pub namespace: String,
pub node_name: String,
pub container_name: String,
pub command: Vec<String>,
pub volumes: Vec<Volume>,
pub volume_mounts: Vec<VolumeMount>,
pub host_pid: bool,
pub host_network: bool,
}
impl Default for PrivilegedPodConfig {
fn default() -> Self {
Self {
name: "privileged-pod".to_string(),
namespace: "harmony".to_string(),
node_name: "".to_string(),
container_name: "privileged-container".to_string(),
command: vec![],
volumes: vec![],
volume_mounts: vec![],
host_pid: false,
host_network: false,
}
}
}
pub fn build_privileged_pod(
config: PrivilegedPodConfig,
k8s_distribution: &KubernetesDistribution,
) -> Pod {
let annotations = match k8s_distribution {
KubernetesDistribution::OpenshiftFamily => Some(BTreeMap::from([
("openshift.io/scc".to_string(), "privileged".to_string()),
(
"openshift.io/required-scc".to_string(),
"privileged".to_string(),
),
])),
_ => None,
};
Pod {
metadata: ObjectMeta {
name: Some(config.name),
namespace: Some(config.namespace),
annotations,
..Default::default()
},
spec: Some(PodSpec {
node_name: Some(config.node_name),
restart_policy: Some("Never".to_string()),
host_pid: Some(config.host_pid),
host_network: Some(config.host_network),
containers: vec![Container {
name: config.container_name,
image: Some(PRIVILEGED_POD_IMAGE.to_string()),
command: Some(config.command),
security_context: Some(SecurityContext {
privileged: Some(true),
..Default::default()
}),
volume_mounts: Some(config.volume_mounts),
..Default::default()
}],
volumes: Some(config.volumes),
..Default::default()
}),
..Default::default()
}
}
pub fn host_root_volume() -> (Volume, VolumeMount) {
(
Volume {
name: "host".to_string(),
host_path: Some(HostPathVolumeSource {
path: "/".to_string(),
..Default::default()
}),
..Default::default()
},
VolumeMount {
name: "host".to_string(),
mount_path: "/host".to_string(),
..Default::default()
},
)
}
/// Build a ResourceBundle containing a privileged pod and any required RBAC.
///
/// This function implements the Resource Bundle pattern to encapsulate platform-specific
/// security requirements for running privileged operations on nodes.
///
/// # Platform-Specific Behavior
///
/// - **OpenShift**: Creates a ClusterRoleBinding to grant the default ServiceAccount
/// access to the `system:openshift:scc:privileged` ClusterRole, which allows the pod
/// to use the privileged Security Context Constraint (SCC).
/// - **Standard Kubernetes/K3s**: Only creates the Pod resource; these distributions
///   rely on standard Pod Security admission and need no extra RBAC for this flow.
///
/// # Arguments
///
/// * `config` - Configuration for the privileged pod (name, namespace, command, etc.)
/// * `k8s_distribution` - The detected Kubernetes distribution to determine RBAC requirements
///
/// # Returns
///
/// A `ResourceBundle` containing 1-2 resources:
/// - ClusterRoleBinding (OpenShift only)
/// - Pod (all distributions)
///
/// # Example
///
/// ```
/// use harmony_k8s::helper::{build_privileged_bundle, PrivilegedPodConfig};
/// use harmony_k8s::KubernetesDistribution;
/// let bundle = build_privileged_bundle(
/// PrivilegedPodConfig {
/// name: "network-setup".to_string(),
/// namespace: "default".to_string(),
/// node_name: "worker-01".to_string(),
/// container_name: "setup".to_string(),
/// command: vec!["nmcli".to_string(), "connection".to_string(), "reload".to_string()],
/// ..Default::default()
/// },
/// &KubernetesDistribution::OpenshiftFamily,
/// );
/// // Bundle now contains ClusterRoleBinding + Pod
/// ```
pub fn build_privileged_bundle(
config: PrivilegedPodConfig,
k8s_distribution: &KubernetesDistribution,
) -> ResourceBundle {
debug!(
"Building privileged bundle for config {config:#?} on distribution {k8s_distribution:?}"
);
let mut bundle = ResourceBundle::new();
let pod_name = config.name.clone();
let namespace = config.namespace.clone();
// 1. On OpenShift, create RBAC binding to privileged SCC
if let KubernetesDistribution::OpenshiftFamily = k8s_distribution {
// The default ServiceAccount needs to be bound to the privileged SCC
// via the system:openshift:scc:privileged ClusterRole
let crb = ClusterRoleBinding {
metadata: ObjectMeta {
name: Some(format!("{}-scc-binding", pod_name)),
..Default::default()
},
role_ref: RoleRef {
api_group: "rbac.authorization.k8s.io".to_string(),
kind: "ClusterRole".to_string(),
name: "system:openshift:scc:privileged".to_string(),
},
subjects: Some(vec![Subject {
kind: "ServiceAccount".to_string(),
name: "default".to_string(),
namespace: Some(namespace.clone()),
api_group: None,
..Default::default()
}]),
};
bundle.add(crb);
}
// 2. Build the privileged pod
let pod = build_privileged_pod(config, k8s_distribution);
bundle.add(pod);
bundle
}
/// Action to take when a drain operation times out.
pub enum DrainTimeoutAction {
/// Accept the partial drain and continue
Accept,
/// Retry the drain for another timeout period
Retry,
/// Abort the drain operation
Abort,
}
/// Prompts the user to choose how to proceed after a drain timeout.
///
/// Returns the selected [`DrainTimeoutAction`]: `Accept` once the user types the
/// required confirmation string, `Retry` to extend the drain for another timeout
/// period, or `Abort` to cancel. `Err` is returned only if the prompt system
/// fails entirely.
pub fn prompt_drain_timeout_action(
node_name: &str,
pending_count: usize,
timeout_duration: Duration,
) -> Result<DrainTimeoutAction, kube::Error> {
let prompt_msg = format!(
"Drain operation timed out on node '{}' with {} pod(s) remaining. What would you like to do?",
node_name, pending_count
);
loop {
let choices = vec![
"Accept drain failure (requires confirmation)".to_string(),
format!("Retry drain for another {:?}", timeout_duration),
"Abort operation".to_string(),
];
let selection = inquire::Select::new(&prompt_msg, choices)
.with_help_message("Use arrow keys to navigate, Enter to select")
.prompt()
.map_err(|e| {
kube::Error::Discovery(DiscoveryError::MissingResource(format!(
"Prompt failed: {}",
e
)))
})?;
if selection.starts_with("Accept") {
// Require typed confirmation - retry until correct or user cancels
let required_confirmation = format!("yes-accept-drain:{}={}", node_name, pending_count);
let confirmation_prompt = format!(
"To accept this partial drain, type exactly: {}",
required_confirmation
);
match inquire::Text::new(&confirmation_prompt)
.with_help_message(&format!(
"This action acknowledges {} pods will remain on the node",
pending_count
))
.prompt()
{
Ok(input) if input == required_confirmation => {
warn!(
"User accepted partial drain of node '{}' with {} pods remaining (confirmation: {})",
node_name, pending_count, required_confirmation
);
return Ok(DrainTimeoutAction::Accept);
}
Ok(input) => {
warn!(
"Confirmation failed. Expected '{}', got '{}'. Please try again.",
required_confirmation, input
);
}
Err(e) => {
// User cancelled (Ctrl+C) or prompt system failed
error!("Confirmation prompt cancelled or failed: {}", e);
return Ok(DrainTimeoutAction::Abort);
}
}
} else if selection.starts_with("Retry") {
info!(
"User chose to retry drain operation for another {:?}",
timeout_duration
);
return Ok(DrainTimeoutAction::Retry);
} else {
error!("Drain operation aborted by user");
return Ok(DrainTimeoutAction::Abort);
}
}
}
/// JSON round-trip: DynamicObject → K
///
/// Safe because the DynamicObject was produced by the apiserver from a
/// payload that was originally serialized from K, so the schema is identical.
pub(crate) fn dyn_to_typed<K: DeserializeOwned>(obj: DynamicObject) -> Result<K, kube::Error> {
serde_json::to_value(obj)
.and_then(serde_json::from_value)
.map_err(kube::Error::SerdeError)
}
#[cfg(test)]
mod tests {
use super::*;
use pretty_assertions::assert_eq;
#[test]
fn test_host_root_volume() {
let (volume, mount) = host_root_volume();
assert_eq!(volume.name, "host");
assert_eq!(volume.host_path.as_ref().unwrap().path, "/");
assert_eq!(mount.name, "host");
assert_eq!(mount.mount_path, "/host");
}
#[test]
fn test_build_privileged_pod_minimal() {
let pod = build_privileged_pod(
PrivilegedPodConfig {
name: "minimal-pod".to_string(),
namespace: "kube-system".to_string(),
node_name: "node-123".to_string(),
container_name: "debug-container".to_string(),
command: vec!["sleep".to_string(), "3600".to_string()],
..Default::default()
},
&KubernetesDistribution::Default,
);
assert_eq!(pod.metadata.name, Some("minimal-pod".to_string()));
assert_eq!(pod.metadata.namespace, Some("kube-system".to_string()));
let spec = pod.spec.as_ref().expect("Pod spec should be present");
assert_eq!(spec.node_name, Some("node-123".to_string()));
assert_eq!(spec.restart_policy, Some("Never".to_string()));
assert_eq!(spec.host_pid, Some(false));
assert_eq!(spec.host_network, Some(false));
assert_eq!(spec.containers.len(), 1);
let container = &spec.containers[0];
assert_eq!(container.name, "debug-container");
assert_eq!(container.image, Some(PRIVILEGED_POD_IMAGE.to_string()));
assert_eq!(
container.command,
Some(vec!["sleep".to_string(), "3600".to_string()])
);
// Security context check
let sec_ctx = container
.security_context
.as_ref()
.expect("Security context missing");
assert_eq!(sec_ctx.privileged, Some(true));
}
#[test]
fn test_build_privileged_pod_with_volumes_and_host_access() {
let (host_vol, host_mount) = host_root_volume();
let pod = build_privileged_pod(
PrivilegedPodConfig {
name: "full-pod".to_string(),
namespace: "default".to_string(),
node_name: "node-1".to_string(),
container_name: "runner".to_string(),
command: vec!["/bin/sh".to_string()],
volumes: vec![host_vol.clone()],
volume_mounts: vec![host_mount.clone()],
host_pid: true,
host_network: true,
},
&KubernetesDistribution::Default,
);
let spec = pod.spec.as_ref().expect("Pod spec should be present");
assert_eq!(spec.host_pid, Some(true));
assert_eq!(spec.host_network, Some(true));
// Check volumes in Spec
let volumes = spec.volumes.as_ref().expect("Volumes should be present");
assert_eq!(volumes.len(), 1);
assert_eq!(volumes[0].name, "host");
// Check mounts in Container
let container = &spec.containers[0];
let mounts = container
.volume_mounts
.as_ref()
.expect("Mounts should be present");
assert_eq!(mounts.len(), 1);
assert_eq!(mounts[0].name, "host");
assert_eq!(mounts[0].mount_path, "/host");
}
#[test]
fn test_build_privileged_pod_structure_correctness() {
// This test validates that the construction logic puts things in the right places
// effectively validating the "template".
let custom_vol = Volume {
name: "custom-vol".to_string(),
..Default::default()
};
let custom_mount = VolumeMount {
name: "custom-vol".to_string(),
mount_path: "/custom".to_string(),
..Default::default()
};
let pod = build_privileged_pod(
PrivilegedPodConfig {
name: "structure-test".to_string(),
namespace: "test-ns".to_string(),
node_name: "test-node".to_string(),
container_name: "test-container".to_string(),
command: vec!["cmd".to_string()],
volumes: vec![custom_vol],
volume_mounts: vec![custom_mount],
..Default::default()
},
&KubernetesDistribution::Default,
);
// Validate structure depth
let spec = pod.spec.as_ref().unwrap();
// 1. Spec level fields
assert!(spec.node_name.is_some());
assert!(spec.volumes.is_some());
// 2. Container level fields
let container = &spec.containers[0];
assert!(container.security_context.is_some());
assert!(container.volume_mounts.is_some());
// 3. Nested fields
assert!(
container
.security_context
.as_ref()
.unwrap()
.privileged
.unwrap()
);
assert_eq!(spec.volumes.as_ref().unwrap()[0].name, "custom-vol");
assert_eq!(
container.volume_mounts.as_ref().unwrap()[0].mount_path,
"/custom"
);
}
#[test]
fn test_build_privileged_bundle_default_distribution() {
let bundle = build_privileged_bundle(
PrivilegedPodConfig {
name: "test-bundle".to_string(),
namespace: "test-ns".to_string(),
node_name: "node-1".to_string(),
container_name: "test-container".to_string(),
command: vec!["echo".to_string(), "hello".to_string()],
..Default::default()
},
&KubernetesDistribution::Default,
);
// For Default distribution, only the Pod should be in the bundle
assert_eq!(bundle.resources.len(), 1);
let pod_obj = &bundle.resources[0];
assert_eq!(pod_obj.metadata.name.as_deref(), Some("test-bundle"));
assert_eq!(pod_obj.metadata.namespace.as_deref(), Some("test-ns"));
}
#[test]
fn test_build_privileged_bundle_openshift_distribution() {
let bundle = build_privileged_bundle(
PrivilegedPodConfig {
name: "test-bundle-ocp".to_string(),
namespace: "test-ns".to_string(),
node_name: "node-1".to_string(),
container_name: "test-container".to_string(),
command: vec!["echo".to_string(), "hello".to_string()],
..Default::default()
},
&KubernetesDistribution::OpenshiftFamily,
);
// For OpenShift, both ClusterRoleBinding and Pod should be in the bundle
assert_eq!(bundle.resources.len(), 2);
// First resource should be the ClusterRoleBinding
let crb_obj = &bundle.resources[0];
assert_eq!(
crb_obj.metadata.name.as_deref(),
Some("test-bundle-ocp-scc-binding")
);
// Verify it's targeting the privileged SCC
if let Some(role_ref) = crb_obj.data.get("roleRef") {
assert_eq!(
role_ref.get("name").and_then(|v| v.as_str()),
Some("system:openshift:scc:privileged")
);
}
// Second resource should be the Pod
let pod_obj = &bundle.resources[1];
assert_eq!(pod_obj.metadata.name.as_deref(), Some("test-bundle-ocp"));
assert_eq!(pod_obj.metadata.namespace.as_deref(), Some("test-ns"));
}
#[test]
fn test_build_privileged_bundle_k3s_distribution() {
let bundle = build_privileged_bundle(
PrivilegedPodConfig {
name: "test-bundle-k3s".to_string(),
namespace: "test-ns".to_string(),
node_name: "node-1".to_string(),
container_name: "test-container".to_string(),
command: vec!["echo".to_string(), "hello".to_string()],
..Default::default()
},
&KubernetesDistribution::K3sFamily,
);
// For K3s, only the Pod should be in the bundle (no special SCC)
assert_eq!(bundle.resources.len(), 1);
let pod_obj = &bundle.resources[0];
assert_eq!(pod_obj.metadata.name.as_deref(), Some("test-bundle-k3s"));
}
#[test]
fn test_pod_yaml_rendering_expected() {
let pod = build_privileged_pod(
PrivilegedPodConfig {
name: "pod_name".to_string(),
namespace: "pod_namespace".to_string(),
node_name: "node name".to_string(),
container_name: "container name".to_string(),
command: vec!["command".to_string(), "argument".to_string()],
host_pid: true,
host_network: true,
..Default::default()
},
&KubernetesDistribution::Default,
);
assert_eq!(
&serde_yaml::to_string(&pod).unwrap(),
"apiVersion: v1
kind: Pod
metadata:
name: pod_name
namespace: pod_namespace
spec:
containers:
- command:
- command
- argument
image: hub.nationtech.io/redhat/ubi10:latest
name: container name
securityContext:
privileged: true
volumeMounts: []
hostNetwork: true
hostPID: true
nodeName: node name
restartPolicy: Never
volumes: []
"
);
}
#[test]
fn test_pod_yaml_rendering_openshift() {
let pod = build_privileged_pod(
PrivilegedPodConfig {
name: "pod_name".to_string(),
namespace: "pod_namespace".to_string(),
node_name: "node name".to_string(),
container_name: "container name".to_string(),
command: vec!["command".to_string(), "argument".to_string()],
host_pid: true,
host_network: true,
..Default::default()
},
&KubernetesDistribution::OpenshiftFamily,
);
assert_eq!(
&serde_yaml::to_string(&pod).unwrap(),
"apiVersion: v1
kind: Pod
metadata:
annotations:
openshift.io/required-scc: privileged
openshift.io/scc: privileged
name: pod_name
namespace: pod_namespace
spec:
containers:
- command:
- command
- argument
image: hub.nationtech.io/redhat/ubi10:latest
name: container name
securityContext:
privileged: true
volumeMounts: []
hostNetwork: true
hostPID: true
nodeName: node name
restartPolicy: Never
volumes: []
"
);
}
}

13
harmony-k8s/src/lib.rs Normal file
View File

@@ -0,0 +1,13 @@
pub mod apply;
pub mod bundle;
pub mod client;
pub mod config;
pub mod discovery;
pub mod helper;
pub mod node;
pub mod pod;
pub mod resources;
pub mod types;
pub use client::K8sClient;
pub use types::{DrainOptions, KubernetesDistribution, NodeFile, ScopeResolver, WriteMode};

3
harmony-k8s/src/main.rs Normal file
View File

@@ -0,0 +1,3 @@
fn main() {
println!("Hello, world!");
}

722
harmony-k8s/src/node.rs Normal file
View File

@@ -0,0 +1,722 @@
use std::collections::BTreeMap;
use std::time::{Duration, SystemTime, UNIX_EPOCH};
use k8s_openapi::api::core::v1::{
ConfigMap, ConfigMapVolumeSource, Node, Pod, Volume, VolumeMount,
};
use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;
use kube::{
Error,
api::{Api, DeleteParams, EvictParams, ListParams, PostParams},
core::ErrorResponse,
error::DiscoveryError,
};
use log::{debug, error, info, warn};
use tokio::time::sleep;
use crate::client::K8sClient;
use crate::helper::{self, PrivilegedPodConfig};
use crate::types::{DrainOptions, NodeFile};
impl K8sClient {
pub async fn cordon_node(&self, node_name: &str) -> Result<(), Error> {
Api::<Node>::all(self.client.clone())
.cordon(node_name)
.await?;
Ok(())
}
pub async fn uncordon_node(&self, node_name: &str) -> Result<(), Error> {
Api::<Node>::all(self.client.clone())
.uncordon(node_name)
.await?;
Ok(())
}
pub async fn wait_for_node_ready(&self, node_name: &str) -> Result<(), Error> {
self.wait_for_node_ready_with_timeout(node_name, Duration::from_secs(600))
.await
}
async fn wait_for_node_ready_with_timeout(
&self,
node_name: &str,
timeout: Duration,
) -> Result<(), Error> {
let api: Api<Node> = Api::all(self.client.clone());
let start = tokio::time::Instant::now();
let poll = Duration::from_secs(5);
loop {
if start.elapsed() > timeout {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Node '{node_name}' did not become Ready within {timeout:?}"
))));
}
match api.get(node_name).await {
Ok(node) => {
if node
.status
.as_ref()
.and_then(|s| s.conditions.as_ref())
.map(|conds| {
conds
.iter()
.any(|c| c.type_ == "Ready" && c.status == "True")
})
.unwrap_or(false)
{
debug!("Node '{node_name}' is Ready");
return Ok(());
}
}
Err(e) => debug!("Error polling node '{node_name}': {e}"),
}
sleep(poll).await;
}
}
async fn wait_for_node_not_ready(
&self,
node_name: &str,
timeout: Duration,
) -> Result<(), Error> {
let api: Api<Node> = Api::all(self.client.clone());
let start = tokio::time::Instant::now();
let poll = Duration::from_secs(5);
loop {
if start.elapsed() > timeout {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Node '{node_name}' did not become NotReady within {timeout:?}"
))));
}
match api.get(node_name).await {
Ok(node) => {
let is_ready = node
.status
.as_ref()
.and_then(|s| s.conditions.as_ref())
.map(|conds| {
conds
.iter()
.any(|c| c.type_ == "Ready" && c.status == "True")
})
.unwrap_or(false);
if !is_ready {
debug!("Node '{node_name}' is NotReady");
return Ok(());
}
}
Err(e) => debug!("Error polling node '{node_name}': {e}"),
}
sleep(poll).await;
}
}
async fn list_pods_on_node(&self, node_name: &str) -> Result<Vec<Pod>, Error> {
let api: Api<Pod> = Api::all(self.client.clone());
Ok(api
.list(&ListParams::default().fields(&format!("spec.nodeName={node_name}")))
.await?
.items)
}
fn is_mirror_pod(pod: &Pod) -> bool {
pod.metadata
.annotations
.as_ref()
.map(|a| a.contains_key("kubernetes.io/config.mirror"))
.unwrap_or(false)
}
fn is_daemonset_pod(pod: &Pod) -> bool {
pod.metadata
.owner_references
.as_ref()
.map(|refs| refs.iter().any(|r| r.kind == "DaemonSet"))
.unwrap_or(false)
}
fn has_emptydir_volume(pod: &Pod) -> bool {
pod.spec
.as_ref()
.and_then(|s| s.volumes.as_ref())
.map(|vols| vols.iter().any(|v| v.empty_dir.is_some()))
.unwrap_or(false)
}
fn is_completed_pod(pod: &Pod) -> bool {
pod.status
.as_ref()
.and_then(|s| s.phase.as_deref())
.map(|phase| phase == "Succeeded" || phase == "Failed")
.unwrap_or(false)
}
fn classify_pods_for_drain(
pods: &[Pod],
options: &DrainOptions,
) -> Result<(Vec<Pod>, Vec<String>), String> {
let mut evictable = Vec::new();
let mut skipped = Vec::new();
let mut blocking = Vec::new();
for pod in pods {
let name = pod.metadata.name.as_deref().unwrap_or("<unknown>");
let ns = pod.metadata.namespace.as_deref().unwrap_or("<unknown>");
let qualified = format!("{ns}/{name}");
if Self::is_mirror_pod(pod) {
skipped.push(format!("{qualified} (mirror pod)"));
continue;
}
if Self::is_completed_pod(pod) {
skipped.push(format!("{qualified} (completed)"));
continue;
}
if Self::is_daemonset_pod(pod) {
if options.ignore_daemonsets {
skipped.push(format!("{qualified} (DaemonSet-managed)"));
} else {
blocking.push(format!(
"{qualified} is managed by a DaemonSet (set ignore_daemonsets to skip)"
));
}
continue;
}
if Self::has_emptydir_volume(pod) && !options.delete_emptydir_data {
blocking.push(format!(
"{qualified} uses emptyDir volumes (set delete_emptydir_data to allow eviction)"
));
continue;
}
evictable.push(pod.clone());
}
if !blocking.is_empty() {
return Err(format!(
"Cannot drain node — the following pods block eviction:\n - {}",
blocking.join("\n - ")
));
}
Ok((evictable, skipped))
}
async fn evict_pod(&self, pod: &Pod) -> Result<(), Error> {
let name = pod.metadata.name.as_deref().unwrap_or_default();
let ns = pod.metadata.namespace.as_deref().unwrap_or_default();
debug!("Evicting pod {ns}/{name}");
Api::<Pod>::namespaced(self.client.clone(), ns)
.evict(name, &EvictParams::default())
.await
.map(|_| ())
}
/// Drains a node: cordon → classify → evict & wait.
pub async fn drain_node(&self, node_name: &str, options: &DrainOptions) -> Result<(), Error> {
debug!("Cordoning '{node_name}'");
self.cordon_node(node_name).await?;
let pods = self.list_pods_on_node(node_name).await?;
debug!("Found {} pod(s) on '{node_name}'", pods.len());
let (evictable, skipped) =
Self::classify_pods_for_drain(&pods, options).map_err(|msg| {
error!("{msg}");
Error::Discovery(DiscoveryError::MissingResource(msg))
})?;
for s in &skipped {
info!("Skipping pod: {s}");
}
if evictable.is_empty() {
info!("No pods to evict on '{node_name}'");
return Ok(());
}
info!("Evicting {} pod(s) from '{node_name}'", evictable.len());
let mut start = tokio::time::Instant::now();
let poll = Duration::from_secs(5);
let mut pending = evictable;
loop {
for pod in &pending {
match self.evict_pod(pod).await {
Ok(()) => {}
Err(Error::Api(ErrorResponse { code: 404, .. })) => {}
Err(Error::Api(ErrorResponse { code: 429, .. })) => {
warn!(
"PDB blocked eviction of {}/{}; will retry",
pod.metadata.namespace.as_deref().unwrap_or(""),
pod.metadata.name.as_deref().unwrap_or("")
);
}
Err(e) => {
error!(
"Failed to evict {}/{}: {e}",
pod.metadata.namespace.as_deref().unwrap_or(""),
pod.metadata.name.as_deref().unwrap_or("")
);
return Err(e);
}
}
}
sleep(poll).await;
let mut still_present = Vec::new();
for pod in pending {
let ns = pod.metadata.namespace.as_deref().unwrap_or_default();
let name = pod.metadata.name.as_deref().unwrap_or_default();
match self.get_pod(name, Some(ns)).await? {
Some(_) => still_present.push(pod),
None => debug!("Pod {ns}/{name} evicted"),
}
}
pending = still_present;
if pending.is_empty() {
break;
}
if start.elapsed() > options.timeout {
match helper::prompt_drain_timeout_action(
node_name,
pending.len(),
options.timeout,
)? {
helper::DrainTimeoutAction::Accept => break,
helper::DrainTimeoutAction::Retry => {
start = tokio::time::Instant::now();
continue;
}
helper::DrainTimeoutAction::Abort => {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Drain aborted. {} pod(s) remaining on '{node_name}'",
pending.len()
))));
}
}
}
debug!("Waiting for {} pod(s) on '{node_name}'", pending.len());
}
debug!("'{node_name}' drained successfully");
Ok(())
}
/// Safely reboots a node: drain → reboot → wait for Ready → uncordon.
pub async fn reboot_node(
&self,
node_name: &str,
drain_options: &DrainOptions,
timeout: Duration,
) -> Result<(), Error> {
info!("Starting reboot for '{node_name}'");
let node_api: Api<Node> = Api::all(self.client.clone());
let boot_id_before = node_api
.get(node_name)
.await?
.status
.as_ref()
.and_then(|s| s.node_info.as_ref())
.map(|ni| ni.boot_id.clone())
.ok_or_else(|| {
Error::Discovery(DiscoveryError::MissingResource(format!(
"Node '{node_name}' has no boot_id in status"
)))
})?;
info!("Draining '{node_name}'");
self.drain_node(node_name, drain_options).await?;
let start = tokio::time::Instant::now();
info!("Scheduling reboot for '{node_name}'");
let reboot_cmd =
"echo rebooting ; nohup bash -c 'sleep 5 && nsenter -t 1 -m -- systemctl reboot'";
match self
.run_privileged_command_on_node(node_name, reboot_cmd)
.await
{
Ok(_) => debug!("Reboot command dispatched"),
Err(e) => debug!("Reboot command error (expected if node began shutdown): {e}"),
}
info!("Waiting for '{node_name}' to begin shutdown");
self.wait_for_node_not_ready(node_name, timeout.saturating_sub(start.elapsed()))
.await?;
if start.elapsed() > timeout {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Timeout during reboot of '{node_name}' (shutdown phase)"
))));
}
info!("Waiting for '{node_name}' to come back online");
self.wait_for_node_ready_with_timeout(node_name, timeout.saturating_sub(start.elapsed()))
.await?;
if start.elapsed() > timeout {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Timeout during reboot of '{node_name}' (ready phase)"
))));
}
let boot_id_after = node_api
.get(node_name)
.await?
.status
.as_ref()
.and_then(|s| s.node_info.as_ref())
.map(|ni| ni.boot_id.clone())
.ok_or_else(|| {
Error::Discovery(DiscoveryError::MissingResource(format!(
"Node '{node_name}' has no boot_id after reboot"
)))
})?;
if boot_id_before == boot_id_after {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Node '{node_name}' did not actually reboot (boot_id unchanged: {boot_id_before})"
))));
}
info!("'{node_name}' rebooted ({boot_id_before} → {boot_id_after})");
self.uncordon_node(node_name).await?;
info!("'{node_name}' reboot complete ({:?})", start.elapsed());
Ok(())
}
/// Write a set of files to a node's filesystem via a privileged ephemeral pod.
pub async fn write_files_to_node(
&self,
node_name: &str,
files: &[NodeFile],
) -> Result<String, Error> {
let ns = self.client.default_namespace();
let suffix = SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_millis();
let name = format!("harmony-k8s-writer-{suffix}");
debug!("Writing {} file(s) to '{node_name}'", files.len());
let mut data = BTreeMap::new();
let mut script = String::from("set -e\n");
for (i, file) in files.iter().enumerate() {
let key = format!("f{i}");
data.insert(key.clone(), file.content.clone());
script.push_str(&format!("mkdir -p \"$(dirname \"/host{}\")\"\n", file.path));
script.push_str(&format!("cp \"/payload/{key}\" \"/host{}\"\n", file.path));
script.push_str(&format!("chmod {:o} \"/host{}\"\n", file.mode, file.path));
}
let cm = ConfigMap {
metadata: ObjectMeta {
name: Some(name.clone()),
namespace: Some(ns.to_string()),
..Default::default()
},
data: Some(data),
..Default::default()
};
let cm_api: Api<ConfigMap> = Api::namespaced(self.client.clone(), ns);
cm_api.create(&PostParams::default(), &cm).await?;
debug!("Created ConfigMap '{name}'");
let (host_vol, host_mount) = helper::host_root_volume();
let payload_vol = Volume {
name: "payload".to_string(),
config_map: Some(ConfigMapVolumeSource {
name: name.clone(),
..Default::default()
}),
..Default::default()
};
let payload_mount = VolumeMount {
name: "payload".to_string(),
mount_path: "/payload".to_string(),
..Default::default()
};
let bundle = helper::build_privileged_bundle(
PrivilegedPodConfig {
name: name.clone(),
namespace: ns.to_string(),
node_name: node_name.to_string(),
container_name: "writer".to_string(),
command: vec!["/bin/bash".to_string(), "-c".to_string(), script],
volumes: vec![payload_vol, host_vol],
volume_mounts: vec![payload_mount, host_mount],
host_pid: false,
host_network: false,
},
&self.get_k8s_distribution().await?,
);
bundle.apply(self).await?;
debug!("Created privileged pod bundle '{name}'");
let result = self.wait_for_pod_completion(&name, ns).await;
debug!("Cleaning up '{name}'");
let _ = bundle.delete(self).await;
let _ = cm_api.delete(&name, &DeleteParams::default()).await;
result
}
/// Run a privileged command on a node via an ephemeral pod.
pub async fn run_privileged_command_on_node(
&self,
node_name: &str,
command: &str,
) -> Result<String, Error> {
let namespace = self.client.default_namespace();
let suffix = SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_millis();
let name = format!("harmony-k8s-cmd-{suffix}");
debug!("Running privileged command on '{node_name}': {command}");
let (host_vol, host_mount) = helper::host_root_volume();
let bundle = helper::build_privileged_bundle(
PrivilegedPodConfig {
name: name.clone(),
namespace: namespace.to_string(),
node_name: node_name.to_string(),
container_name: "runner".to_string(),
command: vec![
"/bin/bash".to_string(),
"-c".to_string(),
command.to_string(),
],
volumes: vec![host_vol],
volume_mounts: vec![host_mount],
host_pid: true,
host_network: true,
},
&self.get_k8s_distribution().await?,
);
bundle.apply(self).await?;
debug!("Privileged pod '{name}' created");
let result = self.wait_for_pod_completion(&name, namespace).await;
debug!("Cleaning up '{name}'");
let _ = bundle.delete(self).await;
result
}
}
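// Hedged usage sketch (hypothetical node name, path, and file content): write a
// NetworkManager keyfile to a node, then reload connections, combining
// write_files_to_node and run_privileged_command_on_node.
#[cfg(test)]
mod node_ops_example {
    use super::*;

    #[tokio::test]
    #[ignore = "requires kubernetes cluster with privileged pod access"]
    async fn write_then_reload() {
        let client = K8sClient::try_default().await.unwrap();
        client
            .write_files_to_node(
                "worker-01",
                &[NodeFile {
                    path: "/etc/NetworkManager/system-connections/bond0.nmconnection"
                        .to_string(),
                    content: "[connection]\nid=bond0\ntype=bond\n".to_string(),
                    mode: 0o600,
                }],
            )
            .await
            .unwrap();
        client
            .run_privileged_command_on_node(
                "worker-01",
                "nsenter -t 1 -m -- nmcli connection reload",
            )
            .await
            .unwrap();
    }
}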
// ── Tests ────────────────────────────────────────────────────────────────────
#[cfg(test)]
mod tests {
use k8s_openapi::api::core::v1::{EmptyDirVolumeSource, PodSpec, PodStatus, Volume};
use k8s_openapi::apimachinery::pkg::apis::meta::v1::{ObjectMeta, OwnerReference};
use super::*;
fn base_pod(name: &str, ns: &str) -> Pod {
Pod {
metadata: ObjectMeta {
name: Some(name.to_string()),
namespace: Some(ns.to_string()),
..Default::default()
},
spec: Some(PodSpec::default()),
status: Some(PodStatus {
phase: Some("Running".to_string()),
..Default::default()
}),
}
}
fn mirror_pod(name: &str, ns: &str) -> Pod {
let mut pod = base_pod(name, ns);
pod.metadata.annotations = Some(std::collections::BTreeMap::from([(
"kubernetes.io/config.mirror".to_string(),
"abc123".to_string(),
)]));
pod
}
fn daemonset_pod(name: &str, ns: &str) -> Pod {
let mut pod = base_pod(name, ns);
pod.metadata.owner_references = Some(vec![OwnerReference {
api_version: "apps/v1".to_string(),
kind: "DaemonSet".to_string(),
name: "some-ds".to_string(),
uid: "uid-ds".to_string(),
..Default::default()
}]);
pod
}
fn emptydir_pod(name: &str, ns: &str) -> Pod {
let mut pod = base_pod(name, ns);
pod.spec = Some(PodSpec {
volumes: Some(vec![Volume {
name: "scratch".to_string(),
empty_dir: Some(EmptyDirVolumeSource::default()),
..Default::default()
}]),
..Default::default()
});
pod
}
fn completed_pod(name: &str, ns: &str, phase: &str) -> Pod {
let mut pod = base_pod(name, ns);
pod.status = Some(PodStatus {
phase: Some(phase.to_string()),
..Default::default()
});
pod
}
fn default_opts() -> DrainOptions {
DrainOptions::default()
}
// All test bodies are identical to the original — only the module path changed.
#[test]
fn empty_pod_list_returns_empty_vecs() {
let (e, s) = K8sClient::classify_pods_for_drain(&[], &default_opts()).unwrap();
assert!(e.is_empty());
assert!(s.is_empty());
}
#[test]
fn normal_pod_is_evictable() {
let pods = vec![base_pod("web", "default")];
let (e, s) = K8sClient::classify_pods_for_drain(&pods, &default_opts()).unwrap();
assert_eq!(e.len(), 1);
assert!(s.is_empty());
}
#[test]
fn mirror_pod_is_skipped() {
let pods = vec![mirror_pod("kube-apiserver", "kube-system")];
let (e, s) = K8sClient::classify_pods_for_drain(&pods, &default_opts()).unwrap();
assert!(e.is_empty());
assert!(s[0].contains("mirror pod"));
}
#[test]
fn completed_pods_are_skipped() {
for phase in ["Succeeded", "Failed"] {
let pods = vec![completed_pod("job", "batch", phase)];
let (e, s) = K8sClient::classify_pods_for_drain(&pods, &default_opts()).unwrap();
assert!(e.is_empty());
assert!(s[0].contains("completed"));
}
}
#[test]
fn daemonset_skipped_when_ignored() {
let pods = vec![daemonset_pod("fluentd", "logging")];
let opts = DrainOptions {
ignore_daemonsets: true,
..default_opts()
};
let (e, s) = K8sClient::classify_pods_for_drain(&pods, &opts).unwrap();
assert!(e.is_empty());
assert!(s[0].contains("DaemonSet-managed"));
}
#[test]
fn daemonset_blocks_when_not_ignored() {
let pods = vec![daemonset_pod("fluentd", "logging")];
let opts = DrainOptions {
ignore_daemonsets: false,
..default_opts()
};
let err = K8sClient::classify_pods_for_drain(&pods, &opts).unwrap_err();
assert!(err.contains("DaemonSet") && err.contains("logging/fluentd"));
}
#[test]
fn emptydir_blocks_without_flag() {
let pods = vec![emptydir_pod("cache", "default")];
let opts = DrainOptions {
delete_emptydir_data: false,
..default_opts()
};
let err = K8sClient::classify_pods_for_drain(&pods, &opts).unwrap_err();
assert!(err.contains("emptyDir") && err.contains("default/cache"));
}
#[test]
fn emptydir_evictable_with_flag() {
let pods = vec![emptydir_pod("cache", "default")];
let opts = DrainOptions {
delete_emptydir_data: true,
..default_opts()
};
let (e, s) = K8sClient::classify_pods_for_drain(&pods, &opts).unwrap();
assert_eq!(e.len(), 1);
assert!(s.is_empty());
}
#[test]
fn multiple_blocking_all_reported() {
let pods = vec![daemonset_pod("ds", "ns1"), emptydir_pod("ed", "ns2")];
let opts = DrainOptions {
ignore_daemonsets: false,
delete_emptydir_data: false,
..default_opts()
};
let err = K8sClient::classify_pods_for_drain(&pods, &opts).unwrap_err();
assert!(err.contains("ns1/ds") && err.contains("ns2/ed"));
}
#[test]
fn mixed_pods_classified_correctly() {
let pods = vec![
base_pod("web", "default"),
mirror_pod("kube-apiserver", "kube-system"),
daemonset_pod("fluentd", "logging"),
completed_pod("job", "batch", "Succeeded"),
base_pod("api", "default"),
];
let (e, s) = K8sClient::classify_pods_for_drain(&pods, &default_opts()).unwrap();
let names: Vec<&str> = e
.iter()
.map(|p| p.metadata.name.as_deref().unwrap())
.collect();
assert_eq!(names, vec!["web", "api"]);
assert_eq!(s.len(), 3);
}
#[test]
fn mirror_checked_before_completed() {
let mut pod = mirror_pod("static-etcd", "kube-system");
pod.status = Some(PodStatus {
phase: Some("Succeeded".to_string()),
..Default::default()
});
let (_, s) = K8sClient::classify_pods_for_drain(&[pod], &default_opts()).unwrap();
assert!(s[0].contains("mirror pod"), "got: {}", s[0]);
}
#[test]
fn completed_checked_before_daemonset() {
let mut pod = daemonset_pod("collector", "monitoring");
pod.status = Some(PodStatus {
phase: Some("Failed".to_string()),
..Default::default()
});
let (_, s) = K8sClient::classify_pods_for_drain(&[pod], &default_opts()).unwrap();
assert!(s[0].contains("completed"), "got: {}", s[0]);
}
}

193
harmony-k8s/src/pod.rs Normal file
View File

@@ -0,0 +1,193 @@
use std::time::Duration;
use k8s_openapi::api::core::v1::Pod;
use kube::{
Error,
api::{Api, AttachParams, ListParams},
error::DiscoveryError,
runtime::reflector::Lookup,
};
use log::debug;
use tokio::io::AsyncReadExt;
use tokio::time::sleep;
use crate::client::K8sClient;
impl K8sClient {
pub async fn get_pod(&self, name: &str, namespace: Option<&str>) -> Result<Option<Pod>, Error> {
let api: Api<Pod> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
api.get_opt(name).await
}
    /// Polls every 5s until the pod reports phase `Running` (note: phase, not
    /// the `Ready` condition), giving up after 120s.
    pub async fn wait_for_pod_ready(
&self,
pod_name: &str,
namespace: Option<&str>,
) -> Result<(), Error> {
let mut elapsed = 0u64;
let interval = 5u64;
let timeout_secs = 120u64;
loop {
if let Some(p) = self.get_pod(pod_name, namespace).await? {
if let Some(phase) = p.status.and_then(|s| s.phase) {
if phase.to_lowercase() == "running" {
return Ok(());
}
}
}
if elapsed >= timeout_secs {
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Pod '{}' in '{}' did not become ready within {timeout_secs}s",
pod_name,
namespace.unwrap_or("<default>"),
))));
}
sleep(Duration::from_secs(interval)).await;
elapsed += interval;
}
}
/// Polls a pod until it reaches `Succeeded` or `Failed`, then returns its
/// logs. Used internally by node operations.
pub(crate) async fn wait_for_pod_completion(
&self,
name: &str,
namespace: &str,
) -> Result<String, Error> {
let api: Api<Pod> = Api::namespaced(self.client.clone(), namespace);
let poll_interval = Duration::from_secs(2);
for _ in 0..60 {
sleep(poll_interval).await;
let p = api.get(name).await?;
match p.status.and_then(|s| s.phase).as_deref() {
Some("Succeeded") => {
let logs = api
.logs(name, &Default::default())
.await
.unwrap_or_default();
debug!("Pod {namespace}/{name} succeeded. Logs: {logs}");
return Ok(logs);
}
Some("Failed") => {
let logs = api
.logs(name, &Default::default())
.await
.unwrap_or_default();
debug!("Pod {namespace}/{name} failed. Logs: {logs}");
return Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Pod '{name}' failed.\n{logs}"
))));
}
_ => {}
}
}
Err(Error::Discovery(DiscoveryError::MissingResource(format!(
"Timed out waiting for pod '{name}'"
))))
}
/// Execute a command in the first pod matching `{label}={name}`.
pub async fn exec_app_capture_output(
&self,
name: String,
label: String,
namespace: Option<&str>,
command: Vec<&str>,
) -> Result<String, String> {
let api: Api<Pod> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
let pod_list = api
.list(&ListParams::default().labels(&format!("{label}={name}")))
.await
.expect("Failed to list pods");
let pod_name = pod_list
.items
.first()
.expect("No matching pod")
.name()
.expect("Pod has no name")
.into_owned();
match api
.exec(
&pod_name,
command,
&AttachParams::default().stdout(true).stderr(true),
)
.await
{
Err(e) => Err(e.to_string()),
Ok(mut process) => {
let status = process
.take_status()
.expect("No status handle")
.await
.expect("Status channel closed");
if let Some(s) = status.status {
let mut buf = String::new();
if let Some(mut stdout) = process.stdout() {
stdout
.read_to_string(&mut buf)
.await
.map_err(|e| format!("Failed to read stdout: {e}"))?;
}
debug!("exec status: {} - {:?}", s, status.details);
if s == "Success" { Ok(buf) } else { Err(s) }
} else {
Err("No inner status from pod exec".to_string())
}
}
}
}
/// Execute a command in the first pod matching
/// `app.kubernetes.io/name={name}`.
pub async fn exec_app(
&self,
name: String,
namespace: Option<&str>,
command: Vec<&str>,
) -> Result<(), String> {
let api: Api<Pod> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
let pod_list = api
.list(&ListParams::default().labels(&format!("app.kubernetes.io/name={name}")))
.await
.expect("Failed to list pods");
let pod_name = pod_list
.items
.first()
.expect("No matching pod")
.name()
.expect("Pod has no name")
.into_owned();
match api.exec(&pod_name, command, &AttachParams::default()).await {
Err(e) => Err(e.to_string()),
Ok(mut process) => {
let status = process
.take_status()
.expect("No status handle")
.await
.expect("Status channel closed");
if let Some(s) = status.status {
debug!("exec status: {} - {:?}", s, status.details);
if s == "Success" { Ok(()) } else { Err(s) }
} else {
Err("No inner status from pod exec".to_string())
}
}
}
}
}
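// Hedged sketch (hypothetical label value and SQL): capture command output from
// the first pod labelled app.kubernetes.io/name=my-db.
#[cfg(test)]
mod exec_example {
    use super::*;

    #[tokio::test]
    #[ignore = "requires kubernetes cluster"]
    async fn exec_and_capture() {
        let client = K8sClient::try_default().await.unwrap();
        let out = client
            .exec_app_capture_output(
                "my-db".to_string(),
                "app.kubernetes.io/name".to_string(),
                Some("default"),
                vec!["psql", "-tAc", "SELECT 1"],
            )
            .await
            .unwrap();
        assert_eq!(out.trim(), "1");
    }
}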

316
harmony-k8s/src/resources.rs Normal file
View File

@@ -0,0 +1,316 @@
use std::collections::HashMap;
use k8s_openapi::api::{
apps::v1::Deployment,
core::v1::{Node, ServiceAccount},
};
use k8s_openapi::apiextensions_apiserver::pkg::apis::apiextensions::v1::CustomResourceDefinition;
use kube::api::ApiResource;
use kube::{
Error, Resource,
api::{Api, DynamicObject, GroupVersionKind, ListParams, ObjectList},
runtime::conditions,
runtime::wait::await_condition,
};
use log::debug;
use serde::de::DeserializeOwned;
use serde_json::Value;
use std::time::Duration;
use crate::client::K8sClient;
use crate::types::ScopeResolver;
impl K8sClient {
pub async fn has_healthy_deployment_with_label(
&self,
namespace: &str,
label_selector: &str,
) -> Result<bool, Error> {
let api: Api<Deployment> = Api::namespaced(self.client.clone(), namespace);
let list = api
.list(&ListParams::default().labels(label_selector))
.await?;
for d in list.items {
let available = d
.status
.as_ref()
.and_then(|s| s.available_replicas)
.unwrap_or(0);
if available > 0 {
return Ok(true);
}
if let Some(conds) = d.status.as_ref().and_then(|s| s.conditions.as_ref()) {
if conds
.iter()
.any(|c| c.type_ == "Available" && c.status == "True")
{
return Ok(true);
}
}
}
Ok(false)
}
pub async fn list_namespaces_with_healthy_deployments(
&self,
label_selector: &str,
) -> Result<Vec<String>, Error> {
let api: Api<Deployment> = Api::all(self.client.clone());
let list = api
.list(&ListParams::default().labels(label_selector))
.await?;
let mut healthy_ns: HashMap<String, bool> = HashMap::new();
for d in list.items {
let ns = match d.metadata.namespace.clone() {
Some(n) => n,
None => continue,
};
let available = d
.status
.as_ref()
.and_then(|s| s.available_replicas)
.unwrap_or(0);
let is_healthy = if available > 0 {
true
} else {
d.status
.as_ref()
.and_then(|s| s.conditions.as_ref())
.map(|c| {
c.iter()
.any(|c| c.type_ == "Available" && c.status == "True")
})
.unwrap_or(false)
};
if is_healthy {
healthy_ns.insert(ns, true);
}
}
Ok(healthy_ns.into_keys().collect())
}
pub async fn get_controller_service_account_name(
&self,
ns: &str,
) -> Result<Option<String>, Error> {
let api: Api<Deployment> = Api::namespaced(self.client.clone(), ns);
let list = api
.list(&ListParams::default().labels("app.kubernetes.io/component=controller"))
.await?;
if let Some(dep) = list.items.first() {
if let Some(sa) = dep
.spec
.as_ref()
.and_then(|s| s.template.spec.as_ref())
.and_then(|s| s.service_account_name.clone())
{
return Ok(Some(sa));
}
}
Ok(None)
}
pub async fn list_clusterrolebindings_json(&self) -> Result<Vec<Value>, Error> {
let gvk = GroupVersionKind::gvk("rbac.authorization.k8s.io", "v1", "ClusterRoleBinding");
let ar = ApiResource::from_gvk(&gvk);
let api: Api<DynamicObject> = Api::all_with(self.client.clone(), &ar);
let list = api.list(&ListParams::default()).await?;
Ok(list
.items
.into_iter()
.map(|o| serde_json::to_value(&o).unwrap_or(Value::Null))
.collect())
}
pub async fn is_service_account_cluster_wide(&self, sa: &str, ns: &str) -> Result<bool, Error> {
let sa_user = format!("system:serviceaccount:{ns}:{sa}");
for crb in self.list_clusterrolebindings_json().await? {
if let Some(subjects) = crb.get("subjects").and_then(|s| s.as_array()) {
for subj in subjects {
let kind = subj.get("kind").and_then(|v| v.as_str()).unwrap_or("");
let name = subj.get("name").and_then(|v| v.as_str()).unwrap_or("");
let subj_ns = subj.get("namespace").and_then(|v| v.as_str()).unwrap_or("");
if (kind == "ServiceAccount" && name == sa && subj_ns == ns)
|| (kind == "User" && name == sa_user)
{
return Ok(true);
}
}
}
}
Ok(false)
}
pub async fn has_crd(&self, name: &str) -> Result<bool, Error> {
let api: Api<CustomResourceDefinition> = Api::all(self.client.clone());
let crds = api
.list(&ListParams::default().fields(&format!("metadata.name={name}")))
.await?;
Ok(!crds.items.is_empty())
}
pub async fn service_account_api(&self, namespace: &str) -> Api<ServiceAccount> {
Api::namespaced(self.client.clone(), namespace)
}
pub async fn get_resource_json_value(
&self,
name: &str,
namespace: Option<&str>,
gvk: &GroupVersionKind,
) -> Result<DynamicObject, Error> {
let ar = ApiResource::from_gvk(gvk);
let api: Api<DynamicObject> = match namespace {
Some(ns) => Api::namespaced_with(self.client.clone(), ns, &ar),
None => Api::default_namespaced_with(self.client.clone(), &ar),
};
api.get(name).await
}
pub async fn get_secret_json_value(
&self,
name: &str,
namespace: Option<&str>,
) -> Result<DynamicObject, Error> {
self.get_resource_json_value(
name,
namespace,
&GroupVersionKind {
group: String::new(),
version: "v1".to_string(),
kind: "Secret".to_string(),
},
)
.await
}
pub async fn get_deployment(
&self,
name: &str,
namespace: Option<&str>,
) -> Result<Option<Deployment>, Error> {
let api: Api<Deployment> = match namespace {
Some(ns) => {
debug!("Getting namespaced deployment '{name}' in '{ns}'");
Api::namespaced(self.client.clone(), ns)
}
None => {
debug!("Getting deployment '{name}' in default namespace");
Api::default_namespaced(self.client.clone())
}
};
api.get_opt(name).await
}
pub async fn scale_deployment(
&self,
name: &str,
namespace: Option<&str>,
replicas: u32,
) -> Result<(), Error> {
let api: Api<Deployment> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
use kube::api::{Patch, PatchParams};
use serde_json::json;
let patch = json!({ "spec": { "replicas": replicas } });
api.patch_scale(name, &PatchParams::default(), &Patch::Merge(&patch))
.await?;
Ok(())
}
pub async fn delete_deployment(
&self,
name: &str,
namespace: Option<&str>,
) -> Result<(), Error> {
let api: Api<Deployment> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
api.delete(name, &kube::api::DeleteParams::default())
.await?;
Ok(())
}
pub async fn wait_until_deployment_ready(
&self,
name: &str,
namespace: Option<&str>,
timeout: Option<Duration>,
) -> Result<(), String> {
let api: Api<Deployment> = match namespace {
Some(ns) => Api::namespaced(self.client.clone(), ns),
None => Api::default_namespaced(self.client.clone()),
};
let timeout = timeout.unwrap_or(Duration::from_secs(120));
let establish = await_condition(api, name, conditions::is_deployment_completed());
        // Surface watch errors instead of silently mapping them to success.
        tokio::time::timeout(timeout, establish)
            .await
            .map_err(|_| "Timed out waiting for deployment".to_string())?
            .map(|_| ())
            .map_err(|e| format!("Watch failed while waiting for deployment: {e}"))
}
/// Gets a single named resource, using the correct API scope for `K`.
pub async fn get_resource<K>(
&self,
name: &str,
namespace: Option<&str>,
) -> Result<Option<K>, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned,
<K as Resource>::Scope: ScopeResolver<K>,
<K as Resource>::DynamicType: Default,
{
let api: Api<K> =
<<K as Resource>::Scope as ScopeResolver<K>>::get_api(&self.client, namespace);
api.get_opt(name).await
}
pub async fn list_resources<K>(
&self,
namespace: Option<&str>,
list_params: Option<ListParams>,
) -> Result<ObjectList<K>, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned,
<K as Resource>::Scope: ScopeResolver<K>,
<K as Resource>::DynamicType: Default,
{
let api: Api<K> =
<<K as Resource>::Scope as ScopeResolver<K>>::get_api(&self.client, namespace);
api.list(&list_params.unwrap_or_default()).await
}
pub async fn list_all_resources_with_labels<K>(&self, labels: &str) -> Result<Vec<K>, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned,
<K as Resource>::DynamicType: Default,
{
Api::<K>::all(self.client.clone())
.list(&ListParams::default().labels(labels))
.await
.map(|l| l.items)
}
pub async fn get_all_resource_in_all_namespace<K>(&self) -> Result<Vec<K>, Error>
where
K: Resource + Clone + std::fmt::Debug + DeserializeOwned,
<K as Resource>::Scope: ScopeResolver<K>,
<K as Resource>::DynamicType: Default,
{
Api::<K>::all(self.client.clone())
.list(&Default::default())
.await
.map(|l| l.items)
}
pub async fn get_nodes(
&self,
list_params: Option<ListParams>,
) -> Result<ObjectList<Node>, Error> {
self.list_resources(None, list_params).await
}
}
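// Hedged sketch: the ScopeResolver bound picks the right Api automatically, so
// a cluster-scoped Node lookup ignores the namespace argument while a
// namespace-scoped ConfigMap lookup uses it. Resource names are hypothetical.
#[cfg(test)]
mod scope_example {
    use super::*;
    use k8s_openapi::api::core::v1::ConfigMap;

    #[tokio::test]
    #[ignore = "requires kubernetes cluster"]
    async fn scoped_lookups() {
        let client = K8sClient::try_default().await.unwrap();
        let _node: Option<Node> = client.get_resource("worker-01", None).await.unwrap();
        let _cm: Option<ConfigMap> = client
            .get_resource("kube-root-ca.crt", Some("default"))
            .await
            .unwrap();
    }
}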

100
harmony-k8s/src/types.rs Normal file
View File

@@ -0,0 +1,100 @@
use std::time::Duration;
use k8s_openapi::{ClusterResourceScope, NamespaceResourceScope};
use kube::{Api, Client, Resource};
use serde::Serialize;
/// Which Kubernetes distribution is running. Detected once at runtime via
/// [`crate::discovery::K8sClient::get_k8s_distribution`].
#[derive(Debug, Clone, PartialEq, Eq, Serialize)]
pub enum KubernetesDistribution {
Default,
OpenshiftFamily,
K3sFamily,
}
/// A file to be written to a node's filesystem.
#[derive(Debug, Clone)]
pub struct NodeFile {
/// Absolute path on the host where the file should be written.
pub path: String,
/// Content of the file.
pub content: String,
/// UNIX permissions (e.g. `0o600`).
pub mode: u32,
}
/// Options controlling the behaviour of a [`crate::K8sClient::drain_node`] operation.
#[derive(Debug, Clone)]
pub struct DrainOptions {
/// Evict pods that use `emptyDir` volumes (ephemeral data is lost).
/// Equivalent to `kubectl drain --delete-emptydir-data`.
pub delete_emptydir_data: bool,
/// Silently skip DaemonSet-managed pods instead of blocking the drain.
/// Equivalent to `kubectl drain --ignore-daemonsets`.
pub ignore_daemonsets: bool,
/// Maximum wall-clock time to wait for all evictions to complete.
pub timeout: Duration,
}
impl Default for DrainOptions {
fn default() -> Self {
Self {
delete_emptydir_data: false,
ignore_daemonsets: true,
timeout: Duration::from_secs(1),
}
}
}
impl DrainOptions {
pub fn default_ignore_daemonset_delete_emptydir_data() -> Self {
Self {
delete_emptydir_data: true,
ignore_daemonsets: true,
..Self::default()
}
}
}
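// Hedged sketch: a typical drain configuration widens the very short default
// timeout and opts in to evicting emptyDir pods; the values are illustrative.
#[cfg(test)]
mod drain_options_example {
    use super::*;

    #[test]
    fn custom_drain_options() {
        let opts = DrainOptions {
            timeout: Duration::from_secs(300),
            ..DrainOptions::default_ignore_daemonset_delete_emptydir_data()
        };
        assert!(opts.delete_emptydir_data);
        assert!(opts.ignore_daemonsets);
    }
}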
/// Controls how [`crate::K8sClient::apply_with_strategy`] behaves when the
/// resource already exists (or does not).
pub enum WriteMode {
/// Server-side apply; create if absent, update if present (default).
CreateOrUpdate,
/// POST only; return an error if the resource already exists.
Create,
/// Server-side apply only; return an error if the resource does not exist.
Update,
}
// ── Scope resolution trait ───────────────────────────────────────────────────
/// Resolves the correct [`kube::Api`] for a resource type based on its scope
/// (cluster-wide vs. namespace-scoped).
pub trait ScopeResolver<K: Resource> {
fn get_api(client: &Client, ns: Option<&str>) -> Api<K>;
}
impl<K> ScopeResolver<K> for ClusterResourceScope
where
K: Resource<Scope = ClusterResourceScope>,
<K as Resource>::DynamicType: Default,
{
fn get_api(client: &Client, _ns: Option<&str>) -> Api<K> {
Api::all(client.clone())
}
}
impl<K> ScopeResolver<K> for NamespaceResourceScope
where
K: Resource<Scope = NamespaceResourceScope>,
<K as Resource>::DynamicType: Default,
{
fn get_api(client: &Client, ns: Option<&str>) -> Api<K> {
match ns {
Some(ns) => Api::namespaced(client.clone(), ns),
None => Api::default_namespaced(client.clone()),
}
}
}
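
As a sketch, these option types line up with the equivalent kubectl flags as follows; the apply_with_strategy call shape is an assumption based on its doc comment above, not a signature visible in this diff:

use std::time::Duration;

// kubectl drain --ignore-daemonsets --delete-emptydir-data --timeout=600s
let drain = DrainOptions {
    delete_emptydir_data: true,
    ignore_daemonsets: true,
    timeout: Duration::from_secs(600),
};

// POST only, fail if the resource already exists (hypothetical call shape):
// client.apply_with_strategy(&config_map, Some("default"), WriteMode::Create).await?;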

View File

@@ -21,6 +21,8 @@ semver = "1.0.23"
serde.workspace = true
serde_json.workspace = true
tokio.workspace = true
tokio-retry.workspace = true
tokio-util.workspace = true
derive-new.workspace = true
log.workspace = true
env_logger.workspace = true
@@ -31,6 +33,7 @@ opnsense-config-xml = { path = "../opnsense-config-xml" }
harmony_macros = { path = "../harmony_macros" }
harmony_types = { path = "../harmony_types" }
harmony_execution = { path = "../harmony_execution" }
harmony-k8s = { path = "../harmony-k8s" }
uuid.workspace = true
url.workspace = true
kube = { workspace = true, features = ["derive"] }
@@ -60,7 +63,6 @@ temp-dir = "0.1.14"
dyn-clone = "1.0.19"
similar.workspace = true
futures-util = "0.3.31"
tokio-util = "0.7.15"
strum = { version = "0.27.1", features = ["derive"] }
tempfile.workspace = true
serde_with = "3.14.0"
@@ -80,7 +82,7 @@ sqlx.workspace = true
inquire.workspace = true
brocade = { path = "../brocade" }
option-ext = "0.2.0"
tokio-retry = "0.3.0"
rand.workspace = true
[dev-dependencies]
pretty_assertions.workspace = true

View File

@@ -108,11 +108,18 @@ impl PhysicalHost {
};
let storage_summary = if drive_count > 1 {
let drive_sizes = self
.storage
.iter()
.map(|d| format_storage(d.size_bytes))
.collect::<Vec<_>>()
.join(", ");
format!(
"{} Storage ({}x {})",
"{} Storage ({} Disks [{}])",
format_storage(total_storage_bytes),
drive_count,
first_drive_model
drive_sizes
)
} else {
format!(

View File

@@ -4,8 +4,6 @@ use std::error::Error;
use async_trait::async_trait;
use derive_new::new;
use crate::inventory::HostRole;
use super::{
data::Version, executors::ExecutorError, inventory::Inventory, topology::PreparationError,
};

View File

@@ -1,5 +1,5 @@
use async_trait::async_trait;
use brocade::PortOperatingMode;
use harmony_k8s::K8sClient;
use harmony_macros::ip;
use harmony_types::{
id::Id,
@@ -9,17 +9,20 @@ use harmony_types::{
use log::debug;
use log::info;
use crate::topology::{HelmCommand, PxeOptions};
use crate::{data::FileContent, executors::ExecutorError, topology::node_exporter::NodeExporter};
use crate::{infra::network_manager::OpenShiftNmStateNetworkManager, topology::PortConfig};
use crate::{modules::inventory::HarmonyDiscoveryStrategy, topology::PxeOptions};
use super::{
DHCPStaticEntry, DhcpServer, DnsRecord, DnsRecordType, DnsServer, Firewall, HostNetworkConfig,
HttpServer, IpAddress, K8sclient, LoadBalancer, LoadBalancerService, LogicalHost, NetworkError,
NetworkManager, PreparationError, PreparationOutcome, Router, Switch, SwitchClient,
SwitchError, TftpServer, Topology, k8s::K8sClient,
SwitchError, TftpServer, Topology,
};
use std::{
process::Command,
sync::{Arc, OnceLock},
};
use std::sync::{Arc, OnceLock};
#[derive(Debug, Clone)]
pub struct HAClusterTopology {
@@ -53,6 +56,30 @@ impl Topology for HAClusterTopology {
}
}
impl HelmCommand for HAClusterTopology {
fn get_helm_command(&self) -> Command {
let mut cmd = Command::new("helm");
if let Some(k) = &self.kubeconfig {
cmd.args(["--kubeconfig", k]);
}
// FIXME we should support context anywhere there is a k8sclient
// This likely belongs in the k8sclient itself and should be extracted to a separate
// crate
//
// I feel like helm could very well be a feature of this external k8s client.
//
// Same for kustomize
//
// if let Some(c) = &self.k8s_context {
// cmd.args(["--kube-context", c]);
// }
info!("Using helm command {cmd:?}");
cmd
}
}
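// Example (sketch): callers can extend the returned Command with any helm
// subcommand before spawning it, e.g.
//
//     let mut cmd = topology.get_helm_command();
//     cmd.args(["list", "--all-namespaces"]);
//     let output = cmd.output().expect("failed to run helm");
//
// so the --kubeconfig flag set above applies uniformly to every helm invocation.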
#[async_trait]
impl K8sclient for HAClusterTopology {
async fn k8s_client(&self) -> Result<Arc<K8sClient>, String> {
@@ -301,10 +328,10 @@ impl Switch for HAClusterTopology {
Ok(())
}
async fn clear_port_channel(&self, ids: &Vec<Id>) -> Result<(), SwitchError> {
async fn clear_port_channel(&self, _ids: &Vec<Id>) -> Result<(), SwitchError> {
todo!()
}
async fn configure_interface(&self, ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
async fn configure_interface(&self, _ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
todo!()
}
}
@@ -322,7 +349,15 @@ impl NetworkManager for HAClusterTopology {
self.network_manager().await.configure_bond(config).await
}
//TODO add snmp here
async fn configure_bond_on_primary_interface(
&self,
config: &HostNetworkConfig,
) -> Result<(), NetworkError> {
self.network_manager()
.await
.configure_bond_on_primary_interface(config)
.await
}
}
#[async_trait]
@@ -562,10 +597,10 @@ impl SwitchClient for DummyInfra {
) -> Result<u8, SwitchError> {
unimplemented!("{}", UNIMPLEMENTED_DUMMY_INFRA)
}
async fn clear_port_channel(&self, ids: &Vec<Id>) -> Result<(), SwitchError> {
async fn clear_port_channel(&self, _ids: &Vec<Id>) -> Result<(), SwitchError> {
todo!()
}
async fn configure_interface(&self, ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
async fn configure_interface(&self, _ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
todo!()
}
}

File diff suppressed because it is too large

View File

@@ -1,19 +1,14 @@
use std::{collections::BTreeMap, process::Command, sync::Arc, time::Duration};
use std::{collections::BTreeMap, process::Command, sync::Arc};
use async_trait::async_trait;
use base64::{Engine, engine::general_purpose};
use harmony_k8s::{K8sClient, KubernetesDistribution};
use harmony_types::rfc1123::Rfc1123Name;
use k8s_openapi::{
ByteString,
api::{
core::v1::{Pod, Secret},
rbac::v1::{ClusterRoleBinding, RoleRef, Subject},
},
};
use kube::{
api::{DynamicObject, GroupVersionKind, ObjectMeta},
runtime::conditions,
use k8s_openapi::api::{
core::v1::{Pod, Secret},
rbac::v1::{ClusterRoleBinding, RoleRef, Subject},
};
use kube::api::{GroupVersionKind, ObjectMeta};
use log::{debug, info, trace, warn};
use serde::Serialize;
use tokio::sync::OnceCell;
@@ -34,32 +29,7 @@ use crate::{
score_cert_management::CertificateManagementScore,
},
k3d::K3DInstallationScore,
k8s::{
ingress::{K8sIngressScore, PathType},
resource::K8sResourceScore,
},
monitoring::{
grafana::{grafana::Grafana, helm::helm_grafana::grafana_helm_chart_score},
kube_prometheus::crd::{
crd_alertmanager_config::CRDPrometheus,
crd_grafana::{
Grafana as GrafanaCRD, GrafanaCom, GrafanaDashboard,
GrafanaDashboardDatasource, GrafanaDashboardSpec, GrafanaDatasource,
GrafanaDatasourceConfig, GrafanaDatasourceJsonData,
GrafanaDatasourceSecureJsonData, GrafanaDatasourceSpec, GrafanaSpec,
},
crd_prometheuses::LabelSelector,
prometheus_operator::prometheus_operator_helm_chart_score,
rhob_alertmanager_config::RHOBObservability,
service_monitor::ServiceMonitor,
},
},
nats::capability::NatsCluster,
okd::{crd::ingresses_config::Ingress as IngressResource, route::OKDTlsPassthroughScore},
prometheus::{
k8s_prometheus_alerting_score::K8sPrometheusCRDAlertingScore,
prometheus::PrometheusMonitoring, rhob_alerting_score::RHOBAlertingScore,
},
},
score::Score,
topology::{TlsRoute, TlsRouter, ingress::Ingress},
@@ -68,8 +38,6 @@ use crate::{
use super::super::{
DeploymentTarget, HelmCommand, K8sclient, MultiTargetTopology, PreparationError,
PreparationOutcome, Topology,
k8s::K8sClient,
oberservability::monitoring::AlertReceiver,
tenant::{
TenantConfig, TenantManager,
k8s::K8sTenantManager,
@@ -86,13 +54,6 @@ struct K8sState {
message: String,
}
#[derive(Debug, Clone, Serialize)]
pub enum KubernetesDistribution {
OpenshiftFamily,
K3sFamily,
Default,
}
#[derive(Debug, Clone)]
enum K8sSource {
LocalK3d,
@@ -103,7 +64,6 @@ enum K8sSource {
pub struct K8sAnywhereTopology {
k8s_state: Arc<OnceCell<Option<K8sState>>>,
tenant_manager: Arc<OnceCell<K8sTenantManager>>,
k8s_distribution: Arc<OnceCell<KubernetesDistribution>>,
config: Arc<K8sAnywhereConfig>,
}
@@ -184,216 +144,6 @@ impl TlsRouter for K8sAnywhereTopology {
}
}
#[async_trait]
impl Grafana for K8sAnywhereTopology {
async fn ensure_grafana_operator(
&self,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
debug!("ensure grafana operator");
let client = self.k8s_client().await.unwrap();
let grafana_gvk = GroupVersionKind {
group: "grafana.integreatly.org".to_string(),
version: "v1beta1".to_string(),
kind: "Grafana".to_string(),
};
let name = "grafanas.grafana.integreatly.org";
let ns = "grafana";
let grafana_crd = client
.get_resource_json_value(name, Some(ns), &grafana_gvk)
.await;
match grafana_crd {
Ok(_) => {
return Ok(PreparationOutcome::Success {
details: "Found grafana CRDs in cluster".to_string(),
});
}
Err(_) => {
return self
.install_grafana_operator(inventory, Some("grafana"))
.await;
}
};
}
async fn install_grafana(&self) -> Result<PreparationOutcome, PreparationError> {
let ns = "grafana";
let mut label = BTreeMap::new();
label.insert("dashboards".to_string(), "grafana".to_string());
let label_selector = LabelSelector {
match_labels: label.clone(),
match_expressions: vec![],
};
let client = self.k8s_client().await?;
let grafana = self.build_grafana(ns, &label);
client.apply(&grafana, Some(ns)).await?;
//TODO change this to an ensure-ready check or something better than just a timeout
client
.wait_until_deployment_ready(
"grafana-grafana-deployment",
Some("grafana"),
Some(Duration::from_secs(30)),
)
.await?;
let sa_name = "grafana-grafana-sa";
let token_secret_name = "grafana-sa-token-secret";
let sa_token_secret = self.build_sa_token_secret(token_secret_name, sa_name, ns);
client.apply(&sa_token_secret, Some(ns)).await?;
let secret_gvk = GroupVersionKind {
group: "".to_string(),
version: "v1".to_string(),
kind: "Secret".to_string(),
};
let secret = client
.get_resource_json_value(token_secret_name, Some(ns), &secret_gvk)
.await?;
let token = format!(
"Bearer {}",
self.extract_and_normalize_token(&secret).unwrap()
);
debug!("creating grafana clusterrole binding");
let clusterrolebinding =
self.build_cluster_rolebinding(sa_name, "cluster-monitoring-view", ns);
client.apply(&clusterrolebinding, Some(ns)).await?;
debug!("creating grafana datasource crd");
let thanos_url = format!(
"https://{}",
self.get_domain("thanos-querier-openshift-monitoring")
.await
.unwrap()
);
let thanos_openshift_datasource = self.build_grafana_datasource(
"thanos-openshift-monitoring",
ns,
&label_selector,
&thanos_url,
&token,
);
client.apply(&thanos_openshift_datasource, Some(ns)).await?;
debug!("creating grafana dashboard crd");
let dashboard = self.build_grafana_dashboard(ns, &label_selector);
client.apply(&dashboard, Some(ns)).await?;
debug!("creating grafana ingress");
let grafana_ingress = self.build_grafana_ingress(ns).await;
grafana_ingress
.interpret(&Inventory::empty(), self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
Ok(PreparationOutcome::Success {
details: "Installed grafana composants".to_string(),
})
}
}
#[async_trait]
impl PrometheusMonitoring<CRDPrometheus> for K8sAnywhereTopology {
async fn install_prometheus(
&self,
sender: &CRDPrometheus,
_inventory: &Inventory,
_receivers: Option<Vec<Box<dyn AlertReceiver<CRDPrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let client = self.k8s_client().await?;
for monitor in sender.service_monitor.iter() {
client
.apply(monitor, Some(&sender.namespace))
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
}
Ok(PreparationOutcome::Success {
details: "successfuly installed prometheus components".to_string(),
})
}
async fn ensure_prometheus_operator(
&self,
sender: &CRDPrometheus,
_inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let po_result = self.ensure_prometheus_operator(sender).await?;
match po_result {
PreparationOutcome::Success { details: _ } => {
debug!("Detected prometheus crds operator present in cluster.");
return Ok(po_result);
}
PreparationOutcome::Noop => {
debug!("Skipping Prometheus CR installation due to missing operator.");
return Ok(po_result);
}
}
}
}
#[async_trait]
impl PrometheusMonitoring<RHOBObservability> for K8sAnywhereTopology {
async fn install_prometheus(
&self,
sender: &RHOBObservability,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<RHOBObservability>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let po_result = self.ensure_cluster_observability_operator(sender).await?;
if po_result == PreparationOutcome::Noop {
debug!("Skipping Prometheus CR installation due to missing operator.");
return Ok(po_result);
}
let result = self
.get_cluster_observability_operator_prometheus_application_score(
sender.clone(),
receivers,
)
.await
.interpret(inventory, self)
.await;
match result {
Ok(outcome) => match outcome.status {
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
details: outcome.message,
}),
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
_ => Err(PreparationError::new(outcome.message)),
},
Err(err) => Err(PreparationError::new(err.to_string())),
}
}
async fn ensure_prometheus_operator(
&self,
sender: &RHOBObservability,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
}
impl Serialize for K8sAnywhereTopology {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
@@ -554,7 +304,6 @@ impl K8sAnywhereTopology {
Self {
k8s_state: Arc::new(OnceCell::new()),
tenant_manager: Arc::new(OnceCell::new()),
k8s_distribution: Arc::new(OnceCell::new()),
config: Arc::new(K8sAnywhereConfig::from_env()),
}
}
@@ -563,7 +312,6 @@ impl K8sAnywhereTopology {
Self {
k8s_state: Arc::new(OnceCell::new()),
tenant_manager: Arc::new(OnceCell::new()),
k8s_distribution: Arc::new(OnceCell::new()),
config: Arc::new(config),
}
}
@@ -600,56 +348,14 @@ impl K8sAnywhereTopology {
}
}
pub async fn get_k8s_distribution(&self) -> Result<&KubernetesDistribution, PreparationError> {
self.k8s_distribution
.get_or_try_init(async || {
debug!("Trying to detect k8s distribution");
let client = self.k8s_client().await.unwrap();
let discovery = client.discovery().await.map_err(|e| {
PreparationError::new(format!("Could not discover API groups: {}", e))
})?;
let version = client.get_apiserver_version().await.map_err(|e| {
PreparationError::new(format!("Could not get server version: {}", e))
})?;
// OpenShift / OKD
if discovery
.groups()
.any(|g| g.name() == "project.openshift.io")
{
info!("Found KubernetesDistribution OpenshiftFamily");
return Ok(KubernetesDistribution::OpenshiftFamily);
}
// K3d / K3s
if version.git_version.contains("k3s") {
info!("Found KubernetesDistribution K3sFamily");
return Ok(KubernetesDistribution::K3sFamily);
}
info!("Could not identify KubernetesDistribution, using Default");
return Ok(KubernetesDistribution::Default);
})
pub async fn get_k8s_distribution(&self) -> Result<KubernetesDistribution, PreparationError> {
self.k8s_client()
.await?
.get_k8s_distribution()
.await
}
fn extract_and_normalize_token(&self, secret: &DynamicObject) -> Option<String> {
let token_b64 = secret
.data
.get("token")
.or_else(|| secret.data.get("data").and_then(|d| d.get("token")))
.and_then(|v| v.as_str())?;
let bytes = general_purpose::STANDARD.decode(token_b64).ok()?;
let s = String::from_utf8(bytes).ok()?;
let cleaned = s
.trim_matches(|c: char| c.is_whitespace() || c == '\0')
.to_string();
Some(cleaned)
.map_err(|e| {
PreparationError::new(format!("Failed to get k8s distribution from client : {e}"))
})
}
pub fn build_cluster_rolebinding(
@@ -701,141 +407,6 @@ impl K8sAnywhereTopology {
}
}
fn build_grafana_datasource(
&self,
name: &str,
ns: &str,
label_selector: &LabelSelector,
url: &str,
token: &str,
) -> GrafanaDatasource {
let mut json_data = BTreeMap::new();
json_data.insert("timeInterval".to_string(), "5s".to_string());
GrafanaDatasource {
metadata: ObjectMeta {
name: Some(name.to_string()),
namespace: Some(ns.to_string()),
..Default::default()
},
spec: GrafanaDatasourceSpec {
instance_selector: label_selector.clone(),
allow_cross_namespace_import: Some(true),
values_from: None,
datasource: GrafanaDatasourceConfig {
access: "proxy".to_string(),
name: name.to_string(),
r#type: "prometheus".to_string(),
url: url.to_string(),
database: None,
json_data: Some(GrafanaDatasourceJsonData {
time_interval: Some("60s".to_string()),
http_header_name1: Some("Authorization".to_string()),
tls_skip_verify: Some(true),
oauth_pass_thru: Some(true),
}),
secure_json_data: Some(GrafanaDatasourceSecureJsonData {
http_header_value1: Some(format!("Bearer {token}")),
}),
is_default: Some(false),
editable: Some(true),
},
},
}
}
fn build_grafana_dashboard(
&self,
ns: &str,
label_selector: &LabelSelector,
) -> GrafanaDashboard {
let graf_dashboard = GrafanaDashboard {
metadata: ObjectMeta {
name: Some(format!("grafana-dashboard-{}", ns)),
namespace: Some(ns.to_string()),
..Default::default()
},
spec: GrafanaDashboardSpec {
resync_period: Some("30s".to_string()),
instance_selector: label_selector.clone(),
datasources: Some(vec![GrafanaDashboardDatasource {
input_name: "DS_PROMETHEUS".to_string(),
datasource_name: "thanos-openshift-monitoring".to_string(),
}]),
json: None,
grafana_com: Some(GrafanaCom {
id: 17406,
revision: None,
}),
},
};
graf_dashboard
}
fn build_grafana(&self, ns: &str, labels: &BTreeMap<String, String>) -> GrafanaCRD {
let grafana = GrafanaCRD {
metadata: ObjectMeta {
name: Some(format!("grafana-{}", ns)),
namespace: Some(ns.to_string()),
labels: Some(labels.clone()),
..Default::default()
},
spec: GrafanaSpec {
config: None,
admin_user: None,
admin_password: None,
ingress: None,
persistence: None,
resources: None,
},
};
grafana
}
async fn build_grafana_ingress(&self, ns: &str) -> K8sIngressScore {
let domain = self.get_domain(&format!("grafana-{}", ns)).await.unwrap();
let name = format!("{}-grafana", ns);
let backend_service = format!("grafana-{}-service", ns);
K8sIngressScore {
name: fqdn::fqdn!(&name),
host: fqdn::fqdn!(&domain),
backend_service: fqdn::fqdn!(&backend_service),
port: 3000,
path: Some("/".to_string()),
path_type: Some(PathType::Prefix),
namespace: Some(fqdn::fqdn!(&ns)),
ingress_class_name: Some("openshift-default".to_string()),
}
}
async fn get_cluster_observability_operator_prometheus_application_score(
&self,
sender: RHOBObservability,
receivers: Option<Vec<Box<dyn AlertReceiver<RHOBObservability>>>>,
) -> RHOBAlertingScore {
RHOBAlertingScore {
sender,
receivers: receivers.unwrap_or_default(),
service_monitors: vec![],
prometheus_rules: vec![],
}
}
async fn get_k8s_prometheus_application_score(
&self,
sender: CRDPrometheus,
receivers: Option<Vec<Box<dyn AlertReceiver<CRDPrometheus>>>>,
service_monitors: Option<Vec<ServiceMonitor>>,
) -> K8sPrometheusCRDAlertingScore {
return K8sPrometheusCRDAlertingScore {
sender,
receivers: receivers.unwrap_or_default(),
service_monitors: service_monitors.unwrap_or_default(),
prometheus_rules: vec![],
};
}
async fn openshift_ingress_operator_available(&self) -> Result<(), PreparationError> {
let client = self.k8s_client().await?;
let gvk = GroupVersionKind {
@@ -1001,137 +572,6 @@ impl K8sAnywhereTopology {
)),
}
}
async fn ensure_cluster_observability_operator(
&self,
sender: &RHOBObservability,
) -> Result<PreparationOutcome, PreparationError> {
let status = Command::new("sh")
.args(["-c", "kubectl get crd -A | grep -i rhobs"])
.status()
.map_err(|e| PreparationError::new(format!("could not connect to cluster: {}", e)))?;
if !status.success() {
if let Some(Some(k8s_state)) = self.k8s_state.get() {
match k8s_state.source {
K8sSource::LocalK3d => {
warn!(
"Installing observability operator is not supported on LocalK3d source"
);
return Ok(PreparationOutcome::Noop);
debug!("installing cluster observability operator");
todo!();
let op_score =
prometheus_operator_helm_chart_score(sender.namespace.clone());
let result = op_score.interpret(&Inventory::empty(), self).await;
return match result {
Ok(outcome) => match outcome.status {
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
details: "installed cluster observability operator".into(),
}),
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
_ => Err(PreparationError::new(
"failed to install cluster observability operator (unknown error)".into(),
)),
},
Err(err) => Err(PreparationError::new(err.to_string())),
};
}
K8sSource::Kubeconfig => {
debug!(
"unable to install cluster observability operator, contact cluster admin"
);
return Ok(PreparationOutcome::Noop);
}
}
} else {
warn!(
"Unable to detect k8s_state. Skipping Cluster Observability Operator install."
);
return Ok(PreparationOutcome::Noop);
}
}
debug!("Cluster Observability Operator is already present, skipping install");
Ok(PreparationOutcome::Success {
details: "cluster observability operator present in cluster".into(),
})
}
async fn ensure_prometheus_operator(
&self,
sender: &CRDPrometheus,
) -> Result<PreparationOutcome, PreparationError> {
let status = Command::new("sh")
.args(["-c", "kubectl get crd -A | grep -i prometheuses"])
.status()
.map_err(|e| PreparationError::new(format!("could not connect to cluster: {}", e)))?;
if !status.success() {
if let Some(Some(k8s_state)) = self.k8s_state.get() {
match k8s_state.source {
K8sSource::LocalK3d => {
debug!("installing prometheus operator");
let op_score =
prometheus_operator_helm_chart_score(sender.namespace.clone());
let result = op_score.interpret(&Inventory::empty(), self).await;
return match result {
Ok(outcome) => match outcome.status {
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
details: "installed prometheus operator".into(),
}),
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
_ => Err(PreparationError::new(
"failed to install prometheus operator (unknown error)".into(),
)),
},
Err(err) => Err(PreparationError::new(err.to_string())),
};
}
K8sSource::Kubeconfig => {
debug!("unable to install prometheus operator, contact cluster admin");
return Ok(PreparationOutcome::Noop);
}
}
} else {
warn!("Unable to detect k8s_state. Skipping Prometheus Operator install.");
return Ok(PreparationOutcome::Noop);
}
}
debug!("Prometheus operator is already present, skipping install");
Ok(PreparationOutcome::Success {
details: "prometheus operator present in cluster".into(),
})
}
async fn install_grafana_operator(
&self,
inventory: &Inventory,
ns: Option<&str>,
) -> Result<PreparationOutcome, PreparationError> {
let namespace = ns.unwrap_or("grafana");
info!("installing grafana operator in ns {namespace}");
let tenant = self.get_k8s_tenant_manager()?.get_tenant_config().await;
let mut namespace_scope = false;
if tenant.is_some() {
namespace_scope = true;
}
let _grafana_operator_score = grafana_helm_chart_score(namespace, namespace_scope)
.interpret(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()));
Ok(PreparationOutcome::Success {
details: format!(
"Successfully installed grafana operator in ns {}",
ns.unwrap()
),
})
}
}
#[derive(Clone, Debug)]

View File

@@ -1,4 +1,5 @@
mod k8s_anywhere;
pub mod nats;
pub mod observability;
mod postgres;
pub use k8s_anywhere::*;

View File

@@ -0,0 +1,147 @@
use async_trait::async_trait;
use crate::{
inventory::Inventory,
modules::monitoring::grafana::{
grafana::Grafana,
k8s::{
score_ensure_grafana_ready::GrafanaK8sEnsureReadyScore,
score_grafana_alert_receiver::GrafanaK8sReceiverScore,
score_grafana_datasource::GrafanaK8sDatasourceScore,
score_grafana_rule::GrafanaK8sRuleScore, score_install_grafana::GrafanaK8sInstallScore,
},
},
score::Score,
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<Grafana> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
sender: &Grafana,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = GrafanaK8sInstallScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Grafana not installed {}", e)))?;
Ok(PreparationOutcome::Success {
details: "Successfully installed grafana alert sender".to_string(),
})
}
async fn install_receivers(
&self,
sender: &Grafana,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<Grafana>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let receivers = match receivers {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for receiver in receivers {
let score = GrafanaK8sReceiverScore {
receiver,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install receiver: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert receivers installed successfully".to_string(),
})
}
async fn install_rules(
&self,
sender: &Grafana,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<Grafana>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let rules = match rules {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for rule in rules {
let score = GrafanaK8sRuleScore {
sender: sender.clone(),
rule,
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert rules installed successfully".to_string(),
})
}
async fn add_scrape_targets(
&self,
sender: &Grafana,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<Grafana>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let scrape_targets = match scrape_targets {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for scrape_target in scrape_targets {
let score = GrafanaK8sDatasourceScore {
scrape_target,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to add DataSource: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All datasources installed successfully".to_string(),
})
}
async fn ensure_monitoring_installed(
&self,
sender: &Grafana,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = GrafanaK8sEnsureReadyScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Grafana not ready {}", e)))?;
Ok(PreparationOutcome::Success {
details: "Grafana Ready".to_string(),
})
}
}

View File

@@ -0,0 +1,142 @@
use async_trait::async_trait;
use crate::{
inventory::Inventory,
modules::monitoring::kube_prometheus::{
KubePrometheus, helm::kube_prometheus_helm_chart::kube_prometheus_helm_chart_score,
score_kube_prometheus_alert_receivers::KubePrometheusReceiverScore,
score_kube_prometheus_ensure_ready::KubePrometheusEnsureReadyScore,
score_kube_prometheus_rule::KubePrometheusRuleScore,
score_kube_prometheus_scrape_target::KubePrometheusScrapeTargetScore,
},
score::Score,
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<KubePrometheus> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
kube_prometheus_helm_chart_score(sender.config.clone())
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
Ok(PreparationOutcome::Success {
details: "Successfully installed kubeprometheus alert sender".to_string(),
})
}
async fn install_receivers(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<KubePrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let receivers = match receivers {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for receiver in receivers {
let score = KubePrometheusReceiverScore {
receiver,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install receiver: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert receivers installed successfully".to_string(),
})
}
async fn install_rules(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<KubePrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let rules = match rules {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for rule in rules {
let score = KubePrometheusRuleScore {
sender: sender.clone(),
rule,
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert rules installed successfully".to_string(),
})
}
async fn add_scrape_targets(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<KubePrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let scrape_targets = match scrape_targets {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for scrape_target in scrape_targets {
let score = KubePrometheusScrapeTargetScore {
scrape_target,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All scrap targets installed successfully".to_string(),
})
}
async fn ensure_monitoring_installed(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = KubePrometheusEnsureReadyScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("KubePrometheus not ready {}", e)))?;
Ok(PreparationOutcome::Success {
details: "KubePrometheus Ready".to_string(),
})
}
}

View File

@@ -0,0 +1,5 @@
pub mod grafana;
pub mod kube_prometheus;
pub mod openshift_monitoring;
pub mod prometheus;
pub mod redhat_cluster_observability;

View File

@@ -0,0 +1,142 @@
use async_trait::async_trait;
use log::info;
use crate::score::Score;
use crate::{
inventory::Inventory,
modules::monitoring::okd::{
OpenshiftClusterAlertSender,
score_enable_cluster_monitoring::OpenshiftEnableClusterMonitoringScore,
score_openshift_alert_rule::OpenshiftAlertRuleScore,
score_openshift_receiver::OpenshiftReceiverScore,
score_openshift_scrape_target::OpenshiftScrapeTargetScore,
score_user_workload::OpenshiftUserWorkloadMonitoring,
score_verify_user_workload_monitoring::VerifyUserWorkload,
},
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<OpenshiftClusterAlertSender> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
info!("enabling cluster monitoring");
let cluster_monitoring_score = OpenshiftEnableClusterMonitoringScore {};
cluster_monitoring_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
info!("enabling user workload monitoring");
let user_workload_score = OpenshiftUserWorkloadMonitoring {};
user_workload_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
Ok(PreparationOutcome::Success {
details: "Successfully configured cluster monitoring".to_string(),
})
}
async fn install_receivers(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>>,
) -> Result<PreparationOutcome, PreparationError> {
if let Some(receivers) = receivers {
for receiver in receivers {
info!("Installing receiver {}", receiver.name());
let receiver_score = OpenshiftReceiverScore { receiver };
receiver_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
}
Ok(PreparationOutcome::Success {
details: "Successfully installed receivers for OpenshiftClusterMonitoring"
.to_string(),
})
} else {
Ok(PreparationOutcome::Noop)
}
}
async fn install_rules(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<OpenshiftClusterAlertSender>>>>,
) -> Result<PreparationOutcome, PreparationError> {
if let Some(rules) = rules {
for rule in rules {
info!("Installing rule ");
let rule_score = OpenshiftAlertRuleScore { rule: rule };
rule_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
}
Ok(PreparationOutcome::Success {
details: "Successfully installed rules for OpenshiftClusterMonitoring".to_string(),
})
} else {
Ok(PreparationOutcome::Noop)
}
}
async fn add_scrape_targets(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>>>>,
) -> Result<PreparationOutcome, PreparationError> {
if let Some(scrape_targets) = scrape_targets {
for scrape_target in scrape_targets {
info!("Installing scrape target");
let scrape_target_score = OpenshiftScrapeTargetScore { scrape_target };
scrape_target_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
}
Ok(PreparationOutcome::Success {
details: "Successfully added scrape targets for OpenshiftClusterMonitoring"
.to_string(),
})
} else {
Ok(PreparationOutcome::Noop)
}
}
async fn ensure_monitoring_installed(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let verify_monitoring_score = VerifyUserWorkload {};
info!("Verifying user workload and cluster monitoring installed");
verify_monitoring_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
Ok(PreparationOutcome::Success {
details: "OpenshiftClusterMonitoring ready".to_string(),
})
}
}

View File

@@ -0,0 +1,147 @@
use async_trait::async_trait;
use crate::{
inventory::Inventory,
modules::monitoring::prometheus::{
Prometheus, score_prometheus_alert_receivers::PrometheusReceiverScore,
score_prometheus_ensure_ready::PrometheusEnsureReadyScore,
score_prometheus_install::PrometheusInstallScore,
score_prometheus_rule::PrometheusRuleScore,
score_prometheus_scrape_target::PrometheusScrapeTargetScore,
},
score::Score,
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<Prometheus> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
sender: &Prometheus,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = PrometheusInstallScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Prometheus not installed {}", e)))?;
Ok(PreparationOutcome::Success {
details: "Successfully installed kubeprometheus alert sender".to_string(),
})
}
async fn install_receivers(
&self,
sender: &Prometheus,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<Prometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let receivers = match receivers {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for receiver in receivers {
let score = PrometheusReceiverScore {
receiver,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install receiver: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert receivers installed successfully".to_string(),
})
}
async fn install_rules(
&self,
sender: &Prometheus,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<Prometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let rules = match rules {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for rule in rules {
let score = PrometheusRuleScore {
sender: sender.clone(),
rule,
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert rules installed successfully".to_string(),
})
}
async fn add_scrape_targets(
&self,
sender: &Prometheus,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<Prometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let scrape_targets = match scrape_targets {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for scrape_target in scrape_targets {
let score = PrometheusScrapeTargetScore {
scrape_target,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All scrap targets installed successfully".to_string(),
})
}
async fn ensure_monitoring_installed(
&self,
sender: &Prometheus,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = PrometheusEnsureReadyScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Prometheus not ready {}", e)))?;
Ok(PreparationOutcome::Success {
details: "Prometheus Ready".to_string(),
})
}
}

View File

@@ -0,0 +1,116 @@
use crate::{
modules::monitoring::red_hat_cluster_observability::{
score_alert_receiver::RedHatClusterObservabilityReceiverScore,
score_coo_monitoring_stack::RedHatClusterObservabilityMonitoringStackScore,
},
score::Score,
};
use async_trait::async_trait;
use log::info;
use crate::{
inventory::Inventory,
modules::monitoring::red_hat_cluster_observability::{
RedHatClusterObservability,
score_redhat_cluster_observability_operator::RedHatClusterObservabilityOperatorScore,
},
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<RedHatClusterObservability> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
sender: &RedHatClusterObservability,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
info!("Verifying Redhat Cluster Observability Operator");
let coo_score = RedHatClusterObservabilityOperatorScore::default();
coo_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
info!(
"Installing Cluster Observability Operator Monitoring Stack in ns {}",
sender.namespace.clone()
);
let coo_monitoring_stack_score = RedHatClusterObservabilityMonitoringStackScore {
namespace: sender.namespace.clone(),
resource_selector: sender.resource_selector.clone(),
};
coo_monitoring_stack_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
Ok(PreparationOutcome::Success {
details: "Successfully installed RedHatClusterObservability Operator".to_string(),
})
}
async fn install_receivers(
&self,
sender: &RedHatClusterObservability,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<RedHatClusterObservability>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let receivers = match receivers {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for receiver in receivers {
info!("Installing receiver {}", receiver.name());
let receiver_score = RedHatClusterObservabilityReceiverScore {
receiver,
sender: sender.clone(),
};
receiver_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
}
Ok(PreparationOutcome::Success {
details: "Successfully installed receivers for OpenshiftClusterMonitoring".to_string(),
})
}
async fn install_rules(
&self,
_sender: &RedHatClusterObservability,
_inventory: &Inventory,
_rules: Option<Vec<Box<dyn AlertRule<RedHatClusterObservability>>>>,
) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
async fn add_scrape_targets(
&self,
_sender: &RedHatClusterObservability,
_inventory: &Inventory,
_scrape_targets: Option<Vec<Box<dyn ScrapeTarget<RedHatClusterObservability>>>>,
) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
async fn ensure_monitoring_installed(
&self,
_sender: &RedHatClusterObservability,
_inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
}

View File

@@ -1,7 +1,6 @@
use async_trait::async_trait;
use crate::{
interpret::Outcome,
inventory::Inventory,
modules::postgresql::{
K8sPostgreSQLScore,

View File

@@ -1,7 +1,6 @@
use std::{net::SocketAddr, str::FromStr};
use async_trait::async_trait;
use log::debug;
use serde::Serialize;
use super::LogicalHost;

View File

@@ -2,6 +2,7 @@ pub mod decentralized;
mod failover;
mod ha_cluster;
pub mod ingress;
pub mod monitoring;
pub mod node_exporter;
pub mod opnsense;
pub use failover::*;
@@ -11,12 +12,10 @@ mod http;
pub mod installable;
mod k8s_anywhere;
mod localhost;
pub mod oberservability;
pub mod tenant;
use derive_new::new;
pub use k8s_anywhere::*;
pub use localhost::*;
pub mod k8s;
mod load_balancer;
pub mod router;
mod tftp;

View File

@@ -0,0 +1,256 @@
use std::{
any::Any,
collections::{BTreeMap, HashMap},
net::IpAddr,
};
use async_trait::async_trait;
use kube::api::DynamicObject;
use log::{debug, info};
use serde::{Deserialize, Serialize};
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
topology::{PreparationError, PreparationOutcome, Topology, installable::Installable},
};
use harmony_types::id::Id;
/// Defines the application that sends alerts to receivers,
/// for example Prometheus.
#[async_trait]
pub trait AlertSender: Send + Sync + std::fmt::Debug {
fn name(&self) -> String;
}
/// Trait which defines how an alert sender is implemented for a specific topology.
#[async_trait]
pub trait Observability<S: AlertSender> {
async fn install_alert_sender(
&self,
sender: &S,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError>;
async fn install_receivers(
&self,
sender: &S,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<S>>>>,
) -> Result<PreparationOutcome, PreparationError>;
async fn install_rules(
&self,
sender: &S,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<S>>>>,
) -> Result<PreparationOutcome, PreparationError>;
async fn add_scrape_targets(
&self,
sender: &S,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
) -> Result<PreparationOutcome, PreparationError>;
async fn ensure_monitoring_installed(
&self,
sender: &S,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError>;
}
/// Defines the entity that receives alerts from a sender, for example Discord, Slack, etc.
pub trait AlertReceiver<S: AlertSender>: std::fmt::Debug + Send + Sync {
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn AlertReceiver<S>>;
}
/// Defines a generic rule that can be applied to a sender, such as a Prometheus alert rule.
pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
fn build_rule(&self) -> Result<serde_json::Value, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn AlertRule<S>>;
}
/// A generic scrape target that can be added to a sender so it scrapes metrics from it,
/// for example a server outside of the cluster.
pub trait ScrapeTarget<S: AlertSender>: std::fmt::Debug + Send + Sync {
fn build_scrape_target(&self) -> Result<ExternalScrapeTarget, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn ScrapeTarget<S>>;
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExternalScrapeTarget {
pub ip: IpAddr,
pub port: i32,
pub interval: Option<String>,
pub path: Option<String>,
pub labels: Option<BTreeMap<String, String>>,
}
/// Alerting interpret to install an alert sender on a given topology
#[derive(Debug)]
pub struct AlertingInterpret<S: AlertSender> {
pub sender: S,
pub receivers: Vec<Box<dyn AlertReceiver<S>>>,
pub rules: Vec<Box<dyn AlertRule<S>>>,
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
}
#[async_trait]
impl<S: AlertSender, T: Topology + Observability<S>> Interpret<T> for AlertingInterpret<S> {
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
info!("Configuring alert sender {}", self.sender.name());
topology
.install_alert_sender(&self.sender, inventory)
.await?;
info!("Installing receivers");
topology
.install_receivers(&self.sender, inventory, Some(self.receivers.clone()))
.await?;
info!("Installing rules");
topology
.install_rules(&self.sender, inventory, Some(self.rules.clone()))
.await?;
info!("Adding extra scrape targets");
topology
.add_scrape_targets(&self.sender, inventory, self.scrape_targets.clone())
.await?;
info!("Ensuring alert sender {} is ready", self.sender.name());
topology
.ensure_monitoring_installed(&self.sender, inventory)
.await?;
Ok(Outcome::success(format!(
"successfully installed alert sender {}",
self.sender.name()
)))
}
fn get_name(&self) -> InterpretName {
InterpretName::Alerting
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
impl<S: AlertSender> Clone for Box<dyn AlertReceiver<S>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
impl<S: AlertSender> Clone for Box<dyn AlertRule<S>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
impl<S: AlertSender> Clone for Box<dyn ScrapeTarget<S>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
pub struct ReceiverInstallPlan {
pub install_operation: Option<Vec<InstallOperation>>,
pub route: Option<AlertRoute>,
pub receiver: Option<serde_yaml::Value>,
}
impl Default for ReceiverInstallPlan {
fn default() -> Self {
Self {
install_operation: None,
route: None,
receiver: None,
}
}
}
pub enum InstallOperation {
CreateSecret {
name: String,
data: BTreeMap<String, String>,
},
}
/// Generic routing that can map to various alert sender backends.
#[derive(Debug, Clone, Serialize)]
pub struct AlertRoute {
pub receiver: String,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub matchers: Vec<AlertMatcher>,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub group_by: Vec<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub repeat_interval: Option<String>,
#[serde(rename = "continue")]
pub continue_matching: bool,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub children: Vec<AlertRoute>,
}
impl AlertRoute {
pub fn default(name: String) -> Self {
Self {
receiver: name,
matchers: vec![],
group_by: vec![],
repeat_interval: Some("30s".to_string()),
continue_matching: true,
children: vec![],
}
}
}
#[derive(Debug, Clone, Serialize)]
pub struct AlertMatcher {
pub label: String,
pub operator: MatchOp,
pub value: String,
}
#[derive(Debug, Clone)]
pub enum MatchOp {
Eq,
NotEq,
Regex,
}
impl Serialize for MatchOp {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
let op = match self {
MatchOp::Eq => "=",
MatchOp::NotEq => "!=",
MatchOp::Regex => "=~",
};
serializer.serialize_str(op)
}
}
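
To make the trait surface concrete, here is a minimal receiver sketch; WebhookReceiver and its fields are hypothetical, but it only uses types defined above (AlertReceiver, ReceiverInstallPlan, AlertRoute):

#[derive(Debug, Clone)]
pub struct WebhookReceiver {
    pub name: String,
    pub url: String,
}

impl<S: AlertSender> AlertReceiver<S> for WebhookReceiver {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
        Ok(ReceiverInstallPlan {
            // Route everything to this receiver using the default 30s repeat interval
            route: Some(AlertRoute::default(self.name.clone())),
            // Alertmanager-style receiver block, kept as loosely-typed YAML
            receiver: serde_yaml::to_value(serde_json::json!({
                "name": self.name,
                "webhook_configs": [{ "url": self.url }],
            }))
            .ok(),
            // No secrets or other install operations needed for a plain webhook
            ..Default::default()
        })
    }
    fn name(&self) -> String {
        self.name.clone()
    }
    fn clone_box(&self) -> Box<dyn AlertReceiver<S>> {
        Box::new(self.clone())
    }
}

Dropped into AlertingInterpret { sender, receivers: vec![Box::new(receiver)], rules: vec![], scrape_targets: None }, execute() then installs it through install_receivers on any topology implementing Observability<S>.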

View File

@@ -9,6 +9,7 @@ use std::{
use async_trait::async_trait;
use brocade::PortOperatingMode;
use derive_new::new;
use harmony_k8s::K8sClient;
use harmony_types::{
id::Id,
net::{IpAddress, MacAddress},
@@ -18,7 +19,7 @@ use serde::Serialize;
use crate::executors::ExecutorError;
use super::{LogicalHost, k8s::K8sClient};
use super::LogicalHost;
#[derive(Debug)]
pub struct DHCPStaticEntry {
@@ -188,6 +189,10 @@ impl FromStr for DnsRecordType {
pub trait NetworkManager: Debug + Send + Sync {
async fn ensure_network_manager_installed(&self) -> Result<(), NetworkError>;
async fn configure_bond(&self, config: &HostNetworkConfig) -> Result<(), NetworkError>;
async fn configure_bond_on_primary_interface(
&self,
config: &HostNetworkConfig,
) -> Result<(), NetworkError>;
}
#[derive(Debug, Clone, new)]

View File

@@ -1 +0,0 @@
pub mod monitoring;

View File

@@ -1,101 +0,0 @@
use std::{any::Any, collections::HashMap};
use async_trait::async_trait;
use kube::api::DynamicObject;
use log::debug;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
topology::{Topology, installable::Installable},
};
use harmony_types::id::Id;
#[async_trait]
pub trait AlertSender: Send + Sync + std::fmt::Debug {
fn name(&self) -> String;
}
#[derive(Debug)]
pub struct AlertingInterpret<S: AlertSender> {
pub sender: S,
pub receivers: Vec<Box<dyn AlertReceiver<S>>>,
pub rules: Vec<Box<dyn AlertRule<S>>>,
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
}
#[async_trait]
impl<S: AlertSender + Installable<T>, T: Topology> Interpret<T> for AlertingInterpret<S> {
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
debug!("hit sender configure for AlertingInterpret");
self.sender.configure(inventory, topology).await?;
for receiver in self.receivers.iter() {
receiver.install(&self.sender).await?;
}
for rule in self.rules.iter() {
debug!("installing rule: {:#?}", rule);
rule.install(&self.sender).await?;
}
if let Some(targets) = &self.scrape_targets {
for target in targets.iter() {
debug!("installing scrape_target: {:#?}", target);
target.install(&self.sender).await?;
}
}
self.sender.ensure_installed(inventory, topology).await?;
Ok(Outcome::success(format!(
"successfully installed alert sender {}",
self.sender.name()
)))
}
fn get_name(&self) -> InterpretName {
InterpretName::Alerting
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
#[async_trait]
pub trait AlertReceiver<S: AlertSender>: std::fmt::Debug + Send + Sync {
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn AlertReceiver<S>>;
fn as_any(&self) -> &dyn Any;
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String>;
}
#[derive(Debug)]
pub struct AlertManagerReceiver {
pub receiver_config: serde_json::Value,
// FIXME we should not leak k8s here. DynamicObject is k8s specific
pub additional_ressources: Vec<DynamicObject>,
pub route_config: serde_json::Value,
}
#[async_trait]
pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
fn clone_box(&self) -> Box<dyn AlertRule<S>>;
}
#[async_trait]
pub trait ScrapeTarget<S: AlertSender>: std::fmt::Debug + Send + Sync {
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
fn clone_box(&self) -> Box<dyn ScrapeTarget<S>>;
}

View File

@@ -1,10 +1,8 @@
use std::sync::Arc;
use crate::{
executors::ExecutorError,
topology::k8s::{ApplyStrategy, K8sClient},
};
use crate::executors::ExecutorError;
use async_trait::async_trait;
use harmony_k8s::K8sClient;
use k8s_openapi::{
api::{
core::v1::{LimitRange, Namespace, ResourceQuota},
@@ -14,7 +12,7 @@ use k8s_openapi::{
},
apimachinery::pkg::util::intstr::IntOrString,
};
use kube::{Resource, api::DynamicObject};
use kube::Resource;
use log::debug;
use serde::de::DeserializeOwned;
use serde_json::json;
@@ -59,7 +57,6 @@ impl K8sTenantManager {
) -> Result<K, ExecutorError>
where
<K as kube::Resource>::DynamicType: Default,
<K as kube::Resource>::Scope: ApplyStrategy<K>,
{
self.apply_labels(&mut resource, config);
self.k8s_client

View File

@@ -5,9 +5,20 @@ use harmony_types::{
net::{IpAddress, MacAddress},
switch::{PortDeclaration, PortLocation},
};
use log::info;
use option_ext::OptionExt;
use crate::topology::{PortConfig, SwitchClient, SwitchError};
use crate::{
modules::brocade::BrocadeSwitchAuth,
topology::{PortConfig, SwitchClient, SwitchError},
};
#[derive(Debug, Clone)]
pub struct BrocadeSwitchConfig {
pub ips: Vec<IpAddress>,
pub auth: BrocadeSwitchAuth,
pub options: BrocadeOptions,
}
#[derive(Debug)]
pub struct BrocadeSwitchClient {
@@ -15,13 +26,11 @@ pub struct BrocadeSwitchClient {
}
impl BrocadeSwitchClient {
pub async fn init(
ip_addresses: &[IpAddress],
username: &str,
password: &str,
options: BrocadeOptions,
) -> Result<Self, brocade::Error> {
let brocade = brocade::init(ip_addresses, username, password, options).await?;
pub async fn init(config: BrocadeSwitchConfig) -> Result<Self, brocade::Error> {
let auth = &config.auth;
let options = &config.options;
let brocade = brocade::init(&config.ips, &auth.username, &auth.password, options).await?;
Ok(Self { brocade })
}
}
@@ -52,13 +61,18 @@ impl SwitchClient for BrocadeSwitchClient {
|| link.remote_port.contains(&interface.port_location)
})
})
.map(|interface| (interface.name.clone(), PortOperatingMode::Access))
.map(|interface| (interface.name.clone(), PortOperatingMode::Trunk))
.collect();
if interfaces.is_empty() {
return Ok(());
}
info!("About to configure interfaces {interfaces:?}");
// inquire::Confirm::new("Do you wish to configures interfaces now?")
// .prompt()
// .map_err(|e| SwitchError::new(e.to_string()))?;
self.brocade
.configure_interfaces(&interfaces)
.await
@@ -208,8 +222,8 @@ mod tests {
//TODO not sure about this
let configured_interfaces = brocade.configured_interfaces.lock().unwrap();
assert_that!(*configured_interfaces).contains_exactly(vec![
(first_interface.name.clone(), PortOperatingMode::Access),
(second_interface.name.clone(), PortOperatingMode::Access),
(first_interface.name.clone(), PortOperatingMode::Trunk),
(second_interface.name.clone(), PortOperatingMode::Trunk),
]);
}

View File

@@ -3,20 +3,77 @@ use std::{
sync::Arc,
};
use askama::Template;
use async_trait::async_trait;
use harmony_k8s::{DrainOptions, K8sClient, NodeFile};
use harmony_types::id::Id;
use k8s_openapi::api::core::v1::Node;
use kube::{
ResourceExt,
api::{ObjectList, ObjectMeta},
};
use log::{debug, info};
use log::{debug, info, warn};
use crate::{
modules::okd::crd::nmstate,
topology::{HostNetworkConfig, NetworkError, NetworkManager, k8s::K8sClient},
topology::{HostNetworkConfig, NetworkError, NetworkManager},
};
/// NetworkManager bond configuration template
#[derive(Template)]
#[template(
source = r#"[connection]
id={{ bond_name }}
uuid={{ bond_uuid }}
type=bond
autoconnect-slaves=1
interface-name={{ bond_name }}
[bond]
lacp_rate=fast
mode=802.3ad
xmit_hash_policy=layer2
[ipv4]
method=auto
[ipv6]
addr-gen-mode=default
method=auto
[proxy]
"#,
ext = "txt"
)]
struct BondConfigTemplate {
bond_name: String,
bond_uuid: String,
}
/// NetworkManager bond slave configuration template
#[derive(Template)]
#[template(
source = r#"[connection]
id={{ slave_id }}
uuid={{ slave_uuid }}
type=ethernet
interface-name={{ interface_name }}
master={{ bond_name }}
slave-type=bond
[ethernet]
[bond-port]
"#,
ext = "txt"
)]
struct BondSlaveConfigTemplate {
slave_id: String,
slave_uuid: String,
interface_name: String,
bond_name: String,
}
/// TODO document properly the non-intuitive "roll forward only" behavior of nmstate in general.
/// It is documented in the official nmstate docs, but worth mentioning here:
///
@@ -87,6 +144,117 @@ impl NetworkManager for OpenShiftNmStateNetworkManager {
Ok(())
}
/// Configures bonding on the primary network interface of a node.
///
/// Changing the *primary* network interface (making it a bond
/// slave) will disrupt node connectivity mid-change, so the
/// procedure is:
///
/// 1. Generate NetworkManager .nmconnection files
/// 2. Drain the node (includes cordon)
/// 3. Write configuration files to `/etc/NetworkManager/system-connections/`
/// 4. Attempt to reload NetworkManager (optional, best-effort)
/// 5. Reboot the node with full verification (drain, boot_id check, uncordon)
///
/// The reboot procedure includes:
/// - Recording boot_id before reboot
/// - Fire-and-forget reboot command
/// - Waiting for NotReady status
/// - Waiting for Ready status
/// - Verifying boot_id changed
/// - Uncordoning the node
///
/// See ADR-019 for context and rationale.
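///
/// # Example
///
/// A minimal sketch (`manager` and the `HostNetworkConfig` fields are
/// assumptions, not part of this diff):
///
/// ```ignore
/// let config = HostNetworkConfig { /* host_id, switch_ports, ... */ };
/// manager.configure_bond_on_primary_interface(&config).await?;
/// ```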
async fn configure_bond_on_primary_interface(
&self,
config: &HostNetworkConfig,
) -> Result<(), NetworkError> {
use std::time::Duration;
let node_name = self.get_node_name_for_id(&config.host_id).await?;
let hostname = self.get_hostname(&config.host_id).await?;
info!(
"Configuring bond on primary interface for host '{}' (node '{}')",
config.host_id, node_name
);
// 1. Generate .nmconnection files
let files = self.generate_nmconnection_files(&hostname, config)?;
debug!(
"Generated {} NetworkManager configuration files",
files.len()
);
// 2. Write configuration files to the node (before draining)
// We do this while the node is still running for faster operation
info!(
"Writing NetworkManager configuration files to node '{}'...",
node_name
);
self.k8s_client
.write_files_to_node(&node_name, &files)
.await
.map_err(|e| {
NetworkError::new(format!(
"Failed to write configuration files to node '{}': {}",
node_name, e
))
})?;
// 3. Reload NetworkManager configuration (best-effort)
// This won't activate the bond yet since the primary interface would lose connectivity,
// but it validates the configuration files are correct
info!(
"Reloading NetworkManager configuration on node '{}'...",
node_name
);
match self
.k8s_client
.run_privileged_command_on_node(&node_name, "chroot /host nmcli connection reload")
.await
{
Ok(output) => {
debug!("NetworkManager reload output: {}", output.trim());
}
Err(e) => {
warn!(
"Failed to reload NetworkManager configuration: {}. Proceeding with reboot.",
e
);
// Don't fail here - reboot will pick up the config anyway
}
}
// 4. Reboot the node with full verification
// The reboot_node function handles: drain, boot_id capture, reboot, NotReady wait,
// Ready wait, boot_id verification, and uncordon
// 60 minutes timeout for bare-metal environments (drain can take 20-30 mins)
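// (roughly `kubectl drain --ignore-daemonsets --delete-emptydir-data`, judging by the option name)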
let reboot_timeout = Duration::from_secs(3600);
info!(
"Rebooting node '{}' to apply network configuration (timeout: {:?})...",
node_name, reboot_timeout
);
self.k8s_client
.reboot_node(
&node_name,
&DrainOptions::default_ignore_daemonset_delete_emptydir_data(),
reboot_timeout,
)
.await
.map_err(|e| {
NetworkError::new(format!("Failed to reboot node '{}': {}", node_name, e))
})?;
info!(
"Successfully configured bond on primary interface for host '{}' (node '{}')",
config.host_id, node_name
);
Ok(())
}
async fn configure_bond(&self, config: &HostNetworkConfig) -> Result<(), NetworkError> {
let hostname = self.get_hostname(&config.host_id).await.map_err(|e| {
NetworkError::new(format!(
@@ -208,14 +376,14 @@ impl OpenShiftNmStateNetworkManager {
}
}
async fn get_hostname(&self, host_id: &Id) -> Result<String, String> {
async fn get_node_for_id(&self, host_id: &Id) -> Result<Node, String> {
let nodes: ObjectList<Node> = self
.k8s_client
.list_resources(None, None)
.await
.map_err(|e| format!("Failed to list nodes: {e}"))?;
let Some(node) = nodes.iter().find(|n| {
let Some(node) = nodes.into_iter().find(|n| {
n.status
.as_ref()
.and_then(|s| s.node_info.as_ref())
@@ -225,6 +393,20 @@ impl OpenShiftNmStateNetworkManager {
return Err(format!("No node found for host '{host_id}'"));
};
Ok(node)
}
async fn get_node_name_for_id(&self, host_id: &Id) -> Result<String, String> {
let node = self.get_node_for_id(host_id).await?;
node.metadata.name.ok_or_else(|| {
    format!("A node should always have a name; node for host_id {host_id} has no name")
})
}
async fn get_hostname(&self, host_id: &Id) -> Result<String, String> {
let node = self.get_node_for_id(host_id).await?;
node.labels()
.get("kubernetes.io/hostname")
.ok_or_else(|| format!(
@@ -261,4 +443,82 @@ impl OpenShiftNmStateNetworkManager {
let next_id = (0..).find(|id| !used_ids.contains(id)).unwrap();
Ok(format!("bond{next_id}"))
}
/// Generates NetworkManager .nmconnection files for bonding configuration.
///
/// Creates:
/// - One bond master configuration file (bond0.nmconnection)
/// - One slave configuration file per interface (bond0-<iface>.nmconnection)
///
/// All files are placed in `/etc/NetworkManager/system-connections/` with
/// mode 0o600 (required by NetworkManager).
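///
/// Illustrative sketch (interface names are made up): for a host with two
/// switch ports on `eno1` and `eno2`, the returned set would be:
///
/// ```ignore
/// // /etc/NetworkManager/system-connections/bond0.nmconnection      (bond master)
/// // /etc/NetworkManager/system-connections/bond0-eno1.nmconnection (slave)
/// // /etc/NetworkManager/system-connections/bond0-eno2.nmconnection (slave)
/// let files = self.generate_nmconnection_files(&hostname, &config)?;
/// ```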
fn generate_nmconnection_files(
&self,
hostname: &str,
config: &HostNetworkConfig,
) -> Result<Vec<NodeFile>, NetworkError> {
let mut files = Vec::new();
let bond_name = "bond0";
let bond_uuid = uuid::Uuid::new_v4().to_string();
// Generate bond master configuration
let bond_template = BondConfigTemplate {
bond_name: bond_name.to_string(),
bond_uuid: bond_uuid.clone(),
};
let bond_content = bond_template.render().map_err(|e| {
NetworkError::new(format!(
"Failed to render bond configuration template: {}",
e
))
})?;
files.push(NodeFile {
path: format!(
"/etc/NetworkManager/system-connections/{}.nmconnection",
bond_name
),
content: bond_content,
mode: 0o600,
});
// Generate slave configurations for each interface
for switch_port in &config.switch_ports {
let interface_name = &switch_port.interface.name;
let slave_id = format!("{}-{}", bond_name, interface_name);
let slave_uuid = uuid::Uuid::new_v4().to_string();
let slave_template = BondSlaveConfigTemplate {
slave_id: slave_id.clone(),
slave_uuid,
interface_name: interface_name.clone(),
bond_name: bond_name.to_string(),
};
let slave_content = slave_template.render().map_err(|e| {
NetworkError::new(format!(
"Failed to render slave configuration template for interface '{}': {}",
interface_name, e
))
})?;
files.push(NodeFile {
path: format!(
"/etc/NetworkManager/system-connections/{}.nmconnection",
slave_id
),
content: slave_content,
mode: 0o600,
});
}
debug!(
"Generated {} NetworkManager configuration files for host '{}'",
files.len(),
hostname
);
Ok(files)
}
}

View File

@@ -1,5 +1,5 @@
use async_trait::async_trait;
use log::{debug, info, trace};
use log::{debug, info};
use serde::Serialize;
use std::path::PathBuf;

View File

@@ -1,4 +1,5 @@
use async_trait::async_trait;
use harmony_k8s::K8sClient;
use harmony_macros::hurl;
use log::{debug, info, trace, warn};
use non_blank_string_rs::NonBlankString;
@@ -14,7 +15,7 @@ use crate::{
helm::chart::{HelmChartScore, HelmRepository},
},
score::Score,
topology::{HelmCommand, K8sclient, Topology, ingress::Ingress, k8s::K8sClient},
topology::{HelmCommand, K8sclient, Topology, ingress::Ingress},
};
use harmony_types::id::Id;

View File

@@ -2,13 +2,15 @@ use crate::modules::application::{
Application, ApplicationFeature, InstallationError, InstallationOutcome,
};
use crate::modules::monitoring::application_monitoring::application_monitoring_score::ApplicationMonitoringScore;
use crate::modules::monitoring::grafana::grafana::Grafana;
use crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::CRDPrometheus;
use crate::modules::monitoring::kube_prometheus::crd::service_monitor::{
ServiceMonitor, ServiceMonitorSpec,
};
use crate::modules::monitoring::prometheus::Prometheus;
use crate::modules::monitoring::prometheus::helm::prometheus_config::PrometheusConfig;
use crate::topology::MultiTargetTopology;
use crate::topology::ingress::Ingress;
use crate::topology::monitoring::Observability;
use crate::topology::monitoring::{AlertReceiver, AlertRoute};
use crate::{
inventory::Inventory,
modules::monitoring::{
@@ -17,10 +19,6 @@ use crate::{
score::Score,
topology::{HelmCommand, K8sclient, Topology, tenant::TenantManager},
};
use crate::{
modules::prometheus::prometheus::PrometheusMonitoring,
topology::oberservability::monitoring::AlertReceiver,
};
use async_trait::async_trait;
use base64::{Engine as _, engine::general_purpose};
use harmony_secret::SecretManager;
@@ -30,12 +28,13 @@ use kube::api::ObjectMeta;
use log::{debug, info};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use std::sync::{Arc, Mutex};
//TODO test this
#[derive(Debug, Clone)]
pub struct Monitoring {
pub application: Arc<dyn Application>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<CRDPrometheus>>>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<Prometheus>>>,
}
#[async_trait]
@@ -46,8 +45,7 @@ impl<
+ TenantManager
+ K8sclient
+ MultiTargetTopology
+ PrometheusMonitoring<CRDPrometheus>
+ Grafana
+ Observability<Prometheus>
+ Ingress
+ std::fmt::Debug,
> ApplicationFeature<T> for Monitoring
@@ -74,17 +72,15 @@ impl<
};
let mut alerting_score = ApplicationMonitoringScore {
sender: CRDPrometheus {
namespace: namespace.clone(),
client: topology.k8s_client().await.unwrap(),
service_monitor: vec![app_service_monitor],
sender: Prometheus {
config: Arc::new(Mutex::new(PrometheusConfig::new())),
},
application: self.application.clone(),
receivers: self.alert_receiver.clone(),
};
let ntfy = NtfyScore {
namespace: namespace.clone(),
host: domain,
host: domain.clone(),
};
ntfy.interpret(&Inventory::empty(), topology)
.await
@@ -105,20 +101,28 @@ impl<
debug!("ntfy_default_auth_param: {ntfy_default_auth_param}");
debug!("ntfy_default_auth_param: {ntfy_default_auth_param}");
let ntfy_receiver = WebhookReceiver {
name: "ntfy-webhook".to_string(),
url: Url::Url(
url::Url::parse(
format!(
"http://ntfy.{}.svc.cluster.local/rust-web-app?auth={ntfy_default_auth_param}",
namespace.clone()
"http://{domain}/{}?auth={ntfy_default_auth_param}",
self.application.name()
)
.as_str(),
)
.unwrap(),
),
route: AlertRoute {
..AlertRoute::default("ntfy-webhook".to_string())
},
};
debug!(
"ntfy webhook receiver \n{:#?}\nntfy topic: {}",
ntfy_receiver.clone(),
self.application.name()
);
alerting_score.receivers.push(Box::new(ntfy_receiver));
alerting_score
.interpret(&Inventory::empty(), topology)

View File

@@ -3,11 +3,13 @@ use std::sync::Arc;
use crate::modules::application::{
Application, ApplicationFeature, InstallationError, InstallationOutcome,
};
use crate::modules::monitoring::application_monitoring::rhobs_application_monitoring_score::ApplicationRHOBMonitoringScore;
use crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::RHOBObservability;
use crate::modules::monitoring::red_hat_cluster_observability::RedHatClusterObservability;
use crate::modules::monitoring::red_hat_cluster_observability::redhat_cluster_observability::RedHatClusterObservabilityScore;
use crate::topology::MultiTargetTopology;
use crate::topology::ingress::Ingress;
use crate::topology::monitoring::Observability;
use crate::topology::monitoring::{AlertReceiver, AlertRoute};
use crate::{
inventory::Inventory,
modules::monitoring::{
@@ -16,10 +18,6 @@ use crate::{
score::Score,
topology::{HelmCommand, K8sclient, Topology, tenant::TenantManager},
};
use crate::{
modules::prometheus::prometheus::PrometheusMonitoring,
topology::oberservability::monitoring::AlertReceiver,
};
use async_trait::async_trait;
use base64::{Engine as _, engine::general_purpose};
use harmony_types::net::Url;
@@ -28,9 +26,10 @@ use log::{debug, info};
#[derive(Debug, Clone)]
pub struct Monitoring {
pub application: Arc<dyn Application>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<RHOBObservability>>>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<RedHatClusterObservability>>>,
}
/// TODO: test this
#[async_trait]
impl<
T: Topology
@@ -41,7 +40,7 @@ impl<
+ MultiTargetTopology
+ Ingress
+ std::fmt::Debug
+ PrometheusMonitoring<RHOBObservability>,
+ Observability<RedHatClusterObservability>,
> ApplicationFeature<T> for Monitoring
{
async fn ensure_installed(
@@ -55,13 +54,14 @@ impl<
.map(|ns| ns.name.clone())
.unwrap_or_else(|| self.application.name());
let mut alerting_score = ApplicationRHOBMonitoringScore {
sender: RHOBObservability {
let mut alerting_score = RedHatClusterObservabilityScore {
sender: RedHatClusterObservability {
namespace: namespace.clone(),
client: topology.k8s_client().await.unwrap(),
resource_selector: todo!(),
},
application: self.application.clone(),
receivers: self.alert_receiver.clone(),
rules: vec![],
scrape_targets: None,
};
let domain = topology
.get_domain("ntfy")
@@ -97,12 +97,15 @@ impl<
url::Url::parse(
format!(
"http://{domain}/{}?auth={ntfy_default_auth_param}",
self.application.name()
)
.as_str(),
)
.unwrap(),
),
route: AlertRoute {
..AlertRoute::default("ntfy-webhook".to_string())
},
};
debug!(
"ntfy webhook receiver \n{:#?}\nntfy topic: {}",

View File

@@ -1,8 +1,9 @@
use std::sync::Arc;
use harmony_k8s::K8sClient;
use log::{debug, info};
use crate::{interpret::InterpretError, topology::k8s::K8sClient};
use crate::interpret::InterpretError;
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ArgoScope {

View File

@@ -0,0 +1,138 @@
use async_trait::async_trait;
use brocade::{BrocadeOptions, PortOperatingMode};
use crate::{
data::Version,
infra::brocade::{BrocadeSwitchClient, BrocadeSwitchConfig},
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
score::Score,
topology::{
HostNetworkConfig, PortConfig, PreparationError, PreparationOutcome, Switch, SwitchClient,
SwitchError, Topology,
},
};
use harmony_macros::ip;
use harmony_types::{id::Id, net::MacAddress, switch::PortLocation};
use log::{debug, info};
use serde::Serialize;
#[derive(Clone, Debug, Serialize)]
pub struct BrocadeSwitchScore {
pub port_channels_to_clear: Vec<Id>,
pub ports_to_configure: Vec<PortConfig>,
}
impl<T: Topology + Switch> Score<T> for BrocadeSwitchScore {
fn name(&self) -> String {
"BrocadeSwitchScore".to_string()
}
#[doc(hidden)]
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(BrocadeSwitchInterpret {
score: self.clone(),
})
}
}
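// Illustrative usage sketch (values are placeholders): the score runs through
// the same `Score::interpret` flow used elsewhere in this codebase.
//
//     let score = BrocadeSwitchScore {
//         port_channels_to_clear: vec![],
//         ports_to_configure: vec![],
//     };
//     score.interpret(&Inventory::empty(), &topology).await?;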
#[derive(Debug)]
pub struct BrocadeSwitchInterpret {
score: BrocadeSwitchScore,
}
#[async_trait]
impl<T: Topology + Switch> Interpret<T> for BrocadeSwitchInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
info!("Applying switch configuration {:?}", self.score);
debug!(
"Clearing port channel {:?}",
self.score.port_channels_to_clear
);
topology
.clear_port_channel(&self.score.port_channels_to_clear)
.await
.map_err(|e| InterpretError::new(e.to_string()))?;
debug!("Configuring interfaces {:?}", self.score.ports_to_configure);
topology
.configure_interface(&self.score.ports_to_configure)
.await
.map_err(|e| InterpretError::new(e.to_string()))?;
Ok(Outcome::success("switch configured".to_string()))
}
fn get_name(&self) -> InterpretName {
InterpretName::Custom("BrocadeSwitchInterpret")
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
/*
pub struct BrocadeSwitchConfig {
pub ips: Vec<harmony_types::net::IpAddress>,
pub username: String,
pub password: String,
pub options: BrocadeOptions,
}
*/
pub struct SwitchTopology {
client: Box<dyn SwitchClient>,
}
#[async_trait]
impl Topology for SwitchTopology {
fn name(&self) -> &str {
"SwitchTopology"
}
async fn ensure_ready(&self) -> Result<PreparationOutcome, PreparationError> {
Ok(PreparationOutcome::Noop)
}
}
impl SwitchTopology {
pub async fn new(config: BrocadeSwitchConfig) -> Self {
let client = BrocadeSwitchClient::init(config)
.await
.expect("Failed to connect to switch");
let client = Box::new(client);
Self { client }
}
}
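// Illustrative sketch (field names follow BrocadeSwitchConfig as consumed by
// BrocadeSwitchClient::init; values are placeholders):
//
//     let topology = SwitchTopology::new(BrocadeSwitchConfig {
//         ips: vec![ip!("10.0.0.1")],
//         auth: BrocadeSwitchAuth::user_pass("admin".into(), "secret".into()),
//         options: BrocadeOptions::default(),
//     })
//     .await;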
#[async_trait]
impl Switch for SwitchTopology {
async fn setup_switch(&self) -> Result<(), SwitchError> {
todo!()
}
async fn get_port_for_mac_address(
&self,
_mac_address: &MacAddress,
) -> Result<Option<PortLocation>, SwitchError> {
todo!()
}
async fn configure_port_channel(&self, _config: &HostNetworkConfig) -> Result<(), SwitchError> {
todo!()
}
async fn clear_port_channel(&self, ids: &Vec<Id>) -> Result<(), SwitchError> {
self.client.clear_port_channel(ids).await
}
async fn configure_interface(&self, ports: &Vec<PortConfig>) -> Result<(), SwitchError> {
self.client.configure_interface(ports).await
}
}

View File

@@ -39,16 +39,22 @@ pub struct BrocadeEnableSnmpInterpret {
}
#[derive(Secret, Clone, Debug, JsonSchema, Serialize, Deserialize)]
struct BrocadeSwitchAuth {
username: String,
password: String,
pub struct BrocadeSwitchAuth {
pub username: String,
pub password: String,
}
impl BrocadeSwitchAuth {
pub fn user_pass(username: String, password: String) -> Self {
Self { username, password }
}
}
#[derive(Secret, Clone, Debug, JsonSchema, Serialize, Deserialize)]
struct BrocadeSnmpAuth {
username: String,
auth_password: String,
des_password: String,
pub struct BrocadeSnmpAuth {
pub username: String,
pub auth_password: String,
pub des_password: String,
}
#[async_trait]
@@ -72,7 +78,7 @@ impl<T: Topology> Interpret<T> for BrocadeEnableSnmpInterpret {
&switch_addresses,
&config.username,
&config.password,
BrocadeOptions {
&BrocadeOptions {
dry_run: self.score.dry_run,
..Default::default()
},

View File

@@ -0,0 +1,5 @@
pub mod brocade;
pub use brocade::*;
pub mod brocade_snmp;
pub use brocade_snmp::*;

View File

@@ -1,3 +1,4 @@
use harmony_k8s::K8sClient;
use std::sync::Arc;
use async_trait::async_trait;
@@ -11,7 +12,7 @@ use crate::{
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
score::Score,
topology::{K8sclient, Topology, k8s::K8sClient},
topology::{K8sclient, Topology},
};
#[derive(Clone, Debug, Serialize)]

View File

@@ -82,17 +82,40 @@ impl<T: Topology> Interpret<T> for DiscoverHostForRoleInterpret {
self.score.role,
choice.summary()
);
let disk_names: Vec<String> =
choice.storage.iter().map(|s| s.name.clone()).collect();
let mut disk_choices: Vec<(String, String)> = vec![];
for s in choice.storage.iter() {
let size_gb: f64 = s.size_bytes as f64 / 1_000_000_000.0;
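// Decimal units (1 GB = 10^9 bytes), matching how drive vendors label
// capacity; roll over to TB once the value reaches 1000 GB.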
let (size, unit) = if size_gb >= 1000.0 {
(size_gb / 1000.0, "TB")
} else {
(size_gb, "GB")
};
let drive_type = if s.rotational { "rotational" } else { "SSD" };
let smart_str = s.smart_status.as_deref().unwrap_or("N/A");
let display = format!(
"{} : [{}] - {:.0} {} ({}) - {} - Smart: {}",
s.name, s.model, size, unit, drive_type, s.interface_type, smart_str
);
disk_choices.push((display, s.name.clone()));
}
let display_refs: Vec<&str> =
disk_choices.iter().map(|(d, _)| d.as_str()).collect();
let disk_choice = inquire::Select::new(
&format!("Select the disk to use on host {}:", choice.summary()),
disk_names,
display_refs,
)
.prompt();
match disk_choice {
Ok(disk_name) => {
Ok(selected_display) => {
let disk_name = disk_choices
.iter()
.find(|(d, _)| d.as_str() == selected_display)
.map(|(_, name)| name.clone())
.unwrap();
info!("Selected disk {} for node {}", disk_name, choice.summary());
host_repo
.save_role_mapping(&self.score.role, &choice, &disk_name)

View File

@@ -54,6 +54,12 @@ pub enum HarmonyDiscoveryStrategy {
SUBNET { cidr: cidr::Ipv4Cidr, port: u16 },
}
impl Default for HarmonyDiscoveryStrategy {
fn default() -> Self {
HarmonyDiscoveryStrategy::MDNS
}
}
#[async_trait]
impl<T: Topology> Interpret<T> for DiscoverInventoryAgentInterpret {
async fn execute(

Some files were not shown because too many files have changed in this diff