Compare commits
28 Commits
feat/harmo
...
e2e-tests-
| Author | SHA1 | Date | |
|---|---|---|---|
| c5b292d99b | |||
| 0258b31fd2 | |||
| 4407792bd5 | |||
| 7978a63004 | |||
| 58d00c95bb | |||
| 7d14f7646c | |||
| 69dd763d6e | |||
| 2e46ac3418 | |||
| af6145afe3 | |||
| 701d86de69 | |||
| 6db7a780fa | |||
| 0df4e3cdee | |||
| 5c34d81d28 | |||
| c4dd0b0cf2 | |||
| b14b41d172 | |||
| 5e861cfc6d | |||
| 4fad077eb4 | |||
| d80561e326 | |||
| 621aed4903 | |||
| e68426cc3d | |||
| 0c1c8daf13 | |||
| 4b5e3a52a1 | |||
| c54936d19f | |||
| 699822af74 | |||
| 554c94f5a9 | |||
| 836db9e6b1 | |||
| bc6a41d40c | |||
| 8d446ec2e4 |
548
CI_and_testing_harmony_analysis.md
Normal file
@@ -0,0 +1,548 @@
|
|||||||
|
# CI and Testing Strategy for Harmony
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
Harmony aims to become a CNCF project, requiring a robust CI pipeline that demonstrates real-world reliability. The goal is to run **all examples** in CI, from simple k3d deployments to full HA OKD clusters on bare metal. This document provides context for designing and implementing this testing infrastructure.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Project Context
|
||||||
|
|
||||||
|
### What is Harmony?
|
||||||
|
|
||||||
|
Harmony is an infrastructure automation framework that is **code-first and code-only**. Operators write Rust programs to declare and drive infrastructure, rather than YAML files or DSL configs. Key differentiators:
|
||||||
|
|
||||||
|
1. **Compile-time safety**: The type system prevents "config-is-valid-but-platform-is-wrong" errors
|
||||||
|
2. **Topology abstraction**: Write once, deploy to any environment (local k3d, OKD, bare metal, cloud)
|
||||||
|
3. **Capability-based design**: Scores declare what they need; topologies provide what they have
|
||||||
|
|
||||||
|
### Core Abstractions
|
||||||
|
|
||||||
|
| Concept | Description |
|
||||||
|
|---------|-------------|
|
||||||
|
| **Score** | Declarative description of desired state (the "what") |
|
||||||
|
| **Topology** | Logical representation of infrastructure (the "where") |
|
||||||
|
| **Capability** | A feature a topology offers (the "how") |
|
||||||
|
| **Interpret** | Execution logic connecting Score to Topology |
|
||||||
|
|
||||||
|
### Compile-Time Verification
|
||||||
|
|
||||||
|
```rust
// MyScore can only be used with a topology that provides K8sclient + HelmCommand,
// such as K8sAnywhereTopology
impl<T: Topology + K8sclient + HelmCommand> Score<T> for MyScore { ... }

// Using K8sResourceScore with LinuxHostTopology FAILS to compile, because
// LinuxHostTopology doesn't provide K8sclient
// (intentionally broken example for testing)
impl<T: Topology + K8sclient> Score<T> for K8sResourceScore { ... }
// error: LinuxHostTopology does not implement K8sclient
```
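
To make the capability check concrete, here is a minimal, self-contained sketch of the same pattern. The traits and types below (`Topology`, `K8sClient`, `Score`, `K8sAnywhere`, `LinuxHost`, `run`) are simplified, hypothetical stand-ins and not Harmony's actual definitions; they only illustrate how a trait bound acts as the compile-time gate.

```rust
// Hypothetical, simplified stand-ins for Harmony's abstractions.
trait Topology {
    fn name(&self) -> &'static str;
}

/// Capability: the topology can talk to a Kubernetes API.
trait K8sClient {
    fn apply(&self, manifest: &str) -> String;
}

/// A Score is only usable with topologies that satisfy its bounds.
trait Score<T: Topology> {
    fn interpret(&self, topology: &T) -> String;
}

struct K8sAnywhere;
struct LinuxHost;

impl Topology for K8sAnywhere {
    fn name(&self) -> &'static str { "k8s-anywhere" }
}
impl Topology for LinuxHost {
    fn name(&self) -> &'static str { "linux-host" }
}

// Only K8sAnywhere offers the K8sClient capability.
impl K8sClient for K8sAnywhere {
    fn apply(&self, manifest: &str) -> String {
        format!("applied {manifest} on {}", self.name())
    }
}

struct K8sResourceScore;

// The bound `T: Topology + K8sClient` is the compile-time gate.
impl<T: Topology + K8sClient> Score<T> for K8sResourceScore {
    fn interpret(&self, topology: &T) -> String {
        topology.apply("deployment.yaml")
    }
}

/// Generic entry point: accepts only (score, topology) pairs that type-check.
fn run<T: Topology, S: Score<T>>(score: &S, topology: &T) -> String {
    score.interpret(topology)
}
```

With these definitions, `run(&K8sResourceScore, &K8sAnywhere)` compiles, while `run(&K8sResourceScore, &LinuxHost)` is rejected by the compiler because `LinuxHost` does not implement `K8sClient`.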
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Current Examples Inventory
|
||||||
|
|
||||||
|
### Summary Statistics
|
||||||
|
|
||||||
|
| Category | Count | CI Complexity |
|----------|-------|---------------|
| k3d-compatible | 22 | Low - single k3d cluster |
| OKD-specific | 4 | Medium - requires OKD cluster |
| Bare metal | 5 | High - requires physical infra or nested virtualization |
| Multi-cluster | 3 | High - requires multiple K8s clusters |
| No infra needed | 4 | Trivial - local only |
|
||||||
|
|
||||||
|
### Detailed Example Classification
|
||||||
|
|
||||||
|
#### Tier 1: k3d-Compatible (22 examples)
|
||||||
|
|
||||||
|
Can run on a local k3d cluster with minimal setup:
|
||||||
|
|
||||||
|
| Example | Topology | Capabilities | Special Notes |
|
||||||
|
|---------|----------|--------------|---------------|
|
||||||
|
| zitadel | K8sAnywhereTopology | K8sClient, HelmCommand | SSO/Identity |
|
||||||
|
| node_health | K8sAnywhereTopology | K8sClient | Health checks |
|
||||||
|
| public_postgres | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Needs ingress |
|
||||||
|
| openbao | K8sAnywhereTopology | K8sClient, HelmCommand | Vault alternative |
|
||||||
|
| rust | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Webapp deployment |
|
||||||
|
| cert_manager | K8sAnywhereTopology | K8sClient, CertificateManagement | TLS certificates |
|
||||||
|
| try_rust_webapp | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | Full webapp |
|
||||||
|
| monitoring | K8sAnywhereTopology | K8sClient, HelmCommand, Observability | Prometheus |
|
||||||
|
| application_monitoring_with_tenant | K8sAnywhereTopology | K8sClient, HelmCommand, TenantManager, Observability | Multi-tenant |
|
||||||
|
| monitoring_with_tenant | K8sAnywhereTopology | K8sClient, HelmCommand, TenantManager, Observability | Multi-tenant |
|
||||||
|
| postgresql | K8sAnywhereTopology | K8sClient, HelmCommand | CloudNativePG |
|
||||||
|
| ntfy | K8sAnywhereTopology | K8sClient, HelmCommand | Notifications |
|
||||||
|
| tenant | K8sAnywhereTopology | K8sClient, TenantManager | Namespace isolation |
|
||||||
|
| lamp | K8sAnywhereTopology | K8sClient, HelmCommand, TlsRouter | LAMP stack |
|
||||||
|
| k8s_drain_node | K8sAnywhereTopology | K8sClient | Node operations |
|
||||||
|
| k8s_write_file_on_node | K8sAnywhereTopology | K8sClient | Node operations |
|
||||||
|
| remove_rook_osd | K8sAnywhereTopology | K8sClient | Ceph operations |
|
||||||
|
| validate_ceph_cluster_health | K8sAnywhereTopology | K8sClient | Ceph health |
|
||||||
|
| kube-rs | Direct kube | K8sClient | Raw kube-rs demo |
|
||||||
|
| brocade_snmp_server | K8sAnywhereTopology | K8sClient | SNMP collector |
|
||||||
|
| harmony_inventory_builder | LocalhostTopology | None | Network scanning |
|
||||||
|
| cli | LocalhostTopology | None | CLI demo |
|
||||||
|
|
||||||
|
#### Tier 2: OKD/OpenShift-Specific (4 examples)
|
||||||
|
|
||||||
|
Require OKD/OpenShift features not available in vanilla K8s:
|
||||||
|
|
||||||
|
| Example | Topology | OKD-Specific Feature |
|
||||||
|
|---------|----------|---------------------|
|
||||||
|
| okd_cluster_alerts | K8sAnywhereTopology | OpenShift Monitoring CRDs |
|
||||||
|
| operatorhub_catalog | K8sAnywhereTopology | OpenShift OperatorHub |
|
||||||
|
| rhob_application_monitoring | K8sAnywhereTopology | RHOB (Red Hat Observability) |
|
||||||
|
| nats-supercluster | K8sAnywhereTopology | OKD Routes (OpenShift Ingress) |
|
||||||
|
|
||||||
|
#### Tier 3: Bare Metal Infrastructure (5 examples)

Require physical hardware or full virtualization:

| Example | Topology | Physical Requirements |
|---------|----------|----------------------|
| okd_installation | HAClusterTopology | OPNSense, Brocade switch, PXE boot, 3+ nodes |
| okd_pxe | HAClusterTopology | OPNSense, Brocade switch, PXE infrastructure |
| sttest | HAClusterTopology | Full HA cluster with all network services |
| opnsense | OPNSenseFirewall | OPNSense firewall access |
| opnsense_node_exporter | Custom | OPNSense firewall |
|
||||||
|
|
||||||
|
#### Tier 4: Multi-Cluster (3 examples)
|
||||||
|
|
||||||
|
Require multiple K8s clusters:
|
||||||
|
|
||||||
|
| Example | Topology | Clusters Required |
|
||||||
|
|---------|----------|-------------------|
|
||||||
|
| nats | K8sAnywhereTopology × 2 | 2 clusters with NATS gateways |
|
||||||
|
| nats-module | DecentralizedTopology | 3 clusters for supercluster |
|
||||||
|
| multisite_postgres | FailoverTopology | 2 clusters for replication |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Testing Categories
|
||||||
|
|
||||||
|
### 1. Compile-Time Tests
|
||||||
|
|
||||||
|
These tests verify that the type system correctly rejects invalid configurations:
|
||||||
|
|
||||||
|
```rust
// Should NOT compile - K8sResourceScore on LinuxHostTopology.
// Rust has no `#[compile_fail]` attribute for `#[test]` functions (`compile_fail`
// exists only as a doctest annotation), so the failing case lives in its own
// file, e.g. tests/compile_fail/k8s_score_on_linux_host.rs, checked by trybuild.
fn main() {
    let score = K8sResourceScore::new();
    let topology = LinuxHostTopology::new();
    // This line should fail to compile
    harmony_cli::run(Inventory::empty(), topology, vec![Box::new(score)], None);
}

// Should compile - K8sResourceScore on K8sAnywhereTopology
#[test]
fn test_k8s_score_on_k8s_topology() {
    let score = K8sResourceScore::new();
    let topology = K8sAnywhereTopology::from_env();
    // This should compile
    harmony_cli::run(Inventory::empty(), topology, vec![Box::new(score)], None);
}
```
|
||||||
|
|
||||||
|
**Implementation Options:**
|
||||||
|
- `trybuild` crate for compile-time failure tests
|
||||||
|
- Separate `tests/compile_fail/` directory with expected error messages
|
||||||
|
|
||||||
|
### 2. Unit Tests
|
||||||
|
|
||||||
|
Pure Rust logic without external dependencies:
|
||||||
|
- Score serialization/deserialization
|
||||||
|
- Inventory parsing
|
||||||
|
- Type conversions
|
||||||
|
- CRD generation
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- No external services
|
||||||
|
- Sub-second execution
|
||||||
|
- Run on every PR
|
||||||
|
|
||||||
|
### 3. Integration Tests (k3d)
|
||||||
|
|
||||||
|
Deploy to a local k3d cluster:
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
```bash
# Install k3d
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash

# Create cluster
k3d cluster create harmony-test \
  --agents 3 \
  --k3s-arg "--disable=traefik@server:0"

# Wait for ready
kubectl wait --for=condition=Ready nodes --all --timeout=120s
```
|
||||||
|
|
||||||
|
**Test Matrix:**
|
||||||
|
| Example | k3d | Test Type |
|
||||||
|
|---------|-----|-----------|
|
||||||
|
| zitadel | ✅ | Deploy + health check |
|
||||||
|
| cert_manager | ✅ | Deploy + certificate issuance |
|
||||||
|
| monitoring | ✅ | Deploy + metric collection |
|
||||||
|
| postgresql | ✅ | Deploy + database connectivity |
|
||||||
|
| tenant | ✅ | Namespace creation + isolation |
|
||||||
|
|
||||||
|
### 4. Integration Tests (OKD)
|
||||||
|
|
||||||
|
Deploy to OKD/OpenShift cluster:
|
||||||
|
|
||||||
|
**Options:**
|
||||||
|
1. **Nested virtualization**: Run OKD in VMs (slow, expensive)
|
||||||
|
2. **CRC (CodeReady Containers)**: Single-node OKD (resource intensive)
|
||||||
|
3. **Managed OpenShift**: AWS/Azure/GCP (costly)
|
||||||
|
4. **Existing cluster**: Connect to pre-provisioned cluster (fastest)
|
||||||
|
|
||||||
|
**Test Matrix:**
|
||||||
|
| Example | OKD Required | Test Type |
|
||||||
|
|---------|--------------|-----------|
|
||||||
|
| okd_cluster_alerts | ✅ | Alert rule deployment |
|
||||||
|
| rhob_application_monitoring | ✅ | RHOB stack deployment |
|
||||||
|
| operatorhub_catalog | ✅ | Operator installation |
|
||||||
|
|
||||||
|
### 5. End-to-End Tests (Full Infrastructure)
|
||||||
|
|
||||||
|
Complete infrastructure deployment including bare metal:
|
||||||
|
|
||||||
|
**Options:**
|
||||||
|
1. **Libvirt + KVM**: Virtual machines on CI runner
|
||||||
|
2. **Nested KVM**: KVM inside KVM (for cloud CI)
|
||||||
|
3. **Dedicated hardware**: Physical test lab
|
||||||
|
4. **Mock/Hybrid**: Mock physical components, real K8s
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## CI Environment Options
|
||||||
|
|
||||||
|
### Option A: GitHub Actions (Current Standard)
|
||||||
|
|
||||||
|
**Pros:**
|
||||||
|
- Native GitHub integration
|
||||||
|
- Large runner ecosystem
|
||||||
|
- Free for open source
|
||||||
|
|
||||||
|
**Cons:**
|
||||||
|
- Limited nested virtualization support
|
||||||
|
- 6-hour job timeout
|
||||||
|
- Resource constraints on free runners
|
||||||
|
|
||||||
|
**Matrix:**
|
||||||
|
```yaml
strategy:
  matrix:
    os: [ubuntu-latest]
    rust: [stable, beta]
    k8s: [k3d, kind]
    tier: [unit, k3d-integration]
```
|
||||||
|
|
||||||
|
### Option B: Self-Hosted Runners
|
||||||
|
|
||||||
|
**Pros:**
|
||||||
|
- Full control over environment
|
||||||
|
- Can run nested virtualization
|
||||||
|
- No time limits
|
||||||
|
- Persistent state between runs
|
||||||
|
|
||||||
|
**Cons:**
|
||||||
|
- Maintenance overhead
|
||||||
|
- Cost of infrastructure
|
||||||
|
- Security considerations
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- Bare metal servers with KVM support
|
||||||
|
- Pre-installed k3d, kind, CRC
|
||||||
|
- OPNSense VM for network tests
|
||||||
|
|
||||||
|
### Option C: Hybrid (GitHub + Self-Hosted)
|
||||||
|
|
||||||
|
**Pros:**
|
||||||
|
- Fast unit tests on GitHub runners
|
||||||
|
- Heavy tests on self-hosted infrastructure
|
||||||
|
- Cost-effective
|
||||||
|
|
||||||
|
**Cons:**
|
||||||
|
- Two CI systems to maintain
|
||||||
|
- Complexity in test distribution
|
||||||
|
|
||||||
|
### Option D: Cloud CI (CircleCI, GitLab CI, etc.)
|
||||||
|
|
||||||
|
**Pros:**
|
||||||
|
- Often better resource options
|
||||||
|
- Docker-in-Docker support
|
||||||
|
- Better nested virtualization
|
||||||
|
|
||||||
|
**Cons:**
|
||||||
|
- Cost
|
||||||
|
- Less GitHub-native
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Performance Requirements
|
||||||
|
|
||||||
|
### Target Execution Times
|
||||||
|
|
||||||
|
| Test Category | Target Time | Current (est.) |
|
||||||
|
|---------------|-------------|----------------|
|
||||||
|
| Compile-time tests | < 30s | Unknown |
|
||||||
|
| Unit tests | < 60s | Unknown |
|
||||||
|
| k3d integration (per example) | < 120s | 60-300s |
|
||||||
|
| Full k3d matrix | < 15 min | 30-60 min |
|
||||||
|
| OKD integration | < 30 min | 1-2 hours |
|
||||||
|
| Full E2E | < 2 hours | 4-8 hours |
|
||||||
|
|
||||||
|
### Performance Strategies
|
||||||
|
|
||||||
|
1. **Parallel execution**: Run independent tests concurrently
|
||||||
|
2. **Incremental testing**: Only run affected tests on changes
|
||||||
|
3. **Cached clusters**: Pre-warm k3d clusters
|
||||||
|
4. **Layered testing**: Fail fast on cheaper tests
|
||||||
|
5. **Mock external services**: Fake Discord webhooks, etc.
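
Layered testing (strategy 4) can be sketched as a runner that walks the tiers from cheapest to most expensive and stops at the first failure. This is only an illustration under stated assumptions: the `Tier` enum and `run_layered` function are hypothetical names, and the closure stands in for whatever actually executes a tier.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    CompileFail,
    Unit,
    K3d,
    Okd,
    Kvm,
}

/// Runs tiers in the given order and stops at the first failure.
/// Returns the tiers that were started and the failing tier, if any.
fn run_layered<F>(tiers: &[Tier], mut run_tier: F) -> (Vec<Tier>, Option<Tier>)
where
    F: FnMut(Tier) -> Result<(), String>,
{
    let mut ran = Vec::new();
    for &tier in tiers {
        ran.push(tier);
        if run_tier(tier).is_err() {
            // Fail fast: more expensive tiers are never started.
            return (ran, Some(tier));
        }
    }
    (ran, None)
}
```

Ordering the slice as `[CompileFail, Unit, K3d, Okd, Kvm]` means a broken unit test is reported before any cluster is provisioned.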
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Test Data and Secrets Management
|
||||||
|
|
||||||
|
### Secrets Required
|
||||||
|
|
||||||
|
| Secret | Use | Storage |
|
||||||
|
|--------|-----|---------|
|
||||||
|
| Discord webhook URL | Alert receiver tests | GitHub Secrets |
|
||||||
|
| OPNSense credentials | Network tests | Self-hosted only |
|
||||||
|
| Cloud provider creds | Multi-cloud tests | Vault / GitHub Secrets |
|
||||||
|
| TLS certificates | Ingress tests | Generated on-the-fly |
|
||||||
|
|
||||||
|
### Test Data
|
||||||
|
|
||||||
|
| Data | Source | Strategy |
|
||||||
|
|------|--------|----------|
|
||||||
|
| Container images | Public registries | Cache locally |
|
||||||
|
| Helm charts | Public repos | Vendor in repo |
|
||||||
|
| K8s manifests | Generated | Dynamic |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Proposed Test Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────────────────────────┐
|
||||||
|
│ harmony_e2e_tests Package │
|
||||||
|
│ (cargo run -p harmony_e2e_tests) │
|
||||||
|
├─────────────────────────────────────────────────────────────────┤
|
||||||
|
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
|
||||||
|
│ │ Compile │ │ Unit │ │ Compile-Fail Tests │ │
|
||||||
|
│ │ Tests │ │ Tests │ │ (trybuild) │ │
|
||||||
|
│ │ < 30s │ │ < 60s │ │ < 30s │ │
|
||||||
|
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ ┌───────────────────────────────────────────────────────────┐ │
|
||||||
|
│ │ k3d Integration Tests │ │
|
||||||
|
│ │ Self-provisions k3d cluster, runs 22 examples │ │
|
||||||
|
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
|
||||||
|
│ │ │ zitadel │ │ cert-mgr│ │ monitor │ │ postgres│ ... │ │
|
||||||
|
│ │ │ 60s │ │ 90s │ │ 120s │ │ 90s │ │ │
|
||||||
|
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
|
||||||
|
│ │ Parallel Execution │ │
|
||||||
|
│ └───────────────────────────────────────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ ┌───────────────────────────────────────────────────────────┐ │
|
||||||
|
│ │ OKD Integration Tests │ │
|
||||||
|
│ │ Connects to existing OKD cluster or provisions via KVM │ │
|
||||||
|
│ │ ┌─────────────────┐ ┌─────────────────────────────┐ │ │
|
||||||
|
│ │ │ okd_cluster_ │ │ rhob_application_ │ │ │
|
||||||
|
│ │ │ alerts (5 min) │ │ monitoring (10 min) │ │ │
|
||||||
|
│ │ └─────────────────┘ └─────────────────────────────┘ │ │
|
||||||
|
│ └───────────────────────────────────────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ ┌───────────────────────────────────────────────────────────┐ │
|
||||||
|
│ │ KVM-based E2E Tests │ │
|
||||||
|
│ │ Uses Harmony's KVM module to provision test VMs │ │
|
||||||
|
│ │ ┌─────────────────┐ ┌─────────────────────────────┐ │ │
|
||||||
|
│ │ │ okd_installation│ │ Full HA cluster deployment │ │ │
|
||||||
|
│ │ │ (30-60 min) │ │ (60-120 min) │ │ │
|
||||||
|
│ │ └─────────────────┘ └─────────────────────────────┘ │ │
|
||||||
|
│ └───────────────────────────────────────────────────────────┘ │
|
||||||
|
└─────────────────────────────────────────────────────────────────┘
|
||||||
|
|
||||||
|
Any CI system (GitHub Actions, GitLab CI, Jenkins, cron) just runs:
|
||||||
|
cargo run -p harmony_e2e_tests
|
||||||
|
```

```
|
||||||
|
┌─────────────────────────────────────────────────────────────────┐
|
||||||
|
│ GitHub Actions │
|
||||||
|
├─────────────────────────────────────────────────────────────────┤
|
||||||
|
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
|
||||||
|
│ │ Compile │ │ Unit │ │ Compile-Fail Tests │ │
|
||||||
|
│ │ Tests │ │ Tests │ │ (trybuild) │ │
|
||||||
|
│ │ < 30s │ │ < 60s │ │ < 30s │ │
|
||||||
|
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ ┌───────────────────────────────────────────────────────────┐ │
|
||||||
|
│ │ k3d Integration Tests │ │
|
||||||
|
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
|
||||||
|
│ │ │ zitadel │ │ cert-mgr│ │ monitor │ │ postgres│ ... │ │
|
||||||
|
│ │ │ 60s │ │ 90s │ │ 120s │ │ 90s │ │ │
|
||||||
|
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
|
||||||
|
│ │ Parallel Execution │ │
|
||||||
|
│ └───────────────────────────────────────────────────────────┘ │
|
||||||
|
└─────────────────────────────────────────────────────────────────┘
|
||||||
|
|
||||||
|
┌─────────────────────────────────────────────────────────────────┐
|
||||||
|
│ Self-Hosted Runners │
|
||||||
|
├─────────────────────────────────────────────────────────────────┤
|
||||||
|
│ ┌───────────────────────────────────────────────────────────┐ │
|
||||||
|
│ │ OKD Integration Tests │ │
|
||||||
|
│ │ ┌─────────────────┐ ┌─────────────────────────────┐ │ │
|
||||||
|
│ │ │ okd_cluster_ │ │ rhob_application_ │ │ │
|
||||||
|
│ │ │ alerts (5 min) │ │ monitoring (10 min) │ │ │
|
||||||
|
│ │ └─────────────────┘ └─────────────────────────────┘ │ │
|
||||||
|
│ └───────────────────────────────────────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ ┌───────────────────────────────────────────────────────────┐ │
|
||||||
|
│ │ KVM-based E2E Tests (Harmony provisions) │ │
|
||||||
|
│ │ ┌─────────────────────────────────────────────────────┐ │ │
|
||||||
|
│ │ │ Harmony KVM Module provisions test VMs │ │ │
|
||||||
|
│ │ │ - OKD HA Cluster (3 control plane, 2 workers) │ │ │
|
||||||
|
│ │ │ - OPNSense VM (router/firewall) │ │ │
|
||||||
|
│ │ │ - Brocade simulator VM │ │ │
|
||||||
|
│ │ └─────────────────────────────────────────────────────┘ │ │
|
||||||
|
│ └───────────────────────────────────────────────────────────┘ │
|
||||||
|
└─────────────────────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Questions for Researchers
|
||||||
|
|
||||||
|
### Critical Questions
|
||||||
|
|
||||||
|
1. **Self-contained test runner**: How to design `harmony_e2e_tests` package that runs all tests with a single `cargo run` command?
|
||||||
|
|
||||||
|
2. **Nested Virtualization**: What are the prerequisites for running KVM inside a test environment?
|
||||||
|
|
||||||
|
3. **Cost Optimization**: How to minimize cloud costs while running comprehensive E2E tests?
|
||||||
|
|
||||||
|
4. **Test Isolation**: How to ensure test isolation when running parallel k3d tests?
|
||||||
|
|
||||||
|
5. **State Management**: Should we persist k3d clusters between test runs, or create fresh each time?
|
||||||
|
|
||||||
|
6. **Mocking Strategy**: Which external services (Discord, OPNSense, etc.) should be mocked vs. real?
|
||||||
|
|
||||||
|
7. **Compile-Fail Tests**: Best practices for testing Rust compile-time errors?
|
||||||
|
|
||||||
|
8. **Multi-Cluster Tests**: How to efficiently provision and connect multiple K8s clusters in tests?
|
||||||
|
|
||||||
|
9. **Secrets Management**: How to handle secrets for test environments without external CI dependencies?
|
||||||
|
|
||||||
|
10. **Test Flakiness**: Strategies for reducing flakiness in infrastructure tests?
|
||||||
|
|
||||||
|
11. **Reporting**: How to present test results for complex multi-environment test matrices?
|
||||||
|
|
||||||
|
12. **Prerequisite Detection**: How to detect and validate prerequisites (Docker, k3d, KVM) before running tests?
|
||||||
|
|
||||||
|
### Research Areas
|
||||||
|
|
||||||
|
1. **CI/CD Tools**: Evaluate GitHub Actions, GitLab CI, CircleCI, Tekton, Prow for Harmony's needs
|
||||||
|
|
||||||
|
2. **K8s Test Tools**: Evaluate kind, k3d, minikube, microk8s for local testing
|
||||||
|
|
||||||
|
3. **Mock Frameworks**: Evaluate mock-server, wiremock, hoverfly for external service mocking
|
||||||
|
|
||||||
|
4. **Test Frameworks**: Evaluate built-in Rust test, nextest, cargo-tarpaulin for performance
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
### Week 1 (Agentic Velocity)
|
||||||
|
- [ ] Compile-time verification tests working
|
||||||
|
- [ ] Unit tests for monitoring module
|
||||||
|
- [ ] First 5 k3d examples running in CI
|
||||||
|
- [ ] Mock framework for Discord webhooks
|
||||||
|
|
||||||
|
### Week 2
|
||||||
|
- [ ] All 22 k3d-compatible examples in CI
|
||||||
|
- [ ] OKD self-hosted runner operational
|
||||||
|
- [ ] KVM module reviewed and ready for CI
|
||||||
|
|
||||||
|
### Week 3-4
|
||||||
|
- [ ] Full E2E tests with KVM infrastructure
|
||||||
|
- [ ] Multi-cluster tests automated
|
||||||
|
- [ ] All examples tested in CI
|
||||||
|
|
||||||
|
### Month 2
|
||||||
|
- [ ] Sub-15-minute total CI time
|
||||||
|
- [ ] Weekly E2E tests on bare metal
|
||||||
|
- [ ] Documentation complete
|
||||||
|
- [ ] Ready for CNCF submission
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
### Hardware Requirements
|
||||||
|
|
||||||
|
| Component | Minimum | Recommended |
|
||||||
|
|-----------|---------|------------|
|
||||||
|
| CPU | 4 cores | 8+ cores (for parallel tests) |
|
||||||
|
| RAM | 8 GB | 32 GB (for KVM E2E) |
|
||||||
|
| Disk | 50 GB SSD | 500 GB NVMe |
|
||||||
|
| Docker | Required | Latest |
|
||||||
|
| k3d | Required | v5.6.0 |
|
||||||
|
| Kubectl | Required | v1.28.0 |
|
||||||
|
| libvirt | Required | 9.0.0 (for KVM tests) |
|
||||||
|
|
||||||
|
### Software Requirements
|
||||||
|
| Tool | Version |
|
||||||
|
|------|---------|
|
||||||
|
| Rust | 1.75+ |
|
||||||
|
| Docker | 24.0+ |
|
||||||
|
| k3d | v5.6.0+ |
|
||||||
|
| kubectl | v1.28+ |
|
||||||
|
| libvirt | 9.0.0 |
|
||||||
|
|
||||||
|
### Installation (One-time)
|
||||||
|
|
||||||
|
```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install Docker
curl -fsSL https://get.docker.com | sh

# Install k3d
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash

# Install kubectl
curl -LO "https://dl.k8s.io/release/v1.28.0/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Reference Materials
|
||||||
|
### Existing Code
|
||||||
|
|
||||||
|
- Examples: `examples/*/src/main.rs`
|
||||||
|
- Topologies: `harmony/src/domain/topology/`
|
||||||
|
- Capabilities: `harmony/src/domain/topology/` (trait definitions)
|
||||||
|
- Scores: `harmony/src/modules/*/`
|
||||||
|
|
||||||
|
### Documentation
|
||||||
|
|
||||||
|
- [Coding Guide](docs/coding-guide.md)
|
||||||
|
- [Core Concepts](docs/concepts.md)
|
||||||
|
- [Monitoring Architecture](docs/monitoring.md)
|
||||||
|
- [ADR-020: Monitoring](adr/020-monitoring-alerting-architecture.md)
|
||||||
|
|
||||||
|
### Related Projects
|
||||||
|
|
||||||
|
- Crossplane (similar abstraction model)
|
||||||
|
- Pulumi (infrastructure as code)
|
||||||
|
- Terraform (state management patterns)
|
||||||
|
- Flux/ArgoCD (GitOps testing patterns)
|
||||||
201
CI_and_testing_roadmap.md
Normal file
@@ -0,0 +1,201 @@
|
|||||||
|
# Pragmatic CI and Testing Roadmap for Harmony
|
||||||
|
|
||||||
|
**Status**: Active implementation (March 2026)
|
||||||
|
**Core Principle**: Self-contained test runner — no dependency on centralized CI servers
|
||||||
|
|
||||||
|
All tests are executable via one command:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cargo run -p harmony_e2e_tests
|
||||||
|
```
|
||||||
|
|
||||||
|
The `harmony_e2e_tests` package:
|
||||||
|
- Provisions its own infrastructure when needed (k3d, KVM VMs)
|
||||||
|
- Runs all test tiers in sequence or selectively
|
||||||
|
- Reports results in text, JSON or JUnit XML
|
||||||
|
- Works identically on developer laptops, any Linux server, GitHub Actions, GitLab CI, Jenkins, cron jobs, etc.
|
||||||
|
- Is the single source of truth for what "passing CI" means
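
As an illustration of the JUnit XML reporting mentioned above, the sketch below renders a list of results into the commonly used `testsuite` / `testcase` / `failure` layout. `CaseResult`, `xml_escape`, and `to_junit` are hypothetical names, and the exact attributes a given CI system expects may differ, so treat this as a starting point rather than the package's actual reporter.

```rust
/// Outcome of one example run; `failure` holds the error message, if any.
struct CaseResult {
    name: String,
    seconds: f64,
    failure: Option<String>,
}

/// Escapes the five characters that are special in XML attribute values.
fn xml_escape(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
        .replace('\'', "&apos;")
}

/// Renders one test suite as a JUnit-style XML string.
fn to_junit(suite: &str, cases: &[CaseResult]) -> String {
    let failures = cases.iter().filter(|c| c.failure.is_some()).count();
    let mut xml = format!(
        "<testsuite name=\"{}\" tests=\"{}\" failures=\"{}\">\n",
        xml_escape(suite),
        cases.len(),
        failures
    );
    for c in cases {
        xml.push_str(&format!(
            "  <testcase name=\"{}\" time=\"{:.1}\"",
            xml_escape(&c.name),
            c.seconds
        ));
        match &c.failure {
            // Failed cases carry a nested <failure> element.
            Some(msg) => xml.push_str(&format!(
                ">\n    <failure message=\"{}\"/>\n  </testcase>\n",
                xml_escape(msg)
            )),
            // Passing cases are self-closing.
            None => xml.push_str("/>\n"),
        }
    }
    xml.push_str("</testsuite>\n");
    xml
}
```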
|
||||||
|
|
||||||
|
## Why This Approach
|
||||||
|
|
||||||
|
1. **Portability** — same command & behavior everywhere
|
||||||
|
2. **Harmony tests Harmony** — the framework validates itself
|
||||||
|
3. **No vendor lock-in** — GitHub Actions / GitLab CI are just triggers
|
||||||
|
4. **Reproducibility** — developers can rerun any failing tier locally with the same command CI used
|
||||||
|
5. **Offline capable** — after initial setup, most tiers run without internet
|
||||||
|
|
||||||
|
## Architecture: `harmony_e2e_tests` Package
|
||||||
|
|
||||||
|
```
harmony_e2e_tests/
├── Cargo.toml
├── src/
│   ├── main.rs                 # CLI entry point
│   ├── lib.rs                  # Test runner core logic
│   ├── tiers/
│   │   ├── mod.rs
│   │   ├── compile_fail.rs     # trybuild-based compile-time checks
│   │   ├── unit.rs             # cargo test --lib --workspace
│   │   ├── k3d.rs              # k3d cluster + parallel example runs
│   │   ├── okd.rs              # connect to existing OKD cluster
│   │   └── kvm.rs              # full E2E via Harmony's own KVM module
│   ├── mocks/
│   │   ├── mod.rs
│   │   ├── discord.rs          # mock Discord webhook receiver
│   │   └── opnsense.rs         # mock OPNSense firewall API
│   └── infrastructure/
│       ├── mod.rs
│       ├── k3d.rs              # k3d cluster lifecycle
│       └── kvm.rs              # helper wrappers around KVM score
└── tests/
    ├── ui/                     # trybuild compile-fail cases (*.rs + *.stderr)
    └── fixtures/               # static test data / golden files
```
|
||||||
|
|
||||||
|
## CLI Interface (clap-based)
|
||||||
|
|
||||||
|
```bash
# Run everything (default)
cargo run -p harmony_e2e_tests

# Specific tier
cargo run -p harmony_e2e_tests -- --tier k3d
cargo run -p harmony_e2e_tests -- --tier compile

# Filter to one example
cargo run -p harmony_e2e_tests -- --tier k3d --example monitoring

# Parallelism control (k3d tier)
cargo run -p harmony_e2e_tests -- --parallel 8

# Reporting
cargo run -p harmony_e2e_tests -- --report junit.xml
cargo run -p harmony_e2e_tests -- --format json

# Debug helpers
cargo run -p harmony_e2e_tests -- --verbose --dry-run
```
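
The `--tier` flag accepts a comma-separated list in the CI examples (`--tier compile,unit,k3d`). A minimal sketch of parsing that value is shown below. It uses only the standard library rather than clap to stay dependency-free, and the `Tier` enum and `parse_tiers` function are hypothetical names. In a clap-based CLI the same `FromStr` implementation could plausibly serve as the value parser.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Compile,
    Unit,
    K3d,
    Okd,
    Kvm,
}

impl FromStr for Tier {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Tier names match the values used on the command line above.
        match s.trim() {
            "compile" => Ok(Tier::Compile),
            "unit" => Ok(Tier::Unit),
            "k3d" => Ok(Tier::K3d),
            "okd" => Ok(Tier::Okd),
            "kvm" => Ok(Tier::Kvm),
            other => Err(format!("unknown tier: {other}")),
        }
    }
}

/// Parses the value of `--tier`, e.g. "compile,unit,k3d".
/// The first unknown name aborts parsing with an error.
fn parse_tiers(arg: &str) -> Result<Vec<Tier>, String> {
    arg.split(',').map(|s| s.parse::<Tier>()).collect()
}
```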
|
||||||
|
|
||||||
|
## Test Tiers – Ordered by Speed & Cost
|
||||||
|
|
||||||
|
| Tier | Duration target | Runner type | What it tests | Isolation strategy |
|
||||||
|
|------------------|------------------|----------------------|----------------------------------------------------|-----------------------------|
|
||||||
|
| Compile-fail | < 20 s | Any (GitHub free) | Invalid configs don't compile | Per-file trybuild |
|
||||||
|
| Unit | < 60 s | Any | Pure Rust logic | cargo test |
|
||||||
|
| k3d | 8–15 min | GitHub / self-hosted | 22+ k3d-compatible examples | Fresh k3d cluster + ns-per-example |
|
||||||
|
| OKD | 10–30 min | Self-hosted / CRC | OKD-specific features (Routes, Monitoring CRDs…) | Existing cluster via KUBECONFIG |
|
||||||
|
| KVM Full E2E | 60–180 min | Self-hosted bare-metal | Full HA OKD install + bare-metal scenarios | Harmony KVM score provisions VMs |
|
||||||
|
|
||||||
|
### Tier Details & Implementation Notes
|
||||||
|
|
||||||
|
1. **Compile-fail**
|
||||||
|
Uses **`trybuild`** crate (standard in Rust ecosystem).
|
||||||
|
Place intentional compile errors in `tests/ui/*.rs` with matching `*.stderr` expectation files.
|
||||||
|
One test function replaces the old custom loop:
|
||||||
|
|
||||||
|
```rust
#[test]
fn ui() {
    let t = trybuild::TestCases::new();
    t.compile_fail("tests/ui/*.rs");
}
```
|
||||||
|
|
||||||
|
2. **Unit**
|
||||||
|
Simple wrapper: `cargo test --lib --workspace -- --nocapture`
|
||||||
|
Consider `cargo-nextest` later for 2–3× speedup if test count grows.
|
||||||
|
|
||||||
|
3. **k3d**
|
||||||
|
- Provisions isolated cluster once at start (`k3d cluster create --agents 3 --no-lb --k3s-arg "--disable=traefik@server:0"`)
- Discovers examples via a `test-tier = "k3d"` key under `[package.metadata.harmony]` in each example's `Cargo.toml`
|
||||||
|
- Runs in parallel with tokio semaphore (default 5–8 slots)
|
||||||
|
- Each example gets its own namespace
|
||||||
|
- Uses `scopeguard` (its `defer!` macro or an explicit guard) so cleanup also runs on early return or panic
|
||||||
|
- Mocks Discord webhook and OPNSense API
|
||||||
|
|
||||||
|
4. **OKD**
|
||||||
|
Connects to pre-provisioned cluster via `KUBECONFIG`.
|
||||||
|
Validates it is actually OpenShift/OKD before proceeding.
|
||||||
|
|
||||||
|
5. **KVM**
|
||||||
|
Uses **Harmony’s own KVM module** to provision test VMs (control-plane + workers + OPNSense).
|
||||||
|
→ True “dogfooding” — if the E2E fails, the KVM score itself is likely broken.
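
The bounded parallelism and namespace-per-example isolation described for the k3d tier can be sketched as follows. Note the swap: this sketch uses plain `std::thread` workers pulling from a shared queue instead of the tokio semaphore mentioned above, so that it stays self-contained. `run_bounded` and the `e2e-` namespace prefix are hypothetical, and the job closure stands in for the real deploy-and-check step.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `job(example, namespace)` for every example using at most `slots`
/// worker threads. Returns (example, passed) pairs sorted by example name.
fn run_bounded<F>(examples: Vec<String>, slots: usize, job: F) -> Vec<(String, bool)>
where
    F: Fn(&str, &str) -> bool + Send + Sync + 'static,
{
    let queue = Arc::new(Mutex::new(examples));
    let results = Arc::new(Mutex::new(Vec::new()));
    let job = Arc::new(job);

    let workers: Vec<_> = (0..slots.max(1))
        .map(|_| {
            let (queue, results, job) = (queue.clone(), results.clone(), job.clone());
            thread::spawn(move || loop {
                // Take the next example; the lock is released before the job runs.
                let next = queue.lock().unwrap().pop();
                let Some(example) = next else { break };
                // Kubernetes namespace names cannot contain underscores.
                let namespace = format!("e2e-{}", example.replace('_', "-"));
                let ok = job(&example, &namespace);
                results.lock().unwrap().push((example, ok));
            })
        })
        .collect();

    for worker in workers {
        worker.join().unwrap();
    }
    let mut out = results.lock().unwrap().clone();
    out.sort();
    out
}
```

Because each worker only holds the queue lock while popping, slow examples do not block the others, and the number of concurrently running examples never exceeds `slots`.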
|
||||||
|
|
||||||
|
## CI Integration Patterns
|
||||||
|
|
||||||
|
### Fast PR validation (GitHub Actions)
|
||||||
|
|
||||||
|
```yaml
name: Fast Tests
on: [push, pull_request]
jobs:
  fast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - name: Install Docker & k3d
        uses: nolar/setup-k3d-k3s@v1
      - run: cargo run -p harmony_e2e_tests -- --tier compile,unit,k3d --report junit.xml
      - uses: actions/upload-artifact@v4
        with: { name: test-results, path: junit.xml }
```
|
||||||
|
|
||||||
|
### Nightly / Merge heavy tests (self-hosted runner)
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
name: Full E2E
|
||||||
|
on:
|
||||||
|
schedule: [{ cron: "0 3 * * *" }]
|
||||||
|
push: { branches: [main] }
|
||||||
|
jobs:
|
||||||
|
full:
|
||||||
|
runs-on: [self-hosted, linux, x64, kvm-capable]
|
||||||
|
steps:
|
||||||
|
- uses: actions/checkout@v4
|
||||||
|
- run: cargo run -p harmony_e2e_tests -- --tier okd,kvm --verbose --report junit.xml
|
||||||
|
```
|
||||||
|
|
||||||
|
## Prerequisites Auto-Check & Install
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// in harmony_e2e_tests/src/infrastructure/prerequisites.rs
|
||||||
|
async fn ensure_k3d() -> Result<()> { … } // curl | bash if missing
|
||||||
|
async fn ensure_docker() -> Result<()> { … }
|
||||||
|
fn check_kvm_support() -> Result<()> { … } // /dev/kvm + libvirt
|
||||||
|
```
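
The detection half of these checks can be sketched with the standard library alone. This is a hedged illustration rather than the real module: `find_in_path`, `missing_prerequisites`, and `check_kvm_support_at` are hypothetical helpers, the lock file suggests the package itself depends on the `which` crate for PATH lookups, and the libvirt check is left out.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Returns the first `<dir>/<binary>` in `path_var` that is a regular file.
/// Unlike the `which` crate, this does not check the executable bit.
pub fn find_in_path(binary: &str, path_var: &str) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(binary))
        .find(|candidate| candidate.is_file())
}

/// Lists the required binaries that could not be found on the given PATH.
pub fn missing_prerequisites(binaries: &[&str], path_var: &str) -> Vec<String> {
    binaries
        .iter()
        .filter(|binary| find_in_path(binary, path_var).is_none())
        .map(|binary| binary.to_string())
        .collect()
}

/// Checks that the KVM device node exists. The path is a parameter so the
/// check can be exercised without real hardware; callers would pass `/dev/kvm`.
pub fn check_kvm_support_at(device: &Path) -> Result<(), String> {
    if device.exists() {
        Ok(())
    } else {
        Err(format!("{} not found; KVM tier unavailable", device.display()))
    }
}
```

Taking the PATH value and device path as parameters, instead of reading the environment inside the functions, keeps the checks testable on machines without Docker, k3d, or KVM.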
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
### Step 1
|
||||||
|
- [ ] `harmony_e2e_tests` package created & basic CLI working
|
||||||
|
- [ ] trybuild compile-fail suite passing
|
||||||
|
- [ ] First 8–10 k3d examples running reliably in CI
|
||||||
|
- [ ] Mock server for Discord webhook completed
|
||||||
|
|
||||||
|
### Step 2
|
||||||
|
- [ ] All 22 k3d-compatible examples green
|
||||||
|
- [ ] OKD tier running on dedicated self-hosted runner
|
||||||
|
- [ ] JUnit reporting + GitHub check integration
|
||||||
|
- [ ] Namespace isolation + automatic retry on transient k8s errors
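
A rough sketch of what "automatic retry on transient k8s errors" could mean in practice is shown below. It is synchronous and std-only for brevity, whereas the real harness would presumably be async; `retry_transient` and its parameters are hypothetical, and deciding which errors count as transient is left to the caller.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `op` until it succeeds, fails with a non-transient error, or
/// `max_attempts` is reached. The delay doubles after each failed attempt.
pub fn retry_transient<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    is_transient: impl Fn(&E) -> bool,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt: u32 = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retry only when attempts remain and the error is transient.
            Err(err) if attempt < max_attempts && is_transient(&err) => {
                let factor = 2u32.saturating_pow(attempt - 1);
                sleep(base_delay.saturating_mul(factor));
                attempt += 1;
            }
            // Permanent error, or attempts exhausted: surface it unchanged.
            Err(err) => return Err(err),
        }
    }
}
```

Passing the classification in as a closure keeps permanent failures, such as a permission error, from being retried and masked as flakiness.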
|
||||||
|
|
||||||
|
### Step 3
|
||||||
|
- [ ] KVM full E2E green on bare-metal runner (nightly)
|
||||||
|
- [ ] Multi-cluster examples (nats, multisite-postgres) automated
|
||||||
|
- [ ] Total fast CI time < 12 minutes on GitHub runners
|
||||||
|
- [ ] Documentation: “How to add a new tested example”
|
||||||
|
|
||||||
|
## Quick Start for New Contributors
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# One-time setup
|
||||||
|
rustup update stable
|
||||||
|
cargo install cargo-nextest # optional but recommended; trybuild is a dev-dependency, not an installable binary
|
||||||
|
|
||||||
|
# Run locally (most common)
|
||||||
|
cargo run -p harmony_e2e_tests -- --tier k3d --verbose
|
||||||
|
|
||||||
|
# Just compile checks + unit
|
||||||
|
cargo test -p harmony_e2e_tests
|
||||||
|
```
|
||||||
|
|
||||||
624
Cargo.lock
generated
@@ -297,6 +297,12 @@ dependencies = [
|
|||||||
"libc",
|
"libc",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "ansi_term"
|
||||||
|
version = "0.10.2"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "6b3568b48b7cefa6b8ce125f9bb4989e52fbcc29ebea88df04cc7c5f12f70455"
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "anstream"
|
name = "anstream"
|
||||||
version = "0.6.21"
|
version = "0.6.21"
|
||||||
@@ -718,6 +724,41 @@ dependencies = [
|
|||||||
"tokio",
|
"tokio",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "brocade-snmp-server"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"base64 0.22.1",
|
||||||
|
"brocade",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_secret",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"serde",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "brocade-switch"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"async-trait",
|
||||||
|
"brocade",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"serde",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "brotli"
|
name = "brotli"
|
||||||
version = "8.0.2"
|
version = "8.0.2"
|
||||||
@@ -871,6 +912,22 @@ dependencies = [
|
|||||||
"shlex",
|
"shlex",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "cert_manager"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"assert_cmd",
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "cfg-if"
|
name = "cfg-if"
|
||||||
version = "1.0.4"
|
version = "1.0.4"
|
||||||
@@ -1853,6 +1910,12 @@ dependencies = [
|
|||||||
"regex",
|
"regex",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "env_home"
|
||||||
|
version = "0.1.0"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "c7f84e12ccf0a7ddc17a6c41c93326024c42920d7ee630d04950e6926645c0fe"
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "env_logger"
|
name = "env_logger"
|
||||||
version = "0.11.9"
|
version = "0.11.9"
|
||||||
@@ -1929,6 +1992,457 @@ dependencies = [
|
|||||||
name = "example"
|
name = "example"
|
||||||
version = "0.0.0"
|
version = "0.0.0"
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-application-monitoring-with-tenant"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_types",
|
||||||
|
"logging",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-cli"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"assert_cmd",
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-k8s-drain-node"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"assert_cmd",
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony-k8s",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"inquire 0.7.5",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-k8s-write-file-on-node"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"assert_cmd",
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony-k8s",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"inquire 0.7.5",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-kube-rs"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_macros",
|
||||||
|
"http 1.4.0",
|
||||||
|
"inquire 0.7.5",
|
||||||
|
"k8s-openapi",
|
||||||
|
"kube",
|
||||||
|
"log",
|
||||||
|
"serde_yaml",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-lamp"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-monitoring"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-monitoring-with-tenant"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_types",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-multisite-postgres"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-nats"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-nats-module-supercluster"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"k8s-openapi",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-nats-supercluster"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"k8s-openapi",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-node-health"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-ntfy"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-okd-cluster-alerts"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"brocade",
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_secret",
|
||||||
|
"harmony_secret_derive",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"serde",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-okd-install"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"brocade",
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_secret",
|
||||||
|
"harmony_secret_derive",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"schemars 0.8.22",
|
||||||
|
"serde",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-openbao"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-operatorhub-catalogsource"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-opnsense"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"brocade",
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_secret",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"schemars 0.8.22",
|
||||||
|
"serde",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-opnsense-node-exporter"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"async-trait",
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_secret",
|
||||||
|
"harmony_secret_derive",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"serde",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-postgresql"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-public-postgres"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-pxe"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"brocade",
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_secret",
|
||||||
|
"harmony_secret_derive",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"schemars 0.8.22",
|
||||||
|
"serde",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-remove-rook-osd"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"tokio",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-rust"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"base64 0.22.1",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-tenant"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-try-rust-webapp"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"base64 0.22.1",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-tui"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_tui",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example-zitadel"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "example_validate_ceph_cluster_health"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"tokio",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "eyre"
|
name = "eyre"
|
||||||
version = "0.6.12"
|
version = "0.6.12"
|
||||||
@@ -2540,6 +3054,30 @@ dependencies = [
|
|||||||
"tokio",
|
"tokio",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "harmony_e2e_tests"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"async-trait",
|
||||||
|
"chrono",
|
||||||
|
"clap",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"inventory",
|
||||||
|
"k3d-rs",
|
||||||
|
"k8s-openapi",
|
||||||
|
"kube",
|
||||||
|
"log",
|
||||||
|
"serde",
|
||||||
|
"serde_json",
|
||||||
|
"sqlx",
|
||||||
|
"tempfile",
|
||||||
|
"thiserror 2.0.18",
|
||||||
|
"tokio",
|
||||||
|
"tokio-stream",
|
||||||
|
"which",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "harmony_execution"
|
name = "harmony_execution"
|
||||||
version = "0.1.0"
|
version = "0.1.0"
|
||||||
@@ -2569,6 +3107,19 @@ dependencies = [
|
|||||||
"tokio",
|
"tokio",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "harmony_inventory_builder"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"cidr",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "harmony_macros"
|
name = "harmony_macros"
|
||||||
version = "0.1.0"
|
version = "0.1.0"
|
||||||
@@ -3333,6 +3884,15 @@ dependencies = [
|
|||||||
"thiserror 1.0.69",
|
"thiserror 1.0.69",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "inventory"
|
||||||
|
version = "0.3.22"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "009ae045c87e7082cb72dab0ccd01ae075dd00141ddc108f43a0ea150a9e7227"
|
||||||
|
dependencies = [
|
||||||
|
"rustversion",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "ipnet"
|
name = "ipnet"
|
||||||
version = "2.12.0"
|
version = "2.12.0"
|
||||||
@@ -3732,6 +4292,15 @@ dependencies = [
|
|||||||
"log",
|
"log",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "logging"
|
||||||
|
version = "0.1.0"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "461a8beca676e8ab1bd468c92e9b4436d6368e11e96ae038209e520cfe665e46"
|
||||||
|
dependencies = [
|
||||||
|
"ansi_term",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "lru"
|
name = "lru"
|
||||||
version = "0.12.5"
|
version = "0.12.5"
|
||||||
@@ -4954,6 +5523,21 @@ dependencies = [
|
|||||||
"subtle",
|
"subtle",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "rhob-application-monitoring"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"base64 0.22.1",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "ring"
|
name = "ring"
|
||||||
version = "0.17.14"
|
version = "0.17.14"
|
||||||
@@ -5927,6 +6511,7 @@ dependencies = [
|
|||||||
"memchr",
|
"memchr",
|
||||||
"once_cell",
|
"once_cell",
|
||||||
"percent-encoding",
|
"percent-encoding",
|
||||||
|
"rustls 0.23.37",
|
||||||
"serde",
|
"serde",
|
||||||
"serde_json",
|
"serde_json",
|
||||||
"sha2",
|
"sha2",
|
||||||
@@ -5936,6 +6521,7 @@ dependencies = [
|
|||||||
"tokio-stream",
|
"tokio-stream",
|
||||||
"tracing",
|
"tracing",
|
||||||
"url",
|
"url",
|
||||||
|
"webpki-roots 0.26.11",
|
||||||
]
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
@@ -6208,6 +6794,26 @@ dependencies = [
|
|||||||
"syn 2.0.117",
|
"syn 2.0.117",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "sttest"
|
||||||
|
version = "0.1.0"
|
||||||
|
dependencies = [
|
||||||
|
"brocade",
|
||||||
|
"cidr",
|
||||||
|
"env_logger",
|
||||||
|
"harmony",
|
||||||
|
"harmony_cli",
|
||||||
|
"harmony_macros",
|
||||||
|
"harmony_secret",
|
||||||
|
"harmony_secret_derive",
|
||||||
|
"harmony_types",
|
||||||
|
"log",
|
||||||
|
"schemars 0.8.22",
|
||||||
|
"serde",
|
||||||
|
"tokio",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "subtle"
|
name = "subtle"
|
||||||
version = "2.6.1"
|
version = "2.6.1"
|
||||||
@@ -7210,6 +7816,18 @@ dependencies = [
|
|||||||
"rustls-pki-types",
|
"rustls-pki-types",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "which"
|
||||||
|
version = "7.0.3"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "24d643ce3fd3e5b54854602a080f34fb10ab75e0b813ee32d00ca2b44fa74762"
|
||||||
|
dependencies = [
|
||||||
|
"either",
|
||||||
|
"env_home",
|
||||||
|
"rustix 1.1.4",
|
||||||
|
"winsafe",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "whoami"
|
name = "whoami"
|
||||||
version = "1.6.1"
|
version = "1.6.1"
|
||||||
@@ -7585,6 +8203,12 @@ dependencies = [
|
|||||||
"windows-sys 0.48.0",
|
"windows-sys 0.48.0",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "winsafe"
|
||||||
|
version = "0.0.19"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "d135d17ab770252ad95e9a872d365cf3090e3be864a34ab46f48555993efc904"
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "wit-bindgen"
|
name = "wit-bindgen"
|
||||||
version = "0.51.0"
|
version = "0.51.0"
|
||||||
|
|||||||
10
Cargo.toml
@@ -2,6 +2,7 @@
|
|||||||
resolver = "2"
|
resolver = "2"
|
||||||
members = [
|
members = [
|
||||||
"private_repos/*",
|
"private_repos/*",
|
||||||
|
"examples/*",
|
||||||
"harmony",
|
"harmony",
|
||||||
"harmony_types",
|
"harmony_types",
|
||||||
"harmony_macros",
|
"harmony_macros",
|
||||||
@@ -16,9 +17,12 @@ members = [
|
|||||||
"harmony_secret_derive",
|
"harmony_secret_derive",
|
||||||
"harmony_secret",
|
"harmony_secret",
|
||||||
"adr/agent_discovery/mdns",
|
"adr/agent_discovery/mdns",
|
||||||
"brocade",
|
"brocade",
|
||||||
"harmony_agent",
|
"harmony_agent",
|
||||||
"harmony_agent/deploy", "harmony_node_readiness", "harmony-k8s",
|
"harmony_agent/deploy",
|
||||||
|
"harmony_node_readiness",
|
||||||
|
"harmony-k8s",
|
||||||
|
"harmony_e2e_tests",
|
||||||
]
|
]
|
||||||
|
|
||||||
[workspace.package]
|
[workspace.package]
|
||||||
|
|||||||
318
adr/020-monitoring-alerting-architecture.md
Normal file
@@ -0,0 +1,318 @@
|
|||||||
|
# Architecture Decision Record: Monitoring and Alerting Architecture
|
||||||
|
|
||||||
|
Initial Authors: Willem Rolleman, Jean-Gabriel Carrier
|
||||||
|
|
||||||
|
Initial Date: March 9, 2026
|
||||||
|
|
||||||
|
Last Updated Date: March 9, 2026
|
||||||
|
|
||||||
|
## Status
|
||||||
|
|
||||||
|
Accepted
|
||||||
|
|
||||||
|
Supersedes: [ADR-010](010-monitoring-and-alerting.md)
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
Harmony needs a unified approach to monitoring and alerting across different infrastructure targets:
|
||||||
|
|
||||||
|
1. **Cluster-level monitoring**: Administrators managing entire Kubernetes/OKD clusters need to define cluster-wide alerts, receivers, and scrape targets.
|
||||||
|
|
||||||
|
2. **Tenant-level monitoring**: Multi-tenant clusters where teams are confined to namespaces need monitoring scoped to their resources.
|
||||||
|
|
||||||
|
3. **Application-level monitoring**: Developers deploying applications want zero-config monitoring that "just works" for their services.
|
||||||
|
|
||||||
|
The monitoring landscape is fragmented:
|
||||||
|
- **OKD/OpenShift**: Built-in Prometheus with AlertmanagerConfig CRDs
|
||||||
|
- **KubePrometheus**: Helm-based stack with PrometheusRule CRDs
|
||||||
|
- **RHOB (Red Hat Observability)**: Operator-based with MonitoringStack CRDs
|
||||||
|
- **Standalone Prometheus**: Raw Prometheus deployments
|
||||||
|
|
||||||
|
Each system has different CRDs, different installation methods, and different configuration APIs.
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
We implement a **trait-based architecture with compile-time capability verification** that provides:
|
||||||
|
|
||||||
|
1. **Type-safe abstractions** via parameterized traits: `AlertReceiver<S>`, `AlertRule<S>`, `ScrapeTarget<S>`
|
||||||
|
2. **Compile-time topology compatibility** via the `Observability<S>` capability bound
|
||||||
|
3. **Three levels of abstraction**: Cluster, Tenant, and Application monitoring
|
||||||
|
4. **Pre-built alert rules** as functions that return typed structs
|
||||||
|
|
||||||
|
### Core Traits
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// domain/topology/monitoring.rs
|
||||||
|
|
||||||
|
/// Marker trait for systems that send alerts (Prometheus, etc.)
|
||||||
|
pub trait AlertSender: Send + Sync + std::fmt::Debug {
|
||||||
|
fn name(&self) -> String;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Defines how a receiver (Discord, Slack, etc.) builds its configuration
|
||||||
|
/// for a specific sender type
|
||||||
|
pub trait AlertReceiver<S: AlertSender>: std::fmt::Debug + Send + Sync {
|
||||||
|
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError>;
|
||||||
|
fn name(&self) -> String;
|
||||||
|
fn clone_box(&self) -> Box<dyn AlertReceiver<S>>;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Defines how an alert rule builds its PrometheusRule configuration
|
||||||
|
pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
|
||||||
|
fn build_rule(&self) -> Result<serde_json::Value, InterpretError>;
|
||||||
|
fn name(&self) -> String;
|
||||||
|
fn clone_box(&self) -> Box<dyn AlertRule<S>>;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Capability that topologies implement to support monitoring
|
||||||
|
pub trait Observability<S: AlertSender> {
|
||||||
|
async fn install_alert_sender(&self, sender: &S, inventory: &Inventory)
|
||||||
|
-> Result<PreparationOutcome, PreparationError>;
|
||||||
|
async fn install_receivers(&self, sender: &S, inventory: &Inventory,
|
||||||
|
receivers: Option<Vec<Box<dyn AlertReceiver<S>>>>) -> Result<...>;
|
||||||
|
async fn install_rules(&self, sender: &S, inventory: &Inventory,
|
||||||
|
rules: Option<Vec<Box<dyn AlertRule<S>>>>) -> Result<...>;
|
||||||
|
async fn add_scrape_targets(&self, sender: &S, inventory: &Inventory,
|
||||||
|
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>) -> Result<...>;
|
||||||
|
async fn ensure_monitoring_installed(&self, sender: &S, inventory: &Inventory)
|
||||||
|
-> Result<...>;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Alert Sender Types
|
||||||
|
|
||||||
|
Each monitoring stack is a distinct `AlertSender`:
|
||||||
|
|
||||||
|
| Sender | Module | Use Case |
|
||||||
|
|--------|--------|----------|
|
||||||
|
| `OpenshiftClusterAlertSender` | `monitoring/okd/` | OKD/OpenShift built-in monitoring |
|
||||||
|
| `KubePrometheus` | `monitoring/kube_prometheus/` | Helm-deployed kube-prometheus-stack |
|
||||||
|
| `Prometheus` | `monitoring/prometheus/` | Standalone Prometheus via Helm |
|
||||||
|
| `RedHatClusterObservability` | `monitoring/red_hat_cluster_observability/` | RHOB operator |
|
||||||
|
| `Grafana` | `monitoring/grafana/` | Grafana-managed alerting |
|
||||||
|
|
||||||
|
### Three Levels of Monitoring
|
||||||
|
|
||||||
|
#### 1. Cluster-Level Monitoring
|
||||||
|
|
||||||
|
For cluster administrators. Full control over monitoring infrastructure.
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// examples/okd_cluster_alerts/src/main.rs
|
||||||
|
OpenshiftClusterAlertScore {
|
||||||
|
sender: OpenshiftClusterAlertSender,
|
||||||
|
receivers: vec![Box::new(DiscordReceiver { ... })],
|
||||||
|
rules: vec![Box::new(alert_rules)],
|
||||||
|
scrape_targets: Some(vec![Box::new(external_exporters)]),
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Characteristics:**
|
||||||
|
- Cluster-scoped CRDs and resources
|
||||||
|
- Can add external scrape targets (outside cluster)
|
||||||
|
- Manages Alertmanager configuration
|
||||||
|
- Requires cluster-admin privileges
|
||||||
|
|
||||||
|
#### 2. Tenant-Level Monitoring
|
||||||
|
|
||||||
|
For teams confined to namespaces. The topology determines tenant context.
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// The topology's Observability impl handles namespace scoping
|
||||||
|
impl Observability<KubePrometheus> for K8sAnywhereTopology {
|
||||||
|
async fn install_rules(&self, sender: &KubePrometheus, ...) {
|
||||||
|
// Topology knows if it's tenant-scoped
|
||||||
|
let namespace = self.get_tenant_config().await
|
||||||
|
.map(|t| t.name)
|
||||||
|
.unwrap_or_else(|| "default".to_string()); // assumes `name` is a `String`
|
||||||
|
// Install rules in tenant namespace
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Characteristics:**
|
||||||
|
- Namespace-scoped resources
|
||||||
|
- Cannot modify cluster-level monitoring config
|
||||||
|
- May have restricted receiver types
|
||||||
|
- Permissions are validated at runtime (they cannot be fully verified at compile time)
|
||||||
|
|
||||||
|
#### 3. Application-Level Monitoring
|
||||||
|
|
||||||
|
For developers. Zero-config, opinionated monitoring.
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// modules/application/features/monitoring.rs
|
||||||
|
pub struct Monitoring {
|
||||||
|
pub application: Arc<dyn Application>,
|
||||||
|
pub alert_receiver: Vec<Box<dyn AlertReceiver<Prometheus>>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<T: Topology + Observability<Prometheus> + TenantManager + ...>
|
||||||
|
ApplicationFeature<T> for Monitoring
|
||||||
|
{
|
||||||
|
async fn ensure_installed(&self, topology: &T) -> Result<...> {
|
||||||
|
// Auto-creates ServiceMonitor
|
||||||
|
// Auto-installs Ntfy for notifications
|
||||||
|
// Handles tenant namespace automatically
|
||||||
|
// Wires up sensible defaults
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Characteristics:**
|
||||||
|
- Automatic ServiceMonitor creation
|
||||||
|
- Opinionated notification channel (Ntfy)
|
||||||
|
- Tenant-aware via topology
|
||||||
|
- Minimal configuration required
|
||||||
|
|
||||||
|
## Rationale
|
||||||
|
|
||||||
|
### Why Generic Traits Instead of Unified Types?
|
||||||
|
|
||||||
|
Each monitoring stack (OKD, KubePrometheus, RHOB) has fundamentally different CRDs:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// OKD uses AlertmanagerConfig with different structure
|
||||||
|
AlertmanagerConfig { spec: { receivers: [...] } }
|
||||||
|
|
||||||
|
// RHOB uses secret references for webhook URLs
|
||||||
|
MonitoringStack { spec: { alertmanagerConfig: { discordConfigs: [{ apiURL: { key: "..." } }] } } }
|
||||||
|
|
||||||
|
// KubePrometheus uses Alertmanager CRD with different field names
|
||||||
|
Alertmanager { spec: { config: { receivers: [...] } } }
|
||||||
|
```
|
||||||
|
|
||||||
|
A unified type would either:
|
||||||
|
1. Be a lowest-common-denominator (loses stack-specific features)
|
||||||
|
2. Be a complex union type (hard to use, easy to misconfigure)
|
||||||
|
|
||||||
|
Generic traits let each stack express its configuration naturally while providing a consistent interface.
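
A stripped-down sketch of that idea follows. The trait and type names mirror the ones in this ADR, but the signatures are simplified stand-ins (`build` returns a plain string instead of an install plan), and the `webhook_url` and `secret_key` fields and the output strings are invented purely for illustration.

```rust
/// Simplified stand-in for the `AlertSender` marker trait.
pub trait AlertSender {
    fn name(&self) -> String;
}

pub struct KubePrometheus;
pub struct RedHatClusterObservability;

impl AlertSender for KubePrometheus {
    fn name(&self) -> String {
        "kube-prometheus".to_string()
    }
}

impl AlertSender for RedHatClusterObservability {
    fn name(&self) -> String {
        "rhob".to_string()
    }
}

/// Simplified stand-in: the receiver builds its config for one sender type.
pub trait AlertReceiver<S: AlertSender> {
    fn build(&self) -> String;
}

pub struct DiscordReceiver {
    pub name: String,
    pub webhook_url: String, // hypothetical field
    pub secret_key: String,  // hypothetical field
}

// One impl per sender: each stack gets the shape it expects.
impl AlertReceiver<KubePrometheus> for DiscordReceiver {
    fn build(&self) -> String {
        format!("Alertmanager receiver {} url={}", self.name, self.webhook_url)
    }
}

impl AlertReceiver<RedHatClusterObservability> for DiscordReceiver {
    fn build(&self) -> String {
        format!("MonitoringStack receiver {} secretKey={}", self.name, self.secret_key)
    }
}

/// The sender type parameter selects which `build` implementation is used.
pub fn render<S: AlertSender, R: AlertReceiver<S>>(sender: &S, receiver: &R) -> String {
    format!("[{}] {}", sender.name(), receiver.build())
}
```

The same `DiscordReceiver` value produces different output depending on the sender it is paired with, and pairing it with a sender it has no impl for is a compile error rather than a runtime surprise.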
|
||||||
|
|
||||||
|
### Why Compile-Time Capability Bounds?
|
||||||
|
|
||||||
|
```rust
|
||||||
|
impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T>
|
||||||
|
for OpenshiftClusterAlertScore { ... }
|
||||||
|
```
|
||||||
|
|
||||||
|
This fails at compile time if you try to use `OpenshiftClusterAlertScore` with a topology that doesn't support OKD monitoring. This prevents the "config-is-valid-but-platform-is-wrong" errors that Harmony was designed to eliminate.
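
The effect of the bound can be reproduced in a few lines. The sketch below uses the names from this ADR with heavily simplified, synchronous signatures; `OkdTopology`, `K3dTopology`, and the `apply` method are assumptions made for illustration and are not Harmony's real definitions.

```rust
pub trait AlertSender {
    fn name(&self) -> String;
}

pub struct OpenshiftClusterAlertSender;

impl AlertSender for OpenshiftClusterAlertSender {
    fn name(&self) -> String {
        "okd".to_string()
    }
}

pub trait Topology {
    fn name(&self) -> String;
}

/// Capability: a topology that can host monitoring for sender type `S`.
pub trait Observability<S: AlertSender> {
    fn install_alert_sender(&self, sender: &S) -> String;
}

pub trait Score<T: Topology> {
    fn apply(&self, topology: &T) -> String;
}

pub struct OkdTopology; // hypothetical topology with OKD monitoring
pub struct K3dTopology; // hypothetical topology without it

impl Topology for OkdTopology {
    fn name(&self) -> String {
        "okd-topology".to_string()
    }
}

impl Topology for K3dTopology {
    fn name(&self) -> String {
        "k3d-topology".to_string()
    }
}

// Only the OKD topology provides the capability.
impl Observability<OpenshiftClusterAlertSender> for OkdTopology {
    fn install_alert_sender(&self, sender: &OpenshiftClusterAlertSender) -> String {
        format!("{} installed on {}", sender.name(), Topology::name(self))
    }
}

pub struct OpenshiftClusterAlertScore {
    pub sender: OpenshiftClusterAlertSender,
}

// The score exists only for topologies that offer the capability.
impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T>
    for OpenshiftClusterAlertScore
{
    fn apply(&self, topology: &T) -> String {
        topology.install_alert_sender(&self.sender)
    }
}
```

With these definitions, `score.apply(&OkdTopology)` compiles, while `score.apply(&K3dTopology)` is rejected by the compiler because `K3dTopology` does not implement `Observability<OpenshiftClusterAlertSender>`.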

### Why Not a MonitoringStack Abstraction (V2 Approach)?

The V2 approach proposed a unified `MonitoringStack` that hides sender selection:

```rust
// V2 approach - rejected
MonitoringStack::new(MonitoringApiVersion::V2CRD)
    .add_alert_channel(discord)
```

**Problems:**
1. Hides which sender you're using, losing compile-time guarantees
2. "Version selection" actually chooses between fundamentally different systems
3. Would need to handle all stack-specific features through a generic interface

The current approach is explicit: you choose `OpenshiftClusterAlertSender` and the compiler verifies compatibility.

### Why Runtime Validation for Tenants?

Tenant confinement is determined at runtime by the topology and K8s RBAC. We cannot know at compile time whether a user has cluster-admin or namespace-only access.

Options considered:
1. **Compile-time tenant markers** - Would require modeling entire RBAC hierarchy in types. Over-engineering.
2. **Runtime validation** - Current approach. Fails with clear K8s permission errors if insufficient access.
3. **No tenant support** - Would exclude a major use case.

Runtime validation is the pragmatic choice. The failure mode is clear (K8s API error) and occurs early in execution.
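
The "fail early with a clear error" shape can be modeled as follows. This is only a sketch: in Harmony the rejection comes from the K8s API itself, whereas here a hypothetical `Access` enum and `validate_access` function stand in for it.

```rust
use std::fmt;

// Hypothetical access levels and error type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Access {
    ClusterAdmin,
    Namespace(String),
}

#[derive(Debug, PartialEq)]
enum MonitoringError {
    Forbidden { needed: String, namespace: String },
}

impl fmt::Display for MonitoringError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MonitoringError::Forbidden { needed, namespace } => {
                write!(f, "forbidden: {needed} required in namespace {namespace}")
            }
        }
    }
}

// Checked once, up front, before any resources are created.
fn validate_access(access: &Access, target_ns: &str) -> Result<(), MonitoringError> {
    match access {
        Access::ClusterAdmin => Ok(()),
        Access::Namespace(ns) if ns == target_ns => Ok(()),
        Access::Namespace(_) => Err(MonitoringError::Forbidden {
            needed: "namespace access".to_string(),
            namespace: target_ns.to_string(),
        }),
    }
}

fn main() {
    let tenant = Access::Namespace("team-a".to_string());
    println!("{:?}", validate_access(&tenant, "team-a"));
    if let Err(e) = validate_access(&tenant, "monitoring") {
        println!("{e}");
    }
}
```

Modeling this in the type system instead would mean encoding every role and namespace combination as a type, which is the over-engineering the first option describes.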

> Note: we will eventually have compile-time validation for such things. Rust macros are powerful, and we could discover the actual capabilities we're dealing with, similar to the approach sqlx takes in its `query!` macros.

## Consequences

### Pros

1. **Type Safety**: Invalid configurations are caught at compile time
2. **Extensibility**: Adding a new monitoring stack requires implementing traits, not modifying core code
3. **Clear Separation**: Cluster/Tenant/Application levels have distinct entry points
4. **Reusable Rules**: Pre-built alert rules as functions (`high_pvc_fill_rate_over_two_days()`)
5. **CRD Accuracy**: Type definitions match actual Kubernetes CRDs exactly

### Cons

1. **Implementation Explosion**: `DiscordReceiver` implements `AlertReceiver<S>` for each sender type (3+ implementations)
2. **Learning Curve**: Understanding the trait hierarchy takes time
3. **clone_box Boilerplate**: Required for trait object cloning (3 lines per impl)

### Mitigations

- Implementation explosion is contained: each receiver type has O(senders) implementations, but receivers are rare compared to rules
- Learning curve is documented with examples at each level
- `clone_box` boilerplate is minimal and can be copy-pasted

## Alternatives Considered

### Unified MonitoringStack Type

See "Why Not a MonitoringStack Abstraction" above. Rejected for losing compile-time safety.

### Helm-Only Approach

Use `HelmScore` directly for each monitoring deployment. Rejected because:
- No type safety for alert rules
- Cannot compose with application features
- No tenant awareness

### Separate Modules Per Use Case

Have `cluster_monitoring/`, `tenant_monitoring/`, `app_monitoring/` as separate modules. Rejected because:
- Massive code duplication
- No shared abstraction for receivers/rules
- Adding a feature requires three implementations

## Implementation Notes

### Module Structure

```
modules/monitoring/
├── mod.rs                          # Public exports
├── alert_channel/                  # Receivers (Discord, Webhook)
├── alert_rule/                     # Rules and pre-built alerts
│   ├── prometheus_alert_rule.rs
│   └── alerts/                     # Library of pre-built rules
│       ├── k8s/                    # K8s-specific (pvc, pod, memory)
│       └── infra/                  # Infrastructure (opnsense, dell)
├── okd/                            # OpenshiftClusterAlertSender
├── kube_prometheus/                # KubePrometheus
├── prometheus/                     # Prometheus
├── red_hat_cluster_observability/  # RHOB
├── grafana/                        # Grafana
├── application_monitoring/         # Application-level scores
└── scrape_target/                  # External scrape targets
```

### Adding a New Alert Sender

1. Create a sender type: `pub struct MySender; impl AlertSender for MySender { ... }`
2. Implement `Observability<MySender>` for topologies that support it
3. Create CRD types in a `crd/` subdirectory
4. Implement `AlertReceiver<MySender>` for existing receivers
5. Implement `AlertRule<MySender>` for `AlertManagerRuleGroup`

### Adding a New Alert Rule

```rust
pub fn my_custom_alert() -> PrometheusAlertRule {
    PrometheusAlertRule::new("MyAlert", "up == 0")
        .for_duration("5m")
        .label("severity", "critical")
        .annotation("summary", "Service is down")
}
```

No trait implementation needed - `AlertManagerRuleGroup` already handles conversion.

## Related ADRs

- [ADR-013](013-monitoring-notifications.md): Notification channel selection (ntfy)
- [ADR-011](011-multi-tenant-cluster.md): Multi-tenant cluster architecture
@@ -0,0 +1,21 @@
[package]
name = "example-monitoring-v2"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true

[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony-k8s = { path = "../../harmony-k8s" }
harmony_types = { path = "../../harmony_types" }
kube = { workspace = true }
schemars = "0.8"
serde = { workspace = true, features = ["derive"] }
serde_json = { workspace = true }
serde_yaml = { workspace = true }
url = { workspace = true }
log = { workspace = true }
async-trait = { workspace = true }
k8s-openapi = { workspace = true }
@@ -0,0 +1,91 @@
# Monitoring v2 - Improved Architecture

This example demonstrates the improved monitoring architecture that addresses the "WTF/minute" issues in the original design.

## Key Improvements

### 1. **Single AlertChannel Trait with Generic Sender**

The original design required 9-12 implementations for each alert channel (Discord, Webhook, etc.) - one for each sender type. The new design uses a single trait with generic sender parameterization:

pub trait AlertChannel<Sender: AlertSender> {
    async fn install_config(&self, sender: &Sender) -> Result<Outcome, InterpretError>;
    fn name(&self) -> String;
    fn as_any(&self) -> &dyn std::any::Any;
}

**Benefits:**
- One Discord implementation works with all sender types
- Type safety at compile time
- No runtime dispatch overhead

### 2. **MonitoringStack Abstraction**

Instead of manually selecting CRDPrometheus vs KubePrometheus vs RHOBObservability, you now have a unified MonitoringStack that handles versioning:

let monitoring_stack = MonitoringStack::new(MonitoringApiVersion::V2CRD)
    .set_namespace("monitoring")
    .add_alert_channel(discord_receiver)
    .set_scrape_targets(vec![...]);

**Benefits:**
- Single source of truth for monitoring configuration
- Easy to switch between monitoring versions
- Automatic version-specific configuration

### 3. **TenantMonitoringScore - True Composition**

The original monitoring_with_tenant example just put tenant and monitoring as separate items in a vec. The new design truly composes them:

let tenant_score = TenantMonitoringScore::new("test-tenant", monitoring_stack);

This creates a single score that:
- Has tenant context
- Has monitoring configuration
- Automatically installs monitoring scoped to tenant namespace

**Benefits:**
- No more "two separate things" confusion
- Automatic tenant namespace scoping
- Clear ownership: tenant owns its monitoring

### 4. **Versioned Monitoring APIs**

Clear versioning makes it obvious which monitoring stack you're using:

pub enum MonitoringApiVersion {
    V1Helm, // Old Helm charts
    V2CRD,  // Current CRDs
    V3RHOB, // RHOB (future)
}

**Benefits:**
- No guessing which API version you're using
- Easy to migrate between versions
- Backward compatibility path

## Comparison

### Original Design (monitoring_with_tenant)
- Manual selection of each component
- Manual installation of both components
- Need to remember to pass both to harmony_cli::run
- Monitoring not scoped to tenant automatically

### New Design (monitoring_v2)
- Single composed score
- One score does it all

## Usage

cd examples/monitoring_v2
cargo run

## Migration Path

To migrate from the old design to the new:

1. Replace individual alert channel implementations with AlertChannel<Sender>
2. Use MonitoringStack instead of manual *Prometheus selection
3. Use TenantMonitoringScore instead of separate TenantScore + monitoring scores
4. Select monitoring version via MonitoringApiVersion
@@ -0,0 +1,343 @@
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use log::debug;
use serde::{Deserialize, Serialize};
use serde_yaml::{Mapping, Value};

use harmony::data::Version;
use harmony::interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome};
use harmony::inventory::Inventory;
use harmony::score::Score;
use harmony::topology::{Topology, tenant::TenantManager};

use harmony_k8s::K8sClient;
use harmony_types::k8s_name::K8sName;
use harmony_types::net::Url;

pub trait AlertSender: Send + Sync + std::fmt::Debug {
    fn name(&self) -> String;
    fn namespace(&self) -> String;
}

#[derive(Debug)]
pub struct CRDPrometheus {
    pub namespace: String,
    pub client: Arc<K8sClient>,
}

impl AlertSender for CRDPrometheus {
    fn name(&self) -> String {
        "CRDPrometheus".to_string()
    }

    fn namespace(&self) -> String {
        self.namespace.clone()
    }
}

#[derive(Debug)]
pub struct RHOBObservability {
    pub namespace: String,
    pub client: Arc<K8sClient>,
}

impl AlertSender for RHOBObservability {
    fn name(&self) -> String {
        "RHOBObservability".to_string()
    }

    fn namespace(&self) -> String {
        self.namespace.clone()
    }
}

#[derive(Debug)]
pub struct KubePrometheus {
    pub config: Arc<Mutex<KubePrometheusConfig>>,
}

impl Default for KubePrometheus {
    fn default() -> Self {
        Self::new()
    }
}

impl KubePrometheus {
    pub fn new() -> Self {
        Self {
            config: Arc::new(Mutex::new(KubePrometheusConfig::new())),
        }
    }
}

impl AlertSender for KubePrometheus {
    fn name(&self) -> String {
        "KubePrometheus".to_string()
    }

    fn namespace(&self) -> String {
        self.config.lock().unwrap().namespace.clone().unwrap_or_else(|| "monitoring".to_string())
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct KubePrometheusConfig {
    pub namespace: Option<String>,
    #[serde(skip)]
    pub alert_receiver_configs: Vec<AlertManagerChannelConfig>,
}

impl KubePrometheusConfig {
    pub fn new() -> Self {
        Self {
            namespace: None,
            alert_receiver_configs: Vec::new(),
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AlertManagerChannelConfig {
    pub channel_receiver: serde_yaml::Value,
    pub channel_route: serde_yaml::Value,
}

impl Default for AlertManagerChannelConfig {
    fn default() -> Self {
        Self {
            channel_receiver: serde_yaml::Value::Mapping(Default::default()),
            channel_route: serde_yaml::Value::Mapping(Default::default()),
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ScrapeTargetConfig {
    pub service_name: String,
    pub port: String,
    pub path: String,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum MonitoringApiVersion {
    V1Helm,
    V2CRD,
    V3RHOB,
}

#[derive(Debug, Clone)]
pub struct MonitoringStack {
    pub version: MonitoringApiVersion,
    pub namespace: String,
    pub alert_channels: Vec<Arc<dyn AlertSender>>,
    pub scrape_targets: Vec<ScrapeTargetConfig>,
}

impl MonitoringStack {
    pub fn new(version: MonitoringApiVersion) -> Self {
        Self {
            version,
            namespace: "monitoring".to_string(),
            alert_channels: Vec::new(),
            scrape_targets: Vec::new(),
        }
    }

    pub fn set_namespace(mut self, namespace: &str) -> Self {
        self.namespace = namespace.to_string();
        self
    }

    pub fn add_alert_channel(mut self, channel: impl AlertSender + 'static) -> Self {
        self.alert_channels.push(Arc::new(channel));
        self
    }

    pub fn set_scrape_targets(mut self, targets: Vec<(&str, &str, String)>) -> Self {
        self.scrape_targets = targets
            .into_iter()
            .map(|(name, port, path)| ScrapeTargetConfig {
                service_name: name.to_string(),
                port: port.to_string(),
                path,
            })
            .collect();
        self
    }
}

pub trait AlertChannel<Sender: AlertSender> {
    fn install_config(&self, sender: &Sender);
    fn name(&self) -> String;
}

#[derive(Debug, Clone)]
pub struct DiscordWebhook {
    pub name: K8sName,
    pub url: Url,
    pub selectors: Vec<HashMap<String, String>>,
}

impl DiscordWebhook {
    fn get_config(&self) -> AlertManagerChannelConfig {
        let mut route = Mapping::new();
        route.insert(
            Value::String("receiver".to_string()),
            Value::String(self.name.to_string()),
        );
        route.insert(
            Value::String("matchers".to_string()),
            Value::Sequence(vec![Value::String("alertname!=Watchdog".to_string())]),
        );

        let mut receiver = Mapping::new();
        receiver.insert(
            Value::String("name".to_string()),
            Value::String(self.name.to_string()),
        );

        let mut discord_config = Mapping::new();
        discord_config.insert(
            Value::String("webhook_url".to_string()),
            Value::String(self.url.to_string()),
        );

        receiver.insert(
            Value::String("discord_configs".to_string()),
            Value::Sequence(vec![Value::Mapping(discord_config)]),
        );

        AlertManagerChannelConfig {
            channel_receiver: Value::Mapping(receiver),
            channel_route: Value::Mapping(route),
        }
    }
}

impl AlertChannel<CRDPrometheus> for DiscordWebhook {
    fn install_config(&self, sender: &CRDPrometheus) {
        debug!("Installing Discord webhook for CRDPrometheus in namespace: {}", sender.namespace());
        debug!("Config: {:?}", self.get_config());
        debug!("Installed!");
    }

    fn name(&self) -> String {
        "discord-webhook".to_string()
    }
}

impl AlertChannel<RHOBObservability> for DiscordWebhook {
    fn install_config(&self, sender: &RHOBObservability) {
        debug!("Installing Discord webhook for RHOBObservability in namespace: {}", sender.namespace());
        debug!("Config: {:?}", self.get_config());
        debug!("Installed!");
    }

    fn name(&self) -> String {
        "webhook-receiver".to_string()
    }
}

impl AlertChannel<KubePrometheus> for DiscordWebhook {
    fn install_config(&self, sender: &KubePrometheus) {
        debug!("Installing Discord webhook for KubePrometheus in namespace: {}", sender.namespace());
        // Take the lock once: std::sync::Mutex is not reentrant, so locking it
        // again while the first guard is still alive would deadlock or panic.
        let mut config = sender.config.lock().unwrap();
        let ns = config.namespace.clone().unwrap_or_else(|| "monitoring".to_string());
        debug!("Namespace: {}", ns);
        config.alert_receiver_configs.push(self.get_config());
        debug!("Installed!");
    }

    fn name(&self) -> String {
        "discord-webhook".to_string()
    }
}

fn default_monitoring_stack() -> MonitoringStack {
    MonitoringStack::new(MonitoringApiVersion::V2CRD)
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TenantMonitoringScore {
    pub tenant_id: harmony_types::id::Id,
    pub tenant_name: String,
    #[serde(skip)]
    #[serde(default = "default_monitoring_stack")]
    pub monitoring_stack: MonitoringStack,
}

impl TenantMonitoringScore {
    pub fn new(tenant_name: &str, monitoring_stack: MonitoringStack) -> Self {
        Self {
            tenant_id: harmony_types::id::Id::default(),
            tenant_name: tenant_name.to_string(),
            monitoring_stack,
        }
    }
}

impl<T: Topology + TenantManager> Score<T> for TenantMonitoringScore {
    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(TenantMonitoringInterpret {
            score: self.clone(),
        })
    }

    fn name(&self) -> String {
        format!("{} monitoring [TenantMonitoringScore]", self.tenant_name)
    }
}

#[derive(Debug)]
pub struct TenantMonitoringInterpret {
    pub score: TenantMonitoringScore,
}

#[async_trait::async_trait]
impl<T: Topology + TenantManager> Interpret<T> for TenantMonitoringInterpret {
    async fn execute(
        &self,
        _inventory: &Inventory,
        topology: &T,
    ) -> Result<Outcome, InterpretError> {
        let tenant_config = topology.get_tenant_config().await.unwrap();
        let tenant_ns = tenant_config.name.clone();

        match self.score.monitoring_stack.version {
            MonitoringApiVersion::V1Helm => {
                debug!("Installing Helm monitoring for tenant {}", tenant_ns);
            }
            MonitoringApiVersion::V2CRD => {
                debug!("Installing CRD monitoring for tenant {}", tenant_ns);
            }
            MonitoringApiVersion::V3RHOB => {
                debug!("Installing RHOB monitoring for tenant {}", tenant_ns);
            }
        }

        Ok(Outcome::success(format!(
            "Installed monitoring stack for tenant {} with version {:?}",
            self.score.tenant_name,
            self.score.monitoring_stack.version
        )))
    }

    fn get_name(&self) -> InterpretName {
        InterpretName::Custom("TenantMonitoringInterpret")
    }

    fn get_version(&self) -> Version {
        Version::from("1.0.0").unwrap()
    }

    fn get_status(&self) -> InterpretStatus {
        InterpretStatus::SUCCESS
    }

    fn get_children(&self) -> Vec<harmony_types::id::Id> {
        Vec::new()
    }
}
@@ -31,3 +31,16 @@ Ready to build your own components? These guides show you how.
- [**Writing a Score**](./guides/writing-a-score.md): Learn how to create your own `Score` and `Interpret` logic to define a new desired state.
- [**Writing a Topology**](./guides/writing-a-topology.md): Learn how to model a new environment (like AWS, GCP, or custom hardware) as a `Topology`.
- [**Adding Capabilities**](./guides/adding-capabilities.md): See how to add a `Capability` to your custom `Topology`.
- [**Coding Guide**](./coding-guide.md): Conventions and best practices for writing Harmony code.

## 5. Module Documentation

Deep dives into specific Harmony modules and features.

- [**Monitoring and Alerting**](./monitoring.md): Comprehensive guide to cluster, tenant, and application-level monitoring with support for OKD, KubePrometheus, RHOB, and more.

## 6. Architecture Decision Records

Important architectural decisions are documented in the `adr/` directory:

- [Full ADR Index](../adr/)
299
docs/coding-guide.md
Normal file
@@ -0,0 +1,299 @@
# Harmony Coding Guide

Harmony is an infrastructure automation framework. It is **code-first and code-only**: operators write Rust programs to declare and drive infrastructure, rather than YAML files or DSL configs. Good code here means a good operator experience.

### Concrete context

This guide uses the KVM module as a running example of the coding style. The same conventions should translate well to the other modules and contexts Harmony manages, such as OPNsense and Kubernetes.

## Core Philosophy

### The Careful Craftsman Principle

Harmony is a powerful framework that does a lot. With that power comes responsibility. Every abstraction, every trait, every module must earn its place. Before adding anything, ask:

1. **Does this solve a real problem users have?** Not a theoretical problem, an actual one encountered in production.
2. **Is this the simplest solution that works?** Complexity is a cost that compounds over time.
3. **Will this make the next developer's life easier or harder?** Code is read far more often than written.

When in doubt, don't abstract. Wait for the pattern to emerge from real usage. A little duplication is better than the wrong abstraction.

### High-level functions over raw primitives

Callers should not need to know about underlying protocols, XML schemas, or API quirks. A function that deploys a VM should accept meaningful parameters like CPU count, memory, and network name, not XML strings.

```rust
// Bad: caller constructs XML and passes it to a thin wrapper
let xml = format!(r#"<domain type='kvm'>...</domain>"#, name, memory_kb, ...);
executor.create_vm(&xml).await?;

// Good: caller describes intent, the module handles representation
executor.define_vm(&VmConfig::builder("my-vm")
    .cpu(4)
    .memory_gb(8)
    .disk(DiskConfig::new(50))
    .network(NetworkRef::named("mylan"))
    .boot_order([BootDevice::Network, BootDevice::Disk])
    .build())
    .await?;
```

The module, not the caller, owns the XML, the virsh invocations, and the API calls.

### Use the right abstraction layer

Prefer native library bindings over shelling out to CLI tools. The `virt` crate provides direct libvirt bindings and should be used instead of spawning `virsh` subprocesses.

- CLI subprocess calls are fragile: stdout/stderr parsing, exit codes, quoting, PATH differences
- Native bindings give typed errors, no temp files, no shell escaping
- `virt::connect::Connect` opens a connection; `virt::domain::Domain` manages VMs; `virt::network::Network` manages virtual networks

### Keep functions small and well-named

Each function should do one thing. If a function is doing two conceptually separate things, split it. Function names should read like plain English: `ensure_network_active`, `define_vm`, `vm_is_running`.

### Prefer short modules over large files

Group related types and functions by concept. A module that handles one resource (e.g., network, domain, storage) is better than a single file for everything.

---

## Error Handling

### Use `thiserror` for all error types

Define error types with `thiserror::Error`. This removes the boilerplate of implementing `Display` and `std::error::Error` by hand, keeps error messages close to their variants, and makes types easy to extend.

```rust
// Bad: hand-rolled Display + std::error::Error
#[derive(Debug)]
pub enum KVMError {
    ConnectionError(String),
    VMNotFound(String),
}

impl std::fmt::Display for KVMError { ... }
impl std::error::Error for KVMError {}

// Good: derive Display via thiserror
#[derive(thiserror::Error, Debug)]
pub enum KvmError {
    #[error("connection failed: {0}")]
    ConnectionFailed(String),
    #[error("VM not found: {name}")]
    VmNotFound { name: String },
}
```

### Make bubbling errors easy with `?` and `From`

`?` works on any error type for which there is a `From` impl. Add `From` conversions from lower-level errors into your module's error type so callers can use `?` without boilerplate.

With `thiserror`, wrapping a foreign error is one line:

```rust
#[derive(thiserror::Error, Debug)]
pub enum KvmError {
    #[error("libvirt error: {0}")]
    Libvirt(#[from] virt::error::Error),

    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
}
```

This means a call that returns `virt::error::Error` can be `?`-propagated into a `Result<_, KvmError>` without any `.map_err(...)`.

### Typed errors over stringly-typed errors

Avoid `Box<dyn Error>` or `String` as error return types in library code. Callers need to distinguish errors programmatically: `KvmError::VmAlreadyExists` is actionable, `"VM already exists: foo"` as a `String` is not.

At binary entry points (e.g., `main`) it is acceptable to convert to `String` or `anyhow::Error` for display.
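
To make both points concrete without external crates, the sketch below writes out by hand roughly the conversion that `#[from]` generates, then matches on a typed variant at the call site. `ConfigError`, `parse_cpu_count`, and the limit of 64 are hypothetical and chosen only for illustration.

```rust
use std::fmt;
use std::num::ParseIntError;

// Hypothetical module error; `Parse` wraps a lower-level error.
#[derive(Debug)]
enum ConfigError {
    Parse(ParseIntError),
    OutOfRange { value: u32, max: u32 },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Parse(e) => write!(f, "invalid number: {e}"),
            ConfigError::OutOfRange { value, max } => {
                write!(f, "value {value} exceeds maximum {max}")
            }
        }
    }
}

impl std::error::Error for ConfigError {}

// Hand-written equivalent of a `#[from]` attribute: this is what lets `?`
// convert the lower-level error automatically.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::Parse(e)
    }
}

fn parse_cpu_count(raw: &str) -> Result<u32, ConfigError> {
    let value: u32 = raw.trim().parse()?; // ParseIntError -> ConfigError via From
    if value > 64 {
        return Err(ConfigError::OutOfRange { value, max: 64 });
    }
    Ok(value)
}

fn main() {
    for raw in ["4", "abc", "128"] {
        match parse_cpu_count(raw) {
            Ok(n) => println!("{raw}: {n} CPUs"),
            // A typed variant lets the caller react to one specific failure.
            Err(ConfigError::OutOfRange { value, max }) => {
                println!("{raw}: {value} is above the limit of {max}")
            }
            Err(e) => println!("{raw}: {e}"),
        }
    }
}
```

With a `String` error, the `OutOfRange` arm would have to parse the message text to recover `value` and `max`.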

---

## Logging

### Use the `log` crate macros

All log output must go through the `log` crate. Never use `println!`, `eprintln!`, or `dbg!` in library code. This makes output compatible with any logging backend (env_logger, tracing, structured logging, etc.).

```rust
// Bad
println!("Creating VM: {}", name);

// Good
use log::{info, debug, warn};
info!("Creating VM: {name}");
debug!("VM XML:\n{xml}");
warn!("Network already active, skipping creation");
```

Use the right level:

| Level   | When to use |
|---------|-------------|
| `error` | Unrecoverable failures (before returning Err) |
| `warn`  | Recoverable issues, skipped steps |
| `info`  | High-level progress events visible in normal operation |
| `debug` | Detailed operational info useful for debugging |
| `trace` | Very granular, per-iteration or per-call data |

Log before significant operations and after unexpected conditions. Do not log inside tight loops at `info` level.

---

## Types and Builders

### Derive `Serialize` on all public domain types

All public structs and enums that represent configuration or state should derive `serde::Serialize`. Add `Deserialize` when round-trip serialization is needed.

### Builder pattern for complex configs

When a type has more than three fields or optional fields, provide a builder. The builder pattern allows named, incremental construction without positional arguments.

```rust
let config = VmConfig::builder("bootstrap")
    .cpu(4)
    .memory_gb(8)
    .disk(DiskConfig::new(50).labeled("os"))
    .disk(DiskConfig::new(100).labeled("data"))
    .network(NetworkRef::named("harmonylan"))
    .boot_order([BootDevice::Network, BootDevice::Disk])
    .build();
```

### Avoid `pub` fields on config structs

Expose data through methods or the builder, not raw field access. This preserves the ability to validate, rename, or change representation without breaking callers.
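
A minimal sketch of a builder over private fields follows. It is a cut-down, hypothetical `VmConfig` with only three fields, and the defaults and the "at least one CPU" rule are assumptions made for the example, not the real module's behavior.

```rust
// Hypothetical, simplified VM config; not the real module types.
#[derive(Debug, Clone)]
pub struct VmConfig {
    name: String,
    cpu: u32,
    memory_mb: u64,
}

impl VmConfig {
    pub fn builder(name: &str) -> VmConfigBuilder {
        // Assumed defaults for the sketch: 1 CPU, 1 GiB of memory.
        VmConfigBuilder { name: name.to_string(), cpu: 1, memory_mb: 1024 }
    }

    // Read access goes through methods, so the fields can change later.
    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn cpu(&self) -> u32 {
        self.cpu
    }

    pub fn memory_mb(&self) -> u64 {
        self.memory_mb
    }
}

pub struct VmConfigBuilder {
    name: String,
    cpu: u32,
    memory_mb: u64,
}

impl VmConfigBuilder {
    pub fn cpu(mut self, count: u32) -> Self {
        // Validation lives in one place because callers cannot set the field.
        self.cpu = count.max(1);
        self
    }

    pub fn memory_gb(mut self, gb: u64) -> Self {
        // The internal unit (MiB) is an implementation detail.
        self.memory_mb = gb * 1024;
        self
    }

    pub fn build(self) -> VmConfig {
        VmConfig { name: self.name, cpu: self.cpu, memory_mb: self.memory_mb }
    }
}

fn main() {
    let config = VmConfig::builder("bootstrap").cpu(4).memory_gb(8).build();
    println!("{} {} {}", config.name(), config.cpu(), config.memory_mb());
}
```

Because `memory_mb` is private, switching the internal unit later would only touch `memory_gb` and the accessor, not any caller.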
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Async
|
||||||
|
|
||||||
|
### Use `tokio` for all async runtime needs
|
||||||
|
|
||||||
|
All async code runs on tokio. Use `tokio::spawn`, `tokio::time`, etc. Use `#[async_trait]` for traits with async methods.
|
||||||
|
|
||||||
|
### No blocking in async context
|
||||||
|
|
||||||
|
Never call blocking I/O (file I/O, network, process spawn) directly in an async function. Use `tokio::fs`, `tokio::process`, or `tokio::task::spawn_blocking` as appropriate.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Module Structure
|
||||||
|
|
||||||
|
### Follow the `Score` / `Interpret` pattern
|
||||||
|
|
||||||
|
Modules that represent deployable infrastructure should implement `Score<T: Topology>` and `Interpret<T>`:
|
||||||
|
|
||||||
|
- `Score` is the serializable, clonable configuration declaring *what* to deploy
|
||||||
|
- `Interpret` does the actual work when `execute()` is called
|
||||||
|
|
||||||
|
```rust
|
||||||
|
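// `Clone` is needed for `self.clone()` below; `Serialize` follows the rule for domain types.
#[derive(Debug, Clone, serde::Serialize)]
pub struct KvmScore {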
|
||||||
|
network: NetworkConfig,
|
||||||
|
vms: Vec<VmConfig>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<T: Topology + KvmHost> Score<T> for KvmScore {
|
||||||
|
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
||||||
|
Box::new(KvmInterpret::new(self.clone()))
|
||||||
|
}
|
||||||
|
fn name(&self) -> String { "KvmScore".to_string() }
|
||||||
|
}
|
||||||
|
```
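
For readers who want to see the whole round trip, the following is a simplified, synchronous sketch of the pattern using only the standard library. The trait shapes here are assumptions made for illustration (Harmony's real traits are async and take more parameters), and `LocalLibvirt` and `define_vm` are hypothetical names.

```rust
// Simplified sketch of Score / Interpret with a capability bound.
// `'static` on the trait keeps the boxed trait object below straightforward.
pub trait Topology: 'static {
    fn name(&self) -> String;
}

/// Hypothetical capability: a topology that can host KVM guests.
pub trait KvmHost {
    fn define_vm(&self, vm_name: &str) -> Result<String, String>;
}

/// Does the actual work against a concrete topology.
pub trait Interpret<T: Topology> {
    fn execute(&self, topology: &T) -> Result<Vec<String>, String>;
}

/// Declares *what* to deploy and knows how to create its interpret.
pub trait Score<T: Topology> {
    fn create_interpret(&self) -> Box<dyn Interpret<T>>;
    fn name(&self) -> String;
}

#[derive(Debug, Clone)]
pub struct KvmScore {
    pub vms: Vec<String>,
}

struct KvmInterpret {
    score: KvmScore,
}

impl<T: Topology + KvmHost> Interpret<T> for KvmInterpret {
    fn execute(&self, topology: &T) -> Result<Vec<String>, String> {
        // Stops at the first failing VM and returns its error.
        self.score.vms.iter().map(|vm| topology.define_vm(vm)).collect()
    }
}

impl<T: Topology + KvmHost> Score<T> for KvmScore {
    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(KvmInterpret { score: self.clone() })
    }
    fn name(&self) -> String { "KvmScore".to_string() }
}

/// Hypothetical topology that provides the `KvmHost` capability.
pub struct LocalLibvirt;

impl Topology for LocalLibvirt {
    fn name(&self) -> String { "local-libvirt".to_string() }
}

impl KvmHost for LocalLibvirt {
    fn define_vm(&self, vm_name: &str) -> Result<String, String> {
        Ok(format!("defined {vm_name} on {}", self.name()))
    }
}
```

A topology that does not implement `KvmHost` cannot be paired with `KvmScore` at all, which is the point of putting the capability in the `impl` bound rather than checking it at runtime.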
|
||||||
|
|
||||||
|
### Flatten the public API in `mod.rs`
|
||||||
|
|
||||||
|
Internal submodules are implementation detail. Re-export what callers need at the module root:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// modules/kvm/mod.rs
|
||||||
|
mod connection;
|
||||||
|
mod domain;
|
||||||
|
mod network;
|
||||||
|
mod error;
|
||||||
|
mod xml;
|
||||||
|
|
||||||
|
pub use connection::KvmConnection;
|
||||||
|
pub use domain::{VmConfig, VmConfigBuilder, VmStatus, DiskConfig, BootDevice};
|
||||||
|
pub use error::KvmError;
|
||||||
|
pub use network::NetworkConfig;
|
||||||
|
```
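
The effect of this layout can be tried in a single file with inline modules. The sketch below is a hypothetical miniature: the names mirror the listing above, but the bodies and the `lookup` helper are placeholders invented for the example.

```rust
pub mod kvm {
    // Private submodules: code outside `kvm` cannot name `kvm::domain` or `kvm::error`.
    mod domain {
        #[derive(Debug, Clone, PartialEq)]
        pub struct VmConfig {
            name: String,
        }

        impl VmConfig {
            pub fn new(name: &str) -> Self {
                Self { name: name.to_string() }
            }
            pub fn name(&self) -> &str {
                &self.name
            }
        }
    }

    mod error {
        #[derive(Debug, PartialEq)]
        pub enum KvmError {
            NotFound(String),
        }
    }

    // The re-exports are the entire public surface of the module.
    pub use self::domain::VmConfig;
    pub use self::error::KvmError;

    /// Placeholder helper so the example has something to call.
    pub fn lookup(vms: &[VmConfig], name: &str) -> Result<VmConfig, KvmError> {
        vms.iter()
            .find(|vm| vm.name() == name)
            .cloned()
            .ok_or_else(|| KvmError::NotFound(name.to_string()))
    }
}
```

Callers write `kvm::VmConfig` and `kvm::KvmError`; moving `VmConfig` to another submodule later only changes the `pub use` line.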
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Commit Style
|
||||||
|
|
||||||
|
Follow [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/):
|
||||||
|
|
||||||
|
```
|
||||||
|
feat(kvm): add network isolation support
|
||||||
|
fix(kvm): correct memory unit conversion for libvirt
|
||||||
|
refactor(kvm): replace virsh subprocess calls with virt crate bindings
|
||||||
|
docs: add coding guide
|
||||||
|
```
|
||||||
|
|
||||||
|
Keep pull requests small and single-purpose (under ~200 lines excluding generated code). Do not mix refactoring, bug fixes, and new features in one PR.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## When to Add Abstractions
|
||||||
|
|
||||||
|
Harmony provides powerful abstraction mechanisms: traits, generics, the Score/Interpret pattern, and capabilities. Use them judiciously.
|
||||||
|
|
||||||
|
### Add an abstraction when:
|
||||||
|
|
||||||
|
- **You have three or more concrete implementations** doing the same thing. Two is often coincidence; three is a pattern.
|
||||||
|
- **The abstraction provides compile-time safety** that prevents real bugs (e.g., capability bounds on topologies).
|
||||||
|
- **The abstraction hides genuine complexity** that callers shouldn't need to understand (e.g., XML schema generation for libvirt).
|
||||||
|
|
||||||
|
### Don't add an abstraction when:
|
||||||
|
|
||||||
|
- **It's just to avoid a few lines of boilerplate**. Copy-paste is sometimes better than a trait hierarchy.
|
||||||
|
- **You're anticipating future flexibility** that isn't needed today. YAGNI (You Aren't Gonna Need It).
|
||||||
|
- **The abstraction makes the code harder to understand** for someone unfamiliar with the codebase.
|
||||||
|
- **You're wrapping a single implementation**. A trait with one implementation is usually over-engineering.
|
||||||
|
|
||||||
|
### Signs you've over-abstracted:
|
||||||
|
|
||||||
|
- You need to explain the type system to a competent Rust developer for them to understand how to add a simple feature.
|
||||||
|
- Adding a new concrete type requires changes in multiple trait definitions.
|
||||||
|
- The word "factory" or "manager" appears in your type names.
|
||||||
|
- You have more trait definitions than concrete implementations.
|
||||||
|
|
||||||
|
### The Rule of Three for Traits
|
||||||
|
|
||||||
|
Before creating a new trait, ensure you have:
|
||||||
|
|
||||||
|
1. A clear, real use case (not hypothetical)
|
||||||
|
2. At least one concrete implementation
|
||||||
|
3. A plan for how callers will use it
|
||||||
|
|
||||||
|
Only generalize when the pattern is proven. The monitoring module is a good example: we had multiple alert senders (OKD, KubePrometheus, RHOB) before we introduced the `AlertSender` and `AlertReceiver<S>` traits. The traits emerged from real needs, not design sessions.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Documentation
|
||||||
|
|
||||||
|
### Document the "why", not the "what"
|
||||||
|
|
||||||
|
Code should be self-explanatory for the "what". Comments and documentation should explain intent, rationale, and gotchas.
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// Bad: restates the code
|
||||||
|
// Returns the number of VMs
|
||||||
|
fn vm_count(&self) -> usize { self.vms.len() }
|
||||||
|
|
||||||
|
// Good: explains the why
|
||||||
|
// Returns 0 if connection is lost, rather than erroring,
|
||||||
|
// because monitoring code uses this for health checks
|
||||||
|
fn vm_count(&self) -> usize { self.vms.len() }
|
||||||
|
```
|
||||||
|
|
||||||
|
### Keep examples in the `examples/` directory
|
||||||
|
|
||||||
|
Working code beats documentation. Every major feature should have a runnable example that demonstrates real usage.
|
||||||
|
|
||||||
443
docs/monitoring.md
Normal file
@@ -0,0 +1,443 @@
|
|||||||
|
# Monitoring and Alerting in Harmony
|
||||||
|
|
||||||
|
Harmony provides a unified, type-safe approach to monitoring and alerting across Kubernetes, OpenShift, and bare-metal infrastructure. This guide explains the architecture and how to use it at different levels of abstraction.
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Harmony's monitoring module supports three distinct use cases:
|
||||||
|
|
||||||
|
| Level | Who Uses It | What It Provides |
|
||||||
|
|-------|-------------|------------------|
|
||||||
|
| **Cluster** | Cluster administrators | Full control over monitoring stack, cluster-wide alerts, external scrape targets |
|
||||||
|
| **Tenant** | Platform teams | Namespace-scoped monitoring in multi-tenant environments |
|
||||||
|
| **Application** | Application developers | Zero-config monitoring that "just works" |
|
||||||
|
|
||||||
|
Each level builds on the same underlying abstractions, ensuring consistency while providing appropriate complexity for each audience.
|
||||||
|
|
||||||
|
## Core Concepts
|
||||||
|
|
||||||
|
### AlertSender
|
||||||
|
|
||||||
|
An `AlertSender` represents the system that evaluates alert rules and sends notifications. Harmony supports multiple monitoring stacks:
|
||||||
|
|
||||||
|
| Sender | Description | Use When |
|
||||||
|
|--------|-------------|----------|
|
||||||
|
| `OpenshiftClusterAlertSender` | OKD/OpenShift built-in monitoring | Running on OKD/OpenShift |
|
||||||
|
| `KubePrometheus` | kube-prometheus-stack via Helm | Standard Kubernetes, need full stack |
|
||||||
|
| `Prometheus` | Standalone Prometheus | Custom Prometheus deployment |
|
||||||
|
| `RedHatClusterObservability` | RHOB operator | Red Hat managed clusters |
|
||||||
|
| `Grafana` | Grafana-managed alerting | Grafana as primary alerting layer |
|
||||||
|
|
||||||
|
### AlertReceiver
|
||||||
|
|
||||||
|
An `AlertReceiver` defines where alerts are sent (Discord, Slack, email, webhook, etc.). Receivers are parameterized by sender type because each monitoring stack uses a different configuration format.
|
||||||
|
|
||||||
|
```rust
|
||||||
|
pub trait AlertReceiver<S: AlertSender> {
|
||||||
|
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError>;
|
||||||
|
fn name(&self) -> String;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Built-in receivers:
|
||||||
|
- `DiscordReceiver` - Discord webhooks
|
||||||
|
- `WebhookReceiver` - Generic HTTP webhooks
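
To make the parameterization concrete, the sketch below implements one receiver type for two different senders. It is a simplified stand-in: `build` returns a plain string instead of a `ReceiverInstallPlan`, and the rendered formats are placeholders, not the configuration either stack really expects.

```rust
pub trait AlertSender {
    fn name(&self) -> String;
}

pub struct KubePrometheus;
pub struct OpenshiftClusterAlertSender;

impl AlertSender for KubePrometheus {
    fn name(&self) -> String { "KubePrometheus".to_string() }
}

impl AlertSender for OpenshiftClusterAlertSender {
    fn name(&self) -> String { "OpenshiftClusterAlertSender".to_string() }
}

/// Simplified: returns a rendered snippet rather than an install plan.
pub trait AlertReceiver<S: AlertSender> {
    fn build(&self) -> Result<String, String>;
    fn name(&self) -> String;
}

pub struct DiscordReceiver {
    pub name: String,
    pub url: String,
}

// One impl per sender: the same receiver renders differently for each stack.
impl AlertReceiver<KubePrometheus> for DiscordReceiver {
    fn build(&self) -> Result<String, String> {
        // Placeholder format for illustration only.
        Ok(format!("[kube-prometheus] receiver {} -> {}", self.name, self.url))
    }
    fn name(&self) -> String { self.name.clone() }
}

impl AlertReceiver<OpenshiftClusterAlertSender> for DiscordReceiver {
    fn build(&self) -> Result<String, String> {
        // Placeholder format for illustration only.
        Ok(format!("[openshift] receiver {} -> {}", self.name, self.url))
    }
    fn name(&self) -> String { self.name.clone() }
}

/// A score for sender `S` can only hold receivers that know how to target `S`.
pub fn render_all<S: AlertSender>(
    receivers: &[Box<dyn AlertReceiver<S>>],
) -> Result<Vec<String>, String> {
    receivers.iter().map(|r| r.build()).collect()
}
```

A receiver with no `AlertReceiver<S>` impl for a given sender simply cannot be boxed into that sender's receiver list, so unsupported combinations are rejected when the program is compiled.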
|
||||||
|
|
||||||
|
### AlertRule
|
||||||
|
|
||||||
|
An `AlertRule` defines a Prometheus alert expression. Rules are also parameterized by sender to handle different CRD formats.
|
||||||
|
|
||||||
|
```rust
|
||||||
|
pub trait AlertRule<S: AlertSender> {
|
||||||
|
fn build_rule(&self) -> Result<serde_json::Value, InterpretError>;
|
||||||
|
fn name(&self) -> String;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Observability Capability
|
||||||
|
|
||||||
|
Topologies implement `Observability<S>` to indicate they support a specific alert sender:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
impl Observability<OpenshiftClusterAlertSender> for K8sAnywhereTopology {
|
||||||
|
async fn install_receivers(&self, sender, inventory, receivers) { ... }
|
||||||
|
async fn install_rules(&self, sender, inventory, rules) { ... }
|
||||||
|
// ...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
This provides **compile-time verification**: if you try to use `OpenshiftClusterAlertScore` with a topology that doesn't implement `Observability<OpenshiftClusterAlertSender>`, the code won't compile.
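
A stripped-down sketch of that check, using only the standard library, might look like the following. The topology names, the synchronous `install_rules` signature, and the `deploy` helper are assumptions made for the example; only the idea of bounding on `Observability<S>` comes from the text above.

```rust
pub trait AlertSender {}

pub struct OpenshiftClusterAlertSender;
pub struct KubePrometheus;

impl AlertSender for OpenshiftClusterAlertSender {}
impl AlertSender for KubePrometheus {}

pub trait Topology {}

/// Capability: "this topology can install alerting for sender `S`".
pub trait Observability<S: AlertSender> {
    /// Simplified: returns how many rules were installed.
    fn install_rules(&self, rules: &[&str]) -> usize;
}

/// Hypothetical topology that supports OKD's built-in monitoring.
pub struct OkdTopology;
impl Topology for OkdTopology {}
impl Observability<OpenshiftClusterAlertSender> for OkdTopology {
    fn install_rules(&self, rules: &[&str]) -> usize { rules.len() }
}

/// Hypothetical topology that only supports kube-prometheus.
pub struct PlainK8s;
impl Topology for PlainK8s {}
impl Observability<KubePrometheus> for PlainK8s {
    fn install_rules(&self, rules: &[&str]) -> usize { rules.len() }
}

pub struct OpenshiftClusterAlertScore {
    pub rules: Vec<&'static str>,
}

/// Only topologies with the OKD capability are accepted here.
pub fn deploy<T>(topology: &T, score: &OpenshiftClusterAlertScore) -> usize
where
    T: Topology + Observability<OpenshiftClusterAlertSender>,
{
    topology.install_rules(&score.rules)
}

// Uncommenting the call below is rejected at compile time, because
// `PlainK8s` does not implement `Observability<OpenshiftClusterAlertSender>`:
//
//     deploy(&PlainK8s, &score);
```

The mismatch surfaces as an unsatisfied trait bound at the call site, before any cluster is contacted.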
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Level 1: Cluster Monitoring
|
||||||
|
|
||||||
|
Cluster monitoring is for administrators who need full control over the monitoring infrastructure. This includes:
|
||||||
|
- Installing/managing the monitoring stack
|
||||||
|
- Configuring cluster-wide alert receivers
|
||||||
|
- Defining cluster-level alert rules
|
||||||
|
- Adding external scrape targets (e.g., bare-metal servers, firewalls)
|
||||||
|
|
||||||
|
### Example: OKD Cluster Alerts
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use harmony::{
inventory::Inventory,
|
||||||
|
modules::monitoring::{
|
||||||
|
alert_channel::discord_alert_channel::DiscordReceiver,
|
||||||
|
alert_rule::{alerts::k8s::pvc::high_pvc_fill_rate_over_two_days, prometheus_alert_rule::AlertManagerRuleGroup},
|
||||||
|
okd::openshift_cluster_alerting_score::OpenshiftClusterAlertScore,
|
||||||
|
scrape_target::prometheus_node_exporter::PrometheusNodeExporter,
|
||||||
|
},
|
||||||
|
topology::{K8sAnywhereTopology, monitoring::{AlertMatcher, AlertRoute, MatchOp}},
|
||||||
|
};
use harmony_macros::{hurl, ip};
|
||||||
|
|
||||||
|
let severity_matcher = AlertMatcher {
|
||||||
|
label: "severity".to_string(),
|
||||||
|
operator: MatchOp::Eq,
|
||||||
|
value: "critical".to_string(),
|
||||||
|
};
|
||||||
|
|
||||||
|
let rule_group = AlertManagerRuleGroup::new(
|
||||||
|
"cluster-rules",
|
||||||
|
vec![high_pvc_fill_rate_over_two_days()],
|
||||||
|
);
|
||||||
|
|
||||||
|
let external_exporter = PrometheusNodeExporter {
|
||||||
|
job_name: "firewall".to_string(),
|
||||||
|
metrics_path: "/metrics".to_string(),
|
||||||
|
listen_address: ip!("192.168.1.1"),
|
||||||
|
port: 9100,
|
||||||
|
..Default::default()
|
||||||
|
};
|
||||||
|
|
||||||
|
harmony_cli::run(
|
||||||
|
Inventory::autoload(),
|
||||||
|
K8sAnywhereTopology::from_env(),
|
||||||
|
vec![Box::new(OpenshiftClusterAlertScore {
|
||||||
|
sender: OpenshiftClusterAlertSender,
|
||||||
|
receivers: vec![Box::new(DiscordReceiver {
|
||||||
|
name: "critical-alerts".to_string(),
|
||||||
|
url: hurl!("https://discord.com/api/webhooks/..."),
|
||||||
|
route: AlertRoute {
|
||||||
|
matchers: vec![severity_matcher],
|
||||||
|
..AlertRoute::default("critical-alerts".to_string())
|
||||||
|
},
|
||||||
|
})],
|
||||||
|
rules: vec![Box::new(rule_group)],
|
||||||
|
scrape_targets: Some(vec![Box::new(external_exporter)]),
|
||||||
|
})],
|
||||||
|
None,
|
||||||
|
).await?;
|
||||||
|
```
|
||||||
|
|
||||||
|
### What This Does
|
||||||
|
|
||||||
|
1. **Enables cluster monitoring** - Activates OKD's built-in Prometheus
|
||||||
|
2. **Enables user workload monitoring** - Allows namespace-scoped rules
|
||||||
|
3. **Configures Alertmanager** - Adds Discord receiver with route matching
|
||||||
|
4. **Deploys alert rules** - Creates an `AlertingRule` custom resource containing the PVC fill-rate alert
|
||||||
|
5. **Adds external scrape target** - Configures Prometheus to scrape the firewall
|
||||||
|
|
||||||
|
### Compile-Time Safety
|
||||||
|
|
||||||
|
The `OpenshiftClusterAlertScore` requires:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T>
|
||||||
|
for OpenshiftClusterAlertScore
|
||||||
|
```
|
||||||
|
|
||||||
|
If `K8sAnywhereTopology` didn't implement `Observability<OpenshiftClusterAlertSender>`, this code would fail to compile. You cannot accidentally deploy OKD alerts to a cluster that doesn't support them.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Level 2: Tenant Monitoring
|
||||||
|
|
||||||
|
In multi-tenant clusters, teams are often confined to specific namespaces. Tenant monitoring adapts to this constraint:
|
||||||
|
|
||||||
|
- Resources are deployed in the tenant's namespace
|
||||||
|
- Cannot modify cluster-level monitoring configuration
|
||||||
|
- The topology determines namespace context at runtime
|
||||||
|
|
||||||
|
### How It Works
|
||||||
|
|
||||||
|
The topology's `Observability` implementation handles tenant scoping:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
impl Observability<KubePrometheus> for K8sAnywhereTopology {
|
||||||
|
async fn install_rules(&self, sender, inventory, rules) {
|
||||||
|
// Topology knows if it's tenant-scoped
|
||||||
|
let namespace = self.get_tenant_config().await
|
||||||
|
.map(|t| t.name)
|
||||||
|
.unwrap_or_else(|| "monitoring".to_string());
|
||||||
|
|
||||||
|
// Rules are installed in the appropriate namespace
|
||||||
|
for rule in rules.unwrap_or_default() {
|
||||||
|
let score = KubePrometheusRuleScore {
|
||||||
|
sender: sender.clone(),
|
||||||
|
rule,
|
||||||
|
namespace: namespace.clone(), // Tenant namespace
|
||||||
|
};
|
||||||
|
score.create_interpret().execute(inventory, self).await?;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
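
The namespace fallback in that snippet is the core of tenant scoping, and it can be isolated into a small pure function. The sketch below is a minimal illustration; `TenantConfig` here is a one-field stand-in and `resolve_namespace` is a hypothetical helper, not a function from Harmony.

```rust
/// One-field stand-in for the tenant configuration.
#[derive(Debug, Clone)]
pub struct TenantConfig {
    pub name: String,
}

/// Use the tenant's namespace when running tenant-scoped,
/// otherwise fall back to the cluster-level default.
pub fn resolve_namespace(tenant: Option<&TenantConfig>, default_ns: &str) -> String {
    tenant
        .map(|t| t.name.clone())
        .unwrap_or_else(|| default_ns.to_string())
}
```

Keeping this decision in one place means every rule, receiver, and service monitor installed by the topology lands in the same namespace for a given run.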
|
||||||
|
|
||||||
|
### Tenant vs Cluster Resources
|
||||||
|
|
||||||
|
| Resource | Cluster-Level | Tenant-Level |
|
||||||
|
|----------|---------------|--------------|
|
||||||
|
| Alertmanager config | Global receivers | Namespaced receivers (where supported) |
|
||||||
|
| PrometheusRules | Cluster-wide alerts | Namespace alerts only |
|
||||||
|
| ServiceMonitors | Any namespace | Own namespace only |
|
||||||
|
| External scrape targets | Can add | Cannot add (cluster config) |
|
||||||
|
|
||||||
|
### Runtime Validation
|
||||||
|
|
||||||
|
Tenant constraints are validated at runtime via Kubernetes RBAC. If a tenant-scoped deployment attempts cluster-level operations, it fails with a clear permission error from the Kubernetes API.
|
||||||
|
|
||||||
|
This cannot be checked fully at compile time because tenant context depends on who is running the code and what permissions they hold, which is only known at runtime.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Level 3: Application Monitoring
|
||||||
|
|
||||||
|
Application monitoring provides zero-config, opinionated monitoring for developers. Just add the `Monitoring` feature to your application and it works.
|
||||||
|
|
||||||
|
### Example
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use harmony::modules::{
|
||||||
|
application::{Application, ApplicationFeature},
|
||||||
|
monitoring::alert_channel::webhook_receiver::WebhookReceiver,
|
||||||
|
};
|
||||||
|
|
||||||
|
// Define your application
|
||||||
|
// Wrap the application in an `Arc` (std::sync::Arc) so it can be shared with the feature
let my_app = Arc::new(MyApplication::new());

// Add monitoring as a feature
let monitoring = Monitoring {
application: Arc::clone(&my_app), // `my_app` stays usable below
|
||||||
|
alert_receiver: vec![], // Uses defaults
|
||||||
|
};
|
||||||
|
|
||||||
|
// Install with the application
|
||||||
|
my_app.add_feature(monitoring);
|
||||||
|
```
|
||||||
|
|
||||||
|
### What Application Monitoring Provides
|
||||||
|
|
||||||
|
1. **Automatic ServiceMonitor** - Creates a ServiceMonitor for your application's pods
|
||||||
|
2. **Ntfy Notification Channel** - Auto-installs and configures Ntfy for push notifications
|
||||||
|
3. **Tenant Awareness** - Automatically scopes to the correct namespace
|
||||||
|
4. **Sensible Defaults** - Pre-configured alert routes and receivers
|
||||||
|
|
||||||
|
### Under the Hood
|
||||||
|
|
||||||
|
```rust
|
||||||
|
impl<T: Topology + Observability<Prometheus> + TenantManager>
|
||||||
|
ApplicationFeature<T> for Monitoring
|
||||||
|
{
|
||||||
|
async fn ensure_installed(&self, topology: &T) -> Result<...> {
|
||||||
|
// 1. Get tenant namespace (or use app name)
|
||||||
|
let namespace = topology.get_tenant_config().await
|
||||||
|
.map(|ns| ns.name.clone())
|
||||||
|
.unwrap_or_else(|| self.application.name());
|
||||||
|
|
||||||
|
// 2. Create ServiceMonitor for the app
|
||||||
|
let app_service_monitor = ServiceMonitor {
|
||||||
|
metadata: ObjectMeta {
|
||||||
|
name: Some(self.application.name()),
|
||||||
|
namespace: Some(namespace.clone()),
|
||||||
|
..Default::default()
|
||||||
|
},
|
||||||
|
spec: ServiceMonitorSpec::default(),
|
||||||
|
};
|
||||||
|
|
||||||
|
// 3. Install Ntfy for notifications
|
||||||
|
let ntfy = NtfyScore { namespace, host };
|
||||||
|
ntfy.interpret(&Inventory::empty(), topology).await?;
|
||||||
|
|
||||||
|
// 4. Wire up webhook receiver to Ntfy
|
||||||
|
let ntfy_receiver = WebhookReceiver { ... };
|
||||||
|
|
||||||
|
// 5. Execute monitoring score
|
||||||
|
alerting_score.interpret(&Inventory::empty(), topology).await?;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Pre-Built Alert Rules
|
||||||
|
|
||||||
|
Harmony provides a library of common alert rules in `modules/monitoring/alert_rule/alerts/`:
|
||||||
|
|
||||||
|
### Kubernetes Alerts (`alerts/k8s/`)
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use harmony::modules::monitoring::alert_rule::alerts::k8s::{
|
||||||
|
pod::pod_failed,
|
||||||
|
pvc::high_pvc_fill_rate_over_two_days,
|
||||||
|
memory_usage::alert_high_memory_usage,
|
||||||
|
};
|
||||||
|
|
||||||
|
let rules = AlertManagerRuleGroup::new("k8s-rules", vec![
|
||||||
|
pod_failed(),
|
||||||
|
high_pvc_fill_rate_over_two_days(),
|
||||||
|
alert_high_memory_usage(),
|
||||||
|
]);
|
||||||
|
```
|
||||||
|
|
||||||
|
Available rules:
|
||||||
|
- `pod_failed()` - Pod in failed state
|
||||||
|
- `alert_container_restarting()` - Container restart loop
|
||||||
|
- `alert_pod_not_ready()` - Pod not ready for extended period
|
||||||
|
- `high_pvc_fill_rate_over_two_days()` - PVC will fill within 2 days
|
||||||
|
- `alert_high_memory_usage()` - Memory usage above threshold
|
||||||
|
- `alert_high_cpu_usage()` - CPU usage above threshold
|
||||||
|
|
||||||
|
### Infrastructure Alerts (`alerts/infra/`)
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use harmony::modules::monitoring::alert_rule::alerts::infra::opnsense::high_http_error_rate;
|
||||||
|
|
||||||
|
let rules = AlertManagerRuleGroup::new("infra-rules", vec![
|
||||||
|
high_http_error_rate(),
|
||||||
|
]);
|
||||||
|
```
|
||||||
|
|
||||||
|
### Creating Custom Rules
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use harmony::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;
|
||||||
|
|
||||||
|
pub fn my_custom_alert() -> PrometheusAlertRule {
|
||||||
|
PrometheusAlertRule::new("MyServiceDown", "up{job=\"my-service\"} == 0")
|
||||||
|
.for_duration("5m")
|
||||||
|
.label("severity", "critical")
|
||||||
|
.annotation("summary", "My service is down")
|
||||||
|
.annotation("description", "The my-service job has been down for more than 5 minutes")
|
||||||
|
}
|
||||||
|
```
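
To show what such a chained call ultimately describes, here is a self-contained sketch of a chainable rule type that renders to the standard Prometheus rule entry layout (`alert`, `expr`, `for`, `labels`, `annotations`). `SketchAlertRule` and `to_yaml` are hypothetical names for illustration; how Harmony's `PrometheusAlertRule` serializes internally is not shown in this guide.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a chainable alert rule.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchAlertRule {
    alert: String,
    expr: String,
    for_duration: Option<String>,
    labels: BTreeMap<String, String>,
    annotations: BTreeMap<String, String>,
}

/// Wrap a value in YAML single quotes, doubling any embedded single quote.
fn quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

impl SketchAlertRule {
    pub fn new(alert: &str, expr: &str) -> Self {
        Self {
            alert: alert.to_string(),
            expr: expr.to_string(),
            for_duration: None,
            labels: BTreeMap::new(),
            annotations: BTreeMap::new(),
        }
    }

    pub fn for_duration(mut self, duration: &str) -> Self {
        self.for_duration = Some(duration.to_string());
        self
    }

    pub fn label(mut self, key: &str, value: &str) -> Self {
        self.labels.insert(key.to_string(), value.to_string());
        self
    }

    pub fn annotation(mut self, key: &str, value: &str) -> Self {
        self.annotations.insert(key.to_string(), value.to_string());
        self
    }

    /// Render one entry of a Prometheus rule group. Keys are assumed to be
    /// plain identifiers and are emitted unquoted; values are always quoted.
    pub fn to_yaml(&self) -> String {
        let mut out = format!(
            "- alert: {}\n  expr: {}\n",
            quote(&self.alert),
            quote(&self.expr)
        );
        if let Some(duration) = &self.for_duration {
            out.push_str(&format!("  for: {}\n", quote(duration)));
        }
        for (section, entries) in [("labels", &self.labels), ("annotations", &self.annotations)] {
            if entries.is_empty() {
                continue;
            }
            out.push_str(&format!("  {section}:\n"));
            for (key, value) in entries {
                out.push_str(&format!("    {key}: {}\n", quote(value)));
            }
        }
        out
    }
}
```

Using a `BTreeMap` keeps label and annotation order deterministic, which makes rendered rules stable across runs and easy to diff.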
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Alert Receivers
|
||||||
|
|
||||||
|
### Discord Webhook
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use harmony::modules::monitoring::alert_channel::discord_alert_channel::DiscordReceiver;
|
||||||
|
use harmony::topology::monitoring::{AlertRoute, AlertMatcher, MatchOp};
|
||||||
|
|
||||||
|
let discord = DiscordReceiver {
|
||||||
|
name: "ops-alerts".to_string(),
|
||||||
|
url: hurl!("https://discord.com/api/webhooks/123456/abcdef"),
|
||||||
|
route: AlertRoute {
|
||||||
|
receiver: "ops-alerts".to_string(),
|
||||||
|
matchers: vec![AlertMatcher {
|
||||||
|
label: "severity".to_string(),
|
||||||
|
operator: MatchOp::Eq,
|
||||||
|
value: "critical".to_string(),
|
||||||
|
}],
|
||||||
|
group_by: vec!["alertname".to_string()],
|
||||||
|
repeat_interval: Some("30m".to_string()),
|
||||||
|
continue_matching: false,
|
||||||
|
children: vec![],
|
||||||
|
},
|
||||||
|
};
|
||||||
|
```
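
The `matchers` on a route decide which alerts reach the receiver. The sketch below shows the usual Alertmanager semantics under simplifying assumptions: every matcher must match, and a label missing from the alert is treated as an empty string. The types are local stand-ins; only `MatchOp::Eq` appears in this guide, so the `NotEq` variant is a hypothetical addition for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MatchOp {
    Eq,
    /// Hypothetical variant, added for illustration.
    NotEq,
}

pub struct AlertMatcher {
    pub label: String,
    pub operator: MatchOp,
    pub value: String,
}

impl AlertMatcher {
    /// A label that is absent from the alert compares as the empty string.
    pub fn matches(&self, labels: &HashMap<String, String>) -> bool {
        let actual = labels.get(&self.label).map(String::as_str).unwrap_or("");
        match self.operator {
            MatchOp::Eq => actual == self.value,
            MatchOp::NotEq => actual != self.value,
        }
    }
}

/// A route applies only when all of its matchers match;
/// a route without matchers therefore applies to every alert.
pub fn route_matches(matchers: &[AlertMatcher], labels: &HashMap<String, String>) -> bool {
    matchers.iter().all(|m| m.matches(labels))
}
```

With the `ops-alerts` route above, an alert labeled `severity="critical"` would be delivered, while one labeled `severity="warning"` would fall through to other routes.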
|
||||||
|
|
||||||
|
### Generic Webhook
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use harmony::modules::monitoring::alert_channel::webhook_receiver::WebhookReceiver;
|
||||||
|
|
||||||
|
let webhook = WebhookReceiver {
|
||||||
|
name: "custom-webhook".to_string(),
|
||||||
|
url: hurl!("https://api.example.com/alerts"),
|
||||||
|
route: AlertRoute::default("custom-webhook".to_string()),
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Adding a New Monitoring Stack
|
||||||
|
|
||||||
|
To add support for a new monitoring stack:
|
||||||
|
|
||||||
|
1. **Create the sender type** in `modules/monitoring/my_sender/mod.rs`:
|
||||||
|
```rust
|
||||||
|
#[derive(Debug, Clone)]
|
||||||
|
pub struct MySender;
|
||||||
|
|
||||||
|
impl AlertSender for MySender {
|
||||||
|
fn name(&self) -> String { "MySender".to_string() }
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Define CRD types** in `modules/monitoring/my_sender/crd/`:
|
||||||
|
```rust
|
||||||
|
#[derive(CustomResource, Debug, Serialize, Deserialize, Clone)]
|
||||||
|
#[kube(group = "monitoring.example.com", version = "v1", kind = "MyAlertRule")]
|
||||||
|
pub struct MyAlertRuleSpec { ... }
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Implement Observability** in `domain/topology/k8s_anywhere/observability/my_sender.rs`:
|
||||||
|
```rust
|
||||||
|
impl Observability<MySender> for K8sAnywhereTopology {
|
||||||
|
async fn install_receivers(&self, sender, inventory, receivers) { ... }
|
||||||
|
async fn install_rules(&self, sender, inventory, rules) { ... }
|
||||||
|
// ...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
4. **Implement receiver conversions** for existing receivers:
|
||||||
|
```rust
|
||||||
|
impl AlertReceiver<MySender> for DiscordReceiver {
|
||||||
|
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
|
||||||
|
// Convert DiscordReceiver to MySender's format
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
5. **Create score types**:
|
||||||
|
```rust
|
||||||
|
pub struct MySenderAlertScore {
|
||||||
|
pub sender: MySender,
|
||||||
|
pub receivers: Vec<Box<dyn AlertReceiver<MySender>>>,
|
||||||
|
pub rules: Vec<Box<dyn AlertRule<MySender>>>,
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Architecture Principles
|
||||||
|
|
||||||
|
### Type Safety Over Flexibility
|
||||||
|
|
||||||
|
Each monitoring stack has distinct CRDs and configuration formats. Rather than a unified "MonitoringStack" type that loses stack-specific features, we use generic traits that provide type safety while allowing each stack to express its unique configuration.
|
||||||
|
|
||||||
|
### Compile-Time Capability Verification
|
||||||
|
|
||||||
|
The `Observability<S>` bound ensures you can't deploy OKD alerts to a KubePrometheus cluster. The compiler catches platform mismatches before deployment.
|
||||||
|
|
||||||
|
### Explicit Over Implicit
|
||||||
|
|
||||||
|
Monitoring stacks are chosen explicitly (`OpenshiftClusterAlertSender` vs `KubePrometheus`). There's no "auto-detection" that could lead to surprising behavior.
|
||||||
|
|
||||||
|
### Three Levels, One Foundation
|
||||||
|
|
||||||
|
Cluster, tenant, and application monitoring all use the same traits (`AlertSender`, `AlertReceiver`, `AlertRule`). The difference is in how scores are constructed and how topologies interpret them.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Related Documentation
|
||||||
|
|
||||||
|
- [ADR-020: Monitoring and Alerting Architecture](../adr/020-monitoring-alerting-architecture.md)
|
||||||
|
- [ADR-013: Monitoring Notifications (ntfy)](../adr/013-monitoring-notifications.md)
|
||||||
|
- [ADR-011: Multi-Tenant Cluster Architecture](../adr/011-multi-tenant-cluster.md)
|
||||||
|
- [Coding Guide](coding-guide.md)
|
||||||
|
- [Core Concepts](concepts.md)
|
||||||
@@ -7,7 +7,7 @@ use harmony::{
|
|||||||
monitoring::alert_channel::webhook_receiver::WebhookReceiver,
|
monitoring::alert_channel::webhook_receiver::WebhookReceiver,
|
||||||
tenant::TenantScore,
|
tenant::TenantScore,
|
||||||
},
|
},
|
||||||
topology::{K8sAnywhereTopology, tenant::TenantConfig},
|
topology::{K8sAnywhereTopology, monitoring::AlertRoute, tenant::TenantConfig},
|
||||||
};
|
};
|
||||||
use harmony_types::id::Id;
|
use harmony_types::id::Id;
|
||||||
use harmony_types::net::Url;
|
use harmony_types::net::Url;
|
||||||
@@ -33,9 +33,14 @@ async fn main() {
|
|||||||
service_port: 3000,
|
service_port: 3000,
|
||||||
});
|
});
|
||||||
|
|
||||||
|
let receiver_name = "sample-webhook-receiver".to_string();
|
||||||
|
|
||||||
let webhook_receiver = WebhookReceiver {
|
let webhook_receiver = WebhookReceiver {
|
||||||
name: "sample-webhook-receiver".to_string(),
|
name: receiver_name.clone(),
|
||||||
url: Url::Url(url::Url::parse("https://webhook-doesnt-exist.com").unwrap()),
|
url: Url::Url(url::Url::parse("https://webhook-doesnt-exist.com").unwrap()),
|
||||||
|
route: AlertRoute {
|
||||||
|
..AlertRoute::default(receiver_name)
|
||||||
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
let app = ApplicationScore {
|
let app = ApplicationScore {
|
||||||
|
|||||||
@@ -1,37 +1,45 @@
|
|||||||
use std::collections::HashMap;
|
use std::{
|
||||||
|
collections::HashMap,
|
||||||
|
sync::{Arc, Mutex},
|
||||||
|
};
|
||||||
|
|
||||||
use harmony::{
|
use harmony::{
|
||||||
inventory::Inventory,
|
inventory::Inventory,
|
||||||
modules::{
|
modules::monitoring::{
|
||||||
monitoring::{
|
alert_channel::discord_alert_channel::DiscordReceiver,
|
||||||
alert_channel::discord_alert_channel::DiscordWebhook,
|
alert_rule::{
|
||||||
alert_rule::prometheus_alert_rule::AlertManagerRuleGroup,
|
alerts::{
|
||||||
kube_prometheus::{
|
infra::dell_server::{
|
||||||
helm_prometheus_alert_score::HelmPrometheusAlertingScore,
|
alert_global_storage_status_critical,
|
||||||
types::{
|
alert_global_storage_status_non_recoverable,
|
||||||
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
|
global_storage_status_degraded_non_critical,
|
||||||
ServiceMonitorEndpoint,
|
|
||||||
},
|
},
|
||||||
|
k8s::pvc::high_pvc_fill_rate_over_two_days,
|
||||||
},
|
},
|
||||||
|
prometheus_alert_rule::AlertManagerRuleGroup,
|
||||||
},
|
},
|
||||||
prometheus::alerts::{
|
kube_prometheus::{
|
||||||
infra::dell_server::{
|
helm::config::KubePrometheusConfig,
|
||||||
alert_global_storage_status_critical, alert_global_storage_status_non_recoverable,
|
kube_prometheus_alerting_score::KubePrometheusAlertingScore,
|
||||||
global_storage_status_degraded_non_critical,
|
types::{
|
||||||
|
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
|
||||||
|
ServiceMonitorEndpoint,
|
||||||
},
|
},
|
||||||
k8s::pvc::high_pvc_fill_rate_over_two_days,
|
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
topology::K8sAnywhereTopology,
|
topology::{K8sAnywhereTopology, monitoring::AlertRoute},
|
||||||
};
|
};
|
||||||
use harmony_types::{k8s_name::K8sName, net::Url};
|
use harmony_types::{k8s_name::K8sName, net::Url};
|
||||||
|
|
||||||
#[tokio::main]
|
#[tokio::main]
|
||||||
async fn main() {
|
async fn main() {
|
||||||
let discord_receiver = DiscordWebhook {
|
let receiver_name = "test-discord".to_string();
|
||||||
name: K8sName("test-discord".to_string()),
|
let discord_receiver = DiscordReceiver {
|
||||||
|
name: receiver_name.clone(),
|
||||||
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
|
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
|
||||||
selectors: vec![],
|
route: AlertRoute {
|
||||||
|
..AlertRoute::default(receiver_name)
|
||||||
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
let high_pvc_fill_rate_over_two_days_alert = high_pvc_fill_rate_over_two_days();
|
let high_pvc_fill_rate_over_two_days_alert = high_pvc_fill_rate_over_two_days();
|
||||||
@@ -70,10 +78,15 @@ async fn main() {
|
|||||||
endpoints: vec![service_monitor_endpoint],
|
endpoints: vec![service_monitor_endpoint],
|
||||||
..Default::default()
|
..Default::default()
|
||||||
};
|
};
|
||||||
let alerting_score = HelmPrometheusAlertingScore {
|
|
||||||
|
let config = Arc::new(Mutex::new(KubePrometheusConfig::new()));
|
||||||
|
|
||||||
|
let alerting_score = KubePrometheusAlertingScore {
|
||||||
receivers: vec![Box::new(discord_receiver)],
|
receivers: vec![Box::new(discord_receiver)],
|
||||||
rules: vec![Box::new(additional_rules), Box::new(additional_rules2)],
|
rules: vec![Box::new(additional_rules), Box::new(additional_rules2)],
|
||||||
service_monitors: vec![service_monitor],
|
service_monitors: vec![service_monitor],
|
||||||
|
scrape_targets: None,
|
||||||
|
config,
|
||||||
};
|
};
|
||||||
|
|
||||||
harmony_cli::run(
|
harmony_cli::run(
|
||||||
|
|||||||
@@ -1,24 +1,32 @@
|
|||||||
use std::{collections::HashMap, str::FromStr};
|
use std::{
|
||||||
|
collections::HashMap,
|
||||||
|
str::FromStr,
|
||||||
|
sync::{Arc, Mutex},
|
||||||
|
};
|
||||||
|
|
||||||
use harmony::{
|
use harmony::{
|
||||||
inventory::Inventory,
|
inventory::Inventory,
|
||||||
modules::{
|
modules::{
|
||||||
monitoring::{
|
monitoring::{
|
||||||
alert_channel::discord_alert_channel::DiscordWebhook,
|
alert_channel::discord_alert_channel::DiscordReceiver,
|
||||||
alert_rule::prometheus_alert_rule::AlertManagerRuleGroup,
|
alert_rule::{
|
||||||
|
alerts::k8s::pvc::high_pvc_fill_rate_over_two_days,
|
||||||
|
prometheus_alert_rule::AlertManagerRuleGroup,
|
||||||
|
},
|
||||||
kube_prometheus::{
|
kube_prometheus::{
|
||||||
helm_prometheus_alert_score::HelmPrometheusAlertingScore,
|
helm::config::KubePrometheusConfig,
|
||||||
|
kube_prometheus_alerting_score::KubePrometheusAlertingScore,
|
||||||
types::{
|
types::{
|
||||||
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
|
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
|
||||||
ServiceMonitorEndpoint,
|
ServiceMonitorEndpoint,
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
prometheus::alerts::k8s::pvc::high_pvc_fill_rate_over_two_days,
|
|
||||||
tenant::TenantScore,
|
tenant::TenantScore,
|
||||||
},
|
},
|
||||||
topology::{
|
topology::{
|
||||||
K8sAnywhereTopology,
|
K8sAnywhereTopology,
|
||||||
|
monitoring::AlertRoute,
|
||||||
tenant::{ResourceLimits, TenantConfig, TenantNetworkPolicy},
|
tenant::{ResourceLimits, TenantConfig, TenantNetworkPolicy},
|
||||||
},
|
},
|
||||||
};
|
};
|
||||||
@@ -42,10 +50,13 @@ async fn main() {
|
|||||||
},
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
let discord_receiver = DiscordWebhook {
|
let receiver_name = "test-discord".to_string();
|
||||||
name: K8sName("test-discord".to_string()),
|
let discord_receiver = DiscordReceiver {
|
||||||
|
name: receiver_name.clone(),
|
||||||
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
|
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
|
||||||
selectors: vec![],
|
route: AlertRoute {
|
||||||
|
..AlertRoute::default(receiver_name)
|
||||||
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
let high_pvc_fill_rate_over_two_days_alert = high_pvc_fill_rate_over_two_days();
|
let high_pvc_fill_rate_over_two_days_alert = high_pvc_fill_rate_over_two_days();
|
||||||
@@ -74,10 +85,14 @@ async fn main() {
|
|||||||
..Default::default()
|
..Default::default()
|
||||||
};
|
};
|
||||||
|
|
||||||
let alerting_score = HelmPrometheusAlertingScore {
|
let config = Arc::new(Mutex::new(KubePrometheusConfig::new()));
|
||||||
|
|
||||||
|
let alerting_score = KubePrometheusAlertingScore {
|
||||||
receivers: vec![Box::new(discord_receiver)],
|
receivers: vec![Box::new(discord_receiver)],
|
||||||
rules: vec![Box::new(additional_rules)],
|
rules: vec![Box::new(additional_rules)],
|
||||||
service_monitors: vec![service_monitor],
|
service_monitors: vec![service_monitor],
|
||||||
|
scrape_targets: None,
|
||||||
|
config,
|
||||||
};
|
};
|
||||||
|
|
||||||
harmony_cli::run(
|
harmony_cli::run(
|
||||||
|
|||||||
@@ -1,35 +1,64 @@
|
|||||||
use std::collections::HashMap;
|
|
||||||
|
|
||||||
use harmony::{
|
use harmony::{
|
||||||
inventory::Inventory,
|
inventory::Inventory,
|
||||||
modules::monitoring::{
|
modules::monitoring::{
|
||||||
alert_channel::discord_alert_channel::DiscordWebhook,
|
alert_channel::discord_alert_channel::DiscordReceiver,
|
||||||
okd::cluster_monitoring::OpenshiftClusterAlertScore,
|
alert_rule::{
|
||||||
|
alerts::{
|
||||||
|
infra::opnsense::high_http_error_rate, k8s::pvc::high_pvc_fill_rate_over_two_days,
|
||||||
|
},
|
||||||
|
prometheus_alert_rule::AlertManagerRuleGroup,
|
||||||
|
},
|
||||||
|
okd::openshift_cluster_alerting_score::OpenshiftClusterAlertScore,
|
||||||
|
scrape_target::prometheus_node_exporter::PrometheusNodeExporter,
|
||||||
|
},
|
||||||
|
topology::{
|
||||||
|
K8sAnywhereTopology,
|
||||||
|
monitoring::{AlertMatcher, AlertRoute, MatchOp},
|
||||||
},
|
},
|
||||||
topology::K8sAnywhereTopology,
|
|
||||||
};
|
};
|
||||||
use harmony_macros::hurl;
|
|
||||||
use harmony_types::k8s_name::K8sName;
|
use harmony_macros::{hurl, ip};
|
||||||
|
|
||||||
#[tokio::main]
|
#[tokio::main]
|
||||||
async fn main() {
|
async fn main() {
|
||||||
let mut sel = HashMap::new();
|
let platform_matcher = AlertMatcher {
|
||||||
sel.insert(
|
label: "prometheus".to_string(),
|
||||||
"openshift_io_alert_source".to_string(),
|
operator: MatchOp::Eq,
|
||||||
"platform".to_string(),
|
value: "openshift-monitoring/k8s".to_string(),
|
||||||
);
|
};
|
||||||
let mut sel2 = HashMap::new();
|
let severity = AlertMatcher {
|
||||||
sel2.insert("openshift_io_alert_source".to_string(), "".to_string());
|
label: "severity".to_string(),
|
||||||
let selectors = vec![sel, sel2];
|
operator: MatchOp::Eq,
|
||||||
|
value: "critical".to_string(),
|
||||||
|
};
|
||||||
|
|
||||||
|
let high_http_error_rate = high_http_error_rate();
|
||||||
|
|
||||||
|
let additional_rules = AlertManagerRuleGroup::new("test-rule", vec![high_http_error_rate]);
|
||||||
|
|
||||||
|
let scrape_target = PrometheusNodeExporter {
|
||||||
|
job_name: "firewall".to_string(),
|
||||||
|
metrics_path: "/metrics".to_string(),
|
||||||
|
listen_address: ip!("192.168.1.1"),
|
||||||
|
port: 9100,
|
||||||
|
..Default::default()
|
||||||
|
};
|
||||||
|
|
||||||
harmony_cli::run(
|
harmony_cli::run(
|
||||||
Inventory::autoload(),
|
Inventory::autoload(),
|
||||||
K8sAnywhereTopology::from_env(),
|
K8sAnywhereTopology::from_env(),
|
||||||
vec![Box::new(OpenshiftClusterAlertScore {
|
vec![Box::new(OpenshiftClusterAlertScore {
|
||||||
receivers: vec![Box::new(DiscordWebhook {
|
receivers: vec![Box::new(DiscordReceiver {
|
||||||
name: K8sName("wills-discord-webhook-example".to_string()),
|
name: "crit-wills-discord-channel-example".to_string(),
|
||||||
url: hurl!("https://something.io"),
|
url: hurl!("https://test.io"),
|
||||||
selectors: selectors,
|
route: AlertRoute {
|
||||||
|
matchers: vec![severity],
|
||||||
|
..AlertRoute::default("crit-wills-discord-channel-example".to_string())
|
||||||
|
},
|
||||||
})],
|
})],
|
||||||
|
sender: harmony::modules::monitoring::okd::OpenshiftClusterAlertSender,
|
||||||
|
rules: vec![Box::new(additional_rules)],
|
||||||
|
scrape_targets: Some(vec![Box::new(scrape_target)]),
|
||||||
})],
|
})],
|
||||||
None,
|
None,
|
||||||
)
|
)
|
||||||
|
|||||||
@@ -6,9 +6,9 @@ use harmony::{
|
|||||||
application::{
|
application::{
|
||||||
ApplicationScore, RustWebFramework, RustWebapp, features::rhob_monitoring::Monitoring,
|
ApplicationScore, RustWebFramework, RustWebapp, features::rhob_monitoring::Monitoring,
|
||||||
},
|
},
|
||||||
monitoring::alert_channel::discord_alert_channel::DiscordWebhook,
|
monitoring::alert_channel::discord_alert_channel::DiscordReceiver,
|
||||||
},
|
},
|
||||||
topology::K8sAnywhereTopology,
|
topology::{K8sAnywhereTopology, monitoring::AlertRoute},
|
||||||
};
|
};
|
||||||
use harmony_types::{k8s_name::K8sName, net::Url};
|
use harmony_types::{k8s_name::K8sName, net::Url};
|
||||||
|
|
||||||
@@ -22,18 +22,21 @@ async fn main() {
|
|||||||
service_port: 3000,
|
service_port: 3000,
|
||||||
});
|
});
|
||||||
|
|
||||||
let discord_receiver = DiscordWebhook {
|
let receiver_name = "test-discord".to_string();
|
||||||
name: K8sName("test-discord".to_string()),
|
let discord_receiver = DiscordReceiver {
|
||||||
|
name: receiver_name.clone(),
|
||||||
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
|
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
|
||||||
selectors: vec![],
|
route: AlertRoute {
|
||||||
|
..AlertRoute::default(receiver_name)
|
||||||
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
let app = ApplicationScore {
|
let app = ApplicationScore {
|
||||||
features: vec![
|
features: vec![
|
||||||
Box::new(Monitoring {
|
// Box::new(Monitoring {
|
||||||
application: application.clone(),
|
// application: application.clone(),
|
||||||
alert_receiver: vec![Box::new(discord_receiver)],
|
// alert_receiver: vec![Box::new(discord_receiver)],
|
||||||
}),
|
// }),
|
||||||
// TODO add backups, multisite ha, etc
|
// TODO add backups, multisite ha, etc
|
||||||
],
|
],
|
||||||
application,
|
application,
|
||||||
|
|||||||
@@ -8,13 +8,13 @@ use harmony::{
|
|||||||
features::{Monitoring, PackagingDeployment},
|
features::{Monitoring, PackagingDeployment},
|
||||||
},
|
},
|
||||||
monitoring::alert_channel::{
|
monitoring::alert_channel::{
|
||||||
discord_alert_channel::DiscordWebhook, webhook_receiver::WebhookReceiver,
|
discord_alert_channel::DiscordReceiver, webhook_receiver::WebhookReceiver,
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
topology::K8sAnywhereTopology,
|
topology::{K8sAnywhereTopology, monitoring::AlertRoute},
|
||||||
};
|
};
|
||||||
use harmony_macros::hurl;
|
use harmony_macros::hurl;
|
||||||
use harmony_types::k8s_name::K8sName;
|
use harmony_types::{k8s_name::K8sName, net::Url};
|
||||||
|
|
||||||
#[tokio::main]
|
#[tokio::main]
|
||||||
async fn main() {
|
async fn main() {
|
||||||
@@ -26,15 +26,23 @@ async fn main() {
|
|||||||
service_port: 3000,
|
service_port: 3000,
|
||||||
});
|
});
|
||||||
|
|
||||||
let discord_receiver = DiscordWebhook {
|
let receiver_name = "test-discord".to_string();
|
||||||
name: K8sName("test-discord".to_string()),
|
let discord_receiver = DiscordReceiver {
|
||||||
url: hurl!("https://discord.doesnt.exist.com"),
|
name: receiver_name.clone(),
|
||||||
selectors: vec![],
|
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
|
||||||
|
route: AlertRoute {
|
||||||
|
..AlertRoute::default(receiver_name)
|
||||||
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
|
let receiver_name = "sample-webhook-receiver".to_string();
|
||||||
|
|
||||||
let webhook_receiver = WebhookReceiver {
|
let webhook_receiver = WebhookReceiver {
|
||||||
name: "sample-webhook-receiver".to_string(),
|
name: receiver_name.clone(),
|
||||||
url: hurl!("https://webhook-doesnt-exist.com"),
|
url: hurl!("https://webhook-doesnt-exist.com"),
|
||||||
|
route: AlertRoute {
|
||||||
|
..AlertRoute::default(receiver_name)
|
||||||
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
let app = ApplicationScore {
|
let app = ApplicationScore {
|
||||||
@@ -42,10 +50,10 @@ async fn main() {
|
|||||||
Box::new(PackagingDeployment {
|
Box::new(PackagingDeployment {
|
||||||
application: application.clone(),
|
application: application.clone(),
|
||||||
}),
|
}),
|
||||||
Box::new(Monitoring {
|
// Box::new(Monitoring {
|
||||||
application: application.clone(),
|
// application: application.clone(),
|
||||||
alert_receiver: vec![Box::new(discord_receiver), Box::new(webhook_receiver)],
|
// alert_receiver: vec![Box::new(discord_receiver), Box::new(webhook_receiver)],
|
||||||
}),
|
// }),
|
||||||
// TODO add backups, multisite ha, etc
|
// TODO add backups, multisite ha, etc
|
||||||
],
|
],
|
||||||
application,
|
application,
|
||||||
|
|||||||
@@ -1,11 +1,8 @@
|
|||||||
use harmony::{
|
use harmony::{
|
||||||
inventory::Inventory,
|
inventory::Inventory,
|
||||||
modules::{
|
modules::application::{
|
||||||
application::{
|
ApplicationScore, RustWebFramework, RustWebapp,
|
||||||
ApplicationScore, RustWebFramework, RustWebapp,
|
features::{Monitoring, PackagingDeployment},
|
||||||
features::{Monitoring, PackagingDeployment},
|
|
||||||
},
|
|
||||||
monitoring::alert_channel::discord_alert_channel::DiscordWebhook,
|
|
||||||
},
|
},
|
||||||
topology::K8sAnywhereTopology,
|
topology::K8sAnywhereTopology,
|
||||||
};
|
};
|
||||||
@@ -30,14 +27,14 @@ async fn main() {
|
|||||||
Box::new(PackagingDeployment {
|
Box::new(PackagingDeployment {
|
||||||
application: application.clone(),
|
application: application.clone(),
|
||||||
}),
|
}),
|
||||||
Box::new(Monitoring {
|
// Box::new(Monitoring {
|
||||||
application: application.clone(),
|
// application: application.clone(),
|
||||||
alert_receiver: vec![Box::new(DiscordWebhook {
|
// alert_receiver: vec![Box::new(DiscordWebhook {
|
||||||
name: K8sName("test-discord".to_string()),
|
// name: K8sName("test-discord".to_string()),
|
||||||
url: hurl!("https://discord.doesnt.exist.com"),
|
// url: hurl!("https://discord.doesnt.exist.com"),
|
||||||
selectors: vec![],
|
// selectors: vec![],
|
||||||
})],
|
// })],
|
||||||
}),
|
// }),
|
||||||
],
|
],
|
||||||
application,
|
application,
|
||||||
};
|
};
|
||||||
|
|||||||
@@ -46,6 +46,14 @@ impl std::fmt::Debug for K8sClient {
|
|||||||
}
|
}
|
||||||
|
|
||||||
impl K8sClient {
|
impl K8sClient {
|
||||||
|
pub fn inner_client(&self) -> &Client {
|
||||||
|
&self.client
|
||||||
|
}
|
||||||
|
|
||||||
|
pub fn inner_client_clone(&self) -> Client {
|
||||||
|
self.client.clone()
|
||||||
|
}
|
||||||
|
|
||||||
/// Create a client, reading `DRY_RUN` from the environment.
|
/// Create a client, reading `DRY_RUN` from the environment.
|
||||||
pub fn new(client: Client) -> Self {
|
pub fn new(client: Client) -> Self {
|
||||||
Self {
|
Self {
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
use std::{collections::BTreeMap, process::Command, sync::Arc, time::Duration};
|
use std::{collections::BTreeMap, process::Command, sync::Arc};
|
||||||
|
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use base64::{Engine, engine::general_purpose};
|
use base64::{Engine, engine::general_purpose};
|
||||||
@@ -8,7 +8,7 @@ use k8s_openapi::api::{
|
|||||||
core::v1::{Pod, Secret},
|
core::v1::{Pod, Secret},
|
||||||
rbac::v1::{ClusterRoleBinding, RoleRef, Subject},
|
rbac::v1::{ClusterRoleBinding, RoleRef, Subject},
|
||||||
};
|
};
|
||||||
use kube::api::{DynamicObject, GroupVersionKind, ObjectMeta};
|
use kube::api::{GroupVersionKind, ObjectMeta};
|
||||||
use log::{debug, info, trace, warn};
|
use log::{debug, info, trace, warn};
|
||||||
use serde::Serialize;
|
use serde::Serialize;
|
||||||
use tokio::sync::OnceCell;
|
use tokio::sync::OnceCell;
|
||||||
@@ -29,28 +29,7 @@ use crate::{
|
|||||||
score_cert_management::CertificateManagementScore,
|
score_cert_management::CertificateManagementScore,
|
||||||
},
|
},
|
||||||
k3d::K3DInstallationScore,
|
k3d::K3DInstallationScore,
|
||||||
k8s::ingress::{K8sIngressScore, PathType},
|
|
||||||
monitoring::{
|
|
||||||
grafana::{grafana::Grafana, helm::helm_grafana::grafana_helm_chart_score},
|
|
||||||
kube_prometheus::crd::{
|
|
||||||
crd_alertmanager_config::CRDPrometheus,
|
|
||||||
crd_grafana::{
|
|
||||||
Grafana as GrafanaCRD, GrafanaCom, GrafanaDashboard,
|
|
||||||
GrafanaDashboardDatasource, GrafanaDashboardSpec, GrafanaDatasource,
|
|
||||||
GrafanaDatasourceConfig, GrafanaDatasourceJsonData,
|
|
||||||
GrafanaDatasourceSecureJsonData, GrafanaDatasourceSpec, GrafanaSpec,
|
|
||||||
},
|
|
||||||
crd_prometheuses::LabelSelector,
|
|
||||||
prometheus_operator::prometheus_operator_helm_chart_score,
|
|
||||||
rhob_alertmanager_config::RHOBObservability,
|
|
||||||
service_monitor::ServiceMonitor,
|
|
||||||
},
|
|
||||||
},
|
|
||||||
okd::{crd::ingresses_config::Ingress as IngressResource, route::OKDTlsPassthroughScore},
|
okd::{crd::ingresses_config::Ingress as IngressResource, route::OKDTlsPassthroughScore},
|
||||||
prometheus::{
|
|
||||||
k8s_prometheus_alerting_score::K8sPrometheusCRDAlertingScore,
|
|
||||||
prometheus::PrometheusMonitoring, rhob_alerting_score::RHOBAlertingScore,
|
|
||||||
},
|
|
||||||
},
|
},
|
||||||
score::Score,
|
score::Score,
|
||||||
topology::{TlsRoute, TlsRouter, ingress::Ingress},
|
topology::{TlsRoute, TlsRouter, ingress::Ingress},
|
||||||
@@ -59,7 +38,6 @@ use crate::{
|
|||||||
use super::super::{
|
use super::super::{
|
||||||
DeploymentTarget, HelmCommand, K8sclient, MultiTargetTopology, PreparationError,
|
DeploymentTarget, HelmCommand, K8sclient, MultiTargetTopology, PreparationError,
|
||||||
PreparationOutcome, Topology,
|
PreparationOutcome, Topology,
|
||||||
oberservability::monitoring::AlertReceiver,
|
|
||||||
tenant::{
|
tenant::{
|
||||||
TenantConfig, TenantManager,
|
TenantConfig, TenantManager,
|
||||||
k8s::K8sTenantManager,
|
k8s::K8sTenantManager,
|
||||||
@@ -166,216 +144,6 @@ impl TlsRouter for K8sAnywhereTopology {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl Grafana for K8sAnywhereTopology {
|
|
||||||
async fn ensure_grafana_operator(
|
|
||||||
&self,
|
|
||||||
inventory: &Inventory,
|
|
||||||
) -> Result<PreparationOutcome, PreparationError> {
|
|
||||||
debug!("ensure grafana operator");
|
|
||||||
let client = self.k8s_client().await.unwrap();
|
|
||||||
let grafana_gvk = GroupVersionKind {
|
|
||||||
group: "grafana.integreatly.org".to_string(),
|
|
||||||
version: "v1beta1".to_string(),
|
|
||||||
kind: "Grafana".to_string(),
|
|
||||||
};
|
|
||||||
let name = "grafanas.grafana.integreatly.org";
|
|
||||||
let ns = "grafana";
|
|
||||||
|
|
||||||
let grafana_crd = client
|
|
||||||
.get_resource_json_value(name, Some(ns), &grafana_gvk)
|
|
||||||
.await;
|
|
||||||
match grafana_crd {
|
|
||||||
Ok(_) => {
|
|
||||||
return Ok(PreparationOutcome::Success {
|
|
||||||
details: "Found grafana CRDs in cluster".to_string(),
|
|
||||||
});
|
|
||||||
}
|
|
||||||
|
|
||||||
Err(_) => {
|
|
||||||
return self
|
|
||||||
.install_grafana_operator(inventory, Some("grafana"))
|
|
||||||
.await;
|
|
||||||
}
|
|
||||||
};
|
|
||||||
}
|
|
||||||
async fn install_grafana(&self) -> Result<PreparationOutcome, PreparationError> {
|
|
||||||
let ns = "grafana";
|
|
||||||
|
|
||||||
let mut label = BTreeMap::new();
|
|
||||||
|
|
||||||
label.insert("dashboards".to_string(), "grafana".to_string());
|
|
||||||
|
|
||||||
let label_selector = LabelSelector {
|
|
||||||
match_labels: label.clone(),
|
|
||||||
match_expressions: vec![],
|
|
||||||
};
|
|
||||||
|
|
||||||
let client = self.k8s_client().await?;
|
|
||||||
|
|
||||||
let grafana = self.build_grafana(ns, &label);
|
|
||||||
|
|
||||||
client.apply(&grafana, Some(ns)).await?;
|
|
||||||
//TODO change this to a ensure ready or something better than just a timeout
|
|
||||||
client
|
|
||||||
.wait_until_deployment_ready(
|
|
||||||
"grafana-grafana-deployment",
|
|
||||||
Some("grafana"),
|
|
||||||
Some(Duration::from_secs(30)),
|
|
||||||
)
|
|
||||||
.await?;
|
|
||||||
|
|
||||||
let sa_name = "grafana-grafana-sa";
|
|
||||||
let token_secret_name = "grafana-sa-token-secret";
|
|
||||||
|
|
||||||
let sa_token_secret = self.build_sa_token_secret(token_secret_name, sa_name, ns);
|
|
||||||
|
|
||||||
client.apply(&sa_token_secret, Some(ns)).await?;
|
|
||||||
let secret_gvk = GroupVersionKind {
|
|
||||||
group: "".to_string(),
|
|
||||||
version: "v1".to_string(),
|
|
||||||
kind: "Secret".to_string(),
|
|
||||||
};
|
|
||||||
|
|
||||||
let secret = client
|
|
||||||
.get_resource_json_value(token_secret_name, Some(ns), &secret_gvk)
|
|
||||||
.await?;
|
|
||||||
|
|
||||||
let token = format!(
|
|
||||||
"Bearer {}",
|
|
||||||
self.extract_and_normalize_token(&secret).unwrap()
|
|
||||||
);
|
|
||||||
|
|
||||||
debug!("creating grafana clusterrole binding");
|
|
||||||
|
|
||||||
let clusterrolebinding =
|
|
||||||
self.build_cluster_rolebinding(sa_name, "cluster-monitoring-view", ns);
|
|
||||||
|
|
||||||
client.apply(&clusterrolebinding, Some(ns)).await?;
|
|
||||||
|
|
||||||
debug!("creating grafana datasource crd");
|
|
||||||
|
|
||||||
let thanos_url = format!(
|
|
||||||
"https://{}",
|
|
||||||
self.get_domain("thanos-querier-openshift-monitoring")
|
|
||||||
.await
|
|
||||||
.unwrap()
|
|
||||||
);
|
|
||||||
|
|
||||||
let thanos_openshift_datasource = self.build_grafana_datasource(
|
|
||||||
"thanos-openshift-monitoring",
|
|
||||||
ns,
|
|
||||||
&label_selector,
|
|
||||||
&thanos_url,
|
|
||||||
&token,
|
|
||||||
);
|
|
||||||
|
|
||||||
client.apply(&thanos_openshift_datasource, Some(ns)).await?;
|
|
||||||
|
|
||||||
debug!("creating grafana dashboard crd");
|
|
||||||
let dashboard = self.build_grafana_dashboard(ns, &label_selector);
|
|
||||||
|
|
||||||
client.apply(&dashboard, Some(ns)).await?;
|
|
||||||
debug!("creating grafana ingress");
|
|
||||||
let grafana_ingress = self.build_grafana_ingress(ns).await;
|
|
||||||
|
|
||||||
grafana_ingress
|
|
||||||
.interpret(&Inventory::empty(), self)
|
|
||||||
.await
|
|
||||||
.map_err(|e| PreparationError::new(e.to_string()))?;
|
|
||||||
|
|
||||||
Ok(PreparationOutcome::Success {
|
|
||||||
details: "Installed grafana composants".to_string(),
|
|
||||||
})
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl PrometheusMonitoring<CRDPrometheus> for K8sAnywhereTopology {
|
|
||||||
async fn install_prometheus(
|
|
||||||
&self,
|
|
||||||
sender: &CRDPrometheus,
|
|
||||||
_inventory: &Inventory,
|
|
||||||
_receivers: Option<Vec<Box<dyn AlertReceiver<CRDPrometheus>>>>,
|
|
||||||
) -> Result<PreparationOutcome, PreparationError> {
|
|
||||||
let client = self.k8s_client().await?;
|
|
||||||
|
|
||||||
for monitor in sender.service_monitor.iter() {
|
|
||||||
client
|
|
||||||
.apply(monitor, Some(&sender.namespace))
|
|
||||||
.await
|
|
||||||
.map_err(|e| PreparationError::new(e.to_string()))?;
|
|
||||||
}
|
|
||||||
Ok(PreparationOutcome::Success {
|
|
||||||
details: "successfuly installed prometheus components".to_string(),
|
|
||||||
})
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn ensure_prometheus_operator(
|
|
||||||
&self,
|
|
||||||
sender: &CRDPrometheus,
|
|
||||||
_inventory: &Inventory,
|
|
||||||
) -> Result<PreparationOutcome, PreparationError> {
|
|
||||||
let po_result = self.ensure_prometheus_operator(sender).await?;
|
|
||||||
|
|
||||||
match po_result {
|
|
||||||
PreparationOutcome::Success { details: _ } => {
|
|
||||||
debug!("Detected prometheus crds operator present in cluster.");
|
|
||||||
return Ok(po_result);
|
|
||||||
}
|
|
||||||
PreparationOutcome::Noop => {
|
|
||||||
debug!("Skipping Prometheus CR installation due to missing operator.");
|
|
||||||
return Ok(po_result);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl PrometheusMonitoring<RHOBObservability> for K8sAnywhereTopology {
|
|
||||||
async fn install_prometheus(
|
|
||||||
&self,
|
|
||||||
sender: &RHOBObservability,
|
|
||||||
inventory: &Inventory,
|
|
||||||
receivers: Option<Vec<Box<dyn AlertReceiver<RHOBObservability>>>>,
|
|
||||||
) -> Result<PreparationOutcome, PreparationError> {
|
|
||||||
let po_result = self.ensure_cluster_observability_operator(sender).await?;
|
|
||||||
|
|
||||||
if po_result == PreparationOutcome::Noop {
|
|
||||||
debug!("Skipping Prometheus CR installation due to missing operator.");
|
|
||||||
return Ok(po_result);
|
|
||||||
}
|
|
||||||
|
|
||||||
let result = self
|
|
||||||
.get_cluster_observability_operator_prometheus_application_score(
|
|
||||||
sender.clone(),
|
|
||||||
receivers,
|
|
||||||
)
|
|
||||||
.await
|
|
||||||
.interpret(inventory, self)
|
|
||||||
.await;
|
|
||||||
|
|
||||||
match result {
|
|
||||||
Ok(outcome) => match outcome.status {
|
|
||||||
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
|
|
||||||
details: outcome.message,
|
|
||||||
}),
|
|
||||||
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
|
|
||||||
_ => Err(PreparationError::new(outcome.message)),
|
|
||||||
},
|
|
||||||
Err(err) => Err(PreparationError::new(err.to_string())),
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn ensure_prometheus_operator(
|
|
||||||
&self,
|
|
||||||
sender: &RHOBObservability,
|
|
||||||
inventory: &Inventory,
|
|
||||||
) -> Result<PreparationOutcome, PreparationError> {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
impl Serialize for K8sAnywhereTopology {
|
impl Serialize for K8sAnywhereTopology {
|
||||||
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
|
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
|
||||||
where
|
where
|
||||||
@@ -580,23 +348,6 @@ impl K8sAnywhereTopology {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
fn extract_and_normalize_token(&self, secret: &DynamicObject) -> Option<String> {
|
|
||||||
let token_b64 = secret
|
|
||||||
.data
|
|
||||||
.get("token")
|
|
||||||
.or_else(|| secret.data.get("data").and_then(|d| d.get("token")))
|
|
||||||
.and_then(|v| v.as_str())?;
|
|
||||||
|
|
||||||
let bytes = general_purpose::STANDARD.decode(token_b64).ok()?;
|
|
||||||
|
|
||||||
let s = String::from_utf8(bytes).ok()?;
|
|
||||||
|
|
||||||
let cleaned = s
|
|
||||||
.trim_matches(|c: char| c.is_whitespace() || c == '\0')
|
|
||||||
.to_string();
|
|
||||||
Some(cleaned)
|
|
||||||
}
|
|
||||||
|
|
||||||
pub async fn get_k8s_distribution(&self) -> Result<KubernetesDistribution, PreparationError> {
|
pub async fn get_k8s_distribution(&self) -> Result<KubernetesDistribution, PreparationError> {
|
||||||
self.k8s_client()
|
self.k8s_client()
|
||||||
.await?
|
.await?
|
||||||
@@ -656,141 +407,6 @@ impl K8sAnywhereTopology {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
fn build_grafana_datasource(
|
|
||||||
&self,
|
|
||||||
name: &str,
|
|
||||||
ns: &str,
|
|
||||||
label_selector: &LabelSelector,
|
|
||||||
url: &str,
|
|
||||||
token: &str,
|
|
||||||
) -> GrafanaDatasource {
|
|
||||||
let mut json_data = BTreeMap::new();
|
|
||||||
json_data.insert("timeInterval".to_string(), "5s".to_string());
|
|
||||||
|
|
||||||
GrafanaDatasource {
|
|
||||||
metadata: ObjectMeta {
|
|
||||||
name: Some(name.to_string()),
|
|
||||||
namespace: Some(ns.to_string()),
|
|
||||||
..Default::default()
|
|
||||||
},
|
|
||||||
spec: GrafanaDatasourceSpec {
|
|
||||||
instance_selector: label_selector.clone(),
|
|
||||||
allow_cross_namespace_import: Some(true),
|
|
||||||
values_from: None,
|
|
||||||
datasource: GrafanaDatasourceConfig {
|
|
||||||
access: "proxy".to_string(),
|
|
||||||
name: name.to_string(),
|
|
||||||
r#type: "prometheus".to_string(),
|
|
||||||
url: url.to_string(),
|
|
||||||
database: None,
|
|
||||||
json_data: Some(GrafanaDatasourceJsonData {
|
|
||||||
time_interval: Some("60s".to_string()),
|
|
||||||
http_header_name1: Some("Authorization".to_string()),
|
|
||||||
tls_skip_verify: Some(true),
|
|
||||||
oauth_pass_thru: Some(true),
|
|
||||||
}),
|
|
||||||
secure_json_data: Some(GrafanaDatasourceSecureJsonData {
|
|
||||||
http_header_value1: Some(format!("Bearer {token}")),
|
|
||||||
}),
|
|
||||||
is_default: Some(false),
|
|
||||||
editable: Some(true),
|
|
||||||
},
|
|
||||||
},
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
fn build_grafana_dashboard(
|
|
||||||
&self,
|
|
||||||
ns: &str,
|
|
||||||
label_selector: &LabelSelector,
|
|
||||||
) -> GrafanaDashboard {
|
|
||||||
let graf_dashboard = GrafanaDashboard {
|
|
||||||
metadata: ObjectMeta {
|
|
||||||
name: Some(format!("grafana-dashboard-{}", ns)),
|
|
||||||
namespace: Some(ns.to_string()),
|
|
||||||
..Default::default()
|
|
||||||
},
|
|
||||||
spec: GrafanaDashboardSpec {
|
|
||||||
resync_period: Some("30s".to_string()),
|
|
||||||
instance_selector: label_selector.clone(),
|
|
||||||
datasources: Some(vec![GrafanaDashboardDatasource {
|
|
||||||
input_name: "DS_PROMETHEUS".to_string(),
|
|
||||||
datasource_name: "thanos-openshift-monitoring".to_string(),
|
|
||||||
}]),
|
|
||||||
json: None,
|
|
||||||
grafana_com: Some(GrafanaCom {
|
|
||||||
id: 17406,
|
|
||||||
revision: None,
|
|
||||||
}),
|
|
||||||
},
|
|
||||||
};
|
|
||||||
graf_dashboard
|
|
||||||
}
|
|
||||||
|
|
||||||
fn build_grafana(&self, ns: &str, labels: &BTreeMap<String, String>) -> GrafanaCRD {
|
|
||||||
let grafana = GrafanaCRD {
|
|
||||||
metadata: ObjectMeta {
|
|
||||||
name: Some(format!("grafana-{}", ns)),
|
|
||||||
namespace: Some(ns.to_string()),
|
|
||||||
labels: Some(labels.clone()),
|
|
||||||
..Default::default()
|
|
||||||
},
|
|
||||||
spec: GrafanaSpec {
|
|
||||||
config: None,
|
|
||||||
admin_user: None,
|
|
||||||
admin_password: None,
|
|
||||||
ingress: None,
|
|
||||||
persistence: None,
|
|
||||||
resources: None,
|
|
||||||
},
|
|
||||||
};
|
|
||||||
grafana
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn build_grafana_ingress(&self, ns: &str) -> K8sIngressScore {
|
|
||||||
let domain = self.get_domain(&format!("grafana-{}", ns)).await.unwrap();
|
|
||||||
let name = format!("{}-grafana", ns);
|
|
||||||
let backend_service = format!("grafana-{}-service", ns);
|
|
||||||
|
|
||||||
K8sIngressScore {
|
|
||||||
name: fqdn::fqdn!(&name),
|
|
||||||
host: fqdn::fqdn!(&domain),
|
|
||||||
backend_service: fqdn::fqdn!(&backend_service),
|
|
||||||
port: 3000,
|
|
||||||
path: Some("/".to_string()),
|
|
||||||
path_type: Some(PathType::Prefix),
|
|
||||||
namespace: Some(fqdn::fqdn!(&ns)),
|
|
||||||
ingress_class_name: Some("openshift-default".to_string()),
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn get_cluster_observability_operator_prometheus_application_score(
|
|
||||||
&self,
|
|
||||||
sender: RHOBObservability,
|
|
||||||
receivers: Option<Vec<Box<dyn AlertReceiver<RHOBObservability>>>>,
|
|
||||||
) -> RHOBAlertingScore {
|
|
||||||
RHOBAlertingScore {
|
|
||||||
sender,
|
|
||||||
receivers: receivers.unwrap_or_default(),
|
|
||||||
service_monitors: vec![],
|
|
||||||
prometheus_rules: vec![],
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn get_k8s_prometheus_application_score(
|
|
||||||
&self,
|
|
||||||
sender: CRDPrometheus,
|
|
||||||
receivers: Option<Vec<Box<dyn AlertReceiver<CRDPrometheus>>>>,
|
|
||||||
service_monitors: Option<Vec<ServiceMonitor>>,
|
|
||||||
) -> K8sPrometheusCRDAlertingScore {
|
|
||||||
return K8sPrometheusCRDAlertingScore {
|
|
||||||
sender,
|
|
||||||
receivers: receivers.unwrap_or_default(),
|
|
||||||
service_monitors: service_monitors.unwrap_or_default(),
|
|
||||||
prometheus_rules: vec![],
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn openshift_ingress_operator_available(&self) -> Result<(), PreparationError> {
|
async fn openshift_ingress_operator_available(&self) -> Result<(), PreparationError> {
|
||||||
let client = self.k8s_client().await?;
|
let client = self.k8s_client().await?;
|
||||||
let gvk = GroupVersionKind {
|
let gvk = GroupVersionKind {
|
||||||
@@ -956,137 +572,6 @@ impl K8sAnywhereTopology {
|
|||||||
)),
|
)),
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
async fn ensure_cluster_observability_operator(
|
|
||||||
&self,
|
|
||||||
sender: &RHOBObservability,
|
|
||||||
) -> Result<PreparationOutcome, PreparationError> {
|
|
||||||
let status = Command::new("sh")
|
|
||||||
.args(["-c", "kubectl get crd -A | grep -i rhobs"])
|
|
||||||
.status()
|
|
||||||
.map_err(|e| PreparationError::new(format!("could not connect to cluster: {}", e)))?;
|
|
||||||
|
|
||||||
if !status.success() {
|
|
||||||
if let Some(Some(k8s_state)) = self.k8s_state.get() {
|
|
||||||
match k8s_state.source {
|
|
||||||
K8sSource::LocalK3d => {
|
|
||||||
warn!(
|
|
||||||
"Installing observability operator is not supported on LocalK3d source"
|
|
||||||
);
|
|
||||||
return Ok(PreparationOutcome::Noop);
|
|
||||||
debug!("installing cluster observability operator");
|
|
||||||
todo!();
|
|
||||||
let op_score =
|
|
||||||
prometheus_operator_helm_chart_score(sender.namespace.clone());
|
|
||||||
let result = op_score.interpret(&Inventory::empty(), self).await;
|
|
||||||
|
|
||||||
return match result {
|
|
||||||
Ok(outcome) => match outcome.status {
|
|
||||||
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
|
|
||||||
details: "installed cluster observability operator".into(),
|
|
||||||
}),
|
|
||||||
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
|
|
||||||
_ => Err(PreparationError::new(
|
|
||||||
"failed to install cluster observability operator (unknown error)".into(),
|
|
||||||
)),
|
|
||||||
},
|
|
||||||
Err(err) => Err(PreparationError::new(err.to_string())),
|
|
||||||
};
|
|
||||||
}
|
|
||||||
K8sSource::Kubeconfig => {
|
|
||||||
debug!(
|
|
||||||
"unable to install cluster observability operator, contact cluster admin"
|
|
||||||
);
|
|
||||||
return Ok(PreparationOutcome::Noop);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
warn!(
|
|
||||||
"Unable to detect k8s_state. Skipping Cluster Observability Operator install."
|
|
||||||
);
|
|
||||||
return Ok(PreparationOutcome::Noop);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
debug!("Cluster Observability Operator is already present, skipping install");
|
|
||||||
|
|
||||||
Ok(PreparationOutcome::Success {
|
|
||||||
details: "cluster observability operator present in cluster".into(),
|
|
||||||
})
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn ensure_prometheus_operator(
|
|
||||||
&self,
|
|
||||||
sender: &CRDPrometheus,
|
|
||||||
) -> Result<PreparationOutcome, PreparationError> {
|
|
||||||
let status = Command::new("sh")
|
|
||||||
.args(["-c", "kubectl get crd -A | grep -i prometheuses"])
|
|
||||||
.status()
|
|
||||||
.map_err(|e| PreparationError::new(format!("could not connect to cluster: {}", e)))?;
|
|
||||||
|
|
||||||
if !status.success() {
|
|
||||||
if let Some(Some(k8s_state)) = self.k8s_state.get() {
|
|
||||||
match k8s_state.source {
|
|
||||||
K8sSource::LocalK3d => {
|
|
||||||
debug!("installing prometheus operator");
|
|
||||||
let op_score =
|
|
||||||
prometheus_operator_helm_chart_score(sender.namespace.clone());
|
|
||||||
let result = op_score.interpret(&Inventory::empty(), self).await;
|
|
||||||
|
|
||||||
return match result {
|
|
||||||
Ok(outcome) => match outcome.status {
|
|
||||||
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
|
|
||||||
details: "installed prometheus operator".into(),
|
|
||||||
}),
|
|
||||||
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
|
|
||||||
_ => Err(PreparationError::new(
|
|
||||||
"failed to install prometheus operator (unknown error)".into(),
|
|
||||||
)),
|
|
||||||
},
|
|
||||||
Err(err) => Err(PreparationError::new(err.to_string())),
|
|
||||||
};
|
|
||||||
}
|
|
||||||
K8sSource::Kubeconfig => {
|
|
||||||
debug!("unable to install prometheus operator, contact cluster admin");
|
|
||||||
return Ok(PreparationOutcome::Noop);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
warn!("Unable to detect k8s_state. Skipping Prometheus Operator install.");
|
|
||||||
return Ok(PreparationOutcome::Noop);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
debug!("Prometheus operator is already present, skipping install");
|
|
||||||
|
|
||||||
Ok(PreparationOutcome::Success {
|
|
||||||
details: "prometheus operator present in cluster".into(),
|
|
||||||
})
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn install_grafana_operator(
|
|
||||||
&self,
|
|
||||||
inventory: &Inventory,
|
|
||||||
ns: Option<&str>,
|
|
||||||
) -> Result<PreparationOutcome, PreparationError> {
|
|
||||||
let namespace = ns.unwrap_or("grafana");
|
|
||||||
info!("installing grafana operator in ns {namespace}");
|
|
||||||
let tenant = self.get_k8s_tenant_manager()?.get_tenant_config().await;
|
|
||||||
let mut namespace_scope = false;
|
|
||||||
if tenant.is_some() {
|
|
||||||
namespace_scope = true;
|
|
||||||
}
|
|
||||||
let _grafana_operator_score = grafana_helm_chart_score(namespace, namespace_scope)
|
|
||||||
.interpret(inventory, self)
|
|
||||||
.await
|
|
||||||
.map_err(|e| PreparationError::new(e.to_string()));
|
|
||||||
Ok(PreparationOutcome::Success {
|
|
||||||
details: format!(
|
|
||||||
"Successfully installed grafana operator in ns {}",
|
|
||||||
ns.unwrap()
|
|
||||||
),
|
|
||||||
})
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
#[derive(Clone, Debug)]
|
#[derive(Clone, Debug)]
|
||||||
|
|||||||
@@ -1,4 +1,5 @@
|
|||||||
mod k8s_anywhere;
|
mod k8s_anywhere;
|
||||||
pub mod nats;
|
pub mod nats;
|
||||||
|
pub mod observability;
|
||||||
mod postgres;
|
mod postgres;
|
||||||
pub use k8s_anywhere::*;
|
pub use k8s_anywhere::*;
|
||||||
|
|||||||
@@ -0,0 +1,147 @@
|
|||||||
|
use async_trait::async_trait;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
inventory::Inventory,
|
||||||
|
modules::monitoring::grafana::{
|
||||||
|
grafana::Grafana,
|
||||||
|
k8s::{
|
||||||
|
score_ensure_grafana_ready::GrafanaK8sEnsureReadyScore,
|
||||||
|
score_grafana_alert_receiver::GrafanaK8sReceiverScore,
|
||||||
|
score_grafana_datasource::GrafanaK8sDatasourceScore,
|
||||||
|
score_grafana_rule::GrafanaK8sRuleScore, score_install_grafana::GrafanaK8sInstallScore,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
score::Score,
|
||||||
|
topology::{
|
||||||
|
K8sAnywhereTopology, PreparationError, PreparationOutcome,
|
||||||
|
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl Observability<Grafana> for K8sAnywhereTopology {
|
||||||
|
async fn install_alert_sender(
|
||||||
|
&self,
|
||||||
|
sender: &Grafana,
|
||||||
|
inventory: &Inventory,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let score = GrafanaK8sInstallScore {
|
||||||
|
sender: sender.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Grafana not installed {}", e)))?;
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "Successfully installed grafana alert sender".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn install_receivers(
|
||||||
|
&self,
|
||||||
|
sender: &Grafana,
|
||||||
|
inventory: &Inventory,
|
||||||
|
receivers: Option<Vec<Box<dyn AlertReceiver<Grafana>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let receivers = match receivers {
|
||||||
|
Some(r) if !r.is_empty() => r,
|
||||||
|
_ => return Ok(PreparationOutcome::Noop),
|
||||||
|
};
|
||||||
|
|
||||||
|
for receiver in receivers {
|
||||||
|
let score = GrafanaK8sReceiverScore {
|
||||||
|
receiver,
|
||||||
|
sender: sender.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Failed to install receiver: {}", e)))?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "All alert receivers installed successfully".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn install_rules(
|
||||||
|
&self,
|
||||||
|
sender: &Grafana,
|
||||||
|
inventory: &Inventory,
|
||||||
|
rules: Option<Vec<Box<dyn AlertRule<Grafana>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let rules = match rules {
|
||||||
|
Some(r) if !r.is_empty() => r,
|
||||||
|
_ => return Ok(PreparationOutcome::Noop),
|
||||||
|
};
|
||||||
|
|
||||||
|
for rule in rules {
|
||||||
|
let score = GrafanaK8sRuleScore {
|
||||||
|
sender: sender.clone(),
|
||||||
|
rule,
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "All alert rules installed successfully".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn add_scrape_targets(
|
||||||
|
&self,
|
||||||
|
sender: &Grafana,
|
||||||
|
inventory: &Inventory,
|
||||||
|
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<Grafana>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let scrape_targets = match scrape_targets {
|
||||||
|
Some(r) if !r.is_empty() => r,
|
||||||
|
_ => return Ok(PreparationOutcome::Noop),
|
||||||
|
};
|
||||||
|
|
||||||
|
for scrape_target in scrape_targets {
|
||||||
|
let score = GrafanaK8sDatasourceScore {
|
||||||
|
scrape_target,
|
||||||
|
sender: sender.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Failed to add DataSource: {}", e)))?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "All datasources installed successfully".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn ensure_monitoring_installed(
|
||||||
|
&self,
|
||||||
|
sender: &Grafana,
|
||||||
|
inventory: &Inventory,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let score = GrafanaK8sEnsureReadyScore {
|
||||||
|
sender: sender.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Grafana not ready {}", e)))?;
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "Grafana Ready".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
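The `install_receivers`, `install_rules` and `add_scrape_targets` methods above share one shape: treat `None` and an empty list as a no-op, run one score per item, and stop at the first failure. A minimal sketch of that shape follows. It assumes a simplified `PreparationOutcome`, a plain `String` error, and a hypothetical helper `install_all` that does not exist in Harmony.

```rust
/// Simplified stand-in for the `PreparationOutcome` type used in the diff.
#[derive(Debug, PartialEq)]
pub enum PreparationOutcome {
    Success { details: String },
    Noop,
}

/// Hypothetical helper: runs `install` on every item and reports the outcome.
pub fn install_all<T>(
    items: Option<Vec<T>>,
    mut install: impl FnMut(T) -> Result<(), String>,
) -> Result<PreparationOutcome, String> {
    // `None` and `Some(vec![])` both mean "nothing to install".
    let items = match items {
        Some(v) if !v.is_empty() => v,
        _ => return Ok(PreparationOutcome::Noop),
    };

    let count = items.len();
    for item in items {
        // Stop at the first failure, as the `?` operator does in the diff.
        install(item)?;
    }

    Ok(PreparationOutcome::Success {
        details: format!("installed {count} item(s)"),
    })
}
```

Calling `install_all(Some(vec![1, 2]), |_| Ok(()))` yields a `Success` outcome, while `None` or an empty vector yields `Noop`. A shared helper along these lines might reduce the repetition across the five `Observability` implementations, though whether that fits the project's design is a judgment call for the authors.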
@@ -0,0 +1,142 @@
|
|||||||
|
use async_trait::async_trait;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
inventory::Inventory,
|
||||||
|
modules::monitoring::kube_prometheus::{
|
||||||
|
KubePrometheus, helm::kube_prometheus_helm_chart::kube_prometheus_helm_chart_score,
|
||||||
|
score_kube_prometheus_alert_receivers::KubePrometheusReceiverScore,
|
||||||
|
score_kube_prometheus_ensure_ready::KubePrometheusEnsureReadyScore,
|
||||||
|
score_kube_prometheus_rule::KubePrometheusRuleScore,
|
||||||
|
score_kube_prometheus_scrape_target::KubePrometheusScrapeTargetScore,
|
||||||
|
},
|
||||||
|
score::Score,
|
||||||
|
topology::{
|
||||||
|
K8sAnywhereTopology, PreparationError, PreparationOutcome,
|
||||||
|
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl Observability<KubePrometheus> for K8sAnywhereTopology {
|
||||||
|
async fn install_alert_sender(
|
||||||
|
&self,
|
||||||
|
sender: &KubePrometheus,
|
||||||
|
inventory: &Inventory,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
kube_prometheus_helm_chart_score(sender.config.clone())
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(e.to_string()))?;
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "Successfully installed kubeprometheus alert sender".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn install_receivers(
|
||||||
|
&self,
|
||||||
|
sender: &KubePrometheus,
|
||||||
|
inventory: &Inventory,
|
||||||
|
receivers: Option<Vec<Box<dyn AlertReceiver<KubePrometheus>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let receivers = match receivers {
|
||||||
|
Some(r) if !r.is_empty() => r,
|
||||||
|
_ => return Ok(PreparationOutcome::Noop),
|
||||||
|
};
|
||||||
|
|
||||||
|
for receiver in receivers {
|
||||||
|
let score = KubePrometheusReceiverScore {
|
||||||
|
receiver,
|
||||||
|
sender: sender.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Failed to install receiver: {}", e)))?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "All alert receivers installed successfully".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn install_rules(
|
||||||
|
&self,
|
||||||
|
sender: &KubePrometheus,
|
||||||
|
inventory: &Inventory,
|
||||||
|
rules: Option<Vec<Box<dyn AlertRule<KubePrometheus>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let rules = match rules {
|
||||||
|
Some(r) if !r.is_empty() => r,
|
||||||
|
_ => return Ok(PreparationOutcome::Noop),
|
||||||
|
};
|
||||||
|
|
||||||
|
for rule in rules {
|
||||||
|
let score = KubePrometheusRuleScore {
|
||||||
|
sender: sender.clone(),
|
||||||
|
rule,
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "All alert rules installed successfully".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn add_scrape_targets(
|
||||||
|
&self,
|
||||||
|
sender: &KubePrometheus,
|
||||||
|
inventory: &Inventory,
|
||||||
|
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<KubePrometheus>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let scrape_targets = match scrape_targets {
|
||||||
|
Some(r) if !r.is_empty() => r,
|
||||||
|
_ => return Ok(PreparationOutcome::Noop),
|
||||||
|
};
|
||||||
|
|
||||||
|
for scrape_target in scrape_targets {
|
||||||
|
let score = KubePrometheusScrapeTargetScore {
|
||||||
|
scrape_target,
|
||||||
|
sender: sender.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Failed to add scrape target: {}", e)))?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "All scrape targets installed successfully".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn ensure_monitoring_installed(
|
||||||
|
&self,
|
||||||
|
sender: &KubePrometheus,
|
||||||
|
inventory: &Inventory,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let score = KubePrometheusEnsureReadyScore {
|
||||||
|
sender: sender.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("KubePrometheus not ready {}", e)))?;
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "KubePrometheus Ready".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,5 @@
|
|||||||
|
pub mod grafana;
|
||||||
|
pub mod kube_prometheus;
|
||||||
|
pub mod openshift_monitoring;
|
||||||
|
pub mod prometheus;
|
||||||
|
pub mod redhat_cluster_observability;
|
||||||
@@ -0,0 +1,142 @@
|
|||||||
|
use async_trait::async_trait;
|
||||||
|
use log::info;
|
||||||
|
|
||||||
|
use crate::score::Score;
|
||||||
|
use crate::{
|
||||||
|
inventory::Inventory,
|
||||||
|
modules::monitoring::okd::{
|
||||||
|
OpenshiftClusterAlertSender,
|
||||||
|
score_enable_cluster_monitoring::OpenshiftEnableClusterMonitoringScore,
|
||||||
|
score_openshift_alert_rule::OpenshiftAlertRuleScore,
|
||||||
|
score_openshift_receiver::OpenshiftReceiverScore,
|
||||||
|
score_openshift_scrape_target::OpenshiftScrapeTargetScore,
|
||||||
|
score_user_workload::OpenshiftUserWorkloadMonitoring,
|
||||||
|
score_verify_user_workload_monitoring::VerifyUserWorkload,
|
||||||
|
},
|
||||||
|
topology::{
|
||||||
|
K8sAnywhereTopology, PreparationError, PreparationOutcome,
|
||||||
|
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl Observability<OpenshiftClusterAlertSender> for K8sAnywhereTopology {
|
||||||
|
async fn install_alert_sender(
|
||||||
|
&self,
|
||||||
|
_sender: &OpenshiftClusterAlertSender,
|
||||||
|
inventory: &Inventory,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
info!("enabling cluster monitoring");
|
||||||
|
let cluster_monitoring_score = OpenshiftEnableClusterMonitoringScore {};
|
||||||
|
cluster_monitoring_score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError { msg: e.to_string() })?;
|
||||||
|
|
||||||
|
info!("enabling user workload monitoring");
|
||||||
|
let user_workload_score = OpenshiftUserWorkloadMonitoring {};
|
||||||
|
user_workload_score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError { msg: e.to_string() })?;
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "Successfully configured cluster monitoring".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn install_receivers(
|
||||||
|
&self,
|
||||||
|
_sender: &OpenshiftClusterAlertSender,
|
||||||
|
inventory: &Inventory,
|
||||||
|
receivers: Option<Vec<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
if let Some(receivers) = receivers {
|
||||||
|
for receiver in receivers {
|
||||||
|
info!("Installing receiver {}", receiver.name());
|
||||||
|
let receiver_score = OpenshiftReceiverScore { receiver };
|
||||||
|
receiver_score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError { msg: e.to_string() })?;
|
||||||
|
}
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "Successfully installed receivers for OpenshiftClusterMonitoring"
|
||||||
|
.to_string(),
|
||||||
|
})
|
||||||
|
} else {
|
||||||
|
Ok(PreparationOutcome::Noop)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn install_rules(
|
||||||
|
&self,
|
||||||
|
_sender: &OpenshiftClusterAlertSender,
|
||||||
|
inventory: &Inventory,
|
||||||
|
rules: Option<Vec<Box<dyn AlertRule<OpenshiftClusterAlertSender>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
if let Some(rules) = rules {
|
||||||
|
for rule in rules {
|
||||||
|
info!("Installing rule {}", rule.name());
|
||||||
|
let rule_score = OpenshiftAlertRuleScore { rule };
|
||||||
|
rule_score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError { msg: e.to_string() })?;
|
||||||
|
}
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "Successfully installed rules for OpenshiftClusterMonitoring".to_string(),
|
||||||
|
})
|
||||||
|
} else {
|
||||||
|
Ok(PreparationOutcome::Noop)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn add_scrape_targets(
|
||||||
|
&self,
|
||||||
|
_sender: &OpenshiftClusterAlertSender,
|
||||||
|
inventory: &Inventory,
|
||||||
|
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
if let Some(scrape_targets) = scrape_targets {
|
||||||
|
for scrape_target in scrape_targets {
|
||||||
|
info!("Installing scrape target");
|
||||||
|
let scrape_target_score = OpenshiftScrapeTargetScore {
|
||||||
|
scrape_target,
|
||||||
|
};
|
||||||
|
scrape_target_score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError { msg: e.to_string() })?;
|
||||||
|
}
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "Successfully added scrape targets for OpenshiftClusterMonitoring"
|
||||||
|
.to_string(),
|
||||||
|
})
|
||||||
|
} else {
|
||||||
|
Ok(PreparationOutcome::Noop)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn ensure_monitoring_installed(
|
||||||
|
&self,
|
||||||
|
_sender: &OpenshiftClusterAlertSender,
|
||||||
|
inventory: &Inventory,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let verify_monitoring_score = VerifyUserWorkload {};
|
||||||
|
info!("Verifying user workload and cluster monitoring installed");
|
||||||
|
verify_monitoring_score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError { msg: e.to_string() })?;
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "OpenshiftClusterMonitoring ready".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,147 @@
|
|||||||
|
use async_trait::async_trait;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
inventory::Inventory,
|
||||||
|
modules::monitoring::prometheus::{
|
||||||
|
Prometheus, score_prometheus_alert_receivers::PrometheusReceiverScore,
|
||||||
|
score_prometheus_ensure_ready::PrometheusEnsureReadyScore,
|
||||||
|
score_prometheus_install::PrometheusInstallScore,
|
||||||
|
score_prometheus_rule::PrometheusRuleScore,
|
||||||
|
score_prometheus_scrape_target::PrometheusScrapeTargetScore,
|
||||||
|
},
|
||||||
|
score::Score,
|
||||||
|
topology::{
|
||||||
|
K8sAnywhereTopology, PreparationError, PreparationOutcome,
|
||||||
|
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl Observability<Prometheus> for K8sAnywhereTopology {
|
||||||
|
async fn install_alert_sender(
|
||||||
|
&self,
|
||||||
|
sender: &Prometheus,
|
||||||
|
inventory: &Inventory,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let score = PrometheusInstallScore {
|
||||||
|
sender: sender.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Prometheus not installed {}", e)))?;
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "Successfully installed prometheus alert sender".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn install_receivers(
|
||||||
|
&self,
|
||||||
|
sender: &Prometheus,
|
||||||
|
inventory: &Inventory,
|
||||||
|
receivers: Option<Vec<Box<dyn AlertReceiver<Prometheus>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let receivers = match receivers {
|
||||||
|
Some(r) if !r.is_empty() => r,
|
||||||
|
_ => return Ok(PreparationOutcome::Noop),
|
||||||
|
};
|
||||||
|
|
||||||
|
for receiver in receivers {
|
||||||
|
let score = PrometheusReceiverScore {
|
||||||
|
receiver,
|
||||||
|
sender: sender.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Failed to install receiver: {}", e)))?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "All alert receivers installed successfully".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn install_rules(
|
||||||
|
&self,
|
||||||
|
sender: &Prometheus,
|
||||||
|
inventory: &Inventory,
|
||||||
|
rules: Option<Vec<Box<dyn AlertRule<Prometheus>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let rules = match rules {
|
||||||
|
Some(r) if !r.is_empty() => r,
|
||||||
|
_ => return Ok(PreparationOutcome::Noop),
|
||||||
|
};
|
||||||
|
|
||||||
|
for rule in rules {
|
||||||
|
let score = PrometheusRuleScore {
|
||||||
|
sender: sender.clone(),
|
||||||
|
rule,
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "All alert rules installed successfully".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn add_scrape_targets(
|
||||||
|
&self,
|
||||||
|
sender: &Prometheus,
|
||||||
|
inventory: &Inventory,
|
||||||
|
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<Prometheus>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let scrape_targets = match scrape_targets {
|
||||||
|
Some(r) if !r.is_empty() => r,
|
||||||
|
_ => return Ok(PreparationOutcome::Noop),
|
||||||
|
};
|
||||||
|
|
||||||
|
for scrape_target in scrape_targets {
|
||||||
|
let score = PrometheusScrapeTargetScore {
|
||||||
|
scrape_target,
|
||||||
|
sender: sender.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Failed to add scrape target: {}", e)))?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "All scrape targets installed successfully".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn ensure_monitoring_installed(
|
||||||
|
&self,
|
||||||
|
sender: &Prometheus,
|
||||||
|
inventory: &Inventory,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let score = PrometheusEnsureReadyScore {
|
||||||
|
sender: sender.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(format!("Prometheus not ready {}", e)))?;
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "Prometheus Ready".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,116 @@
|
|||||||
|
use crate::{
|
||||||
|
modules::monitoring::red_hat_cluster_observability::{
|
||||||
|
score_alert_receiver::RedHatClusterObservabilityReceiverScore,
|
||||||
|
score_coo_monitoring_stack::RedHatClusterObservabilityMonitoringStackScore,
|
||||||
|
},
|
||||||
|
score::Score,
|
||||||
|
};
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use log::info;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
inventory::Inventory,
|
||||||
|
modules::monitoring::red_hat_cluster_observability::{
|
||||||
|
RedHatClusterObservability,
|
||||||
|
score_redhat_cluster_observability_operator::RedHatClusterObservabilityOperatorScore,
|
||||||
|
},
|
||||||
|
topology::{
|
||||||
|
K8sAnywhereTopology, PreparationError, PreparationOutcome,
|
||||||
|
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl Observability<RedHatClusterObservability> for K8sAnywhereTopology {
|
||||||
|
async fn install_alert_sender(
|
||||||
|
&self,
|
||||||
|
sender: &RedHatClusterObservability,
|
||||||
|
inventory: &Inventory,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
info!("Verifying Redhat Cluster Observability Operator");
|
||||||
|
|
||||||
|
let coo_score = RedHatClusterObservabilityOperatorScore::default();
|
||||||
|
|
||||||
|
coo_score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(e.to_string()))?;
|
||||||
|
|
||||||
|
info!(
|
||||||
|
"Installing Cluster Observability Operator Monitoring Stack in ns {}",
|
||||||
|
sender.namespace.clone()
|
||||||
|
);
|
||||||
|
|
||||||
|
let coo_monitoring_stack_score = RedHatClusterObservabilityMonitoringStackScore {
|
||||||
|
namespace: sender.namespace.clone(),
|
||||||
|
resource_selector: sender.resource_selector.clone(),
|
||||||
|
};
|
||||||
|
|
||||||
|
coo_monitoring_stack_score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(e.to_string()))?;
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "Successfully installed RedHatClusterObservability Operator".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn install_receivers(
|
||||||
|
&self,
|
||||||
|
sender: &RedHatClusterObservability,
|
||||||
|
inventory: &Inventory,
|
||||||
|
receivers: Option<Vec<Box<dyn AlertReceiver<RedHatClusterObservability>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
let receivers = match receivers {
|
||||||
|
Some(r) if !r.is_empty() => r,
|
||||||
|
_ => return Ok(PreparationOutcome::Noop),
|
||||||
|
};
|
||||||
|
|
||||||
|
for receiver in receivers {
|
||||||
|
info!("Installing receiver {}", receiver.name());
|
||||||
|
|
||||||
|
let receiver_score = RedHatClusterObservabilityReceiverScore {
|
||||||
|
receiver,
|
||||||
|
sender: sender.clone(),
|
||||||
|
};
|
||||||
|
receiver_score
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, self)
|
||||||
|
.await
|
||||||
|
.map_err(|e| PreparationError::new(e.to_string()))?;
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(PreparationOutcome::Success {
|
||||||
|
details: "Successfully installed receivers for RedHatClusterObservability".to_string(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn install_rules(
|
||||||
|
&self,
|
||||||
|
_sender: &RedHatClusterObservability,
|
||||||
|
_inventory: &Inventory,
|
||||||
|
_rules: Option<Vec<Box<dyn AlertRule<RedHatClusterObservability>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn add_scrape_targets(
|
||||||
|
&self,
|
||||||
|
_sender: &RedHatClusterObservability,
|
||||||
|
_inventory: &Inventory,
|
||||||
|
_scrape_targets: Option<Vec<Box<dyn ScrapeTarget<RedHatClusterObservability>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn ensure_monitoring_installed(
|
||||||
|
&self,
|
||||||
|
_sender: &RedHatClusterObservability,
|
||||||
|
_inventory: &Inventory,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
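In the `RedHatClusterObservability` implementation above, `install_rules`, `add_scrape_targets` and `ensure_monitoring_installed` are `todo!()` stubs. `AlertingInterpret::execute` in `monitoring.rs` calls every `Observability` method in sequence, so these stubs would panic at runtime. One possible interim approach, sketched here with simplified stand-in types and a hypothetical function name, is to return `Noop` when nothing was requested and a regular error otherwise.

```rust
/// Simplified stand-ins for the types used in the diff.
#[derive(Debug, PartialEq)]
pub enum PreparationOutcome {
    Success { details: String },
    Noop,
}

#[derive(Debug, PartialEq)]
pub struct PreparationError {
    pub msg: String,
}

/// Hypothetical replacement body for a not-yet-implemented `install_rules`.
pub fn install_rules_unsupported<R>(
    rules: Option<Vec<R>>,
) -> Result<PreparationOutcome, PreparationError> {
    match rules {
        // Rules were requested but cannot be installed yet: fail without panicking.
        Some(r) if !r.is_empty() => Err(PreparationError {
            msg: format!(
                "alert rules are not yet supported for this sender ({} rule(s) requested)",
                r.len()
            ),
        }),
        // Nothing requested, so there is nothing to do.
        _ => Ok(PreparationOutcome::Noop),
    }
}
```

With this shape, a caller that passes no rules proceeds normally, and a caller that does pass rules receives a `PreparationError` it can report instead of a panic.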
@@ -2,6 +2,7 @@ pub mod decentralized;
|
|||||||
mod failover;
|
mod failover;
|
||||||
mod ha_cluster;
|
mod ha_cluster;
|
||||||
pub mod ingress;
|
pub mod ingress;
|
||||||
|
pub mod monitoring;
|
||||||
pub mod node_exporter;
|
pub mod node_exporter;
|
||||||
pub mod opnsense;
|
pub mod opnsense;
|
||||||
pub use failover::*;
|
pub use failover::*;
|
||||||
@@ -11,7 +12,6 @@ mod http;
|
|||||||
pub mod installable;
|
pub mod installable;
|
||||||
mod k8s_anywhere;
|
mod k8s_anywhere;
|
||||||
mod localhost;
|
mod localhost;
|
||||||
pub mod oberservability;
|
|
||||||
pub mod tenant;
|
pub mod tenant;
|
||||||
use derive_new::new;
|
use derive_new::new;
|
||||||
pub use k8s_anywhere::*;
|
pub use k8s_anywhere::*;
|
||||||
|
|||||||
256
harmony/src/domain/topology/monitoring.rs
Normal file
@@ -0,0 +1,256 @@
|
|||||||
|
use std::{
|
||||||
|
any::Any,
|
||||||
|
collections::{BTreeMap, HashMap},
|
||||||
|
net::IpAddr,
|
||||||
|
};
|
||||||
|
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use kube::api::DynamicObject;
|
||||||
|
use log::{debug, info};
|
||||||
|
use serde::{Deserialize, Serialize};
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
data::Version,
|
||||||
|
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
|
||||||
|
inventory::Inventory,
|
||||||
|
topology::{PreparationError, PreparationOutcome, Topology, installable::Installable},
|
||||||
|
};
|
||||||
|
use harmony_types::id::Id;
|
||||||
|
|
||||||
|
/// Defines the application that sends alerts to receivers,
|
||||||
|
/// for example Prometheus.
|
||||||
|
#[async_trait]
|
||||||
|
pub trait AlertSender: Send + Sync + std::fmt::Debug {
|
||||||
|
fn name(&self) -> String;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Trait which defines how an alert sender is implemented for a specific topology
|
||||||
|
#[async_trait]
|
||||||
|
pub trait Observability<S: AlertSender> {
|
||||||
|
async fn install_alert_sender(
|
||||||
|
&self,
|
||||||
|
sender: &S,
|
||||||
|
inventory: &Inventory,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError>;
|
||||||
|
|
||||||
|
async fn install_receivers(
|
||||||
|
&self,
|
||||||
|
sender: &S,
|
||||||
|
inventory: &Inventory,
|
||||||
|
receivers: Option<Vec<Box<dyn AlertReceiver<S>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError>;
|
||||||
|
|
||||||
|
async fn install_rules(
|
||||||
|
&self,
|
||||||
|
sender: &S,
|
||||||
|
inventory: &Inventory,
|
||||||
|
rules: Option<Vec<Box<dyn AlertRule<S>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError>;
|
||||||
|
|
||||||
|
async fn add_scrape_targets(
|
||||||
|
&self,
|
||||||
|
sender: &S,
|
||||||
|
inventory: &Inventory,
|
||||||
|
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError>;
|
||||||
|
|
||||||
|
async fn ensure_monitoring_installed(
|
||||||
|
&self,
|
||||||
|
sender: &S,
|
||||||
|
inventory: &Inventory,
|
||||||
|
) -> Result<PreparationOutcome, PreparationError>;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Defines the entity that receives the alerts from a sender. For example Discord, Slack, etc
|
||||||
|
///
|
||||||
|
pub trait AlertReceiver<S: AlertSender>: std::fmt::Debug + Send + Sync {
|
||||||
|
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError>;
|
||||||
|
fn name(&self) -> String;
|
||||||
|
fn clone_box(&self) -> Box<dyn AlertReceiver<S>>;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Defines a generic rule that can be applied to a sender, such as a Prometheus alert rule
|
||||||
|
pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
|
||||||
|
fn build_rule(&self) -> Result<serde_json::Value, InterpretError>;
|
||||||
|
fn name(&self) -> String;
|
||||||
|
fn clone_box(&self) -> Box<dyn AlertRule<S>>;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// A generic scrape target that can be added to a sender to scrape metrics from, for example a
|
||||||
|
/// server outside of the cluster
|
||||||
|
pub trait ScrapeTarget<S: AlertSender>: std::fmt::Debug + Send + Sync {
|
||||||
|
fn build_scrape_target(&self) -> Result<ExternalScrapeTarget, InterpretError>;
|
||||||
|
fn name(&self) -> String;
|
||||||
|
fn clone_box(&self) -> Box<dyn ScrapeTarget<S>>;
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||||
|
pub struct ExternalScrapeTarget {
|
||||||
|
pub ip: IpAddr,
|
||||||
|
pub port: i32,
|
||||||
|
pub interval: Option<String>,
|
||||||
|
pub path: Option<String>,
|
||||||
|
pub labels: Option<BTreeMap<String, String>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Alerting interpret to install an alert sender on a given topology
|
||||||
|
#[derive(Debug)]
|
||||||
|
pub struct AlertingInterpret<S: AlertSender> {
|
||||||
|
pub sender: S,
|
||||||
|
pub receivers: Vec<Box<dyn AlertReceiver<S>>>,
|
||||||
|
pub rules: Vec<Box<dyn AlertRule<S>>>,
|
||||||
|
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl<S: AlertSender, T: Topology + Observability<S>> Interpret<T> for AlertingInterpret<S> {
|
||||||
|
async fn execute(
|
||||||
|
&self,
|
||||||
|
inventory: &Inventory,
|
||||||
|
topology: &T,
|
||||||
|
) -> Result<Outcome, InterpretError> {
|
||||||
|
info!("Configuring alert sender {}", self.sender.name());
|
||||||
|
topology
|
||||||
|
.install_alert_sender(&self.sender, inventory)
|
||||||
|
.await?;
|
||||||
|
|
||||||
|
info!("Installing receivers");
|
||||||
|
topology
|
||||||
|
.install_receivers(&self.sender, inventory, Some(self.receivers.clone()))
|
||||||
|
.await?;
|
||||||
|
|
||||||
|
info!("Installing rules");
|
||||||
|
topology
|
||||||
|
.install_rules(&self.sender, inventory, Some(self.rules.clone()))
|
||||||
|
.await?;
|
||||||
|
|
||||||
|
info!("Adding extra scrape targets");
|
||||||
|
topology
|
||||||
|
.add_scrape_targets(&self.sender, inventory, self.scrape_targets.clone())
|
||||||
|
.await?;
|
||||||
|
|
||||||
|
info!("Ensuring alert sender {} is ready", self.sender.name());
|
||||||
|
topology
|
||||||
|
.ensure_monitoring_installed(&self.sender, inventory)
|
||||||
|
.await?;
|
||||||
|
|
||||||
|
Ok(Outcome::success(format!(
|
||||||
|
"successfully installed alert sender {}",
|
||||||
|
self.sender.name()
|
||||||
|
)))
|
||||||
|
}
|
||||||
|
|
||||||
|
fn get_name(&self) -> InterpretName {
|
||||||
|
InterpretName::Alerting
|
||||||
|
}
|
||||||
|
|
||||||
|
fn get_version(&self) -> Version {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn get_status(&self) -> InterpretStatus {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn get_children(&self) -> Vec<Id> {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<S: AlertSender> Clone for Box<dyn AlertReceiver<S>> {
|
||||||
|
fn clone(&self) -> Self {
|
||||||
|
self.clone_box()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<S: AlertSender> Clone for Box<dyn AlertRule<S>> {
|
||||||
|
fn clone(&self) -> Self {
|
||||||
|
self.clone_box()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<S: AlertSender> Clone for Box<dyn ScrapeTarget<S>> {
|
||||||
|
fn clone(&self) -> Self {
|
||||||
|
self.clone_box()
|
||||||
|
}
|
||||||
|
}
|
||||||
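Review note: the three `Clone for Box<dyn ...>` impls above all rely on the same `clone_box` idiom. The following is a minimal, std-only sketch of that idiom, not the Harmony code itself; `Receiver`, `Webhook`, and `receiver_names` are hypothetical names introduced only for illustration.

```rust
use std::fmt::Debug;

// Hypothetical stand-in for the `AlertReceiver` trait in the diff; only the
// parts needed to demonstrate cloning are kept.
trait Receiver: Debug {
    fn name(&self) -> String;
    fn clone_box(&self) -> Box<dyn Receiver>;
}

// `Clone` cannot be a supertrait of a trait used as `dyn Trait`, so the boxed
// trait object gets its own `Clone` impl that delegates to `clone_box`.
impl Clone for Box<dyn Receiver> {
    fn clone(&self) -> Self {
        self.clone_box()
    }
}

// A concrete receiver only has to derive `Clone` and box a copy of itself.
#[derive(Debug, Clone)]
struct Webhook {
    name: String,
}

impl Receiver for Webhook {
    fn name(&self) -> String {
        self.name.clone()
    }

    fn clone_box(&self) -> Box<dyn Receiver> {
        Box::new(self.clone())
    }
}

// Collects the names of a slice of boxed receivers.
fn receiver_names(receivers: &[Box<dyn Receiver>]) -> Vec<String> {
    receivers.iter().map(|r| r.name()).collect()
}
```

With this in place a `Vec<Box<dyn Receiver>>` can be cloned directly, which appears to be what lets `execute` pass `self.receivers.clone()` to the topology.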
|
|
||||||
|
pub struct ReceiverInstallPlan {
|
||||||
|
pub install_operation: Option<Vec<InstallOperation>>,
|
||||||
|
pub route: Option<AlertRoute>,
|
||||||
|
pub receiver: Option<serde_yaml::Value>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Default for ReceiverInstallPlan {
|
||||||
|
fn default() -> Self {
|
||||||
|
Self {
|
||||||
|
install_operation: None,
|
||||||
|
route: None,
|
||||||
|
receiver: None,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
pub enum InstallOperation {
|
||||||
|
CreateSecret {
|
||||||
|
name: String,
|
||||||
|
data: BTreeMap<String, String>,
|
||||||
|
},
|
||||||
|
}
|
||||||
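Review note: with this change a receiver no longer installs itself; it returns a `ReceiverInstallPlan` and the topology decides how to apply it. The sketch below illustrates one way a consumer could walk such a plan. It is an assumption about usage, not code from this change: `route` and `receiver` are simplified to plain strings, and `describe_plan` is a hypothetical helper.

```rust
use std::collections::BTreeMap;

// Same shape as the enum in the diff.
enum InstallOperation {
    CreateSecret {
        name: String,
        data: BTreeMap<String, String>,
    },
}

// Simplified plan: the diff uses `AlertRoute` and `serde_yaml::Value` for the
// last two fields; strings are used here to keep the sketch std-only.
#[derive(Default)]
struct ReceiverInstallPlan {
    install_operation: Option<Vec<InstallOperation>>,
    route: Option<String>,
    receiver: Option<String>,
}

// Hypothetical consumer: lists the steps a topology might perform, in order.
fn describe_plan(plan: &ReceiverInstallPlan) -> Vec<String> {
    let mut steps = Vec::new();
    // `iter().flatten()` visits every operation, or nothing when the field is `None`.
    for op in plan.install_operation.iter().flatten() {
        match op {
            InstallOperation::CreateSecret { name, data } => {
                steps.push(format!("create secret {} ({} keys)", name, data.len()));
            }
        }
    }
    if let Some(receiver) = &plan.receiver {
        steps.push(format!("register receiver {receiver}"));
    }
    if let Some(route) = &plan.route {
        steps.push(format!("add route to {route}"));
    }
    steps
}
```

Because every field of the plan is an `Option`, `#[derive(Default)]` yields the same all-`None` value as the hand-written `Default` impl above.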
|
|
||||||
|
/// Generic routing that can map to various alert sender backends
|
||||||
|
#[derive(Debug, Clone, Serialize)]
|
||||||
|
pub struct AlertRoute {
|
||||||
|
pub receiver: String,
|
||||||
|
#[serde(skip_serializing_if = "Vec::is_empty")]
|
||||||
|
pub matchers: Vec<AlertMatcher>,
|
||||||
|
#[serde(skip_serializing_if = "Vec::is_empty")]
|
||||||
|
pub group_by: Vec<String>,
|
||||||
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
|
pub repeat_interval: Option<String>,
|
||||||
|
#[serde(rename = "continue")]
|
||||||
|
pub continue_matching: bool,
|
||||||
|
#[serde(skip_serializing_if = "Vec::is_empty")]
|
||||||
|
pub children: Vec<AlertRoute>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl AlertRoute {
|
||||||
|
pub fn default(name: String) -> Self {
|
||||||
|
Self {
|
||||||
|
receiver: name,
|
||||||
|
matchers: vec![],
|
||||||
|
group_by: vec![],
|
||||||
|
repeat_interval: Some("30s".to_string()),
|
||||||
|
continue_matching: true,
|
||||||
|
children: vec![],
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
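Review note: `AlertRoute::default(name)` is an inherent constructor rather than the `Default` trait, since a receiver name is required. Callers can still override individual fields with struct update syntax. A reduced sketch, assuming a hypothetical `Route` type with only a few of the fields and a hypothetical `critical_route` helper:

```rust
// Reduced, hypothetical version of `AlertRoute` with a subset of its fields.
#[derive(Debug, Clone, PartialEq)]
struct Route {
    receiver: String,
    group_by: Vec<String>,
    repeat_interval: Option<String>,
    continue_matching: bool,
}

impl Route {
    // Mirrors the defaults shown in the diff: empty grouping, a 30s repeat
    // interval, and `continue` enabled.
    fn default(name: String) -> Self {
        Self {
            receiver: name,
            group_by: vec![],
            repeat_interval: Some("30s".to_string()),
            continue_matching: true,
        }
    }
}

// Overrides two fields and takes the rest from the defaults.
fn critical_route() -> Route {
    Route {
        group_by: vec!["alertname".to_string()],
        repeat_interval: Some("4h".to_string()),
        ..Route::default("ntfy-webhook".to_string())
    }
}
```

The `"4h"` interval and the `alertname` grouping are illustrative values only; nothing in the diff sets them.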
|
|
||||||
|
#[derive(Debug, Clone, Serialize)]
|
||||||
|
pub struct AlertMatcher {
|
||||||
|
pub label: String,
|
||||||
|
pub operator: MatchOp,
|
||||||
|
pub value: String,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Clone)]
|
||||||
|
pub enum MatchOp {
|
||||||
|
Eq,
|
||||||
|
NotEq,
|
||||||
|
Regex,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Serialize for MatchOp {
|
||||||
|
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
|
||||||
|
where
|
||||||
|
S: serde::Serializer,
|
||||||
|
{
|
||||||
|
let op = match self {
|
||||||
|
MatchOp::Eq => "=",
|
||||||
|
MatchOp::NotEq => "!=",
|
||||||
|
MatchOp::Regex => "=~",
|
||||||
|
};
|
||||||
|
serializer.serialize_str(op)
|
||||||
|
}
|
||||||
|
}
|
||||||
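Review note: the custom `Serialize` impl maps each `MatchOp` variant to its Alertmanager operator string. The std-only sketch below uses the same mapping through `Display` and shows how a matcher could be rendered in Alertmanager's `label="value"` form. Whether Harmony renders matchers this way is an assumption; `Matcher` and `render_matcher` are hypothetical names.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy)]
enum MatchOp {
    Eq,
    NotEq,
    Regex,
}

// Same variant-to-operator mapping as the `Serialize` impl in the diff.
impl fmt::Display for MatchOp {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let op = match self {
            MatchOp::Eq => "=",
            MatchOp::NotEq => "!=",
            MatchOp::Regex => "=~",
        };
        f.write_str(op)
    }
}

// Hypothetical counterpart of `AlertMatcher`.
struct Matcher {
    label: String,
    operator: MatchOp,
    value: String,
}

// Renders a matcher as `label<op>"value"`. Quotes inside `value` are not
// escaped here; a real implementation would need to handle that.
fn render_matcher(m: &Matcher) -> String {
    format!("{}{}\"{}\"", m.label, m.operator, m.value)
}
```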
@@ -1 +0,0 @@
|
|||||||
pub mod monitoring;
|
|
||||||
@@ -1,101 +0,0 @@
|
|||||||
use std::{any::Any, collections::HashMap};
|
|
||||||
|
|
||||||
use async_trait::async_trait;
|
|
||||||
use kube::api::DynamicObject;
|
|
||||||
use log::debug;
|
|
||||||
|
|
||||||
use crate::{
|
|
||||||
data::Version,
|
|
||||||
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
|
|
||||||
inventory::Inventory,
|
|
||||||
topology::{Topology, installable::Installable},
|
|
||||||
};
|
|
||||||
use harmony_types::id::Id;
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
pub trait AlertSender: Send + Sync + std::fmt::Debug {
|
|
||||||
fn name(&self) -> String;
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Debug)]
|
|
||||||
pub struct AlertingInterpret<S: AlertSender> {
|
|
||||||
pub sender: S,
|
|
||||||
pub receivers: Vec<Box<dyn AlertReceiver<S>>>,
|
|
||||||
pub rules: Vec<Box<dyn AlertRule<S>>>,
|
|
||||||
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl<S: AlertSender + Installable<T>, T: Topology> Interpret<T> for AlertingInterpret<S> {
|
|
||||||
async fn execute(
|
|
||||||
&self,
|
|
||||||
inventory: &Inventory,
|
|
||||||
topology: &T,
|
|
||||||
) -> Result<Outcome, InterpretError> {
|
|
||||||
debug!("hit sender configure for AlertingInterpret");
|
|
||||||
self.sender.configure(inventory, topology).await?;
|
|
||||||
for receiver in self.receivers.iter() {
|
|
||||||
receiver.install(&self.sender).await?;
|
|
||||||
}
|
|
||||||
for rule in self.rules.iter() {
|
|
||||||
debug!("installing rule: {:#?}", rule);
|
|
||||||
rule.install(&self.sender).await?;
|
|
||||||
}
|
|
||||||
if let Some(targets) = &self.scrape_targets {
|
|
||||||
for target in targets.iter() {
|
|
||||||
debug!("installing scrape_target: {:#?}", target);
|
|
||||||
target.install(&self.sender).await?;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
self.sender.ensure_installed(inventory, topology).await?;
|
|
||||||
Ok(Outcome::success(format!(
|
|
||||||
"successfully installed alert sender {}",
|
|
||||||
self.sender.name()
|
|
||||||
)))
|
|
||||||
}
|
|
||||||
|
|
||||||
fn get_name(&self) -> InterpretName {
|
|
||||||
InterpretName::Alerting
|
|
||||||
}
|
|
||||||
|
|
||||||
fn get_version(&self) -> Version {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
|
|
||||||
fn get_status(&self) -> InterpretStatus {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
|
|
||||||
fn get_children(&self) -> Vec<Id> {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
pub trait AlertReceiver<S: AlertSender>: std::fmt::Debug + Send + Sync {
|
|
||||||
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
|
|
||||||
fn name(&self) -> String;
|
|
||||||
fn clone_box(&self) -> Box<dyn AlertReceiver<S>>;
|
|
||||||
fn as_any(&self) -> &dyn Any;
|
|
||||||
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String>;
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Debug)]
|
|
||||||
pub struct AlertManagerReceiver {
|
|
||||||
pub receiver_config: serde_json::Value,
|
|
||||||
// FIXME we should not leak k8s here. DynamicObject is k8s specific
|
|
||||||
pub additional_ressources: Vec<DynamicObject>,
|
|
||||||
pub route_config: serde_json::Value,
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
|
|
||||||
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
|
|
||||||
fn clone_box(&self) -> Box<dyn AlertRule<S>>;
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
pub trait ScrapeTarget<S: AlertSender>: std::fmt::Debug + Send + Sync {
|
|
||||||
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
|
|
||||||
fn clone_box(&self) -> Box<dyn ScrapeTarget<S>>;
|
|
||||||
}
|
|
||||||
@@ -2,13 +2,15 @@ use crate::modules::application::{
|
|||||||
Application, ApplicationFeature, InstallationError, InstallationOutcome,
|
Application, ApplicationFeature, InstallationError, InstallationOutcome,
|
||||||
};
|
};
|
||||||
use crate::modules::monitoring::application_monitoring::application_monitoring_score::ApplicationMonitoringScore;
|
use crate::modules::monitoring::application_monitoring::application_monitoring_score::ApplicationMonitoringScore;
|
||||||
use crate::modules::monitoring::grafana::grafana::Grafana;
|
|
||||||
use crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::CRDPrometheus;
|
|
||||||
use crate::modules::monitoring::kube_prometheus::crd::service_monitor::{
|
use crate::modules::monitoring::kube_prometheus::crd::service_monitor::{
|
||||||
ServiceMonitor, ServiceMonitorSpec,
|
ServiceMonitor, ServiceMonitorSpec,
|
||||||
};
|
};
|
||||||
|
use crate::modules::monitoring::prometheus::Prometheus;
|
||||||
|
use crate::modules::monitoring::prometheus::helm::prometheus_config::PrometheusConfig;
|
||||||
use crate::topology::MultiTargetTopology;
|
use crate::topology::MultiTargetTopology;
|
||||||
use crate::topology::ingress::Ingress;
|
use crate::topology::ingress::Ingress;
|
||||||
|
use crate::topology::monitoring::Observability;
|
||||||
|
use crate::topology::monitoring::{AlertReceiver, AlertRoute};
|
||||||
use crate::{
|
use crate::{
|
||||||
inventory::Inventory,
|
inventory::Inventory,
|
||||||
modules::monitoring::{
|
modules::monitoring::{
|
||||||
@@ -17,10 +19,6 @@ use crate::{
|
|||||||
score::Score,
|
score::Score,
|
||||||
topology::{HelmCommand, K8sclient, Topology, tenant::TenantManager},
|
topology::{HelmCommand, K8sclient, Topology, tenant::TenantManager},
|
||||||
};
|
};
|
||||||
use crate::{
|
|
||||||
modules::prometheus::prometheus::PrometheusMonitoring,
|
|
||||||
topology::oberservability::monitoring::AlertReceiver,
|
|
||||||
};
|
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use base64::{Engine as _, engine::general_purpose};
|
use base64::{Engine as _, engine::general_purpose};
|
||||||
use harmony_secret::SecretManager;
|
use harmony_secret::SecretManager;
|
||||||
@@ -30,12 +28,13 @@ use kube::api::ObjectMeta;
|
|||||||
use log::{debug, info};
|
use log::{debug, info};
|
||||||
use schemars::JsonSchema;
|
use schemars::JsonSchema;
|
||||||
use serde::{Deserialize, Serialize};
|
use serde::{Deserialize, Serialize};
|
||||||
use std::sync::Arc;
|
use std::sync::{Arc, Mutex};
|
||||||
|
|
||||||
|
// TODO: test this
|
||||||
#[derive(Debug, Clone)]
|
#[derive(Debug, Clone)]
|
||||||
pub struct Monitoring {
|
pub struct Monitoring {
|
||||||
pub application: Arc<dyn Application>,
|
pub application: Arc<dyn Application>,
|
||||||
pub alert_receiver: Vec<Box<dyn AlertReceiver<CRDPrometheus>>>,
|
pub alert_receiver: Vec<Box<dyn AlertReceiver<Prometheus>>>,
|
||||||
}
|
}
|
||||||
|
|
||||||
#[async_trait]
|
#[async_trait]
|
||||||
@@ -46,8 +45,7 @@ impl<
|
|||||||
+ TenantManager
|
+ TenantManager
|
||||||
+ K8sclient
|
+ K8sclient
|
||||||
+ MultiTargetTopology
|
+ MultiTargetTopology
|
||||||
+ PrometheusMonitoring<CRDPrometheus>
|
+ Observability<Prometheus>
|
||||||
+ Grafana
|
|
||||||
+ Ingress
|
+ Ingress
|
||||||
+ std::fmt::Debug,
|
+ std::fmt::Debug,
|
||||||
> ApplicationFeature<T> for Monitoring
|
> ApplicationFeature<T> for Monitoring
|
||||||
@@ -74,17 +72,15 @@ impl<
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut alerting_score = ApplicationMonitoringScore {
|
let mut alerting_score = ApplicationMonitoringScore {
|
||||||
sender: CRDPrometheus {
|
sender: Prometheus {
|
||||||
namespace: namespace.clone(),
|
config: Arc::new(Mutex::new(PrometheusConfig::new())),
|
||||||
client: topology.k8s_client().await.unwrap(),
|
|
||||||
service_monitor: vec![app_service_monitor],
|
|
||||||
},
|
},
|
||||||
application: self.application.clone(),
|
application: self.application.clone(),
|
||||||
receivers: self.alert_receiver.clone(),
|
receivers: self.alert_receiver.clone(),
|
||||||
};
|
};
|
||||||
let ntfy = NtfyScore {
|
let ntfy = NtfyScore {
|
||||||
namespace: namespace.clone(),
|
namespace: namespace.clone(),
|
||||||
host: domain,
|
host: domain.clone(),
|
||||||
};
|
};
|
||||||
ntfy.interpret(&Inventory::empty(), topology)
|
ntfy.interpret(&Inventory::empty(), topology)
|
||||||
.await
|
.await
|
||||||
@@ -105,20 +101,28 @@ impl<
|
|||||||
|
|
||||||
debug!("ntfy_default_auth_param: {ntfy_default_auth_param}");
|
debug!("ntfy_default_auth_param: {ntfy_default_auth_param}");
|
||||||
|
|
||||||
|
debug!("ntfy_default_auth_param: {ntfy_default_auth_param}");
|
||||||
let ntfy_receiver = WebhookReceiver {
|
let ntfy_receiver = WebhookReceiver {
|
||||||
name: "ntfy-webhook".to_string(),
|
name: "ntfy-webhook".to_string(),
|
||||||
url: Url::Url(
|
url: Url::Url(
|
||||||
url::Url::parse(
|
url::Url::parse(
|
||||||
format!(
|
format!(
|
||||||
"http://ntfy.{}.svc.cluster.local/rust-web-app?auth={ntfy_default_auth_param}",
|
"http://{domain}/{}?auth={ntfy_default_auth_param}",
|
||||||
namespace.clone()
|
self.application.name()
|
||||||
)
|
)
|
||||||
.as_str(),
|
.as_str(),
|
||||||
)
|
)
|
||||||
.unwrap(),
|
.unwrap(),
|
||||||
),
|
),
|
||||||
|
route: AlertRoute {
|
||||||
|
..AlertRoute::default("ntfy-webhook".to_string())
|
||||||
|
},
|
||||||
};
|
};
|
||||||
|
debug!(
|
||||||
|
"ntfy webhook receiver \n{:#?}\nntfy topic: {}",
|
||||||
|
ntfy_receiver.clone(),
|
||||||
|
self.application.name()
|
||||||
|
);
|
||||||
alerting_score.receivers.push(Box::new(ntfy_receiver));
|
alerting_score.receivers.push(Box::new(ntfy_receiver));
|
||||||
alerting_score
|
alerting_score
|
||||||
.interpret(&Inventory::empty(), topology)
|
.interpret(&Inventory::empty(), topology)
|
||||||
|
|||||||
@@ -3,11 +3,13 @@ use std::sync::Arc;
|
|||||||
use crate::modules::application::{
|
use crate::modules::application::{
|
||||||
Application, ApplicationFeature, InstallationError, InstallationOutcome,
|
Application, ApplicationFeature, InstallationError, InstallationOutcome,
|
||||||
};
|
};
|
||||||
use crate::modules::monitoring::application_monitoring::rhobs_application_monitoring_score::ApplicationRHOBMonitoringScore;
|
|
||||||
|
|
||||||
use crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::RHOBObservability;
|
use crate::modules::monitoring::red_hat_cluster_observability::RedHatClusterObservability;
|
||||||
|
use crate::modules::monitoring::red_hat_cluster_observability::redhat_cluster_observability::RedHatClusterObservabilityScore;
|
||||||
use crate::topology::MultiTargetTopology;
|
use crate::topology::MultiTargetTopology;
|
||||||
use crate::topology::ingress::Ingress;
|
use crate::topology::ingress::Ingress;
|
||||||
|
use crate::topology::monitoring::Observability;
|
||||||
|
use crate::topology::monitoring::{AlertReceiver, AlertRoute};
|
||||||
use crate::{
|
use crate::{
|
||||||
inventory::Inventory,
|
inventory::Inventory,
|
||||||
modules::monitoring::{
|
modules::monitoring::{
|
||||||
@@ -16,10 +18,6 @@ use crate::{
|
|||||||
score::Score,
|
score::Score,
|
||||||
topology::{HelmCommand, K8sclient, Topology, tenant::TenantManager},
|
topology::{HelmCommand, K8sclient, Topology, tenant::TenantManager},
|
||||||
};
|
};
|
||||||
use crate::{
|
|
||||||
modules::prometheus::prometheus::PrometheusMonitoring,
|
|
||||||
topology::oberservability::monitoring::AlertReceiver,
|
|
||||||
};
|
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use base64::{Engine as _, engine::general_purpose};
|
use base64::{Engine as _, engine::general_purpose};
|
||||||
use harmony_types::net::Url;
|
use harmony_types::net::Url;
|
||||||
@@ -28,9 +26,10 @@ use log::{debug, info};
|
|||||||
#[derive(Debug, Clone)]
|
#[derive(Debug, Clone)]
|
||||||
pub struct Monitoring {
|
pub struct Monitoring {
|
||||||
pub application: Arc<dyn Application>,
|
pub application: Arc<dyn Application>,
|
||||||
pub alert_receiver: Vec<Box<dyn AlertReceiver<RHOBObservability>>>,
|
pub alert_receiver: Vec<Box<dyn AlertReceiver<RedHatClusterObservability>>>,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// TODO: test this
|
||||||
#[async_trait]
|
#[async_trait]
|
||||||
impl<
|
impl<
|
||||||
T: Topology
|
T: Topology
|
||||||
@@ -41,7 +40,7 @@ impl<
|
|||||||
+ MultiTargetTopology
|
+ MultiTargetTopology
|
||||||
+ Ingress
|
+ Ingress
|
||||||
+ std::fmt::Debug
|
+ std::fmt::Debug
|
||||||
+ PrometheusMonitoring<RHOBObservability>,
|
+ Observability<RedHatClusterObservability>,
|
||||||
> ApplicationFeature<T> for Monitoring
|
> ApplicationFeature<T> for Monitoring
|
||||||
{
|
{
|
||||||
async fn ensure_installed(
|
async fn ensure_installed(
|
||||||
@@ -55,13 +54,14 @@ impl<
|
|||||||
.map(|ns| ns.name.clone())
|
.map(|ns| ns.name.clone())
|
||||||
.unwrap_or_else(|| self.application.name());
|
.unwrap_or_else(|| self.application.name());
|
||||||
|
|
||||||
let mut alerting_score = ApplicationRHOBMonitoringScore {
|
let mut alerting_score = RedHatClusterObservabilityScore {
|
||||||
sender: RHOBObservability {
|
sender: RedHatClusterObservability {
|
||||||
namespace: namespace.clone(),
|
namespace: namespace.clone(),
|
||||||
client: topology.k8s_client().await.unwrap(),
|
resource_selector: todo!(),
|
||||||
},
|
},
|
||||||
application: self.application.clone(),
|
|
||||||
receivers: self.alert_receiver.clone(),
|
receivers: self.alert_receiver.clone(),
|
||||||
|
rules: vec![],
|
||||||
|
scrape_targets: None,
|
||||||
};
|
};
|
||||||
let domain = topology
|
let domain = topology
|
||||||
.get_domain("ntfy")
|
.get_domain("ntfy")
|
||||||
@@ -97,12 +97,15 @@ impl<
|
|||||||
url::Url::parse(
|
url::Url::parse(
|
||||||
format!(
|
format!(
|
||||||
"http://{domain}/{}?auth={ntfy_default_auth_param}",
|
"http://{domain}/{}?auth={ntfy_default_auth_param}",
|
||||||
self.application.name()
|
self.application.name()
|
||||||
)
|
)
|
||||||
.as_str(),
|
.as_str(),
|
||||||
)
|
)
|
||||||
.unwrap(),
|
.unwrap(),
|
||||||
),
|
),
|
||||||
|
route: AlertRoute {
|
||||||
|
..AlertRoute::default("ntfy-webhook".to_string())
|
||||||
|
},
|
||||||
};
|
};
|
||||||
debug!(
|
debug!(
|
||||||
"ntfy webhook receiver \n{:#?}\nntfy topic: {}",
|
"ntfy webhook receiver \n{:#?}\nntfy topic: {}",
|
||||||
|
|||||||
@@ -1,5 +1,4 @@
|
|||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use k8s_openapi::ResourceScope;
|
|
||||||
use kube::Resource;
|
use kube::Resource;
|
||||||
use log::info;
|
use log::info;
|
||||||
use serde::{Serialize, de::DeserializeOwned};
|
use serde::{Serialize, de::DeserializeOwned};
|
||||||
@@ -29,7 +28,7 @@ impl<K: Resource + std::fmt::Debug> K8sResourceScore<K> {
|
|||||||
}
|
}
|
||||||
|
|
||||||
impl<
|
impl<
|
||||||
K: Resource<Scope: ResourceScope>
|
K: Resource
|
||||||
+ std::fmt::Debug
|
+ std::fmt::Debug
|
||||||
+ Sync
|
+ Sync
|
||||||
+ DeserializeOwned
|
+ DeserializeOwned
|
||||||
@@ -61,7 +60,7 @@ pub struct K8sResourceInterpret<K: Resource + std::fmt::Debug + Sync + Send> {
|
|||||||
|
|
||||||
#[async_trait]
|
#[async_trait]
|
||||||
impl<
|
impl<
|
||||||
K: Resource<Scope: ResourceScope>
|
K: Resource
|
||||||
+ Clone
|
+ Clone
|
||||||
+ std::fmt::Debug
|
+ std::fmt::Debug
|
||||||
+ DeserializeOwned
|
+ DeserializeOwned
|
||||||
|
|||||||
@@ -20,7 +20,6 @@ pub mod okd;
|
|||||||
pub mod openbao;
|
pub mod openbao;
|
||||||
pub mod opnsense;
|
pub mod opnsense;
|
||||||
pub mod postgresql;
|
pub mod postgresql;
|
||||||
pub mod prometheus;
|
|
||||||
pub mod storage;
|
pub mod storage;
|
||||||
pub mod tenant;
|
pub mod tenant;
|
||||||
pub mod tftp;
|
pub mod tftp;
|
||||||
|
|||||||
@@ -1,99 +1,38 @@
|
|||||||
use std::any::Any;
|
use crate::modules::monitoring::kube_prometheus::KubePrometheus;
|
||||||
use std::collections::{BTreeMap, HashMap};
|
use crate::modules::monitoring::okd::OpenshiftClusterAlertSender;
|
||||||
|
use crate::modules::monitoring::red_hat_cluster_observability::RedHatClusterObservability;
|
||||||
use async_trait::async_trait;
|
use crate::topology::monitoring::{AlertRoute, InstallOperation, ReceiverInstallPlan};
|
||||||
use harmony_types::k8s_name::K8sName;
|
use crate::{interpret::InterpretError, topology::monitoring::AlertReceiver};
|
||||||
use k8s_openapi::api::core::v1::Secret;
|
use harmony_types::net::Url;
|
||||||
use kube::Resource;
|
|
||||||
use kube::api::{DynamicObject, ObjectMeta};
|
|
||||||
use log::{debug, trace};
|
|
||||||
use serde::Serialize;
|
use serde::Serialize;
|
||||||
use serde_json::json;
|
use serde_json::json;
|
||||||
use serde_yaml::{Mapping, Value};
|
use std::collections::BTreeMap;
|
||||||
|
|
||||||
use crate::infra::kube::kube_resource_to_dynamic;
|
|
||||||
use crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::{
|
|
||||||
AlertmanagerConfig, AlertmanagerConfigSpec, CRDPrometheus,
|
|
||||||
};
|
|
||||||
use crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::RHOBObservability;
|
|
||||||
use crate::modules::monitoring::okd::OpenshiftClusterAlertSender;
|
|
||||||
use crate::topology::oberservability::monitoring::AlertManagerReceiver;
|
|
||||||
use crate::{
|
|
||||||
interpret::{InterpretError, Outcome},
|
|
||||||
modules::monitoring::{
|
|
||||||
kube_prometheus::{
|
|
||||||
prometheus::{KubePrometheus, KubePrometheusReceiver},
|
|
||||||
types::{AlertChannelConfig, AlertManagerChannelConfig},
|
|
||||||
},
|
|
||||||
prometheus::prometheus::{Prometheus, PrometheusReceiver},
|
|
||||||
},
|
|
||||||
topology::oberservability::monitoring::AlertReceiver,
|
|
||||||
};
|
|
||||||
use harmony_types::net::Url;
|
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
#[derive(Debug, Clone, Serialize)]
|
||||||
pub struct DiscordWebhook {
|
pub struct DiscordReceiver {
|
||||||
pub name: K8sName,
|
pub name: String,
|
||||||
pub url: Url,
|
pub url: Url,
|
||||||
pub selectors: Vec<HashMap<String, String>>,
|
pub route: AlertRoute,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl DiscordWebhook {
|
impl AlertReceiver<OpenshiftClusterAlertSender> for DiscordReceiver {
|
||||||
fn get_receiver_config(&self) -> Result<AlertManagerReceiver, String> {
|
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
|
||||||
let secret_name = format!("{}-secret", self.name.clone());
|
let receiver_block = serde_yaml::to_value(json!({
|
||||||
let webhook_key = format!("{}", self.url.clone());
|
"name": self.name,
|
||||||
|
"discord_configs": [{
|
||||||
|
"webhook_url": format!("{}", self.url),
|
||||||
|
"title": "{{ template \"discord.default.title\" . }}",
|
||||||
|
"message": "{{ template \"discord.default.message\" . }}"
|
||||||
|
}]
|
||||||
|
}))
|
||||||
|
.map_err(|e| InterpretError::new(e.to_string()))?;
|
||||||
|
|
||||||
let mut string_data = BTreeMap::new();
|
Ok(ReceiverInstallPlan {
|
||||||
string_data.insert("webhook-url".to_string(), webhook_key.clone());
|
install_operation: None,
|
||||||
|
route: Some(self.route.clone()),
|
||||||
let secret = Secret {
|
receiver: Some(receiver_block),
|
||||||
metadata: kube::core::ObjectMeta {
|
|
||||||
name: Some(secret_name.clone()),
|
|
||||||
..Default::default()
|
|
||||||
},
|
|
||||||
string_data: Some(string_data),
|
|
||||||
type_: Some("Opaque".to_string()),
|
|
||||||
..Default::default()
|
|
||||||
};
|
|
||||||
|
|
||||||
let mut matchers: Vec<String> = Vec::new();
|
|
||||||
for selector in &self.selectors {
|
|
||||||
trace!("selector: {:#?}", selector);
|
|
||||||
for (k, v) in selector {
|
|
||||||
matchers.push(format!("{} = {}", k, v));
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
Ok(AlertManagerReceiver {
|
|
||||||
additional_ressources: vec![kube_resource_to_dynamic(&secret)?],
|
|
||||||
|
|
||||||
receiver_config: json!({
|
|
||||||
"name": self.name,
|
|
||||||
"discord_configs": [
|
|
||||||
{
|
|
||||||
"webhook_url": self.url.clone(),
|
|
||||||
"title": "{{ template \"discord.default.title\" . }}",
|
|
||||||
"message": "{{ template \"discord.default.message\" . }}"
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}),
|
|
||||||
route_config: json!({
|
|
||||||
"receiver": self.name,
|
|
||||||
"matchers": matchers,
|
|
||||||
|
|
||||||
}),
|
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl AlertReceiver<OpenshiftClusterAlertSender> for DiscordWebhook {
|
|
||||||
async fn install(
|
|
||||||
&self,
|
|
||||||
sender: &OpenshiftClusterAlertSender,
|
|
||||||
) -> Result<Outcome, InterpretError> {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
|
|
||||||
fn name(&self) -> String {
|
fn name(&self) -> String {
|
||||||
self.name.clone().to_string()
|
self.name.clone().to_string()
|
||||||
@@ -102,309 +41,77 @@ impl AlertReceiver<OpenshiftClusterAlertSender> for DiscordWebhook {
|
|||||||
fn clone_box(&self) -> Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
|
fn clone_box(&self) -> Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
|
||||||
Box::new(self.clone())
|
Box::new(self.clone())
|
||||||
}
|
}
|
||||||
|
|
||||||
fn as_any(&self) -> &dyn Any {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
|
|
||||||
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
|
|
||||||
self.get_receiver_config()
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
#[async_trait]
|
impl AlertReceiver<RedHatClusterObservability> for DiscordReceiver {
|
||||||
impl AlertReceiver<RHOBObservability> for DiscordWebhook {
|
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
|
||||||
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn install(&self, sender: &RHOBObservability) -> Result<Outcome, InterpretError> {
|
|
||||||
let ns = sender.namespace.clone();
|
|
||||||
|
|
||||||
let config = self.get_receiver_config()?;
|
|
||||||
for resource in config.additional_ressources.iter() {
|
|
||||||
todo!("can I apply a dynamicresource");
|
|
||||||
// sender.client.apply(resource, Some(&ns)).await;
|
|
||||||
}
|
|
||||||
|
|
||||||
let spec = crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::AlertmanagerConfigSpec {
|
|
||||||
data: json!({
|
|
||||||
"route": {
|
|
||||||
"receiver": self.name,
|
|
||||||
},
|
|
||||||
"receivers": [
|
|
||||||
config.receiver_config
|
|
||||||
]
|
|
||||||
}),
|
|
||||||
};
|
|
||||||
|
|
||||||
let alertmanager_configs = crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::AlertmanagerConfig {
|
|
||||||
metadata: ObjectMeta {
|
|
||||||
name: Some(self.name.clone().to_string()),
|
|
||||||
labels: Some(std::collections::BTreeMap::from([(
|
|
||||||
"alertmanagerConfig".to_string(),
|
|
||||||
"enabled".to_string(),
|
|
||||||
)])),
|
|
||||||
namespace: Some(sender.namespace.clone()),
|
|
||||||
..Default::default()
|
|
||||||
},
|
|
||||||
spec,
|
|
||||||
};
|
|
||||||
debug!(
|
|
||||||
"alertmanager_configs yaml:\n{:#?}",
|
|
||||||
serde_yaml::to_string(&alertmanager_configs)
|
|
||||||
);
|
|
||||||
debug!(
|
|
||||||
"alert manager configs: \n{:#?}",
|
|
||||||
alertmanager_configs.clone()
|
|
||||||
);
|
|
||||||
|
|
||||||
sender
|
|
||||||
.client
|
|
||||||
.apply(&alertmanager_configs, Some(&sender.namespace))
|
|
||||||
.await?;
|
|
||||||
Ok(Outcome::success(format!(
|
|
||||||
"installed rhob-alertmanagerconfigs for {}",
|
|
||||||
self.name
|
|
||||||
)))
|
|
||||||
}
|
|
||||||
|
|
||||||
fn name(&self) -> String {
|
|
||||||
"webhook-receiver".to_string()
|
|
||||||
}
|
|
||||||
|
|
||||||
fn clone_box(&self) -> Box<dyn AlertReceiver<RHOBObservability>> {
|
|
||||||
Box::new(self.clone())
|
|
||||||
}
|
|
||||||
|
|
||||||
fn as_any(&self) -> &dyn Any {
|
|
||||||
self
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl AlertReceiver<CRDPrometheus> for DiscordWebhook {
|
|
||||||
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
async fn install(&self, sender: &CRDPrometheus) -> Result<Outcome, InterpretError> {
|
|
||||||
let ns = sender.namespace.clone();
|
|
||||||
let secret_name = format!("{}-secret", self.name.clone());
|
let secret_name = format!("{}-secret", self.name.clone());
|
||||||
let webhook_key = format!("{}", self.url.clone());
|
let webhook_key = format!("{}", self.url.clone());
|
||||||
|
|
||||||
let mut string_data = BTreeMap::new();
|
let mut string_data = BTreeMap::new();
|
||||||
string_data.insert("webhook-url".to_string(), webhook_key.clone());
|
string_data.insert("webhook-url".to_string(), webhook_key.clone());
|
||||||
|
|
||||||
let secret = Secret {
|
let receiver_config = json!({
|
||||||
metadata: kube::core::ObjectMeta {
|
"name": self.name,
|
||||||
name: Some(secret_name.clone()),
|
"discordConfigs": [
|
||||||
..Default::default()
|
{
|
||||||
},
|
"apiURL": {
|
||||||
string_data: Some(string_data),
|
"key": "webhook-url",
|
||||||
type_: Some("Opaque".to_string()),
|
"name": format!("{}-secret", self.name)
|
||||||
..Default::default()
|
},
|
||||||
};
|
"title": "{{ template \"discord.default.title\" . }}",
|
||||||
|
"message": "{{ template \"discord.default.message\" . }}"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
});
|
||||||
|
|
||||||
let _ = sender.client.apply(&secret, Some(&ns)).await;
|
Ok(ReceiverInstallPlan {
|
||||||
|
install_operation: Some(vec![InstallOperation::CreateSecret {
|
||||||
let spec = AlertmanagerConfigSpec {
|
name: secret_name,
|
||||||
data: json!({
|
data: string_data,
|
||||||
"route": {
|
}]),
|
||||||
"receiver": self.name,
|
route: Some(self.route.clone()),
|
||||||
},
|
receiver: Some(
|
||||||
"receivers": [
|
serde_yaml::to_value(receiver_config)
|
||||||
{
|
.map_err(|e| InterpretError::new(e.to_string()))
|
||||||
"name": self.name,
|
.expect("failed to build yaml value"),
|
||||||
"discordConfigs": [
|
),
|
||||||
{
|
})
|
||||||
"apiURL": {
|
|
||||||
"name": secret_name,
|
|
||||||
"key": "webhook-url",
|
|
||||||
},
|
|
||||||
"title": "{{ template \"discord.default.title\" . }}",
|
|
||||||
"message": "{{ template \"discord.default.message\" . }}"
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}),
|
|
||||||
};
|
|
||||||
|
|
||||||
let alertmanager_configs = AlertmanagerConfig {
|
|
||||||
metadata: ObjectMeta {
|
|
||||||
name: Some(self.name.clone().to_string()),
|
|
||||||
labels: Some(std::collections::BTreeMap::from([(
|
|
||||||
"alertmanagerConfig".to_string(),
|
|
||||||
"enabled".to_string(),
|
|
||||||
)])),
|
|
||||||
namespace: Some(ns),
|
|
||||||
..Default::default()
|
|
||||||
},
|
|
||||||
spec,
|
|
||||||
};
|
|
||||||
|
|
||||||
sender
|
|
||||||
.client
|
|
||||||
.apply(&alertmanager_configs, Some(&sender.namespace))
|
|
||||||
.await?;
|
|
||||||
Ok(Outcome::success(format!(
|
|
||||||
"installed crd-alertmanagerconfigs for {}",
|
|
||||||
self.name
|
|
||||||
)))
|
|
||||||
}
|
}
|
||||||
|
|
||||||
fn name(&self) -> String {
|
fn name(&self) -> String {
|
||||||
"discord-webhook".to_string()
|
self.name.clone()
|
||||||
}
|
}
|
||||||
fn clone_box(&self) -> Box<dyn AlertReceiver<CRDPrometheus>> {
|
|
||||||
|
fn clone_box(&self) -> Box<dyn AlertReceiver<RedHatClusterObservability>> {
|
||||||
Box::new(self.clone())
|
Box::new(self.clone())
|
||||||
}
|
}
|
||||||
fn as_any(&self) -> &dyn Any {
|
|
||||||
self
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
#[async_trait]
|
impl AlertReceiver<KubePrometheus> for DiscordReceiver {
|
||||||
impl AlertReceiver<Prometheus> for DiscordWebhook {
|
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
|
||||||
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
|
let receiver_block = serde_yaml::to_value(json!({
|
||||||
todo!()
|
"name": self.name,
|
||||||
}
|
"discord_configs": [{
|
||||||
async fn install(&self, sender: &Prometheus) -> Result<Outcome, InterpretError> {
|
"webhook_url": format!("{}", self.url),
|
||||||
sender.install_receiver(self).await
|
"title": "{{ template \"discord.default.title\" . }}",
|
||||||
|
"message": "{{ template \"discord.default.message\" . }}"
|
||||||
|
}]
|
||||||
|
}))
|
||||||
|
.map_err(|e| InterpretError::new(e.to_string()))?;
|
||||||
|
|
||||||
|
Ok(ReceiverInstallPlan {
|
||||||
|
install_operation: None,
|
||||||
|
route: Some(self.route.clone()),
|
||||||
|
receiver: Some(receiver_block),
|
||||||
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
fn name(&self) -> String {
|
fn name(&self) -> String {
|
||||||
"discord-webhook".to_string()
|
self.name.clone()
|
||||||
}
|
}
|
||||||
fn clone_box(&self) -> Box<dyn AlertReceiver<Prometheus>> {
|
|
||||||
Box::new(self.clone())
|
|
||||||
}
|
|
||||||
fn as_any(&self) -> &dyn Any {
|
|
||||||
self
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl PrometheusReceiver for DiscordWebhook {
|
|
||||||
fn name(&self) -> String {
|
|
||||||
self.name.clone().to_string()
|
|
||||||
}
|
|
||||||
async fn configure_receiver(&self) -> AlertManagerChannelConfig {
|
|
||||||
self.get_config().await
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl AlertReceiver<KubePrometheus> for DiscordWebhook {
|
|
||||||
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
async fn install(&self, sender: &KubePrometheus) -> Result<Outcome, InterpretError> {
|
|
||||||
sender.install_receiver(self).await
|
|
||||||
}
|
|
||||||
fn clone_box(&self) -> Box<dyn AlertReceiver<KubePrometheus>> {
|
fn clone_box(&self) -> Box<dyn AlertReceiver<KubePrometheus>> {
|
||||||
Box::new(self.clone())
|
Box::new(self.clone())
|
||||||
}
|
}
|
||||||
fn name(&self) -> String {
|
|
||||||
"discord-webhook".to_string()
|
|
||||||
}
|
|
||||||
fn as_any(&self) -> &dyn Any {
|
|
||||||
self
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl KubePrometheusReceiver for DiscordWebhook {
|
|
||||||
fn name(&self) -> String {
|
|
||||||
self.name.clone().to_string()
|
|
||||||
}
|
|
||||||
async fn configure_receiver(&self) -> AlertManagerChannelConfig {
|
|
||||||
self.get_config().await
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl AlertChannelConfig for DiscordWebhook {
|
|
||||||
async fn get_config(&self) -> AlertManagerChannelConfig {
|
|
||||||
let channel_global_config = None;
|
|
||||||
let channel_receiver = self.alert_channel_receiver().await;
|
|
||||||
let channel_route = self.alert_channel_route().await;
|
|
||||||
|
|
||||||
AlertManagerChannelConfig {
|
|
||||||
channel_global_config,
|
|
||||||
channel_receiver,
|
|
||||||
channel_route,
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
impl DiscordWebhook {
|
|
||||||
async fn alert_channel_route(&self) -> serde_yaml::Value {
|
|
||||||
let mut route = Mapping::new();
|
|
||||||
route.insert(
|
|
||||||
Value::String("receiver".to_string()),
|
|
||||||
Value::String(self.name.clone().to_string()),
|
|
||||||
);
|
|
||||||
route.insert(
|
|
||||||
Value::String("matchers".to_string()),
|
|
||||||
Value::Sequence(vec![Value::String("alertname!=Watchdog".to_string())]),
|
|
||||||
);
|
|
||||||
route.insert(Value::String("continue".to_string()), Value::Bool(true));
|
|
||||||
Value::Mapping(route)
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn alert_channel_receiver(&self) -> serde_yaml::Value {
|
|
||||||
let mut receiver = Mapping::new();
|
|
||||||
receiver.insert(
|
|
||||||
Value::String("name".to_string()),
|
|
||||||
Value::String(self.name.clone().to_string()),
|
|
||||||
);
|
|
||||||
|
|
||||||
let mut discord_config = Mapping::new();
|
|
||||||
discord_config.insert(
|
|
||||||
Value::String("webhook_url".to_string()),
|
|
||||||
Value::String(self.url.to_string()),
|
|
||||||
);
|
|
||||||
|
|
||||||
receiver.insert(
|
|
||||||
Value::String("discord_configs".to_string()),
|
|
||||||
Value::Sequence(vec![Value::Mapping(discord_config)]),
|
|
||||||
);
|
|
||||||
|
|
||||||
Value::Mapping(receiver)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[cfg(test)]
|
|
||||||
mod tests {
|
|
||||||
use super::*;
|
|
||||||
|
|
||||||
#[tokio::test]
|
|
||||||
async fn discord_serialize_should_match() {
|
|
||||||
let discord_receiver = DiscordWebhook {
|
|
||||||
name: K8sName("test-discord".to_string()),
|
|
||||||
url: Url::Url(url::Url::parse("https://discord.i.dont.exist.com").unwrap()),
|
|
||||||
selectors: vec![],
|
|
||||||
};
|
|
||||||
|
|
||||||
let discord_receiver_receiver =
|
|
||||||
serde_yaml::to_string(&discord_receiver.alert_channel_receiver().await).unwrap();
|
|
||||||
println!("receiver \n{:#}", discord_receiver_receiver);
|
|
||||||
let discord_receiver_receiver_yaml = r#"name: test-discord
|
|
||||||
discord_configs:
|
|
||||||
- webhook_url: https://discord.i.dont.exist.com/
|
|
||||||
"#
|
|
||||||
.to_string();
|
|
||||||
|
|
||||||
let discord_receiver_route =
|
|
||||||
serde_yaml::to_string(&discord_receiver.alert_channel_route().await).unwrap();
|
|
||||||
println!("route \n{:#}", discord_receiver_route);
|
|
||||||
let discord_receiver_route_yaml = r#"receiver: test-discord
|
|
||||||
matchers:
|
|
||||||
- alertname!=Watchdog
|
|
||||||
continue: true
|
|
||||||
"#
|
|
||||||
.to_string();
|
|
||||||
|
|
||||||
assert_eq!(discord_receiver_receiver, discord_receiver_receiver_yaml);
|
|
||||||
assert_eq!(discord_receiver_route, discord_receiver_route_yaml);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,25 +1,13 @@
|
|||||||
use std::any::Any;
|
|
||||||
|
|
||||||
use async_trait::async_trait;
|
|
||||||
use kube::api::ObjectMeta;
|
|
||||||
use log::debug;
|
|
||||||
use serde::Serialize;
|
use serde::Serialize;
|
||||||
use serde_json::json;
|
use serde_json::json;
|
||||||
use serde_yaml::{Mapping, Value};
|
|
||||||
|
|
||||||
use crate::{
|
use crate::{
|
||||||
interpret::{InterpretError, Outcome},
|
interpret::InterpretError,
|
||||||
modules::monitoring::{
|
modules::monitoring::{
|
||||||
kube_prometheus::{
|
kube_prometheus::KubePrometheus, okd::OpenshiftClusterAlertSender, prometheus::Prometheus,
|
||||||
crd::{
|
red_hat_cluster_observability::RedHatClusterObservability,
|
||||||
crd_alertmanager_config::CRDPrometheus, rhob_alertmanager_config::RHOBObservability,
|
|
||||||
},
|
|
||||||
prometheus::{KubePrometheus, KubePrometheusReceiver},
|
|
||||||
types::{AlertChannelConfig, AlertManagerChannelConfig},
|
|
||||||
},
|
|
||||||
prometheus::prometheus::{Prometheus, PrometheusReceiver},
|
|
||||||
},
|
},
|
||||||
topology::oberservability::monitoring::{AlertManagerReceiver, AlertReceiver},
|
topology::monitoring::{AlertReceiver, AlertRoute, ReceiverInstallPlan},
|
||||||
};
|
};
|
||||||
use harmony_types::net::Url;
|
use harmony_types::net::Url;
|
||||||
|
|
||||||
@@ -27,281 +15,115 @@ use harmony_types::net::Url;
|
|||||||
pub struct WebhookReceiver {
|
pub struct WebhookReceiver {
|
||||||
pub name: String,
|
pub name: String,
|
||||||
pub url: Url,
|
pub url: Url,
|
||||||
}
|
pub route: AlertRoute,
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl AlertReceiver<RHOBObservability> for WebhookReceiver {
|
|
||||||
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
async fn install(&self, sender: &RHOBObservability) -> Result<Outcome, InterpretError> {
|
|
||||||
let spec = crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::AlertmanagerConfigSpec {
|
|
||||||
data: json!({
|
|
||||||
"route": {
|
|
||||||
"receiver": self.name,
|
|
||||||
},
|
|
||||||
"receivers": [
|
|
||||||
{
|
|
||||||
"name": self.name,
|
|
||||||
"webhookConfigs": [
|
|
||||||
{
|
|
||||||
"url": self.url,
|
|
||||||
"httpConfig": {
|
|
||||||
"tlsConfig": {
|
|
||||||
"insecureSkipVerify": true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}),
|
|
||||||
};
|
|
||||||
|
|
||||||
let alertmanager_configs = crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::AlertmanagerConfig {
|
|
||||||
metadata: ObjectMeta {
|
|
||||||
name: Some(self.name.clone()),
|
|
||||||
labels: Some(std::collections::BTreeMap::from([(
|
|
||||||
"alertmanagerConfig".to_string(),
|
|
||||||
"enabled".to_string(),
|
|
||||||
)])),
|
|
||||||
namespace: Some(sender.namespace.clone()),
|
|
||||||
..Default::default()
|
|
||||||
},
|
|
||||||
spec,
|
|
||||||
};
|
|
||||||
debug!(
|
|
||||||
"alert manager configs: \n{:#?}",
|
|
||||||
alertmanager_configs.clone()
|
|
||||||
);
|
|
||||||
|
|
||||||
sender
|
|
||||||
.client
|
|
||||||
.apply(&alertmanager_configs, Some(&sender.namespace))
|
|
||||||
.await?;
|
|
||||||
Ok(Outcome::success(format!(
|
|
||||||
"installed rhob-alertmanagerconfigs for {}",
|
|
||||||
self.name
|
|
||||||
)))
|
|
||||||
}
|
|
||||||
|
|
||||||
fn name(&self) -> String {
|
|
||||||
"webhook-receiver".to_string()
|
|
||||||
}
|
|
||||||
|
|
||||||
fn clone_box(&self) -> Box<dyn AlertReceiver<RHOBObservability>> {
|
|
||||||
Box::new(self.clone())
|
|
||||||
}
|
|
||||||
|
|
||||||
fn as_any(&self) -> &dyn Any {
|
|
||||||
self
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl AlertReceiver<CRDPrometheus> for WebhookReceiver {
|
|
||||||
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
async fn install(&self, sender: &CRDPrometheus) -> Result<Outcome, InterpretError> {
|
|
||||||
let spec = crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::AlertmanagerConfigSpec {
|
|
||||||
data: json!({
|
|
||||||
"route": {
|
|
||||||
"receiver": self.name,
|
|
||||||
},
|
|
||||||
"receivers": [
|
|
||||||
{
|
|
||||||
"name": self.name,
|
|
||||||
"webhookConfigs": [
|
|
||||||
{
|
|
||||||
"url": self.url,
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}),
|
|
||||||
};
|
|
||||||
|
|
||||||
let alertmanager_configs = crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::AlertmanagerConfig {
|
|
||||||
metadata: ObjectMeta {
|
|
||||||
name: Some(self.name.clone()),
|
|
||||||
labels: Some(std::collections::BTreeMap::from([(
|
|
||||||
"alertmanagerConfig".to_string(),
|
|
||||||
"enabled".to_string(),
|
|
||||||
)])),
|
|
||||||
namespace: Some(sender.namespace.clone()),
|
|
||||||
..Default::default()
|
|
||||||
},
|
|
||||||
spec,
|
|
||||||
};
|
|
||||||
debug!(
|
|
||||||
"alert manager configs: \n{:#?}",
|
|
||||||
alertmanager_configs.clone()
|
|
||||||
);
|
|
||||||
|
|
||||||
sender
|
|
||||||
.client
|
|
||||||
.apply(&alertmanager_configs, Some(&sender.namespace))
|
|
||||||
.await?;
|
|
||||||
Ok(Outcome::success(format!(
|
|
||||||
"installed crd-alertmanagerconfigs for {}",
|
|
||||||
self.name
|
|
||||||
)))
|
|
||||||
}
|
|
||||||
|
|
||||||
fn name(&self) -> String {
|
|
||||||
"webhook-receiver".to_string()
|
|
||||||
}
|
|
||||||
|
|
||||||
fn clone_box(&self) -> Box<dyn AlertReceiver<CRDPrometheus>> {
|
|
||||||
Box::new(self.clone())
|
|
||||||
}
|
|
||||||
|
|
||||||
fn as_any(&self) -> &dyn Any {
|
|
||||||
self
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl AlertReceiver<Prometheus> for WebhookReceiver {
|
|
||||||
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
async fn install(&self, sender: &Prometheus) -> Result<Outcome, InterpretError> {
|
|
||||||
sender.install_receiver(self).await
|
|
||||||
}
|
|
||||||
fn name(&self) -> String {
|
|
||||||
"webhook-receiver".to_string()
|
|
||||||
}
|
|
||||||
fn clone_box(&self) -> Box<dyn AlertReceiver<Prometheus>> {
|
|
||||||
Box::new(self.clone())
|
|
||||||
}
|
|
||||||
fn as_any(&self) -> &dyn Any {
|
|
||||||
self
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl PrometheusReceiver for WebhookReceiver {
|
|
||||||
fn name(&self) -> String {
|
|
||||||
self.name.clone()
|
|
||||||
}
|
|
||||||
async fn configure_receiver(&self) -> AlertManagerChannelConfig {
|
|
||||||
self.get_config().await
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl AlertReceiver<KubePrometheus> for WebhookReceiver {
|
|
||||||
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
async fn install(&self, sender: &KubePrometheus) -> Result<Outcome, InterpretError> {
|
|
||||||
sender.install_receiver(self).await
|
|
||||||
}
|
|
||||||
fn name(&self) -> String {
|
|
||||||
"webhook-receiver".to_string()
|
|
||||||
}
|
|
||||||
fn clone_box(&self) -> Box<dyn AlertReceiver<KubePrometheus>> {
|
|
||||||
Box::new(self.clone())
|
|
||||||
}
|
|
||||||
fn as_any(&self) -> &dyn Any {
|
|
||||||
self
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl KubePrometheusReceiver for WebhookReceiver {
|
|
||||||
fn name(&self) -> String {
|
|
||||||
self.name.clone()
|
|
||||||
}
|
|
||||||
async fn configure_receiver(&self) -> AlertManagerChannelConfig {
|
|
||||||
self.get_config().await
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl AlertChannelConfig for WebhookReceiver {
|
|
||||||
async fn get_config(&self) -> AlertManagerChannelConfig {
|
|
||||||
let channel_global_config = None;
|
|
||||||
let channel_receiver = self.alert_channel_receiver().await;
|
|
||||||
let channel_route = self.alert_channel_route().await;
|
|
||||||
|
|
||||||
AlertManagerChannelConfig {
|
|
||||||
channel_global_config,
|
|
||||||
channel_receiver,
|
|
||||||
channel_route,
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
impl WebhookReceiver {
|
impl WebhookReceiver {
|
||||||
async fn alert_channel_route(&self) -> serde_yaml::Value {
|
fn build_receiver(&self) -> serde_json::Value {
|
||||||
let mut route = Mapping::new();
|
json!({
|
||||||
route.insert(
|
"name": self.name,
|
||||||
Value::String("receiver".to_string()),
|
"webhookConfigs": [
|
||||||
Value::String(self.name.clone()),
|
{
|
||||||
);
|
"url": self.url,
|
||||||
route.insert(
|
"httpConfig": {
|
||||||
Value::String("matchers".to_string()),
|
"tlsConfig": {
|
||||||
Value::Sequence(vec![Value::String("alertname!=Watchdog".to_string())]),
|
"insecureSkipVerify": true
|
||||||
);
|
}
|
||||||
route.insert(Value::String("continue".to_string()), Value::Bool(true));
|
}
|
||||||
Value::Mapping(route)
|
}
|
||||||
|
]})
|
||||||
}
|
}
|
||||||
|
|
||||||
async fn alert_channel_receiver(&self) -> serde_yaml::Value {
|
fn build_route(&self) -> serde_json::Value {
|
||||||
let mut receiver = Mapping::new();
|
json!({
|
||||||
receiver.insert(
|
"name": self.name})
|
||||||
Value::String("name".to_string()),
|
|
||||||
Value::String(self.name.clone()),
|
|
||||||
);
|
|
||||||
|
|
||||||
let mut webhook_config = Mapping::new();
|
|
||||||
webhook_config.insert(
|
|
||||||
Value::String("url".to_string()),
|
|
||||||
Value::String(self.url.to_string()),
|
|
||||||
);
|
|
||||||
|
|
||||||
receiver.insert(
|
|
||||||
Value::String("webhook_configs".to_string()),
|
|
||||||
Value::Sequence(vec![Value::Mapping(webhook_config)]),
|
|
||||||
);
|
|
||||||
|
|
||||||
Value::Mapping(receiver)
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
#[cfg(test)]
|
impl AlertReceiver<OpenshiftClusterAlertSender> for WebhookReceiver {
|
||||||
mod tests {
|
fn name(&self) -> String {
|
||||||
use super::*;
|
self.name.clone()
|
||||||
#[tokio::test]
|
}
|
||||||
async fn webhook_serialize_should_match() {
|
|
||||||
let webhook_receiver = WebhookReceiver {
|
|
||||||
name: "test-webhook".to_string(),
|
|
||||||
url: Url::Url(url::Url::parse("https://webhook.i.dont.exist.com").unwrap()),
|
|
||||||
};
|
|
||||||
|
|
||||||
let webhook_receiver_receiver =
|
fn clone_box(&self) -> Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
|
||||||
serde_yaml::to_string(&webhook_receiver.alert_channel_receiver().await).unwrap();
|
Box::new(self.clone())
|
||||||
println!("receiver \n{:#}", webhook_receiver_receiver);
|
}
|
||||||
let webhook_receiver_receiver_yaml = r#"name: test-webhook
|
|
||||||
webhook_configs:
|
|
||||||
- url: https://webhook.i.dont.exist.com/
|
|
||||||
"#
|
|
||||||
.to_string();
|
|
||||||
|
|
||||||
let webhook_receiver_route =
|
fn build(&self) -> Result<crate::topology::monitoring::ReceiverInstallPlan, InterpretError> {
|
||||||
serde_yaml::to_string(&webhook_receiver.alert_channel_route().await).unwrap();
|
let receiver = self.build_receiver();
|
||||||
println!("route \n{:#}", webhook_receiver_route);
|
let receiver =
|
||||||
let webhook_receiver_route_yaml = r#"receiver: test-webhook
|
serde_yaml::to_value(receiver).map_err(|e| InterpretError::new(e.to_string()))?;
|
||||||
matchers:
|
|
||||||
- alertname!=Watchdog
|
|
||||||
continue: true
|
|
||||||
"#
|
|
||||||
.to_string();
|
|
||||||
|
|
||||||
assert_eq!(webhook_receiver_receiver, webhook_receiver_receiver_yaml);
|
Ok(ReceiverInstallPlan {
|
||||||
assert_eq!(webhook_receiver_route, webhook_receiver_route_yaml);
|
install_operation: None,
|
||||||
|
route: Some(self.route.clone()),
|
||||||
|
receiver: Some(receiver),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl AlertReceiver<RedHatClusterObservability> for WebhookReceiver {
|
||||||
|
fn name(&self) -> String {
|
||||||
|
self.name.clone()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn clone_box(&self) -> Box<dyn AlertReceiver<RedHatClusterObservability>> {
|
||||||
|
Box::new(self.clone())
|
||||||
|
}
|
||||||
|
|
||||||
|
fn build(&self) -> Result<crate::topology::monitoring::ReceiverInstallPlan, InterpretError> {
|
||||||
|
let receiver = self.build_receiver();
|
||||||
|
let receiver =
|
||||||
|
serde_yaml::to_value(receiver).map_err(|e| InterpretError::new(e.to_string()))?;
|
||||||
|
|
||||||
|
Ok(ReceiverInstallPlan {
|
||||||
|
install_operation: None,
|
||||||
|
route: Some(self.route.clone()),
|
||||||
|
receiver: Some(receiver),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl AlertReceiver<KubePrometheus> for WebhookReceiver {
|
||||||
|
fn name(&self) -> String {
|
||||||
|
self.name.clone()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn clone_box(&self) -> Box<dyn AlertReceiver<KubePrometheus>> {
|
||||||
|
Box::new(self.clone())
|
||||||
|
}
|
||||||
|
|
||||||
|
fn build(&self) -> Result<crate::topology::monitoring::ReceiverInstallPlan, InterpretError> {
|
||||||
|
let receiver = self.build_receiver();
|
||||||
|
let receiver =
|
||||||
|
serde_yaml::to_value(receiver).map_err(|e| InterpretError::new(e.to_string()))?;
|
||||||
|
|
||||||
|
Ok(ReceiverInstallPlan {
|
||||||
|
install_operation: None,
|
||||||
|
route: Some(self.route.clone()),
|
||||||
|
receiver: Some(receiver),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl AlertReceiver<Prometheus> for WebhookReceiver {
|
||||||
|
fn name(&self) -> String {
|
||||||
|
self.name.clone()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn clone_box(&self) -> Box<dyn AlertReceiver<Prometheus>> {
|
||||||
|
Box::new(self.clone())
|
||||||
|
}
|
||||||
|
|
||||||
|
fn build(&self) -> Result<crate::topology::monitoring::ReceiverInstallPlan, InterpretError> {
|
||||||
|
let receiver = self.build_receiver();
|
||||||
|
let receiver =
|
||||||
|
serde_yaml::to_value(receiver).map_err(|e| InterpretError::new(e.to_string()))?;
|
||||||
|
|
||||||
|
Ok(ReceiverInstallPlan {
|
||||||
|
install_operation: None,
|
||||||
|
route: Some(self.route.clone()),
|
||||||
|
receiver: Some(receiver),
|
||||||
|
})
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
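A minimal sketch of the pattern the new side of this file follows: each sender-specific `AlertReceiver` impl returns a plan from `build()` instead of applying resources itself in `install()`. The types below are simplified, std-only stand-ins and not the crate's real definitions. `receiver` is a `String` here where the diff uses a `serde_yaml::Value`, `install_operation` is omitted, the senders are empty marker structs, and the `plan` and `sample` helpers are hypothetical.

```rust
// Simplified stand-ins for the types named in the diff; the real
// definitions live in the crate and are not shown here.
#[derive(Debug, Clone, PartialEq)]
pub struct AlertRoute {
    pub receiver: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ReceiverInstallPlan {
    pub route: Option<AlertRoute>,
    // Assumption: a String here; the diff stores a serde_yaml::Value.
    pub receiver: Option<String>,
}

// Marker types standing in for two of the senders.
pub struct KubePrometheus;
pub struct Prometheus;

// The receiver only describes what should be installed for sender `S`.
pub trait AlertReceiver<S> {
    fn name(&self) -> String;
    fn build(&self) -> Result<ReceiverInstallPlan, String>;
}

#[derive(Debug, Clone)]
pub struct WebhookReceiver {
    pub name: String,
    pub url: String,
    pub route: AlertRoute,
}

impl WebhookReceiver {
    // Hypothetical helper: builds the plan once for every sender impl.
    fn plan(&self) -> ReceiverInstallPlan {
        ReceiverInstallPlan {
            route: Some(self.route.clone()),
            receiver: Some(format!("name: {}\nurl: {}\n", self.name, self.url)),
        }
    }
}

impl AlertReceiver<KubePrometheus> for WebhookReceiver {
    fn name(&self) -> String {
        self.name.clone()
    }
    fn build(&self) -> Result<ReceiverInstallPlan, String> {
        Ok(self.plan())
    }
}

impl AlertReceiver<Prometheus> for WebhookReceiver {
    fn name(&self) -> String {
        self.name.clone()
    }
    fn build(&self) -> Result<ReceiverInstallPlan, String> {
        Ok(self.plan())
    }
}

// Hypothetical sample value used to exercise the sketch.
pub fn sample() -> WebhookReceiver {
    WebhookReceiver {
        name: "hook".to_string(),
        url: "https://example.invalid/hook".to_string(),
        route: AlertRoute { receiver: "hook".to_string() },
    }
}
```

Because both impls delegate to the same helper, calling `build()` through either sender type yields the same plan for a given receiver; the caller decides how to apply it.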
|||||||
@@ -1 +1,2 @@
|
|||||||
pub mod dell_server;
|
pub mod dell_server;
|
||||||
|
pub mod opnsense;
|
||||||
@@ -0,0 +1,15 @@
|
|||||||
|
use crate::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;
|
||||||
|
|
||||||
|
pub fn high_http_error_rate() -> PrometheusAlertRule {
|
||||||
|
let expression = r#"(
|
||||||
|
sum(rate(http_requests_total{status=~"5.."}[5m])) by (job, route, service)
|
||||||
|
/
|
||||||
|
sum(rate(http_requests_total[5m])) by (job, route, service)
|
||||||
|
) > 0.05 and sum(rate(http_requests_total[5m])) by (job, route, service) > 10"#;
|
||||||
|
|
||||||
|
PrometheusAlertRule::new("HighApplicationErrorRate", expression)
|
||||||
|
.for_duration("10m")
|
||||||
|
.label("severity", "warning")
|
||||||
|
.annotation("summary", "High HTTP error rate on {{ $labels.job }}")
|
||||||
|
.annotation("description", "Job {{ $labels.job }} (route {{ $labels.route }}) has an error rate > 5% over the last 10m.")
|
||||||
|
}
|
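The expression in `high_http_error_rate` combines a ratio threshold with a traffic floor. As a rough sketch of that arithmetic for a single (job, route, service) series, assuming the two `sum(rate(...[5m]))` values have already been computed as per-second rates, the condition can be written as below. The function name and parameters are hypothetical and not part of the diff.

```rust
/// Hypothetical helper mirroring the alert condition as plain arithmetic.
/// `errors_per_sec` is the 5xx request rate and `total_per_sec` the overall
/// request rate of one (job, route, service) series.
pub fn condition_holds(errors_per_sec: f64, total_per_sec: f64) -> bool {
    // Traffic floor: skip series at or below 10 requests per second.
    if total_per_sec <= 10.0 {
        return false;
    }
    // Ratio threshold: more than 5% of requests returned a 5xx status.
    errors_per_sec / total_per_sec > 0.05
}
```

This only models the instantaneous condition. In Prometheus a rule with a `for` clause fires once the condition has held for that whole duration, which is presumably what `.for_duration("10m")` configures here.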
||||||
@@ -1 +1,2 @@
|
|||||||
|
pub mod alerts;
|
||||||
pub mod prometheus_alert_rule;
|
pub mod prometheus_alert_rule;
|
||||||
|
|||||||
@@ -1,79 +1,13 @@
|
|||||||
use std::collections::{BTreeMap, HashMap};
|
use std::collections::HashMap;
|
||||||
|
|
||||||
use async_trait::async_trait;
|
|
||||||
use serde::Serialize;
|
use serde::Serialize;
|
||||||
|
|
||||||
use crate::{
|
use crate::{
|
||||||
interpret::{InterpretError, Outcome},
|
interpret::InterpretError,
|
||||||
modules::monitoring::{
|
modules::monitoring::{kube_prometheus::KubePrometheus, okd::OpenshiftClusterAlertSender},
|
||||||
kube_prometheus::{
|
topology::monitoring::AlertRule,
|
||||||
prometheus::{KubePrometheus, KubePrometheusRule},
|
|
||||||
types::{AlertGroup, AlertManagerAdditionalPromRules},
|
|
||||||
},
|
|
||||||
prometheus::prometheus::{Prometheus, PrometheusRule},
|
|
||||||
},
|
|
||||||
topology::oberservability::monitoring::AlertRule,
|
|
||||||
};
|
};
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl AlertRule<KubePrometheus> for AlertManagerRuleGroup {
|
|
||||||
async fn install(&self, sender: &KubePrometheus) -> Result<Outcome, InterpretError> {
|
|
||||||
sender.install_rule(self).await
|
|
||||||
}
|
|
||||||
fn clone_box(&self) -> Box<dyn AlertRule<KubePrometheus>> {
|
|
||||||
Box::new(self.clone())
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl AlertRule<Prometheus> for AlertManagerRuleGroup {
|
|
||||||
async fn install(&self, sender: &Prometheus) -> Result<Outcome, InterpretError> {
|
|
||||||
sender.install_rule(self).await
|
|
||||||
}
|
|
||||||
fn clone_box(&self) -> Box<dyn AlertRule<Prometheus>> {
|
|
||||||
Box::new(self.clone())
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl PrometheusRule for AlertManagerRuleGroup {
|
|
||||||
fn name(&self) -> String {
|
|
||||||
self.name.clone()
|
|
||||||
}
|
|
||||||
async fn configure_rule(&self) -> AlertManagerAdditionalPromRules {
|
|
||||||
let mut additional_prom_rules = BTreeMap::new();
|
|
||||||
|
|
||||||
additional_prom_rules.insert(
|
|
||||||
self.name.clone(),
|
|
||||||
AlertGroup {
|
|
||||||
groups: vec![self.clone()],
|
|
||||||
},
|
|
||||||
);
|
|
||||||
AlertManagerAdditionalPromRules {
|
|
||||||
rules: additional_prom_rules,
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
#[async_trait]
|
|
||||||
impl KubePrometheusRule for AlertManagerRuleGroup {
|
|
||||||
fn name(&self) -> String {
|
|
||||||
self.name.clone()
|
|
||||||
}
|
|
||||||
async fn configure_rule(&self) -> AlertManagerAdditionalPromRules {
|
|
||||||
let mut additional_prom_rules = BTreeMap::new();
|
|
||||||
|
|
||||||
additional_prom_rules.insert(
|
|
||||||
self.name.clone(),
|
|
||||||
AlertGroup {
|
|
||||||
groups: vec![self.clone()],
|
|
||||||
},
|
|
||||||
);
|
|
||||||
AlertManagerAdditionalPromRules {
|
|
||||||
rules: additional_prom_rules,
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
impl AlertManagerRuleGroup {
|
impl AlertManagerRuleGroup {
|
||||||
pub fn new(name: &str, rules: Vec<PrometheusAlertRule>) -> AlertManagerRuleGroup {
|
pub fn new(name: &str, rules: Vec<PrometheusAlertRule>) -> AlertManagerRuleGroup {
|
||||||
AlertManagerRuleGroup {
|
AlertManagerRuleGroup {
|
||||||
@@ -129,3 +63,55 @@ impl PrometheusAlertRule {
|
|||||||
self
|
self
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
impl AlertRule<OpenshiftClusterAlertSender> for AlertManagerRuleGroup {
|
||||||
|
fn build_rule(&self) -> Result<serde_json::Value, InterpretError> {
|
||||||
|
let name = self.name.clone();
|
||||||
|
let mut rules: Vec<crate::modules::monitoring::okd::crd::alerting_rules::Rule> = vec![];
|
||||||
|
for rule in self.rules.clone() {
|
||||||
|
rules.push(rule.into())
|
||||||
|
}
|
||||||
|
|
||||||
|
let rule_groups =
|
||||||
|
vec![crate::modules::monitoring::okd::crd::alerting_rules::RuleGroup { name, rules }];
|
||||||
|
|
||||||
|
Ok(serde_json::to_value(rule_groups).map_err(|e| InterpretError::new(e.to_string()))?)
|
||||||
|
}
|
||||||
|
|
||||||
|
fn name(&self) -> String {
|
||||||
|
self.name.clone()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn clone_box(&self) -> Box<dyn AlertRule<OpenshiftClusterAlertSender>> {
|
||||||
|
Box::new(self.clone())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl AlertRule<KubePrometheus> for AlertManagerRuleGroup {
|
||||||
|
fn build_rule(&self) -> Result<serde_json::Value, InterpretError> {
|
||||||
|
let name = self.name.clone();
|
||||||
|
let mut rules: Vec<
|
||||||
|
crate::modules::monitoring::kube_prometheus::crd::crd_prometheus_rules::Rule,
|
||||||
|
> = vec![];
|
||||||
|
for rule in self.rules.clone() {
|
||||||
|
rules.push(rule.into())
|
||||||
|
}
|
||||||
|
|
||||||
|
let rule_groups = vec![
|
||||||
|
crate::modules::monitoring::kube_prometheus::crd::crd_prometheus_rules::RuleGroup {
|
||||||
|
name,
|
||||||
|
rules,
|
||||||
|
},
|
||||||
|
];
|
||||||
|
|
||||||
|
Ok(serde_json::to_value(rule_groups).map_err(|e| InterpretError::new(e.to_string()))?)
|
||||||
|
}
|
||||||
|
|
||||||
|
fn name(&self) -> String {
|
||||||
|
self.name.clone()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn clone_box(&self) -> Box<dyn AlertRule<KubePrometheus>> {
|
||||||
|
Box::new(self.clone())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|||||||
@@ -5,32 +5,26 @@ use serde::Serialize;
|
|||||||
|
|
||||||
use crate::{
|
use crate::{
|
||||||
interpret::Interpret,
|
interpret::Interpret,
|
||||||
modules::{
|
modules::{application::Application, monitoring::prometheus::Prometheus},
|
||||||
application::Application,
|
|
||||||
monitoring::{
|
|
||||||
grafana::grafana::Grafana, kube_prometheus::crd::crd_alertmanager_config::CRDPrometheus,
|
|
||||||
},
|
|
||||||
prometheus::prometheus::PrometheusMonitoring,
|
|
||||||
},
|
|
||||||
score::Score,
|
score::Score,
|
||||||
topology::{
|
topology::{
|
||||||
K8sclient, Topology,
|
K8sclient, Topology,
|
||||||
oberservability::monitoring::{AlertReceiver, AlertingInterpret, ScrapeTarget},
|
monitoring::{AlertReceiver, AlertingInterpret, Observability, ScrapeTarget},
|
||||||
},
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
#[derive(Debug, Clone, Serialize)]
|
||||||
pub struct ApplicationMonitoringScore {
|
pub struct ApplicationMonitoringScore {
|
||||||
pub sender: CRDPrometheus,
|
pub sender: Prometheus,
|
||||||
pub application: Arc<dyn Application>,
|
pub application: Arc<dyn Application>,
|
||||||
pub receivers: Vec<Box<dyn AlertReceiver<CRDPrometheus>>>,
|
pub receivers: Vec<Box<dyn AlertReceiver<Prometheus>>>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl<T: Topology + PrometheusMonitoring<CRDPrometheus> + K8sclient + Grafana> Score<T>
|
impl<T: Topology + Observability<Prometheus> + K8sclient> Score<T> for ApplicationMonitoringScore {
|
||||||
for ApplicationMonitoringScore
|
|
||||||
{
|
|
||||||
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
||||||
debug!("creating alerting interpret");
|
debug!("creating alerting interpret");
|
||||||
|
//TODO will need to use k8sclient to apply service monitors or find a way to pass
|
||||||
|
//them to the AlertingInterpret potentially via Sender Prometheus
|
||||||
Box::new(AlertingInterpret {
|
Box::new(AlertingInterpret {
|
||||||
sender: self.sender.clone(),
|
sender: self.sender.clone(),
|
||||||
receivers: self.receivers.clone(),
|
receivers: self.receivers.clone(),
|
||||||
|
|||||||
@@ -9,28 +9,27 @@ use crate::{
|
|||||||
inventory::Inventory,
|
inventory::Inventory,
|
||||||
modules::{
|
modules::{
|
||||||
application::Application,
|
application::Application,
|
||||||
monitoring::kube_prometheus::crd::{
|
monitoring::red_hat_cluster_observability::RedHatClusterObservability,
|
||||||
crd_alertmanager_config::CRDPrometheus, rhob_alertmanager_config::RHOBObservability,
|
|
||||||
},
|
|
||||||
prometheus::prometheus::PrometheusMonitoring,
|
|
||||||
},
|
},
|
||||||
score::Score,
|
score::Score,
|
||||||
topology::{PreparationOutcome, Topology, oberservability::monitoring::AlertReceiver},
|
topology::{
|
||||||
|
Topology,
|
||||||
|
monitoring::{AlertReceiver, AlertingInterpret, Observability},
|
||||||
|
},
|
||||||
};
|
};
|
||||||
use harmony_types::id::Id;
|
use harmony_types::id::Id;
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
#[derive(Debug, Clone, Serialize)]
|
||||||
pub struct ApplicationRHOBMonitoringScore {
|
pub struct ApplicationRedHatClusterMonitoringScore {
|
||||||
pub sender: RHOBObservability,
|
pub sender: RedHatClusterObservability,
|
||||||
pub application: Arc<dyn Application>,
|
pub application: Arc<dyn Application>,
|
||||||
pub receivers: Vec<Box<dyn AlertReceiver<RHOBObservability>>>,
|
pub receivers: Vec<Box<dyn AlertReceiver<RedHatClusterObservability>>>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl<T: Topology + PrometheusMonitoring<RHOBObservability>> Score<T>
|
impl<T: Topology + Observability<RedHatClusterObservability>> Score<T>
|
||||||
for ApplicationRHOBMonitoringScore
|
for ApplicationRedHatClusterMonitoringScore
|
||||||
{
|
{
|
||||||
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
||||||
Box::new(ApplicationRHOBMonitoringInterpret {
|
Box::new(ApplicationRedHatClusterMonitoringInterpret {
|
||||||
score: self.clone(),
|
score: self.clone(),
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
@@ -44,38 +43,28 @@ impl<T: Topology + PrometheusMonitoring<RHOBObservability>> Score<T>
|
|||||||
}
|
}
|
||||||
|
|
||||||
#[derive(Debug)]
|
#[derive(Debug)]
|
||||||
pub struct ApplicationRHOBMonitoringInterpret {
|
pub struct ApplicationRedHatClusterMonitoringInterpret {
|
||||||
score: ApplicationRHOBMonitoringScore,
|
score: ApplicationRedHatClusterMonitoringScore,
|
||||||
}
|
}
|
||||||
|
|
||||||
#[async_trait]
|
#[async_trait]
|
||||||
impl<T: Topology + PrometheusMonitoring<RHOBObservability>> Interpret<T>
|
impl<T: Topology + Observability<RedHatClusterObservability>> Interpret<T>
|
||||||
for ApplicationRHOBMonitoringInterpret
|
for ApplicationRedHatClusterMonitoringInterpret
|
||||||
{
|
{
|
||||||
async fn execute(
|
async fn execute(
|
||||||
&self,
|
&self,
|
||||||
inventory: &Inventory,
|
inventory: &Inventory,
|
||||||
topology: &T,
|
topology: &T,
|
||||||
) -> Result<Outcome, InterpretError> {
|
) -> Result<Outcome, InterpretError> {
|
||||||
let result = topology
|
//TODO will need to use k8sclient to apply crd ServiceMonitor or find a way to pass
|
||||||
.install_prometheus(
|
//them to the AlertingInterpret potentially via Sender RedHatClusterObservability
|
||||||
&self.score.sender,
|
let alerting_interpret = AlertingInterpret {
|
||||||
inventory,
|
sender: self.score.sender.clone(),
|
||||||
Some(self.score.receivers.clone()),
|
receivers: self.score.receivers.clone(),
|
||||||
)
|
rules: vec![],
|
||||||
.await;
|
scrape_targets: None,
|
||||||
|
};
|
||||||
match result {
|
alerting_interpret.execute(inventory, topology).await
|
||||||
Ok(outcome) => match outcome {
|
|
||||||
PreparationOutcome::Success { details: _ } => {
|
|
||||||
Ok(Outcome::success("Prometheus installed".into()))
|
|
||||||
}
|
|
||||||
PreparationOutcome::Noop => {
|
|
||||||
Ok(Outcome::noop("Prometheus installation skipped".into()))
|
|
||||||
}
|
|
||||||
},
|
|
||||||
Err(err) => Err(InterpretError::from(err)),
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
fn get_name(&self) -> InterpretName {
|
fn get_name(&self) -> InterpretName {
|
||||||
|
|||||||
@@ -1,17 +1,41 @@
|
|||||||
use async_trait::async_trait;
|
use serde::Serialize;
|
||||||
use k8s_openapi::Resource;
|
|
||||||
|
|
||||||
use crate::{
|
use crate::topology::monitoring::{AlertReceiver, AlertRule, AlertSender, ScrapeTarget};
|
||||||
inventory::Inventory,
|
|
||||||
topology::{PreparationError, PreparationOutcome},
|
|
||||||
};
|
|
||||||
|
|
||||||
#[async_trait]
|
#[derive(Debug, Clone, Serialize)]
|
||||||
pub trait Grafana {
|
pub struct Grafana {
|
||||||
async fn ensure_grafana_operator(
|
pub namespace: String,
|
||||||
&self,
|
}
|
||||||
inventory: &Inventory,
|
|
||||||
) -> Result<PreparationOutcome, PreparationError>;
|
impl AlertSender for Grafana {
|
||||||
|
fn name(&self) -> String {
|
||||||
async fn install_grafana(&self) -> Result<PreparationOutcome, PreparationError>;
|
"grafana".to_string()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Serialize for Box<dyn AlertReceiver<Grafana>> {
|
||||||
|
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
|
||||||
|
where
|
||||||
|
S: serde::Serializer,
|
||||||
|
{
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Serialize for Box<dyn AlertRule<Grafana>> {
|
||||||
|
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
|
||||||
|
where
|
||||||
|
S: serde::Serializer,
|
||||||
|
{
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Serialize for Box<dyn ScrapeTarget<Grafana>> {
|
||||||
|
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
|
||||||
|
where
|
||||||
|
S: serde::Serializer,
|
||||||
|
{
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -0,0 +1,32 @@
|
|||||||
|
use serde::Serialize;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
modules::monitoring::grafana::grafana::Grafana,
|
||||||
|
score::Score,
|
||||||
|
topology::{
|
||||||
|
Topology,
|
||||||
|
monitoring::{AlertReceiver, AlertRule, AlertingInterpret, Observability, ScrapeTarget},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
#[derive(Clone, Debug, Serialize)]
|
||||||
|
pub struct GrafanaAlertingScore {
|
||||||
|
pub receivers: Vec<Box<dyn AlertReceiver<Grafana>>>,
|
||||||
|
pub rules: Vec<Box<dyn AlertRule<Grafana>>>,
|
||||||
|
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<Grafana>>>>,
|
||||||
|
pub sender: Grafana,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<T: Topology + Observability<Grafana>> Score<T> for GrafanaAlertingScore {
|
||||||
|
fn create_interpret(&self) -> Box<dyn crate::interpret::Interpret<T>> {
|
||||||
|
Box::new(AlertingInterpret {
|
||||||
|
sender: self.sender.clone(),
|
||||||
|
receivers: self.receivers.clone(),
|
||||||
|
rules: self.rules.clone(),
|
||||||
|
scrape_targets: self.scrape_targets.clone(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
fn name(&self) -> String {
|
||||||
|
"GrafanaAlertingScore".to_string()
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -1,28 +0,0 @@
|
|||||||
use harmony_macros::hurl;
|
|
||||||
use non_blank_string_rs::NonBlankString;
|
|
||||||
use std::{collections::HashMap, str::FromStr};
|
|
||||||
|
|
||||||
use crate::modules::helm::chart::{HelmChartScore, HelmRepository};
|
|
||||||
|
|
||||||
pub fn grafana_helm_chart_score(ns: &str, namespace_scope: bool) -> HelmChartScore {
|
|
||||||
let mut values_overrides = HashMap::new();
|
|
||||||
values_overrides.insert(
|
|
||||||
NonBlankString::from_str("namespaceScope").unwrap(),
|
|
||||||
namespace_scope.to_string(),
|
|
||||||
);
|
|
||||||
HelmChartScore {
|
|
||||||
namespace: Some(NonBlankString::from_str(ns).unwrap()),
|
|
||||||
release_name: NonBlankString::from_str("grafana-operator").unwrap(),
|
|
||||||
chart_name: NonBlankString::from_str("grafana/grafana-operator").unwrap(),
|
|
||||||
chart_version: None,
|
|
||||||
values_overrides: Some(values_overrides),
|
|
||||||
values_yaml: None,
|
|
||||||
create_namespace: true,
|
|
||||||
install_only: true,
|
|
||||||
repository: Some(HelmRepository::new(
|
|
||||||
"grafana".to_string(),
|
|
||||||
hurl!("https://grafana.github.io/helm-charts"),
|
|
||||||
true,
|
|
||||||
)),
|
|
||||||
}
|
|
||||||
}
|
|
||||||
@@ -1 +0,0 @@
|
|||||||
pub mod helm_grafana;
|
|
||||||
@@ -4,7 +4,7 @@ use kube::CustomResource;
|
|||||||
use schemars::JsonSchema;
|
use schemars::JsonSchema;
|
||||||
use serde::{Deserialize, Serialize};
|
use serde::{Deserialize, Serialize};
|
||||||
|
|
||||||
use super::crd_prometheuses::LabelSelector;
|
use crate::modules::monitoring::kube_prometheus::crd::crd_prometheuses::LabelSelector;
|
||||||
|
|
||||||
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
|
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
|
||||||
#[kube(
|
#[kube(
|
||||||
3
harmony/src/modules/monitoring/grafana/k8s/crd/mod.rs
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
pub mod crd_grafana;
|
||||||
|
pub mod grafana_default_dashboard;
|
||||||
|
pub mod rhob_grafana;
|
||||||
@@ -4,7 +4,7 @@ use kube::CustomResource;
|
|||||||
use schemars::JsonSchema;
|
use schemars::JsonSchema;
|
||||||
use serde::{Deserialize, Serialize};
|
use serde::{Deserialize, Serialize};
|
||||||
|
|
||||||
use crate::modules::monitoring::kube_prometheus::crd::rhob_prometheuses::LabelSelector;
|
use crate::modules::monitoring::red_hat_cluster_observability::crd::rhob_prometheuses::LabelSelector;
|
||||||
|
|
||||||
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
|
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
|
||||||
#[kube(
|
#[kube(
|
||||||
1
harmony/src/modules/monitoring/grafana/k8s/helm/mod.rs
Normal file
@@ -0,0 +1 @@
|
|||||||
|
pub mod grafana_operator;
|
||||||
7
harmony/src/modules/monitoring/grafana/k8s/mod.rs
Normal file
@@ -0,0 +1,7 @@
|
|||||||
|
pub mod crd;
|
||||||
|
pub mod helm;
|
||||||
|
pub mod score_ensure_grafana_ready;
|
||||||
|
pub mod score_grafana_alert_receiver;
|
||||||
|
pub mod score_grafana_datasource;
|
||||||
|
pub mod score_grafana_rule;
|
||||||
|
pub mod score_install_grafana;
|
||||||
@@ -0,0 +1,54 @@
|
|||||||
|
use serde::Serialize;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
interpret::Interpret,
|
||||||
|
modules::monitoring::grafana::grafana::Grafana,
|
||||||
|
score::Score,
|
||||||
|
topology::{K8sclient, Topology},
|
||||||
|
};
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, Serialize)]
|
||||||
|
pub struct GrafanaK8sEnsureReadyScore {
|
||||||
|
pub sender: Grafana,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<T: Topology + K8sclient> Score<T> for GrafanaK8sEnsureReadyScore {
|
||||||
|
fn name(&self) -> String {
|
||||||
|
"GrafanaK8sEnsureReadyScore".to_string()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// async fn ensure_ready(
|
||||||
|
// &self,
|
||||||
|
// inventory: &Inventory,
|
||||||
|
// ) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
// debug!("ensure grafana operator");
|
||||||
|
// let client = self.k8s_client().await.unwrap();
|
||||||
|
// let grafana_gvk = GroupVersionKind {
|
||||||
|
// group: "grafana.integreatly.org".to_string(),
|
||||||
|
// version: "v1beta1".to_string(),
|
||||||
|
// kind: "Grafana".to_string(),
|
||||||
|
// };
|
||||||
|
// let name = "grafanas.grafana.integreatly.org";
|
||||||
|
// let ns = "grafana";
|
||||||
|
//
|
||||||
|
// let grafana_crd = client
|
||||||
|
// .get_resource_json_value(name, Some(ns), &grafana_gvk)
|
||||||
|
// .await;
|
||||||
|
// match grafana_crd {
|
||||||
|
// Ok(_) => {
|
||||||
|
// return Ok(PreparationOutcome::Success {
|
||||||
|
// details: "Found grafana CRDs in cluster".to_string(),
|
||||||
|
// });
|
||||||
|
// }
|
||||||
|
//
|
||||||
|
// Err(_) => {
|
||||||
|
// return self
|
||||||
|
// .install_grafana_operator(inventory, Some("grafana"))
|
||||||
|
// .await;
|
||||||
|
// }
|
||||||
|
// };
|
||||||
|
// }
|
||||||
@@ -0,0 +1,24 @@
|
|||||||
|
use serde::Serialize;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
interpret::Interpret,
|
||||||
|
modules::monitoring::grafana::grafana::Grafana,
|
||||||
|
score::Score,
|
||||||
|
topology::{K8sclient, Topology, monitoring::AlertReceiver},
|
||||||
|
};
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, Serialize)]
|
||||||
|
pub struct GrafanaK8sReceiverScore {
|
||||||
|
pub sender: Grafana,
|
||||||
|
pub receiver: Box<dyn AlertReceiver<Grafana>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<T: Topology + K8sclient> Score<T> for GrafanaK8sReceiverScore {
|
||||||
|
fn name(&self) -> String {
|
||||||
|
"GrafanaK8sReceiverScore".to_string()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,83 @@
|
|||||||
|
use serde::Serialize;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
interpret::Interpret,
|
||||||
|
modules::monitoring::grafana::grafana::Grafana,
|
||||||
|
score::Score,
|
||||||
|
topology::{K8sclient, Topology, monitoring::ScrapeTarget},
|
||||||
|
};
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, Serialize)]
|
||||||
|
pub struct GrafanaK8sDatasourceScore {
|
||||||
|
pub sender: Grafana,
|
||||||
|
pub scrape_target: Box<dyn ScrapeTarget<Grafana>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<T: Topology + K8sclient> Score<T> for GrafanaK8sDatasourceScore {
|
||||||
|
fn name(&self) -> String {
|
||||||
|
"GrafanaK8sDatasourceScore".to_string()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// fn extract_and_normalize_token(&self, secret: &DynamicObject) -> Option<String> {
|
||||||
|
// let token_b64 = secret
|
||||||
|
// .data
|
||||||
|
// .get("token")
|
||||||
|
// .or_else(|| secret.data.get("data").and_then(|d| d.get("token")))
|
||||||
|
// .and_then(|v| v.as_str())?;
|
||||||
|
//
|
||||||
|
// let bytes = general_purpose::STANDARD.decode(token_b64).ok()?;
|
||||||
|
//
|
||||||
|
// let s = String::from_utf8(bytes).ok()?;
|
||||||
|
//
|
||||||
|
// let cleaned = s
|
||||||
|
// .trim_matches(|c: char| c.is_whitespace() || c == '\0')
|
||||||
|
// .to_string();
|
||||||
|
// Some(cleaned)
|
||||||
|
// }
|
||||||
|
// fn build_grafana_datasource(
|
||||||
|
// &self,
|
||||||
|
// name: &str,
|
||||||
|
// ns: &str,
|
||||||
|
// label_selector: &LabelSelector,
|
||||||
|
// url: &str,
|
||||||
|
// token: &str,
|
||||||
|
// ) -> GrafanaDatasource {
|
||||||
|
// let mut json_data = BTreeMap::new();
|
||||||
|
// json_data.insert("timeInterval".to_string(), "5s".to_string());
|
||||||
|
//
|
||||||
|
// GrafanaDatasource {
|
||||||
|
// metadata: ObjectMeta {
|
||||||
|
// name: Some(name.to_string()),
|
||||||
|
// namespace: Some(ns.to_string()),
|
||||||
|
// ..Default::default()
|
||||||
|
// },
|
||||||
|
// spec: GrafanaDatasourceSpec {
|
||||||
|
// instance_selector: label_selector.clone(),
|
||||||
|
// allow_cross_namespace_import: Some(true),
|
||||||
|
// values_from: None,
|
||||||
|
// datasource: GrafanaDatasourceConfig {
|
||||||
|
// access: "proxy".to_string(),
|
||||||
|
// name: name.to_string(),
|
||||||
|
// r#type: "prometheus".to_string(),
|
||||||
|
// url: url.to_string(),
|
||||||
|
// database: None,
|
||||||
|
// json_data: Some(GrafanaDatasourceJsonData {
|
||||||
|
// time_interval: Some("60s".to_string()),
|
||||||
|
// http_header_name1: Some("Authorization".to_string()),
|
||||||
|
// tls_skip_verify: Some(true),
|
||||||
|
// oauth_pass_thru: Some(true),
|
||||||
|
// }),
|
||||||
|
// secure_json_data: Some(GrafanaDatasourceSecureJsonData {
|
||||||
|
// http_header_value1: Some(format!("Bearer {token}")),
|
||||||
|
// }),
|
||||||
|
// is_default: Some(false),
|
||||||
|
// editable: Some(true),
|
||||||
|
// },
|
||||||
|
// },
|
||||||
|
// }
|
||||||
|
// }
|
||||||
@@ -0,0 +1,67 @@
|
|||||||
|
use serde::Serialize;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
interpret::Interpret,
|
||||||
|
modules::monitoring::grafana::grafana::Grafana,
|
||||||
|
score::Score,
|
||||||
|
topology::{K8sclient, Topology, monitoring::AlertRule},
|
||||||
|
};
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, Serialize)]
|
||||||
|
pub struct GrafanaK8sRuleScore {
|
||||||
|
pub sender: Grafana,
|
||||||
|
pub rule: Box<dyn AlertRule<Grafana>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<T: Topology + K8sclient> Score<T> for GrafanaK8sRuleScore {
|
||||||
|
fn name(&self) -> String {
|
||||||
|
"GrafanaK8sRuleScore".to_string()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// kind: Secret
|
||||||
|
// apiVersion: v1
|
||||||
|
// metadata:
|
||||||
|
// name: credentials
|
||||||
|
// namespace: grafana
|
||||||
|
// stringData:
|
||||||
|
// PROMETHEUS_USERNAME: root
|
||||||
|
// PROMETHEUS_PASSWORD: secret
|
||||||
|
// type: Opaque
|
||||||
|
// ---
|
||||||
|
// apiVersion: grafana.integreatly.org/v1beta1
|
||||||
|
// kind: GrafanaDatasource
|
||||||
|
// metadata:
|
||||||
|
// name: grafanadatasource-sample
|
||||||
|
// spec:
|
||||||
|
// valuesFrom:
|
||||||
|
// - targetPath: "basicAuthUser"
|
||||||
|
// valueFrom:
|
||||||
|
// secretKeyRef:
|
||||||
|
// name: "credentials"
|
||||||
|
// key: "PROMETHEUS_USERNAME"
|
||||||
|
// - targetPath: "secureJsonData.basicAuthPassword"
|
||||||
|
// valueFrom:
|
||||||
|
// secretKeyRef:
|
||||||
|
// name: "credentials"
|
||||||
|
// key: "PROMETHEUS_PASSWORD"
|
||||||
|
// instanceSelector:
|
||||||
|
// matchLabels:
|
||||||
|
// dashboards: "grafana"
|
||||||
|
// datasource:
|
||||||
|
// name: prometheus
|
||||||
|
// type: prometheus
|
||||||
|
// access: proxy
|
||||||
|
// basicAuth: true
|
||||||
|
// url: http://prometheus-service:9090
|
||||||
|
// isDefault: true
|
||||||
|
// basicAuthUser: ${PROMETHEUS_USERNAME}
|
||||||
|
// jsonData:
|
||||||
|
// "tlsSkipVerify": true
|
||||||
|
// "timeInterval": "5s"
|
||||||
|
// secureJsonData:
|
||||||
|
// "basicAuthPassword": ${PROMETHEUS_PASSWORD} #
|
||||||
@@ -0,0 +1,223 @@
|
|||||||
|
use async_trait::async_trait;
|
||||||
|
use harmony_types::id::Id;
|
||||||
|
use serde::Serialize;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
data::Version,
|
||||||
|
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
|
||||||
|
inventory::Inventory,
|
||||||
|
modules::monitoring::grafana::grafana::Grafana,
|
||||||
|
score::Score,
|
||||||
|
topology::{HelmCommand, K8sclient, Topology},
|
||||||
|
};
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, Serialize)]
|
||||||
|
pub struct GrafanaK8sInstallScore {
|
||||||
|
pub sender: Grafana,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<T: Topology + K8sclient + HelmCommand> Score<T> for GrafanaK8sInstallScore {
|
||||||
|
fn name(&self) -> String {
|
||||||
|
"GrafanaK8sInstallScore".to_string()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
||||||
|
Box::new(GrafanaK8sInstallInterpret {})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, Serialize)]
|
||||||
|
pub struct GrafanaK8sInstallInterpret {}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl<T: Topology + K8sclient + HelmCommand> Interpret<T> for GrafanaK8sInstallInterpret {
|
||||||
|
async fn execute(
|
||||||
|
&self,
|
||||||
|
inventory: &Inventory,
|
||||||
|
topology: &T,
|
||||||
|
) -> Result<Outcome, InterpretError> {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn get_name(&self) -> InterpretName {
|
||||||
|
InterpretName::Custom("GrafanaK8sInstallInterpret")
|
||||||
|
}
|
||||||
|
|
||||||
|
fn get_version(&self) -> Version {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn get_status(&self) -> InterpretStatus {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn get_children(&self) -> Vec<Id> {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// let score = grafana_operator_helm_chart_score(sender.namespace.clone());
|
||||||
|
//
|
||||||
|
// score
|
||||||
|
// .create_interpret()
|
||||||
|
// .execute(inventory, self)
|
||||||
|
// .await
|
||||||
|
// .map_err(|e| PreparationError::new(e.to_string()))?;
|
||||||
|
//
|
||||||
|
|
||||||
|
//
|
||||||
|
// fn build_grafana_dashboard(
|
||||||
|
// &self,
|
||||||
|
// ns: &str,
|
||||||
|
// label_selector: &LabelSelector,
|
||||||
|
// ) -> GrafanaDashboard {
|
||||||
|
// let graf_dashboard = GrafanaDashboard {
|
||||||
|
// metadata: ObjectMeta {
|
||||||
|
// name: Some(format!("grafana-dashboard-{}", ns)),
|
||||||
|
// namespace: Some(ns.to_string()),
|
||||||
|
// ..Default::default()
|
||||||
|
// },
|
||||||
|
// spec: GrafanaDashboardSpec {
|
||||||
|
// resync_period: Some("30s".to_string()),
|
||||||
|
// instance_selector: label_selector.clone(),
|
||||||
|
// datasources: Some(vec![GrafanaDashboardDatasource {
|
||||||
|
// input_name: "DS_PROMETHEUS".to_string(),
|
||||||
|
// datasource_name: "thanos-openshift-monitoring".to_string(),
|
||||||
|
// }]),
|
||||||
|
// json: None,
|
||||||
|
// grafana_com: Some(GrafanaCom {
|
||||||
|
// id: 17406,
|
||||||
|
// revision: None,
|
||||||
|
// }),
|
||||||
|
// },
|
||||||
|
// };
|
||||||
|
// graf_dashboard
|
||||||
|
// }
|
||||||
|
//
|
||||||
|
// fn build_grafana(&self, ns: &str, labels: &BTreeMap<String, String>) -> GrafanaCRD {
|
||||||
|
// let grafana = GrafanaCRD {
|
||||||
|
// metadata: ObjectMeta {
|
||||||
|
// name: Some(format!("grafana-{}", ns)),
|
||||||
|
// namespace: Some(ns.to_string()),
|
||||||
|
// labels: Some(labels.clone()),
|
||||||
|
// ..Default::default()
|
||||||
|
// },
|
||||||
|
// spec: GrafanaSpec {
|
||||||
|
// config: None,
|
||||||
|
// admin_user: None,
|
||||||
|
// admin_password: None,
|
||||||
|
// ingress: None,
|
||||||
|
// persistence: None,
|
||||||
|
// resources: None,
|
||||||
|
// },
|
||||||
|
// };
|
||||||
|
// grafana
|
||||||
|
// }
|
||||||
|
//
|
||||||
|
// async fn build_grafana_ingress(&self, ns: &str) -> K8sIngressScore {
|
||||||
|
// let domain = self.get_domain(&format!("grafana-{}", ns)).await.unwrap();
|
||||||
|
// let name = format!("{}-grafana", ns);
|
||||||
|
// let backend_service = format!("grafana-{}-service", ns);
|
||||||
|
//
|
||||||
|
// K8sIngressScore {
|
||||||
|
// name: fqdn::fqdn!(&name),
|
||||||
|
// host: fqdn::fqdn!(&domain),
|
||||||
|
// backend_service: fqdn::fqdn!(&backend_service),
|
||||||
|
// port: 3000,
|
||||||
|
// path: Some("/".to_string()),
|
||||||
|
// path_type: Some(PathType::Prefix),
|
||||||
|
// namespace: Some(fqdn::fqdn!(&ns)),
|
||||||
|
// ingress_class_name: Some("openshift-default".to_string()),
|
||||||
|
// }
|
||||||
|
// }
|
||||||
|
// #[async_trait]
|
||||||
|
// impl Grafana for K8sAnywhereTopology {
|
||||||
|
// async fn install_grafana(&self) -> Result<PreparationOutcome, PreparationError> {
|
||||||
|
// let ns = "grafana";
|
||||||
|
//
|
||||||
|
// let mut label = BTreeMap::new();
|
||||||
|
//
|
||||||
|
// label.insert("dashboards".to_string(), "grafana".to_string());
|
||||||
|
//
|
||||||
|
// let label_selector = LabelSelector {
|
||||||
|
// match_labels: label.clone(),
|
||||||
|
// match_expressions: vec![],
|
||||||
|
// };
|
||||||
|
//
|
||||||
|
// let client = self.k8s_client().await?;
|
||||||
|
//
|
||||||
|
// let grafana = self.build_grafana(ns, &label);
|
||||||
|
//
|
||||||
|
// client.apply(&grafana, Some(ns)).await?;
|
||||||
|
// //TODO change this to an ensure-ready check or something better than just a timeout
|
||||||
|
// client
|
||||||
|
// .wait_until_deployment_ready(
|
||||||
|
// "grafana-grafana-deployment",
|
||||||
|
// Some("grafana"),
|
||||||
|
// Some(Duration::from_secs(30)),
|
||||||
|
// )
|
||||||
|
// .await?;
|
||||||
|
//
|
||||||
|
// let sa_name = "grafana-grafana-sa";
|
||||||
|
// let token_secret_name = "grafana-sa-token-secret";
|
||||||
|
//
|
||||||
|
// let sa_token_secret = self.build_sa_token_secret(token_secret_name, sa_name, ns);
|
||||||
|
//
|
||||||
|
// client.apply(&sa_token_secret, Some(ns)).await?;
|
||||||
|
// let secret_gvk = GroupVersionKind {
|
||||||
|
// group: "".to_string(),
|
||||||
|
// version: "v1".to_string(),
|
||||||
|
// kind: "Secret".to_string(),
|
||||||
|
// };
|
||||||
|
//
|
||||||
|
// let secret = client
|
||||||
|
// .get_resource_json_value(token_secret_name, Some(ns), &secret_gvk)
|
||||||
|
// .await?;
|
||||||
|
//
|
||||||
|
// let token = format!(
|
||||||
|
// "Bearer {}",
|
||||||
|
// self.extract_and_normalize_token(&secret).unwrap()
|
||||||
|
// );
|
||||||
|
//
|
||||||
|
// debug!("creating grafana clusterrole binding");
|
||||||
|
//
|
||||||
|
// let clusterrolebinding =
|
||||||
|
// self.build_cluster_rolebinding(sa_name, "cluster-monitoring-view", ns);
|
||||||
|
//
|
||||||
|
// client.apply(&clusterrolebinding, Some(ns)).await?;
|
||||||
|
//
|
||||||
|
// debug!("creating grafana datasource crd");
|
||||||
|
//
|
||||||
|
// let thanos_url = format!(
|
||||||
|
// "https://{}",
|
||||||
|
// self.get_domain("thanos-querier-openshift-monitoring")
|
||||||
|
// .await
|
||||||
|
// .unwrap()
|
||||||
|
// );
|
||||||
|
//
|
||||||
|
// let thanos_openshift_datasource = self.build_grafana_datasource(
|
||||||
|
// "thanos-openshift-monitoring",
|
||||||
|
// ns,
|
||||||
|
// &label_selector,
|
||||||
|
// &thanos_url,
|
||||||
|
// &token,
|
||||||
|
// );
|
||||||
|
//
|
||||||
|
// client.apply(&thanos_openshift_datasource, Some(ns)).await?;
|
||||||
|
//
|
||||||
|
// debug!("creating grafana dashboard crd");
|
||||||
|
// let dashboard = self.build_grafana_dashboard(ns, &label_selector);
|
||||||
|
//
|
||||||
|
// client.apply(&dashboard, Some(ns)).await?;
|
||||||
|
// debug!("creating grafana ingress");
|
||||||
|
// let grafana_ingress = self.build_grafana_ingress(ns).await;
|
||||||
|
//
|
||||||
|
// grafana_ingress
|
||||||
|
// .interpret(&Inventory::empty(), self)
|
||||||
|
// .await
|
||||||
|
// .map_err(|e| PreparationError::new(e.to_string()))?;
|
||||||
|
//
|
||||||
|
// Ok(PreparationOutcome::Success {
|
||||||
|
// details: "Installed grafana components".to_string(),
|
||||||
|
// })
|
||||||
|
// }
|
||||||
|
// }
|
||||||
@@ -1,2 +1,3 @@
|
|||||||
pub mod grafana;
|
pub mod grafana;
|
||||||
pub mod helm;
|
pub mod grafana_alerting_score;
|
||||||
|
pub mod k8s;
|
||||||
|
|||||||
@@ -1,91 +1,17 @@
|
|||||||
use std::sync::Arc;
|
|
||||||
|
|
||||||
use async_trait::async_trait;
|
|
||||||
use kube::CustomResource;
|
use kube::CustomResource;
|
||||||
use schemars::JsonSchema;
|
use schemars::JsonSchema;
|
||||||
use serde::{Deserialize, Serialize};
|
use serde::{Deserialize, Serialize};
|
||||||
|
|
||||||
use crate::{
|
#[derive(CustomResource, Serialize, Deserialize, Default, Debug, Clone, JsonSchema)]
|
||||||
interpret::InterpretError,
|
|
||||||
inventory::Inventory,
|
|
||||||
modules::{
|
|
||||||
monitoring::{
|
|
||||||
grafana::grafana::Grafana, kube_prometheus::crd::service_monitor::ServiceMonitor,
|
|
||||||
},
|
|
||||||
prometheus::prometheus::PrometheusMonitoring,
|
|
||||||
},
|
|
||||||
topology::{
|
|
||||||
K8sclient, Topology,
|
|
||||||
installable::Installable,
|
|
||||||
oberservability::monitoring::{AlertReceiver, AlertSender, ScrapeTarget},
|
|
||||||
},
|
|
||||||
};
|
|
||||||
use harmony_k8s::K8sClient;
|
|
||||||
|
|
||||||
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
|
|
||||||
#[kube(
|
#[kube(
|
||||||
group = "monitoring.coreos.com",
|
group = "monitoring.coreos.com",
|
||||||
version = "v1alpha1",
|
version = "v1",
|
||||||
kind = "AlertmanagerConfig",
|
kind = "AlertmanagerConfig",
|
||||||
plural = "alertmanagerconfigs",
|
plural = "alertmanagerconfigs",
|
||||||
namespaced
|
namespaced,
|
||||||
|
derive = "Default"
|
||||||
)]
|
)]
|
||||||
pub struct AlertmanagerConfigSpec {
|
pub struct AlertmanagerConfigSpec {
|
||||||
#[serde(flatten)]
|
#[serde(flatten)]
|
||||||
pub data: serde_json::Value,
|
pub data: serde_json::Value,
|
||||||
}
|
}
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
|
||||||
pub struct CRDPrometheus {
|
|
||||||
pub namespace: String,
|
|
||||||
pub client: Arc<K8sClient>,
|
|
||||||
pub service_monitor: Vec<ServiceMonitor>,
|
|
||||||
}
|
|
||||||
|
|
||||||
impl AlertSender for CRDPrometheus {
|
|
||||||
fn name(&self) -> String {
|
|
||||||
"CRDAlertManager".to_string()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
impl Clone for Box<dyn AlertReceiver<CRDPrometheus>> {
|
|
||||||
fn clone(&self) -> Self {
|
|
||||||
self.clone_box()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
impl Clone for Box<dyn ScrapeTarget<CRDPrometheus>> {
|
|
||||||
fn clone(&self) -> Self {
|
|
||||||
self.clone_box()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
impl Serialize for Box<dyn AlertReceiver<CRDPrometheus>> {
|
|
||||||
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
|
|
||||||
where
|
|
||||||
S: serde::Serializer,
|
|
||||||
{
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl<T: Topology + K8sclient + PrometheusMonitoring<CRDPrometheus> + Grafana> Installable<T>
|
|
||||||
for CRDPrometheus
|
|
||||||
{
|
|
||||||
async fn configure(&self, inventory: &Inventory, topology: &T) -> Result<(), InterpretError> {
|
|
||||||
topology.ensure_grafana_operator(inventory).await?;
|
|
||||||
topology.ensure_prometheus_operator(self, inventory).await?;
|
|
||||||
Ok(())
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn ensure_installed(
|
|
||||||
&self,
|
|
||||||
inventory: &Inventory,
|
|
||||||
topology: &T,
|
|
||||||
) -> Result<(), InterpretError> {
|
|
||||||
topology.install_grafana().await?;
|
|
||||||
topology.install_prometheus(&self, inventory, None).await?;
|
|
||||||
Ok(())
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
use crate::modules::prometheus::alerts::k8s::{
|
use crate::modules::monitoring::alert_rule::alerts::k8s::{
|
||||||
deployment::alert_deployment_unavailable,
|
deployment::alert_deployment_unavailable,
|
||||||
pod::{alert_container_restarting, alert_pod_not_ready, pod_failed},
|
pod::{alert_container_restarting, alert_pod_not_ready, pod_failed},
|
||||||
pvc::high_pvc_fill_rate_over_two_days,
|
pvc::high_pvc_fill_rate_over_two_days,
|
||||||
|
|||||||
@@ -6,13 +6,14 @@ use serde::{Deserialize, Serialize};
|
|||||||
|
|
||||||
use crate::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;
|
use crate::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;
|
||||||
|
|
||||||
#[derive(CustomResource, Debug, Serialize, Deserialize, Clone, JsonSchema)]
|
#[derive(CustomResource, Default, Debug, Serialize, Deserialize, Clone, JsonSchema)]
|
||||||
#[kube(
|
#[kube(
|
||||||
group = "monitoring.coreos.com",
|
group = "monitoring.coreos.com",
|
||||||
version = "v1",
|
version = "v1",
|
||||||
kind = "PrometheusRule",
|
kind = "PrometheusRule",
|
||||||
plural = "prometheusrules",
|
plural = "prometheusrules",
|
||||||
namespaced
|
namespaced,
|
||||||
|
derive = "Default"
|
||||||
)]
|
)]
|
||||||
#[serde(rename_all = "camelCase")]
|
#[serde(rename_all = "camelCase")]
|
||||||
pub struct PrometheusRuleSpec {
|
pub struct PrometheusRuleSpec {
|
||||||
|
|||||||
@@ -1,23 +1,18 @@
|
|||||||
use std::net::IpAddr;
|
use std::collections::BTreeMap;
|
||||||
|
|
||||||
use async_trait::async_trait;
|
|
||||||
use kube::CustomResource;
|
use kube::CustomResource;
|
||||||
use schemars::JsonSchema;
|
use schemars::JsonSchema;
|
||||||
use serde::{Deserialize, Serialize};
|
use serde::{Deserialize, Serialize};
|
||||||
|
|
||||||
use crate::{
|
use crate::modules::monitoring::kube_prometheus::crd::crd_prometheuses::LabelSelector;
|
||||||
modules::monitoring::kube_prometheus::crd::{
|
|
||||||
crd_alertmanager_config::CRDPrometheus, crd_prometheuses::LabelSelector,
|
|
||||||
},
|
|
||||||
topology::oberservability::monitoring::ScrapeTarget,
|
|
||||||
};
|
|
||||||
|
|
||||||
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
|
#[derive(CustomResource, Default, Serialize, Deserialize, Debug, Clone, JsonSchema)]
|
||||||
#[kube(
|
#[kube(
|
||||||
group = "monitoring.coreos.com",
|
group = "monitoring.coreos.com",
|
||||||
version = "v1alpha1",
|
version = "v1alpha1",
|
||||||
kind = "ScrapeConfig",
|
kind = "ScrapeConfig",
|
||||||
plural = "scrapeconfigs",
|
plural = "scrapeconfigs",
|
||||||
|
derive = "Default",
|
||||||
namespaced
|
namespaced
|
||||||
)]
|
)]
|
||||||
#[serde(rename_all = "camelCase")]
|
#[serde(rename_all = "camelCase")]
|
||||||
@@ -70,8 +65,8 @@ pub struct ScrapeConfigSpec {
|
|||||||
#[serde(rename_all = "camelCase")]
|
#[serde(rename_all = "camelCase")]
|
||||||
pub struct StaticConfig {
|
pub struct StaticConfig {
|
||||||
pub targets: Vec<String>,
|
pub targets: Vec<String>,
|
||||||
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
pub labels: Option<LabelSelector>,
|
pub labels: Option<BTreeMap<String, String>>,
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Relabeling configuration for target or metric relabeling.
|
/// Relabeling configuration for target or metric relabeling.
|
||||||
|
|||||||
@@ -1,22 +1,9 @@
|
|||||||
pub mod crd_alertmanager_config;
|
pub mod crd_alertmanager_config;
|
||||||
pub mod crd_alertmanagers;
|
pub mod crd_alertmanagers;
|
||||||
pub mod crd_default_rules;
|
pub mod crd_default_rules;
|
||||||
pub mod crd_grafana;
|
|
||||||
pub mod crd_prometheus_rules;
|
pub mod crd_prometheus_rules;
|
||||||
pub mod crd_prometheuses;
|
pub mod crd_prometheuses;
|
||||||
pub mod crd_scrape_config;
|
pub mod crd_scrape_config;
|
||||||
pub mod grafana_default_dashboard;
|
|
||||||
pub mod grafana_operator;
|
|
||||||
pub mod prometheus_operator;
|
pub mod prometheus_operator;
|
||||||
pub mod rhob_alertmanager_config;
|
|
||||||
pub mod rhob_alertmanagers;
|
|
||||||
pub mod rhob_cluster_observability_operator;
|
|
||||||
pub mod rhob_default_rules;
|
|
||||||
pub mod rhob_grafana;
|
|
||||||
pub mod rhob_monitoring_stack;
|
|
||||||
pub mod rhob_prometheus_rules;
|
|
||||||
pub mod rhob_prometheuses;
|
|
||||||
pub mod rhob_role;
|
|
||||||
pub mod rhob_service_monitor;
|
|
||||||
pub mod role;
|
pub mod role;
|
||||||
pub mod service_monitor;
|
pub mod service_monitor;
|
||||||
|
|||||||
@@ -1,48 +0,0 @@
-use std::sync::Arc;
-
-use kube::CustomResource;
-use schemars::JsonSchema;
-use serde::{Deserialize, Serialize};
-
-use crate::topology::oberservability::monitoring::{AlertReceiver, AlertSender};
-use harmony_k8s::K8sClient;
-
-#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
-#[kube(
-    group = "monitoring.rhobs",
-    version = "v1alpha1",
-    kind = "AlertmanagerConfig",
-    plural = "alertmanagerconfigs",
-    namespaced
-)]
-pub struct AlertmanagerConfigSpec {
-    #[serde(flatten)]
-    pub data: serde_json::Value,
-}
-
-#[derive(Debug, Clone, Serialize)]
-pub struct RHOBObservability {
-    pub namespace: String,
-    pub client: Arc<K8sClient>,
-}
-
-impl AlertSender for RHOBObservability {
-    fn name(&self) -> String {
-        "RHOBAlertManager".to_string()
-    }
-}
-
-impl Clone for Box<dyn AlertReceiver<RHOBObservability>> {
-    fn clone(&self) -> Self {
-        self.clone_box()
-    }
-}
-
-impl Serialize for Box<dyn AlertReceiver<RHOBObservability>> {
-    fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
-    where
-        S: serde::Serializer,
-    {
-        todo!()
-    }
-}
@@ -1,22 +0,0 @@
-use std::str::FromStr;
-
-use non_blank_string_rs::NonBlankString;
-
-use crate::modules::helm::chart::HelmChartScore;
-//TODO package chart or something for COO okd
-pub fn rhob_cluster_observability_operator() -> HelmChartScore {
-    HelmChartScore {
-        namespace: None,
-        release_name: NonBlankString::from_str("").unwrap(),
-        chart_name: NonBlankString::from_str(
-            "oci://hub.nationtech.io/harmony/nt-prometheus-operator",
-        )
-        .unwrap(),
-        chart_version: None,
-        values_overrides: None,
-        values_yaml: None,
-        create_namespace: true,
-        install_only: true,
-        repository: None,
-    }
-}
@@ -1,20 +1,13 @@
|
|||||||
use super::config::KubePrometheusConfig;
|
use super::config::KubePrometheusConfig;
|
||||||
use log::debug;
|
|
||||||
use non_blank_string_rs::NonBlankString;
|
use non_blank_string_rs::NonBlankString;
|
||||||
use serde_yaml::{Mapping, Value};
|
|
||||||
use std::{
|
use std::{
|
||||||
collections::BTreeMap,
|
|
||||||
str::FromStr,
|
str::FromStr,
|
||||||
sync::{Arc, Mutex},
|
sync::{Arc, Mutex},
|
||||||
};
|
};
|
||||||
|
|
||||||
use crate::modules::{
|
use crate::modules::{
|
||||||
helm::chart::HelmChartScore,
|
helm::chart::HelmChartScore,
|
||||||
monitoring::kube_prometheus::types::{
|
monitoring::kube_prometheus::types::{Limits, Requests, Resources},
|
||||||
AlertGroup, AlertManager, AlertManagerAdditionalPromRules, AlertManagerConfig,
|
|
||||||
AlertManagerConfigSelector, AlertManagerRoute, AlertManagerSpec, AlertManagerValues,
|
|
||||||
ConfigReloader, Limits, PrometheusConfig, Requests, Resources,
|
|
||||||
},
|
|
||||||
};
|
};
|
||||||
|
|
||||||
pub fn kube_prometheus_helm_chart_score(
|
pub fn kube_prometheus_helm_chart_score(
|
||||||
@@ -66,7 +59,7 @@ pub fn kube_prometheus_helm_chart_score(
|
|||||||
}
|
}
|
||||||
let _resource_section = resource_block(&resource_limit, 2);
|
let _resource_section = resource_block(&resource_limit, 2);
|
||||||
|
|
||||||
let mut values = format!(
|
let values = format!(
|
||||||
r#"
|
r#"
|
||||||
global:
|
global:
|
||||||
rbac:
|
rbac:
|
||||||
@@ -281,131 +274,6 @@ prometheusOperator:
|
|||||||
"#,
|
"#,
|
||||||
);
|
);
|
||||||
|
|
||||||
let prometheus_config =
|
|
||||||
crate::modules::monitoring::kube_prometheus::types::PrometheusConfigValues {
|
|
||||||
prometheus: PrometheusConfig {
|
|
||||||
prometheus: bool::from_str(prometheus.as_str()).expect("couldn't parse bool"),
|
|
||||||
additional_service_monitors: config.additional_service_monitors.clone(),
|
|
||||||
},
|
|
||||||
};
|
|
||||||
let prometheus_config_yaml =
|
|
||||||
serde_yaml::to_string(&prometheus_config).expect("Failed to serialize YAML");
|
|
||||||
|
|
||||||
debug!(
|
|
||||||
"serialized prometheus config: \n {:#}",
|
|
||||||
prometheus_config_yaml
|
|
||||||
);
|
|
||||||
values.push_str(&prometheus_config_yaml);
|
|
||||||
|
|
||||||
// add required null receiver for prometheus alert manager
|
|
||||||
let mut null_receiver = Mapping::new();
|
|
||||||
null_receiver.insert(
|
|
||||||
Value::String("receiver".to_string()),
|
|
||||||
Value::String("null".to_string()),
|
|
||||||
);
|
|
||||||
null_receiver.insert(
|
|
||||||
Value::String("matchers".to_string()),
|
|
||||||
Value::Sequence(vec![Value::String("alertname!=Watchdog".to_string())]),
|
|
||||||
);
|
|
||||||
null_receiver.insert(Value::String("continue".to_string()), Value::Bool(true));
|
|
||||||
|
|
||||||
//add alert channels
|
|
||||||
let mut alert_manager_channel_config = AlertManagerConfig {
|
|
||||||
global: Mapping::new(),
|
|
||||||
route: AlertManagerRoute {
|
|
||||||
routes: vec![Value::Mapping(null_receiver)],
|
|
||||||
},
|
|
||||||
receivers: vec![serde_yaml::from_str("name: 'null'").unwrap()],
|
|
||||||
};
|
|
||||||
for receiver in config.alert_receiver_configs.iter() {
|
|
||||||
if let Some(global) = receiver.channel_global_config.clone() {
|
|
||||||
alert_manager_channel_config
|
|
||||||
.global
|
|
||||||
.insert(global.0, global.1);
|
|
||||||
}
|
|
||||||
alert_manager_channel_config
|
|
||||||
.route
|
|
||||||
.routes
|
|
||||||
.push(receiver.channel_route.clone());
|
|
||||||
alert_manager_channel_config
|
|
||||||
.receivers
|
|
||||||
.push(receiver.channel_receiver.clone());
|
|
||||||
}
|
|
||||||
|
|
||||||
let mut labels = BTreeMap::new();
|
|
||||||
labels.insert("alertmanagerConfig".to_string(), "enabled".to_string());
|
|
||||||
let alert_manager_config_selector = AlertManagerConfigSelector {
|
|
||||||
match_labels: labels,
|
|
||||||
};
|
|
||||||
let alert_manager_values = AlertManagerValues {
|
|
||||||
alertmanager: AlertManager {
|
|
||||||
enabled: config.alert_manager,
|
|
||||||
config: alert_manager_channel_config,
|
|
||||||
alertmanager_spec: AlertManagerSpec {
|
|
||||||
resources: Resources {
|
|
||||||
limits: Limits {
|
|
||||||
memory: "100Mi".to_string(),
|
|
||||||
cpu: "100m".to_string(),
|
|
||||||
},
|
|
||||||
requests: Requests {
|
|
||||||
memory: "100Mi".to_string(),
|
|
||||||
cpu: "100m".to_string(),
|
|
||||||
},
|
|
||||||
},
|
|
||||||
alert_manager_config_selector,
|
|
||||||
replicas: 2,
|
|
||||||
},
|
|
||||||
init_config_reloader: ConfigReloader {
|
|
||||||
resources: Resources {
|
|
||||||
limits: Limits {
|
|
||||||
memory: "100Mi".to_string(),
|
|
||||||
cpu: "100m".to_string(),
|
|
||||||
},
|
|
||||||
requests: Requests {
|
|
||||||
memory: "100Mi".to_string(),
|
|
||||||
cpu: "100m".to_string(),
|
|
||||||
},
|
|
||||||
},
|
|
||||||
},
|
|
||||||
},
|
|
||||||
};
|
|
||||||
|
|
||||||
let alert_manager_yaml =
|
|
||||||
serde_yaml::to_string(&alert_manager_values).expect("Failed to serialize YAML");
|
|
||||||
debug!("serialized alert manager: \n {:#}", alert_manager_yaml);
|
|
||||||
values.push_str(&alert_manager_yaml);
|
|
||||||
|
|
||||||
//format alert manager additional rules for helm chart
|
|
||||||
let mut merged_rules: BTreeMap<String, AlertGroup> = BTreeMap::new();
|
|
||||||
|
|
||||||
for additional_rule in config.alert_rules.clone() {
|
|
||||||
for (key, group) in additional_rule.rules {
|
|
||||||
merged_rules.insert(key, group);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
let merged_rules = AlertManagerAdditionalPromRules {
|
|
||||||
rules: merged_rules,
|
|
||||||
};
|
|
||||||
|
|
||||||
let mut alert_manager_additional_rules = serde_yaml::Mapping::new();
|
|
||||||
let rules_value = serde_yaml::to_value(merged_rules).unwrap();
|
|
||||||
|
|
||||||
alert_manager_additional_rules.insert(
|
|
||||||
serde_yaml::Value::String("additionalPrometheusRulesMap".to_string()),
|
|
||||||
rules_value,
|
|
||||||
);
|
|
||||||
|
|
||||||
let alert_manager_additional_rules_yaml =
|
|
||||||
serde_yaml::to_string(&alert_manager_additional_rules).expect("Failed to serialize YAML");
|
|
||||||
debug!(
|
|
||||||
"alert_rules_yaml:\n{:#}",
|
|
||||||
alert_manager_additional_rules_yaml
|
|
||||||
);
|
|
||||||
|
|
||||||
values.push_str(&alert_manager_additional_rules_yaml);
|
|
||||||
debug!("full values.yaml: \n {:#}", values);
|
|
||||||
|
|
||||||
HelmChartScore {
|
HelmChartScore {
|
||||||
namespace: Some(NonBlankString::from_str(&config.namespace.clone().unwrap()).unwrap()),
|
namespace: Some(NonBlankString::from_str(&config.namespace.clone().unwrap()).unwrap()),
|
||||||
release_name: NonBlankString::from_str("kube-prometheus").unwrap(),
|
release_name: NonBlankString::from_str("kube-prometheus").unwrap(),
|
||||||
|
|||||||
@@ -2,36 +2,41 @@ use std::sync::{Arc, Mutex};
|
|||||||
|
|
||||||
use serde::Serialize;
|
use serde::Serialize;
|
||||||
|
|
||||||
use super::{helm::config::KubePrometheusConfig, prometheus::KubePrometheus};
|
use super::helm::config::KubePrometheusConfig;
|
||||||
use crate::{
|
use crate::{
|
||||||
modules::monitoring::kube_prometheus::types::ServiceMonitor,
|
modules::monitoring::kube_prometheus::{KubePrometheus, types::ServiceMonitor},
|
||||||
score::Score,
|
score::Score,
|
||||||
topology::{
|
topology::{
|
||||||
HelmCommand, Topology,
|
Topology,
|
||||||
oberservability::monitoring::{AlertReceiver, AlertRule, AlertingInterpret},
|
monitoring::{AlertReceiver, AlertRule, AlertingInterpret, Observability, ScrapeTarget},
|
||||||
tenant::TenantManager,
|
|
||||||
},
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
|
//TODO untested
|
||||||
#[derive(Clone, Debug, Serialize)]
|
#[derive(Clone, Debug, Serialize)]
|
||||||
pub struct HelmPrometheusAlertingScore {
|
pub struct KubePrometheusAlertingScore {
|
||||||
pub receivers: Vec<Box<dyn AlertReceiver<KubePrometheus>>>,
|
pub receivers: Vec<Box<dyn AlertReceiver<KubePrometheus>>>,
|
||||||
pub rules: Vec<Box<dyn AlertRule<KubePrometheus>>>,
|
pub rules: Vec<Box<dyn AlertRule<KubePrometheus>>>,
|
||||||
|
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<KubePrometheus>>>>,
|
||||||
pub service_monitors: Vec<ServiceMonitor>,
|
pub service_monitors: Vec<ServiceMonitor>,
|
||||||
|
pub config: Arc<Mutex<KubePrometheusConfig>>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl<T: Topology + HelmCommand + TenantManager> Score<T> for HelmPrometheusAlertingScore {
|
impl<T: Topology + Observability<KubePrometheus>> Score<T> for KubePrometheusAlertingScore {
|
||||||
fn create_interpret(&self) -> Box<dyn crate::interpret::Interpret<T>> {
|
fn create_interpret(&self) -> Box<dyn crate::interpret::Interpret<T>> {
|
||||||
let config = Arc::new(Mutex::new(KubePrometheusConfig::new()));
|
//TODO test that additional service monitor is added
|
||||||
config
|
self.config
|
||||||
.try_lock()
|
.try_lock()
|
||||||
.expect("couldn't lock config")
|
.expect("couldn't lock config")
|
||||||
.additional_service_monitors = self.service_monitors.clone();
|
.additional_service_monitors = self.service_monitors.clone();
|
||||||
|
|
||||||
Box::new(AlertingInterpret {
|
Box::new(AlertingInterpret {
|
||||||
sender: KubePrometheus { config },
|
sender: KubePrometheus {
|
||||||
|
config: self.config.clone(),
|
||||||
|
},
|
||||||
receivers: self.receivers.clone(),
|
receivers: self.receivers.clone(),
|
||||||
rules: self.rules.clone(),
|
rules: self.rules.clone(),
|
||||||
scrape_targets: None,
|
scrape_targets: self.scrape_targets.clone(),
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
fn name(&self) -> String {
|
fn name(&self) -> String {
|
||||||
@@ -1,5 +1,71 @@
|
|||||||
|
use std::sync::{Arc, Mutex};
|
||||||
|
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use serde::Serialize;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
modules::monitoring::kube_prometheus::helm::config::KubePrometheusConfig,
|
||||||
|
topology::monitoring::{AlertReceiver, AlertRule, AlertSender, ScrapeTarget},
|
||||||
|
};
|
||||||
|
|
||||||
pub mod crd;
|
pub mod crd;
|
||||||
pub mod helm;
|
pub mod helm;
|
||||||
pub mod helm_prometheus_alert_score;
|
pub mod kube_prometheus_alerting_score;
|
||||||
pub mod prometheus;
|
pub mod score_kube_prometheus_alert_receivers;
|
||||||
|
pub mod score_kube_prometheus_ensure_ready;
|
||||||
|
pub mod score_kube_prometheus_rule;
|
||||||
|
pub mod score_kube_prometheus_scrape_target;
|
||||||
pub mod types;
|
pub mod types;
|
||||||
|
|
||||||
|
impl Serialize for Box<dyn ScrapeTarget<KubePrometheus>> {
|
||||||
|
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
|
||||||
|
where
|
||||||
|
S: serde::Serializer,
|
||||||
|
{
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl AlertSender for KubePrometheus {
|
||||||
|
fn name(&self) -> String {
|
||||||
|
"HelmKubePrometheus".to_string()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Clone, Debug, Serialize)]
|
||||||
|
pub struct KubePrometheus {
|
||||||
|
pub config: Arc<Mutex<KubePrometheusConfig>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Default for KubePrometheus {
|
||||||
|
fn default() -> Self {
|
||||||
|
Self::new()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl KubePrometheus {
|
||||||
|
pub fn new() -> Self {
|
||||||
|
Self {
|
||||||
|
config: Arc::new(Mutex::new(KubePrometheusConfig::new())),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Serialize for Box<dyn AlertReceiver<KubePrometheus>> {
|
||||||
|
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
|
||||||
|
where
|
||||||
|
S: serde::Serializer,
|
||||||
|
{
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Serialize for Box<dyn AlertRule<KubePrometheus>> {
|
||||||
|
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
|
||||||
|
where
|
||||||
|
S: serde::Serializer,
|
||||||
|
{
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|||||||
@@ -1,167 +0,0 @@
|
|||||||
use std::sync::{Arc, Mutex};
|
|
||||||
|
|
||||||
use async_trait::async_trait;
|
|
||||||
use log::{debug, error};
|
|
||||||
use serde::Serialize;
|
|
||||||
|
|
||||||
use crate::{
|
|
||||||
interpret::{InterpretError, Outcome},
|
|
||||||
inventory::Inventory,
|
|
||||||
modules::monitoring::alert_rule::prometheus_alert_rule::AlertManagerRuleGroup,
|
|
||||||
score,
|
|
||||||
topology::{
|
|
||||||
HelmCommand, Topology,
|
|
||||||
installable::Installable,
|
|
||||||
oberservability::monitoring::{AlertReceiver, AlertRule, AlertSender},
|
|
||||||
tenant::TenantManager,
|
|
||||||
},
|
|
||||||
};
|
|
||||||
|
|
||||||
use score::Score;
|
|
||||||
|
|
||||||
use super::{
|
|
||||||
helm::{
|
|
||||||
config::KubePrometheusConfig, kube_prometheus_helm_chart::kube_prometheus_helm_chart_score,
|
|
||||||
},
|
|
||||||
types::{AlertManagerAdditionalPromRules, AlertManagerChannelConfig},
|
|
||||||
};
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl AlertSender for KubePrometheus {
|
|
||||||
fn name(&self) -> String {
|
|
||||||
"HelmKubePrometheus".to_string()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl<T: Topology + HelmCommand + TenantManager> Installable<T> for KubePrometheus {
|
|
||||||
async fn configure(&self, _inventory: &Inventory, topology: &T) -> Result<(), InterpretError> {
|
|
||||||
self.configure_with_topology(topology).await;
|
|
||||||
Ok(())
|
|
||||||
}
|
|
||||||
|
|
||||||
async fn ensure_installed(
|
|
||||||
&self,
|
|
||||||
inventory: &Inventory,
|
|
||||||
topology: &T,
|
|
||||||
) -> Result<(), InterpretError> {
|
|
||||||
self.install_prometheus(inventory, topology).await?;
|
|
||||||
Ok(())
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Debug)]
|
|
||||||
pub struct KubePrometheus {
|
|
||||||
pub config: Arc<Mutex<KubePrometheusConfig>>,
|
|
||||||
}
|
|
||||||
|
|
||||||
impl Default for KubePrometheus {
|
|
||||||
fn default() -> Self {
|
|
||||||
Self::new()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
impl KubePrometheus {
|
|
||||||
pub fn new() -> Self {
|
|
||||||
Self {
|
|
||||||
config: Arc::new(Mutex::new(KubePrometheusConfig::new())),
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
pub async fn configure_with_topology<T: TenantManager>(&self, topology: &T) {
|
|
||||||
let ns = topology
|
|
||||||
.get_tenant_config()
|
|
||||||
.await
|
|
||||||
.map(|cfg| cfg.name.clone())
|
|
||||||
.unwrap_or_else(|| "monitoring".to_string());
|
|
||||||
error!("This must be refactored, see comments in pr #74");
|
|
||||||
debug!("NS: {}", ns);
|
|
||||||
self.config.lock().unwrap().namespace = Some(ns);
|
|
||||||
}
|
|
||||||
|
|
||||||
pub async fn install_receiver(
|
|
||||||
&self,
|
|
||||||
prometheus_receiver: &dyn KubePrometheusReceiver,
|
|
||||||
) -> Result<Outcome, InterpretError> {
|
|
||||||
let prom_receiver = prometheus_receiver.configure_receiver().await;
|
|
||||||
debug!(
|
|
||||||
"adding alert receiver to prometheus config: {:#?}",
|
|
||||||
&prom_receiver
|
|
||||||
);
|
|
||||||
let mut config = self.config.lock().unwrap();
|
|
||||||
|
|
||||||
config.alert_receiver_configs.push(prom_receiver);
|
|
||||||
let prom_receiver_name = prometheus_receiver.name();
|
|
||||||
debug!("installed alert receiver {}", &prom_receiver_name);
|
|
||||||
Ok(Outcome::success(format!(
|
|
||||||
"Sucessfully installed receiver {}",
|
|
||||||
prom_receiver_name
|
|
||||||
)))
|
|
||||||
}
|
|
||||||
|
|
||||||
pub async fn install_rule(
|
|
||||||
&self,
|
|
||||||
prometheus_rule: &AlertManagerRuleGroup,
|
|
||||||
) -> Result<Outcome, InterpretError> {
|
|
||||||
let prometheus_rule = prometheus_rule.configure_rule().await;
|
|
||||||
let mut config = self.config.lock().unwrap();
|
|
||||||
|
|
||||||
config.alert_rules.push(prometheus_rule.clone());
|
|
||||||
Ok(Outcome::success(format!(
|
|
||||||
"Successfully installed alert rule: {:#?},",
|
|
||||||
prometheus_rule
|
|
||||||
)))
|
|
||||||
}
|
|
||||||
|
|
||||||
pub async fn install_prometheus<T: Topology + HelmCommand + Send + Sync>(
|
|
||||||
&self,
|
|
||||||
inventory: &Inventory,
|
|
||||||
topology: &T,
|
|
||||||
) -> Result<Outcome, InterpretError> {
|
|
||||||
kube_prometheus_helm_chart_score(self.config.clone())
|
|
||||||
.interpret(inventory, topology)
|
|
||||||
.await
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
pub trait KubePrometheusReceiver: Send + Sync + std::fmt::Debug {
|
|
||||||
fn name(&self) -> String;
|
|
||||||
async fn configure_receiver(&self) -> AlertManagerChannelConfig;
|
|
||||||
}
|
|
||||||
|
|
||||||
impl Serialize for Box<dyn AlertReceiver<KubePrometheus>> {
|
|
||||||
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
|
|
||||||
where
|
|
||||||
S: serde::Serializer,
|
|
||||||
{
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
impl Clone for Box<dyn AlertReceiver<KubePrometheus>> {
|
|
||||||
fn clone(&self) -> Self {
|
|
||||||
self.clone_box()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
pub trait KubePrometheusRule: Send + Sync + std::fmt::Debug {
|
|
||||||
fn name(&self) -> String;
|
|
||||||
async fn configure_rule(&self) -> AlertManagerAdditionalPromRules;
|
|
||||||
}
|
|
||||||
|
|
||||||
impl Serialize for Box<dyn AlertRule<KubePrometheus>> {
|
|
||||||
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
|
|
||||||
where
|
|
||||||
S: serde::Serializer,
|
|
||||||
{
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
impl Clone for Box<dyn AlertRule<KubePrometheus>> {
|
|
||||||
fn clone(&self) -> Self {
|
|
||||||
self.clone_box()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
@@ -0,0 +1,61 @@
+use kube::api::ObjectMeta;
+use serde::Serialize;
+
+use crate::{
+    interpret::Interpret,
+    modules::{
+        k8s::resource::K8sResourceScore,
+        monitoring::kube_prometheus::{
+            KubePrometheus,
+            crd::crd_alertmanager_config::{AlertmanagerConfig, AlertmanagerConfigSpec},
+        },
+    },
+    score::Score,
+    topology::{K8sclient, Topology, monitoring::AlertReceiver},
+};
+
+#[derive(Debug, Clone, Serialize)]
+pub struct KubePrometheusReceiverScore {
+    pub sender: KubePrometheus,
+    pub receiver: Box<dyn AlertReceiver<KubePrometheus>>,
+}
+
+impl<T: Topology + K8sclient> Score<T> for KubePrometheusReceiverScore {
+    fn name(&self) -> String {
+        "KubePrometheusReceiverScore".to_string()
+    }
+
+    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
+        let name = self.receiver.name();
+        let namespace = self.sender.config.lock().unwrap().namespace.clone();
+        let install_plan = self.receiver.build().expect("failed to build install plan");
+
+        let route = install_plan.route.expect(&format!(
+            "failed to build route for receiver {}",
+            name.clone()
+        ));
+
+        let route = serde_yaml::to_value(route).expect("failed to serialize route object to yaml");
+
+        let receiver = install_plan.receiver.expect(&format!(
+            "failed to build receiver path for receiver {}",
+            name.clone()
+        ));
+
+        let data = serde_json::json!({
+            "route": route,
+            "receivers": [receiver]
+        });
+
+        let alertmanager_config = AlertmanagerConfig {
+            metadata: ObjectMeta {
+                name: Some(name),
+                namespace: namespace.clone(),
+                ..Default::default()
+            },
+            spec: AlertmanagerConfigSpec { data: data },
+        };
+
+        K8sResourceScore::single(alertmanager_config, namespace).create_interpret()
+    }
+}
@@ -0,0 +1,80 @@
+use async_trait::async_trait;
+use harmony_types::id::Id;
+use serde::Serialize;
+
+use crate::{
+    data::Version,
+    interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
+    inventory::Inventory,
+    modules::monitoring::kube_prometheus::KubePrometheus,
+    score::Score,
+    topology::{K8sclient, Topology},
+};
+
+#[derive(Clone, Debug, Serialize)]
+pub struct KubePrometheusEnsureReadyScore {
+    pub sender: KubePrometheus,
+}
+
+impl<T: Topology + K8sclient> Score<T> for KubePrometheusEnsureReadyScore {
+    fn name(&self) -> String {
+        "KubePrometheusEnsureReadyScore".to_string()
+    }
+
+    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
+        Box::new(KubePrometheusEnsureReadyInterpret {
+            sender: self.sender.clone(),
+        })
+    }
+}
+
+#[derive(Clone, Debug, Serialize)]
+pub struct KubePrometheusEnsureReadyInterpret {
+    pub sender: KubePrometheus,
+}
+
+#[async_trait]
+impl<T: Topology + K8sclient> Interpret<T> for KubePrometheusEnsureReadyInterpret {
+    async fn execute(
+        &self,
+        _inventory: &Inventory,
+        topology: &T,
+    ) -> Result<Outcome, InterpretError> {
+        let client = topology.k8s_client().await?;
+        let namespace = self
+            .sender
+            .config
+            .lock()
+            .unwrap()
+            .namespace
+            .clone()
+            .unwrap_or("default".to_string());
+
+        let prometheus_name = "kube-prometheues-kube-prometheus-operator";
+
+        client
+            .wait_until_deployment_ready(prometheus_name, Some(&namespace), None)
+            .await?;
+
+        Ok(Outcome::success(format!(
+            "deployment: {} ready in ns: {}",
+            prometheus_name, namespace
+        )))
+    }
+
+    fn get_name(&self) -> InterpretName {
+        todo!()
+    }
+
+    fn get_version(&self) -> Version {
+        todo!()
+    }
+
+    fn get_status(&self) -> InterpretStatus {
+        todo!()
+    }
+
+    fn get_children(&self) -> Vec<Id> {
+        todo!()
+    }
+}
@@ -0,0 +1,46 @@
+use kube::api::ObjectMeta;
+use serde::Serialize;
+
+use crate::{
+    interpret::Interpret,
+    modules::{
+        k8s::resource::K8sResourceScore,
+        monitoring::kube_prometheus::{
+            KubePrometheus,
+            crd::crd_prometheus_rules::{PrometheusRule, PrometheusRuleSpec, RuleGroup},
+        },
+    },
+    score::Score,
+    topology::{K8sclient, Topology, monitoring::AlertRule},
+};
+
+#[derive(Debug, Clone, Serialize)]
+pub struct KubePrometheusRuleScore {
+    pub sender: KubePrometheus,
+    pub rule: Box<dyn AlertRule<KubePrometheus>>,
+}
+
+impl<T: Topology + K8sclient> Score<T> for KubePrometheusRuleScore {
+    fn name(&self) -> String {
+        "KubePrometheusRuleScore".to_string()
+    }
+
+    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
+        let name = self.rule.name();
+        let namespace = self.sender.config.lock().unwrap().namespace.clone();
+        let groups: Vec<RuleGroup> =
+            serde_json::from_value(self.rule.build_rule().expect("failed to build alert rule"))
+                .expect("failed to deserialize rule groups");
+
+        let prometheus_rule = PrometheusRule {
+            metadata: ObjectMeta {
+                name: Some(name.clone()),
+                namespace: namespace.clone(),
+                ..Default::default()
+            },
+
+            spec: PrometheusRuleSpec { groups },
+        };
+        K8sResourceScore::single(prometheus_rule, namespace).create_interpret()
+    }
+}
@@ -0,0 +1,61 @@
+use kube::api::ObjectMeta;
+use serde::Serialize;
+
+use crate::{
+    interpret::Interpret,
+    modules::{
+        k8s::resource::K8sResourceScore,
+        monitoring::kube_prometheus::{
+            KubePrometheus,
+            crd::crd_scrape_config::{ScrapeConfig, ScrapeConfigSpec, StaticConfig},
+        },
+    },
+    score::Score,
+    topology::{K8sclient, Topology, monitoring::ScrapeTarget},
+};
+
+#[derive(Debug, Clone, Serialize)]
+pub struct KubePrometheusScrapeTargetScore {
+    pub sender: KubePrometheus,
+    pub scrape_target: Box<dyn ScrapeTarget<KubePrometheus>>,
+}
+
+impl<T: Topology + K8sclient> Score<T> for KubePrometheusScrapeTargetScore {
+    fn name(&self) -> String {
+        "KubePrometheusScrapeTargetScore".to_string()
+    }
+
+    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
+        let name = self.scrape_target.name();
+        let namespace = self.sender.config.lock().unwrap().namespace.clone();
+
+        let external_target = self
+            .scrape_target
+            .build_scrape_target()
+            .expect("failed to build external scrape target");
+
+        //TODO this may need to be modified to include a scrapeConfigSelector label from the
+        //prometheus operator
+        let labels = external_target.labels;
+
+        let scrape_target = ScrapeConfig {
+            metadata: ObjectMeta {
+                name: Some(name.clone()),
+                namespace: namespace.clone(),
+                ..Default::default()
+            },
+            spec: ScrapeConfigSpec {
+                static_configs: Some(vec![StaticConfig {
+                    targets: vec![format!("{}:{}", external_target.ip, external_target.port)],
+                    labels,
+                }]),
+                metrics_path: external_target.path,
+                scrape_interval: external_target.interval,
+                job_name: Some(name),
+                ..Default::default()
+            },
+        };
+
+        K8sResourceScore::single(scrape_target, namespace).create_interpret()
+    }
+}
@@ -3,7 +3,7 @@ use std::collections::{BTreeMap, HashMap};
|
|||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use schemars::JsonSchema;
|
use schemars::JsonSchema;
|
||||||
use serde::{Deserialize, Serialize};
|
use serde::{Deserialize, Serialize};
|
||||||
use serde_yaml::{Mapping, Sequence, Value};
|
use serde_yaml::Value;
|
||||||
|
|
||||||
use crate::modules::monitoring::alert_rule::prometheus_alert_rule::AlertManagerRuleGroup;
|
use crate::modules::monitoring::alert_rule::prometheus_alert_rule::AlertManagerRuleGroup;
|
||||||
|
|
||||||
@@ -12,36 +12,6 @@ pub trait AlertChannelConfig {
|
|||||||
async fn get_config(&self) -> AlertManagerChannelConfig;
|
async fn get_config(&self) -> AlertManagerChannelConfig;
|
||||||
}
|
}
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
|
||||||
pub struct AlertManagerValues {
|
|
||||||
pub alertmanager: AlertManager,
|
|
||||||
}
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
|
||||||
#[serde(rename_all = "camelCase")]
|
|
||||||
pub struct AlertManager {
|
|
||||||
pub enabled: bool,
|
|
||||||
pub config: AlertManagerConfig,
|
|
||||||
pub alertmanager_spec: AlertManagerSpec,
|
|
||||||
pub init_config_reloader: ConfigReloader,
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
|
||||||
pub struct ConfigReloader {
|
|
||||||
pub resources: Resources,
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
|
||||||
pub struct AlertManagerConfig {
|
|
||||||
pub global: Mapping,
|
|
||||||
pub route: AlertManagerRoute,
|
|
||||||
pub receivers: Sequence,
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
|
||||||
pub struct AlertManagerRoute {
|
|
||||||
pub routes: Sequence,
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
#[derive(Debug, Clone, Serialize)]
|
||||||
pub struct AlertManagerChannelConfig {
|
pub struct AlertManagerChannelConfig {
|
||||||
///expecting an option that contains two values
|
///expecting an option that contains two values
|
||||||
@@ -52,20 +22,6 @@ pub struct AlertManagerChannelConfig {
|
|||||||
pub channel_receiver: Value,
|
pub channel_receiver: Value,
|
||||||
}
|
}
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
|
||||||
#[serde(rename_all = "camelCase")]
|
|
||||||
pub struct AlertManagerSpec {
|
|
||||||
pub(crate) resources: Resources,
|
|
||||||
pub replicas: u32,
|
|
||||||
pub alert_manager_config_selector: AlertManagerConfigSelector,
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
|
||||||
#[serde(rename_all = "camelCase")]
|
|
||||||
pub struct AlertManagerConfigSelector {
|
|
||||||
pub match_labels: BTreeMap<String, String>,
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
#[derive(Debug, Clone, Serialize)]
|
||||||
pub struct Resources {
|
pub struct Resources {
|
||||||
pub limits: Limits,
|
pub limits: Limits,
|
||||||
|
|||||||
@@ -6,4 +6,5 @@ pub mod kube_prometheus;
|
|||||||
pub mod ntfy;
|
pub mod ntfy;
|
||||||
pub mod okd;
|
pub mod okd;
|
||||||
pub mod prometheus;
|
pub mod prometheus;
|
||||||
|
pub mod red_hat_cluster_observability;
|
||||||
pub mod scrape_target;
|
pub mod scrape_target;
|
||||||
|
|||||||
@@ -1,270 +0,0 @@
|
|||||||
use base64::prelude::*;
|
|
||||||
|
|
||||||
use async_trait::async_trait;
|
|
||||||
use harmony_types::id::Id;
|
|
||||||
use kube::api::DynamicObject;
|
|
||||||
use log::{debug, info, trace};
|
|
||||||
use serde::Serialize;
|
|
||||||
|
|
||||||
use crate::{
|
|
||||||
data::Version,
|
|
||||||
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
|
|
||||||
inventory::Inventory,
|
|
||||||
modules::monitoring::okd::OpenshiftClusterAlertSender,
|
|
||||||
score::Score,
|
|
||||||
topology::{K8sclient, Topology, oberservability::monitoring::AlertReceiver},
|
|
||||||
};
|
|
||||||
|
|
||||||
impl Clone for Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
|
|
||||||
fn clone(&self) -> Self {
|
|
||||||
self.clone_box()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
impl Serialize for Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
|
|
||||||
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
|
|
||||||
where
|
|
||||||
S: serde::Serializer,
|
|
||||||
{
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize)]
|
|
||||||
pub struct OpenshiftClusterAlertScore {
|
|
||||||
pub receivers: Vec<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>,
|
|
||||||
}
|
|
||||||
|
|
||||||
impl<T: Topology + K8sclient> Score<T> for OpenshiftClusterAlertScore {
|
|
||||||
fn name(&self) -> String {
|
|
||||||
"ClusterAlertScore".to_string()
|
|
||||||
}
|
|
||||||
|
|
||||||
#[doc(hidden)]
|
|
||||||
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
|
||||||
Box::new(OpenshiftClusterAlertInterpret {
|
|
||||||
receivers: self.receivers.clone(),
|
|
||||||
})
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Debug)]
|
|
||||||
pub struct OpenshiftClusterAlertInterpret {
|
|
||||||
receivers: Vec<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>,
|
|
||||||
}
|
|
||||||
|
|
||||||
#[async_trait]
|
|
||||||
impl<T: Topology + K8sclient> Interpret<T> for OpenshiftClusterAlertInterpret {
|
|
||||||
async fn execute(
|
|
||||||
&self,
|
|
||||||
_inventory: &Inventory,
|
|
||||||
topology: &T,
|
|
||||||
) -> Result<Outcome, InterpretError> {
|
|
||||||
let client = topology.k8s_client().await?;
|
|
||||||
let openshift_monitoring_namespace = "openshift-monitoring";
|
|
||||||
|
|
||||||
let mut alertmanager_main_secret: DynamicObject = client
|
|
||||||
.get_secret_json_value("alertmanager-main", Some(openshift_monitoring_namespace))
|
|
||||||
.await?;
|
|
||||||
trace!("Got secret {alertmanager_main_secret:#?}");
|
|
||||||
|
|
||||||
let data: &mut serde_json::Value = &mut alertmanager_main_secret.data;
|
|
||||||
trace!("Alertmanager-main secret data {data:#?}");
|
|
||||||
let data_obj = data
|
|
||||||
.get_mut("data")
|
|
||||||
.ok_or(InterpretError::new(
|
|
||||||
"Missing 'data' field in alertmanager-main secret.".to_string(),
|
|
||||||
))?
|
|
||||||
.as_object_mut()
|
|
||||||
.ok_or(InterpretError::new(
|
|
||||||
"'data' field in alertmanager-main secret is expected to be an object ."
|
|
||||||
.to_string(),
|
|
||||||
))?;
|
|
||||||
|
|
||||||
let config_b64 = data_obj
|
|
||||||
.get("alertmanager.yaml")
|
|
||||||
.ok_or(InterpretError::new(
|
|
||||||
"Missing 'alertmanager.yaml' in alertmanager-main secret data".to_string(),
|
|
||||||
))?
|
|
||||||
.as_str()
|
|
||||||
.unwrap_or("");
|
|
||||||
trace!("Config base64 {config_b64}");
|
|
||||||
|
|
||||||
let config_bytes = BASE64_STANDARD.decode(config_b64).unwrap_or_default();
|
|
||||||
|
|
||||||
let mut am_config: serde_yaml::Value =
|
|
||||||
serde_yaml::from_str(&String::from_utf8(config_bytes).unwrap_or_default())
|
|
||||||
.unwrap_or_default();
|
|
||||||
|
|
||||||
debug!("Current alertmanager config {am_config:#?}");
|
|
||||||
|
|
||||||
let existing_receivers_sequence = if let Some(receivers) = am_config.get_mut("receivers") {
|
|
||||||
match receivers.as_sequence_mut() {
|
|
||||||
Some(seq) => seq,
|
|
||||||
None => {
|
|
||||||
return Err(InterpretError::new(format!(
|
|
||||||
"Expected alertmanager config receivers to be a sequence, got {:?}",
|
|
||||||
receivers
|
|
||||||
)));
|
|
||||||
}
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
&mut serde_yaml::Sequence::default()
|
|
||||||
};
|
|
||||||
|
|
||||||
let mut additional_resources = vec![];
|
|
||||||
|
|
||||||
for custom_receiver in &self.receivers {
|
|
||||||
let name = custom_receiver.name();
|
|
||||||
let alertmanager_receiver = custom_receiver.as_alertmanager_receiver()?;
|
|
||||||
|
|
||||||
let receiver_json_value = alertmanager_receiver.receiver_config;
|
|
||||||
|
|
||||||
let receiver_yaml_string =
|
|
||||||
serde_json::to_string(&receiver_json_value).map_err(|e| {
|
|
||||||
InterpretError::new(format!("Failed to serialize receiver config: {}", e))
|
|
||||||
})?;
|
|
||||||
|
|
||||||
let receiver_yaml_value: serde_yaml::Value =
|
|
||||||
serde_yaml::from_str(&receiver_yaml_string).map_err(|e| {
|
|
||||||
InterpretError::new(format!("Failed to parse receiver config as YAML: {}", e))
|
|
||||||
})?;
|
|
||||||
|
|
||||||
if let Some(idx) = existing_receivers_sequence.iter().position(|r| {
|
|
||||||
r.get("name")
|
|
||||||
.and_then(|n| n.as_str())
|
|
||||||
.map_or(false, |n| n == name)
|
|
||||||
}) {
|
|
||||||
info!("Replacing existing AlertManager receiver: {}", name);
|
|
||||||
existing_receivers_sequence[idx] = receiver_yaml_value;
|
|
||||||
} else {
|
|
||||||
debug!("Adding new AlertManager receiver: {}", name);
|
|
||||||
existing_receivers_sequence.push(receiver_yaml_value);
|
|
||||||
}
|
|
||||||
|
|
||||||
additional_resources.push(alertmanager_receiver.additional_ressources);
|
|
||||||
}
|
|
||||||
|
|
||||||
let existing_route_mapping = if let Some(route) = am_config.get_mut("route") {
|
|
||||||
match route.as_mapping_mut() {
|
|
||||||
Some(map) => map,
|
|
||||||
None => {
|
|
||||||
return Err(InterpretError::new(format!(
|
|
||||||
"Expected alertmanager config route to be a mapping, got {:?}",
|
|
||||||
route
|
|
||||||
)));
|
|
||||||
}
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
&mut serde_yaml::Mapping::default()
|
|
||||||
};
|
|
||||||
|
|
||||||
let existing_route_sequence = if let Some(routes) = existing_route_mapping.get_mut("routes")
|
|
||||||
{
|
|
||||||
match routes.as_sequence_mut() {
|
|
||||||
Some(seq) => seq,
|
|
||||||
None => {
|
|
||||||
return Err(InterpretError::new(format!(
|
|
||||||
"Expected alertmanager config routes to be a sequence, got {:?}",
|
|
||||||
routes
|
|
||||||
)));
|
|
||||||
}
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
&mut serde_yaml::Sequence::default()
|
|
||||||
};
|
|
||||||
|
|
||||||
for custom_receiver in &self.receivers {
|
|
||||||
let name = custom_receiver.name();
|
|
||||||
let alertmanager_receiver = custom_receiver.as_alertmanager_receiver()?;
|
|
||||||
|
|
||||||
let route_json_value = alertmanager_receiver.route_config;
|
|
||||||
let route_yaml_string = serde_json::to_string(&route_json_value).map_err(|e| {
|
|
||||||
InterpretError::new(format!("Failed to serialize route config: {}", e))
|
|
||||||
})?;
|
|
||||||
|
|
||||||
let route_yaml_value: serde_yaml::Value = serde_yaml::from_str(&route_yaml_string)
|
|
||||||
.map_err(|e| {
|
|
||||||
InterpretError::new(format!("Failed to parse route config as YAML: {}", e))
|
|
||||||
})?;
|
|
||||||
|
|
||||||
if let Some(idy) = existing_route_sequence.iter().position(|r| {
|
|
||||||
r.get("receiver")
|
|
||||||
.and_then(|n| n.as_str())
|
|
||||||
.map_or(false, |n| n == name)
|
|
||||||
}) {
|
|
||||||
info!("Replacing existing AlertManager receiver: {}", name);
|
|
||||||
existing_route_sequence[idy] = route_yaml_value;
|
|
||||||
} else {
|
|
||||||
debug!("Adding new AlertManager receiver: {}", name);
|
|
||||||
existing_route_sequence.push(route_yaml_value);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
debug!("Current alertmanager config {am_config:#?}");
|
|
||||||
// TODO
|
|
||||||
// - save new version of alertmanager config
|
|
||||||
// - write additional ressources to the cluster
|
|
||||||
let am_config = serde_yaml::to_string(&am_config).map_err(|e| {
|
|
||||||
InterpretError::new(format!(
|
|
||||||
"Failed to serialize new alertmanager config to string : {e}"
|
|
||||||
))
|
|
||||||
})?;
|
|
||||||
|
|
||||||
let mut am_config_b64 = String::new();
|
|
||||||
BASE64_STANDARD.encode_string(am_config, &mut am_config_b64);
|
|
||||||
|
|
||||||
// TODO put update configmap value and save new value
|
|
||||||
data_obj.insert(
|
|
||||||
"alertmanager.yaml".to_string(),
|
|
||||||
serde_json::Value::String(am_config_b64),
|
|
||||||
);
|
|
||||||
|
|
||||||
// https://kubernetes.io/docs/reference/using-api/server-side-apply/#field-management
|
|
||||||
alertmanager_main_secret.metadata.managed_fields = None;
|
|
||||||
|
|
||||||
trace!("Applying new alertmanager_main_secret {alertmanager_main_secret:#?}");
|
|
||||||
client
|
|
||||||
.apply_dynamic(
|
|
||||||
&alertmanager_main_secret,
|
|
||||||
Some(openshift_monitoring_namespace),
|
|
||||||
true,
|
|
||||||
)
|
|
||||||
.await?;
|
|
||||||
|
|
||||||
let additional_resources = additional_resources.concat();
|
|
||||||
trace!("Applying additional ressources for alert receivers {additional_resources:#?}");
|
|
||||||
client
|
|
||||||
.apply_dynamic_many(
|
|
||||||
&additional_resources,
|
|
||||||
Some(openshift_monitoring_namespace),
|
|
||||||
true,
|
|
||||||
)
|
|
||||||
.await?;
|
|
||||||
|
|
||||||
Ok(Outcome::success(format!(
|
|
||||||
"Successfully configured {} cluster alert receivers: {}",
|
|
||||||
self.receivers.len(),
|
|
||||||
self.receivers
|
|
||||||
.iter()
|
|
||||||
.map(|r| r.name())
|
|
||||||
.collect::<Vec<_>>()
|
|
||||||
.join(", ")
|
|
||||||
)))
|
|
||||||
}
|
|
||||||
|
|
||||||
fn get_name(&self) -> InterpretName {
|
|
||||||
InterpretName::Custom("OpenshiftClusterAlertInterpret")
|
|
||||||
}
|
|
||||||
|
|
||||||
fn get_version(&self) -> Version {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
|
|
||||||
fn get_status(&self) -> InterpretStatus {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
|
|
||||||
fn get_children(&self) -> Vec<Id> {
|
|
||||||
todo!()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
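Review note: the removed interpret above merges each custom receiver into the existing `receivers` list by name, replacing a matching entry and appending otherwise. A minimal, standard-library-only sketch of that replace-or-append step follows. The `Receiver` struct and the `upsert_by_name` function are hypothetical stand-ins introduced for illustration; the removed code operates on `serde_yaml::Value` items instead. Note also that when the `receivers` key is absent, the removed code appears to push into a temporary sequence that is never attached to `am_config`, so this sketch takes the target list by mutable reference to make ownership explicit.

```rust
/// Hypothetical stand-in for one Alertmanager receiver entry.
/// The removed code uses `serde_yaml::Value` here; this type is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct Receiver {
    pub name: String,
    pub config: String,
}

/// Replace the entry whose `name` matches `incoming`, or append it if none does.
/// Returns `true` when an existing entry was replaced, `false` when appended.
pub fn upsert_by_name(existing: &mut Vec<Receiver>, incoming: Receiver) -> bool {
    match existing.iter().position(|r| r.name == incoming.name) {
        Some(idx) => {
            // Same name already present: overwrite in place, keeping its position.
            existing[idx] = incoming;
            true
        }
        None => {
            // No entry with this name yet: append at the end.
            existing.push(incoming);
            false
        }
    }
}
```

Because the caller owns the `Vec`, the merged list can be written back into the configuration unconditionally, whether or not the key existed beforehand.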
58
harmony/src/modules/monitoring/okd/crd/alerting_rules.rs
Normal file
@@ -0,0 +1,58 @@
+use std::collections::BTreeMap;
+
+use kube::CustomResource;
+use schemars::JsonSchema;
+use serde::{Deserialize, Serialize};
+
+use crate::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;
+
+#[derive(CustomResource, Debug, Serialize, Deserialize, Clone, JsonSchema, Default)]
+#[kube(
+    group = "monitoring.openshift.io",
+    version = "v1",
+    kind = "AlertingRule",
+    plural = "alertingrules",
+    namespaced,
+    derive = "Default"
+)]
+#[serde(rename_all = "camelCase")]
+pub struct AlertingRuleSpec {
+    pub groups: serde_json::Value,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
+pub struct RuleGroup {
+    pub name: String,
+    pub rules: Vec<Rule>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
+#[serde(rename_all = "camelCase")]
+pub struct Rule {
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub alert: Option<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub expr: Option<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub for_: Option<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub labels: Option<std::collections::BTreeMap<String, String>>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub annotations: Option<std::collections::BTreeMap<String, String>>,
+}
+
+impl From<PrometheusAlertRule> for Rule {
+    fn from(value: PrometheusAlertRule) -> Self {
+        Rule {
+            alert: Some(value.alert),
+            expr: Some(value.expr),
+            for_: value.r#for,
+            labels: Some(value.labels.into_iter().collect::<BTreeMap<_, _>>()),
+            annotations: Some(value.annotations.into_iter().collect::<BTreeMap<_, _>>()),
+        }
+    }
+}
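Review note: the `From<PrometheusAlertRule>` conversion in this file collects `labels` and `annotations` into `BTreeMap`s. A small sketch of what that buys is shown below, assuming the source collection is a `HashMap<String, String>` (the actual field type of `PrometheusAlertRule` is not visible in this diff, and `to_sorted_labels` is a hypothetical helper name). A `BTreeMap` iterates in key order, so the serialized rule is likely to be stable from run to run.

```rust
use std::collections::{BTreeMap, HashMap};

/// Collect label pairs into a `BTreeMap` so that iteration (and therefore
/// serialization) visits keys in sorted order.
/// The `HashMap` input type is an assumption made for this sketch.
pub fn to_sorted_labels(labels: HashMap<String, String>) -> BTreeMap<String, String> {
    labels.into_iter().collect()
}
```

Stable ordering matters mostly for diffing applied manifests; the semantics of the labels themselves do not depend on it.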
3
harmony/src/modules/monitoring/okd/crd/mod.rs
Normal file
@@ -0,0 +1,3 @@
+pub mod alerting_rules;
+pub mod scrape_target;
+pub mod service_monitor;
72
harmony/src/modules/monitoring/okd/crd/scrape_target.rs
Normal file
@@ -0,0 +1,72 @@
+use kube::CustomResource;
+use schemars::JsonSchema;
+use serde::{Deserialize, Serialize};
+use std::collections::BTreeMap;
+
+#[derive(CustomResource, Debug, Serialize, Deserialize, Clone, JsonSchema, Default)]
+#[kube(
+    group = "monitoring.coreos.com",
+    version = "v1alpha1",
+    kind = "ScrapeConfig",
+    plural = "scrapeconfigs",
+    namespaced,
+    derive = "Default"
+)]
+#[serde(rename_all = "camelCase")]
+pub struct ScrapeConfigSpec {
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub job_name: Option<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub metrics_path: Option<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub scrape_interval: Option<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub scrape_timeout: Option<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub scheme: Option<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub static_configs: Option<Vec<StaticConfig>>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub relabelings: Option<Vec<RelabelConfig>>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub metric_relabelings: Option<Vec<RelabelConfig>>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Default)]
+#[serde(rename_all = "camelCase")]
+pub struct StaticConfig {
+    /// targets: ["host:port"]
+    pub targets: Vec<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub labels: Option<BTreeMap<String, String>>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Default)]
+#[serde(rename_all = "camelCase")]
+pub struct RelabelConfig {
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub source_labels: Option<Vec<String>>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub separator: Option<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub target_label: Option<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub replacement: Option<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub action: Option<String>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub regex: Option<String>,
+}
97
harmony/src/modules/monitoring/okd/crd/service_monitor.rs
Normal file
@@ -0,0 +1,97 @@
+use kube::CustomResource;
+use schemars::JsonSchema;
+use serde::{Deserialize, Serialize};
+use std::collections::BTreeMap;
+
+#[derive(CustomResource, Debug, Serialize, Deserialize, Clone, JsonSchema, Default)]
+#[kube(
+    group = "monitoring.coreos.com",
+    version = "v1",
+    kind = "ServiceMonitor",
+    plural = "servicemonitors",
+    namespaced,
+    derive = "Default"
+)]
+#[serde(rename_all = "camelCase")]
+pub struct ServiceMonitorSpec {
+    /// The label to use to retrieve the job name from.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub job_label: Option<String>,
+
+    /// TargetLabels transfers labels on the Kubernetes Service onto the target.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub target_labels: Option<Vec<String>>,
+
+    /// PodTargetLabels transfers labels on the Kubernetes Pod onto the target.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub pod_target_labels: Option<Vec<String>>,
+
+    /// A list of endpoints allowed as part of this ServiceMonitor.
+    pub endpoints: Vec<Endpoint>,
+
+    /// Selector to select Endpoints objects.
+    pub selector: LabelSelector,
+
+    /// Selector to select which namespaces the Kubernetes Endpoints objects are discovered from.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub namespace_selector: Option<NamespaceSelector>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Default)]
+#[serde(rename_all = "camelCase")]
+pub struct Endpoint {
+    /// Name of the service port this endpoint refers to.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub port: Option<String>,
+
+    /// HTTP path to scrape for metrics.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub path: Option<String>,
+
+    /// HTTP scheme to use for scraping.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub scheme: Option<String>,
+
+    /// Interval at which metrics should be scraped.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub interval: Option<String>,
+
+    /// Timeout after which the scrape is ended.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub scrape_timeout: Option<String>,
+
+    /// HonorLabels chooses the metric's labels on collisions with target labels.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub honor_labels: Option<bool>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Default)]
+#[serde(rename_all = "camelCase")]
+pub struct LabelSelector {
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub match_labels: Option<BTreeMap<String, String>>,
+
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub match_expressions: Option<Vec<LabelSelectorRequirement>>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Default)]
+#[serde(rename_all = "camelCase")]
+pub struct LabelSelectorRequirement {
+    pub key: String,
+    pub operator: String,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub values: Option<Vec<String>>,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Default)]
+#[serde(rename_all = "camelCase")]
+pub struct NamespaceSelector {
+    /// Boolean describing whether all namespaces are selected in contrast to a list restricting them.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub any: Option<bool>,
+
+    /// List of namespace names.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub match_names: Option<Vec<String>>,
+}
@@ -1,60 +0,0 @@
-use crate::{
-    data::Version,
-    interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
-    inventory::Inventory,
-    modules::monitoring::okd::config::Config,
-    score::Score,
-    topology::{K8sclient, Topology},
-};
-use async_trait::async_trait;
-use harmony_types::id::Id;
-use serde::Serialize;
-
-#[derive(Clone, Debug, Serialize)]
-pub struct OpenshiftUserWorkloadMonitoring {}
-
-impl<T: Topology + K8sclient> Score<T> for OpenshiftUserWorkloadMonitoring {
-    fn name(&self) -> String {
-        "OpenshiftUserWorkloadMonitoringScore".to_string()
-    }
-
-    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
-        Box::new(OpenshiftUserWorkloadMonitoringInterpret {})
-    }
-}
-
-#[derive(Clone, Debug, Serialize)]
-pub struct OpenshiftUserWorkloadMonitoringInterpret {}
-
-#[async_trait]
-impl<T: Topology + K8sclient> Interpret<T> for OpenshiftUserWorkloadMonitoringInterpret {
-    async fn execute(
-        &self,
-        _inventory: &Inventory,
-        topology: &T,
-    ) -> Result<Outcome, InterpretError> {
-        let client = topology.k8s_client().await.unwrap();
-        Config::create_cluster_monitoring_config_cm(&client).await?;
-        Config::create_user_workload_monitoring_config_cm(&client).await?;
-        Config::verify_user_workload(&client).await?;
-        Ok(Outcome::success(
-            "successfully enabled user-workload-monitoring".to_string(),
-        ))
-    }
-
-    fn get_name(&self) -> InterpretName {
-        InterpretName::Custom("OpenshiftUserWorkloadMonitoring")
-    }
-
-    fn get_version(&self) -> Version {
-        todo!()
-    }
-
-    fn get_status(&self) -> InterpretStatus {
-        todo!()
-    }
-
-    fn get_children(&self) -> Vec<Id> {
-        todo!()
-    }
-}
@@ -1,10 +1,17 @@
-use crate::topology::oberservability::monitoring::AlertSender;
+use serde::Serialize;
 
-pub mod cluster_monitoring;
-pub(crate) mod config;
-pub mod enable_user_workload;
+use crate::topology::monitoring::{AlertReceiver, AlertRule, AlertSender, ScrapeTarget};
 
-#[derive(Debug)]
+pub mod crd;
+pub mod openshift_cluster_alerting_score;
+pub mod score_enable_cluster_monitoring;
+pub mod score_openshift_alert_rule;
+pub mod score_openshift_receiver;
+pub mod score_openshift_scrape_target;
+pub mod score_user_workload;
+pub mod score_verify_user_workload_monitoring;
+
+#[derive(Debug, Clone, Serialize)]
 pub struct OpenshiftClusterAlertSender;
 
 impl AlertSender for OpenshiftClusterAlertSender {
@@ -12,3 +19,30 @@ impl AlertSender for OpenshiftClusterAlertSender {
         "OpenshiftClusterAlertSender".to_string()
     }
 }
+
+impl Serialize for Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
+    fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
+    where
+        S: serde::Serializer,
+    {
+        todo!()
+    }
+}
+
+impl Serialize for Box<dyn AlertRule<OpenshiftClusterAlertSender>> {
+    fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
+    where
+        S: serde::Serializer,
+    {
+        todo!()
+    }
+}
+
+impl Serialize for Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>> {
+    fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
+    where
+        S: serde::Serializer,
+    {
+        todo!()
+    }
+}
|
|||||||
@@ -0,0 +1,36 @@
+use serde::Serialize;
+
+use crate::{
+    interpret::Interpret,
+    modules::monitoring::okd::OpenshiftClusterAlertSender,
+    score::Score,
+    topology::{
+        Topology,
+        monitoring::{AlertReceiver, AlertRule, AlertingInterpret, Observability, ScrapeTarget},
+    },
+};
+
+#[derive(Debug, Clone, Serialize)]
+pub struct OpenshiftClusterAlertScore {
+    pub sender: OpenshiftClusterAlertSender,
+    pub receivers: Vec<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>,
+    pub rules: Vec<Box<dyn AlertRule<OpenshiftClusterAlertSender>>>,
+    pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>>>>,
+}
+
+impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T>
+    for OpenshiftClusterAlertScore
+{
+    fn name(&self) -> String {
+        "OpenshiftClusterAlertScore".to_string()
+    }
+
+    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
+        Box::new(AlertingInterpret {
+            sender: OpenshiftClusterAlertSender,
+            receivers: self.receivers.clone(),
+            rules: self.rules.clone(),
+            scrape_targets: self.scrape_targets.clone(),
+        })
+    }
+}
@@ -0,0 +1,150 @@
|
|||||||
|
use std::{collections::BTreeMap, sync::Arc};
|
||||||
|
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use harmony_k8s::K8sClient;
|
||||||
|
use harmony_types::id::Id;
|
||||||
|
use k8s_openapi::api::core::v1::ConfigMap;
|
||||||
|
use kube::api::{GroupVersionKind, ObjectMeta};
|
||||||
|
use log::debug;
|
||||||
|
use serde::Serialize;
|
||||||
|
|
||||||
|
use crate::{
|
||||||
|
data::Version,
|
||||||
|
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
|
||||||
|
inventory::Inventory,
|
||||||
|
modules::k8s::resource::K8sResourceScore,
|
||||||
|
score::Score,
|
||||||
|
topology::{K8sclient, Topology},
|
||||||
|
};
|
||||||
|
|
||||||
|
#[derive(Clone, Debug, Serialize)]
|
||||||
|
pub struct OpenshiftEnableClusterMonitoringScore {}
|
||||||
|
|
||||||
|
impl<T: Topology + K8sclient> Score<T> for OpenshiftEnableClusterMonitoringScore {
|
||||||
|
fn name(&self) -> String {
|
||||||
|
"OpenshiftClusterMonitoringScore".to_string()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
|
||||||
|
Box::new(OpenshiftEnableClusterMonitoringInterpret {})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Clone, Debug, Serialize)]
|
||||||
|
pub struct OpenshiftEnableClusterMonitoringInterpret {}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl<T: Topology + K8sclient> Interpret<T> for OpenshiftEnableClusterMonitoringInterpret {
|
||||||
|
async fn execute(
|
||||||
|
&self,
|
||||||
|
inventory: &Inventory,
|
||||||
|
topology: &T,
|
||||||
|
) -> Result<Outcome, InterpretError> {
|
||||||
|
let namespace = "openshift-monitoring".to_string();
|
||||||
|
let name = "cluster-monitoring-config".to_string();
|
||||||
|
let client = topology.k8s_client().await?;
|
||||||
|
let enabled = self
|
||||||
|
.check_cluster_monitoring_enabled(client, &name, &namespace)
|
||||||
|
.await
|
||||||
|
.map_err(|e| InterpretError::new(e))?;
|
||||||
|
|
||||||
|
debug!("enabled {:#?}", enabled);
|
||||||
|
|
||||||
|
match enabled {
|
||||||
|
true => Ok(Outcome::success(
|
||||||
|
"Openshift Cluster Monitoring already enabled".to_string(),
|
||||||
|
)),
|
||||||
|
false => {
|
||||||
|
let mut data = BTreeMap::new();
|
||||||
|
data.insert(
|
||||||
|
"config.yaml".to_string(),
|
||||||
|
r#"
|
||||||
|
enableUserWorkload: true
|
||||||
|
alertmanagerMain:
|
||||||
|
enableUserAlertmanagerConfig: true
|
||||||
|
"#
|
||||||
|
.to_string(),
|
||||||
|
);
|
||||||
|
|
||||||
|
let cm = ConfigMap {
|
||||||
|
metadata: ObjectMeta {
|
||||||
|
name: Some(name),
|
||||||
|
namespace: Some(namespace.clone()),
|
||||||
|
..Default::default()
|
||||||
|
},
|
||||||
|
data: Some(data),
|
||||||
|
..Default::default()
|
||||||
|
};
|
||||||
|
K8sResourceScore::single(cm, Some(namespace))
|
||||||
|
.create_interpret()
|
||||||
|
.execute(inventory, topology)
|
||||||
|
.await?;
|
||||||
|
|
||||||
|
Ok(Outcome::success(
|
||||||
|
"Successfully enabled Openshift Cluster Monitoring".to_string(),
|
||||||
|
))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn get_name(&self) -> InterpretName {
|
||||||
|
InterpretName::Custom("OpenshiftEnableClusterMonitoringInterpret")
|
||||||
|
}
|
||||||
|
|
||||||
|
fn get_version(&self) -> Version {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn get_status(&self) -> InterpretStatus {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn get_children(&self) -> Vec<Id> {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl OpenshiftEnableClusterMonitoringInterpret {
|
||||||
|
async fn check_cluster_monitoring_enabled(
|
||||||
|
&self,
|
||||||
|
client: Arc<K8sClient>,
|
||||||
|
name: &str,
|
||||||
|
namespace: &str,
|
||||||
|
) -> Result<bool, String> {
|
||||||
|
let gvk = GroupVersionKind {
|
||||||
|
group: "".to_string(),
|
||||||
|
version: "v1".to_string(),
|
||||||
|
kind: "ConfigMap".to_string(),
|
||||||
|
};
|
||||||
|
|
||||||
|
let cm = match client
|
||||||
|
.get_resource_json_value(name, Some(namespace), &gvk)
|
||||||
|
.await
|
||||||
|
{
|
||||||
|
Ok(obj) => obj,
|
||||||
|
Err(_) => return Ok(false),
|
||||||
|
};
|
||||||
|
|
||||||
|
debug!("{:#?}", cm.data.pointer("/data/config.yaml"));
|
||||||
|
let config_yaml_str = match cm
|
||||||
|
.data
|
||||||
|
.pointer("/data/config.yaml")
|
||||||
|
.and_then(|v| v.as_str())
|
||||||
|
{
|
||||||
|
Some(s) => s,
|
||||||
|
None => return Ok(false),
|
||||||
|
};
|
||||||
|
|
||||||
|
debug!("{:#?}", config_yaml_str);
|
||||||
|
let parsed_config: serde_yaml::Value = serde_yaml::from_str(config_yaml_str)
|
||||||
|
.map_err(|e| format!("Failed to parse nested YAML: {}", e))?;
|
||||||
|
|
||||||
|
let enabled = parsed_config
|
||||||
|
.get("enableUserWorkload")
|
||||||
|
.and_then(|v| v.as_bool())
|
||||||
|
.unwrap_or(false);
|
||||||
|
|
||||||
|
debug!("{:#?}", enabled);
|
||||||
|
Ok(enabled)
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,42 @@
+use kube::api::ObjectMeta;
+use serde::Serialize;
+
+use crate::{
+    interpret::Interpret,
+    modules::{
+        k8s::resource::K8sResourceScore,
+        monitoring::okd::{
+            OpenshiftClusterAlertSender,
+            crd::alerting_rules::{AlertingRule, AlertingRuleSpec},
+        },
+    },
+    score::Score,
+    topology::{K8sclient, Topology, monitoring::AlertRule},
+};
+
+#[derive(Debug, Clone, Serialize)]
+pub struct OpenshiftAlertRuleScore {
+    pub rule: Box<dyn AlertRule<OpenshiftClusterAlertSender>>,
+}
+
+impl<T: Topology + K8sclient> Score<T> for OpenshiftAlertRuleScore {
+    fn name(&self) -> String {
+        "OpenshiftAlertingRuleScore".to_string()
+    }
+
+    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
+        let namespace = "openshift-monitoring".to_string();
+        let alerting_rule = AlertingRule {
+            metadata: ObjectMeta {
+                name: Some(self.rule.name()),
+                namespace: Some(namespace.clone()),
+                ..Default::default()
+            },
+            spec: AlertingRuleSpec {
+                groups: self.rule.build_rule().unwrap(),
+            },
+        };
+
+        K8sResourceScore::single(alerting_rule, Some(namespace)).create_interpret()
+    }
+}
213
harmony/src/modules/monitoring/okd/score_openshift_receiver.rs
Normal file
@@ -0,0 +1,213 @@
use async_trait::async_trait;
use base64::{Engine as _, prelude::BASE64_STANDARD};
use harmony_types::id::Id;
use kube::api::DynamicObject;
use serde::Serialize;

use crate::{
    data::Version,
    interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
    inventory::Inventory,
    modules::monitoring::okd::OpenshiftClusterAlertSender,
    score::Score,
    topology::{
        K8sclient, Topology,
        monitoring::{AlertReceiver, AlertRoute, MatchOp},
    },
};

#[derive(Debug, Clone, Serialize)]
pub struct OpenshiftReceiverScore {
    pub receiver: Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>,
}

impl<T: Topology + K8sclient> Score<T> for OpenshiftReceiverScore {
    fn name(&self) -> String {
        "OpenshiftAlertReceiverScore".to_string()
    }

    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(OpenshiftReceiverInterpret {
            receiver: self.receiver.clone(),
        })
    }
}

#[derive(Debug)]
pub struct OpenshiftReceiverInterpret {
    receiver: Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>,
}

#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for OpenshiftReceiverInterpret {
    async fn execute(
        &self,
        _inventory: &Inventory,
        topology: &T,
    ) -> Result<Outcome, InterpretError> {
        let client = topology.k8s_client().await?;
        let ns = "openshift-monitoring";

        let mut am_secret: DynamicObject = client
            .get_secret_json_value("alertmanager-main", Some(ns))
            .await?;

        let data = am_secret
            .data
            .get_mut("data")
            .ok_or_else(|| {
                InterpretError::new("Missing 'data' field in alertmanager-main secret".into())
            })?
            .as_object_mut()
            .ok_or_else(|| InterpretError::new("'data' field must be a JSON object".into()))?;

        let config_b64 = data
            .get("alertmanager.yaml")
            .and_then(|v| v.as_str())
            .unwrap_or_default();

        let config_bytes = BASE64_STANDARD.decode(config_b64).unwrap_or_default();

        let mut am_config: serde_yaml::Value = serde_yaml::from_slice(&config_bytes)
            .unwrap_or_else(|_| serde_yaml::Value::Mapping(serde_yaml::Mapping::new()));

        let name = self.receiver.name();
        let install_plan = self.receiver.build().expect("failed to build install plan");
        let receiver = install_plan.receiver.expect("unable to find receiver path");

        let alert_route = install_plan
            .route
            .ok_or_else(|| InterpretError::new("missing route".into()))?;

        let route = self.serialize_route(&alert_route);

        let route = serde_yaml::to_value(route).map_err(|e| InterpretError::new(e.to_string()))?;

        if am_config.get("receivers").is_none() {
            am_config["receivers"] = serde_yaml::Value::Sequence(vec![]);
        }
        if am_config.get("route").is_none() {
            am_config["route"] = serde_yaml::Value::Mapping(serde_yaml::Mapping::new());
        }
        if am_config["route"].get("routes").is_none() {
            am_config["route"]["routes"] = serde_yaml::Value::Sequence(vec![]);
        }

        {
            let receivers_seq = am_config["receivers"].as_sequence_mut().unwrap();
            if let Some(idx) = receivers_seq
                .iter()
                .position(|r| r.get("name").and_then(|n| n.as_str()) == Some(&name))
            {
                receivers_seq[idx] = receiver;
            } else {
                receivers_seq.push(receiver);
            }
        }

        {
            let route_seq = am_config["route"]["routes"].as_sequence_mut().unwrap();
            if let Some(idx) = route_seq
                .iter()
                .position(|r| r.get("receiver").and_then(|n| n.as_str()) == Some(&name))
            {
                route_seq[idx] = route;
            } else {
                route_seq.push(route);
            }
        }

        let yaml_str =
            serde_yaml::to_string(&am_config).map_err(|e| InterpretError::new(e.to_string()))?;

        let mut yaml_b64 = String::new();

        BASE64_STANDARD.encode_string(yaml_str, &mut yaml_b64);
        data.insert(
            "alertmanager.yaml".to_string(),
            serde_json::Value::String(yaml_b64),
        );
        am_secret.metadata.managed_fields = None;

        client.apply_dynamic(&am_secret, Some(ns), true).await?;

        Ok(Outcome::success(format!(
            "Configured OpenShift cluster alert receiver: {}",
            name
        )))
    }

    fn get_name(&self) -> InterpretName {
        InterpretName::Custom("OpenshiftAlertReceiverInterpret")
    }

    fn get_version(&self) -> Version {
        todo!()
    }

    fn get_status(&self) -> InterpretStatus {
        todo!()
    }

    fn get_children(&self) -> Vec<Id> {
        todo!()
    }
}
impl OpenshiftReceiverInterpret {
    fn serialize_route(&self, route: &AlertRoute) -> serde_yaml::Value {
        // Convert matchers
        let matchers: Vec<String> = route
            .matchers
            .iter()
            .map(|m| match m.operator {
                MatchOp::Eq => format!("{} = {}", m.label, m.value),
                MatchOp::NotEq => format!("{} != {}", m.label, m.value),
                MatchOp::Regex => format!("{} =~ {}", m.label, m.value),
            })
            .collect();

        // Recursively convert children routes
        let children: Vec<serde_yaml::Value> = route
            .children
            .iter()
            .map(|c| self.serialize_route(c))
            .collect();

        // Build the YAML object for this route
        let mut route_map = serde_yaml::Mapping::new();
        route_map.insert(
            serde_yaml::Value::String("receiver".to_string()),
            serde_yaml::Value::String(route.receiver.clone()),
        );
        if !matchers.is_empty() {
            route_map.insert(
                serde_yaml::Value::String("matchers".to_string()),
                serde_yaml::to_value(matchers).unwrap(),
            );
        }
        if !route.group_by.is_empty() {
            route_map.insert(
                serde_yaml::Value::String("group_by".to_string()),
                serde_yaml::to_value(route.group_by.clone()).unwrap(),
            );
        }
        if let Some(ref interval) = route.repeat_interval {
            route_map.insert(
                serde_yaml::Value::String("repeat_interval".to_string()),
                serde_yaml::Value::String(interval.clone()),
            );
        }
        route_map.insert(
            serde_yaml::Value::String("continue".to_string()),
            serde_yaml::Value::Bool(route.continue_matching),
        );
        if !children.is_empty() {
            route_map.insert(
                serde_yaml::Value::String("routes".to_string()),
                serde_yaml::Value::Sequence(children),
            );
        }

        serde_yaml::Value::Mapping(route_map)
    }
}
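The receiver and route updates in `OpenshiftReceiverInterpret::execute` both follow a replace-or-append pattern keyed on a name, which is what keeps re-running the interpret from duplicating entries. The following is a minimal, dependency-free sketch of that pattern. The `Receiver` struct, its `url` field, and the `upsert_receiver` helper are hypothetical stand-ins introduced for illustration; the code in the diff operates on `serde_yaml::Value` items instead.

```rust
/// Hypothetical stand-in for one entry of the Alertmanager `receivers` list.
/// The diff above stores these as `serde_yaml::Value`; a plain struct keeps
/// this sketch free of external crates.
#[derive(Debug, Clone, PartialEq)]
pub struct Receiver {
    pub name: String,
    pub url: String,
}

/// Replace the entry whose name matches `new.name`, or append `new` when no
/// entry with that name exists yet. Calling this twice with the same input
/// leaves the list unchanged after the first call.
pub fn upsert_receiver(receivers: &mut Vec<Receiver>, new: Receiver) {
    match receivers.iter().position(|r| r.name == new.name) {
        Some(idx) => receivers[idx] = new,
        None => receivers.push(new),
    }
}
```

Calling `upsert_receiver` with a name already in the list overwrites that entry in place, so the list length only grows when a new name is introduced.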
@@ -0,0 +1,188 @@
use std::collections::BTreeMap;

use async_trait::async_trait;
use harmony_types::id::Id;
use k8s_openapi::{
    api::core::v1::{
        EndpointAddress, EndpointPort, EndpointSubset, Endpoints, Service, ServicePort, ServiceSpec,
    },
    apimachinery::pkg::util::intstr::IntOrString,
};
use kube::api::ObjectMeta;
use serde::Serialize;

use crate::{
    data::Version,
    interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
    inventory::Inventory,
    modules::{
        k8s::resource::K8sResourceScore,
        monitoring::okd::{
            OpenshiftClusterAlertSender,
            crd::service_monitor::{Endpoint, LabelSelector, ServiceMonitor, ServiceMonitorSpec},
        },
    },
    score::Score,
    topology::{
        K8sclient, Topology,
        monitoring::{ExternalScrapeTarget, ScrapeTarget},
    },
};

#[derive(Debug, Clone, Serialize)]
pub struct OpenshiftScrapeTargetScore {
    pub scrape_target: Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>>,
}

impl<T: Topology + K8sclient> Score<T> for OpenshiftScrapeTargetScore {
    fn name(&self) -> String {
        "OpenshiftScrapeTargetScore".to_string()
    }

    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(OpenshiftScrapeTargetInterpret {
            scrape_target: self.scrape_target.clone(),
        })
    }
}

#[derive(Debug, Clone, Serialize)]
pub struct OpenshiftScrapeTargetInterpret {
    scrape_target: Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>>,
}

#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for OpenshiftScrapeTargetInterpret {
    async fn execute(
        &self,
        inventory: &Inventory,
        topology: &T,
    ) -> Result<Outcome, InterpretError> {
        let namespace = "openshift-monitoring".to_string();
        let name = self.scrape_target.name();
        let external_target = self
            .scrape_target
            .build_scrape_target()
            .expect("failed to build scrape target ExternalScrapeTarget");

        let (service, endpoints, service_monitor) =
            self.to_k8s_resources(&name, &namespace, external_target);

        K8sResourceScore::single(service, Some(namespace.clone()))
            .create_interpret()
            .execute(inventory, topology)
            .await?;

        K8sResourceScore::single(endpoints, Some(namespace.clone()))
            .create_interpret()
            .execute(inventory, topology)
            .await?;

        K8sResourceScore::single(service_monitor, Some(namespace.clone()))
            .create_interpret()
            .execute(inventory, topology)
            .await?;

        Ok(Outcome::success(
            "Installed scrape target of Openshift".to_string(),
        ))
    }

    fn get_name(&self) -> InterpretName {
        InterpretName::Custom("OpenshiftScrapeTargetInterpret")
    }

    fn get_version(&self) -> Version {
        todo!()
    }

    fn get_status(&self) -> InterpretStatus {
        todo!()
    }

    fn get_children(&self) -> Vec<Id> {
        todo!()
    }
}

impl OpenshiftScrapeTargetInterpret {
    /// Maps the generic intent into the 3 required Kubernetes objects
    pub fn to_k8s_resources(
        &self,
        name: &str,
        namespace: &str,
        external_target: ExternalScrapeTarget,
    ) -> (Service, Endpoints, ServiceMonitor) {
        let mut labels = external_target.labels.clone().unwrap_or_default();

        labels.insert("harmony/target-name".to_string(), name.to_string());

        let service = Service {
            metadata: ObjectMeta {
                name: Some(name.to_string()),
                namespace: Some(namespace.to_string()),
                labels: Some(labels.clone()),
                ..Default::default()
            },
            spec: Some(ServiceSpec {
                cluster_ip: Some("None".to_string()), // Headless
                ports: Some(vec![ServicePort {
                    name: Some("metrics".to_string()),
                    port: external_target.port,
                    target_port: Some(IntOrString::Int(external_target.port)),
                    ..Default::default()
                }]),
                ..Default::default()
            }),
            ..Default::default()
        };

        let endpoints = Endpoints {
            metadata: ObjectMeta {
                name: Some(name.to_string()),
                namespace: Some(namespace.to_string()),
                labels: Some(labels.clone()),
                ..Default::default()
            },
            subsets: Some(vec![EndpointSubset {
                addresses: Some(vec![EndpointAddress {
                    ip: external_target.ip.to_string(),
                    ..Default::default()
                }]),
                ports: Some(vec![EndpointPort {
                    name: Some("metrics".to_string()),
                    port: external_target.port,
                    ..Default::default()
                }]),
                ..Default::default()
            }]),
        };

        let service_monitor = ServiceMonitor {
            metadata: ObjectMeta {
                name: Some(name.to_string()),
                namespace: Some(namespace.to_string()),
                ..Default::default()
            },
            spec: ServiceMonitorSpec {
                job_label: Some("harmony/target-name".to_string()),
                endpoints: vec![Endpoint {
                    port: Some("metrics".to_string()),
                    interval: external_target.interval.clone(),
                    path: external_target.path.clone(),
                    ..Default::default()
                }],
                selector: LabelSelector {
                    match_labels: Some(BTreeMap::from([(
                        "harmony/target-name".to_string(),
                        name.to_string(),
                    )])),
                    ..Default::default()
                },
                ..Default::default()
            },
        };

        (service, endpoints, service_monitor)
    }
}
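In `to_k8s_resources`, the `ServiceMonitor` finds its `Service` through the shared `harmony/target-name` label: the Service carries the label, and the monitor's `match_labels` selects on it. The sketch below illustrates that relationship with plain maps, assuming the usual Kubernetes `matchLabels` semantics (every selector pair must be present on the object). The helper names `target_labels` and `selector_matches` are hypothetical and do not appear in the diff.

```rust
use std::collections::BTreeMap;

/// Build the label set for a scrape target: any caller-supplied labels plus
/// the `harmony/target-name` marker (hypothetical helper mirroring the diff).
pub fn target_labels(
    name: &str,
    extra: Option<BTreeMap<String, String>>,
) -> BTreeMap<String, String> {
    let mut labels = extra.unwrap_or_default();
    labels.insert("harmony/target-name".to_string(), name.to_string());
    labels
}

/// Returns true when every key/value pair of `selector` is present in
/// `labels`, which is how a `matchLabels` selector is evaluated.
pub fn selector_matches(
    selector: &BTreeMap<String, String>,
    labels: &BTreeMap<String, String>,
) -> bool {
    selector.iter().all(|(k, v)| labels.get(k) == Some(v))
}
```

Extra labels on the Service do not prevent a match; only a missing or different `harmony/target-name` value does.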
158
harmony/src/modules/monitoring/okd/score_user_workload.rs
Normal file
@@ -0,0 +1,158 @@
use std::{collections::BTreeMap, sync::Arc};

use crate::{
    data::Version,
    interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
    inventory::Inventory,
    modules::k8s::resource::K8sResourceScore,
    score::Score,
    topology::{K8sclient, Topology},
};
use async_trait::async_trait;
use harmony_k8s::K8sClient;
use harmony_types::id::Id;
use k8s_openapi::api::core::v1::ConfigMap;
use kube::api::{GroupVersionKind, ObjectMeta};
use log::debug;
use serde::Serialize;

#[derive(Clone, Debug, Serialize)]
pub struct OpenshiftUserWorkloadMonitoring {}

impl<T: Topology + K8sclient> Score<T> for OpenshiftUserWorkloadMonitoring {
    fn name(&self) -> String {
        "OpenshiftUserWorkloadMonitoringScore".to_string()
    }

    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(OpenshiftUserWorkloadMonitoringInterpret {})
    }
}

#[derive(Clone, Debug, Serialize)]
pub struct OpenshiftUserWorkloadMonitoringInterpret {}

#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for OpenshiftUserWorkloadMonitoringInterpret {
    async fn execute(
        &self,
        inventory: &Inventory,
        topology: &T,
    ) -> Result<Outcome, InterpretError> {
        let namespace = "openshift-user-workload-monitoring".to_string();
        let cm_name = "user-workload-monitoring-config".to_string();
        let client = topology.k8s_client().await?;

        let cm_enabled = self
            .check_cluster_user_workload_monitoring_enabled(client, &cm_name, &namespace)
            .await?;

        match cm_enabled {
            true => Ok(Outcome::success(
                "OpenshiftUserWorkloadMonitoringEnabled".to_string(),
            )),
            false => {
                let mut data = BTreeMap::new();
                data.insert(
                    "config.yaml".to_string(),
                    r#"
alertmanager:
  enabled: true
  enableAlertmanagerConfig: true
"#
                    .to_string(),
                );

                let cm = ConfigMap {
                    metadata: ObjectMeta {
                        name: Some("user-workload-monitoring-config".to_string()),
                        namespace: Some(namespace.clone()),
                        ..Default::default()
                    },
                    data: Some(data),
                    ..Default::default()
                };

                K8sResourceScore::single(cm, Some(namespace))
                    .create_interpret()
                    .execute(inventory, topology)
                    .await
            }
        }
    }

    fn get_name(&self) -> InterpretName {
        InterpretName::Custom("OpenshiftUserWorkloadMonitoringInterpret")
    }

    fn get_version(&self) -> Version {
        todo!()
    }

    fn get_status(&self) -> InterpretStatus {
        todo!()
    }

    fn get_children(&self) -> Vec<Id> {
        todo!()
    }
}

impl OpenshiftUserWorkloadMonitoringInterpret {
    async fn check_cluster_user_workload_monitoring_enabled(
        &self,
        client: Arc<K8sClient>,
        name: &str,
        namespace: &str,
    ) -> Result<bool, String> {
        let gvk = GroupVersionKind {
            group: "".to_string(),
            version: "v1".to_string(),
            kind: "ConfigMap".to_string(),
        };

        let cm = match client
            .get_resource_json_value(name, Some(namespace), &gvk)
            .await
        {
            Ok(obj) => obj,
            Err(_) => return Ok(false), // CM doesn't exist? Treat as disabled.
        };

        debug!("{:#?}", cm.data.pointer("/data/config.yaml"));
        let config_yaml_str = match cm
            .data
            .pointer("/data/config.yaml")
            .and_then(|v| v.as_str())
        {
            Some(s) => s,
            None => return Ok(false), // Key missing? Treat as disabled.
        };

        debug!("{:#?}", config_yaml_str);
        let parsed_config: serde_yaml::Value = serde_yaml::from_str(config_yaml_str)
            .map_err(|e| format!("Failed to parse nested YAML: {}", e))?;

        let alert_manager_enabled = parsed_config
            .get("alertmanager")
            .and_then(|a| a.get("enableAlertmanagerConfig"))
            .and_then(|v| v.as_bool())
            .unwrap_or(false);

        debug!("alertmanagerenabled: {:#?}", alert_manager_enabled);

        let enabled = parsed_config
            .get("alertmanager")
            .and_then(|enabled| enabled.get("enabled"))
            .and_then(|v| v.as_bool())
            .unwrap_or(false);

        debug!("user workload monitoring enabled: {:#?}", enabled);

        // Both flags must be set for the configuration to count as enabled.
        Ok(alert_manager_enabled && enabled)
    }
}
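The check in `check_cluster_user_workload_monitoring_enabled` reduces to a simple rule: a missing ConfigMap, a missing key, or a missing flag all count as disabled, and both `alertmanager.enabled` and `alertmanager.enableAlertmanagerConfig` must be true. A minimal sketch of that rule, with the YAML parsing left out, is shown here. The `AlertmanagerFlags` struct and the `user_workload_alerting_enabled` function are hypothetical names introduced for illustration only.

```rust
/// Hypothetical, already-parsed view of the `alertmanager` section of
/// `config.yaml`. `None` models a flag that is absent or not a boolean.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct AlertmanagerFlags {
    pub enabled: Option<bool>,
    pub enable_alertmanager_config: Option<bool>,
}

/// Returns true only when the section exists and both flags are true.
/// An absent section or flag is treated as disabled.
pub fn user_workload_alerting_enabled(flags: Option<AlertmanagerFlags>) -> bool {
    flags
        .map(|f| f.enabled.unwrap_or(false) && f.enable_alertmanager_config.unwrap_or(false))
        .unwrap_or(false)
}
```

Treating every missing piece as "disabled" means the interpret falls through to writing the ConfigMap, which matches the `false` branch of `execute` above.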
@@ -0,0 +1,70 @@
use async_trait::async_trait;
use harmony_types::id::Id;
use serde::Serialize;

use crate::{
    data::Version,
    interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
    inventory::Inventory,
    score::Score,
    topology::{K8sclient, Topology},
};

#[derive(Clone, Debug, Serialize)]
pub struct VerifyUserWorkload {}

impl<T: Topology + K8sclient> Score<T> for VerifyUserWorkload {
    fn name(&self) -> String {
        "VerifyUserWorkload".to_string()
    }

    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(VerifyUserWorkloadInterpret {})
    }
}

#[derive(Clone, Debug, Serialize)]
pub struct VerifyUserWorkloadInterpret {}

#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for VerifyUserWorkloadInterpret {
    async fn execute(
        &self,
        _inventory: &Inventory,
        topology: &T,
    ) -> Result<Outcome, InterpretError> {
        let client = topology.k8s_client().await?;
        let namespace = "openshift-user-workload-monitoring";
        let alertmanager_name = "alertmanager-user-workload-0";
        let prometheus_name = "prometheus-user-workload-0";

        client
            .wait_for_pod_ready(alertmanager_name, Some(namespace))
            .await?;

        client
            .wait_for_pod_ready(prometheus_name, Some(namespace))
            .await?;

        Ok(Outcome::success(format!(
            "pods: {}, {} ready in ns: {}",
            alertmanager_name, prometheus_name, namespace
        )))
    }

    fn get_name(&self) -> InterpretName {
        InterpretName::Custom("VerifyUserWorkloadInterpret")
    }

    fn get_version(&self) -> Version {
        todo!()
    }

    fn get_status(&self) -> InterpretStatus {
        todo!()
    }

    fn get_children(&self) -> Vec<Id> {
        todo!()
    }
}
@@ -1 +1,2 @@
pub mod prometheus_config;
pub mod prometheus_helm;
Some files were not shown because too many files have changed in this diff.