Compare commits

..

24 Commits

Author SHA1 Message Date
304490977c wip: vllm example
Some checks failed
Run Check Script / check (pull_request) Failing after 36s
2026-03-23 08:40:29 -04:00
8499f4d1b7 Merge pull request 'fix: small details were preventing to re-save frontends,backends and healthchecks in opnsense UI' (#248) from fix/load-balancer-xml into master
Some checks failed
Run Check Script / check (push) Has been cancelled
Compile and package harmony_composer / package_harmony_composer (push) Has been cancelled
Reviewed-on: #248
2026-03-17 14:38:35 +00:00
aa07f4c8ad Merge pull request 'fix/dynamically_get_public_domain' (#234) from fix/dynamically_get_public_domain into master
Some checks failed
Compile and package harmony_composer / package_harmony_composer (push) Failing after 1m48s
Run Check Script / check (push) Failing after 11m1s
Reviewed-on: #234
Reviewed-by: johnride <jg@nationtech.io>
2026-03-15 14:07:25 +00:00
77bb138497 Merge remote-tracking branch 'origin/master' into fix/dynamically_get_public_domain
All checks were successful
Run Check Script / check (pull_request) Successful in 1m20s
2026-03-15 09:54:36 -04:00
a16879b1b6 Merge pull request 'fix: readded tokio retry to get ca cert for a nats cluster which was accidentally removed during a refactor' (#229) from fix/nats-ca-cert-retry into master
Some checks failed
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m1s
Run Check Script / check (push) Failing after 12m23s
Reviewed-on: #229
2026-03-15 12:36:05 +00:00
f57e6f5957 Merge pull request 'feat: add priorityClass to node_health daemonset' (#249) from feat/health_endpoint_priority_class into master
Some checks failed
Run Check Script / check (push) Successful in 1m28s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 1m59s
Reviewed-on: #249
2026-03-14 18:53:30 +00:00
7605d05de3 fix: opnsense fixes for st-mcd (cb1)
All checks were successful
Run Check Script / check (pull_request) Successful in 1m29s
2026-03-13 13:13:37 -04:00
b244127843 feat: add priorityClass to node_health daemonset
All checks were successful
Run Check Script / check (pull_request) Successful in 1m27s
2026-03-13 11:18:18 -04:00
67c3265286 fix: small details were preventing to re-save frontends,backends and healthchecks in opnsense UI
All checks were successful
Run Check Script / check (pull_request) Successful in 2m12s
2026-03-13 10:31:17 -04:00
d10598d01e Merge pull request 'okdload balancer using 1936 port http healthcheck' (#240) from feat/okd_loadbalancer_betterhealthcheck into master
Some checks failed
Run Check Script / check (push) Successful in 1m26s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 1m56s
Reviewed-on: #240
2026-03-10 17:45:51 +00:00
61ba7257d0 fix: remove broken test
All checks were successful
Run Check Script / check (pull_request) Successful in 1m22s
2026-03-10 13:40:24 -04:00
b0e9594d92 Merge branch 'master' into feat/okd_loadbalancer_betterhealthcheck
Some checks failed
Run Check Script / check (pull_request) Failing after 47s
2026-03-07 23:06:50 +00:00
bfb86f63ce fix: xml field for vlan
All checks were successful
Run Check Script / check (pull_request) Successful in 1m31s
2026-03-07 11:29:44 -05:00
d920de34cf fix: configure health_check: None for public_services
All checks were successful
Run Check Script / check (pull_request) Successful in 1m35s
2026-03-05 14:55:00 -05:00
4276b9137b fix: put the hc on private_services, not public_services
All checks were successful
Run Check Script / check (pull_request) Successful in 1m32s
2026-03-05 14:35:33 -05:00
6ab88ab8d9 Merge branch 'master' into feat/okd_loadbalancer_betterhealthcheck 2026-03-04 10:46:57 -05:00
53d0704a35 wip: okdload balancer using 1936 port http healthcheck
Some checks failed
Run Check Script / check (pull_request) Failing after 31s
2026-03-02 20:47:41 -05:00
d8ab9d52a4 fix:broken test
All checks were successful
Run Check Script / check (pull_request) Successful in 1m0s
2026-02-17 15:34:42 -05:00
2cb7aeefc0 fix: deploys replicated postgresql with site 2 as standby
Some checks failed
Run Check Script / check (pull_request) Failing after 1m8s
2026-02-17 15:02:00 -05:00
16016febcf wip: adding impl details for deploying connected replica cluster 2026-02-16 16:22:30 -05:00
e709de531d fix: added route building to failover topology 2026-02-13 16:08:05 -05:00
6ab0f3a6ab wip 2026-02-13 15:48:24 -05:00
724ab0b888 wip: removed hardcoding and added fn to trait tlsrouter 2026-02-13 15:18:23 -05:00
8b6ce8d069 fix: readded tokio retry to get ca cert for a nats cluster which was accidentally removed during a refactor
All checks were successful
Run Check Script / check (pull_request) Successful in 1m9s
2026-02-06 09:09:01 -05:00
150 changed files with 3751 additions and 5760 deletions

Cargo.lock generated

@@ -7001,6 +7001,18 @@ version = "0.9.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
[[package]]
name = "vllm"
version = "0.1.0"
dependencies = [
"env_logger",
"harmony",
"harmony_cli",
"k8s-openapi",
"log",
"tokio",
]
[[package]]
name = "wait-timeout"
version = "0.2.1"


@@ -18,7 +18,7 @@ members = [
"adr/agent_discovery/mdns",
"brocade",
"harmony_agent",
"harmony_agent/deploy", "harmony_node_readiness", "harmony-k8s",
"harmony_agent/deploy", "harmony_node_readiness", "harmony-k8s", "examples/vllm",
]
[workspace.package]


@@ -1,318 +0,0 @@
# Architecture Decision Record: Monitoring and Alerting Architecture
Initial Authors: Willem Rolleman, Jean-Gabriel Carrier
Initial Date: March 9, 2026
Last Updated Date: March 9, 2026
## Status
Accepted
Supersedes: [ADR-010](010-monitoring-and-alerting.md)
## Context
Harmony needs a unified approach to monitoring and alerting across different infrastructure targets:
1. **Cluster-level monitoring**: Administrators managing entire Kubernetes/OKD clusters need to define cluster-wide alerts, receivers, and scrape targets.
2. **Tenant-level monitoring**: Multi-tenant clusters where teams are confined to namespaces need monitoring scoped to their resources.
3. **Application-level monitoring**: Developers deploying applications want zero-config monitoring that "just works" for their services.
The monitoring landscape is fragmented:
- **OKD/OpenShift**: Built-in Prometheus with AlertmanagerConfig CRDs
- **KubePrometheus**: Helm-based stack with PrometheusRule CRDs
- **RHOB (Red Hat Observability)**: Operator-based with MonitoringStack CRDs
- **Standalone Prometheus**: Raw Prometheus deployments
Each system has different CRDs, different installation methods, and different configuration APIs.
## Decision
We implement a **trait-based architecture with compile-time capability verification** that provides:
1. **Type-safe abstractions** via parameterized traits: `AlertReceiver<S>`, `AlertRule<S>`, `ScrapeTarget<S>`
2. **Compile-time topology compatibility** via the `Observability<S>` capability bound
3. **Three levels of abstraction**: Cluster, Tenant, and Application monitoring
4. **Pre-built alert rules** as functions that return typed structs
### Core Traits
```rust
// domain/topology/monitoring.rs

/// Marker trait for systems that send alerts (Prometheus, etc.)
pub trait AlertSender: Send + Sync + std::fmt::Debug {
    fn name(&self) -> String;
}

/// Defines how a receiver (Discord, Slack, etc.) builds its configuration
/// for a specific sender type
pub trait AlertReceiver<S: AlertSender>: std::fmt::Debug + Send + Sync {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError>;
    fn name(&self) -> String;
    fn clone_box(&self) -> Box<dyn AlertReceiver<S>>;
}

/// Defines how an alert rule builds its PrometheusRule configuration
pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
    fn build_rule(&self) -> Result<serde_json::Value, InterpretError>;
    fn name(&self) -> String;
    fn clone_box(&self) -> Box<dyn AlertRule<S>>;
}

/// Capability that topologies implement to support monitoring
pub trait Observability<S: AlertSender> {
    async fn install_alert_sender(&self, sender: &S, inventory: &Inventory)
        -> Result<PreparationOutcome, PreparationError>;
    async fn install_receivers(&self, sender: &S, inventory: &Inventory,
        receivers: Option<Vec<Box<dyn AlertReceiver<S>>>>) -> Result<...>;
    async fn install_rules(&self, sender: &S, inventory: &Inventory,
        rules: Option<Vec<Box<dyn AlertRule<S>>>>) -> Result<...>;
    async fn add_scrape_targets(&self, sender: &S, inventory: &Inventory,
        scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>) -> Result<...>;
    async fn ensure_monitoring_installed(&self, sender: &S, inventory: &Inventory)
        -> Result<...>;
}
```
### Alert Sender Types
Each monitoring stack is a distinct `AlertSender`:
| Sender | Module | Use Case |
|--------|--------|----------|
| `OpenshiftClusterAlertSender` | `monitoring/okd/` | OKD/OpenShift built-in monitoring |
| `KubePrometheus` | `monitoring/kube_prometheus/` | Helm-deployed kube-prometheus-stack |
| `Prometheus` | `monitoring/prometheus/` | Standalone Prometheus via Helm |
| `RedHatClusterObservability` | `monitoring/red_hat_cluster_observability/` | RHOB operator |
| `Grafana` | `monitoring/grafana/` | Grafana-managed alerting |
### Three Levels of Monitoring
#### 1. Cluster-Level Monitoring
For cluster administrators. Full control over monitoring infrastructure.
```rust
// examples/okd_cluster_alerts/src/main.rs
OpenshiftClusterAlertScore {
    sender: OpenshiftClusterAlertSender,
    receivers: vec![Box::new(DiscordReceiver { ... })],
    rules: vec![Box::new(alert_rules)],
    scrape_targets: Some(vec![Box::new(external_exporters)]),
}
```
**Characteristics:**
- Cluster-scoped CRDs and resources
- Can add external scrape targets (outside cluster)
- Manages Alertmanager configuration
- Requires cluster-admin privileges
#### 2. Tenant-Level Monitoring
For teams confined to namespaces. The topology determines tenant context.
```rust
// The topology's Observability impl handles namespace scoping
impl Observability<KubePrometheus> for K8sAnywhereTopology {
    async fn install_rules(&self, sender: &KubePrometheus, ...) {
        // The topology knows whether it is tenant-scoped
        let namespace = self.get_tenant_config().await
            .map(|t| t.name)
            .unwrap_or_else(|| "default".to_string());
        // Install rules in the tenant namespace
    }
}
```
**Characteristics:**
- Namespace-scoped resources
- Cannot modify cluster-level monitoring config
- May have restricted receiver types
- Runtime validation of permissions (cannot be fully compile-time)
#### 3. Application-Level Monitoring
For developers. Zero-config, opinionated monitoring.
```rust
// modules/application/features/monitoring.rs
pub struct Monitoring {
    pub application: Arc<dyn Application>,
    pub alert_receiver: Vec<Box<dyn AlertReceiver<Prometheus>>>,
}

impl<T: Topology + Observability<Prometheus> + TenantManager + ...>
    ApplicationFeature<T> for Monitoring
{
    async fn ensure_installed(&self, topology: &T) -> Result<...> {
        // Auto-creates ServiceMonitor
        // Auto-installs Ntfy for notifications
        // Handles tenant namespace automatically
        // Wires up sensible defaults
    }
}
```
**Characteristics:**
- Automatic ServiceMonitor creation
- Opinionated notification channel (Ntfy)
- Tenant-aware via topology
- Minimal configuration required
## Rationale
### Why Generic Traits Instead of Unified Types?
Each monitoring stack (OKD, KubePrometheus, RHOB) has fundamentally different CRDs:
```rust
// OKD uses AlertmanagerConfig with a different structure
AlertmanagerConfig { spec: { receivers: [...] } }

// RHOB uses secret references for webhook URLs
MonitoringStack { spec: { alertmanagerConfig: { discordConfigs: [{ apiURL: { key: "..." } }] } } }

// KubePrometheus uses the Alertmanager CRD with different field names
Alertmanager { spec: { config: { receivers: [...] } } }
```
A unified type would either:
1. Be a lowest-common-denominator (loses stack-specific features)
2. Be a complex union type (hard to use, easy to misconfigure)
Generic traits let each stack express its configuration naturally while providing a consistent interface.
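To illustrate, here is a minimal sketch of one rule type rendering to two different CRD shapes. The `ServiceDownRule` struct and the exact JSON layouts are illustrative, not Harmony's actual types; only the `AlertRule` signature comes from the trait definitions above:
```rust
use serde_json::{Value, json};

#[derive(Debug, Clone)]
pub struct ServiceDownRule {
    pub expr: String,
}

// Rendered as OKD's AlertingRule (monitoring.openshift.io)...
impl AlertRule<OpenshiftClusterAlertSender> for ServiceDownRule {
    fn build_rule(&self) -> Result<Value, InterpretError> {
        Ok(json!({
            "apiVersion": "monitoring.openshift.io/v1",
            "kind": "AlertingRule",
            "spec": { "groups": [{ "name": "service-down", "rules": [{ "expr": self.expr }] }] }
        }))
    }
    fn name(&self) -> String { "service-down".to_string() }
    fn clone_box(&self) -> Box<dyn AlertRule<OpenshiftClusterAlertSender>> {
        Box::new(self.clone())
    }
}

// ...and as a PrometheusRule (monitoring.coreos.com) for KubePrometheus.
impl AlertRule<KubePrometheus> for ServiceDownRule {
    fn build_rule(&self) -> Result<Value, InterpretError> {
        Ok(json!({
            "apiVersion": "monitoring.coreos.com/v1",
            "kind": "PrometheusRule",
            "spec": { "groups": [{ "name": "service-down", "rules": [{ "expr": self.expr }] }] }
        }))
    }
    fn name(&self) -> String { "service-down".to_string() }
    fn clone_box(&self) -> Box<dyn AlertRule<KubePrometheus>> {
        Box::new(self.clone())
    }
}
```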
### Why Compile-Time Capability Bounds?
```rust
impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T>
    for OpenshiftClusterAlertScore { ... }
```
This fails at compile time if you try to use `OpenshiftClusterAlertScore` with a topology that doesn't support OKD monitoring. This prevents the "config-is-valid-but-platform-is-wrong" errors that Harmony was designed to eliminate.
### Why Not a MonitoringStack Abstraction (V2 Approach)?
The V2 approach proposed a unified `MonitoringStack` that hides sender selection:
```rust
// V2 approach - rejected
MonitoringStack::new(MonitoringApiVersion::V2CRD)
    .add_alert_channel(discord)
```
**Problems:**
1. Hides which sender you're using, losing compile-time guarantees
2. "Version selection" actually chooses between fundamentally different systems
3. Would need to handle all stack-specific features through a generic interface
The current approach is explicit: you choose `OpenshiftClusterAlertSender` and the compiler verifies compatibility.
### Why Runtime Validation for Tenants?
Tenant confinement is determined at runtime by the topology and K8s RBAC. We cannot know at compile time whether a user has cluster-admin or namespace-only access.
Options considered:
1. **Compile-time tenant markers** - Would require modeling entire RBAC hierarchy in types. Over-engineering.
2. **Runtime validation** - Current approach. Fails with clear K8s permission errors if insufficient access.
3. **No tenant support** - Would exclude a major use case.
Runtime validation is the pragmatic choice. The failure mode is clear (K8s API error) and occurs early in execution.
> Note: we will eventually have compile-time validation for such things. Rust macros are powerful, and we could discover the actual capabilities we're dealing with, similar to the approach sqlx takes in its `query!` macros.
## Consequences
### Pros
1. **Type Safety**: Invalid configurations are caught at compile time
2. **Extensibility**: Adding a new monitoring stack requires implementing traits, not modifying core code
3. **Clear Separation**: Cluster/Tenant/Application levels have distinct entry points
4. **Reusable Rules**: Pre-built alert rules as functions (`high_pvc_fill_rate_over_two_days()`)
5. **CRD Accuracy**: Type definitions match actual Kubernetes CRDs exactly
### Cons
1. **Implementation Explosion**: `DiscordReceiver` implements `AlertReceiver<S>` for each sender type (3+ implementations)
2. **Learning Curve**: Understanding the trait hierarchy takes time
3. **clone_box Boilerplate**: Required for trait object cloning (3 lines per impl)
### Mitigations
- Implementation explosion is contained: each receiver type has O(senders) implementations, but receivers are rare compared to rules
- Learning curve is documented with examples at each level
- clone_box boilerplate is minimal and copy-paste; a sketch follows below
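For reference, the boilerplate in question is one delegating method per impl. A minimal sketch, assuming the receiver type derives `Clone` (the `build` body is elided):
```rust
impl AlertReceiver<KubePrometheus> for DiscordReceiver {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
        todo!("render the KubePrometheus receiver config")
    }

    fn name(&self) -> String {
        "discord".to_string()
    }

    // The clone_box boilerplate: delegate to Clone so boxed trait objects
    // can be duplicated.
    fn clone_box(&self) -> Box<dyn AlertReceiver<KubePrometheus>> {
        Box::new(self.clone())
    }
}
```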
## Alternatives Considered
### Unified MonitoringStack Type
See "Why Not a MonitoringStack Abstraction" above. Rejected for losing compile-time safety.
### Helm-Only Approach
Use `HelmScore` directly for each monitoring deployment. Rejected because:
- No type safety for alert rules
- Cannot compose with application features
- No tenant awareness
### Separate Modules Per Use Case
Have `cluster_monitoring/`, `tenant_monitoring/`, `app_monitoring/` as separate modules. Rejected because:
- Massive code duplication
- No shared abstraction for receivers/rules
- Adding a feature requires three implementations
## Implementation Notes
### Module Structure
```
modules/monitoring/
├── mod.rs # Public exports
├── alert_channel/ # Receivers (Discord, Webhook)
├── alert_rule/ # Rules and pre-built alerts
│ ├── prometheus_alert_rule.rs
│ └── alerts/ # Library of pre-built rules
│ ├── k8s/ # K8s-specific (pvc, pod, memory)
│ └── infra/ # Infrastructure (opnsense, dell)
├── okd/ # OpenshiftClusterAlertSender
├── kube_prometheus/ # KubePrometheus
├── prometheus/ # Prometheus
├── red_hat_cluster_observability/ # RHOB
├── grafana/ # Grafana
├── application_monitoring/ # Application-level scores
└── scrape_target/ # External scrape targets
```
### Adding a New Alert Sender
1. Create sender type: `pub struct MySender; impl AlertSender for MySender { ... }`
2. Implement `Observability<MySender>` for topologies that support it
3. Create CRD types in `crd/` subdirectory
4. Implement `AlertReceiver<MySender>` for existing receivers
5. Implement `AlertRule<MySender>` for `AlertManagerRuleGroup` (steps 1 and 4 are sketched below)
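A minimal sketch of steps 1 and 4, assuming the trait signatures from Core Traits above; `MySender` is hypothetical and `DiscordReceiver` is assumed to derive `Clone`:
```rust
// Step 1: the sender is a marker type.
#[derive(Debug, Clone)]
pub struct MySender;

impl AlertSender for MySender {
    fn name(&self) -> String {
        "MySender".to_string()
    }
}

// Step 4: teach an existing receiver to render MySender's config format.
impl AlertReceiver<MySender> for DiscordReceiver {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
        todo!("convert this receiver into MySender's CRD shape")
    }

    fn name(&self) -> String {
        self.name.clone()
    }

    fn clone_box(&self) -> Box<dyn AlertReceiver<MySender>> {
        Box::new(self.clone())
    }
}
```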
### Adding a New Alert Rule
```rust
pub fn my_custom_alert() -> PrometheusAlertRule {
PrometheusAlertRule::new("MyAlert", "up == 0")
.for_duration("5m")
.label("severity", "critical")
.annotation("summary", "Service is down")
}
```
No trait implementation needed - `AlertManagerRuleGroup` already handles conversion.
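Deploying the custom rule then uses the same rule group as the pre-built alerts, for example:
```rust
let rules = AlertManagerRuleGroup::new("custom-rules", vec![my_custom_alert()]);
```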
## Related ADRs
- [ADR-013](013-monitoring-notifications.md): Notification channel selection (ntfy)
- [ADR-011](011-multi-tenant-cluster.md): Multi-tenant cluster architecture


@@ -1,21 +0,0 @@
[package]
name = "example-monitoring-v2"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony-k8s = { path = "../../harmony-k8s" }
harmony_types = { path = "../../harmony_types" }
kube = { workspace = true }
schemars = "0.8"
serde = { workspace = true, features = ["derive"] }
serde_json = { workspace = true }
serde_yaml = { workspace = true }
url = { workspace = true }
log = { workspace = true }
async-trait = { workspace = true }
k8s-openapi = { workspace = true }


@@ -1,91 +0,0 @@
# Monitoring v2 - Improved Architecture
This example demonstrates the improved monitoring architecture that addresses the "WTF/minute" issues in the original design.
## Key Improvements
### 1. **Single AlertChannel Trait with Generic Sender**
The original design required 9-12 implementations for each alert channel (Discord, Webhook, etc.) - one for each sender type. The new design uses a single trait with generic sender parameterization:
```rust
pub trait AlertChannel<Sender: AlertSender> {
    async fn install_config(&self, sender: &Sender) -> Result<Outcome, InterpretError>;
    fn name(&self) -> String;
    fn as_any(&self) -> &dyn std::any::Any;
}
```
**Benefits:**
- One Discord implementation works with all sender types
- Type safety at compile time
- No runtime dispatch overhead
### 2. **MonitoringStack Abstraction**
Instead of manually selecting CRDPrometheus vs KubePrometheus vs RHOBObservability, you now have a unified MonitoringStack that handles versioning:
```rust
let monitoring_stack = MonitoringStack::new(MonitoringApiVersion::V2CRD)
    .set_namespace("monitoring")
    .add_alert_channel(discord_receiver)
    .set_scrape_targets(vec![...]);
```
**Benefits:**
- Single source of truth for monitoring configuration
- Easy to switch between monitoring versions
- Automatic version-specific configuration
### 3. **TenantMonitoringScore - True Composition**
The original monitoring_with_tenant example just put tenant and monitoring as separate items in a vec. The new design truly composes them:
```rust
let tenant_score = TenantMonitoringScore::new("test-tenant", monitoring_stack);
```
This creates a single score that:
- Has tenant context
- Has monitoring configuration
- Automatically installs monitoring scoped to tenant namespace
**Benefits:**
- No more "two separate things" confusion
- Automatic tenant namespace scoping
- Clear ownership: tenant owns its monitoring
### 4. **Versioned Monitoring APIs**
Clear versioning makes it obvious which monitoring stack you're using:
```rust
pub enum MonitoringApiVersion {
    V1Helm, // Old Helm charts
    V2CRD,  // Current CRDs
    V3RHOB, // RHOB (future)
}
```
**Benefits:**
- No guessing which API version you're using
- Easy to migrate between versions
- Backward compatibility path
## Comparison
### Original Design (monitoring_with_tenant)
- Manual selection of each component
- Manual installation of both components
- Need to remember to pass both to harmony_cli::run
- Monitoring not scoped to tenant automatically
### New Design (monitoring_v2)
- Single composed score
- One score does it all
## Usage
```bash
cd examples/monitoring_v2
cargo run
```
## Migration Path
To migrate from the old design to the new:
1. Replace individual alert channel implementations with AlertChannel<Sender>
2. Use MonitoringStack instead of manual *Prometheus selection
3. Use TenantMonitoringScore instead of separate TenantScore + monitoring scores
4. Select monitoring version via MonitoringApiVersion


@@ -1,343 +0,0 @@
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use log::debug;
use serde::{Deserialize, Serialize};
use serde_yaml::{Mapping, Value};

use harmony::data::Version;
use harmony::interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome};
use harmony::inventory::Inventory;
use harmony::score::Score;
use harmony::topology::{Topology, tenant::TenantManager};
use harmony_k8s::K8sClient;
use harmony_types::k8s_name::K8sName;
use harmony_types::net::Url;

pub trait AlertSender: Send + Sync + std::fmt::Debug {
    fn name(&self) -> String;
    fn namespace(&self) -> String;
}

#[derive(Debug)]
pub struct CRDPrometheus {
    pub namespace: String,
    pub client: Arc<K8sClient>,
}

impl AlertSender for CRDPrometheus {
    fn name(&self) -> String {
        "CRDPrometheus".to_string()
    }
    fn namespace(&self) -> String {
        self.namespace.clone()
    }
}

#[derive(Debug)]
pub struct RHOBObservability {
    pub namespace: String,
    pub client: Arc<K8sClient>,
}

impl AlertSender for RHOBObservability {
    fn name(&self) -> String {
        "RHOBObservability".to_string()
    }
    fn namespace(&self) -> String {
        self.namespace.clone()
    }
}

#[derive(Debug)]
pub struct KubePrometheus {
    pub config: Arc<Mutex<KubePrometheusConfig>>,
}

impl Default for KubePrometheus {
    fn default() -> Self {
        Self::new()
    }
}

impl KubePrometheus {
    pub fn new() -> Self {
        Self {
            config: Arc::new(Mutex::new(KubePrometheusConfig::new())),
        }
    }
}

impl AlertSender for KubePrometheus {
    fn name(&self) -> String {
        "KubePrometheus".to_string()
    }
    fn namespace(&self) -> String {
        self.config
            .lock()
            .unwrap()
            .namespace
            .clone()
            .unwrap_or_else(|| "monitoring".to_string())
    }
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct KubePrometheusConfig {
    pub namespace: Option<String>,
    #[serde(skip)]
    pub alert_receiver_configs: Vec<AlertManagerChannelConfig>,
}

impl KubePrometheusConfig {
    pub fn new() -> Self {
        Self {
            namespace: None,
            alert_receiver_configs: Vec::new(),
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AlertManagerChannelConfig {
    pub channel_receiver: serde_yaml::Value,
    pub channel_route: serde_yaml::Value,
}

impl Default for AlertManagerChannelConfig {
    fn default() -> Self {
        Self {
            channel_receiver: serde_yaml::Value::Mapping(Default::default()),
            channel_route: serde_yaml::Value::Mapping(Default::default()),
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ScrapeTargetConfig {
    pub service_name: String,
    pub port: String,
    pub path: String,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum MonitoringApiVersion {
    V1Helm,
    V2CRD,
    V3RHOB,
}

#[derive(Debug, Clone)]
pub struct MonitoringStack {
    pub version: MonitoringApiVersion,
    pub namespace: String,
    pub alert_channels: Vec<Arc<dyn AlertSender>>,
    pub scrape_targets: Vec<ScrapeTargetConfig>,
}

impl MonitoringStack {
    pub fn new(version: MonitoringApiVersion) -> Self {
        Self {
            version,
            namespace: "monitoring".to_string(),
            alert_channels: Vec::new(),
            scrape_targets: Vec::new(),
        }
    }

    pub fn set_namespace(mut self, namespace: &str) -> Self {
        self.namespace = namespace.to_string();
        self
    }

    pub fn add_alert_channel(mut self, channel: impl AlertSender + 'static) -> Self {
        self.alert_channels.push(Arc::new(channel));
        self
    }

    pub fn set_scrape_targets(mut self, targets: Vec<(&str, &str, String)>) -> Self {
        self.scrape_targets = targets
            .into_iter()
            .map(|(name, port, path)| ScrapeTargetConfig {
                service_name: name.to_string(),
                port: port.to_string(),
                path,
            })
            .collect();
        self
    }
}

pub trait AlertChannel<Sender: AlertSender> {
    fn install_config(&self, sender: &Sender);
    fn name(&self) -> String;
}
#[derive(Debug, Clone)]
pub struct DiscordWebhook {
    pub name: K8sName,
    pub url: Url,
    pub selectors: Vec<HashMap<String, String>>,
}

impl DiscordWebhook {
    fn get_config(&self) -> AlertManagerChannelConfig {
        let mut route = Mapping::new();
        route.insert(
            Value::String("receiver".to_string()),
            Value::String(self.name.to_string()),
        );
        route.insert(
            Value::String("matchers".to_string()),
            Value::Sequence(vec![Value::String("alertname!=Watchdog".to_string())]),
        );

        let mut receiver = Mapping::new();
        receiver.insert(
            Value::String("name".to_string()),
            Value::String(self.name.to_string()),
        );
        let mut discord_config = Mapping::new();
        discord_config.insert(
            Value::String("webhook_url".to_string()),
            Value::String(self.url.to_string()),
        );
        receiver.insert(
            Value::String("discord_configs".to_string()),
            Value::Sequence(vec![Value::Mapping(discord_config)]),
        );

        AlertManagerChannelConfig {
            channel_receiver: Value::Mapping(receiver),
            channel_route: Value::Mapping(route),
        }
    }
}

impl AlertChannel<CRDPrometheus> for DiscordWebhook {
    fn install_config(&self, sender: &CRDPrometheus) {
        debug!("Installing Discord webhook for CRDPrometheus in namespace: {}", sender.namespace());
        debug!("Config: {:?}", self.get_config());
        debug!("Installed!");
    }
    fn name(&self) -> String {
        "discord-webhook".to_string()
    }
}

impl AlertChannel<RHOBObservability> for DiscordWebhook {
    fn install_config(&self, sender: &RHOBObservability) {
        debug!("Installing Discord webhook for RHOBObservability in namespace: {}", sender.namespace());
        debug!("Config: {:?}", self.get_config());
        debug!("Installed!");
    }
    fn name(&self) -> String {
        "webhook-receiver".to_string()
    }
}

impl AlertChannel<KubePrometheus> for DiscordWebhook {
    fn install_config(&self, sender: &KubePrometheus) {
        debug!("Installing Discord webhook for KubePrometheus in namespace: {}", sender.namespace());
        // Lock once: taking the same std::sync::Mutex a second time while the
        // first guard is still alive would deadlock.
        let mut config = sender.config.lock().unwrap();
        let ns = config.namespace.clone().unwrap_or_else(|| "monitoring".to_string());
        debug!("Namespace: {}", ns);
        config.alert_receiver_configs.push(self.get_config());
        debug!("Installed!");
    }
    fn name(&self) -> String {
        "discord-webhook".to_string()
    }
}
fn default_monitoring_stack() -> MonitoringStack {
    MonitoringStack::new(MonitoringApiVersion::V2CRD)
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TenantMonitoringScore {
    pub tenant_id: harmony_types::id::Id,
    pub tenant_name: String,
    #[serde(skip)]
    #[serde(default = "default_monitoring_stack")]
    pub monitoring_stack: MonitoringStack,
}

impl TenantMonitoringScore {
    pub fn new(tenant_name: &str, monitoring_stack: MonitoringStack) -> Self {
        Self {
            tenant_id: harmony_types::id::Id::default(),
            tenant_name: tenant_name.to_string(),
            monitoring_stack,
        }
    }
}

impl<T: Topology + TenantManager> Score<T> for TenantMonitoringScore {
    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(TenantMonitoringInterpret {
            score: self.clone(),
        })
    }
    fn name(&self) -> String {
        format!("{} monitoring [TenantMonitoringScore]", self.tenant_name)
    }
}

#[derive(Debug)]
pub struct TenantMonitoringInterpret {
    pub score: TenantMonitoringScore,
}

#[async_trait::async_trait]
impl<T: Topology + TenantManager> Interpret<T> for TenantMonitoringInterpret {
    async fn execute(
        &self,
        _inventory: &Inventory,
        topology: &T,
    ) -> Result<Outcome, InterpretError> {
        let tenant_config = topology.get_tenant_config().await.unwrap();
        let tenant_ns = tenant_config.name.clone();

        match self.score.monitoring_stack.version {
            MonitoringApiVersion::V1Helm => {
                debug!("Installing Helm monitoring for tenant {}", tenant_ns);
            }
            MonitoringApiVersion::V2CRD => {
                debug!("Installing CRD monitoring for tenant {}", tenant_ns);
            }
            MonitoringApiVersion::V3RHOB => {
                debug!("Installing RHOB monitoring for tenant {}", tenant_ns);
            }
        }

        Ok(Outcome::success(format!(
            "Installed monitoring stack for tenant {} with version {:?}",
            self.score.tenant_name,
            self.score.monitoring_stack.version
        )))
    }

    fn get_name(&self) -> InterpretName {
        InterpretName::Custom("TenantMonitoringInterpret")
    }
    fn get_version(&self) -> Version {
        Version::from("1.0.0").unwrap()
    }
    fn get_status(&self) -> InterpretStatus {
        InterpretStatus::SUCCESS
    }
    fn get_children(&self) -> Vec<harmony_types::id::Id> {
        Vec::new()
    }
}


@@ -31,16 +31,3 @@ Ready to build your own components? These guides show you how.
- [**Writing a Score**](./guides/writing-a-score.md): Learn how to create your own `Score` and `Interpret` logic to define a new desired state.
- [**Writing a Topology**](./guides/writing-a-topology.md): Learn how to model a new environment (like AWS, GCP, or custom hardware) as a `Topology`.
- [**Adding Capabilities**](./guides/adding-capabilities.md): See how to add a `Capability` to your custom `Topology`.
- [**Coding Guide**](./coding-guide.md): Conventions and best practices for writing Harmony code.
## 5. Module Documentation
Deep dives into specific Harmony modules and features.
- [**Monitoring and Alerting**](./monitoring.md): Comprehensive guide to cluster, tenant, and application-level monitoring with support for OKD, KubePrometheus, RHOB, and more.
## 6. Architecture Decision Records
Important architectural decisions are documented in the `adr/` directory:
- [Full ADR Index](../adr/)


@@ -1,443 +0,0 @@
# Monitoring and Alerting in Harmony
Harmony provides a unified, type-safe approach to monitoring and alerting across Kubernetes, OpenShift, and bare-metal infrastructure. This guide explains the architecture and how to use it at different levels of abstraction.
## Overview
Harmony's monitoring module supports three distinct use cases:
| Level | Who Uses It | What It Provides |
|-------|-------------|------------------|
| **Cluster** | Cluster administrators | Full control over monitoring stack, cluster-wide alerts, external scrape targets |
| **Tenant** | Platform teams | Namespace-scoped monitoring in multi-tenant environments |
| **Application** | Application developers | Zero-config monitoring that "just works" |
Each level builds on the same underlying abstractions, ensuring consistency while providing appropriate complexity for each audience.
## Core Concepts
### AlertSender
An `AlertSender` represents the system that evaluates alert rules and sends notifications. Harmony supports multiple monitoring stacks:
| Sender | Description | Use When |
|--------|-------------|----------|
| `OpenshiftClusterAlertSender` | OKD/OpenShift built-in monitoring | Running on OKD/OpenShift |
| `KubePrometheus` | kube-prometheus-stack via Helm | Standard Kubernetes, need full stack |
| `Prometheus` | Standalone Prometheus | Custom Prometheus deployment |
| `RedHatClusterObservability` | RHOB operator | Red Hat managed clusters |
| `Grafana` | Grafana-managed alerting | Grafana as primary alerting layer |
### AlertReceiver
An `AlertReceiver` defines where alerts are sent (Discord, Slack, email, webhook, etc.). Receivers are parameterized by sender type because each monitoring stack has different configuration formats.
```rust
pub trait AlertReceiver<S: AlertSender> {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError>;
    fn name(&self) -> String;
}
```
Built-in receivers:
- `DiscordReceiver` - Discord webhooks
- `WebhookReceiver` - Generic HTTP webhooks
### AlertRule
An `AlertRule` defines a Prometheus alert expression. Rules are also parameterized by sender to handle different CRD formats.
```rust
pub trait AlertRule<S: AlertSender> {
    fn build_rule(&self) -> Result<serde_json::Value, InterpretError>;
    fn name(&self) -> String;
}
```
### Observability Capability
Topologies implement `Observability<S>` to indicate they support a specific alert sender:
```rust
impl Observability<OpenshiftClusterAlertSender> for K8sAnywhereTopology {
    async fn install_receivers(&self, sender, inventory, receivers) { ... }
    async fn install_rules(&self, sender, inventory, rules) { ... }
    // ...
}
```
This provides **compile-time verification**: if you try to use `OpenshiftClusterAlertScore` with a topology that doesn't implement `Observability<OpenshiftClusterAlertSender>`, the code won't compile.
---
## Level 1: Cluster Monitoring
Cluster monitoring is for administrators who need full control over the monitoring infrastructure. This includes:
- Installing/managing the monitoring stack
- Configuring cluster-wide alert receivers
- Defining cluster-level alert rules
- Adding external scrape targets (e.g., bare-metal servers, firewalls)
### Example: OKD Cluster Alerts
```rust
use harmony::{
    modules::monitoring::{
        alert_channel::discord_alert_channel::DiscordReceiver,
        alert_rule::{
            alerts::k8s::pvc::high_pvc_fill_rate_over_two_days,
            prometheus_alert_rule::AlertManagerRuleGroup,
        },
        okd::openshift_cluster_alerting_score::OpenshiftClusterAlertScore,
        scrape_target::prometheus_node_exporter::PrometheusNodeExporter,
    },
    topology::{K8sAnywhereTopology, monitoring::{AlertMatcher, AlertRoute, MatchOp}},
};

let severity_matcher = AlertMatcher {
    label: "severity".to_string(),
    operator: MatchOp::Eq,
    value: "critical".to_string(),
};

let rule_group = AlertManagerRuleGroup::new(
    "cluster-rules",
    vec![high_pvc_fill_rate_over_two_days()],
);

let external_exporter = PrometheusNodeExporter {
    job_name: "firewall".to_string(),
    metrics_path: "/metrics".to_string(),
    listen_address: ip!("192.168.1.1"),
    port: 9100,
    ..Default::default()
};

harmony_cli::run(
    Inventory::autoload(),
    K8sAnywhereTopology::from_env(),
    vec![Box::new(OpenshiftClusterAlertScore {
        sender: OpenshiftClusterAlertSender,
        receivers: vec![Box::new(DiscordReceiver {
            name: "critical-alerts".to_string(),
            url: hurl!("https://discord.com/api/webhooks/..."),
            route: AlertRoute {
                matchers: vec![severity_matcher],
                ..AlertRoute::default("critical-alerts".to_string())
            },
        })],
        rules: vec![Box::new(rule_group)],
        scrape_targets: Some(vec![Box::new(external_exporter)]),
    })],
    None,
).await?;
```
### What This Does
1. **Enables cluster monitoring** - Activates OKD's built-in Prometheus
2. **Enables user workload monitoring** - Allows namespace-scoped rules
3. **Configures Alertmanager** - Adds Discord receiver with route matching
4. **Deploys alert rules** - Creates `AlertingRule` CRD with PVC fill rate alert
5. **Adds external scrape target** - Configures Prometheus to scrape the firewall
### Compile-Time Safety
The `OpenshiftClusterAlertScore` requires:
```rust
impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T>
    for OpenshiftClusterAlertScore
```
If `K8sAnywhereTopology` didn't implement `Observability<OpenshiftClusterAlertSender>`, this code would fail to compile. You cannot accidentally deploy OKD alerts to a cluster that doesn't support them.
---
## Level 2: Tenant Monitoring
In multi-tenant clusters, teams are often confined to specific namespaces. Tenant monitoring adapts to this constraint:
- Resources are deployed in the tenant's namespace
- Cannot modify cluster-level monitoring configuration
- The topology determines namespace context at runtime
### How It Works
The topology's `Observability` implementation handles tenant scoping:
```rust
impl Observability<KubePrometheus> for K8sAnywhereTopology {
    async fn install_rules(&self, sender, inventory, rules) {
        // The topology knows whether it is tenant-scoped
        let namespace = self.get_tenant_config().await
            .map(|t| t.name)
            .unwrap_or_else(|| "monitoring".to_string());

        // Rules are installed in the appropriate namespace
        for rule in rules.unwrap_or_default() {
            let score = KubePrometheusRuleScore {
                sender: sender.clone(),
                rule,
                namespace: namespace.clone(), // Tenant namespace
            };
            score.create_interpret().execute(inventory, self).await?;
        }
    }
}
```
### Tenant vs Cluster Resources
| Resource | Cluster-Level | Tenant-Level |
|----------|---------------|--------------|
| Alertmanager config | Global receivers | Namespaced receivers (where supported) |
| PrometheusRules | Cluster-wide alerts | Namespace alerts only |
| ServiceMonitors | Any namespace | Own namespace only |
| External scrape targets | Can add | Cannot add (cluster config) |
### Runtime Validation
Tenant constraints are validated at runtime via Kubernetes RBAC. If a tenant-scoped deployment attempts cluster-level operations, it fails with a clear permission error from the Kubernetes API.
This cannot be fully compile-time because tenant context is determined by who's running the code and what permissions they have—information only available at runtime.
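For illustration, a sketch of that failure mode at the API boundary using kube-rs directly; the resource type and namespace are placeholders, not Harmony's actual code:
```rust
use k8s_openapi::api::core::v1::ConfigMap;
use kube::{Api, Client, api::PostParams};

// A tenant-scoped credential writing outside its namespace gets an explicit
// 403 from the Kubernetes API server; the deployment fails early and loudly.
async fn try_cluster_scoped_write(client: Client, cm: ConfigMap) -> Result<(), kube::Error> {
    let api: Api<ConfigMap> = Api::namespaced(client, "openshift-monitoring");
    match api.create(&PostParams::default(), &cm).await {
        Err(kube::Error::Api(err)) if err.code == 403 => {
            // RBAC denied the request: surface it instead of half-applying.
            Err(kube::Error::Api(err))
        }
        other => other.map(|_| ()),
    }
}
```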
---
## Level 3: Application Monitoring
Application monitoring provides zero-config, opinionated monitoring for developers. Just add the `Monitoring` feature to your application and it works.
### Example
```rust
use harmony::modules::{
    application::{Application, ApplicationFeature},
    monitoring::alert_channel::webhook_receiver::WebhookReceiver,
};

// Define your application
let my_app = Arc::new(MyApplication::new());

// Add monitoring as a feature (share the Arc so the app stays usable below)
let monitoring = Monitoring {
    application: my_app.clone(),
    alert_receiver: vec![], // Uses defaults
};

// Install with the application
my_app.add_feature(monitoring);
```
### What Application Monitoring Provides
1. **Automatic ServiceMonitor** - Creates a ServiceMonitor for your application's pods
2. **Ntfy Notification Channel** - Auto-installs and configures Ntfy for push notifications
3. **Tenant Awareness** - Automatically scopes to the correct namespace
4. **Sensible Defaults** - Pre-configured alert routes and receivers
### Under the Hood
```rust
impl<T: Topology + Observability<Prometheus> + TenantManager>
    ApplicationFeature<T> for Monitoring
{
    async fn ensure_installed(&self, topology: &T) -> Result<...> {
        // 1. Get tenant namespace (or use app name)
        let namespace = topology.get_tenant_config().await
            .map(|ns| ns.name.clone())
            .unwrap_or_else(|| self.application.name());

        // 2. Create ServiceMonitor for the app
        let app_service_monitor = ServiceMonitor {
            metadata: ObjectMeta {
                name: Some(self.application.name()),
                namespace: Some(namespace.clone()),
                ..Default::default()
            },
            spec: ServiceMonitorSpec::default(),
        };

        // 3. Install Ntfy for notifications
        let ntfy = NtfyScore { namespace, host };
        ntfy.interpret(&Inventory::empty(), topology).await?;

        // 4. Wire up webhook receiver to Ntfy
        let ntfy_receiver = WebhookReceiver { ... };

        // 5. Execute monitoring score
        alerting_score.interpret(&Inventory::empty(), topology).await?;
    }
}
```
---
## Pre-Built Alert Rules
Harmony provides a library of common alert rules in `modules/monitoring/alert_rule/alerts/`:
### Kubernetes Alerts (`alerts/k8s/`)
```rust
use harmony::modules::monitoring::alert_rule::alerts::k8s::{
    pod::pod_failed,
    pvc::high_pvc_fill_rate_over_two_days,
    memory_usage::alert_high_memory_usage,
};

let rules = AlertManagerRuleGroup::new("k8s-rules", vec![
    pod_failed(),
    high_pvc_fill_rate_over_two_days(),
    alert_high_memory_usage(),
]);
```
Available rules:
- `pod_failed()` - Pod in failed state
- `alert_container_restarting()` - Container restart loop
- `alert_pod_not_ready()` - Pod not ready for extended period
- `high_pvc_fill_rate_over_two_days()` - PVC will fill within 2 days
- `alert_high_memory_usage()` - Memory usage above threshold
- `alert_high_cpu_usage()` - CPU usage above threshold
### Infrastructure Alerts (`alerts/infra/`)
```rust
use harmony::modules::monitoring::alert_rule::alerts::infra::opnsense::high_http_error_rate;

let rules = AlertManagerRuleGroup::new("infra-rules", vec![
    high_http_error_rate(),
]);
```
### Creating Custom Rules
```rust
use harmony::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;

pub fn my_custom_alert() -> PrometheusAlertRule {
    PrometheusAlertRule::new("MyServiceDown", "up{job=\"my-service\"} == 0")
        .for_duration("5m")
        .label("severity", "critical")
        .annotation("summary", "My service is down")
        .annotation("description", "The my-service job has been down for more than 5 minutes")
}
```
---
## Alert Receivers
### Discord Webhook
```rust
use harmony::modules::monitoring::alert_channel::discord_alert_channel::DiscordReceiver;
use harmony::topology::monitoring::{AlertRoute, AlertMatcher, MatchOp};

let discord = DiscordReceiver {
    name: "ops-alerts".to_string(),
    url: hurl!("https://discord.com/api/webhooks/123456/abcdef"),
    route: AlertRoute {
        receiver: "ops-alerts".to_string(),
        matchers: vec![AlertMatcher {
            label: "severity".to_string(),
            operator: MatchOp::Eq,
            value: "critical".to_string(),
        }],
        group_by: vec!["alertname".to_string()],
        repeat_interval: Some("30m".to_string()),
        continue_matching: false,
        children: vec![],
    },
};
```
### Generic Webhook
```rust
use harmony::modules::monitoring::alert_channel::webhook_receiver::WebhookReceiver;

let webhook = WebhookReceiver {
    name: "custom-webhook".to_string(),
    url: hurl!("https://api.example.com/alerts"),
    route: AlertRoute::default("custom-webhook".to_string()),
};
```
---
## Adding a New Monitoring Stack
To add support for a new monitoring stack:
1. **Create the sender type** in `modules/monitoring/my_sender/mod.rs`:
```rust
#[derive(Debug, Clone)]
pub struct MySender;

impl AlertSender for MySender {
    fn name(&self) -> String { "MySender".to_string() }
}
```
2. **Define CRD types** in `modules/monitoring/my_sender/crd/`:
```rust
#[derive(CustomResource, Debug, Serialize, Deserialize, Clone)]
#[kube(group = "monitoring.example.com", version = "v1", kind = "MyAlertRule")]
pub struct MyAlertRuleSpec { ... }
```
3. **Implement Observability** in `domain/topology/k8s_anywhere/observability/my_sender.rs`:
```rust
impl Observability<MySender> for K8sAnywhereTopology {
    async fn install_receivers(&self, sender, inventory, receivers) { ... }
    async fn install_rules(&self, sender, inventory, rules) { ... }
    // ...
}
```
4. **Implement receiver conversions** for existing receivers:
```rust
impl AlertReceiver<MySender> for DiscordReceiver {
    fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
        // Convert DiscordReceiver to MySender's format
    }
}
```
5. **Create score types**:
```rust
pub struct MySenderAlertScore {
    pub sender: MySender,
    pub receivers: Vec<Box<dyn AlertReceiver<MySender>>>,
    pub rules: Vec<Box<dyn AlertRule<MySender>>>,
}
```
---
## Architecture Principles
### Type Safety Over Flexibility
Each monitoring stack has distinct CRDs and configuration formats. Rather than a unified "MonitoringStack" type that loses stack-specific features, we use generic traits that provide type safety while allowing each stack to express its unique configuration.
### Compile-Time Capability Verification
The `Observability<S>` bound ensures you can't deploy OKD alerts to a KubePrometheus cluster. The compiler catches platform mismatches before deployment.
### Explicit Over Implicit
Monitoring stacks are chosen explicitly (`OpenshiftClusterAlertSender` vs `KubePrometheus`). There's no "auto-detection" that could lead to surprising behavior.
### Three Levels, One Foundation
Cluster, tenant, and application monitoring all use the same traits (`AlertSender`, `AlertReceiver`, `AlertRule`). The difference is in how scores are constructed and how topologies interpret them.
---
## Related Documentation
- [ADR-020: Monitoring and Alerting Architecture](../adr/020-monitoring-alerting-architecture.md)
- [ADR-013: Monitoring Notifications (ntfy)](../adr/013-monitoring-notifications.md)
- [ADR-011: Multi-Tenant Cluster Architecture](../adr/011-multi-tenant-cluster.md)
- [Coding Guide](coding-guide.md)
- [Core Concepts](concepts.md)


@@ -7,7 +7,7 @@ use harmony::{
monitoring::alert_channel::webhook_receiver::WebhookReceiver,
tenant::TenantScore,
},
topology::{K8sAnywhereTopology, monitoring::AlertRoute, tenant::TenantConfig},
topology::{K8sAnywhereTopology, tenant::TenantConfig},
};
use harmony_types::id::Id;
use harmony_types::net::Url;
@@ -33,14 +33,9 @@ async fn main() {
service_port: 3000,
});
let receiver_name = "sample-webhook-receiver".to_string();
let webhook_receiver = WebhookReceiver {
name: receiver_name.clone(),
name: "sample-webhook-receiver".to_string(),
url: Url::Url(url::Url::parse("https://webhook-doesnt-exist.com").unwrap()),
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let app = ApplicationScore {


@@ -1,45 +1,37 @@
use std::{
collections::HashMap,
sync::{Arc, Mutex},
};
use std::collections::HashMap;
use harmony::{
inventory::Inventory,
modules::monitoring::{
alert_channel::discord_alert_channel::DiscordReceiver,
alert_rule::{
alerts::{
infra::dell_server::{
alert_global_storage_status_critical,
alert_global_storage_status_non_recoverable,
global_storage_status_degraded_non_critical,
modules::{
monitoring::{
alert_channel::discord_alert_channel::DiscordWebhook,
alert_rule::prometheus_alert_rule::AlertManagerRuleGroup,
kube_prometheus::{
helm_prometheus_alert_score::HelmPrometheusAlertingScore,
types::{
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
ServiceMonitorEndpoint,
},
k8s::pvc::high_pvc_fill_rate_over_two_days,
},
prometheus_alert_rule::AlertManagerRuleGroup,
},
kube_prometheus::{
helm::config::KubePrometheusConfig,
kube_prometheus_alerting_score::KubePrometheusAlertingScore,
types::{
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
ServiceMonitorEndpoint,
prometheus::alerts::{
infra::dell_server::{
alert_global_storage_status_critical, alert_global_storage_status_non_recoverable,
global_storage_status_degraded_non_critical,
},
k8s::pvc::high_pvc_fill_rate_over_two_days,
},
},
topology::{K8sAnywhereTopology, monitoring::AlertRoute},
topology::K8sAnywhereTopology,
};
use harmony_types::{k8s_name::K8sName, net::Url};
#[tokio::main]
async fn main() {
let receiver_name = "test-discord".to_string();
let discord_receiver = DiscordReceiver {
name: receiver_name.clone(),
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
selectors: vec![],
};
let high_pvc_fill_rate_over_two_days_alert = high_pvc_fill_rate_over_two_days();
@@ -78,15 +70,10 @@ async fn main() {
endpoints: vec![service_monitor_endpoint],
..Default::default()
};
let config = Arc::new(Mutex::new(KubePrometheusConfig::new()));
let alerting_score = KubePrometheusAlertingScore {
let alerting_score = HelmPrometheusAlertingScore {
receivers: vec![Box::new(discord_receiver)],
rules: vec![Box::new(additional_rules), Box::new(additional_rules2)],
service_monitors: vec![service_monitor],
scrape_targets: None,
config,
};
harmony_cli::run(


@@ -1,32 +1,24 @@
use std::{
collections::HashMap,
str::FromStr,
sync::{Arc, Mutex},
};
use std::{collections::HashMap, str::FromStr};
use harmony::{
inventory::Inventory,
modules::{
monitoring::{
alert_channel::discord_alert_channel::DiscordReceiver,
alert_rule::{
alerts::k8s::pvc::high_pvc_fill_rate_over_two_days,
prometheus_alert_rule::AlertManagerRuleGroup,
},
alert_channel::discord_alert_channel::DiscordWebhook,
alert_rule::prometheus_alert_rule::AlertManagerRuleGroup,
kube_prometheus::{
helm::config::KubePrometheusConfig,
kube_prometheus_alerting_score::KubePrometheusAlertingScore,
helm_prometheus_alert_score::HelmPrometheusAlertingScore,
types::{
HTTPScheme, MatchExpression, Operator, Selector, ServiceMonitor,
ServiceMonitorEndpoint,
},
},
},
prometheus::alerts::k8s::pvc::high_pvc_fill_rate_over_two_days,
tenant::TenantScore,
},
topology::{
K8sAnywhereTopology,
monitoring::AlertRoute,
tenant::{ResourceLimits, TenantConfig, TenantNetworkPolicy},
},
};
@@ -50,13 +42,10 @@ async fn main() {
},
};
let receiver_name = "test-discord".to_string();
let discord_receiver = DiscordReceiver {
name: receiver_name.clone(),
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
selectors: vec![],
};
let high_pvc_fill_rate_over_two_days_alert = high_pvc_fill_rate_over_two_days();
@@ -85,14 +74,10 @@ async fn main() {
..Default::default()
};
let config = Arc::new(Mutex::new(KubePrometheusConfig::new()));
let alerting_score = KubePrometheusAlertingScore {
let alerting_score = HelmPrometheusAlertingScore {
receivers: vec![Box::new(discord_receiver)],
rules: vec![Box::new(additional_rules)],
service_monitors: vec![service_monitor],
scrape_targets: None,
config,
};
harmony_cli::run(


@@ -14,7 +14,6 @@ async fn main() {
..Default::default() // Use harmony defaults, they are based on CNPG's default values :
// "default" namespace, 1 instance, 1Gi storage
},
hostname: "postgrestest.sto1.nationtech.io".to_string(),
};
harmony_cli::run(


@@ -1,64 +1,35 @@
use std::collections::HashMap;
use harmony::{
inventory::Inventory,
modules::monitoring::{
alert_channel::discord_alert_channel::DiscordReceiver,
alert_rule::{
alerts::{
infra::opnsense::high_http_error_rate, k8s::pvc::high_pvc_fill_rate_over_two_days,
},
prometheus_alert_rule::AlertManagerRuleGroup,
},
okd::openshift_cluster_alerting_score::OpenshiftClusterAlertScore,
scrape_target::prometheus_node_exporter::PrometheusNodeExporter,
},
topology::{
K8sAnywhereTopology,
monitoring::{AlertMatcher, AlertRoute, MatchOp},
alert_channel::discord_alert_channel::DiscordWebhook,
okd::cluster_monitoring::OpenshiftClusterAlertScore,
},
topology::K8sAnywhereTopology,
};
use harmony_macros::{hurl, ip};
use harmony_macros::hurl;
use harmony_types::k8s_name::K8sName;
#[tokio::main]
async fn main() {
let platform_matcher = AlertMatcher {
label: "prometheus".to_string(),
operator: MatchOp::Eq,
value: "openshift-monitoring/k8s".to_string(),
};
let severity = AlertMatcher {
label: "severity".to_string(),
operator: MatchOp::Eq,
value: "critical".to_string(),
};
let high_http_error_rate = high_http_error_rate();
let additional_rules = AlertManagerRuleGroup::new("test-rule", vec![high_http_error_rate]);
let scrape_target = PrometheusNodeExporter {
job_name: "firewall".to_string(),
metrics_path: "/metrics".to_string(),
listen_address: ip!("192.168.1.1"),
port: 9100,
..Default::default()
};
let mut sel = HashMap::new();
sel.insert(
"openshift_io_alert_source".to_string(),
"platform".to_string(),
);
let mut sel2 = HashMap::new();
sel2.insert("openshift_io_alert_source".to_string(), "".to_string());
let selectors = vec![sel, sel2];
harmony_cli::run(
Inventory::autoload(),
K8sAnywhereTopology::from_env(),
vec![Box::new(OpenshiftClusterAlertScore {
receivers: vec![Box::new(DiscordReceiver {
name: "crit-wills-discord-channel-example".to_string(),
url: hurl!("https://test.io"),
route: AlertRoute {
matchers: vec![severity],
..AlertRoute::default("crit-wills-discord-channel-example".to_string())
},
receivers: vec![Box::new(DiscordWebhook {
name: K8sName("wills-discord-webhook-example".to_string()),
url: hurl!("https://something.io"),
selectors: selectors,
})],
sender: harmony::modules::monitoring::okd::OpenshiftClusterAlertSender,
rules: vec![Box::new(additional_rules)],
scrape_targets: Some(vec![Box::new(scrape_target)]),
})],
None,
)


@@ -6,7 +6,10 @@ use harmony::{
data::{FileContent, FilePath},
modules::{
inventory::HarmonyDiscoveryStrategy,
okd::{installation::OKDInstallationPipeline, ipxe::OKDIpxeScore},
okd::{
installation::OKDInstallationPipeline, ipxe::OKDIpxeScore,
load_balancer::OKDLoadBalancerScore,
},
},
score::Score,
topology::HAClusterTopology,
@@ -32,6 +35,7 @@ async fn main() {
scores
.append(&mut OKDInstallationPipeline::get_all_scores(HarmonyDiscoveryStrategy::MDNS).await);
scores.push(Box::new(OKDLoadBalancerScore::new(&topology)));
harmony_cli::run(inventory, topology, scores, None)
.await
.unwrap();


@@ -15,7 +15,6 @@ async fn main() {
..Default::default() // Use harmony defaults, they are based on CNPG's default values :
// 1 instance, 1Gi storage
},
hostname: "postgrestest.sto1.nationtech.io".to_string(),
};
let test_connection = PostgreSQLConnectionScore {


@@ -6,9 +6,9 @@ use harmony::{
application::{
ApplicationScore, RustWebFramework, RustWebapp, features::rhob_monitoring::Monitoring,
},
monitoring::alert_channel::discord_alert_channel::DiscordReceiver,
monitoring::alert_channel::discord_alert_channel::DiscordWebhook,
},
topology::{K8sAnywhereTopology, monitoring::AlertRoute},
topology::K8sAnywhereTopology,
};
use harmony_types::{k8s_name::K8sName, net::Url};
@@ -22,21 +22,18 @@ async fn main() {
service_port: 3000,
});
let receiver_name = "test-discord".to_string();
let discord_receiver = DiscordReceiver {
name: receiver_name.clone(),
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
selectors: vec![],
};
let app = ApplicationScore {
features: vec![
// Box::new(Monitoring {
// application: application.clone(),
// alert_receiver: vec![Box::new(discord_receiver)],
// }),
Box::new(Monitoring {
application: application.clone(),
alert_receiver: vec![Box::new(discord_receiver)],
}),
// TODO add backups, multisite ha, etc
],
application,


@@ -8,13 +8,13 @@ use harmony::{
features::{Monitoring, PackagingDeployment},
},
monitoring::alert_channel::{
discord_alert_channel::DiscordReceiver, webhook_receiver::WebhookReceiver,
discord_alert_channel::DiscordWebhook, webhook_receiver::WebhookReceiver,
},
},
topology::{K8sAnywhereTopology, monitoring::AlertRoute},
topology::K8sAnywhereTopology,
};
use harmony_macros::hurl;
use harmony_types::{k8s_name::K8sName, net::Url};
use harmony_types::k8s_name::K8sName;
#[tokio::main]
async fn main() {
@@ -26,23 +26,15 @@ async fn main() {
service_port: 3000,
});
let receiver_name = "test-discord".to_string();
let discord_receiver = DiscordReceiver {
name: receiver_name.clone(),
url: Url::Url(url::Url::parse("https://discord.doesnt.exist.com").unwrap()),
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
url: hurl!("https://discord.doesnt.exist.com"),
selectors: vec![],
};
let receiver_name = "sample-webhook-receiver".to_string();
let webhook_receiver = WebhookReceiver {
name: receiver_name.clone(),
name: "sample-webhook-receiver".to_string(),
url: hurl!("https://webhook-doesnt-exist.com"),
route: AlertRoute {
..AlertRoute::default(receiver_name)
},
};
let app = ApplicationScore {
@@ -50,10 +42,10 @@ async fn main() {
Box::new(PackagingDeployment {
application: application.clone(),
}),
// Box::new(Monitoring {
// application: application.clone(),
// alert_receiver: vec![Box::new(discord_receiver), Box::new(webhook_receiver)],
// }),
Box::new(Monitoring {
application: application.clone(),
alert_receiver: vec![Box::new(discord_receiver), Box::new(webhook_receiver)],
}),
// TODO add backups, multisite ha, etc
],
application,


@@ -1,8 +1,11 @@
use harmony::{
inventory::Inventory,
modules::application::{
ApplicationScore, RustWebFramework, RustWebapp,
features::{Monitoring, PackagingDeployment},
modules::{
application::{
ApplicationScore, RustWebFramework, RustWebapp,
features::{Monitoring, PackagingDeployment},
},
monitoring::alert_channel::discord_alert_channel::DiscordWebhook,
},
topology::K8sAnywhereTopology,
};
@@ -27,14 +30,14 @@ async fn main() {
Box::new(PackagingDeployment {
application: application.clone(),
}),
// Box::new(Monitoring {
// application: application.clone(),
// alert_receiver: vec![Box::new(DiscordWebhook {
// name: K8sName("test-discord".to_string()),
// url: hurl!("https://discord.doesnt.exist.com"),
// selectors: vec![],
// })],
// }),
Box::new(Monitoring {
application: application.clone(),
alert_receiver: vec![Box::new(DiscordWebhook {
name: K8sName("test-discord".to_string()),
url: hurl!("https://discord.doesnt.exist.com"),
selectors: vec![],
})],
}),
],
application,
};


@@ -44,6 +44,7 @@ fn build_large_score() -> LoadBalancerScore {
],
listening_port: SocketAddr::V4(SocketAddrV4::new(ipv4!("192.168.0.0"), 49387)),
health_check: Some(HealthCheck::HTTP(
Some(1993),
"/some_long_ass_path_to_see_how_it_is_displayed_but_it_has_to_be_even_longer"
.to_string(),
HttpMethod::GET,

examples/vllm/Cargo.toml Normal file

@@ -0,0 +1,15 @@
[package]
name = "vllm"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
publish = false
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
k8s-openapi = { workspace = true }
tokio = { workspace = true }
log = { workspace = true }
env_logger = { workspace = true }

examples/vllm/src/main.rs Normal file

@@ -0,0 +1,523 @@
//! vLLM Deployment Example for Qwen3.5-27B-FP8 on NVIDIA RTX 5090
//!
//! This example deploys vLLM serving Qwen3.5-27B with FP8 quantization,
//! optimized for single RTX 5090 (32GB VRAM) with tool calling support.
//!
//! # Architecture & Memory Constraints
//!
//! **Model Details:**
//! - Parameters: 27B (dense, not sparse/MoE)
//! - Quantization: FP8 (8-bit weights)
//! - Model size: ~27-28GB in memory
//! - Native context: 262,144 tokens (will NOT fit in 32GB VRAM)
//!
//! **VRAM Budget for RTX 5090 (32GB):**
//! - Model weights (FP8): ~27GB
//! - Framework overhead: ~1-2GB
//! - KV cache: ~2-3GB (for 16k context)
//! - CUDA context: ~500MB
//! - Temporary buffers: ~500MB
//! - **Total: ~31-33GB** (tight fit, leaves minimal headroom)
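//!   (that is 27 + 1-2 + 2-3 + 0.5 + 0.5 ≈ 31-33GB, at or above the card's 32GB)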
//!
//! # OpenShift/OKD Requirements
//!
//! **SCC (Security Context Constraint) Setup:**
//!
//! The official vLLM container runs as root and writes to `/root/.cache/huggingface`.
//! On OpenShift/OKD with the default restricted SCC, containers run as arbitrary UIDs
//! and cannot write to `/root`. For testing, grant the `anyuid` SCC:
//!
//! ```bash
//! # As cluster admin, grant anyuid SCC to the namespace's service account:
//! oc adm policy add-scc-to-user anyuid -z default -n vllm-qwen
//! ```
//!
//! This allows pods in the `vllm-qwen` namespace to run as root (UID 0).
//! For production, consider building a custom vLLM image that runs as non-root.
//!
//! # Critical Configuration Notes
//!
//! 1. **GPU_MEMORY_UTILIZATION=1.0**: Maximum GPU memory allocation.
//! NEVER decrease this for dense models - CPU offloading destroys performance
//! (100-1000x slower) for models where every parameter is used during inference.
//!
//! 2. **MAX_MODEL_LEN=16384**: Conservative context length that fits in available VRAM.
//! Agentic workflows with long tool call histories will need careful context management.
//!
//! 3. **--language-model-only**: Skips loading the vision encoder, saving ~1-2GB VRAM.
//! Essential for fitting the model in 32GB VRAM.
//!
//! 4. **PVC Size**: 50Gi for HuggingFace cache. Qwen3.5-27B-FP8 is ~30GB.
//!
//! # Performance Expectations
//!
//! - Single token latency: ~50-100ms (no CPU offloading)
//! - With CPU offloading: ~5-50 seconds per token (unusable for real-time inference)
//! - Throughput: ~10-20 tokens/second (single stream, no batching)
//!
//! # Next Steps for Production
//!
//! To increase context length:
//! 1. Monitor GPU memory: `kubectl exec -it deployment/qwen3-5-27b -- nvidia-smi dmon -s u`
//! 2. If stable, increase MAX_MODEL_LEN (try 32768, then 65536)
//! 3. If OOM: revert to lower value
//!
//! For full 262k context, consider:
//! - Multi-GPU setup with tensor parallelism (--tensor-parallel-size 8)
//! - Or use a smaller model (Qwen3.5-7B-FP8)
use std::collections::BTreeMap;
use harmony::{
inventory::Inventory,
modules::{
k8s::resource::K8sResourceScore,
okd::{
crd::route::{RoutePort, RouteSpec, RouteTargetReference, TLSConfig},
route::OKDRouteScore,
},
},
score::Score,
topology::{K8sAnywhereTopology, TlsRouter},
};
use k8s_openapi::{
api::{
apps::v1::{Deployment, DeploymentSpec, DeploymentStrategy},
core::v1::{
Container, ContainerPort, EmptyDirVolumeSource, EnvVar, EnvVarSource,
HTTPGetAction, PersistentVolumeClaim, PersistentVolumeClaimSpec,
PersistentVolumeClaimVolumeSource, PodSpec, PodTemplateSpec, Probe,
ResourceRequirements, Secret, SecretKeySelector, SecretVolumeSource, Service,
ServicePort, ServiceSpec, Volume, VolumeMount, VolumeResourceRequirements,
},
},
apimachinery::pkg::{
api::resource::Quantity,
apis::meta::v1::{LabelSelector, ObjectMeta},
util::intstr::IntOrString,
},
ByteString,
};
use log::info;
const NAMESPACE: &str = "vllm-qwen";
const MODEL_NAME: &str = "Qwen/Qwen3.5-27B-FP8";
const DEPLOYMENT_NAME: &str = "qwen3-5-27b";
const SERVICE_NAME: &str = DEPLOYMENT_NAME;
const ROUTE_NAME: &str = DEPLOYMENT_NAME;
const PVC_NAME: &str = "huggingface-cache";
const SECRET_NAME: &str = "hf-token-secret";
const VLLM_IMAGE: &str = "vllm/vllm-openai:latest";
const SERVICE_PORT: u16 = 8000;
const TARGET_PORT: u16 = 8000;
/// Maximum context length for the model (in tokens).
///
/// **Impact on VRAM:**
/// - Qwen3.5-27B uses per-token KV cache storage for the context window
/// - Larger context = more KV cache memory required
/// - Approximate KV cache per token: ~32KB for FP8 (very rough estimate)
/// - 16k tokens ≈ 0.5-1GB KV cache
/// - 262k tokens ≈ 8-16GB KV cache (native context length - will NOT fit in 32GB VRAM)
///
/// **Performance Impact:**
/// - Context length directly impacts memory for storing conversation history
/// - Agentic workflows with long tool call histories benefit from more context
/// - If context > available VRAM, vLLM will OOM and fail to start
///
/// **Recommendations for RTX 5090 (32GB):**
/// - Start with 16384 (conservative, should work)
/// - If no OOM, try 32768 (better for agentic workflows)
/// - Monitor GPU memory with `nvidia-smi` during operation
const MAX_MODEL_LEN: i64 = 16384;
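
/// Back-of-envelope KV-cache size for a given context length, using the ~32KB
/// per-token figure quoted above. Illustrative only -- it is not used by the
/// deployment, and real usage depends on model architecture and vLLM version.
///
/// e.g. `estimated_kv_cache_gib(16_384)` ≈ 0.5 GiB,
/// `estimated_kv_cache_gib(262_144)` ≈ 8 GiB.
#[allow(dead_code)]
fn estimated_kv_cache_gib(context_tokens: i64) -> f64 {
    const KV_BYTES_PER_TOKEN: i64 = 32 * 1024; // very rough FP8 estimate
    (context_tokens * KV_BYTES_PER_TOKEN) as f64 / (1024.0 * 1024.0 * 1024.0)
}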
/// Fraction of GPU memory to allocate for the model (0.0 to 1.0).
///
/// **CRITICAL WARNING: This is a dense model!**
/// Qwen3.5-27B-FP8 is NOT a sparse/mixture-of-experts model. All 27B parameters
/// are active during inference. CPU offloading will DESTROY performance.
///
/// **What this parameter controls:**
/// - Controls how much of GPU memory vLLM pre-allocates for:
/// 1. Model weights (~27GB for FP8 quantization)
/// 2. KV cache for context window
/// 3. Activation buffers for inference
/// 4. Runtime overhead
///
/// **VRAM Allocation Example:**
/// - GPU: 32GB RTX 5090
/// - GPU_MEMORY_UTILIZATION: 0.95
/// - vLLM will try to use: 32GB * 0.95 = 30.4GB
/// - Model weights: ~27-28GB
/// - Remaining for KV cache + runtime: ~2-3GB
///
/// **If set too LOW (e.g., 0.7):**
/// - vLLM restricts itself to 32GB * 0.7 = 22.4GB
/// - Model weights alone need ~27GB
/// - vLLM will OFFLOAD model weights to CPU memory
/// - Performance: **100-1000x slower** (single token generation can take seconds instead of milliseconds)
/// - This is catastrophic for a dense model where every layer needs all parameters
///
/// **If set too HIGH (e.g., 0.99):**
/// - vLLM tries to allocate nearly all GPU memory
/// - Risk: CUDA OOM if any other process needs GPU memory
/// - Risk: KV cache allocation fails during inference
/// - System instability
///
/// **Current Setting: 1.0**
/// - Gives vLLM the full 32GB: the FP8 weights (~27-28GB) plus the KV cache
///   for a 16k context need essentially all of it
/// - Assumes the GPU is fully dedicated to vLLM (no headroom is reserved)
/// - 0.95 is the conservative alternative: ~30.4GB for model + KV cache, with
///   ~1.6GB left for CUDA context, temporary buffers and a safety margin
///
/// **How to tune:**
/// 1. Start from the current setting (1.0, GPU fully dedicated to vLLM)
/// 2. Monitor with `nvidia-smi dmon -s u` during operation
/// 3. If OOM during inference: reduce MAX_MODEL_LEN first
/// 4. If stable: try increasing MAX_MODEL_LEN before touching this
/// 5. Only keep this at 1.0 if you're certain no other GPU processes run;
///    otherwise drop to 0.95
///
/// **NEVER decrease this for dense models!**
/// If the model doesn't fit, use a smaller model or quantization, not CPU offloading.
const GPU_MEMORY_UTILIZATION: f32 = 1.0;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
env_logger::init();
info!("Deploying vLLM with Qwen3.5-27B-FP8 model");
info!("Configuration:");
info!(" Model: {}", MODEL_NAME);
info!(" Max context length: {} tokens", MAX_MODEL_LEN);
info!(" GPU memory utilization: {}", GPU_MEMORY_UTILIZATION);
info!(" Language model only: true");
info!(" Tool calling enabled: true");
let topology = K8sAnywhereTopology::from_env();
let domain = topology
.get_internal_domain()
.await
.ok()
.flatten()
.unwrap_or_else(|| "cluster.local".to_string());
let host = format!("{}-{}.apps.{}", SERVICE_NAME, NAMESPACE, domain);
info!("Creating route with host: {}", host);
let scores: Vec<Box<dyn Score<K8sAnywhereTopology>>> = vec![
create_namespace(),
create_pvc(),
create_secret(),
create_deployment(),
create_service(),
create_route(&host),
];
harmony_cli::run(Inventory::autoload(), topology, scores, None)
.await
.map_err(|e| format!("Failed to deploy: {}", e))?;
info!("Successfully deployed vLLM with Qwen3.5-27B-FP8");
info!("Access the API at: http://{}.apps.<cluster-domain>", SERVICE_NAME);
Ok(())
}
fn create_namespace() -> Box<dyn Score<K8sAnywhereTopology>> {
use k8s_openapi::api::core::v1::Namespace;
let namespace = Namespace {
metadata: ObjectMeta {
name: Some(NAMESPACE.to_string()),
..Default::default()
},
spec: None,
status: None,
};
Box::new(K8sResourceScore::single(namespace, None))
}
fn create_pvc() -> Box<dyn Score<K8sAnywhereTopology>> {
let pvc = PersistentVolumeClaim {
metadata: ObjectMeta {
name: Some(PVC_NAME.to_string()),
namespace: Some(NAMESPACE.to_string()),
..Default::default()
},
spec: Some(PersistentVolumeClaimSpec {
access_modes: Some(vec!["ReadWriteOnce".to_string()]),
resources: Some(VolumeResourceRequirements {
requests: Some(BTreeMap::from([(
"storage".to_string(),
Quantity("50Gi".to_string()),
)])),
limits: None,
}),
..Default::default()
}),
status: None,
};
Box::new(K8sResourceScore::single(
pvc,
Some(NAMESPACE.to_string()),
))
}
fn create_secret() -> Box<dyn Score<K8sAnywhereTopology>> {
let mut data = BTreeMap::new();
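    // The token value is deliberately left empty: the Secret volume and env var
    // below are marked `optional: true`, so public models still download fine.
    // Populate it out-of-band if gated/private HuggingFace models are needed.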
data.insert(
"token".to_string(),
ByteString("".to_string().into_bytes()),
);
let secret = Secret {
metadata: ObjectMeta {
name: Some(SECRET_NAME.to_string()),
namespace: Some(NAMESPACE.to_string()),
..Default::default()
},
data: Some(data),
immutable: Some(false),
type_: Some("Opaque".to_string()),
string_data: None,
};
Box::new(K8sResourceScore::single(
secret,
Some(NAMESPACE.to_string()),
))
}
fn create_deployment() -> Box<dyn Score<K8sAnywhereTopology>> {
let deployment = Deployment {
metadata: ObjectMeta {
name: Some(DEPLOYMENT_NAME.to_string()),
namespace: Some(NAMESPACE.to_string()),
labels: Some(BTreeMap::from([(
"app".to_string(),
DEPLOYMENT_NAME.to_string(),
)])),
..Default::default()
},
spec: Some(DeploymentSpec {
replicas: Some(1),
selector: LabelSelector {
match_labels: Some(BTreeMap::from([(
"app".to_string(),
DEPLOYMENT_NAME.to_string(),
)])),
..Default::default()
},
strategy: Some(DeploymentStrategy {
type_: Some("Recreate".to_string()),
..Default::default()
}),
template: PodTemplateSpec {
metadata: Some(ObjectMeta {
labels: Some(BTreeMap::from([(
"app".to_string(),
DEPLOYMENT_NAME.to_string(),
)])),
..Default::default()
}),
spec: Some(PodSpec {
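                        // Pin scheduling to nodes exposing an RTX 5090; this label
                        // is typically published by NVIDIA GPU Feature Discovery.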
node_selector: Some(BTreeMap::from([(
"nvidia.com/gpu.product".to_string(),
"NVIDIA-GeForce-RTX-5090".to_string(),
)])),
volumes: Some(vec![
Volume {
name: "cache-volume".to_string(),
persistent_volume_claim: Some(PersistentVolumeClaimVolumeSource {
claim_name: PVC_NAME.to_string(),
read_only: Some(false),
}),
..Default::default()
},
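                            // Memory-backed /dev/shm: vLLM/PyTorch worker IPC needs
                            // more shared memory than the container runtime default.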
Volume {
name: "shm".to_string(),
empty_dir: Some(EmptyDirVolumeSource {
medium: Some("Memory".to_string()),
size_limit: Some(Quantity("4Gi".to_string())),
}),
..Default::default()
},
Volume {
name: "hf-token".to_string(),
secret: Some(SecretVolumeSource {
secret_name: Some(SECRET_NAME.to_string()),
optional: Some(true),
..Default::default()
}),
..Default::default()
},
]),
containers: vec![Container {
name: DEPLOYMENT_NAME.to_string(),
image: Some(VLLM_IMAGE.to_string()),
command: Some(vec!["/bin/sh".to_string(), "-c".to_string()]),
args: Some(vec![build_vllm_command()]),
env: Some(vec![
EnvVar {
name: "HF_TOKEN".to_string(),
value_from: Some(EnvVarSource {
secret_key_ref: Some(SecretKeySelector {
key: "token".to_string(),
name: SECRET_NAME.to_string(),
optional: Some(true),
}),
..Default::default()
}),
value: None,
},
EnvVar {
name: "VLLM_WORKER_MULTIPROC_METHOD".to_string(),
value: Some("spawn".to_string()),
value_from: None,
},
]),
ports: Some(vec![ContainerPort {
container_port: SERVICE_PORT as i32,
protocol: Some("TCP".to_string()),
..Default::default()
}]),
resources: Some(ResourceRequirements {
limits: Some(BTreeMap::from([
("cpu".to_string(), Quantity("10".to_string())),
("memory".to_string(), Quantity("30Gi".to_string())),
("nvidia.com/gpu".to_string(), Quantity("1".to_string())),
])),
requests: Some(BTreeMap::from([
("cpu".to_string(), Quantity("2".to_string())),
("memory".to_string(), Quantity("10Gi".to_string())),
("nvidia.com/gpu".to_string(), Quantity("1".to_string())),
])),
claims: None,
}),
volume_mounts: Some(vec![
VolumeMount {
name: "cache-volume".to_string(),
mount_path: "/root/.cache/huggingface".to_string(),
read_only: Some(false),
..Default::default()
},
VolumeMount {
name: "shm".to_string(),
mount_path: "/dev/shm".to_string(),
..Default::default()
},
VolumeMount {
name: "hf-token".to_string(),
mount_path: "/etc/secrets/hf-token".to_string(),
read_only: Some(true),
..Default::default()
},
]),
liveness_probe: Some(Probe {
http_get: Some(HTTPGetAction {
path: Some("/health".to_string()),
port: IntOrString::Int(SERVICE_PORT as i32),
..Default::default()
}),
initial_delay_seconds: Some(300),
period_seconds: Some(30),
..Default::default()
}),
readiness_probe: Some(Probe {
http_get: Some(HTTPGetAction {
path: Some("/health".to_string()),
port: IntOrString::Int(SERVICE_PORT as i32),
..Default::default()
}),
initial_delay_seconds: Some(120),
period_seconds: Some(10),
..Default::default()
}),
..Default::default()
}],
..Default::default()
}),
},
..Default::default()
}),
status: None,
};
Box::new(K8sResourceScore::single(
deployment,
Some(NAMESPACE.to_string()),
))
}
fn build_vllm_command() -> String {
format!(
"vllm serve {} \
--port {} \
--max-model-len {} \
--gpu-memory-utilization {} \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--language-model-only",
MODEL_NAME, SERVICE_PORT, MAX_MODEL_LEN, GPU_MEMORY_UTILIZATION
)
}
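
// Sanity check for the generated serve command -- a sketch that simply mirrors
// the constants above and should be updated alongside them.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn vllm_command_wires_model_and_limits() {
        let cmd = build_vllm_command();
        assert!(cmd.starts_with("vllm serve"));
        assert!(cmd.contains(MODEL_NAME));
        assert!(cmd.contains(&format!("--port {SERVICE_PORT}")));
        assert!(cmd.contains(&format!("--max-model-len {MAX_MODEL_LEN}")));
    }
}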
fn create_service() -> Box<dyn Score<K8sAnywhereTopology>> {
let service = Service {
metadata: ObjectMeta {
name: Some(SERVICE_NAME.to_string()),
namespace: Some(NAMESPACE.to_string()),
..Default::default()
},
spec: Some(ServiceSpec {
ports: Some(vec![ServicePort {
name: Some("http".to_string()),
port: SERVICE_PORT as i32,
protocol: Some("TCP".to_string()),
target_port: Some(IntOrString::Int(TARGET_PORT as i32)),
..Default::default()
}]),
selector: Some(BTreeMap::from([(
"app".to_string(),
DEPLOYMENT_NAME.to_string(),
)])),
type_: Some("ClusterIP".to_string()),
..Default::default()
}),
status: None,
};
Box::new(K8sResourceScore::single(
service,
Some(NAMESPACE.to_string()),
))
}
fn create_route(host: &str) -> Box<dyn Score<K8sAnywhereTopology>> {
let route_spec = RouteSpec {
to: RouteTargetReference {
kind: "Service".to_string(),
name: SERVICE_NAME.to_string(),
weight: Some(100),
},
host: Some(host.to_string()),
port: Some(RoutePort {
            target_port: SERVICE_PORT,
}),
tls: Some(TLSConfig {
termination: "edge".to_string(),
insecure_edge_termination_policy: Some("Redirect".to_string()),
..Default::default()
}),
wildcard_policy: None,
..Default::default()
};
Box::new(OKDRouteScore::new(ROUTE_NAME, NAMESPACE, route_spec))
}

View File

@@ -1,4 +1,4 @@
use std::{collections::BTreeMap, process::Command, sync::Arc};
use std::{collections::BTreeMap, process::Command, sync::Arc, time::Duration};
use async_trait::async_trait;
use base64::{Engine, engine::general_purpose};
@@ -8,7 +8,7 @@ use k8s_openapi::api::{
core::v1::{Pod, Secret},
rbac::v1::{ClusterRoleBinding, RoleRef, Subject},
};
use kube::api::{GroupVersionKind, ObjectMeta};
use kube::api::{DynamicObject, GroupVersionKind, ObjectMeta};
use log::{debug, info, trace, warn};
use serde::Serialize;
use tokio::sync::OnceCell;
@@ -29,7 +29,28 @@ use crate::{
score_cert_management::CertificateManagementScore,
},
k3d::K3DInstallationScore,
k8s::ingress::{K8sIngressScore, PathType},
monitoring::{
grafana::{grafana::Grafana, helm::helm_grafana::grafana_helm_chart_score},
kube_prometheus::crd::{
crd_alertmanager_config::CRDPrometheus,
crd_grafana::{
Grafana as GrafanaCRD, GrafanaCom, GrafanaDashboard,
GrafanaDashboardDatasource, GrafanaDashboardSpec, GrafanaDatasource,
GrafanaDatasourceConfig, GrafanaDatasourceJsonData,
GrafanaDatasourceSecureJsonData, GrafanaDatasourceSpec, GrafanaSpec,
},
crd_prometheuses::LabelSelector,
prometheus_operator::prometheus_operator_helm_chart_score,
rhob_alertmanager_config::RHOBObservability,
service_monitor::ServiceMonitor,
},
},
okd::{crd::ingresses_config::Ingress as IngressResource, route::OKDTlsPassthroughScore},
prometheus::{
k8s_prometheus_alerting_score::K8sPrometheusCRDAlertingScore,
prometheus::PrometheusMonitoring, rhob_alerting_score::RHOBAlertingScore,
},
},
score::Score,
topology::{TlsRoute, TlsRouter, ingress::Ingress},
@@ -38,6 +59,7 @@ use crate::{
use super::super::{
DeploymentTarget, HelmCommand, K8sclient, MultiTargetTopology, PreparationError,
PreparationOutcome, Topology,
oberservability::monitoring::AlertReceiver,
tenant::{
TenantConfig, TenantManager,
k8s::K8sTenantManager,
@@ -87,6 +109,13 @@ impl K8sclient for K8sAnywhereTopology {
#[async_trait]
impl TlsRouter for K8sAnywhereTopology {
async fn get_public_domain(&self) -> Result<String, String> {
match &self.config.public_domain {
Some(public_domain) => Ok(public_domain.to_string()),
None => Err("Public domain not available".to_string()),
}
}
async fn get_internal_domain(&self) -> Result<Option<String>, String> {
match self.get_k8s_distribution().await.map_err(|e| {
format!(
@@ -144,6 +173,216 @@ impl TlsRouter for K8sAnywhereTopology {
}
}
#[async_trait]
impl Grafana for K8sAnywhereTopology {
async fn ensure_grafana_operator(
&self,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
debug!("ensure grafana operator");
        let client = self.k8s_client().await?;
let grafana_gvk = GroupVersionKind {
group: "grafana.integreatly.org".to_string(),
version: "v1beta1".to_string(),
kind: "Grafana".to_string(),
};
let name = "grafanas.grafana.integreatly.org";
let ns = "grafana";
let grafana_crd = client
.get_resource_json_value(name, Some(ns), &grafana_gvk)
.await;
        match grafana_crd {
            Ok(_) => Ok(PreparationOutcome::Success {
                details: "Found grafana CRDs in cluster".to_string(),
            }),
            Err(_) => {
                self.install_grafana_operator(inventory, Some("grafana"))
                    .await
            }
        }
}
async fn install_grafana(&self) -> Result<PreparationOutcome, PreparationError> {
let ns = "grafana";
let mut label = BTreeMap::new();
label.insert("dashboards".to_string(), "grafana".to_string());
let label_selector = LabelSelector {
match_labels: label.clone(),
match_expressions: vec![],
};
let client = self.k8s_client().await?;
let grafana = self.build_grafana(ns, &label);
client.apply(&grafana, Some(ns)).await?;
        //TODO change this to an ensure-ready check or something better than just a timeout
client
.wait_until_deployment_ready(
"grafana-grafana-deployment",
Some("grafana"),
Some(Duration::from_secs(30)),
)
.await?;
let sa_name = "grafana-grafana-sa";
let token_secret_name = "grafana-sa-token-secret";
let sa_token_secret = self.build_sa_token_secret(token_secret_name, sa_name, ns);
client.apply(&sa_token_secret, Some(ns)).await?;
let secret_gvk = GroupVersionKind {
group: "".to_string(),
version: "v1".to_string(),
kind: "Secret".to_string(),
};
let secret = client
.get_resource_json_value(token_secret_name, Some(ns), &secret_gvk)
.await?;
        // `build_grafana_datasource` adds the "Bearer " prefix itself.
        let token = self
            .extract_and_normalize_token(&secret)
            .ok_or_else(|| PreparationError::new("could not extract grafana SA token".to_string()))?;
debug!("creating grafana clusterrole binding");
let clusterrolebinding =
self.build_cluster_rolebinding(sa_name, "cluster-monitoring-view", ns);
client.apply(&clusterrolebinding, Some(ns)).await?;
debug!("creating grafana datasource crd");
let thanos_url = format!(
"https://{}",
self.get_domain("thanos-querier-openshift-monitoring")
.await
.unwrap()
);
let thanos_openshift_datasource = self.build_grafana_datasource(
"thanos-openshift-monitoring",
ns,
&label_selector,
&thanos_url,
&token,
);
client.apply(&thanos_openshift_datasource, Some(ns)).await?;
debug!("creating grafana dashboard crd");
let dashboard = self.build_grafana_dashboard(ns, &label_selector);
client.apply(&dashboard, Some(ns)).await?;
debug!("creating grafana ingress");
let grafana_ingress = self.build_grafana_ingress(ns).await;
grafana_ingress
.interpret(&Inventory::empty(), self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
Ok(PreparationOutcome::Success {
details: "Installed grafana composants".to_string(),
})
}
}
#[async_trait]
impl PrometheusMonitoring<CRDPrometheus> for K8sAnywhereTopology {
async fn install_prometheus(
&self,
sender: &CRDPrometheus,
_inventory: &Inventory,
_receivers: Option<Vec<Box<dyn AlertReceiver<CRDPrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let client = self.k8s_client().await?;
for monitor in sender.service_monitor.iter() {
client
.apply(monitor, Some(&sender.namespace))
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
}
Ok(PreparationOutcome::Success {
details: "successfuly installed prometheus components".to_string(),
})
}
async fn ensure_prometheus_operator(
&self,
sender: &CRDPrometheus,
_inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let po_result = self.ensure_prometheus_operator(sender).await?;
match po_result {
PreparationOutcome::Success { details: _ } => {
debug!("Detected prometheus crds operator present in cluster.");
return Ok(po_result);
}
PreparationOutcome::Noop => {
debug!("Skipping Prometheus CR installation due to missing operator.");
return Ok(po_result);
}
}
}
}
#[async_trait]
impl PrometheusMonitoring<RHOBObservability> for K8sAnywhereTopology {
async fn install_prometheus(
&self,
sender: &RHOBObservability,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<RHOBObservability>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let po_result = self.ensure_cluster_observability_operator(sender).await?;
if po_result == PreparationOutcome::Noop {
debug!("Skipping Prometheus CR installation due to missing operator.");
return Ok(po_result);
}
let result = self
.get_cluster_observability_operator_prometheus_application_score(
sender.clone(),
receivers,
)
.await
.interpret(inventory, self)
.await;
match result {
Ok(outcome) => match outcome.status {
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
details: outcome.message,
}),
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
_ => Err(PreparationError::new(outcome.message)),
},
Err(err) => Err(PreparationError::new(err.to_string())),
}
}
    async fn ensure_prometheus_operator(
        &self,
        _sender: &RHOBObservability,
        _inventory: &Inventory,
    ) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
}
impl Serialize for K8sAnywhereTopology {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
@@ -348,6 +587,23 @@ impl K8sAnywhereTopology {
}
}
fn extract_and_normalize_token(&self, secret: &DynamicObject) -> Option<String> {
let token_b64 = secret
.data
.get("token")
.or_else(|| secret.data.get("data").and_then(|d| d.get("token")))
.and_then(|v| v.as_str())?;
let bytes = general_purpose::STANDARD.decode(token_b64).ok()?;
let s = String::from_utf8(bytes).ok()?;
let cleaned = s
.trim_matches(|c: char| c.is_whitespace() || c == '\0')
.to_string();
Some(cleaned)
}
pub async fn get_k8s_distribution(&self) -> Result<KubernetesDistribution, PreparationError> {
self.k8s_client()
.await?
@@ -407,6 +663,141 @@ impl K8sAnywhereTopology {
}
}
fn build_grafana_datasource(
&self,
name: &str,
ns: &str,
label_selector: &LabelSelector,
url: &str,
token: &str,
) -> GrafanaDatasource {
let mut json_data = BTreeMap::new();
json_data.insert("timeInterval".to_string(), "5s".to_string());
GrafanaDatasource {
metadata: ObjectMeta {
name: Some(name.to_string()),
namespace: Some(ns.to_string()),
..Default::default()
},
spec: GrafanaDatasourceSpec {
instance_selector: label_selector.clone(),
allow_cross_namespace_import: Some(true),
values_from: None,
datasource: GrafanaDatasourceConfig {
access: "proxy".to_string(),
name: name.to_string(),
r#type: "prometheus".to_string(),
url: url.to_string(),
database: None,
json_data: Some(GrafanaDatasourceJsonData {
time_interval: Some("60s".to_string()),
http_header_name1: Some("Authorization".to_string()),
tls_skip_verify: Some(true),
oauth_pass_thru: Some(true),
}),
secure_json_data: Some(GrafanaDatasourceSecureJsonData {
http_header_value1: Some(format!("Bearer {token}")),
}),
is_default: Some(false),
editable: Some(true),
},
},
}
}
fn build_grafana_dashboard(
&self,
ns: &str,
label_selector: &LabelSelector,
) -> GrafanaDashboard {
        GrafanaDashboard {
metadata: ObjectMeta {
name: Some(format!("grafana-dashboard-{}", ns)),
namespace: Some(ns.to_string()),
..Default::default()
},
spec: GrafanaDashboardSpec {
resync_period: Some("30s".to_string()),
instance_selector: label_selector.clone(),
datasources: Some(vec![GrafanaDashboardDatasource {
input_name: "DS_PROMETHEUS".to_string(),
datasource_name: "thanos-openshift-monitoring".to_string(),
}]),
json: None,
grafana_com: Some(GrafanaCom {
id: 17406,
revision: None,
}),
},
        }
}
fn build_grafana(&self, ns: &str, labels: &BTreeMap<String, String>) -> GrafanaCRD {
        GrafanaCRD {
metadata: ObjectMeta {
name: Some(format!("grafana-{}", ns)),
namespace: Some(ns.to_string()),
labels: Some(labels.clone()),
..Default::default()
},
spec: GrafanaSpec {
config: None,
admin_user: None,
admin_password: None,
ingress: None,
persistence: None,
resources: None,
},
        }
}
async fn build_grafana_ingress(&self, ns: &str) -> K8sIngressScore {
let domain = self.get_domain(&format!("grafana-{}", ns)).await.unwrap();
let name = format!("{}-grafana", ns);
let backend_service = format!("grafana-{}-service", ns);
K8sIngressScore {
name: fqdn::fqdn!(&name),
host: fqdn::fqdn!(&domain),
backend_service: fqdn::fqdn!(&backend_service),
port: 3000,
path: Some("/".to_string()),
path_type: Some(PathType::Prefix),
namespace: Some(fqdn::fqdn!(&ns)),
ingress_class_name: Some("openshift-default".to_string()),
}
}
async fn get_cluster_observability_operator_prometheus_application_score(
&self,
sender: RHOBObservability,
receivers: Option<Vec<Box<dyn AlertReceiver<RHOBObservability>>>>,
) -> RHOBAlertingScore {
RHOBAlertingScore {
sender,
receivers: receivers.unwrap_or_default(),
service_monitors: vec![],
prometheus_rules: vec![],
}
}
async fn get_k8s_prometheus_application_score(
&self,
sender: CRDPrometheus,
receivers: Option<Vec<Box<dyn AlertReceiver<CRDPrometheus>>>>,
service_monitors: Option<Vec<ServiceMonitor>>,
) -> K8sPrometheusCRDAlertingScore {
        K8sPrometheusCRDAlertingScore {
            sender,
            receivers: receivers.unwrap_or_default(),
            service_monitors: service_monitors.unwrap_or_default(),
            prometheus_rules: vec![],
        }
}
async fn openshift_ingress_operator_available(&self) -> Result<(), PreparationError> {
let client = self.k8s_client().await?;
let gvk = GroupVersionKind {
@@ -572,6 +963,137 @@ impl K8sAnywhereTopology {
)),
}
}
async fn ensure_cluster_observability_operator(
&self,
sender: &RHOBObservability,
) -> Result<PreparationOutcome, PreparationError> {
let status = Command::new("sh")
.args(["-c", "kubectl get crd -A | grep -i rhobs"])
.status()
.map_err(|e| PreparationError::new(format!("could not connect to cluster: {}", e)))?;
if !status.success() {
if let Some(Some(k8s_state)) = self.k8s_state.get() {
match k8s_state.source {
K8sSource::LocalK3d => {
warn!(
"Installing observability operator is not supported on LocalK3d source"
);
return Ok(PreparationOutcome::Noop);
debug!("installing cluster observability operator");
todo!();
let op_score =
prometheus_operator_helm_chart_score(sender.namespace.clone());
let result = op_score.interpret(&Inventory::empty(), self).await;
return match result {
Ok(outcome) => match outcome.status {
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
details: "installed cluster observability operator".into(),
}),
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
_ => Err(PreparationError::new(
"failed to install cluster observability operator (unknown error)".into(),
)),
},
Err(err) => Err(PreparationError::new(err.to_string())),
};
}
K8sSource::Kubeconfig => {
debug!(
"unable to install cluster observability operator, contact cluster admin"
);
return Ok(PreparationOutcome::Noop);
}
}
} else {
warn!(
"Unable to detect k8s_state. Skipping Cluster Observability Operator install."
);
return Ok(PreparationOutcome::Noop);
}
}
debug!("Cluster Observability Operator is already present, skipping install");
Ok(PreparationOutcome::Success {
details: "cluster observability operator present in cluster".into(),
})
}
async fn ensure_prometheus_operator(
&self,
sender: &CRDPrometheus,
) -> Result<PreparationOutcome, PreparationError> {
let status = Command::new("sh")
.args(["-c", "kubectl get crd -A | grep -i prometheuses"])
.status()
.map_err(|e| PreparationError::new(format!("could not connect to cluster: {}", e)))?;
if !status.success() {
if let Some(Some(k8s_state)) = self.k8s_state.get() {
match k8s_state.source {
K8sSource::LocalK3d => {
debug!("installing prometheus operator");
let op_score =
prometheus_operator_helm_chart_score(sender.namespace.clone());
let result = op_score.interpret(&Inventory::empty(), self).await;
return match result {
Ok(outcome) => match outcome.status {
InterpretStatus::SUCCESS => Ok(PreparationOutcome::Success {
details: "installed prometheus operator".into(),
}),
InterpretStatus::NOOP => Ok(PreparationOutcome::Noop),
_ => Err(PreparationError::new(
"failed to install prometheus operator (unknown error)".into(),
)),
},
Err(err) => Err(PreparationError::new(err.to_string())),
};
}
K8sSource::Kubeconfig => {
debug!("unable to install prometheus operator, contact cluster admin");
return Ok(PreparationOutcome::Noop);
}
}
} else {
warn!("Unable to detect k8s_state. Skipping Prometheus Operator install.");
return Ok(PreparationOutcome::Noop);
}
}
debug!("Prometheus operator is already present, skipping install");
Ok(PreparationOutcome::Success {
details: "prometheus operator present in cluster".into(),
})
}
async fn install_grafana_operator(
&self,
inventory: &Inventory,
ns: Option<&str>,
) -> Result<PreparationOutcome, PreparationError> {
let namespace = ns.unwrap_or("grafana");
info!("installing grafana operator in ns {namespace}");
        let tenant = self.get_k8s_tenant_manager()?.get_tenant_config().await;
        let namespace_scope = tenant.is_some();
        grafana_helm_chart_score(namespace, namespace_scope)
            .interpret(inventory, self)
            .await
            .map_err(|e| PreparationError::new(e.to_string()))?;
        Ok(PreparationOutcome::Success {
            details: format!("Successfully installed grafana operator in ns {namespace}"),
        })
}
}
#[derive(Clone, Debug)]
@@ -609,6 +1131,7 @@ pub struct K8sAnywhereConfig {
///
/// If the context name is not found, it will fail to initialize.
pub k8s_context: Option<String>,
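
    /// Public domain used by `TlsRouter::get_public_domain`.
    ///
    /// Parsed from the `public_domain` key of the inline configuration string,
    /// or from the `HARMONY_PUBLIC_DOMAIN` environment variable.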
public_domain: Option<String>,
}
impl K8sAnywhereConfig {
@@ -636,6 +1159,7 @@ impl K8sAnywhereConfig {
let mut kubeconfig: Option<String> = None;
let mut k8s_context: Option<String> = None;
let mut public_domain: Option<String> = None;
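        // Hypothetical example value for the inline configuration string:
        // "kubeconfig=/home/user/.kube/config,context=prod,public_domain=apps.example.com"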
for part in env_var_value.split(',') {
let kv: Vec<&str> = part.splitn(2, '=').collect();
@@ -643,6 +1167,7 @@ impl K8sAnywhereConfig {
match kv[0].trim() {
"kubeconfig" => kubeconfig = Some(kv[1].trim().to_string()),
"context" => k8s_context = Some(kv[1].trim().to_string()),
"public_domain" => public_domain = Some(kv[1].trim().to_string()),
_ => {}
}
}
@@ -660,6 +1185,7 @@ impl K8sAnywhereConfig {
K8sAnywhereConfig {
kubeconfig,
k8s_context,
public_domain,
use_system_kubeconfig,
autoinstall: false,
use_local_k3d: false,
@@ -702,6 +1228,7 @@ impl K8sAnywhereConfig {
use_local_k3d: std::env::var("HARMONY_USE_LOCAL_K3D")
.map_or_else(|_| true, |v| v.parse().ok().unwrap_or(true)),
k8s_context: std::env::var("HARMONY_K8S_CONTEXT").ok(),
public_domain: std::env::var("HARMONY_PUBLIC_DOMAIN").ok(),
}
}
}

View File

@@ -1,5 +1,4 @@
mod k8s_anywhere;
pub mod nats;
pub mod observability;
mod postgres;
pub use k8s_anywhere::*;

View File

@@ -1,147 +0,0 @@
use async_trait::async_trait;
use crate::{
inventory::Inventory,
modules::monitoring::grafana::{
grafana::Grafana,
k8s::{
score_ensure_grafana_ready::GrafanaK8sEnsureReadyScore,
score_grafana_alert_receiver::GrafanaK8sReceiverScore,
score_grafana_datasource::GrafanaK8sDatasourceScore,
score_grafana_rule::GrafanaK8sRuleScore, score_install_grafana::GrafanaK8sInstallScore,
},
},
score::Score,
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<Grafana> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
sender: &Grafana,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = GrafanaK8sInstallScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Grafana not installed {}", e)))?;
Ok(PreparationOutcome::Success {
details: "Successfully installed grafana alert sender".to_string(),
})
}
async fn install_receivers(
&self,
sender: &Grafana,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<Grafana>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let receivers = match receivers {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for receiver in receivers {
let score = GrafanaK8sReceiverScore {
receiver,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install receiver: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert receivers installed successfully".to_string(),
})
}
async fn install_rules(
&self,
sender: &Grafana,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<Grafana>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let rules = match rules {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for rule in rules {
let score = GrafanaK8sRuleScore {
sender: sender.clone(),
rule,
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert rules installed successfully".to_string(),
})
}
async fn add_scrape_targets(
&self,
sender: &Grafana,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<Grafana>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let scrape_targets = match scrape_targets {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for scrape_target in scrape_targets {
let score = GrafanaK8sDatasourceScore {
scrape_target,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to add DataSource: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All datasources installed successfully".to_string(),
})
}
async fn ensure_monitoring_installed(
&self,
sender: &Grafana,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = GrafanaK8sEnsureReadyScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Grafana not ready {}", e)))?;
Ok(PreparationOutcome::Success {
details: "Grafana Ready".to_string(),
})
}
}

View File

@@ -1,142 +0,0 @@
use async_trait::async_trait;
use crate::{
inventory::Inventory,
modules::monitoring::kube_prometheus::{
KubePrometheus, helm::kube_prometheus_helm_chart::kube_prometheus_helm_chart_score,
score_kube_prometheus_alert_receivers::KubePrometheusReceiverScore,
score_kube_prometheus_ensure_ready::KubePrometheusEnsureReadyScore,
score_kube_prometheus_rule::KubePrometheusRuleScore,
score_kube_prometheus_scrape_target::KubePrometheusScrapeTargetScore,
},
score::Score,
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<KubePrometheus> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
kube_prometheus_helm_chart_score(sender.config.clone())
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
Ok(PreparationOutcome::Success {
details: "Successfully installed kubeprometheus alert sender".to_string(),
})
}
async fn install_receivers(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<KubePrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let receivers = match receivers {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for receiver in receivers {
let score = KubePrometheusReceiverScore {
receiver,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install receiver: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert receivers installed successfully".to_string(),
})
}
async fn install_rules(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<KubePrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let rules = match rules {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for rule in rules {
let score = KubePrometheusRuleScore {
sender: sender.clone(),
rule,
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert rules installed successfully".to_string(),
})
}
async fn add_scrape_targets(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<KubePrometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let scrape_targets = match scrape_targets {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for scrape_target in scrape_targets {
let score = KubePrometheusScrapeTargetScore {
scrape_target,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All scrap targets installed successfully".to_string(),
})
}
async fn ensure_monitoring_installed(
&self,
sender: &KubePrometheus,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = KubePrometheusEnsureReadyScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("KubePrometheus not ready {}", e)))?;
Ok(PreparationOutcome::Success {
details: "KubePrometheus Ready".to_string(),
})
}
}

View File

@@ -1,5 +0,0 @@
pub mod grafana;
pub mod kube_prometheus;
pub mod openshift_monitoring;
pub mod prometheus;
pub mod redhat_cluster_observability;

View File

@@ -1,142 +0,0 @@
use async_trait::async_trait;
use log::info;
use crate::score::Score;
use crate::{
inventory::Inventory,
modules::monitoring::okd::{
OpenshiftClusterAlertSender,
score_enable_cluster_monitoring::OpenshiftEnableClusterMonitoringScore,
score_openshift_alert_rule::OpenshiftAlertRuleScore,
score_openshift_receiver::OpenshiftReceiverScore,
score_openshift_scrape_target::OpenshiftScrapeTargetScore,
score_user_workload::OpenshiftUserWorkloadMonitoring,
score_verify_user_workload_monitoring::VerifyUserWorkload,
},
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<OpenshiftClusterAlertSender> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
info!("enabling cluster monitoring");
let cluster_monitoring_score = OpenshiftEnableClusterMonitoringScore {};
cluster_monitoring_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
info!("enabling user workload monitoring");
let user_workload_score = OpenshiftUserWorkloadMonitoring {};
user_workload_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
Ok(PreparationOutcome::Success {
details: "Successfully configured cluster monitoring".to_string(),
})
}
async fn install_receivers(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>>,
) -> Result<PreparationOutcome, PreparationError> {
if let Some(receivers) = receivers {
for receiver in receivers {
info!("Installing receiver {}", receiver.name());
let receiver_score = OpenshiftReceiverScore { receiver };
receiver_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
}
Ok(PreparationOutcome::Success {
details: "Successfully installed receivers for OpenshiftClusterMonitoring"
.to_string(),
})
} else {
Ok(PreparationOutcome::Noop)
}
}
async fn install_rules(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<OpenshiftClusterAlertSender>>>>,
) -> Result<PreparationOutcome, PreparationError> {
if let Some(rules) = rules {
for rule in rules {
info!("Installing rule ");
let rule_score = OpenshiftAlertRuleScore { rule: rule };
rule_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
}
Ok(PreparationOutcome::Success {
details: "Successfully installed rules for OpenshiftClusterMonitoring".to_string(),
})
} else {
Ok(PreparationOutcome::Noop)
}
}
async fn add_scrape_targets(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>>>>,
) -> Result<PreparationOutcome, PreparationError> {
if let Some(scrape_targets) = scrape_targets {
for scrape_target in scrape_targets {
info!("Installing scrape target");
let scrape_target_score = OpenshiftScrapeTargetScore {
scrape_target: scrape_target,
};
scrape_target_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
}
Ok(PreparationOutcome::Success {
details: "Successfully added scrape targets for OpenshiftClusterMonitoring"
.to_string(),
})
} else {
Ok(PreparationOutcome::Noop)
}
}
async fn ensure_monitoring_installed(
&self,
_sender: &OpenshiftClusterAlertSender,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let verify_monitoring_score = VerifyUserWorkload {};
info!("Verifying user workload and cluster monitoring installed");
verify_monitoring_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError { msg: e.to_string() })?;
Ok(PreparationOutcome::Success {
details: "OpenshiftClusterMonitoring ready".to_string(),
})
}
}

View File

@@ -1,147 +0,0 @@
use async_trait::async_trait;
use crate::{
inventory::Inventory,
modules::monitoring::prometheus::{
Prometheus, score_prometheus_alert_receivers::PrometheusReceiverScore,
score_prometheus_ensure_ready::PrometheusEnsureReadyScore,
score_prometheus_install::PrometheusInstallScore,
score_prometheus_rule::PrometheusRuleScore,
score_prometheus_scrape_target::PrometheusScrapeTargetScore,
},
score::Score,
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<Prometheus> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
sender: &Prometheus,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = PrometheusInstallScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Prometheus not installed {}", e)))?;
Ok(PreparationOutcome::Success {
details: "Successfully installed kubeprometheus alert sender".to_string(),
})
}
async fn install_receivers(
&self,
sender: &Prometheus,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<Prometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let receivers = match receivers {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for receiver in receivers {
let score = PrometheusReceiverScore {
receiver,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install receiver: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert receivers installed successfully".to_string(),
})
}
async fn install_rules(
&self,
sender: &Prometheus,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<Prometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let rules = match rules {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for rule in rules {
let score = PrometheusRuleScore {
sender: sender.clone(),
rule,
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All alert rules installed successfully".to_string(),
})
}
async fn add_scrape_targets(
&self,
sender: &Prometheus,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<Prometheus>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let scrape_targets = match scrape_targets {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for scrape_target in scrape_targets {
let score = PrometheusScrapeTargetScore {
scrape_target,
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Failed to install rule: {}", e)))?;
}
Ok(PreparationOutcome::Success {
details: "All scrap targets installed successfully".to_string(),
})
}
async fn ensure_monitoring_installed(
&self,
sender: &Prometheus,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
let score = PrometheusEnsureReadyScore {
sender: sender.clone(),
};
score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(format!("Prometheus not ready {}", e)))?;
Ok(PreparationOutcome::Success {
details: "Prometheus Ready".to_string(),
})
}
}

View File

@@ -1,116 +0,0 @@
use crate::{
modules::monitoring::red_hat_cluster_observability::{
score_alert_receiver::RedHatClusterObservabilityReceiverScore,
score_coo_monitoring_stack::RedHatClusterObservabilityMonitoringStackScore,
},
score::Score,
};
use async_trait::async_trait;
use log::info;
use crate::{
inventory::Inventory,
modules::monitoring::red_hat_cluster_observability::{
RedHatClusterObservability,
score_redhat_cluster_observability_operator::RedHatClusterObservabilityOperatorScore,
},
topology::{
K8sAnywhereTopology, PreparationError, PreparationOutcome,
monitoring::{AlertReceiver, AlertRule, Observability, ScrapeTarget},
},
};
#[async_trait]
impl Observability<RedHatClusterObservability> for K8sAnywhereTopology {
async fn install_alert_sender(
&self,
sender: &RedHatClusterObservability,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
info!("Verifying Redhat Cluster Observability Operator");
let coo_score = RedHatClusterObservabilityOperatorScore::default();
coo_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
info!(
"Installing Cluster Observability Operator Monitoring Stack in ns {}",
sender.namespace.clone()
);
let coo_monitoring_stack_score = RedHatClusterObservabilityMonitoringStackScore {
namespace: sender.namespace.clone(),
resource_selector: sender.resource_selector.clone(),
};
coo_monitoring_stack_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
Ok(PreparationOutcome::Success {
details: "Successfully installed RedHatClusterObservability Operator".to_string(),
})
}
async fn install_receivers(
&self,
sender: &RedHatClusterObservability,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<RedHatClusterObservability>>>>,
) -> Result<PreparationOutcome, PreparationError> {
let receivers = match receivers {
Some(r) if !r.is_empty() => r,
_ => return Ok(PreparationOutcome::Noop),
};
for receiver in receivers {
info!("Installing receiver {}", receiver.name());
let receiver_score = RedHatClusterObservabilityReceiverScore {
receiver,
sender: sender.clone(),
};
receiver_score
.create_interpret()
.execute(inventory, self)
.await
.map_err(|e| PreparationError::new(e.to_string()))?;
}
Ok(PreparationOutcome::Success {
details: "Successfully installed receivers for OpenshiftClusterMonitoring".to_string(),
})
}
async fn install_rules(
&self,
_sender: &RedHatClusterObservability,
_inventory: &Inventory,
_rules: Option<Vec<Box<dyn AlertRule<RedHatClusterObservability>>>>,
) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
async fn add_scrape_targets(
&self,
_sender: &RedHatClusterObservability,
_inventory: &Inventory,
_scrape_targets: Option<Vec<Box<dyn ScrapeTarget<RedHatClusterObservability>>>>,
) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
async fn ensure_monitoring_installed(
&self,
_sender: &RedHatClusterObservability,
_inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
}

View File

@@ -106,6 +106,7 @@ pub enum SSL {
#[derive(Debug, Clone, PartialEq, Serialize)]
pub enum HealthCheck {
HTTP(String, HttpMethod, HttpStatusCode, SSL),
/// HTTP(None, "/healthz/ready", HttpMethod::GET, HttpStatusCode::Success2xx, SSL::Disabled)
HTTP(Option<u16>, String, HttpMethod, HttpStatusCode, SSL),
TCP(Option<u16>),
}

View File

@@ -2,7 +2,6 @@ pub mod decentralized;
mod failover;
mod ha_cluster;
pub mod ingress;
pub mod monitoring;
pub mod node_exporter;
pub mod opnsense;
pub use failover::*;
@@ -12,6 +11,7 @@ mod http;
pub mod installable;
mod k8s_anywhere;
mod localhost;
pub mod oberservability;
pub mod tenant;
use derive_new::new;
pub use k8s_anywhere::*;

View File

@@ -1,256 +0,0 @@
use std::{
any::Any,
collections::{BTreeMap, HashMap},
net::IpAddr,
};
use async_trait::async_trait;
use kube::api::DynamicObject;
use log::{debug, info};
use serde::{Deserialize, Serialize};
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
topology::{PreparationError, PreparationOutcome, Topology, installable::Installable},
};
use harmony_types::id::Id;
/// Defines the application that sends the alerts to a receivers
/// for example prometheus
#[async_trait]
pub trait AlertSender: Send + Sync + std::fmt::Debug {
fn name(&self) -> String;
}
/// Trait which defines how an alert sender is implemented for a specific topology
#[async_trait]
pub trait Observability<S: AlertSender> {
async fn install_alert_sender(
&self,
sender: &S,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError>;
async fn install_receivers(
&self,
sender: &S,
inventory: &Inventory,
receivers: Option<Vec<Box<dyn AlertReceiver<S>>>>,
) -> Result<PreparationOutcome, PreparationError>;
async fn install_rules(
&self,
sender: &S,
inventory: &Inventory,
rules: Option<Vec<Box<dyn AlertRule<S>>>>,
) -> Result<PreparationOutcome, PreparationError>;
async fn add_scrape_targets(
&self,
sender: &S,
inventory: &Inventory,
scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
) -> Result<PreparationOutcome, PreparationError>;
async fn ensure_monitoring_installed(
&self,
sender: &S,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError>;
}
/// Defines the entity that receives the alerts from a sender. For example Discord, Slack, etc
///
pub trait AlertReceiver<S: AlertSender>: std::fmt::Debug + Send + Sync {
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn AlertReceiver<S>>;
}
/// Defines a generic rule that can be applied to a sender, such as a Prometheus alert rule
pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
fn build_rule(&self) -> Result<serde_json::Value, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn AlertRule<S>>;
}
/// A generic scrape target that can be added to a sender to scrape metrics from, for example a
/// server outside of the cluster
pub trait ScrapeTarget<S: AlertSender>: std::fmt::Debug + Send + Sync {
fn build_scrape_target(&self) -> Result<ExternalScrapeTarget, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn ScrapeTarget<S>>;
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExternalScrapeTarget {
pub ip: IpAddr,
pub port: i32,
pub interval: Option<String>,
pub path: Option<String>,
pub labels: Option<BTreeMap<String, String>>,
}
/// Alerting interpret to install an alert sender on a given topology
#[derive(Debug)]
pub struct AlertingInterpret<S: AlertSender> {
pub sender: S,
pub receivers: Vec<Box<dyn AlertReceiver<S>>>,
pub rules: Vec<Box<dyn AlertRule<S>>>,
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
}
#[async_trait]
impl<S: AlertSender, T: Topology + Observability<S>> Interpret<T> for AlertingInterpret<S> {
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
info!("Configuring alert sender {}", self.sender.name());
topology
.install_alert_sender(&self.sender, inventory)
.await?;
info!("Installing receivers");
topology
.install_receivers(&self.sender, inventory, Some(self.receivers.clone()))
.await?;
info!("Installing rules");
topology
.install_rules(&self.sender, inventory, Some(self.rules.clone()))
.await?;
info!("Adding extra scrape targets");
topology
.add_scrape_targets(&self.sender, inventory, self.scrape_targets.clone())
.await?;
info!("Ensuring alert sender {} is ready", self.sender.name());
topology
.ensure_monitoring_installed(&self.sender, inventory)
.await?;
Ok(Outcome::success(format!(
"successfully installed alert sender {}",
self.sender.name()
)))
}
fn get_name(&self) -> InterpretName {
InterpretName::Alerting
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
impl<S: AlertSender> Clone for Box<dyn AlertReceiver<S>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
impl<S: AlertSender> Clone for Box<dyn AlertRule<S>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
impl<S: AlertSender> Clone for Box<dyn ScrapeTarget<S>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
pub struct ReceiverInstallPlan {
pub install_operation: Option<Vec<InstallOperation>>,
pub route: Option<AlertRoute>,
pub receiver: Option<serde_yaml::Value>,
}
impl Default for ReceiverInstallPlan {
fn default() -> Self {
Self {
install_operation: None,
route: None,
receiver: None,
}
}
}
pub enum InstallOperation {
CreateSecret {
name: String,
data: BTreeMap<String, String>,
},
}
///Generic routing that can map to various alert sender backends
#[derive(Debug, Clone, Serialize)]
pub struct AlertRoute {
pub receiver: String,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub matchers: Vec<AlertMatcher>,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub group_by: Vec<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub repeat_interval: Option<String>,
#[serde(rename = "continue")]
pub continue_matching: bool,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub children: Vec<AlertRoute>,
}
impl AlertRoute {
pub fn default(name: String) -> Self {
Self {
receiver: name,
matchers: vec![],
group_by: vec![],
repeat_interval: Some("30s".to_string()),
continue_matching: true,
children: vec![],
}
}
}
#[derive(Debug, Clone, Serialize)]
pub struct AlertMatcher {
pub label: String,
pub operator: MatchOp,
pub value: String,
}
#[derive(Debug, Clone)]
pub enum MatchOp {
Eq,
NotEq,
Regex,
}
impl Serialize for MatchOp {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
let op = match self {
MatchOp::Eq => "=",
MatchOp::NotEq => "!=",
MatchOp::Regex => "=~",
};
serializer.serialize_str(op)
}
}

View File

@@ -0,0 +1 @@
pub mod monitoring;

View File

@@ -0,0 +1,101 @@
use std::any::Any;
use async_trait::async_trait;
use kube::api::DynamicObject;
use log::debug;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
topology::{Topology, installable::Installable},
};
use harmony_types::id::Id;
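
/// Defines the application that sends the alerts to receivers, for example Prometheus.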
#[async_trait]
pub trait AlertSender: Send + Sync + std::fmt::Debug {
fn name(&self) -> String;
}
#[derive(Debug)]
pub struct AlertingInterpret<S: AlertSender> {
pub sender: S,
pub receivers: Vec<Box<dyn AlertReceiver<S>>>,
pub rules: Vec<Box<dyn AlertRule<S>>>,
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<S>>>>,
}
#[async_trait]
impl<S: AlertSender + Installable<T>, T: Topology> Interpret<T> for AlertingInterpret<S> {
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
debug!("hit sender configure for AlertingInterpret");
self.sender.configure(inventory, topology).await?;
for receiver in self.receivers.iter() {
receiver.install(&self.sender).await?;
}
for rule in self.rules.iter() {
debug!("installing rule: {:#?}", rule);
rule.install(&self.sender).await?;
}
if let Some(targets) = &self.scrape_targets {
for target in targets.iter() {
debug!("installing scrape_target: {:#?}", target);
target.install(&self.sender).await?;
}
}
self.sender.ensure_installed(inventory, topology).await?;
Ok(Outcome::success(format!(
"successfully installed alert sender {}",
self.sender.name()
)))
}
fn get_name(&self) -> InterpretName {
InterpretName::Alerting
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
#[async_trait]
pub trait AlertReceiver<S: AlertSender>: std::fmt::Debug + Send + Sync {
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
fn name(&self) -> String;
fn clone_box(&self) -> Box<dyn AlertReceiver<S>>;
fn as_any(&self) -> &dyn Any;
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String>;
}
#[derive(Debug)]
pub struct AlertManagerReceiver {
pub receiver_config: serde_json::Value,
// FIXME we should not leak k8s here. DynamicObject is k8s specific
pub additional_ressources: Vec<DynamicObject>,
pub route_config: serde_json::Value,
}
#[async_trait]
pub trait AlertRule<S: AlertSender>: std::fmt::Debug + Send + Sync {
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
fn clone_box(&self) -> Box<dyn AlertRule<S>>;
}
#[async_trait]
pub trait ScrapeTarget<S: AlertSender>: std::fmt::Debug + Send + Sync {
async fn install(&self, sender: &S) -> Result<Outcome, InterpretError>;
fn clone_box(&self) -> Box<dyn ScrapeTarget<S>>;
}
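// Minimal hypothetical sketch (not part of this diff): the smallest types
// that satisfy the AlertSender/AlertReceiver contracts, for orientation.
// Real receivers (Discord, generic webhook) implement the same surface.
#[derive(Debug)]
pub struct LogOnlySender;

impl AlertSender for LogOnlySender {
    fn name(&self) -> String {
        "log-only".to_string()
    }
}

#[derive(Debug, Clone)]
pub struct LogOnlyReceiver;

#[async_trait]
impl AlertReceiver<LogOnlySender> for LogOnlyReceiver {
    async fn install(&self, _sender: &LogOnlySender) -> Result<Outcome, InterpretError> {
        // A real receiver would apply Secrets, AlertmanagerConfig CRDs, etc.
        Ok(Outcome::success("nothing to install".to_string()))
    }
    fn name(&self) -> String {
        "log-only".to_string()
    }
    fn clone_box(&self) -> Box<dyn AlertReceiver<LogOnlySender>> {
        Box::new(self.clone())
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
        Err("log-only receiver has no Alertmanager representation".to_string())
    }
}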

View File

@@ -122,4 +122,6 @@ pub trait TlsRouter: Send + Sync {
/// Returns the port that this router exposes externally.
async fn get_router_port(&self) -> u16;
async fn get_public_domain(&self) -> Result<String, String>;
}

View File

@@ -216,7 +216,15 @@ pub(crate) fn get_health_check_for_backend(
SSL::Other(other.to_string())
}
};
Some(HealthCheck::HTTP(path, method, status_code, ssl))
let port = haproxy_health_check
.checkport
.content_string()
.parse::<u16>()
.ok();
debug!("Found haproxy healthcheck port {port:?}");
Some(HealthCheck::HTTP(port, path, method, status_code, ssl))
}
_ => panic!("Received unsupported health check type {}", uppercase),
}
@@ -251,7 +259,7 @@ pub(crate) fn harmony_load_balancer_service_to_haproxy_xml(
// frontend points to backend
let healthcheck = if let Some(health_check) = &service.health_check {
match health_check {
HealthCheck::HTTP(path, http_method, _http_status_code, ssl) => {
HealthCheck::HTTP(port, path, http_method, _http_status_code, ssl) => {
let ssl: MaybeString = match ssl {
SSL::SSL => "ssl".into(),
SSL::SNI => "sslni".into(),
@@ -259,14 +267,21 @@ pub(crate) fn harmony_load_balancer_service_to_haproxy_xml(
SSL::Default => "".into(),
SSL::Other(other) => other.as_str().into(),
};
let path_without_query = path.split_once('?').map_or(path.as_str(), |(p, _)| p);
let (port, port_name) = match port {
Some(port) => (Some(port.to_string()), port.to_string()),
None => (None, "serverport".to_string()),
};
let haproxy_check = HAProxyHealthCheck {
name: format!("HTTP_{http_method}_{path}"),
name: format!("HTTP_{http_method}_{path_without_query}_{port_name}"),
uuid: Uuid::new_v4().to_string(),
http_method: http_method.to_string().into(),
http_method: http_method.to_string().to_lowercase().into(),
health_check_type: "http".to_string(),
http_uri: path.clone().into(),
interval: "2s".to_string(),
ssl,
checkport: MaybeString::from(port.map(|p| p.to_string())),
..Default::default()
};
@@ -305,7 +320,10 @@ pub(crate) fn harmony_load_balancer_service_to_haproxy_xml(
let mut backend = HAProxyBackend {
uuid: Uuid::new_v4().to_string(),
enabled: 1,
name: format!("backend_{}", service.listening_port),
name: format!(
"backend_{}",
service.listening_port.to_string().replace(':', "_")
),
algorithm: "roundrobin".to_string(),
random_draws: Some(2),
stickiness_expire: "30m".to_string(),
@@ -337,10 +355,22 @@ pub(crate) fn harmony_load_balancer_service_to_haproxy_xml(
let frontend = Frontend {
uuid: uuid::Uuid::new_v4().to_string(),
enabled: 1,
name: format!("frontend_{}", service.listening_port),
name: format!(
"frontend_{}",
service.listening_port.to_string().replace(':', "_")
),
bind: service.listening_port.to_string(),
mode: "tcp".to_string(), // TODO do not depend on health check here
default_backend: Some(backend.uuid.clone()),
stickiness_expire: "30m".to_string().into(),
stickiness_size: "50k".to_string().into(),
stickiness_conn_rate_period: "10s".to_string().into(),
stickiness_sess_rate_period: "10s".to_string().into(),
stickiness_http_req_rate_period: "10s".to_string().into(),
stickiness_http_err_rate_period: "10s".to_string().into(),
stickiness_bytes_in_rate_period: "1m".to_string().into(),
stickiness_bytes_out_rate_period: "1m".to_string().into(),
ssl_hsts_max_age: 15768000,
..Default::default()
};
info!("HAPRoxy frontend and backend mode currently hardcoded to tcp");

View File

@@ -2,15 +2,13 @@ use crate::modules::application::{
Application, ApplicationFeature, InstallationError, InstallationOutcome,
};
use crate::modules::monitoring::application_monitoring::application_monitoring_score::ApplicationMonitoringScore;
use crate::modules::monitoring::grafana::grafana::Grafana;
use crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::CRDPrometheus;
use crate::modules::monitoring::kube_prometheus::crd::service_monitor::{
ServiceMonitor, ServiceMonitorSpec,
};
use crate::modules::monitoring::prometheus::Prometheus;
use crate::modules::monitoring::prometheus::helm::prometheus_config::PrometheusConfig;
use crate::topology::MultiTargetTopology;
use crate::topology::ingress::Ingress;
use crate::topology::monitoring::Observability;
use crate::topology::monitoring::{AlertReceiver, AlertRoute};
use crate::{
inventory::Inventory,
modules::monitoring::{
@@ -19,6 +17,10 @@ use crate::{
score::Score,
topology::{HelmCommand, K8sclient, Topology, tenant::TenantManager},
};
use crate::{
modules::prometheus::prometheus::PrometheusMonitoring,
topology::oberservability::monitoring::AlertReceiver,
};
use async_trait::async_trait;
use base64::{Engine as _, engine::general_purpose};
use harmony_secret::SecretManager;
@@ -28,13 +30,12 @@ use kube::api::ObjectMeta;
use log::{debug, info};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::sync::{Arc, Mutex};
use std::sync::Arc;
// TODO: test this
#[derive(Debug, Clone)]
pub struct Monitoring {
pub application: Arc<dyn Application>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<Prometheus>>>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<CRDPrometheus>>>,
}
#[async_trait]
@@ -45,7 +46,8 @@ impl<
+ TenantManager
+ K8sclient
+ MultiTargetTopology
+ Observability<Prometheus>
+ PrometheusMonitoring<CRDPrometheus>
+ Grafana
+ Ingress
+ std::fmt::Debug,
> ApplicationFeature<T> for Monitoring
@@ -72,15 +74,17 @@ impl<
};
let mut alerting_score = ApplicationMonitoringScore {
sender: Prometheus {
config: Arc::new(Mutex::new(PrometheusConfig::new())),
sender: CRDPrometheus {
namespace: namespace.clone(),
client: topology.k8s_client().await.unwrap(),
service_monitor: vec![app_service_monitor],
},
application: self.application.clone(),
receivers: self.alert_receiver.clone(),
};
let ntfy = NtfyScore {
namespace: namespace.clone(),
host: domain.clone(),
host: domain,
};
ntfy.interpret(&Inventory::empty(), topology)
.await
@@ -101,28 +105,20 @@ impl<
debug!("ntfy_default_auth_param: {ntfy_default_auth_param}");
debug!("ntfy_default_auth_param: {ntfy_default_auth_param}");
let ntfy_receiver = WebhookReceiver {
name: "ntfy-webhook".to_string(),
url: Url::Url(
url::Url::parse(
format!(
"http://{domain}/{}?auth={ntfy_default_auth_param}",
__self.application.name()
"http://ntfy.{}.svc.cluster.local/rust-web-app?auth={ntfy_default_auth_param}",
namespace.clone()
)
.as_str(),
)
.unwrap(),
),
route: AlertRoute {
..AlertRoute::default("ntfy-webhook".to_string())
},
};
debug!(
"ntfy webhook receiver \n{:#?}\nntfy topic: {}",
ntfy_receiver.clone(),
self.application.name()
);
alerting_score.receivers.push(Box::new(ntfy_receiver));
alerting_score
.interpret(&Inventory::empty(), topology)

View File

@@ -3,13 +3,11 @@ use std::sync::Arc;
use crate::modules::application::{
Application, ApplicationFeature, InstallationError, InstallationOutcome,
};
use crate::modules::monitoring::application_monitoring::rhobs_application_monitoring_score::ApplicationRHOBMonitoringScore;
use crate::modules::monitoring::red_hat_cluster_observability::RedHatClusterObservability;
use crate::modules::monitoring::red_hat_cluster_observability::redhat_cluster_observability::RedHatClusterObservabilityScore;
use crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::RHOBObservability;
use crate::topology::MultiTargetTopology;
use crate::topology::ingress::Ingress;
use crate::topology::monitoring::Observability;
use crate::topology::monitoring::{AlertReceiver, AlertRoute};
use crate::{
inventory::Inventory,
modules::monitoring::{
@@ -18,6 +16,10 @@ use crate::{
score::Score,
topology::{HelmCommand, K8sclient, Topology, tenant::TenantManager},
};
use crate::{
modules::prometheus::prometheus::PrometheusMonitoring,
topology::oberservability::monitoring::AlertReceiver,
};
use async_trait::async_trait;
use base64::{Engine as _, engine::general_purpose};
use harmony_types::net::Url;
@@ -26,10 +28,9 @@ use log::{debug, info};
#[derive(Debug, Clone)]
pub struct Monitoring {
pub application: Arc<dyn Application>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<RedHatClusterObservability>>>,
pub alert_receiver: Vec<Box<dyn AlertReceiver<RHOBObservability>>>,
}
// TODO: test this
#[async_trait]
impl<
T: Topology
@@ -40,7 +41,7 @@ impl<
+ MultiTargetTopology
+ Ingress
+ std::fmt::Debug
+ Observability<RedHatClusterObservability>,
+ PrometheusMonitoring<RHOBObservability>,
> ApplicationFeature<T> for Monitoring
{
async fn ensure_installed(
@@ -54,14 +55,13 @@ impl<
.map(|ns| ns.name.clone())
.unwrap_or_else(|| self.application.name());
let mut alerting_score = RedHatClusterObservabilityScore {
sender: RedHatClusterObservability {
let mut alerting_score = ApplicationRHOBMonitoringScore {
sender: RHOBObservability {
namespace: namespace.clone(),
resource_selector: todo!(),
client: topology.k8s_client().await.unwrap(),
},
application: self.application.clone(),
receivers: self.alert_receiver.clone(),
rules: vec![],
scrape_targets: None,
};
let domain = topology
.get_domain("ntfy")
@@ -97,15 +97,12 @@ impl<
url::Url::parse(
format!(
"http://{domain}/{}?auth={ntfy_default_auth_param}",
__self.application.name()
self.application.name()
)
.as_str(),
)
.unwrap(),
),
route: AlertRoute {
..AlertRoute::default("ntfy-webhook".to_string())
},
};
debug!(
"ntfy webhook receiver \n{:#?}\nntfy topic: {}",

View File

@@ -1,6 +1,5 @@
use std::fs::{self};
use std::path::{Path, PathBuf};
use std::process;
use std::path::PathBuf;
use std::sync::Arc;
use async_trait::async_trait;
@@ -65,6 +64,7 @@ pub struct RustWebapp {
///
/// This is the place to put the public host name if this is a public-facing webapp.
pub dns: String,
pub version: String,
}
impl Application for RustWebapp {
@@ -465,6 +465,7 @@ impl RustWebapp {
let app_name = &self.name;
let service_port = self.service_port;
let version = &self.version;
// Create Chart.yaml
let chart_yaml = format!(
r#"
@@ -472,7 +473,7 @@ apiVersion: v2
name: {chart_name}
description: A Helm chart for the {app_name} web application.
type: application
version: 0.2.1
version: {version}
appVersion: "{image_tag}"
"#,
);
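// With hypothetical values chart_name = "my-app", version = "1.4.2" and
// image_tag = "sha-abc123", the rendered Chart.yaml reads:
//
//   apiVersion: v2
//   name: my-app
//   description: A Helm chart for the my-app web application.
//   type: application
//   version: 1.4.2
//   appVersion: "sha-abc123"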

View File

@@ -1,4 +1,5 @@
use async_trait::async_trait;
use k8s_openapi::ResourceScope;
use kube::Resource;
use log::info;
use serde::{Serialize, de::DeserializeOwned};
@@ -28,7 +29,7 @@ impl<K: Resource + std::fmt::Debug> K8sResourceScore<K> {
}
impl<
K: Resource
K: Resource<Scope: ResourceScope>
+ std::fmt::Debug
+ Sync
+ DeserializeOwned
@@ -60,7 +61,7 @@ pub struct K8sResourceInterpret<K: Resource + std::fmt::Debug + Sync + Send> {
#[async_trait]
impl<
K: Resource
K: Resource<Scope: ResourceScope>
+ Clone
+ std::fmt::Debug
+ DeserializeOwned

View File

@@ -20,6 +20,7 @@ pub mod okd;
pub mod openbao;
pub mod opnsense;
pub mod postgresql;
pub mod prometheus;
pub mod storage;
pub mod tenant;
pub mod tftp;

View File

@@ -1,38 +1,99 @@
use crate::modules::monitoring::kube_prometheus::KubePrometheus;
use crate::modules::monitoring::okd::OpenshiftClusterAlertSender;
use crate::modules::monitoring::red_hat_cluster_observability::RedHatClusterObservability;
use crate::topology::monitoring::{AlertRoute, InstallOperation, ReceiverInstallPlan};
use crate::{interpret::InterpretError, topology::monitoring::AlertReceiver};
use harmony_types::net::Url;
use std::any::Any;
use std::collections::{BTreeMap, HashMap};
use async_trait::async_trait;
use harmony_types::k8s_name::K8sName;
use k8s_openapi::api::core::v1::Secret;
use kube::Resource;
use kube::api::{DynamicObject, ObjectMeta};
use log::{debug, trace};
use serde::Serialize;
use serde_json::json;
use std::collections::BTreeMap;
use serde_yaml::{Mapping, Value};
use crate::infra::kube::kube_resource_to_dynamic;
use crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::{
AlertmanagerConfig, AlertmanagerConfigSpec, CRDPrometheus,
};
use crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::RHOBObservability;
use crate::modules::monitoring::okd::OpenshiftClusterAlertSender;
use crate::topology::oberservability::monitoring::AlertManagerReceiver;
use crate::{
interpret::{InterpretError, Outcome},
modules::monitoring::{
kube_prometheus::{
prometheus::{KubePrometheus, KubePrometheusReceiver},
types::{AlertChannelConfig, AlertManagerChannelConfig},
},
prometheus::prometheus::{Prometheus, PrometheusReceiver},
},
topology::oberservability::monitoring::AlertReceiver,
};
use harmony_types::net::Url;
#[derive(Debug, Clone, Serialize)]
pub struct DiscordReceiver {
pub name: String,
pub struct DiscordWebhook {
pub name: K8sName,
pub url: Url,
pub route: AlertRoute,
pub selectors: Vec<HashMap<String, String>>,
}
impl AlertReceiver<OpenshiftClusterAlertSender> for DiscordReceiver {
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
let receiver_block = serde_yaml::to_value(json!({
"name": self.name,
"discord_configs": [{
"webhook_url": format!("{}", self.url),
"title": "{{ template \"discord.default.title\" . }}",
"message": "{{ template \"discord.default.message\" . }}"
}]
}))
.map_err(|e| InterpretError::new(e.to_string()))?;
impl DiscordWebhook {
fn get_receiver_config(&self) -> Result<AlertManagerReceiver, String> {
let secret_name = format!("{}-secret", self.name.clone());
let webhook_key = format!("{}", self.url.clone());
Ok(ReceiverInstallPlan {
install_operation: None,
route: Some(self.route.clone()),
receiver: Some(receiver_block),
let mut string_data = BTreeMap::new();
string_data.insert("webhook-url".to_string(), webhook_key.clone());
let secret = Secret {
metadata: kube::core::ObjectMeta {
name: Some(secret_name.clone()),
..Default::default()
},
string_data: Some(string_data),
type_: Some("Opaque".to_string()),
..Default::default()
};
let mut matchers: Vec<String> = Vec::new();
for selector in &self.selectors {
trace!("selector: {:#?}", selector);
for (k, v) in selector {
matchers.push(format!("{} = {}", k, v));
}
}
Ok(AlertManagerReceiver {
additional_ressources: vec![kube_resource_to_dynamic(&secret)?],
receiver_config: json!({
"name": self.name,
"discord_configs": [
{
"webhook_url": self.url.clone(),
"title": "{{ template \"discord.default.title\" . }}",
"message": "{{ template \"discord.default.message\" . }}"
}
]
}),
route_config: json!({
"receiver": self.name,
"matchers": matchers,
}),
})
}
}
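// Illustrative only: selectors flatten into Alertmanager matcher strings,
// mirroring the loop in get_receiver_config above.
#[cfg(test)]
mod selector_example {
    use std::collections::HashMap;

    #[test]
    fn selectors_become_matcher_strings() {
        let selectors = vec![HashMap::from([(
            "team".to_string(),
            "payments".to_string(),
        )])];
        let matchers: Vec<String> = selectors
            .iter()
            .flat_map(|s| s.iter().map(|(k, v)| format!("{} = {}", k, v)))
            .collect();
        assert_eq!(matchers, vec!["team = payments".to_string()]);
    }
}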
#[async_trait]
impl AlertReceiver<OpenshiftClusterAlertSender> for DiscordWebhook {
async fn install(
&self,
sender: &OpenshiftClusterAlertSender,
) -> Result<Outcome, InterpretError> {
todo!()
}
fn name(&self) -> String {
self.name.clone().to_string()
@@ -41,77 +102,309 @@ impl AlertReceiver<OpenshiftClusterAlertSender> for DiscordReceiver {
fn clone_box(&self) -> Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
Box::new(self.clone())
}
fn as_any(&self) -> &dyn Any {
todo!()
}
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
self.get_receiver_config()
}
}
impl AlertReceiver<RedHatClusterObservability> for DiscordReceiver {
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
#[async_trait]
impl AlertReceiver<RHOBObservability> for DiscordWebhook {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &RHOBObservability) -> Result<Outcome, InterpretError> {
let ns = sender.namespace.clone();
let config = self.get_receiver_config()?;
for resource in config.additional_ressources.iter() {
todo!("can I apply a dynamicresource");
// sender.client.apply(resource, Some(&ns)).await;
}
let spec = crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::AlertmanagerConfigSpec {
data: json!({
"route": {
"receiver": self.name,
},
"receivers": [
config.receiver_config
]
}),
};
let alertmanager_configs = crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::AlertmanagerConfig {
metadata: ObjectMeta {
name: Some(self.name.clone().to_string()),
labels: Some(std::collections::BTreeMap::from([(
"alertmanagerConfig".to_string(),
"enabled".to_string(),
)])),
namespace: Some(sender.namespace.clone()),
..Default::default()
},
spec,
};
debug!(
"alertmanager_configs yaml:\n{:#?}",
serde_yaml::to_string(&alertmanager_configs)
);
debug!(
"alert manager configs: \n{:#?}",
alertmanager_configs.clone()
);
sender
.client
.apply(&alertmanager_configs, Some(&sender.namespace))
.await?;
Ok(Outcome::success(format!(
"installed rhob-alertmanagerconfigs for {}",
self.name
)))
}
fn name(&self) -> String {
"webhook-receiver".to_string()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<RHOBObservability>> {
Box::new(self.clone())
}
fn as_any(&self) -> &dyn Any {
self
}
}
#[async_trait]
impl AlertReceiver<CRDPrometheus> for DiscordWebhook {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &CRDPrometheus) -> Result<Outcome, InterpretError> {
let ns = sender.namespace.clone();
let secret_name = format!("{}-secret", self.name.clone());
let webhook_key = format!("{}", self.url.clone());
let mut string_data = BTreeMap::new();
string_data.insert("webhook-url".to_string(), webhook_key.clone());
let receiver_config = json!({
"name": self.name,
"discordConfigs": [
{
"apiURL": {
"key": "webhook-url",
"name": format!("{}-secret", self.name)
},
"title": "{{ template \"discord.default.title\" . }}",
"message": "{{ template \"discord.default.message\" . }}"
}
]
});
let secret = Secret {
metadata: kube::core::ObjectMeta {
name: Some(secret_name.clone()),
..Default::default()
},
string_data: Some(string_data),
type_: Some("Opaque".to_string()),
..Default::default()
};
Ok(ReceiverInstallPlan {
install_operation: Some(vec![InstallOperation::CreateSecret {
name: secret_name,
data: string_data,
}]),
route: Some(self.route.clone()),
receiver: Some(
serde_yaml::to_value(receiver_config)
.map_err(|e| InterpretError::new(e.to_string()))
.expect("failed to build yaml value"),
),
})
let _ = sender.client.apply(&secret, Some(&ns)).await;
let spec = AlertmanagerConfigSpec {
data: json!({
"route": {
"receiver": self.name,
},
"receivers": [
{
"name": self.name,
"discordConfigs": [
{
"apiURL": {
"name": secret_name,
"key": "webhook-url",
},
"title": "{{ template \"discord.default.title\" . }}",
"message": "{{ template \"discord.default.message\" . }}"
}
]
}
]
}),
};
let alertmanager_configs = AlertmanagerConfig {
metadata: ObjectMeta {
name: Some(self.name.clone().to_string()),
labels: Some(std::collections::BTreeMap::from([(
"alertmanagerConfig".to_string(),
"enabled".to_string(),
)])),
namespace: Some(ns),
..Default::default()
},
spec,
};
sender
.client
.apply(&alertmanager_configs, Some(&sender.namespace))
.await?;
Ok(Outcome::success(format!(
"installed crd-alertmanagerconfigs for {}",
self.name
)))
}
fn name(&self) -> String {
self.name.clone()
"discord-webhook".to_string()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<RedHatClusterObservability>> {
fn clone_box(&self) -> Box<dyn AlertReceiver<CRDPrometheus>> {
Box::new(self.clone())
}
fn as_any(&self) -> &dyn Any {
self
}
}
impl AlertReceiver<KubePrometheus> for DiscordReceiver {
fn build(&self) -> Result<ReceiverInstallPlan, InterpretError> {
let receiver_block = serde_yaml::to_value(json!({
"name": self.name,
"discord_configs": [{
"webhook_url": format!("{}", self.url),
"title": "{{ template \"discord.default.title\" . }}",
"message": "{{ template \"discord.default.message\" . }}"
}]
}))
.map_err(|e| InterpretError::new(e.to_string()))?;
Ok(ReceiverInstallPlan {
install_operation: None,
route: Some(self.route.clone()),
receiver: Some(receiver_block),
})
#[async_trait]
impl AlertReceiver<Prometheus> for DiscordWebhook {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &Prometheus) -> Result<Outcome, InterpretError> {
sender.install_receiver(self).await
}
fn name(&self) -> String {
self.name.clone()
"discord-webhook".to_string()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<Prometheus>> {
Box::new(self.clone())
}
fn as_any(&self) -> &dyn Any {
self
}
}
#[async_trait]
impl PrometheusReceiver for DiscordWebhook {
fn name(&self) -> String {
self.name.clone().to_string()
}
async fn configure_receiver(&self) -> AlertManagerChannelConfig {
self.get_config().await
}
}
#[async_trait]
impl AlertReceiver<KubePrometheus> for DiscordWebhook {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &KubePrometheus) -> Result<Outcome, InterpretError> {
sender.install_receiver(self).await
}
fn clone_box(&self) -> Box<dyn AlertReceiver<KubePrometheus>> {
Box::new(self.clone())
}
fn name(&self) -> String {
"discord-webhook".to_string()
}
fn as_any(&self) -> &dyn Any {
self
}
}
#[async_trait]
impl KubePrometheusReceiver for DiscordWebhook {
fn name(&self) -> String {
self.name.clone().to_string()
}
async fn configure_receiver(&self) -> AlertManagerChannelConfig {
self.get_config().await
}
}
#[async_trait]
impl AlertChannelConfig for DiscordWebhook {
async fn get_config(&self) -> AlertManagerChannelConfig {
let channel_global_config = None;
let channel_receiver = self.alert_channel_receiver().await;
let channel_route = self.alert_channel_route().await;
AlertManagerChannelConfig {
channel_global_config,
channel_receiver,
channel_route,
}
}
}
impl DiscordWebhook {
async fn alert_channel_route(&self) -> serde_yaml::Value {
let mut route = Mapping::new();
route.insert(
Value::String("receiver".to_string()),
Value::String(self.name.clone().to_string()),
);
route.insert(
Value::String("matchers".to_string()),
Value::Sequence(vec![Value::String("alertname!=Watchdog".to_string())]),
);
route.insert(Value::String("continue".to_string()), Value::Bool(true));
Value::Mapping(route)
}
async fn alert_channel_receiver(&self) -> serde_yaml::Value {
let mut receiver = Mapping::new();
receiver.insert(
Value::String("name".to_string()),
Value::String(self.name.clone().to_string()),
);
let mut discord_config = Mapping::new();
discord_config.insert(
Value::String("webhook_url".to_string()),
Value::String(self.url.to_string()),
);
receiver.insert(
Value::String("discord_configs".to_string()),
Value::Sequence(vec![Value::Mapping(discord_config)]),
);
Value::Mapping(receiver)
}
}
#[cfg(test)]
mod tests {
use super::*;
#[tokio::test]
async fn discord_serialize_should_match() {
let discord_receiver = DiscordWebhook {
name: K8sName("test-discord".to_string()),
url: Url::Url(url::Url::parse("https://discord.i.dont.exist.com").unwrap()),
selectors: vec![],
};
let discord_receiver_receiver =
serde_yaml::to_string(&discord_receiver.alert_channel_receiver().await).unwrap();
println!("receiver \n{:#}", discord_receiver_receiver);
let discord_receiver_receiver_yaml = r#"name: test-discord
discord_configs:
- webhook_url: https://discord.i.dont.exist.com/
"#
.to_string();
let discord_receiver_route =
serde_yaml::to_string(&discord_receiver.alert_channel_route().await).unwrap();
println!("route \n{:#}", discord_receiver_route);
let discord_receiver_route_yaml = r#"receiver: test-discord
matchers:
- alertname!=Watchdog
continue: true
"#
.to_string();
assert_eq!(discord_receiver_receiver, discord_receiver_receiver_yaml);
assert_eq!(discord_receiver_route, discord_receiver_route_yaml);
}
}

View File

@@ -1,13 +1,25 @@
use std::any::Any;
use async_trait::async_trait;
use kube::api::ObjectMeta;
use log::debug;
use serde::Serialize;
use serde_json::json;
use serde_yaml::{Mapping, Value};
use crate::{
interpret::InterpretError,
interpret::{InterpretError, Outcome},
modules::monitoring::{
kube_prometheus::KubePrometheus, okd::OpenshiftClusterAlertSender, prometheus::Prometheus,
red_hat_cluster_observability::RedHatClusterObservability,
kube_prometheus::{
crd::{
crd_alertmanager_config::CRDPrometheus, rhob_alertmanager_config::RHOBObservability,
},
prometheus::{KubePrometheus, KubePrometheusReceiver},
types::{AlertChannelConfig, AlertManagerChannelConfig},
},
prometheus::prometheus::{Prometheus, PrometheusReceiver},
},
topology::monitoring::{AlertReceiver, AlertRoute, ReceiverInstallPlan},
topology::oberservability::monitoring::{AlertManagerReceiver, AlertReceiver},
};
use harmony_types::net::Url;
@@ -15,115 +27,281 @@ use harmony_types::net::Url;
pub struct WebhookReceiver {
pub name: String,
pub url: Url,
pub route: AlertRoute,
}
impl WebhookReceiver {
fn build_receiver(&self) -> serde_json::Value {
json!({
"name": self.name,
"webhookConfigs": [
{
"url": self.url,
"httpConfig": {
"tlsConfig": {
"insecureSkipVerify": true
#[async_trait]
impl AlertReceiver<RHOBObservability> for WebhookReceiver {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &RHOBObservability) -> Result<Outcome, InterpretError> {
let spec = crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::AlertmanagerConfigSpec {
data: json!({
"route": {
"receiver": self.name,
},
"receivers": [
{
"name": self.name,
"webhookConfigs": [
{
"url": self.url,
"httpConfig": {
"tlsConfig": {
"insecureSkipVerify": true
}
}
}
]
}
}
}
]})
]
}),
};
let alertmanager_configs = crate::modules::monitoring::kube_prometheus::crd::rhob_alertmanager_config::AlertmanagerConfig {
metadata: ObjectMeta {
name: Some(self.name.clone()),
labels: Some(std::collections::BTreeMap::from([(
"alertmanagerConfig".to_string(),
"enabled".to_string(),
)])),
namespace: Some(sender.namespace.clone()),
..Default::default()
},
spec,
};
debug!(
"alert manager configs: \n{:#?}",
alertmanager_configs.clone()
);
sender
.client
.apply(&alertmanager_configs, Some(&sender.namespace))
.await?;
Ok(Outcome::success(format!(
"installed rhob-alertmanagerconfigs for {}",
self.name
)))
}
fn build_route(&self) -> serde_json::Value {
json!({
"name": self.name})
}
}
impl AlertReceiver<OpenshiftClusterAlertSender> for WebhookReceiver {
fn name(&self) -> String {
self.name.clone()
"webhook-receiver".to_string()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
fn clone_box(&self) -> Box<dyn AlertReceiver<RHOBObservability>> {
Box::new(self.clone())
}
fn build(&self) -> Result<crate::topology::monitoring::ReceiverInstallPlan, InterpretError> {
let receiver = self.build_receiver();
let receiver =
serde_yaml::to_value(receiver).map_err(|e| InterpretError::new(e.to_string()))?;
Ok(ReceiverInstallPlan {
install_operation: None,
route: Some(self.route.clone()),
receiver: Some(receiver),
})
fn as_any(&self) -> &dyn Any {
self
}
}
impl AlertReceiver<RedHatClusterObservability> for WebhookReceiver {
fn name(&self) -> String {
self.name.clone()
#[async_trait]
impl AlertReceiver<CRDPrometheus> for WebhookReceiver {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &CRDPrometheus) -> Result<Outcome, InterpretError> {
let spec = crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::AlertmanagerConfigSpec {
data: json!({
"route": {
"receiver": self.name,
},
"receivers": [
{
"name": self.name,
"webhookConfigs": [
{
"url": self.url,
}
]
}
]
}),
};
let alertmanager_configs = crate::modules::monitoring::kube_prometheus::crd::crd_alertmanager_config::AlertmanagerConfig {
metadata: ObjectMeta {
name: Some(self.name.clone()),
labels: Some(std::collections::BTreeMap::from([(
"alertmanagerConfig".to_string(),
"enabled".to_string(),
)])),
namespace: Some(sender.namespace.clone()),
..Default::default()
},
spec,
};
debug!(
"alert manager configs: \n{:#?}",
alertmanager_configs.clone()
);
sender
.client
.apply(&alertmanager_configs, Some(&sender.namespace))
.await?;
Ok(Outcome::success(format!(
"installed crd-alertmanagerconfigs for {}",
self.name
)))
}
fn clone_box(&self) -> Box<dyn AlertReceiver<RedHatClusterObservability>> {
fn name(&self) -> String {
"webhook-receiver".to_string()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<CRDPrometheus>> {
Box::new(self.clone())
}
fn build(&self) -> Result<crate::topology::monitoring::ReceiverInstallPlan, InterpretError> {
let receiver = self.build_receiver();
let receiver =
serde_yaml::to_value(receiver).map_err(|e| InterpretError::new(e.to_string()))?;
Ok(ReceiverInstallPlan {
install_operation: None,
route: Some(self.route.clone()),
receiver: Some(receiver),
})
}
}
impl AlertReceiver<KubePrometheus> for WebhookReceiver {
fn name(&self) -> String {
self.name.clone()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<KubePrometheus>> {
Box::new(self.clone())
}
fn build(&self) -> Result<crate::topology::monitoring::ReceiverInstallPlan, InterpretError> {
let receiver = self.build_receiver();
let receiver =
serde_yaml::to_value(receiver).map_err(|e| InterpretError::new(e.to_string()))?;
Ok(ReceiverInstallPlan {
install_operation: None,
route: Some(self.route.clone()),
receiver: Some(receiver),
})
fn as_any(&self) -> &dyn Any {
self
}
}
#[async_trait]
impl AlertReceiver<Prometheus> for WebhookReceiver {
fn name(&self) -> String {
self.name.clone()
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &Prometheus) -> Result<Outcome, InterpretError> {
sender.install_receiver(self).await
}
fn name(&self) -> String {
"webhook-receiver".to_string()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<Prometheus>> {
Box::new(self.clone())
}
fn build(&self) -> Result<crate::topology::monitoring::ReceiverInstallPlan, InterpretError> {
let receiver = self.build_receiver();
let receiver =
serde_yaml::to_value(receiver).map_err(|e| InterpretError::new(e.to_string()))?;
Ok(ReceiverInstallPlan {
install_operation: None,
route: Some(self.route.clone()),
receiver: Some(receiver),
})
fn as_any(&self) -> &dyn Any {
self
}
}
#[async_trait]
impl PrometheusReceiver for WebhookReceiver {
fn name(&self) -> String {
self.name.clone()
}
async fn configure_receiver(&self) -> AlertManagerChannelConfig {
self.get_config().await
}
}
#[async_trait]
impl AlertReceiver<KubePrometheus> for WebhookReceiver {
fn as_alertmanager_receiver(&self) -> Result<AlertManagerReceiver, String> {
todo!()
}
async fn install(&self, sender: &KubePrometheus) -> Result<Outcome, InterpretError> {
sender.install_receiver(self).await
}
fn name(&self) -> String {
"webhook-receiver".to_string()
}
fn clone_box(&self) -> Box<dyn AlertReceiver<KubePrometheus>> {
Box::new(self.clone())
}
fn as_any(&self) -> &dyn Any {
self
}
}
#[async_trait]
impl KubePrometheusReceiver for WebhookReceiver {
fn name(&self) -> String {
self.name.clone()
}
async fn configure_receiver(&self) -> AlertManagerChannelConfig {
self.get_config().await
}
}
#[async_trait]
impl AlertChannelConfig for WebhookReceiver {
async fn get_config(&self) -> AlertManagerChannelConfig {
let channel_global_config = None;
let channel_receiver = self.alert_channel_receiver().await;
let channel_route = self.alert_channel_route().await;
AlertManagerChannelConfig {
channel_global_config,
channel_receiver,
channel_route,
}
}
}
impl WebhookReceiver {
async fn alert_channel_route(&self) -> serde_yaml::Value {
let mut route = Mapping::new();
route.insert(
Value::String("receiver".to_string()),
Value::String(self.name.clone()),
);
route.insert(
Value::String("matchers".to_string()),
Value::Sequence(vec![Value::String("alertname!=Watchdog".to_string())]),
);
route.insert(Value::String("continue".to_string()), Value::Bool(true));
Value::Mapping(route)
}
async fn alert_channel_receiver(&self) -> serde_yaml::Value {
let mut receiver = Mapping::new();
receiver.insert(
Value::String("name".to_string()),
Value::String(self.name.clone()),
);
let mut webhook_config = Mapping::new();
webhook_config.insert(
Value::String("url".to_string()),
Value::String(self.url.to_string()),
);
receiver.insert(
Value::String("webhook_configs".to_string()),
Value::Sequence(vec![Value::Mapping(webhook_config)]),
);
Value::Mapping(receiver)
}
}
#[cfg(test)]
mod tests {
use super::*;
#[tokio::test]
async fn webhook_serialize_should_match() {
let webhook_receiver = WebhookReceiver {
name: "test-webhook".to_string(),
url: Url::Url(url::Url::parse("https://webhook.i.dont.exist.com").unwrap()),
};
let webhook_receiver_receiver =
serde_yaml::to_string(&webhook_receiver.alert_channel_receiver().await).unwrap();
println!("receiver \n{:#}", webhook_receiver_receiver);
let webhook_receiver_receiver_yaml = r#"name: test-webhook
webhook_configs:
- url: https://webhook.i.dont.exist.com/
"#
.to_string();
let webhook_receiver_route =
serde_yaml::to_string(&webhook_receiver.alert_channel_route().await).unwrap();
println!("route \n{:#}", webhook_receiver_route);
let webhook_receiver_route_yaml = r#"receiver: test-webhook
matchers:
- alertname!=Watchdog
continue: true
"#
.to_string();
assert_eq!(webhook_receiver_receiver, webhook_receiver_receiver_yaml);
assert_eq!(webhook_receiver_route, webhook_receiver_route_yaml);
}
}

View File

@@ -1,15 +0,0 @@
use crate::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;
pub fn high_http_error_rate() -> PrometheusAlertRule {
let expression = r#"(
sum(rate(http_requests_total{status=~"5.."}[5m])) by (job, route, service)
/
sum(rate(http_requests_total[5m])) by (job, route, service)
) > 0.05 and sum(rate(http_requests_total[5m])) by (job, route, service) > 10"#;
PrometheusAlertRule::new("HighApplicationErrorRate", expression)
.for_duration("10m")
.label("severity", "warning")
.annotation("summary", "High HTTP error rate on {{ $labels.job }}")
.annotation("description", "Job {{ $labels.job }} (route {{ $labels.route }}) has an error rate > 5% over the last 10m.")
}

View File

@@ -1,2 +1 @@
pub mod alerts;
pub mod prometheus_alert_rule;

View File

@@ -1,13 +1,79 @@
use std::collections::HashMap;
use std::collections::{BTreeMap, HashMap};
use async_trait::async_trait;
use serde::Serialize;
use crate::{
interpret::InterpretError,
modules::monitoring::{kube_prometheus::KubePrometheus, okd::OpenshiftClusterAlertSender},
topology::monitoring::AlertRule,
interpret::{InterpretError, Outcome},
modules::monitoring::{
kube_prometheus::{
prometheus::{KubePrometheus, KubePrometheusRule},
types::{AlertGroup, AlertManagerAdditionalPromRules},
},
prometheus::prometheus::{Prometheus, PrometheusRule},
},
topology::oberservability::monitoring::AlertRule,
};
#[async_trait]
impl AlertRule<KubePrometheus> for AlertManagerRuleGroup {
async fn install(&self, sender: &KubePrometheus) -> Result<Outcome, InterpretError> {
sender.install_rule(self).await
}
fn clone_box(&self) -> Box<dyn AlertRule<KubePrometheus>> {
Box::new(self.clone())
}
}
#[async_trait]
impl AlertRule<Prometheus> for AlertManagerRuleGroup {
async fn install(&self, sender: &Prometheus) -> Result<Outcome, InterpretError> {
sender.install_rule(self).await
}
fn clone_box(&self) -> Box<dyn AlertRule<Prometheus>> {
Box::new(self.clone())
}
}
#[async_trait]
impl PrometheusRule for AlertManagerRuleGroup {
fn name(&self) -> String {
self.name.clone()
}
async fn configure_rule(&self) -> AlertManagerAdditionalPromRules {
let mut additional_prom_rules = BTreeMap::new();
additional_prom_rules.insert(
self.name.clone(),
AlertGroup {
groups: vec![self.clone()],
},
);
AlertManagerAdditionalPromRules {
rules: additional_prom_rules,
}
}
}
#[async_trait]
impl KubePrometheusRule for AlertManagerRuleGroup {
fn name(&self) -> String {
self.name.clone()
}
async fn configure_rule(&self) -> AlertManagerAdditionalPromRules {
let mut additional_prom_rules = BTreeMap::new();
additional_prom_rules.insert(
self.name.clone(),
AlertGroup {
groups: vec![self.clone()],
},
);
AlertManagerAdditionalPromRules {
rules: additional_prom_rules,
}
}
}
impl AlertManagerRuleGroup {
pub fn new(name: &str, rules: Vec<PrometheusAlertRule>) -> AlertManagerRuleGroup {
AlertManagerRuleGroup {
@@ -63,55 +129,3 @@ impl PrometheusAlertRule {
self
}
}
impl AlertRule<OpenshiftClusterAlertSender> for AlertManagerRuleGroup {
fn build_rule(&self) -> Result<serde_json::Value, InterpretError> {
let name = self.name.clone();
let mut rules: Vec<crate::modules::monitoring::okd::crd::alerting_rules::Rule> = vec![];
for rule in self.rules.clone() {
rules.push(rule.into())
}
let rule_groups =
vec![crate::modules::monitoring::okd::crd::alerting_rules::RuleGroup { name, rules }];
Ok(serde_json::to_value(rule_groups).map_err(|e| InterpretError::new(e.to_string()))?)
}
fn name(&self) -> String {
self.name.clone()
}
fn clone_box(&self) -> Box<dyn AlertRule<OpenshiftClusterAlertSender>> {
Box::new(self.clone())
}
}
impl AlertRule<KubePrometheus> for AlertManagerRuleGroup {
fn build_rule(&self) -> Result<serde_json::Value, InterpretError> {
let name = self.name.clone();
let mut rules: Vec<
crate::modules::monitoring::kube_prometheus::crd::crd_prometheus_rules::Rule,
> = vec![];
for rule in self.rules.clone() {
rules.push(rule.into())
}
let rule_groups = vec![
crate::modules::monitoring::kube_prometheus::crd::crd_prometheus_rules::RuleGroup {
name,
rules,
},
];
Ok(serde_json::to_value(rule_groups).map_err(|e| InterpretError::new(e.to_string()))?)
}
fn name(&self) -> String {
self.name.clone()
}
fn clone_box(&self) -> Box<dyn AlertRule<KubePrometheus>> {
Box::new(self.clone())
}
}

View File

@@ -5,26 +5,32 @@ use serde::Serialize;
use crate::{
interpret::Interpret,
modules::{application::Application, monitoring::prometheus::Prometheus},
modules::{
application::Application,
monitoring::{
grafana::grafana::Grafana, kube_prometheus::crd::crd_alertmanager_config::CRDPrometheus,
},
prometheus::prometheus::PrometheusMonitoring,
},
score::Score,
topology::{
K8sclient, Topology,
monitoring::{AlertReceiver, AlertingInterpret, Observability, ScrapeTarget},
oberservability::monitoring::{AlertReceiver, AlertingInterpret, ScrapeTarget},
},
};
#[derive(Debug, Clone, Serialize)]
pub struct ApplicationMonitoringScore {
pub sender: Prometheus,
pub sender: CRDPrometheus,
pub application: Arc<dyn Application>,
pub receivers: Vec<Box<dyn AlertReceiver<Prometheus>>>,
pub receivers: Vec<Box<dyn AlertReceiver<CRDPrometheus>>>,
}
impl<T: Topology + Observability<Prometheus> + K8sclient> Score<T> for ApplicationMonitoringScore {
impl<T: Topology + PrometheusMonitoring<CRDPrometheus> + K8sclient + Grafana> Score<T>
for ApplicationMonitoringScore
{
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
debug!("creating alerting interpret");
//TODO will need to use k8sclient to apply service monitors or find a way to pass
//them to the AlertingInterpret potentially via Sender Prometheus
Box::new(AlertingInterpret {
sender: self.sender.clone(),
receivers: self.receivers.clone(),

View File

@@ -9,27 +9,28 @@ use crate::{
inventory::Inventory,
modules::{
application::Application,
monitoring::red_hat_cluster_observability::RedHatClusterObservability,
monitoring::kube_prometheus::crd::{
crd_alertmanager_config::CRDPrometheus, rhob_alertmanager_config::RHOBObservability,
},
prometheus::prometheus::PrometheusMonitoring,
},
score::Score,
topology::{
Topology,
monitoring::{AlertReceiver, AlertingInterpret, Observability},
},
topology::{PreparationOutcome, Topology, oberservability::monitoring::AlertReceiver},
};
use harmony_types::id::Id;
#[derive(Debug, Clone, Serialize)]
pub struct ApplicationRedHatClusterMonitoringScore {
pub sender: RedHatClusterObservability,
pub struct ApplicationRHOBMonitoringScore {
pub sender: RHOBObservability,
pub application: Arc<dyn Application>,
pub receivers: Vec<Box<dyn AlertReceiver<RedHatClusterObservability>>>,
pub receivers: Vec<Box<dyn AlertReceiver<RHOBObservability>>>,
}
impl<T: Topology + Observability<RedHatClusterObservability>> Score<T>
for ApplicationRedHatClusterMonitoringScore
impl<T: Topology + PrometheusMonitoring<RHOBObservability>> Score<T>
for ApplicationRHOBMonitoringScore
{
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(ApplicationRedHatClusterMonitoringInterpret {
Box::new(ApplicationRHOBMonitoringInterpret {
score: self.clone(),
})
}
@@ -43,28 +44,38 @@ impl<T: Topology + Observability<RedHatClusterObservability>> Score<T>
}
#[derive(Debug)]
pub struct ApplicationRedHatClusterMonitoringInterpret {
score: ApplicationRedHatClusterMonitoringScore,
pub struct ApplicationRHOBMonitoringInterpret {
score: ApplicationRHOBMonitoringScore,
}
#[async_trait]
impl<T: Topology + Observability<RedHatClusterObservability>> Interpret<T>
for ApplicationRedHatClusterMonitoringInterpret
impl<T: Topology + PrometheusMonitoring<RHOBObservability>> Interpret<T>
for ApplicationRHOBMonitoringInterpret
{
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
//TODO will need to use k8sclient to apply crd ServiceMonitor or find a way to pass
//them to the AlertingInterpret potentially via Sender RedHatClusterObservability
let alerting_interpret = AlertingInterpret {
sender: self.score.sender.clone(),
receivers: self.score.receivers.clone(),
rules: vec![],
scrape_targets: None,
};
alerting_interpret.execute(inventory, topology).await
let result = topology
.install_prometheus(
&self.score.sender,
inventory,
Some(self.score.receivers.clone()),
)
.await;
match result {
Ok(outcome) => match outcome {
PreparationOutcome::Success { details: _ } => {
Ok(Outcome::success("Prometheus installed".into()))
}
PreparationOutcome::Noop => {
Ok(Outcome::noop("Prometheus installation skipped".into()))
}
},
Err(err) => Err(InterpretError::from(err)),
}
}
fn get_name(&self) -> InterpretName {

View File

@@ -1,41 +1,17 @@
use serde::Serialize;
use async_trait::async_trait;
use k8s_openapi::Resource;
use crate::topology::monitoring::{AlertReceiver, AlertRule, AlertSender, ScrapeTarget};
use crate::{
inventory::Inventory,
topology::{PreparationError, PreparationOutcome},
};
#[derive(Debug, Clone, Serialize)]
pub struct Grafana {
pub namespace: String,
}
impl AlertSender for Grafana {
fn name(&self) -> String {
"grafana".to_string()
}
}
impl Serialize for Box<dyn AlertReceiver<Grafana>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
impl Serialize for Box<dyn AlertRule<Grafana>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
impl Serialize for Box<dyn ScrapeTarget<Grafana>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
#[async_trait]
pub trait Grafana {
async fn ensure_grafana_operator(
&self,
inventory: &Inventory,
) -> Result<PreparationOutcome, PreparationError>;
async fn install_grafana(&self) -> Result<PreparationOutcome, PreparationError>;
}
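// Hypothetical call site: ensure the operator CRDs exist before deploying
// Grafana itself, mirroring how CRDPrometheus::configure/ensure_installed
// drive these two methods:
//
//   topology.ensure_grafana_operator(&inventory).await?;
//   topology.install_grafana().await?;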

View File

@@ -1,32 +0,0 @@
use serde::Serialize;
use crate::{
modules::monitoring::grafana::grafana::Grafana,
score::Score,
topology::{
Topology,
monitoring::{AlertReceiver, AlertRule, AlertingInterpret, Observability, ScrapeTarget},
},
};
#[derive(Clone, Debug, Serialize)]
pub struct GrafanaAlertingScore {
pub receivers: Vec<Box<dyn AlertReceiver<Grafana>>>,
pub rules: Vec<Box<dyn AlertRule<Grafana>>>,
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<Grafana>>>>,
pub sender: Grafana,
}
impl<T: Topology + Observability<Grafana>> Score<T> for GrafanaAlertingScore {
fn create_interpret(&self) -> Box<dyn crate::interpret::Interpret<T>> {
Box::new(AlertingInterpret {
sender: self.sender.clone(),
receivers: self.receivers.clone(),
rules: self.rules.clone(),
scrape_targets: self.scrape_targets.clone(),
})
}
fn name(&self) -> String {
"HelmPrometheusAlertingScore".to_string()
}
}

View File

@@ -0,0 +1,28 @@
use harmony_macros::hurl;
use non_blank_string_rs::NonBlankString;
use std::{collections::HashMap, str::FromStr};
use crate::modules::helm::chart::{HelmChartScore, HelmRepository};
pub fn grafana_helm_chart_score(ns: &str, namespace_scope: bool) -> HelmChartScore {
let mut values_overrides = HashMap::new();
values_overrides.insert(
NonBlankString::from_str("namespaceScope").unwrap(),
namespace_scope.to_string(),
);
HelmChartScore {
namespace: Some(NonBlankString::from_str(ns).unwrap()),
release_name: NonBlankString::from_str("grafana-operator").unwrap(),
chart_name: NonBlankString::from_str("grafana/grafana-operator").unwrap(),
chart_version: None,
values_overrides: Some(values_overrides),
values_yaml: None,
create_namespace: true,
install_only: true,
repository: Some(HelmRepository::new(
"grafana".to_string(),
hurl!("https://grafana.github.io/helm-charts"),
true,
)),
}
}
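// Hypothetical usage: grafana_helm_chart_score("grafana", true) installs the
// operator into the "grafana" namespace with namespaceScope=true, creating
// the namespace if it does not already exist.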

View File

@@ -0,0 +1 @@
pub mod helm_grafana;

View File

@@ -1,3 +0,0 @@
pub mod crd_grafana;
pub mod grafana_default_dashboard;
pub mod rhob_grafana;

View File

@@ -1 +0,0 @@
pub mod grafana_operator;

View File

@@ -1,7 +0,0 @@
pub mod crd;
pub mod helm;
pub mod score_ensure_grafana_ready;
pub mod score_grafana_alert_receiver;
pub mod score_grafana_datasource;
pub mod score_grafana_rule;
pub mod score_install_grafana;

View File

@@ -1,54 +0,0 @@
use serde::Serialize;
use crate::{
interpret::Interpret,
modules::monitoring::grafana::grafana::Grafana,
score::Score,
topology::{K8sclient, Topology},
};
#[derive(Debug, Clone, Serialize)]
pub struct GrafanaK8sEnsureReadyScore {
pub sender: Grafana,
}
impl<T: Topology + K8sclient> Score<T> for GrafanaK8sEnsureReadyScore {
fn name(&self) -> String {
"GrafanaK8sEnsureReadyScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
todo!()
}
}
// async fn ensure_ready(
// &self,
// inventory: &Inventory,
// ) -> Result<PreparationOutcome, PreparationError> {
// debug!("ensure grafana operator");
// let client = self.k8s_client().await.unwrap();
// let grafana_gvk = GroupVersionKind {
// group: "grafana.integreatly.org".to_string(),
// version: "v1beta1".to_string(),
// kind: "Grafana".to_string(),
// };
// let name = "grafanas.grafana.integreatly.org";
// let ns = "grafana";
//
// let grafana_crd = client
// .get_resource_json_value(name, Some(ns), &grafana_gvk)
// .await;
// match grafana_crd {
// Ok(_) => {
// return Ok(PreparationOutcome::Success {
// details: "Found grafana CRDs in cluster".to_string(),
// });
// }
//
// Err(_) => {
// return self
// .install_grafana_operator(inventory, Some("grafana"))
// .await;
// }
// };
// }

View File

@@ -1,24 +0,0 @@
use serde::Serialize;
use crate::{
interpret::Interpret,
modules::monitoring::grafana::grafana::Grafana,
score::Score,
topology::{K8sclient, Topology, monitoring::AlertReceiver},
};
#[derive(Debug, Clone, Serialize)]
pub struct GrafanaK8sReceiverScore {
pub sender: Grafana,
pub receiver: Box<dyn AlertReceiver<Grafana>>,
}
impl<T: Topology + K8sclient> Score<T> for GrafanaK8sReceiverScore {
fn name(&self) -> String {
"GrafanaK8sReceiverScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
todo!()
}
}

View File

@@ -1,83 +0,0 @@
use serde::Serialize;
use crate::{
interpret::Interpret,
modules::monitoring::grafana::grafana::Grafana,
score::Score,
topology::{K8sclient, Topology, monitoring::ScrapeTarget},
};
#[derive(Debug, Clone, Serialize)]
pub struct GrafanaK8sDatasourceScore {
pub sender: Grafana,
pub scrape_target: Box<dyn ScrapeTarget<Grafana>>,
}
impl<T: Topology + K8sclient> Score<T> for GrafanaK8sDatasourceScore {
fn name(&self) -> String {
"GrafanaK8sDatasourceScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
todo!()
}
}
// fn extract_and_normalize_token(&self, secret: &DynamicObject) -> Option<String> {
// let token_b64 = secret
// .data
// .get("token")
// .or_else(|| secret.data.get("data").and_then(|d| d.get("token")))
// .and_then(|v| v.as_str())?;
//
// let bytes = general_purpose::STANDARD.decode(token_b64).ok()?;
//
// let s = String::from_utf8(bytes).ok()?;
//
// let cleaned = s
// .trim_matches(|c: char| c.is_whitespace() || c == '\0')
// .to_string();
// Some(cleaned)
// }
// fn build_grafana_datasource(
// &self,
// name: &str,
// ns: &str,
// label_selector: &LabelSelector,
// url: &str,
// token: &str,
// ) -> GrafanaDatasource {
// let mut json_data = BTreeMap::new();
// json_data.insert("timeInterval".to_string(), "5s".to_string());
//
// GrafanaDatasource {
// metadata: ObjectMeta {
// name: Some(name.to_string()),
// namespace: Some(ns.to_string()),
// ..Default::default()
// },
// spec: GrafanaDatasourceSpec {
// instance_selector: label_selector.clone(),
// allow_cross_namespace_import: Some(true),
// values_from: None,
// datasource: GrafanaDatasourceConfig {
// access: "proxy".to_string(),
// name: name.to_string(),
// rype: "prometheus".to_string(),
// url: url.to_string(),
// database: None,
// json_data: Some(GrafanaDatasourceJsonData {
// time_interval: Some("60s".to_string()),
// http_header_name1: Some("Authorization".to_string()),
// tls_skip_verify: Some(true),
// oauth_pass_thru: Some(true),
// }),
// secure_json_data: Some(GrafanaDatasourceSecureJsonData {
// http_header_value1: Some(format!("Bearer {token}")),
// }),
// is_default: Some(false),
// editable: Some(true),
// },
// },
// }
// }

View File

@@ -1,67 +0,0 @@
use serde::Serialize;
use crate::{
interpret::Interpret,
modules::monitoring::grafana::grafana::Grafana,
score::Score,
topology::{K8sclient, Topology, monitoring::AlertRule},
};
#[derive(Debug, Clone, Serialize)]
pub struct GrafanaK8sRuleScore {
pub sender: Grafana,
pub rule: Box<dyn AlertRule<Grafana>>,
}
impl<T: Topology + K8sclient> Score<T> for GrafanaK8sRuleScore {
fn name(&self) -> String {
"GrafanaK8sRuleScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
todo!()
}
}
// kind: Secret
// apiVersion: v1
// metadata:
// name: credentials
// namespace: grafana
// stringData:
// PROMETHEUS_USERNAME: root
// PROMETHEUS_PASSWORD: secret
// type: Opaque
// ---
// apiVersion: grafana.integreatly.org/v1beta1
// kind: GrafanaDatasource
// metadata:
// name: grafanadatasource-sample
// spec:
// valuesFrom:
// - targetPath: "basicAuthUser"
// valueFrom:
// secretKeyRef:
// name: "credentials"
// key: "PROMETHEUS_USERNAME"
// - targetPath: "secureJsonData.basicAuthPassword"
// valueFrom:
// secretKeyRef:
// name: "credentials"
// key: "PROMETHEUS_PASSWORD"
// instanceSelector:
// matchLabels:
// dashboards: "grafana"
// datasource:
// name: prometheus
// type: prometheus
// access: proxy
// basicAuth: true
// url: http://prometheus-service:9090
// isDefault: true
// basicAuthUser: ${PROMETHEUS_USERNAME}
// jsonData:
// "tlsSkipVerify": true
// "timeInterval": "5s"
// secureJsonData:
// "basicAuthPassword": ${PROMETHEUS_PASSWORD} #

View File

@@ -1,223 +0,0 @@
use async_trait::async_trait;
use harmony_types::id::Id;
use serde::Serialize;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
modules::monitoring::grafana::grafana::Grafana,
score::Score,
topology::{HelmCommand, K8sclient, Topology},
};
#[derive(Debug, Clone, Serialize)]
pub struct GrafanaK8sInstallScore {
pub sender: Grafana,
}
impl<T: Topology + K8sclient + HelmCommand> Score<T> for GrafanaK8sInstallScore {
fn name(&self) -> String {
"GrafanaK8sEnsureReadyScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(GrafanaK8sInstallInterpret {})
}
}
#[derive(Debug, Clone, Serialize)]
pub struct GrafanaK8sInstallInterpret {}
#[async_trait]
impl<T: Topology + K8sclient + HelmCommand> Interpret<T> for GrafanaK8sInstallInterpret {
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
todo!()
}
fn get_name(&self) -> InterpretName {
InterpretName::Custom("GrafanaK8sInstallInterpret")
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
// let score = grafana_operator_helm_chart_score(sender.namespace.clone());
//
// score
// .create_interpret()
// .execute(inventory, self)
// .await
// .map_err(|e| PreparationError::new(e.to_string()))?;
//
//
// fn build_grafana_dashboard(
// &self,
// ns: &str,
// label_selector: &LabelSelector,
// ) -> GrafanaDashboard {
// let graf_dashboard = GrafanaDashboard {
// metadata: ObjectMeta {
// name: Some(format!("grafana-dashboard-{}", ns)),
// namespace: Some(ns.to_string()),
// ..Default::default()
// },
// spec: GrafanaDashboardSpec {
// resync_period: Some("30s".to_string()),
// instance_selector: label_selector.clone(),
// datasources: Some(vec![GrafanaDashboardDatasource {
// input_name: "DS_PROMETHEUS".to_string(),
// datasource_name: "thanos-openshift-monitoring".to_string(),
// }]),
// json: None,
// grafana_com: Some(GrafanaCom {
// id: 17406,
// revision: None,
// }),
// },
// };
// graf_dashboard
// }
//
// fn build_grafana(&self, ns: &str, labels: &BTreeMap<String, String>) -> GrafanaCRD {
// let grafana = GrafanaCRD {
// metadata: ObjectMeta {
// name: Some(format!("grafana-{}", ns)),
// namespace: Some(ns.to_string()),
// labels: Some(labels.clone()),
// ..Default::default()
// },
// spec: GrafanaSpec {
// config: None,
// admin_user: None,
// admin_password: None,
// ingress: None,
// persistence: None,
// resources: None,
// },
// };
// grafana
// }
//
// async fn build_grafana_ingress(&self, ns: &str) -> K8sIngressScore {
// let domain = self.get_domain(&format!("grafana-{}", ns)).await.unwrap();
// let name = format!("{}-grafana", ns);
// let backend_service = format!("grafana-{}-service", ns);
//
// K8sIngressScore {
// name: fqdn::fqdn!(&name),
// host: fqdn::fqdn!(&domain),
// backend_service: fqdn::fqdn!(&backend_service),
// port: 3000,
// path: Some("/".to_string()),
// path_type: Some(PathType::Prefix),
// namespace: Some(fqdn::fqdn!(&ns)),
// ingress_class_name: Some("openshift-default".to_string()),
// }
// }
// #[async_trait]
// impl Grafana for K8sAnywhereTopology {
// async fn install_grafana(&self) -> Result<PreparationOutcome, PreparationError> {
// let ns = "grafana";
//
// let mut label = BTreeMap::new();
//
// label.insert("dashboards".to_string(), "grafana".to_string());
//
// let label_selector = LabelSelector {
// match_labels: label.clone(),
// match_expressions: vec![],
// };
//
// let client = self.k8s_client().await?;
//
// let grafana = self.build_grafana(ns, &label);
//
// client.apply(&grafana, Some(ns)).await?;
// //TODO change this to a ensure ready or something better than just a timeout
// client
// .wait_until_deployment_ready(
// "grafana-grafana-deployment",
// Some("grafana"),
// Some(Duration::from_secs(30)),
// )
// .await?;
//
// let sa_name = "grafana-grafana-sa";
// let token_secret_name = "grafana-sa-token-secret";
//
// let sa_token_secret = self.build_sa_token_secret(token_secret_name, sa_name, ns);
//
// client.apply(&sa_token_secret, Some(ns)).await?;
// let secret_gvk = GroupVersionKind {
// group: "".to_string(),
// version: "v1".to_string(),
// kind: "Secret".to_string(),
// };
//
// let secret = client
// .get_resource_json_value(token_secret_name, Some(ns), &secret_gvk)
// .await?;
//
// let token = format!(
// "Bearer {}",
// self.extract_and_normalize_token(&secret).unwrap()
// );
//
// debug!("creating grafana clusterrole binding");
//
// let clusterrolebinding =
// self.build_cluster_rolebinding(sa_name, "cluster-monitoring-view", ns);
//
// client.apply(&clusterrolebinding, Some(ns)).await?;
//
// debug!("creating grafana datasource crd");
//
// let thanos_url = format!(
// "https://{}",
// self.get_domain("thanos-querier-openshift-monitoring")
// .await
// .unwrap()
// );
//
// let thanos_openshift_datasource = self.build_grafana_datasource(
// "thanos-openshift-monitoring",
// ns,
// &label_selector,
// &thanos_url,
// &token,
// );
//
// client.apply(&thanos_openshift_datasource, Some(ns)).await?;
//
// debug!("creating grafana dashboard crd");
// let dashboard = self.build_grafana_dashboard(ns, &label_selector);
//
// client.apply(&dashboard, Some(ns)).await?;
// debug!("creating grafana ingress");
// let grafana_ingress = self.build_grafana_ingress(ns).await;
//
// grafana_ingress
// .interpret(&Inventory::empty(), self)
// .await
// .map_err(|e| PreparationError::new(e.to_string()))?;
//
// Ok(PreparationOutcome::Success {
// details: "Installed grafana components".to_string(),
// })
// }
// }

View File

@@ -1,3 +1,2 @@
pub mod grafana;
pub mod grafana_alerting_score;
pub mod k8s;
pub mod helm;

View File

@@ -1,17 +1,91 @@
use std::sync::Arc;
use async_trait::async_trait;
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
#[derive(CustomResource, Serialize, Deserialize, Default, Debug, Clone, JsonSchema)]
use crate::{
interpret::InterpretError,
inventory::Inventory,
modules::{
monitoring::{
grafana::grafana::Grafana, kube_prometheus::crd::service_monitor::ServiceMonitor,
},
prometheus::prometheus::PrometheusMonitoring,
},
topology::{
K8sclient, Topology,
installable::Installable,
oberservability::monitoring::{AlertReceiver, AlertSender, ScrapeTarget},
},
};
use harmony_k8s::K8sClient;
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[kube(
group = "monitoring.coreos.com",
version = "v1",
version = "v1alpha1",
kind = "AlertmanagerConfig",
plural = "alertmanagerconfigs",
namespaced,
derive = "Default"
namespaced
)]
pub struct AlertmanagerConfigSpec {
#[serde(flatten)]
pub data: serde_json::Value,
}
#[derive(Debug, Clone, Serialize)]
pub struct CRDPrometheus {
pub namespace: String,
pub client: Arc<K8sClient>,
pub service_monitor: Vec<ServiceMonitor>,
}
impl AlertSender for CRDPrometheus {
fn name(&self) -> String {
"CRDAlertManager".to_string()
}
}
impl Clone for Box<dyn AlertReceiver<CRDPrometheus>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
impl Clone for Box<dyn ScrapeTarget<CRDPrometheus>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
impl Serialize for Box<dyn AlertReceiver<CRDPrometheus>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
#[async_trait]
impl<T: Topology + K8sclient + PrometheusMonitoring<CRDPrometheus> + Grafana> Installable<T>
for CRDPrometheus
{
async fn configure(&self, inventory: &Inventory, topology: &T) -> Result<(), InterpretError> {
topology.ensure_grafana_operator(inventory).await?;
topology.ensure_prometheus_operator(self, inventory).await?;
Ok(())
}
async fn ensure_installed(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<(), InterpretError> {
topology.install_grafana().await?;
topology.install_prometheus(&self, inventory, None).await?;
Ok(())
}
}
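A minimal driver sketch for this Installable flow, assuming a topology type that satisfies the same trait bounds (the function name bring_up_monitoring is illustrative, not part of the diff):
use crate::topology::installable::Installable;
async fn bring_up_monitoring<T>(
prom: &CRDPrometheus,
inventory: &Inventory,
topology: &T,
) -> Result<(), InterpretError>
where
T: Topology + K8sclient + PrometheusMonitoring<CRDPrometheus> + Grafana,
{
// configure ensures the grafana and prometheus operators are present...
prom.configure(inventory, topology).await?;
// ...then ensure_installed applies the grafana and prometheus stacks themselves.
prom.ensure_installed(inventory, topology).await?;
Ok(())
}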

View File

@@ -1,4 +1,4 @@
use crate::modules::monitoring::alert_rule::alerts::k8s::{
use crate::modules::prometheus::alerts::k8s::{
deployment::alert_deployment_unavailable,
pod::{alert_container_restarting, alert_pod_not_ready, pod_failed},
pvc::high_pvc_fill_rate_over_two_days,

View File

@@ -4,7 +4,7 @@ use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use crate::modules::monitoring::kube_prometheus::crd::crd_prometheuses::LabelSelector;
use super::crd_prometheuses::LabelSelector;
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[kube(

View File

@@ -6,14 +6,13 @@ use serde::{Deserialize, Serialize};
use crate::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;
#[derive(CustomResource, Default, Debug, Serialize, Deserialize, Clone, JsonSchema)]
#[derive(CustomResource, Debug, Serialize, Deserialize, Clone, JsonSchema)]
#[kube(
group = "monitoring.coreos.com",
version = "v1",
kind = "PrometheusRule",
plural = "prometheusrules",
namespaced,
derive = "Default"
namespaced
)]
#[serde(rename_all = "camelCase")]
pub struct PrometheusRuleSpec {

View File

@@ -1,18 +1,23 @@
use std::collections::BTreeMap;
use std::net::IpAddr;
use async_trait::async_trait;
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use crate::modules::monitoring::kube_prometheus::crd::crd_prometheuses::LabelSelector;
use crate::{
modules::monitoring::kube_prometheus::crd::{
crd_alertmanager_config::CRDPrometheus, crd_prometheuses::LabelSelector,
},
topology::oberservability::monitoring::ScrapeTarget,
};
#[derive(CustomResource, Default, Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[kube(
group = "monitoring.coreos.com",
version = "v1alpha1",
kind = "ScrapeConfig",
plural = "scrapeconfigs",
derive = "Default",
namespaced
)]
#[serde(rename_all = "camelCase")]
@@ -65,8 +70,8 @@ pub struct ScrapeConfigSpec {
#[serde(rename_all = "camelCase")]
pub struct StaticConfig {
pub targets: Vec<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub labels: Option<BTreeMap<String, String>>,
pub labels: Option<LabelSelector>,
}
/// Relabeling configuration for target or metric relabeling.
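With the labels field switched from a plain BTreeMap to LabelSelector, a static scrape target would now be built along these lines (a sketch; the target address and the omitted labels are illustrative):
let static_config = StaticConfig {
// address of the host to scrape, as "host:port"
targets: vec!["10.0.0.5:9100".to_string()],
// labels is now an Option<LabelSelector> rather than a raw label map
labels: None,
};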

View File

@@ -1,9 +1,22 @@
pub mod crd_alertmanager_config;
pub mod crd_alertmanagers;
pub mod crd_default_rules;
pub mod crd_grafana;
pub mod crd_prometheus_rules;
pub mod crd_prometheuses;
pub mod crd_scrape_config;
pub mod grafana_default_dashboard;
pub mod grafana_operator;
pub mod prometheus_operator;
pub mod rhob_alertmanager_config;
pub mod rhob_alertmanagers;
pub mod rhob_cluster_observability_operator;
pub mod rhob_default_rules;
pub mod rhob_grafana;
pub mod rhob_monitoring_stack;
pub mod rhob_prometheus_rules;
pub mod rhob_prometheuses;
pub mod rhob_role;
pub mod rhob_service_monitor;
pub mod role;
pub mod service_monitor;

View File

@@ -0,0 +1,48 @@
use std::sync::Arc;
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use crate::topology::oberservability::monitoring::{AlertReceiver, AlertSender};
use harmony_k8s::K8sClient;
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[kube(
group = "monitoring.rhobs",
version = "v1alpha1",
kind = "AlertmanagerConfig",
plural = "alertmanagerconfigs",
namespaced
)]
pub struct AlertmanagerConfigSpec {
#[serde(flatten)]
pub data: serde_json::Value,
}
#[derive(Debug, Clone, Serialize)]
pub struct RHOBObservability {
pub namespace: String,
pub client: Arc<K8sClient>,
}
impl AlertSender for RHOBObservability {
fn name(&self) -> String {
"RHOBAlertManager".to_string()
}
}
impl Clone for Box<dyn AlertReceiver<RHOBObservability>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
impl Serialize for Box<dyn AlertReceiver<RHOBObservability>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
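As a usage sketch mirroring the deleted KubePrometheusReceiverScore further down in this diff, an AlertmanagerConfig with the flattened spec might be assembled like this (the receiver name and route content are illustrative):
use kube::api::ObjectMeta;
let data = serde_json::json!({
"route": { "receiver": "team-ntfy" },
"receivers": [{ "name": "team-ntfy" }]
});
let am_cfg = AlertmanagerConfig {
metadata: ObjectMeta {
name: Some("team-ntfy".to_string()),
namespace: Some("monitoring".to_string()),
..Default::default()
},
// the flattened `data` value is serialized directly into the CR spec
spec: AlertmanagerConfigSpec { data },
};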

View File

@@ -2,7 +2,7 @@ use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use crate::modules::monitoring::red_hat_cluster_observability::crd::rhob_prometheuses::LabelSelector;
use super::crd_prometheuses::LabelSelector;
/// Rust CRD for `Alertmanager` from Prometheus Operator
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]

View File

@@ -0,0 +1,22 @@
use std::str::FromStr;
use non_blank_string_rs::NonBlankString;
use crate::modules::helm::chart::HelmChartScore;
//TODO package chart or something for COO okd
pub fn rhob_cluster_observability_operator() -> HelmChartScore {
HelmChartScore {
namespace: None,
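// NOTE: an empty release name makes NonBlankString::from_str return Err,
// so this unwrap panics at runtime; placeholder pending the TODO above.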
release_name: NonBlankString::from_str("").unwrap(),
chart_name: NonBlankString::from_str(
"oci://hub.nationtech.io/harmony/nt-prometheus-operator",
)
.unwrap(),
chart_version: None,
values_overrides: None,
values_yaml: None,
create_namespace: true,
install_only: true,
repository: None,
}
}

View File

@@ -1,11 +1,11 @@
use crate::modules::monitoring::{
alert_rule::alerts::k8s::{
use crate::modules::{
monitoring::kube_prometheus::crd::rhob_prometheus_rules::Rule,
prometheus::alerts::k8s::{
deployment::alert_deployment_unavailable,
pod::{alert_container_restarting, alert_pod_not_ready, pod_failed},
pvc::high_pvc_fill_rate_over_two_days,
service::alert_service_down,
},
red_hat_cluster_observability::crd::rhob_prometheus_rules::Rule,
};
pub fn build_default_application_rules() -> Vec<Rule> {

View File

@@ -4,7 +4,7 @@ use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use crate::modules::monitoring::red_hat_cluster_observability::crd::rhob_prometheuses::LabelSelector;
use crate::modules::monitoring::kube_prometheus::crd::rhob_prometheuses::LabelSelector;
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[kube(

View File

@@ -2,7 +2,7 @@ use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use crate::modules::monitoring::red_hat_cluster_observability::crd::rhob_prometheuses::LabelSelector;
use crate::modules::monitoring::kube_prometheus::crd::rhob_prometheuses::LabelSelector;
/// MonitoringStack CRD for monitoring.rhobs/v1alpha1
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
@@ -11,8 +11,7 @@ use crate::modules::monitoring::red_hat_cluster_observability::crd::rhob_prometh
version = "v1alpha1",
kind = "MonitoringStack",
plural = "monitoringstacks",
namespaced,
derive = "Default"
namespaced
)]
#[serde(rename_all = "camelCase")]
pub struct MonitoringStackSpec {

View File

@@ -4,6 +4,8 @@ use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use crate::modules::monitoring::kube_prometheus::types::Operator;
#[derive(CustomResource, Serialize, Deserialize, Debug, Clone, JsonSchema)]
#[kube(
group = "monitoring.rhobs",
@@ -91,14 +93,6 @@ pub struct LabelSelectorRequirement {
pub values: Vec<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
pub enum Operator {
In,
NotIn,
Exists,
DoesNotExist,
}
impl Default for PrometheusSpec {
fn default() -> Self {
PrometheusSpec {

View File

@@ -1,13 +1,20 @@
use super::config::KubePrometheusConfig;
use log::debug;
use non_blank_string_rs::NonBlankString;
use serde_yaml::{Mapping, Value};
use std::{
collections::BTreeMap,
str::FromStr,
sync::{Arc, Mutex},
};
use crate::modules::{
helm::chart::HelmChartScore,
monitoring::kube_prometheus::types::{Limits, Requests, Resources},
monitoring::kube_prometheus::types::{
AlertGroup, AlertManager, AlertManagerAdditionalPromRules, AlertManagerConfig,
AlertManagerConfigSelector, AlertManagerRoute, AlertManagerSpec, AlertManagerValues,
ConfigReloader, Limits, PrometheusConfig, Requests, Resources,
},
};
pub fn kube_prometheus_helm_chart_score(
@@ -59,7 +66,7 @@ pub fn kube_prometheus_helm_chart_score(
}
let _resource_section = resource_block(&resource_limit, 2);
let values = format!(
let mut values = format!(
r#"
global:
rbac:
@@ -274,6 +281,131 @@ prometheusOperator:
"#,
);
let prometheus_config =
crate::modules::monitoring::kube_prometheus::types::PrometheusConfigValues {
prometheus: PrometheusConfig {
prometheus: bool::from_str(prometheus.as_str()).expect("couldn't parse bool"),
additional_service_monitors: config.additional_service_monitors.clone(),
},
};
let prometheus_config_yaml =
serde_yaml::to_string(&prometheus_config).expect("Failed to serialize YAML");
debug!(
"serialized prometheus config: \n {:#}",
prometheus_config_yaml
);
values.push_str(&prometheus_config_yaml);
// add required null receiver for prometheus alert manager
let mut null_receiver = Mapping::new();
null_receiver.insert(
Value::String("receiver".to_string()),
Value::String("null".to_string()),
);
null_receiver.insert(
Value::String("matchers".to_string()),
Value::Sequence(vec![Value::String("alertname!=Watchdog".to_string())]),
);
null_receiver.insert(Value::String("continue".to_string()), Value::Bool(true));
//add alert channels
let mut alert_manager_channel_config = AlertManagerConfig {
global: Mapping::new(),
route: AlertManagerRoute {
routes: vec![Value::Mapping(null_receiver)],
},
receivers: vec![serde_yaml::from_str("name: 'null'").unwrap()],
};
for receiver in config.alert_receiver_configs.iter() {
if let Some(global) = receiver.channel_global_config.clone() {
alert_manager_channel_config
.global
.insert(global.0, global.1);
}
alert_manager_channel_config
.route
.routes
.push(receiver.channel_route.clone());
alert_manager_channel_config
.receivers
.push(receiver.channel_receiver.clone());
}
let mut labels = BTreeMap::new();
labels.insert("alertmanagerConfig".to_string(), "enabled".to_string());
let alert_manager_config_selector = AlertManagerConfigSelector {
match_labels: labels,
};
let alert_manager_values = AlertManagerValues {
alertmanager: AlertManager {
enabled: config.alert_manager,
config: alert_manager_channel_config,
alertmanager_spec: AlertManagerSpec {
resources: Resources {
limits: Limits {
memory: "100Mi".to_string(),
cpu: "100m".to_string(),
},
requests: Requests {
memory: "100Mi".to_string(),
cpu: "100m".to_string(),
},
},
alert_manager_config_selector,
replicas: 2,
},
init_config_reloader: ConfigReloader {
resources: Resources {
limits: Limits {
memory: "100Mi".to_string(),
cpu: "100m".to_string(),
},
requests: Requests {
memory: "100Mi".to_string(),
cpu: "100m".to_string(),
},
},
},
},
};
let alert_manager_yaml =
serde_yaml::to_string(&alert_manager_values).expect("Failed to serialize YAML");
debug!("serialized alert manager: \n {:#}", alert_manager_yaml);
values.push_str(&alert_manager_yaml);
//format alert manager additional rules for helm chart
let mut merged_rules: BTreeMap<String, AlertGroup> = BTreeMap::new();
for additional_rule in config.alert_rules.clone() {
for (key, group) in additional_rule.rules {
merged_rules.insert(key, group);
}
}
let merged_rules = AlertManagerAdditionalPromRules {
rules: merged_rules,
};
let mut alert_manager_additional_rules = serde_yaml::Mapping::new();
let rules_value = serde_yaml::to_value(merged_rules).unwrap();
alert_manager_additional_rules.insert(
serde_yaml::Value::String("additionalPrometheusRulesMap".to_string()),
rules_value,
);
let alert_manager_additional_rules_yaml =
serde_yaml::to_string(&alert_manager_additional_rules).expect("Failed to serialize YAML");
debug!(
"alert_rules_yaml:\n{:#}",
alert_manager_additional_rules_yaml
);
values.push_str(&alert_manager_additional_rules_yaml);
debug!("full values.yaml: \n {:#}", values);
HelmChartScore {
namespace: Some(NonBlankString::from_str(&config.namespace.clone().unwrap()).unwrap()),
release_name: NonBlankString::from_str("kube-prometheus").unwrap(),

View File

@@ -2,41 +2,36 @@ use std::sync::{Arc, Mutex};
use serde::Serialize;
use super::helm::config::KubePrometheusConfig;
use super::{helm::config::KubePrometheusConfig, prometheus::KubePrometheus};
use crate::{
modules::monitoring::kube_prometheus::{KubePrometheus, types::ServiceMonitor},
modules::monitoring::kube_prometheus::types::ServiceMonitor,
score::Score,
topology::{
Topology,
monitoring::{AlertReceiver, AlertRule, AlertingInterpret, Observability, ScrapeTarget},
HelmCommand, Topology,
oberservability::monitoring::{AlertReceiver, AlertRule, AlertingInterpret},
tenant::TenantManager,
},
};
//TODO untested
#[derive(Clone, Debug, Serialize)]
pub struct KubePrometheusAlertingScore {
pub struct HelmPrometheusAlertingScore {
pub receivers: Vec<Box<dyn AlertReceiver<KubePrometheus>>>,
pub rules: Vec<Box<dyn AlertRule<KubePrometheus>>>,
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<KubePrometheus>>>>,
pub service_monitors: Vec<ServiceMonitor>,
pub config: Arc<Mutex<KubePrometheusConfig>>,
}
impl<T: Topology + Observability<KubePrometheus>> Score<T> for KubePrometheusAlertingScore {
impl<T: Topology + HelmCommand + TenantManager> Score<T> for HelmPrometheusAlertingScore {
fn create_interpret(&self) -> Box<dyn crate::interpret::Interpret<T>> {
//TODO test that additional service monitor is added
self.config
let config = Arc::new(Mutex::new(KubePrometheusConfig::new()));
config
.try_lock()
.expect("couldn't lock config")
.additional_service_monitors = self.service_monitors.clone();
Box::new(AlertingInterpret {
sender: KubePrometheus {
config: self.config.clone(),
},
sender: KubePrometheus { config },
receivers: self.receivers.clone(),
rules: self.rules.clone(),
scrape_targets: self.scrape_targets.clone(),
scrape_targets: None,
})
}
fn name(&self) -> String {

View File

@@ -1,71 +1,5 @@
use std::sync::{Arc, Mutex};
use async_trait::async_trait;
use serde::Serialize;
use crate::{
modules::monitoring::kube_prometheus::helm::config::KubePrometheusConfig,
topology::monitoring::{AlertReceiver, AlertRule, AlertSender, ScrapeTarget},
};
pub mod crd;
pub mod helm;
pub mod kube_prometheus_alerting_score;
pub mod score_kube_prometheus_alert_receivers;
pub mod score_kube_prometheus_ensure_ready;
pub mod score_kube_prometheus_rule;
pub mod score_kube_prometheus_scrape_target;
pub mod helm_prometheus_alert_score;
pub mod prometheus;
pub mod types;
impl Serialize for Box<dyn ScrapeTarget<KubePrometheus>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
#[async_trait]
impl AlertSender for KubePrometheus {
fn name(&self) -> String {
"HelmKubePrometheus".to_string()
}
}
#[derive(Clone, Debug, Serialize)]
pub struct KubePrometheus {
pub config: Arc<Mutex<KubePrometheusConfig>>,
}
impl Default for KubePrometheus {
fn default() -> Self {
Self::new()
}
}
impl KubePrometheus {
pub fn new() -> Self {
Self {
config: Arc::new(Mutex::new(KubePrometheusConfig::new())),
}
}
}
impl Serialize for Box<dyn AlertReceiver<KubePrometheus>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
impl Serialize for Box<dyn AlertRule<KubePrometheus>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}

View File

@@ -0,0 +1,167 @@
use std::sync::{Arc, Mutex};
use async_trait::async_trait;
use log::{debug, error};
use serde::Serialize;
use crate::{
interpret::{InterpretError, Outcome},
inventory::Inventory,
modules::monitoring::alert_rule::prometheus_alert_rule::AlertManagerRuleGroup,
score,
topology::{
HelmCommand, Topology,
installable::Installable,
oberservability::monitoring::{AlertReceiver, AlertRule, AlertSender},
tenant::TenantManager,
},
};
use score::Score;
use super::{
helm::{
config::KubePrometheusConfig, kube_prometheus_helm_chart::kube_prometheus_helm_chart_score,
},
types::{AlertManagerAdditionalPromRules, AlertManagerChannelConfig},
};
#[async_trait]
impl AlertSender for KubePrometheus {
fn name(&self) -> String {
"HelmKubePrometheus".to_string()
}
}
#[async_trait]
impl<T: Topology + HelmCommand + TenantManager> Installable<T> for KubePrometheus {
async fn configure(&self, _inventory: &Inventory, topology: &T) -> Result<(), InterpretError> {
self.configure_with_topology(topology).await;
Ok(())
}
async fn ensure_installed(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<(), InterpretError> {
self.install_prometheus(inventory, topology).await?;
Ok(())
}
}
#[derive(Debug)]
pub struct KubePrometheus {
pub config: Arc<Mutex<KubePrometheusConfig>>,
}
impl Default for KubePrometheus {
fn default() -> Self {
Self::new()
}
}
impl KubePrometheus {
pub fn new() -> Self {
Self {
config: Arc::new(Mutex::new(KubePrometheusConfig::new())),
}
}
pub async fn configure_with_topology<T: TenantManager>(&self, topology: &T) {
let ns = topology
.get_tenant_config()
.await
.map(|cfg| cfg.name.clone())
.unwrap_or_else(|| "monitoring".to_string());
error!("This must be refactored, see comments in pr #74");
debug!("NS: {}", ns);
self.config.lock().unwrap().namespace = Some(ns);
}
pub async fn install_receiver(
&self,
prometheus_receiver: &dyn KubePrometheusReceiver,
) -> Result<Outcome, InterpretError> {
let prom_receiver = prometheus_receiver.configure_receiver().await;
debug!(
"adding alert receiver to prometheus config: {:#?}",
&prom_receiver
);
let mut config = self.config.lock().unwrap();
config.alert_receiver_configs.push(prom_receiver);
let prom_receiver_name = prometheus_receiver.name();
debug!("installed alert receiver {}", &prom_receiver_name);
Ok(Outcome::success(format!(
"Sucessfully installed receiver {}",
prom_receiver_name
)))
}
pub async fn install_rule(
&self,
prometheus_rule: &AlertManagerRuleGroup,
) -> Result<Outcome, InterpretError> {
let prometheus_rule = prometheus_rule.configure_rule().await;
let mut config = self.config.lock().unwrap();
config.alert_rules.push(prometheus_rule.clone());
Ok(Outcome::success(format!(
"Successfully installed alert rule: {:#?},",
prometheus_rule
)))
}
pub async fn install_prometheus<T: Topology + HelmCommand + Send + Sync>(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
kube_prometheus_helm_chart_score(self.config.clone())
.interpret(inventory, topology)
.await
}
}
#[async_trait]
pub trait KubePrometheusReceiver: Send + Sync + std::fmt::Debug {
fn name(&self) -> String;
async fn configure_receiver(&self) -> AlertManagerChannelConfig;
}
impl Serialize for Box<dyn AlertReceiver<KubePrometheus>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
impl Clone for Box<dyn AlertReceiver<KubePrometheus>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
#[async_trait]
pub trait KubePrometheusRule: Send + Sync + std::fmt::Debug {
fn name(&self) -> String;
async fn configure_rule(&self) -> AlertManagerAdditionalPromRules;
}
impl Serialize for Box<dyn AlertRule<KubePrometheus>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
impl Clone for Box<dyn AlertRule<KubePrometheus>> {
fn clone(&self) -> Self {
self.clone_box()
}
}

View File

@@ -1,61 +0,0 @@
use kube::api::ObjectMeta;
use serde::Serialize;
use crate::{
interpret::Interpret,
modules::{
k8s::resource::K8sResourceScore,
monitoring::kube_prometheus::{
KubePrometheus,
crd::crd_alertmanager_config::{AlertmanagerConfig, AlertmanagerConfigSpec},
},
},
score::Score,
topology::{K8sclient, Topology, monitoring::AlertReceiver},
};
#[derive(Debug, Clone, Serialize)]
pub struct KubePrometheusReceiverScore {
pub sender: KubePrometheus,
pub receiver: Box<dyn AlertReceiver<KubePrometheus>>,
}
impl<T: Topology + K8sclient> Score<T> for KubePrometheusReceiverScore {
fn name(&self) -> String {
"KubePrometheusReceiverScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
let name = self.receiver.name();
let namespace = self.sender.config.lock().unwrap().namespace.clone();
let install_plan = self.receiver.build().expect("failed to build install plan");
let route = install_plan.route.expect(&format!(
"failed to build route for receveiver {}",
name.clone()
));
let route = serde_yaml::to_value(route).expect("failed to serialize route object to yaml");
let receiver = install_plan.receiver.expect(&format!(
"failed to build receiver path for receiver {}",
name.clone()
));
let data = serde_json::json!({
"route": route,
"receivers": [receiver]
});
let alertmanager_config = AlertmanagerConfig {
metadata: ObjectMeta {
name: Some(name),
namespace: namespace.clone(),
..Default::default()
},
spec: AlertmanagerConfigSpec { data: data },
};
K8sResourceScore::single(alertmanager_config, namespace).create_interpret()
}
}

View File

@@ -1,80 +0,0 @@
use async_trait::async_trait;
use harmony_types::id::Id;
use serde::Serialize;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
modules::monitoring::kube_prometheus::KubePrometheus,
score::Score,
topology::{K8sclient, Topology},
};
#[derive(Clone, Debug, Serialize)]
pub struct KubePrometheusEnsureReadyScore {
pub sender: KubePrometheus,
}
impl<T: Topology + K8sclient> Score<T> for KubePrometheusEnsureReadyScore {
fn name(&self) -> String {
"KubePrometheusEnsureReadyScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(KubePrometheusEnsureReadyInterpret {
sender: self.sender.clone(),
})
}
}
#[derive(Clone, Debug, Serialize)]
pub struct KubePrometheusEnsureReadyInterpret {
pub sender: KubePrometheus,
}
#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for KubePrometheusEnsureReadyInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
let client = topology.k8s_client().await?;
let namespace = self
.sender
.config
.lock()
.unwrap()
.namespace
.clone()
.unwrap_or("default".to_string());
let prometheus_name = "kube-prometheues-kube-prometheus-operator";
client
.wait_until_deployment_ready(prometheus_name, Some(&namespace), None)
.await?;
Ok(Outcome::success(format!(
"deployment: {} ready in ns: {}",
prometheus_name, namespace
)))
}
fn get_name(&self) -> InterpretName {
todo!()
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}

View File

@@ -1,46 +0,0 @@
use kube::api::ObjectMeta;
use serde::Serialize;
use crate::{
interpret::Interpret,
modules::{
k8s::resource::K8sResourceScore,
monitoring::kube_prometheus::{
KubePrometheus,
crd::crd_prometheus_rules::{PrometheusRule, PrometheusRuleSpec, RuleGroup},
},
},
score::Score,
topology::{K8sclient, Topology, monitoring::AlertRule},
};
#[derive(Debug, Clone, Serialize)]
pub struct KubePrometheusRuleScore {
pub sender: KubePrometheus,
pub rule: Box<dyn AlertRule<KubePrometheus>>,
}
impl<T: Topology + K8sclient> Score<T> for KubePrometheusRuleScore {
fn name(&self) -> String {
"KubePrometheusRuleScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
let name = self.rule.name();
let namespace = self.sender.config.lock().unwrap().namespace.clone();
let groups: Vec<RuleGroup> =
serde_json::from_value(self.rule.build_rule().expect("failed to build alert rule"))
.expect("failed to serialize rule group");
let prometheus_rule = PrometheusRule {
metadata: ObjectMeta {
name: Some(name.clone()),
namespace: namespace.clone(),
..Default::default()
},
spec: PrometheusRuleSpec { groups },
};
K8sResourceScore::single(prometheus_rule, namespace).create_interpret()
}
}

View File

@@ -1,61 +0,0 @@
use kube::api::ObjectMeta;
use serde::Serialize;
use crate::{
interpret::Interpret,
modules::{
k8s::resource::K8sResourceScore,
monitoring::kube_prometheus::{
KubePrometheus,
crd::crd_scrape_config::{ScrapeConfig, ScrapeConfigSpec, StaticConfig},
},
},
score::Score,
topology::{K8sclient, Topology, monitoring::ScrapeTarget},
};
#[derive(Debug, Clone, Serialize)]
pub struct KubePrometheusScrapeTargetScore {
pub sender: KubePrometheus,
pub scrape_target: Box<dyn ScrapeTarget<KubePrometheus>>,
}
impl<T: Topology + K8sclient> Score<T> for KubePrometheusScrapeTargetScore {
fn name(&self) -> String {
"KubePrometheusScrapeTargetScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
let name = self.scrape_target.name();
let namespace = self.sender.config.lock().unwrap().namespace.clone();
let external_target = self
.scrape_target
.build_scrape_target()
.expect("failed to build external scrape target");
//TODO this may need to modified to include a scrapeConfigSelector label from the
//prometheus operator
let labels = external_target.labels;
let scrape_target = ScrapeConfig {
metadata: ObjectMeta {
name: Some(name.clone()),
namespace: namespace.clone(),
..Default::default()
},
spec: ScrapeConfigSpec {
static_configs: Some(vec![StaticConfig {
targets: vec![format!("{}:{}", external_target.ip, external_target.port)],
labels,
}]),
metrics_path: external_target.path,
scrape_interval: external_target.interval,
job_name: Some(name),
..Default::default()
},
};
K8sResourceScore::single(scrape_target, namespace).create_interpret()
}
}

View File

@@ -3,7 +3,7 @@ use std::collections::{BTreeMap, HashMap};
use async_trait::async_trait;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use serde_yaml::Value;
use serde_yaml::{Mapping, Sequence, Value};
use crate::modules::monitoring::alert_rule::prometheus_alert_rule::AlertManagerRuleGroup;
@@ -12,6 +12,36 @@ pub trait AlertChannelConfig {
async fn get_config(&self) -> AlertManagerChannelConfig;
}
#[derive(Debug, Clone, Serialize)]
pub struct AlertManagerValues {
pub alertmanager: AlertManager,
}
#[derive(Debug, Clone, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct AlertManager {
pub enabled: bool,
pub config: AlertManagerConfig,
pub alertmanager_spec: AlertManagerSpec,
pub init_config_reloader: ConfigReloader,
}
#[derive(Debug, Clone, Serialize)]
pub struct ConfigReloader {
pub resources: Resources,
}
#[derive(Debug, Clone, Serialize)]
pub struct AlertManagerConfig {
pub global: Mapping,
pub route: AlertManagerRoute,
pub receivers: Sequence,
}
#[derive(Debug, Clone, Serialize)]
pub struct AlertManagerRoute {
pub routes: Sequence,
}
#[derive(Debug, Clone, Serialize)]
pub struct AlertManagerChannelConfig {
/// Expecting an option that contains two values
@@ -22,6 +52,20 @@ pub struct AlertManagerChannelConfig {
pub channel_receiver: Value,
}
#[derive(Debug, Clone, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct AlertManagerSpec {
pub(crate) resources: Resources,
pub replicas: u32,
pub alert_manager_config_selector: AlertManagerConfigSelector,
}
#[derive(Debug, Clone, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct AlertManagerConfigSelector {
pub match_labels: BTreeMap<String, String>,
}
#[derive(Debug, Clone, Serialize)]
pub struct Resources {
pub limits: Limits,

View File

@@ -6,5 +6,4 @@ pub mod kube_prometheus;
pub mod ntfy;
pub mod okd;
pub mod prometheus;
pub mod red_hat_cluster_observability;
pub mod scrape_target;

View File

@@ -0,0 +1,270 @@
use base64::prelude::*;
use async_trait::async_trait;
use harmony_types::id::Id;
use kube::api::DynamicObject;
use log::{debug, info, trace};
use serde::Serialize;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
modules::monitoring::okd::OpenshiftClusterAlertSender,
score::Score,
topology::{K8sclient, Topology, oberservability::monitoring::AlertReceiver},
};
impl Clone for Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
fn clone(&self) -> Self {
self.clone_box()
}
}
impl Serialize for Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
#[derive(Debug, Clone, Serialize)]
pub struct OpenshiftClusterAlertScore {
pub receivers: Vec<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>,
}
impl<T: Topology + K8sclient> Score<T> for OpenshiftClusterAlertScore {
fn name(&self) -> String {
"ClusterAlertScore".to_string()
}
#[doc(hidden)]
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(OpenshiftClusterAlertInterpret {
receivers: self.receivers.clone(),
})
}
}
#[derive(Debug)]
pub struct OpenshiftClusterAlertInterpret {
receivers: Vec<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>,
}
#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for OpenshiftClusterAlertInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
let client = topology.k8s_client().await?;
let openshift_monitoring_namespace = "openshift-monitoring";
let mut alertmanager_main_secret: DynamicObject = client
.get_secret_json_value("alertmanager-main", Some(openshift_monitoring_namespace))
.await?;
trace!("Got secret {alertmanager_main_secret:#?}");
let data: &mut serde_json::Value = &mut alertmanager_main_secret.data;
trace!("Alertmanager-main secret data {data:#?}");
let data_obj = data
.get_mut("data")
.ok_or(InterpretError::new(
"Missing 'data' field in alertmanager-main secret.".to_string(),
))?
.as_object_mut()
.ok_or(InterpretError::new(
"'data' field in alertmanager-main secret is expected to be an object ."
.to_string(),
))?;
let config_b64 = data_obj
.get("alertmanager.yaml")
.ok_or(InterpretError::new(
"Missing 'alertmanager.yaml' in alertmanager-main secret data".to_string(),
))?
.as_str()
.unwrap_or("");
trace!("Config base64 {config_b64}");
let config_bytes = BASE64_STANDARD.decode(config_b64).unwrap_or_default();
let mut am_config: serde_yaml::Value =
serde_yaml::from_str(&String::from_utf8(config_bytes).unwrap_or_default())
.unwrap_or_default();
debug!("Current alertmanager config {am_config:#?}");
// Make sure the receivers key exists before borrowing it mutably; the previous
// fallback handed back a detached temporary sequence that was never written
// back into am_config.
if am_config.get("receivers").is_none() {
am_config["receivers"] = serde_yaml::Value::Sequence(serde_yaml::Sequence::default());
}
let existing_receivers_sequence =
am_config["receivers"].as_sequence_mut().ok_or_else(|| {
InterpretError::new(
"Expected alertmanager config receivers to be a sequence".to_string(),
)
})?;
let mut additional_resources = vec![];
for custom_receiver in &self.receivers {
let name = custom_receiver.name();
let alertmanager_receiver = custom_receiver.as_alertmanager_receiver()?;
let receiver_json_value = alertmanager_receiver.receiver_config;
let receiver_yaml_string =
serde_json::to_string(&receiver_json_value).map_err(|e| {
InterpretError::new(format!("Failed to serialize receiver config: {}", e))
})?;
let receiver_yaml_value: serde_yaml::Value =
serde_yaml::from_str(&receiver_yaml_string).map_err(|e| {
InterpretError::new(format!("Failed to parse receiver config as YAML: {}", e))
})?;
if let Some(idx) = existing_receivers_sequence.iter().position(|r| {
r.get("name")
.and_then(|n| n.as_str())
.map_or(false, |n| n == name)
}) {
info!("Replacing existing AlertManager receiver: {}", name);
existing_receivers_sequence[idx] = receiver_yaml_value;
} else {
debug!("Adding new AlertManager receiver: {}", name);
existing_receivers_sequence.push(receiver_yaml_value);
}
additional_resources.push(alertmanager_receiver.additional_ressources);
}
// Same pattern for the top-level route mapping.
if am_config.get("route").is_none() {
am_config["route"] = serde_yaml::Value::Mapping(serde_yaml::Mapping::default());
}
let existing_route_mapping = am_config["route"].as_mapping_mut().ok_or_else(|| {
InterpretError::new("Expected alertmanager config route to be a mapping".to_string())
})?;
// And for the nested routes sequence inside it.
if existing_route_mapping.get("routes").is_none() {
existing_route_mapping.insert(
serde_yaml::Value::String("routes".to_string()),
serde_yaml::Value::Sequence(serde_yaml::Sequence::default()),
);
}
let existing_route_sequence = existing_route_mapping
.get_mut("routes")
.and_then(|routes| routes.as_sequence_mut())
.ok_or_else(|| {
InterpretError::new("Expected alertmanager config routes to be a sequence".to_string())
})?;
for custom_receiver in &self.receivers {
let name = custom_receiver.name();
let alertmanager_receiver = custom_receiver.as_alertmanager_receiver()?;
let route_json_value = alertmanager_receiver.route_config;
let route_yaml_string = serde_json::to_string(&route_json_value).map_err(|e| {
InterpretError::new(format!("Failed to serialize route config: {}", e))
})?;
let route_yaml_value: serde_yaml::Value = serde_yaml::from_str(&route_yaml_string)
.map_err(|e| {
InterpretError::new(format!("Failed to parse route config as YAML: {}", e))
})?;
if let Some(idy) = existing_route_sequence.iter().position(|r| {
r.get("receiver")
.and_then(|n| n.as_str())
.map_or(false, |n| n == name)
}) {
info!("Replacing existing AlertManager receiver: {}", name);
existing_route_sequence[idy] = route_yaml_value;
} else {
debug!("Adding new AlertManager receiver: {}", name);
existing_route_sequence.push(route_yaml_value);
}
}
debug!("Current alertmanager config {am_config:#?}");
// TODO
// - save new version of alertmanager config
// - write additional resources to the cluster
let am_config = serde_yaml::to_string(&am_config).map_err(|e| {
InterpretError::new(format!(
"Failed to serialize new alertmanager config to string : {e}"
))
})?;
let mut am_config_b64 = String::new();
BASE64_STANDARD.encode_string(am_config, &mut am_config_b64);
// TODO put update configmap value and save new value
data_obj.insert(
"alertmanager.yaml".to_string(),
serde_json::Value::String(am_config_b64),
);
// https://kubernetes.io/docs/reference/using-api/server-side-apply/#field-management
alertmanager_main_secret.metadata.managed_fields = None;
trace!("Applying new alertmanager_main_secret {alertmanager_main_secret:#?}");
client
.apply_dynamic(
&alertmanager_main_secret,
Some(openshift_monitoring_namespace),
true,
)
.await?;
let additional_resources = additional_resources.concat();
trace!("Applying additional ressources for alert receivers {additional_resources:#?}");
client
.apply_dynamic_many(
&additional_resources,
Some(openshift_monitoring_namespace),
true,
)
.await?;
Ok(Outcome::success(format!(
"Successfully configured {} cluster alert receivers: {}",
self.receivers.len(),
self.receivers
.iter()
.map(|r| r.name())
.collect::<Vec<_>>()
.join(", ")
)))
}
fn get_name(&self) -> InterpretName {
InterpretName::Custom("OpenshiftClusterAlertInterpret")
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
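Usage sketch, with my_receiver standing in for any AlertReceiver<OpenshiftClusterAlertSender> implementation:
let score = OpenshiftClusterAlertScore {
receivers: vec![Box::new(my_receiver)],
};
score
.create_interpret()
.execute(&Inventory::empty(), &topology)
.await?;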

View File

@@ -1,58 +0,0 @@
use std::collections::BTreeMap;
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use crate::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;
#[derive(CustomResource, Debug, Serialize, Deserialize, Clone, JsonSchema, Default)]
#[kube(
group = "monitoring.openshift.io",
version = "v1",
kind = "AlertingRule",
plural = "alertingrules",
namespaced,
derive = "Default"
)]
#[serde(rename_all = "camelCase")]
pub struct AlertingRuleSpec {
pub groups: serde_json::Value,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
pub struct RuleGroup {
pub name: String,
pub rules: Vec<Rule>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct Rule {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub alert: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub expr: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub for_: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub labels: Option<std::collections::BTreeMap<String, String>>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub annotations: Option<std::collections::BTreeMap<String, String>>,
}
impl From<PrometheusAlertRule> for Rule {
fn from(value: PrometheusAlertRule) -> Self {
Rule {
alert: Some(value.alert),
expr: Some(value.expr),
for_: value.r#for,
labels: Some(value.labels.into_iter().collect::<BTreeMap<_, _>>()),
annotations: Some(value.annotations.into_iter().collect::<BTreeMap<_, _>>()),
}
}
}

View File

@@ -1,3 +0,0 @@
pub mod alerting_rules;
pub mod scrape_target;
pub mod service_monitor;

View File

@@ -1,72 +0,0 @@
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;
#[derive(CustomResource, Debug, Serialize, Deserialize, Clone, JsonSchema, Default)]
#[kube(
group = "monitoring.coreos.com",
version = "v1alpha1",
kind = "ScrapeConfig",
plural = "scrapeconfigs",
namespaced,
derive = "Default"
)]
#[serde(rename_all = "camelCase")]
pub struct ScrapeConfigSpec {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub job_name: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub metrics_path: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub scrape_interval: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub scrape_timeout: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub scheme: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub static_configs: Option<Vec<StaticConfig>>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub relabelings: Option<Vec<RelabelConfig>>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub metric_relabelings: Option<Vec<RelabelConfig>>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Default)]
#[serde(rename_all = "camelCase")]
pub struct StaticConfig {
/// targets: ["host:port"]
pub targets: Vec<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub labels: Option<BTreeMap<String, String>>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Default)]
#[serde(rename_all = "camelCase")]
pub struct RelabelConfig {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub source_labels: Option<Vec<String>>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub separator: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub target_label: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub replacement: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub action: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub regex: Option<String>,
}

View File

@@ -1,97 +0,0 @@
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;
#[derive(CustomResource, Debug, Serialize, Deserialize, Clone, JsonSchema, Default)]
#[kube(
group = "monitoring.coreos.com",
version = "v1",
kind = "ServiceMonitor",
plural = "servicemonitors",
namespaced,
derive = "Default"
)]
#[serde(rename_all = "camelCase")]
pub struct ServiceMonitorSpec {
/// The label to use to retrieve the job name from.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub job_label: Option<String>,
/// TargetLabels transfers labels on the Kubernetes Service onto the target.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub target_labels: Option<Vec<String>>,
/// PodTargetLabels transfers labels on the Kubernetes Pod onto the target.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub pod_target_labels: Option<Vec<String>>,
/// A list of endpoints allowed as part of this ServiceMonitor.
pub endpoints: Vec<Endpoint>,
/// Selector to select Endpoints objects.
pub selector: LabelSelector,
/// Selector to select which namespaces the Kubernetes Endpoints objects are discovered from.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub namespace_selector: Option<NamespaceSelector>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Default)]
#[serde(rename_all = "camelCase")]
pub struct Endpoint {
/// Name of the service port this endpoint refers to.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub port: Option<String>,
/// HTTP path to scrape for metrics.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub path: Option<String>,
/// HTTP scheme to use for scraping.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub scheme: Option<String>,
/// Interval at which metrics should be scraped.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub interval: Option<String>,
/// Timeout after which the scrape is ended.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub scrape_timeout: Option<String>,
/// HonorLabels chooses the metric's labels on collisions with target labels.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub honor_labels: Option<bool>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Default)]
#[serde(rename_all = "camelCase")]
pub struct LabelSelector {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub match_labels: Option<BTreeMap<String, String>>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub match_expressions: Option<Vec<LabelSelectorRequirement>>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Default)]
#[serde(rename_all = "camelCase")]
pub struct LabelSelectorRequirement {
pub key: String,
pub operator: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub values: Option<Vec<String>>,
}
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema, Default)]
#[serde(rename_all = "camelCase")]
pub struct NamespaceSelector {
/// Boolean describing whether all namespaces are selected in contrast to a list restricting them.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub any: Option<bool>,
/// List of namespace names.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub match_names: Option<Vec<String>>,
}

View File

@@ -0,0 +1,60 @@
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
modules::monitoring::okd::config::Config,
score::Score,
topology::{K8sclient, Topology},
};
use async_trait::async_trait;
use harmony_types::id::Id;
use serde::Serialize;
#[derive(Clone, Debug, Serialize)]
pub struct OpenshiftUserWorkloadMonitoring {}
impl<T: Topology + K8sclient> Score<T> for OpenshiftUserWorkloadMonitoring {
fn name(&self) -> String {
"OpenshiftUserWorkloadMonitoringScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(OpenshiftUserWorkloadMonitoringInterpret {})
}
}
#[derive(Clone, Debug, Serialize)]
pub struct OpenshiftUserWorkloadMonitoringInterpret {}
#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for OpenshiftUserWorkloadMonitoringInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
let client = topology.k8s_client().await.unwrap();
Config::create_cluster_monitoring_config_cm(&client).await?;
Config::create_user_workload_monitoring_config_cm(&client).await?;
Config::verify_user_workload(&client).await?;
Ok(Outcome::success(
"successfully enabled user-workload-monitoring".to_string(),
))
}
fn get_name(&self) -> InterpretName {
InterpretName::Custom("OpenshiftUserWorkloadMonitoring")
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
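And a matching usage sketch for the user-workload score (topology assumed to expose a reachable cluster):
let outcome = OpenshiftUserWorkloadMonitoring {}
.create_interpret()
.execute(&Inventory::empty(), &topology)
.await?;
// on success the outcome reads "successfully enabled user-workload-monitoring"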

View File

@@ -1,17 +1,10 @@
use serde::Serialize;
use crate::topology::oberservability::monitoring::AlertSender;
use crate::topology::monitoring::{AlertReceiver, AlertRule, AlertSender, ScrapeTarget};
pub mod cluster_monitoring;
pub(crate) mod config;
pub mod enable_user_workload;
pub mod crd;
pub mod openshift_cluster_alerting_score;
pub mod score_enable_cluster_monitoring;
pub mod score_openshift_alert_rule;
pub mod score_openshift_receiver;
pub mod score_openshift_scrape_target;
pub mod score_user_workload;
pub mod score_verify_user_workload_monitoring;
#[derive(Debug, Clone, Serialize)]
#[derive(Debug)]
pub struct OpenshiftClusterAlertSender;
impl AlertSender for OpenshiftClusterAlertSender {
@@ -19,30 +12,3 @@ impl AlertSender for OpenshiftClusterAlertSender {
"OpenshiftClusterAlertSender".to_string()
}
}
impl Serialize for Box<dyn AlertReceiver<OpenshiftClusterAlertSender>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
impl Serialize for Box<dyn AlertRule<OpenshiftClusterAlertSender>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}
impl Serialize for Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>> {
fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
todo!()
}
}

View File

@@ -1,36 +0,0 @@
use serde::Serialize;
use crate::{
interpret::Interpret,
modules::monitoring::okd::OpenshiftClusterAlertSender,
score::Score,
topology::{
Topology,
monitoring::{AlertReceiver, AlertRule, AlertingInterpret, Observability, ScrapeTarget},
},
};
#[derive(Debug, Clone, Serialize)]
pub struct OpenshiftClusterAlertScore {
pub sender: OpenshiftClusterAlertSender,
pub receivers: Vec<Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>>,
pub rules: Vec<Box<dyn AlertRule<OpenshiftClusterAlertSender>>>,
pub scrape_targets: Option<Vec<Box<dyn ScrapeTarget<OpenshiftClusterAlertSender>>>>,
}
impl<T: Topology + Observability<OpenshiftClusterAlertSender>> Score<T>
for OpenshiftClusterAlertScore
{
fn name(&self) -> String {
"OpenshiftClusterAlertScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(AlertingInterpret {
sender: OpenshiftClusterAlertSender,
receivers: self.receivers.clone(),
rules: self.rules.clone(),
scrape_targets: self.scrape_targets.clone(),
})
}
}

View File

@@ -1,150 +0,0 @@
use std::{collections::BTreeMap, sync::Arc};
use async_trait::async_trait;
use harmony_k8s::K8sClient;
use harmony_types::id::Id;
use k8s_openapi::api::core::v1::ConfigMap;
use kube::api::{GroupVersionKind, ObjectMeta};
use log::debug;
use serde::Serialize;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
modules::k8s::resource::K8sResourceScore,
score::Score,
topology::{K8sclient, Topology},
};
#[derive(Clone, Debug, Serialize)]
pub struct OpenshiftEnableClusterMonitoringScore {}
impl<T: Topology + K8sclient> Score<T> for OpenshiftEnableClusterMonitoringScore {
fn name(&self) -> String {
"OpenshiftClusterMonitoringScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(OpenshiftEnableClusterMonitoringInterpret {})
}
}
#[derive(Clone, Debug, Serialize)]
pub struct OpenshiftEnableClusterMonitoringInterpret {}
#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for OpenshiftEnableClusterMonitoringInterpret {
async fn execute(
&self,
inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
let namespace = "openshift-monitoring".to_string();
let name = "cluster-monitoring-config".to_string();
let client = topology.k8s_client().await?;
let enabled = self
.check_cluster_monitoring_enabled(client, &name, &namespace)
.await
.map_err(|e| InterpretError::new(e))?;
debug!("enabled {:#?}", enabled);
match enabled {
true => Ok(Outcome::success(
"Openshift Cluster Monitoring already enabled".to_string(),
)),
false => {
let mut data = BTreeMap::new();
data.insert(
"config.yaml".to_string(),
r#"
enableUserWorkload: true
alertmanagerMain:
enableUserAlertmanagerConfig: true
"#
.to_string(),
);
let cm = ConfigMap {
metadata: ObjectMeta {
name: Some(name),
namespace: Some(namespace.clone()),
..Default::default()
},
data: Some(data),
..Default::default()
};
K8sResourceScore::single(cm, Some(namespace))
.create_interpret()
.execute(inventory, topology)
.await?;
Ok(Outcome::success(
"Successfully enabled Openshift Cluster Monitoring".to_string(),
))
}
}
}
fn get_name(&self) -> InterpretName {
InterpretName::Custom("OpenshiftEnableClusterMonitoringInterpret")
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
impl OpenshiftEnableClusterMonitoringInterpret {
async fn check_cluster_monitoring_enabled(
&self,
client: Arc<K8sClient>,
name: &str,
namespace: &str,
) -> Result<bool, String> {
let gvk = GroupVersionKind {
group: "".to_string(),
version: "v1".to_string(),
kind: "ConfigMap".to_string(),
};
let cm = match client
.get_resource_json_value(name, Some(namespace), &gvk)
.await
{
Ok(obj) => obj,
Err(_) => return Ok(false),
};
debug!("{:#?}", cm.data.pointer("/data/config.yaml"));
let config_yaml_str = match cm
.data
.pointer("/data/config.yaml")
.and_then(|v| v.as_str())
{
Some(s) => s,
None => return Ok(false),
};
debug!("{:#?}", config_yaml_str);
let parsed_config: serde_yaml::Value = serde_yaml::from_str(config_yaml_str)
.map_err(|e| format!("Failed to parse nested YAML: {}", e))?;
let enabled = parsed_config
.get("enableUserWorkload")
.and_then(|v| v.as_bool())
.unwrap_or(false);
debug!("{:#?}", enabled);
Ok(enabled)
}
}

View File

@@ -1,42 +0,0 @@
use kube::api::ObjectMeta;
use serde::Serialize;
use crate::{
interpret::Interpret,
modules::{
k8s::resource::K8sResourceScore,
monitoring::okd::{
OpenshiftClusterAlertSender,
crd::alerting_rules::{AlertingRule, AlertingRuleSpec},
},
},
score::Score,
topology::{K8sclient, Topology, monitoring::AlertRule},
};
#[derive(Debug, Clone, Serialize)]
pub struct OpenshiftAlertRuleScore {
pub rule: Box<dyn AlertRule<OpenshiftClusterAlertSender>>,
}
impl<T: Topology + K8sclient> Score<T> for OpenshiftAlertRuleScore {
fn name(&self) -> String {
"OpenshiftAlertingRuleScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
let namespace = "openshift-monitoring".to_string();
let alerting_rule = AlertingRule {
metadata: ObjectMeta {
name: Some(self.rule.name()),
namespace: Some(namespace.clone()),
..Default::default()
},
spec: AlertingRuleSpec {
groups: self.rule.build_rule().unwrap(),
},
};
K8sResourceScore::single(alerting_rule, Some(namespace)).create_interpret()
}
}

View File

@@ -1,213 +0,0 @@
use async_trait::async_trait;
use base64::{Engine as _, prelude::BASE64_STANDARD};
use harmony_types::id::Id;
use kube::api::DynamicObject;
use serde::Serialize;
use crate::{
data::Version,
interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
inventory::Inventory,
modules::monitoring::okd::OpenshiftClusterAlertSender,
score::Score,
topology::{
K8sclient, Topology,
monitoring::{AlertReceiver, AlertRoute, MatchOp},
},
};
#[derive(Debug, Clone, Serialize)]
pub struct OpenshiftReceiverScore {
pub receiver: Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>,
}
impl<T: Topology + K8sclient> Score<T> for OpenshiftReceiverScore {
fn name(&self) -> String {
"OpenshiftAlertReceiverScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(OpenshiftReceiverInterpret {
receiver: self.receiver.clone(),
})
}
}
#[derive(Debug)]
pub struct OpenshiftReceiverInterpret {
receiver: Box<dyn AlertReceiver<OpenshiftClusterAlertSender>>,
}
#[async_trait]
impl<T: Topology + K8sclient> Interpret<T> for OpenshiftReceiverInterpret {
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
let client = topology.k8s_client().await?;
let ns = "openshift-monitoring";
let mut am_secret: DynamicObject = client
.get_secret_json_value("alertmanager-main", Some(ns))
.await?;
let data = am_secret
.data
.get_mut("data")
.ok_or_else(|| {
InterpretError::new("Missing 'data' field in alertmanager-main secret".into())
})?
.as_object_mut()
.ok_or_else(|| InterpretError::new("'data' field must be a JSON object".into()))?;
let config_b64 = data
.get("alertmanager.yaml")
.and_then(|v| v.as_str())
.unwrap_or_default();
let config_bytes = BASE64_STANDARD.decode(config_b64).unwrap_or_default();
let mut am_config: serde_yaml::Value = serde_yaml::from_slice(&config_bytes)
.unwrap_or_else(|_| serde_yaml::Value::Mapping(serde_yaml::Mapping::new()));
let name = self.receiver.name();
let install_plan = self.receiver.build().expect("failed to build install plan");
let receiver = install_plan.receiver.expect("unable to find receiver path");
let alert_route = install_plan
.route
.ok_or_else(|| InterpretError::new("missing route".into()))?;
let route = self.serialize_route(&alert_route);
let route = serde_yaml::to_value(route).map_err(|e| InterpretError::new(e.to_string()))?;
if am_config.get("receivers").is_none() {
am_config["receivers"] = serde_yaml::Value::Sequence(vec![]);
}
if am_config.get("route").is_none() {
am_config["route"] = serde_yaml::Value::Mapping(serde_yaml::Mapping::new());
}
if am_config["route"].get("routes").is_none() {
am_config["route"]["routes"] = serde_yaml::Value::Sequence(vec![]);
}
{
let receivers_seq = am_config["receivers"].as_sequence_mut().unwrap();
if let Some(idx) = receivers_seq
.iter()
.position(|r| r.get("name").and_then(|n| n.as_str()) == Some(&name))
{
receivers_seq[idx] = receiver;
} else {
receivers_seq.push(receiver);
}
}
{
let route_seq = am_config["route"]["routes"].as_sequence_mut().unwrap();
if let Some(idx) = route_seq
.iter()
.position(|r| r.get("receiver").and_then(|n| n.as_str()) == Some(&name))
{
route_seq[idx] = route;
} else {
route_seq.push(route);
}
}
let yaml_str =
serde_yaml::to_string(&am_config).map_err(|e| InterpretError::new(e.to_string()))?;
let mut yaml_b64 = String::new();
BASE64_STANDARD.encode_string(yaml_str, &mut yaml_b64);
data.insert(
"alertmanager.yaml".to_string(),
serde_json::Value::String(yaml_b64),
);
am_secret.metadata.managed_fields = None;
client.apply_dynamic(&am_secret, Some(ns), true).await?;
Ok(Outcome::success(format!(
"Configured OpenShift cluster alert receiver: {}",
name
)))
}
fn get_name(&self) -> InterpretName {
InterpretName::Custom("OpenshiftAlertReceiverInterpret")
}
fn get_version(&self) -> Version {
todo!()
}
fn get_status(&self) -> InterpretStatus {
todo!()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
}
impl OpenshiftReceiverInterpret {
fn serialize_route(&self, route: &AlertRoute) -> serde_yaml::Value {
// Convert matchers
let matchers: Vec<String> = route
.matchers
.iter()
.map(|m| match m.operator {
MatchOp::Eq => format!("{} = {}", m.label, m.value),
MatchOp::NotEq => format!("{} != {}", m.label, m.value),
MatchOp::Regex => format!("{} =~ {}", m.label, m.value),
})
.collect();
// Recursively convert children routes
let children: Vec<serde_yaml::Value> = route
.children
.iter()
.map(|c| self.serialize_route(c))
.collect();
// Build the YAML object for this route
let mut route_map = serde_yaml::Mapping::new();
route_map.insert(
serde_yaml::Value::String("receiver".to_string()),
serde_yaml::Value::String(route.receiver.clone()),
);
if !matchers.is_empty() {
route_map.insert(
serde_yaml::Value::String("matchers".to_string()),
serde_yaml::to_value(matchers).unwrap(),
);
}
if !route.group_by.is_empty() {
route_map.insert(
serde_yaml::Value::String("group_by".to_string()),
serde_yaml::to_value(route.group_by.clone()).unwrap(),
);
}
if let Some(ref interval) = route.repeat_interval {
route_map.insert(
serde_yaml::Value::String("repeat_interval".to_string()),
serde_yaml::Value::String(interval.clone()),
);
}
route_map.insert(
serde_yaml::Value::String("continue".to_string()),
serde_yaml::Value::Bool(route.continue_matching),
);
if !children.is_empty() {
route_map.insert(
serde_yaml::Value::String("routes".to_string()),
serde_yaml::Value::Sequence(children),
);
}
serde_yaml::Value::Mapping(route_map)
}
}

Some files were not shown because too many files have changed in this diff.