Compare commits
5 Commits
feat/kube-
...
feat/teams
| Author | SHA1 | Date |
|---|---|---|
| | f96572848a | |
| | 78aadadd22 | |
| | b5c6e1c99d | |
| | f94c899bf7 | |
| | 77eb1228be | |
@@ -1,14 +0,0 @@
-name: Run Check Script
-on:
-  push:
-  pull_request:
-
-jobs:
-  check:
-    runs-on: rust-cargo
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-
-      - name: Run check script
-        run: bash check.sh
@@ -1,36 +0,0 @@
-# Contributing to the Harmony project
-
-## Write small P-R
-
-Aim for the smallest piece of work that is mergeable.
-
-Mergeable means that :
-
-- it does not break the build
-- it moves the codebase one step forward
-
-P-Rs can be many things, they do not have to be complete features.
-
-### What a P-R **should** be
-
-- Introduce a new trait : This will be the place to discuss the new trait addition, its design and implementation
-- A new implementation of a trait : a new concrete implementation of the LoadBalancer trait
-- A new CI check : something that improves quality, robustness, ci performance
-- Documentation improvements
-- Refactoring
-- Bugfix
-
-### What a P-R **should not** be
-
-- Large. Anything over 200 lines (excluding generated lines) should have a very good reason to be this large.
-- A mix of refactoring, bug fixes and new features.
-- Introducing multiple new features or ideas at once.
-- Multiple new implementations of a trait/functionnality at once
-
-The general idea is to keep P-Rs small and single purpose.
-
-## Commit message formatting
-
-We follow conventional commits guidelines.
-
-https://www.conventionalcommits.org/en/v1.0.0/
@@ -1,6 +1,6 @@
 # Architecture Decision Record: \<Title\>
 
-Initial Author: \<Name\>
+Name: \<Name\>
 
 Initial Date: \<Date\>
 
@@ -1,6 +1,6 @@
 # Architecture Decision Record: Helm and Kustomize Handling
 
-Initial Author: Taha Hawa
+Name: Taha Hawa
 
 Initial Date: 2025-04-15
 
@@ -1,7 +1,7 @@
 # Architecture Decision Record: Monitoring and Alerting
 
-Initial Author : Willem Rolleman
-Date : April 28 2025
+Proposed by: Willem Rolleman
+Date: April 28 2025
 
 ## Status
 
@@ -1,160 +0,0 @@
-# Architecture Decision Record: Multi-Tenancy Strategy for Harmony Managed Clusters
-
-Initial Author: Jean-Gabriel Gill-Couture
-
-Initial Date: 2025-05-26
-
-## Status
-
-Proposed
-
-## Context
-
-Harmony manages production OKD/Kubernetes clusters that serve multiple clients with varying trust levels and operational requirements. We need a multi-tenancy strategy that provides:
-
-1. **Strong isolation** between client workloads while maintaining operational simplicity
-2. **Controlled API access** allowing clients self-service capabilities within defined boundaries
-3. **Security-first approach** protecting both the cluster infrastructure and tenant data
-4. **Harmony-native implementation** using our Score/Interpret pattern for automated tenant provisioning
-5. **Scalable management** supporting both small trusted clients and larger enterprise customers
-
-The official Kubernetes multi-tenancy documentation identifies two primary models: namespace-based isolation and virtual control planes per tenant. Given Harmony's focus on operational simplicity, provider-agnostic abstractions (ADR-003), and hexagonal architecture (ADR-002), we must choose an approach that balances security, usability, and maintainability.
-
-Our clients represent a hybrid tenancy model:
-- **Customer multi-tenancy**: Each client operates independently with no cross-tenant trust
-- **Team multi-tenancy**: Individual clients may have multiple team members requiring coordinated access
-- **API access requirement**: Unlike pure SaaS scenarios, clients need controlled Kubernetes API access for self-service operations
-
-The official kubernetes documentation on multi tenancy heavily inspired this ADR : https://kubernetes.io/docs/concepts/security/multi-tenancy/
-
-## Decision
-
-Implement **namespace-based multi-tenancy** with the following architecture:
-
-### 1. Network Security Model
-- **Private cluster access**: Kubernetes API and OpenShift console accessible only via WireGuard VPN
-- **No public exposure**: Control plane endpoints remain internal to prevent unauthorized access attempts
-- **VPN-based authentication**: Initial access control through WireGuard client certificates
-
-### 2. Tenant Isolation Strategy
-- **Dedicated namespace per tenant**: Each client receives an isolated namespace with access limited only to the required resources and operations
-- **Complete network isolation**: NetworkPolicies prevent cross-namespace communication while allowing full egress to public internet
-- **Resource governance**: ResourceQuotas and LimitRanges enforce CPU, memory, and storage consumption limits
-- **Storage access control**: Clients can create PersistentVolumeClaims but cannot directly manipulate PersistentVolumes or access other tenants' storage
-
-### 3. Access Control Framework
-- **Principle of Least Privilege**: RBAC grants only necessary permissions within tenant namespace scope
-- **Namespace-scoped**: Clients can create/modify/delete resources within their namespace
-- **Cluster-level restrictions**: No access to cluster-wide resources, other namespaces, or sensitive cluster operations
-- **Whitelisted operations**: Controlled self-service capabilities for ingress, secrets, configmaps, and workload management
-
-### 4. Identity Management Evolution
-- **Phase 1**: Manual provisioning of VPN access and Kubernetes ServiceAccounts/Users
-- **Phase 2**: Migration to Keycloak-based identity management (aligning with ADR-006) for centralized authentication and lifecycle management
-
-### 5. Harmony Integration
-- **TenantScore implementation**: Declarative tenant provisioning using Harmony's Score/Interpret pattern
-- **Topology abstraction**: Tenant configuration abstracted from underlying Kubernetes implementation details
-- **Automated deployment**: Complete tenant setup automated through Harmony's orchestration capabilities
-
-## Rationale
-
-### Network Security Through VPN Access
-- **Defense in depth**: VPN requirement adds critical security layer preventing unauthorized cluster access
-- **Simplified firewall rules**: No need for complex public endpoint protections or rate limiting
-- **Audit capability**: VPN access provides clear audit trail of cluster connections
-- **Aligns with enterprise practices**: Most enterprise customers already use VPN infrastructure
-
-### Namespace Isolation vs Virtual Control Planes
-Following Kubernetes official guidance, namespace isolation provides:
-- **Lower resource overhead**: Virtual control planes require dedicated etcd, API server, and controller manager per tenant
-- **Operational simplicity**: Single control plane to maintain, upgrade, and monitor
-- **Cross-tenant service integration**: Enables future controlled cross-tenant communication if required
-- **Proven stability**: Namespace-based isolation is well-tested and widely deployed
-- **Cost efficiency**: Significantly lower infrastructure costs compared to dedicated control planes
-
-### Hybrid Tenancy Model Suitability
-Our approach addresses both customer and team multi-tenancy requirements:
-- **Customer isolation**: Strong network and RBAC boundaries prevent cross-tenant interference
-- **Team collaboration**: Multiple team members can share namespace access through group-based RBAC
-- **Self-service balance**: Controlled API access enables client autonomy without compromising security
-
-### Harmony Architecture Alignment
-- **Provider agnostic**: TenantScore abstracts multi-tenancy concepts, enabling future support for other Kubernetes distributions
-- **Hexagonal architecture**: Tenant management becomes an infrastructure capability accessed through well-defined ports
-- **Declarative automation**: Tenant lifecycle fully managed through Harmony's Score execution model
-
-## Consequences
-
-### Positive Consequences
-- **Strong security posture**: VPN + namespace isolation provides robust tenant separation
-- **Operational efficiency**: Single cluster management with automated tenant provisioning
-- **Client autonomy**: Self-service capabilities reduce operational support burden
-- **Scalable architecture**: Can support hundreds of tenants per cluster without architectural changes
-- **Future flexibility**: Foundation supports evolution to more sophisticated multi-tenancy models
-- **Cost optimization**: Shared infrastructure maximizes resource utilization
-
-### Negative Consequences
-- **VPN operational overhead**: Requires VPN infrastructure management
-- **Manual provisioning complexity**: Phase 1 manual user management creates administrative burden
-- **Network policy dependency**: Requires CNI with NetworkPolicy support (OVN-Kubernetes provides this and is the OKD/Openshift default)
-- **Cluster-wide resource limitations**: Some advanced Kubernetes features require cluster-wide access
-- **Single point of failure**: Cluster outage affects all tenants simultaneously
-
-### Migration Challenges
-- **Legacy client integration**: Existing clients may need VPN client setup and credential migration
-- **Monitoring complexity**: Per-tenant observability requires careful metric and log segmentation
-- **Backup considerations**: Tenant data backup must respect isolation boundaries
-
-## Alternatives Considered
-
-### Alternative 1: Virtual Control Plane Per Tenant
-**Pros**: Complete control plane isolation, full Kubernetes API access per tenant
-**Cons**: 3-5x higher resource usage, complex cross-tenant networking, operational complexity scales linearly with tenants
-
-**Rejected**: Resource overhead incompatible with cost-effective multi-tenancy goals
-
-### Alternative 2: Dedicated Clusters Per Tenant
-**Pros**: Maximum isolation, independent upgrade cycles, simplified security model
-**Cons**: Exponential operational complexity, prohibitive costs, resource waste
-
-**Rejected**: Operational overhead makes this approach unsustainable for multiple clients
-
-### Alternative 3: Public API with Advanced Authentication
-**Pros**: No VPN requirement, potentially simpler client access
-**Cons**: Larger attack surface, complex rate limiting and DDoS protection, increased security monitoring requirements
-
-**Rejected**: Risk/benefit analysis favors VPN-based access control
-
-### Alternative 4: Service Mesh Based Isolation
-**Pros**: Fine-grained traffic control, encryption, advanced observability
-**Cons**: Significant operational complexity, performance overhead, steep learning curve
-
-**Rejected**: Complexity overhead outweighs benefits for current requirements; remains option for future enhancement
-
-## Additional Notes
-
-### Implementation Roadmap
-1. **Phase 1**: Implement VPN access and manual tenant provisioning
-2. **Phase 2**: Deploy TenantScore automation for namespace, RBAC, and NetworkPolicy management
-3. **Phase 3**: Integrate Keycloak for centralized identity management
-4. **Phase 4**: Add advanced monitoring and per-tenant observability
-
-### TenantScore Structure Preview
-```rust
-pub struct TenantScore {
-    pub tenant_config: TenantConfig,
-    pub resource_quotas: ResourceQuotaConfig,
-    pub network_isolation: NetworkIsolationPolicy,
-    pub storage_access: StorageAccessConfig,
-    pub rbac_config: RBACConfig,
-}
-```
-
-### Future Enhancements
-- **Cross-tenant service mesh**: For approved inter-tenant communication
-- **Advanced monitoring**: Per-tenant Prometheus/Grafana instances
-- **Backup automation**: Tenant-scoped backup policies
-- **Cost allocation**: Detailed per-tenant resource usage tracking
-
-This ADR establishes the foundation for secure, scalable multi-tenancy in Harmony-managed clusters while maintaining operational simplicity and cost effectiveness. A follow-up ADR will detail the Tenant abstraction and user management mechanisms within the Harmony framework.
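The isolation model in this ADR pairs a per-tenant namespace with a default-deny NetworkPolicy and a ResourceQuota. As a rough, hypothetical sketch of the NetworkPolicy side, built as a raw manifest in the same `serde_json::json!` style the removed `K8sTenantManager` uses further down in this diff (the label keys mirror that code; everything else is illustrative and not part of this changeset):

```rust
use serde_json::{json, Value};

// Hypothetical sketch: deny all ingress into the tenant namespace while
// leaving egress fully open, matching the DenyAll/AllowAll defaults the
// ADR describes. Not part of this changeset.
fn tenant_default_network_policy(tenant_id: &str, tenant_name: &str) -> Value {
    json!({
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": "default-deny-ingress",
            "namespace": tenant_name,
            "labels": {
                "harmony.nationtech.io/tenant.id": tenant_id,
                "harmony.nationtech.io/tenant.name": tenant_name,
            },
        },
        "spec": {
            // An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress", "Egress"],
            // No ingress rules: cross-namespace traffic is denied by default.
            "ingress": [],
            // A single empty egress rule allows all outbound traffic.
            "egress": [{}]
        }
    })
}
```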
@@ -16,3 +16,5 @@ harmony_macros = { path = "../../harmony_macros" }
 log = { workspace = true }
 env_logger = { workspace = true }
 url = { workspace = true }
+typetag = "0.2.20"
+serde = "1.0.219"
@@ -4,7 +4,7 @@ use harmony::{
     maestro::Maestro,
     modules::{
         lamp::{LAMPConfig, LAMPScore},
-        monitoring::monitoring_alerting::{AlertChannel, MonitoringAlertingStackScore},
+        monitoring::{kube_prometheus::prometheus_alert_channel::{DiscordChannel, SlackChannel}, monitoring_alerting::MonitoringAlertingScore},
     },
     topology::{K8sAnywhereTopology, Url},
 };
@@ -32,24 +32,42 @@ async fn main() {
         },
     };
 
 
     // You can choose the type of Topology you want, we suggest starting with the
     // K8sAnywhereTopology as it is the most automatic one that enables you to easily deploy
     // locally, to development environment from a CI, to staging, and to production with settings
     // that automatically adapt to each environment grade.
-    let mut maestro = Maestro::<K8sAnywhereTopology>::initialize(
+    let mut maestro = Maestro::<K8sAnywhereTopology>::initialize (
         Inventory::autoload(),
         K8sAnywhereTopology::new(),
     )
     .await
     .unwrap();
 
-    let url = url::Url::parse("https://discord.com/api/webhooks/dummy_channel/dummy_token")
-        .expect("invalid URL");
+    let url = url::Url::parse(
+        "https://hooks.slack.com/services/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",
+    )
+    .expect("invalid URL");
 
-    let mut monitoring_stack_score = MonitoringAlertingStackScore::new();
+    let mut monitoring_stack_score = MonitoringAlertingScore::new();
     monitoring_stack_score.namespace = Some(lamp_stack.config.namespace.clone());
+    monitoring_stack_score.alert_channels = vec![(Box::new(SlackChannel {
+        name: "alert-test".to_string(),
+        webhook_url: url,})),
+        (Box::new(DiscordChannel {
+        name: "discord".to_string(),
+        webhook_url: url::Url::parse("https://discord.com/api/webhooks/1372994201746276462/YRn4TA9pj8ve3lfmyj1j0Yx97i92gv4U_uavt4CV4_SSIVArYUqfDzMOmzSTic2d8XSL").expect("invalid URL"),}))];
 
-    maestro.register_all(vec![Box::new(lamp_stack), Box::new(monitoring_stack_score)]);
+    //TODO in process of testing
+    //webhook depricated in MSTeams August 2025
+    //(AlertChannel::MSTeams {
+    //    connector: "alert-test".to_string(),
+    //    webhook_url: url::Url::parse("").expect("invalid URL"),
+    //}),
+
+
+    maestro.register_all(vec![Box::new(monitoring_stack_score)]);
     // Here we bootstrap the CLI, this gives some nice features if you need them
     harmony_cli::init(maestro, None).await.unwrap();
 }
14 examples/ms_teams_alert_channel/Cargo.toml Normal file
@@ -0,0 +1,14 @@
+[package]
+name = "ms_teams_alert_channel"
+edition = "2024"
+version.workspace = true
+readme.workspace = true
+license.workspace = true
+
+[dependencies]
+harmony = { version = "0.1.0", path = "../../harmony" }
+harmony_cli = { version = "0.1.0", path = "../../harmony_cli" }
+serde = "1.0.219"
+tokio.workspace = true
+typetag = "0.2.20"
+url.workspace = true
65 examples/ms_teams_alert_channel/src/main.rs Normal file
@@ -0,0 +1,65 @@
+mod prometheus_msteams;
+use harmony::{
+    interpret::InterpretError, inventory::Inventory, maestro::Maestro, modules::{helm::chart::HelmChartScore, monitoring::{kube_prometheus::{prometheus_alert_channel::PrometheusAlertChannel, types::{AlertChannelConfig, AlertChannelReceiver, AlertChannelRoute, WebhookConfig}}, monitoring_alerting::MonitoringAlertingScore}}, topology::K8sAnywhereTopology
+};
+use prometheus_msteams::prometheus_msteams_score;
+use url::Url;
+use serde::{Serialize, Deserialize};
+
+#[tokio::main]
+async fn main() {
+    let alert_channels: Vec<Box<dyn PrometheusAlertChannel>> = vec![Box::new(MSTeamsChannel {
+        connector: "teams-test".to_string(),
+        webhook_url: url::Url::parse(
+            "https://msteams.com/services/dummy/dummy/dummy",
+        )
+        .expect("invalid URL"),
+    })];
+
+    let monitoring_score = MonitoringAlertingScore {
+        alert_channels,
+        namespace: None,
+    };
+
+    let mut maestro = Maestro::<K8sAnywhereTopology>::initialize(
+        Inventory::autoload(),
+        K8sAnywhereTopology::new(),
+    )
+    .await
+    .unwrap();
+
+    maestro.register_all(vec![Box::new(monitoring_score)]);
+    harmony_cli::init(maestro, None).await.unwrap();
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+struct MSTeamsChannel {
+    connector: String,
+    webhook_url: Url,
+}
+
+#[typetag::serde]
+impl PrometheusAlertChannel for MSTeamsChannel {
+    fn get_alert_manager_config_contribution(&self) -> Result<AlertChannelConfig, InterpretError> {
+        Ok(AlertChannelConfig{
+            receiver: AlertChannelReceiver{
+                name: format!("MSTeams-{}",self.connector),
+                slack_configs: None,
+                webhook_configs: Some(vec![WebhookConfig{
+                    url: url::Url::parse("http://prometheus-msteams-prometheus-msteams.monitoring.svc.cluster.local:2000/alertmanager").expect("invalid url"),
+                    send_resolved: true,}])
+            },
+            route: AlertChannelRoute{
+                receiver: format!("MSTeams-{}", self.connector),
+                matchers: vec!["alertname!=Watchdog".to_string()],
+                r#continue: true,
+            },
+            global_config: None, })
+    }
+    fn get_dependency_score(&self, ns: String) -> Option<HelmChartScore> {
+        Some(prometheus_msteams_score(self.connector.clone(), self.webhook_url.clone(), ns.clone()))
+    }
+}
+
+
+
30 examples/ms_teams_alert_channel/src/prometheus_msteams.rs Normal file
@@ -0,0 +1,30 @@
+use std::str::FromStr;
+
+use harmony::modules::helm::chart::{HelmChartScore, NonBlankString};
+use url::Url;
+
+pub fn prometheus_msteams_score(
+    name: String,
+    webhook_url: Url,
+    namespace: String,
+) -> HelmChartScore {
+    let values = format!(
+        r#"
+connectors:
+  - default: "{webhook_url}"
+"#,
+    );
+
+    HelmChartScore {
+        namespace: Some(NonBlankString::from_str(&namespace).unwrap()),
+        release_name: NonBlankString::from_str(&name).unwrap(),
+        chart_name: NonBlankString::from_str("oci://hub.nationtech.io/library/prometheus-msteams")
+            .unwrap(),
+        chart_version: None,
+        values_overrides: None,
+        values_yaml: Some(values.to_string()),
+        create_namespace: true,
+        install_only: true,
+        repository: None,
+    }
+}
@@ -49,4 +49,5 @@ fqdn = { version = "0.4.6", features = [
     "serde",
 ] }
 temp-dir = "0.1.14"
+typetag = "0.2.20"
 dyn-clone = "1.0.19"
@@ -1,6 +1,6 @@
 use serde::{Deserialize, Serialize};
 
-#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
+#[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct Id {
     value: String,
 }
@@ -1,12 +1,11 @@
-use std::{collections::HashMap, process::Command, sync::Arc};
+use std::{process::Command, sync::Arc};
 
 use async_trait::async_trait;
 use inquire::Confirm;
 use log::{info, warn};
-use tokio::sync::{Mutex, OnceCell};
+use tokio::sync::OnceCell;
 
 use crate::{
-    executors::ExecutorError,
     interpret::{InterpretError, Outcome},
     inventory::Inventory,
     maestro::Maestro,
@@ -14,14 +13,7 @@ use crate::{
     topology::LocalhostTopology,
 };
 
-use super::{
-    HelmCommand, K8sclient, Topology,
-    k8s::K8sClient,
-    oberservability::monitoring::AlertReceiver,
-    tenant::{
-        ResourceLimits, TenantConfig, TenantManager, TenantNetworkPolicy, k8s::K8sTenantManager,
-    },
-};
+use super::{HelmCommand, K8sclient, Topology, k8s::K8sClient};
 
 struct K8sState {
     client: Arc<K8sClient>,
@@ -29,7 +21,6 @@ struct K8sState {
     message: String,
 }
 
-#[derive(Debug)]
 enum K8sSource {
     LocalK3d,
     Kubeconfig,
@@ -37,8 +28,6 @@ enum K8sSource {
 
 pub struct K8sAnywhereTopology {
     k8s_state: OnceCell<Option<K8sState>>,
-    tenant_manager: OnceCell<K8sTenantManager>,
-    pub alert_receivers: Mutex<HashMap<String, OnceCell<AlertReceiver>>>,
 }
 
 #[async_trait]
@@ -62,8 +51,6 @@ impl K8sAnywhereTopology {
     pub fn new() -> Self {
         Self {
             k8s_state: OnceCell::new(),
-            tenant_manager: OnceCell::new(),
-            alert_receivers: Mutex::new(HashMap::new()),
         }
     }
 
@@ -172,15 +159,6 @@ impl K8sAnywhereTopology {
 
         Ok(Some(state))
     }
-
-    fn get_k8s_tenant_manager(&self) -> Result<&K8sTenantManager, ExecutorError> {
-        match self.tenant_manager.get() {
-            Some(t) => Ok(t),
-            None => Err(ExecutorError::UnexpectedError(
-                "K8sTenantManager not available".to_string(),
-            )),
-        }
-    }
 }
 
 struct K8sAnywhereConfig {
@@ -231,38 +209,3 @@ impl Topology for K8sAnywhereTopology {
 }
 
 impl HelmCommand for K8sAnywhereTopology {}
-
-#[async_trait]
-impl TenantManager for K8sAnywhereTopology {
-    async fn provision_tenant(&self, config: &TenantConfig) -> Result<(), ExecutorError> {
-        self.get_k8s_tenant_manager()?
-            .provision_tenant(config)
-            .await
-    }
-
-    async fn update_tenant_resource_limits(
-        &self,
-        tenant_name: &str,
-        new_limits: &ResourceLimits,
-    ) -> Result<(), ExecutorError> {
-        self.get_k8s_tenant_manager()?
-            .update_tenant_resource_limits(tenant_name, new_limits)
-            .await
-    }
-
-    async fn update_tenant_network_policy(
-        &self,
-        tenant_name: &str,
-        new_policy: &TenantNetworkPolicy,
-    ) -> Result<(), ExecutorError> {
-        self.get_k8s_tenant_manager()?
-            .update_tenant_network_policy(tenant_name, new_policy)
-            .await
-    }
-
-    async fn deprovision_tenant(&self, tenant_name: &str) -> Result<(), ExecutorError> {
-        self.get_k8s_tenant_manager()?
-            .deprovision_tenant(tenant_name)
-            .await
-    }
-}
@@ -7,12 +7,6 @@ use serde::Serialize;
 use super::{IpAddress, LogicalHost};
 use crate::executors::ExecutorError;
 
-impl std::fmt::Debug for dyn LoadBalancer {
-    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        f.write_fmt(format_args!("LoadBalancer {}", self.get_ip()))
-    }
-}
-
 #[async_trait]
 pub trait LoadBalancer: Send + Sync {
     fn get_ip(&self) -> IpAddress;
@@ -38,6 +32,11 @@ pub trait LoadBalancer: Send + Sync {
     }
 }
 
+impl std::fmt::Debug for dyn LoadBalancer {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        f.write_fmt(format_args!("LoadBalancer {}", self.get_ip()))
+    }
+}
 #[derive(Debug, PartialEq, Clone, Serialize)]
 pub struct LoadBalancerService {
     pub backend_servers: Vec<BackendServer>,
@@ -3,8 +3,6 @@ mod host_binding;
 mod http;
 mod k8s_anywhere;
 mod localhost;
-pub mod oberservability;
-pub mod tenant;
 pub use k8s_anywhere::*;
 pub use localhost::*;
 pub mod k8s;
@@ -1 +0,0 @@
-pub mod monitoring;
@@ -1,33 +0,0 @@
-use async_trait::async_trait;
-use dyn_clone::DynClone;
-use serde::Serialize;
-
-use std::fmt::Debug;
-
-use crate::interpret::InterpretError;
-
-use crate::{interpret::Outcome, topology::Topology};
-
-/// Represents an entity responsible for collecting and organizing observability data
-/// from various telemetry sources
-/// A `Monitor` abstracts the logic required to scrape, aggregate, and structure
-/// monitoring data, enabling consistent processing regardless of the underlying data source.
-#[async_trait]
-pub trait Monitor<T: Topology>: Debug + Send + Sync {
-    async fn deploy_monitor(&self, topology: &T) -> Result<Outcome, InterpretError>;
-
-    async fn delete_monitor(&self, topolgy: &T) -> Result<Outcome, InterpretError>;
-}
-
-#[async_trait]
-pub trait AlertReceiverDeployment<T: Topology>: Debug + DynClone + Send + Sync {
-    async fn deploy_alert_receiver(&self, topology: &T) -> Result<Outcome, InterpretError>;
-}
-
-dyn_clone::clone_trait_object!(<T> AlertReceiverDeployment<T>);
-
-#[derive(Debug, Clone, Serialize)]
-pub struct AlertReceiver {
-    pub receiver_id: String,
-    pub receiver_installed: bool,
-}
@@ -1,95 +0,0 @@
-use std::sync::Arc;
-
-use crate::{executors::ExecutorError, topology::k8s::K8sClient};
-use async_trait::async_trait;
-use derive_new::new;
-use k8s_openapi::api::core::v1::Namespace;
-use serde_json::json;
-
-use super::{ResourceLimits, TenantConfig, TenantManager, TenantNetworkPolicy};
-
-#[derive(new)]
-pub struct K8sTenantManager {
-    k8s_client: Arc<K8sClient>,
-}
-
-#[async_trait]
-impl TenantManager for K8sTenantManager {
-    async fn provision_tenant(&self, config: &TenantConfig) -> Result<(), ExecutorError> {
-        let namespace = json!(
-            {
-                "apiVersion": "v1",
-                "kind": "Namespace",
-                "metadata": {
-                    "labels": {
-                        "harmony.nationtech.io/tenant.id": config.id,
-                        "harmony.nationtech.io/tenant.name": config.name,
-                    },
-                    "name": config.name,
-                },
-            }
-        );
-        todo!("Validate that when tenant already exists (by id) that name has not changed");
-
-        let namespace: Namespace = serde_json::from_value(namespace).unwrap();
-
-        let resource_quota = json!(
-            {
-                "apiVersion": "v1",
-                "kind": "List",
-                "items": [
-                    {
-                        "apiVersion": "v1",
-                        "kind": "ResourceQuota",
-                        "metadata": {
-                            "name": config.name,
-                            "labels": {
-                                "harmony.nationtech.io/tenant.id": config.id,
-                                "harmony.nationtech.io/tenant.name": config.name,
-                            },
-                            "namespace": config.name,
-                        },
-                        "spec": {
-                            "hard": {
-                                "limits.cpu": format!("{:.0}",config.resource_limits.cpu_limit_cores),
-                                "limits.memory": format!("{:.3}Gi", config.resource_limits.memory_limit_gb),
-                                "requests.cpu": format!("{:.0}",config.resource_limits.cpu_request_cores),
-                                "requests.memory": format!("{:.3}Gi", config.resource_limits.memory_request_gb),
-                                "requests.storage": format!("{:.3}", config.resource_limits.storage_total_gb),
-                                "pods": "20",
-                                "services": "10",
-                                "configmaps": "30",
-                                "secrets": "30",
-                                "persistentvolumeclaims": "15",
-                                "services.loadbalancers": "2",
-                                "services.nodeports": "5",
-
-                            }
-                        }
-                    }
-                ]
-            }
-
-        );
-    }
-
-    async fn update_tenant_resource_limits(
-        &self,
-        tenant_name: &str,
-        new_limits: &ResourceLimits,
-    ) -> Result<(), ExecutorError> {
-        todo!()
-    }
-
-    async fn update_tenant_network_policy(
-        &self,
-        tenant_name: &str,
-        new_policy: &TenantNetworkPolicy,
-    ) -> Result<(), ExecutorError> {
-        todo!()
-    }
-
-    async fn deprovision_tenant(&self, tenant_name: &str) -> Result<(), ExecutorError> {
-        todo!()
-    }
-}
@@ -1,46 +0,0 @@
-use super::*;
-use async_trait::async_trait;
-
-use crate::executors::ExecutorError;
-
-#[async_trait]
-pub trait TenantManager {
-    /// Provisions a new tenant based on the provided configuration.
-    /// This operation should be idempotent; if a tenant with the same `config.name`
-    /// already exists and matches the config, it will succeed without changes.
-    /// If it exists but differs, it will be updated, or return an error if the update
-    /// action is not supported
-    ///
-    /// # Arguments
-    /// * `config`: The desired configuration for the new tenant.
-    async fn provision_tenant(&self, config: &TenantConfig) -> Result<(), ExecutorError>;
-
-    /// Updates the resource limits for an existing tenant.
-    ///
-    /// # Arguments
-    /// * `tenant_name`: The logical name of the tenant to update.
-    /// * `new_limits`: The new set of resource limits to apply.
-    async fn update_tenant_resource_limits(
-        &self,
-        tenant_name: &str,
-        new_limits: &ResourceLimits,
-    ) -> Result<(), ExecutorError>;
-
-    /// Updates the high-level network isolation policy for an existing tenant.
-    ///
-    /// # Arguments
-    /// * `tenant_name`: The logical name of the tenant to update.
-    /// * `new_policy`: The new network policy to apply.
-    async fn update_tenant_network_policy(
-        &self,
-        tenant_name: &str,
-        new_policy: &TenantNetworkPolicy,
-    ) -> Result<(), ExecutorError>;
-
-    /// Decommissions an existing tenant, removing its isolated context and associated resources.
-    /// This operation should be idempotent.
-    ///
-    /// # Arguments
-    /// * `tenant_name`: The logical name of the tenant to deprovision.
-    async fn deprovision_tenant(&self, tenant_name: &str) -> Result<(), ExecutorError>;
-}
@@ -1,67 +0,0 @@
-pub mod k8s;
-mod manager;
-pub use manager::*;
-use serde::{Deserialize, Serialize};
-
-use std::collections::HashMap;
-
-use crate::data::Id;
-
-#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] // Assuming serde for Scores
-pub struct TenantConfig {
-    /// This will be used as the primary unique identifier for management operations and will never
-    /// change for the entire lifetime of the tenant
-    pub id: Id,
-
-    /// A human-readable name for the tenant (e.g., "client-alpha", "project-phoenix").
-    pub name: String,
-
-    /// Desired resource allocations and limits for the tenant.
-    pub resource_limits: ResourceLimits,
-
-    /// High-level network isolation policies for the tenant.
-    pub network_policy: TenantNetworkPolicy,
-
-    /// Key-value pairs for provider-specific tagging, labeling, or metadata.
-    /// Useful for billing, organization, or filtering within the provider's console.
-    pub labels_or_tags: HashMap<String, String>,
-}
-
-#[derive(Debug, Clone, PartialEq, Serialize, Deserialize, Default)]
-pub struct ResourceLimits {
-    /// Requested/guaranteed CPU cores (e.g., 2.0).
-    pub cpu_request_cores: f32,
-    /// Maximum CPU cores the tenant can burst to (e.g., 4.0).
-    pub cpu_limit_cores: f32,
-
-    /// Requested/guaranteed memory in Gigabytes (e.g., 8.0).
-    pub memory_request_gb: f32,
-    /// Maximum memory in Gigabytes tenant can burst to (e.g., 16.0).
-    pub memory_limit_gb: f32,
-
-    /// Total persistent storage allocation in Gigabytes across all volumes.
-    pub storage_total_gb: f32,
-}
-
-#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
-pub struct TenantNetworkPolicy {
-    /// Policy for ingress traffic originating from other tenants within the same Harmony-managed environment.
-    pub default_inter_tenant_ingress: InterTenantIngressPolicy,
-
-    /// Policy for egress traffic destined for the public internet.
-    pub default_internet_egress: InternetEgressPolicy,
-}
-
-#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
-pub enum InterTenantIngressPolicy {
-    /// Deny all traffic from other tenants by default.
-    DenyAll,
-}
-
-#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
-pub enum InternetEgressPolicy {
-    /// Allow all outbound traffic to the internet.
-    AllowAll,
-    /// Deny all outbound traffic to the internet by default.
-    DenyAll,
-}
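Taken together, the types removed above describe a tenant declaratively. A minimal, hypothetical construction could look like the following sketch (values are illustrative and reuse the figures from the doc comments; since `Id` keeps its field private, it is deserialized here, which assumes its serde representation rather than a constructor shown in this diff):

```rust
use std::collections::HashMap;

// Hypothetical sketch based on the struct definitions above; not part of
// this changeset.
fn example_tenant_config() -> TenantConfig {
    // Assumption: Id derives Deserialize with a single "value" field.
    let id: Id = serde_json::from_value(serde_json::json!({ "value": "tenant-001" }))
        .expect("valid tenant id");

    TenantConfig {
        id,
        name: "client-alpha".to_string(),
        resource_limits: ResourceLimits {
            cpu_request_cores: 2.0,
            cpu_limit_cores: 4.0,
            memory_request_gb: 8.0,
            memory_limit_gb: 16.0,
            storage_total_gb: 100.0,
        },
        network_policy: TenantNetworkPolicy {
            default_inter_tenant_ingress: InterTenantIngressPolicy::DenyAll,
            default_internet_egress: InternetEgressPolicy::AllowAll,
        },
        labels_or_tags: HashMap::new(),
    }
}
```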
@@ -23,7 +23,7 @@ pub struct HelmRepository {
     force_update: bool,
 }
 impl HelmRepository {
-    pub fn new(name: String, url: Url, force_update: bool) -> Self {
+    pub(crate) fn new(name: String, url: Url, force_update: bool) -> Self {
         Self {
             name,
             url,
@@ -104,10 +104,7 @@ impl HelmChartInterpret {
 
     fn run_helm_command(args: &[&str]) -> Result<Output, InterpretError> {
         let command_str = format!("helm {}", args.join(" "));
-        debug!(
-            "Got KUBECONFIG: `{}`",
-            std::env::var("KUBECONFIG").unwrap_or("".to_string())
-        );
+        debug!("Got KUBECONFIG: `{}`", std::env::var("KUBECONFIG").unwrap());
         debug!("Running Helm command: `{}`", command_str);
 
         let output = Command::new("helm")
@@ -1,9 +1,12 @@
 use async_trait::async_trait;
 use log::debug;
+use non_blank_string_rs::NonBlankString;
 use serde::Serialize;
 use std::collections::HashMap;
+use std::env::temp_dir;
+use std::ffi::OsStr;
 use std::io::ErrorKind;
-use std::path::PathBuf;
+use std::path::{Path, PathBuf};
 use std::process::{Command, Output};
 use temp_dir::{self, TempDir};
 use temp_file::TempFile;
@@ -1,13 +1,16 @@
 use serde::Serialize;
 
-use super::monitoring_alerting::AlertChannel;
+use super::kube_prometheus::{prometheus_alert_channel::PrometheusAlertChannel, types::AlertManagerValues};
 
+
+
 #[derive(Debug, Clone, Serialize)]
-pub struct KubePrometheusConfig {
+pub struct KubePrometheusChartConfig {
     pub namespace: String,
     pub default_rules: bool,
     pub windows_monitoring: bool,
     pub alert_manager: bool,
+    pub alert_manager_values: AlertManagerValues,
     pub node_exporter: bool,
     pub prometheus: bool,
     pub grafana: bool,
@@ -21,16 +24,17 @@ pub struct KubePrometheusConfig {
     pub kube_proxy: bool,
     pub kube_state_metrics: bool,
     pub prometheus_operator: bool,
-    pub alert_channel: Vec<AlertChannel>,
+    pub alert_channels: Vec<Box<dyn PrometheusAlertChannel>>,
 }
-impl KubePrometheusConfig {
+impl KubePrometheusChartConfig {
     pub fn new() -> Self {
         Self {
             namespace: "monitoring".into(),
             default_rules: true,
            windows_monitoring: false,
             alert_manager: true,
-            alert_channel: Vec::new(),
+            alert_manager_values: AlertManagerValues::default(),
+            alert_channels: Vec::new(),
             grafana: true,
             node_exporter: false,
             prometheus: true,
@@ -5,17 +5,14 @@ use url::Url;
 
 use crate::modules::helm::chart::HelmChartScore;
 
-pub fn discord_alert_manager_score(
-    webhook_url: Url,
-    namespace: String,
-    name: String,
-) -> HelmChartScore {
+pub fn discord_alert_manager_score(name: String, webhook: Url, namespace: String) -> HelmChartScore {
+    let url = webhook;
     let values = format!(
         r#"
 environment:
   - name: "DISCORD_WEBHOOK"
-    value: "{webhook_url}"
+    value: "{url}"
 "#,
     );
 
     HelmChartScore {
@@ -1,168 +0,0 @@
-use super::{
-    discord_alert_manager::discord_alert_manager_score, kube_prometheus_monitor::AlertManagerConfig,
-};
-use async_trait::async_trait;
-use serde::Serialize;
-use serde_yaml::Value;
-use tokio::sync::OnceCell;
-use url::Url;
-
-use crate::{
-    data::{Id, Version},
-    interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
-    inventory::Inventory,
-    score::Score,
-    topology::{
-        HelmCommand, K8sAnywhereTopology, Topology,
-        oberservability::monitoring::{AlertReceiver, AlertReceiverDeployment},
-    },
-};
-
-#[async_trait]
-impl<T: Topology + DiscordWebhookReceiver> AlertReceiverDeployment<T> for DiscordWebhookConfig {
-    async fn deploy_alert_receiver(&self, topology: &T) -> Result<Outcome, InterpretError> {
-        topology.deploy_discord_webhook_receiver(self.clone()).await
-    }
-}
-
-#[derive(Debug, Clone, Serialize)]
-pub struct DiscordWebhookConfig {
-    pub webhook_url: Url,
-    pub name: String,
-    pub send_resolved_notifications: bool,
-}
-
-#[async_trait]
-pub trait DiscordWebhookReceiver {
-    async fn deploy_discord_webhook_receiver(
-        &self,
-        config: DiscordWebhookConfig,
-    ) -> Result<Outcome, InterpretError>;
-    fn delete_discord_webhook_receiver(
-        &self,
-        config: DiscordWebhookConfig,
-    ) -> Result<Outcome, InterpretError>;
-}
-
-#[async_trait]
-impl<T: DiscordWebhookReceiver> AlertManagerConfig<T> for DiscordWebhookConfig {
-    async fn get_alert_manager_config(&self) -> Result<Value, InterpretError> {
-        todo!()
-    }
-}
-
-#[async_trait]
-impl DiscordWebhookReceiver for K8sAnywhereTopology {
-    async fn deploy_discord_webhook_receiver(
-        &self,
-        config: DiscordWebhookConfig,
-    ) -> Result<Outcome, InterpretError> {
-        let receiver_key = config.name.clone();
-        let mut adapters_map_guard = self.alert_receivers.lock().await;
-
-        let cell = adapters_map_guard
-            .entry(receiver_key.clone())
-            .or_insert_with(OnceCell::new);
-
-        if let Some(initialized_receiver) = cell.get() {
-            return Ok(Outcome::success(format!(
-                "Discord Webhook adapter for '{}' already initialized.",
-                initialized_receiver.receiver_id
-            )));
-        }
-
-        let final_state = cell
-            .get_or_try_init(|| async {
-                initialize_discord_webhook_receiver(config.clone(), self).await
-            })
-            .await?;
-
-        Ok(Outcome::success(format!(
-            "Discord Webhook Receiver for '{}' ensured/initialized.",
-            final_state.receiver_id
-        )))
-    }
-
-    fn delete_discord_webhook_receiver(
-        &self,
-        _config: DiscordWebhookConfig,
-    ) -> Result<Outcome, InterpretError> {
-        todo!()
-    }
-}
-
-async fn initialize_discord_webhook_receiver(
-    conf: DiscordWebhookConfig,
-    topology: &K8sAnywhereTopology,
-) -> Result<AlertReceiver, InterpretError> {
-    println!(
-        "Attempting to initialize Discord adapter for: {}",
-        conf.name
-    );
-    let score = DiscordWebhookReceiverScore {
-        config: conf.clone(),
-    };
-    let inventory = Inventory::autoload();
-    let interpret = score.create_interpret();
-
-    interpret.execute(&inventory, topology).await?;
-
-    Ok(AlertReceiver {
-        receiver_id: conf.name,
-        receiver_installed: true,
-    })
-}
-#[derive(Debug, Clone, Serialize)]
-struct DiscordWebhookReceiverScore {
-    config: DiscordWebhookConfig,
-}
-
-impl<T: Topology + HelmCommand> Score<T> for DiscordWebhookReceiverScore {
-    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
-        Box::new(DiscordWebhookReceiverScoreInterpret {
-            config: self.config.clone(),
-        })
-    }
-
-    fn name(&self) -> String {
-        "DiscordWebhookReceiverScore".to_string()
-    }
-}
-#[derive(Debug)]
-struct DiscordWebhookReceiverScoreInterpret {
-    config: DiscordWebhookConfig,
-}
-
-#[async_trait]
-impl<T: Topology + HelmCommand> Interpret<T> for DiscordWebhookReceiverScoreInterpret {
-    async fn execute(
-        &self,
-        inventory: &Inventory,
-        topology: &T,
-    ) -> Result<Outcome, InterpretError> {
-        discord_alert_manager_score(
-            self.config.webhook_url.clone(),
-            self.config.name.clone(),
-            self.config.name.clone(),
-        )
-        .create_interpret()
-        .execute(inventory, topology)
-        .await
-    }
-
-    fn get_name(&self) -> InterpretName {
-        todo!()
-    }
-
-    fn get_version(&self) -> Version {
-        todo!()
-    }
-
-    fn get_status(&self) -> InterpretStatus {
-        todo!()
-    }
-
-    fn get_children(&self) -> Vec<Id> {
-        todo!()
-    }
-}
@@ -1,17 +1,14 @@
-use super::{config::KubePrometheusConfig, monitoring_alerting::AlertChannel};
+use crate::modules::{helm::chart::HelmChartScore, monitoring::config::KubePrometheusChartConfig};
 use log::info;
 use non_blank_string_rs::NonBlankString;
-use std::{collections::HashMap, str::FromStr};
-use url::Url;
+use serde_yaml::{self};
+use std::str::FromStr;
 
-use crate::modules::helm::chart::HelmChartScore;
-
-pub fn kube_prometheus_helm_chart_score(config: &KubePrometheusConfig) -> HelmChartScore {
+pub fn kube_prometheus_helm_chart_score(config: &KubePrometheusChartConfig) -> HelmChartScore {
     //TODO this should be make into a rule with default formatting that can be easily passed as a vec
     //to the overrides or something leaving the user to deal with formatting here seems bad
     let default_rules = config.default_rules.to_string();
     let windows_monitoring = config.windows_monitoring.to_string();
-    let alert_manager = config.alert_manager.to_string();
     let grafana = config.grafana.to_string();
     let kubernetes_service_monitors = config.kubernetes_service_monitors.to_string();
     let kubernetes_api_server = config.kubernetes_api_server.to_string();
@@ -25,6 +22,7 @@ pub fn kube_prometheus_helm_chart_score(config: &KubePrometheusConfig) -> HelmCh
     let node_exporter = config.node_exporter.to_string();
     let prometheus_operator = config.prometheus_operator.to_string();
     let prometheus = config.prometheus.to_string();
+    let alert_manager_values = config.alert_manager_values.clone();
     let mut values = format!(
         r#"
 additionalPrometheusRulesMap:
@@ -142,68 +140,16 @@ prometheusOperator:
   enabled: {prometheus_operator}
 prometheus:
   enabled: {prometheus}
+  prometheusSpec:
+    maximumStartupDurationSeconds: 240
 "#,
     );
 
-    let alertmanager_config = alert_manager_yaml_builder(&config);
-    values.push_str(&alertmanager_config);
-
-    fn alert_manager_yaml_builder(config: &KubePrometheusConfig) -> String {
-        let mut receivers = String::new();
-        let mut routes = String::new();
-        let mut global_configs = String::new();
-        let alert_manager = config.alert_manager;
-        for alert_channel in &config.alert_channel {
-            match alert_channel {
-                AlertChannel::Discord { name, .. } => {
-                    let (receiver, route) = discord_alert_builder(name);
-                    info!("discord receiver: {} \nroute: {}", receiver, route);
-                    receivers.push_str(&receiver);
-                    routes.push_str(&route);
-                }
-                AlertChannel::Slack {
-                    slack_channel,
-                    webhook_url,
-                } => {
-                    let (receiver, route) = slack_alert_builder(slack_channel);
-                    info!("slack receiver: {} \nroute: {}", receiver, route);
-                    receivers.push_str(&receiver);
-
-                    routes.push_str(&route);
-                    let global_config = format!(
-                        r#"
-global:
-  slack_api_url: {webhook_url}"#
-                    );
-
-                    global_configs.push_str(&global_config);
-                }
-                AlertChannel::Smpt { .. } => todo!(),
-            }
-        }
-        info!("after alert receiver: {}", receivers);
-        info!("after alert routes: {}", routes);
-
-        let alertmanager_config = format!(
-            r#"
-alertmanager:
-  enabled: {alert_manager}
-  config: {global_configs}
-    route:
-      group_by: ['job']
-      group_wait: 30s
-      group_interval: 5m
-      repeat_interval: 12h
-      routes:
-{routes}
-      receivers:
-      - name: 'null'
-{receivers}"#
-        );
-
-        info!("alert manager config: {}", alertmanager_config);
-        alertmanager_config
-    }
+    let alert_manager_yaml = serde_yaml::to_string(&alert_manager_values).expect("Failed to serialize YAML");
+    values.push_str(&alert_manager_yaml);
+
+    info!("{}", values);
 
     HelmChartScore {
         namespace: Some(NonBlankString::from_str(&config.namespace).unwrap()),
@@ -220,43 +166,3 @@ alertmanager:
         repository: None,
     }
 }
-
-fn discord_alert_builder(release_name: &String) -> (String, String) {
-    let discord_receiver_name = format!("Discord-{}", release_name);
-    let receiver = format!(
-        r#"
-      - name: '{discord_receiver_name}'
-        webhook_configs:
-          - url: 'http://{release_name}-alertmanager-discord:9094'
-            send_resolved: true"#,
-    );
-    let route = format!(
-        r#"
-        - receiver: '{discord_receiver_name}'
-          matchers:
-            - alertname!=Watchdog
-          continue: true"#,
-    );
-    (receiver, route)
-}
-
-fn slack_alert_builder(slack_channel: &String) -> (String, String) {
-    let slack_receiver_name = format!("Slack-{}", slack_channel);
-    let receiver = format!(
-        r#"
-      - name: '{slack_receiver_name}'
-        slack_configs:
-          - channel: '{slack_channel}'
-            send_resolved: true
-            title: '{{{{ .CommonAnnotations.title }}}}'
-            text: '{{{{ .CommonAnnotations.description }}}}'"#,
-    );
-    let route = format!(
-        r#"
-        - receiver: '{slack_receiver_name}'
-          matchers:
-            - alertname!=Watchdog
-          continue: true"#,
-    );
-    (receiver, route)
-}
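The hunk above swaps the hand-assembled Alertmanager YAML for a typed `AlertManagerValues` struct that is serialized with `serde_yaml` and appended to the chart values. The general pattern, sketched here with a stand-in struct (the real field layout lives in the new `kube_prometheus::types` module and is not shown in full in this diff):

```rust
use serde::Serialize;

// Stand-in for AlertManagerValues; the actual fields may differ.
#[derive(Debug, Clone, Default, Serialize)]
struct ExampleAlertmanagerValues {
    enabled: bool,
    receivers: Vec<String>,
}

fn append_typed_values(mut values: String) -> String {
    let typed = ExampleAlertmanagerValues {
        enabled: true,
        receivers: vec!["null".to_string()],
    };
    // Serializing a typed struct avoids the quoting and indentation mistakes
    // that string-built YAML is prone to.
    let yaml = serde_yaml::to_string(&typed).expect("Failed to serialize YAML");
    values.push_str(&yaml);
    values
}
```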
3 harmony/src/modules/monitoring/kube_prometheus/mod.rs Normal file
@@ -0,0 +1,3 @@
+pub mod kube_prometheus;
+pub mod types;
+pub mod prometheus_alert_channel;
@@ -0,0 +1,140 @@
use crate::{
    interpret::InterpretError,
    modules::{
        helm::chart::HelmChartScore,
        monitoring::{
            discord_alert_manager::discord_alert_manager_score,
            kube_prometheus::types::{
                AlertChannelConfig, AlertChannelGlobalConfig, AlertChannelReceiver,
                AlertChannelRoute, SlackConfig, WebhookConfig,
            },
        },
    },
};
use dyn_clone::DynClone;
use serde::{Deserialize, Serialize};
use std::fmt::Debug;
use url::Url;

#[typetag::serde(tag = "channel_type")]
#[async_trait::async_trait]
pub trait PrometheusAlertChannel: DynClone + Debug + Send + Sync {
    fn get_alert_manager_config_contribution(&self) -> Result<AlertChannelConfig, InterpretError>;

    fn get_dependency_score(&self, namespace: String) -> Option<HelmChartScore>;
}

dyn_clone::clone_trait_object!(PrometheusAlertChannel);

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DiscordChannel {
    pub name: String,
    pub webhook_url: Url,
}

#[typetag::serde]
impl PrometheusAlertChannel for DiscordChannel {
    fn get_alert_manager_config_contribution(&self) -> Result<AlertChannelConfig, InterpretError> {
        let service_url = format!("http://{}-alertmanager-discord:9094", &self.name);
        Ok(AlertChannelConfig {
            receiver: AlertChannelReceiver {
                name: format!("Discord-{}", self.name),
                slack_configs: None,
                webhook_configs: Some(vec![WebhookConfig {
                    url: url::Url::parse(&service_url).expect("invalid url"),
                    send_resolved: true,
                }]),
            },
            route: AlertChannelRoute {
                receiver: format!("Discord-{}", self.name),
                matchers: vec!["alertname!=Watchdog".to_string()],
                r#continue: true,
            },
            global_config: None,
        })
    }

    fn get_dependency_score(&self, namespace: String) -> Option<HelmChartScore> {
        Some(discord_alert_manager_score(
            self.name.clone(),
            self.webhook_url.clone(),
            namespace.clone(),
        ))
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SlackChannel {
    pub name: String,
    pub webhook_url: Url,
}

#[typetag::serde]
impl PrometheusAlertChannel for SlackChannel {
    fn get_alert_manager_config_contribution(&self) -> Result<AlertChannelConfig, InterpretError> {
        Ok(AlertChannelConfig {
            receiver: AlertChannelReceiver {
                name: format!("Slack-{}", self.name),
                slack_configs: Some(vec![SlackConfig {
                    channel: self.name.clone(),
                    send_resolved: true,
                    title: "{{ .CommonAnnotations.title }}".to_string(),
                    text: ">-
                        *Alert:* {{ .CommonLabels.alertname }}
                        *Severity:* {{ .CommonLabels.severity }}
                        *Namespace:* {{ .CommonLabels.namespace }}
                        *Pod:* {{ .CommonLabels.pod }}
                        *ExternalURL:* {{ .ExternalURL }}

                        {{ range .Alerts }}
                        *Instance:* {{ .Labels.instance }}
                        *Summary:* {{ .Annotations.summary }}
                        *Description:* {{ .Annotations.description }}
                        *Starts At:* {{ .StartsAt }}
                        *Status:* {{ .Status }}
                        {{ end }}"
                        .to_string(),
                }]),
                webhook_configs: None,
            },
            route: AlertChannelRoute {
                receiver: format!("Slack-{}", self.name),
                matchers: vec!["alertname!=Watchdog".to_string()],
                r#continue: true,
            },
            global_config: Some(AlertChannelGlobalConfig {
                slack_api_url: Some(self.webhook_url.clone()),
            }),
        })
    }

    fn get_dependency_score(&self, _namespace: String) -> Option<HelmChartScore> {
        None
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NullReceiver {}

impl NullReceiver {
    pub fn new() -> Self {
        Self {}
    }
}

#[typetag::serde]
impl PrometheusAlertChannel for NullReceiver {
    fn get_alert_manager_config_contribution(&self) -> Result<AlertChannelConfig, InterpretError> {
        Ok(AlertChannelConfig {
            receiver: AlertChannelReceiver {
                name: "null".to_string(),
                slack_configs: None,
                webhook_configs: None,
            },
            route: AlertChannelRoute {
                receiver: "null".to_string(),
                matchers: vec!["alertname=Watchdog".to_string()],
                r#continue: false,
            },
            global_config: None,
        })
    }

    fn get_dependency_score(&self, _namespace: String) -> Option<HelmChartScore> {
        None
    }
}
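The trait-object design lets new channel types plug in without touching the score or interpret code. A minimal sketch, assuming only the types above; GenericWebhookChannel is a hypothetical example and not part of this change.

// Sketch only: `GenericWebhookChannel` is a made-up channel type used to
// illustrate the extension point.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct GenericWebhookChannel {
    pub name: String,
    pub webhook_url: Url,
}

#[typetag::serde]
impl PrometheusAlertChannel for GenericWebhookChannel {
    fn get_alert_manager_config_contribution(&self) -> Result<AlertChannelConfig, InterpretError> {
        Ok(AlertChannelConfig {
            receiver: AlertChannelReceiver {
                name: format!("Webhook-{}", self.name),
                slack_configs: None,
                webhook_configs: Some(vec![WebhookConfig {
                    // Alertmanager posts alerts straight to the configured webhook.
                    url: self.webhook_url.clone(),
                    send_resolved: true,
                }]),
            },
            route: AlertChannelRoute {
                receiver: format!("Webhook-{}", self.name),
                matchers: vec!["alertname!=Watchdog".to_string()],
                r#continue: true,
            },
            global_config: None,
        })
    }

    // A plain webhook target needs no extra Helm chart installed.
    fn get_dependency_score(&self, _namespace: String) -> Option<HelmChartScore> {
        None
    }
}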
@@ -84,17 +84,9 @@ impl AlertManagerValues {
                 group_wait: "30s".to_string(),
                 group_interval: "5m".to_string(),
                 repeat_interval: "12h".to_string(),
-                routes: vec![AlertChannelRoute {
-                    receiver: "null".to_string(),
-                    matchers: vec!["alertname=Watchdog".to_string()],
-                    r#continue: false,
-                }],
+                routes: vec![AlertChannelRoute { receiver: "null".to_string(), matchers: vec!["alertname=Watchdog".to_string()], r#continue: false }],
             },
-            receivers: vec![AlertChannelReceiver {
-                name: "null".to_string(),
-                slack_configs: None,
-                webhook_configs: None,
-            }],
+            receivers: vec![AlertChannelReceiver { name: "null".to_string(), slack_configs: None, webhook_configs: None }],
         },
     },
 }
5003 harmony/src/modules/monitoring/kube_prometheus/values.yaml Normal file
File diff suppressed because it is too large
@@ -1,108 +0,0 @@
use async_trait::async_trait;
use serde::Serialize;
use serde_yaml::Value;

use crate::{
    data::{Id, Version},
    interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
    inventory::Inventory,
    score::Score,
    topology::{
        HelmCommand, Topology,
        oberservability::monitoring::{AlertReceiverDeployment, Monitor},
    },
};

use super::{
    config::KubePrometheusConfig, kube_prometheus_helm_chart::kube_prometheus_helm_chart_score,
};

#[derive(Debug, Clone)]
pub struct KubePrometheus<T> {
    alert_receivers: Vec<Box<dyn AlertReceiverDeployment<T>>>,
    config: KubePrometheusConfig,
}

#[async_trait]
pub trait AlertManagerConfig<T> {
    async fn get_alert_manager_config(&self) -> Result<Value, InterpretError>;
}

impl<T: Topology> KubePrometheus<T> {
    pub fn new() -> Self {
        Self {
            alert_receivers: Vec::new(),
            config: KubePrometheusConfig::new(),
        }
    }
}

#[async_trait]
impl<T: Topology + HelmCommand + std::fmt::Debug> Monitor<T> for KubePrometheus<T> {
    async fn deploy_monitor(&self, topology: &T) -> Result<Outcome, InterpretError> {
        for alert_receiver in &self.alert_receivers {
            alert_receiver.deploy_alert_receiver(topology).await?;
        }
        let score = KubePrometheusScore {
            config: self.config.clone(),
        };
        let inventory = Inventory::autoload();
        score.create_interpret().execute(&inventory, topology).await
    }

    async fn delete_monitor(&self, _topolgy: &T) -> Result<Outcome, InterpretError> {
        todo!()
    }
}

#[derive(Debug, Clone, Serialize)]
struct KubePrometheusScore {
    config: KubePrometheusConfig,
}

impl<T: Topology + HelmCommand> Score<T> for KubePrometheusScore {
    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(KubePromethusScoreInterpret {
            score: self.clone(),
        })
    }

    fn name(&self) -> String {
        todo!()
    }
}

#[derive(Debug, Clone, Serialize)]
struct KubePromethusScoreInterpret {
    score: KubePrometheusScore,
}

#[async_trait]
impl<T: Topology + HelmCommand> Interpret<T> for KubePromethusScoreInterpret {
    async fn execute(
        &self,
        inventory: &Inventory,
        topology: &T,
    ) -> Result<Outcome, InterpretError> {
        kube_prometheus_helm_chart_score(&self.score.config)
            .create_interpret()
            .execute(inventory, topology)
            .await
    }

    fn get_name(&self) -> InterpretName {
        todo!()
    }

    fn get_version(&self) -> Version {
        todo!()
    }

    fn get_status(&self) -> InterpretStatus {
        todo!()
    }

    fn get_children(&self) -> Vec<Id> {
        todo!()
    }
}
@@ -1,7 +1,5 @@
-pub mod alertmanager_types;
 mod config;
 mod discord_alert_manager;
-pub mod discord_webhook_sender;
-mod kube_prometheus_helm_chart;
-pub mod kube_prometheus_monitor;
+pub mod kube_prometheus;
 pub mod monitoring_alerting;
@@ -1,58 +1,46 @@
 use async_trait::async_trait;
-use email_address::EmailAddress;
 use log::info;
 use serde::Serialize;
-use url::Url;

 use crate::{
     data::{Id, Version},
     interpret::{Interpret, InterpretError, InterpretName, InterpretStatus, Outcome},
     inventory::Inventory,
+    modules::monitoring::kube_prometheus::types::{
+        AlertManager, AlertManagerConfig, AlertManagerRoute,
+    },
     score::Score,
     topology::{HelmCommand, Topology},
 };

 use super::{
-    config::KubePrometheusConfig, kube_prometheus_helm_chart::kube_prometheus_helm_chart_score,
+    config::KubePrometheusChartConfig,
+    kube_prometheus::{
+        kube_prometheus::kube_prometheus_helm_chart_score,
+        prometheus_alert_channel::{NullReceiver, PrometheusAlertChannel},
+        types::AlertManagerValues,
+    },
 };

 #[derive(Debug, Clone, Serialize)]
-pub enum AlertChannel {
-    Discord {
-        name: String,
-        webhook_url: Url,
-    },
-    Slack {
-        slack_channel: String,
-        webhook_url: Url,
-    },
-    //TODO test and implement in helm chart
-    //currently does not work
-    Smpt {
-        email_address: EmailAddress,
-        service_name: String,
-    },
-}
-
-#[derive(Debug, Clone, Serialize)]
-pub struct MonitoringAlertingStackScore {
-    pub alert_channel: Vec<AlertChannel>,
+pub struct MonitoringAlertingScore {
+    pub alert_channels: Vec<Box<dyn PrometheusAlertChannel>>,
     pub namespace: Option<String>,
 }

-impl MonitoringAlertingStackScore {
+impl MonitoringAlertingScore {
     pub fn new() -> Self {
         Self {
-            alert_channel: Vec::new(),
+            alert_channels: Vec::new(),
             namespace: None,
         }
     }
 }

-impl<T: Topology + HelmCommand> Score<T> for MonitoringAlertingStackScore {
+impl<T: Topology + HelmCommand> Score<T> for MonitoringAlertingScore {
     fn create_interpret(&self) -> Box<dyn Interpret<T>> {
-        Box::new(MonitoringAlertingStackInterpret {
+        Box::new(MonitoringAlertingInterpret {
             score: self.clone(),
         })
     }
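A usage sketch of the reworked score shape; the channel names and webhook URLs below are placeholders, and the example assumes the DiscordChannel and SlackChannel types added in prometheus_alert_channel.rs.

// Hypothetical example wiring two channels into the score (assumes `use url::Url;`).
let score = MonitoringAlertingScore {
    alert_channels: vec![
        Box::new(DiscordChannel {
            name: "team-alerts".to_string(),
            webhook_url: Url::parse("https://discord.example/webhook").unwrap(),
        }),
        Box::new(SlackChannel {
            name: "ops-alerts".to_string(),
            webhook_url: Url::parse("https://hooks.slack.example/services/T000/B000/XXX").unwrap(),
        }),
    ],
    namespace: Some("monitoring".to_string()),
};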
@@ -62,17 +50,61 @@ impl<T: Topology + HelmCommand> Score<T> for MonitoringAlertingStackScore {
 }

 #[derive(Debug, Clone, Serialize)]
-struct MonitoringAlertingStackInterpret {
-    score: MonitoringAlertingStackScore,
+struct MonitoringAlertingInterpret {
+    score: MonitoringAlertingScore,
 }

-impl MonitoringAlertingStackInterpret {
-    async fn build_kube_prometheus_helm_chart_config(&self) -> KubePrometheusConfig {
-        let mut config = KubePrometheusConfig::new();
+impl MonitoringAlertingInterpret {
+    async fn build_kube_prometheus_helm_chart_config(&self) -> KubePrometheusChartConfig {
+        let mut config = KubePrometheusChartConfig::new();
+        let mut receivers = Vec::new();
+        let mut routes = Vec::new();
+        let mut global_config = None;

         if let Some(ns) = &self.score.namespace {
             config.namespace = ns.clone();
-        }
-        config.alert_channel = self.score.alert_channel.clone();
+        };
+
+        let null_channel = NullReceiver::new();
+        let null_channel = null_channel
+            .get_alert_manager_config_contribution()
+            .unwrap();
+        receivers.push(null_channel.receiver);
+        routes.push(null_channel.route);
+
+        for channel in self.score.alert_channels.clone() {
+            let alert_manager_config_contribution =
+                channel.get_alert_manager_config_contribution().unwrap();
+            receivers.push(alert_manager_config_contribution.receiver);
+            routes.push(alert_manager_config_contribution.route);
+            if let Some(global) = alert_manager_config_contribution.global_config {
+                global_config = Some(global);
+            }
+        }
+
+        info!("after alert receiver: {:#?}", receivers);
+        info!("after alert routes: {:#?}", routes);
+
+        let alert_manager_config = AlertManagerConfig {
+            global: global_config,
+            route: AlertManagerRoute {
+                group_by: vec!["job".to_string()],
+                group_wait: "30s".to_string(),
+                group_interval: "5m".to_string(),
+                repeat_interval: "12h".to_string(),
+                routes,
+            },
+            receivers,
+        };
+
+        info!("alert manager config: {:?}", config);
+
+        config.alert_manager_values = AlertManagerValues {
+            alertmanager: AlertManager {
+                enabled: true,
+                config: alert_manager_config,
+            },
+        };
         config
     }
@@ -80,7 +112,7 @@ impl MonitoringAlertingStackInterpret {
         &self,
         inventory: &Inventory,
         topology: &T,
-        config: &KubePrometheusConfig,
+        config: &KubePrometheusChartConfig,
     ) -> Result<Outcome, InterpretError> {
         let helm_chart = kube_prometheus_helm_chart_score(config);
         helm_chart
@@ -89,56 +121,52 @@ impl MonitoringAlertingStackInterpret {
             .await
     }

-    async fn deploy_alert_channel_service<T: Topology + HelmCommand>(
+    async fn deploy_alert_channel_dependencies<T: Topology + HelmCommand>(
         &self,
         inventory: &Inventory,
         topology: &T,
-        config: &KubePrometheusConfig,
+        config: &KubePrometheusChartConfig,
     ) -> Result<Outcome, InterpretError> {
-        //let mut outcomes = vec![];
-        //for channel in &self.score.alert_channel {
-        //    let outcome = match channel {
-        //        AlertChannel::Discord { .. } => {
-        //            discord_alert_manager_score(config)
-        //                .create_interpret()
-        //                .execute(inventory, topology)
-        //                .await
-        //        }
-        //        AlertChannel::Slack { .. } => Ok(Outcome::success(
-        //            "No extra configs for slack alerting".to_string(),
-        //        )),
-        //        AlertChannel::Smpt { .. } => {
-        //            todo!()
-        //        }
-        //    };
-        //    outcomes.push(outcome);
-        //}
-        //for result in outcomes {
-        //    result?;
-        //}
+        let mut outcomes = Vec::new();
+
+        for channel in &self.score.alert_channels {
+            let ns = config.namespace.clone();
+            if let Some(dependency_score) = channel.get_dependency_score(ns) {
+                match dependency_score
+                    .create_interpret()
+                    .execute(inventory, topology)
+                    .await
+                {
+                    Ok(outcome) => outcomes.push(outcome),
+                    Err(e) => {
+                        info!("failed to deploy dependency: {}", { &e });
+                        return Err(e);
+                    }
+                }
+            }
+        }

         Ok(Outcome::success("All alert channels deployed".to_string()))
     }
 }

 #[async_trait]
-impl<T: Topology + HelmCommand> Interpret<T> for MonitoringAlertingStackInterpret {
+impl<T: Topology + HelmCommand> Interpret<T> for MonitoringAlertingInterpret {
     async fn execute(
         &self,
         inventory: &Inventory,
         topology: &T,
     ) -> Result<Outcome, InterpretError> {
         let config = self.build_kube_prometheus_helm_chart_config().await;
-        info!("Built kube prometheus config");
+        info!("Built kube prometheus config{:?}", config);
         info!("Installing kube prometheus chart");
         self.deploy_kube_prometheus_helm_chart_score(inventory, topology, &config)
             .await?;
         info!("Installing alert channel service");
-        self.deploy_alert_channel_service(inventory, topology, &config)
+        self.deploy_alert_channel_dependencies(inventory, topology, &config)
             .await?;
         Ok(Outcome::success(format!(
-            "succesfully deployed monitoring and alerting stack"
+            "succesfully deployed monitoring and alerting score"
         )))
     }
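For context, a sketch (not part of this diff) of how the typed AlertManagerValues built above could be rendered into the chart's values YAML with serde_yaml; it assumes AlertManagerValues and its nested types derive serde::Serialize.

// Sketch only: serializes the built values so they can be spliced into the
// kube-prometheus Helm chart values.
fn alertmanager_values_yaml(values: &AlertManagerValues) -> Result<String, serde_yaml::Error> {
    // Produces YAML along the lines of:
    //   alertmanager:
    //     enabled: true
    //     config:
    //       route: { group_by: [job], group_wait: 30s, ... }
    //       receivers: [ { name: "null" }, ... ]
    serde_yaml::to_string(values)
}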