harmony/adr/010-monitoring-and-alerting.md

69 lines
3.5 KiB
Markdown

# Architecture Decision Record: Monitoring and Alerting
Initial Author : Willem Rolleman
Date : April 28 2025
## Status
Proposed
## Context
A harmony user should be able to initialize a monitoring stack easily, either at the first run of Harmony, or that integrates with existing proects and infra without creating multiple instances of the monitoring stack or overwriting existing alerts/configurations.The user also needs a simple way to configure the stack so that it watches the projects. There should be reasonable defaults configured that are easily customizable for each project
## Decision
Create MonitoringStack score that creates a maestro to launch the monitoring stack or not if it is already present.
The MonitoringStack score can be passed to the maestro in the vec! scores list
## Rationale
Having the score launch a maestro will allow the user to easily create a new monitoring stack and keeps composants grouped together. The MonitoringScore can handle all the logic for adding alerts, ensuring that the stack is running etc.
## Alerternatives considered
- ### Implement alerting and monitoring stack using existing HelmScore for each project
- **Pros**:
- Each project can choose to use the monitoring and alerting stack that they choose
- Less overhead in terms of care harmony code
- can add Box::new(grafana::grafanascore(namespace))
- **Cons**:
- No default solution implemented
- Dev needs to chose what they use
- Increases complexity of score projects
- Each project will create a new monitoring and alerting instance rather than joining the existing one
- ### Use OKD grafana and prometheus
- **Pros**:
- Minimal config to do in Harmony
- **Cons**:
- relies on OKD so will not working for local testing via k3d
- ### Create a monitoring and alerting crate similar to harmony tui
- **Pros**:
- Creates a default solution that can be implemented once by harmony
- can create a join function that will allow a project to connect to the existing solution
- eliminates risk of creating multiple instances of grafana or prometheus
- **Cons**:
- more complex than using a helm score
- management of values files for individual functions becomes more complicated, ie how do you create alerts for one project via helm install that doesnt overwrite the other alerts
- ### Add monitoring to Maestro struct so whether the monitoring stack is used must be defined
- **Pros**:
- less for the user to define
- may be easier to set defaults
- **Cons**:
- feels counterintuitive
- would need to modify the structure of the maestro and how it operates which seems like a bad idea
- unclear how to allow user to pass custom values/configs to the monitoring stack for subsequent projects
- ### Create MonitoringStack score to add to scores vec! which loads a maestro to install stack if not ready or add custom endpoints/alerts to existing stack
- **Pros**:
- Maestro already accepts a list of scores to initialize
- leaving out the monitoring score simply means the user does not want monitoring
- if the monitoring stack is already created, the MonitoringStack score doesn't necessarily need to be added to each project
- composants of the monitoring stack are bundled together and can be expaned or modified from the same place
- **Cons**:
- maybe need to create