harmony/adr/010-monitoring-and-alerting.md

# Architecture Decision Record: Monitoring and Alerting

Proposed by: Willem Rolleman
Date: April 28 2025

## Status

Proposed

## Context

A Harmony user should be able to initialize a monitoring stack easily, either on the first run of Harmony or by integrating with existing projects and infrastructure, without creating multiple instances of the monitoring stack or overwriting existing alerts and configurations. The user also needs a simple way to configure the stack so that it watches their projects. There should be reasonable defaults that are easily customizable for each project.
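
As a rough illustration of "reasonable defaults that are easily customizable", the sketch below uses Rust's `Default` trait with struct update syntax. The `MonitoringConfig` type and all of its fields and values are hypothetical and exist only for this example; they are not part of Harmony.

```rust
/// Hypothetical per-project monitoring settings.
/// Every field name and default value here is an assumption for illustration.
#[derive(Debug, Clone, PartialEq)]
struct MonitoringConfig {
    namespace: String,
    scrape_interval_secs: u64,
    alert_on_pod_restart: bool,
}

impl Default for MonitoringConfig {
    /// The defaults a project gets when it configures nothing.
    fn default() -> Self {
        Self {
            namespace: "monitoring".to_string(),
            scrape_interval_secs: 30,
            alert_on_pod_restart: true,
        }
    }
}

/// A project overrides only the fields that differ from the defaults.
fn config_for_slow_project() -> MonitoringConfig {
    MonitoringConfig {
        scrape_interval_secs: 120,
        ..MonitoringConfig::default()
    }
}
```

With this shape, a project that is happy with the defaults would call `MonitoringConfig::default()` and never name a field.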

## Decision

Create a MonitoringStack score that creates a maestro to launch the monitoring stack, or does nothing if the stack is already present.
The MonitoringStack score can be passed to the maestro in the vec! list of scores.
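
The decision could be sketched roughly as follows. This is a simplified model and not Harmony's actual API: the `Score` trait, the `Maestro` and `Cluster` structs, `InstallComponent`, and `project_maestro` are all assumptions made for the example. Only the idea that a MonitoringStack score sits in the `vec!` of scores and launches its own maestro comes from this ADR.

```rust
/// Hypothetical stand-in for whatever state Harmony inspects on the target.
#[derive(Default)]
struct Cluster {
    installed: Vec<String>,
}

/// Hypothetical, simplified score trait (an assumption, not Harmony's real trait).
trait Score {
    fn apply(&self, cluster: &mut Cluster);
}

/// Hypothetical maestro: runs the scores it is given, in order.
struct Maestro {
    scores: Vec<Box<dyn Score>>,
}

impl Maestro {
    fn run(&self, cluster: &mut Cluster) {
        for score in &self.scores {
            score.apply(cluster);
        }
    }
}

/// Installs one named component unless it is already present.
struct InstallComponent(&'static str);

impl Score for InstallComponent {
    fn apply(&self, cluster: &mut Cluster) {
        if !cluster.installed.iter().any(|c| c.as_str() == self.0) {
            cluster.installed.push(self.0.to_string());
        }
    }
}

/// The score proposed here: it launches its own maestro for the stack.
struct MonitoringStack;

impl Score for MonitoringStack {
    fn apply(&self, cluster: &mut Cluster) {
        let stack = Maestro {
            scores: vec![
                Box::new(InstallComponent("prometheus")),
                Box::new(InstallComponent("grafana")),
            ],
        };
        // Components that already exist are skipped, so applying twice is harmless.
        stack.run(cluster);
    }
}

/// A project opts in to monitoring simply by listing the score.
fn project_maestro() -> Maestro {
    Maestro {
        scores: vec![
            Box::new(InstallComponent("my-app")),
            Box::new(MonitoringStack),
        ],
    }
}
```

Running `project_maestro().run(&mut cluster)` a second time against the same `Cluster` leaves `installed` unchanged, which is the "or does nothing if already present" behaviour in miniature.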

## Rationale

Having the score launch a maestro lets the user easily create a new monitoring stack and keeps its components grouped together. The MonitoringStack score can handle all the logic for adding alerts, ensuring that the stack is running, etc.
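
One way the score might add alerts without disturbing those of other projects is to group rules by owning project. The sketch below is only an assumption about how that could look: `AlertRule`, `AlertRegistry`, `set_project_alerts`, and the example expressions are hypothetical names invented for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical alert rule; the fields are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
struct AlertRule {
    name: String,
    expr: String,
}

/// Alert rules held by the shared stack, grouped by owning project.
#[derive(Default)]
struct AlertRegistry {
    by_project: BTreeMap<String, Vec<AlertRule>>,
}

impl AlertRegistry {
    /// Replaces the rules of `project` only; other projects are left untouched.
    fn set_project_alerts(&mut self, project: &str, rules: Vec<AlertRule>) {
        self.by_project.insert(project.to_string(), rules);
    }

    /// Total number of rules across all projects.
    fn len(&self) -> usize {
        self.by_project.values().map(Vec::len).sum()
    }
}

/// Small helper to build a rule from string slices.
fn rule(name: &str, expr: &str) -> AlertRule {
    AlertRule {
        name: name.to_string(),
        expr: expr.to_string(),
    }
}
```

Because each project writes only under its own key, re-applying one project's score can update its rules while every other project's entry stays as it was.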

## Alternatives considered

- ### Implement alerting and monitoring stack using existing HelmScore for each project
    - **Pros**:
        - Each project can use the monitoring and alerting stack of its choice
        - Less overhead in terms of core Harmony code
        - can add Box::new(grafana::grafanascore(namespace))
    - **Cons**:
        - No default solution implemented
        - Dev needs to choose what they use
        - Increases complexity of score projects
        - Each project will create a new monitoring and alerting instance rather than joining the existing one


- ### Use OKD grafana and prometheus
    - **Pros**:
        - Minimal config to do in Harmony
    - **Cons**:
        - relies on OKD, so it will not work for local testing via k3d

- ### Create a monitoring and alerting crate similar to harmony tui
    - **Pros**:
        - Creates a default solution that can be implemented once by harmony
        - can create a join function that will allow a project to connect to the existing solution
        - eliminates risk of creating multiple instances of grafana or prometheus
    - **Cons**:
        - more complex than using a helm score
        - management of values files for individual functions becomes more complicated, i.e. how do you create alerts for one project via helm install without overwriting the other projects' alerts?

- ### Add monitoring to the Maestro struct, so that whether the monitoring stack is used must always be defined
    - **Pros**:
        - less for the user to define
        - may be easier to set defaults
    - **Cons**:
        - feels counterintuitive
        - would need to modify the structure of the maestro and how it operates which seems like a bad idea
        - unclear how to allow user to pass custom values/configs to the monitoring stack for subsequent projects

- ### Create a MonitoringStack score to add to the scores vec!, which loads a maestro to install the stack if it is not ready, or adds custom endpoints/alerts to an existing stack
    - **Pros**:
        - Maestro already accepts a list of scores to initialize
        - leaving out the monitoring score simply means the user does not want monitoring
        - if the monitoring stack is already created, the MonitoringStack score doesn't necessarily need to be added to each project
        - components of the monitoring stack are bundled together and can be expanded or modified from the same place
    - **Cons**:
        - maybe need to create