harmony/adr/010-monitoring-and-alerting.md
2025-04-28 15:09:11 -04:00


Architecture Decision Record: Monitoring and Alerting

Proposed by: Willem Rolleman

Date: April 28, 2025

Status

Proposed

Context

A Harmony user should be able to initialize a monitoring stack easily, either at the first run of Harmony or in a way that integrates with existing projects and infrastructure, without creating multiple instances of the monitoring stack or overwriting existing alerts and configurations. The user also needs a simple way to configure the stack so that it watches their projects. Reasonable defaults should be configured and remain easily customizable for each project.

Decision

Create a MonitoringStack score that creates a maestro to launch the monitoring stack, and skips the launch if the stack is already present. The MonitoringStack score can be passed to the maestro in the vec! scores list.
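As a rough illustration of what passing the score in the vec! list could look like, here is a minimal, self-contained sketch. The `Score` trait, the `Maestro` struct, `AppScore`, and every method shown are hypothetical stand-ins written for this example, not Harmony's actual API; only the idea of a boxed MonitoringStack score in the scores list comes from the decision above.

```rust
// Hypothetical stand-in for Harmony's score abstraction (assumption).
trait Score {
    fn name(&self) -> String;
}

// Hypothetical MonitoringStack score; the `namespace` field is an assumption.
struct MonitoringStack {
    namespace: String,
}

impl Score for MonitoringStack {
    fn name(&self) -> String {
        format!("MonitoringStack({})", self.namespace)
    }
}

// A placeholder for any other score a project already uses.
struct AppScore {
    app: String,
}

impl Score for AppScore {
    fn name(&self) -> String {
        format!("AppScore({})", self.app)
    }
}

// Hypothetical stand-in for the maestro: it only stores the scores it is given.
struct Maestro {
    scores: Vec<Box<dyn Score>>,
}

impl Maestro {
    fn new(scores: Vec<Box<dyn Score>>) -> Self {
        Maestro { scores }
    }

    fn score_names(&self) -> Vec<String> {
        self.scores.iter().map(|s| s.name()).collect()
    }
}

// Monitoring is opt-in: leaving the MonitoringStack entry out of the list
// means the project gets no monitoring.
fn build_maestro() -> Maestro {
    Maestro::new(vec![
        Box::new(AppScore { app: "my-app".to_string() }),
        Box::new(MonitoringStack { namespace: "monitoring".to_string() }),
    ])
}

fn main() {
    for name in build_maestro().score_names() {
        println!("{name}");
    }
}
```

In this sketch the maestro needs no structural change to support monitoring, since the MonitoringStack is just one more entry in the list it already accepts.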

Rationale

Having the score launch a maestro lets the user easily create a new monitoring stack and keeps its components grouped together. The MonitoringStack score can handle all the logic for adding alerts, ensuring that the stack is running, and so on.
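The "install if missing, otherwise join" logic described here can be sketched as follows. This is a toy in-memory model under stated assumptions: `Cluster`, `Outcome`, `apply`, and the alert fields are hypothetical names invented for this example, and keying alerts by project is one possible way to avoid overwriting another project's alerts, not a decided design.

```rust
use std::collections::BTreeMap;

// Hypothetical in-memory model of the shared monitoring state (assumption).
#[derive(Default)]
struct Cluster {
    stack_installed: bool,
    // Alert expressions keyed by (project, alert name), so two projects can
    // use the same alert name without overwriting each other.
    alerts: BTreeMap<(String, String), String>,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Installed, // the stack was missing and has been created
    Joined,    // the stack already existed and was reused
}

// Hypothetical score: one project's alerts as (name, expression) pairs.
struct MonitoringStack {
    project: String,
    alerts: Vec<(String, String)>,
}

impl MonitoringStack {
    // Install the stack only when it is absent, then merge this project's alerts.
    fn apply(&self, cluster: &mut Cluster) -> Outcome {
        let outcome = if cluster.stack_installed {
            Outcome::Joined
        } else {
            cluster.stack_installed = true;
            Outcome::Installed
        };
        for (name, expr) in &self.alerts {
            cluster
                .alerts
                .insert((self.project.clone(), name.clone()), expr.clone());
        }
        outcome
    }
}

// Two projects apply their scores against the same cluster.
fn demo() -> (Outcome, Outcome, usize) {
    let mut cluster = Cluster::default();
    let shop = MonitoringStack {
        project: "shop".to_string(),
        alerts: vec![("HighLatency".to_string(), "latency_seconds > 1".to_string())],
    };
    let blog = MonitoringStack {
        project: "blog".to_string(),
        alerts: vec![("HighLatency".to_string(), "latency_seconds > 2".to_string())],
    };
    let first = shop.apply(&mut cluster);
    let second = blog.apply(&mut cluster);
    (first, second, cluster.alerts.len())
}

fn main() {
    let (first, second, count) = demo();
    println!("{first:?} {second:?} {count}");
}
```

The second project joins the existing stack instead of creating a new instance, and both projects' alerts are retained even though they share a name.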

Alternatives considered

  • Implement alerting and monitoring stack using existing HelmScore for each project

    • Pros:
      • Each project can use the monitoring and alerting stack of its choice
      • Less overhead in terms of core Harmony code
      • can add Box::new(grafana::grafanascore(namespace))
    • Cons:
      • No default solution implemented
      • Dev needs to choose what they use
      • Increases complexity of score projects
      • Each project will create a new monitoring and alerting instance rather than joining the existing one
  • Use OKD grafana and prometheus

    • Pros:
      • Minimal config to do in Harmony
    • Cons:
      • relies on OKD, so it will not work for local testing via k3d
  • Create a monitoring and alerting crate similar to harmony tui

    • Pros:
      • Creates a default solution that can be implemented once by harmony
      • can create a join function that will allow a project to connect to the existing solution
      • eliminates risk of creating multiple instances of grafana or prometheus
    • Cons:
      • more complex than using a helm score
      • management of values files for individual functions becomes more complicated, i.e. how do you create alerts for one project via helm install without overwriting the other alerts?
  • Add monitoring to the Maestro struct, so that each maestro must define whether the monitoring stack is used

    • Pros:
      • less for the user to define
      • may be easier to set defaults
    • Cons:
      • feels counterintuitive
      • would need to modify the structure of the maestro and how it operates, which seems like a bad idea
      • unclear how to allow user to pass custom values/configs to the monitoring stack for subsequent projects
  • Create a MonitoringStack score to add to the scores vec!, which loads a maestro to install the stack if it is not ready, or adds custom endpoints/alerts to the existing stack

    • Pros:
      • Maestro already accepts a list of scores to initialize
      • leaving out the monitoring score simply means the user does not want monitoring
      • if the monitoring stack is already created, the MonitoringStack score doesn't necessarily need to be added to each project
      • components of the monitoring stack are bundled together and can be expanded or modified from the same place
    • Cons:
      • maybe need to create