diff --git a/adr/010-monitoring-and-alerting.md b/adr/010-monitoring-and-alerting.md
index d5ebb10..a91968b 100644
--- a/adr/010-monitoring-and-alerting.md
+++ b/adr/010-monitoring-and-alerting.md
@@ -9,15 +9,16 @@ Proposed
 
 ## Context
 
-Currently our monitoring and alerting is done using grafana and prometheus alert manager, deployed via helm in k8s. We need to implement a monitoring and alerting solution that is managed by Harmony. A decision needs to be made as to how this should be implemented within Harmony.
+A Harmony user should be able to initialize a monitoring stack easily, either on the first run of Harmony or in a way that integrates with existing projects and infrastructure, without creating multiple instances of the monitoring stack or overwriting existing alerts/configurations. The user also needs a simple way to configure the stack so that it watches their projects. The stack should ship with reasonable defaults that are easy to customize for each project.
 
 ## Decision
 
-use existing HelmScore and pass the scores for grafana and prometheus for each individual project
+Create a MonitoringStack score that creates a maestro to launch the monitoring stack, or skips the launch if the stack is already present.
+The MonitoringStack score can be passed to the maestro in the `vec!` of scores.
 
 ## Rationale
 
-This will allow the end user to choose to use the monitoring and alerting stack if they choose for both local as well as dev/prod projects. Grafana and Prometheus are installed via helm which is consitent with OKD, helm and other design choices. Allows the use of already defined Scores.
+Having the score launch a maestro lets the user easily create a new monitoring stack and keeps the stack's components grouped together. The MonitoringScore can handle all the logic for adding alerts, ensuring that the stack is running, etc.
 
 ## Alerternatives considered
 
@@ -30,6 +31,8 @@ This will allow the end user to choose to use the monitoring and alerting stack
     - No default solution implemented
     - Dev needs to chose what they use
     - Increases complexity of score projects
+    - Each project will create a new monitoring and alerting instance rather than joining the existing one
+
 
 - ### Use OKD grafana and prometheus
   - **Pros**:
@@ -39,8 +42,27 @@ This will allow the end user to choose to use the monitoring and alerting stack
 
 - ### Create a monitoring and alerting crate similar to harmony tui
   - **Pros**:
-    - Creates a default solution that can be implemented or not depending on user choice
+    - Creates a default solution that can be implemented once by Harmony
+    - can create a join function that allows a project to connect to the existing solution
+    - eliminates the risk of creating multiple instances of grafana or prometheus
   - **Cons**:
     - more complex than using a helm score
+    - management of values files for individual functions becomes more complicated, i.e. how do you create alerts for one project via helm install without overwriting the other projects' alerts?
+- ### Add monitoring to the Maestro struct, so that the user must define whether the monitoring stack is used
+  - **Pros**:
+    - less for the user to define
+    - may be easier to set defaults
+  - **Cons**:
+    - feels counterintuitive
+    - would require modifying the structure of the maestro and how it operates, which seems like a bad idea
+    - unclear how to allow the user to pass custom values/configs to the monitoring stack for subsequent projects
+- ### Create a MonitoringStack score to add to the scores `vec!`, which loads a maestro to install the stack if it is not ready or to add custom endpoints/alerts to the existing stack
+  - **Pros**:
+    - Maestro already accepts a list of scores to initialize
+    - leaving out the monitoring score simply means the user does not want monitoring
+    - if the monitoring stack is already created, the MonitoringStack score doesn't necessarily need to be added to each project
+    - components of the monitoring stack are bundled together and can be expanded or modified from the same place
+  - **Cons**:
+    - maybe need to create