feat: added alert rule and impl for prometheus as well as a few preconfigured bmc alerts for dell server that are used in the monitoring example #67
Reference: NationTech/harmony#67
No description provided.
Looks pretty good overall. Some minor refactoring comments but the rest is great!
@ -0,0 +1,40 @@
use crate::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;
pub fn global_storage_status_degraded_non_critical() -> PrometheusAlertRule {
Is this really Dell specific?
These are from the Dell SNMP walk:
dell:
  walk:
  - 1.3.6.1.4.1.674.10892.5.2
  - 1.3.6.1.4.1.674.10892.5.4
  - 1.3.6.1.4.1.674.10892.5.5
  metrics:
  - name: globalSystemStatus
    oid: 1.3.6.1.4.1.674.10892.5.2.1
    type: gauge
    help: This attribute defines the overall rollup status of all components in
      the system being monitored by the remote access card - 1.3.6.1.4.1.674.10892.5.2.1
    enum_values:
      1: other
      2: unknown
      3: ok
      4: nonCritical
      5: critical
      6: nonRecoverable
There are a bunch of other ones as well, but I only included a few for the example. Each server type has a different SNMP walk that translates to different metric names from the appropriate MIB file. I believe the Dell MIB that this is from is DELL-MM-MIB.
https://github.com/librenms/librenms/tree/master/mibs/dell
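To illustrate how the enum values in the walk could map onto alert rules, here is a minimal sketch. `AlertRule`, `GlobalStatus`, and `global_system_status_rule` are hypothetical stand-ins invented for this example; the PR's actual `PrometheusAlertRule` type and its builder methods may look different. The sketch also assumes the metric is exported as a plain gauge named `globalSystemStatus`, as in the walk above.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the PR's `PrometheusAlertRule`;
/// the real type's fields and builder methods may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct AlertRule {
    pub alert: String,
    pub expr: String,
    pub labels: BTreeMap<String, String>,
    pub annotations: BTreeMap<String, String>,
}

impl AlertRule {
    pub fn new(alert: &str, expr: &str) -> Self {
        Self {
            alert: alert.to_string(),
            expr: expr.to_string(),
            labels: BTreeMap::new(),
            annotations: BTreeMap::new(),
        }
    }

    pub fn label(mut self, key: &str, value: &str) -> Self {
        self.labels.insert(key.to_string(), value.to_string());
        self
    }

    pub fn annotation(mut self, key: &str, value: &str) -> Self {
        self.annotations.insert(key.to_string(), value.to_string());
        self
    }
}

/// Values of `globalSystemStatus`, copied from `enum_values` in the walk.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GlobalStatus {
    Other = 1,
    Unknown = 2,
    Ok = 3,
    NonCritical = 4,
    Critical = 5,
    NonRecoverable = 6,
}

impl GlobalStatus {
    /// Name of the status as it appears in the MIB enum.
    pub fn name(self) -> &'static str {
        match self {
            GlobalStatus::Other => "other",
            GlobalStatus::Unknown => "unknown",
            GlobalStatus::Ok => "ok",
            GlobalStatus::NonCritical => "nonCritical",
            GlobalStatus::Critical => "critical",
            GlobalStatus::NonRecoverable => "nonRecoverable",
        }
    }
}

/// Builds a rule that fires when the gauge equals the given status value.
/// The alert name and the `severity` label are illustrative choices.
pub fn global_system_status_rule(status: GlobalStatus, severity: &str) -> AlertRule {
    // `{{{{` and `}}}}` in `format!` emit the literal `{{` and `}}`
    // that Prometheus templating expects.
    let description = format!(
        "- **System**: {{{{ $labels.instance }}}}\n- **Status**: {}\n- **Value**: {{{{ $value }}}}\n- **Job**: {{{{ $labels.job }}}}",
        status.name()
    );
    AlertRule::new(
        &format!("GlobalSystemStatus_{}", status.name()),
        &format!("globalSystemStatus == {}", status as u8),
    )
    .label("severity", severity)
    .annotation("description", &description)
}
```

Calling `global_system_status_rule(GlobalStatus::NonRecoverable, "critical")` would yield a rule whose expression compares the gauge against 6. Generating one rule per status value this way keeps the numeric values next to their MIB names instead of scattering them across hand-written expressions.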
@ -0,0 +37,4 @@
"description",
"- **System**: {{ $labels.instance }}\n- **Status**: nonRecoverable\n- **Value**: {{ $value }}\n- **Job**: {{ $labels.job }}",
)
}
Nothing in this file seems Dell specific?
@ -0,0 +1,11 @@
use crate::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;
For clarity, this is fine-ish to be in the kube_prometheus mod because it is specifically for PVC alerts, but I think alert definitions should be in another module called just prometheus. kube_prometheus is for things specific to deploying Prometheus on k8s. It is entirely possible to have Prometheus deployed somewhere else (AWS managed or Grafana Cloud, maybe) that scrapes k8s targets and will want this alert.
So the logic here would ask for
modules/prometheus/alerts/k8s
as this is a k8s-specific alert for any Prometheus deployment.
That makes sense.
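The proposed layout could be sketched roughly as follows, using inline modules to stand in for the `modules/prometheus/alerts/k8s` directory. Everything here is an assumption made for illustration: the `AlertRule` struct is a hypothetical stand-in for `PrometheusAlertRule`, and the `pvc_usage_high` function, its threshold, and its expression (built on the kubelet volume metrics) are not taken from the PR.

```rust
pub mod modules {
    /// Deployment-agnostic Prometheus definitions.
    pub mod prometheus {
        pub mod alert_rule {
            /// Hypothetical stand-in for the PR's `PrometheusAlertRule`.
            #[derive(Debug, Clone, PartialEq)]
            pub struct AlertRule {
                pub alert: String,
                pub expr: String,
                pub for_duration: String,
            }
        }

        pub mod alerts {
            /// Alerts about Kubernetes targets, usable by any Prometheus
            /// deployment that scrapes them (in-cluster or managed).
            pub mod k8s {
                use super::super::alert_rule::AlertRule;

                /// Illustrative PVC alert; name, threshold, and expression
                /// are assumptions for this sketch.
                pub fn pvc_usage_high() -> AlertRule {
                    AlertRule {
                        alert: "PvcUsageHigh".to_string(),
                        expr: "kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.95"
                            .to_string(),
                        for_duration: "5m".to_string(),
                    }
                }
            }
        }
    }

    /// Only what is specific to deploying Prometheus on k8s lives here;
    /// it consumes alert definitions rather than owning them.
    pub mod kube_prometheus {
        use super::prometheus::alert_rule::AlertRule;
        use super::prometheus::alerts::k8s;

        /// Hypothetical helper listing the rules this deployment ships with.
        pub fn default_rules() -> Vec<AlertRule> {
            vec![k8s::pvc_usage_high()]
        }
    }
}
```

With this split, a non-k8s deployment module could import `modules::prometheus::alerts::k8s` directly without depending on anything under `kube_prometheus`.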