feat: added alert rule and impl for prometheus as well as a few preconfigured bmc alerts for dell server that are used in the monitoring example #67

wjro · 2025-06-25T19:11:32Z

wjro commented

2025-06-25 19:11:32 +00:00

No description provided.

wjro added 1 commit 2025-06-25 19:11:37 +00:00

feat: added alert rule and impl for prometheus as well as a few preconfigured bmc alerts for dell server that are used in the monitoring example

Run Check Script / check (pull_request) Successful in 1m35s


c21f3084dc

johnride requested changes 2025-06-25 19:31:58 +00:00

johnride left a comment

Looks pretty good overall. Some minor refactoring comments but the rest is great!

harmony/src/modules/monitoring/kube_prometheus/alerts/dell_server.rs Outdated

						
				@ -0,0 +1,40 @@

				use crate::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;

				pub fn global_storage_status_degraded_non_critical() -> PrometheusAlertRule {

johnride commented

2025-06-25 19:26:18 +00:00

Is this really Dell specific?

wjro commented

2025-06-25 19:50:31 +00:00

these are from the Dell snmp walk:
dell:
  walk:
  - 1.3.6.1.4.1.674.10892.5.2
  - 1.3.6.1.4.1.674.10892.5.4
  - 1.3.6.1.4.1.674.10892.5.5
  metrics:
  - name: globalSystemStatus
    oid: 1.3.6.1.4.1.674.10892.5.2.1
    type: gauge
    help: This attribute defines the overall rollup status of all components in
      the system being monitored by the remote access card - 1.3.6.1.4.1.674.10892.5.2.1
    enum_values:
      1: other
      2: unknown
      3: ok
      4: nonCritical
      5: critical
      6: nonRecoverable

There are a bunch of other ones as well, but I only included a few for the example. Each server type has a different SNMP walk, and its OIDs translate to different metric names from the appropriate MIB file. I believe the Dell MIB that this is from is DELL-MM-MIB.
https://github.com/librenms/librenms/tree/master/mibs/dell
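To make the mapping concrete, here is a minimal sketch of how the `enum_values` above could be modelled in Rust and turned into an alert expression. The `GlobalSystemStatus` type and its methods are hypothetical and not part of this PR, and the sketch assumes the gauge is exposed to Prometheus under the metric name `globalSystemStatus` with the raw numeric value.

```rust
/// Rollup status values for `globalSystemStatus`, taken from the
/// `enum_values` in the SNMP config quoted above.
/// Hypothetical helper type, not part of the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GlobalSystemStatus {
    Other = 1,
    Unknown = 2,
    Ok = 3,
    NonCritical = 4,
    Critical = 5,
    NonRecoverable = 6,
}

impl GlobalSystemStatus {
    /// Converts a raw gauge value into a status, if it is a known one.
    pub fn from_value(value: u8) -> Option<Self> {
        match value {
            1 => Some(Self::Other),
            2 => Some(Self::Unknown),
            3 => Some(Self::Ok),
            4 => Some(Self::NonCritical),
            5 => Some(Self::Critical),
            6 => Some(Self::NonRecoverable),
            _ => None,
        }
    }

    /// Name of the status as spelled in the MIB enum.
    pub fn name(self) -> &'static str {
        match self {
            Self::Other => "other",
            Self::Unknown => "unknown",
            Self::Ok => "ok",
            Self::NonCritical => "nonCritical",
            Self::Critical => "critical",
            Self::NonRecoverable => "nonRecoverable",
        }
    }

    /// PromQL expression that is true while the gauge equals this status.
    /// Assumes the metric is exported as `globalSystemStatus`.
    pub fn expr(self) -> String {
        format!("globalSystemStatus == {}", self as u8)
    }
}
```

With this, `GlobalSystemStatus::NonRecoverable.expr()` would produce the comparison against value 6, which is the kind of expression the nonRecoverable alert in `dell_server.rs` presumably relies on.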


harmony/src/modules/monitoring/kube_prometheus/alerts/dell_server.rs Outdated

						
				@ -0,0 +37,4 @@

				        "description",

				        "- **System**: {{ $labels.instance }}\n- **Status**: nonRecoverable\n- **Value**: {{ $value }}\n- **Job**: {{ $labels.job }}",

				    )

				}

johnride commented

2025-06-25 19:26:52 +00:00

Nothing in this file seems Dell specific?

harmony/src/modules/monitoring/kube_prometheus/alerts/pvc.rs Outdated

						
				@ -0,0 +1,11 @@

				use crate::modules::monitoring::alert_rule::prometheus_alert_rule::PrometheusAlertRule;

johnride commented

2025-06-25 19:30:31 +00:00

For clarity, this is fine-ish to be in the kube_prometheus mod because it is specifically for PVC alerts, but I think alert definitions should be in another module called just prometheus. kube_prometheus is for stuff specific to *deploying* Prometheus on k8s. It is very possible to have a Prometheus deployed somewhere else (AWS managed or Grafana Cloud, maybe) which scrapes k8s targets and will want this alert.

So the logic here would ask for `modules/prometheus/alerts/k8s`, as this is a k8s-specific alert for any Prometheus deployment.
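A rough sketch of what that split might look like, written as inline modules so it compiles on its own. Everything here is hypothetical: `AlertRule` is a stand-in for the real `PrometheusAlertRule` (whose API is not shown in this thread), and `pvc_almost_full`, `default_rules`, and the PromQL expression are illustrative only, not the contents of `pvc.rs`.

```rust
pub mod modules {
    /// Alert definitions usable with any Prometheus deployment.
    pub mod prometheus {
        /// Hypothetical stand-in for the real `PrometheusAlertRule`.
        #[derive(Debug, Clone, PartialEq)]
        pub struct AlertRule {
            pub name: String,
            pub expr: String,
        }

        pub mod alerts {
            /// Alerts about Kubernetes targets, wherever Prometheus runs.
            pub mod k8s {
                use super::super::AlertRule;

                /// Illustrative PVC alert; name and expression are assumptions.
                pub fn pvc_almost_full() -> AlertRule {
                    AlertRule {
                        name: "PvcAlmostFull".to_string(),
                        expr: "kubelet_volume_stats_available_bytes / \
                               kubelet_volume_stats_capacity_bytes < 0.10"
                            .to_string(),
                    }
                }
            }
        }
    }

    /// Only what is specific to deploying Prometheus on Kubernetes.
    pub mod kube_prometheus {
        use super::prometheus::alerts::k8s;
        use super::prometheus::AlertRule;

        /// The deployment module reuses the shared alert definitions
        /// instead of owning them.
        pub fn default_rules() -> Vec<AlertRule> {
            vec![k8s::pvc_almost_full()]
        }
    }
}
```

The dependency then points one way: `kube_prometheus` imports from `prometheus::alerts`, so a Prometheus hosted elsewhere could use the same k8s alerts without pulling in any deployment code.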


wjro commented

2025-06-25 19:52:04 +00:00

that makes sense