Prometheus Alertmanager
Prometheus Alertmanager handles alerts generated by Prometheus and routes notifications to tools such as email, Slack, Microsoft Teams, or PagerDuty.
When your Prometheus server is configured to send alerts to Alertmanager, you can use MetricsHub alert rules to detect hardware, storage, and system issues.
- These alert rules are distinct from the internal alerts generated by MetricsHub and emitted as OpenTelemetry logs.
- The alert rules described on this page are evaluated by Prometheus. When an alert fires, Prometheus sends it to Alertmanager, which is responsible for routing and notifications.
- To view detailed alert descriptions and annotations, you must use the full Prometheus Alertmanager interface (typically available on port `9093`). The lightweight Prometheus web UI does not display this additional alert information.
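For reference, the Prometheus-side wiring mentioned above typically looks like the following excerpt of `prometheus.yml`. This is a minimal sketch: the target `localhost:9093` is an assumption (Alertmanager running on the same host with its default port) and should be replaced with your own address.

```yaml
# prometheus.yml (excerpt)
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093   # assumed Alertmanager address; adjust to your setup
```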
Available Alert Rules
The following rule sets are provided with MetricsHub:
| Alert Rules | When to Use | Alerts Triggered When |
|---|---|---|
| MetricsHub | Always | |
| Hardware | When hardware monitoring is performed | |
| Storage | When storage monitoring is performed | |
| System | When system monitoring is performed | |
Install Alert Rules
To activate the alert rules:
- Copy the required configuration files into your Prometheus configuration directory:
  - `config/metricshub-rules.yaml`
  - `config/metricshub-hardware-rules.yaml`
  - `config/metricshub-storage-rules.yaml`
  - `config/metricshub-system-rules.yaml`
- Declare them in the `prometheus.yml` file:

      rule_files:
        - metricshub-rules.yaml
        - metricshub-hardware-rules.yaml
        - metricshub-storage-rules.yaml
        - metricshub-system-rules.yaml

- Restart your Prometheus server to take the new rules into account.
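As an alternative to listing each file, `rule_files` entries also accept glob patterns, and relative paths are resolved against the directory that contains `prometheus.yml`. The sketch below assumes the four rule files were copied next to `prometheus.yml` and that no other file in that directory matches the pattern.

```yaml
# prometheus.yml (excerpt)
rule_files:
  - metricshub-*.yaml   # matches all four MetricsHub rule files listed above
```

Listing the files explicitly, as in the step above, remains the safer choice when you only want a subset of the rule sets loaded.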
Understanding Alert Rule Thresholds
MetricsHub alert rules use two types of thresholds:
- Static thresholds: Use fixed values that apply to all devices. Example: battery charge below `30%`.
- Dynamic thresholds: Use device-specific threshold metrics exposed directly by the monitored hardware. Example: warning and critical temperature limits provided by the device itself.
Dynamic thresholds allow MetricsHub to adapt alerts automatically to different hardware vendors, models, and configurations.
The following examples illustrate how static and dynamic thresholds are implemented in Prometheus alert rules.
Static Threshold Example
For the `hw_battery_charge_ratio` metric:
- a `warning` alert is triggered when the battery charge is at or below 0.5 (50%)
- a `critical` alert is triggered when the battery charge is below 0.3 (30%)
Because Prometheus rules are evaluated independently, both alerts fire when the charge falls below 30%.

    - name: MetricsHub-Hardware-Battery-Charge
      rules:
        - alert: MetricsHub-Hardware-Battery-Charge-Warning
          expr: hw_battery_charge_ratio >= 0 AND hw_battery_charge_ratio * 100 <= 50
          for: 5m
          labels:
            severity: warning
        - alert: MetricsHub-Hardware-Battery-Charge-Critical
          expr: hw_battery_charge_ratio >= 0 AND hw_battery_charge_ratio * 100 < 30
          for: 5m
          labels:
            severity: critical

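If you would rather have only one of the two battery alerts active at a time, one option is to bound the warning expression from below so that it stops matching once the critical range is reached. The following is a minimal sketch of that variation, not the rule shipped with MetricsHub; only the warning `expr` differs from the rule above.

```yaml
- name: MetricsHub-Hardware-Battery-Charge
  rules:
    - alert: MetricsHub-Hardware-Battery-Charge-Warning
      # Warning band only: 30% <= charge <= 50% (the critical rule covers charge < 30%)
      expr: hw_battery_charge_ratio * 100 >= 30 AND hw_battery_charge_ratio * 100 <= 50
      for: 5m
      labels:
        severity: warning
```

One trade-off of this design: when the charge drops from the warning band into the critical range, the warning resolves right away while the critical alert still has to satisfy its own `for: 5m`, so there can be a short window in which neither alert is firing.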
Dynamic Threshold Example
For the `hw_temperature_celsius` metric:
- a `warning` alert is triggered when the temperature reaches or exceeds the value of `hw_temperature_limit_celsius{limit_type="high.degraded"}`
- a `critical` alert is triggered when the temperature reaches or exceeds the value of `hw_temperature_limit_celsius{limit_type="high.critical"}`

    - name: Temperature
      rules:
        - alert: Temperature-High-Warning
          expr: hw_temperature_celsius >= ignoring(limit_type) hw_temperature_limit_celsius{limit_type="high.degraded"}
          labels:
            severity: warning
        - alert: Temperature-High-Critical
          expr: hw_temperature_celsius >= ignoring(limit_type) hw_temperature_limit_celsius{limit_type="high.critical"}
          labels:
            severity: critical

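In these expressions, `ignoring(limit_type)` tells Prometheus to match each temperature series with its limit series on all labels except `limit_type`, which presumably exists only on the limit metric. The same pattern can be combined with arithmetic on the device-reported limit. The sketch below is a hypothetical variation, not a rule provided with MetricsHub: the alert name, the 5-degree margin, and the `for:` duration are all assumptions chosen for illustration.

```yaml
- name: Temperature
  rules:
    - alert: Temperature-Near-Critical   # hypothetical alert name
      # Fires when the temperature comes within 5 degrees Celsius of the
      # critical limit reported by the device (margin is an assumed value).
      expr: hw_temperature_celsius >= ignoring(limit_type) (hw_temperature_limit_celsius{limit_type="high.critical"} - 5)
      for: 5m
      labels:
        severity: warning
```

Because the limit still comes from the device, the margin is applied per sensor rather than as a single fixed temperature for all hardware.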
Customizing Alert Rules
All MetricsHub alert rules can be customized.
You can:
- Adjust thresholds
- Modify alert durations (`for:`)
- Add or remove labels
- Customize annotations and descriptions
- Enable or disable specific alerts
- Integrate additional routing labels for Alertmanager
After modifying a rule file, restart Prometheus to reload the updated configuration.
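To illustrate several of these customizations together, here is a hedged sketch of a modified copy of the battery critical rule. The new threshold, the duration, the `team` label, and the annotation texts are example values, not MetricsHub defaults, and `$labels.instance` assumes your series carry an `instance` label.

```yaml
# Rule file excerpt: customized copy of the battery critical rule
- name: MetricsHub-Hardware-Battery-Charge
  rules:
    - alert: MetricsHub-Hardware-Battery-Charge-Critical
      # Threshold lowered from 30 to 20 (example value)
      expr: hw_battery_charge_ratio >= 0 AND hw_battery_charge_ratio * 100 < 20
      for: 10m                     # longer than the original 5m
      labels:
        severity: critical
        team: datacenter-ops       # hypothetical routing label
      annotations:
        summary: "Battery charge critically low"
        description: "Battery charge is {{ $value | humanizePercentage }} on {{ $labels.instance }}."
```

On the Alertmanager side, a routing label such as the hypothetical `team` above could then be matched in `alertmanager.yml`. The receiver names below are placeholders and carry no notification settings; you would add your own email, Slack, or PagerDuty configuration to them.

```yaml
# alertmanager.yml (excerpt)
route:
  receiver: default                # placeholder fallback receiver
  routes:
    - matchers:
        - team="datacenter-ops"
      receiver: datacenter-ops-pager
receivers:
  - name: default
  - name: datacenter-ops-pager
```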