https://github.com/sapcc/absent-metrics-operator
Absent Metrics Operator creates metric absence alerts atop Kubernetes
- Host: GitHub
- URL: https://github.com/sapcc/absent-metrics-operator
- Owner: sapcc
- License: apache-2.0
- Created: 2020-07-19T23:53:39.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2024-04-12T13:13:59.000Z (about 2 years ago)
- Last Synced: 2024-04-14T01:06:43.915Z (about 2 years ago)
- Language: Go
- Homepage:
- Size: 10.1 MB
- Stars: 7
- Watchers: 35
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
# Absent Metrics Operator
[Latest release](https://github.com/sapcc/absent-metrics-operator/releases/latest)
[CI](https://github.com/sapcc/absent-metrics-operator/actions/workflows/ci.yaml)
[Coverage](https://coveralls.io/github/sapcc/absent-metrics-operator?branch=master)
[Go Report Card](https://goreportcard.com/report/github.com/sapcc/absent-metrics-operator)
In this document:
- [Terminology](#terminology)
- [Overview](#overview)
- [Motivation](#motivation)
- [Installation](#installation)
- [Usage](#usage)
- [Metrics](#metrics)
In other documents:
- [Absence alert rule definition](./docs/absence-alert-rule-definition.md)
- [Playbook for operators](./docs/playbook.md)
## Terminology
An **_absence alert rule_** is an alert rule that alerts on the absence of a metric.
A `PrometheusRule` is a custom resource defined by the [Prometheus
operator][prometheus-operator] that defines a set of alerting rules. Within the
absent metrics operator documentation and source code, an **_AbsencePrometheusRule_** is a
`PrometheusRule` resource created and managed by the absent metrics operator. It
defines the **_absence alert rules_** for the metrics used in the
alert rule definitions of another `PrometheusRule`.
## Overview
The absent metrics operator is a companion operator for the [Prometheus
Operator][prometheus-operator].
It monitors all the `PrometheusRule` resources deployed across a
Kubernetes cluster and creates corresponding _absence alert rules_ for
the alert rules defined in those resources.
## Motivation
Consider the following alert rule definition:
```yaml
alert: ImportantAlert
expr: foo_bar > 0
for: 5m
labels:
support_group: network
service: foo
severity: critical
annotations:
summary: Data center is on fire!
```
This alert will never fire if the metric `foo_bar` does not exist in
Prometheus, because the expression `foo_bar > 0` then returns no data at all.
This can be avoided by using the `absent()` function with the `or` operator so
the alert rule expression becomes:
```
absent(foo_bar) or foo_bar > 0
```
However, this gets tedious if you have hundreds of alerts deployed across the cluster.
It is also prone to human error, e.g. a typo in the metric name or forgetting to include
the `absent` function in the alert expression.
The absent metrics operator solves this problem by automatically creating the
corresponding alert rules that check for and alert on metric absence.
For example, for the alert rule shown above, the operator would generate the following _absence alert rule_:
```yaml
alert: AbsentNetworkFooBar
expr: absent(foo_bar)
for: 10m
labels:
context: absent-metrics
severity: info
support_group: network
service: foo
annotations:
summary: missing foo_bar
description: The metric 'foo_bar' is missing. 'ImportantAlert' alert using it may not fire as intended.
```
Refer to the _absence alert rule_ [definition
documentation](./docs/absence-alert-rule-definition.md) for more information on how these
alerts are generated and defined.
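
The mapping from `ImportantAlert` to `AbsentNetworkFooBar` shown above can be sketched in Go. This is only an illustration inferred from that single example, not the operator's actual code: the `AbsenceRule` type and the `camelCase` and `absenceRuleFor` helpers are hypothetical, and the naming scheme (`Absent` + support group + metric name) and the fixed `10m` duration are assumptions read off the example. The definition documentation remains the authoritative source.

```go
package main

import (
	"fmt"
	"strings"
)

// AbsenceRule is a simplified, hypothetical stand-in for a Prometheus alert rule.
type AbsenceRule struct {
	Alert       string
	Expr        string
	For         string
	Labels      map[string]string
	Annotations map[string]string
}

// camelCase turns "foo_bar" into "FooBar". Metric names are ASCII, so
// byte-wise slicing of each part is safe here.
func camelCase(s string) string {
	var b strings.Builder
	isSep := func(r rune) bool { return r == '_' || r == ':' }
	for _, part := range strings.FieldsFunc(s, isSep) {
		b.WriteString(strings.ToUpper(part[:1]) + part[1:])
	}
	return b.String()
}

// absenceRuleFor builds an absence alert rule for one metric used by the
// alert named alertName. The naming scheme and the copied labels are
// assumptions inferred from the README example, not the operator's source.
func absenceRuleFor(alertName, metric string, labels map[string]string) AbsenceRule {
	return AbsenceRule{
		Alert: "Absent" + camelCase(labels["support_group"]) + camelCase(metric),
		Expr:  fmt.Sprintf("absent(%s)", metric),
		For:   "10m",
		Labels: map[string]string{
			"context":       "absent-metrics",
			"severity":      "info",
			"support_group": labels["support_group"],
			"service":       labels["service"],
		},
		Annotations: map[string]string{
			"summary": "missing " + metric,
			"description": fmt.Sprintf(
				"The metric '%s' is missing. '%s' alert using it may not fire as intended.",
				metric, alertName),
		},
	}
}

func main() {
	rule := absenceRuleFor("ImportantAlert", "foo_bar", map[string]string{
		"support_group": "network",
		"service":       "foo",
		"severity":      "critical",
	})
	fmt.Println(rule.Alert) // AbsentNetworkFooBar
	fmt.Println(rule.Expr)  // absent(foo_bar)
}
```

Note that the original alert's `severity: critical` is deliberately not copied: in the example, the absence alert carries `severity: info`.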
## Installation
You can build the operator with `make`, install it with `make install`, or build a container image with `docker build`.
The `make install` target understands the conventional environment variables for choosing
install locations: `DESTDIR` and `PREFIX`.
## Usage
For usage instructions, run:
```
absent-metrics-operator --help
```
In case of a false positive, the operator can be disabled for a specific alert rule or the
entire `PrometheusRule` resource. Refer to the [playbook for operators](./docs/playbook.md#disable-the-operator)
for instructions.
### Metrics
Metrics are exposed at port `9659`. This port has been
[allocated](https://github.com/prometheus/prometheus/wiki/Default-port-allocations)
for the operator.
| Metric | Labels |
| --------------------------------------------------- | ------------------------------------------------- |
| `absent_metrics_operator_successful_reconcile_time` | `prometheusrule_namespace`, `prometheusrule_name` |
[prometheus-operator]: https://github.com/prometheus-operator/prometheus-operator