https://github.com/opsdis/monitor-exporter

The monitor-exporter uses OP5 Monitor's API to fetch service-based performance data and publishes it in a way that lets Prometheus scrape the performance data as metrics.

[![PyPI version](https://badge.fury.io/py/monitor-exporter.svg)](https://badge.fury.io/py/monitor-exporter)

monitor-exporter
-----------------------

- [Overview](#overview)
- [Metrics naming](#metrics-naming)
  * [Service performance data](#service-performance-data)
  * [Host performance data](#host-performance-data)
  * [State](#state)
  * [Metric labels](#metric-labels)
  * [Performance metrics name to labels](#performance-metrics-name-to-labels)
- [Configuration](#configuration)
  * [monitor-exporter](#monitor-exporter-1)
- [Using Redis cache](#using-redis-cache)
- [Logging](#logging)
- [Prometheus configuration](#prometheus-configuration)
  * [Static config](#static-config)
  * [File discovery config for usage with `monitor-promdiscovery`](#file-discovery-config-for-usage-with--monitor-promdiscovery-)
- [Installing](#installing)
- [Running](#running)
  * [Development with Quart built in webserver](#development-with-quart-built-in-webserver)
  * [Production deployment](#production-deployment)
    + [Deploying with gunicorn](#deploying-with-gunicorn)
  * [Test the connection](#test-the-connection)
- [System requirements](#system-requirements)
- [License](#license)

# Overview

The monitor-exporter uses the API of ITRS Monitor (formerly OP5 Monitor) to fetch host and service performance data and
publishes it in a way that lets Prometheus scrape the performance data and state as metrics.

Benefits:

- Enable advanced queries and aggregation on time series
- Prometheus-based alerting rules
- Grafana graphing
- Take advantage of metrics already collected by Monitor, without rerunning checks
- Collect host and service performance data and state and translate them to Prometheus metrics

This solution is a natural gateway for any Monitor user who would like to start using Prometheus and Grafana.

# Metrics naming
## Service performance data
Metrics that are scraped with the monitor-exporter will have the following naming structure:

    monitor_<check_command>_<perfname>_<unit>

> Unit is only added if it exists for the performance data

For example the check command `check_ping` will result in two metrics:

    monitor_check_ping_rta_seconds
    monitor_check_ping_pl_ratio
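
The naming rule can be illustrated with a minimal Python sketch. The function `metric_name` is hypothetical and is not taken from the exporter's source; it only assumes that the name is the `monitor` prefix, the check command, the perfname and an optional unit joined with underscores, as the examples above suggest.

```python
def metric_name(check_command: str, perfname: str, unit: str = "") -> str:
    """Join prefix, check command and perfname; append the unit only if present."""
    parts = ["monitor", check_command, perfname]
    if unit:
        parts.append(unit)
    return "_".join(parts)


print(metric_name("check_ping", "rta", "seconds"))  # monitor_check_ping_rta_seconds
print(metric_name("check_ping", "pl", "ratio"))     # monitor_check_ping_pl_ratio
print(metric_name("check_host_alive", "pkt"))       # monitor_check_host_alive_pkt
```

The last call shows the case where the performance data has no unit, so the name ends with the perfname.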

## Host performance data
In Monitor, each host also has a check that verifies the state of the host. The metric name always starts with `monitor_check_host_alive`.
If this check has multiple performance values, they are reported as individual metrics, e.g.

```
monitor_check_host_alive_pkt{hostname="foo.com", environment="production", service="isalive"} 1
monitor_check_host_alive_rta{hostname="foo.com", environment="production", service="isalive"} 2.547
monitor_check_host_alive_pl_ratio{hostname="foo.com", environment="production", service="isalive"} 0.0
```

> Service label will always be `isalive`

## State
State metrics are reported for both hosts and services.
State metrics are reported with the value 0 (okay), 1 (warning), 2 (critical) or 4 (unknown).

For hosts the metric name is:

    monitor_host_state

For services the metric name is:

    monitor_service_state

## Metric labels
The monitor-exporter adds a number of labels to each metric:

- **hostname** - the `host_name` in Monitor
- **service** - the `service_description` in Monitor
- **downtime** - whether the host or service is currently in a downtime period - true/false. If the host is in downtime, its
  services are also in downtime. **Attention: downtime is only supported if monitor-exporter is running in cache mode.**
- **address** - the host's real address
- **acknowledged** - applicable if a host or service is in warning or critical state and has been acknowledged by operations -
  0/1 where 1 is acknowledged.

Optionally, the monitor-exporter can be configured to pass all or specific custom variables configured in Monitor as
Prometheus labels.

> Any host-based custom variables that are used as labels are also set for the host's services.

> Labels created from custom variables are all transformed to lowercase.

## Performance metrics name to labels
As described above, the default naming of the Prometheus metric is:

    monitor_<check_command>_<perfname>_<unit>

For some check commands this does not work well, for example the `self_check_by_snmp_disk_usage_v3` check command, where the
perfnames are the unique mount paths.
For checks where the perfname depends on what is being checked, you can change it so the perfname becomes a
label instead.
This is defined in the configuration like:

```yaml
perfnametolabel:
  # The command name
  self_check_by_snmp_disk_usage_v3:
    # the label name to be used
    label_name: disk
  check_disk_local_mb:
    label_name: local_disk
```
So if the check command is `self_check_by_snmp_disk_usage_v3`, the Prometheus metrics will have a format like:

    monitor_self_check_by_snmp_disk_usage_v3_bytes{hostname="monitor", service="Disk usage /", disk="/_used"} 48356130816.0

If we did not make this transformation, we would get the following:

    monitor_self_check_by_snmp_disk_usage_v3_slash_used_bytes{hostname="monitor", service="Disk usage /"} 48356130816.0

This is undesirable because the metric name then contains a specific name taken from the perfname, here the mount path.

> Please be aware of naming conventions for perfname and services, especially when they include a name depending on
> what is checked like a mountpoint or disk name.
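
The effect of `perfnametolabel` can be sketched in Python as follows. This is an illustration only, not the exporter's implementation: the function `build_metric` and the dictionary layout are assumptions that mirror the YAML above, and any sanitising of special characters in the perfname (such as `/` becoming `slash`) is left out.

```python
# Hypothetical in-memory form of the `perfnametolabel` configuration.
PERFNAME_TO_LABEL = {
    "self_check_by_snmp_disk_usage_v3": {"label_name": "disk"},
    "check_disk_local_mb": {"label_name": "local_disk"},
}


def build_metric(check_command, perfname, unit, labels, mapping=PERFNAME_TO_LABEL):
    """Return (metric_name, labels), moving the perfname into a label if configured."""
    labels = dict(labels)  # copy so the caller's dict is not modified
    rule = mapping.get(check_command)
    if rule:
        # The perfname becomes a label value and is left out of the metric name.
        labels[rule["label_name"]] = perfname
        parts = ["monitor", check_command]
    else:
        parts = ["monitor", check_command, perfname]
    if unit:
        parts.append(unit)
    return "_".join(parts), labels


name, labels = build_metric(
    "self_check_by_snmp_disk_usage_v3",
    "/_used",
    "bytes",
    {"hostname": "monitor", "service": "Disk usage /"},
)
print(name)    # monitor_self_check_by_snmp_disk_usage_v3_bytes
print(labels)  # {'hostname': 'monitor', 'service': 'Disk usage /', 'disk': '/_used'}
```

A check command without a mapping, such as `check_ping`, keeps the perfname in the metric name.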

# Configuration
## monitor-exporter
All configuration is made in the `config.yml` file. Please read the file for all configuration options.

> When running with gunicorn the port is defined by gunicorn

# Using Redis cache
If you have a large Monitor configuration, the load on the Monitor server can get high when host and service data is collected over the API at a high rate.
We strongly recommend that you instead collect host and service data in batches and store it in a Redis cache.
The batch collection interval is configurable, but since most service checks in Monitor run at 5-minute intervals,
collecting every minute should be more than enough.

To use caching just add this to your `config.yml`:
```
cache:
  # Use redis for cache - future may support others
  # Values below are the defaults
  redis:
    # redis host
    host: localhost
    # redis port
    port: 6379
    # the auth string used in redis
    #auth: secretstuff
    # the redis db to use
    db: 0
  # The interval to collect data from Monitor in seconds
  interval: 60
  # The time to live for the stored Monitor objects in the redis cache
  ttl: 300
```
> Redis must be installed on some host on the network and be accessible from the server running monitor-exporter
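
When choosing `interval` and `ttl`, a small sanity check can help. The Python sketch below is hypothetical and not part of monitor-exporter. It assumes that cached objects should outlive one collection cycle (`ttl` greater than `interval`) and that collecting less often than the typical 5-minute check interval wastes fresh data. Both rules are assumptions drawn from the reasoning above, not documented requirements.

```python
def check_cache_timing(interval: int = 60, ttl: int = 300, check_interval: int = 300) -> list:
    """Return warnings for questionable interval/ttl combinations (all values in seconds)."""
    warnings = []
    if ttl <= interval:
        # Assumption: objects could expire from the cache before the next batch refreshes them.
        warnings.append(f"ttl ({ttl}s) should be larger than interval ({interval}s)")
    if interval > check_interval:
        # Assumption: Monitor would produce new results more often than they are collected.
        warnings.append(f"interval ({interval}s) is longer than the check interval ({check_interval}s)")
    return warnings


print(check_cache_timing())                       # [] for the default values
print(check_cache_timing(interval=600, ttl=300))  # two warnings
```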

# Logging
The log stream is configured in the `config.yml` described above. If `logfile` is not set, the logs will go to stdout.

Logs are formatted as JSON, so it is easy to store them in log servers like Loki and Elasticsearch.

# Prometheus configuration
Prometheus can be used with static configuration or with dynamic file discovery using the project
[monitor-promdiscovery](https://bitbucket.org/opsdis/monitor-promdiscovery)

Please add the job to the `scrape_configs` in `prometheus.yml`.

> The target is the `host_name` configured in Monitor.

## Static config
```yaml

scrape_configs:
  - job_name: 'op5monitor'
    metrics_path: /metrics
    static_configs:
      - targets:
        - monitor
        - google.se
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9631

```
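
To make the effect of the three relabel rules concrete, the following Python sketch walks through them for one target. It is a simplified model for illustration, not Prometheus code: the function `relabel` is hypothetical, and the exporter address `localhost:9631` is taken from the `replacement` value above.

```python
from urllib.parse import urlencode

EXPORTER_ADDRESS = "localhost:9631"  # the `replacement` value from the config


def relabel(target: str, metrics_path: str = "/metrics"):
    """Apply the three relabel rules in order; return (scrape_url, instance_label)."""
    labels = {"__address__": target}
    # Rule 1: copy the listed target into the `target` URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the Monitor host name as the `instance` label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: send the actual HTTP request to the exporter instead of the target.
    labels["__address__"] = EXPORTER_ADDRESS
    query = urlencode({"target": labels["__param_target"]})
    url = f"http://{labels['__address__']}{metrics_path}?{query}"
    return url, labels["instance"]


url, instance = relabel("google.se")
print(url)       # http://localhost:9631/metrics?target=google.se
print(instance)  # google.se
```

The result is that Prometheus always scrapes the exporter, while the `instance` label still identifies the Monitor host.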

## File discovery config for usage with `monitor-promdiscovery`

```yaml

scrape_configs:
  - job_name: 'op5monitor'
    scrape_interval: 1m
    metrics_path: /metrics
    file_sd_configs:
      - files:
        - 'sd/monitor_sd.yml'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9631

```
# Installing
1. Clone the git repo.
2. Install dependencies

`pip install -r requirements.txt`

3. Build a distribution

`python setup.py sdist`

4. Install locally

`pip install dist/monitor-exporter-X.Y.Z.tar.gz`

# Running
## Development with Quart built in webserver

    python -m monitor_exporter -f config.yml

The `-p` switch sets the port.

## Production deployment
There are a number of ASGI containers that can be used to deploy *monitor-exporter*. The dependencies for these are not
included in the distribution.

### Deploying with gunicorn
First install the gunicorn and uvicorn dependencies into the Python environment.

    pip install gunicorn==20.1.0
    pip install uvicorn==0.14.0

Run with the default `config.yml`. The default location is the current directory.

    gunicorn --access-logfile /dev/null -w 4 -k uvicorn.workers.UvicornWorker "wsgi:create_app()"

Or set the path to the configuration file.

    gunicorn --access-logfile /dev/null -w 4 -k uvicorn.workers.UvicornWorker "wsgi:create_app('/etc/monitor-exporter/config.yml')"

> The gunicorn port defaults to 8000, but can be set with `-b`, e.g. `-b localhost:9631`

## Docker
Alt 1: Edit the `config.yml` in the repo:

    docker run -p 9631:9631 monitor-exporter

Alt 2: Keep the config in a separate location:

    docker run -v /path/to/config:/monitor-exporter/config/ -p 9631:9631 monitor-exporter

## Test the connection

Check if the exporter is working.

    curl -s http://localhost:9631/health

Get metrics for a host, where `target` is the host's `host_name` in Monitor:

    curl -s "http://localhost:9631/metrics?target=foo.com"
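
To quickly see which metrics a host exposes, the returned text can be reduced to its metric names. The helper below is a hypothetical Python sketch, not part of monitor-exporter. It assumes the response is in the Prometheus text format, and the sample input reuses the host example from earlier in this README.

```python
from typing import List


def metric_names(exposition: str) -> List[str]:
    """Return the distinct metric names in a Prometheus text response, in order of appearance."""
    names = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the label block "{" or, without labels, at the first space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name not in names:
            names.append(name)
    return names


sample = """
monitor_check_host_alive_pkt{hostname="foo.com", environment="production", service="isalive"} 1
monitor_check_host_alive_rta{hostname="foo.com", environment="production", service="isalive"} 2.547
monitor_check_host_alive_pl_ratio{hostname="foo.com", environment="production", service="isalive"} 0.0
"""
print(metric_names(sample))
```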

# System requirements
Python 3.8

For required packages, please review `requirements.txt`

# License
The monitor-exporter is licensed under GPL version 3.