sa-prometheus-exporters
=======================

[![Build Status](https://travis-ci.org/softasap/sa-prometheus-exporters.svg?branch=master)](https://travis-ci.org/softasap/sa-prometheus-exporters)

Reusable list of exporters commonly used with Prometheus.

Bundled exporters:

| Exporter | Description | Port | Grafana | Example | Github |
| --- | --- | --- | --- | --- | --- |
| node | Server params | 9100 | 1860 | http://192.168.2.66:9100/metrics | [prometheus/node_exporter](https://github.com/prometheus/node_exporter/) |
| mysqld | Prometheus MySQL exporter | 9104 | 7362 | http://192.168.2.66:9104/metrics | [prometheus/mysqld_exporter](https://github.com/prometheus/mysqld_exporter/) |
| elasticsearch | Elastic search | 9108 | 2322 | http://192.168.2.66:9108/metrics | [justwatchcom/elasticsearch_exporter](https://github.com/justwatchcom/elasticsearch_exporter/) |
| nginx | Nginx exporter | 9113 | 11199 | http://192.168.2.66:9113/metrics | [nginxinc/nginx-prometheus-exporter](https://github.com/nginxinc/nginx-prometheus-exporter) |
| blackbox | External services | 9115 | | http://192.168.2.66:9115/metrics | [prometheus/blackbox_exporter](https://github.com/prometheus/blackbox_exporter/) |
| apache | Apache webserver | 9117 | | http://192.168.2.66:9117/metrics | [Lusitaniae/apache_exporter](https://github.com/Lusitaniae/apache_exporter/) |
| redis | Redis exporter | 9121 | 763 | http://192.168.2.66:9121/metrics | [oliver006/redis_exporter](https://github.com/oliver006/redis_exporter/) |
| proxysql | ProxySQL exporter | 42004 | | http://192.168.2.66:42004/metrics | [percona/proxysql_exporter](https://github.com/percona/proxysql_exporter) |
| memcached | memcached health | 9150 | 37 | http://192.168.2.66:9150/metrics | [prometheus/memcached_exporter](https://github.com/prometheus/memcached_exporter/) |
| postgres | postgres exporter | 9187 | | http://192.168.2.66:9187/metrics | [wrouesnel/postgres_exporter](https://github.com/wrouesnel/postgres_exporter/) |
| mongodb | Percona's mongodb exporter | 9216 | | http://192.168.2.66:9216/metrics | [percona/mongodb_exporter](https://github.com/percona/mongodb_exporter/) |
| ssl | ssl exporter | 9219 | | http://192.168.2.66:9219/metrics | [ribbybibby/ssl_exporter](https://github.com/ribbybibby/ssl_exporter) |
| ecs | aws ecs exporter | 9222 | | http://192.168.2.66:9222/metrics | [slok/ecs-exporter](https://github.com/slok/ecs-exporter) |
| sql | custom sql exporter | 9237 | | http://192.168.2.66:9237/metrics | [justwatchcom/sql_exporter](https://github.com/justwatchcom/sql_exporter) |
| phpfpm | php fpm exporter via sock | 9253 | 5579 | http://192.168.2.66:9253/metrics | [Lusitaniae/phpfpm_exporter](https://github.com/Lusitaniae/phpfpm_exporter) |
| cadvisor | Google's cadvisor exporter | 9280 (configurable) | | http://192.168.2.66:9280/metrics | [google/cadvisor](https://github.com/google/cadvisor/) |
| monit | Monit exporter | 9388 (configurable) | | http://192.168.2.66:9388/metrics | [commercetools/monit_exporter](https://github.com/commercetools/monit_exporter) |
| cloudwatch | Cloudwatch exporter | 9400 (configurable) | | http://192.168.2.66:9400/metrics | [ivx/yet-another-cloudwatch-exporter](https://github.com/ivx/yet-another-cloudwatch-exporter) |

Example of usage:

Simple

```YAML
- {
    role: "sa-prometheus-exporters"
  }
```
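
To pull the role into your setup, a typical workflow might look like the sketch below (it assumes the role is published on Ansible Galaxy under the `softasap` namespace, like other softasap roles; adjust the name if you vendor it differently):

```shell
# Install the role from Ansible Galaxy (role name is an assumption)
ansible-galaxy install softasap.sa-prometheus-exporters

# Run the playbook that references the role
ansible-playbook -i inventory site.yml
```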

Each exporter is configured via OPTIONS specified in an environment file located at `/etc/prometheus/exporters/exportername_exporter`.
Additionally, you can influence the systemd service file itself; see the advanced section below.

```shell
OPTIONS=" --some-exporter-params --some-exporter-params --some-exporter-params --some-exporter-params"

```
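
For example, for the node exporter this environment file could look like the sketch below; `--web.listen-address` and `--collector.systemd` are standard node_exporter flags, shown purely for illustration:

```shell
# /etc/prometheus/exporters/node_exporter
OPTIONS="--web.listen-address=:9100 --collector.systemd"
```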

Advanced

```YAML
box_prometheus_exporters:
  - {
      name: some_exporter,
      exporter_configuration_template: "/path/to/template/with/console/params/for/exporter",
      exporter_user: prometheus,
      exporter_group: prometheus,
      startup: {
        env: [
          { ENV_VAR1: "ENV_VAR1 value" },
          { ENV_VAR2: "ENV_VAR2 value" }
        ],
        execstop: "/some/script/to/execute/on/stop",
        pidfile: "/if/you/need/pidfile",
        extralines: [
          "IF YOU WANT",
          "SOME REALLY CUSTOM STUFF",
          "in Service section of the systemd file",
          "(see templates folder)"
        ]
      }
    }
  - { name: apache }
  - { name: blackbox }
  - { name: cadvisor }
  - {
      name: ecs,
      parameters: "--aws.region='us-east-1'"
    }
  - { name: elasticsearch }
  - { name: memcached }
  - { name: mongodb }
  - {
      name: monit,
      monit_user: "admin",
      monit_password: "see_box_example_for_example_)",
      listen_address: "0.0.0.0:9388"
    }
  - {
      name: mysqld,
      parameters: "-config.my-cnf=/etc/prometheus/.mycnf -collect.binlog_size=true -collect.info_schema.processlist=true"
    }
  - { name: node }
  - {
      name: nginx,
      parameters: "-nginx.scrape-uri http://127.0.0.1:8080/stub_status"
    }
  - { name: proxysql }
  - {
      name: phpfpm,
      exporter_user: "www-data",
      parameters: "--phpfpm.socket-paths /var/run/php/php7.0-fpm.sock"
    }
  - { name: postgres }
  - { name: redis }
  - { name: sql }
  - { name: ssl }

roles:
  - {
      role: "sa-prometheus-exporters",
      prometheus_exporters: "{{ box_prometheus_exporters }}"
    }
```

More exporters to consider: https://prometheus.io/docs/instrumenting/exporters/

If you want more exporters to be included in the role defaults, open an issue or, preferably, a PR.

mysqld exporter configuration
-----------------------------

Best served with the Grafana dashboards from Percona: https://github.com/percona/grafana-dashboards

For those you need to add the parameters
`-collect.binlog_size=true -collect.info_schema.processlist=true`

and additionally create a MySQL user for the exporter:

```sql
CREATE USER 'prometheus_exporter'@'localhost' IDENTIFIED BY 'XXX' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'prometheus_exporter'@'localhost';
```

Then either set the environment variable in the startup script (see the configuration example in the advanced section):
```shell
export DATA_SOURCE_NAME='prometheus_exporter:XXX@(localhost:3306)/'
```

or do it via the exporter environment file `/etc/prometheus/exporters/mysqld_exporter`:
```
OPTIONS=""
DATA_SOURCE_NAME='prometheus_exporter:devroot@(localhost:3306)/'
```

or

create a my.cnf-style credentials file (e.g. `/etc/prometheus/.mycnf`) and point the exporter at it with
`-config.my-cnf=/etc/prometheus/.mycnf -collect.binlog_size=true -collect.info_schema.processlist=true`:

```ini
[client]
user=prometheus_exporter
password=XXXXXX
```
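
With the exporter configured, a minimal Prometheus scrape job for it might look like the sketch below (static targets shown for brevity; the ec2_sd_configs examples later in this README work just as well, and 9104 is the port from the table above):

```
- job_name: "mysqld"
  static_configs:
    - targets: ["192.168.2.66:9104"]
```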

For presentation, consider Grafana dashboards like

https://github.com/percona/grafana-dashboards

For troubleshooting, a must-have Grafana dashboard is https://grafana.com/dashboards/456

apache exporter configuration
-----------------------------

The apache exporter reports Apache webserver metrics on port 9117 (e.g. http://192.168.2.66:9117/metrics); see [Lusitaniae/apache_exporter](https://github.com/Lusitaniae/apache_exporter/).

prometheus.yml
```
- job_name: "apache"
ec2_sd_configs:
- region: us-east-1
port: 9117
relabel_configs:
# Only monitor instances with a non-no Prometheus tag
- source_labels: [__meta_ec2_tag_Prometheus]
regex: no
action: drop
- source_labels: [__meta_ec2_instance_state]
regex: stopped
action: drop
# Use the instance ID as the instance label
- source_labels: [__meta_ec2_instance_id]
target_label: instance
- source_labels: [__meta_ec2_tag_Name]
target_label: name
- source_labels: [__meta_ec2_tag_Environment]
target_label: env
- source_labels: [__meta_ec2_tag_Role]
target_label: role
- source_labels: [__meta_ec2_public_ip]
regex: (.*)
replacement: ${1}:9117
action: replace
target_label: __address__
```
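
Note that apache_exporter scrapes Apache's mod_status page, so the status handler must be enabled and reachable from the exporter. A minimal Apache 2.4 snippet (a sketch; tighten access control to your needs):

```
# Expose mod_status to local scrapes only
<Location "/server-status">
    SetHandler server-status
    Require local
</Location>
```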

A Grafana dashboard compatible with this exporter is, for example, https://grafana.com/dashboards/3894

blackbox exporter configuration
-------------------------------

A blackbox exporter running on a monitoring host can, for example, be used to probe external services.

Example: monitoring ssh connectivity for external boxes

prometheus.yml
```
- job_name: 'blackbox_ssh'
  metrics_path: /probe
  params:
    module: [ssh_banner]
  ec2_sd_configs:
    - region: us-east-1
      port: 9100
  relabel_configs:
    # Only monitor instances with a non-no Prometheus tag
    - source_labels: [__meta_ec2_tag_Prometheus]
      regex: no
      action: drop
    - source_labels: [__meta_ec2_instance_state]
      regex: stopped
      action: drop
    # Use the instance ID as the instance label
    - source_labels: [__meta_ec2_instance_id]
      target_label: instance
    - source_labels: [__meta_ec2_tag_Name]
      target_label: name
    - source_labels: [__meta_ec2_tag_Environment]
      target_label: env
    - source_labels: [__meta_ec2_tag_Role]
      target_label: role
    - source_labels: [__meta_ec2_private_ip]
      regex: (.*)(:.*)?
      replacement: ${1}:22
      target_label: __param_target
    # Actually talk to the blackbox exporter though
    - target_label: __address__
      replacement: 127.0.0.1:9115
```

As a result, you can configure Prometheus alerting rules so that Alertmanager notifies you of SSH connectivity issues:

/etc/prometheus/rules/ssh_alerts.yml
```
groups:
- name: ssh_alerts.rules
  rules:
  - alert: SSHDown
    expr: probe_success{job="blackbox_ssh"} == 0
    for: 10m
    annotations:
      description: '[{{ $labels.env }}]{{ $labels.instance }} {{ $labels.name }} of job {{ $labels.job }} is not responding on ssh.'
      summary: 'Instance {{ $labels.instance }} has possible ssh issue'
```

memcached exporter configuration
--------------------------------

The memcached exporter reports memcached health on port 9150 (e.g. http://192.168.2.66:9150/metrics); see [prometheus/memcached_exporter](https://github.com/prometheus/memcached_exporter/).

prometheus.yml
```
- job_name: "memcached"
ec2_sd_configs:
- region: us-east-1
port: 9150
relabel_configs:
# Only monitor instances with a non-no Prometheus tag
- source_labels: [__meta_ec2_tag_Prometheus]
regex: no
action: drop
- source_labels: [__meta_ec2_instance_state]
regex: stopped
action: drop
# Use the instance ID as the instance label
- source_labels: [__meta_ec2_instance_id]
target_label: instance
- source_labels: [__meta_ec2_tag_Name]
target_label: name
- source_labels: [__meta_ec2_tag_Environment]
target_label: env
- source_labels: [__meta_ec2_tag_Role]
target_label: role
- source_labels: [__meta_ec2_public_ip]
regex: (.*)
replacement: ${1}:9150
action: replace
target_label: __address__
```

Compatible grafana dashboard
https://grafana.com/dashboards/37

node exporter configuration
---------------------------

The node exporter reports server metrics on port 9100 (e.g. http://192.168.2.66:9100/metrics); see [prometheus/node_exporter](https://github.com/prometheus/node_exporter/).

Auto-discovery in the AWS cloud, with filtering:

prometheus.yml
```
- job_name: "node"
ec2_sd_configs:
- region: us-east-1
port: 9100
relabel_configs:
# Only monitor instances with a non-no Prometheus tag
- source_labels: [__meta_ec2_tag_Prometheus]
regex: no
action: drop
- source_labels: [__meta_ec2_instance_state]
regex: stopped
action: drop
# Use the instance ID as the instance label
- source_labels: [__meta_ec2_instance_id]
target_label: instance
- source_labels: [__meta_ec2_tag_Name]
target_label: name
- source_labels: [__meta_ec2_tag_Environment]
target_label: env
- source_labels: [__meta_ec2_tag_Role]
target_label: role
- source_labels: [__meta_ec2_public_ip]
regex: (.*)
replacement: ${1}:9100
action: replace
target_label: __address__
```

A reasonable set of related Prometheus alerts to consider:

/etc/prometheus/rules/node_alerts.yml
```
groups:
- name: node_alerts.rules
  rules:
  - alert: LowDiskSpace
    expr: node_filesystem_avail{fstype=~"(ext.|xfs)",job="node"} / node_filesystem_size{fstype=~"(ext.|xfs)",job="node"} * 100 <= 10
    for: 15m
    labels:
      severity: warn
    annotations:
      title: 'Less than 10% disk space left'
      description: |
        Consider sshing into the instance and removing old logs, cleaning
        temp files, or removing old apt packages with `apt-get autoremove`
      runbook: troubleshooting/filesystem_alerts.md
      value: '{{ $value | humanize }}%'
      device: '{{ $labels.device }}'
      mount_point: '{{ $labels.mountpoint }}'

  - alert: NoDiskSpace
    expr: node_filesystem_avail{fstype=~"(ext.|xfs)",job="node"} / node_filesystem_size{fstype=~"(ext.|xfs)",job="node"} * 100 <= 1
    for: 15m
    labels:
      pager: pagerduty
      severity: critical
    annotations:
      title: '1% disk space left'
      description: "There's only 1% disk space left"
      runbook: troubleshooting/filesystem_alerts.md
      value: '{{ $value | humanize }}%'
      device: '{{ $labels.device }}'
      mount_point: '{{ $labels.mountpoint }}'

  - alert: HighInodeUsage
    expr: node_filesystem_files_free{fstype=~"(ext.|xfs)",job="node"} / node_filesystem_files{fstype=~"(ext.|xfs)",job="node"} * 100 <= 20
    for: 15m
    labels:
      severity: critical
    annotations:
      title: 'High inode usage'
      description: |
        Consider sshing into the instance and removing files or cleaning
        temp files
      runbook: troubleshooting/filesystem_alerts_inodes.md
      value: '{{ $value | printf "%.2f" }}%'
      device: '{{ $labels.device }}'
      mount_point: '{{ $labels.mountpoint }}'

  - alert: ExtremelyHighCPU
    expr: instance:node_cpu_in_use:ratio > 0.95
    for: 2h
    labels:
      pager: pagerduty
      severity: critical
    annotations:
      title: >-
        CPU use percent is extremely high on
        {{ if $labels.fqdn }}{{ $labels.fqdn }}{{ else }}{{ $labels.instance }}{{ end }}
        for the past 2 hours.
      description: >-
        CPU use percent is extremely high on
        {{ if $labels.fqdn }}{{ $labels.fqdn }}{{ else }}{{ $labels.instance }}{{ end }}
        for the past 2 hours.
      runbook: troubleshooting

  - alert: HighCPU
    expr: instance:node_cpu_in_use:ratio > 0.8
    for: 2h
    labels:
      severity: critical
    annotations:
      title: >-
        CPU use percent is high on
        {{ if $labels.fqdn }}{{ $labels.fqdn }}{{ else }}{{ $labels.instance }}{{ end }}
        for the past 2 hours.
      description: >-
        CPU use percent is high on
        {{ if $labels.fqdn }}{{ $labels.fqdn }}{{ else }}{{ $labels.instance }}{{ end }}
        for the past 2 hours.
      runbook: troubleshooting
```
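
The two CPU alerts above reference `instance:node_cpu_in_use:ratio`, a recording rule that is not defined by this role. A minimal sketch of such a rule, assuming the pre-1.0 node_exporter metric name `node_cpu` consistent with the metric names used above:

```
groups:
- name: node_rules
  rules:
  # Fraction of non-idle CPU time, averaged across cores per instance
  - record: instance:node_cpu_in_use:ratio
    expr: 1 - avg by (instance) (rate(node_cpu{mode="idle"}[5m]))
```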

A Grafana dashboard might be https://grafana.com/dashboards/22 (`label_values(node_exporter_build_info, instance)`)

Consider: https://grafana.com/dashboards/6014 or https://grafana.com/dashboards/718

nginx exporter configuration
----------------------------

Additional parameters that might be passed to nginx-prometheus-exporter:

- `-nginx.plus`: Start the exporter for NGINX Plus. By default, the exporter is started for NGINX. The default value can be overwritten by the NGINX_PLUS environment variable.
- `-nginx.retries int`: The number of retries the exporter will make on start to connect to the NGINX stub_status page / NGINX Plus API before exiting with an error. The default value can be overwritten by the NGINX_RETRIES environment variable.
- `-nginx.retry-interval duration`: The interval between retries to connect to the NGINX stub_status page / NGINX Plus API on start. The default value can be overwritten by the NGINX_RETRY_INTERVAL environment variable. (default 5s)
- `-nginx.scrape-uri string`: The URI for scraping NGINX or NGINX Plus metrics. For NGINX, the stub_status page must be available through the URI; for NGINX Plus, the API. The default value can be overwritten by the SCRAPE_URI environment variable. (default "http://127.0.0.1:8080/stub_status")
- `-nginx.ssl-verify`: Perform SSL certificate verification. The default value can be overwritten by the SSL_VERIFY environment variable. (default true)
- `-nginx.timeout duration`: The timeout for scraping metrics from NGINX or NGINX Plus. The default value can be overwritten by the TIMEOUT environment variable. (default 5s)
- `-web.listen-address string`: The address to listen on for the web interface and telemetry. The default value can be overwritten by the LISTEN_ADDRESS environment variable. (default ":9113")
- `-web.telemetry-path string`: The path under which to expose metrics. The default value can be overwritten by the TELEMETRY_PATH environment variable. (default "/metrics")
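
Following the OPTIONS convention described earlier, these flags belong in the exporter's environment file; a sketch using this role's `/etc/prometheus/exporters/` layout:

```shell
# /etc/prometheus/exporters/nginx_exporter
OPTIONS="-nginx.scrape-uri http://127.0.0.1:8080/stub_status"
```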

Exported metrics

Common metrics:

- `nginxexporter_build_info` -- shows the exporter build information.

For NGINX, the following metrics are exported:

- All stub_status metrics.
- `nginx_up` -- shows the status of the last metric scrape: 1 for a successful scrape and 0 for a failed one.

For NGINX Plus, the following metrics are exported:

- Connections.
- HTTP.
- SSL.
- HTTP Server Zones.
- Stream Server Zones.
- HTTP Upstreams. Note: for the state metric, the string values are converted to float64 using the following rule: "up" -> 1.0, "draining" -> 2.0, "down" -> 3.0, "unavail" -> 4.0, "checking" -> 5.0, "unhealthy" -> 6.0.
- Stream Upstreams. Note: for the state metric, the string values are converted to float64 using the following rule: "up" -> 1.0, "down" -> 3.0, "unavail" -> 4.0, "checking" -> 5.0, "unhealthy" -> 6.0.
- Stream Zone Sync.
- Location Zones.
- Resolver.
- `nginxplus_up` -- shows the status of the last metric scrape: 1 for a successful scrape and 0 for a failed one.

Connect to the /metrics page of the running exporter to see the complete list of metrics along with their descriptions. Note: to see server-zone related metrics you must configure status zones, and to see upstream related metrics you must configure upstreams with a shared memory zone.

You will also need to activate the stub_status endpoint in your nginx configuration:

```
location /stub_status {
    stub_status;
    allow 127.0.0.1;  # only allow requests from localhost
    deny all;         # deny all other hosts
}
```
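
A quick way to verify both ends from the node itself (assuming the default stub_status and exporter ports):

```shell
# nginx should answer with stub_status counters
curl -s http://127.0.0.1:8080/stub_status

# the exporter should translate them into Prometheus metrics
curl -s http://127.0.0.1:9113/metrics | grep nginx_up
```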

phpfpm exporter configuration
-----------------------------

The phpfpm exporter reports php-fpm metrics via the socket on port 9253 (e.g. http://192.168.2.66:9253/metrics); see [Lusitaniae/phpfpm_exporter](https://github.com/Lusitaniae/phpfpm_exporter).

Pay attention to the configuration.

First of all, the exporter should run under a user that can read the php-fpm socket (for example `www-data` if you used the sa-php-fpm role).
Second, you should provide the path to the socket, like `--phpfpm.socket-paths /var/run/php/php7.0-fpm.sock`:

```yaml
exporter_user: "www-data",
parameters: "--phpfpm.socket-paths /var/run/php/php7.0-fpm.sock"
```

Third, the FPM status page must be enabled in every pool you'd like to monitor by defining `pm.status_path = /status`, as shown in the snippet below.
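
A minimal pool snippet (the pool file path is an assumption; adjust for your PHP version and distribution):

```ini
; e.g. /etc/php/7.0/fpm/pool.d/www.conf
[www]
pm.status_path = /status
```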

Auto-discovery in the AWS cloud, with filtering:

prometheus.yml
```
- job_name: "phpfpm"
ec2_sd_configs:
- region: us-east-1
port: 9253
relabel_configs:
# Only monitor instances with a non-no Prometheus tag
- source_labels: [__meta_ec2_tag_Prometheus]
regex: no
action: drop
- source_labels: [__meta_ec2_instance_state]
regex: stopped
action: drop
# Use the instance ID as the instance label
- source_labels: [__meta_ec2_instance_id]
target_label: instance
- source_labels: [__meta_ec2_tag_Name]
target_label: name
- source_labels: [__meta_ec2_tag_Environment]
target_label: env
- source_labels: [__meta_ec2_tag_Role]
target_label: role
- source_labels: [__meta_ec2_public_ip]
regex: (.*)
replacement: ${1}:9253
action: replace
target_label: __address__
```

Linked Grafana dashboards:

https://grafana.com/dashboards/5579 (single pool)

https://grafana.com/dashboards/5714 (multiple pools)

postgres exporter configuration
--------------------------------

The recommended approach is to use a dedicated user for stats collection.

Create the `postgres_exporter` user:

```sql
CREATE USER postgres_exporter PASSWORD 'XXXX';
ALTER USER postgres_exporter SET SEARCH_PATH TO postgres_exporter,pg_catalog;
```

As you are running as a non-postgres user, you will need to create views
to read the statistics information:

```sql

-- If deploying as non-superuser (for example in AWS RDS)
-- GRANT postgres_exporter TO :MASTER_USER;
CREATE SCHEMA postgres_exporter AUTHORIZATION postgres_exporter;

CREATE VIEW postgres_exporter.pg_stat_activity
AS
SELECT * from pg_catalog.pg_stat_activity;

GRANT SELECT ON postgres_exporter.pg_stat_activity TO postgres_exporter;

CREATE VIEW postgres_exporter.pg_stat_replication AS
SELECT * from pg_catalog.pg_stat_replication;

GRANT SELECT ON postgres_exporter.pg_stat_replication TO postgres_exporter;
```
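
To sanity-check the new user before wiring it into the exporter, you can connect with a libpq connection URI (XXXX is the placeholder password from above):

```shell
psql "postgresql://postgres_exporter:XXXX@localhost:5432/postgres" -c 'SELECT 1;'
```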

In `/etc/prometheus/exporters/postgres_exporter`:

```
OPTIONS="some parameters"
DATA_SOURCE_NAME="postgresql://login:password@hostname:port/postgres?sslmode=disable"
```

Assuming you provided the above, the exporter is ready to operate.

For presentation, consider Grafana dashboards like

https://grafana.com/dashboards/3300

https://grafana.com/dashboards/3742

Blackbox exporter notes
-----------------------

Startup options should look like the following:
```
OPTIONS="--config.file=/opt/prometheus/exporters/blackbox_exporter/blackbox.yml"
```

where blackbox.yml is

```yaml
modules:
  http_2xx:
    prober: http
    http:
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
```
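
To test a module by hand against a running blackbox exporter, hit its standard /probe endpoint (the target here is illustrative):

```shell
curl -s 'http://127.0.0.1:9115/probe?module=ssh_banner&target=example.com:22'
```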

cloudwatch exporter configuration
---------------------------------

You will need to grant appropriate rights to your monitoring instance:

```tf

resource "aws_iam_role" "monitoring_iam_role" {
name = "monitoring_iam_role"
assume_role_policy = <