Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/williamchanrico/planet-exporter

:rocket: Determine server network dependencies along with required bandwidth
https://github.com/williamchanrico/planet-exporter

darkstat dependency-graph ebpf-exporter network-monitoring planet-exporter prometheus prometheus-exporter

Last synced: 2 months ago
JSON representation

:rocket: Determine server network dependencies along with required bandwidth

Awesome Lists containing this project

README

        

Planet Exporter





Know your network dependencies!


A simple Prometheus exporter that provides understanding of your machines network dependencies.





Goreleaser Build



Release



Go Version



CodeQL Analysis


Apache-2.0 Licence

## Table of Content

* [Table of Content](#table-of-content)
* [Introduction](#introduction)
* [Installation](#installation)
* [Configuration](#configuration)
* [Project Structure](#project-structure)
* [Collector Tasks](#collector-tasks)
+ [Inventory](#inventory)
+ [Socketstat](#socketstat)
+ [Darkstat](#darkstat)
+ [EBPF Exporter](#ebpf-exporter)
* [Exporter Cost](#exporter-cost)
- [Tools](#tools)
* [Planet Federator](#planet-federator)
+ [Example InfluxQL](#example-influxql)
* [Planet Federator InfluxDB to BQ](#planet-federator-influxdb-to-bigquery)
- [Ansible Role](#ansible-role)
- [Go Version](#go-version)
- [Contributing](#contributing)
- [License](#license)

## Introduction

Simple discovery space-ship for your ~~infrastructure~~ planetary ecosystem across the universe.

The primary goal here is to determine every servers' network dependencies (upstream/downstream) along with their bandwidth usage.

Measure an environment's potential to maintain ~~services~~ life.

## Installation

Grab a pre-built binary for your OS from the [Releases](https://github.com/williamchanrico/planet-exporter/releases/latest) page.

## Configuration

There are no required flags. It is configured with usable defaults where only `--task-socketstat-enabled` is on.

```
Usage of planet-exporter:
-listen-address string
Address to which exporter will bind its HTTP interface (default "0.0.0.0:19100")
-log-disable-colors
Disable colors on logger
-log-disable-timestamp
Disable timestamp on logger
-log-level string
Log level (default "info")
-task-darkstat-addr string
Darkstat target address
-task-darkstat-enabled
Enable darkstat collector task
-task-ebpf-addr string
Ebpf target address (default "http://localhost:9435/metrics")
-task-ebpf-enabled
Enable Ebpf collector task
-task-interval string
Interval between collection of expensive data into memory (default "7s")
-task-inventory-addr string
HTTP endpoint that returns the inventory data
-task-inventory-enabled
Enable inventory collector task
-task-inventory-format string
Inventory format to parse the returned inventory data (default "arrayjson")
-task-socketstat-enabled
Enable socketstat collector task (default true)
-version
Show version and exit
```

Running **without any flags** (it enables only the socketstat collector task)

```
# planet-exporter
```

Running **with inventory and darkstat** (darkstat has to be installed separately rev >= [e7e6652](https://www.unix4lyfe.org/gitweb/darkstat/commit/e7e6652113099e33930ab0f39630bf280e38f769)). See example darkstat [init.cfg](setup/darkstat/init.cfg).

```sh
planet-exporter \
-task-inventory-enabled \
-task-inventory-addr http://link-to-your.net/inventory_hosts.json \
-task-darkstat-enabled \
-task-darkstat-addr http://localhost:51666/metrics
```

Running **with another inventory format**

```sh
planet-exporter \
-task-inventory-enabled \
-task-inventory-format "ndjson" \
-task-inventory-addr http://link-to-your.net/inventory_hosts.json
```

Running **with ebpf_exporter and another inventory format**

* Follow instructions on https://github.com/cloudflare/ebpf_exporter to start ebpf_exporter with example [tcptop.yaml](setup/ebpf-exporter/tcptop.yaml) configuration.

```sh
planet-exporter \
-task-ebpf-enabled \
-task-inventory-enabled \
-task-inventory-format "ndjson" \
-task-inventory-addr http://link-to-your.net/inventory_hosts.json
```

## Project Structure

![project-structure](project-structure.png)

* The `collector` implements prometheus.Collector interface and is the one behind promhttp.Handler. It leverages `task/*` packages for expensive metrics (also as cache) instead of preparing them on every prometheus.Collect.
* The `task/*` packages are the crew that are doing the expensive tasks behind the scene. They store the data for the `collector` package.

## Collector Tasks

This is the heart of Planet Exporter that's doing the heavy-lifting. Integrations with other dependencies happen here.

### Inventory

Query inventory data that will be used to map `ip_address` into `hostgroup` (an identifier based on Ansible convention) and `domain`.
The `ip_address` may use CIDR notation (e.g. "10.1.0.0/16") and Inventory task will use the longest-prefix match.

Without this task enabled, those hostgroup and domain fields will be empty.

Related flags:

* `--task-inventory-enabled=true` to enable the task.
* `--task-inventory-addr` accepts an HTTP endpoint that returns inventory data in the supported format.
* `--task-inventory-format` to choose the supported format for the inventory data.

Inventory formats:

1. --task-inventory-format=arrayjson

```json
[
{
"ip_address": "10.1.2.3",
"domain": "xyz.service.consul",
"hostgroup": "xyz"
},
{
"ip_address": "10.2.3.4",
"domain": "debugapp.service.consul",
"hostgroup": "debugapp"
},
{
"ip_address": "10.3.0.0/16",
"domain": "",
"hostgroup": "unknown-but-its-network-xyz"
}
]
```

2. --task-inventory-format=ndjson

```json
{"ip_address":"10.0.1.2","domain":"xyz.service.consul","hostgroup":"xyz"}
{"ip_address":"172.16.1.2","domain":"abc.service.consul","hostgroup":"abc"}
{"ip_address":"10.3.0.0/16","domain":"","hostgroup":"unknown-but-its-network-xyz"}
```

### Socketstat

Query local connections socket similar to `ss` or `netstat` to build upstream and downstream dependency metrics.

```
# HELP planet_upstream Upstream dependency of this machine
# TYPE planet_upstream gauge
planet_upstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="80",process_name="debugapp",protocol="tcp",remote_address="xyz.service.consul",remote_hostgroup="xyz"} 1
planet_upstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="8500",process_name="consul-template",protocol="tcp",remote_address="127.0.0.1",remote_hostgroup="localhost"} 1
planet_upstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="8300",process_name="consul",protocol="tcp",remote_address="10.2.3.3",remote_hostgroup="consul-server"} 1
planet_upstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="8300",process_name="consul",protocol="tcp",remote_address="10.2.3.4",remote_hostgroup="consul-server"} 1
planet_upstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="3128",process_name="",protocol="tcp",remote_address="100.100.98.18",remote_hostgroup=""} 1
planet_upstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="443",process_name="",protocol="tcp",remote_address="35.158.25.125",remote_hostgroup=""} 1
planet_upstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="443",process_name="",protocol="tcp",remote_address="52.219.32.222",remote_hostgroup=""} 1
planet_upstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="80",process_name="cloudmetrics",protocol="tcp",remote_address="100.100.103.57",remote_hostgroup=""} 1
planet_upstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="80",process_name="cloudmetrics",protocol="tcp",remote_address="100.100.30.26",remote_hostgroup=""} 1
# HELP planet_downstream Downstream dependency of this machine

# TYPE planet_downstream gauge
planet_downstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="9100",process_name="node_exporter",protocol="tcp",remote_address="prometheus.service.consul",remote_hostgroup="prometheus"} 1
planet_downstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="19100",process_name="planet-exporter",protocol="tcp",remote_address="prometheus.service.consul",remote_hostgroup="prometheus"} 1
planet_downstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="19100",process_name="planet-exporter",protocol="tcp",remote_address="192.168.1.2",remote_hostgroup=""} 1
planet_downstream{local_address="debugapp.service.consul",local_hostgroup="debugapp",port="22",process_name="sshd",protocol="tcp",remote_address="192.168.1.2",remote_hostgroup=""} 1

# HELP planet_server_process Server process that are listening on network interfaces
# TYPE planet_server_process gauge
planet_server_process{bind="0.0.0.0:111",port="111",process_name="rpcbind"} 1
planet_server_process{bind="0.0.0.0:19100",port="19100",process_name="planet-exporter"} 1
planet_server_process{bind="0.0.0.0:22",port="22",process_name="sshd"} 1
planet_server_process{bind="0.0.0.0:25",port="25",process_name="master"} 1
planet_server_process{bind="0.0.0.0:5666",port="5666",process_name="nrpe"} 1
planet_server_process{bind="0.0.0.0:80",port="80",process_name="nginx"} 1
planet_server_process{bind="127.0.0.1:53",port="53",process_name="consul"} 1
planet_server_process{bind="127.0.0.1:8500",port="8500",process_name="consul"} 1
planet_server_process{bind="0.0.0.0:51666",port="51666",process_name="darkstat"} 1
planet_server_process{bind=":::111",port="111",process_name="rpcbind"} 1
planet_server_process{bind=":::25",port="25",process_name="master"} 1
planet_server_process{bind=":::50051",port="50051",process_name="socketmaster"} 1
planet_server_process{bind=":::5666",port="5666",process_name="nrpe"} 1
planet_server_process{bind=":::8301",port="8301",process_name="consul"} 1
planet_server_process{bind=":::9000",port="9000",process_name="socketmaster"} 1
planet_server_process{bind=":::9100",port="9100",process_name="node_exporter"} 1
planet_server_process{bind=":::9256",port="9256",process_name="process_exporte"} 1
```

Related flags:

* `--task-socketstat-enabled=true` to enable the task.

### Darkstat

[Darkstat](https://unix4lyfe.org/darkstat/) captures network traffic, calculates statistics about usage, and serves reports over HTTP.

Though there's no port detection from darkstat to determine remote/local port for each traffic direction, the bandwidth information can still be useful.

NOTE: this means we'll have to install darkstat along with planet-exporter.

Example parsed metrics from darkstat when enabled (plus inventory task for `remote_domain` and `remote_hostgroup`):

```
# HELP planet_traffic_bytes_total Total network traffic with peers
# TYPE planet_traffic_bytes_total gauge
planet_traffic_bytes_total{direction="egress",remote_domain="xyz.service.consul",remote_hostgroup="xyz",remote_ip="10.1.2.3"} 2005
planet_traffic_bytes_total{direction="egress",remote_domain="debugapp.service.consul",remote_hostgroup="debugapp",remote_ip="10.2.3.4"} 150474
planet_traffic_bytes_total{direction="ingress",remote_domain="xyz.service.consul",remote_hostgroup="xyz",remote_ip="10.1.2.3"} 2525
planet_traffic_bytes_total{direction="ingress",remote_domain="debugapp.service.consul",remote_hostgroup="debugapp",remote_ip="10.2.3.4"} 1.26014316e+08
```

Related flags:

* `--task-darkstat-enabled=true` to enable the task.
* `--task-darkstat-addr` accepts an HTTP endpoint that returns darkstat metrics.

### EBPF Exporter

Planet exporter can be used along with [ebpf-exporter](https://github.com/cloudflare/ebpf_exporter) to extract packet flow information directly from kernel. PE currently supports reading prometheus data with [tcptop.yaml](setup/ebpf-exporter/tcptop.yaml) ebpf configuration. Checkout [ebpf-exporter](https://github.com/cloudflare/ebpf_exporter) instructions to run it with [tcptop.yaml](setup/ebpf-exporter/tcptop.yaml).

Related flags:

* `--task-ebpf-enabled=true` to enable the task.
* `--task-ebpf-addr` accepts an HTTP endpoint that returns ebpf_exporter metrics (see [tcptop.yaml](setup/ebpf-exporter/tcptop.yaml) for the expected metrics values and format)

# Exporter Cost

Planet exporter will consume CPU and Memory in proportion to the number
of opened network file descriptors (opened sockets).

# Tools

## Planet Federator

Dashboard queries on Planet Exporter raw data in Prometheus can get expensive very fast.
A tested 1-hour range query for a crowded machine with ~300 upstreams/downstreams took about `9s`.

To improve query efficiency, Planet Federator aggregates Prometheus metrics collected from all the Planet Exporters. The aggregation is based on all except `ip_address` metrics label, therefore individual `ip_address` granularity is lost.

Planet Exporter runs a cron that queries Planet Exporter's traffic bandwidth data from Prometheus, process, and
store them in a time-series database for clean and more efficient queries.

Last tested query duration, before and after Planet Federator was `2.678s` vs `330ms`.

TSDB supports:
- [x] InfluxDB
- [ ] Prometheus
- [ ] BigQuery

### Example InfluxQL

These queries should be enough to build a useful dashboard based on Planet Exporter and Planet Federator processed metrics.

```sql
-- Example InfluxQL: Produces time series data showing traffic bandwidth for service = $service
SELECT
SUM("bandwidth_bps")
FROM
"ingress"
WHERE
("service" = '$service') AND $timeFilter
GROUP BY
time($__interval), "service", "remote_service", "remote_address"

-- Example InfluxQL: Produces tabular format listing upstreams for service = $service
SELECT
SUM("service_dependency")
FROM (
SELECT * FROM "upstream" WHERE ("service" = '$service') AND Time > now() - 7d
)
GROUP BY
"upstream_service", "upstream_address", "process_name", "upstream_port", "protocol", time(10000d)

-- Example InfluxQL: Produces tabular format listing downstreams for service = $service
SELECT
SUM("service_dependency")
FROM (
SELECT * FROM "downstream" WHERE ("service" = '$service') AND Time > now() - 7d
)
GROUP BY
"downstream_service", "downstream_address", "process_name", "port", "protocol", time(10000d)
```

```sh
$ planet-federator \
-prometheus-addr "http://127.0.0.1:9090" \
-influxdb-addr "http://127.0.0.1:8086" \
-influxdb-bucket "mothership" # Works as database name if you're using InfluxDB v1.8 and earlier
```

## Planet Federator InfluxDB to BigQuery

This tool helps query and aggregate the Planet Federator data further into 2 categories: (1) Traffic Bandwidth data & (2) Dependency list data, for every services, stored in BigQuery tables.

Read more in [cmd/planet-federator-influxdb-to-bq](./cmd/planet-federator-influxdb-to-bq)

# Ansible Role

Find the sample Ansible Roles in [./setup/ansible-roles](./setup/ansible-roles) to help setup Planet Exporter or Planet Federator.

Generally, planet_exporter is installed as a sidecar agent on most (if not all) servers,
while planet_federator is installed on an influxdb server.

Remember that for planet-federator-influxdb-to-bq to work, provide the instance's Service Account a write access to target BQ Tables (i.e. roles/bigquery.dataEditor).

# Go Version

```
$ go version
go version go1.20 linux/amd64
```

> Older Go versions should work fine.

# Contributing

Pull requests for new features, bug fixes, and suggestions are welcome!

# License

[Apache License 2.0](https://github.com/williamchanrico/planet-exporter/blob/master/LICENSE)