https://github.com/bitpingapp/distributed-metrics
Collect and expose distributed networking metrics to your prometheus instance
https://github.com/bitpingapp/distributed-metrics
grafana last-mile-monitoring metrics networking-tools peer-to-peer prometheus residential
Last synced: 9 days ago
JSON representation
Collect and expose distributed networking metrics to your prometheus instance
- Host: GitHub
- URL: https://github.com/bitpingapp/distributed-metrics
- Owner: BitpingApp
- License: other
- Created: 2024-11-23T11:16:11.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2026-05-15T09:12:02.000Z (24 days ago)
- Last Synced: 2026-05-15T11:29:16.356Z (24 days ago)
- Topics: grafana, last-mile-monitoring, metrics, networking-tools, peer-to-peer, prometheus, residential
- Language: Rust
- Homepage:
- Size: 434 KB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE.md
- Agents: AGENTS.md
Awesome Lists containing this project
README
# Global Metrics Collector
This tool uses the Bitping Developer API to collect metrics about different protocols and how services respond from an external perspective.
Similar to Uptime testing tools such as BetterUptime or UptimeRobot but you own the data and can hook the data into your own Prometheus & Grafana for reporting.
You can also specify the network type of the reporting device such as if its a Residential IP, a Hosted VPS IP, a Mobile Broadband IP or even behind a Proxy/VPN service.
## Get Started
1. Sign up for the Bitping Developer API at https://developer.bitping.com
2. Generate an API Key
3. Create a `Metrics.yaml` file (see configuration below)
4. Set your BITPING_API_KEY environment variable:
```bash
export BITPING_API_KEY=your_api_key
```
5. Follow the install instructions below
6. Run:
```bash
distributed-metrics
```
Metrics will be available at `http://localhost:3000/metrics` in Prometheus format. You can also push metrics to remote endpoints via [remote write](#remote-write).
## Installation
### Install prebuilt binaries via shell script
```sh
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/BitpingApp/distributed-metrics/releases/latest/download/distributed-metrics-installer.sh | sh
```
### Install prebuilt binaries via Homebrew
```sh
brew install BitpingApp/tap/distributed-metrics
```
### Run Docker Container
```bash
docker run -d \
-p 3000:3000 \
-e BITPING_API_KEY=your_api_key \
-v $(pwd)/Metrics.yaml:/app/Metrics.yaml \
bitping/distributed-metrics
```
### Run Docker Compose
Save the following yaml to docker-compose.yaml
```yaml
version: '3'
services:
metrics:
image: bitping/distributed-metrics
ports:
- "3000:3000"
environment:
- BITPING_API_KEY=your_api_key
volumes:
- ./Metrics.yaml:/app/Metrics.yaml
restart: unless-stopped
```
Run:
```bash
docker compose up -d
```
#### Remote write only (no scrape endpoint)
If you only need remote write, you can skip exposing port 3000:
```yaml
version: '3'
services:
metrics:
image: bitping/distributed-metrics
environment:
- BITPING_API_KEY=your_api_key
volumes:
- ./Metrics.yaml:/app/Metrics.yaml
restart: unless-stopped
```
## Supported Protocols
### DNS
Measures DNS resolution performance and reliability.
```yaml
metrics:
- type: dns
prefix: "custom_prefix_" # Optional prefix for metrics
name: "custom_name" # Optional name override
endpoint: example.com
frequency: 1s
network:
proxy: denied
mobile: allowed
residential: required
country_code: NLD # Optional: ISO 3166-1 alpha-3 country code
continent_code: EU # Optional: AF, AN, AS, EU, NA, OC, SA
isp_regex: "^Comcast" # Optional: Filter by ISP name
node_id: "node123" # Optional: Specific node ID
lookup_type: IP # Optional: IP, MX, SOA, NS, TXT, SRV, TLSA (default: IP)
dns_servers: # Optional: custom DNS resolvers (host:port format)
- "8.8.8.8:53"
- "1.1.1.1:53"
```
Metrics collected:
- `dns_lookup_success_total`: Count of successful DNS lookups
- `dns_lookup_error_total`: Count of DNS lookup errors by type
- `dns_lookup_total`: Total number of DNS lookups attempted
- `dns_server_lookup_duration_ms`: Time taken for DNS resolution
- `dns_record_hash`: Hash of the DNS response for change detection
- `dns_records_count`: Number of records returned
- `dns_soa_records_count`: Number of SOA records (when applicable)
Labels:
- country_code
- continent
- city
- isp
- os
- endpoint
- dns_server
- record_type
- error_type (for errors)
### ICMP
Measures network latency and packet loss.
```yaml
metrics:
- type: icmp
prefix: "custom_prefix_" # Optional prefix for metrics
name: "custom_name" # Optional name override
endpoint: example.com
frequency: 1s
network:
proxy: denied
mobile: allowed
residential: required
```
Metrics collected:
- `icmp_ping_failures_total`: Count of ping failures
- `icmp_ping_success_total`: Count of successful pings
- `icmp_ping_duration_ms`: Overall ping duration
- `icmp_ping_latency_min_ms`: Minimum latency
- `icmp_ping_latency_max_ms`: Maximum latency
- `icmp_ping_latency_avg_ms`: Average latency
- `icmp_ping_latency_stddev_ms`: Standard deviation of latency
- `icmp_ping_packet_loss_ratio`: Ratio of lost packets
- `icmp_ping_packets_sent`: Number of packets sent
- `icmp_ping_packets_received`: Number of packets received
- `icmp_ping_success_ratio`: Success rate of pings
Labels:
- country_code
- continent
- city
- isp
- os
- endpoint
- ip_address
- error_type (for failures)
### HTTP
Measures HTTP response times, status codes, and body content.
```yaml
metrics:
- type: http
prefix: "custom_prefix_" # Optional prefix for metrics
name: "custom_name" # Optional name override
endpoint: https://api.example.com/health
frequency: 15s
method: GET
headers: # Optional custom headers
Authorization: "Bearer token123"
body: '{"key": "value"}' # Optional request body
regex: "ok|healthy" # Optional: regex to match against response body
status_codes: [200, 204] # Optional: expected status codes
network:
proxy: denied
residential: required
```
When `status_codes` is set, an `http_status_match` gauge is emitted: `1` if the response code is in the list, `0` otherwise. This lets you alert on unexpected status codes in PromQL:
```promql
http_status_match{endpoint="https://api.example.com/health"} == 0
```
Metrics collected:
- `http_request_duration_ms`: HTTP request duration
- `http_status_code`: Response status code
- `http_status_match`: Whether status matched expected codes (only when `status_codes` is set)
- `http_body_hash`: Hash of the response body for change detection
- `http_regex_match_count`: Number of regex matches in the response body
- `http_request_success_total`: Count of successful requests
- `http_request_error_total`: Count of failed requests
- `http_request_total`: Total requests attempted
Labels:
- country_code
- continent
- city
- isp
- os
- endpoint
- status_code
- error_type (for errors)
### HLS
Measures HLS video stream performance and quality metrics.
```yaml
metrics:
- type: hls
prefix: "custom_prefix_" # Optional prefix for metrics
name: "custom_name" # Optional name override
endpoint: https://example.com/stream.m3u8
frequency: 15s
network:
proxy: denied
mobile: allowed
residential: required
headers: # Optional custom headers
User-Agent: "CustomPlayer/1.0"
Authorization: "Bearer token123"
```
Metrics collected:
- `hls_total_ms`: Total time taken for HLS test
- `hls_master_download_ms`: Master playlist download time
- `hls_master_size_bytes`: Master playlist size
- `hls_master_bitrate`: Master playlist download speed
- `hls_renditions_count`: Number of available renditions
- `hls_master_tcp_connect_ms`: TCP connection time
- `hls_master_ttfb_ms`: Time to first byte
- `hls_master_dns_resolve_ms`: DNS resolution time
- `hls_master_tls_handshake_ms`: TLS handshake time
- `hls_fragment_download_ms`: Fragment download times
- `hls_fragment_size_bytes`: Fragment sizes
- `hls_fragment_bandwidth_bytes_per`: Fragment bandwidth
- `hls_fragment_duration_seconds`: Fragment durations
- `hls_buffer_fill_rate`: Buffer fill rate vs playback speed
- `hls_estimated_buffer_ms`: Estimated buffer length
- `hls_initial_buffer_ms`: Initial buffering time
- `hls_playlist_chain_load_time`: Total playlist load time
- `hls_failures_total`: Count of failures
- `hls_errors_by_type`: Errors by category
Labels:
- country_code
- continent
- city
- isp
- os
- endpoint
- resolution
- bandwidth
- target_duration_secs
- discontinuity_sequence
- playlist_type
- error_type (for failures)
## Configuration
### Global Configuration
```yaml
metric_clear_timeout: 10s # How long to keep metrics after a scrape has occured - prevents timeouts on scraping as cardinality can be high
scrape_enabled: true # Enable the /metrics scrape endpoint (default: true)
metrics:
# Protocol configurations as shown above
```
### Remote Write
Push metrics to one or more Prometheus-compatible remote write endpoints (Grafana Cloud, VictoriaMetrics, Mimir, Cortex, Thanos, etc.) instead of or in addition to the scrape endpoint.
```yaml
remote_write:
- name: grafana-cloud
url: https://prometheus-prod-01-eu-west-0.grafana.net/api/prom/push
username: "123456"
password: "glc_your_api_key_here"
interval: 15s
- name: local-victoriametrics
url: http://victoriametrics:8428/api/v1/write
interval: 10s
headers:
Authorization: "Bearer my-token"
```
| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `name` | Yes | — | Identifier for logging |
| `url` | Yes | — | Remote write endpoint URL |
| `username` | No | — | Basic auth username |
| `password` | No | — | Basic auth password |
| `headers` | No | `{}` | Custom HTTP headers (e.g., bearer tokens) |
| `interval` | No | `15s` | Push interval |
| `timeout` | No | `30s` | HTTP request timeout per push |
To disable the scrape endpoint and use only remote write:
```yaml
scrape_enabled: false
remote_write:
- name: my-destination
url: https://my-endpoint/api/v1/write
```
On consecutive push failures, the sender backs off exponentially (base interval * 2^failures, capped at 5 minutes) and resets on success.
### Network Selection Parameters
All protocols support these network selection criteria:
- `proxy`: Policy for proxy nodes (allowed, denied, required)
- `mobile`: Policy for mobile nodes (allowed, denied, required)
- `residential`: Policy for residential nodes (allowed, denied, required)
- `continent_code`: Optional continent restriction (AF, AN, AS, EU, NA, OC, SA)
- `country_code`: Optional country restriction (ISO 3166-1 alpha-3)
- `isp_regex`: Optional ISP name filter using regex
- `node_id`: Optional specific node selection
### Common Metric Configuration
All metrics support these base configuration options:
- `prefix`: Optional prefix for metric names
- `name`: Optional name override for the endpoint label
- `endpoint`: Target hostname or URL
- `frequency`: How often to collect metrics (e.g., "1s", "15s", "1m")
- `network`: Network selection criteria (see above)
## Testing
```bash
cargo test # Unit tests only
cargo test --test remote_write -- --ignored # Integration tests (requires Docker/Podman)
```
Integration tests run the real remote write code path against both VictoriaMetrics and Prometheus containers (via [testcontainers](https://crates.io/crates/testcontainers)). They verify that gauges, counters, and histograms push correctly and are queryable on both backends.
## Error Handling
All collectors track failures with specific error types in their respective `*_failures_total` or `*_errors_by_type` metrics. Common error categories include:
- DNS: no_records, connection_refused, timeout, resolution_failed, server_misbehaving
- ICMP: dns_lookup_failed, timeout, host_unreachable, permission_denied, network_unreachable
- HLS: dns_error, not_found, invalid_manifest, timeout, connection_error, ssl_error, http_4xx/5xx