https://github.com/donchanee/metricops
Prometheus metric governance CLI for self-hosted operators
cli cost-optimization golang grafana metrics observability prometheus sre
- Host: GitHub
- URL: https://github.com/donchanee/metricops
- Owner: donchanee
- License: apache-2.0
- Created: 2026-04-17T13:59:26.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-18T01:26:50.000Z (2 months ago)
- Last Synced: 2026-04-18T02:41:33.754Z (2 months ago)
- Topics: cli, cost-optimization, golang, grafana, metrics, observability, prometheus, sre
- Language: Go
- Size: 92.8 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# metricops
**Prometheus metric governance for self-hosted operators.**
`metricops analyze` reads `promtool tsdb analyze` output for your Prometheus
TSDB, plus your Grafana dashboards and alert rules, and answers three
questions you probably can't answer today:
1. **Which metrics are you paying to store but nobody queries?**
2. **Which metrics and labels are quietly eating most of your series budget?**
3. **How many bytes per day would you save if you dropped the dead weight?**
All local. No network calls. No telemetry. No cloud service. Your TSDB
metadata and dashboards never leave the machine `metricops` runs on.
---
## Example output
```markdown
# metricops Report
Generated: 2026-04-17T12:00:00Z · Schema: 1.0
## Summary
| Metric | Value |
|---------------------------|-----------------------------|
| Total metrics | 80 |
| Unused metrics | 21 |
| Total active series | 76,475 |
| Estimated daily bytes | 880,991,987 (840.18 MiB) |
| Estimated monthly bytes | 26,429,759,610 (24.61 GiB) |
| Bytes per sample assumed | 2.00 |
## Unused Metrics
21 metrics appear in TSDB but are not referenced by any dashboard, alert,
or recording rule.
| Metric | Active Series | Bytes/day |
|----------------------------------------|--------------:|-------------:|
| `http_request_duration_seconds_bucket` | 15,013 | 172,949,760 |
| `http_response_size_bytes` | 2,640 | 30,412,800 |
| `legacy_worker_queue_depth` | 26 | 299,520 |
...
## Recommendations
21 actions identified. Total estimated daily savings: 224.87 MiB.
- `drop_metric` **`http_request_duration_seconds_bucket`** — saves ~164.94 MiB/day
...
```
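The `Bytes/day` column above is consistent with a simple product: active series × samples per day × bytes per sample. The following is a minimal sketch of that arithmetic, not metricops' actual implementation. The helper `estimateDailyBytes` is hypothetical, and the 15-second scrape interval is an assumption inferred from the example numbers (11,520 bytes per series per day at 2 bytes per sample); the README does not state it.

```go
package main

import (
	"fmt"
	"time"
)

// assumedScrapeInterval is inferred from the example report, not documented:
// 11,520 bytes/day per series at 2 bytes/sample implies 5,760 samples/day,
// which is one sample every 15 seconds.
const assumedScrapeInterval = 15 * time.Second

// estimateDailyBytes is a hypothetical helper that computes
// series x samples/day x bytes/sample, rounded to the nearest byte.
func estimateDailyBytes(activeSeries int64, scrapeInterval time.Duration, bytesPerSample float64) int64 {
	samplesPerDay := float64(24*time.Hour) / float64(scrapeInterval)
	return int64(float64(activeSeries)*samplesPerDay*bytesPerSample + 0.5)
}

// row mirrors one line of the "Unused Metrics" table in the example report.
type row struct {
	name   string
	series int64
}

func main() {
	rows := []row{
		{"http_request_duration_seconds_bucket", 15013},
		{"http_response_size_bytes", 2640},
		{"legacy_worker_queue_depth", 26},
	}
	for _, r := range rows {
		b := estimateDailyBytes(r.series, assumedScrapeInterval, 2.0)
		fmt.Printf("%-38s %12d bytes/day (%.2f MiB)\n", r.name, b, float64(b)/(1<<20))
	}
}
```

Under these assumptions the three rows come out to 172,949,760, 30,412,800, and 299,520 bytes per day, matching the table. If your scrape interval differs, the estimate scales inversely with it.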
Ask for JSON with `--format=json` for CI pipelines and scripts. JSON output
under the v1.0 schema is byte-deterministic: the same inputs produce
byte-identical output, and a CI gate in this repo enforces that.
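How metricops achieves byte-identical output is not described here. As a rough illustration of the general technique in Go, the sketch below sorts records before marshaling, because map iteration order is randomized while `encoding/json` emits struct fields in declaration order. The `finding` type and its field names are hypothetical and are not the real v1.0 schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// finding is a hypothetical record, NOT metricops' real JSON schema.
type finding struct {
	Metric string `json:"metric"`
	Series int    `json:"active_series"`
}

// renderDeterministic turns an unordered map into JSON whose bytes depend
// only on the map's contents. Go randomizes map iteration order, so the
// records are sorted by metric name before marshaling.
func renderDeterministic(seriesByMetric map[string]int) ([]byte, error) {
	findings := make([]finding, 0, len(seriesByMetric))
	for name, n := range seriesByMetric {
		findings = append(findings, finding{Metric: name, Series: n})
	}
	sort.Slice(findings, func(i, j int) bool {
		return findings[i].Metric < findings[j].Metric
	})
	return json.Marshal(findings)
}

func main() {
	in := map[string]int{
		"legacy_worker_queue_depth": 26,
		"http_response_size_bytes":  2640,
	}
	first, err := renderDeterministic(in)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	second, err := renderDeterministic(in)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// A determinism gate boils down to a byte comparison like this one.
	fmt.Println("byte-identical:", bytes.Equal(first, second))
}
```

In a pipeline, the equivalent check is to run the analysis twice on the same inputs and compare the two files byte for byte.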
---
## Install
### Go
```bash
go install github.com/donchanee/metricops/cmd/metricops@latest
```
### Pre-built binaries
Grab the archive for your OS and arch from the
[releases page](https://github.com/donchanee/metricops/releases) and put the
`metricops` binary on your `PATH`.
Supported: linux/amd64, linux/arm64, darwin/amd64, darwin/arm64, windows/amd64.
### Build from source
```bash
git clone https://github.com/donchanee/metricops
cd metricops
go build -o metricops ./cmd/metricops
```
---
## Quick start
`metricops` takes up to three inputs; only the first is required:
1. The output of `promtool tsdb analyze`, run against your Prometheus data directory (a text file).
2. Your Grafana dashboards as JSON files in a directory (optional).
3. Your Prometheus rule files as YAML in a directory (optional).
```bash
# Capture TSDB state (takes a minute on a large block; that's promtool's time, not ours)
promtool tsdb analyze /var/lib/prometheus/ > tsdb-analyze.txt

# Dashboards: export them to ./dashboards, or point --grafana at a
# git-mirrored dashboards-as-code directory.
# Omit --grafana if you want a rules-only run.

# Analyze
metricops analyze \
  --tsdb=tsdb-analyze.txt \
  --grafana=./dashboards \
  --rules=/etc/prometheus/rules
```
Try it on synthetic data right now — this repo ships a fixture generator:
```bash
go run testdata/fixtures/generate.go -out ./testdata/fixtures
go run ./cmd/metricops analyze \
  --tsdb=./testdata/fixtures/tsdb-analyze.txt \
  --grafana=./testdata/fixtures/dashboards \
  --rules=./testdata/fixtures/rules
```
---
## CLI reference
```text
metricops analyze [flags]

Flags:
  --tsdb string              promtool tsdb analyze output (file or '-' for stdin) [required]
  --grafana string           Grafana dashboard JSON file or directory
  --rules string             Prometheus rule YAML file or directory
  --format string            output format: markdown (default), md, or json
  --schema string            JSON schema version (default "1.0"; only 1.0 supported in v0.x)
  --strict                   promote per-file parse warnings to exit 1
  --bytes-per-sample float   assumed bytes per sample, compressed TSDB (default 2.00)
  --fail-on string           exit 1 when findings of this kind exist (e.g. 'findings')
  --timeout duration         analysis deadline (default 5m)
  --progress                 emit stage-based progress on stderr
```
### Exit codes
| Code | Meaning |
|------|------------------------------------------------------------------------|
| 0 | Success, with or without findings (unless `--fail-on` matched). |
| 1 | Findings present and matched `--fail-on`, or strict parse warnings. |
| 2 | Invalid flags, unreadable input, unsupported schema. |
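For scripts that wrap `metricops`, the exit codes in the table can be read with Go's `os/exec`. This is a minimal sketch: `describeExit` and `exitCodeOf` are hypothetical helper names, the flags are the ones listed in the CLI reference, and `tsdb-analyze.txt` is assumed to exist in the working directory.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// describeExit maps the documented exit codes to a short message.
func describeExit(code int) string {
	switch code {
	case 0:
		return "success (with or without findings)"
	case 1:
		return "findings matched --fail-on, or strict parse warnings"
	case 2:
		return "invalid flags, unreadable input, or unsupported schema"
	default:
		return "not a documented metricops exit code"
	}
}

// exitCodeOf extracts the exit code from the error returned by cmd.Run.
// It returns -1 when the process could not be started at all, for example
// when the binary is not on PATH.
func exitCodeOf(err error) int {
	if err == nil {
		return 0
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode()
	}
	return -1
}

func main() {
	cmd := exec.Command("metricops", "analyze",
		"--tsdb=tsdb-analyze.txt", "--fail-on=findings")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr

	code := exitCodeOf(cmd.Run())
	fmt.Fprintf(os.Stderr, "metricops exited %d: %s\n", code, describeExit(code))
}
```

Treating code 1 and code 2 differently lets a wrapper distinguish "the report found something" from "the run itself was misconfigured".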
---
## How it works
```text
┌───────────────────┐     ┌───────────────────┐
│  promtool tsdb    │     │  Grafana JSON +   │
│  analyze output   │     │  Prom rule YAML   │
└─────────┬─────────┘     └─────────┬─────────┘
          │                         │
          ▼                         ▼
   ┌──────────────┐      ┌─────────────────────┐
   │  parse.TSDB  │      │ parse.Grafana/Rules │
   └──────┬───────┘      └──────────┬──────────┘
          │ []*Metric               │ []Reference
          │                         │
          └────────────┬────────────┘
                       ▼
              ┌─────────────────┐
              │  builder.Build  │  ← promqlx.MetricNames
              └────────┬────────┘    walks each Expr AST
                       │ *Model      to attribute refs
                       ▼
              ┌─────────────────┐
              │    analyze.*    │  DetectUnused
              │                 │  DetectHotspots
              │                 │  EstimateSummary
              │                 │  BuildRecommendations
              └────────┬────────┘
                       ▼
              ┌─────────────────┐
              │    render.*     │  Markdown or JSON
              └─────────────────┘
                       ▼
                     stdout
```
Every stage is a pure function; parsers do the only I/O. The whole pipeline
runs under `--timeout` (default 5m) so a cron job can't hang your runner.
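To make the "pure function" idea concrete, here is a rough sketch of what an unused-metric stage could look like: a set difference between the metrics seen in the TSDB and the names referenced by dashboards and rules. The `Metric` and `Reference` types and the `detectUnused` function are hypothetical stand-ins for the names in the diagram; the real types and signatures may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Metric is a hypothetical stand-in for the parsed TSDB model entry.
type Metric struct {
	Name         string
	ActiveSeries int
}

// Reference is a hypothetical stand-in for one attributed metric reference.
type Reference struct {
	MetricName string
	Source     string // for example, a dashboard or rule file path
}

// detectUnused returns the metrics that no reference mentions, sorted by
// name. It performs no I/O and does not modify its inputs.
func detectUnused(metrics []Metric, refs []Reference) []Metric {
	referenced := make(map[string]struct{}, len(refs))
	for _, r := range refs {
		referenced[r.MetricName] = struct{}{}
	}
	var unused []Metric
	for _, m := range metrics {
		if _, ok := referenced[m.Name]; !ok {
			unused = append(unused, m)
		}
	}
	sort.Slice(unused, func(i, j int) bool { return unused[i].Name < unused[j].Name })
	return unused
}

func main() {
	metrics := []Metric{
		{Name: "up", ActiveSeries: 40},
		{Name: "legacy_worker_queue_depth", ActiveSeries: 26},
	}
	refs := []Reference{{MetricName: "up", Source: "dashboards/overview.json"}}

	for _, m := range detectUnused(metrics, refs) {
		fmt.Printf("%s (%d series)\n", m.Name, m.ActiveSeries)
	}
}
```

Because a stage like this depends only on its arguments, it can be tested with in-memory fixtures and produces the same result on every run.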
---
## CI integration
Use `--fail-on=findings` to block PRs that introduce unused metrics or
cardinality hotspots.
```yaml
# .github/workflows/prom-governance.yml
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install metricops
        run: go install github.com/donchanee/metricops/cmd/metricops@latest
      - name: Export TSDB state
        run: |
          ssh prod-prom 'promtool tsdb analyze /var/lib/prometheus/' > tsdb.txt
      - name: Analyze
        run: |
          metricops analyze \
            --tsdb=tsdb.txt \
            --grafana=./dashboards \
            --rules=./rules \
            --format=json \
            --fail-on=findings > report.json
      - uses: actions/upload-artifact@v4
        with:
          name: metricops-report
          path: report.json
```
---
## Data privacy
`metricops` runs entirely on your machine. It does not:
- make outbound network calls,
- phone home with telemetry,
- read configuration from any location you didn't point it at,
- write outputs anywhere other than stdout and stderr.
Your TSDB analysis, dashboards, and rule files stay on disk. You can run
it on air-gapped infrastructure.
---
## Known limitations
- **Recording-rule chains are not traversed.** A metric referenced only
through another recording rule's output may appear unused. Chain traversal
is on the v2 roadmap.
- **Template-variable expressions** (`${var}`, `$__range`) in Grafana are
skipped with a warning; their metric references are not attributed.
- **Repeat panels** in Grafana are skipped.
- **Regex `__name__` matchers** (`{__name__=~"foo.*"}`) cannot be resolved;
metrics referenced only this way will appear unused.
- **VictoriaMetrics / Thanos / Mimir** backends are not yet supported; this
release targets vanilla Prometheus. Support is planned for v2 via the same
parser contract.
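Because regex `__name__` matchers are not resolved, it can be worth cross-checking the unused list by hand before dropping anything. The sketch below is a manual workaround, not a metricops feature: `coveredByRegex` is a hypothetical helper that tests candidate metric names against a pattern. Prometheus regex matchers are fully anchored, so the pattern is wrapped in `^(?:...)$` before matching.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// coveredByRegex returns the candidate metric names that a
// {__name__=~"pattern"} matcher would select. Prometheus anchors regex
// matchers at both ends, so the pattern is wrapped accordingly.
func coveredByRegex(pattern string, candidates []string) ([]string, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return nil, fmt.Errorf("compile %q: %w", pattern, err)
	}
	var covered []string
	for _, name := range candidates {
		if re.MatchString(name) {
			covered = append(covered, name)
		}
	}
	return covered, nil
}

func main() {
	// Names reported as unused, checked against a pattern found in a dashboard.
	unused := []string{"foo_requests_total", "legacy_foo", "foo"}

	covered, err := coveredByRegex("foo.*", unused)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, name := range covered {
		fmt.Println("still referenced via regex:", name)
	}
}
```

Any name this check flags is referenced through a regex matcher and should be excluded from drop candidates.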
See [CHANGELOG.md](./CHANGELOG.md) for the full v0.1.0 shipping notes.
---
## Contributing
Issues and PRs welcome.
- Small PRs merge faster. Keep one change per PR.
- `go test ./...` and `go vet ./...` must pass.
- New behavior needs tests.
- The JSON schema is a compatibility surface: additive changes only within
v1.x; breaking changes bump the major.
---
## License
[Apache-2.0](./LICENSE).