https://github.com/shepard-system/shepard-obs-stack
Self-hosted observability for AI coding agents. Clone. Configure. See.
https://github.com/shepard-system/shepard-obs-stack
ai-agents claude-code codex gemini-cli grafana loki observability opentelemetry prometheus
Last synced: 16 days ago
JSON representation
Self-hosted observability for AI coding agents. Clone. Configure. See.
- Host: GitHub
- URL: https://github.com/shepard-system/shepard-obs-stack
- Owner: shepard-system
- License: other
- Created: 2026-02-23T01:34:40.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-03-19T00:39:14.000Z (30 days ago)
- Last Synced: 2026-03-19T14:00:51.095Z (29 days ago)
- Topics: ai-agents, claude-code, codex, gemini-cli, grafana, loki, observability, opentelemetry, prometheus
- Language: Shell
- Size: 1.42 MB
- Stars: 63
- Watchers: 0
- Forks: 4
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Codeowners: CODEOWNERS
Awesome Lists containing this project
README
# shepard-obs-stack
[](https://grafana.com/)
[](https://prometheus.io/)
[](https://grafana.com/oss/loki/)
[](https://opentelemetry.io/docs/collector/)
[](LICENSE)
[](https://github.com/shepard-system/shepard-obs-stack/actions/workflows/test.yml)
**The Eye** — self-hosted observability for AI coding assistants.
You use Claude Code, Codex, or Gemini CLI every day.
You have no idea how much they cost, which tools they call, or whether they're actually helping.
This fixes that.

More screenshots
**Tools** — 5K calls across all three CLIs, top tools ranked, failing tools by error count:

**Operations** — live event rate, breakdown by source and event type:

**Claude Code Deep Dive** — per-model cost, token breakdown, cache efficiency, productivity ratio:

**Claude Code Deep Dive (Tools)** — tool decisions, active time breakdown:

**Quality** — cache hit rates, error rates, session trends:

## Table of Contents
- [Highlights](#highlights)
- [Quick Start](#quick-start)
- [Dashboards](#dashboards)
- [How It Works](#how-it-works)
- [Hook Setup](#hook-setup)
- [Claude Code Skills](#claude-code-skills)
- [Rust Accelerator](#rust-accelerator-optional)
- [Alerting](#alerting)
- [Services](#services)
- [Architecture](#architecture)
- [Project Structure](#project-structure)
- [Contributing](#contributing)
- [License](#license)
## Highlights
- **One command** to start: `./scripts/init.sh` — 6 services, 9 dashboards, under a minute
- **Three CLIs supported**: Claude Code, Codex, Gemini CLI — hooks + native OpenTelemetry
- **Nine Grafana dashboards** auto-provisioned: cost, tools, operations, quality, per-provider deep dives, session timeline, and side-by-side provider comparison
- **Minimal dependencies** — Docker, plus `bash`, `curl`, and `jq` on the host for hooks and tests. No Python, no Node, no cloud accounts
- **Optional [Rust accelerator](https://github.com/shepard-system/shepard-hooks-rs)** — drop-in `shepard-hook` binary replaces bash+jq+curl. Hooks auto-detect it; falls back to bash if absent
- **Seven Claude Code [skills](#claude-code-skills)** — `/obs-status`, `/obs-cost`, `/obs-sessions`, `/obs-tools`, `/obs-alerts`, `/obs-compare`, `/obs-query` — query the stack without leaving your terminal
- **Works offline** — everything runs on localhost, your data stays on your machine
## Quick Start
**Prerequisites:** Docker (with Compose v2), `curl`, `jq`, and at least one AI CLI installed.
```bash
git clone https://github.com/shepard-system/shepard-obs-stack.git
cd shepard-obs-stack
./scripts/init.sh # starts stack + health check
./hooks/install.sh # injects hooks into your CLI configs
```
Open [localhost:3000](http://localhost:3000) (admin / shepherd). Use your CLI as usual — data appears in dashboards within seconds.
```bash
./scripts/test-signal.sh # verify the full pipeline (11 checks)
```
## Dashboards
### Unified (cross-provider)
| Dashboard | Question it answers |
|----------------|-----------------------------------------|
| **Cost** | How much is this costing me? |
| **Tools** | Who is performing and who is wandering? |
| **Operations** | What is happening right now? |
| **Quality** | How well is the system working? |
### Deep Dive (per-provider)
| Dashboard | What you see |
|-----------------|---------------------------------------------------------|
| **Claude Code** | Token usage, cost by model, tool decisions, active time |
| **Codex** | Sessions, API latency percentiles, reasoning tokens |
| **Gemini CLI** | Token breakdown, latency heatmap, tool call routing |
### Session Timeline & Comparative
| Dashboard | What you see |
|----------------------|--------------------------------------------------------------------------------------------|
| **Session Timeline** | Synthetic traces from all 3 CLI session logs — tool call waterfall, MCP timing, sub-agents |
| **Comparative** | Side-by-side provider comparison: sessions, cost, tokens, tools, errors, top repos |
Click any Trace ID to open the full waterfall in Grafana Explore → Tempo.
Dashboard template variables: **Tools** and **Operations** support `$source` and `$git_repo` filtering.
**Deep Dive** dashboards use `$model`. **Session Timeline** uses `$provider`. **Comparative** uses `$git_repo`. **Cost** and **Quality** show aggregated data without filters.
## How It Works
AI CLIs emit telemetry through two channels:
```
AI CLI (Claude Code / Codex / Gemini)
│
├── bash hooks → OTLP metrics (tool calls, events, git context)
│ └─→ OTel Collector :4318
│
└── native OTel → gRPC (tokens, cost, logs, traces)
└─→ OTel Collector :4317
│
├── metrics → Prometheus :9090
├── logs → Loki :3100
└── traces → Tempo :3200
│
Loki recording rules ──── remote_write ───→ Prometheus
│
Grafana :3000 ←── PromQL + LogQL ──────────┘
```
**Hooks** provide what native OTel cannot: git repo context and labeled tool/event counters.
Everything else (tokens, cost, sessions) comes from native OTel export.
## Hook Setup
```bash
./hooks/install.sh # all detected CLIs
./hooks/install.sh claude # specific CLI
./hooks/install.sh codex gemini # selective
./hooks/uninstall.sh # clean removal
```
The installer auto-detects installed CLIs and merges hook configuration into their config files (creating backups first).
| CLI | Hooks | Native OTel signals |
|-------------|-------------------------------------------------------|-------------------------|
| Claude Code | `PreToolUse`, `PostToolUse`, `SessionStart`, `Stop` | metrics + logs |
| Codex CLI | `agent-turn-complete` | logs |
| Gemini CLI | `AfterTool`, `AfterAgent`, `AfterModel`, `SessionEnd` | metrics + logs + traces |
## Claude Code Skills
Seven slash-command skills for querying the obs stack directly from Claude Code — no browser needed.
| Skill | What it does |
|-------|-------------|
| `/obs-status` | Stack health: service status, scrape targets, last telemetry, active alerts |
| `/obs-cost` | Cost report by provider and model (supports `today`, `yesterday`, `week`, `24h`) |
| `/obs-sessions` | Recent sessions with model, duration, tool count, cost |
| `/obs-tools` | Top tools, error rates, usage by provider and repo |
| `/obs-alerts` | Active alerts with severity and resolution hints |
| `/obs-compare` | Side-by-side provider comparison: sessions, cost, tokens, tools, errors |
| `/obs-query` | Free-form PromQL or LogQL — run any query inline |
Skills are installed automatically when you clone the repo (they live in `.claude/skills/`). All API calls go through `scripts/obs-api.sh` — a centralized helper that's ready for auth and TLS when you need it:
```bash
# Default: plain HTTP to localhost (single-machine, no auth)
./scripts/obs-api.sh prometheus /api/v1/query --data-urlencode 'query=up'
# With auth (set env vars when hardening or going multi-machine)
SHEPARD_API_TOKEN=secret ./scripts/obs-api.sh prometheus /api/v1/query ...
SHEPARD_CA_CERT=/path/to/ca.pem ./scripts/obs-api.sh loki /ready
```
## Rust Accelerator (optional)
All hooks work out of the box with bash + jq + curl. For faster execution, you can optionally install the [Rust accelerator](https://github.com/shepard-system/shepard-hooks-rs) — a single static binary that replaces the entire bash pipeline:
```bash
./scripts/install-accelerator.sh # latest release → hooks/bin/ (no sudo)
./scripts/install-accelerator.sh v0.4.0 # specific version
```
The installer downloads a pre-built binary from [GitHub Releases](https://github.com/shepard-system/shepard-hooks-rs/releases) (linux/macOS, x64/arm64) and verifies it against the `SHA256SUMS` file published with each release. The binary is placed in `hooks/bin/` (gitignored, project-local).
Hooks auto-detect it via `hooks/lib/accelerator.sh` (project-local → PATH → bash fallback). No configuration needed — if the binary is present, hooks use it; if not, they fall back to bash.
Remove with `./hooks/uninstall.sh` or simply delete `hooks/bin/`.
## Alerting
Alertmanager runs on :9093 with 16 alert rules in three tiers:
| Tier | Alerts | Examples |
|--------------------|--------|---------------------------------------------------------------------------------------------------------------------------------------|
| **Infrastructure** | 6 | `OTelCollectorDown`, `CollectorHighMemory`, `PrometheusHighMemory`, export failures |
| **Pipeline** | 5 | `LokiDown`, `ShepherdServicesDown`, `TempoDown`, `PrometheusTargetDown`, `LokiRecordingRulesFailing` |
| **Business logic** | 5 | `HighSessionCost` (>$10/hr), `HighTokenBurn` (>50k tok/min), `HighToolErrorRate` (>10%), `SensitiveFileAccess`, `NoTelemetryReceived` |
Inhibit rules suppress business-logic alerts when infrastructure is down.
Native Telegram, Slack, and Discord receivers are included — uncomment and configure in `configs/alertmanager/alertmanager.yaml`:
```yaml
# telegram_configs:
# - bot_token: 'YOUR_BOT_TOKEN'
# chat_id: YOUR_CHAT_ID
# send_resolved: true
```
## Services
| Service | Port | Purpose |
|----------------|-----------|----------------------|
| Grafana | 3000 | Dashboards & explore |
| Prometheus | 9090 | Metrics & alerts |
| Loki | 3100 | Log aggregation |
| Tempo | 3200 | Distributed tracing |
| Alertmanager | 9093 | Alert routing |
| OTel Collector | 4317/4318 | OTLP gRPC + HTTP |
## Architecture
C4 diagrams (click to expand)
### System Context

### Containers

### Hook Components

### Hook Event Flow

### Event Schema Normalization

## Project Structure
```
shepard-obs-stack/
├── docker-compose.yaml
├── .env.example
├── .claude/skills/ # Claude Code slash-command skills
│ ├── obs-status/ # /obs-status — stack health
│ ├── obs-cost/ # /obs-cost — cost report
│ ├── obs-sessions/ # /obs-sessions — session summary
│ ├── obs-tools/ # /obs-tools — tool usage
│ ├── obs-alerts/ # /obs-alerts — active alerts
│ ├── obs-compare/ # /obs-compare — provider comparison
│ └── obs-query/ # /obs-query — free-form PromQL/LogQL
├── hooks/
│ ├── bin/ # Rust accelerator binary (gitignored, downloaded)
│ ├── lib/ # shared: accelerator, git context, OTLP metrics + traces, sensitive file detection, session parser
│ ├── claude/ # PreToolUse + PostToolUse + SessionStart + Stop
│ ├── codex/ # notify.sh (agent-turn-complete)
│ ├── gemini/ # AfterTool + AfterAgent + AfterModel + SessionEnd
│ ├── install.sh # auto-detect + inject
│ └── uninstall.sh # clean removal
├── scripts/
│ ├── init.sh # bootstrap
│ ├── install-accelerator.sh # download Rust accelerator to hooks/bin/
│ ├── obs-api.sh # centralized API client (auth-ready)
│ ├── test-signal.sh # pipeline verification (11 checks)
│ └── render-c4.sh # render PlantUML → SVG
├── tests/
│ ├── run-all.sh # test orchestrator (--e2e for Docker smoke)
│ ├── test-shell-syntax.sh # bash -n + shellcheck
│ ├── test-config-validate.sh # JSON + YAML validation
│ ├── test-hooks.sh # behavioral tests (41 tests)
│ ├── test-parsers.sh # session parser tests (37 tests)
│ └── fixtures/ # minimal session logs (Claude, Codex, Gemini)
├── configs/
│ ├── otel-collector/ # receivers → processors → exporters
│ ├── prometheus/ # scrape targets + alert rules
│ ├── alertmanager/ # routing, Telegram/Slack/Discord receivers
│ ├── loki/ # storage + 15 recording rules
│ ├── tempo/ # trace storage, 7d retention
│ └── grafana/ # provisioning + 9 dashboard JSONs
└── docs/c4/ # architecture diagrams
```
## Testing
128 automated tests across 4 suites, plus a Docker-based E2E smoke test:
```bash
bash tests/run-all.sh # unit tests: syntax, configs, hooks, parsers
bash tests/run-all.sh --e2e # + Docker E2E (starts stack, runs test-signal.sh)
```
| Suite | Tests | What it checks |
|-------|-------|----------------|
| Shell Syntax | 24 | `bash -n` on all scripts, shellcheck (if installed) |
| Config Validation | 26 | JSON dashboards (jq) + YAML configs (PyYAML) + promtool rules + alert regression |
| Hook Behavior | 41 | PreToolUse guard, PostToolUse metrics, Stop compaction, all Gemini hooks, Codex, install/uninstall |
| Session Parsers | 37 | Span count, required fields, attributes, error status, trace_id consistency, context breakdown, per-turn spans |
CI runs automatically on push/PR via [GitHub Actions](.github/workflows/test.yml).
## Contributing
Issues and pull requests are welcome. Before submitting changes, run the tests:
```bash
bash tests/run-all.sh
```
## License
[Elastic License 2.0](LICENSE) — free to use, modify, and distribute. Cannot be offered as a hosted or managed service.
Part of the [Shepard System](https://github.com/shepard-system).