https://github.com/nujovich/hermes-telemetry

Budget enforcement + observability plugin for Hermes Agent. Stops runaway costs before they happen.

# hermes-telemetry ☤

> *Observability + budget guardrails for [Hermes Agent](https://github.com/NousResearch/hermes-agent)*

**Budget enforcement + observability for Hermes Agent. The only plugin that can stop a run before it overspends.**

A comprehensive telemetry plugin that captures real usage data, enforces budget limits, and provides detailed cost analysis for AI agent operations. Built for the [Hermes Agent Challenge](https://dev.to/devteam/join-the-hermes-agent-challenge-1000-in-prizes-13cd) by [Nadia Ujovich](https://nadiaujovich.dev).

**The differentiator: it can _stop_ work that's about to overspend — not just report it after the fact.** Set a daily cap below current spend, and the next cron run is blocked by the budget:

![Budget enforcement demo: a $0.001 daily global cap is set, current spend already exceeds it, and the next marketing cron run is blocked by the resulting hard breach](docs/budget_enforcement.gif)

*`/budget set global daily 0.001` writes the cap to `budget.yaml`; current spend ($0.0102) already exceeds it, so `/budget` re-renders at 1020% `[daily]` — a hard breach — and the next marketing cron run is blocked by the budget.*

[![Hermes Agent](https://raw.githubusercontent.com/NousResearch/hermes-agent/HEAD/assets/banner.png)](https://raw.githubusercontent.com/NousResearch/hermes-agent/HEAD/assets/banner.png)

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://camo.githubusercontent.com/08cef40a9105b6526ca22088bc514fbfdbc9aac1ddbf8d4e6c750e3a88a44dca/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d626c75652e737667) [![Tests: 94 passing](https://img.shields.io/badge/Tests-94%20passing-green.svg)](https://camo.githubusercontent.com/89bc4bc6079d0e919e0c1363852fe900e05cb49429800097aa3ca83908c5cd59/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f54657374732d393425323070617373696e672d677265656e2e737667) [![Provider Support](https://img.shields.io/badge/Providers-OpenRouter%20%7C%20OpenAI%20%7C%20Anthropic-orange.svg)](https://camo.githubusercontent.com/cf0938e4acec0cd17c14dcf61a72734ffd03e8fff8eb44e359994f6ea773bfad/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f50726f7669646572732d4f70656e526f757465722532302537432532304f70656e4149253230253743253230416e7468726f7069632d6f72616e67652e737667) [![Challenge Entry](https://img.shields.io/badge/Hermes%20Agent-Challenge%20Entry-purple.svg)](https://camo.githubusercontent.com/d0c993fdf35127e435629279025d4b1892e351f5e04ce1547329686aa4223366/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4865726d65732532304167656e742d4368616c6c656e6765253230456e7472792d707572706c652e737667)

-----

Hermes Agent runs autonomously — across sessions, platforms, and cron jobs — which
means it can keep spending even when you're not watching.
**hermes-telemetry lives inside the runtime** and enforces hard budget limits there:
once a limit is breached, further tool calls are blocked and the cron job is paused.

> This plugin addresses [NousResearch/hermes-agent#6642](https://github.com/NousResearch/hermes-agent/issues/6642) —
> the open feature request for a first-class telemetry and budget subsystem for Hermes Agent.

```
Your Hermes session
    ↓ every API call
hermes-telemetry (native plugin)
    → tracks tokens + cost in real time
    → enforces budget limits mid-session
    → logs to SQLite with WAL mode
    → syncs OpenRouter pricing automatically
    ↓ if budget OK
LLM provider
```

> **Not a log reader.** TokenTelemetry and similar tools read what already happened.
> hermes-telemetry hooks into the Hermes runtime and can *stop* what’s about to happen.

-----

**Design principle:** observability is invisible to the model. Everything goes through hooks. The only user-facing surface is `/stats` and `/budget`.

-----

## Table of Contents

- [Screenshots](#screenshots)
  - [Dashboard (Web UI)](#dashboard-web-ui)
  - [Slash Commands](#slash-commands)
- [What It Measures](#what-it-measures)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Setup Wizard](#setup-wizard)
- [Slash Commands](#slash-commands-1)
  - [/stats](#stats)
  - [/budget](#budget)
- [Dashboard (Web UI)](#dashboard-web-ui-1)
  - [Auto-Refresh](#auto-refresh)
  - [Features](#features)
- [Configuration](#configuration)
  - [pricing.yaml](#pricingyaml)
  - [budget.yaml](#budgetyaml)
- [Pricing Auto-Refresh](#pricing-auto-refresh)
  - [How It Works](#how-it-works)
  - [Estimated-Price Models](#estimated-price-models)
  - [CLI Usage](#cli-usage)
- [Architecture](#architecture)
  - [Hook Plugin](#hook-plugin)
  - [Database Schema](#database-schema)
  - [Concurrency Model](#concurrency-model)
- [Budget Enforcement](#budget-enforcement)
  - [How It Works](#how-it-works-1)
  - [Enforcement Levels](#enforcement-levels)
  - [Estimated Data and Budget Degradation](#estimated-data-and-budget-degradation)
- [Provider Probe: Verifying Your Provider Returns Real Usage](#provider-probe-verifying-your-provider-returns-real-usage)
- [Proof of Concept](#proof-of-concept)
  - [Setup](#setup)
  - [Pricing Capture](#pricing-capture)
  - [Budget Enforcement Test](#budget-enforcement-test)
  - [Cron Job Cost Comparison](#cron-job-cost-comparison)
  - [Results Summary](#results-summary)
- [Comparison](#comparison)
- [Running Tests](#running-tests)
- [Data Location](#data-location)
- [Known Limitations](#known-limitations)
- [Troubleshooting](#troubleshooting)
- [License](#license)
- [Hermes Agent Challenge](#hermes-agent-challenge)

-----

## Screenshots

### Dashboard (Web UI)

A standalone HTML dashboard for users who prefer a visual interface over slash commands. Served locally, reads directly from the telemetry SQLite database.

[![Dashboard overview](https://github.com/nujovich/hermes-telemetry/raw/main/docs/screenshots/dashboard-overview.png)](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/dashboard-overview.png)

*The dashboard auto-refreshes every 30 seconds. Shows sessions, API calls, tokens, cost, budget status, daily cost trends, top tools, cost by cron job, provider distribution, and recent sessions.*

### Slash Commands

#### `/stats` — Session analytics

[![Stats output](https://github.com/nujovich/hermes-telemetry/raw/main/docs/screenshots/stats-output.png)](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/stats-output.png)

#### `/budget` — Current spending vs limits

[![Budget output](https://github.com/nujovich/hermes-telemetry/raw/main/docs/screenshots/budget-output.png)](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/budget-output.png)

#### `/stats cron week` — Cron job cost breakdown

[![Cron output](https://github.com/nujovich/hermes-telemetry/raw/main/docs/screenshots/cron-output.png)](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/cron-output.png)

#### `/stats providers` — Real vs estimated usage + estimated-price warning

[![Providers output](https://github.com/nujovich/hermes-telemetry/raw/main/docs/screenshots/providers-output.png)](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/providers-output.png)

-----

## What It Measures

|Metric |Source |Real or Estimated |
|-----------------------------------------|-------------------------------|------------------------|
|Tokens in / out per API call |`post_api_request.usage` |✅ Real (from provider) |
|Cache read / write tokens |`post_api_request.usage` |✅ Real (from provider) |
|Reasoning tokens |`post_api_request.usage` |✅ Real (from provider) |
|API call latency |`post_api_request.api_duration`|✅ Real (ms) |
|Tool call latency & success/failure |`post_tool_call` |✅ Real |
|Session / cron job wall time |`started_at` → `ended_at` |✅ Real |
|Model & provider name |`post_api_request` |✅ Real |
|Platform (cli / cron / telegram / …) |`on_session_start.platform` |✅ Real |
|Cron job ID |Parsed from `session_id` |✅ Real |
|Subagent invocation count |`subagent_stop` hook |✅ Real (proxy) |
|**Cost (USD)** |Local pricing table × tokens |⚠️ **Estimated** |
|Tokens when provider returns `usage=None`|Fallback approximation |⚠️ **Estimated, flagged**|

Cost is always an **estimate** computed from a locally maintained pricing table. No external pricing API is called when a cost is computed; the table itself can be refreshed from OpenRouter (see [Pricing Auto-Refresh](#pricing-auto-refresh)). When the provider returns no usage data, tokens are estimated from a pre-request approximation plus the response length, and the row is flagged as `estimated=1`, so `/stats` and `/budget` show a `~` prefix and an “estimated data” percentage.
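As a rough illustration of the estimate, cost is the token count multiplied by a per-million-token price. This is a minimal sketch, not the plugin's code: `estimate_cost_usd` and `PRICES` are hypothetical names, and the rates are the Sonnet figures from the `pricing.yaml` example in this README.

```python
# Hypothetical sketch of a per-call cost estimate (not the plugin's actual code).
# Prices are USD per 1 million tokens, as in pricing.yaml.
PRICES = {
    "openrouter/anthropic/claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
}


def estimate_cost_usd(model: str, tokens_in: int, tokens_out: int) -> float:
    """Return the estimated cost in USD; unknown models are recorded as $0.00."""
    price = PRICES.get(model.lower())
    if price is None:
        return 0.0
    return (tokens_in * price["input"] + tokens_out * price["output"]) / 1_000_000


cost = estimate_cost_usd("openrouter/anthropic/claude-sonnet-4-6", 10_000, 2_000)
print(f"${cost:.6f}")  # $0.060000
```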

-----

## Installation

Hermes plugins are **opt-in** — you must both install and enable the plugin.

### Option A: Install from GitHub

```
hermes plugins install nujovich/hermes-telemetry
hermes plugins enable hermes-telemetry
```

### Option B: Manual install

```
git clone https://github.com/nujovich/hermes-telemetry ~/.hermes/plugins/hermes-telemetry
hermes plugins enable hermes-telemetry
```

**Important:** restart the Hermes gateway after enabling:

```
hermes gateway restart
```

> **Note:** Plugin changes only take effect after a gateway restart. The gateway loads the plugin registry at startup. If you enable a plugin and cron jobs don’t appear in `/stats cron week`, this is the most likely cause.

-----

## Quick Start

1. Install and enable the plugin (see above)
1. Restart the gateway
1. Run any session, then type `/stats` to see captured data
1. Optionally configure `pricing.yaml` and `budget.yaml` (see below)

That’s it. The plugin captures data automatically — no agent action required.

-----

## Setup Wizard

hermes-telemetry includes a first-time setup wizard that runs automatically on first
plugin load when `pricing.yaml` and/or `budget.yaml` are missing. It can also be
triggered manually at any time with the `/setup` slash command.

### Auto-setup (first load)

On first load, if either config file is missing, the plugin auto-generates defaults:

- **Pricing:** fetches all models with fixed pricing from the OpenRouter API and merges
them with ~30 built-in defaults (Anthropic, OpenAI, DeepSeek, Google, Meta, Nous).
New prices take effect immediately — no gateway restart needed.
- **Budget:** writes a conservative global budget (`$5.00/day`, `$100.00/month`) with
an 80% soft warning and 100% hard cap.

### `/setup` slash command

Use `/setup` to check configuration status or reconfigure individual files.

```
/setup → show current status (which files exist)
/setup pricing auto → built-in defaults + fetch from OpenRouter API
/setup pricing minimal → built-in defaults only (~30 models, no network)
/setup pricing skip → skip (unrecognized models will record $0.00 cost)
/setup budget default → recommended global budget ($5/day, $100/month)
/setup budget custom → instructions for setting your own limits manually
/setup budget skip → no enforcement (costs still tracked)
```

#### Pricing options

| Option | Models | Network |
|--------|--------|---------|
| `auto` | ~30 built-in + all OpenRouter fixed-price models | Yes (OpenRouter API) |
| `minimal` | ~30 built-in only | No |
| `skip` | None — models will record `$0.00` cost | No |

#### Budget options

| Option | Behavior |
|--------|----------|
| `default` | Global: `$5.00/day`, `$100.00/month`. Soft warning at 80%, hard block at 100% |
| `custom` | Prints the `/budget set` commands for manual configuration |
| `skip` | Costs tracked but never enforced |

### Re-running setup

Setup skips files that already exist. To reconfigure:

```bash
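# Reprice from scratch: delete the file in a shell...
rm ~/.hermes/telemetry/pricing.yaml
# ...then run the slash command inside a Hermes session (it is not a shell command)
/setup pricing auto

# Reset budget the same way
rm ~/.hermes/telemetry/budget.yaml
/setup budget default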
```

> **Note:** Pricing changes take effect immediately without a gateway restart. Budget
> changes require a restart.

-----

## Slash Commands

### `/stats`

```
/stats → last 24h summary (sessions, tokens, cost, top tools)
/stats today → same as /stats
/stats week → last 7 days
/stats month → last 30 days
/stats cron → breakdown by cron_job_id (last 7 days)
/stats cron week → cron breakdown, last 7 days
/stats cron month → cron breakdown, last 30 days
/stats cron today → cron breakdown, last 24 hours
/stats providers → per-provider: real vs estimated calls + cost (last 24h)
/stats providers week → provider breakdown, last 7 days
/stats models → per-model breakdown within each provider (last 24h)
/stats models week → per-model breakdown, last 7 days
/stats raw [N] → last N raw run records (default 20, max 200)
```

**Example output (`/stats`):**

```
hermes-telemetry — last 24 h
============================================
Sessions : 14
Success rate : 92.9% (ok=13, failed=1)
API calls : 47
Tool calls : 183
Tokens in : 1,240,500
Tokens out : 87,300
Cost (est.) : $0.004822
Avg latency : 1.2s
Avg duration : 48.3s

Top tools:
Tool                     Calls  Failures   Avg ms
--------------------------------------------------------
read_file                   92         0     12ms
terminal                    51         3    340ms
write_file                  28         0     18ms
```

**Example output (`/stats cron week`):**

```
hermes-telemetry — cron jobs (last 7 days)
========================================================================
Job ID        Runs   OK  Fail     Tok-in  Tok-out        Cost  Avg dur
--------------------------------------------------------------------------
09dd0c24f29b     3    3     0    892,341   12,405   $0.314378     2.1m
d68c2728b513     1    1     0    445,119    8,200   $2.225595     4.7m
```

**Example output (`/stats providers`):**

```
hermes-telemetry — providers (last 24 h)
========================================================================
Provider       Calls   Real    Est   Est%        Cost
-------------------------------------------------------------------
openrouter        66     66      0     0%   $0.916782

Est% = share of calls where the provider returned no usage data
(tokens estimated locally).
If Est% > 0 for your main provider, budget hard-verdicts may be
degraded to soft under on_estimated.mode: warn_only.
```

**Example output (`/stats models`):**

```
hermes-telemetry — models (last 24 h)
================================================================================================
Provider     Model                            Calls   Real    Est        Cost
----------------------------------------------------------------------------------------------
openrouter   owl-alpha                           66     66      0   $0.000000
openrouter   anthropic/claude-sonnet-4-6         42     42      0   $0.314378
openrouter   anthropic/claude-opus-4-7            8      8      0   $2.225595

Rows are grouped by provider, then by calls (desc). A model showing $0.00 has no price entry
in pricing.yaml — run /setup pricing auto to refresh, or add it manually.
```

Breaks each provider's spend down to individual models. Rows are grouped by provider (ascending), then ordered by call count within each provider; the `Model` column is kept wide so dated model keys stay readable. Columns: `Calls` (total), `Real` (calls with provider-reported usage), `Est` (calls with locally estimated tokens), and `Cost`. A model showing `$0.000000` has no price entry in `pricing.yaml`.

### `/budget`

```
/budget → status of every scope (spent / limit / %)
/budget cron → per-cron-job budgets, with soft/hard flags
/budget set global daily 5.00 → set or raise a limit (persists + hot-reloads)
/budget set cron_job daily 1.00 → set default per-cron-job limit
/budget set sender daily 2.00 → set default per-sender limit
```

**Example output (`/budget`):**

```
hermes-telemetry — budget status
============================================================
global $ 0.1812 / $ 2.00 9% [daily]

Legend: (blank)=ok !=soft (≥80%) █=hard (≥100%) ~est=estimated data
```

**Status flags:**

|Flag |Meaning |
|-------|-----------------------------------------------------------|
|(blank)|Within budget (`< 80%`) |
|`!` |Soft warning (≥ 80%) — notice injected into conversation |
|`█` |Hard breach (≥ 100%) — tool calls blocked, cron jobs paused|
|`~est` |Verdict based partly on estimated (usage=None) data |

-----

## Dashboard (Web UI)

A standalone HTML dashboard for users who prefer a visual interface over slash commands. Zero dependencies — uses only Python stdlib.

### Auto-Refresh

The dashboard auto-refreshes every 30 seconds. No manual reload needed.

### Features

- **Summary cards**: Sessions, OK/failed, API calls, tokens in, cost
- **Budget bar**: Real-time spend vs limit with progress indicator
- **Daily cost chart**: 7-day line chart of spending
- **Top tools chart**: Bar chart of most-used tools
- **Cost by cron job**: Per-job cost breakdown
- **Provider distribution**: Donut chart (nous / openrouter / anthropic)
- **Cron jobs table**: Runs, tokens, cost, avg duration, last run
- **Recent sessions table**: All sessions with platform, model, status, cost
- **Time range selector**: Last 24h / 7 days / 30 days

### Usage

```
cd ~/.hermes/plugins/hermes-telemetry/dashboard
python3 serve.py # http://localhost:8765 (loopback only)
python3 serve.py --port 9090 # custom port, still loopback
python3 serve.py 9090 # positional port (back-compat)
```

Then open `http://localhost:8765` in your browser.

### Accessing the dashboard from another host

The dashboard has **no authentication** — anyone who can reach the port sees
every captured token, cost, and tool-call detail. By default it binds to
`127.0.0.1`, which is unreachable from other machines.

If your Hermes server is headless (Pi, VPS, NAS) and you browse from a laptop,
two options:

**Recommended — SSH tunnel** (no server-side change, leaves the safe default in
place):

```bash
# Start the dashboard on the server, detached so the ssh command returns
ssh server "cd ~/.hermes/plugins/hermes-telemetry/dashboard && nohup python3 serve.py >/dev/null 2>&1 &"

# Tunnel from your client
ssh -L 8765:localhost:8765 -N server &

# Browse on the client (`open` is macOS; use `xdg-open` on Linux)
open http://localhost:8765
```

**Trusted-LAN shortcut — `--host 0.0.0.0`:**

```bash
python3 serve.py --host 0.0.0.0
```

The script prints a warning when binding to any non-loopback interface. Only
use this on a network where you trust every host. **Do not expose to the
public internet or to networks that include untrusted hosts** — the dashboard
ships without an auth layer by design (see CONTRIBUTING.md if you want to add
one).

-----

## Configuration

Configuration lives in `~/.hermes/telemetry/`:

```
~/.hermes/telemetry/
├── telemetry.db ← SQLite database (WAL mode)
├── telemetry.log ← plugin log (errors / debug)
├── pricing.yaml ← optional pricing overrides
└── budget.yaml ← optional spend budgets
```

If these files don’t exist, the plugin still works with defaults: models missing from the built-in pricing table record `$0.00`, and budgets are disabled.

### `pricing.yaml`

Override model prices in USD per 1 million tokens. Without overrides, unknown models log a one-time warning and record cost as `$0.00`.

**Full format:**

```yaml
models:
  # Free model
  "openrouter/owl-alpha":
    input: 0.00
    output: 0.00

  # Paid model with full cache/reasoning split
  "openrouter/anthropic/claude-sonnet-4-6":
    input: 3.00
    output: 15.00
    cache_read: 0.30
    cache_write: 3.75
    reasoning: 15.00

  # Minimal override (cache prices derived from multipliers)
  "openrouter/anthropic/claude-opus-4-7":
    input: 5.00
    output: 25.00

defaults:
  cache_read_multiplier: 0.10   # cache_read = input * 0.10 if not specified
  cache_write_multiplier: 1.25  # cache_write = input * 1.25 if not specified
```

**Matching rules (in order):**

1. Exact match (case-insensitive) against `models:` keys in your YAML
1. Exact match against the built-in pricing table (~35 models)
1. Longest-prefix match (e.g. `claude-sonnet` matches `claude-sonnet-4-6-future`)
1. Unknown → `$0.00` with a one-time warning in `telemetry.log`

The built-in table covers: Anthropic (Claude 3/4 family), OpenAI (GPT-4o, GPT-4, o1, o3, o4), DeepSeek, Gemini, Llama, and Hermes models. Prices sourced from official provider pages (May 2026).
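The matching rules above can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's implementation: `resolve_price` is a hypothetical name, and the sketch assumes the prefix step considers keys from both your YAML and the built-in table.

```python
# Hypothetical sketch of the four matching rules (names are illustrative).
from typing import Optional


def resolve_price(model: str, user: dict, builtin: dict) -> Optional[dict]:
    """Resolve a model name to a price entry, or None when the model is unknown."""
    key = model.lower()
    user_l = {k.lower(): v for k, v in user.items()}
    builtin_l = {k.lower(): v for k, v in builtin.items()}

    # 1. Exact, case-insensitive match against the user's `models:` keys.
    if key in user_l:
        return user_l[key]
    # 2. Exact match against the built-in table.
    if key in builtin_l:
        return builtin_l[key]
    # 3. Longest known key that is a prefix of the model name.
    #    Assumption: both tables take part, and user entries win ties.
    candidates = [k for k in (*user_l, *builtin_l) if key.startswith(k)]
    if candidates:
        best = max(candidates, key=len)
        return user_l.get(best, builtin_l.get(best))
    # 4. Unknown: the caller records $0.00 and logs a one-time warning.
    return None


builtin = {
    "claude": {"input": 1.00, "output": 5.00},          # made-up rates for the demo
    "claude-sonnet": {"input": 3.00, "output": 15.00},
}
print(resolve_price("claude-sonnet-4-6-future", {}, builtin))
```

The longer key `claude-sonnet` wins over `claude` here, which is the behavior the example in rule 3 describes.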

### `budget.yaml`

Configure spend guardrails. No file → budgets disabled.

```yaml
budgets:
  global:
    daily_usd: 2.00
    monthly_usd: 50.00
  per_cron_job:
    default:
      daily_usd: 1.00
    overrides:
      daily_email_report:
        daily_usd: 3.00
  per_sender:
    default:
      daily_usd: 2.00
    overrides:
      premium_user_123:
        daily_usd: 5.00

thresholds:
  soft_pct: 0.80  # warn at 80% of limit
  hard_pct: 1.00  # enforce at 100%

on_estimated:
  mode: enforce   # warn_only | enforce
```

**Scope resolution:**

|Scope |How spend is calculated |
|--------------|-------------------------------------------------------------|
|`global` |All sessions + all cron jobs combined |
|`per_cron_job`|Sessions where `cron_job_id` matches (excludes subagent cost)|
|`per_sender` |Sessions from a specific sender (multi-user gateways) |

**Window math:** daily and monthly windows are computed in the user’s local timezone. A cron job that runs at 11:59 PM and another at 12:01 AM count against different daily windows.
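To make the window boundary concrete, here is a minimal sketch of how a UTC timestamp could be mapped to daily and monthly window keys in local time. `period_keys` and the fixed UTC-3 offset are assumptions for the demo; the plugin's own key format is not documented here.

```python
# Hypothetical sketch: derive budget window keys in the user's local timezone.
from datetime import datetime, timedelta, timezone

LOCAL_TZ = timezone(timedelta(hours=-3))  # assumption: a fixed UTC-3 offset for the demo


def period_keys(started_at_utc: str, tz: timezone = LOCAL_TZ) -> tuple:
    """Map an ISO-8601 UTC timestamp to (daily_key, monthly_key) in local time."""
    local = datetime.fromisoformat(started_at_utc).astimezone(tz)
    return local.strftime("%Y-%m-%d"), local.strftime("%Y-%m")


# Two runs on the same UTC date, two minutes apart, on either side of local midnight.
late = period_keys("2026-05-11T02:59:00+00:00")   # 23:59 local on May 10
early = period_keys("2026-05-11T03:01:00+00:00")  # 00:01 local on May 11
print(late[0], early[0])  # 2026-05-10 2026-05-11
```

Both runs share a monthly window but count against different daily windows, matching the 11:59 PM / 12:01 AM case above.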

-----

## Pricing Auto-Refresh

The plugin can automatically fetch model pricing from OpenRouter’s public API, eliminating the need to manually maintain `pricing.yaml` for hundreds of models.

### How It Works

- **Source**: OpenRouter public API (`https://openrouter.ai/api/v1/models`) — no auth required
- **Frequency**: Once per 24 hours (tracked via sentinel file)
- **Trigger**: Automatically on plugin load (gateway startup), or manually via CLI
- **Merge strategy**:
- User overrides in `pricing.yaml` are **always preserved** — manual entries take priority over auto-fetched ones
- New models from the API are added automatically
- Previously auto-fetched models are updated when prices change
- Models are tagged with `_auto: true` and `_source: openrouter` for traceability
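The merge strategy above might look roughly like this. It is a sketch under assumptions: `merge_pricing` is a hypothetical helper, and entries without the `_auto` tag are treated as manual overrides.

```python
# Hypothetical sketch of the merge strategy (not the plugin's actual code).
def merge_pricing(existing: dict, fetched: dict, source: str = "openrouter") -> dict:
    """Merge fetched prices into existing ones without touching manual entries."""
    merged = dict(existing)
    for model, price in fetched.items():
        current = merged.get(model)
        # Assumption: an entry without `_auto` was written by the user and wins.
        if current is not None and not current.get("_auto"):
            continue
        # New or previously auto-fetched entries are (re)written and tagged.
        merged[model] = {**price, "_auto": True, "_source": source}
    return merged


existing = {
    "manual/model": {"input": 1.00, "output": 2.00},
    "auto/model": {"input": 0.99, "output": 9.99, "_auto": True, "_source": "openrouter"},
}
fetched = {
    "manual/model": {"input": 5.00, "output": 6.00},  # ignored: manual entry wins
    "auto/model": {"input": 0.20, "output": 1.15},    # updated
    "new/model": {"input": 5.00, "output": 25.00},    # added
}
merged = merge_pricing(existing, fetched)
print(merged["auto/model"])
```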

### Estimated-Price Models

Some OpenRouter models have no fixed pricing (e.g. `auto` routing, experimental models). These are represented with negative prices in the API.

The plugin handles these safely:

- Prices are normalized to `$0.00` (they don’t inflate cost calculations)
- Flagged with `_estimated_price: true` in `pricing.yaml`
- The budget engine detects when spend uses these models
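A minimal sketch of the normalization step, assuming a price entry is a plain dict with `input` and `output` fields; `normalize_price` is a hypothetical name, not the plugin's function.

```python
# Hypothetical sketch: clamp negative API prices to $0.00 and flag the entry.
def normalize_price(entry: dict) -> dict:
    """Return a copy with negative prices set to 0.0 and `_estimated_price` set."""
    out = dict(entry)
    flagged = False
    for field in ("input", "output"):
        if out.get(field, 0.0) < 0:
            out[field] = 0.0  # a negative price means "no fixed price" in the API
            flagged = True
    if flagged:
        out["_estimated_price"] = True
    return out


print(normalize_price({"input": -1.0, "output": -1.0}))
print(normalize_price({"input": 3.0, "output": 15.0}))
```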

**Budget degradation logic:**

|Condition |Effect |
|----------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
|`on_estimated.mode: warn_only` (default)|If >0% of calls use estimated-price models, **hard verdicts are degraded to soft** — the user gets a warning but tools aren’t blocked|
|`on_estimated.mode: enforce` |Hard verdicts take effect regardless |

### CLI Usage

```
# Dry run — see what would change
python -m hermes_telemetry.pricing_refresh --check

# Apply changes
python -m hermes_telemetry.pricing_refresh

# Verbose output
python -m hermes_telemetry.pricing_refresh --verbose
```

**Example output:**

```
INFO OpenRouterSource: fetched 320 models
Updated 3 model(s):

~ stepfun/step-3.7-flash (openrouter)
input: 0.9999 → 0.2000
output: 9.9999 → 1.1500

+ anthropic/claude-opus-4.8 (openrouter)
input=5.0000 output=25.0000

⚠ Model(s) with estimated pricing: openrouter/auto, openrouter/bodybuilder, openrouter/pareto-code
```

### Extending with New Sources

Add new pricing providers by subclassing `PricingSource`:

```python
from hermes_telemetry.pricing_refresh import PricingSource, register_source

class AnthropicSource(PricingSource):
    name = "anthropic"

    def fetch(self) -> dict[str, dict]:
        # Fetch from Anthropic's pricing page or API
        ...

register_source(AnthropicSource)
```

Sources are registered in `pricing_refresh.py` and fetched in parallel on each refresh cycle.

-----

## Architecture

### Hook Plugin

The plugin registers 10 hooks (out of 16 available in Hermes) plus 2 slash commands:

```
Hook                  Purpose
─────────────────────────────────────────────────────────────
on_session_start      Create run row, extract cron_job_id
pre_api_request       Stash approx_input_tokens for fallback
post_api_request      PRIMARY: record tokens, cost, latency
post_tool_call        Record tool name, success, duration
post_llm_call         Refresh session end timestamp
subagent_stop         Record delegate_task proxy on parent
on_session_end        Set final status (ok/error/interrupted)
on_session_finalize   Safety net: ensure run is closed
pre_llm_call          Soft budget alerts + capture sender_id
pre_tool_call         Hard budget enforcement (tool-gate)

**Why `post_api_request` is the primary hook for tokens:** The Hermes conversation loop can make multiple API calls per turn (retries, reasoning models, tool calls). Only `post_api_request` carries the canonical `usage` dict with token counts and cost data. `pre_llm_call` fires once per turn with no token data. `post_llm_call` fires after the tool loop with no token data.

**Cron job identification:** There is no `cron_job_id` in any hook. The plugin extracts it from the `session_id`, which follows the format `cron_{job_id}_{YYYYMMDD_HHMMSS}` (confirmed in Hermes source). An anchored regex handles job IDs that contain underscores.
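One way to write such an anchored regex is sketched below. The pattern and the `extract_cron_job_id` helper are assumptions based on the documented `cron_{job_id}_{YYYYMMDD_HHMMSS}` format, not the plugin's exact expression.

```python
# Hypothetical sketch: extract the job ID from a cron session_id.
import re
from typing import Optional

# Anchoring the timestamp at the end lets the job ID itself contain underscores.
CRON_SESSION_RE = re.compile(r"^cron_(?P<job_id>.+)_\d{8}_\d{6}$")


def extract_cron_job_id(session_id: str) -> Optional[str]:
    """Return the job ID for cron sessions, or None for any other session."""
    match = CRON_SESSION_RE.match(session_id)
    return match.group("job_id") if match else None


print(extract_cron_job_id("cron_09dd0c24f29b_20260510_235900"))        # 09dd0c24f29b
print(extract_cron_job_id("cron_daily_email_report_20260510_235900"))  # daily_email_report
print(extract_cron_job_id("20260510_235900_ab12cd"))                   # None
```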

### Database Schema

SQLite with WAL mode, per-thread connections, schema v3:

**`runs`** — one row per session (CLI session or cron job execution):

|Column |Description |
|--------------------------|--------------------------------------------------------------------------------|
|`session_id` |Primary key (`{YYYYMMDD_HHMMSS}_{uuid6}` for CLI, `cron_{job_id}_{ts}` for cron)|
|`platform` |`cli`, `cron`, `telegram`, `discord`, etc. |
|`cron_job_id` |Extracted from session_id when platform=cron |
|`model` |Model name (updated from last API call) |
|`provider` |Provider name (e.g. `openrouter`, `anthropic`) |
|`started_at` / `ended_at` |ISO-8601 UTC timestamps |
|`status` |`running`, `ok`, `error`, `interrupted` |
|`tokens_in` / `tokens_out`|Accumulated across all API calls in the session |
|`cost_usd` |Accumulated estimated cost |
|`duration_ms` |Wall time (ms) via `julianday()` |
|`api_calls` / `tool_calls`|Counters |
|`parent_session_id` |Reserved for future parent-child linking (not populated in v0.2) |
|`estimated_llm_calls` |Count of calls where provider returned `usage=None` |
|`sender_id` |For per-sender budgets (set via `pre_llm_call`) |

**`llm_calls`** — one row per individual API call:

All of `runs` token/cost columns, plus `cache_read_tokens`, `cache_write_tokens`, `reasoning_tokens`, `estimated` (boolean).

**`tool_calls`** — one row per tool execution:

`session_id`, `ts`, `tool_name`, `ok` (boolean), `latency_ms`.

**`budget_alerts`** — anti-spam ledger:

`scope`, `scope_id`, `window`, `period_key`, `level`, `fired_at`, `spent_usd`, `limit_usd`. Unique constraint prevents duplicate alerts.
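The anti-spam behavior can be sketched with a unique constraint plus `INSERT OR IGNORE`. The choice of columns in the constraint and the `should_alert` helper are assumptions; the README only states that a unique constraint exists.

```python
# Hypothetical sketch of the anti-spam ledger (constraint columns are assumed).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE budget_alerts (
        scope      TEXT NOT NULL,
        scope_id   TEXT NOT NULL,   -- '' for global: NULLs never collide in UNIQUE
        "window"   TEXT NOT NULL,   -- 'daily' or 'monthly'
        period_key TEXT NOT NULL,   -- e.g. '2026-05-10'
        level      TEXT NOT NULL,   -- 'soft' or 'hard'
        fired_at   TEXT NOT NULL,
        spent_usd  REAL,
        limit_usd  REAL,
        UNIQUE (scope, scope_id, "window", period_key, level)
    )
    """
)


def should_alert(scope, scope_id, window, period_key, level, spent, limit) -> bool:
    """Record the alert; return True only the first time it fires in this window."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO budget_alerts "
        "VALUES (?, ?, ?, ?, ?, datetime('now'), ?, ?)",
        (scope, scope_id, window, period_key, level, spent, limit),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the row already existed and was ignored


first = should_alert("global", "", "daily", "2026-05-10", "soft", 1.70, 2.00)
repeat = should_alert("global", "", "daily", "2026-05-10", "soft", 1.75, 2.00)
print(first, repeat)  # True False
```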

### Concurrency Model

Cron jobs run in a `ThreadPoolExecutor` (Hermes `cron/scheduler.py`). Multiple jobs can write to the DB simultaneously from different threads.

**Design:** per-thread SQLite connections via `threading.local()`. Each thread opens its own connection to the same WAL-mode DB file. A shared `_schema_lock` serializes DDL migrations on first connect (the switch to WAL mode requires a brief lock that `busy_timeout` alone doesn’t handle).

`busy_timeout=5000` ensures write collisions retry for 5 seconds before raising. `synchronous=NORMAL` balances durability with write performance (safe for WAL mode).
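The concurrency model described above can be approximated as follows. This is a sketch, not the plugin's code: `get_conn`, `worker`, and the reduced `tool_calls` schema are illustrative, while the PRAGMA values are the ones quoted in this section.

```python
# Hypothetical sketch: per-thread SQLite connections to one WAL-mode database.
import os
import sqlite3
import tempfile
import threading

_local = threading.local()
_schema_lock = threading.Lock()  # serializes the WAL switch and DDL


def get_conn(db_path: str) -> sqlite3.Connection:
    """Return this thread's connection, creating and configuring it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(db_path)
        conn.execute("PRAGMA busy_timeout=5000")  # retry write collisions for 5 s
        with _schema_lock:
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute(
                "CREATE TABLE IF NOT EXISTS tool_calls ("
                "session_id TEXT, ts TEXT, tool_name TEXT, ok INTEGER, latency_ms INTEGER)"
            )
            conn.commit()
        conn.execute("PRAGMA synchronous=NORMAL")
        _local.conn = conn
    return conn


def worker(db_path: str, session_id: str, n: int) -> None:
    """Simulate one cron job writing n tool-call rows from its own thread."""
    conn = get_conn(db_path)
    for i in range(n):
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO tool_calls VALUES (?, datetime('now'), ?, 1, ?)",
                (session_id, "read_file", i),
            )
    conn.close()
    _local.conn = None


db_path = os.path.join(tempfile.mkdtemp(), "telemetry.db")
threads = [
    threading.Thread(target=worker, args=(db_path, f"cron_job{i}", 10)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = get_conn(db_path).execute("SELECT COUNT(*) FROM tool_calls").fetchone()[0]
mode = get_conn(db_path).execute("PRAGMA journal_mode").fetchone()[0]
print(total, mode)  # 40 wal
```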

-----

## Budget Enforcement

> See the budget enforcement demo at the top of this README for an end-to-end walkthrough.

### How It Works

Every time the agent is about to do work, the plugin checks:

1. **`pre_llm_call`** (fires once per turn): evaluates all applicable budget scopes. If any has a `soft` or `hard` verdict that hasn’t been alerted yet this window, injects a one-time notice into the conversation context (anti-spam via `budget_alerts` table). Captures `sender_id`.
1. **`pre_tool_call`** (fires before every tool): re-evaluates budgets. If any scope is in `hard` breach, returns `{"action":"block","message":...}` which aborts the tool call.
1. **For cron jobs with `hard` breach:** additionally calls `cron.jobs.pause_job` to pause future runs.
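The tool gate in step 2 could be sketched like this. Only the shape of the returned dict comes from this README; the function name, its arguments, and the message text are hypothetical, and the real hook receives whatever arguments Hermes passes to `pre_tool_call`.

```python
# Hypothetical sketch of the tool gate; the hook signature is an assumption.
from typing import Dict, Optional


def pre_tool_call_gate(tool_name: str, verdicts: Dict[str, str]) -> Optional[dict]:
    """Return a block action if any scope is in hard breach, else None (allow)."""
    breached = sorted(scope for scope, level in verdicts.items() if level == "hard")
    if not breached:
        return None
    return {
        "action": "block",
        "message": f"Budget exceeded for {', '.join(breached)}; '{tool_name}' was not run.",
    }


print(pre_tool_call_gate("terminal", {"global": "soft"}))
print(pre_tool_call_gate("terminal", {"global": "hard", "cron_job:09dd0c24f29b": "ok"}))
```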

### Enforcement Levels

Hermes does **not** expose a way to abort an in-flight model call from a plugin: the return values of `pre_llm_call` and `pre_api_request` can’t cancel a call. So enforcement is honest about its reach:

|Level |Trigger |Effect |Repeat? |
|-----------------------|-----------------------------------------|------------------------------------------|-----------------------------------|
|**Soft** (≥ `soft_pct`)|Spend reaches 80% of limit (configurable)|One-time notice injected into conversation|Once per window per scope |
|**Hard** (≥ `hard_pct`)|Spend reaches 100% of limit |Every subsequent tool call is blocked |Every tool call until window resets|
|**Cron pause** |Any hard `cron_job` verdict |Job is paused for future runs |Once per window per scope |

The model response already in flight still completes and is billed. What’s prevented is *further* tool-driven work.

### Estimated Data and Budget Degradation

When the provider returns `usage=None`, the plugin estimates tokens and flags the row as `estimated=1`. Since these estimates may be inaccurate, the budget engine offers a safety valve:

**`on_estimated.mode: warn_only` (default):** If a hard verdict rests partly on estimated rows, it is **degraded to soft** — the user gets a warning but tools aren’t blocked. Rationale: a budget built on estimates shouldn’t hard-stop work.

**`on_estimated.mode: enforce`:** Hard verdicts take effect regardless of estimate quality. Use this when you trust your provider’s usage data (Est% = 0) or when estimates are acceptable.

The `/stats providers` command shows the `Est%` column so you can see at a glance whether your provider returns real usage data.

**Estimated-price models:** Some models (e.g. OpenRouter `auto` routing) have no fixed pricing. These are flagged with `_estimated_price: true` in `pricing.yaml` and normalized to `$0.00`. If >0% of calls use these models, budget hard-verdicts are also degraded to soft under `warn_only` mode. See [Pricing Auto-Refresh](#pricing-auto-refresh) for details.
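Putting the thresholds and the degradation rule together, a verdict function might look like the sketch below. `evaluate` is a hypothetical name, and treating a non-positive limit as "no budget" is an assumption made for the demo.

```python
# Hypothetical sketch of verdict evaluation with estimated-data degradation.
def evaluate(
    spent: float,
    limit: float,
    soft_pct: float = 0.80,
    hard_pct: float = 1.00,
    has_estimated: bool = False,
    mode: str = "warn_only",
) -> str:
    """Return 'ok', 'soft', or 'hard' for one budget scope and window."""
    if limit <= 0:
        return "ok"  # assumption: no positive limit means no budget for this scope
    ratio = spent / limit
    if ratio >= hard_pct:
        # Under warn_only, a hard verdict resting on estimated data is degraded.
        if has_estimated and mode == "warn_only":
            return "soft"
        return "hard"
    if ratio >= soft_pct:
        return "soft"
    return "ok"


print(evaluate(0.1812, 0.001))                               # hard
print(evaluate(0.1812, 2.00))                                # ok
print(evaluate(2.50, 2.00, has_estimated=True))              # soft (degraded)
print(evaluate(2.50, 2.00, has_estimated=True, mode="enforce"))  # hard
```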

-----

## Provider Probe: Verifying Your Provider Returns Real Usage

Run this **once** after enabling the plugin:

1. Run one short session (any minimal task works)
1. Execute `/stats providers`
1. Look at the `Est%` column for your provider:
   - **`0%`** → provider returns real usage data. Budget verdicts are based on real numbers. Set `on_estimated.mode: enforce` for strict enforcement. ✅
   - **`> 0%`** → provider omits usage in some responses. Those calls are estimated and flagged. Budget hard-verdicts will be degraded to soft under `warn_only`. The `telemetry.log` will have a **one-time WARNING** per provider. ⚠️

-----

## Proof of Concept

The following PoC was executed live to validate the plugin end-to-end.

### Setup

- **Hermes gateway** running on Linux (WSL), model `openrouter/owl-alpha` (free tier)
- **Plugin:** hermes-telemetry v0.2.0, loaded in gateway process
- **DB:** `/home/nujovich/.hermes/telemetry/telemetry.db` (schema v3, WAL mode)
- **6 cron jobs** configured, 2 used for this PoC

### Pricing Capture

Added models to `~/.hermes/telemetry/pricing.yaml`:

```yaml
models:
  "openrouter/owl-alpha":
    input: 0.00
    output: 0.00
  "openrouter/anthropic/claude-sonnet-4-6":
    input: 3.00
    output: 15.00
    cache_read: 0.30
    cache_write: 3.75
  "openrouter/anthropic/claude-opus-4-7":
    input: 5.00
    output: 25.00
    cache_read: 0.50
    cache_write: 6.25
```

Set `on_estimated.mode: enforce` for deterministic enforcement.

### Budget Enforcement Test

**Step 1 — Trigger a hard breach:**

- Budget: `global.daily_usd: 0.001` ($0.001/day)
- Ran MCP Lead Gen job (model: `claude-sonnet-4-6`, ~$3/$15 per 1M)
- Result: job spent $0.1812 on first run → **18,120% of daily limit** → █ hard breach → **job auto-paused**

```
█ global $0.1812 / $0.00 18120% [daily]
↑ (0.001 rounded to 0.00 in display)
```

**Step 2 — Raise budget and resume:**

```
/budget set global daily 2.00
```

Result after `/budget set`:

```
global $0.1812 / $2.00 9% [daily]
```

**Step 3 — Verify job runs normally:**

- MCP Lead Gen re-ran successfully under the $2.00 daily budget
- Second run confirmed: `state: scheduled`, `paused_at: null`

### Cron Job Cost Comparison

|Job |Model |Price (input/output) |
|--------------------|-------------------|---------------------|
|MCP Lead Gen |`claude-sonnet-4-6`|$3.00 / $15.00 per 1M|
|Marketing Highlights|`claude-opus-4-7` |$5.00 / $25.00 per 1M|
|Base sessions (CLI) |`owl-alpha` |$0.00 / $0.00 (free) |

**Results from SQLite (`/stats` after all runs):**

- **CLI sessions** (owl-alpha, free): ~1M tokens in → **$0.00**
- **MCP Lead Gen** (claude-sonnet-4-6): ~892K tokens in → **$0.314**
- **Marketing Highlights** (claude-opus-4-7): ~445K tokens in → **$2.23** (at the list prices above, Opus is about 1.7x Sonnet per token; the effective gap in these runs is larger)

### Results Summary

|Component |Status |
|-------------------------------------|---------------------------------------------------|
|Token capture from provider |✅ Real usage (`estimated=0`) |
|Cost estimation with pricing table |✅ Accurate to pricing YAML |
|Cron job session tracking |✅ Captured via `session_id` regex |
|Budget soft alerts |✅ One-time context injection |
|Budget hard enforcement |✅ Paused job at $0.001/day |
|Budget hot-reload via `/budget set` |✅ Cache cleared, new limit active |
|Multi-model cost comparison |✅ Sonnet vs Opus vs Free |
|Pricing auto-refresh (OpenRouter API)|✅ 320 models fetched, manual overrides preserved |
|Estimated-price model handling |✅ Negative prices → $0.00, budget degradation |
|Dashboard (HTML, auto-refresh 30s) |✅ Charts, tables, budget bar, provider distribution|
|94 tests pass |✅ |

-----

## Comparison

| |hermes-telemetry|TokenTelemetry |Martin Loop |
|------------------|----------------|---------------------|--------------------|
|Hermes-native |✅ Native plugin |❌ Reads external logs|❌ No Hermes support |
|Budget enforcement|✅ Stops the run |❌ Observe only |✅ But not for Hermes|
|Real-time |✅ Pre-call |❌ Post-hoc |✅ Pre-attempt |
|Supported agents  |Hermes only     |Any agent            |Claude Code / Codex |
|Local dashboard |✅ |✅ (more complete) |❌ |
|Open source |✅ MIT |✅ MIT |✅ MIT |

**When to use TokenTelemetry instead:** if you need a multi-agent dashboard (Claude Code + Codex + Hermes in one place), TokenTelemetry is the right choice. hermes-telemetry is purpose-built for Hermes operators who need budget enforcement, not just visibility.

-----

## Running Tests

```
cd hermes-telemetry
pip install pytest pyyaml
pytest tests/ -v
```

**Test suite (94 tests):**

|File |Tests|Coverage |
|---------------------------------|-----|-------------------------------------------------------------------------------------------------------------------------------|
|`test_db.py` |15 |Schema v1→v3 migrations, CRUD, aggregations, concurrent WAL writes (10 threads × 5 writes) |
|`test_pricing.py` |17 |Cache/reasoning split, no double-counting of `prompt_tokens`, YAML overrides, prefix matching, unknown model handling |
|`test_init.py` |6 |Cron session ID regex, tool success/failure parsing |
|`test_budget.py` |17 |ok/soft/hard verdicts, estimated-to-soft degradation, anti-spam ledger, cron pause, per-scope routing, `/budget set` hot-reload|
|`test_stats_providers.py` |8 |Real vs estimated per provider, `/stats providers` output format, Nous warning dedup |
|`test_subagent_reconciliation.py`|4 |Parent + child hook sequence, token reconciliation, no double-counting |

No live Hermes is required — all tests are self-contained with in-memory SQLite.
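
For readers who want to see what a self-contained concurrency test of this kind can look like, here is a sketch in the spirit of the "10 threads × 5 writes" case. It is not taken from the repository: the `events` table and the function names are hypothetical, and because WAL mode applies only to file-backed databases, the sketch uses a temporary file rather than `:memory:`.

```python
import sqlite3
import tempfile
import threading
from pathlib import Path


def run_concurrent_writes(db_path, threads=10, writes_per_thread=5):
    """Insert rows from several threads and return the final row count."""
    setup = sqlite3.connect(db_path)
    setup.execute("PRAGMA journal_mode=WAL")
    setup.execute("CREATE TABLE IF NOT EXISTS events (worker INTEGER, seq INTEGER)")
    setup.commit()
    setup.close()

    def worker(worker_id):
        # One connection per thread; the timeout lets writers wait for the lock.
        conn = sqlite3.connect(db_path, timeout=30)
        try:
            for seq in range(writes_per_thread):
                with conn:  # commits on success, rolls back on error
                    conn.execute("INSERT INTO events VALUES (?, ?)", (worker_id, seq))
        finally:
            conn.close()

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()

    check = sqlite3.connect(db_path)
    try:
        return check.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    finally:
        check.close()


def test_concurrent_wal_writes():
    with tempfile.TemporaryDirectory() as tmp:
        assert run_concurrent_writes(str(Path(tmp) / "telemetry.db")) == 50
```

A lost write would show up as a row count below 50, so the single assertion covers both locking errors and dropped inserts.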

-----

## Data Location

```
~/.hermes/telemetry/
├── telemetry.db ← SQLite (WAL mode, ~70KB base + growth)
├── telemetry.log ← Plugin log (errors, debug, one-time warnings)
├── pricing.yaml ← Your model price overrides
└── budget.yaml ← Your spend guardrails
```

The DB grows over time. For high-frequency cron jobs, consider periodic cleanup of old rows (not yet automated — see [Known Limitations](#known-limitations)).

-----

## Known Limitations

**Enforcement gaps:**

- **No true mid-call abort.** `pre_llm_call` / `pre_api_request` cannot cancel an in-flight model call. The response that’s already generating will complete and be billed. The tool-gate (`pre_tool_call`) stops *subsequent* work at the next tool boundary.
- **Runaway text-only sessions.** A session that generates text without calling any tools never hits the tool-gate. If this becomes a problem, a pre-flight check in `on_session_start` for cron jobs could abort before the first LLM call.
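
The two gaps above can be illustrated with a small simulation. It does not use the Hermes hook API: `Step` and `simulate_session` are hypothetical names, and the only behavior modeled is "the LLM call always completes and is billed, and the budget is checked only when a tool is about to run".

```python
from dataclasses import dataclass


@dataclass
class Step:
    cost_usd: float   # cost of the LLM call in this step
    calls_tool: bool  # whether the model requests a tool afterwards


def simulate_session(steps, limit_usd):
    """Return (total_spent, steps_completed) with a gate only at tool boundaries."""
    spent = 0.0
    completed = 0
    for step in steps:
        spent += step.cost_usd  # an in-flight call always completes and is billed
        completed += 1
        if step.calls_tool and spent >= limit_usd:
            break  # the gate blocks the tool, which stops subsequent work
    return spent, completed


with_tools = [Step(0.5, True), Step(0.5, True), Step(0.5, True)]
text_only = [Step(0.5, False), Step(0.5, False), Step(0.5, False)]
print(simulate_session(with_tools, limit_usd=1.0))
print(simulate_session(text_only, limit_usd=1.0))
```

With a $1.00 limit, the tool-using session stops after two steps at $1.00, while the text-only session runs all three steps and reaches $1.50 because it never crosses a tool boundary.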

**Subagent attribution:**

- Child agents (`delegate_task`) run as their own sessions. Their tokens are captured independently and included in **global** totals. But there is no parent→child link in any hook — so `per_cron_job` budgets **exclude** subagent cost. Use the `global` budget for a cap that captures delegated work.
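
A toy example of why the two scopes disagree, assuming a hypothetical record shape in which a delegated child session carries no cron job attribution (the job name and costs are made up for illustration):

```python
def global_total(sessions):
    """Sum the cost of every session, regardless of attribution."""
    return sum(s["cost_usd"] for s in sessions)


def per_job_total(sessions, job):
    """Sum only the sessions attributed to the given cron job."""
    return sum(s["cost_usd"] for s in sessions if s["cron_job"] == job)


sessions = [
    {"session": "cron-run", "cron_job": "mcp-lead-gen", "cost_usd": 0.25},
    # Child session from delegate_task: no link back to the parent job.
    {"session": "delegated-child", "cron_job": None, "cost_usd": 0.50},
]

print(global_total(sessions))
print(per_job_total(sessions, "mcp-lead-gen"))
```

The global total is $0.75 while the per-job total is $0.25, so a `per_cron_job` cap of $0.50 would never trip here even though the job's delegated work pushed real spend past it.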

**Pricing refresh only for OpenRouter models:**

- Auto-refresh pulls prices only from the OpenRouter API and preserves entries you added manually. Prices for models served by other providers are not refreshed and must be kept up to date by hand in `pricing.yaml`.

**DB retention:**

- `telemetry.db` grows without bound. No automatic purge of old rows. For >100K rows, consider manual cleanup or a retention policy (not yet implemented).
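
Until retention is built in, a manual cleanup could look like the sketch below. The table name `calls` and the epoch-seconds column `created_at` are placeholders, not the plugin's real schema; inspect `telemetry.db` (for example with `.schema` in the `sqlite3` shell) and adapt the statement before running anything against real data.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86_400


def purge_old_rows(conn, max_age_days, now=None):
    """Delete rows older than max_age_days and return how many were removed.

    Assumes a hypothetical table `calls` with an epoch-seconds `created_at`.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    with conn:  # commit the delete as one transaction
        cur = conn.execute("DELETE FROM calls WHERE created_at < ?", (cutoff,))
    return cur.rowcount


# Demo on an in-memory database so nothing real is touched.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, created_at REAL)")
now = time.time()
demo.executemany(
    "INSERT INTO calls (created_at) VALUES (?)",
    [(now - 100 * SECONDS_PER_DAY,), (now - 1 * SECONDS_PER_DAY,)],
)
demo.commit()
print(purge_old_rows(demo, max_age_days=30, now=now))
```

Deleting rows frees pages for reuse but does not shrink the file on disk; running `VACUUM` afterwards does that. Back up the database first, since old rows also feed the historical `/stats` views.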

**Gateway restart required:**

- Enabling the plugin takes effect only after gateway restart. Cron runs that started before the restart won’t have telemetry.

-----

## Troubleshooting

**`/stats cron week` shows “No cron runs in the last 7 days”:**

The gateway loaded before the plugin was enabled. Restart the gateway:

```
hermes gateway restart
```

Then re-run a cron job.

**`/budget` shows `$0.00` as the limit:**

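The limit is cached in memory at gateway start. If you edited `budget.yaml` directly, the cache is stale. Use `/budget set` (for example, `/budget set global daily 2.00`) to hot-reload, or restart the gateway. Very small limits are also displayed as `$0.00` because the display rounds to two decimals: a `0.001` cap renders as `$0.00`.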

**Cost is $0.00 for all sessions:**

Your model isn’t in the pricing table. Check `telemetry.log` for a one-time warning like:

```
hermes-telemetry: unknown model 'openrouter/some-model' — cost recorded as $0.00
```

Add it to `pricing.yaml`.

**Provider Est% > 0:**

Your provider returns `usage=None` for some or all calls, so the tokens for those calls are estimated. Check `/stats providers` to see which providers are affected. If Est% is 100% for your main provider, all spend is estimated and budget hard-verdicts degrade to soft under `warn_only` mode.

**Plugin not loading at all:**

Check `telemetry.log` for errors. Common causes:

- Missing `pyyaml` in the gateway’s venv: `pip install pyyaml`
- Plugin not in `plugins.enabled` in `config.yaml`
- Syntax error in `pricing.yaml` or `budget.yaml`

-----

## License

MIT — see [LICENSE](https://github.com/nujovich/hermes-telemetry/blob/main/LICENSE).

-----

## Hermes Agent Challenge

This plugin was built for the [**Hermes Agent Challenge**](https://dev.to/devteam/join-the-hermes-agent-challenge-1000-in-prizes-13cd) — a $1,000 competition to build the most useful Hermes Agent plugins and extensions.

**🔗 Challenge Entry:** [hermes-telemetry on dev.to](https://dev.to/devteam/join-the-hermes-agent-challenge-1000-in-prizes-13cd)

**🛠️ Built by:** [Nadia Ujovich](https://github.com/nujovich)

**💡 Why this plugin:** Every AI system needs observability and cost control. This plugin gives Hermes Agent users the visibility to optimize their workflows and the guardrails to prevent bill shock — essential for production deployments and automated cron jobs.

-----

*Made with ☕ for the Hermes Agent ecosystem*