https://github.com/nujovich/hermes-telemetry
Budget enforcement + observability plugin for Hermes Agent. Stops runaway costs before they happen.
agent-telemetry ai-cost-tracking budget-enforcement hermes-agent hermes-plugin llm-budget llm-observability token-tracking
- Host: GitHub
- URL: https://github.com/nujovich/hermes-telemetry
- Owner: nujovich
- License: mit
- Created: 2026-05-31T06:51:19.000Z (15 days ago)
- Default Branch: main
- Last Pushed: 2026-06-07T18:50:44.000Z (8 days ago)
- Last Synced: 2026-06-07T20:20:36.429Z (7 days ago)
- Topics: agent-telemetry, ai-cost-tracking, budget-enforcement, hermes-agent, hermes-plugin, llm-budget, llm-observability, token-tracking
- Language: Python
- Homepage:
- Size: 782 KB
- Stars: 7
- Watchers: 0
- Forks: 2
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
# hermes-telemetry ☤
> *Observability + budget guardrails for [Hermes Agent](https://github.com/NousResearch/hermes-agent)*
**Budget enforcement + observability for Hermes Agent. The only plugin that can stop a run before it overspends.**
A comprehensive telemetry plugin that captures real usage data, enforces budget limits, and provides detailed cost analysis for AI agent operations. Built for the [Hermes Agent Challenge](https://dev.to/devteam/join-the-hermes-agent-challenge-1000-in-prizes-13cd) by [Nadia Ujovich](https://nadiaujovich.dev).
**The differentiator: it can _stop_ work that's about to overspend — not just report it after the fact.** Set a daily cap below current spend, and the next cron run is blocked by the budget:

*`/budget set global daily 0.001` writes the cap to `budget.yaml`; current spend ($0.0102) already exceeds it, so `/budget` re-renders at 1020% `[daily]` — a hard breach — and the next marketing cron run is blocked by the budget.*
-----
Hermes Agent runs autonomously — across sessions, platforms, and cron jobs — which
means it can keep spending even when you're not watching.
**hermes-telemetry lives inside the runtime** and enforces hard budget limits by
blocking further tool calls and pausing cron jobs once a cap is reached.
> This plugin addresses [NousResearch/hermes-agent#6642](https://github.com/NousResearch/hermes-agent/issues/6642) —
> the open feature request for a first-class telemetry and budget subsystem for Hermes Agent.
```
Your Hermes session
↓ every API call
hermes-telemetry (native plugin)
→ tracks tokens + cost in real time
→ enforces budget limits mid-session
→ logs to SQLite with WAL mode
→ syncs OpenRouter pricing automatically
↓ if budget OK
LLM provider
```
> **Not a log reader.** TokenTelemetry and similar tools read what already happened.
> hermes-telemetry hooks into the Hermes runtime and can *stop* what’s about to happen.
-----
**Design principle:** observability is invisible to the model. Everything goes through hooks. The only user-facing surfaces are slash commands such as `/stats` and `/budget`.
-----
## Table of Contents
- [Screenshots](#screenshots)
- [Dashboard (Web UI)](#dashboard-web-ui)
- [Slash Commands](#slash-commands)
- [What It Measures](#what-it-measures)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Setup Wizard](#setup-wizard)
- [Dashboard (Web UI)](#dashboard-web-ui-1)
- [Auto-Refresh](#auto-refresh)
- [Features](#features)
- [Slash Commands](#slash-commands-1)
- [/stats](#stats)
- [/budget](#budget)
- [Configuration](#configuration)
- [pricing.yaml](#pricingyaml)
- [budget.yaml](#budgetyaml)
- [Pricing Auto-Refresh](#pricing-auto-refresh)
- [How It Works](#how-it-works)
- [Estimated-Price Models](#estimated-price-models)
- [CLI Usage](#cli-usage)
- [Architecture](#architecture)
- [Hook Plugin](#hook-plugin)
- [Database Schema](#database-schema)
- [Concurrency Model](#concurrency-model)
- [Budget Enforcement](#budget-enforcement)
- [How It Works](#how-it-works-1)
- [Enforcement Levels](#enforcement-levels)
- [Estimated Data and Budget Degradation](#estimated-data-and-budget-degradation)
- [Provider Probe: Verifying Your Provider Returns Real Usage](#provider-probe-verifying-your-provider-returns-real-usage)
- [Proof of Concept](#proof-of-concept)
- [Setup](#setup)
- [Pricing Capture](#pricing-capture)
- [Budget Enforcement Test](#budget-enforcement-test)
- [Cron Job Cost Comparison](#cron-job-cost-comparison)
- [Results Summary](#results-summary)
- [Comparison](#comparison)
- [Running Tests](#running-tests)
- [Data Location](#data-location)
- [Known Limitations](#known-limitations)
- [Troubleshooting](#troubleshooting)
- [License](#license)
- [Hermes Agent Challenge](#hermes-agent-challenge)
-----
## Screenshots
### Dashboard (Web UI)
A standalone HTML dashboard for users who prefer a visual interface over slash commands. Served locally, reads directly from the telemetry SQLite database.
[Screenshot: dashboard overview](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/dashboard-overview.png)
*The dashboard auto-refreshes every 30 seconds. Shows sessions, API calls, tokens, cost, budget status, daily cost trends, top tools, cost by cron job, provider distribution, and recent sessions.*
### Slash Commands
#### `/stats` — Session analytics
[Screenshot: `/stats` output](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/stats-output.png)
#### `/budget` — Current spending vs limits
[Screenshot: `/budget` output](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/budget-output.png)
#### `/stats cron week` — Cron job cost breakdown
[Screenshot: `/stats cron week` output](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/cron-output.png)
#### `/stats providers` — Real vs estimated usage + estimated-price warning
[Screenshot: `/stats providers` output](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/providers-output.png)
-----
## What It Measures
|Metric |Source |Real or Estimated |
|-----------------------------------------|-------------------------------|------------------------|
|Tokens in / out per API call |`post_api_request.usage` |✅ Real (from provider) |
|Cache read / write tokens |`post_api_request.usage` |✅ Real (from provider) |
|Reasoning tokens |`post_api_request.usage` |✅ Real (from provider) |
|API call latency |`post_api_request.api_duration`|✅ Real (ms) |
|Tool call latency & success/failure |`post_tool_call` |✅ Real |
|Session / cron job wall time |`started_at` → `ended_at` |✅ Real |
|Model & provider name |`post_api_request` |✅ Real |
|Platform (cli / cron / telegram / …) |`on_session_start.platform` |✅ Real |
|Cron job ID |Parsed from `session_id` |✅ Real |
|Subagent invocation count |`subagent_stop` hook |✅ Real (proxy) |
|**Cost (USD)** |Local pricing table × tokens |⚠️ **Estimated** |
|Tokens when provider returns `usage=None`|Fallback approximation |⚠️ **Estimated, flagged**|
Cost is always an **estimate** computed from a locally maintained pricing table. No pricing API is called when a request is recorded; the table itself can be refreshed from OpenRouter separately (see [Pricing Auto-Refresh](#pricing-auto-refresh)). When the provider returns no usage data, tokens are estimated from a pre-request approximation plus the response length, and the row is flagged `estimated=1`, so `/stats` and `/budget` show a `~` prefix and an “estimated data” percentage.
-----
## Installation
Hermes plugins are **opt-in** — you must both install and enable the plugin.
### Option A: Install from GitHub
```
hermes plugins install nujovich/hermes-telemetry
hermes plugins enable hermes-telemetry
```
### Option B: Manual install
```
git clone https://github.com/nujovich/hermes-telemetry ~/.hermes/plugins/hermes-telemetry
hermes plugins enable hermes-telemetry
```
**Important:** restart the Hermes gateway after enabling:
```
hermes gateway restart
```
> **Note:** Plugin changes only take effect after a gateway restart. The gateway loads the plugin registry at startup. If you enable a plugin and cron jobs don’t appear in `/stats cron week`, this is the most likely cause.
-----
## Quick Start
1. Install and enable the plugin (see above)
1. Restart the gateway
1. Run any session, then type `/stats` to see captured data
1. Optionally configure `pricing.yaml` and `budget.yaml` (see below)
That’s it. The plugin captures data automatically — no agent action required.
-----
## Setup Wizard
hermes-telemetry includes a first-time setup wizard that runs automatically on first
plugin load when `pricing.yaml` and/or `budget.yaml` are missing. It can also be
triggered manually at any time with the `/setup` slash command.
### Auto-setup (first load)
On first load, if either config file is missing, the plugin auto-generates defaults:
- **Pricing:** fetches all models with fixed pricing from the OpenRouter API and merges
them with ~30 built-in defaults (Anthropic, OpenAI, DeepSeek, Google, Meta, Nous).
New prices take effect immediately — no gateway restart needed.
- **Budget:** writes a conservative global budget (`$5.00/day`, `$100.00/month`) with
an 80% soft warning and 100% hard cap.
### `/setup` slash command
Use `/setup` to check configuration status or reconfigure individual files.
```
/setup → show current status (which files exist)
/setup pricing auto → built-in defaults + fetch from OpenRouter API
/setup pricing minimal → built-in defaults only (~30 models, no network)
/setup pricing skip → skip (unrecognized models will record $0.00 cost)
/setup budget default → recommended global budget ($5/day, $100/month)
/setup budget custom → instructions for setting your own limits manually
/setup budget skip → no enforcement (costs still tracked)
```
#### Pricing options
| Option | Models | Network |
|--------|--------|---------|
| `auto` | ~30 built-in + all OpenRouter fixed-price models | Yes (OpenRouter API) |
| `minimal` | ~30 built-in only | No |
| `skip` | None — models will record `$0.00` cost | No |
#### Budget options
| Option | Behavior |
|--------|----------|
| `default` | Global: `$5.00/day`, `$100.00/month`. Soft warning at 80%, hard block at 100% |
| `custom` | Prints the `/budget set` commands for manual configuration |
| `skip` | Costs tracked but never enforced |
### Re-running setup
Setup skips files that already exist. To reconfigure:
```bash
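# Reprice from scratch (shell)
rm ~/.hermes/telemetry/pricing.yaml
# Then, inside a Hermes session (slash command, not a shell command):
#   /setup pricing auto
# Reset budget (shell)
rm ~/.hermes/telemetry/budget.yaml
# Then, inside a Hermes session:
#   /setup budget default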
```
> **Note:** Pricing changes take effect immediately without a gateway restart. Budget
> limits set with `/budget set` hot-reload; other budget changes require a restart.
-----
## Slash Commands
### `/stats`
```
/stats → last 24h summary (sessions, tokens, cost, top tools)
/stats today → same as /stats
/stats week → last 7 days
/stats month → last 30 days
/stats cron → breakdown by cron_job_id (last 7 days)
/stats cron week → cron breakdown, last 7 days
/stats cron month → cron breakdown, last 30 days
/stats cron today → cron breakdown, last 24 hours
/stats providers → per-provider: real vs estimated calls + cost (last 24h)
/stats providers week → provider breakdown, last 7 days
/stats models → per-model breakdown within each provider (last 24h)
/stats models week → per-model breakdown, last 7 days
/stats raw [N] → last N raw run records (default 20, max 200)
```
**Example output (`/stats`):**
```
hermes-telemetry — last 24 h
============================================
Sessions : 14
Success rate : 92.9% (ok=13, failed=1)
API calls : 47
Tool calls : 183
Tokens in : 1,240,500
Tokens out : 87,300
Cost (est.) : $0.004822
Avg latency : 1.2s
Avg duration : 48.3s
Top tools:
Tool Calls Failures Avg ms
--------------------------------------------------------
read_file 92 0 12ms
terminal 51 3 340ms
write_file 28 0 18ms
```
**Example output (`/stats cron week`):**
```
hermes-telemetry — cron jobs (last 7 days)
========================================================================
Job ID Runs OK Fail Tok-in Tok-out Cost Avg dur
--------------------------------------------------------------------------
09dd0c24f29b 3 3 0 892,341 12,405 $0.314378 2.1m
d68c2728b513 1 1 0 445,119 8,200 $2.225595 4.7m
```
**Example output (`/stats providers`):**
```
hermes-telemetry — providers (last 24 h)
========================================================================
Provider Calls Real Est Est% Cost
-------------------------------------------------------------------
openrouter 66 66 0 0% $0.916782
Est% = share of calls where the provider returned no usage data
(tokens estimated locally).
If Est% > 0 for your main provider, budget hard-verdicts may be
degraded to soft under on_estimated.mode: warn_only.
```
**Example output (`/stats models`):**
```
hermes-telemetry — models (last 24 h)
================================================================================================
Provider Model Calls Real Est Cost
----------------------------------------------------------------------------------------------
openrouter owl-alpha 66 66 0 $0.000000
openrouter anthropic/claude-sonnet-4-6 42 42 0 $0.314378
openrouter anthropic/claude-opus-4-7 8 8 0 $2.225595
Rows are grouped by provider, then by calls (desc). A model showing $0.00 has no price entry
in pricing.yaml — run /setup pricing auto to refresh, or add it manually.
```
`/stats models` breaks each provider's spend down to individual models. Rows are grouped by provider (ascending), then ordered by call count (descending) within each provider; the `Model` column is kept wide so dated model keys stay readable. Columns: `Calls` (total), `Real` (calls with provider-reported usage), `Est` (calls with locally estimated tokens), and `Cost`. A model showing `$0.000000` has no price entry in `pricing.yaml`.
### `/budget`
```
/budget → status of every scope (spent / limit / %)
/budget cron → per-cron-job budgets, with soft/hard flags
/budget set global daily 5.00 → set or raise a limit (persists + hot-reloads)
/budget set cron_job daily 1.00 → set default per-cron-job limit
/budget set sender daily 2.00 → set default per-sender limit
```
**Example output (`/budget`):**
```
hermes-telemetry — budget status
============================================================
global $ 0.1812 / $ 2.00 9% [daily]
Legend: (blank)=ok !=soft (≥80%) █=hard (≥100%) ~est=estimated data
```
**Status flags:**
|Flag |Meaning |
|-------|-----------------------------------------------------------|
|(blank)|Within budget (`< 80%`) |
|`!` |Soft warning (≥ 80%) — notice injected into conversation |
|`█` |Hard breach (≥ 100%) — tool calls blocked, cron jobs paused|
|`~est` |Verdict based partly on estimated (usage=None) data |
-----
## Dashboard (Web UI)
A standalone HTML dashboard for users who prefer a visual interface over slash commands. Zero dependencies — uses only Python stdlib.
### Auto-Refresh
The dashboard auto-refreshes every 30 seconds. No manual reload needed.
### Features
- **Summary cards**: Sessions, OK/failed, API calls, tokens in, cost
- **Budget bar**: Real-time spend vs limit with progress indicator
- **Daily cost chart**: 7-day line chart of spending
- **Top tools chart**: Bar chart of most-used tools
- **Cost by cron job**: Per-job cost breakdown
- **Provider distribution**: Donut chart (nous / openrouter / anthropic)
- **Cron jobs table**: Runs, tokens, cost, avg duration, last run
- **Recent sessions table**: All sessions with platform, model, status, cost
- **Time range selector**: Last 24h / 7 days / 30 days
### Usage
```
cd ~/.hermes/plugins/hermes-telemetry/dashboard
python3 serve.py # http://localhost:8765 (loopback only)
python3 serve.py --port 9090 # custom port, still loopback
python3 serve.py 9090 # positional port (back-compat)
```
Then open `http://localhost:8765` in your browser.
### Accessing the dashboard from another host
The dashboard has **no authentication** — anyone who can reach the port sees
every captured token, cost, and tool-call detail. By default it binds to
`127.0.0.1`, which is unreachable from other machines.
If your Hermes server is headless (Pi, VPS, NAS) and you browse from a laptop,
two options:
**Recommended — SSH tunnel** (no server-side change, leaves the safe default in
place):
```bash
# Start the dashboard on the server as usual
ssh server "cd ~/.hermes/plugins/hermes-telemetry/dashboard && python3 serve.py &"
# Tunnel from your client
ssh -L 8765:localhost:8765 -N server &
# Browse on the client
open http://localhost:8765
```
**Trusted-LAN shortcut — `--host 0.0.0.0`:**
```bash
python3 serve.py --host 0.0.0.0
```
The script prints a warning when binding to any non-loopback interface. Only
use this on a network where you trust every host. **Do not expose to the
public internet or to networks that include untrusted hosts** — the dashboard
ships without an auth layer by design (see CONTRIBUTING.md if you want to add
one).
-----
## Configuration
Configuration lives in `~/.hermes/telemetry/`:
```
~/.hermes/telemetry/
├── telemetry.db ← SQLite database (WAL mode)
├── telemetry.log ← plugin log (errors / debug)
├── pricing.yaml ← optional pricing overrides
└── budget.yaml ← optional spend budgets
```
If these files don’t exist, the plugin still works with defaults: models missing from the built-in pricing table are costed at $0.00, and budgets are disabled.
### `pricing.yaml`
Override model prices in USD per 1 million tokens. Without overrides, unknown models log a one-time warning and record cost as `$0.00`.
**Full format:**
```yaml
models:
  # Free model
  "openrouter/owl-alpha":
    input: 0.00
    output: 0.00
  # Paid model with full cache/reasoning split
  "openrouter/anthropic/claude-sonnet-4-6":
    input: 3.00
    output: 15.00
    cache_read: 0.30
    cache_write: 3.75
    reasoning: 15.00
  # Minimal override (cache prices derived from multipliers)
  "openrouter/anthropic/claude-opus-4-7":
    input: 5.00
    output: 25.00
defaults:
  cache_read_multiplier: 0.10   # cache_read = input * 0.10 if not specified
  cache_write_multiplier: 1.25  # cache_write = input * 1.25 if not specified
```
**Matching rules (in order):**
1. Exact match (case-insensitive) against `models:` keys in your YAML
1. Exact match against the built-in pricing table (~30 models)
1. Longest-prefix match (e.g. `claude-sonnet` matches `claude-sonnet-4-6-future`)
1. Unknown → `$0.00` with a one-time warning in `telemetry.log`
The built-in table covers: Anthropic (Claude 3/4 family), OpenAI (GPT-4o, GPT-4, o1, o3, o4), DeepSeek, Gemini, Llama, and Hermes models. Prices sourced from official provider pages (May 2026).
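The matching rules and the multiplier defaults can be sketched in a few lines of Python. This is an illustration only, not the plugin's code: `USER_PRICES`, `BUILTIN_PRICES`, `resolve_price`, and `cost_usd` are hypothetical names, the two tables are tiny stand-ins, and two details are assumptions (prefix matching searches both tables, and `tokens_in` excludes cached tokens).

```python
from typing import Optional

# Hypothetical stand-ins for the `models:` section of pricing.yaml and the built-in table.
USER_PRICES = {"openrouter/anthropic/claude-opus-4-7": {"input": 5.00, "output": 25.00}}
BUILTIN_PRICES = {"claude-sonnet": {"input": 3.00, "output": 15.00}}
DEFAULTS = {"cache_read_multiplier": 0.10, "cache_write_multiplier": 1.25}


def resolve_price(model: str) -> Optional[dict]:
    """Apply the matching rules in order; None means 'unknown model'."""
    key = model.lower()
    user = {k.lower(): v for k, v in USER_PRICES.items()}
    builtin = {k.lower(): v for k, v in BUILTIN_PRICES.items()}
    if key in user:                      # rule 1: exact match in the user YAML
        return user[key]
    if key in builtin:                   # rule 2: exact match in the built-in table
        return builtin[key]
    known = {**builtin, **user}          # assumption: prefixes come from both tables
    prefixes = [k for k in known if key.startswith(k)]
    if prefixes:                         # rule 3: longest prefix wins
        return known[max(prefixes, key=len)]
    return None                          # rule 4: caller records $0.00


def cost_usd(model: str, tokens_in: int, tokens_out: int,
             cache_read: int = 0, cache_write: int = 0) -> float:
    """Estimated cost in USD; prices are per 1 million tokens."""
    price = resolve_price(model)
    if price is None:
        return 0.0
    p_in = price["input"]
    p_cache_read = price.get("cache_read", p_in * DEFAULTS["cache_read_multiplier"])
    p_cache_write = price.get("cache_write", p_in * DEFAULTS["cache_write_multiplier"])
    total = (tokens_in * p_in + tokens_out * price["output"]
             + cache_read * p_cache_read + cache_write * p_cache_write)
    return total / 1_000_000
```

With these stand-in tables, `resolve_price("claude-sonnet-4-6-future")` falls through to the prefix rule and returns the `claude-sonnet` entry, while an unrecognized name returns `None` and is costed at zero.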
### `budget.yaml`
Configure spend guardrails. No file → budgets disabled.
```yaml
budgets:
  global:
    daily_usd: 2.00
    monthly_usd: 50.00
  per_cron_job:
    default:
      daily_usd: 1.00
    overrides:
      daily_email_report:
        daily_usd: 3.00
  per_sender:
    default:
      daily_usd: 2.00
    overrides:
      premium_user_123:
        daily_usd: 5.00
thresholds:
  soft_pct: 0.80  # warn at 80% of limit
  hard_pct: 1.00  # enforce at 100%
on_estimated:
  mode: enforce   # warn_only | enforce
```
**Scope resolution:**
|Scope |How spend is calculated |
|--------------|-------------------------------------------------------------|
|`global` |All sessions + all cron jobs combined |
|`per_cron_job`|Sessions where `cron_job_id` matches (excludes subagent cost)|
|`per_sender` |Sessions from a specific sender (multi-user gateways) |
**Window math:** daily and monthly windows are computed in the user’s local timezone. A cron job that runs at 11:59 PM and another at 12:01 AM count against different daily windows.
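The window boundary behaviour is easy to see in a small sketch. The function name `period_key` borrows the column name from the `budget_alerts` table, but the key formats (`YYYY-MM-DD`, `YYYY-MM`) and the fixed UTC-3 offset are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def period_key(ts: datetime, window: str, local_tz: timezone) -> str:
    """Bucket a timestamp into a daily or monthly window in local time."""
    local = ts.astimezone(local_tz)
    if window == "daily":
        return local.strftime("%Y-%m-%d")
    if window == "monthly":
        return local.strftime("%Y-%m")
    raise ValueError(f"unknown window: {window!r}")


# Example timezone (assumed): a fixed UTC-3 offset.
LOCAL = timezone(timedelta(hours=-3))

# Two runs two minutes apart, stored as UTC like the `started_at` column.
late_run = datetime(2026, 6, 2, 2, 59, tzinfo=timezone.utc)   # 11:59 PM local, June 1
early_run = datetime(2026, 6, 2, 3, 1, tzinfo=timezone.utc)   # 12:01 AM local, June 2

late_key = period_key(late_run, "daily", LOCAL)
early_key = period_key(early_run, "daily", LOCAL)
```

Both runs share the same UTC date but land in different local daily windows, so each counts against its own day's limit; they still share one monthly window.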
-----
## Pricing Auto-Refresh
The plugin can automatically fetch model pricing from OpenRouter’s public API, eliminating the need to manually maintain `pricing.yaml` for hundreds of models.
### How It Works
- **Source**: OpenRouter public API (`https://openrouter.ai/api/v1/models`) — no auth required
- **Frequency**: Once per 24 hours (tracked via sentinel file)
- **Trigger**: Automatically on plugin load (gateway startup), or manually via CLI
- **Merge strategy**:
  - User overrides in `pricing.yaml` are **always preserved**: manual entries take priority over auto-fetched ones
  - New models from the API are added automatically
  - Previously auto-fetched models are updated when prices change
  - Models are tagged with `_auto: true` and `_source: openrouter` for traceability
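A minimal sketch of that merge strategy, assuming each pricing entry is a plain dict and that the absence of an `_auto` tag is what marks an entry as manual. `merge_pricing` is a hypothetical name, not the plugin's function.

```python
def merge_pricing(existing: dict, fetched: dict, source: str = "openrouter") -> dict:
    """Merge freshly fetched prices into the current pricing table.

    Entries without an `_auto` tag are treated as manual and never overwritten.
    """
    merged = dict(existing)
    for model, price in fetched.items():
        current = merged.get(model)
        if current is not None and not current.get("_auto"):
            continue  # manual override wins
        # New model, or a previously auto-fetched one: take the fetched price and tag it.
        merged[model] = {**price, "_auto": True, "_source": source}
    return merged


existing = {
    "vendor/manual-model": {"input": 1.00, "output": 2.00},
    "vendor/auto-model": {"input": 9.00, "output": 9.00, "_auto": True, "_source": "openrouter"},
}
fetched = {
    "vendor/manual-model": {"input": 5.00, "output": 5.00},
    "vendor/auto-model": {"input": 0.20, "output": 1.15},
    "vendor/new-model": {"input": 5.00, "output": 25.00},
}
merged = merge_pricing(existing, fetched)
```

In this example the manual entry keeps its price, the auto-fetched entry is updated, and the new model is added with both traceability tags.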
### Estimated-Price Models
Some OpenRouter models have no fixed pricing (e.g. `auto` routing, experimental models). These are represented with negative prices in the API.
The plugin handles these safely:
- Prices are normalized to `$0.00` (they don’t inflate cost calculations)
- Flagged with `_estimated_price: true` in `pricing.yaml`
- The budget engine detects when spend uses these models
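The normalization step might look like the following sketch. It assumes only the `input` and `output` fields need checking and that only negative values are zeroed; `normalize_price` is a hypothetical helper name.

```python
def normalize_price(entry: dict) -> dict:
    """Replace negative (no-fixed-price) values with 0.0 and flag the entry."""
    out = dict(entry)
    flagged = False
    for field in ("input", "output"):
        if field in out and float(out[field]) < 0:
            out[field] = 0.0  # never let a sentinel price reduce or inflate cost
            flagged = True
    if flagged:
        out["_estimated_price"] = True
    return out
```

For example, `normalize_price({"input": -1, "output": -1})` yields zero prices plus the `_estimated_price` flag, while an entry with ordinary prices is returned unchanged.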
**Budget degradation logic:**
|Condition |Effect |
|----------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
|`on_estimated.mode: warn_only` (default)|If >0% of calls use estimated-price models, **hard verdicts are degraded to soft** — the user gets a warning but tools aren’t blocked|
|`on_estimated.mode: enforce` |Hard verdicts take effect regardless |
### CLI Usage
```
# Dry run — see what would change
python -m hermes_telemetry.pricing_refresh --check
# Apply changes
python -m hermes_telemetry.pricing_refresh
# Verbose output
python -m hermes_telemetry.pricing_refresh --verbose
```
**Example output:**
```
INFO OpenRouterSource: fetched 320 models
Updated 3 model(s):
~ stepfun/step-3.7-flash (openrouter)
input: 0.9999 → 0.2000
output: 9.9999 → 1.1500
+ anthropic/claude-opus-4.8 (openrouter)
input=5.0000 output=25.0000
⚠ Model(s) with estimated pricing: openrouter/auto, openrouter/bodybuilder, openrouter/pareto-code
```
### Extending with New Sources
Add new pricing providers by subclassing `PricingSource`:
```python
from hermes_telemetry.pricing_refresh import PricingSource, register_source


class AnthropicSource(PricingSource):
    name = "anthropic"

    def fetch(self) -> dict[str, dict]:
        # Fetch from Anthropic's pricing page or API
        ...


register_source(AnthropicSource)
```
Sources are registered in `pricing_refresh.py` and fetched in parallel on each refresh cycle.
-----
## Architecture
### Hook Plugin
The plugin registers 10 hooks (out of 16 available in Hermes) plus 2 slash commands:
```
Hook Purpose
─────────────────────────────────────────────────────────────
on_session_start Create run row, extract cron_job_id
pre_api_request Stash approx_input_tokens for fallback
post_api_request PRIMARY: record tokens, cost, latency
post_tool_call Record tool name, success, duration
post_llm_call Refresh session end timestamp
subagent_stop Record delegate_task proxy on parent
on_session_end Set final status (ok/error/interrupted)
on_session_finalize Safety net: ensure run is closed
pre_llm_call Soft budget alerts + capture sender_id
pre_tool_call Hard budget enforcement (tool-gate)
```
**Why `post_api_request` is the primary hook for tokens:** The Hermes conversation loop can make multiple API calls per turn (retries, reasoning models, tool calls). Only `post_api_request` carries the canonical `usage` dict with token counts and cost data. `pre_llm_call` fires once per turn with no token data. `post_llm_call` fires after the tool loop with no token data.
**Cron job identification:** There is no `cron_job_id` in any hook. The plugin extracts it from the `session_id`, which follows the format `cron_{job_id}_{YYYYMMDD_HHMMSS}` (confirmed in Hermes source). An anchored regex handles job IDs that contain underscores.
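One way to write such an anchored regex is sketched below. The pattern is an assumption based on the documented `cron_{job_id}_{YYYYMMDD_HHMMSS}` format, not a copy of the plugin's expression, and `extract_cron_job_id` is a hypothetical name.

```python
import re
from typing import Optional

# Anchor on the trailing timestamp so underscores inside the job ID are preserved.
CRON_SESSION_RE = re.compile(r"^cron_(?P<job_id>.+)_(?P<ts>\d{8}_\d{6})$")


def extract_cron_job_id(session_id: str) -> Optional[str]:
    """Return the job ID for cron sessions, or None for any other session."""
    match = CRON_SESSION_RE.match(session_id)
    return match.group("job_id") if match else None
```

Because the timestamp is pinned to the end of the string, `cron_daily_email_report_20260607_185044` yields `daily_email_report`, and a CLI-style ID such as `20260607_185044_a1b2c3` yields `None`.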
### Database Schema
SQLite with WAL mode, per-thread connections, schema v3:
**`runs`** — one row per session (CLI session or cron job execution):
|Column |Description |
|--------------------------|--------------------------------------------------------------------------------|
|`session_id` |Primary key (`{YYYYMMDD_HHMMSS}_{uuid6}` for CLI, `cron_{job_id}_{ts}` for cron)|
|`platform` |`cli`, `cron`, `telegram`, `discord`, etc. |
|`cron_job_id` |Extracted from session_id when platform=cron |
|`model` |Model name (updated from last API call) |
|`provider` |Provider name (e.g. `openrouter`, `anthropic`) |
|`started_at` / `ended_at` |ISO-8601 UTC timestamps |
|`status` |`running`, `ok`, `error`, `interrupted` |
|`tokens_in` / `tokens_out`|Accumulated across all API calls in the session |
|`cost_usd` |Accumulated estimated cost |
|`duration_ms` |Wall time (ms) via `julianday()` |
|`api_calls` / `tool_calls`|Counters |
|`parent_session_id` |Reserved for future parent-child linking (not populated in v0.2) |
|`estimated_llm_calls` |Count of calls where provider returned `usage=None` |
|`sender_id` |For per-sender budgets (set via `pre_llm_call`) |
**`llm_calls`** — one row per individual API call:
All of `runs` token/cost columns, plus `cache_read_tokens`, `cache_write_tokens`, `reasoning_tokens`, `estimated` (boolean).
**`tool_calls`** — one row per tool execution:
`session_id`, `ts`, `tool_name`, `ok` (boolean), `latency_ms`.
**`budget_alerts`** — anti-spam ledger:
`scope`, `scope_id`, `window`, `period_key`, `level`, `fired_at`, `spent_usd`, `limit_usd`. Unique constraint prevents duplicate alerts.
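The anti-spam behaviour can be reproduced with a unique constraint and `INSERT OR IGNORE`. In this sketch the column names follow the list above, but the column types, the exact set of columns in the unique key, and the helper name `fire_once` are assumptions.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE budget_alerts (
        scope      TEXT NOT NULL,
        scope_id   TEXT NOT NULL,
        "window"   TEXT NOT NULL,   -- quoted: WINDOW is an SQL keyword
        period_key TEXT NOT NULL,
        level      TEXT NOT NULL,
        fired_at   TEXT NOT NULL,
        spent_usd  REAL,
        limit_usd  REAL,
        UNIQUE (scope, scope_id, "window", period_key, level)
    )
    """
)


def fire_once(scope, scope_id, window, period_key, level, spent, limit) -> bool:
    """Record an alert; return True only the first time it fires in a period."""
    cur = conn.execute(
        'INSERT OR IGNORE INTO budget_alerts '
        '(scope, scope_id, "window", period_key, level, fired_at, spent_usd, limit_usd) '
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (scope, scope_id, window, period_key, level,
         datetime.now(timezone.utc).isoformat(), spent, limit),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means the unique constraint suppressed a duplicate


first = fire_once("global", "", "daily", "2026-06-07", "soft", 1.70, 2.00)
repeat = fire_once("global", "", "daily", "2026-06-07", "soft", 1.75, 2.00)
escalated = fire_once("global", "", "daily", "2026-06-07", "hard", 2.10, 2.00)
```

The second soft alert for the same scope and day is dropped, while the later hard alert still fires because `level` is part of the assumed key.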
### Concurrency Model
Cron jobs run in a `ThreadPoolExecutor` (Hermes `cron/scheduler.py`). Multiple jobs can write to the DB simultaneously from different threads.
**Design:** per-thread SQLite connections via `threading.local()`. Each thread opens its own connection to the same WAL-mode DB file. A shared `_schema_lock` serializes DDL migrations on first connect (the switch to WAL mode requires a brief lock that `busy_timeout` alone doesn’t handle).
`busy_timeout=5000` ensures write collisions retry for 5 seconds before raising. `synchronous=NORMAL` balances durability with write performance (safe for WAL mode).
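The pattern can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: the temporary database path, the one-table schema, and the `get_conn` and `worker` helpers are hypothetical, and the real plugin's migration logic is not shown.

```python
import os
import sqlite3
import tempfile
import threading

DB_PATH = os.path.join(tempfile.mkdtemp(), "telemetry.db")  # throwaway path for the demo
_local = threading.local()
_schema_lock = threading.Lock()
_schema_ready = False


def get_conn() -> sqlite3.Connection:
    """Return this thread's connection, creating it on first use."""
    global _schema_ready
    conn = getattr(_local, "conn", None)
    if conn is not None:
        return conn
    conn = sqlite3.connect(DB_PATH)
    conn.execute("PRAGMA busy_timeout=5000")        # retry locked writes for up to 5 s
    with _schema_lock:                               # only one thread runs the DDL
        if not _schema_ready:
            conn.execute("PRAGMA journal_mode=WAL")  # persisted in the DB file
            conn.execute(
                "CREATE TABLE IF NOT EXISTS tool_calls (session_id TEXT, tool_name TEXT)"
            )
            conn.commit()
            _schema_ready = True
    conn.execute("PRAGMA synchronous=NORMAL")
    _local.conn = conn
    return conn


def worker(n: int) -> None:
    conn = get_conn()
    for i in range(5):
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO tool_calls VALUES (?, ?)", (f"s{n}", f"tool{i}"))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = get_conn().execute("SELECT COUNT(*) FROM tool_calls").fetchone()[0]
```

Ten threads with five writes each mirrors the concurrency test described for `test_db.py`; every write lands because each thread owns its connection and lock contention is absorbed by the busy timeout.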
-----
## Budget Enforcement
> See the budget enforcement demo at the top of this README for an end-to-end walkthrough.
### How It Works
Every time the agent is about to do work, the plugin checks:
1. **`pre_llm_call`** (fires once per turn): evaluates all applicable budget scopes. If any has a `soft` or `hard` verdict that hasn’t been alerted yet this window, injects a one-time notice into the conversation context (anti-spam via `budget_alerts` table). Captures `sender_id`.
1. **`pre_tool_call`** (fires before every tool): re-evaluates budgets. If any scope is in `hard` breach, returns `{"action":"block","message":...}` which aborts the tool call.
1. **For cron jobs with `hard` breach:** additionally calls `cron.jobs.pause_job` to pause future runs.
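A rough sketch of the tool gate follows. Only the `{"action": "block", "message": ...}` return shape comes from the description above; the function signature, the `verdicts` mapping, and the injected `pause_job` callable (a stand-in for `cron.jobs.pause_job`) are assumptions, as is pausing on any hard verdict during a cron session.

```python
from typing import Callable, Optional


def pre_tool_call_gate(
    verdicts: dict[str, str],
    cron_job_id: Optional[str] = None,
    pause_job: Optional[Callable[[str], None]] = None,
) -> Optional[dict]:
    """Return a block action when any budget scope is in hard breach."""
    breached = sorted(scope for scope, level in verdicts.items() if level == "hard")
    if not breached:
        return None  # nothing to enforce; the tool call proceeds
    if cron_job_id is not None and pause_job is not None:
        pause_job(cron_job_id)  # keep the job from running again until resumed
    return {
        "action": "block",
        "message": f"Budget exceeded for: {', '.join(breached)}",
    }


paused: list[str] = []
allowed = pre_tool_call_gate({"global": "soft"})
blocked = pre_tool_call_gate(
    {"global": "hard", "cron_job": "ok"},
    cron_job_id="09dd0c24f29b",
    pause_job=paused.append,
)
```

A soft verdict lets the call through, since soft alerts are handled in `pre_llm_call`; a hard verdict blocks the call and, for a cron session, pauses the job.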
### Enforcement Levels
Hermes does **not** expose a way to abort an in-flight model call from a plugin: return values from `pre_llm_call` and `pre_api_request` can’t cancel a call. The enforcement levels reflect that limit:
|Level |Trigger |Effect |Repeat? |
|-----------------------|-----------------------------------------|------------------------------------------|-----------------------------------|
|**Soft** (≥ `soft_pct`)|Spend reaches 80% of limit (configurable)|One-time notice injected into conversation|Once per window per scope |
|**Hard** (≥ `hard_pct`)|Spend reaches 100% of limit |Every subsequent tool call is blocked |Every tool call until window resets|
|**Cron pause** |Any hard `cron_job` verdict |Job is paused for future runs |Once per window per scope |
The model response already in flight still completes and is billed. What’s prevented is *further* tool-driven work.
### Estimated Data and Budget Degradation
When the provider returns `usage=None`, the plugin estimates tokens and flags the row as `estimated=1`. Since these estimates may be inaccurate, the budget engine offers a safety valve:
**`on_estimated.mode: warn_only` (default):** If a hard verdict rests partly on estimated rows, it is **degraded to soft** — the user gets a warning but tools aren’t blocked. Rationale: a budget built on estimates shouldn’t hard-stop work.
**`on_estimated.mode: enforce`:** Hard verdicts take effect regardless of estimate quality. Use this when you trust your provider’s usage data (Est% = 0) or when estimates are acceptable.
The `/stats providers` command shows the `Est%` column so you can see at a glance whether your provider returns real usage data.
**Estimated-price models:** Some models (e.g. OpenRouter `auto` routing) have no fixed pricing. These are flagged with `_estimated_price: true` in `pricing.yaml` and normalized to `$0.00`. If >0% of calls use these models, budget hard-verdicts are also degraded to soft under `warn_only` mode. See [Pricing Auto-Refresh](#pricing-auto-refresh) for details.
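Putting the thresholds and the degradation rule together, a verdict function could look like this sketch. The name `evaluate`, its parameters, and the treatment of a non-positive limit as "no budget" are assumptions; the thresholds and the `warn_only` / `enforce` modes come from `budget.yaml`.

```python
def evaluate(
    spent_usd: float,
    limit_usd: float,
    soft_pct: float = 0.80,
    hard_pct: float = 1.00,
    has_estimated: bool = False,
    on_estimated: str = "warn_only",
) -> str:
    """Return 'ok', 'soft', or 'hard' for one budget scope."""
    if limit_usd <= 0:
        return "ok"  # assumption: a non-positive limit means no budget is set
    ratio = spent_usd / limit_usd
    if ratio >= hard_pct:
        # A hard verdict that rests on estimated data is downgraded under warn_only.
        if has_estimated and on_estimated == "warn_only":
            return "soft"
        return "hard"
    if ratio >= soft_pct:
        return "soft"
    return "ok"
```

Using figures from this README, `evaluate(0.1812, 0.001)` is a hard breach and `evaluate(0.1812, 2.00)` is back to `ok`. A breach with `has_estimated=True` returns `soft` under the default mode and `hard` once `on_estimated="enforce"` is passed.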
-----
## Provider Probe: Verifying Your Provider Returns Real Usage
Run this **once** after enabling the plugin:
1. Run one short session (any minimal task works)
1. Execute `/stats providers`
1. Look at the `Est%` column for your provider:
- **`0%`** → provider returns real usage data. Budget verdicts are based on real numbers. Set `on_estimated.mode: enforce` for strict enforcement. ✅
- **`> 0%`** → provider omits usage in some responses. Those calls are estimated and flagged. Budget hard-verdicts will be degraded to soft under `warn_only`. The `telemetry.log` will have a **one-time WARNING** per provider. ⚠️
-----
## Proof of Concept
The following PoC was executed live to validate the plugin end-to-end.
### Setup
- **Hermes gateway** running on Linux (WSL), model `openrouter/owl-alpha` (free tier)
- **Plugin:** hermes-telemetry v0.2.0, loaded in gateway process
- **DB:** `/home/nujovich/.hermes/telemetry/telemetry.db` (schema v3, WAL mode)
- **6 cron jobs** configured, 2 used for this PoC
### Pricing Capture
Added models to `~/.hermes/telemetry/pricing.yaml`:
```yaml
models:
  "openrouter/owl-alpha":
    input: 0.00
    output: 0.00
  "openrouter/anthropic/claude-sonnet-4-6":
    input: 3.00
    output: 15.00
    cache_read: 0.30
    cache_write: 3.75
  "openrouter/anthropic/claude-opus-4-7":
    input: 5.00
    output: 25.00
    cache_read: 0.50
    cache_write: 6.25
```
Set `on_estimated.mode: enforce` for deterministic enforcement.
### Budget Enforcement Test
**Step 1 — Trigger a hard breach:**
- Budget: `global.daily_usd: 0.001` ($0.001/day)
- Ran MCP Lead Gen job (model: `claude-sonnet-4-6`, ~$3/$15 per 1M)
- Result: job spent $0.1812 on first run → **18,120% of daily limit** → █ hard breach → **job auto-paused**
```
█ global $0.1812 / $0.00 18120% [daily]
↑ (0.001 rounded to 0.00 in display)
```
**Step 2 — Raise budget and resume:**
```
/budget set global daily 2.00
```
Result after `/budget set`:
```
global $0.1812 / $2.00 9% [daily]
```
**Step 3 — Verify job runs normally:**
- MCP Lead Gen re-ran successfully under the $2.00 daily budget
- Second run confirmed: `state: scheduled`, `paused_at: null`
### Cron Job Cost Comparison
|Job |Model |Price (input/output) |
|--------------------|-------------------|---------------------|
|MCP Lead Gen |`claude-sonnet-4-6`|$3.00 / $15.00 per 1M|
|Marketing Highlights|`claude-opus-4-7` |$5.00 / $25.00 per 1M|
|Base sessions (CLI) |`owl-alpha` |$0.00 / $0.00 (free) |
**Results from SQLite (`/stats` after all runs):**
- **CLI sessions** (owl-alpha, free): ~1M tokens in → **$0.00**
- **MCP Lead Gen** (claude-sonnet-4-6): ~892K tokens in → **$0.314**
- **Marketing Highlights** (claude-opus-4-7): ~445K tokens in → **$2.23** (about 7x the cost of the Sonnet job on roughly half the input tokens; Opus list prices are about 1.7x Sonnet's)
### Results Summary
|Component |Status |
|-------------------------------------|---------------------------------------------------|
|Token capture from provider |✅ Real usage (`estimated=0`) |
|Cost estimation with pricing table |✅ Accurate to pricing YAML |
|Cron job session tracking |✅ Captured via `session_id` regex |
|Budget soft alerts |✅ One-time context injection |
|Budget hard enforcement |✅ Paused job at $0.001/day |
|Budget hot-reload via `/budget set` |✅ Cache cleared, new limit active |
|Multi-model cost comparison |✅ Sonnet vs Opus vs Free |
|Pricing auto-refresh (OpenRouter API)|✅ 320 models fetched, manual overrides preserved |
|Estimated-price model handling |✅ Negative prices → $0.00, budget degradation |
|Dashboard (HTML, auto-refresh 30s) |✅ Charts, tables, budget bar, provider distribution|
|94 tests pass |✅ |
-----
## Comparison
| |hermes-telemetry|TokenTelemetry |Martin Loop |
|------------------|----------------|---------------------|--------------------|
|Hermes-native |✅ Native plugin |❌ Reads external logs|❌ No Hermes support |
|Budget enforcement|✅ Stops the run |❌ Observe only |✅ But not for Hermes|
|Real-time |✅ Pre-call |❌ Post-hoc |✅ Pre-attempt |
|Requires Hermes |✅ Hermes only |Any agent |Claude Code / Codex |
|Local dashboard |✅ |✅ (more complete) |❌ |
|Open source |✅ MIT |✅ MIT |✅ MIT |
**When to use TokenTelemetry instead:** if you need a multi-agent dashboard (Claude Code + Codex + Hermes in one place), TokenTelemetry is the right choice. hermes-telemetry is purpose-built for Hermes operators who need budget enforcement, not just visibility.
-----
## Running Tests
```
cd hermes-telemetry
pip install pytest pyyaml
pytest tests/ -v
```
**Test suite (94 tests):**
|File |Tests|Coverage |
|---------------------------------|-----|-------------------------------------------------------------------------------------------------------------------------------|
|`test_db.py` |15 |Schema v1→v3 migrations, CRUD, aggregations, concurrent WAL writes (10 threads × 5 writes) |
|`test_pricing.py` |17 |Cache/reasoning split, no double-counting of `prompt_tokens`, YAML overrides, prefix matching, unknown model handling |
|`test_init.py` |6 |Cron session ID regex, tool success/failure parsing |
|`test_budget.py` |17 |ok/soft/hard verdicts, estimated-to-soft degradation, anti-spam ledger, cron pause, per-scope routing, `/budget set` hot-reload|
|`test_stats_providers.py` |8 |Real vs estimated per provider, `/stats providers` output format, Nous warning dedup |
|`test_subagent_reconciliation.py`|4 |Parent + child hook sequence, token reconciliation, no double-counting |
No live Hermes is required — all tests are self-contained with in-memory SQLite.
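For readers who want to see what a concurrent-write check can look like, here is a rough sketch in the spirit of the `test_db.py` case (10 threads × 5 writes). It is not taken from the test suite: the `events` table and the helper name are hypothetical. It uses a temporary file because SQLite only enables WAL journaling for file-backed databases.

```python
import sqlite3
import tempfile
import threading
from pathlib import Path


def concurrent_wal_writes(db_path, threads=10, writes_per_thread=5):
    """Write from several threads and return (journal_mode, row_count)."""
    setup = sqlite3.connect(db_path)
    # The PRAGMA returns the journal mode that is actually in effect.
    mode = setup.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # Hypothetical table, used only for this sketch.
    setup.execute("CREATE TABLE IF NOT EXISTS events (thread_id INTEGER, seq INTEGER)")
    setup.commit()
    setup.close()

    def worker(thread_id):
        # One connection per thread; the timeout lets writers wait for the lock.
        conn = sqlite3.connect(db_path, timeout=30)
        try:
            for seq in range(writes_per_thread):
                with conn:  # commits on success, rolls back on error
                    conn.execute("INSERT INTO events VALUES (?, ?)", (thread_id, seq))
        finally:
            conn.close()

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()

    check = sqlite3.connect(db_path)
    count = check.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    check.close()
    return mode, count


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        return concurrent_wal_writes(str(Path(tmp) / "telemetry_test.db"))
```

In a pytest file, `run_demo()` would be called from a test function that asserts the journal mode and that no write was lost.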
-----
## Data Location
```
~/.hermes/telemetry/
├── telemetry.db ← SQLite (WAL mode, ~70KB base + growth)
├── telemetry.log ← Plugin log (errors, debug, one-time warnings)
├── pricing.yaml ← Your model price overrides
└── budget.yaml ← Your spend guardrails
```
The DB grows over time. For high-frequency cron jobs, consider periodic cleanup of old rows (not yet automated — see [Known Limitations](#known-limitations)).
-----
## Known Limitations
**Enforcement gaps:**
- **No true mid-call abort.** `pre_llm_call` / `pre_api_request` cannot cancel an in-flight model call. The response that’s already generating will complete and be billed. The tool-gate (`pre_tool_call`) stops *subsequent* work at the next tool boundary.
- **Runaway text-only sessions.** A session that generates text without calling any tools never hits the tool-gate. If this becomes a problem, a pre-flight check in `on_session_start` for cron jobs could abort before the first LLM call.
**Subagent attribution:**
- Child agents (`delegate_task`) run as their own sessions. Their tokens are captured independently and included in **global** totals, but no hook exposes a parent→child link, so `per_cron_job` budgets **exclude** subagent cost. To cap delegated work, use the `global` budget.
**Pricing refresh only for OpenRouter models:**
- The automatic refresh pulls prices from the OpenRouter API and writes them to `pricing.yaml`, preserving entries you added manually. Models from other providers are not refreshed and must be added by hand.
**DB retention:**
- `telemetry.db` grows without bound. No automatic purge of old rows. For >100K rows, consider manual cleanup or a retention policy (not yet implemented).
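Until retention is built in, a manual cleanup could follow the pattern below. This is only a sketch: the table name `llm_calls` and the epoch-seconds column `created_at` are hypothetical, so inspect the real schema in `telemetry.db` before adapting it, and back up the file first.

```python
import sqlite3
import time


def purge_old_rows(conn, max_age_days, now=None):
    """Delete rows older than max_age_days and return how many were removed.

    Assumes a hypothetical table llm_calls with a created_at column
    holding Unix timestamps in seconds.
    """
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * 86_400
    with conn:  # single transaction, committed on success
        cursor = conn.execute("DELETE FROM llm_calls WHERE created_at < ?", (cutoff,))
    return cursor.rowcount


def demo():
    """Run the purge against an in-memory database with three sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE llm_calls (id INTEGER PRIMARY KEY, created_at REAL)")
    now = 1_800_000_000.0  # fixed reference time so the demo is repeatable
    ages_in_days = [100, 10, 0]
    conn.executemany(
        "INSERT INTO llm_calls (created_at) VALUES (?)",
        [(now - days * 86_400,) for days in ages_in_days],
    )
    conn.commit()
    deleted = purge_old_rows(conn, max_age_days=30, now=now)
    remaining = conn.execute("SELECT COUNT(*) FROM llm_calls").fetchone()[0]
    conn.close()
    return deleted, remaining
```

Deleting rows frees pages for reuse inside the database but does not shrink the file on disk; running `VACUUM` afterwards reclaims that space.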
**Gateway restart required:**
- Enabling the plugin takes effect only after gateway restart. Cron runs that started before the restart won’t have telemetry.
-----
## Troubleshooting
**`/stats cron week` shows “No cron runs in the last 7 days”:**
The gateway loaded before the plugin was enabled. Restart the gateway:
```
hermes gateway restart
```
Then re-run a cron job.
**`/budget` shows `$0.00` as the limit:**
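The limit is cached in memory at gateway start. If you edited `budget.yaml` directly, the cache is stale. Use `/budget set global daily <amount>` to hot-reload, or restart the gateway. Note that the display rounds to two decimals, so a limit well under one cent (such as `0.001`) also renders as `$0.00` even though it is active.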
**Cost is $0.00 for all sessions:**
Your model isn’t in the pricing table. Check `telemetry.log` for a one-time warning like:
```
hermes-telemetry: unknown model 'openrouter/some-model' — cost recorded as $0.00
```
Add it to `pricing.yaml`.
**Provider Est% > 0:**
Your provider returns `usage=None` for some or all calls, so token counts for those calls are estimated. Check `/stats providers` to see which providers are affected. If Est% is 100% for your main provider, all spend is estimated and hard budget verdicts degrade to soft under `warn_only` mode; set `on_estimated.mode: enforce` if you want them to stay hard.
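The interaction between estimated usage and budget verdicts can be summarized in a few lines. This is a sketch of the decision rule as described in this README, not the plugin's code: the function name is hypothetical, and the 80% soft threshold is an assumed value chosen only for illustration.

```python
def budget_verdict(spent_usd, limit_usd, estimated, on_estimated_mode,
                   soft_ratio=0.8):
    """Return "ok", "soft", or "hard" for the current spend.

    soft_ratio is an assumed threshold. on_estimated_mode is either
    "warn_only" or "enforce", matching the values named in this README.
    """
    if limit_usd <= 0:
        raise ValueError("limit must be positive")
    ratio = spent_usd / limit_usd
    if ratio >= 1.0:
        # Estimated spend is not trusted enough to block under warn_only.
        if estimated and on_estimated_mode == "warn_only":
            return "soft"
        return "hard"
    if ratio >= soft_ratio:
        return "soft"
    return "ok"


# Real usage over the cap blocks; estimated usage only warns under warn_only.
real_breach = budget_verdict(0.1812, 0.001, estimated=False, on_estimated_mode="warn_only")
estimated_breach = budget_verdict(0.1812, 0.001, estimated=True, on_estimated_mode="warn_only")
enforced_breach = budget_verdict(0.1812, 0.001, estimated=True, on_estimated_mode="enforce")
```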
**Plugin not loading at all:**
Check `telemetry.log` for errors. Common causes:
- Missing `pyyaml` in the gateway’s venv: `pip install pyyaml`
- Plugin not in `plugins.enabled` in `config.yaml`
- Syntax error in `pricing.yaml` or `budget.yaml`
-----
## License
MIT — see [LICENSE](https://github.com/nujovich/hermes-telemetry/blob/main/LICENSE).
-----
## Hermes Agent Challenge
This plugin was built for the [**Hermes Agent Challenge**](https://dev.to/devteam/join-the-hermes-agent-challenge-1000-in-prizes-13cd) — a $1,000 competition to build the most useful Hermes Agent plugins and extensions.
**🔗 Challenge Entry:** [hermes-telemetry on dev.to](https://dev.to/devteam/join-the-hermes-agent-challenge-1000-in-prizes-13cd)
**🛠️ Built by:** [Nadia Ujovich](https://github.com/nujovich)
**💡 Why this plugin:** Every AI system needs observability and cost control. This plugin gives Hermes Agent users the visibility to optimize their workflows and the guardrails to prevent bill shock — essential for production deployments and automated cron jobs.
-----
*Made with ☕ for the Hermes Agent ecosystem*