https://github.com/nujovich/hermes-telemetry

Budget enforcement + observability plugin for Hermes Agent. Stops runaway costs before they happen.
Host: GitHub
URL: https://github.com/nujovich/hermes-telemetry
Owner: nujovich
License: mit
Created: 2026-05-31T06:51:19.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2026-06-07T18:50:44.000Z (about 2 months ago)
Last Synced: 2026-06-07T20:20:36.429Z (about 2 months ago)
Topics: agent-telemetry, ai-cost-tracking, budget-enforcement, hermes-agent, hermes-plugin, llm-budget, llm-observability, token-tracking
Language: Python
Homepage:
Size: 782 KB
Stars: 7
Watchers: 0
Forks: 2
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README

# hermes-telemetry ☤

> *Observability + budget guardrails for [Hermes Agent](https://github.com/NousResearch/hermes-agent)*

**Budget enforcement + observability for Hermes Agent. The only plugin that can stop a run before it overspends.**

A comprehensive telemetry plugin that captures real usage data, enforces budget limits, and provides detailed cost analysis for AI agent operations. Built for the [Hermes Agent Challenge](https://dev.to/devteam/join-the-hermes-agent-challenge-1000-in-prizes-13cd) by [Nadia Ujovich](https://nadiaujovich.dev).

**The differentiator: it can _stop_ work that's about to overspend — not just report it after the fact.** Set a daily cap below current spend, and the next cron run is blocked by the budget:

![Budget enforcement demo: a $0.001 daily global cap is set, current spend already exceeds it, and the next marketing cron run is blocked by the resulting hard breach](docs/budget_enforcement.gif)

*`/budget set global daily 0.001` writes the cap to `budget.yaml`; current spend ($0.0102) already exceeds it, so `/budget` re-renders at 1020% `[daily]` — a hard breach — and the next marketing cron run is blocked by the budget.*

[![Hermes Agent](https://raw.githubusercontent.com/NousResearch/hermes-agent/HEAD/assets/banner.png)](https://raw.githubusercontent.com/NousResearch/hermes-agent/HEAD/assets/banner.png)

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://camo.githubusercontent.com/08cef40a9105b6526ca22088bc514fbfdbc9aac1ddbf8d4e6c750e3a88a44dca/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d626c75652e737667) [![Tests: 94 passing](https://img.shields.io/badge/Tests-94%20passing-green.svg)](https://camo.githubusercontent.com/89bc4bc6079d0e919e0c1363852fe900e05cb49429800097aa3ca83908c5cd59/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f54657374732d393425323070617373696e672d677265656e2e737667) [![Provider Support](https://img.shields.io/badge/Providers-OpenRouter%20%7C%20OpenAI%20%7C%20Anthropic-orange.svg)](https://camo.githubusercontent.com/cf0938e4acec0cd17c14dcf61a72734ffd03e8fff8eb44e359994f6ea773bfad/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f50726f7669646572732d4f70656e526f757465722532302537432532304f70656e4149253230253743253230416e7468726f7069632d6f72616e67652e737667) [![Challenge Entry](https://img.shields.io/badge/Hermes%20Agent-Challenge%20Entry-purple.svg)](https://camo.githubusercontent.com/d0c993fdf35127e435629279025d4b1892e351f5e04ce1547329686aa4223366/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4865726d65732532304167656e742d4368616c6c656e6765253230456e7472792d707572706c652e737667)

-----

Hermes Agent runs autonomously across sessions, platforms, and cron jobs, which means it can keep spending even when you're not watching.
**hermes-telemetry lives inside the runtime** and enforces hard budget limits before the next tool call runs.

> This plugin addresses [NousResearch/hermes-agent#6642](https://github.com/NousResearch/hermes-agent/issues/6642),
> the open feature request for a first-class telemetry and budget subsystem for Hermes Agent.

```
Your Hermes session
  ↓ every API call
hermes-telemetry (native plugin)
  → tracks tokens + cost in real time
  → enforces budget limits mid-session
  → logs to SQLite with WAL mode
  → syncs OpenRouter pricing automatically
  ↓ if budget OK
LLM provider
```

> **Not a log reader.** TokenTelemetry and similar tools read what already happened.
> hermes-telemetry hooks into the Hermes runtime and can *stop* what’s about to happen.

-----

**Design principle:** observability is invisible to the model. Everything goes through hooks. The only user-facing surfaces in a session are the slash commands: `/stats`, `/budget`, and `/setup`.

-----

## Table of Contents

- [Screenshots](#screenshots)

  - [Dashboard (Web UI)](#dashboard-web-ui)

  - [Slash Commands](#slash-commands)

- [What It Measures](#what-it-measures)

- [Installation](#installation)

- [Quick Start](#quick-start)

- [Setup Wizard](#setup-wizard)

- [Dashboard (Web UI)](#dashboard-web-ui-1)

  - [Auto-Refresh](#auto-refresh)

  - [Features](#features)

- [Slash Commands](#slash-commands-1)

  - [/stats](#stats)

  - [/budget](#budget)

- [Configuration](#configuration)

  - [pricing.yaml](#pricingyaml)

  - [budget.yaml](#budgetyaml)

- [Pricing Auto-Refresh](#pricing-auto-refresh)

  - [How It Works](#how-it-works)

  - [Estimated-Price Models](#estimated-price-models)

  - [CLI Usage](#cli-usage)

- [Architecture](#architecture)

  - [Hook Pipeline](#hook-pipeline)

  - [Database Schema](#database-schema)

  - [Concurrency Model](#concurrency-model)

- [Budget Enforcement](#budget-enforcement)

  - [How It Works](#how-it-works-1)

  - [Enforcement Levels](#enforcement-levels)

  - [Estimated Data and Budget Degradation](#estimated-data-and-budget-degradation)

- [Provider Probe: Verifying Your Provider](#provider-probe-verifying-your-provider)

- [Proof of Concept](#proof-of-concept)

  - [Setup](#setup)

  - [Pricing Capture](#pricing-capture)

  - [Budget Enforcement Test](#budget-enforcement-test)

  - [Cron Job Cost Comparison](#cron-job-cost-comparison)

  - [Results Summary](#results-summary)

- [Comparison](#comparison)

- [Running Tests](#running-tests)

- [Data Location](#data-location)

- [Known Limitations](#known-limitations)

- [Troubleshooting](#troubleshooting)

- [License](#license)

- [Hermes Agent Challenge](#hermes-agent-challenge)

-----

## Screenshots

### Dashboard (Web UI)

A standalone HTML dashboard for users who prefer a visual interface over slash commands. Served locally, reads directly from the telemetry SQLite database.

[![Dashboard overview](https://github.com/nujovich/hermes-telemetry/raw/main/docs/screenshots/dashboard-overview.png)](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/dashboard-overview.png)

*The dashboard auto-refreshes every 30 seconds. Shows sessions, API calls, tokens, cost, budget status, daily cost trends, top tools, cost by cron job, provider distribution, and recent sessions.*

### Slash Commands

#### `/stats` — Session analytics

[![Stats output](https://github.com/nujovich/hermes-telemetry/raw/main/docs/screenshots/stats-output.png)](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/stats-output.png)

#### `/budget` — Current spending vs limits

[![Budget output](https://github.com/nujovich/hermes-telemetry/raw/main/docs/screenshots/budget-output.png)](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/budget-output.png)

#### `/stats cron week` — Cron job cost breakdown

[![Cron output](https://github.com/nujovich/hermes-telemetry/raw/main/docs/screenshots/cron-output.png)](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/cron-output.png)

#### `/stats providers` — Real vs estimated usage + estimated-price warning

[![Providers output](https://github.com/nujovich/hermes-telemetry/raw/main/docs/screenshots/providers-output.png)](https://github.com/nujovich/hermes-telemetry/blob/main/docs/screenshots/providers-output.png)

-----

## What It Measures

|Metric                                   |Source                         |Real or Estimated       |
|-----------------------------------------|-------------------------------|------------------------|
|Tokens in / out per API call             |`post_api_request.usage`       |✅ Real (from provider)  |
|Cache read / write tokens                |`post_api_request.usage`       |✅ Real (from provider)  |
|Reasoning tokens                         |`post_api_request.usage`       |✅ Real (from provider)  |
|API call latency                         |`post_api_request.api_duration`|✅ Real (ms)             |
|Tool call latency & success/failure      |`post_tool_call`               |✅ Real                  |
|Session / cron job wall time             |`started_at` → `ended_at`      |✅ Real                  |
|Model & provider name                    |`post_api_request`             |✅ Real                  |
|Platform (cli / cron / telegram / …)     |`on_session_start.platform`    |✅ Real                  |
|Cron job ID                              |Parsed from `session_id`       |✅ Real                  |
|Subagent invocation count                |`subagent_stop` hook           |✅ Real (proxy)          |
|**Cost (USD)**                           |Local pricing table × tokens   |⚠️ **Estimated**         |
|Tokens when provider returns `usage=None`|Fallback approximation         |⚠️ **Estimated, flagged**|

Cost is always an **estimate** computed from a locally maintained pricing table. No pricing API is called when a request is costed; the table itself can be refreshed from OpenRouter separately (see [Pricing Auto-Refresh](#pricing-auto-refresh)). When the provider returns no usage data, tokens are estimated from a pre-request approximation plus the response length, and the row is flagged as `estimated=1`, so `/stats` and `/budget` show a `~` prefix and an “estimated data” percentage.

-----

## Installation

Hermes plugins are **opt-in** — you must both install and enable the plugin.

### Option A: Install from GitHub

```
hermes plugins install nujovich/hermes-telemetry
hermes plugins enable hermes-telemetry
```

### Option B: Manual install

```
git clone https://github.com/nujovich/hermes-telemetry ~/.hermes/plugins/hermes-telemetry
hermes plugins enable hermes-telemetry
```

**Important:** restart the Hermes gateway after enabling:

```
hermes gateway restart
```

> **Note:** Plugin changes only take effect after a gateway restart. The gateway loads the plugin registry at startup. If you enable a plugin and cron jobs don’t appear in `/stats cron week`, this is the most likely cause.

-----

## Quick Start

1. Install and enable the plugin (see above)

1. Restart the gateway

1. Run any session, then type `/stats` to see captured data

1. Optionally configure `pricing.yaml` and `budget.yaml` (see below)

That’s it. The plugin captures data automatically — no agent action required.

-----

## Setup Wizard

hermes-telemetry includes a first-time setup wizard that runs automatically on first plugin load when `pricing.yaml` and/or `budget.yaml` are missing. It can also be triggered manually at any time with the `/setup` slash command.

### Auto-setup (first load)

On first load, if either config file is missing, the plugin auto-generates defaults:

- **Pricing:** fetches all models with fixed pricing from the OpenRouter API and merges them with ~30 built-in defaults (Anthropic, OpenAI, DeepSeek, Google, Meta, Nous). New prices take effect immediately; no gateway restart is needed.

- **Budget:** writes a conservative global budget (`$5.00/day`, `$100.00/month`) with an 80% soft warning and 100% hard cap.

### `/setup` slash command

Use `/setup` to check configuration status or reconfigure individual files.

```
/setup                     → show current status (which files exist)
/setup pricing auto        → built-in defaults + fetch from OpenRouter API
/setup pricing minimal     → built-in defaults only (~30 models, no network)
/setup pricing skip        → skip (unrecognized models will record $0.00 cost)
/setup budget default      → recommended global budget ($5/day, $100/month)
/setup budget custom       → instructions for setting your own limits manually
/setup budget skip         → no enforcement (costs still tracked)
```

#### Pricing options

| Option | Models | Network |
|--------|--------|---------|
| `auto` | ~30 built-in + all OpenRouter fixed-price models | Yes (OpenRouter API) |
| `minimal` | ~30 built-in only | No |
| `skip` | None (models will record `$0.00` cost) | No |

#### Budget options

| Option | Behavior |
|--------|----------|
| `default` | Global: `$5.00/day`, `$100.00/month`. Soft warning at 80%, hard block at 100% |
| `custom` | Prints the `/budget set` commands for manual configuration |
| `skip` | Costs tracked but never enforced |

### Re-running setup

Setup skips files that already exist. To reconfigure:

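```bash
# Reprice from scratch: delete the file from a shell...
rm ~/.hermes/telemetry/pricing.yaml
# ...then run this slash command inside a Hermes session (not in the shell):
#   /setup pricing auto

# Reset budget: same pattern
rm ~/.hermes/telemetry/budget.yaml
#   /setup budget default
```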

> **Note:** Pricing changes take effect immediately without a gateway restart. Budget changes require a restart.

-----

## Slash Commands

### `/stats`

```
/stats                  → last 24h summary (sessions, tokens, cost, top tools)
/stats today            → same as /stats
/stats week             → last 7 days
/stats month            → last 30 days
/stats cron             → breakdown by cron_job_id (last 7 days)
/stats cron week        → cron breakdown, last 7 days
/stats cron month       → cron breakdown, last 30 days
/stats cron today       → cron breakdown, last 24 hours
/stats providers        → per-provider: real vs estimated calls + cost (last 24h)
/stats providers week   → provider breakdown, last 7 days
/stats models           → per-model breakdown within each provider (last 24h)
/stats models week      → per-model breakdown, last 7 days
/stats raw [N]          → last N raw run records (default 20, max 200)
```

**Example output (`/stats`):**

```
hermes-telemetry — last 24 h
============================================
  Sessions      : 14
  Success rate  : 92.9%  (ok=13, failed=1)
  API calls     : 47
  Tool calls    : 183
  Tokens in     : 1,240,500
  Tokens out    : 87,300
  Cost (est.)   : $0.004822
  Avg latency   : 1.2s
  Avg duration  : 48.3s

  Top tools:
  Tool                            Calls  Failures   Avg ms
  --------------------------------------------------------
  read_file                          92         0      12ms
  terminal                           51         3     340ms
  write_file                         28         0      18ms
```

**Example output (`/stats cron week`):**

```
hermes-telemetry — cron jobs (last 7 days)
========================================================================
  Job ID               Runs    OK  Fail     Tok-in    Tok-out         Cost   Avg dur
  --------------------------------------------------------------------------
  09dd0c24f29b            3     3     0   892,341    12,405    $0.314378     2.1m
  d68c2728b513            1     1     0   445,119     8,200    $2.225595     4.7m
```

**Example output (`/stats providers`):**

```
hermes-telemetry — providers (last 24 h)
========================================================================
  Provider                     Calls   Real   Est   Est%         Cost
  -------------------------------------------------------------------
  openrouter                      66     66      0     0%    $0.916782

  Est% = share of calls where the provider returned no usage data
  (tokens estimated locally).
  If Est% > 0 for your main provider, budget hard-verdicts may be
  degraded to soft under on_estimated.mode: warn_only.
```

**Example output (`/stats models`):**

```
hermes-telemetry — models (last 24 h)
================================================================================================
  Provider             Model                                           Calls   Real   Est         Cost
  ----------------------------------------------------------------------------------------------
  openrouter           owl-alpha                                          66     66     0    $0.000000
  openrouter           anthropic/claude-sonnet-4-6                        42     42     0    $0.314378
  openrouter           anthropic/claude-opus-4-7                           8      8     0    $2.225595

  Rows are grouped by provider, then by calls (desc). A model showing $0.00 has no price entry
  in pricing.yaml — run /setup pricing auto to refresh, or add it manually.
```

Breaks each provider's spend down to individual models. Rows are grouped by provider (ascending), then ordered by call count within each provider; the `Model` column is kept wide so dated model keys stay readable. Columns: `Calls` (total), `Real` (calls with provider-reported usage), `Est` (calls with locally estimated tokens), and `Cost`. A model showing `$0.000000` has no price entry in `pricing.yaml`.

### `/budget`

```
/budget                             → status of every scope (spent / limit / %)
/budget cron                        → per-cron-job budgets, with soft/hard flags
/budget set global daily 5.00       → set or raise a limit (persists + hot-reloads)
/budget set cron_job daily 1.00     → set default per-cron-job limit
/budget set sender daily 2.00       → set default per-sender limit
```

**Example output (`/budget`):**

```
hermes-telemetry — budget status
============================================================
  global                       $   0.1812 / $    2.00      9%  [daily]

  Legend:  (blank)=ok  !=soft (≥80%)  █=hard (≥100%)  ~est=estimated data
```

**Status flags:**

|Flag   |Meaning                                                    |
|-------|-----------------------------------------------------------|
|(blank)|Within budget (`< 80%`)                                    |
|`!`    |Soft warning (≥ 80%): notice injected into conversation    |
|`█`    |Hard breach (≥ 100%): tool calls blocked, cron jobs paused |
|`~est` |Verdict based partly on estimated (usage=None) data        |

-----

## Dashboard (Web UI)

A standalone HTML dashboard for users who prefer a visual interface over slash commands. Zero dependencies — uses only Python stdlib.

### Auto-Refresh

The dashboard auto-refreshes every 30 seconds. No manual reload needed.

### Features

- **Summary cards**: Sessions, OK/failed, API calls, tokens in, cost

- **Budget bar**: Real-time spend vs limit with progress indicator

- **Daily cost chart**: 7-day line chart of spending

- **Top tools chart**: Bar chart of most-used tools

- **Cost by cron job**: Per-job cost breakdown

- **Provider distribution**: Donut chart (nous / openrouter / anthropic)

- **Cron jobs table**: Runs, tokens, cost, avg duration, last run

- **Recent sessions table**: All sessions with platform, model, status, cost

- **Time range selector**: Last 24h / 7 days / 30 days

### Usage

```
cd ~/.hermes/plugins/hermes-telemetry/dashboard
python3 serve.py                  # http://localhost:8765 (loopback only)
python3 serve.py --port 9090      # custom port, still loopback
python3 serve.py 9090             # positional port (back-compat)
```

Then open `http://localhost:8765` in your browser.

### Accessing the dashboard from another host

The dashboard has **no authentication**: anyone who can reach the port sees every captured token, cost, and tool-call detail. By default it binds to `127.0.0.1`, which is unreachable from other machines.

If your Hermes server is headless (Pi, VPS, NAS) and you browse from a laptop, there are two options:

**Recommended: SSH tunnel** (no server-side change, leaves the safe default in place):

```bash
# Start the dashboard on the server; nohup plus redirection lets the ssh command return
ssh server "cd ~/.hermes/plugins/hermes-telemetry/dashboard && nohup python3 serve.py > /dev/null 2>&1 &"
# Tunnel from your client
ssh -L 8765:localhost:8765 -N server &
# Browse on the client (macOS; use xdg-open on Linux)
open http://localhost:8765
```

**Trusted-LAN shortcut — `--host 0.0.0.0`:**

```bash
python3 serve.py --host 0.0.0.0
```

The script prints a warning when binding to any non-loopback interface. Only use this on a network where you trust every host. **Do not expose to the public internet or to networks that include untrusted hosts**: the dashboard ships without an auth layer by design (see CONTRIBUTING.md if you want to add one).

-----

## Configuration

Configuration lives in `~/.hermes/telemetry/`:

```
~/.hermes/telemetry/
├── telemetry.db      ← SQLite database (WAL mode)
├── telemetry.log     ← plugin log (errors / debug)
├── pricing.yaml      ← optional pricing overrides
└── budget.yaml       ← optional spend budgets
```

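If these files don’t exist, the plugin still works with defaults: models that are missing from the built-in pricing table are costed at `$0.00`, and budgets are disabled. (The setup wizard normally generates both files on first load.)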

### `pricing.yaml`

Override model prices in USD per 1 million tokens. Without overrides, unknown models log a one-time warning and record cost as `$0.00`.

**Full format:**

```yaml
models:
  # Free model
  "openrouter/owl-alpha":
    input: 0.00
    output: 0.00

  # Paid model with full cache/reasoning split
  "openrouter/anthropic/claude-sonnet-4-6":
    input: 3.00
    output: 15.00
    cache_read: 0.30
    cache_write: 3.75
    reasoning: 15.00

  # Minimal override (cache prices derived from multipliers)
  "openrouter/anthropic/claude-opus-4-7":
    input: 5.00
    output: 25.00

defaults:
  cache_read_multiplier: 0.10   # cache_read = input * 0.10 if not specified
  cache_write_multiplier: 1.25  # cache_write = input * 1.25 if not specified
```
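To make the multiplier fallback concrete, here is a minimal sketch of how a per-call cost could be derived from one `pricing.yaml` entry. The helper name `estimate_cost`, the choice to bill cache and reasoning tokens separately from plain input/output tokens, and the fallback of the reasoning price to the output price are assumptions for illustration, not taken from the plugin's source.

```python
PER_MILLION = 1_000_000  # pricing.yaml prices are USD per 1 million tokens


def estimate_cost(entry, defaults, tokens_in=0, tokens_out=0,
                  cache_read=0, cache_write=0, reasoning=0):
    """Estimated USD cost of one API call (hypothetical helper)."""
    price_in = entry["input"]
    price_out = entry["output"]
    # Cache prices missing from the entry are derived from the input price.
    price_cr = entry.get("cache_read", price_in * defaults["cache_read_multiplier"])
    price_cw = entry.get("cache_write", price_in * defaults["cache_write_multiplier"])
    # Assumption: an unspecified reasoning price falls back to the output price.
    price_r = entry.get("reasoning", price_out)
    total = (tokens_in * price_in + tokens_out * price_out
             + cache_read * price_cr + cache_write * price_cw
             + reasoning * price_r)
    return total / PER_MILLION


defaults = {"cache_read_multiplier": 0.10, "cache_write_multiplier": 1.25}
opus = {"input": 5.00, "output": 25.00}  # the "minimal override" entry above
cost = estimate_cost(opus, defaults, tokens_in=10_000, tokens_out=1_000,
                     cache_read=20_000)
print(f"${cost:.6f}")
```

Here the cache-read price is not listed, so it is derived as `5.00 * 0.10 = 0.50` USD per million tokens.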

**Matching rules (in order):**

1. Exact match (case-insensitive) against `models:` keys in your YAML

1. Exact match against the built-in pricing table (~35 models)

1. Longest-prefix match (e.g. `claude-sonnet` matches `claude-sonnet-4-6-future`)

1. Unknown → `$0.00` with a one-time warning in `telemetry.log`

The built-in table covers: Anthropic (Claude 3/4 family), OpenAI (GPT-4o, GPT-4, o1, o3, o4), DeepSeek, Gemini, Llama, and Hermes models. Prices sourced from official provider pages (May 2026).
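The matching order can be sketched as a small lookup function. This is an illustration under assumptions, not the plugin's code: `resolve_price` is a hypothetical name, case-insensitive comparison is applied to every step (the source states it only for step 1), and the prefix search is assumed to cover both tables.

```python
def resolve_price(model, user_prices, builtin_prices):
    """Return (price_entry, how_it_matched) following the four rules above."""
    key = model.lower()
    user = {k.lower(): v for k, v in user_prices.items()}
    builtin = {k.lower(): v for k, v in builtin_prices.items()}

    if key in user:                      # 1. exact match in pricing.yaml
        return user[key], "user"
    if key in builtin:                   # 2. exact match in the built-in table
        return builtin[key], "builtin"

    # 3. longest table key that is a prefix of the model name
    candidates = [(k, v) for table in (user, builtin)
                  for k, v in table.items() if key.startswith(k)]
    if candidates:
        _, entry = max(candidates, key=lambda kv: len(kv[0]))
        return entry, "prefix"

    # 4. unknown model: cost is recorded as $0.00
    return {"input": 0.0, "output": 0.0}, "unknown"


builtin = {
    "claude": {"input": 1.00, "output": 5.00},          # illustrative prices
    "claude-sonnet": {"input": 3.00, "output": 15.00},
}
entry, how = resolve_price("claude-sonnet-4-6-future", {}, builtin)
print(how, entry)
```

With both `claude` and `claude-sonnet` present, the longer key wins, which is what keeps a future dated model on its own family's price.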

### `budget.yaml`

Configure spend guardrails. No file → budgets disabled.

```yaml
budgets:
  global:
    daily_usd: 2.00
    monthly_usd: 50.00

  per_cron_job:
    default:
      daily_usd: 1.00
    overrides:
      daily_email_report:
        daily_usd: 3.00

  per_sender:
    default:
      daily_usd: 2.00
    overrides:
      premium_user_123:
        daily_usd: 5.00

thresholds:
  soft_pct: 0.80    # warn at 80% of limit
  hard_pct: 1.00    # enforce at 100%

on_estimated:
  mode: enforce     # warn_only | enforce
```
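As a rough sketch of how a limit could be resolved from this structure, the function below looks for a per-ID override first and falls back to the scope's `default`. The function name `resolve_limit` and the "override, then default, else no limit" order are assumptions inferred from the YAML layout; the config is written as a plain dict so the example needs no YAML parser.

```python
budgets = {
    "global": {"daily_usd": 2.00, "monthly_usd": 50.00},
    "per_cron_job": {
        "default": {"daily_usd": 1.00},
        "overrides": {"daily_email_report": {"daily_usd": 3.00}},
    },
    "per_sender": {
        "default": {"daily_usd": 2.00},
        "overrides": {"premium_user_123": {"daily_usd": 5.00}},
    },
}


def resolve_limit(budgets, scope, scope_id, window):
    """Return the USD limit for a scope/window, or None if none is set."""
    field = f"{window}_usd"              # "daily_usd" or "monthly_usd"
    section = budgets.get(scope, {})
    if scope == "global":
        return section.get(field)
    override = section.get("overrides", {}).get(scope_id, {})
    default = section.get("default", {})
    return override.get(field, default.get(field))


print(resolve_limit(budgets, "per_cron_job", "daily_email_report", "daily"))
print(resolve_limit(budgets, "per_cron_job", "some_other_job", "daily"))
```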

**Scope resolution:**

|Scope         |How spend is calculated                                      |
|--------------|-------------------------------------------------------------|
|`global`      |All sessions + all cron jobs combined                        |
|`per_cron_job`|Sessions where `cron_job_id` matches (excludes subagent cost)|
|`per_sender`  |Sessions from a specific sender (multi-user gateways)        |

**Window math:** daily and monthly windows are computed in the user’s local timezone. A cron job that runs at 11:59 PM and another at 12:01 AM count against different daily windows.
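The midnight example can be reproduced with a small sketch that maps a UTC timestamp to a window key in local time. The key formats (`YYYY-MM-DD`, `YYYY-MM`) and the function name are assumptions; a fixed UTC-3 offset stands in for the user's timezone so the example does not depend on the host's settings.

```python
from datetime import datetime, timedelta, timezone


def period_key(ts_utc, window, local_tz):
    """Map a UTC timestamp to the key of its local daily or monthly window."""
    local = ts_utc.astimezone(local_tz)
    if window == "daily":
        return local.strftime("%Y-%m-%d")
    if window == "monthly":
        return local.strftime("%Y-%m")
    raise ValueError(f"unknown window: {window}")


local_tz = timezone(timedelta(hours=-3))  # stand-in for the user's timezone

late_run = datetime(2026, 6, 2, 2, 59, tzinfo=timezone.utc)   # 23:59 local, June 1
early_run = datetime(2026, 6, 2, 3, 1, tzinfo=timezone.utc)   # 00:01 local, June 2

print(period_key(late_run, "daily", local_tz))
print(period_key(early_run, "daily", local_tz))
```

Both runs fall on the same UTC date, yet they land in different local daily windows while sharing one monthly window.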

-----

## Pricing Auto-Refresh

The plugin can automatically fetch model pricing from OpenRouter’s public API, eliminating the need to manually maintain `pricing.yaml` for hundreds of models.

### How It Works

- **Source**: OpenRouter public API (`https://openrouter.ai/api/v1/models`) — no auth required

- **Frequency**: Once per 24 hours (tracked via sentinel file)

- **Trigger**: Automatically on plugin load (gateway startup), or manually via CLI

- **Merge strategy**:

  - User overrides in `pricing.yaml` are **always preserved** — manual entries take priority over auto-fetched ones

  - New models from the API are added automatically

  - Previously auto-fetched models are updated when prices change

  - Models are tagged with `_auto: true` and `_source: openrouter` for traceability
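The merge strategy above could look roughly like the following. It is a sketch under assumptions: `merge_pricing` is a hypothetical name, and an entry is treated as a user override whenever it lacks the `_auto` tag.

```python
def merge_pricing(existing, fetched, source="openrouter"):
    """Merge freshly fetched prices into the current pricing table."""
    merged = dict(existing)
    for model, price in fetched.items():
        current = merged.get(model)
        if current is not None and not current.get("_auto"):
            continue  # manual entry: always preserved
        # New model, or a previously auto-fetched one: add or update, and tag it.
        merged[model] = {**price, "_auto": True, "_source": source}
    return merged


existing = {
    "my/custom-model": {"input": 1.00, "output": 2.00},  # user override
    "stepfun/step-3.7-flash": {"input": 0.9999, "output": 9.9999,
                               "_auto": True, "_source": "openrouter"},
}
fetched = {
    "my/custom-model": {"input": 9.00, "output": 9.00},
    "stepfun/step-3.7-flash": {"input": 0.20, "output": 1.15},
    "anthropic/claude-opus-4.8": {"input": 5.00, "output": 25.00},
}
merged = merge_pricing(existing, fetched)
for name, entry in merged.items():
    print(name, entry)
```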

### Estimated-Price Models

Some OpenRouter models have no fixed pricing (e.g. `auto` routing, experimental models). These are represented with negative prices in the API.

The plugin handles these safely:

- Prices are normalized to `$0.00` (they don’t inflate cost calculations)

- Flagged with `_estimated_price: true` in `pricing.yaml`

- The budget engine detects when spend uses these models
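A minimal sketch of the normalization step, assuming the API reports prices as per-token USD strings and uses a negative value to mean "no fixed price". The function name and the conversion to per-million prices are illustrative assumptions rather than the plugin's actual code.

```python
def normalize_entry(prompt_per_token, completion_per_token):
    """Convert one API pricing record into a pricing.yaml-style entry."""
    price_in = float(prompt_per_token)
    price_out = float(completion_per_token)
    if price_in < 0 or price_out < 0:
        # No fixed pricing: record $0.00 and flag it so budgets can react.
        return {"input": 0.0, "output": 0.0, "_estimated_price": True}
    # Assumption: per-token USD is scaled to USD per 1 million tokens.
    return {
        "input": round(price_in * 1_000_000, 6),
        "output": round(price_out * 1_000_000, 6),
    }


print(normalize_entry("-1", "-1"))
print(normalize_entry("0.000003", "0.000015"))
```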

**Budget degradation logic:**

|Condition                               |Effect                                                                                                                              |
|----------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|
|`on_estimated.mode: warn_only` (default)|If any calls use estimated-price models, **hard verdicts are degraded to soft**: the user gets a warning but tools aren’t blocked   |
|`on_estimated.mode: enforce`            |Hard verdicts take effect regardless                                                                                                |

### CLI Usage

```
# Dry run: see what would change
python -m hermes_telemetry.pricing_refresh --check

# Apply changes
python -m hermes_telemetry.pricing_refresh

# Verbose output
python -m hermes_telemetry.pricing_refresh --verbose
```

**Example output:**

```
INFO OpenRouterSource: fetched 320 models
Updated 3 model(s):
  ~ stepfun/step-3.7-flash  (openrouter)
      input: 0.9999 → 0.2000
      output: 9.9999 → 1.1500
  + anthropic/claude-opus-4.8  (openrouter)
      input=5.0000 output=25.0000
  ⚠  Model(s) with estimated pricing: openrouter/auto, openrouter/bodybuilder, openrouter/pareto-code
```

### Extending with New Sources

Add new pricing providers by subclassing `PricingSource`:

```python
from hermes_telemetry.pricing_refresh import PricingSource, register_source


class AnthropicSource(PricingSource):
    name = "anthropic"

    def fetch(self) -> dict[str, dict]:
        # Fetch from Anthropic's pricing page or API
        ...


register_source(AnthropicSource)
```

Sources are registered in `pricing_refresh.py` and fetched in parallel on each refresh cycle.

-----

## Architecture

### Hook Pipeline

The plugin registers 10 hooks (out of 16 available in Hermes) plus its slash commands (`/stats`, `/budget`, `/setup`):

```
Hook                      Purpose
─────────────────────────────────────────────────────────────
on_session_start          Create run row, extract cron_job_id
pre_api_request           Stash approx_input_tokens for fallback
post_api_request          PRIMARY: record tokens, cost, latency
post_tool_call            Record tool name, success, duration
post_llm_call             Refresh session end timestamp
subagent_stop             Record delegate_task proxy on parent
on_session_end            Set final status (ok/error/interrupted)
on_session_finalize       Safety net: ensure run is closed
pre_llm_call              Soft budget alerts + capture sender_id
pre_tool_call             Hard budget enforcement (tool-gate)
```

**Why `post_api_request` is the primary hook for tokens:** The Hermes conversation loop can make multiple API calls per turn (retries, reasoning models, tool calls). Only `post_api_request` carries the canonical `usage` dict with token counts and cost data. `pre_llm_call` fires once per turn with no token data. `post_llm_call` fires after the tool loop with no token data.

**Cron job identification:** There is no `cron_job_id` in any hook. The plugin extracts it from the `session_id`, which follows the format `cron_{job_id}_{YYYYMMDD_HHMMSS}` (confirmed in Hermes source). An anchored regex handles job IDs that contain underscores.
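A sketch of such an anchored regex is shown below. The pattern and the function name are illustrative rather than copied from the plugin; the point is that anchoring the fixed-width timestamp at the end lets the job ID absorb any underscores of its own.

```python
import re

# cron_{job_id}_{YYYYMMDD_HHMMSS}: the trailing timestamp is fixed-width,
# so everything between "cron_" and it is the job ID.
_CRON_SESSION_RE = re.compile(r"^cron_(?P<job_id>.+)_(?P<ts>\d{8}_\d{6})$")


def extract_cron_job_id(session_id):
    """Return the cron job ID embedded in a session ID, or None."""
    match = _CRON_SESSION_RE.match(session_id)
    return match.group("job_id") if match else None


print(extract_cron_job_id("cron_09dd0c24f29b_20260601_120000"))
print(extract_cron_job_id("cron_daily_email_report_20260601_093000"))
print(extract_cron_job_id("20260601_120000_ab12cd"))  # CLI session: no job ID
```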

### Database Schema

SQLite with WAL mode, per-thread connections, schema v3:

**`runs`** — one row per session (CLI session or cron job execution):

|Column                    |Description                                                                     |
|--------------------------|--------------------------------------------------------------------------------|
|`session_id`              |Primary key (`{YYYYMMDD_HHMMSS}_{uuid6}` for CLI, `cron_{job_id}_{ts}` for cron)|
|`platform`                |`cli`, `cron`, `telegram`, `discord`, etc.                                      |
|`cron_job_id`             |Extracted from session_id when platform=cron                                    |
|`model`                   |Model name (updated from last API call)                                         |
|`provider`                |Provider name (e.g. `openrouter`, `anthropic`)                                  |
|`started_at` / `ended_at` |ISO-8601 UTC timestamps                                                         |
|`status`                  |`running`, `ok`, `error`, `interrupted`                                         |
|`tokens_in` / `tokens_out`|Accumulated across all API calls in the session                                 |
|`cost_usd`                |Accumulated estimated cost                                                      |
|`duration_ms`             |Wall time (ms) via `julianday()`                                                |
|`api_calls` / `tool_calls`|Counters                                                                        |
|`parent_session_id`       |Reserved for future parent-child linking (not populated in v0.2)                |
|`estimated_llm_calls`     |Count of calls where provider returned `usage=None`                             |
|`sender_id`               |For per-sender budgets (set via `pre_llm_call`)                                 |

**`llm_calls`** — one row per individual API call:

All of `runs` token/cost columns, plus `cache_read_tokens`, `cache_write_tokens`, `reasoning_tokens`, `estimated` (boolean).

**`tool_calls`** — one row per tool execution:

`session_id`, `ts`, `tool_name`, `ok` (boolean), `latency_ms`.
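As an illustration of how these columns can feed the "Top tools" section of `/stats`, the sketch below aggregates an in-memory copy of the table. The column types and the query are assumptions; only the column names come from the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tool_calls ("
    "session_id TEXT, ts TEXT, tool_name TEXT, ok INTEGER, latency_ms REAL)"
)
conn.executemany(
    "INSERT INTO tool_calls VALUES (?, ?, ?, ?, ?)",
    [
        ("s1", "2026-06-01T10:00:00Z", "read_file", 1, 10.0),
        ("s1", "2026-06-01T10:00:01Z", "read_file", 1, 14.0),
        ("s1", "2026-06-01T10:00:02Z", "terminal", 0, 400.0),
        ("s1", "2026-06-01T10:00:03Z", "terminal", 1, 280.0),
        ("s1", "2026-06-01T10:00:04Z", "terminal", 1, 340.0),
    ],
)

# Calls, failures (ok = 0), and average latency per tool, busiest first.
top_tools = conn.execute(
    """
    SELECT tool_name,
           COUNT(*)        AS calls,
           SUM(ok = 0)     AS failures,
           AVG(latency_ms) AS avg_ms
    FROM tool_calls
    GROUP BY tool_name
    ORDER BY calls DESC
    """
).fetchall()

for row in top_tools:
    print(row)
```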

**`budget_alerts`** — anti-spam ledger:

`scope`, `scope_id`, `window`, `period_key`, `level`, `fired_at`, `spent_usd`, `limit_usd`. Unique constraint prevents duplicate alerts.
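One way a unique constraint can act as an anti-spam ledger is `INSERT OR IGNORE`: the first insert for a given key succeeds and later ones are silently dropped. The sketch below assumes the constraint spans `(scope, scope_id, window, period_key, level)` and uses a hypothetical helper name; the real constraint columns may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE budget_alerts (
        scope TEXT, scope_id TEXT, "window" TEXT, period_key TEXT,
        level TEXT, fired_at TEXT, spent_usd REAL, limit_usd REAL,
        UNIQUE (scope, scope_id, "window", period_key, level)
    )
    """
)


def record_alert_once(conn, scope, scope_id, window, period_key, level,
                      spent_usd, limit_usd):
    """Return True only the first time this alert fires in its window."""
    cur = conn.execute(
        'INSERT OR IGNORE INTO budget_alerts '
        '(scope, scope_id, "window", period_key, level, '
        ' fired_at, spent_usd, limit_usd) '
        "VALUES (?, ?, ?, ?, ?, datetime('now'), ?, ?)",
        (scope, scope_id, window, period_key, level, spent_usd, limit_usd),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means the row already existed


# scope_id is "" rather than NULL for the global scope, because NULLs
# never collide under a UNIQUE constraint.
print(record_alert_once(conn, "global", "", "daily", "2026-06-01", "soft", 1.70, 2.00))
print(record_alert_once(conn, "global", "", "daily", "2026-06-01", "soft", 1.85, 2.00))
```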

### Concurrency Model

Cron jobs run in a `ThreadPoolExecutor` (Hermes `cron/scheduler.py`). Multiple jobs can write to the DB simultaneously from different threads.

**Design:** per-thread SQLite connections via `threading.local()`. Each thread opens its own connection to the same WAL-mode DB file. A serializable `_schema_lock` protects DDL migrations on first connect (WAL mode switch requires a brief lock that `busy_timeout` alone doesn’t handle).

`busy_timeout=5000` ensures write collisions retry for 5 seconds before raising. `synchronous=NORMAL` balances durability with write performance (safe for WAL mode).
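The pattern described here can be sketched with the standard library alone. Names other than `_schema_lock`, `busy_timeout`, and `synchronous` are hypothetical, and the one-table schema is a placeholder, so treat this as an outline of the technique rather than the plugin's implementation.

```python
import sqlite3
import threading

_local = threading.local()        # holds one connection per thread
_schema_lock = threading.Lock()   # serializes the WAL switch and DDL
_initialized = set()              # DB paths whose schema is already set up


def get_conn(db_path):
    """Return this thread's own connection to the shared WAL-mode database."""
    conns = getattr(_local, "conns", None)
    if conns is None:
        conns = _local.conns = {}
    conn = conns.get(db_path)
    if conn is None:
        conn = sqlite3.connect(db_path)
        conn.execute("PRAGMA busy_timeout=5000")    # wait up to 5 s on locks
        conn.execute("PRAGMA synchronous=NORMAL")   # per-connection setting
        with _schema_lock:
            if db_path not in _initialized:
                conn.execute("PRAGMA journal_mode=WAL")  # persists in the file
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS tool_calls "
                    "(session_id TEXT, tool_name TEXT)"
                )
                conn.commit()
                _initialized.add(db_path)
        conns[db_path] = conn
    return conn
```

Each worker thread calls `get_conn(path)` and writes through its own connection; SQLite itself arbitrates between writers, with `busy_timeout` absorbing short collisions.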

-----

## Budget Enforcement

> See the budget enforcement demo at the top of this README for an end-to-end walkthrough.

### How It Works

Every time the agent is about to do work, the plugin checks:

1. **`pre_llm_call`** (fires once per turn): evaluates all applicable budget scopes. If any has a `soft` or `hard` verdict that hasn’t been alerted yet this window, injects a one-time notice into the conversation context (anti-spam via `budget_alerts` table). Captures `sender_id`.

1. **`pre_tool_call`** (fires before every tool): re-evaluates budgets. If any scope is in `hard` breach, returns `{"action":"block","message":...}` which aborts the tool call.

1. **For cron jobs with `hard` breach:** additionally calls `cron.jobs.pause_job` to pause future runs.

### Enforcement Levels

Hermes does **not** expose a way to abort an in-flight model call from a plugin: the return values of `pre_llm_call` and `pre_api_request` can’t cancel a call. So enforcement is honest about its reach:

|Level                  |Trigger                                  |Effect                                    |Repeat?                            |
|-----------------------|-----------------------------------------|------------------------------------------|-----------------------------------|
|**Soft** (≥ `soft_pct`)|Spend reaches 80% of limit (configurable)|One-time notice injected into conversation|Once per window per scope          |
|**Hard** (≥ `hard_pct`)|Spend reaches 100% of limit              |Every subsequent tool call is blocked     |Every tool call until window resets|
|**Cron pause**         |Any hard `cron_job` verdict              |Job is paused for future runs             |Once per window per scope          |

The model response already in flight still completes and is billed. What’s prevented is *further* tool-driven work.
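A minimal sketch of the threshold logic in the table, assuming `soft_pct` and `hard_pct` are expressed as percentages (80 and 100). The function name and the handling of non-positive limits are hypothetical.

```python
def verdict(spent_usd, limit_usd, soft_pct=80.0, hard_pct=100.0):
    """Classify spend against a limit as 'ok', 'soft', or 'hard'."""
    if limit_usd <= 0:
        # Assumption: a non-positive limit is a configuration error.
        raise ValueError("limit_usd must be positive")
    pct = spent_usd / limit_usd * 100.0
    if pct >= hard_pct:
        return "hard"
    if pct >= soft_pct:
        return "soft"
    return "ok"


print(verdict(0.50, 1.00))  # half of the limit
print(verdict(0.85, 1.00))  # past the soft threshold
print(verdict(1.20, 1.00))  # past the hard threshold
```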

### Estimated Data and Budget Degradation

When the provider returns `usage=None`, the plugin estimates tokens and flags the row as `estimated=1`. Since these estimates may be inaccurate, the budget engine offers a safety valve:

**`on_estimated.mode: warn_only` (default):** If a hard verdict rests partly on estimated rows, it is **degraded to soft** — the user gets a warning but tools aren’t blocked. Rationale: a budget built on estimates shouldn’t hard-stop work.

**`on_estimated.mode: enforce`:** Hard verdicts take effect regardless of estimate quality. Use this when you trust your provider’s usage data (Est% = 0) or when estimates are acceptable.

The `/stats providers` command shows the `Est%` column so you can see at a glance whether your provider returns real usage data.

**Estimated-price models:** Some models (e.g. OpenRouter `auto` routing) have no fixed pricing. These are flagged with `_estimated_price: true` in `pricing.yaml` and normalized to `$0.00`. If any calls in the window use these models, budget hard verdicts are also degraded to soft under `warn_only` mode. See [Pricing Auto-Refresh](#pricing-auto-refresh) for details.
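The degradation rule can be sketched as a single function. The mode names `warn_only` and `enforce` come from this README; the function name and its boolean argument are assumptions for illustration.

```python
def effective_verdict(raw, has_estimated_data, mode="warn_only"):
    """Hypothetical helper: downgrade a hard verdict that rests on estimates.

    raw is 'ok', 'soft', or 'hard'. has_estimated_data is True when the
    window contains estimated rows or estimated-price models.
    """
    if raw == "hard" and has_estimated_data and mode == "warn_only":
        return "soft"  # warn the user, but do not block tools
    return raw


print(effective_verdict("hard", True))                   # degraded
print(effective_verdict("hard", True, mode="enforce"))   # stays hard
print(effective_verdict("hard", False))                  # real data, stays hard
```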

-----

## Provider Probe: Verifying Your Provider Returns Real Usage

Run this **once** after enabling the plugin:

1. Run one short session (any minimal task works).

2. Execute `/stats providers`.

3. Look at the `Est%` column for your provider:

   - **`0%`** → provider returns real usage data. Budget verdicts are based on real numbers. Set `on_estimated.mode: enforce` for strict enforcement. ✅

   - **`> 0%`** → provider omits usage in some responses. Those calls are estimated and flagged. Budget hard-verdicts will be degraded to soft under `warn_only`. `telemetry.log` records a **one-time WARNING** per provider. ⚠️
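For intuition, `Est%` can be thought of as the share of a provider's calls flagged `estimated=1`. The sketch below assumes that per-call definition; the function name, the row shape, and the provider names are hypothetical.

```python
from collections import defaultdict


def est_pct_by_provider(rows):
    """rows: iterable of (provider, estimated) pairs, estimated being 0 or 1."""
    totals = defaultdict(int)
    estimated = defaultdict(int)
    for provider, est in rows:
        totals[provider] += 1
        if est:
            estimated[provider] += 1
    return {p: 100.0 * estimated[p] / totals[p] for p in totals}


rows = [
    ("provider_a", 0), ("provider_a", 0),   # real usage on every call
    ("provider_b", 1), ("provider_b", 0),   # usage missing on half the calls
]
est = est_pct_by_provider(rows)
print(est)
```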

-----

## Proof of Concept

The following PoC was executed live to validate the plugin end-to-end.

### Setup

- **Hermes gateway** running on Linux (WSL), model `openrouter/owl-alpha` (free tier)

- **Plugin:** hermes-telemetry v0.2.0, loaded in gateway process

- **DB:** `/home/nujovich/.hermes/telemetry/telemetry.db` (schema v3, WAL mode)

- **6 cron jobs** configured, 2 used for this PoC

### Pricing Capture

Added models to `~/.hermes/telemetry/pricing.yaml`:

```yaml
models:
  "openrouter/owl-alpha":
    input: 0.00
    output: 0.00
  "openrouter/anthropic/claude-sonnet-4-6":
    input: 3.00
    output: 15.00
    cache_read: 0.30
    cache_write: 3.75
  "openrouter/anthropic/claude-opus-4-7":
    input: 5.00
    output: 25.00
    cache_read: 0.50
    cache_write: 6.25
```

Set `on_estimated.mode: enforce` for deterministic enforcement.
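As a rough illustration of how per-1M-token prices like these turn into a cost, the sketch below assumes each token bucket is billed separately and that `input_tokens` excludes cached tokens. The function name and the sample token counts are hypothetical.

```python
# Prices in USD per 1M tokens, copied from the pricing.yaml excerpt above.
PRICES = {
    "openrouter/anthropic/claude-sonnet-4-6": {
        "input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75,
    },
}


def cost_usd(model, input_tokens=0, output_tokens=0,
             cache_read_tokens=0, cache_write_tokens=0):
    """Hypothetical cost calculation.

    Assumption: input_tokens counts only non-cached prompt tokens, so cached
    tokens are not billed twice.
    """
    p = PRICES[model]
    per_million = (
        input_tokens * p["input"]
        + output_tokens * p["output"]
        + cache_read_tokens * p.get("cache_read", 0.0)
        + cache_write_tokens * p.get("cache_write", 0.0)
    )
    return per_million / 1_000_000


cost = cost_usd(
    "openrouter/anthropic/claude-sonnet-4-6",
    input_tokens=10_000, output_tokens=2_000, cache_read_tokens=50_000,
)
print(f"${cost:.4f}")  # about $0.075 for this made-up session
```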

### Budget Enforcement Test

**Step 1 — Trigger a hard breach:**

- Budget: `global.daily_usd: 0.001` ($0.001/day)

- Ran MCP Lead Gen job (model: `claude-sonnet-4-6`, ~$3/$15 per 1M)

- Result: job spent $0.1812 on first run → **18,120% of daily limit** → █ hard breach → **job auto-paused**

```
█ global    $0.1812 / $0.00    18120%  [daily]
                         ↑ (0.001 rounded to 0.00 in display)
```

**Step 2 — Raise budget and resume:**

```
/budget set global daily 2.00
```

Result after `/budget set`:

```
global    $0.1812 / $2.00    9%  [daily]
```

**Step 3 — Verify job runs normally:**

- MCP Lead Gen re-ran successfully under the $2.00 daily budget

- Second run confirmed: `state: scheduled`, `paused_at: null`
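The percentages in the steps above follow directly from spend divided by limit. The short check below reproduces them, along with the two-decimal rounding that makes a $0.001 limit display as $0.00. The helper names are hypothetical.

```python
def pct_of_limit(spent_usd, limit_usd):
    """Spend as a whole-number percentage of the limit."""
    return round(spent_usd / limit_usd * 100)


def fmt_usd(amount):
    """Two-decimal display, which hides limits below half a cent."""
    return f"${amount:.2f}"


spent = 0.1812
print(pct_of_limit(spent, 0.001), fmt_usd(0.001))  # step 1: 18120 and $0.00
print(pct_of_limit(spent, 2.00), fmt_usd(2.00))    # step 2: 9 and $2.00
```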

### Cron Job Cost Comparison

|Job                 |Model              |Price (input/output) |
|--------------------|-------------------|---------------------|
|MCP Lead Gen        |`claude-sonnet-4-6`|$3.00 / $15.00 per 1M|
|Marketing Highlights|`claude-opus-4-7`  |$5.00 / $25.00 per 1M|
|Base sessions (CLI) |`owl-alpha`        |$0.00 / $0.00 (free) |

**Results from SQLite (`/stats` after all runs):**

- **CLI sessions** (owl-alpha, free): ~1M tokens in → **$0.00**

- **MCP Lead Gen** (claude-sonnet-4-6): ~892K tokens in → **$0.314**

- **Marketing Highlights** (claude-opus-4-7): ~445K tokens in → **$2.23** (at the list prices above, Opus costs ~1.7x more per token than Sonnet)

### Results Summary

|Component                            |Status                                             |
|-------------------------------------|---------------------------------------------------|
|Token capture from provider          |✅ Real usage (`estimated=0`)                       |
|Cost estimation with pricing table   |✅ Accurate to pricing YAML                         |
|Cron job session tracking            |✅ Captured via `session_id` regex                  |
|Budget soft alerts                   |✅ One-time context injection                       |
|Budget hard enforcement              |✅ Paused job at $0.001/day                         |
|Budget hot-reload via `/budget set`  |✅ Cache cleared, new limit active                  |
|Multi-model cost comparison          |✅ Sonnet vs Opus vs Free                           |
|Pricing auto-refresh (OpenRouter API)|✅ 320 models fetched, manual overrides preserved   |
|Estimated-price model handling       |✅ Negative prices → $0.00, budget degradation      |
|Dashboard (HTML, auto-refresh 30s)   |✅ Charts, tables, budget bar, provider distribution|
|94 tests pass                        |✅                                                  |

-----

## Comparison

|                  |hermes-telemetry|TokenTelemetry       |Martin Loop         |
|------------------|----------------|---------------------|--------------------|
|Hermes-native     |✅ Native plugin |❌ Reads external logs|❌ No Hermes support |
|Budget enforcement|✅ Stops the run |❌ Observe only       |✅ But not for Hermes|
|Real-time         |✅ Pre-call      |❌ Post-hoc           |✅ Pre-attempt       |
|Requires Hermes   |✅ Hermes only   |Any agent            |Claude Code / Codex |
|Local dashboard   |✅               |✅ (more complete)    |❌                   |
|Open source       |✅ MIT           |✅ MIT                |✅ MIT               |

**When to use TokenTelemetry instead:** if you need a multi-agent dashboard (Claude Code + Codex + Hermes in one place), TokenTelemetry is the right choice. hermes-telemetry is purpose-built for Hermes operators who need budget enforcement, not just visibility.

-----

## Running Tests

```
cd hermes-telemetry
pip install pytest pyyaml
pytest tests/ -v
```

**Test suite (94 tests):**

|File                             |Tests|Coverage                                                                                                                       |
|---------------------------------|-----|-------------------------------------------------------------------------------------------------------------------------------|
|`test_db.py`                     |15   |Schema v1→v3 migrations, CRUD, aggregations, concurrent WAL writes (10 threads × 5 writes)                                     |
|`test_pricing.py`                |17   |Cache/reasoning split, no double-counting of `prompt_tokens`, YAML overrides, prefix matching, unknown model handling          |
|`test_init.py`                   |6    |Cron session ID regex, tool success/failure parsing                                                                            |
|`test_budget.py`                 |17   |ok/soft/hard verdicts, estimated-to-soft degradation, anti-spam ledger, cron pause, per-scope routing, `/budget set` hot-reload|
|`test_stats_providers.py`        |8    |Real vs estimated per provider, `/stats providers` output format, Nous warning dedup                                           |
|`test_subagent_reconciliation.py`|4    |Parent + child hook sequence, token reconciliation, no double-counting                                                         |

No live Hermes is required — all tests are self-contained with in-memory SQLite.

-----

## Data Location

```
~/.hermes/telemetry/
├── telemetry.db        ← SQLite (WAL mode, ~70KB base + growth)
├── telemetry.log       ← Plugin log (errors, debug, one-time warnings)
├── pricing.yaml        ← Your model price overrides
└── budget.yaml         ← Your spend guardrails
```

The DB grows over time. For high-frequency cron jobs, consider periodic cleanup of old rows (not yet automated — see [Known Limitations](#known-limitations)).

-----

## Known Limitations

**Enforcement gaps:**

- **No true mid-call abort.** `pre_llm_call` / `pre_api_request` cannot cancel an in-flight model call. The response that’s already generating will complete and be billed. The tool-gate (`pre_tool_call`) stops *subsequent* work at the next tool boundary.

- **Runaway text-only sessions.** A session that generates text without calling any tools never hits the tool-gate. If this becomes a problem, a pre-flight check in `on_session_start` for cron jobs could abort before the first LLM call.

**Subagent attribution:**

- Child agents (`delegate_task`) run as their own sessions. Their tokens are captured independently and included in **global** totals. But there is no parent→child link in any hook — so `per_cron_job` budgets **exclude** subagent cost. Use the `global` budget for a cap that captures delegated work.

**Pricing refresh only for OpenRouter models:**

- Auto-refresh pulls prices only from the OpenRouter API and preserves entries added manually. Models from other providers must be added to `pricing.yaml` by hand.

**DB retention:**

- `telemetry.db` grows without bound. No automatic purge of old rows. For >100K rows, consider manual cleanup or a retention policy (not yet implemented).
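Until a retention policy exists, a manual purge might look like the sketch below. It assumes `ts` in `tool_calls` is stored as Unix epoch seconds, which this README does not confirm; the function name and the 30-day cutoff are hypothetical. Check the actual column format and back up `telemetry.db` before deleting anything.

```python
import sqlite3
import time


def purge_old_tool_calls(conn, max_age_days, now=None):
    """Delete tool_calls rows older than max_age_days; return rows removed.

    Assumption: ts holds Unix epoch seconds.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    with conn:  # commit the delete as one transaction
        cur = conn.execute("DELETE FROM tool_calls WHERE ts < ?", (cutoff,))
    return cur.rowcount


# Demo on a throwaway in-memory table with the columns listed in this README.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tool_calls ("
    "session_id TEXT, ts REAL, tool_name TEXT, ok INTEGER, latency_ms REAL)"
)
now = time.time()
conn.executemany(
    "INSERT INTO tool_calls VALUES (?, ?, 'demo', 1, 1.0)",
    [("old", now - 100 * 86400), ("recent", now - 1 * 86400)],
)
removed = purge_old_tool_calls(conn, max_age_days=30, now=now)
remaining = conn.execute("SELECT COUNT(*) FROM tool_calls").fetchone()[0]
print(removed, remaining)
```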

**Gateway restart required:**

- Enabling the plugin takes effect only after gateway restart. Cron runs that started before the restart won’t have telemetry.

-----

## Troubleshooting

**`/stats cron week` shows “No cron runs in the last 7 days”:**

The gateway loaded before the plugin was enabled. Restart the gateway:

```
hermes gateway restart
```

Then re-run a cron job.

**`/budget` shows `$0.00` as the limit:**

The limit is cached in memory at gateway start. If you edited `budget.yaml` directly, the cache is stale. Use `/budget set global daily <amount>` to hot-reload, or restart the gateway.

**Cost is $0.00 for all sessions:**

Your model isn’t in the pricing table. Check `telemetry.log` for a one-time warning like:

```
hermes-telemetry: unknown model 'openrouter/some-model' — cost recorded as $0.00
```

Add it to `pricing.yaml`.

**Provider Est% > 0:**

Your provider returns `usage=None` for some/all calls. Tokens are estimated. Check `/stats providers` to see which providers are affected. If Est% is 100% for your main provider, all spend is estimated and budget hard-verdicts degrade to soft under `warn_only` mode.

**Plugin not loading at all:**

Check `telemetry.log` for errors. Common causes:

- Missing `pyyaml` in the gateway’s venv: `pip install pyyaml`

- Plugin not in `plugins.enabled` in config.yaml

- Syntax error in `pricing.yaml` or `budget.yaml`

-----

## License

MIT — see [LICENSE](https://github.com/nujovich/hermes-telemetry/blob/main/LICENSE).

-----

## Hermes Agent Challenge

This plugin was built for the [**Hermes Agent Challenge**](https://dev.to/devteam/join-the-hermes-agent-challenge-1000-in-prizes-13cd) — a $1,000 competition to build the most useful Hermes Agent plugins and extensions.

**🔗 Challenge Entry:** [hermes-telemetry on dev.to](https://dev.to/devteam/join-the-hermes-agent-challenge-1000-in-prizes-13cd)

**🛠️ Built by:** [Nadia Ujovich](https://github.com/nujovich)

**💡 Why this plugin:** Every AI system needs observability and cost control. This plugin gives Hermes Agent users the visibility to optimize their workflows and the guardrails to prevent bill shock — essential for production deployments and automated cron jobs.

-----

*Made with ☕ for the Hermes Agent ecosystem*