An open API service indexing awesome lists of open source software.

https://github.com/chopratejas/headroom

The Context Optimization Layer for LLM Applications
https://github.com/chopratejas/headroom

agent ai anthropic compression context-engineering context-window fastapi langchain llm mcp openai proxy python rag token-optimization

Last synced: 8 days ago
JSON representation

The Context Optimization Layer for LLM Applications

Awesome Lists containing this project

README

          

```
β–ˆβ–ˆβ•— β–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ•—
β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ•‘
β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β–ˆβ–ˆβ–ˆβ–ˆβ•”β–ˆβ–ˆβ•‘
β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β• β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘
β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘ β•šβ•β• β–ˆβ–ˆβ•‘
β•šβ•β• β•šβ•β•β•šβ•β•β•β•β•β•β•β•šβ•β• β•šβ•β•β•šβ•β•β•β•β•β• β•šβ•β• β•šβ•β• β•šβ•β•β•β•β•β• β•šβ•β•β•β•β•β• β•šβ•β• β•šβ•β•
The context compression layer for AI agents
```

60–95% fewer tokens Β· library Β· proxy Β· MCP Β· 6 algorithms Β· local-first Β· reversible


CI
codecov
PyPI
npm
Model: Kompress-base
Tokens saved: 60B+
License: Apache 2.0
Docs


Docs Β·
Install Β·
Proof Β·
Agents Β·
Discord Β·
llms.txt


AI agents / LLMs: read /llms.txt here, or fetch the live index / full docs blob.

---

> Headroom compresses everything your AI agent reads β€” tool outputs, logs, RAG chunks, files, and conversation history β€” before it reaches the LLM. Same answers, fraction of the tokens.


Headroom in action

Live: 10,144 β†’ 1,260 tokens β€” same FATAL found.

## What it does

- **Library** β€” `compress(messages)` in Python or TypeScript, inline in any app
- **Proxy** β€” `headroom proxy --port 8787`, zero code changes, any language
- **Agent wrap** β€” `headroom wrap claude|codex|cursor|aider|copilot` in one command
- **MCP server** β€” `headroom_compress`, `headroom_retrieve`, `headroom_stats` for any MCP client
- **Cross-agent memory** β€” shared store across Claude, Codex, Gemini, auto-dedup
- **`headroom learn`** β€” mines failed sessions, writes corrections to `CLAUDE.md` / `AGENTS.md`
- **Reversible (CCR)** β€” originals never deleted; LLM retrieves on demand

## How it works (30 seconds)

```
Your agent / app
(Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
β”‚ prompts Β· tool outputs Β· logs Β· RAG results Β· files
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Headroom (runs locally β€” your data stays here) β”‚
β”‚ ─────────────────────────────────────────────── β”‚
β”‚ CacheAligner β†’ ContentRouter β†’ CCR β”‚
β”‚ β”œβ”€ SmartCrusher (JSON) β”‚
β”‚ β”œβ”€ CodeCompressor (AST) β”‚
β”‚ └─ Kompress-base (text, HF) β”‚
β”‚ β”‚
β”‚ Cross-agent memory Β· headroom learn Β· MCP β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ compressed prompt + retrieval tool
β–Ό
LLM provider (Anthropic Β· OpenAI Β· Bedrock Β· …)
```

- **ContentRouter** β€” detects content type, selects the right compressor
- **SmartCrusher / CodeCompressor / Kompress-base** β€” compress JSON, AST, or prose
- **CacheAligner** β€” stabilizes prefixes so provider KV caches actually hit
- **CCR** β€” stores originals locally; LLM calls `headroom_retrieve` if it needs them

β†’ [Architecture](https://headroom-docs.vercel.app/docs/architecture) Β· [CCR reversible compression](https://headroom-docs.vercel.app/docs/ccr) Β· [Kompress-base model card](https://huggingface.co/chopratejas/kompress-base)

## Get started (60 seconds)

```bash
# 1 β€” Install
pip install "headroom-ai[all]" # Python
npm install headroom-ai # Node / TypeScript

# 2 β€” Pick your mode
headroom wrap claude # wrap a coding agent
headroom proxy --port 8787 # drop-in proxy, zero code changes
# or: from headroom import compress # inline library

# 3 β€” See the savings
headroom stats
```

Granular extras: `[proxy]`, `[mcp]`, `[ml]`, `[agno]`, `[langchain]`, `[evals]`. Requires **Python 3.10+**.

## Proof

**Savings on real agent workloads:**

| Workload | Before | After | Savings |
|-------------------------------|-------:|-------:|--------:|
| Code search (100 results) | 17,765 | 1,408 | **92%** |
| SRE incident debugging | 65,694 | 5,118 | **92%** |
| GitHub issue triage | 54,174 | 14,761 | **73%** |
| Codebase exploration | 78,502 | 41,254 | **47%** |

**Accuracy preserved on standard benchmarks:**

| Benchmark | Category | N | Baseline | Headroom | Delta |
|------------|----------|----:|---------:|---------:|------------|
| GSM8K | Math | 100 | 0.870 | 0.870 | **Β±0.000** |
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | **+0.030** |
| SQuAD v2 | QA | 100 | β€” | **97%** | 19% compression |
| BFCL | Tools | 100 | β€” | **97%** | 32% compression |

Reproduce: `python -m headroom.evals suite --tier 1` Β· [Full benchmarks & methodology](https://headroom-docs.vercel.app/docs/benchmarks)



60B+ tokens saved β€” community leaderboard


60B+ tokens saved by the community β€” live leaderboard β†’

## Agent compatibility matrix

| Agent | `headroom wrap` | Notes |
|-------------|:---------------:|----------------------------------|
| Claude Code | ● | `--memory` Β· `--code-graph` |
| Codex | ● | shares memory with Claude |
| Cursor | ● | prints config β€” paste once |
| Aider | ● | starts proxy + launches |
| Copilot CLI | ● | starts proxy + launches |
| OpenClaw | ● | installs as ContextEngine plugin |

Any OpenAI-compatible client works via `headroom proxy`. MCP-native: `headroom mcp install`.

## When to use Β· When to skip

**Great fit if you…**
- run AI coding agents daily and want savings without changing your code
- work across multiple agents and want shared memory
- need reversible compression β€” originals always retrievable via CCR

**Skip it if you…**
- only use a single provider's native compaction and don't need cross-agent memory
- work in a sandboxed environment where local processes can't run

Integrations β€” drop Headroom into any stack

| Your setup | Hook in with |
|------------------------|------------------------------------------------------------------|
| Any Python app | `compress(messages, model=…)` |
| Any TypeScript app | `await compress(messages, { model })` |
| Anthropic / OpenAI SDK | `withHeadroom(new Anthropic())` Β· `withHeadroom(new OpenAI())` |
| Vercel AI SDK | `wrapLanguageModel({ model, middleware: headroomMiddleware() })` |
| LiteLLM | `litellm.callbacks = [HeadroomCallback()]` |
| LangChain | `HeadroomChatModel(your_llm)` |
| Agno | `HeadroomAgnoModel(your_model)` |
| Strands | [Strands guide](https://headroom-docs.vercel.app/docs/strands) |
| ASGI apps | `app.add_middleware(CompressionMiddleware)` |
| Multi-agent | `SharedContext().put / .get` |
| MCP clients | `headroom mcp install` |

What's inside

- **SmartCrusher** β€” universal JSON: arrays of dicts, nested objects, mixed types.
- **CodeCompressor** β€” AST-aware for Python, JS, Go, Rust, Java, C++.
- **Kompress-base** β€” our HuggingFace model, trained on agentic traces.
- **Image compression** β€” 40–90% reduction via trained ML router.
- **CacheAligner** β€” stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.
- **IntelligentContext** β€” score-based context fitting with learned importance.
- **CCR** β€” reversible compression; LLM retrieves originals on demand.
- **Cross-agent memory** β€” shared store, agent provenance, auto-dedup.
- **SharedContext** β€” compressed context passing across multi-agent workflows.
- **`headroom learn`** β€” plugin-based failure mining for Claude, Codex, Gemini.

Pipeline internals

Headroom exposes one stable request lifecycle across `compress()`, the SDK, and the proxy:

`Setup` β†’ `Pre-Start` β†’ `Post-Start` β†’ `Input Received` β†’ `Input Cached` β†’ `Input Routed` β†’ `Input Compressed` β†’ `Input Remembered` β†’ `Pre-Send` β†’ `Post-Send` β†’ `Response Received`

- **Transforms** do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow.
- **Pipeline extensions** observe or customize lifecycle stages via `on_pipeline_event(...)`.
- **Compression hooks** sit alongside the canonical lifecycle as an additional extension seam.
- **Proxy extensions** remain the server/app integration seam for ASGI middleware, routes, and startup policy.

Provider and tool-specific behavior lives under `headroom/providers/` so core orchestration stays focused on lifecycle, sequencing, and policy.

- **CLI/tool slices**: `headroom/providers/claude`, `copilot`, `codex`, `openclaw`
- **Provider runtime slices**: `headroom/providers/claude`, `gemini`, plus shared backend/runtime dispatch in `headroom/providers/registry.py`
- **Core files stay orchestration-first**: `wrap.py`, `client.py`, `cli/proxy.py`, and `proxy/server.py` delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch.

## Install

```bash
pip install "headroom-ai[all]" # Python, everything
npm install headroom-ai # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest
```

Granular extras: `[proxy]`, `[mcp]`, `[ml]` (Kompress-base), `[agno]`, `[langchain]`, `[evals]`. Requires **Python 3.10+**.

Using `pipx`? Choose a supported interpreter explicitly:

```bash
pipx install --python python3.13 "headroom-ai[all]"
```

β†’ [Installation guide](https://headroom-docs.vercel.app/docs/installation) β€” Docker tags, persistent service, PowerShell, devcontainers.

## headroom learn


headroom learn in action

`headroom learn` β€” mines failed sessions, writes corrections to `CLAUDE.md` / `AGENTS.md` / `GEMINI.md`.

## Documentation

| Start here | Go deeper |
|-------------------------------------------------------------------------------|------------------------------------------------------------------------------------|
| [Quickstart](https://headroom-docs.vercel.app/docs/quickstart) | [Architecture](https://headroom-docs.vercel.app/docs/architecture) |
| [Proxy](https://headroom-docs.vercel.app/docs/proxy) | [How compression works](https://headroom-docs.vercel.app/docs/how-compression-works) |
| [MCP tools](https://headroom-docs.vercel.app/docs/mcp) | [CCR β€” reversible compression](https://headroom-docs.vercel.app/docs/ccr) |
| [Memory](https://headroom-docs.vercel.app/docs/memory) | [Cache optimization](https://headroom-docs.vercel.app/docs/cache-optimization) |
| [Failure learning](https://headroom-docs.vercel.app/docs/failure-learning) | [Benchmarks](https://headroom-docs.vercel.app/docs/benchmarks) |
| [Configuration](https://headroom-docs.vercel.app/docs/configuration) | [Limitations](https://headroom-docs.vercel.app/docs/limitations) |

## Compared to

Headroom runs **locally**, covers **every** content type, works with every major framework, and is **reversible**.

| | Scope | Deploy | Local | Reversible |
|------------------------------------------------------------------------------|------------------------------------------------|------------------------------------|:-----:|:----------:|
| **Headroom** | All context β€” tools, RAG, logs, files, history | Proxy Β· library Β· middleware Β· MCP | Yes | Yes |
| [RTK](https://github.com/rtk-ai/rtk) | CLI command outputs | CLI wrapper | Yes | No |
| [lean-ctx](https://github.com/yvgude/lean-ctx) | CLI commands, MCP tools, editor rules | CLI wrapper Β· MCP | Yes | No |
| [Compresr](https://compresr.ai), [Token Co.](https://thetokencompany.ai) | Text sent to their API | Hosted API call | No | No |
| OpenAI Compaction | Conversation history | Provider-native | No | No |

> **Attribution.** Headroom ships with the excellent [RTK](https://github.com/rtk-ai/rtk) binary for shell-output rewriting β€” `git show --short`, scoped `ls`, summarized installers. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it. Headroom can also use [lean-ctx](https://github.com/yvgude/lean-ctx) as the selected CLI context tool; set `HEADROOM_CONTEXT_TOOL=lean-ctx` before running `headroom wrap ...`.

## Contributing

```bash
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
```

Devcontainers in `.devcontainer/` (default + `memory-stack` with Qdrant & Neo4j). See [CONTRIBUTING.md](CONTRIBUTING.md).

## Community

- **[Live leaderboard](https://headroomlabs.ai/dashboard)** β€” 60B+ tokens saved and counting.
- **[Discord](https://discord.gg/yRmaUNpsPJ)** β€” questions, feedback, war stories.
- **[Kompress-base on HuggingFace](https://huggingface.co/chopratejas/kompress-base)** β€” the model behind our text compression.

## License

Apache 2.0 β€” see [LICENSE](LICENSE).