https://github.com/chopratejas/headroom
The Context Optimization Layer for LLM Applications
- Host: GitHub
- URL: https://github.com/chopratejas/headroom
- Owner: chopratejas
- License: apache-2.0
- Created: 2026-01-07T19:58:51.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-11T18:05:13.000Z (7 days ago)
- Last Synced: 2026-04-11T18:07:10.016Z (7 days ago)
- Topics: agent, ai, anthropic, compression, context-engineering, context-window, fastapi, langchain, llm, mcp, openai, proxy, python, rag, token-optimization
- Language: Python
- Homepage: https://chopratejas.github.io/headroom/
- Size: 35.5 MB
- Stars: 1,269
- Watchers: 10
- Forks: 113
- Open Issues: 33
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Notice: NOTICE
Awesome Lists containing this project
- awesome-ChatGPT-repositories - headroom - The Context Optimization Layer for LLM Applications (The latest additions)
- awesome-LangGraph - chopratejas/headroom - optimization proxy layer for LLM applications: compresses token usage, manages context windows, and provides an OpenAI-compatible API for LangChain, MCP, and FastAPI stacks (Community Projects / Developer Tools)
- awesome-ai-tools - Headroom - Context compression system that reduces token usage by 70-95% while preserving accuracy. Works as a proxy, library, or framework integration. (AI Back Ends)
README
# Headroom
**Compress everything your AI agent reads. Same answers, fraction of the tokens.**
[CI](https://github.com/chopratejas/headroom/actions/workflows/ci.yml)
[PyPI](https://pypi.org/project/headroom-ai/)
[npm](https://www.npmjs.com/package/headroom-ai)
[Hugging Face](https://huggingface.co/chopratejas/kompress-base)
[Dashboard](https://headroomlabs.ai/dashboard)
[License](LICENSE)
[Docs](https://chopratejas.github.io/headroom/)

---
Every tool call, log line, DB read, RAG chunk, and file your agent injects into a prompt is mostly boilerplate. Headroom strips the noise and keeps the signal: **losslessly, locally, and without touching accuracy.**
> **100 logs. One FATAL error buried at position 67. Both runs found it.**
> Baseline **10,144 tokens** → Headroom **1,260 tokens**: **87% fewer, identical answer.**
> `python examples/needle_in_haystack_test.py`
---
## Quick start
Works with Anthropic, OpenAI, Google, Bedrock, Vertex, Azure, OpenRouter, and 100+ models via LiteLLM.
**Wrap your coding agent with one command:**
```bash
pip install "headroom-ai[all]"
headroom wrap claude # Claude Code
headroom wrap codex # Codex
headroom wrap cursor # Cursor
headroom wrap aider # Aider
headroom wrap copilot # GitHub Copilot CLI
```
**Drop it into your own code (Python or TypeScript):**
```python
from headroom import compress
result = compress(messages, model="claude-sonnet-4-5")
response = client.messages.create(model="claude-sonnet-4-5", messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
```
```typescript
import { compress } from 'headroom-ai';
const result = await compress(messages, { model: 'gpt-4o' });
```
**Or run it as a proxy (zero code changes, any language):**
```bash
headroom proxy --port 8787
ANTHROPIC_BASE_URL=http://localhost:8787 your-app
OPENAI_BASE_URL=http://localhost:8787/v1 your-app
```
---
## Why Headroom
- **Accuracy-preserving.** GSM8K **0.870 → 0.870** (±0.000). TruthfulQA **+0.030**. SQuAD v2 and BFCL both **97%** accuracy after compression. Validated on public OSS benchmarks you can rerun yourself.
- **Runs on your machine.** No cloud API, no data egress. Compression latency is milliseconds, so it is faster end-to-end for Sonnet / Opus / GPT-4 class models than a hosted service round-trip.
- **[Kompress-base](https://huggingface.co/chopratejas/kompress-base) on HuggingFace.** Our open-source text compressor, fine-tuned on real agentic traces: tool outputs, logs, RAG chunks, code. Install with `pip install "headroom-ai[ml]"`.
- **Cross-agent memory and learning.** Claude Code saves a fact, Codex reads it back. `headroom learn` mines failed sessions and writes corrections straight to `CLAUDE.md` / `AGENTS.md` / `GEMINI.md`, so reliability compounds over time.
- **Reversible (CCR).** Compression is not deletion. The model can always call `headroom_retrieve` to pull the original bytes. Nothing is thrown away.
Bundles the [RTK](https://github.com/rtk-ai/rtk) binary for shell-output rewriting; full [attribution below](#compared-to).
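The reversible-compression (CCR) idea can be sketched in a few lines of plain Python. This is a toy illustration under assumed names (`compress_reversibly`, `retrieve`, and the `[retrieve:key]` placeholder format are invented here), not Headroom's actual API:

```python
import hashlib

# Toy reversible-compression store: the compressed prompt carries a short
# summary plus a retrieval key, and the original bytes stay addressable.
_store: dict[str, str] = {}

def compress_reversibly(text: str, summary: str) -> str:
    """Replace `text` with a summary and a retrieval key; keep the original."""
    key = hashlib.sha256(text.encode()).hexdigest()[:12]
    _store[key] = text
    return f"{summary} [retrieve:{key}]"

def retrieve(key: str) -> str:
    """Return the original, uncompressed text for a placeholder key."""
    return _store[key]
```

A model that needs more detail than the summary provides can call `retrieve(key)`, so compression never destroys information.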
---
## How it fits
```
Your agent / app
(Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
      │  prompts · tool outputs · logs · RAG results · files
      ▼
┌─────────────────────────────────────────────────┐
│  Headroom (runs locally; your data stays here)  │
│                                                 │
│  CacheAligner → ContentRouter → CCR             │
│    ├─ SmartCrusher (JSON)                       │
│    ├─ CodeCompressor (AST)                      │
│    └─ Kompress-base (text, HF)                  │
│                                                 │
│  Cross-agent memory · headroom learn · MCP      │
└─────────────────────────────────────────────────┘
      │  compressed prompt + retrieval tool
      ▼
LLM provider (Anthropic · OpenAI · Bedrock · …)
```
→ [Architecture](https://chopratejas.github.io/headroom/docs/architecture) · [CCR reversible compression](https://chopratejas.github.io/headroom/docs/ccr) · [Kompress-base model card](https://huggingface.co/chopratejas/kompress-base)
---
## Proof
**Savings on real agent workloads:**
| Workload | Before | After | Savings |
|-------------------------------|-------:|-------:|--------:|
| Code search (100 results) | 17,765 | 1,408 | **92%** |
| SRE incident debugging | 65,694 | 5,118 | **92%** |
| GitHub issue triage | 54,174 | 14,761 | **73%** |
| Codebase exploration | 78,502 | 41,254 | **47%** |
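The savings column is simply `1 - after / before`, rounded to a whole percent; a quick check of the rows above:

```python
# Recompute the savings column from the before/after token counts.
rows = {
    "code_search": (17_765, 1_408),
    "sre_debugging": (65_694, 5_118),
    "issue_triage": (54_174, 14_761),
    "exploration": (78_502, 41_254),
}
for name, (before, after) in rows.items():
    savings = 1 - after / before
    print(f"{name}: {savings:.0%}")  # 92%, 92%, 73%, 47%
```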
**Accuracy preserved on standard benchmarks:**
| Benchmark | Category | N | Baseline | Headroom | Delta |
|------------|----------|----:|---------:|---------:|----------:|
| GSM8K | Math | 100 | 0.870 | 0.870 | **±0.000**|
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | **+0.030**|
| SQuAD v2 | QA | 100 | n/a | **97%** | 19% compression |
| BFCL | Tools | 100 | n/a | **97%** | 32% compression |
Reproduce:
```bash
python -m headroom.evals suite --tier 1
```
→ [Full benchmarks & methodology](https://chopratejas.github.io/headroom/docs/benchmarks)
---
## Built for coding agents
| Agent | One-command wrap | Notes |
|--------------------|------------------------------------|------------------------------------------------------------------|
| **Claude Code** | `headroom wrap claude` | `--memory` for cross-agent memory, `--code-graph` for codebase intel |
| **Codex** | `headroom wrap codex --memory` | Shares the same memory store as Claude |
| **Cursor** | `headroom wrap cursor` | Prints Cursor config; paste once, done |
| **Aider** | `headroom wrap aider` | Starts proxy, launches Aider |
| **Copilot CLI** | `headroom wrap copilot` | Starts proxy, launches Copilot |
| **OpenClaw** | `headroom wrap openclaw` | Installs Headroom as ContextEngine plugin |
MCP-native too: `headroom mcp install` exposes `headroom_compress`, `headroom_retrieve`, and `headroom_stats` to any MCP client.
---
## Integrations
Drop Headroom into any stack:
| Your setup | Hook in with |
|-------------------------|------------------------------------------------------------------|
| Any Python app | `compress(messages, model=…)` |
| Any TypeScript app | `await compress(messages, { model })` |
| Anthropic / OpenAI SDK | `withHeadroom(new Anthropic())` · `withHeadroom(new OpenAI())` |
| Vercel AI SDK | `wrapLanguageModel({ model, middleware: headroomMiddleware() })` |
| LiteLLM | `litellm.callbacks = [HeadroomCallback()]` |
| LangChain | `HeadroomChatModel(your_llm)` |
| Agno | `HeadroomAgnoModel(your_model)` |
| Strands | [Strands guide](https://chopratejas.github.io/headroom/docs/strands) |
| ASGI apps | `app.add_middleware(CompressionMiddleware)` |
| Multi-agent | `SharedContext().put / .get` |
| MCP clients | `headroom mcp install` |
**What's inside:**
- **SmartCrusher**: universal JSON compression (arrays of dicts, nested objects, mixed types).
- **CodeCompressor**: AST-aware compression for Python, JS, Go, Rust, Java, C++.
- **Kompress-base**: our HuggingFace model, trained on agentic traces.
- **Image compression**: 40–90% reduction via a trained ML router.
- **CacheAligner**: stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.
- **IntelligentContext**: score-based context fitting with learned importance.
- **CCR**: reversible compression; the LLM retrieves originals on demand.
- **Cross-agent memory**: shared store, agent provenance, auto-dedup.
- **SharedContext**: compressed context passing across multi-agent workflows.
- **`headroom learn`**: plugin-based failure mining for Claude, Codex, Gemini.
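To see why structured compression pays off, here is a toy version of the trick behind JSON compressors like SmartCrusher (a sketch of the general idea, not Headroom's actual algorithm): a list of similar objects repeats every key in every record, so emitting the keys once as a header and the values as rows removes most of the redundancy.

```python
import json

def tabularize(records: list[dict]) -> str:
    """Collapse a homogeneous list of dicts into one header row plus value rows."""
    keys = list(records[0])
    header = "|".join(keys)
    rows = ["|".join(str(r[k]) for k in keys) for r in records]
    return "\n".join([header, *rows])

# 50 similar log records: raw JSON repeats "ts", "level", "msg" 50 times each.
logs = [{"ts": i, "level": "INFO", "msg": f"request {i} ok"} for i in range(50)]
raw = json.dumps(logs)
packed = tabularize(logs)
print(len(raw), len(packed))  # the packed form is well under half the size
```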
---
## Install
```bash
pip install "headroom-ai[all]" # Python, everything
npm install headroom-ai # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest
```
Granular extras: `[proxy]`, `[mcp]`, `[ml]` (Kompress-base), `[agno]`, `[langchain]`, `[evals]`. Requires **Python 3.10+**.
→ [Installation guide](https://chopratejas.github.io/headroom/docs/installation): Docker tags, persistent service, PowerShell, devcontainers.
---
## Documentation
| Start here | Go deeper |
|-------------------------------------------------------------------------|------------------------------------------------------------------------|
| [Quickstart](https://chopratejas.github.io/headroom/docs/quickstart) | [Architecture](https://chopratejas.github.io/headroom/docs/architecture) |
| [Proxy](https://chopratejas.github.io/headroom/docs/proxy) | [How compression works](https://chopratejas.github.io/headroom/docs/how-compression-works) |
| [MCP tools](https://chopratejas.github.io/headroom/docs/mcp) | [CCR β reversible compression](https://chopratejas.github.io/headroom/docs/ccr) |
| [Memory](https://chopratejas.github.io/headroom/docs/memory) | [Cache optimization](https://chopratejas.github.io/headroom/docs/cache-optimization) |
| [Failure learning](https://chopratejas.github.io/headroom/docs/failure-learning) | [Benchmarks](https://chopratejas.github.io/headroom/docs/benchmarks) |
| [Configuration](https://chopratejas.github.io/headroom/docs/configuration) | [Limitations](https://chopratejas.github.io/headroom/docs/limitations) |
---
## Compared to
Headroom runs **locally**, covers **every** content type (not just CLI or text), works with every major framework, and is **reversible**.
| | Scope | Deploy | Local | Reversible |
|----------------------------------|-------------------------------------------------|-------------------------------------|:-----:|:----------:|
| **Headroom** | All context: tools, RAG, logs, files, history | Proxy · library · middleware · MCP | Yes | Yes |
| [RTK](https://github.com/rtk-ai/rtk) | CLI command outputs | CLI wrapper | Yes | No |
| [Compresr](https://compresr.ai), [Token Co.](https://thetokencompany.ai) | Text sent to their API | Hosted API call | No | No |
| OpenAI Compaction | Conversation history | Provider-native | No | No |
> **Attribution.** Headroom ships with the excellent [RTK](https://github.com/rtk-ai/rtk) binary for shell-output rewriting: `git show` → `git show --short`, noisy `ls` → scoped, chatty installers → summarized. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it.
---
## Contributing
```bash
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
```
Devcontainers in `.devcontainer/` (default + `memory-stack` with Qdrant & Neo4j). See [CONTRIBUTING.md](CONTRIBUTING.md).
---
## Community
- **[Live leaderboard](https://headroomlabs.ai/dashboard)**: 60B+ tokens saved and counting.
- **[Discord](https://discord.gg/yRmaUNpsPJ)**: questions, feedback, war stories.
- **[Kompress-base on HuggingFace](https://huggingface.co/chopratejas/kompress-base)**: the model behind our text compression.
## License
Apache 2.0; see [LICENSE](LICENSE).