https://github.com/chopratejas/headroom

The Context Optimization Layer for LLM Applications

# Headroom

**Compress everything your AI agent reads. Same answers, fraction of the tokens.**

[![CI](https://github.com/chopratejas/headroom/actions/workflows/ci.yml/badge.svg)](https://github.com/chopratejas/headroom/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/headroom-ai.svg)](https://pypi.org/project/headroom-ai/)
[![npm](https://img.shields.io/npm/v/headroom-ai.svg)](https://www.npmjs.com/package/headroom-ai)
[![Model: Kompress-base](https://img.shields.io/badge/model-Kompress--base-yellow.svg)](https://huggingface.co/chopratejas/kompress-base)
[![Tokens saved: 60B+](https://img.shields.io/badge/tokens%20saved-60B%2B-2ea44f)](https://headroomlabs.ai/dashboard)
[![License: Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![Docs](https://img.shields.io/badge/docs-online-blue.svg)](https://chopratejas.github.io/headroom/)

*Headroom in action*

---

Every tool call, log line, DB read, RAG chunk, and file your agent injects into a prompt is mostly boilerplate. Headroom strips the noise and keeps the signal — **losslessly, locally, and without touching accuracy.**

> **100 logs. One FATAL error buried at position 67. Both runs found it.**
> Baseline **10,144 tokens** → Headroom **1,260 tokens** — **87% fewer, identical answer.**
> `python examples/needle_in_haystack_test.py`

---

## Quick start

Works with Anthropic, OpenAI, Google, Bedrock, Vertex, Azure, OpenRouter, and 100+ models via LiteLLM.

**Wrap your coding agent — one command:**

```bash
pip install "headroom-ai[all]"

headroom wrap claude # Claude Code
headroom wrap codex # Codex
headroom wrap cursor # Cursor
headroom wrap aider # Aider
headroom wrap copilot # GitHub Copilot CLI
```

**Drop it into your own code — Python or TypeScript:**

```python
from headroom import compress

result = compress(messages, model="claude-sonnet-4-5")
response = client.messages.create(model="claude-sonnet-4-5", messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
```

```typescript
import { compress } from 'headroom-ai';
const result = await compress(messages, { model: 'gpt-4o' });
```

**Or run it as a proxy — zero code changes, any language:**

```bash
headroom proxy --port 8787
ANTHROPIC_BASE_URL=http://localhost:8787 your-app
OPENAI_BASE_URL=http://localhost:8787/v1 your-app
```

---

## Why Headroom

- **Accuracy-preserving.** GSM8K **0.870 → 0.870** (±0.000). TruthfulQA **+0.030**. SQuAD v2 and BFCL both **97%** accuracy after compression. Validated on public OSS benchmarks you can rerun yourself.
- **Runs on your machine.** No cloud API, no data egress. Compression latency is milliseconds — faster end-to-end for Sonnet / Opus / GPT-4 class models than a hosted service round-trip.
- **[Kompress-base](https://huggingface.co/chopratejas/kompress-base) on HuggingFace.** Our open-source text compressor, fine-tuned on real agentic traces — tool outputs, logs, RAG chunks, code. Install with `pip install "headroom-ai[ml]"`.
- **Cross-agent memory and learning.** Claude Code saves a fact, Codex reads it back. `headroom learn` mines failed sessions and writes corrections straight to `CLAUDE.md` / `AGENTS.md` / `GEMINI.md` — reliability compounds over time.
- **Reversible (CCR).** Compression is not deletion. The model can always call `headroom_retrieve` to pull the original bytes. Nothing is thrown away.

Bundles the [RTK](https://github.com/rtk-ai/rtk) binary for shell-output rewriting — full [attribution below](#compared-to).
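The CCR idea above (compression as reversible placeholders, not deletion) can be illustrated with a minimal stand-in. This is not Headroom's internals — the store, placeholder format, and function names here are hypothetical:

```python
import hashlib

# Illustrative stand-in for reversible compression: originals are kept in a
# local store and replaced by a placeholder the model can dereference later.
_store: dict[str, str] = {}

def compress_reversible(text: str, keep: int = 200) -> str:
    """Replace a long blob with a prefix plus a retrieval key."""
    if len(text) <= keep:
        return text
    key = hashlib.sha256(text.encode()).hexdigest()[:12]
    _store[key] = text
    return f"{text[:keep]}... [truncated; retrieve with key {key}]"

def retrieve(key: str) -> str:
    """What a headroom_retrieve-style tool call would resolve to."""
    return _store[key]

blob = "INFO boot ok\n" * 800                       # long, mostly-repetitive tool output
placeholder = compress_reversible(blob)
key = placeholder.rsplit("key ", 1)[1].rstrip("]")
assert retrieve(key) == blob                        # reversible: original bytes intact
```

The point is only the contract: the compressed prompt keeps a handle, and the original is always recoverable on demand.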

---

## How it fits

```
Your agent / app
(Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
   │ prompts · tool outputs · logs · RAG results · files
   ▼
┌─────────────────────────────────────────────────┐
│ Headroom (runs locally — your data stays here)  │
│ ─────────────────────────────────────────────── │
│ CacheAligner → ContentRouter → CCR              │
│   ├─ SmartCrusher (JSON)                        │
│   ├─ CodeCompressor (AST)                       │
│   └─ Kompress-base (text, HF)                   │
│                                                 │
│ Cross-agent memory · headroom learn · MCP       │
└─────────────────────────────────────────────────┘
   │ compressed prompt + retrieval tool
   ▼
LLM provider (Anthropic · OpenAI · Bedrock · …)
```
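The ContentRouter stage in the diagram dispatches each payload to the compressor suited to its shape. A schematic sketch (the detection heuristics here are illustrative stand-ins, not Headroom's actual routing logic):

```python
import json

def route(payload: str) -> str:
    """Pick a compressor by content type (schematic only)."""
    try:
        json.loads(payload)
        return "json"    # structured tool output -> SmartCrusher
    except ValueError:
        pass
    if payload.lstrip().startswith(("def ", "class ", "import ")):
        return "code"    # source files -> CodeCompressor (AST-aware)
    return "text"        # logs, RAG chunks, prose -> Kompress-base

assert route('{"results": []}') == "json"
assert route("def main():\n    pass") == "code"
assert route("FATAL error at position 67") == "text"
```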

→ [Architecture](https://chopratejas.github.io/headroom/docs/architecture) · [CCR reversible compression](https://chopratejas.github.io/headroom/docs/ccr) · [Kompress-base model card](https://huggingface.co/chopratejas/kompress-base)

---

## Proof

**Savings on real agent workloads:**

| Workload | Before | After | Savings |
|-------------------------------|-------:|-------:|--------:|
| Code search (100 results) | 17,765 | 1,408 | **92%** |
| SRE incident debugging | 65,694 | 5,118 | **92%** |
| GitHub issue triage | 54,174 | 14,761 | **73%** |
| Codebase exploration | 78,502 | 41,254 | **47%** |
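The Savings column is the usual token reduction, `(before - after) / before`, which you can check against the table directly:

```python
rows = [("Code search (100 results)", 17_765, 1_408),
        ("SRE incident debugging", 65_694, 5_118),
        ("GitHub issue triage", 54_174, 14_761),
        ("Codebase exploration", 78_502, 41_254)]

for name, before, after in rows:
    savings = (before - after) / before
    print(f"{name}: {savings:.0%}")  # 92%, 92%, 73%, 47%
```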

**Accuracy preserved on standard benchmarks:**

| Benchmark | Category | N | Baseline | Headroom | Delta |
|------------|----------|----:|---------:|---------:|----------:|
| GSM8K | Math | 100 | 0.870 | 0.870 | **±0.000**|
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | **+0.030**|
| SQuAD v2 | QA | 100 | — | **97%** | 19% compression |
| BFCL | Tools | 100 | — | **97%** | 32% compression |

Reproduce:

```bash
python -m headroom.evals suite --tier 1
```

**Community, live:**

60B+ tokens saved by the community in the last 20 days — [live leaderboard](https://headroomlabs.ai/dashboard).

→ [Full benchmarks & methodology](https://chopratejas.github.io/headroom/docs/benchmarks)

---

## Built for coding agents

| Agent | One-command wrap | Notes |
|--------------------|------------------------------------|------------------------------------------------------------------|
| **Claude Code** | `headroom wrap claude` | `--memory` for cross-agent memory, `--code-graph` for codebase intel |
| **Codex** | `headroom wrap codex --memory` | Shares the same memory store as Claude |
| **Cursor** | `headroom wrap cursor` | Prints Cursor config — paste once, done |
| **Aider** | `headroom wrap aider` | Starts proxy, launches Aider |
| **Copilot CLI** | `headroom wrap copilot` | Starts proxy, launches Copilot |
| **OpenClaw** | `headroom wrap openclaw` | Installs Headroom as ContextEngine plugin |

MCP-native too — `headroom mcp install` exposes `headroom_compress`, `headroom_retrieve`, and `headroom_stats` to any MCP client.


*`headroom learn` in action*

---

## Integrations

**Drop Headroom into any stack:**

| Your setup | Hook in with |
|-------------------------|------------------------------------------------------------------|
| Any Python app | `compress(messages, model=…)` |
| Any TypeScript app | `await compress(messages, { model })` |
| Anthropic / OpenAI SDK | `withHeadroom(new Anthropic())` · `withHeadroom(new OpenAI())` |
| Vercel AI SDK | `wrapLanguageModel({ model, middleware: headroomMiddleware() })` |
| LiteLLM | `litellm.callbacks = [HeadroomCallback()]` |
| LangChain | `HeadroomChatModel(your_llm)` |
| Agno | `HeadroomAgnoModel(your_model)` |
| Strands | [Strands guide](https://chopratejas.github.io/headroom/docs/strands) |
| ASGI apps | `app.add_middleware(CompressionMiddleware)` |
| Multi-agent | `SharedContext().put / .get` |
| MCP clients | `headroom mcp install` |

**What's inside:**

- **SmartCrusher** — universal JSON: arrays of dicts, nested objects, mixed types.
- **CodeCompressor** — AST-aware for Python, JS, Go, Rust, Java, C++.
- **Kompress-base** — our HuggingFace model, trained on agentic traces.
- **Image compression** — 40–90% reduction via trained ML router.
- **CacheAligner** — stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.
- **IntelligentContext** — score-based context fitting with learned importance.
- **CCR** — reversible compression; the LLM retrieves originals on demand.
- **Cross-agent memory** — shared store, agent provenance, auto-dedup.
- **SharedContext** — compressed context passing across multi-agent workflows.
- **`headroom learn`** — plugin-based failure mining for Claude, Codex, Gemini.
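As a rough sketch of the SharedContext idea — one agent puts compressed context under a key, another gets it back — here is a toy stand-in. zlib substitutes for Headroom's actual compressors, and the `put`/`get` signatures are assumptions, not the real API:

```python
import zlib

class SharedContextSketch:
    """Toy shared store: compressed payloads passed between agents by key."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, text: str) -> None:
        # Store compressed bytes, not raw text.
        self._data[key] = zlib.compress(text.encode())

    def get(self, key: str) -> str:
        # Decompress on read; the consumer sees the original content.
        return zlib.decompress(self._data[key]).decode()

ctx = SharedContextSketch()
ctx.put("repo-map", "src/main.py: entrypoint\n" * 500)  # e.g. written by one agent
assert ctx.get("repo-map").startswith("src/main.py")    # e.g. read back by another
```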

---

## Install

```bash
pip install "headroom-ai[all]" # Python, everything
npm install headroom-ai # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest
```

Granular extras: `[proxy]`, `[mcp]`, `[ml]` (Kompress-base), `[agno]`, `[langchain]`, `[evals]`. Requires **Python 3.10+**.

→ [Installation guide](https://chopratejas.github.io/headroom/docs/installation) — Docker tags, persistent service, PowerShell, devcontainers.

---

## Documentation

| Start here | Go deeper |
|-------------------------------------------------------------------------|------------------------------------------------------------------------|
| [Quickstart](https://chopratejas.github.io/headroom/docs/quickstart) | [Architecture](https://chopratejas.github.io/headroom/docs/architecture) |
| [Proxy](https://chopratejas.github.io/headroom/docs/proxy) | [How compression works](https://chopratejas.github.io/headroom/docs/how-compression-works) |
| [MCP tools](https://chopratejas.github.io/headroom/docs/mcp) | [CCR β€” reversible compression](https://chopratejas.github.io/headroom/docs/ccr) |
| [Memory](https://chopratejas.github.io/headroom/docs/memory) | [Cache optimization](https://chopratejas.github.io/headroom/docs/cache-optimization) |
| [Failure learning](https://chopratejas.github.io/headroom/docs/failure-learning) | [Benchmarks](https://chopratejas.github.io/headroom/docs/benchmarks) |
| [Configuration](https://chopratejas.github.io/headroom/docs/configuration) | [Limitations](https://chopratejas.github.io/headroom/docs/limitations) |

---

## Compared to

Headroom runs **locally**, covers **every** content type (not just CLI or text), works with every major framework, and is **reversible**.

| | Scope | Deploy | Local | Reversible |
|----------------------------------|-------------------------------------------------|-------------------------------------|:-----:|:----------:|
| **Headroom** | All context — tools, RAG, logs, files, history | Proxy · library · middleware · MCP | Yes | Yes |
| [RTK](https://github.com/rtk-ai/rtk) | CLI command outputs | CLI wrapper | Yes | No |
| [Compresr](https://compresr.ai), [Token Co.](https://thetokencompany.ai) | Text sent to their API | Hosted API call | No | No |
| OpenAI Compaction | Conversation history | Provider-native | No | No |

> **Attribution.** Headroom ships with the excellent [RTK](https://github.com/rtk-ai/rtk) binary for shell-output rewriting — `git show` → `git show --short`, noisy `ls` → scoped, chatty installers → summarized. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it.

---

## Contributing

```bash
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
```

Devcontainers in `.devcontainer/` (default + `memory-stack` with Qdrant & Neo4j). See [CONTRIBUTING.md](CONTRIBUTING.md).

---

## Community

- **[Live leaderboard](https://headroomlabs.ai/dashboard)** — 60B+ tokens saved and counting.
- **[Discord](https://discord.gg/yRmaUNpsPJ)** — questions, feedback, war stories.
- **[Kompress-base on HuggingFace](https://huggingface.co/chopratejas/kompress-base)** — the model behind our text compression.

## License

Apache 2.0 — see [LICENSE](LICENSE).