https://github.com/congvmit/awesome-llm-token-reduction
A curated list of techniques, tools, and research for reducing LLM token usage. Optimize context for Claude Code, Copilot, Cursor, and Aider.
https://github.com/congvmit/awesome-llm-token-reduction
List: awesome-llm-token-reduction
ai-coding-assistant awesome awesome-list claude-code context-optimization github-copilot llm openai-codex prompt-compression token-reduction
Last synced: 2 days ago
JSON representation
A curated list of techniques, tools, and research for reducing LLM token usage. Optimize context for Claude Code, Copilot, Cursor, and Aider.
- Host: GitHub
- URL: https://github.com/congvmit/awesome-llm-token-reduction
- Owner: congvmit
- License: cc0-1.0
- Created: 2026-06-13T02:48:17.000Z (17 days ago)
- Default Branch: main
- Last Pushed: 2026-06-13T06:40:36.000Z (17 days ago)
- Last Synced: 2026-06-13T08:13:31.492Z (17 days ago)
- Topics: ai-coding-assistant, awesome, awesome-list, claude-code, context-optimization, github-copilot, llm, openai-codex, prompt-compression, token-reduction
- Size: 19.5 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Security: SECURITY.md
Awesome Lists containing this project
README
# Awesome LLM Token Reduction [](https://awesome.re) [](http://makeapullrequest.com) [](https://creativecommons.org/publicdomain/zero/1.0/)
> A curated list of techniques, tools, and research for reducing LLM token usage — with a focus on AI coding assistants like Claude Code, OpenAI Codex, and GitHub Copilot.
Every prompt and response costs tokens, and coding agents burn through them fast: large files, tool output, logs, and long sessions all inflate the context window. This list collects the drop-in tools, libraries, data formats, and papers that cut tokens while keeping answers intact.
## Contents
- [Surveys \& Background](#surveys--background)
- [Coding-Assistant Token Savers](#coding-assistant-token-savers)
- [Prompt Compression Libraries](#prompt-compression-libraries)
- [Token-Efficient Data Formats](#token-efficient-data-formats)
- [Context \& Memory Management](#context--memory-management)
- [Output Compression](#output-compression)
- [Research \& Methods](#research--methods)
- [Star History](#star-history)
## Surveys & Background
> Start here for the lay of the land before picking a technique.
- [Prompt Compression for Large Language Models: A Survey](https://arxiv.org/abs/2410.12388) - Taxonomy of hard- and soft-prompt compression methods, mechanisms, and open problems.
## Coding-Assistant Token Savers
> Drop-in proxies, plugins, hooks, and MCP servers that cut tokens for Claude Code, Codex, Copilot, Cursor, and Aider.
- [claude-rolling-context](https://github.com/NodeNestor/claude-rolling-context) - Claude Code plugin that compresses old messages while keeping recent context verbatim. 
- [claude-shorthand](https://github.com/gladehq/claude-shorthand) - LLMLingua-2 prompt-compression hook for Claude Code. 
- [ClaudeShrink](https://github.com/g-akshay/ClaudeShrink) - Claude Code skill that shrinks large prompts and files with LLMLingua to save tokens. 
- [engram](https://github.com/pythondatascrape/engram) - Local-first context compression for AI coding tools, deduping redundant tokens across calls. 
- [entroly](https://github.com/juyterman1000/entroly) - Local proxy that compresses context for Claude Code, Codex, Cursor, and Aider. 
- [headroom](https://github.com/chopratejas/headroom) - Compresses tool output, logs, files, and RAG chunks before they reach the LLM. 
- [llmtrim](https://github.com/fkiene/llmtrim) - Provider-agnostic Rust proxy that compresses input, output, and cache with no extra model calls. 
- [rtk](https://github.com/rtk-ai/rtk) - CLI proxy that cuts LLM token use 60-90% on common dev commands, single Rust binary. 
- [sigmap](https://github.com/manojmallick/sigmap) - Zero-dependency MCP server for AST-based code context reduction across 31 languages. 
- [token-optimizer-mcp](https://github.com/ooples/token-optimizer-mcp) - Claude Code MCP server reaching 95%+ token reduction through caching and optimization. 
- [token-reducer](https://github.com/Madhan230205/token-reducer) - Local-first Claude Code context compression using hybrid RAG and AST chunking. 
- [TokenTamer](https://github.com/borhen68/TokenTamer) - Drop-in proxy that compresses bloated code context in real time to cut API costs. 
- [tokless](https://github.com/HoangP8/tokless) - Unified CLI to install and update token-saving plugins for Claude Code, Codex, and OpenCode. 
## Prompt Compression Libraries
> General-purpose SDKs you call directly to compress prompts in any LLM app.
- [claw-compactor](https://github.com/open-compress/claw-compactor) - 14-stage reversible, AST-aware pipeline for LLM token compression with zero inference cost. 
- [leanctx](https://github.com/jia-gao/leanctx) - Drop-in prompt-compression SDK for production LLM apps, built on LLMLingua-2. 
- [LLMLingua](https://github.com/microsoft/LLMLingua) - Microsoft toolkit compressing prompts and KV-cache up to 20x with minimal quality loss. 
- [llmlingua-2-js](https://github.com/atjsh/llmlingua-2-js) - JavaScript/TypeScript implementation of LLMLingua-2 for browser and Node. 
## Token-Efficient Data Formats
> Compact, LLM-friendly encodings that pass the same data in fewer tokens than JSON.
- [TOON](https://github.com/toon-format/toon) - Token-Oriented Object Notation, a lossless JSON encoding that cuts tokens ~30-60% for uniform data. 
- [Tooner](https://github.com/chaindead/tooner) - MCP proxy that converts JSON tool responses to TOON before they reach the model. 
## Context & Memory Management
> Persist and retrieve only what matters, so sessions stay short instead of replaying everything.
- [codex-agent-mem](https://github.com/MarceloCaporale/codex-agent-mem) - Local-first MCP memory layer for Codex and Claude with compact, token-saving context packs. 
- [mnemosyne](https://github.com/castnettech/mnemosyne) - Zero-dependency knowledge compression, ingestion, and hybrid retrieval engine. 
- [Zep](https://github.com/getzep/zep) - Context engineering platform that assembles relationship-aware context from a temporal knowledge graph. 
## Output Compression
> Reduce generation tokens — the part you pay the most for — without losing the answer.
- [caveman](https://github.com/JuliusBrussee/caveman) - Claude Code skill that rewrites output in terse "caveman speak" to cut ~65% of tokens. 
- [scrooge-mode](https://github.com/Kir93/scrooge-mode) - Output-compression skill for Claude Code and Codex measured on real session output tokens. 
- [squeez](https://github.com/KRLabsOrg/squeez) - Squeezes verbose LLM agent tool output down to only the relevant lines. 
## Research & Methods
> Foundational papers behind the tools above.
- [Adapting Language Models to Compress Contexts](https://arxiv.org/abs/2305.14788) - AutoCompressors that summarize long contexts into compact summary vectors.
- [In-Context Autoencoder for Context Compression](https://arxiv.org/abs/2307.06945) - ICAE encodes long context into a few memory slots for a frozen LLM.
- [Learning to Compress Prompts with Gist Tokens](https://arxiv.org/abs/2304.08467) - Gisting trains an LM to compress prompts into reusable "gist" tokens, up to 26x.
- [LLMLingua](https://arxiv.org/abs/2310.05736) - Coarse-to-fine prompt compression using a small LM to drop low-information tokens.
- [LLMLingua-2](https://arxiv.org/abs/2403.12968) - Task-agnostic prompt compression via token classification distilled from GPT-4.
- [LLoCO: Learning Long Contexts Offline](https://arxiv.org/abs/2404.07979) - Offline context compression plus LoRA finetuning for efficient long-context inference.
- [LongLLMLingua](https://arxiv.org/abs/2310.06839) - Prompt compression that mitigates "lost in the middle" and boosts RAG with fewer tokens.
## Contributing
Contributions are welcome! Please read the [contribution guidelines](CONTRIBUTING.md) first. In short: one entry per pull request, one entry per line, keep descriptions concise and present tense (ending with a period), verify the link resolves, and place the entry alphabetically within its section.
---
## Star History
[](https://star-history.com/#congvmit/awesome-llm-token-reduction&Date)