{"id":51204660,"url":"https://github.com/congvmit/awesome-llm-token-reduction","last_synced_at":"2026-06-28T02:33:05.708Z","repository":{"id":364482971,"uuid":"1267966563","full_name":"congvmit/awesome-llm-token-reduction","owner":"congvmit","description":"A curated list of techniques, tools, and research for reducing LLM token usage. Optimize context for Claude Code, Copilot, Cursor, and Aider.","archived":false,"fork":false,"pushed_at":"2026-06-13T06:40:36.000Z","size":20,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-13T08:13:31.492Z","etag":null,"topics":["ai-coding-assistant","awesome","awesome-list","claude-code","context-optimization","github-copilot","llm","openai-codex","prompt-compression","token-reduction"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/congvmit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["congvmit"]}},"created_at":"2026-06-13T02:48:17.000Z","updated_at":"2026-06-13T07:07:10.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/congvmit/awesome-llm-token-reduction","commit_stats":null,"previous_names":["congvmit/awesome-llm-token-reduction"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/congvmit/awesome-llm-token-reduction","repository_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/congvmit%2Fawesome-llm-token-reduction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/congvmit%2Fawesome-llm-token-reduction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/congvmit%2Fawesome-llm-token-reduction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/congvmit%2Fawesome-llm-token-reduction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/congvmit","download_url":"https://codeload.github.com/congvmit/awesome-llm-token-reduction/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/congvmit%2Fawesome-llm-token-reduction/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34875357,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-28T02:00:05.809Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-coding-assistant","awesome","awesome-list","claude-code","context-optimization","github-copilot","llm","openai-codex","prompt-compression","token-reduction"],"created_at":"2026-06-28T02:33:04.659Z","updated_at":"2026-06-28T02:33:05.701Z","avatar_url":"https://github.com/congvmit.png","language":null,"funding_links":["https://github.com/sponsors/congvmit"],"categories":[],"sub_cate
gories":[],"readme":"# Awesome LLM Token Reduction [![Awesome](https://awesome.re/badge.svg)](https://awesome.re) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com) [![License: CC0-1.0](https://img.shields.io/badge/License-CC0_1.0-lightgrey.svg)](https://creativecommons.org/publicdomain/zero/1.0/)\n\n\u003e A curated list of techniques, tools, and research for reducing LLM token usage — with a focus on AI coding assistants like Claude Code, OpenAI Codex, and GitHub Copilot.\n\nEvery prompt and response costs tokens, and coding agents burn through them fast: large files, tool output, logs, and long sessions all inflate the context window. This list collects the drop-in tools, libraries, data formats, and papers that cut tokens while keeping answers intact.\n\n## Contents\n\n- [Surveys \\\u0026 Background](#surveys--background)\n- [Coding-Assistant Token Savers](#coding-assistant-token-savers)\n- [Prompt Compression Libraries](#prompt-compression-libraries)\n- [Token-Efficient Data Formats](#token-efficient-data-formats)\n- [Context \\\u0026 Memory Management](#context--memory-management)\n- [Output Compression](#output-compression)\n- [Research \\\u0026 Methods](#research--methods)\n- [Star History](#star-history)\n\n## Surveys \u0026 Background\n\n\u003e Start here for the lay of the land before picking a technique.\n\n- [Prompt Compression for Large Language Models: A Survey](https://arxiv.org/abs/2410.12388) - Taxonomy of hard- and soft-prompt compression methods, mechanisms, and open problems.\n\n## Coding-Assistant Token Savers\n\n\u003e Drop-in proxies, plugins, hooks, and MCP servers that cut tokens for Claude Code, Codex, Copilot, Cursor, and Aider.\n\n- [claude-rolling-context](https://github.com/NodeNestor/claude-rolling-context) - Claude Code plugin that compresses old messages while keeping recent context verbatim. 
![Stars](https://img.shields.io/github/stars/NodeNestor/claude-rolling-context?style=social)\n- [claude-shorthand](https://github.com/gladehq/claude-shorthand) - LLMLingua-2 prompt-compression hook for Claude Code. ![Stars](https://img.shields.io/github/stars/gladehq/claude-shorthand?style=social)\n- [ClaudeShrink](https://github.com/g-akshay/ClaudeShrink) - Claude Code skill that shrinks large prompts and files with LLMLingua to save tokens. ![Stars](https://img.shields.io/github/stars/g-akshay/ClaudeShrink?style=social)\n- [engram](https://github.com/pythondatascrape/engram) - Local-first context compression for AI coding tools, deduping redundant tokens across calls. ![Stars](https://img.shields.io/github/stars/pythondatascrape/engram?style=social)\n- [entroly](https://github.com/juyterman1000/entroly) - Local proxy that compresses context for Claude Code, Codex, Cursor, and Aider. ![Stars](https://img.shields.io/github/stars/juyterman1000/entroly?style=social)\n- [headroom](https://github.com/chopratejas/headroom) - Compresses tool output, logs, files, and RAG chunks before they reach the LLM. ![Stars](https://img.shields.io/github/stars/chopratejas/headroom?style=social)\n- [llmtrim](https://github.com/fkiene/llmtrim) - Provider-agnostic Rust proxy that compresses input, output, and cache with no extra model calls. ![Stars](https://img.shields.io/github/stars/fkiene/llmtrim?style=social)\n- [rtk](https://github.com/rtk-ai/rtk) - CLI proxy that cuts LLM token use 60-90% on common dev commands, single Rust binary. ![Stars](https://img.shields.io/github/stars/rtk-ai/rtk?style=social)\n- [sigmap](https://github.com/manojmallick/sigmap) - Zero-dependency MCP server for AST-based code context reduction across 31 languages. ![Stars](https://img.shields.io/github/stars/manojmallick/sigmap?style=social)\n- [token-optimizer-mcp](https://github.com/ooples/token-optimizer-mcp) - Claude Code MCP server reaching 95%+ token reduction through caching and optimization. 
![Stars](https://img.shields.io/github/stars/ooples/token-optimizer-mcp?style=social)\n- [token-reducer](https://github.com/Madhan230205/token-reducer) - Local-first Claude Code context compression using hybrid RAG and AST chunking. ![Stars](https://img.shields.io/github/stars/Madhan230205/token-reducer?style=social)\n- [TokenTamer](https://github.com/borhen68/TokenTamer) - Drop-in proxy that compresses bloated code context in real time to cut API costs. ![Stars](https://img.shields.io/github/stars/borhen68/TokenTamer?style=social)\n- [tokless](https://github.com/HoangP8/tokless) - Unified CLI to install and update token-saving plugins for Claude Code, Codex, and OpenCode. ![Stars](https://img.shields.io/github/stars/HoangP8/tokless?style=social)\n\n## Prompt Compression Libraries\n\n\u003e General-purpose SDKs you call directly to compress prompts in any LLM app.\n\n- [claw-compactor](https://github.com/open-compress/claw-compactor) - 14-stage reversible, AST-aware pipeline for LLM token compression with zero inference cost. ![Stars](https://img.shields.io/github/stars/open-compress/claw-compactor?style=social)\n- [leanctx](https://github.com/jia-gao/leanctx) - Drop-in prompt-compression SDK for production LLM apps, built on LLMLingua-2. ![Stars](https://img.shields.io/github/stars/jia-gao/leanctx?style=social)\n- [LLMLingua](https://github.com/microsoft/LLMLingua) - Microsoft toolkit compressing prompts and KV-cache up to 20x with minimal quality loss. ![Stars](https://img.shields.io/github/stars/microsoft/LLMLingua?style=social)\n- [llmlingua-2-js](https://github.com/atjsh/llmlingua-2-js) - JavaScript/TypeScript implementation of LLMLingua-2 for browser and Node. 
![Stars](https://img.shields.io/github/stars/atjsh/llmlingua-2-js?style=social)\n\n## Token-Efficient Data Formats\n\n\u003e Compact, LLM-friendly encodings that pass the same data in fewer tokens than JSON.\n\n- [TOON](https://github.com/toon-format/toon) - Token-Oriented Object Notation, a lossless JSON encoding that cuts tokens ~30-60% for uniform data. ![Stars](https://img.shields.io/github/stars/toon-format/toon?style=social)\n- [Tooner](https://github.com/chaindead/tooner) - MCP proxy that converts JSON tool responses to TOON before they reach the model. ![Stars](https://img.shields.io/github/stars/chaindead/tooner?style=social)\n\n## Context \u0026 Memory Management\n\n\u003e Persist and retrieve only what matters, so sessions stay short instead of replaying everything.\n\n- [codex-agent-mem](https://github.com/MarceloCaporale/codex-agent-mem) - Local-first MCP memory layer for Codex and Claude with compact, token-saving context packs. ![Stars](https://img.shields.io/github/stars/MarceloCaporale/codex-agent-mem?style=social)\n- [mnemosyne](https://github.com/castnettech/mnemosyne) - Zero-dependency knowledge compression, ingestion, and hybrid retrieval engine. ![Stars](https://img.shields.io/github/stars/castnettech/mnemosyne?style=social)\n- [Zep](https://github.com/getzep/zep) - Context engineering platform that assembles relationship-aware context from a temporal knowledge graph. ![Stars](https://img.shields.io/github/stars/getzep/zep?style=social)\n\n## Output Compression\n\n\u003e Reduce generation tokens — the part you pay the most for — without losing the answer.\n\n- [caveman](https://github.com/JuliusBrussee/caveman) - Claude Code skill that rewrites output in terse \"caveman speak\" to cut ~65% of tokens. ![Stars](https://img.shields.io/github/stars/JuliusBrussee/caveman?style=social)\n- [scrooge-mode](https://github.com/Kir93/scrooge-mode) - Output-compression skill for Claude Code and Codex measured on real session output tokens. 
![Stars](https://img.shields.io/github/stars/Kir93/scrooge-mode?style=social)\n- [squeez](https://github.com/KRLabsOrg/squeez) - Squeezes verbose LLM agent tool output down to only the relevant lines. ![Stars](https://img.shields.io/github/stars/KRLabsOrg/squeez?style=social)\n\n## Research \u0026 Methods\n\n\u003e Foundational papers behind the tools above.\n\n- [Adapting Language Models to Compress Contexts](https://arxiv.org/abs/2305.14788) - AutoCompressors that summarize long contexts into compact summary vectors.\n- [In-Context Autoencoder for Context Compression](https://arxiv.org/abs/2307.06945) - ICAE encodes long context into a few memory slots for a frozen LLM.\n- [Learning to Compress Prompts with Gist Tokens](https://arxiv.org/abs/2304.08467) - Gisting trains an LM to compress prompts into reusable \"gist\" tokens, up to 26x.\n- [LLMLingua](https://arxiv.org/abs/2310.05736) - Coarse-to-fine prompt compression using a small LM to drop low-information tokens.\n- [LLMLingua-2](https://arxiv.org/abs/2403.12968) - Task-agnostic prompt compression via token classification distilled from GPT-4.\n- [LLoCO: Learning Long Contexts Offline](https://arxiv.org/abs/2404.07979) - Offline context compression plus LoRA finetuning for efficient long-context inference.\n- [LongLLMLingua](https://arxiv.org/abs/2310.06839) - Prompt compression that mitigates \"lost in the middle\" and boosts RAG with fewer tokens.\n\n## Contributing\n\nContributions are welcome! Please read the [contribution guidelines](CONTRIBUTING.md) first. 
In short: one entry per pull request, one entry per line, keep descriptions concise and in the present tense (ending with a period), verify the link resolves, and place the entry alphabetically within its section.\n\n---\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=congvmit/awesome-llm-token-reduction\u0026type=Date)](https://star-history.com/#congvmit/awesome-llm-token-reduction\u0026Date)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcongvmit%2Fawesome-llm-token-reduction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcongvmit%2Fawesome-llm-token-reduction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcongvmit%2Fawesome-llm-token-reduction/lists"}