{"id":51045249,"url":"https://github.com/haru0416-dev/coffer","last_synced_at":"2026-06-22T13:01:59.265Z","repository":{"id":365978129,"uuid":"1274514930","full_name":"haru0416-dev/coffer","owner":"haru0416-dev","description":"Byte-exact, reversible compression of LLM tool-output, with an exact compute-digest — an MCP server and a transparent HTTP proxy.","archived":false,"fork":false,"pushed_at":"2026-06-19T18:24:29.000Z","size":287,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-19T19:05:52.141Z","etag":null,"topics":["ai-agents","compression","content-addressable-storage","context-window","llm","mcp","model-context-protocol","proxy","rust"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/haru0416-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-19T15:38:21.000Z","updated_at":"2026-06-19T18:24:33.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/haru0416-dev/coffer","commit_stats":null,"previous_names":["haru0416-dev/coffer"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/haru0416-dev/coffer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haru0416-dev%2Fcoffer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haru0416-dev%2Fcoffer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haru0416-dev%2Fcoffer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haru0416-dev%2Fcoffer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/haru0416-dev","download_url":"https://codeload.github.com/haru0416-dev/coffer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haru0416-dev%2Fcoffer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34649822,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-22T02:00:06.391Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","compression","content-addressable-storage","context-window","llm","mcp","model-context-protocol","proxy","rust"],"created_at":"2026-06-22T13:01:58.038Z","updated_at":"2026-06-22T13:01:59.257Z","avatar_url":"https://github.com/haru0416-dev.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# coffer\n\n[![ci](https://github.com/haru0416-dev/coffer/actions/workflows/ci.yml/badge.svg)](https://github.com/haru0416-dev/coffer/actions/workflows/ci.yml)\n[![license: Apache-2.0](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)\n![status: experimental](https://img.shields.io/badge/status-experimental-orange.svg)\n\n\u003e Byte-exact, **reversible** compression of LLM tool-output — plus an exact **compute-digest** that\n\u003e answers questions over data too big to read back into the context window.\n\n![coffer compressing a 5000-pod kubectl dump ~90%, still byte-for-byte reversible](demo/demo.gif)\n\n**Status: experimental.** The engine, the CAS-backed MCP server and transparent proxy, and the\nproduction smoke gates are real and tested. Two properties are mechanical and verifiable today —\nbyte-exact recovery and exact aggregation. End-task *accuracy* claims (does compressing tool-output\npreserve or improve what the model answers?) are governed by a pre-registered protocol, never by a\ncompression percentage.\n\n## The idea\n\nWhen an agent dumps a 100-result code search, a 65k-line log, or a noisy RAG payload into the model's\ncontext, most of it is noise that both costs tokens and degrades the model's answers (\"context rot\").\ncoffer compresses that tool-output **before it enters the context window**, keeps the **original bytes**\nin a SHA-256 content-addressed store, and shows the model a short `\u003c\u003ccof:HASH\u003e\u003e` sentinel. The model\nrecovers the exact byte window (or a capped full payload) on demand, and the parts of the prompt the\nprovider is caching are never touched.\n\nTwo things make this more than a trimmer:\n\n- **Byte-exact reversibility.** Nothing is summarized away. `reconstruct(compress(x)) == x` byte-for-byte\n  (a Stage-0 invariant, property-tested), backed by a durable, hash-verified content store. A dropped\n  needle is recoverable, not lost.\n- **A compute-digest.** Some questions have no answer in any single surviving line — \"how many errors?\",\n  \"the record with the largest value\", \"the p95 latency\". coffer computes those **exactly, in Rust, over\n  ALL of the data including the offloaded bytes** (count / sum / mean / median / percentile / group-by /\n  argmax / threshold-count), with a refuse-rather-than-guess contract. No model call, no loss of\n  reversibility — the answer is exact even when the raw data was far too big to read. A frontier model\n  asked to count or sum a few thousand rows sitting in its own context gets the number wrong; computing\n  it in Rust over the held bytes does not.\n\n## Surfaces\n\n- **Transparent proxy.** Point an agent at it (`ANTHROPIC_BASE_URL=http://127.0.0.1:8788`); it compresses\n  the `tool_result` blocks of each request and forwards the rest unchanged, streaming the response back.\n  OpenAI Responses tool-output is handled the same way.\n- **MCP server.** Tools to point at server-held output instead of reading it: `coffer_describe` (a\n  generic exact summary of any record set — row count, per-field stats, count-by-value), `coffer_digest`\n  and `coffer_aggregate` (exact aggregates from a plain-English ask or a typed `count|sum|mean|min|max`\n  over a predicate, returned with the indices of the rows behind the number), `coffer_query` and\n  `coffer_select` (keep the matching rows, or hand back a new handle to narrow again), `coffer_pick`\n  (pull specific rows back to re-check a number), `coffer_search`/`coffer_lines` (drill into logs),\n  `coffer_rows`/`coffer_json`/`coffer_retrieve`/`coffer_unfold` (windowed JSON and bounded byte windows),\n  and `coffer_ingest` (hold a file).\n\nBoth share one content store, so the proxy can compress and the MCP server can recover the same bytes.\n\n## Honesty, up front\n\nContext-compression tools commonly measure compression % and accuracy on different datasets and report\nonly their best regime. coffer commits to the opposite, against a protocol fixed before the results:\n\n- Measure **end-task accuracy on the same workload**, at multiple compression levels → an\n  accuracy-vs-compression curve, per regime and content type.\n- The decisive test is **coffer vs naive head/tail truncation at the same token budget.** If a cheap\n  baseline matches us on a workload, we say so.\n- Report the **typical regime where we may lose**, not just the favorable tail.\n- Count tokens with the **target model's own tokenizer**, and **count retrieval round-trip tokens.**\n  Byte-faithful round-trip fidelity is reported separately from accuracy.\n\nWhere coffer does **not** win is just as clear. On plain retrieval that a frontier model's context window\nalready handles, compressing the input does not beat feeding it raw — coffer matches it, no more. And a\ncode-execution agent can compute the same exact aggregate by writing its own code; on accuracy that is a\ntie, not a coffer win. The difference worth stating plainly is narrower: coffer runs at the transport\nlayer before the bytes ever reach the model, needs no code sandbox or codegen round-trip, and keeps every\noriginal byte recoverable.\n\nIf the accuracy thesis fails its kill-probe, the failed curve is still a useful public result.\n\n## Quickstart\n\n```sh\n# Proxy: compress tool_result blocks transparently.\ncargo run --release -p coffer-proxy\n# then point your agent at it:\nANTHROPIC_BASE_URL=http://127.0.0.1:8788  # COFFER_PROXY_UPSTREAM defaults to api.anthropic.com\n\n# MCP server (stdio): register coffer-mcp with your agent, then direct tools at held output.\ncargo run --release -p coffer-mcp\n```\n\nAn npm launcher is scaffolded under [`npm/`](npm/): once published it will run the prebuilt native\nbinary for your platform (`npx coffer coffer-mcp`). It is **not on the npm registry yet** — for now,\nbuild from source as above.\n\nSafe by default: the proxy refuses a non-loopback bind unless `COFFER_PROXY_ALLOW_PUBLIC=1` (it has no\nauth and replays your upstream key); the MCP `coffer_run` shell tool is disabled unless\n`COFFER_MCP_ENABLE_RUN=1`. See [`docs/deployment.md`](docs/deployment.md) for production wiring.\n\n## Layout\n\n- `crates/` — the engine (`coffer-core`), content store (`coffer-cas`), tokenizer-parity counting\n  (`coffer-tokenizer`), MCP server (`coffer-mcp`), and transparent proxy (`coffer-proxy`).\n- [`docs/DESIGN.md`](docs/DESIGN.md) — design \u0026 specification: the reversibility invariant, data model,\n  compression pipeline, budget search, the compute-digest, surfaces, and non-goals.\n- [`docs/deployment.md`](docs/deployment.md) — MCP/proxy deployment, shared-CAS wiring, and limits.\n\n## License\n\nApache-2.0. See [`LICENSE`](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharu0416-dev%2Fcoffer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fharu0416-dev%2Fcoffer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharu0416-dev%2Fcoffer/lists"}