{"id":50784172,"url":"https://github.com/pallaprolus/contextcut","last_synced_at":"2026-06-12T06:06:06.917Z","repository":{"id":363975781,"uuid":"1265816950","full_name":"pallaprolus/contextcut","owner":"pallaprolus","description":"Pack a repository into ultra-dense, AI-optimized Markdown — gitignore-aware, noise-pruned, with token estimates","archived":false,"fork":false,"pushed_at":"2026-06-11T05:42:23.000Z","size":26,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-11T07:18:06.411Z","etag":null,"topics":["ai","cli","context-window","developer-tools","llm","rust"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pallaprolus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-11T05:25:27.000Z","updated_at":"2026-06-11T05:42:26.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/pallaprolus/contextcut","commit_stats":null,"previous_names":["pallaprolus/contextcut"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/pallaprolus/contextcut","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pallaprolus%2Fcontextcut","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pallaprolus%2Fcontextcut/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pallaprolus%2F
contextcut/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pallaprolus%2Fcontextcut/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pallaprolus","download_url":"https://codeload.github.com/pallaprolus/contextcut/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pallaprolus%2Fcontextcut/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34231243,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","cli","context-window","developer-tools","llm","rust"],"created_at":"2026-06-12T06:06:06.439Z","updated_at":"2026-06-12T06:06:06.912Z","avatar_url":"https://github.com/pallaprolus.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ContextCut\n\n[![CI](https://github.com/pallaprolus/contextcut/actions/workflows/ci.yml/badge.svg)](https://github.com/pallaprolus/contextcut/actions/workflows/ci.yml) [![crates.io](https://img.shields.io/crates/v/contextcut.svg)](https://crates.io/crates/contextcut)\n\n**Pack a repository into ultra-dense, AI-optimized Markdown — with token estimates before you paste.**\n\nFeeding a whole repo to an LLM wastes thousands of tokens on vendor directories, 
lockfiles, caches, and binaries. ContextCut walks your project gitignore-aware, prunes the noise, and emits one clean Markdown document (file tree + language-tagged code blocks) ready for any chat or agent context window — and tells you what it will cost in tokens *before* you send it.\n\n```console\n$ contextcut ~/code/my-project -o packed.md\n  Files packed:  114   (skipped: 0 binary, 0 lockfile/minified/vendor, 0 filtered, 0 unreadable)\n  Output size:   400.8 KB\n  ── Estimated tokens ─────────────\n  GPT (o200k_base)         101,052\n  GPT-4 (cl100k_base)      100,147\n  Claude (approx ×1.15)    115,169\n  Gemini (approx)          101,052\n```\n\nReal-world result: a 2,240-file / 38 MB Python repo → 114 files / 0.4 MB of signal.\n\n## Install\n\n```bash\ncargo install contextcut\n# (Homebrew tap planned)\n```\n\n## Usage\n\n```bash\ncontextcut [PATH] [OPTIONS]\n```\n\n| Flag | Default | Effect |\n|---|---|---|\n| `PATH` | `.` | Root directory to pack |\n| `-o, --output \u003cFILE\u003e` | stdout | Write Markdown to a file (the stats table always goes to stderr, so stdout stays pipeable) |\n| `--related \u003cPATH\u003e` | — | Pack only files related to PATH in the import graph (repeatable): its imports *and* its importers |\n| `--diff [REF]` | — | Pack files changed vs REF (default `HEAD`) plus untracked files, with their import blast radius |\n| `--depth \u003cN\u003e` | `2` | Hops to follow in the import graph for `--related`/`--diff` |\n| `--map` | off | Append a dependency map section (`→` imports, `←` importers) to the output |\n| `--exact-claude` | off | Exact Claude count via Anthropic's count-tokens API (needs `ANTHROPIC_API_KEY`; falls back to the approximation on any error) |\n| `--tokens-only` | off | Dry run: stats + token table only, no Markdown |\n| `--strip-comments` | off | Drop full-line comments (py, rs, js/ts, go, c/cpp, java, sh, yaml/toml) |\n| `--max-file-size \u003cSIZE\u003e` | `64kb` | Truncate larger files with a `[truncated: N of M 
bytes]` marker (`4096`, `64kb`, `1mb`) |\n| `--include \u003cGLOB\u003e` | all | Only pack matching files (repeatable), e.g. `--include '**/*.py'` |\n| `--exclude \u003cGLOB\u003e` | none | Skip matching files (repeatable, applied after includes) |\n| `--no-gitignore` | off | Ignore `.gitignore` rules (built-in prunes still apply) |\n\n### Pack only the blast radius\n\nMost questions are about *part* of a codebase. ContextCut builds an import graph (Python, JS/TS, Rust, Go — lightweight line-based extraction, resolved against the real file set) and packs only what's connected:\n\n```bash\n# Working on the scheduler? Pack it, what it imports, and what imports it:\ncontextcut . --related kube_foresight/scheduler.py --depth 1\n#   → 5 files / ~5k tokens instead of 114 files / ~101k\n\n# Reviewing a change? Pack the diff plus everything it can break:\ncontextcut . --diff main\n\n# Add --map for an explicit imports/importers section the model can navigate by\ncontextcut . --related src/api.py --map\n```\n\n### What gets pruned automatically\n\nNo flags needed — this is the product's opinion:\n\n- Anything matched by `.gitignore` / `.ignore` (via ripgrep's [`ignore`](https://crates.io/crates/ignore) walker)\n- Binary files (content-sniffed, not extension-guessed)\n- Lockfiles: `Cargo.lock`, `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml`, `poetry.lock`, `uv.lock`, `Pipfile.lock`, `Gemfile.lock`, `composer.lock`, `go.sum`, `flake.lock`\n- Minified assets: `*.min.js`, `*.min.css`, `*.map`\n- Vendor/cache dirs: `.git`, `node_modules`, `vendor`, `__pycache__`, `.venv`, `venv`, `dist`, `build`, `target`, `.pytest_cache`, `.ruff_cache`, `.mypy_cache`, `*.egg-info`, `.idea`, `.vscode`\n\n## Token estimates: how they're computed\n\n- **GPT counts are exact** — real BPE via [`tiktoken-rs`](https://crates.io/crates/tiktoken-rs) (`o200k_base` for GPT-4o/5-class, `cl100k_base` for GPT-4). 
Verified byte-identical against Python `tiktoken`.\n- **Claude is exact with `--exact-claude`** — Anthropic publishes no local tokenizer, but their count-tokens API returns exact numbers (free to call; set `ANTHROPIC_API_KEY`). Without the flag (or on any API error) we report `cl100k × 1.15` as a rough budgeting factor, labeled \"approx\".\n- **Gemini is an approximation** — we reuse the `o200k_base` count as a nearby proxy, labeled \"approx\".\n- Special tokens (a literal `\u003c|endoftext|\u003e` in source) are counted as plain text, never as control tokens.\n\n## Known limitations (v0.1)\n\n- `--strip-comments` is line-based: it removes *full-line* comments only and leaves inline trailing comments. Rare multi-line strings whose lines begin with `#`/`//` could be affected. A tree-sitter-based stripper is planned (see Roadmap).\n- Non-UTF-8 text files are lossy-converted (`U+FFFD` replacement) rather than skipped.\n- Gemini counts, and Claude counts without `--exact-claude`, are estimates — treat them as budgeting guidance, not billing truth.\n\n## Roadmap\n\n- **v0.3 — tree-sitter comment stripping**: replaces the line-based stripper\n- **v0.3 — architecture overview mode**: `--map` without file bodies\n- Homebrew tap; Gemini count-tokens API\n\n## Development\n\n```bash\ncargo test            # unit + fixture-based integration + insta snapshot tests\ncargo insta review    # review Markdown-format snapshot changes\ncargo clippy          # lint (CI gate)\n```\n\nIntegration tests run the real binary against `tests/fixtures/mini-repo/`, a planted-noise fixture (gitignored secrets, a lockfile, a minified asset, a real PNG, comment/string traps). 
The fixture's `gitignore.txt` is renamed to `.gitignore` inside a tempdir at test time so it behaves identically regardless of the host repo's git context.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpallaprolus%2Fcontextcut","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpallaprolus%2Fcontextcut","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpallaprolus%2Fcontextcut/lists"}