{"id":49006471,"url":"https://github.com/t11z/llm-context-collector","last_synced_at":"2026-04-18T20:13:00.902Z","repository":{"id":349856803,"uuid":"1204224602","full_name":"t11z/llm-context-collector","owner":"t11z","description":"A small CLI tool to collect source files from a repository into a single Markdown document, ready to share with an LLM.","archived":false,"fork":false,"pushed_at":"2026-04-07T21:12:12.000Z","size":51,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-07T22:26:46.656Z","etag":null,"topics":["ai","ai-coding","claude-code","codex","context","gemini","llm","vibe-coding"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/t11z.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-07T20:08:26.000Z","updated_at":"2026-04-07T21:10:25.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/t11z/llm-context-collector","commit_stats":null,"previous_names":["t11z/context-collector"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/t11z/llm-context-collector","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/t11z%2Fllm-context-collector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/t11z%2Fllm-context-collector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/t11z%2Fllm-context-collector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/t11z%2Fllm-context-collector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/t11z","download_url":"https://codeload.github.com/t11z/llm-context-collector/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/t11z%2Fllm-context-collector/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31982836,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T17:30:12.329Z","status":"ssl_error","status_checked_at":"2026-04-18T17:29:59.069Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-coding","claude-code","codex","context","gemini","llm","vibe-coding"],"created_at":"2026-04-18T20:12:59.506Z","updated_at":"2026-04-18T20:13:00.884Z","avatar_url":"https://github.com/t11z.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# llm-context-collector\n\nA small CLI tool to collect source files from a repository into a single Markdown document, ready to share with an LLM.\n\n```\n$ llm-context-collector auth\n✓ Collected 8 files (47 KB) from topic 'auth'\n  Written to: context-auth.md\n```\n\n## Why\n\nLLM architecture sparring sessions — whether you're debugging a tricky issue, reviewing an approach, or exploring a refactor — work best when the model can see the full picture. But getting code into a chat interface means copy-pasting files one by one, losing track of what you've shared, and spending more time on context assembly than on the actual conversation.\n\nllm-context-collector automates the boring part. You define topics in a config file (or just point at directories), run one command, and get a single Markdown file with all the relevant source code, properly formatted with syntax highlighting and a table of contents. Drop it into Claude, ChatGPT, Gemini, or any other LLM, and start the conversation with full context.\n\n## Installation\n\nThe recommended way to install is with [pipx](https://pypa.github.io/pipx/), which installs the tool in an isolated environment:\n\n```bash\npipx install llm-context-collector\n```\n\nAlternatively, install with pip:\n\n```bash\npip install llm-context-collector\n```\n\nFor local development:\n\n```bash\ngit clone https://github.com/t11z/llm-context-collector.git\ncd llm-context-collector\npip install -e \".[dev]\"\n```\n\n## Quick Start\n\n1. Create a `.llm-context-collector.toml` in your repository root:\n\n```toml\n[topics.auth]\ndescription = \"Authentication flow\"\npaths = [\n  \"backend/api/auth.py\",\n  \"backend/security/\",\n  \"frontend/src/hooks/useAuth.ts\",\n]\n```\n\n2. Run the tool:\n\n```bash\nllm-context-collector auth\n```\n\n3. Upload the generated `context-auth.md` to your LLM chat.\n\nThat's it. The file contains all the specified source code in a single, well-formatted Markdown document.\n\n## Configuration\n\nllm-context-collector reads its configuration from `.llm-context-collector.toml` in your repository root (or any parent directory). The config file defines **topics** — named collections of files that you frequently share together.\n\n### Topics\n\nEach topic has a name, an optional description, and a list of paths:\n\n```toml\n[topics.auth]\ndescription = \"Authentication flow: login, JWT, password handling, user registration\"\npaths = [\n  \"backend/api/auth.py\",\n  \"backend/security/jwt.py\",\n  \"backend/security/password.py\",\n  \"backend/domain/models.py\",\n  \"frontend/src/pages/Login.tsx\",\n  \"frontend/src/pages/Register.tsx\",\n  \"frontend/src/hooks/useAuth.ts\",\n  \"backend/tests/integration/test_api_auth.py\",\n]\n\n[topics.scan-pipeline]\ndescription = \"Scan worker, hashing, duplicate detection\"\npaths = [\n  \"backend/worker/\",\n  \"backend/adapters/hashing/\",\n  \"backend/domain/scanning.py\",\n  \"backend/ports/hasher.py\",\n]\n```\n\nPaths can be:\n\n- **Exact file paths** — `backend/api/auth.py`\n- **Directory paths** — `backend/worker/` — all files within are included recursively\n- **Glob patterns** — `backend/**/*_repo.py` — standard glob semantics\n\n### Exclusions\n\nBy default, llm-context-collector excludes common generated files, binaries, lock files, and any file larger than 1 MB. You can customize this:\n\n```toml\n[exclusions]\n# Add project-specific exclusions\nadditional = [\"docs/generated/\", \"*.snapshot.json\"]\n\n# Remove default exclusions if you really need to include them\nremove_defaults = [\"*.svg\"]\n\n# Override the maximum file size (in bytes, default: 1048576 = 1 MB)\nmax_file_size = 2097152\n```\n\n**Default exclusions include:**\n\n- Version control directories (`.git/`, `.svn/`, `.hg/`)\n- Generated directories (`node_modules/`, `venv/`, `__pycache__/`, `dist/`, `build/`, etc.)\n- Compiled artifacts (`*.pyc`, `*.o`, `*.so`, `*.dll`, `*.class`)\n- Lock files (`*.lock`, `package-lock.json`, `yarn.lock`, etc.)\n- Minified files (`*.min.js`, `*.min.css`, `*.map`)\n- Binary and image files (`*.jpg`, `*.png`, `*.pdf`, `*.zip`, etc.)\n- Any file not valid UTF-8\n- Any file larger than 1 MB\n\n## Usage\n\n```\nUsage: llm-context-collector [TOPIC] [OPTIONS]\n\nArguments:\n  TOPIC                   Topic name from .llm-context-collector.toml\n\nOptions:\n  --paths PATH [PATH ...] Free-form path selection (alternative to TOPIC)\n  -o, --output PATH       Output file path. Use '-' for stdout.\n                          Default: context-\u003ctopic\u003e.md or context.md\n  --config PATH           Path to config file\n  --dry-run               List files without writing output\n  --fail-on-large         Exit with error if output exceeds size threshold\n  --max-size BYTES        Override size warning threshold (default: 512000)\n  --no-toc                Skip the table of contents section\n  --list-topics           List all defined topics and exit\n  -q, --quiet             Suppress all output except errors\n  -v, --verbose           Print detailed info about collection\n  --version               Print version and exit\n  -h, --help              Show help and exit\n```\n\n### Examples\n\nCollect a named topic:\n\n```bash\nllm-context-collector auth\n```\n\nCollect specific paths without a config file:\n\n```bash\nllm-context-collector --paths backend/api/ backend/security/\n```\n\nPreview what would be collected:\n\n```bash\nllm-context-collector auth --dry-run\n```\n\nList all available topics:\n\n```bash\nllm-context-collector --list-topics\n```\n\nWrite to stdout (for piping to clipboard):\n\n```bash\n# macOS\nllm-context-collector auth -o - | pbcopy\n\n# Linux\nllm-context-collector auth -o - | xclip -selection clipboard\n```\n\nWrite to a specific file:\n\n```bash\nllm-context-collector auth -o ~/Desktop/context.md\n```\n\n## Typical Workflows\n\n### Architecture Sparring with Claude\n\nYou're about to refactor the authentication system. You want Claude to review the current state and suggest improvements.\n\n```bash\nllm-context-collector auth\n# Upload context-auth.md to a Claude project or conversation\n# \"Here's the current auth implementation. I want to add OAuth2 support. What would you change?\"\n```\n\n### Bug Report with Context\n\nA colleague reports a bug in the scan pipeline. You want to give your LLM the full context to help debug.\n\n```bash\nllm-context-collector scan-pipeline\n# Upload context-scan-pipeline.md\n# \"Users report that duplicate detection fails for files \u003e 100MB. Here's the relevant code.\"\n```\n\n### Quick Ad-Hoc Selection\n\nYou don't have a topic defined, but you need to share a few specific files:\n\n```bash\nllm-context-collector --paths src/api/endpoints.py src/models/ tests/test_api.py\n# Upload context.md\n```\n\n### CI Integration\n\nUse `--fail-on-large` in CI to catch accidentally bloated context files:\n\n```bash\nllm-context-collector auth --fail-on-large --max-size 200000 -q\n```\n\n## Output Format\n\nThe generated Markdown file includes:\n\n- A header with the topic name and description\n- Metadata: timestamp, repository name, file count, total size\n- A table of contents linking to each file\n- Each file's contents in a syntax-highlighted code fence\n\nLanguage detection covers 30+ file extensions including Python, JavaScript, TypeScript, Go, Rust, Java, C/C++, and more. Files with unrecognized extensions use plain code fences.\n\n## FAQ\n\n### Why not just copy-paste files manually?\n\nYou can, and for one or two files it's fine. But once you're regularly sharing 5-15 files for architecture discussions, the manual process gets tedious. You forget files, you lose track of what you've shared, and you spend time on logistics instead of the conversation. llm-context-collector makes it a one-command operation.\n\n### Why not use a web tool or IDE extension?\n\nThose work too. llm-context-collector is for people who prefer the terminal, want reproducible topic definitions committed to the repo, and want a tool that works the same way across projects and machines.\n\n### Does it support my programming language?\n\nllm-context-collector doesn't analyze code — it collects files. It works with any text file in any language. The syntax highlighting in the output covers 30+ languages, but even unsupported extensions get included with plain code fences.\n\n### Is it safe to upload my code to an LLM?\n\nThis is a decision you and your organization need to make. llm-context-collector does not upload anything — it only writes a local file. What you do with that file is up to you. Consider your company's policies on sharing code with third-party services, and be mindful of secrets, credentials, or proprietary algorithms in the files you collect.\n\n### What about token limits?\n\nllm-context-collector reports file sizes in bytes, not tokens. Different LLMs have different context windows and tokenization schemes, so byte-to-token conversion varies. As a rough guide, 1 KB of code is roughly 250-400 tokens. The tool warns you when the output exceeds 500 KB (~125K-200K tokens), which approaches the limits of most current models.\n\n### Can I use glob patterns?\n\nYes. Both in the config file paths and with `--paths` on the command line. Standard glob syntax: `*` matches anything except `/`, `**` matches any number of directories, `?` matches a single character.\n\n## Contributing\n\nContributions are welcome. This is a small, focused tool — please keep PRs small and focused too.\n\n```bash\n# Setup\ngit clone https://github.com/t11z/llm-context-collector.git\ncd llm-context-collector\npip install -e \".[dev]\"\n\n# Run tests\npytest\n\n# Lint\nruff check src/ tests/\n\n# Type check\nmypy src/\n```\n\n## License\n\nMIT — see [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ft11z%2Fllm-context-collector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ft11z%2Fllm-context-collector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ft11z%2Fllm-context-collector/lists"}