{"id":48768548,"url":"https://github.com/avanrossum/codebase-analyzer","last_synced_at":"2026-04-13T09:02:13.009Z","repository":{"id":348309208,"uuid":"1196603732","full_name":"avanrossum/codebase-analyzer","owner":"avanrossum","description":"Language-agnostic CLI tool that generates structured file documentation using local LLMs via Ollama or any OpenAI-compatible API. Two-pass analysis with quorum validation, resumable SQLite state, and optional frontier model relationship mapping.","archived":false,"fork":false,"pushed_at":"2026-03-31T15:52:09.000Z","size":70,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-31T17:46:19.563Z","etag":null,"topics":["cli","code-analysis","developer-tools","documentation","llm","lm-studio","ollama","openai-compatible","python"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/avanrossum.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-30T21:27:01.000Z","updated_at":"2026-03-31T15:52:15.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/avanrossum/codebase-analyzer","commit_stats":null,"previous_names":["avanrossum/codebase-analyzer"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/avanrossum/codebase-analyzer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avanrossum%2Fcodebase-analyzer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avanrossum%2Fcodebase-analyzer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avanrossum%2Fcodebase-analyzer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avanrossum%2Fcodebase-analyzer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/avanrossum","download_url":"https://codeload.github.com/avanrossum/codebase-analyzer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avanrossum%2Fcodebase-analyzer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31746113,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T06:26:45.479Z","status":"ssl_error","status_checked_at":"2026-04-13T06:26:44.645Z","response_time":93,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","code-analysis","developer-tools","documentation","llm","lm-studio","ollama","openai-compatible","python"],"created_at":"2026-04-13T09:02:12.136Z","updated_at":"2026-04-13T09:02:12.998Z","avatar_url":"https://github.com/avanrossum.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Codebase Analyzer\n\nA language-agnostic CLI tool that traverses any codebase, generates structured descriptions of every file using a local LLM via [Ollama](https://ollama.com), validates those descriptions through a quorum process, and outputs 1:1 markdown files.\n\n## Features\n\n- **Language-agnostic** — ships with profiles for Python, JavaScript/TypeScript, Java, Go, Ruby, Rust, PHP, and more\n- **Local-first** — uses Ollama for all file analysis, no API keys required for core functionality\n- **Quorum validation** — two independent LLM passes with a judge pass to ensure accuracy\n- **Resumable** — SQLite-backed state means you can stop and resume at any time\n- **Relationship mapping** — optional frontier model integration for cross-file dependency analysis\n\n## Installation\n\n```bash\npip install codebase-analyzer\n```\n\nFor development:\n\n```bash\ngit clone https://github.com/avanrossum/codebase-analyzer.git\ncd codebase-analyzer\npip install -e \".[dev]\"\n```\n\n## Prerequisites\n\n- Python 3.9+\n- A local LLM server running with a capable model. Supported backends:\n  - [Ollama](https://ollama.com) (default, uses `/api/chat`)\n  - [LM Studio](https://lmstudio.ai) (uses OpenAI-compatible `/v1/chat/completions`)\n  - Any OpenAI-compatible API (vLLM, llama.cpp server, etc.)\n\n### Model Requirements\n\n- **Context window: 8192 tokens minimum, 16384+ recommended.** The tool sends a ~500 token system prompt plus the full file content, and the model must generate a structured JSON response. Files that exceed the context window will error and be skipped.\n- **JSON output quality matters.** The model must reliably produce valid JSON. Larger models (30B+) perform significantly better at this than smaller ones.\n- Default model: `qwen3:32b-q5_K_M` (Ollama). Override with `--model`.\n\n### Performance Notes\n\n- Each file requires **3 LLM calls minimum** (two analysis passes + one quorum judge), plus retries on disagreement. A 100-file repo means 300+ inference calls.\n- Default concurrency is 1 (single-GPU safe). For multi-GPU setups, increase with `--concurrency`.\n- Large codebases can take hours on consumer hardware. The tool is fully resumable — stop and restart at any time.\n\n## Quick Start\n\n```bash\n# Analyze a repository (auto-detects language profiles)\ncodebase-analyzer analyze /path/to/repo --output ./analysis\n\n# Check progress\ncodebase-analyzer status ./analysis\n\n# Resume an interrupted run (just re-run the same command)\ncodebase-analyzer analyze /path/to/repo --output ./analysis\n```\n\n## Usage\n\n### Analyze\n\n```bash\n# Explicit language profiles\ncodebase-analyzer analyze /path/to/repo --output ./analysis --profiles python,web,config\n\n# Custom profile file\ncodebase-analyzer analyze /path/to/repo --output ./analysis --profile-file ./my-project.yaml\n\n# Include all text files\ncodebase-analyzer analyze /path/to/repo --output ./analysis --all-text-files\n\n# Override model and concurrency\ncodebase-analyzer analyze /path/to/repo --output ./analysis \\\n  --model qwen3:32b-q5_K_M \\\n  --ollama-url http://localhost:11434 \\\n  --max-retries 3 \\\n  --max-file-size 100000 \\\n  --concurrency 1\n\n# Remote LLM server with authentication\ncodebase-analyzer analyze /path/to/repo --output ./analysis \\\n  --ollama-url https://your-server.example.com \\\n  --model your-model-name \\\n  --api-token $LLM_API_TOKEN\n```\n\n### API Token Management\n\nIf your LLM server requires authentication, pass a bearer token via `--api-token` or the `LLM_API_TOKEN` environment variable. Here are some options for managing it securely:\n\n**macOS Keychain (recommended on Mac — encrypted at rest, never in a plaintext file):**\n```bash\n# Store once\nsecurity add-generic-password -a \"$USER\" -s \"llm-api-token\" -w \"your-token-here\"\n\n# Retrieve into env var\nexport LLM_API_TOKEN=$(security find-generic-password -a \"$USER\" -s \"llm-api-token\" -w)\n\n# Or use an alias in ~/.zshrc\nalias lm-token='export LLM_API_TOKEN=$(security find-generic-password -a \"$USER\" -s \"llm-api-token\" -w)'\n```\n\n**1Password / Bitwarden CLI (best for multi-machine setups):**\n```bash\nexport LLM_API_TOKEN=$(op read \"op://Private/LLM Server/token\")    # 1Password\nexport LLM_API_TOKEN=$(bw get password \"llm-api-token\")             # Bitwarden\n```\n\n**direnv (per-project, auto-loads when you `cd` into the project):**\n```bash\n# .envrc in project root — make sure .envrc is in your .gitignore\nexport LLM_API_TOKEN=\"your-token\"\n```\n\n**Avoid** putting tokens directly in `~/.zshrc` or `~/.bashrc` — they're unencrypted, easy to accidentally commit, and visible to any process that reads your shell config.\n\n### Relationship Mapping\n\nAfter analysis completes, optionally map cross-file relationships:\n\n```bash\n# Via Claude API (automated)\ncodebase-analyzer relationships ./analysis --api-key $ANTHROPIC_API_KEY\n\n# Export prompt for Claude Code (interactive)\ncodebase-analyzer relationships ./analysis --export-prompt\n```\n\n### Resolve Flagged Files\n\nFiles that fail quorum after retries can be resolved with a frontier model:\n\n```bash\n# Via Claude API\ncodebase-analyzer resolve-flagged ./analysis --api-key $ANTHROPIC_API_KEY\n\n# Export for manual review\ncodebase-analyzer resolve-flagged ./analysis --export-prompt\n```\n\n## Output Structure\n\n```\nanalysis/\n  files/                    # 1:1 markdown files mirroring repo structure\n    path/to/module.py.md\n  flagged/                  # files that failed quorum (JSON with full history)\n    path/to/problem.py.json\n  relationships/            # cross-file dependency maps (if generated)\n    _index.md\n    module_map.md\n  analyzer_state.db         # SQLite state for resume capability\n  run_report.md             # summary statistics\n```\n\n## Optional Dependencies\n\nThe core analysis pipeline requires only Ollama. For automated relationship mapping and flagged file resolution via Claude API:\n\n```bash\npip install \"codebase-analyzer[api]\"\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Favanrossum%2Fcodebase-analyzer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Favanrossum%2Fcodebase-analyzer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Favanrossum%2Fcodebase-analyzer/lists"}