https://github.com/avanrossum/codebase-analyzer

Language-agnostic CLI tool that generates structured file documentation using local LLMs via Ollama or any OpenAI-compatible API. Two-pass analysis with quorum validation, resumable SQLite state, and optional frontier model relationship mapping.
https://github.com/avanrossum/codebase-analyzer

cli code-analysis developer-tools documentation llm lm-studio ollama openai-compatible python

Last synced: 3 months ago
JSON representation

Host: GitHub
URL: https://github.com/avanrossum/codebase-analyzer
Owner: avanrossum
Created: 2026-03-30T21:27:01.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-03-31T15:52:09.000Z (3 months ago)
Last Synced: 2026-03-31T17:46:19.563Z (3 months ago)
Topics: cli, code-analysis, developer-tools, documentation, llm, lm-studio, ollama, openai-compatible, python
Language: Python
Size: 68.4 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Roadmap: ROADMAP.md

Awesome Lists containing this project

README

# Codebase Analyzer

A language-agnostic CLI tool that traverses any codebase, generates structured descriptions of every file using a local LLM via [Ollama](https://ollama.com), validates those descriptions through a quorum process, and outputs 1:1 markdown files.

## Features

- **Language-agnostic** — ships with profiles for Python, JavaScript/TypeScript, Java, Go, Ruby, Rust, PHP, and more
- **Local-first** — uses Ollama for all file analysis, no API keys required for core functionality
- **Quorum validation** — two independent LLM passes with a judge pass to ensure accuracy
- **Resumable** — SQLite-backed state means you can stop and resume at any time
- **Relationship mapping** — optional frontier model integration for cross-file dependency analysis

## Installation

```bash
pip install codebase-analyzer
```

For development:

```bash
git clone https://github.com/avanrossum/codebase-analyzer.git
cd codebase-analyzer
pip install -e ".[dev]"
```

## Prerequisites

- Python 3.9+
- A local LLM server running with a capable model. Supported backends:
- [Ollama](https://ollama.com) (default, uses `/api/chat`)
- [LM Studio](https://lmstudio.ai) (uses OpenAI-compatible `/v1/chat/completions`)
- Any OpenAI-compatible API (vLLM, llama.cpp server, etc.)

### Model Requirements

- **Context window: 8192 tokens minimum, 16384+ recommended.** The tool sends a ~500 token system prompt plus the full file content, and the model must generate a structured JSON response. Files that exceed the context window will error and be skipped.
- **JSON output quality matters.** The model must reliably produce valid JSON. Larger models (30B+) perform significantly better at this than smaller ones.
- Default model: `qwen3:32b-q5_K_M` (Ollama). Override with `--model`.

### Performance Notes

- Each file requires **3 LLM calls minimum** (two analysis passes + one quorum judge), plus retries on disagreement. A 100-file repo means 300+ inference calls.
- Default concurrency is 1 (single-GPU safe). For multi-GPU setups, increase with `--concurrency`.
- Large codebases can take hours on consumer hardware. The tool is fully resumable — stop and restart at any time.

## Quick Start

```bash
# Analyze a repository (auto-detects language profiles)
codebase-analyzer analyze /path/to/repo --output ./analysis

# Check progress
codebase-analyzer status ./analysis

# Resume an interrupted run (just re-run the same command)
codebase-analyzer analyze /path/to/repo --output ./analysis
```

## Usage

### Analyze

```bash
# Explicit language profiles
codebase-analyzer analyze /path/to/repo --output ./analysis --profiles python,web,config

# Custom profile file
codebase-analyzer analyze /path/to/repo --output ./analysis --profile-file ./my-project.yaml

# Include all text files
codebase-analyzer analyze /path/to/repo --output ./analysis --all-text-files

# Override model and concurrency
codebase-analyzer analyze /path/to/repo --output ./analysis \
--model qwen3:32b-q5_K_M \
--ollama-url http://localhost:11434 \
--max-retries 3 \
--max-file-size 100000 \
--concurrency 1

# Remote LLM server with authentication
codebase-analyzer analyze /path/to/repo --output ./analysis \
--ollama-url https://your-server.example.com \
--model your-model-name \
--api-token $LLM_API_TOKEN
```

### API Token Management

If your LLM server requires authentication, pass a bearer token via `--api-token` or the `LLM_API_TOKEN` environment variable. Here are some options for managing it securely:

**macOS Keychain (recommended on Mac — encrypted at rest, never in a plaintext file):**
```bash
# Store once
security add-generic-password -a "$USER" -s "llm-api-token" -w "your-token-here"

# Retrieve into env var
export LLM_API_TOKEN=$(security find-generic-password -a "$USER" -s "llm-api-token" -w)

# Or use an alias in ~/.zshrc
alias lm-token='export LLM_API_TOKEN=$(security find-generic-password -a "$USER" -s "llm-api-token" -w)'
```

**1Password / Bitwarden CLI (best for multi-machine setups):**
```bash
export LLM_API_TOKEN=$(op read "op://Private/LLM Server/token") # 1Password
export LLM_API_TOKEN=$(bw get password "llm-api-token") # Bitwarden
```

**direnv (per-project, auto-loads when you `cd` into the project):**
```bash
# .envrc in project root — make sure .envrc is in your .gitignore
export LLM_API_TOKEN="your-token"
```

**Avoid** putting tokens directly in `~/.zshrc` or `~/.bashrc` — they're unencrypted, easy to accidentally commit, and visible to any process that reads your shell config.

### Relationship Mapping

After analysis completes, optionally map cross-file relationships:

```bash
# Via Claude API (automated)
codebase-analyzer relationships ./analysis --api-key $ANTHROPIC_API_KEY

# Export prompt for Claude Code (interactive)
codebase-analyzer relationships ./analysis --export-prompt
```

### Resolve Flagged Files

Files that fail quorum after retries can be resolved with a frontier model:

```bash
# Via Claude API
codebase-analyzer resolve-flagged ./analysis --api-key $ANTHROPIC_API_KEY

# Export for manual review
codebase-analyzer resolve-flagged ./analysis --export-prompt
```

## Output Structure

```
analysis/
files/ # 1:1 markdown files mirroring repo structure
path/to/module.py.md
flagged/ # files that failed quorum (JSON with full history)
path/to/problem.py.json
relationships/ # cross-file dependency maps (if generated)
_index.md
module_map.md
analyzer_state.db # SQLite state for resume capability
run_report.md # summary statistics
```

## Optional Dependencies

The core analysis pipeline requires only Ollama. For automated relationship mapping and flagged file resolution via Claude API:

```bash
pip install "codebase-analyzer[api]"
```

## License

MIT

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/avanrossum/codebase-analyzer

Awesome Lists containing this project

README