https://github.com/peteretelej/diffchunk
diffchunk - A local MCP server that gives LLMs the ability to work with large diff files. Essential for working with large repos.
code-review diff-analysis llm-tools mcp-server
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/peteretelej/diffchunk
- Owner: peteretelej
- License: mit
- Created: 2025-06-26T11:48:31.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-07-19T13:01:52.000Z (5 months ago)
- Last Synced: 2025-09-30T12:33:30.000Z (3 months ago)
- Topics: code-review, diff-analysis, llm-tools, mcp-server
- Language: Python
- Homepage:
- Size: 5.74 MB
- Stars: 8
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE
README
# diffchunk
MCP server that enables LLMs to navigate large diff files efficiently. Instead of reading entire diffs sequentially, LLMs can jump directly to relevant changes using pattern-based navigation.
## Problem
Large diffs exceed LLM context limits and waste tokens on irrelevant changes. A 50k+ line diff can't be processed directly, and splitting it manually loses the relationships between files.
## Solution
MCP server with 4 navigation tools:
- `load_diff` - Parse diff file with custom settings (optional)
- `list_chunks` - Show chunk overview with file mappings (auto-loads)
- `get_chunk` - Retrieve specific chunk content (auto-loads)
- `find_chunks_for_files` - Locate chunks by file patterns (auto-loads)
## Setup
**Prerequisite:** Install [uv](https://docs.astral.sh/uv/getting-started/installation/) (an extremely fast Python package manager), which provides the `uvx` command.
Add to your MCP client configuration:
```json
{
  "mcpServers": {
    "diffchunk": {
      "command": "uvx",
      "args": ["--from", "diffchunk", "diffchunk-mcp"]
    }
  }
}
```
## Usage
Your AI assistant can now handle massive changesets that previously caused failures in Cline, Roocode, Cursor, and other tools.
### Using with AI Assistant
Once configured, your AI assistant can analyze large commits, branches, or diffs using diffchunk.
Here are some example use cases:
**Branch comparisons:**
- _"Review all changes in develop not in the main branch for any bugs"_
- _"Tell me about all the changes I have yet to merge"_
- _"What new features were added to the staging branch?"_
- _"Summarize all changes to this repo in the last 2 weeks"_
**Code review:**
- _"Use diffchunk to check my feature branch for security vulnerabilities"_
- _"Use diffchunk to find any breaking changes before I merge to production"_
- _"Use diffchunk to review this large refactor for potential issues"_
**Change analysis:**
- _"Use diffchunk to show me all database migrations that need to be run"_
- _"Use diffchunk to find what API changes might affect our mobile app"_
- _"Use diffchunk to analyze all new dependencies added recently"_
**Direct file analysis:**
- _"Use diffchunk to analyze the diff at /tmp/changes.diff and find any bugs"_
- _"Create a diff of my uncommitted changes and review it"_
- _"Compare my local branch with origin and highlight conflicts"_
### Tip: AI Assistant Rules
Add to your AI assistant's custom instructions for automatic usage:
```
When reviewing large changesets or git commits, use diffchunk to handle large diff files.
Create temporary diff files and tracking files as needed and clean up after analysis.
```
## How It Works
When you ask your AI assistant to analyze changes, it uses diffchunk's tools strategically:
1. **Creates the diff file** (e.g., `git diff main..develop > /tmp/changes.diff`) based on your question
2. **Uses `list_chunks`** to get an overview of the diff structure and total scope
3. **Uses `find_chunks_for_files`** to locate relevant sections when you ask about specific file types
4. **Uses `get_chunk`** to examine specific sections without loading the entire diff into context
5. **Tracks progress systematically** through large changesets, analyzing chunk by chunk
6. **Cleans up temporary files** after completing the analysis
This lets your AI assistant work through massive diffs that would otherwise exceed its context limits, while still providing thorough analysis without losing context.
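Steps 1 and 6 above can be sketched in Python. This only illustrates what an assistant might do around diffchunk; it is not part of diffchunk itself, and the helper names `diff_command` and `temporary_diff` are hypothetical.

```python
import os
import subprocess
import tempfile
from contextlib import contextmanager
from typing import Iterator


def diff_command(base: str, head: str) -> list[str]:
    """Build the git command for comparing two refs (hypothetical helper)."""
    return ["git", "diff", f"{base}..{head}"]


@contextmanager
def temporary_diff(repo_dir: str, base: str, head: str) -> Iterator[str]:
    """Write `git diff base..head` to a temp file and delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=".diff")  # mkstemp returns an absolute path
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            # git writes straight to the file, so the diff never sits in memory
            subprocess.run(diff_command(base, head), cwd=repo_dir, stdout=fh, check=True)
        yield path
    finally:
        os.remove(path)  # step 6: clean up the temporary file
```

Inside a `with temporary_diff(repo, "main", "develop") as path:` block, `path` is what would be passed to `list_chunks` and `get_chunk`.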
### Tool Usage Patterns
**Overview first:**
```python
list_chunks("/tmp/changes.diff")
# → 5 chunks across 12 files, 3,847 total lines
```
**Target specific files:**
```python
find_chunks_for_files("/tmp/changes.diff", "*.py")
# → [1, 3, 5] - Python file chunks
get_chunk("/tmp/changes.diff", 1)
# → Content of first Python chunk
```
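To make the pattern lookup concrete, here is a minimal sketch of glob matching over a chunk-to-files index. The index shape and the function name `chunks_matching` are assumptions for illustration; diffchunk's internal data structures may differ.

```python
from fnmatch import fnmatchcase


def chunks_matching(chunk_files: dict[int, list[str]], pattern: str) -> list[int]:
    """Return the chunk numbers that contain at least one file matching the glob."""
    return sorted(
        number
        for number, files in chunk_files.items()
        if any(fnmatchcase(path, pattern) for path in files)
    )


# Hypothetical index: chunk number -> files touched in that chunk
index = {
    1: ["src/app.py"],
    2: ["README.md"],
    3: ["tests/test_app.py", "setup.cfg"],
}
print(chunks_matching(index, "*.py"))
# → [1, 3]
```

Note that in `fnmatch`-style globs `*` also matches `/`, so `*.py` matches files in any directory.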
**Systematic analysis:**
```python
# Process each chunk in sequence
get_chunk("/tmp/changes.diff", 1)
get_chunk("/tmp/changes.diff", 2)
# ... continue through all chunks
```
## Configuration
### Path Requirements
- **Absolute paths only**: `/home/user/project/changes.diff`
- **Cross-platform**: Windows (`C:\path`) and Unix (`/path`) paths are both accepted
- **Home expansion**: `~/project/changes.diff` (a leading `~` expands to your home directory)
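The path rules above could be enforced along these lines. This is a sketch built on the standard library, not diffchunk's actual validation code; `resolve_diff_path` is a hypothetical name.

```python
import os


def resolve_diff_path(raw: str) -> str:
    """Expand a leading `~`, then require the result to be an absolute path."""
    expanded = os.path.expanduser(raw)
    if not os.path.isabs(expanded):
        raise ValueError(f"diff path must be absolute: {raw!r}")
    return os.path.normpath(expanded)
```

Expanding `~` before the absolute-path check is what lets `~/project/changes.diff` pass while a bare `changes.diff` is rejected.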
### Auto-Loading Defaults
Tools auto-load with optimized settings:
- `max_chunk_lines`: 1000
- `skip_trivial`: true (whitespace-only)
- `skip_generated`: true (lock files, build artifacts)
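To show what a `max_chunk_lines` limit means in practice, the sketch below packs whole per-file sections into chunks greedily. It is an assumed strategy for illustration only (diffchunk's real algorithm is described in its design doc), and `split_into_chunks` is a hypothetical name.

```python
def split_into_chunks(diff_text: str, max_chunk_lines: int = 1000) -> list[list[str]]:
    """Group per-file diff sections into chunks of at most max_chunk_lines lines."""
    # Start a new section at every "diff --git" header so files stay intact.
    sections: list[list[str]] = []
    for line in diff_text.splitlines():
        if line.startswith("diff --git ") or not sections:
            sections.append([])
        sections[-1].append(line)

    # Greedily append sections to the current chunk while they still fit.
    chunks: list[list[str]] = []
    for section in sections:
        if chunks and len(chunks[-1]) + len(section) <= max_chunk_lines:
            chunks[-1].extend(section)
        else:
            chunks.append(list(section))
    return chunks
```

In this sketch a single file section longer than the limit becomes one oversized chunk rather than being split mid-file.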
### Custom Settings
Use `load_diff` for non-default behavior:
```python
load_diff(
    "/tmp/large.diff",
    max_chunk_lines=2000,
    include_patterns="*.py,*.js",
    exclude_patterns="*test*"
)
```
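One plausible reading of `include_patterns` and `exclude_patterns` is a pair of comma-separated glob lists, where a file must match some include pattern and no exclude pattern. The sketch below implements that reading; the exact semantics in diffchunk are an assumption here, and `file_selected` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Optional


def _split(patterns: Optional[str]) -> list[str]:
    """Turn "*.py, *.js" into ["*.py", "*.js"]; None or "" gives an empty list."""
    return [p.strip() for p in patterns.split(",") if p.strip()] if patterns else []


def file_selected(
    path: str,
    include_patterns: Optional[str] = None,
    exclude_patterns: Optional[str] = None,
) -> bool:
    """Keep a file if it matches an include glob (when given) and no exclude glob."""
    includes = _split(include_patterns)
    excludes = _split(exclude_patterns)
    if includes and not any(fnmatchcase(path, p) for p in includes):
        return False
    return not any(fnmatchcase(path, p) for p in excludes)
```

With the settings shown above, `src/app.py` would be kept, while `tests/test_app.py` (excluded by `*test*`) and `README.md` (not included) would be dropped.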
## Supported Formats
- Git diff output (`git diff`, `git show`)
- Unified diff format (`diff -u`)
- Multiple files in single diff
- Binary file change indicators
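The two header styles differ slightly, which the sketch below illustrates by extracting changed file paths from either format. It is a simplified parser written for this explanation, not diffchunk's own; `changed_files` is a hypothetical name.

```python
import re

# Git-style header, e.g. "diff --git a/src/app.py b/src/app.py"
GIT_HEADER = re.compile(r"^diff --git a/(.+) b/(.+)$")


def changed_files(diff_text: str) -> list[str]:
    """List the new-side path of every file in a git or plain unified diff."""
    lines = diff_text.splitlines()
    git_paths = [m.group(2) for m in map(GIT_HEADER.match, lines) if m]
    if git_paths:
        return git_paths  # git diffs, including binary entries, carry this header
    # Plain `diff -u` output: "+++ path<TAB>timestamp"
    return [line[4:].split("\t", 1)[0] for line in lines if line.startswith("+++ ")]
```

Two caveats of this simplification: paths containing ` b/` are ambiguous in the git header, and in plain unified diffs an added line whose content begins with `++ ` looks like a `+++ ` header. A robust parser would track hunk boundaries to rule those out.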
## Performance
- Efficiently handles 100k+ line diffs
- Memory-efficient streaming
- Auto-reload on file changes
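Streaming and auto-reload can be combined by caching results against a file fingerprint. The sketch below assumes a fingerprint of modification time plus size; how diffchunk actually detects changes is not stated here, and `DiffCache` is a hypothetical class.

```python
import os


class DiffCache:
    """Cache per-file results and recompute them when the file changes on disk."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[tuple[int, int], int]] = {}

    def line_count(self, path: str) -> int:
        stat = os.stat(path)
        fingerprint = (stat.st_mtime_ns, stat.st_size)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]  # file unchanged: reuse the cached result
        # Iterate line by line so the whole diff is never held in memory.
        with open(path, encoding="utf-8", errors="replace") as fh:
            count = sum(1 for _ in fh)
        self._entries[path] = (fingerprint, count)
        return count
```

Line counting stands in for real parsing here; the same fingerprint check would guard any more expensive per-file work.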
## Documentation
- [Design](docs/design.md) - Architecture and implementation details
- [Contributing](docs/CONTRIBUTING.md) - Development setup and workflows
## License
[MIT](./LICENSE)