diffchunk - A local MCP server that gives LLMs the ability to work with large diff files. Essential for working with large repos.
https://github.com/peteretelej/diffchunk

code-review diff-analysis llm-tools mcp-server


# diffchunk

[![CI](https://github.com/peteretelej/diffchunk/actions/workflows/ci.yml/badge.svg)](https://github.com/peteretelej/diffchunk/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/peteretelej/diffchunk/branch/main/graph/badge.svg)](https://codecov.io/gh/peteretelej/diffchunk)
[![PyPI version](https://img.shields.io/pypi/v/diffchunk.svg)](https://pypi.org/project/diffchunk/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)

MCP server that enables LLMs to navigate large diff files efficiently. Instead of reading entire diffs sequentially, LLMs can jump directly to relevant changes using pattern-based navigation.

## Problem

Large diffs exceed LLM context limits and waste tokens on irrelevant changes. A 50k+ line diff can't be processed directly, and splitting it manually loses the relationships between files.

## Solution

MCP server with 4 navigation tools:

- `load_diff` - Parse diff file with custom settings (optional)
- `list_chunks` - Show chunk overview with file mappings (auto-loads)
- `get_chunk` - Retrieve specific chunk content (auto-loads)
- `find_chunks_for_files` - Locate chunks by file patterns (auto-loads)

## Setup

**Prerequisite:** Install [uv](https://docs.astral.sh/uv/getting-started/installation/) (an extremely fast Python package manager), which provides the `uvx` command.

Add to your MCP client configuration:

```json
{
  "mcpServers": {
    "diffchunk": {
      "command": "uvx",
      "args": ["--from", "diffchunk", "diffchunk-mcp"]
    }
  }
}
```
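The configuration above relies on `uvx` being discoverable on the `PATH` of the process that launches the MCP server. A quick check before editing the client configuration can save a confusing startup failure. The following is a minimal sketch using only the Python standard library; the helper name `uvx_available` is hypothetical and not part of diffchunk.

```python
import shutil


def uvx_available(command: str = "uvx") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    if uvx_available():
        print("uvx found on PATH")
    else:
        print("uvx not found; install uv first")
```

Note that some MCP clients start servers with a different environment than your shell, so a passing check in a terminal does not guarantee the client sees the same `PATH`.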

## Usage

Your AI assistant can now handle massive changesets that previously caused failures in Cline, Roocode, Cursor, and other tools.

### Using with AI Assistant

Once configured, your AI assistant can analyze large commits, branches, or diffs using diffchunk.

Here are some example use cases:

**Branch comparisons:**

- _"Review all changes in develop not in the main branch for any bugs"_
- _"Tell me about all the changes I have yet to merge"_
- _"What new features were added to the staging branch?"_
- _"Summarize all changes to this repo in the last 2 weeks"_

**Code review:**

- _"Use diffchunk to check my feature branch for security vulnerabilities"_
- _"Use diffchunk to find any breaking changes before I merge to production"_
- _"Use diffchunk to review this large refactor for potential issues"_

**Change analysis:**

- _"Use diffchunk to show me all database migrations that need to be run"_
- _"Use diffchunk to find what API changes might affect our mobile app"_
- _"Use diffchunk to analyze all new dependencies added recently"_

**Direct file analysis:**

- _"Use diffchunk to analyze the diff at /tmp/changes.diff and find any bugs"_
- _"Create a diff of my uncommitted changes and review it"_
- _"Compare my local branch with origin and highlight conflicts"_

### Tip: AI Assistant Rules

Add to your AI assistant's custom instructions for automatic usage:

```
When reviewing large changesets or git commits, use diffchunk to handle large diff files.
Create temporary diff files and tracking files as needed and clean up after analysis.
```

## How It Works

When you ask your AI assistant to analyze changes, it uses diffchunk's tools strategically:

1. **Creates the diff file** (e.g., `git diff main..develop > /tmp/changes.diff`) based on your question
2. **Uses `list_chunks`** to get an overview of the diff structure and total scope
3. **Uses `find_chunks_for_files`** to locate relevant sections when you ask about specific file types
4. **Uses `get_chunk`** to examine specific sections without loading the entire diff into context
5. **Tracks progress systematically** through large changesets, analyzing chunk by chunk
6. **Cleans up temporary files** after completing the analysis

This lets your AI assistant work through diffs that are too large to fit in a single context window, analyzing them chunk by chunk without losing track of the overall changeset.
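Steps 1 and 6 (creating the diff file and cleaning it up) can be sketched in Python as follows. This is only an illustration of what an assistant or a wrapper script might do, not code from diffchunk; the function names `build_diff_command` and `write_diff_file` are hypothetical, and the sketch assumes `git` is installed and the working directory is a repository.

```python
import os
import subprocess
import tempfile


def build_diff_command(base: str, head: str) -> list[str]:
    """Build the git command from step 1, e.g. `git diff main..develop`."""
    return ["git", "diff", f"{base}..{head}"]


def write_diff_file(base: str, head: str, repo_dir: str = ".") -> str:
    """Run git diff and write its output to a temporary file.

    Returns the absolute path of the file; the caller removes it when done.
    """
    fd, path = tempfile.mkstemp(suffix=".diff")
    try:
        with os.fdopen(fd, "wb") as out:
            subprocess.run(
                build_diff_command(base, head),
                cwd=repo_dir,
                stdout=out,
                check=True,  # raise if git exits with an error
            )
    except Exception:
        os.remove(path)  # do not leave a partial file behind
        raise
    return path
```

A caller would typically wrap the analysis in `try`/`finally` and call `os.remove(path)` at the end, which corresponds to step 6. Because `tempfile.mkstemp` returns an absolute path, the result can be passed to the tools directly.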

### Tool Usage Patterns

**Overview first:**

```python
list_chunks("/tmp/changes.diff")
# → 5 chunks across 12 files, 3,847 total lines
```

**Target specific files:**

```python
find_chunks_for_files("/tmp/changes.diff", "*.py")
# → [1, 3, 5] - Python file chunks

get_chunk("/tmp/changes.diff", 1)
# → Content of first Python chunk
```

**Systematic analysis:**

```python
# Process each chunk in sequence
get_chunk("/tmp/changes.diff", 1)
get_chunk("/tmp/changes.diff", 2)
# ... continue through all chunks
```

## Configuration

### Path Requirements

- **Absolute paths only**: `/home/user/project/changes.diff`
- **Cross-platform**: Windows (`C:\path`) and Unix (`/path`)
- **Home expansion**: `~/project/changes.diff` is expanded to an absolute path
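The path rules above can be illustrated with a small validation helper. This is a sketch under assumptions, not diffchunk's actual validation logic; `normalize_diff_path` is a hypothetical name, and the check deliberately accepts both Unix-style and Windows-style absolute paths regardless of the host OS.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def normalize_diff_path(raw: str) -> str:
    """Expand a leading ~ and reject paths that are not absolute."""
    expanded = os.path.expanduser(raw)
    # Accept either Unix-style (/path) or Windows-style (C:\path) absolute paths.
    is_absolute = (
        PurePosixPath(expanded).is_absolute()
        or PureWindowsPath(expanded).is_absolute()
    )
    if not is_absolute:
        raise ValueError(f"expected an absolute path, got: {raw!r}")
    return expanded
```

With this helper, `changes.diff` and `./changes.diff` are rejected, while `/home/user/project/changes.diff` and `C:\path\changes.diff` pass through unchanged.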

### Auto-Loading Defaults

Tools auto-load with optimized settings:

- `max_chunk_lines`: 1000
- `skip_trivial`: true (skips whitespace-only changes)
- `skip_generated`: true (lock files, build artifacts)
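To make the effect of `max_chunk_lines` and `skip_generated` concrete, here is a minimal sketch of line-budgeted chunking at file boundaries. It is not diffchunk's implementation: the function names, the `GENERATED_SUFFIXES` list, and the decision to keep each file whole are all assumptions made for illustration, and it only handles `diff --git` headers.

```python
# Assumption: a tiny illustrative list, not diffchunk's real notion of "generated".
GENERATED_SUFFIXES = (".lock", "package-lock.json")


def split_into_files(diff_text: str) -> list[tuple[str, list[str]]]:
    """Split git diff text into (path, lines) sections, one per `diff --git` header."""
    sections: list[tuple[str, list[str]]] = []
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # Header looks like: diff --git a/path b/path
            path = line.rsplit(" b/", 1)[-1]
            sections.append((path, [line]))
        elif sections:
            sections[-1][1].append(line)
    return sections


def chunk_diff(
    diff_text: str, max_chunk_lines: int = 1000, skip_generated: bool = True
) -> list[dict]:
    """Group whole-file sections into chunks of at most max_chunk_lines lines.

    A single file larger than the budget still gets a chunk of its own here;
    a fuller implementation might split it further at hunk boundaries.
    """
    chunks: list[dict] = []
    current: dict = {"files": [], "lines": []}
    for path, lines in split_into_files(diff_text):
        if skip_generated and path.endswith(GENERATED_SUFFIXES):
            continue
        # Start a new chunk when adding this file would exceed the budget.
        if current["lines"] and len(current["lines"]) + len(lines) > max_chunk_lines:
            chunks.append(current)
            current = {"files": [], "lines": []}
        current["files"].append(path)
        current["lines"].extend(lines)
    if current["lines"]:
        chunks.append(current)
    return chunks
```

Keeping each file's changes together in one chunk is what lets a chunk overview map files to chunk numbers, at the cost of occasionally exceeding the line budget for a very large single file.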

### Custom Settings

Use `load_diff` for non-default behavior:

```python
load_diff(
    "/tmp/large.diff",
    max_chunk_lines=2000,
    include_patterns="*.py,*.js",
    exclude_patterns="*test*"
)
```

## Supported Formats

- Git diff output (`git diff`, `git show`)
- Unified diff format (`diff -u`)
- Multiple files in single diff
- Binary file change indicators

## Performance

- Efficiently handles 100k+ line diffs
- Memory-efficient streaming
- Auto-reload on file changes
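One common way to combine streaming reads with reload-on-change is to cache parsed results keyed by path and invalidate them when the file's modification time or size changes. The sketch below illustrates that general technique only; it is an assumption about how such behavior could work, not a description of diffchunk's internals, and `DiffCache` and `count_lines` are hypothetical names.

```python
import os


def count_lines(path: str) -> int:
    """Stream the file line by line instead of reading it all into memory."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


class DiffCache:
    """Cache parsed results per path and re-parse when the file changes on disk."""

    def __init__(self, parse):
        self._parse = parse   # callable: path -> parsed result
        self._entries = {}    # path -> (signature, parsed result)

    @staticmethod
    def _signature(path: str) -> tuple[int, int]:
        stat = os.stat(path)
        return (stat.st_mtime_ns, stat.st_size)

    def get(self, path: str):
        signature = self._signature(path)
        cached = self._entries.get(path)
        if cached is None or cached[0] != signature:
            # First access, or the file changed since it was last parsed.
            cached = (signature, self._parse(path))
            self._entries[path] = cached
        return cached[1]
```

For example, `DiffCache(count_lines).get("/tmp/changes.diff")` parses the file on first use and only parses it again after the file is rewritten.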

## Documentation

- [Design](docs/design.md) - Architecture and implementation details
- [Contributing](docs/CONTRIBUTING.md) - Development setup and workflows

## License

[MIT](./LICENSE)