https://github.com/dvideby0/fast_json_repair
Rust based JSON repair library
https://github.com/dvideby0/fast_json_repair
json json-parser llm-tools python
Last synced: 4 months ago
JSON representation
Rust based JSON repair library
- Host: GitHub
- URL: https://github.com/dvideby0/fast_json_repair
- Owner: dvideby0
- License: mit
- Created: 2025-08-27T10:45:19.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-10-23T13:58:57.000Z (8 months ago)
- Last Synced: 2025-12-17T02:35:26.975Z (7 months ago)
- Topics: json, json-parser, llm-tools, python
- Language: Python
- Homepage: https://pypi.org/project/fast-json-repair/
- Size: 106 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# fast_json_repair
[](https://pypi.org/project/fast-json-repair/)
[](https://www.python.org/downloads/)
[](https://opensource.org/licenses/MIT)
A high-performance JSON repair library for Python, powered by Rust. This is a drop-in replacement for [json_repair](https://github.com/mangiucugna/json_repair) with significant performance improvements.
## π Attribution
This library is a **Rust port** of the excellent [json_repair](https://github.com/mangiucugna/json_repair) library created by [Stefano Baccianella](https://github.com/mangiucugna). The original Python implementation is a brilliant solution for fixing malformed JSON from Large Language Models (LLMs), and this port aims to bring the same functionality with improved performance.
**All credit for the original concept, logic, and implementation goes to Stefano Baccianella.** This Rust port maintains API compatibility with the original library while leveraging Rust's performance benefits.
If you find this library useful, please also consider starring the [original json_repair repository](https://github.com/mangiucugna/json_repair).
## Features
- π¦ **Available on PyPI**: `pip install fast-json-repair`
- π **Rust Performance**: Core repair logic implemented in Rust for maximum speed
- π§ **Automatic Repair**: Fixes common JSON errors automatically
- π **Python Compatible**: Works with Python 3.11-3.14
- π **Drop-in Replacement**: Compatible API with the original json_repair library
- β‘ **Fast JSON Parsing**: Uses orjson for JSON parsing operations
## Compatibility with Original json_repair
This is a **drop-in replacement** for the original `json_repair` library with the same API:
**β
Included:**
- `repair_json()` - Main repair function with `return_objects`, `skip_json_loads`, `ensure_ascii`, `indent` parameters
- `loads()` - Convenience function for loading broken JSON directly to Python objects
- All repair capabilities: quotes, literals, commas, brackets, escape sequences, Unicode
**β Not Included:**
- File operations (`load()`, `from_file()`) - Use Python's built-in file handling + `repair_json()`
- CLI tool - Library-only implementation
- Streaming support - Not yet implemented
**Key Differences:**
- π **20x faster average**, up to 110x for large objects with long strings
- π’ Unquoted numbers parsed as numbers (not strings)
- π¦ Uses `orjson` for high-performance JSON operations
## Installation
### Quick Install
```bash
pip install fast-json-repair
```
### Build from Source
Click to expand build instructions
#### Prerequisites
- Python 3.11-3.14
- Rust toolchain (`curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`)
- [uv](https://docs.astral.sh/uv/) (recommended) or pip
#### Quick Start with uv (Recommended)
```bash
# Clone the repository
git clone https://github.com/dvideby0/fast_json_repair.git
cd fast_json_repair
# Run the automated setup script
./setup.sh
```
The setup script will:
- β
Install `uv` and Rust if needed
- β
Create a virtual environment (`.venv`)
- β
Install all dependencies
- β
Build the Rust extension
- β
Verify the installation
#### Manual Build Steps
```bash
# Clone the repository
git clone https://github.com/dvideby0/fast_json_repair.git
cd fast_json_repair
# Option 1: Using uv (fast!)
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv sync
maturin develop --release
# Option 2: Using pip
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install maturin orjson
maturin develop --release
```
## Usage
```python
from fast_json_repair import repair_json, loads
# Fix broken JSON
broken = "{'name': 'John', 'age': 30}" # Single quotes
fixed = repair_json(broken)
print(fixed) # {"age":30,"name":"John"}
# Parse directly to Python object
data = loads("{'key': 'value'}")
print(data) # {'key': 'value'}
# Handle Unicode properly
text = "{'message': 'δ½ ε₯½δΈη'}"
result = repair_json(text, ensure_ascii=False)
print(result) # {"message":"δ½ ε₯½δΈη"}
# Format with indentation
formatted = repair_json("{'a': 1}", indent=2)
```
## What It Repairs
Automatically fixes common JSON formatting issues:
| Issue | Fix |
|-------|-----|
| Single quotes | β Double quotes |
| Unquoted keys | β Quoted keys |
| Python literals (True/False/None) | β JSON (true/false/null) |
| Trailing commas | Removed |
| Missing commas | Added |
| Extra commas | Removed |
| Unclosed brackets/braces | Auto-closed |
| Invalid escape sequences | Fixed |
| Unicode characters | Preserved or escaped (configurable) |
## API Reference
### `repair_json(json_string, **kwargs)`
Repairs invalid JSON and returns valid JSON string.
**Parameters:**
- `json_string` (str): The potentially invalid JSON string to repair
- `return_objects` (bool): If True, return parsed Python object instead of JSON string
- `skip_json_loads` (bool): If True, skip initial validation for better performance
- `ensure_ascii` (bool): If True, escape non-ASCII characters in output
- `indent` (int): Number of spaces for indentation (None for compact output)
**Returns:**
- str or object: Repaired JSON string or parsed Python object
### `loads(json_string, **kwargs)`
Repairs and parses invalid JSON string to Python object.
**Parameters:**
- `json_string` (str): The potentially invalid JSON string to repair and parse
- `**kwargs`: Additional arguments passed to repair_json
**Returns:**
- object: The parsed Python object
## Performance
This Rust-based implementation provides significant performance improvements over the pure Python original.
### Fast Path Optimization
The library automatically uses the fastest path when possible:
**Fast Path (uses `orjson` for serialization):**
- Valid JSON input
- `ensure_ascii=False`
- `indent` is either `None` (compact) or `2`
**Fallback Path (uses stdlib `json`):**
- Valid JSON input with `ensure_ascii=True`
- Valid JSON input with `indent` values other than `None` or `2`
**Repair Path (uses Rust implementation):**
- Any invalid JSON that needs repair
- Always respects `ensure_ascii` and `indent` settings
For maximum performance with valid JSON:
```python
# Fastest - uses orjson throughout
result = repair_json(valid_json, ensure_ascii=False, indent=2)
# Slower - falls back to json.dumps for formatting
result = repair_json(valid_json, ensure_ascii=True) # ASCII escaping
result = repair_json(valid_json, indent=4) # Custom indentation
```
### Benchmark Results
Comprehensive comparison of fast_json_repair vs json_repair across 20 test cases (10 invalid JSON, 10 valid JSON) with both `ensure_ascii` settings:
| Test Case | fast_json_repair (ms) | json_repair (ms) | Speedup |
|-----------|----------------------|------------------|---------|
| **Invalid JSON (needs repair)** | | | |
| Simple quotes (ascii=T) | 0.007 | 0.032 | π 4.7x |
| Simple quotes (ascii=F) | 0.006 | 0.037 | π 5.7x |
| Medium nested (ascii=T) | 0.020 | 0.192 | π 9.6x |
| Medium nested (ascii=F) | 0.019 | 0.197 | π 10.5x |
| Large array 1000 (ascii=T) | 0.246 | 2.273 | π 9.3x |
| Large array 1000 (ascii=F) | 0.237 | 2.162 | π 9.1x |
| Deep nesting 50 (ascii=T) | 0.055 | 0.410 | π 7.5x |
| Deep nesting 50 (ascii=F) | 0.050 | 0.420 | π 8.4x |
| Large object 500 (ascii=T) | 0.404 | 27.339 | π **67.7x** |
| Large object 500 (ascii=F) | 0.408 | 26.436 | π **64.8x** |
| Complex mixed (ascii=T) | 0.033 | 0.408 | π 12.2x |
| Complex mixed (ascii=F) | 0.035 | 0.401 | π 11.4x |
| Very large 5000 (ascii=T) | 29.531 | 580.959 | π **19.7x** |
| Very large 5000 (ascii=F) | 28.526 | 581.489 | π **20.4x** |
| Long strings 10K (ascii=T) | 0.040 | 4.403 | π **110.2x** |
| Long strings 10K (ascii=F) | 0.040 | 4.360 | π **108.7x** |
| **Valid JSON (fast path)** | | | |
| Small ASCII (ascii=T) | 0.003 | 0.004 | π 1.3x |
| Small ASCII (ascii=F) | 0.002 | 0.005 | π 2.9x |
| Nested structure (ascii=T) | 0.007 | 0.008 | π 1.2x |
| Nested structure (ascii=F) | 0.003 | 0.008 | π 2.4x |
| Large array 1000 (ascii=T) | 0.799 | 0.907 | π 1.1x |
| Large array 1000 (ascii=F) | 0.421 | 0.903 | π 2.1x |
| Large object 500 (ascii=T) | 0.506 | 0.590 | π 1.2x |
| Large object 500 (ascii=F) | 0.281 | 0.571 | π 2.0x |
**Overall: 19.7x faster** across all test cases
**Key Insights:**
- π = fast_json_repair is faster (all test cases)
- **Invalid JSON repair**: 5-110x faster
- **Valid JSON with ensure_ascii=False**: 2-3x faster (uses orjson fast path)
- **Valid JSON with ensure_ascii=True**: 1.1-1.3x faster
- **Best performance gains**: Long strings (110x), large objects (68x), very large arrays (20x)
### Performance Advantages
- **Large JSON documents**: 10-70x faster for documents with many keys/values
- **Long strings**: Up to 110x faster for documents with large string values
- **Very large arrays**: 20x faster for arrays with thousands of elements
- **Deeply nested structures**: 7-10x faster with consistent performance
- **Memory efficiency**: Lower memory footprint due to Rust's zero-cost abstractions and optimized allocations
Run `python benchmark.py` to test performance on your system. See [PERFORMANCE.md](PERFORMANCE.md) for detailed analysis.
## AWS Deployment
Works seamlessly on AWS with pre-built wheels for all architectures:
- **x86_64** - Standard EC2 instances (t2, t3, m5, c5, etc.)
- **ARM64/aarch64** - Graviton instances (t4g, m6g, c6g, etc.)
```bash
# Install on any AWS instance - pip auto-selects the correct wheel
pip install fast-json-repair
```
For Lambda layers and cross-compilation, see [DEPLOYMENT.md](DEPLOYMENT.md).
## Development
### Quick Reference
| Task | Command | VS Code Task |
|------|---------|--------------|
| **Setup** | `./setup.sh` | - |
| **Build (debug)** | `maturin develop` | π§ Build: Development |
| **Build (release)** | `maturin develop --release` | π Build: Development (Release) |
| **Run tests** | `pytest tests/ -v` | π§ͺ Test: Python (All) |
| **Run benchmarks** | `python benchmark.py` | β‘ Benchmark: Run Full Suite |
| **Format code** | `cargo fmt && black . && isort .` | β¨ Format: All (Rust + Python) |
| **Lint Rust** | `cargo clippy` | π¦ Rust: Clippy |
| **Lint Python** | `ruff check .` | π Python: Lint (Ruff) |
| **Full check** | `maturin develop && pytest && python benchmark.py` | β
Full Check: Build + Test + Benchmark |
### Quick Setup
```bash
# Automated setup (recommended)
./setup.sh
# Or manually with uv
uv venv && source .venv/bin/activate
uv sync
maturin develop
```
### VS Code Integration
This project includes a complete VS Code workspace configuration:
**Getting Started:**
1. Open the project folder in VS Code
2. Install recommended extensions (you'll see a prompt)
3. The Python interpreter will auto-detect `.venv`
4. Press `Cmd+Shift+P` β "Tasks: Run Task" to see all available commands
**Available Tasks:**
- π§ **Build Tasks**: Debug build, release build, wheels, cross-platform builds
- π§ͺ **Test Tasks**: Run all tests, quick tests, coverage reports
- β‘ **Benchmark Tasks**: Full benchmarks, quick benchmarks, save results
- π¦ **Rust Tasks**: Check, clippy, format, clean
- π **Python Tasks**: Format (black), sort imports (isort), lint (ruff)
- π’ **Workflows**: Full check (build+test+benchmark), release prep, quality checks
**Debugging:**
- Press `F5` to debug Python tests
- Set breakpoints in Python code
- Use "Debug: Select and Start Debugging" for specific configs
### Common Commands
See the Quick Reference table above for the most common tasks. Additional commands:
```bash
# Code quality
black . # Format Python code
isort . # Sort Python imports
ruff check . # Lint Python code
# Cross-platform builds (requires zig)
maturin build --release --target x86_64-unknown-linux-gnu --zig
maturin build --release --target aarch64-unknown-linux-gnu --zig
maturin build --release --target universal2-apple-darwin
```
### Project Structure
```
fast_json_repair/
βββ src/
β βββ lib.rs # Rust implementation (core repair logic)
βββ python/
β βββ fast_json_repair/
β βββ __init__.py # Python API wrapper
βββ tests/
β βββ test_all.py # Python test suite
βββ benchmark.py # Performance benchmarks
βββ pyproject.toml # Python package configuration
βββ Cargo.toml # Rust package configuration
βββ .vscode/ # VS Code workspace settings (local)
βββ settings.json # Python/Rust interpreter & formatting
βββ tasks.json # Build/test/benchmark tasks
βββ launch.json # Debug configurations
βββ extensions.json # Recommended extensions
```
### Typical Workflow
1. **Make Changes** - Edit Rust (`src/`) or Python (`python/`) code
2. **Rebuild** - `maturin develop` or VS Code task `π§ Build: Development`
3. **Test** - `pytest tests/ -v` or VS Code task `π§ͺ Test: Python (All)`
4. **Benchmark** - `python benchmark.py` or VS Code task `β‘ Benchmark: Run Full Suite`
5. **Release** - `maturin build --release` when ready to publish
## License
MIT License (same as original json_repair)
## Credits & Acknowledgments
### Original Author
- **[Stefano Baccianella](https://github.com/mangiucugna)** - Creator of the original [json_repair](https://github.com/mangiucugna/json_repair) library
- Original concept and algorithm design
- Python implementation that this library is based on
- Comprehensive test cases and edge case handling
### This Rust Port
- Performance optimization through Rust implementation
- Maintains full API compatibility with the original
- Uses [PyO3](https://pyo3.rs/) for Python bindings
- Uses [orjson](https://github.com/ijl/orjson) for fast JSON parsing
### Special Thanks
A huge thank you to Stefano Baccianella for creating json_repair and making it open source. This library wouldn't exist without the original brilliant implementation that has helped countless developers handle malformed JSON from LLMs.
If you appreciate this performance-focused port, please also show support for the [original json_repair project](https://github.com/mangiucugna/json_repair) that made it all possible.