# BetterRAG

> 🚀 Supercharge your RAG pipeline with optimized text chunking



[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)
[![MongoDB](https://img.shields.io/badge/MongoDB-4EA94B?logo=mongodb&logoColor=white)](https://www.mongodb.com/)
[![Dashboard](https://img.shields.io/badge/Dash-Interactive-blue?logo=plotly&logoColor=white)](https://dash.plotly.com/)

## ✨ Overview

**BetterRAG** helps you find the optimal text chunking strategy for your Retrieval-Augmented Generation pipeline through rigorous, data-driven evaluation. Stop guessing which chunking method works best; measure it!




**📊 Compare Strategies** · **⚙️ Zero-Code Configuration** · **📈 Interactive Dashboard**


## 🔎 Why BetterRAG?

Text chunking can make or break your RAG system's performance. Different strategies yield dramatically different results, but the optimal approach depends on your specific documents and use case. BetterRAG provides:

- **Quantitative comparison** between chunking strategies
- **Visualized metrics** to understand performance differences
- **Clear recommendations** based on real data
- **No coding required** to evaluate and improve your pipeline

## 🛠️ Features

### 🧩 Multiple Chunking Strategies

- **Fixed-size chunking**: Simple token-based splitting
- **Recursive chunking**: Follows document hierarchy
- **Semantic chunking**: Preserves meaning and context

*(A minimal sketch of fixed-size chunking follows the feature list.)*

### 🤖 LLM Integration

- Azure OpenAI compatibility
- Google Gemini support
- Extensible for other models

### 📊 Comprehensive Metrics

- Context precision
- Token efficiency
- Answer relevance
- Latency measurement

### 💾 Persistent Storage

- MongoDB integration
- Reuse embeddings across evaluations
- Cache results for faster iteration



## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- MongoDB (local or remote)
- API keys for Azure OpenAI and/or Google Gemini

### Installation in 3 Steps

```bash
# 1. Clone the repository
git clone https://github.com/yourusername/betterrag.git
cd betterrag

# 2. Install dependencies
pip install -r requirements.txt

# 3. Set up your configuration
cp config.template.yaml config.yaml
# Edit config.yaml with your API keys and preferences
```

### Running Your First Evaluation

```bash
# Add your documents to data/documents/

# Run the evaluation
python -m app.main

# View the interactive dashboard
# Default: http://127.0.0.1:8050/
```

## 📊 Sample Results

BetterRAG provides clear visual comparisons between chunking strategies:


*(Comparison chart)*

Based on comprehensive metrics, BetterRAG will recommend the most effective chunking approach for your specific documents and queries.

## ⚙️ Configuration Options

BetterRAG uses a single YAML configuration file for all settings:

```yaml
# Chunking strategies to evaluate
chunking:
  fixed_size:
    enabled: true
    chunk_size: 500
    chunk_overlap: 50

  recursive:
    enabled: true
    chunk_size: 1000
    separators: ["\n\n", "\n", " ", ""]

  semantic:
    enabled: true
    model: "all-MiniLM-L6-v2"

# API credentials (or use environment variables)
api:
  azure_openai:
    api_key: ${AZURE_OPENAI_API_KEY}
    endpoint: ${AZURE_OPENAI_ENDPOINT}
```

See [config_setup.md](config_setup.md) for detailed configuration instructions.
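
The `${AZURE_OPENAI_API_KEY}`-style placeholders are meant to be filled from environment variables. As a rough sketch of one way to do that (not BetterRAG's actual loader, and assuming PyYAML is installed), the file can be read, expanded, and then parsed:

```python
# Hypothetical config loader: expands ${VAR} placeholders from the environment
# before parsing the YAML. BetterRAG's real loading code may differ.
import os

import yaml  # pip install pyyaml


def load_config(path: str = "config.yaml") -> dict:
    with open(path, "r", encoding="utf-8") as handle:
        raw = handle.read()
    return yaml.safe_load(os.path.expandvars(raw))


if __name__ == "__main__":
    config = load_config()
    print(config["chunking"]["fixed_size"]["chunk_size"])  # 500 with the sample above
```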

## 🔧 Advanced Usage

```bash
# Run dashboard only (using previously processed data)
python -m app.main --dashboard-only

# Reset database before processing
python -m app.main --reset-db

# Use custom config file
python -m app.main --config my_custom_config.yaml
```

## 🛠️ Extending BetterRAG

### Adding a New Chunking Strategy

1. Create a new chunker implementation in `app/chunkers/` (see the sketch after this list)
2. Register it in `app/chunkers/__init__.py`
3. Add configuration parameters in `config.yaml`
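
As a rough illustration of step 1, a new chunker might look like the sketch below. The class name, constructor parameter, and `chunk` method are assumptions made for this example; check the existing implementations in `app/chunkers/` for the interface the project actually expects.

```python
# app/chunkers/sentence_chunker.py -- hypothetical example, not part of BetterRAG
import re
from typing import List


class SentenceChunker:
    """Groups whole sentences until a rough character budget is reached."""

    def __init__(self, max_chars: int = 1000):
        self.max_chars = max_chars

    def chunk(self, text: str) -> List[str]:
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        chunks, current = [], ""
        for sentence in sentences:
            if current and len(current) + len(sentence) + 1 > self.max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = f"{current} {sentence}".strip()
        if current:
            chunks.append(current)
        return chunks
```

Steps 2 and 3 then mirror the built-in strategies: export the new class from `app/chunkers/__init__.py` and give it an `enabled` flag plus its parameters in `config.yaml`.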

### Custom Metrics

Extend the `ChunkingEvaluator` class in `app/evaluation/metrics.py` to add new metrics.
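
For example, a subclass could add a simple extra measurement alongside the built-in metrics. The method name and inputs below are assumptions for illustration; consult `app/evaluation/metrics.py` for the interface `ChunkingEvaluator` actually exposes.

```python
# Hypothetical extension; the real ChunkingEvaluator interface may differ.
from typing import List

from app.evaluation.metrics import ChunkingEvaluator


class ExtendedEvaluator(ChunkingEvaluator):
    def average_chunk_length(self, chunks: List[str]) -> float:
        """Extra metric: mean chunk length in characters."""
        if not chunks:
            return 0.0
        return sum(len(chunk) for chunk in chunks) / len(chunks)
```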

## 🤝 Contributing

Contributions are welcome! Feel free to:

- Report bugs and issues
- Suggest new features or enhancements
- Add support for additional LLM providers
- Implement new chunking strategies

## 📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

---


Built with ❤️ for the RAG community

Report Bug · Request Feature