https://github.com/leo310/rag-chunking-evaluation
Assess the effectiveness of chunking strategies in RAG systems via a custom evaluation framework.
- Host: GitHub
- URL: https://github.com/leo310/rag-chunking-evaluation
- Owner: Leo310
- License: MIT
- Created: 2024-06-30T10:45:23.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-17T09:12:22.000Z (over 1 year ago)
- Last Synced: 2025-09-04T22:08:25.925Z (7 months ago)
- Topics: chunking, evaluation-framework, retrieval, retrieval-augmented-generation
- Language: Jupyter Notebook
- Homepage:
- Size: 4.44 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
# RAG Chunking Evaluation
This repository contains code and datasets for evaluating chunking strategies in Retrieval-Augmented Generation (RAG) systems. The project includes various benchmarks, data loaders, and utility functions to facilitate the evaluation process.
## Setup
1. **Clone this repository**
2. **Create a virtual environment:**
```sh
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
```
3. **Install dependencies:**
```sh
pip install -r requirements.txt
```
4. **Set up environment variables:**
Copy `.env.example` to `.env` and fill in the required values.
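The variables actually expected are listed in `.env.example`; as a minimal sketch, a fail-fast check like the following can be run before the notebooks (the variable name in the commented example is hypothetical, adjust it to whatever `.env.example` lists):

```python
import os

def require_env(*names: str) -> None:
    """Raise early if any expected environment variable is unset or empty."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

# Example (hypothetical variable name -- see .env.example for the real ones):
# require_env("OPENAI_API_KEY")
```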
## Usage
Follow the instructions in the `my_benchmark` notebook to run the proposed chunking evaluation framework. The specific chunking strategies under evaluation are detailed in the `chunking_strategies` notebook.
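For orientation, a chunking strategy in this context is simply a function that splits a document into retrievable text segments. A minimal fixed-size chunker with overlap, purely illustrative and not the repository's implementation:

```python
def chunk_fixed(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character chunks."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap  # advance by size minus overlap each chunk
    return [text[i:i + size] for i in range(0, len(text), step)]
```

The strategies in the `chunking_strategies` notebook vary exactly these kinds of parameters (chunk size, overlap, split boundaries), which is what the benchmark compares.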
Each step in the evaluation pipeline writes its intermediate results to the `data` directory, so they can be reviewed and reloaded later without rerunning earlier steps.
The `experimental` directory includes tests against other benchmarks and evaluation frameworks, such as Ragas, TruLens, and Multi-Hop-RAG.