
# RAG Evaluator

## Overview

RAG Evaluator is a Python library for evaluating Retrieval-Augmented Generation (RAG) systems. It provides a suite of metrics for scoring generated text against a reference, covering n-gram overlap (BLEU, ROUGE-1), semantic similarity (BERT Score), fluency (perplexity), lexical diversity, and bias detection.

## Installation

You can install the library using pip:

```bash
pip install rag-evaluator
```

## Usage

Here's how to use the RAG Evaluator library:

```python
from rag_evaluator import RAGEvaluator

# Initialize the evaluator
evaluator = RAGEvaluator()

# Input data
question = "What are the causes of climate change?"
response = "Climate change is caused by human activities."
reference = "Human activities such as burning fossil fuels cause climate change."

# Evaluate the response
metrics = evaluator.evaluate_all(question, response, reference)

# Print the results
print(metrics)
```
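`evaluate_all` bundles every metric into a single result. Assuming the returned `metrics` object behaves like a mapping from metric names to scores (an assumption; the exact keys and value types depend on the installed version of the library), you can report each score individually:

```python
# Assumption: `metrics` is a mapping of metric name -> score.
# The exact keys depend on the installed rag-evaluator version.
for name, score in metrics.items():
    print(f"{name}: {score}")
```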

## Streamlit Web App

To run the web app:

- `cd` into the Streamlit app folder.
- Create a virtual environment and activate it.
- Install the dependencies.
- Launch the app:

```bash
streamlit run app.py
```
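On a Unix-like shell, those steps might look like the sketch below; the directory name `streamlit-app` and the `requirements.txt` filename are assumptions, so substitute the actual names used in this repository:

```bash
cd streamlit-app                  # assumed folder name; use the repo's actual app folder
python -m venv .venv              # create a virtual environment
source .venv/bin/activate         # activate it (on Windows: .venv\Scripts\activate)
pip install -r requirements.txt   # assumed dependency file; install all dependencies
streamlit run app.py              # start the web app
```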

## Metrics

The following metrics are provided by the library:

- **BLEU**: Measures the overlap between the generated output and reference text based on n-grams.
- **ROUGE-1**: Measures the overlap of unigrams between the generated output and reference text.
- **BERT Score**: Evaluates the semantic similarity between the generated output and reference text using BERT embeddings.
- **Perplexity**: Measures how well a language model predicts the text.
- **Diversity**: Measures the uniqueness of bigrams in the generated output.
- **Racial Bias**: Detects the presence of biased language in the generated output.

## Testing

To run the tests, use the following command:

```bash
python -m unittest discover -s rag_evaluator -p "test_*.py"
```