Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ayodimeji1/ai_rag
https://github.com/ayodimeji1/ai_rag
artificial-intelligence chromadb langchain large-language-models llm openai python retrieval-augmented-generation
Last synced: about 1 month ago
JSON representation
- Host: GitHub
- URL: https://github.com/ayodimeji1/ai_rag
- Owner: Ayodimeji1
- Created: 2024-11-19T06:32:10.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-19T06:45:22.000Z (about 2 months ago)
- Last Synced: 2024-11-19T07:37:39.427Z (about 2 months ago)
- Topics: artificial-intelligence, chromadb, langchain, large-language-models, llm, openai, python, retrieval-augmented-generation
- Language: Python
- Homepage:
- Size: 3.72 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# RAG Framework for Query Processing
## Table of Contents
- [Introduction](#introduction)
- [Project Structure](#project-structure)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Dependencies](#dependencies)
- [Configuration](#configuration)
- [Contributing](#contributing)
- [License](#license)---
## Introduction
This project provides a collection of tools for implementing and testing various Retrieval-Augmented Generation (RAG) techniques to process and enhance queries for document retrieval tasks. The framework is organized into three subprojects, each targeting a specific approach for query management:
1. **Simple Query**: Basic query-based document retrieval.
2. **Augmenting the Query**: Enhancing queries by generating related sub-queries.
3. **Multi-query**: Using multiple queries to combine information for improved retrieval and analysis.## Project Structure
```
project-root/
│
├── simple_query/ # Basic RAG implementation
│ ├── app.py
│ └── README.md # Detailed explanation for Simple Query
│
├── augmenting_query/ # Augmented query RAG implementation
│ ├── query_aug.py
| ├── helper_utils.py
│ └── README.md # Documentation for Augmenting the Query
│
├── multi_query/
| ├── multi_query.py # Multi-query RAG implementation
│ ├── reranking.py
| ├── helper_utils.py
| ├── dpr_technique.py
│ └── README.md # Documentation for Multi-query
│
├── data/ # Folder for storing PDF documents
│
└── README.md # Main project documentation (this file)
```## Features
- **Simple Query**:
- Basic retrieval of documents based on single query input.
- Embeds text and retrieves the most relevant results from ChromaDB.
- **Augmented Query**:
- Enhances original queries with additional contextually relevant sub-queries using OpenAI GPT models.
- Retrieves and ranks documents based on augmented queries.
- **Multi-query**:
- Combines multiple queries (original and generated) to maximize information coverage.
- Reranks results using relevance scoring with cross-encoders.
- Supports embedding visualization and multi-query analysis.## Installation
1. Clone the repository:
```bash
git clone [https://github.com/your-repo/rag-framework.git](https://github.com/Ayodimeji1/AI_RAG.git)
cd rag-framework
```2. Install dependencies:
```bash
pip install -r requirements.txt
```3. Set up environment variables:
- Create a `.env` file in the root directory.
- Add your OpenAI API key:
```
OPENAI_API_KEY=your-api-key
```## Usage
### Workflow
1. **Simple Query**:
- Navigate to the `RAG1.0/` folder and run:
```bash
python app.py
```
- This module retrieves documents based on a single input query.2. **Augmenting the Query**:
- Navigate to the `RAG2.0/` folder and run:
```bash
python query_aug.py
```
- This module generates and executes augmented queries for improved retrieval.3. **Multi-query**:
- Navigate to the `RAG3.0/` folder and run:
```bash
python multi_query.py
```
- This module combines multiple queries, deduplicates results, and reranks them based on relevance scores.## Dependencies
The project relies on the following libraries:
- `openai`
- `chromadb`
- `umap-learn`
- `pandas`
- `numpy`
- `langchain`
- `sentence-transformers`
- `matplotlib`
- `python-dotenv`
- `pypdf`## Configuration
- **Environment Variables**: Store sensitive credentials like OpenAI API keys in a `.env` file.
- **Data Storage**: Place all PDF files to be processed in the `data/` folder.## Contributing
Contributions are welcome! Please open issues or submit pull requests to enhance the framework.## License
This project is licensed under the [MIT License](LICENSE).---