https://github.com/sandy1990418/deepraghub
- Host: GitHub
- URL: https://github.com/sandy1990418/deepraghub
- Owner: sandy1990418
- License: apache-2.0
- Created: 2024-08-27T05:31:42.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-09-02T06:56:02.000Z (10 months ago)
- Last Synced: 2025-02-12T09:57:31.878Z (4 months ago)
- Topics: embedding, finetune-embedding, rag, reranker
- Language: Python
- Homepage:
- Size: 26.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# DeepRAGHub
DeepRAGHub is a Retrieval-Augmented Generation (RAG) system that combines large language models with efficient document retrieval for enhanced question answering.
## 🌟 Features
- 📚 Flexible document loading and preprocessing
- 🧠 Customizable embedding models
- 🚀 Efficient vector storage and retrieval using Qdrant
- 🤖 Support for various LLM backends (OpenAI, Hugging Face)
- 🔧 Configurable RAG pipeline

## 📁 Project Structure
```bash
DeepRAGHub/
├── data/
│   ├── raw/
│   └── processed/
├── deepraghub/
│   ├── data/
│   ├── embedding/
│   ├── generation/
│   ├── retrieval/
│   ├── rag/
│   └── utils/
│       └── config/
├── tests/
├── main.py
├── requirements.txt
└── README.md
```

## 🛠️ Installation
1. Clone the repository:
```bash
git clone https://github.com/yourusername/DeepRAGHub.git
cd DeepRAGHub
```

2. Install the required dependencies:
```bash
pip install -r requirements.txt
```

## ⚙️ Configuration
The system is highly configurable. Modify the `deepraghub/utils/config/config.yaml` file to adjust settings for:
- Data processing
- Embedding model
- Vector storage
- LLM generation
- RAG pipeline

For detailed configuration options, refer to the `config.yaml` file in the repository.
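As an illustration, a config covering these sections might look like the following sketch. All keys and values here are assumptions for illustration only; consult the actual `config.yaml` in the repository for the real schema:

```yaml
# Hypothetical layout — the real keys live in deepraghub/utils/config/config.yaml.
data:
  raw_dir: data/raw
  processed_dir: data/processed
  chunk_size: 512
embedding:
  model_name: sentence-transformers/all-MiniLM-L6-v2  # example model
  finetune: false
vector_store:
  backend: qdrant
  collection: documents
generation:
  backend: openai        # or huggingface
  model: gpt-4o-mini     # example model
rag:
  max_context_size: 4096
  top_k_docs: 5
```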
## 🚀 Usage
1. Prepare your documents by placing them in the `data/raw` directory.
2. Run the main script:
```bash
python main.py
```

This script will:
- Load and preprocess documents
- Initialize and (optionally) train the embedding model
- Embed documents and store them in the vector database
- Set up the LLM and RAG pipeline
- Run a sample query

For custom usage, you can import the necessary components:
```python
from deepraghub import (load_config, load_documents, chunk_documents,
                        EmbeddingModel, VectorStore, LLMModel, RAGPipeline)

# Load configuration
config = load_config("path/to/your/config.yaml")

# Create the RAG pipeline (llm_model and vector_store are assumed to be
# initialized beforehand, as in main.py)
rag_pipeline = RAGPipeline(llm_model, vector_store,
                           max_context_size=config.rag.max_context_size,
                           top_k_docs=config.rag.top_k_docs)

# Query
answer = rag_pipeline.query("Your question here")
print(answer)
```
## 🏗️ Architecture
DeepRAGHub consists of several key components:
1. **Data Processing**: Handles document loading and chunking.
2. **Embedding**: Converts text chunks into vector representations.
3. **Vector Store**: Efficiently stores and retrieves document embeddings.
4. **LLM Integration**: Interfaces with various language models for text generation.
5. **RAG Pipeline**: Orchestrates the retrieval and generation process.
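The flow between these components can be sketched with toy stand-ins. Every class and method name below is an illustrative assumption, not DeepRAGHub's actual API; the point is the retrieve-then-generate orchestration the RAG pipeline performs:

```python
# Illustrative retrieve-then-generate flow; all names are toy stand-ins.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyVectorStore:
    docs: List[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.docs.append(text)

    def search(self, query: str, top_k: int) -> List[str]:
        # Toy relevance: rank documents by words shared with the query.
        scored = sorted(self.docs,
                        key=lambda d: len(set(d.split()) & set(query.split())),
                        reverse=True)
        return scored[:top_k]


@dataclass
class ToyLLM:
    def generate(self, prompt: str) -> str:
        # A real backend would call OpenAI or Hugging Face here.
        return f"Answer based on: {prompt}"


@dataclass
class ToyRAGPipeline:
    llm: ToyLLM
    store: ToyVectorStore
    top_k_docs: int = 2

    def query(self, question: str) -> str:
        # Retrieve relevant chunks, then ground the LLM prompt in them.
        context = "\n".join(self.store.search(question, self.top_k_docs))
        return self.llm.generate(f"Context:\n{context}\n\nQuestion: {question}")


store = ToyVectorStore()
store.add("Qdrant stores document embeddings")
store.add("Bananas are yellow")
pipeline = ToyRAGPipeline(ToyLLM(), store, top_k_docs=1)
print(pipeline.query("Which component stores document embeddings?"))
```

A real deployment swaps each toy class for its production counterpart (Qdrant for storage, an actual LLM backend for generation) without changing the orchestration.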
## 🔧 Extending the System
- To add new embedding models, extend the `EmbeddingModel` class in `deepraghub/embedding/model.py`.
- For new LLM backends, modify the `LLMFactory` in `deepraghub/generation/llm.py`.
- Custom document loaders can be added to `deepraghub/data/loader.py`.
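As a sketch of the first extension point, a custom embedding model might look like the following. The base-class interface shown here is a hypothetical stand-in; check `deepraghub/embedding/model.py` for the real `EmbeddingModel` contract before subclassing:

```python
# Hypothetical sketch: the stand-in base class below approximates what
# deepraghub/embedding/model.py might expose; the real interface may differ.
from typing import List


class EmbeddingModel:  # stand-in for DeepRAGHub's base class
    def embed(self, texts: List[str]) -> List[List[float]]:
        raise NotImplementedError


class HashEmbeddingModel(EmbeddingModel):
    """Trivial custom model: hashes each word into a fixed-size vector."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def embed(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for word in text.split():
                vec[hash(word) % self.dim] += 1.0
            vectors.append(vec)
        return vectors


model = HashEmbeddingModel(dim=8)
print(model.embed(["hello world"]))
```

A production subclass would replace the hashing with calls to a real encoder (e.g. a sentence-transformers model) while keeping the same `embed` signature.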