# Retrieval-Augmented Generation (RAG)
![Image](https://github.com/user-attachments/assets/c689213c-b822-44bd-bfbb-2cd2a8dffb6a)

## Overview
Retrieval-Augmented Generation (RAG) is an advanced approach that enhances large language models (LLMs) by retrieving relevant information from external knowledge sources before generating responses. This technique improves factual accuracy, reduces hallucinations, and enables dynamic knowledge updates without retraining the model.

## Key Components
1. **Retriever**: Searches and fetches relevant documents from a knowledge base.
2. **Embedder**: Converts text into dense vector representations for efficient retrieval.
3. **Vector Store**: Stores and indexes embeddings for fast similarity search.
4. **Generator (LLM)**: Generates responses based on retrieved documents.
5. **Pipeline**: Integrates all components to process user queries efficiently.

## Implementation Steps
### 1. Environment Setup
```bash
pip install langchain langchain-community transformers faiss-cpu sentence-transformers chromadb
```
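If a CUDA-capable GPU is available, `faiss-gpu` can be installed in place of `faiss-cpu` for GPU-accelerated similarity search.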

### 2. Load and Preprocess Documents
```python
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load documents
doc_loader = TextLoader("data/documents.txt")
documents = doc_loader.load()

# Split text into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
document_chunks = text_splitter.split_documents(documents)
```
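A quick sanity check after splitting helps confirm that the chunking parameters are reasonable for your corpus (`data/documents.txt` above is just the example file used throughout this README):

```python
# Inspect how the splitter chunked the corpus
print(f"Loaded {len(documents)} document(s), split into {len(document_chunks)} chunks")
print(document_chunks[0].page_content[:200])  # preview the first chunk
```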

### 3. Create Embeddings and Store in Vector Database
```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Initialize embedding model
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Store embeddings in FAISS vector database
vector_store = FAISS.from_documents(document_chunks, embedding_model)
```
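Before wiring in the LLM, retrieval can be exercised directly, and the index can be persisted so it is not rebuilt on every run (the query string and the `faiss_index` directory name below are illustrative):

```python
# Raw similarity search against the index
for doc in vector_store.similarity_search("What is RAG?", k=3):
    print(doc.page_content[:100])

# Save the index to disk and reload it later
vector_store.save_local("faiss_index")
vector_store = FAISS.load_local(
    "faiss_index",
    embedding_model,
    allow_dangerous_deserialization=True,  # recent LangChain versions require opting in to pickle-backed loads
)
```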

### 4. Build the RAG Pipeline
```python
from langchain.chains import RetrievalQAWithSourcesChain
from langchain_community.llms import HuggingFacePipeline
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

# Load LLM
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Create pipeline for text generation; the generation length is capped on the
# transformers pipeline itself, since HuggingFacePipeline does not accept
# max_new_tokens as a constructor argument
hf_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=150)
llm = HuggingFacePipeline(pipeline=hf_pipeline)

# Initialize retrieval-based QA system
qa_chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=vector_store.as_retriever())
```
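The retriever returned by `as_retriever()` defaults to plain similarity search over the top 4 chunks. Both settings can be adjusted; the values below are illustrative rather than tuned:

```python
# Use maximal marginal relevance for more diverse context, limited to 3 chunks
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3},
)
qa_chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=retriever)
```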

### 5. Query the RAG System
```python
query = "What is Retrieval-Augmented Generation?"
response = qa_chain.invoke({"question": query})
print("Answer:", response['answer'])
```

## Deployment Considerations
- **Scalability**: Use **Weaviate, Pinecone, or ChromaDB** for large-scale vector storage.
- **Latency Optimization**: Use efficient embedding models such as **BGE-M3**, or approximate-nearest-neighbor index types such as **FAISS HNSW**, to keep retrieval latency low.
- **Fine-Tuning**: Adapt the LLM to domain-specific knowledge.
- **API Integration**: Deploy using **FastAPI** or **Flask** for production use; a minimal FastAPI sketch follows below.
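As a rough sketch of the API integration point, the chain built above can be exposed over HTTP with FastAPI. The route name, request model, and module layout here are assumptions for illustration, not part of this repository:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Question(BaseModel):
    question: str

@app.post("/ask")
def ask(payload: Question):
    # qa_chain is the RetrievalQAWithSourcesChain assembled in step 4
    response = qa_chain.invoke({"question": payload.question})
    return {"answer": response["answer"], "sources": response["sources"]}
```

Assuming this lives in `app.py` alongside the pipeline code, `uvicorn app:app` serves it locally.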

## Conclusion
Retrieval-Augmented Generation significantly improves LLM performance by incorporating external knowledge retrieval. Implementing RAG with LangChain and Hugging Face provides a powerful framework for knowledge-grounded AI applications.

---
### Contact Information
- **Email:** [[email protected]](mailto:[email protected])
- **WhatsApp:** [+8801834363533](https://wa.me/8801834363533)
- **GitHub:** [Md-Emon-Hasan](https://github.com/Md-Emon-Hasan)
- **LinkedIn:** [Md Emon Hasan](https://www.linkedin.com/in/md-emon-hasan)
- **Facebook:** [Md Emon Hasan](https://www.facebook.com/mdemon.hasan2001/)