https://github.com/balavenkatesh3322/rag-tutorial
This project implements a modular Retrieval-Augmented Generation (RAG) pipeline using Python OOP concepts and LangChain.
- Host: GitHub
- URL: https://github.com/balavenkatesh3322/rag-tutorial
- Owner: balavenkatesh3322
- Created: 2025-06-26T01:24:46.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-06-26T01:29:56.000Z (12 months ago)
- Last Synced: 2025-08-18T15:23:45.374Z (10 months ago)
- Topics: faiss, faiss-vector-database, generative-ai, langchain, llm, openai, rag
- Language: Python
- Homepage:
- Size: 5.86 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# RAG System Tutorial using LangChain
This project implements a modular Retrieval-Augmented Generation (RAG) pipeline using Python OOP concepts and LangChain. It walks you through setting up a document-based question-answering (QA) system that uses FAISS for vector storage and an OpenAI LLM to answer questions.
## Project Structure
```
.
├── start_rag_here.py   # Main RAG pipeline implementation
├── sample_data.txt     # Input text file used for ingestion
└── README.md           # Documentation and usage guide
```
## Components
### 1. **DocumentChunker**
Splits large documents into manageable chunks using LangChain's `RecursiveCharacterTextSplitter`.
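`RecursiveCharacterTextSplitter` tries a list of separators in order. By default these are paragraph breaks, line breaks, spaces, and finally single characters. It falls back to a finer separator only when a piece is still longer than `chunk_size`. With LangChain itself, a typical call is `RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_text(text)`; those sizes are illustrative.

The pure-Python sketch below shows the recursive idea. `recursive_split` is a hypothetical, simplified function, not LangChain's implementation, and it leaves out chunk overlap:

```python
def recursive_split(text, chunk_size=100, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Simplified, hypothetical sketch of recursive character splitting:
    try the coarsest separator first and recurse with finer separators
    only on pieces that are still too long. No chunk overlap.
    The separator list must end with "" (hard character cut).
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *finer = separators
    if sep == "":
        # Last resort: cut into fixed-size character slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, tuple(finer)))
            current = ""
    if current:
        chunks.append(current)
    return chunks


text = "para one is here.\n\npara two is a bit longer than the first."
chunks = recursive_split(text, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

In this example, the first paragraph fits in one chunk. The second paragraph is too long, so it is re-split on spaces.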
### 2. **Embedder**
Uses `HuggingFaceEmbeddings` (e.g., `all-MiniLM-L6-v2`) to convert text chunks into dense vectors.
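`HuggingFaceEmbeddings` runs a sentence-transformers model such as `all-MiniLM-L6-v2`, which maps each text to a 384-dimensional vector. That requires downloading the model. The sketch below instead uses a hypothetical `HashingEmbedder` with the same two methods as LangChain embedding classes: `embed_documents` for a batch of chunks and `embed_query` for a single question. It only measures word overlap, so treat it as a placeholder for wiring and offline testing, not as a substitute for a semantic model:

```python
import hashlib
import math
import re


class HashingEmbedder:
    """Hypothetical toy embedder: hashed bag-of-words, L2-normalised.

    Same method names as LangChain embedding classes, but it only
    captures word overlap, not meaning.
    """

    def __init__(self, dim=64):
        self.dim = dim

    def _embed(self, text):
        vec = [0.0] * self.dim
        for token in re.findall(r"\w+", text.lower()):
            # sha256 gives the same bucket on every run (built-in hash() of str is salted per process).
            bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts):
        """Embed a batch of chunks."""
        return [self._embed(t) for t in texts]

    def embed_query(self, text):
        """Embed a single question."""
        return self._embed(text)


embedder = HashingEmbedder(dim=64)
vector = embedder.embed_query("LangChain is a framework")
print(len(vector))
```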
### 3. **VectorDB**
Uses FAISS to index the embeddings and run fast nearest-neighbour search over them.
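An exact FAISS index such as `IndexFlatIP` stores the raw vectors. For each query, it scores every stored vector by inner product and returns the `k` best as scores plus integer ids. With L2-normalised embeddings, that inner product equals cosine similarity.

The hypothetical `FlatIndex` below mimics that brute-force behaviour for a single query in plain Python. Real FAISS works on batches of NumPy `float32` arrays, and it is far faster:

```python
class FlatIndex:
    """Brute-force inner-product index (hypothetical stand-in for faiss.IndexFlatIP)."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []  # a vector's position in this list is its id

    @property
    def ntotal(self):
        return len(self.vectors)

    def add(self, vectors):
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(v)}")
            self.vectors.append([float(x) for x in v])

    def search(self, query, k):
        """Score every stored vector against one query; return (scores, ids) of the top k."""
        scored = [
            (sum(q * x for q, x in zip(query, v)), i)
            for i, v in enumerate(self.vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = scored[:k]
        return [s for s, _ in top], [i for _, i in top]


index = FlatIndex(dim=2)
index.add([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
print(index.search([0.0, 1.0], k=2))
```

Exact search is simple and always returns the true nearest neighbours. FAISS also offers approximate index types for corpora too large to scan in full.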
### 4. **Retriever**
Fetches relevant document chunks using vector similarity search.
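A retriever hides the vector maths behind a query-in, chunks-out interface. It embeds the question with the same function used for the chunks, ranks the stored chunks by similarity, and returns the top `k`. The sketch below is a hypothetical `Retriever` using cosine similarity. Its `keyword_vector` embedder over a five-word vocabulary exists only to make the example runnable offline:

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class Retriever:
    """Hypothetical retriever: embed the query, rank stored chunks, return the top k."""

    def __init__(self, embed_fn, chunks, k=2):
        self.embed_fn = embed_fn  # any callable: str -> list[float]
        self.k = k
        self.chunks = list(chunks)
        self.vectors = [embed_fn(c) for c in self.chunks]

    def retrieve(self, query):
        q = self.embed_fn(query)
        ranked = sorted(
            zip(self.chunks, self.vectors),
            key=lambda pair: cosine(q, pair[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[: self.k]]


# Toy embedding over a fixed vocabulary (hypothetical), just to make the example runnable.
VOCAB = ["faiss", "vector", "langchain", "framework", "llm"]


def keyword_vector(text):
    words = text.lower().replace("?", " ").replace(".", " ").split()
    return [float(words.count(term)) for term in VOCAB]


retriever = Retriever(keyword_vector, [
    "FAISS is a library for vector similarity search.",
    "LangChain is a framework for LLM applications.",
], k=1)
print(retriever.retrieve("Which framework helps build LLM apps?"))
```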
### 5. **RAGPipeline**
Combines a retriever with OpenAI's GPT model to provide answers and source references.
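The answering step follows a fixed sequence:

1. Retrieve chunks for the question.
2. Place them in a prompt as context.
3. Send the prompt to the LLM.
4. Return the answer together with the chunks it was based on.

The sketch below shows that flow with a hypothetical `RAGPipeline` whose `llm` is any function from prompt string to answer string. In the project, an OpenAI GPT model accessed through LangChain plays that role; that call is not reproduced here. The prompt wording is also an assumption:

```python
from dataclasses import dataclass
from typing import Callable, List

PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


@dataclass
class RAGResult:
    answer: str
    sources: List[str]


class RAGPipeline:
    """Hypothetical pipeline: retrieve chunks, build a grounded prompt, ask the LLM."""

    def __init__(self, retrieve: Callable[[str], List[str]], llm: Callable[[str], str]):
        self.retrieve = retrieve  # question -> relevant chunk texts
        self.llm = llm            # prompt -> answer text (e.g. a chat-model call)

    def build_prompt(self, question: str, chunks: List[str]) -> str:
        # Number the chunks so answers can be traced back to their sources.
        context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
        return PROMPT_TEMPLATE.format(context=context, question=question)

    def ask(self, question: str) -> RAGResult:
        chunks = self.retrieve(question)
        answer = self.llm(self.build_prompt(question, chunks))
        return RAGResult(answer=answer, sources=chunks)


def fake_llm(prompt: str) -> str:
    """Offline stand-in for the GPT call."""
    return "A framework for LLM apps." if "[1] LangChain" in prompt else "I don't know."


pipeline = RAGPipeline(
    retrieve=lambda q: ["LangChain is a framework for developing LLM applications."],
    llm=fake_llm,
)
result = pipeline.ask("What is LangChain?")
print(result.answer, result.sources)
```

Passing the retriever and the LLM in as plain callables keeps the pipeline testable without network access.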
### 6. **RAGSystem**
Orchestrates the end-to-end workflow: loading, chunking, embedding, storing, retrieving, and querying.
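End to end, the orchestration runs in two phases. At ingest time, it reads the text, splits it into chunks, embeds each chunk, and stores the vectors. At query time, it embeds the question, pulls the nearest chunks, and hands them to the LLM.

The self-contained sketch below wires those steps together with deliberately simple stand-ins so it runs offline:

- a word-packing chunker
- word-count vectors
- a Python list instead of FAISS
- a fake LLM

The class and method names are hypothetical and do not describe the project's actual `RAGSystem` code:

```python
import math
import re
from collections import Counter
from pathlib import Path


def chunk_text(text, size=200):
    """Greedily pack whole words into chunks of at most `size` characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) > size and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed(text):
    """Toy sparse bag-of-words vector (word -> count); a stand-in for a real embedder."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class RAGSystem:
    """Hypothetical orchestrator: load -> chunk -> embed -> store -> retrieve -> answer."""

    def __init__(self, llm, chunk_size=200, k=2):
        self.llm = llm  # any callable: prompt str -> answer str
        self.chunk_size = chunk_size
        self.k = k
        self.store = []  # (chunk, vector) pairs; an in-memory stand-in for FAISS

    def ingest(self, text):
        for chunk in chunk_text(text, self.chunk_size):
            self.store.append((chunk, embed(chunk)))

    def ingest_file(self, path):
        self.ingest(Path(path).read_text(encoding="utf-8"))

    def query(self, question):
        q = embed(question)
        ranked = sorted(self.store, key=lambda item: cosine(q, item[1]), reverse=True)
        sources = [chunk for chunk, _ in ranked[: self.k]]
        prompt = "Context:\n" + "\n".join(sources) + f"\n\nQuestion: {question}\nAnswer:"
        return self.llm(prompt), sources


def echo_llm(prompt):
    """Fake LLM for offline runs: returns the first context line."""
    return prompt.splitlines()[1]


system = RAGSystem(llm=echo_llm, chunk_size=70, k=1)
system.ingest(
    "FAISS is a library for efficient similarity search of dense vectors. "
    "LangChain is a framework for developing applications powered by language models."
)
answer, sources = system.query("What is LangChain?")
print(answer)
```

To ingest the project's input file instead of an inline string, `system.ingest_file("sample_data.txt")` would replace the `ingest` call.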
---
## Setup Instructions
### 1. Clone the Repository
```bash
git clone https://github.com/balavenkatesh3322/rag-tutorial
cd rag-tutorial
```
### 2. Install Dependencies
```bash
pip install langchain faiss-cpu openai sentence-transformers
```
### 3. Prepare Environment Variables
```bash
export OPENAI_API_KEY=your-openai-api-key
```
Alternatively, you can store the key in a `.env` file and load it with `python-dotenv`.
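A small helper can use an exported variable and fall back to a `.env` file. The sketch below assumes the `python-dotenv` package (imported as `dotenv`). Its `load_dotenv()` reads `KEY=value` lines from a `.env` file into `os.environ` and, by default, does not override variables that are already set. The helper name `get_openai_key` is hypothetical:

```python
import os


def get_openai_key():
    """Return OPENAI_API_KEY from the environment, optionally loading a .env file first."""
    try:
        from dotenv import load_dotenv  # provided by `pip install python-dotenv`
    except ImportError:
        load_dotenv = None  # python-dotenv not installed: rely on exported variables only
    if load_dotenv is not None:
        load_dotenv()  # does not override variables that are already set
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it or add it to a .env file")
    return key
```

A matching `.env` file would contain the single line `OPENAI_API_KEY=your-openai-api-key`. Keep that file out of version control.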
### 4. Add Sample Data
Place your raw document in `sample_data.txt` in the root directory. Example:
```
LangChain is a framework for developing applications powered by language models.
```
### 5. Run the Application
```bash
python start_rag_here.py
```
You should see the answer printed along with its source documents.
---
## Notes
- This is a basic RAG setup; you can extend it using LangChain's advanced retrievers, rerankers, or LangGraph.
- You can save/load FAISS index using `VectorDB.save_local()` and `load_local()`.
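With LangChain's `FAISS` vector store, you save an index with `db.save_local("faiss_index")`. You later reload it with `FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)`. Recent `langchain-community` releases require that flag because the stored docstore is unpickled, so only load indexes you created yourself.

Persisting the index avoids re-embedding the corpus on every run. The dependency-free sketch below illustrates the same idea with hypothetical `save_index`/`load_index` helpers that write chunk texts and vectors to JSON. This is not LangChain's on-disk format, which writes `index.faiss` and `index.pkl`:

```python
import json
import tempfile
from pathlib import Path


def save_index(folder, texts, vectors):
    """Write chunk texts and their vectors to <folder>/index.json."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    payload = {"texts": texts, "vectors": vectors}
    (folder / "index.json").write_text(json.dumps(payload), encoding="utf-8")


def load_index(folder):
    """Read back (texts, vectors) saved by save_index; nothing is re-embedded."""
    payload = json.loads((Path(folder) / "index.json").read_text(encoding="utf-8"))
    return payload["texts"], payload["vectors"]


with tempfile.TemporaryDirectory() as tmp:
    save_index(tmp, ["LangChain is a framework."], [[0.1, 0.2, 0.3]])
    texts, vectors = load_index(tmp)
    print(texts, vectors)
```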
---
## Future Enhancements
- Add PDF/CSV/URL loaders.
- Metadata-based filtering.
- Use LangGraph for stateful RAG workflows.
- Integrate caching and rate limiters.
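Metadata-based filtering usually means attaching fields such as the source file or document type to each chunk. Retrieval is then restricted to chunks whose fields match. The minimal sketch below makes that assumption and uses hypothetical `Chunk` and `search_with_filter` names with precomputed similarity scores. It applies the filter before taking the top `k`, so high-scoring chunks from other sources cannot crowd out matching ones:

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    score: float  # similarity to the current query, computed elsewhere
    metadata: dict = field(default_factory=dict)


def search_with_filter(chunks, k, **conditions):
    """Keep chunks whose metadata matches every key=value condition, then return the top k by score."""
    matching = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in conditions.items())
    ]
    return sorted(matching, key=lambda c: c.score, reverse=True)[:k]


chunks = [
    Chunk("FAISS overview", 0.95, {"source": "notes.csv"}),
    Chunk("LangChain intro", 0.90, {"source": "guide.pdf"}),
    Chunk("LangChain agents", 0.40, {"source": "guide.pdf"}),
]
print([c.text for c in search_with_filter(chunks, k=1, source="guide.pdf")])
```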
---
## Credits
Built using:
- [LangChain](https://github.com/langchain-ai/langchain)
- [FAISS](https://github.com/facebookresearch/faiss)
- [OpenAI](https://openai.com)
- [Sentence Transformers](https://www.sbert.net/)
---
## Feedback
Feel free to reach out or fork the project for improvements!