https://github.com/balavenkatesh3322/rag-tutorial

This project implements a modular Retrieval-Augmented Generation (RAG) pipeline using Python OOP concepts and LangChain.

faiss faiss-vector-database generative-ai langchain llm openai rag

Host: GitHub
URL: https://github.com/balavenkatesh3322/rag-tutorial
Owner: balavenkatesh3322
Created: 2025-06-26T01:24:46.000Z (12 months ago)
Default Branch: main
Last Pushed: 2025-06-26T01:29:56.000Z (12 months ago)
Last Synced: 2025-08-18T15:23:45.374Z (10 months ago)
Topics: faiss, faiss-vector-database, generative-ai, langchain, llm, openai, rag
Language: Python
Homepage:
Size: 5.86 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# RAG System Tutorial using LangChain

This project implements a modular Retrieval-Augmented Generation (RAG) pipeline using Python OOP concepts and LangChain. It walks you through setting up a document-based QA system that uses FAISS for vector storage and an OpenAI LLM for answering questions.

## 📁 Project Structure

```
.
├── start_rag_here.py # Main RAG pipeline implementation
├── sample_data.txt # Input text file used for ingestion
└── README.md # Documentation and usage guide
```

## 🧠 Components

### 1. **DocumentChunker**

Splits large documents into manageable chunks using LangChain's `RecursiveCharacterTextSplitter`.
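To illustrate the idea behind recursive splitting, here is a minimal pure-Python sketch. It is a simplified stand-in, not LangChain's implementation: the function name `split_recursive` and its defaults are hypothetical, and it omits chunk overlap. It tries the coarsest separator first and falls back to finer ones only for pieces that are still too long.

```python
def split_recursive(text, chunk_size=200, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        # Greedily merge neighbouring pieces while they still fit.
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current.strip():
            chunks.append(current)
        current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too long: retry with the finer separators.
            chunks.extend(split_recursive(piece, chunk_size, finer))
    if current.strip():
        chunks.append(current)
    return chunks


sample = (
    "LangChain is a framework.\n\n"
    "It helps build LLM apps.\n\n"
    "FAISS stores vectors."
)
chunks = split_recursive(sample, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

Splitting on paragraph breaks before falling back to words keeps related sentences together, which tends to make the retrieved chunks more readable.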

### 2. **Embedder**

Uses `HuggingFaceEmbeddings` (e.g., `all-MiniLM-L6-v2`) to convert text chunks into dense vectors.
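The sketch below shows the shape of an embedding step: text in, fixed-length normalized vector out. It is a toy hashing embedder, not a trained model, so unlike `all-MiniLM-L6-v2` it captures no semantics. The name `toy_embed` and the dimension are assumptions made for illustration.

```python
import hashlib
import math


def toy_embed(text, dim=16):
    """Map text to a unit-length vector by hashing each token into a bucket."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # hashlib gives a stable hash across runs (built-in hash() does not).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # L2-normalize so that a dot product equals cosine similarity.
    return [v / norm for v in vec] if norm else vec


vector = toy_embed("LangChain is a framework for LLM apps")
print(len(vector))
```

A real embedder differs in what the numbers mean, but the interface is the same, which is why the component can be swapped without touching the rest of the pipeline.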

### 3. **VectorDB**

Uses FAISS to index embeddings and store them efficiently.

### 4. **Retriever**

Fetches relevant document chunks using vector similarity search.
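Conceptually, the vector store and retriever together perform a nearest-neighbour lookup. The sketch below does this by brute force in plain Python. It is an illustration only: `TinyVectorIndex` is a hypothetical name, and cosine similarity is an assumed metric, whereas FAISS provides optimized index structures and other distance measures.

```python
import math


class TinyVectorIndex:
    """Brute-force in-memory index that stores (vector, text) pairs."""

    def __init__(self):
        self._vectors = []
        self._texts = []

    def add(self, vector, text):
        self._vectors.append(list(vector))
        self._texts.append(text)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query_vector, k=2):
        """Return the k most similar entries as (score, text), best first."""
        scored = [
            (self._cosine(query_vector, vec), text)
            for vec, text in zip(self._vectors, self._texts)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = TinyVectorIndex()
index.add([1.0, 0.0, 0.0], "chunk about LangChain")
index.add([0.0, 1.0, 0.0], "chunk about FAISS")
index.add([0.7, 0.7, 0.0], "chunk about both")

results = index.search([0.9, 0.1, 0.0], k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```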

### 5. **RAGPipeline**

Combines a retriever with OpenAI's GPT model to provide answers and source references.
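The sketch below shows one way such a pipeline can be wired: retrieve chunks, put them into a prompt, call the model, and return the answer together with its sources. The function name, prompt wording, and stub callables are all assumptions; the stubs stand in for the real retriever and the OpenAI model so the example runs offline.

```python
def answer_with_sources(question, retrieve, llm, k=2):
    """Retrieve k chunks, build a grounded prompt, and return answer + sources."""
    docs = retrieve(question, k)
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(docs))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return {"answer": llm(prompt), "sources": docs}


# Stubs for illustration only; a real setup would call a vector store and an LLM.
corpus = [
    "LangChain is a framework for LLM apps.",
    "FAISS indexes dense vectors.",
    "OpenAI hosts GPT models.",
]
seen_prompts = []


def fake_retrieve(question, k):
    return corpus[:k]


def fake_llm(prompt):
    seen_prompts.append(prompt)
    return "stub answer"


result = answer_with_sources("What is LangChain?", fake_retrieve, fake_llm, k=2)
print(result["answer"])
print(result["sources"])
```

Passing the retriever and model in as callables keeps the pipeline easy to test without network access or an API key.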

### 6. **RAGSystem**

Orchestrates the end-to-end workflow: loading, chunking, embedding, storing, retrieving, and querying.
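As a rough picture of that orchestration, the sketch below chains the stages with injected components. It is not the code in `start_rag_here.py`: `MiniRAGSystem`, the keyword-count embedder, and the stub model are hypothetical, and file loading is left out so the example stays self-contained.

```python
class MiniRAGSystem:
    """Chains chunking, embedding, storing, retrieving, and querying."""

    def __init__(self, chunker, embedder, llm):
        self.chunker = chunker
        self.embedder = embedder
        self.llm = llm
        self.store = []  # list of (vector, chunk) pairs

    def ingest(self, text):
        for chunk in self.chunker(text):
            self.store.append((self.embedder(chunk), chunk))
        return len(self.store)

    def retrieve(self, question, k=2):
        query = self.embedder(question)
        # Rank stored chunks by dot product with the query vector.
        ranked = sorted(
            self.store,
            key=lambda item: sum(a * b for a, b in zip(query, item[0])),
            reverse=True,
        )
        return [chunk for _, chunk in ranked[:k]]

    def query(self, question, k=2):
        sources = self.retrieve(question, k)
        prompt = "Context:\n" + "\n".join(sources) + f"\n\nQuestion: {question}"
        return {"answer": self.llm(prompt), "sources": sources}


# Toy components for illustration only.
VOCAB = ["langchain", "faiss", "openai"]


def sentence_chunker(text):
    return [part.strip() for part in text.split(".") if part.strip()]


def count_embedder(text):
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


system = MiniRAGSystem(sentence_chunker, count_embedder, llm=lambda prompt: "stub answer")
stored = system.ingest(
    "LangChain builds LLM apps. FAISS indexes vectors. OpenAI provides the model."
)
reply = system.query("what does faiss do", k=1)
print(stored, reply)
```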

---

## ⚙️ Setup Instructions

### 1. Clone the Repository

```bash
# Clone the repository and enter the project directory
git clone https://github.com/balavenkatesh3322/rag-tutorial
cd rag-tutorial
```

### 2. Install Dependencies

```bash
pip install langchain faiss-cpu openai sentence-transformers
```

### 3. Prepare Environment Variables

```bash
export OPENAI_API_KEY=your-openai-api-key
```

Alternatively, you can keep the key in a `.env` file and load it with `dotenv` (the `python-dotenv` package) to manage secrets.
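A small startup check can make a missing key fail early with a clear message. The sketch below is an assumption about how this might be done, not part of the project: `get_openai_key` is a hypothetical helper, and the `python-dotenv` import is optional so the snippet still runs when that package is not installed.

```python
import os

try:
    # Optional: provided by the python-dotenv package.
    from dotenv import load_dotenv

    load_dotenv()  # copies KEY=value pairs from a local .env into os.environ
except ImportError:
    pass  # fall back to variables already exported in the shell


def get_openai_key(env=None):
    """Return the API key, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it or add it to a .env file"
        )
    return key
```

Calling `get_openai_key()` once at startup surfaces a configuration problem before any documents are embedded.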

### 4. Add Sample Data

Place your raw document in `sample_data.txt` in the root directory. Example:

```
LangChain is a framework for developing applications powered by language models.
```

### 5. Run the Application

```bash
python start_rag_here.py
```

You should see an answer printed along with source documents.

---

## 📌 Notes

- This is a basic RAG setup; you can extend it using LangChain’s advanced retrievers, rerankers, or LangGraph.
- You can save and load the FAISS index using `VectorDB.save_local()` and `load_local()`.

---

## 📈 Future Enhancements

- Add PDF/CSV/URL loaders.
- Add metadata-based filtering.
- Use LangGraph for stateful RAG workflows.
- Integrate caching and rate limiters.

---

## 🧠 Credits

Built with 💡 using:

- [LangChain](https://github.com/hwchase17/langchain)
- [FAISS](https://github.com/facebookresearch/faiss)
- [OpenAI](https://openai.com)
- [Sentence Transformers](https://www.sbert.net/)

---

## 📬 Feedback

Feel free to reach out or fork the project for improvements!
