https://github.com/yashdew3/generic-data-rag-agent
AI-powered RAG agent to upload CSV, Excel, PDF & chat with your data using FastAPI, ChromaDB, and Google Gemini API.
- Host: GitHub
- URL: https://github.com/yashdew3/generic-data-rag-agent
- Owner: yashdew3
- License: mit
- Created: 2025-10-05T09:01:49.000Z
- Default Branch: main
- Last Pushed: 2025-10-05T09:39:44.000Z
- Last Synced: 2025-10-05T11:33:33.692Z
- Topics: ai-agents, chromadb, csv, embeddings, fastapi, gemini-api, generative-ai, langchain, pdf-excel, rag, react, sentence-transformers, tailwind
- Language: Python
- Homepage:
- Size: 173 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Generic Data RAG Agent
A powerful **Retrieval-Augmented Generation (RAG)** system that lets users upload CSV, Excel, PDF, and text files and interact with them through natural language queries. It is built with FastAPI, React, ChromaDB, Sentence Transformers, and Google Gemini.

## Features
- **Multi-Format Support**: Upload and process CSV, Excel, PDF, and text files
- **Intelligent Retrieval**: Uses Sentence Transformers embeddings for semantic search
- **Natural Language Chat**: Query your data using conversational AI powered by Google Gemini
- **Vector Database**: ChromaDB for efficient similarity search and retrieval
- **Real-time Processing**: Files are processed and indexed as soon as they are uploaded
- **Chat History**: Persistent conversation history with context awareness
- **Modern UI**: Clean, responsive interface built with React and Tailwind CSS
- **Fast API**: High-performance backend with FastAPI and async processing
## Architecture
```
┌────────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   React Frontend   │◄───►│     FastAPI     │◄───►│    ChromaDB     │
│ (Vite + Tailwind)  │     │     Backend     │     │  Vector Store   │
└────────────────────┘     └─────────────────┘     └─────────────────┘
                                    │
                           ┌─────────────────┐
                           │  Google Gemini  │
                           │    AI Model     │
                           └─────────────────┘
```
### Core Components
- **Frontend**: React 18 with Vite, Tailwind CSS, and Lucide React icons
- **Backend**: FastAPI with async support, CORS middleware, and structured routing
- **AI Model**: Google Gemini 2.5 Flash for natural language processing
- **Embeddings**: Sentence Transformers for semantic understanding
- **Vector Database**: ChromaDB for efficient similarity search
- **File Processing**: Support for multiple formats with automatic text extraction
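The retrieval step at the center of this architecture can be illustrated in a few lines. This is a minimal sketch, not the project's `retriever.py`: it swaps the Sentence Transformers model for a toy bag-of-words vector and replaces ChromaDB's indexed similarity search with a brute-force cosine ranking, so it runs on the standard library alone. The names `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy stand-in for a Sentence Transformers model: a sparse bag-of-words
    # vector. Real embeddings are dense float vectors from a neural model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors (0.0 if either is empty).
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    # Rank every chunk against the query; a vector store does this with an index.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Q3 sales grew 12% in the north region.",
    "The report covers hiring plans for 2025.",
    "Sales in the south region declined in Q3.",
]
# Prints the south-region chunk first, then the north-region chunk.
print(retrieve("How did sales change in Q3?", chunks))
```

The top-k chunks returned here are what the backend would hand to Gemini as context; the hiring-plans chunk shares no terms with the question and is left out.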
## Tech Stack
### Backend
- **FastAPI** - Modern, fast web framework for building APIs
- **Google Generative AI** - Gemini 2.5 Flash model integration
- **ChromaDB** - Vector database for embeddings and similarity search
- **Sentence Transformers** - State-of-the-art sentence embeddings
- **Pandas** - Data manipulation and analysis
- **PDFPlumber** - PDF text extraction
- **OpenPyXL** - Excel file processing
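Before anything reaches ChromaDB, each file has to become plain text. A minimal sketch of that ingestion step, assuming a "one row per document" strategy for tables and fixed-size overlapping character windows for long extracted text. The project itself uses Pandas, PDFPlumber, and OpenPyXL rather than the standard-library `csv` module used here, and its `ingestion.py` may chunk differently; `csv_to_documents` and `chunk_text` are hypothetical names.

```python
import csv
import io
from typing import List


def csv_to_documents(raw: str) -> List[str]:
    # Turn each CSV row into a "column: value; ..." string so that the
    # column names travel with the values into the embedding.
    reader = csv.DictReader(io.StringIO(raw))
    return ["; ".join(f"{col}: {val}" for col, val in row.items()) for row in reader]


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    # Split long text (e.g. extracted PDF pages) into windows of `size`
    # characters, each sharing `overlap` characters with the previous one.
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


print(csv_to_documents("region,sales\nnorth,120\nsouth,95\n"))
```

Overlapping windows keep a sentence that straddles a boundary intact in at least one chunk, at the cost of storing some text twice.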
### Frontend
- **React 18** - Modern React with hooks and functional components
- **Vite** - Fast build tool and development server
- **Tailwind CSS** - Utility-first CSS framework
- **Lucide React** - Beautiful, customizable icons
## Project Structure
```
generic-data-rag-agent/
├── backend/
│   ├── app/
│   │   ├── core/
│   │   │   └── config.py        # Configuration settings
│   │   ├── routers/
│   │   │   ├── chat.py          # Chat endpoints
│   │   │   ├── files.py         # File management endpoints
│   │   │   └── history.py       # History endpoints
│   │   ├── services/
│   │   │   ├── indexer.py       # Document indexing
│   │   │   ├── ingestion.py     # File processing
│   │   │   ├── retriever.py     # Vector search
│   │   │   └── history.py       # Chat history management
│   │   ├── main.py              # FastAPI application
│   │   ├── models.py            # Pydantic models
│   │   └── storage.py           # File storage utilities
│   ├── chroma_db/               # Vector database storage
│   ├── uploads/                 # Uploaded files storage
│   ├── requirements.txt         # Python dependencies
│   └── start_server.py          # Server startup script
├── frontend/
│   ├── src/
│   │   ├── App.jsx              # Main React component
│   │   ├── main.jsx             # React entry point
│   │   └── index.css            # Tailwind styles
│   ├── index.html               # HTML template
│   ├── package.json             # Node.js dependencies
│   ├── tailwind.config.js       # Tailwind configuration
│   └── vite.config.js           # Vite configuration
├── start-backend.bat            # Windows backend starter
├── start-frontend.bat           # Windows frontend starter
└── README.md                    # This file
```
## Prerequisites
- **Python 3.8+**
- **Node.js 16+**
- **Google Gemini API Key** ([Get it here](https://makersuite.google.com/app/apikey))
## Quick Start
### 1. Clone the Repository
```bash
git clone https://github.com/yashdew3/generic-data-rag-agent.git
cd generic-data-rag-agent
```
### 2. Backend Setup
```bash
# Navigate to backend directory
cd backend
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Create environment file
cp .env.example .env
```
### 3. Environment Configuration
Edit the `.env` file in the `backend` directory (copied from `.env.example` in the previous step) and set your values:
```env
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-2.5-flash
FRONTEND_ORIGIN=http://localhost:5173
```
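How `app/core/config.py` reads these values is not shown here. A minimal sketch of such a loader, assuming the `.env` file has already been loaded into the process environment (for example with `python-dotenv`'s `load_dotenv()`) and using the values above as defaults. `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    gemini_model: str
    frontend_origin: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    # Read from os.environ by default; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "")
    if not key:
        # Fail fast at startup instead of on the first chat request.
        raise RuntimeError("GEMINI_API_KEY is not set")
    return Settings(
        gemini_api_key=key,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        frontend_origin=env.get("FRONTEND_ORIGIN", "http://localhost:5173"),
    )
```

`FRONTEND_ORIGIN` is the value a CORS policy would allow, which is why it must match the address the Vite dev server runs on.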
### 4. Frontend Setup
```bash
# Navigate to frontend directory (new terminal)
cd frontend
# Install dependencies
npm install
```
### 5. Start the Application
#### Option 1: Using Batch Files (Windows)
```bat
REM Start backend (from root directory)
start-backend.bat
REM Start frontend (from root directory)
start-frontend.bat
```
#### Option 2: Manual Start
```bash
# Terminal 1 - Backend (with the virtual environment activated)
cd backend
python start_server.py
# Terminal 2 - Frontend
cd frontend
npm run dev
```
### 6. Access the Application
- **Frontend**: http://localhost:5173
- **Backend API**: http://localhost:8000
- **API Documentation**: http://localhost:8000/docs
## Usage Guide
### 1. Upload Files
- Click the **"Choose Files"** button
- Select CSV, Excel, PDF, or text files
- Files are automatically processed and indexed
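Uploads can also be scripted against the backend's `POST /files/upload` endpoint. A standard-library sketch that builds a multipart/form-data request, assuming the endpoint accepts a single form field named `file`; that field name is a guess, so confirm it at http://localhost:8000/docs. `upload_request` is a hypothetical helper.

```python
import urllib.request
import uuid


def upload_request(filename: str, content: bytes, field: str = "file") -> urllib.request.Request:
    # Build (but do not send) a multipart/form-data POST carrying one file.
    # The form field name "file" is an assumption about the backend.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        "http://localhost:8000/files/upload",
        data=head + content + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Sending it requires the backend to be running:
# with open("sales.csv", "rb") as f:
#     req = upload_request("sales.csv", f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```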
### 2. Chat with Your Data
- Type natural language questions about your uploaded data
- Examples:
- "What are the main trends in this dataset?"
- "Summarize the key findings from the uploaded report"
- "Show me insights about sales performance"
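Context awareness in a RAG chat usually means the prompt sent to the model carries both the retrieved chunks and the last few turns of the conversation. A minimal sketch of that assembly step, assuming the chunks come from the vector search and the history is a list of `(role, text)` pairs. `build_prompt` and its template wording are hypothetical; the project's actual prompt is not documented here. The returned string is what would be passed to Gemini.

```python
from typing import List, Tuple


def build_prompt(
    question: str,
    chunks: List[str],
    history: List[Tuple[str, str]],
    max_turns: int = 3,
) -> str:
    # Number the chunks so the model can refer to its sources.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    # Keep only the most recent turns (history[-0:] would return everything).
    recent = history[-max_turns:] if max_turns > 0 else []
    turns = "\n".join(f"{role}: {text}" for role, text in recent)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{turns}\n\n"
        f"User: {question}\nAssistant:"
    )
```

Capping the history keeps the prompt size bounded as a session grows, while the instruction line discourages answers that are not grounded in the uploaded data.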
## API Endpoints
### File Management
- `POST /files/upload` - Upload and process files
- `GET /files/list` - List uploaded files
- `DELETE /files/{file_id}` - Delete a file
### Chat System
- `POST /chat/message` - Send a chat message
- `GET /chat/history/{session_id}` - Get chat history
### History Management
- `GET /history/sessions` - List all chat sessions
- `DELETE /history/sessions/{session_id}` - Delete a session
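These endpoints can be exercised from a script as well as from the UI. A standard-library sketch that builds requests for three of them against the backend address from the Quick Start. The JSON field names `message` and `session_id` are assumptions, since the request schema is not documented in this README; check http://localhost:8000/docs. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # backend address from the Quick Start


def chat_request(message: str, session_id: Optional[str] = None) -> urllib.request.Request:
    # Build (but do not send) POST /chat/message. The field names
    # "message" and "session_id" are assumptions about the schema.
    body = {"message": message}
    if session_id is not None:
        body["session_id"] = session_id
    return urllib.request.Request(
        f"{BASE_URL}/chat/message",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def history_request(session_id: str) -> urllib.request.Request:
    # GET /chat/history/{session_id}; the ID is percent-encoded for the path.
    sid = urllib.parse.quote(session_id, safe="")
    return urllib.request.Request(f"{BASE_URL}/chat/history/{sid}", method="GET")


def delete_session_request(session_id: str) -> urllib.request.Request:
    # DELETE /history/sessions/{session_id}
    sid = urllib.parse.quote(session_id, safe="")
    return urllib.request.Request(f"{BASE_URL}/history/sessions/{sid}", method="DELETE")


# With the backend running:
# with urllib.request.urlopen(chat_request("Summarize the uploaded report")) as resp:
#     print(json.load(resp))
```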
## Testing
### Backend Tests
```bash
cd backend
python test_system.py
```
### Frontend Development
```bash
cd frontend
npm run lint # ESLint checking
npm run build # Production build
npm run preview # Preview production build
```
## Security Features
- **CORS Protection**: Configurable origin restrictions
- **File Validation**: Secure file type checking
- **API Key Management**: Environment-based configuration
- **Input Sanitization**: Secure data processing
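File validation typically combines an extension allowlist, a size cap, and stripping directory components from the client-supplied name. A minimal sketch under those assumptions: the allowlist mirrors the formats listed under Features (Excel as `.xlsx`, since OpenPyXL reads that format), while the 20 MB limit and the name `validate_upload` are hypothetical, not the backend's actual rules. An extension check does not inspect file contents, so parsing errors still need handling downstream.

```python
from pathlib import Path

# Formats from the Features list; the size cap is a hypothetical value.
ALLOWED_EXTENSIONS = {".csv", ".xlsx", ".pdf", ".txt"}
MAX_BYTES = 20 * 1024 * 1024


def validate_upload(filename: str, size: int) -> str:
    # Normalize Windows separators, then keep only the final path component
    # so names like "../../etc/x.csv" cannot escape the uploads directory.
    name = Path(filename.replace("\\", "/")).name
    suffix = Path(name).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    if size > MAX_BYTES:
        raise ValueError("file too large")
    return name
```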
## Contributing
Contributions, issues, and feature requests are welcome! Check the [issues page](https://github.com/yashdew3/generic-data-rag-agent/issues) or open a new issue to discuss changes. Pull requests are also appreciated.
## License
This project is licensed under the MIT License © Yash Dewangan
## Let's Connect
Feel free to connect or suggest improvements!
- Built by **Yash Dewangan**
- GitHub: [YashDewangan](https://github.com/yashdew3)
- Email: [yashdew06@gmail.com](mailto:yashdew06@gmail.com)
- LinkedIn: [YashDewangan](https://www.linkedin.com/in/yash-dewangan/)
---
**Built with ❤️ for intelligent data interaction**
*This project demonstrates modern RAG architecture with production-ready code quality and comprehensive documentation.*