https://github.com/prabal-verma/agentic-rag
An Agentic RAG (Retrieval-Augmented Generation) system powered by LangChain, enabling multi-step reasoning over documents using LLMs, ChromaDB, and Google Drive as a document source.
- Host: GitHub
- URL: https://github.com/prabal-verma/agentic-rag
- Owner: Prabal-verma
- Created: 2025-07-03T23:21:21.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-07-03T23:29:07.000Z (3 months ago)
- Last Synced: 2025-07-04T00:26:09.428Z (3 months ago)
- Topics: agentic-rag, chromadb, reactjs
- Language: Python
- Homepage:
- Size: 394 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Agentic RAG System
A comprehensive Retrieval-Augmented Generation (RAG) system that combines voice input, multimodal document processing, and intelligent search capabilities across multiple sources. Built with FastAPI and Next.js, this system provides real-time AI-powered question answering with visual grounding and citation support.
## Features
### Voice Integration
- **Real-time Speech-to-Text**: Streaming voice input using Google Cloud Speech-to-Text
- **WebSocket-based Audio Processing**: Low-latency voice recognition with partial results
- **Multi-language Support**: Configurable language detection and transcription
- **Voice-enabled Chat Interface**: Natural conversation flow with voice commands

### Multimodal Document Processing
- **Advanced PDF Processing**: Extract text, images, charts, and tables from PDFs
- **Image Understanding**: AI-powered analysis of charts, diagrams, and visual content
- **OCR Integration**: Text extraction from scanned documents and images
- **Smart Chunking**: Intelligent text segmentation with context preservation
- **Visual Grounding**: Link answers to specific document images and pages

### Intelligent Multi-Source Search
- **Local RAG System**: Vector-based document retrieval with ChromaDB
- **Web Search Integration**: Real-time web search via SERP API
- **Google Drive MCP**: Model Context Protocol integration for Drive documents
- **Parallel Search Execution**: Simultaneous queries across all sources
- **Smart Result Fusion**: Intelligent combination of results from multiple sources

### Citations & Transparency
- **Comprehensive Citations**: Detailed source attribution for every answer
- **Visual Citations**: Click-through access to source images and documents
- **Confidence Scoring**: Reliability indicators for each source
- **Source Traceability**: Full audit trail of information sources
- **Interactive Content Viewer**: In-app display of PDFs, images, and web content

### Real-time Capabilities
- **WebSocket Communication**: Real-time chat and voice processing
- **Streaming Responses**: Progressive answer generation
- **Live Transcription**: Real-time speech-to-text with partial results
- **Concurrent Processing**: Parallel execution of search and generation tasks

## Architecture
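The concurrent search-and-fuse behavior described above can be sketched with `asyncio`. The searcher bodies below are placeholders (the real services wrap ChromaDB, the SERP API, and Google Drive); only the gather-then-merge shape is the point:

```python
import asyncio

# Hypothetical per-source searchers; real ones call ChromaDB, SERP, Drive.
async def search_local(query: str) -> list[dict]:
    return [{"source": "local", "text": f"chunk about {query}", "score": 0.9}]

async def search_web(query: str) -> list[dict]:
    return [{"source": "web", "text": f"page about {query}", "score": 0.7}]

async def search_drive(query: str) -> list[dict]:
    return [{"source": "drive", "text": f"doc about {query}", "score": 0.8}]

async def fused_search(query: str) -> list[dict]:
    """Run all sources concurrently, then merge hits by descending score."""
    per_source = await asyncio.gather(
        search_local(query), search_web(query), search_drive(query)
    )
    merged = [hit for hits in per_source for hit in hits]
    return sorted(merged, key=lambda h: h["score"], reverse=True)

hits = asyncio.run(fused_search("quarterly revenue"))
print([h["source"] for h in hits])  # best-scoring source first
```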
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend      │    │    External     │
│   (Next.js)     │    │    (FastAPI)     │    │    Services     │
├─────────────────┤    ├──────────────────┤    ├─────────────────┤
│ • Voice Input   │───▶│ • STT Service    │    │ • Gemini/Claude │
│ • Chat UI       │    │ • RAG Engine     │───▶│ • Google Drive  │
│ • Citations     │    │ • Web Search     │    │ • SERP API      │
│ • Image Display │    │ • Document Proc. │    │ • ChromaDB      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```

## Technology Stack
### Backend
- **Framework**: FastAPI 0.104+ with async/await support
- **AI Providers**: Google Gemini 1.5 Pro, Anthropic Claude 3 Sonnet
- **Vector Database**: ChromaDB for embedding storage and retrieval
- **Speech Processing**: Google Cloud Speech-to-Text API
- **Document Processing**: PyPDF2, Pillow, pytesseract for OCR
- **Search Integration**: SERP API for web search, Google Drive API
- **WebSocket**: Real-time communication with connection management
- **Authentication**: OAuth 2.0 for Google services

### Frontend
- **Framework**: Next.js 14 with App Router
- **Language**: TypeScript for type safety
- **UI Library**: React 18 with Tailwind CSS
- **State Management**: Zustand for client state
- **Audio Processing**: Web Audio API with WebRTC
- **Real-time**: WebSocket client with auto-reconnection
- **Testing**: Jest and React Testing Library

### AI & ML
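The retrieval core reduces to embedding similarity search. Here is a dependency-free sketch of the idea, with toy three-dimensional vectors standing in for the sentence-transformers embeddings that ChromaDB would actually store; the `threshold` parameter plays the role of `SIMILARITY_THRESHOLD`:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunks, k=2, threshold=0.7):
    """Rank stored chunks by similarity, dropping matches below the threshold."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(reverse=True)
    return [(round(s, 3), t) for s, t in scored[:k] if s >= threshold]

# Toy "embeddings"; a real index would hold 384-dim MiniLM vectors.
chunks = [
    ("revenue grew 12% in Q3", [0.9, 0.1, 0.0]),
    ("the office moved to Pune", [0.1, 0.9, 0.2]),
    ("Q4 revenue guidance raised", [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], chunks))
```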
- **Embeddings**: sentence-transformers/all-MiniLM-L6-v2
- **Vector Search**: Similarity search with configurable thresholds
- **Multimodal AI**: Vision models for image understanding
- **Text Generation**: Context-aware response generation
- **Confidence Scoring**: Relevance and reliability metrics

## Installation & Setup
### Prerequisites
- **Python 3.9+** (3.11 recommended)
- **Node.js 18+** with npm/yarn
- **Google Cloud Account** (for Speech-to-Text)
- **AI Provider Account** (Gemini or Claude)

### 1. Clone Repository
```bash
git clone https://github.com/prabal-verma/agentic-rag.git agnt
cd agnt
```

### 2. Backend Setup
#### Install Dependencies
```bash
cd backend
pip install -r requirements.txt
```

#### Configure Environment
```bash
cp env.example .env
# Edit .env with your configuration (see Configuration section)
```

#### Set up Google Cloud Service Account
1. Create a project in [Google Cloud Console](https://console.cloud.google.com)
2. Enable Speech-to-Text API and Drive API
3. Create a service account and download the JSON key
4. Set `GOOGLE_CLOUD_SERVICE_ACCOUNT_PATH` in your `.env` file

#### Initialize Database
```bash
# ChromaDB will be initialized automatically on first run
# Data will be stored in ./chroma_db/ directory
```

### 3. Frontend Setup
#### Install Dependencies
```bash
cd frontend
npm install
# or
yarn install
```

#### Configure Environment
```bash
# Create .env.local file
echo "NEXT_PUBLIC_API_URL=http://localhost:8000" > .env.local
echo "NEXT_PUBLIC_WS_URL=ws://localhost:8000" >> .env.local
```

### 4. Running the Application
#### Start Backend Server
```bash
cd backend
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

#### Start Frontend Development Server
```bash
cd frontend
npm run dev
# or
yarn dev
```

#### Access the Application
- **Frontend**: http://localhost:3000
- **Backend API Docs**: http://localhost:8000/docs
- **Backend Health Check**: http://localhost:8000/health

## Configuration
### Environment Variables
#### Core AI Configuration
```bash
# Choose your AI provider
AI_PROVIDER=claude  # or "gemini"

# API Keys (get one based on your provider choice)
CLAUDE_API_KEY=your_claude_api_key
GEMINI_API_KEY=your_gemini_api_key
```

#### Speech-to-Text Setup
```bash
# Google Cloud Speech-to-Text (required for voice features)
GOOGLE_CLOUD_SERVICE_ACCOUNT_PATH=/path/to/service-account.json
STT_PROVIDER=google
GOOGLE_SPEECH_MODEL=latest_long
```

#### Search Integration (Optional)
```bash
# Web Search (choose one)
SERP_API_KEY=your_serp_api_key # Recommended
GOOGLE_API_KEY=your_google_api_key  # Alternative

# Google Drive Integration
GOOGLE_DRIVE_CLIENT_ID=your_client_id
GOOGLE_DRIVE_CLIENT_SECRET=your_client_secret
```

#### Advanced Settings
```bash
# Vector Database
CHROMA_PERSIST_DIRECTORY=./chroma_db
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

# Performance Tuning
MAX_SEARCH_RESULTS=10
SIMILARITY_THRESHOLD=0.7
MAX_TOKENS_PER_CHUNK=1000
CHUNK_OVERLAP=200
MAX_CONCURRENT_REQUESTS=100

# Security
CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
SECRET_KEY=your-secret-key-change-in-production
RATE_LIMIT_PER_MINUTE=60

# Feature Flags
ENABLE_WEB_SEARCH=true
ENABLE_GOOGLE_DRIVE=true
ENABLE_VOICE_INPUT=true
ENABLE_IMAGE_ANALYSIS=true
```

### Model Configuration
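Variables like the ones above are typically loaded into a typed settings object (the repo's `app/config.py` presumably does this, likely with Pydantic). A stdlib-only sketch covering a few of the options:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Subset of the .env options above; names match the variables shown."""
    ai_provider: str
    similarity_threshold: float
    enable_web_search: bool

def load_settings(env=os.environ) -> Settings:
    """Read settings from an environment mapping, with the documented defaults."""
    return Settings(
        ai_provider=env.get("AI_PROVIDER", "claude"),
        similarity_threshold=float(env.get("SIMILARITY_THRESHOLD", "0.7")),
        enable_web_search=env.get("ENABLE_WEB_SEARCH", "true").lower() == "true",
    )

print(load_settings({"AI_PROVIDER": "gemini", "ENABLE_WEB_SEARCH": "false"}))
```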
#### AI Provider Selection
- **Claude**: Better reasoning, more conservative responses
- **Gemini**: Faster processing, better multimodal understanding

#### Model Choices
```bash
# Gemini Models
GEMINI_CHAT_MODEL=gemini-1.5-pro
GEMINI_VISION_MODEL=gemini-1.5-pro-vision

# Claude Models
CLAUDE_CHAT_MODEL=claude-3-sonnet-20240229
CLAUDE_VISION_MODEL=claude-3-sonnet-20240229
```

## API Documentation
### Core Endpoints
#### Health Check
```http
GET /health
```
Returns system status and service availability.

#### Document Upload
```http
POST /upload
Content-Type: multipart/form-data

file:
```
Upload and process a PDF document with image extraction.

**Response:**
```json
{
"success": true,
"document_id": "uuid",
"filename": "document.pdf",
"pages_processed": 10,
"images_extracted": 5,
"text_chunks": 25,
"processing_time_ms": 1500
}
```

#### Query System
```http
POST /query
Content-Type: application/json

{
"query": "What is the main conclusion of the research?",
"num_results": 5,
"include_web_search": true,
"include_drive_search": true
}
```

**Response:**
```json
{
"answer": "Based on the research findings...",
"citations": [
{
"id": "cite_1",
"source_type": "document",
"citation_type": "text",
"title": "Research Paper.pdf",
"content": "The main conclusion shows...",
"page_number": 15,
"confidence_score": 0.95
}
],
"confidence_score": 0.87,
"processing_time_ms": 2300
}
```

#### Citation Details
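The citation objects in the response above map naturally to a small typed model. A dataclass sketch with fields taken from the sample payload (the backend's actual schemas live under `app/models/` and are likely Pydantic; the extra `source_type` values are assumptions):

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Citation:
    """Mirrors the citation objects in the /query response above."""
    id: str
    source_type: str       # "document" in the sample; "web"/"drive" assumed
    citation_type: str     # "text" or "image"
    title: str
    content: str
    confidence_score: float
    page_number: Optional[int] = None  # absent for non-document sources

raw = ('{"id": "cite_1", "source_type": "document", "citation_type": "text", '
       '"title": "Research Paper.pdf", "content": "The main conclusion shows...", '
       '"page_number": 15, "confidence_score": 0.95}')
cite = Citation(**json.loads(raw))
print(cite.title, cite.page_number)
```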
```http
GET /citation/{citation_id}
```
Retrieve full content and metadata for a specific citation.

### WebSocket Endpoints
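Server-side, each socket boils down to a JSON message dispatcher keyed on the `type` field. A minimal sketch of the chat-style protocol (handler names and reply shapes are illustrative, not the repo's actual code, which lives under `backend/app/websocket/`):

```python
import json

def handle_message(raw: str) -> dict:
    """Dispatch an incoming chat-socket frame by its 'type' field."""
    msg = json.loads(raw)
    if msg.get("type") == "query":
        # In the real service this would trigger retrieval + generation.
        return {"type": "answer",
                "session_id": msg.get("session_id"),
                "text": f"received: {msg['message']}"}
    return {"type": "error", "detail": f"unknown type {msg.get('type')!r}"}

reply = handle_message(json.dumps(
    {"type": "query", "message": "hello", "session_id": "session_123"}))
print(reply["type"])
```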
#### Speech-to-Text
```javascript
const ws = new WebSocket('ws://localhost:8000/ws/stt');

// Send audio data
ws.send(audioBuffer);

// Receive transcription
ws.onmessage = (event) => {
  const result = JSON.parse(event.data);
  console.log(result.text, result.confidence);
};
```

#### Real-time Chat
```javascript
const ws = new WebSocket('ws://localhost:8000/ws/chat');

// Send message
ws.send(JSON.stringify({
  type: 'query',
  message: 'Hello, how can you help me?',
  session_id: 'session_123'
}));
```

### Frontend API Client
The frontend includes a comprehensive API client in `src/lib/api.ts`:
```typescript
import { api } from '@/lib/api';
import { QueryRequest, QueryResponse } from '@/types/api';

// Query the system
const response = await api.query({
  query: 'What is machine learning?',
  num_results: 5
});

// Upload document
const result = await api.uploadDocument(file);

// Get citation details
const citation = await api.getCitation(citationId);
```

## Usage Examples
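The flows below can also be driven programmatically. A stdlib-only Python sketch against the `POST /query` endpoint documented above (`ask` assumes the backend from the setup steps is running; only the payload builder runs here):

```python
import json
import urllib.request

def build_query_payload(query: str, num_results: int = 5,
                        include_web_search: bool = True) -> dict:
    """Assemble the request body documented for POST /query."""
    return {"query": query,
            "num_results": num_results,
            "include_web_search": include_web_search,
            "include_drive_search": True}

def ask(base_url: str, query: str) -> dict:
    """POST the payload to the running backend and return the parsed answer."""
    req = urllib.request.Request(
        f"{base_url}/query",
        data=json.dumps(build_query_payload(query)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(build_query_payload("What is machine learning?"))
```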
### Basic Text Query
1. Open the application at http://localhost:3000
2. Type your question in the chat input
3. View the AI-generated response with citations
4. Click citations to view source content

### Voice Query
1. Click the microphone icon in the chat interface
2. Speak your question clearly
3. Watch real-time transcription appear
4. Release to send the query
5. Receive voice-enabled response

### Document Upload & Analysis
1. Click the upload button or drag files into the interface
2. Select a PDF document (with images/charts)
3. Wait for processing to complete
4. Ask questions about the document content
5. View responses with page-specific citations

### Advanced Search Features
```bash
# Query with specific filters
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{
"query": "quarterly revenue trends",
"num_results": 10,
"include_web_search": true,
"filters": {
"date_range": "2023-2024",
"document_type": "financial"
}
}'
```

## Development
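The backend suite is plain pytest. As a self-contained illustration, here is the kind of unit test pytest collects, exercising a hypothetical overlap-chunking helper of the sort `MAX_TOKENS_PER_CHUNK` and `CHUNK_OVERLAP` configure:

```python
def chunk_text(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split a token list into windows of `size` sharing `overlap` tokens."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [words[i:i + size]
            for i in range(0, max(len(words) - overlap, 1), step)]

def test_chunks_share_overlap():
    words = [f"w{i}" for i in range(10)]
    chunks = chunk_text(words, size=4, overlap=2)
    assert chunks[0][-2:] == chunks[1][:2]   # consecutive chunks overlap
    assert all(len(c) <= 4 for c in chunks)  # no chunk exceeds the limit

test_chunks_share_overlap()  # pytest would collect and run this automatically
print("ok")
```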
### Running Tests
#### Backend Tests
```bash
cd backend
pytest -v
pytest --cov=app tests/ # With coverage
```

#### Frontend Tests
```bash
cd frontend
npm test
npm run test:watch # Watch mode
```

### Code Quality
#### Linting & Formatting
```bash
# Backend
cd backend
black .
flake8 .

# Frontend
cd frontend
npm run lint
npm run type-check
```

### Development Workflow
1. **Feature Development**
   - Create feature branch from `main`
   - Add tests for new functionality
   - Update documentation as needed

2. **Testing**
   - Run full test suite
   - Test with different AI providers
   - Verify WebSocket functionality

3. **Code Review**
   - Check API compatibility
   - Verify error handling
   - Test edge cases

### Project Structure
```
agnt/
├── backend/                # FastAPI backend
│   ├── app/
│   │   ├── config.py       # Configuration management
│   │   ├── models/         # Pydantic schemas
│   │   ├── services/       # Business logic
│   │   └── websocket/      # WebSocket handlers
│   ├── main.py             # FastAPI application
│   └── requirements.txt    # Python dependencies
├── frontend/               # Next.js frontend
│   ├── src/
│   │   ├── app/            # App router pages
│   │   ├── components/     # React components
│   │   ├── lib/            # Utilities and API client
│   │   └── store/          # State management
│   └── package.json        # Node.js dependencies
└── README.md               # This file
```

## Troubleshooting
### Common Issues
#### 1. Speech-to-Text Not Working
```bash
# Check Google Cloud credentials
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Verify API is enabled
gcloud services list --enabled | grep speech

# Test authentication
python -c "from google.cloud import speech; print('Auth OK')"
```

#### 2. Vector Database Issues
```bash
# Reset ChromaDB
rm -rf backend/chroma_db/
# Restart backend to reinitialize
```

#### 3. CORS Errors
```bash
# Update CORS_ORIGINS in .env
CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
```

#### 4. WebSocket Connection Failed
```bash
# Check firewall settings
# Verify WebSocket URL in frontend config
# Check backend logs for connection errors
```

### Performance Optimization
#### 1. Slow Document Processing
- Reduce image resolution in processing
- Increase `MAX_CONCURRENT_REQUESTS`
- Use SSD storage for database

#### 2. High Memory Usage
- Adjust `EMBEDDING_BATCH_SIZE`
- Limit `MAX_TOKENS_PER_CHUNK`
- Monitor vector database size

#### 3. API Response Times
- Enable caching with Redis
- Optimize similarity threshold
- Use parallel search execution

### Debugging
#### Enable Debug Logging
```bash
# Backend
LOG_LEVEL=DEBUG

# Frontend
NEXT_PUBLIC_DEBUG=true
```

#### Monitor System Health
```bash
# Check service status
curl http://localhost:8000/health

# View logs
tail -f backend/app.log
```

## Deployment
### Production Deployment
#### Docker Deployment
```bash
# Build and run with Docker
docker-compose up --build -d
```

#### Environment Setup
```bash
# Production environment variables
DEBUG=false
RELOAD=false
LOG_LEVEL=INFO
CORS_ORIGINS=https://yourdomain.com
```

#### Performance Considerations
- Use PostgreSQL for metadata storage
- Implement Redis for caching
- Set up load balancing for multiple instances
- Configure CDN for static assets

### Security Checklist
- [ ] Change default SECRET_KEY
- [ ] Enable HTTPS in production
- [ ] Implement rate limiting
- [ ] Secure API endpoints
- [ ] Validate file uploads
- [ ] Monitor for suspicious activity

## Contributing
### Development Setup
1. Fork the repository
2. Create a feature branch
3. Install development dependencies
4. Run tests to ensure everything works
5. Make your changes
6. Add tests for new functionality
7. Submit a pull request

### Code Standards
- **Python**: Follow PEP 8, use type hints
- **TypeScript**: Use strict mode, proper interfaces
- **Documentation**: Update README for new features
- **Testing**: Maintain test coverage above 80%

### Reporting Issues
1. Check existing issues first
2. Provide detailed reproduction steps
3. Include system information
4. Add relevant logs and error messages

## License
MIT License - see [LICENSE](LICENSE) file for details.
## Support
### Getting Help
- **Documentation**: Check this README and API docs
- **Issues**: Create GitHub issue for bugs
- **Discussions**: Use GitHub Discussions for questions