An open API service indexing awesome lists of open source software.

https://github.com/cloaky233/rag_new

RAG with self correction, gap detection and query loops
https://github.com/cloaky233/rag_new

agentic-ai github-models rag retreival-augmented-generation

Last synced: 3 months ago
JSON representation

RAG with self correction, gap detection and query loops

Awesome Lists containing this project

README

          

# **Reflexion RAG Engine**






Last Commit
Top Language
Language Count



Built with the tools and technologies:



Markdown
Typer
TOML
.ENV
Rich
NumPy



Python
uv
SurrealDB
Pydantic
YAML


---

A production-ready Retrieval Augmented Generation (RAG) system with advanced self-correction, iterative refinement, and comprehensive web search integration. Built for complex reasoning tasks requiring multi-step analysis and comprehensive knowledge synthesis.

---

## šŸš€ Key Features

### 🧠 Advanced Reflexion Architecture
- **Self-Evaluation System**: Iterative cycles with confidence scoring and dynamic query refinement
- **Gap Detection**: Intelligent identification of missing information and knowledge gaps
- **Multi-Cycle Processing**: Automatic follow-up queries for comprehensive answers
- **Smart Decision Engine**: Four-tier framework (CONTINUE, COMPLETE, REFINE_QUERY, INSUFFICIENT_DATA)

### šŸ”„ Multi-LLM Orchestration
- **Specialized Model Allocation**: Dedicated models for generation, evaluation, and synthesis
- **Generation Model**: Meta-Llama-3.1-405B for primary answer generation
- **Evaluation Model**: Cohere-command-r for self-assessment and confidence scoring
- **Summary Model**: Meta-Llama-3.1-70B for final synthesis across cycles
- **40+ GitHub Models**: Access to the full GitHub Models ecosystem

### 🌐 Web Search Integration
- **Google Custom Search**: Real-time web search with configurable modes
- **Content Extraction**: Advanced web content extraction using Crawl4AI
- **Hybrid Retrieval**: Seamlessly combines vector store and web search results
- **Intelligent Filtering**: Content quality assessment and relevance scoring

### šŸš€ High-Performance Infrastructure
- **Azure AI Inference**: Superior semantic understanding with 3072-dimensional embeddings
- **SurrealDB Vector Store**: Native vector search with HNSW indexing for production scalability
- **Intelligent Memory Caching**: LRU-based cache with hit rate tracking
- **Streaming Architecture**: Real-time response streaming with progress indicators
- **Async Design**: Non-blocking operations throughout the pipeline

### šŸŽÆ Enterprise-Ready Features
- **YAML Prompt Management**: Template-based prompt system with versioning
- **Production Monitoring**: Comprehensive logging, error handling, and performance metrics
- **Modular Design**: Clean architecture with dependency injection and clear interfaces
- **Context-Aware Processing**: Dynamic retrieval scaling with intelligent context management
- **Error Resilience**: Graceful degradation to simpler RAG modes when reflexion fails

## šŸ“Š Performance Metrics

- **40%+ improvement** in answer comprehensiveness compared to traditional RAG
- **60%+ improvement** in semantic similarity accuracy with 3072D embeddings
- **25%+ performance boost** in vector search with SurrealDB HNSW indexing
- **Real-time web search** integration for up-to-date information
- **Sub-linear search** performance even with millions of documents

## šŸ›  Quick Start

### Prerequisites

- **Python 3.13+** with UV package manager (recommended)
- **GitHub Personal Access Token** with Models access
- **SurrealDB** instance (local or cloud)
- **Google Search API** credentials (optional, for web search)
- **8GB+ RAM** recommended for optimal performance

### Installation

```bash
# Clone the repository
git clone https://github.com/cloaky233/rag_new
cd rag_new

# Create virtual environment and install dependencies
uv venv && source .venv/bin/activate # Linux/macOS
# or .venv\Scripts\activate on Windows
uv sync

# Configure environment variables
cp .env.example .env
# Edit .env with your credentials

# Ingest documents
uv run rag.py ingest --docs_path=./docs

# Start interactive chat
uv run rag.py chat
```

## āš™ Configuration

Create a `.env` file in the project root:

```bash
# GitHub Models Configuration
GITHUB_TOKEN=your_github_pat_token_here
LLM_MODEL=meta/Meta-Llama-3.1-405B-Instruct
EVALUATION_MODEL=cohere/Cohere-command-r
SUMMARY_MODEL=meta/Meta-Llama-3.1-70B-Instruct

# Azure AI Inference Embeddings
EMBEDDING_MODEL=text-embedding-3-large
EMBEDDING_ENDPOINT=https://models.inference.ai.azure.com

# SurrealDB Configuration
SURREALDB_URL=wss://your-surreal-instance.surreal.cloud
SURREALDB_NS=rag
SURREALDB_DB=rag
SURREALDB_USER=your_username
SURREALDB_PASS=your_password

# Reflexion Settings
MAX_REFLEXION_CYCLES=3
CONFIDENCE_THRESHOLD=0.85
INITIAL_RETRIEVAL_K=3
REFLEXION_RETRIEVAL_K=5

# Web Search Configuration (Optional)
WEB_SEARCH_MODE=off # off, initial_only, every_cycle
GOOGLE_API_KEY=your_google_api_key
GOOGLE_CSE_ID=your_custom_search_engine_id

# Performance Settings
ENABLE_MEMORY_CACHE=true
MAX_CACHE_SIZE=100
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
```

## šŸŽ® Usage

### Command Line Interface

```bash
# Interactive chat with reflexion engine
uv run rag.py chat

# Ingest documents from a directory
uv run rag.py ingest --docs_path=/path/to/documents

# View current configuration
uv run rag.py config

# Delete all documents from vector store
uv run rag.py delete
```

### Programmatic Usage

```python
from src.rag.engine import RAGEngine
import asyncio

async def main():
# Initialize the RAG engine
engine = RAGEngine()

# Process a query with reflexion
response = ""
async for chunk in engine.query_stream("What are the benefits of renewable energy?"):
response += chunk.content
print(chunk.content, end="")

return response

# Run the async function
asyncio.run(main())
```

### Advanced Example with Metadata

```python
import asyncio
from src.rag.engine import RAGEngine

async def advanced_query():
engine = RAGEngine()

query = "Compare different machine learning approaches for natural language processing"

print("šŸ”„ Starting Reflexion Analysis...")
current_cycle = 0

async for chunk in engine.query_stream(query):
# Handle metadata
if chunk.metadata:
cycle = chunk.metadata.get("cycle_number", 1)
confidence = chunk.metadata.get("confidence_score", 0)

if cycle != current_cycle:
current_cycle = cycle
print(f"\n--- Cycle {cycle} (Confidence: {confidence:.2f}) ---")

# Print content
print(chunk.content, end="")

# Check for completion
if chunk.is_complete and chunk.metadata.get("reflexion_complete"):
stats = chunk.metadata
print(f"\n\nāœ… Analysis Complete!")
print(f"Total Cycles: {stats.get('total_cycles', 0)}")
print(f"Processing Time: {stats.get('total_processing_time', 0):.2f}s")
print(f"Final Confidence: {stats.get('final_confidence', 0):.2f}")

asyncio.run(advanced_query())
```

## šŸ— Architecture Overview

### Core Components

```
Reflexion RAG Engine
ā”œā”€ā”€ Generation Pipeline (Meta-Llama-405B)
│ ā”œā”€ā”€ Initial Response Generation
│ ā”œā”€ā”€ Context Retrieval & Web Search
│ └── Streaming Output
ā”œā”€ā”€ Evaluation System (Cohere-command-r)
│ ā”œā”€ā”€ Confidence Scoring
│ ā”œā”€ā”€ Gap Analysis
│ ā”œā”€ā”€ Follow-up Generation
│ └── Decision Classification
ā”œā”€ā”€ Memory Cache (LRU)
│ ā”œā”€ā”€ Query Caching
│ ā”œā”€ā”€ Hit Rate Tracking
│ └── Automatic Eviction
ā”œā”€ā”€ Web Search Engine
│ ā”œā”€ā”€ Google Custom Search
│ ā”œā”€ā”€ Content Extraction
│ ā”œā”€ā”€ Quality Assessment
│ └── Hybrid Retrieval
└── Decision Engine
ā”œā”€ā”€ CONTINUE (confidence < threshold)
ā”œā”€ā”€ REFINE_QUERY (specific gaps identified)
ā”œā”€ā”€ COMPLETE (high confidence ≄0.85)
└── INSUFFICIENT_DATA (knowledge base gaps)

Document Pipeline
ā”œā”€ā”€ Multi-format Loading (PDF, TXT, DOCX, MD, HTML)
ā”œā”€ā”€ Intelligent Chunking (1000 chars, 200 overlap)
ā”œā”€ā”€ Azure AI Embeddings (3072D vectors)
└── SurrealDB Storage (HNSW indexing)
```

### Reflexion Flow

```mermaid
graph TB
A[User Query] --> B[Initial Generation]
B --> C[Self-Evaluation]
C --> D{Confidence ≄ 0.85?}
D -->|Yes| E[Complete Response]
D -->|No| F[Gap Analysis]
F --> G[Generate Follow-up Queries]
G --> H[Enhanced Retrieval + Web Search]
H --> I[Synthesis Cycle]
I --> C
E --> J[Final Answer]
```

## šŸ“ Project Structure

```
rag/
ā”œā”€ā”€ src/ # Main source code
│ ā”œā”€ā”€ config/ # Configuration management
│ │ ā”œā”€ā”€ __init__.py
│ │ └── settings.py # Pydantic settings with env support
│ ā”œā”€ā”€ core/ # Core interfaces and exceptions
│ │ ā”œā”€ā”€ __init__.py
│ │ ā”œā”€ā”€ exceptions.py # Custom exception classes
│ │ └── interfaces.py # Abstract base classes
│ ā”œā”€ā”€ data/ # Document loading and processing
│ │ ā”œā”€ā”€ __init__.py
│ │ ā”œā”€ā”€ loader.py # Multi-format document loader
│ │ └── processor.py # Text chunking and preprocessing
│ ā”œā”€ā”€ embeddings/ # Embedding providers
│ │ ā”œā”€ā”€ __init__.py
│ │ └── github_embeddings.py # Azure AI Inference
│ ā”œā”€ā”€ llm/ # LLM interfaces and implementations
│ │ ā”œā”€ā”€ __init__.py
│ │ └── github_llm.py # GitHub Models integration
│ ā”œā”€ā”€ memory/ # Caching and memory management
│ │ ā”œā”€ā”€ __init__.py
│ │ └── cache.py # LRU cache for reflexion memory
│ ā”œā”€ā”€ rag/ # Main RAG engine
│ │ ā”œā”€ā”€ __init__.py
│ │ ā”œā”€ā”€ engine.py # Main RAG engine interface
│ │ └── reflexion_engine.py # Reflexion implementation
│ ā”œā”€ā”€ reflexion/ # Reflexion evaluation logic
│ │ ā”œā”€ā”€ __init__.py
│ │ └── evaluator.py # Smart evaluation and follow-up
│ ā”œā”€ā”€ utils/ # Utility functions
│ │ ā”œā”€ā”€ __init__.py
│ │ └── logging.py # Structured logging
│ ā”œā”€ā”€ vectorstore/ # Vector storage implementations
│ │ ā”œā”€ā”€ __init__.py
│ │ └── surrealdb_store.py # SurrealDB vector store
│ └── websearch/ # Web search integration
│ ā”œā”€ā”€ __init__.py
│ └── google_search.py # Google Search with content extraction
ā”œā”€ā”€ prompts/ # YAML prompt templates
│ ā”œā”€ā”€ __init__.py
│ ā”œā”€ā”€ manager.py # Prompt template manager
│ ā”œā”€ā”€ evaluation/ # Evaluation prompts
│ ā”œā”€ā”€ generation/ # Generation prompts
│ ā”œā”€ā”€ synthesis/ # Synthesis prompts
│ └── templates/ # Base templates
ā”œā”€ā”€ schema/ # SurrealDB schema definitions
│ ā”œā”€ā”€ documents.surql # Document table schema
│ ā”œā”€ā”€ web_search.surql # Web search results schema
│ └── *.surql # Database functions
ā”œā”€ā”€ Documentation/ # Comprehensive documentation
ā”œā”€ā”€ rag.py # Main CLI entry point
ā”œā”€ā”€ pyproject.toml # Project dependencies and metadata
ā”œā”€ā”€ .env.example # Example environment configuration
└── README.md # This file
```

## šŸ”§ Advanced Configuration

### Model Selection

```python
# Generation Models (Primary Response)
LLM_MODEL=meta/Meta-Llama-3.1-405B-Instruct # High-quality generation
LLM_MODEL=meta/Meta-Llama-3.1-70B-Instruct # Balanced performance
LLM_MODEL=microsoft/Phi-3-mini-4k-instruct # Fast responses

# Evaluation Models (Self-Assessment)
EVALUATION_MODEL=cohere/Cohere-command-r # Recommended
EVALUATION_MODEL=mistralai/Mistral-7B-Instruct-v0.3

# Summary Models (Final Synthesis)
SUMMARY_MODEL=meta/Meta-Llama-3.1-70B-Instruct
SUMMARY_MODEL=meta/Meta-Llama-3.1-8B-Instruct
```

### Performance Tuning

```bash
# Reflexion Parameters
MAX_REFLEXION_CYCLES=3 # Faster responses
MAX_REFLEXION_CYCLES=5 # More comprehensive answers
CONFIDENCE_THRESHOLD=0.7 # Lower threshold for completion
CONFIDENCE_THRESHOLD=0.9 # Higher quality requirement

# Retrieval Configuration
INITIAL_RETRIEVAL_K=3 # Documents for first cycle
REFLEXION_RETRIEVAL_K=5 # Documents for follow-up cycles

# Web Search Configuration
WEB_SEARCH_MODE=off # Disable web search
WEB_SEARCH_MODE=initial_only # Search only on first cycle
WEB_SEARCH_MODE=every_cycle # Search on every cycle

# Memory Management
ENABLE_MEMORY_CACHE=true # Enable LRU caching
MAX_CACHE_SIZE=1000 # Cache size (adjust for RAM)
```

## šŸ“ˆ Monitoring and Performance

### Real-time Metrics

- **Reflexion Cycles**: Track iteration count and decision points
- **Confidence Scoring**: Monitor answer quality and completion confidence
- **Memory Cache**: Hit rates and performance improvements
- **Processing Time**: End-to-end response time analysis
- **Web Search**: Integration success and content quality
- **Vector Search**: SurrealDB query performance and indexing efficiency

### Performance Dashboard

```python
# Get comprehensive engine statistics
from src.rag.engine import RAGEngine

engine = RAGEngine()
engine_info = engine.get_engine_info()

print(f"Engine Type: {engine_info['engine_type']}")
print(f"Max Cycles: {engine_info['max_reflexion_cycles']}")
print(f"Memory Cache: {engine_info['memory_cache_enabled']}")

if 'memory_stats' in engine_info:
memory = engine_info['memory_stats']
print(f"Cache Hit Rate: {memory.get('hit_rate', 0):.2%}")
print(f"Cache Size: {memory.get('size', 0)}/{memory.get('max_size', 0)}")
```

## šŸ” Web Search Integration

### Google Custom Search Setup

1. **Create Google Cloud Project**: Enable Custom Search API
2. **Create Custom Search Engine**: Configure search scope and preferences
3. **Get API Credentials**: Obtain API key and Custom Search Engine ID
4. **Configure Environment**: Add credentials to `.env` file

### Web Search Modes

- **OFF**: Traditional RAG without web search
- **INITIAL_ONLY**: Web search only on the first reflexion cycle
- **EVERY_CYCLE**: Web search on every reflexion cycle for maximum coverage

### Content Extraction

- **Crawl4AI Integration**: Advanced web content extraction
- **Quality Assessment**: Content validation and filtering
- **Smart Truncation**: Token-aware content limiting
- **Error Handling**: Graceful fallback to snippets

## šŸš€ Deployment

### Local Development

```bash
# Clone and setup
git clone https://github.com/cloaky233/rag_new
cd rag_new
uv venv && source .venv/bin/activate
uv sync

# Configure environment
cp .env.example .env
# Edit .env with your credentials

# Start development
uv run rag.py chat
```

### Production Deployment

```bash
# Using Docker
docker build -t reflexion-rag .
docker run --env-file .env -v $(pwd)/docs:/app/docs reflexion-rag

# Using systemd service
sudo cp reflexion-rag.service /etc/systemd/system/
sudo systemctl enable reflexion-rag
sudo systemctl start reflexion-rag
```

## šŸ¤ Contributing

We welcome contributions! Areas for improvement:

- **Additional LLM Providers**: Support for more model providers
- **Vector Stores**: Alternative vector storage backends
- **Web Search**: Additional search engines and providers
- **Performance**: Optimization and caching improvements
- **UI/UX**: Web interface and visualization tools

## šŸ“š Documentation

- [Installation Guide](Documentation/installation.md) - Detailed setup instructions
- [API Documentation](Documentation/api.md) - Programming interface reference
- [Configuration Guide](Documentation/configuration.md) - Advanced configuration options
- [Performance Guide](Documentation/performance.md) - Optimization and tuning
- [Troubleshooting](Documentation/troubleshooting.md) - Common issues and solutions

## šŸ›£ Roadmap

- **Model Context Protocol (MCP)**: AI-powered document ingestion
- **Advanced Web Search**: Multi-engine search with fact checking
- **Rust Performance**: High-performance Rust extensions
- **Modern Web Interface**: React/Vue.js frontend with FastAPI backend

See [ROADMAP.md](ROADMAP.md) for detailed future plans.

## šŸ“„ License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## šŸ™ Acknowledgments

Built with:
- [GitHub Models](https://github.com/features/models) - AI model infrastructure
- [Azure AI Inference](https://azure.microsoft.com/products/ai-services) - High-quality embeddings
- [SurrealDB](https://surrealdb.com/) - Modern database for vector operations
- [Crawl4AI](https://github.com/unclecode/crawl4ai) - Web content extraction

## šŸ‘Øā€šŸ’» Author

**Lay Sheth** ([@cloaky233](https://github.com/cloaky233))
- AI Engineer & Enthusiast
- B.Tech Computer Science Student at VIT Bhopal
- Portfolio: [cloaky.works](https://cloaky.works)

## šŸ“ž Support

- šŸ› [Report Issues](https://github.com/cloaky233/rag_new/issues)
- šŸ’¬ [GitHub Discussions](https://github.com/cloaky233/rag_new/discussions)
- šŸ“§ Email: laysheth1@gmail.com
- šŸ’¼ LinkedIn: [cloaky233](https://linkedin.com/in/cloaky233)

---

*Production-ready RAG with human-like iterative reasoning, real-time web search, and enterprise-grade vector storage.*