https://github.com/bjornmelin/enhanced-mem-vector-rag
⚡ Developer-friendly hybrid-RAG toolkit merging Graphiti, Qdrant, mem0, LlamaIndex, and LangChain into one powerful engine.
- Host: GitHub
- URL: https://github.com/bjornmelin/enhanced-mem-vector-rag
- Owner: BjornMelin
- License: mit
- Created: 2025-05-06T05:48:28.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-05-28T16:10:09.000Z (5 months ago)
- Last Synced: 2025-06-23T20:49:36.483Z (4 months ago)
- Language: Python
- Size: 461 KB
- Stars: 6
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# Enhanced Memory Vector RAG
⚡ Developer-friendly hybrid-RAG toolkit merging Graphiti, Qdrant, mem0, LlamaIndex, and LangChain into one powerful engine.
This project builds a knowledge retrieval system that integrates Knowledge-Augmented Generation (KAG) with traditional RAG. It combines Graphiti's graph layer, Qdrant's vector search, and mem0's persistent memory, all accessible through LlamaIndex and LangChain interfaces, for applications that need both factual accuracy and contextual understanding.
## Table of Contents
- [Enhanced Memory Vector RAG](#enhanced-memory-vector-rag)
- [Table of Contents](#table-of-contents)
- [Overview](#overview)
- [Features](#features)
- [Architecture](#architecture)
- [Layered Architecture](#layered-architecture)
- [Comprehensive System Architecture](#comprehensive-system-architecture)
- [Data Flow](#data-flow)
- [MCP Interaction Flow](#mcp-interaction-flow)
- [Getting Started](#getting-started)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Local Development](#local-development)
- [Docker Deployment](#docker-deployment)
- [Quick Start](#quick-start)
- [Components](#components)
- [Memory System (mem0)](#memory-system-mem0)
- [Graph Knowledge Base (Graphiti/Neo4j)](#graph-knowledge-base-graphitineo4j)
- [Vector Storage (Qdrant)](#vector-storage-qdrant)
- [Framework Integration (LlamaIndex \& LangChain)](#framework-integration-llamaindex--langchain)
- [Usage Examples](#usage-examples)
- [Configuration](#configuration)
- [Benchmarks](#benchmarks)
- [Roadmap](#roadmap)
- [Contributing](#contributing)
- [How to Cite](#how-to-cite)
- [License](#license)
- [Acknowledgements](#acknowledgements)
- [Custom MCP Server Implementation](#custom-mcp-server-implementation)
- [Key MCP Endpoints](#key-mcp-endpoints)
- [Claude Code Development](#claude-code-development)
- [Deployment](#deployment)
- [Docker Components](#docker-components)
- [Deployment Options](#deployment-options)
- [Local Deployment](#local-deployment)
- [Using Makefile](#using-makefile)
- [Security](#security)
- [Monitoring \& Observability](#monitoring--observability)
- [Backup \& Restore](#backup--restore)

## Overview
Enhanced Memory Vector RAG (EMVR) is a framework that combines several retrieval methodologies into a single, contextually aware knowledge system. By integrating graph-based Knowledge-Augmented Generation (KAG) with traditional vector-based Retrieval-Augmented Generation (RAG), EMVR aims to improve accuracy on complex knowledge retrieval tasks.
The system leverages:
- **Graphiti/Neo4j** for structured knowledge representation and graph traversal
- **Qdrant** for efficient vector similarity search
- **mem0** for persistent memory and context management
- **LlamaIndex & LangChain** for flexible orchestration and agent-based workflows

## Features
- 🔄 **Hybrid Retrieval System** - Combines vector similarity search with graph-based knowledge retrieval
- 🧠 **Persistent Memory** - Maintains context and relationships across sessions
- 🔍 **Multi-modal Search** - Query across different data types and structures
- 🔗 **Knowledge Graph Integration** - Leverages structured relationships for improved context
- 🚀 **Framework Flexibility** - Works with both LlamaIndex and LangChain
- 📊 **Extensible Architecture** - Easy to customize and extend for specific use cases
- 🛠️ **Developer-Friendly APIs** - Simple interfaces for complex retrieval operations
- 📈 **Performance Optimization** - Efficient retrieval strategies for reduced latency
- 🐳 **Docker Deployment** - Containerized architecture for easy deployment

## Architecture
EMVR implements a comprehensive layered architecture integrating multiple components for advanced retrieval:
### Layered Architecture
```mermaid
graph TD
    subgraph "Application Layer"
        QueryInterface("Query Interfaces")
        ResponseGen("Response Generation")
        AgentWorkflows("Custom Agent Workflows")
        MCP("Model Context Protocol (MCP)")
    end

    subgraph "Orchestration Layer"
        HybridManager("Hybrid Retrieval Manager")
        ContextFusion("Context Fusion Engine")
        GraphTraversal("Knowledge Graph Traversal")
        LangGraph("LangGraph Orchestration")
    end

    subgraph "Integration Layer"
        LlamaIndexConn("LlamaIndex Connectors")
        LangChainComp("LangChain Components")
        FastEmbed("FastEmbed Integration")
        FastMCP("FastMCP Framework")
    end

    subgraph "Storage Layer"
        Qdrant("Vector Database (Qdrant)")
        Neo4j("Graph Database (Neo4j/Graphiti)")
        Mem0("Memory System (mem0)")
        Supabase("Metadata Storage (Supabase)")
    end

    QueryInterface --> HybridManager
    ResponseGen --> ContextFusion
    AgentWorkflows --> GraphTraversal
    AgentWorkflows --> HybridManager
    MCP --> FastMCP

    LangGraph --> LangChainComp
    HybridManager --> LlamaIndexConn
    HybridManager --> LangChainComp
    ContextFusion --> LlamaIndexConn
    ContextFusion --> LangChainComp
    GraphTraversal --> LlamaIndexConn
    FastMCP --> LlamaIndexConn

    FastEmbed -.-> Qdrant
    LlamaIndexConn --> Qdrant
    LlamaIndexConn --> Neo4j
    LlamaIndexConn --> Mem0
    LlamaIndexConn --> Supabase
    LangChainComp --> Qdrant
    LangChainComp --> Neo4j
    LangChainComp --> Mem0
    LangChainComp --> Supabase
```

### Comprehensive System Architecture
```mermaid
graph TB
    User([User]) <--> ClaudeCode["Claude Code & MCP Tools"]
    ClaudeCode <--> CustomMCP["Custom 'memory' MCP Server\n(FastMCP Framework)"]
    ClaudeCode <--> ExternalMCP["External MCP Servers\n(tavily, firecrawl, context7, etc.)"]

    subgraph "Agent System"
        LangGraph["LangGraph\n(Agent Orchestration)"]
        LangChain["LangChain\n(Agent Tools & Planning)"]
        Agents["Specialized Agents\n(Supervisor-Worker Pattern)"]

        LangGraph --> LangChain
        LangGraph --> Agents
    end

    subgraph "RAG Framework"
        LlamaIndex["LlamaIndex\n(Core RAG Framework)"]
        QueryEngines["Query Engines\n(Vector, Graph, Hybrid)"]
        Retrievers["Specialized Retrievers"]
        DataLoaders["Data Loaders & Indexers"]

        LlamaIndex --> QueryEngines
        LlamaIndex --> Retrievers
        LlamaIndex --> DataLoaders
    end

    subgraph "Memory & Storage"
        Qdrant[(Qdrant\nVector Store)]
        Neo4j[(Neo4j\nGraph Database)]
        Mem0["Mem0\n(Memory Interface)"]
        Graphiti["Graphiti\n(Graph Interface)"]
        Supabase[(Supabase\nMetadata & Documents)]
        S3[(AWS S3\nOriginal Documents)]

        Mem0 -.-> Qdrant
        Graphiti -.-> Neo4j
    end

    subgraph "Embedding & Ingestion"
        FastEmbed["FastEmbed\nEmbedding Generation"]
        WebCrawlers["Web Crawlers\n(Crawl4AI, Firecrawl)"]
        ConnectorAPIs["Connector APIs\n(GitHub, Reddit, etc.)"]

        FastEmbed --> Qdrant
        WebCrawlers --> DataLoaders
        ConnectorAPIs --> DataLoaders
    end

    CustomMCP <--> LangGraph
    CustomMCP <--> LlamaIndex

    LangGraph <--> LlamaIndex
    LlamaIndex <--> Qdrant
    LlamaIndex <--> Neo4j
    LlamaIndex <--> Mem0
    LlamaIndex <--> Graphiti
    LlamaIndex <--> Supabase

    DataLoaders --> S3
    DataLoaders --> Supabase
    DataLoaders --> Qdrant
    DataLoaders --> Neo4j

    style CustomMCP fill:#f9d6ff,stroke:#9333ea,stroke-width:2px
    style LlamaIndex fill:#d1fae5,stroke:#059669,stroke-width:2px
    style LangGraph fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
    style Qdrant fill:#fee2e2,stroke:#ef4444,stroke-width:2px
    style Neo4j fill:#ffedd5,stroke:#f97316,stroke-width:2px
```

## Data Flow
```mermaid
flowchart LR
    classDef userInteraction fill:#f9d6ff,stroke:#9333ea,stroke-width:2px
    classDef retrieval fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
    classDef processing fill:#d1fae5,stroke:#059669,stroke-width:2px
    classDef storage fill:#fee2e2,stroke:#ef4444,stroke-width:2px
    classDef fusion fill:#ffedd5,stroke:#f97316,stroke-width:2px

    Input("User Query/Task") --> ClaudeCode("Claude Code\nMCP Interface")
    ClaudeCode --> MemoryMCP("Custom 'memory'\nMCP Server")

    MemoryMCP --> Agent("Agent System\n(LangChain/LangGraph)")
    Agent --> VR("Vector Retrieval\n(Qdrant via LlamaIndex)")
    Agent --> GR("Graph Retrieval\n(Neo4j/Graphiti via LlamaIndex)")
    Agent --> MR("Memory Retrieval\n(mem0)")
    Agent --> WS("Web Search\n(Tavily/Firecrawl)")

    VR --> CF("Context Fusion\n(LlamaIndex Orchestration)")
    GR --> CF
    MR --> CF
    WS --> CF

    CF --> QP("Query Planning\n(LangGraph)")
    QP --> RT("Response Templates")

    CF --> LLM("Large Language Model")
    RT --> LLM
    LLM --> Response("Enhanced Response")

    Response --> MemUpdate("Memory Update\n(mem0)")
    Response --> KGUpdate("Knowledge Graph Update\n(Neo4j)")
    Response --> MetaUpdate("Metadata Update\n(Supabase)")

    Response --> ClaudeCode
    ClaudeCode --> User([User])

    class Input,ClaudeCode,User userInteraction
    class VR,GR,MR,WS retrieval
    class QP,RT,LLM processing
    class MemUpdate,KGUpdate,MetaUpdate storage
    class CF fusion
```

### MCP Interaction Flow
```mermaid
sequenceDiagram
    participant User
    participant Claude as Claude Code
    participant Memory as custom 'memory' MCP
    participant External as External MCP Servers
    participant LlamaIdx as LlamaIndex
    participant Storage as Storage Systems

    User->>Claude: Query or Task
    Claude->>Memory: memory.read_graph()
    Memory->>LlamaIdx: Query through LlamaIndex
    LlamaIdx->>Storage: Fetch from Qdrant/Neo4j/Supabase
    Storage-->>LlamaIdx: Return relevant data
    LlamaIdx-->>Memory: Process & return results
    Memory-->>Claude: Return graph state

    Claude->>External: context7.get_library_docs()
    External-->>Claude: Return documentation

    Note over Claude,Memory: Agent Planning & Execution
    Claude->>Memory: Execute retrieval/update
    Memory->>LlamaIdx: Orchestrate operations
    LlamaIdx->>Storage: Execute operations
    Storage-->>LlamaIdx: Return operation results
    LlamaIdx-->>Memory: Process & return results
    Memory-->>Claude: Return operation status/results

    Claude->>Memory: memory.add_observations()
    Memory->>Storage: Update memory state

    Claude-->>User: Deliver response/results
```

## Getting Started
### Prerequisites
- Python 3.11+
- Docker (recommended for Neo4j, Qdrant, and Supabase)
- `uv` for Python package management
- Basic understanding of RAG systems

### Installation
#### Local Development
```bash
# Clone the repository
git clone https://github.com/BjornMelin/enhanced-mem-vector-rag.git
cd enhanced-mem-vector-rag

# Install dependencies using uv
uv pip install -r requirements.txt
```

#### Docker Deployment
```bash
# Navigate to deployment directory
cd emvr/deployment

# Setup environment
./setup_local.sh

# Start services
docker compose up -d
```

### Quick Start
```python
from emvr import EmvrSystem

# Initialize the system
system = EmvrSystem()

# Load data
system.load_documents("path/to/documents")
system.build_knowledge_graph()

# Query the system
response = system.query("What is the relationship between X and Y?")
print(response)
```

## Components
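As orientation for the components described in this section, it can help to see how ranked results from several backends might be merged into one list. This README does not state which fusion method EMVR uses; the sketch below swaps in reciprocal rank fusion (RRF), a common general-purpose technique, purely for illustration. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the document IDs are assumptions and are not taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists; each list adds 1 / (k + rank) to an item's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ranked results from three independent retrievers.
vector_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. vector similarity search
graph_hits = ["doc_b", "doc_d"]            # e.g. graph traversal
memory_hits = ["doc_b", "doc_a"]           # e.g. memory lookup

fused = reciprocal_rank_fusion([vector_hits, graph_hits, memory_hits])
print(fused)
```

Items that appear near the top of several lists (here `doc_b`) rise to the front, which is the behaviour a hybrid retriever generally wants. RRF only needs ranks, so the score scales of the individual backends do not have to be comparable.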
### Memory System (mem0)
The memory component leverages mem0 to maintain persistent context across queries and sessions. This allows the system to:
- Remember previous interactions
- Build cumulative knowledge
- Maintain entity relationships
- Support temporal reasoning

```mermaid
graph LR
    Query("User Query") --> Memory("mem0 Memory System")
    Memory --> Scoring("Relevance Scoring")
    Memory --> Personalization("Personalization Layer")
    Memory --> Context("Contextual History")

    Scoring --> Retrieval("Enhanced Retrieval")
    Personalization --> Retrieval
    Context --> Retrieval

    Retrieval --> LLM("Large Language Model")
    LLM --> Response("Enhanced Response")
    Response --> Memory
```

### Graph Knowledge Base (Graphiti/Neo4j)
The graph component uses Graphiti with Neo4j to:
- Store structured relationships between entities
- Enable complex traversal queries
- Support reasoning about interconnected concepts
- Provide explicit knowledge paths

```mermaid
graph TD
    subgraph "Knowledge Graph (Neo4j/Graphiti)"
        Entity1("Entity A")
        Entity2("Entity B")
        Entity3("Entity C")
        Entity4("Entity D")

        Entity1 -- "relates_to" --> Entity2
        Entity2 -- "depends_on" --> Entity3
        Entity1 -- "creates" --> Entity4
        Entity3 -- "part_of" --> Entity4
    end

    Query("Knowledge Query") --> GraphTraversal("Graph Traversal (Graphiti)")
    GraphTraversal --> Neo4j("Neo4j Database")
    Neo4j --> Results("Structured Results")
    Results --> LLM("LLM for Reasoning")
```

### Vector Storage (Qdrant)
The vector component uses Qdrant to:
- Store and retrieve document embeddings
- Perform efficient similarity search
- Support semantic matching
- Handle large-scale vector operations

```mermaid
graph TD
    Documents["Input Documents"] --> TextChunker["Text Chunker"]
    TextChunker --> EmbeddingGen["Embedding Generation"]
    EmbeddingGen --> VectorDB["Qdrant Vector Database"]

    Query["User Query"] --> QueryEmbed["Query Embedding"]
    QueryEmbed --> SearchVec["Vector Search"]
    SearchVec --> VectorDB
    VectorDB --> TopMatches["Top K Matches"]
    TopMatches --> Reranker["Reranker"]
    Reranker --> ContextGen["Context Generation"]
```

### Framework Integration (LlamaIndex & LangChain)
EMVR integrates with both major RAG frameworks:
- **LlamaIndex** - For advanced indexing and retrieval operations
- **LangChain** - For agent-based workflows and tool integration

```mermaid
graph TD
    subgraph "LlamaIndex Integration"
        Docs[("Documents")] --> Loaders["Data Loaders"]
        Loaders --> Indexing["Indexing Pipelines"]
        Indexing --> QueryEngines["Query Engines"]
        QueryEngines --> RetFramework["Retrieval Framework"]
    end

    subgraph "LangChain Integration"
        Agents["Agent Framework"] --> Planning["Planning Modules"]
        Planning --> Tools["Tool Integration"]
        Tools --> Memory["Memory Components"]
        Memory --> Callbacks["Callback Handlers"]
    end

    RetFramework <--> Tools
    QueryEngines <--> Agents
```

## Usage Examples
Examples are coming soon. They will demonstrate:
- Basic RAG workflows
- Knowledge graph integration
- Multi-hop reasoning
- Custom retrieval strategies
- Agent-based applications

## Configuration
EMVR can be configured through:
- Configuration files
- Environment variables
- Programmatic settings

Detailed configuration options will be provided in the upcoming documentation.
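Until those options are documented, the following is only a sketch of how environment variables and programmatic settings could be layered on top of defaults. The `Settings` class, its fields, the `EMVR_` variable prefix, and `load_settings` are all hypothetical and are not taken from the project; the default URLs simply use the standard local ports for Qdrant and Neo4j.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields and defaults, for illustration only.
    qdrant_url: str = "http://localhost:6333"
    neo4j_uri: str = "bolt://localhost:7687"
    top_k: int = 5


def load_settings(env=None, **overrides):
    """Precedence: explicit overrides > EMVR_* environment variables > defaults."""
    env = os.environ if env is None else env
    values = {}
    for field in fields(Settings):
        raw = env.get(f"EMVR_{field.name.upper()}")
        if raw is not None:
            # Environment values are strings; cast to the type of the default.
            values[field.name] = type(field.default)(raw)
    values.update(overrides)
    return Settings(**values)


# A dict stands in for the process environment so the example is reproducible.
settings = load_settings(env={"EMVR_TOP_K": "10"}, neo4j_uri="bolt://graph:7687")
print(settings)
```

Keeping the precedence order in one function makes it easy to see which source wins when the same option is set in more than one place.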
## Benchmarks
Performance benchmarks comparing EMVR to traditional RAG systems will be available soon.
## Roadmap
- [x] Initial release with core functionality
- [x] Basic documentation
- [x] Agent orchestration implementation
- [x] UI implementation with Chainlit
- [x] Docker containerization and deployment
- [ ] Comprehensive documentation
- [ ] Performance benchmarks
- [ ] Advanced examples
- [ ] Cloud deployment guides
- [ ] Additional vector database integrations
- [ ] Custom agent templates

## Contributing
Contributions are welcome! Please see the [CONTRIBUTING.md](CONTRIBUTING.md) file for guidelines.
## How to Cite
If you use EMVR in your research, please cite:
```bibtex
@software{emvr2025,
author = {Melin, Bjorn},
title = {Enhanced Memory Vector RAG: A Hybrid Retrieval Framework},
year = {2025},
url = {https://github.com/BjornMelin/enhanced-mem-vector-rag},
version = {0.1.0}
}
```

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgements
- [Graphiti](https://github.com/getzep/graphiti) for Neo4j integration
- [Qdrant](https://github.com/qdrant/qdrant) for vector database capabilities
- [mem0](https://github.com/mem0ai/mem0) for memory systems
- [LlamaIndex](https://github.com/run-llama/llama_index) for indexing frameworks
- [LangChain](https://github.com/langchain-ai/langchain) for agent orchestration

## Custom MCP Server Implementation
This project implements a custom `memory` MCP server using the FastMCP framework that serves as the central interface between Claude Code and the system's backend components:
```mermaid
flowchart TD
    classDef mcp fill:#f9d6ff,stroke:#9333ea,stroke-width:2px
    classDef frameworks fill:#d1fae5,stroke:#059669,stroke-width:2px
    classDef storage fill:#fee2e2,stroke:#ef4444,stroke-width:2px

    Claude([Claude Code]) --> MCP["Custom 'memory' MCP Server\n(FastMCP Framework)"]
    subgraph "MCP Endpoints"
        SearchHybrid["/search.hybrid"]
        GraphQuery["/graph.query"]
        MemoryOps["/memory.*"]
        RulesValidate["/rules.validate"]
        IngestOps["/ingest.*"]
    end

    MCP --> SearchHybrid
    MCP --> GraphQuery
    MCP --> MemoryOps
    MCP --> RulesValidate
    MCP --> IngestOps

    SearchHybrid --> LlamaIndex["LlamaIndex\nRAG Orchestration"]
    GraphQuery --> LlamaIndex
    MemoryOps --> LlamaIndex
    RulesValidate --> APOC["Neo4j APOC\nRules Engine"]
    IngestOps --> LlamaIndex

    LlamaIndex --> Qdrant[(Qdrant)]
    LlamaIndex --> Neo4j[(Neo4j)]
    LlamaIndex --> Supabase[(Supabase)]
    APOC --> Neo4j

    Mem0["Mem0 SDK"] --> Qdrant
    Graphiti["Graphiti Client"] --> Neo4j

    class MCP,SearchHybrid,GraphQuery,MemoryOps,RulesValidate,IngestOps mcp
    class LlamaIndex,APOC,Mem0,Graphiti frameworks
    class Qdrant,Neo4j,Supabase storage
```

### Key MCP Endpoints
| Endpoint | Description | Implementation |
| ----------------- | ----------------------------------------------------- | ------------------------------------------------------------------------------------ |
| `/search.hybrid` | Performs hybrid search across vector and graph stores | Uses LlamaIndex for orchestrating hybrid search across Qdrant and Neo4j |
| `/graph.query` | Executes knowledge graph queries | Translates natural language to Cypher using LlamaIndex's `KnowledgeGraphQueryEngine` |
| `/memory.*` | Operations for memory management | Includes CRUD operations for graph entities and observations |
| `/rules.validate` | Validates operations against defined rules | Uses Neo4j APOC for rule enforcement |
| `/ingest.*`       | Handles data ingestion from various sources           | Utilizes LlamaIndex data loaders and FastEmbed for embedding generation              |

## Claude Code Development
This project provides a detailed development guide for Claude Code users. The guide includes:
- Project overview and technical architecture
- Development workflow and memory protocol
- Coding standards and practices
- Git workflow
- MCP server documentation and usage
- Key architectural components and their roles

For Claude Code development, please refer to [CLAUDE.md](CLAUDE.md) for comprehensive guidelines.
## Deployment
The project includes a complete deployment system using Docker Compose:
### Docker Components
- **MCP Server**: FastAPI server implementing the Model Context Protocol
- **Chainlit UI**: Web interface for user interaction
- **Qdrant**: Vector database for semantic search
- **Neo4j**: Graph database for knowledge graphs
- **Supabase**: PostgreSQL for structured data and metadata
- **Grafana/Prometheus**: Monitoring and observability

### Deployment Options
#### Local Deployment
```bash
# Navigate to deployment directory
cd emvr/deployment

# Set up environment
./setup_local.sh

# Start services using docker-compose
docker compose up -d
```

#### Using Makefile
```bash
cd emvr/deployment
make setup # Run setup script
make up # Start all services
```

### Security
The deployment includes comprehensive security features:
- JWT-based authentication
- Role-Based Access Control (RBAC)
- Secure environment variable management
- Container-based isolation

### Monitoring & Observability
Access system metrics and logs through:
- Grafana dashboard
- Prometheus metrics

### Backup & Restore
The system includes scripts for data backup and restoration:
```bash
# Create backup
./scripts/backup.sh

# Restore from backup
./scripts/restore.sh ./backups/emvr_backup_20250506_120000.tar.gz
```

For detailed deployment instructions, see the [deployment README](emvr/deployment/README.md).
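The archive name in the restore example above appears to embed a `YYYYMMDD_HHMMSS` timestamp. Assuming that naming convention holds (it is inferred from that single example and is not documented here), a small helper could pick the most recent archive to pass to `restore.sh`. The function name `latest_backup` and the sample file list are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the example name emvr_backup_20250506_120000.tar.gz
BACKUP_RE = re.compile(r"emvr_backup_(\d{8}_\d{6})\.tar\.gz$")


def latest_backup(names):
    """Return the name carrying the newest embedded timestamp, or None."""
    dated = []
    for name in names:
        match = BACKUP_RE.search(name)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
            dated.append((stamp, name))
    # Tuples compare by timestamp first, so max() yields the newest archive.
    return max(dated)[1] if dated else None


names = [
    "emvr_backup_20250506_120000.tar.gz",
    "emvr_backup_20250507_083000.tar.gz",
    "notes.txt",  # ignored: does not match the assumed pattern
]
print(latest_backup(names))
```

Parsing the timestamp instead of relying on file modification times keeps the choice stable when archives are copied between machines.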