{"id":30191159,"url":"https://github.com/jbenshetler/mcp-ragex","last_synced_at":"2026-04-05T21:03:26.029Z","repository":{"id":309438103,"uuid":"1020167184","full_name":"jbenshetler/mcp-ragex","owner":"jbenshetler","description":"MCP server for intelligent code search: semantic (RAG), symbolic (tree-sitter), and regex (ripgrep) search modes. Built for Claude Code and AI coding assistants.","archived":false,"fork":false,"pushed_at":"2025-09-10T19:28:23.000Z","size":1829,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-10T19:51:31.535Z","etag":null,"topics":["ai-tools","chromadb","claude-code","code-analysis","code-search","coding-assistant","developer-tools","embeddings","mcp","mcp-server","model-context-protocol-servers","rag","ripgrep","semantic-search","tree-sitter"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jbenshetler.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-15T12:49:46.000Z","updated_at":"2025-09-10T19:28:27.000Z","dependencies_parsed_at":"2025-08-11T23:41:12.808Z","dependency_job_id":null,"html_url":"https://github.com/jbenshetler/mcp-ragex","commit_stats":null,"previous_names":["jbenshetler/mcp-ragex"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/jbenshetler/mcp-ragex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jbenshetler%2Fmcp-ragex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/jbenshetler%2Fmcp-ragex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jbenshetler%2Fmcp-ragex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jbenshetler%2Fmcp-ragex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jbenshetler","download_url":"https://codeload.github.com/jbenshetler/mcp-ragex/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jbenshetler%2Fmcp-ragex/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31449838,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T15:22:31.103Z","status":"ssl_error","status_checked_at":"2026-04-05T15:22:00.205Z","response_time":75,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-tools","chromadb","claude-code","code-analysis","code-search","coding-assistant","developer-tools","embeddings","mcp","mcp-server","model-context-protocol-servers","rag","ripgrep","semantic-search","tree-sitter"],"created_at":"2025-08-12T21:00:46.323Z","updated_at":"2026-04-05T21:03:25.990Z","avatar_url":"https://github.com/jbenshetler.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RAGex - AI-Powered Code Search for Claude\n\n\u003e **Stop creating duplicate 
code.** Give Claude semantic search superpowers to find and reuse existing patterns in your codebase.\n\n\u003cdetails open\u003e\n\u003csummary\u003e\u003cstrong\u003eTL;DR - Quick Installation \u0026 Setup\u003c/strong\u003e\u003c/summary\u003e\n\n**Install:**\n```bash\ncurl -fsSL https://raw.githubusercontent.com/jbenshetler/mcp-ragex/refs/heads/main/install.sh | bash\n```\n\n**Setup:**\n```bash\ncd your-project\nragex start                    # Index your codebase (1-5 minutes)\nragex register claude | sh     # Connect to Claude Code\n```\n\n**Test:**\n```bash\nragex search \"auth functions\"     # Semantic search\nragex search \"async def\" --regex   # Pattern search\n```\n\n**What you get:** Claude Code can now semantically understand your entire codebase, find existing patterns, and reuse code instead of duplicating it. Works with both semantic search (\"find authentication logic\") and fast regex patterns (\"async def.*test\").\n\n\u003c/details\u003e\n\n## Table of Contents\n\n- [🚨 The Problem](#-the-problem)\n- [✨ The Solution](#-the-solution)  \n- [🚀 Quick Start](#-quick-start)\n- [📹 Video Demos](#-video-demos)\n- [🎯 What You Get](#-what-you-get)\n- [📖 Complete Examples](#-complete-examples)\n- [🔧 CLAUDE.md Setup](#-claudemd-setup)\n- [📋 Installation Details](#-installation-details)\n- [🚀 Advanced Usage](#-advanced-usage)\n- [⚡ Performance \u0026 Architecture](#-performance--architecture)\n- [🌟 Why RAGex?](#-why-ragex)\n- [🤝 Contributing \u0026 Support](#-contributing--support)\n\n## 🚨 The Problem\n\nClaude Code can't find existing code in your project, leading to:\n- ❌ **Duplicate functions** - \"I'll create a new authentication system...\" (when one exists)\n- ❌ **Missed patterns** - Ignores your coding conventions and best practices\n- ❌ **Inefficient workflow** - You resort to manual grep/search to guide Claude\n\n## ✨ The Solution\n\nRAGex gives Claude **semantic understanding** of your entire codebase:\n- 🔍 **Semantic search** - \"Find auth 
functions\" → discovers `UserValidator`, `loginHandler`, `AuthMiddleware`\n- ⚡ **Lightning fast** - Sub-second search across thousands of files using ripgrep + vector embeddings\n- 🧠 **Context aware** - Understands code relationships, not just text matching\n- 🛡️ **Secure \u0026 private** - Runs locally in Docker, no code leaves your machine\n\n## Features\n\n### 🔍 **Intelligent Search Modes**\n- **Auto-detection**: Automatically chooses the best search mode based on query patterns\n- **Regex mode**: Fast pattern matching with ripgrep for exact patterns\n- **Semantic mode**: Natural language search using sentence-transformers embeddings\n\n### 🚀 **Performance \u0026 Security**\n- **Fast code search** using ripgrep with regex support\n- **Security-first design** with input validation and path restrictions\n- **File type filtering** supporting 30+ programming languages\n- **Enhanced file exclusions** with multi-level .rgignore support and comprehensive defaults\n- **Configurable limits** to prevent resource exhaustion\n- **JSON-RPC interface** following MCP standards\n\n### 🧠 **AI-Powered Features**\n- **Semantic code search** using sentence-transformers embeddings\n- **Query enhancement** with abbreviation expansion and context addition\n- **Intelligent fallback** when primary search mode fails\n- **Teaching system** that guides Claude Code to optimal search usage\n\n## 🚀 Quick Start\n\n### One-Line Install\n\n```bash\ncurl -sSL https://raw.githubusercontent.com/jbenshetler/mcp-ragex/main/install.sh | bash\n```\n\n**What happens:**\n- ✅ Auto-detects your platform (AMD64/ARM64/CUDA)\n- ✅ Pulls optimized Docker image (~3GiB/2GiB/13GB)\n- ✅ Installs `ragex` CLI to `~/.local/bin`\n- ✅ Creates isolated user data volume\n\n\u003cdetails\u003e\n\u003csummary\u003e📋 Installation Options \u0026 Details\u003c/summary\u003e\n\n### Installation with Options\n```bash\n# Enable network access + better model (recommended)\ncurl -sSL 
https://raw.githubusercontent.com/jbenshetler/mcp-ragex/main/install.sh | bash -s -- --network --model balanced\n\n# Force CPU version (smaller download)\ncurl -sSL https://raw.githubusercontent.com/jbenshetler/mcp-ragex/main/install.sh | bash -s -- --cpu --network\n\n# Force CUDA (NVIDIA GPU)\ncurl -sSL https://raw.githubusercontent.com/jbenshetler/mcp-ragex/main/install.sh | bash -s -- --cuda --model accurate\n```\n\n### Platform Auto-Detection\nThe CUDA model image is substantially (8X) faster at indexing, with no performance difference for queries. \n| Platform | Auto-Selected | Image Size |\n|----------|---------------|------------|\n| **AMD64 + NVIDIA GPU** | CUDA | ~13GB |\n| **AMD64 (no GPU)** | CPU | ~3GB |\n| **ARM64 (Apple Silicon)** | CPU | ~2GB |\n\n### Security Modes\n- **Default (Secure)**: No network access, only pre-bundled models\n- **Network Enabled**: Can download additional models (`--network` flag)\n\n### Embedding Models\nThe **fast** model is generally good enough for Python and JavaScript because code is first parsed with `tree-sitter`. 
\n| Model | Size | Quality | Speed | Use Case |\n|-------|------|---------|-------|---------|\n| **fast** | 90MB | Good | Fastest | Default, Recommended for most users |\n| **balanced** | 435MB | Better | Fast | Improvement for more complex code bases |\n| **accurate** | 1.3GB | Best | Slower | Large codebases |\n| **multilingual** | 435MB | Good | Fast | Multi-language projects |\n\n### Manual Installation\nIf you prefer to inspect the script first:\n```bash\ncurl -sSL https://get.ragex.dev -o install.sh\ncat install.sh  # Review the script\nbash install.sh --network --model balanced\n```\n\n\u003c/details\u003e\n\n### Your First Project\n```bash\ncd your-project\nragex start                    # Index codebase (1-5 minutes)\nragex register claude | sh     # Connect to Claude Code\n```\n\n### Test It Works\n```bash\n# Test semantic search\nragex search \"auth functions\"     # Finds authentication code\nragex search \"error handling\"     # Finds error handling patterns\nragex search \"database queries\"   # Finds DB-related code\n\n# Test regex search  \nragex search \"async def\" --regex   # Find async functions\nragex search \"TODO|FIXME\" --regex  # Find code comments\n```\n\n## 📹 Video Demos\n\n\u003c!-- Asciinema/term-svg capture placeholders - coming soon --\u003e\n\n\n\u003c!--\n### 🚀 Installation Demo\n![Installation Demo](https://github.com/jbenshetler/mcp-ragex/blob/main/cast/install.svg)\n*One-line installation with platform auto-detection*\n--\u003e\n\n\n\u003c!--\n### ⚙️ Setup \u0026 Indexing\n![Setup Demo](https://github.com/jbenshetler/mcp-ragex/blob/main/cast/claude.svg)\n*Project indexing, semantic search examples, and CLI usage*\n--\u003e\n\n\u003c!--\n\n### 💻 CLI Usage Examples  \n[![CLI Demo](https://img.shields.io/badge/Video-Coming%20Soon-blue)](https://github.com/jbenshetler/mcp-ragex)\n*Semantic search, regex patterns, project management commands*\n\n### 🤖 Claude Code Integration\n[![Claude 
Integration](https://img.shields.io/badge/Video-Coming%20Soon-blue)](https://github.com/jbenshetler/mcp-ragex)\n*Real development workflow: using RAGex within Claude Code sessions*\n--\u003e\n\n\u003e **Note**: Video demonstrations will be added soon using asciinema/term-svg captures showing real-world usage scenarios.\n\n## 🎯 What You Get\n\n### Before RAGex\n```\nYou: \"Add user authentication to this Express app\"\nClaude: \"I'll create a comprehensive authentication system...\"\n        [Creates 200 lines of new auth code]\n        [Duplicates existing middleware patterns]\n        [Ignores your error handling conventions]\n```\n\n### After RAGex\n```\nYou: \"Add user authentication to this Express app\"\nClaude: \"I found your existing auth middleware at middleware/auth.js:15\n         and your user model at models/User.js:8. Let me extend these\n         patterns to add the authentication you need...\"\n        [Reuses existing patterns]\n        [Follows your conventions]\n        [Builds on your architecture]\n```\n\n### Semantic Search Magic\n\n| Your Query | RAGex Finds | Why It's Smart |\n|------------|-------------|----------------|\n| `\"auth functions\"` | `validateToken()`, `loginUser()`, `AuthMiddleware` | Understands authentication concepts |\n| `\"database queries\"` | `getUserById()`, `saveToRedis()`, `queryBuilder` | Recognizes data access patterns |\n| `\"error handling\"` | `try/catch blocks`, `errorMiddleware`, `logError()` | Groups error-related code |\n| `\"file upload\"` | `multer config`, `uploadToS3()`, `validateFile()` | Connects upload-related logic |\n\n## 📖 Complete Examples\n\n### Project Isolation\nEach project gets its own semantic index:\n\n```bash\n# Work project with accurate model\ncd ~/work/api-server\nragex start --model accurate\n# → Creates: ragex_1000_a1b2c3d4ef567890\n\n# Personal project with fast model  \ncd ~/personal/blog\nragex start --model fast\n# → Creates: ragex_1000_f9e8d7c6b5a43210\n\nragex ls -l\n# PROJECT    
      ID                         MODEL     INDEXED   PATH\n# api-server       ragex_1000_a1b2c3d4ef567890 accurate  yes      ~/work/api-server\n# blog             ragex_1000_f9e8d7c6b5a43210 fast      yes      ~/personal/blog\n```\n\n### Advanced Search Patterns\n\n```bash\n# Semantic search (natural language)\nragex search \"functions that validate user input\"\nragex search \"code that handles file uploads\"\nragex search \"database connection error handling\"\nragex search \"JWT token verification logic\"\n\n# Regex search (exact patterns)\nragex search \"async def.*test\" --regex    # Async test functions\nragex search \"app\\.get\\(.*api\" --regex      # Express API routes\nragex search \"interface.*Props\" --regex    # TypeScript interfaces\n\n# Search with limits and JSON output\nragex search \"auth\" --limit 10 --json     # Top 10 results as JSON\n```\n\n### Project Management\n\n```bash\n# List and inspect projects\nragex ls                         # Show all your projects\nragex ls -l                      # Detailed view with models/status\nragex ls \"api-*\"                 # Filter projects by pattern\nragex info                       # Current project details\n\n# Clean up old projects\nragex rm \"old-project-*\"         # Remove projects matching pattern\nragex rm ragex_1000_abc123       # Remove by specific ID\n\n# Configuration\nragex configure                  # Show current config\nragex configure --cpu            # Switch to CPU mode\nragex configure --cuda           # Switch to CUDA mode\n```\n\n## 🔧 CLAUDE.md Setup\n\nAdd this to your project's `CLAUDE.md` file to optimize Claude Code's search behavior:\n\n```markdown\n# Code Search Guidelines\n\n**IMPORTANT: This project has RAGex MCP enabled for intelligent code search.**\n\n## Search Strategy (Priority Order)\n\n1. 
**FIRST: Use RAGex MCP tools** - Semantic understanding of your codebase\n   - `search_code()` with semantic mode for concepts: \"auth functions\", \"error handling\"\n   - `search_code()` with regex mode for patterns: \"async def.*test\", \"TODO|FIXME\"\n   - `search_code_simple()` for quick searches with auto-detection\n\n2. **FALLBACK: Use built-in search tools** - Only if RAGex fails or is unavailable\n   - `Grep` for text patterns\n   - `Glob` for file discovery\n\n## RAGex Search Modes\n\nRAGex automatically detects the best search mode:\n\n- **Semantic Mode**: Natural language queries\n  - `\"functions that handle user authentication\"`\n  - `\"error handling for database connections\"`\n  - `\"code that validates JWT tokens\"`\n\n- **Regex Mode**: Pattern matching (use `--regex` or detected automatically)\n  - `\"async def.*test\"` → finds async test functions\n  - `\"app\\.get\\(.*api\"` → finds Express API routes\n  - `\"interface.*Props\"` → finds TypeScript interfaces\n\n- **Symbol Mode**: Exact names (detected automatically)\n  - `\"UserService\"` → finds UserService class\n  - `\"validateInput\"` → finds validateInput function\n\n## Effective Query Examples\n\n```bash\n# Semantic search (recommended)\nsearch_code(\"user authentication and session management\")\nsearch_code(\"database connection error handling\")\nsearch_code(\"file upload processing logic\")\nsearch_code(\"JWT token validation functions\")\n\n# Regex patterns for exact matching\nsearch_code(\"async def.*test\", mode=\"regex\")\nsearch_code(\"TODO|FIXME\", mode=\"regex\")\nsearch_code(\"interface.*Props\", mode=\"regex\")\n\n# Simple interface (auto-detects everything)\nsearch_code_simple(\"auth middleware\")\nsearch_code_simple(\"error handlers\")\n```\n\n## When RAGex Finds Existing Code\n\n1. **ANALYZE** the patterns before writing new code\n2. **EXTEND** existing functions rather than duplicating logic\n3. **FOLLOW** established architecture and naming conventions\n4. 
**REUSE** utility functions, middleware, and helpers\n5. **UNDERSTAND** the codebase structure and relationships\n\n## Search Tips\n\n- Be specific with domain terms: \"JWT\", \"middleware\", \"validation\", \"serialization\"\n- Use natural language for concepts, patterns for exact matches\n- RAGex understands code relationships, not just text matching\n- Results include file paths and line numbers for easy navigation\n- Try different phrasings if first search doesn't find what you need\n\n## Benefits\n\n- **Faster development**: Reuse existing patterns instead of recreating\n- **Consistent architecture**: Follow established project conventions\n- **Better code discovery**: Find forgotten utilities and helpers\n- **Reduced duplication**: Stop reinventing the wheel\n```\n\n**Why this helps:**\n- Prioritizes RAGex MCP tools over built-in search\n- Provides concrete examples for different search modes\n- Guides Claude toward code reuse and architectural consistency\n- Sets clear expectations for search capabilities and workflow\n\n## 📋 Installation Details\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand full installation guide from doc/installation-guide.md\u003c/summary\u003e\n\n### Quick Start (One-Line Installation)\n\n#### Basic Installation (Auto-Detection)\n```bash\ncurl -fsSL https://raw.githubusercontent.com/jbenshetler/mcp-ragex/refs/heads/main/install.sh | bash\n```\n\nThis will:\n- Auto-detect your platform (AMD64, ARM64, or CUDA)\n- Install with secure defaults (no network access for containers)\n- Use the pre-bundled fast embedding model\n\n#### Installation with Options\n```bash\n# Install with network access enabled and balanced model as default\ncurl -fsSL https://raw.githubusercontent.com/jbenshetler/mcp-ragex/refs/heads/main/install.sh | bash -s -- --network --model balanced\n\n# Force CPU version (smaller image) with network access\ncurl -fsSL https://raw.githubusercontent.com/jbenshetler/mcp-ragex/refs/heads/main/install.sh | bash -s -- 
--cpu --network --model accurate\n\n# Force CUDA version (requires NVIDIA GPU)\ncurl -fsSL https://raw.githubusercontent.com/jbenshetler/mcp-ragex/refs/heads/main/install.sh | bash -s -- --cuda --model balanced\n```\n\n### Installation Parameters\n\n#### Platform Selection\n- **Auto-detection** (default): Automatically detects platform and CUDA support\n- `--cpu`: Force CPU-only version (works on AMD64 and ARM64)\n- `--cuda`: Force CUDA version (AMD64 only, requires NVIDIA GPU + nvidia-docker)\n\n#### Network Configuration\n- **No flag** (default): Secure mode - containers run without network access\n- `--network`: Enable network access for containers (allows downloading additional models)\n\n#### Default Embedding Model\n- **No flag** (default): Uses 'fast' model (pre-bundled in all images)\n- `--model \u003cname\u003e`: Sets default model for new projects\n  - Valid options: `fast`, `balanced`, `accurate`, `multilingual`\n\n### Docker Image Sizes\n\n| Platform | Image Size | Use Case |\n|----------|------------|----------|\n| **AMD64 CPU** | ~3.2 GiB | General use, smaller download |\n| **ARM64 CPU** | ~2.2 GiB | Apple Silicon Macs, ARM servers |\n| **CUDA** | ~13.1 GiB | NVIDIA GPU acceleration |\n\n### Embedding Models\n\n| Model | Size | Speed | Quality | Use Case |\n|-------|------|-------|---------|----------|\n| **fast** | ~90 MB | Fastest | Good | Quick prototyping, smaller codebases |\n| **balanced** | ~435 MB | Moderate | Better | Production use, balanced performance |\n| **accurate** | ~1.3 GB | Slower | Best | Large codebases, maximum quality |\n| **multilingual** | ~435 MB | Moderate | Good | Multi-language projects |\n\n### Security Modes\n\n#### Secure Mode (Default)\n```bash\ncurl -fsSL https://raw.githubusercontent.com/jbenshetler/mcp-ragex/refs/heads/main/install.sh | bash\n```\n- Containers run with `--network none`\n- No external network access from containers\n- Only pre-bundled fast model available\n- Suitable for air-gapped 
environments\n\n#### Network-Enabled Mode\n```bash\ncurl -fsSL https://raw.githubusercontent.com/jbenshetler/mcp-ragex/refs/heads/main/install.sh | bash -s -- --network\n```\n- Containers can access external networks\n- Can download additional embedding models on demand\n- Required for using balanced, accurate, or multilingual models\n\n### Post-Installation\n\n#### Verify Installation\n```bash\nragex --help\nragex info\n```\n\n#### Quick Start\n```bash\ncd your-project\nragex index .          # Index current directory\nragex search \"query\"   # Search your code\n```\n\n#### Configuration\n```bash\nragex configure        # Show current configuration\nragex ls              # List indexed projects\n```\n\n### Troubleshooting\n\n#### Docker Not Found\n```\n❌ Docker not found. Please install Docker first.\n```\n**Solution**: Install Docker from https://docs.docker.com/get-docker/\n\n#### Docker Daemon Not Running\n```\n❌ Docker daemon not running. Please start Docker.\n```\n**Solution**: Start Docker Desktop or run `sudo systemctl start docker`\n\n#### Unsupported Architecture\n```\n❌ Unsupported architecture: s390x\n```\n**Solution**: RAGex currently supports AMD64 and ARM64 only\n\n### Integration with Claude Code\n\nAfter installation, register RAGex with Claude Code:\n\n```bash\n# Get the registration command\nragex register claude\n\n# Run the output command (example):\nclaude mcp add ragex ~/.local/bin/ragex-mcp --scope project\n```\n\nThis enables RAGex as an MCP server for Claude Code, providing intelligent code search capabilities directly in your Claude conversations.\n\n\u003c/details\u003e\n\n## 🚀 Advanced Usage\n\n### Multiple Projects\n```bash\n# Work on different projects simultaneously\ncd ~/work/api-server \u0026\u0026 ragex start --model accurate\ncd ~/personal/blog \u0026\u0026 ragex start --model fast \ncd ~/opensource/cli-tool \u0026\u0026 ragex start --model balanced\n\n# Switch between projects automatically\nragex ls                        # 
See all projects\ncd ~/work/api-server           # RAGex automatically uses api-server index\nragex search \"authentication\"   # Searches only api-server code\n```\n\n### Environment Variables\n```bash\n# Customize behavior\nexport RAGEX_EMBEDDING_MODEL=balanced    # Default model for new projects\nexport RAGEX_LOG_LEVEL=DEBUG             # Enable debug logging\nexport RAGEX_DOCKER_IMAGE=custom:tag     # Use custom Docker image\n\n# Log rotation settings\nexport RAGEX_LOG_MAX_SIZE=100m           # Max log file size\nexport RAGEX_LOG_MAX_FILES=5             # Number of rotated logs to keep\n```\n\n### Development \u0026 Debugging\n```bash\n# View logs\nragex log                       # Current project logs\nragex log -f                    # Follow logs in real-time\nragex log --tail 50            # Last 50 lines\n\n# Status and info\nragex status                    # Check daemon status\nragex info                      # Project details\nragex configure                 # Current configuration\n\n# Development mode\nragex bash                      # Get shell inside container\nRAGEX_DEBUG=1 ragex start      # Enable debug output\n```\n\n### Data Management\n```bash\n# Your data is isolated by user ID\ndocker volume ls | grep ragex_user_$(id -u)\n\n# Project data structure:\n# /data/models/                   # Shared embedding models (90MB-1.3GB)\n# /data/projects/ragex_1000_*/    # Individual project indexes\n#   ├── chroma_db/               # Vector database  \n#   └── project_info.json        # Project metadata\n\n# Backup a project\nragex export my-project backup.tar.gz\n\n# Check disk usage\nragex ls -l                     # Shows index sizes\n\n# Clean up old projects\nragex rm \"test-*\"               # Remove test projects\nragex rm ragex_1000_old123      # Remove specific project\n```\n\n### Uninstall\n```bash\n# Complete removal (WARNING: Deletes all indexed data)\n# Stop all ragex containers\ndocker ps -a -f \"name=ragex_\" -q | xargs -r docker 
stop\ndocker ps -a -f \"name=ragex_\" -q | xargs -r docker rm\n\n# Remove images and volumes\ndocker images \"*ragex*\" -q | xargs -r docker rmi\ndocker volume ls -f \"name=ragex_user_\" -q | xargs -r docker volume rm\n\n# Remove binaries and config\nrm -rf ~/.config/ragex ~/.local/bin/ragex ~/.local/bin/ragex-mcp\n\n# Unregister from Claude Code\nclaude mcp remove ragex --scope project\n```\n\n## ⚡ Performance \u0026 Architecture\n\n### 🔌 MCP Communication Through Docker\n\nThe MCP protocol uses stdin/stdout for communication. The `ragex` wrapper script handles this transparently:\n\n```bash\n# When Claude Code runs:\nclaude mcp add ragex /home/user/.local/bin/ragex\n\n# It communicates like this:\nClaude Code ←→ ragex script ←→ Docker container ←→ MCP Server\n           stdin/stdout    stdin/stdout      stdin/stdout\n```\n\n**Key points:**\n- The `ragex` script acts as a bridge between Claude Code and the Docker container\n- For MCP server mode, Docker runs with `-i` (interactive) but NOT `-t` (no TTY)\n- TTY would break JSON-RPC communication by adding terminal control codes\n- The wrapper preserves stdin/stdout pipes for proper MCP protocol communication\n\n### 🏗️ Container Structure\n\n```\n/app/                     # Application code (read-only)\n├── src/                  # Source code\n├── scripts/              # Utility scripts\n├── requirements.txt      # Python dependencies\n└── entrypoint.sh        # Container entrypoint\n\n/data/                    # User-specific persistent data (volume)\n├── models/              # Shared embedding models (90MB-1.3GB)\n└── projects/            # Project-specific data\n    ├── ragex_1000_abc123/  # Project 1\n    │   ├── chroma_db/      # ChromaDB vector database  \n    │   └── project_info.json # Project metadata\n    └── ragex_1000_def456/  # Project 2\n        ├── chroma_db/\n        └── project_info.json\n\n/workspace/              # Your code (volume, read-only)\n└── [current project]    # Code to be 
indexed\n```\n\n### 🔧 Environment Variables\n\nDocker containers support these environment variables:\n\n```bash\n# Project identification (automatically set by wrapper)\nWORKSPACE_PATH=/path/to/your/project    # Workspace being indexed\nPROJECT_NAME=ragex_1000_abc123          # Generated project ID\n\n# Data directories (automatically configured)\nRAGEX_PROJECT_DATA_DIR=/data/projects/ragex_1000_abc123  # Project data\nRAGEX_CHROMA_PERSIST_DIR=/data/projects/ragex_1000_abc123/chroma_db  # ChromaDB\nTRANSFORMERS_CACHE=/data/models         # Shared model cache\nSENTENCE_TRANSFORMERS_HOME=/data/models # Sentence transformers cache\n\n# User configuration\nRAGEX_EMBEDDING_MODEL=fast              # Model preset (fast/balanced/accurate/multilingual)\nRAGEX_CHROMA_COLLECTION=code_embeddings # Collection name\n\n# System configuration\nRAGEX_LOG_LEVEL=INFO                    # Log level (TRACE, DEBUG, INFO, WARN, ERROR) - default: INFO\nLOG_LEVEL=INFO                          # Fallback log level (RAGEX_LOG_LEVEL takes precedence)\nDOCKER_CONTAINER=true                   # Indicates running in container\n```\n\n### 🐳 Production Deployment\n\nUse the production Docker Compose for deployment:\n\n```bash\n# Production setup with resource limits\ndocker compose -f docker-compose.prod.yml up -d\n\n# Check status\ndocker compose -f docker-compose.prod.yml ps\n\n# View logs\ndocker compose -f docker-compose.prod.yml logs ragex\n```\n\n## Integration\n\n### 🖥️ Claude Code (CLI)\n\n#### Docker Integration (Recommended)\n\nRegister the Docker-based MCP server:\n\n```bash\ncd /path/to/your/project\nclaude mcp add ragex /path/to/mcp-ragex/ragex --scope project\n```\n\nThe `ragex` script automatically handles Docker execution and volume mounting.\n\n#### Manual Integration\n\nFor manual/development setups:\n\n```bash\n# Option 1: Using wrapper script\nclaude mcp add ragex /path/to/mcp-ragex/mcp_ragex.sh --scope project\n\n# Option 2: Direct Python command\nclaude mcp add ragex uv run 
/path/to/mcp-ragex/src/server.py --scope project\n```\n\n### 🖱️ Claude Desktop (App)\n\nAdd to your `claude_desktop_config.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"ragex\": {\n      \"command\": \"/path/to/mcp-ragex/ragex\",\n      \"env\": {\n        \"RAGEX_DATA_DIR\": \"/path/to/persistent/data\"\n      }\n    }\n  }\n}\n```\n\n### ✅ Verifying MCP Connection\n\nAfter configuration, verify the MCP server is connected:\n\n```bash\n# In Claude Code, use the /mcp command\n/mcp\n```\n\nThis will show the status of all configured MCP servers. You should see `ragex` in the list.\n\n## Usage Examples\n\n### **Example Semantic Search**\nThis is what happened in the CLI:\n```\n● Let me try a broader semantic search:\n\n● Let me search for admin endpoints:\n ⎿  Interrupted by user\n\n\u003e /exit\n ⎿  (no content)\n\n\u003e Try the ragex semantic search again.\n\n● Task(Search for reindex endpoints)\n ⎿  Done (8 tool uses · 79.3k tokens · 1m 32.2s)\n\n● Based on the search results, I found the department reindexing endpoints. The main force reindex endpoint is:\n\n POST /admin/reindex/department in /docprocessor/workflows_api.py (line 834)\n\n```\n\nAnd this is what the Ctrl-R logs show:\n```\n● I'll search for the function that adds documents to OpenSearch and check the field structure, then verify all 11 documents have extracted text.\n\n● Task(Task: Find OpenSearch indexing function\n\n   Prompt: Search for the function that indexes documents to OpenSearch. Look for:\n   1. Functions that create or update documents in OpenSearch\n   2. The field names used when indexing, particularly for extracted text\n   3. Index operations, PUT or POST requests to OpenSearch\n   4. 
Document structure being sent to OpenSearch\n\n   Use search terms like:\n   - opensearch index document\n   - _doc PUT POST\n   - index_document\n   - create_document\n   - extracted_text content text)\n ⎿  Search(pattern: \"index.*document|create.*document|update.*document\", path: \"undefined\")\n      ⎿  Found 55 files\n         /home/jeff/clients/search/contextsearch/tests/ui/README.md\n         /home/jeff/clients/search/contextsearch/tests/ocr/output_metadata.json\n         /home/jeff/clients/search/contextsearch/tests/ocr/output_rmeta_fixed.json\n         /home/jeff/clients/search/contextsearch/tests/needs_porting/test_workflow_implementation.py\n         /home/jeff/clients/search/contextsearch/tests/obsolete/test_workflow_celery_integration.py\n         /home/jeff/clients/search/contextsearch/tests/obsolete/test_workflow_integration.py\n         /home/jeff/clients/search/contextsearch/tests/docprocessor/test_workflows_real_files.py\n         /home/jeff/clients/search/contextsearch/tests/functional/cache/test_cache_repopulation.py\n         /home/jeff/clients/search/contextsearch/tests/functional/cache/test_comprehensive_cache_validation.py\n         /home/jeff/clients/search/contextsearch/tests/functional/integration/filesystem_scanner.py\n         /home/jeff/clients/search/contextsearch/tests/functional/nlp/test_batch_size_one.py\n         /home/jeff/clients/search/contextsearch/tests/functional/search/test_date_filtering.py\n         /home/jeff/clients/search/contextsearch/tests/functional/search/test_entity_filtering.py\n         /home/jeff/clients/search/contextsearch/tests/functional/search/test_force_reindexing.py\n         /home/jeff/clients/search/contextsearch/samples/simulate_tasks.py\n```\n\n### 🔍 **Intelligent Search Modes**\n\n#### Auto-Detection (Recommended)\n```bash\n# Claude Code automatically detects the best search mode\nsearch_code(\"DatabaseConnection\")              # → symbol mode\nsearch_code(\"functions that handle auth\")      # → 
semantic mode  \nsearch_code(\"error.*Exception\")                # → regex mode\n```\n\n#### Explicit Mode Selection\n```bash\n# Symbol search - when you know exact names\nsearch_code(\"AuthenticationService\")\n\n# Semantic search - when you know the concept\nsearch_code(\"functions that validate user input\", mode=\"semantic\")\n\n# Regex search - when you know the pattern\nsearch_code(\"handleError.*Exception\", mode=\"regex\")\n```\n\n### 🧠 **Semantic Search Examples**\n\n```bash\n# Find authentication-related code\nsearch_code(\"functions that handle user authentication\")\n\n# Find error handling patterns\nsearch_code(\"error handling for database connections\")\n\n# Find file processing code\nsearch_code(\"code that processes uploaded files\")\n\n# Find validation logic\nsearch_code(\"functions that validate user input\")\n```\n\n### 📋 **Symbol Search Examples**\n\n```bash\n# Find specific classes\nsearch_code(\"UserService\")\n\n# Find specific functions\nsearch_code(\"validateInput\")\n\n# Find methods\nsearch_code(\"submitToQueue\")\n```\n\n### 🔧 **Regex Search Examples**\n\n```bash\n# Find async functions\nsearch_code(\"async def\", mode=\"regex\")\n\n# Find TODO comments\nsearch_code(\"TODO|FIXME\", mode=\"regex\")\n\n# Find error handling blocks\nsearch_code(\"try.*except\", mode=\"regex\")\n```\n\n### 🛠 **Advanced Features**\n\n#### Capability Discovery\n```bash\n# Check available search modes\nget_search_capabilities()\n```\n\n#### Simple Search Interface\n```bash\n# Just search - auto-detects everything\nsearch_code_simple(\"database connection error\")\n```\n\n#### Raw Output Format\n```bash\nsearch_code(\"submit_file\", format=\"raw\")\n```\nReturns simple `file:line` format for programmatic use.\n\n### 📁 **File Type and Path Filtering**\n\n```bash\n# Search only Python files\nsearch_code(\"class.*User\", file_types=[\"py\"])\n\n# Search specific directories\nsearch_code(\"test_\", paths=[\"tests\", \"src/tests\"])\n\n# Combine 
filters\nsearch_code(\"async def\", file_types=[\"py\"], paths=[\"src\"])\n```\n\n## Logging and Debugging\n\n### Setting Log Levels\n\nRAGex uses `RAGEX_LOG_LEVEL` to control logging verbosity. The default is `INFO`.\n\n```bash\n# Set log level before starting daemon\nexport RAGEX_LOG_LEVEL=DEBUG\nragex start\n\n# Or set for a single command\nRAGEX_LOG_LEVEL=DEBUG ragex start\n\n# For very verbose debugging (generates lots of output)\nRAGEX_LOG_LEVEL=TRACE ragex start\n```\n\n**Available Log Levels:**\n- `TRACE`: Very detailed debugging (ignore decisions, file system operations)\n- `DEBUG`: Detailed debugging info (file processing, embeddings, scores)\n- `INFO`: General operation info (search queries, index progress) - **default**\n- `WARN`: Warnings and potential issues only\n- `ERROR`: Error messages only\n\n**Important:** The log level is set when the daemon starts and cannot be changed without restarting:\n\n```bash\n# To change log level after daemon is running:\nragex stop\nRAGEX_LOG_LEVEL=DEBUG ragex start\n```\n\n### Viewing Logs\n\n```bash\n# View daemon logs for current project\nragex log\n\n# Follow logs in real-time\nragex log -f\n\n# View last 50 lines\nragex log --tail 50\n\n# View logs for specific project\nragex log project-name\n\n# View MCP server logs (when using with Claude)\ntail -f /tmp/ragex-mcp.log\n```\n\n### Log Rotation and Storage\n\nRAGex automatically manages log file sizes through Docker's log rotation to prevent disk space issues:\n\n**Default Log Limits:**\n- **Daemon logs**: 50MB per file, 3 files maximum (150MB total)\n- **Admin commands**: 10MB per file, 2 files maximum (20MB total)\n\n**Customizing Log Rotation:**\n\n```bash\n# Set custom log rotation limits\nexport RAGEX_LOG_MAX_SIZE=100m      # Maximum size per log file\nexport RAGEX_LOG_MAX_FILES=5        # Maximum number of log files to keep\n\n# Apply settings when starting daemon\nragex stop\nragex start\n\n# View current log rotation settings\nragex 
configure\n```\n\n**Available Size Units:**\n- `k` or `kb`: Kilobytes (e.g., `500k`)\n- `m` or `mb`: Megabytes (e.g., `50m`) \n- `g` or `gb`: Gigabytes (e.g., `1g`)\n\n**Log Storage Location:**\n- Logs are stored inside Docker containers and managed by Docker's log driver\n- Use `ragex log` to view logs (automatic log rotation is handled transparently)\n- Old log files are automatically deleted when limits are exceeded\n\n### Performance Metrics\n\n| Operation | Speed | Notes |\n|-----------|-------|-------|\n| **Indexing** | ~100 symbols/sec | Intel i9-7900X, varies by project size |\n| **Semantic Search** | \u003c100ms | 1000+ symbols, cached embeddings |\n| **Regex Search** | \u003c50ms | Powered by ripgrep, sub-second for large codebases |\n| **Project Switching** | Instant | Automatic workspace detection |\n\n### Real-World Timing\n- **Small projects** (1k-10k LOC): 70 seconds initial indexing\n- **Medium projects** (10k-100k LOC): 130 seconds initial indexing  \n- **Large projects** (100k+ LOC): 5+ minutes initial indexing\n- **Subsequent searches**: Sub-second response times\n\n### Docker Resource Usage\n\n| Component | Memory | CPU | Storage |\n|-----------|--------|-----|----------|\n| **Base container** | ~100MB | Low | ~3-13GB (varies by image) |\n| **During indexing** | ~500MB peak | High | Temporary spike |\n| **During search** | ~300MB | Low | Persistent |\n| **ChromaDB index** | ~50MB | N/A | ~1MB per 1000 symbols |\n\n### Security \u0026 Privacy\n\n🔒 **Enterprise-Ready Security:**\n- ✅ **Air-gapped mode** - No network access required (secure default)\n- ✅ **Local processing** - Code never leaves your machine\n- ✅ **Input validation** - All queries sanitized and validated\n- ✅ **Path restrictions** - Searches confined to project directories\n- ✅ **Resource limits** - Protection against resource exhaustion\n- ✅ **No shell execution** - Direct subprocess calls only\n\n### Architecture Overview\n\n```mermaid\ngraph TD\n    A[Claude Code] --\u003e|MCP 
Protocol| B[RAGex Server]\n    B --\u003e C[Query Router]\n    C --\u003e|Semantic Queries| D[Vector Search]\n    C --\u003e|Regex Patterns| E[Ripgrep Search]\n    C --\u003e|Symbol Names| F[Tree-sitter Search]\n    \n    D --\u003e G[ChromaDB]\n    D --\u003e H[Sentence Transformers]\n    E --\u003e I[ripgrep]\n    F --\u003e J[Tree-sitter AST]\n    \n    K[Docker Container] --\u003e L[Project Isolation]\n    L --\u003e M[User Volume]\n    M --\u003e N[Project 1 Index]\n    M --\u003e O[Project 2 Index]\n    M --\u003e P[Shared Models]\n```\n\n**Key Components:**\n- 🧠 **Vector Search**: Semantic understanding via sentence-transformers + ChromaDB\n- ⚡ **Regex Search**: Lightning-fast pattern matching with ripgrep\n- 🌳 **AST Parsing**: Code structure analysis with Tree-sitter\n- 🐳 **Docker Isolation**: Secure, reproducible environment per user\n- 📁 **Project Separation**: SHA256-based unique project identification\n\n### Search Intelligence\n\n```mermaid\nflowchart LR\n    A[User Query] --\u003e B{Query Analysis}\n    B --\u003e|\"auth login\"| C[Semantic Mode]\n    B --\u003e|\"async def.*\"| D[Regex Mode] \n    B --\u003e|\"handleSubmit\"| E[Symbol Mode]\n    \n    C --\u003e F[Vector Similarity]\n    D --\u003e G[Pattern Matching]\n    E --\u003e H[AST Analysis]\n    \n    F --\u003e I{Results Found?}\n    G --\u003e I\n    H --\u003e I\n    \n    I --\u003e|Yes| J[Return Results]\n    I --\u003e|No| K[Try Fallback Mode]\n    K --\u003e J\n```\n\n**Smart Features:**\n1. 🎯 **Auto-detection** - Analyzes query patterns to choose optimal search mode\n2. 🔄 **Intelligent fallback** - Tries alternative modes if primary search fails\n3. 📊 **Result ranking** - Semantic relevance scoring for better matches\n4. 💡 **Query enhancement** - Expands abbreviations and adds context\n5. 
🎓 **Learning system** - Guides Claude Code to optimal usage patterns\n\n### Supported Languages\n\n| Feature | Supported Languages |\n|---------|--------------------|\n| **Regex Search** | All file types (universal) |\n| **Semantic Search** | Python, JavaScript, TypeScript, JSX, TSX, C/C++, HTML, CSS |\n| **Symbol Extraction** | Python, JavaScript, TypeScript (Tree-sitter AST parsing) |\n| **Planned Support** | Go, Rust, Java, C#, PHP, Ruby |\n\n### File Type Detection\n- **Automatic**: Based on file extensions\n- **Configurable**: Via `.gitignore` patterns\n- **Smart Exclusions**: Skips binaries, generated files, dependencies\n\n### Smart File Exclusions\n\n**🎯 Comprehensive Defaults** (automatically applied):\n```gitignore\n# Dependencies\nnode_modules/, .venv/, __pycache__/, vendor/\n\n# Build artifacts  \nbuild/, dist/, target/, .next/, .nuxt/\n\n# IDE files\n.vscode/, .idea/, *.swp, .DS_Store\n\n# Logs and temp\n*.log, .tmp/, .cache/\n\n# Media files\n*.jpg, *.png, *.mp4, *.zip, *.pdf\n```\n\n**⚙️ Customizable per Project**:\n- Uses standard `.gitignore` syntax\n- Multi-level support (project/directory/subdirectory)\n- Respects existing `.gitignore` files\n- `ragex init` creates comprehensive `.gitignore` template\n\n## 🌟 Why RAGex?\n\n### The RAGex Advantage\n\n| Traditional Tools | RAGex |\n|------------------|--------|\n| 😫 Text-only search | 🧠 **Semantic understanding** |\n| 📝 Manual copy-paste workflow | 🔄 **Intelligent code reuse** |\n| 🐌 Grep through everything | ⚡ **Vector-powered speed** |\n| 🔍 Find exact matches only | 🎯 **Conceptual similarity** |\n| 😰 \"Did I check all files?\" | ✅ **Comprehensive indexing** |\n| 🚫 Works against Claude | 🤝 **Enhances Claude's abilities** |\n\n### Success Stories\n\n**Before RAGex:**\n\u003e \"Claude, add authentication to this Express app\"\n\u003e → Creates duplicate middleware (200+ lines)\n\u003e → Ignores existing user models  \n\u003e → Breaks established patterns\n\n**After RAGex:**\n\u003e \"Claude, add 
authentication to this Express app\"\n\u003e → Finds existing `auth.middleware.js:15`\n\u003e → Extends current `User` model  \n\u003e → Follows project conventions\n\u003e → 90% less code, 100% more consistency\n\n### Enterprise Benefits\n- 📈 **Faster development** - Reuse \u003e Rewrite\n- 🎯 **Consistent patterns** - Claude follows your architecture\n- 🔍 **Better code discovery** - Find forgotten utilities and helpers\n- 🧹 **Reduced duplication** - Stop reinventing the wheel\n- 📚 **Knowledge preservation** - Your codebase becomes searchable documentation\n\n## 🤝 Contributing \u0026 Support\n\n### Getting Help\n- 📖 **Documentation**: [Full docs](https://github.com/jbenshetler/mcp-ragex/tree/main/doc)\n- 🐛 **Issues**: [GitHub Issues](https://github.com/jbenshetler/mcp-ragex/issues)\n- 💬 **Discussions**: [GitHub Discussions](https://github.com/jbenshetler/mcp-ragex/discussions)\n- 📧 **Support**: [support@ragex.dev](mailto:support@ragex.dev)\n\n### Development\n```bash\n# Get the code\ngit clone https://github.com/jbenshetler/mcp-ragex.git\ncd mcp-ragex\n\n# Local development setup\nmake install-cpu \u0026\u0026 ragex start\n\n# Run tests\nuv run tests/test_server.py\npytest tests/\n\n# Build documentation\nmake docs\n```\n\n### Roadmap\n- 🚀 **Multi-language support** - Go, Rust, Java, C#\n- 🔍 **Hybrid search** - Combine semantic + keyword results\n- 📱 **IDE extensions** - VS Code, JetBrains, Vim\n- 🌐 **Cloud deployment** - Kubernetes, AWS, GCP\n- 🧠 **Custom embeddings** - Fine-tune models for your domain\n\n---\n\n**⭐ Star us on GitHub** | **🐳 [Docker Hub](https://hub.docker.com/r/ragex/mcp-server)** | **📦 [GitHub Packages](https://github.com/jbenshetler/mcp-ragex/pkgs/container/mcp-ragex)**\n\n*Made with ❤️ for developers who believe in smart code 
reuse*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjbenshetler%2Fmcp-ragex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjbenshetler%2Fmcp-ragex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjbenshetler%2Fmcp-ragex/lists"}