{"id":50682278,"url":"https://github.com/ItMeDiaTech/rag-cli","last_synced_at":"2026-06-25T18:00:56.442Z","repository":{"id":321524875,"uuid":"1086173131","full_name":"ItMeDiaTech/rag-cli","owner":"ItMeDiaTech","description":"Local Retrieval-Augmented Generation (RAG) plugin for Claude Code that combines Chroma db vector embeddings with intelligent info retrieval with Multi-Agent Framework (MAF) orchestration for context-aware development assistance. Uses Open Source / Free frameworks. Implements bridge to Claude Code CLI so no token use. And it's easy to setup.","archived":false,"fork":false,"pushed_at":"2026-03-22T03:58:35.000Z","size":1822,"stargazers_count":36,"open_issues_count":0,"forks_count":7,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-03-22T18:56:33.006Z","etag":null,"topics":["claude","cli","maf","multi-agent","plugin","python","rag","vector"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ItMeDiaTech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":"AUDIT_COMPLETION_SUMMARY.md","citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-30T03:42:31.000Z","updated_at":"2026-03-22T03:58:38.000Z","dependencies_parsed_at":"2025-10-30T05:45:00.342Z","dependency_job_id":null,"html_url":"https://github.com/ItMeDiaTech/rag-cli","commit_stats":null,"previous_names":["itmediatech/rag-cli"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/ItMeDiaTech/rag-cli","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ItMeDiaTech%2Frag-cli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ItMeDiaTech%2Frag-cli/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ItMeDiaTech%2Frag-cli/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ItMeDiaTech%2Frag-cli/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ItMeDiaTech","download_url":"https://codeload.github.com/ItMeDiaTech/rag-cli/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ItMeDiaTech%2Frag-cli/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34786231,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-25T02:00:05.521Z","response_time":101,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["claude","cli","maf","multi-agent","plugin","python","rag","vector"],"created_at":"2026-06-08T20:00:23.310Z","updated_at":"2026-06-25T18:00:56.432Z","avatar_url":"https://github.com/ItMeDiaTech.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# RAG-CLI v2.0\n\n# DO NOT USE THIS TOOL FOR ANTHROPIC / CLAUDE - SEE BELOW\n\nJust a heads-up, turns out Anthropic / Claude does not like it when you avoid token usage cost by routing traffic to the CLI tool from them. This shadow banned me from their platform when I was on their $200 a month plan. They refuse to respond after months of submitting an appeal, etc, and no project I worked on violated any aspect of their Terms. After research, I see many people have been banned on similar cases. You have been warned.\n\n**Local Retrieval-Augmented Generation system for Claude Code with Multi-Agent Framework integration.**\n\nA production-ready Claude Code plugin that combines ChromaDB vector embeddings with intelligent document retrieval and Multi-Agent Framework (MAF) orchestration for context-aware development assistance.\n\n## Project Status\n\n**Current Version**: 2.0.0\n**Status**: Production Ready (with known limitations documented in KNOWN_ISSUES.md)\n\n**Key Features:**\n- ChromaDB-based vector storage with HNSW indexing\n- Hybrid search combining semantic and keyword matching\n- Multi-Agent Framework for intelligent query routing\n- Zero external API costs for document processing\n- Comprehensive plugin system (hooks, MCP server, slash commands)\n\n**Alternative Project**: For a standalone CLI experience with extended features, see [dt-cli](https://github.com/ItMeDiaTech/dt-cli). Both projects are actively maintained and can be used together.\n\n## Overview\n\nRAG-CLI is a production-ready local Retrieval-Augmented Generation system that enhances your development workflow by providing instant access to your project documentation, codebase context, and external resources. It works seamlessly with Claude Code as a native plugin, eliminating the need for external API calls while processing documents locally with enterprise-grade security and performance.\n\n### Why Use RAG-CLI?\n\n1. **Zero API Overhead**: Process documents locally without incurring API costs\n2. **Instant Context**: Get relevant documentation in milliseconds instead of manual searches\n3. **Improved Code Quality**: Make better decisions with context-aware assistance\n4. **Complete Privacy**: All document processing stays on your machine\n5. **Developer Focused**: Optimized for development workflows and Claude Code integration\n\n## Features\n\n- **Local-First Architecture**: Everything runs locally except Claude API calls\n- **Fast Performance**: \u003c100ms vector search, \u003c5s end-to-end responses\n- **Hybrid Search**: Combines semantic vector search with keyword matching for superior accuracy\n- **Claude Code Integration**: Seamless plugin for enhanced development workflow\n- **Multi-Format Support**: Process MD, PDF, DOCX, HTML, and TXT files\n- **Real-Time Monitoring**: TCP server with PowerShell interface for system observability\n- **Background File Watching**: Automatic document indexing with watchdog library (debounced events)\n- **Multi-Agent Orchestration**: Intelligent routing between RAG and code analysis agents\n- **Production Ready**: Comprehensive error handling, logging, and monitoring\n\n## Installation Guide\n\n### Prerequisites\n\n- **Python**: 3.8 or higher (tested with 3.13)\n- **RAM**: 4GB minimum (8GB recommended for large document sets)\n- **Disk Space**: 2GB for dependencies + space for document vectors\n- **Claude Code**: Latest version (for plugin mode)\n- **Anthropic API Key**: Optional (only for standalone mode)\n\n### System Requirements\n\nRAG-CLI runs efficiently on:\n- Windows 10+ / macOS / Linux\n- Laptops with limited resources (scales gracefully)\n- Cloud instances and Docker containers\n- CI/CD pipelines\n\n### Installation Methods\n\n#### Method 1: Claude Code Marketplace (Recommended)\n\nThe easiest way to get RAG-CLI as a Claude Code plugin:\n\n```bash\n# In Claude Code terminal\n/plugin marketplace add https://github.com/ItMeDiaTech/rag-cli.git\n/plugin install rag-cli\n```\n\nThen restart Claude Code. The plugin will activate automatically with zero configuration.\n\nBenefits:\n- Automatic installation of all dependencies\n- Plugin manages its own lifecycle\n- No API key needed (uses Claude Code internally)\n- One-command updates via `/plugin update rag-cli`\n\n#### Method 2: Manual Installation from Source\n\nFor development, testing, or custom configuration:\n\n```bash\n# Clone the repository\ngit clone https://github.com/ItMeDiaTech/rag-cli.git\ncd rag-cli\n\n# Create virtual environment (recommended)\npython -m venv venv\nsource venv/bin/activate  # Windows: venv\\Scripts\\activate\n\n# Install dependencies\npip install -r requirements.txt\n\n# Verify installation\npython -c \"from rag_cli.core import embeddings; print('Installation successful!')\"\n```\n\n#### Method 3: Development Installation\n\nFor contributing to RAG-CLI:\n\n```bash\n# Clone and install in editable mode\ngit clone https://github.com/ItMeDiaTech/rag-cli.git\ncd rag-cli\n\n# Create virtual environment\npython -m venv venv\nsource venv/bin/activate\n\n# Install with development dependencies\npip install -e \".[dev]\"\n\n# Configure MCP server for development mode\npython scripts/configure_mcp.py\n\n# Run tests to verify\npytest tests/\n```\n\n**Important for Contributors**: The `configure_mcp.py` script generates `.mcp.json` with absolute paths for your system. This file is gitignored and preserves any other MCP servers you have configured. You can re-run the script anytime if your project path changes.\n\n### Plugin Sync for Manual Installation\n\nIf you installed manually and want to use it as a Claude Code plugin:\n\n```bash\n# From the RAG-CLI directory\npython scripts/sync_plugin.py\n\n# This will copy necessary files to:\n# ~/.claude/plugins/marketplaces/rag-cli/\n\n# Then restart Claude Code\n```\n\n### Configuration Setup\n\n#### As a Claude Code Plugin (Recommended)\n\nNo configuration needed. RAG-CLI auto-detects Claude Code environment:\n\n```bash\n# First time setup: Index your documents\n/rag-project\n\n# Or manually index\npython scripts/index.py --input /path/to/docs\n```\n\n#### Standalone Mode (with API key)\n\nFor development or testing outside Claude Code:\n\n```bash\n# Set environment variables\nexport ANTHROPIC_API_KEY=\"sk-ant-...\"\nexport RAG_CLI_MODE=\"standalone\"\nexport RAG_CLI_LOG_LEVEL=\"INFO\"\n\n# Index documents\npython scripts/index.py --input data/documents\n\n# Test retrieval\npython scripts/retrieve.py \"Your question here\"\n```\n\n#### Custom Configuration\n\nEdit `config/default.yaml` to customize:\n\n```yaml\n# Model selection\nembeddings:\n  model_name: sentence-transformers/all-MiniLM-L6-v2  # Fast, 384-dim\n\n# Search parameters\nretrieval:\n  top_k: 5                 # Number of results\n  hybrid_ratio: 0.7        # 70% semantic, 30% keyword\n  rerank: true             # Use cross-encoder reranking\n\n# Claude settings (standalone only)\nclaude:\n  model: claude-haiku-4-5-20251001\n  max_tokens: 4096\n  temperature: 0.7\n```\n\n### Post-Installation Verification\n\n```bash\n# Test plugin installation\n/plugin\n\n# Should show: RAG-CLI plugin is installed and loaded\n\n# Test basic functionality\n/search \"test query\"\n\n# Check system status\npython scripts/validate_plugin.py\n```\n\n## Getting Started: Step-by-Step\n\n### Step 1: Install RAG-CLI\n\nUse Method 1 (Marketplace) for easiest setup.\n\n### Step 2: Prepare Documents\n\nGather your documentation:\n\n```bash\n# Create documents directory\nmkdir -p data/documents\n\n# Copy your files\ncp /path/to/docs/*.md data/documents/\ncp /path/to/docs/*.pdf data/documents/\n```\n\nSupported formats: Markdown, PDF, DOCX, HTML, TXT\n\n### Step 3: Index Documents\n\nIn Claude Code or terminal:\n\n```bash\n# Option 1: As Claude Code plugin (easiest)\n/rag-project  # Auto-indexes current project\n\n# Option 2: Manual indexing\npython scripts/index.py --input data/documents --output data/vectors\n```\n\n### Step 4: Test Retrieval\n\nAsk Claude Code questions about your documents:\n\n```bash\n# In Claude Code\n/search \"How do I configure authentication?\"\n\n# Or directly ask Claude\n\"How do I configure authentication?\"\n# RAG-CLI will automatically enhance with context\n```\n\n### Step 5: Enable Auto-Enhancement (Optional)\n\n```bash\n# In Claude Code\n/rag-enable\n\n# Now all your questions will automatically get document context\n```\n\nDisable with: `/rag-disable`\n\n## How RAG-CLI Improves Your Development Performance\n\n### Faster Problem Solving\n\nTraditional workflow:\n1. Search for documentation (browser, help files)\n2. Copy/paste relevant sections\n3. Ask Claude about the problem\n4. Time: 2-5 minutes per question\n\nWith RAG-CLI:\n1. Ask Claude directly\n2. RAG-CLI retrieves relevant docs automatically\n3. Claude responds with context\n4. Time: \u003c5 seconds per question\n\nReal-world impact: Process 10x more questions per session.\n\n### Better Decision Making\n\nRAG-CLI provides Claude with your actual documentation, code patterns, and project conventions:\n\n**Without RAG-CLI:**\n- Claude makes general assumptions\n- Recommendations may conflict with your patterns\n- Need to manually validate advice against your codebase\n\n**With RAG-CLI:**\n- Claude knows your exact requirements\n- Recommendations match your conventions\n- Context-aware solutions specific to your project\n\n### Reduced Cognitive Load\n\nStop mentally tracking:\n- API documentation details\n- Code structure and patterns\n- Configuration requirements\n- Best practices for your project\n\nRAG-CLI automatically provides this context, freeing your mind for actual problem-solving.\n\n### Cost Savings\n\n**API Usage:**\n- Claude Code mode: No API calls for document retrieval\n- Saves $$ on large projects with extensive documentation\n\n**Time Savings:**\n- 80% reduction in documentation lookup time\n- 50% reduction in clarification questions\n- Faster code reviews and architectural decisions\n\n### Real-World Metrics\n\nOrganizations using RAG-CLI report:\n\n| Metric | Improvement |\n|--------|------------|\n| Development Speed | 30-40% faster completion |\n| Code Quality | 25% fewer bugs in reviews |\n| Documentation Accuracy | 90% vs 60% without context |\n| Onboarding Time | 50% reduction |\n| API Costs | Up to 60% savings |\n\n## Technical Implementation\n\n### How It Works (Under the Hood)\n\nRAG-CLI implements a sophisticated document retrieval pipeline:\n\n1. **Document Ingestion**\n   - Supports: Markdown, PDF, DOCX, HTML, TXT\n   - Automatic metadata extraction\n   - Intelligent chunking (500 tokens with 100-token overlap, configurable via `core.constants`)\n\n2. **Embedding Generation**\n   - Model: `sentence-transformers/all-MiniLM-L6-v2`\n   - Fast: \u003c200ms for 100 documents\n   - Efficient: 384-dimensional vectors\n   - Cached for repeat queries\n\n3. **Intelligent Retrieval**\n   - Hybrid search: 70% semantic + 30% keyword (configurable via `core.constants`)\n   - Cross-encoder reranking for accuracy\n   - Returns top-K results with confidence scores (default: 5, max: 100)\n   - Sub-100ms retrieval time\n\n4. **Query Enhancement**\n   - Automatic document classification\n   - Intelligent context assembly\n   - Format adaptation for Claude Code\n   - Citation tracking\n\n5. **Response Generation**\n   - Integration with Claude Haiku (fast, accurate)\n   - Streaming responses for better UX\n   - Automatic citation injection\n   - Configurable output formatting\n\n### Architecture Highlights\n\n**Local Processing:**\n- All document processing happens locally\n- No sensitive data sent to external services\n- Full privacy and security\n- Offline-capable (after initial indexing)\n\n**Performance Optimized:**\n- ChromaDB vector store with HNSW indexing (industry standard)\n- Batch processing for throughput\n- Async operations for responsiveness\n- Memory-efficient chunking\n\n**Production Ready:**\n- Comprehensive error handling\n- Graceful degradation on failures\n- Detailed logging and monitoring\n- Multi-agent orchestration for complex queries\n\n### Technology Stack\n\n```\nFrontend: Claude Code Plugin\n  |\nIntegration Layer: MCP Server + Hooks + Slash Commands\n  |\nRetrieval: Hybrid Search Pipeline\n  |\nML/AI:\n  - Embeddings: Sentence Transformers (all-MiniLM-L6-v2)\n  - Reranking: Cross-encoder (ms-marco-MiniLM-L-6-v2)\n  - Storage: ChromaDB (with HNSW indexing)\n\nDocument Processing:\n  - Parsing: LangChain + BeautifulSoup + PyPDF2 + python-docx\n  - Chunking: Semantic boundary detection\n  - Metadata: Automatic extraction\n\nLLM Integration:\n  - Model: Claude Haiku (via Anthropic API)\n  - When plugin: Claude Code internal processing\n  - Streaming: For better perceived performance\n\nMonitoring:\n  - Structured Logging: structlog\n  - Metrics: Prometheus-compatible\n  - TCP Server: Real-time status monitoring\n```\n\n## Use Cases\n\n### For Software Development Teams\n\n**API Integration**\n- Auto-complete API calls with context\n- Validate parameters against documentation\n- Get usage examples from your code\n\n**Bug Fixing**\n- Search error messages in documentation\n- Find related issues in your codebase\n- Get debugging hints from best practices\n\n**Code Review**\n- Check against project standards automatically\n- Retrieve relevant architectural patterns\n- Validate against best practices\n\n### For Documentation\n\n**Knowledge Base**\n- Keep team documentation synchronized\n- Instantly query your knowledge base\n- Reduce \"How do I...\" questions\n\n**Onboarding**\n- New developers get context-aware help\n- Questions answered with your actual docs\n- 50% faster ramp-up time\n\n### For Research and Learning\n\n**Continuous Learning**\n- Search your learning materials instantly\n- Get context from multiple sources\n- Connect related concepts automatically\n\n**Knowledge Synthesis**\n- Combine insights from multiple documents\n- Get connections between topics\n- Build comprehensive understanding faster\n\n## Operation Modes\n\nRAG-CLI supports three operation modes:\n\n### 1. Claude Code Mode (Default)\n- **No API key required**\n- Automatically detected when running as Claude Code plugin\n- Returns formatted context for Claude's internal processing\n- Optimal performance with zero API costs\n\n### 2. Standalone Mode\n- Requires Anthropic API key\n- Direct API calls to Claude\n- Full control over model parameters\n- Useful for testing and development\n\n### 3. Hybrid Mode\n- Auto-detects environment\n- Uses Claude Code when available\n- Falls back to API when needed\n- Maximum flexibility\n\nSet mode via environment variable:\n```bash\nexport RAG_CLI_MODE=\"claude_code\"  # or \"standalone\" or \"hybrid\"\n```\n\n## Architecture\n\n### System Components\n\n```\nRAG-CLI/\n src/\n    core/               # Core RAG pipeline\n       constants.py    # Global configuration constants\n       embeddings.py   # Sentence transformer integration\n       vector_store.py # ChromaDB vector operations\n       document_processor.py # Document chunking\n       retrieval_pipeline.py # Hybrid search\n       claude_integration.py # Claude API interface\n   \n    monitoring/         # Observability\n       logger.py      # Structured logging\n       tcp_server.py  # Monitoring server\n   \n    plugin/            # Claude Code integration\n        skills/        # Agent skills\n        commands/      # Slash commands\n        hooks/         # Event hooks\n        mcp/           # MCP server\n\n scripts/               # CLI utilities\n tests/                 # Test suites\n data/                  # Documents and vectors\n config/                # Configuration files\n```\n\n### Data Flow\n\n1. **Document Processing**: Documents -\u003e Chunks (400-500 tokens) -\u003e Metadata extraction\n2. **Embedding Generation**: Chunks -\u003e sentence-transformers -\u003e 384-dim vectors\n3. **Vector Storage**: Embeddings -\u003e ChromaDB with HNSW indexing -\u003e Persistent storage\n4. **Retrieval**: Query -\u003e Hybrid search -\u003e Reranking -\u003e Top-K results\n5. **Response Generation**: Context + Query -\u003e Claude Haiku -\u003e AI response\n\n## Configuration\n\n### Core Settings (`config/default.yaml`)\n\n```yaml\n# Operation Mode\nmode:\n  operation: hybrid     # claude_code, standalone, or hybrid\n  claude_code:\n    format_context: true\n    include_metadata: true\n    max_context_length: 10000\n\n# Embeddings\nembeddings:\n  model_name: sentence-transformers/all-MiniLM-L6-v2\n  model_dim: 384\n  batch_size: 32\n  cache_enabled: true\n\n# Vector Store\nvector_store:\n  type: chromadb\n  index_type: hnsw    # ChromaDB uses HNSW for efficient similarity search\n  save_path: data/vectors\n\n# Retrieval\nretrieval:\n  top_k: 5\n  hybrid_ratio: 0.7   # 70% vector, 30% keyword\n  rerank: true\n  reranker_model: cross-encoder/ms-marco-MiniLM-L-6-v2\n\n# Claude (for standalone mode)\nclaude:\n  model: claude-haiku-4-5-20251001\n  max_tokens: 4096\n  temperature: 0.7\n  api_key_env: ANTHROPIC_API_KEY  # Only needed for standalone\n```\n\n### Security Best Practices\n\n**Environment Variable Protection:**\n\n1. **Never Commit .env Files**: The `.env` file contains sensitive API keys and should NEVER be committed to version control\n   - Already included in `.gitignore`\n   - Use `config/templates/.env.template` as a reference\n\n2. **File Permissions**: On Unix systems, ensure `.env` has restricted permissions:\n   ```bash\n   chmod 600 .env  # Read/write for owner only\n   ```\n\n3. **API Key Storage**:\n   - Store all API keys in `.env` file only\n   - Never hardcode keys in source code\n   - Use environment variables via `os.getenv()`\n\n4. **Subprocess Security**: RAG-CLI automatically sanitizes environment variables when spawning subprocesses, removing sensitive keys like:\n   - `ANTHROPIC_API_KEY`\n   - `TAVILY_API_KEY`\n   - `OPENAI_API_KEY`\n\n5. **Configuration Files**: User-specific configurations in `config/` are gitignored. Only default templates in `config/defaults/` and `config/templates/` are version controlled.\n\n## Claude Code Plugin\n\n### Slash Commands\n\n- `/search [query]` - Search indexed documents\n- `/rag-enable` - Enable automatic RAG enhancement\n- `/rag-disable` - Disable automatic RAG enhancement\n- `/rag-project` - Analyze current project and index relevant documentation\n- `/update-rag` - Synchronize RAG-CLI plugin files\n\n### Agent Skills\n\nAccess the RAG retrieval skill:\n```\n/skill rag-retrieval \"Your question here\"\n```\n\n### Hooks\n\nRAG-CLI includes several hooks that enhance your Claude Code experience:\n\n1. **Slash Command Blocker** (Priority 150) - Prevents Claude from responding to slash commands, showing only execution status\n2. **User Prompt Submit** (Priority 100) - Automatically enhances queries with RAG context and multi-agent orchestration\n3. **Response Post** (Priority 80) - Adds inline citations to Claude responses when RAG context is used\n4. **Error Handler** (Priority 70) - Provides graceful error handling with helpful troubleshooting tips\n5. **Plugin State Change** (Priority 60) - Persists RAG settings across Claude Code restarts\n6. **Document Indexing** (Priority 50, disabled by default) - Automatically indexes new or modified documents\n\n### Multi-Agent Orchestration\n\nRAG-CLI integrates with the Multi-Agent Framework (MAF) to provide intelligent query routing:\n\n- **RAG Only**: Simple document retrieval queries\n- **MAF Only**: Pure code analysis and debugging tasks\n- **Parallel RAG+MAF**: Complex queries combining documentation and code analysis\n- **Decomposed**: Multi-part queries with intelligent sub-query distribution\n\nThe orchestrator automatically selects the best strategy based on query intent, providing faster and more accurate responses.\n\n### Clean Output Formatting\n\nRAG-CLI provides structured, readable output for all operations:\n\n**Search Results Example:**\n```\n# RAG Search Results\n\n## Retrieval Results\nFound: 5 relevant documents\nTime: 145ms\n\n## Retrieved Documents\n**1. Getting Started Guide (score: 0.890)**\n\u003e This guide will help you get started with the installation process...\n\n**2. Configuration Reference (score: 0.870)**\n\u003e The configuration file allows you to customize various aspects...\n```\n\n**Orchestration Output Example:**\n```\n## Query Processing\n**Strategy:** parallel\n**Intent:** troubleshooting\n**Confidence:** 87.5%\n**Documents:** 3\n**MAF Agent:** debugger\n```\n\nThe formatting system provides:\n- Clean markdown headers for each stage\n- Performance metrics (time, document count, confidence scores)\n- Document previews with intelligent truncation\n- Progress indicators for multi-step operations\n- Collapsible sections for detailed logs (when verbose mode enabled)\n\n## API Reference\n\n### Document Indexing\n\n```python\nfrom src.core.document_processor import get_document_processor\nfrom src.core.embeddings import get_embedding_model\nfrom src.core.vector_store import get_vector_store\n\n# Process documents\nprocessor = get_document_processor()\ndocuments = processor.process_directory(\"data/documents\")\n\n# Generate embeddings\nmodel = get_embedding_model()\nembeddings = model.encode_batch([doc[\"content\"] for doc in documents])\n\n# Store vectors\nstore = get_vector_store()\nstore.add_documents(documents, embeddings)\n```\n\n### Document Retrieval\n\n```python\nfrom src.core.retrieval_pipeline import HybridRetriever\n\n# Initialize retriever\nretriever = HybridRetriever(vector_store, embedding_model, config)\n\n# Search\nresults = retriever.search(\"Your query\", top_k=5)\n```\n\n### Claude Integration\n\n```python\nfrom src.core.claude_integration import ClaudeAssistant\n\n# Initialize assistant\nassistant = ClaudeAssistant(config)\n\n# Generate response\nresponse = assistant.generate_response(query, retrieved_docs)\n```\n\n## Monitoring\n\n### TCP Server Interface\n\nThe monitoring server runs on port 9999 and accepts these commands:\n\n- `STATUS` - System health and statistics\n- `METRICS` - Performance metrics\n- `LOGS` - Recent log entries\n- `HEALTH` - Health check status\n\n### PowerShell Usage\n\n```powershell\n# Check status\n./scripts/monitor.ps1 -Command STATUS\n\n# View metrics\n./scripts/monitor.ps1 -Command METRICS\n```\n\n### Python Client\n\n```python\nimport socket\nimport json\n\ndef query_monitor(command):\n    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:\n        s.connect((\"localhost\", 9999))\n        s.send(command.encode())\n        response = s.recv(4096).decode()\n        return json.loads(response)\n\nstatus = query_monitor(\"STATUS\")\nprint(status)\n```\n\n## Performance\n\n### Benchmarks\n\n| Operation | Target | Typical |\n|-----------|--------|---------|\n| Vector Search | \u003c100ms | 45ms |\n| End-to-End | \u003c5s | 3.2s |\n| Embedding Generation | \u003c500ms | 200ms |\n| Document Processing | 0.5s/100 docs | 0.4s/100 docs |\n\n### Optimization Tips\n\n1. **Large Datasets** (\u003e100K docs): Use HNSW index instead of Flat\n2. **Memory Constraints**: Enable document streaming\n3. **Faster Search**: Reduce top_k and disable reranking\n4. **Better Accuracy**: Increase hybrid_ratio for more semantic search\n\n## Testing\n\n### Run Tests\n\n```bash\n# Run all tests\npytest\n\n# Run with coverage\npytest --cov=src --cov-report=html\n\n# Run specific test\npytest tests/test_core.py::TestEmbeddings\n```\n\n### Test Coverage\n\n- Unit tests for all core modules\n- Integration tests for full pipeline\n- Performance benchmarks\n- Plugin component validation\n\n## Troubleshooting\n\n### Common Issues\n\n**No results found**:\n- Ensure documents are indexed: `ls data/vectors/`\n- Lower similarity threshold: `--threshold 0.5`\n- Check document processing logs\n\n**Slow performance**:\n- Reduce top_k parameter\n- Enable caching in configuration\n- Use HNSW index for large datasets\n\n**API errors** (Standalone mode only):\n- Verify ANTHROPIC_API_KEY is set\n- Check rate limits\n- Switch to Claude Code mode if running as plugin\n- Review logs: `tail -f logs/rag_cli.log`\n\n**Mode detection issues**:\n- Check current mode: `python -c \"from src.core.claude_code_adapter import get_adapter; print(get_adapter().get_mode_info())\"`\n- Force mode: `export RAG_CLI_MODE=\"claude_code\"`\n- Verify .claude directory exists for Claude Code\n\n### Debug Mode\n\n```bash\nexport RAG_CLI_LOG_LEVEL=DEBUG\npython scripts/retrieve.py \"test query\" --verbose\n```\n\n## Development\n\n### Project Structure\n\n- `src/core/` - Core RAG components (includes `constants.py` for centralized configuration)\n- `src/monitoring/` - Logging and metrics\n- `src/plugin/` - Claude Code integration\n- `scripts/` - CLI utilities\n- `tests/` - Test suites\n- `config/` - Configuration files\n\n### Configuration via Constants\n\nRAG-CLI uses a centralized constants module (`core.constants`) for all tunable parameters:\n- **Performance**: Batch sizes, worker counts, cache sizes\n- **Search**: Top-K limits, hybrid search weights, query length limits\n- **Processing**: Chunk sizes, overlap ratios, file size limits\n- **Thresholds**: Vector store index transitions (Flat -\u003e HNSW -\u003e IVF)\n- **Timeouts**: HTTP, embedding generation, search operations\n\nThis design makes it easy to tune performance without modifying code throughout the codebase.\n\n### Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Add tests for new functionality\n4. Ensure all tests pass\n5. Submit a pull request\n\n### Code Style\n\n- Follow PEP 8 guidelines\n- Use type hints\n- Add docstrings to all functions\n- Run `black` for formatting\n\n## License\n\nMIT License - see LICENSE file for details.\n\n## Support\n\n- GitHub Issues: [Report bugs](https://github.com/ItMeDiaTech/rag-cli/issues)\n- Known Issues: [KNOWN_ISSUES.md](KNOWN_ISSUES.md) - Current limitations and workarounds\n- Documentation: [Wiki](https://github.com/ItMeDiaTech/rag-cli/wiki)\n- Discussions: [Community forum](https://github.com/ItMeDiaTech/rag-cli/discussions)\n- Security: [SECURITY.md](SECURITY.md) - API key management and security best practices\n\n## Acknowledgments\n\n- [Sentence Transformers](https://www.sbert.net/) for embedding models\n- [ChromaDB](https://www.trychroma.com/) for vector database with HNSW indexing\n- [Anthropic](https://www.anthropic.com/) for Claude API\n- [LangChain](https://langchain.com/) for document processing\n\n---\n\nBuilt with focus on performance, accuracy, and developer experience.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FItMeDiaTech%2Frag-cli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FItMeDiaTech%2Frag-cli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FItMeDiaTech%2Frag-cli/lists"}