{"id":28790654,"url":"https://github.com/vitali87/code-graph-rag","last_synced_at":"2026-01-21T03:14:18.930Z","repository":{"id":299455142,"uuid":"1003043071","full_name":"vitali87/code-graph-rag","owner":"vitali87","description":"The ultimate RAG for your monorepo. Query, understand, and edit multi-language codebases with the power of AI and knowledge graphs","archived":false,"fork":false,"pushed_at":"2026-01-18T14:39:02.000Z","size":47029,"stargazers_count":1702,"open_issues_count":64,"forks_count":284,"subscribers_count":21,"default_branch":"main","last_synced_at":"2026-01-18T21:59:44.873Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vitali87.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"vitali87","buy_me_a_coffee":"vitali87"}},"created_at":"2025-06-16T14:31:57.000Z","updated_at":"2026-01-18T14:42:23.000Z","dependencies_parsed_at":"2025-07-01T00:27:42.833Z","dependency_job_id":"8afdc918-b887-4945-9148-cc9be8129ea1","html_url":"https://github.com/vitali87/code-graph-rag","commit_stats":null,"previous_names":["vitali87/code-graph-rag"],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/vitali87/code-graph-rag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vitali87%2Fcode-graph-rag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/v
itali87%2Fcode-graph-rag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vitali87%2Fcode-graph-rag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vitali87%2Fcode-graph-rag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vitali87","download_url":"https://codeload.github.com/vitali87/code-graph-rag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vitali87%2Fcode-graph-rag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28624352,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-21T02:47:06.670Z","status":"ssl_error","status_checked_at":"2026-01-21T02:45:44.886Z","response_time":86,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-17T23:07:27.557Z","updated_at":"2026-01-21T03:14:18.912Z","avatar_url":"https://github.com/vitali87.png","language":"Python","funding_links":["https://github.com/sponsors/vitali87","https://buymeacoffee.com/vitali87"],"categories":["Code Analysis","其他工具与实用程序","Python","🎯 Advanced Approaches"],"sub_categories":["Implementation Resources"],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource srcset=\"assets/logo-dark-any.png\" media=\"(prefers-color-scheme: 
dark)\"\u003e\n    \u003csource srcset=\"assets/logo-light-any.png\" media=\"(prefers-color-scheme: light)\"\u003e\n    \u003cimg src=\"assets/logo-dark.png\" alt=\"Graph-Code Logo\" width=\"480\"\u003e\n  \u003c/picture\u003e\n\n  \u003cp\u003e\n  \u003ca href=\"https://github.com/vitali87/code-graph-rag/stargazers\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/stars/vitali87/code-graph-rag?style=social\" alt=\"GitHub stars\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/vitali87/code-graph-rag/network/members\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/forks/vitali87/code-graph-rag?style=social\" alt=\"GitHub forks\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/vitali87/code-graph-rag/blob/main/LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/license/vitali87/code-graph-rag\" alt=\"License\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://mseep.ai/app/vitali87-code-graph-rag\"\u003e\n    \u003cimg src=\"https://mseep.net/pr/vitali87-code-graph-rag-badge.png\" alt=\"MseeP.ai Security Assessment\" height=\"20\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\u003c/div\u003e\n\n# Graph-Code: A Graph-Based RAG System for Any Codebase\n\nAn accurate Retrieval-Augmented Generation (RAG) system that analyzes multi-language codebases using Tree-sitter, builds comprehensive knowledge graphs, and supports both natural-language querying of codebase structure and relationships and code editing.\n\n\n![demo](./assets/demo.gif)\n\n## Latest News 🔥\n\n- **[NEW]** **MCP Server Integration**: Graph-Code now works as an MCP server with Claude Code! Query and edit your codebase using natural language directly from Claude Code. [Setup Guide](docs/claude-code-setup.md)\n- [2025/10/21] **Semantic Code Search**: Added intent-based code search using UniXcoder embeddings. 
Find functions by describing what they do (e.g., \"error handling functions\", \"authentication code\") rather than by exact names.\n\n## 🚀 Features\n\n- **Multi-Language Support**:\n\n\u003c!-- SECTION:supported_languages --\u003e\n| Language | Status | Extensions | Functions | Classes/Structs | Modules | Package Detection | Additional Features |\n|--------|------|----------|---------|---------------|-------|-----------------|-------------------|\n| C++ | Fully Supported | .cpp, .h, .hpp, .cc, .cxx, .hxx, .hh, .ixx, .cppm, .ccm | ✓ | ✓ | ✓ | ✓ | Constructors, destructors, operator overloading, templates, lambdas, C++20 modules, namespaces |\n| Java | Fully Supported | .java | ✓ | ✓ | ✓ | - | Generics, annotations, modern features (records/sealed classes), concurrency, reflection |\n| JavaScript | Fully Supported | .js, .jsx | ✓ | ✓ | ✓ | - | ES6 modules, CommonJS, prototype methods, object methods, arrow functions |\n| Lua | Fully Supported | .lua | ✓ | - | ✓ | - | Local/global functions, metatables, closures, coroutines |\n| Python | Fully Supported | .py | ✓ | ✓ | ✓ | ✓ | Type inference, decorators, nested functions |\n| Rust | Fully Supported | .rs | ✓ | ✓ | ✓ | ✓ | impl blocks, associated functions |\n| TypeScript | Fully Supported | .ts, .tsx | ✓ | ✓ | ✓ | - | Interfaces, type aliases, enums, namespaces, ES6/CommonJS modules |\n| C# | In Development | .cs | ✓ | ✓ | ✓ | - | Classes, interfaces, generics (planned) |\n| Go | In Development | .go | ✓ | ✓ | ✓ | - | Methods, type declarations |\n| PHP | In Development | .php | ✓ | ✓ | ✓ | - | Classes, functions, namespaces |\n| Scala | In Development | .scala, .sc | ✓ | ✓ | ✓ | - | Case classes, objects |\n\u003c!-- /SECTION:supported_languages --\u003e\n- **🌳 Tree-sitter Parsing**: Uses Tree-sitter for robust, language-agnostic AST parsing\n- **📊 Knowledge Graph Storage**: Uses Memgraph to store codebase structure as an interconnected graph\n- **🗣️ Natural Language Querying**: Ask questions about your codebase in 
plain English\n- **🤖 AI-Powered Cypher Generation**: Supports cloud models (Google Gemini, OpenAI) and local models (Ollama) for natural-language-to-Cypher translation\n- **🤖 OpenAI Integration**: Use OpenAI models as the orchestrator, the Cypher generator, or both.\n- **📝 Code Snippet Retrieval**: Retrieves actual source code snippets for found functions/methods\n- **✍️ Advanced File Editing**: Surgical code replacement with AST-based function targeting, visual diff previews, and exact code block modifications\n- **⚡️ Shell Command Execution**: Can execute terminal commands for tasks like running tests or using CLI tools.\n- **🚀 Interactive Code Optimization**: AI-powered codebase optimization with language-specific best practices and interactive approval workflow\n- **📚 Reference-Guided Optimization**: Use your own coding standards and architectural documents to guide optimization suggestions\n- **🔗 Dependency Analysis**: Parses `pyproject.toml` to understand external dependencies\n- **🎯 Nested Function Support**: Handles complex nested functions and class hierarchies\n- **🔄 Language-Agnostic Design**: Unified graph schema across all supported languages\n\n## 🏗️ Architecture\n\nThe system consists of two main components:\n\n1. **Multi-language Parser**: Tree-sitter-based parsing system that analyzes codebases and ingests data into Memgraph\n2. 
**RAG System** (`codebase_rag/`): Interactive CLI for querying the stored knowledge graph\n\n\n## 📋 Prerequisites\n\n- Python 3.12+\n- Docker \u0026 Docker Compose (for Memgraph)\n- **cmake** (required for building the pymgclient dependency)\n- **ripgrep** (`rg`) (required for shell command text searching)\n- **For cloud models**: Google Gemini or OpenAI API key\n- **For local models**: Ollama installed and running\n- `uv` package manager\n\n### Installing cmake and ripgrep\n\nOn macOS:\n```bash\nbrew install cmake ripgrep\n```\n\nOn Linux (Ubuntu/Debian):\n```bash\nsudo apt-get update\nsudo apt-get install cmake ripgrep\n```\n\nOn Linux (CentOS/RHEL):\n```bash\nsudo yum install cmake\nsudo dnf install ripgrep\n# Note: ripgrep may need to be installed from EPEL or via cargo\n```\n\n## 🛠️ Installation\n\n1. **Clone the repository**:\n```bash\ngit clone https://github.com/vitali87/code-graph-rag.git\n\ncd code-graph-rag\n```\n\n2. **Install dependencies**:\n\nFor basic Python support:\n```bash\nuv sync\n```\n\nFor full multi-language support:\n```bash\nuv sync --extra treesitter-full\n```\n\nThis installs Tree-sitter grammars for all supported languages (see the Multi-Language Support section).\n\nFor development (including tests and pre-commit hooks):\n```bash\nmake dev\n```\n\nThis installs all dependencies and sets up pre-commit hooks automatically.\n\n3. 
**Set up environment variables**:\n```bash\ncp .env.example .env\n# Edit .env with your configuration (see options below)\n```\n\n### Configuration Options\n\nThe new provider-explicit configuration supports mixing different providers for orchestrator and cypher models.\n\n#### Option 1: All Ollama (Local Models)\n\n```bash\n# .env file\nORCHESTRATOR_PROVIDER=ollama\nORCHESTRATOR_MODEL=llama3.2\nORCHESTRATOR_ENDPOINT=http://localhost:11434/v1\n\nCYPHER_PROVIDER=ollama\nCYPHER_MODEL=codellama\nCYPHER_ENDPOINT=http://localhost:11434/v1\n```\n\n#### Option 2: All OpenAI Models\n```bash\n# .env file\nORCHESTRATOR_PROVIDER=openai\nORCHESTRATOR_MODEL=gpt-4o\nORCHESTRATOR_API_KEY=sk-your-openai-key\n\nCYPHER_PROVIDER=openai\nCYPHER_MODEL=gpt-4o-mini\nCYPHER_API_KEY=sk-your-openai-key\n```\n\n#### Option 3: All Google Models\n```bash\n# .env file\nORCHESTRATOR_PROVIDER=google\nORCHESTRATOR_MODEL=gemini-2.5-pro\nORCHESTRATOR_API_KEY=your-google-api-key\n\nCYPHER_PROVIDER=google\nCYPHER_MODEL=gemini-2.5-flash\nCYPHER_API_KEY=your-google-api-key\n```\n\n#### Option 4: Mixed Providers\n```bash\n# .env file - Google orchestrator + Ollama cypher\nORCHESTRATOR_PROVIDER=google\nORCHESTRATOR_MODEL=gemini-2.5-pro\nORCHESTRATOR_API_KEY=your-google-api-key\n\nCYPHER_PROVIDER=ollama\nCYPHER_MODEL=codellama\nCYPHER_ENDPOINT=http://localhost:11434/v1\n```\n\nGet your Google API key from [Google AI Studio](https://aistudio.google.com/app/apikey).\n\n**Install and run Ollama**:\n```bash\n# Install Ollama (macOS/Linux)\ncurl -fsSL https://ollama.ai/install.sh | sh\n\n# Pull required models\nollama pull llama3.2\n# Or try other models like:\n# ollama pull llama3\n# ollama pull mistral\n# ollama pull codellama\n\n# Ollama will automatically start serving on localhost:11434\n```\n\n\u003e **Note**: Local models provide privacy and no API costs, but may have lower accuracy compared to cloud models like Gemini.\n\n4. 
**Start Memgraph database**:\n```bash\ndocker-compose up -d\n```\n\n## 🛠️ Makefile Commands\n\nUse the Makefile for common development tasks:\n\n\u003c!-- SECTION:makefile_commands --\u003e\n| Command | Description |\n|-------|-----------|\n| `make help` | Show this help message |\n| `make all` | Install everything for full development environment (deps, grammars, hooks, tests) |\n| `make install` | Install project dependencies with full language support |\n| `make python` | Install project dependencies for Python only |\n| `make dev` | Setup development environment (install deps + pre-commit hooks) |\n| `make test` | Run unit tests only (fast, no Docker) |\n| `make test-parallel` | Run unit tests in parallel (fast, no Docker) |\n| `make test-integration` | Run integration tests (requires Docker) |\n| `make test-all` | Run all tests including integration and e2e (requires Docker) |\n| `make test-parallel-all` | Run all tests in parallel including integration and e2e (requires Docker) |\n| `make clean` | Clean up build artifacts and cache |\n| `make build-grammars` | Build grammar submodules |\n| `make watch` | Watch repository for changes and update graph in real-time |\n| `make readme` | Regenerate README.md from codebase |\n| `make lint` | Run ruff check |\n| `make format` | Run ruff format |\n| `make typecheck` | Run type checking with ty |\n| `make check` | Run all checks: lint, typecheck, test |\n\u003c!-- /SECTION:makefile_commands --\u003e\n\n## 🎯 Usage\n\nThe Graph-Code system offers five main modes of operation:\n1. **Parse \u0026 Ingest**: Build knowledge graph from your codebase\n2. **Interactive Query**: Ask questions about your code in natural language\n3. **Export \u0026 Analyze**: Export graph data for programmatic analysis\n4. **AI Optimization**: Get AI-powered optimization suggestions for your code.\n5. 
**Editing**: Perform surgical code replacements and modifications with precise targeting.\n\n### Step 1: Parse a Repository\n\nParse and ingest a multi-language repository into the knowledge graph:\n\n**For the first repository (clean start):**\n```bash\ncgr start --repo-path /path/to/repo1 --update-graph --clean\n```\n\n**For additional repositories (preserve existing data):**\n```bash\ncgr start --repo-path /path/to/repo2 --update-graph\ncgr start --repo-path /path/to/repo3 --update-graph\n```\n\n**Control Memgraph batch flushing:**\n```bash\n# Flush every 5,000 records instead of the default from settings\ncgr start --repo-path /path/to/repo --update-graph \\\n  --batch-size 5000\n```\n\nThe system automatically detects and processes files for all supported languages (see Multi-Language Support section).\n\n### Step 2: Query the Codebase\n\nStart the interactive RAG CLI:\n\n```bash\ncgr start --repo-path /path/to/your/repo\n```\n\n### Step 2.5: Real-Time Graph Updates (Optional)\n\nFor active development, you can keep your knowledge graph automatically synchronized with code changes using the realtime updater. 
This is particularly useful when you're actively modifying code and want the AI assistant to always work with the latest codebase structure.\n\n**What it does:**\n- Watches your repository for file changes (create, modify, delete)\n- Automatically updates the knowledge graph in real-time\n- Maintains consistency by recalculating all function call relationships\n- Filters out irrelevant files (`.git`, `node_modules`, etc.)\n\n**How to use:**\n\nRun the realtime updater in a separate terminal:\n\n```bash\n# Using Python directly\npython realtime_updater.py /path/to/your/repo\n\n# Or using the Makefile\nmake watch REPO_PATH=/path/to/your/repo\n```\n\n**With custom Memgraph settings:**\n```bash\n# Python\npython realtime_updater.py /path/to/your/repo --host localhost --port 7687 --batch-size 1000\n\n# Makefile\nmake watch REPO_PATH=/path/to/your/repo HOST=localhost PORT=7687 BATCH_SIZE=1000\n```\n\n**Multi-terminal workflow:**\n```bash\n# Terminal 1: Start the realtime updater\npython realtime_updater.py ~/my-project\n\n# Terminal 2: Run the AI assistant\ncgr start --repo-path ~/my-project\n```\n\n**Performance note:** The updater currently recalculates all CALLS relationships on every file change to ensure consistency. This prevents \"island\" problems where changes in one file aren't reflected in relationships from other files, but may impact performance on very large codebases with frequent changes. 
**Note:** Optimization of this behavior is a work in progress.\n\n**CLI Arguments:**\n- `repo_path` (required): Path to repository to watch\n- `--host`: Memgraph host (default: `localhost`)\n- `--port`: Memgraph port (default: `7687`)\n- `--batch-size`: Number of buffered nodes/relationships before flushing to Memgraph\n\n**Specify Custom Models:**\n```bash\n# Use specific local models\ncgr start --repo-path /path/to/your/repo \\\n  --orchestrator ollama:llama3.2 \\\n  --cypher ollama:codellama\n\n# Use specific Gemini models\ncgr start --repo-path /path/to/your/repo \\\n  --orchestrator google:gemini-2.0-flash-thinking-exp-01-21 \\\n  --cypher google:gemini-2.5-flash-lite-preview-06-17\n\n# Use mixed providers\ncgr start --repo-path /path/to/your/repo \\\n  --orchestrator google:gemini-2.0-flash-thinking-exp-01-21 \\\n  --cypher ollama:codellama\n```\n\nExample queries (works across all supported languages):\n- \"Show me all classes that contain 'user' in their name\"\n- \"Find functions related to database operations\"\n- \"What methods does the User class have?\"\n- \"Show me functions that handle authentication\"\n- \"List all TypeScript components\"\n- \"Find Rust structs and their methods\"\n- \"Show me Go interfaces and implementations\"\n- \"Find all C++ operator overloads in the Matrix class\"\n- \"Show me C++ template functions with their specializations\"\n- \"List all C++ namespaces and their contained classes\"\n- \"Find C++ lambda expressions used in algorithms\"\n- \"Add logging to all database connection functions\"\n- \"Refactor the User class to use dependency injection\"\n- \"Convert these Python functions to async/await pattern\"\n- \"Add error handling to authentication methods\"\n- \"Optimize this function for better performance\"\n\n### Step 3: Export Graph Data\n\nFor programmatic access and integration with other tools, you can export the entire knowledge graph to JSON:\n\n**Export during graph update:**\n```bash\ncgr start --repo-path 
/path/to/repo --update-graph --clean -o my_graph.json\n```\n\n**Export existing graph without updating:**\n```bash\ncgr export -o my_graph.json\n```\n\n**Optional: adjust Memgraph batching during export:**\n```bash\ncgr export -o my_graph.json --batch-size 5000\n```\n\n**Working with exported data:**\n```python\nfrom codebase_rag.graph_loader import load_graph\n\n# Load the exported graph\ngraph = load_graph(\"my_graph.json\")\n\n# Get summary statistics\nsummary = graph.summary()\nprint(f\"Total nodes: {summary['total_nodes']}\")\nprint(f\"Total relationships: {summary['total_relationships']}\")\n\n# Find specific node types\nfunctions = graph.find_nodes_by_label(\"Function\")\nclasses = graph.find_nodes_by_label(\"Class\")\n\n# Analyze relationships\nfor func in functions[:5]:\n    relationships = graph.get_relationships_for_node(func.node_id)\n    print(f\"Function {func.properties['name']} has {len(relationships)} relationships\")\n```\n\n**Example analysis script:**\n```bash\npython examples/graph_export_example.py my_graph.json\n```\n\nThis provides a reliable, programmatic way to access your codebase structure without LLM restrictions, perfect for:\n- Integration with other tools\n- Custom analysis scripts\n- Building documentation generators\n- Creating code metrics dashboards\n\n### Step 4: Code Optimization\n\nFor AI-powered codebase optimization with best practices guidance:\n\n**Basic optimization for a specific language:**\n```bash\ncgr optimize python --repo-path /path/to/your/repo\n```\n\n**Optimization with reference documentation:**\n```bash\ncgr optimize python \\\n  --repo-path /path/to/your/repo \\\n  --reference-document /path/to/best_practices.md\n```\n\n**Using specific models for optimization:**\n```bash\ncgr optimize javascript \\\n  --repo-path /path/to/frontend \\\n  --orchestrator google:gemini-2.0-flash-thinking-exp-01-21\n\n# Optional: override Memgraph batch flushing during optimization\ncgr optimize javascript --repo-path 
/path/to/frontend \\\n  --batch-size 5000\n```\n\n**Supported Languages for Optimization:**\nAll supported languages: `python`, `javascript`, `typescript`, `rust`, `go`, `java`, `scala`, `cpp`\n\n**How It Works:**\n1. **Analysis Phase**: The agent analyzes your codebase structure using the knowledge graph\n2. **Pattern Recognition**: Identifies common anti-patterns, performance issues, and improvement opportunities\n3. **Best Practices Application**: Applies language-specific best practices and patterns\n4. **Interactive Approval**: Presents each optimization suggestion for your approval before implementation\n5. **Guided Implementation**: Implements approved changes with detailed explanations\n\n**Example Optimization Session:**\n```\nStarting python optimization session...\n┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃ The agent will analyze your python codebase and propose specific          ┃\n┃ optimizations. You'll be asked to approve each suggestion before          ┃\n┃ implementation. Type 'exit' or 'quit' to end the session.                 
┃\n┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛\n\n🔍 Analyzing codebase structure...\n📊 Found 23 Python modules with potential optimizations\n\n💡 Optimization Suggestion #1:\n   File: src/data_processor.py\n   Issue: Using list comprehension in a loop can be optimized\n   Suggestion: Replace with generator expression for memory efficiency\n\n   [y/n] Do you approve this optimization?\n```\n\n**Reference Document Support:**\nYou can provide reference documentation (like coding standards, architectural guidelines, or best practices documents) to guide the optimization process:\n\n```bash\n# Use company coding standards\ncgr optimize python \\\n  --reference-document ./docs/coding_standards.md\n\n# Use architectural guidelines\ncgr optimize java \\\n  --reference-document ./ARCHITECTURE.md\n\n# Use performance best practices\ncgr optimize rust \\\n  --reference-document ./docs/performance_guide.md\n```\n\nThe agent will incorporate the guidance from your reference documents when suggesting optimizations, ensuring they align with your project's standards and architectural decisions.\n\n**Common CLI Arguments:**\n- `--orchestrator`: Specify provider:model for main operations (e.g., `google:gemini-2.0-flash-thinking-exp-01-21`, `ollama:llama3.2`)\n- `--cypher`: Specify provider:model for graph queries (e.g., `google:gemini-2.5-flash-lite-preview-06-17`, `ollama:codellama`)\n- `--repo-path`: Path to repository (defaults to current directory)\n- `--batch-size`: Override Memgraph flush batch size (defaults to `MEMGRAPH_BATCH_SIZE` in settings)\n- `--reference-document`: Path to reference documentation (optimization only)\n\n## 🔌 MCP Server (Claude Code Integration)\n\nGraph-Code can run as an MCP (Model Context Protocol) server, enabling seamless integration with Claude Code and other MCP clients.\n\n### Quick Setup\n\n```bash\nclaude mcp add --transport stdio graph-code \\\n  --env TARGET_REPO_PATH=/absolute/path/to/your/project \\\n  
--env CYPHER_PROVIDER=openai \\\n  --env CYPHER_MODEL=gpt-4 \\\n  --env CYPHER_API_KEY=your-api-key \\\n  -- uv run --directory /path/to/code-graph-rag graph-code mcp-server\n```\n\n### Available Tools\n\n\u003c!-- SECTION:mcp_tools --\u003e\n| Tool | Description |\n|----|-----------|\n| `list_projects` | List all indexed projects in the knowledge graph database. Returns a list of project names that have been indexed. |\n| `delete_project` | Delete a specific project from the knowledge graph database. This removes all nodes associated with the project while preserving other projects. Use list_projects first to see available projects. |\n| `wipe_database` | WARNING: Completely wipe the entire database, removing ALL indexed projects. This cannot be undone. Use delete_project for removing individual projects. |\n| `index_repository` | Parse and ingest the repository into the Memgraph knowledge graph. This builds a comprehensive graph of functions, classes, dependencies, and relationships. Note: This preserves other projects - only the current project is re-indexed. |\n| `query_code_graph` | Query the codebase knowledge graph using natural language. Ask questions like 'What functions call UserService.create_user?' or 'Show me all classes that implement the Repository interface'. |\n| `get_code_snippet` | Retrieve source code for a function, class, or method by its qualified name. Returns the source code, file path, line numbers, and docstring. |\n| `surgical_replace_code` | Surgically replace an exact code block in a file using diff-match-patch. Only modifies the exact target block, leaving the rest unchanged. |\n| `read_file` | Read the contents of a file from the project. Supports pagination for large files. |\n| `write_file` | Write content to a file, creating it if it doesn't exist. |\n| `list_directory` | List contents of a directory in the project. 
|\n\u003c!-- /SECTION:mcp_tools --\u003e\n\n### Example Usage\n\n```\n\u003e Index this repository\n\u003e What functions call UserService.create_user?\n\u003e Update the login function to add rate limiting\n```\n\nFor detailed setup, see [Claude Code Setup Guide](docs/claude-code-setup.md).\n\n## 📊 Graph Schema\n\nThe knowledge graph uses the following node types and relationships:\n\n### Node Types\n\n\u003c!-- SECTION:node_schemas --\u003e\n| Label | Properties |\n|-----|----------|\n| Project | `{name: string}` |\n| Package | `{qualified_name: string, name: string, path: string}` |\n| Folder | `{path: string, name: string}` |\n| File | `{path: string, name: string, extension: string}` |\n| Module | `{qualified_name: string, name: string, path: string}` |\n| Class | `{qualified_name: string, name: string, decorators: list[string]}` |\n| Function | `{qualified_name: string, name: string, decorators: list[string]}` |\n| Method | `{qualified_name: string, name: string, decorators: list[string]}` |\n| Interface | `{qualified_name: string, name: string}` |\n| Enum | `{qualified_name: string, name: string}` |\n| Type | `{qualified_name: string, name: string}` |\n| Union | `{qualified_name: string, name: string}` |\n| ModuleInterface | `{qualified_name: string, name: string, path: string}` |\n| ModuleImplementation | `{qualified_name: string, name: string, path: string, implements_module: string}` |\n| ExternalPackage | `{name: string, version_spec: string}` |\n\u003c!-- /SECTION:node_schemas --\u003e\n\n### Language-Specific Mappings\n\n\u003c!-- SECTION:language_mappings --\u003e\n- **C++**: `class_specifier`, `declaration`, `enum_specifier`, `field_declaration`, `function_definition`, `lambda_expression`, `struct_specifier`, `template_declaration`, `union_specifier`\n- **Java**: `annotation_type_declaration`, `class_declaration`, `constructor_declaration`, `enum_declaration`, `interface_declaration`, `method_declaration`, `record_declaration`\n- **JavaScript**: 
`arrow_function`, `class`, `class_declaration`, `function_declaration`, `function_expression`, `generator_function_declaration`, `method_definition`\n- **Lua**: `function_declaration`, `function_definition`\n- **Python**: `class_definition`, `function_definition`\n- **Rust**: `closure_expression`, `enum_item`, `function_item`, `function_signature_item`, `impl_item`, `struct_item`, `trait_item`, `type_item`, `union_item`\n- **TypeScript**: `abstract_class_declaration`, `arrow_function`, `class`, `class_declaration`, `enum_declaration`, `function_declaration`, `function_expression`, `function_signature`, `generator_function_declaration`, `interface_declaration`, `internal_module`, `method_definition`, `type_alias_declaration`\n- **C#**: `anonymous_method_expression`, `class_declaration`, `constructor_declaration`, `destructor_declaration`, `enum_declaration`, `function_pointer_type`, `interface_declaration`, `lambda_expression`, `local_function_statement`, `method_declaration`, `struct_declaration`\n- **Go**: `function_declaration`, `method_declaration`, `type_declaration`\n- **PHP**: `anonymous_function`, `arrow_function`, `class_declaration`, `enum_declaration`, `function_definition`, `function_static_declaration`, `interface_declaration`, `trait_declaration`\n- **Scala**: `class_definition`, `function_declaration`, `function_definition`, `object_definition`, `trait_definition`\n\u003c!-- /SECTION:language_mappings --\u003e\n\n### Relationships\n\n\u003c!-- SECTION:relationship_schemas --\u003e\n| Source | Relationship | Target |\n|------|------------|------|\n| Project, Package, Folder | CONTAINS_PACKAGE | Package |\n| Project, Package, Folder | CONTAINS_FOLDER | Folder |\n| Project, Package, Folder | CONTAINS_FILE | File |\n| Project, Package, Folder | CONTAINS_MODULE | Module |\n| Module | DEFINES | Class, Function |\n| Class | DEFINES_METHOD | Method |\n| Module | IMPORTS | Module |\n| Module | EXPORTS | Class, Function |\n| Module | EXPORTS_MODULE | 
ModuleInterface |\n| Module | IMPLEMENTS_MODULE | ModuleImplementation |\n| Class | INHERITS | Class |\n| Class | IMPLEMENTS | Interface |\n| Method | OVERRIDES | Method |\n| ModuleImplementation | IMPLEMENTS | ModuleInterface |\n| Project | DEPENDS_ON_EXTERNAL | ExternalPackage |\n| Function, Method | CALLS | Function, Method |\n\u003c!-- /SECTION:relationship_schemas --\u003e\n\n## 🔧 Configuration\n\nConfiguration is managed through environment variables in `.env` file:\n\n### Provider-Specific Settings\n\n#### Orchestrator Model Configuration\n- `ORCHESTRATOR_PROVIDER`: Provider name (`google`, `openai`, `ollama`)\n- `ORCHESTRATOR_MODEL`: Model ID (e.g., `gemini-2.5-pro`, `gpt-4o`, `llama3.2`)\n- `ORCHESTRATOR_API_KEY`: API key for the provider (if required)\n- `ORCHESTRATOR_ENDPOINT`: Custom endpoint URL (if required)\n- `ORCHESTRATOR_PROJECT_ID`: Google Cloud project ID (for Vertex AI)\n- `ORCHESTRATOR_REGION`: Google Cloud region (default: `us-central1`)\n- `ORCHESTRATOR_PROVIDER_TYPE`: Google provider type (`gla` or `vertex`)\n- `ORCHESTRATOR_THINKING_BUDGET`: Thinking budget for reasoning models\n- `ORCHESTRATOR_SERVICE_ACCOUNT_FILE`: Path to service account file (for Vertex AI)\n\n#### Cypher Model Configuration\n- `CYPHER_PROVIDER`: Provider name (`google`, `openai`, `ollama`)\n- `CYPHER_MODEL`: Model ID (e.g., `gemini-2.5-flash`, `gpt-4o-mini`, `codellama`)\n- `CYPHER_API_KEY`: API key for the provider (if required)\n- `CYPHER_ENDPOINT`: Custom endpoint URL (if required)\n- `CYPHER_PROJECT_ID`: Google Cloud project ID (for Vertex AI)\n- `CYPHER_REGION`: Google Cloud region (default: `us-central1`)\n- `CYPHER_PROVIDER_TYPE`: Google provider type (`gla` or `vertex`)\n- `CYPHER_THINKING_BUDGET`: Thinking budget for reasoning models\n- `CYPHER_SERVICE_ACCOUNT_FILE`: Path to service account file (for Vertex AI)\n\n### System Settings\n- `MEMGRAPH_HOST`: Memgraph hostname (default: `localhost`)\n- `MEMGRAPH_PORT`: Memgraph port (default: `7687`)\n- 
`MEMGRAPH_HTTP_PORT`: Memgraph HTTP port (default: `7444`)\n- `LAB_PORT`: Memgraph Lab port (default: `3000`)\n- `MEMGRAPH_BATCH_SIZE`: Batch size for Memgraph operations (default: `1000`)\n- `TARGET_REPO_PATH`: Default repository path (default: `.`)\n- `LOCAL_MODEL_ENDPOINT`: Fallback endpoint for Ollama (default: `http://localhost:11434/v1`)\n\n### Custom Ignore Patterns\n\nYou can specify additional directories to exclude by creating a `.cgrignore` file in your repository root:\n\n```\n# Comments start with #\nvendor\n.custom_cache\nmy_build_output\n```\n\n- One directory name per line\n- Lines starting with `#` are comments\n- Blank lines are ignored\n- Patterns are exact directory name matches (not globs)\n- Patterns from `.cgrignore` are merged with `--exclude` flags and auto-detected directories\n\n### Key Dependencies\n\n\u003c!-- SECTION:dependencies --\u003e\n- **loguru**: Python logging made (stupidly) simple\n- **mcp**: Model Context Protocol SDK\n- **pydantic-ai**: Agent Framework / shim to use Pydantic with LLMs\n- **pydantic-settings**: Settings management using Pydantic\n- **pymgclient**: Memgraph database adapter for Python language\n- **python-dotenv**: Read key-value pairs from a .env file and set them as environment variables\n- **toml**: Python Library for Tom's Obvious, Minimal Language\n- **tree-sitter-python**: Python grammar for tree-sitter\n- **tree-sitter**: Python bindings to the Tree-sitter parsing library\n- **watchdog**: Filesystem events monitoring\n- **typer**: Typer, build great CLIs. Easy to code. 
Based on Python type hints.\n- **rich**: Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal\n- **prompt-toolkit**: Library for building powerful interactive command lines in Python\n- **diff-match-patch**: Repackaging of Google's Diff Match and Patch libraries.\n- **click**: Composable command line interface toolkit\n- **protobuf**\n- **defusedxml**: XML bomb protection for Python stdlib modules\n\u003c!-- /SECTION:dependencies --\u003e\n\n## 🤖 Agentic Workflow \u0026 Tools\n\nThe agent is designed with a deliberate workflow to ensure it acts with context and precision, especially when modifying the file system.\n\n### Core Tools\n\nThe agent has access to a suite of tools to understand and interact with the codebase:\n\n\u003c!-- SECTION:agentic_tools --\u003e\n| Tool | Description |\n|----|-----------|\n| `query_graph` | Query the codebase knowledge graph using natural language questions. Ask in plain English about classes, functions, methods, dependencies, or code structure. Examples: 'Find all functions that call each other', 'What classes are in the user module', 'Show me functions with the longest call chains'. |\n| `read_file` | Reads the content of text-based files. For documents like PDFs or images, use the 'analyze_document' tool instead. |\n| `create_file` | Creates a new file with content. IMPORTANT: Check file existence first! Overwrites completely WITHOUT showing diff. Use only for new files, not existing file modifications. |\n| `replace_code` | Surgically replaces specific code blocks in files. Requires exact target code and replacement. Only modifies the specified block, leaving rest of file unchanged. True surgical patching. |\n| `list_directory` | Lists the contents of a directory to explore the codebase. |\n| `analyze_document` | Analyzes documents (PDFs, images) to answer questions about their content. |\n| `execute_shell` | Executes shell commands from allowlist. 
Read-only commands run without approval; write operations require user confirmation. |\n| `semantic_search` | Performs a semantic search for functions based on a natural language query describing their purpose, returning a list of potential matches with similarity scores. |\n| `get_function_source` | Retrieves the source code for a specific function or method using its internal node ID, typically obtained from a semantic search result. |\n| `get_code_snippet` | Retrieves the source code for a specific function, class, or method using its full qualified name. |\n\u003c!-- /SECTION:agentic_tools --\u003e\n\n### Intelligent and Safe File Editing\n\nThe agent uses AST-based function targeting with Tree-sitter for precise code modifications. Features include:\n- **Visual diff preview** before changes\n- **Surgical patching** that only modifies target code blocks\n- **Multi-language support** across all supported languages\n- **Security sandbox** preventing edits outside project directory\n- **Smart function matching** with qualified names and line numbers\n\n\n\n## 🌍 Multi-Language Support\n\n### Adding New Languages\n\nGraph-Code makes it easy to add support for any language that has a Tree-sitter grammar. The system automatically handles grammar compilation and integration.\n\n\u003e **⚠️ Recommendation**: While you can add languages yourself, we recommend waiting for official full support to ensure optimal parsing quality, comprehensive feature coverage, and robust integration. 
The languages marked as \"In Development\" above will receive dedicated optimization and testing.\n\n\u003e **💡 Request Support**: If you want a specific language to be officially supported, please [submit an issue](https://github.com/vitali87/code-graph-rag/issues) with your language request.\n\n#### Quick Start: Add a Language\n\nUse the built-in language management tool to add any Tree-sitter supported language:\n\n```bash\n# Add a language using the standard tree-sitter repository\ncgr language add-grammar \u003clanguage-name\u003e\n\n# Examples:\ncgr language add-grammar c-sharp\ncgr language add-grammar php\ncgr language add-grammar ruby\ncgr language add-grammar kotlin\n```\n\n#### Custom Grammar Repositories\n\nFor languages hosted outside the standard tree-sitter organization:\n\n```bash\n# Add a language from a custom repository\ncgr language add-grammar --grammar-url https://github.com/custom/tree-sitter-mylang\n```\n\n#### What Happens Automatically\n\nWhen you add a language, the tool automatically:\n\n1. **Downloads the Grammar**: Clones the tree-sitter grammar repository as a git submodule\n2. **Detects Configuration**: Auto-extracts language metadata from `tree-sitter.json`\n3. **Analyzes Node Types**: Automatically identifies AST node types for:\n   - Functions/methods (`method_declaration`, `function_definition`, etc.)\n   - Classes/structs (`class_declaration`, `struct_declaration`, etc.)\n   - Modules/files (`compilation_unit`, `source_file`, etc.)\n   - Function calls (`call_expression`, `method_invocation`, etc.)\n4. **Compiles Bindings**: Builds Python bindings from the grammar source\n5. **Updates Configuration**: Adds the language to `codebase_rag/language_config.py`\n6. 
**Enables Parsing**: Makes the language immediately available for codebase analysis\n\n#### Example: Adding C# Support\n\n```bash\n$ cgr language add-grammar c-sharp\n🔍 Using default tree-sitter URL: https://github.com/tree-sitter/tree-sitter-c-sharp\n🔄 Adding submodule from https://github.com/tree-sitter/tree-sitter-c-sharp...\n✅ Successfully added submodule at grammars/tree-sitter-c-sharp\nAuto-detected language: c-sharp\nAuto-detected file extensions: ['cs']\nAuto-detected node types:\nFunctions: ['destructor_declaration', 'method_declaration', 'constructor_declaration']\nClasses: ['struct_declaration', 'enum_declaration', 'interface_declaration', 'class_declaration']\nModules: ['compilation_unit', 'file_scoped_namespace_declaration', 'namespace_declaration']\nCalls: ['invocation_expression']\n\n✅ Language 'c-sharp' has been added to the configuration!\n📝 Updated codebase_rag/language_config.py\n```\n\n#### Managing Languages\n\n```bash\n# List all configured languages\ncgr language list-languages\n\n# Remove a language (this also removes the git submodule unless --keep-submodule is specified)\ncgr language remove-language \u003clanguage-name\u003e\n```\n\n#### Language Configuration\n\nThe system uses a configuration-driven approach for language support. 
Each language is defined in `codebase_rag/language_config.py` with the following structure:\n\n```python\n\"language-name\": LanguageConfig(\n    name=\"language-name\",\n    file_extensions=[\".ext1\", \".ext2\"],\n    function_node_types=[\"function_declaration\", \"method_declaration\"],\n    class_node_types=[\"class_declaration\", \"struct_declaration\"],\n    module_node_types=[\"compilation_unit\", \"source_file\"],\n    call_node_types=[\"call_expression\", \"method_invocation\"],\n),\n```\n\n#### Troubleshooting\n\n**Grammar not found**: If the automatic URL doesn't work, use a custom URL:\n```bash\ncgr language add-grammar --grammar-url https://github.com/custom/tree-sitter-mylang\n```\n\n**Version incompatibility**: If you get \"Incompatible Language version\" errors, update your tree-sitter package:\n```bash\nuv add tree-sitter@latest\n```\n\n**Missing node types**: The tool automatically detects common node patterns, but you can manually adjust the configuration in `language_config.py` if needed.\n\n## 📦 Building a binary\n\nYou can build a binary of the application using the `build_binary.py` script. This script uses PyInstaller to package the application and its dependencies into a single executable.\n\n```bash\npython build_binary.py\n```\nThe resulting binary will be located in the `dist` directory.\n\n## 🐛 Debugging\n\n1. **Check Memgraph connection**:\n   - Ensure Docker containers are running: `docker-compose ps`\n   - Verify Memgraph is accessible on port 7687\n\n2. **View database in Memgraph Lab**:\n   - Open http://localhost:3000\n   - Connect to memgraph:7687\n\n3. 
**For local models**:\n   - Verify Ollama is running: `ollama list`\n   - If the model you need is missing from that list, download it: `ollama pull llama3`\n   - Test Ollama API: `curl http://localhost:11434/v1/models`\n   - Check the Ollama server logs (e.g. `journalctl -u ollama` on Linux with systemd, or `~/.ollama/logs/server.log` on macOS)\n\n## 🤝 Contributing\n\nPlease see [CONTRIBUTING.md](CONTRIBUTING.md) for detailed contribution guidelines.\n\nGood first PRs can be picked from the TODO issues.\n\n## 🙋‍♂️ Support\n\nFor issues or questions:\n1. Check the logs for error details\n2. Verify the Memgraph connection\n3. Ensure all environment variables are set\n4. Verify that the graph schema matches your expectations\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=vitali87/code-graph-rag\u0026type=Date)](https://www.star-history.com/#vitali87/code-graph-rag\u0026Date)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvitali87%2Fcode-graph-rag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvitali87%2Fcode-graph-rag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvitali87%2Fcode-graph-rag/lists"}