{"id":45033818,"url":"https://github.com/arthurmgraf/graphmind","last_synced_at":"2026-02-19T06:05:21.240Z","repository":{"id":336975645,"uuid":"1151585754","full_name":"arthurmgraf/graphmind","owner":"arthurmgraf","description":"Autonomous Knowledge Agent Platform - Agentic RAG with Knowledge Graphs, hybrid retrieval, LangGraph agents, and MCP server","archived":false,"fork":false,"pushed_at":"2026-02-07T12:53:49.000Z","size":128,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-07T13:59:48.300Z","etag":null,"topics":["agentic-ai","fastapi","knowledge-graph","langchain","langgraph","llm","neo4j","python","rag","vector-search"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arthurmgraf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-06T16:45:23.000Z","updated_at":"2026-02-07T12:51:41.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/arthurmgraf/graphmind","commit_stats":null,"previous_names":["arthurmgraf/graphmind"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/arthurmgraf/graphmind","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arthurmgraf%2Fgraphmind","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arthurmgraf%2Fgraphmind/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arthurmgraf%2Fgraphmind/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arthurmgraf%2Fgraphmind/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arthurmgraf","download_url":"https://codeload.github.com/arthurmgraf/graphmind/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arthurmgraf%2Fgraphmind/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29604552,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-19T05:11:50.834Z","status":"ssl_error","status_checked_at":"2026-02-19T05:11:38.921Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-ai","fastapi","knowledge-graph","langchain","langgraph","llm","neo4j","python","rag","vector-search"],"created_at":"2026-02-19T06:05:20.465Z","updated_at":"2026-02-19T06:05:21.234Z","avatar_url":"https://github.com/arthurmgraf.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GraphMind\n\n**Autonomous Knowledge Agent Platform** -- Agentic RAG powered by Knowledge Graphs, dual-engine orchestration, and self-evaluating retrieval pipelines.\n\n[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/)\n[![Tests](https://img.shields.io/badge/tests-85%20passing-brightgreen.svg)](#testing)\n[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)\n\n---\n\n## Architecture\n\nGraphMind runs two orchestration engines over a shared hybrid retrieval layer.\nQueries enter through the API, select an engine, and pass through self-evaluation\nbefore returning an answer.\n\n```\n                          +------------------+\n                          |   FastAPI / MCP   |\n                          |   Streamlit UI    |\n                          +--------+---------+\n                                   |\n                          engine = ?\n                     +-------------+-------------+\n                     |                           |\n          +----------v----------+     +----------v----------+\n          |      LangGraph      |     |       CrewAI        |\n          |   (state machine)   |     |  (role-based crew)  |\n          |                     |     |                     |\n          |  Planner            |     |  Research Agent     |\n          |    |                |     |  Analysis Agent     |\n          |  Retriever Agent    |     |  Synthesis Agent    |\n          |    |                |     |  QA Agent           |\n          |  Synthesizer        |     |                     |\n          |    |                |     |  Sequential process |\n          |  Evaluator          |     |  with shared tools  |\n          |    |                |     |                     |\n          |  score \u003c 0.7 ?      |     +----------+----------+\n          |   yes -\u003e retry (x2) |                |\n          |   no  -\u003e done       |                |\n          +----------+----------+     +----------+\n                     |                           |\n                     +-------------+-------------+\n                                   |\n                    +--------------v--------------+\n                    |    Hybrid Retrieval Layer    |\n                    |                             |\n                    |  +--------+   +---------+   |\n                    |  | Qdrant |   |  Neo4j  |   |\n                    |  | Vector |   |  Graph  |   |\n                    |  +---+----+   +----+----+   |\n                    |      |             |        |\n                    |      +------+------+        |\n                    |             |                |\n                    |        RRF Fusion            |\n                    +--------------+---------------+\n                                   |\n                    +--------------v--------------+\n                    |       LLM Router            |\n                    |  Groq -\u003e Gemini -\u003e Ollama   |\n                    |    (cascading fallback)      |\n                    +-----------------------------+\n```\n\n---\n\n## Key Features\n\n- **Dual Orchestration Engines** -- LangGraph state machine for deterministic pipelines; CrewAI role-based crew for collaborative multi-agent reasoning. Choose per query.\n- **Hybrid Retrieval with RRF** -- Combines Qdrant vector similarity search with Neo4j graph traversal, fused via Reciprocal Rank Fusion for higher recall and precision.\n- **Self-Evaluation Loop** -- The LangGraph evaluator scores every answer. Scores below 0.7 trigger an automatic rewrite and re-query cycle (max 2 retries).\n- **Multi-Provider LLM Routing** -- Cascading fallback across Groq, Google Gemini, and Ollama. If the primary provider is down or rate-limited, the next one picks up seamlessly.\n- **Knowledge Graph Construction** -- Automated entity and relation extraction from ingested documents, building a Neo4j graph that enriches retrieval context.\n- **7-Format Document Ingestion** -- Markdown, PDF, TXT, HTML, DOCX, CSV, and JSON loaders with configurable chunking strategies.\n- **NeMo Guardrails** -- Input and output safety filtering via Colang flows to enforce content policies.\n- **Full Observability** -- Langfuse tracing, per-request cost tracking, and metrics collection across every pipeline stage.\n- **Evaluation Suite** -- DeepEval and RAGAS benchmarks measuring faithfulness, relevancy, and groundedness.\n- **MCP Server** -- Model Context Protocol integration for IDE tools (Claude Code, Cursor, VS Code).\n- **Streamlit Dashboard** -- Web UI for querying, document ingestion, knowledge graph statistics, and system health monitoring.\n- **85 Unit Tests** passing across 10 test files.\n\n---\n\n## Technology Stack\n\n| Component | Technology | Purpose |\n|---|---|---|\n| Orchestration | **LangGraph** + **CrewAI** | Dual-engine: state machine + role-based multi-agent crew |\n| LLM Routing | **Groq** / **Gemini** / **Ollama** | Multi-provider with cascading fallback |\n| Vector Store | **Qdrant** | Semantic similarity search |\n| Graph Database | **Neo4j** | Entity-relationship traversal |\n| Embeddings | **Ollama** (nomic-embed-text) | 768-dim local embeddings |\n| Safety | **NeMo Guardrails** | Input/output filtering via Colang flows |\n| Observability | **Langfuse** | Tracing, cost tracking, evaluation |\n| Evaluation | **DeepEval** + **RAGAS** | Faithfulness, relevancy, groundedness metrics |\n| API | **FastAPI** | REST endpoints for query, ingest, health |\n| MCP Server | **Model Context Protocol** | IDE integration (Claude Code, Cursor, VS Code) |\n| Dashboard | **Streamlit** | Web UI for queries, ingestion, and monitoring |\n| Configuration | **Pydantic Settings** | Type-safe config with YAML overlay |\n| Data Models | **Pydantic v2** | 13 shared models across the platform |\n| Infrastructure | **Docker Compose** | Qdrant, Neo4j, PostgreSQL, Langfuse, Ollama |\n\n---\n\n## Quick Start\n\n### Prerequisites\n\n- Python 3.11+\n- Docker and Docker Compose\n- Groq API key (free at [console.groq.com](https://console.groq.com))\n\n### 1. Clone and install\n\n```bash\ngit clone https://github.com/arthurmgraf/graphmind.git\ncd graphmind\npip install -e \".[dev,eval]\"\n```\n\n### 2. Start infrastructure\n\n```bash\ndocker compose up -d\n```\n\nThis launches Qdrant, Neo4j, PostgreSQL, Langfuse, and Ollama.\n\n### 3. Pull the embedding model\n\n```bash\nmake pull-models\n```\n\n### 4. Configure environment variables\n\n```bash\nexport GROQ_API_KEY=\"your-key-here\"\n# Optional:\nexport GEMINI_API_KEY=\"your-key-here\"\nexport NEO4J_PASSWORD=\"your-password\"\n```\n\n### 5. Run\n\n```bash\n# FastAPI server\nmake run\n# or: graphmind\n\n# Streamlit dashboard\nmake dashboard\n# or: graphmind-dashboard\n\n# MCP server (for IDE integration)\nmake mcp\n# or: graphmind-mcp\n```\n\n### 6. Ingest documents\n\n```bash\n# Via CLI\ngraphmind-ingest path/to/document.md --type md\n\n# Via API\ncurl -X POST http://localhost:8000/api/v1/ingest \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"content\": \"# My Doc\\n\\nContent here.\", \"filename\": \"doc.md\", \"doc_type\": \"md\"}'\n```\n\n### 7. Query\n\n```bash\n# LangGraph engine (default)\ncurl -X POST http://localhost:8000/api/v1/query \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"question\": \"What is LangGraph?\", \"top_k\": 10, \"engine\": \"langgraph\"}'\n\n# CrewAI engine\ncurl -X POST http://localhost:8000/api/v1/query \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"question\": \"Compare CrewAI and LangGraph\", \"engine\": \"crewai\"}'\n```\n\n---\n\n## Project Structure\n\n```\ngraphmind/\n├── config/                          # YAML configuration files\n├── diagrams/\n│   └── generated/                   # Exported diagrams (architecture, agents, data-flow)\n├── docs/\n│   ├── adrs/                        # Architecture Decision Records (5 ADRs)\n│   ├── getting-started.md\n│   ├── running.md\n│   ├── querying.md\n│   ├── ingestion.md\n│   ├── testing.md\n│   ├── deployment.md\n│   └── BUILD_REPORT.md\n├── eval/                            # Benchmark datasets and reports\n├── src/graphmind/\n│   ├── agents/                      # LangGraph nodes + orchestrator\n│   │   ├── planner.py               #   Query planning and decomposition\n│   │   ├── retriever_agent.py       #   Hybrid retrieval execution\n│   │   ├── synthesizer.py           #   Answer generation\n│   │   ├── evaluator.py             #   Self-evaluation with retry logic\n│   │   ├── orchestrator.py          #   LangGraph state machine wiring\n│   │   └── states.py                #   TypedDict state definitions\n│   ├── crew/                        # CrewAI multi-agent crew\n│   │   ├── agents.py                #   Role definitions (Research, Analysis, Synthesis, QA)\n│   │   ├── tasks.py                 #   Task specifications\n│   │   ├── tools.py                 #   Shared tool wrappers\n│   │   └── crew.py                  #   Crew assembly and kickoff\n│   ├── api/                         # FastAPI application\n│   │   ├── main.py                  #   App factory and middleware\n│   │   └── routes/                  #   query, ingest, health endpoints\n│   ├── dashboard/                   # Streamlit web UI\n│   │   └── app.py                   #   Query, ingest, graph stats, system health\n│   ├── ingestion/                   # Document processing pipeline\n│   │   ├── loaders.py               #   7 format loaders (MD, PDF, TXT, HTML, DOCX, CSV, JSON)\n│   │   ├── chunker.py               #   Configurable text chunking\n│   │   └── pipeline.py              #   End-to-end ingestion orchestration\n│   ├── knowledge/                   # Knowledge graph construction\n│   │   ├── entity_extractor.py      #   LLM-based entity extraction\n│   │   ├── relation_extractor.py    #   LLM-based relation extraction\n│   │   ├── graph_builder.py         #   Neo4j graph population\n│   │   └── graph_schema.cypher      #   Graph schema definition\n│   ├── retrieval/                   # Hybrid retrieval layer\n│   │   ├── embedder.py              #   Ollama embedding client\n│   │   ├── vector_retriever.py      #   Qdrant vector search\n│   │   ├── graph_retriever.py       #   Neo4j graph traversal\n│   │   └── hybrid_retriever.py      #   RRF fusion of vector + graph results\n│   ├── safety/                      # NeMo Guardrails\n│   │   ├── guardrails.py            #   Guardrails integration\n│   │   ├── config.py                #   Safety configuration\n│   │   ├── config.yml               #   NeMo config file\n│   │   └── rails.co                 #   Colang flow definitions\n│   ├── observability/               # Monitoring and tracing\n│   │   ├── langfuse_client.py       #   Langfuse integration\n│   │   ├── cost_tracker.py          #   Per-request cost tracking\n│   │   └── metrics.py               #   Metrics collection\n│   ├── evaluation/                  # Evaluation framework\n│   │   ├── deepeval_suite.py        #   DeepEval test suite\n│   │   ├── ragas_eval.py            #   RAGAS evaluation metrics\n│   │   ├── eval_models.py           #   Evaluation data models\n│   │   └── benchmark.py             #   Benchmark runner\n│   ├── mcp/                         # Model Context Protocol server\n│   │   └── server.py                #   MCP tool definitions\n│   ├── config.py                    # Pydantic Settings with YAML overlay\n│   ├── llm_router.py               # Multi-provider LLM routing with fallback\n│   └── schemas.py                   # 13 shared Pydantic models\n├── tests/\n│   ├── unit/                        # 85 unit tests across 10 files\n│   │   ├── test_agents.py\n│   │   ├── test_chunker.py\n│   │   ├── test_config.py\n│   │   ├── test_cost_tracker.py\n│   │   ├── test_crew.py\n│   │   ├── test_deepeval_suite.py\n│   │   ├── test_hybrid_retriever.py\n│   │   ├── test_loaders.py\n│   │   ├── test_metrics.py\n│   │   └── test_schemas.py\n│   ├── integration/                 # Integration tests\n│   └── conftest.py                  # Shared fixtures\n├── docker-compose.yml               # Qdrant, Neo4j, PostgreSQL, Langfuse, Ollama\n├── Makefile                         # Common commands\n└── pyproject.toml                   # Project metadata and dependencies\n```\n\n---\n\n## Development\n\n### Testing\n\n```bash\n# Run all unit tests (85 tests across 10 files)\nmake test\n\n# Run with coverage report\nmake test-all\n\n# Run a specific test file\npytest tests/unit/test_agents.py -v\n```\n\n### Linting and Formatting\n\n```bash\nmake lint\nmake format\n```\n\n### Evaluation Benchmark\n\n```bash\n# Run DeepEval + RAGAS evaluation suite\nmake eval\n```\n\n---\n\n## Orchestration Engines\n\nGraphMind provides two orchestration engines. Choose per query via the `engine` parameter.\n\n### LangGraph -- State Machine Pipeline\n\nA deterministic, graph-based pipeline where each node performs a single step. The evaluator node implements a self-correction loop: if the answer scores below **0.7**, it rewrites the query and retries (up to **2 times**).\n\n| Node | Responsibility |\n|---|---|\n| **Planner** | Decomposes the query into sub-questions and a retrieval strategy |\n| **Retriever Agent** | Executes hybrid retrieval (vector + graph + RRF) |\n| **Synthesizer** | Generates a grounded answer from retrieved context |\n| **Evaluator** | Scores the answer; triggers retry loop if quality is insufficient |\n\n### CrewAI -- Role-Based Multi-Agent Crew\n\nA collaborative crew of specialized agents that execute tasks sequentially, delegating and sharing context through CrewAI's built-in mechanisms.\n\n| Agent | Role |\n|---|---|\n| **Research Agent** | Retrieves and ranks relevant information |\n| **Analysis Agent** | Identifies patterns, contradictions, and gaps |\n| **Synthesis Agent** | Composes a coherent, well-structured answer |\n| **QA Agent** | Validates accuracy and completeness |\n\n### When to Use Which\n\n| Criteria | LangGraph | CrewAI |\n|---|---|---|\n| Deterministic flow | Yes | No |\n| Self-evaluation retry | Built-in | Via QA agent |\n| Multi-perspective analysis | Single pipeline | Multiple agents collaborate |\n| Best for | Factual Q\u0026A, precise retrieval | Complex analysis, comparison tasks |\n\n---\n\n## MCP Integration\n\nGraphMind exposes an [MCP (Model Context Protocol)](https://modelcontextprotocol.io/) server for integration with AI-powered IDEs and tools.\n\n### Configuration\n\nAdd the following to your MCP client settings (Claude Code, Cursor, VS Code, etc.):\n\n```json\n{\n  \"mcpServers\": {\n    \"graphmind\": {\n      \"command\": \"graphmind-mcp\",\n      \"args\": []\n    }\n  }\n}\n```\n\n### Available Tools\n\n| Tool | Description |\n|---|---|\n| `query` | Ask a question against the knowledge base |\n| `ingest` | Ingest a document into the system |\n| `graph_stats` | Retrieve knowledge graph statistics (entities, relations, counts) |\n| `health` | Check system health status of all components |\n\n---\n\n## Documentation\n\n| Document | Description |\n|---|---|\n| [Getting Started](docs/getting-started.md) | Installation and initial setup guide |\n| [Running](docs/running.md) | How to run the API, dashboard, and MCP server |\n| [Querying](docs/querying.md) | Query API reference and engine selection |\n| [Ingestion](docs/ingestion.md) | Document ingestion formats and pipeline details |\n| [Testing](docs/testing.md) | Test suite structure, running tests, writing new tests |\n| [Deployment](docs/deployment.md) | Production deployment guide |\n| [Build Report](docs/BUILD_REPORT.md) | Full project build report |\n\n---\n\n## Architecture Decision Records\n\n| ADR | Decision |\n|---|---|\n| [ADR-001](docs/adrs/001-multi-provider-llm-routing.md) | Multi-Provider LLM Routing with cascading fallback |\n| [ADR-002](docs/adrs/002-hybrid-retrieval-with-rrf.md) | Hybrid Retrieval with Reciprocal Rank Fusion |\n| [ADR-003](docs/adrs/003-langgraph-agentic-rag.md) | LangGraph Agentic RAG pipeline design |\n| [ADR-004](docs/adrs/004-mcp-server-integration.md) | MCP Server Integration for IDE tooling |\n| [ADR-005](docs/adrs/005-crewai-dual-engine.md) | Dual Engine architecture (LangGraph + CrewAI) |\n\n---\n\n## Diagrams\n\nArchitecture and data-flow diagrams are maintained as Excalidraw source files and exported\nto the `diagrams/generated/` directory, organized into three categories:\n\n```\ndiagrams/generated/\n├── agents/          # Agent interaction and delegation flows\n├── architecture/    # High-level system architecture\n└── data-flow/       # Data ingestion and retrieval pipelines\n```\n\n---\n\n## Configuration\n\nAll settings are managed via `config/settings.yaml` with environment variable overrides.\nConfiguration is loaded through Pydantic Settings, providing type safety and validation.\n\n| Variable | Required | Default | Description |\n|---|---|---|---|\n| `GROQ_API_KEY` | Yes | -- | Groq API key (primary LLM provider) |\n| `GEMINI_API_KEY` | No | -- | Google Gemini API key (fallback LLM) |\n| `NEO4J_PASSWORD` | Yes | -- | Neo4j database password |\n| `LANGFUSE_PUBLIC_KEY` | No | -- | Langfuse public key for observability |\n| `LANGFUSE_SECRET_KEY` | No | -- | Langfuse secret key for observability |\n| `QDRANT_URL` | No | `http://localhost:6333` | Qdrant vector store URL |\n| `NEO4J_URI` | No | `bolt://localhost:7687` | Neo4j connection URI |\n| `OLLAMA_BASE_URL` | No | `http://localhost:11434` | Ollama API base URL |\n\n---\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE).\n\n---\n\n## Author\n\n**Arthur Maia Graf** -- [arthurmgraf@hotmail.com](mailto:arthurmgraf@hotmail.com)\n\nGitHub: [github.com/arthurmgraf/graphmind](https://github.com/arthurmgraf/graphmind)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farthurmgraf%2Fgraphmind","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farthurmgraf%2Fgraphmind","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farthurmgraf%2Fgraphmind/lists"}