{"id":39274128,"url":"https://github.com/stevereiner/flexible-graphrag","last_synced_at":"2026-05-16T05:19:58.495Z","repository":{"id":308657025,"uuid":"1032851610","full_name":"stevereiner/flexible-graphrag","owner":"stevereiner","description":"Flexible GraphRAG: Python, LlamaIndex, Docker Compose: 8 Graph dbs, 10 Vector dbs, OpenSearch, Elasticsearch, Alfresco. 13 data sources (9 auto-sync), KG auto-building, schemas, LLMs, Docling or LlamaParse doc processing, GraphRAG, RAG only, Hybrid search, AI chat. React, Vue, Angular frontends, FastAPI backend, REST API, MCP Server. Please 🌟 Star","archived":false,"fork":false,"pushed_at":"2026-03-10T04:19:19.000Z","size":27885,"stargazers_count":109,"open_issues_count":4,"forks_count":26,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-03-10T12:26:20.419Z","etag":null,"topics":["ai","ai-chat","alfresco","arcadedb","doc-processing","docling","falkordb","generative-ai","graphrag","hybrid-search","knowledge-graph","llamaindex","llamaparse","llm","mcp","mcp-server","neo4j","python","rag","search"],"latest_commit_sha":null,"homepage":"https://integratedsemantics.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stevereiner.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-05T23:49:36.000Z","updated_at":"2026-03-10T04:19:22.000Z","dependencies_parsed_at":"2025-08-24T07:19:05.998Z","dependency_job_id":"8fe17322-905f-44d7-9e25-3828ec
79b655","html_url":"https://github.com/stevereiner/flexible-graphrag","commit_stats":null,"previous_names":["stevereiner/flexible-graphrag"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/stevereiner/flexible-graphrag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevereiner%2Fflexible-graphrag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevereiner%2Fflexible-graphrag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevereiner%2Fflexible-graphrag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevereiner%2Fflexible-graphrag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stevereiner","download_url":"https://codeload.github.com/stevereiner/flexible-graphrag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevereiner%2Fflexible-graphrag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30362803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T21:41:54.280Z","status":"ssl_error","status_checked_at":"2026-03-10T21:40:59.357Z","response_time":106,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-chat","alfresco","arcadedb","doc-processing","docling","falkordb","generative-ai","graphrag","hybrid-search","knowledge-graph","llamaindex","llamaparse","llm","mcp","mcp-server","neo4j","python","rag","search"],"created_at":"2026-01-18T00:51:57.441Z","updated_at":"2026-05-16T05:19:58.483Z","avatar_url":"https://github.com/stevereiner.png","language":"Python","funding_links":[],"categories":["Browse The Shelves"],"sub_categories":["Agent observability"],"readme":"\n**New 5/6/26:** 15 property graph databases total: 8 supported on both LlamaIndex and LangChain, 1 LI-only (Google Cloud Spanner Graph), 6 LC-only (ArangoDB, Apache AGE, Azure Cosmos DB for Gremlin, Apache HugeGraph, SurrealDB, TigerGraph). AWS Neptune RDF/SPARQL added. All 10 vector databases, all 3 search engines, and all LLM/embedding providers work with both LlamaIndex and LangChain. Every pipeline stage (chunking, KG extraction, graph write, vector write, search write, and retrieval fusion) can be configured independently. (Data source reading is LlamaIndex only; RDF stores use framework-independent adapters with LangChain Text-to-SPARQL retrieval.)\n\n**New:** Flexible GraphRAG now supports RDF-based ontologies for both property graph databases and RDF triple store databases (Graphwise Ontotext GraphDB, Fuseki, and Oxigraph). 
Document ingestion with KG extraction, auto incremental data source change detection, and UI search (hybrid search, AI query, and AI chat) are all supported with both database types.\n\n**New:** Flexible GraphRAG supports **automatic incremental updates** (Optional) from most data sources, keeping your Vector, Search and Graph databases synchronized in real-time or near real-time.\n\n**New:** [KG Spaces Integration of Flexible GraphRAG in Alfresco ACA Client](https://github.com/stevereiner/kg-spaces-aca)\n\n# Flexible GraphRAG\n\n[![PyPI - flexible-graphrag](https://img.shields.io/pypi/v/flexible-graphrag?label=flexible-graphrag\u0026color=blue)](https://pypi.org/project/flexible-graphrag/)\n[![Downloads - flexible-graphrag](https://img.shields.io/pepy/dt/flexible-graphrag)](https://pepy.tech/project/flexible-graphrag)\n[![PyPI - flexible-graphrag-mcp](https://img.shields.io/pypi/v/flexible-graphrag-mcp?label=flexible-graphrag-mcp\u0026color=blue)](https://pypi.org/project/flexible-graphrag-mcp/)\n[![Downloads - flexible-graphrag-mcp](https://img.shields.io/pepy/dt/flexible-graphrag-mcp)](https://pepy.tech/project/flexible-graphrag-mcp)\n[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Python](https://img.shields.io/badge/python-3.12%20%7C%203.13%20%7C%203.14-blue)](https://www.python.org/)\n[![React](https://img.shields.io/badge/React-18-61DAFB?logo=react\u0026logoColor=white)](https://react.dev/)\n[![Angular](https://img.shields.io/badge/Angular-19-DD0031?logo=angular\u0026logoColor=white)](https://angular.dev/)\n[![Vue](https://img.shields.io/badge/Vue-3-4FC08D?logo=vue.js\u0026logoColor=white)](https://vuejs.org/)\n\n**Flexible GraphRAG** is an open source AI context platform supporting a document processing pipeline (Docling or LlamaParse), knowledge graph auto-building, ontologies, schemas, many LLM providers, GraphRAG and RAG, hybrid semantic search (fulltext, vector, 
property graph, RDF/SPARQL), AI query, and AI chat. The backend is **Python** with **LlamaIndex** and **LangChain** as peer frameworks. **LlamaIndex** is the default for each pipeline stage; **LangChain** can be selected per stage in environment configuration. The API is a REST **FastAPI** service. **Angular**, **React**, and **Vue** TypeScript frontends and an **MCP** server are included. The stack supports 13 data sources (9 with incremental auto-sync), 15 property graph databases, 4 RDF triple stores (Apache Jena Fuseki, Ontotext GraphDB, Oxigraph, Amazon Neptune RDF), 10 vector databases, OpenSearch / Elasticsearch / BM25 search, and Alfresco. Services and dashboards can be enabled with the provided Docker Compose layout.\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"./screen-shots/auto-sync/auto-sync.png\"\u003e\n    \u003cimg src=\"./screen-shots/auto-sync/auto-sync.png\" alt=\"Flexible GraphRAG data sources, processing tab, auto-sync document states in Postgres, Neo4j\" width=\"700\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003cem\u003eFlexible GraphRAG data sources, processing tab, auto-sync document states in Postgres, Neo4j\u003c/em\u003e\u003c/p\u003e\n\n## v0.6.0 in brief\n\nVersion **0.6.0** broadens framework and database choice: **LangChain** is a full peer to **LlamaIndex** (per-stage env pickers for chunking, vector, search, property graph, KG extraction, fusion). **15** property graph backends: **8** on both frameworks, **Google Cloud Spanner** (LlamaIndex-only), **6** LangChain-only (ArangoDB, Apache AGE, Azure Cosmos DB for Gremlin, HugeGraph, SurrealDB, TigerGraph). **RDF** includes **Apache Jena Fuseki**, **Ontotext GraphDB**, **Oxigraph**, and **Amazon Neptune RDF**. 
Incremental delete, LangChain adapters, and cleanup paths were extended across stores (see [CHANGELOG.md](CHANGELOG.md)).\n\n## Features\n\n- **Hybrid Search**: Configurable hybrid search combining vector search, full-text search, property-graph GraphRAG, and SPARQL against RDF stores.\n- **Knowledge Graph GraphRAG**: Extracts entities and relationships from documents to build graphs in property graph databases and RDF stores. Optional schemas and ontologies guide extraction or act as a starting point for the LLM to extend.\n- **RDF/Ontology Support**: Load OWL/RDFS ontologies to guide KG extraction into any property graph or RDF store; SPARQL 1.1 queries; RDF 1.2 triple annotations; full UI pipeline (ingest, hybrid search, AI query/chat, incremental auto-sync). See [Ontology and RDF Support](#ontology-and-rdf-support) below.\n- **15 Property Graph Databases**: 8 on both LI+LC (Neo4j, ArcadeDB, FalkorDB, Ladybug, Memgraph, NebulaGraph, Amazon Neptune, Neptune Analytics), 1 LI-only (Google Cloud Spanner), 6 LC-only (ArangoDB, Apache AGE, Cosmos Gremlin, HugeGraph, SurrealDB, TigerGraph) — with KG extraction, hybrid search, and AI query/chat\n- **4 RDF Triple Stores**: Apache Jena Fuseki, Ontotext GraphDB, Oxigraph, Amazon Neptune RDF.\n- **10 Vector Databases**: Qdrant, Elasticsearch, OpenSearch, Neo4j, Chroma, Milvus, Weaviate, Pinecone, PostgreSQL pgvector, LanceDB — for semantic similarity search\n- **3 Search Databases**: Elasticsearch, OpenSearch, BM25 (built-in) — for full-text search and hybrid ranking\n- **LLM providers (KG extraction \u0026 chat)**: Ollama, OpenAI, Azure OpenAI, Google Gemini, Anthropic Claude, Google Vertex AI, Amazon Bedrock, Groq, Fireworks AI, OpenAI-compatible endpoints (`openai_like`), OpenRouter, LiteLLM proxy, and vLLM — configurable via `LLM_PROVIDER`; see [Supported LLM Providers](#supported-llm-providers)\n- **Embedding providers**: OpenAI, Ollama, Azure OpenAI, Google GenAI, Vertex AI, Bedrock, Fireworks, OpenAI-like 
(`EMBEDDING_KIND=openai_like`), and LiteLLM — see [LLM Configuration](#llm-configuration)\n- **Dual-framework pipeline**: **LlamaIndex** and **LangChain** are first-class choices for chunking, vector and search adapters, property graphs, KG extraction, RDF text-to-SPARQL retrieval, and hybrid fusion—each stage can be set independently (**LlamaIndex** defaults). See [Framework Configuration](#framework-configuration).\n- **Multi-Source Ingestion**: Processes documents from 13 data sources (9 with incremental auto sync): (file upload, cloud storage, enterprise repositories, web sources) with Docling (default) or LlamaParse (cloud API) document parsing.\n- **Observability**: Built-in OpenTelemetry instrumentation with automatic LlamaIndex tracing, Prometheus metrics, Jaeger traces, and Grafana dashboards for production monitoring\n- **FastAPI Server with REST API**: Python based FastAPI server with REST APIs for document ingesting, hybrid search, AI query, and AI chat.\n- **MCP Server**: MCP server providing Claude Desktop and other MCP clients with tools for document/text ingesting (all 13 data sources with 9 supporting incremental auto sync), hybrid search, and AI query. Uses FastAPI backend REST APIs. \n- **UI Clients**: Angular, React, and Vue UI clients support choosing the data source (filesystem, Alfresco, CMIS, etc.), ingesting documents, performing hybrid searches, AI queries, and AI chat. The UI clients use the REST APIs of the FastAPI backend.\n- **Docker Deployment Flexibility**: Supports both standalone and Docker deployment modes. Docker infrastructure provides modular database selection via docker-compose includes - vector, graph, search engines, and Alfresco can be included or excluded with a single comment. 
Choose between hybrid deployment (databases in Docker, backend and UIs standalone) or full containerization.\n\n## Frontend Screenshots\n\n### Angular Frontend - Tabbed Interface\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to view Angular UI screenshots (Light Theme)\u003c/summary\u003e\n\n| Sources Tab | Processing Tab | Search Tab | Chat Tab |\n|-------------|----------------|------------|----------|\n| [![Angular Sources](./screen-shots/angular/angular-sources.png)](./screen-shots/angular/angular-sources.png) | [![Angular Processing](./screen-shots/angular/angular-processing.png)](./screen-shots/angular/angular-processing.png) | [![Angular Search](./screen-shots/angular/angular-search.png)](./screen-shots/angular/angular-search.png) | [![Angular Chat](./screen-shots/angular/angular-chat.png)](./screen-shots/angular/angular-chat.png) |\n\n\u003c/details\u003e\n\n### React Frontend - Tabbed Interface\n\n\u003cdetails open\u003e\n\u003csummary\u003eClick to view React UI screenshots (Dark Theme)\u003c/summary\u003e\n\n| Sources Tab | Processing Tab | Search Tab | Chat Tab |\n|-------------|----------------|------------|----------|\n| [![React Sources](./screen-shots/react/react-sources.png)](./screen-shots/react/react-sources.png) | [![React Processing](./screen-shots/react/react-processing.png)](./screen-shots/react/react-processing.png) | [![React Search](./screen-shots/react/react-search-hybrid-search.png)](./screen-shots/react/react-search-hybrid-search.png) | [![React Chat](./screen-shots/react/react-chat-using.png)](./screen-shots/react/react-chat-using.png) |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to view React UI screenshots (Light Theme)\u003c/summary\u003e\n\n| Sources Tab | Processing Tab | Search Tab | Chat Tab |\n|-------------|----------------|------------|----------|\n| [![React Sources Light](./screen-shots/react/react-sources-light.png)](./screen-shots/react/react-sources-light.png) | [![React Processing 
Light](./screen-shots/react/react-processing-light.png)](./screen-shots/react/react-processing-light.png) | [![React Search Light](./screen-shots/react/react-search-hybrid-search-light.png)](./screen-shots/react/react-search-hybrid-search-light.png) | [![React Chat Light](./screen-shots/react/react-chat-using-light.png)](./screen-shots/react/react-chat-using-light.png) |\n\n\u003c/details\u003e\n\n### Vue Frontend - Tabbed Interface\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to view Vue UI screenshots (Light Theme)\u003c/summary\u003e\n\n| Sources Tab | Processing Tab | Search Tab | Chat Tab |\n|-------------|----------------|------------|----------|\n| [![Vue Sources](./screen-shots/vue/vue-sources.png)](./screen-shots/vue/vue-sources.png) | [![Vue Processing](./screen-shots/vue/vue-processing.png)](./screen-shots/vue/vue-processing.png) | [![Vue Search](./screen-shots/vue/vue-search.png)](./screen-shots/vue/vue-search.png) | [![Vue Chat](./screen-shots/vue/vue-chat.png)](./screen-shots/vue/vue-chat.png) |\n\n\u003c/details\u003e\n\n## System Components\n\n### FastAPI Backend (`/flexible-graphrag`)\n- **REST API Server**: Provides endpoints for document ingestion, search, and AI query/chat\n- **Hybrid Search Engine**: Combines vector similarity (RAG), fulltext (BM25), and graph traversal (GraphRAG)\n- **Document Processing**: Advanced document conversion with Docling and LlamaParse integration\n- **Configurable Architecture**: Environment-based configuration for all components\n- **Async Processing**: Background task processing with real-time progress updates\n\n### MCP Server (`/flexible-graphrag-mcp`)  \n- **MCP Client support**: Model Context Protocol server for Claude Desktop and other MCP clients\n- **Full API Parity**: Tools like `ingest_documents()` support all 13 data sources with source-specific configs: filesystem, repositories (Alfresco, SharePoint, Box, CMIS), cloud storage, web; `skip_graph` flag for all data sources; `paths` parameter for 
filesystem/Alfresco/CMIS; Alfresco also supports `nodeDetails` list (multi-select for KG Spaces)\n- **Additional Tools**: `search_documents()`, `query_documents()`, `ingest_text()`, system diagnostics, and health checks\n- **Dual Transport**: HTTP mode for debugging, stdio mode for production\n- **Tool Suite**: 9 specialized tools for document processing, search, and system management\n- **Multiple Installation**: pipx system installation or uvx no-install execution\n\n### UI Clients (`/flexible-graphrag-ui`)\n- **Angular Frontend**: Material Design with TypeScript\n- **React Frontend**: Modern React with Vite and TypeScript  \n- **Vue Frontend**: Vue 3 Composition API with Vuetify and TypeScript\n- **Unified Features**: All clients support the 4 tab views, async processing, progress tracking, and cancellation\n\n### Docker Infrastructure (`/docker`)\n- **Modular Database Selection**: Include/exclude vector, graph, and search engines, and Alfresco with single-line comments\n- **Flexible Deployment**: Hybrid mode (databases in Docker, apps standalone) or full containerization\n- **NGINX Reverse Proxy**: Unified access to all services with proper routing\n- **Built-in Database Dashboards**: Most server dockers also provide built-in web interface dashboards (Neo4j browser, ArcadeDB, FalkorDB, OpenSearch, etc.)\n- **Separate Dashboards**: Additional dashboard dockers are provided: including Kibana for Elasticsearch and optional Ladybug Explorer (see `docker/includes/ladybug-explorer.yaml`).\n\n## Data Sources\n\nFlexible GraphRAG supports **13 different data sources** for ingesting documents into your knowledge base:\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"./screen-shots/react/data-sources-1.jpeg\"\u003e\n    \u003cimg src=\"./screen-shots/react/data-sources-1.jpeg\" alt=\"Data Sources\" width=\"700\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n### File \u0026 Upload Sources\n1. 
**File Upload** - Direct file upload through web interface with drag \u0026 drop support\n\n\n### Cloud Storage Sources\n2. **Amazon S3** - AWS S3 bucket integration\n3. **Google Cloud Storage (GCS)** - Google Cloud storage buckets\n4. **Azure Blob Storage** - Microsoft Azure blob containers\n5. **OneDrive** - Microsoft OneDrive personal/business storage\n6. **Google Drive** - Google Drive file storage\n\n### Enterprise Repository Sources\n7. **Alfresco** - Alfresco ECM/content repository with two integration options:\n   - **[KG Spaces ACA Extension](https://github.com/stevereiner/kg-spaces-aca)** - Integrates the Flexible GraphRAG Angular UI as an extension plugin within the Alfresco Content Application (ACA), enabling multi-select document/folder ingestion with nodeIds directly from the Alfresco interface\n   - **Flexible GraphRAG Alfresco Data Source** - Direct integration using Alfresco paths (e.g., /Shared/GraphRAG, /Company Home/Shared/GraphRAG, or /Shared/GraphRAG/cmispress.txt)\n8. **SharePoint** - Microsoft SharePoint document libraries\n9. **Box** - Box.com cloud storage\n10. **CMIS (Content Management Interoperability Services)** - Industry-standard content repository interface\n\n### Web Sources\n11. **Web Pages** - Extract content from web URLs\n12. **Wikipedia** - Ingest Wikipedia articles by title or URL\n13. 
**YouTube** - Process YouTube video transcripts\n\nEach data source includes:\n- **Configuration Forms**: Easy-to-use interfaces for credentials and settings\n- **Progress Tracking**: Real-time per-file progress indicators\n- **Flexible Authentication**: Support for various auth methods (API keys, OAuth, service accounts)\n\n### Incremental Updates \u0026 Auto-Sync\n\n**NEW!** Flexible GraphRAG supports **automatic incremental updates** (Optional) from most data sources, keeping your Vector, Search and Graph databases synchronized in real-time or near real-time:\n\n| Data Source | Auto-Sync Support | Detection Method | Status | Notes |\n|-------------|-------------------|------------------|--------|-------|\n| **Alfresco** | ✅ Real-time | Community ActiveMQ | Ready | Enterprise Event Gateway planned |\n| **Amazon S3** | ✅ Real-time | SQS event notifications | Ready | |\n| **Azure Blob Storage** | ✅ Real-time | Change feed | Ready | |\n| **Google Cloud Storage** | ✅ Real-time | Pub/Sub notifications | Ready | |\n| **Google Drive** | ✅ Near real-time | Changes API (polling) | Ready | |\n| **OneDrive** | ✅ Near real-time | Polling | Ready | Delta query support planned |\n| **SharePoint** | ✅ Near real-time | Polling | Ready | Delta query support planned |\n| **Box** | ✅ Near real-time | Events API (polling) | Ready | |\n| **Local Filesystem** | ✅ Real-time | OS events (watchdog) | Ready | REST API and MCP Server only |\n| **File Upload UI, CMIS, Web Pages, Wikipedia, YouTube** | ➖ Not supported | - | - | No support for incremental updates |\n\n**Features**:\n- **Modification Date Tracking**: Uses file modification timestamps (ordinal) to detect changes\n- **Content Hash Optimization**: Skips reprocessing when file modification date changed but content hasn't\n- **Dual Mechanism**: Event-driven streams (real-time) + periodic polling fallback\n- **LlamaIndex Integration**: Uses proper abstractions for all databases\n- **UI, REST API, MCP Server**: Setting up an auto 
update data source location can be done through the three UIs, with the REST API, or with the MCP server\n\n**Setup Requirements**:\n\nEnable incremental updates in your `.env` file:\n```bash\nENABLE_INCREMENTAL_UPDATES=true\n\n# PostgreSQL database for state management\n# By default, uses the pgvector database from docker-compose.yaml\nPOSTGRES_INCREMENTAL_URL=postgresql://postgres:password@localhost:5433/postgres\n```\n\n**Note**: The incremental updates system uses PostgreSQL to track document state. The `docker-compose.yaml` includes a pgvector container that can be used both as a vector database option and for incremental updates state management. The database connection creates the necessary tables automatically on first use.\n\n**Usage**: \n- Check the **\"Enable auto change sync\"** checkbox in the Processing tab when configuring your data source\n- For **S3**: Also provide the \"SQS Queue URL\" for event notifications\n- For **GCS**: Also provide the \"Pub/Sub Subscription Name\" for real-time updates\n\n**PostgreSQL for State Management**:\n\nThe `docker/includes/postgres-pgvector.yaml` sets up two databases automatically on first start: `flexible_graphrag` (for optional pgvector vector storage) and `flexible_graphrag_incremental` (for incremental update state management, with its schema created automatically). pgAdmin is also configured at http://localhost:5050 with both databases pre-registered: enter the master password `admin` when prompted, then use `password` for the server connection and save it. 
See [docs/DATABASES/POSTGRES-SETUP.md](docs/DATABASES/POSTGRES-SETUP.md) for details.\n\n**Documentation**:\n- System overview: [`docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/README.md`](docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/README.md)\n- Quick start: [`docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/QUICKSTART.md`](docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/QUICKSTART.md)\n- Detailed setup: [`docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/SETUP-GUIDE.md`](docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/SETUP-GUIDE.md)\n- API reference: [`docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/API-REFERENCE.md`](docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/API-REFERENCE.md)\n- PostgreSQL setup: [`docs/DATABASES/POSTGRES-SETUP.md`](docs/DATABASES/POSTGRES-SETUP.md)\n\n**Scripts**:\n- `scripts/incremental/sync-now.sh|.ps1|.bat` - Trigger immediate synchronization\n- `scripts/incremental/set-refresh-interval.sh|.ps1|.bat` - Configure polling interval\n- `scripts/incremental/TIMING-CONFIGURATION.md` - Timing configuration details\n- `scripts/incremental/README.md` - Script usage documentation\n\n### Document Processing Options\n\nAll data sources support two document parser options:\n\n**Docling (Default)**:\n- Open-source, local processing\n- Free with no API costs\n- **GPU acceleration** supported (CUDA/Apple Silicon) for 5-10x faster processing\n- Built-in OCR for scanned documents and images — `DOCLING_OCR=true` + `DOCLING_OCR_ENGINE=auto|rapidocr|easyocr|tesseract_cli|tesserocr|ocrmac`\n- Multi-language support (English, German, French, Spanish, Czech, Russian, Chinese, Japanese, etc.)\n- Configured via: `DOCUMENT_PARSER=docling`\n- `DOCLING_DEVICE=auto|cpu|cuda|mps` — control GPU vs CPU processing\n- `SAVE_PARSING_OUTPUT=true` — save intermediate parsing results for inspection (works for both parsers)\n- `PARSER_FORMAT_FOR_EXTRACTION=auto|markdown|plaintext` — control format used for knowledge graph extraction\n- See [Docling GPU + OCR Configuration 
Guide](docs/DATA-SOURCES/DOC-PROCESSING/DOCLING-GPU-CONFIGURATION.md) for setup details | [Quick Reference](docs/DATA-SOURCES/DOC-PROCESSING/DOCLING-GPU-CONFIGURATION.md#quick-reference-installation-commands)\n\n**LlamaParse**:\n- Cloud-based API service with advanced AI\n- Multimodal parsing with Claude Sonnet 3.5\n- Three modes available:\n  - `parse_page_without_llm` - 1 credit/page\n  - `parse_page_with_llm` - 3 credits/page (default)\n  - `parse_page_with_agent` - 10-90 credits/page\n- Configured via: `DOCUMENT_PARSER=llamaparse` + `LLAMAPARSE_API_KEY`\n- Get your API key from [LlamaCloud](https://cloud.llamaindex.ai/)\n- **New**: `SAVE_PARSING_OUTPUT=true` - Save parsed output and metadata for inspection\n- **New**: `PARSER_FORMAT_FOR_EXTRACTION=auto|markdown|plaintext` - Control format used for knowledge graph extraction\n\n## Supported File Formats\n\n### Document Formats\n- **PDF**: `.pdf`\n  - **Docling**: Advanced layout analysis, table extraction, formula recognition, configurable OCR (EasyOCR, Tesseract, RapidOCR)\n  - **LlamaParse**: Automatic OCR within parsing pipeline, multimodal vision processing\n- **Microsoft Office**: `.docx`, `.xlsx`, `.pptx` and legacy formats (`.doc`, `.xls`, `.ppt`)\n  - **Docling**: DOCX, XLSX, PPTX structure preservation and content extraction\n  - **LlamaParse**: Full Office suite support including legacy formats and hundreds of variants\n- **Web Formats**: `.html`, `.htm`, `.xhtml`\n  - **Docling**: HTML/XHTML markup structure analysis\n  - **LlamaParse**: HTML/XHTML content extraction and formatting\n- **Data Formats**: `.csv`, `.tsv`, `.json`, `.xml`\n  - **Docling**: CSV structured data processing\n  - **LlamaParse**: CSV, TSV, JSON, XML with enhanced table understanding\n- **Documentation**: `.md`, `.markdown`, `.asciidoc`, `.adoc`, `.rtf`, `.txt`, `.epub`\n  - **Docling**: Markdown, AsciiDoc technical documentation with markup preservation\n  - **LlamaParse**: Extended format support including RTF, EPUB, and 
hundreds of text format variants\n\n### Image Formats\n- **Standard Images**: `.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`, `.webp`, `.tiff`, `.tif`\n  - **Docling**: OCR text extraction with configurable OCR backends (EasyOCR, Tesseract, RapidOCR)\n  - **LlamaParse**: Automatic OCR with multimodal vision processing and context understanding\n\n### Audio Formats\n- **Audio Files**: `.wav`, `.mp3`, `.mp4`, `.m4a`\n  - **Docling**: Automatic speech recognition (ASR) support\n  - **LlamaParse**: Transcription and content extraction for MP3, MP4, MPEG, MPGA, M4A, WAV, WEBM\n\n### Processing Intelligence\n- **Parser Selection**: \n  - **Docling** (default, free): Local processing with specialized CV models (DocLayNet layout analysis, TableFormer for tables), configurable OCR backends (EasyOCR/Tesseract/RapidOCR), optional local VLM support (Granite-Docling, SmolDocling, Qwen2.5-VL, Pixtral)\n  - **LlamaParse** (cloud API, 3 credits/page): Automatic OCR in parsing pipeline, supports hundreds of file formats, fast mode (OCR-only), default mode (proprietary LlamaCloud model), premium mode (proprietary VLM mixture), multimodal mode (bring your own API keys: OpenAI GPT-4o, Anthropic Claude 3.5/4.5 Sonnet, Google Gemini 1.5/2.0, Azure OpenAI)\n- **Output Formats**: \n  - **Flexible GraphRAG** saves both markdown and plaintext, then automatically selects which to use for processing (knowledge graph extraction, vector embeddings, and search indexing) - defaults to markdown for tables, plaintext for text-heavy docs - override with `PARSER_FORMAT_FOR_EXTRACTION`\n  - **Docling** supports: Markdown, JSON (lossless with bounding boxes and provenance), HTML, plain text, and DocTags (specialized markup preserving multi-column layouts, mathematical formulas, and code blocks)\n  - **LlamaParse** supports: Markdown, plain text, raw JSON, XLSX (extracted tables), PDF, images (extracted separately), and structured output (beta - enforces custom JSON schema for strict data model extraction)\n- 
**Format Detection**: Automatic routing based on file extension and content analysis\n\n## Database Configuration\n\nFlexible GraphRAG uses three types of databases for its hybrid search capabilities. Each can be configured independently via environment variables.\n\n### Search Databases (Full-Text Search)\n\nSet `SEARCH_DB` to select the store and `SEARCH_BACKEND=llamaindex` or `langchain` for the framework.\n\n- **BM25 (Built-in)**: Local in-memory BM25 full-text search with TF-IDF ranking\n  - Dashboard: None (file-based)\n  - Configuration:\n    ```bash\n    SEARCH_DB=bm25\n    BM25_SEARCH_DB_CONFIG={\"persist_dir\": \"./bm25_index\"}\n    ```\n\n- **Elasticsearch**: Enterprise search engine with advanced analyzers, faceted search, and real-time analytics\n  - Dashboard: Kibana (http://localhost:5601)\n  - Configuration:\n    ```bash\n    SEARCH_DB=elasticsearch\n    ELASTICSEARCH_SEARCH_DB_CONFIG={\"hosts\": [\"http://localhost:9200\"], \"index_name\": \"hybrid_search\"}\n    ```\n\n- **OpenSearch**: AWS-led open-source fork with native hybrid scoring (vector + BM25) and k-NN algorithms\n  - Dashboard: OpenSearch Dashboards (http://localhost:5601)\n  - Configuration:\n    ```bash\n    SEARCH_DB=opensearch\n    OPENSEARCH_SEARCH_DB_CONFIG={\"hosts\": [\"http://localhost:9201\"], \"index_name\": \"hybrid_search\"}\n    ```\n\n- **None**: Disable full-text search (vector search only)\n  - Configuration:\n    ```bash\n    SEARCH_DB=none\n    ```\n\n### Vector Databases (Semantic Search)\n\nSet `VECTOR_DB` to select the store and `VECTOR_BACKEND=llamaindex` or `langchain` for the framework.\n\nWhen switching embedding models, delete existing vector indexes — dimensions differ by provider. 
See [docs/DATABASES/VECTOR-DATABASES/VECTOR-DIMENSIONS.md](docs/DATABASES/VECTOR-DATABASES/VECTOR-DIMENSIONS.md) for cleanup instructions.\n\n#### Supported Vector Databases\n\n- **Neo4j**: Can be used as vector database with separate vector configuration\n  - Dashboard: Neo4j Browser (http://localhost:7474)\n  - Configuration:\n    ```bash\n    VECTOR_DB=neo4j\n    NEO4J_VECTOR_DB_CONFIG={\"uri\": \"bolt://localhost:7687\", \"username\": \"neo4j\", \"password\": \"your_password\", \"index_name\": \"hybrid_search_vector\"}\n    ```\n\n- **Qdrant**: Dedicated vector database with advanced filtering\n  - Dashboard: Qdrant Web UI (http://localhost:6333/dashboard)\n  - Configuration:\n    ```bash\n    VECTOR_DB=qdrant\n    QDRANT_VECTOR_DB_CONFIG={\"host\": \"localhost\", \"port\": 6333, \"collection_name\": \"hybrid_search\"}\n    ```\n\n- **Elasticsearch**: Can be used as vector database with separate vector configuration\n  - Dashboard: Kibana (http://localhost:5601)\n  - Configuration:\n    ```bash\n    VECTOR_DB=elasticsearch\n    ELASTICSEARCH_VECTOR_DB_CONFIG={\"hosts\": [\"http://localhost:9200\"], \"index_name\": \"hybrid_search_vectors\"}\n    ```\n\n- **OpenSearch**: Can be used as vector database with separate vector configuration\n  - Dashboard: OpenSearch Dashboards (http://localhost:5601)\n  - Configuration:\n    ```bash\n    VECTOR_DB=opensearch\n    OPENSEARCH_VECTOR_DB_CONFIG={\"hosts\": [\"http://localhost:9201\"], \"index_name\": \"hybrid_search_vectors\"}\n    ```\n\n- **Chroma**: Open-source vector database with dual deployment modes\n  - Dashboard: Swagger UI (http://localhost:8001/docs/) (HTTP mode)\n  - Configuration (Local Mode):\n    ```bash\n    VECTOR_DB=chroma\n    CHROMA_VECTOR_DB_CONFIG={\"persist_directory\": \"./chroma_db\", \"collection_name\": \"hybrid_search\"}\n    ```\n  - Configuration (HTTP Mode):\n    ```bash\n    VECTOR_DB=chroma\n    CHROMA_VECTOR_DB_CONFIG={\"host\": \"localhost\", \"port\": 8001, \"collection_name\": 
\"hybrid_search\"}\n    ```\n\n- **Milvus**: Cloud-native, scalable vector database for similarity search\n  - Dashboard: Attu (http://localhost:3003)\n  - Configuration:\n    ```bash\n    VECTOR_DB=milvus\n    MILVUS_VECTOR_DB_CONFIG={\"host\": \"localhost\", \"port\": 19530, \"collection_name\": \"hybrid_search\"}\n    ```\n\n- **Weaviate**: Vector search engine with semantic capabilities and data enrichment\n  - Dashboard: Weaviate Console (http://localhost:8081/console)\n  - Configuration:\n    ```bash\n    VECTOR_DB=weaviate\n    WEAVIATE_VECTOR_DB_CONFIG={\"url\": \"http://localhost:8081\", \"index_name\": \"HybridSearch\"}\n    ```\n\n- **Pinecone**: Managed vector database service optimized for real-time applications\n  - Dashboard: Pinecone Console (web-based)\n  - Configuration:\n    ```bash\n    VECTOR_DB=pinecone\n    PINECONE_VECTOR_DB_CONFIG={\"api_key\": \"your_api_key\", \"region\": \"us-east-1\", \"cloud\": \"aws\", \"index_name\": \"hybrid-search\"}\n    ```\n\n- **PostgreSQL**: Traditional database with pgvector extension for vector similarity search\n  - Dashboard: pgAdmin (http://localhost:5050)\n  - Configuration:\n    ```bash\n    VECTOR_DB=postgres\n    POSTGRES_VECTOR_DB_CONFIG={\"host\": \"localhost\", \"port\": 5433, \"database\": \"postgres\", \"username\": \"postgres\", \"password\": \"your_password\"}\n    ```\n\n- **LanceDB**: Modern, lightweight vector database designed for high-performance ML applications\n  - Dashboard: LanceDB Viewer (http://localhost:3005)\n  - Configuration:\n    ```bash\n    VECTOR_DB=lancedb\n    LANCEDB_VECTOR_DB_CONFIG={\"uri\": \"./lancedb\", \"table_name\": \"hybrid_search\"}\n    ```\n\n#### RAG without GraphRAG\n\nFor faster document ingest processing (no graph extraction), and hybrid search with only full text + vector, configure:\n```bash\nVECTOR_DB=qdrant       # Any vector store\nSEARCH_DB=elasticsearch  # Any search engine\nPG_GRAPH_DB=none\n```\n\n\n### Property Graph Databases (Knowledge Graph / 
GraphRAG)\n\nSet `PG_GRAPH_DB` to select the store and `GRAPH_BACKEND=llamaindex` or `langchain` for the framework where both are supported. **LangChain-only** stores (ArangoDB, Apache AGE, HugeGraph, SurrealDB, TigerGraph, Cosmos Gremlin) route property-graph ingestion and retrieval through LangChain adapters regardless of other env defaults. **LlamaIndex-only** stores (Spanner): when `PG_GRAPH_DB=spanner`, startup forces `GRAPH_BACKEND=llamaindex` and ignores `GRAPH_BACKEND=langchain`.\n\n- **Neo4j Property Graph**: Primary knowledge graph storage with Cypher querying\n  - Dashboard: Neo4j Browser (http://localhost:7474)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=neo4j\n    NEO4J_GRAPH_DB_CONFIG={\"uri\": \"bolt://localhost:7687\", \"username\": \"neo4j\", \"password\": \"your_password\"}\n    ```\n\n- **ArcadeDB**: Multi-model database supporting graph, document, key-value, and search with SQL and Cypher\n  - Dashboard: ArcadeDB Studio (http://localhost:2480)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=arcadedb\n    ARCADEDB_GRAPH_DB_CONFIG={\"host\": \"localhost\", \"port\": 2480, \"username\": \"root\", \"password\": \"password\", \"database\": \"flexible_graphrag\", \"query_language\": \"sql\"}\n    ```\n\n- **FalkorDB**: High-performance graph database using GraphBLAS; purpose-built for LLM / GraphRAG\n  - Dashboard: FalkorDB Browser (http://localhost:3001)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=falkordb\n    FALKORDB_GRAPH_DB_CONFIG={\"url\": \"falkor://localhost:6379\", \"database\": \"falkor\"}\n    ```\n\n- **Ladybug**: Embedded property graph database (Cypher, single `.lbug` file) with optional structured schema and HNSW vector index on chunks; Explorer UI via Docker (port 7003)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=ladybug\n    LADYBUG_GRAPH_DB_CONFIG={\"db_dir\": \"./ladybug\", \"db_file\": \"database.lbug\", \"use_vector_index\": true, \"has_structured_schema\": false, \"strict_schema\": false}\n    ```\n\n- 
**MemGraph**: Real-time graph database with streaming support and advanced graph algorithms\n  - Dashboard: MemGraph Lab (http://localhost:3002)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=memgraph\n    MEMGRAPH_GRAPH_DB_CONFIG={\"url\": \"bolt://localhost:7687\", \"username\": \"\", \"password\": \"\"}\n    ```\n\n- **NebulaGraph**: Distributed graph database for large-scale data with horizontal scalability\n  - Dashboard: NebulaGraph Studio (http://localhost:7001)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=nebula\n    NEBULA_GRAPH_DB_CONFIG={\"space\": \"flexible_graphrag\", \"host\": \"localhost\", \"port\": 9669, \"username\": \"root\", \"password\": \"nebula\"}\n    ```\n\n- **Amazon Neptune**: Fully managed graph database service supporting property graph and RDF models\n  - Dashboard: Graph-Explorer (http://localhost:3007) or Neptune Workbench (AWS Console)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=neptune\n    NEPTUNE_GRAPH_DB_CONFIG={\"host\": \"your-cluster.region.neptune.amazonaws.com\", \"port\": 8182}\n    ```\n\n- **Amazon Neptune Analytics**: Serverless graph analytics with openCypher support\n  - Dashboard: Graph-Explorer (http://localhost:3007) or Neptune Workbench (AWS Console)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=neptune_analytics\n    NEPTUNE_ANALYTICS_GRAPH_DB_CONFIG={\"graph_identifier\": \"g-xxxxx\", \"region\": \"us-east-1\"}\n    ```\n\n- **Google Cloud Spanner Graph** *(LlamaIndex only)*: Managed relational + property graph (GQL). Uses `llama-index-spanner` — install with `uv pip install -e \".[spanner-extras]\"` then `uv pip uninstall llama-index` (see [Optional](#optional) under Prerequisites). 
LangChain is not supported for this store (`langchain-google-spanner` pins incompatible `langchain-core`).\n  - Setup: [docs/DATABASES/GRAPH-DATABASES/SPANNER-SETUP.md](docs/DATABASES/GRAPH-DATABASES/SPANNER-SETUP.md)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=spanner\n    # GRAPH_BACKEND=llamaindex is forced for Spanner (LlamaIndex-only); langchain is ignored\n    SPANNER_GRAPH_DB_CONFIG={\"project_id\": \"my-gcp-project\", \"instance_id\": \"my-spanner-instance\", \"database_id\": \"my-database\", \"graph_name\": \"knowledge_graph\", \"credentials_file\": \"./gcs.json\"}\n    ```\n\n- **ArangoDB** *(LangChain only)*: Multi-model database with AQL graph queries\n  - Dashboard: ArangoDB Web UI (http://localhost:8529)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=arangodb\n    ARANGODB_GRAPH_DB_CONFIG={\"url\": \"http://localhost:8529\", \"database\": \"flexible_graphrag\", \"username\": \"root\", \"password\": \"password\"}\n    ```\n\n- **Apache AGE** *(LangChain only)*: PostgreSQL extension for graph data via Cypher\n  - Dashboard: pgAdmin (http://localhost:5050)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=apache_age\n    APACHE_AGE_GRAPH_DB_CONFIG={\"host\": \"localhost\", \"port\": 5434, \"database\": \"flexible_graphrag_age\", \"username\": \"postgres\", \"password\": \"password\", \"graph_name\": \"knowledge_graph\"}\n    ```\n\n- **HugeGraph** *(LangChain only)*: Distributed graph database with Gremlin and openCypher\n  - Dashboard: HugeGraph Hubble (http://localhost:8085)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=hugegraph\n    HUGEGRAPH_GRAPH_DB_CONFIG={\"host\": \"localhost\", \"port\": 8082, \"database\": \"hugegraph\"}\n    ```\n\n- **SurrealDB** *(LangChain only)*: Multi-model database with SurrealQL graph queries\n  - Dashboard: Surrealist (http://localhost:8011)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=surrealdb\n    SURREALDB_GRAPH_DB_CONFIG={\"url\": \"ws://localhost:8010/rpc\", \"namespace\": \"test\", 
\"database\": \"flexible_graphrag\", \"username\": \"root\", \"password\": \"root\"}\n    ```\n\n- **TigerGraph** *(LangChain only)*: Distributed graph database with GSQL\n  - Dashboard: GraphStudio (http://localhost:14240)\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=tigergraph\n    TIGERGRAPH_GRAPH_DB_CONFIG={\"host\": \"http://localhost\", \"port\": 14240, \"restpp_port\": 9002, \"database\": \"MyGraph\", \"username\": \"tigergraph\", \"password\": \"tigergraph\"}\n    ```\n\n- **Cosmos Gremlin** *(LangChain only)*: Azure Cosmos DB for Gremlin API\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=cosmos_gremlin\n    COSMOS_GREMLIN_GRAPH_DB_CONFIG={\"url\": \"ws://localhost:8182/gremlin\"}\n    ```\n\n- **None**: Disable knowledge graph extraction for RAG-only mode\n  - Configuration:\n    ```bash\n    PG_GRAPH_DB=none\n    ```\n\n## Ontology and RDF Support\n\nFlexible GraphRAG supports RDF/RDFS/OWL ontologies to guide knowledge graph extraction, with optional RDF graph store backends. 
Ontology-guided extraction works with **any** configured store — property graph, RDF graph store, or both.\n\n- Load OWL/RDFS ontologies (`owl:Class`, `owl:ObjectProperty`, `owl:DatatypeProperty`, `rdfs:domain`, `rdfs:range`) to constrain entity/relation extraction; OWL is supported but not required\n- Works with all 15 property graph databases — no RDF store required to use ontology-guided extraction\n- Full pipeline for all 4 RDF graph stores: UI document ingest → KG extraction → RDF storage; auto incremental sync; Hybrid Search and AI Query/Chat fuse RDF store results alongside vector, BM25, and property graph results\n- SPARQL 1.1 queries; RDF 1.2 triple terms and relation annotations (`{| |}` syntax); XSD-typed literals from OWL `DatatypeProperty` ranges\n\n**RDF Graph Store Configuration** — set `RDF_GRAPH_DB` to select the store (all four support RDF 1.2 triple terms; Neptune is AWS-managed—no local compose include):\n\n- **Apache Jena Fuseki** — SPARQL 1.1 server; dashboard: http://localhost:3030\n  ```bash\n  RDF_GRAPH_DB=fuseki\n  FUSEKI_BASE_URL=http://localhost:3030\n  FUSEKI_DATASET=flexible-graphrag\n  ```\n\n- **Ontotext GraphDB** — enterprise RDF store with OWL reasoning; dashboard: http://localhost:7200\n  ```bash\n  RDF_GRAPH_DB=graphdb\n  GRAPHDB_BASE_URL=http://localhost:7200\n  GRAPHDB_REPOSITORY=flexible-graphrag\n  GRAPHDB_USERNAME=admin\n  GRAPHDB_PASSWORD=admin\n  ```\n\n- **Oxigraph** — lightweight local store, native RDF 1.2; dashboard: http://localhost:7878\n  ```bash\n  RDF_GRAPH_DB=oxigraph\n  OXIGRAPH_URL=http://localhost:7878\n  ```\n\n- **Amazon Neptune RDF** — managed SPARQL 1.1 on Neptune (same cluster can host property graph and RDF; IAM SigV4 auth). 
See [Neptune RDF setup](docs/DATABASES/GRAPH-DATABASES/NEPTUNE-SETUP.md).\n  ```bash\n  RDF_GRAPH_DB=neptune_rdf\n  NEPTUNE_RDF_HOST=db-neptune-1.cluster-xxxxxxxxxxxx.us-east-1.neptune.amazonaws.com\n  NEPTUNE_RDF_PORT=8182\n  NEPTUNE_RDF_REGION=us-east-1\n  NEPTUNE_RDF_USE_IAM_AUTH=true\n  NEPTUNE_RDF_USE_HTTPS=true\n  # Optional explicit keys (else default AWS credential chain):\n  # NEPTUNE_RDF_AWS_ACCESS_KEY_ID=\n  # NEPTUNE_RDF_AWS_SECRET_ACCESS_KEY=\n  ```\n\n- **None** — disable RDF graph store:\n  ```bash\n  RDF_GRAPH_DB=none\n  ```\n\n**Docker Setup:** Uncomment local RDF store includes in `docker-compose.yaml` (Fuseki, GraphDB, Oxigraph):\n```yaml\nincludes:\n  # - includes/jena-fuseki.yaml\n  # - includes/ontotext-graphdb.yaml\n  # - includes/oxigraph.yaml\n```\n\n**Complete Documentation:** [docs/DATABASES/RDF/RDF-ONTOLOGY-SUPPORT.md](docs/DATABASES/RDF/RDF-ONTOLOGY-SUPPORT.md) | [docs/DATABASES/RDF/RDF-STORE-USER-GUIDE.md](docs/DATABASES/RDF/RDF-STORE-USER-GUIDE.md)\n\n## Framework Configuration\n\nEvery pipeline stage can independently run on LlamaIndex or LangChain via env var pickers:\n\n| Variable | Options | Description |\n|---|---|---|\n| `GRAPH_BACKEND` | `llamaindex` \\| `langchain` | Property graph store and KG retrieval |\n| `VECTOR_BACKEND` | `llamaindex` \\| `langchain` | Vector store adapter |\n| `SEARCH_BACKEND` | `llamaindex` \\| `langchain` | Full-text search adapter |\n| `CHUNKER_BACKEND` | `llamaindex` \\| `langchain` | Document chunking / splitting |\n| `KG_EXTRACTOR_BACKEND` | `llamaindex` \\| `langchain` | KG extraction from chunks |\n| `RETRIEVAL_FUSION` | `llamaindex` \\| `langchain` | Result fusion across retrievers |\n\nLangChain-only graph stores (ArangoDB, Apache AGE, HugeGraph, SurrealDB, TigerGraph, Cosmos Gremlin) auto-select `GRAPH_BACKEND=langchain`. 
LlamaIndex-only Spanner (`PG_GRAPH_DB=spanner`) forces `GRAPH_BACKEND=llamaindex` at startup and ignores `GRAPH_BACKEND=langchain` (no LangChain adapter).\n\n**Complete Documentation:** [docs/ADVANCED/LANGCHAIN/LANGCHAIN-GRAPH-INTEGRATION.md](docs/ADVANCED/LANGCHAIN/LANGCHAIN-GRAPH-INTEGRATION.md)\n\n## LLM and Embedding Configuration\n\nSet via `LLM_PROVIDER` and provider-specific environment variables.\n\n### Supported LLM Providers\n\n1. **OpenAI** - gpt-4o-mini (default), gpt-4o, gpt-4.1-mini, gpt-5-mini, etc.\n2. **Ollama** - Local deployment (llama3.2, llama3.1, qwen2.5, gpt-oss, etc.)\n3. **Azure OpenAI** - Azure-hosted OpenAI models\n4. **Google Gemini** - gemini-2.5-flash, gemini-3-flash-preview, gemini-3.1-pro-preview, etc.\n5. **Anthropic Claude** - claude-sonnet-4-5, claude-haiku-4-5, etc.\n6. **Google Vertex AI** - Google Cloud-hosted Vertex AI Platform Gemini models\n7. **Amazon Bedrock** - Amazon Nova, Titan, Anthropic Claude, Meta Llama, Mistral AI, etc.\n8. **Groq** - Fast low-cost LPU inference, low latency: OpenAI GPT-OSS, Meta Llama (4, 3.3, 3.1), Qwen3, Kimi, etc.\n9. **Fireworks AI** - More choices, fine-tuning: Meta, Qwen, Mistral AI, DeepSeek, OpenAI GPT-OSS, Kimi, GLM, MiniMax, etc.\n10. **OpenAI-Compatible** (`openai_like`) - Any OpenAI-compatible endpoint (LM Studio, LocalAI, Llamafile, vLLM, etc.)\n11. **OpenRouter** - 200+ models via unified API (openai/gpt-4o-mini, anthropic/claude, meta-llama, etc.)\n12. **LiteLLM Proxy** - 100+ providers via LiteLLM proxy; sample config in `scripts/litellm_config.yaml`\n13. 
**vLLM** - High-performance local inference server (Linux/macOS; use `openai_like` on Windows)\n\n### LLM Provider Configuration\n\nSee [docs/LLM/LLM-EMBEDDING-CONFIG.md](docs/LLM/LLM-EMBEDDING-CONFIG.md) for all 13 providers with detailed configuration examples.\n\n**OpenAI** (recommended):\n```bash\nLLM_PROVIDER=openai\nOPENAI_API_KEY=your_api_key\nOPENAI_MODEL=gpt-4o-mini\n```\n\n**Ollama** (local):\n```bash\nLLM_PROVIDER=ollama\nOLLAMA_BASE_URL=http://localhost:11434\nOLLAMA_MODEL=llama3.2:latest\n```\n\n**Azure OpenAI**:\n```bash\nLLM_PROVIDER=azure_openai\nAZURE_OPENAI_API_KEY=your_key\nAZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/\nAZURE_OPENAI_ENGINE=gpt-4o-mini\n```\n\n### Embedding Configuration\n\nSet `EMBEDDING_KIND` to choose the embedding provider — independent of the LLM provider. All 13 LLM providers are also supported as embedding providers. See [docs/LLM/LLM-EMBEDDING-CONFIG.md](docs/LLM/LLM-EMBEDDING-CONFIG.md) for all providers and options.\n\n**OpenAI**:\n```bash\nEMBEDDING_KIND=openai\nOPENAI_EMBEDDING_MODEL=text-embedding-3-small\nOPENAI_API_KEY=your_api_key\n```\n\n**Ollama** (local):\n```bash\nEMBEDDING_KIND=ollama\nOLLAMA_EMBEDDING_MODEL=nomic-embed-text\nOLLAMA_BASE_URL=http://localhost:11434\n```\n\n**Azure OpenAI**:\n```bash\nEMBEDDING_KIND=azure_openai\nAZURE_EMBEDDING_MODEL=text-embedding-3-small\nAZURE_EMBEDDING_DEPLOYMENT=your_deployment_name\nAZURE_OPENAI_API_KEY=your_key\nAZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/\n```\n\n**Common embedding dimensions:**\n- OpenAI: 1536 (text-embedding-3-small), 3072 (text-embedding-3-large)\n- Ollama: 384 (all-minilm), 768 (nomic-embed-text), 1024 (mxbai-embed-large)\n- Google: 768 (gemini-embedding-2-preview)\n- Bedrock: 1024 (amazon.titan-embed-text-v2:0)\n\nWhen switching embedding models, delete existing vector indexes. 
See [docs/DATABASES/VECTOR-DATABASES/VECTOR-DIMENSIONS.md](docs/DATABASES/VECTOR-DATABASES/VECTOR-DIMENSIONS.md) for cleanup instructions.\n\n### Ollama Configuration\n\nWhen using Ollama, configure system-wide environment variables before starting the Ollama service:\n\n**Key requirements**:\n- Configure environment variables **system-wide** (not in Flexible GraphRAG `.env` file)\n- `OLLAMA_NUM_PARALLEL=4` for optimal performance (or 1-2 if resource constrained)\n- Always restart Ollama service after changing environment variables\n\nSee [docs/LLM/OLLAMA-CONFIGURATION.md](docs/LLM/OLLAMA-CONFIGURATION.md) for complete setup instructions including platform-specific steps and performance optimization.\n\n\n\n## Prerequisites\n\n### Required\n- Python 3.12, 3.13, or 3.14 (as specified in `pyproject.toml`)\n- UV package manager (for dependency management)\n- Node.js 22.x (for UI clients)\n- npm (package manager)\n- Search database: Elasticsearch or OpenSearch\n- Vector database: Qdrant (or other supported vector databases)\n- Property graph database: Neo4j (or other supported property graph databases) - unless using vector-only RAG\n- OpenAI with API key (recommended) or Ollama (for LLM processing)\n\n**Note**: The `docker/docker-compose.yaml` file can provide all these databases via Docker containers.\n\n### Install\n\n```bash\ncd flexible-graphrag\nuv pip install -e .\n```\n\n### Optional\n- **LangChain 1.x integration** — Optional peer stack alongside LlamaIndex (extras pin **`langchain\u003e=1.0`** and the LangChain **1.x** line, not legacy 0.3):\n  - `uv pip install -e \".[langchain]\"` — core LC extras: property graph stores via `langchain-community` where supported, 10 vector stores, 3 search stores, RDF SPARQL retrieval, native LC LLM/embedding clients for all 13 providers, KG extraction via `langchain-experimental`, retrieval fusion\n  - `uv pip install --override extras-overrides.txt -e \".[langchain,langchain-extras]\"` — adds Neo4j (LC), PostgreSQL 
pgvector, ArcadeDB, ArangoDB, Cosmos Gremlin, HugeGraph, TigerGraph, and related dependencies (see `pyproject.toml` group `langchain-extras`)\n  - **Apache AGE** — property graph via LangChain needs the separate **`age-extras`** group (BAEM1N `langchain-age` driver):\n    ```bash\n    uv pip install --override extras-overrides.txt -e \".[langchain,langchain-extras,age-extras]\"\n    python scripts/patch_langchain_age.py\n    ```\n    Run `patch_langchain_age.py` on **Python 3.14+** (required); on 3.12/3.13 it is harmless.\n  - `uv pip install -e \".[spanner-extras]\"` — adds LI-only Spanner support via `llama-index-spanner`. **Note:** `llama-index-spanner` declares `llama-index` (the meta-package) as a dependency, which `uv` will install. Uninstall it immediately after: `uv pip uninstall llama-index` — having both `llama-index` and `llama-index-core` installed simultaneously can cause version conflicts, as the meta-package pins versions of `llama-index-*` component packages that can clash with the versions already required by this project\n  - SurrealDB — two-step install required (resolver conflict):\n    ```bash\n    uv pip install -e \".[surrealdb-extras]\"\n    uv pip install \"surrealdb\u003e=2.0\" \"langchain-core\u003e=1.3\"\n    ```\n- **ArcadeDB embedded mode** (`uv pip install \"arcadedb\u003e=26.3.2\"`; quote the version spec so the shell does not treat `\u003e` as a redirect) — runs ArcadeDB in-process; includes a bundled JVM, no separate Java install needed; latest release: 26.3.2\n- **Enterprise Repositories**:\n  - Alfresco repository - only if using Alfresco data source\n  - SharePoint - requires SharePoint access\n  - Box - requires Box Business account (3 users minimum), API keys\n  - CMIS-compliant repository (e.g., Alfresco) - only if using CMIS data source\n- **Cloud Storage** (requires accounts and API keys/credentials):\n  - Amazon S3 - requires AWS account and access keys\n  - Google Cloud Storage - requires GCP account and service account credentials\n  - Google Drive - requires Google Cloud account and OAuth 
credentials or service account\n  - Azure Blob Storage - requires Azure account and connection string or account keys\n  - Microsoft OneDrive - requires OneDrive for Business (not personal OneDrive)\n  - **Note**: SharePoint and OneDrive for Business are also available through an M365 Developer Program sandbox (which requires a full annual Visual Studio subscription, not a monthly one).\n- **File Upload** (no account required):\n  - Web interface with file dialog (drag \u0026 drop or click to select)\n- **Web Sources** (no account required):\n  - Web pages, Wikipedia, YouTube - no accounts needed\n\n## Setup\n\n### 🐳 Docker Deployment\n\nDocker deployment offers multiple scenarios. Before deploying any scenario, set up your environment files:\n\n**Environment File Setup (Required for All Scenarios):**\n\n1. **Backend Configuration** (`.env`):\n   ```bash\n   # Navigate to backend directory\n   cd flexible-graphrag\n   \n   # Linux/macOS\n   cp env-sample.txt .env\n   \n   # Windows Command Prompt\n   copy env-sample.txt .env\n   \n   # Edit .env with your database credentials, API keys, and settings\n   # Then return to project root\n   cd ..\n   ```\n\n2. 
**Docker Configuration** (`docker.env`):\n   ```bash\n   # Navigate to docker directory\n   cd docker\n   \n   # Linux/macOS\n   cp docker-env-sample.txt docker.env\n   \n   # Windows Command Prompt\n   copy docker-env-sample.txt docker.env\n   \n   # Edit docker.env for Docker-specific overrides (network addresses, service names)\n   # Stay in docker directory for next steps\n   ```\n\n---\n\n#### Scenario A: Databases in Docker, App Standalone (Hybrid)\n\n**Configuration Setup:**\n```bash\n# If not already in docker directory from previous step:\n# cd docker\n\n# Edit docker-compose.yaml to uncomment/comment services as needed\n# Scenario A setup in docker-compose.yaml:\n# Keep these services uncommented (default setup):\n  - includes/neo4j.yaml\n  - includes/qdrant.yaml\n  - includes/elasticsearch-dev.yaml\n  - includes/kibana-simple.yaml\n\n# Keep these services commented out:\n# - includes/app-stack.yaml       # Must be commented out for Scenario A\n# - includes/proxy.yaml           # Must be commented out for Scenario A\n# - All other services remain commented unless you want a different vector database, \n#   graph database, OpenSearch for search, or Alfresco included\n```\n\n**Deploy Services:**\n```bash\n# From the docker directory\ndocker-compose -f docker-compose.yaml -p flexible-graphrag up -d\n```\n\n#### Scenario B: Full Stack in Docker (Complete)\n\n**Configuration Setup:**\n```bash\n# If not already in docker directory from previous step:\n# cd docker\n\n# Edit docker-compose.yaml to uncomment/comment services as needed\n# Scenario B setup in docker-compose.yaml:\n# Keep these services uncommented:\n  - includes/neo4j.yaml\n  - includes/qdrant.yaml\n  - includes/elasticsearch-dev.yaml\n  - includes/kibana-simple.yaml\n  - includes/app-stack.yaml       # Backend and UI in Docker\n  - includes/proxy.yaml           # NGINX reverse proxy\n\n# Keep other services commented out unless you want a different vector database,\n# graph database, OpenSearch for 
search, or Alfresco included\n```\n\n**Deploy Services:**\n```bash\n# From the docker directory\ndocker-compose -f docker-compose.yaml -p flexible-graphrag up -d\n```\n\n**Scenario B Service URLs:**\n- **Angular UI**: http://localhost:8070/ui/angular/\n- **React UI**: http://localhost:8070/ui/react/  \n- **Vue UI**: http://localhost:8070/ui/vue/\n- **Backend API**: http://localhost:8070/api/\n\n#### Other Deployment Scenarios\n\n**Scenario C: Fully Standalone** - Not using docker-compose at all\n- Standalone backend, standalone UIs, all databases running separately\n- Configure all database connections in `flexible-graphrag/.env`\n\n**Scenario D: Backend/UIs in Docker, Databases External**\n- Using docker-compose for backend and UIs (app-stack + proxy)\n- Some or all databases running separately (same docker-compose, other local Docker, cloud/remote servers)\n- Configure database connections in `docker/docker.env`: Backend in Docker reads this file\n  - For databases in same docker-compose: Use service names (e.g., `neo4j:7687`, `qdrant:6333`)\n  - For databases in other local Docker containers: Use `host.docker.internal:PORT`\n  - For remote/cloud databases: Use actual hostnames/IPs\n\n**Scenario E: Mixed Docker/Standalone**\n- Standalone backend and UIs\n- Running some databases in Docker (local) and some outside (cloud, external servers)\n- Configure all database connections in `flexible-graphrag/.env`: Use `host.docker.internal:PORT` for locally-running Docker databases, use actual hostnames/IPs for remote Docker or non-Docker databases\n\n#### Docker Control and Configuration\n\n**Managing Docker services:**\n\n```bash\n# Navigate to docker directory (if not already there)\ncd docker\n\n# Create and start services (recreates if configuration changed)\ndocker-compose -f docker-compose.yaml -p flexible-graphrag up -d\n\n# Stop services (keeps containers)\ndocker-compose -f docker-compose.yaml -p flexible-graphrag stop\n\n# Start stopped services\ndocker-compose 
-f docker-compose.yaml -p flexible-graphrag start\n\n# Stop and remove services\ndocker-compose -f docker-compose.yaml -p flexible-graphrag down\n\n# View logs\ndocker-compose -f docker-compose.yaml -p flexible-graphrag logs -f\n\n# Restart after configuration changes\ndocker-compose -f docker-compose.yaml -p flexible-graphrag down\n# Edit docker-compose.yaml, docker.env, or includes/app-stack.yaml as needed\ndocker-compose -f docker-compose.yaml -p flexible-graphrag up -d\n```\n\n**Configuration:**\n- **Modular deployment**: Comment/uncomment services in `docker/docker-compose.yaml`\n- **Backend configuration** (Scenario B): Backend uses `flexible-graphrag/.env` with `docker/docker.env` for Docker-specific overrides (like using service names instead of localhost). No configuration needed in `app-stack.yaml`\n\nSee [docker/README.md](./docker/README.md) for detailed Docker configuration.\n\n### 🔧 Local Development Setup (Scenario A)\n\n**Note**: Skip this entire section if using Scenario B (Full Stack in Docker).\n\n#### Environment Configuration\n\n**Create environment file** (cross-platform):\n```bash\n# Linux/macOS\ncp flexible-graphrag/env-sample.txt flexible-graphrag/.env\n\n# Windows Command Prompt  \ncopy flexible-graphrag\\env-sample.txt flexible-graphrag\\.env\n```\nEdit `.env` with your database credentials and API keys.\n\n### Python Backend Setup (Standalone)\n\n#### Option A — Install from PyPI package (Quickstart)\n\n```bash\n# 1. Create and activate a virtual environment\nuv venv venv-3.13 --python 3.13\nvenv-3.13\\Scripts\\Activate   # Windows\nsource venv-3.13/bin/activate  # Linux/macOS\n\n# 2. Install flexible-graphrag\nuv pip install flexible-graphrag\n\n# 3. Optionally install ArcadeDB embedded mode support (includes bundled JVM, no Java install needed)\n# Quote the version spec so the shell does not treat \u003e as an output redirect\nuv pip install \"arcadedb\u003e=26.3.2\"\n\n# 3a. 
Optional dependency groups, for example:\nuv pip install \"flexible-graphrag[langchain]\"\n# Other extras ([langchain-extras], [age-extras], overrides): see source README, Prerequisites \u003e Optional.\n\n# 4. Create .env from the sample (copy from the source repo or download env-sample.txt)\ncopy env-sample.txt .env   # Windows\ncp env-sample.txt .env     # Linux/macOS\n# Edit .env with your LLM API keys and database settings\n\n# 5. Start your databases (docker compose or standalone)\ndocker compose -f docker/docker-compose.yaml up -d\n\n# 6. Run the backend\nflexible-graphrag\n# or: uv run start.py\n```\n\n#### Option B — Install from source (editable)\n\n1. Navigate to the backend directory:\n   ```bash\n   cd flexible-graphrag\n   ```\n\n2. Create and activate a virtual environment, then install in editable mode:\n   ```bash\n   uv venv venv-3.13 --python 3.13\n   venv-3.13\\Scripts\\Activate   # Windows\n   source venv-3.13/bin/activate  # Linux/macOS\n   uv pip install -e .\n\n   # --- Optional: dependency groups from pyproject.toml [project.optional-dependencies] ---\n   # LangChain (peer framework; use overrides when combining with langchain-extras)\n   uv pip install -e \".[langchain]\"\n   uv pip install --override extras-overrides.txt -e \".[langchain,langchain-extras]\"\n   uv pip install --override extras-overrides.txt -e \".[langchain,langchain-extras,age-extras]\"\n   python scripts/patch_langchain_age.py\n   uv pip install --override extras-overrides.txt -e \".[surrealdb-extras]\"\n   uv pip install \"surrealdb\u003e=2.0\" \"langchain-core\u003e=1.3\"\n   uv pip install --override extras-overrides.txt -e \".[spanner-extras]\"\n   uv pip uninstall llama-index\n\n   # RDF extras (base install already includes rdflib/pyoxigraph; use these if you need the named groups)\n   uv pip install -e \".[rdf]\"\n   uv pip install -e \".[rdf-full]\"\n\n   # Observability\n   uv pip install -e \".[observability]\"\n   uv pip install -e 
\".[observability-openlit]\"\n   uv pip install -e \".[observability-dual]\"\n\n   # Development tests / tooling\n   uv pip install -e \".[dev]\"\n\n   # Docling OCR backends (see DOCLING_OCR in env-sample)\n   uv pip install -e \".[docling-ocr-easyocr]\"\n   uv pip install -e \".[docling-ocr-tesserocr]\"\n   uv pip install -e \".[docling-ocr-ocrmac]\"   # macOS only\n\n   # Embedded ArcadeDB (not a bracket extra; bundled JVM); quote the spec so \u003e is not a shell redirect\n   uv pip install \"arcadedb\u003e=26.3.2\"\n   ```\n\n   **uv-managed venv** (alternative): change `managed = false` to `managed = true` in the `[tool.uv]` section of `pyproject.toml`, then just run `uv pip install -e .`.\n\n   Notes: run only the optional lines you need. For **`age-extras`**, run **`patch_langchain_age.py`** on **Python 3.14+** (safe on 3.12/3.13). For **`surrealdb-extras`**, keep the follow-up **`surrealdb` / `langchain-core`** upgrades. For **`spanner-extras`**, **`uv pip uninstall llama-index`** removes the meta-package pulled in by **`llama-index-spanner`**. See [Optional](#optional) under **Prerequisites** for context.\n\n   **Windows Note**: If installation fails with a \"Microsoft Visual C++ 14.0 or greater is required\" error, install [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) (required for compiling Docling dependencies). Select \"Desktop development with C++\" during installation.\n\n3. Create a `.env` file by copying the sample and customizing:\n   ```bash\n   cp env-sample.txt .env   # Linux/macOS\n   copy env-sample.txt .env  # Windows\n   ```\n\n   Edit `.env` with your specific configuration. See [docs/GETTING-STARTED/ENVIRONMENT-CONFIGURATION.md](docs/GETTING-STARTED/ENVIRONMENT-CONFIGURATION.md) for a detailed setup guide.\n\n**Note**: The system requires Python 3.12, 3.13, or 3.14 as specified in `pyproject.toml` (requires-python = \"\u003e=3.12,\u003c3.15\"). Python 3.12 and 3.13 are fully tested. Python 3.14 works with the patches applied automatically in `main.py` at startup. 
Virtual environment management is controlled by `managed = false` in the `[tool.uv]` section of `pyproject.toml` (you control venv creation and naming).\n\n4. Start the backend:\n   ```bash\n   flexible-graphrag        # after uv pip install flexible-graphrag\n   # or: uv run start.py   # with source\n   ```\n\nThe backend will be available at `http://localhost:8000`.\n\n### Frontend Setup (Standalone)\n\n**Standalone backend and frontend URLs:**\n- **Backend API**: http://localhost:8000 (FastAPI server)\n- **Angular**: http://localhost:4200 (npm start)\n- **React**: http://localhost:5173 (npm run dev)  \n- **Vue**: http://localhost:3000 (npm run dev)\n\nChoose one of the following frontend options to work with:\n\n#### React Frontend\n\n1. Navigate to the React frontend directory:\n   ```bash\n   cd flexible-graphrag-ui/frontend-react\n   ```\n\n2. Install Node.js dependencies (first time only):\n   ```bash\n   npm install\n   ```\n\n3. Start the development server (uses Vite):\n   ```bash\n   npm run dev\n   ```\n\nThe React frontend will be available at `http://localhost:5173`. If that port is already in use, Vite falls back to the next free port (for example 5174), so check the URL printed by `npm run dev`.\n\n#### Angular Frontend\n\n1. Navigate to the Angular frontend directory:\n   ```bash\n   cd flexible-graphrag-ui/frontend-angular\n   ```\n\n2. Install Node.js dependencies (first time only):\n   ```bash\n   npm install\n   ```\n\n3. Start the development server (uses Angular CLI):\n   ```bash\n   npm start\n   ```\n\nThe Angular frontend will be available at `http://localhost:4200`.\n\n#### Vue Frontend\n\n1. Navigate to the Vue frontend directory:\n   ```bash\n   cd flexible-graphrag-ui/frontend-vue\n   ```\n\n2. Install Node.js dependencies (first time only):\n   ```bash\n   npm install\n   ```\n\n3. Start the development server (uses Vite):\n   ```bash\n   npm run dev\n   ```\n\nThe Vue frontend will be available at `http://localhost:3000`.\n\n## UI Usage\n\nThe system provides a tabbed interface for document processing and querying. Follow these steps in order. 
See [docs/UI-GUIDE/UI-GUIDE.md](docs/UI-GUIDE/UI-GUIDE.md) for full details.\n\n### 1. Sources Tab\n\nConfigure your data source and select files for processing. The system supports **13 data sources**.\n\n**Detailed Configuration:**\n\n#### File Upload Data Source\n- **Select**: \"File Upload\" from the data source dropdown\n- **Add Files**: \n  - **Drag \u0026 Drop**: Drag files directly onto the upload area\n  - **Click to Select**: Click the upload area to open the file selection dialog (supports multi-select)\n  - **Note**: If you drag \u0026 drop new files after selecting via the dialog, only the dragged files will be used\n- **Supported Formats**: PDF, DOCX, XLSX, PPTX, TXT, MD, HTML, CSV, PNG, JPG, and more\n- **Next Step**: Click \"CONFIGURE PROCESSING →\" to proceed to the Processing tab\n\n#### Alfresco Repository\n- **Select**: \"Alfresco Repository\" from the data source dropdown\n- **Configure**:\n  - Alfresco Base URL (e.g., `http://localhost:8080/alfresco`)\n  - Username and password\n  - Path (e.g., `/Sites/example/documentLibrary`)\n- **Next Step**: Click \"CONFIGURE PROCESSING →\" to proceed to the Processing tab\n\n#### CMIS Repository\n- **Select**: \"CMIS Repository\" from the data source dropdown\n- **Configure**: \n  - CMIS Repository URL (e.g., `http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom`)\n  - Username and password\n  - Folder path (e.g., `/Sites/example/documentLibrary`)\n- **Next Step**: Click \"CONFIGURE PROCESSING →\" to proceed to the Processing tab\n\n**All Data Sources** (13 available):\n- **File Upload**: File Upload (drag \u0026 drop or file selection dialog)\n- **Web Sources**: Web Page, Wikipedia, YouTube\n- **Cloud Storage**: Amazon S3, Google Cloud Storage, Azure Blob Storage, Google Drive, Microsoft OneDrive\n- **Enterprise Repositories**: Alfresco, Microsoft SharePoint, Box, CMIS\n\nSee the [Data Sources](#data-sources) section for complete details on all 13 sources.\n\n### 2. 
Processing Tab\n\nProcess your selected documents and monitor progress:\n\n- **Start Processing**: Click \"START PROCESSING\" to begin document ingestion\n- **Monitor Progress**: View real-time progress bars for each file\n- **File Management**: \n  - Use checkboxes to select files\n  - Click \"REMOVE SELECTED (N)\" to remove selected files from the list\n  - **Note**: This removes files from the processing queue, not from your system\n- **Processing Pipeline**: Documents are processed through Docling conversion, vector indexing, and knowledge graph creation\n\n### 3. Search Tab\n\nPerform searches on your processed documents:\n\n#### Hybrid Search\n- **Purpose**: Find and rank the most relevant document excerpts\n- **Usage**: Enter search terms or phrases (e.g., \"machine learning algorithms\", \"financial projections\")\n- **Action**: Click \"SEARCH\" button\n- **Results**: Ranked list of document excerpts with relevance scores and source information\n- **Best for**: Research, fact-checking, finding specific information across documents\n\n#### Q\u0026A Query\n- **Purpose**: Get AI-generated answers to natural language questions\n- **Usage**: Enter natural language questions (e.g., \"What are the main findings in the research papers?\")\n- **Action**: Click \"ASK\" button\n- **Results**: AI-generated narrative answers that synthesize information from multiple documents\n- **Best for**: Summarization, analysis, getting overviews of complex topics\n\n### 4. 
Chat Tab\n\nInteractive conversational interface for document Q\u0026A:\n\n- **Chat Interface**: \n  - **Your Questions**: Displayed on the right side vertically\n  - **AI Answers**: Displayed on the left side vertically\n- **Usage**: Type questions and press Enter or click send\n- **Conversation History**: All questions and answers are preserved in the chat history\n- **Clear History**: Click \"CLEAR HISTORY\" button to start a new conversation\n- **Best for**: Iterative questioning, follow-up queries, conversational document exploration\n\n### Testing Cleanup\n\nBetween tests you can clean up data:\n- **Run `cleanup.py`**: Clears vector, graph, and search indexes in one step — run from the `flexible-graphrag` directory\n- **Vector Indexes**: See [docs/DATABASES/VECTOR-DATABASES/VECTOR-DIMENSIONS.md](docs/DATABASES/VECTOR-DATABASES/VECTOR-DIMENSIONS.md) for vector database cleanup instructions\n- **Graph Data**: See [docs/DATABASES/GRAPH-DATABASES/README-neo4j.md](docs/DATABASES/GRAPH-DATABASES/README-neo4j.md) for graph-related cleanup commands\n\n## MCP Server Setup (Quickstart)\n\nThe MCP server (`flexible-graphrag-mcp`) is a lightweight standalone package that connects MCP clients (Claude Desktop, Cursor, etc.) to the Flexible GraphRAG backend via its REST API.\n\nFor full details see [`flexible-graphrag-mcp/README.md`](flexible-graphrag-mcp/README.md) and [`flexible-graphrag-mcp/QUICK-USAGE-GUIDE.md`](flexible-graphrag-mcp/QUICK-USAGE-GUIDE.md). For the full list of available MCP tools see [MCP Tools for Claude Desktop and Other MCP Clients](#mcp-tools-for-claude-desktop-and-other-mcp-clients) below.\n\n### Steps\n\n1. **First terminal — install and run the flexible-graphrag backend** (see [Python Backend Setup](#python-backend-setup-standalone) above) — it must be running on `http://localhost:8000`.\n\n2. 
**Second terminal — install and start the MCP server** in HTTP mode:\n   ```bash\n   uv venv venv-mcp --python 3.13\n   venv-mcp\\Scripts\\Activate   # Windows\n   source venv-mcp/bin/activate  # Linux/macOS\n   uv pip install flexible-graphrag-mcp\n   flexible-graphrag-mcp --http --port 3001\n   ```\n\n3. **Third terminal — test with MCP Inspector**:\n   ```bash\n   npx @modelcontextprotocol/inspector\n   ```\n   Open the URL printed in the console (token pre-filled), set transport to **Streamable HTTP**, URL to `http://localhost:3001/mcp`, then click **Connect**.\n\n4. **Use with Claude Desktop and other MCP clients** — see [`flexible-graphrag-mcp/README.md`](flexible-graphrag-mcp/README.md) for stdio transport config and client-specific setup.\n\n## MCP Tools for Claude Desktop and Other MCP Clients\n\nThe MCP server provides 9 specialized tools for document intelligence workflows:\n\n| Tool | Purpose | Usage |\n|------|---------|-------|\n| `get_system_status()` | System health and configuration | Verify setup and database connections |\n| `ingest_documents()` | Bulk document processing | All sources support `skip_graph`; filesystem/Alfresco/CMIS use `paths`; Alfresco also supports `nodeDetails` list (13 sources have their own config: filesystem, repositories (Alfresco, SharePoint, Box, CMIS), cloud storage, web) |\n| `ingest_text(content, source_name)` | Custom text analysis | Analyze specific text content |\n| `search_documents(query, top_k)` | Hybrid document retrieval | Find relevant document excerpts |\n| `query_documents(query, top_k)` | AI-powered Q\u0026A | Generate answers from document corpus |\n| `test_with_sample()` | System verification | Quick test with sample content |\n| `check_processing_status(id)` | Async operation monitoring | Track long-running ingestion tasks |\n| `get_python_info()` | Environment diagnostics | Debug Python environment issues |\n| `health_check()` | Backend connectivity | Verify API server connection |\n\n### Client 
Support\n- **Claude Desktop and other MCP clients**: Native MCP integration with stdio transport\n- **MCP Inspector**: HTTP transport for debugging and development\n- **Multiple Installation**: pipx (system-wide) or uvx (no-install) options\n\n## Backend REST API\n\nThe FastAPI backend provides the following REST API endpoints:\n\n**Base URL**: `http://localhost:8000/api/`\n\n**System**\n\n| Endpoint | Method | Purpose |\n|---|---|---|\n| `/api/health` | GET | Health check — verify backend is running |\n| `/api/status` | GET | System status and configuration (databases, LLM, feature flags) |\n| `/api/info` | GET | System information and package versions |\n| `/api/python-info` | GET | Python environment diagnostics |\n\n**Ingestion**\n\n| Endpoint | Method | Purpose |\n|---|---|---|\n| `/api/ingest` | POST | Ingest documents from a data source (`filesystem`, `s3`, `web`, `cmis`, ...) |\n| `/api/upload` | POST | Upload files directly for processing |\n| `/api/ingest-text` | POST | Ingest raw text content |\n| `/api/test-sample` | POST | Test the system with built-in sample content |\n| `/api/cleanup-uploads` | POST | Remove temporarily uploaded files |\n\n**Async Processing**\n\n| Endpoint | Method | Purpose |\n|---|---|---|\n| `/api/processing-status/{id}` | GET | Poll status of an async ingestion operation |\n| `/api/processing-events/{id}` | GET | Server-Sent Events stream for real-time progress |\n| `/api/cancel-processing/{id}` | POST | Cancel an ongoing processing operation |\n\n**Search \u0026 Query**\n\n| Endpoint | Method | Purpose |\n|---|---|---|\n| `/api/search` | POST | Hybrid search — returns ranked document excerpts |\n| `/api/query` | POST | AI-powered Q\u0026A — generates an answer from the document corpus |\n\n**Graph**\n\n| Endpoint | Method | Purpose |\n|---|---|---|\n| `/api/graph` | GET | Graph database status and node/relationship counts (Neo4j: live Cypher counts; other LC-backed stores: counts via `lc_graph.query()` where supported; 
remaining stores: status + dashboard URL) |\n| `/api/graph/query` | POST | Execute a native graph query against the configured store — Cypher (Neo4j, Memgraph, FalkorDB, ArcadeDB, Ladybug, Apache AGE), AQL (ArangoDB), SurrealQL (SurrealDB), Gremlin (Cosmos), GSQL (TigerGraph), openCypher (Neptune/Analytics), GQL (Spanner), SPARQL fallback for RDF-only |\n\n**RDF / Ontology** *(when `RDF_GRAPH_DB` is configured)*\n\n| Endpoint | Method | Purpose |\n|---|---|---|\n| `/api/rdf/query/sparql` | POST | Execute a SPARQL query against the configured RDF store |\n| `/api/rdf/ontology/info` | GET | Return loaded ontology entity and relation type lists |\n| `/api/rdf/ontology/upload` | POST | Upload a new ontology file at runtime |\n| `/api/rdf/rdf-store/list` | GET | List registered RDF stores |\n| `/api/rdf/rdf-store/connect` | POST | Register an additional RDF store at runtime |\n| `/api/rdf/rdf-store/{name}` | DELETE | Deregister an RDF store |\n| `/api/rdf/export/rdf` | POST | Export knowledge graph as RDF *(501 stub — not yet implemented)* |\n\n**Interactive API Documentation** (requires running backend):\n\n| UI | URL | Notes |\n|---|---|---|\n| **Swagger UI** | http://localhost:8000/docs | Try endpoints, inspect schemas, submit requests |\n| **ReDoc** | http://localhost:8000/redoc | Cleaner read-only reference view |\n\nSee [docs/DEVELOPER/REST-API.md](docs/DEVELOPER/REST-API.md) for the full endpoint reference with request/response examples.\n\n## Full-Stack Debugging (Standalone Mode)\n\nVS Code launch configurations, backend/frontend debugging, log levels, and MCP Inspector setup — see [docs/DEVELOPER/DEVELOPER-FULL-STACK-DEBUGGING.md](docs/DEVELOPER/DEVELOPER-FULL-STACK-DEBUGGING.md).\n\n## Observability and Monitoring\n\nFlexible GraphRAG includes comprehensive observability features for production monitoring:\n\n- **OpenTelemetry Integration**: Industry-standard instrumentation with automatic LlamaIndex tracing\n- **Distributed Tracing**: Jaeger UI for 
visualizing complete request flows\n- **Metrics Collection**: Prometheus for RAG-specific metrics (retrieval/LLM latency, token usage, entity/relation counts)\n- **Visualization**: Grafana dashboards with pre-configured RAG metrics panels\n- **Dual Mode Support**: OpenInference (LlamaIndex) + OpenLIT (optional) as dual OTLP producers\n- **Custom Instrumentation**: Decorators for adding tracing to custom code\n\n### Quick Start\n\n1. Install observability dependencies (optional):\n   ```bash\n   cd flexible-graphrag\n   uv pip install -e \".[observability-dual]\"  # OpenInference (LlamaIndex + LangChain) + OpenLIT (recommended)\n   # Or combine with dev tools: uv pip install -e \".[observability-dual,dev]\"\n   ```\n\n2. Enable in `.env`:\n   ```bash\n   ENABLE_OBSERVABILITY=true\n   OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318\n   OBSERVABILITY_BACKEND=both  # openinference, openlit, or both (recommended)\n   ```\n\n3. Start observability stack:\n   ```bash\n   cd docker\n   # Uncomment observability.yaml in docker-compose.yaml first\n   docker-compose -f docker-compose.yaml -p flexible-graphrag up -d\n   ```\n\n4. 
Access dashboards:\n   - **Grafana**: http://localhost:3009 (admin/admin) - RAG metrics dashboards\n   - **Jaeger**: http://localhost:16686 - Distributed tracing\n   - **Prometheus**: http://localhost:9090 - Raw metrics\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"./screen-shots/observability/observability-grafana-prometheus-jaeger-ui.png\"\u003e\n    \u003cimg src=\"./screen-shots/observability/observability-grafana-prometheus-jaeger-ui.png\" alt=\"Observability Dashboard\" width=\"700\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\nSee [docs/DEVELOPER/OBSERVABILITY/OBSERVABILITY.md](docs/DEVELOPER/OBSERVABILITY/OBSERVABILITY.md) for complete setup, custom instrumentation, and production best practices.\n\n## Project Structure\n\n- `/flexible-graphrag`: Python FastAPI backend\n  - `main.py`: FastAPI REST API server\n  - `backend.py`: Shared business logic used by both API and MCP\n  - `config.py`: Configurable settings for data sources, databases, and LLM providers\n  - `factories.py`: Factory classes for LLM and database creation\n  - `hybrid_system.py`: Main hybrid search and ingestion system\n  - `post_ingestion_state.py`: Post-ingestion document state tracking\n  - `query_engine.py`: Query engine with result deduplication and re-scoring\n  - `retriever_setup.py`: Retriever assembly — vector, search, graph, RDF, synonym expansion\n  - `schema_manager.py`: Database schema management\n  - `adapters/`: Framework-neutral ABCs and factories for all subsystems\n    - `adapters/graph/`: Property graph and RDF store adapter ABCs\n    - `adapters/llm/`: LLM and embedding adapter ABCs (`BothLLMAdapter`, `BothEmbeddingAdapter`)\n    - `adapters/process/`: Chunker and KG extractor ABCs and `build_*` factories\n    - `adapters/search/`: Search store adapter ABC\n    - `adapters/vector/`: Vector store adapter ABC\n  - `incremental_updates/`: Auto-sync engine — detectors, orchestrator, state manager for real-time/near-real-time source sync\n  - `ingest/`: Modular 
ingestion steps — `ingest_from_files`, `ingest_from_text`, `ingest_from_source`, `run_chunk_pipeline`, `update_pg_graph`, `update_rdf_graph`, `update_vector`, `update_search`\n  - `langchain/`: LangChain peer framework — graph, vector, search, chunking, KG extraction, retrieval\n    - `langchain/graph/pg_store_adapters/`: 15 property graph store adapters (one file per store)\n    - `langchain/graph/rdf_store_adapters/`: 4 RDF/SPARQL store adapters (Fuseki, GraphDB, Oxigraph, Neptune)\n    - `langchain/graph/retrievers/`: `li_`/`lc_` two-layer retriever classes — text-to-query, neighborhood, vector, logging, synonym\n    - `langchain/llm/`: LangChain LLM + embedding factories for all 13 providers\n    - `langchain/process/`: `LangChainChunkerAdapter` (6 splitter types), `LangChainKGExtractorAdapter`\n    - `langchain/search/adapters/`: BM25, Elasticsearch, OpenSearch search adapters\n    - `langchain/vector/adapters/`: 10 vector store adapters\n  - `llamaindex/`: LlamaIndex peer framework — graph, vector, search, chunking, KG extraction\n    - `llamaindex/graph/adapters/`: LlamaIndex property graph store adapters (Neo4j, ArcadeDB, FalkorDB, Memgraph, Nebula, Neptune, etc.)\n    - `llamaindex/llm/`: LlamaIndex LLM + embedding factories for all 13 providers\n    - `llamaindex/process/`: `LlamaIndexChunkerAdapter`, `LlamaIndexKGExtractorAdapter`\n    - `llamaindex/search/adapters/`: Elasticsearch, OpenSearch search adapters\n    - `llamaindex/vector/adapters/`: Qdrant, Elasticsearch, OpenSearch, pgvector, Chroma, and others\n  - `observability/`: OpenTelemetry instrumentation, Prometheus metrics, tracing setup\n  - `process/`: Core document processing — `document_processor.py` (Docling/LlamaParse), `kg_extractor.py`, `node_pipeline.py`\n  - `rdf/`: RDF/ontology support — ontology manager, KG-to-RDF converter, SPARQL tools, bundled schemas (`rdf/schemas/`)\n    - `rdf/store/`: RDF store adapters — Fuseki, GraphDB, Oxigraph, store factory\n  - `sources/`: Data source 
connectors — filesystem, CMIS/Alfresco, Azure Blob, S3, GCS, OneDrive, SharePoint, Google Drive, Box, web, Wikipedia, YouTube, etc.\n  - `stores/`: Index managers — `index_manager.py`, `rdf_manager.py`\n  - `pyproject.toml`: Modern Python package definition (PEP 517/518)\n  - `uv.toml`: UV package manager configuration\n  - `start.py`: Startup script (`flexible-graphrag` console entry point)\n  - `install.py`: Installation helper script\n\n- `/flexible-graphrag-mcp`: Standalone MCP server\n  - `main.py`: HTTP-based MCP server (calls REST API)\n  - `pyproject.toml`: MCP package definition with minimal dependencies\n  - `README.md`: MCP server setup and installation instructions\n  - `QUICK-USAGE-GUIDE.md`: Quick usage guide\n  - **Lightweight**: Only 4 dependencies (fastmcp, nest-asyncio, httpx, python-dotenv)\n\n- `/flexible-graphrag-ui`: Frontend applications\n  - `/frontend-react`: React + TypeScript frontend (built with Vite)\n    - `/src`: Source code\n    - `vite.config.ts`: Vite configuration\n    - `tsconfig.json`: TypeScript configuration\n    - `package.json`: Node.js dependencies and scripts\n\n  - `/frontend-angular`: Angular + TypeScript frontend (built with Angular CLI)\n    - `/src`: Source code\n    - `angular.json`: Angular configuration\n    - `tsconfig.json`: TypeScript configuration\n    - `package.json`: Node.js dependencies and scripts\n\n  - `/frontend-vue`: Vue + TypeScript frontend (built with Vite)\n    - `/src`: Source code\n    - `vite.config.ts`: Vite configuration\n    - `tsconfig.json`: TypeScript configuration\n    - `package.json`: Node.js dependencies and scripts\n\n- `/docker`: Docker infrastructure\n  - `docker-compose.yaml`: Main compose file with modular includes\n  - `/includes`: Modular database and service configurations\n  - `/nginx`: Reverse proxy configuration\n  - `README.md`: Docker deployment documentation\n\n- `/docs`: Documentation\n  - `ARCHITECTURE.md`: System architecture and component relationships\n  - 
`DEPLOYMENT-CONFIGURATIONS.md`: Standalone, hybrid, and full Docker deployment guides\n  - `DOCKER-RESOURCE-CONFIGURATION.md`: Docker memory/CPU configuration for Windows (WSL2), macOS, and Linux — essential for running the full stack, especially with vLLM\n  - `ENVIRONMENT-CONFIGURATION.md`: Environment setup guide with database switching\n  - `POSTGRES-SETUP.md`: PostgreSQL setup for pgvector and incremental state management\n  - `SCHEMA-EXAMPLES.md`: Knowledge graph schema examples\n  - `PERFORMANCE.md`: Performance benchmarks and optimization guides\n  - `DEFAULT-USERNAMES-PASSWORDS.md`: Database credentials and dashboard access\n  - `PORT-MAPPINGS.md`: Complete port reference for all services\n  - `DATA-SOURCES/`: Data source setup guides (Azure Blob, S3, GCS, Alfresco etc.)\n  - `DOC-PROCESSING/`: Document processing guides (Docling GPU, parser output)\n  - `GRAPH-DATABASES/`: Graph database guides (Neo4j, Neptune, Nebula, ArcadeDB, etc.)\n  - `INCREMENTAL-UPDATE-AUTO-SYNC/`: Incremental updates documentation (README, QUICKSTART, SETUP-GUIDE, API-REFERENCE)\n  - `LLM/`: LLM and embedding configuration guides\n  - `LANGCHAIN/`: LangChain integration guides (RDF QA fusion, graph retriever setup, adapter reference)\n  - `OBSERVABILITY/`: Observability and monitoring guides\n  - `RDF/`: RDF/ontology guides (store setup, ontology config, ingestion modes, SPARQL examples, user guide)\n  - `VECTOR-DATABASES/`: Vector database guides (dimensions, integration, Chroma modes)\n\n- `/scripts`: Utility scripts\n  - `create_opensearch_pipeline.py`: OpenSearch hybrid search pipeline setup\n  - `setup-opensearch-pipeline.sh/.bat`: Cross-platform pipeline creation\n  - `rdf_cleanup.py`: RDF store CLI tool — list-docs, count, clear-doc, clear-all\n  - `litellm_config.yaml`: Sample LiteLLM proxy config (copy to your LiteLLM install dir)\n  - `/incremental`: Incremental updates control scripts\n    - `sync-now.sh/.ps1/.bat`: Trigger immediate synchronization\n    - 
`set-refresh-interval.sh/.ps1/.bat`: Configure polling interval\n    - `README.md`: Script usage documentation\n\n- `/tests`: Test suite\n  - `test_bm25_*.py`: BM25 configuration and integration tests\n  - `conftest.py`: Test configuration and fixtures\n  - `run_tests.py`: Test runner\n\n- `/examples`: Standalone usage examples (not re-tested)\n  - `observability_example.py`: OpenTelemetry / observability integration example\n  - `/rdf`: RDF/ontology examples\n    - `sparql_examples.py`: Sample SPARQL queries for all three stores\n    - `unified_query_engine_examples.py`: `UnifiedQueryEngine` usage examples\n    - `store_index_example.py`: Build a LlamaIndex from an RDF store\n    - `ontology_guided_ingestion_example.py`: `OntologyAwarePropertyGraphBuilder` usage\n    - `ingest_with_ontology.py`: Ontology-guided ingestion example class\n    - `rdf_export_import_examples.py`: RDF export/import patterns\n    - `config_rdf_stores.py`: RDF store config reference snippets\n\n## License\n\nThis project is licensed under the terms of the Apache License 2.0. See the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevereiner%2Fflexible-graphrag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstevereiner%2Fflexible-graphrag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevereiner%2Fflexible-graphrag/lists"}