{"id":50454318,"url":"https://github.com/umerjavaidkh/graphrag-examples","last_synced_at":"2026-06-01T01:30:39.218Z","repository":{"id":359168928,"uuid":"1244379835","full_name":"umerjavaidkh/graphrag-examples","owner":"umerjavaidkh","description":"GraphRAG on Neo4j Community Docker — entity extraction from PDFs + structured CSV import + semantic search + agentic Q\u0026A with Semantic Kernel and OpenAI","archived":false,"fork":false,"pushed_at":"2026-05-20T17:09:57.000Z","size":42408,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-20T22:44:48.892Z","etag":null,"topics":["ai-agent","docker","embeddings","entity-extraction","generative-ai","graph-database","graphrag","knowledge-graph","neo4j","nlp","openai","rag","recommendation-system","retrieval-augmented-generation","semantic-kernel","vector-search"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/umerjavaidkh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-20T08:00:22.000Z","updated_at":"2026-05-20T17:10:03.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/umerjavaidkh/graphrag-examples","commit_stats":null,"previous_names":["umerjavaidkh/graphrag-examples"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/umerjavaidkh/graphrag-examples","repository_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umerjavaidkh%2Fgraphrag-examples","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umerjavaidkh%2Fgraphrag-examples/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umerjavaidkh%2Fgraphrag-examples/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umerjavaidkh%2Fgraphrag-examples/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/umerjavaidkh","download_url":"https://codeload.github.com/umerjavaidkh/graphrag-examples/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umerjavaidkh%2Fgraphrag-examples/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33756575,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agent","docker","embeddings","entity-extraction","generative-ai","graph-database","graphrag","knowledge-graph","neo4j","nlp","openai","rag","recommendation-system","retrieval-augmented-generation","semantic-kernel","vector-search"],"created_at":"2026-06-01T01:30:38.435Z","updated_at":"2026-06-01T01:30:39.213Z","avatar_url":"https://github.com/umerjavaidkh.png","language":"Python","funding_links":[],"categorie
s":[],"sub_categories":[],"readme":"# Customer Graph — GraphRAG Setup Guide\n\n\u003e **Base repository:** https://github.com/neo4j-product-examples/graphrag-examples/tree/main/customer-graph\n\u003e\n\u003e This guide adapts the original tutorial to run on **Neo4j Community Edition via Docker** instead of AuraDB Professional. All code changes required to make this work are documented in [`CODE_CHANGES.md`](./CODE_CHANGES.md).\n\n---\n\n## What This Project Does\n\nBuilds a GraphRAG (Graph Retrieval-Augmented Generation) system over a fashion retail dataset by combining:\n\n- **Unstructured data** — PDFs (fashion catalog, credit notes) extracted into a knowledge graph using LLM entity extraction\n- **Structured data** — CSV files (customers, orders, articles, products, suppliers) imported as graph nodes and relationships\n- **Vector embeddings** — Product descriptions embedded with OpenAI for semantic search\n- **Agentic Q\u0026A** — A Semantic Kernel agent that answers natural language questions by traversing the graph\n\n---\n\n## Why Docker Community Instead of AuraDB\n\nThe original tutorial uses **AuraDB Professional** which provides:\n- Aura Importer (GUI-based CSV-to-graph tool)\n- GenAI plugin for in-database vector embedding\n- Graph Data Science (GDS) plugin\n\nWe replace all of this with:\n- **Manual `LOAD CSV` Cypher queries** instead of Aura Importer\n- **Python + OpenAI batched API calls** instead of GenAI plugin\n- **`graph-data-science` Docker plugin** for community GDS support\n\n---\n\n## Prerequisites\n\n- Python 3.13+\n- Docker Desktop installed and running\n- OpenAI API key\n- Git\n\n---\n\n## Step 1 — Clone the Repository\n\n```bash\ngit clone https://github.com/neo4j-product-examples/graphrag-examples.git\ncd graphrag-examples/customer-graph\n```\n\n---\n\n## Step 2 — Create Python Virtual Environment\n\n```bash\n\nbrew unlink python@3.14\nbrew link --overwrite python@3.13\n\npython3 -m venv venv\nsource venv/bin/activate   # Mac/Linux\n\ncd 
customer-graph\npip install -r requirements.txt\n```\n\n---\n\n## Step 3 — Configure Environment Variables\n\n```bash\ncp .env.example .env\n```\n\nEdit `.env` with your credentials:\n\n```env\nNEO4J_URI=bolt://localhost:7687\nNEO4J_USERNAME=neo4j\nNEO4J_PASSWORD=password123\nOPENAI_API_KEY=sk-...\n```\n\n---\n\n## Step 4 — Start Neo4j via Docker\n\nInstead of AuraDB, run Neo4j Community Edition locally. This single command sets up Neo4j with all required plugins (APOC, APOC Extended, Graph Data Science) and creates named volumes so your data persists across container restarts:\n\n```bash\ndocker run -d \\\n  --name neo4j \\\n  -p 7474:7474 \\\n  -p 7687:7687 \\\n  -e NEO4J_AUTH=neo4j/password123 \\\n  -e NEO4J_PLUGINS='[\"apoc\", \"apoc-extended\", \"graph-data-science\"]' \\\n  -e NEO4J_dbms_security_procedures_unrestricted='apoc.*,genai.*,gds.*' \\\n  -e NEO4J_dbms_security_procedures_allowlist='apoc.*,genai.*,gds.*' \\\n  -e NEO4J_dbms_default__listen__address=0.0.0.0 \\\n  -e NEO4J_dbms_default__advertised__address=localhost \\\n  -v docker_neo4j_data:/data \\\n  -v docker_neo4j_logs:/logs \\\n  neo4j:5.18-community\n```\n\n![](img/docker_image.png)\n\nWait ~30 seconds for startup, then open Neo4j Browser at **http://localhost:7474**\nLogin: `neo4j` / `password123`\n\nVerify that the plugins loaded:\n```cypher\nRETURN gds.version()\n```\n![](img/noe4j_local_image.png)\n\n\n\u003e **Note:** The `genai` plugin is not available on Neo4j 5.18 Community. We handle embeddings in Python instead — see Step 9.\n\n---\n\n## Step 5 — Apply Code Fixes (Already Applied, No Action Needed)\n\nThe `neo4j-graphrag` library has had breaking API changes since the original tutorial was written. 
The fixes are documented in [`CODE_CHANGES.md`](./CODE_CHANGES.md) and must be in place before running any scripts.\n\nThe files in this repository are already updated, so no code changes are needed:\n- `rag_schema_from_onto.py` — renamed schema classes\n- `unstructured_ingest.py` — deprecated imports + pass schema directly\n- `ingest_post_processing.py` — replace genai plugin with Python embeddings\n- `graphrag/retail_service.py` — fix relationship paths + add missing methods\n- `graphrag/retail_plugin.py` — expose new agent tool\n\n---\n\n## Step 6 — Run Unstructured PDF Ingestion\n\nThis reads the PDFs (`data/credit-notes.pdf`, `data/fashion-catalog.pdf`), uses the ontology in `ontos/customer.ttl` to guide LLM entity extraction, and writes a knowledge graph to Neo4j:\n\n```bash\npython unstructured_ingest.py\n```\n\nThis takes several minutes. When complete, verify in Neo4j Browser:\n\n```cypher\nMATCH (n) RETURN labels(n), count(n) ORDER BY count(n) DESC\n```\n\nYou should see nodes tagged `__KGBuilder__` and `__Entity__` with labels like `CreditNote`, `Order`, `Article`, `Product`.\n\n---\n\n## Step 7 — Import Structured CSV Data\n\nThe original tutorial uses **Aura Importer** (AuraDB-only GUI tool). We replace it with `LOAD CSV` Cypher queries.\n\n### 7a — Copy CSVs into the Docker Container\n\n```bash\nfor f in data/articles.csv data/customers.csv data/order-details.csv data/suppliers.csv data/products.csv; do\n    docker cp \"$f\" neo4j:/var/lib/neo4j/import/\ndone\n```\n\nVerify that the files are inside the container:\n\n```bash\ndocker exec neo4j ls /var/lib/neo4j/import/\n```\n\n### 7b — Run LOAD CSV Queries in Neo4j Browser\n\nRun each block **one at a time, in this exact order**:\n\n**1. Suppliers**\n```cypher\nLOAD CSV WITH HEADERS FROM 'file:///suppliers.csv' AS row\nMERGE (s:Supplier {supplierId: row.supplierId})\nSET s.name = row.supplierName,\n    s.address = row.supplierAddress;\n```\n\n**2. 
Products**\n```cypher\nLOAD CSV WITH HEADERS FROM 'file:///products.csv' AS row\nMERGE (p:Product {productCode: row.productCode})\nSET p.name = row.prodName,\n    p.productTypeNo = row.productTypeNo,\n    p.productTypeName = row.productTypeName,\n    p.productGroupName = row.productGroupName,\n    p.garmentGroupNo = row.garmentGroupNo,\n    p.garmentGroupName = row.garmentGroupName,\n    p.description = row.detailDesc;\n```\n\n**3. Articles** (links to Products and Suppliers)\n```cypher\nLOAD CSV WITH HEADERS FROM 'file:///articles.csv' AS row\nMERGE (a:Article {articleId: row.articleId})\nSET a.productCode = row.productCode,\n    a.name = row.prodName,\n    a.productTypeName = row.productTypeName,\n    a.graphicalAppearanceNo = row.graphicalAppearanceNo,\n    a.graphicalAppearanceName = row.graphicalAppearanceName,\n    a.colourGroupCode = row.colourGroupCode,\n    a.colourGroupName = row.colourGroupName\nWITH a, row\nMATCH (p:Product {productCode: row.productCode})\nMERGE (a)-[:VARIANT_OF]-\u003e(p)\nWITH a, row\nMATCH (s:Supplier {supplierId: row.supplierId})\nMERGE (a)-[:SUPPLIED_BY]-\u003e(s);\n```\n\n**4. Customers**\n```cypher\nLOAD CSV WITH HEADERS FROM 'file:///customers.csv' AS row\nMERGE (c:Customer {customerId: row.customerId})\nSET c.firstName = row.fn,\n    c.active = row.active,\n    c.clubMemberStatus = row.clubMemberStatus,\n    c.fashionNewsFrequency = row.fashionNewsFrequency,\n    c.age = toInteger(row.age),\n    c.postalCode = row.postalCode;\n```\n\n**5. 
Orders, Transactions and Relationships**\n\n\u003e ⚠️ **Important:** Use `toInteger(row.orderId)` — this is critical for linking with PDF-extracted entities in the next step.\n\n```cypher\nLOAD CSV WITH HEADERS FROM 'file:///order-details.csv' AS row\nMERGE (o:Order {orderId: toInteger(row.orderId)})\nWITH o, row\nMERGE (t:Transaction {txId: row.txId})\nSET t.date = row.tDat,\n    t.price = toFloat(row.price),\n    t.salesChannelId = row.salesChannelId\nMERGE (o)-[:HAS_TRANSACTION]-\u003e(t)\nWITH o, t, row\nMATCH (c:Customer {customerId: row.customerId})\nMERGE (c)-[:PLACED]-\u003e(o)\nWITH o, t, row\nMATCH (a:Article {articleId: row.articleId})\nMERGE (t)-[:CONTAINS]-\u003e(a);\n```\n\n---\n\n## Step 8 — Create Cross-Links Between Structured and Unstructured Data\n\nThe LLM extracts `orderId` and `articleId` as integers from PDFs, but `LOAD CSV` imports them as strings by default. This causes joins between structured (CSV) and unstructured (PDF) nodes to silently fail. Run these three queries in Neo4j Browser to fix the types and create the cross-links:\n\n**Fix Article ID type (string → integer):**\n```cypher\nMATCH (a:Article) WHERE NOT '__KGBuilder__' IN labels(a)\nSET a.articleId = toInteger(a.articleId)\n```\n\n**Link CreditNotes to structured Articles:**\n```cypher\nMATCH (c:CreditNote)-[:REFUND_OF_ARTICLE]-\u003e(a1:Article)\nWHERE '__KGBuilder__' IN labels(a1)\nMATCH (a2:Article) WHERE NOT '__KGBuilder__' IN labels(a2)\nAND a2.articleId = a1.articleId\nMERGE (c)-[:REFUND_OF_ARTICLE_STRUCTURED]-\u003e(a2)\n```\n\n**Link CreditNotes to Suppliers via the Order chain:**\n```cypher\nMATCH (c:CreditNote)-[:REFUND_FOR_ORDER]-\u003e(o1:Order)\nMATCH (o2:Order)-[:HAS_TRANSACTION]-\u003e(t:Transaction)-[:CONTAINS]-\u003e(a:Article)-[:SUPPLIED_BY]-\u003e(s:Supplier)\nWHERE o1.orderId = o2.orderId\nMERGE (c)-[:RETURNED_TO_SUPPLIER]-\u003e(s)\n```\n\nVerify both links were created:\n```cypher\nMATCH (c:CreditNote)-[:REFUND_OF_ARTICLE_STRUCTURED]-\u003e(a) RETURN 
count(*) AS articleLinks\n```\n```cypher\nMATCH (c:CreditNote)-[:RETURNED_TO_SUPPLIER]-\u003e(s) RETURN count(*) AS supplierLinks\n```\n\nBoth should return values greater than 0.\n\n---\n\n## Step 9 — Run Post-Processing (Embeddings + Vector Index)\n\nThe original tutorial uses the `genai.vector.encodeBatch` Neo4j procedure (not available on Community 5.18). The updated `ingest_post_processing.py` generates embeddings directly via the OpenAI Python SDK in batches of 500:\n\n```bash\npython ingest_post_processing.py\n```\n\nExpected output:\n```\nFormatting Product Text\nCreating Product Text Embeddings\n  Found 8018 products to embed\n  Embedded 500/8018 products\n  Embedded 1000/8018 products\n  ...\n  Embedded 8018/8018 products\nCreating Product Vector Index\nWaiting for vector index to come online...\nDone.\n```\n\n---\n\n## Step 10 — Run the Agent\n\n```bash\ncd graphrag\npython cli_agent.py\n```\n\nThe agent uses Semantic Kernel with OpenAI `gpt-4o-mini` and has access to these tools:\n- **`search_products`** — semantic vector search over product descriptions\n- **`recommend_products`** — graph-based collaborative filtering\n- **`create_customer_segments`** — GDS Leiden community detection\n- **`get_product_order_supplier_info`** — order and return stats by product\n- **`get_supplier_order_product_info`** — order and return stats by supplier\n- **`get_top_suppliers_by_returns`** — ranks all suppliers by credit note count\n- **`answer_general_question`** — text-to-Cypher for arbitrary graph queries\n\n### Sample Questions\n\n```\n**Q: What are some good sweaters for spring? 
Nothing too warm please!**\n\nHere are some great lightweight sweaters perfect for spring:\n\n| # | Product | Description |\n|---|---------|-------------|\n| 1 | [Queen Sweater](https://representative-domain/product/677930) | Lightweight sweatshirt fabric with ribbing around neckline, cuffs, and hem |\n| 2 | [Stressan Light Knit Jumper](https://representative-domain/product/358483) | Light, fine, soft knit with long sleeves, raw edges, rounded hem |\n| 3 | [King Sweater](https://representative-domain/product/716999) | Short top in lightweight sweatshirt fabric with ribbed details |\n| 4 | [Sorbet Sweatshirt](https://representative-domain/product/822888) | Boxy-style top with round neckline and low dropped shoulders |\n| 5 | [Grace Sweater](https://representative-domain/product/796033) | Soft knit with low dropped shoulders and ribbed neckline |\n| 6 | [Sandrine](https://representative-domain/product/827370) | Cotton blend top with wide ribbing around neckline |\n| 7 | [Puff Sweater](https://representative-domain/product/783925) | Soft fine knit with wool, relaxed fit, dropped shoulders |\n| 8 | [Buffy Lace Sweater](https://representative-domain/product/758790) | Soft rib knit with lace sections and dropped shoulders |\n```\n\n```\nWhich suppliers have the highest number of returns (i.e., credit notes)?\n```\n```\nWhat are the top 3 most returned products for supplier 1616? Get those product codes and find other suppliers who have less returns for each product I can use instead.\n```\n```\nCan you run a customer segmentation analysis?\n```\n```\nWhat are the most common product types purchased for each segment?\n```\n```\nFor the largest group make a creative spring promotional campaign for them highlighting recommended products. 
Draft it as an email.\n```\n\n---\n\n## Troubleshooting\n\n| Error | Cause | Fix |\n|---|---|---|\n| `ImportError: cannot import name 'SchemaEntity'` | Library API change | Rename to `NodeType` — see CODE_CHANGES.md |\n| `ImportError: cannot import name 'SchemaConfig'` | Library API change | Rename to `GraphSchema` — see CODE_CHANGES.md |\n| `ValidationError: List should have at least 1 item` | Pydantic now rejects empty properties list | Use `make_node()` helper — see CODE_CHANGES.md |\n| `TypeError: missing argument 'node_types'` | `create_schema_model` params renamed | See CODE_CHANGES.md |\n| `AttributeError: 'GraphSchema' has no attribute 'entities'` | Field renamed | Pass `schema=neo4j_schema` directly to `SimpleKGPipeline` |\n| `ProcedureNotFound: genai.vector.encodeBatch` | GenAI plugin not on Community 5.18 | Use Python OpenAI embeddings — see CODE_CHANGES.md |\n| Supplier/article returns always 0 | ID type mismatch between CSV (string) and PDF (integer) | Run Step 8 cross-link queries |\n| `gds.graph.drop` not found | GDS plugin missing | Add `graph-data-science` to Docker plugins — Step 4 |\n| GDS projection fails | Wrong relationship names in original code | Fix `ORDERED/CONTAINS` → `PLACED/HAS_TRANSACTION` — see CODE_CHANGES.md |\n| Agent says \"no supplier data available\" | Missing `get_top_suppliers_by_returns` tool | Add new method — see CODE_CHANGES.md |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fumerjavaidkh%2Fgraphrag-examples","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fumerjavaidkh%2Fgraphrag-examples","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fumerjavaidkh%2Fgraphrag-examples/lists"}