{"id":50119497,"url":"https://github.com/yonk-labs/pg-raggraph","last_synced_at":"2026-05-23T18:04:26.263Z","repository":{"id":355497914,"uuid":"1209372080","full_name":"yonk-labs/pg-raggraph","owner":"yonk-labs","description":"PostgreSQL-native GraphRAG. Vector search, full-text, and knowledge-graph traversal in a single SQL query. No Neo4j. No Pinecone. No AGE.","archived":false,"fork":false,"pushed_at":"2026-05-18T22:55:49.000Z","size":6058,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-19T00:55:51.989Z","etag":null,"topics":["embeddings","graphrag","knowledge-graph","llm","pgvector","postgresql","python","rag","retrieval-augmented-generation","vector-search"],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yonk-labs.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-13T11:10:04.000Z","updated_at":"2026-05-18T22:55:55.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/yonk-labs/pg-raggraph","commit_stats":null,"previous_names":["yonk-labs/pg_raggraph"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/yonk-labs/pg-raggraph","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yonk-labs%2Fpg-raggraph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yonk-labs%2Fpg-raggraph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yonk-labs%2Fpg-raggraph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yonk-labs%2Fpg-raggraph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yonk-labs","download_url":"https://codeload.github.com/yonk-labs/pg-raggraph/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yonk-labs%2Fpg-raggraph/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33406480,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-23T04:15:53.637Z","status":"ssl_error","status_checked_at":"2026-05-23T04:15:53.242Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embeddings","graphrag","knowledge-graph","llm","pgvector","postgresql","python","rag","retrieval-augmented-generation","vector-search"],"created_at":"2026-05-23T18:04:13.323Z","updated_at":"2026-05-23T18:04:26.252Z","avatar_url":"https://github.com/yonk-labs.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pg-raggraph\n\n\u003e **PostgreSQL-native GraphRAG.** Vector search, full-text search, and knowledge-graph traversal — all in a single SQL query. No Neo4j. No Pinecone. No Apache AGE. Just the Postgres you already run.\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) [![Tests](https://img.shields.io/badge/tests-385%20passing-brightgreen)](#tests-and-benchmarks) [![Python](https://img.shields.io/badge/python-3.12%20%7C%203.13-blue)](pyproject.toml) [![Status: alpha](https://img.shields.io/badge/status-alpha%20(0.3.0a3)-orange)]()\n\n---\n\n## What this is\n\npg-raggraph is a Python library for **GraphRAG on plain PostgreSQL**. You point it at a directory of documents, it ingests them — chunks, embeddings, entities, relationships, full-text index — and you get back a query API that combines vector similarity, BM25, and graph traversal. All retrieval happens in one round-trip to Postgres.\n\nIt is also a **full toolkit** around that library: a CLI (`pgrg`), an optional FastAPI server with a web UI, and an MCP server for Claude Desktop / Cursor / Zed.\n\nTwo retrieval workloads are first-class:\n\n- **Classic GraphRAG** — static corpora, code Q\u0026A, technical docs, multi-hop entity reasoning. Validated at **+18.9% accuracy lift** over plain vector search on a real 909-doc dev codebase.\n- **Evolving knowledge** — corpora where the right answer depends on *time*, *version*, or *retraction status*. Validated on Python 3.10/3.11/3.12 docs (**13/13 perfect version-filter purity**) and PubMed HRT retractions (**15/15 perfect on retraction-aware + time-travel queries**).\n\n## Why it exists\n\nMost GraphRAG today means stitching together two or three databases:\n\n- A vector DB (Pinecone, Weaviate, Qdrant) for semantic search.\n- A graph DB (Neo4j) for relationship traversal.\n- An orchestrator on top — LangChain, LlamaIndex, or hand-rolled.\n\nThat's three deploy targets, three connection pools, three sets of credentials, three failure modes, three vendors to negotiate with. And the killer GraphRAG operation — *\"find chunks similar to X, then expand via the entity graph\"* — needs at least two round-trips, often more, because vector and graph live in different worlds.\n\npg-raggraph proves you don't need any of that. PostgreSQL already has:\n\n- **pgvector** — vector similarity search with HNSW or IVFFlat indexes.\n- **pg_trgm** — trigram fuzzy matching, perfect for entity resolution.\n- **Recursive CTEs** — fast, well-indexed graph traversal that the planner understands.\n- **`tsvector` + `to_tsquery`** — production-grade full-text search with BM25-equivalent ranking.\n\nCombine them in one SQL query and you have a complete GraphRAG stack. One ACID-compliant database. One backup story. One thing to monitor. Works on **every managed Postgres** — AWS RDS, Supabase, Neon, GCP Cloud SQL, Azure, self-hosted — anywhere modern PostgreSQL runs.\n\nThe thesis is decided by benchmark, not opinion. See *Tests and benchmarks* below.\n\n## Wait — isn't it called *graphrag*, not *raggraph*?\n\nThe name flip is deliberate. Most \"GraphRAG\" systems lead with the graph: docs get converted to entities and relationships up front, the graph **is** the corpus, and retrieval is graph-walks looking for relevant subgraphs. That's the Microsoft GraphRAG / LightRAG / Neo4j-GraphRAG model.\n\nThat model misreads what most corpora actually are. Documentation, technical articles, code, support tickets, papers, chat logs — none of these start out as graphs. They're prose. They answer most questions through plain semantic similarity. Forcing them through an entity-extraction pipeline first, then querying the resulting graph, adds latency, LLM cost, and information loss without buying you much for the bulk of queries.\n\npg-raggraph inverts the order. **The graph is an enhancer, not the main attraction.** A query starts as RAG — vector similarity + BM25 — and the graph layer kicks in only when retrieval needs help: re-ranking the top-K via 1-hop entity connectivity (`naive_boost`), or expanding to chunks reachable through entity relationships when the seed retrieval is weak (`local` / `hybrid`). **Graph helps finish the story, not start it.**\n\nThis isn't aesthetic preference. The [bake-off](benchmarks/age-bakeoff/results/REPORT-VERDICT.md) confirms it: on clean technical corpora, graph-only retrieval modes don't beat plain vector + BM25. They earn their cost when the chunker is weak, when the corpus has cross-document entity reasoning, or when you need explainability and provenance trails. Calling it \"raggraph\" rather than \"graphrag\" reflects that ordering: RAG first, graph second, and only when it pays.\n\n## Quickstart — 5 minutes, works cold\n\nEvery command is copy-pasteable. You need a running Postgres 16+ with the\n`pgvector` and `pg_trgm` extensions; the `docker compose` step below sets\none up locally if you don't have one.\n\n```bash\n# 1. Install from PyPI\npip install pg-raggraph              # core SDK + CLI\n# or, with the bundled FastAPI server + web UI:\npip install 'pg-raggraph[server]'\n\n# 2. Start a local Postgres with pgvector + pg_trgm pre-installed\n#    (skip if you already have a Postgres with the extensions)\ncurl -sLo docker-compose.yml https://raw.githubusercontent.com/yonk-labs/pg-raggraph/main/docker-compose.yml\ndocker compose up -d postgres\n\n# 3. Pick an LLM endpoint (skip if you only want pure vector RAG)\n#    Option A — OpenAI:\nexport PGRG_LLM_BASE_URL=https://api.openai.com/v1\nexport PGRG_LLM_API_KEY=sk-...   # your key\nexport PGRG_LLM_MODEL=gpt-4o-mini\n\n#    Option B — local Ollama (free):\n# ollama pull llama3.2 \u0026\u0026 ollama serve   # leave running in another shell\n# (PGRG defaults to Ollama at http://localhost:11434/v1, so no env needed)\n\n# 4. Ingest a directory and ask questions\npgrg devmem ingest ./my-repo/\npgrg devmem ask \"who owns the authentication service?\"\n```\n\nPrefer to run from source? `git clone https://github.com/yonk-labs/pg-raggraph\n\u0026\u0026 cd pg-raggraph \u0026\u0026 uv sync` works the same way; substitute `uv run pgrg` for\n`pgrg` in the commands above.\n\nIf your LLM endpoint is up and your repo has docs/code, you'll see something like:\n\n```\nFound 12 files to process.\n[1/12] README.md: 8 entities, 14 rels\n[2/12] auth/service.py: 5 entities, 11 rels\n...\nDone: 12 ingested, 0 skipped. 87 entities, 156 relationships.\n\nAnswer: The authentication service is owned by the platform team.\nSarah Chen leads platform; auth.py was last touched by alex@acme.com\nin commit 4f2c8a1 (\"rotate JWT signing key\").\n\nSources:\n  [0.79] auth/README.md\n  [0.71] team/platform.md\n  [0.68] commits/4f2c8a1.md\n```\n\nThat's the whole loop. From `pip install` to a grounded answer in five minutes.\n\n\u003e **One thing to know about `pgrg serve`** — the bundled FastAPI web UI is for **local development and demos only**. It ships without authentication, rate limiting, or upload size caps. **Do not expose it directly to the public internet.** For production, put it behind a reverse proxy that adds auth, TLS, and rate limits — or embed `create_app()` in your own FastAPI application. See [`docs/user-guide.md#production-deployment`](docs/user-guide.md#production-deployment) for the recommended setup.\n\n## Tests and benchmarks\n\nReal numbers from real corpora. No cherry-picking.\n\n**Classic GraphRAG** — `pg-agents` real dev codebase (909 docs, 17K entities, 38K relationships):\n\n| Mode | Avg top score | Latency p50 | vs naive |\n|------|:-:|:-:|:-:|\n| naive (vector + BM25) | 0.602 | 109 ms | baseline |\n| **`naive_boost`** ⭐ | **0.716** | **107 ms** | **+18.9%** |\n| **`smart`** (default) | **0.716** | 127 ms | **+18.9%** at routing |\n| local (graph traversal) | 0.614 | 423 ms | +1.9% |\n| hybrid (local + global) | 0.614 | 482 ms | +1.9% |\n\n**Evolving knowledge — versioned docs** ([`benchmarks/python-versioned-docs/`](benchmarks/python-versioned-docs/)):\n\n12 docs (Python 3.10 / 3.11 / 3.12), 1364 chunks, 15 hand-written gold questions.\n\n| Threshold | Result | Pass? |\n|---|---|:-:|\n| ≥ 80% of `version_filter`-tagged Qs return top-5 chunks ONLY from matching version | **100% (13/13)** | ✅ |\n| ≥ 1 unfiltered_target Q has expected version in top-3 | 1/2 | ✅ |\n\n**Evolving knowledge — medical retractions** ([`benchmarks/medical-hrt/`](benchmarks/medical-hrt/)):\n\n48 PubMed abstracts on HRT + cardiovascular outcomes (1998–2025), 7 epistemically-retracted (WHI 2002 superseded the prior consensus), 15 hand-written gold questions.\n\n| Threshold | Result | Pass? |\n|---|---|:-:|\n| ≥ 4/5 retraction_aware Qs return top-5 with **zero** retracted in `retracted_behavior=\"hide\"` mode | **5/5** | ✅ |\n| ≥ 1/5 time-travel Qs (`as_of=2001-12-31`) return ≥1 pre-2002 paper in top-5 | **5/5** | ✅ |\n\n**Versus Apache AGE** — SCOTUS bake-off (772 docs, 30 questions × 3 runs × 6 modes per engine):\n\n| Axis | pg-raggraph | Apache AGE |\n|---|:-:|:-:|\n| Accuracy (fully_correct/30) | 17–18 | 17–18 (tie) |\n| Retrieval p50 latency | **32–73 ms** | 3,079–3,906 ms (**42–111× slower**) |\n| Cloud compatibility | RDS, Supabase, Neon, Cloud SQL, Azure, self-host | Azure only |\n\nFull bake-off report: [`benchmarks/age-bakeoff/results/REPORT-VERDICT.md`](benchmarks/age-bakeoff/results/REPORT-VERDICT.md).\n\n**Test suite:** 385 passing tests (260 unit + 125 integration) across `tests/unit/` and `tests/integration/`, including a 15-test error-path suite that asserts specific exception types on bad DSNs, naive `as_of`, oversize `/ingest`, path traversal, etc. CI runs the full suite against pgvector containers on Python 3.12 and 3.13.\n\n## Where to go next\n\n```\n       ┌──────────────────────────────────────────────────┐\n       │  I want to …                                     │\n       ├──────────────────────────────────────────────────┤\n       │  Pick the right workload         → USE-CASES.md  │\n       │  Walk a worked example           → blog series   │\n       │  Get the full API surface        → user-guide.md │\n       │  Tier-1 evolving-knowledge       → cookbook      │\n       │  Avoid common API gotchas        → API-QUICKREF  │\n       │  Read the architecture decisions → research/     │\n       │  See the unvarnished critique    → ASSESSMENT.md │\n       └──────────────────────────────────────────────────┘\n```\n\n| Document | What's inside |\n|---|---|\n| [`docs/USE-CASES.md`](docs/USE-CASES.md) | Decision matrix: classic GraphRAG vs evolving knowledge. Corpus shape → recommended config. |\n| [`docs/blogs/01-intro-classic-vs-evolving.md`](docs/blogs/01-intro-classic-vs-evolving.md) | Series intro: two workloads, one Postgres database, when each one applies. |\n| [`docs/blogs/02-path-a-versioned-python-docs.md`](docs/blogs/02-path-a-versioned-python-docs.md) | Walkthrough: ingest Python 3.10/3.11/3.12 docs, query with `version_filter`. |\n| [`docs/blogs/03-path-b-medical-retractions.md`](docs/blogs/03-path-b-medical-retractions.md) | Walkthrough: ingest PubMed HRT abstracts, demonstrate `retracted_behavior` and `as_of`. |\n| [`docs/cookbook/evolution-tracking.md`](docs/cookbook/evolution-tracking.md) | Tier 1 quickstart — `effective_from`, `retracted`, `version_label` ingest + query patterns. |\n| [`docs/EVOLUTION-API-QUICKREF.md`](docs/EVOLUTION-API-QUICKREF.md) | Common assumptions vs reality for the Tier 1 API (which kwargs are per-query vs config-only, schema column locations, semantics of `as_of` × `retracted_at`). |\n| [`docs/cookbook/per-call-kwargs.md`](docs/cookbook/per-call-kwargs.md) | Per-call overrides on `query()`/`ask()` — `retracted_behavior`, `supersession_behavior`, `memory_tier`, `retrieval_strategy`, `as_of`, `version_filter`, `evolution_aware`. Multi-tenant-safe (no config mutation). |\n| [`docs/cookbook/retrieval-strategy.md`](docs/cookbook/retrieval-strategy.md) | Three SQL shapes for metadata + vector queries — `weighted` (default), `pre_filter`, `vector_first`. When to pick which; recall-shortfall metric. |\n| [`docs/cookbook/metadata-indexes.md`](docs/cookbook/metadata-indexes.md) | Btree / GIN / generated-column indexes on `chunks.metadata` and `documents.metadata`. Runtime API (`recommend_metadata_indexes()`, `apply_metadata_indexes_concurrently()`). |\n| [`docs/user-guide.md`](docs/user-guide.md) | Full user guide. Installation, all 6 modes, configuration, REST API, production deployment, troubleshooting. |\n| [`docs/devmem-guide.md`](docs/devmem-guide.md) | `pgrg devmem` — the developer-knowledge-base flavor with code-aware chunking + dev-tuned extraction. |\n| [`research/`](research/) | Architecture rationale, vs-AGE evaluation, competitor analyses (LightRAG, Neo4j, Zep). |\n| [`ASSESSMENT.md`](ASSESSMENT.md) | No-BS project evaluation. Strengths, gaps, where you should and shouldn't use it. |\n| [`benchmarks/`](benchmarks/) | Every benchmark corpus + runner + results document. Re-runnable from clone. |\n\n---\n\n# The weeds\n\nBelow this line is the reference material — architecture, the retrieval-mode menu, every environment variable, the schema, and the prior-art rebuttals. Read on if you want to go deep; skip if you just want to get something working.\n\n## Architecture\n\n```mermaid\ngraph TB\n    subgraph PGRG[\"pg-raggraph (Python, ~4K LOC core)\"]\n        CLI[pgrg CLI]\n        API[FastAPI server]\n        MCP[MCP server]\n        SDK[GraphRAG SDK]\n        CLI --\u003e SDK\n        API --\u003e SDK\n        MCP --\u003e SDK\n        SDK --\u003e ING[Ingestion Pipeline]\n        SDK --\u003e RET[Retrieval Engine]\n        ING --\u003e CHK[Chunker\u003cbr/\u003emarkdown / code / text]\n        ING --\u003e EMB[fastembed\u003cbr/\u003elocal 384-dim]\n        ING --\u003e EXT[LLM extractor\u003cbr/\u003eOpenAI-compatible]\n        ING --\u003e RES[Entity resolver\u003cbr/\u003epg_trgm + vector]\n        RET --\u003e SM[Smart Router]\n        SM --\u003e NV[naive: vector + BM25]\n        SM --\u003e GB[graph boost: 1-hop re-rank]\n        SM --\u003e LC[local / global / hybrid:\u003cbr/\u003erecursive CTEs]\n    end\n    subgraph PG[\"PostgreSQL 16+\"]\n        PGV[pgvector HNSW]\n        PGT[pg_trgm GIN]\n        FTS[tsvector full-text]\n        TBL[(documents · chunks ·\u003cbr/\u003eentities · relationships ·\u003cbr/\u003edocument_versions ·\u003cbr/\u003efacts · fact_edges)]\n    end\n    NV --\u003e PGV\n    NV --\u003e FTS\n    GB --\u003e TBL\n    LC --\u003e TBL\n    RES --\u003e PGT\n    RES --\u003e PGV\n```\n\n**Two extensions** — `pgvector` (vector search) and `pg_trgm` (built into Postgres in most builds). Auto-bootstrapped schema. Migrations applied on first connect under a per-project advisory lock. Everything else is plain SQL.\n\n## Retrieval modes\n\n`smart` (the default) routes between three strategies based on confidence: ship-as-is when the naive top score is high, apply a cheap graph boost when medium, escalate to graph expansion when low. Manually pin to a specific mode with `mode=\"...\"` if you know your access pattern.\n\n| Mode | What it does | Typical latency |\n|------|--------------|:-:|\n| **`smart`** ⭐ | Routes between naive / boost / expand based on confidence | 85–220 ms |\n| `naive` | Vector similarity + BM25 | ~85 ms |\n| `naive_boost` | Naive + 1-hop graph re-rank | ~90 ms |\n| `local` | Seed → recursive CTE traversal → rank | ~220 ms |\n| `global` | Relationship-centric retrieval | ~150 ms |\n| `hybrid` | local + global merged | ~450 ms |\n\nFull deep-dive with selection guidance and per-mode SQL: [`docs/modes.md`](docs/modes.md). Schema diagram + ER relationships: [`docs/user-guide.md#schema-overview`](docs/user-guide.md#schema-overview).\n\n## Configuration (essentials)\n\nAll settings via env vars prefixed `PGRG_` (also work as kwargs to `GraphRAG(...)`). The most-used ones:\n\n| Variable | Default | What it does |\n|----------|---------|--------------|\n| `PGRG_DSN` | `postgresql://postgres:postgres@localhost:5434/pg_raggraph` | Database connection. Refuses to start if `PGRG_ENV=production` and DSN unchanged. |\n| `PGRG_NAMESPACE` | `default` | Data isolation key. |\n| `PGRG_LLM_BASE_URL` | `http://localhost:11434/v1` | OpenAI-compatible LLM endpoint. |\n| `PGRG_LLM_API_KEY` | `\"\"` | Bearer token (empty for Ollama). |\n| `PGRG_EVOLUTION_TIER` | `off` | `off` / `structural` (Tier 1 evolution-aware). |\n| `PGRG_INGEST_PROFILE` | `balanced` | `conservative` / `balanced` / `aggressive` / `max`. |\n| `PGRG_LOG_FORMAT` | (unset) | Set to `json` for structured logging (Datadog / ELK / Loki). |\n| `PGRG_SERVER_API_KEY` | (unset) | Enables Bearer auth on the FastAPI server. |\n\nFull reference (~25 vars including evolution scoring weights, entity-resolution thresholds, server upload caps, Origin allowlists): [`docs/user-guide.md#configuration`](docs/user-guide.md#configuration).\n\n## CLI reference\n\n```bash\n# Core\npgrg init                                # Bootstrap schema, verify connection\npgrg ingest PATH... [-n NS] [-p PROFILE] # Ingest files / directories\npgrg query \"question\" [-m MODE] [-n NS]  # Query (default: smart mode)\npgrg ask \"question\" [-m MODE] [-n NS]    # Query + grounded LLM answer\npgrg status [-n NS]                      # Graph statistics\npgrg delete -n NS                        # Delete a namespace's data\n\n# Servers\npgrg serve --port 8080                   # FastAPI + web UI (local/demo only)\npgrg demo                                # Auto-ingest sample data + launch UI\npgrg mcp-serve                           # MCP stdio server for Claude Desktop / Cursor / Zed\n\n# Developer-knowledge-base flavor (code-aware chunking + dev extraction prompt)\npgrg devmem ingest ./repo/ -p aggressive\npgrg devmem ask \"who owns the auth service?\"\n```\n\nThrottle profiles tune CPU-yield + parallel ingest knobs:\n\n| Profile | doc_concurrency | extract_concurrency | embed_batch_size | Use case |\n|---|:-:|:-:|:-:|---|\n| `conservative` | 1 | 4 | 8 | Shared servers, laptops on battery |\n| `balanced` | 2 | 8 | 16 | Default — most dev machines |\n| `aggressive` | 4 | 16 | 32 | Dedicated dev box |\n| `max` | 8 | 32 | 64 | One-off batch jobs on a beefy machine |\n\n## Why not Apache AGE?\n\nWe evaluated AGE (PostgreSQL's graph extension) before writing a line of code. We rejected it for four reasons:\n\n1. **Cloud killed.** AGE requires `shared_preload_libraries` — only Azure supports it among managed providers. No RDS, Supabase, Neon, or Cloud SQL.\n2. **Can't combine with pgvector in a single query.** AGE Cypher and pgvector live in different worlds. The killer GraphRAG operation needs two round-trips with AGE; one query with recursive CTEs.\n3. **Slower for GraphRAG patterns.** Bake-off measurements: AGE is **42–111× slower** on retrieval than recursive CTEs for the typical 1-3 hop pattern.\n4. **Production disaster.** LightRAG Issue #2255: 17-hour migration with AGE caused by a query plan estimating 49 **billion** intermediate rows for a 681K-row join. Closed `NOT_PLANNED`.\n\nFull analysis: [`research/apache-age-evaluation.md`](research/apache-age-evaluation.md). Bake-off verdict: [`benchmarks/age-bakeoff/results/REPORT-VERDICT.md`](benchmarks/age-bakeoff/results/REPORT-VERDICT.md).\n\n## Comparison\n\n| | pg-raggraph | LightRAG | Neo4j GraphRAG | Zep |\n|---|:-:|:-:|:-:|:-:|\n| PostgreSQL-native | ✅ | AGE adapter (Azure only) | ❌ | ❌ |\n| Single-query hybrid retrieval | ✅ | ❌ | ❌ | ❌ |\n| Works on RDS / Supabase / Neon | ✅ | ❌ | n/a | n/a |\n| License | MIT | MIT | Apache 2.0 | Apache 2.0 |\n| Pricing | free | free | $65+/mo Aura | $1.25/1K msgs |\n| Local embeddings by default | ✅ | ✅ | ❌ | ❌ |\n| Directed relationships | ✅ | ❌ (undirected) | ✅ | ✅ |\n| Time-aware / retraction-aware | ✅ Tier 1 | ❌ | ❌ | partial |\n| Stars | new | 33K+ | 2K+ | 24.8K |\n\nFull feature matrix: [`research/competition-comparison.md`](research/competition-comparison.md).\n\n## Requirements\n\n- Python 3.12+\n- PostgreSQL 16+ with `pgvector` and `pg_trgm` extensions\n- (Recommended) An OpenAI-compatible LLM endpoint for entity extraction. Without one, ingest still works as pure-vector RAG and graph features stay empty.\n\n## License\n\nMIT. See [`LICENSE`](LICENSE).\n\n---\n\n*Built with honest benchmarks and real corpora. Real numbers throughout this README come from `benchmarks/` runs that ship with the repo — re-runnable from clone. The unvarnished evaluation is in [`ASSESSMENT.md`](ASSESSMENT.md).*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyonk-labs%2Fpg-raggraph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyonk-labs%2Fpg-raggraph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyonk-labs%2Fpg-raggraph/lists"}