https://github.com/264gaurav/graph_rag

Graph RAG system using Neo4j, Gemma, and Groq — Imports documents, converts them into nodes/relationships via Cypher, and stores them in Neo4j. User queries retrieve relevant graph data, enabling multi-hop reasoning and accurate, context-aware answers powered by the Gemma model on Groq.

gemma google-colab-notebook graph-databases graphdb groq knowledge-graph langchain neo4j rag

Host: GitHub
URL: https://github.com/264gaurav/graph_rag
Owner: 264Gaurav
License: mit
Created: 2025-08-10T11:39:33.000Z (6 months ago)
Default Branch: main
Last Pushed: 2025-08-10T13:32:35.000Z (6 months ago)
Last Synced: 2025-08-10T14:25:19.597Z (6 months ago)
Topics: gemma, google-colab-notebook, graph-databases, graphdb, groq, knowledge-graph, langchain, neo4j, rag
Language: Jupyter Notebook
Homepage:
Size: 1 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# 🚀 Graph RAG: Neo4j + Gemma (Groq) + LangChain

A **Graph-based Retrieval-Augmented Generation** system that ingests documents, builds a **Neo4j knowledge graph** with Cypher, and uses **Gemma** on the **Groq platform** for fast, accurate, relationship-aware question answering.

---

## 📌 Overview

Traditional **RAG** retrieves chunks of unstructured text using search techniques such as **dense vector similarity**, **sparse/lexical search** (e.g., keyword or BM25), or **hybrid search** that combines both approaches.

**Graph RAG** goes further: it stores data as **entities** (nodes) and **relationships** (edges) in a **graph database**. This enables **multi-hop reasoning** (connecting and traversing multiple linked facts to answer a complex query) and makes answers **explainable**, because each one can be traced back to the nodes and relationships it was built from.
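
To make multi-hop reasoning concrete, here is a minimal sketch that walks a toy in-memory graph with breadth-first search. It is not part of this project's code: the `TRIPLES` data, `build_adjacency`, and `find_path` are hypothetical names used only for illustration, and a real deployment would run the traversal in Neo4j via Cypher.

```python
from collections import deque

# Hypothetical toy graph: (source, relationship, target) triples.
TRIPLES = [
    ("Martin Campbell", "DIRECTED", "GoldenEye"),
    ("Pierce Brosnan", "ACTED_IN", "GoldenEye"),
    ("Pierce Brosnan", "ACTED_IN", "Tomorrow Never Dies"),
    ("Roger Spottiswoode", "DIRECTED", "Tomorrow Never Dies"),
]


def build_adjacency(triples):
    """Index triples so each node lists its neighbours in both directions."""
    adjacency = {}
    for source, rel, target in triples:
        adjacency.setdefault(source, []).append((rel, target))
        adjacency.setdefault(target, []).append((rel, source))
    return adjacency


def find_path(triples, start, goal, max_hops=3):
    """Return [node, rel, node, ...] linking start to goal, or None."""
    adjacency = build_adjacency(triples)
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        # Each hop adds one relationship and one node to the path.
        if (len(path) - 1) // 2 >= max_hops:
            continue
        for rel, neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [rel, neighbour]))
    return None


path = find_path(TRIPLES, "Martin Campbell", "Roger Spottiswoode", max_hops=4)
print(" -> ".join(path))
```

Connecting the two directors takes four hops through a shared actor, which is the kind of question that chunk-based retrieval struggles with when the facts live in separate passages.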

This project:

1. Ingests documents.

2. Extracts entities and relationships.

3. Stores them in **Neo4j**.

4. Uses **LangChain’s GraphCypherQAChain** to query the graph.

5. Passes relevant context to **Gemma** (via Groq) for final answer generation.
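
Steps 2 and 3 above can be sketched as a small helper that turns one extracted triple into a parameterised Cypher `MERGE` statement. This is an illustrative assumption about how such a step might look, not the repository's implementation; `triple_to_cypher` and the dictionary keys are hypothetical. Labels and relationship types cannot be passed as Cypher parameters, so the sketch validates them before interpolating.

```python
import re

# Labels and relationship types are interpolated into the query text,
# so only plain identifiers are accepted.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def triple_to_cypher(triple):
    """Build (query, params) that upserts both nodes and the edge between them."""
    for name in (triple["subject_label"], triple["relationship"], triple["object_label"]):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    query = (
        f"MERGE (s:{triple['subject_label']} {{name: $subject}}) "
        f"MERGE (o:{triple['object_label']} {{name: $object}}) "
        f"MERGE (s)-[:{triple['relationship']}]->(o)"
    )
    params = {"subject": triple["subject"], "object": triple["object"]}
    return query, params


# Hypothetical output of an entity/relationship extraction step.
example = {
    "subject_label": "Person",
    "subject": "Martin Campbell",
    "relationship": "DIRECTED",
    "object_label": "Movie",
    "object": "GoldenEye",
}
query, params = triple_to_cypher(example)
print(query)
print(params)
```

The resulting pair could then be handed to a Neo4j client, for example `Neo4jGraph.query(query, params)`. Using `MERGE` rather than `CREATE` keeps repeated ingestion from duplicating nodes.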

---

## 🧠 Key Concepts

- **Graph Database (Neo4j):** Stores and queries data as nodes & edges for connected insights.

- **Knowledge Graph:** Structured network of facts linking entities and relationships.

- **RAG:** Retrieval-Augmented Generation — retrieve external data, feed to an LLM.

- **Graph RAG:** RAG enhanced with graph queries for deeper, relationship-aware reasoning.

---

## ⚙️ Architecture

```mermaid
flowchart LR
    A[Document Ingestion] --> B[Entity & Relationship Extraction]
    B --> C[Cypher Query Generation]
    C --> D[Neo4j Knowledge Graph]
    E[User Query] --> F[GraphCypherQAChain]
    D --> F
    F --> G[Gemma LLM via Groq]
    G --> H[Context-Aware Answer]
```

---

## ▶️ Quickstart

```bash
pip install --upgrade langchain langchain-community langchain-groq neo4j

# Fill in your own values: the Neo4j host, password, and Groq API key are left blank here.
export NEO4J_URI="bolt://:7687"
export NEO4J_USERNAME="neo4j"
export NEO4J_PASSWORD=""
export GROQ_API_KEY=""
```

---

## 💻 Example Usage

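```python
import os

from langchain.chains import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph
from langchain_groq import ChatGroq

# Connect to Neo4j using the credentials exported in the Quickstart.
graph = Neo4jGraph(
    url=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
)
# Reload the schema so the LLM sees the current labels and relationship types.
graph.refresh_schema()

# Gemma served through Groq.
llm = ChatGroq(groq_api_key=os.environ["GROQ_API_KEY"], model_name="Gemma2-9b-It")

# The chain asks the LLM to write Cypher, runs it on the graph, and answers from the rows returned.
# allow_dangerous_requests=True is required because that Cypher is executed as-is.
chain = GraphCypherQAChain.from_llm(
    llm=llm, graph=graph, verbose=True, allow_dangerous_requests=True
)

result = chain.invoke({"query": "Who was the director of the movie GoldenEye?"})
print(result)
```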

---

## 🔧 Example Cypher

```cypher
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/.../movies_small.csv' AS row
MERGE (m:Movie {id: row.movieId})
SET m.title = row.title, m.released = date(row.released), m.imdbRating = toFloat(row.imdbRating)
FOREACH (actor IN split(row.actors, '|') |
  MERGE (p:Person {name: trim(actor)}) MERGE (p)-[:ACTED_IN]->(m))
FOREACH (director IN split(row.directors, '|') |
  MERGE (p:Person {name: trim(director)}) MERGE (p)-[:DIRECTED]->(m))
FOREACH (genre IN split(row.genres, '|') |
  MERGE (g:Genre {name: trim(genre)}) MERGE (m)-[:IN_GENRE]->(g));
```
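
To show what the Cypher above produces for a single row, the following sketch mirrors its split-and-link logic in plain Python. The sample row is hypothetical (the real CSV's contents are not reproduced here), `row_to_triples` is an illustrative name, and the sketch keys movies by title for readability whereas the Cypher keys them by `movieId`.

```python
import csv
import io

# Hypothetical one-row sample using the column names from the Cypher above.
SAMPLE_CSV = """movieId,title,actors,directors,genres
1,GoldenEye,Pierce Brosnan| Sean Bean,Martin Campbell,Action|Thriller
"""


def row_to_triples(row):
    """Expand pipe-delimited columns into (source, relationship, target) triples."""
    title = row["title"]
    triples = []
    for actor in row["actors"].split("|"):
        triples.append((actor.strip(), "ACTED_IN", title))
    for director in row["directors"].split("|"):
        triples.append((director.strip(), "DIRECTED", title))
    for genre in row["genres"].split("|"):
        triples.append((title, "IN_GENRE", genre.strip()))
    return triples


triples = [
    triple
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
    for triple in row_to_triples(row)
]
for triple in triples:
    print(triple)
```

The `strip()` calls play the same role as `trim()` in the Cypher: without them, `" Sean Bean"` and `"Sean Bean"` would become two different `Person` nodes.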

---

## 📸 Sample Output

**Graph view of the database** (viewable in the Neo4j console at `https://console-preview.neo4j.io/tools/query`):

![Sample Output](images/img1.png)

---

**Table view of the database:**

![Sample Output](images/img2.png)

---

> The screenshot above shows the reasoning steps and final answer generated by the **Gemma model** after retrieving relevant nodes and relationships from **Neo4j**.

---

## 🎯 Benefits of Graph RAG

✅ Multi-hop reasoning over connected facts

✅ More accurate, explainable answers

✅ Well suited to relationship-heavy domains such as finance, healthcare, research, and legal

---

## 📌 Tech Stack

- **Neo4j** — Graph database

- **Cypher** — Graph query language

- **Gemma** — Large Language Model

- **Groq** — High-speed inference

- **LangChain** — Orchestration

---

## ⚠️ Notes

- Use environment variables or secret managers for credentials.

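- `allow_dangerous_requests=True` acknowledges that the chain executes LLM-generated Cypher against the database. In production, validate generated queries and connect with narrowly scoped (ideally read-only) credentials.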

- Enhance ingestion with NLP-based entity/relation extraction for better graph quality.
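
One way to approach query validation is a coarse keyword check that rejects anything that looks like a write before it reaches the database. The sketch below is a heuristic under stated assumptions, not a security boundary and not something this project ships: `is_read_only` is a hypothetical helper, it will over-reject harmless queries that happen to contain a blocked word (for example inside a string literal), and a read-only database role remains the more reliable safeguard.

```python
import re

# Clauses that modify data or load external content, plus CALL, since
# procedures can have side effects. Matching is case-insensitive.
_WRITE_KEYWORDS = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV|CALL)\b",
    re.IGNORECASE,
)


def is_read_only(cypher):
    """Return True when no write-style keyword appears in the query text."""
    return _WRITE_KEYWORDS.search(cypher) is None


safe = "MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: 'GoldenEye'}) RETURN p.name"
unsafe = "MATCH (n) DETACH DELETE n"
print(is_read_only(safe), is_read_only(unsafe))
```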
