https://github.com/264gaurav/graph_rag

Graph RAG system using Neo4j, Gemma, and Groq. It imports documents, converts them into nodes and relationships via Cypher, and stores them in Neo4j. User queries retrieve relevant graph data, enabling multi-hop reasoning and accurate, context-aware answers powered by the Gemma model on Groq.

# 🚀 Graph RAG: Neo4j + Gemma (Groq) + LangChain

A **Graph-based Retrieval-Augmented Generation** system that ingests documents, builds a **Neo4j knowledge graph** with Cypher, and uses **Gemma** on the **Groq platform** for fast, accurate, relationship-aware question answering.

---

## 📌 Overview

Traditional **RAG** retrieves chunks of unstructured text using search techniques such as **dense vector similarity**, **sparse/lexical search** (e.g., keyword or BM25), or **hybrid search** that combines both approaches.
**Graph RAG** goes further: it stores data as **entities** (nodes) and **relationships** (edges) in a **graph database**. This enables **multi-hop reasoning** (connecting and traversing several linked facts to answer a complex query) and produces **explainable answers**, because each answer can be traced back to the nodes and relationships that support it.
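
To make multi-hop reasoning concrete, here is a minimal sketch in plain Python, with no Neo4j involved. The triples, the entity names, and the `find_path` helper are all hypothetical and exist only for illustration; in this project the traversal is performed by Cypher queries inside Neo4j.

```python
from collections import deque

# Toy knowledge graph as (subject, relation, object) triples.
# These facts are made up for illustration and are not from the project's data.
TRIPLES = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Acme", "LOCATED_IN", "Berlin"),
    ("Berlin", "PART_OF", "Germany"),
]


def find_path(triples, start, goal):
    """Breadth-first search returning the chain of triples linking start to goal."""
    adjacency = {}
    for subj, rel, obj in triples:
        adjacency.setdefault(subj, []).append((rel, obj))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None  # no chain of facts connects the two entities


# "Which country does Alice work in?" needs three hops through the graph.
for subj, rel, obj in find_path(TRIPLES, "Alice", "Germany"):
    print(f"{subj} -[{rel}]-> {obj}")
```

The returned path doubles as the explanation: every hop is a stored fact that can be shown to the user alongside the answer.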

This project:

1. Ingests documents.
2. Extracts entities and relationships.
3. Stores them in **Neo4j**.
4. Uses **LangChain's GraphCypherQAChain** to translate the user's question into Cypher and query the graph.
5. Passes relevant context to **Gemma** (via Groq) for final answer generation.
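
Steps 2 and 3 can be sketched as a function that turns one extracted fact into a Cypher `MERGE` statement. This is only a sketch under stated assumptions: the `triple_to_cypher` helper, the generic `Entity` label, and the `name` property are hypothetical and may not match how the repository builds its graph.

```python
import re

# Conservative pattern for relationship types that are safe to interpolate.
_REL_TYPE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def triple_to_cypher(subject, relation, obj):
    """Return a Cypher MERGE statement and its parameters for one fact.

    Assumption: every node gets a generic `Entity` label keyed by `name`.
    """
    # The relationship type is interpolated into the query text, so it is
    # validated first; node names are passed as query parameters instead.
    if not _REL_TYPE.fullmatch(relation):
        raise ValueError(f"unsafe relationship type: {relation!r}")
    query = (
        "MERGE (a:Entity {name: $subject}) "
        "MERGE (b:Entity {name: $object}) "
        f"MERGE (a)-[:{relation}]->(b)"
    )
    return query, {"subject": subject, "object": obj}


# Hypothetical fact, used only to show the generated statement.
query, params = triple_to_cypher("Alice", "WORKS_AT", "Acme")
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` keeps ingestion idempotent: loading the same fact twice does not duplicate nodes or relationships.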

---

## 🧠 Key Concepts

- **Graph Database (Neo4j):** Stores and queries data as nodes & edges for connected insights.
- **Knowledge Graph:** Structured network of facts linking entities and relationships.
- **RAG:** Retrieval-Augmented Generation, which retrieves external data and feeds it to an LLM.
- **Graph RAG:** RAG enhanced with graph queries for deeper, relationship-aware reasoning.

---

## ⚙️ Architecture

```mermaid
flowchart LR
A[Document Ingestion] --> B[Entity & Relationship Extraction]
B --> C[Cypher Query Generation]
C --> D[Neo4j Knowledge Graph]
E[User Query] --> F[GraphCypherQAChain]
D --> F
F --> G[Gemma LLM via Groq]
G --> H[Context-Aware Answer]
```

---

## ▶️ Quickstart

```bash
pip install --upgrade langchain langchain-community langchain-groq neo4j

export NEO4J_URI="bolt://:7687"
export NEO4J_USERNAME="neo4j"
export NEO4J_PASSWORD=""
export GROQ_API_KEY=""
```
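
Before connecting, it can help to confirm that all four variables are actually set. The sketch below is one way to do that; the `missing_settings` helper is hypothetical, and the values in the sample dictionary are placeholders rather than real connection details.

```python
import os

REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "GROQ_API_KEY")


def missing_settings(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Placeholder settings to show the check; pass nothing to inspect os.environ.
sample_env = {"NEO4J_URI": "bolt://localhost:7687", "NEO4J_USERNAME": "neo4j"}
print(missing_settings(sample_env))  # ['NEO4J_PASSWORD', 'GROQ_API_KEY']
```

Calling `missing_settings()` with no argument at the top of a notebook gives a clear error list instead of a `KeyError` partway through setup.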

---

## 💻 Example Usage

```python
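import os

# Imported from langchain_community, the same package that provides Neo4jGraph.
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph
from langchain_groq import ChatGroq

# Connect to Neo4j with the credentials exported in the Quickstart.
graph = Neo4jGraph(
    url=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
)
graph.refresh_schema()  # reload the schema the LLM sees when writing Cypher

# Gemma served through Groq.
llm = ChatGroq(groq_api_key=os.environ["GROQ_API_KEY"], model_name="gemma2-9b-it")

# allow_dangerous_requests=True lets the chain execute LLM-generated Cypher.
chain = GraphCypherQAChain.from_llm(
    llm=llm, graph=graph, verbose=True, allow_dangerous_requests=True
)

result = chain.invoke({"query": "Who was the director of the movie GoldenEye?"})
print(result)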
```
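
The value printed above is a dictionary rather than a plain string. The sketch below assumes the shape used by the LangChain releases this was written against: the answer under `"result"`, plus an `"intermediate_steps"` list containing the generated Cypher when the chain is built with `return_intermediate_steps=True`. Check this against your installed version. The `summarize_result` helper and the sample dictionary are hypothetical stand-ins, not real model output.

```python
def summarize_result(result):
    """Extract the answer and, when present, the generated Cypher.

    Assumes the dictionary shape described above; adjust the keys if your
    LangChain version returns something different.
    """
    answer = result.get("result", "")
    cypher = None
    for step in result.get("intermediate_steps", []):
        if isinstance(step, dict) and "query" in step:
            cypher = step["query"]
    return {"answer": answer, "cypher": cypher}


# Hand-written stand-in for a chain result.
sample = {
    "query": "Who was the director of the movie GoldenEye?",
    "result": "<answer text from the model>",
    "intermediate_steps": [
        {"query": "MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: 'GoldenEye'}) RETURN p.name"},
        {"context": []},
    ],
}
print(summarize_result(sample))
```

Logging the generated Cypher next to the answer makes it easier to see whether a wrong answer came from a bad query or from the final generation step.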

---

## 🔧 Example Cypher

```cypher
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/.../movies_small.csv' AS row
MERGE (m:Movie {id: row.movieId})
SET m.title = row.title,
    m.released = date(row.released),
    m.imdbRating = toFloat(row.imdbRating)
FOREACH (actor IN split(row.actors, '|') |
  MERGE (p:Person {name: trim(actor)})
  MERGE (p)-[:ACTED_IN]->(m))
FOREACH (director IN split(row.directors, '|') |
  MERGE (p:Person {name: trim(director)})
  MERGE (p)-[:DIRECTED]->(m))
FOREACH (genre IN split(row.genres, '|') |
  MERGE (g:Genre {name: trim(genre)})
  MERGE (m)-[:IN_GENRE]->(g));
```
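
For readers less familiar with Cypher, the sketch below mirrors the three `FOREACH` clauses in plain Python: each pipe-delimited column is split, trimmed, and turned into edges. It is an approximation, not a translation. The `row_to_edges` helper and the sample row are hypothetical, the movie is identified by title rather than `id` for readability, and empty names are skipped, which the Cypher above does not do.

```python
def row_to_edges(row):
    """Split pipe-delimited columns of one CSV row into (source, relation, target) edges."""
    title = row["title"]
    edges = []
    # People point at the movie: (person)-[:ACTED_IN | :DIRECTED]->(movie)
    for column, relation in (("actors", "ACTED_IN"), ("directors", "DIRECTED")):
        for name in row.get(column, "").split("|"):
            if name.strip():
                edges.append((name.strip(), relation, title))
    # The movie points at its genres: (movie)-[:IN_GENRE]->(genre)
    for genre in row.get("genres", "").split("|"):
        if genre.strip():
            edges.append((title, "IN_GENRE", genre.strip()))
    return edges


# Made-up row with the same column names as the CSV loaded above.
sample_row = {
    "title": "Example Movie",
    "actors": "Actor One| Actor Two",
    "directors": "Director One",
    "genres": "Action|Thriller",
}
for edge in row_to_edges(sample_row):
    print(edge)
```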

---

## 📸 Sample Output

**Graph view of the database** (viewable in the Neo4j console query tool at `https://console-preview.neo4j.io/tools/query`):

![Sample Output](images/img1.png)

---

**Table view of the database:**

![Sample Output](images/img2.png)

---

> The screenshot above shows the reasoning steps and final answer generated by the **Gemma model** after retrieving relevant nodes and relationships from **Neo4j**.

---

## 🎯 Benefits of Graph RAG

- ✅ Multi-hop reasoning over connected facts
- ✅ More accurate, explainable answers, since each answer traces back to stored nodes and relationships
- ✅ Works well in finance, healthcare, research, and legal domains

---

## 📌 Tech Stack

- **Neo4j:** Graph database
- **Cypher:** Graph query language
- **Gemma:** Large language model
- **Groq:** High-speed inference
- **LangChain:** Orchestration

---

## ⚠️ Notes

- Use environment variables or secret managers for credentials.
- `allow_dangerous_requests=True` lets the chain execute LLM-generated Cypher against the database. In production, validate generated queries and connect with a least-privilege database user.
- Enhance ingestion with NLP-based entity/relation extraction for better graph quality.
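
One simple way to validate generated queries is a keyword check that rejects anything that looks like a write. The sketch below is deliberately conservative and is not a Cypher parser: the `is_read_only` helper and its keyword list are assumptions, the list is not exhaustive, and a query that merely mentions one of these words inside a string literal is rejected too. A database user with read-only privileges remains the stronger safeguard.

```python
import re

# Clauses that can modify data or schema, or call arbitrary procedures.
# Assumption: this list is illustrative and not exhaustive.
_WRITE_PATTERN = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|CALL|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def is_read_only(cypher):
    """Return True when the query contains none of the write-style keywords."""
    return _WRITE_PATTERN.search(cypher) is None


print(is_read_only("MATCH (p:Person)-[:DIRECTED]->(m:Movie) RETURN p.name"))  # True
print(is_read_only("MATCH (n) DETACH DELETE n"))  # False
```

A check like this would run on the generated Cypher before it is executed, with rejected queries logged for review rather than silently dropped.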