https://github.com/264gaurav/graph_rag
Graph RAG system using Neo4j, Gemma, and Groq. Imports documents, converts them into nodes/relationships via Cypher, and stores them in Neo4j. User queries retrieve relevant graph data, enabling multi-hop reasoning and accurate, context-aware answers powered by the Gemma model on Groq.
- Host: GitHub
- URL: https://github.com/264gaurav/graph_rag
- Owner: 264Gaurav
- License: mit
- Created: 2025-08-10T11:39:33.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-08-10T13:32:35.000Z (6 months ago)
- Last Synced: 2025-08-10T14:25:19.597Z (6 months ago)
- Topics: gemma, google-colab-notebook, graph-databases, graphdb, groq, knowledge-graph, langchain, neo4j, rag
- Language: Jupyter Notebook
- Homepage:
- Size: 1 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Graph RAG: Neo4j + Gemma (Groq) + LangChain
A **Graph-based Retrieval-Augmented Generation** system that ingests documents, builds a **Neo4j knowledge graph** with Cypher, and uses **Gemma** on the **Groq platform** for fast, accurate, relationship-aware question answering.
---
## Overview
Traditional **RAG** retrieves chunks of unstructured text using search techniques such as **dense vector similarity**, **sparse/lexical search** (e.g., keyword or BM25), or **hybrid search** that combines both approaches.
**Graph RAG** goes further: it stores data as **entities** (nodes) and **relationships** (edges) in a **graph database**. This enables **multi-hop reasoning** (connecting and traversing several linked facts to answer a complex query) and makes answers **explainable**, because each one can be traced back to specific nodes and edges.
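To make multi-hop reasoning concrete, here is a minimal in-memory sketch. It is not the project's code: the toy triples and the helper names (`build_index`, `directors_of_actor`) are assumptions made for illustration, and in the real system the traversal is a Cypher query executed by Neo4j.

```python
from collections import defaultdict

# Toy knowledge graph as (subject, relationship, object) triples.
# The data is illustrative only and is not taken from the project's dataset.
TRIPLES = [
    ("Alice", "ACTED_IN", "Movie A"),
    ("Bob", "DIRECTED", "Movie A"),
    ("Bob", "DIRECTED", "Movie B"),
    ("Carol", "ACTED_IN", "Movie B"),
]


def build_index(triples):
    """Index edges in both directions so they can be followed either way."""
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for subj, rel, obj in triples:
        out_edges[(subj, rel)].add(obj)
        in_edges[(obj, rel)].add(subj)
    return out_edges, in_edges


def directors_of_actor(actor, triples=TRIPLES):
    """Two hops: actor -ACTED_IN-> movie <-DIRECTED- director."""
    out_edges, in_edges = build_index(triples)
    directors = set()
    for movie in out_edges[(actor, "ACTED_IN")]:    # hop 1: actor to movies
        directors |= in_edges[(movie, "DIRECTED")]  # hop 2: movies to directors
    return sorted(directors)


print(directors_of_actor("Alice"))
```

In Cypher, the same two-hop question would be a single pattern along the lines of `(a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)`, which is hard to express as a similarity search over text chunks.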
This project:
1. Ingests documents.
2. Extracts entities and relationships.
3. Stores them in **Neo4j**.
4. Uses **LangChain's GraphCypherQAChain** to query the graph.
5. Passes relevant context to **Gemma** (via Groq) for final answer generation.
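Steps 2 and 3 can be sketched as a small helper that turns one extracted triple into a parameterized Cypher statement. This is a hedged illustration rather than the notebook's actual ingestion code: `triple_to_cypher`, the generic `Entity` label, and the `name` property are all assumptions.

```python
import re

# Labels and relationship types must look like plain identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def triple_to_cypher(subject, relation, obj, label="Entity"):
    """Return (query, params) that MERGEs two nodes and the edge between them.

    Cypher cannot take labels or relationship types as parameters, so they
    are validated before being interpolated into the query text. Node names
    are passed as parameters to avoid quoting problems and injection.
    """
    for name in (relation, label):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    query = (
        f"MERGE (s:{label} {{name: $subject}}) "
        f"MERGE (o:{label} {{name: $object}}) "
        f"MERGE (s)-[:{relation}]->(o)"
    )
    return query, {"subject": subject, "object": obj}


# Hypothetical triple, as an extraction step might produce it.
query, params = triple_to_cypher("Ada", "WORKS_AT", "Acme")
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` keeps ingestion idempotent: re-running it over the same document should not duplicate nodes or relationships. With a connected `Neo4jGraph`, the pair could be passed to `graph.query(query, params)`.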
---
## Key Concepts
- **Graph Database (Neo4j):** Stores and queries data as nodes & edges for connected insights.
- **Knowledge Graph:** Structured network of facts linking entities and relationships.
- **RAG:** Retrieval-Augmented Generation, which retrieves external data and feeds it to an LLM.
- **Graph RAG:** RAG enhanced with graph queries for deeper, relationship-aware reasoning.
---
## Architecture
```mermaid
flowchart LR
A[Document Ingestion] --> B[Entity & Relationship Extraction]
B --> C[Cypher Query Generation]
C --> D[Neo4j Knowledge Graph]
E[User Query] --> F[GraphCypherQAChain]
D --> F
F --> G[Gemma LLM via Groq]
G --> H[Context-Aware Answer]
```
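The query-time half of the diagram amounts to two LLM calls wrapped around one database call. The sketch below shows that control flow with stub functions standing in for Gemma and Neo4j, so it runs offline. It is a simplified illustration and not LangChain's implementation; `graph_qa` and the `fake_*` helpers are hypothetical names.

```python
from typing import Any, Callable


def graph_qa(
    question: str,
    schema: str,
    generate_cypher: Callable[[str, str], str],
    run_cypher: Callable[[str], list],
    generate_answer: Callable[[str, list], str],
) -> dict:
    """Question -> Cypher -> rows -> natural-language answer."""
    cypher = generate_cypher(question, schema)  # LLM call 1: write the query
    rows = run_cypher(cypher)                   # database call: fetch context
    answer = generate_answer(question, rows)    # LLM call 2: phrase the answer
    return {"cypher": cypher, "context": rows, "answer": answer}


# Stubs that stand in for the model and the database.
def fake_generate_cypher(question: str, schema: str) -> str:
    return "MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: 'X'}) RETURN p.name AS name"


def fake_run_cypher(cypher: str) -> list:
    return [{"name": "Some Director"}]


def fake_generate_answer(question: str, rows: list) -> str:
    return f"The director is {rows[0]['name']}."


result = graph_qa(
    "Who directed X?",
    "(:Person)-[:DIRECTED]->(:Movie)",
    fake_generate_cypher,
    fake_run_cypher,
    fake_generate_answer,
)
print(result["answer"])
```

Returning the generated Cypher and the raw rows alongside the answer is what makes the result inspectable: a wrong answer can be traced to either a bad query or a bad summary.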
---
## Quickstart
```bash
pip install --upgrade langchain langchain-community langchain-groq neo4j
export NEO4J_URI="bolt://:7687"
export NEO4J_USERNAME="neo4j"
export NEO4J_PASSWORD=""
export GROQ_API_KEY=""
```
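Because the placeholders above are left empty, a quick check before connecting can save a confusing authentication error later. The following is a minimal sketch; the `missing_settings` helper is an assumption and not part of the project.

```python
import os

REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "GROQ_API_KEY")


def missing_settings(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All settings present.")
```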
---
## Example Usage
```python
import os

from langchain.chains import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph
from langchain_groq import ChatGroq

# Connect to Neo4j with the credentials exported in the Quickstart.
graph = Neo4jGraph(
    url=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
)
graph.refresh_schema()  # Load the schema the LLM needs to write valid Cypher.

llm = ChatGroq(groq_api_key=os.environ["GROQ_API_KEY"], model_name="Gemma2-9b-It")

# allow_dangerous_requests=True acknowledges that LLM-generated Cypher will run.
chain = GraphCypherQAChain.from_llm(
    llm=llm, graph=graph, verbose=True, allow_dangerous_requests=True
)

result = chain.invoke({"query": "Who was the director of the movie GoldenEye?"})
print(result)
```
---
## Example Cypher
```cypher
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/.../movies_small.csv' AS row
MERGE (m:Movie {id: row.movieId})
SET m.title = row.title,
    m.released = date(row.released),
    m.imdbRating = toFloat(row.imdbRating)
FOREACH (actor IN split(row.actors, '|') |
  MERGE (p:Person {name: trim(actor)})
  MERGE (p)-[:ACTED_IN]->(m))
FOREACH (director IN split(row.directors, '|') |
  MERGE (p:Person {name: trim(director)})
  MERGE (p)-[:DIRECTED]->(m))
FOREACH (genre IN split(row.genres, '|') |
  MERGE (g:Genre {name: trim(genre)})
  MERGE (m)-[:IN_GENRE]->(g));
```
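The `split(...)` and `trim(...)` calls above unpack pipe-delimited cells into one node per name. For readers who prefer to prepare rows in Python before sending them to Neo4j, the same shaping could look like the sketch below. The helper names and the sample row are hypothetical; the column names are the ones used in the Cypher above.

```python
def split_names(value, sep="|"):
    """Split a delimited cell and trim each item, dropping empty entries."""
    return [part.strip() for part in (value or "").split(sep) if part.strip()]


def row_to_params(row):
    """Shape one CSV row (as a dict) into parameters for an import query."""
    return {
        "id": row["movieId"],
        "title": row["title"],
        "actors": split_names(row.get("actors")),
        "directors": split_names(row.get("directors")),
        "genres": split_names(row.get("genres")),
    }


# Hypothetical row; the values are made up for illustration.
example = {
    "movieId": "1",
    "title": "Example Film",
    "actors": "Actor One | Actor Two",
    "directors": "Director One",
    "genres": "Action|Thriller|",
}
print(row_to_params(example))
```

One deliberate difference from the Cypher version: this sketch drops empty entries (such as the one produced by a trailing `|`), whereas the `FOREACH` loops would merge a node with an empty name.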
---
## Sample Output
**Graph view of the database** (viewable in the Neo4j query console at `https://console-preview.neo4j.io/tools/query`):

---
**Table view of the database:**

---
> The screenshot above shows the reasoning steps and final answer generated by the **Gemma model** after retrieving relevant nodes and relationships from **Neo4j**.
---
## Benefits of Graph RAG
- Multi-hop reasoning over connected facts
- More accurate, explainable answers
- Works well in finance, healthcare, research, and legal domains
---
## Tech Stack
- **Neo4j:** Graph database
- **Cypher:** Graph query language
- **Gemma:** Large Language Model
- **Groq:** High-speed inference
- **LangChain:** Orchestration
---
## Notes
- Use environment variables or secret managers for credentials.
- `allow_dangerous_requests=True` lets the chain execute LLM-generated Cypher against your database; validate queries before running them in production.
- Enhance ingestion with NLP-based entity/relation extraction for better graph quality.
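
One way to act on the validation note is a read-only guard that rejects generated Cypher containing write-capable clauses before it reaches the database. The sketch below is a heuristic, not a security boundary: the clause list is an assumption and is not exhaustive, and `is_read_only` is a hypothetical helper. Connecting with a Neo4j user that only has read privileges is the stronger safeguard.

```python
import re

# Clauses that can modify data or schema. CALL is blocked conservatively,
# which also rejects harmless read-only procedures.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|CALL|LOAD\s+CSV)\b",
    re.IGNORECASE,
)

# Quoted string literals, with backslash escapes allowed inside.
_SINGLE = r"'(?:[^'\\]|\\.)*'"
_DOUBLE = r'"(?:[^"\\]|\\.)*"'
STRING_LITERAL = re.compile(_SINGLE + "|" + _DOUBLE)


def is_read_only(cypher: str) -> bool:
    """Heuristic check: True when no write-capable clause appears."""
    # Blank out string literals so a title like 'Set It Off' does not trip the check.
    without_strings = STRING_LITERAL.sub("''", cypher)
    return WRITE_CLAUSES.search(without_strings) is None


print(is_read_only("MATCH (m:Movie {title: 'Set It Off'}) RETURN m.title"))
print(is_read_only("MATCH (n) DETACH DELETE n"))
```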