https://github.com/r4stin/kg-research-agent
Evidence-grounded, multi-agent research assistant that performs RAG over scientific papers, extracts structured claims, builds a Neo4j knowledge graph, and answers questions with verifiable citations and stateful session memory.
adk ai-agents chroma evidence-extraction google-gemini knowledge-graph llm multi-agent-system neo4j nlp rag research-tools
- Host: GitHub
- URL: https://github.com/r4stin/kg-research-agent
- Owner: r4stin
- License: MIT
- Created: 2025-11-13T19:38:27.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-11-26T21:04:02.000Z (7 months ago)
- Last Synced: 2025-11-29T16:41:51.866Z (7 months ago)
- Topics: adk, ai-agents, chroma, evidence-extraction, google-gemini, knowledge-graph, llm, multi-agent-system, neo4j, nlp, rag, research-tools
- Language: Python
- Homepage:
- Size: 58.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# **KG-Research-Agent**
### *Multi-Agent, Evidence-Grounded Research System with Gemini, ADK, ChromaDB & Neo4j*
**A research-grade AI agent that extracts claims + evidence from scientific papers, stores them in a knowledge graph, retrieves context, and answers questions using multi-agent reasoning with session memory.**
---
# **Overview**
**KG-Research-Agent** is an AI-powered research assistant that:
- Ingests scientific PDFs
- Embeds + stores them in ChromaDB
- Retrieves relevant text chunks (RAG)
- Extracts **structured claims & evidence** from papers
- Stores them in a **Neo4j Knowledge Graph**
- Answers questions using **citations grounded in source text**
- Uses a **multi-agent pipeline** (Planner → Retriever → Evidence → Answer)
- Supports **multi-turn conversations with session memory**
A full walkthrough of the multi-agent research system is available on YouTube:
**[Watch the Concept Overview](https://youtu.be/vaq0-AMOudo)**
---
# **Updated Architecture (Multi-Agent + Memory)**
```
    ───────── User ─────────
               │
               ▼
       ┌───────────────┐
       │ Planner Agent │  ← uses chat history + memory
       └───────────────┘
               │ plans tasks
               ▼
   ┌───────────────────────┐
   │    Retriever Agent    │  ← ChromaDB (vector search)
   └───────────────────────┘
               │ chunks
               ▼
   ┌───────────────────────┐
   │    Evidence Agent     │  ← extracts claims + sentences
   └───────────────────────┘
               │ structured JSON
               ▼
   ┌───────────────────────┐
   │     Answer Agent      │  ← composes human-readable answer
   └───────────────────────┘
               │
               ▼
   Final Answer + Citations

Persistent Storage:
- Neo4j → long-term knowledge graph
- ChromaDB → vector retrieval
- SessionState → short-term conversation memory
```
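The flow in the diagram can be sketched in plain Python. This is only an illustration of the Planner → Retriever → Evidence → Answer hand-off with a shared session state. The function names, the dictionary shapes, and the word-overlap "retrieval" are assumptions made for the example; the project itself uses Gemini, ADK and ChromaDB for these steps.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Short-term memory: the running list of (role, text) turns."""
    history: list = field(default_factory=list)


def planner(question, state):
    # Hypothetical planner: always schedules one retrieval task and records
    # how many earlier turns are available as context.
    return {"task": "retrieve", "query": question, "prior_turns": len(state.history)}


def retriever(plan, corpus):
    # Stand-in for vector search: keep chunks that share a word with the query.
    words = set(plan["query"].lower().split())
    return [c for c in corpus if words & set(c["text"].lower().split())]


def evidence(chunks):
    # Stand-in for LLM extraction: treat each chunk's text as one claim.
    return [{"claim": c["text"], "source": c["id"]} for c in chunks]


def answer(claims):
    # Compose a reply in which every claim carries its source id.
    if not claims:
        return "No supporting evidence found."
    return " ".join(f"{c['claim']} [{c['source']}]" for c in claims)


def run_turn(question, state, corpus):
    plan = planner(question, state)
    chunks = retriever(plan, corpus)
    claims = evidence(chunks)
    reply = answer(claims)
    # Record the turn so the next call can see it.
    state.history.append(("user", question))
    state.history.append(("agent", reply))
    return reply
```

Each stage only consumes the previous stage's output, which is what makes it possible to swap a stub for a real agent one stage at a time.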
---
# ✨ **Current Features**
### ✔️ PDF → Chunking → Vector Storage
### ✔️ RAG Retrieval (Chroma + Gemini)
### ✔️ Multi-Agent System (Planner → Retriever → Evidence → Answer)
### ✔️ Structured JSON Evidence Extraction
### ✔️ Neo4j Knowledge Graph Storage
### ✔️ Session Memory (short-term conversational context)
### ✔️ Deduplication (per chunk + semantic similarity)
### ✔️ Multi-turn conversational research workflow
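As an illustration of the deduplication feature listed above, the sketch below applies two passes: an exact check per chunk, then a similarity check across all kept claims. It is a guess at the general shape, not the project's code. The `chunk_id` and `text` field names, the 0.9 threshold, and the bag-of-words vectors (standing in for real embeddings) are all assumptions.

```python
import math
from collections import Counter


def bow_vector(text):
    # Stand-in for an embedding: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def deduplicate(claims, threshold=0.9):
    kept, seen_keys, kept_vectors = [], set(), []
    for claim in claims:
        # Pass 1: identical (normalized) text from the same chunk.
        key = (claim["chunk_id"], claim["text"].strip().lower())
        if key in seen_keys:
            continue
        # Pass 2: near-duplicate of any claim that was already kept.
        vec = bow_vector(claim["text"])
        if any(cosine(vec, other) >= threshold for other in kept_vectors):
            continue
        seen_keys.add(key)
        kept.append(claim)
        kept_vectors.append(vec)
    return kept
```

The similarity pass compares each candidate against every kept claim, which is fine for a sketch but quadratic in the number of claims.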
---
# π **Getting Started**
## 1️⃣ Clone the Repo
```
git clone https://github.com/r4stin/kg-research-agent.git
cd kg-research-agent
```
## 2️⃣ Create Conda Environment
```
conda create -n kg-research-agent python=3.10
conda activate kg-research-agent
```
## 3️⃣ Install Requirements
```
pip install -r requirements.txt
```
## 4️⃣ Environment Variables (`.env`)
```
GOOGLE_API_KEY="your-key"
CHROMA_DB_PATH="data/chroma"
PDF_STORAGE="data/papers"
NEO4J_URI="bolt://localhost:7687"
NEO4J_USER="neo4j"
NEO4J_PASSWORD="yourpassword"
```
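A minimal sketch of reading these variables in Python follows. It assumes the values are already present in the process environment; how the project loads the `.env` file is not shown here. The `Settings` class, the `load_settings` helper, and the choice of which variables are required are hypothetical, and the defaults simply mirror the sample values above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    chroma_db_path: str
    pdf_storage: str
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str


def load_settings(env=None):
    # Accept an explicit mapping so the function is easy to test.
    env = os.environ if env is None else env
    # Assumption: secrets have no sensible default, so they are required.
    required = ["GOOGLE_API_KEY", "NEO4J_PASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        google_api_key=env["GOOGLE_API_KEY"],
        chroma_db_path=env.get("CHROMA_DB_PATH", "data/chroma"),
        pdf_storage=env.get("PDF_STORAGE", "data/papers"),
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env["NEO4J_PASSWORD"],
    )
```

Failing early on a missing key gives a clearer error than a rejected API call later in the pipeline.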
---
# 🧪 **Running the System**
### PDF Ingestion
```
python -m src.tools.pdf_ingest
```
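The ingestion step splits extracted PDF text into chunks before embedding. The sketch below shows one common way to do that, fixed-size character windows with overlap. The `chunk_text` helper, the sizes, and the output shape are assumptions for illustration; the actual strategy in `src.tools.pdf_ingest` may differ.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character windows."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        chunks.append({"start": start, "text": piece})
        # Stop once a window reaches the end, to avoid a trailing fragment
        # that is fully contained in the previous chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice.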
### Evidence Extraction
```
python -m src.run_evidence_extraction
```
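Evidence extraction produces structured JSON, so it helps to validate model output before storing it. The sketch below assumes a record with `claim`, `evidence`, `paper_id` and `chunk_id` string fields. That schema and the `parse_evidence` helper are hypothetical and are not taken from the repository.

```python
import json

# Assumed schema: every field is a non-empty string.
REQUIRED_FIELDS = ("claim", "evidence", "paper_id", "chunk_id")


def parse_evidence(raw):
    """Parse model output and keep only well-formed claim records."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    valid = []
    for item in data:
        if not isinstance(item, dict):
            continue
        values = [item.get(name) for name in REQUIRED_FIELDS]
        if all(isinstance(v, str) and v.strip() for v in values):
            # Drop unexpected keys and trim whitespace.
            valid.append({name: item[name].strip() for name in REQUIRED_FIELDS})
    return valid
```

Returning an empty list on malformed output lets the caller decide whether to retry the extraction or skip the chunk.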
### KG Query
```
python -m src.pipelines.run_kg_query
```
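To give a feel for what a knowledge-graph write and read might involve, the sketch below builds parameterized Cypher statements for one claim record. The node labels (`Paper`, `Claim`, `Evidence`), the relationship types, and both helper names are assumptions; the graph schema used by `src.pipelines.run_kg_query` is not documented here.

```python
def build_claim_upsert(record):
    """Return (query, params) that idempotently stores one claim."""
    # MERGE avoids creating duplicate nodes when the same claim is re-ingested.
    query = (
        "MERGE (p:Paper {id: $paper_id}) "
        "MERGE (c:Claim {text: $claim}) "
        "MERGE (e:Evidence {text: $evidence, chunk_id: $chunk_id}) "
        "MERGE (p)-[:STATES]->(c) "
        "MERGE (c)-[:SUPPORTED_BY]->(e)"
    )
    params = {
        "paper_id": record["paper_id"],
        "claim": record["claim"],
        "evidence": record["evidence"],
        "chunk_id": record["chunk_id"],
    }
    return query, params


def build_claims_for_paper_query(paper_id):
    """Return (query, params) that lists claims and evidence for a paper."""
    query = (
        "MATCH (p:Paper {id: $paper_id})-[:STATES]->(c:Claim)"
        "-[:SUPPORTED_BY]->(e:Evidence) "
        "RETURN c.text AS claim, e.text AS evidence, e.chunk_id AS chunk_id"
    )
    return query, {"paper_id": paper_id}
```

With the official `neo4j` Python driver, a pair like this would typically be passed to `session.run(query, params)`. Keeping values in parameters rather than formatting them into the query string avoids quoting problems in claim text.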
# **New: Multi-Agent Runner**
Run the full pipeline with session memory:
```
python -m src.pipelines.run_multi_agent_pipeline
```
Example:
```
You: What is a major challenge in scholarly information retrieval?
You: Summarize in one sentence.
```
The agent maintains context across turns.
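One simple way to carry context across turns is a bounded history that is prepended to each new question, so that a follow-up like "Summarize in one sentence." can be resolved against the earlier exchange. The `SessionMemory` class, the `build_prompt` helper, and the six-turn limit below are assumptions for illustration, not the project's `SessionState` implementation.

```python
from collections import deque


class SessionMemory:
    """Keeps only the most recent turns to bound prompt size."""

    def __init__(self, max_turns=6):
        # A deque with maxlen silently drops the oldest turn when full.
        self.turns = deque(maxlen=max_turns)

    def add(self, role, text):
        self.turns.append((role, text))

    def as_context(self):
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


def build_prompt(memory, question):
    # Prepend earlier turns so the model can resolve references like "it".
    context = memory.as_context()
    if context:
        return f"{context}\nuser: {question}"
    return f"user: {question}"
```

Dropping the oldest turns is the crudest form of context management; summarizing them instead is a natural refinement.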
---
# 🗺️ **Roadmap**
## Agent Quality (Next Milestone)
- ADK logs + traces
- Metrics for agent performance
- LLM-as-a-Judge evaluation
## Multi-Agent Enhancements
- Add **KG Agent** (read/write Neo4j in pipeline)
- Add planner task types: `kg_query`, `kg_write`
- Context compaction + memory optimization
## Productionization
- A2A protocol (agent-to-agent messaging)
- Deployment to **Vertex AI Agent Engine**
- API endpoints + orchestration layer
---
# License
MIT License.
You may use, modify, and distribute this project freely.