https://github.com/r4stin/kg-research-agent

Evidence-grounded, multi-agent research assistant that performs RAG over scientific papers, extracts structured claims, builds a Neo4j knowledge graph, and answers questions with verifiable citations and stateful session memory.



Host: GitHub
URL: https://github.com/r4stin/kg-research-agent
Owner: r4stin
License: mit
Created: 2025-11-13T19:38:27.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-11-26T21:04:02.000Z (7 months ago)
Last Synced: 2025-11-29T16:41:51.866Z (7 months ago)
Topics: adk, ai-agents, chroma, evidence-extraction, google-gemini, knowledge-graph, llm, multi-agent-system, neo4j, nlp, rag, research-tools
Language: Python
Homepage:
Size: 58.6 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# 📚 **KG-Research-Agent**
### *Multi-Agent, Evidence-Grounded Research System with Gemini, ADK, ChromaDB & Neo4j*

**🔥 A research-grade AI agent that extracts claims + evidence from scientific papers, stores them in a knowledge graph, retrieves context, and answers questions using multi-agent reasoning with session memory.**

[![Python](https://img.shields.io/badge/Python-3.10-blue.svg)]()
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)]()
[![Neo4j](https://img.shields.io/badge/Neo4j-GraphDB-blue.svg)]()
[![ChromaDB](https://img.shields.io/badge/ChromaDB-Vector_Store-purple.svg)]()
[![Gemini](https://img.shields.io/badge/Gemini-LLM-orange.svg)]()

---

# 🚀 **Overview**

**KG-Research-Agent** is an AI-powered research assistant that:

- Ingests scientific PDFs
- Embeds + stores them in ChromaDB
- Retrieves relevant text chunks (RAG)
- Extracts **structured claims & evidence** from papers
- Stores them in a **Neo4j Knowledge Graph**
- Answers questions using **citations grounded in source text**
- Uses a **multi-agent pipeline** (Planner → Retriever → Evidence → Answer)
- Supports **multi-turn conversations with session memory**
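
The structured claim and evidence records mentioned above could be shaped roughly as follows. This is a minimal sketch, not the project's actual schema: the `Claim` dataclass, its field names, the `Paper`/`Claim`/`Evidence` node labels, and the `ASSERTS`/`SUPPORTED_BY` relationship types are all assumptions made for illustration. The code only builds a parameterized Cypher statement; running it would additionally need the `neo4j` Python driver and a live database.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class Claim:
    """Hypothetical structured claim extracted from one text chunk."""
    paper_id: str
    chunk_id: str
    claim: str
    evidence: str  # sentence quoted verbatim from the source chunk


# MERGE keeps the write idempotent: re-running extraction on the same
# chunk does not create duplicate nodes. Labels and relationship types
# are illustrative assumptions, not the repository's schema.
MERGE_CLAIM = """
MERGE (p:Paper {id: $paper_id})
MERGE (c:Claim {text: $claim})
MERGE (e:Evidence {text: $evidence, chunk_id: $chunk_id})
MERGE (p)-[:ASSERTS]->(c)
MERGE (c)-[:SUPPORTED_BY]->(e)
"""


def to_cypher(claim: Claim) -> tuple[str, dict]:
    """Return the query and its parameters; values are never inlined."""
    return MERGE_CLAIM, asdict(claim)


def parse_claims(raw_json: str, paper_id: str, chunk_id: str) -> list[Claim]:
    """Parse an LLM response assumed to be a JSON list of
    {"claim": ..., "evidence": ...} objects, skipping incomplete items."""
    claims = []
    for item in json.loads(raw_json):
        if item.get("claim") and item.get("evidence"):
            claims.append(Claim(paper_id, chunk_id, item["claim"], item["evidence"]))
    return claims
```

Keeping the evidence sentence next to its `chunk_id` is what lets an answer cite the exact source passage later.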

A full walkthrough of the multi-agent research system is available on YouTube:

👉 **[Watch the Concept Overview](https://youtu.be/vaq0-AMOudo)**

---

# 🧠 **Updated Architecture (Multi-Agent + Memory)**

```
            ┌──────────┐
            │   User   │
            └──────────┘
                 │
                 ▼
         ┌───────────────┐
         │ Planner Agent │  ← uses chat history + memory
         └───────────────┘
                 │ plans tasks
                 ▼
     ┌────────────────────────┐
     │    Retriever Agent     │  → ChromaDB (vector search)
     └────────────────────────┘
                 │ chunks
                 ▼
     ┌────────────────────────┐
     │     Evidence Agent     │  → extracts claims + sentences
     └────────────────────────┘
                 │ structured JSON
                 ▼
     ┌────────────────────────┐
     │      Answer Agent      │  → composes human-readable answer
     └────────────────────────┘
                 │
                 ▼
      Final Answer + Citations

📦 Persistent Storage:
- Neo4j → long-term knowledge graph
- ChromaDB → vector retrieval
- SessionState → short-term conversation memory
```
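
The control flow in the diagram can be sketched in plain Python. Everything here is a stand-in: the function names, the `Chunk` type, and the keyword-overlap retriever are hypothetical, and the real agents would call Gemini through ADK and query ChromaDB instead of using these stubs.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    text: str


def plan(question: str, history: list[str]) -> list[str]:
    """Planner stub: always emits the same fixed task sequence."""
    return ["retrieve", "extract_evidence", "answer"]


def retrieve(question: str, corpus: list[Chunk], k: int = 2) -> list[Chunk]:
    """Retriever stub: ranks chunks by word overlap instead of vector search."""
    words = set(question.lower().split())
    ranked = sorted(corpus, key=lambda c: -len(words & set(c.text.lower().split())))
    return ranked[:k]


def extract_evidence(chunks: list[Chunk]) -> list[dict]:
    """Evidence stub: treats the first sentence of each chunk as the evidence."""
    return [{"chunk_id": c.chunk_id, "evidence": c.text.split(".")[0] + "."} for c in chunks]


def answer(question: str, evidence: list[dict]) -> str:
    """Answer stub: joins evidence sentences, each with a citation marker."""
    return " ".join(f'{e["evidence"]} [{e["chunk_id"]}]' for e in evidence)


def run_pipeline(question: str, corpus: list[Chunk], history: list[str]) -> str:
    """Execute the planned tasks in order, passing results through `state`."""
    state: dict = {}
    for task in plan(question, history):
        if task == "retrieve":
            state["chunks"] = retrieve(question, corpus)
        elif task == "extract_evidence":
            state["evidence"] = extract_evidence(state["chunks"])
        elif task == "answer":
            state["answer"] = answer(question, state["evidence"])
    history.append(question)  # short-term memory for the next turn
    return state["answer"]
```

Each stage consumes only the previous stage's output, so a single agent can be swapped out without touching the others.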

---

# ✨ **Current Features**

### ✔️ PDF → Chunking → Vector Storage
### ✔️ RAG Retrieval (Chroma + Gemini)
### ✔️ Multi-Agent System (Planner → Retriever → Evidence → Answer)
### ✔️ Structured JSON Evidence Extraction
### ✔️ Neo4j Knowledge Graph Storage
### ✔️ Session Memory (short-term conversational context)
### ✔️ Deduplication (per chunk + semantic similarity)
### ✔️ Multi-turn conversational research workflow
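
The two-level deduplication listed above (per chunk, then by semantic similarity) might look like the sketch below. It is an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and the `0.9` threshold is an arbitrary example value, not one taken from the repository.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def deduplicate(claims: list[tuple[str, str]], threshold: float = 0.9) -> list[tuple[str, str]]:
    """claims are (chunk_id, text) pairs. Drop exact repeats within a chunk,
    then drop any claim too similar to one that was already kept."""
    seen_per_chunk: set[tuple[str, str]] = set()
    kept: list[tuple[str, str]] = []
    kept_vectors: list[Counter] = []
    for chunk_id, text in claims:
        key = (chunk_id, " ".join(text.lower().split()))
        if key in seen_per_chunk:
            continue  # exact duplicate inside the same chunk
        seen_per_chunk.add(key)
        vec = embed(text)
        if any(cosine(vec, other) >= threshold for other in kept_vectors):
            continue  # near-duplicate of a claim kept earlier
        kept.append((chunk_id, text))
        kept_vectors.append(vec)
    return kept
```

The cheap exact-match check runs first so the similarity comparison is only paid for claims that survive it.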

---

# 🏁 **Getting Started**

## 1️⃣ Clone the Repo
```
git clone https://github.com/r4stin/kg-research-agent.git
cd kg-research-agent
```

## 2️⃣ Create Conda Environment
```
conda create -n kg-research-agent python=3.10
conda activate kg-research-agent
```

## 3️⃣ Install Requirements
```
pip install -r requirements.txt
```

## 4️⃣ Environment Variables (`.env`)

```
GOOGLE_API_KEY="your-key"
CHROMA_DB_PATH="data/chroma"
PDF_STORAGE="data/papers"

NEO4J_URI="bolt://localhost:7687"
NEO4J_USER="neo4j"
NEO4J_PASSWORD="yourpassword"
```
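
One way to read these variables at startup is sketched below. The `Settings` class and `load_settings` function are hypothetical names, not part of the repository, and the sketch assumes the variables are already present in the process environment (for example, exported by the shell or loaded from `.env` by a helper library).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    chroma_db_path: str
    pdf_storage: str
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str


def load_settings(env=os.environ) -> Settings:
    """Read the variables listed above; fail early if any is missing or empty."""
    # Order must match the field order of Settings.
    names = [
        "GOOGLE_API_KEY", "CHROMA_DB_PATH", "PDF_STORAGE",
        "NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD",
    ]
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(*(env[n] for n in names))
```

Failing at load time with the full list of missing names is usually easier to debug than a connection error deep inside the pipeline.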

---

# 🧪 **Running the System**

### PDF Ingestion
```
python -m src.tools.pdf_ingest
```

### Evidence Extraction
```
python -m src.run_evidence_extraction
```

### KG Query
```
python -m src.pipelines.run_kg_query
```

# 🔧 **New: Multi-Agent Runner**

Run the full pipeline with session memory:

```
python -m src.pipelines.run_multi_agent_pipeline
```

Example:

```
You: What is a major challenge in scholarly information retrieval?
You: Summarize in one sentence.
```

The agent maintains context across turns, so the second prompt is read as a request to summarize the previous answer.
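
Short-term memory of this kind can be approximated with a bounded history that is prepended to each new question. The sketch below is an assumption about how such a store could work; the `SessionState` class here, its `max_turns` parameter, and the prompt layout are illustrative and may differ from the project's own implementation.

```python
from collections import deque


class SessionState:
    """Hypothetical short-term memory: keeps only the last `max_turns` exchanges."""

    def __init__(self, max_turns: int = 5):
        # deque with maxlen silently discards the oldest turn when full
        self.turns: deque = deque(maxlen=max_turns)

    def add(self, user: str, agent: str) -> None:
        self.turns.append((user, agent))

    def build_prompt(self, question: str) -> str:
        """Prefix the new question with prior turns so follow-ups like
        'Summarize in one sentence.' have something to refer to."""
        lines = []
        for user, agent in self.turns:
            lines.append(f"User: {user}")
            lines.append(f"Agent: {agent}")
        lines.append(f"User: {question}")
        return "\n".join(lines)
```

Bounding the history keeps the prompt size predictable over long conversations, at the cost of forgetting older turns.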

---

# 🗺️ **Roadmap**

## 🟥 Agent Quality (Next Milestone)
- ADK logs + traces
- Metrics for agent performance
- LLM-as-a-Judge evaluation

## 🟦 Multi-Agent Enhancements
- Add **KG Agent** (read/write Neo4j in pipeline)
- Add planner task types: `kg_query`, `kg_write`
- Context compaction + memory optimization

## 🟩 Productionization
- A2A protocol (agent-to-agent messaging)
- Deployment to **Vertex AI Agent Engine**
- API endpoints + orchestration layer

---

# 📜 License

MIT License.
You may use, modify, and distribute this project freely, provided the copyright and license notice are retained (see `LICENSE`).
