https://github.com/r4stin/kg-research-agent

Evidence-grounded, multi-agent research assistant that performs RAG over scientific papers, extracts structured claims, builds a Neo4j knowledge graph, and answers questions with verifiable citations and stateful session memory.
adk ai-agents chroma evidence-extraction google-gemini knowledge-graph llm multi-agent-system neo4j nlp rag research-tools


# 📚 **KG-Research-Agent**
### *Multi-Agent, Evidence-Grounded Research System with Gemini, ADK, ChromaDB & Neo4j*

**🔥 A research-grade AI agent that extracts claims + evidence from scientific papers, stores them in a knowledge graph, retrieves context, and answers questions using multi-agent reasoning with session memory.**

[![Python](https://img.shields.io/badge/Python-3.10-blue.svg)]()
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)]()
[![Neo4j](https://img.shields.io/badge/Neo4j-GraphDB-blue.svg)]()
[![ChromaDB](https://img.shields.io/badge/ChromaDB-Vector_Store-purple.svg)]()
[![Gemini](https://img.shields.io/badge/Gemini-LLM-orange.svg)]()

---

# 🚀 **Overview**

**KG-Research-Agent** is an AI-powered research assistant that:

- Ingests scientific PDFs
- Embeds + stores them in ChromaDB
- Retrieves relevant text chunks (RAG)
- Extracts **structured claims & evidence** from papers
- Stores them in a **Neo4j Knowledge Graph**
- Answers questions using **citations grounded in source text**
- Uses a **multi-agent pipeline** (Planner → Retriever → Evidence → Answer)
- Supports **multi-turn conversations with session memory**

A full walkthrough of the multi-agent research system is available on YouTube:

👉 **[Watch the Concept Overview](https://youtu.be/vaq0-AMOudo)**

---

# 🧠 **Updated Architecture (Multi-Agent + Memory)**

```
┌───────── User ─────────┐
            │
            ▼
    ┌───────────────┐
    │ Planner Agent │  ← uses chat history + memory
    └───────────────┘
            │ plans tasks
            ▼
┌────────────────────────┐
│    Retriever Agent     │  → ChromaDB (vector search)
└────────────────────────┘
            │ chunks
            ▼
┌────────────────────────┐
│     Evidence Agent     │  → extracts claims + sentences
└────────────────────────┘
            │ structured JSON
            ▼
┌────────────────────────┐
│      Answer Agent      │  → composes human-readable answer
└────────────────────────┘
            │
            ▼
 Final Answer + Citations

📦 Persistent Storage:
- Neo4j → long-term knowledge graph
- ChromaDB → vector retrieval
- SessionState → short-term conversation memory
```
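
The flow in the diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: `PipelineState`, the four stage functions, and the toy `CORPUS` are all hypothetical names, the retriever uses keyword matching in place of a ChromaDB vector search, and the evidence stage copies chunk text instead of calling Gemini.

```python
from dataclasses import dataclass, field

# Toy corpus standing in for chunks stored in ChromaDB (illustrative text only).
CORPUS = [
    {"chunk_id": "paper1:0",
     "text": "Ambiguous terminology is a major challenge for scholarly retrieval."},
    {"chunk_id": "paper1:1",
     "text": "We thank the reviewers for their feedback."},
]


@dataclass
class PipelineState:
    """Data handed from one stage to the next (hypothetical structure)."""
    question: str
    tasks: list = field(default_factory=list)
    chunks: list = field(default_factory=list)
    evidence: list = field(default_factory=list)
    answer: str = ""


def planner(state):
    # A real planner would ask the LLM to decide which tasks are needed.
    state.tasks = ["retrieve", "extract_evidence", "answer"]
    return state


def retriever(state):
    # Keyword overlap as a stand-in for vector similarity search.
    words = (w.strip("?.,!").lower() for w in state.question.split())
    keywords = {w for w in words if len(w) > 4}
    state.chunks = [c for c in CORPUS
                    if any(k in c["text"].lower() for k in keywords)]
    return state


def evidence_agent(state):
    # A real evidence agent would extract claims and supporting sentences via the LLM.
    state.evidence = [
        {"claim": c["text"], "sentence": c["text"], "chunk_id": c["chunk_id"]}
        for c in state.chunks
    ]
    return state


def answer_agent(state):
    # Compose an answer in which every claim carries its source chunk id.
    if not state.evidence:
        state.answer = "No supporting evidence found."
    else:
        state.answer = " ".join(
            f'{e["claim"]} [{e["chunk_id"]}]' for e in state.evidence
        )
    return state


def run_pipeline(question):
    state = PipelineState(question=question)
    for stage in (planner, retriever, evidence_agent, answer_agent):
        state = stage(state)
    return state


if __name__ == "__main__":
    print(run_pipeline("What is a major challenge in scholarly retrieval?").answer)
```

Passing a single state object through every stage keeps each agent's contribution inspectable, which is useful when checking that a citation in the final answer traces back to a retrieved chunk.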

---

# ✨ **Current Features**

### βœ”οΈ PDF β†’ Chunking β†’ Vector Storage
### βœ”οΈ RAG Retrieval (Chroma + Gemini)
### βœ”οΈ Multi-Agent System (Planner β†’ Retriever β†’ Evidence β†’ Answer)
### βœ”οΈ Structured JSON Evidence Extraction
### βœ”οΈ Neo4j Knowledge Graph Storage
### βœ”οΈ Session Memory (short-term conversational context)
### βœ”οΈ Deduplication (per chunk + semantic similarity)
### βœ”οΈ Multi-turn conversational research workflow
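
The two-level deduplication listed above (per chunk, then semantic similarity) could look roughly like the sketch below. It is an illustration only: the `deduplicate` function, the claim dictionary keys, and the `0.8` threshold are assumptions, and bag-of-words cosine similarity stands in for whatever embedding model the project actually uses.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens of a claim."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate(claims, threshold=0.8):
    """Drop exact repeats within a chunk, then near-duplicates across chunks."""
    kept = []
    kept_vectors = []
    seen_in_chunk = set()
    for claim in claims:
        # Level 1: identical normalized text from the same chunk.
        key = (claim["chunk_id"], " ".join(tokens(claim["text"])))
        if key in seen_in_chunk:
            continue
        seen_in_chunk.add(key)
        # Level 2: similarity against every claim already kept.
        vector = Counter(tokens(claim["text"]))
        if any(cosine(vector, other) >= threshold for other in kept_vectors):
            continue
        kept.append(claim)
        kept_vectors.append(vector)
    return kept


if __name__ == "__main__":
    sample = [
        {"chunk_id": "c1", "text": "Graph retrieval improves recall."},
        {"chunk_id": "c1", "text": "graph  retrieval improves recall."},
        {"chunk_id": "c2", "text": "Graph retrieval improves recall significantly."},
        {"chunk_id": "c2", "text": "Session memory stores prior turns."},
    ]
    for claim in deduplicate(sample):
        print(claim["chunk_id"], claim["text"])
```

With real embeddings the threshold would need tuning, since a value that is too low merges distinct claims and a value that is too high lets paraphrases through.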

---

# 🏁 **Getting Started**

## 1️⃣ Clone the Repo
```
git clone https://github.com/r4stin/kg-research-agent.git
cd kg-research-agent
```

## 2️⃣ Create Conda Environment
```
conda create -n kg-research-agent python=3.10
conda activate kg-research-agent
```

## 3️⃣ Install Requirements
```
pip install -r requirements.txt
```

## 4️⃣ Environment Variables (`.env`)

```
GOOGLE_API_KEY="your-key"
CHROMA_DB_PATH="data/chroma"
PDF_STORAGE="data/papers"

NEO4J_URI="bolt://localhost:7687"
NEO4J_USER="neo4j"
NEO4J_PASSWORD="yourpassword"
```
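
As a rough sketch of how these variables might be consumed, the snippet below validates them and collects them into one settings object. The `Settings` class and `load_settings` function are hypothetical names, not taken from the repository. The code reads from the process environment, so it assumes the `.env` file has already been loaded (for example by a dotenv loader, if the project uses one).

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the .env example above.
REQUIRED = (
    "GOOGLE_API_KEY",
    "CHROMA_DB_PATH",
    "PDF_STORAGE",
    "NEO4J_URI",
    "NEO4J_USER",
    "NEO4J_PASSWORD",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the project's configuration."""
    google_api_key: str
    chroma_db_path: str
    pdf_storage: str
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail early, naming every missing variable, instead of failing mid-run."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(**{name.lower(): env[name] for name in REQUIRED})


if __name__ == "__main__":
    try:
        print(load_settings().neo4j_uri)
    except RuntimeError as err:
        print(err)
```

Checking all variables up front means a missing Neo4j password is reported before any PDF is ingested, not after the vector store has already been built.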

---

# 🧪 **Running the System**

### PDF Ingestion
```
python -m src.tools.pdf_ingest
```

### Evidence Extraction
```
python -m src.run_evidence_extraction
```

### KG Query
```
python -m src.pipelines.run_kg_query
```
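
To give a feel for what a knowledge-graph write and query might involve, here is a sketch of parameterized Cypher statements with helpers that build their parameters. The graph schema (`Paper`, `Claim`, `Evidence` nodes with `MAKES` and `SUPPORTED_BY` relationships), the property names, and the helper functions are assumptions for illustration; the repository's actual schema may differ. The snippet only builds queries and parameters, which could then be passed to a Neo4j driver session's `run` method.

```python
# Hypothetical schema: (Paper)-[:MAKES]->(Claim)-[:SUPPORTED_BY]->(Evidence)
WRITE_CLAIM = """
MERGE (p:Paper {paper_id: $paper_id})
MERGE (c:Claim {text: $claim})
MERGE (e:Evidence {sentence: $sentence, chunk_id: $chunk_id})
MERGE (p)-[:MAKES]->(c)
MERGE (c)-[:SUPPORTED_BY]->(e)
"""

FIND_CLAIMS = """
MATCH (p:Paper)-[:MAKES]->(c:Claim)-[:SUPPORTED_BY]->(e:Evidence)
WHERE toLower(c.text) CONTAINS toLower($keyword)
RETURN p.paper_id AS paper_id, c.text AS claim,
       e.sentence AS sentence, e.chunk_id AS chunk_id
LIMIT $limit
"""

WRITE_KEYS = ("paper_id", "claim", "sentence", "chunk_id")


def claim_write_params(record):
    """Validate one extracted claim record and return parameters for WRITE_CLAIM."""
    missing = [key for key in WRITE_KEYS if not record.get(key)]
    if missing:
        raise ValueError("Claim record is missing: " + ", ".join(missing))
    return {key: record[key] for key in WRITE_KEYS}


def claim_query_params(keyword, limit=10):
    """Return parameters for FIND_CLAIMS."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return {"keyword": keyword, "limit": limit}


if __name__ == "__main__":
    record = {
        "paper_id": "paper1",
        "claim": "Ambiguous terminology hinders retrieval.",
        "sentence": "Ambiguous terminology is a major challenge.",
        "chunk_id": "paper1:0",
    }
    print(WRITE_CLAIM.strip())
    print(claim_write_params(record))
```

Using `MERGE` with query parameters keeps repeated extraction runs from creating duplicate nodes and avoids building Cypher by string concatenation.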

# 🔧 **New: Multi-Agent Runner**

Run the full pipeline with session memory:

```
python -m src.pipelines.run_multi_agent_pipeline
```

Example:

```
You: What is a major challenge in scholarly information retrieval?
You: Summarize in one sentence.
```

The agent maintains context across turns.
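
One simple way to carry context across turns is to keep a bounded list of past question and answer pairs and prepend it to the next planner prompt. The sketch below assumes exactly that. `SessionState` is named in the architecture diagram, but its fields, the `max_turns` limit, and `build_planner_prompt` are hypothetical and may not match the repository.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Short-term conversation memory (fields are assumptions)."""
    max_turns: int = 5
    turns: list = field(default_factory=list)

    def add_turn(self, question, answer):
        # Keep only the most recent turns so the prompt stays bounded.
        self.turns.append((question, answer))
        self.turns = self.turns[-self.max_turns:]

    def as_context(self):
        return "\n".join(f"User: {q}\nAgent: {a}" for q, a in self.turns)


def build_planner_prompt(state, question):
    """Prefix the new question with the remembered conversation, if any."""
    context = state.as_context()
    if context:
        return f"Conversation so far:\n{context}\n\nNew question: {question}"
    return f"New question: {question}"


if __name__ == "__main__":
    session = SessionState(max_turns=2)
    session.add_turn(
        "What is a major challenge in scholarly information retrieval?",
        "(answer from the pipeline)",
    )
    print(build_planner_prompt(session, "Summarize in one sentence."))
```

Because the follow-up "Summarize in one sentence." has no subject of its own, the planner can only resolve it if the earlier turn is included in its prompt.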

---

# 🗺️ **Roadmap**

## 🟥 Agent Quality (Next Milestone)
- ADK logs + traces
- Metrics for agent performance
- LLM-as-a-Judge evaluation

## 🟦 Multi-Agent Enhancements
- Add **KG Agent** (read/write Neo4j in pipeline)
- Add planner task types: `kg_query`, `kg_write`
- Context compaction + memory optimization

## 🟩 Productionization
- A2A protocol (agent-to-agent messaging)
- Deployment to **Vertex AI Agent Engine**
- API endpoints + orchestration layer

---

# 📜 License

MIT License.
You may use, modify, and distribute this project freely.