https://github.com/abdulvahapmutlu/research-agent
A hybrid Research Assistant that combines an exact Knowledge Graph (Neo4j) with a Retrieval‑Augmented Generation pipeline (FAISS + Cross‑Encoder + FLAN‑T5) behind a sleek Streamlit interface.
- Host: GitHub
- URL: https://github.com/abdulvahapmutlu/research-agent
- Owner: abdulvahapmutlu
- License: mit
- Created: 2025-05-05T12:43:56.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-06-21T11:40:49.000Z (4 months ago)
- Last Synced: 2025-06-21T12:29:33.444Z (4 months ago)
- Topics: agent, aiagent, cross-encoder, deep-learning, docker, faiss, flan-t5, graph, neo4j, orchest, rag, rag-agent
- Language: Jupyter Notebook
- Homepage:
- Size: 11.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Research Assistant Agent
This repo contains a hybrid Research Assistant that combines an exact Knowledge Graph (Neo4j) with a Retrieval‑Augmented Generation pipeline (FAISS + Cross‑Encoder + FLAN‑T5) behind a sleek Streamlit interface.
## Overview
This project demonstrates a two‑pronged approach to question answering over a corpus of research papers:
- **Structured Queries via Neo4j**
  Exact, lightning‑fast answers for graph‑able questions (e.g. “Which approaches used ResNet backbone?”).
- **Free‑Form QA via RAG Pipeline**
  Semantic search (FAISS) + keyword filtering + cross‑encoder reranking + an LLM (FLAN‑T5) to handle open‑ended queries.

---
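The two routes in the Overview can be pictured as a dispatcher in front of both back ends: questions that match a known graph pattern go to Neo4j, and everything else falls back to RAG. The sketch below is a minimal illustration that assumes regex-based matching; `GRAPH_PATTERNS` and `route_query` are hypothetical names, not taken from the repo, whose actual routing logic may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for "graph-able" questions; the repo's real matcher may differ.
GRAPH_PATTERNS = [
    re.compile(
        r"which approaches use[sd]? (?:a |the )?(?P<backbone>[\w-]+) backbone",
        re.IGNORECASE,
    ),
]


def route_query(question: str) -> Tuple[str, Optional[dict]]:
    """Return ("graph", slots) for structured questions, else ("rag", None)."""
    for pattern in GRAPH_PATTERNS:
        match = pattern.search(question)
        if match:
            # Exact route: the extracted slots would become Cypher query parameters.
            return "graph", match.groupdict()
    # Fallback route: free-form questions go to FAISS + reranker + FLAN-T5.
    return "rag", None


if __name__ == "__main__":
    print(route_query("Which approaches used ResNet backbone?"))
    print(route_query("Summarize the Transformer training setup."))
```

Keeping the matcher conservative is the safer design choice here: a miss only costs a slower RAG answer, while a false match would return a confidently wrong "exact" result.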
## Features
- **Graph‑Based Retrieval** with Cypher queries for high‑precision lookups.
- **Vector‑Based Retrieval** using FAISS for deep semantic similarity.
- **Cross‑Encoder Reranking** to boost relevance of top FAISS hits.
- **LLM‑Based Synthesis** with Google’s FLAN‑T5, loaded from the Hugging Face Hub.
- **Interactive UI** powered by Streamlit, complete with real‑time metrics and example prompts.
- **Containerized Demo** via Docker Compose for one‑command deployment.

---
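The retrieval features listed under Features chain together roughly as: vector search, keyword filter, rerank, then a prompt for the reader model. This is a dependency-free sketch, not the repo's code. A brute-force cosine search stands in for the FAISS index, and a token-overlap function stands in for the cross-encoder (a real cross-encoder scores each (query, passage) pair jointly with a transformer). All function names, the `top_k`/`final_k` parameters, the 2-D toy vectors, and the prompt template are assumptions.

```python
import math
import re
from typing import Callable, List, Sequence

Vector = Sequence[float]


def tokens(text: str) -> List[str]:
    """Lowercased word tokens, shared by the keyword filter and the toy scorer."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vec: Vector, doc_vecs: List[Vector], top_k: int) -> List[int]:
    """Brute-force stand-in for a FAISS search: indices of the top_k most similar docs."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
    return ranked[:top_k]


def keyword_filter(query: str, docs: List[str], candidates: List[int]) -> List[int]:
    """Keep candidates sharing a word longer than 3 chars with the query; never return empty."""
    terms = {t for t in tokens(query) if len(t) > 3}
    kept = [i for i in candidates if terms & set(tokens(docs[i]))]
    return kept or candidates


def overlap_scorer(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query tokens found in the passage."""
    q = set(tokens(query))
    return len(q & set(tokens(passage))) / len(q) if q else 0.0


def rerank(query: str, docs: List[str], candidates: List[int],
           scorer: Callable[[str, str], float], final_k: int) -> List[int]:
    """Re-sort candidates by a pairwise (query, passage) score and keep the best final_k."""
    return sorted(candidates, key=lambda i: scorer(query, docs[i]), reverse=True)[:final_k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Assemble the context-plus-question prompt a reader such as FLAN-T5 would receive."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer the question using the context.\nContext:\n{context}\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "We train the Transformer with Adam and learning-rate warmup.",
        "ResNet backbones are common in detection models.",
        "The Transformer uses label smoothing during training.",
    ]
    doc_vecs = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.3]]  # toy 2-D "embeddings"
    query = "Summarize the Transformer training setup."
    hits = semantic_search([1.0, 0.2], doc_vecs, top_k=2)
    hits = keyword_filter(query, docs, hits)
    best = rerank(query, docs, hits, overlap_scorer, final_k=1)
    print(build_prompt(query, [docs[i] for i in best]))
```

The cheap vector search casts a wide net, and the more expensive pairwise scorer only runs on the few survivors, which is the usual reason for pairing a bi-encoder index with a cross-encoder reranker.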
## Prerequisites
- **Docker** & **Docker Compose** (for containerized demo)
- **Python 3.10+** and **pip** (for local development)
- (Optional) **Conda** for environment management

---
## Installation
### Local Setup
1. **Clone the repository**
```
git clone https://github.com/abdulvahapmutlu/research-agent.git
cd research-agent
```
2. **Create and activate your environment**
**Conda**:
```
conda env create -f environment.yml
conda activate agent
```
**venv**:
```
python -m venv venv
source venv/bin/activate # macOS/Linux
.\venv\Scripts\activate # Windows
```
3. **Install dependencies**
```
pip install -r requirements.txt
```
4. **Set environment variables**
*macOS/Linux*:
```
export NEO4J_URI="bolt://localhost:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASS="password"
```
*Windows (PowerShell)*:
```
$env:NEO4J_URI="bolt://localhost:7687"
$env:NEO4J_USER="neo4j"
$env:NEO4J_PASS="password"
```
5. **Populate the Neo4j graph**
Make sure Neo4j is running (e.g. via `docker-compose up neo4j`), then:
```
python graph/import_to_neo4j.py
```
6. **Launch the Streamlit app**
```
streamlit run app.py
```
Open your browser at [http://localhost:8501](http://localhost:8501).

---
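The contents of `graph/import_to_neo4j.py` are not shown in this README. As a hedged illustration of one common loading pattern, the sketch below flattens paper records into rows and batches them for an `UNWIND ... MERGE` Cypher statement. The `Paper`/`Approach`/`Backbone` labels, the relationship types, the record shape, the batch size, and the function names are all hypothetical. Executing the statement would need the official `neo4j` Python driver (for example `session.run(IMPORT_CYPHER, rows=batch)`), which is left out so the sketch runs on the standard library alone.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical schema: (:Paper)-[:PROPOSES]->(:Approach)-[:USES_BACKBONE]->(:Backbone).
# The repo's real labels and relationships are defined by its own import script.
IMPORT_CYPHER = """
UNWIND $rows AS row
MERGE (p:Paper {title: row.title})
MERGE (a:Approach {name: row.approach})
MERGE (b:Backbone {name: row.backbone})
MERGE (p)-[:PROPOSES]->(a)
MERGE (a)-[:USES_BACKBONE]->(b)
"""


def to_rows(papers: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Flatten hypothetical paper records into one row per (paper, approach, backbone)."""
    rows = []
    for paper in papers:
        for approach, backbone in paper["approaches"]:
            rows.append({"title": paper["title"], "approach": approach, "backbone": backbone})
    return rows


def batches(rows: List[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive slices so each UNWIND call sends a bounded number of rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


if __name__ == "__main__":
    papers = [  # made-up records for illustration only
        {"title": "Paper A", "approaches": [("Method X", "ResNet-50"), ("Method Y", "ViT-B/16")]},
        {"title": "Paper B", "approaches": [("Method Z", "ResNet-50")]},
    ]
    for batch in batches(to_rows(papers), size=2):
        # With the neo4j driver this would be: session.run(IMPORT_CYPHER, rows=batch)
        print(len(batch), "rows")
```

Because every write is a `MERGE`, rerunning an import like this after a partial failure should not create duplicate nodes or relationships.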
### Docker Compose (One‑Command Demo)
1. Add your Neo4j credentials to the `neo4j.environment` and `app.environment` sections of `docker-compose.yml`.
2. Run:
```
docker-compose up --build
```
3. Browse to [http://localhost:8501](http://localhost:8501) for the live demo.

---
## Usage
- **Structured Questions** (graph route):
  - “Which approaches use ResNet backbone?”
- **Free‑Form Questions** (RAG fallback):
  - “Summarize the Transformer training setup.”

Type your query into the text box and hit **Ask**. The UI will display either a graph‑backed result or a generated answer with source snippets.
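On the graph route, a question like the first example ends up as a Cypher lookup. The sketch below shows that step under an assumed `(:Approach)-[:USES_BACKBONE]->(:Backbone)` schema; the labels, the relationship type, `BACKBONE_QUERY`, and both helper functions are hypothetical. The backbone name travels as the `$backbone` parameter instead of being formatted into the query string, which avoids Cypher injection and lets Neo4j reuse a cached query plan.

```python
from typing import Dict, List, Tuple

# Hypothetical schema; labels and relationship type are assumptions, not the repo's.
BACKBONE_QUERY = (
    "MATCH (a:Approach)-[:USES_BACKBONE]->(b:Backbone) "
    "WHERE toLower(b.name) = toLower($backbone) "
    "RETURN a.name AS approach ORDER BY approach"
)


def build_backbone_query(backbone: str) -> Tuple[str, Dict[str, str]]:
    """Return (cypher, params); the value is passed as a parameter, never string-formatted."""
    name = backbone.strip()
    if not name:
        raise ValueError("backbone name must be non-empty")
    return BACKBONE_QUERY, {"backbone": name}


def format_graph_answer(records: List[Dict[str, str]]) -> str:
    """Render rows shaped like {"approach": ...} as a short answer for the UI."""
    if not records:
        return "No matching approaches found in the graph."
    return "Approaches: " + ", ".join(r["approach"] for r in records)
```

With the official `neo4j` driver, the pair would be executed as `session.run(query, params)`, with each returned record converted to a dict (for example via `record.data()`) before formatting.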
### ⚠️ Important Note ⚠️
This is a demo project, so it may not answer every query. To adapt it to your own use case, rebuild it with your own data or plug in an LLM API.

---
## Configuration
- **Reader Model**: The default is `google/flan-t5-large`; override it by setting `READER_MODEL` in your environment.
- **Streamlit Port**: Change it via `streamlit run app.py --server.port <PORT>`, replacing `<PORT>` with the port you want.
- **Neo4j**: Adjust the Bolt URI, username, and password via the `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASS` environment variables.

---
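The Configuration settings are plain environment variables. A minimal sketch of reading them in one place, assuming the example values from the installation steps as fallbacks and `google/flan-t5-large` as the default reader; `AppConfig`, `load_config`, and `NEO4J_SCHEMES` are hypothetical names, and the repo may read its settings differently. The 8-character check mirrors the AuthError hint under Troubleshooting.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# URI schemes accepted by the official Neo4j Python driver.
NEO4J_SCHEMES = ("bolt://", "bolt+s://", "bolt+ssc://", "neo4j://", "neo4j+s://", "neo4j+ssc://")


@dataclass(frozen=True)
class AppConfig:
    neo4j_uri: str
    neo4j_user: str
    neo4j_pass: str
    reader_model: str


def load_config(env: Optional[Mapping[str, str]] = None) -> AppConfig:
    """Read settings from `env` (default: os.environ), falling back to the README's examples."""
    env = os.environ if env is None else env
    config = AppConfig(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_pass=env.get("NEO4J_PASS", "password"),
        reader_model=env.get("READER_MODEL", "google/flan-t5-large"),
    )
    # Fail fast with readable messages instead of a later AuthError from the driver.
    if not config.neo4j_uri.startswith(NEO4J_SCHEMES):
        raise ValueError(f"unsupported NEO4J_URI scheme: {config.neo4j_uri}")
    if len(config.neo4j_pass) < 8:
        raise ValueError("NEO4J_PASS must be at least 8 characters long")
    return config
```

For example, `READER_MODEL=google/flan-t5-base` selects a smaller FLAN-T5 checkpoint. A password fallback is only reasonable for this local demo; a deployed setup would more likely require `NEO4J_PASS` to be set explicitly.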
## Troubleshooting
- **AuthError**: Ensure your Neo4j password is ≥ 8 characters and matches the env vars.
- **Slow RAG**: Reduce `SEMANTIC_K` or use a smaller reader model.
- **Missing Index**: Rebuild with `python retriever_embeddings.py`.

## License
MIT License. See [LICENSE](LICENSE) for details.