https://github.com/ako1983/llm_research_assistant

agent-based-modeling agents anthropic chromadb dspy langchain-python langgraph-python llm openai rag reasoning


# 📚 LLM-Powered Research Assistant 🤖

An AI-powered research assistant that leverages Retrieval-Augmented Generation (RAG) to provide accurate responses to user queries by retrieving relevant documents and reasoning through complex questions.

## Features

- **Smart Query Routing**: Autonomously decides whether to answer directly from the model's own knowledge, retrieve additional context, or call a specialized tool
- **RAG Pipeline**: Retrieves relevant documents to enhance responses with accurate, up-to-date information
- **Multi-step Reasoning**: Uses DSPy for structured reasoning to break down complex queries
- **Tool Integration**: Can utilize calculators, web search, and other external tools when needed
- **Evaluation Framework**: Measures response quality and relevance using DSPy's evaluation capabilities
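
The routing decision described under Smart Query Routing can be sketched as a three-way classifier. This is only an illustration: the `Route` enum, the `route_query` function, and the keyword rules are hypothetical stand-ins, and the project itself presumably classifies queries with an LLM prompt rather than with regular expressions.

```python
import re
from enum import Enum


class Route(Enum):
    DIRECT = "direct"      # answer from the model's own knowledge
    RETRIEVE = "retrieve"  # fetch supporting documents first
    TOOL = "tool"          # hand off to a calculator or web search


# Hypothetical heuristics; a real router would likely ask the LLM to classify.
_MATH = re.compile(r"\d+\s*[-+*/]\s*\d+")
_RETRIEVE_HINTS = ("how do i", "fix", "troubleshoot", "refund", "order", "account")


def route_query(query: str) -> Route:
    """Pick a route for a query using simple keyword rules."""
    text = query.lower()
    if _MATH.search(text):
        return Route.TOOL
    if any(hint in text for hint in _RETRIEVE_HINTS):
        return Route.RETRIEVE
    return Route.DIRECT


if __name__ == "__main__":
    for q in ("What is 12 * 7?", "How do I fix Wi-Fi connection issues?", "Say hello"):
        print(q, "->", route_query(q).value)
```

Keeping the decision behind one function that returns an enum means the keyword rules could later be swapped for an LLM call without touching the callers.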

## Architecture

```ascii
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ User Interface  │────▶│  Query Router   │────▶│  RAG Pipeline   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                              │     ▲                    │
                              ▼     │                    ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │  Tools (Calc,   │     │  Vector Store   │
                        │   Web Search)   │     │   (ChromaDB)    │
                        └─────────────────┘     └─────────────────┘
                              │     ▲                    │
                              ▼     │                    ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │  LLM Provider   │◀───▶│  DSPy Modules   │
                        │ (OpenAI/Claude) │     │    & Metrics    │
                        └─────────────────┘     └─────────────────┘
```
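
To make the RAG Pipeline and Vector Store boxes concrete, here is a minimal sketch of the retrieve-then-prompt step. It assumes a toy bag-of-words embedding and an in-memory list in place of ChromaDB; `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names, not functions from this repository.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Combine retrieved passages and the question into one LLM prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    docs = [
        "restart the router to fix wi-fi problems",
        "track your order in the app",
        "reset your password from settings",
    ]
    print(build_prompt("fix wi-fi", retrieve("fix wi-fi", docs, k=1)))
```

A real vector store replaces `embed` with a learned embedding model and the sorted scan with an approximate nearest-neighbour index, but the data flow is the same.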

## Setup & Installation

1. Clone the repository

```bash
git clone https://github.com/ako1983/LLM_research_assistant.git
cd LLM_research_assistant
```

2. Install dependencies

```bash
pip install -r requirements.txt
```

3. Set up environment variables

```bash
export OPENAI_API_KEY="your-api-key"
export ANTHROPIC_API_KEY="your-api-key" # If using Claude
```

4. Prepare your data

```bash
python src/vectorstore_builder.py
```
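
Before building the vector store or running the usage example, it can help to confirm that the keys exported in step 3 are visible to Python. The helper below is a hypothetical sketch (`pick_provider` is not part of this repository); it only inspects the two environment variable names shown above.

```python
import os
from typing import Mapping, Optional


def pick_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return "openai" or "anthropic" depending on which API key is set, else None."""
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    return None


if __name__ == "__main__":
    provider = pick_provider()
    if provider is None:
        print("No API key found: set OPENAI_API_KEY or ANTHROPIC_API_KEY")
    else:
        print(f"Using provider: {provider}")
```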

## Usage

```python
from src.agent import ResearchAssistant
from src.llm_providers import OpenAILLM
from src.rag_pipeline import RAGPipeline

# Initialize components
llm = OpenAILLM(model_name="gpt-3.5-turbo")
rag = RAGPipeline()
rag.initialize()
retriever = rag.get_retriever()

# Create and use the assistant
assistant = ResearchAssistant(llm_provider=llm, retriever=retriever)
response = assistant.process_query("How do I fix Wi-Fi connection issues?")
print(response["response"])
```

## Project Structure

```
llm-research-assistant/
├── data/
│   ├── raw/                 # Original dataset files
│   ├── processed/           # Cleaned CSV files
│   └── vector_stores/       # ChromaDB vector stores
├── prompts/
│   └── query_classification_prompt_template.txt  # LLM prompts
├── src/
│   ├── agent.py             # Main assistant logic
│   ├── llm_providers.py     # LLM abstraction layer
│   ├── rag_pipeline.py      # Document retrieval system
│   ├── router.py            # Query routing logic
│   ├── tools/               # External tool integrations
│   └── dspy_modules/        # DSPy components
├── tests/                   # Test cases
├── main.py                  # Entry point
└── requirements.txt         # Dependencies
```

## Requirements

- Python 3.8+
- LangChain
- DSPy
- ChromaDB
- OpenAI or Anthropic API access

## Evaluation

The system uses DSPy's evaluation framework to assess:

- Answer correctness
- Context relevance
- Reasoning quality
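
As a rough illustration of the first two criteria, the sketch below scores answer correctness and context relevance with simple token overlap. The `(example, pred, trace=None)` signature follows the convention DSPy uses for metric functions, but the function names, the `question`/`answer`/`context` fields, and the scoring formulas are assumptions made for this example, not the project's actual metrics. Reasoning quality is harder to score mechanically and is left out.

```python
from types import SimpleNamespace


def _tokens(text: str) -> set:
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())


def answer_overlap(example, pred, trace=None) -> float:
    """F1-style token overlap between the gold and predicted answers."""
    gold, got = _tokens(example.answer), _tokens(pred.answer)
    common = len(gold & got)
    if common == 0:
        return 0.0
    precision = common / len(got)
    recall = common / len(gold)
    return 2 * precision * recall / (precision + recall)


def context_relevance(example, pred, trace=None) -> float:
    """Fraction of question tokens that appear in the retrieved context."""
    question = _tokens(example.question)
    context = _tokens(" ".join(pred.context))
    return len(question & context) / len(question) if question else 0.0


if __name__ == "__main__":
    # SimpleNamespace stands in for DSPy example and prediction objects.
    example = SimpleNamespace(question="reset my password", answer="open settings")
    pred = SimpleNamespace(answer="open settings", context=["reset password in settings"])
    print(answer_overlap(example, pred), context_relevance(example, pred))
```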

## Acknowledgements

- Data sourced from [Bitext Customer Support Dataset](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset)
- Built for an assessment