https://github.com/ako1983/llm_research_assistant


# 📚 LLM-Powered Research Assistant 🤖

An advanced AI-powered research assistant that leverages Retrieval-Augmented Generation (RAG) to provide accurate responses to user queries by retrieving relevant documents and reasoning through complex questions.

## Features

- **Smart Query Routing**: Autonomously decides whether to answer directly from knowledge, retrieve additional context, or use specialized tools
- **RAG Pipeline**: Retrieves relevant documents to enhance responses with accurate, up-to-date information
- **Multi-step Reasoning**: Uses DSPy for structured reasoning to break down complex queries
- **Tool Integration**: Utilizes calculators, web search, and other external tools when needed
- **Hybrid Search**: Combines dense and sparse retrievers for optimal document retrieval
- **Multi-Modal Support**: Processes and responds to both text and image inputs
- **Error Handling**: Robust error management with graceful degradation
- **Evaluation Framework**: Measures response quality and relevance using DSPy's evaluation capabilities
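
The routing decision described under Smart Query Routing can be pictured with a small sketch. The README does not show how `src/router.py` classifies queries, so the keyword heuristics, the `Route` enum, and `route_query` below are hypothetical stand-ins for whatever classifier the project actually uses, not its implementation.

```python
import re
from enum import Enum


class Route(Enum):
    DIRECT = "direct"      # answer from the model's own knowledge
    RETRIEVE = "retrieve"  # fetch documents through the RAG pipeline
    TOOL = "tool"          # delegate to a calculator or web search


# Hypothetical heuristics standing in for an LLM-based classifier.
ARITHMETIC = re.compile(r"\d+\s*[-+*/^]\s*\d+")
FRESHNESS_HINTS = ("latest", "today", "current", "last quarter")
DOCUMENT_HINTS = ("according to", "in the report", "in the paper", "dataset")


def route_query(query: str) -> Route:
    """Pick a route for a query using simple keyword rules."""
    text = query.lower()
    if ARITHMETIC.search(text) or any(h in text for h in FRESHNESS_HINTS):
        return Route.TOOL
    if any(h in text for h in DOCUMENT_HINTS):
        return Route.RETRIEVE
    return Route.DIRECT


if __name__ == "__main__":
    for q in ("What is 17 * 23?", "Summarize the findings in the report.", "Define RAG."):
        print(q, "->", route_query(q).value)
```

In a real router the rule checks would likely be replaced by a model call; the three-way return value is the part that matters for the rest of the pipeline.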

## Architecture

```ascii
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ User Interface  │────▶│  Query Router   │────▶│  RAG Pipeline   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                              │     ▲                    │
                              ▼     │                    ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │  Tools (Calc,   │     │  Vector Store   │
                        │   Web Search)   │     │   (ChromaDB)    │
                        └─────────────────┘     └─────────────────┘
                              │     ▲                    │
                              ▼     │                    ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │  LLM Provider   │◀───▶│  DSPy Modules   │
                        │ (OpenAI/Claude) │     │    & Metrics    │
                        └─────────────────┘     └─────────────────┘
```

## Setup & Installation

### Local Installation

1. Clone the repository

```bash
git clone https://github.com/ako1983/llm_research_assistant.git
cd llm_research_assistant
```

2. Install dependencies

```bash
pip install -r requirements.txt
```

3. Set up environment variables

```bash
export OPENAI_API_KEY="your-api-key"
export ANTHROPIC_API_KEY="your-api-key" # If using Claude
export SERPER_API_KEY="your-api-key" # If using Serper.dev for web search
```
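
Before starting the assistant, it can help to confirm that the keys above are actually set. This is a minimal sketch, assuming the OpenAI provider is in use (so `OPENAI_API_KEY` is treated as required) and that the other two keys are optional, as the comments indicate. `missing_keys` is a hypothetical helper, not part of the project.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY",)                      # assumption: OpenAI provider
OPTIONAL_KEYS = ("ANTHROPIC_API_KEY", "SERPER_API_KEY")  # only for Claude / Serper


def missing_keys(names, env=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for name in missing_keys(REQUIRED_KEYS):
        print(f"Required variable {name} is not set.")
    for name in missing_keys(OPTIONAL_KEYS):
        print(f"Optional variable {name} is not set; the related feature may be unavailable.")
```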

4. Prepare your data

```bash
python src/vectorstore_builder.py --data_path data/raw --output_path data/vector_stores
```

### Docker Installation

1. Build and run with Docker Compose

```bash
docker-compose up -d
```

## Usage

### Basic Usage

```python
from src.agent import ResearchAssistant
from src.llm_providers import OpenAILLM
from src.rag_pipeline import RAGPipeline
from src.tools.web_search import WebSearch
from src.tools.calculator import Calculator

# Initialize components
llm = OpenAILLM(model_name="gpt-4o")
rag = RAGPipeline()
rag.initialize()
retriever = rag.get_retriever()

# Set up tools
tools = {
    "calculator": Calculator(),
    "web_search": WebSearch(api_key="your-search-api-key", search_engine="serper"),
}

# Create and use the assistant
assistant = ResearchAssistant(llm_provider=llm, retriever=retriever, tools=tools)
response = assistant.process_query("What was the GDP growth rate in the US last quarter?")
print(response["response"])
```

### API Usage

Start the API server:

```bash
uvicorn api_gateway:app --host 0.0.0.0 --port 8000
```

Send a query via HTTP:

```bash
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the benefits of RAG over traditional LLM approaches?"}'
```
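
The same request can be sent from Python with only the standard library. This sketch assumes the server from the previous step is listening on `localhost:8000` and that `/query` accepts and returns JSON, as the `curl` call suggests. The response schema is not documented here, so the parsed JSON is returned without interpretation. `build_query_request` and `send_query` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/query"  # matches the curl example above


def build_query_request(query: str, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(query: str, url: str = API_URL, timeout: float = 30.0):
    """Send the query and return the parsed JSON response."""
    request = build_query_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_query("What are the benefits of RAG over traditional LLM approaches?")` returns whatever JSON the endpoint produces.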

### Multi-Modal Usage

```python
from src.agent import MultiModalAssistant
from src.llm_providers import AnthropicLLM

# Initialize multi-modal assistant
# (retriever and tools are the objects created in Basic Usage)
llm = AnthropicLLM(model_name="claude-3-opus-20240229")
assistant = MultiModalAssistant(llm_provider=llm, retriever=retriever, tools=tools)

# Process query with image
response = assistant.process_query(
    "What can you tell me about this graph?",
    images=["path/to/image.png"],
)
```

## Project Structure

```
llm-research-assistant/
├── api_gateway.py              # FastAPI server for the assistant
├── data/
│   ├── raw/                    # Original dataset files
│   ├── processed/              # Cleaned CSV files
│   └── vector_stores/          # ChromaDB vector stores
├── docker-compose.yml          # Docker configuration
├── Dockerfile                  # Docker build instructions
├── prompts/
│   └── query_classification_prompt_template.txt  # LLM prompts
├── src/
│   ├── agent.py                # Main assistant logic
│   ├── llm_providers.py        # LLM abstraction layer
│   ├── rag_pipeline.py         # Document retrieval system
│   ├── router.py               # Query routing logic
│   ├── tools/                  # External tool integrations
│   │   ├── calculator.py       # Math calculation tool
│   │   └── web_search.py       # Web search with multiple engines
│   ├── dspy_modules/           # DSPy components
│   │   ├── evaluators.py       # Evaluation metrics
│   │   └── signatures.py       # DSPy signatures
│   └── vectorstore_builder.py  # Indexing utility
├── tests/
│   ├── unit/                   # Unit tests
│   ├── integration/            # Integration tests
│   └── fixtures/               # Test fixtures
├── config.yaml                 # Configuration file
├── main.py                     # Entry point
└── requirements.txt            # Dependencies
```

## Advanced Features

### Hybrid Retrieval

The system combines dense vector similarity search with sparse BM25 retrieval, weighting the two sources when ranking documents:

```python
from src.retrieval import HybridRetriever

# Create a hybrid retriever
# (chroma_db is an existing ChromaDB vector store; llm is set up as in Basic Usage)
hybrid_retriever = HybridRetriever(
    vector_store=chroma_db,
    sparse_weight=0.3,
    dense_weight=0.7,
)

# Use in assistant
assistant = ResearchAssistant(llm_provider=llm, retriever=hybrid_retriever)
```
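
How the two retrievers' results are merged is not shown above. One common approach is weighted score fusion: normalize each retriever's scores to a shared range, then take a weighted sum. The sketch below illustrates that idea with the same 0.7/0.3 weights. It is an assumption about how such a retriever might work, not the internals of `HybridRetriever`, and `min_max_normalize` and `fuse_scores` are hypothetical names.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def fuse_scores(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    dense_weight: float = 0.7,
    sparse_weight: float = 0.3,
) -> List[Tuple[str, float]]:
    """Combine normalized dense and sparse scores into one ranking."""
    dense_n, sparse_n = min_max_normalize(dense), min_max_normalize(sparse)
    fused = {
        doc_id: dense_weight * dense_n.get(doc_id, 0.0)
        + sparse_weight * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    # Highest combined score first; ties broken by document id for stability.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dense_scores = {"doc_a": 0.9, "doc_b": 0.5, "doc_c": 0.1}  # e.g. cosine similarity
    sparse_scores = {"doc_b": 12.0, "doc_d": 4.0}              # e.g. BM25
    for doc_id, score in fuse_scores(dense_scores, sparse_scores):
        print(f"{doc_id}: {score:.3f}")
```

Normalizing first matters because BM25 scores and cosine similarities live on different scales; without it, the weights would not mean what they appear to mean.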

### Caching

Enable result caching to improve performance:

```python
from src.utils.caching import ResultCache, cached

# Initialize cache
cache = ResultCache(redis_url="redis://localhost:6379/0")

# Apply caching to expensive operations
@cached(cache)
def get_embeddings(text):
    embeddings = ...  # Expensive embedding computation goes here
    return embeddings
```
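
To make the pattern concrete, here is a minimal sketch of how a `cached` decorator of this shape could work. It uses an in-memory dictionary in place of Redis so it runs without external services. `InMemoryCache` and this version of `cached` are illustrative stand-ins, not the contents of `src/utils/caching.py`.

```python
import functools
import hashlib
import json


class InMemoryCache:
    """Dictionary-backed cache exposing simple get/set methods."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def cached(cache):
    """Decorator factory: memoize a function's results in `cache`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Derive a stable key from the function name and its arguments.
            payload = json.dumps([func.__name__, args, kwargs], sort_keys=True, default=repr)
            key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
            hit = cache.get(key)
            if hit is not None:  # note: a None result is never treated as cached
                return hit
            result = func(*args, **kwargs)
            cache.set(key, result)
            return result
        return wrapper
    return decorator


calls = []
cache = InMemoryCache()


@cached(cache)
def get_embeddings(text):
    calls.append(text)                      # record each real computation
    return [float(ord(ch)) for ch in text]  # toy stand-in for an embedding


get_embeddings("rag")
get_embeddings("rag")  # second call is served from the cache
```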

## Error Handling

The system implements robust error handling with custom exceptions:

```python
# LLMProviderError, RAGPipelineError, and ToolExecutionError are the project's
# custom exceptions and must be imported before use.
try:
    response = assistant.process_query("Complex query")
except LLMProviderError as e:
    # Handle LLM-specific errors
    fallback_response = "I'm having trouble connecting to my knowledge base"
except RAGPipelineError as e:
    # Handle retrieval errors
    fallback_response = "I couldn't retrieve the necessary information"
except ToolExecutionError as e:
    # Handle tool execution errors
    fallback_response = "I encountered an issue with the requested operation"
```
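
The exception classes themselves are not shown. A minimal sketch of one way to define them and centralize the fallback logic follows. The shared `AssistantError` base class, the `FALLBACKS` table, and `safe_query` are assumptions made for illustration; only the three exception names and the fallback messages come from the example above.

```python
class AssistantError(Exception):
    """Hypothetical common base class for the assistant's errors."""


class LLMProviderError(AssistantError):
    pass


class RAGPipelineError(AssistantError):
    pass


class ToolExecutionError(AssistantError):
    pass


# Fallback messages, keyed by exception type.
FALLBACKS = {
    LLMProviderError: "I'm having trouble connecting to my knowledge base",
    RAGPipelineError: "I couldn't retrieve the necessary information",
    ToolExecutionError: "I encountered an issue with the requested operation",
}


def safe_query(process_query, query):
    """Run process_query(query); degrade to a fallback message on known errors."""
    try:
        return process_query(query)
    except AssistantError as exc:
        # Unlisted subclasses of AssistantError get a generic message.
        return FALLBACKS.get(type(exc), "Something went wrong while handling the query")


def failing_retrieval(query):
    """Simulate a retrieval failure for demonstration."""
    raise RAGPipelineError("vector store unavailable")


print(safe_query(failing_retrieval, "Complex query"))
```

Catching a shared base class keeps unrelated exceptions (for example, programming errors) visible instead of silently converting them into fallback text.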

## Requirements

- Python 3.8+
- LangChain
- DSPy
- ChromaDB
- OpenAI or Anthropic API access
- Redis (optional, for caching)

## Evaluation

The system uses DSPy's evaluation framework to assess:

- Answer correctness
- Context relevance
- Reasoning quality
- Hallucination detection
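
As an illustration of the first two criteria, the sketch below defines two plain-Python metrics. DSPy metrics are conventionally callables of the form `metric(example, pred, trace=None)`, and the functions follow that shape. The field names (`question`, `answer`, `context`) and the token-overlap heuristic are assumptions; the project's actual metrics in `src/dspy_modules/evaluators.py` are not shown here.

```python
import re
from types import SimpleNamespace


def _normalize(text):
    """Lowercase word tokens in order, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def answer_correctness(example, pred, trace=None):
    """Exact match between gold and predicted answers after normalization."""
    return _normalize(example.answer) == _normalize(pred.answer)


def context_relevance(example, pred, trace=None):
    """Fraction of question tokens that also appear in the retrieved context."""
    question = set(_normalize(example.question))
    context = set(_normalize(pred.context))
    return len(question & context) / len(question) if question else 0.0


# SimpleNamespace objects stand in for DSPy example and prediction objects.
example = SimpleNamespace(question="What is RAG?", answer="Retrieval-Augmented Generation")
pred = SimpleNamespace(
    answer="retrieval augmented generation.",
    context="RAG is retrieval augmented generation.",
)

print("correct:", answer_correctness(example, pred))
print("relevance:", round(context_relevance(example, pred), 2))
```

Reasoning quality and hallucination detection usually need a model-based judge rather than string overlap, so they are left out of this sketch.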

## Contributing

We welcome contributions to improve the research assistant! Please follow these steps:

1. Fork the repository
2. Create a new branch (`git checkout -b feature/your-feature`)
3. Make your changes
4. Run tests (`pytest`)
5. Commit your changes (`git commit -m 'Add some feature'`)
6. Push to the branch (`git push origin feature/your-feature`)
7. Open a Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgements

- Built with [LangChain](https://github.com/langchain-ai/langchain) and [DSPy](https://github.com/stanfordnlp/dspy)
- Vector storage provided by [ChromaDB](https://github.com/chroma-core/chroma)