{"id":27921519,"url":"https://github.com/ako1983/llm_research_assistant","last_synced_at":"2026-04-11T02:47:50.785Z","repository":{"id":284101877,"uuid":"953680945","full_name":"ako1983/LLM_research_assistant","owner":"ako1983","description":"An AI-powered research assistant that leverages Retrieval-Augmented Generation (RAG) to provide accurate responses to user queries by retrieving relevant documents and reasoning through complex questions.","archived":false,"fork":false,"pushed_at":"2025-03-24T09:48:23.000Z","size":58,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-06T21:08:33.015Z","etag":null,"topics":["agent-based-modeling","agents","anthropic","chromadb","dspy","langchain-python","langgraph-python","llm","openai","rag","reasoning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ako1983.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-23T22:25:52.000Z","updated_at":"2025-04-01T14:20:22.000Z","dependencies_parsed_at":"2025-03-24T10:23:20.953Z","dependency_job_id":null,"html_url":"https://github.com/ako1983/LLM_research_assistant","commit_stats":null,"previous_names":["ako1983/llm-research-assistant","ako1983/research-assistant."],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ako1983%2FLLM_research_assistant","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ako1983%2FLLM_research_assistant/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ako1983%2FLLM_research_assistant/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ako1983%2FLLM_research_assistant/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ako1983","download_url":"https://codeload.github.com/ako1983/LLM_research_assistant/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252769421,"owners_count":21801378,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-based-modeling","agents","anthropic","chromadb","dspy","langchain-python","langgraph-python","llm","openai","rag","reasoning"],"created_at":"2025-05-06T21:08:35.764Z","updated_at":"2026-04-11T02:47:50.780Z","avatar_url":"https://github.com/ako1983.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📚 LLM-Powered Research Assistant 🤖\n\nAn advanced AI-powered research assistant that leverages Retrieval-Augmented Generation (RAG) to provide accurate responses to user queries by retrieving relevant documents and reasoning through complex questions.\n\n## Features\n\n- **Smart Query Routing**: Autonomously decides whether to answer directly from knowledge, retrieve additional context, or use specialized tools\n- **RAG Pipeline**: Retrieves relevant documents to enhance responses with accurate, up-to-date information\n- **Multi-step Reasoning**: Uses DSPy for structured reasoning to break down complex queries\n- **Tool Integration**: Utilizes calculators, web search, and other external tools when needed\n- **Hybrid Search**: Combines dense and sparse retrievers for optimal document retrieval\n- **Multi-Modal Support**: Processes and responds to both text and image inputs\n- **Error Handling**: Robust error management with graceful degradation\n- **Evaluation Framework**: Measures response quality and relevance using DSPy's evaluation capabilities\n\n## Architecture\n\n```ascii\n┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐\n│  User Interface │────▶│  Query Router   │────▶│  RAG Pipeline   │\n└─────────────────┘     └─────────────────┘     └─────────────────┘\n                               │  ▲                     │\n                               ▼  │                     ▼\n                        ┌─────────────────┐     ┌─────────────────┐\n                        │   Tools (Calc,  │     │  Vector Store   │\n                        │   Web Search)   │     │   (ChromaDB)    │\n                        └─────────────────┘     └─────────────────┘\n                               │  ▲                     │\n                               ▼  │                     ▼\n                        ┌─────────────────┐     ┌─────────────────┐\n                        │   LLM Provider  │◀───▶│  DSPy Modules   │\n                        │ (OpenAI/Claude) │     │    \u0026 Metrics    │\n                        └─────────────────┘     └─────────────────┘\n```\n\n## Setup \u0026 Installation\n\n### Local Installation\n\n1. Clone the repository\n\n```bash\ngit clone https://github.com/your-username/LLM_research_assistant.git\ncd LLM_research_assistant\n```\n\n2. Install dependencies\n\n```bash\npip install -r requirements.txt\n```\n\n3. Set up environment variables\n\n```bash\nexport OPENAI_API_KEY=\"your-api-key\"\nexport ANTHROPIC_API_KEY=\"your-api-key\"  # If using Claude\nexport SERPER_API_KEY=\"your-api-key\"     # If using Serper.dev for web search\n```\n\n4. Prepare your data\n\n```bash\npython src/vectorstore_builder.py --data_path data/raw --output_path data/vector_stores\n```\n\n### Docker Installation\n\n1. Build and run with Docker Compose\n\n```bash\ndocker-compose up -d\n```\n\n## Usage\n\n### Basic Usage\n\n```python\nfrom src.agent import ResearchAssistant\nfrom src.llm_providers import OpenAILLM\nfrom src.rag_pipeline import RAGPipeline\nfrom src.tools.web_search import WebSearch\nfrom src.tools.calculator import Calculator\n\n# Initialize components\nllm = OpenAILLM(model_name=\"gpt-4o\")\nrag = RAGPipeline()\nrag.initialize()\nretriever = rag.get_retriever()\n\n# Set up tools\ntools = {\n    \"calculator\": Calculator(),\n    \"web_search\": WebSearch(api_key=\"your-search-api-key\", search_engine=\"serper\")\n}\n\n# Create and use the assistant\nassistant = ResearchAssistant(llm_provider=llm, retriever=retriever, tools=tools)\nresponse = assistant.process_query(\"What was the GDP growth rate in the US last quarter?\")\nprint(response[\"response\"])\n```\n\n### API Usage\n\nStart the API server:\n\n```bash\nuvicorn api_gateway:app --host 0.0.0.0 --port 8000\n```\n\nSend a query via HTTP:\n\n```bash\ncurl -X POST \"http://localhost:8000/query\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"query\": \"What are the benefits of RAG over traditional LLM approaches?\"}'\n```\n\n### Multi-Modal Usage\n\n```python\nfrom src.agent import MultiModalAssistant\nfrom src.llm_providers import AnthropicLLM\n\n# Initialize multi-modal assistant\nllm = AnthropicLLM(model_name=\"claude-3-opus-20240229\")\nassistant = MultiModalAssistant(llm_provider=llm, retriever=retriever, tools=tools)\n\n# Process query with image\nresponse = assistant.process_query(\n    \"What can you tell me about this graph?\", \n    images=[\"path/to/image.png\"]\n)\n```\n\n## Project Structure\n\n```\nllm-research-assistant/\n├── api_gateway.py         # FastAPI server for the assistant\n├── data/\n│   ├── raw/                # Original dataset files\n│   ├── processed/          # Cleaned CSV files\n│   └── vector_stores/      # ChromaDB vector stores\n├── docker-compose.yml     # Docker configuration\n├── Dockerfile             # Docker build instructions\n├── prompts/\n│   └── query_classification_prompt_template.txt  # LLM prompts\n├── src/\n│   ├── agent.py            # Main assistant logic\n│   ├── llm_providers.py    # LLM abstraction layer\n│   ├── rag_pipeline.py     # Document retrieval system\n│   ├── router.py           # Query routing logic\n│   ├── tools/              # External tool integrations\n│   │   ├── calculator.py   # Math calculation tool\n│   │   └── web_search.py   # Web search with multiple engines\n│   ├── dspy_modules/       # DSPy components\n│   │   ├── evaluators.py   # Evaluation metrics\n│   │   └── signatures.py   # DSPy signatures\n│   └── vectorstore_builder.py  # Indexing utility\n├── tests/\n│   ├── unit/              # Unit tests\n│   ├── integration/       # Integration tests\n│   └── fixtures/          # Test fixtures\n├── config.yaml            # Configuration file\n├── main.py                # Entry point\n└── requirements.txt       # Dependencies\n```\n\n## Advanced Features\n\n### Hybrid Retrieval\n\nThe system combines dense vector similarity search with sparse BM25 retrieval for better document retrieval:\n\n```python\nfrom src.retrieval import HybridRetriever\n\n# Create a hybrid retriever\nhybrid_retriever = HybridRetriever(\n    vector_store=chroma_db,\n    sparse_weight=0.3,\n    dense_weight=0.7\n)\n\n# Use in assistant\nassistant = ResearchAssistant(llm_provider=llm, retriever=hybrid_retriever)\n```\n\n### Caching\n\nEnable result caching to improve performance:\n\n```python\nfrom src.utils.caching import ResultCache, cached\n\n# Initialize cache\ncache = ResultCache(redis_url=\"redis://localhost:6379/0\")\n\n# Apply caching to expensive operations\n@cached(cache)\ndef get_embeddings(text):\n    # Expensive embedding computation\n    return embeddings\n```\n\n## Error Handling\n\nThe system implements robust error handling with custom exceptions:\n\n```python\ntry:\n    response = assistant.process_query(\"Complex query\")\nexcept LLMProviderError as e:\n    # Handle LLM-specific errors\n    fallback_response = \"I'm having trouble connecting to my knowledge base\"\nexcept RAGPipelineError as e:\n    # Handle retrieval errors\n    fallback_response = \"I couldn't retrieve the necessary information\"\nexcept ToolExecutionError as e:\n    # Handle tool execution errors\n    fallback_response = \"I encountered an issue with the requested operation\"\n```\n\n## Requirements\n\n- Python 3.8+\n- LangChain\n- DSPy\n- ChromaDB\n- OpenAI or Anthropic API access\n- Redis (optional, for caching)\n\n## Evaluation\n\nThe system uses DSPy's evaluation framework to assess:\n\n- Answer correctness\n- Context relevance\n- Reasoning quality\n- Hallucination detection\n\n## Contributing\n\nWe welcome contributions to improve the research assistant! Please follow these steps:\n\n1. Fork the repository\n2. Create a new branch (`git checkout -b feature/your-feature`)\n3. Make your changes\n4. Run tests (`pytest`)\n5. Commit your changes (`git commit -m 'Add some feature'`)\n6. Push to the branch (`git push origin feature/your-feature`)\n7. Open a Pull Request\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n## Acknowledgements\n\n- Built with [LangChain](https://github.com/langchain-ai/langchain) and [DSPy](https://github.com/stanfordnlp/dspy)\n- Vector storage provided by [ChromaDB](https://github.com/chroma-core/chroma)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fako1983%2Fllm_research_assistant","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fako1983%2Fllm_research_assistant","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fako1983%2Fllm_research_assistant/lists"}