{"id":26317804,"url":"https://github.com/ricoledan/supervisor-multi-agent-system","last_synced_at":"2025-10-05T07:56:57.208Z","repository":{"id":281934117,"uuid":"946870122","full_name":"Ricoledan/supervisor-multi-agent-system","owner":"Ricoledan","description":"🤖 Example of a Hierarchical Multi-Agent System leveraging a central supervisor agent that oversees specialized worker agents. ","archived":false,"fork":false,"pushed_at":"2025-06-14T15:24:48.000Z","size":12852,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-14T15:31:39.071Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ricoledan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-11T19:59:23.000Z","updated_at":"2025-06-14T15:24:51.000Z","dependencies_parsed_at":"2025-06-14T15:33:47.721Z","dependency_job_id":null,"html_url":"https://github.com/Ricoledan/supervisor-multi-agent-system","commit_stats":null,"previous_names":["ricoledan/supervisor-ai-agent"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Ricoledan/supervisor-multi-agent-system","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ricoledan%2Fsupervisor-multi-agent-system","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ricoledan%2Fsupervisor-multi-agent-system/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ricoledan%2Fsup
ervisor-multi-agent-system/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ricoledan%2Fsupervisor-multi-agent-system/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Ricoledan","download_url":"https://codeload.github.com/Ricoledan/supervisor-multi-agent-system/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ricoledan%2Fsupervisor-multi-agent-system/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278425493,"owners_count":25984687,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-15T14:15:42.806Z","updated_at":"2025-10-05T07:56:57.176Z","avatar_url":"https://github.com/Ricoledan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multi-Agent Research System\n\n## Overview\n\nThis project implements a sophisticated Multi-Agent System (MAS) architecture featuring a **Research Coordinator** that\norchestrates specialized AI agents to analyze academic research. 
The system transforms academic papers into a\nsearchable, structured, and semantically rich knowledge base through intelligent agent coordination powered by\n**LangGraph** state management and **Command-based routing**.\n\n## Key Features\n\n### Core Capabilities\n\n- **Intelligent Agent Orchestration**: Research Coordinator uses LangGraph StateGraph with Command-based routing to\n  dynamically delegate queries to specialized agents\n- **Multi-Database Architecture**: Integrates Neo4j (graph), MongoDB (documents), and ChromaDB (vectors) for\n  comprehensive data storage\n- **Automated PDF Processing**: Advanced ingestion pipeline with entity extraction, topic modeling, and metadata\n  enrichment using OpenAI GPT-4\n- **Semantic Search \u0026 Retrieval**: Hybrid search combining vector similarity, graph traversal, and document analysis\n- **Real-time Research Analysis**: Dynamic routing between relationship analysis and thematic analysis based on query\n  classification\n- **Professional CLI Interface**: Comprehensive command-line management with health checks, logging, and testing\n  capabilities\n\n### LangGraph Workflow Architecture\n\n```\nResearch Coordinator (LangGraph StateGraph)\n    ↓\nQuery Classification Node\n    ↓\n┌─────────────┬─────────────┬─────────────┐\n│   Greeting  │   Simple    │  Research   │\n│   Handler   │  Question   │   Query     │\n└─────────────┴─────────────┴─────────────┘\n                                  ↓\n                          Planning Node\n                                  ↓\n                    ┌─────────────┬─────────────┐\n                    │Relationship │   Theme     │\n                    │  Analyst    │  Analyst    │\n                    │   Node      │    Node     │\n                    └─────────────┴─────────────┘\n                                  ↓\n                           Synthesis Node\n                                  ↓\n                           Final Response\n```\n\n### Agent Specializations\n\n
- **🎯 Research Coordinator**: Central supervisor using LangGraph Commands for intelligent query classification and agent\n  delegation\n- **🔗 Relationship Analyst**: Maps connections between papers, authors, concepts, and research lineages using Neo4j\n  graph queries\n- **📊 Theme Analyst**: Identifies patterns, topics, and trends across research literature using MongoDB document\n  analysis\n- **🏷️ Entity Extraction**: Automated identification of key concepts, methodologies, and research entities via LLM\n  processing\n- **📈 Topic Modeling**: Latent theme discovery and research domain classification with weighted term extraction\n\n## System Architecture\n\n### Complete Workflow Pipeline\n\n```\nPDF Ingestion ➜ Entity Extraction ➜ Topic Modeling ➜ Graph Construction ➜ Vector Embedding ➜ Agent Analysis ➜ User Interaction\n```\n\n### Advanced Ingestion Pipeline\n\nThe system features a sophisticated **multi-stage ingestion pipeline** that processes academic PDFs:\n\n1. **📄 PDF Text Extraction**: Uses PyMuPDF for robust text and metadata extraction\n2. **🧠 LLM-Powered Analysis**: OpenAI GPT-4 extracts entities, relationships, and topics\n3. **🔗 Knowledge Graph Construction**: Builds Neo4j nodes and relationships for papers, authors, and concepts\n4. **📊 Topic Modeling**: Discovers research themes and categorizes content in MongoDB\n5. **🎯 Vector Embeddings**: Creates semantic embeddings for similarity search in ChromaDB\n
6. **✅ Quality Validation**: Tests data integrity across all databases\n\n```bash\n# Run complete ingestion pipeline\npython src/utils/ingestion_pipeline.py\n\n# Test with a single PDF\npython src/utils/ingestion_pipeline.py --test\n```\n\n### Directory Structure\n\n```\nsupervisor-multi-agent-system/\n├── src/\n│   ├── main.py                    # FastAPI application entry point\n│   ├── api/v1/endpoints/          # API endpoints\n│   │   ├── status.py              # Health check endpoints\n│   │   └── agent.py               # Main agent interaction endpoint\n│   ├── domain/\n│   │   └── agents/                # Specialized AI agents\n│   │       ├── research_coordinator.py   # LangGraph orchestration agent\n│   │       ├── relationship_analyst.py  # Neo4j graph analysis\n│   │       └── theme_analyst.py         # MongoDB topic analysis\n│   ├── databases/                 # Database configurations\n│   │   ├── graph/                 # Neo4j configuration\n│   │   ├── document/              # MongoDB configuration\n│   │   └── vector/                # ChromaDB configuration\n│   ├── services/                  # Database service layers\n│   │   ├── graph_service.py       # Neo4j operations\n│   │   ├── document_service.py    # MongoDB operations\n│   │   └── vector_service.py      # ChromaDB operations\n│   └── utils/                     # Utilities and tools\n│       ├── ingestion_pipeline.py  # Comprehensive PDF processing\n│       ├── model_init.py          # LLM initialization\n│       └── agent_wrapper.py       # Agent response utilities\n├── cli.py                         # Professional CLI interface\n├── docker-compose.yml             # Multi-service orchestration\n├── requirements.txt               # Python dependencies\n└── sources/                       # PDF documents for ingestion\n```\n\n### Technology Stack\n\n| Component             | Technology              | Purpose                                               
|\n|-----------------------|-------------------------|-------------------------------------------------------|\n| **Agent Framework**   | LangGraph + LangChain   | Modern state-based multi-agent orchestration          |\n| **LLM Integration**   | OpenAI GPT-4            | Entity extraction, topic modeling, and analysis       |\n| **API Framework**     | FastAPI                 | High-performance web API with automatic documentation |\n| **Graph Database**    | Neo4j                   | Knowledge graph for entity relationships              |\n| **Document Database** | MongoDB                 | Structured document storage and topic modeling        |\n| **Vector Database**   | ChromaDB                | Semantic search and similarity matching               |\n| **Containerization**  | Docker + Docker Compose | Consistent deployment and scaling                     |\n| **CLI Interface**     | Click                   | Professional command-line management                  |\n\n## Prerequisites\n\n- **Python**: 3.11+\n- **Docker**: Latest version with Docker Compose\n- **OpenAI API Key**: Required for LLM operations\n- **System Requirements**: 8GB RAM minimum, 16GB recommended\n- **Storage**: 20GB minimum, 50GB recommended for large document collections\n\n## Quick Start\n\n### 1. Clone and Setup\n\n```bash\ngit clone https://github.com/Ricoledan/supervisor-multi-agent-system\ncd supervisor-multi-agent-system\ncp .env.defaults .env\n# Edit .env and add your OPENAI_API_KEY\n```\n\n### 2. Start the System\n\n```bash\n# Using the CLI (recommended)\npython cli.py start\n\n# Quick start with minimal health checks\npython cli.py quick-start\n\n# Start only databases for development\npython cli.py start --databases-only\n```\n\n### 3. Add Research Papers\n\n```bash\n# Create the sources directory, then copy your academic PDF files into it\nmkdir -p sources\n\n# Process the PDFs into Neo4j, MongoDB, and ChromaDB\npython src/utils/ingestion_pipeline.py\n```\n\n### 4. Test the System\n\n
```bash\n# Quick test with clean output\npython cli.py test --simple\n\n# Detailed system test\npython cli.py test --query \"machine learning applications\"\n\n# Test specific functionality\ncurl -X POST \"http://localhost:8000/api/v1/agent\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"query\": \"How do neural networks relate to computer vision?\"}'\n```\n\n## CLI Commands\n\nThe system includes a comprehensive CLI for professional management:\n\n### System Management\n\n```bash\npython cli.py start           # Start all services\npython cli.py stop            # Stop all services\npython cli.py restart         # Restart system\npython cli.py status          # Check service status\n```\n\n### Development \u0026 Testing\n\n```bash\npython cli.py test            # Test system functionality\npython cli.py test --simple   # Clean, formatted output\npython cli.py health          # Run health checks\npython cli.py health --detailed  # Comprehensive health analysis\npython cli.py logs            # View system logs\npython cli.py logs --follow   # Follow logs in real-time\n```\n\n### Database Management\n\n```bash\npython cli.py start --databases-only  # Start only databases\npython cli.py restart --service neo4j  # Restart specific service\n```\n\n## Research Applications \u0026 Use Cases\n\n### Literature Review Automation\n\n```json\n{\n  \"query\": \"What are the main approaches to transformer architectures in natural language processing?\"\n}\n```\n\n### Research Gap Identification\n\n```json\n{\n  \"query\": \"How do computer vision techniques connect to medical diagnosis research?\"\n}\n```\n\n### Trend Analysis\n\n```json\n{\n  \"query\": \"What themes are emerging in climate change adaptation research over the past 5 years?\"\n}\n```\n\n### Citation Network Analysis\n\n```json\n{\n  \"query\": \"Show me the research lineage and evolution of BERT language models\"\n}\n```\n\n### Cross-Disciplinary Discovery\n\n```json\n{\n  
\"query\": \"How does reinforcement learning apply to robotics and autonomous systems?\"\n}\n```\n\n## API Documentation\n\n### Core Research Endpoint\n\n**POST /api/v1/agent**\n\n```bash\ncurl -X POST \"http://localhost:8000/api/v1/agent\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"query\": \"How do neural networks relate to computer vision?\"}'\n```\n\n### Response Structure\n\n```json\n{\n  \"status\": \"success\",\n  \"message\": \"# 🎯 Research Analysis Results\\n\\n**Query:** How do neural networks relate to computer vision?\\n\\n## 🔗 Relationship Analysis\\n\\nBased on the knowledge graph analysis...\",\n  \"query\": \"How do neural networks relate to computer vision?\",\n  \"query_type\": \"RESEARCH_QUERY\",\n  \"specialists_used\": {\n    \"relationship_analyst\": true,\n    \"theme_analyst\": true\n  },\n  \"system_health\": {\n    \"relationship_analyst\": \"✅ Active\",\n    \"theme_analyst\": \"✅ Active\",\n    \"database_usage\": \"✅ High\",\n    \"response_quality\": \"Database-driven\"\n  }\n}\n```\n\n### Additional Endpoints\n\n| Endpoint                 | Method | Description                     |\n|--------------------------|--------|---------------------------------|\n| `/api/v1/status`         | GET    | System health check             |\n| `/api/v1/agent`          | POST   | Main research analysis endpoint |\n| `/api/v1/agent/detailed` | POST   | Full conversation state         |\n| `/api/v1/agent/raw`      | POST   | Debug endpoint with raw outputs |\n| `/api/v1/agent/health`   | GET    | Agent system health check       |\n\n## 🗄️ Database Schema \u0026 Architecture\n\n### Neo4j Graph Schema\n\n```cypher\n// Nodes\n(:Paper {id, title, year, source, research_field, methodology})\n(:Author {name})\n(:Concept {name, category, description})\n\n// Relationships\n(:Author)-[:AUTHORED]-\u003e(:Paper)\n(:Paper)-[:CONTAINS]-\u003e(:Concept)\n(:Concept)-[:RELATES_TO {type, description}]-\u003e(:Concept)\n```\n\n### MongoDB 
Collections\n\n```javascript\n// papers collection\n{\n    paper_id: String,\n    metadata: {\n        title, authors, year, abstract, keywords,\n        journal, doi, research_field, methodology\n    },\n    content: [{page, text}],\n    entities: {\n        concepts, relationships\n    },\n    processed_at: Date\n}\n\n// topics collection\n{\n    paper_id: String,\n    category: String,\n    terms: [{term, weight}],\n    source: String,\n    created_at: Date\n}\n```\n\n### ChromaDB Schema\n\n```python\n# Collection: academic_papers\n{\n    documents: [text_chunks],\n    embeddings: [vector_embeddings],\n    metadatas: [{\n        paper_id, page, source, title,\n        authors, year, research_field,\n        chunk_id, chunk_total\n    }],\n    ids: [unique_chunk_ids]\n}\n```\n\n## Configuration \u0026 Environment\n\n### Environment Variables (.env)\n\n```bash\n# Required\nOPENAI_API_KEY=your_openai_api_key_here\n\n# Database Configuration (defaults provided)\nNEO4J_URI=bolt://localhost:7687\nNEO4J_USER=neo4j\nNEO4J_PASSWORD=password\nNEO4J_DB=neo4j\n\nMONGODB_HOST=localhost\nMONGODB_PORT=27017\nMONGODB_USER=user\nMONGODB_PASSWORD=password\nMONGODB_DB=research_db\n\nCHROMA_HOST=localhost\nCHROMA_PORT=8001\n```\n\n### Advanced Configuration\n\n```bash\n# LLM Model Selection\nOPENAI_MODEL=gpt-4  # or gpt-3.5-turbo for faster responses\n\n# Ingestion Pipeline Settings\nCHUNK_SIZE=1000        # Text chunk size for embeddings\nCHUNK_OVERLAP=200      # Overlap between chunks\nMAX_CONCEPTS=15        # Maximum concepts per paper\n\n# Performance Tuning\nNEO4J_POOL_SIZE=10\nMONGODB_POOL_SIZE=10\n```\n\n## Agent Tool System\n\n### Relationship Analyst Tools\n\n- **`analyze_research_relationships()`**: Queries Neo4j for entity connections\n    - Paper lineages and citation networks\n    - Author collaboration patterns\n    - Cross-disciplinary concept relationships\n    - Research influence patterns\n
\n### Theme Analyst Tools\n\n- **`analyze_research_themes()`**: Queries MongoDB for topic patterns\n    - Latent theme discovery across document collections\n    - Research trend identification and evolution\n    - Methodological approach analysis\n    - Domain-specific terminology extraction\n\n## Performance \u0026 Scalability\n\n### Performance Characteristics\n\n- **Query Response Time**: 15-45 seconds (depends on database size and complexity)\n- **PDF Processing Speed**: 2-3 minutes per paper (including all extractions)\n- **Concurrent Users**: Supports 5-10 simultaneous research queries\n- **Database Storage**: ~500MB per 100 research papers\n\n### System Requirements\n\n- **Minimum**: 8GB RAM, 4 CPU cores, 20GB storage\n- **Recommended**: 16GB RAM, 8 CPU cores, 50GB storage\n- **Production**: 32GB RAM, 8+ CPU cores, 100GB+ storage\n\n### Scalability Options\n\n- **Horizontal Scaling**: Docker Compose replicas for API services\n- **Database Optimization**: Connection pooling and memory tuning\n- **Caching**: Redis integration for frequent queries (future enhancement)\n\n## Troubleshooting\n\n### Common Issues \u0026 Solutions\n\n#### Database Connection Failures\n\n```bash\n# Check service status\npython cli.py status\n\n# View detailed logs\npython cli.py logs --service neo4j\npython cli.py logs --service mongodb\npython cli.py logs --service chromadb\n\n# Restart specific service\npython cli.py restart --service neo4j\n```\n\n#### Empty Database Results\n\n```bash\n# Verify data ingestion completed\npython cli.py test --query \"machine learning\"\n\n# Check ingestion quality\npython src/utils/ingestion_pipeline.py --test\n\n# Re-run full ingestion if needed\npython src/utils/ingestion_pipeline.py\n```\n\n#### API Timeout Issues\n\n```bash\n# Increase timeout for complex queries\npython cli.py test --timeout 120\n\n# Check database performance\npython cli.py health --detailed\n\n# Follow logs to spot slow or failing queries\npython cli.py logs --follow\n```\n\n#### Source Directory Issues\n\n
```bash\n# Verify the sources directory exists and check PDF file permissions\nls -la sources/\n\n# Confirm that queries return results from the ingested papers\npython cli.py test --query \"test\"\n```\n\n## Development \u0026 Extension\n\n### Adding New Specialist Agents\n\n1. **Create Agent File**: `src/domain/agents/new_specialist.py`\n\n```python\nfrom langchain_core.tools import tool\nfrom langchain_openai import ChatOpenAI\nfrom langgraph.prebuilt import create_react_agent\n\n# Placeholder prompt: describe your specialist's role and tools here\nSYSTEM_PROMPT = \"You are a specialist agent that analyzes custom research data.\"\n\n# Any LangChain chat model works; you can also reuse the setup in src/utils/model_init.py\nmodel = ChatOpenAI(model=\"gpt-4\")\n\n\n@tool\ndef analyze_custom_data(query: str) -\u003e str:\n    \"\"\"Analyze custom research data relevant to the query.\"\"\"\n    # Replace this placeholder with your custom database queries\n    analysis_result = f\"No custom analysis implemented yet for: {query}\"\n    return analysis_result\n\n\nspecialist_agent = create_react_agent(\n    model=model,\n    tools=[analyze_custom_data],\n    prompt=SYSTEM_PROMPT,\n)\n```\n\n2. **Update Coordinator**: Add routing logic in `research_coordinator.py`\n3. **Add API Endpoints**: Update `agent.py` if needed\n\n### Custom Database Queries\n\n```python\n# Example: Custom Neo4j analysis, using the connection settings from .env\nimport os\nfrom neo4j import GraphDatabase\n\ndriver = GraphDatabase.driver(\n    os.getenv(\"NEO4J_URI\", \"bolt://localhost:7687\"),\n    auth=(os.getenv(\"NEO4J_USER\", \"neo4j\"), os.getenv(\"NEO4J_PASSWORD\", \"password\")),\n)\n\n\ndef analyze_author_networks(query: str):\n    with driver.session() as session:\n        # Pass parameters as a dict: run()'s first argument is named `query`, so query=... would clash\n        result = session.run(\"\"\"\n            MATCH (a1:Author)-[:AUTHORED]-\u003e(p)\u003c-[:AUTHORED]-(a2:Author)\n            WHERE a1.name CONTAINS $query\n            RETURN a1.name, a2.name, count(p) as collaborations\n            ORDER BY collaborations DESC LIMIT 10\n        \"\"\", {\"query\": query})\n        return result.data()\n```\n\n### Development Workflow\n\n```bash\n# Start databases only for development\npython cli.py start --databases-only\n\n# Run API in development mode\npython -m uvicorn src.main:app --reload --host 0.0.0.0 --port 8000\n\n# Monitor logs in separate terminal\npython cli.py logs --follow\n\n# Test changes\npython cli.py test --simple\n```\n\n## Access Points\n\nAfter starting the system, access these interfaces:\n\n- **API Documentation**: http://localhost:8000/docs\n- **API Status**: http://localhost:8000/api/v1/status\n- **Neo4j Browser**: http://localhost:7474 (neo4j/password)\n- **MongoDB Express**: http://localhost:8081\n- **ChromaDB**: 
http://localhost:8001\n\n## 🧪 Testing \u0026 Quality Assurance\n\n### Comprehensive Testing\n\n```bash\n# Full system test with clean output\npython cli.py test --simple\n\n# Test with specific queries\npython cli.py test --query \"transformer models\" --timeout 60\n\n# Health check all components\npython cli.py health --detailed\n\n# Test ingestion pipeline\npython src/utils/ingestion_pipeline.py --test\n\n# API endpoint testing\ncurl -X GET \"http://localhost:8000/api/v1/agent/health\"\n```\n\n### Quality Validation\n\nThe system includes built-in quality checks:\n\n- **Data Integrity**: Validates cross-database consistency\n- **Response Quality**: Monitors agent specialist usage\n- **Performance Metrics**: Tracks query response times\n- **Database Health**: Monitors connection status and query performance\n\n## Supported Document Types \u0026 Sources\n\n### Input Formats\n\n- **Primary**: PDF research papers with text content\n- **Secondary**: Text files (.txt, .md) for preprocessing\n- **Future**: DOI-based ingestion, arXiv API integration\n\n### Recommended Paper Sources\n\n- Academic conferences (NeurIPS, ICML, ACL, ICLR, etc.)\n- Journal articles from major publishers (IEEE, ACM, Springer, Elsevier)\n- Preprint servers (arXiv, bioRxiv, medRxiv)\n- Technical reports and white papers\n\n## References \u0026 Resources\n\n- [LangGraph Multi-Agent Documentation](https://langchain-ai.github.io/langgraph/concepts/multi_agent/)\n- [Neo4j Graph Database Documentation](https://neo4j.com/docs/)\n- [ChromaDB Vector Database Documentation](https://docs.trychroma.com/)\n- [FastAPI Framework Documentation](https://fastapi.tiangolo.com/)\n- [OpenAI API Documentation](https://platform.openai.com/docs)\n- [Docker Compose Documentation](https://docs.docker.com/compose/)\n\n## Contributing\n\nContributions are welcome! 
Please read our contributing guidelines and submit pull requests for any improvements.\n\n## License\n\nThis project is licensed under the MIT License—see the LICENSE file for details.\n\n---","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fricoledan%2Fsupervisor-multi-agent-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fricoledan%2Fsupervisor-multi-agent-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fricoledan%2Fsupervisor-multi-agent-system/lists"}