{"id":28442977,"url":"https://github.com/asyncfuncai/dynamic-orchestrator-agent","last_synced_at":"2026-03-17T11:34:04.771Z","repository":{"id":296648510,"uuid":"993840212","full_name":"AsyncFuncAI/dynamic-orchestrator-agent","owner":"AsyncFuncAI","description":"Dynamic Orchestrator Agent (DOA)","archived":false,"fork":false,"pushed_at":"2025-06-01T04:34:08.000Z","size":88,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-01T13:52:44.850Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AsyncFuncAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-31T16:28:17.000Z","updated_at":"2025-06-01T04:35:14.000Z","dependencies_parsed_at":"2025-06-01T13:52:47.804Z","dependency_job_id":"bfe93519-7ee2-450d-b7c0-7e5fa8b62d2c","html_url":"https://github.com/AsyncFuncAI/dynamic-orchestrator-agent","commit_stats":null,"previous_names":["asyncfuncai/dynamic-orchestrator-agent"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AsyncFuncAI/dynamic-orchestrator-agent","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AsyncFuncAI%2Fdynamic-orchestrator-agent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AsyncFuncAI%2Fdynamic-orchestrator-agent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AsyncFuncAI%2Fdynamic-orchestrator-agent/releases","man
ifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AsyncFuncAI%2Fdynamic-orchestrator-agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AsyncFuncAI","download_url":"https://codeload.github.com/AsyncFuncAI/dynamic-orchestrator-agent/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AsyncFuncAI%2Fdynamic-orchestrator-agent/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30622755,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T11:26:08.186Z","status":"ssl_error","status_checked_at":"2026-03-17T11:24:37.311Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-06T06:40:12.022Z","updated_at":"2026-03-17T11:34:04.766Z","avatar_url":"https://github.com/AsyncFuncAI.png","language":"Python","readme":"# Dynamic Orchestrator Agent (DOA) Framework 🎭\n\nA framework for adaptive multi-agent LLM collaboration with reinforcement-learning-based orchestration, built around the \"Puppeteer\" model introduced in:\n\n📄 [Puppeteer: Adaptive Multi-Agent Orchestration with Reinforcement Learning](https://arxiv.org/abs/2505.19591)\n\n## 🌟 Overview\n\nThe DOA Framework implements a **learnable orchestrator** that dynamically selects which agents to activate based on the current task 
state. Unlike static multi-agent systems, our orchestrator continuously improves through reinforcement learning and learns to:\n\n- 🎯 **Optimize agent selection** for better task performance\n- ⚡ **Minimize computational costs** through efficient orchestration\n- 🔄 **Adapt to complex reasoning patterns** including cycles and hubs\n- 📈 **Self-improve** via REINFORCE-based policy optimization\n\n## 🏗️ Architecture\n\n```\n┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐\n│   Task Input    │───▶│   Orchestrator   │───▶│  Agent Pool     │\n└─────────────────┘    │                  │    │                 │\n                       │  Policy Network  │    │ • EchoAgent     │\n┌─────────────────┐    │  (Neural Net)    │    │ • TerminatorAgent│\n│ Reward Signal   │◀───│                  │    │ • CustomAgents  │\n└─────────────────┘    └──────────────────┘    └─────────────────┘\n         ▲                       │\n         │              ┌─────────────────┐\n         └──────────────│ REINFORCE       │\n                        │ Trainer         │\n                        └─────────────────┘\n```\n\n## 🚀 Quick Start\n\n### Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/AsyncFuncAI/dynamic-orchestrator-agent.git\ncd dynamic-orchestrator-agent\n\n# Install dependencies\npip install torch numpy dataclasses-json typing-extensions\n\n# Or use Poetry\npoetry install\n```\n\n### Run the MVP Training\n\n```bash\npython examples/run_mvp_training.py\n```\n\nThis will start training the orchestrator to learn optimal agent selection patterns!\n\n## 📊 Expected Output\n\n```\n🚀 Starting DOA Framework MVP Training\nEpochs: 50, Episodes per epoch: 10\nState embedding dim: 64, Hidden dim: 128\nLearning rate: 0.001, Max steps: 4\n------------------------------------------------------------\nInitialized 2 agents: ['EchoAgent', 'TerminatorAgent']\nReward config: λ=0.1, γ=0.99\nPolicy network: 17154 parameters\nAll components initialized 
successfully!\n============================================================\nEpoch   1/50 | Avg Reward: -0.234 | Success Rate:  20.0% | Loss:  0.45123\nEpoch   2/50 | Avg Reward: -0.156 | Success Rate:  30.0% | Loss:  0.38901\n...\nEpoch  50/50 | Avg Reward:  0.823 🌟 | Success Rate:  90.0% | Loss:  0.12456\n```\n\n## 🧩 Core Components\n\n### 1. **Orchestrator** (`doa_framework/orchestrator.py`)\nCentral coordinator that uses a neural policy to select agents dynamically.\n\n### 2. **Policy Network** (`doa_framework/policy.py`)\nNeural network that learns to map system states to agent selection probabilities.\n\n### 3. **Agent Interface** (`doa_framework/agents/base.py`)\nStandardized interface for implementing custom agents.\n\n### 4. **REINFORCE Trainer** (`doa_framework/trainer.py`)\nPolicy gradient trainer that optimizes the orchestrator's decision-making.\n\n### 5. **Reward System** (`doa_framework/rewards.py`)\nConfigurable reward function balancing task success and computational efficiency.\n\n## 🔧 Key Features\n\n- **🧠 Learnable Orchestration**: Neural policy learns optimal agent selection\n- **⚖️ Cost-Performance Balance**: Configurable λ parameter for cost vs. 
accuracy trade-offs\n- **🔄 Dynamic Topologies**: Supports complex reasoning patterns including cycles\n- **📈 Continuous Improvement**: REINFORCE-based learning from experience\n- **🔌 Modular Design**: Easy to add new agents and tools\n- **📊 Rich Observability**: Comprehensive trajectory logging and metrics\n\n## 🎯 Use Cases\n\n- **🤖 Multi-Agent AI Systems**: Coordinate specialized AI agents for complex tasks\n- **💼 Business Process Automation**: Optimize workflows with multiple AI components\n- **🔬 Research \u0026 Development**: Experiment with adaptive multi-agent architectures\n- **🎓 Educational**: Learn about RL-based coordination and multi-agent systems\n\n## 📈 Performance Metrics\n\nThe framework tracks several key metrics:\n\n- **Task Success Rate**: Percentage of successfully completed tasks\n- **Average Reward**: Balances success and computational cost\n- **Agent Utilization**: How frequently each agent is selected\n- **Convergence Speed**: How quickly the policy learns optimal patterns\n\n## 🛠️ Extending the Framework\n\n### Adding Custom Agents\n\n```python\nfrom doa_framework.agents.base import AgentInterface\nfrom doa_framework.structs import SystemState, AgentOutput\n\nclass MyCustomAgent(AgentInterface):\n    def __init__(self, name: str = \"MyCustomAgent\"):\n        super().__init__(name)\n\n    def execute(self, state: SystemState) -\u003e AgentOutput:\n        # Your agent logic here\n        result = self.process_task(state.task_specification)\n        return AgentOutput(\n            content=result,\n            cost=1.5,  # Computational cost\n            metadata={\"agent_type\": \"custom\"}\n        )\n```\n\n### Configuring Rewards\n\n```python\nfrom doa_framework import RewardConfig\n\n# Emphasize cost efficiency\ncost_focused_config = RewardConfig(\n    lambda_cost_penalty=0.5,  # Higher cost penalty\n    task_success_bonus=1.0,\n    task_failure_penalty=-2.0\n)\n\n# Emphasize task success\nperformance_focused_config = RewardConfig(\n    
lambda_cost_penalty=0.05,  # Lower cost penalty\n    task_success_bonus=2.0,\n    task_failure_penalty=-1.0\n)\n```\n\n## 📚 Technical Details\n\n### State Representation\nThe system state includes:\n- **Task Specification**: The current task description\n- **Execution History**: Sequence of (agent_name, agent_output) pairs\n- **Step Information**: Current step and maximum allowed steps\n- **Custom Data**: Extensible metadata storage\n\n### Reward Function\nBased on the paper's formulation:\n- **Terminal Step**: `R_T = r - λ * C_total`\n- **Intermediate Steps**: `R_t = -λ * c_t`\n\nWhere:\n- `r`: Task success reward (+1) or failure penalty (-1)\n- `λ`: Cost penalty weight (configurable)\n- `C_total`: Total computational cost\n- `c_t`: Step-wise cost\n\n### Policy Network Architecture\n- **Input**: State embedding (task + history features)\n- **Architecture**: MLP with ReLU activations\n- **Output**: Probability distribution over available agents\n- **Training**: REINFORCE with gradient clipping\n\n## 🤝 Contributing\n\nWe welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.\n\n### Development Setup\n\n```bash\n# Install development dependencies\npoetry install --with dev\n\n# Run tests\npytest tests/\n\n# Format code\nblack doa_framework/ examples/ tests/\n\n# Type checking\nmypy doa_framework/\n```\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\nThis framework is inspired by the \"Puppeteer\" model from:\n\u003e Dang et al. (2025). 
\"Puppeteer: Adaptive Multi-Agent Orchestration with Reinforcement Learning.\" arXiv:2505.19591.\n\n## 📞 Support\n\n- 📧 Email: sng@asyncfunc.ai\n- 💬 Discord: [Join our community](https://discord.gg/gMwThUMeme)\n\n---\n\n**Built with ❤️ by the DOA Team**\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasyncfuncai%2Fdynamic-orchestrator-agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fasyncfuncai%2Fdynamic-orchestrator-agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasyncfuncai%2Fdynamic-orchestrator-agent/lists"}