Dynamic Orchestrator Agent (DOA)
- Host: GitHub
- URL: https://github.com/asyncfuncai/dynamic-orchestrator-agent
- Owner: AsyncFuncAI
- License: MIT
- Created: 2025-05-31T16:28:17.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-06-01T04:34:08.000Z (11 months ago)
- Last Synced: 2025-06-01T13:52:44.850Z (11 months ago)
- Language: Python
- Size: 85.9 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# Dynamic Orchestrator Agent (DOA) Framework
A framework for adaptive multi-agent LLM collaboration with reinforcement-learning-based orchestration, built on the "Puppeteer" model from the paper:

[Puppeteer: Adaptive Multi-Agent Orchestration with Reinforcement Learning](https://arxiv.org/abs/2505.19591)
## Overview
The DOA Framework implements a **learnable orchestrator** that dynamically selects which agents to activate based on the current task state. Unlike static multi-agent systems, the orchestrator improves continuously through reinforcement learning; over time it learns to:
- **Optimize agent selection** for better task performance
- **Minimize computational cost** through efficient orchestration
- **Adapt to complex reasoning patterns**, including cycles and hubs
- **Self-improve** via REINFORCE-based policy optimization
## Architecture
```
┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│   Task Input    │────▶│   Orchestrator   │────▶│    Agent Pool    │
└─────────────────┘     │                  │     │                  │
                        │  Policy Network  │     │ • EchoAgent      │
┌─────────────────┐     │   (Neural Net)   │     │ • TerminatorAgent│
│  Reward Signal  │◀────│                  │     │ • CustomAgents   │
└─────────────────┘     └──────────────────┘     └──────────────────┘
        ▲                        │
        │               ┌──────────────────┐
        └───────────────│    REINFORCE     │
                        │     Trainer      │
                        └──────────────────┘
```
## Quick Start
### Installation
```bash
# Clone the repository
git clone https://github.com/asyncfuncai/dynamic-orchestrator-agent.git
cd dynamic-orchestrator-agent
# Install dependencies
pip install torch numpy dataclasses-json typing-extensions
# Or use Poetry
poetry install
```
### Run the MVP Training
```bash
python examples/run_mvp_training.py
```
This will start training the orchestrator to learn optimal agent selection patterns!
## Expected Output
```
Starting DOA Framework MVP Training
Epochs: 50, Episodes per epoch: 10
State embedding dim: 64, Hidden dim: 128
Learning rate: 0.001, Max steps: 4
------------------------------------------------------------
Initialized 2 agents: ['EchoAgent', 'TerminatorAgent']
Reward config: λ=0.1, γ=0.99
Policy network: 17154 parameters
All components initialized successfully!
============================================================
Epoch 1/50 | Avg Reward: -0.234 | Success Rate: 20.0% | Loss: 0.45123
Epoch 2/50 | Avg Reward: -0.156 | Success Rate: 30.0% | Loss: 0.38901
...
Epoch 50/50 | Avg Reward: 0.823 | Success Rate: 90.0% | Loss: 0.12456
```
## Core Components
### 1. **Orchestrator** (`doa_framework/orchestrator.py`)
Central coordinator that uses a neural policy to select agents dynamically.
### 2. **Policy Network** (`doa_framework/policy.py`)
Neural network that learns to map system states to agent selection probabilities.
### 3. **Agent Interface** (`doa_framework/agents/base.py`)
Standardized interface for implementing custom agents.
### 4. **REINFORCE Trainer** (`doa_framework/trainer.py`)
Policy gradient trainer that optimizes the orchestrator's decision-making.
### 5. **Reward System** (`doa_framework/rewards.py`)
Configurable reward function balancing task success and computational efficiency.
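Taken together, a minimal training setup might look like the sketch below. The class names `Orchestrator`, `PolicyNetwork`, and `REINFORCETrainer` are inferred from the module layout above, and the constructor arguments mirror the MVP defaults (64-dim state embedding, hidden dim 128, max 4 steps); treat all signatures as assumptions rather than the framework's exact API:

```python
# Hypothetical wiring of the five core components, mirroring the MVP defaults.
from doa_framework import RewardConfig
from doa_framework.agents import EchoAgent, TerminatorAgent  # assumed import path
from doa_framework.orchestrator import Orchestrator
from doa_framework.policy import PolicyNetwork
from doa_framework.trainer import REINFORCETrainer

agents = [EchoAgent(), TerminatorAgent()]
policy = PolicyNetwork(state_dim=64, hidden_dim=128, num_agents=len(agents))
orchestrator = Orchestrator(agents=agents, policy=policy, max_steps=4)
trainer = REINFORCETrainer(
    policy=policy,
    reward_config=RewardConfig(lambda_cost_penalty=0.1),
    lr=0.001,
)

for epoch in range(50):
    # Roll out episodes with the current policy, then take one REINFORCE step.
    trajectories = [orchestrator.run("Echo this task description") for _ in range(10)]
    loss = trainer.update(trajectories)
```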
## Key Features
- **Learnable Orchestration**: Neural policy learns optimal agent selection
- **Cost-Performance Balance**: Configurable λ parameter for cost vs. accuracy trade-offs
- **Dynamic Topologies**: Supports complex reasoning patterns including cycles
- **Continuous Improvement**: REINFORCE-based learning from experience
- **Modular Design**: Easy to add new agents and tools
- **Rich Observability**: Comprehensive trajectory logging and metrics
## Use Cases
- **Multi-Agent AI Systems**: Coordinate specialized AI agents for complex tasks
- **Business Process Automation**: Optimize workflows with multiple AI components
- **Research & Development**: Experiment with adaptive multi-agent architectures
- **Educational**: Learn about RL-based coordination and multi-agent systems
## Performance Metrics
The framework tracks several key metrics:
- **Task Success Rate**: Percentage of successfully completed tasks
- **Average Reward**: Balances success and computational cost
- **Agent Utilization**: How frequently each agent is selected
- **Convergence Speed**: How quickly the policy learns optimal patterns
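As an illustration, these metrics could be aggregated from logged trajectories. The record fields used below (`success`, `reward`, `agent_names`) are hypothetical stand-ins for whatever the framework's trajectory objects actually expose:

```python
from collections import Counter

def summarize(trajectories: list[dict]) -> dict:
    """Aggregate per-episode records into the metrics above (field names assumed)."""
    n = len(trajectories)
    return {
        "success_rate": sum(t["success"] for t in trajectories) / n,
        "avg_reward": sum(t["reward"] for t in trajectories) / n,
        # How often each agent was selected across all episodes
        "agent_utilization": dict(Counter(a for t in trajectories for a in t["agent_names"])),
    }
```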
## Extending the Framework
### Adding Custom Agents
```python
from doa_framework.agents.base import AgentInterface
from doa_framework.structs import SystemState, AgentOutput

class MyCustomAgent(AgentInterface):
    def __init__(self, name: str = "MyCustomAgent"):
        super().__init__(name)

    def execute(self, state: SystemState) -> AgentOutput:
        # Your agent logic here, e.g. call an LLM or a tool
        result = f"Processed: {state.task_specification}"
        return AgentOutput(
            content=result,
            cost=1.5,  # computational cost reported to the reward function
            metadata={"agent_type": "custom"},
        )
```
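Once defined, the custom agent joins the pool alongside the built-ins; assuming the orchestrator accepts the agent list at construction (as in the wiring sketch under Core Components), registration is just `agents=[EchoAgent(), TerminatorAgent(), MyCustomAgent()]`.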
### Configuring Rewards
```python
from doa_framework import RewardConfig

# Emphasize cost efficiency
cost_focused_config = RewardConfig(
    lambda_cost_penalty=0.5,   # higher cost penalty
    task_success_bonus=1.0,
    task_failure_penalty=-2.0,
)

# Emphasize task success
performance_focused_config = RewardConfig(
    lambda_cost_penalty=0.05,  # lower cost penalty
    task_success_bonus=2.0,
    task_failure_penalty=-1.0,
)
```
## Technical Details
### State Representation
The system state includes:
- **Task Specification**: The current task description
- **Execution History**: Sequence of (agent_name, agent_output) pairs
- **Step Information**: Current step and maximum allowed steps
- **Custom Data**: Extensible metadata storage
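As a sketch, the state described above might be modeled as a dataclass like the following; the field names track the list but are assumptions, not the definitions in `doa_framework/structs.py`:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class SystemState:
    task_specification: str                                        # current task description
    history: list[tuple[str, Any]] = field(default_factory=list)  # (agent_name, agent_output) pairs
    current_step: int = 0                                          # step information
    max_steps: int = 4
    custom_data: dict[str, Any] = field(default_factory=dict)     # extensible metadata
```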
### Reward Function
Based on the paper's formulation:
- **Terminal step**: `R_T = r - λ * C_total`
- **Intermediate steps**: `R_t = -λ * c_t`

Where:
- `r`: task success reward (+1) or failure penalty (-1)
- `λ`: cost penalty weight (configurable)
- `C_total`: total computational cost
- `c_t`: step-wise cost
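These formulas translate directly into code. A minimal sketch, with helper names that are illustrative rather than the framework's API:

```python
def intermediate_reward(step_cost: float, lam: float) -> float:
    """R_t = -λ * c_t for non-terminal steps."""
    return -lam * step_cost

def terminal_reward(success: bool, total_cost: float, lam: float) -> float:
    """R_T = r - λ * C_total, with r = +1 on success and -1 on failure."""
    r = 1.0 if success else -1.0
    return r - lam * total_cost
```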
### Policy Network Architecture
- **Input**: State embedding (task + history features)
- **Architecture**: MLP with ReLU activations
- **Output**: Probability distribution over available agents
- **Training**: REINFORCE with gradient clipping
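A PyTorch sketch of a network matching this description, using the MVP defaults (64-dim state embedding, hidden dim 128, two agents); the class internals are assumptions, not the actual `doa_framework/policy.py` implementation:

```python
import torch
import torch.nn as nn

class PolicyNetworkSketch(nn.Module):
    """MLP with ReLU activations mapping a state embedding to agent probabilities."""

    def __init__(self, state_dim: int = 64, hidden_dim: int = 128, num_agents: int = 2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_agents),
        )

    def forward(self, state_embedding: torch.Tensor) -> torch.Tensor:
        # Probability distribution over available agents
        return torch.softmax(self.mlp(state_embedding), dim=-1)
```

During training, a REINFORCE update with gradient clipping would typically call `torch.nn.utils.clip_grad_norm_(policy.parameters(), max_norm)` between the backward pass and the optimizer step.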
## Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
### Development Setup
```bash
# Install development dependencies
poetry install --with dev
# Run tests
pytest tests/
# Format code
black doa_framework/ examples/ tests/
# Type checking
mypy doa_framework/
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
This framework is inspired by the "Puppeteer" model from:
> Dang et al. (2025). "Puppeteer: Adaptive Multi-Agent Orchestration with Reinforcement Learning." [arXiv:2505.19591](https://arxiv.org/abs/2505.19591)
## Support
- Email: sng@asyncfunc.ai
- Discord: [Join our community](https://discord.gg/gMwThUMeme)
---
**Built with ❤️ by the DOA Team**