{"id":28352788,"url":"https://github.com/legalaspro/unity_multiagent_rl","last_synced_at":"2025-10-07T22:33:05.604Z","repository":{"id":295894155,"uuid":"991420892","full_name":"legalaspro/unity_multiagent_rl","owner":"legalaspro","description":"Multi-agent reinforcement learning framework for Unity environments. Implements MAPPO, MASAC, MATD3, and MADDPG with comprehensive evaluation tools. Features sample-efficient training, competitive analysis, and pre-trained models achieving great performance in Tennis and Soccer environments.","archived":false,"fork":false,"pushed_at":"2025-05-30T17:07:18.000Z","size":72759,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-20T14:39:28.621Z","etag":null,"topics":["ai","collaborative-learning","competitive-intelligence","deep-learning","machine-learning","maddpg","mappo","marl","masac","matd3","multi-agent","multi-agent-reinforcement-learning","pytorch","reinforcement-learning","soccer","tennis","unity"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/legalaspro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-27T15:42:49.000Z","updated_at":"2025-06-12T17:59:22.000Z","dependencies_parsed_at":"2025-05-27T23:34:27.053Z","dependency_job_id":"4c27188b-a02c-40ee-9b04-a18c281a9fef","html_url":"https://github.com/legalaspro/unity_multiagent_rl","commit_stats":null,"previous_names":["legalaspro/unity_multiagent_rl"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/legalaspro/unity_multiagent_rl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/legalaspro%2Funity_multiagent_rl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/legalaspro%2Funity_multiagent_rl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/legalaspro%2Funity_multiagent_rl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/legalaspro%2Funity_multiagent_rl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/legalaspro","download_url":"https://codeload.github.com/legalaspro/unity_multiagent_rl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/legalaspro%2Funity_multiagent_rl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278859586,"owners_count":26058509,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosys
# Unity Multi-Agent Reinforcement Learning

[![Python Version](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![CI Tests](https://github.com/legalaspro/unity_multiagent_rl/actions/workflows/ci.yml/badge.svg)](https://github.com/legalaspro/unity_multiagent_rl/actions/workflows/ci.yml)
[![coverage status](https://codecov.io/gh/legalaspro/unity_multiagent_rl/branch/main/graph/badge.svg)](https://app.codecov.io/gh/legalaspro/unity_multiagent_rl)

A multi-agent reinforcement learning framework for Unity environments. Provides implementations for training and evaluating MARL algorithms on collaborative and competitive tasks.

## 🎯 Project Overview

This repository implements four multi-agent reinforcement learning algorithms for Unity environments. The framework has been validated on two environments and can be extended to additional Unity ML-Agents environments. Train, evaluate, and compare MARL algorithms on collaborative and competitive tasks.

<div align="center">
  <img src="results/Tennis/masac_tennis.gif" width="49%" alt="MASAC Tennis" />
  <img src="results/Soccer/masac_soccer.gif" width="49%" alt="MASAC Soccer" />
</div>

- **Tennis** - A collaborative 2-agent environment in which agents control rackets to keep a ball in play. Success requires a +0.5 average score over 100 episodes.

- **Soccer** - A competitive 4-agent environment with 2v2 teams (goalie and striker roles with different action sizes). Agents learn to score goals while defending. Success is measured by win rate against previous model versions.

## 🤖 Algorithms

All algorithms implement the **CTDE (Centralized Training, Decentralized Execution)** architecture, enabling agents to learn with global information during training while acting independently during execution. Agents engage in **self-play**, continuously improving by playing against previous versions of themselves.

**Key Implementation Features:**

- **Self-play training**: Agents improve by competing against their own evolving strategies (see the opponent-pool sketch below)
- **Shared-critic optimizations**, for the shared-critic variants (see the critic-input sketch below):

  - **Global observation ordering**: Current agent → teammates → opponents, for a consistent input layout
  - **Action concatenation**: All agents' actions are appended in the same order for the centralized critic
  - **Consistent input structure**: Stabilizes and speeds up training of the shared critic and the agents

- **MAPPO** (Multi-Agent Proximal Policy Optimization)
  - **All Shared**: Shared policy and critic networks across all agents
  - **Critic Shared**: Individual policies with a shared centralized critic
  - **Independent**: Individual policies and critics per agent
- **MATD3** (Multi-Agent Twin Delayed Deep Deterministic Policy Gradient)
- **MASAC** (Multi-Agent Soft Actor-Critic)
  - **Independent**: Individual actors and critics per agent
  - **Shared Critic**: Individual actors with a shared centralized critic
- **MADDPG** (Multi-Agent Deep Deterministic Policy Gradient)
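To make the input-ordering convention concrete, here is a minimal PyTorch sketch. It is illustrative only (not code from `algos/` or `networks/`), and the function and argument names are hypothetical:

```python
# Illustrative sketch of the centralized critic's input layout; the
# function and argument names are hypothetical, not the repo's API.
import torch

def build_critic_input(agent_id, teammate_ids, opponent_ids, obs, actions):
    """Concatenate per-agent observations and actions in a fixed order:
    current agent -> teammates -> opponents.

    `obs` and `actions` map agent id -> tensor of shape (batch, dim).
    """
    order = [agent_id, *teammate_ids, *opponent_ids]
    global_obs = torch.cat([obs[i] for i in order], dim=-1)     # (batch, sum of obs dims)
    global_act = torch.cat([actions[i] for i in order], dim=-1)  # (batch, sum of act dims)
    return torch.cat([global_obs, global_act], dim=-1)
```

Because each agent's critic input places that agent's own observation and action first, one shared critic can score every agent's transitions through the same network, which is the consistency the bullets above refer to.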
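The self-play loop can be pictured as a pool of frozen policy snapshots that the learner periodically plays against. This is a purely hypothetical sketch (the actual training loops live in `runners/`):

```python
# Hypothetical opponent pool for self-play; not the repository's code.
import copy
import random

class OpponentPool:
    """Keeps frozen copies of past policies to use as opponents."""

    def __init__(self, max_size=10):
        self.snapshots = []
        self.max_size = max_size

    def add(self, policy):
        """Store a frozen copy of the current policy."""
        self.snapshots.append(copy.deepcopy(policy))
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)  # drop the oldest snapshot

    def sample(self, current_policy):
        """Draw a past version to play against; fall back to the
        current policy before any snapshot exists."""
        return random.choice(self.snapshots) if self.snapshots else current_policy
```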
## 📊 Results Summary

| Environment | Algorithm             | Score (100-ep avg) / Win Rate | Training Steps | Agents | Notes                      |
| ----------- | --------------------- | ----------------------------- | -------------- | ------ | -------------------------- |
| Tennis      | MATD3                 | 2.483                         | ~199k          | 2      | Best performer             |
| Tennis      | MASAC                 | 2.450                         | ~199k          | 2      | Fastest to solve           |
| Tennis      | MAPPO (All Shared)    | 1.490                         | ~501k          | 2      | Sample inefficient         |
| Tennis      | MADDPG                | 0.796                         | ~199k          | 2      | Solved                     |
| Tennis      | MAPPO (Critic Shared) | 0.765                         | ~501k          | 2      | Slowest to solve           |
| Soccer      | MAPPO (Critic Shared) | 97.2% vs random               | ~1M            | 4      | Best Soccer result         |
| Soccer      | MASAC (Shared Critic) | 84.4% vs random               | ~200k          | 4      | Strong with 5× fewer steps |
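The Tennis scores use the success criterion from the overview: a +0.5 average over the most recent 100 episodes. A tiny, illustrative tracker for that criterion (not the repository's logging code; the names are made up):

```python
# Illustrative solved-check for the Tennis criterion (+0.5 average over
# the last 100 episodes); not taken from the repository.
from collections import deque

import numpy as np

scores_window = deque(maxlen=100)  # keeps only the 100 most recent scores

def record_episode(episode_score, target=0.5):
    """Append a score and report (rolling average, solved?)."""
    scores_window.append(episode_score)
    avg = float(np.mean(scores_window))
    solved = len(scores_window) == scores_window.maxlen and avg >= target
    return avg, solved
```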
## 🚀 Quick Start

### Training

```bash
# Train MASAC on the Tennis environment
python train.py --env_id Tennis --algo masac --max_steps 200000

# Train MAPPO on the Soccer environment with a custom config
python train.py --env_id Soccer --algo mappo --config configs/env_tuned/mappo_soccer.yaml

# Render trained models
python render.py --config results/Tennis/masac/config.yaml --model_path results/Tennis/masac/final-torch.model --worker_id 5 --render_episodes 5
```

### Visualization & Analysis

```bash
# Generate algorithm comparison plots
python render_results.py

# Create competitive evaluation plots (for Soccer)
python render_competitive_results.py
```

## 📁 Project Structure

```
├── algos/                         # Algorithm implementations (MAPPO, MATD3, MASAC, MADDPG)
├── networks/                      # Neural network architectures (actors, critics, modules)
├── envs/                          # Environment wrappers for Unity ML-Agents
├── buffers/                       # Experience replay and trajectory storage
├── runners/                       # Training loop implementations
├── evals/                         # Evaluation metrics and competitive analysis
├── configs/                       # Configuration files and hyperparameters
├── app/                           # Unity environment executables (download separately)
├── results/                       # Training outputs and saved models
├── figures/                       # Generated plots and visualizations
├── utils/                         # Utility functions and helpers
├── python/                        # Unity ML-Agents Python API
├── train.py                       # Main training script
├── render_results.py              # Training results visualization
├── render_competitive_results.py  # Competitive evaluation plots
└── render.py                      # Model rendering script
```

## 🛠️ Installation

### Prerequisites

- Python 3.11+
- Git
- Unity environments (downloaded separately; see below)

### Option 1: Using Conda (Recommended)

```bash
# Clone the repository
git clone https://github.com/legalaspro/unity_multiagent_rl.git
cd unity_multiagent_rl

# Create and activate the environment
conda env create -f environment.yaml
conda activate unity_multiagent_rl
```

### Option 2: Using Pip

```bash
# Clone the repository
git clone https://github.com/legalaspro/unity_multiagent_rl.git
cd unity_multiagent_rl

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install -e ./python
```

### Download Unity Environments

**Note**: These Unity environments use an older ML-Agents version (0.4.0 or earlier), whose Python API is loaded from the bundled `python/` folder. The algorithms can be adapted to work with newer Unity ML-Agents versions.

The Unity environment executables are **not included** in this repository and must be downloaded separately:

#### Tennis Environment

Download the build that matches your operating system:

- **Linux**: [Tennis_Linux.zip](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis_Linux.zip)
- **Mac OSX**: [Tennis.app.zip](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis.app.zip)
- **Windows (32-bit)**: [Tennis_Windows_x86.zip](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis_Windows_x86.zip)
- **Windows (64-bit)**: [Tennis_Windows_x86_64.zip](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis_Windows_x86_64.zip)
- **AWS/Headless**: [Tennis_Linux_NoVis.zip](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis_Linux_NoVis.zip)

#### Soccer Environment

Download the build that matches your operating system:

- **Linux**: [Soccer_Linux.zip](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Soccer/Soccer_Linux.zip)
- **Mac OSX**: [Soccer.app.zip](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Soccer/Soccer.app.zip)
- **Windows (32-bit)**: [Soccer_Windows_x86.zip](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Soccer/Soccer_Windows_x86.zip)
- **Windows (64-bit)**: [Soccer_Windows_x86_64.zip](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Soccer/Soccer_Windows_x86_64.zip)
- **AWS/Headless**: [Soccer_Linux_NoVis.zip](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Soccer/Soccer_Linux_NoVis.zip)

#### Installation Steps

1. Download the appropriate environment file(s) for your operating system
2. Extract the downloaded file(s) to the `app/` directory in the project root
3. Ensure the extracted files have the correct names:
   - Tennis: `app/Tennis.app` (macOS), `app/Tennis.exe` (Windows), or `app/Tennis` (Linux)
   - Soccer: `app/Soccer.app` (macOS), `app/Soccer.exe` (Windows), or `app/Soccer` (Linux)
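Once a build is extracted, a quick smoke test is to launch it and drive it with random actions. The sketch below assumes the classic `unityagents` (ML-Agents 0.4) Python API installed from `python/`; the exact calls are an assumption, and the repository's own wrappers in `envs/` may expose a different interface:

```python
# Hedged smoke test: launch a downloaded build and send random actions.
# Assumes the classic `unityagents` (ML-Agents 0.4) API from `python/`;
# the repo's wrappers in `envs/` may differ. Adjust file_name per OS.
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="app/Tennis.app", worker_id=1)
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=False)[brain_name]
num_agents = len(env_info.agents)
action_size = brain.vector_action_space_size

for _ in range(100):  # a few random steps just to confirm the build runs
    actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
    env_info = env.step(actions)[brain_name]
    if np.any(env_info.local_done):
        env_info = env.reset(train_mode=False)[brain_name]

env.close()
```

If a window opens and the rackets move randomly for a few seconds, the build and the Python API are wired up correctly.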
### Verify Installation

```bash
# Test the installation
python train.py --help

# Test with a short training run
python train.py --env_id Tennis --algo masac --max_steps 1000
```

## 📊 Training & Evaluation

### Training

```bash
# Basic training
python train.py --env_id Tennis --algo masac --max_steps 200000

# Use pre-configured settings
python train.py --config configs/env_tuned/mappo_tennis.yaml
```

### Rendering

```bash
# Render trained models
python render.py --config results/Tennis/masac/config.yaml --model_path results/Tennis/masac/final-torch.model --worker_id 5 --render_episodes 5
```

## 📈 Results and Visualization

All training results are stored in `results/Tennis/` and `results/Soccer/`, including trained models, training data, videos, GIFs, and performance graphs.

### Tennis Algorithm Comparison

![Tennis Rewards Comparison](results/Tennis/Tennis_rewards.png)

_Training progress for all algorithms on the Tennis environment. MATD3 and MASAC achieve the highest scores (~2.5), while the MAPPO variants improve steadily over longer training runs._

### Soccer Competitive Evaluation

![Soccer Win Rate vs Random](results/Soccer/Soccer_competitive_win_rate_vs_random.png)

_Win rate against random opponents over the course of training. MAPPO (Critic Shared) reaches a 97%+ win rate, demonstrating strong performance in competitive multi-agent play._

## 🔧 Configuration

### Algorithm Hyperparameters

Edit the files in `configs/algos/` or `configs/env_tuned/` to customize:

- Learning rates
- Network architectures
- Training parameters
- Evaluation settings

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Unity ML-Agents Team for the environments and Python API
- OpenAI for algorithm implementations and research
- PyTorch team for the deep learning framework