{"id":13816801,"url":"https://github.com/marlbenchmark/on-policy","last_synced_at":"2025-05-15T13:06:59.944Z","repository":{"id":45189350,"uuid":"341473983","full_name":"marlbenchmark/on-policy","owner":"marlbenchmark","description":"This is the official implementation of Multi-Agent PPO (MAPPO).","archived":false,"fork":false,"pushed_at":"2024-07-18T10:00:36.000Z","size":240,"stargazers_count":1514,"open_issues_count":20,"forks_count":322,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-15T05:32:05.864Z","etag":null,"topics":["algorithms","hanabi","mappo","mpes","multi-agent","ppo","smac","starcraftii"],"latest_commit_sha":null,"homepage":"https://sites.google.com/view/mappo","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marlbenchmark.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-02-23T07:59:10.000Z","updated_at":"2025-04-14T12:05:13.000Z","dependencies_parsed_at":"2023-01-29T01:45:28.954Z","dependency_job_id":"0c1f60d3-5da7-4c73-bc5f-f6af64826d39","html_url":"https://github.com/marlbenchmark/on-policy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marlbenchmark%2Fon-policy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marlbenchmark%2Fon-policy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marlbenchmark%2Fon-policy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/marlbenchmark%2Fon-policy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/marlbenchmark","download_url":"https://codeload.github.com/marlbenchmark/on-policy/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254346624,"owners_count":22055808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","hanabi","mappo","mpes","multi-agent","ppo","smac","starcraftii"],"created_at":"2024-08-04T06:00:21.794Z","updated_at":"2025-05-15T13:06:59.922Z","avatar_url":"https://github.com/marlbenchmark.png","language":"Python","funding_links":[],"categories":["Envs","漏洞库_漏洞靶场"],"sub_categories":["MPE","资源传输下载"],"readme":"# MAPPO\r\n\r\n## New Update! We now support SMAC v2\r\n\r\nChao Yu*, Akash Velu*, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, and Yi Wu. \r\n\r\nThis repository implements MAPPO, a multi-agent variant of PPO. The implementation in this repository is used in the paper \"The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games\" (https://arxiv.org/abs/2103.01955). This repository is heavily based on https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail. We have also made the off-policy repo public; please feel free to try it. 
[off-policy link](https://github.com/marlbenchmark/off-policy)\r\n\r\n\u003cfont color=\"red\"\u003e All hyperparameters and training curves are reported in the appendix. We strongly suggest double-checking the important factors before running the code, such as the rollout threads, episode length, PPO epoch, mini-batches, clip term, and so on. In addition, we have added the newest results on the Google Football testbed, along with suggestions about the episode length and parameter sharing, to the appendix; you are welcome to check them. \u003c/font\u003e\r\n\r\n\u003cfont color=\"red\"\u003e We have recently noticed that a lot of papers do not reproduce the MAPPO results correctly, probably due to the rough description of the hyperparameters. We have updated the training scripts for each map or scenario in /train/train_xxx_scripts/*.sh. Feel free to try them.\u003c/font\u003e\r\n\r\n\r\n## Environments supported:\r\n\r\n- [StarCraftII (SMAC)](https://github.com/oxwhirl/smac)\r\n- [Hanabi](https://github.com/deepmind/hanabi-learning-environment)\r\n- [Multiagent Particle-World Environments (MPEs)](https://github.com/openai/multiagent-particle-envs)\r\n- [Google Research Football (GRF)](https://github.com/google-research/football)\r\n- [StarCraftII (SMAC) v2](https://github.com/oxwhirl/smacv2)\r\n\r\n## 1. Usage\r\n**WARNING: by default, all experiments assume a policy shared by all agents, i.e., there is one neural network shared by all agents**\r\n\r\nAll core code is located within the onpolicy folder. The algorithms/ subfolder contains algorithm-specific code\r\nfor MAPPO. \r\n\r\n* The envs/ subfolder contains environment wrapper implementations for the MPEs, SMAC, and Hanabi. \r\n\r\n* Code to perform training rollouts and policy updates is contained within the runner/ folder; there is a runner for \r\neach environment. \r\n\r\n* Executable scripts for training with default hyperparameters can be found in the scripts/ folder. 
The files are named\r\nin the following manner: train_algo_environment.sh. Within each file, the map name (in the case of SMAC and the MPEs) can be altered. \r\n* Python training scripts for each environment can be found in the scripts/train/ folder. \r\n\r\n* The config.py file contains relevant hyperparameter and env settings. Most hyperparameters default to the ones\r\nused in the paper; however, please refer to the appendix for a full list of hyperparameters used. \r\n\r\n\r\n## 2. Installation\r\n\r\n Here we give an example installation on CUDA == 10.1. For non-GPU and other CUDA version installations, please refer to the [PyTorch website](https://pytorch.org/get-started/locally/). Note that this repo does not depend on a specific CUDA version; feel free to use any CUDA version suitable for your own computer.\r\n\r\n``` Bash\r\n# create conda environment\r\nconda create -n marl python==3.6.1\r\nconda activate marl\r\npip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html\r\n```\r\n\r\n```\r\n# install on-policy package\r\ncd on-policy\r\npip install -e .\r\n```\r\n\r\nEven though we provide requirement.txt, it may contain redundant packages. We recommend running the code and installing whichever required packages are reported as missing.\r\n\r\n### 2.1 StarCraftII [4.10](http://blzdistsc2-a.akamaihd.net/Linux/SC2.4.10.zip)\r\n\r\n   \r\n\r\n``` Bash\r\nunzip SC2.4.10.zip\r\n# password is iagreetotheeula\r\necho \"export SC2PATH=~/StarCraftII/\" \u003e\u003e ~/.bashrc\r\n```\r\n\r\n* Download the SMAC Maps and move them to `~/StarCraftII/Maps/`.\r\n\r\n* To use a stableid, copy `stableid.json` from https://github.com/Blizzard/s2client-proto.git to `~/StarCraftII/`.\r\n\r\nFor SMAC v2, please refer to https://github.com/oxwhirl/smacv2.git. 
Make sure you have the `32x32_flat.SC2Map` map file in your `SMAC_Maps` folder.\r\n\r\n### 2.2 Hanabi\r\nEnvironment code for Hanabi is developed from the open-source environment code, but has been slightly modified to fit the algorithms used here.  \r\nTo install, execute the following:\r\n``` Bash\r\npip install cffi\r\ncd envs/hanabi\r\nmkdir build \u0026\u0026 cd build\r\ncmake ..\r\nmake -j\r\n```\r\nHere are all the Hanabi [models](https://drive.google.com/drive/folders/1RIcP_rG9NY9UzaWfFsIncDcjASk5h4Nx?usp=sharing).\r\n\r\n### 2.3 MPE\r\n\r\n``` Bash\r\n# install this package first\r\npip install seaborn\r\n```\r\n\r\nThere are 3 cooperative scenarios in MPE:\r\n\r\n* simple_spread\r\n* simple_speaker_listener, which is the 'Comm' scenario in the paper\r\n* simple_reference\r\n\r\n### 2.4 GRF\r\n\r\nPlease see the [football](https://github.com/google-research/football/blob/master/README.md) repository to install the football environment.\r\n\r\n## 3. Train\r\nHere we use train_mpe.sh as an example:\r\n```\r\ncd onpolicy/scripts\r\nchmod +x ./train_mpe.sh\r\n./train_mpe.sh\r\n```\r\nLocal results are stored in the subfolder scripts/results. Note that we use Weights \u0026 Biases as the default visualization platform; to use Weights \u0026 Biases, please register and log in to the platform first. More instructions for using Weights \u0026 Biases can be found in the official [documentation](https://docs.wandb.ai/). Adding the `--use_wandb` flag on the command line or in the .sh file will use TensorBoard instead of Weights \u0026 Biases. \r\n\r\nWe additionally provide `./eval_hanabi_forward.sh` for evaluating the Hanabi score over 100k trials. \r\n\r\n## 4. 
Publication\r\n\r\nIf you find this repository useful, please cite our [paper](https://arxiv.org/abs/2103.01955):\r\n```\r\n@inproceedings{\r\nyu2022the,\r\ntitle={The Surprising Effectiveness of {PPO} in Cooperative Multi-Agent Games},\r\nauthor={Chao Yu and Akash Velu and Eugene Vinitsky and Jiaxuan Gao and Yu Wang and Alexandre Bayen and Yi Wu},\r\nbooktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},\r\nyear={2022}\r\n}\r\n```\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarlbenchmark%2Fon-policy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarlbenchmark%2Fon-policy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarlbenchmark%2Fon-policy/lists"}