{"id":13493885,"url":"https://github.com/openai/maddpg","last_synced_at":"2025-05-15T12:06:18.240Z","repository":{"id":37385199,"uuid":"119879716","full_name":"openai/maddpg","owner":"openai","description":"Code for the MADDPG algorithm from the paper \"Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments\"","archived":false,"fork":false,"pushed_at":"2024-04-01T21:04:21.000Z","size":55,"stargazers_count":1765,"open_issues_count":48,"forks_count":504,"subscribers_count":148,"default_branch":"master","last_synced_at":"2025-04-14T19:01:16.840Z","etag":null,"topics":["paper"],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/1706.02275.pdf","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-01T18:59:57.000Z","updated_at":"2025-04-14T05:54:38.000Z","dependencies_parsed_at":"2022-07-16T05:30:38.176Z","dependency_job_id":"8fe6b774-269b-4109-b070-fed479ad7855","html_url":"https://github.com/openai/maddpg","commit_stats":{"total_commits":11,"total_committers":5,"mean_commits":2.2,"dds":0.4545454545454546,"last_synced_commit":"3ceefa0ada3ff31d633dd0bde8ff95213ce99be3"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openai%2Fmaddpg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openai%2Fmaddpg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openai%2Fmaddpg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openai%2Fmaddpg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openai","download_url":"https://codeload.github.com/openai/maddpg/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254337613,"owners_count":22054253,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["paper"],"created_at":"2024-07-31T19:01:19.724Z","updated_at":"2025-05-15T12:06:13.229Z","avatar_url":"https://github.com/openai.png","language":"Python","funding_links":[],"categories":["Python","Swarm Intelligence \u0026 Multi-Agent Systems","Swarm Control Algorithms"],"sub_categories":[],"readme":"**Status:** Archive (code is provided as-is, no updates expected)\n\n# Multi-Agent Deep Deterministic Policy Gradient (MADDPG)\n\nThis is the code for implementing the MADDPG algorithm presented in the paper:\n[Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](https://arxiv.org/pdf/1706.02275.pdf).\nIt is configured to be run in conjunction with environments from the\n[Multi-Agent Particle Environments (MPE)](https://github.com/openai/multiagent-particle-envs).\nNote: this codebase has been restructured since the original paper, and the results may\nvary from those reported in the paper.\n\n**Update:** the original implementation for policy ensemble and policy estimation can be found [here](https://www.dropbox.com/s/jlc6dtxo580lpl2/maddpg_ensemble_and_approx_code.zip?dl=0). The code is provided as-is. \n\n## Installation\n\n- To install, `cd` into the root directory and type `pip install -e .`\n\n- Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), tensorflow (1.8.0), numpy (1.14.5)\n\n## Case study: Multi-Agent Particle Environments\n\nWe demonstrate here how the code can be used in conjunction with the\n[Multi-Agent Particle Environments (MPE)](https://github.com/openai/multiagent-particle-envs).\n\n- Download and install the MPE code [here](https://github.com/openai/multiagent-particle-envs)\nby following the `README`.\n\n- Ensure that `multiagent-particle-envs` has been added to your `PYTHONPATH` (e.g. in `~/.bashrc` or `~/.bash_profile`).\n\n- To run the code, `cd` into the `experiments` directory and run `train.py`:\n\n``python train.py --scenario simple``\n\n- You can replace `simple` with any environment in the MPE you'd like to run.\n\n## Command-line options\n\n### Environment options\n\n- `--scenario`: defines which environment in the MPE is to be used (default: `\"simple\"`)\n\n- `--max-episode-len` maximum length of each episode for the environment (default: `25`)\n\n- `--num-episodes` total number of training episodes (default: `60000`)\n\n- `--num-adversaries`: number of adversaries in the environment (default: `0`)\n\n- `--good-policy`: algorithm used for the 'good' (non adversary) policies in the environment\n(default: `\"maddpg\"`; options: {`\"maddpg\"`, `\"ddpg\"`})\n\n- `--adv-policy`: algorithm used for the adversary policies in the environment\n(default: `\"maddpg\"`; options: {`\"maddpg\"`, `\"ddpg\"`})\n\n### Core training parameters\n\n- `--lr`: learning rate (default: `1e-2`)\n\n- `--gamma`: discount factor (default: `0.95`)\n\n- `--batch-size`: batch size (default: `1024`)\n\n- `--num-units`: number of units in the MLP (default: `64`)\n\n### Checkpointing\n\n- `--exp-name`: name of the experiment, used as the file name to save all results (default: `None`)\n\n- `--save-dir`: directory where intermediate training results and model will be saved (default: `\"/tmp/policy/\"`)\n\n- `--save-rate`: model is saved every time this number of episodes has been completed (default: `1000`)\n\n- `--load-dir`: directory where training state and model are loaded from (default: `\"\"`)\n\n### Evaluation\n\n- `--restore`: restores previous training state stored in `load-dir` (or in `save-dir` if no `load-dir`\nhas been provided), and continues training (default: `False`)\n\n- `--display`: displays to the screen the trained policy stored in `load-dir` (or in `save-dir` if no `load-dir`\nhas been provided), but does not continue training (default: `False`)\n\n- `--benchmark`: runs benchmarking evaluations on saved policy, saves results to `benchmark-dir` folder (default: `False`)\n\n- `--benchmark-iters`: number of iterations to run benchmarking for (default: `100000`)\n\n- `--benchmark-dir`: directory where benchmarking data is saved (default: `\"./benchmark_files/\"`)\n\n- `--plots-dir`: directory where training curves are saved (default: `\"./learning_curves/\"`)\n\n## Code structure\n\n- `./experiments/train.py`: contains code for training MADDPG on the MPE\n\n- `./maddpg/trainer/maddpg.py`: core code for the MADDPG algorithm\n\n- `./maddpg/trainer/replay_buffer.py`: replay buffer code for MADDPG\n\n- `./maddpg/common/distributions.py`: useful distributions used in `maddpg.py`\n\n- `./maddpg/common/tf_util.py`: useful tensorflow functions used in `maddpg.py`\n\n\n\n## Paper citation\n\nIf you used this code for your experiments or found it helpful, consider citing the following paper:\n\n\u003cpre\u003e\n@article{lowe2017multi,\n  title={Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments},\n  author={Lowe, Ryan and Wu, Yi and Tamar, Aviv and Harb, Jean and Abbeel, Pieter and Mordatch, Igor},\n  journal={Neural Information Processing Systems (NIPS)},\n  year={2017}\n}\n\u003c/pre\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenai%2Fmaddpg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenai%2Fmaddpg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenai%2Fmaddpg/lists"}