Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/quantumiracle/mars
MARS is short for Multi-Agent Research Studio, a library for multi-agent reinforcement learning research.
- Host: GitHub
- URL: https://github.com/quantumiracle/mars
- Owner: quantumiracle
- License: apache-2.0
- Created: 2021-06-30T03:24:32.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-03-08T21:26:49.000Z (9 months ago)
- Last Synced: 2024-11-02T03:11:51.871Z (18 days ago)
- Language: Jupyter Notebook
- Size: 11.3 GB
- Stars: 44
- Watchers: 6
- Forks: 2
- Open Issues: 2
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
# MARS - Multi-Agent Research Studio
*If life exists on Mars, shall we humans cooperate or compete with it?*
**Table of contents:**
- [Status](#status)
- [Installation](#installation)
- [Usage](#usage)
- [Description](#description)
- [Support](#support)
- [Quick Start](#quick-start)
- [Advanced Usage](#advanced-usage)
- [Development](#development)
- [License](#license)
- [Citation](#citation)
- [Primary Results](#primary-results)

## Description
If you have any questions (open an issue if it is a general problem) or want to contribute to this repository, feel free to contact me: *[email protected]*
**MARS** is a comprehensive library for benchmarking multi-player zero-sum Markov games, including our proposed **Nash-DQN** algorithm as well as baseline methods like **Self-Play, Fictitious Self-Play, Neural Fictitious Self-Play, Policy Space Response Oracle**, etc. An independent implementation of the **Nash-DQN** algorithm is provided in [another repo](https://github.com/quantumiracle/nash-dqn) if you want a quick overview.
## Installation
```bash
git clone --depth=1 https://github.com/quantumiracle/MARS.git # depth=1 ensures small size
cd MARS
conda env create -f conda_env_mars.yml
conda activate mars
```

## Usage
### Description
MARS is mainly built for solving **multi-agent Atari games** in [PettingZoo](https://www.pettingzoo.ml/atari), especially competitive (zero-sum) games.
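For reference, a minimal interaction loop with a PettingZoo Atari environment looks like the following (standard PettingZoo AEC API, shown for context only and not part of MARS; the exact `env.last()` signature varies across PettingZoo versions):

```python
# Standard PettingZoo AEC interaction loop (context only, not MARS code);
# the exact API signature varies across PettingZoo versions.
from pettingzoo.atari import boxing_v2

env = boxing_v2.env()
env.reset(seed=0)
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    # Act randomly here; a MARL algorithm would sample from its policy instead.
    action = None if termination or truncation else env.action_space(agent).sample()
    env.step(action)
env.close()
```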
A comprehensive usage [document](http://htmlpreview.github.io/?https://github.com/quantumiracle/MARS/blob/master/docs/build/html/index.html) is provided.
Some [tutorials](https://github.com/quantumiracle/MARS/tree/master/tutorial) are provided for basic MARL concepts, including building arbitrary matrix games, solving their Nash equilibria with different algorithms, building arbitrary Markov games, and solving Markov games.
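To give a flavor of what those tutorials cover, here is a self-contained sketch (plain NumPy/SciPy, not the MARS API) that solves a two-player zero-sum matrix game for a Nash equilibrium via linear programming:

```python
# Solve a two-player zero-sum matrix game by linear programming
# (illustrative sketch of the tutorial concepts, not MARS code).
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Return (row player's Nash strategy, game value) for payoff matrix A."""
    m, n = A.shape
    shift = 1.0 - A.min()          # shift payoffs so the game value is positive
    B = A + shift
    # min sum(y)  s.t.  B^T y >= 1, y >= 0; then value = 1/sum(y), strategy = y * value
    res = linprog(c=np.ones(m), A_ub=-B.T, b_ub=-np.ones(n), bounds=[(0, None)] * m)
    y = res.x
    return y / y.sum(), 1.0 / y.sum() - shift

rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])   # rock-paper-scissors
strategy, value = solve_zero_sum(rps)
print(strategy, value)   # ~[1/3, 1/3, 1/3], value ~0
```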
MARS is still under development and not ready for release yet. You may find it slow to clone because the author is testing algorithms with some models hosted in the Git repository.
### Support
By convention in MARS, `EnvSpec = EnvironmentType + '_' + EnvironmentName`. Supported environments are as follows:
| Environment Type | Environment Name |
| --------------- | --------------------------------------------------|
| [`gym`](https://github.com/openai/gym) | all standard envs in OpenAI Gym |
|[`pettingzoo`](https://www.pettingzoo.ml) | 'basketball_pong_v3', 'boxing_v2', 'combat_jet_v1', 'combat_tank_v2', 'double_dunk_v3', 'entombed_competitive_v3', 'entombed_cooperative_v3', 'flag_capture_v2', 'foozpong_v3', 'ice_hockey_v2', 'joust_v3','mario_bros_v3', 'maze_craze_v3', 'othello_v3', 'pong_v3', 'quadrapong_v4', 'space_invaders_v2', 'space_war_v2', 'surround_v2', 'tennis_v3', 'video_checkers_v4', 'volleyball_pong_v2', 'warlords_v3', 'wizard_of_wor_v3'; 'dou_dizhu_v4', 'go_v5', 'leduc_holdem_v4', 'rps_v2', 'texas_holdem_no_limit_v6', 'texas_holdem_v4', 'tictactoe_v3', 'uno_v4' |
|[`lasertag`](https://github.com/younggyoseo/lasertag-v0) | 'LaserTag-small2-v0', 'LaserTag-small3-v0', 'LaserTag-small4-v0' |
|[`slimevolley`](https://github.com/hardmaru/slimevolleygym) | 'SlimeVolley-v0', 'SlimeVolleySurvivalNoFrameskip-v0', 'SlimeVolleyNoFrameskip-v0', 'SlimeVolleyPixel-v0' |
|[`robosumo`](https://github.com/openai/robosumo) | 'RoboSumo-Ant-vs-Ant-v0', 'RoboSumo-Ant-vs-Bug-v0', 'RoboSumo-Ant-vs-Spider-v0', 'RoboSumo-Bug-vs-Ant-v0', 'RoboSumo-Bug-vs-Bug-v0', 'RoboSumo-Bug-vs-Spider-v0', 'RoboSumo-Spider-vs-Ant-v0', 'RoboSumo-Spider-vs-Bug-v0','RoboSumo-Spider-vs-Spider-v0' |
|[`mdp`](https://github.com/quantumiracle/MARS/tree/master/mars/env/mdp)| 'arbitrary_mdp', 'arbitrary_richobs_mdp', 'attack', 'combinatorial_lock' |
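Since environment names may themselves contain underscores (e.g. `boxing_v2`), an `EnvSpec` splits into its two parts at the first underscore only. A hypothetical helper (for illustration, not a MARS API):

```python
# Hypothetical helper (illustration only, not a MARS API): split an EnvSpec
# into (environment type, environment name) at the first underscore.
def parse_env_spec(env_spec: str):
    env_type, _, env_name = env_spec.partition('_')
    return env_type, env_name

print(parse_env_spec('pettingzoo_boxing_v2'))        # ('pettingzoo', 'boxing_v2')
print(parse_env_spec('slimevolley_SlimeVolley-v0'))  # ('slimevolley', 'SlimeVolley-v0')
```

Supported algorithms are as follows: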
| Method | Descriptions |
| --------------- | --------------- |
| Self-play | iterative best response |
| [Fictitious Self-Play](http://proceedings.mlr.press/v37/heinrich15.pdf) | iterative best response to the opponent's historical average strategy |
| [Neural Fictitious Self-Play](https://arxiv.org/abs/1603.01121) | a neural approximation version of FSP |
| [Policy Space Response Oracle](https://proceedings.neurips.cc/paper/2017/file/3323fe11e9595c09af38fe67567a9394-Paper.pdf) | a neural version of Double Oracle: iterative best response to the opponent's meta-Nash strategy |
| [Nash Q-learning](https://www.jmlr.org/papers/volume4/temp/hu03a.pdf) | model-free; provable convergence assuming a unique Nash equilibrium at each stage game |
| [Nash Value Iteration](http://proceedings.mlr.press/v139/liu21z.html) | model-based; provably efficient convergence with optimistic value estimation (exploration bonus) |
| [Nash DQN](https://arxiv.org/pdf/2207.08894.pdf) | a neural version of Nash Q-learning / Nash Value Iteration |
| [Nash DQN with Exploiter](https://arxiv.org/pdf/2207.08894.pdf) | Nash DQN with an asymmetric learning scheme, where the opponent acts as an exploiter |
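To make the fictitious-play idea concrete, here is a runnable toy example (plain NumPy, not MARS code): repeatedly best-responding to the opponent's historical average strategy drives the empirical average strategy toward the Nash equilibrium of rock-paper-scissors.

```python
# Fictitious play on rock-paper-scissors (toy illustration, not MARS code):
# best-respond to the historical average strategy each round.
import numpy as np

A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])   # row player's payoff matrix

def best_response(opponent_avg):
    return np.eye(3)[np.argmax(A @ opponent_avg)]    # pure best response

history = [np.array([1.0, 0.0, 0.0])]                # start from pure 'rock'
for _ in range(1000):
    history.append(best_response(np.mean(history, axis=0)))

print(np.mean(history, axis=0))   # slowly approaches the uniform Nash strategy [1/3, 1/3, 1/3]
```

### Quick Start: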
**1. Train with MARL algorithm**:
*Format*:
`python general_train.py --env **EnvSpec** --method **Method** --save_id **WheretoSave**`
*Example*:
```bash
# PettingZoo Boxing_v1, neural fictitious self-play
python general_train.py --env pettingzoo_boxing_v1 --method nfsp --save_id train_0

# PettingZoo Pong_v2, fictitious self-play
python general_train.py --env pettingzoo_pong_v2 --method fictitious_selfplay --save_id train_1

# PettingZoo Surround_v1, policy space response oracle
python general_train.py --env pettingzoo_surround_v1 --method psro --save_id train_3

# SlimeVolley SlimeVolley-v0, self-play
python general_train.py --env slimevolley_SlimeVolley-v0 --method selfplay --save_id train_4
```

To see all user input arguments:
```bash
python general_train.py --help
```

**2. Exploit a trained model**:
*Format*:
`python general_exploit.py --env **EnvSpec** --method **Method** --load_id **TrainedModelID** --save_id **WheretoSave** --to_exploit **ExploitWhichPlayer**`
*Example*:
```bash
python general_exploit.py --env pettingzoo_boxing_v1 --method nfsp --load_id train_0 --save_id exploit_0 --to_exploit second
```

More examples are provided in [`./examples/`](https://github.com/quantumiracle/MARS/tree/master/examples) and [`./unit_test/`](https://github.com/quantumiracle/MARS/tree/master/unit_test). Note that these files need to be placed in the **root** directory (`./`) to run.
### Advanced Usage:
**1. Use [Wandb](https://wandb.ai) for logging training results**:
*Format*:
`python general_train.py --env **EnvSpec** --method **Method** --save_id **WheretoSave** --wandb_activate True --wandb_entity **YourWandbAccountName** --wandb_project **ProjectName**`

*Example*:
```bash
python general_train.py --env pettingzoo_boxing_v1 --method nfsp --save_id multiprocess_train_0 --wandb_activate True --wandb_entity name --wandb_project pettingzoo_boxing_v1_nfsp
```

**2. Train with a MARL algorithm using multiprocess sampling and update**:
*Format*:
`python general_launch.py --env **EnvSpec** --method **Method** --save_id **WheretoSave**`
*Example*:
```bash
python general_launch.py --env pettingzoo_boxing_v1 --method nfsp --save_id multiprocess_train_0
```

**3. Exploit a trained model (same as above)**:
*Example*:
```bash
python general_exploit.py --env pettingzoo_boxing_v1 --method nfsp --load_id multiprocess_train_0 --save_id exploit_0 --to_exploit second
```

**4. Test a trained MARL model in single-agent Atari**:
This function is only available for certain environments (like *boxing*), since not all envs in PettingZoo Atari have a single-agent counterpart in OpenAI Gym.
*Example*:
```bash
python general_test.py --env pettingzoo_boxing_v1 --method nfsp --load_id train_0 --save_id test_0
```

**5. Bash scripts for servers**:
Bash scripts for running multiple tasks on servers are provided in `./server_bash_scripts`. For example, to run a training bash script (put it in the **root** directory first):
*Example*:
```bash
./general_train.sh
```

## Development
Basic single-agent RL algorithms (for best response, etc.) to do:
- [x] DQN
- [x] PPO

MARL algorithms:
- [x] Self-Play
- [x] [Fictitious Self-Play](http://proceedings.mlr.press/v37/heinrich15.pdf)
- [x] [Neural Fictitious Self-Play](https://arxiv.org/abs/1603.01121)
- [x] [Policy Space Responce Oracle](https://proceedings.neurips.cc/paper/2017/file/3323fe11e9595c09af38fe67567a9394-Paper.pdf)
- [x] Nash-DQN
- [x] Nash-DQN-Exploiter

Supported environments:
- [x] [Openai Gym](https://github.com/openai/gym)
- [x] [PettingZoo](https://www.pettingzoo.ml)
- [x] [LaserTag](https://github.com/younggyoseo/lasertag-v0)
- [x] [SlimeVolley](https://github.com/hardmaru/slimevolleygym)
- [x] [Robosumo](https://github.com/openai/robosumo) (requiring gym==0.16)
- [x] [Matrix Markov Game](https://github.com/quantumiracle/MARS/tree/master/mars/env/mdp)

## Primary Results
Two agents in *SlimeVolley-v0* trained with self-play.
Two agents in *Boxing-v1 PettingZoo* trained with self-play.
[Exploitability](exploit.md) tests, measuring how much a best-responding opponent can gain against the learned strategy, are also conducted.
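In a matrix game, exploitability has a simple closed form; the sketch below (plain NumPy, not the MARS test code) shows the quantity being measured:

```python
# Exploitability of a strategy pair (x, y) in a zero-sum matrix game A
# (concept sketch; MARS measures the analogue for Markov games).
import numpy as np

def exploitability(A, x, y):
    # Total gain available to best-responding deviators; zero exactly at a Nash equilibrium.
    return (A @ y).max() - (x @ A).min()

A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])   # rock-paper-scissors
uniform = np.ones(3) / 3
print(exploitability(A, uniform, uniform))           # 0.0 at the Nash equilibrium
```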
## License
MARS is distributed under the terms of the Apache License (Version 2.0).
See [Apache License](https://github.com/quantumiracle/MARS/blob/master/LICENSE) for details.
## Citation
If you find MARS useful, please cite it in your publications.
```
@article{ding2022deep,
title={A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games},
author={Ding, Zihan and Su, Dijia and Liu, Qinghua and Jin, Chi},
journal={arXiv preprint arXiv:2207.08894},
year={2022}
}
```
```
@software{MARS,
author = {Zihan Ding and Andy Su and Qinghua Liu and Chi Jin},
title = {MARS},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/quantumiracle/MARS}},
}
```