{"id":19723027,"url":"https://github.com/davidadsp/simple","last_synced_at":"2025-04-05T22:08:38.966Z","repository":{"id":41139539,"uuid":"332528498","full_name":"davidADSP/SIMPLE","owner":"davidADSP","description":"Selfplay In MultiPlayer Environments","archived":false,"fork":false,"pushed_at":"2024-06-12T20:17:48.000Z","size":21951,"stargazers_count":318,"open_issues_count":22,"forks_count":107,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-03-29T21:05:58.591Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davidADSP.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-24T18:47:37.000Z","updated_at":"2025-03-18T13:19:43.000Z","dependencies_parsed_at":"2025-02-03T20:41:12.972Z","dependency_job_id":null,"html_url":"https://github.com/davidADSP/SIMPLE","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidADSP%2FSIMPLE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidADSP%2FSIMPLE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidADSP%2FSIMPLE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidADSP%2FSIMPLE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/davidADSP","download_url":"https://codeload.github.com/davidADSP/SIMPLE/tar.gz/refs/heads/main","host":{"name":"
GitHub","url":"https://github.com","kind":"github","repositories_count":247406091,"owners_count":20933803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T23:19:16.767Z","updated_at":"2025-04-05T22:08:38.944Z","avatar_url":"https://github.com/davidADSP.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- PROJECT SHIELDS --\u003e\n[![Contributors][contributors-shield]][contributors-url]\n[![Forks][forks-shield]][forks-url]\n[![Stargazers][stars-shield]][stars-url]\n[![Issues][issues-shield]][issues-url]\n[![MIT License][license-shield]][license-url]\n[![LinkedIn][linkedin-shield]][linkedin-url]\n\n\u003c!-- PROJECT LOGO --\u003e\n\u003cbr /\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/davidADSP/SIMPLE\"\u003e\n    \u003cimg src=\"images/logo.png\" alt=\"Logo\" height=\"120\"\u003e\n  \u003c/a\u003e\n\n  \u003c!-- \u003ch3 align=\"center\"\u003eSIMPLE\u003c/h3\u003e --\u003e\n\n  \u003cp align=\"center\"\u003e\n    Selfplay In MultiPlayer Environments\n    \u003c!-- \u003cbr /\u003e --\u003e\n    \u003c!-- \u003ca href=\"https://github.com/davidADSP/SIMPLE\"\u003e\u003cstrong\u003eExplore the docs »\u003c/strong\u003e\u003c/a\u003e --\u003e\n    \u003cbr /\u003e\n    \u003c!-- \u003ca href=\"https://github.com/davidADSP/SIMPLE\"\u003eView Demo\u003c/a\u003e --\u003e\n    ·\n    \u003ca href=\"https://github.com/davidADSP/SIMPLE/issues\"\u003eReport Bug\u003c/a\u003e\n    ·\n    \u003ca href=\"https://github.com/davidADSP/SIMPLE/issues\"\u003eRequest Feature\u003c/a\u003e\n  
\u003c/p\u003e\n\u003c/p\u003e\n\u003cbr\u003e\n\n\n\u003c!-- TABLE OF CONTENTS --\u003e\n\n  \u003csummary\u003e\u003ch2 style=\"display: inline-block\"\u003eTable of Contents\u003c/h2\u003e\u003c/summary\u003e\n  \u003col\u003e\n    \u003cli\u003e\n      \u003ca href=\"#about-the-project\"\u003eAbout The Project\u003c/a\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#getting-started\"\u003eGetting Started\u003c/a\u003e\n      \u003cul\u003e\n        \u003cli\u003e\u003ca href=\"#prerequisites\"\u003ePrerequisites\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#installation\"\u003eInstallation\u003c/a\u003e\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#tutorial\"\u003eTutorial\u003c/a\u003e\u003c/li\u003e\n      \u003cul\u003e\n        \u003cli\u003e\u003ca href=\"#prerequisites\"\u003eQuickstart\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#prerequisites\"\u003eTensorboard\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#custom-environments\"\u003eCustom Environments\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#parallelisation\"\u003eParallelisation\u003c/a\u003e\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#roadmap\"\u003eRoadmap\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#contributing\"\u003eContributing\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#license\"\u003eLicense\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#contact\"\u003eContact\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#acknowledgements\"\u003eAcknowledgements\u003c/a\u003e\u003c/li\u003e\n  \u003c/ol\u003e\n\n\n\n\u003cbr\u003e\n\n---\n\u003c!-- ABOUT THE PROJECT --\u003e\n## About The Project\n\n\u003cimg src=\"images/diagram.png\" alt=\"SIMPLE Diagram\" width='100%'\u003e\n\nThis project allows you to train AI agents on 
custom-built multiplayer environments through self-play reinforcement learning.\n\nIt implements [Proximal Policy Optimisation (PPO)](https://openai.com/blog/openai-baselines-ppo/), with a built-in wrapper around the multiplayer environments that handles the loading and action-taking of opponents in the environment. The wrapper delays returning the reward to the PPO agent until all opponents have taken their turn. In essence, it converts the multiplayer environment into a single-player environment that is constantly evolving as new versions of the policy network are added to the network bank.\n\nTo learn more, check out the accompanying [blog post](https://medium.com/applied-data-science/how-to-train-ai-agents-to-play-multiplayer-games-using-self-play-deep-reinforcement-learning-247d0b440717).\n\nThis guide explains how to get started with the repo, add new custom environments and tune the hyperparameters of the system.\n\nHave fun!\n\n---\n\u003c!-- GETTING STARTED --\u003e\n\n## Getting Started\n\nTo get a local copy up and running, follow these simple steps.\n\n### Prerequisites\n\nInstall [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) to make use of the `docker-compose.yml` file.\n\n### Installation\n\n1. Clone the repo\n   ```sh\n   git clone https://github.com/davidADSP/SIMPLE.git\n   cd SIMPLE\n   ```\n2. Build the image and 'up' the container.\n   ```sh\n   docker-compose up -d\n   ```\n3. Choose an environment to install in the container (`tictactoe`, `connect4`, `sushigo`, `geschenkt`, `butterfly`, and `flamme rouge` are currently implemented)\n   ```sh\n   bash ./scripts/install_env.sh sushigo\n   ```\n\n---\n\u003c!-- TUTORIAL --\u003e\n## Tutorial\n\nThis is a quick tutorial to allow you to start using the two entrypoints into the codebase: `test.py` and `train.py`.\n\n*TODO - I'll be adding more substantial documentation for both of these entrypoints in due course! 
For now, descriptions of each command line argument can be found at the bottom of the files themselves.*\n\n---\n\u003c!-- QUICKSTART --\u003e\n### Quickstart\n\n#### `test.py` \n\nThis entrypoint allows you to play against a trained AI, pit two AIs against each other or play against a baseline random model.\n\nFor example, try the following command to play against a baseline random model in the Sushi Go environment.\n   ```sh\n   docker-compose exec app python3 test.py -d -g 1 -a base base human -e sushigo \n   ```\n\n#### `train.py` \n\nThis entrypoint allows you to start training the AI using selfplay PPO. The underlying PPO engine is from the [Stable Baselines](https://stable-baselines.readthedocs.io/en/master/) package.\n\nFor example, you can start training the agent to learn how to play Sushi Go with the following command:\n   ```sh\n   docker-compose exec app python3 train.py -r -e sushigo \n   ```\n\nAfter 30 or 40 iterations the process should have scored above the default threshold of 0.2 and will output a new `best_model.zip` to the `/zoo/sushigo` folder. \n\nTraining runs until you kill the process manually (e.g. with Ctrl-C), so do that now.\n\nYou can now use the `test.py` entrypoint to play 100 games silently between the current `best_model.zip` and the random baseline model as follows:\n\n  ```sh\n  docker-compose exec app python3 test.py -g 100 -a best_model base base -e sushigo \n  ```\n\nYou should see that `best_model` scores better than the two baseline model opponents. \n```sh\nPlayed 100 games: {'best_model_btkce': 31.0, 'base_sajsi': -15.5, 'base_poqaj': -15.5}\n```\n\nYou can continue training the agent by dropping the `-r` reset flag from the `train.py` entrypoint arguments - it will just pick up from where it left off.\n\n   ```sh\n   docker-compose exec app python3 train.py -e sushigo \n   ```\n\nCongratulations, you've just completed one training cycle for the game Sushi Go! 
The PPO agent will now have to work out a way to beat the model it has just created...\n\n---\n\u003c!-- TENSORBOARD --\u003e\n### Tensorboard\n\nTo monitor training, you can start Tensorboard with the following command:\n\n  ```sh\n  bash scripts/tensorboard.sh\n  ```\n\nNavigate to `localhost:6006` in a browser to view the output.\n\nIn the `/zoo/pretrained/` folder there is a pre-trained `/\u003cgame\u003e/best_model.zip` for each game, which can be copied up a directory (e.g. to `/zoo/sushigo/best_model.zip`) if you want to test playing against a pre-trained agent right away.\n\n---\n\u003c!-- CUSTOM ENVIRONMENTS --\u003e\n### Custom Environments\n\nYou can add a new environment by copying and editing an existing environment in the `/environments/` folder.\n\nFor the environment to work with the SIMPLE self-play wrapper, the class must contain the following methods (expanding on the standard methods from the OpenAI Gym framework):\n\n`__init__`\n\nIn the initialisation method, you need to define the usual `action_space` and `observation_space`, as well as two additional variables: \n  * `n_players` - the number of players in the game\n  * `current_player_num` - an integer that tracks which player is currently active\n   \n\n`step`\n\nThe `step` method accepts an `action` from the currently active player and performs the necessary steps to update the game environment. It should also update the `current_player_num` to the next player, and check to see if an end state of the game has been reached.\n\n\n`reset`\n\nThe `reset` method is called to reset the game to the starting state, ready to accept the first action.\n\n\n`render`\n\nThe `render` function is called to output a visual or human-readable summary of the current game state to the log file.\n\n\n`observation`\n\nThe `observation` function returns a numpy array that can be fed as input to the PPO policy network. 
It should return a numeric representation of the current game state, from the perspective of the current player, where each element of the array is in the range `[-1,1]`.\n\n\n`legal_actions`\n\nThe `legal_actions` function returns a numpy vector of the same length as the action space, where 1 indicates that the action is valid and 0 indicates that the action is invalid.\n\n\nPlease refer to existing environments for examples of how to implement each method.\n\nYou will also need to add the environment to the two functions in `/utils/register.py` - follow the existing examples of environments for the structure.\n\n---\n\u003c!-- Parallelisation --\u003e\n### Parallelisation\n\nThe training process can be parallelised using MPI across multiple cores.\n\nFor example, to run 10 parallel processes that contribute games to the current iteration, you can simply run:\n\n  ```sh\n  docker-compose exec app mpirun -np 10 python3 train.py -e sushigo \n  ```\n\n---\n\u003c!-- ROADMAP --\u003e\n## Roadmap\n\nSee the [open issues](https://github.com/davidADSP/SIMPLE/issues) for a list of proposed features (and known issues).\n\n\n---\n\u003c!-- CONTRIBUTING --\u003e\n## Contributing\n\nAny contributions you make are **greatly appreciated**.\n\n1. Fork the Project\n2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)\n3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the Branch (`git push origin feature/AmazingFeature`)\n5. Open a Pull Request\n\n\n---\n\u003c!-- LICENSE --\u003e\n## License\n\nDistributed under the GPL-3.0 license. See `LICENSE` for more information.\n\n\n---\n\u003c!-- CONTACT --\u003e\n## Contact\n\nDavid Foster - [@davidADSP](https://twitter.com/davidADSP) - david@adsp.ai\n\nProject Link: [https://github.com/davidADSP/SIMPLE](https://github.com/davidADSP/SIMPLE)\n\n\n---\n\u003c!-- ACKNOWLEDGEMENTS --\u003e\n## Acknowledgements\n\nThere are many repositories and blogs that have helped me to put together this repository. 
One that deserves particular acknowledgement is David Ha's Slime Volleyball Gym, which also implements multi-agent reinforcement learning. It has helped me to understand how to adapt the callback function to a self-play setting and also how to implement MPI so that the codebase can be highly parallelised. Definitely worth checking out! \n\n* [David Ha - Slime Volleyball Gym](https://github.com/hardmaru/slimevolleygym)\n\n---\n\u003c!-- MARKDOWN LINKS \u0026 IMAGES --\u003e\n\u003c!-- https://www.markdownguide.org/basic-syntax/#reference-style-links --\u003e\n[contributors-shield]: https://img.shields.io/github/contributors/davidADSP/SIMPLE.svg?style=for-the-badge\n[contributors-url]: https://github.com/davidADSP/SIMPLE/graphs/contributors\n[forks-shield]: https://img.shields.io/github/forks/davidADSP/SIMPLE.svg?style=for-the-badge\n[forks-url]: https://github.com/davidADSP/SIMPLE/network/members\n[stars-shield]: https://img.shields.io/github/stars/davidADSP/SIMPLE.svg?style=for-the-badge\n[stars-url]: https://github.com/davidADSP/SIMPLE/stargazers\n[issues-shield]: https://img.shields.io/github/issues/davidADSP/SIMPLE.svg?style=for-the-badge\n[issues-url]: https://github.com/davidADSP/SIMPLE/issues\n[license-shield]: https://img.shields.io/github/license/davidADSP/SIMPLE.svg?style=for-the-badge\n[license-url]: https://github.com/davidADSP/SIMPLE/blob/master/LICENSE.txt\n[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge\u0026logo=linkedin\u0026colorB=555\n[linkedin-url]: https://linkedin.com/in/davidtfoster","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidadsp%2Fsimple","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavidadsp%2Fsimple","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidadsp%2Fsimple/lists"}