{"id":19466791,"url":"https://github.com/dena/handyrl","last_synced_at":"2025-05-16T12:09:54.914Z","repository":{"id":36989299,"uuid":"268993429","full_name":"DeNA/HandyRL","owner":"DeNA","description":"HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.","archived":false,"fork":false,"pushed_at":"2025-02-25T01:09:26.000Z","size":627,"stargazers_count":287,"open_issues_count":39,"forks_count":43,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-04-12T06:14:31.640Z","etag":null,"topics":["deep-learning","distributed-training","games","machine-learning","policy-gradient","pytorch","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DeNA.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-03T05:00:05.000Z","updated_at":"2025-03-03T06:19:37.000Z","dependencies_parsed_at":"2023-10-12T01:06:32.669Z","dependency_job_id":"8c655f6d-be4c-41bd-890d-3a0abbd1ef60","html_url":"https://github.com/DeNA/HandyRL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeNA%2FHandyRL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeNA%2FHandyRL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeNA%2FHandyRL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/DeNA%2FHandyRL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DeNA","download_url":"https://codeload.github.com/DeNA/HandyRL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248525138,"owners_count":21118619,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","distributed-training","games","machine-learning","policy-gradient","pytorch","reinforcement-learning"],"created_at":"2024-11-10T18:30:14.657Z","updated_at":"2025-04-12T06:14:40.163Z","avatar_url":"https://github.com/DeNA.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![HandyRL](docs/img/logo.png)\n\n![](https://github.com/DeNA/HandyRL/workflows/pytest/badge.svg?branch=master)\n\n**Quick to Start, Easy to Win**\n* Prepare your own environment\n* Let’s start large-scale distributed reinforcement learning\n* Get your strong AI agent!\n\nHandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments. HandyRL focuses on a practical algorithm and implementation for creating strong, winning AI in competitive games. 
For large-scale training, HandyRL provides a high degree of parallelism that you can control to suit your environment.\n\n\n* [More About HandyRL](#More-About-HandyRL)\n* [Installation](#Installation)\n* [Getting Started](#Getting-Started)\n    * [Train AI Model for Tic-Tac-Toe](#Train-AI-Model-for-Tic-Tac-Toe)\n* [Documentation](#Documentation)\n    * [Config Parameters](docs/parameters.md)\n    * [Large Scale Training](docs/large_scale_training.md)\n    * [Train with Customized Environment](docs/custom_environment.md)\n    * [API](docs/api.md)\n* [Use Cases](#Use-Cases)\n\nHandyRL is updated at the beginning of every month; important updates may be released at other times. We appreciate all contributions. Please let us know if you find a bug or have a suggestion by creating an issue or a PR.\n\n## More About HandyRL\n\nHandyRL mainly provides **a policy gradient algorithm with off-policy correction**.\nIn terms of stability and performance, the off-policy version of policy gradient works well in practice, so it is a good first choice for creating a baseline AI model.\nYou can choose among off-policy variants of update methods (targets for the policy and value), from traditional ones (Monte Carlo, TD(λ)) to newer ones (V-Trace, UPGO).\nThese options can be changed in `config.yaml`.\n\nAs a training architecture, HandyRL adopts **a learner-worker style architecture** like IMPALA.\nThe learner is the brain of training: it updates the model and controls the workers.\nThe workers have two roles: they asynchronously generate episodes (trajectories) and evaluate trained models.\nBy default, episodes are generated through self-play.\n\n\n## Installation\n\n### Install dependencies\n\nHandyRL supports Python 3.7+. First, copy or fork the HandyRL repository to your environment. 
If you want to use HandyRL in your private project, just copy the files to your project directory and modify them there.\n```\ngit clone https://github.com/DeNA/HandyRL.git\ncd HandyRL\n```\n\nThen, install the required libraries (e.g. numpy, pytorch), either directly or in a virtual environment or container (e.g. Docker).\n```\npip3 install -r requirements.txt\n```\n\nTo use games from Kaggle environments (e.g. Hungry Geese), also install the additional dependencies.\n```\npip3 install -r handyrl/envs/kaggle/requirements.txt\n```\n\n\n## Getting Started\n\n\n### Train AI Model for Tic-Tac-Toe\n\nThis section shows how to train a model for [Tic-Tac-Toe](https://en.wikipedia.org/wiki/Tic-tac-toe). Tic-Tac-Toe is a very simple game; you can try it by searching for \"Tic-Tac-Toe\" on Google.\n\n#### Step 1: Set up configuration\n\nEdit `config.yaml` to set up your training configuration. For example, to train on Tic-Tac-Toe with a batch size of 64, set the following:\n\n\n```yaml\nenv_args:\n    env: 'TicTacToe'\n\ntrain_args:\n    ...\n    batch_size: 64\n    ...\n```\n\nNOTE: [Here is the list of games implemented in HandyRL](handyrl/envs). All parameters are listed in [Config Parameters](docs/parameters.md).\n\n\n#### Step 2: Train!\n\nAfter creating the configuration, start training by running the following command. Trained models are saved in the `models` folder every `update_episodes` episodes, as set in `config.yaml`.\n\n```\npython main.py --train\n```\n\n\n#### Step 3: Evaluate\n\nAfter training, you can evaluate a model against other models. The command below evaluates the model from epoch 1 over 100 games using 4 processes.\n\n\n```\npython main.py --eval models/1.pth 100 4\n```\n\nNOTE: The default opponent AI is the random agent implemented in `evaluation.py`. 
You can replace it with any of your own agents.\n\n\n## Documentation\n\n* [**Config Parameters**](docs/parameters.md) lists the parameters of `config.yaml`.\n* [**Large Scale Training**](docs/large_scale_training.md) describes the procedure for large-scale training on remote machines.\n* [**Train with Customized Environment**](docs/custom_environment.md) explains the environment interface for creating your own game.\n* [**API**](docs/api.md) shows the entry-point APIs of `main.py`.\n\n\n## Use Cases\n\n*   [The 1st place solution in Hungry Geese (Kaggle)](https://www.kaggle.com/c/hungry-geese/discussion/263279)\n*   [The 5th place solution in Google Research Football with Manchester City F.C. (Kaggle)](https://www.kaggle.com/c/google-football/discussion/203412)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdena%2Fhandyrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdena%2Fhandyrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdena%2Fhandyrl/lists"}