{"id":13793979,"url":"https://github.com/junxiaosong/AlphaZero_Gomoku","last_synced_at":"2025-05-12T20:31:20.002Z","repository":{"id":37390988,"uuid":"114990078","full_name":"junxiaosong/AlphaZero_Gomoku","owner":"junxiaosong","description":"An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row) ","archived":false,"fork":false,"pushed_at":"2024-04-24T00:40:39.000Z","size":8266,"stargazers_count":3459,"open_issues_count":78,"forks_count":985,"subscribers_count":102,"default_branch":"master","last_synced_at":"2025-04-20T02:02:04.027Z","etag":null,"topics":["alphago","alphago-zero","alphazero","board-game","gobang","gomoku","mcts","monte-carlo-tree-search","pytorch","reinforcement-learning","rl","self-learning","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/junxiaosong.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-12-21T09:49:40.000Z","updated_at":"2025-04-17T16:50:48.000Z","dependencies_parsed_at":"2024-04-04T01:41:31.934Z","dependency_job_id":"dfa2de23-6c26-408e-b9b3-2cf0ea447fde","html_url":"https://github.com/junxiaosong/AlphaZero_Gomoku","commit_stats":{"total_commits":34,"total_committers":10,"mean_commits":3.4,"dds":0.6764705882352942,"last_synced_commit":"a2555b26e38aaaa08270e0731c53135e6222ef46"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/junxiaosong%2FAlphaZero_Gomoku","tags_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/junxiaosong%2FAlphaZero_Gomoku/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/junxiaosong%2FAlphaZero_Gomoku/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/junxiaosong%2FAlphaZero_Gomoku/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/junxiaosong","download_url":"https://codeload.github.com/junxiaosong/AlphaZero_Gomoku/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253816737,"owners_count":21968875,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alphago","alphago-zero","alphazero","board-game","gobang","gomoku","mcts","monte-carlo-tree-search","pytorch","reinforcement-learning","rl","self-learning","tensorflow"],"created_at":"2024-08-03T23:00:34.191Z","updated_at":"2025-05-12T20:31:18.566Z","avatar_url":"https://github.com/junxiaosong.png","language":"Python","funding_links":[],"categories":["Project","Python","Pytorch elsewhere ｜ Pytorch相关","漏洞库_漏洞靶场","Pytorch elsewhere","Open-Source Projects"],"sub_categories":["Implementation of Algorithms","Other libraries｜其他库:","资源传输下载","Other libraries:","Gomoku Projects"],"readme":"## AlphaZero-Gomoku\nThis is an implementation of the AlphaZero algorithm for playing the simple board game Gomoku (also called Gobang or Five in a Row) from pure self-play training. 
The game Gomoku is much simpler than Go or chess, so we can focus on the training scheme of AlphaZero and obtain a pretty good AI model on a single PC in a few hours.\n\nReferences:  \n1. AlphaZero: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm\n2. AlphaGo Zero: Mastering the game of Go without human knowledge\n\n### Update 2018.2.24: supports training with TensorFlow!\n### Update 2018.1.17: supports training with PyTorch!\n\n### Example Games Between Trained Models\n- Each move with 400 MCTS playouts:  \n![playout400](https://raw.githubusercontent.com/junxiaosong/AlphaZero_Gomoku/master/playout400.gif)\n\n### Requirements\nTo play with the trained AI models, you only need:\n- Python \u003e= 2.7\n- NumPy \u003e= 1.11\n\nTo train the AI model from scratch, you additionally need one of:\n- Theano \u003e= 0.7 and Lasagne \u003e= 0.1      \nor\n- PyTorch \u003e= 0.2.0    \nor\n- TensorFlow\n\n**PS**: if your Theano version is newer than 0.7, either follow this [issue](https://github.com/aigamedev/scikit-neuralnetwork/issues/235) to install Lasagne,  \nor force pip to downgrade Theano to 0.7: ``pip install --upgrade theano==0.7.0``\n\nIf you would like to train the model using another DL framework, you only need to rewrite policy_value_net.py.\n\n### Getting Started\nTo play with the provided models, run the following script from the project directory:  \n```\npython human_play.py  \n```\nYou may modify human_play.py to try different provided models or the pure MCTS.\n\nTo train the AI model from scratch with Theano and Lasagne, directly run:   \n```\npython train.py\n```\nWith PyTorch or TensorFlow, first modify the file [train.py](https://github.com/junxiaosong/AlphaZero_Gomoku/blob/master/train.py), i.e., comment out the line\n```\nfrom policy_value_net import PolicyValueNet  # Theano and Lasagne\n```\nand uncomment one of the lines \n```\n# from policy_value_net_pytorch import PolicyValueNet  # Pytorch\nor\n# from policy_value_net_tensorflow import 
PolicyValueNet # Tensorflow\n```\nand then execute: ``python train.py``  (To use a GPU with PyTorch, set ``use_gpu=True``; if your PyTorch version is greater than 0.5, also use ``return loss.item(), entropy.item()`` in the function train_step in policy_value_net_pytorch.py.)\n\nThe models (best_policy.model and current_policy.model) are saved every few updates (every 50 by default).  \n\n**Note:** the 4 provided models were trained using Theano/Lasagne; to use them with PyTorch, please refer to [issue 5](https://github.com/junxiaosong/AlphaZero_Gomoku/issues/5).\n\n**Tips for training:**\n1. It is best to start with a 6 * 6 board and 4 in a row. In this case, a reasonably good model can be obtained within 500~1000 self-play games, in about 2 hours.\n2. For an 8 * 8 board and 5 in a row, it may take 2000~3000 self-play games to get a good model, which may take about 2 days on a single PC.\n\n### Further reading\nMy article (in Chinese) describing some details of the implementation: [https://zhuanlan.zhihu.com/p/32089487](https://zhuanlan.zhihu.com/p/32089487) \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjunxiaosong%2FAlphaZero_Gomoku","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjunxiaosong%2FAlphaZero_Gomoku","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjunxiaosong%2FAlphaZero_Gomoku/lists"}