{"id":18264590,"url":"https://github.com/cheanus/alphazero-sanmill","last_synced_at":"2025-04-09T01:32:46.078Z","repository":{"id":216030802,"uuid":"698672645","full_name":"cheanus/AlphaZero-Sanmill","owner":"cheanus","description":"AlphaZero suitable for sanmill","archived":false,"fork":false,"pushed_at":"2024-04-25T02:47:06.000Z","size":127,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-14T19:51:25.705Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cheanus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-30T16:02:43.000Z","updated_at":"2024-04-25T02:47:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"8711a140-aac8-41af-8e77-b9cfb0dea002","html_url":"https://github.com/cheanus/AlphaZero-Sanmill","commit_stats":null,"previous_names":["cheanus/alphazero-sanmill"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cheanus%2FAlphaZero-Sanmill","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cheanus%2FAlphaZero-Sanmill/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cheanus%2FAlphaZero-Sanmill/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cheanus%2FAlphaZero-Sanmill/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cheanus","download_url":"https:
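//codeload.github.com/cheanus/AlphaZero-Sanmill/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247956552,"owners_count":21024570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T11:15:18.927Z","updated_at":"2025-04-09T01:32:46.060Z","avatar_url":"https://github.com/cheanus.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AlphaZero Sanmill\nAn AlphaZero training and testing program for the mill game (Sanmill), based on the open-source project [alpha-zero-general](https://github.com/suragnair/alpha-zero-general). In testing mode it offers human-versus-AI play with self-balancing difficulty to improve the playing experience.\n\nThis project uses the following special rules for the mill game:\n- The board has 4 diagonal lines\n- A game that exceeds 100 moves is declared a draw\n- A player with only 3 pieces left may fly\n- Pieces that are part of a mill may be captured\n\n## Optimizations\nCompared with the original project, this project makes the following main improvements:\n- Uses multiprocessing to speed up Self-play and Pitting\n- Uses orjson to speed up reading and writing of training sample data\n- Changes how the reward is computed so that it takes the number of moves into account\n- Ignores the terms for invalid actions when computing the loss\n- Adds a Validation step after each Training epoch\n- Uses an architecture similar to the Vision Transformer as the backbone\n- Improves and enriches the information printed to the terminal\n- Adapts MCTS to the rules of the mill game\n- Introduces dynamic difficulty self-balancing during play, based on an exponential moving average (EMA)\n\n## Environment\nThis project uses the following third-party libraries:\n- pytorch\n- tqdm\n- coloredlogs\n- orjson\n\n## Training\nRun `main.py` to start training:\n```bash\npython main.py\n```\nIf needed, you can modify the parameters in `main.py`. The main ones are:\n- `numIters`: number of Iterations over the whole run\n- `numEps`: number of Self-play games in one Iteration; should be an integer multiple of `num_processes`\n- `tempThreshold`: during Self-play, once the move count exceeds this value, the action with the highest probability is chosen directly\n- `updateThreshold`: during Pitting, the best model is updated when the win rate exceeds this value\n- `maxlenOfQueue`: maximum number of game examples kept for later training in one Self-play phase\n- `numMCTSSims`: number of simulations per MCTS search\n- `arenaCompare`: number of games played in each Pitting phase; should be an integer multiple of `2*num_processes`\n- `cpuct`: exploration parameter used in the MCTS search\n- `checkpoint`: path where models are saved\n- `numItersForTrainExamplesHistory`: maximum number of Iterations whose examples are kept for later training\n- 
`num_processes`: number of processes used for Self-play and Pitting\n- `lr`: learning rate\n\nModels are saved automatically during training, and the most recently trained model is saved automatically when training ends.\n\n**Training tip**: If the model fails to converge, try increasing `numMCTSSims` and decreasing `lr`.\n\n## Testing\nRun `pit.py` to start testing:\n```bash\npython pit.py\n```\nIf needed, you can modify the parameters in `pit.py`. The main ones are:\n- `difficulty`: the AI's difficulty, in the range -1 to 1; larger values are harder\n- `numMCTSSims`: number of simulations per MCTS search\n- `cpuct`: exploration parameter used in the MCTS search\n- `eat_factor`: weight factor for how much the AI values capturing pieces; larger values make it value captures more\n\n## Related projects\n- [alpha-zero-general](https://github.com/suragnair/alpha-zero-general)  \n  A general-purpose AlphaZero training and testing program\n- [Sanmill](https://github.com/calcitem/Sanmill)  \n  An implementation of the mill game on the Android platform\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcheanus%2Falphazero-sanmill","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcheanus%2Falphazero-sanmill","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcheanus%2Falphazero-sanmill/lists"}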