{"id":14958658,"url":"https://github.com/zeta36/chess-alpha-zero","last_synced_at":"2025-05-15T09:07:10.122Z","repository":{"id":26901147,"uuid":"111276294","full_name":"Zeta36/chess-alpha-zero","owner":"Zeta36","description":"Chess reinforcement learning by AlphaGo Zero methods.","archived":false,"fork":false,"pushed_at":"2023-03-24T22:22:17.000Z","size":125337,"stargazers_count":2164,"open_issues_count":64,"forks_count":480,"subscribers_count":123,"default_branch":"master","last_synced_at":"2025-05-15T09:07:04.023Z","etag":null,"topics":["alphago-zero","chess","keras","reinforcement-learning","tensorflow"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Zeta36.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-11-19T08:07:54.000Z","updated_at":"2025-05-11T15:41:20.000Z","dependencies_parsed_at":"2022-07-10T22:46:53.124Z","dependency_job_id":"9b4958e7-e159-4be2-b721-98532d5f74bf","html_url":"https://github.com/Zeta36/chess-alpha-zero","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zeta36%2Fchess-alpha-zero","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zeta36%2Fchess-alpha-zero/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zeta36%2Fchess-alpha-zero/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zeta36%2Fchess-alpha-zero/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Zeta36","d
ownload_url":"https://codeload.github.com/Zeta36/chess-alpha-zero/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254310515,"owners_count":22049469,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alphago-zero","chess","keras","reinforcement-learning","tensorflow"],"created_at":"2024-09-24T13:17:45.523Z","updated_at":"2025-05-15T09:07:05.114Z","avatar_url":"https://github.com/Zeta36.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/kmader/chess-alpha-zero/master?urlpath=lab)\n[![Demo Notebook](https://img.shields.io/badge/launch-demo_notebook-red.svg)](https://mybinder.org/v2/gh/kmader/chess-alpha-zero/master?filepath=notebooks%2Fdemo.ipynb)\n\nAbout\n=====\n\nChess reinforcement learning by [AlphaGo Zero](https://deepmind.com/blog/alphago-zero-learning-scratch/) methods.\n\nThis project is based on these main resources:\n1) DeepMind's Oct 19th publication: [Mastering the Game of Go without Human Knowledge](https://www.nature.com/articles/nature24270.epdf?author_access_token=VJXbVjaSHxFoctQQ4p2k4tRgN0jAjWel9jnR3ZoTv0PVW4gB86EEpGqTRDtpIz-2rmo8-KG06gqVobU5NSCFeHILHcVFUeMsbvwS-lxjqQGg98faovwjxeTUgZAUMnRQ).\n2) The \u003cb\u003egreat\u003c/b\u003e Reversi development of the DeepMind ideas that @mokemokechicken did in his repo: https://github.com/mokemokechicken/reversi-alpha-zero\n3) DeepMind just released a new version of AlphaGo Zero (named now AlphaZero) where they master chess from 
scratch:\nhttps://arxiv.org/pdf/1712.01815.pdf. In fact, in chess AlphaZero outperformed Stockfish after just 4 hours (300k steps). Wow!\n\nSee the [wiki](https://github.com/Akababa/Chess-Zero/wiki) for more details.\n\nNote\n----\n\nI'm the creator of this repo. I and some other collaborators (https://github.com/Zeta36/chess-alpha-zero/graphs/contributors) did our best, but we found that self-play is too costly for a single machine. Supervised learning worked fine, but we never tried self-play by itself.\n\nAnyway, I want to mention that we have moved to a new repo where a lot of people are working on a distributed version of AZ for chess (MCTS in C++): https://github.com/glinscott/leela-chess\n\nThat project is almost done, and everybody will be able to participate just by running a pre-compiled Windows (or Linux) application. A really great job and effort has gone into that project, and I'm pretty sure we'll be able to reproduce the DeepMind results after a not-too-long period of distributed cooperation.\n\nSo I ask everybody who wishes to see a UCI engine running a neural network beat Stockfish to go to that repo and help with their machine power.\n\nEnvironment\n-----------\n\n* Python 3.6.3\n* tensorflow-gpu: 1.3.0\n* Keras: 2.0.8\n\n### New results (after a great number of modifications due to @Akababa)\n\nUsing supervised learning on about 10k games, I trained a model (7 residual blocks of 256 filters) to a guesstimate of 1200 Elo with 1200 sims/move. 
One of the strengths of MCTS is that it scales quite well with computing power.\n\nHere you can see an example where I (black) played against the model in the repo (white):\n\n![img](https://user-images.githubusercontent.com/4205182/34333105-ada817c6-e8fe-11e7-8c01-5958aaf264c1.gif)\n\nHere you can see an example of a game where I (white, ~2000 Elo) played against the model in this repo (black):\n\n![img](https://user-images.githubusercontent.com/4205182/34323276-ecd2a7b6-e806-11e7-856a-4e2394bd75df.gif)\n\n### First \"good\" results\n\nUsing the new supervised learning step I created, I've been able to train a model to the point where it seems to be learning chess openings. The model also seems to be starting to avoid naively losing pieces.\n\nHere you can see an example of a game I played against this model (AI plays black):\n\n![partida1](https://user-images.githubusercontent.com/17341905/33597844-ea53c8ae-d9a0-11e7-8564-4b9b0f35a221.gif)\n\nHere we have a game from a model trained by @bame55 (AI plays white):\n\n![partida3](https://user-images.githubusercontent.com/17341905/34030278-8796f7c6-e16c-11e7-9ba4-97af15f2cde5.gif)\n\nThis model plays this way after only 5 epoch iterations of the 'opt' worker; the 'eval' worker replaced the best model 4 times (4 of 5). 
At this moment the loss of the 'opt' worker is 5.1 (and it still seems to be converging very well).\n\nModules\n-------\n\n### Supervised Learning\n\nI've added a new supervised learning (SL) pipeline step that uses the human game files (\"PGN\") available on the internet as a play-data generator.\nThis SL step was also used in the first, original version of AlphaGo, and chess may be complex enough that the policy model has to be pre-trained before starting the self-play process (i.e., chess may be too complicated for self-play training alone).\n\nUsing the new SL process is as simple as running the new worker \"sl\" at the beginning instead of the worker \"self\".\nOnce the model has converged enough on the SL play-data, stop the worker \"sl\" and start the worker \"self\" so that the model continues improving from self-play data.\n\n```bash\npython src/chess_zero/run.py sl\n```\nIf you want to use this new SL step, you will have to download big PGN files (chess files) and paste them into the `data/play_data` folder ([FICS](http://ficsgames.org/download.html) is a good source of data). You can also use the [SCID program](http://scid.sourceforge.net/) to filter by headers like player Elo, game result and more.\n\n**To avoid overfitting, I recommend using data sets of at least 3000 games and running at most 3-4 epochs.**\n\n### Reinforcement Learning\n\nThis AlphaGo Zero implementation consists of three workers: `self`, `opt` and `eval`.\n\n* `self` is Self-Play, which generates training data by self-play using BestModel.\n* `opt` is Trainer, which trains the model and generates next-generation models.\n* `eval` is Evaluator, which evaluates whether the next-generation model is better than BestModel. If it is better, it replaces BestModel.\n\n\n### Distributed Training\n\nNow it's possible to train the model in a distributed way. 
The only thing needed is to use the new parameter:\n\n* `--type distributed`: use the distributed config (see `src/chess_zero/configs/distributed.py`)\n\nSo, in order to contribute to the distributed team, you just need to run the three workers locally like this:\n\n```bash\npython src/chess_zero/run.py self --type distributed\n# or, to run the supervised learning worker instead of self-play:\n# python src/chess_zero/run.py sl --type distributed\npython src/chess_zero/run.py opt --type distributed\npython src/chess_zero/run.py eval --type distributed\n```\n\n### GUI\n* `uci` launches the Universal Chess Interface, for use in a GUI.\n\nTo set up ChessZero with a GUI, point the GUI to `C0uci.bat` (or rename it to .sh).\nFor example, this is a screenshot of the random model using Arena's self-play feature:\n![capture](https://user-images.githubusercontent.com/4205182/34057277-e9c99118-e19b-11e7-91ee-dd717f7efe9d.PNG)\n\nData\n-----\n\n* `data/model/model_best_*`: BestModel.\n* `data/model/next_generation/*`: next-generation models.\n* `data/play_data/play_*.json`: generated training data.\n* `logs/main.log`: log file.\n\nIf you want to train the model from the beginning, delete the above directories.\n\nHow to use\n==========\n\nSetup\n-------\n### install libraries\n```bash\npip install -r requirements.txt\n```\n\nIf you want to use a GPU, follow [these instructions](https://www.tensorflow.org/install/) to install with pip3.\n\nMake sure Keras is using TensorFlow and you have Python 3.6.3+. Depending on your environment, you may have to run python3/pip3 instead of python/pip.\n\n\nBasic Usage\n------------\n\nTo train a model, run `Self-Play`, `Trainer` and `Evaluator`.\n\n**Note**: Make sure you are running the scripts from the top-level directory of this repo, i.e. 
`python src/chess_zero/run.py opt`, not `python run.py opt`.\n\n\nSelf-Play\n--------\n\n```bash\npython src/chess_zero/run.py self\n```\n\nWhen executed, Self-Play will start using BestModel.\nIf BestModel does not exist, a new random model will be created and will become BestModel.\n\n### options\n* `--new`: create a new BestModel\n* `--type mini`: use the mini config for testing (see `src/chess_zero/configs/mini.py`)\n\nTrainer\n-------\n\n```bash\npython src/chess_zero/run.py opt\n```\n\nWhen executed, Training will start.\nA base model will be loaded from the latest saved next-generation model. If none exists, BestModel is used.\nThe trained model will be saved every epoch.\n\n### options\n* `--type mini`: use the mini config for testing (see `src/chess_zero/configs/mini.py`)\n* `--total-step`: specify the total number of steps (mini-batches). The total step count affects the learning rate of training.\n\nEvaluator\n---------\n\n```bash\npython src/chess_zero/run.py eval\n```\n\nWhen executed, Evaluation will start.\nIt evaluates BestModel and the latest next-generation model by playing about 200 games.\nIf the next-generation model wins, it becomes BestModel.\n\n### options\n* `--type mini`: use the mini config for testing (see `src/chess_zero/configs/mini.py`)\n\n\nTips and Memory\n====\n\nGPU Memory\n----------\n\nUsually a lack of memory causes warnings, not errors.\nIf an error happens, try changing `vram_frac` in `src/chess_zero/configs/mini.py`:\n\n```python\nself.vram_frac = 1.0\n```\n\nA smaller batch_size will reduce the memory usage of `opt`.\nTry changing `TrainerConfig#batch_size` in `MiniConfig`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzeta36%2Fchess-alpha-zero","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzeta36%2Fchess-alpha-zero","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzeta36%2Fchess-alpha-zero/lists"}