{"id":16166043,"url":"https://github.com/zjeffer/chess-deep-rl","last_synced_at":"2025-09-04T07:31:35.889Z","repository":{"id":111065632,"uuid":"421102249","full_name":"zjeffer/chess-deep-rl","owner":"zjeffer","description":"Research project: create a chess engine using Deep Reinforcement Learning","archived":false,"fork":false,"pushed_at":"2024-06-29T11:07:01.000Z","size":10851,"stargazers_count":135,"open_issues_count":0,"forks_count":12,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-05T04:51:14.647Z","etag":null,"topics":["ai","alphazero","artificial-intelligence","chess","chess-engine","deep-learning","deep-reinforcement-learning","machine-learning","mcts","neural-network","neural-networks","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjeffer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-25T16:30:27.000Z","updated_at":"2025-04-01T08:58:19.000Z","dependencies_parsed_at":"2024-06-29T12:24:53.041Z","dependency_job_id":"2a878e23-5871-4cdf-8757-d6a0e954010f","html_url":"https://github.com/zjeffer/chess-deep-rl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zjeffer/chess-deep-rl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjeffer%2Fchess-deep-rl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjeffer%2Fchess-deep-rl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/zjeffer%2Fchess-deep-rl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjeffer%2Fchess-deep-rl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zjeffer","download_url":"https://codeload.github.com/zjeffer/chess-deep-rl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjeffer%2Fchess-deep-rl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273573299,"owners_count":25129877,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-04T02:00:08.968Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","alphazero","artificial-intelligence","chess","chess-engine","deep-learning","deep-reinforcement-learning","machine-learning","mcts","neural-network","neural-networks","reinforcement-learning"],"created_at":"2024-10-10T02:53:11.858Z","updated_at":"2025-09-04T07:31:34.435Z","avatar_url":"https://github.com/zjeffer.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Chess engine with Deep Reinforcement learning\n\n***I'm currently rewriting the whole thing in C++, you can check it out [here](https://github.com/zjeffer/chess-deep-rl-cpp).***\n\n***You can read my bachelor thesis about this project 
[here](https://github.com/zjeffer/howest-thesis).***\n\n## Installation and user manual\n\nThe installation manual and the user manual can both be found under `./documents/`\n\nTo download my pretrained model, use this link: https://www.mediafire.com/file/75mzcj2aqcs6g6z/model.h5/file\n\nPut the model.h5 file in the models/ folder.\n\n\u003e [!WARNING]  \n\u003e Keep in mind that this model has not been trained well, due to a lack of compute resources. It's probably better to train your own model, but that requires *a lot* of compute power.\nI'm only posting it here for people to try out a model that has gone through a few training pipelines.\n\n## How do normal chess engines work?\n\nNormal chess engines work with the minimax algorithm: the engine tries to find the best move by creating a tree of all possible moves to a certain depth, and cutting off paths that lead to bad positions (alpha-beta pruning). It evaluates a position based on which pieces are on the board.\n\n![Alpha-Beta pruning in Minimax](code/img/AB_pruning.png)\n\n\u003e Image source: By Jez9999, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=3708424\n\n## How does my chess engine work?\n\nThis chess engine is based on AlphaZero by DeepMind. It uses a neural network\nto predict the next best move. The neural network learns by playing against\nitself for a large number of games and using the results to train the network.\nThe newly trained neural network is evaluated against the old network by playing\nmany games against each other, and the best network is kept. This process is repeated\nfor a long time.\n\n![Playing one move](code/img/ChessRL-schematic.png \"Playing one move\")\n\n### The neural network\n\n* Input layer: 19 8x8 boards of booleans\n\n\u003cimg src=\"code/tests/input_planes/full.png\" alt=\"Input example\" width=\"80%\"/\u003e\n\n* 20 hidden layers:\n\t* Convolutional hidden layer\n\t* 19 residual blocks with skip-connections\n* 2 outputs:\n\t1. 
The probabilities of each move (73 boards of 8x8 floats)\n\t2. The value of the given board (scalar)\n\n\u003cimg src=\"code/tests/output_planes/unfiltered.png\" alt=\"Output example\" width=\"100%\"/\u003e\n\n=\u003e 30+ million parameters\n\nA visual representation of the model can be found in `./models/model.png`\n\nFor every move, run a large number of MCTS simulations. AlphaZero uses a custom version of MCTS.\n\n### Normal Monte Carlo Tree Search:\n\nhttps://en.wikipedia.org/wiki/Monte_Carlo_tree_search\n\n1. **Selection:** Traverse the tree **randomly** until a leaf node is reached.\n2. **Expansion:** expand the leaf node by creating a child for every possible action\n3. **Simulation:** 'rollout' the game by randomly choosing moves until the end of the game.\n4. **Backpropagation:** backpropagate the result of the rollout to the root node.\n\nIn chess, normal MCTS would be incredibly inefficient, because the number of actions\navailable in every position is too high (step 1), and games can become very long\nwhen moves are chosen randomly (step 3).\n\n![Monte Carlo Tree Search](code/img/MCTS-wikipedia.png \"Monte Carlo Tree Search\")\n\n\u003e Image source: By Rmoss92 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=88889583\n\n### AlphaZero's MCTS\n\nAlphaZero uses a different kind of MCTS: \n\n* step 1 (Selection) is not random, but based on neural network predictions and an upper confidence bound\n* step 3 (Simulation) is replaced by the value prediction received from the neural network (Evaluation)\n\n![MCTS steps for 1 simulation](code/img/MCTS-alphazero.png \"MCTS steps for 1 simulation\")\n\n\u003e Image source: https://sebastianbodenstein.net/post/alphazero/\n\n**To run one MCTS simulation:**\n\n1. 
To traverse the tree, keep selecting the edge with the maximum Q+U value\n\t* Q = mean value of the state over all simulations\n\t* U = upper confidence bound\n\t* Do this until a leaf node is reached (= a node which has not been visited/expanded yet)\n2. Expand the leaf node by adding a new edge for every possible action in the state\n\t* Input the leaf node into the neural network\n\t* The output:\n\t\t1) The move probabilities \n\t\t2) The value of the state\n\t* Initialize the new edge's variables with these values:\n\t\t* `N = 0`\n\t\t* `W = 0` \n\t\t* `Q = 0`\n\t\t* `P = p_a` (prior probability for that action)\n\t* Add nodes (new states) for each action to the tree\n3. Backpropagation\n\t* From the leaf node, backpropagate to the root node\n\t* For every edge in the path, update the edge's variables\n\t\t* `N = N + 1`\n\t\t* `W = W + v`, where v is the value of the leaf node predicted by the NN in step 2.\n\t\t* `Q = W / N`\n\n### After these simulations, the move can be chosen:\n\n* The move with the greatest visit count $N$ (deterministically)\n* Sampled from a distribution based on the visit counts (stochastically): $\\pi \\sim N$\n\n![Choose move from tree](code/img/MCTS-choose-move.png \"Choose move from tree\")\n\n\n### Creating a training set\n\n* To train the network, you need a lot of data\n* You create this data through self-play: letting the AI play against a copy of itself for many games.\n* For every move, store:\n\t* The state\n\t* The search probabilities\n\t* The winner (added once the game is over)\n\n### Training the network\n\n* Sample a mini-batch from a large number of positions (see training set)\n* Train the network on the mini-batch\n\n![Creating a training set](code/img/training.png \"Creating a training set\")\n\n\u003e Trophy icon by Freepik https://www.flaticon.com/authors/freepik\n\n\n| First training session | Second training session |\n|:-:| :-: |\n|![First training session](code/plots/first-training.png) | ![Second training session](code/plots/second-training-0.002.png) |\n\nThe 
first training session went pretty well, but the second didn't seem to train much at all. \nI believe I would need to generate a lot more data through self-play to properly train the model.\n\n### Multi-processing improvements\n\nIt is necessary to create a huge training set of positions by making the current best AI play against itself. \nThe problem was that playing multiple games in parallel was not possible, because every agent needs access to the network:\n\n![Self-play without multiprocessing](code/img/without-multiprocessing.png \"Self-play without multiprocessing\")\n\nTo fix this, I created a server-client architecture with Python sockets: the server has access to the neural network, \nand the clients send positions to the server. The server runs them through the network and sends the predictions back to the correct client. This is much more scalable and can be dockerized.\n\n![Self-play with multiprocessing](code/img/with-multiprocessing.png \"Self-play with multiprocessing\")\n\nWith a good system as a server (Ryzen 7 5800H + RTX 3070 Mobile), multiple clients (including clients on the system itself) can be connected to the server. \n\nThe result: much faster self-play. 
The other clients' GPUs do not get used, meaning any system with a good processor can run multiple self-play games in parallel when connected to a server.\n\n|System|No multiprocessing|Multiprocessing (16 processes)|\n|:-|:-------------------:|:-------------------:|\n|R7 5800H + RTX 3070|50 sims/sec|30 sims/sec per process|\n|i7 7700HQ + GTX 1050|20 sims/sec|15 sims/sec per process|\n\nI dockerized this server-client system so it can be deployed on a cluster.\nYou can find the configuration in code/docker-compose.yml, and the Dockerfiles in code/Dockerfile{client,server}.\nThe docker images are also pushed to `ghcr.io`: \n\n* The server: https://ghcr.io/zjeffer/chess-rl_prediction-server:latest\n\t* There is also a special server image if you're using an older Nvidia driver version (470 with CUDA 11.4): \n\t* https://ghcr.io/zjeffer/chess-rl_prediction-server:cuda-11.4\n* The client: https://ghcr.io/zjeffer/chess-rl_selfplay-client:latest\n\n### Evaluate the network\n\nTo know whether the new network is better than the previous one, let the new network play against the previous best for a large number of games. Whichever network wins the most games becomes the new best network.\n\nUse that network to self-play again. Repeat indefinitely.\n\nI tried this with the newest network against a completely random neural network. These are the results after 10 games:\n\n```\nEvaluated these models: Model 1 = models/randommodel.h5, Model 2 = models/model.h5\nThe results:\nModel 1: 0\nModel 2: 5\nDraws: 5\n```\n\n\n# Sources\n\n### Wikipedia articles \u0026 Library documentation\n\n* [1]“Deep reinforcement learning,” Wikipedia. Jan. 29, 2022. Accessed: Feb. 01, 2022. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Deep_reinforcement_learning\u0026oldid=1068657803\n\n* [2]“Reinforcement learning,” Wikipedia. Jan. 15, 2022. Accessed: Feb. 01, 2022. [Online]. 
Available: https://en.wikipedia.org/w/index.php?title=Reinforcement_learning\u0026oldid=1065862559\n\n* [3]“AlphaZero,” Wikipedia. Jan. 15, 2022. Accessed: Feb. 01, 2022. [Online]. Available: https://en.wikipedia.org/w/index.php?title=AlphaZero\u0026oldid=1065791194\n\n* [4]“AlphaGo,” Wikipedia. Jan. 25, 2022. Accessed: Feb. 01, 2022. [Online]. Available: https://en.wikipedia.org/w/index.php?title=AlphaGo\u0026oldid=1067772956\n\n* [5]“AlphaGo Zero,” Wikipedia. Oct. 14, 2021. Accessed: Feb. 01, 2022. [Online]. Available: https://en.wikipedia.org/w/index.php?title=AlphaGo_Zero\u0026oldid=1049954309\n\n* [6]“Monte Carlo tree search,” Wikipedia. Jan. 23, 2022. Accessed: Feb. 01, 2022. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Monte_Carlo_tree_search\u0026oldid=1067396622\n\n* [7]“Minimax,” Wikipedia. Jan. 18, 2022. Accessed: Feb. 01, 2022. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Minimax\u0026oldid=1066446492\n\n* [8]“Alpha–beta pruning,” Wikipedia. Jan. 30, 2022. Accessed: Feb. 01, 2022. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Alpha%E2%80%93beta_pruning\u0026oldid=1068746141\n\n* [9]“python-chess: a chess library for Python — python-chess 1.8.0 documentation.” https://python-chess.readthedocs.io/en/latest/ (accessed Feb. 01, 2022).\n\n* [10]“Technical Explanation of Leela Chess Zero · LeelaChessZero/lc0 Wiki,” GitHub. https://github.com/LeelaChessZero/lc0 (accessed Feb. 01, 2022).\n\n\n### AlphaZero \u0026 AlphaGo Zero specific articles \u0026 papers\n\n* [11]D. Silver et al., “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” arXiv:1712.01815 [cs], Dec. 2017, Accessed: Feb. 01, 2022. [Online]. Available: http://arxiv.org/abs/1712.01815\n\n* [12]“A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.” https://www.science.org/doi/10.1126/science.aar6404 (accessed Feb. 
01, 2022).\n\n* [13]“engines - Understanding AlphaZero,” Chess Stack Exchange. https://chess.stackexchange.com/questions/19353/understanding-alphazero (accessed Feb. 01, 2022).\n\n* [14]“How does AlphaZero learn to evaluate a position it has never seen?,” Chess Stack Exchange. https://chess.stackexchange.com/questions/19401/how-does-alphazero-learn-to-evaluate-a-position-it-has-never-seen (accessed Feb. 01, 2022).\n\n* [15]“Figure 2: MCTS in AlphaGo Zero. | Nature”, Accessed: Feb. 01, 2022. [Online]. Available: https://www.nature.com/articles/nature24270/figures/2\n\n* [16]J. Varty, “Alpha Zero And Monte Carlo Tree Search.” https://joshvarty.github.io/AlphaZero/ (accessed Feb. 01, 2022).\n\n* [17]J. Varty, AlphaZeroSimple. 2022. Accessed: Feb. 01, 2022. [Online]. Available: https://github.com/JoshVarty/AlphaZeroSimple\n\n* [18]“Was AlphaZero taught castling?,” Chess Stack Exchange. https://chess.stackexchange.com/questions/37468/was-alphazero-taught-castling (accessed Feb. 01, 2022).\n\n* [19]T. M. Blog, “A Single-Player Alpha Zero Implementation in 250 Lines of Python.” https://tmoer.github.io/AlphaZero/ (accessed Feb. 01, 2022).\n\n* [20]“AlphaZero |.” https://sebastianbodenstein.net/post/alphazero/ (accessed Feb. 01, 2022).\n\n### Diagrams\n\n* [21]“AlphaGo Zero Explained In One Diagram | by David Foster | Applied Data Science | Medium.” https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0 (accessed Feb. 01, 2022).\n\n### Tutorials\n\n* [22]“AlphaZero, a novel Reinforcement Learning Algorithm, in JavaScript | by Carlos Aguayo | Towards Data Science.” https://towardsdatascience.com/alphazero-a-novel-reinforcement-learning-algorithm-deployed-in-javascript-56018503ad18 (accessed Feb. 01, 2022).\n\n* [23]D. Foster, “How to build your own AlphaZero AI using Python and Keras,” Applied Data Science, Dec. 02, 2019. 
https://medium.com/applied-data-science/how-to-build-your-own-alphazero-ai-using-python-and-keras-7f664945c188 (accessed Feb. 01, 2022).\n\n* [24]D. Foster, “How To Build Your Own MuZero AI Using Python (Part 1/3),” Applied Data Science, Feb. 23, 2021. https://medium.com/applied-data-science/how-to-build-your-own-muzero-in-python-f77d5718061a (accessed Feb. 01, 2022).\n\n* [25]“Simple Alpha Zero.” https://web.stanford.edu/~surag/posts/alphazero.html (accessed Feb. 01, 2022).\n\n* [26]D. Straus, “AlphaZero implementation and tutorial,” Medium, Jan. 27, 2020. https://towardsdatascience.com/alphazero-implementation-and-tutorial-f4324d65fdfc (accessed Feb. 01, 2022).\n\t* Updated article: [27]“How I trained a self-supervised neural network to beat GnuGo on small (7x7) boards | by Darin Straus | Analytics Vidhya | Medium.” https://medium.com/analytics-vidhya/how-i-trained-a-self-supervised-neural-network-to-beat-gnugo-on-small-7x7-boards-6b5b418895b7 (accessed Feb. 01, 2022).\n\t* [28]cody2007, alpha_go_zero_implementation. 2021. Accessed: Feb. 01, 2022. [Online]. Available: https://github.com/cody2007/alpha_go_zero_implementation\n\n\n\n## Interesting videos\n\n* [29]Lex Fridman, David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | Lex Fridman Podcast #86, (Apr. 03, 2020). Accessed: Feb. 01, 2022. [Online]. Available: https://www.youtube.com/watch?v=uPUEq8d73JI\n\n* [30]DeepMind, RL Course by David Silver - Lecture 1: Introduction to Reinforcement Learning, (May 13, 2015). Accessed: Feb. 01, 2022. [Online]. Available: https://www.youtube.com/watch?v=2pWv7GOvuf0\n\n* [31]Aske Plaat, Keynote David Silver NIPS 2017 Deep Reinforcement Learning Symposium AlphaZero, (Dec. 10, 2017). Accessed: Feb. 01, 2022. [Online]. 
Available: https://www.youtube.com/watch?v=A3ekFcZ3KNw\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjeffer%2Fchess-deep-rl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjeffer%2Fchess-deep-rl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjeffer%2Fchess-deep-rl/lists"}