{"id":14958728,"url":"https://github.com/flowun/gardnerchessai","last_synced_at":"2026-02-12T10:05:59.329Z","repository":{"id":229387059,"uuid":"776407938","full_name":"flowun/gardnerChessAi","owner":"flowun","description":"Implementation of the Double Deep Q-Learning algorithm with a prioritized experience replay memory to train an agent to play the minichess variante Gardner Chess","archived":false,"fork":false,"pushed_at":"2024-04-02T10:20:34.000Z","size":3738,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-07T16:19:46.754Z","etag":null,"topics":["ai","ai-projects","artificial-inteligence","chess","chess-ai","ddqn","deep-q-learning","double-deep-q-learning","double-dqn","minichess","prioritized-experience-replay","q-value","reinforcement-learning","tensorflow","tensorflow-tutorials","tutorials"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/flowun.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-23T12:28:00.000Z","updated_at":"2024-10-16T20:26:25.000Z","dependencies_parsed_at":"2024-04-02T11:52:07.966Z","dependency_job_id":null,"html_url":"https://github.com/flowun/gardnerChessAi","commit_stats":{"total_commits":8,"total_committers":2,"mean_commits":4.0,"dds":0.125,"last_synced_commit":"97eceef5b855c54fa5d5a93405bd614133a4cb49"},"previous_names":["flowun/gardnerchessai"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/flowun/gardnerChessAi","repository
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flowun%2FgardnerChessAi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flowun%2FgardnerChessAi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flowun%2FgardnerChessAi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flowun%2FgardnerChessAi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/flowun","download_url":"https://codeload.github.com/flowun/gardnerChessAi/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flowun%2FgardnerChessAi/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268361780,"owners_count":24238530,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-02T02:00:12.353Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-projects","artificial-inteligence","chess","chess-ai","ddqn","deep-q-learning","double-deep-q-learning","double-dqn","minichess","prioritized-experience-replay","q-value","reinforcement-learning","tensorflow","tensorflow-tutorials","tutorials"],"created_at":"2024-09-24T13:18:10.066Z","updated_at":"2026-02-12T10:05:59.290Z","avatar_url":"https://github.com/flowun.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GardnerChessAi with 
Double Deep Q-Learning\r\n\r\n\u003cimg src=\"rsc/startingPosition.png\" width=\"50%\" height=\"50%\"\u003e\r\n\r\n# Table of Contents\r\n1. [Game Rules](#Game-Rules)\r\n2. [Description](#Description)\r\n    1. [How does the AI evaluate positions?](#How-does-the-AI-evaluate-positions)\r\n    2. [Training Process](#Training-Process)\r\n    3. [Evaluation](#Evaluation)\r\n    4. [Example Game Against Minimax](#Example-Game-Against-Minimax)\r\n3. [Personal Experience](#Personal-Experience)\r\n    1. [Motivation](#Motivation)\r\n    2. [Problems](#Problems)\r\n    3. [Solutions](#Solutions)\r\n4. [Installation and Usage](#Installation-and-Usage)\r\n    1. [How to install](#How-to-install)\r\n    2. [Packages with Versions](#Packages-with-Versions)\r\n    3. [How to use](#How-to-use)\r\n\r\n----\r\n## Game Rules\r\nSame as in normal chess, except:\r\n- no castling\r\n- no en passant\r\n- pawn promotion to queen only\r\n- king needs to be captured to win\r\n- draw after 20 moves without a pawn move or a capture\r\n\r\n----\r\n## Description\r\nThis is an implementation of the [**Double Deep Q-Learning**](https://arxiv.org/pdf/1509.06461.pdf) algorithm with [**prioritized experience replay memory**](https://arxiv.org/pdf/1511.05952.pdf) to train\r\nan agent to play the [minichess variant Gardner chess](https://en.wikipedia.org/wiki/Minichess).\r\n\r\n### How does the AI evaluate positions?\r\nThe board can be fed into a neural network in a one-hot-encoded format. The neural network is then used to predict the\r\nvalue of the position (Q-value). The Q-value represents the expected reward of the agent.\r\n\r\nUsing this value, the actions can be selected as follows:\r\n![](rsc/actionSelection.png)\r\n\r\n### Training Process\r\n\r\n![](rsc/training.png)\r\n\r\n### Evaluation\r\nEvery few epochs, the agent plays against some preprogrammed opponents (random, minimax, etc.) to evaluate its\r\nperformance. 
The results are automatically plotted in a matplotlib graph and saved as a pdf under saves/modelName/gardnerChessAi_training_graph.\r\n\r\nHere is the **training graph** of the pretrained model:\r\n\r\n![](rsc/evaluation.png)\r\n\r\nIn the training evaluation, a temperature of 0.1 was used. The strength of the AI, if it always plays the\r\nbest move (with temperature 0), is of course higher. It can also be further increased by using a minimax search on top \r\nof the neural network evaluation (e.g. 'ai+minimax2' for a search of depth 2 on top of the neural net evaluation).\r\n\r\nThese three versions of the best model were manually pitted against minimax with the searching depths 2, 3 and 4. \r\nThe win percentages of the AI are as follows (draw counts as half a win):\r\n \r\n| Opponent  | AI with temperature 0.1 | AI with temperature 0 | AI+Minimax2 |\r\n| --------- | ----------------------- | --------------------- | ----------- |\r\n| Minimax 2 | 52%                     | 64%                   | 87%         |\r\n| Minimax 3 | 37%                     | 46%                   | 59%         |\r\n| Minimax 4 | 12%                     | 16%                   | 41%         |\r\n\r\nAs one can see, the AI is almost as good as minimax with a searching depth of 3, when it always plays the best move (temperature 0) without further search.\r\n\r\nWhen using a minimax search of depth 2 on top of the neural network evaluation, the strength of the AI increases to a\r\npoint which is exactly between minimax with the searching depths 3 and 4.\r\n\r\n### Example Game Against Minimax\r\nHere is an example game of the pretrained model (white) playing against minimax with a searching depth of 3 (black).\r\n\u003cimg src=\"rsc/example_game.gif\" width=\"50%\" height=\"50%\"\u003e\r\n\r\n----\r\n## Personal Experience\r\n\r\n### Motivation \u0026 Goal\r\nThis project was a hobby during my last year of school in 2022/2023. 
My goal was to train an AI through self-play that \r\nbeats my family members in chess. Looking back, it was a lot of fun and I learned a lot about Q-Learning, including ways \r\nto improve the vanilla Q-Learning algorithm and the choice of hyperparameters.\r\n\r\nNow, about a year later, I migrated to the newest TensorFlow version, retrained a model with a GPU (before, I had only \r\nused a CPU) and made this project public.\r\n\r\n### Problems\r\n- instability\r\n- exploding Q-Values\r\n- slow training\r\n- my time-consuming progress monitoring addiction\r\n\r\n### Solutions\r\n- instability\r\n    - a long enough exploration phase helped\r\n    - checkpoints\r\n    - an **experience replay buffer** is a must-have; I also implemented a **prioritized experience replay buffer** but it \r\n      slowed down the training (for me, it wasn't worth it)\r\n    - keeping Q-Values small\r\n- exploding Q-Values\r\n    - **Double Deep Q-Learning** instead of vanilla Q-Learning\r\n    - continuously updating the target model by a very small percentage instead of copying the weights every few epochs\r\n    - don't set the discount factor unnecessarily high\r\n    - patience: if the Q-Values don't explode too much, they often stabilise at some point\r\n- slow training\r\n    - exponentially decaying learning rate\r\n    - **GPU** training instead of CPU-only training\r\n    - time different parts of the training process and optimize the most time-consuming parts. 
For me, these were:\r\n      - directly calling model() instead of model.predict() to get the Q-Values, which greatly sped up training and\r\n        inference (in the get_q_values() method in neural_network.py)\r\n      - minimizing model() calls by batching inputs in the fit_on_memory() method in training.py\r\n    - with these optimizations, I was able to decrease the epoch time from 2 minutes to 7 seconds while at the same time\r\n      increasing the batch size from 32 to 128 and increasing the fitting frequency from 8 to 16\r\n- my time-consuming progress monitoring addiction\r\n    - partially solved by a fully automated training and evaluation process which includes saving, remembering and reloading training settings,\r\n      making checkpoints, pitting the agent against different opponents, and updating the training graph\r\n----\r\n## Installation and Usage\r\n\r\n### How to install\r\n- clone the repository\r\n- install the dependencies (if you have conda and want to use a GPU (only possible on WSL2/Linux), you can use the gardnerChessAi.yml file with the terminal command \r\n`conda env create -f gardnerChessAi.yml` to create a conda environment with all the dependencies)\r\n- if an error arises during the loading of the pretrained model, it can be resolved by manually downloading and replacing the saves\\pretrained\\gardnerChessAi_model_main_checkpoint\\keras_metadata.pb file. 
This issue is due to a known Git bug and is beyond my control.\r\n\r\n### Packages with Versions\r\n- python=3.11.5\r\n- tensorflow=2.15.0\r\n- numpy=1.26.2\r\n- matplotlib=3.8.3\r\n- pygame=2.5.2\r\n\r\n### How to use\r\n- run training.py to train a model (you can train your own model or continue training the pretrained model)\r\n- training evaluation can be followed in matplotlib plots under saves/modelName/gardnerChessAi_training_graph\r\n- run play.py to play against a model or watch two models play against each other\r\n- run spectate.py to see how the agent improved its play style over the epochs\r\n- the scripts contain more detailed explanations and options to choose from\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflowun%2Fgardnerchessai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fflowun%2Fgardnerchessai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflowun%2Fgardnerchessai/lists"}