{"id":13994513,"url":"https://github.com/mokemokechicken/reversi-alpha-zero","last_synced_at":"2025-08-13T13:20:29.399Z","repository":{"id":26283703,"uuid":"107836790","full_name":"mokemokechicken/reversi-alpha-zero","owner":"mokemokechicken","description":"Reversi reinforcement learning by AlphaGo Zero methods.","archived":false,"fork":false,"pushed_at":"2022-12-07T23:43:54.000Z","size":1280,"stargazers_count":678,"open_issues_count":33,"forks_count":169,"subscribers_count":51,"default_branch":"master","last_synced_at":"2024-08-10T14:16:24.916Z","etag":null,"topics":["alphago-zero","deeplearning","keras","machine-learning","reinforcement-learning","reversi","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mokemokechicken.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-22T04:16:43.000Z","updated_at":"2024-08-09T09:45:44.000Z","dependencies_parsed_at":"2022-07-31T23:28:01.609Z","dependency_job_id":null,"html_url":"https://github.com/mokemokechicken/reversi-alpha-zero","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mokemokechicken%2Freversi-alpha-zero","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mokemokechicken%2Freversi-alpha-zero/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mokemokechicken%2Freversi-alpha-zero/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mokemokechicken%2Freversi-alpha-zero/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mokemokechicken","download_url":"https://codeload.github.com/mokemokechicken/reversi-alpha-zero/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227166769,"owners_count":17740978,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alphago-zero","deeplearning","keras","machine-learning","reinforcement-learning","reversi","tensorflow"],"created_at":"2024-08-09T14:02:54.858Z","updated_at":"2024-11-29T16:31:29.439Z","avatar_url":"https://github.com/mokemokechicken.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"About\n=====\n\nReversi reinforcement learning by [AlphaGo Zero](https://deepmind.com/blog/alphago-zero-learning-scratch/) methods.\n\n@mokemokechicken's training history is recorded in [Challenge History](challenge_history.md).\n\nIf you can share your results, I would be grateful if you posted them to [Performance Reports](https://github.com/mokemokechicken/reversi-alpha-zero/issues/40).\n\nEnvironment\n-----------\n\n* Python 3.6.3\n* tensorflow-gpu: 1.3.0 (+)\n  * tensorflow==1.3.0 is also OK, but very slow. 
For `play_gui`, tensorflow (CPU) is fast enough.\n* Keras: 2.0.8 (+)\n\nModules\n-------\n\n### Reinforcement Learning\n\nThis AlphaGo Zero implementation consists of three workers: `self`, `opt` and `eval`.\n\n* `self` is Self-Play, which generates training data by self-play using BestModel.\n* `opt` is Trainer, which trains the model and generates next-generation models.\n* `eval` is Evaluator, which evaluates whether the next-generation model is better than BestModel. If it is better, it replaces BestModel.\n  * If `config.play.use_newest_next_generation_model = True`, this worker is not needed. (This is the AlphaZero method.)\n\n### Evaluation\n\nFor evaluation, you can play reversi against the BestModel.  \n\n* `play_gui` is Play Game vs BestModel using wxPython.\n\nData\n-----\n\n* `data/model/model_best_*`: BestModel.\n* `data/model/next_generation/*`: next-generation models.\n* `data/play_data/play_*.json`: generated training data.\n* `logs/main.log`: log file.\n\nIf you want to train the model from the beginning, delete the files and directories above.\n\nHow to use\n==========\n\nSetup\n-------\n### install libraries\n```bash\npip install -r requirements.txt\n```\n\n### install libraries with Anaconda\n```bash\ncp requirements.txt conda-requirements.txt\n```\n* Comment out the lines for the `jedi`, `Keras`, `parso`, `python-dotenv`, `tensorflow-tensorboard`, `wxPython` libraries\n* Replace '-' with '_' for the `ipython-genutils`, `jupyter-*`, `prompt-toolkit` libraries\n```bash\nconda env create -f environment.yml\nsource activate reversi-a0\nconda install --yes --file conda-requirements.txt\n```\n\nIf you want to use a GPU,\n\n```bash\npip install tensorflow-gpu\n```\n\n### set environment variables\nCreate a `.env` file and write this.\n\n```text:.env\nKERAS_BACKEND=tensorflow\n```\n\nWindows Setup \n-------------\nThese instructions were written by @GCRhoads. Thanks!\n\n### Required: 64-bit Windows\n\n##### Procedure verified for Windows 8.1.  
Not yet tested on other versions.\n\nNote: Windows uses backslashes, not forward slashes, in path names.\n\n1. Change the first line (if necessary) of \"src\\reversi_zero\\agent\\player.py\" to\n**from asyncio.futures import Future**\n\n2. **Install the 64-bit version of Python 3.5** (the 32-bit version is not sufficient).  You have two options:\n   - [**Direct download page**](https://www.python.org/downloads/release/python-352/)\n   - **Anaconda with Python 3.5 (Recommended)** [instructions](https://docs.anaconda.com/anaconda/faq#how-do-i-get-anaconda-with-python-3-5)\n\n**Note:** For some strange reason, both Python 3.5 and Anaconda get installed in a hidden folder.  To access them, you first have to go to the Control Panel, select Folder Options, and on the View tab, click on the circle next to \"Show hidden files, folders, or drives\" in the Advanced settings section.  Anaconda gets installed in C:\\ProgramData\\Anaconda3\\.  The direct download option installs Python in (I believe) C:\\Users\\\\\u003cyour user name\u003e\\AppData\\Local\\Program\\Python\\.\n\n3. **Install Visual C++ 2015 build tools.** You could install the entire 2015 version (not the 2017 version that Microsoft tries to force on you) of Visual Studio, but this is a large download and install, most of which you don't need. [Download Visual C++ build tools](http://go.microsoft.com/fwlink/?LinkId=691126).  Double-click on the downloaded file to run the installer.\n\n4. **Rewrite all uses of f-strings**.  The Python source code for this project uses numerous f-strings, a feature introduced in Python 3.6.  Since we need Python 3.5 (required by the Windows version of tensorflow), use your editor's search feature to find every occurrence of an f-string and rewrite it using `str.format()`.\n\n5. 
**Install the libraries**  From either the Anaconda prompt or a command window in the top-level folder where you put this distribution, enter the following.\n\n```bash\npip install -r requirements.txt\n```\n\n6. **Install tensorflow**\n\nIf you have a GPU compatible with tensorflow (see the list on the tensorflow web site), then your code will execute much faster if you install the GPU version.  To install the GPU version, enter the following in either the Anaconda prompt or the command window.\n\n```bash\npip3 install --upgrade tensorflow-gpu\n```\n\nIf you do not have a compatible GPU, then you will have to settle for the slow CPU-only version.  To install this, enter the following in either the Anaconda prompt or the command window.\n\n```bash\npip3 install --upgrade tensorflow\n```\n7. **Set environment variables**  Create a `.env` file and write the following line in this file.\n\n```text:.env\nKERAS_BACKEND=tensorflow\n```\n\nNow you should be good to go.\n\n\nStrongest Model\n---------------\nCurrently, the \"challenge 5 model\" with the \"ch5 config\" is the strongest of my models.\nIf you want to play against it,\n\n```bash\nrm -rf data/model/next_generation/\nsh ./download_model.sh 5\n# run as wxPython GUI\npython src/reversi_zero/run.py play_gui -c config/ch5.yml\n```\n\nIf you want to use it as an NBoard engine (see \"Run as NBoard2.0 Engine\" below), please use `nboard_engine -c config/ch5.yml` for the Command. 
\n\nPast Models\n------------\nPlease remove (or rename) the `data/model/next_generation/` directory if you want to use the \"BestModel\" at `data/model/model_best_*`.\n\n### Download Trained BestModel\n\nFor example, download the trained BestModel (trained in Challenge 1).\n\n```bash\nsh ./download_best_model.sh\n```\n\n### Download the Newest Trained Model\n\nDownload the newest trained model (trained in Challenge 2, 3, 4 or 5) as BestModel.\n\n```bash\nsh ./download_model.sh \u003cversion\u003e\n```\n\nExample:\n\n```bash\nsh ./download_model.sh 5\n```\n\nConfiguration\n--------------\n\n### 'AlphaGo Zero' method and 'AlphaZero' method\n\nI think the main difference between 'AlphaGo Zero' and 'AlphaZero' is whether `eval` is used or not.\nYou can switch between these methods by configuration.\n\n#### AlphaGo Zero method\n\n* `PlayConfig#use_newest_next_generation_model = False`\n* `PlayWithHumanConfig#use_newest_next_generation_model = False`\n* Execute `Evaluator` to select the best model.\n\n#### AlphaZero method\n\n* `PlayConfig#use_newest_next_generation_model = True`\n* `PlayWithHumanConfig#use_newest_next_generation_model = True`\n* Do not use `Evaluator` (the newest model is selected as `self-play`'s model)\n\n### policy distribution of self-play\n\nIn DeepMind's paper,\nit seems that the policy (π) data saved by self-play is a distribution proportional to pow(N, 1/tau).\nAfter the middle of the game, tau becomes 0, so the distribution is one-hot.\n\n`PlayDataConfig#save_policy_of_tau_1 = True` means that the saved policy's tau is always 1.\n\n## other important hyper-parameters (I think)\n\nIf you find a good parameter set, please share it in the GitHub issues!\n\n### PlayDataConfig\n\n* `nb_game_in_file,max_file_num`: The maximum number of games in the training data is `nb_game_in_file * max_file_num`.\n* `multi_process_num`: Number of processes used to generate self-play data.\n\n### PlayConfig, PlayWithHumanConfig\n\n* `simulation_num_per_move` : number of MCTS simulations per move.\n* `c_puct`: balance 
parameter of value network and policy network in MCTS.\n* `resign_threshold`: resign threshold\n* `parallel_search_num`: balance parameter(?) of speed and accuracy in MCTS.\n  * `prediction_queue_size` should be equal to or greater than `parallel_search_num`.\n* `dirichlet_alpha`: random (Dirichlet noise) parameter in self-play.\n* `share_mtcs_info_in_self_play`: extra option. If true, MCTS tree node information is shared among games in self-play.\n  * `reset_mtcs_info_per_game`: reset timing of the shared MCTS information.\n* `use_solver_turn`, `use_solver_turn_in_simulation`: use the solver from this turn. The solver is not used if `None`.   \n\n### TrainerConfig\n\n* `wait_after_save_model_ratio`: if greater than 0, the optimizer waits after each model save for this ratio of the time span between saves. It might be useful if you run `self-play` and `optimize` on one GPU. \n\nBasic Usages\n------------\n\nTo train a model, execute `Self-Play`, `Trainer` and `Evaluator`.\n\n\nSelf-Play\n--------\n\n```bash\npython src/reversi_zero/run.py self\n```\n\nWhen executed, Self-Play will start using BestModel.\nIf the BestModel does not exist, a new random model will be created and become BestModel.\n\n### options\n* `--new`: create a new BestModel\n* `-c config_yaml`: specify a config YAML path to override the default settings of `config.py`\n\nTrainer\n-------\n\n```bash\npython src/reversi_zero/run.py opt\n```\n\nWhen executed, Training will start.\nThe base model will be loaded from the latest saved next-generation model. If none exists, BestModel is used.\nThe trained model will be saved every 2000 steps (mini-batches) after an epoch.\n\n### options\n* `-c config_yaml`: specify a config YAML path to override the default settings of `config.py`\n* `--total-step`: specify the total number of steps (mini-batches). 
The total step count affects the learning rate of training.\n\nEvaluator\n---------\n\n```bash\npython src/reversi_zero/run.py eval\n```\n\nWhen executed, Evaluation will start.\nIt evaluates the latest next-generation model against BestModel by playing about 200 games.\nIf the next-generation model wins, it becomes BestModel.\n\n### options\n* `-c config_yaml`: specify a config YAML path to override the default settings of `config.py`\n\nPlay Game\n---------\n\n```bash\npython src/reversi_zero/run.py play_gui\n```\n\n### Note: Mac pyenv environment\n\n`play_gui` uses `wxPython`.\nIt cannot run if your Python environment was built without Framework support.\nTry the following pyenv install option.\n\n```bash\nenv PYTHON_CONFIGURE_OPTS=\"--enable-framework\" pyenv install 3.6.3\n```\n\nFor Anaconda users:\n```bash\nconda install python.app\npythonw src/reversi_zero/run.py play_gui\n```\n\n\u003cimg src=\"doc/img/play_gui.png\" width=\"50%\"\u003e\n\n\nWhen executed, an ordinary reversi board will be displayed and you can play against BestModel.\nAfter BestModel moves, numbers are displayed on the board.\n\n* Top left numbers(1) mean 'Visit Count (=N(s,a))' of the last search.\n* Bottom left numbers(2) mean 'Q Value (=Q(s,a)) on AI side' of the last state and move. 
The Q values are multiplied by 100.\n\nRun as NBoard2.0 Engine\n--------------\n\n[NBoard](http://www.orbanova.com/nboard/) is a very good reversi GUI with strong reversi engines,\nwhich runs on Windows, Mac, and Linux (JRE required).\n\n\u003cimg src=\"doc/img/nboard.png\" width=\"50%\"\u003e\n\nIt can add external engines that implement the [NBoard Protocol](https://github.com/weltyc/ntest/blob/master/instructions/Protocol.htm).\n\n### How to add this model as an external engine to NBoard\n\n* (0) launch NBoard from the command line (environment variables like PATH are needed)\n  * ex) `java -jar /Applications/NBoard/nboard-2.0.jar`\n* (1) select menu `Engine -\u003e Select Opponent...`\n* (2) click the `Add Engine` button\n* (3) set parameters:\n  * `Name` = `RAZ` (for example)\n  * `Working Directory` = PATH TO THIS PROJECT\n  * `Command` = `nboard_engine` or `bash nboard_engine`. If you want to specify a config, use `nboard_engine -c config/ch5.yml`.\n  \n* (4) Engine Level N is set as `simulation_num_per_move=N*20`\n\n\u003cimg src=\"doc/img/add_to_nboard.png\" width=\"50%\"\u003e\n\n### convenient way to evaluate your model\n\nNBoard cannot play two different engines against each other (maybe).\nHowever, it can use different engines as the play-engine and the analysis-engine.\n\nSo, a convenient way to evaluate your model is, for example,\n\n* select this engine as play-engine (or analysis-engine), another engine as analysis-engine (or play-engine).\n* check menu `View -\u003e Highlight Best Move`\n* start `User plays Black`(or White)\n* You simply choose the best move of the analysis-engine.\n\nI have little confidence in the `hint` protocol as analysis-engine (there is odd behavior),\nbut it works in my environment.\n\nAuto Evaluation with other reversi AIs\n----------------\n\n[reversi-arena](https://github.com/mokemokechicken/reversi-arena) is a system for evaluating reversi AIs which implement the NBoard Protocol.\nIt is useful when playing many games against a strong AI like NTest.\n\n\nView Training Log in 
TensorBoard\n----------------\n\n### 1. install tensorboard\n\n```bash\npip install tensorboard\n```\n\n### 2. launch tensorboard and access by web browser\n\n```bash\ntensorboard --logdir logs/tensorboard/\n```\n\nAnd access `http://\u003cThe Machine IP\u003e:6006/`.\n\n\u003cimg src=\"doc/img/tensorboard.png\"\u003e\n\n### Troubleshooting\n\nIf tensorboard fails to launch with an error,\ntry creating a new plain project which includes only `tensorflow` and `tensorboard`.\n\nThen run\n\n```bash\ntensorboard --logdir \u003cPATH TO REVERSI DIR\u003e/logs/tensorboard/\n```\n\n\nTips and Memo\n====\n\nGPU Memory\n----------\n\nIn my environment (GeForce GTX 1080), GPU memory is about 8GB, so it sometimes runs out of memory.\nUsually a lack of memory causes warnings, not errors.\nIf an error happens, try changing `per_process_gpu_memory_fraction` in `src/worker/{evaluate.py,optimize.py,self_play.py}`,\n\n```python\ntf_util.set_session_config(per_process_gpu_memory_fraction=0.2)\n```\n\nA smaller batch_size will reduce the memory usage of `opt`.\nTry changing `TrainerConfig#batch_size` in `NormalConfig`.\n\nTraining Speed\n------\n\n* CPU: 8 core i7-7700K CPU @ 4.20GHz\n* GPU: GeForce GTX 1080\n* 1 game in Self-Play: about 10~20 sec (simulation_num_per_move = 100, thinking_loop = 1).\n* 1 step(mini-batch, batch size=512) in Training: about 1.8 sec.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmokemokechicken%2Freversi-alpha-zero","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmokemokechicken%2Freversi-alpha-zero","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmokemokechicken%2Freversi-alpha-zero/lists"}