{"id":41559049,"url":"https://github.com/tuero/muzero-cpp","last_synced_at":"2026-05-18T01:07:32.950Z","repository":{"id":37523774,"uuid":"431398663","full_name":"tuero/muzero-cpp","owner":"tuero","description":"A C++ pytorch implementation of MuZero","archived":false,"fork":false,"pushed_at":"2026-05-16T03:02:51.000Z","size":68689,"stargazers_count":40,"open_issues_count":0,"forks_count":8,"subscribers_count":5,"default_branch":"master","last_synced_at":"2026-05-16T05:15:25.953Z","etag":null,"topics":["alphazero","cpp","libtorch","machine-learning","mcts","muzero","pytorch","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tuero.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-11-24T08:10:47.000Z","updated_at":"2026-05-16T03:02:55.000Z","dependencies_parsed_at":"2022-08-19T03:41:08.781Z","dependency_job_id":null,"html_url":"https://github.com/tuero/muzero-cpp","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/tuero/muzero-cpp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuero%2Fmuzero-cpp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuero%2Fmuzero-cpp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuero%2Fmuzero-cpp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/tuero%2Fmuzero-cpp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tuero","download_url":"https://codeload.github.com/tuero/muzero-cpp/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuero%2Fmuzero-cpp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33161411,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-17T22:39:12.733Z","status":"ssl_error","status_checked_at":"2026-05-17T22:39:10.741Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alphazero","cpp","libtorch","machine-learning","mcts","muzero","pytorch","reinforcement-learning"],"created_at":"2026-01-24T05:26:59.525Z","updated_at":"2026-05-18T01:07:32.944Z","avatar_url":"https://github.com/tuero.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MuZero-CPP\n\nThis project is a complete C++ implementation of the [MuZero](https://arxiv.org/abs/1911.08265) algorithm, inspired by [MuZero General](https://github.com/werner-duvaud/muzero-general).\nThe motivation behind this project is the added speed C++ provides, efficient batched inference on the GPU, and the ability to work in C++ environments where we don't want to leave the C++ runtime.\nThere are still many optimization tricks that can be used to improve 
efficiency, but there aren't immediate plans to do so.\n\n**Note**: I can't guarantee that this implementation is bug-free, as I don't have the computational resources to compare against some of the reported results.\n\n![pong_animation](./assets/muzero_atari_pong.gif)\n\n## Features\n\n- Multi-threaded async actor inference\n- Multi-device (CPU and GPU) support for learning and inference\n- Reanalyze for fresh buffer values (both value and policy updates)\n- Complex action representation options\n- Prioritized replay buffer\n- Tensorboard metric logging\n- Model configuration through JSON (see `models/`)\n- Model, buffer, and metric checkpointing for resuming\n- Easy to add environments\n- Play against the learned model (2 player games) in testing\n- Support for ALE\n\n## Dependencies\n\nThe following libraries are used in this project. We use vcpkg to manage the dependencies (except libtorch).\n\n- [abseil-cpp](https://github.com/abseil/abseil-cpp)\n- [libnop](https://github.com/google/libnop)\n- [tensorboard_logger](https://github.com/RustingSword/tensorboard_logger)\n- [libtorch](https://pytorch.org/)\n- [ALE](https://github.com/mgbellemare/Arcade-Learning-Environment), `sdl2`, `sdl2_image`, and `OpenCV` if using the ALE wrapper (**Note**: v0.7.4 or newer is required, as older versions of ALE don't link with libtorch.)\n\nSome source files are also taken (and modified) from [OpenSpiel](https://github.com/deepmind/open_spiel), and retain the corresponding copyright notice.\n\n## Including in Your Project\nTo use this as a library and extend it for your own games/environments, see the [muzero-example](https://github.com/tuero/muzero-example) repo.\n\n`muzero` is not part of the official vcpkg registry,\nbut is supported in my personal registry [here](https://github.com/tuero/vcpkg-registry).\nThis is by far the easiest way to use this library, as it pulls in the dependencies, and it is the only documented way.\nTo add `tuero/vcpkg-registry` as 
a git registry to your vcpkg project:\n```json\n\"registries\": [\n...\n{\n    \"kind\": \"git\",\n    \"repository\": \"https://github.com/tuero/vcpkg-registry\",\n    \"reference\": \"master\",\n    \"baseline\": \"\u003cCOMMIT_SHA\u003e\",\n    \"packages\": [\"muzero\", \"arcade-learning-environment\", \"tensorboard-logger\"]\n}\n]\n...\n```\nwhere `\u003cCOMMIT_SHA\u003e` is the 40-character git commit SHA in the registry's repository (you can find\nthis by clicking on the latest commit [here](https://github.com/tuero/vcpkg-registry) and looking\nat the URL).\n\n```shell\nvcpkg add port muzero\n```\n\nNote that `torch` will look for `libtorch` at the path given by the environment variable `LIBTORCH_ROOT`, which is not part of the included dependencies.\nThe easiest way to get `libtorch` is through the Python package.\nFirst, create a virtual environment and install PyTorch:\n```shell\nconda create -n muzero python=3.12\nconda activate muzero\npip3 install torch torchvision\n```\n\nThen set the `LIBTORCH_ROOT` environment variable:\n```shell\nexport LIBTORCH_ROOT=`python3 -c 'import torch;print(torch.utils.cmake_prefix_path)'`\n```\n\nThen, in your project's `CMakeLists.txt`:\n```cmake\ncmake_minimum_required(VERSION 3.25)\nproject(my_project LANGUAGES CXX)\n\nfind_package(muzero CONFIG REQUIRED)\nadd_executable(main main.cpp)\ntarget_link_libraries(main PRIVATE muzero::muzero)\n```\n\n\u003e [!IMPORTANT]\n\u003e If you have PyTorch installed in multiple virtual environments, you may get a configure error in the following scenario:\n\u003e you configured this dependency through vcpkg with `LIBTORCH_ROOT` pointing to one virtual environment,\n\u003e and then try to configure it again through vcpkg with `LIBTORCH_ROOT` pointing to another virtual environment.\n\u003e \n\u003e You can use the triplet in the vcpkg-registry (or in `cmake/triplets`), which includes `LIBTORCH_ROOT` in the dependency ABI.\n\u003e\n\u003e If you still somehow see a CMake warning about 
an `RPATH` cycle involving `libtorch/libc10`\n\u003e (often caused by vcpkg reusing an older cached build of a dependency when you aren't using the vcpkg registry triplet),\n\u003e delete the build folder and tell vcpkg not to reuse cached artifacts when configuring:\n\u003e `VCPKG_BINARY_SOURCES=clear cmake --preset=release-linux`\n\n## Building Examples\nAll dependencies are managed through [vcpkg](https://vcpkg.io/en/), except for `libtorch` (PyTorch's C++ frontend).\nThe easiest way to get `libtorch` is through the Python package.\nFirst, create a virtual environment and install PyTorch:\n```shell\nconda create -n muzero python=3.12\nconda activate muzero\npip3 install torch torchvision\n```\n\nNext, the following environment variables are required for the toolchain packages to be found:\n- `CC`: The path to your C compiler\n- `CXX`: The path to your C++23-compliant compiler\n- `LIBTORCH_ROOT`: The path to the libtorch package, which we will point to the just-installed Python package\n\nFor example:\n```shell\nexport CC=gcc-15.2\nexport CXX=g++-15.2\n# Ensure you activated the muzero virtual environment\nexport LIBTORCH_ROOT=`python3 -c 'import torch;print(torch.utils.cmake_prefix_path)'`\n```\n\nFinally, we use `CMakePresets.json`, which sets all the required CMake variables.\n```shell\ncmake --preset=release-linux\ncmake --build --preset=release-linux -- -j8\n```\n\n\u003e [!IMPORTANT]\n\u003e If you see a CMake warning about an `RPATH` cycle involving `libtorch/libc10`\n\u003e (often caused by vcpkg reusing an older cached build of a dependency which references a different virtual environment),\n\u003e delete the build folder and tell vcpkg not to reuse cached artifacts when configuring:\n\u003e `VCPKG_BINARY_SOURCES=clear cmake --preset=release-linux`\n\n## Example Usage\n\nA few examples are included which show how to add a new environment and how to interact with training and testing.\nThere are predefined command line arguments to parameterize the MuZero 
algorithm (see [default_flags.cpp](./include/muzero/default_flags.cpp) or use the `--help` option when running). Note that devices are given as a comma-separated list, following the `torch` device notation.\nYou can also add additional command line arguments by adding `ABSL_FLAG`s (see the examples).\n\nSome important arguments to consider:\n\n- `path`: The directory used for checkpointing and resuming.\n- `devices`: Torch-style comma-separated list of devices; one model is spawned for each entry. At least 2 entries are needed for training: the first is used solely for training, and the rest run a lagged target network for inference/testing. E.g.\n  - `--devices \"cuda:0,cuda:0\"`\n  - `--devices \"cuda:0,cuda:1\"`\n  - `--devices \"cuda:0,cpu\"`\n- `min_sample_size`: Should be at least as large as `batch_size`.\n- `num_actors`: Number of self-play actor threads to spawn.\n- `num_reanalyze_actors`: Number of reanalyze actor threads to spawn.\n- `train_reanalyze_ratio`: Fraction of samples the learner draws from the reanalyze buffer. 
This should probably match the fraction of actors you spawn for reanalyze (versus self-play) to keep things efficient.\n- It is also recommended to set `initial_inference_batch_size` and `recurrent_inference_batch_size` equal to the total number of actors, as this improves efficiency by batching inference queries rather than having threads wait for a model to become available.\n\n### Training\n\nOne can train Connect4 with the following:\n\n```shell\n# Run the connect4 binary without reanalyze\n./build/release-linux/examples/connect4/muzero_connect4 --num_actors=10 --initial_inference_batch_size=10 --recurrent_inference_batch_size=10 --devices=\"cuda:0,cuda:0\" --batch_size=256 --min_sample_size=512 --value_loss_weight=0.25 --td_steps=42 --num_unroll_steps=5 --checkpoint_interval=10000 --model_sync_interval=1000 --num_simulations=50 --max_training_steps=250000 --export_path=examples/connect4/reanalyze-00 --model_path=models/model_small.json\n\n# Run the connect4 binary with 50% of samples coming from reanalyze\n./build/release-linux/examples/connect4/muzero_connect4 --num_actors=5 --num_reanalyze_actors=5 --initial_inference_batch_size=10 --recurrent_inference_batch_size=10 --train_reanalyze_ratio=0.5 --devices=\"cuda:0,cuda:0\" --batch_size=256 --min_sample_size=512 --value_loss_weight=0.25 --td_steps=42 --num_unroll_steps=5 --checkpoint_interval=10000 --model_sync_interval=1000 --num_simulations=50 --max_training_steps=250000 --export_path=examples/connect4/reanalyze-50 --model_path=models/model_small.json\n```\n\n### Safely Pausing\n\nTo pause training, send an interrupt signal (`\u003cCTRL + C\u003e`) and the current state of the algorithm will be checkpointed (checkpoints are also saved periodically).\n\n### Resume Training\n\nTo resume training, run the same command that was used for training, but add the `--resume` flag. Note that some command line arguments can be changed, while others are checked and enforced (e.g. replay buffer max size). 
Not every case is checked, so it is best to use exactly the same arguments.\n\n```shell\n# Run the connect4 binary with the appropriate arguments\n./build/release-linux/examples/connect4/muzero_connect4 --num_actors=10 --initial_inference_batch_size=10 --recurrent_inference_batch_size=10 --devices=\"cuda:0,cuda:0\" --batch_size=256 --min_sample_size=512 --value_loss_weight=0.25 --td_steps=42 --num_unroll_steps=5 --checkpoint_interval=10000 --model_sync_interval=1000 --num_simulations=50 --max_training_steps=250000 --export_path=examples/connect4/reanalyze-00 --model_path=models/model_small.json --resume\n```\n\n### Testing Against the Trained Agent\n\nThe `muzero::play_test_model` function can be used to test a trained model.\nThe invocation should be the same as the one used for training (only some of the arguments are needed, but it's safe to use all the arguments used in training), with an additional `--test` command line argument (assuming you implement this; see the examples for details).\nBy default, the most recent model checkpointed during training is loaded.\nTo load the best-performing model from training, use the `--testing_checkpoint=-2` argument.\n\nThe opponent specified by `config.opponent_type` is used during testing.\nFor 2 player games, you can play against your bot manually by setting `config.opponent_type = types::OpponentTypes::Human`.\n\n### Examining Metrics\n\nThe TensorBoard metric file is saved at `config.path/metrics/tfevents.pb`. To view the metrics while training, run TensorBoard as usual:\n\n```shell\ncd examples/connect4/\ntensorboard --logdir=./metrics\n```\n\nNote that this requires a TensorBoard Python installation (e.g. via 
conda or pip).\n\n## Adding Environments\n\nTo add a new environment, you need the following:\n\n- A new directory under `./examples` containing your environment\n- A game which extends [abstract_game.h](./include/muzero/abstract_game.h)\n- A `Config` which specifies, at a minimum, the `action_representation` and `visit_softmax_temperature` functions (these can't really be specified as command line arguments)\n- A source file which contains the entry point (`main` with a call to `muzero::muzero(config, game_factory\u003cYOUR_GAME_CLASS_NAME\u003e)`)\n- Optionally, a call to `muzero::play_test_model` to test your trained agent\n- A `CMakeLists.txt` to compile your example, with the directory added to the parent `CMakeLists.txt`\n\nSee the examples for proper usage.\n\n## Pretrained Models\n\nA pretrained model for the Connect4 environment is included.\nTo test the pretrained model, use the following command:\n\n```shell\n./build/release-linux/examples/connect4/muzero_connect4 --devices=\"cpu\" --num_simulations=50 --export_path=examples/connect4/reanalyze-00 --model_path=models/model_small.json --test\n```\n\nA pretrained model for the Pong ALE environment is also included. 
Note that we do not include any ROMs; the `--game_file_path` flag below must point to a Pong ROM you supply.\nTo test the pretrained model, use the following command:\n\n```shell\n./build/release-linux/examples/ale/muzero_ale --num_actors=5 --num_reanalyze_actors=5 --initial_inference_batch_size=10 --recurrent_inference_batch_size=10 --devices=cuda:0,cuda:0 --batch_size=256 --min_sample_size=512 --value_loss_weight=0.25 --td_steps=10 --num_unroll_steps=5 --checkpoint_interval=10000 --model_sync_interval=1000 --num_simulations=25 --max_training_steps=2000000 --export_path=examples/ale/pong --model_path=models/model_ale.json --replay_buffer_size=20000 --reanalyze_buffer_size=100000 --train_reanalyze_ratio=0.5 --max_history_len=200 --episodic_pong --game_file_path=pong.bin --stacked_observations=6 --frame_skip=4 --min_reward=-1 --max_reward=1 --min_value=-21 --max_value=21 --test\n```\n\n## Performance\n\nC++ was chosen because of environment constraints and for the added performance over dealing with threading in Python.\nThere are certainly many improvements that can be made to this codebase, and contributions are welcome.\n\nThe following metrics were collected while training on the Connect4 environment on a stock Intel 7820X with 64GB of system memory and an Nvidia 3090, running Ubuntu 20.04 with the Release build flags. The runtime configuration for this extended training session is given in the command below. 
Note that Connect4 can be solved in far fewer training steps; this is simply a performance test.\n\n- Total training time: 9:48:43\n- ~13.5 training steps per second\n- ~39.7 self-play steps per second\n\n```shell\n./build/release-linux/examples/connect4/muzero_connect4 --num_actors=10 --initial_inference_batch_size=10 --recurrent_inference_batch_size=10 --devices=\"cuda:0,cuda:0\" --batch_size=256 --min_sample_size=512 --value_loss_weight=0.25 --td_steps=42 --num_unroll_steps=5 --checkpoint_interval=10000 --model_sync_interval=1000 --num_simulations=50 --max_training_steps=500000 --export_path=examples/connect4/reanalyze-00 --model_path=models/model_small.json\n```\n\n## Tensorboard Metrics\n\nThe following metrics are tracked:\n\n- Evaluator:\n  - `Episode_length`: Length of the episode\n  - `Total_reward`: Total reward received (includes both players in 2 player games)\n  - `Mean_value`: Average root value along the trajectory\n  - `Muzero_reward`: Reward received by the MuZero player\n  - `Opponent_reward`: Reward received by the opponent\n- Workers:\n  - `Self_played_games`: Total number of games completed through self-play by all actors\n  - `Self_played_steps`: Total environment steps completed through self-play by all actors\n  - `Training_steps`: Total training steps (i.e. 
model updates)\n  - `Reanalyze_games`: Total number of games which have had their root values and policy reanalyzed\n  - `Reanalyze_steps`: Total number of game steps which have had their root values and policy reanalyzed\n  - `Training_per_selfplay_step_ratio`: Ratio of training steps to self-play and reanalyze steps, used to monitor whether actors are feeding data to the replay buffer too fast or the model is not being fed enough fresh trajectories\n- Loss:\n  - `Total_weighted_loss`: Total loss (excluding the L2 term) weighted by importance sampling and the value loss scale\n  - `Value_loss`: Unweighted loss on the value prediction from the prediction network\n  - `Policy_loss`: Unweighted loss on the policy prediction from the prediction network\n  - `Reward_loss`: Unweighted loss on the reward prediction from the dynamics network\n\n![tensorboard_evaluator](./assets/tensorboard_evaluator.png)\n![tensorboard_workers](./assets/tensorboard_workers.png)\n![tensorboard_loss](./assets/tensorboard_loss.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftuero%2Fmuzero-cpp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftuero%2Fmuzero-cpp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftuero%2Fmuzero-cpp/lists"}