https://github.com/0xnineteen/hyper-alpha-zero
hyper optimized alpha zero implementation to play gomoku (distributed training with ray, mcts with cython)
alpha-go alpha-zero cpp cython deepmind mcts monte-carlo-tree-search multi-core python ray
- Host: GitHub
- URL: https://github.com/0xnineteen/hyper-alpha-zero
- Owner: 0xNineteen
- Created: 2023-01-15T16:52:53.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-15T17:29:03.000Z (over 2 years ago)
- Last Synced: 2025-05-07T00:12:16.288Z (5 months ago)
- Topics: alpha-go, alpha-zero, cpp, cython, deepmind, mcts, monte-carlo-tree-search, multi-core, python, ray
- Language: Python
- Homepage:
- Size: 864 KB
- Stars: 8
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# hyper-alpha-zero
hyper-optimized alpha-zero implementation with ray + cython for speed
train an agent that beats random actions and pure MCTS in 2 minutes
### file structure
- `train.py`: distributed training with ray
- `ctree/`: mcts nodes in cython (node.py = pure python)
- `mcts.py`: mcts playouts
- `network.py`: neural net stuff
- `board.py`: gomoku board

## system design
- ray distributed parts (`train.py`):
  - one distributed replay buffer
  - N actors with the 'best model' weights, which self-play games and store the data in the replay buffer
  - M 'candidate models', which pull from the replay buffer and train
    - each iteration they play against the 'best model'; if they win, the 'best model' weights are updated
    - includes write/evaluation locks on the 'best weights'
  - 1 best model weights store (PS / parameter server)
    - stores the best weights, which are retrieved by the self-play actors and updated when candidates win
- cython impl
  - `ctree/`: c++/cython mcts
  - `node.py`: pure python mcts

-- todos --
- jax network impl
- tpu + gpu support
- saved model weights

### references
- based off: https://github.com/junxiaosong/AlphaZero_Gomoku
- distributed rl: http://rail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-21.pdf
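
### sketch: replay buffer + best weights store

a minimal sketch of the two shared pieces from the system design section: a bounded replay buffer, and a best-weights store that promotes a candidate only if it wins an evaluation match. this is not the code in `train.py`. the names `ReplayBuffer`, `BestWeightsStore`, `evaluate_and_maybe_promote` and `play_match` are hypothetical, and plain `threading` locks stand in for the write/evaluation locks so the example runs with the standard library only. in a ray setup, classes like these would typically be turned into actors with `@ray.remote`.

```python
import random
import threading
from collections import deque


class ReplayBuffer:
    """Bounded FIFO store of self-play samples (hypothetical interface)."""

    def __init__(self, capacity):
        # Oldest samples are dropped automatically once capacity is reached.
        self._data = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def add(self, samples):
        with self._lock:
            self._data.extend(samples)

    def sample(self, batch_size):
        # Returns at most batch_size samples, drawn without replacement.
        with self._lock:
            k = min(batch_size, len(self._data))
            return random.sample(list(self._data), k)

    def __len__(self):
        with self._lock:
            return len(self._data)


class BestWeightsStore:
    """Holds the current 'best model' weights (assumed design, not the repo's)."""

    def __init__(self, weights):
        self._weights = weights
        self._version = 0
        self._write_lock = threading.Lock()  # guards reads/writes of the weights
        self._eval_lock = threading.Lock()   # allows one candidate evaluation at a time

    def get(self):
        with self._write_lock:
            return self._version, self._weights

    def evaluate_and_maybe_promote(self, candidate_weights, play_match):
        """Run play_match(candidate, best) -> bool under the evaluation lock.

        Promotes the candidate and returns True if it wins, else returns False.
        """
        with self._eval_lock:
            _, best = self.get()
            if not play_match(candidate_weights, best):
                return False
            with self._write_lock:
                self._weights = candidate_weights
                self._version += 1
            return True


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1000)
    buffer.add([("state", "policy", 1.0)] * 8)  # placeholder self-play samples
    batch = buffer.sample(4)

    store = BestWeightsStore(weights={"w": 0.0})
    # Toy match function: the candidate "wins" if its single weight is larger.
    promoted = store.evaluate_and_maybe_promote(
        {"w": 1.0}, lambda cand, best: cand["w"] > best["w"]
    )
    print(len(batch), promoted, store.get()[0])
```

holding the evaluation lock for the whole match means two candidates can never both be compared against the same stale 'best model' and then overwrite each other. self-play actors only take the short write lock when they fetch weights.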