https://github.com/0xnineteen/hyper-alpha-zero

hyper optimized alpha zero implementation to play gomoku (distributed training with ray, mcts with cython)

alpha-go alpha-zero cpp cython deepmind mcts monte-carlo-tree-search multi-core python ray


Host: GitHub
URL: https://github.com/0xnineteen/hyper-alpha-zero
Owner: 0xNineteen
Created: 2023-01-15T16:52:53.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-01-15T17:29:03.000Z (over 2 years ago)
Last Synced: 2025-05-07T00:12:16.288Z (5 months ago)
Topics: alpha-go, alpha-zero, cpp, cython, deepmind, mcts, monte-carlo-tree-search, multi-core, python, ray
Language: Python
Homepage:
Size: 864 KB
Stars: 8
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md

README

# hyper-alpha-zero

hyper-optimized alpha-zero implementation with ray + cython for speed

train an agent that beats random actions and pure MCTS in 2 minutes

### file structure

- `train.py`: distributed training with ray
- `ctree/`: mcts nodes in cython (node.py = pure python)
- `mcts.py`: mcts playouts
- `network.py`: neural net stuff
- `board.py`: gomoku board

## system design
- ray distributed parts (`train.py`):
  - one distributed replay buffer
  - N actors with the 'best model' weights which self-play games and store data in the replay buffer
  - M 'candidate models' which pull from the replay buffer and train
    - each iteration they play against the 'best model', and if they win, the 'best model' weights are updated
    - includes write/evaluation locks on the 'best weights'
  - 1 best model weights store (PS / parameter server)
    - stores the best weights, which are retrieved by the self-play actors and updated when candidates win

![](imgs/2023-01-15-09-18-19.png)
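
The design above can be sketched without ray to show the moving parts. This is a minimal, single-process illustration, not the code in `train.py`: the class names `ReplayBuffer` and `BestWeightsStore`, the `try_promote` method, and the `evaluate` callback are all hypothetical, and plain `threading.Lock` objects stand in for whatever locking the repo uses. In a ray setup, each class would presumably be a remote actor instead.

```python
import random
import threading
from collections import deque


class ReplayBuffer:
    """Bounded FIFO store of self-play samples (hypothetical interface)."""

    def __init__(self, capacity=10_000):
        self._data = deque(maxlen=capacity)  # oldest samples are dropped first
        self._lock = threading.Lock()

    def add(self, samples):
        with self._lock:
            self._data.extend(samples)

    def sample(self, batch_size):
        with self._lock:
            k = min(batch_size, len(self._data))
            return random.sample(list(self._data), k)

    def __len__(self):
        with self._lock:
            return len(self._data)


class BestWeightsStore:
    """Holds the current best weights (the 'parameter server' role)."""

    def __init__(self, weights):
        self._weights = weights
        self._version = 0
        self._write_lock = threading.Lock()  # guards reads/writes of the weights
        self._eval_lock = threading.Lock()   # one candidate evaluates at a time

    def get(self):
        with self._write_lock:
            return self._weights, self._version

    def try_promote(self, candidate_weights, evaluate):
        """Replace the best weights if evaluate(candidate, best) returns True."""
        with self._eval_lock:
            best, _ = self.get()
            if not evaluate(candidate_weights, best):
                return False
            with self._write_lock:
                self._weights = candidate_weights
                self._version += 1
            return True
```

Holding the evaluation lock for the whole match means two candidates cannot both beat the same stale 'best model' and overwrite each other; self-play actors only ever call `get()`.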

- cython impl
  - `ctree/`: c++/cython mcts
  - `node.py`: pure python mcts
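
For orientation, here is a rough pure-python sketch of the kind of tree node an alpha-zero style mcts keeps. It is not the contents of `node.py` or `ctree/`: the `Node` class, its method names, and the `c_puct=5.0` default are assumptions, and the selection rule shown is the standard PUCT formula rather than anything confirmed from the repo.

```python
import math


class Node:
    """One node of the search tree (hypothetical layout)."""

    def __init__(self, parent=None, prior=1.0):
        self.parent = parent
        self.prior = prior        # policy-network probability of reaching this node
        self.children = {}        # action -> Node
        self.visits = 0
        self.value_sum = 0.0

    def q(self):
        # Mean backed-up value; 0 for unvisited nodes.
        return self.value_sum / self.visits if self.visits else 0.0

    def expand(self, action_priors):
        # action_priors: iterable of (action, prior probability) pairs.
        for action, p in action_priors:
            if action not in self.children:
                self.children[action] = Node(self, p)

    def puct(self, c_puct):
        # Exploration bonus shrinks as this node accumulates visits.
        u = c_puct * self.prior * math.sqrt(self.parent.visits) / (1 + self.visits)
        return self.q() + u

    def select(self, c_puct=5.0):
        # Returns the (action, child) pair with the highest PUCT score.
        return max(self.children.items(), key=lambda kv: kv[1].puct(c_puct))

    def backup(self, value):
        # value is from the perspective of the player who moved into this node;
        # the sign flips at each level because the players alternate.
        node = self
        while node is not None:
            node.visits += 1
            node.value_sum += value
            value = -value
            node = node.parent
```

Selection and backup run once per playout, which is why this per-node arithmetic is a natural candidate for moving into cython.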

-- todos --

- jax network impl
- tpu + gpu support
- saved model weights

### references
- based off: https://github.com/junxiaosong/AlphaZero_Gomoku
- distributed rl: http://rail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-21.pdf
