https://github.com/k-l-lambda/trigo.cpp
High-performance C++/CUDA implementation of Monte Carlo Tree Search for Trigo (3D Go) training data generation.
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/k-l-lambda/trigo.cpp
- Owner: k-l-lambda
- Created: 2025-12-04T06:58:03.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-12-21T15:56:09.000Z (6 months ago)
- Last Synced: 2025-12-21T16:34:44.520Z (6 months ago)
- Language: C++
- Size: 696 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Trigo.cpp - High-Performance C++ Tools for Trigo AI
C++/CUDA inference and self-play tools for [Trigo](https://github.com/k-l-lambda/trigo) (3D Go). Provides ONNX Runtime-based neural network inference, AlphaZero-style MCTS, and high-performance self-play data generation for the [TrigoRL training pipeline](../trigoRL).
## Overview
This project implements production-ready tools for Trigo AI development:
**Key Features**:
- **ONNX Runtime Integration**: CPU and GPU inference with trained models
- **AlphaZero MCTS**: Value network evaluation (255× faster than random rollouts)
- **Self-Play Generator**: Command-line tool for training data generation
- **Random Board Selection**: 220 candidate shapes (2D and 3D) for diverse training
- **Cross-Language Validation**: 100% compatibility with TypeScript reference
- **Multiple Policies**: Random, Neural, Pure MCTS, AlphaZero MCTS
- **TGN Format**: Compatible with TrigoRL training pipeline
## Quick Start
### Prerequisites
- CMake 3.18+
- GCC 9+ or Clang 10+
- CUDA Toolkit 11.0+ (optional, for GPU inference)
- ONNX Runtime 1.17.0+ (provided in repository)
### Build
```bash
# Clone the repository and enter it
git clone https://github.com/k-l-lambda/trigo.cpp
cd trigo.cpp
# Create build directory
mkdir build && cd build
# Configure and build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
# Run tests
./test_trigo_game
./test_alphazero_mcts
```
### Usage
#### Self-Play Data Generation
**Generate games with random board shapes (recommended for training):**
```bash
# Random board selection from 220 candidates (2D: 2-13×1-13×1, 3D: 2-5×2-5×2-5)
# This creates a diverse dataset covering various board sizes
export TRIGO_FORCE_CPU=1
./self_play_generator \
--num-games 100 \
--random-board \
--black-policy mcts \
--white-policy mcts \
--model ../models/trained_shared \
--output /path/to/data/mcts_games \
--seed 42
# With custom board ranges (e.g., small 2D boards plus tiny two-layer 3D boards)
./self_play_generator \
--num-games 100 \
--random-board \
--board-ranges "3-9x3-9x1-1,2-3x2-3x2-2" \
--black-policy mcts \
--white-policy mcts \
--model ../models/trained_shared \
--output /path/to/data/mcts_games
```
**Generate games with fixed board size:**
```bash
# AlphaZero-style MCTS with value network on 5×5×5 board
# Force CPU for best performance (1.52× faster than GPU for batch=1 MCTS)
export TRIGO_FORCE_CPU=1
./self_play_generator \
--num-games 100 \
--board 5x5x5 \
--black-policy mcts \
--white-policy mcts \
--model ../models/trained_shared \
--output /path/to/data/mcts_games \
--seed 42
# With custom MCTS parameters
./self_play_generator \
--num-games 100 \
--board 5x5x5 \
--black-policy mcts \
--white-policy mcts \
--model ../models/trained_shared \
--mcts-simulations 50 \
--mcts-c-puct 1.5 \
--output /path/to/data/mcts_games
```
**Generate games with neural policy (faster, less exploration):**
```bash
./self_play_generator \
--num-games 1000 \
--board 5x5x5 \
--black-policy neural \
--white-policy neural \
--model ../models/trained_shared \
--output /path/to/data/neural_games
```
**Generate baseline games with random policy:**
```bash
# Random policy with random board shapes
./self_play_generator \
--num-games 10000 \
--random-board \
--black-policy random \
--white-policy random \
--output /path/to/data/random_games \
--seed 42
# Random policy with fixed board
./self_play_generator \
--num-games 10000 \
--board 5x5x5 \
--black-policy random \
--white-policy random \
--output /path/to/data/random_games \
--seed 42
```
#### Board Shape Options
The generator supports two modes for board shape selection:
**Fixed Board (--board):**
```bash
--board 5x5x5 # Fixed 5×5×5 board for all games
--board 9x9x1 # Fixed 9×9×1 (2D) board for all games
--board 13x13x1 # Fixed 13×13 (a common small board size in traditional Go)
```
**Random Board (--random-board):**
```bash
--random-board # Randomly select from 220 candidate shapes per game
```
The random board mode uses default ranges:
- **2D boards**: 2-13×1-13×1 (156 shapes)
- **3D boards**: 2-5×2-5×2-5 (64 shapes)
- **Total**: 220 candidate shapes
**Custom Board Ranges (--board-ranges):**
You can specify custom ranges with `--board-ranges` (requires `--random-board`):
```bash
# Format: "minX-maxXxminY-maxYxminZ-maxZ,..."
--random-board --board-ranges "2-13x1-13x1-1,2-5x2-5x2-5" # Default (220 shapes)
--random-board --board-ranges "3-9x3-9x1-1" # Small 2D boards only
--random-board --board-ranges "2-3x2-3x2-3" # Tiny 3D boards only
--random-board --board-ranges "5-5x5-5x5-5,9-9x9-9x1-1" # Mix of 5×5×5 and 9×9
```
**Range Format**: `minX-maxXxminY-maxYxminZ-maxZ`
- Multiple ranges can be comma-separated
- Each range generates all combinations within bounds
- Example: `2-3x2-3x1-1` generates: 2×2×1, 2×3×1, 3×2×1, 3×3×1 (4 shapes)
Random board selection is recommended for training diverse models that generalize across board sizes.
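The range format above can be expanded with a few lines of C++. The following is a minimal sketch, not the code in `board_shape_candidates.hpp`; the names `Shape`, `parseAxis`, and `expandBoardRanges` are hypothetical, and malformed input is only guarded by `assert`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// One candidate board shape (x, y, z). Hypothetical type for illustration.
using Shape = std::tuple<int, int, int>;

// Parse "min-max" into an inclusive (min, max) pair.
static std::pair<int, int> parseAxis(const std::string& axis)
{
    const auto dash = axis.find('-');
    assert(dash != std::string::npos);
    return {std::stoi(axis.substr(0, dash)), std::stoi(axis.substr(dash + 1))};
}

// Expand "minX-maxXxminY-maxYxminZ-maxZ,..." into every shape within bounds.
std::vector<Shape> expandBoardRanges(const std::string& spec)
{
    std::vector<Shape> shapes;
    std::stringstream ranges(spec);
    std::string range;
    while (std::getline(ranges, range, ','))
    {
        // Split one range into its three axis bounds.
        std::stringstream axes(range);
        std::string axis;
        std::vector<std::pair<int, int>> bounds;
        while (std::getline(axes, axis, 'x'))
            bounds.push_back(parseAxis(axis));
        assert(bounds.size() == 3);

        // Enumerate all combinations within the inclusive bounds.
        for (int x = bounds[0].first; x <= bounds[0].second; ++x)
            for (int y = bounds[1].first; y <= bounds[1].second; ++y)
                for (int z = bounds[2].first; z <= bounds[2].second; ++z)
                    shapes.emplace_back(x, y, z);
    }
    return shapes;
}
```

Calling `expandBoardRanges("2-13x1-13x1-1,2-5x2-5x2-5")` yields 12 × 13 × 1 + 4 × 4 × 4 = 220 shapes, matching the default candidate count.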
**Parameter Rules**:
- `--board` and `--random-board` are mutually exclusive
- `--board-ranges` requires `--random-board`
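As an illustration of these two rules, a flag check could look like the sketch below. The `BoardOptions` struct and `validateBoardOptions` function are hypothetical names, and the error strings are not taken from the tool's actual output.

```cpp
#include <string>

// Parsed board-related flags (hypothetical struct, not the tool's own types).
struct BoardOptions
{
    bool hasBoard = false;        // --board WxHxD was given
    bool randomBoard = false;     // --random-board was given
    bool hasBoardRanges = false;  // --board-ranges "..." was given
};

// Return an empty string when the combination is valid, else an error message.
std::string validateBoardOptions(const BoardOptions& o)
{
    if (o.hasBoard && o.randomBoard)
        return "--board and --random-board are mutually exclusive";
    if (o.hasBoardRanges && !o.randomBoard)
        return "--board-ranges requires --random-board";
    return "";
}
```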
#### Policy Options
Available policy types:
- `random` - Random valid moves (fast, no model required)
- `neural` - Direct neural network inference (requires `--model`)
- `mcts` - AlphaZero MCTS with value network (requires `--model`)
#### MCTS Parameters
- `--mcts-simulations N` - Number of MCTS simulations per move (default: 50)
- `--mcts-c-puct F` - Exploration constant for PUCT formula (default: 1.5)
- `--mcts-temperature F` - Temperature for move selection (default: 1.0)
- `--mcts-dirichlet-alpha F` - Dirichlet noise alpha for root exploration (default: 0.3)
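To show what `--mcts-c-puct` and `--mcts-temperature` control, here is a minimal sketch of the textbook AlphaZero formulas. It assumes the standard PUCT score `Q + c_puct * P * sqrt(N_parent) / (1 + N_child)` and visit-count sampling `pi(a) ∝ N(a)^(1/T)`; whether `mcts.hpp` uses exactly these forms is an assumption, and `ChildStats`, `puctScore`, `selectChild`, and `visitDistribution` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-child statistics kept by the parent node (hypothetical layout).
struct ChildStats
{
    double prior;     // P(s, a) from the policy head
    int visits;       // N(s, a)
    double valueSum;  // W(s, a), sum of backed-up values
};

// PUCT score: Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a)).
double puctScore(const ChildStats& c, int parentVisits, double cPuct)
{
    const double q = c.visits > 0 ? c.valueSum / c.visits : 0.0;
    const double u = cPuct * c.prior * std::sqrt(static_cast<double>(parentVisits)) / (1.0 + c.visits);
    return q + u;
}

// Index of the child with the highest PUCT score.
std::size_t selectChild(const std::vector<ChildStats>& children, int parentVisits, double cPuct = 1.5)
{
    assert(!children.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < children.size(); ++i)
        if (puctScore(children[i], parentVisits, cPuct) > puctScore(children[best], parentVisits, cPuct))
            best = i;
    return best;
}

// Turn root visit counts into move probabilities: pi(a) proportional to N(a)^(1/T).
std::vector<double> visitDistribution(const std::vector<int>& visits, double temperature)
{
    assert(temperature > 0.0);
    std::vector<double> probs(visits.size());
    double total = 0.0;
    for (std::size_t i = 0; i < visits.size(); ++i)
    {
        probs[i] = std::pow(static_cast<double>(visits[i]), 1.0 / temperature);
        total += probs[i];
    }
    assert(total > 0.0);
    for (double& p : probs)
        p /= total;
    return probs;
}
```

A larger `cPuct` weights the prior-driven exploration term more heavily, while a lower temperature concentrates the move distribution on the most-visited child.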
#### Model Path
The `--model` parameter should point to a directory containing the 3-model ONNX architecture:
```
models/trained_shared/
├── base_model.onnx      # Shared transformer base
├── policy_head.onnx     # Policy network
└── value_head.onnx      # Value network
```
Models are exported from TrigoRL using `exportOnnx.py`.
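Before launching a long self-play run it can help to check that the directory really holds all three files. The sketch below uses only `std::filesystem` from the C++17 standard library; `missingModelFiles` is a hypothetical helper, not part of `SharedModelInferencer`.

```cpp
#include <array>
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

// Return the expected ONNX files that are missing from modelDir.
std::vector<std::string> missingModelFiles(const std::filesystem::path& modelDir)
{
    assert(!modelDir.empty());
    static const std::array<const char*, 3> kExpected = {
        "base_model.onnx", "policy_head.onnx", "value_head.onnx"};

    std::vector<std::string> missing;
    for (const char* name : kExpected)
        if (!std::filesystem::is_regular_file(modelDir / name))
            missing.emplace_back(name);
    return missing;
}
```

An empty result means the layout matches the one shown above; otherwise the returned names can be reported to the user.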
#### Performance Tips
**For Self-Play Generation:**
- Use `TRIGO_FORCE_CPU=1` for MCTS (CPU is 1.52× faster than GPU)
- MCTS with 50 simulations: ~280ms per move on CPU
- Can generate 10,000 games in 32.5 hours on a single CPU
**For GPU Inference:**
- GPU is recommended only for training with large batches (256+)
- Small batch sizes (batch=1) underutilize GPU parallelism
- GPU shows ~1.52× performance penalty for MCTS due to kernel launch overhead
## Architecture
### Component Stack
```
┌─────────────────────────────────────────────────────────────┐
│ Python Training Pipeline (TrigoRL) - SEPARATE PROJECT       │
│  ├─ PyTorch Model Training                                  │
│  ├─ ONNX Model Export (exportOnnx.py)                       │
│  ├─ Training Data Loading (.tgn files)                      │
│  └─ Weights & Biases Integration                            │
└─────────────────────────────────────────────────────────────┘
                          ↓ exports
                   ONNX Models (.onnx)
                          ↓ uses
┌─────────────────────────────────────────────────────────────┐
│ C++ Inference & Generation Tools (trigo.cpp) - THIS PROJECT │
│  ├─ SharedModelInferencer (ONNX Runtime + CUDA)             │
│  │   ├─ Policy Network Inference                            │
│  │   ├─ Value Network Inference                             │
│  │   └─ Prefix Tree Attention Builder                       │
│  ├─ TrigoGame (3D Go rules engine)                          │
│  │   ├─ Board State Management                              │
│  │   ├─ Move Validation                                     │
│  │   ├─ Capture & Ko Detection                              │
│  │   └─ Territory Calculation                               │
│  ├─ MCTS (Monte Carlo Tree Search)                          │
│  │   ├─ AlphaZero MCTS (PUCT, value network) - Production   │
│  │   └─ Pure MCTS (UCB1, random rollouts) - Reference       │
│  ├─ Self-Play Generator (data generation tool)              │
│  │   ├─ Random Board Selection (220 candidates)             │
│  │   ├─ RandomPolicy                                        │
│  │   ├─ NeuralPolicy (ONNX inference)                       │
│  │   ├─ MCTSPolicy (Pure MCTS)                              │
│  │   └─ TGN File Export                                     │
│  └─ Python Bindings (pybind11) [future]                     │
└─────────────────────────────────────────────────────────────┘
                          ↓ generates
                   Training Data (.tgn)
                          ↓ feeds back to
                    TrigoRL Pipeline
```
### Directory Structure
```
trigo.cpp/
├── include/                         # Public C++ headers
│   ├── trigo_game.hpp               # 3D Go game engine
│   ├── trigo_coords.hpp             # ab0yz coordinate system
│   ├── trigo_game_utils.hpp         # Capture, Ko, territory
│   ├── board_shape_candidates.hpp   # Random board shape generation
│   ├── mcts.hpp                     # AlphaZero MCTS (value network)
│   ├── mcts_moc.hpp                 # Pure MCTS (random rollouts)
│   ├── self_play_policy.hpp         # Policy interfaces
│   ├── shared_model_inferencer.hpp  # ONNX Runtime wrapper
│   ├── prefix_tree_builder.hpp      # Tree attention
│   ├── tgn_tokenizer.hpp            # TGN tokenization
│   └── tgn_utils.hpp                # TGN generation utilities
├── src/                             # Implementation
│   ├── trigo_game.cpp
│   ├── shared_model_inferencer.cpp
│   ├── tgn_tokenizer.cpp
│   ├── prefix_tree_builder.cpp
│   └── self_play_generator.cpp      # Main CLI tool
├── tests/                           # Unit tests
│   ├── test_trigo_game.cpp
│   ├── test_mcts.cpp
│   ├── test_alphazero_mcts.cpp
│   ├── test_neural_policy_inference.cpp
│   └── ...
├── models/                          # Trained ONNX models
│   └── trained_shared/
│       ├── base_model.onnx
│       ├── policy_head.onnx
│       └── value_head.onnx
├── docs/                            # Documentation
│   └── PLAN.md                      # Development roadmap
├── CMakeLists.txt
└── README.md
```
## Performance
### C++ vs TypeScript MCTS Performance
Comprehensive benchmarking (December 2025) shows significant performance advantages:
| Implementation | Time per Move | Games per Minute | Speedup vs TypeScript |
|----------------|---------------|------------------|----------------------|
| **C++ CPU (MCTS)** | 280ms | 3.6 games/min | **6.59×** |
| **C++ GPU (MCTS)** | 335ms | 3.0 games/min | 5.51× |
| TypeScript (MCTS) | 1846ms | 0.65 games/min | 1× (baseline) |
**Key Findings:**
- **C++ is 5.5-6.6× faster** than TypeScript for MCTS self-play (GPU and CPU respectively, per the table above)
- **CPU outperforms GPU by 1.52×** for batch=1 MCTS workloads
- Can generate **10,000 games in 32.5 hours** on a single CPU
### Value Network vs Random Rollouts
AlphaZero-style MCTS with value network provides massive speedup over traditional rollouts:
| Implementation | Time per simulation | 50 simulations | 800 simulations |
|----------------|---------------------|----------------|-----------------|
| PureMCTS (rollouts) | 923ms | 46 seconds | 12+ minutes |
| MCTS (value network) | 3.6ms | 180ms | 2.9 seconds |
| **Speedup** | **255×** | **255×** | **255×** |
**Test Configuration:**
- Board: 5×5×1
- MCTS simulations: 50 per move
- Model: Dynamic ONNX shared architecture
- Hardware: Multi-core CPU + RTX 3090 (24GB)
### Why CPU is Faster Than GPU for MCTS
For batch=1 MCTS workloads, CPU shows better performance due to:
- **Kernel launch overhead**: ~100-150μs per GPU call dominates small inference
- **Memory transfers**: 7 additional Memcpy operations for GPU
- **Underutilization**: GPU cores 99% idle with batch=1
- **Operator fallback**: Some operators fall back to CPU
**Recommendation:**
- ✅ Use CPU for MCTS self-play (batch=1)
- ✅ Use GPU for training (batch=256+)
- ✅ Future: Batch MCTS leaf evaluation for GPU (64-256 positions simultaneously)
### Production Capacity
**Single CPU Performance:**
- 7.7 games per minute (MCTS, 50 simulations/move)
- 10,000 games in 32.5 hours
- Ready for large-scale RL training pipelines
## Implementation Status
### ✅ Phase 1: Model Inference - COMPLETE
- ✅ `SharedModelInferencer` - ONNX Runtime with shared base model
- ✅ `TGNTokenizer` - Compatible with Python training tokenizer
- ✅ `PrefixTreeBuilder` - Tree attention support
- ✅ ONNX models can be loaded and run
- ✅ Model format: 3-model architecture (base + policy_head + value_head)
### ✅ Phase 2: Game Engine - COMPLETE
- ✅ `TrigoGame` - Complete 3D Go engine
- ✅ `trigo_coords.hpp` - ab0yz coordinate encoding
- ✅ `trigo_game_utils.hpp` - Capture, Ko, territory
- ✅ `tgn_utils.hpp` - Shared TGN generation
- ✅ Cross-language validation (100/100 games vs TypeScript)
### ✅ Phase 3: MCTS Algorithm - COMPLETE
- ✅ PureMCTS with random rollouts (`include/mcts_moc.hpp`)
  - UCB1 selection, tree expansion, backpropagation working
  - Reference implementation for validation
  - Performance: ~923ms per simulation
- ✅ AlphaZero-style MCTS with value network (`include/mcts.hpp`)
  - Uses `SharedModelInferencer::value_inference()` for evaluation
  - PUCT formula for exploration
  - **Performance: 255× speedup** (~3.6ms per simulation)
  - Production-ready implementation
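For reference, the UCB1 rule used by rollout-based MCTS can be written in a few lines. This is a sketch of the textbook formula (mean reward plus `c * sqrt(ln(N_parent) / n)`), not a copy of `mcts_moc.hpp`; the function name `ucb1Score` and the default constant `sqrt(2)` are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// UCB1 score: mean reward plus an exploration bonus.
// Unvisited children score +infinity so each one is tried at least once.
double ucb1Score(double rewardSum, int visits, int parentVisits, double c = std::sqrt(2.0))
{
    assert(parentVisits > 0 && visits >= 0);
    if (visits == 0)
        return std::numeric_limits<double>::infinity();
    const double mean = rewardSum / visits;
    const double bonus = c * std::sqrt(std::log(static_cast<double>(parentVisits)) / visits);
    return mean + bonus;
}
```

Unlike PUCT, UCB1 has no policy prior, so exploration is driven purely by visit counts; each evaluation in the pure variant then plays a full random rollout, whereas the AlphaZero variant queries the value network once.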
### Phase 4: GPU Acceleration - FUTURE
- Planned: CUDA MCTS kernels for parallel tree operations
- Planned: Batched neural network inference
- Target: 50-100 games/sec on GPU
## Validation
The implementation is validated against the TypeScript golden reference at `trigoRL/third_party/trigo/trigo-web/`.
**Validation Results**:
- ✅ 100/100 games match TypeScript implementation
- ✅ All moves legal (capture, Ko, suicide rules)
- ✅ Territory scoring matches
- ✅ TGN format parseable by TGNValueDataset
- ✅ Games terminate correctly
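Capture and suicide checks both reduce to counting the liberties of a connected group. The sketch below shows that idea on a dense 3D grid, assuming stones connect along the six axis-aligned neighbours; the `Board`, `Point`, and `countLiberties` names are hypothetical and this is not the `TrigoGame` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

enum class Stone { Empty, Black, White };

struct Point { int x, y, z; };

// Minimal dense 3D board (hypothetical; not the repository's TrigoGame).
struct Board
{
    int sx, sy, sz;
    std::vector<Stone> cells;

    Board(int x, int y, int z)
        : sx(x), sy(y), sz(z), cells(static_cast<std::size_t>(x) * y * z, Stone::Empty) {}
    bool inside(int x, int y, int z) const
    {
        return x >= 0 && x < sx && y >= 0 && y < sy && z >= 0 && z < sz;
    }
    int index(int x, int y, int z) const { return (z * sy + y) * sx + x; }
    Stone& at(int x, int y, int z) { return cells[index(x, y, z)]; }
    Stone at(int x, int y, int z) const { return cells[index(x, y, z)]; }
};

// Count distinct empty points adjacent to the group containing (x, y, z).
int countLiberties(const Board& b, int x, int y, int z)
{
    const Stone color = b.at(x, y, z);
    assert(color != Stone::Empty);
    static const int kDirs[6][3] = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};

    std::set<int> visited, liberties;
    std::vector<Point> stack;
    stack.push_back(Point{x, y, z});
    visited.insert(b.index(x, y, z));

    // Depth-first flood fill over same-colored neighbours.
    while (!stack.empty())
    {
        const Point p = stack.back();
        stack.pop_back();
        for (const auto& d : kDirs)
        {
            const int nx = p.x + d[0], ny = p.y + d[1], nz = p.z + d[2];
            if (!b.inside(nx, ny, nz))
                continue;
            const int idx = b.index(nx, ny, nz);
            if (b.at(nx, ny, nz) == Stone::Empty)
                liberties.insert(idx);
            else if (b.at(nx, ny, nz) == color && visited.insert(idx).second)
                stack.push_back(Point{nx, ny, nz});
        }
    }
    return static_cast<int>(liberties.size());
}
```

A group with zero liberties after a move is captured; a move that leaves its own group with zero liberties without capturing anything would be suicide. On a board with depth 1 the same code degenerates to ordinary 2D Go adjacency.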
## Integration with TrigoRL Training
### Data Flow
1. **TrigoRL** trains models → exports `.onnx` files
2. **trigo.cpp** loads `.onnx` → runs self-play → generates `.tgn` files
3. **TrigoRL** loads `.tgn` files → continues training (iterative improvement)
### Model Format
The project uses a 3-model architecture:
- `base_model.onnx` - Shared transformer base
- `policy_head.onnx` - Policy network (move prediction)
- `value_head.onnx` - Value network (position evaluation)
Models are exported from TrigoRL using `exportOnnx.py`.
## Development
### Building Tests
```bash
cd build
# Build specific test
make test_trigo_game
# Run test
./test_trigo_game
```
### Available Tests
- `test_trigo_game` - Game engine validation
- `test_trigo_coords` - Coordinate system
- `test_trigo_game_utils` - Go rules (capture, Ko)
- `test_mcts` - Pure MCTS implementation
- `test_alphazero_mcts` - AlphaZero MCTS performance
- `test_neural_policy_inference` - Neural policy
- `test_tgn_consistency` - TGN format validation
- `test_game_replay` - Cross-language validation
### Code Style
- C++17 standard
- Modern C++ (curly braces on standalone lines, tab indentation)
- Comprehensive comments
- DRY principle (avoid code duplication)
## Documentation
- [Development Plan](docs/PLAN.md) - Roadmap and implementation status
- [Model Inference](docs/research/MODEL_INFERENCE.md) - ONNX Runtime integration
- [CUDA Inference](docs/research/CUDA_INFERENCE.md) - GPU acceleration research
- [Validation Report](docs/research/VALIDATION_REPORT.md) - Cross-language validation
## References
- [Trigo Game Rules](https://github.com/k-l-lambda/trigo)
- [TrigoRL Training Pipeline](../trigoRL)
- [AlphaZero Paper](https://arxiv.org/abs/1712.01815)
- [ONNX Runtime](https://onnxruntime.ai/)
## License
[Specify license]
---
**Project Scope**: C++/CUDA tools for Trigo game engine and MCTS self-play generation
**Goal**: Provide high-performance tools for TrigoRL training pipeline
**Status**: Phases 1-3 Complete - Production-ready self-play generation with AlphaZero MCTS