https://github.com/k-l-lambda/trigo.cpp

High-performance C++/CUDA implementation of Monte Carlo Tree Search for Trigo (3D Go) training data generation.

# Trigo.cpp - High-Performance C++ Tools for Trigo AI

C++/CUDA inference and self-play tools for [Trigo](https://github.com/k-l-lambda/trigo) (3D Go). Provides ONNX Runtime-based neural network inference, AlphaZero-style MCTS, and high-performance self-play data generation for the [TrigoRL training pipeline](../trigoRL).

## Overview

This project implements production-ready tools for Trigo AI development:

**Key Features**:
- 🚀 **ONNX Runtime Integration**: CPU and GPU inference with trained models
- 🎯 **AlphaZero MCTS**: Value network evaluation (255× faster than random rollouts)
- 🔧 **Self-Play Generator**: Command-line tool for training data generation
- 🎲 **Random Board Selection**: 220 candidate shapes (2D and 3D) for diverse training
- ✅ **Cross-Language Validation**: 100% compatibility with TypeScript reference
- 📦 **Multiple Policies**: Random, Neural, Pure MCTS, AlphaZero MCTS
- 📊 **TGN Format**: Compatible with TrigoRL training pipeline

## Quick Start

### Prerequisites

- CMake 3.18+
- GCC 9+ or Clang 10+
- CUDA Toolkit 11.0+ (optional, for GPU inference)
- ONNX Runtime 1.17.0+ (provided in repository)

### Build

```bash
# Enter the cloned repository
cd /path/to/trigo.cpp

# Create build directory
mkdir build && cd build

# Configure and build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# Run tests
./test_trigo_game
./test_alphazero_mcts
```

### Usage

#### Self-Play Data Generation

**Generate games with random board shapes (recommended for training):**
```bash
# Random board selection from 220 candidates (2D: 2-13×1-13×1, 3D: 2-5×2-5×2-5)
# This creates a diverse dataset covering various board sizes
export TRIGO_FORCE_CPU=1

./self_play_generator \
--num-games 100 \
--random-board \
--black-policy mcts \
--white-policy mcts \
--model ../models/trained_shared \
--output /path/to/data/mcts_games \
--seed 42

# With custom board ranges (e.g., 2D boards from 3×3 to 9×9 plus 2-layer 3D boards)
./self_play_generator \
--num-games 100 \
--random-board \
--board-ranges "3-9x3-9x1-1,2-3x2-3x2-2" \
--black-policy mcts \
--white-policy mcts \
--model ../models/trained_shared \
--output /path/to/data/mcts_games
```

**Generate games with fixed board size:**
```bash
# AlphaZero-style MCTS with value network on 5×5×5 board
# Force CPU for best performance (1.52× faster than GPU for batch=1 MCTS)
export TRIGO_FORCE_CPU=1

./self_play_generator \
--num-games 100 \
--board 5x5x5 \
--black-policy mcts \
--white-policy mcts \
--model ../models/trained_shared \
--output /path/to/data/mcts_games \
--seed 42

# With custom MCTS parameters
./self_play_generator \
--num-games 100 \
--board 5x5x5 \
--black-policy mcts \
--white-policy mcts \
--model ../models/trained_shared \
--mcts-simulations 50 \
--mcts-c-puct 1.5 \
--output /path/to/data/mcts_games
```

**Generate games with neural policy (faster, less exploration):**
```bash
./self_play_generator \
--num-games 1000 \
--board 5x5x5 \
--black-policy neural \
--white-policy neural \
--model ../models/trained_shared \
--output /path/to/data/neural_games
```

**Generate baseline games with random policy:**
```bash
# Random policy with random board shapes
./self_play_generator \
--num-games 10000 \
--random-board \
--black-policy random \
--white-policy random \
--output /path/to/data/random_games \
--seed 42

# Random policy with fixed board
./self_play_generator \
--num-games 10000 \
--board 5x5x5 \
--black-policy random \
--white-policy random \
--output /path/to/data/random_games \
--seed 42
```

#### Board Shape Options

The generator supports two modes for board shape selection:

**Fixed Board (--board):**
```bash
--board 5x5x5 # Fixed 5×5×5 board for all games
--board 9x9x1 # Fixed 9×9×1 (2D) board for all games
--board 13x13x1 # Fixed 13×13×1 (2D) board, a standard small Go size
```

**Random Board (--random-board):**
```bash
--random-board # Randomly select from 220 candidate shapes per game
```

The random board mode uses default ranges:
- **2D boards**: 2-13×1-13×1 (156 shapes)
- **3D boards**: 2-5×2-5×2-5 (64 shapes)
- **Total**: 220 candidate shapes

**Custom Board Ranges (--board-ranges):**

You can specify custom ranges with `--board-ranges` (requires `--random-board`):

```bash
# Format: "minX-maxXxminY-maxYxminZ-maxZ,..."
--random-board --board-ranges "2-13x1-13x1-1,2-5x2-5x2-5" # Default (220 shapes)
--random-board --board-ranges "3-9x3-9x1-1" # Small 2D boards only
--random-board --board-ranges "2-3x2-3x2-3" # Tiny 3D boards only
--random-board --board-ranges "5-5x5-5x5-5,9-9x9-9x1-1" # Mix of 5×5×5 and 9×9
```

**Range Format**: `minX-maxXxminY-maxYxminZ-maxZ`
- Multiple ranges can be comma-separated
- Each range generates all combinations within bounds
- Example: `2-3x2-3x1-1` generates: 2×2×1, 2×3×1, 3×2×1, 3×3×1 (4 shapes)

Random board selection is recommended for training diverse models that generalize across board sizes.
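
The range format above can be expanded mechanically. The following is a minimal C++ sketch, not the code in `board_shape_candidates.hpp`: the names `AxisRange`, `Shape`, `enumerate_shapes`, and `pick_shape` are hypothetical, and it assumes that overlapping ranges are not de-duplicated and that each game draws one shape uniformly at random.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
struct AxisRange
{
	int min;
	int max;  // inclusive
};

using Shape = std::array<int, 3>;  // {x, y, z}

// Parse one "min-max" axis range; throws std::invalid_argument on bad input.
inline AxisRange parse_axis(const std::string& text)
{
	const auto dash = text.find('-');
	if (dash == std::string::npos)
		throw std::invalid_argument("expected min-max, got: " + text);
	AxisRange range{std::stoi(text.substr(0, dash)), std::stoi(text.substr(dash + 1))};
	if (range.min < 1 || range.max < range.min)
		throw std::invalid_argument("invalid range: " + text);
	return range;
}

// Expand "minX-maxXxminY-maxYxminZ-maxZ,..." into every shape it covers.
inline std::vector<Shape> enumerate_shapes(const std::string& spec)
{
	std::vector<Shape> shapes;
	std::stringstream ranges(spec);
	std::string range;
	while (std::getline(ranges, range, ','))
	{
		std::stringstream axes(range);
		std::string axis;
		std::vector<AxisRange> parsed;
		while (std::getline(axes, axis, 'x'))
			parsed.push_back(parse_axis(axis));
		if (parsed.size() != 3)
			throw std::invalid_argument("expected three axes in: " + range);
		for (int x = parsed[0].min; x <= parsed[0].max; ++x)
			for (int y = parsed[1].min; y <= parsed[1].max; ++y)
				for (int z = parsed[2].min; z <= parsed[2].max; ++z)
					shapes.push_back(Shape{x, y, z});
	}
	return shapes;
}

// Draw one candidate uniformly at random (assumed selection rule).
inline Shape pick_shape(const std::vector<Shape>& shapes, std::mt19937& rng)
{
	assert(!shapes.empty());
	std::uniform_int_distribution<std::size_t> index(0, shapes.size() - 1);
	return shapes[index(rng)];
}
```

With the default spec `2-13x1-13x1-1,2-5x2-5x2-5`, `enumerate_shapes` returns 156 + 64 = 220 shapes, which matches the counts listed above.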

**Parameter Rules**:
- `--board` and `--random-board` are mutually exclusive
- `--board-ranges` requires `--random-board`
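
These two rules are simple to check before any game starts. The sketch below is illustrative only: `BoardOptions` and `validate_board_options` are hypothetical names, not taken from `self_play_generator.cpp`, and the behavior when neither flag is given is left out because it is not specified here.

```cpp
#include <optional>
#include <string>

// Hypothetical container for the three board-related flags.
struct BoardOptions
{
	std::optional<std::string> board;         // value of --board, if given
	bool random_board = false;                // true when --random-board is present
	std::optional<std::string> board_ranges;  // value of --board-ranges, if given
};

// Returns an error message, or std::nullopt when the combination is allowed.
inline std::optional<std::string> validate_board_options(const BoardOptions& options)
{
	if (options.board && options.random_board)
		return std::string("--board and --random-board are mutually exclusive");
	if (options.board_ranges && !options.random_board)
		return std::string("--board-ranges requires --random-board");
	return std::nullopt;
}
```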

#### Policy Options

Available policy types:
- `random` - Random valid moves (fast, no model required)
- `neural` - Direct neural network inference (requires `--model`)
- `mcts` - AlphaZero MCTS with value network (requires `--model`)
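
As a rough sketch of how the three policy names could map onto a type and the `--model` requirement (the `PolicyKind` enum and both functions are hypothetical and do not reflect the actual interface in `self_play_policy.hpp`):

```cpp
#include <optional>
#include <string>

// Hypothetical enum mirroring the three documented policy names.
enum class PolicyKind
{
	Random,
	Neural,
	Mcts
};

// Map a --black-policy / --white-policy value to a PolicyKind.
// Returns std::nullopt for names that are not documented above.
inline std::optional<PolicyKind> parse_policy(const std::string& name)
{
	if (name == "random")
		return PolicyKind::Random;
	if (name == "neural")
		return PolicyKind::Neural;
	if (name == "mcts")
		return PolicyKind::Mcts;
	return std::nullopt;
}

// Only the random policy runs without --model.
inline bool requires_model(PolicyKind kind)
{
	return kind != PolicyKind::Random;
}
```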

#### MCTS Parameters

- `--mcts-simulations N` - Number of MCTS simulations per move (default: 50)
- `--mcts-c-puct F` - Exploration constant for PUCT formula (default: 1.5)
- `--mcts-temperature F` - Temperature for move selection (default: 1.0)
- `--mcts-dirichlet-alpha F` - Dirichlet noise alpha for root exploration (default: 0.3)
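
To show where `--mcts-c-puct` and `--mcts-temperature` enter the search, here is a minimal sketch of the standard AlphaZero formulas. It is not the code in `include/mcts.hpp`: the function names are hypothetical, and the exact variant used by this project may differ. Dirichlet root noise is not covered.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// PUCT score used to choose a child during selection:
//   Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))
inline double puct_score(double q_value, double prior, int parent_visits,
	int child_visits, double c_puct)
{
	const double exploration = c_puct * prior
		* std::sqrt(static_cast<double>(parent_visits)) / (1.0 + child_visits);
	return q_value + exploration;
}

// Move probabilities after search: pi(a) is proportional to N(a)^(1/T).
// T = 1 follows the visit counts; smaller T sharpens the distribution.
inline std::vector<double> visit_probabilities(const std::vector<int>& visits,
	double temperature)
{
	assert(temperature > 0.0);
	std::vector<double> probs(visits.size());
	double total = 0.0;
	for (std::size_t i = 0; i < visits.size(); ++i)
	{
		probs[i] = std::pow(static_cast<double>(visits[i]), 1.0 / temperature);
		total += probs[i];
	}
	if (total > 0.0)
	{
		for (double& p : probs)
			p /= total;
	}
	return probs;
}
```

A larger `c_puct` gives more weight to the policy prior and to rarely visited children, while a lower temperature makes move selection closer to always picking the most visited move.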

#### Model Path

The `--model` parameter should point to a directory containing the 3-model ONNX architecture:
```
models/trained_shared/
├── base_model.onnx      # Shared transformer base
├── policy_head.onnx     # Policy network
└── value_head.onnx      # Value network
```

Models are exported from TrigoRL using `exportOnnx.py`.
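
A quick pre-flight check of the model directory can catch a wrong `--model` path early. The sketch below uses only `std::filesystem`; `missing_model_files` is a hypothetical helper, not part of this project's API.

```cpp
#include <array>
#include <filesystem>
#include <string>
#include <vector>

// Returns the names of the expected ONNX files that are absent from model_dir.
// An empty result means all three files of the layout above are present.
inline std::vector<std::string> missing_model_files(const std::filesystem::path& model_dir)
{
	static constexpr std::array<const char*, 3> required = {
		"base_model.onnx", "policy_head.onnx", "value_head.onnx"};
	std::vector<std::string> missing;
	for (const char* name : required)
	{
		if (!std::filesystem::is_regular_file(model_dir / name))
			missing.emplace_back(name);
	}
	return missing;
}
```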

#### Performance Tips

**For Self-Play Generation:**
- Use `TRIGO_FORCE_CPU=1` for MCTS (CPU is 1.52× faster than GPU)
- MCTS with 50 simulations: ~280ms per move on CPU
- Can generate 10,000 games in 32.5 hours on a single CPU

**For GPU Inference:**
- GPU is recommended only for training with large batches (256+)
- Small batch sizes (batch=1) underutilize GPU parallelism
- GPU shows ~1.52× performance penalty for MCTS due to kernel launch overhead

## Architecture

### Component Stack

```
┌─────────────────────────────────────────────────────────────┐
│  Python Training Pipeline (TrigoRL) - SEPARATE PROJECT      │
│  ├─ PyTorch Model Training                                  │
│  ├─ ONNX Model Export (exportOnnx.py)                       │
│  ├─ Training Data Loading (.tgn files)                      │
│  └─ Weights & Biases Integration                            │
└─────────────────────────────────────────────────────────────┘
                           ↓ exports
                      ONNX Models (.onnx)
                           ↓ uses
┌─────────────────────────────────────────────────────────────┐
│  C++ Inference & Generation Tools (trigo.cpp) - THIS PROJECT│
│  ├─ SharedModelInferencer (ONNX Runtime + CUDA)             │
│  │  ├─ Policy Network Inference                             │
│  │  ├─ Value Network Inference                              │
│  │  └─ Prefix Tree Attention Builder                        │
│  ├─ TrigoGame (3D Go rules engine)                          │
│  │  ├─ Board State Management                               │
│  │  ├─ Move Validation                                      │
│  │  ├─ Capture & Ko Detection                               │
│  │  └─ Territory Calculation                                │
│  ├─ MCTS (Monte Carlo Tree Search)                          │
│  │  ├─ AlphaZero MCTS (PUCT, value network) - Production    │
│  │  └─ Pure MCTS (UCB1, random rollouts) - Reference        │
│  ├─ Self-Play Generator (data generation tool)              │
│  │  ├─ Random Board Selection (220 candidates)              │
│  │  ├─ RandomPolicy                                         │
│  │  ├─ NeuralPolicy (ONNX inference)                        │
│  │  ├─ MCTSPolicy (Pure MCTS)                               │
│  │  └─ TGN File Export                                      │
│  └─ Python Bindings (pybind11) [future]                     │
└─────────────────────────────────────────────────────────────┘
                           ↓ generates
                     Training Data (.tgn)
                           ↓ feeds back to
                       TrigoRL Pipeline
```

### Directory Structure

```
trigo.cpp/
├── include/                         # Public C++ headers
│   ├── trigo_game.hpp               # 3D Go game engine
│   ├── trigo_coords.hpp             # ab0yz coordinate system
│   ├── trigo_game_utils.hpp         # Capture, Ko, territory
│   ├── board_shape_candidates.hpp   # Random board shape generation
│   ├── mcts.hpp                     # AlphaZero MCTS (value network)
│   ├── mcts_moc.hpp                 # Pure MCTS (random rollouts)
│   ├── self_play_policy.hpp         # Policy interfaces
│   ├── shared_model_inferencer.hpp  # ONNX Runtime wrapper
│   ├── prefix_tree_builder.hpp      # Tree attention
│   ├── tgn_tokenizer.hpp            # TGN tokenization
│   └── tgn_utils.hpp                # TGN generation utilities
├── src/                             # Implementation
│   ├── trigo_game.cpp
│   ├── shared_model_inferencer.cpp
│   ├── tgn_tokenizer.cpp
│   ├── prefix_tree_builder.cpp
│   └── self_play_generator.cpp      # Main CLI tool
├── tests/                           # Unit tests
│   ├── test_trigo_game.cpp
│   ├── test_mcts.cpp
│   ├── test_alphazero_mcts.cpp
│   ├── test_neural_policy_inference.cpp
│   └── ...
├── models/                          # Trained ONNX models
│   └── trained_shared/
│       ├── base_model.onnx
│       ├── policy_head.onnx
│       └── value_head.onnx
├── docs/                            # Documentation
│   └── PLAN.md                      # Development roadmap
├── CMakeLists.txt
└── README.md
```

## Performance

### C++ vs TypeScript MCTS Performance

Comprehensive benchmarking (December 2025) shows significant performance advantages:

| Implementation | Time per Move | Games per Minute | Speedup vs TypeScript |
|----------------|---------------|------------------|----------------------|
| **C++ CPU (MCTS)** | 280ms | 3.6 games/min | **6.59×** |
| **C++ GPU (MCTS)** | 335ms | 3.0 games/min | 5.51× |
| TypeScript (MCTS) | 1846ms | 0.65 games/min | 1× (baseline) |

**Key Findings:**
- **C++ is 5.5-6.6× faster** than TypeScript for MCTS self-play (GPU and CPU rows of the table above, by time per move)
- **CPU outperforms GPU by 1.52×** for batch=1 MCTS workloads
- Can generate **10,000 games in 32.5 hours** on a single CPU

### Value Network vs Random Rollouts

AlphaZero-style MCTS with value network provides massive speedup over traditional rollouts:

| Implementation | Time per simulation | 50 simulations | 800 simulations |
|----------------|---------------------|----------------|-----------------|
| PureMCTS (rollouts) | 923ms | 46 seconds | 12+ minutes |
| MCTS (value network) | 3.6ms | 180ms | 2.9 seconds |
| **Speedup** | **255×** | **255×** | **255×** |

**Test Configuration:**
- Board: 5×5×1
- MCTS simulations: 50 per move
- Model: Dynamic ONNX shared architecture
- Hardware: Multi-core CPU + RTX 3090 (24GB)
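
The per-move figures in the table above follow from the per-simulation cost, assuming search time scales linearly with the number of simulations and ignoring tree bookkeeping overhead. A small sketch of that arithmetic (both helper names are hypothetical):

```cpp
// Estimated search time for one move, in milliseconds, assuming every
// simulation costs the same and nothing else contributes.
inline double move_time_ms(int simulations, double ms_per_simulation)
{
	return simulations * ms_per_simulation;
}

// Ratio of two timings: how many times faster the candidate is.
inline double speedup(double baseline_ms, double candidate_ms)
{
	return baseline_ms / candidate_ms;
}
```

For example, `move_time_ms(50, 3.6)` gives 180 ms and `move_time_ms(800, 3.6)` gives 2880 ms (about 2.9 seconds). `speedup(923.0, 3.6)` comes out at roughly 256 from these rounded timings, in line with the reported 255×.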

### Why CPU is Faster Than GPU for MCTS

For batch=1 MCTS workloads, CPU shows better performance due to:
- **Kernel launch overhead**: ~100-150μs per GPU call dominates small inference
- **Memory transfers**: 7 additional Memcpy operations for GPU
- **Underutilization**: GPU cores 99% idle with batch=1
- **Operator fallback**: Some operators fall back to CPU

**Recommendation:**
- ✅ Use CPU for MCTS self-play (batch=1)
- ✅ Use GPU for training (batch=256+)
- ✅ Future: Batch MCTS leaf evaluation for GPU (64-256 positions simultaneously)

### Production Capacity

**Single CPU Performance:**
- 7.7 games per minute (MCTS, 50 simulations/move)
- 10,000 games in 32.5 hours
- Ready for large-scale RL training pipelines

## Implementation Status

### ✅ Phase 1: Model Inference - COMPLETE

- ✅ `SharedModelInferencer` - ONNX Runtime with shared base model
- ✅ `TGNTokenizer` - Compatible with Python training tokenizer
- ✅ `PrefixTreeBuilder` - Tree attention support
- ✅ ONNX models can be loaded and run
- ✅ Model format: 3-model architecture (base + policy_head + value_head)

### ✅ Phase 2: Game Engine - COMPLETE

- ✅ `TrigoGame` - Complete 3D Go engine
- ✅ `trigo_coords.hpp` - ab0yz coordinate encoding
- ✅ `trigo_game_utils.hpp` - Capture, Ko, territory
- ✅ `tgn_utils.hpp` - Shared TGN generation
- ✅ Cross-language validation (100/100 games vs TypeScript)

### ✅ Phase 3: MCTS Algorithm - COMPLETE

- ✅ PureMCTS with random rollouts (`include/mcts_moc.hpp`)
  - UCB1 selection, tree expansion, backpropagation working
  - Reference implementation for validation
  - Performance: ~923ms per simulation
- ✅ AlphaZero-style MCTS with value network (`include/mcts.hpp`)
  - Uses `SharedModelInferencer::value_inference()` for evaluation
  - PUCT formula for exploration
  - **Performance: 255× speedup** (~3.6ms per simulation)
  - Production-ready implementation
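
For reference, the UCB1 rule used by rollout-based MCTS can be sketched as follows. This is the textbook formula, not a copy of `include/mcts_moc.hpp`: the function name, the default exploration constant of √2, and the choice to give unvisited children an infinite score are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// UCB1: mean reward plus an exploration bonus that shrinks as a child
// accumulates visits. Unvisited children are tried first.
inline double ucb1_score(double total_reward, int child_visits, int parent_visits,
	double exploration = std::sqrt(2.0))
{
	if (child_visits == 0)
		return std::numeric_limits<double>::infinity();
	assert(parent_visits >= child_visits);
	const double mean = total_reward / child_visits;
	const double bonus = exploration
		* std::sqrt(std::log(static_cast<double>(parent_visits)) / child_visits);
	return mean + bonus;
}
```

Unlike PUCT, UCB1 has no policy prior, so every legal move is explored on equal terms and each leaf is scored by a full random rollout. That is consistent with the much higher per-simulation cost reported for PureMCTS.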

### 🚧 Phase 4: GPU Acceleration - FUTURE

- Planned: CUDA MCTS kernels for parallel tree operations
- Planned: Batched neural network inference
- Target: 50-100 games/sec on GPU

## Validation

The implementation is validated against the TypeScript golden reference at `trigoRL/third_party/trigo/trigo-web/`.

**Validation Results**:
- ✅ 100/100 games match TypeScript implementation
- ✅ All moves legal (capture, Ko, suicide rules)
- ✅ Territory scoring matches
- ✅ TGN format parseable by TGNValueDataset
- ✅ Games terminate correctly

## Integration with TrigoRL Training

### Data Flow

1. **TrigoRL** trains models → exports `.onnx` files
2. **trigo.cpp** loads `.onnx` → runs self-play → generates `.tgn` files
3. **TrigoRL** loads `.tgn` files → continues training (iterative improvement)

### Model Format

The project uses a 3-model architecture:
- `base_model.onnx` - Shared transformer base
- `policy_head.onnx` - Policy network (move prediction)
- `value_head.onnx` - Value network (position evaluation)

Models are exported from TrigoRL using `exportOnnx.py`.

## Development

### Building Tests

```bash
cd build

# Build specific test
make test_trigo_game

# Run test
./test_trigo_game
```

### Available Tests

- `test_trigo_game` - Game engine validation
- `test_trigo_coords` - Coordinate system
- `test_trigo_game_utils` - Go rules (capture, Ko)
- `test_mcts` - Pure MCTS implementation
- `test_alphazero_mcts` - AlphaZero MCTS performance
- `test_neural_policy_inference` - Neural policy
- `test_tgn_consistency` - TGN format validation
- `test_game_replay` - Cross-language validation

### Code Style

- C++17 standard
- Modern C++ (curly braces on standalone lines, tab indentation)
- Comprehensive comments
- DRY principle (avoid code duplication)

## Documentation

- [Development Plan](docs/PLAN.md) - Roadmap and implementation status
- [Model Inference](docs/research/MODEL_INFERENCE.md) - ONNX Runtime integration
- [CUDA Inference](docs/research/CUDA_INFERENCE.md) - GPU acceleration research
- [Validation Report](docs/research/VALIDATION_REPORT.md) - Cross-language validation

## References

- [Trigo Game Rules](https://github.com/k-l-lambda/trigo)
- [TrigoRL Training Pipeline](../trigoRL)
- [AlphaZero Paper](https://arxiv.org/abs/1712.01815)
- [ONNX Runtime](https://onnxruntime.ai/)

## License

[Specify license]

---

**Project Scope**: C++/CUDA tools for Trigo game engine and MCTS self-play generation

**Goal**: Provide high-performance tools for TrigoRL training pipeline

**Status**: Phases 1-3 Complete - Production-ready self-play generation with AlphaZero MCTS