https://github.com/k-l-lambda/trigorl

An experimental reinforcement learning project based on the game of Trigo.

Host: GitHub
URL: https://github.com/k-l-lambda/trigorl
Owner: k-l-lambda
Created: 2025-11-11T02:25:41.000Z (8 months ago)
Default Branch: main
Last Pushed: 2026-01-15T09:29:03.000Z (5 months ago)
Last Synced: 2026-01-15T09:32:10.037Z (5 months ago)
Language: Python
Size: 2.43 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# TrigoRL

A reinforcement learning laboratory project for training AI agents to play Trigo, a 3D variant of the board game Go.

## Overview

TrigoRL is an experimental platform for exploring reinforcement learning techniques in the context of **Trigo**,
a strategic board game that extends the rules of Go into three-dimensional space.
While traditional Go is played on a 2D 19×19 board, Trigo is played on a cubic grid,
introducing new strategic dimensions and complexity.

## About Trigo

Trigo is a modern reimplementation of a 3D Go variant with the following characteristics:

- **Board**: 3D cubic grid (default: 5×5×5, configurable to other dimensions including 2D boards)
- **Rules**: Based on Go mechanics adapted for 3D space
  - Stone placement with capture detection
  - Ko rule enforcement
  - Territory calculation in 3D
  - Pass, undo/redo, and resignation support
- **Notation**: TGN (Trigo Game Notation) - a PGN-inspired text format for recording games
- **Coordinate System**: Center-symmetric notation (e.g., `000` = center, `aaa` = corner)
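
Capture detection carries over from 2D Go by changing the neighborhood. The following is a minimal sketch, assuming each point has up to six orthogonal neighbors (±x, ±y, ±z) and that a group with no empty neighbor is captured. The dictionary-based board and the function names are hypothetical and are not taken from the Trigo engine.

```python
from typing import Dict, Iterator, Set, Tuple

Point = Tuple[int, int, int]

# Assumption: six orthogonal neighbors per point, the 3D analogue of Go's four.
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def neighbors(p: Point, shape: Point) -> Iterator[Point]:
    """Yield the on-board orthogonal neighbors of point p."""
    for dx, dy, dz in OFFSETS:
        q = (p[0] + dx, p[1] + dy, p[2] + dz)
        if all(0 <= q[i] < shape[i] for i in range(3)):
            yield q


def group_and_liberties(stones: Dict[Point, str], start: Point, shape: Point):
    """Flood-fill the group containing `start`; return (group, liberties)."""
    color = stones[start]
    group: Set[Point] = {start}
    liberties: Set[Point] = set()
    stack = [start]
    while stack:
        p = stack.pop()
        for q in neighbors(p, shape):
            if q not in stones:
                liberties.add(q)          # empty neighbor: a liberty
            elif stones[q] == color and q not in group:
                group.add(q)              # same-colored neighbor joins the group
                stack.append(q)
    return group, liberties


shape = (5, 5, 5)

# A lone stone in a corner has three liberties.
stones = {(0, 0, 0): "B"}
_, libs = group_and_liberties(stones, (0, 0, 0), shape)
print(len(libs))

# Occupy all three neighbors with the opponent: no liberties remain.
stones.update({(1, 0, 0): "W", (0, 1, 0): "W", (0, 0, 1): "W"})
_, libs = group_and_liberties(stones, (0, 0, 0), shape)
print(len(libs))
```

Setting one dimension of `shape` to 1 reduces the same code to an ordinary 2D board, which matches the configurable board sizes mentioned above.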

**TRY IT YOURSELF ONLINE**: here is a [Trigo demo page](https://huggingface.co/spaces/k-l-lambda/trigo).

## Quick Start

### Inspect Dataset

View and validate the TGNDataset:

```bash
# View dataset statistics
python tools/view_dataset.py configs/training/trigo-gpt2.yaml --stats

# Validate dataset implementation
python tools/view_dataset.py configs/training/trigo-gpt2.yaml --validate

# View a specific sample
python tools/view_dataset.py configs/training/trigo-gpt2.yaml --sample 0 --tokens
```

See [tools/README.md](tools/README.md) for comprehensive CLI documentation.

### Training Models

Train language models from scratch or resume from checkpoints:

```bash
# Start new training from scratch
python train_lm.py configs/training/trigo-gpt2.yaml

# Start with config overrides
python train_lm.py configs/training/trigo-gpt2.yaml training.epochs=50 training.learning_rate=5e-5

# Resume from a specific checkpoint by overriding training.resume_from
python train_lm.py configs/training/trigo-gpt2.yaml training.resume_from=outputs/trigor/20251113-trigo-gpt2/checkpoints/best.chkpt

# Resume from experiment directory (automatically loads latest checkpoint)
python train_lm.py outputs/trigor/20251113-trigo-gpt2

# Resume with config overrides (useful for fine-tuning)
python train_lm.py outputs/trigor/20251113-trigo-gpt2 training.learning_rate=1e-5 training.epochs=100
```
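
The `key.path=value` arguments in these commands are dotted overrides applied on top of the YAML config. How `train_lm.py` parses them is not shown in this README; the following is a standard-library sketch of the general idea, with hypothetical function names and deliberately simplified value parsing.

```python
import ast
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Interpret literals such as 50, 5e-5 or true; fall back to the raw string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("null", "none"):
        return None
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return text  # e.g. a checkpoint path


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'a.b.c=value' overrides to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return config


# Hypothetical starting values; only the override strings come from the commands above.
config = {"training": {"epochs": 10, "learning_rate": 1e-4}}
apply_overrides(config, ["training.epochs=50", "training.learning_rate=5e-5"])
print(config)
```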

**Available training configs**:
- `trigo-gpt2.yaml` - GPT-2 with standard multi-head attention
- `trigo-llama.yaml` - LLaMA with grouped query attention (GQA)
- `trigo-rwkv.yaml` - RWKV with linear attention
- `trigo-gpt2-invsqrt.yaml` - GPT-2 with inverse square root scheduler
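
An inverse square root schedule usually warms up linearly and then decays in proportion to `1/sqrt(step)`. The sketch below shows that common form. The warmup length, the base learning rate, and the exact formula used by `trigo-gpt2-invsqrt.yaml` are assumptions, not values read from the repository.

```python
import math


def inv_sqrt_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup to base_lr, then decay proportional to 1/sqrt(step)."""
    step = max(step, 1)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * math.sqrt(warmup_steps / step)


# Hypothetical settings for illustration only.
for step in (1, 500, 1000, 4000, 16000):
    print(step, inv_sqrt_lr(step, base_lr=5e-5, warmup_steps=1000))
```

With this form the learning rate halves each time the step count quadruples after warmup, which decays more gently than a cosine or linear schedule over long runs.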

**Resume training options**:
1. **From experiment directory**: `python train_lm.py outputs/trigor/[experiment-dir]`
   - Automatically loads `checkpoints/latest.chkpt`
   - Preserves all previous config settings
   - Continues wandb logging to the same run (if wandb enabled)

2. **From specific checkpoint**: Set `training.resume_from` in the config or pass it as an override:
   ```yaml
   training:
     resume_from: path/to/checkpoint.chkpt  # null = train from scratch
   ```
   - Can use `best.chkpt`, `latest.chkpt`, or any epoch checkpoint
   - Restores model weights, optimizer state, and training progress
   - Useful for transfer learning or fine-tuning
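
The two entry points differ only in what the first argument points at. A minimal sketch of that dispatch, assuming the experiment directory layout listed under "Training outputs" in this README; `resolve_run` is a hypothetical helper, not a function from `train_lm.py`.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_run(arg: str) -> Tuple[Path, Optional[Path]]:
    """Return (config_path, checkpoint_path) for a config file or experiment dir.

    Assumed layout: <experiment-dir>/config.yaml and
    <experiment-dir>/checkpoints/latest.chkpt.
    """
    path = Path(arg)
    if path.is_dir():
        # Experiment directory: reuse its saved config and latest checkpoint.
        return path / "config.yaml", path / "checkpoints" / "latest.chkpt"
    # Plain config file: no checkpoint unless training.resume_from is set.
    return path, None


# A config file path yields no checkpoint.
print(resolve_run("configs/training/trigo-gpt2.yaml"))

# A directory yields the saved config and the latest checkpoint inside it.
with tempfile.TemporaryDirectory() as experiment_dir:
    print(resolve_run(experiment_dir))
```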

**Training outputs**:
- `outputs/trigor/[experiment-id]/config.yaml` - Saved configuration
- `outputs/trigor/[experiment-id]/train.log` - Training logs
- `outputs/trigor/[experiment-id]/checkpoints/` - Model checkpoints
  - `best.chkpt` - Best model (based on validation metric)
  - `latest.chkpt` - Latest model (for resuming)
  - `epoch_N.chkpt` - Periodic checkpoints

### Test Models

Run the model test suite:

```bash
python tests/test_models.py
```

This validates:
- Model registry with 4 CausalLM models
- Configuration loading (dict and OmegaConf)
- Forward passes for GPT-2, LLaMA, and RWKV
- Parameter counting and memory estimation
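
Parameter counts for a GPT-2 style decoder can also be estimated by hand. The sketch below assumes the standard GPT-2 block layout with biases and tied input/output embeddings. The vocabulary size of 259 comes from this README; reading 6 layers and width 64 from the experiment name `l6-d64` is an assumption, and the context length of 512 is a hypothetical value.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab_size: int, n_ctx: int) -> int:
    """Estimate parameters of a GPT-2 style model with tied embeddings."""
    embeddings = vocab_size * d_model + n_ctx * d_model  # token + position tables
    attention = 4 * d_model * d_model + 4 * d_model      # QKV and output projection, with biases
    mlp = 8 * d_model * d_model + 5 * d_model            # 4x hidden expansion, with biases
    layer_norms = 4 * d_model                            # two LayerNorms per block
    final_norm = 2 * d_model
    return embeddings + n_layer * (attention + mlp + layer_norms) + final_norm


def weight_memory_bytes(n_params: int, bytes_per_param: int = 4) -> int:
    """Memory for the weights alone (4 bytes per parameter for float32)."""
    return n_params * bytes_per_param


n_params = gpt2_param_count(n_layer=6, d_model=64, vocab_size=259, n_ctx=512)
print(n_params, weight_memory_bytes(n_params) / 2**20, "MiB")
```

This counts weights only. Training needs several times more memory for gradients, optimizer state, and activations.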

### Verify Configurations

Test all training configs:

```bash
python examples/verify_training_configs.py
```

### Export Models to ONNX

Export trained models for cross-platform deployment:

```bash
# Export best checkpoint (default - standard inference mode)
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt

# Export in evaluation mode with fixed dimensions
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
    --evaluation-mode --prefix-len 10 --seq-len 15

# Export evaluation mode with dynamic dimensions (prefix-len/seq-len only for dummy input)
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
    --evaluation-mode --dynamic-seq

# Export with INT8 quantization (recommended for deployment)
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
    --quantize --quant-type int8

# Export with dynamic batch/sequence sizes
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
    --dynamic-batch --dynamic-seq

# Export with static quantization (best accuracy)
python exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \
    --quantize --quant-method static --calibration-samples 200
```

**Export Modes**:
- **Standard mode**: Single input `input_ids`, returns `logits` for all positions
- **Evaluation mode**: Three inputs (`prefix_ids`, `evaluated_ids`, `evaluated_mask`), returns logits for last prefix + evaluated positions. Supports custom attention patterns like tree attention for computing sequence probabilities.
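
Tree attention lets several candidate continuations share one prefix in a single forward pass: each evaluated token attends to the prefix and to its own ancestors, but not to tokens on sibling branches. The sketch below builds such a mask from parent indices. The parent-array encoding and the mask layout are assumptions for illustration and may differ from the `evaluated_mask` input the exported model expects.

```python
from typing import List


def tree_attention_mask(parents: List[int]) -> List[List[int]]:
    """mask[i][j] == 1 when evaluated token i may attend to evaluated token j.

    parents[i] is the index of token i's parent, or -1 if token i follows
    the prefix directly. Parents must appear before their children.
    """
    n = len(parents)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:          # walk up to the root, marking self and ancestors
            mask[i][j] = 1
            j = parents[j]
    return mask


# Tokens 0 and 1 are two candidates branching from the prefix;
# token 2 continues candidate 0 and token 3 continues candidate 1.
for row in tree_attention_mask([-1, -1, 0, 1]):
    print(row)
```

A plain chain (`parents = [-1, 0, 1, ...]`) reduces to the usual lower-triangular causal mask, so ordinary sequence scoring is a special case.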

**Quantization benefits**:
- **INT8 dynamic**: ~3-4x smaller model, minimal accuracy loss
- **INT4**: ~8x smaller, more aggressive compression
- **Static quantization**: Better accuracy than dynamic, requires calibration
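
The size reduction follows from storage width: moving weights from 32-bit floats to 8-bit integers gives at most a 4x reduction, and parts of a model that stay in floating point bring the practical figure below that. The sketch below shows symmetric per-tensor INT8 quantization as a general illustration. It is not the exact scheme applied by `exportOnnx.py` or by ONNX Runtime, and the weight values are made up.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integers and the scale."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 1.0]   # illustrative values only
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q, scale)
# The round-trip error per weight is bounded by half a quantization step.
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Static quantization additionally fixes the scales for activations ahead of time from calibration data, which is why the export command for it takes a number of calibration samples.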

See `docs/onnx_quantization_guide.md` for comprehensive quantization documentation.

## Technical Stack

### Reinforcement Learning Framework

- **PyTorch**: Deep learning framework for model implementation
- **Transformers**: Architecture foundation for the RL agent (GPT-2, LLaMA, RWKV, xLSTM)
- **Weights & Biases (wandb)**: Training metrics and experiment tracking
- **ONNX**: Model weight export format for cross-platform deployment
- **OmegaConf/Hydra**: Hierarchical configuration management

### Current Implementation Status

✅ **Data Pipeline**
- TGNDataset: PyTorch dataset for TGN files with byte-level tokenization
- TGNByteTokenizer: 259-token vocab (256 bytes + PAD/START/END)
- Configuration-driven dataset loading
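
A byte-level tokenizer needs no learned vocabulary: every UTF-8 byte is its own token, plus a few special tokens. The sketch below reproduces the 259-token layout described above. The specific ids chosen for PAD/START/END and the function names are assumptions, not the actual `TGNByteTokenizer` API, and the sample string is a placeholder rather than valid TGN.

```python
from typing import List

# Assumption: the three special tokens take the ids right after the 256 byte values.
PAD_ID, START_ID, END_ID = 256, 257, 258
VOCAB_SIZE = 259


def encode(text: str, max_len: int) -> List[int]:
    """UTF-8 bytes wrapped in START/END, then truncated or right-padded with PAD."""
    ids = [START_ID] + list(text.encode("utf-8")) + [END_ID]
    ids = ids[:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def decode(ids: List[int]) -> str:
    """Drop special tokens and decode the remaining bytes."""
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")


ids = encode("aaa 000", max_len=12)
print(ids)
print(decode(ids))
```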

✅ **Model Architecture**
- 4 CausalLM models: GPT2, LLaMA (with GQA), RWKV (linear attention), xLSTM
- Model registry with factory pattern
- OmegaConf integration for flexible configuration
- Parameter counting and memory footprint estimation
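
A model registry with a factory function lets a config select an architecture by name. The following sketch shows the general pattern using a decorator. All names here (`register_model`, `create_model`, the placeholder class) are hypothetical and do not reflect the repository's actual registry.

```python
from typing import Callable, Dict

MODEL_REGISTRY: Dict[str, Callable[..., object]] = {}


def register_model(name: str):
    """Class decorator that records a model constructor under `name`."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f"model {name!r} is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model("gpt2")
class PlaceholderGPT2:
    """Stand-in for a real model class; it only stores its hyperparameters."""

    def __init__(self, n_layer: int = 6, d_model: int = 64):
        self.n_layer = n_layer
        self.d_model = d_model


def create_model(name: str, **kwargs) -> object:
    """Factory: look up a registered constructor and call it with config values."""
    try:
        factory = MODEL_REGISTRY[name]
    except KeyError:
        available = sorted(MODEL_REGISTRY)
        raise ValueError(f"unknown model {name!r}; available: {available}") from None
    return factory(**kwargs)


model = create_model("gpt2", n_layer=4)
print(type(model).__name__, model.n_layer, model.d_model)
```

Adding a new architecture then only requires defining and decorating its class; the training script and configs do not need to change.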

✅ **Training Configuration**
- Complete YAML configs for all 4 models
- Hyperparameters tuned for each architecture
- WandB integration (optional)
- Checkpointing and learning rate scheduling

✅ **Development Tools**
- CLI tool for dataset inspection and validation
- Model testing suite (109 tests passing)
- Configuration verification scripts

✅ **Model Export**
- ONNX export script with checkpoint loading
- INT8/INT4 quantization (dynamic and static)
- 3-4x model compression with minimal accuracy loss
- Node.js inference validation and testing

## Development Roadmap

The RL framework is being built from the following components. Struck-through items are complete; the rest are in progress or planned:

1. ~~**Data Pipeline**~~ ✅ COMPLETE
   - ~~TGNDataset implementation with byte tokenization~~
   - ~~Dataset configuration and loading~~
   - ~~Validation and inspection tools~~

2. ~~**Model Architecture**~~ ✅ COMPLETE
   - ~~Transformer-based CausalLM implementations~~
   - ~~Model registry and factory pattern~~
   - ~~Configuration management~~

3. **Training Pipeline** 🚧 IN PROGRESS
   - Training loop implementation
   - Self-play game generation
   - Experience replay buffer
   - Policy gradient or actor-critic implementation
   - Integration with Weights & Biases for experiment tracking

4. **Environment Wrapper** 📋 PLANNED
   - Python interface to the Trigo game engine
   - OpenAI Gym-compatible environment
   - State representation for 3D board positions
   - Action space definition

5. ~~**Model Export**~~ ✅ COMPLETE
   - ~~ONNX conversion utilities~~
   - ~~INT8/INT4 quantization support~~
   - ~~Static and dynamic quantization~~
   - ~~Node.js inference validation~~

6. **Evaluation & Analysis** 📋 PLANNED
   - Agent performance metrics
   - Game quality assessment
   - Visualization tools

## Game Engine Features

The Trigo game engine provides:

- **3D Visualization**: Interactive Three.js-based board rendering
- **Multiplayer Support**: Real-time gameplay via WebSocket
- **Game Notation**: TGN format for saving and loading games
- **REST API**: Programmatic game control
- **Comprehensive Testing**: 10 test suites covering core functionality

For detailed API documentation, see:
- [Game Engine README](third_party/trigo/README.md)
- [TGN Format Specification](third_party/trigo/docs/tgn-format-spec.md)
- [Development Guidelines](third_party/trigo/CLAUDE.md)

## Acknowledgments

- Based on the Trigo game engine by k-l-lambda
- Inspired by AlphaGo and other game-playing RL systems
