{"id":44975951,"url":"https://github.com/k-l-lambda/trigorl","last_synced_at":"2026-02-18T17:04:46.862Z","repository":{"id":323659829,"uuid":"1093914196","full_name":"k-l-lambda/trigoRL","owner":"k-l-lambda","description":"An experimental reinforcement learning project based on the game of Trigo.","archived":false,"fork":false,"pushed_at":"2026-01-15T09:29:03.000Z","size":2547,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-15T09:32:10.037Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/k-l-lambda.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-11T02:25:41.000Z","updated_at":"2025-11-24T12:13:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/k-l-lambda/trigoRL","commit_stats":null,"previous_names":["k-l-lambda/trigorl"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/k-l-lambda/trigoRL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-l-lambda%2FtrigoRL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-l-lambda%2FtrigoRL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-l-lambda%2FtrigoRL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-l-lambda%2FtrigoRL/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/k-l-lambda","download_url":"https://codeload.github.com/k-l-lambda/trigoRL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-l-lambda%2FtrigoRL/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29587066,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T16:55:40.614Z","status":"ssl_error","status_checked_at":"2026-02-18T16:55:37.558Z","response_time":162,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-18T17:04:40.156Z","updated_at":"2026-02-18T17:04:46.822Z","avatar_url":"https://github.com/k-l-lambda.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TrigoRL\n\nA reinforcement learning laboratory project for training AI agents to play Trigo, a 3D variant of the board game Go.\n\n## Overview\n\nTrigoRL is an experimental platform for exploring reinforcement learning techniques in the context of **Trigo**,\na strategic board game that extends the rules of Go into three-dimensional space.\nWhile traditional Go is played on a 2D 19×19 board, Trigo is played on a cubic grid,\nintroducing new strategic dimensions and complexity.\n\n## About Trigo\n\nTrigo is a modern reimplementation of a 3D Go variant with the 
following characteristics:\n\n- **Board**: 3D cubic grid (default: 5×5×5, configurable to other dimensions including 2D boards)\n- **Rules**: Based on Go mechanics adapted for 3D space\n  - Stone placement with capture detection\n  - Ko rule enforcement\n  - Territory calculation in 3D\n  - Pass, undo/redo, and resignation support\n- **Notation**: TGN (Trigo Game Notation) - a PGN-inspired text format for recording games\n- **Coordinate System**: Center-symmetric notation (e.g., `000` = center, `aaa` = corner)\n\n**TRY IT YOURSELF ONLINE**: here is a [Trigo demo page](https://huggingface.co/spaces/k-l-lambda/trigo).\n\n## Quick Start\n\n### Inspect Dataset\n\nView and validate the TGNDataset:\n\n```bash\n# View dataset statistics\npython tools/view_dataset.py configs/training/trigo-gpt2.yaml --stats\n\n# Validate dataset implementation\npython tools/view_dataset.py configs/training/trigo-gpt2.yaml --validate\n\n# View a specific sample\npython tools/view_dataset.py configs/training/trigo-gpt2.yaml --sample 0 --tokens\n```\n\nSee [tools/README.md](tools/README.md) for comprehensive CLI documentation.\n\n### Training Models\n\nTrain language models from scratch or resume from checkpoints:\n\n```bash\n# Start new training from scratch\npython train_lm.py configs/training/trigo-gpt2.yaml\n\n# Start with config overrides\npython train_lm.py configs/training/trigo-gpt2.yaml training.epochs=50 training.learning_rate=5e-5\n\n# Resume from checkpoint by specifying resume_from in config\npython train_lm.py configs/training/trigo-gpt2.yaml training.resume_from=outputs/trigor/20251113-trigo-gpt2/checkpoints/best.chkpt\n\n# Resume from experiment directory (automatically loads latest checkpoint)\npython train_lm.py outputs/trigor/20251113-trigo-gpt2\n\n# Resume with config overrides (useful for fine-tuning)\npython train_lm.py outputs/trigor/20251113-trigo-gpt2 training.learning_rate=1e-5 training.epochs=100\n```\n\n**Available training configs**:\n- `trigo-gpt2.yaml` - GPT-2 
with standard multi-head attention\n- `trigo-llama.yaml` - LLaMA with grouped query attention (GQA)\n- `trigo-rwkv.yaml` - RWKV with linear attention\n- `trigo-gpt2-invsqrt.yaml` - GPT-2 with inverse square root scheduler\n\n**Resume training options**:\n1. **From experiment directory**: `python train_lm.py outputs/trigor/[experiment-dir]`\n   - Automatically loads `checkpoints/latest.chkpt`\n   - Preserves all previous config settings\n   - Continues wandb logging to the same run (if wandb enabled)\n\n2. **From specific checkpoint**: Set `training.resume_from` in config or override:\n   ```yaml\n   training:\n     resume_from: path/to/checkpoint.chkpt  # null = train from scratch\n   ```\n   - Can use `best.chkpt`, `latest.chkpt`, or any epoch checkpoint\n   - Restores model weights, optimizer state, and training progress\n   - Useful for transfer learning or fine-tuning\n\n**Training outputs**:\n- `outputs/trigor/[experiment-id]/config.yaml` - Saved configuration\n- `outputs/trigor/[experiment-id]/train.log` - Training logs\n- `outputs/trigor/[experiment-id]/checkpoints/` - Model checkpoints\n  - `best.chkpt` - Best model (based on validation metric)\n  - `latest.chkpt` - Latest model (for resuming)\n  - `epoch_N.chkpt` - Periodic checkpoints\n\n### Test Models\n\nRun the model test suite:\n\n```bash\npython tests/test_models.py\n```\n\nThis validates:\n- Model registry with 4 CausalLM models\n- Configuration loading (dict and OmegaConf)\n- Forward passes for GPT-2, LLaMA, and RWKV\n- Parameter counting and memory estimation\n\n### Verify Configurations\n\nTest all training configs:\n\n```bash\npython examples/verify_training_configs.py\n```\n\n### Export Models to ONNX\n\nExport trained models for cross-platform deployment:\n\n```bash\n# Export best checkpoint (default - standard inference mode)\npython exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt\n\n# Export in evaluation mode with fixed dimensions\npython exportOnnx.py 
outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \\\n    --evaluation-mode --prefix-len 10 --seq-len 15\n\n# Export evaluation mode with dynamic dimensions (prefix-len/seq-len only for dummy input)\npython exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \\\n    --evaluation-mode --dynamic-seq\n\n# Export with INT8 quantization (recommended for deployment)\npython exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \\\n    --quantize --quant-type int8\n\n# Export with dynamic batch/sequence sizes\npython exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \\\n    --dynamic-batch --dynamic-seq\n\n# Export with static quantization (best accuracy)\npython exportOnnx.py outputs/trigor/20251115-trigo-gpt2-l6-d64-251112-invsqrt \\\n    --quantize --quant-method static --calibration-samples 200\n```\n\n**Export Modes**:\n- **Standard mode**: Single input `input_ids`, returns `logits` for all positions\n- **Evaluation mode**: Three inputs (`prefix_ids`, `evaluated_ids`, `evaluated_mask`), returns logits for last prefix + evaluated positions. 
Supports custom attention patterns like tree attention for computing sequence probabilities.\n\n**Quantization benefits**:\n- **INT8 dynamic**: ~3-4x smaller model, minimal accuracy loss\n- **INT4**: ~8x smaller, more aggressive compression\n- **Static quantization**: Better accuracy than dynamic, requires calibration\n\nSee `docs/onnx_quantization_guide.md` for comprehensive quantization documentation.\n\n## Technical Stack\n\n### Reinforcement Learning Framework\n\n- **PyTorch**: Deep learning framework for model implementation\n- **Transformers**: Architecture foundation for the RL agent (GPT-2, LLaMA, RWKV, xLSTM)\n- **Weights \u0026 Biases (wandb)**: Training metrics and experiment tracking\n- **ONNX**: Model weight export format for cross-platform deployment\n- **OmegaConf/Hydra**: Hierarchical configuration management\n\n### Current Implementation Status\n\n✅ **Data Pipeline**\n- TGNDataset: PyTorch dataset for TGN files with byte-level tokenization\n- TGNByteTokenizer: 259-token vocab (256 bytes + PAD/START/END)\n- Configuration-driven dataset loading\n\n✅ **Model Architecture**\n- 4 CausalLM models: GPT2, LLaMA (with GQA), RWKV (linear attention), xLSTM\n- Model registry with factory pattern\n- OmegaConf integration for flexible configuration\n- Parameter counting and memory footprint estimation\n\n✅ **Training Configuration**\n- Complete YAML configs for all 4 models\n- Hyperparameters tuned for each architecture\n- WandB integration (optional)\n- Checkpointing and learning rate scheduling\n\n✅ **Development Tools**\n- CLI tool for dataset inspection and validation\n- Model testing suite (109 tests passing)\n- Configuration verification scripts\n\n✅ **Model Export**\n- ONNX export script with checkpoint loading\n- INT8/INT4 quantization (dynamic and static)\n- 3-4x model compression with minimal accuracy loss\n- Node.js inference validation and testing\n\n## Development Roadmap\n\nThe following components need to be implemented for the RL framework:\n\n1. 
~~**Data Pipeline**~~ ✅ COMPLETE\n   - ~~TGNDataset implementation with byte tokenization~~\n   - ~~Dataset configuration and loading~~\n   - ~~Validation and inspection tools~~\n\n2. ~~**Model Architecture**~~ ✅ COMPLETE\n   - ~~Transformer-based CausalLM implementations~~\n   - ~~Model registry and factory pattern~~\n   - ~~Configuration management~~\n\n3. **Training Pipeline** 🚧 IN PROGRESS\n   - Training loop implementation\n   - Self-play game generation\n   - Experience replay buffer\n   - Policy gradient or actor-critic implementation\n   - Integration with Weights \u0026 Biases for experiment tracking\n\n4. **Environment Wrapper** 📋 PLANNED\n   - Python interface to the Trigo game engine\n   - OpenAI Gym-compatible environment\n   - State representation for 3D board positions\n   - Action space definition\n\n5. ~~**Model Export**~~ ✅ COMPLETE\n   - ~~ONNX conversion utilities~~\n   - ~~INT8/INT4 quantization support~~\n   - ~~Static and dynamic quantization~~\n   - ~~Node.js inference validation~~\n\n6. 
**Evaluation \u0026 Analysis** 📋 PLANNED\n   - Agent performance metrics\n   - Game quality assessment\n   - Visualization tools\n\n## Game Engine Features\n\nThe Trigo game engine provides:\n\n- **3D Visualization**: Interactive Three.js-based board rendering\n- **Multiplayer Support**: Real-time gameplay via WebSocket\n- **Game Notation**: TGN format for saving and loading games\n- **REST API**: Programmatic game control\n- **Comprehensive Testing**: 10 test suites covering core functionality\n\nFor detailed API documentation, see:\n- [Game Engine README](third_party/trigo/README.md)\n- [TGN Format Specification](third_party/trigo/docs/tgn-format-spec.md)\n- [Development Guidelines](third_party/trigo/CLAUDE.md)\n\n## Acknowledgments\n\n- Based on the Trigo game engine by k-l-lambda\n- Inspired by AlphaGo and other game-playing RL systems\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fk-l-lambda%2Ftrigorl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fk-l-lambda%2Ftrigorl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fk-l-lambda%2Ftrigorl/lists"}