https://github.com/tachyon-beep/keisei

A Deep Reinforcement Learning project demonstrating AI's power to create AI, aimed at mastering the complex game of Shogi. Built 100% by GitHub Copilot (Agent Mode) with human project management, it features a custom Shogi engine, PPO in PyTorch, rich experiment tracking via Weights & Biases, and a live TUI using Rich for dynamic monitoring.


Host: GitHub
URL: https://github.com/tachyon-beep/keisei
Owner: tachyon-beep
License: mit
Created: 2025-05-18T09:10:07.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2026-02-07T16:19:00.000Z (4 months ago)
Last Synced: 2026-02-07T16:50:35.548Z (4 months ago)
Topics: actor-critic, ai-coded, artificial-intelligence, board-games, custom-game-engine, deep-reinforcement-learning, experiment-tracking, game-ai, generative-ai-development, github-copilot, neural-networks, ppo, python, pytorch, rich-library, self-play, shogi, strategy-games, terminal-ui, weights-and-biases
Language: Python
Homepage:
Size: 27.7 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE


# Keisei: Deep Reinforcement Learning for Shogi

**Keisei** (形勢, "position" in Shogi) is a deep reinforcement learning system that learns to play Shogi from scratch through self-play, using Proximal Policy Optimization (PPO).

No opening books, no hardcoded heuristics — strategies emerge purely from reinforcement learning.

## Project Intent

Keisei has three goals, two primary and one subordinate:

1. **Primary:** Showcase Shogi as a rich, competitive game worthy of continuous public exhibition.
2. **Primary:** Showcase deep reinforcement learning as a first-class AI paradigm, not as an accessory to LLM workflows.
3. **Subordinate:** Evaluate how effective LLMs are at building and evolving this system.

The first two define what we are building. The third measures how we build it.

## Features

- **Complete Shogi engine** with full rule support (drops, promotions, repetition)
- **PPO with self-play** — clipped surrogate, GAE, entropy regularization
- **ResNet + SE blocks** — configurable tower depth/width with Squeeze-and-Excitation attention
- **46-channel observation** (9x9 board) with 13,527-action policy space
- **Mixed precision** (AMP) and multi-GPU (DDP) support
- **Pydantic configuration** with YAML files and CLI overrides
- **Streamlit dashboard** for real-time training visualization
- **Weights & Biases** integration for experiment tracking
- **5 evaluation strategies** — single opponent, tournament, ladder, benchmark, custom
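
To make the PPO bullet above concrete, here is a minimal pure-Python sketch of Generalized Advantage Estimation and the clipped surrogate objective with an entropy bonus. It is an illustration under stated assumptions, not Keisei's implementation: the function names `gae_advantages` and `ppo_loss` and the default values for `gamma`, `lam`, `clip_eps`, and `ent_coef` are hypothetical, and the project itself runs PPO in PyTorch.

```python
import math


def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory fragment.

    rewards[t] and values[t] cover steps 0..T-1. last_value bootstraps the
    state after the final step (use 0.0 if the episode terminated).
    """
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual for step t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of the residuals from t onward.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def ppo_loss(new_logp, old_logp, advantages, entropies,
             clip_eps=0.2, ent_coef=0.01):
    """Mean clipped surrogate loss with an entropy bonus (to be minimized)."""
    total = 0.0
    for lp_new, lp_old, adv, ent in zip(new_logp, old_logp, advantages, entropies):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two objectives.
        surrogate = min(ratio * adv, clipped * adv)
        total += -surrogate - ent_coef * ent
    return total / len(new_logp)
```

Clipping only removes the incentive to move the probability ratio further in the direction the advantage already favours. With a positive advantage and a ratio of 2.0, the objective is capped at `1.2 * adv`; with a negative advantage the unclipped (worse) term is kept.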

## Quick Start

### Prerequisites

- Python 3.12+ (3.13 recommended)
- CUDA-compatible GPU (optional but recommended)
- [uv](https://docs.astral.sh/uv/) package manager

### Installation

```bash
git clone https://github.com/tachyon-beep/keisei.git
cd keisei

# Create environment and install
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Optional: configure Weights & Biases
echo "WANDB_API_KEY=your_key" > .env
```

### Training

```bash
# Basic training
python train.py train

# With custom config
python train.py train --config examples/enhanced_display_config.yaml

# With CLI overrides
python train.py train --override training.learning_rate=0.001

# Resume from checkpoint
python train.py train --resume models/my_model/checkpoint.pt

# With Streamlit dashboard
python train.py train --override webui.enabled=true
```

### Evaluation

```bash
python train.py evaluate \
  --agent_checkpoint path/to/model.pt \
  --opponent_type random \
  --num_games 100
```

## Architecture

Keisei uses a manager-based architecture with 9 specialized components orchestrated by a central `Trainer`:

| Manager | Responsibility |
|---------|---------------|
| **SessionManager** | Directories, W&B setup, config persistence |
| **ModelManager** | Model creation, checkpoints, mixed precision |
| **EnvManager** | Game environment, policy mapper, lifecycle |
| **StepManager** | Step execution, episode management, experience collection |
| **TrainingLoopManager** | Main loop, PPO updates, callbacks |
| **MetricsManager** | Statistics, progress tracking, formatting |
| **DisplayManager** | Stderr logging (throttled one-line summaries) |
| **CallbackManager** | Event system, evaluation scheduling, checkpoints |
| **SetupManager** | Component initialization, validation, dependencies |
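
The policy mapper owned by EnvManager has to translate between Shogi moves and indices in the 13,527-action policy space listed under Features. The README states only the total, so the decomposition below is an assumption that happens to reproduce it: every ordered pair of distinct squares, with and without promotion, plus a drop of each of the seven droppable piece types onto each square. The index layout and the names `encode_board_move` and `encode_drop` are hypothetical and not taken from the Keisei source.

```python
BOARD_SQUARES = 9 * 9        # 81 squares on a Shogi board
DROPPABLE_PIECE_TYPES = 7    # pawn, lance, knight, silver, gold, bishop, rook


def action_space_size():
    """Return (board_moves, drops, total) for the assumed decomposition."""
    # Ordered (from, to) pairs of distinct squares, each with and without promotion.
    board_moves = BOARD_SQUARES * (BOARD_SQUARES - 1) * 2
    # Any droppable piece type onto any square.
    drops = BOARD_SQUARES * DROPPABLE_PIECE_TYPES
    return board_moves, drops, board_moves + drops


def encode_board_move(from_sq, to_sq, promote):
    """Map a board move to an index in [0, 12960). Hypothetical layout."""
    if from_sq == to_sq:
        raise ValueError("a move must change squares")
    # Skip the from == to slot so each source square owns exactly 80 targets.
    target = to_sq if to_sq < from_sq else to_sq - 1
    return (from_sq * (BOARD_SQUARES - 1) + target) * 2 + int(promote)


def encode_drop(piece_type, to_sq):
    """Map a drop to an index in [12960, 13527). Hypothetical layout."""
    board_moves, _, _ = action_space_size()
    return board_moves + piece_type * BOARD_SQUARES + to_sq
```

The count is an upper bound on move shapes, not on legal moves: most indices are illegal in any given position, so a policy over this space would normally be combined with a legal-move mask.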

**Optional:** StreamlitManager provides a real-time training dashboard that reads a JSON state file written atomically by the training process.
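
Writing the state file atomically matters because the dashboard reads it while training is still writing. A common way to get that guarantee is to write to a temporary file in the same directory and then rename it over the target. The sketch below shows that pattern using only the standard library. The function names `write_state_atomically` and `read_state` are hypothetical, and whether Keisei uses exactly this approach is an assumption.

```python
import json
import os
import tempfile


def write_state_atomically(state, path):
    """Write `state` as JSON so a concurrent reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target for the
    # final rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # swaps the new file in as a single step
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_state(path):
    """Load the most recently published state."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A reader that opens the file at any moment gets either the previous complete snapshot or the new one, never a truncated mix of the two.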

## Project Structure

```
keisei/
├── config_schema.py # Pydantic configuration models
├── constants.py # Shared constants
├── core/ # PPO agent, experience buffer, neural networks
├── shogi/ # Complete Shogi game engine
├── training/ # Manager-based training infrastructure
│ ├── models/ # Neural network architectures (ResNet, CNN)
│ └── parallel/ # Multi-process experience collection
├── evaluation/ # Multi-strategy evaluation system
├── webui/ # Streamlit training dashboard
└── utils/ # Logging, checkpoints, profiling
```

## Configuration

Configuration uses `default_config.yaml` with Pydantic validation. Override any setting via CLI:

```bash
python train.py train \
  --override training.learning_rate=0.001 \
  --override training.mixed_precision=true \
  --override webui.enabled=true
```

See `default_config.yaml` for all available options.
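
As a rough illustration of how dotted `--override key=value` arguments can be merged into a nested configuration, the sketch below applies them to a plain dictionary. It is not Keisei's implementation: the project validates its configuration with Pydantic, whereas this example uses only the standard library, and the names `parse_override` and `apply_overrides` are hypothetical.

```python
import copy
import json


def parse_override(text):
    """Split 'a.b.c=value' into (['a', 'b', 'c'], parsed_value)."""
    key, sep, raw = text.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {text!r}")
    try:
        value = json.loads(raw)  # handles true/false, numbers, null
    except json.JSONDecodeError:
        value = raw              # anything else stays a plain string
    return key.split("."), value


def apply_overrides(config, overrides):
    """Return a copy of the nested dict `config` with dotted overrides applied."""
    result = copy.deepcopy(config)
    for text in overrides:
        path, value = parse_override(text)
        node = result
        for part in path[:-1]:
            # Create intermediate sections on demand. A schema-validated
            # config would more likely reject unknown keys here.
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return result
```

For example, `apply_overrides({"training": {"learning_rate": 0.0003}}, ["training.learning_rate=0.001"])` returns a new dictionary with the learning rate replaced and leaves the input untouched.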

## Development

```bash
# Run tests
pytest tests/unit/ # Fast unit tests
pytest tests/integration/ # Integration tests
pytest tests/e2e/ # End-to-end tests

# Full local CI
./scripts/run_local_ci.sh

# Code quality
black keisei/ # Formatting
mypy keisei/ # Type checking
flake8 keisei/ # Linting
```

See [CLAUDE.md](CLAUDE.md) for detailed development workflow, architecture notes, and contribution guidelines.

## Documentation

- [CLAUDE.md](CLAUDE.md) — Development guide with commands, architecture details, and patterns
- [docs/DESIGN.md](docs/DESIGN.md) — System design document
- [docs/CODE_MAP.md](docs/CODE_MAP.md) — Detailed code organization

## License

This project is licensed under the MIT License. See [LICENSE](LICENSE) for details.
