An open API service indexing awesome lists of open source software.

https://github.com/ericrihm/expedition-33

Native macOS port of Clair Obscur: Expedition 33 — UE5.4.4 Metal pipeline, AI-orchestrated build infrastructure, 98.9% shader coverage
https://github.com/ericrihm/expedition-33

Last synced: 20 days ago
JSON representation

Native macOS port of Clair Obscur: Expedition 33 — UE5.4.4 Metal pipeline, AI-orchestrated build infrastructure, 98.9% shader coverage

Awesome Lists containing this project

README

          

# Expedition 33: Native macOS Port

**A AAA Unreal Engine 5.4.4 JRPG running natively on Apple Silicon via a custom Metal GPU pipeline.**

| Shader Coverage | Scorecard | Metal RHI Patches | PSO Caches | Archive Chunks |
|:-:|:-:|:-:|:-:|:-:|
| **100%** | **98.9 / 100 (A+)** | **11** | **30** | **28/28** |

---

## What This Is

Clair Obscur: Expedition 33 is a AAA JRPG built on Unreal Engine 5.4.4, shipping only on Windows and consoles. This project is a ground-up native macOS port targeting Apple Silicon, built entirely outside the official engine toolchain. It required writing a custom Metal shader pipeline, patching the UE5 Metal RHI, building cross-platform asset streaming infrastructure, and orchestrating the entire build/test/validate cycle across a 3-machine fleet using AI-assisted development.

No game source code or engine source is included in this repository. This is a portfolio project demonstrating systems engineering at the intersection of GPU programming, build infrastructure, and AI-native development workflows.

---

## Results

| Metric | Value |
|---|---|
| Shaders compiled | 58,786 of 58,786 (100%) |
| Metal shader archive chunks | 28/28 streaming |
| Pipeline State Object caches | 30 binary caches |
| Custom Metal RHI patches | 11 |
| WaveSize adaptation | 64 -> 32 (40 compute shaders, resolved) |
| Overall scorecard | 98.9 / 100, Grade A+ |
| Crash recovery | Adaptive circuit breaker + regression analysis + fingerprinting |
| Target hardware | Apple M4 Max, 36GB unified memory |

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│ FLEET TOPOLOGY │
│ │
│ ┌──────────────────┐ SSH / Tailscale ┌────────────────────┐ │
│ │ Mac Mini M4 │◄────────────────────►│ Mac Studio M4 Max │ │
│ │ 24GB RAM │ │ 36GB Unified RAM │ │
│ │ │ │ │ │
│ │ - Control plane │ │ - UE5 builds │ │
│ │ - Orchestrator │ │ - Metal GPU tests │ │
│ │ - File watchers │ │ - Shader compile │ │
│ │ - Scorecard │ │ - Game runtime │ │
│ └────────┬─────────┘ └────────────────────┘ │
│ │ │
│ │ SSH / Tailscale │
│ ▼ │
│ ┌──────────────────┐ │
│ │ Linux Workstation│ │
│ │ RTX 3080 Ti │ │
│ │ 64GB RAM │ │
│ │ │ │
│ │ - Reference GPU │ │
│ │ - Cross-platform │ │
│ │ validation │ │
│ └──────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘

BUILD PIPELINE (orchestrated from Mini)

fswatch ──► make -j4 ──► test ──► integration ──► gpu_validate ──► scorecard
(file watch) (build) (unit) (asset load) (Metal render) (98.9/A+)
```

The Mini acts as the control plane, watching for source changes and dispatching build/test/validate jobs to the Studio over Tailscale. The Linux workstation provides a reference NVIDIA GPU for cross-platform shader validation. A flywheel meta-orchestrator coordinates the full pipeline end-to-end, with MCP (Model Context Protocol) tool bindings for triggering builds, checking crash status, and restarting the game process programmatically.

---

## Key Technical Challenges

### Metal Shader Pipeline

UE5's shader system targets HLSL/DXIL/SPIRV. Getting 58,786 shaders to compile and run on Metal required:

- **11 custom patches** to the Metal RHI, covering precompiled shader library loading, archive discovery, and chunk streaming.
- **28 shader archive chunks** wired into Metal's resource streaming system, replacing the D3D12/Vulkan paths.
- **30 PSO binary caches** built and validated to eliminate runtime pipeline compilation stalls.
- **WaveSize(64) -> WaveSize(32)** adaptation for 40 compute shaders pinned to AMD/NVIDIA 64-wide wave sizes (Apple GPUs use 32-wide SIMD groups).
- The result: all 58,786 shaders compiled clean. Zero failures.

### WaveSize Adaptation (64 -> 32)

The game's compute shaders assume `WaveSize(64)`, matching AMD GCN/RDNA and NVIDIA warp sizes. Apple GPUs use 32-wide SIMD groups. Forty compute shaders required systematic adaptation from 64-lane to 32-lane execution, touching wave intrinsics, ballot operations, and shared memory layouts without altering compute results.

### Crash Recovery, Regression Analysis, and Adaptive Stability

A UE5 title running outside its target platform will crash. The port includes a **crash sentinel** that goes beyond simple restart logic:

- **Crash fingerprinting**: Each crash is hashed from its `.ips` report and error signature, enabling deduplication and pattern detection across sessions.
- **Uptime tracking**: Every session's runtime is recorded. Mean Time Between Failures (MTBF) is computed from real uptime data, not wall-clock intervals.
- **Regression analysis**: A sliding-window comparator detects when stability is degrading (MTBF drops >40%) or improving (MTBF rises >50%), emitting machine-readable signals that other systems can react to.
- **Adaptive circuit breaker**: The crash threshold is not static. When stability degrades, the breaker tightens (fewer crashes to trip). When stability improves, it relaxes. This prevents both over-sensitivity in stable periods and under-sensitivity during regressions.
- **Effectiveness scoring**: Not all restarts are equal. The sentinel tracks whether a restart actually produced a session lasting >60 seconds. A restart that crashes again in 5 seconds is scored differently than one that runs for 20 minutes.
- **Actionable recommendations**: The regression report surfaces whether crashes are dominated by a single fingerprint (fix one bug) or spread across many (systemic issue), and whether restarts are actually helping.

This enabled unattended overnight GPU validation runs that self-heal from transient Metal driver issues, automatically tighten protection during instability spikes, and produce structured data for root cause analysis.

### Fleet Orchestration

Building and validating a UE5 port is not a single-machine job. The build pipeline spans three machines coordinated over SSH/Tailscale:

1. **File watch** (Mini): `fswatch` detects source changes, triggers `make -j4` on the Studio.
2. **Build + test** (Studio): Compiles shaders, runs unit tests, loads assets.
3. **GPU validation** (Studio): Executes Metal render passes, captures frames, validates output.
4. **Cross-reference** (Linux): Compares Metal output against NVIDIA reference renders.
5. **Scorecard** (Mini): Aggregates all results into a single pass/fail grade.

The entire pipeline runs without manual intervention once triggered.

---

## AI-Assisted Development

This project was built using **AI-assisted tooling with custom orchestration** as a force multiplier. The AI did not write the port. It was directed to:

- Generate and iterate on Metal RHI patches based on specific crash logs and shader compiler errors.
- Analyze thousands of shader compilation failures and categorize them by root cause.
- Build and refine the fleet orchestration scripts, build pipeline, and crash sentinel.
- Produce MCP tool bindings for programmatic control of the build/test cycle.

The key skill demonstrated here is not "using AI" but **AI-native engineering**: knowing how to decompose a complex systems problem into tasks an AI can execute reliably, validating outputs rigorously, and maintaining architectural coherence across thousands of AI-assisted iterations. The human provides direction, judgment, and systems thinking. The AI provides throughput.

---

## Tech Stack

- **Engine**: Unreal Engine 5.4.4
- **GPU API**: Metal (Apple Silicon)
- **Target Hardware**: Mac Studio M4 Max (36GB unified memory)
- **Shader Pipeline**: Custom HLSL-to-Metal compilation with 11 RHI patches
- **Build System**: Make, fswatch, custom pipeline orchestrator
- **Fleet Coordination**: SSH over Tailscale, 3-machine topology
- **Crash Recovery**: Circuit breaker sentinel with auto-restart
- **AI Tooling**: Codex + custom orchestration, MCP tool bindings
- **Validation**: Automated scorecard (98.9/100, A+)

---

## Status

**Shader pipeline: 100% complete. Scene rendering: blocked by cross-platform asset deserialization.**

The GPU infrastructure is production-quality and fully validated:

- All 28 shader archive chunks streaming.
- 100% shader coverage (58,786 / 58,786).
- 30 PSO caches built and validated.
- 11 Metal RHI patches applied and passing.
- Automated build pipeline running end-to-end.
- Circuit breaker crash sentinel active.
- Scorecard: 98.9/100, Grade A+.

**Current blocker:** UE5 cooks assets per-platform, embedding platform-specific binary payloads (shader bytecode format, serialized sizes) in each asset. When the Mac runtime deserializes Windows-cooked assets, the serialized sizes don't match the Mac object representation. The engine seeks past mismatched payloads rather than crashing — but the resulting objects are empty. Empty Materials cascade into an empty scene, which triggers a GPU page fault in post-processing on uninitialized render targets.

This is a known UE5 architectural constraint when operating across platform cooking boundaries, not a failure of the shader or GPU work. See [`docs/cross-platform-assets.md`](docs/cross-platform-assets.md) for a full technical breakdown.

**Active development:** A runtime asset conversion layer that intercepts the deserialization pipeline to handle cross-platform size differences is under active development. Once complete, the validated shader and GPU infrastructure can drive scene rendering end-to-end.

---

## Legal

This repository contains no proprietary source code from Epic Games or Sandfall Interactive. No game assets, PAK files, or copyrighted content are included. This is an independent technical demonstration for portfolio purposes only.