https://github.com/openclaw/crabbox
Crabbox: warm a box, sync the diff, run the suite.
remote-test-runner
Last synced: 16 days ago
- Host: GitHub
- URL: https://github.com/openclaw/crabbox
- Owner: openclaw
- License: mit
- Created: 2026-04-30T14:31:48.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-05-24T06:10:45.000Z (21 days ago)
- Last Synced: 2026-05-24T10:06:15.745Z (21 days ago)
- Topics: remote-test-runner
- Language: Go
- Homepage: http://crabbox.sh
- Size: 4.43 MB
- Stars: 487
- Watchers: 2
- Forks: 55
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Security: docs/security.md
- Agents: AGENTS.md
README
# 🦀 📦 Crabbox

**Warm a box, sync the diff, run the suite.**
Crabbox is a remote software testing and execution control plane for maintainers
and AI agents. Lease fast managed cloud capacity, point at an existing SSH host,
or use an agent sandbox provider — then sync your dirty checkout, run commands
remotely, stream output, collect evidence, and release. Local edit-save-run
loop, cloud-grade compute, agent-ready observability.
```sh
crabbox run -- pnpm test
```
Behind that one command: a Go CLI on your laptop, a Cloudflare Worker broker
that owns provider credentials and lease state, and a managed or delegated
runner.
## How it works
```text
   your laptop                 Cloudflare Worker               cloud provider
  -------------                ------------------              --------------
   crabbox CLI  -- HTTPS -->  Fleet Durable Object  -->  Hetzner / AWS / Azure / GCP
        |                      lease + cost state                     |
        |                                                             |
        +-------------- SSH + rsync to leased runner <----------------+
```
- **CLI** — Go binary. Loads config, mints a per-lease SSH key, asks the broker
for a lease, waits for SSH, seeds remote Git, rsyncs the dirty checkout (with
a fingerprint skip when nothing changed), runs the command, streams output,
releases.
- **Broker** — Cloudflare Worker plus a single Fleet Durable Object. Owns
provider credentials, serializes lease state, enforces active-lease and
monthly spend caps, and expires stale leases by alarm. Auth is GitHub browser
login or a shared bearer token.
- **Runner** — a throwaway machine reachable over SSH on the primary port
(default `2222`) plus configured fallback ports, prepared with Crabbox's
sync/run prerequisites. Linux uses Ubuntu with cloud-init and `/work/crabbox`;
native Windows uses OpenSSH, Git for Windows, and `C:\crabbox`. No broker
credentials live on the box. Project runtimes (Go, Node, Docker, services,
secrets) come from your repo's GitHub Actions hydration, devcontainer, Nix,
mise/asdf, or setup scripts — not from Crabbox.
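
The fingerprint skip mentioned in the CLI bullet can be sketched as follows. This is a minimal illustration, not Crabbox's actual scheme: the README does not say how the fingerprint is computed, so the hash layout and the `fingerprint` and `needsSync` names are assumptions. Selecting tracked and nonignored files is left out; the map stands in for that file set.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// fingerprint hashes a file set (path -> content) in sorted path order, so the
// same workspace always yields the same digest. Hypothetical layout.
func fingerprint(files map[string][]byte) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	h := sha256.New()
	for _, p := range paths {
		sum := sha256.Sum256(files[p])
		// Length-prefix the path so adjacent fields cannot run together.
		fmt.Fprintf(h, "%d:%s:%x\n", len(p), p, sum)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// needsSync reports whether the workspace differs from the last synced state.
// An empty lastSynced means nothing has been synced yet.
func needsSync(lastSynced string, files map[string][]byte) bool {
	return lastSynced == "" || lastSynced != fingerprint(files)
}

func main() {
	ws := map[string][]byte{"main.go": []byte("package main\n")}
	last := fingerprint(ws)
	fmt.Println("unchanged, sync needed:", needsSync(last, ws)) // false

	ws["main.go"] = []byte("package main // edited\n")
	fmt.Println("edited, sync needed:", needsSync(last, ws)) // true
}
```

Comparing one digest before invoking rsync is what makes a no-op run cheap: an unchanged checkout never opens a transfer.
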
The data plane — SSH, rsync, command execution — always runs directly from the
CLI to the runner. The broker only manages leases, cost, and observability.
Only `aws`, `azure`, `gcp`, and `hetzner` can be brokered through the Worker,
and even those run direct from the CLI when no broker URL is configured. Every
other provider always runs direct. A direct-provider mode
(`--provider hetzner|aws|azure|gcp|proxmox` with local credentials) exists for
debugging the broker itself or using private infrastructure.
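
The routing rule in the paragraph above reduces to a small decision function. The sketch below assumes the rule is exactly as stated (a brokerable provider plus a configured broker URL); the `controlPlane` name is hypothetical and provider aliases such as `google` are not handled.

```go
package main

import "fmt"

// brokerable lists the providers the README says can go through the Worker.
var brokerable = map[string]bool{
	"aws": true, "azure": true, "gcp": true, "hetzner": true,
}

// controlPlane returns "brokered" only when the provider is brokerable and a
// broker URL is configured; everything else runs direct from the CLI.
func controlPlane(provider, brokerURL string) string {
	if brokerable[provider] && brokerURL != "" {
		return "brokered"
	}
	return "direct"
}

func main() {
	fmt.Println(controlPlane("aws", "https://broker.example.com"))     // brokered
	fmt.Println(controlPlane("aws", ""))                               // direct
	fmt.Println(controlPlane("proxmox", "https://broker.example.com")) // direct
}
```

Either way the data plane is unchanged: SSH and rsync still go straight from the CLI to the runner.
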
For the full mental model, see [How Crabbox Works](docs/how-it-works.md). For
the doc-to-code map, see [Source Map](docs/source-map.md).
## Install
```sh
brew install openclaw/tap/crabbox
crabbox --version
```
No Homebrew? Grab a [GoReleaser archive](https://github.com/openclaw/crabbox/releases)
for macOS, Linux, or Windows.
Laptop prerequisites: `git`, `ssh`, `ssh-keygen`, `rsync`, `curl`.
## Quick start
Broker access is deployment-specific. Use a coordinator URL from your team, use
direct-provider mode for a personal cloud account, or self-host the Worker
broker with your own provider credentials and spend caps. See
[Getting started](docs/getting-started.md#choosing-an-access-path) and
[Infrastructure](docs/infrastructure.md#self-hosted-broker-minimum-setup) for the
setup paths.
```sh
# log in once per machine (stores a broker token in user config)
crabbox login --url https://broker.example.com
# verify local prerequisites and broker reachability
crabbox doctor
# one-shot: lease, sync, run, release
crabbox run -- pnpm test
# named repo workflow from .crabbox.yaml
crabbox job run full-ci
# or warm a box once, then reuse it
crabbox warmup # prints cbx_... + a slug
crabbox run --id blue-lobster -- pnpm test:changed
crabbox ssh --id blue-lobster
crabbox stop blue-lobster
```
Every lease has a stable `cbx_...` ID and a friendly crustacean slug
(`blue-lobster`, `swift-hermit`, …). Either works wherever an `--id` is
accepted. Use `--slug <slug>` on fresh leases when a specific reusable slug
helps, and `--label <label>` on `run` when the history entry needs a
human-readable name.
## Providers
`Coordinator: brokered` providers can run through the Worker (or direct when no
broker is configured); every other provider always runs direct from the CLI.
Targets: **L**inux, **M**acOS, **W**indows.
### SSH-lease providers (provision or connect a box, full lifecycle)
| Provider | `provider:` (aliases) | Targets | Coordinator | Notes |
| --- | --- | --- | --- | --- |
| [AWS EC2](docs/providers/aws.md) | `aws` | L / M / W | brokered | EC2 instances and EC2 Mac; native AMI/EBS checkpoints. |
| [Azure](docs/providers/azure.md) | `azure` | L / W | brokered | VMs with Tailscale support; native Windows and WSL2. |
| [Google Cloud](docs/providers/gcp.md) | `gcp` (`google`, `google-cloud`) | L | brokered | Linux Compute Engine VMs with Tailscale support. |
| [Hetzner Cloud](docs/providers/hetzner.md) | `hetzner` | L | brokered | Linux VMs with desktop/browser/code and Tailscale. |
| [Parallels](docs/providers/parallels.md) | `parallels` | L / M / W | direct | Local or remote macOS host; checkpoint/fork/restore/snapshot. |
| [Proxmox](docs/providers/proxmox.md) | `proxmox` | L | direct | Clone Linux QEMU templates on a private Proxmox VE cluster. |
| [Static SSH](docs/providers/ssh.md) | `ssh` (`static`, `static-ssh`) | L / M / W | direct | Existing machines; no provisioning. |
| [Local Container](docs/providers/local-container.md) | `local-container` (`docker`, `container`, `local-docker`) | L | direct | Local Docker-compatible runtime (Docker Desktop, OrbStack, Colima). |
| [exe.dev](docs/providers/exe-dev.md) | `exe-dev` (`exe`, `exedev`) | L | direct | exe.dev VMs exposed as public SSH leases. |
| [Namespace Devbox](docs/providers/namespace-devbox.md) | `namespace-devbox` (`namespace`, `namespace-devboxes`) | L | direct | Namespace.so Devboxes over SSH. |
| [Semaphore](docs/providers/semaphore.md) | `semaphore` (`sem`) | L | direct | A Semaphore CI job leased as a testbox. |
| [Sprites](docs/providers/sprites.md) | `sprites` | L | direct | Sprites microVMs through `sprite proxy`. |
| [Daytona](docs/providers/daytona.md) | `daytona` | L | direct | Daytona-managed dev sandbox over SSH. |
| [RunPod](docs/providers/runpod.md) | `runpod` (`run-pod`, `runpodio`) | L | direct | RunPod GPU pods with public SSH. |
### Delegated-run providers (sandbox/proof runners, no SSH lease)
| Provider | `provider:` (aliases) | Targets | Notes |
| --- | --- | --- | --- |
| [Cloudflare](docs/providers/cloudflare.md) | `cloudflare` (`cf`) | L | Cloudflare Containers via the Worker runtime. |
| [E2B](docs/providers/e2b.md) | `e2b` | L | E2B Firecracker sandbox. |
| [Islo](docs/providers/islo.md) | `islo` | L | Islo sandbox. |
| [Modal](docs/providers/modal.md) | `modal` | L | Modal Sandbox through the local Python client. |
| [Railway](docs/providers/railway.md) | `railway` (`rail`, `railwayapp`) | L | Redeploy and stream an existing Railway service. |
| [Tensorlake](docs/providers/tensorlake.md) | `tensorlake` (`tl`, `tensorlake-sbx`) | L | Tensorlake Firecracker sandbox via the Tensorlake CLI. |
| [Upstash Box](docs/providers/upstash-box.md) | `upstash-box` (`upstash`, `box`, `upstashbox`) | L | Upstash Box through the Box REST API. |
| [Azure Dynamic Sessions](docs/providers/azure-dynamic-sessions.md) | `azure-dynamic-sessions` | L | Azure Container Apps dynamic sessions. |
| [Blacksmith Testbox](docs/providers/blacksmith-testbox.md) | `blacksmith-testbox` (`blacksmith`) | L | Delegated Blacksmith CI Testbox lifecycle and execution. |
| [W&B Sandboxes](docs/providers/wandb.md) | `wandb` (`weights-and-biases`) | L | Weights & Biases Sandboxes; reuses `wandb login` credentials. |
See [Providers](docs/providers/README.md) for the full reference, capabilities,
and authoring guide.
## Highlights
- **One-shot or warm workspaces.** `crabbox run` for fire-and-forget;
`crabbox warmup` + `--id` for repeated runs against the same box. See
[warmup](docs/commands/warmup.md) and [run](docs/commands/run.md).
- **Named repo jobs.** `crabbox job run <name>` lets repos define warmup,
optional Actions hydration, run command, and cleanup policy in `.crabbox.yaml`.
See [Jobs](docs/features/jobs.md).
- **Local-first workspace sync.** No clean-checkout requirement. Tracked and
nonignored files only, fingerprint skip on no-op runs, sanity checks against
suspicious mass deletions, optional shallow base-ref hydration for
changed-test workflows. See [Sync](docs/features/sync.md).
- **Run observability.** Every coordinator-backed run gets an early `run_...`
handle. Use `crabbox attach <run-id>` while it is active,
`crabbox events <run-id>` for durable lifecycle/output events, and
`crabbox logs <run-id>` for retained output after completion. See
[History and logs](docs/features/history-logs.md) and
[Observability](docs/observability.md).
- **GitHub Actions hydration.** `crabbox actions hydrate` runs supported setup
steps from the repo's workflow locally over SSH, so leased boxes get the same
runtimes and tooling without GitHub write access. Use `--github-runner` only
when setup needs full Actions semantics such as repository secrets, OIDC,
service containers, or unsupported `uses:` steps. See
[Actions hydration](docs/features/actions-hydration.md).
- **Failure capsules.** `crabbox capsule from-actions <run>` captures a
failing CI run into a portable, replayable bundle; `capsule replay` reruns it.
See [Capsules](docs/features/capsules.md).
- **Checkpoints.** Save VM-or-workspace state and `restore`/`fork` from it, via
workspace archives or provider-native snapshots/images. See
[Checkpoints](docs/features/checkpoints.md).
- **Pond peer groups.** Leases that share a `--pond <name>` label form an
emergent peer group with discovery (`pond peers`), an SSH-mesh of
`ssh -L` forwards to members' `--expose` ports (`pond connect`), and bulk
`pond release`. See [Pond](docs/features/pond.md).
- **Brokered cloud with cost guardrails.** Maintainers and agents share infra
without sharing provider tokens. Hetzner, AWS, Azure, and Google Cloud are
the managed providers; per-lease and monthly spend caps reject over-budget
leases. Providers fall back across compatible instance families when capacity
or quota rejects a request. `crabbox usage` summarizes spend by user, org,
provider, and type. See [Coordinator](docs/features/coordinator.md),
[Capacity fallback](docs/features/capacity-fallback.md), and
[Cost and usage](docs/features/cost-usage.md).
- **Interactive desktop, browser, and code leases.** `--browser` provisions
Chrome/Chromium for headless automation, `--desktop` provisions a visible UI
with tunnel-only VNC takeover, and `--code` provisions code-server on managed
Linux. `crabbox desktop click/paste/type/key` provide first-class input
helpers; `desktop proof` captures metadata, screenshot, diagnostics, MP4, and
a contact-sheet PNG in one publishable bundle. See
[Interactive desktop and VNC](docs/features/interactive-desktop-vnc.md).
- **Authenticated web portal.** Browser login opens owner-scoped and shared
lease/run views with run logs/events, WebVNC, code-server, and telemetry
charts. `crabbox webvnc`/`crabbox code` bridge a lease into the portal;
`crabbox share` grants a lease to a user or the owning org. See
[Portal](docs/features/portal.md).
- **Agent workspace evidence.** History, logs, events, telemetry, JUnit
summaries, screenshots, recordings, artifacts, and PR publishing make
autonomous work reviewable instead of only ephemeral terminal output. See
[Artifacts](docs/features/artifacts.md) and
[Telemetry](docs/features/telemetry.md).
- **Stable timing records.** `--timing-json` on `run`, `warmup`, and
`actions hydrate` gives scripts one machine-readable sync/command/total
timing schema across providers.
- **Hardened coordinator auth.** GitHub browser login, owner-scoped leases,
admin-only routes, optional GitHub team allowlists, Cloudflare Access JWT
verification, and service-token support keep normal use and operator
automation separate. See [Auth and admin](docs/features/auth-admin.md) and
[Security](docs/security.md).
- **OpenClaw plugin.** The repo root is a native OpenClaw plugin for box
lifecycle operations. See [OpenClaw plugin](#openclaw-plugin) below and
[OpenClaw plugin](docs/features/openclaw-plugin.md).
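
The cost guardrails in the brokered-cloud bullet above amount to an admission check before a lease is granted. The sketch below is an assumption about how such a check could look, not the Worker's code: the `caps`, `fleetState`, and `admit` names, the order of checks, and the use of a per-lease cost estimate are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// caps holds the limits a broker operator configures (hypothetical fields).
type caps struct {
	MaxActiveLeases int
	PerLeaseUSD     float64
	MonthlyUSD      float64
}

// fleetState is the serialized state the broker consults (hypothetical fields).
type fleetState struct {
	ActiveLeases  int
	MonthSpendUSD float64
}

var (
	errActiveCap  = errors.New("active-lease cap reached")
	errLeaseCap   = errors.New("lease estimate exceeds per-lease cap")
	errMonthlyCap = errors.New("lease would exceed monthly spend cap")
)

// admit rejects a lease request that would break any cap.
func admit(c caps, s fleetState, estimateUSD float64) error {
	switch {
	case s.ActiveLeases >= c.MaxActiveLeases:
		return errActiveCap
	case estimateUSD > c.PerLeaseUSD:
		return errLeaseCap
	case s.MonthSpendUSD+estimateUSD > c.MonthlyUSD:
		return errMonthlyCap
	}
	return nil
}

func main() {
	c := caps{MaxActiveLeases: 2, PerLeaseUSD: 5, MonthlyUSD: 100}
	s := fleetState{ActiveLeases: 1, MonthSpendUSD: 97}
	fmt.Println(admit(c, s, 2)) // <nil>
	fmt.Println(admit(c, s, 4)) // lease would exceed monthly spend cap
}
```

Because lease state is serialized in one place, a check like this can run without two concurrent requests both squeezing under the same cap.
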
## Machine classes
`beast` is the default for providers that expose class-based managed capacity.
The providers below fall back across ordered instance-type lists unless `--type`
pins a specific provider-native size.
```text
Hetzner      standard  ccx33, cpx62, cx53
             fast      ccx43, cpx62, cx53
             large     ccx53, ccx43, cpx62, cx53
             beast     ccx63, ccx53, ccx43, cpx62, cx53
AWS Linux    standard  c7a/c7i/m7a/m7i.8xlarge family
             fast      …16xlarge family
             large     …24xlarge family
             beast     …48xlarge family, falling back to 32x/24x/16x
AWS Win      standard  m7i.large, m7a.large, t3.large
             fast      m7i.xlarge, m7a.xlarge, t3.xlarge
             large     m7i.2xlarge, m7a.2xlarge, t3.2xlarge
             beast     m7i.4xlarge, m7a.4xlarge, m7i.2xlarge
AWS WSL2     standard  m8i.large, m8i-flex.large, c8i.large, r8i.large
             fast      m8i.xlarge, m8i-flex.xlarge, c8i.xlarge, r8i.xlarge
             large     m8i.2xlarge, m8i-flex.2xlarge, c8i.2xlarge, r8i.2xlarge
             beast     m8i.4xlarge, m8i-flex.4xlarge, c8i.4xlarge, r8i.4xlarge, m8i.2xlarge
AWS macOS    all       mac2.metal, then mac1.metal unless --type is set
Azure        standard  Standard_D32ads_v6, Standard_D32ds_v6, Standard_F32s_v2, then 16-vCPU fallbacks
             fast      Standard_D64ads_v6, Standard_D64ds_v6, Standard_F64s_v2, then 48/32-vCPU fallbacks
             large     Standard_D96ads_v6, Standard_D96ds_v6, then 64/48-vCPU fallbacks
             beast     Standard_D192ds_v6, Standard_D128ds_v6, then 96/64-vCPU fallbacks
Azure Win/
  WSL2       standard  Standard_D2ads_v6, Standard_D2ds_v6, Standard_D2ads_v5, Standard_D2ds_v5, Standard_D2as_v6
             fast      Standard_D4ads_v6, Standard_D4ds_v6, Standard_D4ads_v5, Standard_D4ds_v5, Standard_D4as_v6
             large     Standard_D8ads_v6, Standard_D8ds_v6, Standard_D8ads_v5, Standard_D8ds_v5, Standard_D8as_v6
             beast     Standard_D16ads_v6, Standard_D16ds_v6, Standard_D16ads_v5, Standard_D16ds_v5, Standard_D8ads_v6
Namespace    standard  S
             fast      M
             large     L
             beast     XL
Cloudflare   standard  standard-4
             fast      standard-4
             large     standard-4
             beast     standard-4
```
Override with `--type` or `CRABBOX_SERVER_TYPE` for a specific instance.
Cloudflare also accepts `lite`, `basic`, `standard-1`, `standard-2`, and
`standard-3` as smaller explicit `--type` values; `standard-4` is the default.
Providers without a row either use provider-native capacity settings or reject
class/type selection.
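
The ordered fallback described above can be sketched with the Hetzner rows from the table. The `candidates` and `provision` names are hypothetical, and the `create` callback stands in for a provider API call; the real selection logic may weigh more than list order.

```go
package main

import (
	"errors"
	"fmt"
)

// hetznerClasses copies the Hetzner rows from the machine-class table.
var hetznerClasses = map[string][]string{
	"standard": {"ccx33", "cpx62", "cx53"},
	"fast":     {"ccx43", "cpx62", "cx53"},
	"large":    {"ccx53", "ccx43", "cpx62", "cx53"},
	"beast":    {"ccx63", "ccx53", "ccx43", "cpx62", "cx53"},
}

// errNoCapacity simulates a capacity or quota rejection from the provider.
var errNoCapacity = errors.New("no capacity")

// candidates returns the instance types to try, in order. A pinned type
// (as with --type) disables fallback; an empty class means the default, beast.
func candidates(class, pinnedType string) []string {
	if pinnedType != "" {
		return []string{pinnedType}
	}
	if class == "" {
		class = "beast"
	}
	return hetznerClasses[class]
}

// provision tries each candidate until create succeeds and returns the type
// that worked, or the last error seen.
func provision(types []string, create func(string) error) (string, error) {
	err := errors.New("no candidate instance types")
	for _, t := range types {
		if err = create(t); err == nil {
			return t, nil
		}
	}
	return "", err
}

func main() {
	// Pretend the two largest types are out of capacity.
	create := func(t string) error {
		if t == "ccx63" || t == "ccx53" {
			return errNoCapacity
		}
		return nil
	}
	got, err := provision(candidates("", ""), create)
	fmt.Println(got, err) // ccx43 <nil>
}
```
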
## Configuration
Config resolves in order: flags → env → repo `.crabbox.yaml` → user
`~/.config/crabbox/config.yaml` → defaults.
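
As a rough illustration of that precedence, the sketch below walks the layers from highest to lowest and takes the first value it finds. Flattening each layer to a `map[string]string` and treating an empty string as unset are simplifying assumptions; `resolve` is a hypothetical helper, not a Crabbox function.

```go
package main

import "fmt"

// resolve returns the first non-empty value for key, scanning layers from
// highest to lowest precedence.
func resolve(key string, layers ...map[string]string) (string, bool) {
	for _, layer := range layers {
		if v, ok := layer[key]; ok && v != "" {
			return v, true
		}
	}
	return "", false
}

func main() {
	flags := map[string]string{}
	env := map[string]string{"class": "fast"}
	repo := map[string]string{"class": "standard", "provider": "aws"}
	user := map[string]string{"provider": "hetzner"}
	defaults := map[string]string{"class": "beast"}

	class, _ := resolve("class", flags, env, repo, user, defaults)
	provider, _ := resolve("provider", flags, env, repo, user, defaults)
	fmt.Println(class, provider) // fast aws
}
```
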
```yaml
broker:
  url: https://broker.example.com
  provider: aws
  token: ...
class: beast
capacity:
  market: spot
  strategy: most-available
  fallback: on-demand-after-120s
  hints: true
aws:
  region: eu-west-1
  rootGB: 400
lease:
  idleTimeout: 30m
  ttl: 90m
ssh:
  key: ~/.ssh/id_ed25519
  user: crabbox
  port: "2222"
  # Ordered fallback ports tried after ssh.port; use [] to disable fallback.
  fallbackPorts:
    - "22"
```
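
The `ssh.port` and `fallbackPorts` settings imply an ordered probe. The sketch below shows one way to do that with plain TCP dials; it only checks that a port accepts connections, not that an SSH server answers, and `firstReachable` and `demo` are hypothetical names rather than Crabbox functions.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// firstReachable tries the primary port, then each fallback in order, and
// returns the first port that accepts a TCP connection.
func firstReachable(host, primary string, fallbacks []string, timeout time.Duration) (string, error) {
	ports := append([]string{primary}, fallbacks...)
	for _, port := range ports {
		conn, err := net.DialTimeout("tcp", net.JoinHostPort(host, port), timeout)
		if err != nil {
			continue // closed or filtered: move on to the next port
		}
		conn.Close()
		return port, nil
	}
	return "", fmt.Errorf("no port reachable on %s (tried %v)", host, ports)
}

// demo uses a throwaway local listener as a stand-in runner, then probes a
// closed primary port followed by the listener's port.
func demo() (got, want string, err error) {
	closed, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	_, closedPort, _ := net.SplitHostPort(closed.Addr().String())
	closed.Close() // nothing listens on this port any more

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer ln.Close()
	_, want, _ = net.SplitHostPort(ln.Addr().String())

	got, err = firstReachable("127.0.0.1", closedPort, []string{want}, time.Second)
	return got, want, err
}

func main() {
	got, want, err := demo()
	fmt.Println(got == want, err)
}
```

An empty fallback list leaves only the primary port in the probe order, which matches the `[]` setting described in the config comment.
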
Forwarded environment is intentionally narrow: `NODE_OPTIONS` and `CI`. Do not
pass secrets as command-line arguments. For live-secret smoke tests, use
`crabbox run --env-from-profile --allow-env NAME` so Crabbox forwards
only selected names and prints redacted presence/length metadata. For stale warm
boxes, `--full-resync` (alias `--fresh-sync`) resets the remote workdir before
syncing. For larger commands, use `--script <file>` or `--script-stdin` so the
remote runner executes an uploaded file instead of a giant quoted shell string.
For binary or terminal-hostile output, use `crabbox run --capture-stdout <path>`
or `--capture-stderr <path>`. Add `--preflight` for a remote capability
snapshot, `--keep-on-failure` to SSH into the exact failed one-shot lease, or
`--download remote=local` to copy a successful-run artifact back. Failed
SSH-backed and Blacksmith delegated runs save local `.crabbox/captures/*.tar.gz`
bundles by default. Captured files are not redacted by Crabbox.
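
The narrow forwarding and redacted reporting described above can be pictured as an allowlist filter. This is a sketch under assumptions: the README names `NODE_OPTIONS` and `CI` as the default set and says selected names are reported as presence/length metadata, but the `forwardEnv` and `redacted` helpers and the exact report format are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// defaultForwarded is the narrow default set the README names.
var defaultForwarded = []string{"NODE_OPTIONS", "CI"}

// forwardEnv keeps only allowlisted names that are actually set in env.
func forwardEnv(env map[string]string, extraAllowed ...string) map[string]string {
	allowed := append(append([]string{}, defaultForwarded...), extraAllowed...)
	out := map[string]string{}
	for _, name := range allowed {
		if v, ok := env[name]; ok {
			out[name] = v
		}
	}
	return out
}

// redacted describes forwarded variables without revealing their values.
func redacted(forwarded map[string]string) []string {
	lines := make([]string, 0, len(forwarded))
	for name, v := range forwarded {
		lines = append(lines, fmt.Sprintf("%s: set (len=%d)", name, len(v)))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	env := map[string]string{"CI": "true", "API_TOKEN": "s3cret", "HOME": "/home/me"}
	for _, line := range redacted(forwardEnv(env, "API_TOKEN")) {
		fmt.Println(line)
	}
	// API_TOKEN: set (len=6)
	// CI: set (len=4)
}
```
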
Optional Tailscale reachability for managed Linux leases:
```yaml
tailscale:
  enabled: true
  network: auto
  tags:
    - tag:crabbox
  hostnameTemplate: crabbox-{slug}
  authKeyEnv: CRABBOX_TAILSCALE_AUTH_KEY
  exitNode: mac-studio.example.ts.net
  exitNodeAllowLanAccess: true
```
Tailscale is a network plane, not a provider. `--tailscale` joins new managed
Linux leases to the tailnet; `--network auto|tailscale|public` chooses how SSH
and VNC tunnel commands resolve the host. Brokered mode uses Worker OAuth
secrets to mint one-off keys; direct-provider mode reads the auth key from the
configured env var. See [Tailscale](docs/features/tailscale.md).
A few provider-specific config snippets:
```yaml
# Static macOS or Windows target (existing machine, no provisioning)
provider: ssh
target: windows
windows:
  mode: normal   # or wsl2
static:
  host: win-dev.local
  user: alice
  port: "22"
  workRoot: C:\crabbox
```
```yaml
# Local container (alias: docker; works with OrbStack as the active context)
provider: local-container
localContainer:
  runtime: docker
  image: debian:bookworm
  workRoot: /work/crabbox
```
```yaml
# Delegated Blacksmith CI Testbox
provider: blacksmith-testbox
blacksmith:
  org: example-org
  workflow: .github/workflows/ci-check-testbox.yml
  job: test
  ref: main
  idleTimeout: 90m
```
Keep provider tokens in environment variables, not repo config (for example
`CRABBOX_SEMAPHORE_TOKEN`, `CRABBOX_SPRITES_TOKEN`, `RUNPOD_API_KEY`,
`E2B_API_KEY`, `DAYTONA_API_KEY`). The full env-var reference, per-provider
sections, and per-command flags are in [docs/cli.md](docs/cli.md),
[Configuration](docs/features/configuration.md), and the
[provider docs](docs/providers/README.md).
## OpenClaw plugin
The repo root is a native OpenClaw plugin package. Once installed, it exposes
Crabbox as agent tools:
- `crabbox_run`, `crabbox_warmup`, `crabbox_status`, `crabbox_list`,
`crabbox_stop`
The plugin shells out to the configured `crabbox` binary with argv arrays, so
local config, broker login, repo claims, and sync behavior stay owned by the
CLI. Set `plugins.entries.crabbox.config.binary` if `crabbox` is not on `PATH`.
Durable run inspection is intentionally CLI/skill-led instead of additional
plugin tools: use `crabbox history`, `crabbox events --after --limit`,
`crabbox attach`, `crabbox logs`, `crabbox results`, and `crabbox usage` from a
shell-capable agent. See [OpenClaw plugin](docs/features/openclaw-plugin.md).
## Development
```sh
# Go CLI
go build -trimpath -o bin/crabbox ./cmd/crabbox
go vet ./...
go test -race ./...
# Cloudflare Worker (Node 22+ locally; CI runs Node 24)
npm ci --prefix worker
npm test --prefix worker
npm run build --prefix worker
# Docs
npm run docs:check
# Optional live smoke, when broker/provider credentials are available
CRABBOX_LIVE=1 CRABBOX_LIVE_REPO=/path/to/my-app scripts/live-smoke.sh
```
CI runs the full gate (gofmt, vet, race tests, all Go modules, coverage
threshold, docs link/build check, GoReleaser snapshot, and Worker
lint/typecheck/tests/build) on every push and PR. Tagged pushes matching `v*`
publish Go archives via GoReleaser and bump the Homebrew formula at
[openclaw/homebrew-tap](https://github.com/openclaw/homebrew-tap).
Worker deployment, required secrets, and DNS routing live in
[docs/infrastructure.md](docs/infrastructure.md).
## Docs
- **Get the model:** [How Crabbox Works](docs/how-it-works.md), [Architecture](docs/architecture.md), [Concepts](docs/concepts.md), [Orchestrator](docs/orchestrator.md)
- **Use the CLI:** [CLI](docs/cli.md), [Commands](docs/commands/README.md), [Features](docs/features/README.md), [Configuration](docs/features/configuration.md)
- **Choose a provider:** [Providers](docs/providers/README.md), [AWS](docs/providers/aws.md), [Azure](docs/providers/azure.md), [GCP](docs/providers/gcp.md), [Hetzner](docs/providers/hetzner.md)
- **Advanced features:** [Actions hydration](docs/features/actions-hydration.md), [Capsules](docs/features/capsules.md), [Checkpoints](docs/features/checkpoints.md), [Jobs](docs/features/jobs.md), [Pond](docs/features/pond.md)
- **Interactive QA:** [Interactive Desktop and VNC](docs/features/interactive-desktop-vnc.md), [Artifacts](docs/features/artifacts.md), [Portal](docs/features/portal.md)
- **Operate it:** [Operations](docs/operations.md), [Observability](docs/observability.md), [Troubleshooting](docs/troubleshooting.md), [Performance](docs/performance.md)
- **Set it up or audit it:** [Infrastructure](docs/infrastructure.md), [Security](docs/security.md), [Getting Started](docs/getting-started.md), [Source Map](docs/source-map.md)
- **Changes:** [CHANGELOG.md](CHANGELOG.md)
The GitHub Pages site is generated from the `docs/` Markdown:
```sh
npm run docs:check
open dist/docs-site/index.html
```
## License
MIT — see [LICENSE](LICENSE).