# 🦀 📦 Crabbox

![Crabbox banner](docs/assets/readme-banner.jpg)

[![CI](https://github.com/openclaw/crabbox/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/openclaw/crabbox/actions/workflows/ci.yml)
[![Release](https://github.com/openclaw/crabbox/actions/workflows/release.yml/badge.svg)](https://github.com/openclaw/crabbox/actions/workflows/release.yml)
[![Latest release](https://img.shields.io/github/v/release/openclaw/crabbox?sort=semver)](https://github.com/openclaw/crabbox/releases/latest)

**Warm a box, sync the diff, run the suite.**

Crabbox is a remote software testing and execution control plane for maintainers
and AI agents. Lease fast managed cloud capacity, point at an existing SSH host,
or use an agent sandbox provider — then sync your dirty checkout, run commands
remotely, stream output, collect evidence, and release. Local edit-save-run
loop, cloud-grade compute, agent-ready observability.

```sh
crabbox run -- pnpm test
```

Behind that one command: a Go CLI on your laptop, a Cloudflare Worker broker
that owns provider credentials and lease state, and a managed or delegated
runner.

## How it works

```text
your laptop                 Cloudflare Worker              cloud provider
-------------               ------------------             --------------
crabbox CLI  -- HTTPS -->   Fleet Durable Object   -->     Hetzner / AWS / Azure / GCP
   |                        lease + cost state                  |
   |                                                            |
   +------------ SSH + rsync to leased runner <-----------------+
```

- **CLI** — Go binary. Loads config, mints a per-lease SSH key, asks the broker
for a lease, waits for SSH, seeds remote Git, rsyncs the dirty checkout (with
a fingerprint skip when nothing changed), runs the command, streams output,
releases.
- **Broker** — Cloudflare Worker plus a single Fleet Durable Object. Owns
provider credentials, serializes lease state, enforces active-lease and
monthly spend caps, and expires stale leases by alarm. Auth is GitHub browser
login or a shared bearer token.
- **Runner** — a throwaway machine reachable over SSH on the primary port
(default `2222`) plus configured fallback ports, prepared with Crabbox's
sync/run prerequisites. Linux uses Ubuntu with cloud-init and `/work/crabbox`;
native Windows uses OpenSSH, Git for Windows, and `C:\crabbox`. No broker
credentials live on the box. Project runtimes (Go, Node, Docker, services,
secrets) come from your repo's GitHub Actions hydration, devcontainer, Nix,
mise/asdf, or setup scripts — not from Crabbox.

The data plane — SSH, rsync, command execution — always runs directly from the
CLI to the runner. The broker only manages leases, cost, and observability.
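
The runner description above mentions a primary SSH port plus ordered fallback ports. As a rough illustration of that ordering (not the CLI's actual implementation), the sketch below tries each port in turn and prints the first one that answers. `probe_port` and `first_reachable_port` are hypothetical helper names, and using `ssh-keyscan` as the reachability check is an assumption made for this example.

```sh
# Sketch only: try the primary SSH port, then each fallback port, in order.
# probe_port and first_reachable_port are hypothetical helpers, not crabbox commands.
probe_port() {
  # $1 = host, $2 = port. Treat any returned host key as "sshd is answering".
  [ -n "$(ssh-keyscan -T 5 -p "$2" "$1" 2>/dev/null)" ]
}

first_reachable_port() {
  # $1 = host, remaining args = ports in preference order.
  frp_host=$1
  shift
  for frp_port in "$@"; do
    if probe_port "$frp_host" "$frp_port"; then
      echo "$frp_port"
      return 0
    fi
  done
  return 1  # nothing answered on any configured port
}
```

For example, `first_reachable_port runner.example.com 2222 22` would check the default primary port first and then port 22; the hostname is a placeholder.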

Only `aws`, `azure`, `gcp`, and `hetzner` can be brokered through the Worker,
and even those run direct from the CLI when no broker URL is configured. Every
other provider always runs direct. A direct-provider mode
(`--provider hetzner|aws|azure|gcp|proxmox` with local credentials) exists for
debugging the broker itself or using private infrastructure.
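
The two routing rules in the paragraph above can be restated as a small decision function. This is only a sketch of the stated rules: `coordinator_mode` is a hypothetical helper, and it deliberately ignores any explicit direct-mode override.

```sh
# Sketch only: classify how a run would be routed, per the rules stated above.
# coordinator_mode is a hypothetical helper, not a crabbox command.
coordinator_mode() {
  # $1 = provider name, $2 = broker URL (may be empty)
  case "$1" in
    aws|azure|gcp|hetzner)
      # Brokerable providers still run direct when no broker URL is set.
      if [ -n "${2:-}" ]; then echo brokered; else echo direct; fi
      ;;
    *)
      # Every other provider always runs direct from the CLI.
      echo direct
      ;;
  esac
}

coordinator_mode aws https://broker.example.com      # prints "brokered"
coordinator_mode aws ""                              # prints "direct"
coordinator_mode proxmox https://broker.example.com  # prints "direct"
```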

For the full mental model, see [How Crabbox Works](docs/how-it-works.md). For
the doc-to-code map, see [Source Map](docs/source-map.md).

## Install

```sh
brew install openclaw/tap/crabbox
crabbox --version
```

No Homebrew? Grab a [GoReleaser archive](https://github.com/openclaw/crabbox/releases)
for macOS, Linux, or Windows.

Laptop prerequisites: `git`, `ssh`, `ssh-keygen`, `rsync`, `curl`.

## Quick start

Broker access is deployment-specific. Use a coordinator URL from your team, use
direct-provider mode for a personal cloud account, or self-host the Worker
broker with your own provider credentials and spend caps. See
[Getting started](docs/getting-started.md#choosing-an-access-path) and
[Infrastructure](docs/infrastructure.md#self-hosted-broker-minimum-setup) for the
setup paths.

```sh
# log in once per machine (stores a broker token in user config)
crabbox login --url https://broker.example.com

# verify local prerequisites and broker reachability
crabbox doctor

# one-shot: lease, sync, run, release
crabbox run -- pnpm test

# named repo workflow from .crabbox.yaml
crabbox job run full-ci

# or warm a box once, then reuse it
crabbox warmup # prints cbx_... + a slug
crabbox run --id blue-lobster -- pnpm test:changed
crabbox ssh --id blue-lobster
crabbox stop blue-lobster
```

Every lease has a stable `cbx_...` ID and a friendly crustacean slug
(`blue-lobster`, `swift-hermit`, …). Either works wherever an `--id` is
accepted. Use `--slug` on fresh leases when a specific reusable slug
helps, and `--label` on `run` when the history entry needs a
human-readable name.
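
When scripting the warm-box flow, it can help to guarantee that the lease is stopped even if a command fails. The wrapper below is a minimal sketch using only the `run --id` and `stop` subcommands shown above. `with_warm_box` is a hypothetical name, and the sketch assumes `crabbox run` exits non-zero when the remote command fails.

```sh
# Sketch only: run several commands against one already-warm lease, then stop it.
# with_warm_box is a hypothetical wrapper, not a crabbox command.
with_warm_box() {
  # $1 = lease ID or slug from `crabbox warmup`; remaining args = commands,
  # each passed as a single shell string.
  wwb_id=$1
  shift
  wwb_status=0
  for wwb_cmd in "$@"; do
    # Word splitting of $wwb_cmd is intentional: "pnpm test" becomes two argv entries.
    # shellcheck disable=SC2086
    crabbox run --id "$wwb_id" -- $wwb_cmd || { wwb_status=$?; break; }
  done
  # Always release the box, even when a command failed.
  crabbox stop "$wwb_id"
  return "$wwb_status"
}
```

A call such as `with_warm_box blue-lobster "pnpm lint" "pnpm test:changed"` stops at the first failing command, releases the box, and returns that command's status.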

## Providers

`Coordinator: brokered` providers can run through the Worker (or direct when no
broker is configured); every other provider always runs direct from the CLI.
Targets: **L**inux, **M**acOS, **W**indows.

### SSH-lease providers (provision or connect a box, full lifecycle)

| Provider | `provider:` (aliases) | Targets | Coordinator | Notes |
| --- | --- | --- | --- | --- |
| [AWS EC2](docs/providers/aws.md) | `aws` | L / M / W | brokered | EC2 instances and EC2 Mac; native AMI/EBS checkpoints. |
| [Azure](docs/providers/azure.md) | `azure` | L / W | brokered | VMs with Tailscale support; native Windows and WSL2. |
| [Google Cloud](docs/providers/gcp.md) | `gcp` (`google`, `google-cloud`) | L | brokered | Linux Compute Engine VMs with Tailscale support. |
| [Hetzner Cloud](docs/providers/hetzner.md) | `hetzner` | L | brokered | Linux VMs with desktop/browser/code and Tailscale. |
| [Parallels](docs/providers/parallels.md) | `parallels` | L / M / W | direct | Local or remote macOS host; checkpoint/fork/restore/snapshot. |
| [Proxmox](docs/providers/proxmox.md) | `proxmox` | L | direct | Clone Linux QEMU templates on a private Proxmox VE cluster. |
| [Static SSH](docs/providers/ssh.md) | `ssh` (`static`, `static-ssh`) | L / M / W | direct | Existing machines; no provisioning. |
| [Local Container](docs/providers/local-container.md) | `local-container` (`docker`, `container`, `local-docker`) | L | direct | Local Docker-compatible runtime (Docker Desktop, OrbStack, Colima). |
| [exe.dev](docs/providers/exe-dev.md) | `exe-dev` (`exe`, `exedev`) | L | direct | exe.dev VMs exposed as public SSH leases. |
| [Namespace Devbox](docs/providers/namespace-devbox.md) | `namespace-devbox` (`namespace`, `namespace-devboxes`) | L | direct | Namespace.so Devboxes over SSH. |
| [Semaphore](docs/providers/semaphore.md) | `semaphore` (`sem`) | L | direct | A Semaphore CI job leased as a testbox. |
| [Sprites](docs/providers/sprites.md) | `sprites` | L | direct | Sprites microVMs through `sprite proxy`. |
| [Daytona](docs/providers/daytona.md) | `daytona` | L | direct | Daytona-managed dev sandbox over SSH. |
| [RunPod](docs/providers/runpod.md) | `runpod` (`run-pod`, `runpodio`) | L | direct | RunPod GPU pods with public SSH. |

### Delegated-run providers (sandbox/proof runners, no SSH lease)

| Provider | `provider:` (aliases) | Targets | Notes |
| --- | --- | --- | --- |
| [Cloudflare](docs/providers/cloudflare.md) | `cloudflare` (`cf`) | L | Cloudflare Containers via the Worker runtime. |
| [E2B](docs/providers/e2b.md) | `e2b` | L | E2B Firecracker sandbox. |
| [Islo](docs/providers/islo.md) | `islo` | L | Islo sandbox. |
| [Modal](docs/providers/modal.md) | `modal` | L | Modal Sandbox through the local Python client. |
| [Railway](docs/providers/railway.md) | `railway` (`rail`, `railwayapp`) | L | Redeploy and stream an existing Railway service. |
| [Tensorlake](docs/providers/tensorlake.md) | `tensorlake` (`tl`, `tensorlake-sbx`) | L | Tensorlake Firecracker sandbox via the Tensorlake CLI. |
| [Upstash Box](docs/providers/upstash-box.md) | `upstash-box` (`upstash`, `box`, `upstashbox`) | L | Upstash Box through the Box REST API. |
| [Azure Dynamic Sessions](docs/providers/azure-dynamic-sessions.md) | `azure-dynamic-sessions` | L | Azure Container Apps dynamic sessions. |
| [Blacksmith Testbox](docs/providers/blacksmith-testbox.md) | `blacksmith-testbox` (`blacksmith`) | L | Delegated Blacksmith CI Testbox lifecycle and execution. |
| [W&B Sandboxes](docs/providers/wandb.md) | `wandb` (`weights-and-biases`) | L | Weights & Biases Sandboxes; reuses `wandb login` credentials. |

See [Providers](docs/providers/README.md) for the full reference, capabilities,
and authoring guide.

## Highlights

- **One-shot or warm workspaces.** `crabbox run` for fire-and-forget;
`crabbox warmup` + `--id` for repeated runs against the same box. See
[warmup](docs/commands/warmup.md) and [run](docs/commands/run.md).
- **Named repo jobs.** `crabbox job run` lets repos define warmup,
optional Actions hydration, run command, and cleanup policy in `.crabbox.yaml`.
See [Jobs](docs/features/jobs.md).
- **Local-first workspace sync.** No clean-checkout requirement. Tracked and
nonignored files only, fingerprint skip on no-op runs, sanity checks against
suspicious mass deletions, optional shallow base-ref hydration for
changed-test workflows. See [Sync](docs/features/sync.md).
- **Run observability.** Every coordinator-backed run gets an early `run_...`
handle. Pass that handle to `crabbox attach` while the run is active, to
`crabbox events` for durable lifecycle/output events, and to
`crabbox logs` for retained output after completion. See
[History and logs](docs/features/history-logs.md) and
[Observability](docs/observability.md).
- **GitHub Actions hydration.** `crabbox actions hydrate` runs supported setup
steps from the repo's workflow locally over SSH, so leased boxes get the same
runtimes and tooling without GitHub write access. Use `--github-runner` only
when setup needs full Actions semantics such as repository secrets, OIDC,
service containers, or unsupported `uses:` steps. See
[Actions hydration](docs/features/actions-hydration.md).
- **Failure capsules.** `crabbox capsule from-actions` captures a
failing CI run into a portable, replayable bundle; `capsule replay` reruns it.
See [Capsules](docs/features/capsules.md).
- **Checkpoints.** Save VM-or-workspace state and `restore`/`fork` from it, via
workspace archives or provider-native snapshots/images. See
[Checkpoints](docs/features/checkpoints.md).
- **Pond peer groups.** Leases that share a `--pond` label form an
emergent peer group with discovery (`pond peers`), an SSH-mesh of
`ssh -L` forwards to members' `--expose` ports (`pond connect`), and bulk
`pond release`. See [Pond](docs/features/pond.md).
- **Brokered cloud with cost guardrails.** Maintainers and agents share infra
without sharing provider tokens. Hetzner, AWS, Azure, and Google Cloud are
the managed providers; per-lease and monthly spend caps reject over-budget
leases. Providers fall back across compatible instance families when capacity
or quota rejects a request. `crabbox usage` summarizes spend by user, org,
provider, and type. See [Coordinator](docs/features/coordinator.md),
[Capacity fallback](docs/features/capacity-fallback.md), and
[Cost and usage](docs/features/cost-usage.md).
- **Interactive desktop, browser, and code leases.** `--browser` provisions
Chrome/Chromium for headless automation, `--desktop` provisions a visible UI
with tunnel-only VNC takeover, and `--code` provisions code-server on managed
Linux. `crabbox desktop click/paste/type/key` provide first-class input
helpers; `desktop proof` captures metadata, screenshot, diagnostics, MP4, and
a contact-sheet PNG in one publishable bundle. See
[Interactive desktop and VNC](docs/features/interactive-desktop-vnc.md).
- **Authenticated web portal.** Browser login opens owner-scoped and shared
lease/run views with run logs/events, WebVNC, code-server, and telemetry
charts. `crabbox webvnc`/`crabbox code` bridge a lease into the portal;
`crabbox share` grants a lease to a user or the owning org. See
[Portal](docs/features/portal.md).
- **Agent workspace evidence.** History, logs, events, telemetry, JUnit
summaries, screenshots, recordings, artifacts, and PR publishing make
autonomous work reviewable instead of only ephemeral terminal output. See
[Artifacts](docs/features/artifacts.md) and
[Telemetry](docs/features/telemetry.md).
- **Stable timing records.** `--timing-json` on `run`, `warmup`, and
`actions hydrate` gives scripts one machine-readable sync/command/total
timing schema across providers.
- **Hardened coordinator auth.** GitHub browser login, owner-scoped leases,
admin-only routes, optional GitHub team allowlists, Cloudflare Access JWT
verification, and service-token support keep normal use and operator
automation separate. See [Auth and admin](docs/features/auth-admin.md) and
[Security](docs/security.md).
- **OpenClaw plugin.** The repo root is a native OpenClaw plugin for box
lifecycle operations. See [OpenClaw plugin](#openclaw-plugin) below and
[OpenClaw plugin](docs/features/openclaw-plugin.md).

## Machine classes

`beast` is the default for providers that expose class-based managed capacity.
The providers below fall back across ordered instance-type lists unless `--type`
pins a specific provider-native size.

```text
Hetzner     standard  ccx33, cpx62, cx53
            fast      ccx43, cpx62, cx53
            large     ccx53, ccx43, cpx62, cx53
            beast     ccx63, ccx53, ccx43, cpx62, cx53

AWS Linux   standard  c7a/c7i/m7a/m7i.8xlarge family
            fast      …16xlarge family
            large     …24xlarge family
            beast     …48xlarge family, falling back to 32x/24x/16x

AWS Win     standard  m7i.large, m7a.large, t3.large
            fast      m7i.xlarge, m7a.xlarge, t3.xlarge
            large     m7i.2xlarge, m7a.2xlarge, t3.2xlarge
            beast     m7i.4xlarge, m7a.4xlarge, m7i.2xlarge

AWS WSL2    standard  m8i.large, m8i-flex.large, c8i.large, r8i.large
            fast      m8i.xlarge, m8i-flex.xlarge, c8i.xlarge, r8i.xlarge
            large     m8i.2xlarge, m8i-flex.2xlarge, c8i.2xlarge, r8i.2xlarge
            beast     m8i.4xlarge, m8i-flex.4xlarge, c8i.4xlarge, r8i.4xlarge, m8i.2xlarge

AWS macOS   all       mac2.metal, then mac1.metal unless --type is set

Azure       standard  Standard_D32ads_v6, Standard_D32ds_v6, Standard_F32s_v2, then 16-vCPU fallbacks
            fast      Standard_D64ads_v6, Standard_D64ds_v6, Standard_F64s_v2, then 48/32-vCPU fallbacks
            large     Standard_D96ads_v6, Standard_D96ds_v6, then 64/48-vCPU fallbacks
            beast     Standard_D192ds_v6, Standard_D128ds_v6, then 96/64-vCPU fallbacks

Azure Win/
WSL2        standard  Standard_D2ads_v6, Standard_D2ds_v6, Standard_D2ads_v5, Standard_D2ds_v5, Standard_D2as_v6
            fast      Standard_D4ads_v6, Standard_D4ds_v6, Standard_D4ads_v5, Standard_D4ds_v5, Standard_D4as_v6
            large     Standard_D8ads_v6, Standard_D8ds_v6, Standard_D8ads_v5, Standard_D8ds_v5, Standard_D8as_v6
            beast     Standard_D16ads_v6, Standard_D16ds_v6, Standard_D16ads_v5, Standard_D16ds_v5, Standard_D8ads_v6

Namespace   standard  S
            fast      M
            large     L
            beast     XL

Cloudflare  standard  standard-4
            fast      standard-4
            large     standard-4
            beast     standard-4
```

Override with `--type` or `CRABBOX_SERVER_TYPE` for a specific instance.
Cloudflare also accepts `lite`, `basic`, `standard-1`, `standard-2`, and
`standard-3` as smaller explicit `--type` values; `standard-4` is the default.
Providers without a row either use provider-native capacity settings or reject
class/type selection.
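
To make the class-versus-type behaviour concrete, here is a small sketch that returns the ordered candidate list for a Hetzner class, or a single pinned size when a type override is given. `hetzner_candidates` is a hypothetical helper; the lists are copied from the table above, and the real fallback logic lives in Crabbox itself.

```sh
# Sketch only: resolve the ordered instance types to try for a Hetzner class.
# hetzner_candidates is a hypothetical helper; lists are copied from the table above.
hetzner_candidates() {
  # $1 = class (defaults to beast), $2 = explicit type override (optional).
  if [ -n "${2:-}" ]; then
    echo "$2"  # an explicit type pins one size, so there is no fallback list
    return 0
  fi
  case "${1:-beast}" in
    standard) echo "ccx33 cpx62 cx53" ;;
    fast)     echo "ccx43 cpx62 cx53" ;;
    large)    echo "ccx53 ccx43 cpx62 cx53" ;;
    beast)    echo "ccx63 ccx53 ccx43 cpx62 cx53" ;;
    *)        echo "unknown class: $1" >&2; return 1 ;;
  esac
}

hetzner_candidates large  # prints "ccx53 ccx43 cpx62 cx53"
# Honour CRABBOX_SERVER_TYPE if it happens to be set in the environment.
hetzner_candidates beast "${CRABBOX_SERVER_TYPE:-}"
```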

## Configuration

Config resolves in order: flags → env → repo `.crabbox.yaml` → user
`~/.config/crabbox/config.yaml` → defaults.

```yaml
broker:
  url: https://broker.example.com
  provider: aws
  token: ...
class: beast
capacity:
  market: spot
  strategy: most-available
  fallback: on-demand-after-120s
  hints: true
aws:
  region: eu-west-1
  rootGB: 400
lease:
  idleTimeout: 30m
  ttl: 90m
ssh:
  key: ~/.ssh/id_ed25519
  user: crabbox
  port: "2222"
  # Ordered fallback ports tried after ssh.port; use [] to disable fallback.
  fallbackPorts:
    - "22"
```
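
The resolution order described above (flags, then env, then repo config, then user config, then defaults) amounts to "first non-empty value wins". The sketch below illustrates that idea for a single setting. `first_set` and the `*_class` variables are hypothetical names chosen for this example, and the real CLI may merge layers differently, for instance per key or with explicit empty values.

```sh
# Sketch only: pick the first non-empty value in precedence order.
# first_set is a hypothetical helper, not part of the crabbox CLI.
first_set() {
  for fs_value in "$@"; do
    if [ -n "$fs_value" ]; then
      echo "$fs_value"
      return 0
    fi
  done
  return 1  # every layer was empty
}

# Layers in the documented order: flag, env, repo config, user config, default.
# The variable names and values are illustrative placeholders.
flag_class=""
env_class=""
repo_class="fast"
user_class="large"
first_set "$flag_class" "$env_class" "$repo_class" "$user_class" "beast"  # prints "fast"
```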

Forwarded environment is intentionally narrow: `NODE_OPTIONS` and `CI`. Do not
pass secrets as command-line arguments. For live-secret smoke tests, use
`crabbox run --env-from-profile --allow-env NAME` so Crabbox forwards
only selected names and prints redacted presence/length metadata. For stale warm
boxes, `--full-resync` (alias `--fresh-sync`) resets the remote workdir before
syncing. For larger commands, use `--script` or `--script-stdin` so the
remote runner executes an uploaded file instead of a giant quoted shell string.
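
As an illustration of the `--script-stdin` path, a quoted here-document keeps a multi-line script out of the command line and prevents local variable expansion. This is a sketch that assumes `crabbox run --script-stdin` needs no further arguments. `run_remote_script` is a hypothetical wrapper, and the script body is only an example.

```sh
# Sketch only: send a multi-line script over stdin instead of quoting it inline.
# run_remote_script is a hypothetical wrapper around the documented flag.
run_remote_script() {
  # The quoted 'EOF' delimiter stops the local shell from expanding anything
  # inside the script body before it is sent.
  crabbox run --script-stdin <<'EOF'
set -eu
pnpm install --frozen-lockfile
pnpm test
EOF
}
```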

For binary or terminal-hostile output, use `crabbox run --capture-stdout`
or `--capture-stderr`. Add `--preflight` for a remote capability
snapshot, `--keep-on-failure` to SSH into the exact failed one-shot lease, or
`--download remote=local` to copy a successful-run artifact back. Failed
SSH-backed and Blacksmith delegated runs save local `.crabbox/captures/*.tar.gz`
bundles by default. Captured files are not redacted by Crabbox.

Optional Tailscale reachability for managed Linux leases:

```yaml
tailscale:
  enabled: true
  network: auto
  tags:
    - tag:crabbox
  hostnameTemplate: crabbox-{slug}
  authKeyEnv: CRABBOX_TAILSCALE_AUTH_KEY
  exitNode: mac-studio.example.ts.net
  exitNodeAllowLanAccess: true
```

Tailscale is a network plane, not a provider. `--tailscale` joins new managed
Linux leases to the tailnet; `--network auto|tailscale|public` chooses how SSH
and VNC tunnel commands resolve the host. Brokered mode uses Worker OAuth
secrets to mint one-off keys; direct-provider mode reads the auth key from the
configured env var. See [Tailscale](docs/features/tailscale.md).

A few provider-specific config snippets:

```yaml
# Static macOS or Windows target (existing machine, no provisioning)
provider: ssh
target: windows
windows:
  mode: normal # or wsl2
static:
  host: win-dev.local
  user: alice
  port: "22"
  workRoot: C:\crabbox
```

```yaml
# Local container (alias: docker; works with OrbStack as the active context)
provider: local-container
localContainer:
  runtime: docker
  image: debian:bookworm
  workRoot: /work/crabbox
```

```yaml
# Delegated Blacksmith CI Testbox
provider: blacksmith-testbox
blacksmith:
  org: example-org
  workflow: .github/workflows/ci-check-testbox.yml
  job: test
  ref: main
  idleTimeout: 90m
```

Keep provider tokens in environment variables, not repo config (for example
`CRABBOX_SEMAPHORE_TOKEN`, `CRABBOX_SPRITES_TOKEN`, `RUNPOD_API_KEY`,
`E2B_API_KEY`, `DAYTONA_API_KEY`). The full env-var reference, per-provider
sections, and per-command flags are in [docs/cli.md](docs/cli.md),
[Configuration](docs/features/configuration.md), and the
[provider docs](docs/providers/README.md).

## OpenClaw plugin

The repo root is a native OpenClaw plugin package. Once installed, it exposes
Crabbox as agent tools:

- `crabbox_run`, `crabbox_warmup`, `crabbox_status`, `crabbox_list`,
`crabbox_stop`

The plugin shells out to the configured `crabbox` binary with argv arrays, so
local config, broker login, repo claims, and sync behavior stay owned by the
CLI. Set `plugins.entries.crabbox.config.binary` if `crabbox` is not on `PATH`.

Durable run inspection is intentionally CLI/skill-led instead of additional
plugin tools: use `crabbox history`, `crabbox events --after --limit`,
`crabbox attach`, `crabbox logs`, `crabbox results`, and `crabbox usage` from a
shell-capable agent. See [OpenClaw plugin](docs/features/openclaw-plugin.md).

## Development

```sh
# Go CLI
go build -trimpath -o bin/crabbox ./cmd/crabbox
go vet ./...
go test -race ./...

# Cloudflare Worker (Node 22+ locally; CI runs Node 24)
npm ci --prefix worker
npm test --prefix worker
npm run build --prefix worker

# Docs
npm run docs:check

# Optional live smoke, when broker/provider credentials are available
CRABBOX_LIVE=1 CRABBOX_LIVE_REPO=/path/to/my-app scripts/live-smoke.sh
```

CI runs the full gate (gofmt, vet, race tests, all Go modules, coverage
threshold, docs link/build check, GoReleaser snapshot, and Worker
lint/typecheck/tests/build) on every push and PR. Tagged pushes matching `v*`
publish Go archives via GoReleaser and bump the Homebrew formula at
[openclaw/homebrew-tap](https://github.com/openclaw/homebrew-tap).

Worker deployment, required secrets, and DNS routing live in
[docs/infrastructure.md](docs/infrastructure.md).

## Docs

- **Get the model:** [How Crabbox Works](docs/how-it-works.md), [Architecture](docs/architecture.md), [Concepts](docs/concepts.md), [Orchestrator](docs/orchestrator.md)
- **Use the CLI:** [CLI](docs/cli.md), [Commands](docs/commands/README.md), [Features](docs/features/README.md), [Configuration](docs/features/configuration.md)
- **Choose a provider:** [Providers](docs/providers/README.md), [AWS](docs/providers/aws.md), [Azure](docs/providers/azure.md), [GCP](docs/providers/gcp.md), [Hetzner](docs/providers/hetzner.md)
- **Advanced features:** [Actions hydration](docs/features/actions-hydration.md), [Capsules](docs/features/capsules.md), [Checkpoints](docs/features/checkpoints.md), [Jobs](docs/features/jobs.md), [Pond](docs/features/pond.md)
- **Interactive QA:** [Interactive Desktop and VNC](docs/features/interactive-desktop-vnc.md), [Artifacts](docs/features/artifacts.md), [Portal](docs/features/portal.md)
- **Operate it:** [Operations](docs/operations.md), [Observability](docs/observability.md), [Troubleshooting](docs/troubleshooting.md), [Performance](docs/performance.md)
- **Set it up or audit it:** [Infrastructure](docs/infrastructure.md), [Security](docs/security.md), [Getting Started](docs/getting-started.md), [Source Map](docs/source-map.md)
- **Changes:** [CHANGELOG.md](CHANGELOG.md)

The GitHub Pages site is generated from the `docs/` Markdown:

```sh
npm run docs:check
open dist/docs-site/index.html
```

## License

MIT — see [LICENSE](LICENSE).