https://github.com/wan0net/sharkcage
Trust layer and sandboxing for OpenClaw — per-skill kernel sandboxing, capability model, tool interceptors
- Host: GitHub
- URL: https://github.com/wan0net/sharkcage
- Owner: wan0net
- Created: 2026-03-29T05:45:51.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-30T08:28:54.000Z (3 months ago)
- Last Synced: 2026-03-30T08:32:04.266Z (3 months ago)
- Language: TypeScript
- Size: 494 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# sharkcage
OpenClaw, but you trust it.
Sharkcage registers as OpenClaw's **sandbox backend**, wrapping every AI-directed tool call with `srt` (Anthropic Sandbox Runtime, abbreviated ASRT below). Every bash command, file read/write, and skill execution is sandboxed using built-in OS kernel primitives. Capabilities approved once at install become the baseline policy, and later scope expansion is explicit and audited.
> **Development status:** Sharkcage is still under active development. The core security model, test coverage, and install path are in much better shape now, but releases can still change quickly and operational edges are still being tightened. Treat it as a serious early-stage project, not a fully settled platform.
>
> If you deploy it, prefer a disposable VM, low-privilege accounts, revocable tokens, and a setup you can rebuild quickly.
>
> **No new sandboxing tech.** Sharkcage uses the same battle-tested OS primitives that Flatpak, Snap, and Chrome have relied on for years: [bubblewrap](https://github.com/containers/bubblewrap) + seccomp on Linux and Seatbelt (sandbox-exec) on macOS, both wrapped by Anthropic's [srt](https://github.com/anthropic-experimental/sandbox-runtime). These are proven, kernel-enforced boundaries, not a custom sandbox or a JS shim.
>
> Built by an unprofessional security engineer who got tired of `--dangerously-skip-permissions`. Vibe coded with AI, hardened by a human who kept asking "but what if..." until the sandbox actually held up. Three automated security review passes, Trivy and Semgrep on every build, every finding discussed before fixing. The security model wasn't designed top-down — it was discovered bottom-up by trying things, watching them break, and deciding what actually matters.
>
> Ubuntu 24.04+ note: if AppArmor is still restricting unprivileged user namespaces, secure startup will fail closed even when `bubblewrap` and `srt` are installed. The installer checks for this and points to the exact sysctl fix in [INSTALL.md](INSTALL.md).
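The Ubuntu check described above can be approximated with a small preflight. This is only a sketch: it assumes the restriction in question is Ubuntu's `kernel.apparmor_restrict_unprivileged_userns` sysctl, and the function names (`readSysctl`, `preflight`) are hypothetical, not sharkcage's installer code. INSTALL.md remains the authority for the actual fix.

```typescript
import { readFileSync } from "node:fs";

// Assumption: this is the sysctl the Ubuntu note refers to.
const SYSCTL_PATH = "/proc/sys/kernel/apparmor_restrict_unprivileged_userns";

// Read the raw sysctl value, or null when the file does not exist
// (for example on macOS or on distributions without this knob).
function readSysctl(path: string = SYSCTL_PATH): string | null {
  try {
    return readFileSync(path, "utf8");
  } catch {
    return null;
  }
}

interface PreflightResult {
  ok: boolean;
  message: string;
}

// Interpret the value: "1" means unprivileged user namespaces are restricted,
// which would stop bubblewrap from creating its sandbox.
function preflight(raw: string | null): PreflightResult {
  if (raw !== null && raw.trim() === "1") {
    return {
      ok: false,
      message: "AppArmor restricts unprivileged user namespaces; see INSTALL.md",
    };
  }
  return { ok: true, message: "no AppArmor user namespace restriction detected" };
}
```

Calling `preflight(readSysctl())` gives a quick hint, but a passing result is not proof that the sandbox will launch, which is why the README describes a real smoke test at startup.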
## Security Model
```
OpenClaw + sharkcage plugin
  │
  ├── Per-tool ASRT sandboxing (sandbox backend)
  │     Every bash/exec/file tool call the AI makes:
  │       srt --settings /bin/sh -c
  │     Kernel-enforced filesystem + network restrictions per command
  │
  ├── Per-skill ASRT sandboxing (supervisor)
  │     Each skill runs in its own srt sandbox:
  │       srt --settings node
  │     Scoped to approved capabilities only
  │
  ├── Capability enforcement (before_tool_call hook)
  │     Unapproved skill? → native channel approval (AI cannot see it)
  │     Approved? → route to supervisor for sandboxed execution
  │
  ├── Localhost proxy (SOCKS5 on :18800)
  │     Per-skill tokens, blocks unapproved localhost access
  │
  └── Audit log
        Hash-chained local audit log, rotated and health-checked
```
- **Per-tool sandboxing** — the sandbox backend wraps every AI-directed command with `srt`. The AI's bash commands and file operations run inside per-session ASRT policies with restricted filesystem and network access. The gateway process itself runs unsandboxed — it only serves deterministic chat server code.
- **Per-skill sandboxing** — each skill gets its own ASRT config derived from approved capabilities. Skills cannot reach each other's hosts or files.
- **Approval flow** — uses OpenClaw's native `requireApproval` so the human sees approval prompts in their chat channel but the AI never does.
- **Approve once, enforce always** — install-time approvals become the baseline policy, and later scope expansion is explicit, persisted, and audited. No per-action runtime nagging.
- **Tamper-evident local audit trail** — tool and proxy events are written to a hash-chained local log with rotation and integrity checks.
- **Fail closed on unsupported hosts** — startup runs a real sandbox smoke test, not just `srt --version`, and refuses to enter secure mode when the host cannot actually launch sandboxed workers.
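To illustrate what "hash-chained" means for the audit trail, here is a minimal sketch of the general technique. The entry shape, the genesis value, and the function names are assumptions made for illustration; sharkcage's real log format and rotation logic may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape, not sharkcage's actual schema.
interface AuditEntry {
  seq: number;
  event: string; // e.g. "tool_call" or "proxy_block"
  detail: string;
  prevHash: string; // hash of the previous entry, or GENESIS for the first
  hash: string; // SHA-256 over seq, event, detail and prevHash
}

const GENESIS = "0".repeat(64);

function hashEntry(seq: number, event: string, detail: string, prevHash: string): string {
  return createHash("sha256")
    .update(JSON.stringify([seq, event, detail, prevHash]))
    .digest("hex");
}

// Append an entry whose hash covers the previous entry's hash.
function appendEntry(log: AuditEntry[], event: string, detail: string): AuditEntry {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS;
  const seq = log.length;
  const entry: AuditEntry = {
    seq,
    event,
    detail,
    prevHash,
    hash: hashEntry(seq, event, detail, prevHash),
  };
  log.push(entry);
  return entry;
}

// Returns the index of the first entry that fails verification, or -1 if intact.
function verifyChain(log: AuditEntry[]): number {
  let prevHash = GENESIS;
  for (let i = 0; i < log.length; i++) {
    const e = log[i];
    const expected = hashEntry(e.seq, e.event, e.detail, e.prevHash);
    if (e.seq !== i || e.prevHash !== prevHash || e.hash !== expected) {
      return i;
    }
    prevHash = e.hash;
  }
  return -1;
}
```

Because each hash covers the previous one, editing or deleting an entry in the middle of the log invalidates every later link. A chain like this is tamper-evident rather than tamper-proof: someone who can rewrite the whole file can recompute it, so the latest hash is worth anchoring somewhere else.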
## Quick Start
```bash
# One-line install
curl -fsSL https://raw.githubusercontent.com/wan0net/sharkcage/main/install.sh | bash
# Or install + configure a server setup non-interactively
# (the variable is set on the bash side of the pipe so the installer, not curl, receives it)
curl -fsSL https://raw.githubusercontent.com/wan0net/sharkcage/main/install.sh | \
OPENROUTER_API_KEY=your-key-here bash -s -- --configure --mode full --service-user openclaw
# Then
sc init # setup wizard (configures OpenClaw + sandbox mode)
sc start # start everything
```
See [INSTALL.md](INSTALL.md) for full installation instructions.
## CLI
```
sc start                       Start supervisor + OpenClaw
sc stop                        Stop everything
sc init                        First-time setup wizard
sc init --non-interactive ...  Server/automation-friendly setup path
sc status                      Show sandbox state, uptime, skill stats
sc skill add                   Install a skill
sc skill list                  List installed skills
sc skill remove                Remove a skill
sc skill infer                 Infer capabilities from skill source
sc approve                     Review and approve skill capabilities
sc verify                      Scan a skill for issues
sc sign                        Sign a skill with your key
sc config show                 Show sharkcage config
sc config add-service          Add a host to the allowed services
sc config remove-service       Remove a host from allowed services
sc audit                       Show recent audit log entries
sc audit --skill               Filter by skill
sc audit --blocked             Show only blocked calls
sc audit --tool                Filter by tool name
sc audit --tail                Show last N entries
sc user copy-in [--mode]       Copy files into dedicated user's home
sc user shell                  Open shell as the dedicated user
sc user home                   Print dedicated user home directory
sc user info                   Show dedicated user details
sc trust                       Trust a skill signer
sc upgrade                     Safely upgrade OpenClaw with rollback
```
## Capability Model
Capabilities are approved once at install and enforced at the kernel level from then on. No runtime prompts. No fatigue. No `--dangerously-skip-permissions`.
When you install a skill, sharkcage:
1. Downloads it
2. Scans for dangerous patterns and missing fields
3. Generates a capability manifest (via static analysis if the skill has none)
4. Shows requested capabilities with risk levels
5. Asks you to approve
After approval, the skill runs in its own ASRT sandbox scoped to exactly what was approved. If it tries to reach a host outside its scope, the kernel-enforced sandbox blocks the connection without prompting, and the attempt is recorded in the audit log.
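The scoping rule can be pictured as a plain policy check. The sketch below is illustrative only: the `ApprovedCapabilities` shape and both function names are hypothetical, and real enforcement happens in the kernel through `srt`, not in user-space code like this.

```typescript
// Hypothetical manifest shape, not sharkcage's actual schema.
interface ApprovedCapabilities {
  skill: string;
  allowedHosts: string[]; // exact hostnames approved at install
  readPaths: string[]; // directories the skill may read
}

// A host is in scope only if it exactly matches an approved hostname.
function isHostAllowed(caps: ApprovedCapabilities, host: string): boolean {
  const wanted = host.toLowerCase();
  return caps.allowedHosts.some((h) => h.toLowerCase() === wanted);
}

// A path is in scope only if, after resolving "." and "..", it is an
// approved directory or lies underneath one.
function isReadAllowed(caps: ApprovedCapabilities, path: string): boolean {
  const parts: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") parts.pop();
    else parts.push(seg);
  }
  const normalized = "/" + parts.join("/");
  return caps.readPaths.some((p) => {
    const root = p.endsWith("/") ? p.slice(0, -1) : p;
    return normalized === root || normalized.startsWith(root + "/");
  });
}

// Example manifest for an imaginary "weather" skill.
const weather: ApprovedCapabilities = {
  skill: "weather",
  allowedHosts: ["api.weather.example"],
  readPaths: ["/home/openclaw/skills/weather"],
};
```

With this manifest, `isHostAllowed(weather, "api.weather.example")` passes while any other host is refused, and a path such as `/home/openclaw/skills/weather/../../.ssh/id_rsa` is rejected because it resolves outside the approved directory.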
## Platform Support
| Platform | Sandbox | How |
|----------|---------|-----|
| macOS | Seatbelt (sandbox-exec) | Native via `srt` |
| Linux | bubblewrap + seccomp | Native via `srt` |
| Windows | bubblewrap + seccomp | Via WSL2 — run OpenClaw inside WSL2 |
`srt` (Anthropic Sandbox Runtime) provides kernel-level enforcement on all three. On Windows, WSL2 gives you a real Linux kernel, so the same bubblewrap+seccomp sandbox works identically.
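The table above maps onto Node's platform identifiers roughly as follows. This is a sketch of the mapping only; `sandboxFor` and the `SandboxBackend` type are hypothetical names, not part of sharkcage or `srt`.

```typescript
type SandboxBackend =
  | { supported: true; mechanism: string }
  | { supported: false; hint: string };

// Takes a value in the style of Node's process.platform.
function sandboxFor(platform: string): SandboxBackend {
  switch (platform) {
    case "darwin":
      return { supported: true, mechanism: "Seatbelt (sandbox-exec)" };
    case "linux":
      // Also covers WSL2, where Node reports the platform as "linux".
      return { supported: true, mechanism: "bubblewrap + seccomp" };
    case "win32":
      return { supported: false, hint: "run OpenClaw inside WSL2" };
    default:
      return { supported: false, hint: `no sandbox mapping for ${platform}` };
  }
}
```

In a Node process, `sandboxFor(process.platform)` would pick the row for the current host. Native Windows is treated as unsupported here because the table routes it through WSL2.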
## How This Was Built
Yes, this was vibe coded. An AI wrote most of the implementation while a human who understands security kept asking "but what if..." until the answers were honest. Multiple automated security review passes, Trivy and Semgrep on every build, every finding discussed before fixing — some were real vulnerabilities, some were the sandbox already doing its job, and knowing the difference mattered more than fixing everything blindly.
The security model wasn't designed top-down. It was discovered bottom-up by trying things, watching them break, understanding why, and deciding what actually matters. The original design had a full outer sandbox wrapping the entire OpenClaw binary. In practice it broke inbound connections, IPC, and FD inheritance. The per-tool model was already doing the real work. That's not a bug in the process — that's how you find out what works.
## Documentation
- [INSTALL.md](INSTALL.md) — Installation and setup
- [docs/unified-platform.md](docs/unified-platform.md) — Full design doc: architecture, capability model, sandbox enforcement, security model