https://github.com/borenstein/yolo-cage

AI coding agents that can't exfiltrate secrets or merge their own PRs.

Host: GitHub
URL: https://github.com/borenstein/yolo-cage
Owner: borenstein
License: mit
Created: 2026-01-11T04:07:57.000Z (6 months ago)
Default Branch: main
Last Pushed: 2026-01-29T02:01:23.000Z (6 months ago)
Last Synced: 2026-01-29T05:47:35.804Z (6 months ago)
Language: Python
Homepage:
Size: 1.73 MB
Stars: 102
Watchers: 1
Forks: 4
Open Issues: 6
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: docs/security-audit.md

Awesome Lists containing this project

awesome-agent-sandboxes - yolo-cage - Anti-exfiltration sandbox. (Self-hosted / Open Source)
awesome-agent-runtime-security - yolo-cage - branch isolation, a fail-closed mitmproxy egress proxy with LLM-Guard secret scanning and GitHub API operation blocking, TruffleHog pre-push hooks, and Kubernetes NetworkPolicy. | (Sandboxing & Isolation)

README

# yolo-cage: autonomous coding agents that do no harm

You're a responsible engineer. You'd never just let an AI run roughshod over your most sensitive systems and codebases.

That's why you'd **never** just shut off the safeguards for a tool like Claude Code. It asks permission for every dangerous action! Safe!

So you wait. And you answer. Decision fatigue sets in. And that's when it happens.

[Image: Agent deletes entire repo]

Permission prompts neglect the weakest part of the threat model: a tired user. What if we could empower the agent while limiting its blast radius, thus deferring your decisions until PR review?

That would be great! And that would be yolo-cage.

[Image: Escape attempts blocked]

## Try it

```bash
curl -fsSL https://github.com/borenstein/yolo-cage/releases/latest/download/yolo-cage -o yolo-cage
chmod +x yolo-cage && sudo mv yolo-cage /usr/local/bin/
yolo-cage build --interactive --up
```

Then create a [sandbox](docs/glossary.md#sandbox) and start coding:

```bash
yolo-cage create feature-branch
yolo-cage attach feature-branch # Attach to agent in tmux
```

**Prerequisites:** Vagrant with libvirt (Linux) or QEMU (macOS, experimental), 8GB RAM, 4 CPUs, GitHub PAT (`repo` scope), Claude account. See [setup docs](docs/setup.md) for details.

---

## What gets blocked

**Secrets in HTTP/HTTPS** - [egress proxy](docs/glossary.md#egress-proxy) scans request bodies, headers, URLs:
- `sk-ant-*`, `AKIA*`, `ghp_*`, SSH private keys, generic credential patterns
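
To make the pattern list above concrete, here is a minimal sketch of prefix-based secret scanning over a request's URL, headers, and body. It is not yolo-cage's implementation (the project delegates scanning to LLM-Guard). The regexes, `SECRET_PATTERNS`, `find_secrets`, and `allow_request` are hypothetical names and approximations chosen for illustration.

```python
import re

# Hypothetical approximations of the prefixes listed above. A real scanner
# such as LLM-Guard ships its own, more extensive rule set.
SECRET_PATTERNS = {
    "anthropic_key": re.compile(r"sk-ant-[A-Za-z0-9_\-]{8,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "ssh_private_key": re.compile(
        r"-----BEGIN (?:OPENSSH|RSA|EC|DSA) PRIVATE KEY-----"
    ),
}


def find_secrets(url: str, headers: dict[str, str], body: str) -> list[str]:
    """Return the names of all patterns that match anywhere in the request."""
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    haystack = "\n".join([url, *header_lines, body])
    return [
        name
        for name, pattern in SECRET_PATTERNS.items()
        if pattern.search(haystack)
    ]


def allow_request(url: str, headers: dict[str, str], body: str) -> bool:
    """Fail closed: a single match is enough to block the request."""
    return not find_secrets(url, headers, body)
```

Scanning the URL and headers as well as the body matters because a token in a query string or an `Authorization` header leaves the sandbox just as effectively as one in a POST payload.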

**Git operations** - [dispatcher](docs/glossary.md#dispatcher) enforces [branch isolation](docs/glossary.md#branch-isolation):
- Push to any branch except the [assigned branch](docs/glossary.md#assigned-branch)
- `git remote`, `git clone`, `git config`, `git credential`
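
The branch isolation rules above can be sketched as a small allow/deny check on the arguments passed to `git`. This is an illustration under stated assumptions, not the dispatcher's actual code: `check_git_command` is a hypothetical name, flag handling is deliberately simplistic, and requiring an explicit refspec on `push` is a policy choice made for the sketch.

```python
# Subcommands the list above names as blocked outright.
BLOCKED_SUBCOMMANDS = {"remote", "clone", "config", "credential"}


def check_git_command(argv: list[str], assigned_branch: str) -> tuple[bool, str]:
    """Decide whether a git invocation (without the leading 'git') is allowed."""
    # Reject global options such as '-c' so they cannot smuggle in config.
    if not argv or argv[0].startswith("-"):
        return False, "expected a plain git subcommand"

    subcommand, args = argv[0], argv[1:]
    if subcommand in BLOCKED_SUBCOMMANDS:
        return False, f"'git {subcommand}' is blocked"

    if subcommand == "push":
        # Positional arguments are [remote, refspec, ...]; flags are skipped.
        positional = [arg for arg in args if not arg.startswith("-")]
        refspecs = positional[1:]
        if not refspecs:
            return False, "push must name the assigned branch explicitly"
        for refspec in refspecs:
            # 'src:dst' pushes to dst; a bare name pushes to the same branch.
            destination = refspec.split(":", 1)[-1].removeprefix("refs/heads/")
            if destination != assigned_branch:
                return False, f"push to '{destination}' is not allowed"

    return True, "ok"
```

For example, `check_git_command(["push", "origin", "feature-x:main"], "feature-x")` is rejected because the destination side of the refspec is `main`, even though the source is the assigned branch.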

**GitHub CLI** - dispatcher blocks dangerous commands:
- `gh pr merge`, `gh repo delete`, `gh api`

**GitHub API** - proxy blocks at HTTP layer:
- `PUT /repos/*/pulls/*/merge`, `DELETE /repos/*`, webhook modifications
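
As a rough illustration of blocking at the HTTP layer, the rules above can be expressed as (method, path pattern) pairs and matched with shell-style wildcards. `BLOCKED_API_RULES` and `is_blocked_api_call` are hypothetical names, and the two webhook rules are assumed stand-ins for "webhook modifications"; the real proxy's rule set may differ.

```python
from fnmatch import fnmatchcase

# (HTTP method, path pattern). In fnmatch, '*' also matches '/', so
# 'DELETE /repos/*' covers a repository and everything beneath it.
BLOCKED_API_RULES = [
    ("PUT", "/repos/*/pulls/*/merge"),  # merging a pull request
    ("DELETE", "/repos/*"),             # deleting a repository or its resources
    # Assumed stand-ins for "webhook modifications":
    ("POST", "/repos/*/hooks"),
    ("PATCH", "/repos/*/hooks/*"),
]


def is_blocked_api_call(method: str, path: str) -> bool:
    """Return True if the (method, path) pair matches any blocked rule."""
    method = method.upper()
    path = path.split("?", 1)[0]  # ignore the query string
    return any(
        method == rule_method and fnmatchcase(path, rule_pattern)
        for rule_method, rule_pattern in BLOCKED_API_RULES
    )
```

Enforcing this in the proxy as well as in the `gh` wrapper means a raw `curl` call against the API hits the same wall as the CLI.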

**Exfiltration sites**: pastebin.com, file.io, transfer.sh, etc.
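
A domain blocklist like the one above is typically matched on the hostname and its subdomains rather than on substrings. The sketch below assumes that behavior; `BLOCKED_DOMAINS` holds only the three sites named here, and `is_blocked_host` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Only the sites named above; the real list is longer ("etc.").
BLOCKED_DOMAINS = {"pastebin.com", "file.io", "transfer.sh"}


def is_blocked_host(url: str) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return any(
        host == domain or host.endswith("." + domain)
        for domain in BLOCKED_DOMAINS
    )
```

Matching on a leading dot keeps `notpastebin.com` and `profile.io` from being caught by accident while still covering `upload.transfer.sh`.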

See [Architecture](docs/architecture.md) for the full threat model.

---

## How it works

```
┌──────────────────────────────────────────────────────────────────────────┐
│ Runtime (Vagrant VM + MicroK8s) │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ Sandbox │ │
│ │ │ │
│ │ Agent (Claude Code in YOLO mode) │ │
│ │ │ │ │
│ │ ├── git/gh ──▶ Dispatcher ──▶ GitHub │ │
│ │ │ • Branch isolation enforcement │ │
│ │ │ • TruffleHog pre-push scanning │ │
│ │ │ │ │
│ │ └── HTTP/S ──▶ Egress Proxy ──▶ Internet │ │
│ │ • Secret scanning (LLM-Guard) │ │
│ │ • Domain blocklist │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────┘
```

One [sandbox](docs/glossary.md#sandbox) per branch. [Agents](docs/glossary.md#agent) can only push to their [assigned branch](docs/glossary.md#assigned-branch). All outbound traffic is filtered.

---

## CLI

| Command | Description |
|---------|-------------|
| `create <branch>` | Create sandbox |
| `attach <branch>` | Attach (Claude in tmux) |
| `shell <branch>` | Attach (bash) |
| `list` | List sandboxes |
| `delete <branch>` | Delete sandbox |
| `port-forward <branch> <port>` | Forward port from sandbox |
| `up` / `down` | Start/stop VM |
| `upgrade [--rebuild]` | Upgrade to latest version |
| `version` | Show version |

### Port forwarding

Access web apps running inside a [sandbox](docs/glossary.md#sandbox):

```bash
yolo-cage port-forward feature-x 8080 # localhost:8080 → sandbox:8080
yolo-cage port-forward feature-x 9000:3000 # localhost:9000 → sandbox:3000
yolo-cage port-forward feature-x 8080 --bind 0.0.0.0 # LAN accessible
```
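
The two port forms shown above (`8080` and `9000:3000`) follow a `PORT` or `LOCAL:REMOTE` convention. A minimal sketch of parsing that convention, with `parse_port_spec` as a hypothetical helper rather than the CLI's own code:

```python
def parse_port_spec(spec: str) -> tuple[int, int]:
    """Return (local_port, sandbox_port) for 'PORT' or 'LOCAL:REMOTE'."""
    local, separator, remote = spec.partition(":")
    # A bare 'PORT' uses the same number on both sides.
    ports = (int(local), int(remote) if separator else int(local))
    for port in ports:
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    return ports
```

With this convention, `parse_port_spec("9000:3000")` maps local port 9000 to port 3000 inside the sandbox.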

See [Configuration](docs/configuration.md) for proxy bypass, hooks, and resource limits.

---

## Documentation

- **[Glossary](docs/glossary.md)** - Ubiquitous language and terminology
- **[Architecture](docs/architecture.md)** - Threat model, design rationale
- **[Configuration](docs/configuration.md)** - Egress policy, proxy bypass, hooks
- **[Customization](docs/customization.md)** - Adding tools, resource limits
- **[Security Audit](docs/security-audit.md)** - Escape testing guide

---

## Limitations

This reduces risk. It does not eliminate it.

- **DNS exfiltration** - data encoded in DNS queries
- **Timing side channels** - information leaked via response timing
- **Steganography** - secrets hidden in images or binary data
- **Sophisticated encoding** - bypassing pattern matching
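
To see why encoding defeats pattern matching, consider a made-up token checked against an assumed `ghp_` regex before and after Base64 encoding. The pattern and token below are illustrative only; nothing here is taken from yolo-cage's scanner.

```python
import base64
import re

# Assumed pattern for illustration, not the scanner's actual rule.
GITHUB_PAT = re.compile(r"\bghp_[A-Za-z0-9]{36}\b")

fake_token = "ghp_" + "A" * 36  # made-up value, not a real credential
encoded = base64.b64encode(fake_token.encode()).decode()

# The plain token matches; the Base64 form no longer contains 'ghp_'.
plain_detected = bool(GITHUB_PAT.search(f"token={fake_token}"))
encoded_detected = bool(GITHUB_PAT.search(f"token={encoded}"))
```

The encoded string decodes back to the original token on the receiving end, which is why the advice below about scoped credentials still applies.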

Use scoped credentials. Don't use production secrets where exfiltration would be catastrophic. See [Security Audit](docs/security-audit.md) to test it yourself.

---

## License

MIT. See [LICENSE](LICENSE).
