https://github.com/patrick204nqh/browserctl

Persistent browser automation daemon and CLI for AI agents and developer workflows. Named sessions, Ruby DSL, and a token-efficient snapshot format.
ai-agents browser-automation chrome-devtools-protocol cli developer-tools dsl ferrum headless-browser ruby smoke-testing unix-socket workflow-automation



# browserctl


The browser you delegate to your agents — with a pause button for the parts that still need you.


---

Most browser automation tools tie the browser's lifetime to your script: when the script ends, the browser goes with it. That means re-authenticating, re-navigating, and reloading state on every run. browserctl doesn't restart. The session stays alive between commands, so you pick up exactly where you left off.

```bash
browserd & # start the daemon (headless)
browserctl page open main --url https://example.com/login
browserctl snapshot main # AI-friendly JSON snapshot with ref IDs
browserctl fill main --ref e1 --value me@example.com # interact by ref, no selectors needed
browserctl click main --ref e2
browserctl daemon stop
```

---

## Quick Start

```bash
# 1. Install
gem install browserctl

# 2. Start the daemon
browserd &

# 3. Open a named page
browserctl page open main --url https://moatazeldebsy.github.io/test-automation-practices/#/auth

# 4. Snapshot — returns JSON with a ref ID per interactable element
browserctl snapshot main
# → [{"ref":"e1","tag":"input","attrs":{"data-test":"username-input"}}, {"ref":"e2",...}, {"ref":"e3","tag":"button","text":"Login",...}]

# 5. Interact using the ref IDs from the snapshot
browserctl fill main --ref e1 --value admin
browserctl fill main --ref e2 --value admin
browserctl click main --ref e3

# 6. Observe
browserctl url main
browserctl snapshot main --diff # only what changed

# Session persistence: save now, pick up later
browserctl session save my-session
# On a fresh daemon tomorrow: `browserctl session load my-session`
# → tabs restored, cookies intact, no re-login needed

# 7. Done
browserctl daemon stop
```

→ [Full Getting Started guide](docs/getting-started.md)
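
The snapshot in step 4 is plain JSON, so a script or agent can choose refs by matching on tag, text, or attributes instead of hard-coding `e1`, `e2`, `e3`. The following is a minimal sketch of that lookup, not part of browserctl itself: `find_ref` is a hypothetical helper, and the sample payload only mirrors the shape shown above (the `e2` entry and the empty `attrs` on `e3` are made up for illustration).

```ruby
require "json"

# Hypothetical snapshot payload, shaped like the Quick Start output.
# The e2 entry and the attrs on e3 are filled in for illustration only.
SAMPLE_SNAPSHOT = <<~JSON
  [
    {"ref": "e1", "tag": "input",  "attrs": {"data-test": "username-input"}},
    {"ref": "e2", "tag": "input",  "attrs": {"data-test": "password-input"}},
    {"ref": "e3", "tag": "button", "text": "Login", "attrs": {}}
  ]
JSON

# Return the ref of the first element matching every given criterion, or nil.
def find_ref(elements, tag: nil, text: nil, attrs: {})
  match = elements.find do |el|
    (tag.nil? || el["tag"] == tag) &&
      (text.nil? || el["text"] == text) &&
      attrs.all? { |key, value| el.fetch("attrs", {})[key] == value }
  end
  match && match["ref"]
end

elements = JSON.parse(SAMPLE_SNAPSHOT)
username_ref = find_ref(elements, attrs: { "data-test" => "username-input" })
login_ref = find_ref(elements, tag: "button", text: "Login")
puts "fill --ref #{username_ref}, click --ref #{login_ref}"
```

The refs it returns would then be passed to `browserctl fill main --ref ...` and `browserctl click main --ref ...` as in step 5.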

---

## See it in action

**Terminal**

CLI commands, live output, session persistence proof

*[Demo GIF: browserctl terminal demo]*

**Browser**

What the browser sees as those commands run

*[Demo GIF: browserctl browser demo]*

---

## Use cases

**AI coding agent authenticating into a staging environment** — the agent logs in once, the session persists, subsequent commands run inside the authenticated context without re-authenticating between steps.

**Developer reproducing a multi-step bug report** — navigate to the failure point once, then iterate on the fix with the browser already in the right state; no restarting from the home page each run.

**Automated smoke test that needs human sign-off** — the test runs until it hits something ambiguous, calls `browserctl pause`, lets a human inspect and act, then `browserctl resume` hands control back to the script with all state intact.

---

## Why browserctl?

Most automation tools are stateless — every script spins up a fresh browser and tears it down. browserctl doesn't.

| Capability | browserctl | Playwright / Selenium |
|---|---|---|
| Session persists across commands | ✓ | ✗ (per-script lifecycle) |
| Named page handles | ✓ | ✗ |
| AI-friendly DOM snapshot | ✓ | ✗ |
| Human-in-the-loop pause/resume | ✓ | ✗ |
| Lightweight CLI interface | ✓ | ✗ |
| Full browser automation API | — | ✓ |
| Parallel multi-browser testing | — | ✓ |

**Use browserctl when** you need a browser that stays alive and remembers state — for AI agents, iterative dev workflows, or tasks that mix automation with human judgment.

**Use Playwright/Selenium when** you need parallel test suites, multi-browser support, or a full programmatic API.

---

## Installation

**Requirements:** Ruby >= 3.3 · Chrome or Chromium installed

**macOS (Homebrew — recommended)**

```bash
brew install patrick204nqh/tap/browserctl
```

**RubyGems**

```bash
gem install browserctl
```

Or in your `Gemfile` (for projects using the client API directly):

```ruby
gem "browserctl"
```

---

## Claude Code Plugin

browserctl ships as a Claude Code plugin. Install it once and Claude automatically knows how to use the daemon, ref-based interaction, HITL patterns, and workflow authoring.

**Interactive install**

```
/plugin marketplace add patrick204nqh/browserctl
/plugin install browserctl@browserctl
```

**Project settings** — commit `.claude/settings.json` to share with your team:

```json
{
  "extraKnownMarketplaces": {
    "browserctl": {
      "source": { "source": "github", "repo": "patrick204nqh/browserctl" }
    }
  },
  "enabledPlugins": {
    "browserctl@browserctl": true
  }
}
```

Once installed, the `browserctl` skill loads automatically.

---

## How it works

`browserd` runs as a background process, listening on a Unix socket at `~/.browserctl/browserd.sock`. It manages a Ferrum (Chrome DevTools Protocol) browser instance with named page handles. `browserctl` sends JSON-RPC commands over the socket and prints the result.
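
To make the request/response shape concrete, here is a rough sketch of one JSON-RPC round trip over a Unix socket. It runs against a stand-in server created in a temp directory, not against `browserd`. Several details are assumptions rather than the documented wire format: the JSON-RPC 2.0 envelope, the newline framing, and the method name `example.method`. See the API Stability document for the actual contract.

```ruby
require "json"
require "socket"
require "tmpdir"

# Build a JSON-RPC 2.0 style request. Whether browserd expects exactly this
# envelope is an assumption; the method name used below is hypothetical.
def build_request(id, method, params = {})
  JSON.generate("jsonrpc" => "2.0", "id" => id, "method" => method, "params" => params)
end

# Send one newline-terminated request and read one newline-terminated reply.
# Newline framing is an assumption, not the documented wire format.
def call_daemon(socket_path, request)
  UNIXSocket.open(socket_path) do |sock|
    sock.puts(request)
    JSON.parse(sock.gets)
  end
end

# Stand-in server so the sketch runs without browserd: it echoes the method back.
reply = Dir.mktmpdir do |dir|
  path = File.join(dir, "demo.sock")
  server = UNIXServer.new(path)
  responder = Thread.new do
    client = server.accept
    req = JSON.parse(client.gets)
    client.puts(JSON.generate("jsonrpc" => "2.0", "id" => req["id"],
                              "result" => { "echo" => req["method"] }))
    client.close
  end
  result = call_daemon(path, build_request(1, "example.method"))
  responder.join
  server.close
  result
end

puts reply["result"]["echo"]
```

In everyday use the `browserctl` CLI does this for you; a sketch like this is only relevant if you want to reason about what travels over the socket.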

Start multiple named instances for agent isolation:

```bash
browserd --name agent-a &
browserd --name agent-b &
browserctl --daemon agent-a page open main --url https://app.example.com
```
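
When driving several named daemons from a Ruby script, it can help to wrap the CLI in a small helper so the `--daemon` flag is applied consistently. This is a sketch under assumptions: `browserctl_argv` and `run_browserctl` are hypothetical names, not part of the gem's client API, and the helper assumes the CLI signals failure with a non-zero exit status.

```ruby
require "open3"
require "shellwords"

# Build the argument vector for one browserctl call. When `daemon` is given,
# the --daemon flag goes before the subcommand, as in the shell example above.
def browserctl_argv(*args, daemon: nil)
  argv = ["browserctl"]
  argv += ["--daemon", daemon] if daemon
  argv + args.map(&:to_s)
end

# Run the command and return its combined output; raise on a non-zero exit.
# Not invoked here, so the sketch loads without browserctl installed.
def run_browserctl(*args, daemon: nil)
  output, status = Open3.capture2e(*browserctl_argv(*args, daemon: daemon))
  raise "browserctl failed: #{output}" unless status.success?
  output
end

argv = browserctl_argv("page", "open", "main", "--url", "https://app.example.com",
                       daemon: "agent-a")
puts Shellwords.join(argv)
```

Passing the arguments as an array avoids shell quoting problems with URLs and form values.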

The daemon shuts itself down after 30 minutes of inactivity.

---

## Documentation

| | |
|---|---|
| [Getting Started](docs/getting-started.md) | Install, first session, first snapshot |
| [Agent Integration](docs/guides/agent-integration.md) | Call browserctl from Python, shell, or Anthropic tool-use agents |
| [Concepts](docs/concepts/) | Sessions, snapshots, human-in-the-loop |
| [Guides](docs/guides/) | Writing workflows, handling challenges, smoke testing |
| [Examples](examples/) | Runnable scripts: session reuse, Cloudflare HITL, and more |
| [Command Reference](docs/reference/commands.md) | Every command and flag |
| [API Stability](docs/reference/api-stability.md) | Wire protocol contract and stability zones |
| [CHANGELOG](CHANGELOG.md) | Release history |
| [Product](docs/product.md) | What browserctl is and who it's for |
| [Vision & Roadmap](docs/vision.md) | Philosophy and release roadmap |
| [vs. agent-browser](docs/vs-agent-browser.md) | How browserctl differs from Vercel's agent-browser |

---

## Development

```bash
git clone https://github.com/patrick204nqh/browserctl
cd browserctl
bin/setup # brew bundle (macOS) + bundle install + Chrome check

bundle exec rspec # run tests
bundle exec rubocop # lint

rake demo # full pipeline: screenshots + browser GIF + terminal GIF
rake demo:screenshots # smoke test screenshots only
rake demo:browser_gif # browser animation only (requires: ffmpeg)
rake demo:terminal # terminal GIF only (requires: vhs)
```

> Demo assets are regenerated automatically on every push to `main` that touches `demo/` or the login example.

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) · [SECURITY.md](SECURITY.md)

## License

[MIT](LICENSE)

---

Built by [Patrick](https://github.com/patrick204nqh) — I built this because I was building AI agents that needed authenticated web sessions, and every automation tool I tried restarted the browser between runs.