https://github.com/patrick204nqh/browserctl
Persistent browser automation daemon and CLI for AI agents and developer workflows. Named sessions, Ruby DSL, and a token-efficient snapshot format.
https://github.com/patrick204nqh/browserctl
ai-agents browser-automation chrome-devtools-protocol cli developer-tools dsl ferrum headless-browser ruby smoke-testing unix-socket workflow-automation
Last synced: 21 days ago
JSON representation
Persistent browser automation daemon and CLI for AI agents and developer workflows. Named sessions, Ruby DSL, and a token-efficient snapshot format.
- Host: GitHub
- URL: https://github.com/patrick204nqh/browserctl
- Owner: patrick204nqh
- License: mit
- Created: 2026-04-19T07:44:15.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-04-24T08:59:10.000Z (28 days ago)
- Last Synced: 2026-04-24T23:14:58.073Z (27 days ago)
- Topics: ai-agents, browser-automation, chrome-devtools-protocol, cli, developer-tools, dsl, ferrum, headless-browser, ruby, smoke-testing, unix-socket, workflow-automation
- Language: Ruby
- Homepage:
- Size: 1000 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
Awesome Lists containing this project
README
browserctl
The browser you delegate to your agents — with a pause button for the parts that still need you.
---
Every browser automation tool restarts the browser when your script ends. That means re-authenticating, re-navigating, re-loading state — on every run. browserctl doesn't restart. The session stays alive between commands, so you pick up exactly where you left off.
```bash
browserd & # start the daemon (headless)
browserctl page open main --url https://example.com/login
browserctl snapshot main # AI-friendly JSON snapshot with ref IDs
browserctl fill main --ref e1 --value me@example.com # interact by ref, no selectors needed
browserctl click main --ref e2
browserctl daemon stop
```
---
## Quick Start
```bash
# 1. Install
gem install browserctl
# 2. Start the daemon
browserd &
# 3. Open a named page
browserctl page open main --url https://moatazeldebsy.github.io/test-automation-practices/#/auth
# 4. Snapshot — returns JSON with a ref ID per interactable element
browserctl snapshot main
# → [{"ref":"e1","tag":"input","attrs":{"data-test":"username-input"}}, {"ref":"e2",...}, {"ref":"e3","tag":"button","text":"Login",...}]
# 5. Interact using the ref IDs from the snapshot
browserctl fill main --ref e1 --value admin
browserctl fill main --ref e2 --value admin
browserctl click main --ref e3
# 6. Observe
browserctl url main
browserctl snapshot main --diff # only what changed
# Session persistence: save now, pick up later
browserctl session save my-session
# On a fresh daemon tomorrow: `browserctl session load my-session`
# → tabs restored, cookies intact, no re-login needed
# 7. Done
browserctl daemon stop
```
→ [Full Getting Started guide](docs/getting-started.md)
---
## See it in action
**Terminal**
CLI commands, live output, session persistence proof

**Browser**
What the browser sees as those commands run

---
## Use cases
**AI coding agent authenticating into a staging environment** — the agent logs in once, the session persists, subsequent commands run inside the authenticated context without re-authenticating between steps.
**Developer reproducing a multi-step bug report** — navigate to the failure point once, then iterate on the fix with the browser already in the right state; no restarting from the home page each run.
**Automated smoke test that needs human sign-off** — the test runs until it hits something ambiguous, calls `browserctl pause`, lets a human inspect and act, then `browserctl resume` hands control back to the script with all state intact.
---
## Why browserctl?
Most automation tools are stateless — every script spins up a fresh browser and tears it down. browserctl doesn't.
| Capability | browserctl | Playwright / Selenium |
|---|---|---|
| Session persists across commands | ✓ | ✗ (per-script lifecycle) |
| Named page handles | ✓ | ✗ |
| AI-friendly DOM snapshot | ✓ | ✗ |
| Human-in-the-loop pause/resume | ✓ | ✗ |
| Lightweight CLI interface | ✓ | ✗ |
| Full browser automation API | — | ✓ |
| Parallel multi-browser testing | — | ✓ |
**Use browserctl when** you need a browser that stays alive and remembers state — for AI agents, iterative dev workflows, or tasks that mix automation with human judgment.
**Use Playwright/Selenium when** you need parallel test suites, multi-browser support, or a full programmatic API.
---
## Installation
**Requirements:** Ruby >= 3.3 · Chrome or Chromium installed
**macOS (Homebrew — recommended)**
```bash
brew install patrick204nqh/tap/browserctl
```
**RubyGems**
```bash
gem install browserctl
```
Or in your `Gemfile` (for projects using the client API directly):
```ruby
gem "browserctl"
```
---
## Claude Code Plugin
browserctl ships as a Claude Code plugin. Install it once and Claude automatically knows how to use the daemon, ref-based interaction, HITL patterns, and workflow authoring.
**Interactive install**
```
/plugin marketplace add patrick204nqh/browserctl
/plugin install browserctl@browserctl
```
**Project settings** — commit `.claude/settings.json` to share with your team:
```json
{
"extraKnownMarketplaces": {
"browserctl": {
"source": { "source": "github", "repo": "patrick204nqh/browserctl" }
}
},
"enabledPlugins": {
"browserctl@browserctl": true
}
}
```
Once installed, the `browserctl` skill loads automatically.
---
## How it works
`browserd` runs as a background process, listening on a Unix socket at `~/.browserctl/browserd.sock`. It manages a Ferrum (Chrome DevTools Protocol) browser instance with named page handles. `browserctl` sends JSON-RPC commands over the socket and prints the result.
Start multiple named instances for agent isolation:
```bash
browserd --name agent-a &
browserd --name agent-b &
browserctl --daemon agent-a page open main --url https://app.example.com
```
The daemon shuts itself down after 30 minutes of inactivity.
---
## Documentation
| | |
|---|---|
| [Getting Started](docs/getting-started.md) | Install, first session, first snapshot |
| [Agent Integration](docs/guides/agent-integration.md) | Call browserctl from Python, shell, or Anthropic tool-use agents |
| [Concepts](docs/concepts/) | Sessions, snapshots, human-in-the-loop |
| [Guides](docs/guides/) | Writing workflows, handling challenges, smoke testing |
| [Examples](examples/) | Runnable scripts: session reuse, Cloudflare HITL, and more |
| [Command Reference](docs/reference/commands.md) | Every command and flag |
| [API Stability](docs/reference/api-stability.md) | Wire protocol contract and stability zones |
| [CHANGELOG](CHANGELOG.md) | Release history |
| [Product](docs/product.md) | What browserctl is and who it's for |
| [Vision & Roadmap](docs/vision.md) | Philosophy and release roadmap |
| [vs. agent-browser](docs/vs-agent-browser.md) | How browserctl differs from Vercel's agent-browser |
---
## Development
```bash
git clone https://github.com/patrick204nqh/browserctl
cd browserctl
bin/setup # brew bundle (macOS) + bundle install + Chrome check
bundle exec rspec # run tests
bundle exec rubocop # lint
rake demo # full pipeline: screenshots + browser GIF + terminal GIF
rake demo:screenshots # smoke test screenshots only
rake demo:browser_gif # browser animation only (requires: ffmpeg)
rake demo:terminal # terminal GIF only (requires: vhs)
```
> Demo assets are regenerated automatically on every push to `main` that touches `demo/` or the login example.
---
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) · [SECURITY.md](SECURITY.md)
## License
[MIT](LICENSE)
---
Built by [Patrick](https://github.com/patrick204nqh) — I built this because I was building AI agents that needed authenticated web sessions, and every automation tool I tried restarted the browser between runs.