https://github.com/idan-rubin/browserclaw
AI-friendly browser automation with snapshot + ref targeting. Standalone wrapper around OpenClaw's browser module.
accessibility ai ai-agent browser-automation cdp chrome llm openclaw playwright snapshot typescript web-scraping
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/idan-rubin/browserclaw
- Owner: idan-rubin
- License: mit
- Created: 2026-02-08T19:16:06.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-02-16T22:05:19.000Z (4 months ago)
- Last Synced: 2026-02-17T04:53:22.900Z (4 months ago)
- Topics: accessibility, ai, ai-agent, browser-automation, cdp, chrome, llm, openclaw, playwright, snapshot, typescript, web-scraping
- Language: TypeScript
- Size: 912 KB
- Stars: 6
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-playwright - BrowserClaw - AI browser automation via accessibility snapshots and ref targeting, built on Playwright. (Utils)
README
# BrowserClaw: Standalone OpenClaw browser module
Extracted and refined from [OpenClaw](https://github.com/openclaw/openclaw)'s browser automation module. A standalone, typed library for AI-friendly browser control with **snapshot + ref targeting**: no CSS selectors, no XPath, no vision, just numbered refs that map to interactive elements.
```typescript
import { BrowserClaw } from 'browserclaw';
const browser = await BrowserClaw.launch({ headless: false });
const page = await browser.open('https://example.com');
// Snapshot: the core feature
const { snapshot, refs } = await page.snapshot();
// snapshot: AI-readable text tree
// refs: { "e1": { role: "link", name: "More info" }, "e2": { role: "button", name: "Submit" } }
await page.click('e1'); // Click by ref
await page.type('e3', 'hello'); // Type by ref
await browser.stop();
```
## Why browserclaw?
Most browser automation tools were built for humans writing test scripts. AI agents need something different:
- **Vision-based tools** (screenshot → click coordinates) are slow, expensive, and probabilistic
- **Selector-based tools** (CSS/XPath) are brittle and meaningless to an LLM
- **browserclaw** gives the AI a **text snapshot** with numbered refs: the AI reads text (what it's best at) and returns a ref ID (deterministic targeting)
The snapshot + ref pattern means:
1. **Deterministic**: refs resolve to exact elements via Playwright locators, no guessing
2. **Fast**: text snapshots are tiny compared to screenshots
3. **Cheap**: no vision API calls, just text in, text out
4. **Reliable**: built on Playwright, a mature automation engine with auto-waiting locators
## Comparison with Other Tools
The AI browser automation space is moving fast. Here's how browserclaw compares to the major alternatives.
| | [browserclaw](https://github.com/idan-rubin/browserclaw) | [browser-use](https://github.com/browser-use/browser-use) | [Stagehand](https://github.com/browserbase/stagehand) | [Skyvern](https://github.com/Skyvern-AI/skyvern) | [Playwright MCP](https://github.com/microsoft/playwright-mcp) |
|:---|:---:|:---:|:---:|:---:|:---:|
| Ref → exact element, no guessing | :white_check_mark: | :heavy_minus_sign: | :x: | :x: | :white_check_mark: |
| No vision model in the loop | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :x: | :white_check_mark: |
| Survives redesigns (semantic, not pixel) | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :x: | :white_check_mark: |
| Fill 10 form fields in one call | :white_check_mark: | :x: | :x: | :x: | :x: |
| Interact with cross-origin iframes | :white_check_mark: | :white_check_mark: | :x: | :x: | :x: |
| Playwright engine (auto-wait, locators) | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Embeddable in your own agent loop | :white_check_mark: | :x: | :heavy_minus_sign: | :x: | :x: |
:white_check_mark: = Yes · :heavy_minus_sign: = Partial · :x: = No
**Of the tools compared here, browserclaw is the only one that checks every box.** It combines the precision of accessibility snapshots with Playwright's battle-tested engine, batch operations, cross-origin iframe access, and zero framework lock-in, all in a single embeddable library.
### How each tool works under the hood
- **browserclaw**: Accessibility snapshot with numbered refs → Playwright locator (`aria-ref` in default mode, `getByRole()` in role mode). One ref, one element. No vision model, no LLM in the targeting loop.
- **browser-use**: DOM element indexing via raw CDP + optional screenshots. [Dropped Playwright](https://browser-use.com/posts/playwright-to-cdp) to go "closer to the metal": fast, but it now reinvents auto-wait, retry logic, and cross-browser support from scratch.
- **Stagehand**: Accessibility tree + natural language primitives (`page.act("click login")`). Convenient, but the LLM re-interprets which element to target on every single call, so targeting is non-deterministic by design.
- **Skyvern**: Vision-first. Screenshots are sent to a vision LLM that guesses coordinates. A multi-agent architecture (Planner/Actor/Validator) adds self-correction, but at significant cost and latency.
- **Playwright MCP**: Same snapshot philosophy as browserclaw, but locked to the MCP protocol. Great for chat-based agents, but not embeddable as a library: you can't compose it into your own agent loop or call it from application code.
**Also in the space:** [LaVague](https://github.com/lavague-ai/LaVague) (generates Selenium code via RAG on HTML), [AgentQL](https://github.com/tinyfish-io/agentql) (semantic query language for the DOM), [Vercel agent-browser](https://github.com/vercel-labs/agent-browser) (element refs like `@e1`, a similar ref-based approach).
### Why this matters for repeated complex UI tasks
When you're running the same multi-step workflow hundreds of times β filling forms, navigating dashboards, processing queues β the differences compound:
- **Cost**: ~4x fewer tokens per run than vision-based tools. A 20-step task repeated 100 times: ~3M tokens vs ~12M+.
- **Speed**: No vision API round-trips. A 20-step workflow finishes in seconds, not minutes.
- **Reliability**: Ref-based targeting is deterministic. Same page state → same refs → same result. No coordinate guessing, no LLM re-interpretation.
- **Simplicity**: No framework opinions, no agent loop, no hosted platform. Just `snapshot()` → read refs → act. Compose it into whatever agent architecture you want.
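The cost figures above are approximate. As a quick check of the arithmetic, the sketch below back-calculates the per-step budgets they imply: roughly 1,500 tokens per step for text snapshots and 6,000 per step for vision-based tools. Both per-step values are assumptions derived from the ~3M and ~12M totals, not measurements.

```typescript
// Back-of-the-envelope token math for a repeated workflow.
// ASSUMPTION: the per-step token counts are back-calculated from the
// ~3M vs ~12M totals quoted above; they are not measured values.
interface Workload {
  stepsPerRun: number;
  runs: number;
}

function totalTokens(w: Workload, tokensPerStep: number): number {
  return w.stepsPerRun * w.runs * tokensPerStep;
}

const workload: Workload = { stepsPerRun: 20, runs: 100 };
const TEXT_SNAPSHOT_TOKENS_PER_STEP = 1500; // assumed
const VISION_TOKENS_PER_STEP = 6000; // assumed

const textTotal = totalTokens(workload, TEXT_SNAPSHOT_TOKENS_PER_STEP);
const visionTotal = totalTokens(workload, VISION_TOKENS_PER_STEP);

console.log(textTotal); // 3000000
console.log(visionTotal); // 12000000
console.log(visionTotal / textTotal); // 4
```

Plugging in your own measured per-step token counts gives a more realistic estimate for a specific site.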
## Install
```bash
npm install browserclaw
```
Requires a Chromium-based browser installed on the system (Chrome, Brave, Edge, or Chromium). browserclaw auto-detects your installed browser, so there is no need to install Playwright browsers separately.
## How It Works
```
┌─────────────┐   snapshot()    ┌─────────────────────────────────┐
│  Web Page   │ ──────────────► │  AI-readable text tree          │
│             │                 │                                 │
│  [buttons]  │                 │  - heading "Example Domain"     │
│  [links]    │                 │  - paragraph "This domain..."   │
│  [inputs]   │                 │  - link "More information" [e1] │
└─────────────┘                 └──────────────┬──────────────────┘
                                               │
                                      AI reads snapshot,
                                       decides: click e1
                                               │
┌─────────────┐   click('e1')   ┌──────────────▼──────────────────┐
│  Web Page   │ ◄────────────── │  Ref "e1" resolves to a         │
│  (navigated)│                 │  Playwright locator: one ref,   │
│             │                 │  one exact element              │
└─────────────┘                 └─────────────────────────────────┘
```
1. **Snapshot** a page to get an AI-readable text tree with numbered refs (`e1`, `e2`, `e3`...)
2. **AI reads** the snapshot text and picks a ref to act on
3. **Actions target refs**: browserclaw resolves each ref to a Playwright locator and executes the action
> **Note:** Refs are scoped to the snapshot that created them. After navigation or DOM changes, old refs become invalid, and actions that use them fail with an error (a timeout in aria mode, `"Unknown ref"` in role mode). Always re-snapshot before acting on a changed page.
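In an agent loop, this rule means taking a fresh snapshot before every decision instead of reusing refs across steps. The sketch below relies only on the `snapshot()` and `click()` shapes documented in the API section. `SnapshotPage`, `Decide`, and `runAgent` are hypothetical names, and `decide` stands in for an LLM call.

```typescript
// Hypothetical structural type covering the two documented methods used here.
type RefMap = Record<string, { role: string; name?: string }>;

interface SnapshotPage {
  snapshot(): Promise<{ snapshot: string; refs: RefMap }>;
  click(ref: string): Promise<void>;
}

// Hypothetical decision function (e.g. an LLM call): returns a ref to click,
// or null when the task is finished.
type Decide = (snapshot: string, refs: RefMap) => Promise<string | null>;

// Returns the number of clicks performed before `decide` signalled completion.
async function runAgent(page: SnapshotPage, decide: Decide, maxSteps = 20): Promise<number> {
  for (let step = 0; step < maxSteps; step++) {
    // Re-snapshot on every iteration: refs from an earlier snapshot may be stale.
    const { snapshot, refs } = await page.snapshot();
    const ref = await decide(snapshot, refs);
    if (ref === null) return step;
    // Reject refs the model invented or carried over from an old snapshot.
    if (!(ref in refs)) {
      throw new Error(`Ref ${ref} is not in the current snapshot`);
    }
    await page.click(ref);
  }
  throw new Error(`Gave up after ${maxSteps} steps`);
}
```

A page returned by `browser.open(...)` should fit this shape structurally, so the same loop could be called as `await runAgent(page, decide)`; that compatibility is an assumption based on the documented method signatures.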
## API
### Launch & Connect
```typescript
import { BrowserClaw } from 'browserclaw';

// Launch a new Chrome instance (auto-detects Chrome/Brave/Edge/Chromium)
const browser = await BrowserClaw.launch({
  headless: false,                   // default: false (visible window)
  // executablePath: '...',          // optional: specific browser path
  cdpPort: 9222,                     // default: 9222
  noSandbox: false,                  // default: false (set true for Docker/CI)
  // userDataDir: '...',             // optional: custom user data directory
  profileName: 'browserclaw',        // profile name in Chrome title bar
  profileColor: '#FF4500',           // profile accent color (hex)
  chromeArgs: ['--start-maximized'], // additional Chrome flags
});

// Or connect to an already-running Chrome instance
// (started with: chrome --remote-debugging-port=9222)
const remote = await BrowserClaw.connect('http://localhost:9222');
```
`connect()` checks that Chrome is reachable; the internal CDP connection then retries 3 times with increasing timeouts (5 s, 7 s, 9 s), which makes it safe for Docker/CI where Chrome starts slowly.
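The retry schedule described above (three attempts, each timeout 2 s longer than the last, starting at 5 s) is a linear backoff. This is an illustrative sketch of that schedule, not browserclaw's internal code; `retryTimeouts` and `withRetries` are hypothetical helpers.

```typescript
// Linear backoff: attempt 0 -> 5000 ms, attempt 1 -> 7000 ms, attempt 2 -> 9000 ms.
function retryTimeouts(attempts = 3, baseMs = 5000, stepMs = 2000): number[] {
  return Array.from({ length: attempts }, (_, i) => baseMs + i * stepMs);
}

// Call `connectOnce` with each timeout in turn until one attempt succeeds;
// if every attempt fails, rethrow the last error.
async function withRetries<T>(
  connectOnce: (timeoutMs: number) => Promise<T>,
  timeouts: number[] = retryTimeouts(),
): Promise<T> {
  let lastError: unknown;
  for (const timeoutMs of timeouts) {
    try {
      return await connectOnce(timeoutMs);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Growing the timeout, rather than retrying with a fixed one, gives a slowly starting Chrome more time on each later attempt.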
**Anti-detection:** browserclaw automatically hides `navigator.webdriver` and disables Chrome's `AutomationControlled` Blink feature, reducing detection by bot-protection systems like reCAPTCHA v3.
### Pages & Tabs
```typescript
const page = await browser.open('https://example.com');
page.id;                                       // CDP target ID (use with focus/close/page)
await page.url();                              // current page URL
await page.title();                            // current page title
browser.url;                                   // CDP endpoint URL

const current = await browser.currentPage();   // get active tab
const tabs = await browser.tabs();             // list all tabs
const handle = browser.page(tabs[0].targetId); // wrap existing tab
const tabId = page.id;
await browser.focus(tabId);                    // bring tab to front
await browser.close(tabId);                    // close a tab
await browser.stop();                          // stop browser + cleanup
```
### Snapshot (Core Feature)
```typescript
const { snapshot, refs, stats, untrusted } = await page.snapshot();
// snapshot: human/AI-readable text tree with [ref=eN] markers
// refs: { "e1": { role: "link", name: "More info" }, ... }
// stats: { lines: 42, chars: 1200, refs: 8, interactive: 5 }
// untrusted: true (content comes from the web page; treat it as potentially adversarial)
// Options
const result = await page.snapshot({
  interactive: true, // Only interactive elements (buttons, links, inputs)
  compact: true,     // Remove structural containers without refs
  maxDepth: 6,       // Limit tree depth
  maxChars: 80000,   // Truncate if snapshot exceeds this size
  mode: 'aria',      // 'aria' (default) or 'role'
});
// Raw ARIA accessibility tree (structured data, not text)
const { nodes } = await page.ariaSnapshot({ limit: 500 });
```
**Snapshot modes:**
- `'aria'` (default): Uses Playwright's `_snapshotForAI()`. Refs are resolved via `aria-ref` locators. Best for most use cases. Requires `playwright-core` >= 1.50.
- `'role'`: Uses Playwright's `ariaSnapshot()` + `getByRole()`. Supports `selector` and `frameSelector` for scoped snapshots.
> **Security:** All snapshot results include `untrusted: true` to signal that the content originates from an external web page. AI agents consuming snapshots should treat this content as potentially adversarial (e.g. prompt injection via page text).
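The `refs` map is also useful for scripted steps that don't need a model at all, for example clicking a known "Submit" button. The sketch below assumes only the documented `{ role, name }` shape of each entry; `findRef` is a hypothetical helper, not part of browserclaw.

```typescript
// Shape of the `refs` map returned by snapshot(), as documented above.
type RefMap = Record<string, { role: string; name?: string }>;

// Hypothetical helper: return the first ref whose role matches exactly and whose
// accessible name contains `name` (case-insensitive), or undefined if none match.
function findRef(refs: RefMap, role: string, name: string): string | undefined {
  const needle = name.toLowerCase();
  for (const [ref, info] of Object.entries(refs)) {
    if (info.role === role && (info.name ?? "").toLowerCase().includes(needle)) {
      return ref;
    }
  }
  return undefined;
}
```

Usage: `const ref = findRef(refs, 'button', 'Submit'); if (ref) await page.click(ref);`. Because the accessible name comes from the page, it is still untrusted input and should not be used to decide anything security-sensitive.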
### Actions
All actions target elements by ref ID from the most recent snapshot.
> **Default timeouts:** 8000 ms for actions (click, type, fill, select, drag), 20000 ms for waits and navigation.
```typescript
// Click
await page.click('e1');
await page.click('e1', { doubleClick: true });
await page.click('e1', { button: 'right' });
await page.click('e1', { modifiers: ['Control'] });
// Type
await page.type('e3', 'hello world'); // instant fill
await page.type('e3', 'slow typing', { slowly: true }); // keystroke by keystroke
await page.type('e3', 'search', { submit: true }); // type + press Enter
// Other interactions
await page.hover('e2');
await page.select('e5', 'Option A', 'Option B');
await page.drag('e1', 'e4');
await page.scrollIntoView('e7');
// Keyboard
await page.press('Enter');
await page.press('Control+a');
await page.press('Meta+Shift+p');
// Fill multiple form fields at once
await page.fill([
  { ref: 'e2', type: 'text', value: 'Jane Doe' },
  { ref: 'e4', type: 'text', value: 'jane@example.com' },
  { ref: 'e6', type: 'checkbox', value: true },
]);
```
`fill()` field types: `'text'` calls Playwright `fill()` with the string value. `'checkbox'` and `'radio'` call `setChecked()`; the values treated as checked are `true`, `1`, `'1'`, and `'true'`. A field with an empty `ref` or `type` throws an error.
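The checkbox/radio rule above is a small whitelist: only four values count as checked. A sketch of that rule plus the empty-field check, written as hypothetical helpers (`isTruthyFillValue`, `validateField`) rather than browserclaw's actual source. Treating every other value as "unchecked" is an assumption inferred from the description.

```typescript
type FillValue = string | number | boolean;

interface FillField {
  ref: string;
  type: string;
  value: FillValue;
}

// Only these four values mean "checked"; anything else is treated as unchecked (assumed).
function isTruthyFillValue(value: FillValue): boolean {
  return value === true || value === 1 || value === "1" || value === "true";
}

// Mirrors the documented rule that an empty ref or type throws.
function validateField(field: FillField): void {
  if (!field.ref || !field.type) {
    throw new Error("fill(): each field needs a non-empty ref and type");
  }
}
```

Note that values such as `'yes'` or `'on'` would not check the box under this rule, so normalize form data to one of the four accepted values first.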
#### Highlight
```typescript
await page.highlight('e1'); // Playwright built-in highlight
```
#### File Upload
```typescript
// Direct: set files on an <input type="file"> element
await page.uploadFile('e3', ['/path/to/file.pdf']);
// Arm pattern: for non-input file pickers
const uploadDone = page.armFileUpload(['/path/to/file.pdf']);
await page.click('e3'); // triggers the file chooser
await uploadDone;
```
#### Dialog Handling
Handle JavaScript dialogs (alert, confirm, prompt). Arm the handler *before* the action that triggers the dialog.
```typescript
const dialogDone = page.armDialog({ accept: true });
await page.click('e5'); // triggers confirm()
await dialogDone;
// With prompt text
const promptDone = page.armDialog({ accept: true, promptText: 'my answer' });
await page.click('e6'); // triggers prompt()
await promptDone;
```
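File upload, dialog handling, and downloads all use the same arm-then-act ordering: start waiting for the event, perform the action that fires it, then await the wait. The sketch below shows why the order matters, using Node's `EventEmitter` as a stand-in for the page; `armThenAct`, `fakePage`, and `clickConfirmButton` are hypothetical and not part of browserclaw.

```typescript
import { EventEmitter, once } from "node:events";

// Start listening (arm) BEFORE acting, so an event fired during the action
// cannot slip past before anyone is listening.
async function armThenAct<T>(arm: () => Promise<T>, act: () => Promise<void>): Promise<T> {
  const armed = arm();
  armed.catch(() => {}); // avoid an unhandled rejection if act() throws first
  await act();
  return armed;
}

// Stand-in for a page that emits a "dialog" event as soon as a button is clicked.
const fakePage = new EventEmitter();
const clickConfirmButton = async (): Promise<void> => {
  fakePage.emit("dialog", "Are you sure?");
};
```

Reversing the order (acting first, then calling `once(fakePage, "dialog")`) would hang here, because the event is emitted before any listener exists.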
### Navigation & Waiting
```typescript
await page.goto('https://example.com');
await page.reload(); // reload the current page
await page.goBack(); // navigate back in history
await page.goForward(); // navigate forward in history
await page.waitFor({ loadState: 'networkidle' });
await page.waitFor({ text: 'Welcome' });
await page.waitFor({ textGone: 'Loading...' });
await page.waitFor({ url: '**/dashboard' });
await page.waitFor({ selector: '.loaded' }); // wait for CSS selector
await page.waitFor({ fn: '() => document.readyState === "complete"' }); // custom JS
await page.waitFor({ timeMs: 1000 }); // sleep
await page.waitFor({ text: 'Ready', timeoutMs: 5000 }); // custom timeout
```
### Capture
```typescript
// Screenshots
const screenshot = await page.screenshot(); // viewport PNG → Buffer
const fullPage = await page.screenshot({ fullPage: true }); // full scrollable page
const element = await page.screenshot({ ref: 'e1' }); // specific element by ref
const bySelector = await page.screenshot({ element: '.hero' }); // by CSS selector
const jpeg = await page.screenshot({ type: 'jpeg' }); // JPEG format
// PDF
const pdf = await page.pdf(); // PDF export (headless only)
// Labeled screenshot: numbered badges on each ref for visual debugging
const { buffer, labels, skipped } = await page.screenshotWithLabels(['e1', 'e2', 'e3']);
// buffer: PNG with numbered overlays
// labels: [{ ref: 'e1', index: 1, box: { x, y, width, height } }, ...]
// skipped: refs that couldn't be found or had no bounding box
```
Both `screenshot()` and `pdf()` return a `Buffer`. Write to file with `fs.writeFileSync('out.png', screenshot)`.
#### Trace Recording
Capture Playwright traces (screenshots, DOM snapshots, network) for debugging.
```typescript
await page.traceStart({ screenshots: true, snapshots: true });
// ... perform actions ...
await page.traceStop('trace.zip');
// Open with: npx playwright show-trace trace.zip
```
#### Response Body
Intercept a network response and read its body.
```typescript
const resp = await page.responseBody('/api/data');
console.log(resp.status, resp.body);
// { url, status, headers, body, truncated }
```
Options: `timeoutMs` (default 30 s), `maxChars` (truncate body).
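When `maxChars` cuts a JSON body short, the result is usually not valid JSON, so check `truncated` before parsing. The sketch below mirrors the documented result fields; the exact field types, and the `ResponseBodyResult` and `parseJsonBody` names, are assumptions.

```typescript
// Mirrors the documented { url, status, headers, body, truncated } result (types assumed).
interface ResponseBodyResult {
  url: string;
  status: number;
  headers: Record<string, string>;
  body: string;
  truncated: boolean;
}

// Hypothetical helper: parse a JSON API response, refusing non-2xx or truncated bodies.
function parseJsonBody<T>(resp: ResponseBodyResult): T {
  if (resp.status < 200 || resp.status >= 300) {
    throw new Error(`Unexpected status ${resp.status} for ${resp.url}`);
  }
  if (resp.truncated) {
    throw new Error(`Body of ${resp.url} was truncated; raise maxChars before parsing`);
  }
  return JSON.parse(resp.body) as T;
}
```

Usage would look like `parseJsonBody<{ items: number[] }>(await page.responseBody('/api/data'))`.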
### Activity Monitoring
Console messages, errors, and network requests are buffered automatically.
```typescript
const logs = await page.consoleLogs(); // all messages
const errors = await page.consoleLogs({ level: 'error' }); // errors only
const recent = await page.consoleLogs({ clear: true }); // read and clear buffer
const pageErrors = await page.pageErrors(); // uncaught exceptions
const requests = await page.networkRequests({ filter: '/api' }); // filter by URL
const fresh = await page.networkRequests({ clear: true }); // read and clear buffer
```
### Storage
```typescript
// Cookies
const cookies = await page.cookies();
await page.setCookie({ name: 'token', value: 'abc', url: 'https://example.com' });
await page.clearCookies();
// localStorage / sessionStorage
const values = await page.storageGet('local');
const token = await page.storageGet('local', 'authToken');
await page.storageSet('local', 'key', 'value');
await page.storageClear('session');
```
### Downloads
```typescript
// Click a download link and save the file
const result = await page.download('e7', '/tmp/report.pdf');
console.log(result.suggestedFilename); // 'report.pdf'
// Returns: { url, suggestedFilename, path }
// Arm pattern: wait for next download (call before triggering)
const dlPromise = page.waitForDownload({ path: '/tmp/file.pdf' });
await page.click('e8'); // triggers download
const dl = await dlPromise;
```
### Emulation
```typescript
// Device emulation (viewport + user agent)
await page.setDevice('iPhone 13');
// Color scheme
await page.emulateMedia({ colorScheme: 'dark' });
// Geolocation
await page.setGeolocation({ latitude: 48.8566, longitude: 2.3522 }); // Paris
await page.setGeolocation({ clear: true }); // reset
// Locale & timezone
await page.setLocale('fr-FR');
await page.setTimezone('Europe/Paris');
// Network
await page.setOffline(true);
await page.setExtraHeaders({ 'X-Custom': 'value' });
await page.setHttpCredentials({ username: 'admin', password: 'secret' });
await page.setHttpCredentials({ clear: true }); // remove
```
### Evaluate
Run JavaScript directly in the browser page context.
```typescript
const title = await page.evaluate('() => document.title');
const text = await page.evaluate('(el) => el.textContent', { ref: 'e1' });
const count = await page.evaluate('() => document.querySelectorAll("img").length');
```
#### `evaluateInAllFrames(fn)`
Run JavaScript in ALL frames on the page, including cross-origin iframes. Playwright bypasses the same-origin policy via CDP, making this essential for interacting with embedded payment forms (Stripe, etc.).
```typescript
const results = await page.evaluateInAllFrames(`() => {
  const el = document.querySelector('input[name="cardnumber"]');
  return el ? 'found' : null;
}`);
// Returns: [{ frameUrl: '...', frameName: '...', result: 'found' }, ...]
```
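Because `evaluateInAllFrames()` returns one entry per frame, a common next step is to keep only the frames where the script found something. The result shape below mirrors the comment above. `FrameResult` and `framesWith` are hypothetical names, and the field types are assumptions.

```typescript
// One entry per frame, mirroring the documented { frameUrl, frameName, result } shape.
interface FrameResult<T> {
  frameUrl: string;
  frameName: string;
  result: T | null;
}

// Keep only frames where the evaluated function returned a non-null value.
function framesWith<T>(results: FrameResult<T>[]): FrameResult<T>[] {
  return results.filter((r) => r.result !== null && r.result !== undefined);
}
```

With the example above, `framesWith(results)` would narrow the list to the frame(s) that contain the card-number input, whose `frameUrl` can then be logged or checked.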
### Viewport
```typescript
await page.resize(1280, 720);
```
## Examples
See the [`examples/`](./examples) directory for runnable demos:
- **[basic.ts](./examples/basic.ts)**: Navigate, snapshot, click a ref
- **[form-fill.ts](./examples/form-fill.ts)**: Fill a multi-field form using refs
- **[ai-agent.ts](./examples/ai-agent.ts)**: AI agent loop pattern with Claude/GPT
Run from the source tree:
```bash
npx tsx examples/basic.ts
```
## Requirements
- **Node.js** >= 18
- **Chromium-based browser** installed (Chrome, Brave, Edge, or Chromium)
- **playwright-core** >= 1.50 (installed automatically as a dependency)
No need to install Playwright browsers: browserclaw uses your system's existing Chrome installation via CDP.
## Contributing
Contributions welcome! Please:
1. Fork the repository
2. Create a feature branch (`git checkout -b my-feature`)
3. Make your changes
4. Run `npm run typecheck && npm run build` to verify
5. Submit a pull request
## Acknowledgments
browserclaw is extracted and refined from the browser automation module in [OpenClaw](https://github.com/openclaw/openclaw), built by [Peter Steinberger](https://github.com/steipete) and an [amazing community of contributors](https://github.com/openclaw/openclaw?tab=readme-ov-file#community). The snapshot + ref system, CDP connection management, and Playwright integration originate from that project.
## License
[MIT](./LICENSE)