https://github.com/amankumarsingh77/pi-browser-harness

Browser control extension for pi — navigate, click, type, screenshot, and extract data from real Chrome via CDP
https://github.com/amankumarsingh77/pi-browser-harness
browser-automation cdp chrome-devtools-protocol pi pi-package web-automation
Last synced: about 1 month ago
JSON representation
Browser control extension for pi — navigate, click, type, screenshot, and extract data from real Chrome via CDP
Host: GitHub
URL: https://github.com/amankumarsingh77/pi-browser-harness
Owner: amankumarsingh77
License: mit
Created: 2026-04-28T19:10:43.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2026-05-07T18:48:29.000Z (about 1 month ago)
Last Synced: 2026-05-10T22:56:57.163Z (about 1 month ago)
Topics: browser-automation, cdp, chrome-devtools-protocol, pi, pi-package, web-automation
Language: TypeScript
Homepage: https://github.com/amankumarsingh77/pi-browser-harness#readme
Size: 1.57 MB
Stars: 1
Watchers: 0
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project

README

          # pi-browser-harness

![pi-browser-harness](https://raw.githubusercontent.com/amankumarsingh77/pi-browser-harness/main/assets/hero.png)

Full browser control for pi agents in **your real Chrome** — your sessions, your cookies, your tabs. Drives navigation, structured page reads, network capture, clicks, typing, screenshots, and arbitrary scripts via CDP.

---

## Why pi-browser-harness?

| Capability | pi-browser-harness | Playwright MCP | Stagehand | Puppeteer MCP |

|---|:---:|:---:|:---:|:---:|

| **Drives your real Chrome** (logged-in sessions preserved) | ✅ | ❌ launches its own browser | ❌ | ❌ |

| **Coordinate clicks** that work through iframes, shadow DOM, cross-origin | ✅ | ❌ selector-based | ❌ selector-based | ❌ selector-based |

| **Inline TUI screenshot rendering** (Kitty/iTerm2/Ghostty/WezTerm) | ✅ | ❌ | ❌ | ❌ |

| **Accessibility-tree snapshot with click coords `@(x,y)` per element** | ✅ | ✅ tree only, no coords | partial | ❌ |

| **Network request capture** with filters + body capture | ✅ | ✅ post-hoc list | ❌ | ❌ |

| **Parallel tool execution** with automatic mutation serialization | ✅ | ❌ | ❌ | ❌ |

| **Temporary scripts** with daemon + full Node.js | ✅ | ❌ | ❌ | ❌ |

| **Direct HTTP GET** outside the browser (10–50× faster for APIs) | ✅ | ❌ | ❌ | ❌ |

| **Tab ownership isolation** — never touches the user's other tabs | ✅ | N/A | N/A | N/A |

| **Pi-native** — no MCP/JSON-RPC overhead, no extra LLM API keys | ✅ | ❌ MCP roundtrip | ❌ external LLM | ❌ MCP roundtrip |

| **TypeScript strict mode**, zero `any`, all CDP casts documented | ✅ | unknown | unknown | unknown |

| Ctrl+O expand/collapse on tool output | ✅ | ❌ | ❌ | ❌ |

| Compositor-level dispatch (works on every site, no flakey waits) | ✅ | ❌ | ❌ | ❌ |

If you live in pi and you want an agent driving the same Chrome you're already signed into, this is the only one that fits.

---

## Quick Start

```bash

# 1. Install

pi install npm:pi-browser-harness

# 2. Enable Chrome remote debugging

#    Open chrome://inspect/#remote-debugging in Chrome,

#    tick "Discover network targets", click Allow.

#    Or launch Chrome with --remote-debugging-port=9222

# 3. Connect

/browser-setup

```

### Requirements

- pi (latest)

- Node.js ≥ 22

- Chrome / Chromium / Edge

---

## Tool hierarchy — read this first

```

What do you need to know?

  ├─ Page structure / what's clickable / labels?

  │     → browser_snapshot     (DEFAULT — AX tree with @(x,y) per interactive element)

  │

  ├─ A specific element's value / attribute / coords?

  │     → browser_execute_js   (e.g. el.innerText, el.getBoundingClientRect())

  │

  ├─ Network behavior on the current page?

  │     → browser_network_requests

  │

  └─ Visual rendering (layout / colors / chart drew correctly)?

        → browser_screenshot   (LAST RESORT — pixels only)

```

Pass `@(x,y)` from `browser_snapshot` straight to `browser_click`. **No screenshot round-trip needed** to find click targets — the snapshot already has them.

---

## Tools

### Page inspection (use these by default)

| Tool | Purpose |

|------|---------|

| `browser_snapshot` | **Default for page inspection.** Returns the CDP accessibility tree with click coords `@(x,y)` for every interactive element. Optional `includeScreenshot:true`. |

| `browser_execute_js` | Surgical DOM reads — element text, attributes, `getBoundingClientRect()`. Cheapest, most precise. |

| `browser_network_requests` | List recent network requests on the current tab. Filter by url/method/status/type/recency; optional response-body capture. |

| `browser_page_info` | URL, title, viewport, scroll position, or pending dialog. |

| `browser_http_get` | Direct HTTP GET outside the browser — 10-50× faster for APIs. |

### Visual (last resort)

| Tool | Purpose |

|------|---------|

| `browser_screenshot` | Capture PNG/JPEG. Use only when you need to verify visual rendering. |

### Navigation

| Tool | Purpose |

|------|---------|

| `browser_navigate` / `browser_new_tab` | Navigate or open a tab |

| `browser_open_urls` | Open multiple URLs in parallel tabs |

| `browser_go_back` / `browser_go_forward` / `browser_reload` | History navigation |

| `browser_list_tabs` / `browser_current_tab` / `browser_switch_tab` / `browser_close_tab` | Tab management (only tabs this session opened) |

### Interaction

| Tool | Purpose |

|------|---------|

| `browser_click` | Click at viewport coordinates (use `@(x,y)` from `browser_snapshot`) |

| `browser_type` | Type text into the focused element |

| `browser_press_key` | Press a key with optional modifiers |

| `browser_scroll` | Scroll the page by delta pixels |

### Utility

| Tool | Purpose |

|------|---------|

| `browser_wait` / `browser_wait_for_load` | Sleep, or wait for `readyState === 'complete'` |

| `browser_handle_dialog` | Accept or dismiss `alert` / `confirm` / `prompt` |

| `browser_run_script` | Execute a temporary script with daemon and Node.js access |

Three tools (`browser_snapshot`, `browser_network_requests`, `browser_execute_js`) ship custom TUI rendering with **Ctrl+O** (`app.tools.expand`) to toggle between compact and full output.

---

## Core patterns

### Page inspection

```

browser_snapshot()

# → AX outline with @(x,y) per button/link/input

```

### Form filling (no screenshots)

```

browser_snapshot()                    # find input @(x,y) and labels

browser_click({ x, y })               # click using snapshot's @(x,y)

browser_type({ text: "query" })

browser_press_key({ key: "Enter" })

browser_wait_for_load()

browser_snapshot()                    # verify next state

```

### Data extraction

```js

// One value

browser_execute_js({ expression: "document.querySelector('.price').innerText" })

// Direct API call outside the browser

browser_http_get({ url: "https://api.github.com/repos/amankumarsingh77/pi-browser-harness" })

// Structured arrays

browser_execute_js({ expression: `JSON.stringify(

  Array.from(document.querySelectorAll('.result')).map(el => ({

    title: el.querySelector('h3')?.textContent,

    link:  el.querySelector('a')?.href,

  }))

)` })

```

### Network debugging

```

browser_navigate({ url: "https://app.example.com/feed" })

browser_wait_for_load()

browser_network_requests({

  urlPattern: "/api/",

  statusFilter: { min: 400 },

  includeResponseBodies: true

})

```

### Research workflow

```

browser_navigate({ url: "https://google.com/search?q=..." })

browser_open_urls({ urls: ["url1", "url2", "url3"] })

browser_list_tabs()

browser_switch_tab({ targetId: "..." })

browser_wait_for_load()

browser_snapshot()

browser_execute_js({ expression: "document.querySelector('.content').innerText" })

```

### Visual verification (only when pixels matter)

```

browser_click({ x, y })         # got coords from browser_snapshot

browser_snapshot()              # confirm the form transitioned

browser_screenshot()            # ONLY if you need to verify a chart/modal/CSS rendered correctly

```

### Keyboard modifiers

| Key | Bit |

|-----|-----|

| Alt | 1 |

| Ctrl | 2 |

| Meta / Cmd | 4 |

| Shift | 8 |

```

browser_press_key({ key: "c", modifiers: 2 })    // Ctrl+C

browser_press_key({ key: "v", modifiers: 4 })    // Cmd+V

browser_press_key({ key: "T", modifiers: 10 })   // Ctrl+Shift+T

```

### Dialogs

JS dialogs freeze the page. Check `browser_page_info` first — if it reports a dialog, handle it before anything else:

```

browser_handle_dialog({ action: "accept" })       // confirm

browser_handle_dialog({ action: "dismiss" })       // cancel

browser_handle_dialog({ action: "accept", promptText: "hello" })  // prompt

```

---

## Parallel execution

Observation tools run in parallel by default. Mutation tools (`click`, `type`, `scroll`, `navigate`, `switch_tab`, …) are automatically serialized through a shared mutex — emit them in the same turn and the harness FIFO-queues them.

```

# All three run concurrently

browser_snapshot()

browser_network_requests({ sinceMs: 5000 })

browser_http_get({ url: "..." })

```

---

## Tab ownership

The harness never touches tabs you didn't open through it. On first attach it spawns a dedicated Chrome window; subsequent `browser_new_tab` calls open inside that window. `browser_list_tabs` defaults to `scope:"owned"` (pass `scope:"all"` to see read-only listings of your other tabs); `browser_switch_tab` and `browser_close_tab` refuse non-owned tabs.

---

## Temporary scripts

When the built-in tools aren't enough, write a script to disk and run it. Scripts get a `daemon` binding for direct CDP access — much faster than chaining tool calls.

```js

write("/tmp/scrape-pages.js", `

  const results = [];

  for (const url of params.urls) {

    await daemon.session().call("Page.navigate", { url });

    await new Promise(r => setTimeout(r, 2000));

    const title = await daemon.evaluateJs("document.title");

    results.push({ url, title });

  }

  return { content: [{ type: "text", text: JSON.stringify(results, null, 2) }] };

`)

browser_run_script({ path: "/tmp/scrape-pages.js", params: { urls: [...] } })

```

**Bindings:** `params`, `daemon`, `require`, `signal`, `onUpdate`, `ctx`, `console`, `fetch`, `JSON`, `Buffer`, `setTimeout`, `clearTimeout`.

`daemon` exposes: `evaluateJs`, `pageInfo`, `listTabs`, `switchTab`, `newTab`, `current`, and `session(targetId?)` for raw CDP via `session.call` / `session.callOnTarget` / `session.callBrowser` / `session.takeDialog`.

Scripts are written to disk — auditable and re-runnable.

---

## Commands

| Command | Description |

|---------|-------------|

| `/browser-setup` | Connect pi to Chrome (run once) |

| `/browser-status` | Show daemon health and current page |

| `/browser-reload-daemon` | Restart the connection |

---

## What NOT to do

- Don't launch your own browser — you're connected to the user's real Chrome

- Don't type credentials — if you hit an auth wall, stop and ask

- Don't screenshot to understand the page — `browser_snapshot` is the default

- Don't screenshot to find click coordinates — `browser_snapshot`'s `@(x,y)` is exact

- Don't screenshot to read a value — `browser_execute_js` is one round-trip

- Don't ignore dialogs — check `browser_page_info` first

---

## Architecture

```

pi agent → pi-browser-harness (TypeScript)

               │ CDP WebSocket

               ▼

            Chrome

```

Temporary scripts run inside the harness process with full daemon and Node.js access.

---

## Troubleshooting

| Symptom | Fix |

|---------|-----|

| `DevToolsActivePort not found` | Open `chrome://inspect/#remote-debugging`, tick the checkbox, click Allow |

| Connection fails after Chrome restart | Run `/browser-reload-daemon` |

| Page seems loaded but content is missing | SPA — call `browser_snapshot` again, or `browser_execute_js` for a specific element |

| JS dialog is blocking actions | `browser_page_info` will report it — use `browser_handle_dialog` |

| Daemon not starting | Run `/browser-setup` to re-run guided setup |

| Snapshot didn't return `@(x,y)` for a target | Element wasn't recognized as interactive. Fall back to `browser_execute_js` with `getBoundingClientRect()` |

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for setup, conventions, and PR process.

---

## Security

This extension drives your real Chrome. The agent can see open tabs, read page content, submit forms, and act inside authenticated sessions. `browser_run_script` evaluates JavaScript in the pi process with full `require` access — review any temporary scripts before executing them.

---

## License

MIT — see [LICENSE](LICENSE).
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/amankumarsingh77/pi-browser-harness

Awesome Lists containing this project

README