https://github.com/1999azzar/browser-agent-mcp
Professional, modular browser automation agent MCP server powered by Playwright with stealth and high-fidelity observation capabilities.
https://github.com/1999azzar/browser-agent-mcp
ai-agent automation browser-automation mcp model-context-protocol playwright stealth
Last synced: about 24 hours ago
JSON representation
Professional, modular browser automation agent MCP server powered by Playwright with stealth and high-fidelity observation capabilities.
- Host: GitHub
- URL: https://github.com/1999azzar/browser-agent-mcp
- Owner: 1999AZZAR
- License: mit
- Created: 2026-06-03T14:31:14.000Z (9 days ago)
- Default Branch: master
- Last Pushed: 2026-06-03T15:22:13.000Z (9 days ago)
- Last Synced: 2026-06-03T17:09:18.905Z (9 days ago)
- Topics: ai-agent, automation, browser-automation, mcp, model-context-protocol, playwright, stealth
- Language: JavaScript
- Size: 28.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# General Browser Agent MCP
A modular, production-ready browser automation agent implemented as a Model Context Protocol (MCP) server. Powered by Playwright, it provides a comprehensive toolset for human-like web interaction, state analysis, automated navigation, and network-level control.
## Features
- **Semantic Interaction**: Click elements by text (`browser_click_text`) and fill entire forms (`browser_fill_form`) with single commands.
- **Multi-Tab Management**: Handle multiple sites simultaneously with tab list, switch, and creation tools.
- **Resilient Navigation**: Automatic retry with configurable attempts and backoff on network failures.
- **Request Interception**: Block, mock, or modify requests at the network level — stub APIs, strip ads, inject auth headers.
- **Session & Persistence**: Persistent browser contexts with named session save/load for both cookies and Web Storage (`localStorage`/`sessionStorage`).
- **Crash Recovery**: Browser state is automatically persisted to disk. If the browser process dies, tabs and intercept rules are restored on the next tool call — no data loss.
- **Parallel Agents**: Run independent named pages within a single browser context. Create, switch, and remove agents to handle multi-page workflows without interference.
- **PDF Export**: Save pages to disk as PDF with a custom output path and accurate file size reporting.
- **Smart Wait Strategy**: `browser_wait_for_load` for sites with WebSocket/SSE connections; `browser_wait_until_stable` for AJAX-heavy SPAs.
- **Stealth and Evasion**: Anti-detection behavioral profiles (`stealth` vs `speed`), realistic user-agent spoofing, human-like mouse jitter and typing delay.
- **Robust State Capture**: Extracts semantic page data including Accessibility Trees (AX Tree), interactive elements, and structural headings.
- **Data Extraction**: Table-to-JSON extraction and high-fidelity PDF/HTML capture.
- **CAPTCHA Management**: Automated detection and assisted resolution for reCAPTCHA, hCaptcha, and common challenge pages.
## Demo
See the General Browser Agent in action with Gemini CLI: [Watch on YouTube](https://youtu.be/O6nYKjmlaGk)
## Toolset
### Navigation & Tabs
| Tool | Description |
|------|-------------|
| `browser_navigate` | Navigate to a URL with automatic retry on failure (`retries`, `retryDelay`) — state is saved for crash recovery |
| `browser_new_tab` | Open a new tab, optionally at a URL |
| `browser_list_tabs` | List all open tabs and their active status |
| `browser_switch_tab` | Switch active tab by index |
| `browser_back` / `browser_forward` / `browser_reload` | Standard history control |
| `browser_wait` | Wait for a fixed number of milliseconds |
| `browser_wait_for_selector` | Wait until an element appears in the DOM |
| `browser_wait_for_url` | Wait until the URL matches a pattern (substring or regex) |
| `browser_wait_until_stable` | Wait for networkidle — use for AJAX/SPA pages |
| `browser_wait_for_load` | Wait for the `load` or `domcontentloaded` event — use for WebSocket/SSE pages |
### Named Agents / Parallelism
| Tool | Description |
|------|-------------|
| `browser_agent_create` | Create a new named agent page, or switch to an existing one |
| `browser_agent_switch` | Switch active context to a named agent |
| `browser_agent_remove` | Close and remove a named agent |
| `browser_agent_list` | List all active named agents and their URLs |
Named agents are independent pages within the same browser. Use them to parallelize workflows — each agent keeps its own navigation state, forms, and cookies. Create one, work on it, switch to another, come back later.
**Wait strategy guide:**
| Situation | Tool |
|-----------|------|
| Standard page navigation | `browser_wait_for_load()` |
| SPA / AJAX-heavy content | `browser_wait_until_stable()` |
| Page with WebSocket or long-polling | `browser_wait_for_load()` — networkidle will hang |
| Specific element expected | `browser_wait_for_selector(selector)` |
| URL change after action | `browser_wait_for_url(pattern)` |
### Interaction
| Tool | Description |
|------|-------------|
| `browser_click_text` | Click element by visible text (smart button/link detection) |
| `browser_fill_form` | Populate multiple fields at once from a `{selector: value}` object |
| `browser_click` | Click by selector or `x, y` coordinates |
| `browser_double_click` / `browser_right_click` | Pointer events |
| `browser_hover` | Hover over an element or coordinates |
| `browser_drag` | Drag source element to target |
| `browser_scroll` / `browser_scroll_to` | Scroll by direction or to a target |
| `browser_smart_scroll` | Incremental scroll to trigger lazy-loaded content |
### Forms & Input
| Tool | Description |
|------|-------------|
| `browser_type` | Human-like character insertion with configurable delay |
| `browser_clear` | Clear an input field |
| `browser_press` | Press a keyboard key |
| `browser_select` | Select a dropdown option by value or label |
| `browser_check` / `browser_uncheck` | Checkbox and radio control |
### Observation & Extraction
| Tool | Description |
|------|-------------|
| `browser_get_state` | Unified page snapshot: URL, title, AX tree, interactive elements, screenshot — auto-saves AX tree for later diffing |
| `browser_observe` | **Low-token alternative to `browser_get_state`** — returns only interactable elements with `ref` numbers, no screenshot. Use for pre-action planning. |
| `browser_click_ref` | Click an element by its `ref` number from the last `browser_observe` or `browser_get_state` call |
| `browser_state_diff` | Compare last two AX snapshots: URL/title changes, new/removed headings, element shifts, popups, captcha |
| `browser_screenshot` | Take a screenshot |
| `browser_get_text` | Read text from one or all matching elements |
| `browser_get_html` | Get full page or element HTML |
| `browser_extract_table` | Convert an HTML table to structured JSON |
| `browser_get_cookies` | Get all cookies for the active page |
| `browser_evaluate` | Execute JavaScript in the page context (supports `return`, `await`, and `args` injection) |
| `browser_print_to_pdf` | Save the page as a PDF file to a specified path |
| `browser_console_messages` | Return captured browser console messages and JS errors (last 100). Filter by `type`. Pass `clear: true` to flush. |
| `browser_network_requests` | Return captured network requests with status and timing (last 100). Filter by URL substring or `statusMin`. |
| `browser_health` | Check browser health: context alive, page responsive, latency, active URL. Use to diagnose crashes or unresponsive pages. |
**`browser_evaluate` usage:**
```js
// Return a value
script: "return document.title"
// Use await
script: "const r = await fetch('/api/status'); return r.status"
// Pass data via args (no string interpolation needed)
script: "return args.x * args.y"
args: { "x": 6, "y": 7 }
```
### Request Interception
| Tool | Description |
|------|-------------|
| `browser_intercept` | Add an intercept rule: `block`, `mock`, or `modify` |
| `browser_intercept_list` | List all active intercept rules |
| `browser_clear_intercepts` | Remove all intercept rules |
**Actions:**
- `block` — abort matching requests (ads, trackers, heavy assets)
- `mock` — return a synthetic response with `status`, `body`, `contentType`, `headers`
- `modify` — pass the request through with injected headers (auth tokens, API keys)
**Examples:**
```
# Block all images
pattern: "**/*.{png,jpg,jpeg,gif,webp}", action: "block"
# Mock an API endpoint
pattern: "https://api.example.com/users*", action: "mock"
body: { "users": [] }, status: 200
# Inject Authorization header
pattern: "https://api.example.com/*", action: "modify"
headers: { "Authorization": "Bearer " }
```
Rules persist across page navigations until `browser_clear_intercepts` is called.
### Session & Profile Management
| Tool | Description |
|------|-------------|
| `browser_save_session` | Save cookies (and optionally `localStorage`/`sessionStorage`) to a named file |
| `browser_load_session` | Restore a saved session |
| `browser_list_sessions` | List saved session files with size, cookie count, and origin |
| `browser_set_agent_profile` | Switch between `stealth` and `speed` behavioral profiles |
| `browser_handle_captcha` | Detect and manage CAPTCHA with optional manual hand-off |
| `browser_solve_captcha_grid` | Click specific grid cells in a visual CAPTCHA |
| `browser_close` | Terminate the browser session and clear all state |
**Session storage note:** Pass `includeStorage: true` to `browser_save_session` to also capture `localStorage` and `sessionStorage`. Required for sites that store auth tokens in Web Storage instead of cookies (most modern SPAs). Storage is only restored if the current page origin matches the saved origin.
### Helpers
| Tool | Description |
|------|-------------|
| `browser_dismiss_popups` | Suppress modals, banners, and dialogs |
| `browser_export_state` | Export current page state (URL/title/AX/cookies/storage) as a JSON snapshot for sharing or replay |
## Installation
### Prerequisites
- Node.js 18.x or higher
- npm
### Setup
```bash
bash install.sh
```
## Cookie Injection (Firefox Sync)
Place a `cookies.json` file in the project root. The agent will automatically inject these cookies into every new session.
## Configuration
Register in your MCP client config:
```json
{
"mcpServers": {
"browser-agent": {
"command": "node",
"args": ["/absolute/path/to/browser-agent/src/server.js"],
"env": {}
}
}
}
```
### Environment Variables
| Var | Default | Description |
|-----|---------|-------------|
| `START_URL` | — | Page to open when the session starts. |
| `GOAL` | — | Task description exposed to MCP clients. |
| `CHROMIUM_EXECUTABLE_PATH` | Playwright bundled | Path to a dedicated Chromium binary. If set, Playwright uses this instead of its bundled Chromium. |
| `CHROMIUM_CHANNEL` | — | Playwright channel hint (e.g. `chromium`, `chrome`, `chrome-beta`). Ignored if `CHROMIUM_EXECUTABLE_PATH` is set. |
| `BROWSER_HEADLESS` | `false` | Set to `true` for headless operation (CI / production). |
| `BROWSER_LAUNCH_RETRIES` | `3` | Number of retries on browser launch failure. |
| `BROWSER_LAUNCH_BACKOFF` | `1000` | Base delay (ms) between launch retries; doubled each retry. |
### Browser Stability
The browser layer is hardened for long-running sessions:
- **Launch retry** with exponential backoff — if `chromium.launchPersistentContext` fails, the launcher retries up to `BROWSER_LAUNCH_RETRIES` times, doubling the wait between attempts.
- **Tab creation retry** — if `Target.createTarget` or related protocol errors occur when opening a new tab, the context is reset and the call is retried.
- **Context health probe** — the cached context is checked for liveness (with timeout) before reuse; dead contexts are torn down and relaunched transparently.
- **Stability flags** — Chromium is launched with flags that disable background timer throttling, renderer backgrounding, BackForwardCache, and other features that commonly cause crashes in automation.
- **`browser_health` tool** — returns `{ contextAlive, pageResponsive, pageCount, pageLatencyMs, activePageUrl, headless, executablePath, launchRetries }` for runtime diagnostics.
## Token-Efficient Interaction: Observe → Act
For repetitive or well-understood pages, skip the heavy `browser_get_state` screenshot and use the observe→click loop:
```
1. browser_observe() # Returns elements with ref numbers, no screenshot
→ { elements: [{ ref: 1, tag: "BUTTON", text: "Sign In" }, ...] }
2. browser_click_ref(ref=1) # Click by ref — no re-snapshot needed
→ "Clicked ref 1 (BUTTON "Sign In") at (320, 240)."
```
This matches the approach used by browser-use (93% context reduction) and Stagehand's `act` primitive.
For debugging after an interaction:
```
browser_console_messages(type='error') # Any JS errors?
browser_network_requests(statusMin=400) # Any failed API calls?
```
## Architecture: Sense-Think-Act
The agent is designed for closed-loop automation with a **hybrid screenshot strategy** — screenshots are used only when the AX tree is insufficient.
```
Unfamiliar page → browser_get_state() # AX tree + elements, no image
Planning an action → browser_observe() # interactable elements + refs only
Visual verification → browser_get_state(screenshot=true) # full state + screenshot
Act by ref → browser_click_ref(ref) # stable, no re-snapshot needed
After action → browser_state_diff() # diff only, no image
Debug failures → browser_console_messages() # JS errors
browser_network_requests() # failed API calls
```
**When to request a screenshot:**
- Canvas-rendered UIs, game elements, charts
- `aria-hidden` elements that are visually significant
- Cross-origin iframes
- Visual layout verification (CAPTCHA, image-heavy pages)
All other cases → AX tree is sufficient and far cheaper in tokens.