{"id":50766071,"url":"https://github.com/1999azzar/browser-agent-mcp","last_synced_at":"2026-06-11T14:01:18.564Z","repository":{"id":362324261,"uuid":"1258389163","full_name":"1999AZZAR/browser-agent-mcp","owner":"1999AZZAR","description":"Professional, modular browser automation agent MCP server powered by Playwright with stealth and high-fidelity observation capabilities.","archived":false,"fork":false,"pushed_at":"2026-06-03T15:22:13.000Z","size":29,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-03T17:09:18.905Z","etag":null,"topics":["ai-agent","automation","browser-automation","mcp","model-context-protocol","playwright","stealth"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/1999AZZAR.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-03T14:31:14.000Z","updated_at":"2026-06-03T15:32:01.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/1999AZZAR/browser-agent-mcp","commit_stats":null,"previous_names":["1999azzar/browser-agent-mcp"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/1999AZZAR/browser-agent-mcp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/1999AZZAR%2Fbrowser-agent-mcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/1999AZZAR%2Fbrowse
r-agent-mcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/1999AZZAR%2Fbrowser-agent-mcp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/1999AZZAR%2Fbrowser-agent-mcp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/1999AZZAR","download_url":"https://codeload.github.com/1999AZZAR/browser-agent-mcp/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/1999AZZAR%2Fbrowser-agent-mcp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34201842,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agent","automation","browser-automation","mcp","model-context-protocol","playwright","stealth"],"created_at":"2026-06-11T14:01:01.795Z","updated_at":"2026-06-11T14:01:18.512Z","avatar_url":"https://github.com/1999AZZAR.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# General Browser Agent MCP\n\nA modular, production-ready browser automation agent implemented as a Model Context Protocol (MCP) server. 
Powered by Playwright, it provides a comprehensive toolset for human-like web interaction, state analysis, automated navigation, and network-level control.\n\n## Features\n\n- **Semantic Interaction**: Click elements by text (`browser_click_text`) and fill entire forms (`browser_fill_form`) with single commands.\n- **Multi-Tab Management**: Handle multiple sites simultaneously with tab list, switch, and creation tools.\n- **Resilient Navigation**: Automatic retry with configurable attempts and backoff on network failures.\n- **Request Interception**: Block, mock, or modify requests at the network level — stub APIs, strip ads, inject auth headers.\n- **Session \u0026 Persistence**: Persistent browser contexts with named session save/load for both cookies and Web Storage (`localStorage`/`sessionStorage`).\n- **Crash Recovery**: Browser state is automatically persisted to disk. If the browser process dies, open tabs and intercept rules are restored on the next tool call.\n- **Parallel Agents**: Run independent named pages within a single browser context. 
Create, switch, and remove agents to handle multi-page workflows without interference.\n- **PDF Export**: Save pages to disk as PDF with a custom output path and accurate file size reporting.\n- **Smart Wait Strategy**: `browser_wait_for_load` for sites with WebSocket/SSE connections; `browser_wait_until_stable` for AJAX-heavy SPAs.\n- **Stealth and Evasion**: Anti-detection behavioral profiles (`stealth` vs `speed`), realistic user-agent spoofing, human-like mouse jitter and typing delay.\n- **Robust State Capture**: Extracts semantic page data including Accessibility Trees (AX Tree), interactive elements, and structural headings.\n- **Data Extraction**: Table-to-JSON extraction and high-fidelity PDF/HTML capture.\n- **CAPTCHA Management**: Automated detection and assisted resolution for reCAPTCHA, hCaptcha, and common challenge pages.\n\n## Demo\n\nSee the General Browser Agent in action with Gemini CLI: [Watch on YouTube](https://youtu.be/O6nYKjmlaGk)\n\n## Toolset\n\n### Navigation \u0026 Tabs\n\n| Tool | Description |\n|------|-------------|\n| `browser_navigate` | Navigate to a URL with automatic retry on failure (`retries`, `retryDelay`) — state is saved for crash recovery |\n| `browser_new_tab` | Open a new tab, optionally at a URL |\n| `browser_list_tabs` | List all open tabs and their active status |\n| `browser_switch_tab` | Switch active tab by index |\n| `browser_back` / `browser_forward` / `browser_reload` | Standard history control |\n| `browser_wait` | Wait for a fixed number of milliseconds |\n| `browser_wait_for_selector` | Wait until an element appears in the DOM |\n| `browser_wait_for_url` | Wait until the URL matches a pattern (substring or regex) |\n| `browser_wait_until_stable` | Wait for networkidle — use for AJAX/SPA pages |\n| `browser_wait_for_load` | Wait for the `load` or `domcontentloaded` event — use for WebSocket/SSE pages |\n\n### Named Agents / Parallelism\n\n| Tool | Description |\n|------|-------------|\n| `browser_agent_create` 
| Create a new named agent page, or switch to an existing one |\n| `browser_agent_switch` | Switch active context to a named agent |\n| `browser_agent_remove` | Close and remove a named agent |\n| `browser_agent_list` | List all active named agents and their URLs |\n\nNamed agents are independent pages within the same browser context. Use them to run several workflows side by side: each agent keeps its own navigation history and form state, while cookies are shared across all agents because they belong to the same context. Create one, work in it, switch to another, and come back later.\n\n**Wait strategy guide:**\n\n| Situation | Tool |\n|-----------|------|\n| Standard page navigation | `browser_wait_for_load()` |\n| SPA / AJAX-heavy content | `browser_wait_until_stable()` |\n| Page with WebSocket or long-polling | `browser_wait_for_load()` (`networkidle` may never be reached) |\n| Specific element expected | `browser_wait_for_selector(selector)` |\n| URL change after action | `browser_wait_for_url(pattern)` |\n\n### Interaction\n\n| Tool | Description |\n|------|-------------|\n| `browser_click_text` | Click an element by visible text (smart button/link detection) |\n| `browser_fill_form` | Populate multiple fields at once from a `{selector: value}` object |\n| `browser_click` | Click by selector or `x, y` coordinates |\n| `browser_double_click` / `browser_right_click` | Pointer events |\n| `browser_hover` | Hover over an element or coordinates |\n| `browser_drag` | Drag a source element to a target |\n| `browser_scroll` / `browser_scroll_to` | Scroll by direction or to a target |\n| `browser_smart_scroll` | Incremental scroll to trigger lazy-loaded content |\n\n### Forms \u0026 Input\n\n| Tool | Description |\n|------|-------------|\n| `browser_type` | Human-like character insertion with configurable delay |\n| `browser_clear` | Clear an input field |\n| `browser_press` | Press a keyboard key |\n| `browser_select` | Select a dropdown option by value or label |\n| `browser_check` / `browser_uncheck` | Checkbox and radio control |\n\n### Observation \u0026 Extraction\n\n| Tool | 
Description |\n|------|-------------|\n| `browser_get_state` | Unified page snapshot: URL, title, AX tree, interactive elements, and an optional screenshot (`screenshot=true`); auto-saves the AX tree for later diffing |\n| `browser_observe` | **Low-token alternative to `browser_get_state`** — returns only interactable elements with `ref` numbers, no screenshot. Use for pre-action planning. |\n| `browser_click_ref` | Click an element by its `ref` number from the last `browser_observe` or `browser_get_state` call |\n| `browser_state_diff` | Compare the last two AX snapshots: URL/title changes, new/removed headings, element shifts, popups, CAPTCHAs |\n| `browser_screenshot` | Take a screenshot |\n| `browser_get_text` | Read text from one or all matching elements |\n| `browser_get_html` | Get full page or element HTML |\n| `browser_extract_table` | Convert an HTML table to structured JSON |\n| `browser_get_cookies` | Get all cookies for the active page |\n| `browser_evaluate` | Execute JavaScript in the page context (supports `return`, `await`, and `args` injection) |\n| `browser_print_to_pdf` | Save the page as a PDF file to a specified path |\n| `browser_console_messages` | Return captured browser console messages and JS errors (last 100). Filter by `type`. Pass `clear: true` to flush. |\n| `browser_network_requests` | Return captured network requests with status and timing (last 100). Filter by URL substring or `statusMin`. |\n| `browser_health` | Check browser health: context alive, page responsive, latency, active URL. Use to diagnose crashes or unresponsive pages. 
|\n\n**`browser_evaluate` usage:**\n```js\n// Return a value\nbrowser_evaluate({ script: \"return document.title\" })\n\n// Use await\nbrowser_evaluate({ script: \"const r = await fetch('/api/status'); return r.status\" })\n\n// Pass data via args (no string interpolation needed)\nbrowser_evaluate({ script: \"return args.x * args.y\", args: { x: 6, y: 7 } })\n```\n\n### Request Interception\n\n| Tool | Description |\n|------|-------------|\n| `browser_intercept` | Add an intercept rule: `block`, `mock`, or `modify` |\n| `browser_intercept_list` | List all active intercept rules |\n| `browser_clear_intercepts` | Remove all intercept rules |\n\n**Actions:**\n- `block` — abort matching requests (ads, trackers, heavy assets)\n- `mock` — return a synthetic response with `status`, `body`, `contentType`, `headers`\n- `modify` — pass the request through with injected headers (auth tokens, API keys)\n\n**Examples:**\n```\n# Block all images\npattern: \"**/*.{png,jpg,jpeg,gif,webp}\", action: \"block\"\n\n# Mock an API endpoint\npattern: \"https://api.example.com/users*\", action: \"mock\"\nbody: { \"users\": [] }, status: 200\n\n# Inject Authorization header\npattern: \"https://api.example.com/*\", action: \"modify\"\nheaders: { \"Authorization\": \"Bearer \u003ctoken\u003e\" }\n```\n\nRules persist across page navigations until `browser_clear_intercepts` is called.\n\n### Session \u0026 Profile Management\n\n| Tool | Description |\n|------|-------------|\n| `browser_save_session` | Save cookies (and optionally `localStorage`/`sessionStorage`) to a named file |\n| `browser_load_session` | Restore a saved session |\n| `browser_list_sessions` | List saved session files with size, cookie count, and origin |\n| `browser_set_agent_profile` | Switch between `stealth` and `speed` behavioral profiles |\n| `browser_handle_captcha` | Detect and manage CAPTCHAs with optional manual hand-off |\n| `browser_solve_captcha_grid` | Click specific grid cells in a visual CAPTCHA |\n| `browser_close` | Terminate the browser session and clear all state 
|\n\n**Session storage note:** Pass `includeStorage: true` to `browser_save_session` to also capture `localStorage` and `sessionStorage`. This is required for sites that keep auth tokens in Web Storage rather than cookies, a common pattern in SPAs. Storage is restored only if the current page origin matches the saved origin.\n\n### Helpers\n\n| Tool | Description |\n|------|-------------|\n| `browser_dismiss_popups` | Suppress modals, banners, and dialogs |\n| `browser_export_state` | Export current page state (URL/title/AX/cookies/storage) as a JSON snapshot for sharing or replay |\n\n## Installation\n\n### Prerequisites\n- Node.js 18.x or higher\n- npm\n\n### Setup\n```bash\nbash install.sh\n```\n\n## Cookie Injection (Firefox Sync)\n\nPlace a `cookies.json` file in the project root. The agent automatically injects these cookies into every new session.\n\n## Configuration\n\nRegister the server in your MCP client config:\n\n```json\n{\n  \"mcpServers\": {\n    \"browser-agent\": {\n      \"command\": \"node\",\n      \"args\": [\"/absolute/path/to/browser-agent/src/server.js\"],\n      \"env\": {}\n    }\n  }\n}\n```\n\n### Environment Variables\n\n| Variable | Default | Description |\n|-----|---------|-------------|\n| `START_URL` | — | Page to open when the session starts. |\n| `GOAL` | — | Task description exposed to MCP clients. |\n| `CHROMIUM_EXECUTABLE_PATH` | Playwright's bundled Chromium | Path to a dedicated Chromium binary. If set, Playwright uses it instead of its bundled Chromium. |\n| `CHROMIUM_CHANNEL` | — | Playwright channel hint (e.g. `chromium`, `chrome`, `chrome-beta`). Ignored if `CHROMIUM_EXECUTABLE_PATH` is set. |\n| `BROWSER_HEADLESS` | `false` | Set to `true` for headless operation (CI / production). |\n| `BROWSER_LAUNCH_RETRIES` | `3` | Number of retries on browser launch failure. |\n| `BROWSER_LAUNCH_BACKOFF` | `1000` | Base delay (ms) between launch retries; doubled on each retry. 
|\n\n### Browser Stability\n\nThe browser layer is hardened for long-running sessions:\n\n- **Launch retry** with exponential backoff — if `chromium.launchPersistentContext` fails, the launcher retries up to `BROWSER_LAUNCH_RETRIES` times, doubling the wait between attempts.\n- **Tab creation retry** — if a `Target.createTarget` call (or a related CDP call) fails with a protocol error while opening a new tab, the context is reset and the call is retried.\n- **Context health probe** — the cached context is checked for liveness (with a timeout) before reuse; dead contexts are torn down and relaunched transparently.\n- **Stability flags** — Chromium is launched with flags that disable background timer throttling, renderer backgrounding, BackForwardCache, and other features that commonly cause stalls or crashes in automation.\n- **`browser_health` tool** — returns `{ contextAlive, pageResponsive, pageCount, pageLatencyMs, activePageUrl, headless, executablePath, launchRetries }` for runtime diagnostics.\n\n## Token-Efficient Interaction: Observe → Act\n\nFor repetitive or well-understood pages, skip the heavy `browser_get_state` screenshot and use the observe→click loop:\n\n```\n1. browser_observe()           # Returns elements with ref numbers, no screenshot\n   → { elements: [{ ref: 1, tag: \"BUTTON\", text: \"Sign In\" }, ...] }\n\n2. 
browser_click_ref(ref=1)    # Click by ref — no re-snapshot needed\n   → \"Clicked ref 1 (BUTTON \"Sign In\") at (320, 240).\"\n```\n\nThis is similar to the approach used by browser-use (93% context reduction) and Stagehand's `act` primitive.\n\nFor debugging after an interaction:\n```\nbrowser_console_messages(type='error')   # Any JS errors?\nbrowser_network_requests(statusMin=400)  # Any failed API calls?\n```\n\n## Architecture: Sense-Think-Act\n\nThe agent is designed for closed-loop automation with a **hybrid screenshot strategy** — screenshots are used only when the AX tree is insufficient.\n\n```\nUnfamiliar page      → browser_get_state()                # AX tree + elements, no image\nPlanning an action   → browser_observe()                  # interactable elements + refs only\nVisual verification  → browser_get_state(screenshot=true) # full state + screenshot\nAct by ref           → browser_click_ref(ref)             # stable, no re-snapshot needed\nAfter action         → browser_state_diff()               # diff only, no image\nDebug failures       → browser_console_messages()         # JS errors\n                       browser_network_requests()         # failed API calls\n```\n\n**When to request a screenshot:**\n- Canvas-rendered UIs, game elements, charts\n- `aria-hidden` elements that are visually significant\n- Cross-origin iframes\n- Visual layout verification (CAPTCHA, image-heavy pages)\n\nIn all other cases, the AX tree is sufficient and far cheaper in tokens.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F1999azzar%2Fbrowser-agent-mcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F1999azzar%2Fbrowser-agent-mcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F1999azzar%2Fbrowser-agent-mcp/lists"}