{"id":47873748,"url":"https://github.com/plasmate-labs/plasmate","last_synced_at":"2026-04-04T16:25:53.957Z","repository":{"id":345196023,"uuid":"1183525901","full_name":"plasmate-labs/plasmate","owner":"plasmate-labs","description":"The browser engine for agents. HTML in, Semantic Object Model out. 10x token compression, V8 JS rendering, CDP compatible. Apache-2.0.","archived":false,"fork":false,"pushed_at":"2026-04-02T13:08:19.000Z","size":22265,"stargazers_count":5,"open_issues_count":4,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-04-03T01:00:03.632Z","etag":null,"topics":["agent-web-protocol","ai-agents","browser-engine","cdp","headless-browser","llm","mcp","puppeteer","rust","semantic-web","som","token-compression","web-automation","web-scraping"],"latest_commit_sha":null,"homepage":"https://docs.plasmate.app","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/plasmate-labs.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP-v0.2.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null},"funding":{"github":["dbhurley"],"custom":["https://plasmate.app"]}},"created_at":"2026-03-16T17:38:22.000Z","updated_at":"2026-04-02T13:08:26.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/plasmate-labs/plasmate","commit_stats":null,"previous_names":["plasmate-labs/plasmate"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/plasmate
-labs/plasmate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plasmate-labs%2Fplasmate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plasmate-labs%2Fplasmate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plasmate-labs%2Fplasmate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plasmate-labs%2Fplasmate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/plasmate-labs","download_url":"https://codeload.github.com/plasmate-labs/plasmate/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plasmate-labs%2Fplasmate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31405700,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-web-protocol","ai-agents","browser-engine","cdp","headless-browser","llm","mcp","puppeteer","rust","semantic-web","som","token-compression","web-automation","web-scraping"],"created_at":"2026-04-04T01:00:58.215Z","updated_at":"2026-04-04T16:25:53.950Z","avatar_url":"https://github.com/plasmate-labs.png","language":"HTML","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg 
src=\"website/brand/plasmate-mark.png\" alt=\"Plasmate\" width=\"80\" /\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003ePlasmate\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  The browser engine for agents.\u003cbr/\u003e\n  HTML in. Semantic Object Model out.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://plasmate.app\"\u003eWebsite\u003c/a\u003e \u0026middot;\n  \u003ca href=\"https://docs.plasmate.app\"\u003eDocs\u003c/a\u003e \u0026middot;\n  \u003ca href=\"https://plasmate.app/compare\"\u003eBenchmarks\u003c/a\u003e \u0026middot;\n  \u003ca href=\"https://crates.io/crates/plasmate\"\u003eCrates.io\u003c/a\u003e \u0026middot;\n  \u003ca href=\"https://www.npmjs.com/package/plasmate\"\u003enpm\u003c/a\u003e \u0026middot;\n  \u003ca href=\"https://pypi.org/project/plasmate/\"\u003ePyPI\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/plasmate-labs/plasmate/actions/workflows/release.yml/badge.svg\" alt=\"CI\" /\u003e\n  \u003cimg src=\"https://img.shields.io/crates/v/plasmate\" alt=\"crates.io\" /\u003e\n  \u003cimg src=\"https://img.shields.io/npm/v/plasmate\" alt=\"npm\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/license-Apache--2.0-blue\" alt=\"License\" /\u003e\n\u003c/p\u003e\n\n---\n\nPlasmate compiles HTML into a **Semantic Object Model (SOM)**, a structured representation that LLMs can reason about directly. 
It runs JavaScript via V8, supports Puppeteer via CDP, and produces output that is 10-800x smaller than raw HTML.\n\n| | Plasmate | Lightpanda | Chrome |\n|---|---|---|---|\n| **Per page** | **4-5 ms** | 23 ms | 252 ms |\n| **Memory (100 pages)** | **~30 MB** | ~2.4 GB | ~20 GB |\n| **Binary** | **43 MB** | 59-111 MB | 300-500 MB |\n| **Output** | **SOM (10-800x smaller)** | Raw HTML | Raw HTML |\n| **License** | **Apache-2.0** | AGPL-3.0 | Chromium |\n\n## Install\n\n```bash\ncurl -fsSL https://plasmate.app/install.sh | sh\n```\n\nOr via package managers:\n\n```bash\ncargo install plasmate       # Rust\nnpm install -g plasmate      # Node.js\npip install plasmate         # Python\n```\n\n## Quick Start\n\n### Fetch a page and get structured output\n\n```bash\nplasmate fetch https://news.ycombinator.com\n```\n\nReturns SOM JSON: structured regions, interactive elements with stable IDs, and content, typically 10x smaller than the raw HTML.\n\n### Start a CDP server (Puppeteer compatible)\n\n```bash\nplasmate serve --protocol cdp --host 127.0.0.1 --port 9222\n```\n\nThen connect with Puppeteer:\n\n```javascript\nimport puppeteer from 'puppeteer-core';\n\nconst browser = await puppeteer.connect({\n  browserWSEndpoint: 'ws://127.0.0.1:9222',\n  protocolTimeout: 10000,\n});\n\nconst page = await browser.newPage();\nawait page.goto('https://example.com');\n\nconst title = await page.evaluate(() =\u003e document.title);\nconsole.log(title);\n\nawait browser.close();\n```\n\n### Start an AWP server (native protocol)\n\n```bash\nplasmate serve --protocol awp --host 127.0.0.1 --port 9222\n```\n\nAWP has 7 methods: `navigate`, `snapshot`, `click`, `type`, `scroll`, `select`, `extract`. 
That's the entire protocol.\n\n### Run as an MCP tool server (Model Context Protocol)\n\n```bash\nplasmate mcp\n```\n\nThis exposes Plasmate over stdio as MCP tools, including:\n- `fetch_page` - get structured SOM from any URL\n- `extract_text` - get clean readable text\n- `open_page` - start an interactive session (returns session_id + SOM)\n- `evaluate` - run JavaScript in the page context\n- `click` - click elements by SOM element ID\n- `close_page` - end a session\n\nExample Claude Desktop config:\n\n```json\n{\n  \"mcpServers\": {\n    \"plasmate\": {\n      \"command\": \"plasmate\",\n      \"args\": [\"mcp\"]\n    }\n  }\n}\n```\n\n## For AI Agents\n\nPlasmate is purpose-built for AI agent pipelines. There are several ways to wire it in:\n\n### MCP (Claude Desktop, Cursor, VS Code Copilot, Windsurf)\n\nAdd Plasmate to your MCP config to make its tools available to the client:\n\n```json\n{\n  \"mcpServers\": {\n    \"plasmate\": {\n      \"command\": \"plasmate\",\n      \"args\": [\"mcp\"]\n    }\n  }\n}\n```\n\nConfig file locations:\n- **Claude Desktop** - `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS)\n- **Cursor** - `~/.cursor/mcp.json`\n- **VS Code Copilot** - `.vscode/mcp.json` (workspace) or user settings\n- **Windsurf** - `~/.codeium/windsurf/mcp_config.json`\n\nOnce connected, 13 tools are available: `fetch_page`, `extract_text`, `extract_links`, `open_page`, `navigate_to`, `click`, `type_text`, `select_option`, `scroll`, `toggle`, `clear`, `evaluate`, `close_page`.\n\n**Tip:** use `selector=\"main\"` on any fetch to strip nav/footer before the LLM sees the content.\n\n### Vercel AI SDK\n\nUse Plasmate via the AI SDK's built-in MCP client (AI SDK v4+):\n\n```bash\nnpm install ai @ai-sdk/openai\n```\n\n```ts\nimport { experimental_createMCPClient as createMCPClient, generateText } from 'ai'\nimport { Experimental_StdioMCPTransport as StdioMCPTransport } from 'ai/mcp-stdio'\nimport { openai } from '@ai-sdk/openai'\n\nconst mcp = await 
createMCPClient({\n  transport: new StdioMCPTransport({\n    command: 'plasmate',\n    args: ['mcp'],\n  }),\n})\n\nconst { text } = await generateText({\n  model: openai('gpt-4o'),\n  tools: await mcp.tools(),\n  maxSteps: 5,\n  prompt: 'Summarize the top 3 stories on news.ycombinator.com',\n})\n\nawait mcp.close()\n```\n\nThis wires all 13 Plasmate tools directly into any Vercel AI SDK agent. See [Vercel AI SDK MCP docs](https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling#mcp-tools) for details.\n\n### LLM context\n\n- Machine-readable summary: [`https://plasmate.app/llms.txt`](https://plasmate.app/llms.txt)\n- Codebase guide for AI coding agents: [`AGENTS.md`](./AGENTS.md)\n- Listed on [MCP Registry](https://registry.modelcontextprotocol.io) as the first browser/web tool\n\n\n## What is SOM?\n\nThe DOM was built for rendering. SOM was built for reasoning.\n\n```\nWikipedia homepage:\n  DOM  → 47,000 tokens\n  SOM  → 4,500 tokens (10.4x compression)\n\naccounts.google.com:\n  DOM  → ~300,000 tokens\n  SOM  → ~350 tokens (864x compression)\n```\n\nSOM strips layout, styling, scripts, SVGs, and boilerplate. It keeps structure, content, and interactive elements with stable IDs that agents can reference in actions.\n\n## Token Compression (38-site benchmark)\n\n| Site | HTML | SOM | Compression |\n|---|---|---|---|\n| accounts.google.com | 1.2 MB | 1.4 KB | **864x** |\n| x.com | 239 KB | 1.5 KB | **159x** |\n| linear.app | 2.2 MB | 21 KB | **105x** |\n| bing.com | 157 KB | 1.7 KB | **93x** |\n| google.com | 194 KB | 2.6 KB | **74x** |\n| vercel.com | 941 KB | 22 KB | **43x** |\n| ebay.com | 831 KB | 33 KB | **25x** |\n| Wikipedia | 1.7 MB | 70 KB | **25x** |\n\nMedian compression: **10.2x** across 38 sites. 
[Full results](https://plasmate.app/compare).\n\n## JavaScript Support\n\nPlasmate embeds V8 and executes page JavaScript, including:\n\n- Inline and external `\u003cscript\u003e` tags\n- `fetch()` and `XMLHttpRequest` with real HTTP requests\n- `setTimeout` / `setInterval` with timer draining\n- DOM mutations (createElement, appendChild, textContent, innerHTML, etc.)\n- DOMContentLoaded and load events\n- Promise resolution and microtask pumping\n\nThe JS pipeline runs during `plasmate fetch` and CDP `page.goto()`. The resulting DOM mutations are serialized back to HTML before SOM compilation, so JS-rendered content is captured.\n\n## CDP Compatibility\n\nPlasmate passes [Lightpanda's Puppeteer benchmark](https://github.com/lightpanda-io/demo) (campfire-commerce). Supported CDP methods:\n\n- `page.goto()`, `page.content()`, `page.title()`\n- `page.evaluate()`, `page.waitForFunction()`\n- `browser.newPage()`, `browser.createBrowserContext()`\n- `Runtime.evaluate`, `Runtime.callFunctionOn`\n- `DOM.getDocument`, `DOM.querySelector`, `DOM.querySelectorAll`\n- `Input.dispatchMouseEvent`, `Input.dispatchKeyEvent`\n- Target management (create, attach, close)\n\nCDP is a compatibility layer. 
AWP is the native protocol, designed for agents rather than debuggers.\n\n## Architecture\n\n```\nHTML → Network (reqwest) → HTML Parser (html5ever)\n  → JS Pipeline (V8: scripts, fetch, XHR, timers, DOM mutations)\n    → DOM Serialization → SOM Compiler → JSON output\n```\n\n- **Network**: reqwest with TLS, HTTP/2, redirects, compression; cookie jar supported, cookie APIs and proxy configuration are still limited\n- **JS Runtime**: V8 with DOM shim (80+ methods), blocking fetch bridge\n- **SOM Compiler**: semantic region detection, element ID generation, interactive element preservation, smart truncation, deduplication\n- **Protocols**: AWP (native, 7 methods) and CDP (Puppeteer compatibility)\n\n## Build from Source\n\n```bash\ngit clone https://github.com/plasmate-labs/plasmate.git\ncd plasmate\ncargo build --release\n./target/release/plasmate fetch https://example.com\n```\n\nRequirements: Rust 1.75+, V8 (fetched automatically by rusty_v8).\n\n## Docker\n\nPrebuilt multi-arch images (linux/amd64 and linux/arm64) are published to GHCR:\n\n```bash\n# Server mode (CDP or AWP)\ndocker run --rm -p 9222:9222 ghcr.io/plasmate-labs/plasmate:latest\n\n# One-shot fetch\ndocker run --rm ghcr.io/plasmate-labs/plasmate:latest fetch https://example.com\n```\n\nBuild locally:\n\n```bash\ndocker build -t plasmate .\ndocker run --rm -p 9222:9222 plasmate\n```\n\n## Tests\n\n```bash\ncargo test --workspace    # 252 tests\n```\n\n## Benchmarks\n\nRun the built-in benchmark against cached pages:\n\n```bash\ncargo run --release -- bench --urls bench/urls.txt\n```\n\nOr test against live sites:\n\n```bash\nplasmate fetch https://en.wikipedia.org/wiki/Rust_(programming_language) | jq '.regions | length'\n```\n\nSee [plasmate.app/compare](https://plasmate.app/compare) for the full comparison with Lightpanda and Chrome.\n\n## Roadmap\n\n- [x] MCP server mode (`plasmate mcp` over stdio)\n- [x] MCP Phase 2: stateful tools (open_page, click, evaluate, close_page)\n- [x] Docker image (GHCR 
multi-arch)\n- [ ] Full V8 DOM mutation bridge (re-snapshot SOM after JS changes)\n- [ ] Network interception (Fetch domain)\n- [ ] Expose cookie APIs (CDP Network.getCookies/setCookies, MCP cookie import/export)\n- [ ] Proxy support (per-session config, SOCKS)\n- [ ] Real-world top-100 site coverage testing\n- [ ] Web Platform Tests integration\n\n## License\n\nApache-2.0. See [LICENSE](LICENSE).\n\nBuilt by [Plasmate Labs](https://plasmate.app).\n","funding_links":["https://github.com/sponsors/dbhurley","https://plasmate.app"],"categories":["Building"],"sub_categories":["Tools"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplasmate-labs%2Fplasmate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fplasmate-labs%2Fplasmate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplasmate-labs%2Fplasmate/lists"}