{"id":49595483,"url":"https://github.com/konippi/servo-fetch","last_synced_at":"2026-05-22T02:11:10.845Z","repository":{"id":353697258,"uuid":"1220475543","full_name":"konippi/servo-fetch","owner":"konippi","description":"A self-contained browser engine that fetches, renders, and extracts web content — no Chrome, no API key, no setup.","archived":false,"fork":false,"pushed_at":"2026-05-02T05:07:00.000Z","size":285,"stargazers_count":10,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-02T06:19:54.446Z","etag":null,"topics":["agent-skills","cli","fetch","mcp","rust","servo","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/konippi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-25T00:07:28.000Z","updated_at":"2026-05-02T05:07:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/konippi/servo-fetch","commit_stats":null,"previous_names":["konippi/servo-fetch"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/konippi/servo-fetch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/konippi%2Fservo-fetch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/konippi%2Fservo-fetch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/konippi%2Fservo-fetch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/konippi%2Fservo-fetch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/konippi","download_url":"https://codeload.github.com/konippi/servo-fetch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/konippi%2Fservo-fetch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32593948,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T22:12:39.696Z","status":"online","status_checked_at":"2026-05-04T02:00:06.625Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-skills","cli","fetch","mcp","rust","servo","web-scraping"],"created_at":"2026-05-04T04:03:04.766Z","updated_at":"2026-05-22T02:11:10.826Z","avatar_url":"https://github.com/konippi.png","language":"Rust","funding_links":[],"categories":["Applications"],"sub_categories":["Web"],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003ch1 align=\"center\"\u003eservo-fetch\u003c/h1\u003e\n  \u003cp align=\"center\"\u003eA self-contained browser engine that fetches, renders, and extracts web content as Markdown, JSON, or screenshots — no Chromium, no API key, no setup.\u003c/p\u003e\n  \u003cp\u003e\n    \u003ca href=\"https://github.com/konippi/servo-fetch/actions\"\u003e\u003cimg src=\"https://github.com/konippi/servo-fetch/workflows/CI/badge.svg\" alt=\"CI\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://crates.io/crates/servo-fetch\"\u003e\u003cimg src=\"https://img.shields.io/crates/v/servo-fetch.svg\" alt=\"crates.io\"\u003e\u003c/a\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Rust-1.86.0-blue?color=fc8d62\u0026logo=rust\" alt=\"MSRV\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg\" alt=\"MIT OR Apache-2.0\"\u003e\n  \u003c/p\u003e\n  \u003cimg src=\"assets/demo.gif\" alt=\"servo-fetch demo\" width=\"900\"\u003e\n\u003c/div\u003e\n\nservo-fetch embeds the [Servo](https://servo.org/) browser engine. It executes JavaScript, computes CSS layout,\ncaptures screenshots with a software renderer, and extracts clean content — available as a CLI, a Rust library,\nand a Python SDK.\n\n```bash\n# CLI\nservo-fetch \"https://example.com\"                        # clean Markdown\nservo-fetch \"https://example.com\" --screenshot page.png  # PNG screenshot\n```\n\n```rust\n// Rust\nlet md = servo_fetch::markdown(\"https://example.com\")?;\n```\n\n```python\n# Python\npage = servo_fetch.fetch(\"https://example.com\")\nprint(page.markdown)\n```\n\n## Why servo-fetch\n\n- **Zero dependencies** — single binary, no Chromium, no API key\n- **Real JS execution** — SpiderMonkey runs JavaScript, parallel CSS engine computes layout\n- **Layout- and visibility-aware extraction** — strips navbars, sidebars, footers by rendered position, plus cookie banners, modals, and CSS-hidden content (`opacity:0`, `aria-hidden`, sr-only)\n- **Schema-driven JSON** — declarative CSS-selector schema pulls structured data\n- **Parallel batch fetch** — multiple URLs fetched concurrently\n- **Site crawling** — BFS link traversal with robots.txt, same-site scope, and rate limiting\n- **URL discovery** — sitemap-based URL mapping without rendering (fast, lightweight)\n- **Screenshots without GPU** — software renderer captures PNG/full-page screenshots anywhere\n- **Accessibility tree** — AccessKit integration with roles, names, and bounding boxes\n\n## Performance and quality\n\nApple M3 Pro, versus Playwright (the typical AI-agent stack):\n\n| Benchmark           | servo-fetch | playwright:optimized |\n| ------------------- | ----------: | -------------------: |\n| Time — static-small |     ~231 ms |              ~645 ms |\n| Time — spa-heavy    |     ~331 ms |              ~798 ms |\n| Memory (peak RSS)   |    51–64 MB |           300–328 MB |\n\nExtraction quality: mean word-F1 0.819 vs Readability's 0.728 across\neight page-type fixtures, with `without[]` boilerplate removal at 95.0%\nvs 78.6%. Direct-binary engine peers (chrome-headless-shell, Lightpanda,\ncurl) are opt-in.\n\nMethodology, three-axis breakdown, per-fixture F1, and raw JSON:\n[`benchmarks/README.md`](benchmarks/README.md) +\n[`benchmarks/results/`](benchmarks/results/).\n\n## Install\n\n| Interface | Install | Docs |\n|-----------|---------|------|\n| **CLI** | `curl -fsSL https://raw.githubusercontent.com/konippi/servo-fetch/main/install.sh \\| sh` | [CLI docs](crates/servo-fetch-cli/README.md) |\n| **Rust** | `cargo add servo-fetch` | [Library docs](crates/servo-fetch/README.md) |\n| **Python** | `pip install servo-fetch` | [Python docs](bindings/python/README.md) |\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eCLI install alternatives\u003c/b\u003e\u003c/summary\u003e\n\n```bash\ncargo binstall servo-fetch-cli   # prebuilt binary\ncargo install servo-fetch-cli    # build from source\n```\n\nOr download from [GitHub Releases](https://github.com/konippi/servo-fetch/releases).\n\n**Linux** — install runtime deps and use `xvfb-run` on headless servers:\n\n```bash\nsudo apt install -y libegl1 libfontconfig1 libfreetype6\nxvfb-run --auto-servernum servo-fetch \"https://example.com\"\n```\n\n**Windows** — keep `servo-fetch.exe`, `libEGL.dll`, and `libGLESv2.dll` in the same directory.\n\n**macOS** — no extra setup needed.\n\n\u003c/details\u003e\n\n## Quick Start\n\n### CLI\n\n```bash\nservo-fetch \"https://example.com\"                        # Markdown (default)\nservo-fetch \"https://example.com\" --json                 # Structured JSON\nservo-fetch \"https://example.com\" --screenshot page.png  # PNG screenshot\nservo-fetch \"https://example.com\" --js \"document.title\"  # Run JavaScript\nservo-fetch \"https://example.com\" --schema schema.json   # Schema-driven JSON\nservo-fetch URL1 URL2 URL3                               # Parallel batch\nservo-fetch crawl \"https://docs.example.com\" --limit 20  # Crawl a site\nservo-fetch map \"https://example.com\"                    # Discover URLs via sitemap\nservo-fetch mcp                                          # MCP server (stdio)\nservo-fetch serve                                        # HTTP API server\n```\n\nFull CLI reference → [`servo-fetch-cli`](crates/servo-fetch-cli/README.md)\n\n### Rust\n\n```bash\ncargo add servo-fetch\n```\n\n```rust\n// URL → Markdown in one line\nlet md = servo_fetch::markdown(\"https://example.com\")?;\n\n// Fetch with options\nuse servo_fetch::{fetch, FetchOptions};\nuse std::time::Duration;\n\nlet page = fetch(FetchOptions::new(\"https://example.com\").timeout(Duration::from_secs(60)))?;\nprintln!(\"{}\", page.html);\nlet md = page.markdown()?;\n\n// Crawl a site\nservo_fetch::crawl_each(\n    servo_fetch::CrawlOptions::new(\"https://docs.example.com\")\n        .limit(100)\n        .user_agent(\"MyBot/1.0\"),\n    |result| match \u0026result.outcome {\n        Ok(page) =\u003e println!(\"{}: {} chars\", result.url, page.content.len()),\n        Err(e) =\u003e eprintln!(\"{}: {e}\", result.url),\n    },\n)?;\n\n// Discover URLs via sitemap (no rendering)\nlet urls = servo_fetch::map(\n    servo_fetch::MapOptions::new(\"https://example.com\").limit(1000),\n)?;\nfor u in \u0026urls {\n    println!(\"{}\", u.url);\n}\n```\n\nFull API reference → [`servo-fetch`](crates/servo-fetch/README.md)\n\n### Python\n\n```bash\npip install servo-fetch\n```\n\n```python\nimport servo_fetch\n\npage = servo_fetch.fetch(\"https://example.com\")\nprint(page.markdown)\n\n# Schema extraction\nfrom servo_fetch import Schema, Field\nschema = Schema(\n    base_selector=\".product\",\n    fields=[\n        Field(name=\"title\", selector=\"h2\", type=\"text\"),\n        Field(name=\"price\", selector=\".price\", type=\"text\"),\n    ],\n)\npage = servo_fetch.fetch(\"https://shop.example.com\", schema=schema)\nprint(page.extracted)\n```\n\nFull API reference → [`bindings/python`](bindings/python/README.md)\n\n## MCP Server\n\nBuilt-in [Model Context Protocol](https://modelcontextprotocol.io/) server with six tools: `fetch`,\n`batch_fetch`, `crawl`, `map`, `screenshot`, and `execute_js`.\n\n```json\n{\n  \"mcpServers\": {\n    \"servo-fetch\": {\n      \"command\": \"servo-fetch\",\n      \"args\": [\"mcp\"]\n    }\n  }\n}\n```\n\nStreamable HTTP: `servo-fetch mcp --port 8080`\n\nFull MCP tool reference → [`servo-fetch-cli` README](crates/servo-fetch-cli/README.md)\n\n## HTTP API\n\nREST endpoints for containerized deployments and HTTP clients:\n\n```bash\nservo-fetch serve                            # 127.0.0.1:3000\nservo-fetch serve --host 0.0.0.0 --port 80   # expose to network\n\ncurl -X POST http://127.0.0.1:3000/v1/fetch \\\n  -H 'content-type: application/json' \\\n  -d '{\"url\":\"https://example.com\"}'\n```\n\nEndpoints: `GET /health`, `GET /version`, `POST /v1/fetch`, `POST /v1/batch_fetch`, `POST /v1/screenshot`, `POST /v1/execute_js`, `POST /v1/crawl`, `POST /v1/map`.\n\nFull HTTP API reference → [`servo-fetch-cli` README](crates/servo-fetch-cli/README.md#http-api-server)\n\n## Docker\n\nMulti-arch image on GitHub Container Registry (`linux/amd64`, `linux/arm64`):\n\n```bash\ndocker run --rm -p 3000:3000 ghcr.io/konippi/servo-fetch:latest\ncurl -X POST http://127.0.0.1:3000/v1/fetch \\\n  -H 'content-type: application/json' \\\n  -d '{\"url\":\"https://example.com\"}'\n```\n\nRuns as non-root (UID 1001). Images are signed with [cosign](https://github.com/sigstore/cosign) (keyless) and published with SLSA provenance and SBOM attestations.\n\n## Agent Skills\n\nservo-fetch ships with an [Agent Skills](https://agentskills.io/) package for AI coding agents:\n\n```bash\nnpx skills add https://github.com/konippi/servo-fetch/tree/main/skills/servo-fetch\n```\n\n## Security\n\nservo-fetch blocks all private and reserved IP ranges ([RFC 6890](https://datatracker.ietf.org/doc/html/rfc6890)),\nstrips credentials from URLs, disables HTTP redirects to prevent SSRF bypass, and sanitizes all output against\nterminal escape injection ([CVE-2021-42574](https://www.cve.org/CVERecord?id=CVE-2021-42574)).\nSee [SECURITY.md](./SECURITY.md) for details.\n\n## Limitations\n\n- Sites behind login walls or CAPTCHAs are not supported.\n\n## Contributing\n\nSee [CONTRIBUTING.md](./CONTRIBUTING.md) for development setup, commit conventions, and PR guidelines.\n\n## License\n\n[MIT](./LICENSE-MIT) OR [Apache-2.0](./LICENSE-APACHE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkonippi%2Fservo-fetch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkonippi%2Fservo-fetch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkonippi%2Fservo-fetch/lists"}