{"id":44862830,"url":"https://github.com/webpeel/webpeel","last_synced_at":"2026-04-02T00:34:47.960Z","repository":{"id":338081223,"uuid":"1156507506","full_name":"webpeel/webpeel","owner":"webpeel","description":"The web data platform for AI agents. Fetch, search, crawl, extract, monitor, screenshot — one API. 29 domain extractors, 65-98% token savings, MCP server included.","archived":false,"fork":false,"pushed_at":"2026-03-23T07:48:19.000Z","size":61131,"stargazers_count":3,"open_issues_count":1,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-23T10:57:35.218Z","etag":null,"topics":["ai-agent","claude","crawl","cursor","extract","firecrawl-alternative","llm","markdown","mcp-server","monitor","playwright","rag","screenshot","search","token-efficiency","typescript","web-data","web-fetcher","web-scraper"],"latest_commit_sha":null,"homepage":"https://webpeel.dev","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/webpeel.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"webpeel","custom":["https://webpeel.dev"]}},"created_at":"2026-02-12T18:18:25.000Z","updated_at":"2026-03-23T07:48:23.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/webpeel/webpeel","commit_stats":null,"previous_names":["jakeliume/webpeel","webpeel/webpeel"],"tags_count":13,"template":false,"temp
late_full_name":null,"purl":"pkg:github/webpeel/webpeel","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webpeel%2Fwebpeel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webpeel%2Fwebpeel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webpeel%2Fwebpeel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webpeel%2Fwebpeel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/webpeel","download_url":"https://codeload.github.com/webpeel/webpeel/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webpeel%2Fwebpeel/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31293418,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-01T21:15:39.731Z","status":"ssl_error","status_checked_at":"2026-04-01T21:15:34.046Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agent","claude","crawl","cursor","extract","firecrawl-alternative","llm","markdown","mcp-server","monitor","playwright","rag","screenshot","search","token-efficiency","typescript","web-data","web-fetcher","web-scraper"],"created_at":"2026-02-17T10:41:15.812Z","updated_at":"2026-04-02T00:34:47.952Z","avatar_url":"https://github.com/webpeel.png","language":"TypeScript","readme":"\u003cp 
align=\"center\"\u003e\n  \u003ca href=\"https://webpeel.dev\"\u003e\n    \u003cimg src=\".github/banner.svg\" alt=\"WebPeel — Web data API for AI agents\" width=\"100%\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://www.npmjs.com/package/webpeel\"\u003e\u003cimg src=\"https://img.shields.io/npm/v/webpeel.svg?style=flat-square\" alt=\"npm version\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://www.npmjs.com/package/webpeel\"\u003e\u003cimg src=\"https://img.shields.io/npm/dm/webpeel.svg?style=flat-square\" alt=\"npm downloads\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/webpeel/webpeel/stargazers\"\u003e\u003cimg src=\"https://img.shields.io/github/stars/webpeel/webpeel?style=flat-square\" alt=\"GitHub stars\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/webpeel/webpeel/actions/workflows/ci.yml\"\u003e\u003cimg src=\"https://github.com/webpeel/webpeel/actions/workflows/ci.yml/badge.svg\" alt=\"CI\"\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-WebPeel%20SDK-blue.svg?style=flat-square\" alt=\"License\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003ch3 align=\"center\"\u003eThe web data layer for AI agents.\u003cbr\u003eFetch, search, crawl, extract, screenshot — one call, zero boilerplate.\u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#quick-start\"\u003eQuick Start\u003c/a\u003e ·\n  \u003ca href=\"#agent-native-integrations\"\u003eAgent Integrations\u003c/a\u003e ·\n  \u003ca href=\"https://webpeel.dev/docs\"\u003eDocs\u003c/a\u003e ·\n  \u003ca href=\"https://webpeel.dev/playground\"\u003ePlayground\u003c/a\u003e ·\n  \u003ca href=\"https://app.webpeel.dev/signup\"\u003eGet API Key\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\".github/readme-demo.svg\" alt=\"WebPeel demo showing agent-friendly web fetch input, automatic engine selection, and clean JSON output\" 
width=\"100%\"\u003e\n\u003c/p\u003e\n\n---\n\n## The Problem\n\nEvery AI agent that touches the web rebuilds the same brittle stack: HTTP fetch → headless browser → anti-bot bypass → HTML cleanup → markdown conversion → token budgeting. Each layer fails differently. Sites change. Cloudflare rotates challenges. Your agent gets empty strings at 2 AM and your pipeline breaks.\n\n**WebPeel replaces that entire stack with one function call.** It handles engine selection, anti-bot escalation, domain-specific extraction, and token optimization so your agent gets clean, structured data every time — without managing browsers, proxies, or parsing logic.\n\n---\n\n## Quick Start\n\n```bash\n# Zero-install — just run it\nnpx webpeel \"https://example.com\"\n\n# Search the web\nnpx webpeel search \"latest AI agent frameworks\"\n\n# Crawl an entire site\nnpx webpeel crawl docs.example.com --max-pages 50\n\n# Screenshot any page\nnpx webpeel screenshot \"https://stripe.com/pricing\" --full-page\n\n# Ask a question about any page\nnpx webpeel ask \"https://arxiv.org/abs/2401.00001\" \"What is the main contribution?\"\n```\n\nOr install globally:\n\n```bash\nnpm install -g webpeel\n```\n\n**Use as a library:**\n\n```typescript\nimport { peel } from 'webpeel';\n\nconst result = await peel('https://news.ycombinator.com');\nconsole.log(result.markdown);   // Clean markdown, ready for your LLM\nconsole.log(result.metadata);   // Title, tokens saved, timing, etc.\n```\n\n**Use via API:**\n\n```bash\ncurl \"https://api.webpeel.dev/v1/fetch?url=https://stripe.com/pricing\" \\\n  -H \"Authorization: Bearer $WEBPEEL_API_KEY\"\n```\n\n```json\n{\n  \"url\": \"https://stripe.com/pricing\",\n  \"markdown\": \"# Stripe Pricing\\n\\n**Integrated per-transaction fees**...\",\n  \"metadata\": {\n    \"title\": \"Pricing \u0026 Fees | Stripe\",\n    \"tokens\": 420,\n    \"tokensOriginal\": 8200,\n    \"savingsPct\": 94.9\n  }\n}\n```\n\n[Get your free API key →](https://app.webpeel.dev/signup) · 
No credit card required · 500 requests/week free\n\n---\n\n## Why WebPeel\n\n### 🧠 55+ Domain Extractors — Not Just HTML-to-Markdown\n\nGeneric scrapers convert raw HTML to markdown and call it a day. WebPeel has **purpose-built extractors** for 55+ domains — Reddit, GitHub, YouTube, Amazon, ArXiv, Hacker News, Wikipedia, StackOverflow, Zillow, Polymarket, ESPN, and more. Each extractor understands the site's structure and returns clean, structured data without browser rendering.\n\n### ⚡ 65–98% Token Savings\n\nDomain extractors strip navigation, ads, sidebars, and boilerplate *before* content reaches your agent. Less context consumed = lower costs, faster inference, and longer agent chains.\n\n| Site | Raw HTML tokens | WebPeel tokens | Savings |\n|------|:--------------:|:--------------:|:-------:|\n| News article | 18,000 | 640 | **96%** |\n| Reddit thread | 24,000 | 890 | **96%** |\n| Wikipedia page | 31,000 | 2,100 | **93%** |\n| GitHub README | 5,200 | 1,800 | **65%** |\n| E-commerce product | 14,000 | 310 | **98%** |\n\n### 🔄 6-Layer Engine Escalation\n\nWebPeel doesn't just try one method — it automatically escalates through 6 engines until it gets a good result:\n\n```\nSimple HTTP → Domain API → Browser render → Stealth browser → Cloaked browser → Search cache fallback\n```\n\nNo manual `--render` flags for most sites. WebPeel knows which sites need JavaScript, which need stealth, and which have anti-bot protection — and picks the right engine automatically.\n\n### 🔌 Firecrawl-Compatible Migration Path\n\nAlready using Firecrawl-style workflows? 
WebPeel supports compatible `/v1/scrape`, `/v2/scrape`, `/v1/crawl`, `/v1/search`, and `/v1/map` endpoints, which makes migration dramatically easier than rebuilding your pipeline from scratch.\n\n---\n\n## Agent-Native Integrations\n\n### MCP Server (Claude, Cursor, Windsurf, VS Code)\n\nGive any MCP-compatible AI the ability to browse, search, and extract from the web.\n\n```json\n{\n  \"mcpServers\": {\n    \"webpeel\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"webpeel\", \"mcp\"],\n      \"env\": { \"WEBPEEL_API_KEY\": \"wp_your_key_here\" }\n    }\n  }\n}\n```\n\n**7 MCP tools exposed:** `webpeel_read` · `webpeel_find` · `webpeel_see` · `webpeel_extract` · `webpeel_monitor` · `webpeel_act` · `webpeel_crawl`\n\n[Full MCP setup guide →](https://webpeel.dev/docs/mcp)\n\n### LangChain\n\n```typescript\nimport { WebPeelLoader } from 'webpeel/integrations/langchain';\n\nconst loader = new WebPeelLoader({ url: 'https://example.com', render: true });\nconst docs = await loader.load();\n```\n\n### LlamaIndex\n\n```typescript\nimport { WebPeelReader } from 'webpeel/integrations/llamaindex';\n\nconst reader = new WebPeelReader();\nconst docs = await reader.loadData('https://example.com');\n```\n\n### Python SDK\n\n```bash\npip install webpeel\n```\n\n```python\nfrom webpeel import WebPeel\n\nwp = WebPeel(api_key=\"wp_...\")\nresult = wp.fetch(\"https://example.com\")\nprint(result.markdown)\n```\n\n---\n\n## Full Feature Set\n\n| Capability | CLI | API | Details |\n|-----------|:---:|:---:|---------|\n| **Fetch \u0026 extract** | `webpeel \"url\"` | `GET /v1/fetch` | Clean markdown from any URL |\n| **Web search** | `webpeel search \"query\"` | `GET /v1/search` | DuckDuckGo (free) or Brave (BYOK) |\n| **Smart search** | — | `POST /v1/search/smart` | AI-powered structured results |\n| **Crawl sites** | `webpeel crawl \"url\"` | `POST /v1/crawl` | Depth/page limits, rate control |\n| **Screenshots** | `webpeel screenshot \"url\"` | `POST /v1/screenshot` | 
Full-page, multi-viewport, visual diff, filmstrip |\n| **Structured extraction** | `--extract-schema` | `POST /v1/extract` | JSON schema → structured data |\n| **Q\u0026A** | `webpeel ask \"url\" \"q\"` | `POST /v1/answer` | Answer questions about any page |\n| **Deep research** | — | `POST /v1/deep-research` | Multi-query autonomous research |\n| **Content monitoring** | `webpeel monitor \"url\"` | `POST /v1/watch` | Change detection with webhooks |\n| **Browser sessions** | — | `POST /v1/session` | Persistent sessions for login flows |\n| **Browser actions** | `--action 'click:.btn'` | actions field | Click, type, scroll, wait |\n| **Batch scrape** | `webpeel batch file` | `POST /v1/batch/scrape` | Parallel multi-URL processing |\n| **URL discovery** | `webpeel map \"url\"` | `POST /v1/map` | Sitemap and link discovery |\n| **YouTube transcripts** | auto-detected | auto-detected | Multiple export formats |\n| **PDF extraction** | auto-detected | auto-detected | Text, tables, structure |\n| **Research agent** | — | `POST /v1/agent` | Autonomous multi-step research |\n\n---\n\n## Use Cases for Agent Builders\n\n**RAG pipelines** — Fetch docs, articles, or entire sites as clean markdown ready for chunking, embedding, and retrieval.\n\n**Price monitoring** — Track product pages across major commerce sites with structured extraction and change detection.\n\n**Competitive intel** — Monitor competitor pages, pricing tables, and job boards. 
Visual diff screenshots catch layout changes CSS selectors would miss.\n\n**Research agents** — Give Claude, Codex, Cursor, or your own agent grounded web access through the API or MCP server.\n\n**Lead enrichment** — Pull company details, public links, and page structure from business sites without writing per-site parsers.\n\n**Content aggregation** — Crawl and extract from communities, docs sites, and publications with domain-native extractors that understand each site's structure.\n\n---\n\n## Architecture\n\n```\nYour Agent\n    ↓\n WebPeel (npm / API / MCP)\n    ↓\n┌─────────────────────────────────┐\n│  Engine Ranker                  │\n│  HTTP → Domain API → Browser   │\n│  → Stealth → Cloaked → Cache   │\n├─────────────────────────────────┤\n│  55+ Domain Extractors          │\n│  reddit · github · youtube      │\n│  amazon · arxiv · zillow · ...  │\n├─────────────────────────────────┤\n│  Content Pipeline               │\n│  Readability → Turndown →       │\n│  Token budgeting → Chunking     │\n└─────────────────────────────────┘\n    ↓\n Clean markdown / structured JSON\n```\n\n---\n\n## Reliability\n\nWebPeel is built for production agent workflows, not just one-off demos.\n\n- **Automated evals in-repo** — smart search and fetch eval suites ship with the codebase\n- **Post-deploy gate** — critical checks run before calling a deploy healthy\n- **Engine fallback chain** — when one fetch method fails, WebPeel escalates instead of giving up\n- **Multiple surfaces, one core** — CLI, API, SDK, and MCP all ride the same extraction pipeline\n\n---\n\n## Security\n\n- **SSRF protection** — blocks localhost, private IPs, metadata endpoints, `file://` schemes\n- **Helmet.js** — HSTS, X-Frame-Options, nosniff, XSS protection on all responses\n- **Webhook signing** — HMAC-SHA256 on all outbound webhooks\n- **API key hashing** — SHA-256 with granular scopes\n- **Rate limiting** — sliding window, per-tier\n- **Audit logging** — every API call logged with IP, key, and 
action\n- **GDPR compliant** — `DELETE /v1/account` for full data erasure\n\n[Security policy →](https://webpeel.dev/security) · [SLA (99.9% uptime) →](https://webpeel.dev/sla)\n\n---\n\n## Why teams choose WebPeel instead of stitching a stack together\n\n| Approach | What it gives you | Where it breaks down |\n|---|---|---|\n| Raw HTTP + HTML parsing | Cheap, simple fetches | Falls apart on JS-heavy sites, anti-bot pages, and noisy HTML |\n| Pure browser automation | Maximum control | Expensive, slow, fragile, and high-maintenance for large-scale use |\n| Search-only APIs | Great discovery | Weak page extraction, limited structured output, limited downstream actions |\n| Single-purpose scrapers | Fast on one job | You end up composing 4–6 tools for real agent workflows |\n| **WebPeel** | Fetch + search + crawl + extraction + screenshots + monitoring in one layer | Opinionated toward agent workflows rather than generic scraping |\n\n---\n\n## Links\n\n📖 [Documentation](https://webpeel.dev/docs) · 💰 [Pricing](https://webpeel.dev/pricing) · 🎮 [Playground](https://webpeel.dev/playground) · 📝 [Blog](https://webpeel.dev/blog) · 💬 [Discussions](https://github.com/webpeel/webpeel/discussions) · 🚀 [Releases](https://github.com/webpeel/webpeel/releases) · 📊 [Status](https://webpeel.dev/status) · 🔒 [Security](https://webpeel.dev/security) · 📋 [Changelog](https://webpeel.dev/changelog)\n\n---\n\n## Contributing\n\nPull requests welcome. 
Please open an issue first to discuss major changes.\n\n```bash\ngit clone https://github.com/webpeel/webpeel.git\ncd webpeel \u0026\u0026 npm install\nnpm run build \u0026\u0026 npm test\n```\n\n---\n\n## License\n\n[WebPeel SDK License](LICENSE) — free for personal and commercial use with attribution.\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://app.webpeel.dev/signup\"\u003e\u003cstrong\u003eGet started free →\u003c/strong\u003e\u003c/a\u003e\n\u003c/p\u003e\n","funding_links":["https://github.com/sponsors/webpeel","https://webpeel.dev"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwebpeel%2Fwebpeel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwebpeel%2Fwebpeel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwebpeel%2Fwebpeel/lists"}