
https://github.com/0xMassi/webclaw

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.

ai ai-agents ai-scraping cli crawler data-extraction html-to-markdown llm markdown mcp mcp-server rust scraper self-hosted tls-fingerprinting web-crawler web-extraction web-scraper web-scraping webscraping






webclaw


The fastest web scraper for AI agents.

67% fewer tokens. Sub-millisecond extraction. Zero browser overhead.






---




Claude Code's built-in web_fetch → 403 Forbidden. webclaw → clean markdown.

---

Your AI agent calls `fetch()` and gets a 403. Or 142KB of raw HTML that burns through your token budget. **webclaw fixes both.**

It extracts clean, structured content from any URL using Chrome-level TLS fingerprinting — no headless browser, no Selenium, no Puppeteer. Output is optimized for LLMs: **67% fewer tokens** than raw HTML, with metadata, links, and images preserved.
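To get a feel for where a figure like "67% fewer tokens" comes from, here is a minimal sketch of the arithmetic in Rust. It is not webclaw's extraction pipeline or its API. The helper names (`estimate_tokens`, `strip_tags`, `savings_percent`) are hypothetical, the tag stripper is deliberately naive, and the "about 4 characters per token" rule is a rough assumption rather than a real tokenizer.

```rust
/// Rough token estimate, assuming ~4 characters per token.
/// This is a rule-of-thumb assumption, not a real tokenizer.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Naive tag stripper, for illustration only: drops everything between
/// '<' and '>' and collapses runs of whitespace. It does not handle
/// <script>/<style> bodies, comments, or HTML entities.
fn strip_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => in_tag = true,
            '>' if in_tag => {
                in_tag = false;
                out.push(' '); // keep words on either side of a tag apart
            }
            _ if !in_tag => out.push(c),
            _ => {} // character inside a tag: skip it
        }
    }
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Percentage of tokens saved by the cleaned text relative to the raw input.
fn savings_percent(raw_tokens: usize, clean_tokens: usize) -> f64 {
    if raw_tokens == 0 {
        return 0.0;
    }
    100.0 * raw_tokens.saturating_sub(clean_tokens) as f64 / raw_tokens as f64
}
```

Calling `savings_percent(estimate_tokens(html), estimate_tokens(&strip_tags(html)))` on a saved page gives a ballpark number. A real extractor also drops navigation, ads, and scripts, so its savings will differ from what this sketch reports.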

```
 Raw HTML                             webclaw
┌──────────────────────────────────┐ ┌──────────────────────────────────┐

│ │ # Breaking: AI Breakthrough │
│ │ │ │
│ window.__NEXT_DATA__ │ │ Researchers achieved 94% │
│ ={...8KB of JSON...} │ │ accuracy on cross-domain │