{"id":45958360,"url":"https://github.com/giulio-leone/onecrawl","last_synced_at":"2026-03-12T01:11:29.172Z","repository":{"id":335019238,"uuid":"1143774046","full_name":"giulio-leone/onecrawl","owner":"giulio-leone","description":"Native TypeScript web crawler and scraper. Zero Python dependencies.","archived":false,"fork":false,"pushed_at":"2026-03-10T03:23:43.000Z","size":120687,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-10T03:57:09.156Z","etag":null,"topics":["ai","generative-ui","onegenui","typescript"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/giulio-leone.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-01-28T00:54:14.000Z","updated_at":"2026-03-10T03:23:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/giulio-leone/onecrawl","commit_stats":null,"previous_names":["g97iulio1609/onecrawl","giulio-leone/onecrawl"],"tags_count":37,"template":false,"template_full_name":null,"purl":"pkg:github/giulio-leone/onecrawl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/giulio-leone%2Fonecrawl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/giulio-leone%2Fonecrawl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/giulio-leone%2Fonecrawl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/giulio-leone%2Fonecrawl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/giulio-leone","download_url":"https://codeload.github.com/giulio-leone/onecrawl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/giulio-leone%2Fonecrawl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30410373,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-12T00:40:14.898Z","status":"ssl_error","status_checked_at":"2026-03-12T00:40:08.439Z","response_time":84,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","generative-ui","onegenui","typescript"],"created_at":"2026-02-28T13:40:00.903Z","updated_at":"2026-03-12T01:11:29.143Z","avatar_url":"https://github.com/giulio-leone.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OneCrawl\n\nHigh-performance browser automation engine written in Rust. Native bindings for Node.js and Python.\n\n## Architecture\n\n```\npackages/onecrawl-rust/\n├── crates/\n│   ├── onecrawl-core/       # Shared types, traits, error handling\n│   ├── onecrawl-crypto/     # AES-256-GCM, PKCE, TOTP, PBKDF2 (ring)\n│   ├── onecrawl-parser/     # HTML parsing \u0026 accessibility tree (lol_html + scraper)\n│   ├── onecrawl-storage/    # Encrypted key-value store (sled)\n│   ├── onecrawl-cdp/        # Chrome DevTools Protocol — 63 modules (chromiumoxide)\n│   ├── onecrawl-server/     # HTTP REST API with multi-instance management (axum)\n│   ├── onecrawl-cli-rs/     # Native CLI — 200+ commands (clap v4)\n│   └── onecrawl-mcp-rs/     # MCP server — 10 super-tools, 250 actions (rmcp)\n├── bindings/\n│   ├── napi/                # Node.js via NAPI-RS → @onecrawl/native (307 methods)\n│   └── python/              # Python via PyO3 → onecrawl\n└── Cargo.toml               # Workspace root\n```\n\n## Features\n\n| Category | Highlights |\n|----------|-----------|\n| **Browser** | Launch, connect, stealth-by-default, proxy rotation, fingerprint evasion, session config |\n| **CDP** | 63 modules: DOM, Network, CSS, Performance, Accessibility, Profiler, Tracing, WebAuthn… |\n| **Navigation** | goto, back, forward, reload, wait, screenshot, PDF, multi-tab |\n| **Interaction** | click, type, drag \u0026 drop, hover, keyboard shortcuts, select, file upload |\n| **Scraping** | CSS selectors, XPath, accessibility tree, shadow DOM piercing, streaming extraction |\n| **Crawling** | Spider, sitemap, link graph, robots.txt, DOM snapshot diff |\n| **Network** | Request interception, mock responses, URL blocking, console capture, dialog handling |\n| **Emulation** | Device emulation, geolocation, timezone, media features, network throttling |\n| **Auth** | WebAuthn/Passkey virtual authenticator, cookie/session management, import/export |\n| **Crypto** | AES-256-GCM encryption, PKCE, TOTP, PBKDF2 key derivation |\n| **AI Agent** | Agent memory, workflow DSL, task planner, autonomous computer_use, visual regression testing, performance monitor |\n| **Accessibility** | WCAG compliance auditing, ARIA tree, contrast checks, heading structure, keyboard traps, screen reader simulation |\n| **Real-Time** | WebSocket connect/intercept/send, Server-Sent Events, GraphQL subscriptions |\n| **Human Simulation** | Bézier mouse curves, natural typing with typos, human-like scrolling, behavior profiles |\n| **Service Workers** | SW register/unregister/update, Cache Storage management, push simulation, offline mode |\n| **Server** | Multi-instance Chrome, profiles, tabs, accessibility snapshots, action API |\n| **MCP** | 10 super-tools with 250 actions for AI agent orchestration |\n\n## Installation\n\n### CLI (from source)\n\n```bash\ncd packages/onecrawl-rust\ncargo install --path crates/onecrawl-cli-rs\n```\n\n### Node.js\n\n```bash\nnpm install @onecrawl/native\n```\n\n```javascript\nimport { NativeBrowser } from '@onecrawl/native';\n\nconst browser = await NativeBrowser.launch(true); // headless\nawait browser.goto('https://example.com');\nconst title = await browser.getTitle();\nconst screenshot = await browser.screenshot(); // Buffer\nawait browser.close();\n```\n\n### Python\n\n```bash\npip install onecrawl\n```\n\n```python\nfrom onecrawl import Browser\n\nbrowser = Browser()\nbrowser.launch(headless=True, stealth=True)\nbrowser.goto(\"https://example.com\")\nhtml = browser.content()\nbrowser.close()\n```\n\n## CLI Usage\n\n```bash\n# Launch browser and navigate\nonecrawl launch --stealth\nonecrawl goto https://example.com\n\n# Scraping\nonecrawl css \"h1\" --attr textContent\nonecrawl xpath \"//a[@href]\"\nonecrawl readability https://example.com\n\n# Crawling\nonecrawl spider https://example.com --depth 3\nonecrawl sitemap https://example.com/sitemap.xml\n\n# Screenshots \u0026 PDF\nonecrawl screenshot --full-page -o page.png\nonecrawl pdf -o page.pdf\n\n# Authentication\nonecrawl auth passkey-enable\nonecrawl auth passkey-create --rp-id example.com --user-name admin\n\n# HTTP Server (multi-instance)\nonecrawl serve --port 9867\n```\n\n## HTTP Server API\n\nStart the server with `onecrawl serve` and manage browser instances via REST:\n\n```bash\n# Create a Chrome instance\ncurl -X POST http://localhost:9867/instances \\\n  -H 'Content-Type: application/json' \\\n  -d '{\"profile\": \"default\"}'\n\n# Open a tab and navigate\ncurl -X POST http://localhost:9867/instances/{id}/tabs \\\n  -d '{\"url\": \"https://example.com\"}'\n\n# Get accessibility snapshot (stable element refs)\ncurl http://localhost:9867/instances/{id}/tabs/{tab}/snapshot\n\n# Execute action by element ref\ncurl -X POST http://localhost:9867/instances/{id}/tabs/{tab}/action \\\n  -d '{\"ref\": \"e5\", \"action\": \"click\"}'\n\n# Get token-efficient text (~800 tokens/page)\ncurl http://localhost:9867/instances/{id}/tabs/{tab}/text\n```\n\n### Endpoints\n\n| Method | Path | Description |\n|--------|------|-------------|\n| POST | `/instances` | Create Chrome instance |\n| GET | `/instances` | List instances |\n| DELETE | `/instances/:id` | Stop instance |\n| POST | `/instances/:id/tabs` | Open tab |\n| GET | `/instances/:id/tabs` | List tabs |\n| DELETE | `/instances/:id/tabs/:tab` | Close tab |\n| POST | `/instances/:id/tabs/:tab/navigate` | Navigate tab |\n| GET | `/instances/:id/tabs/:tab/snapshot` | Accessibility snapshot |\n| POST | `/instances/:id/tabs/:tab/action` | Execute action by ref |\n| GET | `/instances/:id/tabs/:tab/text` | Token-efficient text |\n| POST | `/profiles` | Create profile |\n| GET | `/profiles` | List profiles |\n| DELETE | `/profiles/:name` | Delete profile |\n| GET | `/health` | Health check |\n\n## MCP Integration\n\n10 super-tools with 250 total actions, using action-based dispatch:\n\n```json\n{\"action\": \"goto\", \"params\": {\"url\": \"https://example.com\"}}\n```\n\n| Super-Tool | Actions | Highlights |\n|------------|---------|------------|\n| **browser** | 95 | Navigation, interaction, extraction, multi-tab, DOM events, session, network interception, console/dialog, device emulation, drag/drop, file upload, shadow DOM, session context, smart forms, self-healing selectors, event reactions, service worker/PWA, offline mode, session config |\n| **crawl** | 5 | Spider, robots.txt, sitemap, DOM snapshot/diff |\n| **agent** | 40 | Stealth, fingerprint, anti-bot detection, proxy health, CAPTCHA, CDP cross-origin iframe interaction, task decomposition, vision observation, WCAG auditing, accessibility tree, screen reader simulation |\n| **stealth** | 13 | Enable/disable stealth, rotate fingerprint, proxy health, CAPTCHA solving, human behavior simulation |\n| **data** | 26 | Cookies, storage, structured data extraction, entity extraction, feeds, WebSocket, SSE, GraphQL subscriptions |\n| **secure** | 21 | WebAuthn/Passkey, vault, OAuth2, session/form auth, MFA, credentials |\n| **computer** | 18 | AI computer-use, autonomous goal execution, smart element resolution, multi-browser fleet |\n| **memory** | 6 | Agent memory: store, recall, search, forget, list, export |\n| **automate** | 19 | Workflow DSL: run, validate, list, templates, error recovery, session checkpoints, workflow control flow |\n| **perf** | 7 | Performance: audit, metrics, budget, trace, VRT comparison |\n\n\u003e Full API reference: [`docs/MCP_API_REFERENCE.md`](docs/MCP_API_REFERENCE.md)\n\n## Node.js Bindings\n\n`@onecrawl/native` exposes **307 methods** via NAPI-RS (direct FFI, no child process overhead):\n\n| Class/Module | Methods | Description |\n|-------------|---------|-------------|\n| **NativeBrowser** | 290 | Full browser control: navigation, interaction, scraping, crawling, emulation, auth, network, performance |\n| **NativeStore** | 7 | Encrypted key-value store (sled) |\n| **Crypto** | 6 | AES-256-GCM, PKCE, TOTP, PBKDF2 |\n| **Parser** | 4 | A11y tree, CSS selector, text/link extraction |\n\nFeatures: TypeScript types (`index.d.ts`), async/await, Buffer support, 33 test files (3,995 lines), 8-platform cross-compilation.\n\n## Development\n\n```bash\ncd packages/onecrawl-rust\n\n# Build all crates\ncargo build --workspace\n\n# Run tests (427 tests)\ncargo test --workspace --exclude onecrawl-e2e\n\n# Build release binary\ncargo build --release -p onecrawl-cli-rs\n```\n\n## Metrics\n\n| Metric | Value |\n|--------|-------|\n| Rust test suite | 427 tests |\n| Node.js test suite | 33 files, 3,995 lines |\n| CDP modules | 63 |\n| CLI commands | 200+ |\n| MCP super-tools | 10 (250 actions) |\n| NAPI methods | 307 |\n| Handler modules | 10 (split architecture) |\n| Enum-dispatched actions | 250 (compile-time exhaustive) |\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgiulio-leone%2Fonecrawl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgiulio-leone%2Fonecrawl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgiulio-leone%2Fonecrawl/lists"}