{"id":47405689,"url":"https://github.com/ExceptionRegret/Kryfto","last_synced_at":"2026-04-03T21:00:57.964Z","repository":{"id":340041145,"uuid":"1164280028","full_name":"ExceptionRegret/Kryfto","owner":"ExceptionRegret","description":"The open-source web-browsing backend for AI agents \u0026 workflow engines. Ships a 42-tool MCP server for Claude Code/Cursor/Codex, a full REST API for n8n/Zapier/Make, federated multi-engine search, anti-bot stealth, and enterprise infrastructure (Postgres, Redis, BullMQ, MinIO). Self-host for $5/mo flat","archived":false,"fork":false,"pushed_at":"2026-03-21T19:34:23.000Z","size":726,"stargazers_count":6,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-22T08:47:08.966Z","etag":null,"topics":["ai-agents","anti-detection","claude-code","codex","cursor","data-extraction","developer-tools","fastapi","headless-browser","mcp","mcp-server","n8n","open-source","playwright","redis","search-engine","self-hosted","stealth","web-scraping","workflow-automation"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ExceptionRegret.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-02-22T22:23:14.000Z","updated_at":"2026-03-21T19:34:27.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ExceptionRegret/Kryfto","commit_stats":null,"previous_names":["exceptionregret/kryfto"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/ExceptionRegret/Kryfto","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExceptionRegret%2FKryfto","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExceptionRegret%2FKryfto/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExceptionRegret%2FKryfto/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExceptionRegret%2FKryfto/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ExceptionRegret","download_url":"https://codeload.github.com/ExceptionRegret/Kryfto/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExceptionRegret%2FKryfto/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31377109,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-03T17:53:18.093Z","status":"ssl_error","status_checked_at":"2026-04-03T17:53:17.617Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","anti-detection","claude-code","codex","cursor","data-extraction","developer-tools","fastapi","headless-browser","mcp","mcp-server","n8n","open-source","playwright","redis","search-engine","self-hosted","stealth","web-scraping","workflow-automation"],"created_at":"2026-03-20T18:00:24.224Z","updated_at":"2026-04-03T21:00:57.947Z","avatar_url":"https://github.com/ExceptionRegret.png","language":"TypeScript","funding_links":["https://github.com/sponsors/ExceptionRegret"],"categories":["Frameworks"],"sub_categories":["Common Lisp"],"readme":"\u003cdiv align=\"center\"\u003e\n  \n  \u003cimg src=\"assets/logo.png\" alt=\"Kryfto Logo\" width=\"280\" /\u003e\n\n  \u003ch1\u003eKryfto\u003c/h1\u003e\n\n[![Sponsor](https://img.shields.io/badge/Sponsor-%E2%9D%A4-ea4aaa?logo=github-sponsors)](https://github.com/sponsors/ExceptionRegret)\n\n  \u003cp\u003e\u003cstrong\u003eThe Production-Grade Browser Data Collection Runtime\u003c/strong\u003e\u003c/p\u003e\n  \n  [![License: Apache-2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)\n  [![Node.js 20+](https://img.shields.io/badge/node-20%2B-brightgreen.svg)]()\n  [![MCP Tools: 42+](https://img.shields.io/badge/MCP_Tools-42%2B-purple.svg)]()\n\n  [![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/new/template)\n  [![Deploy to DO](https://www.deploytodo.com/do-btn-blue.svg)](https://cloud.digitalocean.com/apps/new)\n  \n  \u003cp\u003eSelf-host your own headless browser fleet. Connect it instantly to AI agents, IDEs, and workflow engines via OpenAPI and MCP.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003chr/\u003e\n\n## ✨ Core Features\n\nKryfto is a comprehensive framework for automated data extraction, web crawling, and browser session execution.\n\n- **🤖 AI Agent Ready**: Ships with a built-in [Model Context Protocol (MCP)](https://modelcontextprotocol.io) server exposing **42+ tools**. Instantly give Claude, Cursor, or Codex the ability to search, browse, extract, fact-check, run continuous research agents, and benchmark search quality on the live web.\n- **🕵️‍♂️ Advanced Stealth \u0026 Anti-Bot Engine**: Unified anti-bot layer with **12 rotated modern User-Agents** (Chrome 130–133, Edge 131/133), per-browser `Sec-Ch-Ua` client hints, `Sec-Fetch-*` headers, Chromium-only `Accept` strings, engine-appropriate `Referer` headers, per-engine request spacing delays, canvas fingerprint randomization, WebGL vendor/renderer spoofing, `navigator.platform` matching, `hardwareConcurrency` randomization, WebRTC IP leak prevention, and an RFC 6265-compliant in-memory cookie jar with 30min TTL. **New in v3.5.1:** Consistent cross-signal fingerprints (UA matched to platform, screen, WebGL, fonts, and audio), 20-point browser evasion suite, humanized browser interactions (Bezier curve mouse movements with micro-overshoots, realistic typing with typos, smooth scrolling), per-domain browser session pool with 30min TTL, and browser-based CAPTCHA solving for Cloudflare Turnstile, reCAPTCHA v2, hCaptcha, and Datadome — all without external paid APIs. reCAPTCHA image grids are classified locally via CLIP vision (`clip-vit-large-patch14`), and audio challenges are transcribed locally via Whisper, both using `@xenova/transformers`.\n- **🛡️ Zero Trace Privacy**: Execute purely in-memory HTTP extractions wrapping our bot-evasion without persisting any telemetry or artifacts to the Postgres database.\n- **⚙️ Workflow Engine Native**: Fully documented OpenAPI spec makes it trivial to drop into `n8n`, Zapier, Make, or custom Python/TypeScript pipelines.\n- **🖥️ Admin Dashboard**: Built-in React admin UI (port 3001) for managing tokens, projects, jobs, crawls, audit logs, and per-role rate limits. Includes an interactive **API Playground** for testing any endpoint live and an **Examples** page with ready-to-use cURL commands. Dark-themed SPA served as a separate nginx container.\n- **☁️ Enterprise Infrastructure**: Backed by **Postgres** for persistence, **Redis + BullMQ** for reliable concurrent job queuing, and **MinIO/S3** for long-term artifact storage.\n- **📊 SLO Dashboard \u0026 Eval Suite**: Built-in reliability monitoring with per-tool success rates, latency percentiles (p50/p95/p99), deterministic request replay, and a 10-query benchmark suite for nightly regression testing.\n- **🔄 Continuous Research Agent**: Deploy autonomous background research loops that search, monitor, diff pages, and fire webhook alerts — all from a single MCP tool call.\n\n---\n\n## 🚀 Quickstart (Self-Hosted)\n\nGet Kryfto running locally in seconds using Docker Compose.\n\n```bash\n# Option 1: Auto-generate a secure .env with random tokens \u0026 passwords\nnode scripts/generate-env.mjs -o .env\n\n# Option 2: Or copy the example and fill in values manually\ncp .env.example .env\n\n# Spin up the entire infrastructure (API, Dashboard, Worker, Postgres, Redis, Minio S3)\ndocker compose up -d --build\n\n# Verify health\ncurl -H \"Authorization: Bearer $KRYFTO_API_TOKEN\" http://localhost:8080/v1/healthz\n```\n\nThe **Admin Dashboard** is available at `http://localhost:3001/dashboard/` — log in with your admin API token to manage tokens, projects, jobs, crawls, audit logs, and rate limits.\n\nOnce running, you can immediately dispatch extraction jobs to the headless worker fleet:\n\n```bash\ncurl -X POST http://localhost:8080/v1/jobs \\\n  -H \"Authorization: Bearer $KRYFTO_API_TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -H \"Idempotency-Key: demo-example-1\" \\\n  -d '{\"url\":\"https://example.com\"}'\n```\n\n### Reading Extracted Data\n\nAfter the job succeeds, retrieve the extracted Markdown or HTML artifact:\n\n```bash\ncurl -H \"Authorization: Bearer $KRYFTO_API_TOKEN\" \\\n  http://localhost:8080/v1/jobs/\u003cjobId\u003e/artifacts\n```\n\n### Running a Federated Search\n\nFind up-to-date information across DuckDuckGo, Brave, and Google natively:\n\n```bash\ncurl -X POST http://localhost:8080/v1/search \\\n  -H \"Authorization: Bearer $KRYFTO_API_TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"query\":\"playwright testing\", \"limit\":5, \"officialOnly\":true}'\n```\n\n\u003e **Note:** For a full breakdown of the REST API, parameter schemas, and advanced options, please refer to the [**API Reference Guide**](docs/api-reference.md).\n\n---\n\n## 📚 Documentation Index\n\nWe maintain exhaustive documentation for every component of the Kryfto stack.\n\n| Guide                                         | Description                                                                                                |\n| --------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |\n| 📖 [**Usage Examples**](docs/usage.md)        | Exhaustive API, CLI, and cURL examples for scraping, crawling, and scheduling retries.                     |\n| 🚀 [**Deployment Guides**](docs/deploy.md)    | How to deploy to Railway, DigitalOcean, and naked Linux VPS instances securely.                            |\n| 🤖 [**MCP Integration**](docs/mcp.md)         | How to connect Cursor, Claude Code, and Codex to your Kryfto server via HTTPS or SSH tunneling.            |\n| ⚡ [**n8n Workflow Guide**](docs/n8n.md)      | How to automate advanced, stealthy web extractions straight into Google Sheets using n8n.                  |\n| 🔒 [**Security \u0026 Roles**](docs/security.md)   | Setting up RBAC, admin tokens, token expiration, per-role rate limits, and preventing SSRF.                 |\n| 🏗️ [**Architecture**](docs/architecture.md)   | A deep-dive into the BullMQ, Redis, Node, and MinIO scaling infrastructure map.                            |\n| 🥘 [**Extraction Recipes**](docs/recipes.md)  | Pre-written JSON extraction selectors for popular websites. Auto-imported as dynamic `recipe_*` MCP tools. |\n| 🔌 [**OpenAPI Spec**](docs/openapi.yaml)      | The raw `yaml` schema defining the fully-typed REST API.                                                   |\n| ⚙️ [**API Reference**](docs/api-reference.md) | Structured usage guide for Jobs, Artifacts, and Search endpoints.                                          |\n\n---\n\n## 🧩 Ecosystem Integrations\n\nKryfto isn't just an API—it's designed to act as the web-browsing \"motor cortex\" for your existing tools.\n\n### 1. 🤖 Claude Code, Cursor, \u0026 Codex (MCP)\n\nYou can directly attach Kryfto to your AI assistant using the bundled **Model Context Protocol (MCP)** server.\n\n#### 🪄 Auto-Generate Configuration\n\nThe easiest way to get your IDE connected is to run the interactive setup wizard. It will auto-detect your API token and absolute path:\n\n```bash\nnode scripts/setup-mcp.mjs\n```\n\n_Select your client (Claude, Cursor, Codex, RooCode) and copy the generated JSON/TOML into your config file._\n\n---\n\n#### Manual Configuration\n\n**Claude Code / Cursor** — Add to `claude_desktop_config.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"kryfto\": {\n      \"command\": \"node\",\n      \"args\": [\"/absolute/path/to/kryfto/packages/mcp-server/dist/index.js\"],\n      \"env\": {\n        \"API_BASE_URL\": \"http://localhost:8080\",\n        \"API_TOKEN\": \"\u003cyour-token\u003e\"\n      }\n    }\n  }\n}\n```\n\n**OpenAI Codex** — Add to `.codex/config.toml` (per-project) or `~/.codex/config.toml` (global):\n\n```toml\n[mcp_servers.kryfto]\ncommand = \"node\"\nargs = [\"/absolute/path/to/kryfto/packages/mcp-server/dist/index.js\"]\n\n[mcp_servers.kryfto.env]\nAPI_BASE_URL = \"http://localhost:8080\"\nAPI_TOKEN = \"\u003cyour-token\u003e\"\n```\n\n**Remote VPS configuration (`claude_desktop_config.json` / Cursor MCP Menu):**\n\n**⚠️ SSH Keys Required:** The MCP tunnel relies on `stdio` and cannot accept manual passwords. You must set up SSH Key authentication from your local machine to your VPS.\n\n**macOS/Linux:**\n\n```bash\nssh-keygen -t ed25519 -C \"your_email@example.com\"\nssh-copy-id user@your-vps-ip\n```\n\n**Windows (PowerShell):**\n\n```powershell\nssh-keygen -t ed25519 -C \"your_email@example.com\"\n$Key = Get-Content \"$env:USERPROFILE\\.ssh\\id_ed25519.pub\"\nssh user@your-vps-ip \"mkdir -p ~/.ssh \u0026\u0026 echo '$Key' \u003e\u003e ~/.ssh/authorized_keys\"\n```\n\nOnce `ssh user@your-vps-ip` logs you in instantly without a password, paste this config:\n\n```json\n{\n  \"mcpServers\": {\n    \"kryfto-remote\": {\n      \"command\": \"ssh\",\n      \"args\": [\n        \"user@your-vps-ip\",\n        \"API_BASE_URL=http://localhost:8080\",\n        \"API_TOKEN=\u003cyour-token\u003e\",\n        \"node\",\n        \"/absolute/path/on/vps/to/kryfto/packages/mcp-server/dist/index.js\"\n      ]\n    }\n  }\n}\n```\n\n#### 🏆 Kryfto vs. Built-in Agent Browsers\n\nWhy install Kryfto when Claude and Cursor have built-in web search? Because Kryfto is engineered specifically for **evidence-based deterministic scraping** rather than noisy LLM-summarized search.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"assets/benchmark-proof.png\" alt=\"Kryfto MCP vs Built-in Browser Benchmark\" width=\"800\" /\u003e\n\u003c/div\u003e\n\n**Real-world benchmark (Query: `latest Next.js 15 features`):**\n\n- **Built-in Browser:** Returns a mix of non-official blogs (e.g., `nextjs15.com`), video results, and unstructured snippets. Fails to consistently identify the newest minor release.\n- **Kryfto MCP:** Extracts the semantic release version (`15.5`) from the URL, automatically ranks the official `nextjs.org` blog at **Rank #1**, and extracts the raw Markdown documentation structure (headings, code blocks, publish date) in a single deterministic pass.\n\n\u003e _\"For this specific task and latest run, **I prefer MCP.** Reason: it returned the official `nextjs.org` 15.5 page first and gave structured output (`published_at`, sections, extracted markdown) in one step. - AI Assistant Verdict\"_\n\n_Read the complete [MCP Documentation](docs/mcp.md) for full tool breakdowns._\n\n### 2. ⚡ n8n \u0026 Workflow Automation (Deep Dive)\n\nKryfto exposes a fully typed `/v1` REST API complete with an OpenAPI specification, making it the perfect engine for visual automation tools like **n8n**, **Make**, or **Zapier**.\n\nInstead of paying for expensive API credits on premium scraping platforms, you can use n8n's native **HTTP Request** node to trigger Kryfto's headless browsers.\n\n**How to build an n8n Web Scraping Pipeline:**\n\n1. **Trigger:** Set up a Schedule Trigger (e.g., run every morning at 8 AM).\n2. **Action (Kryfto):** Add an HTTP Request node pointing to your Kryfto instance:\n   - **Method:** `POST`\n   - **URL:** `http://your-vps-ip:8080/v1/jobs`\n   - **Headers:** `Authorization: Bearer \u003cyour-token\u003e`\n   - **Body (Extraction Job):**\n     ```json\n     {\n       \"url\": \"https://news.ycombinator.com\",\n       \"options\": {\n         \"browserEngine\": \"chromium\"\n       },\n       \"extract\": {\n         \"mode\": \"selectors\",\n         \"selectors\": {\n           \"topStories\": \".titleline \u003e a\"\n         }\n       }\n     }\n     ```\n   - **Alternative Body (Deep Search Pipeline):**\n     Use Kryfto's `/v1/search` endpoint instead to find links on DuckDuckGo, then route the JSON results array into an n8n _Split In Batches_ Node to crawl them automatically!\n     ```json\n     {\n       \"query\": \"best enterprise headless CMS tools 2025\",\n       \"limit\": 5,\n       \"engine\": \"duckduckgo\",\n       \"safeSearch\": \"moderate\",\n       \"locale\": \"us-en\"\n     }\n     ```\n3. **Processing:** Add a subsequent node to parse the returned JSON.\n4. **Destination:** Send the formatted data to Google Sheets, Notion, or Slack!\n\n### 3. 🔍 Native Fallback Search Engine (Cutting API Costs)\n\nNeed to execute multi-engine searches without paying outrageous API limits?\n\nTraditional platforms force you to buy expensive **Google Custom Search** or **Bing Search APIs** for basic discovery. Kryfto's SDK routes headless scraping traffic directly through the native HTML search interfaces of search providers, specifically designed for resilience against bots.\n\nYou can instantly find leads or domains _without paying a cent in API credits_:\n\n- **Engines**: `duckduckgo`, `bing`, `yahoo`, `brave`, `google` _(all engines work without external API keys — Google CAPTCHAs are solved locally via CLIP vision and Whisper audio)_.\n\n---\n\n## 💡 Why Kryfto? (Cost Savings \u0026 Benefits)\n\nMost modern AI and web-scraping architectures rely on expensive third-party APIs (like Firecrawl, Apify, or Browserless). Kryfto replaces these dependencies by giving you **complete ownership of your scraping infrastructure**.\n\n### 💸 The Scraping Cost Comparison (100k Requests)\n\n| Platform                     | Cost per 100,000 Pages | Concurrency Limits            | Wait-for-Selectors |\n| ---------------------------- | ---------------------- | ----------------------------- | ------------------ |\n| **Firecrawl.dev**            | ~$100.00 / mo          | 50-100 Concurrent             | Paid Extra         |\n| **Browserless.io**           | ~$200.00 / mo          | Route-dependent               | Paid Extra         |\n| **Apify (Web Scraper)**      | ~$50.00+ / mo          | Memory restricted             | Standard           |\n| **Kryfto (Self-Hosted VPS)** | **$5.00 / mo Flat**    | **Scales With Hardware** | **Included Free**  |\n\n- 💰 **Zero Per-Request Costs:** As the table shows, stop paying per-API-call limits. By self-hosting Kryfto on a $5/month DigitalOcean droplet or Railway instance, you can run browser extractions at scale for a flat infrastructure fee. Concurrency is bounded by your hardware and `WORKER_GLOBAL_CONCURRENCY` setting (default: 2, increase based on available RAM).\n- 🛡️ **Total Data Privacy:** When you connect local IDEs (Cursor/Claude) or internal databases to Kryfto, your sensitive queries and raw scraped HTML never leave your VPC or touch a third-party analytics server.\n- 🚦 **Unmetered Concurrency:** You dictate your rate limits. If you need to spin up 50 headless Chromium instances simultaneously, simply scale your worker droplet without hitting external API throttles.\n- 🤖 **AI-Context Optimization:** Kryfto automatically cleans, minifies, and converts bloated web HTML into dense Markdown. This drastically reduces LLM token consumption and improves context window limits when passing context to Claude or OpenAI.\n\n---\n\n## 🎯 Primary Use Cases \u0026 Solutions\n\n### Use Case 1: Automated Market Research \u0026 Price Monitoring\n\n**The Problem:** You need to track competitor product pricing across 10 different e-commerce sites daily, but they aggressively block basic python `requests` scripts.\n**The Kryfto Solution:**\n\n- Enable `KRYFTO_STEALTH_MODE=true` and feed residential proxies into `KRYFTO_PROXY_URLS`.\n- Use the REST API to schedule daily `crawl` jobs pointing to competitor catalogs.\n- Kryfto bypasses their bot protection, extracts the prices using CSS selectors (`\"price\": \".amount\"`), and drops the raw JSON directly into your MinIO storage bucket for your analytics dashboard to query.\n\n### Use Case 2: Unblocking AI Coding Assistants\n\n**The Problem:** Your AI assistant (Cursor, Claude Code) is writing code using outdated documentation because the framework released a new version yesterday that isn't in its training data.\n**The Kryfto Solution:**\n\n- Install the Kryfto MCP Server into your IDE configuration.\n- Ask your agent: _\"Search for the newest Next.js App Router caching docs and update my code.\"_\n- Kryfto executes the search, extracts the live, up-to-date documentation, and pipes it straight into the AI's context window—allowing it to write perfect, modern code.\n\n### Use Case 3: Proprietary Lead Generation Pipelines\n\n**The Problem:** You want to build a pipeline that finds local businesses on directory sites and extracts their contact emails to automatically pipe into your CRM.\n**The Kryfto Solution:**\n\n- Connect Kryfto to an n8n workflow.\n- Step 1: Trigger Kryfto to execute a `search` for \"plumbers in Chicago\".\n- Step 2: Loop through the search results and trigger Kryfto `browse` extraction jobs on each result's URL, targeting `mailto:` hrefs or contact page DOM nodes.\n- Step 3: Automatically POST the collected emails directly into HubSpot or Salesforce.\n\n### Use Case 4: Evidence-Based Technical Research\n\n**The Problem:** Your team makes decisions based on blog posts and Stack Overflow answers with no source verification. You need traceable, trustworthy evidence.\n**The Kryfto Solution:**\n\n- Use `answer_with_evidence` to ask a question like \"Does React 19 support server components?\" — it searches, reads official pages, extracts paragraph-level evidence spans, and ranks them by domain trust score.\n- Use `conflict_detector` to check if multiple sources contradict each other on a topic.\n- Use `confidence_calibration` to score each claim based on source count, official source presence, recency, and domain trust.\n\n### Use Case 5: Framework Upgrade Risk Assessment\n\n**The Problem:** You need to upgrade Next.js from v13 to v14 but don't know what will break.\n**The Kryfto Solution:**\n\n- Call `upgrade_impact` with `framework: \"nextjs\", fromVersion: \"13\", toVersion: \"14\"` — it fetches migration guides, scans for breaking/deprecated/removed keywords, and rates the risk as low/medium/high.\n- Combine with `github_releases` and `github_diff` to see every commit between tags.\n- Use `query_planner` to preview the entire search→read→extract chain before executing.\n\n### Use Case 6: Continuous Documentation Monitoring\n\n**The Problem:** A critical API's docs change without notice, breaking your integration.\n**The Kryfto Solution:**\n\n- `watch_and_act` registers the URL with an optional Slack/Discord webhook and a semantic `context` filter.\n- Periodically call `check_watch` — if the page changed, it auto-fires a POST to your webhook with the diff and reports delivery status.\n- Use `semantic_diff` with context like \"authentication\" to filter only changes relevant to you.\n- For fully autonomous monitoring, use `continuous_research_start` — it runs search→watch→diff→alert loops on a configurable interval, notifying your webhook of every new finding.\n\n### Use Case 7: SLO Monitoring \u0026 Production Reliability\n\n**The Problem:** You need to know if your AI agent's browsing tool is degrading before users notice.\n**The Kryfto Solution:**\n\n- `slo_dashboard` shows real-time per-tool success rate, p50/p95/p99 latency, cache hit rate, and freshness.\n- `run_eval_suite` runs 10 real-world queries nightly, checking that official sources appear in results — measures precision% and average latency.\n- `replay_request` retrieves the exact input/output of any previous call by `requestId` for debugging.\n\n---\n\n## 🥷 Anti-Bot \u0026 Stealth Configuration\n\nKryfto ships with a unified stealth layer (`packages/shared/src/stealth.ts`) designed to make every HTTP request indistinguishable from organic browser traffic.\n\n### What’s Included (Zero Config Required)\n\n| Feature | Description |\n|---|---|\n| **User-Agent Rotation** | 12 Chromium-only UAs covering Chrome 130–133 and Edge 131/133 (Firefox/Safari UAs removed to avoid fingerprint mismatches) |\n| **Client Hints (`Sec-Ch-Ua`)** | Correct per-browser hints for Chrome/Edge |\n| **Sec-Fetch Headers** | Full `Sec-Fetch-Dest/Mode/Site/User` set for all Chromium-based UAs |\n| **Accept Headers** | Chromium-standard Accept strings for all UAs |\n| **Referer** | Engine homepage injected automatically (e.g., `https://www.google.com/` for Google queries) |\n| **Request Spacing** | Per-engine delays: Google 1500–3000ms, Bing/Yahoo 400–800ms, DDG 200–500ms, Brave 300–600ms |\n| **Cookie Jar** | RFC 6265-compliant in-memory `Set-Cookie` persistence with Domain/Path/Secure/HttpOnly matching and 30min TTL |\n| **Platform Hints** | Derived from UA: Windows/macOS/Linux |\n| **Canvas Fingerprint** | Subtle pixel noise injected into `toDataURL`/`toBlob` to defeat canvas fingerprinting |\n| **Fingerprint Consistency** | UA, platform, screen resolution, WebGL vendor/renderer, fonts, and audio are cross-matched per profile |\n| **20-Point Browser Evasion** | webdriver, plugins, mimeTypes, platform, languages, deviceMemory, connection/Battery APIs, screen props, chrome runtime, permissions, canvas noise, WebGL, AudioContext, WebRTC leak prevention, iframe patches, CDP filtering, headless patches, timing noise, hasFocus, font defense |\n| **WebGL Spoofing** | Reports \"Intel Inc.\" / \"Intel Iris OpenGL Engine\" instead of headless renderer |\n| **Hardware Concurrency** | Randomized from realistic values (4, 6, 8, 10, 12, 16) |\n| **navigator.webdriver** | Patched to `false` in Playwright browser contexts |\n| **Humanized Interactions** | Bezier curve mouse movements, realistic typing with typos + backspace, smooth chunked scrolling |\n| **Browser Session Pool** | Per-domain context reuse with 30min TTL — avoids repeated challenges on subsequent requests |\n| **CAPTCHA Solver** | Browser-based solving for Turnstile, reCAPTCHA v2, hCaptcha, Datadome (no external API keys) |\n| **CLIP Vision Classifier** | Local CLIP (`clip-vit-large-patch14`) via `@xenova/transformers` for reCAPTCHA/hCaptcha image grid challenges |\n| **Audio Transcription** | Local Whisper via `@xenova/transformers` for reCAPTCHA/hCaptcha audio challenges (fallback) |\n| **Google Consent Cookie** | SOCS cookie injection to bypass EU consent interstitials |\n\n### Optional Proxy Configuration\n\nFor crawling highly-protected sites (Cloudflare, Datadome, etc.), add proxies in your `.env`:\n\n```env\nKRYFTO_STEALTH_MODE=true\nKRYFTO_ROTATE_USER_AGENT=true\n# Feed it a comma-separated list of premium residential proxies\nKRYFTO_PROXY_URLS=socks5://proxy1:1080,http://user:pass@proxy2:8080\n```\n\n---\n\n## 🏗️ Architecture\n\nKryfto is structured as an NPM monorepo using `pnpm` workspaces.\n\n```\n┌─────────────────────────────────────────────────────────────────────────────────┐\n│                              CLIENTS                                            │\n│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐  │\n│  │  cURL /  │  │  n8n /   │  │  Claude  │  │  CLI     │  │  Admin           │  │\n│  │  SDK-TS  │  │  Zapier  │  │  Cursor  │  │  Tool    │  │  Dashboard       │  │\n│  │  SDK-PY  │  │  Make    │  │  Codex   │  │          │  │  (React SPA)     │  │\n│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────────┬─────────┘  │\n│       │              │             │              │                 │            │\n│       │         REST API (/v1)     │     MCP (stdio)          REST API          │\n│       │              │             │              │           (/v1/admin)        │\n└───────┼──────────────┼─────────────┼──────────────┼─────────────────┼────────────┘\n        │              │             │              │                 │\n        ▼              ▼             │              │                 ▼\n┌───────────────────────────┐       │              │    ┌─────────────────────────┐\n│      Fastify API          │       │              │    │   nginx (Dashboard)     │\n│      (apps/api)           │       │              │    │   :3001                 │\n│  :8080                    │◄──────┼──────────────┼────│   /dashboard/* → SPA    │\n│                           │       │              │    │   /v1/*  → proxy → API  │\n│  ┌─────────────────────┐  │       │              │    │                         │\n│  │ Auth \u0026 RBAC         │  │       │              │    │   9 pages:              │\n│  │ • SHA-256 tokens    │  │       │              │    │   Overview · Tokens     │\n│  │ • 3 roles           │  │       │              │    │   Projects · Jobs       │\n│  │ • Token expiration  │  │       │              │    │   Crawls · Audit Logs   │\n│  │ • Per-role rate lim. │  │       │              │    │   Rate Limits           │\n│  └─────────────────────┘  │       │              │    │   API Playground        │\n│  ┌─────────────────────┐  │       ▼              │    │   API Examples          │\n│  │ Route Handlers      │  │  ┌────────────────┐  │    └─────────────────────────┘\n│  │ • Jobs CRUD         │  │  │ MCP Server     │  │\n│  │ • Search (5 engines)│  │  │ (packages/     │  │\n│  │ • Crawl             │  │  │  mcp-server)   │  │\n│  │ • Extract           │  │  │ 42+ tools      │  │\n│  │ • Recipes           │  │  │ • search       │──┘\n│  │ • Admin endpoints   │  │  │ • browse       │\n│  └─────────────────────┘  │  │ • research     │\n│  ┌─────────────────────┐  │  │ • extract      │\n│  │ Audit Logging       │  │  │ • watch        │\n│  │ SSRF Protection     │  │  │ • eval suite   │\n│  │ Idempotency Keys    │  │  │ • CAPTCHA solve│\n│  │ OpenTelemetry       │  │  └────────────────┘\n│  └─────────────────────┘  │\n└────────────┬──────────────┘\n             │\n             │ Enqueue (BullMQ)\n             ▼\n┌────────────────────────────────────────────────────────────────────┐\n│                         Redis :6379                                │\n│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────┐    │\n│  │  Job Queues  │  │  Concurrency │  │  Pub/Sub (SSE logs)   │    │\n│  │  (BullMQ)   │  │  Semaphores  │  │                       │    │\n│  └──────────────┘  └──────────────┘  └───────────────────────┘    │\n└────────────────────────────┬───────────────────────────────────────┘\n                             │\n                             │ Consume\n                             ▼\n┌──────────────────────────────────────────────────────────────────────┐\n│                  Worker (apps/worker)                                 │\n│                                                                      │\n│  ┌────────────────────────┐    ┌─────────────────────────────────┐   │\n│  │  Fetch Path            │    │  Browser Path (Playwright)      │   │\n│  │  • HTTP GET/POST       │    │  • Chromium / Firefox / WebKit  │   │\n│  │  • Stealth headers     │    │  • 20-point stealth evasion     │   │\n│  │  • Cookie jar          │    │  • Fingerprint consistency      │   │\n│  └────────────────────────┘    │  • Humanized mouse/keyboard     │   │\n│                                │  • Browser session pool          │   │\n│  ┌────────────────────────┐    │  • CAPTCHA solving (CLIP/Whisper)│   │\n│  │  Extraction Engine     │    └─────────────────────────────────┘   │\n│  │  • CSS selectors       │                                          │\n│  │  • JSON Schema         │    ┌─────────────────────────────────┐   │\n│  │  • Plugin modules      │    │  Crawl Orchestrator             │   │\n│  │  • HTML → Markdown     │    │  • BFS link-follow              │   │\n│  └────────────────────────┘    │  • Depth/page caps              │   │\n│                                │  • robots.txt respect           │   │\n│                                │  • Politeness delays            │   │\n│                                └─────────────────────────────────┘   │\n└────────────────────────────┬─────────────────────────────────────────┘\n                             │\n                             │ Persist\n                             ▼\n┌────────────────────────────────────────────────────────────────────┐\n│                    Persistence Layer                                │\n│                                                                    │\n│  ┌──────────────────────────┐    ┌─────────────────────────────┐  │\n│  │  PostgreSQL :5432        │    │  MinIO / S3 :9000           │  │\n│  │                          │    │                             │  │\n│  │  • projects              │    │  • Screenshots (PNG)        │  │\n│  │  • api_tokens            │    │  • HTML snapshots           │  │\n│  │  • rate_limit_config     │    │  • HAR archives             │  │\n│  │  • jobs + job_logs       │    │  • Extracted data (JSON)    │  │\n│  │  • artifacts (metadata)  │    │  • Cookies exports          │  │\n│  │  • crawl_runs + nodes    │    │                             │  │\n│  │  • recipes               │    │  Deduplicated by SHA-256    │  │\n│  │  • audit_logs            │    │                             │  │\n│  │  • idempotency_keys      │    └─────────────────────────────┘  │\n│  │  • browser_profiles      │                                     │\n│  └──────────────────────────┘                                     │\n└────────────────────────────────────────────────────────────────────┘\n\n┌────────────────────────────────────────────────────────────────────┐\n│                    Shared Packages                                  │\n│                                                                    │\n│  ┌──────────────────────┐  ┌────────────┐  ┌──────────────────┐   │\n│  │  @kryfto/shared      │  │  @kryfto/  │  │  @kryfto/cli     │   │\n│  │  • Zod schemas       │  │  sdk-ts    │  │  • Commander CLI │   │\n│  │  • Stealth layer     │  │  • Typed   │  │  • YAML recipes  │   │\n│  │  • Search parsers    │  │    client  │  │                  │   │\n│  │  • Fingerprint gen   │  │  • Promise │  └──────────────────┘   │\n│  │  • Browser stealth   │  │    chains  │                         │\n│  │  • Humanize utils    │  └────────────┘                         │\n│  │  • CAPTCHA vision    │                                         │\n│  └──────────────────────┘                                         │\n└────────────────────────────────────────────────────────────────────┘\n```\n\n### Monorepo Layout\n\n| Path | Description |\n|---|---|\n| `apps/api` | Fastify control plane — REST API, auth/RBAC, per-role rate limiting, token expiration, admin endpoints |\n| `apps/dashboard` | React admin dashboard — token/project/job management, audit logs, rate limit config, API playground, examples |\n| `apps/worker` | BullMQ workers — Playwright browser execution, stealth, CAPTCHA solving, crawl orchestration |\n| `packages/mcp-server` | MCP bridge — 42+ tools for Claude, Cursor, Codex (search, browse, research, eval) |\n| `packages/shared` | Shared library — Zod schemas, stealth layer, search parsers, fingerprint, humanize, CAPTCHA vision |\n| `packages/sdk-ts` | TypeScript SDK — typed API client with promise chains |\n| `packages/sdk-py` | Python SDK |\n| `packages/cli` | CLI tool — Commander-based terminal interface with YAML recipe support |\n\n### Development Commands\n\n```bash\npnpm install\npnpm build\npnpm typecheck\nKRYFTO_BASE_URL=http://localhost:8080 KRYFTO_API_TOKEN=$KRYFTO_API_TOKEN pnpm test:integration\n```\n\n---\n\n## ❤️ Support the Project\n\nKryfto is free and open-source. If it saves you money on scraping APIs or helps power your AI workflows, consider supporting continued development with a small donation!\n\n| Network            | Address                                        |\n| ------------------ | ---------------------------------------------- |\n| **Bitcoin (BTC)**  | `bc1qd8ztrxucrhz27fgmu754ayq59lvjprclxdury5`   |\n| **Ethereum (ETH)** | `0x0a01779792a17fc57473a6368f3970fa1d8830ba`   |\n| **Solana (SOL)**   | `FNKjiS2zhCq3rv8bboA83pzvKwDov3wyFxQn4sy75bPr` |\n| **BNB (BSC)**      | `0x0a01779792a17fc57473a6368f3970fa1d8830ba`   |\n| **Tron (TRX)**     | `TF7YwGwP6cDCTGxLAjRKxqPss18pMp762G`           |\n\nEvery contribution helps keep the lights on and the browsers headless. 🙏\n\n---\n\n### License\n\nApache-2.0 (`LICENSE`)\n\n---\n\n## 📋 Changelog\n\nSee [CHANGELOG.md](CHANGELOG.md) for the full version history.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FExceptionRegret%2FKryfto","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FExceptionRegret%2FKryfto","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FExceptionRegret%2FKryfto/lists"}