https://github.com/ExceptionRegret/Kryfto

The open-source web-browsing backend for AI agents & workflow engines. Ships a 42-tool MCP server for Claude Code/Cursor/Codex, a full REST API for n8n/Zapier/Make, federated multi-engine search, anti-bot stealth, and enterprise infrastructure (Postgres, Redis, BullMQ, MinIO). Self-host for $5/mo flat

ai-agents anti-detection claude-code codex cursor data-extraction developer-tools fastapi headless-browser mcp mcp-server n8n open-source playwright redis search-engine self-hosted stealth web-scraping workflow-automation



# Kryfto

[![Sponsor](https://img.shields.io/badge/Sponsor-%E2%9D%A4-ea4aaa?logo=github-sponsors)](https://github.com/sponsors/ExceptionRegret)

The Production-Grade Browser Data Collection Runtime



[![License: Apache-2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)
[![Node.js 20+](https://img.shields.io/badge/node-20%2B-brightgreen.svg)]()
[![MCP Tools: 42+](https://img.shields.io/badge/MCP_Tools-42%2B-purple.svg)]()

[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/new/template)
[![Deploy to DO](https://www.deploytodo.com/do-btn-blue.svg)](https://cloud.digitalocean.com/apps/new)

Self-host your own headless browser fleet. Connect it instantly to AI agents, IDEs, and workflow engines via OpenAPI and MCP.



## ✨ Core Features

Kryfto is a comprehensive framework for automated data extraction, web crawling, and browser session execution.

- **🤖 AI Agent Ready**: Ships with a built-in [Model Context Protocol (MCP)](https://modelcontextprotocol.io) server exposing **42+ tools**. Give Claude, Cursor, or Codex the ability to search, browse, extract, fact-check, run continuous research agents, and benchmark search quality on the live web.
- **🕵️‍♂️ Advanced Stealth & Anti-Bot Engine**: A unified anti-bot layer with **12 rotated modern User-Agents** (Chrome 130–133, Edge 131/133), per-browser `Sec-Ch-Ua` client hints, `Sec-Fetch-*` headers, Chromium-only `Accept` strings, engine-appropriate `Referer` headers, per-engine request spacing delays, canvas fingerprint randomization, WebGL vendor/renderer spoofing, `navigator.platform` matching, `hardwareConcurrency` randomization, WebRTC IP leak prevention, and an RFC 6265-compliant in-memory cookie jar with a 30-minute TTL. **New in v3.5.1:** consistent cross-signal fingerprints (UA matched to platform, screen, WebGL, fonts, and audio), a 20-point browser evasion suite, humanized browser interactions (Bezier-curve mouse movements with micro-overshoots, realistic typing with typos, smooth scrolling), a per-domain browser session pool with a 30-minute TTL, and browser-based CAPTCHA solving for Cloudflare Turnstile, reCAPTCHA v2, hCaptcha, and Datadome, all without external paid APIs. reCAPTCHA image grids are classified locally with CLIP vision (`clip-vit-large-patch14`), and audio challenges are transcribed locally with Whisper, both via `@xenova/transformers`.
- **🛡️ Zero Trace Privacy**: Run purely in-memory HTTP extractions through the bot-evasion layer without persisting any telemetry or artifacts to the Postgres database.
- **⚙️ Workflow Engine Native**: A fully documented OpenAPI spec makes it trivial to drop into `n8n`, Zapier, Make, or custom Python/TypeScript pipelines.
- **🖥️ Admin Dashboard**: Built-in React admin UI (port 3001) for managing tokens, projects, jobs, crawls, audit logs, and per-role rate limits. Includes an interactive **API Playground** for testing any endpoint live and an **Examples** page with ready-to-use cURL commands. The dark-themed SPA is served from a separate nginx container.
- **☁️ Enterprise Infrastructure**: Backed by **Postgres** for persistence, **Redis + BullMQ** for reliable concurrent job queuing, and **MinIO/S3** for long-term artifact storage.
- **📊 SLO Dashboard & Eval Suite**: Built-in reliability monitoring with per-tool success rates, latency percentiles (p50/p95/p99), deterministic request replay, and a 10-query benchmark suite for nightly regression testing.
- **🔄 Continuous Research Agent**: Deploy autonomous background research loops that search, monitor, diff pages, and fire webhook alerts, all from a single MCP tool call.
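Bezier-curve mouse movement is what makes a cursor path look hand-drawn rather than perfectly straight. The TypeScript sketch below samples a cubic Bezier curve between two points, with randomly offset control points. `bezierPath`, the step count, and the ±25% jitter are illustrative assumptions, not Kryfto's own humanize utilities, and the micro-overshoot is left out for brevity.

```typescript
type Point = { x: number; y: number };

// Evaluate a cubic Bezier curve at parameter t in [0, 1].
function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const u = 1 - t;
  const a = u * u * u;
  const b = 3 * u * u * t;
  const c = 3 * u * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Hypothetical helper: a curved path from `from` to `to` with steps + 1 points.
// Control points are pushed sideways by a random fraction of the distance,
// so no two paths are identical. `rand` is injectable for reproducible tests.
function bezierPath(
  from: Point,
  to: Point,
  steps = 25,
  rand: () => number = Math.random,
): Point[] {
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  const jitter = () => (rand() - 0.5) * 0.5; // in [-0.25, 0.25)
  const c1 = { x: from.x + dx * 0.3 + dy * jitter(), y: from.y + dy * 0.3 + dx * jitter() };
  const c2 = { x: from.x + dx * 0.7 + dy * jitter(), y: from.y + dy * 0.7 + dx * jitter() };
  const points: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    points.push(cubicBezier(from, c1, c2, to, i / steps));
  }
  return points;
}
```

To get a curved trajectory, pass each point to a mouse-move call such as Playwright's `page.mouse.move(x, y)`, with a short pause between steps.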

---

## 🚀 Quickstart (Self-Hosted)

Get Kryfto running locally with Docker Compose.

```bash
# Option 1: Auto-generate a secure .env with random tokens & passwords
node scripts/generate-env.mjs -o .env

# Option 2: Or copy the example and fill in values manually
cp .env.example .env

# Spin up the entire infrastructure (API, Dashboard, Worker, Postgres, Redis, MinIO)
docker compose up -d --build

# Verify health (export KRYFTO_API_TOKEN with a token from your .env first)
curl -H "Authorization: Bearer $KRYFTO_API_TOKEN" http://localhost:8080/v1/healthz
```

The **Admin Dashboard** is available at `http://localhost:3001/dashboard/`. Log in with your admin API token to manage tokens, projects, jobs, crawls, audit logs, and rate limits.

Once running, you can immediately dispatch extraction jobs to the headless worker fleet:

```bash
curl -X POST http://localhost:8080/v1/jobs \
-H "Authorization: Bearer $KRYFTO_API_TOKEN" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: demo-example-1" \
-d '{"url":"https://example.com"}'
```
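The same request can be sent from a TypeScript pipeline. This is a minimal sketch that uses only the endpoint, headers, and body fields shown in the cURL example. The helper names `buildJobRequest` and `submitJob` are hypothetical. A fetch-like function is passed in, so Node 20's built-in `fetch` works in production and a stub works in tests. The response is returned as untyped JSON because its schema is documented in the API Reference, not here.

```typescript
// Hypothetical helpers mirroring the cURL call above.
interface JobRequest {
  url: string;
  options?: { browserEngine?: string };
  extract?: { mode: string; selectors?: Record<string, string> };
}

interface HttpRequest {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal subset of the fetch signature; the global `fetch` satisfies it.
type FetchLike = (
  url: string,
  init: HttpRequest,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

function buildJobRequest(body: JobRequest, token: string, idempotencyKey: string): HttpRequest {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
      // The API stores idempotency keys, so retrying with the same key should
      // not enqueue a duplicate job (assumed semantics; see the API Reference).
      "Idempotency-Key": idempotencyKey,
    },
    body: JSON.stringify(body),
  };
}

async function submitJob(
  http: FetchLike,
  baseUrl: string,
  token: string,
  body: JobRequest,
  idempotencyKey: string,
): Promise<unknown> {
  const res = await http(`${baseUrl}/v1/jobs`, buildJobRequest(body, token, idempotencyKey));
  if (!res.ok) throw new Error(`Job submission failed: HTTP ${res.status}`);
  return res.json(); // response schema: see docs/api-reference.md
}
```

For example: `submitJob(fetch, "http://localhost:8080", token, { url: "https://example.com" }, "demo-example-1")`.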

### Reading Extracted Data

After the job succeeds, retrieve the extracted Markdown or HTML artifact:

```bash
curl -H "Authorization: Bearer $KRYFTO_API_TOKEN" \
http://localhost:8080/v1/jobs//artifacts
```

### Running a Federated Search

Find up-to-date information across DuckDuckGo, Brave, and Google natively:

```bash
curl -X POST http://localhost:8080/v1/search \
-H "Authorization: Bearer $KRYFTO_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"query":"playwright testing", "limit":5, "officialOnly":true}'
```

> **Note:** For a full breakdown of the REST API, parameter schemas, and advanced options, please refer to the [**API Reference Guide**](docs/api-reference.md).

---

## 📚 Documentation Index

We maintain exhaustive documentation for every component of the Kryfto stack.

| Guide | Description |
| --------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| 📖 [**Usage Examples**](docs/usage.md) | Exhaustive API, CLI, and cURL examples for scraping, crawling, and scheduling retries. |
| 🚀 [**Deployment Guides**](docs/deploy.md) | How to deploy securely to Railway, DigitalOcean, and bare Linux VPS instances. |
| 🤖 [**MCP Integration**](docs/mcp.md) | How to connect Cursor, Claude Code, and Codex to your Kryfto server via HTTPS or SSH tunneling. |
| ⚡ [**n8n Workflow Guide**](docs/n8n.md) | How to automate advanced, stealthy web extractions straight into Google Sheets using n8n. |
| 🔒 [**Security & Roles**](docs/security.md) | Setting up RBAC, admin tokens, token expiration, per-role rate limits, and SSRF prevention. |
| 🏗️ [**Architecture**](docs/architecture.md) | A deep dive into the BullMQ, Redis, Node, and MinIO scaling infrastructure. |
| 🥘 [**Extraction Recipes**](docs/recipes.md) | Pre-written JSON extraction selectors for popular websites, auto-imported as dynamic `recipe_*` MCP tools. |
| 🔌 [**OpenAPI Spec**](docs/openapi.yaml) | The raw YAML schema defining the fully typed REST API. |
| ⚙️ [**API Reference**](docs/api-reference.md) | Structured usage guide for Jobs, Artifacts, and Search endpoints. |

---

## 🧩 Ecosystem Integrations

Kryfto isn't just an API; it's designed to act as the web-browsing "motor cortex" for your existing tools.

### 1. 🤖 Claude Code, Cursor, & Codex (MCP)

You can directly attach Kryfto to your AI assistant using the bundled **Model Context Protocol (MCP)** server.

#### 🪄 Auto-Generate Configuration

The easiest way to get your IDE connected is to run the interactive setup wizard. It will auto-detect your API token and absolute path:

```bash
node scripts/setup-mcp.mjs
```

_Select your client (Claude, Cursor, Codex, RooCode) and copy the generated JSON/TOML into your config file._

---

#### Manual Configuration

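**Claude Code / Cursor**: add the entry below to your client's MCP config file (`.mcp.json` in the project root for Claude Code, `~/.cursor/mcp.json` or `.cursor/mcp.json` for Cursor, `claude_desktop_config.json` for Claude Desktop):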

```json
{
  "mcpServers": {
    "kryfto": {
      "command": "node",
      "args": ["/absolute/path/to/kryfto/packages/mcp-server/dist/index.js"],
      "env": {
        "API_BASE_URL": "http://localhost:8080",
        "API_TOKEN": ""
      }
    }
  }
}
```

**OpenAI Codex**: add to `.codex/config.toml` (per-project) or `~/.codex/config.toml` (global):

```toml
[mcp_servers.kryfto]
command = "node"
args = ["/absolute/path/to/kryfto/packages/mcp-server/dist/index.js"]

[mcp_servers.kryfto.env]
API_BASE_URL = "http://localhost:8080"
API_TOKEN = ""
```

**Remote VPS configuration (`claude_desktop_config.json` / Cursor MCP Menu):**

**⚠️ SSH Keys Required:** The MCP tunnel relies on `stdio` and cannot accept manual passwords. You must set up SSH Key authentication from your local machine to your VPS.

**macOS/Linux:**

```bash
ssh-keygen -t ed25519 -C "your_email@example.com"
ssh-copy-id user@your-vps-ip
```

**Windows (PowerShell):**

```powershell
ssh-keygen -t ed25519 -C "your_email@example.com"
$Key = Get-Content "$env:USERPROFILE\.ssh\id_ed25519.pub"
ssh user@your-vps-ip "mkdir -p ~/.ssh && echo '$Key' >> ~/.ssh/authorized_keys"
```

Once `ssh user@your-vps-ip` logs you in instantly without a password, paste this config:

```json
{
  "mcpServers": {
    "kryfto-remote": {
      "command": "ssh",
      "args": [
        "user@your-vps-ip",
        "API_BASE_URL=http://localhost:8080",
        "API_TOKEN=",
        "node",
        "/absolute/path/on/vps/to/kryfto/packages/mcp-server/dist/index.js"
      ]
    }
  }
}
```
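The local and SSH-tunnelled entries share the same `mcpServers` shape, so both can be generated programmatically. The helper below is hypothetical and is not the bundled `scripts/setup-mcp.mjs`. It shows why the remote variant works: `ssh` joins its trailing arguments into a single command for the remote shell, so the `VAR=value` arguments become environment assignments in front of `node` on the VPS.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Hypothetical generator for the `mcpServers` entries shown above.
function mcpEntry(
  serverPath: string,
  apiBaseUrl: string,
  apiToken: string,
  sshTarget?: string, // e.g. "user@your-vps-ip"; omit for a local server
): McpServerEntry {
  if (!sshTarget) {
    // Local: the MCP client spawns node itself and injects `env`.
    return {
      command: "node",
      args: [serverPath],
      env: { API_BASE_URL: apiBaseUrl, API_TOKEN: apiToken },
    };
  }
  // Remote: the remote shell receives `VAR=value ... node script` as one
  // command line, which sets the variables for the node process only.
  return {
    command: "ssh",
    args: [sshTarget, `API_BASE_URL=${apiBaseUrl}`, `API_TOKEN=${apiToken}`, "node", serverPath],
  };
}

function mcpConfig(name: string, entry: McpServerEntry): string {
  return JSON.stringify({ mcpServers: { [name]: entry } }, null, 2);
}
```

The remote shell parses these arguments, so a token that contains spaces or shell metacharacters would need quoting.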

#### 🏆 Kryfto vs. Built-in Agent Browsers

Why install Kryfto when Claude and Cursor have built-in web search? Because Kryfto is engineered specifically for **evidence-based deterministic scraping** rather than noisy LLM-summarized search.



**Real-world benchmark (Query: `latest Next.js 15 features`):**

- **Built-in Browser:** Returns a mix of non-official blogs (e.g., `nextjs15.com`), video results, and unstructured snippets. Fails to consistently identify the newest minor release.
- **Kryfto MCP:** Extracts the semantic release version (`15.5`) from the URL, automatically ranks the official `nextjs.org` blog at **Rank #1**, and extracts the raw Markdown documentation structure (headings, code blocks, publish date) in a single deterministic pass.

> _"For this specific task and latest run, **I prefer MCP.** Reason: it returned the official `nextjs.org` 15.5 page first and gave structured output (`published_at`, sections, extracted markdown) in one step. - AI Assistant Verdict"_

_Read the complete [MCP Documentation](docs/mcp.md) for full tool breakdowns._

### 2. ⚡ n8n & Workflow Automation (Deep Dive)

Kryfto exposes a fully typed `/v1` REST API complete with an OpenAPI specification, making it the perfect engine for visual automation tools like **n8n**, **Make**, or **Zapier**.

Instead of paying for expensive API credits on premium scraping platforms, you can use n8n's native **HTTP Request** node to trigger Kryfto's headless browsers.

**How to build an n8n Web Scraping Pipeline:**

1. **Trigger:** Set up a Schedule Trigger (e.g., run every morning at 8 AM).
2. **Action (Kryfto):** Add an HTTP Request node pointing to your Kryfto instance:
   - **Method:** `POST`
   - **URL:** `http://your-vps-ip:8080/v1/jobs`
   - **Headers:** `Authorization: Bearer `
   - **Body (Extraction Job):**
     ```json
     {
       "url": "https://news.ycombinator.com",
       "options": {
         "browserEngine": "chromium"
       },
       "extract": {
         "mode": "selectors",
         "selectors": {
           "topStories": ".titleline > a"
         }
       }
     }
     ```
   - **Alternative Body (Deep Search Pipeline):**
     Use Kryfto's `/v1/search` endpoint instead to find links on DuckDuckGo, then route the JSON results array into an n8n _Split In Batches_ node to crawl them automatically.
     ```json
     {
       "query": "best enterprise headless CMS tools 2025",
       "limit": 5,
       "engine": "duckduckgo",
       "safeSearch": "moderate",
       "locale": "us-en"
     }
     ```
3. **Processing:** Add a subsequent node to parse the returned JSON.
4. **Destination:** Send the formatted data to Google Sheets, Notion, or Slack.
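Outside n8n, the same search-then-crawl pipeline fits in a few lines of TypeScript. This is a sketch under stated assumptions. The request bodies come from the examples above (minus `safeSearch`/`locale`). The search response is assumed to contain a `results` array whose items carry a `url` field, so confirm that against the API Reference. `resultUrls` and `searchThenEnqueue` are hypothetical names, and the fetch-like function is injected so the pipeline can be tested without a running server.

```typescript
// Minimal fetch-like signature; Node 20's global `fetch` satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Assumed search response shape: { results: [{ url: string, ... }] }.
interface SearchResponse {
  results?: { url?: unknown }[];
}

// Pull result URLs out of a /v1/search response, skipping malformed entries.
function resultUrls(response: unknown): string[] {
  const results = (response as SearchResponse | null)?.results ?? [];
  return results
    .map((r) => r?.url)
    .filter((u): u is string => typeof u === "string");
}

// Hypothetical pipeline: search once, then enqueue one extraction job per hit.
async function searchThenEnqueue(
  http: FetchLike,
  baseUrl: string,
  token: string,
  query: string,
  limit = 5,
): Promise<unknown[]> {
  const headers = { Authorization: `Bearer ${token}`, "Content-Type": "application/json" };
  const post = async (path: string, body: unknown): Promise<unknown> => {
    const res = await http(`${baseUrl}${path}`, { method: "POST", headers, body: JSON.stringify(body) });
    if (!res.ok) throw new Error(`${path} failed: HTTP ${res.status}`);
    return res.json();
  };

  const search = await post("/v1/search", { query, limit, engine: "duckduckgo" });
  const jobs: unknown[] = [];
  // Sequential submission keeps the sketch simple; the worker queue
  // (WORKER_GLOBAL_CONCURRENCY) decides how many jobs actually run at once.
  for (const url of resultUrls(search)) {
    jobs.push(await post("/v1/jobs", { url, options: { browserEngine: "chromium" } }));
  }
  return jobs;
}
```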

### 3. 🔍 Native Fallback Search Engine (Cutting API Costs)

Need to run multi-engine searches without paying for metered search APIs?

Traditional platforms force you to buy expensive **Google Custom Search** or **Bing Search APIs** for basic discovery. Kryfto instead queries each provider's public HTML search interface directly and relies on its stealth layer to stay resilient against bot detection.

You can instantly find leads or domains _without paying a cent in API credits_:

- **Engines**: `duckduckgo`, `bing`, `yahoo`, `brave`, `google` _(none require external API keys; Google CAPTCHAs are solved locally via CLIP vision and Whisper audio)_.
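Every engine is reached through its public HTML interface, so a client can fall back from one engine to the next when a query fails or comes back empty. The sketch below shows that pattern on the client side. The engine order and the `searchWithFallback` helper are illustrative assumptions, not Kryfto's internal routing. The injected `search` function stands in for a `/v1/search` call with the `engine` field set.

```typescript
type Engine = "duckduckgo" | "bing" | "yahoo" | "brave" | "google";

// Illustrative order only: Google last, since its requests are spaced furthest apart.
const DEFAULT_ORDER: Engine[] = ["duckduckgo", "brave", "bing", "yahoo", "google"];

interface FallbackResult<T> {
  engine: Engine | null; // engine that produced results, or null if none did
  results: T[];
  errors: string[]; // why earlier engines were skipped
}

// Try each engine in turn and return the first non-empty result set.
async function searchWithFallback<T>(
  search: (engine: Engine) => Promise<T[]>,
  order: Engine[] = DEFAULT_ORDER,
): Promise<FallbackResult<T>> {
  const errors: string[] = [];
  for (const engine of order) {
    try {
      const results = await search(engine);
      if (results.length > 0) return { engine, results, errors };
      errors.push(`${engine}: no results`);
    } catch (err) {
      errors.push(`${engine}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return { engine: null, results: [], errors };
}
```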

---

## 💡 Why Kryfto? (Cost Savings & Benefits)

Most modern AI and web-scraping architectures rely on expensive third-party APIs (like Firecrawl, Apify, or Browserless). Kryfto replaces these dependencies by giving you **complete ownership of your scraping infrastructure**.

### 💸 The Scraping Cost Comparison (100k Requests)

| Platform | Cost per 100,000 Pages | Concurrency Limits | Wait-for-Selectors |
| ---------------------------- | ---------------------- | ----------------------------- | ------------------ |
| **Firecrawl.dev** | ~$100.00 / mo | 50-100 Concurrent | Paid Extra |
| **Browserless.io** | ~$200.00 / mo | Route-dependent | Paid Extra |
| **Apify (Web Scraper)** | ~$50.00+ / mo | Memory restricted | Standard |
| **Kryfto (Self-Hosted VPS)** | **$5.00 / mo Flat** | **Scales With Hardware** | **Included Free** |

- 💰 **Zero Per-Request Costs:** Stop paying per API call. Self-host Kryfto on a $5/month DigitalOcean droplet or Railway instance and you can run browser extractions for a flat infrastructure fee. Concurrency is bounded by your hardware and the `WORKER_GLOBAL_CONCURRENCY` setting (default: 2; increase it based on available RAM).
- 🛡️ **Total Data Privacy:** When you connect local IDEs (Cursor/Claude) or internal databases to Kryfto, your sensitive queries and raw scraped HTML never leave your VPC or touch a third-party analytics server.
- 🚦 **Unmetered Concurrency:** You dictate your rate limits. If you need to spin up 50 headless Chromium instances simultaneously, scale your worker droplet without hitting external API throttles.
- 🤖 **AI-Context Optimization:** Kryfto automatically cleans, minifies, and converts bloated web HTML into dense Markdown. This sharply reduces LLM token consumption and leaves more of the context window free when you pass content to Claude or OpenAI models.

---

## 🎯 Primary Use Cases & Solutions

### Use Case 1: Automated Market Research & Price Monitoring

**The Problem:** You need to track competitor product pricing across 10 different e-commerce sites daily, but they aggressively block basic Python `requests` scripts.
**The Kryfto Solution:**

- Enable `KRYFTO_STEALTH_MODE=true` and feed residential proxies into `KRYFTO_PROXY_URLS`.
- Use the REST API to schedule daily `crawl` jobs pointing to competitor catalogs.
- Kryfto bypasses their bot protection, extracts the prices using CSS selectors (`"price": ".amount"`), and drops the raw JSON directly into your MinIO storage bucket for your analytics dashboard to query.

### Use Case 2: Unblocking AI Coding Assistants

**The Problem:** Your AI assistant (Cursor, Claude Code) is writing code using outdated documentation because the framework released a new version yesterday that isn't in its training data.
**The Kryfto Solution:**

- Install the Kryfto MCP Server into your IDE configuration.
- Ask your agent: _"Search for the newest Next.js App Router caching docs and update my code."_
- Kryfto executes the search, extracts the live documentation, and pipes it straight into the AI's context window, so the assistant writes code against the current API instead of its stale training data.

### Use Case 3: Proprietary Lead Generation Pipelines

**The Problem:** You want to build a pipeline that finds local businesses on directory sites and extracts their contact emails to automatically pipe into your CRM.
**The Kryfto Solution:**

- Connect Kryfto to an n8n workflow.
- Step 1: Trigger Kryfto to execute a `search` for "plumbers in Chicago".
- Step 2: Loop through the search results and trigger Kryfto `browse` extraction jobs on each result's URL, targeting `mailto:` hrefs or contact page DOM nodes.
- Step 3: Automatically POST the collected emails directly into HubSpot or Salesforce.
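Step 2 comes down to pulling addresses out of the collected `href` values. The sketch below assumes the extraction job returns the matched `href` strings as an array. That output shape, and a selector such as `a[href^="mailto:"]`, are assumptions. `emailsFromHrefs` is a hypothetical normalizer.

```typescript
// Normalize mailto: links into unique, lower-cased addresses.
function emailsFromHrefs(hrefs: string[]): string[] {
  const seen = new Set<string>();
  for (const href of hrefs) {
    if (!href.toLowerCase().startsWith("mailto:")) continue;
    // Drop the scheme and any ?subject=... query part.
    const raw = href.slice("mailto:".length).split("?")[0];
    // A single mailto: link may list several comma-separated recipients.
    for (const part of raw.split(",")) {
      let addr: string;
      try {
        addr = decodeURIComponent(part).trim().toLowerCase(); // handles %40-style escapes
      } catch {
        continue; // malformed percent-encoding
      }
      if (/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(addr)) seen.add(addr);
    }
  }
  return Array.from(seen);
}
```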

### Use Case 4: Evidence-Based Technical Research

**The Problem:** Your team makes decisions based on blog posts and Stack Overflow answers with no source verification. You need traceable, trustworthy evidence.
**The Kryfto Solution:**

- Use `answer_with_evidence` to ask a question like "Does React 19 support server components?" The tool searches, reads official pages, extracts paragraph-level evidence spans, and ranks them by domain trust score.
- Use `conflict_detector` to check if multiple sources contradict each other on a topic.
- Use `confidence_calibration` to score each claim based on source count, official source presence, recency, and domain trust.
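`confidence_calibration` is described as combining source count, official-source presence, recency, and domain trust. The exact formula is not documented here. The following hypothetical weighted score only illustrates the idea. The weights, the 5-source saturation point, and the one-year recency half-life are assumptions chosen for readability, not Kryfto's values.

```typescript
interface ClaimEvidence {
  sourceCount: number; // independent sources supporting the claim
  hasOfficialSource: boolean;
  ageDays: number; // age of the newest supporting source
  domainTrust: number; // average trust score in [0, 1]
}

// Hypothetical weights; they sum to 1 so the score stays in [0, 1].
const WEIGHTS = { sources: 0.3, official: 0.3, recency: 0.2, trust: 0.2 };

function confidenceScore(e: ClaimEvidence): number {
  const sources = Math.min(e.sourceCount, 5) / 5; // saturates at 5 sources
  const official = e.hasOfficialSource ? 1 : 0;
  const recency = Math.pow(0.5, Math.max(e.ageDays, 0) / 365); // halves every year
  const trust = Math.min(Math.max(e.domainTrust, 0), 1);
  const score =
    WEIGHTS.sources * sources +
    WEIGHTS.official * official +
    WEIGHTS.recency * recency +
    WEIGHTS.trust * trust;
  return Math.round(score * 100) / 100; // two decimals
}
```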

### Use Case 5: Framework Upgrade Risk Assessment

**The Problem:** You need to upgrade Next.js from v13 to v14 but don't know what will break.
**The Kryfto Solution:**

- Call `upgrade_impact` with `framework: "nextjs", fromVersion: "13", toVersion: "14"`. It fetches migration guides, scans them for breaking/deprecated/removed keywords, and rates the risk as low/medium/high.
- Combine with `github_releases` and `github_diff` to see every commit between tags.
- Use `query_planner` to preview the entire search→read→extract chain before executing.
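The keyword scan behind `upgrade_impact` can be approximated in a few lines. This sketch counts the three keyword families named above in a migration guide's text, counts breaking changes double, and buckets the weighted total (0–2 low, 3–9 medium, 10+ high). The regexes, weighting, and thresholds are assumptions for illustration, not the tool's actual rules.

```typescript
type Risk = "low" | "medium" | "high";

interface ImpactReport {
  breaking: number;
  deprecated: number;
  removed: number;
  risk: Risk;
}

// Count keyword hits in a migration guide and bucket them into a risk level.
function assessUpgradeRisk(guideText: string): ImpactReport {
  const count = (re: RegExp) => (guideText.match(re) ?? []).length;
  const breaking = count(/\bbreaking\b/gi);
  const deprecated = count(/\bdeprecat\w*/gi); // deprecate, deprecated, deprecation
  const removed = count(/\bremov(?:ed|al)\b/gi);
  // Breaking changes fail at upgrade time, so they weigh double.
  const weighted = breaking * 2 + deprecated + removed;
  const risk: Risk = weighted >= 10 ? "high" : weighted >= 3 ? "medium" : "low";
  return { breaking, deprecated, removed, risk };
}
```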

### Use Case 6: Continuous Documentation Monitoring

**The Problem:** A critical API's docs change without notice, breaking your integration.
**The Kryfto Solution:**

- `watch_and_act` registers the URL with an optional Slack/Discord webhook and a semantic `context` filter.
- Periodically call `check_watch`. If the page changed, it automatically POSTs the diff to your webhook and reports the delivery status.
- Use `semantic_diff` with context like "authentication" to filter only changes relevant to you.
- For fully autonomous monitoring, use `continuous_research_start` — it runs search→watch→diff→alert loops on a configurable interval, notifying your webhook of every new finding.
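Conceptually, each watch cycle stores a snapshot, compares the next fetch against it, and keeps only the changes that mention the context you care about. The sketch below uses a plain line-set diff with keyword matching as a stand-in. It is not semantic, so `semantic_diff` likely works differently, and `diffLines` and `filterByContext` are hypothetical names.

```typescript
interface LineDiff {
  added: string[];
  removed: string[];
}

// Line-set diff: non-empty lines that appear in only one of the two snapshots.
function diffLines(before: string, after: string): LineDiff {
  const split = (s: string) => s.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
  const oldLines = new Set(split(before));
  const newLines = new Set(split(after));
  return {
    added: split(after).filter((l) => !oldLines.has(l)),
    removed: split(before).filter((l) => !newLines.has(l)),
  };
}

// Keep only changed lines that mention any context keyword (case-insensitive).
function filterByContext(diff: LineDiff, context: string): LineDiff {
  const keywords = context.toLowerCase().split(/\s+/).filter(Boolean);
  const relevant = (line: string) => keywords.some((k) => line.toLowerCase().includes(k));
  return { added: diff.added.filter(relevant), removed: diff.removed.filter(relevant) };
}
```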

### Use Case 7: SLO Monitoring & Production Reliability

**The Problem:** You need to know if your AI agent's browsing tool is degrading before users notice.
**The Kryfto Solution:**

- `slo_dashboard` shows real-time per-tool success rate, p50/p95/p99 latency, cache hit rate, and freshness.
- `run_eval_suite` runs the 10-query real-world benchmark (designed for nightly regression runs), checks that official sources appear in the results, and reports precision (%) and average latency.
- `replay_request` retrieves the exact input/output of any previous call by `requestId` for debugging.
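The p50/p95/p99 figures from `slo_dashboard` are latency percentiles. One common way to compute them is the nearest-rank method shown below. Kryfto may use nearest-rank or an interpolating variant, which the README does not say, so treat this as an illustration of the metric. `percentile` and `sloSummary` are hypothetical helpers.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  // Multiply before dividing to avoid float artifacts such as 0.95 * 100.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
}

interface ToolCall {
  ok: boolean;
  latencyMs: number;
}

function sloSummary(calls: ToolCall[]) {
  const latencies = calls.map((c) => c.latencyMs);
  return {
    successRate: calls.filter((c) => c.ok).length / calls.length,
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
  };
}
```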

---

## 🥷 Anti-Bot & Stealth Configuration

Kryfto ships with a unified stealth layer (`packages/shared/src/stealth.ts`) designed to make every HTTP request indistinguishable from organic browser traffic.

### What’s Included (Zero Config Required)

| Feature | Description |
|---|---|
| **User-Agent Rotation** | 12 Chromium-only UAs covering Chrome 130–133 and Edge 131/133 (Firefox/Safari UAs removed to avoid fingerprint mismatches) |
| **Client Hints (`Sec-Ch-Ua`)** | Correct per-browser hints for Chrome/Edge |
| **Sec-Fetch Headers** | Full `Sec-Fetch-Dest/Mode/Site/User` set for all Chromium-based UAs |
| **Accept Headers** | Chromium-standard Accept strings for all UAs |
| **Referer** | Engine homepage injected automatically (e.g., `https://www.google.com/` for Google queries) |
| **Request Spacing** | Per-engine delays: Google 1500–3000ms, Bing/Yahoo 400–800ms, DDG 200–500ms, Brave 300–600ms |
| **Cookie Jar** | RFC 6265-compliant in-memory `Set-Cookie` persistence with Domain/Path/Secure/HttpOnly matching and 30min TTL |
| **Platform Hints** | Derived from UA: Windows/macOS/Linux |
| **Canvas Fingerprint** | Subtle pixel noise injected into `toDataURL`/`toBlob` to defeat canvas fingerprinting |
| **Fingerprint Consistency** | UA, platform, screen resolution, WebGL vendor/renderer, fonts, and audio are cross-matched per profile |
| **20-Point Browser Evasion** | webdriver, plugins, mimeTypes, platform, languages, deviceMemory, connection/Battery APIs, screen props, chrome runtime, permissions, canvas noise, WebGL, AudioContext, WebRTC leak prevention, iframe patches, CDP filtering, headless patches, timing noise, hasFocus, font defense |
| **WebGL Spoofing** | Reports "Intel Inc." / "Intel Iris OpenGL Engine" instead of headless renderer |
| **Hardware Concurrency** | Randomized from realistic values (4, 6, 8, 10, 12, 16) |
| **navigator.webdriver** | Patched to `false` in Playwright browser contexts |
| **Humanized Interactions** | Bezier curve mouse movements, realistic typing with typos + backspace, smooth chunked scrolling |
| **Browser Session Pool** | Per-domain context reuse with 30min TTL; avoids repeated challenges on subsequent requests |
| **CAPTCHA Solver** | Browser-based solving for Turnstile, reCAPTCHA v2, hCaptcha, Datadome (no external API keys) |
| **CLIP Vision Classifier** | Local CLIP (`clip-vit-large-patch14`) via `@xenova/transformers` for reCAPTCHA/hCaptcha image grid challenges |
| **Audio Transcription** | Local Whisper via `@xenova/transformers` for reCAPTCHA/hCaptcha audio challenges (fallback) |
| **Google Consent Cookie** | SOCS cookie injection to bypass EU consent interstitials |
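Per-engine request spacing amounts to drawing a random delay from each engine's range before every request. Below is a minimal sketch that uses the ranges from the table. `SPACING_MS` and `nextDelayMs` are hypothetical names, and the scheduler in `stealth.ts` may differ, for example by tracking the time elapsed since the previous request to the same engine. A caller would wait the returned number of milliseconds (for example with `setTimeout`) before each request.

```typescript
type Engine = "google" | "bing" | "yahoo" | "duckduckgo" | "brave";

// [min, max] delay in milliseconds, taken from the Request Spacing row above.
const SPACING_MS: Record<Engine, [number, number]> = {
  google: [1500, 3000],
  bing: [400, 800],
  yahoo: [400, 800],
  duckduckgo: [200, 500],
  brave: [300, 600],
};

// Uniformly random delay within the engine's range; `rand` is injectable for tests.
function nextDelayMs(engine: Engine, rand: () => number = Math.random): number {
  const [min, max] = SPACING_MS[engine];
  return Math.round(min + rand() * (max - min));
}
```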

### Optional Proxy Configuration

For crawling highly-protected sites (Cloudflare, Datadome, etc.), add proxies in your `.env`:

```env
KRYFTO_STEALTH_MODE=true
KRYFTO_ROTATE_USER_AGENT=true
# Feed it a comma-separated list of premium residential proxies
KRYFTO_PROXY_URLS=socks5://proxy1:1080,http://user:pass@proxy2:8080
```
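`KRYFTO_PROXY_URLS` is a comma-separated list, so rotating through it comes down to parsing the list and handing out entries in turn. The round-robin class below shows one way such a list could be consumed. It does not describe Kryfto's internal proxy selection, and the accepted-scheme list is an assumption based on the two examples above. Each entry is validated with the standard `URL` parser.

```typescript
// Parse a comma-separated proxy list and hand out entries round-robin.
class ProxyRotator {
  private readonly proxies: string[];
  private index = 0;

  constructor(list: string) {
    this.proxies = list
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0)
      .map((s) => {
        const url = new URL(s); // throws on malformed entries
        if (!["http:", "https:", "socks5:"].includes(url.protocol)) {
          throw new Error(`Unsupported proxy scheme: ${url.protocol}`);
        }
        return s;
      });
  }

  next(): string | undefined {
    if (this.proxies.length === 0) return undefined; // no proxies: connect directly
    const proxy = this.proxies[this.index % this.proxies.length];
    this.index += 1;
    return proxy;
  }
}
```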

---

## 🏗️ Architecture

Kryfto is structured as a monorepo using `pnpm` workspaces.

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                                   CLIENTS                                    │
│ ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐ │
│ │  cURL /  │  │  n8n /   │  │  Claude  │  │   CLI    │  │      Admin       │ │
│ │  SDK-TS  │  │  Zapier  │  │  Cursor  │  │   Tool   │  │    Dashboard     │ │
│ │  SDK-PY  │  │  Make    │  │  Codex   │  │          │  │   (React SPA)    │ │
│ └────┬─────┘  └────┬─────┘  └────┬─────┘  └─────┬────┘  └─────┬────────────┘ │
│      │             │             │              │             │              │
│      │ REST API    │             │ MCP (stdio)  │ REST API    │ REST API     │
│      │ (/v1)       │             │              │             │ (/v1/admin)  │
└──────┼─────────────┼─────────────┼──────────────┼─────────────┼──────────────┘
       ▼             ▼             │              │             ▼
┌───────────────────────────┐      │              │  ┌─────────────────────────┐
│        Fastify API        │      │              │  │    nginx (Dashboard)    │
│        (apps/api)         │      │              │  │          :3001          │
│        :8080              │◄─────┼──────────────┼──┤ /v1/* → proxy → API     │
│                           │      │              │  │ /dashboard/* → SPA      │
│ ┌─────────────────────┐   │      │              │  │                         │
│ │ Auth & RBAC         │   │      │              │  │ 9 pages:                │
│ │ • SHA-256 tokens    │   │      ▼              │  │ Overview · Tokens       │
│ │ • 3 roles           │   │  ┌────────────────┐ │  │ Projects · Jobs         │
│ │ • Token expiration  │   │  │ MCP Server     │ │  │ Crawls · Audit Logs     │
│ │ • Per-role rate lim.│   │  │ (packages/     │ │  │ Rate Limits             │
│ └─────────────────────┘   │  │  mcp-server)   │ │  │ API Playground          │
│ ┌─────────────────────┐   │  │ 42+ tools      │ │  │ API Examples            │
│ │ Route Handlers      │   │  │ • search       ├─┘  └─────────────────────────┘
│ │ • Jobs CRUD         │   │  │ • browse       │
│ │ • Search (5 engines)│   │  │ • research     │
│ │ • Crawl             │   │  │ • extract      │
│ │ • Extract           │   │  │ • watch        │
│ │ • Recipes           │   │  │ • eval suite   │
│ │ • Admin endpoints   │   │  │ • CAPTCHA solve│
│ └─────────────────────┘   │  └────────────────┘
│ ┌─────────────────────┐   │
│ │ Audit Logging       │   │
│ │ SSRF Protection     │   │
│ │ Idempotency Keys    │   │
│ │ OpenTelemetry       │   │
│ └─────────────────────┘   │
└─────────────┬─────────────┘
              │
              │ Enqueue (BullMQ)
              ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 Redis :6379                                  │
│ ┌──────────────────────┐  ┌──────────────────────┐  ┌──────────────────────┐ │
│ │ Job Queues           │  │ Concurrency          │  │ Pub/Sub (SSE logs)   │ │
│ │ (BullMQ)             │  │ Semaphores           │  │                      │ │
│ └──────────────────────┘  └──────────────────────┘  └──────────────────────┘ │
└─────────────┬────────────────────────────────────────────────────────────────┘
              │
              │ Consume
              ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                             Worker (apps/worker)                             │
│                                                                              │
│ ┌────────────────────────────┐  ┌──────────────────────────────────────────┐ │
│ │ Fetch Path                 │  │ Browser Path (Playwright)                │ │
│ │ • HTTP GET/POST            │  │ • Chromium / Firefox / WebKit            │ │
│ │ • Stealth headers          │  │ • 20-point stealth evasion               │ │
│ │ • Cookie jar               │  │ • Fingerprint consistency                │ │
│ └────────────────────────────┘  │ • Humanized mouse/keyboard               │ │
│                                 │ • Browser session pool                   │ │
│ ┌────────────────────────────┐  │ • CAPTCHA solving (CLIP/Whisper)         │ │
│ │ Extraction Engine          │  └──────────────────────────────────────────┘ │
│ │ • CSS selectors            │                                               │
│ │ • JSON Schema              │  ┌──────────────────────────────────────────┐ │
│ │ • Plugin modules           │  │ Crawl Orchestrator                       │ │
│ │ • HTML → Markdown          │  │ • BFS link-follow                        │ │
│ └────────────────────────────┘  │ • Depth/page caps                        │ │
│                                 │ • robots.txt respect                     │ │
│                                 │ • Politeness delays                      │ │
│                                 └──────────────────────────────────────────┘ │
└─────────────┬────────────────────────────────────────────────────────────────┘
              │
              │ Persist
              ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                              Persistence Layer                               │
│                                                                              │
│ ┌──────────────────────────────────┐  ┌────────────────────────────────────┐ │
│ │ PostgreSQL :5432                 │  │ MinIO / S3 :9000                   │ │
│ │                                  │  │                                    │ │
│ │ • projects                       │  │ • Screenshots (PNG)                │ │
│ │ • api_tokens                     │  │ • HTML snapshots                   │ │
│ │ • rate_limit_config              │  │ • HAR archives                     │ │
│ │ • jobs + job_logs                │  │ • Extracted data (JSON)            │ │
│ │ • artifacts (metadata)           │  │ • Cookies exports                  │ │
│ │ • crawl_runs + nodes             │  │                                    │ │
│ │ • recipes                        │  │ Deduplicated by SHA-256            │ │
│ │ • audit_logs                     │  └────────────────────────────────────┘ │
│ │ • idempotency_keys               │                                         │
│ │ • browser_profiles               │                                         │
│ └──────────────────────────────────┘                                         │
└──────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────┐
│                               Shared Packages                                │
│                                                                              │
│ ┌────────────────────────┐  ┌────────────────────┐  ┌──────────────────────┐ │
│ │ @kryfto/shared         │  │ @kryfto/sdk-ts     │  │ @kryfto/cli          │ │
│ │ • Zod schemas          │  │ • Typed client     │  │ • Commander CLI      │ │
│ │ • Stealth layer        │  │ • Promise chains   │  │ • YAML recipes       │ │
│ │ • Search parsers       │  └────────────────────┘  └──────────────────────┘ │
│ │ • Fingerprint gen      │                                                   │
│ │ • Browser stealth      │                                                   │
│ │ • Humanize utils       │                                                   │
│ │ • CAPTCHA vision       │                                                   │
│ └────────────────────────┘                                                   │
└──────────────────────────────────────────────────────────────────────────────┘
```
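The diagram notes that MinIO/S3 artifacts are deduplicated by SHA-256 and that API tokens are stored as SHA-256 hashes. With content addressing, the object key is derived from the bytes, so storing identical content twice resolves to the same key. The sketch below uses Node's `node:crypto`. The `artifacts/` key layout and all helper names are assumptions, and `exists`/`put` stand in for your S3 client's head/put calls.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hex-encoded SHA-256 digest.
function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hypothetical content-addressed key: identical bytes always map to the same object.
function artifactKey(content: string | Uint8Array, ext: string): string {
  return `artifacts/${sha256Hex(content)}.${ext}`;
}

// Upload only when the key is new.
async function storeOnce(
  content: string | Uint8Array,
  ext: string,
  exists: (key: string) => Promise<boolean>,
  put: (key: string, body: string | Uint8Array) => Promise<void>,
): Promise<{ key: string; uploaded: boolean }> {
  const key = artifactKey(content, ext);
  if (await exists(key)) return { key, uploaded: false };
  await put(key, content);
  return { key, uploaded: true };
}

// Token check against a stored SHA-256 hash, compared in constant time.
function tokenMatches(presented: string, storedHex: string): boolean {
  const a = Buffer.from(sha256Hex(presented), "hex");
  const b = Buffer.from(storedHex, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```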

### Monorepo Layout

| Path | Description |
|---|---|
| `apps/api` | Fastify control plane: REST API, auth/RBAC, per-role rate limiting, token expiration, admin endpoints |
| `apps/dashboard` | React admin dashboard: token/project/job management, audit logs, rate limit config, API playground, examples |
| `apps/worker` | BullMQ workers: Playwright browser execution, stealth, CAPTCHA solving, crawl orchestration |
| `packages/mcp-server` | MCP bridge: 42+ tools for Claude, Cursor, Codex (search, browse, research, eval) |
| `packages/shared` | Shared library: Zod schemas, stealth layer, search parsers, fingerprint, humanize, CAPTCHA vision |
| `packages/sdk-ts` | TypeScript SDK: typed API client with promise chains |
| `packages/sdk-py` | Python SDK |
| `packages/cli` | CLI tool: Commander-based terminal interface with YAML recipe support |

### Development Commands

```bash
pnpm install
pnpm build
pnpm typecheck
KRYFTO_BASE_URL=http://localhost:8080 KRYFTO_API_TOKEN=$KRYFTO_API_TOKEN pnpm test:integration
```

---

## ❤️ Support the Project

Kryfto is free and open-source. If it saves you money on scraping APIs or helps power your AI workflows, consider supporting continued development with a small donation!

| Network | Address |
| ------------------ | ---------------------------------------------- |
| **Bitcoin (BTC)** | `bc1qd8ztrxucrhz27fgmu754ayq59lvjprclxdury5` |
| **Ethereum (ETH)** | `0x0a01779792a17fc57473a6368f3970fa1d8830ba` |
| **Solana (SOL)** | `FNKjiS2zhCq3rv8bboA83pzvKwDov3wyFxQn4sy75bPr` |
| **BNB (BSC)** | `0x0a01779792a17fc57473a6368f3970fa1d8830ba` |
| **Tron (TRX)** | `TF7YwGwP6cDCTGxLAjRKxqPss18pMp762G` |

Every contribution helps keep the lights on and the browsers headless. 🙏

---

### License

Apache-2.0 (`LICENSE`)

---

## 📋 Changelog

See [CHANGELOG.md](CHANGELOG.md) for the full version history.