https://github.com/n1byn1kt/apitap
The MCP server that turns any website into an API — no docs, no SDK, no browser. npm: @apitap/core
- Host: GitHub
- URL: https://github.com/n1byn1kt/apitap
- Owner: n1byn1kt
- License: other
- Created: 2026-02-14T16:20:58.000Z
- Default Branch: main
- Last Pushed: 2026-03-06T04:31:29.000Z
- Last Synced: 2026-03-06T08:30:21.067Z
- Topics: ai-agent, api, browser-automation, mcp, mcp-server, playwright, skill-file, web-scraping
- Language: TypeScript
- Homepage: https://www.apitap.io
- Size: 12.7 MB
- Stars: 71
- Watchers: 1
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Security: .github/SECURITY.md
- Roadmap: ROADMAP.md
# ApiTap
[npm](https://www.npmjs.com/package/@apitap/core) · [GitHub](https://github.com/n1byn1kt/apitap) · [License](./LICENSE)
**The MCP server that turns any website into an API — no docs, no SDK, no browser.**
ApiTap is an MCP server that lets AI agents browse the web through APIs instead of browsers. When an agent needs data from a website, ApiTap detects the site's framework (WordPress, Next.js, Shopify, etc.), discovers its internal API endpoints, and calls them directly, returning clean JSON instead of forcing the agent to render and parse HTML.

For sites that need authentication, it opens a browser window for a human to log in, captures the session tokens, and hands control back to the agent.

Every site visited produces a reusable "skill file" that maps the site's APIs. The first visit is a discovery step; every later visit is a direct API call. ApiTap works with any MCP-compatible LLM client and reduces token costs by 20-100x compared to browser automation.
The web was built for human eyes; ApiTap makes it native to machines.
```bash
# One tool call: discover the API + replay it
apitap browse https://techcrunch.com
# ✓ Discovery: WordPress detected (medium confidence)
# ✓ Replay: GET /wp-json/wp/v2/posts → 200 (10 articles)

# Or read content directly, no browser needed
apitap read https://en.wikipedia.org/wiki/Node.js
# ✓ Wikipedia decoder: ~127 tokens (vs ~4,900 raw HTML)

# Or step by step:
apitap capture https://polymarket.com                 # Watch API traffic
apitap show gamma-api.polymarket.com                  # See what was captured
apitap replay gamma-api.polymarket.com get-events     # Call the API directly
```
No scraping. No browser. Just the API.

---
## How It Works
1. **Capture** — Launch a Playwright browser, visit a site, browse normally. ApiTap intercepts all network traffic via CDP.
2. **Filter** — Scoring engine separates signal from noise. Analytics, tracking pixels, and framework internals are filtered out. Only real API endpoints survive.
3. **Generate** — Captured endpoints are grouped by domain, URLs are parameterized (`/users/123` → `/users/:id`), and a JSON skill file is written to `~/.apitap/skills/`.
4. **Replay** — Read the skill file, substitute parameters, call the API with `fetch()`. Zero dependencies in the replay path.
```
Capture: Browser → Playwright listener → Filter → Skill Generator → skill.json
Replay: Agent → Replay Engine (skill.json) → fetch() → API → JSON response
```
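The Generate step's URL parameterization (`/users/123` → `/users/:id`) can be illustrated with a minimal TypeScript sketch. The ID patterns, the single `:id` placeholder name, and the helpers `parameterizePath` and `groupByDomain` are assumptions for illustration, not ApiTap's actual implementation.

```typescript
// Hypothetical ID patterns (assumptions, not ApiTap's actual heuristics).
const ID_PATTERNS: RegExp[] = [
  /^\d+$/, // numeric IDs such as 123
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i, // UUIDs
  /^[0-9a-f]{24,}$/i, // long hex IDs
];

// Replace every ID-like path segment with ":id".
function parameterizePath(path: string): string {
  return path
    .split("/")
    .map((segment) => (ID_PATTERNS.some((re) => re.test(segment)) ? ":id" : segment))
    .join("/");
}

// Group captured URLs by hostname, deduplicating parameterized paths.
function groupByDomain(urls: string[]): Map<string, Set<string>> {
  const groups = new Map<string, Set<string>>();
  for (const raw of urls) {
    const { hostname, pathname } = new URL(raw);
    const paths = groups.get(hostname) ?? new Set<string>();
    paths.add(parameterizePath(pathname));
    groups.set(hostname, paths);
  }
  return groups;
}

const groups = groupByDomain([
  "https://api.example.com/users/123",
  "https://api.example.com/users/456",
  "https://api.example.com/users/123/posts",
]);
console.log(Array.from(groups.get("api.example.com") ?? [])); // [ '/users/:id', '/users/:id/posts' ]
```

Naming every placeholder `:id` collapses `/users/1/posts/2` into `/users/:id/posts/:id`; a real generator would likely derive distinct names from the preceding segment.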
## Install
```bash
npm install -g @apitap/core
```
**Claude Code** — one command to wire it up:
```bash
claude mcp add -s user apitap -- apitap-mcp
```
That's it. 12 MCP tools, ready to go. Requires Node.js 20+.
> **Optional:** To use `capture` and `browse` (which open a real browser), also run:
> ```bash
> npx playwright install chromium
> ```
> The `read`, `peek`, and `discover` tools work without it.
## Quick Start
### Capture API traffic
```bash
# Capture from a single domain (default)
apitap capture https://polymarket.com
# Capture all domains (CDN, API subdomains, etc.)
apitap capture https://polymarket.com --all-domains
# Include response previews in the skill file
apitap capture https://polymarket.com --preview
# Stop after 30 seconds
apitap capture https://polymarket.com --duration 30
```
ApiTap opens a browser window. Browse the site normally — click around, scroll, search. Every API call is captured. Press Ctrl+C when done.
### List and explore captured APIs
```bash
# List all skill files
apitap list
# ✓ gamma-api.polymarket.com  3 endpoints  2m ago
# ✓ www.reddit.com            2 endpoints  1h ago

# Show endpoints for a domain
apitap show gamma-api.polymarket.com
# [green] ✓ GET /events  object (3 fields)
# [green] ✓ GET /teams   array (12 fields)

# Search across all skill files
apitap search polymarket
```
### Replay an endpoint
```bash
# Replay with captured defaults
apitap replay gamma-api.polymarket.com get-events
# Override parameters
apitap replay gamma-api.polymarket.com get-events limit=5 offset=10
# Machine-readable JSON output
apitap replay gamma-api.polymarket.com get-events --json
```
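Overrides such as `limit=5 offset=10` presumably take precedence over the captured defaults. A minimal sketch of that merge, assuming plain `key=val` arguments split on the first `=`; `parseOverrides` and `mergeParams` are hypothetical names, not ApiTap's CLI code.

```typescript
// Hypothetical parser for CLI-style "key=val" arguments.
function parseOverrides(args: string[]): Record<string, string> {
  const overrides: Record<string, string> = {};
  for (const arg of args) {
    const eq = arg.indexOf("=");
    if (eq <= 0) throw new Error(`Expected key=val, got "${arg}"`);
    // Split on the first "=" only, so values may themselves contain "=".
    overrides[arg.slice(0, eq)] = arg.slice(eq + 1);
  }
  return overrides;
}

// Captured defaults first, caller overrides on top.
function mergeParams(
  defaults: Record<string, string>,
  overrides: Record<string, string>,
): Record<string, string> {
  return { ...defaults, ...overrides };
}

const params = mergeParams({ limit: "10" }, parseOverrides(["limit=5", "offset=10"]));
console.log(params); // { limit: '5', offset: '10' }
```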
## Text-Mode Browsing
ApiTap includes a text-mode browsing pipeline — `peek` and `read` — that lets agents consume web content without launching a browser. Seven built-in decoders extract structured content from popular sites at a fraction of the token cost:
| Site | Decoder | Typical Tokens | vs Raw HTML |
|------|---------|----------------|-------------|
| Reddit | `reddit` | ~627 | 93% smaller |
| YouTube | `youtube` | ~36 | 99% smaller |
| Wikipedia | `wikipedia` | ~127 | 97% smaller |
| Hacker News | `hackernews` | ~200 | 90% smaller |
| Grokipedia | `grokipedia` | ~150–5000+ | varies by article length |
| Twitter/X | `twitter` | ~80 | 95% smaller |
| Any other site | `generic` | varies | ~74% avg |
**Average token savings: 74% across 83 tested domains.**
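Assuming the percentages are relative reductions (1 − decoded/raw), the Wikipedia figures from the `apitap read` example at the top of this README reproduce the table's 97%:

```typescript
// Relative token savings, assuming the table's percentages are 1 - decoded/raw.
function savingsPercent(decodedTokens: number, rawTokens: number): number {
  return (1 - decodedTokens / rawTokens) * 100;
}

// Wikipedia example: ~127 decoded tokens vs ~4,900 tokens of raw HTML.
console.log(savingsPercent(127, 4900).toFixed(1)); // 97.4
console.log((4900 / 127).toFixed(1)); // 38.6 (reduction factor)
```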
```bash
# Triage first: zero-cost HEAD request
apitap peek https://reddit.com/r/programming
# ✓ accessible, recommendation: read

# Extract content without a browser
apitap read https://reddit.com/r/programming
# ✓ Reddit decoder: 12 posts, ~627 tokens

# Works for any URL; falls back to generic HTML extraction
apitap read https://example.com/blog/post
```
For MCP agents, `apitap_peek` and `apitap_read` are the fastest way to consume web content — use them before reaching for `apitap_browse` or `apitap_capture`.
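A triage step like `peek` only needs the status code and headers of a HEAD request (in Node 18+, `fetch(url, { method: "HEAD" })`). The sketch below shows one possible rule set; the `HeadResult` shape, the recommendation labels other than `read`, and the rules themselves are hypothetical.

```typescript
// Hypothetical HEAD-response triage. Labels other than "read" and all rules
// below are illustrative assumptions, not ApiTap's actual peek logic.
type Recommendation = "read" | "auth" | "browse" | "skip";

interface HeadResult {
  status: number;
  headers: Record<string, string>; // header names lower-cased
}

function triage({ status, headers }: HeadResult): Recommendation {
  if (status === 401 || status === 403) {
    // Cloudflare-fronted responses carry a cf-ray header; a rejection with one
    // is treated here as a likely bot challenge that needs a real browser.
    return headers["cf-ray"] ? "browse" : "auth";
  }
  if (status >= 400) return "skip";
  const type = headers["content-type"] ?? "";
  // HTML or JSON can go straight to a decoder.
  return type.includes("text/html") || type.includes("json") ? "read" : "skip";
}

console.log(triage({ status: 200, headers: { "content-type": "text/html; charset=utf-8" } })); // read
```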
## Tested Sites
ApiTap has been tested against real-world sites:
| Site | Endpoints | Tier | Replay (HTTP status) |
|------|-----------|------|--------|
| Polymarket | 3 | Green | 200 |
| Reddit | 2 | Green | 200 |
| Discord | 4 | Green | 200 |
| GitHub | 1 | Green | 200 |
| HN (Algolia) | 1 | Yellow | 200 |
| dev.to | 2 | Green | 200 |
| CoinGecko | 6 | Green | 200 |
78% overall replay success rate across 9 tested sites (green tier: 100%). The table above lists seven of them.
## Why ApiTap?
**Why not just use the public API?** Most sites don't have one, or it's heavily rate-limited. The internal API that powers the SPA is often richer, faster, and already handles auth.
**Why not just use Playwright/Puppeteer?** Browser automation costs 50-200K tokens per page for an AI agent. ApiTap captures the API once, then your agent calls it directly at 1-5K tokens. No DOM, no selectors, no flaky waits.
**Why not reverse-engineer the API manually?** You could open DevTools and copy headers by hand. ApiTap does it in 30 seconds and gives you a portable file any agent can use.
**Isn't this just a MITM proxy?** No. ApiTap is read-only — it uses Chrome DevTools Protocol to observe responses. No certificate setup, no request modification, no code injection.
## Replayability Tiers
Every captured endpoint is classified by replay difficulty:
| Tier | Meaning | Replay |
|------|---------|--------|
| **Green** | Public, permissive CORS, no signing | Works with `fetch()` |
| **Yellow** | Needs auth, no signing/anti-bot | Works with stored credentials |
| **Orange** | CSRF tokens, session binding | Fragile — may need browser refresh |
| **Red** | Request signing, anti-bot (Cloudflare) | Needs full browser |
GET endpoints are auto-verified during capture by comparing Playwright responses with raw `fetch()` responses.
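The tier table reads as a precedence order: the hardest requirement an endpoint has decides its tier. A minimal sketch, assuming three boolean signals collected during capture; `EndpointSignals` and `classifyTier` are illustrative names, not ApiTap's schema.

```typescript
// Hypothetical per-endpoint signals gathered during capture (illustrative
// field names, not ApiTap's skill-file schema).
interface EndpointSignals {
  hasAuth: boolean; // Bearer token, API key, or session cookie observed
  hasCsrfToken: boolean; // CSRF token or other session-bound value
  hasSigning: boolean; // request signature or anti-bot challenge
}

type Tier = "green" | "yellow" | "orange" | "red";

// Check the hardest requirement first, so a single blocker decides the tier.
function classifyTier(s: EndpointSignals): Tier {
  if (s.hasSigning) return "red";
  if (s.hasCsrfToken) return "orange";
  if (s.hasAuth) return "yellow";
  return "green";
}

console.log(classifyTier({ hasAuth: true, hasCsrfToken: false, hasSigning: false })); // yellow
```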
## MCP Server
ApiTap includes an MCP server with 12 tools for Claude Desktop, Cursor, Windsurf, and other MCP-compatible clients.
```bash
# Start the MCP server
apitap-mcp
```
**Claude Code** — see [Install](#install) above.
**Claude Desktop / Cursor / Windsurf** — add to your MCP config:
```json
{
  "mcpServers": {
    "apitap": {
      "command": "apitap-mcp"
    }
  }
}
```
**VS Code (GitHub Copilot)** — add `.vscode/mcp.json`:
```json
{
  "servers": {
    "apitap": {
      "command": "apitap-mcp"
    }
  }
}
```
### MCP Tools
| Tool | Description |
|------|-------------|
| `apitap_browse` | High-level "just get me the data" (discover + replay in one call) |
| `apitap_peek` | Zero-cost URL triage (HEAD only) |
| `apitap_read` | Extract content without a browser (7 decoders) |
| `apitap_discover` | Detect a site's APIs without launching a browser |
| `apitap_search` | Search available skill files |
| `apitap_replay` | Replay a captured API endpoint |
| `apitap_replay_batch` | Replay multiple endpoints in parallel across domains |
| `apitap_capture` | Capture API traffic via instrumented browser |
| `apitap_capture_start` | Start an interactive capture session |
| `apitap_capture_interact` | Interact with a live capture session (click, type, scroll) |
| `apitap_capture_finish` | Finish or abort a capture session |
| `apitap_auth_request` | Request human authentication for a site |
You can also serve a single skill file as a dedicated MCP server with `apitap serve ` — each endpoint becomes its own tool.
## Chrome Extension
ApiTap includes a Chrome extension that captures API traffic directly from your already-logged-in browser — no Playwright, no auth dance, no browser popups.
**Why use the extension?**
- You're already logged into Spotify, Discord, Reddit — the extension captures from your live session
- No `apitap auth request` needed — real tokens are captured automatically
- Browse naturally while it records in the background
### Setup
1. Build the extension:
```bash
cd extension && npm install && npm run build
```
2. Load in Chrome: `chrome://extensions` → Enable Developer mode → Load unpacked → select the `extension/` folder
3. Wire up auto-save (one-time):
```bash
apitap extension install --extension-id
```
Find your extension ID at `chrome://extensions` (enable Developer mode).
### Usage
1. Click the ApiTap icon in Chrome → **Start Capture**
2. Browse normally — extension records all API traffic
3. Click **Stop** → skill files auto-save to `~/.apitap/skills/`
The popup shows CLI connection status and live capture stats. Auth tokens are automatically stored to `~/.apitap/auth.enc` with `[stored]` placeholders in the exported skill files.
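The `[stored]` substitution can be pictured as splitting each captured header set into an exportable part and a secret part. A sketch under assumptions: the header names in `SENSITIVE_HEADERS` and the `redactHeaders` function are hypothetical.

```typescript
// Hypothetical list of credential-bearing headers (an assumption).
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "x-api-key", "x-csrf-token"]);

// Split headers into what goes in the exported skill file and what goes to encrypted storage.
function redactHeaders(headers: Record<string, string>): {
  exported: Record<string, string>;
  secrets: Record<string, string>;
} {
  const exported: Record<string, string> = {};
  const secrets: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (SENSITIVE_HEADERS.has(name.toLowerCase())) {
      exported[name] = "[stored]"; // placeholder in the skill file
      secrets[name] = value; // real value kept for encrypted storage
    } else {
      exported[name] = value;
    }
  }
  return { exported, secrets };
}

const { exported, secrets } = redactHeaders({ Authorization: "Bearer abc123", Accept: "application/json" });
console.log(exported); // { Authorization: '[stored]', Accept: 'application/json' }
```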
> **Note:** Chrome Web Store submission coming soon. For now, load as an unpacked extension in Developer mode.
---
## Auth Management
ApiTap automatically detects and stores auth credentials (Bearer tokens, API keys, cookies) during capture. Credentials are encrypted at rest with AES-256-GCM.
```bash
# View auth status
apitap auth api.example.com
# List all domains with stored auth
apitap auth --list
# Refresh expired tokens via browser
apitap refresh api.example.com
# Force fresh token before replay
apitap replay api.example.com get-data --fresh
# Clear stored auth
apitap auth api.example.com --clear
```
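For readers curious about what AES-256-GCM with PBKDF2 key derivation (see Security below) involves, here is a sketch using Node's built-in `node:crypto`. The machine-bound secret (derived here from `os.hostname()` alone), the 100,000 PBKDF2 iterations, and the `Sealed` record layout are assumptions; ApiTap's actual key material and parameters may differ.

```typescript
import { createCipheriv, createDecipheriv, pbkdf2Sync, randomBytes } from "node:crypto";
import { hostname } from "node:os";

// Assumption: a machine-bound secret derived from the hostname alone.
const machineSecret = hostname();
const ITERATIONS = 100_000; // assumed PBKDF2 work factor

// Binary fields are base64-encoded so the record can be stored as JSON.
interface Sealed {
  salt: string;
  iv: string;
  tag: string;
  data: string;
}

function deriveKey(salt: Buffer): Buffer {
  return pbkdf2Sync(machineSecret, salt, ITERATIONS, 32, "sha256"); // 32 bytes = AES-256 key
}

function seal(plaintext: string): Sealed {
  const salt = randomBytes(16);
  const iv = randomBytes(12); // 96-bit nonce, the standard size for GCM
  const cipher = createCipheriv("aes-256-gcm", deriveKey(salt), iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    salt: salt.toString("base64"),
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"), // available only after final()
    data: data.toString("base64"),
  };
}

function unseal(sealed: Sealed): string {
  const decipher = createDecipheriv(
    "aes-256-gcm",
    deriveKey(Buffer.from(sealed.salt, "base64")),
    Buffer.from(sealed.iv, "base64"),
  );
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  // final() throws if the ciphertext or tag was modified.
  const data = Buffer.concat([decipher.update(Buffer.from(sealed.data, "base64")), decipher.final()]);
  return data.toString("utf8");
}

console.log(unseal(seal("Bearer abc123"))); // Bearer abc123
```

GCM authenticates as well as encrypts: if the ciphertext or tag is altered, decryption throws instead of returning corrupted plaintext.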
## Skill Files
Skill files are JSON documents stored at `~/.apitap/skills/.json`. They contain everything needed to replay an API — endpoints, headers, query params, request bodies, pagination patterns, and response shapes.
```json
{
  "version": "1.1",
  "domain": "gamma-api.polymarket.com",
  "baseUrl": "https://gamma-api.polymarket.com",
  "endpoints": [
    {
      "id": "get-events",
      "method": "GET",
      "path": "/events",
      "queryParams": { "limit": { "type": "string", "example": "10" } },
      "headers": {},
      "responseShape": { "type": "object", "fields": ["id", "title", "slug"] }
    }
  ]
}
```
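Replaying an endpoint from this file amounts to joining `baseUrl` and `path`, filling query parameters from the captured `example` values, and letting caller overrides win. A minimal sketch under assumptions: the `:name` path-parameter syntax follows the `/users/:id` convention from How It Works, while `ALLOWED_HEADERS` and `buildRequest` are hypothetical. The result can be passed to `fetch(req.url, { method: req.method, headers: req.headers })`.

```typescript
// Subset of the skill-file fields shown above.
interface QueryParam {
  type: string;
  example: string;
}
interface Endpoint {
  id: string;
  method: string;
  path: string;
  queryParams: Record<string, QueryParam>;
  headers: Record<string, string>;
}
interface SkillFile {
  baseUrl: string;
  endpoints: Endpoint[];
}

// Hypothetical allowlist: only these skill-file headers are forwarded.
const ALLOWED_HEADERS = new Set(["accept", "accept-language", "content-type"]);

function buildRequest(skill: SkillFile, endpointId: string, overrides: Record<string, string> = {}) {
  const endpoint = skill.endpoints.find((e) => e.id === endpointId);
  if (!endpoint) throw new Error(`Unknown endpoint: ${endpointId}`);

  // Substitute ":name" path parameters from the overrides.
  const usedInPath = new Set<string>();
  const path = endpoint.path.replace(/:(\w+)/g, (_match: string, name: string) => {
    const value = overrides[name];
    if (value === undefined) throw new Error(`Missing path parameter: ${name}`);
    usedInPath.add(name);
    return encodeURIComponent(value);
  });
  const url = new URL(skill.baseUrl.replace(/\/$/, "") + path);

  // Captured examples are the defaults; caller overrides replace them.
  const query: Record<string, string> = {};
  for (const [name, param] of Object.entries(endpoint.queryParams)) query[name] = param.example;
  Object.assign(query, overrides);
  for (const [name, value] of Object.entries(query)) {
    if (!usedInPath.has(name)) url.searchParams.set(name, value);
  }

  // Drop any header that is not on the allowlist.
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(endpoint.headers)) {
    if (ALLOWED_HEADERS.has(name.toLowerCase())) headers[name] = value;
  }
  return { method: endpoint.method, url: url.toString(), headers };
}

const skill: SkillFile = {
  baseUrl: "https://gamma-api.polymarket.com",
  endpoints: [
    {
      id: "get-events",
      method: "GET",
      path: "/events",
      queryParams: { limit: { type: "string", example: "10" } },
      headers: {},
    },
  ],
};
const req = buildRequest(skill, "get-events", { limit: "5", offset: "10" });
console.log(req.url); // https://gamma-api.polymarket.com/events?limit=5&offset=10
```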
Skill files are portable and shareable. Auth credentials are stored separately in encrypted storage — never in the skill file itself.
### Import / Export
```bash
# Import a skill file from someone else
apitap import ./reddit-skills.json
# Import validates: signature check → SSRF scan → confirmation
```
Imported files are re-signed with your local key and marked with `imported` provenance.
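HMAC-SHA256 signing with a local key can be sketched with `node:crypto`. Everything beyond the HMAC itself is an assumption for illustration: the `sign`, `verify`, and `importSkill` names, the record shape, and signing `JSON.stringify` output (a real implementation would want a canonical serialization, since key order changes the bytes).

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type Provenance = "self" | "imported" | "unsigned";

interface SignedSkill {
  skill: object;
  signature?: string; // hex HMAC; absent for "unsigned"
  provenance: Provenance;
}

// Assumption: the HMAC covers JSON.stringify output of the skill file.
function sign(skill: object, key: Buffer): string {
  return createHmac("sha256", key).update(JSON.stringify(skill)).digest("hex");
}

function verify(skill: object, signature: string, key: Buffer): boolean {
  const expected = Buffer.from(sign(skill, key), "hex");
  const actual = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

// Hypothetical import step: after validation, re-sign with the local key and record provenance.
function importSkill(skill: object, key: Buffer): SignedSkill {
  return { skill, signature: sign(skill, key), provenance: "imported" };
}

const localKey = Buffer.from("demo-local-key"); // a real key would be random and kept private
const entry = importSkill({ domain: "gamma-api.polymarket.com" }, localKey);
console.log(entry.provenance, verify(entry.skill, entry.signature ?? "", localKey)); // imported true
```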
## Security
ApiTap handles untrusted skill files from the internet and replays HTTP requests on your behalf. That's a high-trust position, and we treat it seriously.
### Defense in Depth
- **Auth encryption** — AES-256-GCM with PBKDF2 key derivation, keyed to your machine
- **PII scrubbing** — Emails, phones, IPs, credit cards, SSNs detected and redacted during capture
- **SSRF protection** — Multi-layer URL validation blocks access to internal networks (see below)
- **Header injection protection** — Allowlist prevents skill files from injecting dangerous HTTP headers (`Host`, `X-Forwarded-For`, `Cookie`, `Authorization`)
- **Redirect validation** — Manual redirect handling with SSRF re-check prevents redirect-to-internal-IP attacks
- **DNS rebinding prevention** — Resolved IPs are pinned to prevent TOCTOU attacks where DNS returns a different IP on a second lookup
- **Skill signing** — HMAC-SHA256 signatures detect tampering; three-state provenance tracking (self/imported/unsigned)
- **No phone-home** — Everything runs locally. No external services, no telemetry
- **Read-only capture** — Playwright intercepts responses only. No request modification or code injection
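A regex pass is the simplest way to picture PII scrubbing. This hypothetical sketch covers only emails, IPv4 addresses, and US-style SSNs; phone numbers and credit cards (which would need, for example, a Luhn check to avoid false positives) are left out, and the `[label]` placeholder format is an assumption.

```typescript
// Hypothetical regex scrubber covering a few of the PII types listed above.
const PII_PATTERNS: Array<[label: string, pattern: RegExp]> = [
  ["email", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["ipv4", /\b(?:\d{1,3}\.){3}\d{1,3}\b/g],
  ["ssn", /\b\d{3}-\d{2}-\d{4}\b/g],
];

// Replace each match with a "[label]" placeholder.
function scrub(text: string): string {
  let out = text;
  for (const [label, pattern] of PII_PATTERNS) {
    out = out.replace(pattern, `[${label}]`);
  }
  return out;
}

console.log(scrub("contact jane@example.com from 10.0.0.5")); // contact [email] from [ipv4]
```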
### Why SSRF Protection Matters
Since skill files can come from anywhere — shared by colleagues, downloaded from GitHub, or imported from untrusted sources — a malicious skill file is the primary threat vector. Here's what ApiTap defends against:
**The attack:** An attacker crafts a skill file with `baseUrl: "http://169.254.169.254"` (the AWS/cloud metadata endpoint) or `baseUrl: "http://localhost:8080"` (your internal services). When you replay an endpoint, your machine makes the request, potentially leaking cloud credentials or hitting internal APIs.
**The defense:** ApiTap validates every URL at multiple points:
```
Skill file imported
→ validateUrl(): block private IPs, internal hostnames, non-HTTP schemes
→ validateSkillFileUrls(): scan baseUrl + all endpoint example URLs
Endpoint replayed
→ resolveAndValidateUrl(): DNS lookup + verify resolved IP isn't private
→ IP pinning: fetch uses resolved IP directly (prevents DNS rebinding)
→ Header filtering: strip dangerous headers from skill file
→ Redirect check: if server redirects, validate new target before following
```
**Blocked ranges:** `127.0.0.0/8`, `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`, `169.254.0.0/16` (cloud metadata), `0.0.0.0`, IPv6 equivalents (`::1`, `fe80::/10`, `fc00::/7`, `::ffff:` mapped addresses), `localhost`, `.local`, `.internal`, `file://`, `javascript:` schemes.
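The literal-address part of this check can be sketched with the WHATWG `URL` parser and `isIP` from `node:net`. `isBlockedUrl` and its helpers are hypothetical names that cover only the ranges listed above; blocking every `::ffff:` address rather than inspecting the embedded IPv4 is a simplifying assumption.

```typescript
import { isIP } from "node:net";

// IPv4 CIDR ranges taken from the blocked list above.
const BLOCKED_V4: Array<[base: string, bits: number]> = [
  ["127.0.0.0", 8],
  ["10.0.0.0", 8],
  ["172.16.0.0", 12],
  ["192.168.0.0", 16],
  ["169.254.0.0", 16], // cloud metadata
  ["0.0.0.0", 32],
];

function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function isBlockedIPv4(ip: string): boolean {
  const addr = ipv4ToInt(ip);
  return BLOCKED_V4.some(([base, bits]) => {
    const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
    return ((addr & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
  });
}

function isBlockedIPv6(ip: string): boolean {
  const lower = ip.toLowerCase();
  if (lower === "::1" || lower === "::") return true; // loopback, unspecified
  if (lower.startsWith("::ffff:")) return true; // IPv4-mapped: blocked outright
  const first = parseInt(lower.split(":")[0] || "0", 16);
  return (first & 0xffc0) === 0xfe80 || (first & 0xfe00) === 0xfc00; // fe80::/10, fc00::/7
}

function isBlockedUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return true; // refuse anything unparseable
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return true; // file:, javascript:, ...
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".local") || host.endsWith(".internal")) return true;
  // URL keeps IPv6 literals in brackets, e.g. "[::1]".
  const bare = host.startsWith("[") ? host.slice(1, -1) : host;
  const kind = isIP(bare);
  return (kind === 4 && isBlockedIPv4(bare)) || (kind === 6 && isBlockedIPv6(bare));
}

console.log(isBlockedUrl("http://169.254.169.254/latest/meta-data/")); // true
```

Parsing through `URL` first also normalizes alternate spellings such as `http://2130706433/` to `127.0.0.1` before the range check. Hostnames still need a DNS lookup (for example, `lookup` from `node:dns/promises`) with the resolved address re-checked, which is the role of the `resolveAndValidateUrl()` step in the flow above.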
This is especially relevant now that [MCP servers are being used as attack vectors in the wild](https://cloud.google.com/blog/topics/threat-intelligence/distillation-experimentation-integration-ai-adversarial-use) — Google's Threat Intelligence Group recently documented underground toolkits built on compromised MCP servers. ApiTap is designed to be safe even when processing untrusted inputs.
## CLI Reference
All commands support `--json` for machine-readable output.
| Command | Description |
|---------|-------------|
| `apitap browse ` | Discover + replay in one step |
| `apitap peek ` | Zero-cost URL triage (HEAD only) |
| `apitap read ` | Extract content without a browser |
| `apitap discover ` | Detect APIs without launching a browser |
| `apitap capture ` | Capture API traffic from a website |
| `apitap list` | List available skill files |
| `apitap show ` | Show endpoints for a domain |
| `apitap search ` | Search skill files by domain or endpoint |
| `apitap replay [key=val...]` | Replay an API endpoint |
| `apitap import ` | Import a skill file with safety validation |
| `apitap refresh ` | Refresh auth tokens via browser |
| `apitap auth [domain]` | View or manage stored auth |
| `apitap serve ` | Serve a skill file as an MCP server |
| `apitap inspect ` | Discover APIs without saving |
| `apitap stats` | Show token savings report |
| `apitap audit` | Audit stored skill files and credentials |
| `apitap forget ` | Remove skill file and credentials for a domain |
| `apitap --version` | Print version |
### Capture flags
| Flag | Description |
|------|-------------|
| `--all-domains` | Capture traffic from all domains (default: target domain only) |
| `--preview` | Include response data previews |
| `--duration ` | Stop capture after N seconds |
| `--port ` | Connect to specific CDP port |
| `--launch` | Always launch a new browser |
| `--attach` | Only attach to existing browser |
| `--no-scrub` | Disable PII scrubbing |
| `--no-verify` | Skip auto-verification of GET endpoints |
## Development
```bash
git clone https://github.com/n1byn1kt/apitap.git
cd apitap
npm install
npm test # 1051 tests, Node built-in test runner
npm run typecheck # Type checking
npm run build # Compile to dist/
npx tsx src/cli.ts capture # Run from source
```
## Contact
Questions, feedback, or issues? → **[hello@apitap.io](mailto:hello@apitap.io)**
## License
[Business Source License 1.1](./LICENSE) — **free for all non-competing use** (personal, internal, educational, research, open source). Cannot be rebranded and sold as a competing service. Converts to Apache 2.0 on February 7, 2029.