https://github.com/marmutapp/superbased-claude-code-plugin
SuperBased plugin for Claude Code — screenshot capture, AI vision, OCR, screen recording, visual regression testing, token compression, voice dictation, and proactive monitoring via 28 MCP tools
ai-vision claude-code claude-code-plugin dictation mcp ocr screen-recording screenshot token-compression visual-testing
Last synced: 12 days ago
- Host: GitHub
- URL: https://github.com/marmutapp/superbased-claude-code-plugin
- Owner: marmutapp
- Created: 2026-04-12T20:18:45.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-26T11:38:51.000Z (about 2 months ago)
- Last Synced: 2026-04-26T13:22:44.613Z (about 2 months ago)
- Topics: ai-vision, claude-code, claude-code-plugin, dictation, mcp, ocr, screen-recording, screenshot, token-compression, visual-testing
- Homepage: https://superbased.app
- Size: 37.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# SuperBased — Eyes AND Hands for Claude Code
Screenshot capture, AI vision, OCR, screen recording, visual regression testing, token compression, voice dictation, proactive screen monitoring, **and full GUI automation with humanization v2** — all via 72 MCP tools, directly inside Claude Code.
## Install
### Option 1: From Marketplace
```
/plugin marketplace add marmutapp/superbased-claude-code-plugin
/plugin install superbased@superbased-tools
```
### Option 2: Local Plugin
```
claude --plugin-dir /path/to/superbased/plugin
```
### Option 3: MCP Server Only
Add to your project's `.mcp.json`:
```json
{
  "mcpServers": {
    "superbased": {
      "command": "superbased",
      "args": ["mcp"]
    }
  }
}
```
## Prerequisites
- **SuperBased desktop app** running (Windows or macOS), OR
- **SuperBased CLI** installed globally: `npm install -g superbased`
- Node.js 20+
## Recommended: auto-approve SuperBased tools
Claude Code prompts for approval on every MCP tool call by default. For GUI-automation flows (click / type / scroll / drag / sequence / ui_dump) each prompt also steals focus back to the Claude Code window — so 20 tool calls = 20 focus breaks. Add this to `.claude/settings.json` (project) or `~/.claude/settings.json` (global) to auto-approve all SuperBased tools:
```json
{
  "permissions": {
    "allow": ["mcp__superbased"]
  }
}
```
This does not remove SuperBased's own safeguards: the server still enforces its gates (`guiAutomation.enabled` master toggle + per-action toggles + `confirm: true` + protected-apps blocklist + NDJSON audit log). Auto-approving in Claude Code only skips the redundant host-side prompt, not the underlying safety rails.
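When a settings file already exists, the rule has to be merged into it rather than pasted over it. A minimal sketch in Python, assuming the file is plain JSON; `allow_superbased` is a hypothetical helper, not part of SuperBased or Claude Code:

```python
import json
from pathlib import Path

RULE = "mcp__superbased"  # rule string taken from the settings snippet above


def allow_superbased(settings_path: Path, rule: str = RULE) -> dict:
    """Add `rule` to permissions.allow while preserving every other setting."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    # Create the nested keys only if they are missing.
    allow = settings.setdefault("permissions", {}).setdefault("allow", [])
    if rule not in allow:  # idempotent: a second run adds nothing
        allow.append(rule)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Call it with `Path(".claude/settings.json")` for a project or `Path.home() / ".claude" / "settings.json"` for the global file. Running it twice leaves a single entry.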
## Slash Commands (26)
| Command | Description |
|---------|-------------|
| `/superbased:capture` | Take a screenshot (fullscreen, window, or region) |
| `/superbased:window` | List open windows or capture a specific window |
| `/superbased:extract` | Capture + OCR to extract text from screen |
| `/superbased:explain` | Capture + AI analysis of what's on screen |
| `/superbased:ocr` | Extract text from screenshot or image file (local Tesseract) |
| `/superbased:clipboard` | Read or write system clipboard (text or image) |
| `/superbased:annotate` | Add rectangles, arrows, text labels, blur to captures |
| `/superbased:redact` | Auto-redact secrets and PII from screenshots |
| `/superbased:record` | Start, stop, or manage screen recording sessions |
| `/superbased:monitor` | Start proactive AI screen monitoring |
| `/superbased:sessions` | List recording sessions and view frames |
| `/superbased:diff` | Compare two recording sessions for visual regressions |
| `/superbased:baseline` | Manage visual regression testing baselines |
| `/superbased:export` | Export sessions as zip, markdown, PDF, HTML, or GIF |
| `/superbased:gallery` | Browse, search, and manage capture gallery |
| `/superbased:compress` | Compress text into token-efficient images |
| `/superbased:dictate` | Record from microphone and transcribe |
| `/superbased:transcribe` | Transcribe audio file to text (raw Whisper) |
| `/superbased:settings` | View or update app settings |
| `/superbased:presets` | Manage AI instruction presets |
| `/superbased:status` | Health, auth, and AI usage check |
| `/superbased:auth` | Authentication management |
| `/superbased:click` | Click an on-screen element by label or coordinates |
| `/superbased:form` | Fill a form by label/value pairs (`superbased_form_fill`) |
| `/superbased:record-gui` | Record a multi-step GUI workflow for replay |
| `/superbased:captcha` | Open the CAPTCHA-solving guidance (rotation puzzles, drag puzzles, image grids) |
## Skills (11)
Skills are invoked automatically by Claude when relevant to the task.
| Skill | When Claude Uses It |
|-------|-------------------|
| **screenshot** | Claude needs to see the screen to answer a question or verify a UI change |
| **visual-qa** | Visual regression testing: record baseline, make changes, record again, diff |
| **monitor** | Proactive screen watching during deploys, tests, or builds |
| **compress** | Large text content (>500 tokens) that would be cheaper as an image |
| **redact** | Screenshots that may contain API keys, tokens, or PII before sharing |
| **dictation** | User wants voice input, audio transcription, or speech-to-text |
| **annotate** | Highlighting areas, marking regressions, creating annotated screenshots |
| **walkthrough** | Multi-frame product walkthrough: capture, narrate, export |
| **gui-automation** | "Click that", "type into this", "fill the form" — drives the desktop with click/type/hotkey/scroll/drag/form-fill/sequence |
| **captcha-solving** | reCAPTCHA / Cloudflare Turnstile / drag puzzles / rotation puzzles / image grids |
| **humanization** | Sites with bot detection — picks the right humanization profile (off/light/human/paranoid) |
## Agents (3)
Dedicated agents for complex multi-step workflows.
| Agent | Description |
|-------|-------------|
| **visual-qa** | Record baselines, capture after changes, diff, annotate regressions, export reports |
| **monitor** | Watch screen for errors during deploys/tests, flag issues proactively, summarize findings |
| **gui-automation** | Orchestrate multi-step GUI workflows with `superbased_sequence` + the click/type/drag/scroll/form-fill primitives, with the safety checklist baked in |
## Hooks
**Post-test auto-capture:** After any test command (`npm test`, `jest`, `vitest`, `pytest`, `cargo test`, `go test`), SuperBased automatically captures a screenshot at quarter resolution. This builds a visual history of test runs without manual intervention.
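The plugin's actual hook matcher isn't shown here. As an illustration only, the sketch below shows one way to decide whether a shell command counts as a test run, using the six runners listed above. The function name and the parsing rules (chained commands, leading `VAR=value` assignments, an `npx` prefix) are assumptions:

```python
import re
import shlex

# Test runners named in the README; the matching rules below are assumptions.
TEST_COMMANDS = ("npm test", "jest", "vitest", "pytest", "cargo test", "go test")


def is_test_command(command_line: str) -> bool:
    """Return True if any chained segment starts with a known test command."""
    # Naive split on shell operators; operators inside quotes are not handled.
    for segment in re.split(r"&&|\|\||;|\|", command_line):
        try:
            words = shlex.split(segment)
        except ValueError:  # unbalanced quotes: skip this segment
            continue
        # Drop leading environment assignments such as CI=1.
        while words and re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*=.*", words[0]):
            words.pop(0)
        if words and words[0] == "npx":
            words.pop(0)
        for known in TEST_COMMANDS:
            parts = known.split()
            if words[: len(parts)] == parts:
                return True
    return False
```

Variants such as `npm run test` are deliberately not matched, since the list above does not mention them.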
## Humanization v2
GUI automation actions (`click`, `type`, `drag`, `hover`) ship with a humanization layer to reduce the bot-detection signal:
- **Cursor walks** use a sine-shaped velocity envelope (Bezier-style ease-in/ease-out), not constant velocity
- **Click targets** get a Gaussian jitter so two clicks on the same element land on slightly different pixels
- **Pre-click settle dwell** is gamma-distributed (rare long pauses, common short ones) — humans don't click the millisecond their cursor arrives
- **Click hold** varies between 50 and 110 ms and **key hold** between 45 and 95 ms, with per-process cross-session salt mixed into the seed
- **Typo simulation** is enabled via `typoProb` and follows the QWERTY same-row neighbor distribution of real fat-finger errors
- **Pre-click tremor** on the target element + occasional 2–4× micro-pauses that mimic distraction
- **Inter-action catch-up pause** between sequence steps so back-to-back clicks don't have suspiciously identical inter-arrival times
- **Opt-in idle cursor drift** via the `humanInputIdleDrift` setting
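
To make the first few items concrete, here is a rough Python sketch of a sine-shaped velocity envelope, Gaussian target jitter, a gamma-distributed settle dwell, and a bounded click hold. It is not SuperBased's implementation: the function names, the sigma, and the gamma shape/scale are assumed values chosen for illustration, and only the 50–110 ms range comes from the list above.

```python
import math
import random


def eased_path(start, end, steps=30):
    """Points from start to end whose speed follows sin(pi * t): slow, fast, slow."""
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Integral of the sine velocity profile, normalised to the range 0..1.
        s = (1 - math.cos(math.pi * t)) / 2
        points.append((x0 + (x1 - x0) * s, y0 + (y1 - y0) * s))
    return points


def jittered_target(center, sigma_px=2.0, rng=random):
    """Gaussian jitter around the element centre (sigma is an assumed value)."""
    cx, cy = center
    return (rng.gauss(cx, sigma_px), rng.gauss(cy, sigma_px))


def settle_dwell_ms(rng=random, shape=2.0, scale=40.0):
    """Gamma-distributed pause before clicking (shape and scale are assumed)."""
    return rng.gammavariate(shape, scale)


def click_hold_ms(rng=random):
    """Hold time within 50-110 ms; uniform sampling is an assumption."""
    return rng.uniform(50.0, 110.0)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is useful when replaying a recorded workflow in tests.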
Four profiles are selectable per call: `humanize: 'off' | 'light' | 'human' | 'paranoid'`. The default is `light`. Bump to `human` or `paranoid` for sites with active bot detection; see the **humanization** skill.
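
The same-row typo model behind `typoProb` can be sketched as follows. This is an assumption-heavy illustration: the function names are hypothetical, only immediate left/right neighbours are considered, and correcting each typo with Backspace is a guess about behaviour the README doesn't specify.

```python
import random

QWERTY_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def same_row_neighbors(ch: str) -> str:
    """Keys immediately left/right of `ch` on its QWERTY row ('' for non-letters)."""
    for row in QWERTY_ROWS:
        i = row.find(ch.lower())
        if i != -1:
            return row[max(i - 1, 0):i] + row[i + 1:i + 2]
    return ""


def with_typos(text: str, typo_prob: float, rng: random.Random) -> list[str]:
    """Keystrokes to send: a wrong neighbour key, then Backspace, then the right key."""
    keys = []
    for ch in text:
        neighbors = same_row_neighbors(ch)
        if neighbors and rng.random() < typo_prob:
            keys += [rng.choice(neighbors), "Backspace"]
        keys.append(ch)
    return keys
```

With `typo_prob=0.0` the output is just the characters of `text`; digits, spaces, and punctuation are never mistyped because they have no same-row letter neighbours in this model.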
## CAPTCHA solving
The plugin ships proactive guidance for the four CAPTCHA classes that come up in real automation work:
- **reCAPTCHA / Cloudflare Turnstile** — checkbox + image-grid challenges. Vision identifies the matching tiles, then a single batched click sequence selects them.
- **Drag puzzles** — slider-to-fit verification (e.g. "drag the puzzle piece to the gap"). Use `superbased_drag` with `humanize: 'light'` so the drop velocity reads as human; never `'off'`.
- **Rotation puzzles** — calibrate-then-execute pattern (capture, identify the angular delta, then drag in one motion).
- **Image grids** — vision to identify, batched click to select.
The guidance also includes an honest list of what humanization can't defeat (server-side device fingerprinting, audio CAPTCHAs, hCaptcha enterprise mode). See the **captcha-solving** skill.
## MCP Tools (72)
The plugin exposes all 72 SuperBased MCP tools, grouped by category below.
### Capture & View (5)
`superbased_screenshot` (preferred wrapper), `superbased_capture_image` (advanced), `superbased_capture`, `superbased_gallery_image`, `superbased_window_list`
### AI & OCR (8)
`superbased_ai`, `superbased_ai_usage`, `superbased_ocr`, `superbased_transcribe`, `superbased_compress_text`, `superbased_project`, `superbased_workspace_sync`, `superbased_stt_status`
### Gallery (2)
`superbased_gallery`, `superbased_gallery_update`
### Privacy & Annotations (2)
`superbased_redact`, `superbased_annotate`
### Dictation & Voice (2)
`superbased_dictate`, `superbased_dictation_history`
### Recording & Visual QA (7)
`superbased_recording`, `superbased_sessions`, `superbased_describe_frames`, `superbased_narrate`, `superbased_diff`, `superbased_baseline`, `superbased_export`
### Settings, Auth & System (6)
`superbased_settings`, `superbased_presets`, `superbased_auth`, `superbased_license`, `superbased_health`, `superbased_clipboard`
### GUI Automation (40)
**Read the screen:** `superbased_ui_dump` (preferred for "read the page"), `superbased_scroll_capture` (preferred for "walk the whole page"), `superbased_scroll_to` (preferred for "find X on a long page"), `superbased_accessibility_tree`, `superbased_locate`
**Drive the desktop:** `superbased_sequence` (preferred for >1 step), `superbased_click`, `superbased_type`, `superbased_hotkey`, `superbased_scroll`, `superbased_drag`, `superbased_drag_file` (scaffold), `superbased_hover`, `superbased_context_menu_select`, `superbased_form_fill`, `superbased_dialog_handle`, `superbased_open_url`, `superbased_find_in_page`, `superbased_tab_management`, `superbased_tray_click`, `superbased_virtual_desktop`
**Window & display:** `superbased_window_state`, `superbased_resize_window`, `superbased_focus_window`, `superbased_window_bounds`, `superbased_find_title_bar_drag_region`, `superbased_display_list`, `superbased_launch_app`
**Vision targeting:** `superbased_find_image`, `superbased_capture_template`, `superbased_pixel_color`
**Accessibility & invoke:** `superbased_ax_invoke`
**Timing:** `superbased_wait`, `superbased_wait_for`, `superbased_mouse_position`
**Safety / dev tools:** `superbased_dry_run`, `superbased_replay`, `superbased_doctor_gui_automation`, `superbased_undo_last`, `superbased_tools`
## Token Savings
SuperBased optimizes token usage with resolution control:
| Resolution | Tokens (1080p capture) | Savings vs `full` |
|------------|------------------------|-------------------|
| `full` | ~2,765 | baseline |
| `high` | ~1,382 | 2x |
| `half` | ~691 | 4x |
| `quarter` | ~173 | 16x |
| `thumbnail` | ~43 | 64x |
The Token Compression Engine converts large text blocks into optimized images, saving tokens when `image_tokens < text_tokens` (typically for content >500 tokens).
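
The 1080p figures in the table are consistent with estimating image cost as `width × height / 750` tokens, which is the approximation Anthropic documents for Claude image inputs. That SuperBased uses this formula is an inference, not something the README states. A short Python sketch that reproduces the table and the `image_tokens < text_tokens` check (all names are hypothetical):

```python
# Pixel-count divisors implied by the "Savings vs Full" column of the table.
RESOLUTION_DIVISORS = {"full": 1, "high": 2, "half": 4, "quarter": 16, "thumbnail": 64}


def image_tokens(width: int, height: int) -> int:
    """Rough image cost in tokens: pixel count divided by 750."""
    return round(width * height / 750)


def tokens_at(resolution: str, width: int = 1920, height: int = 1080) -> int:
    """Estimated tokens for a capture of the given screen at a named resolution."""
    return round(width * height / 750 / RESOLUTION_DIVISORS[resolution])


def should_compress(text_tokens: int, rendered_width: int, rendered_height: int) -> bool:
    """Render text as an image only when the image is the cheaper option."""
    return image_tokens(rendered_width, rendered_height) < text_tokens
```

For example, `tokens_at("quarter")` gives 173, matching the table, and a 640×480 rendering (about 410 tokens under this estimate) would only be worth it for text longer than that.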
## Examples
### See what's on screen
```
/superbased:capture
```
### Capture a specific window
```
/superbased:window Chrome
```
### Click a button by label
```
/superbased:click Submit
```
### Fill a login form
```
/superbased:form email=alice@example.com password=hunter2
```
### Solve a rotation CAPTCHA
```
/superbased:captcha
(then describe the puzzle to Claude — it'll calibrate the angle, then drag in one motion)
```
### Monitor a deploy for errors
```
/superbased:monitor Flag any errors, failed health checks, or 500 status codes
```
### Visual regression test
```
/superbased:record login-flow-baseline
(navigate the login UI)
/superbased:record stop
/superbased:baseline set login-flow
(make code changes)
/superbased:record login-flow-after
(navigate the same flow)
/superbased:record stop
/superbased:diff
```
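
What a diff step measures can be pictured with a toy example. The sketch below compares two equally sized grayscale frames (nested lists of 0–255 values) and reports the fraction of pixels that changed. The function names, tolerance, and threshold are assumptions, and the real `/superbased:diff` may work quite differently:

```python
def changed_ratio(before, after, tolerance=0):
    """Fraction of pixels whose grayscale value differs by more than `tolerance`."""
    if len(before) != len(after) or any(len(a) != len(b) for a, b in zip(before, after)):
        raise ValueError("frames must have the same dimensions")
    total = sum(len(row) for row in before)
    changed = sum(
        abs(p - q) > tolerance
        for row_b, row_a in zip(before, after)
        for p, q in zip(row_b, row_a)
    )
    return changed / total if total else 0.0


def is_regression(before, after, threshold=0.01, tolerance=8):
    """Flag a frame pair when more than `threshold` of its pixels changed."""
    return changed_ratio(before, after, tolerance) > threshold
```

The tolerance absorbs small rendering noise such as anti-aliasing, while the threshold decides how much of the frame must change before it counts as a regression.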
### Redact and share a screenshot
```
/superbased:capture
/superbased:redact
```
## Links
- [SuperBased](https://superbased.app) — Desktop app download
- [npm package](https://www.npmjs.com/package/superbased) — Headless CLI
- [MCP Integration Guide](https://github.com/marmutapp/superbased-claude-code-plugin) — Plugin repo & setup guide