{"id":50687787,"url":"https://github.com/klarlabs-studio/scout","last_synced_at":"2026-06-09T00:04:23.603Z","repository":{"id":345972757,"uuid":"1187951093","full_name":"klarlabs-studio/scout","owner":"klarlabs-studio","description":"Browser automation, one binary. The simpler alternative to Playwright — no Node, no Python, no runtime. Library, CLI, MCP server, and chat UI for any AI agent.","archived":false,"fork":false,"pushed_at":"2026-06-06T20:39:29.000Z","size":13078,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-06T21:14:56.827Z","etag":null,"topics":["ai-agent","ai-browser","browser-automation","cdp","chrome","claude-mcp","cursor-mcp","devtools-protocol","golang","headless-browser","llm","mcp","mcp-server","middleware","playwright-alternative","screencast","single-binary","video-recording","web-scraping"],"latest_commit_sha":null,"homepage":"https://klarlabs-studio.github.io/scout/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/klarlabs-studio.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["felixgeelhaar"]}},"created_at":"2026-03-21T12:15:56.000Z","updated_at":"2026-06-06T20:39:32.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/klarlabs-studio/scout","commit_stats":null,"previous_names":["felixgeelhaar/scout","kl
arlabs-studio/scout"],"tags_count":34,"template":false,"template_full_name":null,"purl":"pkg:github/klarlabs-studio/scout","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klarlabs-studio%2Fscout","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klarlabs-studio%2Fscout/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klarlabs-studio%2Fscout/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klarlabs-studio%2Fscout/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/klarlabs-studio","download_url":"https://codeload.github.com/klarlabs-studio/scout/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klarlabs-studio%2Fscout/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34085338,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-08T02:00:07.615Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agent","ai-browser","browser-automation","cdp","chrome","claude-mcp","cursor-mcp","devtools-protocol","golang","headless-browser","llm","mcp","mcp-server","middleware","playwright-alternative","screencast","single-binary","video-recording","web-scraping"],"created_at":"2026-06-09T00:04:21.875Z","updated_at":"2026-06-09T0
0:04:23.597Z","avatar_url":"https://github.com/klarlabs-studio.png","language":"Go","funding_links":["https://github.com/sponsors/felixgeelhaar"],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/logo-400.png\" alt=\"Scout\" width=\"160\"\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eScout\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\u003cstrong\u003eBrowser automation, one binary.\u003c/strong\u003e The simpler alternative to Playwright — no Node, no Python, no runtime. Drive a real browser from Go, any shell, any AI agent (built-in MCP server), or a chat UI.\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/klarlabs-studio/scout/releases\"\u003e\u003cimg src=\"https://img.shields.io/github/v/release/klarlabs-studio/scout?style=flat-square\u0026color=3b82f6\" alt=\"Release\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/klarlabs-studio/scout/blob/main/LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/github/license/klarlabs-studio/scout?style=flat-square\" alt=\"License\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/klarlabs-studio/scout/actions/workflows/ci.yml\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/klarlabs-studio/scout/ci.yml?style=flat-square\u0026label=CI\" alt=\"CI\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pkg.go.dev/go.klarlabs.de/scout\"\u003e\u003cimg src=\"https://img.shields.io/badge/go.dev-reference-007d9c?style=flat-square\" alt=\"Go Reference\"\u003e\u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/badge/coverage-80%25-brightgreen?style=flat-square\" alt=\"Coverage\"\u003e\n  \u003ca href=\"https://github.com/klarlabs-studio/scout/security/code-scanning\"\u003e\u003cimg src=\"https://img.shields.io/badge/security-nox-22c55e?style=flat-square\" alt=\"Security (nox)\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nA single statically-linked `scout` binary gives you a CLI, a 77-tool MCP 
server (so any MCP-aware agent — Claude Desktop, Cursor, Cline, custom — has a browser), a conversational chat UI, and a Go library with Gin-like middleware composition. Same engine, four access points.\n\n```bash\nbrew install klarlabs-studio/tap/scout\n```\n\n## vs. Playwright\n\n| | Scout | Playwright |\n|---|---|---|\n| Install | one ~15 MB binary | npm + ~600 MB browser cache |\n| Runtime dep | **none** (static) | Node.js always; Python/Java/.NET as wrappers |\n| Drive from | Go, any shell, MCP, chat UI | TS/JS first-class; others lag |\n| AI-agent native | **built-in** `scout mcp serve` | separate `playwright-mcp` project |\n| Token-aware extraction | DOM diff, distillation, observation budgets (50–80% fewer tokens) | not provided |\n| Action playbooks | record \u0026 replay deterministic JSON | codegen produces a script you maintain |\n| Container deploy | drop into `scratch` or `distroless` | carry Node + browser binaries |\n| CDP access | direct WebSocket, zero abstraction | internal protocol over CDP |\n\n## Quick Start\n\n```bash\n# CLI — visible browser, one-shot commands\nscout observe https://example.com          # structured page snapshot\nscout markdown https://news.ycombinator.com # page as compact markdown\nscout screenshot https://github.com         # save screenshot\nscout extract https://example.com h1        # extract element text\nscout frameworks https://react.dev          # detect React, Vue, etc.\n\n# MCP Server — give AI agents browser superpowers\nclaude mcp add scout -- scout mcp serve\n\n# Browser UI — conversational browser automation\nscout ui serve --provider=ollama --model=mistral\ncd ui \u0026\u0026 npm install \u0026\u0026 npm run dev  # open http://localhost:3000\n```\n\n## Install\n\n```bash\n# Homebrew\nbrew install klarlabs-studio/tap/scout\n\n# Direct binary\ncurl -fsSL https://raw.githubusercontent.com/klarlabs-studio/scout/main/install.sh | bash\n\n# Go\ngo install go.klarlabs.de/scout/cmd/scout@latest\n\n# As a 
library\ngo get go.klarlabs.de/scout\n```\n\n## MCP Server — 77 Tools\n\nRun `scout mcp serve` and any MCP-aware agent has a browser. No second project to install, no Node runtime, no Python interpreter — the binary is the server. Configure in any MCP client:\n\n```bash\nclaude mcp add scout -- scout mcp serve           # Claude Code\n```\n\n```json\n{\"mcpServers\": {\"scout\": {\"command\": \"scout\", \"args\": [\"mcp\", \"serve\"]}}}\n```\n\n### Tool Categories\n\n| Category | Tools |\n|----------|-------|\n| **Navigation** | `navigate`, `observe`, `observe_diff`, `observe_with_budget` |\n| **Interaction** | `click`, `click_label`, `click_text`, `type`, `hover`, `double_click`, `right_click`, `select_option`, `scroll_to`, `scroll_by`, `focus`, `drag_drop`, `dispatch_event` |\n| **Forms** | `fill_form`, `fill_form_semantic` (checkbox/radio + state echo), `discover_form` |\n| **Extraction** | `extract`, `extract_all`, `extract_table`, `auto_extract`, `scroll_and_collect`, `markdown`, `readable_text`, `accessibility_tree` |\n| **Capture** | `screenshot`, `annotated_screenshot`, `pdf` |\n| **Network** | `enable_network_capture`, `network_requests` |\n| **Tabs** | `open_tab`, `switch_tab`, `close_tab`, `list_tabs` |\n| **Frameworks** | `wait_spa`, `detect_frameworks`, `component_state`, `app_state` |\n| **Playback** | `start_recording`, `stop_recording`, `save_playbook`, `replay_playbook` |\n| **Video** | `start_screen_recording`, `stop_screen_recording` |\n| **Smart Helpers** | `check_readiness`, `suggest_selectors`, `session_history` |\n| **Vision** | `hybrid_observe`, `find_by_coordinates` |\n| **Batch** | `execute_batch` |\n| **Iframe** | `switch_to_frame`, `switch_to_main_frame` |\n| **Trace** | `start_trace`, `stop_trace` |\n| **Cookies** | `cookies_list`, `cookies_clear`, `cookies_set`, `dismiss_cookies` |\n| **Diagnostics** | `detect_dialog`, `detect_auth_wall`, `console_errors` (incl. 
network 4xx/5xx), `failed_requests`, `compare_tabs`, `upload_file` |\n| **Utility** | `has_element`, `wait_for`, `configure`, `web_vitals`, `select_by_prompt` |\n\nAll tools carry MCP annotations (`ReadOnly`, `OpenWorld`, `ClosedWorld`, `Idempotent`), so clients that support auto-approval can run read-only tools like `observe`, `extract`, and `screenshot` without permission prompts.\n\n### Runtime Configuration\n\nSwitch between a headless and a visible browser without restarting, and opt into local-dev workflows (loopback, private IPs):\n\n```\nAgent: configure(headless: false)                        → browser window appears\nAgent: navigate(\"https://...\")                           → watch it work\nAgent: configure(headless: true)                         → back to headless\nAgent: configure(allow_private_ips: true)                → unlock localhost / 192.168.* / 10.*\nAgent: navigate(\"http://127.0.0.1:4173/\")                → drive your local dev server\n```\n\nThe MCP server also reads `SCOUT_ALLOW_PRIVATE_IPS=1` at startup as a one-shot toggle for trusted environments.\n\n### Screen Recording (video)\n\nRecord the active page as a video. Pure CDP — works in headless mode, no Playwright needed. Recording survives `navigate`, `open_tab`, and `switch_tab` calls in between, so a multi-page demo lands as one continuous clip:\n\n```\nAgent: start_screen_recording({ width: 1280, height: 800, fps: 15, format: \"webm\" })\nAgent: navigate(\"https://example.com\")\nAgent: click(\"#cta\")\nAgent: navigate(\"https://example.com/dashboard\")   # recording continues across pages\nAgent: stop_screen_recording()\n       → { path: \"/tmp/scout-rec-XXX.webm\", format: \"webm\", encoder: \"ffmpeg\",\n           frame_count: N, duration_ms: N }\n```\n\nIf `ffmpeg` is on PATH, the result is encoded to WebM (libvpx-vp9) or MP4 (libx264). If not, scout returns the directory of raw JPEG frames plus an ffmpeg concat list so you can encode offline. 
The result is always a file path, never base64, so a recording never enters your LLM's token budget.\n\nExpect roughly 10–15 FPS on typical pages (capped at 30). The implementation polls `Page.captureScreenshot` because CDP `Page.startScreencast` events are silently dropped under `--headless=new` Chrome.\n\n## Browser UI\n\nA conversational browser automation interface: type in natural language and watch the browser respond in real time.\n\n```bash\n# Start the AG-UI server (Go backend)\nscout ui serve --provider=ollama --model=mistral    # local, no API key\nscout ui serve --provider=claude                     # needs ANTHROPIC_API_KEY\nscout ui serve --provider=openai --model=gpt-4o     # needs OPENAI_API_KEY\nscout ui serve --provider=groq --base-url=https://api.groq.com/openai --model=llama-3.3-70b-versatile\n\n# Start the Vue frontend\ncd ui \u0026\u0026 npm install \u0026\u0026 npm run dev                  # http://localhost:3000\n```\n\nThe UI streams AG-UI protocol events over SSE:\n- **Chat panel** with markdown rendering and quick-action pills\n- **Live browser viewport** with screenshot streaming and a URL bar\n- **Activity timeline** showing tool calls in real time\n- **Stop button** to cancel mid-stream\n\nThe Go server runs the agentic loop: the LLM decides which scout tools to call, and the server executes them and streams browser-state deltas back to the frontend. Any OpenAI-compatible endpoint is supported via `--base-url`.\n\n## Agent Package (Go)\n\nHigh-level Go API for callers that want to embed scout in a program. Structured output, auto-wait, goroutine-safe. 
Most users reach scout through the CLI or MCP server above — this section is for the Go-library path.\n\n```go\nsession, _ := agent.NewSession(agent.SessionConfig{Headless: true})\ndefer session.Close()\n\n// Navigate and observe\nsession.Navigate(\"https://example.com\")\nobs, _ := session.Observe()               // links, inputs, buttons, text + action costs\n\n// DOM diff — only what changed (saves 50-80% tokens)\nsession.Click(\"#submit\")\n_, diff, _ := session.ObserveDiff()\n// diff.Classification: \"modal_appeared\"\n// diff.Summary: \"Modal/dialog appeared: Login required\"\n\n// Semantic form filling — no CSS selectors\nsession.FillFormSemantic(map[string]string{\n    \"Email\": \"user-example\", \"Password\": \"secret\",\n})\n\n// Visual grounding — click by number, not selector\nresult, _ := session.AnnotatedScreenshot()  // numbered labels on elements\nsession.ClickLabel(7)                        // click element [7]\n\n// Multi-tab coordination\nsession.OpenTab(\"pricing\", \"https://example.com/pricing\")\nsession.SwitchTab(\"default\")\n\n// Framework detection (19 frameworks)\nframeworks, _ := session.DetectedFrameworks() // [\"react\", \"nextjs\"]\nstate, _ := session.ComponentState(\"#app\")    // read React/Vue state\n\n// Network capture — read API responses directly\nsession.EnableNetworkCapture(\"/api/\")\ncaptured := session.CapturedRequests(\"/api/users\")\n\n// Action replay — record once, replay without LLM\nsession.StartRecordingPlaybook(\"login-flow\")\n// ... 
do stuff ...\npb, _ := session.StopRecordingPlaybook()\nagent.SavePlaybook(pb, \"login.json\")\n// Later: session.ReplayPlaybook(pb)  // 100x cheaper\n\n// Persistent profiles\nsession.SaveProfile(\"session.json\")   // cookies + localStorage\nsession.LoadProfile(\"session.json\")\n\n// Content distillation (5 levels)\nsession.Markdown()          // ~2-8KB compact markdown\nsession.ReadableText()      // ~1-4KB main content only\nsession.AccessibilityTree() // ~1-4KB semantic tree\nsession.ObserveWithBudget(500) // fit in ~500 tokens\n```\n\n## Core Library (Go)\n\nGin-like Engine/Context/Group/HandlerFunc with middleware composition. The lowest-level Go API — use it when you want full control of task lifecycle, named groups, and middleware chains:\n\n```go\nengine := browse.Default(browse.WithHeadless(true))\nengine.MustLaunch()\ndefer engine.Close()\n\nengine.Use(middleware.Stealth())\nengine.Use(middleware.Retry(middleware.RetryConfig{MaxAttempts: 3}))\nengine.Use(middleware.Timeout(30 * time.Second))\n\nadmin := engine.Group(\"admin\", middleware.BasicAuth(\"admin\", \"secret\"))\nadmin.Task(\"export\", func(c *browse.Context) {\n    c.MustNavigate(\"https://app.example.com/admin\")\n    table, _ := c.ExtractTable(\"#users\")\n    c.Set(\"data\", table)\n})\n\nengine.RunGroup(\"admin\")\n```\n\n### Middleware\n\n| Category | Middleware |\n|----------|-----------|\n| **Resilience** | `Retry`, `Timeout`, `CircuitBreaker`, `RateLimit`, `Bulkhead` |\n| **Auth** | `BearerAuth`, `BasicAuth`, `CookieAuth`, `HeaderAuth` |\n| **Anti-detection** | `Stealth` (10 patches: webdriver, plugins, WebGL, etc.) 
|\n| **Network** | `BlockResources`, `WaitNetworkIdle` |\n| **Utilities** | `ScreenshotOnError`, `SlowMotion`, `Viewport` |\n\n## CLI\n\nThe CLI defaults to a visible browser (pass `--headless` to hide it):\n\n```bash\nscout navigate \u003curl\u003e                  # page info as JSON\nscout observe \u003curl\u003e                   # structured observation\nscout markdown \u003curl\u003e                  # compact markdown\nscout screenshot \u003curl\u003e [--output f]   # save screenshot\nscout pdf \u003curl\u003e [--output f]          # save PDF\nscout extract \u003curl\u003e \u003cselector\u003e        # extract text\nscout eval \u003curl\u003e \u003cexpression\u003e         # run JavaScript\nscout form discover \u003curl\u003e             # discover form fields\nscout frameworks \u003curl\u003e                # detect frameworks\nscout watch \u003curl\u003e [--interval=5s]     # live-watch page changes\nscout pipe \u003ccommand\u003e [selector]       # batch process URLs from stdin\nscout record \u003curl\u003e [--output f]       # interactive recording → playbook\nscout mcp serve                       # start MCP server\nscout version                         # print version\n```\n\n## Architecture\n\n```\nscout/\n├── browse.go, engine.go, context.go   # Gin-like API\n├── page.go, selection.go              # CDP page \u0026 element interaction\n├── recorder.go                        # Action playbook recording (Navigate/Click/Type → JSON)\n├── middleware/                        # stealth, resilience, auth, network\n├── agent/                             # AI agent API (50+ methods)\n│   ├── session.go                     # Session lifecycle, Navigate, Click, Type\n│   ├── observe.go, diff.go            # Observe, ObserveDiff, cost estimation\n│   ├── content.go                     # Markdown, ReadableText, AccessibilityTree\n│   ├── form.go                        # DiscoverForm, FillFormSemantic, MatchFormField\n│   ├── annotate.go                    # AnnotatedScreenshot, 
ClickLabel\n│   ├── network.go                     # EnableNetworkCapture, CapturedRequests\n│   ├── spa.go                         # DetectedFrameworks, ComponentState, GetAppState\n│   ├── tabs.go                        # OpenTab, SwitchTab, CloseTab, ListTabs\n│   ├── playbook.go                    # StartRecording, ReplayPlaybook, SavePlaybook\n│   ├── interact.go                    # Hover, DragDrop, SelectOption, ScrollTo\n│   ├── profile.go                     # CaptureProfile, ApplyProfile, SaveProfile\n│   ├── selector.go                    # Playwright :text() selector translation\n│   ├── budget.go                      # ObserveWithBudget, EstimateTokens\n│   ├── nlselect.go                    # SelectByPrompt, fuzzy NL element matching\n│   ├── batch.go                       # ExecuteBatch, sequential multi-action\n│   ├── vision.go                      # HybridObserve, FindByCoordinates\n│   ├── trace.go                       # StartTrace, StopTrace, action tracing\n│   ├── screencast.go                  # StartScreenRecording / StopScreenRecording — video via captureScreenshot polling + ffmpeg encode\n│   ├── iframe.go                      # SwitchToFrame, SwitchToMainFrame\n│   └── vitals.go                      # WebVitals (LCP/CLS/INP)\n├── internal/cdp/                      # WebSocket CDP client (context-aware)\n├── internal/launcher/                 # Chrome process management\n├── cmd/scout/                         # CLI + MCP server (84 tools)\n└── docs/                              # Landing page (GitHub Pages)\n```\n\n## Security\n\nVulnerability scanning runs on every push and PR via [`nox`](https://github.com/nox-hq/nox). Findings are uploaded to GitHub code scanning, annotated inline on PRs, and gated against `.nox/baseline.json` so regressions block merges. 
The status badge in the header reflects the latest main-branch scan.\n\n`nox` also drives dependency remediation in place of Dependabot — the [Nox Remediate](.github/workflows/nox-remediate.yml) workflow runs weekly (Monday 06:00 UTC) and on demand, executing `nox fix` against fresh OSV.dev findings and opening a single PR with the verified upgrades.\n\n```bash\n# Local scan\nnox scan -severity-threshold high .\n\n# Local fix\nnox fix -input findings.json\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fklarlabs-studio%2Fscout","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fklarlabs-studio%2Fscout","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fklarlabs-studio%2Fscout/lists"}