https://github.com/browserbase/agent-browse

Claude Agent SDK with a web browsing tool

Host: GitHub
URL: https://github.com/browserbase/agent-browse
Owner: browserbase
Created: 2025-10-12T08:44:05.000Z (7 months ago)
Default Branch: main
Last Pushed: 2025-10-17T17:19:59.000Z (6 months ago)
Last Synced: 2025-10-18T19:50:41.639Z (6 months ago)
Language: TypeScript
Size: 73.2 KB
Stars: 1
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# Claude Agent SDK + Stagehand: Agentic Browser Automation

A demo showing how the **[Claude Agent SDK](https://docs.claude.com/en/api/agent-sdk/overview)** (reasoning) combines with **[Stagehand](https://github.com/browserbase/stagehand)** (AI browser automation framework) to create powerful agentic browser automation. Because Stagehand accepts natural language instructions, it's significantly more context-efficient than native Playwright.

## Architecture: Reasoning + Tools

This demo illustrates a clean separation of concerns:

- **Claude Agent SDK**: Handles all reasoning, planning, and decision-making

- **Stagehand**: Executes browser actions via natural language commands

- **Result**: Context-efficient automation where Claude decides *what* to do and Stagehand handles *how* to do it

### Context Efficiency

**Stagehand reduces the tokens Claude spends on each interaction** by handling DOM traversal and selector logic internally, so the page's DOM never has to enter Claude's context.

```typescript
// Traditional: ~500 tokens of context + implementation
// - Full DOM structure passed to Claude
// - Claude generates:
//     await page.click('button[data-testid="auth-submit"][aria-label="Submit"]');
// - Breaks if UI changes

// Stagehand: ~50 tokens
// - Claude calls:
//     act({ action: "click the submit button" })
// - Stagehand figures out the selector
// - Resilient to UI changes
```

## Installation

```bash
npm install
```

**Requirements**: Chrome must be installed on your system.

## Setup

Set your Anthropic API key:

```bash
export ANTHROPIC_API_KEY="your-api-key"
```

**Note**: On first run, the demo will automatically copy your Chrome user data directory to `.chrome-profile` for browser automation. This preserves your cookies and logged-in sessions.
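
The README does not show how the demo locates the Chrome profile it copies. As a rough sketch, assuming the standard default Chrome user data locations, a lookup could look like the following. The function name and parameters are hypothetical and not taken from the demo's source.

```typescript
// Hypothetical helper: returns Chrome's default user data directory for a platform.
// The paths are Chrome's usual defaults; the demo's own lookup may differ.
function chromeUserDataDir(platform: string, home: string, localAppData = ""): string {
  switch (platform) {
    case "darwin":
      return `${home}/Library/Application Support/Google/Chrome`;
    case "win32":
      return `${localAppData}\\Google\\Chrome\\User Data`;
    case "linux":
      return `${home}/.config/google-chrome`;
    default:
      throw new Error(`unsupported platform: ${platform}`);
  }
}
```

In a Node script, the arguments would typically come from `process.platform`, `os.homedir()`, and the `LOCALAPPDATA` environment variable.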

## Usage

### Interactive Mode

```bash
npx tsx agent-browse.ts
```

### With Initial Prompt

```bash
npx tsx agent-browse.ts "Go to Hacker News and get the title of the top post"
```

After Claude responds, you can:

- Ask follow-up questions

- Give new instructions

- Type `exit` or `quit` to end
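
The interactive loop described above has to tell exit commands apart from prompts meant for Claude. A minimal sketch of that check is shown below. `classifyInput` and the `Turn` type are hypothetical, and the case-insensitive matching and blank-line handling are assumptions rather than documented behavior.

```typescript
// Hypothetical helper, not taken from the demo's source.
type Turn =
  | { kind: "exit" }
  | { kind: "skip" }
  | { kind: "prompt"; text: string };

function classifyInput(line: string): Turn {
  const text = line.trim();
  if (text === "") return { kind: "skip" }; // ignore empty lines (assumption)
  const lowered = text.toLowerCase();
  if (lowered === "exit" || lowered === "quit") return { kind: "exit" };
  return { kind: "prompt", text }; // anything else is forwarded to Claude
}
```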

### Example Tasks

```bash
# Complex multi-step workflow
npx tsx agent-browse.ts "Go to Hacker News, find the top post, click it, and summarize what it's about"

# Data extraction with reasoning
npx tsx agent-browse.ts "Navigate to example.com and extract any contact information you can find"

# Adaptive navigation
npx tsx agent-browse.ts "Go to github.com/browserbase/stagehand, take a screenshot, then find and click the documentation link"
```

Claude will:

1. **Plan** the steps needed (reasoning via Agent SDK)

2. **Execute** each step using Stagehand tools (natural language browser actions)

3. **Adapt** based on what it sees (screenshots, extracted data)

4. **Report** back with results

## Stagehand Tools

The demo exposes 6 Stagehand browser automation tools via MCP:

| Tool | Description | Example |
|------|-------------|---------|
| `navigate` | Go to a URL | `navigate({ url: "https://example.com" })` |
| `act` | Perform actions via natural language | `act({ action: "click the login button" })` |
| `extract` | Get structured data from the page | `extract({ instruction: "extract the title", schema: { title: "string" } })` |
| `observe` | Discover what's on the page | `observe({ query: "find all buttons" })` |
| `screenshot` | Capture the current page | `screenshot({})` |
| `close_browser` | Clean up when done | `close_browser({})` |
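
The six tools can also be read as a typed union of calls. The sketch below infers each input shape from the example column of the table. The demo's actual schemas may differ, and `ToolCall` and `describeCall` are hypothetical names used only for illustration.

```typescript
// Input shapes inferred from the table's examples; the real schemas may differ.
type ToolCall =
  | { tool: "navigate"; input: { url: string } }
  | { tool: "act"; input: { action: string } }
  | { tool: "extract"; input: { instruction: string; schema: Record<string, string> } }
  | { tool: "observe"; input: { query: string } }
  | { tool: "screenshot"; input: Record<string, never> }
  | { tool: "close_browser"; input: Record<string, never> };

// Hypothetical helper: renders a call as a one-line log entry.
function describeCall(call: ToolCall): string {
  switch (call.tool) {
    case "navigate":
      return `navigate to ${call.input.url}`;
    case "act":
      return `act: ${call.input.action}`;
    case "extract":
      return `extract "${call.input.instruction}" as {${Object.keys(call.input.schema).join(", ")}}`;
    case "observe":
      return `observe: ${call.input.query}`;
    case "screenshot":
      return "capture screenshot";
    case "close_browser":
      return "close browser";
  }
}
```

Modeling the calls as a discriminated union lets the compiler flag a misspelled tool name or a missing parameter before anything reaches the browser.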

### How It Works

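```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// `generateMessages` and `stagehandServer` are defined elsewhere in the demo.
const q = query({
  prompt: generateMessages(),
  options: {
    mcpServers: {
      "stagehand": stagehandServer  // Register Stagehand tools
    },
    // Only these tools may be called by Claude
    allowedTools: [
      "mcp__stagehand__navigate",
      "mcp__stagehand__act",
      "mcp__stagehand__extract",
      "mcp__stagehand__observe",
      "mcp__stagehand__screenshot",
      "mcp__stagehand__close_browser"
    ]
  }
});
```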
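
Each `allowedTools` entry above follows the pattern `mcp__<server>__<tool>`, where the server name is the key used under `mcpServers`. A small sketch that derives the list instead of spelling it out is shown below. `mcpToolName` and `STAGEHAND_TOOLS` are hypothetical names, not part of the Agent SDK.

```typescript
// Naming pattern seen in the allowedTools list: mcp__<server>__<tool>.
function mcpToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

// The six tool names listed in the Stagehand Tools table.
const STAGEHAND_TOOLS = [
  "navigate",
  "act",
  "extract",
  "observe",
  "screenshot",
  "close_browser",
] as const;

// Produces the same six strings that the query() options list by hand.
const allowedTools: string[] = STAGEHAND_TOOLS.map((tool) => mcpToolName("stagehand", tool));
```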

**The flow:**

1. Claude (via Agent SDK) decides what browser action to take

2. Claude calls a Stagehand MCP tool with natural language parameters

3. Stagehand translates the natural language into precise browser actions

4. Results flow back to Claude for the next decision
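
The four steps above can be sketched as a plain loop. This is a simplified model with stand-ins, not the Agent SDK's or Stagehand's implementation. `Planner` plays the role of Claude and `Executor` plays the role of Stagehand, and every name in the block is hypothetical.

```typescript
// All names here are hypothetical stand-ins, not the Agent SDK or Stagehand API.
type ToolRequest = { tool: string; input: Record<string, unknown> };
type Decision =
  | { kind: "call"; request: ToolRequest }
  | { kind: "done"; answer: string };

// Stand-in for Claude: picks the next step from the results seen so far.
type Planner = (history: string[]) => Decision;
// Stand-in for Stagehand: performs the requested action and returns a result.
type Executor = (request: ToolRequest) => Promise<string>;

async function runLoop(plan: Planner, execute: Executor, maxSteps = 10): Promise<string> {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const decision = plan(history); // step 1: decide the next action
    if (decision.kind === "done") return decision.answer;
    const result = await execute(decision.request); // steps 2-3: call the tool, which acts on the page
    history.push(result); // step 4: feed the result back for the next decision
  }
  throw new Error(`no answer after ${maxSteps} steps`);
}

// Scripted stand-ins so the loop can run without a browser or an API key.
const scriptedPlanner: Planner = (history) => {
  if (history.length === 0) {
    return { kind: "call", request: { tool: "navigate", input: { url: "https://example.com" } } };
  }
  if (history.length === 1) {
    return { kind: "call", request: { tool: "extract", input: { instruction: "extract the title" } } };
  }
  return { kind: "done", answer: `Title: ${history[1]}` };
};

const fakeExecutor: Executor = async (request) =>
  request.tool === "navigate" ? "navigated" : "Example Domain";
```

Calling `runLoop(scriptedPlanner, fakeExecutor)` walks through navigate, extract, and a final answer. The `maxSteps` cap is an added safeguard in this sketch, not something the README describes.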

## Troubleshooting

### Chrome not found

Install Chrome for your platform:

- **macOS**: https://www.google.com/chrome/

- **Windows**: https://www.google.com/chrome/

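- **Linux** (Debian/Ubuntu): `sudo apt install google-chrome-stable` (requires Google's apt repository to be configured; otherwise download the `.deb` package from https://www.google.com/chrome/)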

### Profile refresh

To refresh cookies from your main Chrome profile, delete the copied profile. The demo copies it again on the next run:

```bash
rm -rf .chrome-profile
```

## Resources

- [Claude Agent SDK Documentation](https://docs.claude.com/en/api/agent-sdk/overview)

- [Stagehand Documentation](https://github.com/browserbase/stagehand)

- [MCP (Model Context Protocol)](https://modelcontextprotocol.io)
