https://github.com/williamkapke/kapture
Chrome DevTools Extension that enables browser automation through MCP
https://github.com/williamkapke/kapture
Last synced: 6 months ago
JSON representation
Chrome DevTools Extension that enables browser automation through MCP
- Host: GitHub
- URL: https://github.com/williamkapke/kapture
- Owner: williamkapke
- Created: 2025-06-14T17:57:55.000Z (7 months ago)
- Default Branch: master
- Last Pushed: 2025-07-07T06:39:27.000Z (6 months ago)
- Last Synced: 2025-07-07T07:40:50.306Z (6 months ago)
- Language: JavaScript
- Size: 1.63 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome - williamkapke/kapture - Chrome DevTools Extension that enables browser automation through MCP (JavaScript)
README
# Kapture - Browser Automation via Chrome DevTools
Kapture is a Chrome DevTools Extension that enables browser automation through the Model Context Protocol (MCP). It allows AI applications like Claude to control web browsers via a three-layer architecture.
**✨ Key Feature**: Multiple AI clients can connect to the same server! Claude Desktop, Cline, and other MCP clients can all control browser tabs through a single server instance.
]

## Overview
Kapture bridges AI assistants with web browsers through:
- **MCP Server**: Handles MCP protocol communication
- **Chrome Extension**: DevTools panel for browser automation
- **WebSocket Bridge**: Real-time communication between server and extensions
- **Multi-Client Support**: Multiple AI clients can connect simultaneously via WebSocket
## Architecture

## Quick Start
### 1. Install Dependencies
```bash
# Server
cd server
npm install
npm run build
# Test App (optional)
cd test-app
npm install
```
### 2. Install Chrome Extension
#### Option A: Install from Chrome Web Store (Recommended)
1. Visit the [Kapture Chrome Web Store page](https://to.kap.co/kapture-extension)
2. Click "Add to Chrome"
3. Confirm the installation
#### Option B: Load Unpacked (Developer Mode)
1. Open Chrome and navigate to `chrome://extensions/`
2. Enable "Developer mode"
3. Click "Load unpacked"
4. Select the `extension` folder
### 3. Start MCP Server
[Configure your AI client](https://williamkapke.github.io/kapture/MCP_USAGE.html) and open it. It will start the MCP server automatically.
**OR**
Run the Test App:
```bash
cd test-app
npm run dev
```
The server starts on port 61822.
```bash
# Server
cd server
npm start
# Test App
cd test-app
npm start
```
### 4. Connect via DevTools
1. Open any website in Chrome
2. Open Chrome/Brave Developer Tools (F12 on Windows/Linux, Cmd+Option+I on macOS)
3. Navigate to "Kapture" panel
4. The extension will automatically connect to the server on port 61822
5. Select a server from the dropdown to connect automatically
## Using with Claude Desktop
Add to your Claude Desktop config:
### Option 1: Using the bridge command (Recommended)
This single command starts the server and handles stdio-to-WebSocket translation:
```json
{
"mcpServers": {
"kapture": {
"command": "npx",
"args": ["-y", "kapture-mcp", "bridge"]
}
}
}
```
### Option 2: Direct WebSocket connection (Advanced)
For advanced use cases where you need manual server control:
1. Start the server manually:
```bash
npx kapture-mcp
```
2. Configure Claude Desktop to connect via WebSocket:
```json
{
"mcpServers": {
"kapture": {
"transport": "websocket",
"url": "ws://localhost:61822/mcp"
}
}
}
```
**Note**: This approach requires manually managing the server lifecycle. Use Option 1 (bridge command) for most use cases.
## 🚀 Run Multiple AI Assistants Simultaneously
Kapture supports multiple MCP clients connecting to the same server! You can run Claude Desktop, Cline, and other MCP clients simultaneously through a single server instance.
### How It Works
- All MCP clients connect via WebSocket to `ws://localhost:61822/mcp`
- All MCP clients share access to the same browser tabs
- Notifications are broadcast to all connected clients
### Smart Server Detection
When running `npx kapture-mcp`, the command automatically detects if a server is already running:
- **No existing server**: Starts a new server on port 61822
- **Server already running**: Shows connection information and exits gracefully
This prevents errors and confusion when multiple clients try to start servers.
### Setting Up Multiple Clients
Each client should use the same bridge command configuration:
**Claude Desktop**:
```json
{
"mcpServers": {
"kapture": {
"command": "npx",
"args": ["-y", "kapture-mcp", "bridge"]
}
}
}
```
**Cline/VS Code**:
```json
{
"cline.mcpServers": {
"kapture": {
"command": "npx",
"args": ["-y", "kapture-mcp", "bridge"]
}
}
}
```
**Other MCP Clients**:
Use the same configuration pattern with `"command": "npx"` and `"args": ["-y", "kapture-mcp", "bridge"]`.
[See the complete multi-assistant guide →](https://williamkapke.github.io/kapture/MULTI_ASSISTANT_GUIDE.html)
### Benefits of Multiple AI Assistants:
- **Parallel Workflows**: Have Claude Desktop research while Cline develops code
- **Specialized Tasks**: Use different AI clients for different types of automation
- **Team Collaboration**: Multiple team members can use their preferred AI tools simultaneously
- **Testing & Development**: Test automation scripts with one AI while developing with another
Then ask Claude to interact with web pages:
- "Navigate to example.com and take a screenshot"
- "Click the search button"
- "Fill in the email field with test@example.com"
## Available MCP Tools
- `navigate` - Navigate to URL
- `back` - Browser back button
- `forward` - Browser forward button
- `reload` - Reload the current page (similar to pressing F5)
- `click` - Click elements (uses first matching element, returns unique selector)
- `hover` - Hover over elements (uses first matching element, returns unique selector)
- `fill` - Fill input fields (uses first matching element, returns unique selector)
- `select` - Select dropdown options (HTML `` only, uses first matching element, returns unique selector)
- `keypress` - Send keyboard events to the page or specific elements (supports modifier keys)
- `elements` - Query all elements matching a CSS selector or XPath with optional visibility filtering
**Note on Selectors**: Tools that accept a `selector` parameter (`click`, `hover`, `fill`, `select`, `keypress`, `screenshot`, `dom`) will only operate on the **first element** that matches the CSS selector. The tool response includes the unique selector of the actual element that was used, which may include an auto-generated ID if the element didn't have one.
**XPath Support**: All tools that accept a `selector` parameter also accept an `xpath` parameter as an alternative. This is particularly useful for:
- Finding elements by text content: `xpath: "//button[contains(text(), 'Submit')]"`
- Complex element relationships: `xpath: "//div[@class='container']//span[2]"`
- When CSS selectors are insufficient
Use either `selector` OR `xpath`, not both. If both are provided, `selector` takes precedence.
### Keypress Tool
The `keypress` tool simulates keyboard events. It accepts:
- `key` (required): The key combination to press. Can be:
- Single key: `"a"`, `"Enter"`, `"Tab"`, `"Escape"`, `" "` (space), `"Shift"`, `"Control"`
- With modifiers: `"Control+a"`, `"Shift+Tab"`, `"Alt+F4"`, `"Meta+Shift+p"`
- Modifier names: `Control` (or `Ctrl`), `Shift`, `Alt`, `Meta` (or `Cmd`/`Command`)
- Note: When sending just a modifier key (e.g., `"Shift"`), it's treated as pressing that key alone
- Duplicate modifiers are ignored (e.g., `"Shift+Shift+a"` is same as `"Shift+a"`)
- `selector` or `xpath` (optional): Target a specific element. If not provided, sends to document.body
Examples:
```json
// Press Enter
{ "key": "Enter", "selector": "#login-form" }
// Select all text (Ctrl+A)
{ "key": "Control+a", "selector": "#username" }
// Zoom in (Ctrl+Plus)
{ "key": "Control++", "selector": "body" }
// Zoom out (Ctrl+Minus)
{ "key": "Control+-", "selector": "body" }
// New tab (Ctrl+T)
{ "key": "Control+t" }
// Close tab (Ctrl+W or Cmd+W on Mac)
{ "key": "Meta+w" }
```
### MCP Resources
- `kapture://tabs` - List all connected browser tabs
- `kapture://tab/{tabId}` - Get detailed information about a specific tab
- `kapture://tab/{tabId}/console` - Get console logs from a specific tab (with pagination support)
- `kapture://tab/{tabId}/screenshot` - Capture screenshots of a tab or element
- `kapture://tab/{tabId}/dom` - Get HTML content of a tab or element
- `kapture://tab/{tabId}/elementsFromPoint` - Get elements at specific coordinates
- `kapture://tab/{tabId}/elements?selector={selector}&visible={true|false|all}` - Query all elements matching a CSS selector or XPath with optional visibility filtering
## Development
### Server Development
```bash
cd server
npm run dev # Development with hot-reload
```
### Test App
```bash
cd test-app
npm run dev # Run Electron test app
```
### Extension Development
After making changes:
1. Go to `chrome://extensions/`
2. Click refresh on Kapture extension
### Key Components
**Server** (`/server`):
- `mcp-handler.ts` - MCP protocol implementation
- `websocket-manager.ts` - WebSocket server
- `tab-registry.ts` - Tab tracking
- `tools/*.ts` - MCP tool implementations
**Extension** (`/extension`):
- `panel/command-executor.js` - Command execution
- `panel/command-queue.js` - Sequential execution
- `background.js` - Screenshot service worker
## DevTools Panel Features
- **Automatic Connection** - Connects to server on port 61822
- **Server Selection** - Dropdown to choose between multiple running servers
- **Connection Status** - Real-time server connection indicator
- **Tab Info** - Current tab ID and URL display
- **Command Testing** - Manual command execution interface
- **Console Viewer** - Live console log capture
- **History** - Command history
- **Dark Theme** - Toggle between light/dark modes
## Troubleshooting
### Connection Issues
- The extension will automatically connect to the server on port 61822
- If no servers are found, verify the server is running
- Check the server dropdown to see which servers were discovered
- Check browser console for errors
- Check server logs in the terminal
### Extension Not Showing
- Ensure extension is loaded and enabled
- Close and reopen DevTools
- Reload extension in `chrome://extensions/`
### Command Timeouts
- Default timeout is 5 seconds
- Some commands accept custom timeout parameter
- Check element selectors are correct
### Performance Considerations
**Important**: The `click` and `hover` tools may experience performance issues when the Kapture DevTools panel is not the active/selected tab in Chrome DevTools. For optimal performance:
- Keep the Kapture panel selected during automation
- If experiencing delays with click/hover operations, switch to the Kapture panel
- This is due to Chrome's optimization of inactive DevTools panels
## Security
- Commands execute within Chrome's DevTools sandbox
- Each tab has unique ID preventing cross-tab interference
- No direct file system access from extension
- Tab registry enforces command isolation
## License
MIT