https://github.com/Saik0s/mcp-browser-use

Last synced: about 2 months ago
JSON representation

Host: GitHub
URL: https://github.com/Saik0s/mcp-browser-use
Owner: Saik0s
License: mit
Created: 2025-01-17T20:33:47.000Z (5 months ago)
Default Branch: main
Last Pushed: 2025-04-11T13:06:04.000Z (2 months ago)
Last Synced: 2025-05-04T21:53:17.120Z (about 2 months ago)
Language: Python
Size: 335 KB
Stars: 492
Watchers: 2
Forks: 66
Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

mcp-index - Browser Use - Automates interactions with web browsers for tasks in testing and development, enabling natural language control and web research capabilities. (Browser Automation)

README

Browser Use Web UI

# browser-use MCP server
[![Documentation](https://img.shields.io/badge/Documentation-📕-blue)](https://docs.browser-use.com)
[![License](https://img.shields.io/badge/License-MIT-green)](LICENSE)

> **Project Note**: This MCP server implementation builds upon the [browser-use/web-ui](https://github.com/browser-use/web-ui) foundation. Core browser automation logic and configuration patterns are adapted from the original project.

AI-driven browser automation server implementing the Model Context Protocol (MCP) for natural language browser control and web research.

## Features

- 🧠 **MCP Integration** - Full protocol implementation for AI agent communication.
- 🌐 **Browser Automation** - Page navigation, form filling, element interaction via natural language (`run_browser_agent` tool).
- 👁️ **Visual Understanding** - Optional screenshot analysis for vision-capable LLMs.
- 🔄 **State Persistence** - Option to manage a browser session across multiple MCP calls or connect to user's browser.
- 🔌 **Multi-LLM Support** - Integrates with OpenAI, Anthropic, Azure, DeepSeek, Google, Mistral, Ollama, OpenRouter, Alibaba, Moonshot, Unbound AI.
- 🔍 **Deep Research Tool** - Dedicated tool for multi-step web research and report generation (`run_deep_search` tool).
- ⚙️ **Environment Variable Configuration** - Fully configurable via environment variables.
- 🔗 **CDP Connection** - Ability to connect to and control a user-launched Chrome/Chromium instance via Chrome DevTools Protocol.

## Quick Start

### Prerequisites

- Python 3.11 or higher
- `uv` (fast Python package installer): `pip install uv`
- Chrome/Chromium browser installed
- Install Playwright browsers: `uv sync` and then `uv run playwright install`

### Integration with MCP Clients (e.g., Claude Desktop)

You can configure clients like Claude Desktop to connect to this server. Add the following structure to the client's configuration (e.g., `claude_desktop_config.json`), adjusting the path and environment variables as needed:

```json
// Example for Claude Desktop config
"mcpServers": {
"browser-use": {
// Option 1: Run installed package
// "command": "uvx",
// "args": ["mcp-server-browser-use"],

// Option 2: Run from local development source
"command": "uv",
"args": [
"--directory",
"/path/to/mcp-server-browser-use",
"run",
"mcp-server-browser-use"
],
"env": {
// --- CRITICAL: Add required API keys here ---
"OPENROUTER_API_KEY": "YOUR_OPENROUTER_API_KEY",
// "OPENAI_API_KEY": "YOUR_KEY_HERE_IF_USING_OPENAI",
// "ANTHROPIC_API_KEY": "YOUR_KEY_HERE_IF_USING_ANTHROPIC",
// ... add other keys based on MCP_MODEL_PROVIDER ...

// --- Optional Overrides (defaults are usually fine) ---
"MCP_MODEL_PROVIDER": "openrouter", // Use OpenRouter as provider
"MCP_MODEL_NAME": "google/gemini-2.5-pro-exp-03-25:free", // Example OpenRouter model
"BROWSER_HEADLESS": "true", // Default: run browser without UI
"BROWSER_USE_LOGGING_LEVEL": "INFO",

// --- Example for connecting to your own browser ---
// "MCP_USE_OWN_BROWSER": "true",
// "CHROME_CDP": "http://localhost:9222",

// Ensure Python uses UTF-8
"PYTHONIOENCODING": "utf-8",
"PYTHONUNBUFFERED": "1",
"PYTHONUTF8": "1"
}
}
}
```

**Important:** Ensure the `command` and `args` correctly point to how you want to run the server (either the installed package or from the source directory). Set the necessary API keys in the `env` section.

## MCP Tools

This server exposes the following tools via the Model Context Protocol:

### Synchronous Tools (Wait for Completion)

1. **`run_browser_agent`**
* **Description:** Executes a browser automation task based on natural language instructions and waits for it to complete. Uses settings prefixed with `MCP_` (e.g., `MCP_HEADLESS`, `MCP_MAX_STEPS`).
* **Arguments:**
* `task` (string, required): The primary task or objective.
* `add_infos` (string, optional): Additional context or hints for the agent (used by `custom` agent type).
* **Returns:** (string) The final result extracted by the agent or an error message.

2. **`run_deep_search`**
* **Description:** Performs in-depth web research on a topic, generates a report, and waits for completion. Uses settings prefixed with `MCP_RESEARCH_` and general `BROWSER_` settings (e.g., `BROWSER_HEADLESS`).
* **Arguments:**
* `research_task` (string, required): The topic or question for the research.
* `max_search_iterations` (integer, optional, default: 10): Max search cycles.
* `max_query_per_iteration` (integer, optional, default: 3): Max search queries per cycle.
* **Returns:** (string) The generated research report in Markdown format, including the file path, or an error message.

## Configuration (Environment Variables)

Configure the server using environment variables. You can set these in your system or place them in a `.env` file in the project root.

| Variable
| :------------------------------
| **LLM Settings**
| `MCP_MODEL_PROVIDER`
| `MCP_MODEL_NAME`
| `MCP_TEMPERATURE`
| `MCP_TOOL_CALLING_METHOD`
| `MCP_MAX_INPUT_TOKENS`
| `MCP_BASE_URL`
| `MCP_API_KEY`
| **Provider API Keys**
| `OPENAI_API_KEY`
| `ANTHROPIC_API_KEY`
| `GOOGLE_API_KEY`
| `AZURE_OPENAI_API_KEY`
| `DEEPSEEK_API_KEY`
| `MISTRAL_API_KEY`
| `OPENROUTER_API_KEY`
| `ALIBABA_API_KEY`
| `MOONSHOT_API_KEY`
| `UNBOUND_API_KEY`
| **Provider Endpoints**
| `OPENAI_ENDPOINT`
| `ANTHROPIC_ENDPOINT`
| `AZURE_OPENAI_ENDPOINT`
| `AZURE_OPENAI_API_VERSION`
| `DEEPSEEK_ENDPOINT`
| `MISTRAL_ENDPOINT`
| `OLLAMA_ENDPOINT`
| `OPENROUTER_ENDPOINT`
| `ALIBABA_ENDPOINT`
| `MOONSHOT_ENDPOINT`
| `UNBOUND_ENDPOINT`
| **Ollama Specific**
| `OLLAMA_NUM_CTX`
| `OLLAMA_NUM_PREDICT`
| **Agent Settings
| `MCP_AGENT_TYPE`
| `MCP_MAX_STEPS`
| `MCP_USE_VISION`
| `MCP_MAX_ACTIONS_PER_STEP`
| `MCP_KEEP_BROWSER_OPEN`
| `MCP_ENABLE_RECORDING`
| `MCP_SAVE_RECORDING_PATH`
| `MCP_AGENT_HISTORY_PATH`
| `MCP_HEADLESS`
| `MCP_DISABLE_SECURITY`
| **Deep Research
| `MCP_RESEARCH_MAX_ITERATIONS`
| `MCP_RESEARCH_MAX_QUERY`
| `MCP_RESEARCH_USE_OWN_BROWSER`
| `MCP_RESEARCH_SAVE_DIR`
| `MCP_RESEARCH_AGENT_MAX_STEPS`
| **Browser Settings
| `MCP_USE_OWN_BROWSER`
| `CHROME_CDP`
| `BROWSER_HEADLESS`
| `BROWSER_DISABLE_SECURITY`
| `CHROME_PATH`
| `CHROME_USER_DATA`
| `BROWSER_TRACE_PATH`
| `BROWSER_WINDOW_WIDTH`
| `BROWSER_WINDOW_HEIGHT`
| **Server & Logging**
| `LOG_FILE`
| `BROWSER_USE_LOGGING_LEVEL`
| `ANONYMIZED_TELEMETRY` | Description | Required? | Default Value | Example Value | | :------------------------------------------------------------------------------------------------------ | :----------------------------- | :-------------------------------- | :-------------------------------- | | | | | | | LLM provider to use. See options below. | **Yes** | `anthropic` | `openrouter` | | Specific model name for the chosen provider. | No | `claude-3-7-sonnet-20250219` | `anthropic/claude-3.7-sonnet` | | LLM temperature (0.0-2.0). Controls randomness. | No | `0.0` | `0.7` | | Method for tool invocation ('auto', 'json_schema', 'function_calling'). Affects `run_browser_agent`. | No | `auto` | `json_schema` | | Max input tokens for LLM context for `run_browser_agent`. | No | `128000` | `64000` | | Optional: Generic override for the LLM provider's base URL. | No | Provider-specific | `http://localhost:8080/v1` | | Optional: Generic override for the LLM provider's API key (takes precedence over provider-specific keys). | No | - | `sk-...` | | **Required based on `MCP_MODEL_PROVIDER` unless `MCP_API_KEY` is set.** | | | | | API Key for OpenAI. | If Used | - | `sk-...` | | API Key for Anthropic. | If Used | - | `sk-ant-...` | | API Key for Google AI (Gemini). | If Used | - | `AIza...` | | API Key for Azure OpenAI. | If Used | - | `...` | | API Key for DeepSeek. | If Used | - | `sk-...` | | API Key for Mistral AI. | If Used | - | `...` | | API Key for OpenRouter. | If Used | - | `sk-or-...` | | API Key for Alibaba Cloud (DashScope). | If Used | - | `sk-...` | | API Key for Moonshot AI. | If Used | - | `sk-...` | | API Key for Unbound AI. | If Used | - | `...` | | Optional: Override default API endpoints. | | | | | OpenAI API endpoint URL. | No | `https://api.openai.com/v1` | | | Anthropic API endpoint URL. | No | `https://api.anthropic.com` | | | **Required if using Azure.** Your Azure resource endpoint. | If Used | - | `https://res.openai.azure.com/` | | Azure API version. | No | `2025-01-01-preview` | `2023-12-01-preview` | | DeepSeek API endpoint URL. | No | `https://api.deepseek.com` | | | Mistral API endpoint URL. | No | `https://api.mistral.ai/v1` | | | Ollama API endpoint URL. | No | `http://localhost:11434` | `http://ollama.local:11434` | | OpenRouter API endpoint URL. | No | `https://openrouter.ai/api/v1` | | | Alibaba (DashScope) API endpoint URL. | No | `https://dashscope...v1` | | | Moonshot API endpoint URL. | No | `https://api.moonshot.cn/v1` | | | Unbound AI API endpoint URL. | No | `https://api.getunbound.ai` | | | | | | | | Context window size for Ollama models. | No | `32000` | `8192` | | Max tokens to predict for Ollama models. | No | `1024` | `2048` | (`run_browser_agent`)** | | | | | | Agent implementation for `run_browser_agent` ('org' or 'custom'). | No | `org` | `custom` | | Max steps per agent run. | No | `100` | `50` | | Enable vision capabilities (screenshot analysis). | No | `true` | `false` | | Max actions per agent step. | No | `5` | `10` | | Keep browser managed by server open between `run_browser_agent` calls (if `MCP_USE_OWN_BROWSER=false`). | No | `false` | `true` | | Enable Playwright video recording for `run_browser_agent`. | No | `false` | `true` | | Path to save agent run video recordings (Required if `MCP_ENABLE_RECORDING=true`). | If Recording | - | `./tmp/recordings` | | Directory to save agent history JSON files. | No | `./tmp/agent_history` | `./agent_runs` | | Run browser without UI specifically for `run_browser_agent` tool. | No | `true` | `false` | | Disable browser security features specifically for `run_browser_agent` tool (use cautiously). | No | `true` | `false` | Settings (`run_deep_search`)** | | | | | | Max search iterations for deep research. | No | `10` | `5` | | Max search queries per iteration. | No | `3` | `5` | | Use a separate browser instance for research (requires `CHROME_CDP` if `MCP_USE_OWN_BROWSER=true`). | No | `false` | `true` | | Directory to save research artifacts (report, results). | No | `./tmp/deep_research/{task_id}` | `./research_output` | | Max steps for sub-agents within deep research. | No | `10` | `15` | (General & Specific Tool Overrides)** | | | | | | Set to true to connect to user's browser via `CHROME_CDP` instead of launching a new one. | No | `false` | `true` | | Connect to existing Chrome via DevTools Protocol URL. Required if `MCP_USE_OWN_BROWSER=true`. | If `MCP_USE_OWN_BROWSER=true` | - | `http://localhost:9222` | | Run browser without visible UI. Primarily affects `run_deep_search`. See also `MCP_HEADLESS`. | No | `true` | `false` | | General browser security setting. See also `MCP_DISABLE_SECURITY`. | No | `false` | `true` | | Path to Chrome/Chromium executable. | No | - | `/usr/bin/chromium-browser` | | Path to Chrome user data directory (for persistent sessions, useful with `CHROME_CDP`). | No | - | `~/.config/google-chrome/Profile 1` | | Directory to save Playwright trace files (useful for debugging). | No | `./tmp/trace` | `./traces` | | Browser window width (pixels). | No | `1280` | `1920` | | Browser window height (pixels). | No | `720` | `1080` | | | | | | | Path for the server log file. | No | `mcp_server_browser_use.log` | `/var/log/mcp_browser.log` | | Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`). | No | `INFO` | `DEBUG` | | Enable/disable anonymized telemetry (`true`/`false`). | No | `true` | `false` |

**Supported LLM Providers (`MCP_MODEL_PROVIDER`):**

`openai`, `azure_openai`, `anthropic`, `google`, `mistral`, `ollama`, `deepseek`, `openrouter`, `alibaba`, `moonshot`, `unbound`

## Connecting to Your Own Browser (CDP)

Instead of having the server launch and manage its own browser instance, you can connect it to a Chrome/Chromium browser that you launch and manage yourself. This is useful for:

* Using your existing browser profile (cookies, logins, extensions).
* Observing the automation directly in your own browser window.
* Debugging complex scenarios.

**Steps:**

1. **Launch Chrome/Chromium with Remote Debugging Enabled:**
Open your terminal or command prompt and run the command appropriate for your operating system. This tells Chrome to listen for connections on a specific port (e.g., 9222).

* **macOS:**
```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
```
*(Adjust the path if Chrome is installed elsewhere)*

* **Linux:**
```bash
google-chrome --remote-debugging-port=9222
# or
chromium-browser --remote-debugging-port=9222
```

* **Windows (Command Prompt):**
```cmd
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
```
*(Adjust the path to your Chrome installation if necessary)*

* **Windows (PowerShell):**
```powershell
& "C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
```
*(Adjust the path to your Chrome installation if necessary)*

**Note:** If port 9222 is already in use, choose a different port (e.g., 9223) and use that same port in the `CHROME_CDP` environment variable.

2. **Configure Environment Variables:**
Set the following environment variables in your `.env` file or system environment before starting the MCP server:

```dotenv
MCP_USE_OWN_BROWSER=true
CHROME_CDP=http://localhost:9222 # Use the same port you launched Chrome with
```
* `MCP_USE_OWN_BROWSER=true`: Tells the server to connect to an existing browser instead of launching one.
* `CHROME_CDP`: Specifies the URL where the server can connect to your browser's DevTools Protocol endpoint.

3. **Run the MCP Server:**
Start the server as usual:
```bash
uv run mcp-server-browser-use
```

Now, when you use the `run_browser_agent` or `run_deep_search` tools, the server will connect to your running Chrome instance instead of creating a new one.

**Important Considerations:**

* The browser launched with `--remote-debugging-port` must remain open while the MCP server is running and needs to interact with it.
* Ensure the `CHROME_CDP` URL is accessible from where the MCP server is running (usually `http://localhost:PORT` if running on the same machine).
* Using your own browser means the server inherits its state (open tabs, logged-in sessions). Be mindful of this during automation.
* Settings like `MCP_HEADLESS`, `BROWSER_HEADLESS`, `MCP_KEEP_BROWSER_OPEN` are ignored when `MCP_USE_OWN_BROWSER=true`. Window size is determined by your browser window.

## Development

```bash
# Install dev dependencies and sync project deps
uv sync --dev

# Install playwright browsers
uv run playwright install

# Run with debugger (Example connecting to own browser via CDP)
# 1. Launch Chrome: google-chrome --remote-debugging-port=9222
# 2. Run inspector command:
npx @modelcontextprotocol/inspector@latest \
-e OPENROUTER_API_KEY=$OPENROUTER_API_KEY \
-e MCP_MODEL_PROVIDER=openrouter \
-e MCP_MODEL_NAME=anthropic/claude-3.7-sonnet \
-e MCP_USE_OWN_BROWSER=true \
-e CHROME_CDP=http://localhost:9222 \
uv --directory . run mcp run src/mcp_server_browser_use/server.py
# Note: Change timeout in inspector's config panel if needed (default is 10 seconds)
```

## Troubleshooting

- **Browser Conflicts**: If *not* using `CHROME_CDP` (`MCP_USE_OWN_BROWSER=false`), ensure no other conflicting Chrome instances are running with the same user data directory if `CHROME_USER_DATA` is specified.
- **CDP Connection Issues**: If using `MCP_USE_OWN_BROWSER=true`:
* Verify Chrome was launched with the `--remote-debugging-port` flag.
* Ensure the port in `CHROME_CDP` matches the port used when launching Chrome.
* Check for firewall issues blocking the connection to the specified port.
* Make sure the browser is still running.
- **API Errors**: Double-check that the correct API key environment variable (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) is set for your chosen `MCP_MODEL_PROVIDER`, or that `MCP_API_KEY` is set. Verify keys and endpoints (`AZURE_OPENAI_ENDPOINT` is required for Azure).
- **Vision Issues**: Ensure `MCP_USE_VISION=true` if using vision features and that your selected LLM model supports vision.
- **Dependency Problems**: Run `uv sync` to ensure all dependencies are correctly installed. Check `pyproject.toml`.
- **Logging**: Check the log file specified by `LOG_FILE` (default: `mcp_server_browser_use.log`) for detailed error messages. Increase `BROWSER_USE_LOGGING_LEVEL` to `DEBUG` for more verbose output.

## License

MIT - See [LICENSE](LICENSE) for details.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/Saik0s/mcp-browser-use

Awesome Lists containing this project

README