Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gregpr07/browser-use
Last synced: about 1 month ago
JSON representation
- Host: GitHub
- URL: https://github.com/gregpr07/browser-use
- Owner: gregpr07
- Created: 2024-10-31T16:00:56.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-11-05T15:17:09.000Z (about 1 month ago)
- Last Synced: 2024-11-05T15:21:33.983Z (about 1 month ago)
- Language: Python
- Size: 2.32 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-ChatGPT-repositories - browser-use - Open-Source Web Automation library with any LLM (Browser-extensions)
README
# 🌐 Browser-Use
### Open-Source Web Automation with LLMs
[![GitHub stars](https://img.shields.io/github/stars/gregpr07/browser-use?style=social)](https://github.com/gregpr07/browser-use/stargazers)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![Discord](https://img.shields.io/discord/1303749220842340412?color=7289DA&label=Discord&logo=discord&logoColor=white)](https://discord.gg/uaCtrbbv)

Let LLMs interact with websites through a simple interface.
## Short Example
```bash
pip install browser-use
```

```python
import asyncio

from langchain_openai import ChatOpenAI
from browser_use import Agent

agent = Agent(
    task="Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour.",
    llm=ChatOpenAI(model="gpt-4o"),
)

async def main():
    await agent.run()

asyncio.run(main())
```

## Demo
Prompt: Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour. (1x speed)
Prompt: Search the top 3 AI companies 2024 and find out what concrete hardware each is using for their model. (1x speed)
Prompt: Go to kayak.com and find a one-way flight from Zürich to San Francisco on 12 January 2025. (2.5x speed)
Prompt: Opening new tabs and searching for images for these people: Albert Einstein, Oprah Winfrey, Steve Jobs. (2.5x speed)
## Local Setup
1. Create a virtual environment and install dependencies:
```bash
# I recommend using uv
python -m venv .venv   # or: uv venv
source .venv/bin/activate
pip install .
```

2. Add your API keys to the `.env` file:
```bash
cp .env.example .env
```
E.g. for OpenAI:
```bash
OPENAI_API_KEY=
```

You can use any LLM model supported by LangChain by adding the appropriate environment variables. See [langchain models](https://python.langchain.com/docs/integrations/chat/) for available options.
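For example, to run the short example above against Anthropic instead of OpenAI, add `ANTHROPIC_API_KEY` to `.env` and swap the chat model. This is a minimal sketch; the model name and arguments are taken from the Chain of Agents example below:

```python
import asyncio

from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic

from browser_use import Agent

load_dotenv()  # reads ANTHROPIC_API_KEY (and any other keys) from .env

agent = Agent(
    task="Go to hackernews on show hn and give me the top 10 post titles.",
    llm=ChatAnthropic(model="claude-3-5-sonnet-20240620", timeout=25, stop=None),
)

asyncio.run(agent.run())
```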
## Features
- Universal LLM Support - Works with any Language Model
- Interactive Element Detection - Automatically finds interactive elements
- Multi-Tab Management - Seamless handling of browser tabs
- XPath Extraction for scraping functions - No more manual DevTools inspection
- Vision Model Support - Process visual page information
- Customizable Actions - Add your own browser interactions (e.g. add data to a database which the LLM can use; see the sketch under Advanced Examples below)
- Handles dynamic content - don't worry about cookies or changing content
- Chain-of-thought prompting with memory - Solve long-term tasks
- Self-correcting - If the LLM makes a mistake, the agent will self-correct its actions

## Advanced Examples
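### Custom Actions

The customizable-actions feature above lets you expose your own functions to the agent through the `Controller`. The decorator-style registration below is an assumption modeled on later versions of the library and may not match this revision exactly; check the source for the current interface.

```python
from langchain_openai import ChatOpenAI

from browser_use import Agent, Controller

controller = Controller()

# Assumption: custom actions are registered on the controller with a natural-
# language description the LLM can pick from; the exact decorator name and
# signature may differ in this version of the library.
@controller.action("Append a company name to results.txt")
def save_company(name: str) -> str:
    with open("results.txt", "a") as f:
        f.write(name + "\n")
    return f"Saved {name}"

agent = Agent(
    task="Find 3 VC firms in New York and save their names to results.txt.",
    llm=ChatOpenAI(model="gpt-4o"),
    controller=controller,
)

# ... then await agent.run() inside an async function, as in the short example
```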
### Chain of Agents
You can persist the browser across multiple agents and chain them together.
```python
from asyncio import run
from browser_use import Agent, Controller
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
load_dotenv()

# Persist browser state across agents
controller = Controller()

# Initialize browser agents
agent1 = Agent(
    task="Open 3 VCs websites in the New York area.",
    llm=ChatAnthropic(model="claude-3-5-sonnet-20240620", timeout=25, stop=None),
    controller=controller,
)
agent2 = Agent(
    task="Give me the names of the founders of the companies in all tabs.",
    llm=ChatAnthropic(model="claude-3-5-sonnet-20240620", timeout=25, stop=None),
    controller=controller,
)

run(agent1.run())
founders, history = run(agent2.run())
print(founders)
```

You can use the `history` to run the agents again deterministically.
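If you prefer a single event loop over calling `run()` twice, the same chain can be driven from one async entry point. This sketch reuses the `agent1` and `agent2` objects defined above:

```python
import asyncio

async def main():
    # The shared controller keeps the browser session alive between agents
    await agent1.run()
    founders, history = await agent2.run()
    print(founders)

asyncio.run(main())
```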
## Command Line Usage
Run examples directly from the command line (clone the repo first):
```bash
python examples/try.py "Your query here" --provider [openai|anthropic]
```

### Anthropic
You need to add `ANTHROPIC_API_KEY` to your environment variables. Example usage:
```bash
python examples/try.py "Search the top 3 AI companies 2024 and find out in 3 new tabs what hardware each is using for their models" --provider anthropic
```

### OpenAI
You need to add `OPENAI_API_KEY` to your environment variables. Example usage:
```bash
python examples/try.py "Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour." --provider openai
```

## Supported Models
All LangChain chat models are supported. Tested with:
- GPT-4o
- GPT-4o Mini
- Claude 3.5 Sonnet
- Llama 3.1 405B

## Limitations
- When extracting page content, the message length increases and the LLM gets slower.
- Currently one agent run costs about $0.01.
- Sometimes it tries to repeat the same task over and over again.
- Some elements that you want to interact with might not be extracted.
- What should we focus on the most?
- Robustness
- Speed
- Cost reduction

## Roadmap
- [x] Save agent actions and execute them deterministically
- [ ] Pydantic forced output
- [ ] Third party SERP API for faster Google Search results
- [ ] Multi-step action execution to increase speed
- [ ] Test on mind2web dataset
- [ ] Add more browser actions

## Contributing
Contributions are welcome! Feel free to open issues for bugs or feature requests.
Join the [Discord](https://discord.gg/Wy9qE4TKHZ) for discussions and support.
---
Star ⭐ this repo if you find it useful!
Made with ❤️ by the Browser-Use team