{"id":18376333,"url":"https://github.com/fmind/bromate","last_synced_at":"2025-10-10T10:35:58.873Z","repository":{"id":255883527,"uuid":"837814749","full_name":"fmind/bromate","owner":"fmind","description":"Web browser automation through agentic workflows.","archived":false,"fork":false,"pushed_at":"2024-09-14T12:12:34.000Z","size":13126,"stargazers_count":17,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-21T00:02:12.236Z","etag":null,"topics":["agent","automation","browser","gemini","generative-ai","python","selenium"],"latest_commit_sha":null,"homepage":"https://fmind.github.io/bromate/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fmind.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-04T05:45:34.000Z","updated_at":"2025-03-15T09:17:46.000Z","dependencies_parsed_at":"2024-11-06T00:37:01.716Z","dependency_job_id":null,"html_url":"https://github.com/fmind/bromate","commit_stats":null,"previous_names":["fmind/bromate"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fmind%2Fbromate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fmind%2Fbromate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fmind%2Fbromate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fmind%2Fbromate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fmind","download_ur
l":"https://codeload.github.com/fmind/bromate/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247547458,"owners_count":20956558,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","automation","browser","gemini","generative-ai","python","selenium"],"created_at":"2024-11-06T00:22:55.174Z","updated_at":"2025-10-10T10:35:53.819Z","avatar_url":"https://github.com/fmind.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bromate\n\n[![check.yml](https://github.com/fmind/bromate/actions/workflows/check.yml/badge.svg)](https://github.com/fmind/bromate/actions/workflows/check.yml)\n[![publish.yml](https://github.com/fmind/bromate/actions/workflows/publish.yml/badge.svg)](https://github.com/fmind/bromate/actions/workflows/publish.yml)\n[![Documentation](https://img.shields.io/badge/documentation-available-brightgreen.svg)](https://fmind.github.io/bromate/)\n[![License](https://img.shields.io/github/license/fmind/bromate)](https://github.com/fmind/bromate/blob/main/LICENCE.txt)\n[![Release](https://img.shields.io/github/v/release/fmind/bromate)](https://github.com/fmind/bromate/releases)\n\n**Bromate** is an experimental project that explores the capabilities of agent workflows for automating web browser interactions.\n\n## Overview\n\nBromate leverages the power of large language models (LLMs), specifically Google's Gemini, to understand user requests expressed in natural language and translate them into a series of actions that automate web browsing tasks.\n\nIt utilizes 
Selenium for browser control and interaction, offering a seamless way to automate complex workflows within a web browser environment.\n\n## Prerequisites\n\nBefore using Bromate, you need to obtain an API key from Google for the Gemini API. You can get a key by following these steps:\n\n1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey) and click on \"Get an API key\".\n2. [Set the secret key as an environment variable in your system](https://www3.ntu.edu.sg/home/ehchua/programming/howto/Environment_Variables.html).\n\n**Setting the API Key in `.env` during development:**\n\nYou can set the API key in a `.env` file in the project repository:\n\n```bash\nGOOGLE_API_KEY=YOUR_API_KEY\n```\n\nYou can check if the key is configured by typing `echo $GOOGLE_API_KEY` in your shell.\n\n## Installation\n\nBromate is available on PyPI and can be easily installed using pip:\n\n```bash\npip install bromate\n```\n\n## Usage\n\nTo use Bromate, you can provide a natural language query describing the task you want to automate. Bromate will then interact with the agent (Gemini) to interpret the query and generate a sequence of actions to be executed by the Selenium WebDriver.\n\n**Example 1: Subscribe to the MLOps Community Newsletter:**\n\n[![MLOps Demo](https://img.youtube.com/vi/EYjwaZjfQ4E/0.jpg)](https://www.youtube.com/watch?v=EYjwaZjfQ4E)\n\n\u003e bromate \"Open the https://MLOps.Community website. Click on the 'Join' link. Write the address 'hello@mlops'\"\n\n**Example 2: Find the latest version of the Python language:**\n\n[![Python Demo](https://img.youtube.com/vi/dUSk9_8JnE4/0.jpg)](https://www.youtube.com/watch?v=dUSk9_8JnE4)\n\n\u003e bromate --interaction.stay_open=False --agent.name \"gemini-1.5-pro-latest\" \"Go to Python.org. Click on the downloads page. Click on the PEP link for the future Python release. 
Summarize the release schedule dates.\"\n\n## Arguments\n\n```bash\nbromate -h\nusage: bromate [-h] [--agent JSON] [--agent.api_key {SecretStr,null}] [--agent.name str] [--agent.temperature float] [--agent.candidate_count int]\n               [--agent.max_output_tokens int] [--agent.system_instructions str] [--action JSON] [--action.sleep_time float] [--driver JSON]\n               [--driver.name {Chrome,Firefox}] [--driver.keep_alive bool] [--driver.maximize_window bool] [--execution JSON] [--execution.stop_actions list[str]]\n               [--execution.default_message str] [--interaction JSON] [--interaction.stay_open bool] [--interaction.interactive bool] [--interaction.max_interactions int]\n               QUERY\n\nExecute actions on web browser from a user query in natural language.\n\npositional arguments:\n  QUERY                 User query in natural language\n\noptions:\n  -h, --help            show this help message and exit\n\nagent options:\n  Configuration of the agent\n\n  --agent JSON          set agent from JSON string\n  --agent.api_key {SecretStr,null}\n                        API key of the agent platform (Google) (default: **********)\n  --agent.name str      Name of the agent to use (default: gemini-1.5-flash-latest)\n  --agent.temperature float\n                        Temperature of the agent (default: 0.0)\n  --agent.candidate_count int\n                        Number of candidates to generate (default: 1)\n  --agent.max_output_tokens int\n                        Maximum output tokens to generate (default: 1000)\n  --agent.system_instructions str\n                        System instructions for the agent (default: You are a browser automation system. Your goal is to understand the user request and execute actions\n                        on its browser using the tools at your disposal. 
After each step, you will receive a screenshot and the page source of the current browser\n                        window.)\n\naction options:\n  Configuration for all actions\n\n  --action JSON         set action from JSON string\n  --action.sleep_time float\n                        Time to sleep after loading a page (default: 0.5)\n\ndriver options:\n  Configuration of the web driver\n\n  --driver JSON         set driver from JSON string\n  --driver.name {Chrome,Firefox}\n                        Name of the driver to use (default: Chrome)\n  --driver.keep_alive bool\n                        Keep the browser open at the end of the execution (default: True)\n  --driver.maximize_window bool\n                        Maximize the browser window at the start of the execution (default: True)\n\nexecution options:\n  Configuration of the execution\n\n  --execution JSON      set execution from JSON string\n  --execution.stop_actions list[str]\n                        Name of actions that can stop the execution (default: ['done'])\n  --execution.default_message str\n                        Default message to send to the agent when no input is provided by the user (default: Continue the execution if necessary or call the done tool if\n                        you are done)\n\ninteraction options:\n  Configuration of the interaction\n\n  --interaction JSON    set interaction from JSON string\n  --interaction.stay_open bool\n                        Keep the browser open before exiting (default: True)\n  --interaction.interactive bool\n                        Ask for user input after every action (default: False)\n  --interaction.max_interactions int\n                        Maximum number of interactions for the agent (default: 5)\n```\n\n## How it Works\n\nBromate operates in a loop, continuously interacting with the Gemini agent and the Selenium WebDriver to automate browser tasks. Here's a breakdown of the core behavior:\n\n**1. 
Initialization:**\n\n- A Selenium WebDriver is initialized based on your configuration (e.g., Chrome or Firefox). This provides the interface for controlling the browser.\n- The Gemini LLM is initialized, and its \"tools\" are defined. These tools correspond to the actions that the model can instruct the WebDriver to perform. The available actions are defined in the `src/bromate/actions.py` file.  Examples include:\n    - `get`: Open a specific URL in the browser.\n    - `click`: Click on an element identified by a CSS selector.\n    - `write`: Enter text into an element.\n    - `back`: Navigate back to the previous page.\n    - `done`: Signal the end of the automation task.\n\n**2. Action Selection and Execution:**\n\n- You provide an initial query in natural language describing the task you want to automate.\n- At each step, the Gemini model analyzes the current state of the browser (HTML code and screenshot), considers your query and previous interactions, and decides on the most relevant action to take.\n- The chosen action is then executed by the Selenium WebDriver, modifying the browser state.\n\n**3. Feedback Loop:**\n\n- After each action is performed, the updated HTML code of the page and a screenshot of the browser window are sent back to the Gemini model. This provides the model with feedback about the effects of its actions.\n- The loop continues until the model either decides to execute the `done` action, indicating the task is complete, or a maximum number of interactions is reached.\n\nThis iterative process allows Bromate to dynamically adapt to changes in the browser environment and perform complex automation tasks based on natural language instructions.\n\n## Development\n\nBromate's development workflow is managed using Pyinvoke. 
The `tasks/` folder contains various tasks for managing the project:\n\n- **checks.py:** Tasks for code quality checks (linting, type checking, testing, security).\n- **cleans.py:** Tasks for cleaning up build artifacts and caches.\n- **containers.py:** Tasks for building and running Docker containers.\n- **docs.py:** Tasks for generating and serving API documentation.\n- **formats.py:** Tasks for code formatting.\n- **installs.py:** Tasks for installing dependencies and pre-commit hooks.\n- **packages.py:** Tasks for building and publishing Python packages.\n- **publishes.py:** Tasks for publishing artifacts on software repositories.\n\n## License\n\nBromate is licensed under the MIT License. See the [LICENSE](LICENCE.txt) file for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffmind%2Fbromate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffmind%2Fbromate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffmind%2Fbromate/lists"}