{"id":24345961,"url":"https://github.com/jeomon/web-agent","last_synced_at":"2025-04-09T18:12:21.576Z","repository":{"id":271482879,"uuid":"867203713","full_name":"Jeomon/Web-Agent","owner":"Jeomon","description":"Web Agent is a state-of-the-art browser automation tool driven by advanced AI technologies. Designed for seamless navigation and task execution on the web, it intelligently interacts with dynamic web elements, performs searches, downloads files, and adapts to page changes.","archived":false,"fork":false,"pushed_at":"2025-04-08T01:33:41.000Z","size":48299,"stargazers_count":43,"open_issues_count":1,"forks_count":9,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-09T18:12:16.174Z","etag":null,"topics":["agent","gemini","groq","langgraph","llm-agent","ollama","web","web-agent","web-search"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Jeomon.png","metadata":{"files":{"readme":"docs/README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-03T16:20:20.000Z","updated_at":"2025-04-08T20:53:11.000Z","dependencies_parsed_at":"2025-02-18T13:22:09.082Z","dependency_job_id":"37fb179b-326e-4e29-8af9-4a7a98881538","html_url":"https://github.com/Jeomon/Web-Agent","commit_stats":null,"previous_names":["jeomon/web-agent","computer-agent/web-agent"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jeomon%2FWeb-Agent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jeomon%2FWeb-Agent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jeomon%2FWeb-Agent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jeomon%2FWeb-Agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Jeomon","download_url":"https://codeload.github.com/Jeomon/Web-Agent/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248085326,"owners_count":21045139,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","gemini","groq","langgraph","llm-agent","ollama","web","web-agent","web-search"],"created_at":"2025-01-18T10:29:48.809Z","updated_at":"2025-04-09T18:12:21.570Z","avatar_url":"https://github.com/Jeomon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **Web Agent: Automating Browser Tasks with AI**\n\n## Overview\n**Web Agent** is a state-of-the-art browser automation tool driven by advanced AI technologies. Designed for seamless navigation and task execution on the web, it intelligently interacts with dynamic web elements, performs searches, downloads files, and adapts to page changes. By leveraging cutting-edge LLMs and the Playwright framework, Web Agent simplifies complex tasks and enhances productivity.\n\n## Demo\nhttps://github.com/user-attachments/assets/499f248c-b160-45cc-a6a5-2819b6955aab\n\n---\n\n## **Key Features**\n\n1. **AI-Driven Decision Making**  \n   - Utilizes **Groq (Llama 3.3)** and **Gemini (Gemini 2.0)** LLMs for contextual understanding and task execution.\n   - Smart reasoning ensures the correct action is taken based on webpage context.\n\n2. **Interactive Element Detection**  \n   - Automatically identifies clickable elements, input fields, and navigation links.\n   - Utilizes screenshots, bounding boxes, and browser state capture for precise targeting.\n\n3. **Multi-Browser Compatibility**  \n   - Supports **Chromium**, **Firefox**, and **WebKit (Safari)** via Playwright for cross-platform performance.\n\n4. **Dynamic Adaptation**  \n   - Handles dynamic content changes and page layout variations.\n   - Automatically recovers from errors such as misclicks or missed interactions.\n\n5. **Enhanced Performance Options**  \n   - Supports both **visual-based** interactions through screenshots and **non-visual** interactions using accessibility trees for resource-efficient automation.\n\n---\n\n## **How It Works**\n\n1. **State Capture \u0026 Interaction Analysis:**  \n   - Captures browser state and webpage layout, annotates interactive elements, and stores relevant data for decision-making.\n\n2. **LLM Integration:**  \n   - Processes contextual data to determine the best action for completing tasks.\n   \n3. **Action Execution:**  \n   - Automates task execution such as navigating pages, filling forms, and downloading content.\n\n4. **Feedback Loop:**  \n   - Evaluates outcomes and refines strategy for subsequent actions.\n\n---\n\n## **Usage Example**\n\n### **Task: Download a Research Paper**\n\n```bash\npython main.py \"Can you download the attention is all you need paper in ARXIV from google search\"\n```\n\n#### Example Output:\n\n```plaintext\nEnter your query: Can you download the attention is all you need research paper by google from arxiv using google                                                                                     \nThought: I need to find the research paper \"Attention is All You Need\" on arXiv, which is hosted by Google. I will start by searching on Google for the paper and then navigate to the arXiv page to download the PDF.\nAction Name: GoTo Tool\nAction Input: {'url': 'https://www.google.com'}\nObservation: Navigated to https://www.google.com\nThought: I'm on the Google homepage. I need to type the name of the research paper into the search bar. The search bar is element 6.\nAction Name: Type Tool\nAction Input: {'index': 6, 'text': 'Attention is All You Need research paper arxiv'}\nObservation: Pressed Enter\nThought: I have the search results. The third result, element 25, is a link to the PDF of the paper on arXiv. I will click on this link to download the PDF.\nAction Name: Click Tool\nAction Input: {'index': 25}\nObservation: Clicked element 25\nThought: I have navigated to the arXiv page for the paper. The PDF is already displayed in the browser. I need to download the PDF. I will use the download tool to download the file. The url of the file is the current url.\nAction Name: Download Tool\nAction Input: {'index': 0, 'url': 'https://arxiv.org/pdf/1706.03762', 'filename': 'Attention_is_All_You_Need.pdf'}\nObservation: Downloaded Attention_is_All_You_Need.pdf from https://arxiv.org/pdf/1706.03762 and saved it to D:\\Personal Projects\\Web-Search-Agent\\downloads\\Attention_is_All_You_Need.pdf\nThought: I have successfully downloaded the PDF of the \"Attention is All You Need\" research paper. I can now provide the final answer to the user.\nFinal Answer: I have downloaded the \"Attention is All You Need\" research paper from arXiv. The file is saved as `Attention_is_All_You_Need.pdf`.\n```\n\n---\n\n## **Installation Guide**\n\n### **Prerequisites**\n\n- Python 3.8 or higher\n- Playwright installed\n\n### **Installation Steps**\n\n1. **Clone the repository:**\n\n   ```bash\n   git clone https://github.com/Jeomon/Web-Agent.git\n   cd Web-Agent\n   ```\n\n2. **Install dependencies:**\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. **Set up Playwright:**\n\n   ```bash\n   playwright install\n   ```\n\n---\n\n## **Running the Web Agent**\n\nExecute the following command to start the agent:\n\n```bash\npython main.py \"Describe your task here\"\n```\n\nExample:  \n```bash\npython main.py \"Can you download the attention is all you need research paper by google from arxiv using google\"\n```\n\n---\n\n## **Advanced Usage**\n\n- **Enable Debugging:** Set `verbose=True` in the agent configuration for detailed logs.\n- **Custom Instructions:** Modify task-specific instructions for custom workflows.\n\n---\n\n## **Development \u0026 Contributions**\n\nWe welcome contributions to improve Web Agent! Please feel free to fork the repository, submit issues, or create pull requests.\n\n## License\n\nThis project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) - see the [LICENSE](LICENSE) file for details.\n\nFor more information about the AGPL-3.0 license, please visit: https://www.gnu.org/licenses/agpl-3.0.en.html\n\n---\n\n## **References**\n\n- **[Playwright Documentation](https://playwright.dev/docs/intro)**  \n- **[LangGraph Examples](https://github.com/langchain-ai/langgraph/blob/main/examples/web-navigation/web_voyager.ipynb)**  \n- **[vimGPT](https://github.com/ishan0102/vimGPT)**  \n- **[WebVoyager](https://github.com/MinorJerry/WebVoyager)**  \n\n---\n\n## **Contact**\n\nFor queries or support, please reach out via GitHub Issues.\n\nE-mail: jeogeoalukka@gmail.com","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjeomon%2Fweb-agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjeomon%2Fweb-agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjeomon%2Fweb-agent/lists"}