{"id":27629262,"url":"https://github.com/itbanque/talk2dom","last_synced_at":"2025-04-23T15:15:52.248Z","repository":{"id":287224532,"uuid":"964027305","full_name":"itbanque/talk2dom","owner":"itbanque","description":"Locate web elements using natural language. Powered by LLM. Works with Selenium.","archived":false,"fork":false,"pushed_at":"2025-04-17T20:49:22.000Z","size":61,"stargazers_count":1,"open_issues_count":7,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-04-23T15:15:42.585Z","etag":null,"topics":["ai","automation-testing","llm","locator","mobile","openai","qa","selenium","web"],"latest_commit_sha":null,"homepage":"https://talk2dom.itbanque.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/itbanque.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-10T15:20:14.000Z","updated_at":"2025-04-17T20:49:26.000Z","dependencies_parsed_at":"2025-04-17T21:52:13.273Z","dependency_job_id":null,"html_url":"https://github.com/itbanque/talk2dom","commit_stats":null,"previous_names":["itbanque/talk2dom"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itbanque%2Ftalk2dom","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itbanque%2Ftalk2dom/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itbanque%2Ftalk2dom/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itbanque%2
Ftalk2dom/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/itbanque","download_url":"https://codeload.github.com/itbanque/talk2dom/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250457792,"owners_count":21433734,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","automation-testing","llm","locator","mobile","openai","qa","selenium","web"],"created_at":"2025-04-23T15:15:51.490Z","updated_at":"2025-04-23T15:15:52.238Z","avatar_url":"https://github.com/itbanque.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# talk2dom — Locate Web Elements with One Sentence\n\n\u003e 📚 [English](./README.md) | [中文](./README.zh.md)\n\n![PyPI](https://img.shields.io/pypi/v/talk2dom)\n[![PyPI Downloads](https://static.pepy.tech/badge/talk2dom)](https://pepy.tech/projects/talk2dom)\n![Stars](https://img.shields.io/github/stars/itbanque/talk2dom?style=social)\n![License](https://img.shields.io/github/license/itbanque/talk2dom)\n![CI](https://github.com/itbanque/talk2dom/actions/workflows/test.yaml/badge.svg)\n\n**talk2dom** is a focused utility that solves one of the hardest problems in browser automation and UI testing:\n\n\u003e ✅ **Finding the correct UI element on a page.**\n\n---\n\n[![Watch the demo on YouTube](https://img.youtube.com/vi/6S3dOdWj5Gg/0.jpg)](https://youtu.be/6S3dOdWj5Gg)\n\n\n## 🧠 Why `talk2dom`\n\nIn most automated testing or LLM-driven web navigation tasks, the real challenge is not how to click or type — it's how to **locate the right 
element**.\n\nThink about it:\n\n- Clicking a button is easy — *if* you know its selector.\n- Typing into a field is trivial — *if* you've already located the right input.\n- But finding the correct element among hundreds of `\u003cdiv\u003e`, `\u003cspan\u003e`, or deeply nested Shadow DOM trees? That's the hard part.\n\n**`talk2dom` is built to solve exactly that.**\n\n---\n\n## 🎯 What it does\n\n`talk2dom` helps you locate elements. It:\n\n- Understands natural language instructions and turns them into browser actions  \n- Supports single-command execution or persistent interactive sessions  \n- Uses LLMs (like GPT-4 or Claude) to analyze live HTML and intent  \n- Returns flexible output (actions, selectors, or both) depending on the instruction and model response  \n- Works with both desktop and mobile browsers via Selenium\n\n---\n\n## 🤔 Why Selenium?\n\nWhile there are many modern tools for controlling browsers (like Playwright or Puppeteer), **Selenium remains the most robust and cross-platform solution**, especially when dealing with:\n\n- ✅ Safari (WebKit)\n- ✅ Firefox\n- ✅ Mobile browsers\n- ✅ Cross-browser testing grids\n\nTools like Playwright or Puppeteer often have limited support for anything beyond Chrome-based browsers. 
Selenium, by contrast, has battle-tested support across all major platforms and continues to be the industry standard in enterprise and CI/CD environments.\n\nThat’s why `talk2dom` is designed to integrate directly with Selenium — it works where the real-world complexity lives.\n\n---\n\n## 📦 Installation\n\n```bash\npip install talk2dom\n```\n\n---\n\n## 🧩 Code-Based ActionChain Mode\n\nFor developers and testers who prefer structured Python control, `ActionChain` lets you drive the browser step-by-step.\n\n### Basic Usage\n\nBy default, talk2dom uses gpt-4o-mini to balance performance and cost.\nHowever, during testing, gpt-4o has shown the best performance for this task.\n\n#### Make sure you have OPENAI_API_KEY\n\n```bash\nexport OPENAI_API_KEY=\"...\"\n```\n\nNote: All models must support chat completion APIs and follow OpenAI-compatible schema.\n\n#### Sample Code\n\n```python\nfrom selenium import webdriver\nfrom selenium.webdriver.common.keys import Keys\n\nfrom talk2dom import ActionChain\n\ndriver = webdriver.Chrome()\n\nActionChain(driver) \\\n    .open(\"http://www.python.org\") \\\n    .find(\"Find the Search box\") \\\n    .type(\"pycon\") \\\n    .type(Keys.RETURN) \\\n    .assert_page_not_contains(\"No results found.\") \\\n    .close()\n```\n\n### Free Models\n\nYou can also use `talk2dom` with free models like `llama-3.3-70b-versatile` from [Groq](https://groq.com/).\n\n\n### Full page vs Scoped element queries\nThe `find()` function can be used to query the entire page or a specific element.\nYou can pass either a full Selenium `driver` or a specific `WebElement` to scope the locator to part of the page.\n#### Why/When use `WebElement` instead of `driver`?\n\n1. Reduce Token Usage — Passing a smaller HTML subtree (like a modal or container) instead of the full page saves LLM tokens, reducing latency and cost.\n2. 
Improve Locator Accuracy — Scoping the query helps the LLM focus on relevant content, which is especially helpful for nested or isolated components like popups, drawers, and cards.\n\nYou don’t need to extract HTML manually — `talk2dom` will automatically use `outerHTML` from any `WebElement` you pass in.\n\n---\n\n\n## ✨ Philosophy\n\n\u003e Our goal is not to control the browser — you still control your browser. \n\u003e Our goal is to **find the right DOM element**, so you can tell the browser what to do.\n\n---\n\n## ✅ Key Features\n\n- 💬 Natural language interface to control the browser  \n- 🔁 Persistent session for multi-step interactions  \n- 🧠 LLM-powered understanding of high-level intent  \n- 🧩 Outputs: actionable XPath/CSS selectors or ready-to-run browser steps  \n- 🧪 Built-in assertions and step validations  \n- 💡 Works with both CLI scripts and interactive chat\n\n---\n\n## 📄 License\n\nApache 2.0\n\n---\n\n## Contributing\n\nPlease read [CONTRIBUTING.md](https://github.com/itbanque/talk2dom/blob/main/CONTRIBUTING.md) for details on our code of conduct, and the process for submitting pull requests to us.\n\n---\n\n## 💬 Questions or ideas?\n\nWe’d love to hear how you're using `talk2dom` in your AI agents or testing flows.  \nFeel free to open issues or discussions!  \nYou can also tag us on GitHub if you’re building something interesting with `talk2dom`!  \n⭐️ If you find this project useful, please consider giving it a star!","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitbanque%2Ftalk2dom","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fitbanque%2Ftalk2dom","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitbanque%2Ftalk2dom/lists"}