{"id":50013654,"url":"https://github.com/profiq/ai-web-explorer","last_synced_at":"2026-05-20T02:57:31.391Z","repository":{"id":233499228,"uuid":"774895998","full_name":"profiq/ai-web-explorer","owner":"profiq","description":"LLM-based web explorer for representing web apps as state machines.","archived":false,"fork":false,"pushed_at":"2024-10-10T06:51:03.000Z","size":10943,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-28T16:52:09.739Z","etag":null,"topics":["ai","llm","qa-automation","scraping"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/profiq.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-20T11:51:13.000Z","updated_at":"2024-10-10T06:50:55.000Z","dependencies_parsed_at":"2024-04-16T13:22:05.575Z","dependency_job_id":"e69e72bd-b523-4d9b-b77b-458242c07da0","html_url":"https://github.com/profiq/ai-web-explorer","commit_stats":null,"previous_names":["profiq/ai-web-explorer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/profiq/ai-web-explorer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/profiq%2Fai-web-explorer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/profiq%2Fai-web-explorer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/profiq%2Fai-web-explorer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/profiq%2Fai-web-explorer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/profiq","download_url":"https://codeload.github.com/profiq/ai-web-explorer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/profiq%2Fai-web-explorer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33243960,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-19T15:49:41.270Z","status":"online","status_checked_at":"2026-05-20T02:00:07.149Z","response_time":356,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","llm","qa-automation","scraping"],"created_at":"2026-05-20T02:57:26.990Z","updated_at":"2026-05-20T02:57:31.384Z","avatar_url":"https://github.com/profiq.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI Web Explorer\n\nThis repository contains an autonomous web explorer built on top of LLMs. The web explorer\nacts as an intelligent web scraper. Unlike a simple scraping solution, it can perform\nmore complex interactions such as clicking buttons or filling in forms. 
This means\nthe web explorer can reach many states that are invisible to normal scrapers, such as a login form showing\na message that the password is incorrect.\nIt also understands the content of the pages it visits and the meaning of the actions it performs.\n\nThe output of the web explorer is something akin to a state machine of a given website:\na graph of visited states and the actions that can be performed to reach them.\n\nAt profiq we are interested in using LLMs to automate testing on the web. The web explorer can be\nused to create a knowledge base that can then be used to suggest test cases and to automate\ntheir implementation.\n\n## How it works\n\nHere is a basic diagram of how the web explorer works:\n\n![Web explorer schema](./docs/explorer.png)\n\n1. The user provides the domain of the website they want to explore.\n2. A [Playwright](https://playwright.dev/python/) instance is created and the website is opened.\n3. An LLM is used to generate a title for the current state, e.g. \"Login - Empty form\".\n4. The explorer checks whether it has already visited a state with a very similar title. If it has,\n   it sets that state as current. If not, it creates a new state and uses an LLM to generate a textual\n   description of the page and a list of possible actions to perform.\n5. The explorer selects a random action from the list of possible actions it hasn't tried yet\n   and performs it. If the current state has no untried actions left, it checks other states for untried actions.\n   If no state has any untried actions left, the exploration is finished.\n6. Steps 3-5 are repeated for a given number of iterations.\n7. The output is a state machine in DOT or JSON format.\n\nHere is an example of a state machine generated by the web explorer for Hacker News:\n\n![Generated State machine](./docs/state_machine.png)\n\n## Installation\n\nWe use the Rye package manager to manage dependencies. 
You can install it by following the\ninstructions at https://rye-up.com/.\n\nAfter you have Rye installed, you can install the web explorer by cloning this repository\nand installing the required dependencies:\n\n```bash\ngit clone https://github.com/profiq/ai-web-explorer.git\ncd ai-web-explorer\nrye sync\nrye run playwright install\n```\n\n## Usage\n\nThe web explorer is a command-line tool. You can run it by executing the following commands:\n\n```bash\nexport OPENAI_API_KEY=[YOUR_OPENAI_API_KEY]\nrye run explore [DOMAIN] -i [NO_OF_ITERATIONS]\n```\n\nHere, `[DOMAIN]` is the domain of the website you want to explore. Do not include `http://` or `https://`.\nThe `[NO_OF_ITERATIONS]` parameter specifies how many iterations the web explorer should perform.\n\nThe web explorer will start exploring the website and will output the state machine it has\ndiscovered in DOT format. DOT graphs can be rendered as images using the `dot` command-line\ntool or online tools such as [WebGraphviz](http://www.webgraphviz.com/).\n\n### Additional options\n\n- `-t` or `--store-titles` - switch to an interactive mode where you manually confirm the titles\n  of the pages you visit and correct them if needed. Confirmed titles are stored in a JSON file\n  together with the HTML content of the page. This JSON file can then be used to fine-tune an LLM\n  to generate better titles in the future.\n\n- `-l [LOGIN]` or `--login [LOGIN]` - if the website you are exploring requires a login, you can use this option\n  to provide the login credentials. `[LOGIN]` should be a string in the format `username:password`.\n  If the username or password contains special characters, put the whole string in quotes.\n\n- `-o [OUTPUT]` or `--output [OUTPUT]` - specify the output format of the state machine. You can choose between\n  `dot`, `json`, or `jsonsimple`. The `dot` format is well suited to visualizing the website as a graph. 
The `json`\n  format is recommended for storing the full state of the exploration so you can continue it later. The `jsonsimple`\n  format is a simplified version of the `json` format that omits certain attributes such as title embeddings or action\n  priority. We recommend this option for processing the website \"state machine\" with other tools or in LLM prompts.\n\n- `-r [STATE_PATH]` or `--restore [STATE_PATH]` - resume exploration from a previous state. `[STATE_PATH]` should be\n  a path to a JSON file containing the state of the exploration (which can be created using the `-o json` option).\n\n- `-a [TEXT]` or `--additional-information [TEXT]` - provide additional information that will be added to all LLM prompts\n  to further guide the exploration.\n\n## Future work\n\nThe web explorer is an ongoing research project. There are many things we would like to try and improve:\n\n- Using different LLMs to generate titles and descriptions of the pages.\n- Removing invisible elements from the page before generating titles, descriptions, and actions.\n- Integrating it with other tools such as test case generators.\n- Using collected titles to fine-tune an LLM to generate better titles in the future.\n\nFeel free to contribute by creating an issue to report bugs or suggest new features!\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprofiq%2Fai-web-explorer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprofiq%2Fai-web-explorer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprofiq%2Fai-web-explorer/lists"}