{"id":15917290,"url":"https://github.com/reidbarber/webmarker","last_synced_at":"2025-03-24T07:32:00.195Z","repository":{"id":239530972,"uuid":"793361874","full_name":"reidbarber/webmarker","owner":"reidbarber","description":"Mark web pages for use with vision-language models","archived":false,"fork":false,"pushed_at":"2025-01-07T02:36:00.000Z","size":685,"stargazers_count":30,"open_issues_count":6,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-19T03:13:24.453Z","etag":null,"topics":["claude","computer-use","computer-using-agent","cua","gemini","gpt4o","gpt4v","llms","operator","playwright","prompt","prompt-engineering","qwen-vl","set-of-mark","som","vision-language-model"],"latest_commit_sha":null,"homepage":"https://webmarkerjs.com/","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/reidbarber.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-29T04:41:27.000Z","updated_at":"2025-02-24T14:28:15.000Z","dependencies_parsed_at":"2025-01-07T03:30:45.846Z","dependency_job_id":"289e743a-e13b-45cb-8558-a8c92a8d5a61","html_url":"https://github.com/reidbarber/webmarker","commit_stats":{"total_commits":75,"total_committers":1,"mean_commits":75.0,"dds":0.0,"last_synced_commit":"c7463a3967e59e8bc284163a8a5494bb76829d2b"},"previous_names":["reidbarber/webmarker"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reidbarber%2Fwebmarker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/reidbarber%2Fwebmarker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reidbarber%2Fwebmarker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reidbarber%2Fwebmarker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/reidbarber","download_url":"https://codeload.github.com/reidbarber/webmarker/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245227500,"owners_count":20580891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["claude","computer-use","computer-using-agent","cua","gemini","gpt4o","gpt4v","llms","operator","playwright","prompt","prompt-engineering","qwen-vl","set-of-mark","som","vision-language-model"],"created_at":"2024-10-06T18:09:58.498Z","updated_at":"2025-03-24T07:32:00.188Z","avatar_url":"https://github.com/reidbarber.png","language":"TypeScript","readme":"\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://github.com/reidbarber/webmarker/assets/8961049/cd3fd0ff-b31f-42b3-b225-207ffded1640\"\u003e\n    \u003cimg width=\"400px\" alt=\"WebMarker\" src=\"https://github.com/reidbarber/webmarker/assets/8961049/b017e0c2-a2f7-4b4d-a1e9-9b2cc91d8ae6\"\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\nMark web pages for use with vision-language models.\n\u003c/p\u003e\n\n## Overview\n\n**WebMarker** adds visual markings with labels to elements on a web page. 
This can be used for [Set-of-Mark](https://github.com/microsoft/SoM) prompting, which improves the visual grounding abilities of vision-language models such as GPT-4o, Claude 3.5, and Google Gemini 1.5.\n\n![Screenshot of marked Google homepage](https://github.com/user-attachments/assets/722e1034-06d4-4ccd-a7d6-f03749435681)\n\n## How it works\n\n**1. Call the `mark()` function**\n\nThis marks the interactive elements on the page and returns an object of marked elements, in which each key is a mark label string and each value is an object with the following properties:\n\n- `element`: The interactive element that was marked.\n- `markElement`: The label element that was added to the page.\n- `boundingBoxElement`: The bounding box element that was added to the page.\n\nYou can use this information to build your prompt for the vision-language model.\n\n**2. Send a screenshot of the marked page to a vision-language model, along with your prompt**\n\nExample prompt:\n\n```javascript\nlet markedElements = mark();\n\nlet prompt = `The following is a screenshot of a web page.\n\nInteractive elements have been marked with red bounding boxes and labels.\n\nWhen referring to elements, use the labels to identify them.\n\nReturn an action and element to perform the action on.\n\nAvailable actions: click, hover\n\nAvailable elements:\n${Object.keys(markedElements)\n  .map((label) =\u003e `- ${label}`)\n  .join(\"\\n\")}\n\nExample response: click 0\n`;\n```\n\n**3. Programmatically interact with the marked elements**\n\nIn a web browser (e.g. 
via Playwright), interact with elements as needed.\n\nFor prompting or agent ideas, see the [WebVoyager](https://github.com/MinorJerry/WebVoyager) paper.\n\n## Playwright example\n\n```javascript\n// Inject the WebMarker library into the page\nawait page.addScriptTag({\n  url: \"https://cdn.jsdelivr.net/npm/webmarker-js/dist/main.js\",\n});\n\n// Mark the page and get the marked elements\nlet markedElements = await page.evaluate(async () =\u003e await WebMarker.mark());\n\n// Click a marked element\nawait page.locator('[data-mark-label=\"0\"]').click();\n\n// (Optional) Check if page is marked\nlet isMarked = await page.evaluate(async () =\u003e await WebMarker.isMarked());\n\n// (Optional) Unmark the page\nawait page.evaluate(async () =\u003e await WebMarker.unmark());\n```\n\n## Options\n\n### selector\n\nA custom CSS selector to specify which elements to mark.\n\n- Type: `string`\n- Default: `'a[href], button, input:not([type=\"hidden\"]), select, textarea, summary, [role=\"button\"], [tabindex]:not([tabindex=\"-1\"])'`\n\n### getLabel\n\nProvide a function for generating labels. By default, labels are generated as integers starting from 0.\n\n- Type: `(element: Element, index: number) =\u003e string`\n- Default: `(_, index) =\u003e index.toString()`\n\n### markAttribute\n\nA custom attribute to add to the marked elements. This attribute contains the label of the mark.\n\n- Type: `string`\n- Default: `\"data-mark-label\"`\n\n### markPlacement\n\nThe placement of the mark relative to the element.\n\n- Type: `'top' | 'top-start' | 'top-end' | 'right' | 'right-start' | 'right-end' | 'bottom' | 'bottom-start' | 'bottom-end' | 'left' | 'left-start' | 'left-end'`\n- Default: `'top-start'`\n\n### markStyle\n\nA CSS style to apply to the label element. 
You can also specify a function that returns a CSS style object.\n\n- Type: `Partial\u003cCSSStyleDeclaration\u003e | (element: Element) =\u003e Partial\u003cCSSStyleDeclaration\u003e`\n- Default: `{backgroundColor: \"red\", color: \"white\", padding: \"2px 4px\", fontSize: \"12px\", fontWeight: \"bold\"}`\n\n### boundingBoxStyle\n\nA CSS style to apply to the bounding box element. You can also specify a function that returns a CSS style object. Bounding boxes are only shown if `showBoundingBoxes` is `true`.\n\n- Type: `Partial\u003cCSSStyleDeclaration\u003e | (element: Element) =\u003e Partial\u003cCSSStyleDeclaration\u003e`\n- Default: `{outline: \"2px dashed red\", backgroundColor: \"transparent\"}`\n\n### showBoundingBoxes\n\nWhether or not to show bounding boxes around the elements.\n\n- Type: `boolean`\n- Default: `true`\n\n### containerElement\n\nProvide a container element to query the elements to be marked. By default, the container element is `document.body`.\n\n- Type: `Element`\n- Default: `document.body`\n\n### viewPortOnly\n\nOnly mark elements that are visible in the current viewport.\n\n- Type: `boolean`\n- Default: `false`\n\n### Advanced example\n\n```typescript\nconst markedElements = mark({\n  // Only mark buttons and inputs\n  selector: \"button, input\",\n  // Use test id attribute for marker labels\n  markAttribute: \"data-test-id\",\n  // Use a blue mark with white text\n  markStyle: { color: \"white\", backgroundColor: \"blue\", padding: \"5px\" },\n  // Use a blue dashed outline with a transparent and slightly blue background\n  boundingBoxStyle: { outline: \"2px dashed blue\", backgroundColor: \"rgba(0, 0, 255, 0.1)\"},\n  // Place the mark at the top-right corner of the element\n  markPlacement: \"top-end\",\n  // Show bounding boxes over elements (defaults to true)\n  showBoundingBoxes: true,\n  // Generate labels as 'Element 0', 'Element 1', 'Element 2'...\n  // Defaults to '0', '1', '2'... 
if not provided.\n  getLabel: (element, index) =\u003e `Element ${index}`,\n  // A custom container element to query the elements to be marked.\n  // Defaults to document.body.\n  containerElement: document.body.querySelector(\"main\"),\n  // Only mark elements that are visible in the current viewport\n  viewPortOnly: true,\n});\n```\n","funding_links":[],"categories":["prompt-engineering","Projects"],"sub_categories":["Frameworks \u0026 Models"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freidbarber%2Fwebmarker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Freidbarber%2Fwebmarker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freidbarber%2Fwebmarker/lists"}