{"id":19396416,"url":"https://github.com/remotemerge/xpath-parser","last_synced_at":"2025-04-24T05:30:53.811Z","repository":{"id":43178024,"uuid":"254292586","full_name":"remotemerge/xpath-parser","owner":"remotemerge","description":"JavaScript utility for extracting data from HTML and XML documents!","archived":false,"fork":false,"pushed_at":"2024-04-09T11:38:31.000Z","size":1340,"stargazers_count":5,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-16T15:37:19.191Z","etag":null,"topics":["delay","dom","javascript","query","scraper","scraping","subquery","typescript","xpath","xpath-expression"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/remotemerge.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-09T06:43:23.000Z","updated_at":"2024-04-09T03:33:02.000Z","dependencies_parsed_at":"2024-04-09T05:24:58.885Z","dependency_job_id":"6e843402-6ac4-4e06-b25c-965c5f0f8e30","html_url":"https://github.com/remotemerge/xpath-parser","commit_stats":{"total_commits":236,"total_committers":3,"mean_commits":78.66666666666667,"dds":"0.23728813559322037","last_synced_commit":"3722cb7407180a4cd05205fb9cb8d1cdfa3928be"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/remotemerge%2Fxpath-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/remotemerge%2Fxpath-parser/tags","releases_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/remotemerge%2Fxpath-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/remotemerge%2Fxpath-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/remotemerge","download_url":"https://codeload.github.com/remotemerge/xpath-parser/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250572177,"owners_count":21452326,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["delay","dom","javascript","query","scraper","scraping","subquery","typescript","xpath","xpath-expression"],"created_at":"2024-11-10T10:35:12.478Z","updated_at":"2025-04-24T05:30:53.435Z","avatar_url":"https://github.com/remotemerge.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# \u003cimg src=\"./assets/logo.png\" width=\"28\" height=\"28\"\u003e XPath Parser\n\n[![Package](https://img.shields.io/npm/v/@remotemerge/xpath-parser?logo=npm)](https://www.npmjs.com/package/@remotemerge/xpath-parser)\n![Build](https://img.shields.io/github/actions/workflow/status/remotemerge/xpath-parser/production.yml?logo=github)\n![Downloads](https://img.shields.io/npm/dt/@remotemerge/xpath-parser)\n![License](https://img.shields.io/npm/l/@remotemerge/xpath-parser)\n\nXPath Parser is a JavaScript utility for extracting data from HTML and XML documents; built for web scraping in a JavaScript\nenvironment. It's open source, modern, lightweight and fast. 
You can easily integrate it into new or existing web\ncrawlers, browser extensions, etc.\n\n## Install\n\n```bash\n# using NPM\nnpm i @remotemerge/xpath-parser\n# using Yarn\nyarn add @remotemerge/xpath-parser\n```\n\n## Usage\n\nImport the XPathParser class into your project.\n\n```javascript\nimport XPathParser from '@remotemerge/xpath-parser';\n```\n\n## Examples\n\nThe XPathParser constructor `XPathParser(html|DOM)` accepts either a DOM object or an HTML string; pass whichever you have.\n\n```javascript\nconst parser = new XPathParser('\u003chtml\u003e...\u003c/html\u003e');\n```\n\n### Scrape First Match\n\nThis method evaluates the given expression and captures the first result. It is useful for scraping a single element\nvalue like `title`, `price`, etc. from HTML pages.\n\n```javascript\nconst result = parser.queryFirst('//span[@id=\"productTitle\"]');\nconsole.log(result);\n```\n\nSample output:\n\n```text\nLETSCOM Fitness Tracker HR, Activity Tracker Watch with Heart Rate...\n```\n\n### Scrape All Matches\n\nThis method evaluates the given expression and captures all results. It is useful for scraping all URLs, all images, all\nCSS classes, etc. from HTML pages.\n\n```javascript\n// scrape titles\nconst results = parser.queryList('//span[contains(@class, \"zg-item\")]/a/div');\nconsole.log(results);\n```\n\nSample output:\n\n```javascript\n['Cell Phone Stand,Angle Height Adjusta…', 'Selfie Ring Light with Tripod…', 'HOVAMP MFi Certified Nylon…', '...']\n```\n\n### Scrape Multiple Elements\n\nThis method loops through the given expressions and captures the first match of each expression. It is useful for\nscraping full product information (`title`, `seller`, `price`, `rating`, etc.) from HTML pages. 
The keys are preserved\nin the result, each mapped to its matched value.\n\n```javascript\nconst result = parser.multiQuery({\n  title: '//div[@id=\"ppd\"]//span[@id=\"productTitle\"]',\n  seller: '//div[@id=\"ppd\"]//a[@id=\"bylineInfo\"]',\n  price: '//div[@id=\"ppd\"]//span[@id=\"priceblock_dealprice\"]',\n  rating: '//div[@id=\"ppd\"]//span[@id=\"acrCustomerReviewText\"]',\n});\nconsole.log(result);\n```\n\nSample output:\n\n```text\n{\n    title: 'LETSCOM Fitness Tracker HR, Activity Tracker Watch with Heart Rate Monitor...',\n    seller: 'LETSCOM',\n    price: '$20.39',\n    rating: '1,489 ratings',\n}\n```\n\n### Scrape with SubQueries\n\nThis method matches the `root` expression and runs the queries relative to each matched element. It is useful for scraping multiple\nproducts and full information about each product. For example, a page can list 10 products, each with its own\nfields (`title`, `url`, `image`, `price`, etc.). This method also supports a `pagination` parameter. The keys are preserved\nand each is mapped to its matched value. 
The `pagination` parameter is optional.\n\n```javascript\nconst result = parser.subQuery({\n  root: '//span[contains(@class, \"zg-item\")]',\n  pagination: '//ul/li/a[contains(text(), \"Next\")]/@href',\n  queries: {\n    title: 'a/div/@title',\n    url: 'a/@href',\n    image: 'a/span/div/img/@src',\n    price: './/span[contains(@class, \"a-color-price\")]',\n  }\n});\nconsole.log(result);\n```\n\nSample output:\n\n```text\n{\n  paginationUrl: 'https://www.example.com/gp/new-releases/wireless/reTF8\u0026pg=2',\n  results: [\n    {\n      title: 'Cell Phone Stand,Angle Height Adjustable Stab/Kindle/Tablet,4-10inch',\n      url: '/Adjustable-LISEN-Aluminum-Compatible-4-10\u0026refRID=H1HWDWERK8YCRN76ER1T',\n      image: 'https://images-na.ssl-images-example.com/images/I/61UL200_SR200,200_.jpg',\n      price: '$16.99'\n    },\n    {\n      title: 'Selfie Ring Light with Tripod Stand and Pheaming Photo Photography Vlogging Video',\n      url: '/Selfie-Lighting-Steaming-Photography-Vlogging/dp/B081SV\u0026K8YCRN76ER1T',\n      image: 'https://images-na.ssl-images-example.com/images/I/717L200_SR200,200_.jpg',\n      price: '$46.99'\n    },\n    {\n      // ...\n    }\n  ]\n}\n```\n\n### Wait for Element\n\nThis method waits until an element matching the expression exists on the page. The first parameter `expression` is the XPath\nexpression to match, and the second parameter `maxSeconds` is the maximum time to wait in seconds (defaults to 10 seconds).\n\n```javascript\nparser.waitXPath('//span[contains(@class, \"a-color-price\")]/span')\n  .then((response) =\u003e {\n    // the expression matched and the element exists\n  }).catch((error) =\u003e {\n    // nothing matched before the timeout\n  });\n```\n\n## Contribution\n\nContributions from the community are welcome. 
Please open a pull request for bug fixes, enhancements, new features, etc.\n\n## Disclaimer\n\nAll the XPath expressions above were tested on Amazon [product listing] and related pages for educational purposes only.\nThe icons are from the [flaticon] website.\n\n[product listing]: https://www.amazon.com/gp/new-releases/wireless\n\n[flaticon]: https://www.flaticon.com","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fremotemerge%2Fxpath-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fremotemerge%2Fxpath-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fremotemerge%2Fxpath-parser/lists"}