{"id":25319672,"url":"https://github.com/aghajari/jssoup","last_synced_at":"2025-10-13T05:08:56.307Z","repository":{"id":57095518,"uuid":"403702981","full_name":"Aghajari/JSSoup","owner":"Aghajari","description":"JSSoup: the JavaScript HTML DOM parser for node.js","archived":false,"fork":false,"pushed_at":"2021-09-06T21:49:32.000Z","size":79,"stargazers_count":14,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-06T11:23:05.280Z","etag":null,"topics":["cssselector","domparser","htmlparser","jsoup","nodejs"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Aghajari.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-06T17:12:55.000Z","updated_at":"2025-05-30T00:45:50.000Z","dependencies_parsed_at":"2022-08-22T23:10:34.889Z","dependency_job_id":null,"html_url":"https://github.com/Aghajari/JSSoup","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Aghajari/JSSoup","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aghajari%2FJSSoup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aghajari%2FJSSoup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aghajari%2FJSSoup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aghajari%2FJSSoup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Aghajari","download_url":"https://codeload.github.com/Aghajari/JSSoup/tar.gz/refs/heads/main","sbom_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aghajari%2FJSSoup/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279013694,"owners_count":26085390,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cssselector","domparser","htmlparser","jsoup","nodejs"],"created_at":"2025-02-13T20:54:36.412Z","updated_at":"2025-10-13T05:08:56.276Z","avatar_url":"https://github.com/Aghajari.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# JSSoup\n **[JSSoup](https://www.npmjs.com/package/@aghajari/jssoup)** is a fast and reliable HTML DOM parser library for JavaScript, node.js based on [PHP: simplehtmldom](https://github.com/simplehtmldom/simplehtmldom) and [Java: Jsoup](https://github.com/jhy/jsoup)\n\n[![Join the chat at https://gitter.im/Aghajari/community](https://badges.gitter.im/Aghajari/community.svg)](https://gitter.im/Aghajari/community?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n\n- Works with well-formed and broken HTML documents.\n- Loads webpages and document strings.\n- Supports CSS selectors.\n\n# Usage\n\n### Installation \n\n```console\nnpm i @aghajari/jssoup\n```\n\n```js\nconst jssoup = require('@aghajari/jssoup');\n```\n\nLet's fetch music lyrics from 
[Google](https://www.google.com/search?q=Hello+lyrics) :\n```js\nconst doc = await jssoup.loadFromURL(\"https://www.google.com/search?q=Hello+lyrics\", options())\n\nconsole.log('track: ' + doc.getElementByAttr(\"data-attrid\", `\"title\"`).plainText())\nconsole.log('artist: ' + doc.getElementByAttr(\"data-attrid\", `\"subtitle\"`).plainText())\nconsole.log('lyrics: ' + doc.getElementByAttr(\"data-lyricid\").plainText())\n```\nOutput:\n```html\ntrack: Hello\nartist: Adele\nlyrics: Hello, it's me \nI was wondering if after all these years you'd like to meet \nTo go over everything \nThey say that time's supposed to heal ya...\n```\n\nTo find the correct CSS selector for a specific element, you can use `HTMLNode.cssSelector()`:\n- `cssSelector()`: Gets a CSS selector that uniquely selects this element\n\nTry it on one example page to get a selector you can reuse on similar pages.\n\n```js\nconst doc = await jssoup.loadFromURL(\"https://www.google.com/search?q=someone+like+you+lyrics\", options())\nconsole.log(doc.matchesPlainText('Someone Like You')[0].cssSelector())\n```\nYou will get `h2[data-attrid=title]` as the selector. Now use this selector for all lyrics pages from Google.\n```js\nconst doc = await jssoup.loadFromURL(\"https://www.google.com/search?q=million+years+ago+lyrics\", options())\nconsole.log(doc.findFirst('h2[data-attrid=title]').plainText()) // Output: Million Years Ago\n```\n\n- Let's try parsing a document string :\n```js\nconst html = `\u003chtml\u003e\u003chead\u003e\u003ctitle\u003e\u003cb\u003eJSSoup\u003c/b\u003e - node.js\u003c/title\u003e\u003c/head\u003e\u003c/html\u003e`\nconst title = jssoup.load(html).getElementByTagName('title') // or .findFirst('title');\nconsole.log(title.plainText()) // JSSoup - node.js\nconsole.log(title.innerText()) // \u003cb\u003eJSSoup\u003c/b\u003e - node.js\nconsole.log(title.outerText()) // \u003ctitle\u003e\u003cb\u003eJSSoup\u003c/b\u003e - node.js\u003c/title\u003e\n```\nBy the way, you can also use `const title 
= jssoup.load(html).titleEl()` to get the title.\n\n- Let's use id for finding the element.\n```js\nconst html = `\u003chtml\u003e\u003chead\u003e\u003ctest id='element_id'\u003eThis is a test\u003c/test\u003e\u003c/head\u003e\u003c/html\u003e`\nconst element = jssoup.load(html).getElementById('element_id') // or .findFirst('#element_id')\nconsole.log(element) // This is a test\n```\n\n- Let's use className for finding the element.\n```js\nconst html = `\u003chtml\u003e\u003chead\u003e\u003ctest class='test header'\u003eThis is a test\u003c/test\u003e\u003c/head\u003e\u003c/html\u003e`\nconst element = jssoup.load(html).getElementByClassName('test') // or .findFirst('.header')\nconsole.log(element) // This is a test\n```\n\n- Let's use attribute for finding the element.\n```js\nconst html = `\u003chtml\u003e\u003chead\u003e\u003ctest data-type='test'\u003eThis is a test\u003c/test\u003e\u003c/head\u003e\u003c/html\u003e`;\nconst element = jssoup.load(html).getElementByAttr('data-type'); // or .findFirst('[data-type]');\nconsole.log(element); // This is a test\n```\n\n- Other ways to get the same element :\n```js\nconst html = `\u003chtml\u003e\u003chead\u003e\u003ctest class='tcls header' id='element_id' data-type='test_attr'\u003eThis is a test\u003c/test\u003e\u003c/head\u003e\u003c/html\u003e`\nconst doc = jssoup.load(html)\nlet element = doc.getElementByTagName('test')\nelement = doc.findFirst('test')\nelement = doc.getElementByClass('tcls header')\nelement = doc.findFirst('test.tcls.header')\nelement = doc.getElementById('element_id')\nelement = doc.findFirst('test#element_id')\nelement = doc.getElementByAttr('data-type', 'test_attr')\nelement = doc.findFirst('test[data-type=test_attr]')\nelement = doc.findFirst('html \u003e head \u003e test:nth-child(0)')\n```\n\n- Get comments :\n```js\nconst html = `\u003chtml\u003e\n\u003cbody\u003e\n    \u003ch1\u003eThis is a test\u003c/h1\u003e\n    \u003c!-- this is a comment 
--\u003e\n\u003c/body\u003e\n\u003c/html\u003e`;\n\nconst doc = jssoup.load(html)\nconsole.log(doc.comments()[0]) // this is a comment\n```\n\n- Get meta tags :\n```js\nconst doc = await jssoup.loadFromURL(\"https://github.com/Aghajari\", options())\n\nconsole.log(doc.metaTags()) // Array of meta tags\nconsole.log(doc.metaEl('description')) // meta element for description\nconsole.log(doc.metaEl('image')) // meta content for image\n/*\n* metaEl() will search the following tags:\n* \u003cmeta name=\"NAME\" content=\"bla bla\"\u003e (Standard)\n* \u003cmeta property=\"og:NAME\" content=\"bla bla\"\u003e\n* \u003cmeta itemprop=\"NAME\" content=\"bla bla\"\u003e\n* \u003cmeta name=\"…NAME…\" content=\"bla bla\"\u003e\n* \u003cmeta property=\"…NAME…\" content=\"bla bla\"\u003e\n* \u003cmeta itemprop=\"…NAME…\" content=\"bla bla\"\u003e\n*/ \n```\n\n- LinkPreview inspired by [linkpreview](https://github.com/meyt/linkpreview) :\n```js\nconst doc = await jssoup.loadFromURL(\"https://github.com/Aghajari\", options())\n\nconsole.log('title', doc.title()) // Aghajari - Overview\nconsole.log('description', doc.description()) // Aghajari has ? repositories available. 
Follow their code on GitHub.\nconsole.log('image', doc.image()) // https://avatars.githubusercontent.com/u/30867537?v=4?s=400\n```\n\n- Getting multiple tags :\n```js\nconst html = `\u003chtml\u003e\n\u003cbody\u003e\n\u003ctag1\u003e This is test1 \u003c/tag1\u003e\n\u003ctag2\u003e This is test2 \u003c/tag2\u003e\n\u003ctag3\u003e This is test3 \u003c/tag3\u003e\n\u003c/body\u003e\n\u003c/html\u003e`;\n\nconst doc = jssoup.load(html)\nconsole.log(doc.find(['tag1', 'tag2', 'tag3'])) // Array of elements\n```\n\n- Limit output indexes :\n```js\nconst html = `\u003chtml\u003e\n\u003cbody\u003e\n\u003ctag\u003e This is test0 \u003c/tag\u003e\n\u003ctag\u003e This is test1 \u003c/tag\u003e\n\u003ctag\u003e This is test2 \u003c/tag\u003e\n\u003ctag\u003e This is test3 \u003c/tag\u003e\n\u003ctag\u003e This is test4 \u003c/tag\u003e\n\u003ctag\u003e This is test5 \u003c/tag\u003e\n\u003c/body\u003e\n\u003c/html\u003e`;\n\nconst doc = jssoup.load(html)\nconsole.log(doc.find('tag', [2, -2])) // Array of elements (test2, test4)\n```\n\n- Get attribute from element:\n```js\nconst html = `\u003chtml\u003e\u003chead\u003e\u003ctest data-id='id1234'\u003eThis is a test\u003c/test\u003e\u003c/head\u003e\u003c/html\u003e`;\nconst element = jssoup.load(html).getElementByAttr('data-id'); // or .findFirst('test[data-id]');\nconsole.log(element.getAttribute('data-id')); // id1234\n```\n\n## Attribute Expression\n- `=` : equal `[attr=value]`\n- `!=` : unequal `[attr!=value]`\n- `*=` : regex `[attr*=value]`\n- `^=` : regex /^pattern/ `[attr^=value]`\n- `$=` : regex /pattern$/ `[attr$=value]`\n- `|=` : startsWith `[attr|=value]`\n- `\u0026=` : endsWith `[attr\u0026=value]`\n- `%=` : contains `[attr%=value]`\n- `~=` : contains in list of words `[attr~=value]`\u003cbr\u003e`\u003ctag attr='blue red green'\u003e` : `[attr~=red]`\n\n## Author \n- **Amir Hossein Aghajari**\n\nLicense\n=======\n\n    Copyright 2021 Amir Hossein Aghajari\n    Licensed under the Apache License, Version 2.0 
(the \"License\");\n    you may not use this file except in compliance with the License.\n    You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n    Unless required by applicable law or agreed to in writing, software\n    distributed under the License is distributed on an \"AS IS\" BASIS,\n    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n    See the License for the specific language governing permissions and\n    limitations under the License.\n\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"64\" alt=\"LCoders | AmirHosseinAghajari\" src=\"https://user-images.githubusercontent.com/30867537/90538314-a0a79200-e193-11ea-8d90-0a3576e28a18.png\"\u003e\n  \u003cbr\u003e\u003ca\u003eAmir Hossein Aghajari\u003c/a\u003e • \u003ca href=\"mailto:amirhossein.aghajari.82@gmail.com\"\u003eEmail\u003c/a\u003e • \u003ca href=\"https://github.com/Aghajari\"\u003eGitHub\u003c/a\u003e\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faghajari%2Fjssoup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faghajari%2Fjssoup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faghajari%2Fjssoup/lists"}