{"id":18535564,"url":"https://github.com/beforesemicolon/html-parser","last_synced_at":"2025-04-09T15:32:33.014Z","repository":{"id":239257622,"uuid":"418631317","full_name":"beforesemicolon/html-parser","owner":"beforesemicolon","description":"Customizable Fast HTML parser","archived":false,"fork":false,"pushed_at":"2025-03-05T00:22:32.000Z","size":700,"stargazers_count":3,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-03-24T08:54:31.525Z","etag":null,"topics":["html","parser"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/beforesemicolon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["beforesemicolon"]}},"created_at":"2021-10-18T18:57:42.000Z","updated_at":"2025-03-05T00:21:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"cf485fd7-301f-4d51-8398-0dfcd7645c38","html_url":"https://github.com/beforesemicolon/html-parser","commit_stats":null,"previous_names":["beforesemicolon/html-parser"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beforesemicolon%2Fhtml-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beforesemicolon%2Fhtml-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beforesemicolon%2Fhtml-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beforesemicolon%2Fhtml-parser/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/beforesemicolon","download_url":"https://codeload.github.com/beforesemicolon/html-parser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248058150,"owners_count":21040704,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html","parser"],"created_at":"2024-11-06T19:25:25.885Z","updated_at":"2025-04-09T15:32:33.008Z","avatar_url":"https://github.com/beforesemicolon.png","language":"TypeScript","funding_links":["https://github.com/sponsors/beforesemicolon"],"categories":[],"sub_categories":[],"readme":"# HTML Parser\nHTML parser for any JavaScript runtime environment. Small, fast, easy to use, and highly customizable.\n\n[![npm](https://img.shields.io/npm/v/%40beforesemicolon%2Fhtml-parser)](https://www.npmjs.com/package/@beforesemicolon/html-parser)\n![npm](https://img.shields.io/npm/l/%40beforesemicolon%2Fhtml-parser)\n[![Test](https://github.com/beforesemicolon/html-parser/actions/workflows/test.yml/badge.svg?branch=master)](https://github.com/beforesemicolon/html-parser/actions/workflows/test.yml)\n\n## Motivation\nMost HTML parsers force you to learn their JavaScript API for the parse result.\nThey won't let you tap into the processing to access the nodes as they are parsed, or let you create your own API\nfor the final result that adapts to your project instead of the other way around.\n\nThis parser\n- Is one of the fastest HTML parsers out there, averaging 1ms per HTML page of different sizes. 
Check the [benchmark](#benchmark).\n- Uses a DOM-like API, a custom lite DOM built for performance\n- Can use the browser DOM API or jsdom to give you the parsed HTML, allowing it to be used in any JavaScript runtime environment\n- Lets you use your own [custom DOM API](#creating-your-custom-handler) to gain absolute control\n- Accepts a callback, so you can access the nodes as they are being parsed\n- Super simple to use. No need for an extensive options list. Parses everything in a performant way\n- Handles SVG and HTML easily, including comments and script tags with HTML inside\n\n## Install\n\n#### Node\n```\nnpm install @beforesemicolon/html-parser\n```\n\n#### Browser\n\n```html\n\u003c!DOCTYPE html\u003e\n\u003chtml lang=\"en\"\u003e\n\u003chead\u003e\n\n  \u003c!-- Grab the latest version --\u003e\n  \u003cscript src=\"https://unpkg.com/@beforesemicolon/html-parser/dist/client.js\"\u003e\u003c/script\u003e\n\n  \u003c!-- Or a specific version --\u003e\n  \u003cscript src=\"https://unpkg.com/@beforesemicolon/html-parser@1.0.0/dist/client.js\"\u003e\u003c/script\u003e\n\n\u003c/head\u003e\n\u003cbody\u003e\u003c/body\u003e\n\u003c/html\u003e\n```\n\n###### Good to know\n- Only works with HTML and SVG tags. Duh!\n- Handles custom tags, style tags, script tags and comments by default, with no difference in performance\n- The `\u003c!Doctype\u003e` tag is ignored\n- Honors the format by keeping all whitespace, which is returned as text nodes\n\n### Usage\nBy default, it returns a document fragment as the root. The API is DOM-like, meaning that if you know the DOM\nAPI, you already know this. 
The DOM-like API is minimal and built for performance, allowing you to easily\nuse the same code in the browser, Node, Deno or any other JavaScript runtime environment.\n\nSee the [custom handler section](#creating-your-custom-handler) to understand what this Document-like API looks like.\n\n```js\nimport {parse} from \"@beforesemicolon/html-parser\";\n\nconst frag = parse('\u003ch1\u003esite title\u003c/h1\u003e'); // returns a DocumentFragment-like object\n\nfrag.children[0] // h1 Element\n```\n\nThis parser also works with the native [DOM API](https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model). To use the DOM API in Node, Deno or any JavaScript runtime environment,\nmake sure to import [jsdom](https://www.npmjs.com/package/jsdom) or similar and provide the [Document](https://developer.mozilla.org/en-US/docs/Web/API/Document) object.\n```js\nimport * as jsdom from \"jsdom\";\nconst {JSDOM} = jsdom;\nconst document = new JSDOM('').window.document;\n\n// import the parser\nimport {parse} from \"@beforesemicolon/html-parser\";\n\nconst frag = parse('\u003ch1\u003esite title\u003c/h1\u003e', document); // returns a DocumentFragment\n\nfrag.children[0] // h1 Element\n```\n\n#### Browser\n\n```html\n\u003cscript\u003e\n  const {parse, Doc} = window.BFS;\n  \n  // uses a Document-like object by default\n  const frag1 = parse('\u003ch1\u003esite title\u003c/h1\u003e'); // returns a DocumentFragment-like object\n  \n  // use the native DOM Document object\n  const frag2 = parse('\u003ch1\u003esite title\u003c/h1\u003e', document); // returns a DocumentFragment object\n  \n  frag1.children[0] // h1 Element\n  frag2.children[0] // h1 Element\n\u003c/script\u003e\n```\n\n#### Callback option\nYou may also pass a callback function as the second parameter, which will get called as the nodes are being parsed\nand created. 
This uses the default document, so the callback is called with its Node and Element objects.\n\n```js\nconst frag = parse('\u003ch1\u003esite title\u003c/h1\u003e', (node) =\u003e {\n  // handle node here\n});\n```\n\n### Benchmark\nThe parser itself is fast, but performance varies depending on the API you use\nto build the final parsed result. Here are two examples using [htmlparser-benchmark](https://github.com/AndreasMadsen/htmlparser-benchmark).\n\n```ts\nimport {parse} from \"@beforesemicolon/html-parser\";\n\nparse(aReallyMassiveHTMLString);\n// avg duration: 1.86113 ms/file ± 1.09698\n```\nThis is up to 30 times faster than the DOM Document API.\n\n#### Using jsdom Document\nThis uses the [jsdom](https://www.npmjs.com/package/jsdom) Document in Node.js:\n\n```ts\nimport * as jsdom from \"jsdom\";\nimport {parse} from \"@beforesemicolon/html-parser\";\n\nconst {JSDOM} = jsdom;\nconst document = new JSDOM('').window.document;\n\nparse(aReallyMassiveHTMLString, document);\n// avg duration: 27.3563 ms/file ± 19.1060\n```\n\n### Creating your custom handler\nThe best thing about this parser is the ability to create your own handler\nto transform HTML into anything you like.\n\nHere is an example of a simple implementation you can start from.\n\n```ts\nconst MyCustomDoc = {\n\tcreateComment: (value: string) =\u003e ({type: 'comment', value}),\n\tcreateTextNode: (value: string) =\u003e ({type: 'text', value}),\n\tcreateDocumentFragment: () =\u003e {\n\t\tconst children: unknown[] = []\n\n\t\treturn {\n\t\t\ttype: 'fragment',\n\t\t\tchildren,\n\t\t\tappendChild: (node: unknown) =\u003e {\n\t\t\t\tchildren.push(node)\n\t\t\t}\n\t\t}\n\t},\n\tcreateElementNS: (namespaceURI: string, tagName: string) =\u003e {\n\t\tconst children: unknown[] = []\n\t\tconst attributes: Record\u003cstring, unknown\u003e = {}\n\n\t\treturn {\n\t\t\tnamespaceURI, // important to ALWAYS include\n\t\t\ttagName,\n\t\t\tchildren,\n\t\t\tattributes,\n\t\t\ttype: 
'node',\n\t\t\tappendChild(node: unknown) {\n\t\t\t\tchildren.push(node)\n\t\t\t},\n\t\t\tsetAttribute(name: string, value: string) {\n\t\t\t\tattributes[name] = value;\n\t\t\t}\n\t\t}\n\t},\n}\n\nconst result = parse\u003ctypeof MyCustomDoc\u003e(`...`, MyCustomDoc);\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeforesemicolon%2Fhtml-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbeforesemicolon%2Fhtml-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeforesemicolon%2Fhtml-parser/lists"}