https://github.com/creeperyang/html-parser-lite
A light weight html parser and more.
- Host: GitHub
- URL: https://github.com/creeperyang/html-parser-lite
- Owner: creeperyang
- Created: 2016-05-24T09:47:28.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-07-20T13:11:58.000Z (over 1 year ago)
- Last Synced: 2024-10-12T13:27:11.370Z (4 months ago)
- Topics: html-parser, html-parser-lite, parser
- Language: JavaScript
- Size: 107 KB
- Stars: 7
- Watchers: 2
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
README
## html-parser-lite [![Test CI](https://github.com/creeperyang/html-parser-lite/actions/workflows/node.js.yml/badge.svg)](https://github.com/creeperyang/html-parser-lite/actions/workflows/node.js.yml)
> A light weight html parser and more.
[![NPM](https://nodei.co/npm/html-parser-lite.png?compact=true)](https://nodei.co/npm/html-parser-lite/)
### API
**`parse(html: string, options?: object)=>Node|Node[]`:**
|property|type|desc|defaults|
|--------|---------|-------|-------|
| `html` | `string` | The string to parse | None |
| `options.wrapWithDocument` | `boolean` | Whether to force creation of a document node as the root wrapper. | `false` |
| `options.ignoreWhitespaceText` | `boolean` | Whether to skip creating text nodes whose characters are all white space. | `true` |
| `options.scanner` | `HtmlScanner` | The internal HTML scanner. Configure it only when you want to implement custom, complex logic. | internal `HtmlScanner` instance |

If `options.wrapWithDocument` is set to `true`, `parse` returns a `DocumentNode` (the root of the whole tree); otherwise, it returns an array of nodes.
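Because `parse` can return either a single root node or an array of nodes, calling code may want to normalize the result before iterating. The following is a minimal sketch, not part of the library: `topLevelNodes` is a hypothetical helper, the sample objects are hand-written stand-ins for `parse` output, and it assumes a document node exposes its children through `childNodes` (with `tagName: 'document'` and `nodeType: 9` chosen here only for illustration).

```javascript
// Hypothetical helper: normalize the two possible return shapes of parse()
// into an array of top-level nodes.
// Assumption: a document node keeps its children in `childNodes`, like the
// element nodes in the README's sample output.
function topLevelNodes(result) {
  if (Array.isArray(result)) return result
  return Array.isArray(result.childNodes) ? result.childNodes : [result]
}

// Hand-written stand-ins for parse() results (not produced by the library).
const asArray = [{ tagName: 'p', nodeType: 1, childNodes: [], attrs: {} }]
// Assumed shape of a document wrapper; the real DocumentNode may differ.
const asDocument = { tagName: 'document', nodeType: 9, childNodes: asArray }

console.log(topLevelNodes(asArray).length)
console.log(topLevelNodes(asDocument)[0].tagName)
```

With this helper, the rest of the code can treat both return shapes the same way, whichever value `options.wrapWithDocument` has.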
#### Important Tips
The library does not aim to behave exactly like a browser; it simply parses an HTML string into a node tree.
When you use the default options (just run `parse(html)`), it will always return an array of nodes, and white space between tags will be ignored. Take `<p> s t a r t </p>↵ ↵<p> </p>` for example:
- The `↵ ↵` between the two paragraphs will be ignored, so only two paragraph nodes are returned.
- The first paragraph `<p> s t a r t </p>` will keep all of its white space characters.
- The second paragraph `<p> </p>` will ignore its white space, so this `p` node has no text child node.
If you want to keep white space (which generates corresponding text nodes), set `options.ignoreWhitespaceText = false`.
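To make the effect of ignoring white space concrete, the sketch below applies an equivalent filter to a plain node tree. It is an illustration only, not the library's implementation: `dropWhitespaceText`, `isWhitespaceOnly`, and the `text`/`p` builders are hypothetical helpers, and the input tree is hand-written to mimic two paragraphs separated by line breaks, the first containing `s t a r t` and the second only white space. The node fields (`tagName`, `nodeType`, `childNodes`, `textContent`) follow the sample output in the Usage section.

```javascript
// True when the string contains no non-whitespace character.
const isWhitespaceOnly = (text) => !/\S/.test(text)

// Return a copy of `nodes` without whitespace-only text nodes, recursing
// into childNodes. nodeType 3 marks text nodes, as in the sample output.
function dropWhitespaceText(nodes) {
  return nodes
    .filter((node) => !(node.nodeType === 3 && isWhitespaceOnly(node.textContent)))
    .map((node) =>
      Array.isArray(node.childNodes)
        ? { ...node, childNodes: dropWhitespaceText(node.childNodes) }
        : node
    )
}

// Hypothetical builders for a hand-written tree (not library output).
const text = (textContent) => ({ tagName: 'text', nodeType: 3, textContent })
const p = (...childNodes) => ({ tagName: 'p', nodeType: 1, childNodes, attrs: {} })

// Tree for `<p> s t a r t </p>↵ ↵<p> </p>` with every text node kept.
const kept = [p(text(' s t a r t ')), text('\n \n'), p(text(' '))]

const filtered = dropWhitespaceText(kept)
console.log(JSON.stringify(filtered, null, 2))
```

After filtering, only the two `p` nodes remain: the first still holds its text with the surrounding spaces intact, and the second has an empty `childNodes` array.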
### Usage
```js
const fs = require('fs')
const parse = require('html-parser-lite')
const html = fs.readFileSync('test/textures/simple.html').toString()
// html-parser will parse html to nodes array (default behavior).
const nodes = parse(html)
// JSON.stringify(nodes):
// [{"tagName":"doctype","nodeType":10,"publicId":"","systemId":"","name":"html"},{"tagName":"html","nodeType":1,"childNodes":[{"tagName":"head","nodeType":1,"childNodes":[{"tagName":"meta","nodeType":1,"childNodes":[],"attrs":{"charset":"utf-8"}},{"tagName":"title","nodeType":1,"childNodes":[{"tagName":"text","nodeType":3,"textContent":"hi"}],"attrs":{}}],"attrs":{}},{"tagName":"body","nodeType":1,"childNodes":[{"tagName":"h1","nodeType":1,"childNodes":[{"tagName":"text","nodeType":3,"textContent":"heading title"}],"attrs":{}}],"attrs":{}}],"attrs":{"class":"html-ok","lang":"zh-hans-cn"},"className":"html-ok"}]
```
### License
[MIT](https://opensource.org/licenses/mit-license.php)