https://github.com/liulinboyi/htmlparser
HTMLParser: parses HTML. Feel free to use it as a reference.
- Host: GitHub
- URL: https://github.com/liulinboyi/htmlparser
- Owner: liulinboyi
- License: mit
- Created: 2021-07-12T03:22:50.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-08-04T11:42:51.000Z (6 months ago)
- Last Synced: 2024-08-04T12:48:14.792Z (6 months ago)
- Topics: htmlparser, parser, parser-library
- Language: TypeScript
- Homepage:
- Size: 823 KB
- Stars: 13
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# HTML Parser
## Parsing HTML
[![Tests](https://github.com/liulinboyi/HTMLParser/actions/workflows/tests.yml/badge.svg)](https://github.com/liulinboyi/HTMLParser/actions/workflows/tests.yml)
## HTML
```html
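<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <div>
        <h1 v-if="res.value" name="11" @click="tes">11{{res.value}}</h1>
    </div>
    <a href="http://github.com/"></a>
</body>
</html>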
```
## AST
Click to view details
```json
{
  "type": "root",
  "children": [
    {
      "type": "DTD",
      "LineNum": 1,
      "content": "DOCTYPE html"
    },
    {
      "content": "\r\n",
      "LineNum": 1,
      "type": "text"
    },
    {
      "children": [
        {
          "content": "\r\n",
          "LineNum": 2,
          "type": "text"
        },
        {
          "children": [
            {
              "content": "\r\n ",
              "LineNum": 3,
              "type": "text"
            },
            {
              "children": [],
              "attr": [
                {
                  "name": "charset",
                  "value": "UTF-8"
                }
              ],
              "LineNum": 4,
              "type": "tag",
              "tag": "meta"
            },
            {
              "content": "\r\n ",
              "LineNum": 4,
              "type": "text"
            },
            {
              "children": [],
              "attr": [
                {
                  "name": "http-equiv",
                  "value": "X-UA-Compatible"
                },
                {
                  "name": "content",
                  "value": "IE=edge"
                }
              ],
              "LineNum": 5,
              "type": "tag",
              "tag": "meta"
            },
            {
              "content": "\r\n ",
              "LineNum": 5,
              "type": "text"
            },
            {
              "children": [],
              "attr": [
                {
                  "name": "name",
                  "value": "viewport"
                },
                {
                  "name": "content",
                  "value": "width=device-width, initial-scale=1.0"
                }
              ],
              "LineNum": 6,
              "type": "tag",
              "tag": "meta"
            },
            {
              "content": "\r\n ",
              "LineNum": 6,
              "type": "text"
            },
            {
              "children": [
                {
                  "content": "Document",
                  "LineNum": 7,
                  "type": "text"
                }
              ],
              "attr": [],
              "LineNum": 7,
              "type": "tag",
              "tag": "title"
            },
            {
              "content": "\r\n",
              "LineNum": 7,
              "type": "text"
            }
          ],
          "attr": [],
          "LineNum": 3,
          "type": "tag",
          "tag": "head"
        },
        {
          "content": "\r\n",
          "LineNum": 8,
          "type": "text"
        },
        {
          "children": [
            {
              "content": "\r\n ",
              "LineNum": 9,
              "type": "text"
            },
            {
              "children": [
                {
                  "content": "\r\n ",
                  "LineNum": 10,
                  "type": "text"
                },
                {
                  "children": [
                    {
                      "content": "11{{res.value}}",
                      "LineNum": 11,
                      "type": "text"
                    }
                  ],
                  "attr": [
                    {
                      "name": "v-if",
                      "value": "res.value"
                    },
                    {
                      "name": "name",
                      "value": "11"
                    },
                    {
                      "name": "@click",
                      "value": "tes"
                    }
                  ],
                  "LineNum": 11,
                  "type": "tag",
                  "tag": "h1"
                },
                {
                  "content": "\r\n ",
                  "LineNum": 11,
                  "type": "text"
                }
              ],
              "attr": [],
              "LineNum": 10,
              "type": "tag",
              "tag": "div"
            },
            {
              "content": "\r\n ",
              "LineNum": 12,
              "type": "text"
            },
            {
              "children": [],
              "attr": [
                {
                  "name": "href",
                  "value": "http://github.com/"
                }
              ],
              "LineNum": 13,
              "type": "tag",
              "tag": "a"
            },
            {
              "content": "\r\n",
              "LineNum": 13,
              "type": "text"
            }
          ],
          "attr": [],
          "LineNum": 9,
          "type": "tag",
          "tag": "body"
        },
        {
          "content": "\r\n",
          "LineNum": 14,
          "type": "text"
        }
      ],
      "attr": [
        {
          "name": "lang",
          "value": "en"
        }
      ],
      "LineNum": 2,
      "type": "tag",
      "tag": "html"
    }
  ],
  "LineNum": 1
}
```

## Adding applications
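Finding nodes is a typical application of the AST above. The following is a minimal sketch, assuming the node shapes shown in that JSON. The type names and the `findByTag` and `getAttr` helpers are hypothetical and are not part of the library's API.

```typescript
// Node shapes inferred from the AST JSON in this README.
// All names here are illustrative assumptions, not the library's exports.
interface Attr {
  name: string;
  value: string;
}

interface TextNode {
  type: "text";
  content: string;
  LineNum: number;
}

interface DtdNode {
  type: "DTD";
  content: string;
  LineNum: number;
}

interface TagNode {
  type: "tag";
  tag: string;
  attr: Attr[];
  children: AstNode[];
  LineNum: number;
}

interface RootNode {
  type: "root";
  children: AstNode[];
  LineNum: number;
}

type AstNode = TextNode | DtdNode | TagNode;

// Depth-first search for every tag node with the given tag name.
function findByTag(node: RootNode | AstNode, tag: string): TagNode[] {
  const found: TagNode[] = [];
  if (node.type === "tag" && node.tag === tag) {
    found.push(node);
  }
  if (node.type === "root" || node.type === "tag") {
    for (const child of node.children) {
      found.push(...findByTag(child, tag));
    }
  }
  return found;
}

// Look up an attribute value on a tag node, or undefined if it is absent.
function getAttr(node: TagNode, name: string): string | undefined {
  for (const a of node.attr) {
    if (a.name === name) {
      return a.value;
    }
  }
  return undefined;
}

// A trimmed-down piece of the AST shown above.
const sample: RootNode = {
  type: "root",
  LineNum: 1,
  children: [
    {
      type: "tag",
      tag: "div",
      attr: [],
      LineNum: 10,
      children: [
        {
          type: "tag",
          tag: "h1",
          attr: [
            { name: "v-if", value: "res.value" },
            { name: "@click", value: "tes" },
          ],
          LineNum: 11,
          children: [{ type: "text", content: "11{{res.value}}", LineNum: 11 }],
        },
      ],
    },
  ],
};

const headings = findByTag(sample, "h1");
console.log(headings.length, headings.map((h) => getAttr(h, "v-if")));
```

Modelling the nodes as a union discriminated by `type` lets the compiler check that `children` and `attr` are only read on nodes that have them.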
[Find nodes](https://github.com/liulinboyi/HTMLParser-App/tree/main/platform)

## TIPS
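> No runtime dependencies

It is not nearly as tolerant as a browser, which parses any HTML without raising errors no matter how it is written. I only handle a subset of unusual markup. Some HTML is written so strangely that supporting it would need more branches and special handling, which is more effort than I wanted to spend.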
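## Notes
#### ~~tsc does not add the `.js` suffix to imports when compiling, which makes the output unusable as ES modules, so I added the `.js` suffix to the imports in every ts file~~
#### ~~https://segmentfault.com/q/1010000038671707~~
#### ~~[Community discussion](https://github.com/microsoft/TypeScript/issues/16577)~~
#### Resolved: I wrote a [script](./script/addSuffixJs.js) that adds the `.js` suffix to the import and export statements of all compiled ES modules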
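A minimal sketch of this kind of post-processing step, for illustration only. It is not the repository's `addSuffixJs.js`. The `addJsSuffix` function and its regular expression are assumptions, and it only covers static `import`/`export ... from` statements with relative paths.

```typescript
// Hypothetical helper, not the repository's script/addSuffixJs.js.
// Appends ".js" to relative import/export specifiers that lack an extension.
function addJsSuffix(source: string): string {
  // Matches `from "./x"`, `from '../x'` and side-effect imports such as `import "./x"`.
  // Bare specifiers like "fs" do not start with ./ or ../ and are left untouched.
  const pattern = /(\bfrom\s+|\bimport\s+)(["'])(\.{1,2}\/[^"']+)\2/g;
  return source.replace(
    pattern,
    (match: string, keyword: string, quote: string, spec: string): string => {
      if (/\.(js|mjs|cjs|json)$/.test(spec)) {
        return match; // already has an extension, leave it alone
      }
      return `${keyword}${quote}${spec}.js${quote}`;
    }
  );
}

// Example input resembling compiled ES module output.
const compiled = [
  'import { parse } from "./parser";',
  "export * from '../types';",
  'import "./polyfill.js";',
  'import fs from "fs";',
].join("\n");

console.log(addJsSuffix(compiled));
```

This sketch does not handle dynamic `import()` calls or directory imports that resolve to an `index.js` file. A real script would also need to read and write the compiled files.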
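## [Tests](./test)
#### Using [playwright](https://github.com/microsoft/playwright.git), I compared the parser's output with the DOM structure generated by the browser. Apart from some unusual markup, the results basically match.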
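One way such a comparison could work is to reduce both trees to a tag-only skeleton and report the first difference. This is a minimal sketch under that assumption, not the repository's test code. `Skeleton`, `AstLike`, `toSkeleton` and `firstDiff` are hypothetical names, and the browser-side tree is hard-coded here instead of being collected with Playwright.

```typescript
// Shapes and helpers below are assumptions for this sketch.
interface Skeleton {
  tag: string;
  children: Skeleton[];
}

interface AstLike {
  type: string;
  tag?: string;
  children?: AstLike[];
}

// Reduce an AST node to its tag skeleton, dropping text and DTD nodes.
function toSkeleton(node: AstLike): Skeleton[] {
  const kids: Skeleton[] = [];
  for (const child of node.children ?? []) {
    kids.push(...toSkeleton(child));
  }
  if (node.type === "tag" && node.tag !== undefined) {
    return [{ tag: node.tag.toLowerCase(), children: kids }];
  }
  // Root nodes pass their children through; text and DTD nodes add nothing.
  return kids;
}

// Return a description of the first mismatch, or null when the trees agree.
function firstDiff(a: Skeleton, b: Skeleton, path: string = a.tag): string | null {
  if (a.tag !== b.tag) {
    return `${path}: ${a.tag} vs ${b.tag}`;
  }
  if (a.children.length !== b.children.length) {
    return `${path}: ${a.children.length} vs ${b.children.length} children`;
  }
  for (let i = 0; i < a.children.length; i++) {
    const diff = firstDiff(a.children[i], b.children[i], `${path} > ${a.children[i].tag}`);
    if (diff !== null) {
      return diff;
    }
  }
  return null;
}

// Parser side: a trimmed AST in the shape shown earlier in this README.
const ast: AstLike = {
  type: "root",
  children: [
    { type: "DTD" },
    {
      type: "tag",
      tag: "html",
      children: [
        { type: "tag", tag: "head", children: [] },
        { type: "text" },
        { type: "tag", tag: "body", children: [{ type: "tag", tag: "a" }] },
      ],
    },
  ],
};

// Browser side: hard-coded stand-in for a skeleton collected from the real DOM.
const fromBrowser: Skeleton = {
  tag: "html",
  children: [
    { tag: "head", children: [] },
    { tag: "body", children: [{ tag: "a", children: [] }] },
  ],
};

console.log(firstDiff(toSkeleton(ast)[0], fromBrowser));
```

With Playwright, the browser-side skeleton could be built inside a `page.evaluate` callback that walks `document.documentElement` through each element's `children` and lowercased `tagName`. Elements that the browser inserts on its own would then show up as differences.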