{"id":20649114,"url":"https://github.com/liulinboyi/htmlparser","last_synced_at":"2025-04-17T00:59:32.636Z","repository":{"id":109596262,"uuid":"385113071","full_name":"liulinboyi/HTMLParser","owner":"liulinboyi","description":"HTMLParser 解析HTML 欢迎参考 HTMLParser Parsing HTML Welcome to the reference","archived":false,"fork":false,"pushed_at":"2024-08-04T11:42:51.000Z","size":843,"stargazers_count":13,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-08-04T12:48:14.792Z","etag":null,"topics":["htmlparser","parser","parser-library"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/liulinboyi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-07-12T03:22:50.000Z","updated_at":"2024-08-04T11:42:54.000Z","dependencies_parsed_at":"2023-04-11T19:32:51.138Z","dependency_job_id":null,"html_url":"https://github.com/liulinboyi/HTMLParser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liulinboyi%2FHTMLParser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liulinboyi%2FHTMLParser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liulinboyi%2FHTMLParser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liulinboyi%2FHTMLParser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/liulinboyi
","download_url":"https://codeload.github.com/liulinboyi/HTMLParser/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224944519,"owners_count":17396257,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["htmlparser","parser","parser-library"],"created_at":"2024-11-16T17:12:37.493Z","updated_at":"2024-11-16T17:12:38.130Z","avatar_url":"https://github.com/liulinboyi.png","language":"TypeScript","readme":"# HTML Parser\n\n## Parsing HTML\n\n[![Tests](https://github.com/liulinboyi/HTMLParser/actions/workflows/tests.yml/badge.svg)](https://github.com/liulinboyi/HTMLParser/actions/workflows/tests.yml)\n\n## HTML\n\n```html\n\u003c!DOCTYPE html\u003e\n\u003chtml lang=\"en\"\u003e\n\u003chead\u003e\n    \u003cmeta charset=\"UTF-8\"\u003e\n    \u003cmeta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\"\u003e\n    \u003cmeta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"\u003e\n    \u003ctitle\u003eDocument\u003c/title\u003e\n\u003c/head\u003e\n\u003cbody\u003e\n    \u003cdiv\u003e\n        \u003ch1 v-if=\"res.value\" name='11' @click=\"tes\"\u003e11{{res.value}}\u003c/h1\u003e\n    \u003c/div\u003e\n    \u003ca href=\"http://github.com/\"\u003e\u003c/a\u003e\n\u003c/body\u003e\n\u003c/html\u003e\n```\n\n## AST\n\u003cdetails\u003e\n\u003csummary\u003eClick to view details\u003c/summary\u003e\n\u003cpre\u003e\u003ccode\u003e\n{\n    \"type\": \"root\",\n    \"children\": [\n        {\n            \"type\": \"DTD\",\n            \"LineNum\": 1,\n            \"content\": \"DOCTYPE html\"\n        },\n   
     {\n            \"content\": \"\\r\\n\",\n            \"LineNum\": 1,\n            \"type\": \"text\"\n        },\n        {\n            \"children\": [\n                {\n                    \"content\": \"\\r\\n\",\n                    \"LineNum\": 2,\n                    \"type\": \"text\"\n                },\n                {\n                    \"children\": [\n                        {\n                            \"content\": \"\\r\\n    \",\n                            \"LineNum\": 3,\n                            \"type\": \"text\"\n                        },\n                        {\n                            \"children\": [],\n                            \"attr\": [\n                                {\n                                    \"name\": \"charset\",\n                                    \"value\": \"UTF-8\"\n                                }\n                            ],\n                            \"LineNum\": 4,\n                            \"type\": \"tag\",\n                            \"tag\": \"meta\"\n                        },\n                        {\n                            \"content\": \"\\r\\n    \",\n                            \"LineNum\": 4,\n                            \"type\": \"text\"\n                        },\n                        {\n                            \"children\": [],\n                            \"attr\": [\n                                {\n                                    \"name\": \"http-equiv\",\n                                    \"value\": \"X-UA-Compatible\"\n                                },\n                                {\n                                    \"name\": \"content\",\n                                    \"value\": \"IE=edge\"\n                                }\n                            ],\n                            \"LineNum\": 5,\n                            \"type\": \"tag\",\n                            \"tag\": \"meta\"\n                        },\n 
                       {\n                            \"content\": \"\\r\\n    \",\n                            \"LineNum\": 5,\n                            \"type\": \"text\"\n                        },\n                        {\n                            \"children\": [],\n                            \"attr\": [\n                                {\n                                    \"name\": \"name\",\n                                    \"value\": \"viewport\"\n                                },\n                                {\n                                    \"name\": \"content\",\n                                    \"value\": \"width=device-width, initial-scale=1.0\"\n                                }\n                            ],\n                            \"LineNum\": 6,\n                            \"type\": \"tag\",\n                            \"tag\": \"meta\"\n                        },\n                        {\n                            \"content\": \"\\r\\n    \",\n                            \"LineNum\": 6,\n                            \"type\": \"text\"\n                        },\n                        {\n                            \"children\": [\n                                {\n                                    \"content\": \"Document\",\n                                    \"LineNum\": 7,\n                                    \"type\": \"text\"\n                                }\n                            ],\n                            \"attr\": [],\n                            \"LineNum\": 7,\n                            \"type\": \"tag\",\n                            \"tag\": \"title\"\n                        },\n                        {\n                            \"content\": \"\\r\\n\",\n                            \"LineNum\": 7,\n                            \"type\": \"text\"\n                        }\n                    ],\n                    \"attr\": [],\n                    \"LineNum\": 3,\n         
           \"type\": \"tag\",\n                    \"tag\": \"head\"\n                },\n                {\n                    \"content\": \"\\r\\n\",\n                    \"LineNum\": 8,\n                    \"type\": \"text\"\n                },\n                {\n                    \"children\": [\n                        {\n                            \"content\": \"\\r\\n    \",\n                            \"LineNum\": 9,\n                            \"type\": \"text\"\n                        },\n                        {\n                            \"children\": [\n                                {\n                                    \"content\": \"\\r\\n        \",\n                                    \"LineNum\": 10,\n                                    \"type\": \"text\"\n                                },\n                                {\n                                    \"children\": [\n                                        {\n                                            \"content\": \"11{{res.value}}\",\n                                            \"LineNum\": 11,\n                                            \"type\": \"text\"\n                                        }\n                                    ],\n                                    \"attr\": [\n                                        {\n                                            \"name\": \"v-if\",\n                                            \"value\": \"res.value\"\n                                        },\n                                        {\n                                            \"name\": \"name\",\n                                            \"value\": \"11\"\n                                        },\n                                        {\n                                            \"name\": \"@click\",\n                                            \"value\": \"tes\"\n                                        }\n                                    ],\n 
                                   \"LineNum\": 11,\n                                    \"type\": \"tag\",\n                                    \"tag\": \"h1\"\n                                },\n                                {\n                                    \"content\": \"\\r\\n    \",\n                                    \"LineNum\": 11,\n                                    \"type\": \"text\"\n                                }\n                            ],\n                            \"attr\": [],\n                            \"LineNum\": 10,\n                            \"type\": \"tag\",\n                            \"tag\": \"div\"\n                        },\n                        {\n                            \"content\": \"\\r\\n    \",\n                            \"LineNum\": 12,\n                            \"type\": \"text\"\n                        },\n                        {\n                            \"children\": [],\n                            \"attr\": [\n                                {\n                                    \"name\": \"href\",\n                                    \"value\": \"http://github.com/\"\n                                }\n                            ],\n                            \"LineNum\": 13,\n                            \"type\": \"tag\",\n                            \"tag\": \"a\"\n                        },\n                        {\n                            \"content\": \"\\r\\n\",\n                            \"LineNum\": 13,\n                            \"type\": \"text\"\n                        }\n                    ],\n                    \"attr\": [],\n                    \"LineNum\": 9,\n                    \"type\": \"tag\",\n                    \"tag\": \"body\"\n                },\n                {\n                    \"content\": \"\\r\\n\",\n                    \"LineNum\": 14,\n                    \"type\": \"text\"\n                }\n            ],\n            
\"attr\": [\n                {\n                    \"name\": \"lang\",\n                    \"value\": \"en\"\n                }\n            ],\n            \"LineNum\": 2,\n            \"type\": \"tag\",\n            \"tag\": \"html\"\n        }\n    ],\n    \"LineNum\": 1\n}\n\u003c/code\u003e\u003c/pre\u003e\n\u003c/details\u003e\n\n## Applications\n[Find nodes](https://github.com/liulinboyi/HTMLParser-App/tree/main/platform)\n\n## TIPS\n\n\u003e No runtime dependencies\n\nThis parser is not as forgiving as a browser, which parses any HTML without reporting errors no matter how it is written. Only some unusual constructs are handled. Some HTML is too irregular, and supporting it would take more branches, more special handling, and more effort than I chose to invest.\n\n## Notes\n\n#### ~~tsc does not add the .js suffix to imports when compiling, so the output cannot be used as ES modules; as a workaround, the .js suffix was added to the imports in every ts file~~\n#### ~~https://segmentfault.com/q/1010000038671707~~\n#### ~~[Community discussion](https://github.com/microsoft/TypeScript/issues/16577)~~\n\n#### Resolved: a [script](./script/addSuffixJs.js) now adds the .js suffix to the import and export statements of all compiled ES modules\n\n## [Tests](./test)\n#### The output was compared against the DOM structure generated by a browser using [playwright](https://github.com/microsoft/playwright.git). Apart from some unusual constructs, the results essentially match.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliulinboyi%2Fhtmlparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fliulinboyi%2Fhtmlparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliulinboyi%2Fhtmlparser/lists"}