{"id":20516389,"url":"https://github.com/chloro-pn/hparser","last_synced_at":"2026-06-01T04:31:56.048Z","repository":{"id":120001382,"uuid":"239760768","full_name":"chloro-pn/hparser","owner":"chloro-pn","description":"light-weight, simple and fast xhtml parser library for c++11 with DOM-like interface","archived":false,"fork":false,"pushed_at":"2020-02-14T09:35:29.000Z","size":144,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-05T23:27:56.580Z","etag":null,"topics":["cplusplus-11","xhtml","xml","xml-parser"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chloro-pn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-11T12:50:26.000Z","updated_at":"2020-02-14T11:49:09.000Z","dependencies_parsed_at":"2023-06-14T16:15:46.005Z","dependency_job_id":null,"html_url":"https://github.com/chloro-pn/hparser","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/chloro-pn/hparser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chloro-pn%2Fhparser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chloro-pn%2Fhparser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chloro-pn%2Fhparser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chloro-pn%2Fhparser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/owners/chloro-pn","download_url":"https://codeload.github.com/chloro-pn/hparser/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chloro-pn%2Fhparser/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33760645,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-01T02:00:06.963Z","response_time":115,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cplusplus-11","xhtml","xml","xml-parser"],"created_at":"2024-11-15T21:28:36.610Z","updated_at":"2026-06-01T04:31:56.030Z","avatar_url":"https://github.com/chloro-pn.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# hparser\nhparser is a lightweight, simple, and fast XHTML parser library for C++11 with a DOM-like interface. It supports UTF-8 encoded input (without a BOM).\n\nhparser provides read-only access to the parsed document; it does not provide an interface for modifying it.\n\nhparser supports access and queries based on regular expressions. Because std::regex currently has limited Unicode support, regular expressions that match characters outside the ASCII range may not produce correct results. To work around this, hparser provides the utf8_to_utf32 (std::string -\u003e std::u32string) and utf32_to_utf8 (std::u32string -\u003e std::string) interfaces. If you have a regular expression library that supports u32string (that is, char32_t storage), you can combine it with the hparser.find interface and utf8_to_utf32 to perform regular expression matching.\n\nhparser parses the text in a single pass and does not use recursive calls, so it imposes no limit on the maximum depth of the DOM tree; the practical limit depends on memory and other system resources.\n\n# license\nMIT License.\n\n# test\nUnit tests are based on Catch2.\n\n# build\nhparser is built with cmake:\n```\nmkdir build \u0026\u0026 cd build\ncmake 
..\nmake\n```\nThe static library libhparser.a is generated in (project_dir)/build/lib. The executables hparser_test and examples are generated in (project_dir)/build/bin; they are the unit test program and the example program, respectively. examples parses the file (project_dir)/examples/1.html and prints every element that has an \"href\" attribute, in the format tag : \"url\" \\n.\nAttribute values returned by hparser keep their surrounding \" characters; the quotes are not stripped.\n\n# doc\nhttps://segmentfault.com/a/1190000021749001\n\n# example\n```\n#include \"../include/hparser.h\"\n#include \u003cfstream\u003e\n#include \u003cstring\u003e\n#include \u003ciostream\u003e\n#include \u003ccassert\u003e\n\nint main() {\n  std::ifstream in(\"../../examples/1.html\", std::ios::binary);\n  assert(in.good() == true);\n  std::string content;\n  while(true) {\n    char tmp;\n    in.read(\u0026tmp, sizeof(tmp));\n    if(in.eof() == true) {\n      break;\n    }\n    assert(in.good() == true);\n    content.push_back(tmp);\n  }\n  in.close();\n  // Define and initialize an hparser object; parsing happens in the constructor.\n  hparser h(content);\n  // result has type std::vector\u003chparser::element_type\u003e. find_attr returns the elements that have an \"href\" attribute.\n  auto result = h.find_attr(\"href\");\n  for(auto it = result.begin(); it != result.end(); ++it) {\n    std::cout \u003c\u003c (*it)-\u003etag() \u003c\u003c \" : \" \u003c\u003c (*it)-\u003eoperator[](\"href\") \u003c\u003c std::endl;\n  }\n  return 0;\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchloro-pn%2Fhparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchloro-pn%2Fhparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchloro-pn%2Fhparser/lists"}