{"id":13342980,"url":"https://github.com/Chi-EEE/html-parser","last_synced_at":"2025-03-12T03:30:46.689Z","repository":{"id":79679918,"uuid":"603367567","full_name":"Chi-EEE/html-parser","owner":"Chi-EEE","description":"C++ HTML parser that generates a simple DOM tree in C++17","archived":false,"fork":true,"pushed_at":"2023-12-19T10:29:57.000Z","size":98,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2024-10-24T13:58:59.603Z","etag":null,"topics":["boost","cpp","cpp-html-parser","cpp17","css","html","html-parser","html-parser-library","html5","library","parser","scraping","xmake"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"Menci/html-parser","license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Chi-EEE.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-02-18T09:45:55.000Z","updated_at":"2023-12-21T01:06:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"8cdcb965-aba2-4f9d-a0bf-9e897c6af780","html_url":"https://github.com/Chi-EEE/html-parser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chi-EEE%2Fhtml-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chi-EEE%2Fhtml-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chi-EEE%2Fhtml-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chi-EEE%2Fhtml-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owner
s/Chi-EEE","download_url":"https://codeload.github.com/Chi-EEE/html-parser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243150714,"owners_count":20244447,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["boost","cpp","cpp-html-parser","cpp17","css","html","html-parser","html-parser-library","html5","library","parser","scraping","xmake"],"created_at":"2024-07-29T19:30:09.178Z","updated_at":"2025-03-12T03:30:46.362Z","avatar_url":"https://github.com/Chi-EEE.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HTML Parser\n# Requirements\n* [XMake](https://xmake.io)\n\n# How to install\nUsing [XMake](https://xmake.io), run `xmake install` on the repository to install the library\n\n(Use `xmake f --boost=n` to disable installing the [Boost](https://github.com/boostorg/boost) Library beforehand)\n\n# API\nInclude `html-parser/HTMLDocument.h`.\n\n## HTMLDocument\nThe interface to parse HTML string and get data from it.\n\n### `HTMLDocument::HTMLDocument`\nConstruct a `HTMLDocument` object from a `std::istream` or string.\n\n```cpp\nusing namespace html_parser;\n\n// explicit HTMLDocument::HTMLDocument(std::istream \u0026)\nHTMLDocument document1(std::cin);\n\n// explicit HTMLDocument::HTMLDocument(std::istream \u0026\u0026)\nHTMLDocument document2(std::ifstream(\"index.html\"));\n\n// explicit HTMLDocument::HTMLDocument(const std::string \u0026)\nHTMLDocument document3(\"\u003cdiv\u003ea \u0026le; b\u003c/div\u003e\");\n```\n\n### `HTMLDocument::parse`\nParse HTML document from 
a new string, replacing the current document if one exists.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(std::cin);\n\n// void HTMLDocument::parse(const std::string \u0026)\ndocument.parse(\"\u003cdiv\u003ea \u0026le; b\u003c/div\u003e\");\n```\n\n### `HTMLDocument::inspect`\nPrint the colorized DOM tree of the HTML document to the terminal.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv\u003ea \u0026le; b\u003c/div\u003e\");\n\n// void HTMLDocument::inspect()\ndocument.inspect();\n```\n\n### `HTMLDocument::getTextContent`\nGet all text in the document.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv\u003ea \u0026le; b\u003c/div\u003e\u003cdiv\u003eqwq\u003c/div\u003e\");\n\n// std::string HTMLDocument::getTextContent()\nstd::string textContent = document.getTextContent();\n// textContent = \"a ≤ bqwq\"\n```\n\n### `HTMLDocument::getDirectTextContent`\nGet all the direct text in the document.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cspan class='myspan'\u003e Don't want this text \u003c/span\u003eI want this text\");\n\n// std::string HTMLDocument::getDirectTextContent()\nstd::string directTextContent = document.getDirectTextContent();\n// directTextContent = \"I want this text\"\n```\n\n### `HTMLDocument::getElementById`\nGet the element whose `id` attribute equals a given string. Return a `HTMLDocument::Element` object if found, or a null `HTMLDocument::Element` object if not found.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv id='my-div'\u003ea \u0026le; b\u003c/div\u003e\");\n\n// HTMLDocument::Element HTMLDocument::getElementById(const std::string \u0026)\nHTMLDocument::Element div = document.getElementById(\"my-div\");\n```\n\n### `HTMLDocument::getElementsByName`\nGet all elements whose `name` attribute equals a given string. 
Return a `std::vector\u003cHTMLDocument::Element\u003e` that contains all matching elements.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv name='my'\u003ea \u0026le; b\u003c/div\u003e\u003cspan name='my'\u003eqwq\u003c/span\u003e\");\n\n// std::vector\u003cHTMLDocument::Element\u003e HTMLDocument::getElementsByName(const std::string \u0026)\nstd::vector\u003cHTMLDocument::Element\u003e elements = document.getElementsByName(\"my\");\n```\n\n### `HTMLDocument::getElementsByTagName`\nGet all elements whose tag name equals a given string. Return a `std::vector\u003cHTMLDocument::Element\u003e` that contains all matching elements.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv\u003ea \u0026le; b\u003c/div\u003e\u003cdiv\u003eqwq\u003c/div\u003e\");\n\n// std::vector\u003cHTMLDocument::Element\u003e HTMLDocument::getElementsByTagName(const std::string \u0026)\nstd::vector\u003cHTMLDocument::Element\u003e elements = document.getElementsByTagName(\"div\");\n```\n\n### `HTMLDocument::getElementsByClassName`\nGet all elements which have a certain class. Return a `std::vector\u003cHTMLDocument::Element\u003e` that contains all matching elements.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv class='my-class'\u003ea \u0026le; b\u003c/div\u003e\u003cdiv class='my-class'\u003eqwq\u003c/div\u003e\");\n\n// std::vector\u003cHTMLDocument::Element\u003e HTMLDocument::getElementsByClassName(const std::string \u0026)\nstd::vector\u003cHTMLDocument::Element\u003e elements = document.getElementsByClassName(\"my-class\");\n```\n\n### `HTMLDocument::getChildren`\nGet all direct child elements of the document. 
Return a `std::vector\u003cHTMLDocument::Element\u003e` that contains all the child elements.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cspan\u003eFirst\u003c/span\u003e\u003cspan\u003eSecond\u003c/span\u003e\");\n\n// std::vector\u003cHTMLDocument::Element\u003e HTMLDocument::getChildren()\nstd::vector\u003cHTMLDocument::Element\u003e elements = document.getChildren();\n```\n\n## HTMLDocument::Element\nThe interface to get data from an HTML element or its subtree.\n\nThe default constructor constructs an empty element. Any operation on an empty element throws a `std::invalid_argument` exception, so check it with `if (element)` first.\n\n### `HTMLDocument::Element::inspect`\nPrint the colorized DOM tree of this element to the terminal.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv id='wrapper'\u003e\u003cdiv\u003ea \u0026le; b\u003c/div\u003e\u003c/div\u003e\");\nHTMLDocument::Element element = document.getElementById(\"wrapper\");\n\n// void HTMLDocument::Element::inspect()\nelement.inspect();\n```\n\n### `HTMLDocument::Element::getTextContent`\nGet all text in the element.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv id='wrapper'\u003e\u003cdiv\u003ea \u0026le; b\u003c/div\u003e\u003cdiv\u003eqwq\u003c/div\u003e\u003c/div\u003e\");\nHTMLDocument::Element element = document.getElementById(\"wrapper\");\n\n// std::string HTMLDocument::Element::getTextContent()\nstd::string textContent = element.getTextContent();\n// textContent = \"a ≤ bqwq\"\n```\n\n### `HTMLDocument::Element::getDirectTextContent`\nGet all the direct text in the element.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv id='wrapper'\u003e\u003cspan class='myspan'\u003e Don't want this text \u003c/span\u003eI want this text\u003c/div\u003e\");\nHTMLDocument::Element element = document.getElementById(\"wrapper\");\n\n// std::string HTMLDocument::Element::getDirectTextContent()\nstd::string 
directTextContent = element.getDirectTextContent();\n// directTextContent = \"I want this text\"\n```\n\n### `HTMLDocument::Element::getAttribute`\nGet the value of the element's attribute with the specified name. Return an empty string if not found.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv id='wrapper' data-url='/qwq'\u003e\u003c/div\u003e\");\nHTMLDocument::Element element = document.getElementById(\"wrapper\");\n\n// std::string HTMLDocument::Element::getAttribute(const std::string \u0026)\nstd::string value = element.getAttribute(\"data-url\");\n// value = \"/qwq\"\n```\n\n### `HTMLDocument::Element::getElementsByTagName`\nGet all elements whose tag name equals a given string. Return a `std::vector\u003cHTMLDocument::Element\u003e` that contains all matching elements.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv id='wrapper'\u003e\u003cdiv\u003ea \u0026le; b\u003c/div\u003e\u003cdiv\u003eqwq\u003c/div\u003e\u003c/div\u003e\");\nHTMLDocument::Element element = document.getElementById(\"wrapper\");\n\n// std::vector\u003cHTMLDocument::Element\u003e HTMLDocument::Element::getElementsByTagName(const std::string \u0026)\nstd::vector\u003cHTMLDocument::Element\u003e elements = element.getElementsByTagName(\"div\");\n```\n\n### `HTMLDocument::Element::getElementsByClassName`\nGet all elements which have a certain class. 
Return a `std::vector\u003cHTMLDocument::Element\u003e` that contains all matching elements.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv id='wrapper'\u003e\u003cdiv class='my-class'\u003ea \u0026le; b\u003c/div\u003e\u003cdiv class='my-class'\u003eqwq\u003c/div\u003e\u003c/div\u003e\");\nHTMLDocument::Element element = document.getElementById(\"wrapper\");\n\n// std::vector\u003cHTMLDocument::Element\u003e HTMLDocument::Element::getElementsByClassName(const std::string \u0026)\nstd::vector\u003cHTMLDocument::Element\u003e elements = element.getElementsByClassName(\"my-class\");\n```\n\n### `HTMLDocument::Element::getChildren`\nGet all child elements of the element upon which it was called. Return a `std::vector\u003cHTMLDocument::Element\u003e` that contains all the child elements.\n\n```cpp\nusing namespace html_parser;\n\nHTMLDocument document(\"\u003cdiv id='wrapper'\u003e\u003cspan\u003eFirst\u003c/span\u003e\u003cspan\u003eSecond\u003c/span\u003e\u003c/div\u003e\");\nHTMLDocument::Element element = document.getElementById(\"wrapper\");\n\n// std::vector\u003cHTMLDocument::Element\u003e HTMLDocument::Element::getChildren()\nstd::vector\u003cHTMLDocument::Element\u003e elements = element.getChildren();\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FChi-EEE%2Fhtml-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FChi-EEE%2Fhtml-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FChi-EEE%2Fhtml-parser/lists"}