{"id":13439917,"url":"https://github.com/lexborisov/myhtml","last_synced_at":"2025-05-14T10:06:55.037Z","repository":{"id":42382306,"uuid":"45879586","full_name":"lexborisov/myhtml","owner":"lexborisov","description":"Fast C/C++ HTML 5 Parser. Using threads.","archived":false,"fork":false,"pushed_at":"2025-01-15T17:01:14.000Z","size":16683,"stargazers_count":1673,"open_issues_count":20,"forks_count":149,"subscribers_count":90,"default_branch":"master","last_synced_at":"2025-04-06T20:01:32.760Z","etag":null,"topics":["c","html","html-parser","pure-c"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-2.1","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lexborisov.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-11-10T01:40:13.000Z","updated_at":"2025-04-04T21:02:12.000Z","dependencies_parsed_at":"2025-02-10T05:30:58.444Z","dependency_job_id":null,"html_url":"https://github.com/lexborisov/myhtml","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lexborisov%2Fmyhtml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lexborisov%2Fmyhtml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lexborisov%2Fmyhtml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lexborisov%2Fmyhtml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lexborisov","download_url":"https://codeload.github.com/lexborisov/myhtml/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248788874,"owners_count":21161743,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","html","html-parser","pure-c"],"created_at":"2024-07-31T03:01:18.165Z","updated_at":"2025-04-13T22:18:38.549Z","avatar_url":"https://github.com/lexborisov.png","language":"C","readme":"# MyHTML — a pure C HTML parser\n\n[![Build Status](https://travis-ci.org/lexborisov/myhtml.svg?branch=master)](https://travis-ci.org/lexborisov/myhtml)\n\nMyHTML is a fast HTML parser that uses threads, implemented as a pure C99 library with no outside dependencies.\n\n## Now\n\n### Important announcement!\n\nPlease use the HTML parser from the [Lexbor project](https://github.com/lexbor/lexbor) instead. It is stable, has more features, and is very fast.\n\n## Features\n\n- Asynchronous parsing, tree building, and indexing\n- Fully conformant with the [HTML5 specification]\n- Two APIs: [high]-level and [low]-level\n- Manipulation of elements: add, change, delete, and more\n- Manipulation of element attributes: add, change, delete, and more\n- Support for 39 character encodings defined by the WHATWG Encoding Standard ([encoding.spec.whatwg.org])\n- Support for character encoding detection\n- Support for Single Mode (single-threaded) parsing\n- Support for building without POSIX Threads\n- Support for fragment parsing\n- Support for [parsing by chunks]\n- No outside dependencies\n- C99 support\n- Passes all tree construction tests from [html5lib-tests]\n- Tested on 1 billion HTML pages (from [commoncrawl.org])\n\n## Changes\n\nSee the [CHANGELOG.md] file.\n\n## Further developments\n\n- [Modest] — Modest is a fast HTML renderer implemented as a pure C99 library with no outside dependencies\n- [MyCSS] — Fast C/C++ CSS Parser (Cascading Style Sheets Parser)\n\n## Supported input stream encodings\n\n```text\nX_USER_DEFINED, UTF_8, UTF_16LE, UTF_16BE, BIG5, EUC_KR, GB18030,\nIBM866, ISO_8859_10, ISO_8859_13, ISO_8859_14, ISO_8859_15, ISO_8859_16, ISO_8859_2, ISO_8859_3,\nISO_8859_4, ISO_8859_5, ISO_8859_6, ISO_8859_7, ISO_8859_8, KOI8_R, KOI8_U, MACINTOSH,\nWINDOWS_1250, WINDOWS_1251, WINDOWS_1252, WINDOWS_1253, WINDOWS_1254, WINDOWS_1255, WINDOWS_1256,\nWINDOWS_1257, WINDOWS_1258, WINDOWS_874, X_MAC_CYRILLIC, ISO_2022_JP, GBK, SHIFT_JIS, EUC_JP, ISO_8859_8_I\n```\n\n## Output encoding\n\n**The library works in UTF-8 and returns all output in UTF-8.**\n\n## Detecting character encodings\n\nCurrently detected: UTF-8, UTF-16LE, UTF-16BE, and the Cyrillic encodings windows-1251, koi8-r, iso-8859-5, x-mac-cyrillic, and ibm866.\n\n## Installation\n\nSee [INSTALL.md](https://github.com/lexborisov/myhtml/blob/master/INSTALL.md).\n\n## Introduction\n\n[Introduction]\n\n## Benchmark\n\n- [Article with charts]\n- [Benchmark code]\n- [Images and CSV]\n\n## Dependencies\n\nNone\n\n## External Bindings and Wrappers\n\n- Perl 5 [HTML::MyHTML] module\n- Perl 5 [HTML5::DOM] module (DOM with CSS selectors)\n- [Perl 6] module\n- [Crystal] binding\n- [Elixir/Erlang] binding\n- [Swift wrapper](https://github.com/adtrevor/MyHTML)\n\n## Examples\n\nSee the [examples] directory.\n\n**Simple example**\n\n```c\n#include \u003cstdio.h\u003e\n#include \u003cstdlib.h\u003e\n#include \u003cstring.h\u003e\n\n#include \u003cmyhtml/api.h\u003e\n\nint main(int argc, const char * argv[])\n{\n    char html[] = \"\u003cdiv\u003e\u003cspan\u003eHTML\u003c/span\u003e\u003c/div\u003e\";\n    \n    // basic init: default options, 1 thread, queue size 0\n    myhtml_t* myhtml = myhtml_create();\n    myhtml_init(myhtml, MyHTML_OPTIONS_DEFAULT, 1, 0);\n    \n    // create a tree and bind it to the myhtml instance\n    myhtml_tree_t* tree = myhtml_tree_create();\n    myhtml_tree_init(tree, myhtml);\n    \n    // parse the UTF-8 input\n    myhtml_parse(tree, MyENCODING_UTF_8, html, strlen(html));\n    \n    // serialize the whole document into a buffer and print it\n    // (for callback-based output, see myhtml_serialization_tree_callback)\n    mycore_string_raw_t str = {0};\n    myhtml_serialization_tree_buffer(myhtml_tree_get_document(tree), \u0026str);\n    printf(\"%s\\n\", str.data);\n    \n    // release resources\n    mycore_string_raw_destroy(\u0026str, false);\n    myhtml_tree_destroy(tree);\n    myhtml_destroy(myhtml);\n    \n    return 0;\n}\n```\n\n## AUTHOR\n\nAlexander Borisov \u003clex.borisov@gmail.com\u003e\n\n## COPYRIGHT AND LICENSE\n\nCopyright (C) 2015-2018 Alexander Borisov\n\nThis library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.\n\nThis library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more details.\n\nYou should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301 USA\n\nSee the [LICENSE] file.\n\n\n[HTML5 specification]: https://html.spec.whatwg.org/multipage/\n[Modest]: https://github.com/lexborisov/Modest\n[high]: https://github.com/lexborisov/myhtml/blob/master/include/myhtml/api.h\n[low]: https://github.com/lexborisov/myhtml/tree/master/include/myhtml\n[examples]: https://github.com/lexborisov/myhtml/tree/master/examples\n[parsing by chunks]: https://github.com/lexborisov/myhtml/blob/master/examples/myhtml/chunks_high_level.c\n[encoding.spec.whatwg.org]: https://encoding.spec.whatwg.org/\n[html5lib-tests]: https://github.com/html5lib/html5lib-tests\n[commoncrawl.org]: http://commoncrawl.org/\n[MyCSS]: https://github.com/lexborisov/mycss\n[CHANGELOG.md]: https://github.com/lexborisov/myhtml/blob/master/CHANGELOG.md\n[HTML::MyHTML]: https://metacpan.org/release/HTML-MyHTML\n[HTML5::DOM]: https://github.com/Azq2/perl-html5-dom\n[Perl 6]: https://github.com/MadcapJake/p6-MyHTML\n[Crystal]: https://github.com/kostya/myhtml\n[Elixir/Erlang]: https://github.com/Overbryd/myhtmlex\n[Introduction]: http://lexborisov.github.io/myhtml/\n[Article with charts]: http://lexborisov.github.io/benchmark-html-persers/\n[Benchmark code]: https://github.com/lexborisov/benchmark-html-persers/tree/master\n[Images and CSV]: https://github.com/lexborisov/benchmark-html-persers/tree/master/Results\n[LICENSE]: https://github.com/lexborisov/myhtml/blob/master/LICENSE\n","funding_links":[],"categories":["C"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flexborisov%2Fmyhtml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flexborisov%2Fmyhtml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flexborisov%2Fmyhtml/lists"}