{"id":13610259,"url":"https://github.com/html5lib/html5lib-python","last_synced_at":"2025-05-13T20:18:07.575Z","repository":{"id":554943,"uuid":"9322649","full_name":"html5lib/html5lib-python","owner":"html5lib","description":"Standards-compliant library for parsing and serializing HTML documents and fragments in Python","archived":false,"fork":false,"pushed_at":"2024-02-27T19:49:36.000Z","size":6859,"stargazers_count":1189,"open_issues_count":90,"forks_count":294,"subscribers_count":50,"default_branch":"master","last_synced_at":"2025-04-28T11:52:30.025Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/html5lib.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-04-09T14:07:42.000Z","updated_at":"2025-04-16T07:20:28.000Z","dependencies_parsed_at":"2024-04-28T01:53:30.538Z","dependency_job_id":"49e2b227-76d5-47cb-9032-faf705bcd571","html_url":"https://github.com/html5lib/html5lib-python","commit_stats":{"total_commits":1536,"total_committers":68,"mean_commits":22.58823529411765,"dds":0.7701822916666666,"last_synced_commit":"3e500bb6e4188ea087f5b743a720ed9f4d9216f9"},"previous_names":[],"tags_count":27,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/html5lib%2Fhtml5lib-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/html5lib%2Fhtml5lib-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/html5lib%2Fhtml5lib-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/html5lib%2Fhtml5lib-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/html5lib","download_url":"https://codeload.github.com/html5lib/html5lib-python/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254020658,"owners_count":22000757,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T19:01:42.978Z","updated_at":"2025-05-13T20:18:07.550Z","avatar_url":"https://github.com/html5lib.png","language":"Python","readme":"html5lib\n========\n\n.. image:: https://github.com/html5lib/html5lib-python/actions/workflows/python-tox.yml/badge.svg\n    :target: https://github.com/html5lib/html5lib-python/actions/workflows/python-tox.yml\n\nhtml5lib is a pure-python library for parsing HTML. It is designed to\nconform to the WHATWG HTML specification, as is implemented by all major\nweb browsers.\n\n\nUsage\n-----\n\nSimple usage follows this pattern:\n\n.. code-block:: python\n\n  import html5lib\n  with open(\"mydocument.html\", \"rb\") as f:\n      document = html5lib.parse(f)\n\nor:\n\n.. code-block:: python\n\n  import html5lib\n  document = html5lib.parse(\"\u003cp\u003eHello World!\")\n\nBy default, the ``document`` will be an ``xml.etree`` element instance.\nWhenever possible, html5lib chooses the accelerated ``ElementTree``\nimplementation (i.e. 
``xml.etree.cElementTree`` on Python 2.x).\n\nTwo other tree types are supported: ``xml.dom.minidom`` and\n``lxml.etree``. To use an alternative format, specify the name of\na treebuilder:\n\n.. code-block:: python\n\n  import html5lib\n  with open(\"mydocument.html\", \"rb\") as f:\n      lxml_etree_document = html5lib.parse(f, treebuilder=\"lxml\")\n\nWhen using html5lib with ``urllib2`` (Python 2), pass the charset from the\nHTTP response as ``transport_encoding``:\n\n.. code-block:: python\n\n  from contextlib import closing\n  from urllib2 import urlopen\n  import html5lib\n\n  with closing(urlopen(\"http://example.com/\")) as f:\n      document = html5lib.parse(f, transport_encoding=f.info().getparam(\"charset\"))\n\nWhen using html5lib with ``urllib.request`` (Python 3), pass the charset\nfrom the HTTP response as ``transport_encoding``:\n\n.. code-block:: python\n\n  from urllib.request import urlopen\n  import html5lib\n\n  with urlopen(\"http://example.com/\") as f:\n      document = html5lib.parse(f, transport_encoding=f.info().get_content_charset())\n\nTo have more control over the parser, create a parser object explicitly.\nFor instance, to make the parser raise exceptions on parse errors, use:\n\n.. code-block:: python\n\n  import html5lib\n  with open(\"mydocument.html\", \"rb\") as f:\n      parser = html5lib.HTMLParser(strict=True)\n      document = parser.parse(f)\n\nWhen you're instantiating parser objects explicitly, pass a treebuilder\nclass as the ``tree`` keyword argument to use an alternative document\nformat:\n\n.. code-block:: python\n\n  import html5lib\n  parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder(\"dom\"))\n  minidom_document = parser.parse(\"\u003cp\u003eHello World!\")\n\nMore documentation is available at https://html5lib.readthedocs.io/.\n\n\nInstallation\n------------\n\nhtml5lib works on CPython 2.7+, CPython 3.5+ and PyPy. To install:\n\n.. code-block:: bash\n\n    $ pip install html5lib\n\nThe goal is to support a (non-strict) superset of the versions that `pip\nsupports\n\u003chttps://pip.pypa.io/en/stable/installing/#python-and-os-compatibility\u003e`_.\n\nOptional Dependencies\n---------------------\n\nThe following third-party libraries may be used for additional\nfunctionality:\n\n- ``lxml`` is supported as a tree format (for both building and\n  walking) under CPython (but *not* PyPy, where it is known to cause\n  segfaults);\n\n- ``genshi`` has a treewalker (but not builder); and\n\n- ``chardet`` can be used as a fallback when character encoding cannot\n  be determined.\n\n\nBugs\n----\n\nPlease report any bugs on the `issue tracker\n\u003chttps://github.com/html5lib/html5lib-python/issues\u003e`_.\n\n\nTests\n-----\n\nUnit tests require the ``pytest`` and ``mock`` libraries and can be\nrun using the ``pytest`` command in the root directory.\n\nTest data are contained in a separate `html5lib-tests\n\u003chttps://github.com/html5lib/html5lib-tests\u003e`_ repository and included\nas a submodule, so for git checkouts the submodule must be initialized::\n\n  $ git submodule init\n  $ git submodule update\n\nIf you have all compatible Python implementations available on your\nsystem, you can run tests on all of them using the ``tox`` utility,\nwhich can be found on PyPI.\n\n\nQuestions?\n----------\n\nCheck out `the docs \u003chttps://html5lib.readthedocs.io/en/latest/\u003e`_. Still\nneed help? Go to our `GitHub Discussions\n\u003chttps://github.com/html5lib/html5lib-python/discussions\u003e`_.\n\nYou can also browse the archives of the `html5lib-discuss mailing list\n\u003chttps://www.mail-archive.com/html5lib-discuss@googlegroups.com/\u003e`_.\n","funding_links":[],"categories":["HTML Manipulation","资源列表","Python","HTML操作","HTML Processing","HTML 处理","HTML Manipulation [🔝](#readme)","Awesome Python"],"sub_categories":["HTML 处理","HTML Manipulation"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhtml5lib%2Fhtml5lib-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhtml5lib%2Fhtml5lib-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhtml5lib%2Fhtml5lib-python/lists"}