{"id":13574230,"url":"https://github.com/buriy/python-readability","last_synced_at":"2025-10-21T13:07:56.951Z","repository":{"id":1457972,"uuid":"1692659","full_name":"buriy/python-readability","owner":"buriy","description":"fast python port of arc90's readability tool, updated to match latest readability.js!","archived":false,"fork":true,"pushed_at":"2025-01-12T19:05:10.000Z","size":752,"stargazers_count":2698,"open_issues_count":38,"forks_count":351,"subscribers_count":95,"default_branch":"master","last_synced_at":"2025-01-12T19:25:38.773Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/buriy/python-readability","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"timbertson/python-readability","license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/buriy.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2011-05-02T18:51:48.000Z","updated_at":"2025-01-12T19:05:13.000Z","dependencies_parsed_at":"2023-07-05T21:01:32.724Z","dependency_job_id":null,"html_url":"https://github.com/buriy/python-readability","commit_stats":{"total_commits":177,"total_committers":44,"mean_commits":"4.0227272727272725","dds":0.5932203389830508,"last_synced_commit":"b679ff761f8bfcfd19a3be9c9f4bc5168585884a"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buriy%2Fpython-readability","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buriy%2Fpython-readability/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buriy%2Fpython-readability/releases","manifests_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/buriy%2Fpython-readability/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/buriy","download_url":"https://codeload.github.com/buriy/python-readability/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235607123,"owners_count":19017298,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T15:00:48.417Z","updated_at":"2025-10-07T08:30:26.327Z","avatar_url":"https://github.com/buriy.png","language":"Python","readme":".. image:: https://travis-ci.org/buriy/python-readability.svg?branch=master\n    :target: https://travis-ci.org/buriy/python-readability\n.. image:: https://img.shields.io/pypi/v/readability-lxml.svg\n    :target: https://pypi.python.org/pypi/readability-lxml\n\npython-readability\n==================\n\nGiven an HTML document, extract and clean up the main body text and title.\n\nThis is a Python port of a Ruby port of `arc90's Readability\nproject \u003chttps://web.archive.org/web/20130519040221/http://www.readability.com/\u003e`__.\n\nInstallation\n------------\n\nIt's easy using ``pip``, just run:\n\n.. code-block:: bash\n\n    $ pip install readability-lxml\n\nAs an alternative, you may also use conda to install, just run:\n\n.. code-block:: bash\n\n    $ conda install -c conda-forge readability-lxml \n\nUsage\n-----\n\n.. 
code-block:: python\n\n    \u003e\u003e\u003e import requests\n    \u003e\u003e\u003e from readability import Document\n\n    \u003e\u003e\u003e response = requests.get('http://example.com')\n    \u003e\u003e\u003e doc = Document(response.content)\n    \u003e\u003e\u003e doc.title()\n    'Example Domain'\n\n    \u003e\u003e\u003e doc.summary()\n    \"\"\"\u003chtml\u003e\u003cbody\u003e\u003cdiv\u003e\u003cbody id=\"readabilityBody\"\u003e\\n\u003cdiv\u003e\\n    \u003ch1\u003eExample Domain\u003c/h1\u003e\\n\n    \u003cp\u003eThis domain is established to be used for illustrative examples in documents. You may\n    use this\\n    domain in examples without prior coordination or asking for permission.\u003c/p\u003e\n    \\n    \u003cp\u003e\u003ca href=\"http://www.iana.org/domains/example\"\u003eMore information...\u003c/a\u003e\u003c/p\u003e\\n\u003c/div\u003e\n    \\n\u003c/body\u003e\\n\u003c/div\u003e\u003c/body\u003e\u003c/html\u003e\"\"\"\n\nChange Log\n----------\n\n-  0.8.2 Added article author(s) (thanks @mattblaha)\n-  0.8.1 Fixed processing of non-ascii HTMLs via regexps.\n-  0.8 Replaced XHTML output with HTML5 output in summary() call.\n-  0.7.1 Support for Python 3.7 . Fixed a slowdown when processing documents with lots of spaces.\n-  0.7 Improved HTML5 tags handling. 
Fixed stripping unwanted HTML nodes (only first matching node was removed before).\n-  0.6 Finally a release which supports Python versions 2.6, 2.7, 3.3 - 3.6\n-  0.5 Preparing a release to support Python versions 2.6, 2.7, 3.3 and 3.4\n-  0.4 Added Videos loading and allowed more images per paragraph\n-  0.3 Added Document.encoding, positive\\_keywords and negative\\_keywords\n\nLicensing\n---------\n\nThis code is licensed under `the Apache License\n2.0 \u003chttp://www.apache.org/licenses/LICENSE-2.0\u003e`__.\n\nThanks to\n---------\n\n-  Latest `readability.js \u003chttps://github.com/MHordecki/readability-redux/blob/master/readability/readability.js\u003e`__\n-  Ruby port by starrhorne and iterationlabs\n-  `Python port \u003chttps://github.com/gfxmonk/python-readability\u003e`__ by gfxmonk\n-  `Decruft effort \u003chttps://web.archive.org/web/20110214150709/https://www.minvolai.com/blog/decruft-arc90s-readability-in-python/\u003e`__ to move to lxml\n-  \"BR to P\" fix from readability.js which improves quality for smaller texts\n-  Contributions from GitHub users.\n","funding_links":[],"categories":["Web Content Extracting","资源列表","Python","Web内容提取","HTML","开源工具","网络","Web Content Extracting [🔝](#readme)","Awesome Python"],"sub_categories":["网页内容提取","预处理","Web Content Extracting"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fburiy%2Fpython-readability","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fburiy%2Fpython-readability","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fburiy%2Fpython-readability/lists"}