Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/buriy/python-readability
fast python port of arc90's readability tool, updated to match latest readability.js!
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/buriy/python-readability
- Owner: buriy
- License: apache-2.0
- Fork: true (timbertson/python-readability)
- Created: 2011-05-02T18:51:48.000Z (over 13 years ago)
- Default Branch: master
- Last Pushed: 2024-08-15T12:15:29.000Z (3 months ago)
- Last Synced: 2024-09-25T23:33:06.380Z (about 1 month ago)
- Language: Python
- Homepage: https://github.com/buriy/python-readability
- Size: 714 KB
- Stars: 2,651
- Watchers: 94
- Forks: 348
- Open Issues: 40
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
- my-awesome-github-stars - buriy/python-readability - fast python port of arc90's readability tool, updated to match latest readability.js! (Python)
- awesome-python-resources - GitHub - 34% open · ⏱️ 10.04.2022 (Network)
- awesome-starts - buriy/python-readability - fast python port of arc90's readability tool, updated to match latest readability.js! (HTML)
- starred-awesome - python-readability - fast python port of arc90's readability tool, updated to match latest readability.js! (HTML)
- Awesome-LLM-RAG-Application - python-readability
README
.. image:: https://travis-ci.org/buriy/python-readability.svg?branch=master
    :target: https://travis-ci.org/buriy/python-readability
.. image:: https://img.shields.io/pypi/v/readability-lxml.svg
    :target: https://pypi.python.org/pypi/readability-lxml

python-readability
==================

Given an HTML document, extract and clean up the main body text and title.

This is a Python port of a Ruby port of arc90's Readability project.

Installation
------------

It's easy using ``pip``, just run:

.. code-block:: bash

    $ pip install readability-lxml

As an alternative, you may also use conda to install, just run:

.. code-block:: bash

    $ conda install -c conda-forge readability-lxml

Usage
-----

.. code-block:: python

    >>> import requests
    >>> from readability import Document

    >>> response = requests.get('http://example.com')
    >>> doc = Document(response.content)
    >>> doc.title()
    'Example Domain'

    >>> doc.summary()
    """...\nExample Domain\n
    This domain is established to be used for illustrative examples in documents. You may
    use this\n domain in examples without prior coordination or asking for permission.
    \n...\n"""


Change Log
----------

- 0.8.2 Added article author(s) (thanks @mattblaha)
- 0.8.1 Fixed processing of non-ascii HTMLs via regexps.
- 0.8 Replaced XHTML output with HTML5 output in summary() call.
- 0.7.1 Support for Python 3.7. Fixed a slowdown when processing documents with lots of spaces.
- 0.7 Improved HTML5 tags handling. Fixed stripping unwanted HTML nodes (only first matching node was removed before).
- 0.6 Finally a release which supports Python versions 2.6, 2.7, 3.3 - 3.6
- 0.5 Preparing a release to support Python versions 2.6, 2.7, 3.3 and 3.4
- 0.4 Added Videos loading and allowed more images per paragraph
- 0.3 Added Document.encoding, positive\_keywords and negative\_keywords

Licensing
---------

This code is licensed under the Apache License 2.0.

Thanks to
---------

- Latest readability.js
- Ruby port by starrhorne and iterationlabs
- Python port by gfxmonk
- Decruft effort to move to lxml
- "BR to P" fix from readability.js which improves quality for smaller texts
- Contributions from GitHub users.