https://github.com/reorx/readability
html main body extractor
- Host: GitHub
- URL: https://github.com/reorx/readability
- Owner: reorx
- License: MIT
- Created: 2013-01-27T05:34:36.000Z (about 12 years ago)
- Default Branch: master
- Last Pushed: 2015-07-15T07:33:01.000Z (almost 10 years ago)
- Last Synced: 2025-04-14T08:12:45.160Z (8 days ago)
- Language: HTML
- Size: 260 KB
- Stars: 17
- Watchers: 3
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE
README
Readability
===========

Another algorithm and implementation of the widely known readability concept.

Usage:

.. code-block:: python

    import requests
    from readability import Readability

    # Fetch the page as raw bytes, then decode before parsing.
    html = requests.get('http://blog.hucheng.com/articles/482.html').content
    parser = Readability(html.decode('utf8'))

    parser.title
    parser.article
    parser.article.get_text()
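
For readers unfamiliar with the readability concept, the general idea is to find the part of a page that holds the most prose and the fewest links. The sketch below illustrates that idea only. It is not this library's algorithm, and the names ``ParagraphCollector`` and ``extract_main_text`` are hypothetical. It uses only the Python standard library and assumes well-formed HTML in which every ``<p>`` is explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ParagraphCollector(HTMLParser):
    """Group <p> text by parent element and count how much of it is link text."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # ids of currently open elements, outermost first
        self.element_count = 0
        self.by_parent = {}      # parent id -> list of (text, link_chars)
        self.parent_id = None    # parent of the <p> currently being read
        self.in_paragraph = False
        self.chunks = []
        self.link_chars = 0
        self.link_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "p":
            self.in_paragraph = True
            self.parent_id = self.open_elements[-1] if self.open_elements else None
            self.chunks, self.link_chars = [], 0
        elif tag == "a":
            self.link_depth += 1
        self.element_count += 1
        self.open_elements.append(self.element_count)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_elements:
            return
        self.open_elements.pop()
        if tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag == "p" and self.in_paragraph:
            text = " ".join("".join(self.chunks).split())
            self.by_parent.setdefault(self.parent_id, []).append(
                (text, self.link_chars))
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.chunks.append(data)
            if self.link_depth:
                self.link_chars += len(" ".join(data.split()))


def extract_main_text(html):
    """Return the paragraphs of the container with the most non-link text."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    if not collector.by_parent:
        return ""

    def score(paragraphs):
        # Link-heavy paragraphs (menus, footers) contribute almost nothing.
        return sum(len(text) - links for text, links in paragraphs)

    best = max(collector.by_parent.values(), key=score)
    return "\n\n".join(text for text, _ in best if text)


sample = """
<html><body>
  <div id="nav"><p><a href="/">Home</a> <a href="/about">About</a></p></div>
  <div id="post">
    <p>Readability-style extractors look for the block with the most prose.</p>
    <p>Navigation and footers are mostly links, so they score low.</p>
  </div>
</body></html>
"""
print(extract_main_text(sample))
```

In this sample the navigation block scores close to zero because nearly all of its text sits inside links, so the two paragraphs of the post are returned. Real extractors typically add further signals, such as class names and punctuation counts.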