Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/phensley/python-readable
Python port of Arc90's Readability content extraction rules
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/phensley/python-readable
- Owner: phensley
- License: apache-2.0
- Archived: true
- Created: 2010-12-22T21:30:01.000Z (about 14 years ago)
- Default Branch: master
- Last Pushed: 2010-12-24T16:22:03.000Z (about 14 years ago)
- Last Synced: 2024-08-04T04:03:42.781Z (5 months ago)
- Language: JavaScript
- Homepage:
- Size: 381 KB
- Stars: 13
- Watchers: 3
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.mkd
- License: LICENSE
Awesome Lists containing this project
- starred-awesome - python-readable - Python port of Arc90's Readability content extraction rules (JavaScript)
README
readable - A port of Arc90's Readability JavaScript rules to Python:
http://lab.arc90.com/experiments/readability
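To give a rough idea of what the Readability rules do, here is a simplified sketch of the paragraph-scoring step used to find the article body. It is not code from this port. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies. The thresholds and weights (a 25-character minimum, one point per comma, up to three points for length, and half credit passed to the grandparent) are assumptions modeled on the original script. The function names score_paragraphs and best_candidate are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def score_paragraphs(root):
    """Hypothetical helper: credit each <p>'s parent (and, at half weight,
    its grandparent) with a score based on the paragraph's text."""
    # ElementTree has no getparent(), so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    scores = defaultdict(float)
    for p in root.iter("p"):
        text = "".join(p.itertext()).strip()
        if len(text) < 25:  # assumed cutoff: ignore very short paragraphs
            continue
        # Base point, plus one per comma, plus up to 3 points for length.
        score = 1 + text.count(",") + min(len(text) // 100, 3)
        parent = parent_of.get(p)
        if parent is None:
            continue
        scores[parent] += score
        grandparent = parent_of.get(parent)
        if grandparent is not None:
            scores[grandparent] += score / 2
    return scores


def best_candidate(root):
    """Return the highest-scoring container element, or None."""
    scores = score_paragraphs(root)
    return max(scores, key=scores.get) if scores else None


doc = ET.fromstring(
    "<body>"
    "<div id='nav'><p>Home</p><p>About</p></div>"
    "<div id='article'>"
    "<p>This is a long paragraph, with commas, that looks like real article content.</p>"
    "<p>Another substantial paragraph, again with a comma or two, follows here.</p>"
    "</div></body>"
)
print(best_candidate(doc).get("id"))  # article
```

The short navigation links are skipped, so the div that holds the two long, comma-rich paragraphs wins. The full rules also adjust scores by tag, class, id, and link density, and those adjustments are omitted here.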
License: Apache 2.0.
Depends on the Python lxml module.
Currently only article body extraction is implemented. Its output differs
slightly from that of the JS version, because the browser DOM and Python's
lxml represent and manipulate the document differently. For instance, the
browser DOM stores text as separate text nodes, while lxml stores it as
attributes on elements: .text holds the text before an element's first child,
and .tail holds the text that follows the element's closing tag. These
differences made translating the algorithm somewhat more complicated.
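The sketch below shows the .text/.tail difference in practice. It uses the standard library's xml.etree.ElementTree, which follows the same text/tail model as lxml.etree. In a browser, removing the <b> element leaves the following " world" text node in place. In the ElementTree model, that text is the tail of <b>, so a plain remove() discards it. The helper remove_keep_tail is hypothetical and shows one way to keep the tail.

```python
import xml.etree.ElementTree as ET

# ElementTree and lxml.etree share the same .text/.tail model.
p = ET.fromstring("<p>Hello <b>bold</b> world</p>")
b = p.find("b")
print(repr(p.text))  # 'Hello '  -> text before the first child
print(repr(b.text))  # 'bold'    -> text inside <b>
print(repr(b.tail))  # ' world'  -> text after </b>, stored on <b> itself


def remove_keep_tail(parent, child):
    """Hypothetical helper: remove child but keep its tail text, which a
    browser DOM would have left behind as a sibling text node."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index > 0:
        # Attach the tail to the previous sibling's tail.
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + tail
    else:
        # No previous sibling: the tail joins the parent's leading text.
        parent.text = (parent.text or "") + tail
    parent.remove(child)


remove_keep_tail(p, b)
print(ET.tostring(p, encoding="unicode"))  # <p>Hello  world</p>
```

In lxml itself, lxml.html elements provide drop_tree(), which likewise removes an element while joining its tail text to the previous sibling or parent.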