Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bndr/node-read
Get Readable Content from any page. Based on Arc90's readability project using cheerio engine.
Last synced: 30 days ago
- Host: GitHub
- URL: https://github.com/bndr/node-read
- Owner: bndr
- License: apache-2.0
- Created: 2014-04-24T18:55:58.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2018-08-10T17:36:49.000Z (over 6 years ago)
- Last Synced: 2024-05-22T22:33:35.798Z (7 months ago)
- Language: JavaScript
- Size: 55.7 KB
- Stars: 634
- Watchers: 18
- Forks: 39
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
[![NPM](https://nodei.co/npm/node-read.png?downloads=true)](https://nodei.co/npm/node-read/)
# Node-read

Get Readable Content from any page. Based on Arc90's readability project.
## Features
1. Blazingly fast. This project is based on the Cheerio engine, which is 8x faster than JSDOM.
## Why not Node-readability
Before starting this project I used Node-readability, but that project's dependencies and the slowness of JSDOM made it very frustrating to work with. Compiling the contextify module (a JSDOM dependency) failed 9 times out of 10. And to use node-readability with node-webkit you had to manually rebuild contextify with nw-gyp, which is far from ideal.
So I decided to write my own version of Arc90's Readability using the fast Cheerio engine, with as few dependencies as possible.
The usage of this module is similar to node-readability, so it's easy to switch.
## Install

    npm install node-read

## Usage

`read(html [, options], callback)`

Where
* **html** is a URL or HTML code.
* **options** is an optional options object.
* **callback** is the callback to run - `callback(error, article, meta)`

Example

    var read = require('node-read');

    read('http://howtonode.org/really-simple-file-uploads', function(err, article, res) {
      // Main Article.
      console.log(article.content);

      // Title
      console.log(article.title);

      // HTML
      console.log(article.html);

      // DOM
      console.log(article.dom);
    });

## TODO
* Examples, Docs
* Get Comments with articles
* Get the Author of the article
* Better removal of unnecessary nodes
* Better scoring of content:
  - Based on siblings
  - Based on content length, common words
  - Link density, image density, density of other common elements
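
The callback-style signature shown under Usage, `read(html [, options], callback)` with `callback(error, article, meta)`, can be wrapped for Promise-based code. The following is only a sketch: `promisifyRead` and `readAsync` are hypothetical names that are not part of node-read, and `fakeRead` is a stand-in that mimics the documented callback shape so the snippet runs without the package installed. Its return values (including the `status` field) are made up for illustration.

```javascript
// Hypothetical helper (not part of node-read): wraps a callback-style
// read(html [, options], callback) function so that it returns a Promise.
function promisifyRead(read) {
  return function readAsync(html, options) {
    return new Promise(function (resolve, reject) {
      var done = function (err, article, meta) {
        if (err) return reject(err);
        // The callback delivers two results, so resolve with both in one object.
        resolve({ article: article, meta: meta });
      };
      // Forward options only when supplied, matching the optional argument.
      if (options === undefined) {
        read(html, done);
      } else {
        read(html, options, done);
      }
    });
  };
}

// Stand-in for require('node-read'); illustrative values only.
function fakeRead(html, options, callback) {
  if (typeof options === 'function') {
    callback = options;
    options = {};
  }
  if (!html) return callback(new Error('empty input'));
  callback(null, { title: 'Example', content: html }, { status: 200 });
}

var readAsync = promisifyRead(fakeRead);

readAsync('<p>Hello</p>').then(function (result) {
  console.log(result.article.title);
  console.log(result.article.content);
});
```

In real use, `promisifyRead(require('node-read'))` should behave the same way, assuming the module follows the callback signature documented above. Node's built-in `util.promisify` is not used here because it resolves with only the first result, which would drop `meta`.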