Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/olragon/node-readability
Turn any web page into a clean view
- Host: GitHub
- URL: https://github.com/olragon/node-readability
- Owner: olragon
- Fork: true (luin/readability)
- Created: 2014-03-22T05:50:56.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2014-03-22T05:53:11.000Z (over 10 years ago)
- Last Synced: 2024-10-04T12:06:08.171Z (2 months ago)
- Language: JavaScript
- Homepage:
- Size: 185 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Readability
Turn any web page into a clean view. This module is based on arc90's readability project.
[![Build Status](https://travis-ci.org/luin/node-readability.png?branch=master)](https://travis-ci.org/luin/node-readability)
### Features
1. Optimized for more websites.
2. Support encodings such as GBK and GB2312.
3. Converts relative URLs to absolute for images and links automatically (thanks to [Guillermo Baigorria](https://github.com/gbaygon) & [Tom Sutton](https://github.com/tomsutton1984)).

## Install

    npm install node-readability

## Usage
`read(html [, options], callback)`
Where

* **html** is a URL or HTML code.
* **options** is an optional options object.
* **callback** is the callback to run: `callback(error, article, meta)`.

Example

    var read = require('node-readability');

    read('http://howtonode.org/really-simple-file-uploads', function(err, article, meta) {
      // The main body of the page.
      console.log(article.content);
      // The title of the page.
      console.log(article.title);

      // The raw HTML code of the page
      console.log(article.html);
      // The document object of the page
      console.log(article.document);

      // The response object from request lib
      console.log(meta);
    });

**NB** If the page is marked with a charset other than utf-8, it will be converted automatically. Charsets such as GBK and GB2312 are also supported.

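The callback receives an error as its first argument, and the article object section notes that `article.content` is `false` when extraction fails. The following is a minimal sketch of a callback body that checks both before using the result. `handleArticle` and the shape of its return value are hypothetical, chosen for illustration; they are not part of node-readability.

```javascript
// Hypothetical helper: not part of node-readability itself.
function handleArticle(err, article, meta) {
  // Request or parsing errors arrive as the first argument.
  if (err) {
    return { ok: false, reason: String(err.message || err) };
  }
  // The README states that content is `false` when extraction fails.
  if (!article || article.content === false) {
    return { ok: false, reason: 'no article content found' };
  }
  return { ok: true, title: article.title, content: article.content };
}

// Usage, assuming node-readability is installed:
// var read = require('node-readability');
// read('http://howtonode.org/really-simple-file-uploads', function(err, article, meta) {
//   console.log(handleArticle(err, article, meta));
// });
```

Keeping the checks in a plain function means they can be exercised with stub values, without fetching a page.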
## Options
node-readability passes the options directly to [request](https://github.com/mikeal/request). See the request documentation for all available options.

node-readability has an additional option, `cleanRules`, which lets you set your own validation rules for tags. A rule is considered valid if it returns `true`, and not otherwise.

    options.cleanRules = [callback(obj, tagName)]

```
read(url, {
  cleanRulers : [
    function(obj, tag) {
      if(tag === 'object') {
        if(obj.getAttribute('class') === 'BrightcoveExperience') {
          return true;
        }
      }
    }
  ]
}, function(err, article, response) {});
```
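
The rule in the example above can also be written as a named function, which makes it easier to test in isolation. This is only a sketch: `isBrightcovePlayer` and `stubElement` are hypothetical names, the stub stands in for a real DOM element, and the rule returns an explicit `false` for anything it does not recognise.

```javascript
// Hypothetical rule: returns true only for Brightcove <object> embeds.
function isBrightcovePlayer(obj, tag) {
  return tag === 'object' &&
    obj.getAttribute('class') === 'BrightcoveExperience';
}

// A stand-in for a DOM element, used only to exercise the rule.
function stubElement(className) {
  return {
    getAttribute: function(name) {
      return name === 'class' ? className : null;
    }
  };
}

console.log(isBrightcovePlayer(stubElement('BrightcoveExperience'), 'object')); // true
console.log(isBrightcovePlayer(stubElement('ad-banner'), 'object'));            // false
console.log(isBrightcovePlayer(stubElement('BrightcoveExperience'), 'embed'));  // false
```

The named function can then be placed in the rules array passed to `read`, in place of the inline function.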
## article object

### content
The article content of the web page. Returns `false` if extraction fails.
### title
The article title of the web page. It may not be the same as the text in the `<title>` tag.
### html
The original html of the web page.
### document
The document of the web page generated by jsdom. You can use it to access the DOM directly (for example, `article.document.getElementById('main')`).

## meta object
The response object from the request library. It can be useful if you need the current URL after all redirects, or some of the response headers.
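
As a sketch of reading the final URL and headers from `meta`, the helper below assumes the response object has the request library's usual shape: `statusCode` and `headers` on the response itself, and the URL after redirects at `request.uri.href`. `summarizeMeta` is a hypothetical helper, and the object passed to it at the end is a stub standing in for a real response.

```javascript
// Hypothetical helper: pulls a few commonly needed fields out of `meta`.
function summarizeMeta(meta) {
  // Assumption: the final URL after redirects lives at meta.request.uri.href.
  var uri = meta && meta.request && meta.request.uri;
  return {
    finalUrl: uri ? uri.href : null,
    statusCode: meta ? meta.statusCode : null,
    contentType: (meta && meta.headers) ? (meta.headers['content-type'] || null) : null
  };
}

// Stub response, used here instead of a real network call.
console.log(summarizeMeta({
  statusCode: 200,
  headers: { 'content-type': 'text/html; charset=gbk' },
  request: { uri: { href: 'http://example.com/final' } }
}));
```

Inside a `read` callback, the same helper would be called as `summarizeMeta(meta)`.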
## Contributors
https://github.com/luin/node-readability/graphs/contributors
## License
This code is under the Apache License 2.0. http://www.apache.org/licenses/LICENSE-2.0
[![Bitdeli Badge](https://d2weczhvl823v0.cloudfront.net/luin/node-readability/trend.png)](https://bitdeli.com/free "Bitdeli Badge")