https://github.com/mitica/ascrape-js

Extracts article content from a web page.
article-extracting cheerio

Host: GitHub
URL: https://github.com/mitica/ascrape-js
Owner: mitica
Created: 2016-07-14T04:35:34.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2017-02-25T13:43:17.000Z (over 9 years ago)
Last Synced: 2025-06-26T23:58:59.605Z (12 months ago)
Topics: article-extracting, cheerio
Language: JavaScript
Size: 225 KB
Stars: 10
Watchers: 2
Forks: 5
Open Issues: 0
Metadata Files:
- Readme: README.md

# ascrape

Node.js module for extracting web page content using Cheerio.

This module is based on [luin](https://github.com/luin/readability)'s readability project.

## Install

```
npm install ascrape
```

## Usage

```
var scrape = require('ascrape');

scrape(html [, options], callback);
```

**Where**

- **html** is a URL or HTML source code.

- **options** is an optional options object

- **callback** is the callback to run - callback(error, article, meta)
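The callback receives two result values (`article` and `meta`), so a generic helper such as Node's `util.promisify` would resolve with only the first one. The following is a minimal sketch of a Promise wrapper, assuming the callback signature listed above. `scrapeAsync` is a hypothetical helper name and not part of ascrape; the scrape function is passed in as an argument so the snippet can run without the package installed. Passing an empty options object when none is given is also an assumption.

```javascript
// Sketch: wrap a callback-style scrape function in a Promise.
// `scrapeFn` is expected to follow the documented signature:
//   scrapeFn(html [, options], callback(error, article, meta))
// `scrapeAsync` is a hypothetical helper, not part of ascrape.
function scrapeAsync(scrapeFn, html, options) {
  return new Promise(function (resolve, reject) {
    scrapeFn(html, options || {}, function (err, article, meta) {
      if (err) {
        return reject(err);
      }
      // Keep both result values instead of dropping `meta`.
      resolve({ article: article, meta: meta });
    });
  });
}

// Possible usage (requires ascrape to be installed):
// scrapeAsync(require('ascrape'), 'http://howtonode.org/really-simple-file-uploads')
//   .then(function (result) { console.log(result.article.title); })
//   .catch(function (err) { console.error(err); });
```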

## Example

```
var scrape = require('ascrape');

scrape('http://howtonode.org/really-simple-file-uploads', function(err, article, meta) {
  // Main Article
  console.log(article.content.text());

  // Title
  console.log(article.title);

  // Article HTML Source Code
  console.log(article.content.html());
});
```

**NB** If the page declares a charset other than utf-8, it is converted automatically. Charsets such as GBK and GB2312 are also supported.

## Options

ascrape passes the options directly to the request library. See the request documentation for all available options.

ascrape has one additional option:

- **preprocess** - a function that checks or modifies the downloaded source before it is passed to ascrape.

```
// maxBodySize is a size limit you define yourself
scrape(url, {
  preprocess: function(source, response, contentType, callback) {
    if (source.length > maxBodySize) {
      return callback(new Error('too big'));
    }
    callback(null, source);
  }
}, function(err, article, response) {
  //...
});
```
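A size check like the one above can be packaged as a reusable factory. This is a minimal sketch: `limitBodySize` is a hypothetical helper name, the 1 MiB limit is an arbitrary illustration, and only the `preprocess` signature comes from this README. It assumes `source` exposes a `length` property, as the example above does.

```javascript
// Sketch: build a `preprocess` function that rejects oversized pages.
// `limitBodySize` is a hypothetical helper, not part of ascrape.
function limitBodySize(maxBodySize) {
  return function preprocess(source, response, contentType, callback) {
    if (source.length > maxBodySize) {
      return callback(
        new Error('Body too big: ' + source.length + ' > ' + maxBodySize)
      );
    }
    // Pass the unmodified source on for extraction.
    callback(null, source);
  };
}

// Options object that could be passed as the second argument to scrape().
// The limit value is an arbitrary example.
var options = { preprocess: limitBodySize(1024 * 1024) };
```

Keeping the limit in a closure avoids relying on a global such as `maxBodySize` and lets different calls use different limits.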

### Article object

- **content** - The article content of the web page, as a Cheerio object. It is `false` if extraction failed.

- **title** - The article title of the web page. It may differ from the text in the `<title>` tag.

- **excerpt** - The article description, taken from a description, og:description or twitter:description `<meta>` tag.
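Since `content` can be `false`, it is worth checking it before calling Cheerio methods on it. The sketch below shows one way to turn an article object into a plain summary; `summarize` is a hypothetical helper name, and it relies only on the three fields listed above plus Cheerio's `text()` and `html()` methods.

```javascript
// Sketch: convert an article object into a plain summary object,
// handling the documented failure case where `content` is false.
// `summarize` is a hypothetical helper, not part of ascrape.
function summarize(article) {
  if (!article || !article.content) {
    return null; // extraction failed, no main content available
  }
  return {
    title: article.title || '',
    excerpt: article.excerpt || '',
    // `content` is a Cheerio object, so text() and html() are available.
    text: article.content.text().trim(),
    html: article.content.html()
  };
}
```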

## License

MIT License
