Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/egoist/recrawler
Remote web content crawler done right.
- Host: GitHub
- URL: https://github.com/egoist/recrawler
- Owner: egoist
- License: mit
- Created: 2016-01-31T09:21:16.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2021-12-25T13:52:36.000Z (almost 3 years ago)
- Last Synced: 2024-10-04T12:05:41.919Z (about 1 month ago)
- Language: JavaScript
- Size: 45.9 KB
- Stars: 30
- Watchers: 5
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# recrawler [![NPM version](https://img.shields.io/npm/v/recrawler.svg)](https://npmjs.com/package/recrawler) [![NPM downloads](https://img.shields.io/npm/dm/recrawler.svg)](https://npmjs.com/package/recrawler) [![Circle CI](https://circleci.com/gh/egoist/recrawler/tree/master.svg?style=svg)](https://circleci.com/gh/egoist/recrawler/tree/master)
> Remote web content crawler done right.
## Motivation
Sometimes I want to grab some nice images from a URL like http://bbs.005.tv/thread-492392-1-1.html, so I made this little program, which combines `node-fetch` and `cheerio` to do exactly that.
## Install
```bash
$ npm install --save recrawler
```

For single-page apps (SPAs), please head to [recrawler-spa](https://github.com/egoist/recrawler-spa) instead.
## Usage
```js
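const recrawler = require('recrawler')

// Fetch the page, then log the `src` of every image matching the selector
recrawler('http://some-url.com/a/b/c')
  .then($ => {
    $('img.nice-images').each(function () {
      const url = $(this).attr('src')
      console.log(url)
    })
  })
  // Report network or parsing failures instead of leaving the rejection unhandled
  .catch(err => console.error(err))
```

## API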
### recrawler(url, opts)
#### opts
##### cheerio
Options passed to [cheerio](https://github.com/cheeriojs/cheerio). They follow cheerio's defaults, except that `decodeEntities` is `false` by default here.
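One plausible way such a default can coexist with user-supplied options is a shallow merge where the caller's keys win. The helper below is hypothetical and is not part of recrawler's API. It only illustrates the assumed precedence: the `decodeEntities: false` default applies unless `opts.cheerio` sets it explicitly.

```javascript
// Hypothetical helper, not recrawler's API: defaults first, so any key in opts.cheerio wins.
function resolveCheerioOptions(opts = {}) {
  return { decodeEntities: false, ...opts.cheerio }
}

console.log(resolveCheerioOptions()) // { decodeEntities: false }
console.log(resolveCheerioOptions({ cheerio: { decodeEntities: true } })) // { decodeEntities: true }
```

Under this assumption, calling `recrawler(url, { cheerio: { decodeEntities: true } })` would turn entity decoding back on.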
## License
MIT © [EGOIST](https://github.com/egoist)