https://github.com/egoist/recrawler
Remote web content crawler done right.
- Host: GitHub
- URL: https://github.com/egoist/recrawler
- Owner: egoist
- License: mit
- Created: 2016-01-31T09:21:16.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2021-12-25T13:52:36.000Z (over 3 years ago)
- Last Synced: 2025-04-30T21:09:36.699Z (3 months ago)
- Language: JavaScript
- Size: 45.9 KB
- Stars: 30
- Watchers: 4
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# recrawler
> Remote web content crawler done right.
## Motivation
Sometimes I want to grab some nice images from a URL like http://bbs.005.tv/thread-492392-1-1.html, so I made this little program, which combines `node-fetch` and `cheerio`, to do exactly that.
## Install
```bash
$ npm install --save recrawler
```

For Single Page Apps, please head to [recrawler-spa](https://github.com/egoist/recrawler-spa).
## Usage
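```js
const recrawler = require('recrawler')

// `$` is a cheerio-style selector loaded with the fetched page
recrawler('http://some-url.com/a/b/c')
  .then($ => {
    // Log the `src` of every matching image
    $('img.nice-images').each(function () {
      const url = $(this).attr('src')
      console.log(url)
    })
  })
```

## API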
### recrawler(url, opts)
#### opts
##### cheerio
Options passed to [cheerio](https://github.com/cheeriojs/cheerio). They follow cheerio's defaults, except that `decodeEntities` is `false` by default here.
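
As a rough sketch of how the `cheerio` option might be used, the snippet below turns `decodeEntities` back on and collects absolute image URLs. `collectImageUrls` is a hypothetical helper, not part of recrawler. It takes the crawler as its first argument and assumes that the resolved `$` behaves like the one in the usage example above.

```javascript
// Hypothetical helper, not part of recrawler's API.
// `crawl` must behave like recrawler: (url, opts) => Promise resolving to `$`.
async function collectImageUrls(crawl, pageUrl, selector) {
  // Override the documented default by turning entity decoding back on.
  const $ = await crawl(pageUrl, { cheerio: { decodeEntities: true } })
  const urls = []
  $(selector).each(function () {
    const src = $(this).attr('src')
    // Skip images without a src; resolve relative paths against the page URL.
    if (src) urls.push(new URL(src, pageUrl).href)
  })
  return urls
}

// Usage, assuming recrawler is installed:
// const recrawler = require('recrawler')
// collectImageUrls(recrawler, 'http://some-url.com/a/b/c', 'img.nice-images')
//   .then(urls => console.log(urls))
```

Passing the crawler in as an argument keeps the helper easy to try with a stub before pointing it at a real page.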
## License
MIT © [EGOIST](https://github.com/egoist)