https://github.com/egoist/recrawler

Remote web content crawler done right.

Host: GitHub
URL: https://github.com/egoist/recrawler
Owner: egoist
License: mit
Created: 2016-01-31T09:21:16.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2021-12-25T13:52:36.000Z (over 3 years ago)
Last Synced: 2025-04-30T21:09:36.699Z (3 months ago)
Language: JavaScript
Size: 45.9 KB
Stars: 30
Watchers: 4
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# recrawler [![NPM version](https://img.shields.io/npm/v/recrawler.svg)](https://npmjs.com/package/recrawler) [![NPM downloads](https://img.shields.io/npm/dm/recrawler.svg)](https://npmjs.com/package/recrawler) [![Circle CI](https://circleci.com/gh/egoist/recrawler/tree/master.svg?style=svg)](https://circleci.com/gh/egoist/recrawler/tree/master)

> Remote web content crawler done right.

## Motivation

Sometimes I want to grab some nice images from a URL like http://bbs.005.tv/thread-492392-1-1.html, so I made this little program, which combines `node-fetch` and `cheerio`, to do just that.

## Install

```bash
$ npm install --save recrawler
```

For Single Page Apps, please head to [recrawler-spa](https://github.com/egoist/recrawler-spa).

## Usage

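```js
const recrawler = require('recrawler')

// Fetch the page; the promise resolves with a cheerio-style `$` for its HTML
recrawler('http://some-url.com/a/b/c')
	.then($ => {
		// Log the `src` attribute of every matching image
		$('img.nice-images').each(function () {
			const url = $(this).attr('src')
			console.log(url)
		})
	})
	.catch(err => console.error(err))
```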

## API

### recrawler(url, opts)

#### opts

##### cheerio

Options passed through to [cheerio](https://github.com/cheeriojs/cheerio), except that `decodeEntities` defaults to `false` here.
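
The default described above can be pictured as a simple merge of the caller's `opts.cheerio` over a base object. The sketch below is only an illustration of that behaviour, not recrawler's actual source: `resolveCheerioOptions` and `DEFAULT_CHEERIO_OPTIONS` are hypothetical names introduced here.

```javascript
// Hypothetical helper, not part of recrawler's API: it only illustrates
// how a `decodeEntities: false` default could be combined with user options.
const DEFAULT_CHEERIO_OPTIONS = { decodeEntities: false }

function resolveCheerioOptions(opts = {}) {
	// Values supplied under `opts.cheerio` take precedence over the default.
	return Object.assign({}, DEFAULT_CHEERIO_OPTIONS, opts.cheerio)
}

// No options given: the default applies.
console.log(resolveCheerioOptions())

// An explicit value overrides the default.
console.log(resolveCheerioOptions({ cheerio: { decodeEntities: true } }))
```

Under that reading, a call such as `recrawler(url, { cheerio: { decodeEntities: true } })` would opt back in to entity decoding, while omitting `opts` leaves it off.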

## License

MIT © [EGOIST](https://github.com/egoist)
