https://github.com/danielfsousa/jcrawler
:vertical_traffic_light: Asynchronous control flow wrapper to crawl websites
- Host: GitHub
- URL: https://github.com/danielfsousa/jcrawler
- Owner: danielfsousa
- License: mit
- Created: 2017-12-08T00:29:19.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-05-08T20:01:33.000Z (about 8 years ago)
- Last Synced: 2025-06-28T20:06:11.377Z (11 months ago)
- Language: JavaScript
- Homepage:
- Size: 35.2 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# jcrawler
Asynchronous control flow wrapper to crawl websites
## How to Install
```bash
npm install jcrawler
```
## Usage
```javascript
const jcrawler = require('jcrawler')
const puppeteer = require('puppeteer')

// the leading semicolon keeps the IIFE from being parsed as a call on require('puppeteer')
;(async () => {
  const crawler = jcrawler({
    puppeteer,
    concurrency: 2,
    rateLimit: 1000, // 1 second
    retries: 5,
    retryInterval: 1000, // 1 second
    backoff: 2, // multiplies the retryInterval for each retry
    log: true
  })

  crawler
    .on('data', data => console.log(data)) // events: data, error and end
    .on('error', err => console.error(err))
    .on('end', (data, results) => console.log(results.timer.time))

  const fruits = ['apple', 'banana', 'orange']

  await crawler.each(fruits, async (browser, page, fruit) => {
    // using puppeteer
    await page.goto('http://google.com')
    await page.type("input[title='Search']", fruit)
    await page.click("input[value=\"I'm Feeling Lucky\"]")
    await page.screenshot({ path: `${fruit}.png` })
  })
})()
```
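
The `retryInterval` and `backoff` options interact, so it can help to see the wait times they imply. The sketch below is not taken from jcrawler's source: `retryDelays` is a hypothetical helper, and it assumes the first retry waits `retryInterval` milliseconds and each later wait is multiplied by `backoff`. The library's actual schedule may differ.

```javascript
// Hypothetical helper, not part of jcrawler's API.
// Assumption: the first retry waits `retryInterval` ms and every
// following wait is the previous one multiplied by `backoff`.
function retryDelays ({ retries, retryInterval, backoff }) {
  const delays = []
  let delay = retryInterval
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(delay)
    delay *= backoff
  }
  return delays
}

// Same values as the options in the usage example above
console.log(retryDelays({ retries: 5, retryInterval: 1000, backoff: 2 }))
// [ 1000, 2000, 4000, 8000, 16000 ]
```

Under that assumption, a task that fails every attempt would spend about 31 seconds waiting between retries before the final failure is reported.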
## License
[MIT License](LICENSE) - [Daniel Sousa](https://github.com/danielfsousa)