https://github.com/appliedsoul/crawlmatic
Static and Dynamic website crawling library - a common promise-based wrapper around the node-crawler & hccrawler libraries.
- Host: GitHub
- URL: https://github.com/appliedsoul/crawlmatic
- Owner: AppliedSoul
- License: other
- Created: 2018-08-14T14:42:44.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2020-06-02T09:06:56.000Z (almost 5 years ago)
- Last Synced: 2025-02-14T12:36:47.862Z (3 months ago)
- Topics: crawler, scraper
- Language: JavaScript
- Size: 44.9 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: license.txt
README
# crawlmatic
[npm](https://nodei.co/npm/crawlmatic/) [Build Status](https://travis-ci.org/AppliedSoul/crawlmatic) [Coverage Status](https://coveralls.io/github/AppliedSoul/crawlmatic?branch=master) [Greenkeeper](https://greenkeeper.io/) [Package Quality](http://packagequality.com/#?package=crawlmatic)
Single library for static or dynamic website crawling needs.
A standard wrapper for [HCCrawler](https://github.com/yujiosaka/headless-chrome-crawler/blob/master/docs/API.md) & [node-crawler](https://github.com/bda-research/node-crawler), based on bluebird promises.

Install using npm:
```
npm i crawlmatic --save
```

Dynamic crawling example:
```javascript
const {
  DynamicCrawler
} = require('crawlmatic');

// Initialize with HCCrawler options
const crawler = new DynamicCrawler({
  // dynamically evaluate page title
  evaluatePage: (() => ({
    title: $('title').text(),
  }))
});

// Setup - resolved when Chromium instance is up
crawler.setup().then(() => {
  // make requests with HCCrawler queue options
  crawler.request({
    url: "http://example.com"
  }).then((resp) => {
    console.log(resp.result.title);
    // destroy the instance
    process.nextTick(() => crawler.destroy());
  });
});
```
Static crawling example:
```javascript
const {
  StaticCrawler
} = require('crawlmatic');

// Initialize with node-crawler options
const staticCrawler = new StaticCrawler({
  maxConnections: 10,
  retries: 3
});

// setup() initializes the internal node-crawler instance and resolves a promise
staticCrawler.setup().then(() => {
  // make a request with node-crawler queue options
  staticCrawler.request({
    url: 'http://example.com'
  }).then((resp) => {
    // server-side response parsing using cheerio
    let $ = resp.$;
    console.log($("title").text());
    // destroy the instance
    process.nextTick(() => staticCrawler.destroy());
  });
});
```
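Because `request()` returns a promise, several URLs can be fetched concurrently with `Promise.all`. This is a minimal sketch of that pattern; the `crawler` object below is a hypothetical stand-in that only mimics the promise-based `request({ url })` shape shown above, not the real crawlmatic instance.

```javascript
// Hypothetical stand-in with the same promise-based request() shape
// as crawlmatic's crawlers (not the real library).
const crawler = {
  request: ({ url }) => Promise.resolve({ url, title: `Title of ${url}` })
};

const urls = ['http://example.com/a', 'http://example.com/b'];

// Fan out all requests and wait for every response
Promise.all(urls.map(url => crawler.request({ url })))
  .then(results => {
    results.forEach(r => console.log(`${r.url} -> ${r.title}`));
  });
```

With a real `StaticCrawler` or `DynamicCrawler`, the same fan-out would sit between `setup()` resolving and `destroy()` being called.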