https://github.com/appliedsoul/crawlmatic

Static and Dynamic website crawling library - a common promise based wrapper around node-crawler & hccrawler libraries.

# crawlmatic
[![npm package](https://nodei.co/npm/crawlmatic.png?downloads=true&downloadRank=true&stars=true)](https://nodei.co/npm/crawlmatic/)

[![Build Status](https://travis-ci.org/AppliedSoul/crawlmatic.svg?branch=master)](https://travis-ci.org/AppliedSoul/crawlmatic) [![Coverage Status](https://coveralls.io/repos/github/AppliedSoul/crawlmatic/badge.svg?branch=master)](https://coveralls.io/github/AppliedSoul/crawlmatic?branch=master) [![Greenkeeper badge](https://badges.greenkeeper.io/AppliedSoul/crawlmatic.svg)](https://greenkeeper.io/)[![Package quality](http://packagequality.com/shield/crawlmatic.svg)](http://packagequality.com/#?package=crawlmatic)

A single library for both static and dynamic website crawling.
It provides a common wrapper around [HCCrawler](https://github.com/yujiosaka/headless-chrome-crawler/blob/master/docs/API.md) and [node-crawler](https://github.com/bda-research/node-crawler), based on bluebird promises.

Install using npm:
```
npm i crawlmatic --save
```

Dynamic crawling example:
```javascript
const { DynamicCrawler } = require('crawlmatic');

// Initialize with HCCrawler options
const crawler = new DynamicCrawler({
  // Evaluated in the page to extract the title
  evaluatePage: () => ({
    title: $('title').text(),
  }),
});

// setup() resolves once the Chromium instance is up
crawler.setup().then(() => {
  // Make a request with HCCrawler queue options
  crawler.request({
    url: 'http://example.com'
  }).then((resp) => {
    console.log(resp.result.title);

    // Destroy the instance
    process.nextTick(() => crawler.destroy());
  });
});
```

Static crawling example:
```javascript
const { StaticCrawler } = require('crawlmatic');

// Initialize with node-crawler options
const staticCrawler = new StaticCrawler({
  maxConnections: 10,
  retries: 3
});

// setup() creates the internal node-crawler instance and resolves a promise
staticCrawler.setup().then(() => {
  // Make a request with node-crawler queue options
  staticCrawler.request({
    url: 'http://example.com'
  }).then((resp) => {
    // Server-side response parsing using cheerio
    const $ = resp.$;
    console.log($('title').text());

    // Destroy the instance
    process.nextTick(() => staticCrawler.destroy());
  });
});
```
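
Both examples follow the same lifecycle: `setup()`, one or more `request()` calls, then `destroy()`. The sketch below shows one way to wrap that lifecycle so that several URLs share one crawler instance and the instance is released even if a request fails. `crawlAll` is a hypothetical helper, not part of crawlmatic; it assumes only that `setup()` and `request()` return promises, as in the examples above.

```javascript
// Hypothetical helper (not part of crawlmatic): crawl several URLs with one
// crawler instance and always destroy the instance afterwards.
async function crawlAll(crawler, urls) {
  // Assumes setup() returns a promise, as in the examples above
  await crawler.setup();
  try {
    // Issue one request per URL and wait for all of them to settle successfully
    return await Promise.all(urls.map((url) => crawler.request({ url })));
  } finally {
    // Runs on success and on failure; await tolerates a non-promise return value
    await crawler.destroy();
  }
}

// Possible usage with a static crawler (requires crawlmatic to be installed):
// const { StaticCrawler } = require('crawlmatic');
// crawlAll(new StaticCrawler({ maxConnections: 10 }), ['http://example.com'])
//   .then((responses) => responses.forEach((resp) => console.log(resp.$('title').text())));
```

Because the helper takes the crawler as an argument, the same function should work with either `StaticCrawler` or `DynamicCrawler`, provided each is constructed with the options its underlying library expects.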