Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/appliedsoul/crawlmatic
Static and dynamic website crawling library: a common promise-based wrapper around the node-crawler and hccrawler libraries.
- Host: GitHub
- URL: https://github.com/appliedsoul/crawlmatic
- Owner: AppliedSoul
- License: other
- Created: 2018-08-14T14:42:44.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2020-06-02T09:06:56.000Z (over 4 years ago)
- Last Synced: 2024-10-30T08:52:59.346Z (9 days ago)
- Topics: crawler, scraper
- Language: JavaScript
- Size: 44.9 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: license.txt
README
# crawlmatic
[![npm package](https://nodei.co/npm/crawlmatic.png?downloads=true&downloadRank=true&stars=true)](https://nodei.co/npm/crawlmatic/)[![Build Status](https://travis-ci.org/AppliedSoul/crawlmatic.svg?branch=master)](https://travis-ci.org/AppliedSoul/crawlmatic) [![Coverage Status](https://coveralls.io/repos/github/AppliedSoul/crawlmatic/badge.svg?branch=master)](https://coveralls.io/github/AppliedSoul/crawlmatic?branch=master) [![Greenkeeper badge](https://badges.greenkeeper.io/AppliedSoul/crawlmatic.svg)](https://greenkeeper.io/)[![Package quality](http://packagequality.com/shield/crawlmatic.svg)](http://packagequality.com/#?package=crawlmatic)
Single library for static or dynamic website crawling needs.

A standard wrapper for [HCCrawler](https://github.com/yujiosaka/headless-chrome-crawler/blob/master/docs/API.md) & [node-crawler](https://github.com/bda-research/node-crawler), based on bluebird promises.

Install using npm:
```
npm i crawlmatic --save
```

Dynamic crawling example:
```javascript
const {
  DynamicCrawler
} = require('crawlmatic');

// Initialize with HCCrawler options
const crawler = new DynamicCrawler({
  // dynamically evaluate page title
  evaluatePage: (() => ({
    title: $('title').text(),
  }))
});

// Setup - resolved when the Chromium instance is up
crawler.setup().then(() => {
  // make requests with HCCrawler queue options
  crawler.request({
    url: "http://example.com"
  }).then((resp) => {
    console.log(resp.result.title);
    // destroy the instance
    process.nextTick(() => crawler.destroy());
  });
});
```
Static crawling example:
```javascript
const {
  StaticCrawler
} = require('crawlmatic');

// Initialize with node-crawler options
const staticCrawler = new StaticCrawler({
  maxConnections: 10,
  retries: 3
});

// setup the internal node-crawler instance; resolves when ready
staticCrawler.setup().then(() => {
  // make a request with node-crawler queue options
  staticCrawler.request({
    url: 'http://example.com'
  }).then((resp) => {
    // server-side response parsing using cheerio
    const $ = resp.$;
    console.log($("title").text());
    // destroy the instance
    process.nextTick(() => staticCrawler.destroy());
  });
});
```