Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/appliedsoul/crawlmatic
Static and Dynamic website crawling library - a common promise based wrapper around node-crawler & hccrawler libraries.
crawler scraper
- Host: GitHub
- URL: https://github.com/appliedsoul/crawlmatic
- Owner: AppliedSoul
- License: other
- Created: 2018-08-14T14:42:44.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-06-02T09:06:56.000Z (over 4 years ago)
- Last Synced: 2024-12-26T04:32:59.452Z (6 days ago)
- Topics: crawler, scraper
- Language: JavaScript
- Size: 44.9 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: license.txt
README
# crawlmatic
[![npm package](https://nodei.co/npm/crawlmatic.png?downloads=true&downloadRank=true&stars=true)](https://nodei.co/npm/crawlmatic/)[![Build Status](https://travis-ci.org/AppliedSoul/crawlmatic.svg?branch=master)](https://travis-ci.org/AppliedSoul/crawlmatic) [![Coverage Status](https://coveralls.io/repos/github/AppliedSoul/crawlmatic/badge.svg?branch=master)](https://coveralls.io/github/AppliedSoul/crawlmatic?branch=master) [![Greenkeeper badge](https://badges.greenkeeper.io/AppliedSoul/crawlmatic.svg)](https://greenkeeper.io/)[![Package quality](http://packagequality.com/shield/crawlmatic.svg)](http://packagequality.com/#?package=crawlmatic)
Single library for static or dynamic website crawling needs.
A standard wrapper for [HCCrawler](https://github.com/yujiosaka/headless-chrome-crawler/blob/master/docs/API.md) & [node-crawler](https://github.com/bda-research/node-crawler), based on bluebird promises.

Install using npm:
```
npm i crawlmatic --save
```
Dynamic crawling example:
```javascript
const {
  DynamicCrawler
} = require('crawlmatic');

// Initialize with HCCrawler options
const crawler = new DynamicCrawler({
  // Dynamically evaluate the page title
  evaluatePage: (() => ({
    title: $('title').text(),
  }))
});

// Setup - resolved when the Chromium instance is up
crawler.setup().then(() => {
  // Make requests with HCCrawler queue options
  crawler.request({
    url: "http://example.com"
  }).then((resp) => {
    console.log(resp.result.title);
    // Destroy the instance
    process.nextTick(() => crawler.destroy())
  })
});
```
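The setup, request, destroy sequence used in the example above can be wrapped in a small helper so that the crawler is torn down even when a request fails. This is only a sketch: `crawlOnce` is a hypothetical name and is not part of crawlmatic. It assumes nothing beyond the three methods shown in this README (`setup()`, `request()`, `destroy()`), and it uses `await` on `destroy()` so that it works whether or not that method returns a promise.

```javascript
// Hypothetical helper (not part of crawlmatic): runs setup -> request -> destroy
// against any object that exposes those three methods.
async function crawlOnce(crawler, requestOptions) {
  // Wait until the underlying crawler instance is ready
  await crawler.setup();
  try {
    // Resolve with whatever the crawler's request() resolves with
    return await crawler.request(requestOptions);
  } finally {
    // Release resources whether the request succeeded or failed
    await crawler.destroy();
  }
}
```

Under these assumptions it could be called as `crawlOnce(new DynamicCrawler({ /* HCCrawler options */ }), { url: 'http://example.com' })`, and the same helper should work with a `StaticCrawler` instance.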
Static crawling example:
```javascript
const {
  StaticCrawler
} = require('crawlmatic');

// Initialize with node-crawler options
const staticCrawler = new StaticCrawler({
  maxConnections: 10,
  retries: 3
});

// Set up the internal node-crawler instance; the promise resolves when it is ready
staticCrawler.setup().then(() => {
  // Make a request with node-crawler queue options
  staticCrawler.request({
    url: 'http://example.com'
  }).then((resp) => {
    // Server-side response parsing using cheerio
    let $ = resp.$;
    console.log($("title").text());
    // Destroy the instance
    process.nextTick(() => staticCrawler.destroy())
  })
});
```