https://github.com/indatawetrust/reporter

Crawler queue creation tool for paging

Host: GitHub
URL: https://github.com/indatawetrust/reporter
Owner: indatawetrust
License: mit
Created: 2017-03-16T16:19:36.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2017-06-29T19:49:50.000Z (over 7 years ago)
Last Synced: 2024-10-16T02:31:46.926Z (4 months ago)
Topics: crawler
Language: JavaScript
Homepage:
Size: 28.3 KB
Stars: 3
Watchers: 3
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

[![Travis Build Status](https://img.shields.io/travis/indatawetrust/reporter.svg)](https://travis-ci.org/indatawetrust/reporter)

![img](https://nodei.co/npm/reporter-cli.png?downloads=true)

```bash
npm i -g reporter-cli
```

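##### --site

Pagination URL, ending where the page number goes.

example: https://news.ycombinator.com/news?p=

##### --list

CSS selector for each list item.

##### --link

CSS selector for the link element inside a list item.

##### --title

CSS selector for the title element inside a list item.

##### --limit

Maximum number of pages to crawl.

##### --file

Output filename.

##### --start

Page number to start crawling from.

##### --end

Page number to stop crawling at.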
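
As a rough illustration of how the paging options could fit together, the sketch below builds a queue of page URLs by appending each page number to the `--site` value. The `buildPageUrls` helper and the way `start`, `end`, and `limit` interact are assumptions made for this example; they are not taken from reporter's source.

```js
// Hypothetical helper, not part of reporter-cli: builds the page URLs to crawl.
// Assumes the page number is appended to `site`, and that `limit` caps the
// number of pages only when `end` is not given.
function buildPageUrls({ site, start = 1, end, limit = 1 }) {
  const last = end !== undefined ? end : start + limit - 1;
  const urls = [];
  for (let page = start; page <= last; page += 1) {
    urls.push(`${site}${page}`);
  }
  return urls;
}

// Pages 2 through 4 of the Hacker News front page.
console.log(
  buildPageUrls({ site: 'https://news.ycombinator.com/news?p=', start: 2, end: 4 })
);
```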

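##### --special

Extra fields to extract for each list item, given as comma-separated pairs. The part after `*` names what to read from the matched element (`text` in the example below).

```
<name>: <selector>*<property>, <name>: <selector>*<property>..
```

```bash
--special 'username: >.hnuser*text, score: >.score*text'
```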

###### ^

parent element

###### <

previous sibling element

###### >

next sibling element
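
To make the `--special` format more concrete, here is a minimal sketch of how such a value could be split into field descriptions. The `parseSpecial` function, the shape of the objects it returns, and the `self` fallback for a selector without a prefix are all assumptions for illustration; reporter's own parsing may differ.

```js
// Hypothetical parser for a --special value; not reporter's actual code.
// Prefix characters as documented: ^ parent, < previous sibling, > next sibling.
const RELATIONS = { '^': 'parent', '<': 'previousSibling', '>': 'nextSibling' };

function parseSpecial(spec) {
  // Limitation of this sketch: selectors that contain a comma are not supported.
  return spec.split(',').map(pair => {
    const colon = pair.indexOf(':');
    const name = pair.slice(0, colon).trim();
    let target = pair.slice(colon + 1).trim();

    // An optional leading ^, < or > says where to look relative to the list item.
    // 'self' (no prefix) is an assumed default.
    let relation = 'self';
    if (RELATIONS[target[0]]) {
      relation = RELATIONS[target[0]];
      target = target.slice(1);
    }

    // Everything after the last * names what to read from the matched element.
    const star = target.lastIndexOf('*');
    return {
      name,
      relation,
      selector: target.slice(0, star),
      read: target.slice(star + 1),
    };
  });
}

console.log(parseSpecial('username: >.hnuser*text, score: >.score*text'));
```

Splitting on the first `:` and the last `*` keeps the sketch tolerant of selectors that themselves contain those characters.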

##### -- heartbeat.js

function to run after each request

example:

```js
module.exports = item => {
  console.log(item.url, item.title)
}
```

##### demo

```bash
reporter --site 'https://news.ycombinator.com/news?p=' \
  --list .athing \
  --link .storylink \
  --title .storylink \
  --limit 10 \
  --special 'username: >.hnuser*text, score: >.score*text'
```