https://github.com/indatawetrust/reporter
Crawler queue creation tool for paging
- Host: GitHub
- URL: https://github.com/indatawetrust/reporter
- Owner: indatawetrust
- License: mit
- Created: 2017-03-16T16:19:36.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-06-29T19:49:50.000Z (almost 9 years ago)
- Last Synced: 2025-05-05T19:19:04.344Z (about 1 year ago)
- Topics: crawler
- Language: JavaScript
- Homepage:
- Size: 28.3 KB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[Build status on Travis CI](https://travis-ci.org/indatawetrust/reporter)

```
npm i -g reporter-cli
```
##### --site
pagination URL, to which the page number is appended
example: https://news.ycombinator.com/news?p=
##### --list
list element selector
##### --link
link element selector
##### --title
title element selector
##### --limit
maximum number of pages
##### --file
output filename
##### --start
page to start crawling from
##### --end
page to stop crawling at
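
To make the paging options concrete, here is a minimal sketch of how a queue of page URLs could be derived from the site URL and the start and end pages. `buildPageUrls` is a hypothetical helper, not something reporter exposes, and treating the end page as inclusive is an assumption.

```js
// Hypothetical helper, not part of reporter-cli.
// Assumes the page number is appended to the site URL
// and that both start and end pages are included.
function buildPageUrls(site, start, end) {
  const urls = [];
  for (let page = start; page <= end; page++) {
    urls.push(`${site}${page}`);
  }
  return urls;
}

console.log(buildPageUrls('https://news.ycombinator.com/news?p=', 1, 3));
```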
##### --special
extra fields to collect for each list item, written as:
```
<key>: <selector>*<type>, <key>: <selector>*<type>..
```
```bash
--special 'username: >.hnuser*text, score: >.score*text'
```
###### ^
parent element
###### <
previous sibling element
###### >
next sibling element
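
As an illustration of how a special string breaks down, the sketch below splits one into its parts. It is a guess at the structure based on the example above, not reporter's actual parser; `parseSpecial` and the field names `key`, `direction`, `selector` and `type` are made up for this example.

```js
// Hypothetical parser; reporter's real implementation may differ.
// Prefix characters and their meaning, as listed above.
const DIRECTIONS = new Map([
  ['^', 'parent'],
  ['<', 'previousSibling'],
  ['>', 'nextSibling']
]);

function parseSpecial(spec) {
  // Naive split: selectors that themselves contain commas are not handled.
  return spec.split(',').map(entry => {
    const colon = entry.indexOf(':');
    const key = entry.slice(0, colon).trim();
    let rest = entry.slice(colon + 1).trim();

    // Optional leading ^, < or > picks the element to search from.
    let direction = null;
    if (DIRECTIONS.has(rest[0])) {
      direction = DIRECTIONS.get(rest[0]);
      rest = rest.slice(1);
    }

    // Everything after the last * is taken as the value type (e.g. "text").
    const star = rest.lastIndexOf('*');
    const selector = star === -1 ? rest : rest.slice(0, star);
    const type = star === -1 ? null : rest.slice(star + 1);

    return { key, direction, selector, type };
  });
}

console.log(parseSpecial('username: >.hnuser*text, score: >.score*text'));
```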
##### -- heartbeat.js
function to run after each request
example:
```js
// runs after each request
module.exports = item => {
  console.log(item.url, item.title)
}
```
```
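
A heartbeat can do more than log. The sketch below appends each item to a file as one JSON line; it assumes only the `url` and `title` fields used above, and `makeHeartbeat` and the file name `items.ndjson` are hypothetical.

```js
const fs = require('fs');

// Hypothetical factory: returns a heartbeat function that
// appends each item to `file` as a single line of JSON.
function makeHeartbeat(file) {
  return item => {
    const line = JSON.stringify({ url: item.url, title: item.title });
    fs.appendFileSync(file, line + '\n');
  };
}

module.exports = makeHeartbeat('items.ndjson');
```

Writing synchronously keeps the lines in request order, at the cost of blocking briefly on each write.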
##### demo
```bash
# the URL is quoted so the shell does not treat ? as a glob character
reporter --site 'https://news.ycombinator.com/news?p=' \
  --list .athing \
  --link .storylink \
  --title .storylink \
  --limit 10 \
  --special 'username: >.hnuser*text, score: >.score*text'
```