https://github.com/indatawetrust/reporter
Crawler queue creation tool for paging
- Host: GitHub
- URL: https://github.com/indatawetrust/reporter
- Owner: indatawetrust
- License: mit
- Created: 2017-03-16T16:19:36.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-06-29T19:49:50.000Z (over 7 years ago)
- Last Synced: 2024-10-16T02:31:46.926Z (3 months ago)
- Topics: crawler
- Language: JavaScript
- Homepage:
- Size: 28.3 KB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
[![Travis Build Status](https://img.shields.io/travis/indatawetrust/reporter.svg)](https://travis-ci.org/indatawetrust/reporter)
![img](https://nodei.co/npm/reporter-cli.png?downloads=true)
```
npm i -g reporter-cli
```

##### -- site
Pagination URL, given without the page number.

Example: `https://news.ycombinator.com/news?p=`
##### -- list
list element selector
##### -- link
link element selector
##### -- title
title element selector
##### -- limit
page limit number
##### -- file
output filename
##### -- start
crawl start page
##### -- end
crawl end page
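
The paging options above suggest that each page URL is the `--site` prefix with a page number appended. Here is a minimal sketch of that idea, assuming pages run from `--start` to `--end` inclusive. `buildPageUrls` is a hypothetical helper for illustration, not part of reporter-cli, and the tool's real queueing logic may differ.

```js
// Hypothetical helper, not part of reporter-cli: builds the list of page
// URLs by appending each page number to the pagination prefix.
function buildPageUrls(site, start, end) {
  const urls = []
  for (let page = start; page <= end; page++) {
    urls.push(`${site}${page}`)
  }
  return urls
}

console.log(buildPageUrls('https://news.ycombinator.com/news?p=', 1, 3))
```

If `start` is greater than `end`, this sketch returns an empty list instead of throwing.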
##### -- special
```
name: selector*property, name: selector*property..
```

```js
--special 'username: >.hnuser*text, score: >.score*text'
```

###### ^
parent element
###### <
previous sibling element
###### >
next sibling element
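
Read against the `--special` example above, each entry looks like a field name, an optional relation prefix (`^`, `<` or `>`), a selector, and a property after `*`. Below is a rough sketch of splitting such a string, assuming that reading is right. `parseSpecial` and the returned key names are hypothetical and are not reporter-cli's own code.

```js
// Hypothetical parser, not reporter-cli's implementation.
// Splits "name: >selector*prop, ..." into structured entries.
function parseSpecial(spec) {
  return spec.split(',').map(entry => {
    const colon = entry.indexOf(':')
    const name = entry.slice(0, colon).trim()
    let rest = entry.slice(colon + 1).trim()

    // Optional leading ^ (parent), < (previous sibling) or > (next sibling)
    let relation = null
    if ('^<>'.includes(rest[0])) {
      relation = rest[0]
      rest = rest.slice(1)
    }

    // Everything after the last * is the property to read, e.g. "text"
    const star = rest.lastIndexOf('*')
    return {
      name,
      relation,
      selector: rest.slice(0, star),
      property: rest.slice(star + 1)
    }
  })
}

console.log(parseSpecial('username: >.hnuser*text, score: >.score*text'))
```

The sketch splits on every comma, so a selector that itself contains a comma would need a more careful tokenizer.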
##### -- heartbeat.js
function to run after each request
example:
```js
// Runs after each request; logs the crawled item's URL and title
module.exports = item => {
  console.log(item.url, item.title)
}
```

##### demo
```bash
reporter --site 'https://news.ycombinator.com/news?p=' \
  --list .athing \
  --link .storylink \
  --title .storylink \
  --limit 10 \
  --special 'username: >.hnuser*text, score: >.score*text'
```