https://github.com/popcorn-official/pop-api-scraper
The base modules for the popcorn-api scraper
- Host: GitHub
- URL: https://github.com/popcorn-official/pop-api-scraper
- Owner: popcorn-official
- License: mit
- Created: 2017-12-27T19:28:04.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2022-12-06T19:14:14.000Z (almost 2 years ago)
- Last Synced: 2024-09-18T19:51:32.352Z (about 2 months ago)
- Topics: cheerio, http, popcorn, popcorn-api, popcorn-time
- Language: JavaScript
- Homepage: https://popcorn-official.github.io/pop-api-scraper/manual/index.html
- Size: 780 KB
- Stars: 18
- Watchers: 6
- Forks: 25
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
# pop-api-scraper
[![Build Status](https://travis-ci.org/popcorn-official/pop-api-scraper.svg?branch=master)](https://travis-ci.org/popcorn-official/pop-api-scraper)
[![Windows Build](https://img.shields.io/appveyor/ci/ChrisAlderson/pop-api-scraper/master.svg?label=windows)](https://ci.appveyor.com/project/ChrisAlderson/pop-api-scraper)
[![Coverage Status](https://coveralls.io/repos/github/popcorn-official/pop-api-scraper/badge.svg?branch=master)](https://coveralls.io/github/popcorn-official/pop-api-scraper?branch=master)
[![Dependency Status](https://david-dm.org/popcorn-official/pop-api-scraper.svg)](https://david-dm.org/popcorn-official/pop-api-scraper)
[![devDependencies Status](https://david-dm.org/popcorn-official/pop-api-scraper/dev-status.svg)](https://david-dm.org/popcorn-official/pop-api-scraper?type=dev)

## Features
The pop-api-scraper project aims to provide the core modules for the
[`popcorn-api`](https://github.com/popcorn-official/popcorn-api) scraper, but
can also be used for other purposes by using middleware.
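
The middleware style mentioned above, and used later through calls such as `PopApi.use(PopApiScraper, ...)`, can be pictured with a minimal sketch. It assumes that a host's static `use(Plugin, ...args)` method instantiates the plugin with the host as its first argument, so the plugin can attach itself to the host. That assumption is consistent with the Usage example noting that `PopApi` has a `scraper` instance after registration, but `Host` and `GreeterPlugin` are hypothetical names and this is not the pop-api source.

```javascript
// Hypothetical host and plugin illustrating `use(Plugin, options)`-style
// middleware registration; this is not the pop-api source code.
class Host {
  // Instantiate the plugin, handing it the host so it can extend it.
  static use(Plugin, ...args) {
    return new Plugin(this, ...args)
  }
}

class GreeterPlugin {
  constructor(host, { greeting = 'Hello' } = {}) {
    this.greeting = greeting
    // Attach the plugin instance to the host under a well-known name.
    host.greeter = this
  }

  greet(name) {
    return `${this.greeting}, ${name}!`
  }
}

Host.use(GreeterPlugin, { greeting: 'Hi' })
console.info(Host.greeter.greet('popcorn')) // Hi, popcorn!
```

The host never needs to know what a plugin does; it only hands itself over, and each plugin decides what to add.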
- Strategy pattern with providers
- Cron jobs
- Scraper wrapper class
- `HttpService` using [`got`](https://github.com/sindresorhus/got)

## Installation
```
$ npm install --save pop-api-scraper pop-api
```

## Documentation
- [General documentation](https://popcorn-official.github.io/pop-api-scraper/manual/index.html)
- [API docs](https://popcorn-official.github.io/pop-api-scraper/identifiers.html)
- [Usage](https://popcorn-official.github.io/pop-api-scraper/manual/usage.html)
- [Middleware](https://popcorn-official.github.io/pop-api-scraper/manual/middleware.html)

## Usage
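
`PopApiScraper` is described below as implementing the strategy pattern, with providers as the strategies. As a library-independent illustration of that idea, the sketch below uses hypothetical `BaseProvider`, `StaticProvider` and `Scraper` classes, none of which are part of pop-api-scraper. `Scraper` plays the context that delegates to registered providers, and each provider is a strategy that overrides `scrapeConfig()`. The real `AbstractProvider` and `PopApiScraper` may differ in how they schedule requests and shape their results.

```javascript
// Library-independent sketch of the strategy pattern. All class names are
// hypothetical and are NOT the pop-api-scraper API.

// Strategy base class: each provider knows how to scrape a single config.
class BaseProvider {
  constructor({ name, configs = [] }) {
    this.name = name
    this.configs = configs
  }

  // Concrete strategies must override this method.
  scrapeConfig(_config) {
    throw new Error(`${this.name} does not implement scrapeConfig()`)
  }

  // Scrape every config one after another and collect the results.
  async scrapeConfigs() {
    const results = []
    for (const config of this.configs) {
      results.push(await this.scrapeConfig(config))
    }
    return results
  }
}

// Concrete strategy that returns canned data instead of making HTTP calls.
class StaticProvider extends BaseProvider {
  scrapeConfig(config) {
    return Promise.resolve({ source: this.name, query: config.query })
  }
}

// Context: keeps the registered strategies and delegates the work to them.
class Scraper {
  constructor() {
    this.providers = []
  }

  // Register a provider class with its options; returns `this` for chaining.
  use(Provider, options) {
    this.providers.push(new Provider(options))
    return this
  }

  // Run all providers concurrently; resolves to one result array per provider.
  scrape() {
    return Promise.all(this.providers.map(provider => provider.scrapeConfigs()))
  }
}

const scraper = new Scraper().use(StaticProvider, {
  name: 'static-provider',
  configs: [{ query: 'foo' }, { query: 'bar' }]
})

scraper.scrape()
  .then(results => console.info(results[0]))
  .catch(err => console.error(err))
```

Supporting another source means writing another provider class and registering it with `use()`; the `Scraper` context itself does not change, which is the point of the pattern.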
For the basic setup, you need to create a `Provider` (a strategy) that the
`PopApiScraper` instance can use. `PopApiScraper` implements the strategy
pattern, with the providers acting as the strategies.

The example below makes an HTTP GET request to a web service or website. From
there on, you are free to decide what data you want to extract from the
response and how.

```js
// ./ExampleProvider.js
import { AbstractProvider, HttpService } from 'pop-api-scraper'

// Extend from the internal AbstractProvider.
export default class ExampleProvider extends AbstractProvider {

  constructor(PopApiScraper, {name, configs, maxWebRequests = 2}) {
    super(PopApiScraper, {name, configs, maxWebRequests})
  }

  // Override the `scrapeConfig` method to get the content from one
  // configuration.
  scrapeConfig(config) {
    // An HTTP service to send HTTP requests.
    this.httpService = new HttpService({
      baseUrl: config.baseUrl
    })

    // HTTP GET request to: https://jsonplaceholder.typicode.com/posts?foo=bar
    return this.httpService.get('/posts', config.httpOptions)
      .then(res => res.data)
  }

}
```

Bundle it all up together with
[`pop-api`](https://github.com/popcorn-official/pop-api):

```js
// ./index.js
import os from 'os'
import { PopApi } from 'pop-api'
import { join } from 'path'
import { Cron, PopApiScraper } from 'pop-api-scraper'

import ExampleProvider from './ExampleProvider'

(async () => {
  try {
    // Let the PopApiScraper use the ExampleProvider to scrape data.
    PopApiScraper.use(ExampleProvider, {
      name: 'example-provider',
      configs: [{
        baseUrl: 'https://jsonplaceholder.typicode.com',
        httpOptions: {
          query: {
            foo: 'bar'
          }
        }
      }],
      maxWebRequests: 2
    })

    // Register the PopApiScraper middleware with the pop-api instance.
    PopApi.use(PopApiScraper, {
      statusPath: join(os.tmpdir(), 'status.json'),
      updatedPath: join(os.tmpdir(), 'updated.json')
    })

    // Optionally, you can use the Cron middleware to scrape for content on a
    // regular basis (this six-field expression, with seconds first, fires
    // every six hours).
    PopApi.use(Cron, {
      cronTime: '0 0 */6 * * *',
      start: false
    })

    // PopApi now has a `scraper` instance.
    const res = await PopApi.scraper.scrape()
    console.info(res[0])
  } catch (err) {
    console.error(err)
  }
})()
```

## License
MIT License