https://github.com/apinf/api-harvester
API harvester
- Host: GitHub
- URL: https://github.com/apinf/api-harvester
- Owner: apinf
- License: other
- Created: 2017-03-22T13:16:49.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2017-11-26T10:19:56.000Z (over 8 years ago)
- Last Synced: 2026-01-01T23:31:17.824Z (6 months ago)
- Language: JavaScript
- Homepage:
- Size: 1.36 MB
- Stars: 0
- Watchers: 14
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# API-harvester
Apinf.io has a catalog that lists APIs, and that catalog needs content. The harvester collects API metadata from multiple sources to fill it.
**Product Owner:** Jarkko Moilanen (APInf Oy), jarkko(at)apinf.io
# Scope
Harvesting happens in two separate steps:
- Source step
- Transform step
## Source step
We collect data in three different ways: web scraping, REST APIs, and CSV files.
The output is JSON.
The [Source step requirements](docs/source-step/requirements.md) are in a separate file.
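
As an illustration of the CSV path of the source step, the sketch below turns CSV text into JSON records. It is a minimal sketch, not the project's implementation: the function name `csvToRecords` and the `name`/`url` columns are assumptions, and the parser only handles simple files with a header row and no quoted fields.

```javascript
// Hypothetical helper: converts simple CSV text into an array of plain objects.
// Assumes a header row and no quoted fields or embedded commas.
function csvToRecords(csvText) {
  const lines = csvText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0) return [];

  const headers = lines[0].split(',').map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(',').map((c) => c.trim());
    const record = {};
    headers.forEach((header, i) => {
      // Missing cells become null so every record has the same keys.
      record[header] = cells[i] !== undefined ? cells[i] : null;
    });
    return record;
  });
}

// Illustrative input; the column names are assumptions, not the project's schema.
const sample = 'name,url\nExample API,https://api.example.com\n';
console.log(JSON.stringify(csvToRecords(sample), null, 2));
```

A real CSV source would likely need a proper parser that handles quoting and escaped delimiters.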
## Transform step
The JSON input is transformed into the appropriate output using the correct data model.
The [Transform step requirements](docs/transform-step/requirements.md) are in a separate file.
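
A transform of this kind might look like the following sketch. The target data model (`name`, `description`, `url`, `source`) and the input fields (`title`, `summary`, `link`) are hypothetical, chosen only for illustration; the real field names would come from the project's data model.

```javascript
// Hypothetical target data model: { name, description, url, source }.
// The input field names (title, summary, link) are assumptions for illustration.
function transformRecord(raw, sourceName) {
  return {
    name: String(raw.title || '').trim(),
    description: String(raw.summary || '').trim(),
    url: String(raw.link || '').trim(),
    source: sourceName,
  };
}

// Transforms a batch and drops records that lack a name or a URL.
function transformAll(rawRecords, sourceName) {
  return rawRecords
    .map((raw) => transformRecord(raw, sourceName))
    .filter((api) => api.name !== '' && api.url !== '');
}

// Illustrative source-step output; not real harvested data.
const harvested = [
  { title: ' Example API ', summary: 'Demo entry', link: 'https://api.example.com' },
  { title: '', summary: 'Missing name', link: 'https://nameless.example.com' },
];
console.log(transformAll(harvested, 'csv'));
```

Keeping the per-record mapping in its own function means each source could supply its own mapping while sharing the batch logic.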
## Graphical overview

# Requirements
- No GUI is needed. We work from the command line.
- Node.js is the most appropriate platform: it handles JSON natively, is mature, and is easy to deploy.
- The project must be modular so that it can be expanded easily.
- We should be able to run it from a cron script or directly from the command line.
- It must be easy to deploy with Docker (Compose).
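
The modularity and command-line requirements above could be met with a small registry of source modules, as in the sketch below. All names here (`sources`, `runHarvest`, the `harvest.js` file name) are assumptions, and the `harvest()` bodies are placeholders that return fixed records rather than real harvesting logic.

```javascript
// Hypothetical module registry: each source exposes an async harvest() that
// resolves to an array of JSON records. The bodies here are placeholders.
const sources = {
  scrape: { harvest: async () => [{ title: 'Scraped API' }] },
  rest: { harvest: async () => [{ title: 'REST API' }] },
  csv: { harvest: async () => [{ title: 'CSV API' }] },
};

// Runs the requested sources in order and concatenates their records.
async function runHarvest(sourceNames) {
  const records = [];
  for (const name of sourceNames) {
    if (!Object.prototype.hasOwnProperty.call(sources, name)) {
      throw new Error(`Unknown source: ${name}`);
    }
    records.push(...(await sources[name].harvest()));
  }
  return records;
}

// Command-line entry point, e.g. `node harvest.js csv rest` (file name assumed).
// With no arguments, every registered source runs.
const requested = process.argv.slice(2);
runHarvest(requested.length > 0 ? requested : Object.keys(sources))
  .then((records) => console.log(JSON.stringify(records, null, 2)))
  .catch((err) => {
    console.error(err.message);
    process.exitCode = 1;
  });
```

Adding a source in this design means adding one entry to the registry. Because the script takes its sources from the command-line arguments and finishes when the harvest completes, the same command could be called from a cron entry or used as a container command.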
# Out of scope
- Scraping any other site (only programmableweb.com is in scope)
- Conflict resolution
- Ownership
- Deleted APIs
- Scheduling
- Lifecycle
- Performance
- Frontend design
- Legal