Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/micrub/crawl-etc
- Host: GitHub
- URL: https://github.com/micrub/crawl-etc
- Owner: micrub
- Created: 2021-02-14T18:50:09.000Z (almost 4 years ago)
- Default Branch: develop
- Last Pushed: 2021-02-14T18:54:30.000Z (almost 4 years ago)
- Last Synced: 2024-11-08T08:46:40.601Z (about 2 months ago)
- Language: JavaScript
- Size: 32.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# crawl-it requirements
* as an application, it should be broken down into scalable components.
* as an application, it should handle an API request that triggers a crawl-worker job for a specified URL.
* the request MUST be made using the POST method to the `parse` endpoint and contain a `url` property in its body.
* as an application, it should store the parsed URL content and sublinks in a persistent database.
* for the sake of demonstrating the case, we use Redis to store everything, though it can be refactored to work with any non-SQL backed solution.
* as an application, it is assumed that crawling is done only on the origin page.
* as an application, it is assumed that crawling goes at most 5 levels deep, as in `http://some.com/1/2/3/4/5`.
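
The request and crawl-limit requirements above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the names `validateParseRequest`, `pathDepth`, `filterCrawlableLinks` and `MAX_DEPTH` are hypothetical. It also assumes that "only on origin page" means "same origin as the requested URL" and that depth is counted in path segments, which the requirements do not state explicitly.

```javascript
// Hypothetical helpers: every name and limit here is an assumption
// made for illustration; none of them come from the repository.
const MAX_DEPTH = 5; // assumed: depth = number of path segments

// Checks a request aimed at the `parse` endpoint.
// `body` is the already-parsed JSON body of the request.
// Returns a URL object, or throws when a requirement is violated.
function validateParseRequest(method, body) {
  if (method !== "POST") {
    throw new Error("parse endpoint accepts POST only");
  }
  if (!body || typeof body.url !== "string") {
    throw new Error("request body must contain a 'url' string property");
  }
  const parsed = new URL(body.url); // throws TypeError on malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("only http(s) URLs can be crawled");
  }
  return parsed;
}

// Counts non-empty path segments, so /1/2/3/4/5 has depth 5.
function pathDepth(url) {
  return url.pathname.split("/").filter(Boolean).length;
}

// Keeps only links a worker would be allowed to follow:
// same origin as the page and no deeper than MAX_DEPTH.
function filterCrawlableLinks(pageUrl, hrefs) {
  const base = new URL(pageUrl);
  const seen = new Set();
  for (const href of hrefs) {
    let link;
    try {
      link = new URL(href, base); // resolves relative hrefs against the page
    } catch {
      continue; // skip hrefs that cannot be parsed
    }
    if (link.origin !== base.origin) continue;
    if (pathDepth(link) > MAX_DEPTH) continue;
    link.hash = ""; // fragments point at the same document
    seen.add(link.href);
  }
  return [...seen];
}
```

A worker could call `filterCrawlableLinks(page, hrefs)` on the links extracted from each fetched page before storing them as sublinks. Persisting the results (for example in Redis) is deliberately left out of this sketch.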