https://github.com/stacksapien/scalable-nodejs-web-crawler
A NodeJS based web-Crawler which scales on the go!
- Host: GitHub
- URL: https://github.com/stacksapien/scalable-nodejs-web-crawler
- Owner: stacksapien
- Created: 2020-03-11T19:28:25.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2023-01-24T01:36:42.000Z (almost 2 years ago)
- Last Synced: 2023-10-05T07:04:29.469Z (over 1 year ago)
- Topics: nodejs, react, scalable-web-crawler, scalable-web-scrapper, web-crawler, web-crawling, web-crawls, web-scraping, web-scraping-nodejs, web-scraping-software
- Language: JavaScript
- Homepage:
- Size: 223 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
# Scalable-NodeJS-Web-Crawler
A NodeJS based web-Crawler which can scale on the go!
- Supports both static and dynamic page crawling
## Prerequisites:
- Linux (Ubuntu)
- Redis
- Nodejs
## Installation:
- Install `NodeJS` by running the commands below from the root directory of the project:
```sh
$ cd init-scripts/
$ sudo bash install-nodejs.sh
```
- Install `Redis` (run from the same `init-scripts/` directory):
```sh
$ sudo bash install-redis.sh
```
- Install the project dependencies by running the following command in the root directory of the project:
```sh
$ npm install
```
## Usage:
```sh
$ node index.js "" "path-to-store-url"
$ node index.js "https://stacksapien.com" "./temp"
```
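- In the example above, the first argument (`https://stacksapien.com`) is the URL to crawl and the second (`./temp`) is the output path. Files such as `valid-urls.txt`, `external-urls.txt` and `invalid-urls.txt` will be generated in the `temp` folder of your git project directory.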
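The README does not describe how URLs end up in each of these files. As a rough illustration only, the sketch below assumes that "valid" means a same-host HTTP(S) link, "external" means an HTTP(S) link on another host, and "invalid" means anything that cannot be parsed or uses another scheme. The helpers `classifyUrl` and `writeBuckets` are hypothetical and are not taken from this project's code.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: sort a candidate link into one of three buckets
// relative to the start URL. Bucket names mirror the output file names.
// The classification rule itself is an assumption, not the project's logic.
function classifyUrl(startUrl, candidate) {
  let resolved;
  try {
    // Resolve relative links such as "/about" against the start URL.
    resolved = new URL(candidate, startUrl);
  } catch (err) {
    return "invalid"; // could not be parsed as a URL
  }
  if (resolved.protocol !== "http:" && resolved.protocol !== "https:") {
    return "invalid"; // e.g. mailto: or javascript: links
  }
  const sameHost = resolved.hostname === new URL(startUrl).hostname;
  return sameHost ? "valid" : "external";
}

// Hypothetical helper: write one file per bucket into outputDir,
// one URL per line, and return the buckets for inspection.
function writeBuckets(startUrl, candidates, outputDir) {
  const buckets = { valid: [], external: [], invalid: [] };
  for (const candidate of candidates) {
    buckets[classifyUrl(startUrl, candidate)].push(candidate);
  }
  fs.mkdirSync(outputDir, { recursive: true });
  for (const [name, urls] of Object.entries(buckets)) {
    fs.writeFileSync(path.join(outputDir, `${name}-urls.txt`), urls.join("\n"));
  }
  return buckets;
}
```

Calling `writeBuckets("https://stacksapien.com", links, "./temp")` with an array of link strings would produce the three files under `./temp`. A real crawler would also need to fetch pages, deduplicate links and handle redirects, none of which is shown here.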