https://github.com/stacksapien/scalable-nodejs-web-crawler
A NodeJS-based web crawler that scales on the go!
- Host: GitHub
- URL: https://github.com/stacksapien/scalable-nodejs-web-crawler
- Owner: stacksapien
- Created: 2020-03-11T19:28:25.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-01-24T01:36:42.000Z (almost 3 years ago)
- Last Synced: 2025-01-02T11:16:22.595Z (12 months ago)
- Topics: nodejs, react, scalable-web-crawler, scalable-web-scrapper, web-crawler, web-crawling, web-crawls, web-scraping, web-scraping-nodejs, web-scraping-software
- Language: JavaScript
- Homepage:
- Size: 223 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 5
Metadata Files:
- Readme: README.md
README
# Scalable-NodeJS-Web-Crawler
A NodeJS-based web crawler that can scale on the go!
- Supports both static and dynamic page crawling
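
The README does not describe how static and dynamic crawling differ internally, so here is a minimal sketch of the usual split, assuming `axios`/`cheerio` for plain HTML pages and `puppeteer` for JavaScript-rendered ones; the libraries and function names are illustrative and not taken from this project's code.

```js
// Hypothetical sketch: static vs. dynamic page crawling.
// Library choices (axios, cheerio, puppeteer) are assumptions,
// not necessarily what index.js uses internally.
const axios = require('axios');
const cheerio = require('cheerio');
const puppeteer = require('puppeteer');

// Static crawl: fetch raw HTML and parse it without executing JavaScript.
async function crawlStatic(url) {
  const { data: html } = await axios.get(url);
  const $ = cheerio.load(html);
  return $('a[href]').map((_, el) => $(el).attr('href')).get();
}

// Dynamic crawl: render the page in a headless browser so that
// JavaScript-generated links are also discovered.
async function crawlDynamic(url) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.$$eval('a[href]', (anchors) => anchors.map((a) => a.href));
  } finally {
    await browser.close();
  }
}

module.exports = { crawlStatic, crawlDynamic };
```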
## Prerequisites:
- Linux (Ubuntu)
- Redis (see the queue sketch below)
- Node.js
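
The README lists Redis as a prerequisite but does not say how it is used. A common pattern for a crawler that "scales on the go" is a shared Redis list acting as the URL frontier, so that extra workers can be added at any time. The sketch below shows that pattern with the node-redis v4 client; the `crawl:queue` key and the wiring are assumptions, not taken from this project's code.

```js
// Hypothetical sketch of a Redis-backed URL frontier (an assumption about
// why Redis is a prerequisite): each worker pops the next URL to crawl and
// pushes newly discovered links, so more workers can join on the fly.
const { createClient } = require('redis');

async function main() {
  const client = createClient(); // defaults to redis://localhost:6379
  await client.connect();

  // Seed the queue with a starting URL.
  await client.lPush('crawl:queue', 'https://stacksapien.com');

  // Worker loop: pop URLs until the queue is empty.
  let url;
  while ((url = await client.rPop('crawl:queue')) !== null) {
    console.log('crawling', url);
    // ...fetch the page, then lPush any discovered links back onto the queue
  }

  await client.quit();
}

main().catch(console.error);
```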
## Installation:
- Install `NodeJS` by running the commands below from the project's root directory:
```sh
$ cd init-scripts/
$ sudo bash install-nodejs.sh
```
- Install `Redis` (still inside `init-scripts/`):
```sh
$ sudo bash install-redis.sh
```
- Install the project dependencies by running the following command in the project's root directory:
```sh
$ npm install
```
## Usage:
```sh
$ node index.js "<url-to-crawl>" "<path-to-store-urls>"
$ node index.js "https://stacksapien.com" "./temp"
```
- In the above example, the files `valid-urls.txt`, `external-urls.txt`, and `invalid-urls.txt` will be generated in the `./temp` folder of your project directory.
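
Once a crawl finishes, those text files can be consumed by any script. A minimal sketch, assuming one URL per line (the README does not state the file format):

```js
// Hypothetical follow-up: reading the generated URL lists after a crawl.
// The file names come from the README; './temp' matches the usage example.
const fs = require('fs');
const path = require('path');

const outputDir = './temp';
const files = ['valid-urls.txt', 'external-urls.txt', 'invalid-urls.txt'];

for (const name of files) {
  const file = path.join(outputDir, name);
  if (!fs.existsSync(file)) continue; // skip files the crawl did not produce
  const urls = fs.readFileSync(file, 'utf8')
    .split('\n')
    .filter(Boolean); // drop empty lines
  console.log(`${name}: ${urls.length} URLs`);
}
```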