https://github.com/abhijeetps/noddler
Web Crawler built using NodeJS
cheerio crawler csv nodejs
- Host: GitHub
- URL: https://github.com/abhijeetps/noddler
- Owner: abhijeetps
- Created: 2019-02-23T22:19:51.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-03-13T05:59:41.000Z (over 7 years ago)
- Last Synced: 2025-02-08T01:27:17.413Z (over 1 year ago)
- Topics: cheerio, crawler, csv, nodejs
- Language: JavaScript
- Size: 5.86 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 28
Metadata Files:
- Readme: README.md
README
# Web Crawler NodeJS
Web Crawler built using NodeJS
## Description
A recursive web crawler built using NodeJS that harvests all reachable hyperlinks belonging to a particular domain (default: [medium.com](https://medium.com)) and stores them in CSV files.
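The link-harvesting step described above can be sketched roughly as follows. This is not the repository's code: the function name `extractSameDomainLinks` is hypothetical, and a regular expression stands in for a real HTML parser (the project's topics mention cheerio) so that the example runs with no installed packages.

```javascript
// Hypothetical helper, not taken from the repository.
// Returns the unique absolute links in `html` that stay on the same host as `baseUrl`.
function extractSameDomainLinks(html, baseUrl) {
  const base = new URL(baseUrl);
  const links = new Set();
  // Simplified matcher for <a href="...">; an HTML parser is more robust.
  const hrefPattern = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    let resolved;
    try {
      // Resolve relative links such as "/about" against the page URL.
      resolved = new URL(match[1], base);
    } catch (err) {
      continue; // Skip hrefs that cannot be parsed as URLs.
    }
    // Ignore mailto:, javascript: and other non-web schemes.
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') continue;
    // Keep only links that belong to the domain being crawled.
    if (resolved.hostname !== base.hostname) continue;
    // Drop the fragment so "/about" and "/about#top" count as one page.
    resolved.hash = '';
    links.add(resolved.href);
  }
  return [...links];
}
```

A recursive crawler would call a function like this on every fetched page and then repeat the process for each returned link it has not visited yet.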
## Getting Started
To get started, clone this repository locally and change into the repository directory.
### Dependencies
* NodeJS
### Installing
* In the project directory, type `npm install` to install all the packages and dependencies.
* Rename _.env.example_ to _.env_ and assign a PORT number (e.g. 3000) to it.
* To configure the default URL to be crawled, open _config.js_ and set the value of the `url` key to the URL you want to crawl.
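As a rough illustration of the two settings above, the sketch below shows one possible shape for _config.js_ and one way the PORT value could be read. Only the `url` key and the `PORT` variable come from this README. The `resolvePort` helper, the fallback value of 3000, and the exact export shape are assumptions, and the actual files in the repository may differ.

```javascript
// Assumed shape of config.js: the README only documents the `url` key.
const config = {
  url: 'https://medium.com', // Replace with the site you want to crawl.
};

// Hypothetical helper: read PORT from the environment (e.g. loaded from .env).
// The fallback of 3000 is an assumption, not a documented default.
function resolvePort(env) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : 3000;
}

module.exports = config;
```

For example, with `PORT=3000` in _.env_, `resolvePort(process.env)` would yield `3000` once the file has been loaded into the environment.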
### Executing program
To run the app, type the following command:
```
node index.js
```
Then visit localhost:PORT/crawl (where PORT is the number set in _.env_) to start crawling.
The application creates CSV files in the _data_ directory (_data/*.csv_), one for each page that it visits.
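The CSV output step could look something like the sketch below. It is an illustration under assumptions rather than the repository's implementation: the function names, the single `url` column, and the file-naming scheme are all hypothetical, and only Node's built-in `fs` and `path` modules are used.

```javascript
const fs = require('fs');
const path = require('path');

// Quote a field when it contains a comma, quote, or line break (RFC 4180 style).
function escapeCsvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text with an assumed single "url" column, one link per row.
function linksToCsv(links) {
  const rows = [['url'], ...links.map((link) => [link])];
  return rows.map((row) => row.map(escapeCsvField).join(',')).join('\n') + '\n';
}

// Write the links found on one page to <dir>/<name>.csv and return the file path.
function writeLinksCsv(dir, name, links) {
  fs.mkdirSync(dir, { recursive: true }); // Create the directory if it is missing.
  const file = path.join(dir, `${name}.csv`);
  fs.writeFileSync(file, linksToCsv(links), 'utf8');
  return file;
}
```

Calling `writeLinksCsv('data', 'home', links)` would then produce _data/home.csv_ for that page.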
## Help
Are you stuck while working with the app, or do you still have questions about how to use it?
Please feel free to open an issue anytime.
## Authors
[Abhijeet Singh](https://github.com/abhijeetps)