Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/abhijeetps/noddler
Web Crawler built using NodeJS
- Host: GitHub
- URL: https://github.com/abhijeetps/noddler
- Owner: abhijeetps
- Created: 2019-02-23T22:19:51.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-03-13T05:59:41.000Z (almost 6 years ago)
- Last Synced: 2024-10-27T19:04:43.449Z (3 months ago)
- Topics: cheerio, crawler, csv, nodejs
- Language: JavaScript
- Size: 5.86 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 28
Metadata Files:
- Readme: README.md
README
# Web Crawler NodeJS
Web Crawler built using NodeJS
## Description
A recursive web crawler built using NodeJS that harvests all hyperlinks belonging to a particular domain (default: [medium.com](https://medium.com)) and stores them in CSV files.
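For orientation, here is a minimal sketch of the recursive approach described above, using cheerio (one of the repo's listed topics) to parse pages and a `Set` to avoid revisiting URLs. The function names, the page cap, and the use of global `fetch` (Node 18+) are assumptions for illustration, not the repo's actual code:

```
// Minimal sketch of the recursive crawl described above; names and the
// use of global fetch (Node 18+) are assumptions, not the repo's code.
const cheerio = require('cheerio');

const visited = new Set();
const MAX_PAGES = 50; // safety cap so this sketch terminates

async function crawl(url, domain) {
  if (visited.has(url) || visited.size >= MAX_PAGES) return visited;
  visited.add(url);

  const res = await fetch(url);
  const $ = cheerio.load(await res.text());

  // Resolve every anchor href against the current page and keep
  // only links that stay on the target domain.
  const links = [];
  $('a[href]').each((_, el) => {
    try {
      const href = new URL($(el).attr('href'), url).href;
      if (new URL(href).hostname.endsWith(domain)) links.push(href);
    } catch { /* skip malformed hrefs */ }
  });

  for (const link of links) await crawl(link, domain);
  return visited;
}

crawl('https://medium.com', 'medium.com').then((all) => console.log(all.size));
```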
## Getting Started
To get started, clone this repository locally and change into the repository directory:
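```
git clone https://github.com/abhijeetps/noddler.git
cd noddler
```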
### Dependencies
* NodeJS
### Installing
* In the project directory, type `npm install` to install all the packages and dependencies.
* Rename _.env.example_ to _.env_ and set a PORT number (e.g. 3000) in it.
* To configure the default URL to be crawled, open _config.js_ and update the value of the `url` key to the URL you want to crawl (see the illustrative snippets below).
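For reference, the two configuration files might look something like this. Both snippets are illustrative assumptions, not copied from the repo; check _.env.example_ and _config.js_ for the actual keys.

```
# .env (hypothetical example)
PORT=3000
```

```
// config.js (hypothetical example)
module.exports = {
  url: 'https://medium.com', // default URL to crawl
};
```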
### Executing program

To run the app, type the following command:
```
node index.js
```
Then visit `localhost:PORT/crawl` to start crawling.
The application will create a CSV file in the _data_ directory (_data/*.csv_) for every page it visits.
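As a rough sketch of how the `/crawl` endpoint and CSV output could be wired together, here is one possibility using Node's built-in `http` and `fs` modules. The actual app's routing, file naming, and CSV layout may differ; `crawl` refers to the function sketched under Description above.

```
// Illustrative sketch only; reuses the crawl() function sketched earlier.
const http = require('http');
const fs = require('fs');
const path = require('path');

const PORT = process.env.PORT || 3000;

http.createServer(async (req, res) => {
  if (req.url === '/crawl') {
    const links = await crawl('https://medium.com', 'medium.com');
    fs.mkdirSync('data', { recursive: true });
    // The real app writes one CSV per visited page; this sketch writes
    // all harvested URLs into a single file for brevity.
    fs.writeFileSync(path.join('data', 'links.csv'), [...links].join('\n'));
    res.end(`Crawled ${links.size} pages; see data/links.csv`);
  } else {
    res.end('Visit /crawl to start crawling');
  }
}).listen(PORT, () => console.log(`Listening on port ${PORT}`));
```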
## Help
Are you stuck while working with the app, or still have doubts about how something works? Please feel free to open an issue anytime.

## Authors
[Abhijeet Singh](https://github.com/abhijeetps)