Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/abhijeetps/noddler

Web Crawler built using NodeJS

cheerio crawler csv nodejs



Host: GitHub
URL: https://github.com/abhijeetps/noddler
Owner: abhijeetps
Created: 2019-02-23T22:19:51.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2019-03-13T05:59:41.000Z (almost 6 years ago)
Last Synced: 2024-10-27T19:04:43.449Z (3 months ago)
Topics: cheerio, crawler, csv, nodejs
Language: JavaScript
Size: 5.86 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 28
Metadata Files:
- Readme: README.md


README

# Web Crawler NodeJS

A web crawler built using NodeJS.

## Description

A recursive web crawler built using NodeJS that harvests all reachable hyperlinks belonging to a particular domain (default: [medium.com](https://medium.com)) and stores them in CSV files.
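
The idea can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `sameDomainLinks` and `crawl`, the `fetchHtml` callback, and the `maxPages` limit are all hypothetical, and a regular expression stands in for an HTML parser such as cheerio to keep the sketch dependency-free.

```javascript
// Hypothetical helper: collect the links in `html` that stay on the same host as `pageUrl`.
// A regex is used here only to avoid dependencies; a real parser is more robust.
function sameDomainLinks(html, pageUrl) {
  const host = new URL(pageUrl).hostname;
  const links = new Set();
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    let resolved;
    try {
      resolved = new URL(match[1], pageUrl); // resolve relative links against the page
    } catch (err) {
      continue; // skip malformed URLs
    }
    if (!/^https?:$/.test(resolved.protocol)) continue; // skip mailto:, javascript:, etc.
    if (resolved.hostname !== host) continue; // stay on the same domain
    resolved.hash = ''; // treat /page and /page#section as the same page
    links.add(resolved.href);
  }
  return [...links];
}

// Hypothetical recursive crawl: visit a page, then recurse into each unvisited same-domain link.
// `fetchHtml` is an assumed async callback that returns the HTML of a URL as a string.
async function crawl(url, fetchHtml, visited = new Set(), maxPages = 50) {
  const href = new URL(url).href; // normalise so "https://a.com" and "https://a.com/" match
  if (visited.has(href) || visited.size >= maxPages) return visited;
  visited.add(href);
  const html = await fetchHtml(href);
  for (const link of sameDomainLinks(html, href)) {
    await crawl(link, fetchHtml, visited, maxPages);
  }
  return visited;
}
```

The `visited` set prevents the recursion from looping on pages that link to each other, and `maxPages` is an assumed safety limit for large sites.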

## Getting Started

To get started, clone this repository locally and change into the repository directory.

### Dependencies

* NodeJS

### Installing

* In the project directory, type `npm install` to install all the packages and dependencies.
* Rename _.env.example_ to _.env_ and set `PORT` in it to a port number (e.g. 3000).
* To configure the default URL to crawl, open _config.js_ and set the value of the `url` key to the URL you want to crawl.
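
As a rough illustration of the two settings above, the sketch below shows one possible shape for _config.js_ and a way to read the port. Only the `url` key and the `PORT` variable come from this README; the `resolvePort` helper, its fallback value, and the assumption that _.env_ has already been loaded into `process.env` (for example by a loader such as dotenv) are hypothetical.

```javascript
// Hypothetical shape of config.js; only the `url` key is named in this README.
const config = {
  url: 'https://medium.com', // replace with the site you want to crawl
};

// Hypothetical helper: read PORT from the environment, falling back to 3000.
// Assumes the contents of .env (e.g. PORT=3000) are already in process.env.
function resolvePort(env = process.env, fallback = 3000) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

module.exports = { config, resolvePort };
```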

### Executing program

To run the app, type the following command:

```
node index.js
```

Then visit `localhost:PORT/crawl` in your browser (replace `PORT` with the port number set in _.env_) to start crawling.

The application creates a CSV file in the _data/_ directory (_data/*.csv_) for each page that it visits.
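
One way to write a CSV file per visited page is sketched below. The single `url` column, the file-naming scheme, and the helper names (`csvField`, `linksToCsv`, `csvPathFor`, `writeLinksCsv`) are assumptions made for illustration; the project's actual file layout may differ.

```javascript
const fs = require('fs');
const path = require('path');

// Quote a field if it contains a comma, double quote, or line break.
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text with a single "url" column (the column name is an assumption).
function linksToCsv(links) {
  return ['url', ...links.map(csvField)].join('\n') + '\n';
}

// Derive a file path from the page URL (hypothetical naming scheme).
function csvPathFor(pageUrl, dataDir = 'data') {
  const { hostname, pathname } = new URL(pageUrl);
  const slug = (hostname + pathname)
    .replace(/[^a-z0-9]+/gi, '_') // collapse unsafe characters into underscores
    .replace(/^_+|_+$/g, ''); // trim leading and trailing underscores
  return path.join(dataDir, `${slug}.csv`);
}

// Write the links found on one page to its own CSV file and return the file path.
function writeLinksCsv(pageUrl, links, dataDir = 'data') {
  fs.mkdirSync(dataDir, { recursive: true }); // create data/ if it does not exist
  const file = csvPathFor(pageUrl, dataDir);
  fs.writeFileSync(file, linksToCsv(links));
  return file;
}
```

For example, `writeLinksCsv('https://example.com/blog', links)` would write to _data/example_com_blog.csv_ under this naming scheme.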

## Help

Are you stuck while working with the app, or still unsure how something works? Please feel free to open an issue anytime.

## Authors

[Abhijeet Singh](https://github.com/abhijeetps)