https://github.com/akashrajpurohit/node-crawler

A Node.js crawler that scrapes a website on a live domain and crawls it to find all the URLs on that domain

crawler node-crawler nodejs url

# Nodejs Crawler

### A basic Node.js crawler that crawls any domain and collects all the URLs found on that domain
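
Crawling a whole domain can be pictured as a breadth-first walk over same-origin links. The sketch below is only an illustration of that idea, not this project's code: `crawlDomain` and the injected `getLinks` callback are hypothetical names, and `getLinks` is assumed to resolve to the `href` values found on a page.

```javascript
// Hypothetical sketch: `crawlDomain` and `getLinks` are illustrative names,
// not part of this project's code.
async function crawlDomain(startUrl, getLinks) {
  const origin = new URL(startUrl).origin;
  const start = new URL(startUrl).href;
  const seen = new Set([start]); // every URL discovered so far
  const queue = [start];         // URLs still waiting to be visited

  while (queue.length > 0) {
    const pageUrl = queue.shift();
    // Assumption: getLinks(pageUrl) resolves to an array of href strings.
    const hrefs = await getLinks(pageUrl);

    for (const href of hrefs) {
      let resolved;
      try {
        resolved = new URL(href, pageUrl); // resolve relative links
      } catch {
        continue; // skip hrefs that are not valid URLs
      }
      resolved.hash = ''; // treat /page#a and /page#b as the same URL
      if (resolved.origin !== origin) continue; // stay on the same domain
      if (!seen.has(resolved.href)) {
        seen.add(resolved.href);
        queue.push(resolved.href);
      }
    }
  }
  return [...seen];
}
```

Passing the link fetcher in as a callback keeps the traversal independent of how pages are downloaded and parsed, so it can be tried against an in-memory map of pages before pointing it at a live server.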

Sample input HTML page served at ```localhost:4000```:

```html
Hello World

Hello

World

Lorem ipsum dolor sit amet, consectetur adipisicing elit. Nulla, laudantium, omnis. Ea quaerat minima, nostrum doloremque repellendus! Ratione quasi, non eligendi quidem at culpa animi vitae id eius corrupti deleniti.
Some image

This is some more dummy text

Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quaerat vitae dolor, atque, excepturi numquam cumque ut iusto, odio perferendis cum rem saepe eveniet voluptatum fuga debitis et illo distinctio eligendi!

Hi there, this is empty div with no children :(

Different section
```

Output:
```
💻💻💻 Scraping...

{ links:
   [ { linkText: 'Home', linkUrl: '/index.html' },
     { linkText: 'About', linkUrl: '/about.html' },
     { linkText: 'Contact', linkUrl: '/contact.html' },
     { linkText: 'Blogs', linkUrl: '/blog.html' } ],
  requestTime: 64,
  title: 'Hello World',
  url: 'http://localhost:4000' }

🥳🥳🥳 Done...
```
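
To show how a result of this shape could be built, here is a minimal sketch that pulls the page title and the anchor links out of an HTML string. It is an assumption-laden illustration, not the project's actual implementation: `extractPageInfo` is a hypothetical name, the regex parsing is a simplification (a real crawler would more likely use an HTML parser), and `requestTime` is left out because it would be measured around the HTTP request, which this sketch does not perform.

```javascript
// Hypothetical helper: the name and the regex-based parsing are assumptions
// made for this sketch, not the project's actual implementation.
function extractPageInfo(html, url) {
  // Page title: text inside the first <title> element, if there is one.
  const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const title = titleMatch ? titleMatch[1].trim() : '';

  // Anchors: capture the href value and the inner content of each <a> element.
  const anchorPattern =
    /<a\s[^>]*?href\s*=\s*["']([^"']*)["'][^>]*>([\s\S]*?)<\/a>/gi;
  const links = [];
  let match;
  while ((match = anchorPattern.exec(html)) !== null) {
    links.push({
      // Drop any nested tags so only the visible link text remains.
      linkText: match[2].replace(/<[^>]+>/g, '').trim(),
      linkUrl: match[1],
    });
  }

  return { links, title, url };
}

// Stand-in page (an assumption) containing the four links seen in the output above.
const sampleHtml = `
<html>
  <head><title>Hello World</title></head>
  <body>
    <a href="/index.html">Home</a>
    <a href="/about.html">About</a>
    <a href="/contact.html">Contact</a>
    <a href="/blog.html">Blogs</a>
  </body>
</html>`;

console.log(extractPageInfo(sampleHtml, 'http://localhost:4000'));
```

The `linkUrl` values are kept exactly as written in the page, which matches the relative paths in the output above; resolving them against the page URL would be a separate step.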