Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/akashrajpurohit/node-crawler
Node.js crawler that scrapes a website on a live domain and crawls it to find all URLs on that domain
crawler node-crawler nodejs url
- Host: GitHub
- URL: https://github.com/akashrajpurohit/node-crawler
- Owner: AkashRajpurohit
- Created: 2019-05-24T04:50:14.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-12-05T02:15:24.000Z (11 months ago)
- Last Synced: 2024-05-30T02:13:39.991Z (5 months ago)
- Topics: crawler, node-crawler, nodejs, url
- Language: JavaScript
- Homepage:
- Size: 47.9 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
README
# Nodejs Crawler
### A basic Node.js crawler that crawls any domain and collects all the URLs on that domain
Sample input HTML page served at `localhost:4000`:
```html
Hello World
Hello
World
Lorem ipsum dolor sit amet, consectetur adipisicing elit. Nulla, laudantium, omnis. Ea quaerat minima, nostrum doloremque repellendus! Ratione quasi, non eligendi quidem at culpa animi vitae id eius corrupti deleniti.
This is some more dummy text
Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quaerat vitae dolor, atque, excepturi numquam cumque ut iusto, odio perferendis cum rem saepe eveniet voluptatum fuga debitis et illo distinctio eligendi!
Hi there, this is empty div with no children :(
Different section
```
Output:
```
💻💻💻 Scraping...
{ links:
[ { linkText: 'Home', linkUrl: '/index.html' },
{ linkText: 'About', linkUrl: '/about.html' },
{ linkText: 'Contact', linkUrl: '/contact.html' },
{ linkText: 'Blogs', linkUrl: '/blog.html' } ],
requestTime: 64,
title: 'Hello World',
url: 'http://localhost:4000' }
🥳🥳🥳 Done...
```
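
To illustrate how a result shaped like the output above could be assembled, here is a minimal sketch. It is not taken from the repository: `extractPageInfo`, `sameDomainUrls`, and the sample HTML string are hypothetical names and data chosen for illustration, and the regular-expression parsing is a simplification (a real crawler would more likely use an HTML parser). Timing the request (`requestTime`) and fetching pages over the network are left out.

```javascript
// Hypothetical helper (not the repository's code): pulls the page title and
// every <a href="..."> link out of an HTML string using regular expressions.
function extractPageInfo(html, url) {
  const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const anchorPattern =
    /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1[^>]*>([\s\S]*?)<\/a>/gi;

  const links = [];
  let match;
  while ((match = anchorPattern.exec(html)) !== null) {
    links.push({
      // Strip any nested tags so only the visible link text remains.
      linkText: match[3].replace(/<[^>]*>/g, '').trim(),
      linkUrl: match[2],
    });
  }

  return { links, title: titleMatch ? titleMatch[1].trim() : '', url };
}

// Hypothetical helper: resolves each linkUrl against the page URL and keeps
// only the unique URLs on the same origin, i.e. the candidates a
// domain-limited crawler would visit next.
function sameDomainUrls(pageInfo) {
  const base = new URL(pageInfo.url);
  const seen = new Set();
  for (const { linkUrl } of pageInfo.links) {
    let resolved;
    try {
      resolved = new URL(linkUrl, base);
    } catch {
      continue; // Skip hrefs that cannot be parsed as URLs.
    }
    if (resolved.origin === base.origin) {
      seen.add(resolved.href);
    }
  }
  return [...seen];
}

// Assumed sample page, loosely modelled on the link names in the output above.
const sampleHtml = `
<html>
  <head><title>Hello World</title></head>
  <body>
    <a href="/index.html">Home</a>
    <a href="/about.html">About</a>
    <a href="https://example.com/">External</a>
  </body>
</html>`;

const info = extractPageInfo(sampleHtml, 'http://localhost:4000');
console.log(info);
console.log(sameDomainUrls(info));
```

Resolving each `linkUrl` with the built-in `URL` class before comparing origins is one way to keep a crawl inside a single domain, since relative paths such as `/about.html` and absolute links to other hosts are then handled uniformly.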