https://github.com/adityasinghvats/web-crawler
This is a project which mimics the web crawlers used in large browser engines like Chromium and Gecko.
csv jest-tests seo web-crawling
- Host: GitHub
- URL: https://github.com/adityasinghvats/web-crawler
- Owner: Adityasinghvats
- Created: 2025-01-24T17:06:16.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-25T14:40:02.000Z (about 1 year ago)
- Last Synced: 2025-06-08T10:05:16.710Z (10 months ago)
- Topics: csv, jest-tests, seo, web-crawling
- Language: JavaScript
- Homepage:
- Size: 48.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Web-crawler
This project mimics the web crawlers that search engines use to discover and index pages.
## Description
This web crawler starts from a base URL, follows the links it collects, and generates a report of the pages it visited and the number of times each page was found. It is built with Node.js and demonstrates basic web crawling and reporting.
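As a rough illustration of the crawl-and-count idea described above, here is a minimal sketch. It is not the project's actual code: the `site` object, `crawl`, and `buildReport` are hypothetical names, and an in-memory map stands in for real HTTP requests so the example runs offline.

```javascript
// Hypothetical in-memory "site": each URL maps to the links found on that page.
// A real crawler would fetch the HTML and extract the anchors instead.
const site = {
  "https://example.com/": ["https://example.com/about", "https://example.com/blog"],
  "https://example.com/about": ["https://example.com/", "https://example.com/blog"],
  "https://example.com/blog": ["https://example.com/", "https://other.example.org/"],
};

// Crawl from currentUrl, counting how often each same-host page is found.
function crawl(baseUrl, currentUrl, pages) {
  // Stay on the starting host; links to other hosts are ignored.
  if (new URL(currentUrl).hostname !== new URL(baseUrl).hostname) {
    return pages;
  }

  // Already visited: bump the hit count and do not crawl the page again.
  if (pages.has(currentUrl)) {
    pages.set(currentUrl, pages.get(currentUrl) + 1);
    return pages;
  }

  // First visit: record the page, then follow its links.
  pages.set(currentUrl, 1);
  for (const link of site[currentUrl] ?? []) {
    crawl(baseUrl, link, pages);
  }
  return pages;
}

// Turn the hit counts into [url, hits] rows, highest count first.
function buildReport(pages) {
  return [...pages.entries()].sort((a, b) => b[1] - a[1]);
}

const pages = crawl("https://example.com/", "https://example.com/", new Map());
for (const [url, hits] of buildReport(pages)) {
  console.log(`${hits}\t${url}`);
}
```

Keeping the counts in a `Map` keyed by URL means the same structure serves as both the visited set and the report data.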
## Features
- Crawl web pages and collect links
- Normalize URLs
- Handle both absolute and relative URLs
- Generate a report of crawled pages and their hit counts
- Unit tests for core functionality using Jest
- Export the report as CSV for use in Excel or for parsing with Python
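The URL handling and CSV export listed above could look something like the following sketch. The function names (`normalizeUrl`, `resolveLink`, `toCsv`) and the `url,hits` column layout are assumptions made for illustration, not the project's confirmed API. Only the `URL` class built into Node.js is used.

```javascript
// Hypothetical helper: reduce a URL to "host/path" so that variants differing
// only in protocol, host casing, or a trailing slash compare as equal.
function normalizeUrl(urlString) {
  const url = new URL(urlString);
  const hostPath = `${url.hostname}${url.pathname}`;
  return hostPath.endsWith("/") ? hostPath.slice(0, -1) : hostPath;
}

// Hypothetical helper: resolve an href against the page it was found on.
// Absolute hrefs pass through unchanged; relative ones are joined to the base.
function resolveLink(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

// Hypothetical helper: render [url, hits] rows as CSV with a header line.
// URLs are quoted, and embedded quotes doubled, so commas in a URL are safe.
function toCsv(rows) {
  const quote = (value) => `"${String(value).replace(/"/g, '""')}"`;
  const lines = rows.map(([url, hits]) => `${quote(url)},${hits}`);
  return ["url,hits", ...lines].join("\n");
}

console.log(normalizeUrl("https://Example.com/path/"));
console.log(resolveLink("/about", "https://example.com/blog/post"));
console.log(toCsv([["example.com/path", 3]]));
```

Normalizing before counting is what lets `http://example.com/path` and `https://example.com/path/` share one report entry under this scheme.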
## Installation
1. Clone the repository:
```sh
git clone https://github.com/Adityasinghvats/web-crawler.git
```
2. Navigate to the project directory:
```sh
cd web-crawler
```
3. Install the dependencies:
```sh
npm install
```
## Usage
To start the web crawler, run the following command:
```sh
npm start