https://github.com/adityasinghvats/web-crawler

This project mimics, on a small scale, the web crawlers that search engines use to discover and index pages.

Host: GitHub
URL: https://github.com/adityasinghvats/web-crawler
Owner: Adityasinghvats
Created: 2025-01-24T17:06:16.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-01-25T14:40:02.000Z (about 1 year ago)
Last Synced: 2025-06-08T10:05:16.710Z (10 months ago)
Topics: csv, jest-tests, seo, web-crawling
Language: JavaScript
Homepage:
Size: 48.8 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Web-crawler

This project mimics, on a small scale, the web crawlers that search engines use to discover and index pages.

## Description

This web crawler is designed to crawl web pages starting from a base URL, collect links, and generate a report of the pages it has visited and the number of times each page was found. It is built using Node.js and demonstrates basic web crawling and reporting functionalities.
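The counting and reporting described above can be sketched roughly as follows. This is an illustration only, not the repository's code: the helper names `recordPage` and `sortPages`, the `Map` used to hold counts, and the sample page keys are all assumptions.

```javascript
// Hypothetical helper (not taken from the repository):
// count one more sighting of a page, keyed by its normalized URL.
function recordPage(pages, pageKey) {
  pages.set(pageKey, (pages.get(pageKey) || 0) + 1);
  return pages;
}

// Hypothetical helper: order pages by hit count, highest first.
function sortPages(pages) {
  return [...pages.entries()].sort((a, b) => b[1] - a[1]);
}

// Example run with made-up page keys.
const pages = new Map();
for (const key of ['example.com', 'example.com/about', 'example.com']) {
  recordPage(pages, key);
}
for (const [page, hits] of sortPages(pages)) {
  console.log(`Found ${hits} link(s) to ${page}`);
}
```

Keying the counts by a normalized URL means that different spellings of the same address contribute to one entry in the report.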

## Features

- Crawl web pages and collect links
- Normalize URLs
- Handle both absolute and relative URLs
- Generate a report of crawled pages and their hit counts
- Unit tests for core functionalities using Jest
- Export the report as CSV for use in Excel or for parsing with Python
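As a rough illustration of three of these features (URL normalization, relative-link handling, and CSV export), here is a minimal sketch built on Node's global `URL` class. The function names `normalizeURL`, `resolveURL`, and `toCSV`, the choice to drop the protocol and trailing slash, and the `page,hits` column headers are assumptions made for this example; the project's own implementation may differ.

```javascript
// Hypothetical helper: reduce a URL to "host/path" so that variants of the
// same page (http vs https, trailing slash, uppercase host) share one key.
function normalizeURL(urlString) {
  const url = new URL(urlString);
  // The URL parser already lowercases the hostname.
  const hostPath = `${url.hostname}${url.pathname}`;
  return hostPath.endsWith('/') ? hostPath.slice(0, -1) : hostPath;
}

// Hypothetical helper: turn an href found on a page into an absolute URL.
// Relative hrefs are resolved against baseURL; absolute ones pass through.
// Returns null when the input cannot be parsed.
function resolveURL(href, baseURL) {
  try {
    return new URL(href, baseURL).href;
  } catch {
    return null;
  }
}

// Hypothetical helper: serialize [page, hits] rows as CSV text.
// Fields containing commas, quotes, or line breaks are quoted.
function toCSV(rows) {
  const escapeField = (value) => {
    const text = String(value);
    return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  return [['page', 'hits'], ...rows]
    .map((row) => row.map(escapeField).join(','))
    .join('\n');
}

console.log(normalizeURL('https://Blog.Example.com/path/'));
console.log(resolveURL('/about', 'https://example.com/blog/'));
console.log(toCSV([['example.com', 2], ['example.com/about', 1]]));
```

Quoting fields that contain commas keeps the file readable by spreadsheet tools even when a URL path includes one.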

## Installation

1. Clone the repository:
```sh
git clone https://github.com/Adityasinghvats/web-crawler.git
```
2. Navigate to the project directory:
```sh
cd web-crawler
```
3. Install the dependencies:
```sh
npm install
```

## Usage

To start the web crawler, run the following command:
```sh
npm start
```
