https://github.com/anzo52/jcrawl

Java web crawler
crawler java java-web-crawler web web-crawler

# JCrawl - Java Website Crawler

JCrawl is a basic web crawler implemented in Java. It starts from a given URL, scrapes the pages it reaches, and extracts the links found on them. Web crawling is the process of navigating web pages and extracting information from them, and is often used by search engines and web scrapers.

## Table of Contents
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Usage](#usage)

## Features
- Web crawling from a starting URL.
- Limit the number of links to scrape by specifying a breakpoint.
- Extract links from web pages.
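
The features above can be illustrated with a minimal sketch. This is not the actual `JCrawl.java` source: the class name `CrawlSketch`, the methods `extractLinks` and `crawl`, the regex-based link extraction, and the reading of "breakpoint" as a cap on the number of pages visited are all assumptions made for illustration. Pages are supplied through a `fetcher` function so the example runs without network access.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and behavior are assumptions, not JCrawl's real code.
class CrawlSketch {
    // Simplified pattern for href="..." attributes; a real HTML parser is more robust.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Extracts http(s) links from an HTML string, resolving relative ones against baseUrl.
    public static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                String resolved = URI.create(baseUrl).resolve(m.group(1).trim()).toString();
                if (resolved.startsWith("http://") || resolved.startsWith("https://")) {
                    links.add(resolved);
                }
            } catch (IllegalArgumentException e) {
                // Skip links that are not valid URIs.
            }
        }
        return links;
    }

    // Breadth-first crawl from startUrl; stops once `breakpoint` pages have been visited.
    public static List<String> crawl(String startUrl, int breakpoint,
                                     Function<String, String> fetcher) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        seen.add(startUrl);
        while (!queue.isEmpty() && visited.size() < breakpoint) {
            String url = queue.remove();
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // Treat a null body as a failed fetch.
            }
            visited.add(url);
            for (String link : extractLinks(url, html)) {
                if (seen.add(link)) { // add() returns false for already-seen links
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the web, keyed by URL.
        Map<String, String> pages = Map.of(
                "http://example.test/", "<a href=\"/a\">A</a> <a href='http://example.test/b'>B</a>",
                "http://example.test/a", "<a href=\"/\">home</a>",
                "http://example.test/b", "");
        // Prints [http://example.test/, http://example.test/a]
        System.out.println(crawl("http://example.test/", 2, pages::get));
    }
}
```

To crawl live sites, a real fetcher (for example one built on `java.net.http.HttpClient`, available since Java 11) could be passed in place of `pages::get`. The `seen` set keeps the crawler from queueing the same URL twice when pages link back to each other.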

## Prerequisites
- Java Development Kit (JDK) installed on your system.

## Usage
1. Clone or download this repository to your local machine.
2. Compile the `JCrawl.java` file using `javac`:
`javac JCrawl.java`
3. Run the program:
`java JCrawl`