Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/anzo52/jcrawl

Java web crawler

Host: GitHub
URL: https://github.com/anzo52/jcrawl
Owner: Anzo52
License: gpl-3.0
Created: 2022-04-10T09:10:39.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2023-10-13T02:42:51.000Z (over 1 year ago)
Last Synced: 2024-11-08T18:20:58.235Z (3 months ago)
Topics: crawler, java, java-web-crawler, web, web-crawler
Language: Java
Homepage:
Size: 23.4 KB
Stars: 2
Watchers: 2
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# JCrawl - Java Website Crawler

JCrawl is a basic web crawler implemented in Java. It scrapes web pages starting from a given URL and extracts the links found on those pages. Web crawling is the process of navigating web pages and extracting information from them; it is commonly used by search engines and web scrapers.

## Table of Contents
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Usage](#usage)

## Features
- Web crawling from a starting URL.
- Cap the number of links to scrape by specifying a breakpoint (a stopping limit).
- Extract links from web pages.
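
The features above can be illustrated with a minimal sketch. This is not the code in `JCrawl.java`; the class name `CrawlSketch`, the methods `extractLinks` and `crawl`, and the `fetcher` parameter are all hypothetical names chosen for illustration. It also assumes that the "breakpoint" means a cap on the number of pages visited, and it uses a rough regular expression rather than a real HTML parser.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; names and behavior are assumptions, not the repository's code.
class CrawlSketch {
    // Rough heuristic for href values inside <a> tags; not a full HTML parser.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the absolute http(s) links found in html, resolved against baseUrl. */
    static List<String> extractLinks(String html, String baseUrl) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            int hash = href.indexOf('#');
            if (hash >= 0) {
                href = href.substring(0, hash); // drop the fragment part
            }
            if (href.isEmpty()) {
                continue;
            }
            try {
                URI resolved = base.resolve(href);
                String scheme = resolved.getScheme();
                if ("http".equals(scheme) || "https".equals(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Skip malformed hrefs instead of aborting the crawl.
            }
        }
        return links;
    }

    /**
     * Breadth-first crawl from startUrl that stops once `breakpoint` pages
     * have been visited. The fetcher maps a URL to its HTML, or null on failure.
     */
    static Set<String> crawl(String startUrl, int breakpoint, Function<String, String> fetcher) {
        Set<String> visited = new LinkedHashSet<>(); // keeps visit order
        Queue<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty() && visited.size() < breakpoint) {
            String url = queue.remove();
            if (!visited.add(url)) {
                continue; // already visited
            }
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // page could not be fetched
            }
            for (String link : extractLinks(html, url)) {
                if (!visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}
```

Passing the fetcher in as a function keeps the crawl loop testable without network access. In a real run it could wrap an HTTP client such as `java.net.http.HttpClient` (Java 11 or later).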

## Prerequisites
- Java Development Kit (JDK) installed on your system.

## Usage
1. Clone or download this repository to your local machine.
2. Compile the `JCrawl.java` file using `javac`:
   `javac JCrawl.java`
3. Run the program:
   `java JCrawl`