Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/anzo52/jcrawl
Java web crawler
crawler java java-web-crawler web web-crawler
- Host: GitHub
- URL: https://github.com/anzo52/jcrawl
- Owner: Anzo52
- License: gpl-3.0
- Created: 2022-04-10T09:10:39.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-13T02:42:51.000Z (about 1 year ago)
- Last Synced: 2024-11-08T18:20:58.235Z (about 2 months ago)
- Topics: crawler, java, java-web-crawler, web, web-crawler
- Language: Java
- Homepage:
- Size: 23.4 KB
- Stars: 2
- Watchers: 2
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# JCrawl - Java Websites Crawler

JCrawl is a basic web crawler implemented in Java. It scrapes web pages starting from a given URL and extracts the links found on those pages. Web crawling is the process of navigating web pages and extracting information from them; it is commonly used by search engines and web scrapers.

## Table of Contents
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Usage](#usage)

## Features
- Web crawling from a starting URL.
- Specify the number of links to scrape using a breakpoint.
- Extract links from web pages.

## Prerequisites
- Java Development Kit (JDK) installed on your system.

## Usage
1. Clone or download this repository to your local machine.
2. Compile the `JCrawl.java` file using `javac`:
`javac JCrawl.java`

## Run the program
1. `java JCrawl`
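
## Sketch of the crawl loop

The Features section describes three behaviours: crawling from a starting URL, stopping at a breakpoint, and extracting links. The following is a minimal sketch of how these could fit together, not the actual JCrawl source. The class name `CrawlSketch`, the methods `extractLinks` and `crawl`, and the `fetch` parameter are all hypothetical names introduced for illustration. The sketch also assumes that the breakpoint counts distinct links collected, and it uses a naive regular expression where a real crawler would more likely use an HTML parser.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; names and behaviour are assumptions, not JCrawl's code. */
class CrawlSketch {

    // Naive href matcher: captures the attribute value up to a quote or fragment marker.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the http(s) links found in html, resolved against the page URL. */
    static List<URI> extractLinks(String html, URI page) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI link = page.resolve(m.group(1).trim());
                String scheme = link.getScheme();
                if ("http".equals(scheme) || "https".equals(scheme)) {
                    links.add(link);
                }
            } catch (IllegalArgumentException e) {
                // Skip href values that are not valid URIs.
            }
        }
        return links;
    }

    /**
     * Breadth-first crawl from start. Stops once `breakpoint` distinct links
     * have been collected or no pages are left to visit. `fetch` maps a URL
     * to its HTML, or null if the page could not be loaded.
     */
    static Set<URI> crawl(URI start, int breakpoint, Function<URI, String> fetch) {
        Set<URI> found = new LinkedHashSet<>(); // links collected, in discovery order
        Set<URI> queued = new HashSet<>();      // pages already scheduled for a visit
        Queue<URI> queue = new ArrayDeque<>();
        queue.add(start);
        queued.add(start);

        while (!queue.isEmpty() && found.size() < breakpoint) {
            URI page = queue.remove();
            String html = fetch.apply(page);
            if (html == null) {
                continue; // page unavailable; move on to the next one
            }
            for (URI link : extractLinks(html, page)) {
                if (found.size() >= breakpoint) {
                    break;
                }
                // Record new links and schedule each page at most once.
                if (found.add(link) && queued.add(link)) {
                    queue.add(link);
                }
            }
        }
        return found;
    }
}
```

Passing the page loader in as a function keeps the loop independent of any particular HTTP library. On Java 11 or later, one option for a real `fetch` is a small wrapper around `java.net.http.HttpClient`; for experiments, a lookup in an in-memory `Map` of URL to HTML is enough.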