https://github.com/brendanddev/groovy-scraper

Educational web scraper in Groovy demonstrating ethical scraping practices with JSoup. Perfect for learning HTML parsing, data extraction, and responsible crawling techniques.

Host: GitHub
URL: https://github.com/brendanddev/groovy-scraper
Owner: brendanddev
License: MIT
Created: 2025-08-10T14:36:32.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-08-10T16:25:33.000Z (10 months ago)
Last Synced: 2025-08-10T16:27:54.721Z (10 months ago)
Language: Groovy
Size: 10.7 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Groovy Web Scraper

An interactive **terminal-based web scraper application** built with Groovy and JSoup.
It is designed for real-world scraping tasks as well as for learning and experimenting with ethical scraping techniques.

---

## Overview

This application offers a **user-friendly command-line interface** to scrape data from websites by specifying URLs and CSS selectors.
It combines reusable Groovy utilities with practical features such as:

- Interactive terminal menu for custom scraping and pre-built examples
- Scraping data from demo and real-world websites
- Fetching and parsing HTML tables and JSON API responses
- Checking and respecting `robots.txt` policies to ensure ethical scraping
- Built-in delays to prevent overwhelming target servers
- Saving scraped results in text, JSON, or CSV formats
- Clear, color-coded terminal output for easy reading
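As a rough illustration of "URL plus CSS selector" scraping and HTML table parsing, here is a minimal sketch using JSoup, assuming jsoup 1.17.2 pulled in through Grape's `@Grab`. The helpers `fetchDocument`, `selectText`, and `parseTable`, and the user-agent string, are hypothetical and not taken from this project's source; the demo parses an inline HTML string so it runs without fetching a live page.

```groovy
@Grab('org.jsoup:jsoup:1.17.2')
import org.jsoup.Jsoup
import org.jsoup.nodes.Document

// Hypothetical helper: fetch a live page (not called in the demo below).
// The user-agent string is an assumption; timeout is in milliseconds.
Document fetchDocument(String url) {
    Jsoup.connect(url).userAgent('groovy-scraper-example/0.1 (educational)').timeout(10_000).get()
}

// Hypothetical helper: trimmed text of every element matching a CSS selector.
List<String> selectText(Document doc, String cssSelector) {
    doc.select(cssSelector).collect { it.text().trim() }
}

// Hypothetical helper: turn a table into row maps keyed by the first row's cells.
List<Map<String, String>> parseTable(Document doc, String tableSelector) {
    def table = doc.selectFirst(tableSelector)
    if (table == null) return []
    def rows = table.select('tr')
    if (rows.isEmpty()) return []
    List<String> headers = rows[0].select('th, td').collect { it.text().trim() }
    rows.drop(1).collect { row ->
        List<String> cells = row.select('td').collect { it.text().trim() }
        // Pair each header with the cell in the same column; surplus cells are dropped.
        [headers, cells].transpose().collectEntries { pair -> [(pair[0]): pair[1]] }
    }
}

def demoHtml = '''
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.95</td></tr>
</table>
<ul><li class="tag">groovy</li><li class="tag">jsoup</li></ul>
'''
Document demoDoc = Jsoup.parse(demoHtml)
println selectText(demoDoc, 'li.tag')
println parseTable(demoDoc, 'table#prices')
```

Treating the first row as the header keeps the sketch short; tables that use `colspan`, multiple header rows, or no header at all would need extra handling.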

Whether you’re a developer who wants to extract data quickly or someone learning how to build scrapers responsibly, this app is ready to use out of the box.

---

## Features

- **Custom Scraping:** Input any URL and CSS selector to scrape live data
- **Built-in Examples:** Demonstrations of table scraping, JSON API parsing, and robots.txt compliance
- **Robots.txt Checker:** Checks a site's `robots.txt` and tells you whether scraping is disallowed or the file is missing
- **Result Saving:** Export scraped data easily to text, JSON, or CSV files
- **Respectful Scraping:** Waits between consecutive requests so target servers are not overloaded
- **Terminal UI:** Intuitive prompts and colorful messages guide you through scraping tasks
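The robots.txt check and request pacing can be sketched with the Groovy standard library alone. This is a simplified illustration, not the project's implementation: `RobotsRules`, `robotsVerdict`, and `Pacer` are hypothetical names, only `Disallow` rules under `User-agent: *` are honoured (no `Allow`, wildcards, or `Crawl-delay`), and passing `null` stands in for a robots.txt that could not be fetched.

```groovy
// Simplified robots.txt model: only Disallow rules in "User-agent: *" groups are kept.
// Allow lines, wildcards ('*', '$') and Crawl-delay are deliberately ignored here.
class RobotsRules {
    final List<String> disallowed

    RobotsRules(List<String> disallowed) { this.disallowed = disallowed }

    static RobotsRules parse(String robotsTxt) {
        List<String> rules = []
        boolean inStarGroup = false
        boolean previousWasAgent = false
        robotsTxt.eachLine { String raw ->
            String line = raw.replaceFirst(/#.*/, '').trim()   // strip comments
            int colon = line.indexOf(':')
            if (colon < 0) return                             // skip blank or malformed lines
            String field = line.substring(0, colon).trim().toLowerCase()
            String value = line.substring(colon + 1).trim()
            if (field == 'user-agent') {
                // Consecutive User-agent lines belong to the same group.
                inStarGroup = (previousWasAgent && inStarGroup) || value == '*'
                previousWasAgent = true
            } else {
                previousWasAgent = false
                // An empty Disallow value means "allow everything", so it adds no rule.
                if (field == 'disallow' && inStarGroup && value) rules << value
            }
        }
        new RobotsRules(rules)
    }

    // A path is allowed unless it starts with one of the disallowed prefixes.
    boolean isAllowed(String path) {
        !disallowed.any { path.startsWith(it) }
    }
}

// Hypothetical helper: null robotsTxt means the file could not be fetched (e.g. HTTP 404).
String robotsVerdict(String robotsTxt, String path) {
    if (robotsTxt == null) return "No robots.txt found; treating ${path} as allowed"
    RobotsRules.parse(robotsTxt).isAllowed(path) ? "Allowed: ${path}" : "Disallowed by robots.txt: ${path}"
}

// Enforces a fixed minimum gap between consecutive requests.
class Pacer {
    final long minDelayMs
    private long lastRequestAt = 0L

    Pacer(long minDelayMs) { this.minDelayMs = minDelayMs }

    void awaitTurn() {
        long remaining = lastRequestAt + minDelayMs - System.currentTimeMillis()
        if (lastRequestAt > 0 && remaining > 0) Thread.sleep(remaining)
        lastRequestAt = System.currentTimeMillis()
    }
}

def sampleRobots = '''
User-agent: Googlebot
Disallow: /nogoogle

User-agent: *
Disallow: /private/
'''
println robotsVerdict(sampleRobots, '/private/report')
println robotsVerdict(null, '/index.html')

def pacer = new Pacer(500)
['/a', '/b'].each { path ->
    pacer.awaitTurn()   // the second iteration waits about 500 ms
    println "would fetch ${path}"
}
```

Ignoring `Allow` lines makes this version conservative: a path carved out by an `Allow` rule is still reported as disallowed.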

---

## Requirements

- Java 11 or later
- Groovy 3.x
- Internet connection to run scraping tasks

---

## Libraries

- [JSoup](https://jsoup.org/) — for parsing and extracting data from HTML
- Groovy standard libraries — for scripting and CLI utilities
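For JSON API responses and for exporting results, Groovy's bundled `groovy.json` package (`JsonSlurper`, `JsonOutput`) is enough. This sketch assumes results are a list of maps sharing the same keys; `parseItems`, `csvField`, `toCsv`, and `saveResults` are hypothetical helpers, and choosing the format from the file extension is an assumption, not a description of how the app's save option works.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Parse a JSON API response body into plain Groovy maps; a single object is wrapped in a list.
List<Map> parseItems(String jsonBody) {
    def parsed = new JsonSlurper().parseText(jsonBody)
    parsed instanceof List ? parsed : [parsed]
}

// Quote a CSV field (RFC 4180 style) only when it contains a comma, quote, or line break.
String csvField(Object value) {
    String s = value == null ? '' : value.toString()
    boolean needsQuotes = s.contains(',') || s.contains('"') || s.contains('\n') || s.contains('\r')
    needsQuotes ? '"' + s.replace('"', '""') + '"' : s
}

// Render rows as CSV, using the first row's keys as the header line.
String toCsv(List<Map> rows) {
    if (!rows) return ''
    List headers = rows[0].keySet() as List
    List<String> lines = [headers.collect { csvField(it) }.join(',')]
    rows.each { row -> lines << headers.collect { csvField(row[it]) }.join(',') }
    lines.join('\n') + '\n'
}

// Hypothetical helper: pick the output format from the file extension.
void saveResults(List<Map> rows, File target) {
    String name = target.name.toLowerCase()
    if (name.endsWith('.json')) {
        target.text = JsonOutput.prettyPrint(JsonOutput.toJson(rows))
    } else if (name.endsWith('.csv')) {
        target.text = toCsv(rows)
    } else {
        target.text = rows.collect { it.toString() }.join('\n') + '\n'
    }
}

def items = parseItems('[{"name": "Apple", "qty": 3}, {"name": "Pear, green", "qty": 5}]')
print toCsv(items)
```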

---
