https://github.com/brendanddev/groovy-scraper
Educational web scraper in Groovy demonstrating ethical scraping practices with JSoup. Perfect for learning HTML parsing, data extraction, and responsible crawling techniques.
- Host: GitHub
- URL: https://github.com/brendanddev/groovy-scraper
- Owner: brendanddev
- License: MIT
- Created: 2025-08-10T14:36:32.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-08-10T16:25:33.000Z (10 months ago)
- Last Synced: 2025-08-10T16:27:54.721Z (10 months ago)
- Language: Groovy
- Size: 10.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Groovy Web Scraper
An interactive **terminal-based web scraper** built with Groovy and JSoup.
It is designed for real-world scraping tasks as well as for learning and experimenting with ethical scraping techniques.
---
## Overview
This application offers a **user-friendly command-line interface** to scrape data from websites by specifying URLs and CSS selectors.
It combines reusable Groovy utilities with practical features such as:
- Interactive terminal menu for custom scraping and pre-built examples
- Scraping data from demo and real-world websites
- Fetching and parsing HTML tables and JSON APIs
- Checking and respecting `robots.txt` policies to ensure ethical scraping
- Built-in delays to prevent overwhelming target servers
- Saving scraped results in text, JSON, or CSV formats
- Clear, color-coded terminal output for easy reading
Whether you’re a developer who wants to extract data quickly or someone learning to build scrapers responsibly, the app is ready to use out of the box.
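
The README does not show how the `robots.txt` check or the request delay are implemented. As a rough illustration only, the sketch below shows one way such a check and pacing could look in plain Groovy. The helper name `isAllowed`, the sample rules, and the 500 ms delay are assumptions made for this example and are not taken from the project. The parser is deliberately simplified: it only honours `Disallow` prefix rules in `User-agent: *` groups and ignores `Allow` rules, wildcards, and `Crawl-delay`.

```groovy
// Hypothetical helper (not the project's code): simplified robots.txt check.
// Returns false only when a "Disallow" rule in a "User-agent: *" group
// is a prefix of the requested path.
boolean isAllowed(String robotsTxt, String path) {
    boolean groupMatches = false   // are we inside a group that applies to "*"?
    boolean lastWasAgent = false   // consecutive User-agent lines share one group
    for (String raw : robotsTxt.readLines()) {
        String line = raw.replaceAll('#.*', '').trim()   // drop comments
        int idx = line.indexOf(':')
        if (idx < 0) continue
        String field = line.substring(0, idx).trim().toLowerCase()
        String value = line.substring(idx + 1).trim()
        if (field == 'user-agent') {
            boolean isWildcard = (value == '*')
            groupMatches = lastWasAgent ? (groupMatches || isWildcard) : isWildcard
            lastWasAgent = true
        } else {
            lastWasAgent = false
            // An empty Disallow value means "nothing is disallowed".
            if (field == 'disallow' && groupMatches && value && path.startsWith(value)) {
                return false
            }
        }
    }
    return true
}

// Sample rules standing in for a downloaded robots.txt file.
String robots = '''\
User-agent: *
Disallow: /private/
'''

long delayMs = 500   // assumed pause between requests; tune per site

['/index.html', '/private/report.html'].each { path ->
    if (isAllowed(robots, path)) {
        println "fetching ${path}"
        sleep(delayMs)   // pacing so the target server is not hit too hard
    } else {
        println "skipping ${path} (disallowed by robots.txt)"
    }
}
```

In this sketch an empty or missing `robots.txt` is treated as "everything allowed"; a real tool may prefer to warn the user in that case, as the app is described as doing.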
---
## Features
- **Custom Scraping:** Input any URL and CSS selector to scrape live data
- **Built-in Examples:** Demonstrations of table scraping, JSON API parsing, and robots.txt compliance
- **Robots.txt Checker:** Checks a site's `robots.txt` rules and tells you when scraping is disallowed or the file is missing
- **Result Saving:** Export scraped data easily to text, JSON, or CSV files
- **Respectful Scraping:** Implements pacing between requests to avoid hitting servers too hard
- **Terminal UI:** Intuitive prompts and colorful messages guide you through scraping tasks
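
To make the result-saving feature more concrete, here is a minimal sketch of exporting rows to JSON and CSV with Groovy's standard library. It is not the project's implementation: the helper names `csvField` and `toCsv`, the sample rows, and the output file names are all assumptions chosen for illustration.

```groovy
import groovy.json.JsonOutput
import java.nio.file.Files

// Hypothetical helper: quote a CSV field when it contains a comma,
// a double quote, or a line break; double any embedded quotes.
String csvField(String value) {
    value.find(/[",\r\n]/) ? '"' + value.replace('"', '""') + '"' : value
}

// Hypothetical helper: turn a list of maps into CSV text,
// using the keys of the first row as the header line.
String toCsv(List<Map<String, String>> rows) {
    if (!rows) return ''
    List<String> headers = rows[0].keySet() as List
    List<String> lines = [headers.collect { csvField(it) }.join(',')]
    rows.each { row ->
        lines << headers.collect { csvField(row[it] ?: '') }.join(',')
    }
    lines.join('\n')
}

// Sample rows standing in for scraped results.
List<Map<String, String>> results = [
    [title: 'First item', price: '10.00'],
    [title: 'Item with, comma', price: '12.50'],
]

// Write both formats into a temporary directory.
File outDir = Files.createTempDirectory('scrape-results').toFile()
new File(outDir, 'results.json').text = JsonOutput.prettyPrint(JsonOutput.toJson(results))
new File(outDir, 'results.csv').text = toCsv(results)

println "Saved results to ${outDir}"
```

Quoting fields in the CSV writer matters because scraped text often contains commas or quotes that would otherwise shift columns.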
---
## Requirements
- Java 11 or later
- Groovy 3.x
- Internet connection to run scraping tasks
---
## Libraries
- [JSoup](https://jsoup.org/) — for parsing and extracting data from HTML
- Groovy standard libraries — for scripting and CLI utilities
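
As a short illustration of how JSoup and a CSS selector fit together, the sketch below parses an inline HTML table. It is an assumed example rather than code from the project: the sample HTML, the selector, and the pinned JSoup version in `@Grab` are all chosen for this illustration. `@Grab` downloads the library on first run, so that step needs network access.

```groovy
@Grab('org.jsoup:jsoup:1.17.2')   // assumed version; any recent release should work
import org.jsoup.Jsoup
import org.jsoup.nodes.Document

// Inline sample HTML so the sketch runs without contacting any website.
String html = '''
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>19.50</td></tr>
</table>
'''

Document doc = Jsoup.parse(html)
// For a live page one could instead use something like:
//   Jsoup.connect(url).userAgent('my-scraper').timeout(10000).get()

// CSS selector: rows of the table with id "prices" that contain data cells.
List<List<String>> rows = doc.select('table#prices tr:has(td)').collect { row ->
    row.select('td').collect { it.text() }
}

rows.each { println it.join(' | ') }
```

The `:has(td)` filter skips the header row, which contains only `th` cells.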
---