{"id":30430486,"url":"https://github.com/brendanddev/groovy-scraper","last_synced_at":"2025-08-22T18:22:30.189Z","repository":{"id":309212562,"uuid":"1035502077","full_name":"brendanddev/groovy-scraper","owner":"brendanddev","description":"Educational web scraper in Groovy demonstrating ethical scraping practices with JSoup. Perfect for learning HTML parsing, data extraction, and responsible crawling techniques.","archived":false,"fork":false,"pushed_at":"2025-08-10T16:25:33.000Z","size":11,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-10T16:27:54.721Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Groovy","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brendanddev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-10T14:36:32.000Z","updated_at":"2025-08-10T16:25:37.000Z","dependencies_parsed_at":"2025-08-10T16:27:57.094Z","dependency_job_id":"37c4cdaf-933e-4694-b8ad-81ea94acdfb1","html_url":"https://github.com/brendanddev/groovy-scraper","commit_stats":null,"previous_names":["brendanddev/groovy-scraper"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/brendanddev/groovy-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brendanddev%2Fgroovy-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brendanddev%2Fgroovy-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brend
anddev%2Fgroovy-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brendanddev%2Fgroovy-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/brendanddev","download_url":"https://codeload.github.com/brendanddev/groovy-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brendanddev%2Fgroovy-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271680835,"owners_count":24802077,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-22T02:00:08.480Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-22T18:22:26.231Z","updated_at":"2025-08-22T18:22:30.169Z","avatar_url":"https://github.com/brendanddev.png","language":"Groovy","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Groovy Web Scraper\n\nAn interactive **terminal-based web scraper** built with Groovy and JSoup.  \nDesigned for practical scraping tasks as well as for learning and experimenting with ethical scraping techniques.\n\n---\n\n## Overview\n\nThis application provides an interactive **command-line interface** for scraping websites by specifying a URL and a CSS selector.  
\nIt combines reusable Groovy utilities with practical features, including:\n\n- An interactive terminal menu for custom scraping and pre-built examples\n- Scraping data from demo and real-world websites\n- Fetching and parsing HTML tables and JSON API responses\n- Checking and respecting `robots.txt` policies to support ethical scraping\n- Built-in delays between requests so that target servers are not overwhelmed\n- Saving scraped results as text, JSON, or CSV\n- Clear, color-coded terminal output\n\nWhether you want to extract data quickly or are learning to build scrapers responsibly, the app is ready to use out of the box.\n\n---\n\n## Features\n\n- **Custom Scraping:** Enter any URL and CSS selector to scrape live data\n- **Built-in Examples:** Demonstrations of table scraping, JSON API parsing, and `robots.txt` compliance\n- **`robots.txt` Checker:** Checks whether a site permits scraping and tells you if scraping is disallowed or if `robots.txt` is missing\n- **Result Saving:** Export scraped data to text, JSON, or CSV files\n- **Respectful Scraping:** Pauses between requests to avoid overloading servers\n- **Terminal UI:** Prompts and color-coded messages guide you through each scraping task\n\n---\n\n## Requirements\n\n- Java 11 or later\n- Groovy 3.x\n- An internet connection for running scraping tasks\n\n---\n\n## Libraries\n\n- [JSoup](https://jsoup.org/): parsing HTML and extracting data from it\n- Groovy standard libraries: scripting and CLI utilities\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrendanddev%2Fgroovy-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrendanddev%2Fgroovy-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrendanddev%2Fgroovy-scraper/lists"}