
https://github.com/pps-22-scooby/pps-22-scooby

Scala application for crawling and scraping web pages given as input, using custom rules defined through a DSL.

crawler crawlers internal-dsl scala scraper scrapers web web-crawler web-crawling web-scraper web-scrapers



# PPS-22-Scooby 🔍

## Team:

👨‍💻 Giovanni Antonioni

👨‍💻 Valerio Di Zio

👨‍💻 Francesco Magnani

👨‍💻 Luca Rubboli

## Technologies:

🔄 Scrum

🛠 SBT

🔗 Git

🎯 YouTrack

🚀 GitHub Actions

## Overview:

PPS-22-Scooby is a web scraping and crawling application. It enables users to extract data from web pages by following links and scraping specific content according to user-defined rules.

## Features:

🕷 **Crawling**: The application navigates web pages, follows links, and retrieves content.

🔍 **Scraping**: Relevant data is extracted from HTML/XML pages using XPath, CSS selectors, or regular expressions.

🛠 **Customization**: Users can define custom scraping and crawling rules to suit their specific needs.

⚙️ **Parallel Processing**: Crawling and scraping tasks run concurrently for efficient execution.

📤 **Export**: Users can export the extracted data in various formats.
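The crawling, scraping, customization, and export features can be illustrated with a small, self-contained sketch. It is not Scooby's actual API or DSL: the in-memory `pages` map stands in for HTTP fetching, and `Rule`, `scrapeRegex`, `crawl`, and `exportCsv` are hypothetical names chosen for illustration. Scraping here uses only regular expressions from the Scala standard library; XPath and CSS selectors would require an HTML or XML parsing library.

```scala
import scala.collection.mutable
import scala.util.matching.Regex

// Hypothetical in-memory "web" (URL -> HTML) standing in for real HTTP fetching.
val pages: Map[String, String] = Map(
  "https://example.org/" ->
    """<a href="https://example.org/a">A</a> <a href="https://example.org/b">B</a> <h1>Home</h1>""",
  "https://example.org/a" -> """<a href="https://example.org/b">B</a> <h1>Page A</h1>""",
  "https://example.org/b" -> """<a href="https://other.org/">Out</a> <h1>Page B</h1>"""
)

// Hypothetical user-defined rule: crawl depth, which links to follow, what to extract.
final case class Rule(maxDepth: Int, follow: String => Boolean, extract: Regex)

val linkPattern: Regex = "href=\"([^\"]+)\"".r

// Scraping: return capture group 1 of every match of `pattern` in `html`.
def scrapeRegex(html: String, pattern: Regex): List[String] =
  pattern.findAllMatchIn(html).map(_.group(1)).toList

// Crawling: breadth-first traversal from `start`, bounded by rule.maxDepth.
// Returns, for each fetched URL, the values extracted by rule.extract.
def crawl(start: String, rule: Rule, fetch: String => Option[String]): Map[String, List[String]] = {
  val results = mutable.LinkedHashMap.empty[String, List[String]]
  val queue = mutable.Queue((start, 0))
  while (queue.nonEmpty) {
    val (url, depth) = queue.dequeue()
    if (!results.contains(url)) {
      fetch(url).foreach { html =>
        results(url) = scrapeRegex(html, rule.extract)
        if (depth < rule.maxDepth) {
          scrapeRegex(html, linkPattern)
            .filter(rule.follow)
            .filterNot(results.contains)
            .foreach(link => queue.enqueue((link, depth + 1)))
        }
      }
    }
  }
  results.toMap
}

// Export: naive CSV, one "url,value" line per extracted value (no quoting or escaping).
def exportCsv(results: Map[String, List[String]]): String =
  results.toList.sortBy(_._1).flatMap { case (url, values) =>
    values.map(value => s"$url,$value")
  }.mkString("\n")

val rule = Rule(
  maxDepth = 1,
  follow = _.startsWith("https://example.org/"),
  extract = "<h1>(.*?)</h1>".r
)
val results = crawl("https://example.org/", rule, pages.get)
println(exportCsv(results))
```

Bounding the traversal by depth and by a `follow` predicate is one simple way to keep a crawl on the target site; the project's real DSL may express such limits differently.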

## Implementation:

PPS-22-Scooby is built in Scala and uses an actor library for concurrency management. The project uses Git for version control, YouTrack for project management, and GitHub Actions for continuous integration.
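The README does not name the actor library Scooby uses. As a stand-in, the sketch below uses `scala.concurrent.Future` from the standard library, not actors, to show the same idea: independent pages are scraped concurrently and the results are gathered once every task has finished. `fetchPage` is a hypothetical placeholder for an HTTP request, and `scrapeTitle` and `scrapeAll` are illustrative names.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Uses the global thread pool; an actor system would manage its own dispatcher instead.
implicit val ec: ExecutionContext = ExecutionContext.global

// Hypothetical fetch: returns a fake page whose title is the last URL segment.
def fetchPage(url: String): String = s"<title>${url.split('/').last}</title>"

val titlePattern = "<title>(.*?)</title>".r

// Scrape one page asynchronously: fetch it, then extract the first <title>, if any.
def scrapeTitle(url: String): Future[Option[String]] = Future {
  titlePattern.findFirstMatchIn(fetchPage(url)).map(_.group(1))
}

// Start one task per URL; the tasks run concurrently on `ec`,
// and Future.traverse collects their results in input order.
def scrapeAll(urls: List[String]): Future[List[Option[String]]] =
  Future.traverse(urls)(scrapeTitle)

val urls = List("https://example.org/a", "https://example.org/b", "https://example.org/c")
val titles = Await.result(scrapeAll(urls), 5.seconds)
println(titles)
```

An actor-based design would instead assign pages to worker actors that report results back to a coordinator by message passing; the futures version is simply the shortest standard-library way to show concurrent scraping.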

## Get Started:

To use PPS-22-Scooby, see the **Get Started** section at https://pps-22-scooby.github.io/