
https://github.com/piero24/big-data_hw_23-24

Exercises in Java and Spark for the Big Data Computing course at unipd

Host: GitHub
URL: https://github.com/piero24/big-data_hw_23-24
Owner: Piero24
Created: 2024-03-23T20:56:02.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-06-07T20:12:44.000Z (8 months ago)
Last Synced: 2025-01-01T11:13:28.482Z (about 2 months ago)
Topics: big-data, clustering, fft, java, mapreduce, sampling, spark, streaming
Language: Java
Homepage:
Size: 14.2 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

[![Last modified](https://img.shields.io/badge/Last%20modified-10--Aug--2021-red)](https://github.com/Piero24/Big-Data_HW_23-24)
# Big-Data_HW_23-24

> academic year 2023-2024 (unipd)
>
> University of Padua

---

## Java and Spark programming exercises for the Big Data course

Homework assigned by the instructor to build basic skills in Java and Spark and to learn the fundamentals of Big Data.

This repository collects the assignments together with their solutions.

The test consists of solving the listed exercises.

## Disclaimer

These exercises should ONLY be used for practicing.

**I AM IN NO WAY RESPONSIBLE FOR MISUSE OF THIS MATERIAL.**

**DO NOT** rely solely on the following exercises for preparation, as the course program may vary over the years.
Use this material exclusively for practice.

## Description

There are three exercises.

- The first exercise covers the MapReduce programming model.
- The second exercise covers clustering with the Farthest-First Traversal algorithm, implemented both sequentially and with MapReduce.
- The third exercise covers data streaming, implementing the Sticky Sampling and Reservoir Sampling algorithms to detect frequent items in a stream.
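The MapReduce model behind the first exercise splits a job into a map phase that emits key-value pairs, a shuffle that groups pairs by key, and a reduce phase that aggregates each group. The sketch below makes those three phases explicit in plain Java (no Spark, so it stays self-contained). Word count is used only as an illustrative stand-in, not as the homework's actual task, and the class and method names (`MapReduceSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: word count written as explicit map, shuffle and reduce phases.
final class MapReduceSketch {

    // Map phase: each input line emits one (word, 1) pair per word.
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1L));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all intermediate values by their key.
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the values associated with each key.
    static Map<String, Long> reduce(Map<String, List<Long>> groups) {
        Map<String, Long> result = new TreeMap<>();
        groups.forEach((key, values) ->
            result.put(key, values.stream().mapToLong(Long::longValue).sum()));
        return result;
    }

    static Map<String, Long> wordCount(List<String> lines) {
        List<Map.Entry<String, Long>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        return reduce(shuffle(intermediate));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("big data big", "data spark")));
        // prints {big=2, data=2, spark=1}
    }
}
```

In Spark's Java API the same pipeline is usually expressed with `JavaRDD.flatMapToPair` for the map phase and `JavaPairRDD.reduceByKey` for the shuffle and reduce phases, letting the framework distribute the grouping across partitions.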
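Farthest-First Traversal (FFT), used in the second exercise, picks k centers greedily: start from an arbitrary point, then repeatedly add the point whose distance to its closest already-chosen center is largest. The sketch below is a sequential version under stated assumptions: points are 2D `double[]` pairs with Euclidean distance, the first point is the initial center, and `FarthestFirstTraversal`, `selectCenters`, and `radius` are hypothetical names, not the repository's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sequential Farthest-First Traversal for k-center clustering on 2D points.
final class FarthestFirstTraversal {

    static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    // Returns the indices of k centers chosen from points (assumes k <= points.size()).
    static List<Integer> selectCenters(List<double[]> points, int k) {
        int n = points.size();
        List<Integer> centers = new ArrayList<>();
        double[] minDist = new double[n]; // distance of each point to its closest center so far
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);

        int next = 0; // assumption: the first point is the initial center
        for (int c = 0; c < k; c++) {
            centers.add(next);
            double[] center = points.get(next);
            int farthest = -1;
            double farthestDist = -1.0;
            for (int i = 0; i < n; i++) {
                // Only the newly added center can lower a point's closest-center distance.
                minDist[i] = Math.min(minDist[i], distance(points.get(i), center));
                if (minDist[i] > farthestDist) {
                    farthestDist = minDist[i];
                    farthest = i;
                }
            }
            next = farthest; // the point farthest from all current centers
        }
        return centers;
    }

    // Clustering radius: the maximum distance of any point from its closest center.
    static double radius(List<double[]> points, List<Integer> centers) {
        double r = 0.0;
        for (double[] p : points) {
            double best = Double.POSITIVE_INFINITY;
            for (int c : centers) {
                best = Math.min(best, distance(p, points.get(c)));
            }
            r = Math.max(r, best);
        }
        return r;
    }

    public static void main(String[] args) {
        List<double[]> pts = List.of(
            new double[]{0, 0}, new double[]{1, 0}, new double[]{10, 0},
            new double[]{11, 0}, new double[]{0, 10});
        List<Integer> centers = selectCenters(pts, 3);
        System.out.println(centers + " radius=" + radius(pts, centers));
        // prints [0, 3, 4] radius=1.0
    }
}
```

Keeping the `minDist` array means each round costs O(n) distance computations, so choosing k centers takes O(nk) time, and FFT is known to give a 2-approximation for the k-center objective. One common way to pair it with MapReduce (an assumption here, not taken from the repository) is to run FFT on each partition to get a small coreset, then run it once more, sequentially, on the union of the coresets.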
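Reservoir Sampling, one of the two streaming algorithms in the third exercise, keeps a uniform random sample of m items from a stream of unknown length using O(m) memory. The sketch below follows the classic scheme (often called Algorithm R): the first m items fill the reservoir, and the t-th item after that replaces a random slot with probability m/t. The class name `ReservoirSampler` and the seeded `java.util.Random` (for reproducibility) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical reservoir sampler: keeps a uniform random sample of size m from a stream.
final class ReservoirSampler<T> {
    private final int m;
    private final List<T> reservoir = new ArrayList<>();
    private final Random random;
    private long seen = 0; // number of stream items processed so far

    ReservoirSampler(int m, long seed) {
        this.m = m;
        this.random = new Random(seed);
    }

    void add(T item) {
        seen++;
        if (reservoir.size() < m) {
            reservoir.add(item); // fill the reservoir with the first m items
        } else {
            // Draw j uniformly in [0, seen): the item is kept with probability m / seen,
            // replacing the uniformly chosen slot j.
            long j = (long) (random.nextDouble() * seen);
            if (j < m) {
                reservoir.set((int) j, item);
            }
        }
    }

    List<T> sample() {
        return List.copyOf(reservoir);
    }
}
```

Since every stream position ends up in the reservoir with the same probability m/n, an item that occurs c times in a stream of length n is expected to occupy about c·m/n slots, which is what makes the sample usable for spotting frequent items.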
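Sticky Sampling estimates the items whose frequency is at least a fraction φ of the stream, within an error ε and a failure probability δ. The sketch below is a simplified variant that assumes the stream length n is known in advance (an assumption, not taken from the repository): with r = ln(1/(δφ))/ε, an untracked item starts being tracked with probability r/n, a tracked item is counted on every later occurrence, and items whose counter reaches (φ − ε)n are reported. The original algorithm by Manku and Motwani does not need n; it doubles the sampling interval as the stream grows and thins existing counters with coin flips. `StickySampling` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical Sticky Sampling sketch, assuming the stream length n is known in advance.
final class StickySampling {
    private final long n;               // assumed-known stream length
    private final double phi;           // frequency threshold
    private final double epsilon;       // accuracy parameter
    private final double samplingProb;  // probability r / n of starting to track an item
    private final Map<Long, Long> counters = new HashMap<>();
    private final Random random;

    StickySampling(long n, double phi, double epsilon, double delta, long seed) {
        this.n = n;
        this.phi = phi;
        this.epsilon = epsilon;
        double r = Math.log(1.0 / (delta * phi)) / epsilon;
        this.samplingProb = Math.min(1.0, r / n);
        this.random = new Random(seed);
    }

    void add(long item) {
        Long count = counters.get(item);
        if (count != null) {
            counters.put(item, count + 1);   // tracked items are counted on every occurrence
        } else if (random.nextDouble() < samplingProb) {
            counters.put(item, 1L);          // start tracking with probability r / n
        }
    }

    // Reports items whose counter reaches (phi - epsilon) * n, sorted by item.
    Map<Long, Long> frequentItems() {
        double threshold = (phi - epsilon) * n;
        Map<Long, Long> result = new TreeMap<>();
        counters.forEach((item, count) -> {
            if (count >= threshold) {
                result.put(item, count);
            }
        });
        return result;
    }
}
```

Memory depends on r rather than on n: about r items are expected to be tracked in total, plus the counters of items that were already being tracked.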

### Authors and Copyright

[Pietrobon Andrea](https://github.com/Piero24), [Friso Giovanni](https://github.com/GioFriso), [Agostini Francesco](https://github.com/FrancescoAgostiniUnipd)

### Note

This material will **NOT** be updated in the future.