https://github.com/piero24/big-data_hw_23-24

Exercises in Java and Spark for the Big Data Computing course at unipd

big-data clustering fft java mapreduce sampling spark streaming

[![Last modified](https://img.shields.io/badge/Last%20modified-10--Aug--2021-red)](https://github.com/Piero24/Big-Data_HW_23-24)
# Big-Data_HW_23-24

> academic year 2023-2024 (unipd)
>
> University of Padua

---

## Java and Spark programming exercises for the Big Data course

Homework assigned by the instructor to build basic skills in Java and Spark and to learn the fundamentals of Big Data.

This repository collects the exercises together with their solutions.

The test consists of solving the listed exercises.

## Disclaimer

These exercises should ONLY be used for practicing.

**I AM IN NO WAY RESPONSIBLE FOR MISUSE OF THIS MATERIAL.**

**DO NOT** rely solely on the following exercises for preparation, as the course program may vary over the years.
Use this material exclusively for practice.

## Description

There are three exercises:

- The first exercise is about the MapReduce programming model.
- The second exercise is about clustering with the Farthest-First Traversal algorithm, implemented sequentially and with MapReduce.
- The third exercise is about streaming: it implements the Sticky Sampling and Reservoir Sampling algorithms to detect frequent items in a stream.
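
As a rough illustration of two of the techniques named above, here is a minimal plain-Java sketch of a sequential Farthest-First Traversal and of Reservoir Sampling. It is not taken from the solutions in this repository: the class name `BigDataSketches`, the method names, the choice of the first input point as the initial center, and the sample data in `main` are all assumptions made for the example, and no Spark is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical helper class, written only for this illustration.
final class BigDataSketches {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Sequential Farthest-First Traversal. Starts from points.get(0)
     * (an arbitrary choice made for this sketch) and repeatedly adds the
     * point that is farthest from the centers selected so far.
     */
    static List<double[]> farthestFirstTraversal(List<double[]> points, int k) {
        List<double[]> centers = new ArrayList<>();
        if (points.isEmpty() || k <= 0) {
            return centers;
        }
        // minDist[i] = squared distance from point i to its closest center.
        double[] minDist = new double[points.size()];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);
        double[] current = points.get(0);
        centers.add(current);
        int target = Math.min(k, points.size());
        while (centers.size() < target) {
            int farthest = 0;
            double best = -1.0;
            for (int i = 0; i < points.size(); i++) {
                // Only the newest center can lower the stored distance.
                minDist[i] = Math.min(minDist[i], squaredDistance(points.get(i), current));
                if (minDist[i] > best) {
                    best = minDist[i];
                    farthest = i;
                }
            }
            current = points.get(farthest);
            centers.add(current);
        }
        return centers;
    }

    /**
     * Reservoir Sampling: keeps a uniform random sample of at most m items
     * from a stream whose length is not known in advance.
     */
    static <T> List<T> reservoirSample(Iterable<T> stream, int m, Random rng) {
        List<T> reservoir = new ArrayList<>();
        long seen = 0;
        for (T item : stream) {
            seen++;
            if (reservoir.size() < m) {
                reservoir.add(item); // fill phase: keep the first m items
            } else {
                // Draw j uniformly from [0, seen); replace slot j if j < m.
                long j = (long) (rng.nextDouble() * seen);
                if (j < m) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0, 0}, new double[] {1, 0},
                new double[] {10, 0}, new double[] {10, 10});
        for (double[] c : farthestFirstTraversal(points, 3)) {
            System.out.println(Arrays.toString(c));
        }

        List<Integer> stream = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            stream.add(i);
        }
        System.out.println(reservoirSample(stream, 10, new Random()));
    }
}
```

To look for frequent items, one option is to count how often each value appears in the returned sample; Sticky Sampling, which tracks approximate counts directly, is not covered by this sketch.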

### Authors and Copyright

[Pietrobon Andrea](https://github.com/Piero24), [Friso Giovanni](https://github.com/GioFriso), [Agostini Francesco](https://github.com/FrancescoAgostiniUnipd)

### Note

This material will **NOT** be updated in the future.