https://github.com/wgierke/distributed_data_analytics
Solutions for the hands-on sessions of the course "Distributed Data Analytics" at Hasso-Plattner-Institute using Akka and Spark.
akka data-analytics distributed inclusion-dependency spark
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/wgierke/distributed_data_analytics
- Owner: WGierke
- Created: 2017-11-22T15:36:59.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-01-24T14:43:38.000Z (about 7 years ago)
- Last Synced: 2024-12-13T23:51:23.094Z (about 1 month ago)
- Topics: akka, data-analytics, distributed, inclusion-dependency, spark
- Language: Java
- Homepage: https://hpi.de/naumann/teaching/teaching/ws-1718/distributed-data-analytics-vl-master.html
- Size: 95.6 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Distributed Data Analytics
## Solutions for the hands-on sessions on Akka and Spark
### 1. Akka
The [task](https://hpi.de/fileadmin/user_upload/fachgebiete/naumann/lehre/WS2017/DDA/4_Hands-on_Akka_Actor_Programming.pdf) was to crack hashes and find longest substrings.
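As a rough illustration of the hash-cracking half of the task, the sketch below brute-forces a numeric password by hashing every candidate in a range and comparing it with the target. It is not taken from the repository: the hash function (SHA-256), the numeric password space, and the names `HashCrackSketch`, `sha256Hex`, and `crackRange` are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashCrackSketch {

    // Hex-encoded SHA-256 digest of the given text (SHA-256 is an assumption).
    static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to ship SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Tries every candidate in [from, to) and returns the match, or -1 if none.
    static int crackRange(String targetHash, int from, int to) {
        for (int candidate = from; candidate < to; candidate++) {
            if (sha256Hex(Integer.toString(candidate)).equals(targetHash)) {
                return candidate;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String target = sha256Hex("4711"); // hypothetical password
        System.out.println(crackRange(target, 0, 10_000)); // prints 4711
    }
}
```

Because each range is checked independently, the candidate space could be split into disjoint ranges and handed to separate workers (for example, one Akka actor per range), which is the kind of parallelism the hands-on session targets.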
The [solution](https://github.com/WGierke/distributed_data_analytics/blob/master/akka-cracka/solution.csv) can be obtained using the final [jar](https://github.com/WGierke/distributed_data_analytics/files/1564077/v1.1.zip) by executing
`java -jar akka-cracka.jar --path path/to/students.csv`
### 2. Spark
The [task](https://hpi.de/fileadmin/user_upload/fachgebiete/naumann/lehre/WS2017/DDA/10_Hands-on_Spark_Batch_Processing.pdf) was to perform Inclusion Dependency Discovery using Spark.
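To make the term concrete: a unary inclusion dependency A ⊆ B holds when every value of column A also occurs in column B. The sketch below checks this naively for all column pairs in memory. It is plain Java rather than Spark and is not the algorithm used by `fINDer.jar`; the class name, method name, and column names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NaiveIndSketch {

    // Returns "A <= B" for every ordered pair of distinct columns
    // where all values of A also occur in B.
    static List<String> findUnaryInds(Map<String, List<String>> columns) {
        // Deduplicate each column once so containment becomes a set check.
        Map<String, Set<String>> valueSets = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : columns.entrySet()) {
            valueSets.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        List<String> inds = new ArrayList<>();
        for (Map.Entry<String, Set<String>> dep : valueSets.entrySet()) {
            for (Map.Entry<String, Set<String>> ref : valueSets.entrySet()) {
                if (dep.getKey().equals(ref.getKey())) {
                    continue; // skip the trivial dependency A <= A
                }
                if (ref.getValue().containsAll(dep.getValue())) {
                    inds.add(dep.getKey() + " <= " + ref.getKey());
                }
            }
        }
        return inds;
    }

    public static void main(String[] args) {
        // Hypothetical toy columns, loosely modelled on a TPC-H-like schema.
        Map<String, List<String>> columns = new LinkedHashMap<>();
        columns.put("orders.custkey", List.of("1", "2", "2"));
        columns.put("customer.custkey", List.of("1", "2", "3"));
        columns.put("nation.name", List.of("FRANCE", "PERU"));
        System.out.println(findUnaryInds(columns));
        // prints [orders.custkey <= customer.custkey]
    }
}
```

This pairwise check compares every column against every other, which does not scale to large datasets. A distributed approach would typically regroup the data by value so that the work can be spread across Spark executors.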
The solution can be obtained using the final [jar](https://github.com/WGierke/distributed_data_analytics/releases/download/v2.0/fINDer.jar) by executing
`java -jar fINDer.jar --path path/to/TPCH --cores NUMBER_OF_CORES`