https://github.com/lucasbotang/coursera_big_data_for_data_engineers

Assignments for the Big Data for Data Engineers specialization on Coursera by Yandex.

Topics: hadoop, hive, spark, spark-sql

# Specialization: Big Data for Data Engineers

## Big Data Essentials: HDFS, MapReduce and Spark RDD

Assignments:
- Hadoop Streaming Assignment 0: Word Count
- Hadoop Streaming Assignment 1: Words Rating
- Hadoop Streaming Assignment 2: Stop Words
- Spark Assignment 1: Pairs
- Spark Assignment 2: Reconstructing the path
- Real-World Applications: TF-IDF
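
As an illustration of the kind of job the Word Count assignment refers to, here is a minimal sketch of the mapper/reducer pattern used with Hadoop Streaming. It is not taken from the repository: the function names (`mapper`, `reducer`, `run_locally`), the lowercase whitespace tokenization, and the tab-separated `word<TAB>count` record format are assumptions made for the example. The local runner only imitates the shuffle/sort step in a single process.

```python
from itertools import groupby


def mapper(lines):
    """Emit one "word<TAB>1" record per whitespace-separated token."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts of each word; records must arrive sorted by key."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Imitate map -> shuffle/sort -> reduce in one process (hypothetical helper)."""
    return list(reducer(sorted(mapper(lines))))
```

In a real streaming job the mapper and reducer would typically live in separate scripts that read from standard input and write to standard output, with the framework doing the sort between them.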

## Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames

Assignments:
- Hive Assignment 1. DDL: Create Tables
- Hive Assignment 2. DML: Find Most Popular Tags
- Spark Assignment 1: Counting number of the mutual friends
- Spark Assignment 2: Graph based Music Recommender
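
The idea behind counting mutual friends can be sketched in plain Python, without Spark. This is not the assignment's solution: the input format (a dictionary mapping each user to a set of friends, assumed symmetric) and the function name `mutual_friend_counts` are assumptions chosen for the example. A Spark SQL or DataFrame version would usually express the same pairing as a self-join followed by a grouped count.

```python
from collections import Counter
from itertools import combinations


def mutual_friend_counts(friends):
    """Map each pair (user_a, user_b) to the number of friends they share.

    `friends` maps a user to the set of that user's friends
    (assumed symmetric for this sketch).
    """
    counts = Counter()
    for user, friend_set in friends.items():
        # Every pair of this user's friends has `user` as a mutual friend.
        for a, b in combinations(sorted(friend_set), 2):
            counts[(a, b)] += 1
    return counts
```

Pairs are stored with the smaller user first, so each unordered pair appears once in the result.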