https://github.com/lucasbotang/coursera_big_data_for_data_engineers
Assignments for Big Data for Data Engineers specialization on Coursera by Yandex.
hadoop hive spark spark-sql
- Host: GitHub
- URL: https://github.com/lucasbotang/coursera_big_data_for_data_engineers
- Owner: LucasBoTang
- Created: 2018-09-06T15:08:09.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-01-18T21:38:02.000Z (over 7 years ago)
- Last Synced: 2025-04-12T05:38:00.742Z (about 1 year ago)
- Topics: hadoop, hive, spark, spark-sql
- Language: Jupyter Notebook
- Size: 474 KB
- Stars: 8
- Watchers: 0
- Forks: 17
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Specialization: Big Data for Data Engineers
## Big Data Essentials: HDFS, MapReduce and Spark RDD
Assignments:
- Hadoop Streaming Assignment 0: Word Count
- Hadoop Streaming Assignment 1: Words Rating
- Hadoop Streaming Assignment 2: Stop Words
- Spark Assignment 1: Pairs
- Spark Assignment 2: Reconstructing the path
- Real-World Applications: TF-IDF
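
For orientation, the word-count pattern named in Hadoop Streaming Assignment 0 can be sketched in plain Python. This is an illustrative sketch only, not the solution stored in this repository: the function names `mapper`, `reducer`, and `run_locally` are hypothetical, and the in-process sort stands in for the shuffle that a Hadoop cluster would perform between the map and reduce stages.

```python
import re
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print to stdout."""
    for line in lines:
        for word in re.findall(r"\w+", line.lower()):
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each key; the input must already be sorted by key."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process (hypothetical helper for local testing)."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_locally(["a b a", "B c"]):
        print(record)
```

In a real streaming job the mapper and reducer would typically be separate scripts reading from standard input and writing to standard output; keeping them as generator functions here makes the logic easy to test without a cluster.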
## Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Assignments:
- Hive Assignment 1. DDL: Create Tables
- Hive Assignment 2. DML: Find Most Popular Tags
- Spark Assignment 1: Counting the number of mutual friends
- Spark Assignment 2: Graph-based Music Recommender
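
As a rough illustration of the idea behind the mutual-friends assignment, the counting step can be sketched in plain Python. The input shape (a dictionary from user ID to a list of friend IDs) and the function name `mutual_friend_counts` are assumptions made for this sketch; the actual assignment data format and the Spark solution in this repository may differ.

```python
from collections import Counter
from itertools import combinations


def mutual_friend_counts(friends):
    """Map each unordered pair of users to the number of friends they share.

    `friends` is assumed to map a user ID to an iterable of friend IDs.
    """
    counts = Counter()
    for user, friend_list in friends.items():
        # Every pair of this user's friends has `user` as one mutual friend.
        for a, b in combinations(sorted(set(friend_list)), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    graph = {1: [2, 3, 4], 2: [1, 3, 5], 3: [1, 2, 5], 4: [1], 5: [2, 3]}
    for pair, n in mutual_friend_counts(graph).most_common(3):
        print(pair, n)
```

The same "emit every pair of a user's friends, then count per pair" structure maps naturally onto a distributed job, where the pair emission is a flat-map step and the counting is an aggregation by key.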