https://github.com/cyyeh/ml-bigdata

# 10-405/10-605: Machine Learning with Large Datasets (Spring 2020)

Course website: https://10605.github.io/spring2020/index.html

## Lecture Summaries

- [Lecture 1: Introduction](lecture_summaries/01.md)
- [Lecture 2: Distributed Computing, MapReduce](lecture_summaries/02.md)
- [Lecture 3: Intro to Spark](lecture_summaries/03.md)
- [Lecture 4: Data Cleaning](lecture_summaries/04.md)
- [Lecture 5: Spark: Joins, Structure, and DataFrames](lecture_summaries/05.md)

## Recitation Summaries

- [Spark topology basics + setup with Databricks](recitation_summaries/01.md)
  - [notebook](labs/Recitation0.ipynb)
- [Spark Transformations and Actions](recitation_summaries/02.md)
  - [notebook](labs/Recitation2.ipynb)
- Spark RDDs and DataFrames
  - [notebook](labs/Recitation3.ipynb)

## Assignment Summaries

- [Assignment 1](assignments/hw1)
  - Covers the basics of distributed computing, the MapReduce programming model, Spark, and an example of data cleaning.
  - Has two major parts: the first builds a simple word count application, and the second covers entity resolution, a common type of data cleaning.
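
To give a feel for the two parts of Assignment 1, here is a minimal plain-Python sketch. It is not the assignment's solution and does not use Spark: `word_count` only mimics the map, shuffle, and reduce phases on a local list, and `jaccard` is one simple token-overlap score that could flag candidate duplicate records in entity resolution. The names `tokenize`, `word_count`, and `jaccard`, along with the sample strings, are assumptions made for illustration.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def word_count(lines):
    # Map: emit a (word, 1) pair for every token in every line.
    pairs = [(word, 1) for line in lines for word in tokenize(line)]
    # Shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    # Reduce: sum the values collected for each key.
    return {word: sum(ones) for word, ones in groups.items()}


def jaccard(a, b):
    """Token-set overlap between two records, in the range [0, 1]."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


counts = word_count(["the cat sat", "The cat ran"])
print(counts["the"], counts["cat"])  # 2 2
print(jaccard("Sony 40-inch LCD TV", "sony lcd tv 40 inch"))  # 1.0
```

In Spark the same word count is usually written as a chain of transformations over an RDD or DataFrame, so the grouping step happens across machines rather than in a local dictionary.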

## References

- [more Spark](https://heather.miller.am/teaching/cs4240/spring2018/)