https://github.com/cyyeh/ml-bigdata
CMU: 10-405/10-605: Machine Learning with Large Datasets (Spring 2020)
- Host: GitHub
- URL: https://github.com/cyyeh/ml-bigdata
- Owner: cyyeh
- Created: 2020-07-17T14:13:58.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-07-19T09:06:03.000Z (over 4 years ago)
- Last Synced: 2024-12-21T05:10:06.392Z (2 months ago)
- Language: Jupyter Notebook
- Size: 27.6 MB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 10-405/10-605: Machine Learning with Large Datasets (Spring 2020)
Course website: https://10605.github.io/spring2020/index.html
## Lecture Summaries
- [Lecture 1: Introduction](lecture_summaries/01.md)
- [Lecture 2: Distributed Computing, MapReduce](lecture_summaries/02.md)
- [Lecture 3: Intro to Spark](lecture_summaries/03.md)
- [Lecture 4: Data Cleaning](lecture_summaries/04.md)
- [Lecture 5: Spark: Joins, Structure, and DataFrames](lecture_summaries/05.md)

## Recitation Summaries
- [Spark topology basics + setup with Databricks](recitation_summaries/01.md)
  - [notebook](labs/Recitation0.ipynb)
- [Spark Transformations and Actions](recitation_summaries/02.md)
  - [notebook](labs/Recitation2.ipynb)
- Spark RDDs and DataFrames
  - [notebook](labs/Recitation3.ipynb)

## Assignment Summaries
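
The word count part of Assignment 1 is the canonical MapReduce example. As a rough illustration only (this is not the assignment's starter code or the course's solution), the plain-Python sketch below simulates the three MapReduce phases in a single process: a map step that emits `(word, 1)` pairs, a shuffle step that groups the pairs by word, and a reduce step that sums each group. Lowercasing and whitespace tokenization are assumptions; the assignment may define its own tokenization rules. The function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every whitespace-separated token.

    Case-folding and whitespace splitting are assumptions for this sketch.
    """
    for word in line.lower().split():
        yield (word, 1)


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted count under its key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the counts collected for each word by summing them."""
    return {word: reduce(lambda a, b: a + b, counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Single-process simulation: on a real cluster, map calls would run on
    # many workers and each key's reduce could run on a different worker.
    mapped = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["the quick brown fox", "The lazy dog", "the fox"]
    print(word_count(docs))  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

In PySpark, the same pipeline is commonly written as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where `sc` is a `SparkContext` and `path` is a placeholder. `reduceByKey` also merges values locally within each partition before the shuffle, which reduces the amount of data sent across the network compared with grouping first and summing afterwards.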
- [Assignment 1](assignments/hw1)
  - This assignment covers the basics of distributed computing, the MapReduce programming model, Spark, and an example of data cleaning.
  - It has two major parts: the first builds a simple word count application, and the second covers entity resolution, a common type of data cleaning.

## References
- [more Spark](https://heather.miller.am/teaching/cs4240/spring2018/)