
https://github.com/noahgift/cloud-data-analysis-at-scale

[Course-2020-2023] taught at Duke MIDS. This is also a Coursera Course that covers MLOps, ML Engineering and the foundations of Cloud Computing for Data Science.

# Data Analysis at Scale in the Cloud

Course taught at [Duke MIDS](https://datascience.duke.edu/noah-gift), Spring 2020-2022 by [Noah Gift](https://www.noahgift.com/).
* This is the [course syllabus](https://noahgift.github.io/cloud-data-analysis-at-scale/syllabus).
* These are the [projects in the course](https://noahgift.github.io/cloud-data-analysis-at-scale/projects).
* This is the [week-by-week calendar](https://noahgift.github.io/cloud-data-analysis-at-scale/calendar-2022).
* This is the [rubric for grading assignments](https://noahgift.github.io/cloud-data-analysis-at-scale/rubric).
* This is the [grading for the course](https://noahgift.github.io/cloud-data-analysis-at-scale/grading).
* This is the [FAQ](https://noahgift.github.io/cloud-data-analysis-at-scale/faq).
* A complete [online book with screencast videos is available here](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter01-getting-started/).
* The Coursera specialization [Building Cloud Computing Solutions at Scale](https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale) is also available.

## Guest Lecture 2022 (Async)

*GPT-3*:
* Book: https://learning.oreilly.com/library/view/gpt-3/9781098113612/
* Interview: https://learning.oreilly.com/videos/52-weeks-of/021822022VIDEOPAIML/
* Shubham Saboo
* Sandra Kublik

## Prequel Material

These resources could be helpful before starting this course.

### Duke/Coursera: Foundations of Data Engineering Course (Launching early 2022)

#### Course1: Python and Pandas for Data Engineering
#### Course2: Linux and Bash for Data Engineering

##### GitHub Repos for Projects in the Course

###### Week 1: Using Linux

* [Lesson 1: Using Linux Shell Lab](https://github.com/noahgift/Coursera-DE-C2-Using-Linux)
* [Lesson 2: How shell piping works](https://github.com/noahgift/Coursera-DE-C2-Shell-Piping)
* [Lesson 3: Using SSH](https://github.com/noahgift/ssh-tips-tricks)
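
As a quick illustration of piping (not taken from the lesson repo), here is a minimal sketch that assumes Bash and standard coreutils. Each `|` connects the standard output of one command to the standard input of the next. The function name `top_word` and the sample text are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the most frequent word read from stdin.
# Each stage reads stdin and writes stdout:
#   tr -s      splits the text into one word per line
#   sort       groups identical words together
#   uniq -c    prefixes each distinct word with its count
#   sort -rn   orders by count, highest first
#   head -n 1  keeps only the top line
top_word() {
  tr -s '[:space:]' '\n' | sort | uniq -c | sort -rn | head -n 1
}

# Made-up sample input
printf 'data cloud data scale data cloud\n' | top_word
```

The output line has the form `<count> <word>`, with leading padding from `uniq -c`.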

###### Week 2: Using Bash

* [Lesson 1: Create and Use .bashrc](https://github.com/noahgift/Coursera-DE-C2-configure-shell)
* [Lesson 2: Sourcing shell variables from a script](https://github.com/noahgift/Coursera-DE-C2-shell-variables)
* [Lesson 3: Using stdout and stdin](https://github.com/noahgift/Coursera-DE-C2-Standard-Streams)
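
The following is a minimal sketch of the three standard streams, assuming Bash. stdin is file descriptor 0, stdout is 1 and stderr is 2. The hypothetical `upper_lines` function reads from stdin and writes results to stdout. It sends a diagnostic to stderr, so the diagnostic stays out of any downstream pipe.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read lines from stdin (fd 0), write them upper-cased
# to stdout (fd 1), and report how many lines were processed on stderr (fd 2).
# Assumes every input line ends with a newline.
upper_lines() {
  local count=0 line
  while IFS= read -r line; do
    printf '%s\n' "$line" | tr '[:lower:]' '[:upper:]'
    count=$((count + 1))
  done
  printf 'processed %d lines\n' "$count" >&2
}

# Show only stdout; the stderr diagnostic is discarded
printf 'alpha\nbeta\n' | upper_lines 2>/dev/null
```

To keep only the diagnostic instead, use `upper_lines 2>&1 >/dev/null`. Redirections are applied left to right, so stderr is pointed at the original stdout before stdout is sent to `/dev/null`.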

###### Week 3: Building Bash Scripts

* [Lesson 1: Build a for loop in Bash](https://github.com/noahgift/Coursera-DE-C2-Use-Shell-Logic-and-Control-Flow)
* [Lesson 2: Truncate large files with Bash](https://github.com/noahgift/coursera-de-c2-truncate-file)
* [Lesson 3: Building a command-line tool for data processing](https://github.com/noahgift/Coursera-DE-C2-bash-cli-reverse-string)
* [Lesson 4: Build a Bash CLI with options](https://github.com/noahgift/Coursera-DE-C2-Lab3-Building-Bash-Scripts.git)
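
Here is a hedged sketch of a small command-line tool with options, assuming Bash and its `getopts` builtin. It combines the ideas of Lessons 3 and 4 by reversing a string. The `reverse_cli` name and the `-s`, `-u` and `-h` flags are hypothetical and are not taken from the lesson repos.

```bash
#!/usr/bin/env bash
# Hypothetical CLI: reverse a string, optionally upper-casing the result.
#   -s STRING  string to reverse (required)
#   -u         upper-case the result
#   -h         print usage and return successfully
reverse_cli() {
  local OPTIND=1 opt input="" upper=0 i reversed=""
  while getopts ":s:uh" opt; do
    case "$opt" in
      s) input="$OPTARG" ;;
      u) upper=1 ;;
      h) echo "usage: reverse_cli -s STRING [-u]"; return 0 ;;
      :) echo "option -$OPTARG requires an argument" >&2; return 1 ;;
      \?) echo "unknown option: -$OPTARG" >&2; return 1 ;;
    esac
  done
  if [ -z "$input" ]; then
    echo "usage: reverse_cli -s STRING [-u]" >&2
    return 1
  fi
  # Walk the string from its last character to its first (pure Bash, no rev)
  for (( i = ${#input} - 1; i >= 0; i-- )); do
    reversed+="${input:i:1}"
  done
  if [ "$upper" -eq 1 ]; then
    reversed=$(printf '%s' "$reversed" | tr '[:lower:]' '[:upper:]')
  fi
  printf '%s\n' "$reversed"
}

reverse_cli -s "cloud" -u
```

`OPTIND` is declared `local` and reset to 1, so the function can be called more than once in the same shell. The leading `:` in the option string turns on silent error reporting, which lets the `:` and `\?` branches print their own messages.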

###### Week 4: Composing File and Data Management Solutions with Linux

* [Lesson 1: Understand the search commands](https://github.com/noahgift/Coursera-DE-C2-search-commands)
* [Lesson 2: Setting permissions](https://github.com/noahgift/Coursera-DE-C2-Files-Directories-Permissions)
* [Lesson 3: Using regex to process text from file](https://github.com/noahgift/Coursera-DE-C2-using-regex-search)
* [Lesson 4: Search the filesystem with find](https://github.com/noahgift/Coursera-DE-C2-Lab4-Composing-File-Data-Solutions)
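
Here is a minimal sketch of searching with `find` and `grep -E`, assuming a POSIX-like system with Bash. It builds a throwaway directory of made-up files so the commands have something to match. The paths and file contents are hypothetical.

```bash
#!/usr/bin/env bash
# Create a throwaway directory tree with made-up files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # remove the directory when the script exits
mkdir -p "$workdir/logs"
printf 'INFO start\nERROR disk full\n' > "$workdir/logs/app.log"
printf 'id,name\n1,alpha\n' > "$workdir/data.csv"

# find: list regular files whose names end in .log
find "$workdir" -type f -name '*.log'

# find + grep -E: print lines that begin with ERROR or WARN in every .log file;
# -h omits file names, and {} + passes the matched files to a single grep call
find "$workdir" -type f -name '*.log' -exec grep -hE '^(ERROR|WARN)' {} +
```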

#### Course3: Python and SQL for Data Engineering
#### Course4: Building Data Engineering Solutions with Python for Web Applications, Command-Line Tools and Notebooks

## Sequel Material

These resources could be helpful after completing this course.

### Duke/Coursera: Applied Data Engineering Course (Launching late 2022)

## GitHub Repos Referenced in the Duke Coursera Course

### Course 1: Cloud Computing Foundations

* [Practice Markdown](https://github.com/noahgift/duke-coursera-ccf-lab1/blob/main/Practice-Markdown.ipynb)
* [GitHub Actions with Pytest](https://github.com/noahgift/github-actions-pytest)
* [Google App Engine Continuous Delivery](https://github.com/noahgift/gcp-flask-ml-deploy)
* [Hello World Flask](https://github.com/noahgift/flask-hello-coursera)
* [Hugo Continuous Delivery on AWS](https://github.com/noahgift/dukehugofeb1)

### Course 2: Cloud Computing Building Blocks

* [Lint Dockerfile](https://github.com/noahgift/duke-coursera-ccb-lab1)
* [Flask Change Microservice]

## Lecture Topics

### Getting Started: [Week 1]

* [Getting Started](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter01-getting-started/)

### Cloud Computing Foundations: [Week 2]

* [Cloud Computing Foundations](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter02-cloud-foundations/)

### Virtualization and Containers: [Week 3 & Week 4]

* [Containers, Virtualization and Elasticity](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter03-virtualization-containers-elasticity/)

### Challenges and Opportunities in Distributed Computing: [Week 5 & Week 6]

* [Distributed Computing](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter04-distributed-computing/)

### Cloud Storage: [Week 7 & Week 8]

* [Cloud Storage](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter05-cloud-storage/)

### Serverless: [Week 9 & Week 10]

* [Serverless](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter06-serverless-etl/)

### MLOps, Big Data and Edge Computer Vision: [Week 11 & Week 12 & Week 13]

* [Managed ML Systems](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter07-managed-ml/)
* [Edge Computer Vision Notebooks and Code](https://github.com/noahgift/edge-computer-vision)
* [Hugging Face](https://learning.oreilly.com/videos/applied-hugging-face/10212022VIDEOPAIML/)
* [OpenAI](https://learning.oreilly.com/videos/assimilate-openai/08252022VIDEOPAIML/)

### General

* [Key Terms](https://noahgift.github.io/cloud-data-analysis-at-scale/topics/key-terms)
* [Questions and Answers (Q&A)](https://noahgift.github.io/cloud-data-analysis-at-scale/topics/Question-Answer)

### Student Example Projects

* [434 Analytics Application Development by Steve Depp](http://www.stevedepp.com/learn/school/msds/de/434.html)
* [462 Computer Vision by Steve Depp](http://www.stevedepp.com/learn/school/msds/ai/462.html)

#### *A practical guide to Data Science, Machine Learning Engineering and Data Engineering*

[Read Cloud Computing for Data Book](https://paiml.com/docs/home/books/cloud-computing-for-data/)
![cloud4data books](https://d2sofvawe08yqg.cloudfront.net/cloud4data/hero2x?1578933644)

[Free book Developing-on-AWS-with-CSharp](https://d1.awsstatic.com/developer-center/Developing-on-AWS-with-CSharp.pdf)
![Screenshot 2022-10-28 at 7 12 09 AM](https://user-images.githubusercontent.com/58792/198574661-c631cffa-4fca-4b7e-836f-a82bef7d77f6.png)

#### Next Steps: Take the Coursera MLOps Course

![cloud-specialization](https://user-images.githubusercontent.com/58792/121041040-650ca180-c780-11eb-956e-8d1ecb134641.png)

* [Take the Specialization](https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale)
* [Cloud Computing Foundations](https://www.coursera.org/learn/cloud-computing-foundations-duke?specialization=building-cloud-computing-solutions-at-scale)
* [Cloud Virtualization, Containers and APIs](https://www.coursera.org/learn/cloud-virtualization-containers-api-duke?specialization=building-cloud-computing-solutions-at-scale)
* [Cloud Data Engineering](https://www.coursera.org/learn/cloud-data-engineering-duke?specialization=building-cloud-computing-solutions-at-scale)
* [Cloud Machine Learning Engineering and MLOps](https://www.coursera.org/learn/cloud-machine-learning-engineering-mlops-duke?specialization=building-cloud-computing-solutions-at-scale)

### Text and Code License
The text and code content of notebooks and documents is released under the [CC-BY-NC-ND license](https://github.com/noahgift/cloud-data-analysis-at-scale/blob/master/license.md).