Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/noahgift/cloud-data-analysis-at-scale
[Course-2020-2023] taught at Duke MIDS. This is also a Coursera Course that covers MLOps, ML Engineering and the foundations of Cloud Computing for Data Science.
- Host: GitHub
- URL: https://github.com/noahgift/cloud-data-analysis-at-scale
- Owner: noahgift
- License: other
- Created: 2019-11-27T23:51:40.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-10-31T11:25:39.000Z (about 2 years ago)
- Last Synced: 2024-10-12T15:32:17.120Z (3 months ago)
- Topics: analytics, cloud, data, duke, github, hugging, huggingface, machine-learning, mids, syllabus
- Language: Jupyter Notebook
- Homepage: https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale
- Size: 8.23 MB
- Stars: 128
- Watchers: 14
- Forks: 86
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: license.md
# Data Analysis at Scale in the Cloud
Course taught at [Duke MIDS](https://datascience.duke.edu/noah-gift), Spring 2020-2022 by [Noah Gift](https://www.noahgift.com/).
* This is the [course syllabus](https://noahgift.github.io/cloud-data-analysis-at-scale/syllabus).
* These are the [projects in the course](https://noahgift.github.io/cloud-data-analysis-at-scale/projects)
* This is the [week-by-week calendar](https://noahgift.github.io/cloud-data-analysis-at-scale/calendar-2022)
* This is the [rubric for grading assignments](https://noahgift.github.io/cloud-data-analysis-at-scale/rubric)
* This is the [grading for the course](https://noahgift.github.io/cloud-data-analysis-at-scale/grading)
* This is the [FAQ](https://noahgift.github.io/cloud-data-analysis-at-scale/faq)
* A complete [online book with screencast videos is available here](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter01-getting-started/).
* [Coursera Course, Building Cloud Computing Solutions at Scale Specialization, can be found here: https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale](https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale)
## Guest Lecture 2022-Async
*GPT 3*:
* Book: https://learning.oreilly.com/library/view/gpt-3/9781098113612/
* Interview: https://learning.oreilly.com/videos/52-weeks-of/021822022VIDEOPAIML/
* Shubham Saboo
* Sandra Kublik
## Prequel Material
These resources could be helpful before starting this course.
### Duke/Coursera: Foundations of Data Engineering Course (Launching early 2022)
#### Course1: Python and Pandas for Data Engineering
#### Course2: Linux and Bash for Data Engineering
##### GitHub Repos for Projects in Course
###### Week1: Using Linux
* [Lesson 1: Using Linux Shell Lab](https://github.com/noahgift/Coursera-DE-C2-Using-Linux)
* [Lesson 2: How shell piping works](https://github.com/noahgift/Coursera-DE-C2-Shell-Piping)
* [Lesson 3: Using SSH](https://github.com/noahgift/ssh-tips-tricks)
###### Week2: Using Bash
* [Lesson 1: Create and Use .bashrc](https://github.com/noahgift/Coursera-DE-C2-configure-shell)
* [Lesson 2: Sourcing shell variables from a script](https://github.com/noahgift/Coursera-DE-C2-shell-variables)
* [Lesson 3: Using stdout and stdin](https://github.com/noahgift/Coursera-DE-C2-Standard-Streams)
###### Week3: Building Bash Scripts
* [Lesson 1: Build a for loop in Bash](https://github.com/noahgift/Coursera-DE-C2-Use-Shell-Logic-and-Control-Flow)
* [Lesson 2: Truncate large files with Bash](https://github.com/noahgift/coursera-de-c2-truncate-file)
* [Lesson 3: Building a command-line tool for data processing](https://github.com/noahgift/Coursera-DE-C2-bash-cli-reverse-string)
* [Lesson 4: Build Bash CLI with options](https://github.com/noahgift/Coursera-DE-C2-Lab3-Building-Bash-Scripts.git)
###### Week4: Composing File and Data Management Solutions with Linux
* [Lesson 1: Understand the search commands](https://github.com/noahgift/Coursera-DE-C2-search-commands)
* [Lesson 2: Setting permissions](https://github.com/noahgift/Coursera-DE-C2-Files-Directories-Permissions)
* [Lesson 3: Using regex to process text from file](https://github.com/noahgift/Coursera-DE-C2-using-regex-search)
* [Lesson 4: Search the filesystem with find](https://github.com/noahgift/Coursera-DE-C2-Lab4-Composing-File-Data-Solutions)
#### Course3: Python and SQL for Data Engineering
#### Course4: Building Data Engineering Solutions with Python for Web Applications, Command-Line Tools and Notebooks
## Sequel Material
These resources could be helpful after starting this course.
### Duke/Coursera: Applied Data Engineering Course (Launching late 2022)
## GitHub Repos Referenced in the Duke Coursera Course
### Course 1: Cloud Computing Foundations
* [Practice Markdown](https://github.com/noahgift/duke-coursera-ccf-lab1/blob/main/Practice-Markdown.ipynb)
* [Github Actions-Pytest](https://github.com/noahgift/github-actions-pytest)
* [Google App Engine Continuous Delivery](https://github.com/noahgift/gcp-flask-ml-deploy)
* [Hello World Flask](https://github.com/noahgift/flask-hello-coursera)
* [Hugo Continuous Delivery on AWS](https://github.com/noahgift/dukehugofeb1)
### Course 2: Cloud Computing Building Blocks
* [Lint Dockerfile](https://github.com/noahgift/duke-coursera-ccb-lab1)
* [Flask Change Microservice]
## Lecture Topics:
### Getting Started: [Week1]
* [Getting Started](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter01-getting-started/)
### Cloud Computing Foundations: [Week2]
* [Cloud Computing Foundations](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter02-cloud-foundations/)
### Virtualization and Containers: [Week3 & Week 4]
* [Containers, Virtualization and Elasticity](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter03-virtualization-containers-elasticity/)
### Challenges and Opportunities in Distributed Computing: [Week 5 & Week 6]
* [Distributed Computing](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter04-distributed-computing/)
### Cloud Storage [Week 7 & Week 8]
* [Cloud Storage](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter05-cloud-storage/)
### Serverless [Week 9 & Week 10]
* [Serverless](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter06-serverless-etl/)
### MLOps, Big Data and Edge Computer Vision [Week 11 & Week 12 & Week 13]
* [Managed ML Systems](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter07-managed-ml/)
* [Edge Computer Vision Notebooks and Code](https://github.com/noahgift/edge-computer-vision)
* [HuggingFace](https://learning.oreilly.com/videos/applied-hugging-face/10212022VIDEOPAIML/)
* [OpenAI](https://learning.oreilly.com/videos/assimilate-openai/08252022VIDEOPAIML/)
### General
* [Key Terms](https://noahgift.github.io/cloud-data-analysis-at-scale/topics/key-terms)
* [(Q&A) Question Answer](https://noahgift.github.io/cloud-data-analysis-at-scale/topics/Question-Answer)
### Student Example Projects
* [434 Analytics Application Development by Steve Depp](http://www.stevedepp.com/learn/school/msds/de/434.html)
* [462 Computer Vision by Steve Depp](http://www.stevedepp.com/learn/school/msds/ai/462.html)
#### *A practical guide to Data Science, Machine Learning Engineering and Data Engineering*
[Read Cloud Computing for Data Book](https://paiml.com/docs/home/books/cloud-computing-for-data/)
![cloud4data books](https://d2sofvawe08yqg.cloudfront.net/cloud4data/hero2x?1578933644)
[Free book Developing-on-AWS-with-CSharp](https://d1.awsstatic.com/developer-center/Developing-on-AWS-with-CSharp.pdf)
![Screenshot 2022-10-28 at 7 12 09 AM](https://user-images.githubusercontent.com/58792/198574661-c631cffa-4fca-4b7e-836f-a82bef7d77f6.png)
#### Next Steps: Take Coursera MLOps Course
![cloud-specialization](https://user-images.githubusercontent.com/58792/121041040-650ca180-c780-11eb-956e-8d1ecb134641.png)
* [Take the Specialization](https://www.coursera.org/learn/cloud-computing-foundations-duke?specialization=building-cloud-computing-solutions-at-scale)
* [Cloud Computing Foundations](https://www.coursera.org/learn/cloud-computing-foundations-duke?specialization=building-cloud-computing-solutions-at-scale)
* [Cloud Virtualization, Containers and APIs](https://www.coursera.org/learn/cloud-virtualization-containers-api-duke?specialization=building-cloud-computing-solutions-at-scale)
* [Cloud Data Engineering](https://www.coursera.org/learn/cloud-data-engineering-duke?specialization=building-cloud-computing-solutions-at-scale)
* [Cloud Machine Learning Engineering and MLOps](https://www.coursera.org/learn/cloud-machine-learning-engineering-mlops-duke?specialization=building-cloud-computing-solutions-at-scale)
### Text and Code License
The text and code content of the notebooks and documents is released under the [CC-BY-NC-ND license](https://github.com/noahgift/cloud-data-analysis-at-scale/blob/master/license.md).