{"id":16508050,"url":"https://github.com/noahgift/cloud-data-analysis-at-scale","last_synced_at":"2026-02-18T19:32:43.954Z","repository":{"id":45894877,"uuid":"224536190","full_name":"noahgift/cloud-data-analysis-at-scale","owner":"noahgift","description":"[Course-2020-2023] taught at Duke MIDS.  This is also a Coursera Course that covers MLOps, ML Engineering and the foundations of Cloud Computing for Data Science.","archived":false,"fork":false,"pushed_at":"2025-01-10T22:52:36.000Z","size":8638,"stargazers_count":137,"open_issues_count":5,"forks_count":89,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-05-24T13:09:40.086Z","etag":null,"topics":["analytics","cloud","data","duke","github","hugging","huggingface","machine-learning","mids","syllabus"],"latest_commit_sha":null,"homepage":"https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/noahgift.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"license.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-27T23:51:40.000Z","updated_at":"2025-05-04T21:22:09.000Z","dependencies_parsed_at":"2025-01-18T18:12:24.675Z","dependency_job_id":"7cceec34-3686-41ba-97b5-418b9cd4932e","html_url":"https://github.com/noahgift/cloud-data-analysis-at-scale","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/noahgift/cloud-data-analysis-at-scale","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/noahgift%2Fcloud-data-analysis
-at-scale","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/noahgift%2Fcloud-data-analysis-at-scale/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/noahgift%2Fcloud-data-analysis-at-scale/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/noahgift%2Fcloud-data-analysis-at-scale/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/noahgift","download_url":"https://codeload.github.com/noahgift/cloud-data-analysis-at-scale/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/noahgift%2Fcloud-data-analysis-at-scale/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29591958,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T18:54:29.675Z","status":"ssl_error","status_checked_at":"2026-02-18T18:50:50.517Z","response_time":162,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","cloud","data","duke","github","hugging","huggingface","machine-learning","mids","syllabus"],"created_at":"2024-10-11T15:32:31.910Z","updated_at":"2026-02-18T19:32:43.917Z","avatar_url":"https://github.com/noahgift.png","language":"Jupyter Notebook","readme":"# Data Analysis at Scale in the Cloud\n\nCourse taught at [Duke MIDS](https://datascience.duke.edu/noah-gift), Spring 2020-2022 by [Noah 
Gift](https://www.noahgift.com/).  \n* This is the [course syllabus](https://noahgift.github.io/cloud-data-analysis-at-scale/syllabus).\n* These are the [projects in the course](https://noahgift.github.io/cloud-data-analysis-at-scale/projects)\n* This is the [week-by-week calendar](https://noahgift.github.io/cloud-data-analysis-at-scale/calendar-2022)\n* This is the [rubric for grading assignments](https://noahgift.github.io/cloud-data-analysis-at-scale/rubric)\n* This is the [grading for the course](https://noahgift.github.io/cloud-data-analysis-at-scale/grading)\n* This is the [FAQ](https://noahgift.github.io/cloud-data-analysis-at-scale/faq)\n* A complete [online book with screencast videos is available here](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter01-getting-started/).\n* [Coursera Course, Building Cloud Computing Solutions at Scale Specialization, can be found here: https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale](https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale)\n\n## 🎓 Pragmatic AI Labs | Join 1M+ ML Engineers\n\n### 🔥 Hot Course Offers:\n* 🤖 [Master GenAI Engineering](https://ds500.paiml.com/learn/course/0bbb5/) - Build Production AI Systems\n* 🦀 [Learn Professional Rust](https://ds500.paiml.com/learn/course/g6u1k/) - Industry-Grade Development\n* 📊 [AWS AI \u0026 Analytics](https://ds500.paiml.com/learn/course/31si1/) - Scale Your ML in Cloud\n* ⚡ [Production GenAI on AWS](https://ds500.paiml.com/learn/course/ehks1/) - Deploy at Enterprise Scale\n* 🛠️ [Rust DevOps Mastery](https://ds500.paiml.com/learn/course/ex8eu/) - Automate Everything\n\n### 🚀 Level Up Your Career:\n* 💼 [Production ML Program](https://paiml.com) - Complete MLOps \u0026 Cloud Mastery\n* 🎯 [Start Learning Now](https://ds500.paiml.com) - Fast-Track Your ML Career\n* 🏢 Trusted by Fortune 500 Teams\n\nLearn end-to-end ML engineering from industry veterans at [PAIML.COM](https://paiml.com)\n\n## 
Prequel Material\n\nThese resources could be helpful before starting this course.\n\n### Duke/Coursera: Foundations of Data Engineering Course (Launching early 2022)\n\n#### Course 1: Python and Pandas for Data Engineering\n#### Course 2: Linux and Bash for Data Engineering\n\n##### GitHub Repos for Projects in Course\n\n###### Week 1: Using Linux\n\n  * [Lesson 1: Using Linux Shell Lab](https://github.com/noahgift/Coursera-DE-C2-Using-Linux)\n  * [Lesson 2: How shell piping works](https://github.com/noahgift/Coursera-DE-C2-Shell-Piping)\n  * [Lesson 3: Using SSH](https://github.com/noahgift/ssh-tips-tricks)\n\n###### Week 2: Using Bash\n\n  * [Lesson 1: Create and Use .bashrc](https://github.com/noahgift/Coursera-DE-C2-configure-shell)\n  * [Lesson 2: Sourcing shell variables from a script](https://github.com/noahgift/Coursera-DE-C2-shell-variables)\n  * [Lesson 3: Using stdout and stdin](https://github.com/noahgift/Coursera-DE-C2-Standard-Streams)\n\n###### Week 3: Building Bash Scripts\n\n * [Lesson 1: Build a for loop in Bash](https://github.com/noahgift/Coursera-DE-C2-Use-Shell-Logic-and-Control-Flow)\n * [Lesson 2: Truncate large files with Bash](https://github.com/noahgift/coursera-de-c2-truncate-file)\n * [Lesson 3: Building a command-line tool for data processing](https://github.com/noahgift/Coursera-DE-C2-bash-cli-reverse-string)\n * [Lesson 4: Build Bash CLI with options](https://github.com/noahgift/Coursera-DE-C2-Lab3-Building-Bash-Scripts.git)\n\n###### Week 4: Composing File and Data Management Solutions with Linux\n\n* [Lesson 1: Understand the search commands](https://github.com/noahgift/Coursera-DE-C2-search-commands)\n* [Lesson 2: Setting permissions](https://github.com/noahgift/Coursera-DE-C2-Files-Directories-Permissions)\n* [Lesson 3: Using regex to process text from file](https://github.com/noahgift/Coursera-DE-C2-using-regex-search)\n* [Lesson 4: Search the filesystem with 
find](https://github.com/noahgift/Coursera-DE-C2-Lab4-Composing-File-Data-Solutions)\n\n#### Course 3: Python and SQL for Data Engineering\n#### Course 4: Building Data Engineering Solutions with Python for Web Applications, Command-Line Tools and Notebooks\n\n## Sequel Material\n\nThese resources could be helpful after starting this course.\n\n### Duke/Coursera: Applied Data Engineering Course (Launching late 2022)\n\n\n## GitHub Repos Referenced in the Duke Coursera Course\n\n### Course 1: Cloud Computing Foundations\n\n* [Practice Markdown](https://github.com/noahgift/duke-coursera-ccf-lab1/blob/main/Practice-Markdown.ipynb)\n* [GitHub Actions-Pytest](https://github.com/noahgift/github-actions-pytest)\n* [Google App Engine Continuous Delivery](https://github.com/noahgift/gcp-flask-ml-deploy)\n* [Hello World Flask](https://github.com/noahgift/flask-hello-coursera)\n* [Hugo Continuous Delivery on AWS](https://github.com/noahgift/dukehugofeb1)\n\n### Course 2: Cloud Computing Building Blocks\n\n* [Lint Dockerfile](https://github.com/noahgift/duke-coursera-ccb-lab1)\n* [Flask Change Microservice]\n\n\n## Lecture Topics:\n\n### Getting Started: [Week 1]\n\n* [Getting Started](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter01-getting-started/)\n\n### Cloud Computing Foundations: [Week 2]\n\n* [Cloud Computing Foundations](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter02-cloud-foundations/)\n\n### Virtualization and Containers: [Week 3 \u0026 Week 4]\n\n* [Containers, Virtualization and Elasticity](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter03-virtualization-containers-elasticity/)\n\n### Challenges and Opportunities in Distributed Computing: [Week 5 \u0026 Week 6]\n\n* [Distributed Computing](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter04-distributed-computing/)\n\n### Cloud Storage [Week 7 \u0026 Week 8]\n\n* [Cloud 
Storage](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter05-cloud-storage/)\n\n### Serverless [Week 9 \u0026 Week 10]\n\n* [Serverless](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter06-serverless-etl/)\n\n### MLOps, Big Data and Edge Computer Vision [Week 11 \u0026 Week 12 \u0026 Week 13]\n\n* [Managed ML Systems](https://paiml.com/docs/home/books/cloud-computing-for-data/chapter07-managed-ml/)\n* [Edge Computer Vision Notebooks and Code](https://github.com/noahgift/edge-computer-vision)\n* [HuggingFace](https://learning.oreilly.com/videos/applied-hugging-face/10212022VIDEOPAIML/)\n* [OpenAI](https://learning.oreilly.com/videos/assimilate-openai/08252022VIDEOPAIML/)\n\n### General\n\n* [Key Terms](https://noahgift.github.io/cloud-data-analysis-at-scale/topics/key-terms)\n* [(Q\u0026A) Question Answer](https://noahgift.github.io/cloud-data-analysis-at-scale/topics/Question-Answer)\n\n### Student Example Projects\n\n* [434 Analytics Application Development by Steve Depp](http://www.stevedepp.com/learn/school/msds/de/434.html)\n* [462 Computer Vision by Steve Depp](http://www.stevedepp.com/learn/school/msds/ai/462.html)\n\n#### *A practical guide to Data Science, Machine Learning Engineering and Data Engineering*\n\n[Read Cloud Computing for Data Book](https://paiml.com/docs/home/books/cloud-computing-for-data/)\n![cloud4data books](https://d2sofvawe08yqg.cloudfront.net/cloud4data/hero2x?1578933644)\n\n[Free book Developing-on-AWS-with-CSharp](https://d1.awsstatic.com/developer-center/Developing-on-AWS-with-CSharp.pdf)\n![Screenshot 2022-10-28 at 7 12 09 AM](https://user-images.githubusercontent.com/58792/198574661-c631cffa-4fca-4b7e-836f-a82bef7d77f6.png)\n\n#### Next Steps:  Take Coursera MLOps Course\n\n![cloud-specialization](https://user-images.githubusercontent.com/58792/121041040-650ca180-c780-11eb-956e-8d1ecb134641.png)\n\n* [Take the 
Specialization](https://www.coursera.org/learn/cloud-computing-foundations-duke?specialization=building-cloud-computing-solutions-at-scale)\n* [Cloud Computing Foundations](https://www.coursera.org/learn/cloud-computing-foundations-duke?specialization=building-cloud-computing-solutions-at-scale)\n* [Cloud Virtualization, Containers and APIs](https://www.coursera.org/learn/cloud-virtualization-containers-api-duke?specialization=building-cloud-computing-solutions-at-scale)\n* [Cloud Data Engineering](https://www.coursera.org/learn/cloud-data-engineering-duke?specialization=building-cloud-computing-solutions-at-scale)\n* [Cloud Machine Learning Engineering and MLOps](https://www.coursera.org/learn/cloud-machine-learning-engineering-mlops-duke?specialization=building-cloud-computing-solutions-at-scale)\n\n\n### Text and Code License\nThe text and code content of notebooks and documents is released under the [CC-BY-NC-ND license](https://github.com/noahgift/cloud-data-analysis-at-scale/blob/master/license.md)\n\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnoahgift%2Fcloud-data-analysis-at-scale","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnoahgift%2Fcloud-data-analysis-at-scale","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnoahgift%2Fcloud-data-analysis-at-scale/lists"}