{"id":13404255,"url":"https://github.com/DataTalksClub/data-engineering-zoomcamp","last_synced_at":"2025-03-14T09:30:52.135Z","repository":{"id":36964010,"uuid":"419661684","full_name":"DataTalksClub/data-engineering-zoomcamp","owner":"DataTalksClub","description":"Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.","archived":false,"fork":false,"pushed_at":"2025-02-12T12:32:19.000Z","size":8364,"stargazers_count":28843,"open_issues_count":4,"forks_count":6116,"subscribers_count":492,"default_branch":"main","last_synced_at":"2025-02-15T11:37:03.172Z","etag":null,"topics":["data-engineering","dbt","docker","kafka","kestra","spark"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DataTalksClub.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-21T09:32:50.000Z","updated_at":"2025-02-15T06:41:47.000Z","dependencies_parsed_at":"2023-10-23T01:15:21.187Z","dependency_job_id":"1ac554f8-eb42-4eb7-ba1b-2165f50df757","html_url":"https://github.com/DataTalksClub/data-engineering-zoomcamp","commit_stats":{"total_commits":782,"total_committers":129,"mean_commits":6.062015503875969,"dds":0.7327365728900256,"last_synced_commit":"beb77c92b9a0982b718c588bdee207764c319857"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataTalksClub%2Fdata-engineering-zoomcamp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/DataTalksClub%2Fdata-engineering-zoomcamp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataTalksClub%2Fdata-engineering-zoomcamp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataTalksClub%2Fdata-engineering-zoomcamp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DataTalksClub","download_url":"https://codeload.github.com/DataTalksClub/data-engineering-zoomcamp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243553890,"owners_count":20309832,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-engineering","dbt","docker","kafka","kestra","spark"],"created_at":"2024-07-30T19:01:41.656Z","updated_at":"2025-03-14T09:30:52.119Z","avatar_url":"https://github.com/DataTalksClub.png","language":"Jupyter Notebook","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg width=\"100%\" src=\"images/architecture/arch_v4_workshops.jpg\" alt=\"Data Engineering Zoomcamp Overview\"\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003e\n    \u003cstrong\u003eData Engineering Zoomcamp: A Free 9-Week Course on Data Engineering Fundamentals\u003c/strong\u003e\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\nMaster the fundamentals of data engineering by building an end-to-end data pipeline from scratch. 
Gain hands-on experience with industry-standard tools and best practices.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://airtable.com/shr6oVXeQvSI5HuWD\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/875246/185755203-17945fd1-6b64-46f2-8377-1011dcb1a444.png\" height=\"50\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://datatalks.club/slack.html\"\u003eJoin Slack\u003c/a\u003e •\n\u003ca href=\"https://app.slack.com/client/T01ATQK62F8/C01FABYF2RG\"\u003e#course-data-engineering Channel\u003c/a\u003e •\n\u003ca href=\"https://t.me/dezoomcamp\"\u003eTelegram Announcements\u003c/a\u003e •\n\u003ca href=\"https://www.youtube.com/playlist?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb\"\u003eCourse Playlist\u003c/a\u003e •\n\u003ca href=\"https://docs.google.com/document/d/19bnYs80DwuUimHM65UV3sylsCn2j1vziPOwzBwQrebw/edit?usp=sharing\"\u003eFAQ\u003c/a\u003e\n\u003c/p\u003e\n\n## How to Enroll\n\n### 2025 Cohort\n- **Start Date**: January 13, 2025\n- **Register Here**: [Sign up](https://airtable.com/shr6oVXeQvSI5HuWD)\n- **Access Cohort Materials**: [2025 Cohort Folder](cohorts/2025/)\n\n### Self-Paced Learning\nAll course materials are freely available for independent study. Follow these steps:\n1. Watch the course videos.\n2. Join the [Slack community](https://datatalks.club/slack.html).\n3. 
Refer to the [FAQ document](https://docs.google.com/document/d/19bnYs80DwuUimHM65UV3sylsCn2j1vziPOwzBwQrebw/edit?usp=sharing) for guidance.\n\n## Syllabus Overview\nThe course consists of structured modules, hands-on workshops, and a final project to reinforce your learning.\n\n### **Prerequisites**\nTo get the most out of this course, you should have:\n- Basic coding experience\n- Familiarity with SQL\n- Experience with Python (helpful but not required)\n\nNo prior data engineering experience is necessary.\n\n### **Modules**\n\n#### [Module 1: Containerization and Infrastructure as Code](01-docker-terraform/)\n- Introduction to GCP\n- Docker and Docker Compose\n- Running PostgreSQL with Docker\n- Infrastructure setup with Terraform\n- Homework\n\n#### [Module 2: Workflow Orchestration](02-workflow-orchestration/)\n- Data Lakes and Workflow Orchestration\n- Workflow orchestration with Kestra\n- Homework\n\n#### [Workshop 1: Data Ingestion](cohorts/2025/workshops/dlt/README.md)\n- API reading and pipeline scalability\n- Data normalization and incremental loading\n- Homework\n\n#### [Module 3: Data Warehousing](03-data-warehouse/)\n- Introduction to BigQuery\n- Partitioning, clustering, and best practices\n- Machine learning in BigQuery\n\n#### [Module 4: Analytics Engineering](04-analytics-engineering/)\n- dbt (data build tool) with PostgreSQL \u0026 BigQuery\n- Testing, documentation, and deployment\n- Data visualization with Metabase\n\n#### [Module 5: Batch Processing](05-batch/)\n- Introduction to Apache Spark\n- DataFrames and SQL\n- Internals of GroupBy and Joins\n\n#### [Module 6: Streaming](06-streaming/)\n- Introduction to Kafka\n- Kafka Streams and KSQL\n- Schema management with Avro\n\n#### [Final Project](projects/)\n- Apply all concepts learned in a real-world scenario\n- Peer review and feedback process\n\n## Community \u0026 Support\n\n### **Getting Help on Slack**\nJoin the 
[`#course-data-engineering`](https://app.slack.com/client/T01ATQK62F8/C01FABYF2RG) channel on [DataTalks.Club Slack](https://datatalks.club/slack.html) for discussions, troubleshooting, and networking.\n\nTo keep discussions organized:\n- Follow [our guidelines](asking-questions.md) when posting questions.\n- Review the [community guidelines](https://datatalks.club/slack/guidelines.html).\n\n## Meet the Instructors\n- [Victoria Perez Mola](https://www.linkedin.com/in/victoriaperezmola/)\n- [Alexey Grigorev](https://linkedin.com/in/agrigorev)\n- [Michael Shoemaker](https://www.linkedin.com/in/michaelshoemaker1/)\n- [Zach Wilson](https://www.linkedin.com/in/eczachly)\n- [Will Russell](https://www.linkedin.com/in/wrussell1999/)\n- [Anna Geller](https://www.linkedin.com/in/anna-geller-12a86811a/)\n\nPast instructors:\n- [Ankush Khanna](https://linkedin.com/in/ankushkhanna2)\n- [Sejal Vaidya](https://www.linkedin.com/in/vaidyasejal/)\n- [Irem Erturk](https://www.linkedin.com/in/iremerturk/)\n- [Luis Oliveira](https://www.linkedin.com/in/lgsoliveira/)\n\n## Sponsors \u0026 Supporters\nA special thanks to our course sponsors for making this initiative possible!\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://kestra.io/\"\u003e\n    \u003cimg height=\"120\" src=\"images/kestra.svg\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://dlthub.com/\"\u003e\n    \u003cimg height=\"90\" src=\"images/dlthub.png\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\nInterested in supporting our community? 
Reach out to [alexey@datatalks.club](mailto:alexey@datatalks.club).\n\n## About DataTalks.Club\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"40%\" src=\"https://github.com/user-attachments/assets/1243a44a-84c8-458d-9439-aaf6f3a32d89\" alt=\"DataTalks.Club\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://datatalks.club/\"\u003eDataTalks.Club\u003c/a\u003e is a global online community of data enthusiasts. It's a place to discuss data, learn, share knowledge, ask and answer questions, and support each other.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://datatalks.club/\"\u003eWebsite\u003c/a\u003e •\n\u003ca href=\"https://datatalks.club/slack.html\"\u003eJoin Slack Community\u003c/a\u003e •\n\u003ca href=\"https://us19.campaign-archive.com/home/?u=0d7822ab98152f5afc118c176\u0026id=97178021aa\"\u003eNewsletter\u003c/a\u003e •\n\u003ca href=\"http://lu.ma/dtc-events\"\u003eUpcoming Events\u003c/a\u003e •\n\u003ca href=\"https://calendar.google.com/calendar/?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ\"\u003eGoogle Calendar\u003c/a\u003e •\n\u003ca href=\"https://www.youtube.com/@DataTalksClub/featured\"\u003eYouTube\u003c/a\u003e •\n\u003ca href=\"https://github.com/DataTalksClub\"\u003eGitHub\u003c/a\u003e •\n\u003ca href=\"https://www.linkedin.com/company/datatalks-club/\"\u003eLinkedIn\u003c/a\u003e •\n\u003ca href=\"https://twitter.com/DataTalksClub\"\u003eTwitter\u003c/a\u003e\n\u003c/p\u003e\n\nMost activity at DataTalks.Club happens on [Slack](https://datatalks.club/slack.html). We post updates there and discuss different aspects of data, career questions, and more.\n\nAt DataTalks.Club, we organize online events, community activities, and free courses. 
You can learn more about what we do at [DataTalksClub Community Navigation](https://www.notion.so/DataTalksClub-Community-Navigation-bf070ad27ba44bf6bbc9222082f0e5a8?pvs=21).\n\n","funding_links":[],"categories":["Jupyter Notebook","Learning Resources","Get Started","😸 List of Repos","其他__大数据","Data Engineering ##","Repos","Uncategorized","⚙️ Data Engineering"],"sub_categories":["Tutorials","网络服务_其他","Uncategorized","Resources"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDataTalksClub%2Fdata-engineering-zoomcamp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDataTalksClub%2Fdata-engineering-zoomcamp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDataTalksClub%2Fdata-engineering-zoomcamp/lists"}