https://github.com/davidag/data-engineering-zoomcamp
My Data Engineering Zoomcamp 2023 notes and homework
- Host: GitHub
- URL: https://github.com/davidag/data-engineering-zoomcamp
- Owner: davidag
- Created: 2023-01-21T11:39:12.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-10T11:56:40.000Z (about 1 year ago)
- Last Synced: 2023-10-10T12:43:12.954Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 158 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Data Engineering Zoomcamp 2023
## Week 1: Introduction and Prerequisites
- How to use Docker and Docker Compose
- Python scripting and Jupyter notebooks
- pgcli and pgAdmin for interacting with PostgreSQL
- SQL refresher
- Google Cloud and Terraform

[Homework solution](week_1_basics_n_setup)
## Week 2: Workflow Orchestration
- Data Lakes and Data Warehouses
- Workflow orchestration with Prefect
- Google Cloud Storage and BigQuery
- Using a Docker registry to run Prefect flows in Docker containers

[Homework solution](week_2_workflow_orchestration)
## Week 3: Data Warehouse and BigQuery
- OLAP vs OLTP
- Data warehouses
- BigQuery, including partitioning and clustering
- BigQuery Machine Learning

[Homework solution](week_3_data_warehouse)
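As a rough sketch of the OLTP vs OLAP distinction, the snippet below uses Python's built-in `sqlite3` module for both access patterns: OLTP means many small transactional writes and lookups by key, while OLAP means scanning and aggregating many rows. Using one SQLite database for both is an assumption for illustration only (in the course, BigQuery is the analytical store), and the `trips` table and its columns are hypothetical.

```python
import sqlite3

# In-memory database standing in for both an OLTP and an OLAP store (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    # Hypothetical schema, not taken from the course datasets.
    "CREATE TABLE trips (id INTEGER PRIMARY KEY, zone TEXT, fare REAL)"
)

# OLTP-style work: small transactional writes and lookups by primary key.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO trips (zone, fare) VALUES (?, ?)", ("Manhattan", 12.5))
    conn.execute("INSERT INTO trips (zone, fare) VALUES (?, ?)", ("Queens", 20.0))
    conn.execute("INSERT INTO trips (zone, fare) VALUES (?, ?)", ("Manhattan", 7.5))
    conn.execute("UPDATE trips SET fare = ? WHERE id = ?", (8.0, 3))

one_trip = conn.execute("SELECT zone, fare FROM trips WHERE id = ?", (2,)).fetchone()

# OLAP-style work: scan the whole table and aggregate across rows.
totals = conn.execute(
    "SELECT zone, COUNT(*), SUM(fare) FROM trips GROUP BY zone ORDER BY zone"
).fetchall()

print(one_trip)  # ('Queens', 20.0)
print(totals)    # [('Manhattan', 2, 20.5), ('Queens', 1, 20.0)]
```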
## Week 4: Analytics Engineering
- Analytics Engineering
- Data build tool (dbt) and BigQuery
- dbt models
- Data visualization

[Homework solution](week_4_analytics_engineering)
## Week 5: Batch Processing
- Data processing: batch vs streaming
- Apache Spark
- DataFrames: Actions and Transformations
- Spark SQL: Join and GroupBy
- Resilient Distributed Datasets
- Google Cloud Dataproc

[Homework solution](week_5_batch_processing)
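Spark transformations (such as `filter` or `select`) are lazy and only build an execution plan, while actions (such as `count` or `collect`) trigger the computation. The snippet below is only an analogy in plain Python, assuming generators as a stand-in for a DataFrame; it does not use the Spark API, and the `scan` helper and `trips` rows are hypothetical.

```python
# Analogy only: lazy Python generators standing in for Spark transformations/actions.
read_log = []  # records each row id at the moment it is actually read


def scan(rows):
    """Hypothetical data source that logs every row it yields."""
    for row in rows:
        read_log.append(row["id"])
        yield row


trips = [
    {"id": 1, "zone": "Manhattan", "fare": 12.5},
    {"id": 2, "zone": "Queens", "fare": 20.0},
    {"id": 3, "zone": "Manhattan", "fare": 7.5},
]

# "Transformations": compose a pipeline (filter, then project). Nothing is read yet,
# similar to how df.filter(...).select(...) in Spark only builds a plan.
expensive = (row for row in scan(trips) if row["fare"] > 10)
fares = (row["fare"] for row in expensive)
print(read_log)  # []

# "Action": consuming the pipeline triggers the work, similar to count() or collect().
total = sum(fares)
print(read_log)  # [1, 2, 3]
print(total)     # 32.5
```

One difference worth keeping in mind: a generator can be consumed only once, whereas Spark re-executes the plan for every action unless the DataFrame is cached.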
## Week 6: Stream Processing
- [Stream processing](https://en.wikipedia.org/wiki/Stream_processing)
- [Apache Kafka](https://kafka.apache.org/)
- [Kafka Connect](https://kafka.apache.org/documentation/#connect)
- [Kafka Streams](https://kafka.apache.org/documentation/streams/)
- [Confluent Schema Registry](https://github.com/confluentinc/schema-registry)
- [ksqlDB](https://ksqldb.io/)

[Homework solution](week_6_stream_processing)
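Kafka Streams and ksqlDB both support windowed aggregations, and the simplest kind is the tumbling window: fixed-size, non-overlapping time intervals. The function below is a minimal sketch of counting events per key per tumbling window in plain Python; the `tumbling_window_counts` name and the `(timestamp, key)` event shape are assumptions, and it ignores out-of-order events, grace periods and the fault-tolerant state stores that Kafka Streams manages.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per key in fixed, non-overlapping windows (hypothetical helper).

    events: iterable of (timestamp_seconds, key) pairs, processed one at a time.
    window_size: window length in seconds.
    Returns a dict mapping (window_start, key) to the number of events.
    """
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - ts % window_size  # align the timestamp to its window start
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "Manhattan"), (30, "Queens"), (59, "Manhattan"), (60, "Manhattan"), (125, "Queens")]
result = tumbling_window_counts(events, 60)
print(result)
# {(0, 'Manhattan'): 2, (0, 'Queens'): 1, (60, 'Manhattan'): 1, (120, 'Queens'): 1}
```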