https://github.com/iamraphson/de-zoom-camp-2024
- Host: GitHub
- URL: https://github.com/iamraphson/de-zoom-camp-2024
- Owner: iamraphson
- Created: 2024-01-17T04:46:05.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-01T23:42:41.000Z (about 2 years ago)
- Last Synced: 2025-06-07T15:04:19.523Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 169 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Data Engineering ZoomCamp 2024
This repo contains homework, notes, and final project(s) for the [Data Engineering Zoomcamp](https://github.com/DataTalksClub/data-engineering-zoomcamp) by [DataTalks.Club](https://datatalks.club/).
Each week I worked through a series of [videos](https://youtube.com/playlist?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb) and then completed the homework exercises.
## Tools
We used a range of tools:
* [Terraform](https://www.terraform.io): Infrastructure-as-Code (IaC)
* [Docker](https://www.docker.com): Containerization
* [SQL](https://www.postgresqltutorial.com): Data Analysis & Exploration
* [Mage](https://www.mage.ai/): Workflow orchestration. [Airflow](https://airflow.apache.org/) is an alternative.
* [dbt (data build tool)](https://www.getdbt.com/): Open-source command-line tool that enables data analysts and engineers to transform and model data in their data warehouses using SQL.
* [Metabase](https://www.metabase.com/): Open-source business intelligence (BI) and analytics tool for visualizing and analyzing data. [Google Looker Studio](https://lookerstudio.google.com/) is an alternative.
* [Google Dataproc](https://cloud.google.com/dataproc): Managed service for running Apache Hadoop, Apache Spark, Apache Hive, Apache Pig, and other big data processing frameworks. Similar to [Amazon EMR](https://aws.amazon.com/emr/) or [Azure HDInsight](https://azure.microsoft.com/en-ca/products/hdinsight).
* [Google Cloud Storage](https://cloud.google.com/storage): Google's object storage, used here as the data lake. Similar to [Amazon S3](https://aws.amazon.com/s3/) or [Azure Blob Storage](https://azure.microsoft.com/en-ca/products/storage/blobs/).
* [BigQuery](https://cloud.google.com/bigquery): Google's data warehouse. Similar to [Amazon Redshift](https://aws.amazon.com/redshift/) or [Azure Synapse Analytics](https://azure.microsoft.com/en-ca/products/synapse-analytics/).
* [Apache Spark](https://spark.apache.org/): Engine for executing data engineering, data science, and machine learning workloads on single-node machines or clusters.
* [PySpark](https://spark.apache.org/docs/latest/api/python/index.html): Python API for Apache Spark.