https://github.com/iamraphson/de-zoom-camp-2024



# Data Engineering ZoomCamp 2024

This repo contains homework, notes and final project(s) for the [Data Engineering Zoomcamp](https://github.com/DataTalksClub/data-engineering-zoomcamp) by [DataTalks.Club](https://datatalks.club/).

Each week I completed a series of [videos](https://youtube.com/playlist?list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb) and followed this up with homework exercises.

## Tools

We used a range of tools:
* [Terraform](https://www.terraform.io): Infrastructure-as-Code (IaC)
* [Docker](https://www.docker.com): Containerization
* [SQL](https://www.postgresqltutorial.com): Data Analysis & Exploration
* [Mage](https://www.mage.ai/): Workflow Orchestration. You can use [Airflow](https://airflow.apache.org/) too.
* [dbt (data build tool)](https://www.getdbt.com/): Open-source command-line tool that lets data analysts and engineers transform and model data in their data warehouses using SQL.
* [Metabase](https://www.metabase.com/): Open-source business intelligence (BI) and analytics tool for visualizing and analyzing data. You can use [Google Looker Studio](https://lookerstudio.google.com/) too.
* [Google Dataproc](https://cloud.google.com/dataproc): Service used to run Apache Hadoop, Apache Spark, Apache Hive, Apache Pig, and other big data processing frameworks. Similar to [Amazon EMR](https://aws.amazon.com/emr/) or [Azure HDInsight](https://azure.microsoft.com/en-ca/products/hdinsight).
* [Google Cloud Storage](https://cloud.google.com/storage): Google's object storage, used here as the data lake. Similar to [Amazon S3](https://aws.amazon.com/s3/) or [Azure Blob Storage](https://azure.microsoft.com/en-ca/products/storage/blobs/).
* [BigQuery](https://cloud.google.com/bigquery): Google's data warehouse. Similar to [Amazon Redshift](https://aws.amazon.com/redshift/) or [Azure Synapse Analytics](https://azure.microsoft.com/en-ca/products/synapse-analytics/).
* [Apache Spark](https://spark.apache.org/): Engine for executing data engineering, data science, and machine learning workloads on single-node machines or clusters.
* [PySpark](https://spark.apache.org/docs/latest/api/python/index.html): Python API for Apache Spark.
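
To give a flavour of the SQL analysis and exploration step listed above, here is a minimal sketch. It is not taken from the homework: it uses SQLite from Python's standard library as a stand-in for a real database or warehouse, and the `trips` table, its columns, the sample rows and the `summarize_trips` helper are all hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (pickup zone, trip distance in miles, fare in USD).
TRIPS = [
    ("Astoria", 2.0, 10.0),
    ("Astoria", 4.0, 18.0),
    ("Harlem", 1.0, 6.0),
]


def summarize_trips(rows):
    """Return (zone, trip_count, avg_fare) per zone, busiest zone first."""
    # In-memory SQLite database, used only as a stand-in for a warehouse.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE trips (zone TEXT, distance REAL, fare REAL)")
        conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
        return conn.execute(
            """
            SELECT zone, COUNT(*) AS trip_count, AVG(fare) AS avg_fare
            FROM trips
            GROUP BY zone
            ORDER BY trip_count DESC, zone
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for zone, trip_count, avg_fare in summarize_trips(TRIPS):
        print(f"{zone}: {trip_count} trips, average fare {avg_fare:.2f}")
```

The same `GROUP BY` aggregation pattern carries over to the other SQL engines in the list, though data types and function names can differ between them.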