https://github.com/caogiathinh/caogiathinh
- Host: GitHub
- URL: https://github.com/caogiathinh/caogiathinh
- Owner: caogiathinh
- Created: 2025-08-19T02:27:08.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-10-07T01:26:22.000Z (8 months ago)
- Last Synced: 2025-10-07T03:26:04.520Z (8 months ago)
- Topics: airflow, database, dataengineer, dbt, dsa, linux, python, spark, sql
- Homepage:
- Size: 45.9 KB
- Stars: 5
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Cao Gia Thịnh
### Data Engineer
Welcome to my GitHub profile!
I'm Cao Gia Thinh, a final-year Computer Science student with a deep focus on Data Engineering. I am passionate about designing and building scalable, high-performance data systems that transform raw data into valuable insights to support business decision-making.
---
## GitHub Stats
## Tech Stack & Core Competencies

---
## Key Projects
These are my flagship projects that showcase my skills and experience.
### 1. [urban-mobility-elt-pipeline](https://github.com/caogiathinh/urban_mobility_elt_pipeline)
*Built a complete data platform on Google Cloud to collect, process, and analyze retail data from various sources.*
- **Orchestration:** Used **Kestra** (deployed on Cloud Composer) to schedule and orchestrate data ingestion pipelines from Parquet files.
- **Data Lake & Warehouse:** Stored raw data in **Google Cloud Storage (GCS)**, then cleaned, transformed, and loaded it into **Google BigQuery** with **Apache Spark**.
- **Data Modeling:** Implemented a **Star Schema** within BigQuery to optimize for analytical queries.
- **Deployment:** Containerized the entire application and its dependencies using **Docker** to ensure consistency across environments.
**Technologies:** `GCP (BigQuery, GCS, Composer)`, `Kestra`, `Apache Spark`, `Docker`, `Python`, `SQL`, `dbt`, `Google Data Studio`.
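
To illustrate the star schema idea mentioned above, here is a minimal sketch, not the project's actual model. It uses Python's built-in `sqlite3` as a local stand-in for BigQuery, and the table names (`dim_zone`, `fact_trip`), columns, and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical raw records; the real pipeline's columns are not shown in this README.
RAW_TRIPS = [
    ("2025-01-01", "Downtown", 12.5),
    ("2025-01-01", "Airport", 40.0),
    ("2025-01-02", "Downtown", 9.0),
]


def build_star_schema(rows):
    """Split flat rows into one dimension table and one fact table."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for the warehouse
    conn.executescript("""
        CREATE TABLE dim_zone (
            zone_id   INTEGER PRIMARY KEY,
            zone_name TEXT UNIQUE
        );
        CREATE TABLE fact_trip (
            trip_date TEXT,
            zone_id   INTEGER REFERENCES dim_zone(zone_id),
            fare      REAL
        );
    """)
    for trip_date, zone_name, fare in rows:
        # Each descriptive value is stored once in the dimension table.
        conn.execute(
            "INSERT OR IGNORE INTO dim_zone (zone_name) VALUES (?)", (zone_name,)
        )
        zone_id = conn.execute(
            "SELECT zone_id FROM dim_zone WHERE zone_name = ?", (zone_name,)
        ).fetchone()[0]
        # The fact table keeps only the key and the numeric measure.
        conn.execute(
            "INSERT INTO fact_trip VALUES (?, ?, ?)", (trip_date, zone_id, fare)
        )
    return conn


def fare_by_zone(conn):
    """Typical analytical query: join facts to a dimension and aggregate."""
    return conn.execute("""
        SELECT d.zone_name, SUM(f.fare)
        FROM fact_trip AS f
        JOIN dim_zone AS d ON d.zone_id = f.zone_id
        GROUP BY d.zone_name
        ORDER BY d.zone_name
    """).fetchall()
```

Calling `fare_by_zone(build_star_schema(RAW_TRIPS))` returns one total per zone. Keeping descriptive attributes in small dimension tables and measures in a narrow fact table is what keeps such aggregations simple.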
---
### 2. [modern-data-warehouse](https://github.com/caogiathinh/modern-data-warehouse)
*Designed and implemented a modern data warehouse to empower Sales and Marketing teams with advanced analytics.*
- **ETL & Transformation:** Used SQL to extract, transform, and load data from source systems into the destination data warehouse.
- **Data Warehouse Design:** Architected a DWH schema on **Microsoft SQL Server**.
**Technologies:** `T-SQL`, `MS SQL Server`.
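
As a rough sketch of the SQL-driven extract, transform, and load step described above, the example below runs a single `INSERT ... SELECT` from a staging table into a warehouse table. It is an assumption-laden illustration. It uses Python's `sqlite3` and SQLite's SQL dialect rather than T-SQL on SQL Server, and the table names (`stg_sales`, `dwh_sales`) and columns are hypothetical.

```python
import sqlite3

# Transform and load in one statement: cast types, normalize text,
# and skip staging rows that have no amount.
TRANSFORM_AND_LOAD = """
    INSERT INTO dwh_sales (order_id, customer, amount)
    SELECT CAST(order_id AS INTEGER),
           UPPER(TRIM(customer)),
           CAST(amount AS REAL)
    FROM stg_sales
    WHERE TRIM(amount) <> ''
"""


def load_sales(staging_rows):
    """Load raw text rows into staging, then move cleaned rows to the warehouse table."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real database server
    conn.executescript("""
        CREATE TABLE stg_sales (order_id TEXT, customer TEXT, amount TEXT);
        CREATE TABLE dwh_sales (
            order_id INTEGER PRIMARY KEY,
            customer TEXT,
            amount   REAL
        );
    """)
    # Extract: source rows land in staging exactly as received.
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", staging_rows)
    # Transform + load: done entirely in SQL.
    conn.execute(TRANSFORM_AND_LOAD)
    return conn.execute(
        "SELECT order_id, customer, amount FROM dwh_sales ORDER BY order_id"
    ).fetchall()
```

For example, `load_sales([("1", " alice ", "10.50"), ("2", "Bob", "")])` keeps only the first row, with the customer name trimmed and upper-cased and the amount cast to a number.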
## Let's Connect!
I'm always open to discussing new opportunities, interesting projects, or anything related to data and technology. Feel free to reach out!