https://github.com/kamanhang/sqldatawarehousedataengineeringproject
This project delivers a modern data warehouse that focuses on building a clean, organized data pipeline for advanced data analytics. I have covered important aspects such as ETL Development, Data Cleaning, Data Modelling and Data Analytics.
- Host: GitHub
- URL: https://github.com/kamanhang/sqldatawarehousedataengineeringproject
- Owner: KamanHang
- Created: 2025-01-20T17:51:41.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-02-04T16:45:26.000Z (4 months ago)
- Last Synced: 2025-02-04T17:45:07.342Z (4 months ago)
- Language: PLpgSQL
- Homepage:
- Size: 2.76 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### ❗Things to Consider❗
The provided dataset was anonymous, so I gave the company a fictional name: "Bike Haven Collective", a company that sells bikes, related accessories and clothing.
# SQL Data Warehouse and Data Analytics Project
This project delivers a modern data warehouse that focuses on building a clean, organized data pipeline and covers important aspects such as ETL Pipeline Development, Data Cleaning, Data Modelling and Data Analytics.
## Project Division
This project is divided into two sections:
- Data Engineering
- Data Analytics and Reporting
## Data Engineering 👷🏻‍♂️
In this section of the project I have performed the following tasks:
_(I have performed all of these tasks using PL/pgSQL in PostgreSQL)_
- Implemented the Medallion Architecture to build a data pipeline with a higher-quality data flow.
- Developed an ETL (Extract, Transform, Load) pipeline.
- Ingested raw data from CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning) data sources.
- Performed:
  - Data Cleansing (removing duplicates, handling unwanted spaces and missing or invalid data, data type casting and filtering)
  - Data Standardization
  - Data Normalization
  - Data Enrichment
  - Data Integration for Qualitative Data
- Performed Data Modeling by creating Fact and Dimension tables for high-quality data analysis in the Gold Layer.
# ⛩️ Data Architecture
One of the important things I was exposed to during this project is the Medallion Architecture.
The Medallion Architecture consists of three layers, which helped me design and build a modular and robust data warehouse.
- ### **Bronze Layer:**
- In this layer, I have ingested raw data from CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning) CSV files into PostgreSQL.
- ### **Silver Layer:**
- In this layer, I have performed data cleansing (handling null values and empty spaces), standardization, normalization, enrichment, and the creation of derived columns.
- ### **Gold Layer:**
- In this layer, I have created the **Data Model: Star Schema**, in which I have created Fact and Dimension tables for advanced data analytics.
# Data Lineage (Data Flow)
*Note: Final Updated Data Lineage*
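
The flow from raw data to cleaned data can be sketched in a few lines. This is only an illustration under loud assumptions: it uses Python's built-in `sqlite3` instead of PostgreSQL so that it runs without a server, and the table names (`bronze_crm_cust`, `silver_crm_cust`), the column names and the sample rows are all hypothetical, not taken from the project's scripts.

```python
import csv
import io
import sqlite3

# Hypothetical raw CRM extract (the project itself loads CSV files into PostgreSQL).
RAW_CSV = """cst_id,cst_firstname,cst_gndr,cst_create_date
1,  Anna ,F,2025-01-01
1,Anna,F,2025-01-05
2,Ben,m,2025-01-02
,Ghost,M,2025-01-03
3, Cara,,2025-01-04
"""


def build_silver(conn: sqlite3.Connection) -> list[tuple]:
    # Bronze: land the data exactly as received, every column stored as text.
    conn.execute(
        "CREATE TABLE bronze_crm_cust "
        "(cst_id TEXT, cst_firstname TEXT, cst_gndr TEXT, cst_create_date TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
    conn.executemany(
        "INSERT INTO bronze_crm_cust "
        "VALUES (:cst_id, :cst_firstname, :cst_gndr, :cst_create_date)",
        rows,
    )

    # Silver: drop rows without a key, keep the latest record per customer,
    # trim spaces, cast the key to an integer and standardize gender codes.
    conn.execute(
        """
        CREATE TABLE silver_crm_cust AS
        SELECT CAST(cst_id AS INTEGER) AS cst_id,
               TRIM(cst_firstname)     AS cst_firstname,
               CASE UPPER(TRIM(cst_gndr))
                    WHEN 'F' THEN 'Female'
                    WHEN 'M' THEN 'Male'
                    ELSE 'n/a'
               END                     AS cst_gndr,
               cst_create_date
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY cst_id
                       ORDER BY cst_create_date DESC
                   ) AS rn
            FROM bronze_crm_cust
            WHERE TRIM(cst_id) <> ''
        )
        WHERE rn = 1
        """
    )
    return conn.execute(
        "SELECT cst_id, cst_firstname, cst_gndr, cst_create_date "
        "FROM silver_crm_cust ORDER BY cst_id"
    ).fetchall()


if __name__ == "__main__":
    for row in build_silver(sqlite3.connect(":memory:")):
        print(row)
```

The same `ROW_NUMBER()`, `TRIM` and `CASE` constructs are available in PostgreSQL, so the silver query should carry over with little change, although the actual project scripts may structure these steps differently.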
# Data Modeling (Star Schema)
- **STAR SCHEMA**
A star schema is a multi-dimensional data model that organizes data so that analytical tasks are easier and non-technical people can understand the data and draw insights from it.
- ### Dimension Tables
- dim_customers
- dim_products
- ### Fact Table
- fact_sales
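
To make the shape of the star schema concrete, here is a minimal sketch of how the three tables named above could relate to each other. Only the names `dim_customers`, `dim_products` and `fact_sales` come from this README; the silver-layer tables, every column, the surrogate-key approach and the sample rows are assumptions for illustration, and SQLite (via Python's `sqlite3`) stands in for PostgreSQL so the example is self-contained.

```python
import sqlite3


def build_gold(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        -- Hypothetical silver-layer inputs with made-up sample rows.
        CREATE TABLE silver_customers (cst_id INTEGER, cst_name TEXT, country TEXT);
        CREATE TABLE silver_products  (prd_id INTEGER, prd_name TEXT, category TEXT);
        CREATE TABLE silver_sales     (order_no TEXT, cst_id INTEGER, prd_id INTEGER,
                                       quantity INTEGER, price REAL);

        INSERT INTO silver_customers VALUES (11, 'Anna', 'Germany'), (12, 'Ben', 'France');
        INSERT INTO silver_products  VALUES (501, 'Road Bike', 'Bikes'),
                                            (502, 'Helmet', 'Accessories');
        INSERT INTO silver_sales     VALUES ('SO1', 11, 501, 1, 1200.0),
                                            ('SO2', 12, 502, 2, 40.0),
                                            ('SO3', 11, 502, 1, 40.0);

        -- Dimensions: descriptive attributes plus a generated surrogate key.
        CREATE VIEW dim_customers AS
        SELECT ROW_NUMBER() OVER (ORDER BY cst_id) AS customer_key,
               cst_id   AS customer_id,
               cst_name AS customer_name,
               country
        FROM silver_customers;

        CREATE VIEW dim_products AS
        SELECT ROW_NUMBER() OVER (ORDER BY prd_id) AS product_key,
               prd_id   AS product_id,
               prd_name AS product_name,
               category
        FROM silver_products;

        -- Fact: one row per sale, pointing at the dimensions via surrogate keys.
        CREATE VIEW fact_sales AS
        SELECT s.order_no           AS order_number,
               c.customer_key,
               p.product_key,
               s.quantity,
               s.quantity * s.price AS sales_amount
        FROM silver_sales s
        JOIN dim_customers c ON c.customer_id = s.cst_id
        JOIN dim_products  p ON p.product_id  = s.prd_id;
        """
    )


def sales_by_category(conn: sqlite3.Connection) -> list[tuple]:
    # A typical star-schema query: aggregate the fact, group by a dimension attribute.
    return conn.execute(
        """
        SELECT p.category, SUM(f.sales_amount) AS total_sales
        FROM fact_sales f
        JOIN dim_products p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_gold(conn)
    print(sales_by_category(conn))
```

Because the fact table stores only keys and measures, questions such as "sales per category" become a single join and `GROUP BY`, which is what makes this model convenient for reporting tools.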
### _For more details, check the [Data Catalog](https://github.com/KamanHang/sqldatawarehousedataengineeringproject/blob/main/ProjectScripts/data_catlog.md) of the Gold Layer_
## Data Analytics and Reporting 📊
- I have analyzed the sales data from several angles and created an interactive Power BI dashboard.
