https://github.com/kamanhang/sqldatawarehousedataengineeringproject
This project delivers a modern data warehouse focused on building a clean, organized data pipeline, covering important aspects such as ETL Pipeline Development, Data Cleaning, Data Modelling, and Data Analytics.
- Host: GitHub
- URL: https://github.com/kamanhang/sqldatawarehousedataengineeringproject
- Owner: KamanHang
- Created: 2025-01-20T17:51:41.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-07T18:50:25.000Z (over 1 year ago)
- Last Synced: 2025-05-30T12:54:04.306Z (about 1 year ago)
- Topics: customer-analytics, data-analysis, data-cleaning, data-engineering, data-modeling, data-pipeline, data-visualization, datascience, etl-pipeline, postgresql, powerbi, powerbidashboard, sales-analysis, sql
- Language: PLpgSQL
- Size: 6.47 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### ❗Things to Consider❗
The provided dataset was anonymized, so I gave the company a fictional name, "Bike Haven Collective": a company that sells bikes, related accessories, and clothing.
# SQL Data Warehouse and Data Analytics Project
This project delivers a modern data warehouse focused on building a clean, organized data pipeline. It covers important aspects such as ETL Pipeline Development, Data Cleaning, Data Modelling, and Data Analytics.
## Project Division
This project is divided into two sections:
- Data Engineering
- Data Analytics and Reporting
## Data Engineering 👷🏻♂️
In this section of the project, I performed the following tasks:
_(I performed all of these tasks in PostgreSQL using PL/pgSQL.)_
- Implemented the Medallion Architecture to build a layered data pipeline in which data quality improves at each stage.
- Developed an ETL (Extract, Transform, Load) pipeline.
- Ingested raw data from CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning) data sources.
- Performed:
  - Data Cleansing (Removing Duplicates, Trimming Unwanted Spaces, Handling Missing and Invalid Data, Data Type Casting, and Filtering)
  - Data Standardization
  - Data Normalization
  - Data Enrichment
  - Data Integration for high-quality data
- Performed Data Modeling by creating Fact and Dimension tables in the Gold Layer for high-quality data analysis.
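The cleansing steps listed above can be combined into a single SQL transformation. The sketch below is a hypothetical illustration, not the project's actual scripts: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so it runs without a database server, and the `raw_cust` table and its columns are invented. It filters out rows with a missing key, removes duplicates with `ROW_NUMBER()` (keeping the newest record), trims spaces with `TRIM`, standardizes gender codes with `CASE`, and uses SQLite's `DATE()` where PostgreSQL would use a type cast such as `::DATE`.

```python
import sqlite3

# Hypothetical raw CRM rows: duplicate IDs, stray spaces, coded values, a NULL key.
RAW_ROWS = [
    (1, "  Jon ", "M", "2025-01-01"),
    (1, "Jon", "M", "2025-01-05"),       # newer duplicate of customer 1
    (2, "Ana  ", "f", "2025-01-02"),     # lowercase gender code, trailing spaces
    (None, "Ghost", "X", "2025-01-03"),  # invalid: missing primary key
    (3, " Lee", None, "2025-01-04"),     # missing gender
]


def clean_customers(conn: sqlite3.Connection) -> list:
    """Load the hypothetical raw rows, then build a cleaned copy of them."""
    conn.execute(
        "CREATE TABLE raw_cust "
        "(cst_id INTEGER, cst_firstname TEXT, cst_gndr TEXT, cst_create_date TEXT)"
    )
    conn.executemany("INSERT INTO raw_cust VALUES (?, ?, ?, ?)", RAW_ROWS)
    conn.execute("""
        CREATE TABLE clean_cust AS
        SELECT cst_id,
               TRIM(cst_firstname) AS cst_firstname,           -- unwanted spaces
               CASE UPPER(TRIM(cst_gndr))                      -- standardization
                    WHEN 'M' THEN 'Male'
                    WHEN 'F' THEN 'Female'
                    ELSE 'n/a'                                 -- missing/invalid codes
               END AS cst_gndr,
               DATE(cst_create_date) AS cst_create_date        -- stands in for ::DATE
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY cst_id ORDER BY cst_create_date DESC
                   ) AS rn                                     -- 1 = newest record
            FROM raw_cust
            WHERE cst_id IS NOT NULL                           -- filtering
        )
        WHERE rn = 1                                           -- drop duplicates
    """)
    conn.commit()
    return conn.execute("SELECT * FROM clean_cust ORDER BY cst_id").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for row in clean_customers(conn):
        print(row)
    conn.close()
```

Keeping the newest record per key is one common deduplication rule; other projects keep the first record or merge duplicates instead.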
# ⛩️ Data Architecture

One of the most important concepts I was exposed to during this project is the Medallion Architecture.
The Medallion Architecture consists of three layers, which helped me design and build a modular and robust data warehouse.
- ### **Bronze Layer:**
  - In this layer, I ingested raw data from the CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning) CSV files into PostgreSQL.
- ### **Silver Layer:**
  - In this layer, I performed data cleansing (handling null values and empty spaces), standardization, normalization, enrichment, and derived-column tasks.
- ### **Gold Layer:**
  - In this layer, I created the **Data Model: Star Schema**, consisting of Fact and Dimension tables for advanced data analytics.
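A minimal sketch of how data might move through the three layers, again using Python's `sqlite3` as a stand-in for PostgreSQL. SQLite has no `CREATE SCHEMA`, so the sketch attaches three in-memory databases named `bronze`, `silver`, and `gold` to get schema-qualified table names. The CSV content, the `crm_prd_info` and `dim_products` column layouts, and the rule of replacing a missing cost with 0 are assumptions for illustration only.

```python
import csv
import io
import sqlite3

# Hypothetical CRM product extract (stands in for a source CSV file).
CRM_PRODUCTS_CSV = """prd_id,prd_nm,prd_cost
1, Road Bike ,500
2,Helmet,
3,Jersey,40
"""


def build_layers() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite has no CREATE SCHEMA; attaching extra in-memory databases gives
    # schema-qualified names (bronze.x, silver.x, gold.x) like PostgreSQL schemas.
    for layer in ("bronze", "silver", "gold"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {layer}")

    # Bronze: load the source rows exactly as they arrive, all stored as text.
    conn.execute(
        "CREATE TABLE bronze.crm_prd_info (prd_id TEXT, prd_nm TEXT, prd_cost TEXT)"
    )
    rows = list(csv.reader(io.StringIO(CRM_PRODUCTS_CSV)))[1:]  # drop header row
    conn.executemany("INSERT INTO bronze.crm_prd_info VALUES (?, ?, ?)", rows)

    # Silver: cast types, trim spaces, replace a missing cost with 0 (assumed rule).
    conn.execute("""
        CREATE TABLE silver.crm_prd_info AS
        SELECT CAST(prd_id AS INTEGER) AS prd_id,
               TRIM(prd_nm) AS prd_nm,
               COALESCE(CAST(NULLIF(TRIM(prd_cost), '') AS INTEGER), 0) AS prd_cost
        FROM bronze.crm_prd_info
    """)

    # Gold: business-friendly column names plus a surrogate key for the star schema.
    conn.execute("""
        CREATE TABLE gold.dim_products AS
        SELECT ROW_NUMBER() OVER (ORDER BY prd_id) AS product_key,
               prd_id AS product_id,
               prd_nm AS product_name,
               prd_cost AS cost
        FROM silver.crm_prd_info
    """)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_layers()
    print(conn.execute("SELECT * FROM gold.dim_products").fetchall())
    conn.close()
```

Each layer reads only from the layer before it, so a bad transformation in Silver can be fixed and rerun without re-ingesting the Bronze data.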
# Data Lineage (Data Flow)
*Note: Final Updated Data Lineage*

# Data Modeling (Star Schema)
- **STAR SCHEMA**

  A star schema is a dimensional data model that organizes data into a central fact table linked to surrounding dimension tables. This layout makes analytical queries easier and helps non-technical users understand and get insights from the data.
- ### Dimension Tables
  - dim_customers
  - dim_products
- ### Fact Table
  - fact_sales
### _For more details, see the [Data Catalog](https://github.com/KamanHang/sqldatawarehousedataengineeringproject/blob/main/ProjectScripts/data_catlog.md) of the Gold Layer._
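To show how the star schema supports analysis, here is a hypothetical sketch using the three Gold-layer table names above. The columns and sample rows are invented for illustration (the real columns are documented in the data catalog), and Python's `sqlite3` again stands in for PostgreSQL. The query joins `fact_sales` to both dimensions through their surrogate keys and totals sales by country and product category.

```python
import sqlite3

# Hypothetical column layout; the project's real columns are in its data catalog.
DDL = """
CREATE TABLE dim_customers (
    customer_key INTEGER PRIMARY KEY,  -- surrogate key
    customer_id  INTEGER,              -- key from the source system
    country      TEXT
);
CREATE TABLE dim_products (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE fact_sales (
    order_number TEXT,
    customer_key INTEGER REFERENCES dim_customers(customer_key),
    product_key  INTEGER REFERENCES dim_products(product_key),
    order_date   TEXT,
    quantity     INTEGER,
    sales_amount INTEGER
);
"""


def sales_by_country_and_category(conn: sqlite3.Connection) -> list:
    # Typical star-schema query: join the fact table to its dimensions, then group.
    query = """
        SELECT c.country, p.category, SUM(f.sales_amount) AS total_sales
        FROM fact_sales AS f
        JOIN dim_customers AS c ON f.customer_key = c.customer_key
        JOIN dim_products  AS p ON f.product_key  = p.product_key
        GROUP BY c.country, p.category
        ORDER BY total_sales DESC
    """
    return conn.execute(query).fetchall()


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    # Invented sample rows for illustration only.
    conn.executemany("INSERT INTO dim_customers VALUES (?, ?, ?)",
                     [(1, 101, "Germany"), (2, 102, "France")])
    conn.executemany("INSERT INTO dim_products VALUES (?, ?, ?)",
                     [(1, "Road Bike", "Bikes"), (2, "Helmet", "Accessories")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)", [
        ("SO1", 1, 1, "2025-01-10", 1, 500),
        ("SO2", 1, 2, "2025-01-11", 2, 60),
        ("SO3", 2, 1, "2025-01-12", 1, 500),
        ("SO4", 1, 1, "2025-01-13", 1, 500),
    ])
    result = sales_by_country_and_category(conn)
    conn.close()
    return result


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because descriptive attributes live in the dimension tables, most analytical questions follow the same pattern: join `fact_sales` to the dimensions you need and group by their attributes.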

## Data Analytics and Reporting 📊
- I analyzed the sales data from several angles and created an interactive Power BI dashboard:
