{"id":25656289,"url":"https://github.com/kamanhang/sqldatawarehousedataengineeringproject","last_synced_at":"2025-10-10T12:05:25.718Z","repository":{"id":273458484,"uuid":"919608681","full_name":"KamanHang/sqldatawarehousedataengineeringproject","owner":"KamanHang","description":"This project delivers a modern data warehouse which focuses on building clean, organized data pipeline which covers important aspects such as ETL Pipeline Development, Data Cleaning, Data Modelling and Data Analytics","archived":false,"fork":false,"pushed_at":"2025-02-07T18:50:25.000Z","size":6786,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-30T12:54:04.306Z","etag":null,"topics":["customer-analytics","data-analysis","data-cleaning","data-engineering","data-modeling","data-pipeline","data-visualization","datascience","etl-pipeline","postgresql","powerbi","powerbidashboard","sales-analysis","sql"],"latest_commit_sha":null,"homepage":"","language":"PLpgSQL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KamanHang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-20T17:51:41.000Z","updated_at":"2025-02-12T15:39:52.000Z","dependencies_parsed_at":"2025-05-30T11:02:33.905Z","dependency_job_id":"5d93bd1a-3852-4a97-b63c-ef8433995e3c","html_url":"https://github.com/KamanHang/sqldatawarehousedataengineeringproject","commit_stats":null,"previous_names":["kamanhang/sqldatawarehousedataengineeringproject"],"tags_count":0,"template":false,"template_full_name":null,"purl"
:"pkg:github/KamanHang/sqldatawarehousedataengineeringproject","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KamanHang%2Fsqldatawarehousedataengineeringproject","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KamanHang%2Fsqldatawarehousedataengineeringproject/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KamanHang%2Fsqldatawarehousedataengineeringproject/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KamanHang%2Fsqldatawarehousedataengineeringproject/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KamanHang","download_url":"https://codeload.github.com/KamanHang/sqldatawarehousedataengineeringproject/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KamanHang%2Fsqldatawarehousedataengineeringproject/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279003892,"owners_count":26083638,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["customer-analytics","data-analysis","data-cleaning","data-engineering","data-modeling","data-pipeline","data-visualization","datascience","etl-pipeline","postgresql","powerbi","powerbidashboard","sales-analysis","sql"],"created_at
":"2025-02-23T22:28:00.436Z","updated_at":"2025-10-10T12:05:25.694Z","avatar_url":"https://github.com/KamanHang.png","language":"PLpgSQL","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n### ❗Things to Consider❗\nThe provided dataset was anonymous, so I gave the company a fictional name, \"Bike Haven Collective\" - a company that sells bikes, related accessories and clothing.\n#  SQL Data Warehouse and Data Analytics Project\nThis project delivers a modern data warehouse that focuses on building a clean, organized data pipeline and covers important aspects such as ETL Pipeline Development, Data Cleaning, Data Modelling and Data Analytics.\n\n## Project Division \nThis project is divided into two sections:\n- Data Engineering\n- Data Analytics and Reporting\n\n## Data Engineering 👷🏻‍♂️\nIn this section of the project I performed the following tasks:\u003cbr\u003e\n\u003cbr\u003e\n_(I performed the entire task using PL/pgSQL)_\n- Implemented Medallion Architecture to develop a data pipeline with higher-quality data flow.\n- Developed an ETL Pipeline (Extract, Transform, Load).\n- Ingested raw data from CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning) data sources.\n- Performed:\n    - Data Cleansing tasks (Removing Duplicates, Handling Unwanted Spaces, missing and invalid data, Data Type Casting and Filtering)\n    - Data Standardization\n    - Data Normalization\n    - Data Enrichment\n    - Data Integration for Qualitative Data\n- Performed Data Modeling by creating Fact \u0026 Dimension tables for high-quality data analysis in the Gold Layer.\n\n\n# ⛩️ Data Architecture\n![DataArchitecturedrawio](https://github.com/user-attachments/assets/8f124cd0-6690-4455-80d9-8d99634a1dc1)\n\nOne of the important things I was exposed to during this project is the Medallion Architecture.\u003cbr\u003e\nMedallion Architecture consists of three layers, which helped me design and build a modular and robust data warehouse.\n- ### **Bronze 
Layer:**\n     - In this layer, I ingested raw data from CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning) CSV files into PostgreSQL.\n- ### **Silver Layer:**\n     - In this layer, I performed data cleansing (handling null values and empty spaces), standardization, normalization, enrichment and derived-column tasks.\n- ### **Gold Layer:**\n     - In this layer, I created the **Data Model: Star Schema**, consisting of Fact and Dimension tables for advanced data analytics.\n\n# Data Lineage (Data Flow)\n\u003ci\u003e*Note: Final Updated Data Lineage*\u003c/i\u003e\n![Data Lineage](https://github.com/user-attachments/assets/96f4e8d6-993f-4913-834d-b4887e0883cf)\n\n# Data Modeling (Star Schema)\n- **STAR SCHEMA** \u003cbr\u003e \u003cbr\u003e\nStar Schema is a multi-dimensional data model that organizes data in a way that makes analytical tasks easier and helps non-technical people understand the data and draw insights from it.\n\n    - ### Dimension Tables\n       - dim_customers\n       - dim_products\n    - ### Fact Table\n       - fact_sales\n \n ### _For more details, check the [Data Catalog](https://github.com/KamanHang/sqldatawarehousedataengineeringproject/blob/main/ProjectScripts/data_catlog.md) of the Gold Layer_ \n![StarSchema](https://github.com/user-attachments/assets/21e97013-0699-4f1e-b51f-b7cecdf9ad5e)\n\n\n\n## Data Analytics and Reporting 📊\n\n- I analyzed the sales data from several angles and created an interactive Power BI dashboard:\n \n![PowerBI](https://github.com/user-attachments/assets/a0f7c76d-8011-431c-81d0-37d1dffd23cb)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkamanhang%2Fsqldatawarehousedataengineeringproject","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkamanhang%2Fsqldatawarehousedataengineeringproject","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkamanhang%2Fsqldatawarehousedataengineeringproject/lists"}