https://github.com/thomasshikalepo/sql-data-warehouse-project
Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics
data-analysis data-cleaning data-engineering data-lakehouse data-science data-warehouse data-warehousing datascience datawarehousing etl-pipeline medallion-architecture sql sql-query sql-server
- Host: GitHub
- URL: https://github.com/thomasshikalepo/sql-data-warehouse-project
- Owner: ThomasShikalepo
- License: mit
- Created: 2025-06-22T09:49:09.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-06-29T13:07:36.000Z (11 months ago)
- Last Synced: 2025-07-11T08:59:03.527Z (11 months ago)
- Topics: data-analysis, data-cleaning, data-engineering, data-lakehouse, data-science, data-warehouse, data-warehousing, datascience, datawarehousing, etl-pipeline, medallion-architecture, sql, sql-query, sql-server
- Language: TSQL
- Homepage:
- Size: 2.04 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Data Warehouse and Analytics Project
Welcome to the **Data Warehouse and Analytics Project** repository!
This portfolio project showcases a complete end-to-end data warehousing and analytics solution, from raw data ingestion to business intelligence reporting. It follows **industry best practices** in data engineering and analytics.
---
## Data Architecture
This project follows the **Medallion Architecture**, structured into three layers:

- **Bronze Layer**: Stores raw data ingested *as-is* from source systems (CSV files) into a SQL Server database.
- **Silver Layer**: Processes and transforms data with cleansing, standardization, and normalization techniques.
- **Gold Layer**: Contains **business-ready**, analytics-optimized data modeled using a **star schema**.
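
As a minimal sketch of how the three layers might map onto SQL Server, each layer can live in its own schema inside a single database. The database name `DataWarehouse` is an assumption for illustration, and the schema names simply mirror the layer names; the actual scripts in this repository may be organized differently.

```sql
-- Hypothetical database name; adjust to your environment.
IF DB_ID(N'DataWarehouse') IS NULL
    CREATE DATABASE DataWarehouse;
GO

USE DataWarehouse;
GO

-- One schema per Medallion layer.
-- GO is the SSMS/sqlcmd batch separator; CREATE SCHEMA must start its own batch.
CREATE SCHEMA bronze;   -- raw data, loaded as-is from the CSV files
GO
CREATE SCHEMA silver;   -- cleansed and standardized data
GO
CREATE SCHEMA gold;     -- business-ready star schema
GO
```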
---
## Project Overview
This project involves:
- **Data Architecture**: Building a modern warehouse with Medallion Architecture (Bronze, Silver, Gold).
- **ETL Pipelines**: Extracting, transforming, and loading data from ERP and CRM CSVs.
- **Data Modeling**: Designing fact and dimension tables for optimized analytical queries.
- **Analytics & Reporting**: Creating SQL-based reports and dashboards for actionable business insights.
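
To illustrate the extract-and-load step, the sketch below bulk-loads one CSV file into a Bronze table without changing its contents. The table `bronze.crm_customers`, its columns, and the file path are all hypothetical, and the `bronze` schema is assumed to exist already; the real datasets and load scripts may look different.

```sql
-- Hypothetical Bronze table: columns mirror the CSV as-is, with no cleansing.
CREATE TABLE bronze.crm_customers (
    customer_id   INT,
    first_name    NVARCHAR(50),
    last_name     NVARCHAR(50),
    gender        NVARCHAR(50),
    created_date  DATE
);
GO

-- Full reload: empty the table, then bulk-load the file again.
TRUNCATE TABLE bronze.crm_customers;

BULK INSERT bronze.crm_customers
FROM 'C:\data\crm\customers.csv'   -- hypothetical path; must be readable by the SQL Server service
WITH (
    FIRSTROW = 2,            -- skip the CSV header row
    FIELDTERMINATOR = ',',   -- comma-separated values
    TABLOCK                  -- take a table lock for a faster bulk load
);
```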
---
## Tools & Resources
Every tool used here is available **free of charge**:
- **Datasets**: ERP and CRM CSV files
- **SQL Server Express**: Free, lightweight edition of SQL Server
- **SQL Server Management Studio (SSMS)**: GUI for SQL Server
- **Draw.io**: For data modeling and architecture diagrams
- **Notion**: Project templates and documentation
- **GitHub**: For version control and collaboration
---
## Project Requirements
### Part 1: Building the Data Warehouse (Engineering)
**Goal**: Develop a modern data warehouse using SQL Server for unified, analytics-ready sales data.
**Specifications**:
- Import data from two sources (ERP and CRM, in CSV format).
- Cleanse and resolve data quality issues.
- Integrate data into a **single analytical model**.
- Focus on the **most recent data** (no historization required).
- Document the data model for stakeholders and analysts.
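
The cleansing and "most recent data" requirements could be sketched as a Bronze-to-Silver load like the one below. It assumes hypothetical tables `bronze.crm_customers` and `silver.crm_customers` with the columns shown; the gender codes and the `'n/a'` placeholder are illustrative choices, not rules taken from this project.

```sql
-- Assumes hypothetical tables bronze.crm_customers and silver.crm_customers
-- that both contain the columns referenced below.
-- Full reload, since no history needs to be kept.
TRUNCATE TABLE silver.crm_customers;

INSERT INTO silver.crm_customers (customer_id, first_name, last_name, gender, created_date)
SELECT
    customer_id,
    TRIM(first_name),                 -- TRIM needs SQL Server 2017+; use LTRIM(RTRIM(...)) on older versions
    TRIM(last_name),
    CASE UPPER(TRIM(gender))          -- standardize coded values to readable labels
        WHEN 'M' THEN 'Male'
        WHEN 'F' THEN 'Female'
        ELSE 'n/a'                    -- also covers NULL and unexpected codes
    END,
    created_date
FROM (
    SELECT
        *,
        -- Rank duplicates per customer, newest record first.
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY created_date DESC
        ) AS rn
    FROM bronze.crm_customers
    WHERE customer_id IS NOT NULL     -- drop rows without a usable key
) AS ranked
WHERE rn = 1;                         -- keep only the most recent record per customer
```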
---
### Part 2: Business Intelligence & Reporting (Analysis)
**Goal**: Use SQL to analyze data and generate business insights.
**Insights Provided**:
- Customer Behavior
- Product Performance
- Sales Trends
These insights help drive **data-driven decision-making**.
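
As a rough sketch, queries of the following shape could produce the sales-trend and product-performance insights from a star schema. The objects `gold.fact_sales` and `gold.dim_products` and their columns are assumptions for illustration, not the documented Gold model.

```sql
-- Assumes hypothetical Gold objects:
--   gold.fact_sales(order_date, product_key, customer_key, sales_amount)
--   gold.dim_products(product_key, product_name)

-- Sales trend: revenue and active customers per calendar month.
SELECT
    DATEFROMPARTS(YEAR(f.order_date), MONTH(f.order_date), 1) AS order_month,
    SUM(f.sales_amount)            AS total_sales,
    COUNT(DISTINCT f.customer_key) AS active_customers
FROM gold.fact_sales AS f
WHERE f.order_date IS NOT NULL
GROUP BY DATEFROMPARTS(YEAR(f.order_date), MONTH(f.order_date), 1)
ORDER BY order_month;

-- Product performance: top five products by revenue.
SELECT TOP (5)
    p.product_name,
    SUM(f.sales_amount) AS total_sales
FROM gold.fact_sales AS f
JOIN gold.dim_products AS p
    ON p.product_key = f.product_key
GROUP BY p.product_name
ORDER BY total_sales DESC;
```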
For full details, see [`docs/requirements.md`](docs/requirements.md).
---
## Repository Structure
```bash
data-warehouse-project/
│
├── datasets/                           # Raw datasets used for the project (ERP and CRM data)
│
├── docs/                               # Project documentation and architecture details
│   ├── etl.drawio                      # Draw.io file showing ETL techniques and flow
│   ├── data_architecture.drawio        # Diagram of the overall data warehouse architecture
│   ├── data_catalog.md                 # Metadata and field descriptions of datasets
│   ├── data_flow.drawio                # Visual data flow from source to destination
│   ├── data_models.drawio              # Star schema and data model designs
│   └── naming-conventions.md           # Standards for naming tables, fields, and files
│
├── scripts/                            # SQL scripts for ETL and transformation
│   ├── bronze/                         # Scripts for loading raw data (Bronze layer)
│   ├── silver/                         # Scripts for data cleansing and transformation (Silver layer)
│   └── gold/                           # Scripts for building the analytical model (Gold layer)
│
├── tests/                              # Data quality checks and testing scripts
│
├── README.md                           # Project overview and setup instructions
├── LICENSE                             # License file for this repository
├── .gitignore                          # Git ignore rules for files and folders
└── requirements.txt                    # Required software/tools and setup dependencies