https://github.com/thomasshikalepo/sql-data-warehouse-project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics

data-analysis data-cleaning data-engineering data-lakehouse data-science data-warehouse data-warehousing datascience datawarehousing etl-pipeline medallion-architecture sql sql-query sql-server

# 📊 Data Warehouse and Analytics Project

Welcome to the **Data Warehouse and Analytics Project** repository! 🚀
This portfolio project showcases a complete end-to-end data warehousing and analytics solution, from raw data ingestion to business intelligence reporting. It follows **industry best practices** in data engineering and analytics.

---

## 🏗️ Data Architecture

This project follows the **Medallion Architecture**, structured into three layers:

![data_architecture](https://github.com/user-attachments/assets/1f295203-e8ab-4b6c-9de9-9df0514cceab)

- **Bronze Layer**: Stores raw data ingested *as-is* from source systems (CSV files) into a SQL Server database.
- **Silver Layer**: Processes and transforms data with cleansing, standardization, and normalization techniques.
- **Gold Layer**: Contains **business-ready**, analytics-optimized data modeled using a **star schema**.
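
As a rough illustration of how the three layers relate, the sketch below pushes a few rows through Bronze, Silver, and Gold. It uses Python's built-in `sqlite3` as a stand-in for SQL Server so that it runs anywhere. The table and column names (`bronze_customers`, `cst_id`, and so on) and the country mapping are hypothetical, not taken from this repository's scripts.

```python
import sqlite3


def build_layers(rows):
    """Load raw rows into Bronze, clean them into Silver, expose a Gold view."""
    con = sqlite3.connect(":memory:")

    # Bronze: raw data stored as-is (all text, nothing validated).
    con.execute(
        "CREATE TABLE bronze_customers (cst_id TEXT, cst_name TEXT, cst_country TEXT)"
    )
    con.executemany("INSERT INTO bronze_customers VALUES (?, ?, ?)", rows)

    # Silver: trimmed, typed, and standardized; rows without an id are dropped.
    con.execute("""
        CREATE TABLE silver_customers AS
        SELECT CAST(cst_id AS INTEGER) AS customer_id,
               TRIM(cst_name)          AS customer_name,
               CASE UPPER(TRIM(cst_country))
                    WHEN 'DE' THEN 'Germany'
                    WHEN 'US' THEN 'United States'
                    ELSE 'n/a'
               END                     AS country
        FROM bronze_customers
        WHERE cst_id IS NOT NULL AND TRIM(cst_id) <> ''
    """)

    # Gold: business-ready view that analysts query directly.
    con.execute("""
        CREATE VIEW gold_dim_customers AS
        SELECT customer_id, customer_name, country
        FROM silver_customers
    """)
    return con


raw = [("1", "  Anna ", "de"), ("2", "Ben", "US"), (None, "ghost", "??")]
con = build_layers(raw)
gold_rows = con.execute(
    "SELECT * FROM gold_dim_customers ORDER BY customer_id"
).fetchall()
print(gold_rows)
```

Bronze keeps all three raw rows, including the one with a missing id, so the original input can always be re-processed. Only Silver and Gold apply rules.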

---

## 📖 Project Overview

This project involves:

- **Data Architecture**: Building a modern warehouse with Medallion Architecture (Bronze, Silver, Gold).
- **ETL Pipelines**: Extracting, transforming, and loading data from ERP and CRM CSVs.
- **Data Modeling**: Designing fact and dimension tables for optimized analytical queries.
- **Analytics & Reporting**: Creating SQL-based reports and dashboards for actionable business insights.
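
To make the fact and dimension idea concrete, here is a minimal star schema sketch with one dimension and one fact table, again using Python's `sqlite3` in place of SQL Server. The names `dim_products`, `fact_sales`, and `product_key`, and the sample rows, are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension: one row per product, identified by a surrogate key.
    CREATE TABLE dim_products (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category     TEXT NOT NULL
    );

    -- Fact: one row per order line, referencing the dimension.
    CREATE TABLE fact_sales (
        order_number TEXT    NOT NULL,
        product_key  INTEGER NOT NULL REFERENCES dim_products(product_key),
        quantity     INTEGER NOT NULL,
        sales_amount REAL    NOT NULL
    );

    INSERT INTO dim_products VALUES
        (1, 'Road Bike', 'Bikes'),
        (2, 'Helmet',    'Accessories');

    INSERT INTO fact_sales VALUES
        ('SO1', 1, 1, 1200.0),
        ('SO2', 2, 2,   80.0),
        ('SO3', 1, 1, 1150.0);
""")

# Typical analytical query: aggregate the fact, describe it with the dimension.
sales_by_category = con.execute("""
    SELECT p.category, SUM(f.sales_amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_products AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY total_sales DESC
""").fetchall()
print(sales_by_category)
```

Keeping descriptive attributes in the dimension and measures in the fact table means most reports reduce to a join plus a `GROUP BY`.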

---

## 🧰 Tools & Resources

Every tool used here is available **free of charge**:

- 📂 **Datasets**: ERP and CRM CSV files
- 🧩 **SQL Server Express**: Lightweight SQL Server instance
- 🖥️ **SQL Server Management Studio (SSMS)**: GUI for SQL Server
- 🧠 **Draw.io**: For data modeling and architecture diagrams
- 💡 **Notion**: Project templates and documentation
- 💻 **GitHub**: For version control and collaboration

---

## 🚀 Project Requirements

### 🧱 Part 1: Building the Data Warehouse (Engineering)

**Goal**: Develop a modern data warehouse using SQL Server for unified, analytics-ready sales data.

**Specifications**:
- Import data from two sources (ERP and CRM, in CSV format).
- Cleanse and resolve data quality issues.
- Integrate data into a **single analytical model**.
- Focus on the **most recent data** (no historization required).
- Document the data model for stakeholders and analysts.
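
The specifications above can be sketched end to end on toy data: load two CSV sources, keep only the most recent record per customer, and integrate both into one result. This is a hedged illustration using Python's `csv` and `sqlite3` modules rather than SQL Server. The CSV contents, the helper `load_csv`, and all table and column names are hypothetical.

```python
import csv
import io
import sqlite3

# Toy stand-ins for the CRM and ERP files (contents are made up).
CRM_CSV = """cst_id,cst_name,cst_create_date
1,Anna,2024-01-05
1,Anna M.,2024-03-10
2,Ben,2024-02-01
"""
ERP_CSV = """cid,country
1,Germany
2,
"""


def load_csv(con, table, text):
    """Create `table` with all-TEXT columns from the CSV header and load every row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = ", ".join(f"{name} TEXT" for name in header)
    con.execute(f"CREATE TABLE {table} ({columns})")
    marks = ", ".join("?" for _ in header)
    con.executemany(f"INSERT INTO {table} VALUES ({marks})", reader)


con = sqlite3.connect(":memory:")
load_csv(con, "crm_customers", CRM_CSV)
load_csv(con, "erp_locations", ERP_CSV)

customers = con.execute("""
    WITH latest AS (
        -- No historization: rank each customer's records, newest first.
        SELECT cst_id, cst_name,
               ROW_NUMBER() OVER (
                   PARTITION BY cst_id ORDER BY cst_create_date DESC
               ) AS rn
        FROM crm_customers
    )
    SELECT l.cst_id,
           l.cst_name,
           -- Data quality rule: treat an empty country as unknown.
           COALESCE(NULLIF(e.country, ''), 'n/a') AS country
    FROM latest AS l
    LEFT JOIN erp_locations AS e ON e.cid = l.cst_id
    WHERE l.rn = 1
    ORDER BY l.cst_id
""").fetchall()
print(customers)
```

The `ROW_NUMBER()` pattern carries over to SQL Server largely unchanged. The CSV loading step would differ there, since SQL Server has its own bulk-loading facilities.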

---

### 📊 Part 2: Business Intelligence & Reporting (Analysis)

**Goal**: Use SQL to analyze data and generate business insights.

**Insights Provided**:
- Customer Behavior
- Product Performance
- Sales Trends

These insights help drive **data-driven decision-making**.
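
A small sketch of what such SQL reports might look like, one query per insight area. It runs against an in-memory SQLite table through Python's `sqlite3`. The `sales` table, its columns, and the sample rows are invented for illustration, and SQL Server would use its own date functions in place of `substr`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (order_date TEXT, customer TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('2024-01-15', 'Anna', 'Road Bike', 1200.0),
        ('2024-01-20', 'Ben',  'Helmet',      40.0),
        ('2024-02-03', 'Anna', 'Helmet',      40.0),
        ('2024-02-18', 'Cara', 'Road Bike', 1150.0);
""")


def report(sql):
    """Run one reporting query and return all rows."""
    return con.execute(sql).fetchall()


# Sales trends: revenue per calendar month.
monthly_sales = report("""
    SELECT substr(order_date, 1, 7) AS month, SUM(amount) AS revenue
    FROM sales
    GROUP BY month
    ORDER BY month
""")

# Product performance: revenue per product, best seller first.
product_sales = report("""
    SELECT product, SUM(amount) AS revenue
    FROM sales
    GROUP BY product
    ORDER BY revenue DESC
""")

# Customer behavior: order count and total spend per customer.
customer_stats = report("""
    SELECT customer, COUNT(*) AS orders, SUM(amount) AS spend
    FROM sales
    GROUP BY customer
    ORDER BY spend DESC, customer
""")

print(monthly_sales, product_sales, customer_stats, sep="\n")
```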

📄 For full details, see [`docs/requirements.md`](docs/requirements.md)

---

## 📂 Repository Structure

```bash
data-warehouse-project/
│
├── datasets/ # Raw datasets used for the project (ERP and CRM data)
│
├── docs/ # Project documentation and architecture details
│   ├── etl.drawio # Draw.io file showing ETL techniques and flow
│   ├── data_architecture.drawio # Diagram of the overall data warehouse architecture
│   ├── data_catalog.md # Metadata and field descriptions of datasets
│   ├── data_flow.drawio # Visual data flow from source to destination
│   ├── data_models.drawio # Star schema and data model designs
│   ├── naming-conventions.md # Standards for naming tables, fields, and files
│
├── scripts/ # SQL scripts for ETL and transformation
│   ├── bronze/ # Scripts for loading raw data (Bronze layer)
│   ├── silver/ # Scripts for data cleansing and transformation (Silver layer)
│   ├── gold/ # Scripts for building the analytical model (Gold layer)
│
├── tests/ # Data quality checks and testing scripts
│
├── README.md # Project overview and setup instructions
├── LICENSE # License file for this repository
├── .gitignore # Git ignore rules for files and folders
└── requirements.txt # Required software/tools and setup dependencies