https://github.com/mariann95/sql_data_warehouse_and_analytics_project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics. This repository also contains a collection of SQL scripts demonstrating analytical techniques such as change-over-time, cumulative, performance, segmentation, and part-to-whole analysis.

# Data Warehouse and Analytics Project

Welcome to the **Data Warehouse and Analytics Project** repository! 🚀
This project demonstrates a comprehensive data warehousing and analytics solution, from building a data warehouse to generating
actionable insights. Designed as a portfolio project, it highlights industry best practices in data engineering and analytics.

By Mariann Ács-Kovács

#### [LinkedIn profile](https://www.linkedin.com/in/mariann95/)

---
## 📖 Project Overview

This project involves:

1. **Data Architecture**: Designing a modern data warehouse using the Medallion Architecture (**Bronze**, **Silver**, and **Gold** layers).
2. **ETL Pipelines**: Extracting, transforming, and loading data from source systems into the warehouse.
3. **Data Modeling**: Developing fact and dimension tables optimized for analytical queries.
4. **Analytics & Reporting**: Creating SQL-based reports and dashboards for actionable insights.

🎯 This project showcases my expertise in:
- SQL Development
- Data Architecture
- Data Engineering
- ETL Pipeline Development
- Data Modeling
- Data Analytics

---

## 🛠️ Important Links & Tools

Everything is free!
- **[Datasets](datasets/):** Access to the project dataset (CSV files).
- **[SQL Server Express](https://www.microsoft.com/en-us/sql-server/sql-server-downloads):** Lightweight server for hosting your SQL database.
- **[SQL Server Management Studio (SSMS)](https://learn.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-ver16):** GUI for managing and interacting with databases.

---

## 📋 Project Requirements

### Building the Data Warehouse (Data Engineering) 🚧

#### Objective
Develop a modern data warehouse using SQL Server to consolidate sales data, enabling analytical reporting and informed decision-making.

#### Specifications
- **Data Sources**: Import data from two source systems (ERP and CRM) provided as CSV files.
- **Data Quality**: Cleanse and resolve data quality issues prior to analysis.
- **Integration**: Combine both sources into a single, user-friendly data model designed for analytical queries.
- **Scope**: Focus on the latest dataset only; historization of data is not required.
- **Documentation**: Provide clear documentation of the data model to support both business stakeholders and analytics teams.
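
The data quality requirement above can be illustrated with a small check for missing or duplicated keys and untrimmed text. This is only a sketch: it uses Python's standard-library `sqlite3` as a runnable stand-in for SQL Server, and the table `customers_raw` and its columns are hypothetical names, not taken from the project's scripts.

```python
import sqlite3


def quality_report(rows):
    """Count common data quality issues in hypothetical raw customer rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical staging table; the real project loads ERP and CRM CSV files.
    conn.execute("CREATE TABLE customers_raw (customer_id INTEGER, first_name TEXT)")
    conn.executemany("INSERT INTO customers_raw VALUES (?, ?)", rows)

    # Keys that are missing (NULL) or appear more than once.
    bad_keys = conn.execute("""
        SELECT customer_id, COUNT(*) AS occurrences
        FROM customers_raw
        GROUP BY customer_id
        HAVING COUNT(*) > 1 OR customer_id IS NULL
        ORDER BY customer_id
    """).fetchall()

    # Text values carrying leading or trailing whitespace.
    untrimmed = conn.execute("""
        SELECT COUNT(*)
        FROM customers_raw
        WHERE first_name <> TRIM(first_name)
    """).fetchone()[0]
    return bad_keys, untrimmed


sample = [(1, "Anna"), (2, " Ben"), (2, "Ben"), (None, "Cleo ")]
bad_keys, untrimmed = quality_report(sample)
```

A check like this is expected to return no rows on a cleansed table, which makes it easy to rerun after every load.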

---

### BI: Analytics & Reporting (Data Analysis) 📊
A collection of SQL scripts for data exploration, analytics, and reporting, covering database exploration, measures and metrics, time-based trends, cumulative analytics, segmentation, and more. The queries are designed to help data analysts and BI professionals quickly explore, segment, and analyze data within a relational database. Each script focuses on a specific analytical theme and demonstrates best practices for SQL queries.

#### Objective
Develop SQL-based analytics to deliver detailed insights into:
- **Customer Behavior**
- **Product Performance**
- **Sales Trends**

These insights empower stakeholders with key business metrics, enabling strategic decision-making.

For more details, refer to [documents/requirements.md](documents/requirements.md).
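
As an illustration of the change-over-time, cumulative, and part-to-whole analyses mentioned in this section, the following sketch applies window functions to a tiny monthly sales table. It runs on Python's standard-library `sqlite3` rather than SQL Server, and the table `monthly_sales` with its columns is a hypothetical example, not one of the project's tables.

```python
import sqlite3


def monthly_sales_report():
    """Return per-month change, running total, and share of total sales."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical pre-aggregated data; real figures would come from the Gold layer.
    conn.executescript("""
        CREATE TABLE monthly_sales (order_month TEXT, sales_amount INTEGER);
        INSERT INTO monthly_sales VALUES
            ('2024-01', 100), ('2024-02', 300), ('2024-03', 600);
    """)
    return conn.execute("""
        SELECT
            order_month,
            sales_amount,
            -- Change over time: difference to the previous month.
            sales_amount - LAG(sales_amount) OVER (ORDER BY order_month) AS change_vs_prev,
            -- Cumulative analysis: running total ordered by month.
            SUM(sales_amount) OVER (ORDER BY order_month) AS running_total,
            -- Part-to-whole: each month's share of all sales, in percent.
            ROUND(100.0 * sales_amount / SUM(sales_amount) OVER (), 1) AS pct_of_total
        FROM monthly_sales
        ORDER BY order_month
    """).fetchall()


report = monthly_sales_report()
```

The first month has no predecessor, so its `change_vs_prev` is `NULL` (`None` in Python). `LAG` and `SUM ... OVER` are also available in SQL Server, though the surrounding syntax may need small adjustments.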

---

## 🏗️ Data Architecture

The data architecture for this project follows the Medallion Architecture with **Bronze**, **Silver**, and **Gold** layers:

![Data Architecture](documents/data_architecture.png)

1. **Bronze Layer**: Stores raw data as-is from the source systems. Data is ingested from CSV files into a SQL Server database.
2. **Silver Layer**: This layer includes data cleansing, standardization, and normalization processes to prepare data for analysis.
3. **Gold Layer**: Houses business-ready data modeled into a star schema required for reporting and analytics.
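
The three layers above can be sketched end to end on a handful of rows. This is a minimal illustration, not the project's actual ETL: it uses Python's standard-library `sqlite3` with attached in-memory databases standing in for SQL Server schemas, and the table `crm_customers` and all column names are assumptions made for the example.

```python
import sqlite3


def build_layers():
    """Load a tiny hypothetical dataset through bronze, silver, and gold."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no schemas, so attached in-memory databases stand in for them.
    for layer in ("bronze", "silver", "gold"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {layer}")

    # Bronze: raw rows stored exactly as they might arrive from a CSV file.
    conn.execute("CREATE TABLE bronze.crm_customers (id TEXT, name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO bronze.crm_customers VALUES (?, ?, ?)",
        [("1", "  Anna ", "de"), ("2", "Ben", "DE"), ("2", "Ben", "DE"), ("3", "Cleo ", " us")],
    )

    # Silver: trim whitespace, standardize codes, cast types, drop duplicates.
    conn.execute("""
        CREATE TABLE silver.crm_customers AS
        SELECT DISTINCT
            CAST(id AS INTEGER)  AS customer_id,
            TRIM(name)           AS customer_name,
            UPPER(TRIM(country)) AS country_code
        FROM bronze.crm_customers
    """)

    # Gold: business-ready dimension with a surrogate key for reporting.
    conn.execute("""
        CREATE TABLE gold.dim_customers AS
        SELECT
            ROW_NUMBER() OVER (ORDER BY customer_id) AS customer_key,
            customer_id,
            customer_name,
            country_code
        FROM silver.crm_customers
    """)
    return conn


conn = build_layers()
gold_rows = conn.execute(
    "SELECT * FROM gold.dim_customers ORDER BY customer_key"
).fetchall()
```

Keeping the Bronze table untouched means the Silver and Gold steps can be rerun or corrected at any time without going back to the source files.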

---

## ↔️ Data Integration

The diagram below shows the relationships between tables in the Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems.
It is divided into two main sections: the left side represents the CRM system, and the right side represents the ERP system.

![Data Integration](documents/data_integration.png)

---

## 📁 → 📄 Data Flow

The data flow diagram illustrates the data lineage from the source systems through the data layers. It shows how data is transformed
as it moves through each stage, ensuring data quality and readiness for analysis or reporting.

![Data Flow](documents/data_flow.png)

---

## ⭐ Data Model

The diagram represents a star schema data model for a sales data mart. The star schema comprises three tables:
**gold.dim_customers**, **gold.fact_sales**, and **gold.dim_products**.

![Data Model](documents/data_model.png)
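
A typical query against this star schema joins the fact table to both dimensions and aggregates a measure. The sketch below uses the three table names from the model, but the columns (`customer_key`, `product_key`, `country`, `category`, `sales_amount`) and the sample rows are assumptions for illustration. It runs on Python's standard-library `sqlite3`, with an attached in-memory database standing in for the `gold` schema.

```python
import sqlite3


def sales_by_country_and_category():
    """Aggregate hypothetical sales by customer country and product category."""
    conn = sqlite3.connect(":memory:")
    # An attached database lets the example keep the gold.<table> naming.
    conn.execute("ATTACH DATABASE ':memory:' AS gold")
    conn.executescript("""
        CREATE TABLE gold.dim_customers (customer_key INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE gold.dim_products  (product_key  INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE gold.fact_sales    (customer_key INTEGER, product_key INTEGER,
                                         sales_amount INTEGER);
        INSERT INTO gold.dim_customers VALUES (1, 'Germany'), (2, 'Hungary');
        INSERT INTO gold.dim_products  VALUES (10, 'Bikes'), (20, 'Accessories');
        INSERT INTO gold.fact_sales    VALUES (1, 10, 500), (1, 20, 40),
                                              (2, 10, 300), (2, 10, 200);
    """)
    # The fact table sits in the middle; each dimension joins on its key.
    return conn.execute("""
        SELECT c.country, p.category, SUM(f.sales_amount) AS total_sales
        FROM gold.fact_sales AS f
        JOIN gold.dim_customers AS c ON c.customer_key = f.customer_key
        JOIN gold.dim_products  AS p ON p.product_key  = f.product_key
        GROUP BY c.country, p.category
        ORDER BY c.country, p.category
    """).fetchall()


totals = sales_by_country_and_category()
```

Because every dimension joins to the fact table on a single key, adding another breakdown is usually a matter of selecting one more dimension column and grouping by it.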

---

## 📂 Repository Structure
```
data-warehouse-project/
│
├── datasets/                               # Raw datasets used for the project (ERP and CRM data)
│
├── documents/                              # Project documentation and architecture details
│   ├── data_architecture.drawio            # Draw.io file shows the project's architecture
│   ├── data_catalog.md                     # Catalog of datasets, including field descriptions and metadata
│   ├── data_flow.drawio                    # Draw.io file for the data flow diagram
│   ├── data_integration.drawio             # Draw.io file for how the tables are related
│   ├── data_model.drawio                   # Draw.io file for data model (star schema)
│   ├── naming-conventions.md               # Consistent naming guidelines for tables, columns, and files
│
├── scripts/                                # SQL scripts for ETL and transformations and for data exploration, analytics, and reporting
│   ├── bronze/                             # Scripts for extracting and loading raw data
│   ├── silver/                             # Scripts for cleaning and transforming data
│   ├── gold/                               # Scripts for creating analytical models
│   ├── Exploratory Data Analysis (EDA)/    # Scripts for understanding data
│   ├── Advanced Data Analytics + Reports/  # Scripts for answering business questions
│
├── tests/                                  # Test scripts and quality files
│
├── README.md                               # Project overview and instructions
├── LICENSE                                 # License information for the repository
├── .gitignore                              # Files and directories to be ignored by Git
└── requirements.txt                        # Dependencies and requirements for the project
```
---

## 🛡️ License

This project is licensed under the [MIT License](LICENSE). You are free to use, modify, and share this project with proper attribution.