An open API service indexing awesome lists of open source software.

https://github.com/salma-mamdoh/datawarehouse_project

Our project for Datawarehouse Course taken during fall 2024 semester
https://github.com/salma-mamdoh/datawarehouse_project

analytics dashboard datawarehousing deploy elt kpis powerbi sql ssis

Last synced: 8 months ago
JSON representation

Our project for Datawarehouse Course taken during fall 2024 semester

Awesome Lists containing this project

README

          

# Datawarehouse_Project ✨📊



Table of Contents



Northwind Database Schema


The Northwind database is a classic and fictional relational database that simulates the operations of a small trading company called Northwind Traders. The database is structured into several tables representing entities such as products, customers, orders, employees, suppliers, and categories. Each table contains attributes relevant to its respective entity, facilitating the storage and management of data related to the company's transactions, inventory, personnel, and interactions with customers and suppliers.


Main Tables




  • Products: Stores information about the products offered by Northwind Traders.


  • Customers: Maintains the details of the customers who place orders.


  • Orders: Captures the order details, including the customer, order date, and shipping information.


  • Order Details: Stores the individual line items for each order, including the product, quantity, and unit price.


  • Employees: Holds the data about the company's employees, such as their names, job titles, and reporting structure.


  • Suppliers: Keeps track of the suppliers who provide products to Northwind Traders.


  • Categories: Organizes the products into different categories.


Database Schema


db schems

KPIs


The following table outlines a set of essential KPIs relevant to our project




KPI
Definition


Total Orders Handled
The aggregate count of orders managed and fulfilled by shippers, indicating operational throughput and workload distribution efficiency.


Average Shipment Time
The mean duration taken by each shipper to complete order deliveries, reflecting the speed and reliability of shipping services provided.


Average Shipment Cost
The average expense incurred per shipment by individual shippers, serving as a key metric for evaluating shipping cost effectiveness and operational expenditure management.


Total Units Sold
The cumulative quantity of units sold across all product offerings, indicating sales volume and market demand for each product.


Total Revenue
The aggregated revenue generated from the sale of all products, serving as a primary indicator of sales performance and revenue generation capability.


Revenue per Unit
The average revenue generated per unit sold for each product, providing insights into pricing strategies, product profitability, and customer purchasing behavior.


Total Number of Orders
The overall count of orders placed by customers, representing sales activity levels and customer engagement with the business.


Total Sales Revenue
The total monetary value generated from all sales transactions, serving as a key financial metric for assessing business performance and revenue growth.


Average Order Value
The mean monetary value of individual orders placed by customers, providing insights into customer spending patterns and purchase behavior, and informing marketing and sales strategies.


Data Warehouse Design


Data Staging Area


The data staging area plays a pivotal role in the construction of the data warehouse. It serves as an intermediary step where data is collected, transformed, and prepared for integration into the warehouse.




  • Data Extraction: Relevant data is extracted from source systems, such as the Northwind database.


  • Data Transformation: Data undergoes transformations to standardize formats, handle null values, and perform other necessary modifications.


  • Data Loading: Transformed data is loaded into staging tables within the data warehouse, ensuring cleanliness and consistency for downstream processing.


Data Warehouse Construction


Facts



Fact Table
Description


F_Shipper
Captures shipping-related metrics such as the number of orders handled, average shipment time, and average shipment cost.


F_Product
Stores data on product sales, including the number of sold items, total revenue, and supplier information.


F_Sales
Tracks sales transactions made by employees, including the number of orders served and total revenue generated.


Dimensions



Dimension Table
Description


D_Date
Contains date-related attributes for time-based analysis.


D_Location
Stores location information for customers, suppliers, and shipping destinations.


D_Customer
Holds details about customers, such as name, location, and segment.


D_Employee
Stores information about employees, including their names and job titles.


D_Supplier
Contains details about suppliers, such as name and location.


D_Product
Stores information about products, including category, supplier, and unit price.


D_Shipper
Contains details about shipping companies, such as name and location.


Datawarehouse Schema


dwh schema

ETL Process


In this project, I utilized SQL Server Integration Services (SSIS) to design and implement a robust ETL process for populating a data warehouse, seamlessly integrating diverse data sources. To optimize the ETL, I implemented an incremental load strategy, enabling real-time data integration and analysis. The data warehouse schema supported comprehensive analysis, with fact tables capturing key business metrics complemented by dimension tables providing essential context. By leveraging SSIS and incremental loading, I delivered a solution that empowers stakeholders with timely and accurate insights to drive informed decisions.


This is a screenshot of the ETL Process for Product Fact Table loading


ssis1
ssis2
ssis3

To see other packages download folder SSIS and enjoy :)


Deploying and Scheduling


In the deployment and scheduling phase, the SSIS catalog facilitated the organization and deployment of SSIS projects, packages, and environments. SQL Server Agent jobs were utilized to automate the execution of SSIS packages, enabling efficient ETL workflows. Each job executed specific SSIS packages responsible for loading dimension and fact tables. By defining job schedules and dependencies, ETL processes ran reliably according to business requirements. The centralized approach streamlined management and integration with other SQL Server components. This framework provided a robust foundation for automating and managing ETL processes, facilitating informed decision-making and driving business outcomes.


Analytical Queries




  • Query 1: Identifies the fastest shipper for each country based on average shipment time during the specified period.


  • Query 2: Finds the top supplier in terms of revenue for each country during the specified period.


  • Query 3: Calculates the total number of units sold by each supplier during the specified period.


  • Query 4: Calculates the average shipment cost for each country.


  • Query 5: Calculates the total number of orders handled by each employee during the specified period.


  • Query 6: Calculates the total number of orders processed by each shipper during the specified period.


  • Query 7: Calculates the total revenue generated by each country during the specified period.


Interactive Dashboard Using Power BI




  • Category Sales Analysis: This section displays the total sales revenue for different product categories, providing insights into the best-selling and underperforming product lines.


  • Customer Order Distribution: Here, the distribution of orders among major customers is visualized through a pie chart, highlighting the top contributors to the company's revenue.


  • Global Revenue Overview: This section presents the total revenue generated from different countries, offering insights into the company's global reach and performance.


  • Employee Performance: Employee productivity and performance are showcased here, listing the top employees by the number of orders processed.



dwh schema