Building a Business Intelligence Solution on the Microsoft Azure Cloud Platform with Dynamic ELT Integration
- Host: GitHub
- URL: https://github.com/trannhatnguyen2/bi_cloud_kientap
- Owner: trannhatnguyen2
- Created: 2023-07-07T11:41:59.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-07T15:22:40.000Z (about 2 years ago)
- Last Synced: 2025-01-17T04:43:39.318Z (9 months ago)
- Topics: azure, datalake, datawarehouse, powerbi
- Language: Jupyter Notebook
- Homepage:
- Size: 37.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# [Building a Business Intelligence Solution on the Microsoft Azure Cloud Platform with Dynamic ELT Integration](https://github.com/trannhatnguyen2/BI_Cloud_KienTap)
## Group members
### **`BoKho`**
| student_id | class | full_name | role |
| ---------- | ------- | ---------------- | ------ |
| K204061440 | K20406T | Tran Nhat Nguyen | Leader |
| K204061446 | K20406C | Man Dac Sang     | Member |

# 📕 Table of contents
- 🛠️ [Requirements](#️-requirements)
- 🧙♂️ [Data Source](#-data-source)
- 🚀 [Solution](#-solution)
- 🌊 [Building Data Lake](#-building-data-lake)
- 🧱 [Building Data Warehouse](#-building-data-warehouse)
- 📊 [Result](#️-result)
# 🛠️ Requirements
Many businesses run multiple systems, so their data is spread across several sources and stored in different file formats. This makes importing and storing the data difficult, and discrepancies between sources can lead to loss of consistency, unnecessary costs, and poorer business decisions.
# 🧙♂️ Data Source
The data was obtained from Kaggle for experimentation and divided into three sources:

*(Figure: Data Sources)*

1. `Databases`: records sales activity (orders) on an e-commerce platform in Brazil.

   *(Figure: ERD model of the `Databases` source)*

2. `Accounting Systems`: records and manages customer payment information for orders.
3. `Web Services`: customer comments on products and services.

# 🚀 Solution
*(Figure: BI Solution)*

- Step 1: Identify the data sources and the file formats of each source (see the configuration sketch after this list).
- Step 2: Extract data into the `rawdata` zone using a Python script; run the dynamic ELT process into the `curated` zone and upload the data needed for analysis to Azure SQL Server.
- Step 3: Run the ETL process into the Data Warehouse using Data Factory.
- Step 4: Visualize data using Power BI.
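To make Step 1 concrete, a minimal configuration sketch is shown below; the source names, formats, and zone lists are illustrative assumptions, not taken from the repository's code.

```python
# Hypothetical configuration for the pipeline: each source is declared with its
# file format and the lake zones it passes through (all values are assumptions).
SOURCES = {
    "databases":          {"format": "csv", "zones": ["rawdata", "curated"]},
    "accounting_systems": {"format": "csv", "zones": ["rawdata", "curated"]},
    "web_services":       {"format": "zip", "zones": ["rawdata", "staging", "curated"]},
}

if __name__ == "__main__":
    # Print the path each source takes through the data lake.
    for name, cfg in SOURCES.items():
        print(f"{name} ({cfg['format']}): {' -> '.join(cfg['zones'])}")
```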
# 🌊 Building Data Lake

## Container

The tool used to create the data storage zones on the Azure platform is Blob Storage.
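As a minimal sketch (assuming the `azure-storage-blob` SDK and a placeholder connection string), the three zones can be created as containers like this:

```python
# Minimal sketch: create the three storage zones as Blob Storage containers.
# The connection string is a placeholder; the zone names follow this README.
from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

AZURE_STORAGE_CONNECTION_STRING = "<your-storage-account-connection-string>"
ZONES = ["rawdata", "staging", "curated"]

service = BlobServiceClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING)
for zone in ZONES:
    try:
        service.create_container(zone)          # one container per lake zone
        print(f"created container: {zone}")
    except ResourceExistsError:
        print(f"container already exists: {zone}")
```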
*(Figure: Containers)*

### **rawdata**

An exact copy of the data from the sources, organized in an orderly folder structure (an upload sketch follows the tree below).
```bash
./RAWDATA
├── .accounting_systems/ <- Accounting System Source
│ ├── Payment_2018_01.csv
│ ├── Payment_2018_02.csv
│ └── Payment_2018_03.csv
│
├── .databases/ <- Databases Source
│ ├── Customer_2018_01.csv
│ ├── Customer_2018_02.csv
│ ├── Customer_2018_03.csv
│ ├── Order_2018_01.csv
│ ├── Order_2018_02.csv
│ ├── Order_2018_03.csv
│ ├── OrderItem_2018_01.csv
│ ├── OrderItem_2018_02.csv
│ └── OrderItem_2018_03.csv
│
├── .web_services/ <- Web Services Source
│ ├── Review_2018_01.zip
│ ├── Review_2018_02.zip
│ └── Review_2018_03.zip
```
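A minimal sketch of the upload into `rawdata`, assuming the `azure-storage-blob` SDK and a local copy of the source files laid out as above; paths and credentials are placeholders:

```python
# Minimal sketch: copy local source files into the `rawdata` container while
# preserving the folder structure shown above.
from pathlib import Path
from azure.storage.blob import BlobServiceClient

AZURE_STORAGE_CONNECTION_STRING = "<your-storage-account-connection-string>"
LOCAL_SOURCE_ROOT = Path("./RAWDATA")

service = BlobServiceClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING)
container = service.get_container_client("rawdata")

for path in LOCAL_SOURCE_ROOT.rglob("*"):
    if path.is_file():
        # blob name mirrors the local layout, e.g. "databases/Order_2018_01.csv"
        blob_name = path.relative_to(LOCAL_SOURCE_ROOT).as_posix()
        with path.open("rb") as data:
            container.upload_blob(name=blob_name, data=data, overwrite=True)
        print(f"uploaded {blob_name}")
```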
### **staging**

Used to extract all compressed files to prepare the data for loading into the `curated` zone.
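A minimal sketch of that extraction step, assuming the zipped review files sit under a `web_services/` prefix in `rawdata` (the prefix and file names are assumptions based on the tree above):

```python
# Minimal sketch: pull the zipped review files from `rawdata`, extract them in
# memory, and write the extracted files into the `staging` container.
import io
import zipfile
from azure.storage.blob import BlobServiceClient

AZURE_STORAGE_CONNECTION_STRING = "<your-storage-account-connection-string>"

service = BlobServiceClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING)
rawdata = service.get_container_client("rawdata")
staging = service.get_container_client("staging")

for blob in rawdata.list_blobs(name_starts_with="web_services/"):
    if not blob.name.endswith(".zip"):
        continue
    archive_bytes = rawdata.download_blob(blob.name).readall()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for member in archive.namelist():
            # e.g. "web_services/Review_2018_01.json" inside the staging container
            staging.upload_blob(
                name=f"web_services/{member}",
                data=archive.read(member),
                overwrite=True,
            )
            print(f"extracted {member} -> staging")
```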
### **curated**
```bash
./CURATED
├── .EXTERNAL/
│   └── .Review/
│       └── .2018/
│           ├── .01/
│           │   └── Review_2018_01.json
│           ├── .02/
│           │   └── Review_2018_02.json
│           └── .03/
│               └── Review_2018_03.json
└── .INTERNAL/
    ├── .Accounting/
    │   └── .Payment/
    │       └── .2018/
    │           ├── .01/
    │           │   └── Payment_2018_01.csv
    │           ├── .02/
    │           │   └── Payment_2018_02.csv
    │           └── .03/
    │               └── Payment_2018_03.csv
    └── .Sales/
        ├── .Customer/
        │   └── .2018/
        │       ├── .01/
        │       │   └── Customer_2018_01.csv
        │       ├── .02/
        │       │   └── Customer_2018_02.csv
        │       └── .03/
        │           └── Customer_2018_03.csv
        ├── .Order/
        │   └── .2018/
        │       ├── .01/
        │       │   └── Order_2018_01.csv
        │       ├── .02/
        │       │   └── Order_2018_02.csv
        │       └── .03/
        │           └── Order_2018_03.csv
        └── .OrderItem/
            └── .2018/
                ├── .01/
                │   └── OrderItem_2018_01.csv
                ├── .02/
                │   └── OrderItem_2018_02.csv
                └── .03/
                    └── OrderItem_2018_03.csv
```

## Dynamic ELT
The dynamic ELT process extracts the raw data and uploads it to the storage zones in the predefined structure above, with the target path controlled entirely by input parameters (a minimal sketch follows).
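A minimal sketch of what such a parameter-driven load might look like; the helper, its parameters, and the folder names (written here without the leading dots) are illustrative assumptions rather than the repository's actual script:

```python
# Minimal sketch of a parameter-driven ("dynamic") load into the `curated` zone:
# the target path is assembled from input parameters so one function serves
# every entity in the tree above.
from azure.storage.blob import BlobServiceClient

AZURE_STORAGE_CONNECTION_STRING = "<your-storage-account-connection-string>"
service = BlobServiceClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING)
curated = service.get_container_client("curated")

def curated_path(area, entity, year, month, ext, subject=None):
    """Build a target path such as 'INTERNAL/Sales/Order/2018/01/Order_2018_01.csv'."""
    parts = [area] + ([subject] if subject else []) + [
        entity, str(year), f"{month:02d}", f"{entity}_{year}_{month:02d}.{ext}"
    ]
    return "/".join(parts)

def load_to_curated(data: bytes, **params) -> None:
    """Upload one extract to the curated zone at its parameter-defined path."""
    target = curated_path(**params)
    curated.upload_blob(name=target, data=data, overwrite=True)
    print(f"loaded {target}")

# Example: the January 2018 order extract lands under INTERNAL/Sales/Order/2018/01/.
load_to_curated(b"...csv bytes...", area="INTERNAL", subject="Sales",
                entity="Order", year=2018, month=1, ext="csv")
```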
*(Figure: Dynamic ELT Process)*

# 🧱 Building Data Warehouse
A `Bus Matrix`, `Master Data`, `Transaction Data`, `ETL Mapping`, and related design artifacts are prepared to support the data warehouse construction.
## Data Warehouse model
The diagram below illustrates the conceptual model of the proposed data warehouse as a star schema.
*(Figure: Data Warehouse Star Schema)*
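Since the schema itself is only shown in the diagram, here is a hedged T-SQL sketch, executed from Python with `pyodbc`, of what one dimension/fact layout in such a star could look like; every table and column name is a hypothetical stand-in, not the repository's model:

```python
# Minimal sketch: create hypothetical star-schema tables in Azure SQL from Python.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;DATABASE=<your-dw>;"
    "UID=<user>;PWD=<password>"
)
ddl = """
CREATE TABLE DimCustomer (
    customer_key INT IDENTITY PRIMARY KEY,
    customer_id  NVARCHAR(50),
    city         NVARCHAR(100)
);
CREATE TABLE DimDate (
    date_key INT PRIMARY KEY,          -- e.g. 20180131
    [date]   DATE,
    [month]  INT,
    [year]   INT
);
CREATE TABLE FactSales (
    sales_key     INT IDENTITY PRIMARY KEY,
    customer_key  INT REFERENCES DimCustomer(customer_key),
    date_key      INT REFERENCES DimDate(date_key),
    order_id      NVARCHAR(50),
    payment_value DECIMAL(18, 2),
    review_score  INT
);
"""
with conn:
    conn.execute(ddl)   # run the whole batch and commit on exiting the block
```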
## ETL process

*(Figure: ETL Pipeline)*

Based on the pipeline shown above, the load is divided into two phases (a load sketch follows this list):
- Phase 1: Load data from Azure SQL Server --> Dimension Tables
- Phase 2: Load data from Azure SQL Server --> Fact Table
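In the repository this step is performed with Data Factory pipelines; purely as an illustration of the two-phase ordering (dimensions before the fact table), a pandas/SQLAlchemy sketch with hypothetical connection strings and table names could look like this:

```python
# Minimal sketch: two-phase warehouse load -- dimensions first, then the fact
# table -- so fact rows can reference existing dimension keys.
import pandas as pd
from sqlalchemy import create_engine

source = create_engine("mssql+pyodbc://<user>:<password>@<azure-sql-dsn>")
warehouse = create_engine("mssql+pyodbc://<user>:<password>@<warehouse-dsn>")

# Phase 1: load the dimension tables.
for dim in ["DimCustomer", "DimDate"]:
    df = pd.read_sql(f"SELECT * FROM staging.{dim}", source)
    df.to_sql(dim, warehouse, if_exists="append", index=False)

# Phase 2: load the fact table after every dimension is in place.
fact = pd.read_sql("SELECT * FROM staging.FactSales", source)
fact.to_sql("FactSales", warehouse, if_exists="append", index=False)
```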
# 📊 Result

*(Figure: Sales Performance Dashboard)*

---
© 2023 BoKho