Building a Business Intelligence Solution on the Microsoft Azure Cloud Platform with Dynamic ELT Integration
- Host: GitHub
- URL: https://github.com/trannhatnguyen2/bi_cloud_kientap
- Owner: trannhatnguyen2
- Created: 2023-07-07T11:41:59.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-07T15:22:40.000Z (about 2 years ago)
- Last Synced: 2025-01-17T04:43:39.318Z (9 months ago)
- Topics: azure, datalake, datawarehouse, powerbi
- Language: Jupyter Notebook
- Homepage:
- Size: 37.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# [Building a Business Intelligence Solution on the Microsoft Azure Cloud Platform with Dynamic ELT Integration](https://github.com/trannhatnguyen2/BI_Cloud_KienTap)
## Group members
### **`BoKho`**
| student_id | class | full_name | role |
| ---------- | ------- | ---------------- | ------ |
| K204061440 | K20406T | Tran Nhat Nguyen | Leader |
| K204061446 | K20406C | Man Dac Sang     | Member |

# 📕 Table of contents
- 🛠️ [Requirements](#️-requirements)
- 🧙♂️ [Data Source](#-data-source)
- 🚀 [Solution](#-solution)
- 🌊 [Building Data Lake](#-building-data-lake)
- 🧱 [Building Data Warehouse](#-building-data-warehouse)
- 📊 [Result](#️-result)
# 🛠️ Requirements
Many businesses run multiple systems, so their data is spread across several sources and stored in different file formats. This makes importing and storing the data difficult, and discrepancies between sources can lead to loss of consistency, unnecessary costs, and poorer business decisions.
# 🧙♂️ Data Source
The data was obtained from Kaggle for experimentation and divided into three sources:

*(Figure: Data Sources)*

1. `Databases`: records sales activity (orders) on an e-commerce platform in Brazil.

   *(Figure: ERD model of the `Databases` source)*

2. `Accounting Systems`: records and manages customer payment information for orders.
3. `Web Services`: customer comments on products and services.

# 🚀 Solution
*(Figure: BI Solution)*

- Step 1: Identify the data sources and the file formats of each source (see the configuration sketch after this list).
- Step 2: Extract data into the `rawdata` zone using a Python script; run the dynamic ELT process into the `curated` zone and upload the data needed for analysis to Azure SQL Server.
- Step 3: Run the ETL process into the Data Warehouse using Data Factory.
- Step 4: Visualize data using Power BI.
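To make Step 1 concrete, a minimal configuration sketch is shown below; the source names, formats, and zone lists are illustrative assumptions, not taken from the repository's code.

```python
# Hypothetical configuration for the pipeline: each source is declared with its
# file format and the lake zones it passes through (all values are assumptions).
SOURCES = {
    "databases":          {"format": "csv", "zones": ["rawdata", "curated"]},
    "accounting_systems": {"format": "csv", "zones": ["rawdata", "curated"]},
    "web_services":       {"format": "zip", "zones": ["rawdata", "staging", "curated"]},
}

if __name__ == "__main__":
    # Print the path each source takes through the data lake.
    for name, cfg in SOURCES.items():
        print(f"{name} ({cfg['format']}): {' -> '.join(cfg['zones'])}")
```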
# 🌊 Building Data Lake

## Container

The tool used to create the data storage zones on the Azure platform is Blob Storage.
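As a minimal sketch (assuming the `azure-storage-blob` SDK and a placeholder connection string), the three zones can be created as containers like this:

```python
# Minimal sketch: create the three storage zones as Blob Storage containers.
# The connection string is a placeholder; the zone names follow this README.
from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

AZURE_STORAGE_CONNECTION_STRING = "<your-storage-account-connection-string>"
ZONES = ["rawdata", "staging", "curated"]

service = BlobServiceClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING)
for zone in ZONES:
    try:
        service.create_container(zone)          # one container per lake zone
        print(f"created container: {zone}")
    except ResourceExistsError:
        print(f"container already exists: {zone}")
```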
*(Figure: Containers)*

### **rawdata**

An exact copy of the data from the sources, organized in an orderly folder structure (an upload sketch follows the tree below).
```bash
./RAWDATA
├── .accounting_systems/ <- Accounting System Source
│ ├── Payment_2018_01.csv
│ ├── Payment_2018_02.csv
│ └── Payment_2018_03.csv
│
├── .databases/ <- Databases Source
│ ├── Customer_2018_01.csv
│ ├── Customer_2018_02.csv
│ ├── Customer_2018_03.csv
│ ├── Order_2018_01.csv
│ ├── Order_2018_02.csv
│ ├── Order_2018_03.csv
│ ├── OrderItem_2018_01.csv
│ ├── OrderItem_2018_02.csv
│ └── OrderItem_2018_03.csv
│
├── .web_services/ <- Web Services Source
│ ├── Review_2018_01.zip
│ ├── Review_2018_02.zip
│ └── Review_2018_03.zip
```
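A minimal sketch of the upload into `rawdata`, assuming the `azure-storage-blob` SDK and a local copy of the source files laid out as above; paths and credentials are placeholders:

```python
# Minimal sketch: copy local source files into the `rawdata` container while
# preserving the folder structure shown above.
from pathlib import Path
from azure.storage.blob import BlobServiceClient

AZURE_STORAGE_CONNECTION_STRING = "<your-storage-account-connection-string>"
LOCAL_SOURCE_ROOT = Path("./RAWDATA")

service = BlobServiceClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING)
container = service.get_container_client("rawdata")

for path in LOCAL_SOURCE_ROOT.rglob("*"):
    if path.is_file():
        # blob name mirrors the local layout, e.g. "databases/Order_2018_01.csv"
        blob_name = path.relative_to(LOCAL_SOURCE_ROOT).as_posix()
        with path.open("rb") as data:
            container.upload_blob(name=blob_name, data=data, overwrite=True)
        print(f"uploaded {blob_name}")
```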
### **staging**

Used to extract all compressed files to prepare the data for loading into the `curated` zone.
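A minimal sketch of that extraction step, assuming the zipped review files sit under a `web_services/` prefix in `rawdata` (the prefix and file names are assumptions based on the tree above):

```python
# Minimal sketch: pull the zipped review files from `rawdata`, extract them in
# memory, and write the extracted files into the `staging` container.
import io
import zipfile
from azure.storage.blob import BlobServiceClient

AZURE_STORAGE_CONNECTION_STRING = "<your-storage-account-connection-string>"

service = BlobServiceClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING)
rawdata = service.get_container_client("rawdata")
staging = service.get_container_client("staging")

for blob in rawdata.list_blobs(name_starts_with="web_services/"):
    if not blob.name.endswith(".zip"):
        continue
    archive_bytes = rawdata.download_blob(blob.name).readall()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for member in archive.namelist():
            # e.g. "web_services/Review_2018_01.json" inside the staging container
            staging.upload_blob(
                name=f"web_services/{member}",
                data=archive.read(member),
                overwrite=True,
            )
            print(f"extracted {member} -> staging")
```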
### **curated**
```bash
./CURATED
├── .EXTERNAL/
│   └── .Review/
│       └── .2018/
│           ├── .01/
│           │   └── Review_2018_01.json
│           ├── .02/
│           │   └── Review_2018_02.json
│           └── .03/
│               └── Review_2018_03.json
└── .INTERNAL/
    ├── .Accounting/
    │   └── .Payment/
    │       └── .2018/
    │           ├── .01/
    │           │   └── Payment_2018_01.csv
    │           ├── .02/
    │           │   └── Payment_2018_02.csv
    │           └── .03/
    │               └── Payment_2018_03.csv
    └── .Sales/
        ├── .Customer/
        │   └── .2018/
        │       ├── .01/
        │       │   └── Customer_2018_01.csv
        │       ├── .02/
        │       │   └── Customer_2018_02.csv
        │       └── .03/
        │           └── Customer_2018_03.csv
        ├── .Order/
        │   └── .2018/
        │       ├── .01/
        │       │   └── Order_2018_01.csv
        │       ├── .02/
        │       │   └── Order_2018_02.csv
        │       └── .03/
        │           └── Order_2018_03.csv
        └── .OrderItem/
            └── .2018/
                ├── .01/
                │   └── OrderItem_2018_01.csv
                ├── .02/
                │   └── OrderItem_2018_02.csv
                └── .03/
                    └── OrderItem_2018_03.csv
```

## Dynamic ELT
The dynamic ELT process extracts the raw data and uploads it to the storage zones in the predefined structure above, with the target path controlled entirely by input parameters (a minimal sketch follows).
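A minimal sketch of what such a parameter-driven load might look like; the helper, its parameters, and the folder names (written here without the leading dots) are illustrative assumptions rather than the repository's actual script:

```python
# Minimal sketch of a parameter-driven ("dynamic") load into the `curated` zone:
# the target path is assembled from input parameters so one function serves
# every entity in the tree above.
from azure.storage.blob import BlobServiceClient

AZURE_STORAGE_CONNECTION_STRING = "<your-storage-account-connection-string>"
service = BlobServiceClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING)
curated = service.get_container_client("curated")

def curated_path(area, entity, year, month, ext, subject=None):
    """Build a target path such as 'INTERNAL/Sales/Order/2018/01/Order_2018_01.csv'."""
    parts = [area] + ([subject] if subject else []) + [
        entity, str(year), f"{month:02d}", f"{entity}_{year}_{month:02d}.{ext}"
    ]
    return "/".join(parts)

def load_to_curated(data: bytes, **params) -> None:
    """Upload one extract to the curated zone at its parameter-defined path."""
    target = curated_path(**params)
    curated.upload_blob(name=target, data=data, overwrite=True)
    print(f"loaded {target}")

# Example: the January 2018 order extract lands under INTERNAL/Sales/Order/2018/01/.
load_to_curated(b"...csv bytes...", area="INTERNAL", subject="Sales",
                entity="Order", year=2018, month=1, ext="csv")
```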
*(Figure: Dynamic ELT Process)*

# 🧱 Building Data Warehouse
A `Bus Matrix`, `Master Data`, `Transaction Data`, `ETL Mapping`, and related design artifacts are prepared to support the data warehouse construction.
## Data Warehouse model
The diagram below illustrates the conceptual model of the proposed data warehouse as a star schema.
*(Figure: Data Warehouse Star Schema)*
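Since the schema itself is only shown in the diagram, here is a hedged T-SQL sketch, executed from Python with `pyodbc`, of what one dimension/fact layout in such a star could look like; every table and column name is a hypothetical stand-in, not the repository's model:

```python
# Minimal sketch: create hypothetical star-schema tables in Azure SQL from Python.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;DATABASE=<your-dw>;"
    "UID=<user>;PWD=<password>"
)
ddl = """
CREATE TABLE DimCustomer (
    customer_key INT IDENTITY PRIMARY KEY,
    customer_id  NVARCHAR(50),
    city         NVARCHAR(100)
);
CREATE TABLE DimDate (
    date_key INT PRIMARY KEY,          -- e.g. 20180131
    [date]   DATE,
    [month]  INT,
    [year]   INT
);
CREATE TABLE FactSales (
    sales_key     INT IDENTITY PRIMARY KEY,
    customer_key  INT REFERENCES DimCustomer(customer_key),
    date_key      INT REFERENCES DimDate(date_key),
    order_id      NVARCHAR(50),
    payment_value DECIMAL(18, 2),
    review_score  INT
);
"""
with conn:
    conn.execute(ddl)   # run the whole batch and commit on exiting the block
```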
## ETL process

*(Figure: ETL Pipeline)*

Based on the pipeline shown above, the load is divided into two phases (a load sketch follows this list):
- Phase 1: Load data from Azure SQL Server --> Dimension Tables
- Phase 2: Load data from Azure SQL Server --> Fact Table
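In the repository this step is performed with Data Factory pipelines; purely as an illustration of the two-phase ordering (dimensions before the fact table), a pandas/SQLAlchemy sketch with hypothetical connection strings and table names could look like this:

```python
# Minimal sketch: two-phase warehouse load -- dimensions first, then the fact
# table -- so fact rows can reference existing dimension keys.
import pandas as pd
from sqlalchemy import create_engine

source = create_engine("mssql+pyodbc://<user>:<password>@<azure-sql-dsn>")
warehouse = create_engine("mssql+pyodbc://<user>:<password>@<warehouse-dsn>")

# Phase 1: load the dimension tables.
for dim in ["DimCustomer", "DimDate"]:
    df = pd.read_sql(f"SELECT * FROM staging.{dim}", source)
    df.to_sql(dim, warehouse, if_exists="append", index=False)

# Phase 2: load the fact table after every dimension is in place.
fact = pd.read_sql("SELECT * FROM staging.FactSales", source)
fact.to_sql("FactSales", warehouse, if_exists="append", index=False)
```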
# 📊 Result

*(Figure: Sales Performance Dashboard)*

---
© 2023 BoKho