https://github.com/ds-fau-ck/fintech-data-migration
Implementing Change Data Capture for Seamless Fintech Data Migration
adls azure-sql-database azure-synapse delta-tables jupyter-notebook pyspark python3 sql
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/ds-fau-ck/fintech-data-migration
- Owner: ds-fau-ck
- License: mit
- Created: 2024-09-16T12:16:50.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-21T13:49:23.000Z (over 1 year ago)
- Last Synced: 2025-03-26T06:51:08.539Z (about 1 year ago)
- Topics: adls, azure-sql-database, azure-synapse, delta-tables, jupyter-notebook, pyspark, python3, sql
- Language: Jupyter Notebook
- Homepage:
- Size: 2.01 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
### **End-to-End Data Engineering Pipeline for Fintech Data Migration**
### **Introduction**
This project demonstrates the migration of data from a **relational SQL Server database** to a **cloud-based Azure architecture**. The process involves moving financial data from a traditional SQL database to **Azure Data Lake Storage (ADLS)** using a structured and scalable approach.
### **Architecture**

The data migration and transformation process is organized using the **Bronze, Silver, and Gold** layer architecture:
- **Bronze Layer**: Raw data extracted from the source.
- **Silver Layer**: Data quality checks and transformations applied.
- **Gold Layer**: Fully processed data, ready for analytics and reporting.
A Synapse pipeline was created to automate the **extraction, loading, and transformation (ELT)** processes, ensuring seamless data flow through these layers. After transformations, the final data from the **Gold Layer** is stored in **SynapseDWH** for further use.
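The layer layout above can be pictured as one folder per layer and table in ADLS. The following is a minimal sketch of such a convention, not the project's actual layout: the storage account `examplelake`, the container `fintech`, and the `<layer>/<table>` folder scheme are all assumptions made for illustration.

```python
# Hypothetical medallion layout: one folder per layer, one subfolder per table.
LAYERS = ("bronze", "silver", "gold")


def layer_path(layer: str, table: str,
               account: str = "examplelake",   # assumed storage account name
               container: str = "fintech") -> str:  # assumed container name
    """Return the abfss:// URI of a table folder in one layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{layer}/{table}"


def next_layer(layer: str):
    """Return the layer that data is promoted to next, or None after Gold."""
    idx = LAYERS.index(layer)
    return LAYERS[idx + 1] if idx + 1 < len(LAYERS) else None
```

For example, `layer_path("bronze", "Accounts")` yields the folder that a copy activity would write to, and `next_layer("bronze")` names the layer the first notebook would write to.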

### **Technology Used**
1. **Programming Language**: Python
2. **Scripting Language**: SQL
3. **Data Processing**: PySpark
4. **Azure Cloud Platform**:
   - Azure Data Lake Storage (ADLS)
   - Azure SQL Database
   - Delta Tables
   - Azure Synapse Analytics
### **Step 1: SQL Server to Bronze Layer**
The first step migrated the tables from SQL Server into the Bronze Layer in ADLS. Initially, a separate copy activity was created for each table. This was then replaced with a **Lookup** activity that dynamically fetches the list of tables and a **ForEach** activity that loops over that list and runs a single parameterized copy activity for each table, eliminating the manual work.
Key Components:
- **SQL Server**: Source database.
- **Azure Data Lake Storage (ADLS)**: Storage for raw data in the Bronze Layer.
- **Synapse Pipeline**: Used to automate data movement from SQL Server to ADLS.
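The Lookup and ForEach pattern described above could look roughly like the abbreviated pipeline definition below, built here as a Python dictionary so it can be inspected or dumped to JSON. This is a sketch under assumptions, not the repository's pipeline: the activity names, the dataset names (`SqlServerTables`, `SqlServerTable`, `BronzeParquet`), their parameters, the `dbo` schema, and the Parquet sink format are all hypothetical, and the definition has not been validated against a Synapse workspace.

```python
import json


def expr(value: str) -> dict:
    """Wrap a pipeline expression string in the Expression object form."""
    return {"value": value, "type": "Expression"}


def build_copy_all_tables_pipeline(schema: str = "dbo") -> dict:
    # Lookup: fetch the list of tables instead of hard-coding one copy per table.
    lookup = {
        "name": "LookupTables",
        "type": "Lookup",
        "typeProperties": {
            "source": {
                "type": "SqlServerSource",
                "sqlReaderQuery": (
                    "SELECT TABLE_SCHEMA, TABLE_NAME "
                    "FROM INFORMATION_SCHEMA.TABLES "
                    f"WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA = '{schema}'"
                ),
            },
            "dataset": {"referenceName": "SqlServerTables", "type": "DatasetReference"},
            "firstRowOnly": False,  # return every row, not just the first
        },
    }
    # Copy: one parameterized activity, executed once per table by the ForEach.
    copy = {
        "name": "CopyTableToBronze",
        "type": "Copy",
        "inputs": [{
            "referenceName": "SqlServerTable",
            "type": "DatasetReference",
            "parameters": {"schema": expr("@item().TABLE_SCHEMA"),
                           "table": expr("@item().TABLE_NAME")},
        }],
        "outputs": [{
            "referenceName": "BronzeParquet",
            "type": "DatasetReference",
            "parameters": {"folder": expr("@concat('bronze/', item().TABLE_NAME)")},
        }],
        "typeProperties": {"source": {"type": "SqlServerSource"},
                           "sink": {"type": "ParquetSink"}},  # file format assumed
    }
    # ForEach: iterate over the Lookup output once the Lookup has succeeded.
    foreach = {
        "name": "ForEachTable",
        "type": "ForEach",
        "dependsOn": [{"activity": "LookupTables",
                       "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {"items": expr("@activity('LookupTables').output.value"),
                           "activities": [copy]},
    }
    return {"name": "CopySqlServerToBronze",
            "properties": {"activities": [lookup, foreach]}}


if __name__ == "__main__":
    print(json.dumps(build_copy_all_tables_pipeline(), indent=2))
```

With this shape, adding a table to the source database requires no pipeline change, because the Lookup query picks it up on the next run.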
### **Step 2: Moving Data from Bronze to Silver Layer**
Once in the Bronze Layer, the data was validated and transformed by **Notebook 1**. The cleaned output was written to the Silver Layer, ready for further processing.
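The README does not list the specific checks the notebook applies. As an illustration only, the sketch below shows the kind of validation a Bronze-to-Silver step might perform, written in plain Python so it runs without a Spark session; in the notebook the same logic would be expressed with PySpark DataFrame operations. The column names (`transaction_id`, `account_id`, `amount`, `transaction_date`) are hypothetical.

```python
from datetime import date


def clean_transactions(rows):
    """Drop rows with a missing key or unparseable values, remove duplicate
    ids, and cast amount and date to proper types."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        tx_id = row.get("transaction_id")
        if tx_id is None or tx_id in seen_ids:
            continue  # missing primary key, or an id that was already kept
        try:
            amount = float(row["amount"])
            tx_date = date.fromisoformat(row["transaction_date"])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or value that cannot be cast
        seen_ids.add(tx_id)
        cleaned.append({
            "transaction_id": tx_id,
            "account_id": row.get("account_id"),
            "amount": amount,
            "transaction_date": tx_date,
        })
    return cleaned
```

A row is only recorded as seen once it passes every check, so a malformed record does not block a later valid record with the same id.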
### **Step 3: Moving Data from Silver to Gold Layer**
**Notebook 2** applied further transformations and aggregations to the Silver Layer data and wrote the result to the Gold Layer, where it is ready for analytics, reporting, and querying.
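Which aggregations the notebook computes is not stated here. As a hedged example of a Silver-to-Gold rollup, the sketch below summarizes cleaned transactions per account in plain Python; the `account_id` and `amount` fields and the output column names are assumptions, and a PySpark version would use `groupBy` with aggregate functions instead.

```python
from collections import defaultdict


def summarize_by_account(transactions):
    """Aggregate cleaned transactions into one summary row per account."""
    totals = defaultdict(lambda: {"transaction_count": 0, "total_amount": 0.0})
    for tx in transactions:
        summary = totals[tx["account_id"]]
        summary["transaction_count"] += 1
        summary["total_amount"] += tx["amount"]
    # Sort by account id so the output order is deterministic.
    return [{"account_id": account_id, **summary}
            for account_id, summary in sorted(totals.items())]
```

The result is one row per account, which is the kind of reporting-ready table a Gold Layer typically holds.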
### **Step 4: Notification Setup with Logic App**
An Azure **Logic App** was configured to send an email notification when the pipeline succeeds or fails, so that stakeholders are informed of each pipeline run.
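How the Logic App is invoked is not described in this README. One common arrangement, assumed here, is a Logic App with an HTTP request trigger that receives a small JSON payload and turns it into an email. The sketch below builds and posts such a payload from Python; the field names and the trigger URL are hypothetical, and inside a Synapse pipeline the equivalent call would typically be made by a Web activity rather than by Python code.

```python
import json
import urllib.request


def build_notification(pipeline_name: str, run_id: str,
                       status: str, message: str = "") -> dict:
    """Build the JSON body that the (assumed) Logic App HTTP trigger expects."""
    if status not in ("Succeeded", "Failed"):
        raise ValueError(f"unexpected status: {status!r}")
    return {
        "pipelineName": pipeline_name,
        "runId": run_id,
        "status": status,
        "subject": f"[{status}] {pipeline_name} run {run_id}",
        "message": message,
    }


def send_notification(trigger_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to the Logic App trigger URL and return the HTTP status."""
    request = urllib.request.Request(
        trigger_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping payload construction separate from the HTTP call makes the message format testable without a deployed Logic App.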
### **Challenges Encountered**
- **Initial Manual Configuration**: Creating an individual copy activity for each table was inefficient. This was resolved by switching to the dynamic Lookup and ForEach activities.
- **Error Handling**: Connection issues between SQL Server, ADLS, and Synapse had to be troubleshot during pipeline execution.
### **Scripts and Data for This Project**
1. [Spark-notebook-BronzeToSilverDataProcess](spark-notebook/BronzeToSilverDataProcess.ipynb)
2. [Spark-notebook-SilverToGoldDataProcess](spark-notebook/SilverToGoldDataProcess.ipynb)
3. [Sql-Database-Table-Accounts](Sql-Database-Table/Accounts.sql)
4. [Sql-Database-Table-Customers](Sql-Database-Table/Customers.sql)
5. [Sql-Database-Table-Loans](Sql-Database-Table/Loans.sql)
6. [Sql-Database-Table-Payments](Sql-Database-Table/Payments.sql)
7. [Sql-Database-Table-Transactions](Sql-Database-Table/Transactions.sql)
### **Conclusion**
The project successfully implemented a Lakehouse architecture for fintech data migration. By automating the data migration process with Synapse Pipelines and organizing data into Bronze, Silver, and Gold layers in ADLS, a scalable, efficient, and agile architecture was created. This design supports future analytics and reporting needs, with Synapse and Logic Apps providing a robust orchestration and notification mechanism.