{"id":22662825,"url":"https://github.com/shawonsimon/azure-data-engineering","last_synced_at":"2026-05-15T11:35:27.881Z","repository":{"id":263073766,"uuid":"889264743","full_name":"ShawonSimon/Azure-Data-Engineering","owner":"ShawonSimon","description":"An end-to-end data engineering solution on Azure, transforming SQL Server data into Power BI reports using Data Lake, Data Factory, Databricks, Synapse, and Key Vault for security.","archived":false,"fork":false,"pushed_at":"2024-12-08T12:35:18.000Z","size":2670,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-29T09:13:32.198Z","etag":null,"topics":["azure-keyvault","data-engineering","data-visualization","databricks","powerbi","sqlserver","synapse"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ShawonSimon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-15T23:54:40.000Z","updated_at":"2024-12-08T12:35:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"050f0aa3-18d9-4211-9210-377ab7c71624","html_url":"https://github.com/ShawonSimon/Azure-Data-Engineering","commit_stats":null,"previous_names":["shawonsimon/azure-data-engineering"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ShawonSimon/Azure-Data-Engineering","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShawonSimon%2FAzure-Data-Engineering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/ShawonSimon%2FAzure-Data-Engineering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShawonSimon%2FAzure-Data-Engineering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShawonSimon%2FAzure-Data-Engineering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ShawonSimon","download_url":"https://codeload.github.com/ShawonSimon/Azure-Data-Engineering/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShawonSimon%2FAzure-Data-Engineering/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263314105,"owners_count":23447291,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure-keyvault","data-engineering","data-visualization","databricks","powerbi","sqlserver","synapse"],"created_at":"2024-12-09T12:15:47.083Z","updated_at":"2026-05-15T11:35:27.851Z","avatar_url":"https://github.com/ShawonSimon.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Engineering on Azure\nThis project demonstrates an end-to-end data engineering solution on Microsoft Azure, designed to handle the ingestion, transformation, and analysis of data from an on-premises SQL Server database to a comprehensive reporting platform in Power BI. 
The solution uses Azure Data Lake Storage Gen2, Azure Data Factory, Databricks, and Azure Synapse Analytics, with added security managed through Azure Key Vault.\n![image](https://github.com/user-attachments/assets/707223df-2f77-47f2-bc17-bddda35af25a)\n\n\n# Project Overview\n## Pipeline Components\n1. Self-Hosted Integration Runtime (SHIR):\n   \n    - Used for secure data transfer from the on-premises SQL Server to Azure. The SHIR facilitates connectivity between the on-premises environment and Azure Data Factory.\n2. Azure Data Factory (ADF):\n\n    - Orchestrates the data pipeline by moving data from the on-premises SQL Server to Azure Data Lake Storage Gen2 via SHIR.\n    - Performs data ingestion, using various activities to manage data flow and ensure seamless pipeline execution.\n3. Azure Data Lake Storage Gen2:\n\n    - Stores ingested data in Bronze, Silver, and Gold layers to manage raw, cleansed, and curated datasets, respectively.\n4. Databricks:\n\n    - Transforms data from the Silver layer to the Gold layer.\n    - Handles complex transformations, cleansing, and data preparation for downstream analytics.\n5. Azure Synapse Analytics:\n\n    - Acts as the data warehouse, loading curated data from the Gold layer for advanced analytics.\n    - Enables efficient query processing and serves as the source for Power BI reporting.\n6. Azure Key Vault:\n\n    - Manages and secures sensitive information such as database connection strings and API keys used throughout the pipeline.\n7. Power BI:\n\n    - Connects to Azure Synapse Analytics for data visualization and reporting, enabling insights and analysis of the ingested and transformed data.\n# Workflow\n1. Data Ingestion:\n\n   - Data is ingested from an on-premises SQL Server database using SHIR and ADF, moving data securely to Azure Data Lake Storage Gen2.\n
2. Data Transformation:\n\n   - Data in the Bronze layer is cleansed and transformed into the Silver layer.\n   - Databricks processes the Silver data and produces a refined dataset in the Gold layer.\n3. Data Loading and Analytics:\n\n   - The transformed data from the Gold layer is loaded into Azure Synapse Analytics.\n   - Power BI accesses the data from Synapse to create interactive reports and visualizations.\n# Security\n   \n   - Azure Key Vault ensures the security of sensitive credentials used in the pipeline, such as database passwords and access keys.\n# Conclusion\n\nThis project demonstrates how to build a scalable and secure data engineering solution on Azure, using best practices in data storage, transformation, and analytics. It leverages SHIR for secure on-premises connectivity, data layer separation in Azure Data Lake, and integration with powerful analytics and visualization tools like Azure Synapse and Power BI.\n\nMany thanks to [Mr K. Talks Tech](https://www.youtube.com/@mr.ktalkstech) for one of the best tutorials about data engineering on Azure that I have found.\n\n# Screenshots\n\n![image](https://github.com/user-attachments/assets/e2d44ea4-d024-486e-97fa-c4aab2203ca0)\n![image](https://github.com/user-attachments/assets/55aea91f-2329-47b8-b138-94103443f34e)\n![image](https://github.com/user-attachments/assets/7ae2e8f0-5196-4229-a80b-739e3cc89851)\n![image](https://github.com/user-attachments/assets/e5293061-7f42-4dd3-b629-484eebb8450c)\n![image](https://github.com/user-attachments/assets/4a41b649-ef7d-414a-82a1-d04f746770b2)\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshawonsimon%2Fazure-data-engineering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshawonsimon%2Fazure-data-engineering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshawonsimon%2Fazure-data-engineering/lists"}