https://github.com/mrpaulandrewltd/Microsoft-Data-Integration-Pipeline-Training

Training workshop content on Azure Data Factory and Azure Synapse Analytics Data Integration Pipelines

# Microsoft Data Integration Pipeline Training
## The Fundamentals to Level 300
### _with Paul Andrew_

![Slide Header](./Images/ReadMe%20Header.png)

Hey friends and welcome to my training workshop on __Microsoft Data Integration Pipelines__.

## Overview

In this full day of training, we’ll start with the very basics and learn how to orchestrate your Azure data platform from start to finish. You will learn how to build out Azure control flow and data flow components as processing pipelines using Azure Data Factory and Azure Synapse Analytics. We’ll start by covering the fundamentals within the resources and together build out pipelines that ingest data from local source systems, transform and serve it to consumers. We’ll then continue taking an end-to-end look at our Azure integration pipeline tools within highly scalable cloud native architectures, dealing with triggering, monitoring, dynamic pipeline content as well as CI/CD practices. Start the day knowing nothing about Azure Data Integration pipelines and leave with the knowledge, slides, demos, and code to apply these resources in your role as a data engineering professional.
___

## Objectives

* How cloud native data integration resources have evolved over time.
* What the basic data pipeline artifacts are.
* What the common data movement deployment patterns are.
* How to build complex, highly dynamic control flows.
* How to massively scale out executions and handle parallel orchestration workloads.
* Best practices for the deployment of orchestration resources.

___

## Agenda

The following offers an insight into the complete agenda and module breakdown for this workshop.

* __Module 1:__ Pipeline Fundamentals - [Slides PDF >>>](/Content/Module%201%20-%20Pipeline%20Fundamentals.pdf)
  * The History of Azure Orchestration
  * Synapse Analytics vs Data Factory vs Microsoft Fabric
  * Integration Components
  * Common Activities
  * Execution Dependencies

___

* __Module 2:__ Integration Runtime Design Patterns - [Slides PDF >>>](/Content/Module%202%20-%20Integration%20Runtimes.pdf)
  * Compute Types
    * Azure
    * Hosted
    * SSIS
  * Patterns & Configuration

___

* __Module 3:__ Data Transformation - [Slides PDF >>>](/Content/Module%203%20-%20Data%20Transformations.pdf)
  * Data Flows
  * Power Query Injection
  * Spark Configuration
  * Use Cases

___

* __Labs:__ [Getting Hands On](https://github.com/mrpaulandrewltd/Azure-Data-Integration-Pipeline-Training/tree/main/Labs)
  * Create Azure Resources
  * Build a Copy Pipeline
  * Create a Reusable Pipeline
  * Author a Data Flow
  * Monitor Factory Activities
  * Explore Synapse Pipelines
  * Explore Fabric Pipelines
  * Mini-project

___

* __Module 4:__ Dynamic Pipelines - [Slides PDF >>>](/Content/Module%204%20-%20Dynamic%20Pipelines.pdf)
  * Expressions & Interpolation
  * Simple Metadata Driven Execution
  * Dynamic Content Chains
  * Reference Names

___

* __Module 5:__ Pipeline Extensibility - [Slides PDF >>>](/Content/Module%205%20-%20Pipeline%20Extensibility.pdf)
  * Azure Batch Service
    * Tasks
    * Compute Pools
    * Scaling
  * Pipeline Custom Activities
  * Azure Management API
  * Azure Functions

___

* __Module 6:__ Execution Parallelism - [Slides PDF >>>](/Content/Module%206%20-%20Execution%20Parallelism.pdf)
  * Control Flow Scale Out
  * Concurrency Limitations
  * Internal vs External Activities
  * Orchestration Framework - See Cloud Formations: [CF.Cumulus](https://www.cloudformations.org/cumulus?utm_source=pa&utm_medium=github&utm_campaign=cumulus&utm_content=l2)

___

* __Module 7:__ VNet Integration - [Slides PDF >>>](/Content/Module%207%20-%20VNet%20Integration.pdf)
  * Private Endpoints
  * Managed VNets
  * Firewall Bypass

___

* __Module 8:__ Security - [Slides PDF >>>](/Content/Module%208%20-%20Security.pdf)
  * Service Principals
  * Managed Identities
  * Azure Key Vault Integration
  * Customer Managed Keys
  * Pipeline Access & Permissions

___

* __Module 9:__ Monitoring & Alerting - [Slides PDF >>>](/Content/Module%209%20-%20Monitoring%20&%20Alerting.pdf)
  * Studio Monitoring
  * Log Analytics & Kusto Queries
  * Operational Dashboards
  * Advanced Alerting

___

* __Module 10:__ Solution Testing - [Slides PDF >>>](/Content/Module%209%20-%20Monitoring%20&%20Alerting.pdf)
  * Development Time Validation
  * Test Coverage
  * NUnit Tests

___

* __Module 11:__ CI/CD - [Slides PDF >>>](/Content/Module%2011%20-%20CI%20CD.pdf)
  * Source Control vs Developer UI
  * Basic ARM Template Deployments
  * Advanced Deployment Patterns

___

* __Module 12:__ Final Thoughts - [Slides PDF >>>](/Content/Module%2012%20-%20Final%20Thoughts.pdf)
  * Running Costs
  * Conclusions
  * Best Practices
___

## Suggested Prerequisites

If you are participating in any of these training workshops, there will be labs to work through and demo code to optionally follow along with. These labs focus on the development of Azure data platform resources, so it is recommended that you bring the following ready to use. There will be little spare time for initial setup work.

* Most importantly, access to a Microsoft Azure Tenant including a usable Azure Subscription.
  * A free trial account is sufficient, but please have this set up prior to the event to avoid delays.
  * This should include the ability to provision resources in an Azure Resource Group with owner level access.
* A developer laptop with power and some form of WiFi connectivity (sorry if obvious).
* Suggested software to be installed on your laptop to make the learning experience run smoothly:
  * A modern web browser, Microsoft Edge or similar as preferred.
  * A suitable IDE, VSCode or Visual Studio including Azure development extensions.
  * Database tools, SQL Server Management Studio or Azure Data Studio.
  * GitHub Desktop or similar for repository interaction.
  * Azure Storage Explorer.
  * A PDF file viewer.
* Play the Azure Icon Game, it will help. See blog post for context: [https://mrpaulandrew.com/2017/12/15/the-azure-icon-game](https://mrpaulandrew.com/2017/12/15/the-azure-icon-game)

For software downloads, please complete these tasks prior to the event to avoid internet bandwidth contention for other attendees.

_Many thanks_

___

## Speaker Biography

Paul (AKA @mrpaulandrew) is the Founder & CTO of Cloud Formations, a specialist data consultancy based in the UK. With nearly 20 years’ experience designing and delivering Microsoft data architectures, Paul leads a passionate team of engineers, supporting businesses small and large with scalable cloud platforms. Business value delivered through data insights. Over the years, Paul has covered the breadth and depth of design patterns and industry leading concepts, including Lambda, Kappa, Delta Lake, Data Mesh and Data Fabric.

Paul is also a Microsoft Data Platform MVP, organiser for the Data Relay community conference, East Midlands user group leader, book author and mentor. In addition to the day job(s), Paul is a father of three, husband, foodie, runner, blood donor, geek, Lego, and Star Wars fan! Lastly, Paul confesses to enjoying a Rammstein playlist when given half a chance to do some coding for a customer project.

## Speaker Contact Details

![Contact QR Code](./Images/Contact.png)

[mrpaulandrew.com/contact](https://mrpaulandrew.com/contact/)
___