https://github.com/salmanmkc/azure-data-platform
Short guide as to how to set up the ADF Project :)
- Host: GitHub
- URL: https://github.com/salmanmkc/azure-data-platform
- Owner: salmanmkc
- Created: 2021-02-15T15:34:16.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-01-23T19:00:30.000Z (over 1 year ago)
- Last Synced: 2025-03-17T08:49:26.945Z (4 months ago)
- Topics: azure, datafactory, microsoft
- Homepage:
- Size: 105 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Azure Data Platform/Analytics Platform
This project provides an end-to-end data platform on Azure, with all of the required services in place and fully connected. It is the empty shell of the solution, ready for you to "bring your own data".

## Pre-requisites:
- An Azure Resource Group with Contributor access (remember that Azure permissions are inherited). Owner access is required for the managed instance, which is optional but covered in this guide.

# How to set up the project
## Step 1: Click Deploy To Azure below to get started
[Deploy to Azure](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fsalmanmkc%2Ftemplatedeployment%2Fmain%2Ftemplate.json)
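If you are curious, the *Deploy to Azure* link is just the Azure portal's template-deployment URL with the raw ARM template URI percent-encoded into it. A minimal sketch of how that link is built:

```python
from urllib.parse import quote

# The raw ARM template hosted on GitHub (from the badge above).
template_uri = (
    "https://raw.githubusercontent.com/salmanmkc/"
    "templatedeployment/main/template.json"
)

# The portal expects the template URI percent-encoded, including
# ":" and "/" (hence safe="").
portal_link = (
    "https://portal.azure.com/#create/Microsoft.Template/uri/"
    + quote(template_uri, safe="")
)

print(portal_link)
```

You can reuse this pattern to build a deploy link for your own fork of the template.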
Fill in the corresponding fields:
Click deploy and you should have this showing:

This should take approximately five minutes, mostly because a whole Virtual Machine (IaaS) is being deployed; you can continue with the next steps while the Virtual Machine finishes deploying.
If you navigate to your newly created resource group, or if you click *Go To Deployment*, you should see all of these resources (the names may differ based on the input parameters you chose when deploying the template):

The final step looks like this:

## Step 2: Configure the Azure Data Factory to use a Private Endpoint
- Navigate to your Azure Data Factory that has been created in your resource group
- Go to the networking tab on the left
- Change the Networking Access from Public Endpoint to Private Endpoint
- Select Private endpoint connections at the top
- Add a Private Endpoint by clicking *+Private Endpoint*
Add an endpoint. Fill in the basics in the first tab, then target your own resource (the Data Factory) in the second tab.
Navigate to your Azure Data Factory and launch it.
Note that you also want to make sure that this private endpoint is configured with the correct VNET, else Azure Data Factory will resolve your public endpoint.
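One way to sanity-check the DNS side of this is to confirm that the factory's hostname resolves to a private address from inside your VNET. A minimal sketch (the factory hostname in the comment is purely illustrative):

```python
import ipaddress
import socket

def all_private(addresses):
    """True if every resolved address is private (e.g. RFC 1918),
    i.e. the private endpoint's DNS record is the one winning."""
    return all(ipaddress.ip_address(a).is_private for a in addresses)

def resolved_addresses(hostname):
    """Resolve a hostname to its set of IP address strings."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}

# Run from a machine inside the VNET, substituting your own
# factory's hostname (this example name is hypothetical):
# all_private(resolved_addresses("myfactory.eastus.datafactory.azure.net"))
```

If `all_private(...)` returns `False` from inside the VNET, traffic is still going to the public endpoint and your private DNS configuration needs another look.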
## Step 3: Managed VNET
Creating a managed VNET allows us to access our PaaS resources over private endpoints.
1. Click the Manage option on the left menu
2. Click Integration Runtimes
3. Click New
Choose Azure, Self-Hosted, then continue

Then click Azure and continue again

1. Give it a descriptive name; in this case I've named it managed-vnet
2. Enable Virtual network configuration
Then click create
1. Click Manage Private Endpoints
2. New
Choose Azure Data Lake Gen 2 (ADLS2)
Give it a name, choose your Azure Subscription and your Storage Account that you made during the initial deployment stage

Go to your storage account and approve the private endpoint

Go back to Linked Services, and click new

Choose Azure Data Lake Storage Gen2
For *Connect via integration runtime*, select the managed VNET integration runtime that you created in the previous step from the dropdown
If successful, this should show up and your private endpoint should show as approved
Test the connection

This approach authenticates with an account key. **If** you have Owner access to your subscription, you can instead grant access via managed identity by configuring Access Control (IAM), and then select Managed Identity in the steps above.
1) Go to integration runtimes
2) Click new
Choose Azure, Self-Hosted

Under Network Environment select Self-Hosted and then click continue

In this case I've named it self-hosted-vm

Follow either option 1 or option 2. I used option 1: I logged onto the VM, installed the integration runtime application, and registered it with one of the keys.

Now you can see that the status of the integration runtime on my self-hosted VM is Running

You will need to use your VM in a few steps, so don't log off yet
Add a new linked service again by going to linked services, new

Search for file and choose File System

Under Connect via integration runtime, choose your self-hosted-vm

On your VM, create a new folder on one of your drives
1) Create a new file in that folder
2) Copy the path of the folder
Under *Host*, paste the folder path

Enter your username and password that you set in the initial ARM resource template deployment stage

As a best practice, you can create a linked service to Azure Key Vault:
1) Linked Services
2) New
Select Key Vault

You would choose your Key Vault that you created during the deployment stage here
This can then be used for secrets instead of inputting your password for the self-hosted VM in the previous stage
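For reference, a file-system linked service that pulls its password from Key Vault looks roughly like this in the factory's JSON definition. All names here (`self-hosted-vm`, `AzureKeyVault1`, `vm-password`, the host path) are illustrative placeholders for the values you chose above:

```json
{
  "name": "FileServerLinkedService",
  "properties": {
    "type": "FileServer",
    "connectVia": {
      "referenceName": "self-hosted-vm",
      "type": "IntegrationRuntimeReference"
    },
    "typeProperties": {
      "host": "D:\\data\\incoming",
      "userId": "adfuser",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "AzureKeyVault1",
          "type": "LinkedServiceReference"
        },
        "secretName": "vm-password"
      }
    }
  }
}
```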
Go to your storage account and create a new container

## Step 4: Creating connections for the pipeline
1) Click on Pipelines in the side menu on the left
2) Click the plus button
3) Choose datasets
Choose Azure Data Lake Gen 2

For format choose DelimitedText/CSV

Choose your linked service for your Azure Storage Account

Next you can click on the browse button
Choose your container that you previously created

For the import schema, click on None

Next do the same for your file server on the VM

Search for file and choose File System

Again choose DelimitedText/CSV

You can then choose your linked service for your file server

Browse for the file path

Select the file you created

Change the import schema to none

## Step 5: Create a new pipeline
All the stages before this were primarily setup. This is the most customisable step of the process: this pipeline will simply copy a file from your virtual machine to your container in Azure Blob Storage.
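Conceptually, the Copy Data activity you are about to configure does the following, sketched here locally with temporary folders standing in for the VM's file share (source) and the storage container (sink):

```python
import shutil
import tempfile
from pathlib import Path

# Stand-ins for the two datasets: a "VM folder" (the File System
# dataset) and a "container" (the ADLS Gen2 dataset). These local
# paths are illustrative only; the real pipeline moves data between
# the self-hosted IR's file share and blob storage.
workdir = Path(tempfile.mkdtemp())
vm_folder = workdir / "vm-share"
container = workdir / "container"
vm_folder.mkdir()
container.mkdir()

# The file you created on the VM in Step 3 (sample contents).
source_file = vm_folder / "sample.csv"
source_file.write_text("id,name\n1,alice\n2,bob\n")

# The Copy Data activity, reduced to its essence: read from the
# source dataset, write the same bytes to the sink dataset.
shutil.copy2(source_file, container / source_file.name)

print((container / "sample.csv").read_text())
```

The real activity adds retries, throughput settings, and format handling on top of this, but the data flow is the same: source dataset in, sink dataset out.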
First, drag *Copy Data* onto your canvas

At the bottom of the page:
1) Click Source
2) Choose DelimitedText2 (the Virtual Machine dataset)
Then:
1) Click Sink
2) Choose DelimitedText1 (the Storage Account dataset)
Click Debug

It should then show the status as Succeeded

## Complete! I hope you all enjoyed it.
Feel free to create an issue on this GitHub page if you run into any problems setting it up, or message me directly on [LinkedIn](https://www.linkedin.com/in/salmanmkc/) or by email at [email protected]