{"id":20697409,"url":"https://github.com/epomatti/azure-ml-private-endpoints","last_synced_at":"2026-05-05T01:33:11.198Z","repository":{"id":227775001,"uuid":"771455315","full_name":"epomatti/azure-ml-private-endpoints","owner":"epomatti","description":"Azure ML workspace with a managed VNET and private endpoints","archived":false,"fork":false,"pushed_at":"2024-07-10T20:54:12.000Z","size":541,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-17T18:34:36.269Z","etag":null,"topics":["aml","azure","azure-machine-learning","azure-ml","azureml","machine-learning","terraform"],"latest_commit_sha":null,"homepage":"","language":"HCL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epomatti.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-13T10:31:51.000Z","updated_at":"2024-07-10T20:54:15.000Z","dependencies_parsed_at":"2024-03-18T03:22:48.266Z","dependency_job_id":"23591355-2ad0-446d-9a3a-f28ca525c808","html_url":"https://github.com/epomatti/azure-ml-private-endpoints","commit_stats":null,"previous_names":["epomatti/azure-ml","epomatti/azure-ml-private-endpoints"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epomatti%2Fazure-ml-private-endpoints","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epomatti%2Fazure-ml-private-endpoints/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epomatti%2Fazure-ml-private-end
points/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epomatti%2Fazure-ml-private-endpoints/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epomatti","download_url":"https://codeload.github.com/epomatti/azure-ml-private-endpoints/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242961747,"owners_count":20213315,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aml","azure","azure-machine-learning","azure-ml","azureml","machine-learning","terraform"],"created_at":"2024-11-17T00:17:54.219Z","updated_at":"2025-12-24T01:43:30.629Z","avatar_url":"https://github.com/epomatti.png","language":"HCL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Azure ML with Private Endpoints\n\nAzure Machine Learning workspace security features, deployed to a managed VNET, with private datastores via private endpoints, as well as an optional workspace private endpoint.\n\n\u003cimg src=\".assets/aml-architecture.png\" /\u003e\n\n## 1 - Setup\n\nCopy the template `.auto.tfvars` configuration file:\n\n\u003e [!IMPORTANT]\n\u003e A public workspace has some [limitations](#issues-and-limitations) when connecting to private resources, which need extra configuration when using private datastores.\n\n```sh\ncp config/template.tfvars .auto.tfvars\n```\n\nSet the `allowed_ip_address` to allow connectivity to Azure.\n\n(Optional) If using a public compute, generate an SSH key pair to be used for connection:\n\n```sh\nmkdir keys\nssh-keygen -f 
keys/ssh_key\n```\n\n## 2 - Apply\n\n### Step 1 - Create the project\n\nApply the resources:\n\n```sh\nterraform init\nterraform apply -auto-approve\n```\n\nOnce all resources are created, proceed to Step 2.\n\nFor a managed VNET setup, there are three isolation modes:\n\n| Isolation Mode | Terraform value |\n|-----|-----|\n| Disabled | `Disabled` |\n| Allow Internet Outbound | `AllowInternetOutbound` |\n| Allow Only Approved Outbound | `AllowOnlyApprovedOutbound` |\n\n\u003e [!IMPORTANT]\n\u003e Using `AllowOnlyApprovedOutbound` will create an Azure Firewall with the `Standard` SKU, which adds significant cost.\n\nThe workspace will be created with `AllowInternetOutbound`. Configure outbound access in the [managed VNET][1] using your preferred interface by adding rules for the data lake and the SQL database; this enables secure outbound access via private endpoints.\n\n\u003e [!IMPORTANT]\n\u003e A Container Registry with the `Premium` SKU is required for private endpoints.\n\n### Step 2 - Create the AML Compute Instance\n\n\u003e [!NOTE]\n\u003e Due to limitations in the Terraform provider, the compute steps use the Azure CLI.\n\n\u003e [!TIP]\n\u003e By default, the managed VNET is created along with the compute; the command below provisions it in advance. Afterwards, the private endpoints should be either active or awaiting approval.\n\n```sh\naz ml workspace provision-network -g rg-litware -n \u003cmy_workspace_name\u003e --include-spark\n```\n\nCreate the compute instance using the [CLI v2 YAML schema][12]:\n\n\u003e [!WARNING]\n\u003e The current YAML schema does not allow configuring public access, except for SSH. 
Make sure to pass `--enable-node-public-ip false` for increased security, as the default is `true`.\n\n```sh\naz ml compute create \\\n    --file compute.yml \\\n    --resource-group my-resource-group \\\n    --workspace-name my-workspace \\\n    --enable-node-public-ip false\n```\n\nTo complete the process, a private endpoint must be manually approved when the compute is created. I assume this endpoint is required so that the instances can communicate with the workspace.\n\n\u003e [!IMPORTANT]\n\u003e Execution halts until the endpoint is manually approved, so watch for the approval request.\n\n\u003cimg src=\".assets/aml-compute-approval.png\" width=700 /\u003e\n\n## 3 - Outbound rules\n\nOnce all resources are created, the data stores must be added as outbound rules so they can be used securely over private connections.\n\nTo use the CLI, install the [ML extension][7]:\n\n```sh\naz extension add -n ml\n```\n\nThen create the rules:\n\n```sh\naz ml workspace outbound-rule set -g rg-litware \\\n    --rule datalake \\\n    --type private_endpoint \\\n    --workspace-name mlw-litware\u003cabc123\u003e \\\n    --service-resource-id \u003cID\u003e \\\n    --spark-enabled true \\\n    --subresource-target dfs\n```\n\nAlternatively, it may be easier to use the Portal:\n\n\u003cimg src=\".assets/aml-outbound-rules.png\" /\u003e\n\nSome private endpoints may need to be approved manually, as in this example for the SQL Server:\n\n\u003cimg src=\".assets/aml-outbound-pe.png\" /\u003e\n\n## 4 - Datastores\n\nNext, connect the data sources to the AML workspace. These connections should go through private endpoints. Connecting datastores is documented in [this page][3] and [this article][4].\n\n👉 **Create a secret** for the pre-created Application Registration in Entra ID; it can be used to set up connections to the data lake. 
Optionally, it can also be used for the SQL Server, but that requires an external authentication setup that is not covered here; SQL authentication is enough for this demo.\n\n👉 **Register the datastores** to connect to the data resources securely and productively.\n\nOnce registered, the datastores can be used from notebooks. The following example uses SDK v1 to download a file from Blob Storage.\n\n```python\nimport os\nfrom azureml.core import Workspace, Datastore\n\n# Load the workspace from its config.json file\nws = Workspace.from_config()\n\n# Download contacts.csv from the registered 'blobs' datastore into ./output\ndatastore = Datastore.get(ws, datastore_name='blobs')\ndatastore.download(target_path=\"./output\", prefix=\"contacts.csv\", overwrite=False)\n\nprint(os.listdir('./output'))\n\n# Print the downloaded file, closing it afterwards\nwith open(\"./output/contacts.csv\", \"r\") as f:\n    print(f.read())\n```\n\nSDK v2 is the preferred option for [workspace][10] and [data][11] operations.\n\n### SQL Server Service Principal permissions\n\n#### SQL Server\n\nCreate the login from an external provider:\n\n```sql\nUSE master\nCREATE LOGIN [datastores-litware123] FROM EXTERNAL PROVIDER\nGO\n```\n\nCheck the server login:\n\n```sql\nSELECT name, type_desc, type, is_disabled\nFROM sys.server_principals\nWHERE type_desc LIKE 'external%'\n```\n\n#### SQL Database\n\nCreate the database user associated with the external login:\n\n```sql\nCREATE USER [datastores-litware123] FROM LOGIN [datastores-litware123]\nGO\n```\n\nCheck the database user:\n\n```sql\nSELECT name, type_desc, type\nFROM sys.database_principals\nWHERE type_desc LIKE 'external%'\n```\n\nAdd the necessary permissions to the user:\n\n```sql\nALTER ROLE db_datareader ADD MEMBER [datastores-litware123];\nALTER ROLE db_datawriter ADD MEMBER [datastores-litware123];\nALTER ROLE db_ddladmin ADD MEMBER [datastores-litware123];\nGO\n```\n\n## 5 - Private AML workspace setup\n\nThe most secure architecture for AML would be a private AML workspace, meaning that the workspace would be accessible only via a private 
endpoint, and that the dependent resources and data stores would also be accessible only via private connections.\n\nTo enable private access for this project, set these variables as follows:\n\n```terraform\nmlw_public_network_access_enabled = false\nmlw_create_private_endpoint_flag  = true\nvm_create_flag                    = true\n```\n\nThen `apply` the configuration. Once applied, the AML workspace should be accessible only from the VM.\n\nService endpoints should already be created for the datastores, so the next step is to disable public access to the storage accounts and databases, making the architecture fully private.\n\nThe AML component resources should also be made private where possible. For example, in this use case the workspace storage must be reachable by users in the private network, but not from the internet.\n\n## Firewall costs\n\nAccording to the Microsoft [documentation][8], when FQDN outbound rules are used, an Azure Firewall with the `Standard` SKU is created and its costs are added to your bill.\n\n\u003e FQDN outbound rules are implemented using Azure Firewall. If you use outbound FQDN rules, charges for Azure Firewall are added to your billing. The Azure Firewall (standard SKU) is provisioned by Azure Machine Learning.\n\nTo avoid this, one option is to use a customer-managed VNET, which is also the [recommended option][9].\n\n## Issues and limitations\n\nThere are some [limitations][2] when using public access that require special configuration. I've opened [this thread][5] asking Microsoft to document which combinations are actually valid or invalid, and what additional configuration is required.\n\nI've also run into [this issue][6], where creating a `Dataset` does not work.\n\nNote that users of Azure ML Studio must have line of sight to the workspace storage. 
Here is an example with the console traces demonstrating that:\n\n\u003cimg src=\".assets/aml-line-of-sight.png\" /\u003e\n\n---\n\n### Clean-up\n\nDelete the resources and avoid unplanned costs:\n\n```sh\nterraform destroy -auto-approve\n```\n\n[1]: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-managed-network?view=azureml-api-2\u0026tabs=azure-cli\n[2]: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-private-link?view=azureml-api-2\u0026tabs=cli#enable-public-access\n[3]: https://learn.microsoft.com/en-us/AZURE/machine-learning/how-to-access-data?view=azureml-api-1\n[4]: https://k21academy.com/microsoft-azure/dp-100/datastores-and-datasets-in-azure/\n[5]: https://github.com/MicrosoftDocs/azure-docs/issues/120843\n[6]: https://stackoverflow.com/q/78176515/3231778\n[7]: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-cli?view=azureml-api-2\u0026tabs=public\n[8]: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-managed-network?view=azureml-api-2\u0026tabs=azure-cli#pricing\n[9]: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-network-isolation-planning?view=azureml-api-2#recommended-architecture-use-your-azure-vnet\n[10]: https://learn.microsoft.com/en-us/azure/machine-learning/migrate-to-v2-resource-workspace?view=azureml-api-2\n[11]: https://learn.microsoft.com/en-us/azure/machine-learning/migrate-to-v2-assets-data?view=azureml-api-2\n[12]: https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-compute-instance?view=azureml-api-2\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepomatti%2Fazure-ml-private-endpoints","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepomatti%2Fazure-ml-private-endpoints","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepomatti%2Fazure-ml-private-endpoints/lists"}