{"id":20697412,"url":"https://github.com/epomatti/azure-machinelearning-cm-vnet","last_synced_at":"2025-03-11T02:50:25.487Z","repository":{"id":228682778,"uuid":"774145049","full_name":"epomatti/azure-machinelearning-cm-vnet","owner":"epomatti","description":"Azure ML with customer-managed VNET","archived":false,"fork":false,"pushed_at":"2024-06-29T00:20:03.000Z","size":541,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-17T18:34:37.456Z","etag":null,"topics":["aml","azure","azure-machine-learning","azure-ml","forward-proxy","machine-learning","nginx","proxy","squid","squid-proxy","terraform"],"latest_commit_sha":null,"homepage":"","language":"HCL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epomatti.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-19T02:42:02.000Z","updated_at":"2024-06-29T16:04:42.000Z","dependencies_parsed_at":"2024-04-15T20:43:31.320Z","dependency_job_id":"3b105188-4f06-4b18-a6f9-ea08c71d3cba","html_url":"https://github.com/epomatti/azure-machinelearning-cm-vnet","commit_stats":null,"previous_names":["epomatti/azure-ml-vnet","epomatti/azure-machinelearning-cm-vnet"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epomatti%2Fazure-machinelearning-cm-vnet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epomatti%2Fazure-machinelearning-cm-vnet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/epomatti%2Fazure-machinelearning-cm-vnet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epomatti%2Fazure-machinelearning-cm-vnet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epomatti","download_url":"https://codeload.github.com/epomatti/azure-machinelearning-cm-vnet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242961747,"owners_count":20213315,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aml","azure","azure-machine-learning","azure-ml","forward-proxy","machine-learning","nginx","proxy","squid","squid-proxy","terraform"],"created_at":"2024-11-17T00:17:55.880Z","updated_at":"2025-03-11T02:50:25.462Z","avatar_url":"https://github.com/epomatti.png","language":"HCL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Azure ML VNET\n\nImplementation of [AML network isolation][1] with a customer-managed VNET.\n\n\u003cimg src=\".assets/aml.png\" /\u003e\n\n## Setup\n\nCreate the variables file:\n\n```sh\ncp config/template.tfvars .auto.tfvars\n```\n\nConfiguration:\n\n1. Set your IP address in the `allowed_ip_address` variable.\n2. 
Set your Entra ID tenant domain in the `entraid_tenant_domain` variable.\n\nGenerate a key pair to manage instances with SSH:\n\n```sh\nmkdir keys\nssh-keygen -f keys/ssh_key\n```\n\n\u003e [!TIP]\n\u003e To allow public connections to the AML workspace, set `mlw_public_network_access_enabled = true`.\n\n\nCreate the resources:\n\n```sh\nterraform init\nterraform apply -auto-approve\n```\n\nConfirm and approve any private endpoints, both in the subscription and within the managed AML workspace.\n\nManually create the datastores in AML and run the test notebooks.\n\n## Compute\n\nCreate the AML compute and other resources by changing the appropriate flags:\n\n\u003e [!NOTE]\n\u003e Follow the [documentation][2] steps to enable AKS VNET integration, if you have not done so already.\n\n```terraform\nmlw_instance_create_flag = true\nmlw_aks_create_flag      = true\nmlw_mssql_create_flag    = true\n```\n\n## Container Registry\n\nExtra configuration is required when using a Container Registry with private endpoints.\n\nAfter creating the compute node, follow the [documentation][6] to enable Docker builds in AML:\n\n```sh\naz ml workspace update --name myworkspace --resource-group myresourcegroup --image-build-compute mycomputecluster\n```\n\n## IAM\n\nThis project has two roles, which require different sets of permissions:\n\n| User | Activities |\n|------|------------|\n| `azureadmin` | Administration of all related Azure resources. |\n| `datascientist` | Development in the AML workspace. |\n\n## Firewall\n\nTo demonstrate protection against data exfiltration, this exercise implements Azure Firewall. 
The requirements for this design are documented in the [Configure inbound and outbound network traffic][8] article.\n\n\u003e [!IMPORTANT]\n\u003e Additional steps for hardening the data exfiltration protection are available in the [Azure Machine Learning data exfiltration prevention][9] documentation.\n\n\nSet the flag to enable the Azure Firewall resources and `apply` the infrastructure:\n\n```terraform\nfirewall_create_flag = true\n```\n\nThis will create the firewall, policies, rules, routes, and other resources.\n\n\u003e [!TIP]\n\u003e It's also possible to get a list of hosts and ports by following this [guideline][10].\n\n## Forward Proxy\n\n\u003e [!CAUTION]\n\u003e It was not possible to configure a forward proxy on instance creation (with a creation script) when deploying to an isolated Virtual Network. It seems that the provisioning procedure overrides the proxy configuration from the startup script. The only official architecture supported by Microsoft with network isolation seems to be using a firewall for egress.\n\n### Enable Proxy\n\nSet the proxy flag to `true`:\n\n```terraform\nvm_proxy_create_flag = true\n```\n\nConfigure the compute instance with the sample file [custom/instance-proxy-init.sh](./custom/instance-proxy-init.sh).\n\nThe proxy connection will be configured on init, following the [proxy documentation][7].\n\n\n### Squid\n\nConnect to the proxy VM server:\n\n```sh\nssh -i keys/ssh_key azureuser@\u003cpublic-ip\u003e\n```\n\nSquid will already be installed via `cloud-init`. 
If you need to make changes, check the [official docs][5].\n\nConfiguration is set in the file `/etc/squid/squid.conf`.\n\nSet some hostname parameters:\n\n```\nvisible_hostname squid.private.litware.com\nhostname_aliases squid.private.litware.com\n```\n\nChange the `http_access` setting to allow all connections:\n\n```\n# http_access deny !Safe_ports\nhttp_access allow all\n```\n\nRestart the service:\n\n```sh\nsudo systemctl restart squid.service\n```\n\nTest with the default configuration:\n\n```sh\ncurl -x \"http://squid.private.litware.com:3128\" \"https://example.com/\"\n```\n\n### NGINX\n\n\u003e [!NOTE]\n\u003e According to this [thread][4], running NGINX as a full proxy with HTTPS requires additional configuration steps.\n\nConnect to the proxy server:\n\n```sh\nssh -i keys/ssh_key azureuser@\u003cpublic-ip\u003e\n```\n\nI've used [this article][3] as a reference to set up the forward proxy server on NGINX.\n\n1. Comment out the default server config within `/etc/nginx/sites-enabled/default`.\n2. Create the [nginx/forward][nginx/forward] config file.\n3. 
Restart NGINX (`systemctl restart nginx.service`).\n\nThe forward proxy service should be available at port `8888`.\n\n```sh\ncurl -x \"http://127.0.0.1:8888\" \"https://example.com/\"\n```\n\n---\n\n### Clean-up\n\nDelete the resources and avoid unplanned costs:\n\n```sh\nterraform destroy -auto-approve\n```\n\n[1]: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-network-isolation-planning?view=azureml-api-2#recommended-architecture-use-your-azure-vnet\n[2]: https://learn.microsoft.com/en-us/azure/aks/api-server-vnet-integration\n[3]: https://www.baeldung.com/nginx-forward-proxy\n[4]: https://serverfault.com/a/1090581/560797\n[5]: https://ubuntu.com/server/docs/how-to-install-a-squid-server\n[6]: https://docs.microsoft.com/azure/machine-learning/how-to-secure-workspace-vnet#enable-azure-container-registry-acr\n[7]: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-secure-workspace-vnet?view=azureml-api-2\u0026tabs=required%2Cpe%2Ccli#required-public-internet-access\n[8]: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-access-azureml-behind-firewall?view=azureml-api-2\u0026tabs=ipaddress%2Cpublic\n[9]: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-prevent-data-loss-exfiltration?view=azureml-api-2\u0026tabs=servicetag\n[10]: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-access-azureml-behind-firewall?view=azureml-api-2\u0026tabs=ipaddress%2Cpublic#dependencies-api\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepomatti%2Fazure-machinelearning-cm-vnet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepomatti%2Fazure-machinelearning-cm-vnet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepomatti%2Fazure-machinelearning-cm-vnet/lists"}