{"id":20697478,"url":"https://github.com/epomatti/az-data-services","last_synced_at":"2026-04-17T16:32:00.244Z","repository":{"id":198246230,"uuid":"697983952","full_name":"epomatti/az-data-services","owner":"epomatti","description":"End-to-end scenario for Azure data services.","archived":false,"fork":false,"pushed_at":"2023-11-07T18:12:13.000Z","size":362,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-11T02:51:13.305Z","etag":null,"topics":["azure","data","data-engineering","databricks","datalake","lake","synapse","terraform"],"latest_commit_sha":null,"homepage":"","language":"HCL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epomatti.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-28T22:15:35.000Z","updated_at":"2023-11-07T18:12:56.000Z","dependencies_parsed_at":"2024-11-17T00:20:56.796Z","dependency_job_id":"809a2045-a199-464a-a6f9-3bf50275fc4a","html_url":"https://github.com/epomatti/az-data-services","commit_stats":null,"previous_names":["epomatti/az-data-services-demo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/epomatti/az-data-services","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epomatti%2Faz-data-services","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epomatti%2Faz-data-services/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epomatti%2Faz-data-services/releases","manifests_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/epomatti%2Faz-data-services/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epomatti","download_url":"https://codeload.github.com/epomatti/az-data-services/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epomatti%2Faz-data-services/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31936567,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-17T12:37:54.787Z","status":"ssl_error","status_checked_at":"2026-04-17T12:37:25.095Z","response_time":62,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure","data","data-engineering","databricks","datalake","lake","synapse","terraform"],"created_at":"2024-11-17T00:18:14.600Z","updated_at":"2026-04-17T16:32:00.229Z","avatar_url":"https://github.com/epomatti.png","language":"HCL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Azure - Data Services\n\nIn this demo, a solution named Databoss is used to connect and apply Azure data services.\n\nThis is the high-level design with the main components and the data flow:\n\n\u003cimg src=\".assets/azure-data.png\" /\u003e\n\nThis project is implemented almost fully within a private network architecture, making use of Private Link and Service Endpoints to securely connect to resources.\n\n\u003cimg src=\".assets/azure-data-network.png\" 
/\u003e\n\n## Infrastructure\n\n### 🚀 1 - Azure resources creation\n\nCopy the `.auto.tfvars` template:\n\n```sh\ncp templates/template.tf .auto.tfvars\n```\n\nCheck your public IP address, which must be added to the firewall allow rules:\n\n```sh\ndig +short myip.opendns.com @resolver1.opendns.com\n```\n\nAdd your public IP address to the `public_ip_address_to_allow` variable.\n\nApply and create the Azure infrastructure:\n\n```sh\nterraform init\nterraform apply -auto-approve\n```\n\nPause the Synapse SQL pool to avoid costs while setting up the infrastructure:\n\n```sh\naz synapse sql pool pause -n pool1 --workspace-name synw-databoss -g rg-databoss\n```\n\nOnce the `apply` phase is complete, approve the managed private endpoints for ADF:\n\n```sh\nbash scripts/approveManagedPrivateEndpoints.sh\n```\n\n💡 A single connection to Databricks is required to create the access policies on Azure Key Vault.\n\nIf everything is OK, proceed to the next section.\n\n### 💾 2 - Data setup\n\nUpload some test data:\n\n```sh\nbash scripts/uploadFilesToDataLake.sh\nbash scripts/uploadFilesToExternalStorage.sh\n```\n\nRun the ADF pipeline to import data from the external storage into the data lake:\n\n```sh\naz datafactory pipeline create-run \\\n    --resource-group rg-databoss \\\n    --name Adfv2CopyExternalFileToLake \\\n    --factory-name adf-databoss\n```\n\n### 🟦 3 - Synapse\n\nIf you've paused the Synapse pool, `resume` it:\n\n```sh\naz synapse sql pool resume -n pool1 --workspace-name synw-databoss -g rg-databoss\n```\n\nCreate the template scripts in Synapse:\n\n```sh\nbash scripts/createSynapseSQLScripts.sh\n```\n\nNow, connect to the Synapse Web UI or directly to the SQL endpoint and execute the scripts.\n\n\n### 🧰 4 - Databricks cluster configuration\n\nThe previous Azure run should have created the `databricks/.auto.tfvars` file to configure Databricks.\n\nApply the Databricks configuration:\n\n\u003e 💡 If you haven't yet, you need to log in to Databricks, which will create Key 
Vault policies.\n\n```sh\nterraform -chdir=\"databricks\" init\nterraform -chdir=\"databricks\" apply -auto-approve\n```\n\nCheck the workspace files, run the test notebooks, and make sure that connectivity works.\n\n\n### 🗲 5 - Function\n\n#### Deployment\n\nDeployment command:\n\n```sh\nfunc azure functionapp publish \u003cFunctionAppName\u003e\n```\n\n#### Local Development\n\nCreate the virtual environment:\n\n```sh\npython -m venv venv\n. venv/bin/activate\npip install -r requirements.txt\n\ndeactivate\n```\n\nStart the function:\n\n```sh\nfunc start\n```\n\nGet the Service Bus connection string:\n\n```sh\naz servicebus namespace authorization-rule keys list -n RootManageSharedAccessKey --namespace-name bus-databoss -g rg-databoss\n```\n\nCreate the `local.settings.json` file:\n\n```json\n{\n  \"IsEncrypted\": false,\n  \"Values\": {\n    \"FUNCTIONS_WORKER_RUNTIME\": \"python\",\n    \"AzureWebJobsFeatureFlags\": \"EnableWorkerIndexing\",\n    \"AzureWebJobsStorage\": \"\",\n    \"AzureWebJobsServiceBusConnectionString\": \"\"\n  }\n}\n```\n\n## Extra subjects\n\n- Consume IP addresses\n- Internal runtime\n- Code repository\n- AD permissions\n- Azure Monitor (Logs, Insights)\n- Enable IR interactive authoring\n\n## 🧹 Clean-up\n\nDelete the Databricks configuration:\n\n```sh\nterraform -chdir=\"databricks\" destroy -auto-approve\n```\n\nDelete the Azure infrastructure:\n\n```sh\nterraform destroy -auto-approve\n```\n\n## Reference\n\n- [Tutorial: ADLSv2, Azure Databricks \u0026 Spark](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-databricks-spark)\n- [ADF Private Endpoints](https://learn.microsoft.com/en-us/azure/data-factory/managed-virtual-network-private-endpoint#managed-private-endpoints)\n- [Integration runtime in Azure Data Factory](https://learn.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime)\n- [Connect to Azure Data Lake Storage Gen2 and Blob 
Storage](https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage)\n- [Azure Databricks: Manage service principals](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals)\n- [Azure Databricks: Query data in Azure Synapse Analytics](https://learn.microsoft.com/en-us/azure/databricks/external-data/synapse-analytics)\n- [Azure Synapse: Azure Private Link Hubs](https://learn.microsoft.com/en-us/azure/synapse-analytics/security/synapse-private-link-hubs)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepomatti%2Faz-data-services","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepomatti%2Faz-data-services","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepomatti%2Faz-data-services/lists"}