{"id":27968969,"url":"https://github.com/madetech/databricks_sandbox","last_synced_at":"2026-02-17T18:32:46.606Z","repository":{"id":285765427,"uuid":"959275036","full_name":"madetech/databricks_sandbox","owner":"madetech","description":"Reusable Terraform modules to provision simple, cost-effective Databricks sandbox environments in Azure. Designed for quick deployment, easy teardown, and alignment with secure reference architectures.","archived":false,"fork":false,"pushed_at":"2025-05-06T10:29:57.000Z","size":214,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-09T03:38:45.845Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HCL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/madetech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-02T14:25:37.000Z","updated_at":"2025-06-06T13:31:06.000Z","dependencies_parsed_at":"2025-04-16T17:31:04.192Z","dependency_job_id":"e9c342e6-34ce-4f96-a4dd-3fd7b017297f","html_url":"https://github.com/madetech/databricks_sandbox","commit_stats":null,"previous_names":["madetech/databricks_sandbox"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/madetech/databricks_sandbox","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madetech%2Fdatabricks_sandbox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madetech%2Fdatabricks_sandbox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/madetech%2Fdatabricks_sandbox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madetech%2Fdatabricks_sandbox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/madetech","download_url":"https://codeload.github.com/madetech/databricks_sandbox/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madetech%2Fdatabricks_sandbox/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29552795,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-17T18:16:07.221Z","status":"ssl_error","status_checked_at":"2026-02-17T18:16:04.782Z","response_time":100,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-07T21:07:55.925Z","updated_at":"2026-02-17T18:32:41.590Z","avatar_url":"https://github.com/madetech.png","language":"HCL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Databricks Sandbox Terraform Module\nReusable Terraform modules to provision simple, cost-effective Databricks sandbox environments in AWS. Designed for quick deployment, easy teardown, and alignment with secure reference architectures.\n\n## 1. 
How it works:\n### Secure Reference Architecture\nThis project aligns with the official Databricks Secure Reference Architecture (SRA):\n\n[Databricks Terraform SRA – Azure](https://github.com/databricks/terraform-databricks-sra)\n\n## 2. What this deploys:\n\nThis project provides a modular set of Terraform templates to deploy **Databricks sandbox environments on AWS**. It's designed with simplicity, cost-efficiency, and reusability in mind, enabling data engineers at Made Tech to quickly spin up and tear down sandbox workspaces with minimal overhead.\n\n### It provisions:\n1. A Databricks workspace in AWS\n2. Secure networking: VPC, subnets, NAT gateway, route tables\n3. IAM roles and cross-account trust policies\n4. KMS encryption for managed and workspace data\n5. S3 buckets for root and Unity Catalog storage (with versioning)\n6. Optional Unity Catalog metastore (toggleable)\n7. A shared compute cluster for testing\n8. Unity Catalog external location, storage credential, and catalog setup\n\n## 3. Prerequisites\n\n- [ ] AWS SSO access to the Made Tech sandbox account\n- [ ] A Databricks **account ID** and a **service principal**\n- [ ] Terraform CLI (v1.3+)\n- [ ] AWS CLI with SSO support (`aws sso login`)\n- [ ] Databricks Terraform provider (auto-installed)\n\n## 4. Project Structure\n```bash\n.\n├── environments/\n│   ├── main.tf\n│   ├── variables.tf\n│   ├── outputs.tf\n│   ├── providers.tf\n│   └── terraform.tfvars         # Local only, do NOT commit\n├── .env                         # a secrets file (gitignored)\n├── .env.example                 # Template for others to copy\n├── .github/workflows/deploy.yml\n└── README.md\n```\n----------\n## 5. Deploying your Databricks Sandbox\nThis project uses GitHub Actions to automate provisioning of Databricks sandboxes in AWS using Terraform. All secrets are inserted securely via GitHub Secrets, and deployments are triggered through Pull Requests.\n### Step 1 Getting Started:\n1. 
Set up your .env file\nTo set up personal secrets (locally), copy the example file:\n```bash\ncp .env.example .env\n```\n2. Fill in your databricks-client-id, databricks-client-secret and your email:\n```bash\nTF_VAR_admin_user=your.email@madetech.com\nTF_VAR_client_id=your-databricks-client-id\nTF_VAR_client_secret=your-databricks-client-secret\n```\n3. Configure terraform.tfvars\nEdit environments/terraform.tfvars with your sandbox-specific values:\n```hcl\nresource_prefix       = \"sandbox-\u003cyourname\u003e\" # Provisioned resources will be based on your resource_prefix (e.g. sandbox-alex-cluster)\nmetastore_exists      = false # First time false, change to true after UC is created once\n```\nNotes:\n  * resource_prefix must be unique across the AWS account to avoid naming collisions.\n  * If you're the first person creating Unity Catalog for the week, set metastore_exists = false.\n  * Terraform will then:\n      * Create the Unity Catalog metastore\n      * Configure the root storage bucket and KMS key\n      * Set up external locations, storage credentials, and default catalog\n------\n### Step 2 Deploy via GitHub Actions:\n4. Create a new feature branch named feat/\u003cinitials\u003e_sandbox (e.g. feat/za_sandbox):\n```bash\ngit checkout -b feat/\u003cinitials\u003e_sandbox\n```\n5. Make sure the .env file is listed in .gitignore, then commit and push your changes:\n```bash\ngit add environments/terraform.tfvars\ngit commit -m \"Add sandbox for \u003cyourname\u003e\"\ngit push --set-upstream origin feat/\u003cinitials\u003e_sandbox\n```\n6. Open a pull request to main, but do not merge it!\nThis will:\n* Trigger the deploy workflow\n* Use your .tfvars + GitHub Secrets\n* Run terraform init, plan, and apply\n* Provision your sandbox automatically\n----------\n### Step 3 Final — Verify Your Environment:\n9. 
Go to the Pull Request → Actions tab\n    * Confirm that terraform plan and terraform apply both ran without errors.\n10. Log in to Databricks and select your workspace, which should be named using your:\n    * workspace_name (from terraform.tfvars)\n    * and resource_prefix\n11. Inside the workspace:\n    * Go to Compute to check your shared cluster\n    * Go to Data → Unity Catalog to verify the catalog and schemas\n\n## If everything looks good, your sandbox is now live.\n------------\n## Destroying the Sandbox (Manual)\nTo destroy your sandbox environment, especially if you encounter any \"already exists\" errors during deployment, you have two options:\n1. Manually delete the resources in the AWS Console, or\n2. Run a local Terraform destroy using the steps below.\n\nTo destroy locally:\n1. Navigate to your environment folder:\n```bash\ncd environments\n```\n2. Run the following commands:\n```bash\nterraform init -reconfigure\nterraform destroy -auto-approve\n```\nThis will:\n* Initialize Terraform using the remote S3 backend.\n* Destroy all provisioned resources using your remote state (environments/terraform.tfstate).\n\n3. Cleanup\n\u003e **Note:** Running `terraform destroy -auto-approve` will remove most resources, but you may still see leftovers due to dependency chains or provider quirks. 
When that happens, you will have to go digging in AWS and remove the dependency, or series of dependencies, manually (they are always listed, so it's obvious which ones, but you may need to wait until they shut down).\n\n## Notes for Contributors\n* Do not commit your .env or terraform.tfvars files.\n* Validate all Terraform changes using terraform validate.\n* Format changes using terraform fmt.\n* Follow the standard GitHub PR workflow and use feature branches.\n* Errors may start cropping up at terraform plan, in which case troubleshooting will be required; read the errors carefully.\n\n## Errors\n* NOTE: External location and catalog can be manually imported via:\n```bash\nterraform import \\\n  module.sra.module.uc_catalog.databricks_external_location.workspace_catalog_external_location \\\n  sandbox-zeerak-catalog-29677xxxxxx-external-location\n\nterraform import \\\n  module.sra.module.uc_catalog.databricks_catalog.workspace_catalog \\\n  sandbox_zeerak_catalog_29677xxxxxx\n\nImported existing catalog: sandbox_zeerak_catalog_29677xxxxxx\nImported external location: sandbox-zeerak-catalog-29677xxxxxx-external-location\n```\n\n## Known Limitations\n* CloudTrail is not enabled automatically.\n* Public subnet routing is not included by default.\n* Not all workspace-level permissions (e.g. SQL access) are fully automated.\n\n## Support\nIf you encounter issues with Terraform state, network setup, or authentication, reach out in the #cop-cloud or #data-practice Slack channel, or tag/message me (@ZeerakAziz) for infrastructure-related questions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmadetech%2Fdatabricks_sandbox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmadetech%2Fdatabricks_sandbox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmadetech%2Fdatabricks_sandbox/lists"}