Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/databricks/terraform-databricks-lakehouse-blueprints
Set of Terraform automation templates and quickstart demos to jumpstart the design of a Lakehouse on Databricks. This project has incorporated best practices across the industries we work with to deliver composable modules to build a workspace to comply with the highest platform security and governance standards.
- Host: GitHub
- URL: https://github.com/databricks/terraform-databricks-lakehouse-blueprints
- Owner: databricks
- License: other
- Created: 2022-02-26T23:28:36.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-25T00:26:34.000Z (10 months ago)
- Last Synced: 2024-08-14T07:08:45.422Z (3 months ago)
- Topics: financial-services, hls, regulated-industry-blueprints, terraform
- Language: Python
- Homepage:
- Size: 2.42 MB
- Stars: 70
- Watchers: 9
- Forks: 31
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - databricks/terraform-databricks-lakehouse-blueprints - Set of Terraform automation templates and quickstart demos to jumpstart the design of a Lakehouse on Databricks. This project has incorporated best practices across the industries we work with to deli (Python)
README
# Deploy Your Lakehouse Architecture
## Purpose
This set of Terraform templates is designed to help industry practitioners and DevOps teams get started quickly, directly in their own environment, with the canonical security and governance best practices for regulated industries, plus high-value industry libraries and quickstarts.
![Lakehouse Blueprints](https://raw.githubusercontent.com/databricks/terraform-databricks-lakehouse-blueprints/main/Blueprints.jpg)
---
## Details on What Is Packaged
What's included in this sequence of Terraform modules?
### AWS | Azure
There are four main modules (1-4) that can be composed together. There is also a full end-to-end example of a workspace deployment with governance and industry quickstarts included; see `test_aws_full_lakehouse_example` for this version.
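The end-to-end example (item 5) passes two aliased `databricks` providers into the modules, `databricks.mws` for account-level operations and `databricks.workspace` for calls against the new workspace, but does not show how they are declared. The following is a minimal sketch of those declarations for the same root module. It assumes the AWS account console endpoint and reuses the username/password variables that the example already passes to its modules; adapt it to however your root module actually authenticates.

```hcl
# Sketch only: assumes databricks/databricks is listed in required_providers
# and that the var.* inputs are declared elsewhere in the root module.

# Account-level provider, passed to aws_customer_managed_vpc as databricks.mws
provider "databricks" {
  alias      = "mws"
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.databricks_account_id
  username   = var.databricks_account_username
  password   = var.databricks_account_password
}

# Workspace-level provider, passed to aws_fs_lakehouse as databricks.workspace;
# it targets the workspace URL output by aws_customer_managed_vpc
provider "databricks" {
  alias    = "workspace"
  host     = module.aws_customer_managed_vpc.workspace_url
  username = var.databricks_account_username
  password = var.databricks_account_password
}
```

Databricks now recommends OAuth service-principal credentials (`client_id`/`client_secret`) over username/password authentication, so treat the credential arguments above as placeholders.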
1. Creation of a Databricks-compliant VPC or VNet in `aws_base` or `azure_spoke_vnet` (AWS | Azure)
2. Platform security built into the workspace deployment in the `aws_customer_managed_vpc` and `azure_vnet_injected_databricks_workspace` modules (Private Link, VPC endpoints, and secure connectivity) (AWS | Azure)
3. Unity Catalog installation in the `aws_uc` and `azure_uc` modules (AWS | Azure)
4. Industry quickstarts with a sample job and pre-installed libraries for time series and Common Domain Models (see the `aws_fs_lakehouse` module)
5. A full end-to-end example on AWS that composes all of the modules above (shown below). This composed example is available in the examples folder (`test_full_aws_lakehouse_example`), and the same approach can be applied to the Azure modules.

```hcl
# 1. Databricks-compliant VPC, subnets, security group, and cross-account role
module "aws_base" {
  source                      = "../../modules/aws_base/"
  cidr_block                  = var.cidr_block
  tags                        = var.tags
  region                      = var.region
  databricks_account_password = var.databricks_account_password
  databricks_account_id       = var.databricks_account_id
  databricks_account_username = var.databricks_account_username
}

# Look up the VPC created above so its CIDR block can be subdivided
data "aws_vpc" "prod" {
  id = module.aws_base.vpc_id
}

# 2. Workspace deployment with Private Link, VPC endpoints, and secure connectivity
module "aws_customer_managed_vpc" {
  source                      = "../../modules/aws_customer_managed_vpc/"
  databricks_account_id       = var.databricks_account_id
  databricks_account_username = var.databricks_account_username
  databricks_account_password = var.databricks_account_password
  region                      = var.region
  relay_vpce_service          = var.relay_vpce_service
  workspace_vpce_service      = var.workspace_vpce_service
  # VPC endpoint subnet: 3 extra prefix bits, network number 3
  # (e.g. 10.4.0.0/16 -> 10.4.96.0/19)
  vpce_subnet_cidr            = cidrsubnet(data.aws_vpc.prod.cidr_block, 3, 3)
  vpc_id                      = module.aws_base.vpc_id
  subnet_ids                  = module.aws_base.subnets
  security_group_id           = module.aws_base.security_group[0]
  cross_account_arn           = module.aws_base.cross_account_role_arn

  # Account-level (MWS) provider alias, declared in the root module
  providers = {
    databricks = databricks.mws
  }

  depends_on = [module.aws_base]
}

# 3. Unity Catalog, associated with the new workspace
module "aws_uc" {
  source                      = "../../modules/aws_uc/"
  databricks_account_id       = var.databricks_account_id
  databricks_account_username = var.databricks_account_username
  databricks_account_password = var.databricks_account_password
  region                      = var.region
  # Keep the part after "/" in workspace_id (the workspace's own ID)
  workspaces_to_associate     = [split("/", module.aws_customer_managed_vpc.workspace_id)[1]]
  databricks_workspace_url    = module.aws_customer_managed_vpc.workspace_url
}

# 4. Industry quickstart: sample job and pre-installed libraries
module "aws_fs_lakehouse" {
  source                      = "../../modules/aws_fs_lakehouse/"
  workspace_url               = module.aws_customer_managed_vpc.workspace_url
  databricks_account_username = var.databricks_account_username
  databricks_account_password = var.databricks_account_password
  # Role name taken from the ARN (assumes the role has no IAM path)
  crossaccount_role_name      = split("/", module.aws_base.cross_account_role_arn)[1]
  allow_ip_list               = var.allow_ip_list
  use_ip_access_list          = var.use_ip_access_list

  # Workspace-level provider alias, declared in the root module
  providers = {
    databricks = databricks.workspace
  }

  depends_on = [module.aws_uc]
}
```

### Azure-Specific Changes
* Hub-and-spoke architecture with one Azure Databricks workspace created per spoke. The deployed infrastructure matches the design described in the Databricks blog post on [data exfiltration protection with Azure Databricks](https://www.databricks.com/blog/2020/03/27/data-exfiltration-protection-with-azure-databricks.html).
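In a hub-and-spoke layout, each spoke VNet hosts one VNet-injected workspace and is peered to a central hub VNet that holds shared egress controls such as a firewall. The sketch below shows only the peering between one hub and one spoke using the `azurerm` provider. All names, address ranges, the region, and the single shared resource group are hypothetical and are not taken from the repository's `azure_spoke_vnet` module.

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

provider "azurerm" {
  features {}
  # azurerm 4.x also needs subscription_id here or via ARM_SUBSCRIPTION_ID
}

variable "location" {
  type    = string
  default = "eastus" # hypothetical region
}

# Hypothetical resource group holding both VNets, for brevity
resource "azurerm_resource_group" "network" {
  name     = "rg-lakehouse-network"
  location = var.location
}

# Hub VNet: where shared egress controls (e.g. a firewall) would live
resource "azurerm_virtual_network" "hub" {
  name                = "vnet-hub"
  location            = azurerm_resource_group.network.location
  resource_group_name = azurerm_resource_group.network.name
  address_space       = ["10.0.0.0/20"]
}

# Spoke VNet: one per Databricks workspace
resource "azurerm_virtual_network" "spoke" {
  name                = "vnet-spoke-ws1"
  location            = azurerm_resource_group.network.location
  resource_group_name = azurerm_resource_group.network.name
  address_space       = ["10.1.0.0/20"]
}

# VNet peering is directional, so each side gets its own peering resource
resource "azurerm_virtual_network_peering" "spoke_to_hub" {
  name                         = "spoke-to-hub"
  resource_group_name          = azurerm_resource_group.network.name
  virtual_network_name         = azurerm_virtual_network.spoke.name
  remote_virtual_network_id    = azurerm_virtual_network.hub.id
  allow_virtual_network_access = true
  allow_forwarded_traffic      = true
}

resource "azurerm_virtual_network_peering" "hub_to_spoke" {
  name                         = "hub-to-spoke"
  resource_group_name          = azurerm_resource_group.network.name
  virtual_network_name         = azurerm_virtual_network.hub.name
  remote_virtual_network_id    = azurerm_virtual_network.spoke.id
  allow_virtual_network_access = true
  allow_forwarded_traffic      = true
}
```

Adding another workspace in this pattern means adding another spoke VNet and another pair of peerings, while the hub stays shared.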
### GCP
* Bring-your-own-VPC configuration for GCP (see the GCP folder)
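With a bring-your-own VPC on GCP, you create the VPC and subnet yourself and then register them with your Databricks account so that a workspace can be deployed into them. The sketch below is a minimal illustration of that flow, not the contents of the repository's GCP folder. The resource names, CIDR range, and region default are hypothetical. It also assumes credentials come from the environment and that you are on a recent Databricks provider release, where the GKE-era `pod_ip_range_name` and `service_ip_range_name` fields are no longer required.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
    databricks = {
      source = "databricks/databricks"
    }
  }
}

variable "google_project" {
  type = string
}

variable "google_region" {
  type    = string
  default = "us-central1" # hypothetical default
}

variable "databricks_account_id" {
  type = string
}

provider "google" {
  project = var.google_project
  region  = var.google_region
}

# Account-level Databricks provider on GCP; authentication is assumed to be
# supplied by the environment (e.g. a Google service account)
provider "databricks" {
  alias      = "accounts"
  host       = "https://accounts.gcp.databricks.com"
  account_id = var.databricks_account_id
}

# Customer-managed VPC with a single regional subnet (names are hypothetical)
resource "google_compute_network" "dbx" {
  name                    = "dbx-byo-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "dbx" {
  name                     = "dbx-subnet"
  ip_cidr_range            = "10.0.0.0/20"
  region                   = var.google_region
  network                  = google_compute_network.dbx.id
  private_ip_google_access = true
}

# Register the VPC and subnet with the Databricks account
resource "databricks_mws_networks" "byo" {
  provider     = databricks.accounts
  account_id   = var.databricks_account_id
  network_name = "dbx-byo-network"

  gcp_network_info {
    network_project_id = var.google_project
    vpc_id             = google_compute_network.dbx.name
    subnet_id          = google_compute_subnetwork.dbx.name
    subnet_region      = google_compute_subnetwork.dbx.region
  }
}
```

The `databricks_mws_networks` ID can then be passed to the workspace resource as its network configuration, which keeps network ownership with your GCP project.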