https://github.com/tfayyaz/awesome-azure-databricks

List: awesome-azure-databricks

awesome awesome-list azure azure-databricks delta-lake spark

# Awesome Azure Databricks

Awesome content all about Azure Databricks features, integrations with Azure products, integrations with partner products and architecture guidance.

# Azure Databricks Features

## Delta Lake

- [Real-Time Data Streaming With Databricks, Spark & Power BI](https://www.insight.com/en_US/content-and-resources/tech-tutorials/real-time-data-streaming-with-databricks-spark-and-power-bi.html) - Bennie Haelen (Insight) - 03-03-2021
- [From Kafka to Delta Lake using Apache Spark Structured Streaming](https://blogit.michelin.io/kafka-to-delta-lake-using-apache-spark-streaming-avro/) - Fabien Pomerol (Michelin) - 16-02-2021

## Spark Structured Streaming

- [Ingest Azure Event Hub Telemetry Data with Apache PySpark Structured Streaming on Databricks](https://chinnychukwudozie.com/2021/05/17/ingest-azure-event-hub-telemetry-data-with-apache-pyspark-structured-streaming-on-databricks/) - Charles Chukwudozie (Microsoft) - 17-05-2021
- [NYC Taxi - Cosmos DB & Structured streaming](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3-1_2-12/Samples/Python/NYC-Taxi-Data/02_StructuredStreaming.ipynb)
- [Real-Time Data Streaming With Databricks, Spark & Power BI](https://www.insight.com/en_US/content-and-resources/tech-tutorials/real-time-data-streaming-with-databricks-spark-and-power-bi.html) - Bennie Haelen (Insight) - 03-03-2021
- [How to Consume Data from Apache Kafka Topics and Schema Registry with Confluent and Azure Databricks](https://www.confluent.io/blog/consume-avro-data-from-kafka-topics-and-secured-schema-registry-with-databricks-confluent-cloud-on-azure/) - Caio Moreno (Microsoft) - 04-02-2021
- [From Kafka to Delta Lake using Apache Spark Structured Streaming](https://blogit.michelin.io/kafka-to-delta-lake-using-apache-spark-streaming-avro/) - Fabien Pomerol (Michelin) - 16-02-2021
- [Stream Processing Event Hub Capture files with Autoloader](https://www.rakirahman.me/event-hub-capture-with-autoloader/) - Raki Rahman (Microsoft) - 04-01-2021
- [Exploring Azure Schema Registry with Spark](https://www.rakirahman.me/azure-schemaregistry-spark/) - Raki Rahman (Microsoft) - 02-12-2020
- [Incrementally Process Data Lake Files Using Azure Databricks Autoloader and Spark Structured Streaming API](https://chinnychukwudozie.com/2020/09/30/incrementally-process-data-lake-files-using-azure-databricks-autoloader-and-spark-structured-streaming-api/) - Charles Chukwudozie (Microsoft) - 30-09-2020
- [IBOR scenario using Azure Event Hubs and Azure Databricks](https://www.linkedin.com/pulse/ibor-scenario-using-azure-event-hubs-databricks-bananier-cipm/) - Laurent Bananier (Microsoft) - 17-08-2020
- [Real time stream processing with Databricks and Azure Event Hubs](https://techblog.fexcofts.com/2019/01/18/real-time-stream-processing-with-databricks-and-azure-event-hubs/) - Xavier Sierra (Fexco) - 18-01-2019

## Auto Loader

- [Stream Processing Event Hub Capture files with Autoloader](https://www.rakirahman.me/event-hub-capture-with-autoloader/) - Raki Rahman (Microsoft) - 04-01-2021
- [Incrementally Process Data Lake Files Using Azure Databricks Autoloader and Spark Structured Streaming API](https://chinnychukwudozie.com/2020/09/30/incrementally-process-data-lake-files-using-azure-databricks-autoloader-and-spark-structured-streaming-api/) - Charles Chukwudozie (Microsoft) - 30-09-2020

## DBFS (Databricks File System)

- [Storage options and File manipulation Commands in Azure Databricks](https://www.analyticsvidhya.com/blog/2021/06/storage-options-and-file-manipulation-commands-in-azure-databricks/) - Vikash Rajluhaniwal - 24-06-2021

## Horovod

- [Single-node and distributed Deep Learning on Databricks](https://codebeez.nl/blogs/single-node-and-distributed-deep-learning-databricks/) - Luuk van der Velden & Rik Jongerius (Codebeez) - 01-03-2021

# Azure Product Integrations - Data & Analytics

## ADLS Gen2

- [Storage options and File manipulation Commands in Azure Databricks](https://www.analyticsvidhya.com/blog/2021/06/storage-options-and-file-manipulation-commands-in-azure-databricks/) - Vikash Rajluhaniwal - 24-06-2021
- [How to connect Databricks to your Azure Data Lake](https://towardsdatascience.com/how-to-connect-databricks-to-your-azure-data-lake-ff499f4ca1c) - René Bremer (Microsoft) - 01-05-2021
- [How to bring your modern data pipeline to production](https://towardsdatascience.com/how-to-bring-your-modern-data-pipeline-to-production-2f14e42ac200) - René Bremer (Microsoft) - 25-10-2020

## Power BI

- [Real-Time Data Streaming With Databricks, Spark & Power BI](https://www.insight.com/en_US/content-and-resources/tech-tutorials/real-time-data-streaming-with-databricks-spark-and-power-bi.html) - Bennie Haelen (Insight) - 03-03-2021

## Azure Data Factory

- [Putting the Factory in Azure Data Factory: Dynamically generated Pipelines](https://godatadriven.com/blog/putting-the-factory-in-azure-data-factory-dynamically-generated-pipelines/) - Daniel van der Ende (Go Data Driven) - 21-12-2021
- [Just-in-time Azure Databricks access tokens and instance pools for Azure Data Factory pipelines using workspace automation](https://medium.com/microsoftazure/just-in-time-azure-databricks-access-tokens-and-instance-pools-for-azure-data-factory-pipelines-d1f8d1b6d28c) - Nicholas Hurt (Microsoft) - 08-06-2020
- [Single-node and distributed Deep Learning on Databricks](https://codebeez.nl/blogs/single-node-and-distributed-deep-learning-databricks/) - Luuk van der Velden & Rik Jongerius (Codebeez) - 01-03-2021
- [How to bring your modern data pipeline to production](https://towardsdatascience.com/how-to-bring-your-modern-data-pipeline-to-production-2f14e42ac200) - René Bremer (Microsoft) - 25-10-2020

## Azure Cosmos DB

- [NYC Taxi - Cosmos DB & Spark (Python)](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3-1_2-12/Samples/Python/NYC-Taxi-Data/01_Batch.ipynb)
- [NYC Taxi - Cosmos DB & Structured streaming](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3-1_2-12/Samples/Python/NYC-Taxi-Data/02_StructuredStreaming.ipynb)
- [Using Spark 3 connector for Cosmos DB SQL API with Azure Databricks](https://devblogs.microsoft.com/cosmosdb/spark-3-connector-databricks/) - Iranga Subasinghe (Microsoft) - 18-05-2021
- [Cosmos DB & PySpark – Retrieve all attributes from all Collections under all Databases](https://sqlwithmanoj.com/2021/04/12/cosmos-db-pyspark-retrieve-all-attributes-from-all-collections-under-all-databases/) - Manoj Pandey (Microsoft) - 12-04-2021
- [Using Python in Azure Databricks with Cosmos DB – DDL & DML operations by using “Azure-Cosmos” library for Python](https://sqlwithmanoj.com/2021/04/09/using-python-in-azure-databricks-with-cosmos-db-to-retrieve-all-attributes-from-all-collections-under-all-databases/) - Manoj Pandey (Microsoft) - 09-04-2021
- [Connect to Cosmos DB from Databricks and read data by using Apache Spark to Azure Cosmos DB connector](https://sqlwithmanoj.com/2021/04/07/connect-to-cosmos-db-from-databricks-and-read-data-by-apache-spark-to-azure-cosmos-db-connector/) - Manoj Pandey (Microsoft) - 07-04-2021
- [Azure CosmosDB and Databricks notebooks](https://github.com/Azure/azure-cosmosdb-spark#using-databricks-notebooks)
- [Integrate Azure Cosmos DB with Azure Databricks](https://towardsdatascience.com/revealed-a-ridiculously-easy-way-to-integrate-azure-cosmos-db-with-azure-databricks-4314cce0259b) - Prashanth Xavier (Luxoft) - 09-01-2021
- [How to bring your modern data pipeline to production](https://towardsdatascience.com/how-to-bring-your-modern-data-pipeline-to-production-2f14e42ac200) - René Bremer (Microsoft) - 25-10-2020

## Azure Event Hub

- [Structured Streaming + Event Hubs Integration Guide for PySpark](https://github.com/Azure/azure-event-hubs-spark/blob/master/docs/PySpark/structured-streaming-pyspark.md) - GitHub
- [Detecting SQL Column Decryption using Purview, Kafka, Kafdrop and Spark](https://www.rakirahman.me/purview-sql-cle-events-with-kafdrop/) - Raki Rahman (Microsoft) - 03-10-2021
- [Ingest Azure Event Hub Telemetry Data with Apache PySpark Structured Streaming on Databricks](https://chinnychukwudozie.com/2021/05/17/ingest-azure-event-hub-telemetry-data-with-apache-pyspark-structured-streaming-on-databricks/) - Charles Chukwudozie (Microsoft) - 17-05-2021
- [Real-Time Data Streaming With Databricks, Spark & Power BI](https://www.insight.com/en_US/content-and-resources/tech-tutorials/real-time-data-streaming-with-databricks-spark-and-power-bi.html) - Bennie Haelen (Insight) - 03-03-2021
- [Stream Processing Event Hub Capture files with Autoloader](https://www.rakirahman.me/event-hub-capture-with-autoloader/) - Raki Rahman (Microsoft) - 04-01-2021
- [Exploring Azure Schema Registry with Spark](https://www.rakirahman.me/azure-schemaregistry-spark/) - Raki Rahman (Microsoft) - 02-12-2020
- [IBOR scenario using Azure Event Hubs and Azure Databricks](https://www.linkedin.com/pulse/ibor-scenario-using-azure-event-hubs-databricks-bananier-cipm/) - Laurent Bananier (Microsoft) - 17-08-2020
- [Real time stream processing with Databricks and Azure Event Hubs](https://techblog.fexcofts.com/2019/01/18/real-time-stream-processing-with-databricks-and-azure-event-hubs/) - Xavier Sierra (Fexco) - 18-01-2019

## Azure Purview

- [Databricks Notebook lineage using Azure Purview Atlas API](https://techcommunity.microsoft.com/t5/azure-purview/march-ahead-with-azure-purview-unify-all-your-data-using-apache/ba-p/2185411) - Vishal Anil (Microsoft) - 08-03-2021
- [Data Lineage from Databricks to Azure Purview](https://intellishore.dk/data-lineage-from-databricks-to-azure-purview/) - Anders Boje Hertz & Oliver Rise Thomsen (Intellishore) - 16-03-2021

## Microsoft Presidio

- [Anonymize PII using Presidio on Azure Databricks](https://microsoft.github.io/presidio/samples/deployments/spark/)

# Azure Product Integrations - Platform & DevOps

## Azure Networking

- [How to connect Databricks to your Azure Data Lake](https://towardsdatascience.com/how-to-connect-databricks-to-your-azure-data-lake-ff499f4ca1c) - René Bremer (Microsoft) - 01-05-2021

## Azure DevOps

- [Nutter for testing of Databricks notebooks in the CI/CD pipeline](https://github.com/alexott/databricks-nutter-projects-demo) - Alex Ott (Databricks) - 18-06-2021
- [CI/CD Templates for Databricks and Azure DevOps](https://github.com/databrickslabs/cicd-templates)
- [Using Terraform Outputs Connect To Azure Databricks Workspace via Azure DevOps](https://bzzzt.io/post/2021-05/2021-05-28-log-in-to-adb-workspace/) - Richie Lee (Sabin.io) - 28-05-2021
- [Elegant CICD with Databricks notebooks](https://codebeez.nl/blogs/elegant-cicd-databricks-notebooks/) - Rik Jongerius & Luuk van der Velden (Codebeez) - 09-01-2021
- [Databricks Notebook Promotion using Azure DevOps](https://medium.com/road-to-data-engineering/databricks-notebook-promotion-using-azure-devops-5f3da5306751) - Himansu Sekhar - 03-01-2021
- [DataOps 3 - Databricks Code Promotion using DevOps CI/CD - YouTube](https://www.youtube.com/watch?v=R7tJZelEt-Q&t=172s) - David Lusty (Microsoft) - 11-02-2020
- [How to bring your modern data pipeline to production](https://towardsdatascience.com/how-to-bring-your-modern-data-pipeline-to-production-2f14e42ac200) - René Bremer (Microsoft) - 25-10-2020

## Azure Log Analytics

- [Monitoring Azure Databricks in an Azure Log Analytics Workspace](https://github.com/mspnp/spark-monitoring)

# Azure Product Integrations - Data Science, ML & AI

## Azure ML

- [MLOps for Azure Databricks Example](https://github.com/SaschaDittmann/MLOps-Databricks) - Sascha Dittmann (Microsoft) - 13-04-2021

# Architecture

- [Launching Databricks at If](https://medium.com/if-tech/launching-databricks-at-if-819be388aa8a) - Valdas Maksimavičius (Microsoft Data Platform MVP) - 31-03-2021
- [From Warehouse to Lakehouse – ELT/ETL Design Patterns with Azure Data Services](https://sqlofthenorth.blog/2021/03/29/elt-etl-design-patterns-with-azure-data-services/) - Mike Dobing (Microsoft) - 29-03-2021
- [How to bring your modern data pipeline to production](https://towardsdatascience.com/how-to-bring-your-modern-data-pipeline-to-production-2f14e42ac200) - René Bremer (Microsoft) - 25-10-2020

# Partner Product Integrations

## Confluent & Kafka

- [From Kafka to Delta Lake using Apache Spark Structured Streaming](https://blogit.michelin.io/kafka-to-delta-lake-using-apache-spark-streaming-avro/) - Fabien Pomerol (Michelin) - 16-02-2021
- [How to Consume Data from Apache Kafka Topics and Schema Registry with Confluent and Azure Databricks](https://www.confluent.io/blog/consume-avro-data-from-kafka-topics-and-secured-schema-registry-with-databricks-confluent-cloud-on-azure/) - Caio Moreno (Microsoft) - 04-02-2021

## dbt

- [dbt Core integration with Azure Databricks](https://docs.microsoft.com/en-us/azure/databricks/dev-tools/dbt)
- [dbt Cloud integration with Azure Databricks](https://docs.microsoft.com/en-us/azure/databricks/dev-tools/dbt-cloud)

## Terraform

- [Using Terraform Outputs Connect To Azure Databricks Workspace via Azure DevOps](https://bzzzt.io/post/2021-05/2021-05-28-log-in-to-adb-workspace/) - Richie Lee (Sabin.io) - 28-05-2021