{"id":21074209,"url":"https://github.com/tomaztk/azure-databricks","last_synced_at":"2025-05-16T06:31:15.208Z","repository":{"id":45084468,"uuid":"317361268","full_name":"tomaztk/Azure-Databricks","owner":"tomaztk","description":"Azure Databricks - Advent of 2020 Blogposts","archived":false,"fork":false,"pushed_at":"2022-09-22T05:21:24.000Z","size":47044,"stargazers_count":46,"open_issues_count":0,"forks_count":33,"subscribers_count":7,"default_branch":"main","last_synced_at":"2023-03-09T08:55:50.331Z","etag":null,"topics":["azure-data-factory","azure-databricks","azure-machine-learnning","data-analytics","data-engineerg","databricks","databricks-notebooks","machine-learning","mlflow","mllib","notebook","notebooks","pyspark","python","r-language","scala","spark","spark-structured-streaming","sparkr","sql"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tomaztk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":null,"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2020-11-30T22:24:45.000Z","updated_at":"2023-03-03T23:31:50.000Z","dependencies_parsed_at":"2023-01-18T17:38:00.594Z","dependency_job_id":null,"html_url":"https://github.com/tomaztk/Azure-Databricks","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomaztk%2FAzure-Databricks","tags_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/tomaztk%2FAzure-Databricks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomaztk%2FAzure-Databricks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomaztk%2FAzure-Databricks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tomaztk","download_url":"https://codeload.github.com/tomaztk/Azure-Databricks/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225411520,"owners_count":17470245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure-data-factory","azure-databricks","azure-machine-learnning","data-analytics","data-engineerg","databricks","databricks-notebooks","machine-learning","mlflow","mllib","notebook","notebooks","pyspark","python","r-language","scala","spark","spark-structured-streaming","sparkr","sql"],"created_at":"2024-11-19T19:15:00.877Z","updated_at":"2024-11-19T19:15:01.482Z","avatar_url":"https://github.com/tomaztk.png","language":"Jupyter Notebook","readme":"\u003c!-- README.md was wriiten in beautiful MacDown  --\u003e\n# Microsoft Azure Databricks\n\n\u003c!-- badges: start --\u003e\n![](http://img.shields.io/badge/Azure-Databricks-red.svg) ![](http://img.shields.io/badge/Microsoft-Azure-blue.svg) 
\n[![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Ftomaztk%2FAzure-Databricks\u0026count_bg=%2379C83D\u0026title_bg=%23555555\u0026icon=microsoftazure.svg\u0026icon_color=%230A6BFF\u0026title=hits\u0026edge_flat=false)](https://hits.seeyoufarm.com)\n![](https://img.shields.io/github/forks/tomaztk/azure-databricks?style=social)\n\u003c!-- badges: end --\u003e\n\n\n\u003cimg src=\"images/logo-databricks.png\" align=\"right\" width=\"400\" /\u003e\n\u003cimg src=\"images/logo-azure.svg\"  width=\"240\" /\u003e\n\n\n\u003cspan style=\"font-size: x-large; font-weight: normal;\"\u003eThe Microsoft Azure Databricks repository is \na set of blogposts published as the **Advent of Azure Databricks** _2020_, presented to readers for easier onboarding with Azure Databricks! \u003c/span\u003e\n\n\n## Table of contents / Featured blogposts \n\n1. [Dec 01 2020 - What is Azure DataBricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2001%202020%20-%20What%20is%20Azure%20DataBricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/01/advent-of-2020-day-1-what-is-azure-databricks/))\n2. [Dec 02 2020 - How to get started with Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/%20Dec%2002%202020%20-%20How%20to%20get%20started%20with%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/02/advent-of-2020-day-2-how-to-get-started-with-azure-databricks/))\n3. [Dec 03 2020 - Getting to know the workspace and Azure Databricks platform](https://github.com/tomaztk/Azure-Databricks/blob/main/%20Dec%2003%202020%20-%20Getting%20to%20know%20the%20workspace%20and%20Azure%20Databricks%20platform.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/03/advent-of-2020-day-3-getting-to-know-the-workspace-and-azure-databricks-platform/))\n4. 
[Dec 04 2020 - Creating your first Azure Databricks cluster](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2004%202020%20-%20Creating%20your%20first%20Azure%20Databricks%20cluster.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/04/advent-of-2020-day-4-creating-your-first-azure-databricks-cluster/))\n5. [Dec 05 2020 - Basics on architecture of clusters, workers, DBFS storage jobs](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2005%202020%20-%20Understanding%20Azure%20Databricks%20cluster%20architecture%2C%20workers%2C%20drivers%20and%20jobs.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/05/advent-of-2020-day-5-understanding-azure-databricks-cluster-architecture-workers-drivers-and-jobs/))\n6. [Dec 06 2020 - Importing and storing data to Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2006%202020%20-%20Importing%20and%20storing%20data%20to%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/06/advent-of-2020-day-6-importing-and-storing-data-to-azure-databricks/))\n7. [Dec 07 2020 - Starting with Databricks notebooks and loading data](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2007%202020%20-%20Starting%20with%20Databricks%20notebooks%20and%20loading%20data%20to%20DBFS.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/07/advent-of-2020-day-7-starting-with-databricks-notebooks-and-loading-data-to-dbfs/))\n8. [Dec 08 2020 - Using Databricks CLI and DBFS CLI for file upload](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2008%202020%20-%20Using%20Databricks%20CLI%20and%20DBFS%20CLI%20for%20file%20upload.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/08/advent-of-2020-day-8-using-databricks-cli-and-dbfs-cli-for-file-upload/))\n9. 
[Dec 09 2020 - Connect to Azure Blob storage using Notebooks in Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2009%202020%20-%20Connect%20to%20Azure%20Blob%20storage%20using%20Notebooks%20in%20%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/09/advent-of-2020-day-9-connect-to-azure-blob-storage-using-notebooks-in-azure-databricks/))\n10. [Dec 10 2020 - Using Azure Databricks Notebooks with SQL for Data engineering tasks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2010%202020%20-%20Using%20Azure%20Databricks%20Notebooks%20with%20SQL%20for%20Data%20engineering%20tasks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/10/advent-of-2020-day-10-using-azure-databricks-notebooks-with-sql-for-data-engineering-tasks/))\n11. [Dec 11 2020 - Using Azure Databricks Notebooks with R to do Data engineering and data analytics](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2011%202020%20-%20Using%20Azure%20Databricks%20Notebooks%20with%20SQL%20for%20Data%20engineering%20tasks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/11/advent-of-2020-day-11-using-azure-databricks-notebooks-with-r-language-for-data-analytics/))\n12. [Dec 12 2020 - Using Azure Databricks Notebooks with Python to do Data engineering and data analytics](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2012%202020%20-%20Using%20Azure%20Databricks%20Notebooks%20with%20Python%20Language%20for%20data%20analytics.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/12/advent-of-2020-day-12-using-azure-databricks-notebooks-with-python-language-for-data-analytics/))\n13. 
[Dec 13 2020 - Using Python Databricks Koalas with Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2013%202020%20-%20Using%20Python%20Databricks%20Koalas%20with%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/13/advent-of-2020-day-13-using-python-databricks-koalas-with-azure-databricks/))\n14. [Dec 14 2020 - From configuration to execution of Databricks jobs](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2014%202020%20-%20%20From%20configuration%20to%20execution%20of%20Databricks%20jobs.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/14/advent-of-2020-day-14-from-configuration-to-execution-of-databricks-jobs/))\n15. [Dec 15 2020 - Databricks Spark UI, Event Logs, Driver logs and Metrics](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2015%202020%20-%20Databricks%20Spark%20UI%2C%20Event%20Logs%2C%20Driver%20logs%20and%20Metrics.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/15/advent-of-2020-day-15-databricks-spark-ui-event-logs-driver-logs-and-metrics/))\n16. [Dec 16 2020 - Databricks experiments, models and MLFlow](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2016%202020%20-%20Databricks%20experiments%2C%20models%20and%20MLFlow.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/16/advent-of-2020-day-16-databricks-experiments-models-and-mlflow/))\n17. [Dec 17 2020 - End-to-End Machine learning project in Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2017%202020%20-%20End-to-End%20Machine%20learning%20project%20in%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/17/advent-of-2020-day-17-end-to-end-machine-learning-project-in-azure-databricks/))\n18. 
[Dec 18 2020 - Using Azure Data Factory with Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2018%202020%20-%20Using%20Azure%20Data%20Factory%20with%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/18/advent-of-2020-day-18-using-azure-data-factory-with-azure-databricks/))\n19. [Dec 19 2020 - Using Azure Data Factory with Azure Databricks for merging CSV files](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2019%202020%20-%20Using%20Azure%20Data%20Factory%20with%20Azure%20Databricks%20for%20merging%20CSV%20files.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/19/advent-of-2020-day-19-using-azure-data-factory-with-azure-databricks-for-merging-csv-files/))\n20. [Dec 20 2020 - Orchestrating multiple notebooks with Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2020%202020%20-%20Orchestrating%20multiple%20notebooks%20with%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/16/advent-of-2020-day-16-databricks-experiments-models-and-mlflow/))\n21. [Dec 21 2020 - Using Scala with Spark Core API in Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2021%202020%20-%20Using%20Scala%20with%20Spark%20Core%20API%20in%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/21/advent-of-2020-day-21-using-scala-with-spark-core-api-in-azure-databricks/))\n22. [Dec 22 2020 - Using Spark SQL and DataFrames in Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2022%202020%20-%20Using%20Spark%20SQL%20and%20DataFrames%20in%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/22/advent-of-2020-day-22-using-spark-sql-and-dataframes-in-azure-databricks/))\n23. 
[Dec 23 2020 - Using Spark Streaming in Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2023%202020%20-%20Using%20Spark%20Streaming%20in%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/23/advent-of-2020-day-23-using-spark-streaming-in-azure-databricks/))\n24. [Dec 24 2020 - Using Spark MLlib for Machine Learning in Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2024%202020%20-%20Using%20Spark%20MLlib%20for%20Machine%20Learning%20in%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/24/advent-of-2020-day-24-using-spark-mllib-for-machine-learning-in-azure-databricks/))\n25. [Dec 25 2020 - Using Spark GraphFrames in Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2025%202020%20-%20Using%20Spark%20GraphFrames%20in%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/25/advent-of-2020-day-25-using-spark-graphframes-in-azure-databricks/))\n26. [Dec 26 2020 - Connecting Azure Machine Learning Services Workspace and Azure Databricks](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2026%202020%20-%20Connecting%20Azure%20Machine%20Learning%20Services%20Workspace%20and%20Azure%20Databricks.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/26/advent-of-2020-day-26-connecting-azure-machine-learning-services-workspace-and-azure-databricks/))\n27. [Dec 27 2020 - Connecting Azure Databricks with on premise environment](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2027%202020%20-%20Connecting%20Azure%20Databricks%20with%20on%20premise%20environment.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/27/advent-of-2020-day-27-connecting-azure-databricks-with-on-premise-environment/))\n28. 
[Dec 28 2020 - Infrastructure as Code and how to automate, script and deploy Azure Databricks with Powershell](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2028%202020%20-%20Infrastructure%20as%20Code%20and%20how%20to%20automate%2C%20script%20and%20deploy%20Azure%20Databricks%20with%20Powershell.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/28/advent-of-2020-day-28-infrastructure-as-code-and-how-to-automate-script-and-deploy-azure-databricks-with-powershell/))\n29. [Dec 29 2020 - Performance tuning for Apache Spark](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2029%202020%20-%20Performance%20tuning%20for%20Apache%20Spark.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/29/advent-of-2020-day-29-performance-tuning-of-apache-spark/))\n30. [Dec 30 2020 - Monitoring and troubleshooting of Apache Spark](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2030%202020%20-%20Monitoring%20and%20troubleshooting%20of%20Apache%20Spark.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/30/advent-of-2020-day-30-monitoring-and-troubleshooting-of-apache-spark/))\n31. [Dec 31 2020 - Azure Databricks documentation, learning materials and additional resources](https://github.com/tomaztk/Azure-Databricks/blob/main/Dec%2031%202020%20-%20Azure%20Databricks%20documentation%2C%20learning%20materials%20and%20additional%20resources.md) ([blogpost](https://tomaztsql.wordpress.com/2020/12/31/advent-of-2020-day-31-azure-databricks-documentation-learning-materials-and-additional-resources/))\n\n## Additional Material\n\nAdditional material, a collection of demo materials from different sessions, is also available for use in this repository.\n\n## Blog\n\nAll posts were originally published on my [blog](https://tomaztsql.wordpress.com) and copied here to GitHub. 
On GitHub, it is extremely simple to clone the code, markdown files and all the materials.\n\n## Cloning the repository\nRun the command below to clone the repository (the `-n` flag skips the initial checkout; omit it to populate the working tree right away).\n\n```\ngit clone -n https://github.com/tomaztk/Azure-Databricks.git\n```\n\n## Contact\nGet in contact:\n\n [![Gmail](https://img.shields.io/badge/Gmail-D14836?style=for-the-badge\u0026logo=gmail\u0026logoColor=white\u0026)](mailto:tomaztsql@gmail.com?subject=[GithubRepo]%20AzureDatabricks)\n \n [![Github URL](https://img.shields.io/twitter/url/https/twitter.com/tomaz_tsql.svg?style=social\u0026label=Follow%20%40tomaz_tsql)](https://github.com/tomaztk)\n\n\u003c!--\n\u003ca class=\"github-button\" href=\"https://github.com/tomaztk\" data-show-count=\"true\" aria-label=\"Follow @tomaztk on GitHub\"\u003eFollow @tomaztk\u003c/a\u003e\n\u003cscript async defer src=\"https://buttons.github.io/buttons.js\"\u003e\u003c/script\u003e  --\u003e\n\n\n## Contributing\nDo the usual GitHub fork and pull request dance. Add yourself to the contributors section if you want to (or I will add you). \n\n\n## Suggestions\nFeel free to suggest any new topics that you would like to be covered.\n\n\n## License\n[MIT](https://choosealicense.com/licenses/mit/) © Tomaž Kaštrun\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomaztk%2Fazure-databricks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftomaztk%2Fazure-databricks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomaztk%2Fazure-databricks/lists"}