Projects in Awesome Lists tagged with databricks-notebooks
A curated list of projects in awesome lists tagged with databricks-notebooks.
https://github.com/microsoft/nutter
Testing framework for Databricks notebooks
azuredevops databricks databricks-notebooks
Last synced: 16 May 2025
https://github.com/Azure/azure-cosmosdb-spark
Apache Spark Connector for Azure Cosmos DB
apache-spark azure-cosmos-db azure-databricks changefeed connector cosmos-db databricks databricks-notebooks jupyter-notebook lambda-architecture pyspark spark
Last synced: 10 May 2025
https://github.com/tomaztk/azure-databricks
Azure Databricks - Advent of 2020 Blogposts
azure-data-factory azure-databricks azure-machine-learnning data-analytics data-engineerg databricks databricks-notebooks machine-learning mlflow mllib notebook notebooks pyspark python r-language scala spark spark-structured-streaming sparkr sql
Last synced: 16 May 2025
https://github.com/jaceklaskowski/learn-databricks
Notebooks to learn Databricks Lakehouse Platform
databricks databricks-notebooks delta-live-tables mlflow
Last synced: 16 Apr 2025
https://github.com/santiagortiiz/advanced-data-engineering-with-databricks
Databricks. Incremental data processing, task orchestration, and production job monitoring.
big-data databricks databricks-notebooks kafka spark spark-streaming streaming
Last synced: 08 Mar 2026
https://github.com/analyticalmonk/pyspark_nlp_workshop
Instructions and code for the workshop "From Big Data to NLP Insights: Unlocking the Power of PySpark and Spark NLP"
databricks databricks-notebooks distributed-computing nlp pyspark spark spark-nlp workshop
Last synced: 05 Oct 2025
https://github.com/hmiladhia/nbmanips
nbmanips allows you to easily manipulate ipynb files
databricks-notebooks dbc jupyter-notebook markdown notebook python zeppelin zeppelin-notebook zpln
Last synced: 28 Oct 2025
https://github.com/fvaleye/delta-buddy
Introducing Delta-Buddy: Your ultimate Delta Lake companion! 🚀 Streamline your data journey with an AI-powered chatbot. Ask Delta-Buddy anything about your Delta Lake.
chromadb data-privacy databricks-notebooks delta-lake dolly langchain llm python
Last synced: 14 Feb 2026
https://github.com/majdi-akrmi/elt-ipl
This is an end-to-end data engineering project that uses the IPL dataset.
apache-spark databricks-notebooks pyspark snowflake snowsql
Last synced: 20 Aug 2025
https://github.com/newrelic-experimental/nri-spark
This New Relic standalone integration polls the Apache Spark REST API for metrics and pushes them into New Relic using the Metrics API. It uses the New Relic Telemetry SDK for Go.
apache-spark databricks databricks-notebooks metrics newrelic nrlabs nrlabs-data nrlabs-odp spark
Last synced: 10 Apr 2025
https://github.com/easonlai/samples_for_azure_databricks_orientation
Samples for Azure Databricks Orientation
azure azure-storage azureblobstorage azuresqldb databricks databricks-notebooks datacleaning json json-schema matplotlib matplotlib-pyplot pandas pandas-dataframe pyodbc pyspark pyspark-notebook pyspark-tutorial python seaborn seaborn-plots
Last synced: 26 Apr 2025
https://github.com/retkowsky/cloud_workshop_azuredatabricks
Cloud Workshop Azure Databricks
azure databricks databricks-notebooks
Last synced: 22 Apr 2025
https://github.com/aessing/demo-mdwh
Modern Data Warehouse Demos with Azure Databricks, Azure Data Factory & Azure Dedicated SQL pool (formerly SQL DW)
azure azure-data-factory azure-databricks data data-engineering data-science databricks databricks-notebooks datafactory datalake datawarehouse datawarehousing delta-lake demos etl machine-learning mdwh ml modern-data-warehouse spark
Last synced: 26 Jun 2025
https://github.com/ac-gomes/data-engineering-with-databricks
A simple boilerplate for data engineering and data analysis training in Databricks.
data-analysis data-engineering databricks databricks-notebooks pyspark python unit-testing
Last synced: 30 Apr 2025
https://github.com/easonlai/databricks_odbc_connection_to_azure_sql_db_with_azure_ad_user_access_token
Making ODBC connection from Databricks (Azure Databricks) to Azure SQL Database with Azure AD User Access Token.
azure azuread azuredatabricks azuresql azuresqldb bigdata data-analysis dataanalysis dataanalytics databricks databricks-notebooks datascience microsoft microsoft-azure microsoftazure odbc odbc-driver pandas pyodbc spark
Last synced: 04 May 2026
https://github.com/tknishh/olympic-data-analysis-azure
End-to-end data engineering project using Azure Databricks as the cloud service and Tokyo Olympic data
azure-storage databricks-notebooks datafactory de-project olympic-data synapse-analytics
Last synced: 03 Mar 2026
https://github.com/abdelmajidlh/spark-functionality-repo
This GitHub repository contains a detailed document on the basics of the Scala language.
apache apachespark databricks databricks-notebooks pyspark python3 scala spark
Last synced: 11 Feb 2026
https://github.com/nhsdigital/sde_example_analysis
Example of what you can do in Databricks in the Secure Data Environment (SDE) using Python, SQL, and R.
data-analysis data-science databricks-notebooks machine-learning mlflow
Last synced: 25 Oct 2025
https://github.com/hjh17/dbloy
Continuous delivery tool for PySpark notebook-based jobs on Databricks
ci-cd cli databricks databricks-notebooks pyspark pyspark-notebook python3
Last synced: 01 Apr 2026
https://github.com/easonlai/databricks_delta_table_samples
This is a code sample repository for demonstrating how to perform Databricks Delta Table operations.
databricks databricks-notebooks delta delta-lake deltalake pyspark pyspark-notebook python
Last synced: 09 Jul 2025
https://github.com/marvinbuss/small_samples
Small samples from daily work.
databricks databricks-notebooks samples
Last synced: 04 Apr 2025
https://github.com/mensenvau/data_migration_validation
Data Validation Documentation for Source and Target Tables in Databricks
data-migration data-validation databricks-notebooks
Last synced: 17 Jun 2025
https://github.com/ajaxbarcelonacruyff/databricks_bigquery
Extract BigQuery tables in Databricks Notebook
bigquery databricks databricks-notebooks ga4 googleanalytics
Last synced: 17 Apr 2026
https://github.com/shogunbanik18/budgetify
End-to-End Budget Analysis enables effective budgeting through detailed analysis and strategic planning
analysis data data-engineering data-exploration databricks databricks-notebooks etl etl-process python3
Last synced: 09 Jun 2026
https://github.com/easonlai/eda_for_prudential_life_insurance_sample_data
Notebook sample of Exploratory Data Analysis (EDA) for Prudential Life Insurance Sample Data
azure-databricks azuredatabricks data-analysis data-analysis-python data-analytics databricks databricks-notebooks eda exploratory-data-analysis insurance insurance-sample-data jupyter-notebook python python3
Last synced: 14 May 2026
https://github.com/elastacloud/databricks-dotnet-rest-sdk
An SDK for the Databricks REST API in dotnet
databricks databricks-notebooks dotnet dotnet-library
Last synced: 16 Apr 2026
https://github.com/edisedis777/pyspark-ml-features
A PySpark implementation of 6 lesser-known Scikit-Learn features optimized for Azure Databricks. This project translates powerful machine learning techniques from Scikit-Learn into PySpark's distributed computing framework.
azure databricks databricks-notebooks large-scale machine-learning pyspark python scikit-learn scikitlearn-machine-learning
Last synced: 13 Apr 2026
https://github.com/travelxml/apache-spark-pyspark-databricks
APACHE SPARK: Data Analysis, Transformation, and Visualisation with PySpark, IPL Data Analysis
apache-spark data-science data-visualization databricks databricks-notebooks dataframe ipl machine-learning pyspark pyspark-mllib pyspark-notebook pyspark-python pyspark-tutorial
Last synced: 26 Jan 2026
https://github.com/sumit-sinha9/ipl-data-analysis-using-apache-spark-on-databricks
This project focuses on performing an end-to-end analysis of IPL data using Apache Spark on Databricks. It begins with setting up a Databricks environment, followed by ingesting and exploring the IPL dataset.
apache-spark aws-s3 databricks-notebooks python
Last synced: 09 May 2026
https://github.com/nasirkadri2601/live_cricket_data_pipeline
This project captures live cricket data in raw JSON format, cleans and transforms it, and stores it in a centralized data warehouse. The data is then used for analysis, including match outcome predictions, player performance, and team strategy insights, enabling data-driven decisions.
airflow-docker apache-spark aws-lambda aws-s3 databricks-notebooks python snowflake
Last synced: 02 May 2026
https://github.com/aaryan-agr/2015-yellow-taxi-data-analysis
Exploring NYC Yellow Cab trips through comprehensive EDA techniques to uncover usage patterns and insights.
apache-spark big-data databricks-notebooks
Last synced: 16 May 2026
https://github.com/retkowsky/workshop_azuredatabricks_-_azuremlservice
Azure Databricks notebooks with Azure ML service
azure databricks-notebooks microsoft python
Last synced: 18 May 2026
https://github.com/hsm207/grab-safety
My submission for Grab AI for S.E.A. challenge
databricks databricks-notebooks pyspark spark-ml telematics
Last synced: 16 Jun 2025
https://github.com/rhejos/ipl_data_analysis
This project explores data analysis of the Indian Premier League using AWS S3, Apache Spark, Python, and SQL.
apache-spark aws-s3 databricks-notebooks pyspark sql
Last synced: 07 Mar 2026
https://github.com/sebader/azuredatabricks-samples
azure databricks-notebooks iothub
Last synced: 02 May 2026
https://github.com/rajeev11256/ipl-data-analysis-project-using-apache-spark-on-databricks
This project focuses on performing an end-to-end analysis of IPL data using Apache Spark on Databricks. It begins with setting up a Databricks environment, followed by ingesting and exploring the IPL dataset.
data-engineering databricks-notebooks pyspark python
Last synced: 09 Apr 2025
https://github.com/santoshshinde2012/medallion-architecture-databrics
Medallion Architecture: Principles and Practical Exploration
data data-plat data-science databricks databricks-notebooks medallion-architecture
Last synced: 26 Jul 2025
https://github.com/jotstolu/car-sales-end-to-end-data-engineering-project-using-azure-databricks
This project presents a scalable end-to-end data pipeline designed for processing and analysing car sales data using the Azure Cloud and Databricks ecosystem.
azurecloud azuredatabricks azuredatafactory azuredatalakegen2 azuresqldb databricks databricks-notebooks delta-lake delta-lake-table dlt etl-pipeline pyspark pyspark-notebook unitycatalog
Last synced: 06 Feb 2026
https://github.com/euiyounghwang/euiyounghwang.github.io
Software Engineer: Euiyoung Hwang
alertmanager artificial-intelligence-algorithms databricks-notebooks datadog django-rest-framework docker elk-stack golang grafana jupyter-notebook kafka mongodb oracle postgresql prometheus python-stack rabbitmq redis rest-api-framework spring-boot
Last synced: 10 Apr 2026
https://github.com/jotstolu/retail-orders-end-to-end-data-engineering-project-using-azure-databricks
This project demonstrates the development of a scalable end-to-end data pipeline for processing and reporting retail order data using Azure Cloud services, Delta Lake architecture, and Databricks.
azure-devops azurecloud azuredatabricks azuredatafactory bigdataanalytics databricks databricks-notebooks delta-lake medallion-architecture pyspark pyspark-notebook starschema unitycatalog
Last synced: 20 Feb 2026
https://github.com/thaitechtales/databricks
This repository is dedicated to showcasing projects built on Databricks, focusing on big data analytics, data engineering, and machine learning workflows.
apache-spark big-data cloud-data-platform data-analytics data-engineering databricks databricks-notebooks etl machine-learning
Last synced: 18 Apr 2026
https://github.com/nabojyoti/elt-ipl
This is an end-to-end data engineering project that uses the IPL dataset.
apache-spark databricks-notebooks pyspark snowflake snowsql
Last synced: 16 Jan 2026
https://github.com/bhavanachitragar/ipl-data-analysis-project-using-apache-spark-on-databricks
This project focuses on performing an end-to-end analysis of IPL data using Apache Spark on Databricks. It begins with setting up a Databricks environment, followed by ingesting and exploring the IPL dataset.
apache-spark aws-s3 databricks-notebooks python
Last synced: 03 Feb 2026
https://github.com/darrendavy12/databricks_projects
Topic-specific projects and an end-to-end project
certification data-engineering databricks databricks-notebooks databricks-workspace delta-lake pipelines pyspark python sparksql sql unity-catalog workflow
Last synced: 03 Jul 2025
https://github.com/vandanabhumireddygari/open-table-formats-with-databricks-and-delta-lake
This project demonstrates the use of Open Table Formats with Databricks, PySpark, and Delta Lake. It covers data ingestion, transformation, querying, and storage management using Delta tables. The project includes code for loading data, writing it to Delta format, querying, and utilizing Delta Lake
databricks-notebooks opentableformat pyspark-notebook python
Last synced: 04 May 2026
https://github.com/rajeev11256/flipkart-data-analysis-using-pyspark-on-databricks
The project focuses on building an end-to-end data engineering pipeline using PySpark to address real-world business scenarios. Key steps include exploring and understanding the dataset structure, performing data cleaning to handle inconsistencies, and applying transformations to prepare the data for analysis.
data-engineering databricks-notebooks pyspark python
Last synced: 09 Apr 2025
https://github.com/prateekmaj21/big-data-engineering
Code files for Databricks
Last synced: 11 Oct 2025
https://github.com/mananabbasi/data-science-complete-project-using-big-data-tools-techniques-
This repository contains Databricks projects utilizing RDDs, DataFrames, and SQL to process and analyze various real-world datasets. Data cleaning and analysis have been performed using PySpark functions to handle challenges such as inconsistent formats, missing values, and complex data structures. The project ensures efficient data transformation
azure databricks databricks-industry-solutions databricks-notebooks dataframe pyspark-mllib pyspark-notebook pyspark-python python-script rdd
Last synced: 23 Jan 2026
https://github.com/srking501/csc8101_coursework
A summative coursework for CSC8101 Engineering for AI
apache-parquet apache-spark azure-databricks big-data big-data-analytics big-data-processing data-science databri databricks-notebooks delta-file nyc-taxi-dataset parquet-files pyspark
Last synced: 12 Feb 2026
https://github.com/phelipe-sempreboni/databricks
Repository for tutorials, information and notes about databricks.
databricks databricks-connect databricks-dbconnect databricks-deploy databricks-notebooks databricks-workspace
Last synced: 19 Apr 2026
https://github.com/bhavanachitragar/flipkart-data-analysis-using-pyspark-on-databricks
The project focuses on building an end-to-end data engineering pipeline using PySpark to address real-world business scenarios. Key steps include exploring and understanding the dataset structure, performing data cleaning to handle inconsistencies, and applying transformations to prepare the data for analysis.
data-engineering databricks-notebooks pyspark python
Last synced: 27 Apr 2026
https://github.com/darrendavy12/earthquake-events-and-risks-project---azure-data-pipeline---api-connection-
Earthquake Events and Risks Project - Azure Data Pipeline - API Connection
azure blob-storage cloud cloudstorage data databricks databricks-notebooks databricks-workspace dataengineer dataengineering microsoft python
Last synced: 28 Apr 2026
https://github.com/bytebyrajeev/ipl-data-analysis-project-using-apache-spark-on-databricks
This project focuses on performing an end-to-end analysis of IPL data using Apache Spark on Databricks. It begins with setting up a Databricks environment, followed by ingesting and exploring the IPL dataset.
data-engineering databricks-notebooks pyspark python
Last synced: 29 Apr 2026
https://github.com/bytebyrajeev/flipkart-data-analysis-using-pyspark-on-databricks
The project focuses on building an end-to-end data engineering pipeline using PySpark to address real-world business scenarios. Key steps include exploring and understanding the dataset structure, performing data cleaning to handle inconsistencies, and applying transformations to prepare the data for analysis.
data-engineering databricks-notebooks pyspark python
Last synced: 30 Apr 2026
https://github.com/arnoldchrisoduor1/linearregression-model-with-apachespark-and-databricks
Using Apache PySpark on Databricks, I performed feature engineering on customer data, then trained and used a linear regression model to predict customers' bills based on previous customer trends.
apache-spark databricks-notebooks linear-regression pyspark python3 vectorassembler
Last synced: 01 May 2026
https://github.com/seifo321/microsoft-data-engineer-project
Leveraging Microsoft Azure services to develop a high-performance ETL pipeline that extracts and transforms the BikeStores data and loads it into an Azure data warehouse
azure azuresynapseanalytics databricks-notebooks dataengineering etl-automation etl-pipeline machine-learning predective-modeling sqlserver
Last synced: 07 May 2026