An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with databricks-notebooks

A curated list of projects in awesome lists tagged with databricks-notebooks.

https://github.com/microsoft/nutter

Testing framework for Databricks notebooks

azuredevops databricks databricks-notebooks

Last synced: 16 May 2025

https://github.com/jaceklaskowski/learn-databricks

Notebooks to learn Databricks Lakehouse Platform

databricks databricks-notebooks delta-live-tables mlflow

Last synced: 16 Apr 2025

https://github.com/santiagortiiz/advanced-data-engineering-with-databricks

Databricks. Incremental data processing, task orchestration, and production job monitoring.

big-data databricks databricks-notebooks kafka spark spark-streaming streaming

Last synced: 08 Mar 2026

https://github.com/analyticalmonk/pyspark_nlp_workshop

Instructions and code for the workshop "From Big Data to NLP Insights: Unlocking the Power of PySpark and Spark NLP"

databricks databricks-notebooks distributed-computing nlp pyspark spark spark-nlp workshop

Last synced: 05 Oct 2025

https://github.com/fvaleye/delta-buddy

Introducing Delta-Buddy: Your ultimate Delta Lake companion! 🚀 Streamline your data journey with an AI-powered chatbot. Ask Delta-Buddy anything about your Delta Lake.

chromadb data-privacy databricks-notebooks delta-lake dolly langchain llm python

Last synced: 14 Feb 2026

https://github.com/majdi-akrmi/elt-ipl

This is an end-to-end data engineering project that uses the IPL dataset.

apache-spark databricks-notebooks pyspark snowflake snowsql

Last synced: 20 Aug 2025

https://github.com/newrelic-experimental/nri-spark

This New Relic standalone integration polls the Apache Spark REST API for metrics and pushes them into New Relic using the Metrics API. It uses the New Relic Telemetry SDK for Go.

apache-spark databricks databricks-notebooks metrics newrelic nrlabs nrlabs-data nrlabs-odp spark

Last synced: 10 Apr 2025

https://github.com/ac-gomes/data-engineering-with-databricks

A simple boilerplate for data engineering and data analysis training in Databricks.

data-analysis data-engineering databricks databricks-notebooks pyspark python unit-testing

Last synced: 30 Apr 2025

https://github.com/tknishh/olympic-data-analysis-azure

End-to-end data engineering project using Azure Databricks as the cloud service and Tokyo Olympics data

azure-storage databricks-notebooks datafactory de-project olympic-data synapse-analytics

Last synced: 03 Mar 2026

https://github.com/abdelmajidlh/spark-functionality-repo

This GitHub repository contains a detailed document on the basics of the Scala language.

apache apachespark databricks databricks-notebooks pyspark python3 scala spark

Last synced: 11 Feb 2026

https://github.com/nhsdigital/sde_example_analysis

Example of what you can do in Databricks in the Secure Data Environment (SDE) using Python, SQL, and R.

data-analysis data-science databricks-notebooks machine-learning mlflow

Last synced: 25 Oct 2025

https://github.com/hjh17/dbloy

Continuous delivery tool for PySpark notebook-based jobs on Databricks

ci-cd cli databricks databricks-notebooks pyspark pyspark-notebook python3

Last synced: 01 Apr 2026

https://github.com/easonlai/databricks_delta_table_samples

This is a code sample repository for demonstrating how to perform Databricks Delta Table operations.

databricks databricks-notebooks delta delta-lake deltalake pyspark pyspark-notebook python

Last synced: 09 Jul 2025

https://github.com/marvinbuss/small_samples

Small samples from daily work.

databricks databricks-notebooks samples

Last synced: 04 Apr 2025

https://github.com/mensenvau/data_migration_validation

Data Validation Documentation for Source and Target Tables in Databricks

data-migration data-validation databricks-notebooks

Last synced: 17 Jun 2025

https://github.com/ajaxbarcelonacruyff/databricks_bigquery

Extract BigQuery tables in Databricks Notebook

bigquery databricks databricks-notebooks ga4 googleanalytics

Last synced: 17 Apr 2026

https://github.com/shogunbanik18/budgetify

End-to-End Budget Analysis enables effective budgeting through detailed analysis and strategic planning

analysis data data-engineering data-exploration databricks databricks-notebooks etl etl-process python3

Last synced: 09 Jun 2026

https://github.com/elastacloud/databricks-dotnet-rest-sdk

An SDK for the Databricks REST API in dotnet

databricks databricks-notebooks dotnet dotnet-library

Last synced: 16 Apr 2026

https://github.com/edisedis777/pyspark-ml-features

A PySpark implementation of 6 lesser-known Scikit-Learn features optimized for Azure Databricks. This project translates powerful machine learning techniques from Scikit-Learn into PySpark's distributed computing framework.

azure databricks databricks-notebooks large-scale machine-learning pyspark python scikit-learn scikitlearn-machine-learning

Last synced: 13 Apr 2026

https://github.com/sumit-sinha9/ipl-data-analysis-using-apache-spark-on-databricks

This project focuses on performing an end-to-end analysis of IPL data using Apache Spark on Databricks. It begins with setting up a Databricks environment, followed by ingesting and exploring the IPL dataset.

apache-spark aws-s3 databricks-notebooks python

Last synced: 09 May 2026

https://github.com/nasirkadri2601/live_cricket_data_pipeline

This project captures live cricket data in raw JSON format, cleans and transforms it, and stores it in a centralized data warehouse. The data is then used for analysis, including match outcome predictions, player performance, and team strategy insights, enabling data-driven decisions.

airflow-docker apache-spark aws-lambda aws-s3 databricks-notebooks python snowflake

Last synced: 02 May 2026

https://github.com/aaryan-agr/2015-yellow-taxi-data-analysis

Exploring NYC Yellow Cab trips through comprehensive EDA techniques to uncover usage patterns and insights.

apache-spark big-data databricks-notebooks

Last synced: 16 May 2026

https://github.com/retkowsky/workshop_azuredatabricks_-_azuremlservice

Azure Databricks notebooks with Azure ML service

azure databricks-notebooks microsoft python

Last synced: 18 May 2026

https://github.com/hsm207/grab-safety

My submission for Grab AI for S.E.A. challenge

databricks databricks-notebooks pyspark spark-ml telematics

Last synced: 16 Jun 2025

https://github.com/rhejos/ipl_data_analysis

This project explores data analysis of the Indian Premier League using AWS S3, Apache Spark, Python, and SQL.

apache-spark aws-s3 databricks-notebooks pyspark sql

Last synced: 07 Mar 2026

https://github.com/rajeev11256/ipl-data-analysis-project-using-apache-spark-on-databricks

This project focuses on performing an end-to-end analysis of IPL data using Apache Spark on Databricks. It begins with setting up a Databricks environment, followed by ingesting and exploring the IPL dataset.

data-engineering databricks-notebooks pyspark python

Last synced: 09 Apr 2025

https://github.com/jotstolu/car-sales-end-to-end-data-engineering-project-using-azure-databricks

This project presents a scalable end-to-end data pipeline designed for processing and analysing car sales data using the Azure Cloud and Databricks ecosystem.

azurecloud azuredatabricks azuredatafactory azuredatalakegen2 azuresqldb databricks databricks-notebooks delta-lake delta-lake-table dlt etl-pipeline pyspark pyspark-notebook unitycatalog

Last synced: 06 Feb 2026

https://github.com/jotstolu/retail-orders-end-to-end-data-engineering-project-using-azure-databricks

This project demonstrates the development of a scalable end-to-end data pipeline for processing and reporting retail order data using Azure Cloud services, Delta Lake architecture, and Databricks.

azure-devops azurecloud azuredatabricks azuredatafactory bigdataanalytics databricks databricks-notebooks delta-lake medallion-architecture pyspark pyspark-notebook starschema unitycatalog

Last synced: 20 Feb 2026

https://github.com/thaitechtales/databricks

This repository is dedicated to showcasing projects built on Databricks, focusing on big data analytics, data engineering, and machine learning workflows.

apache-spark big-data cloud-data-platform data-analytics data-engineering databricks databricks-notebooks etl machine-learning

Last synced: 18 Apr 2026

https://github.com/nabojyoti/elt-ipl

This is an end-to-end data engineering project that uses the IPL dataset.

apache-spark databricks-notebooks pyspark snowflake snowsql

Last synced: 16 Jan 2026

https://github.com/bhavanachitragar/ipl-data-analysis-project-using-apache-spark-on-databricks

This project focuses on performing an end-to-end analysis of IPL data using Apache Spark on Databricks. It begins with setting up a Databricks environment, followed by ingesting and exploring the IPL dataset.

apache-spark aws-s3 databricks-notebooks python

Last synced: 03 Feb 2026

https://github.com/vandanabhumireddygari/open-table-formats-with-databricks-and-delta-lake

This project demonstrates the use of Open Table Formats with Databricks, PySpark, and Delta Lake. It covers data ingestion, transformation, querying, and storage management using Delta tables. The project includes code for loading data, writing it to Delta format, querying, and utilizing Delta Lake

databricks-notebooks opentableformat pyspark-notebook python

Last synced: 04 May 2026

https://github.com/rajeev11256/flipkart-data-analysis-using-pyspark-on-databricks

The project focuses on building an end-to-end data engineering pipeline using PySpark to address real-world business scenarios. Key steps include exploring and understanding the dataset structure, performing data cleaning to handle inconsistencies, and applying transformations to prepare the data for analysis.

data-engineering databricks-notebooks pyspark python

Last synced: 09 Apr 2025

https://github.com/prateekmaj21/big-data-engineering

Code files for Databricks

big-data databricks-notebooks

Last synced: 11 Oct 2025

https://github.com/mananabbasi/data-science-complete-project-using-big-data-tools-techniques-

This repository contains Databricks projects utilizing RDDs, DataFrames, and SQL to process and analyze various real-world datasets. Data cleaning and analysis have been performed using PySpark functions to handle challenges such as inconsistent formats, missing values, and complex data structures. The project ensures efficient data transformation

azure databricks databricks-industry-solutions databricks-notebooks dataframe pyspark-mllib pyspark-notebook pyspark-python python-script rdd

Last synced: 23 Jan 2026

https://github.com/bhavanachitragar/flipkart-data-analysis-using-pyspark-on-databricks

The project focuses on building an end-to-end data engineering pipeline using PySpark to address real-world business scenarios. Key steps include exploring and understanding the dataset structure, performing data cleaning to handle inconsistencies, and applying transformations to prepare the data for analysis.

data-engineering databricks-notebooks pyspark python

Last synced: 27 Apr 2026

https://github.com/bytebyrajeev/ipl-data-analysis-project-using-apache-spark-on-databricks

This project focuses on performing an end-to-end analysis of IPL data using Apache Spark on Databricks. It begins with setting up a Databricks environment, followed by ingesting and exploring the IPL dataset.

data-engineering databricks-notebooks pyspark python

Last synced: 29 Apr 2026

https://github.com/bytebyrajeev/flipkart-data-analysis-using-pyspark-on-databricks

The project focuses on building an end-to-end data engineering pipeline using PySpark to address real-world business scenarios. Key steps include exploring and understanding the dataset structure, performing data cleaning to handle inconsistencies, and applying transformations to prepare the data for analysis.

data-engineering databricks-notebooks pyspark python

Last synced: 30 Apr 2026

https://github.com/arnoldchrisoduor1/linearregression-model-with-apachespark-and-databricks

Using PySpark on Databricks, I performed feature engineering on customer data, then trained and used a linear regression model to predict customers' bills based on previous trends.

apache-spark databricks-notebooks linear-regression pyspark python3 vectorassembler

Last synced: 01 May 2026

https://github.com/seifo321/microsoft-data-engineer-project

Leveraging Microsoft Azure services to develop a high-performance ETL pipeline that extracts and transforms the BikeStores data and loads it into an Azure data warehouse.

azure azuresynapseanalytics databricks-notebooks dataengineering etl-automation etl-pipeline machine-learning predective-modeling sqlserver

Last synced: 07 May 2026