An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with databricks

A curated list of projects in awesome lists tagged with databricks.

https://github.com/getredash/redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

analytics athena bi bigquery business-intelligence dashboard databricks hacktoberfest javascript mysql postgresql python redash redshift spark spark-sql visualization

Last synced: 16 Dec 2025

https://github.com/cube-js/cube

📊 Cube’s universal semantic layer platform is the next evolution of OLAP technology for AI, BI, spreadsheets, and embedded analytics

analytics bigquery cube databricks headless-bi hive microservice mysql postgresql presto rust semantic-layer serverless snowflake sql

Last synced: 09 Sep 2025

https://github.com/cube-js/cube.js

📊 Cube — Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics

analytics bigquery cube databricks headless-bi hive microservice mysql postgresql presto rust semantic-layer serverless snowflake sql

Last synced: 19 Mar 2025

https://github.com/tencent/apijson

🏆 Real-Time coding-free, powerful and secure ORM 🚀 providing APIs and Docs without coding by Backend, and the returned JSON of API can be customized by Frontend (Client) users

baas clickhouse crud databricks elasticsearch hadoop hive influxdb low-code lowcode milvus nocode oracle postgresql postgresql-database serverless snowflake sqlserver tdengine tidb

Last synced: 13 May 2025

https://github.com/Tencent/APIJSON

🏆 Real-Time coding-free, powerful and secure ORM 🚀 providing APIs and Docs without coding by Backend, and the returned JSON of API can be customized by Frontend (Client) users

baas clickhouse crud databricks elasticsearch hadoop hive influxdb low-code lowcode milvus nocode oracle postgresql postgresql-database serverless snowflake sqlserver tdengine tidb

Last synced: 01 Apr 2025

https://github.com/databrickslabs/dolly

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

chatbot databricks dolly gpt

Last synced: 15 Mar 2025

https://github.com/delta-io/delta-rs

A native Rust library for Delta Lake, with bindings into Python

databricks delta delta-lake pandas pandas-dataframe python rust

Last synced: 13 May 2025

https://github.com/databricks/dbrx

Code examples and resources for DBRX, a large language model developed by Databricks

databricks gen-ai generative-ai llm llm-inference llm-training mosaic-ai

Last synced: 25 Oct 2025

https://github.com/Azure-Samples/modern-data-warehouse-dataops

DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo

automatedtesting azure cicd data databricks datafactory dataops devops fabric

Last synced: 30 Jul 2025

https://github.com/azure-samples/modern-data-warehouse-dataops

DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo

automatedtesting azure cicd data databricks datafactory dataops devops fabric

Last synced: 14 May 2025

https://github.com/databricks/mlops-stacks

This repo provides a customizable stack for starting new ML projects on Databricks that follow production best-practices out of the box.

databricks machine-learning mlops

Last synced: 15 May 2025

https://github.com/databrickslabs/dbx

🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.

ci cicd databricks databricks-api databricks-cli mlops

Last synced: 29 Apr 2025

https://github.com/databricks/databricks-sdk-py

Databricks SDK for Python (Beta)

databricks databricks-sdk python

Last synced: 14 May 2025

https://github.com/databrickslabs/dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines

data-generation databricks datagen datageneration datagenerator delta-live-tables deltalake faker pyspark python spark spark-streaming synthetic-data

Last synced: 07 Jul 2025

https://github.com/thoughtworks/mlops-platforms

Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

azureml data-science databricks dataiku datarobot google-ai-platform h2oai iguazio knime kubeflow machine-learning mlflow mlops pachyderm sagemaker seldon

Last synced: 07 Aug 2025

https://github.com/microsoft/nutter

Testing framework for Databricks notebooks

azuredevops databricks databricks-notebooks

Last synced: 16 May 2025

https://github.com/databricks/dbt-databricks

A dbt adapter for Databricks.

databricks dbt etl sql

Last synced: 14 May 2025

https://github.com/databricks/terraform-databricks-examples

Examples of using Terraform to deploy Databricks resources

aws azure databricks databricks-module gcp lakehouse terraform terraform-module

Last synced: 16 May 2025

https://github.com/databrickslabs/dqx

Databricks framework to validate Data Quality of PySpark DataFrames

data-profiling data-quality data-quality-checks data-quality-monitoring databricks dlt spark spark-streaming

Last synced: 08 Apr 2025

https://github.com/adidas/lakehouse-engine

The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

big-data configuration-driven data-engineering data-quality databricks delta-lake framework great-expectations lakehouse spark

Last synced: 12 Apr 2025

https://github.com/databrickslabs/overwatch

Capture deep metrics on one or all assets within a Databricks workspace

databricks monitoring

Last synced: 07 Jul 2025

https://github.com/databrickslabs/ucx

Automated migrations to Unity Catalog

databricks databricks-cli-installable unity-catalog

Last synced: 29 Apr 2025

https://github.com/databrickslabs/cicd-templates

Manage your Databricks deployments and CI with code.

aws azure azure-devops cd-pipeline ci databricks github-actions gitlab mlops

Last synced: 10 May 2025

https://github.com/cartodb/analytics-toolbox-core

A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities

analytics-toolbox bigquery carto databricks geospatial gis postgres redshift snowflake sql

Last synced: 10 Oct 2025

https://github.com/databricks/databricks-sql-python

Databricks SQL Connector for Python

databricks dwh python3 sql

Last synced: 11 Apr 2025

https://github.com/databrickslabs/dlt-meta

Metadata-driven Databricks Delta Live Tables framework for bronze/silver pipelines

databricks dlt meta-programming python

Last synced: 29 Apr 2025

https://github.com/lamastex/scalable-data-science

Scalable Data Science: course sets in big data using Apache Spark over Databricks, and their mathematical, statistical and computational foundations using SageMath.

apache-spark data-science databricks scala

Last synced: 16 May 2025

https://github.com/aloneguid/stowage

Bloat-free, no BS cloud storage SDK.

aws-s3 azure-storage databricks gcp-storage

Last synced: 08 Apr 2025

https://github.com/buremba/universql

The bridge to effortless multi-engine data applications, currently supports Snowflake ❄️ and DuckDB 🦆

databricks dbt duckdb proxy-server snowflake sql sql-proxy sqlglot

Last synced: 12 Apr 2025

https://github.com/yokawasa/databricks-notebooks

Collection of Sample Databricks Spark Notebooks (mostly for Azure Databricks)

azure azuredatabricks databricks elt python spark streaming

Last synced: 19 Jun 2025

https://github.com/databrickslabs/jupyterlab-integration

DEPRECATED: Integrating Jupyter with Databricks via SSH

databricks databricks-api databricks-deploy jupyter jupyter-notebook

Last synced: 07 Oct 2025

https://github.com/alexott/databricks-playground

Code samples, etc. for Databricks

databricks pyspark

Last synced: 09 Sep 2025

https://github.com/mullerpeter/databricks-grafana

Grafana Databricks integration allowing direct connection to Databricks to query and visualize Databricks data in Grafana.

databricks grafana grafana-backend-plugin grafana-datasource grafana-plugin

Last synced: 13 Apr 2025

https://github.com/starlake-ai/jsqltranspiler

Rewrite BigQuery, Redshift, Snowflake and Databricks queries into DuckDB compatible SQL (with deep transformation of functions, data types and format characters) using Java.

abstract-syntax-tree bigquery column databricks duckdb java lineage query redshift resolver rewrite snowflake transpiler

Last synced: 06 Aug 2025

https://github.com/viktoriasemaan/data-engineering

🚀 Advanced Data & AI Engineering Portfolio: Real-world projects and production-ready patterns to level up your AI skills—from building clean data pipelines to deploying RAG systems, AI agents, and intelligent dashboards.

ai data-engineering databricks spark

Last synced: 10 Oct 2025

https://github.com/databrickslabs/sandbox

Experimental labs projects

databricks databricks-api databricks-sdk

Last synced: 26 Oct 2025

https://github.com/souvik-databricks/dlt-with-debug

A lightweight helper utility which allows developers to do interactive pipeline development by having a unified source code for both DLT run and Non-DLT interactive notebook run.

big-data big-data-processing databricks delta-live-tables dlt etl etl-pipeline python3 spark

Last synced: 10 Sep 2025

https://github.com/databricks/unity-catalog-setup

Notebooks, terraform, tools to enable setting up Unity Catalog

databricks unity-catalog

Last synced: 13 Jul 2025

https://github.com/databricks/databricks-sql-go

Golang database/sql driver for Databricks SQL.

databricks dwh golang golang-library sql

Last synced: 16 May 2025

https://github.com/tatevkaren/free-resources-books-papers

Books and Papers in Mathematics, Econometrics, Machine Learning, Finance, etc. for different levels that can be useful for Data Scientists, Developers and everyone who is interested in STEM.

books data-science databricks delta-lake developers econometrics free-books free-resources machine-learning mathematics statistics

Last synced: 19 Oct 2025

https://github.com/databrickslabs/delta-oms

DeltaOMS is a solution that helps build a centralized repository of Delta Transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog is supported in the v0.7.0-rc1 release. Documentation: https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/

centralized databricks delta delta-lake lakehouse metrics monitoring

Last synced: 07 Jul 2025

https://github.com/datamole-ai/pysparkdt

An open-source Python library for simplifying local testing of Databricks workflows that use PySpark and Delta tables.

databricks delta delta-tables pipelines pyspark pytest python testing workflows

Last synced: 15 Oct 2025

https://github.com/getstrm/pace

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.

bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake

Last synced: 13 Oct 2025

https://github.com/renardeinside/databricks-streamlit-demo

Demo of Streamlit application with Databricks SQL Endpoint

databricks streamlit visualization

Last synced: 23 Apr 2025

https://github.com/alexott/dlt-files-in-repos-demo

Demonstration of using Files in Repos with Databricks Delta Live Tables

ci-cd databricks delta-live-tables devops unit-testing

Last synced: 16 Aug 2025

https://github.com/airscholar/modern-data-eng-dbt-databricks-azure

This project sets up end-to-end data engineering using Apache Spark, Azure Databricks, and Data Build Tool (DBT), with Azure as the cloud provider.

apache-spark azure databricks dbt modern-data-engineering

Last synced: 10 Apr 2025

https://github.com/databricks/databricks-sql-nodejs

Databricks SQL Connector for Node.js

databricks dwh node node-js nodejs sql

Last synced: 04 Apr 2025

https://github.com/renardeinside/databricks-uc-semantic-layer

Using OpenAI with Databricks SQL for queries in natural language

databricks databricks-sql openai sql

Last synced: 23 Apr 2025

https://github.com/databrickslabs/pylint-plugin

Databricks Plugin for PyLint

databricks pylint-plugin python

Last synced: 07 Jul 2025

https://github.com/jaceklaskowski/learn-databricks

Notebooks to learn Databricks Lakehouse Platform

databricks databricks-notebooks delta-live-tables mlflow

Last synced: 16 Apr 2025

https://github.com/databrickslabs/databricks-sdk-r

Databricks SDK for R (Experimental)

data-science databricks r sdk

Last synced: 27 Oct 2025

https://github.com/nhsdigital/artificial-data-generator

Pipelines for generating large volumes of anonymous artificial data that share some of the characteristics of real NHS data

artificial baseline-rap databricks hospital-episode-statistics nhs not-optimised-for-reuse pyspark python

Last synced: 12 Apr 2025

https://github.com/adampaternostro/azure-databricks-log4j-to-appinsights

Connect your Spark Databricks clusters Log4J output to the Application Insights Appender

application-insights azure-monitor databricks

Last synced: 16 Oct 2025

https://github.com/santiagortiiz/advanced-data-engineering-with-databricks

Databricks. Incremental data processing, task orchestration, and production job monitoring.

big-data databricks databricks-notebooks kafka spark spark-streaming streaming

Last synced: 15 Apr 2025

https://github.com/mach-kernel/databricks-kube-operator

A Kubernetes operator to enable GitOps style deploys for Databricks resources

ci cicd databricks gitops helm kubernetes operators rust spark

Last synced: 26 Apr 2025

https://github.com/mlverse/pysparklyr

Extension to {sparklyr} that allows you to interact with Spark & Databricks Connect

databricks pyspark r spark spark-connect

Last synced: 14 Jul 2025

https://github.com/databrickslabs/lsql

Lightweight SQL execution wrapper only on top of Databricks SDK

databricks databricks-sdk databricks-sql

Last synced: 07 Jul 2025

https://github.com/getyourguide/db-rocket

Keep your local Python scripts installed and in sync with a Databricks notebook. Shortens the feedback loop to develop projects using a hybrid environment.

data-science databricks productivity python

Last synced: 11 Apr 2025

https://github.com/hashload/freeza-offset

Commits Spark stream consumption offsets to a Kafka consumer group

databricks kafka kafka-commit kafka-offset-commits spark spark-streaming

Last synced: 20 Jul 2025

https://github.com/renardeinside/pyspark-logging-examples

Writing PySpark logs in Apache Spark and Databricks

apache-spark databricks log4j logging logs

Last synced: 19 Jul 2025

https://github.com/bluegranite/databrickstraining

Repository for Microsoft Databricks Training Events - Hosted by BlueGranite

apache-spark azure azure-databricks databricks distributed-computing machine-learning pyspark spark spark-streaming

Last synced: 13 May 2025

https://github.com/renardeinside/e2e-mlops-demo

E2E MLOps with Databricks

azure databricks hyperopt mlops

Last synced: 23 Apr 2025

https://github.com/cartodb/poc-databricks

CARTO Analytics Toolbox for Databricks provides geospatial functionality leveraging the Geomesa SparkSQL capabilities.

databricks geospatial gis location

Last synced: 12 Apr 2025

https://github.com/azure/employee-retention-databricks-kubernetes-poc

End-to-end proof of concept showing core MLOps practices to develop, deploy and monitor a machine learning model for an employee retention workload using Databricks and Kubernetes on Microsoft Azure.

azure databricks github-actions kubernetes machine-learning mlflow

Last synced: 09 Apr 2025

https://github.com/benc-uk/batcomputer

A working example of DevOps & operationalisation applied to Machine Learning and AI

api-wrapper azure azure-devops databricks docker kubernetes machine-learning

Last synced: 10 Mar 2025

https://github.com/analyticalmonk/pyspark_nlp_workshop

Instructions and code for the workshop "From Big Data to NLP Insights: Unlocking the Power of PySpark and Spark NLP"

databricks databricks-notebooks distributed-computing nlp pyspark spark spark-nlp workshop

Last synced: 05 Oct 2025

https://github.com/ajaen4/terraform-databricks-aws

Terraform repository to deploy a fully functioning Databricks environment on top of AWS. Deploys all Databricks and AWS resources.

aws databricks terraform

Last synced: 26 Sep 2025

https://github.com/tomarv2/terraform-databricks-workspace-management

Terraform module for Databricks Workspace Management: https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/guides/workspace-management

databricks databricks-deploy databricks-workspace databricks-workspace-management notebook terraform terraform-module

Last synced: 06 Nov 2025

https://github.com/renardeinside/chatten

RAG application (backend & frontend) with source retrieval and highlighting on the Databricks Platform

dash databricks python rag vector-search

Last synced: 05 May 2025

https://github.com/adampaternostro/azure-app-insights-distrubuted-tracing

How to use Application Insights to do distributed tracing through a Web App, REST API, Function App, Service Bus, Databricks and Data Factory.

application-insights azure azure-data-factory azure-functions databricks monitoring service-bus

Last synced: 08 Sep 2025