Projects in Awesome Lists by databrickslabs
A curated list of projects by databrickslabs that appear in awesome lists.
https://github.com/databrickslabs/dolly
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
Last synced: 15 Mar 2025
https://github.com/databrickslabs/dbx
🧱 Databricks CLI eXtensions, aka dbx, is a CLI tool for development and advanced Databricks workflow management.
ci cicd databricks databricks-api databricks-cli mlops
Last synced: 29 Apr 2025
https://github.com/databrickslabs/dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
data-generation databricks datagen datageneration datagenerator delta-live-tables deltalake faker pyspark python spark spark-streaming synthetic-data
Last synced: 07 Jul 2025
https://github.com/databrickslabs/dqx
Databricks framework to validate the data quality of PySpark DataFrames and tables
data-profiling data-quality data-quality-monitoring databricks lakeflow spark spark-streaming unity-catalog
Last synced: 01 Apr 2026
https://github.com/databrickslabs/tempo
API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc.), AS OF joins, downsampling, and interpolation
data-analysis data-science pandas python scala time-series timeseries timeseries-analysis timeseries-data
Last synced: 29 Apr 2025
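The AS OF join mentioned in the tempo entry attaches to each row of one time series the most recent row of another series whose timestamp is at or before it. As a minimal, single-machine sketch of the idea only (plain Python, not tempo's API; the function name `as_of_join` and the sample data are hypothetical), it can be expressed with a binary search over the sorted right-hand timestamps:

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical representation: each series is a list of (timestamp, value) pairs sorted by timestamp.
Series = List[Tuple[int, float]]


def as_of_join(left: Series, right: Series) -> List[Tuple[int, float, Optional[float]]]:
    """For each left row, attach the latest right value whose timestamp is <= the left timestamp."""
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, value in left:
        # Index of the first right timestamp strictly greater than ts.
        idx = bisect_right(right_ts, ts)
        # The row just before that index is the most recent one at or before ts, if any exists.
        match = right[idx - 1][1] if idx > 0 else None
        joined.append((ts, value, match))
    return joined


trades = [(1, 100.0), (5, 101.5), (9, 99.0)]
quotes = [(0, 10.0), (4, 10.5), (9, 11.0)]
print(as_of_join(trades, quotes))
# → [(1, 100.0, 10.0), (5, 101.5, 10.5), (9, 99.0, 11.0)]
```

Tempo runs this kind of join over Spark DataFrames at scale; the sketch ignores partitioning by series key and any tolerance window, which a production implementation would typically need.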
https://github.com/databrickslabs/overwatch
Capture deep metrics on one or all assets within a Databricks workspace
Last synced: 07 Jul 2025
https://github.com/databrickslabs/ucx
Automated migrations to Unity Catalog
databricks databricks-cli-installable unity-catalog
Last synced: 29 Apr 2025
https://github.com/databrickslabs/cicd-templates
Manage your Databricks deployments and CI with code.
aws azure azure-devops cd-pipeline ci databricks github-actions gitlab mlops
Last synced: 10 May 2025
https://github.com/databrickslabs/automl-toolkit
Toolkit for Apache Spark ML covering feature clean-up, a feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
apache-spark feature-engineering machinelearning ml pyspark scala spark
Last synced: 03 Oct 2025
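SMOTE (Synthetic Minority Over-sampling Technique), listed in the automl-toolkit entry, creates new minority-class samples by picking a minority point, choosing one of its k nearest minority neighbors, and interpolating a random fraction of the way between the two. The following is a minimal single-machine sketch of that classic technique in plain Python; it is not the toolkit's distributed implementation, and the function name `smote` and its parameters are hypothetical. It assumes at least two minority points.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_synthetic: int, k: int = 3, seed: int = 0) -> List[Point]:
    """Generate synthetic minority samples by interpolating toward random nearest neighbors."""
    rng = random.Random(seed)  # seeded so results are reproducible
    synthetic = []
    for _ in range(n_synthetic):
        base = minority[rng.randrange(len(minority))]
        # The k nearest other minority points to `base` by Euclidean distance.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point on the segment from base toward neighbor.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


minority = [[1.0, 1.0], [2.0, 2.0], [1.5, 3.0], [3.0, 1.0]]
samples = smote(minority, 5)
print(len(samples))  # → 5
```

A distributed version would need to parallelize the nearest-neighbor search, which is the expensive step here (this sketch re-sorts all minority points for every generated sample).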
https://github.com/databrickslabs/dlt-meta
Metadata driven Databricks Delta Live Tables framework for bronze/silver pipelines
databricks dlt meta-programming python
Last synced: 29 Apr 2025
https://github.com/databrickslabs/lakebridge
Accelerates migrations to Databricks by automating key migration activities
code-analysis code-converter data-validation databricks databricks-cli-installable reconciliation transpiler
Last synced: 13 Jun 2026
https://github.com/databrickslabs/dataframe-rules-engine
Extensible Rules Engine for custom Dataframe / Dataset validation
Last synced: 07 Jul 2025
https://github.com/databrickslabs/discoverx
A Swiss-Army-knife for your Data Intelligence platform administration.
data-retrieval multi-table-operations pii-detection scanning semantic-classification
Last synced: 07 Jul 2025
https://github.com/databrickslabs/ontos
Business Semantics for Unity Catalog
Last synced: 18 Jan 2026
https://github.com/databrickslabs/geoscan
Geospatial clustering at massive scale
Last synced: 14 Jul 2025
https://github.com/databrickslabs/pytester
Python Testing for Databricks
databricks databricks-sdk pytest pytest-plugin python3
Last synced: 07 Jul 2025
https://github.com/databrickslabs/jupyterlab-integration
DEPRECATED: Integrating Jupyter with Databricks via SSH
databricks databricks-api databricks-deploy jupyter jupyter-notebook
Last synced: 07 Oct 2025
https://github.com/databrickslabs/blueprint
Baseline for Databricks Labs projects written in Python
cli databricks databricks-cli-installable python
Last synced: 09 Mar 2026
https://github.com/databrickslabs/feature-factory
Accelerator to rapidly deploy customized features for your business
Last synced: 07 Jul 2025
https://github.com/databrickslabs/sandbox
Experimental labs projects
databricks databricks-api databricks-sdk
Last synced: 26 Oct 2025
https://github.com/databrickslabs/databricks-sync
An experimental tool to synchronize a source Databricks deployment with a target Databricks deployment.
Last synced: 02 Aug 2025
https://github.com/databrickslabs/delta-oms
DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog is supported in the v0.7.0-rc1 release. Documentation: https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/
centralized databricks delta delta-lake lakehouse metrics monitoring
Last synced: 07 Jul 2025
https://github.com/databrickslabs/pylint-plugin
Databricks Plugin for PyLint
databricks pylint-plugin python
Last synced: 07 Jul 2025
https://github.com/databrickslabs/arcuate
Delta Sharing + MLflow for ML model & experiment exchange (arcuate delta: a fan-shaped river delta)
big-data data-sharing delta-sharing mlflow spark
Last synced: 28 Jun 2025
https://github.com/databrickslabs/databricks-sdk-r
Databricks SDK for R (Experimental)
Last synced: 27 Oct 2025
https://github.com/databrickslabs/lsql
Lightweight SQL execution wrapper built solely on top of the Databricks SDK
databricks databricks-sdk databricks-sql
Last synced: 07 Jul 2025
https://github.com/databrickslabs/impulse
Large-scale time-series measurement data analytics on Apache Spark
analytics automotive databricks iot manufacturing measurement-data sensor-data spark telemetry time-series
Last synced: 01 Jun 2026
https://github.com/databrickslabs/waterbear
Automated provisioning of an industry Lakehouse with enterprise data model
data-model databricks delta-lake lakehouse python spark sql
Last synced: 07 Jul 2025
https://github.com/databrickslabs/ontobricks
OntoBricks is a web application that transforms Databricks tables into a materialized knowledge graph. It lets you design ontologies (OWL), map them to Unity Catalog tables via R2RML, materialize triples into a Delta triple store and graph DB, reason over the graph (OWL 2 RL, SWRL, SHACL), and query it through an auto-generated GraphQL API + MCP
graph ontology owl triple-store
Last synced: 29 Apr 2026
https://github.com/databrickslabs/geobrix
GeoBrix is a high-performance spatial processing library.
databricks geospatial grid-system raster spark vector
Last synced: 18 Apr 2026
https://github.com/databrickslabs/meta-conversions-api-app
A companion Databricks App for the Meta Conversions API marketplace listing. Provides a guided setup experience for connecting your Databricks lakehouse to Meta's Conversions API (CAPI).
Last synced: 29 Apr 2026