An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by databrickslabs

A curated list of projects in awesome lists by databrickslabs.

https://github.com/databrickslabs/dolly

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

chatbot databricks dolly gpt

Last synced: 15 Mar 2025

https://github.com/databrickslabs/dbx

🧱 Databricks CLI eXtensions (aka dbx) is a CLI tool for development and advanced Databricks workflow management.

ci cicd databricks databricks-api databricks-cli mlops

Last synced: 29 Apr 2025

https://github.com/databrickslabs/dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.

data-generation databricks datagen datageneration datagenerator delta-live-tables deltalake faker pyspark python spark spark-streaming synthetic-data

Last synced: 07 Jul 2025

https://github.com/databrickslabs/dqx

Databricks framework to validate the data quality of PySpark DataFrames and tables

data-profiling data-quality data-quality-monitoring databricks lakeflow spark spark-streaming unity-catalog

Last synced: 01 Apr 2026

https://github.com/databrickslabs/tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc.), AS OF joins, downsampling, and interpolation

data-analysis data-science pandas python scala time-series timeseries timeseries-analysis timeseries-data

Last synced: 29 Apr 2025

https://github.com/databrickslabs/overwatch

Capture deep metrics on one or all assets within a Databricks workspace

databricks monitoring

Last synced: 07 Jul 2025

https://github.com/databrickslabs/ucx

Automated migrations to Unity Catalog

databricks databricks-cli-installable unity-catalog

Last synced: 29 Apr 2025

https://github.com/databrickslabs/cicd-templates

Manage your Databricks deployments and CI with code.

aws azure azure-devops cd-pipeline ci databricks github-actions gitlab mlops

Last synced: 10 May 2025

https://github.com/databrickslabs/automl-toolkit

Toolkit for Apache Spark ML: feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.

apache-spark feature-engineering machinelearning ml pyspark scala spark

Last synced: 03 Oct 2025

https://github.com/databrickslabs/dlt-meta

Metadata-driven Databricks Delta Live Tables framework for bronze/silver pipelines

databricks dlt meta-programming python

Last synced: 29 Apr 2025

https://github.com/databrickslabs/lakebridge

Accelerates migrations to Databricks by automating key migration activities

code-analysis code-converter data-validation databricks databricks-cli-installable reconciliation transpiler

Last synced: 13 Jun 2026

https://github.com/databrickslabs/dataframe-rules-engine

Extensible Rules Engine for custom Dataframe / Dataset validation

Last synced: 07 Jul 2025

https://github.com/databrickslabs/discoverx

A Swiss-Army-knife for your Data Intelligence platform administration.

data-retrieval multi-table-operations pii-detection scanning semantic-classification

Last synced: 07 Jul 2025

https://github.com/databrickslabs/ontos

Business Semantics for Unity Catalog

Last synced: 18 Jan 2026

https://github.com/databrickslabs/geoscan

Geospatial clustering at massive scale

clustering library spark-ml

Last synced: 14 Jul 2025

https://github.com/databrickslabs/jupyterlab-integration

DEPRECATED: Integrating Jupyter with Databricks via SSH

databricks databricks-api databricks-deploy jupyter jupyter-notebook

Last synced: 07 Oct 2025

https://github.com/databrickslabs/brickster

R Toolkit for Databricks

Last synced: 07 Jul 2025

https://github.com/databrickslabs/blueprint

Baseline for Databricks Labs projects written in Python

cli databricks databricks-cli-installable python

Last synced: 09 Mar 2026

https://github.com/databrickslabs/feature-factory

Accelerator to rapidly deploy customized features for your business

Last synced: 07 Jul 2025

https://github.com/databrickslabs/sandbox

Experimental labs projects

databricks databricks-api databricks-sdk

Last synced: 26 Oct 2025

https://github.com/databrickslabs/databricks-sync

An experimental tool to synchronize source Databricks deployment with a target Databricks deployment.

Last synced: 02 Aug 2025

https://github.com/databrickslabs/transpiler

SIEM-to-Spark Transpiler

Last synced: 07 Jul 2025

https://github.com/databrickslabs/delta-oms

DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog is supported in the v0.7.0-rc1 release. Documentation: https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/

centralized databricks delta delta-lake lakehouse metrics monitoring

Last synced: 07 Jul 2025

https://github.com/databrickslabs/mcp

Last synced: 07 Jul 2025

https://github.com/databrickslabs/pylint-plugin

Databricks Plugin for PyLint

databricks pylint-plugin python

Last synced: 07 Jul 2025

https://github.com/databrickslabs/arcuate

Delta Sharing + MLflow for ML model & experiment exchange (arcuate delta: a fan-shaped river delta)

big-data data-sharing delta-sharing mlflow spark

Last synced: 28 Jun 2025

https://github.com/databrickslabs/databricks-sdk-r

Databricks SDK for R (Experimental)

data-science databricks r sdk

Last synced: 27 Oct 2025

https://github.com/databrickslabs/lsql

Lightweight SQL execution wrapper built only on top of the Databricks SDK

databricks databricks-sdk databricks-sql

Last synced: 07 Jul 2025

https://github.com/databrickslabs/impulse

Large-scale time-series measurement data analytics on Apache Spark

analytics automotive databricks iot manufacturing measurement-data sensor-data spark telemetry time-series

Last synced: 01 Jun 2026

https://github.com/databrickslabs/waterbear

Automated provisioning of an industry Lakehouse with enterprise data model

data-model databricks delta-lake lakehouse python spark sql

Last synced: 07 Jul 2025

https://github.com/databrickslabs/ontobricks

OntoBricks is a web application that transforms Databricks tables into a materialized knowledge graph. It lets you design ontologies (OWL), map them to Unity Catalog tables via R2RML, materialize triples into a Delta triple store and graph DB, reason over the graph (OWL 2 RL, SWRL, SHACL), and query it through an auto-generated GraphQL API + MCP

graph ontology owl triple-store

Last synced: 29 Apr 2026

https://github.com/databrickslabs/geobrix

GeoBrix is a high-performance spatial processing library.

databricks geospatial grid-system raster spark vector

Last synced: 18 Apr 2026

https://github.com/databrickslabs/meta-conversions-api-app

A companion Databricks App for the Meta Conversions API marketplace listing. Provides a guided setup experience for connecting your Databricks lakehouse to Meta's Conversions API (CAPI).

Last synced: 29 Apr 2026