An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by databricks

A curated list of projects in awesome lists by databricks.

https://github.com/databricks/learning-spark

Example code from Learning Spark book

Last synced: 14 May 2025

https://github.com/databricks/koalas

Koalas: pandas API on Apache Spark

big-data data-science dataframe mlflow pandas pydata spark

Last synced: 13 May 2025

https://github.com/databricks/spark-the-definitive-guide

Spark: The Definitive Guide's Code Repository

Last synced: 14 May 2025

https://github.com/databricks/Spark-The-Definitive-Guide

Spark: The Definitive Guide's Code Repository

Last synced: 26 Mar 2025

https://github.com/databricks/scala-style-guide

Databricks Scala Coding Style Guide

Last synced: 14 May 2025

https://github.com/databricks/dbrx

Code examples and resources for DBRX, a large language model developed by Databricks

databricks gen-ai generative-ai llm llm-inference llm-training mosaic-ai

Last synced: 25 Oct 2025

https://github.com/databricks/spark-deep-learning

Deep Learning Pipelines for Apache Spark

Last synced: 15 May 2025

https://github.com/databricks/click?tab=readme-ov-file

The "Command Line Interactive Controller for Kubernetes"

cli kubectl kubernetes rust

Last synced: 29 Mar 2025

https://github.com/databricks/click

The "Command Line Interactive Controller for Kubernetes"

cli kubectl kubernetes rust

Last synced: 14 May 2025

https://github.com/databricks/learningsparkv2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

apache-spark delta-lake mlflow mllib spark spark-mllib spark-sql structured-streaming

Last synced: 14 May 2025

https://github.com/databricks/spark-sklearn

(Deprecated) Scikit-learn integration package for Apache Spark

apache-spark grid-search machine-learning parameter-tuning scikit-learn

Last synced: 30 Sep 2025

https://github.com/databricks/spark-csv

CSV Data Source for Apache Spark 1.x

Last synced: 15 May 2025

https://github.com/databricks/tensorframes

[DEPRECATED] TensorFlow wrapper for DataFrames on Apache Spark

Last synced: 15 May 2025

https://github.com/databricks/devrel

This repository contains the notebooks and presentations we use for our Databricks Tech Talks

Last synced: 15 May 2025

https://github.com/databricks/reference-apps

Spark reference applications

Last synced: 16 May 2025

https://github.com/databricks/spark-redshift

Redshift data source for Apache Spark

Last synced: 15 May 2025

https://github.com/databricks/mlops-stacks

This repo provides a customizable stack for starting new ML projects on Databricks that follow production best-practices out of the box.

databricks machine-learning mlops

Last synced: 15 May 2025

https://github.com/databricks/spark-avro

Avro Data Source for Apache Spark

Last synced: 12 May 2025

https://github.com/databricks/databricks-sdk-py

Databricks SDK for Python (Beta)

databricks databricks-sdk python

Last synced: 11 Feb 2026

https://github.com/databricks/spark-xml

XML data source for Spark SQL and DataFrames

Last synced: 25 Mar 2025

https://github.com/databricks/spark-corenlp

Stanford CoreNLP wrapper for Apache Spark

Last synced: 06 Apr 2025

https://github.com/databricks/databricks-cli

(Legacy) Command Line Interface for Databricks

Last synced: 14 May 2025

https://github.com/databricks/spark-perf

Performance tests for Apache Spark

Last synced: 09 Sep 2025

https://github.com/databricks/spark-knowledgebase

Spark Knowledge Base

Last synced: 12 Apr 2025

https://github.com/databricks/sjsonnet

Last synced: 16 May 2025

https://github.com/databricks/dbt-databricks

A dbt adapter for Databricks.

databricks dbt etl sql

Last synced: 14 May 2025

https://github.com/databricks/terraform-databricks-examples

Examples of using Terraform to deploy Databricks resources

aws azure databricks databricks-module gcp lakehouse terraform terraform-module

Last synced: 16 May 2025

https://github.com/databricks/jsonnet-style-guide

Databricks Jsonnet Coding Style Guide

Last synced: 12 Apr 2025

https://github.com/databricks/databricks-sql-python

Databricks SQL Connector for Python

databricks dwh python3 sql

Last synced: 08 Jan 2026

https://github.com/databricks/containers

Sample base images for Databricks Container Services

Last synced: 12 Apr 2025

https://github.com/databricks/sbt-spark-package

Sbt plugin for Spark packages

Last synced: 18 Oct 2025

https://github.com/databricks/databricks-vscode

VS Code extension for Databricks

Last synced: 05 Apr 2025

https://github.com/databricks/bundle-examples

Examples of Databricks Asset Bundles

Last synced: 05 Apr 2025

https://github.com/databricks/notebook-best-practices

An example showing how to apply software engineering best practices to Databricks notebooks.

Last synced: 08 May 2025

https://github.com/databricks/iceberg-rest-image

Simple project to expose a catalog over REST using a Java catalog backend

Last synced: 05 Apr 2025

https://github.com/databricks/terraform-databricks-sra

The Security Reference Architecture (SRA) implements, as Terraform templates, the typical security features deployed by most high-security organizations, and enforces controls for the largest risks that customers ask about most often.

Last synced: 12 Apr 2025

https://github.com/databricks/benchmarks

A place in which we publish scripts for reproducible benchmarks.

Last synced: 18 Sep 2025

https://github.com/databricks/tmm

Last synced: 05 Apr 2025

https://github.com/databricks/mlflow

Open source platform for the machine learning lifecycle

Last synced: 02 May 2025

https://github.com/databricks/spark-tfocs

A Spark port of TFOCS: Templates for First-Order Conic Solvers (cvxr.com/tfocs)

Last synced: 30 Jul 2025

https://github.com/databricks/intellij-jsonnet

IntelliJ Jsonnet Plugin

Last synced: 07 Apr 2025

https://github.com/databricks/terraform-databricks-lakehouse-blueprints

Set of Terraform automation templates and quickstart demos to jumpstart the design of a Lakehouse on Databricks. This project has incorporated best practices across the industries we work with to deliver composable modules to build a workspace to comply with the highest platform security and governance standards.

financial-services hls regulated-industry-blueprints terraform

Last synced: 07 Apr 2025

https://github.com/databricks/sbt-databricks

An sbt plugin for deploying code to Databricks Cloud

Last synced: 07 Apr 2025

https://github.com/databricks/spark-integration-tests

Integration tests for Spark

Last synced: 07 Apr 2025

https://github.com/databricks/spark-pr-dashboard

Dashboard to aid in Spark pull request reviews

Last synced: 07 Apr 2025

https://github.com/databricks/ide-best-practices

Best practices for working with Databricks from an IDE

Last synced: 26 Jun 2025

https://github.com/databricks/unity-catalog-setup

Notebooks, Terraform, and tools to enable setting up Unity Catalog

databricks unity-catalog

Last synced: 13 Jul 2025

https://github.com/databricks/simr

Spark In MapReduce (SIMR) - launching Spark applications on existing Hadoop MapReduce infrastructure

Last synced: 07 Apr 2025

https://github.com/databricks/databricks-sql-go

Golang database/sql driver for Databricks SQL.

databricks dwh golang golang-library sql

Last synced: 16 May 2025

https://github.com/databricks/databricks-sql-cli

CLI for querying Databricks SQL

Last synced: 07 Apr 2025

https://github.com/databricks/diviner

Grouped time series forecasting engine

Last synced: 07 Apr 2025

https://github.com/databricks/tpch-dbgen

Patched version of dbgen

Last synced: 07 Apr 2025

https://github.com/databricks/automl

Last synced: 05 Apr 2025

https://github.com/databricks/databricks-sql-nodejs

Databricks SQL Connector for Node.js

databricks dwh node node-js nodejs sql

Last synced: 04 Apr 2025

https://github.com/databricks/als-benchmark-scripts

Scripts to benchmark distributed Alternating Least Squares (ALS)

Last synced: 07 Apr 2025

https://github.com/databricks/python-interview

Databricks Python interview setup instructions

Last synced: 12 Apr 2025

https://github.com/databricks/spark-package-cmd-tool

A command line tool for Spark packages

Last synced: 07 Apr 2025

https://github.com/databricks/setup-cli

Sets up the Databricks CLI in your GitHub Actions workflow.

Last synced: 18 Oct 2025

https://github.com/databricks/congruity

The goal of this library is to provide a compatibility layer that makes it easier to adopt Spark Connect. The library is designed to be simply imported in your application and will then monkey-patch the existing API to provide the legacy functionality.

Last synced: 07 Apr 2025

https://github.com/databricks/genomics-pipelines

Secondary analysis pipelines parallelized with Apache Spark

bwa gatk genomics mutect2 star

Last synced: 07 Apr 2025

https://github.com/databricks/xgb-regressor

MLflow XGBoost Regressor

Last synced: 07 Apr 2025

https://github.com/databricks/databricks-accelerators

Accelerate the use of Databricks for customers [public repo]

Last synced: 07 Apr 2025

https://github.com/databricks/sqltools-databricks-driver

SQLTools driver for Databricks SQL

Last synced: 26 Jun 2025

https://github.com/databricks/dais-cow-bff

Code for the "Path to Production" DAIS 2024 and 2023 talks

Last synced: 07 Apr 2025

https://github.com/databricks/simple-pipeline

Example pipeline for bit.io

Last synced: 01 Apr 2025

https://github.com/databricks/xgboost-linux64

Databricks Private xgboost Linux64 fork

Last synced: 07 Apr 2025

https://github.com/databricks/jenkins-job-builder

Fork of https://docs.openstack.org/infra/jenkins-job-builder/ to include unmerged patches

Last synced: 29 Oct 2025

https://github.com/databricks/terraform-databricks-mlops-aws-project

This module creates and configures service principals with appropriate permissions and entitlements to run CI/CD for a project, and creates a workspace directory as a container for project-specific resources for the Databricks AWS staging and prod workspaces.

Last synced: 12 Apr 2025

https://github.com/databricks/terraform-databricks-mlops-azure-project-with-sp-creation

This module creates and configures service principals with appropriate permissions and entitlements to run CI/CD for a project, and creates a workspace directory as a container for project-specific resources for the Azure Databricks staging and prod workspaces. It also creates the relevant Azure Active Directory (AAD) applications for the service principals.

Last synced: 12 Apr 2025

https://github.com/databricks/terraform-databricks-mlops-aws-infrastructure

This module sets up a multi-workspace model registry between a Databricks AWS development (dev) workspace, staging workspace, and production (prod) workspace, allowing READ access from dev/staging workspaces to staging & prod model registries.

Last synced: 12 Apr 2025

https://github.com/databricks/databricks-empty-ide-project

Empty IDE project used by the VS Code extension for Databricks

Last synced: 23 Jan 2026

https://github.com/databricks/tabular-sdk-go

Golang SDK for interacting with the Tabular API

Last synced: 15 Jun 2025

https://github.com/databricks/pex

Fork of pantsbuild/pex with a few Databricks-specific changes

Last synced: 13 Aug 2025

https://github.com/databricks/homebrew-tap

Homebrew Tap for the Databricks CLI

Last synced: 07 Apr 2025

https://github.com/databricks/dbt-tabular

Repository for the dbt ❤️ Tabular blogpost

Last synced: 04 Oct 2025