Projects in Awesome Lists by databricks
A curated list of projects in awesome lists by databricks.
https://github.com/databricks/learning-spark
Example code from Learning Spark book
Last synced: 14 May 2025
https://github.com/databricks/koalas
Koalas: pandas API on Apache Spark
big-data data-science dataframe mlflow pandas pydata spark
Last synced: 13 May 2025
https://github.com/databricks/spark-the-definitive-guide
Spark: The Definitive Guide's Code Repository
Last synced: 14 May 2025
https://github.com/databricks/Spark-The-Definitive-Guide
Spark: The Definitive Guide's Code Repository
Last synced: 26 Mar 2025
https://github.com/databricks/scala-style-guide
Databricks Scala Coding Style Guide
Last synced: 14 May 2025
https://github.com/databricks/dbrx
Code examples and resources for DBRX, a large language model developed by Databricks
databricks gen-ai generative-ai llm llm-inference llm-training mosaic-ai
Last synced: 25 Oct 2025
https://github.com/databricks/spark-deep-learning
Deep Learning Pipelines for Apache Spark
Last synced: 15 May 2025
https://github.com/databricks/click?tab=readme-ov-file
The "Command Line Interactive Controller for Kubernetes"
Last synced: 29 Mar 2025
https://github.com/databricks/click
The "Command Line Interactive Controller for Kubernetes"
Last synced: 14 May 2025
https://github.com/databricks/learningsparkv2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
apache-spark delta-lake mlflow mllib spark spark-mllib spark-sql structured-streaming
Last synced: 14 May 2025
https://github.com/databricks/spark-sklearn
(Deprecated) Scikit-learn integration package for Apache Spark
apache-spark grid-search machine-learning parameter-tuning scikit-learn
Last synced: 30 Sep 2025
https://github.com/databricks/spark-csv
CSV Data Source for Apache Spark 1.x
Last synced: 15 May 2025
https://github.com/databricks/lilac
Curate better data for LLMs
artificial-intelligence data-analysis dataset-analysis unstructured-data
Last synced: 10 Mar 2025
https://github.com/databricks/tensorframes
[DEPRECATED] TensorFlow wrapper for DataFrames on Apache Spark
Last synced: 15 May 2025
https://github.com/databricks/devrel
This repository contains the notebooks and presentations we use for our Databricks Tech Talks
Last synced: 15 May 2025
https://github.com/databricks/spark-redshift
Redshift data source for Apache Spark
Last synced: 15 May 2025
https://github.com/databricks/mlops-stacks
This repo provides a customizable stack for starting new ML projects on Databricks that follow production best-practices out of the box.
databricks machine-learning mlops
Last synced: 15 May 2025
https://github.com/databricks/databricks-sdk-py
Databricks SDK for Python (Beta)
databricks databricks-sdk python
Last synced: 11 Feb 2026
https://github.com/databricks/terraform-provider-databricks
Databricks Terraform Provider
aws azure databricks databricks-automation gcp terraform terraform-provider
Last synced: 14 May 2025
https://github.com/databricks/spark-xml
XML data source for Spark SQL and DataFrames
Last synced: 25 Mar 2025
https://github.com/databricks/spark-corenlp
Stanford CoreNLP wrapper for Apache Spark
Last synced: 06 Apr 2025
https://github.com/databricks/databricks-cli
(Legacy) Command Line Interface for Databricks
Last synced: 14 May 2025
https://github.com/databricks/spark-perf
Performance tests for Apache Spark
Last synced: 09 Sep 2025
https://github.com/databricks/terraform-databricks-examples
Examples of using Terraform to deploy Databricks resources
aws azure databricks databricks-module gcp lakehouse terraform terraform-module
Last synced: 16 May 2025
https://github.com/databricks/jsonnet-style-guide
Databricks Jsonnet Coding Style Guide
Last synced: 12 Apr 2025
https://github.com/databricks/databricks-sql-python
Databricks SQL Connector for Python
Last synced: 08 Jan 2026
https://github.com/databricks/cli
Databricks CLI
command-line-interface databricks
Last synced: 05 Feb 2026
https://github.com/databricks/containers
Sample base images for Databricks Container Services
Last synced: 12 Apr 2025
https://github.com/databricks/sbt-spark-package
Sbt plugin for Spark packages
Last synced: 18 Oct 2025
https://github.com/databricks/databricks-vscode
VS Code extension for Databricks
Last synced: 05 Apr 2025
https://github.com/databricks/bundle-examples
Examples of Databricks Asset Bundles
Last synced: 05 Apr 2025
https://github.com/databricks/notebook-best-practices
An example showing how to apply software engineering best practices to Databricks notebooks.
Last synced: 08 May 2025
https://github.com/databricks/iceberg-rest-image
Simple project to expose a catalog over REST using a Java catalog backend
Last synced: 05 Apr 2025
https://github.com/databricks/terraform-databricks-sra
The Security Reference Architecture (SRA) implements, as Terraform templates, the typical security features deployed by most high-security organizations, and enforces controls for the largest risks that customers ask about most often.
Last synced: 12 Apr 2025
https://github.com/databricks/benchmarks
A place in which we publish scripts for reproducible benchmarks.
Last synced: 18 Sep 2025
https://github.com/databricks/mlflow
Open source platform for the machine learning lifecycle
Last synced: 02 May 2025
https://github.com/databricks/spark-tfocs
A Spark port of TFOCS: Templates for First-Order Conic Solvers (cvxr.com/tfocs)
Last synced: 30 Jul 2025
https://github.com/databricks/terraform-databricks-lakehouse-blueprints
Set of Terraform automation templates and quickstart demos to jumpstart the design of a Lakehouse on Databricks. This project has incorporated best practices across the industries we work with to deliver composable modules to build a workspace to comply with the highest platform security and governance standards.
financial-services hls regulated-industry-blueprints terraform
Last synced: 07 Apr 2025
https://github.com/databricks/sbt-databricks
An sbt plugin for deploying code to Databricks Cloud
Last synced: 07 Apr 2025
https://github.com/databricks/databricks-sdk-go
Databricks SDK for Go
databricks databricks-automation databricks-sdk go
Last synced: 27 Jan 2026
https://github.com/databricks/spark-integration-tests
Integration tests for Spark
Last synced: 07 Apr 2025
https://github.com/databricks/spark-pr-dashboard
Dashboard to aid in Spark pull request reviews
Last synced: 07 Apr 2025
https://github.com/databricks/ide-best-practices
Best practices for working with Databricks from an IDE
Last synced: 26 Jun 2025
https://github.com/databricks/databricks-sdk-java
Databricks SDK for Java
databricks databricks-automation databricks-sdk java
Last synced: 04 Feb 2026
https://github.com/databricks/unity-catalog-setup
Notebooks, Terraform, and tools for setting up Unity Catalog
Last synced: 13 Jul 2025
https://github.com/databricks/simr
Spark In MapReduce (SIMR) - launching Spark applications on existing Hadoop MapReduce infrastructure
Last synced: 07 Apr 2025
https://github.com/databricks/databricks-sql-go
Golang database/sql driver for Databricks SQL.
databricks dwh golang golang-library sql
Last synced: 16 May 2025
https://github.com/databricks/databricks-sql-cli
CLI for querying Databricks SQL
Last synced: 07 Apr 2025
https://github.com/databricks/diviner
Grouped time series forecasting engine
Last synced: 07 Apr 2025
https://github.com/databricks/databricks-sql-nodejs
Databricks SQL Connector for Node.js
databricks dwh node node-js nodejs sql
Last synced: 04 Apr 2025
https://github.com/databricks/als-benchmark-scripts
Scripts to benchmark distributed Alternating Least Squares (ALS)
Last synced: 07 Apr 2025
https://github.com/databricks/python-interview
Databricks Python interview setup instructions
Last synced: 12 Apr 2025
https://github.com/databricks/spark-package-cmd-tool
A command line tool for Spark packages
Last synced: 07 Apr 2025
https://github.com/databricks/setup-cli
Sets up the Databricks CLI in your GitHub Actions workflow.
Last synced: 18 Oct 2025
https://github.com/databricks/congruity
The goal of this library is to provide a compatibility layer that makes it easier to adopt Spark Connect. The library is designed to be simply imported in your application and will then monkey-patch the existing API to provide the legacy functionality.
Last synced: 07 Apr 2025
https://github.com/databricks/databricks-accelerators
Accelerate the use of Databricks for customers [public repo]
Last synced: 07 Apr 2025
https://github.com/databricks/sqltools-databricks-driver
SQLTools driver for Databricks SQL
Last synced: 26 Jun 2025
https://github.com/databricks/dais-cow-bff
Code for the "Path to Production" DAIS 2024 and 2023 talks
Last synced: 07 Apr 2025
https://github.com/databricks/xgboost-linux64
Databricks Private xgboost Linux64 fork
Last synced: 07 Apr 2025
https://github.com/databricks/jenkins-job-builder
Fork of https://docs.openstack.org/infra/jenkins-job-builder/ to include unmerged patches
Last synced: 29 Oct 2025
https://github.com/databricks/terraform-databricks-mlops-aws-project
This module creates and configures service principals with appropriate permissions and entitlements to run CI/CD for a project, and creates a workspace directory as a container for project-specific resources for the Databricks AWS staging and prod workspaces.
Last synced: 12 Apr 2025
https://github.com/databricks/terraform-databricks-mlops-azure-project-with-sp-creation
This module creates and configures service principals with appropriate permissions and entitlements to run CI/CD for a project, and creates a workspace directory as a container for project-specific resources for the Azure Databricks staging and prod workspaces. It also creates the relevant Azure Active Directory (AAD) applications for the service principals.
Last synced: 12 Apr 2025
https://github.com/databricks/terraform-databricks-mlops-aws-infrastructure
This module sets up multi-workspace model registry between a Databricks AWS development (dev) workspace, staging workspace, and production (prod) workspace, allowing READ access from dev/staging workspaces to staging & prod model registries.
Last synced: 12 Apr 2025
https://github.com/databricks/databricks-empty-ide-project
Empty IDE project used by the VS Code extension for Databricks
Last synced: 23 Jan 2026
https://github.com/databricks/tabular-sdk-go
Golang SDK for interacting with the Tabular API
Last synced: 15 Jun 2025
https://github.com/databricks/pex
Fork of pantsbuild/pex with a few Databricks-specific changes
Last synced: 13 Aug 2025
https://github.com/databricks/homebrew-tap
Homebrew Tap for the Databricks CLI
Last synced: 07 Apr 2025
https://github.com/databricks/dbt-tabular
Repository for the dbt ❤️ Tabular blogpost
Last synced: 04 Oct 2025