An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with databricks

A curated list of projects in awesome lists tagged with databricks.

https://github.com/getredash/redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

analytics athena bi bigquery business-intelligence dashboard databricks hacktoberfest javascript mysql postgresql python redash redshift spark spark-sql visualization

Last synced: 12 May 2025

https://github.com/cube-js/cube

📊 Cube’s universal semantic layer platform is the next evolution of OLAP technology for AI, BI, spreadsheets, and embedded analytics

analytics bigquery cube databricks headless-bi hive microservice mysql postgresql presto rust semantic-layer serverless snowflake sql

Last synced: 12 May 2025

https://github.com/cube-js/cube.js

📊 Cube — Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics

analytics bigquery cube databricks headless-bi hive microservice mysql postgresql presto rust semantic-layer serverless snowflake sql

Last synced: 19 Mar 2025

https://github.com/tencent/apijson

🏆 Real-time, no-code, full-featured, and secure ORM library 🚀 Provides backend APIs and docs without writing code, and lets frontend (client) users customize the data and structure of the returned JSON

baas clickhouse crud databricks elasticsearch hadoop hive influxdb low-code lowcode milvus nocode oracle postgresql postgresql-database serverless snowflake sqlserver tdengine tidb

Last synced: 13 May 2025

https://github.com/Tencent/APIJSON

🏆 Real-time, no-code, full-featured, and secure ORM library 🚀 Provides backend APIs and docs without writing code, and lets frontend (client) users customize the data and structure of the returned JSON

baas clickhouse crud databricks elasticsearch hadoop hive influxdb low-code lowcode milvus nocode oracle postgresql postgresql-database serverless snowflake sqlserver tdengine tidb

Last synced: 01 Apr 2025

https://github.com/databrickslabs/dolly

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

chatbot databricks dolly gpt

Last synced: 15 Mar 2025

https://github.com/delta-io/delta-rs

A native Rust library for Delta Lake, with bindings into Python

databricks delta delta-lake pandas pandas-dataframe python rust

Last synced: 13 May 2025

https://github.com/databricks/dbrx

Code examples and resources for DBRX, a large language model developed by Databricks

databricks gen-ai generative-ai llm llm-inference llm-training mosaic-ai

Last synced: 15 May 2025

https://github.com/azure-samples/modern-data-warehouse-dataops

DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo

automatedtesting azure cicd data databricks datafactory dataops devops fabric

Last synced: 14 May 2025

https://github.com/Azure-Samples/modern-data-warehouse-dataops

DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo

automatedtesting azure cicd data databricks datafactory dataops devops fabric

Last synced: 04 Dec 2024

https://github.com/databricks/mlops-stacks

This repo provides a customizable stack for starting new ML projects on Databricks that follow production best-practices out of the box.

databricks machine-learning mlops

Last synced: 15 May 2025

https://github.com/databrickslabs/dbx

🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.

ci cicd databricks databricks-api databricks-cli mlops

Last synced: 29 Apr 2025

https://github.com/databricks/databricks-sdk-py

Databricks SDK for Python (Beta)

databricks databricks-sdk python

Last synced: 14 May 2025

https://github.com/thoughtworks/mlops-platforms

Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

azureml data-science databricks dataiku datarobot google-ai-platform h2oai iguazio knime kubeflow machine-learning mlflow mlops pachyderm sagemaker seldon

Last synced: 07 May 2025

https://github.com/microsoft/nutter

Testing framework for Databricks notebooks

azuredevops databricks databricks-notebooks

Last synced: 16 May 2025

https://github.com/databricks/dbt-databricks

A dbt adapter for Databricks.

databricks dbt etl sql

Last synced: 14 May 2025

https://github.com/databricks/terraform-databricks-examples

Examples of using Terraform to deploy Databricks resources

aws azure databricks databricks-module gcp lakehouse terraform terraform-module

Last synced: 16 May 2025

https://github.com/databrickslabs/dqx

Databricks framework to validate Data Quality of pySpark DataFrames

data-profiling data-quality data-quality-checks data-quality-monitoring databricks dlt spark spark-streaming

Last synced: 08 Apr 2025

https://github.com/adidas/lakehouse-engine

The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

big-data configuration-driven data-engineering data-quality databricks delta-lake framework great-expectations lakehouse spark

Last synced: 12 Apr 2025

https://github.com/databrickslabs/ucx

Automated migrations to Unity Catalog

databricks databricks-cli-installable unity-catalog

Last synced: 29 Apr 2025

https://github.com/databrickslabs/cicd-templates

Manage your Databricks deployments and CI with code.

aws azure azure-devops cd-pipeline ci databricks github-actions gitlab mlops

Last synced: 10 May 2025

https://github.com/cartodb/analytics-toolbox-core

A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities

analytics-toolbox bigquery carto databricks geospatial gis postgres redshift snowflake sql

Last synced: 12 Apr 2025

https://github.com/databrickslabs/dlt-meta

Metadata driven Databricks Delta Live Tables framework for bronze/silver pipelines

databricks dlt meta-programming python

Last synced: 29 Apr 2025

https://github.com/databricks/databricks-sql-python

Databricks SQL Connector for Python

databricks dwh python3 sql

Last synced: 11 Apr 2025

https://github.com/lamastex/scalable-data-science

Scalable Data Science: course sets in big data using Apache Spark on Databricks, and their mathematical, statistical, and computational foundations using SageMath.

apache-spark data-science databricks scala

Last synced: 16 May 2025

https://github.com/aloneguid/stowage

Bloat-free, no BS cloud storage SDK.

aws-s3 azure-storage databricks gcp-storage

Last synced: 08 Apr 2025

https://github.com/buremba/universql

The bridge to effortless multi-engine data applications, currently supports Snowflake ❄️ and DuckDB 🦆

databricks dbt duckdb proxy-server snowflake sql sql-proxy sqlglot

Last synced: 12 Apr 2025

https://github.com/yokawasa/databricks-notebooks

Collection of Sample Databricks Spark Notebooks (mostly for Azure Databricks)

azure azuredatabricks databricks elt python spark streaming

Last synced: 26 Mar 2025

https://github.com/databrickslabs/jupyterlab-integration

DEPRECATED: Integrating Jupyter with Databricks via SSH

databricks databricks-api databricks-deploy jupyter jupyter-notebook

Last synced: 25 Jan 2025

https://github.com/alexott/databricks-playground

Code samples, etc. for Databricks

databricks pyspark

Last synced: 09 Apr 2025

https://github.com/databrickslabs/remorph

Accelerates migrations to Databricks by automating code conversion and migration validation

code-converter data-validation databricks reconciliation transpiler

Last synced: 06 May 2025

https://github.com/mullerpeter/databricks-grafana

Grafana Databricks integration allowing direct connection to Databricks to query and visualize Databricks data in Grafana.

databricks grafana grafana-backend-plugin grafana-datasource grafana-plugin

Last synced: 13 Apr 2025

https://github.com/starlake-ai/jsqltranspiler

Rewrite BigQuery, Redshift, Snowflake and Databricks queries into DuckDB compatible SQL (with deep transformation of functions, data types and format characters) using Java.

abstract-syntax-tree bigquery column databricks duckdb java lineage query redshift resolver rewrite snowflake transpiler

Last synced: 16 May 2025

https://github.com/souvik-databricks/dlt-with-debug

A lightweight helper utility which allows developers to do interactive pipeline development by having a unified source code for both DLT run and Non-DLT interactive notebook run.

big-data big-data-processing databricks delta-live-tables dlt etl etl-pipeline python3 spark

Last synced: 15 Apr 2025

https://github.com/databricks/unity-catalog-setup

Notebooks, terraform, tools to enable setting up Unity Catalog

databricks unity-catalog

Last synced: 12 Apr 2025

https://github.com/databricks/databricks-sql-go

Golang database/sql driver for Databricks SQL.

databricks dwh golang golang-library sql

Last synced: 16 May 2025

https://github.com/tatevkaren/free-resources-books-papers

Books and Papers in Mathematics, Econometrics, Machine Learning, Finance, etc. for different levels that can be useful for Data Scientists, Developers, and everyone who is interested in STEM.

books data-science databricks delta-lake developers econometrics free-books free-resources machine-learning mathematics statistics

Last synced: 28 Mar 2025

https://github.com/getstrm/pace

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you programmatically create and apply a data policy to a processing platform like Databricks, Snowflake, or BigQuery (or even plain ol' Postgres!) with definitions imported from Collibra, Datahub, ODD, and the like.

bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake

Last synced: 10 Apr 2025

https://github.com/renardeinside/databricks-streamlit-demo

Demo of Streamlit application with Databricks SQL Endpoint

databricks streamlit visualization

Last synced: 23 Apr 2025

https://github.com/alexott/dlt-files-in-repos-demo

Demonstration of using Files in Repos with Databricks Delta Live Tables

ci-cd databricks delta-live-tables devops unit-testing

Last synced: 13 Apr 2025

https://github.com/airscholar/modern-data-eng-dbt-databricks-azure

In this project, we set up an end-to-end data engineering pipeline using Apache Spark, Azure Databricks, and Data Build Tool (dbt), with Azure as our cloud provider.

apache-spark azure databricks dbt modern-data-engineering

Last synced: 10 Apr 2025

https://github.com/databricks/databricks-sql-nodejs

Databricks SQL Connector for Node.js

databricks dwh node node-js nodejs sql

Last synced: 04 Apr 2025

https://github.com/renardeinside/databricks-uc-semantic-layer

Using OpenAI with Databricks SQL for queries in natural language

databricks databricks-sql openai sql

Last synced: 23 Apr 2025

https://github.com/pbv0/databricks-apps-cookbook

Ready-to-use code snippets for building interactive data applications using Databricks Apps.

databricks web-application

Last synced: 01 Feb 2025

https://github.com/jaceklaskowski/learn-databricks

Notebooks to learn Databricks Lakehouse Platform

databricks databricks-notebooks delta-live-tables mlflow

Last synced: 16 Apr 2025

https://github.com/nhsdigital/artificial-data-generator

Pipelines for generating large volumes of anonymous artificial data that share some of the characteristics of real NHS data

artificial baseline-rap databricks hospital-episode-statistics nhs not-optimised-for-reuse pyspark python

Last synced: 12 Apr 2025

https://github.com/santiagortiiz/advanced-data-engineering-with-databricks

Databricks. Incremental data processing, task orchestration, and production job monitoring.

big-data databricks databricks-notebooks kafka spark spark-streaming streaming

Last synced: 15 Apr 2025

https://github.com/adampaternostro/azure-databricks-log4j-to-appinsights

Connect your Spark Databricks clusters' Log4J output to the Application Insights Appender

application-insights azure-monitor databricks

Last synced: 03 Dec 2024

https://github.com/mach-kernel/databricks-kube-operator

A Kubernetes operator to enable GitOps style deploys for Databricks resources

ci cicd databricks gitops helm kubernetes operators rust spark

Last synced: 26 Apr 2025

https://github.com/renardeinside/pyspark-logging-examples

Writing PySpark logs in Apache Spark and Databricks

apache-spark databricks log4j logging logs

Last synced: 23 Apr 2025

https://github.com/getyourguide/db-rocket

Keep your local python scripts installed and in sync with a databricks notebook. Shortens the feedback loop to develop projects using a hybrid environment.

data-science databricks productivity python

Last synced: 11 Apr 2025

https://github.com/hashload/freeza-offset

Spark stream consumption commit in kafka consumer group

databricks kafka kafka-commit kafka-offset-commits spark spark-streaming

Last synced: 14 Feb 2025

https://github.com/bluegranite/databrickstraining

Repository for Microsoft Databricks Training Events - Hosted by BlueGranite

apache-spark azure azure-databricks databricks distributed-computing machine-learning pyspark spark spark-streaming

Last synced: 13 May 2025

https://github.com/azure/employee-retention-databricks-kubernetes-poc

End-to-end proof of concept showing core MLOps practices to develop, deploy and monitor a machine learning model for an employee retention workload using Databricks and Kubernetes on Microsoft Azure.

azure databricks github-actions kubernetes machine-learning mlflow

Last synced: 09 Apr 2025

https://github.com/renardeinside/e2e-mlops-demo

E2E MLOps with Databricks

azure databricks hyperopt mlops

Last synced: 23 Apr 2025

https://github.com/xonai-computing/xonai-dashboard

A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver

apache-spark aws aws-emr databricks grafana prometheus python

Last synced: 14 Feb 2025

https://github.com/cartodb/poc-databricks

CARTO Analytics Toolbox for Databricks provides geospatial functionality leveraging the Geomesa SparkSQL capabilities.

databricks geospatial gis location

Last synced: 12 Apr 2025

https://github.com/analyticalmonk/pyspark_nlp_workshop

Instructions and code for the workshop "From Big Data to NLP Insights: Unlocking the Power of PySpark and Spark NLP"

databricks databricks-notebooks distributed-computing nlp pyspark spark spark-nlp workshop

Last synced: 15 Apr 2025

https://github.com/ajaen4/terraform-databricks-aws

Terraform repository to deploy a fully functioning Databricks environment on top of AWS. Deploys all Databricks and AWS resources.

aws databricks terraform

Last synced: 17 Jan 2025

https://github.com/benc-uk/batcomputer

A working example of DevOps & operationalisation applied to Machine Learning and AI

api-wrapper azure azure-devops databricks docker kubernetes machine-learning

Last synced: 10 Mar 2025

https://github.com/mlverse/pysparklyr

Extension to {sparklyr} that allows you to interact with Spark & Databricks Connect

databricks pyspark r spark spark-connect

Last synced: 22 Nov 2024

https://github.com/tomarv2/terraform-databricks-workspace-management

Terraform module for Databricks Workspace Management: https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/guides/workspace-management

databricks databricks-deploy databricks-workspace databricks-workspace-management notebook terraform terraform-module

Last synced: 23 Mar 2025

https://github.com/renardeinside/chatten

RAG application (backend & frontend) with source retrieval and highlighting on the Databricks Platform

dash databricks python rag vector-search

Last synced: 05 May 2025

https://github.com/serialbandicoot/great-assertions

This library is inspired by the Great Expectations library. The library has made the various expectations found in Great Expectations available when using the inbuilt python unittest assertions.

data-science data-testing databricks great-expectations jupyter-notebook python python3 quality-assurance testing

Last synced: 13 Feb 2025

https://github.com/adampaternostro/azure-app-insights-distrubuted-tracing

How to use Application Insights to do distributed tracing through a Web App, REST API, Function App, Service Bus, Databricks and Data Factory.

application-insights azure azure-data-factory azure-functions databricks monitoring service-bus

Last synced: 03 Dec 2024

https://github.com/sjrusso8/fastapi-lakehouse

Connect FastAPI to a Databricks Lakehouse

databricks fastapi fastapi-template lakehouse

Last synced: 13 Apr 2025

https://github.com/hamza88-coder/real-time-recruitment-system-with-ai-and-data-analytics

Simulation of job offers and CVs with real-time processing, classification, and analytics using Kafka, Ray, Spark, and Databricks. Includes a Flask-based recommendation system and Tableau visualizations.

apache-nifi chatbot databricks dbt delta-lake docker faiss flask k-means kafka llama3 pinecone postgresql ray redis snowflake spark sparkml

Last synced: 13 Jan 2025

https://github.com/renardeinside/dbx-scala-example

Sample project for Scala applications with dbx and CI/CD setup based on Github actions.

cicd databricks github-actions scala

Last synced: 23 Apr 2025

https://github.com/kevinknights29/databricks_llm101x

This project contains the lab notebooks from the course "Large Language Models: Application through Production" by Databricks

databricks jupyter-notebook llms

Last synced: 12 Apr 2025

https://github.com/turbot/steampipe-plugin-databricks

Use SQL to instantly query Databricks resources. Open source CLI. No DB required.

backup databricks etl hacktoberfest postgresql postgresql-fdw sql sqlite steampipe steampipe-plugin zero-etl

Last synced: 22 Apr 2025