Projects in Awesome Lists tagged with databricks
A curated list of projects in awesome lists tagged with databricks.
https://github.com/getredash/redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
analytics athena bi bigquery business-intelligence dashboard databricks hacktoberfest javascript mysql postgresql python redash redshift spark spark-sql visualization
Last synced: 12 May 2025
https://github.com/cube-js/cube
📊 Cube’s universal semantic layer platform is the next evolution of OLAP technology for AI, BI, spreadsheets, and embedded analytics
analytics bigquery cube databricks headless-bi hive microservice mysql postgresql presto rust semantic-layer serverless snowflake sql
Last synced: 12 May 2025
https://github.com/cube-js/cube.js
📊 Cube — Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
analytics bigquery cube databricks headless-bi hive microservice mysql postgresql presto rust semantic-layer serverless snowflake sql
Last synced: 19 Mar 2025
https://github.com/tencent/apijson
🏆 Real-time, coding-free, powerful and secure ORM 🚀 Provides APIs and docs without any backend coding, and lets frontend (client) users customize the data and structure of the returned JSON
baas clickhouse crud databricks elasticsearch hadoop hive influxdb low-code lowcode milvus nocode oracle postgresql postgresql-database serverless snowflake sqlserver tdengine tidb
Last synced: 13 May 2025
https://github.com/Tencent/APIJSON
🏆 Real-time, coding-free, powerful and secure ORM 🚀 Provides APIs and docs without any backend coding, and lets frontend (client) users customize the data and structure of the returned JSON
baas clickhouse crud databricks elasticsearch hadoop hive influxdb low-code lowcode milvus nocode oracle postgresql postgresql-database serverless snowflake sqlserver tdengine tidb
Last synced: 01 Apr 2025
https://github.com/databrickslabs/dolly
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
Last synced: 15 Mar 2025
https://github.com/microsoft/synapseml
Simple and Distributed Machine Learning
ai apache-spark azure big-data cognitive-services data-science databricks deep-learning http lightgbm machine-learning microsoft ml model-deployment onnx opencv pyspark scala spark synapse
Last synced: 13 May 2025
https://microsoft.github.io/SynapseML/
Simple and Distributed Machine Learning
ai apache-spark azure big-data cognitive-services data-science databricks deep-learning http lightgbm machine-learning microsoft ml model-deployment onnx opencv pyspark scala spark synapse
Last synced: 29 Apr 2025
https://github.com/microsoft/SynapseML
Simple and Distributed Machine Learning
ai apache-spark azure big-data cognitive-services data-science databricks deep-learning http lightgbm machine-learning microsoft ml model-deployment onnx opencv pyspark scala spark synapse
Last synced: 14 Mar 2025
https://github.com/delta-io/delta-rs
A native Rust library for Delta Lake, with bindings into Python
databricks delta delta-lake pandas pandas-dataframe python rust
Last synced: 13 May 2025
https://github.com/databricks/dbrx
Code examples and resources for DBRX, a large language model developed by Databricks
databricks gen-ai generative-ai llm llm-inference llm-training mosaic-ai
Last synced: 15 May 2025
https://github.com/dotnet/spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
analytics apache-spark azure bigdata csharp databricks dotnet dotnet-core dotnet-standard emr fsharp hdinsight machine-learning microsoft spark spark-sql spark-streaming streaming tpcds tpch
Last synced: 11 May 2025
https://github.com/multiwoven/multiwoven
🔥🔥🔥 Open source composable CDP - alternative to Hightouch and Census.
bigquery cdp customer-data-platform data-activation data-engineering data-pipeline data-warehouse databricks dbt etl hacktoberfest open-source postresql react redshift reverse-etl ruby self-hosted snowflake typescript
Last synced: 13 May 2025
https://github.com/Multiwoven/multiwoven
🔥🔥🔥 Open source composable CDP - alternative to Hightouch and Census.
bigquery cdp customer-data-platform data-activation data-engineering data-pipeline data-warehouse databricks dbt etl hacktoberfest open-source postresql react redshift reverse-etl ruby self-hosted snowflake typescript
Last synced: 01 Apr 2025
https://github.com/azure-samples/modern-data-warehouse-dataops
DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
automatedtesting azure cicd data databricks datafactory dataops devops fabric
Last synced: 14 May 2025
https://github.com/Azure-Samples/modern-data-warehouse-dataops
DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
automatedtesting azure cicd data databricks datafactory dataops devops fabric
Last synced: 04 Dec 2024
https://github.com/synmetrix/synmetrix
Synmetrix – production-ready open source semantic layer on Cube
big-data bigquery business-intelligence clickhouse cube cubejs data-engineering databricks dremio druid firebolt llm prestodb redshift semantic-layer snowflake vertica
Last synced: 15 May 2025
https://github.com/databricks/mlops-stacks
This repo provides a customizable stack for starting new ML projects on Databricks that follow production best-practices out of the box.
databricks machine-learning mlops
Last synced: 15 May 2025
https://github.com/databricks/terraform-provider-databricks
Databricks Terraform Provider
aws azure databricks databricks-automation gcp terraform terraform-provider
Last synced: 14 May 2025
https://github.com/databrickslabs/dbx
🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.
ci cicd databricks databricks-api databricks-cli mlops
Last synced: 29 Apr 2025
https://github.com/databricks/databricks-sdk-py
Databricks SDK for Python (Beta)
databricks databricks-sdk python
Last synced: 14 May 2025
https://github.com/thoughtworks/mlops-platforms
Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...
azureml data-science databricks dataiku datarobot google-ai-platform h2oai iguazio knime kubeflow machine-learning mlflow mlops pachyderm sagemaker seldon
Last synced: 07 May 2025
https://github.com/microsoft/nutter
Testing framework for Databricks notebooks
azuredevops databricks databricks-notebooks
Last synced: 16 May 2025
https://github.com/databricks/terraform-databricks-examples
Examples of using Terraform to deploy Databricks resources
aws azure databricks databricks-module gcp lakehouse terraform terraform-module
Last synced: 16 May 2025
https://github.com/databrickslabs/dqx
Databricks framework to validate the data quality of PySpark DataFrames
data-profiling data-quality data-quality-checks data-quality-monitoring databricks dlt spark spark-streaming
Last synced: 08 Apr 2025
https://github.com/dataflint/spark
Performance Observability for Apache Spark
apache-spark big-data data-pipeline data-pipelines databricks dataproc emr etl observability optimization spark-operator
Last synced: 12 Apr 2025
https://github.com/adidas/lakehouse-engine
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
big-data configuration-driven data-engineering data-quality databricks delta-lake framework great-expectations lakehouse spark
Last synced: 12 Apr 2025
https://github.com/azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
apache apache-spark azure bigdata connector continuous databricks event-hubs eventhubs ingestion kafka microsoft real-time scala spark spark-streaming stream streaming structured-streaming
Last synced: 15 May 2025
https://github.com/databrickslabs/ucx
Automated migrations to Unity Catalog
databricks databricks-cli-installable unity-catalog
Last synced: 29 Apr 2025
https://github.com/azure/azure-cosmosdb-spark
Apache Spark Connector for Azure Cosmos DB
apache-spark azure-cosmos-db azure-databricks changefeed connector cosmos-db databricks databricks-notebooks jupyter-notebook lambda-architecture pyspark spark
Last synced: 02 Mar 2025
https://github.com/Azure/azure-cosmosdb-spark
Apache Spark Connector for Azure Cosmos DB
apache-spark azure-cosmos-db azure-databricks changefeed connector cosmos-db databricks databricks-notebooks jupyter-notebook lambda-architecture pyspark spark
Last synced: 10 May 2025
https://github.com/databrickslabs/cicd-templates
Manage your Databricks deployments and CI with code.
aws azure azure-devops cd-pipeline ci databricks github-actions gitlab mlops
Last synced: 10 May 2025
https://github.com/cartodb/analytics-toolbox-core
A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities
analytics-toolbox bigquery carto databricks geospatial gis postgres redshift snowflake sql
Last synced: 12 Apr 2025
https://github.com/databrickslabs/dlt-meta
Metadata driven Databricks Delta Live Tables framework for bronze/silver pipelines
databricks dlt meta-programming python
Last synced: 29 Apr 2025
https://github.com/databricks/databricks-sql-python
Databricks SQL Connector for Python
Last synced: 11 Apr 2025
https://github.com/databricks/cli
Databricks CLI
command-line-interface databricks
Last synced: 15 May 2025
https://github.com/lamastex/scalable-data-science
Scalable Data Science: course sets in big data using Apache Spark on Databricks, and their mathematical, statistical and computational foundations using SageMath.
apache-spark data-science databricks scala
Last synced: 16 May 2025
https://github.com/aloneguid/stowage
Bloat-free, no BS cloud storage SDK.
aws-s3 azure-storage databricks gcp-storage
Last synced: 08 Apr 2025
https://github.com/buremba/universql
The bridge to effortless multi-engine data applications, currently supports Snowflake ❄️ and DuckDB 🦆
databricks dbt duckdb proxy-server snowflake sql sql-proxy sqlglot
Last synced: 12 Apr 2025
https://github.com/aehrc/variantspark
Machine learning for genomic variants
association-studies aws bioinformatics databricks emr genome gwas notebook random-forest variant-spark variantspark vcf
Last synced: 06 Apr 2025
https://github.com/olonok69/llm_notebooks
Notebooks and code about generative AI, LLMs, MLOps, NLP, CV, and graph databases
azure-machine-learning-studio computer-vision databricks docker generative-ai langchain llamaindex llms machine-learning mlflow mlops neo4j nlp ocr prompt-engineering rag vertex-ai
Last synced: 06 Apr 2025
https://github.com/yokawasa/databricks-notebooks
Collection of sample Databricks Spark notebooks (mostly for Azure Databricks)
azure azuredatabricks databricks elt python spark streaming
Last synced: 26 Mar 2025
https://github.com/yueureka/WildFireDetection
Using U-Net Model to Detect Wildfire from Satellite Imagery
ai databricks deep-learning docker satellite-imagery sparkaisummit streamlit unet-model wildfire wildfire-detection
Last synced: 04 May 2025
https://github.com/databrickslabs/jupyterlab-integration
DEPRECATED: Integrating Jupyter with Databricks via SSH
databricks databricks-api databricks-deploy jupyter jupyter-notebook
Last synced: 25 Jan 2025
https://github.com/johnsnowlabs/johnsnowlabs
Gateway into the John Snow Labs Ecosystem
bert databricks gpt machine-learning natural-language-processing nlp python seq2seq spark t5
Last synced: 09 May 2025
https://github.com/lhbench/lhbench
Lakehouse storage system benchmark
apache-hudi apache-iceberg benchmark cidr database databricks delta-lake lakehouse
Last synced: 04 Apr 2025
https://github.com/alexott/databricks-playground
Code samples, etc. for Databricks
Last synced: 09 Apr 2025
https://github.com/databrickslabs/remorph
Accelerates migrations to Databricks by automating code conversion and migration validation
code-converter data-validation databricks reconciliation transpiler
Last synced: 06 May 2025
https://github.com/mullerpeter/databricks-grafana
Grafana Databricks integration allowing direct connection to Databricks to query and visualize Databricks data in Grafana.
databricks grafana grafana-backend-plugin grafana-datasource grafana-plugin
Last synced: 13 Apr 2025
https://github.com/databricks/databricks-sdk-go
Databricks SDK for Go
databricks databricks-automation databricks-sdk go
Last synced: 04 Apr 2025
https://github.com/starlake-ai/jsqltranspiler
Rewrite BigQuery, Redshift, Snowflake and Databricks queries into DuckDB compatible SQL (with deep transformation of functions, data types and format characters) using Java.
abstract-syntax-tree bigquery column databricks duckdb java lineage query redshift resolver rewrite snowflake transpiler
Last synced: 16 May 2025
https://github.com/souvik-databricks/dlt-with-debug
A lightweight helper utility which allows developers to do interactive pipeline development by having a unified source code for both DLT run and Non-DLT interactive notebook run.
big-data big-data-processing databricks delta-live-tables dlt etl etl-pipeline python3 spark
Last synced: 15 Apr 2025
https://github.com/databricks/unity-catalog-setup
Notebooks, Terraform, and tools for setting up Unity Catalog
Last synced: 12 Apr 2025
https://github.com/tomaztk/azure-databricks
Azure Databricks - Advent of 2020 Blogposts
azure-data-factory azure-databricks azure-machine-learnning data-analytics data-engineerg databricks databricks-notebooks machine-learning mlflow mllib notebook notebooks pyspark python r-language scala spark spark-structured-streaming sparkr sql
Last synced: 16 May 2025
https://github.com/databricks/databricks-sql-go
Golang database/sql driver for Databricks SQL.
databricks dwh golang golang-library sql
Last synced: 16 May 2025
https://github.com/databricks/databricks-sdk-java
Databricks SDK for Java
databricks databricks-automation databricks-sdk java
Last synced: 07 Apr 2025
https://github.com/tatevkaren/free-resources-books-papers
Books and papers in mathematics, econometrics, machine learning, finance, etc. at different levels, useful for data scientists, developers, and everyone who is interested in STEM.
books data-science databricks delta-lake developers econometrics free-books free-resources machine-learning mathematics statistics
Last synced: 28 Mar 2025
https://github.com/getstrm/pace
Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or even plain ol' Postgres!) with definitions imported from Collibra, Datahub, ODD and the like.
bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake
Last synced: 10 Apr 2025
https://github.com/renardeinside/databricks-streamlit-demo
Demo of Streamlit application with Databricks SQL Endpoint
databricks streamlit visualization
Last synced: 23 Apr 2025
https://github.com/alexott/dlt-files-in-repos-demo
Demonstration of using Files in Repos with Databricks Delta Live Tables
ci-cd databricks delta-live-tables devops unit-testing
Last synced: 13 Apr 2025
https://github.com/airscholar/modern-data-eng-dbt-databricks-azure
In this project, we set up end-to-end data engineering using Apache Spark, Azure Databricks, and Data Build Tool (dbt), with Azure as our cloud provider.
apache-spark azure databricks dbt modern-data-engineering
Last synced: 10 Apr 2025
https://github.com/vedanthv/data-engineering-portfolio
Cool DE Projects
aws-projects data data-engineering data-modelling databricks portfolio-project python sql
Last synced: 14 Apr 2025
https://github.com/databricks/databricks-sql-nodejs
Databricks SQL Connector for Node.js
databricks dwh node node-js nodejs sql
Last synced: 04 Apr 2025
https://github.com/jihyeonseong/esg-ai-investment-by-streamlit
ESG-investment AI
ai databricks esg finance-analytics investment markowitz-portfolio node2vec pyspark python streamlit
Last synced: 01 Feb 2025
https://github.com/renardeinside/databricks-uc-semantic-layer
Using OpenAI with Databricks SQL for queries in natural language
databricks databricks-sql openai sql
Last synced: 23 Apr 2025
https://github.com/pbv0/databricks-apps-cookbook
Ready-to-use code snippets for building interactive data applications using Databricks Apps.
Last synced: 01 Feb 2025
https://github.com/jaceklaskowski/learn-databricks
Notebooks to learn Databricks Lakehouse Platform
databricks databricks-notebooks delta-live-tables mlflow
Last synced: 16 Apr 2025
https://github.com/nhsdigital/artificial-data-generator
Pipelines for generating large volumes of anonymous artificial data that share some of the characteristics of real NHS data
artificial baseline-rap databricks hospital-episode-statistics nhs not-optimised-for-reuse pyspark python
Last synced: 12 Apr 2025
https://github.com/santiagortiiz/advanced-data-engineering-with-databricks
Databricks. Incremental data processing, task orchestration, and production job monitoring.
big-data databricks databricks-notebooks kafka spark spark-streaming streaming
Last synced: 15 Apr 2025
https://github.com/adampaternostro/azure-databricks-log4j-to-appinsights
Connect your Spark Databricks clusters Log4J output to the Application Insights Appender
application-insights azure-monitor databricks
Last synced: 03 Dec 2024
https://github.com/mach-kernel/databricks-kube-operator
A Kubernetes operator to enable GitOps style deploys for Databricks resources
ci cicd databricks gitops helm kubernetes operators rust spark
Last synced: 26 Apr 2025
https://github.com/benitomartin/mlops-databricks-credit-default
End-to-end MLOps Credit Default Project using DABs
aws continuous-deployment continuous-integration databricks mlflow precommit-hooks pydantic python ruff uv
Last synced: 10 Apr 2025
https://github.com/renardeinside/pyspark-logging-examples
Writing PySpark logs in Apache Spark and Databricks
apache-spark databricks log4j logging logs
Last synced: 23 Apr 2025
https://github.com/getyourguide/db-rocket
Keep your local Python scripts installed and in sync with a Databricks notebook. Shortens the feedback loop when developing projects in a hybrid environment.
data-science databricks productivity python
Last synced: 11 Apr 2025
https://github.com/hashload/freeza-offset
Commits Spark stream consumption offsets to a Kafka consumer group
databricks kafka kafka-commit kafka-offset-commits spark spark-streaming
Last synced: 14 Feb 2025
https://github.com/bluegranite/databrickstraining
Repository for Microsoft Databricks Training Events - Hosted by BlueGranite
apache-spark azure azure-databricks databricks distributed-computing machine-learning pyspark spark spark-streaming
Last synced: 13 May 2025
https://github.com/azure/employee-retention-databricks-kubernetes-poc
End-to-end proof of concept showing core MLOps practices to develop, deploy and monitor a machine learning model for an employee retention workload using Databricks and Kubernetes on Microsoft Azure.
azure databricks github-actions kubernetes machine-learning mlflow
Last synced: 09 Apr 2025
https://github.com/renardeinside/e2e-mlops-demo
E2E MLOps with Databricks
azure databricks hyperopt mlops
Last synced: 23 Apr 2025
https://github.com/xonai-computing/xonai-dashboard
A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver
apache-spark aws aws-emr databricks grafana prometheus python
Last synced: 14 Feb 2025
https://github.com/unytics/catalog_builder
Data Catalogs Made Easy
bigquery data-catalog data-discovery databricks dbt redshift snowflake
Last synced: 12 Apr 2025
https://github.com/cartodb/poc-databricks
CARTO Analytics Toolbox for Databricks provides geospatial functionality leveraging the Geomesa SparkSQL capabilities.
databricks geospatial gis location
Last synced: 12 Apr 2025
https://github.com/analyticalmonk/pyspark_nlp_workshop
Instructions and code for the workshop "From Big Data to NLP Insights: Unlocking the Power of PySpark and Spark NLP"
databricks databricks-notebooks distributed-computing nlp pyspark spark spark-nlp workshop
Last synced: 15 Apr 2025
https://github.com/ajaen4/terraform-databricks-aws
Terraform repository to deploy a fully functioning Databricks environment on top of AWS. Deploys all Databricks and AWS resources.
Last synced: 17 Jan 2025
https://github.com/benc-uk/batcomputer
A working example of DevOps & operationalisation applied to Machine Learning and AI
api-wrapper azure azure-devops databricks docker kubernetes machine-learning
Last synced: 10 Mar 2025
https://github.com/mlverse/pysparklyr
Extension to {sparklyr} that allows you to interact with Spark & Databricks Connect
databricks pyspark r spark spark-connect
Last synced: 22 Nov 2024
https://github.com/cdcgov/cdh-lava-react
CDC Data Hub Lifecycle, Analysis & Visualization Accelerator (LAVA) REACT Components based on machine readable requirements.
agile-development azure data-analysis data-catalog data-governance data-quality data-science data-visualization databricks datavisualization devops excel-export metadata operations powerautomate powerbi pyspark security sql test-automation
Last synced: 22 Apr 2025
https://github.com/tomarv2/terraform-databricks-workspace-management
Terraform module for Databricks Workspace Management: https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/guides/workspace-management
databricks databricks-deploy databricks-workspace databricks-workspace-management notebook terraform terraform-module
Last synced: 23 Mar 2025
https://github.com/renardeinside/chatten
RAG application (backend & frontend) with source retrieval and highlighting on the Databricks Platform
dash databricks python rag vector-search
Last synced: 05 May 2025
https://github.com/serialbandicoot/great-assertions
This library is inspired by the Great Expectations library. The library has made the various expectations found in Great Expectations available when using the inbuilt python unittest assertions.
data-science data-testing databricks great-expectations jupyter-notebook python python3 quality-assurance testing
Last synced: 13 Feb 2025
https://github.com/arsentievalex/newspulse-databricks-hackathon
NewsPulse is an AI-powered news analytics app for investors
databricks dbrx duckduckgo-search langchain llm openai rag streamlit vector-database yahooquery
Last synced: 22 Nov 2024
https://github.com/tomarv2/terraform-databricks-aws-workspace
Terraform module to create Databricks AWS E2 workspace
aws databricks databricks-account databricks-e2-workspace databricks-e2-workspaces terraform terraform-module
Last synced: 23 Mar 2025
https://github.com/adampaternostro/azure-app-insights-distrubuted-tracing
How to use Application Insights to do distributed tracing through a Web App, REST API, Function App, Service Bus, Databricks and Data Factory.
application-insights azure azure-data-factory azure-functions databricks monitoring service-bus
Last synced: 03 Dec 2024
https://github.com/sjrusso8/fastapi-lakehouse
Connect FastAPI to a Databricks Lakehouse
databricks fastapi fastapi-template lakehouse
Last synced: 13 Apr 2025
https://github.com/hamza88-coder/real-time-recruitment-system-with-ai-and-data-analytics
Simulation of job offers and CVs with real-time processing, classification, and analytics using Kafka, Ray, Spark, and Databricks. Includes a Flask-based recommendation system and Tableau visualizations.
apache-nifi chatbot databricks dbt delta-lake docker faiss flask k-means kafka llama3 pinecone postgresql ray redis snowflake spark sparkml
Last synced: 13 Jan 2025
https://github.com/renardeinside/dbx-scala-example
Sample project for Scala applications with dbx and a CI/CD setup based on GitHub Actions.
cicd databricks github-actions scala
Last synced: 23 Apr 2025
https://github.com/kevinknights29/databricks_llm101x
This project contains the lab notebooks from the course "Large Language Models: Application through Production" by Databricks
databricks jupyter-notebook llms
Last synced: 12 Apr 2025
https://github.com/turbot/steampipe-plugin-databricks
Use SQL to instantly query Databricks resources. Open source CLI. No DB required.
backup databricks etl hacktoberfest postgresql postgresql-fdw sql sqlite steampipe steampipe-plugin zero-etl
Last synced: 22 Apr 2025
https://github.com/datawaves-xyz/dbt_datawaves_wallet_labels
Ethereum Wallet labels built using dbt.
blockchain blockchain-analytics databricks dbt ethereum nft whales
Last synced: 11 May 2025