{"id":18400650,"url":"https://github.com/databricks/dbt-databricks","last_synced_at":"2026-02-26T18:28:55.331Z","repository":{"id":37569887,"uuid":"419004753","full_name":"databricks/dbt-databricks","owner":"databricks","description":"A dbt adapter for Databricks.","archived":false,"fork":false,"pushed_at":"2025-05-14T00:23:31.000Z","size":4702,"stargazers_count":268,"open_issues_count":80,"forks_count":137,"subscribers_count":23,"default_branch":"main","last_synced_at":"2025-05-14T09:05:42.061Z","etag":null,"topics":["databricks","dbt","etl","sql"],"latest_commit_sha":null,"homepage":"https://databricks.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databricks.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.MD","funding":null,"license":"License.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-10-19T16:26:44.000Z","updated_at":"2025-05-13T18:54:46.000Z","dependencies_parsed_at":"2024-04-19T22:26:21.739Z","dependency_job_id":"4bd211e6-79ba-41f7-952e-cb1bbb4c43ef","html_url":"https://github.com/databricks/dbt-databricks","commit_stats":null,"previous_names":[],"tags_count":123,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fdbt-databricks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fdbt-databricks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fdbt-databricks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fdb
t-databricks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databricks","download_url":"https://codeload.github.com/databricks/dbt-databricks/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254110374,"owners_count":22016391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["databricks","dbt","etl","sql"],"created_at":"2024-11-06T02:35:40.620Z","updated_at":"2026-02-26T18:28:55.319Z","avatar_url":"https://github.com/databricks.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://bynder-public-us-west-2.s3.amazonaws.com/styleguide/ABB317701CA31CB7F29268E32B303CAE-pdf-column-1.png\" alt=\"databricks logo\" width=\"50%\" /\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/dbt-labs/dbt/ec7dee39f793aa4f7dd3dae37282cc87664813e4/etc/dbt-logo-full.svg\" alt=\"dbt logo\" width=\"250\"/\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/databricks/dbt-databricks/actions/workflows/main.yml\"\u003e\n    \u003cimg src=\"https://github.com/databricks/dbt-databricks/actions/workflows/main.yml/badge.svg?event=push\" alt=\"Unit Tests Badge\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/databricks/dbt-databricks/actions/workflows/integration.yml\"\u003e\n    \u003cimg src=\"https://github.com/databricks/dbt-databricks/actions/workflows/integration.yml/badge.svg?event=push\" alt=\"Integration Tests Badge\"/\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n**[dbt](https://www.getdbt.com/)** enables data 
analysts and engineers to transform their data using the same practices that software engineers use to build applications.\n\nThe **[Databricks Lakehouse](https://www.databricks.com/)** provides one simple platform to unify all your data, analytics and AI workloads.\n\n# dbt-databricks\n\nThe `dbt-databricks` adapter contains all of the code enabling dbt to work with Databricks. This adapter is based on the amazing work done in [dbt-spark](https://github.com/dbt-labs/dbt-spark). Some key features include:\n\n- **Easy setup**. No need to install an ODBC driver, as the adapter uses pure Python APIs.\n- **Open by default**. For example, it uses the open and performant [Delta](https://delta.io/) table format by default. This has many benefits, including letting you use `MERGE` as the default incremental materialization strategy.\n- **Support for Unity Catalog**. dbt-databricks supports the 3-level namespace of Unity Catalog (catalog / schema / relations) so you can organize and secure your data the way you like.\n- **Performance**. The adapter generates SQL expressions that are automatically accelerated by the native, vectorized [Photon](https://databricks.com/product/photon) execution engine.\n\n## Choosing between dbt-databricks and dbt-spark\n\nIf you are developing a dbt project on Databricks, we recommend using `dbt-databricks` for the reasons noted above.\n\n`dbt-spark` is an actively developed adapter which works with Databricks as well as Apache Spark anywhere it is hosted, e.g. 
on AWS EMR.\n\n## Getting started\n\n### Installation\n\nInstall using pip:\n\n```nofmt\npip install dbt-databricks\n```\n\nUpgrade to the latest version:\n\n```nofmt\npip install --upgrade dbt-databricks\n```\n\n### Profile Setup\n\n```nofmt\nyour_profile_name:\n  target: dev\n  outputs:\n    dev:\n      type: databricks\n      catalog: [optional catalog name, if you are using Unity Catalog]\n      schema: [database/schema name]\n      host: [your.databrickshost.com]\n      http_path: [/sql/your/http/path]\n      token: [dapiXXXXXXXXXXXXXXXXXXXXXXX]\n```\n\n### Documentation\n\nFor comprehensive documentation on Databricks-specific features, configurations, and capabilities:\n\n- **[Databricks configurations](https://docs.getdbt.com/reference/resource-configs/databricks-configs)** - Complete reference for all Databricks-specific model configurations, materializations, and incremental strategies\n- **[Connect to Databricks](https://docs.getdbt.com/docs/core/connect-data-platform/databricks-setup)** - Setup and authentication guide\n\n### Quick Starts\n\nThe following quick starts will get you up and running with the `dbt-databricks` adapter:\n\n- [Set up your dbt project with Databricks](https://docs.getdbt.com/guides/set-up-your-databricks-dbt-project)\n- Using dbt Cloud with Databricks ([Azure](https://docs.microsoft.com/en-us/azure/databricks/integrations/prep/dbt-cloud) | [AWS](https://docs.databricks.com/integrations/prep/dbt-cloud.html))\n- [Running dbt production jobs on Databricks Workflows](https://github.com/databricks/dbt-databricks/blob/main/docs/databricks-workflows.md)\n- [Using Unity Catalog with dbt-databricks](https://github.com/databricks/dbt-databricks/blob/main/docs/uc.md)\n- [Continuous integration in dbt](https://docs.getdbt.com/docs/deploy/continuous-integration)\n- [Loading data from S3 into Delta using the databricks_copy_into macro](https://github.com/databricks/dbt-databricks/blob/main/docs/databricks-copy-into-macro-aws.md)\n- 
[Contribute to this repository](CONTRIBUTING.MD)\n\n### Compatibility\n\nThe `dbt-databricks` adapter has been tested:\n\n- with Python 3.7 or above.\n- against Databricks SQL and Databricks Runtime releases 9.1 LTS and later.\n\n### Tips and Tricks\n\n#### Choosing compute for a Python model\n\nYou can override the compute used for a specific Python model by setting the `http_path` property in the model configuration. This can be useful if, for example, you want to run a Python model on an all-purpose cluster while running SQL models on a SQL warehouse. Note that this capability is only available for Python models.\n\n```python\ndef model(dbt, session):\n    dbt.config(\n        http_path=\"sql/protocolv1/...\"\n    )\n    # ... build and return a DataFrame as usual\n```\n\n#### Python models and ANSI mode\n\nWhen ANSI mode is enabled (`spark.sql.ansi.enabled=true`), there are limitations when using pandas DataFrames in Python models:\n\n1. **Regular pandas DataFrames**: dbt-databricks will automatically handle conversion even when ANSI mode is enabled, falling back to `spark.createDataFrame()` if needed.\n\n2. **pandas-on-Spark DataFrames**: If you create pandas-on-Spark DataFrames directly in your model (using `pyspark.pandas` or `databricks.koalas`), you may encounter errors with ANSI mode enabled. 
In this case, you have two options:\n   - Disable ANSI mode for your session: Set `spark.sql.ansi.enabled=false` in your cluster or SQL warehouse configuration\n   - Set the pandas-on-Spark option in your model code:\n     ```python\n     import pyspark.pandas as ps\n     ps.set_option('compute.fail_on_ansi_mode', False)\n     ```\n     Note: This may cause unexpected behavior as pandas-on-Spark follows pandas semantics (returning null/NaN for invalid operations) rather than ANSI SQL semantics (raising errors).\n\nFor more information about ANSI mode and its implications, see the [Spark documentation on ANSI compliance](https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html).\n","funding_links":[],"categories":["🧱 Databricks"],"sub_categories":["📚 Repos"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks%2Fdbt-databricks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabricks%2Fdbt-databricks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks%2Fdbt-databricks/lists"}