{"id":13453955,"url":"https://github.com/duckdb/dbt-duckdb","last_synced_at":"2026-02-18T07:01:14.339Z","repository":{"id":37498332,"uuid":"298630871","full_name":"duckdb/dbt-duckdb","owner":"duckdb","description":"dbt adapter for DuckDB","archived":false,"fork":false,"pushed_at":"2026-02-09T18:47:19.000Z","size":1136,"stargazers_count":1227,"open_issues_count":68,"forks_count":123,"subscribers_count":23,"default_branch":"master","last_synced_at":"2026-02-13T23:44:54.048Z","etag":null,"topics":["dbt","duckdb"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/duckdb.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-09-25T16:54:41.000Z","updated_at":"2026-02-12T18:47:01.000Z","dependencies_parsed_at":"2023-09-12T02:28:10.097Z","dependency_job_id":"c4f00d91-0f68-45f7-ae34-5bc652d902e9","html_url":"https://github.com/duckdb/dbt-duckdb","commit_stats":null,"previous_names":["duckdb/dbt-duckdb","jwills/dbt-duckdb"],"tags_count":30,"template":false,"template_full_name":null,"purl":"pkg:github/duckdb/dbt-duckdb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duckdb%2Fdbt-duckdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duckdb%2Fdbt-duckdb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duckdb%2Fdbt-duckdb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/duckdb%2Fdbt-duckdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/duckdb","download_url":"https://codeload.github.com/duckdb/dbt-duckdb/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duckdb%2Fdbt-duckdb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29571886,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T06:19:27.422Z","status":"ssl_error","status_checked_at":"2026-02-18T06:18:44.348Z","response_time":162,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dbt","duckdb"],"created_at":"2024-07-31T08:00:49.718Z","updated_at":"2026-02-18T07:01:14.326Z","avatar_url":"https://github.com/duckdb.png","language":"Python","funding_links":[],"categories":["Integrations","others","Python","🔄 Data Plattform Tools"],"sub_categories":["Web Clients","🧠 Prompt Engineering \u0026 Memory Bank"],"readme":"## dbt-duckdb\n\n[DuckDB](http://duckdb.org) is an embedded database, similar to SQLite, but designed for OLAP-style analytics.\nIt is crazy fast and allows you to read and write data stored in CSV, JSON, and Parquet files directly, without requiring you to load\nthem into the database first.\n\n[dbt](http://getdbt.com) is the best way to manage a collection of data transformations written in SQL 
or Python for analytics\nand data science. `dbt-duckdb` is the project that ties DuckDB and dbt together, allowing you to create a [Modern Data Stack In\nA Box](https://duckdb.org/2022/10/12/modern-data-stack-in-a-box.html) or a simple and powerful data lakehouse with Python.\n\n### Installation\n\nThis project is hosted on PyPI, so you should be able to install it and the necessary dependencies via:\n\n`pip3 install dbt-duckdb`\n\nThe latest supported version targets `dbt-core` versions \u003e= 1.8.x and `duckdb` version 1.1.x, but we work hard to ensure that newer\nversions of DuckDB will continue to work with the adapter as they are released.\n\n### Configuring Your Profile\n\nA super-minimal dbt-duckdb profile only needs *one* setting:\n\n````\ndefault:\n  outputs:\n    dev:\n      type: duckdb\n  target: dev\n````\n\nThis will run your dbt-duckdb pipeline against an in-memory DuckDB database that will not be persisted after your run completes. This may\nnot seem very useful at first, but it turns out to be a powerful tool for a) testing out data pipelines, either locally or in CI jobs and\nb) running data pipelines that operate purely on external CSV, Parquet, or JSON files. More details on how to work with external data files\nin dbt-duckdb are provided in the docs on [reading and writing external files](#reading-and-writing-external-files).\n\nTo have your dbt pipeline persist relations in a DuckDB file, set the `path` field in your profile to the path\nof the DuckDB file that you would like to read and write on your local filesystem. (For in-memory pipelines, the `path`\nis automatically set to the special value `:memory:`). 
By default, the `path` is relative to your `profiles.yml` file location.\nIf the database doesn't exist at the specified `path`, DuckDB will automatically create it.\n\n`dbt-duckdb` also supports common profile fields like `schema` and `threads`, but the `database` property is special: its value is automatically set\nto the basename of the file in the `path` argument with the suffix removed. For example, if the `path` is `/tmp/a/dbfile.duckdb`, the `database`\nfield will be set to `dbfile`. If you are running in in-memory mode, then the `database` property will be automatically set to `memory`.\n\n#### Using MotherDuck\n\nAs of `dbt-duckdb` 1.5.2, you can connect to a DuckDB instance running on [MotherDuck](http://www.motherduck.com) by setting your `path` to use a [md:\u003cdatabase\u003e connection string](https://motherduck.com/docs/getting-started/connect-query-from-python/installation-authentication), just as you would with the DuckDB CLI\nor the Python API.\n\nMotherDuck databases generally work the same way as local DuckDB databases from the perspective of dbt, but\nthere are a [few differences to be aware of](https://motherduck.com/docs/architecture-and-capabilities#considerations-and-limitations):\n1. MotherDuck is compatible with client DuckDB versions 0.10.2 and older.\n1. MotherDuck preloads a set of the most common DuckDB extensions for you, but does not support loading custom extensions or user-defined functions.\n\nAs of `dbt-duckdb` 1.9.6, you can also connect to a DuckDB instance running [hosted DuckLake on MotherDuck](https://motherduck.com/blog/ducklake-motherduck/) by creating a DuckLake on MotherDuck and then setting `is_ducklake: true` in your `profiles.yml`.\n\n```sql\n-- to use this, first create your own database in MotherDuck\nCREATE DATABASE my_ducklake\n  (TYPE ducklake, DATA_PATH 's3://...')\n```\n\nAn example profile is shown below under \"Attaching Additional Databases\". 
DuckLake must be identified so that safe DDL operations are applied by dbt.\n\n#### DuckDB Extensions, Settings, and Filesystems\n\nYou can install and load any core [DuckDB extensions](https://duckdb.org/docs/extensions/overview) by listing them in\nthe `extensions` field in your profile as a string. You can also set any additional [DuckDB configuration options](https://duckdb.org/docs/sql/configuration)\nvia the `settings` field, including options that are supported in the loaded extensions. You can also configure extensions from outside of the core\nextension repository (e.g., a community extension) by configuring the extension as a `name`/`repo` pair:\n\n```\ndefault:\n  outputs:\n    dev:\n      type: duckdb\n      path: /tmp/dbt.duckdb\n      extensions:\n        - httpfs\n        - parquet\n        - name: h3\n          repo: community\n        - name: uc_catalog\n          repo: core_nightly\n  target: dev\n```\n\nTo use the [DuckDB Secrets Manager](https://duckdb.org/docs/configuration/secrets_manager.html), you can use the `secrets` field. For example, to be able to connect to S3 and read/write\nParquet files using an AWS access key and secret, your profile would look something like this:\n\n```\ndefault:\n  outputs:\n    dev:\n      type: duckdb\n      path: /tmp/dbt.duckdb\n      extensions:\n        - httpfs\n        - parquet\n      secrets:\n        - type: s3\n          region: my-aws-region\n          key_id: \"{{ env_var('S3_ACCESS_KEY_ID') }}\"\n          secret: \"{{ env_var('S3_SECRET_ACCESS_KEY') }}\"\n  target: dev\n```\n\nAs of version `1.4.1`, we have added (experimental!) support for DuckDB's (experimental!) support for filesystems\nimplemented via [fsspec](https://duckdb.org/docs/guides/python/filesystems.html). 
The `fsspec` library provides\nsupport for reading and writing files from a [variety of cloud data storage systems](https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations)\nincluding S3, GCS, and Azure Blob Storage. You can configure a list of fsspec-compatible implementations for use with your dbt-duckdb project by installing the relevant Python modules\nand configuring your profile like so:\n\n```\ndefault:\n  outputs:\n    dev:\n      type: duckdb\n      path: /tmp/dbt.duckdb\n      filesystems:\n        - fs: s3\n          anon: false\n          key: \"{{ env_var('S3_ACCESS_KEY_ID') }}\"\n          secret: \"{{ env_var('S3_SECRET_ACCESS_KEY') }}\"\n          client_kwargs:\n            endpoint_url: \"http://localhost:4566\"\n  target: dev\n```\n\nHere, the `filesystems` property takes a list of configurations, where each entry must have a property named `fs` that indicates which `fsspec` protocol\nto load (so `s3`, `gcs`, `abfs`, etc.) and then an arbitrary set of other key-value pairs that are used to configure the `fsspec` implementation. You can see a simple example project that\nillustrates the usage of this feature to connect to a Localstack instance running S3 from dbt-duckdb [here](https://github.com/jwills/s3-demo).\n\n#### Fetching credentials from context\n\nInstead of specifying the credentials through the settings block, you can also use the `CREDENTIAL_CHAIN` secret provider. This means that you can use any supported mechanism from AWS to obtain credentials (e.g., web identity tokens). You can read more about the secret providers [here](https://duckdb.org/docs/configuration/secrets_manager.html#secret-providers). 
To use the `CREDENTIAL_CHAIN` provider and automatically fetch credentials from AWS, specify the `provider` in the `secrets` key:\n\n```\ndefault:\n  outputs:\n    dev:\n      type: duckdb\n      path: /tmp/dbt.duckdb\n      extensions:\n        - httpfs\n        - parquet\n      secrets:\n        - type: s3\n          provider: credential_chain\n  target: dev\n```\n\n#### Scoped credentials by storage prefix\n\nSecrets can be scoped so that different storage paths can use different credentials.\n\n```\ndefault:\n  outputs:\n    dev:\n      type: duckdb\n      path: /tmp/dbt.duckdb\n      extensions:\n        - httpfs\n        - parquet\n      secrets:\n        - type: s3\n          provider: credential_chain\n          scope: [ \"s3://bucket-in-eu-region\", \"s3://bucket-2-in-eu-region\" ]\n          region: \"eu-central-1\"\n        - type: s3\n          region: us-west-2\n          scope: \"s3://bucket-in-us-region\"\n```\n\nWhen fetching a secret for a path, the secret scopes are compared to the path, returning the matching secret for the path. In the case of multiple matching secrets, the longest prefix is chosen.\n\n#### Attaching Additional Databases\n\nDuckDB supports [attaching additional databases](https://duckdb.org/docs/sql/statements/attach.html) to your dbt-duckdb run so that you can read\nand write from multiple databases. 
Additional databases may be configured via the `attach` argument\nin your profile that was added in dbt-duckdb `1.4.0`:\n\n```\ndefault:\n  outputs:\n    dev:\n      type: duckdb\n      path: /tmp/dbt.duckdb\n      attach:\n        - path: /tmp/other.duckdb\n        - path: ./yet/another.duckdb\n          alias: yet_another\n        - path: s3://yep/even/this/works.duckdb\n          read_only: true\n        - path: sqlite.db\n          type: sqlite\n        - path: postgresql://username@hostname/dbname\n          type: postgres\n        # Using the options dict for arbitrary ATTACH options\n        - path: /tmp/special.duckdb\n          options:\n            cache_size: 1GB\n            threads: 4\n            enable_fsst: true\n```\n\nFor DuckLake, use `ducklake:` for local; for MotherDuck-managed DuckLake use `md:` with `is_ducklake: true`.\n\n```yaml\nattach:\n  - path: \"ducklake:my_ducklake.ddb\"\n  - path: \"md:my_other_ducklake\"\n    is_ducklake: true\n```\n\n\nThe attached databases may be referred to in your dbt sources and models by either the basename of the database file minus its suffix (e.g., `/tmp/other.duckdb` is the `other` database\nand `s3://yep/even/this/works.duckdb` is the `works` database) or by an alias that you specify (so the `./yet/another.duckdb` database in the above configuration is referred to\nas `yet_another` instead of `another`.) Note that these additional databases do not necessarily have to be DuckDB files: DuckDB's storage and catalog engines are pluggable, and\nDuckDB ships with support for reading and writing from attached databases. You can indicate the type of the database you are connecting to via the `type` argument,\nwhich currently supports `duckdb`, `sqlite` and `postgres`.\n\n##### Arbitrary ATTACH Options\n\nAs DuckDB continues to add new attachment options, you can use the `options` dictionary to specify any additional key-value pairs that will be passed to the `ATTACH` statement. 
This allows you to take advantage of new DuckDB features without waiting for explicit support in dbt-duckdb:\n\n```\nattach:\n  # Standard way using direct fields\n  - path: /tmp/db1.duckdb\n    type: sqlite\n    read_only: true\n\n  # New way using options dict (equivalent to above)\n  - path: /tmp/db2.duckdb\n    options:\n      type: sqlite\n      read_only: true\n\n  # Mix of both (no conflicts allowed)\n  - path: /tmp/db3.duckdb\n    type: sqlite\n    options:\n      block_size: 16384\n\n  # Using options dict for future DuckDB attachment options\n  - path: /tmp/db4.duckdb\n    options:\n      type: duckdb\n      # Example: hypothetical future options DuckDB might add\n      compression: lz4\n      memory_limit: 2GB\n```\n\nNote: If you specify the same option in both a direct field (`type`, `secret`, `read_only`) and in the `options` dict, dbt-duckdb will raise an error to prevent conflicts.\n\n#### Configuring dbt-duckdb Plugins\n\ndbt-duckdb has its own [plugin](dbt/adapters/duckdb/plugins/__init__.py) system to enable advanced users to extend\ndbt-duckdb with additional functionality, including:\n\n* Defining [custom Python UDFs](https://duckdb.org/docs/api/python/function.html) on the DuckDB database connection\nso that they can be used in your SQL models\n* Loading source data from [Excel](dbt/adapters/duckdb/plugins/excel.py), [Google Sheets](dbt/adapters/duckdb/plugins/gsheet.py), or [SQLAlchemy](dbt/adapters/duckdb/plugins/sqlalchemy.py) tables\n\nYou can find more details on [how to write your own plugins here](#writing-your-own-plugins). 
To configure a plugin for use\nin your dbt project, use the `plugins` property on the profile:\n\n```\ndefault:\n  outputs:\n    dev:\n      type: duckdb\n      path: /tmp/dbt.duckdb\n      plugins:\n        - module: gsheet\n          config:\n            method: oauth\n        - module: sqlalchemy\n          alias: sql\n          config:\n            connection_url: \"{{ env_var('DBT_ENV_SECRET_SQLALCHEMY_URI') }}\"\n        - module: path.to.custom_udf_module\n```\n\nEvery plugin must have a `module` property that indicates where the `Plugin` class to load is defined. There is\na set of built-in plugins that are defined in [dbt.adapters.duckdb.plugins](dbt/adapters/duckdb/plugins/) that\nmay be referenced by their base filename (e.g., `excel` or `gsheet`), while user-defined plugins (which are\ndescribed later in this document) should be referred to via their full module path name (e.g. a `lib.my.custom` module that defines a class named `Plugin`.)\n\nEach plugin instance has a name for logging and reference purposes that defaults to the name of the module\nbut that may be overridden by the user by setting the `alias` property in the configuration. Finally,\nmodules may be initialized using an arbitrary set of key-value pairs that are defined in the\n`config` dictionary. 
In this example, we initialize the `gsheet` plugin with the setting `method: oauth` and we\ninitialize the `sqlalchemy` plugin (aliased as \"sql\") with a `connection_url` that is set via an environment variable.\n\nPlease remember that using plugins may require you to add additional dependencies to the Python environment that your dbt-duckdb pipeline runs in:\n\n* `excel` depends on `pandas`, and `openpyxl` or `xlsxwriter` to perform writes\n* `gsheet` depends on `gspread` and `pandas`\n* `iceberg` depends on `pyiceberg` and Python \u003e= 3.10\n* `sqlalchemy` depends on `pandas`, `sqlalchemy`, and the driver(s) you need\n\n**Experimental:**\n\n* `delta` depends on `deltalake`, [an example project](https://github.com/milicevica23/dbt-duckdb-delta-plugin-demo)\n\n**Note:** Be aware that experimental features can change over time, and we would like your feedback on config and possible different use cases.\n\n#### Using Local Python Modules\n\nIn dbt-duckdb 1.6.0, we added a new profile setting named `module_paths` that allows users to specify a list\nof paths on the filesystem that contain additional Python modules that should be added to the Python processes'\n`sys.path` property. This allows users to include additional helper Python modules in their dbt projects that\ncan be accessed by the running dbt process and used to define custom dbt-duckdb Plugins or library code that is\nhelpful for creating dbt Python models.\n\n### Reading and Writing External Files\n\nOne of DuckDB's most powerful features is its ability to read and write CSV, JSON, and Parquet files directly, without needing to import/export\nthem from the database first.\n\n#### Reading from external files\n\nYou may reference external files in your dbt models either directly or as dbt `source`s by configuring the `external_location`\nin either the `meta` or the `config` option on the source definition. 
The difference is that settings under the `meta` option\nwill be propagated to the documentation for the source generated via `dbt docs generate`, but the settings under the `config`\noption will not be. Any source settings that should be excluded from the docs should be specified via `config`, while any\noptions that you would like to be included in the generated documentation should live under `meta`.\n\n```\nsources:\n  - name: external_source\n    meta:\n      external_location: \"s3://my-bucket/my-sources/{name}.parquet\"\n    tables:\n      - name: source1\n      - name: source2\n```\n\nHere, the `meta` option on `external_source` defines `external_location` as an [f-string](https://peps.python.org/pep-0498/) that\nallows us to express a pattern that indicates the location of any of the tables defined for that source. So a dbt model like:\n\n```\nSELECT *\nFROM {{ source('external_source', 'source1') }}\n```\n\nwill be compiled as:\n\n```\nSELECT *\nFROM 's3://my-bucket/my-sources/source1.parquet'\n```\n\nIf one of the source tables deviates from the pattern or needs some other special handling, then the `external_location` can also be set on the `meta` or `config`\noptions for the table itself, for example:\n\n```\nsources:\n  - name: external_source\n    meta:\n      external_location: \"s3://my-bucket/my-sources/{name}.parquet\"\n    tables:\n      - name: source1\n      - name: source2\n        config:\n          external_location: \"read_parquet(['s3://my-bucket/my-sources/source2a.parquet', 's3://my-bucket/my-sources/source2b.parquet'])\"\n```\n\nIn this situation, the `external_location` setting on the `source2` table will take precedence, so a dbt model like:\n\n```\nSELECT *\nFROM {{ source('external_source', 'source2') }}\n```\n\nwill be compiled to the SQL query:\n\n```\nSELECT *\nFROM read_parquet(['s3://my-bucket/my-sources/source2a.parquet', 's3://my-bucket/my-sources/source2b.parquet'])\n```\n\nNote that the value of the `external_location` property does 
not need to be a path-like string; it can also be a function\ncall, which is helpful in the case that you have an external source that is a CSV file which requires special handling for DuckDB to load it correctly:\n\n```\nsources:\n  - name: flights_source\n    tables:\n      - name: flights\n        config:\n          external_location: \"read_csv('flights.csv', types={'FlightDate': 'DATE'}, names=['FlightDate', 'UniqueCarrier'])\"\n          formatter: oldstyle\n```\n\nNote that we need to override the default `str.format` string formatting strategy for this example\nbecause the `types={'FlightDate': 'DATE'}` argument to the `read_csv` function will be interpreted by\n`str.format` as a template to be matched on, which will cause a `KeyError: \"'FlightDate'\"` when we attempt\nto parse the source in a dbt model. The `formatter` configuration option for the source indicates whether\nwe should use `newstyle` string formatting (the default), `oldstyle` string formatting, or `template` string\nformatting. 
You can read up on the strategies the various string formatting techniques use at this\n[Stack Overflow answer](https://stackoverflow.com/questions/13451989/pythons-many-ways-of-string-formatting-are-the-older-ones-going-to-be-depre) and see examples of their use\nin this [dbt-duckdb integration test](https://github.com/jwills/dbt-duckdb/blob/master/tests/functional/adapter/test_sources.py).\n\n#### Writing to external files\n\nWe support creating dbt models that are backed by external files via the `external` materialization strategy:\n\n```\n{{ config(materialized='external', location='local/directory/file.parquet') }}\nSELECT m.*, s.id IS NOT NULL as has_source_id\nFROM {{ ref('upstream_model') }} m\nLEFT JOIN {{ source('upstream', 'source') }} s USING (id)\n```\n\n| Option | Default | Description\n| :---:    |  :---:    | ---\n| location | [external_location](dbt/include/duckdb/macros/utils/external_location.sql) macro | The path to write the external materialization to. See below for more details.\n| format | parquet | The format of the external file (parquet, csv, or json)\n| delimiter | ,    | For CSV files, the delimiter to use for fields.\n| options | None | Any other options to pass to DuckDB's `COPY` operation (e.g., `partition_by`, `codec`, etc.)\n| glue_register | false | If true, try to register the file created by this model with the AWS Glue Catalog.\n| glue_database | default | The name of the AWS Glue database to register the model with.\n\nIf the `location` argument is specified, it must be a filename (or S3 bucket/path), and dbt-duckdb will attempt to infer\nthe `format` argument from the file extension of the `location` if the `format` argument is unspecified (this functionality was\nadded in version 1.4.1.)\n\nIf the `location` argument is _not_ specified, then the external file will be named after the model.sql (or model.py) file that defined it\nwith an extension that matches the `format` argument (`parquet`, `csv`, or `json`). 
By default, the external files are created\nrelative to the current working directory, but you can change the default directory (or S3 bucket/prefix) by specifying the\n`external_root` setting in your DuckDB profile.\n\nUnfortunately incremental materialization strategies are not yet supported for `external` models.\n\n\n#### Incremental Strategy Configuration\n\ndbt-duckdb supports the `delete+insert`, `append`, `merge`, and `microbatch` strategies for incremental `table` models.\n\n* The `merge` strategy requires DuckDB \u003e= 1.4.0 and provides access to DuckDB's native MERGE statement.\n* The `microbatch` strategy requires dbt-core's microbatch support (dbt-core \u003e= 1.9).\n\n**Append Strategy:**\n\n| Configuration | Type | Default | Description\n| :---: | :---: | :---: | ---\n| `incremental_predicates` | list | null | SQL conditions to filter which records get appended\n\nExample:\n\n```yaml\nmodels:\n  - name: my_incremental_model\n    config:\n      materialized: incremental\n      incremental_strategy: append\n      incremental_predicates: [\"created_at \u003e (select max(created_at) from {{ this }})\"]\n```\n\n\n**Delete+Insert Strategy:**\n\n| Configuration | Type | Default | Description\n| :---: | :---: | :---: | ---\n| `unique_key` | string/list | required | Column(s) used to identify records for deletion\n| `incremental_predicates` | list | null | SQL conditions to filter the delete and insert operations\n\nExample:\n\n```yaml\nmodels:\n  - name: my_incremental_model\n    config:\n      materialized: incremental\n      incremental_strategy: delete+insert\n      unique_key: id  # or ['id', 'date'] for composite keys\n      incremental_predicates: [\"updated_at \u003e= '2023-01-01'\"]\n```\n\n\n**Microbatch Strategy:**\n\nMicrobatch runs incremental builds in time-based batches (using a configured `event_time` column) and generates per-batch `delete` + `insert` statements scoped to the batch window. 
Note that microbatching is most performant for physically _partitioned_ tables, for example on a DuckLake, but it is not necessarily the best strategy for DuckDB tables or Parquet files that work with row groups.\n\nImportant: dbt-duckdb does not support `unique_key` with `incremental_strategy: microbatch`. Microbatch does not do key-based upserts, so a specified `unique_key` is ignored, which can be misleading. If you need key-based upserts, use `incremental_strategy: merge`.\n\n| Configuration | Type | Default | Description\n| :---: | :---: | :---: | ---\n| `event_time` | string | required | Name of the timestamp column used for microbatch windowing\n| `begin` | string | required | Start time for batching (for example `YYYY-MM-DD`)\n| `batch_size` | string | required | Batch grain (for example `day`, `hour`)\n| `incremental_predicates` | list | null | Optional additional predicates applied within each batch\n\nExample:\n\n```yaml\nmodels:\n  - name: my_microbatch_model\n    config:\n      materialized: incremental\n      incremental_strategy: microbatch\n      event_time: event_time\n      begin: '2025-01-01'\n      batch_size: day\n      incremental_predicates: [\"country = 'US'\"]\n```\n\n\u003e [!TIP]\n\u003e Microbatching might not always be the best option from a performance perspective. Consider that DuckDB operates on row groups, not physical partitions (unless you have explicitly partitioned data in a DuckLake). While you can do batch processing in parallel, more threads with more batches in parallel does not always equal better performance, as row groups might not align 1-1 with the batches. Be sure to test different numbers of threads to match your use case.\n\n\n**Merge Strategy (DuckDB \u003e= 1.4.0):**\n\nThe merge strategy leverages DuckDB's native MERGE statement to efficiently synchronize data between your incremental model and the target table. 
This strategy offers three configuration approaches: basic configuration (using simple options), enhanced configuration with explicit column control, and fully custom merge clauses.\n\n**Basic Configuration (Default Behavior):**\n\nWhen you specify only `unique_key`, dbt-duckdb uses DuckDB's `UPDATE BY NAME` and `INSERT BY NAME` operations, which automatically match columns by name between source and target tables.\n\n```yaml\nmodels:\n  - name: my_incremental_model\n    config:\n      materialized: incremental\n      incremental_strategy: merge\n      unique_key: id  # or ['id', 'date'] for composite keys\n```\n\nThis generates SQL equivalent to:\n\n```sql\nMERGE INTO target AS DBT_INTERNAL_DEST\nUSING source AS DBT_INTERNAL_SOURCE\nON (DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id)\nWHEN MATCHED THEN UPDATE BY NAME\nWHEN NOT MATCHED THEN INSERT BY NAME\n```\n\n**Enhanced Configuration:**\n\nThese options extend the basic merge behavior with additional control over which records get updated or inserted, which columns are affected, and how values are set.\n\n| Configuration | Type | Default | Description\n| :---: | :---: | :---: | ---\n| `unique_key` | string/list | required | Column(s) used for the MERGE join condition\n| `incremental_predicates` | list | null | Additional SQL conditions to filter the MERGE operation\n| `merge_on_using_columns` | list | null | Columns for USING clause syntax instead of ON for the join condition\n| `merge_update_condition` | string | null | SQL condition to control when matched records are updated\n| `merge_insert_condition` | string | null | SQL condition to control when unmatched records are inserted\n| `merge_update_columns` | list | null | Specific columns to update\n| `merge_exclude_columns` | list | null | Columns to exclude from updates\n| `merge_update_set_expressions` | dict | null | Custom expressions for column updates\n| `merge_returning_columns` | list | null | Columns to return from the MERGE operation\n\n**Example 
with Enhanced Options:**\n\n```yaml\nmodels:\n  - name: my_incremental_model\n    config:\n      materialized: incremental\n      incremental_strategy: merge\n      unique_key: id\n      merge_update_condition: \"DBT_INTERNAL_DEST.age \u003c DBT_INTERNAL_SOURCE.age\"\n      merge_insert_condition: \"DBT_INTERNAL_SOURCE.status != 'inactive'\"\n      merge_update_columns: ['name', 'age', 'status']\n      merge_exclude_columns: ['created_at']\n      merge_update_set_expressions:\n        updated_at: \"CURRENT_TIMESTAMP\"\n        version: \"COALESCE(DBT_INTERNAL_DEST.version, 0) + 1\"\n```\n\n**Custom Merge Clauses:**\n\nFor maximum flexibility, use `merge_clauses` to define custom `when_matched` and `when_not_matched` behaviors.  This is especially helpful in more complex scenarios where you have more than one action, multiple conditions, or error handling within a `when_matched` or `when_not_matched` clause.\n\n*Supported When Matched Actions and Modes:*\n- `update`: Update the matched record\n  - `mode: by_name`: Use `UPDATE BY NAME` (default)\n  - `mode: by_position`: Use `UPDATE BY POSITION`\n  - `mode: star`: Use `UPDATE SET *`\n  - `mode: explicit`: Use explicit column list with custom expressions\n    - `update.include`: List of columns to include in the update\n    - `update.exclude`: List of columns to exclude from the update\n    - `update.set_expressions`: Dictionary of column-to-expression mappings for custom update values\n- `delete`: Delete the matched record\n- `do_nothing`: Skip the matched record\n- `error`: Raise an error for matched records\n  - `error_message`: Optional custom error message\n\n*Supported When Not Matched Actions and Modes:*\n- `insert`: Insert the unmatched record\n  - `mode: by_name`: Use `INSERT BY NAME` (default)\n  - `mode: by_position`: Use `INSERT BY POSITION`\n  - `mode: star`: Use `INSERT *`\n  - `mode: explicit`: Use explicit column and value lists\n    - `insert.columns`: List of column names for the INSERT statement\n   
 - `insert.values`: List of values/expressions corresponding to the columns\n- `update`: Update unmatched records (for WHEN NOT MATCHED BY SOURCE scenarios)\n  - `set_expressions`: Dictionary of column-to-expression mappings\n- `delete`: Delete unmatched records\n- `do_nothing`: Skip the unmatched record\n- `error`: Raise an error for unmatched records\n  - `error_message`: Optional custom error message\n\n**Example with Custom Merge Clauses:**\n\n```yaml\nmodels:\n  - name: my_incremental_model\n    config:\n      materialized: incremental\n      incremental_strategy: merge\n      unique_key: id\n      merge_clauses:\n        when_matched:\n          - action: update\n            mode: explicit\n            condition: \"DBT_INTERNAL_SOURCE.status = 'active'\"\n            update:\n              include: ['name', 'email', 'status']\n              exclude: ['created_at']\n              set_expressions:\n                updated_at: \"CURRENT_TIMESTAMP\"\n                version: \"COALESCE(DBT_INTERNAL_DEST.version, 0) + 1\"\n          - action: delete\n            condition: \"DBT_INTERNAL_SOURCE.status = 'deleted'\"\n        when_not_matched:\n          - action: insert\n            mode: explicit\n            insert:\n              columns: ['id', 'name', 'email', 'created_at']\n              values: ['DBT_INTERNAL_SOURCE.id', 'DBT_INTERNAL_SOURCE.name', 'DBT_INTERNAL_SOURCE.email', 'CURRENT_TIMESTAMP']\n```\n\n**DuckLake Restrictions:**\n\nWhen using DuckLake (attached DuckLake databases), MERGE statements are limited to a single UPDATE or DELETE action in `when_matched` clauses due to DuckLake's current MERGE implementation constraints.\n\n**Table Aliases:**\n\nIn conditions and expressions, use these table aliases:\n- `DBT_INTERNAL_SOURCE`: References the incoming data (your model's SELECT)\n- `DBT_INTERNAL_DEST`: References the existing target table\n\n#### Re-running external models with an in-memory version of dbt-duckdb\nWhen using `:memory:` as the DuckDB 
database, subsequent dbt runs can fail when selecting a subset of models that depend on external tables. This is because external files are only registered as DuckDB views when they are created, not when they are referenced. To work around this issue, we provide the `register_upstream_external_models` macro, which can be triggered at the beginning of a run. To enable this automatic registration, place the following in your `dbt_project.yml` file:\n\n```yaml\non-run-start:\n  - \"{{ register_upstream_external_models() }}\"\n```\n\n### `table_function` Materialization\n\ndbt-duckdb also provides a custom `table_function` materialization that uses DuckDB's Table Function / Table Macro feature to provide parameterized views.\n\nWhy use this materialization?\n* Late binding of functions means that the underlying table can change (have new columns added) and the function does not need to be recreated.\n  * (With a view, the create view statement would need to be re-run).\n  * This allows for skipping parts of the dbt DAG, even if the underlying table changed.\n* Parameters can force filter pushdown\n* Functions can provide advanced features like dynamic SQL (the `query` and `query_table` functions)\n\n\nExample table_function creation with 0 parameters:\n```sql\n{{\n    config(\n        materialized='table_function'\n    )\n}}\nselect * from {{ ref(\"example_table\") }}\n```\n\nExample table_function invocation (note the parentheses are needed even with 0 parameters!):\n```sql\nselect * from {{ ref(\"my_table_function\") }}()\n```\n\nExample table_function creation with 2 parameters:\n```sql\n{{\n    config(\n        materialized='table_function',\n        parameters=['where_a', 'where_b']\n    )\n}}\nselect *\nfrom {{ ref(\"example_table\") }}\nwhere 1=1\n    and a = where_a\n    and b = where_b\n```\n\nExample table_function invocation with 2 parameters:\n```sql\nselect * from {{ ref(\"my_table_function_with_parameters\") }}(1, 2)\n```\n\n### Python Support\n\ndbt added 
support for [Python models in version 1.3.0](https://docs.getdbt.com/docs/build/python-models). For most data platforms,\ndbt will package up the Python code defined in a `.py` file and ship it off to be executed in whatever Python environment that\ndata platform supports (e.g., Snowpark for Snowflake or Dataproc for BigQuery.) In dbt-duckdb, we execute Python models in the same\nprocess that owns the connection to the DuckDB database, which by default, is the Python process that is created when you run dbt.\nTo execute the Python model, we treat the `.py` file that your model is defined in as a Python module and load it into the\nrunning process using [importlib](https://docs.python.org/3/library/importlib.html). We then construct the arguments to the `model`\nfunction that you defined (a `dbt` object that contains the names of any `ref` and `source` information your model needs and a\n`DuckDBPyConnection` object for you to interact with the underlying DuckDB database), call the `model` function, and then materialize\nthe returned object as a table in DuckDB.\n\nThe value of the `dbt.ref` and `dbt.source` functions inside of a Python model will be a [DuckDB Relation](https://duckdb.org/docs/api/python/reference/)\nobject that can be easily converted into a Pandas/Polars DataFrame or an Arrow table. The return value of the `model` function can be\nany Python object that DuckDB knows how to turn into a table, including a Pandas/Polars `DataFrame`, a DuckDB `Relation`, or an Arrow `Table`,\n`Dataset`, `RecordBatchReader`, or `Scanner`.\n\n#### Batch processing with Python models\n\nAs of version 1.6.1, it is possible to both read and write data in chunks, which allows for larger-than-memory\ndatasets to be manipulated in Python models. 
Here is a basic example:\n```python\nimport pyarrow as pa\n\ndef batcher(batch_reader: pa.RecordBatchReader):\n    for batch in batch_reader:\n        df = batch.to_pandas()\n        # Do some operations on the DF...\n        # ...then yield back a new batch\n        yield pa.RecordBatch.from_pandas(df)\n\ndef model(dbt, session):\n    big_model = dbt.ref(\"big_model\")\n    batch_reader = big_model.record_batch(100_000)\n    batch_iter = batcher(batch_reader)\n    return pa.RecordBatchReader.from_batches(batch_reader.schema, batch_iter)\n```\n\n### Writing Your Own Plugins\n\nDefining your own dbt-duckdb plugin is as simple as creating a Python module that defines a class named `Plugin` that\ninherits from [dbt.adapters.duckdb.plugins.BasePlugin](dbt/adapters/duckdb/plugins/__init__.py). There are currently\nfour methods that may be implemented in your Plugin class:\n\n1. `initialize`: Takes in the `config` dictionary for the plugin that is defined in the profile to enable any\nadditional configuration for the module based on the project; this method is called once when an instance of the\n`Plugin` class is created.\n1. `configure_connection`: Takes an instance of the `DuckDBPyConnection` object used to connect to the DuckDB\ndatabase and may perform any additional configuration of that object that is needed by the plugin, like defining\ncustom user-defined functions.\n1. `load`: Takes a [SourceConfig](dbt/adapters/duckdb/utils.py) instance, which encapsulates the configuration for\na dbt source and can optionally return a DataFrame-like object that DuckDB knows how to turn into a table (this is\nsimilar to a dbt-duckdb Python model, but without the ability to `ref` any models or access any information beyond\nthe source config.)\n1. `store`: Takes a [TargetConfig](dbt/adapters/duckdb/utils.py) instance, which encapsulates the configuration for\nan `external` materialization and can perform additional operations once the CSV/Parquet/JSON file is written. 
The\n[glue](dbt/adapters/duckdb/plugins/glue.py) and [sqlalchemy](dbt/adapters/duckdb/plugins/sqlalchemy.py) plugins are examples\nthat demonstrate how to use the `store` operation to register an AWS Glue database table or upload a DataFrame to\nan external database, respectively.\n\ndbt-duckdb ships with a number of [built-in plugins](dbt/adapters/duckdb/plugins/) that can be used as examples\nfor implementing your own.\n\n### Interactive Shell\n\nAs of version 1.9.3, dbt-duckdb includes an interactive shell that allows you to run dbt commands and query the DuckDB database in an integrated CLI environment. The shell automatically launches the [DuckDB UI](https://duckdb.org/2025/03/12/duckdb-ui.html), providing a visual interface to explore your data while working with your dbt models.\n\nTo start the interactive shell, use:\n\n```bash\npython -m dbt.adapters.duckdb.cli\n```\n\nYou can specify a profile to use with the `--profile` flag:\n\n```bash\npython -m dbt.adapters.duckdb.cli --profile my_profile\n```\n\nThe shell provides access to all standard dbt commands:\n- `run` - Run dbt models\n- `test` - Run tests on dbt models\n- `build` - Build and test dbt models\n- `seed` - Load seed files\n- `snapshot` - Run snapshots\n- `compile` - Compile models without running them\n- `parse` - Parse the project\n- `debug` - Debug connection\n- `deps` - Install dependencies\n- `list` - List resources\n\nWhen you launch the shell, it automatically:\n1. Runs `dbt debug` to test your connection\n2. Parses your dbt project\n3. Launches the DuckDB UI for visual data exploration\n\nThe shell supports model name autocompletion if you install the optional `iterfzf` package:\n\n```bash\npip install iterfzf\n```\n\nExample workflow:\n1. Start the interactive shell\n2. View your project's models in the launched DuckDB UI\n3. Run `build` to build your models\n4. 
Immediately see the results in the UI and continue iterating\n\nThis interactive environment makes it easier to develop and test dbt models while simultaneously exploring the data in a visual interface.\n\n### Roadmap\n\nThings that we would like to add in the near future:\n\n* Support for Delta and Iceberg external table formats (both as sources and destinations)\n* Make dbt's incremental models and snapshots work with external materializations\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduckdb%2Fdbt-duckdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fduckdb%2Fdbt-duckdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduckdb%2Fdbt-duckdb/lists"}