Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/erika-e/dbt-tips
Collection of dbt Tips and Tricks
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/erika-e/dbt-tips
- Owner: erika-e
- License: gpl-3.0
- Created: 2021-05-23T14:53:10.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-10-12T19:59:12.000Z (over 2 years ago)
- Last Synced: 2024-08-03T21:03:23.426Z (6 months ago)
- Size: 56.6 KB
- Stars: 358
- Watchers: 18
- Forks: 28
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-dbt - dbt-tips - Excellent companion to your dbt practice with rich collection of tips. (Utilities)
README
# dbt-tips
Collection of dbt tips and tricks. Includes links to the dbt slack community which you can [join here](https://www.getdbt.com/community/join-the-community). See something that needs to be changed? See [contributing guidelines](CONTRIBUTING.md).
## Skip to a Section
* [New to dbt](#new-to-the-dbt-ecosystem-start-here-with-beginner-tutorials) - start here for beginner tutorials
* [Toolbox](#toolbox) - basic tools that make working with dbt way easier
* [Infrastructure and Deploying dbt](#infrastructure-and-deploying-dbt)
* [dbt CLI](#dbt-cli) - tips and tricks for the dbt CLI
* [General Command Line](#command-line) - useful tricks for model and yml file manipulation from the command line
* [Git Tips](#git-tips) - useful in large dbt projects
* [Jinja](#jinja) - it's surprisingly hard to learn basic Jinja concepts
* [dbt Macros and dbt Specific Jinja](#dbt-macros-and-dbt-specific-jinja) - reference for dbt Macros and Jinja functions that come with dbt
## dbt Concepts and Practices
* [How dbtLabs structures dbt projects](https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview) ([prior version](https://discourse.getdbt.com/t/how-we-structure-our-dbt-projects/355))
* [dbtLabs dbt style guide](https://github.com/fishtown-analytics/corp/blob/master/dbt_style_guide.md)
* [HowTheydbt](https://github.com/stumelius/howtheydbt) - repo of company specific dbt practice publications
* [Overview of tests in dbt (updated for 0.20)](https://datacoves.com/post/an-overview-of-testing-options-in-dbt-data-build-tool)
* [Is Kimball Modeling Still Relevant?](https://discourse.getdbt.com/t/is-kimball-dimensional-modeling-still-relevant-in-a-modern-data-warehouse/225) and [Coalesce 2020: Kimball in the Context of the Modern Data Warehouse](https://www.youtube.com/watch?v=3OcS2TMXELU&list=PL0QYlrC86xQmPf9QUceFdOarYcv3ETSsz&index=19)
* [Make changes to your dbt source on your global project](https://discourse.getdbt.com/t/did-you-know-dbt-ships-with-its-own-project/764)
* [Overriding default schema and table names, using model file names as sources for schema and table names](https://discourse.getdbt.com/t/extracting-schema-and-model-names-from-the-filename/575)
* [Creating date dimensions tables with dbt](https://discourse.getdbt.com/t/date-dimensions/735/4)
* [Examples of custom schema tests](https://discourse.getdbt.com/t/examples-of-custom-schema-tests/181/5)
* [Sessionization best practices for large event tables](https://app.slack.com/client/T0VLPD22H/C0VLZPLAE/thread/C0VLZPLAE-1620406666.325100)
* [Updating a global dictionary value with Jinja](https://getdbt.slack.com/archives/CJN7XRF1B/p1608230420178900?thread_ts=1608156760.169900&cid=CJN7XRF1B)
* [None and undefined in logical tests in Jinja](https://getdbt.slack.com/archives/C2JRRQDTL/p1622223139233800?thread_ts=1622222319.232500&cid=C2JRRQDTL)
* [Analyzing dbt project performance with artifacts](https://discourse.getdbt.com/t/analyzing-fishtowns-dbt-project-performance-with-artifacts/2214)
* [dbt course refactoring legacy SQL to dbt](https://blog.getdbt.com/sql-refactoring-course/)
* [dbt analytics engineering guide](https://www.getdbt.com/analytics-engineering/)
* [Clean up orphaned tables and views without matching current models](https://getdbt.slack.com/archives/C2JRRQDTL/p1636438321428900)
* [Use a specific warehouse to run certain models (Snowflake)](https://docs.getdbt.com/reference/resource-configs/snowflake-configs#configuring-virtual-warehouses) [h/t](https://getdbt.slack.com/archives/CJN7XRF1B/p1636547281170700)
* [Using macros to manage UDFs (user defined functions)](https://getdbt.slack.com/archives/CJN7XRF1B/p1637366753295200)
* [dbt slack thread on approaches to CI builds and testing](https://getdbt.slack.com/archives/CMZ2Q9MA9/p1637248429101700)
* [The JaffleGaggle Story: Data Modeling for a Customer 360 View](https://docs.getdbt.com/blog/customer-360-view-identity-resolution)
### New to the dbt Ecosystem? Start Here with Beginner Tutorials
* dbt - start with the [Fundamentals Course](https://courses.getdbt.com/courses/fundamentals)
* SQL - check out the [Mode SQL tutorial](https://mode.com/sql-tutorial/)
* git - try [Think Like Git](http://think-like-a-git.net/) if you prefer to learn by reading, or [Learn Git Branching](https://learngitbranching.js.org/?locale=en_US) if you prefer something visual and interactive
* Jinja - check out [these tutorials by Przemek Rogala](https://ttl255.com/jinja2-tutorial-part-1-introduction-and-variable-substitution/), who wrote the live parser I recommend
## Toolbox
* [dbt-utils](https://github.com/fishtown-analytics/dbt-utils) : Many multipurpose macros. If you're writing something complex or custom, there's probably a better way using functionality from dbt-utils
* [dbt-completion.bash](https://github.com/fishtown-analytics/dbt-completion.bash) : autocompletion for the dbt CLI [h/t](https://twitter.com/grantwinship/status/1397746292953128970)
* [dbt-codegen](https://github.com/fishtown-analytics/dbt-codegen) : macros that generate dbt code to the command line [h/t](https://twitter.com/JayPeeDevlin/status/1397743525270261766)
* [dbt-audit-helper](https://github.com/fishtown-analytics/dbt-audit-helper) : Zen and the art of data auditing. This package will change your life.
* [pre-commit-dbt](https://github.com/offbi/pre-commit-dbt) : Package of dbt pre-commit hooks that allow you to check quality of dbt project documentation, tests, etc
* [dbt-helper](https://github.com/mikekaminsky/dbt-helper) : Utility functions to compare WH to dbt, create schema files, and list dependencies
* [live Jinja Parser](https://j2live.ttl255.com/) : Useful tool for writing complex Jinja, does not include dbt-specific Jinja functions
* [yaml Checker](https://yamlchecker.com/) : yaml syntax validator
* [palm](https://github.com/palmetto/palm-cli) and [palm-dbt](https://github.com/palmetto/palm-dbt): automate dbt workflows, expedite onboarding, and containerize your dbt project
* [dagrules](https://discourse.getdbt.com/t/introducing-dagrules-a-linter-for-your-dag/3574) : A linter to enforce DAG organization conventions in your project [h/t](https://getdbt.slack.com/archives/C01NH3F2E05/p1641403576388400?thread_ts=1641403576.388400&cid=C01NH3F2E05)
* [countries-states-cities-database](https://github.com/dr5hn/countries-states-cities-database): Useful dbt seed file source for geographic data and information [h/t](https://getdbt.slack.com/archives/D01SRMHH7QB/p1644944421664699)
## Infrastructure and Deploying dbt
* [dbt-github-workflow](https://github.com/slve/dbt-github-workflow) - example CI / CD pipeline with BigQuery, GCP, and Airflow
* [Example Meltano deployment](https://github.com/mattarderne/meltano-batch) - using AWS and Terraform
* [Slim CI in Docker for BigQuery](https://medium.com/teads-engineering/setup-a-slim-ci-for-dbt-with-bigquery-and-docker-ce8e0a1a38f) [h/t](https://getdbt.slack.com/archives/C01NH3F2E05/p1639477888286000?thread_ts=1639477888.286000&cid=C01NH3F2E05)
* [dbt-artifacts](https://github.com/brooklyn-data/dbt_artifacts) - Snowflake-specific dbt package that builds dimensional models from dbt artifacts and provides macros to upload artifacts to Snowflake
* [slack thread coordinating dbt & Looker deployments](https://getdbt.slack.com/archives/C0VLZPLAE/p1643313047286400)
## dbt CLI
### Run And Test Modified and Downstream Models
`dbt run -m state:modified+ && dbt test -m state:modified+` [h/t](https://twitter.com/itswagg/status/1397758065043202049) and [dbt's caveats](https://docs.getdbt.com/reference/node-selection/state-comparison-caveats)
To list modifications since your last local run use `dbt ls -m state:modified --state ./target/`.
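The run-then-test pair above can be wrapped in a small shell function. This is a minimal sketch, not something dbt ships: the name `dbt_modified` is hypothetical, and it assumes the comparison manifest lives in a saved directory outside `target/`, since each dbt invocation rewrites the artifacts in `target/`.

```shell
# Hypothetical helper: run, then test, everything modified relative to a saved manifest.
# Usage: dbt_modified <state_dir>
#   state_dir: directory holding a manifest.json from an earlier run (assumed to be
#   a copy kept outside target/, so the run step does not overwrite it).
dbt_modified() {
  if [ -z "${1:-}" ]; then
    echo "usage: dbt_modified <state_dir>" >&2
    return 2
  fi
  state_dir="$1"
  # Tests only run if the models built successfully.
  dbt run -m state:modified+ --state "$state_dir" &&
    dbt test -m state:modified+ --state "$state_dir"
}
```

To use it, call `dbt_modified path/to/saved/artifacts` from the project root.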
### Run Downstream Children from a Specific Node
`dbt run -m model_name_here+1` runs the model and only its children one layer downstream in the DAG, or use `dbt run -m model_name_here+n`, where n is the number of layers downstream to include.
### Use Directories as Selectors
`dbt run -m staging.airtable.*` to run all models in models/staging/airtable [h/t](https://twitter.com/grantwinship/status/1397732784844779529) or `dbt run -m staging/airtable`
### List All Models Downstream
`dbt ls -m mymodel+`
### Run Models with a Specific Tag and Their Parents
`dbt run -m +tag:foo` [h/t](https://getdbt.slack.com/archives/C2JRRQDTL/p1626273465175400)
### Run a Specific Snapshot
`dbt snapshot --select order_snapshot`
### Run Multiple Snapshots
`dbt snapshot --select model1 model2 ... modeln`
### Output dbt Logs to stdout Instead of the dbt.log File
`dbt --debug run` gives something similar to the logs [h/t](https://getdbt.slack.com/archives/C2JRRQDTL/p1622833644408600?thread_ts=1622833132.408500&cid=C2JRRQDTL)
### Run Tests on Sources Only
`dbt test --models source:*` [dbt docs](https://docs.getdbt.com/docs/building-a-dbt-project/tests)
### Run Tests By Materialization
`dbt test -m config.materialized:snapshot` or `dbt test -m config.materialized:seed`. This selector is not guaranteed to be supported; see the [h/t](https://getdbt.slack.com/archives/C01UM2N7814/p1624369845111600?thread_ts=1624192939.105900&cid=C01UM2N7814) thread for the caveat.
### Run Tests by Tag After Tagging Models With a Specific Materialization
```yaml
# dbt_project.yml
...
snapshots:
  +tags: ['snapshot']
```
To run use `dbt test -m tag:snapshot` [h/t](https://getdbt.slack.com/archives/C01UM2N7814/p1624192939105900)
### Run A Macro With Arguments
`dbt run-operation my_macro --args '{"myarg1":"arrrgh", "myarg2":"aaaaaaaargh"}'`
### dbt Keyboard Shortcuts
* `Command` + `/` to block comment / uncomment yml sections
* `Command` + `/` to block comment SQL sections in dbt cloud
* `Command` + `ENTER` to preview data in dbt cloud
* Use `F1` in dbt cloud to bring up a list of dbt cloud keyboard shortcuts [h/t](https://getdbt.slack.com/archives/CMZ2V0X8V/p1616095693024200?thread_ts=1616092005.022100&cid=CMZ2V0X8V)
## dbt Materializations
### Incremental
* [Incremental model strategy performance comparisons for BigQuery](https://discourse.getdbt.com/t/benchmarking-incremental-strategies-on-bigquery/981)
* [Incrementally --full-refresh an incremental model](https://getdbt.slack.com/archives/C0VLZPLAE/p1625066877102900)
* [Adding a column to an incremental model](https://discourse.getdbt.com/t/adding-a-column-to-an-incremental-model/55)
* [Performance improving ideas for large incremental models](https://discourse.getdbt.com/t/large-incremental-models/2096)
### Snapshots
* [dbt guide to snapshots](https://blog.getdbt.com/track-data-changes-with-dbt-snapshots/)
* [Handling hard deletes in snapshots](https://discourse.getdbt.com/t/handling-hard-deletes-from-source-tables-in-snapshots/1005/2)
* [Handling late arriving records in snapshots](https://discourse.getdbt.com/t/handling-late-arriving-records-in-fivetran-synced-snapshots/1641)
* [Adding a column to a snapshot with `check_cols` strategy](https://discourse.getdbt.com/t/migrating-a-snapshot-after-adding-a-check-col/485)
* [Use dynamic schemas for snapshots](https://discourse.getdbt.com/t/using-dynamic-schemas-for-snapshots/1070)
## dbt Test Examples
* [Combine unique_where and combination_of_columns](https://getdbt.slack.com/archives/C01UM2N7814/p1639159790186900)
* [Opposite of accepted_values](https://getdbt.slack.com/archives/C01UM2N7814/p1640690640249000)
* [Store test failures in a custom schema](https://getdbt.slack.com/archives/C01UM2N7814/p1638983308171300)
## VS Code
This section is stubby for now -- but I'd welcome contributions for how to make VS Code a more effective editor for dbt work!
* [Guide to using dbt in VSCode with lots of great setup tips](https://dbt-msft.github.io/dbt-msft-docs/docs/guides/vscode_setup/)
* [vscode-dbt-power-user](https://marketplace.visualstudio.com/items?itemName=innoverio.vscode-dbt-power-user)
* dbt-power-user has commands for `Show Compiled SQL` and `Show Run SQL` [h/t](https://discourse.getdbt.com/t/how-we-set-up-our-computers-for-working-on-dbt-projects/243/18)
* [brew installed dbt `dbt is not installed` message in VS Code](https://getdbt.slack.com/archives/C2JRRQDTL/p1635524229176400)
## Command Line
### Create Multiple (Blank) Model Files Following A Pattern
`touch {prefix1,prefix2}_model.sql`
Sometimes I'll use a spreadsheet to easily generate the list of prefixes, check out the `TEXTJOIN()` function in google sheets.
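When the prefix list is long, a loop over a text file is an alternative to brace expansion. The sketch below is illustrative only: `prefixes.txt` and the `stg_*` prefixes are hypothetical, and it works in a throwaway directory so no real project is touched.

```shell
# Work in a temporary directory so nothing in a real project is modified.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# prefixes.txt is a hypothetical file with one prefix per line,
# e.g. pasted from a spreadsheet column.
printf '%s\n' stg_orders stg_customers stg_payments > prefixes.txt

# Create one blank model file per non-empty prefix.
while IFS= read -r prefix; do
  if [ -n "$prefix" ]; then
    touch "${prefix}_model.sql"
  fi
done < prefixes.txt

# List the files that were created.
ls
```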
### Find and Delete Files Matching a Pattern Including in Subdirectories
Find first using `find path/to/directories/with*ifneeded -type f -name "example_file_prefix*"` to test your search criteria.
When happy, `find path/to/directories/with*ifneeded -type f -name "example_file_prefix*" -delete` to delete. [source](https://askubuntu.com/questions/377438/how-can-i-recursively-delete-all-files-of-a-specific-extension-in-the-current-di)
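The dry-run-then-delete flow can be tried safely in a scratch directory first. This is a minimal sketch; the directory layout and the `tmp_` file prefix are made up for illustration.

```shell
# Build a throwaway directory tree with two files to delete and one to keep.
workdir="$(mktemp -d)"
mkdir -p "$workdir/models/staging" "$workdir/models/marts"
touch "$workdir/models/staging/tmp_orders.sql" \
      "$workdir/models/marts/tmp_revenue.sql" \
      "$workdir/models/marts/fct_revenue.sql"

# Dry run: print what the search criteria match, including subdirectories.
find "$workdir/models" -type f -name "tmp_*"

# Same criteria, now deleting the matches.
find "$workdir/models" -type f -name "tmp_*" -delete

# Show what is left after the delete.
find "$workdir/models" -type f
```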
### Find Specific Lines in a .yml File and Delete Them
First, test with `sed -e '/regex_goes_here/{action goes here}' path_to_file(s)_to_act_on`
An example action to delete the matching line and subsequent lines could look like `{N;N;d}`, which deletes the matching line and the two lines after it (each `N` appends one more line before `d` deletes the lot). [More syntax examples for this use case](https://stackoverflow.com/questions/4396974/sed-or-awk-delete-n-lines-following-a-pattern)
When you're happy with your output, add `-i ''` to the start of your command. This will replace inplace without creating a new file. This `sed -i '' -e '/ - name: my_model_prefix[a-z]*/{N;N;N;d;}' models/staging/*.yml` will delete the matching line and the 3 lines that follow it in the matching file(s) and line(s).
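Because the `-i` flag takes different arguments in BSD/macOS sed and GNU sed, one alternative is to write the result to a new file and move it into place. The sketch below assumes a made-up `schema.yml` in a scratch directory and reuses the pattern from the example above.

```shell
# Create a small sample yml file in a temporary directory.
workdir="$(mktemp -d)"
cat > "$workdir/schema.yml" <<'EOF'
models:
  - name: my_model_prefix_old
    description: to be removed
    columns:
      - name: id
  - name: keep_me
    description: stays
EOF

# Delete the matching line plus the 3 lines after it, writing to a new file
# instead of editing in place.
sed -e '/ - name: my_model_prefix[a-z]*/{N;N;N;d;}' \
  "$workdir/schema.yml" > "$workdir/schema.yml.new"

# Review the change (diff exits non-zero when the files differ, which is expected).
diff "$workdir/schema.yml" "$workdir/schema.yml.new" || true

# Replace the original once the output looks right.
mv "$workdir/schema.yml.new" "$workdir/schema.yml"
```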
### Shell Config for dbtmrt
`function dbtmrt () { dbt run -m $1 && dbt test -m $1; }` for config
`dbtmrt model_name` to use [h/t](https://twitter.com/grantwinship/status/1397731624574492676)
### Two-Terminal Docs
1. run `dbt docs generate`
2. open a new terminal window to run `dbt docs serve`
3. just always have the docs open in the background - re-run `dbt docs generate` in the first terminal and refresh if needed
[h/t](https://twitter.com/clairebcarroll/status/1397734400134164480)
### Shell Script Examples for zsh
```zsh
# Functions
function dbtr() {
  dbt run -m $1 --fail-fast
  say done
}
function dbtrt() {
  dbt run -m $1 && dbt test -m $1
  say done
}
```
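A slightly more defensive variant of the run-and-test function might look like the sketch below. It is an illustration rather than a drop-in replacement: the `notify_done` helper is hypothetical, `say` exists only on macOS, and the function is written in portable `name() { ... }` form so it also loads in bash.

```shell
# Announce completion: use macOS `say` when available, otherwise print.
notify_done() {
  if command -v say >/dev/null 2>&1; then
    say done
  else
    echo "done"
  fi
}

# Run, then test, one selector. The selector is quoted so spaces or globs
# are passed through untouched, and the dbt exit status is preserved.
dbtrt() {
  if [ -z "${1:-}" ]; then
    echo "usage: dbtrt <selector>" >&2
    return 2
  fi
  dbt run -m "$1" && dbt test -m "$1"
  dbt_status=$?
  notify_done
  return "$dbt_status"
}
```

Returning the saved status means a failed run or test is still visible to anything that chains on `dbtrt`, even though the notifier runs afterwards.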
### Shell Script to Compile Audit Helper
Note that the below has only been tested on a mac, and requires you to create a SQL file called `audit_helper_template` in the `analysis` folder of your dbt project.
```zsh
function dbtah() {
  # substitute the model name from the argument
  gsed -i "s/model_to_audit/$1/" analysis/audit_helper_template.sql
  # enable the audit_helper_template
  gsed -i 's/enabled = false/enabled = true/' analysis/audit_helper_template.sql
  # compile
  dbt compile -m audit_helper_template
  cat target/compiled/*/analysis/audit_helper_template.sql | awk NF | pbcopy
  # modify the template back to the defaults
  gsed -i 's/enabled = true/enabled = false/' analysis/audit_helper_template.sql
  gsed -i "s/$1/model_to_audit/" analysis/audit_helper_template.sql
  say copy pasta
}
```
```SQL
-- The model needs to be disabled so it will be ignored while in typical compilation
-- This is required because dbt won't find a node named 'model_to_audit'
-- Substitute the correct production schema and database for your environment
{{
    config(
        enabled = false,
    )
}}
{%- set audit_model = "model_to_audit" -%}
{%- set prod_schema = "prod_schema_name" -%}
{%- set dbt_database = "prod_database_name" -%}
{%- set dbt_relation = ref(audit_model) -%}
{%- set old_etl_relation=adapter.get_relation(
    database=dbt_database,
    schema=prod_schema,
    identifier=audit_model
) -%}
{# Generate the audit query - update primary key as needed #}
{{ audit_helper.compare_relations(
    a_relation=old_etl_relation,
    b_relation=dbt_relation,
    primary_key="primary_key_update_before_running"
) }}
```
To run use `dbtah my_target_model`. Template heavily inspired by a tool created by [Lewis Davies](https://github.com/LewisDavies)
## Git Tips
### git mv to reorganize your project without losing history
If you need to move files from one directory to another, cd into the parent of the directories where the files are located, then run `git mv child_dir1/model_name child_dir2/with_nested_dirs_if_needed`. To make sure it worked, run `git status` afterwards and you should see your pending changes with `renamed:` ahead of the file name. Commit as usual, keep rocking with your history!
You can find lots of arguments about whether or not this *really* keeps the history on stack overflow and elsewhere around the internet. I've found that using this method preserves commit history, but ymmv.
### Rename a file without losing history
`git mv old_file_name.sql new_file_name.sql` for example. Same comment about but-does-it-really from above applies.
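To check the history question for yourself, a scratch repository is enough. The sketch below is illustrative only: the file names and commit messages are made up, and the `g` wrapper just supplies a commit identity so the example does not depend on global git config.

```shell
# Create a throwaway repository.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .

# Hypothetical wrapper: supplies an identity and disables signing for this demo only.
g() {
  git -c user.name="Example" -c user.email="example@example.com" \
      -c commit.gpgsign=false "$@"
}

# Commit a model in its original location.
mkdir -p models/staging models/marts
echo "select 1 as id" > models/staging/orders.sql
g add .
g commit -q -m "add orders model"

# Move and rename in one step, then check that git staged it as a rename.
git mv models/staging/orders.sql models/marts/fct_orders.sql
git status --short
g commit -q -m "move orders to marts"

# --follow walks the file's history across the rename.
git log --follow --oneline -- models/marts/fct_orders.sql
```

Without `--follow`, `git log -- models/marts/fct_orders.sql` only lists commits that touched the new path, which is one source of the "does it really keep history" debate.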
### Clean up branches and references to remote branches that have been merged
Remove local branches with `git branch -d my-merged-branch-name` for merged branches or `git branch -D unmerged-local-branch` for unmerged branches. Run `git remote show origin` to list the branches known locally and on the origin. Running `git fetch --prune` then deletes stale remote-tracking references (names like `refs/remotes/origin/branch-name`, which are local snapshots of remote branches) whose branch no longer exists on the remote. [source](https://stackoverflow.com/questions/20106712/what-are-the-differences-between-git-remote-prune-git-prune-git-fetch-prune/20107184#20107184)
### Update your feature branch with new changes from main
You've been working along, and now git says your branch is out of date with the main branch. To fix this, swap over to the main branch with `git checkout main` and then get any new changes with `git pull --rebase`. Cool, now you're good locally, so switch over to your branch again with `git checkout feature` and use `git merge main` to update your branch.
### I made a bunch of changes on a branch and now my changes are failing
Assuming your branch has not been pushed up to the remote repo, you can get around this! First, create a temporary branch to save your work `git checkout -b temp_save` and then switch back to your development branch `git checkout my-feat`. Next, reset your branch and your working tree to the last commit you are confident in with `git reset --hard 44e447a1`. What this will do is roll your branch and your working tree all the way back to the commit `44e447a1`. Next, you can apply the commits from your saved branch to the reset branch to 'walk forward' in time one by one until you find the problem that was introduced. To apply a commit from one branch to another, use `git cherry-pick 1b132878`. This will create a totally new commit on your branch with the changes from the commit `1b132878`. It's not a good idea to do this if you've already pushed your branch, since you're then rewriting history and potentially others' commits.
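The save, reset, and cherry-pick sequence can be rehearsed in a scratch repository before trying it on real work. This is a sketch under made-up assumptions: three trivial commits stand in for your changes, and commit hashes are looked up with `git rev-parse` instead of being typed by hand.

```shell
# Create a throwaway repository.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .

# Hypothetical wrapper: supplies an identity and disables signing for this demo only.
g() {
  git -c user.name="Example" -c user.email="example@example.com" \
      -c commit.gpgsign=false "$@"
}

# One known-good commit, then a feature branch with two more commits.
echo "change 1" > file1.txt
g add file1.txt
g commit -q -m "change 1"
git checkout -q -b my-feat
for n in 2 3; do
  echo "change $n" > "file$n.txt"
  g add "file$n.txt"
  g commit -q -m "change $n"
done

good="$(git rev-parse HEAD~2)"   # last commit we trust ("change 1")
next="$(git rev-parse HEAD~1)"   # first commit to replay ("change 2")

# Save the work, then roll the feature branch back to the trusted commit.
git checkout -q -b temp_save
git checkout -q my-feat
git reset -q --hard "$good"

# Walk forward one commit at a time; test after each cherry-pick.
g cherry-pick "$next" >/dev/null
git log --oneline
```

At this point `my-feat` holds "change 1" and "change 2", while `temp_save` still holds all three commits, so nothing is lost if the replay goes wrong.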
## Jinja
A live parser is an incredibly useful tool while you're learning and writing jinja. I like [this one](https://j2live.ttl255.com/)
### Jinja Delimiters
* A statement looks like `{% ... %}`
* An expression looks like `{{ ... }}`
* A comment looks like `{# ... #}` [h/t](http://courses.getdbt.com)
* To escape a sequence you can use `{{ '{{ my_escaped_var }}' }}`
* To escape longer blocks of code you can use `{% raw %} {% endraw %}` [Jinja2 docs](https://jinja.palletsprojects.com/en/3.0.x/templates/?highlight=endraw#escaping)
### Comments that Don't Show in Compiled SQL
`{# You won't see me #}` vs `--I will show up all day` vs `{# /* I won't show up AND your linter won't freak out */ #}` [dbt discourse](https://discourse.getdbt.com/t/removing-jinja-comments-from-compiled-sql/80)
### Declaring Variables
`{% set my_list = ['one','two','three'] %}` declares a list. Lists are zero-indexed, so `{{ my_list[1] }}` will return `two`.
```Jinja2
{% set my_dict = {
    'key1': 'value1',
    'key2': 'value2'
} %}
```
A general best practice is to do this at the top of your file.
### If Else Blocks
```Jinja2
{% set condition = True %}
{% if condition %}
what to execute when true
{% else %}
what to execute otherwise
{% endif %}
```
### For Loops
```Jinja2
{% set my_list = ['one','two','three'] %}
{% for item in my_list %}
{{item}}
{% endfor %}
```
## dbt Macros and dbt Specific Jinja
### When to Use if execute in Macros
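[When your macro runs queries whose results are only available at run time. During the parse phase `execute` is false, so wrapping that logic in `if execute` keeps it from failing there.](https://docs.getdbt.com/reference/dbt-jinja-functions/execute)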
### Macros Calling Other Macros
[Example of a Macro Calling Another Macro](https://github.com/fishtown-analytics/dbt/issues/469)
### For Loops for Select Statements
For loops for select statements need special handling to prevent a comma after the last item in the select. The hyphens are used to trim the whitespace for better looking compiled SQL.
```SQL
SELECT
{% for item in list_of_items -%}
{{item}}
{%- if not loop.last -%}
,
{%- endif %}
{% endfor -%}
```
### Macros with Parameters
```Jinja2
{% macro my_macro(required_parameter, optional_parameter = default_value) %}
...
{% endmacro %}
```
### adapter - a dbt Jinja Function
[adapter](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter) performs many useful functions and adapts them to the specific database context you're using
* [dispatch](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter#dispatch) - database specific versions of macros
* [get_missing_columns](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter#get_missing_columns) - find columns existing in one relation but missing in another, return as a list - can identify new columns in sources
* [expand_target_column_types](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter#expand_target_column_types) - to make one relation match another, some limitations
* [get_relation](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter#get_relation) - returns a relation object from the database from the provided database, schema, identifier
* [get_columns_in_relation](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter#get_columns_in_relation) - returns a list of columns in a relation
* [create / drop schema or relation](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter#create_schema) - methods for creating, dropping, or renaming schemas or relations in the database