https://github.com/AxelThevenot/dbt-assertions
Package to assert rows in-line with dbt macros.
- Host: GitHub
- URL: https://github.com/AxelThevenot/dbt-assertions
- Owner: AxelThevenot
- License: apache-2.0
- Created: 2024-01-15T16:45:21.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-02-27T13:10:18.000Z (9 months ago)
- Last Synced: 2024-07-29T13:33:38.771Z (3 months ago)
- Size: 854 KB
- Stars: 57
- Watchers: 2
- Forks: 5
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-dbt - dbt-assertions - Package to assert rows in-line with dbt macros. (Packages)
README
![dbt-assertions-logo.png](./img/dbt-assertions-logo.png)
## Features

**Robust Data Quality Checks**

`dbt-assertions` ensures thorough data quality assessments at the row level,
enhancing the reliability of downstream models.

**Efficient Exception Detection**

Granular row-by-row exception detection identifies and
flags the specific rows that fail assertions, streamlining the resolution process.

**Customizable Assertions & Easy Integration**

The easy-to-use macros `assertions()` and `assertions_filter()` let users
customize data quality checks directly within the model YAML definition,
adapting to specific data validation needs.

**An Easy Shift from your Current Workflows**

The generic test `generic_assertions()` lets you run dbt tests as usual,
so you can try the package without compromising your current workflows.

**You can try the package with this generic test without having to rebuild your table.**

## Content
- [Features](#features)
- [Content](#content)
- [Install](#install)
- [Supported databases](#supported-databases)
- [Dependencies](#dependencies)
- [Variables](#variables)
- [Basic Example](#basic-example)
- [Documentation](#documentation)
  - [Macros](#macros)
    - [assertions](#assertions)
    - [assertions\_filter](#assertions_filter)
  - [Tests](#tests)
    - [generic\_assertions](#generic_assertions)
  - [Model definition](#model-definition)
    - [Yaml general definition](#yaml-general-definition)
    - [Custom assertions](#custom-assertions)
    - [`null_as_exception`](#null_as_exception)
    - [`__unique__` helper](#__unique__-helper)
    - [`__not_null__` helper](#__not_null__-helper)
    - [Custom column name](#custom-column-name)
- [Contribution](#contribution)
- [Acknowledgments](#acknowledgments)
- [Contact](#contact)

## Install
`dbt-assertions` currently supports `dbt 1.7.x` or higher.
Check [dbt package hub](https://hub.getdbt.com/calogica/dbt_expectations/latest/)
for the latest installation instructions,
or [read the docs](https://docs.getdbt.com/docs/package-management)
for more information on installing packages.

Include in `packages.yml`:

```yaml
packages:
  - package: AxelThevenot/dbt_assertions
    version: [">=1.0.0", "<2.0.0"]
    # for the latest version tag
```

## Supported databases
|**Database**|[assertions](#assertions)|[assertions_filter](#assertions_filter)|[\_\_unique__](#__unique__-helper)|[\_\_not_null__](#__not_null__-helper)|[generic_assertions](#generic_assertions)|
|:---:|:---:|:---:|:---:|:---:|:---:|
|BigQuery (default)|✅|✅|✅|✅|✅|
|Snowflake|✅|✅|✅ (not nested fields)|✅ (not nested fields)|✅|
|DuckDB|✅|✅|✅ (not nested fields)|✅ (not nested fields)|✅|
|Others|ℹ️|ℹ️|ℹ️|ℹ️|ℹ️|

- ✅: supported
- ☑️: supported, not tested
- ❌: not supported
- ℹ️: open to contributions ❤️ (see the [integration_tests](./integration_tests/) folder on how to test your adapter)

For the latest release, see [https://github.com/AxelThevenot/dbt-assertions/releases](https://github.com/AxelThevenot/dbt-assertions/releases/latest)
## Dependencies
This package has no dependencies.
## Variables
The following variable **can** be defined in your `dbt_project.yml` file to change the default `exceptions` column name used by the package.
```yml
vars:
  "dbt_assertions:default_column": "exceptions"
```

## Basic Example
Check the [basic_example](examples/basic_example) example.
## Documentation
### Macros
#### [assertions](macros/assertions.sql)
`assertions()` macro generates a select expression for row-level assertions.
**Arguments:**
- **column (optional[str]):** column to read the assertions from.

---

This macro parses the schema model YAML to extract row-level assertions:
[custom assertions](#custom-assertions),
[unique](#__unique__-helper),
and [not-null](#__not_null__-helper).
It then constructs an array of exceptions for each row based on its assertion results.

By default, it generates assertions based on
your [YAML model definition](#model-definition),
reading the configuration for a column named `exceptions`.

You can call the macro with the `column` argument to change this default column.
```sql
SELECT
    *,
    {{ dbt_assertions.assertions(column='warnings') }}
FROM {{ ref('my_model') }}
```

**Note:** this macro generates assertions based on your query result,
so it must be called at the end of the query.

```sql
WITH
    [...] -- Other CTEs
    final AS (
        SELECT
            [...]
        FROM {{ ref('my_model') }}
    )

-- After query results
SELECT
    *,
    {{ dbt_assertions.assertions() }}
FROM final
```

#### [assertions_filter](macros/assertions_filter.sql)
`assertions_filter()` macro generates an expression to filter rows based on
the assertion results generated with the [`assertions()`](#assertions) macro.

**Arguments:**
- **column (optional[str]):** Column to read the exceptions from.
- **exclude_list (optional[list[str]]):** Assertions to exclude in the filter.
- **include_list (optional[list[str]]):** Assertions to include in the filter.
- **reverse (optional[bool]):** returns rows without exception when `reverse=false`,
  and rows with exceptions when `reverse=true`.

---
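As a rough mental model of the arguments above, the filter can be thought of as a per-row predicate over the exceptions array. The Python sketch below is only one reading of the argument descriptions, not the package's SQL: the function name `keeps_row` is hypothetical, and the assumption is that `include_list` restricts the check to the listed assertions while `exclude_list` ignores the listed ones.

```python
from typing import Iterable, List, Optional


def keeps_row(
    exceptions: Iterable[str],
    include_list: Optional[List[str]] = None,
    exclude_list: Optional[List[str]] = None,
    reverse: bool = False,
) -> bool:
    """Conceptual model of the filter for one row (hypothetical helper)."""
    if include_list is not None and exclude_list is not None:
        raise ValueError("pass include_list or exclude_list, not both")

    if include_list is not None:
        # Assumption: only the listed assertions count as exceptions.
        relevant = [e for e in exceptions if e in include_list]
    elif exclude_list is not None:
        # Assumption: the listed assertions are ignored.
        relevant = [e for e in exceptions if e not in exclude_list]
    else:
        relevant = list(exceptions)

    has_exception = len(relevant) > 0
    # reverse=False keeps clean rows, reverse=True keeps rows with exceptions.
    return has_exception if reverse else not has_exception


print(keeps_row([]))                                    # True
print(keeps_row(["unique"]))                            # False
print(keeps_row(["unique"], exclude_list=["unique"]))   # True
print(keeps_row(["unique"], reverse=True))              # True
```

Under this reading, a row flagged only by an excluded assertion still passes the filter, which is how a "warning" assertion could be kept out of a blocking filter.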
By default, each row with exception(s) will be filtered out.

```sql
SELECT
    *
FROM {{ ref('my_model') }}
WHERE {{ dbt_assertions.assertions_filter() }}
```

You can change this default behaviour by specifying an
optional `exclude_list` or `include_list` argument (not both).

```sql
SELECT
    *
FROM {{ ref('my_model') }}
WHERE {{ dbt_assertions.assertions_filter(exclude_list=['assertions_id']) }}
```

### Tests
#### [generic_assertions](tests/generic/generic_assertions.sql)
Generates a test to get rows based on exceptions.
It returns the rows without any exception by default.
You can change this default behaviour by specifying an `exclude_list` or `include_list` (not both).

You must define the assertions for the model beforehand. [More on YAML definition for assertions](#yaml-general-definition).

**Arguments:**
- **column (optional[str]):** Column to read the exceptions from.
- **exclude_list (optional[list[str]]):** Assertions to exclude in the filter.
- **include_list (optional[list[str]]):** Assertions to include in the filter.
- **re_assert (optional[bool]):** set to `true` if your assertion field
  is not calculated in your table.

Configure the generic test in schema.yml with:
```yml
model:
  name: my_model
  tests:
    - dbt_assertions.generic_assertions:
        [column: ]
        [exclude_list: ]
        [include_list: ]
        [re_assert: true | false]

  columns:
    ...
```

`[]` represents optional parts. Yes, everything is optional, but let's see it through examples.

In the [basic test example](./examples/basic_test_example/),
you can create your test as follows, then run your `dbt test` command.

```yml
models:
  - name: basic_test_example_d_site
    tests:
      - dbt_assertions.generic_assertions:
          column: exceptions
          include_list:
            - site_id_is_not_null
          # `re_assert: true` to use only if your assertion's column
          # is not computed and saved in your table.
          re_assert: true

    columns:
      ...
```

### Model definition
#### Yaml general definition
The assertions definition **must** be created
**under a column definition of your model** and follows the structure below.

```yml
assertions:
  [__unique__: ]
  [__not_null__: __unique__ | ]

  [:
    description: []
    expression:
    null_as_exception: []]
  ...
```

`[]` represents optional parts.
Yes, everything is optional, but let's see it through examples.

#### Custom assertions
Custom assertions are the basic assertions.

> The package is made to support any assertion as long as
> it is supported in a SELECT statement of your underlying database.
> **So you can do a lot of things**.

They are represented as key-value pairs. Keys are the IDs of the assertions.
Each assertion is defined by at least an `expression`, which is rendered
and evaluated as your test.

`description` and [`null_as_exception`](#null_as_exception) are optional.

```yml
assertions:
  unique:
    description: "Row must be unique."
    expression: "1 = COUNT(1) OVER(PARTITION by my_id)"

  site_creation_date_is_past:
    description: "Site must be created in the past."
    expression: "site_creation_date <= CURRENT_DATE()"

  site_type_pick_list:
    description: "The site type must be in its known picklist."
    expression: |
      site_type IN (
        'store',
        'ecommerce',
        'drive',
        'pickup'
      )
```

#### `null_as_exception`
`null_as_exception` is an optional configuration for your assertion.
It defaults to `false` and determines whether a row is reported as an exception when your expression evaluates to `NULL`.

The default is `false` because one assertion should evaluate one thing.
Prefer using the [`__not_null__`](#__not_null__-helper) helper instead.

In our previous example, if we also want to reject `NULL` site types:

```yml
assertions:
  ...

  site_type_pick_list:
    description: "The site type must be in its known picklist."
    expression: |
      site_type IN (
        'store',
        'ecommerce',
        'drive',
        'pickup'
      )
    null_as_exception: true
```

#### `__unique__` helper
As guaranteeing the uniqueness of rows is a concern in most use cases,
the `__unique__` helper is here to avoid writing complex and repetitive queries.

---

```yml
assertions:
  unique:
    description: "Row must be unique."
    expression: "1 = COUNT(1) OVER(PARTITION by key_1, key_2)"
```

The above configuration is the same as writing

```yml
assertions:
  __unique__:
    - key_1
    - key_2
```

---
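The equivalence above rests on a window count: a row passes when it is the only row sharing its key values. A minimal Python sketch of that check, assuming rows are plain dictionaries (the function name `unique_assertion` is hypothetical and not part of the package, which generates SQL instead):

```python
from collections import Counter


def unique_assertion(rows, keys):
    """Per row, mimic `1 = COUNT(1) OVER (PARTITION BY <keys>)`."""
    # Count how many rows share each combination of key values.
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    # A row satisfies the assertion when its combination appears exactly once.
    return [counts[tuple(row[k] for k in keys)] == 1 for row in rows]


rows = [
    {"key_1": "a", "key_2": 1},
    {"key_1": "a", "key_2": 1},  # duplicate of the first row on (key_1, key_2)
    {"key_1": "b", "key_2": 1},
]
print(unique_assertion(rows, ["key_1", "key_2"]))  # [False, False, True]
```

Both duplicated rows fail, not only the second one, which matches the behaviour of a window count in SQL.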
You can also verify unique keys for nested/repeated structures. It will generate:
- One assertion for the 0-depth guaranteeing uniqueness **across the rows**.
- One assertion **for each** repeated field guaranteeing uniqueness **within the row**.

The following example will generate the assertions:
- `unique`: Row must be unique over the unique keys.
- `nested_1__unique`: Items must be unique **within** nested_1 in the row.
- `nested_1.nested_2__unique`: Items must be unique **within** nested_1.nested_2 in the row.

```yml
assertions:
  __unique__:
    - key_1
    - key_2
    - nested_1:
        - key_3
        - key_4
        - nested_2:
            - key_5
            - key_6
```

#### `__not_null__` helper
As guaranteeing not-null values is also a concern in most use cases,
the `__not_null__` helper is here to avoid writing complex and repetitive queries.

---

```yml
assertions:
  key_1__not_null:
    description: "key_1 is not null."
    expression: "key_1 IS NOT NULL"

  key_2__not_null:
    description: "key_2 is not null."
    expression: "key_2 IS NOT NULL"
```

The above configuration is the same as writing

```yml
assertions:
  __unique__:
    - key_1
    - key_2

  __not_null__:
    - key_1
    - key_2
```

And as the two helpers are often linked,
you can rewrite the assertions as follows, which is also the same.

```yml
assertions:
  __unique__:
    - key_1
    - key_2

  __not_null__: __unique__
```

---
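The equivalences above can be read as a simple expansion from helper keys to custom assertions. The Python sketch below illustrates that reading only: `expand_not_null` is a hypothetical name, it handles flat column names and skips nested entries, and the package itself does this in Jinja/SQL, so details may differ.

```python
def expand_not_null(assertions: dict) -> dict:
    """Expand a `__not_null__` entry into equivalent custom assertions."""
    keys = assertions.get("__not_null__", [])
    # `__not_null__: __unique__` reuses the keys listed under `__unique__`.
    if keys == "__unique__":
        keys = assertions.get("__unique__", [])
    return {
        f"{key}__not_null": {
            "description": f"{key} is not null.",
            "expression": f"{key} IS NOT NULL",
        }
        for key in keys
        if isinstance(key, str)  # nested entries are ignored in this sketch
    }


config = {"__unique__": ["key_1", "key_2"], "__not_null__": "__unique__"}
for assertion_id, definition in expand_not_null(config).items():
    print(assertion_id, "->", definition["expression"])
# key_1__not_null -> key_1 IS NOT NULL
# key_2__not_null -> key_2 IS NOT NULL
```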
You can also verify not-null keys for nested/repeated structures. It will generate:
- One assertion **for each column** of the 0-depth guaranteeing not null.
- One assertion **for each column** under the repeated field guaranteeing
  **all the values are not null within the row**.

The following example will generate the assertions:
- `key_1__not_null`: key_1 is not null.
- `key_2__not_null`: key_2 is not null.
- `nested_1.key_3__not_null`: nested_1.key_3 **are** not null.

```yml
assertions:
  __not_null__:
    - key_1
    - key_2
    - nested_1:
        - key_3
```

#### Custom column name
If `exceptions` is not a column name you like,
you can choose another column name and the macro will
still work with the `column` argument.

You can also work with multiple columns.

```yml
model:
  name: my_model
  columns:
    ...
    - errors:
        assertions:
          __unique__:
            - key_1
            - key_2

          __not_null__: __unique__
    - warns:
        assertions:
          site_creation_date_is_past:
            description: "Site must be created in the past."
            expression: "site_creation_date <= CURRENT_DATE()"
```

And in your model query:
```sql
WITH final AS
(
    SELECT ...
)
SELECT
    *,
    {{ dbt_assertions.assertions(column='errors') }},
    {{ dbt_assertions.assertions(column='warns') }}
FROM final
```

## Contribution
If you want to contribute, please open a Pull Request or an Issue on this repo.
Feel free to reach me on [LinkedIn](https://www.linkedin.com/in/axel-thevenot/).

## Acknowledgments

Special thanks to
- [Victor Vaneecloo](https://www.linkedin.com/in/victorvaneecloo/) for the generic test
- [Guillaume Blaquiere](https://www.linkedin.com/in/guillaume-blaquiere-38693b15/) for his help in terminology
- [Daniel Bartley](https://www.linkedin.com/in/dbrtly/) for his help in terminology
- [Benoit Perigaud](https://www.linkedin.com/in/benoit-perigaud/) for the Snowflake support

## Contact

If you have any questions, please open a new Issue or
feel free to reach out on [LinkedIn](https://www.linkedin.com/in/axel-thevenot/).

---
Happy coding with **dbt-assertions**!