https://github.com/kgmcquate/dbt-testgen

Generate DBT tests based on sample data
dbt dbt-packages sql testing-tools


# dbt-testgen

- [dbt-testgen](#dbt-testgen)
- [About](#about)
- [Install](#install)
- [Supported Databases](#supported-databases)
- [Test types](#test-types)
- [Usage](#usage)
- [Macros](#macros)
  - [get\_test\_suggestions](#get_test_suggestions)

# About
`dbt-testgen` is a [dbt](https://github.com/dbt-labs/dbt) package that autogenerates dbt test YAML based on real data.

Code documentation is available [here](https://kgmcquate.github.io/dbt-testgen/).

Inspired by [dbt-codegen](https://github.com/dbt-labs/dbt-codegen) and [deequ Constraint Suggestion](https://github.com/awslabs/deequ/blob/master/src/main/scala/com/amazon/deequ/examples/constraint_suggestion_example.md).

# Install
`dbt-testgen` currently supports `dbt 1.2.x` or higher.

Include in `packages.yml`:
```yaml
packages:
- git: https://github.com/kgmcquate/dbt-testgen
```

# Supported Databases
The following databases are supported:
- Snowflake
- Databricks
- Redshift
- BigQuery
- Postgres
- DuckDB

Integration tests are run for each of these databases in [Actions](https://github.com/kgmcquate/dbt-testgen/actions).

# Test types
dbt-testgen can generate these types of tests, using [built-in tests](https://docs.getdbt.com/reference/resource-properties/data-tests), [dbt_utils](https://github.com/dbt-labs/dbt-utils), and [dbt-expectations](https://github.com/calogica/dbt-expectations/):
- [uniqueness](https://github.com/dbt-labs/dbt-utils?tab=readme-ov-file#unique_combination_of_columns-source)
- [not_null](https://docs.getdbt.com/reference/resource-properties/data-tests#not_null)
- [string length](https://github.com/calogica/dbt-expectations/tree/main?tab=readme-ov-file#expect_column_value_lengths_to_be_between)
- [range](https://github.com/dbt-labs/dbt-utils?tab=readme-ov-file#accepted_range-source)
- [accepted_values](https://docs.getdbt.com/reference/resource-properties/data-tests#accepted_values)
- [recency](https://github.com/dbt-labs/dbt-utils?tab=readme-ov-file#recency-source)

# Usage
The dbt properties YAML is generated by a Jinja macro, `get_test_suggestions`, which you can run like this:
```bash
dbt compile -q --inline "{{ testgen.get_test_suggestions(ref('mymodel')) }}"
```
Output:
```yaml
models:
  - name: mymodel
    tests:
      - dbt_utils.recency:
          field: day
          datepart: day
          interval: 2
    columns:
      - name: user_id
        description: Numeric range test generated by dbt-testgen
        tests:
          - unique
          - not_null
          - dbt_utils.accepted_range:
              min_value: 1
              max_value: 30
      - name: username
        tests:
          - unique
          - not_null
          - dbt_expectations.expect_column_value_lengths_to_be_between:
              min_value: 8
              max_value: 15
              row_condition: '"username" is not null'
      - name: email
        tests:
          - unique
          - not_null
          - dbt_expectations.expect_column_value_lengths_to_be_between:
              min_value: 18
              max_value: 25
              row_condition: '"email" is not null'
      - name: user_status
        tests:
          - accepted_values:
              values:
                - active
                - inactive
          - dbt_expectations.expect_column_value_lengths_to_be_between:
              min_value: 6
              max_value: 8
              row_condition: '"user_status" is not null'
      - name: age
        tests:
          - dbt_utils.accepted_range:
              min_value: 22
              max_value: 35
```




You can append the output to a file like this:
```bash
dbt compile -q --inline "{{ testgen.get_test_suggestions(ref('mymodel')) }}" >> models/schema.yml
```
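Because `>>` appends, running the command twice against the same file would leave two top-level `models:` keys in it. As a minimal sketch of one way around that, the helper below writes each model's suggestions to its own file and overwrites it on every run. The function name `write_test_suggestions` and the `<model>_tests.yml` naming are hypothetical and not part of `dbt-testgen`; the sketch also assumes the model is not already declared in another properties file.

```bash
# Hypothetical helper (not part of dbt-testgen): generate suggestions for
# each named model and write them to <out_dir>/<model>_tests.yml,
# overwriting any previous run instead of appending.
write_test_suggestions() {
  out_dir="$1"
  shift
  mkdir -p "$out_dir" || return 1
  for model in "$@"; do
    # Same command as above, with the model name substituted in.
    dbt compile -q --inline "{{ testgen.get_test_suggestions(ref('${model}')) }}" \
      > "${out_dir}/${model}_tests.yml" || return 1
  done
}
```

For example, `write_test_suggestions models/generated mymodel users` would produce `models/generated/mymodel_tests.yml` and `models/generated/users_tests.yml`.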




You can also merge with an existing properties YAML file:
```bash
EXISTING_YAML_BODY=$(cat models/schema.yml)
dbt compile -q --inline "{{ testgen.get_test_suggestions(ref('users'), dbt_config=fromyaml(\"${EXISTING_YAML_BODY}\")) }}"
```
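The existing YAML is pasted into a double-quoted Jinja string here, so a file that itself contains double quotes (such as the `row_condition` values in the generated output) would end the string early. The sketch below escapes backslashes and double quotes before embedding the file. It assumes Jinja string literals honor backslash escapes, and the helper names `escape_for_jinja` and `merge_test_suggestions` are hypothetical, not part of `dbt-testgen`.

```bash
# Hypothetical helper: escape backslashes and double quotes so the text
# can sit inside a double-quoted Jinja string literal.
escape_for_jinja() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: merge suggestions for a model ($1) with an
# existing properties file ($2), using the same macro call as above.
merge_test_suggestions() {
  [ -f "$2" ] || { echo "no such file: $2" >&2; return 1; }
  existing=$(escape_for_jinja "$(cat "$2")")
  dbt compile -q --inline "{{ testgen.get_test_suggestions(ref('$1'), dbt_config=fromyaml(\"${existing}\")) }}"
}
```

It can be called as `merge_test_suggestions users models/schema.yml`; the merged YAML is printed to standard output, as with the command above.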




Here's an example of more advanced usage:
```bash
EXISTING_YAML_BODY=$(cat <