https://github.com/llucmh/dbt-checks

Reusable, declarative data quality checks for dbt projects.
**`dbt-checks`** is a lightweight library of reusable data quality checks for dbt projects.

It provides simple, expressive tests to validate business rules and data integrity directly in your models — without writing custom SQL every time.

> ⚠️ Early-stage project — feedback and contributions are welcome.

---

# Installation

Add the package to your `packages.yml`:

```yaml
packages:
  - git: https://github.com/LlucMH/dbt-checks.git
    revision: v0.3.2
```

Then install dependencies:

``` bash
dbt deps
```

💡 Always pin a version in production projects.

# Usage

Checks can be added directly to models or columns in your schema files.

Example:

``` yaml
models:
  - name: orders
    columns:
      - name: value
        data_tests:
          - dbt_checks.non_negative
          - dbt_checks.between_values:
              arguments:
                min_value: 0
                max_value: 10000
```

Run tests as usual:

``` bash
dbt test
```

# Scoped Checks with `where`

All checks support an optional `where` argument to apply validations only to a subset of rows.

This is useful when you want to validate specific business segments, statuses, partitions, or recent data.

Example:

```yaml
models:
  - name: orders
    columns:
      - name: value
        data_tests:
          - dbt_checks.greater_than:
              arguments:
                value: 0
                where: "status = 'active'"
```

The `where` expression is applied before the check runs.
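
As a further illustration, the same argument can scope a model-level check to recent rows. This is only a sketch: the `order_date` column and the cutoff date are hypothetical, and the date-literal syntax may need adjusting for your warehouse.

```yaml
models:
  - name: orders
    data_tests:
      # Row-count threshold of 100, counting only rows on or after the cutoff.
      - dbt_checks.row_count_greater_than:
          arguments:
            value: 100
            # Hypothetical column and cutoff; adapt to your own schema.
            where: "order_date >= '2024-01-01'"
```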

# Standardized Failure Output

dbt-checks provides standardized and human-readable failure outputs designed for easier debugging and CI visibility.

Instead of generic outputs like:

```text
Got 1 result, configured to fail if != 0
```

checks now expose contextual failure information.

## Row-level checks

Example output:

| failing_value | expected_min_value | failed_check | failure_reason |
| --- | --- | --- | --- |
| -5 | 0 | non_negative | Value must be greater than or equal to 0 |

Used by:
- numeric checks
- string checks
- most temporal checks

---

## Aggregation checks

Example output:

| actual_value | expected_min_value | expected_max_value |
| --- | --- | --- |
| 1500 | 0 | 1000 |

Used by:
- avg_between
- sum_between
- min_between
- max_between
- row_count_between

---

## Ratio checks

Example output:

| actual_ratio | expected_min_ratio | expected_max_ratio |
| --- | --- | --- |
| 0.92 | 0.0 | 0.80 |

Used by:
- null_ratio_between
- positive_ratio_between
- negative_ratio_between
- value_ratio_between

---

## Additional Context

Checks may also expose:

- `failed_check`
- `failure_reason`
- `applied_condition`
- `actual_length`
- `actual_diff_days`
- `actual_day_of_week`

This makes dbt-checks outputs easier to:
- debug in CI
- inspect in stored failures
- integrate with observability tooling
- consume programmatically
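
To keep these columns around for inspection, dbt's own `store_failures` test config can be enabled on a check. The following is a minimal sketch, assuming a dbt version that accepts the `data_tests` key and a `config` block on generic tests. `store_failures` and `severity` are standard dbt test configs, not part of dbt-checks.

```yaml
models:
  - name: orders
    columns:
      - name: value
        data_tests:
          - dbt_checks.non_negative:
              config:
                # Standard dbt config: persist the failing rows to a table.
                store_failures: true
                # Standard dbt config: report failures as warnings, not errors.
                severity: warn
```

With this in place, dbt saves the rows returned by the test query to a table in the warehouse, so the context columns listed above can be queried after the run.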

# NULL Handling

dbt-checks follows a consistent and explicit null-handling strategy.

Most checks ignore null values by default.
Use dedicated checks to validate null presence.

## Summary

- Numeric → ignored
- String → ignored
- Temporal → ignored
- Aggregation → ignored (SQL behavior)
- Row count → includes nulls
- Ratio checks → explicit handling

To validate null presence, use:
- null_ratio_below
- null_ratio_between
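
Because value checks skip nulls, a column that must be both valid and mostly populated typically needs two tests. Here is a sketch with a hypothetical `discount` column. The 5% threshold is an arbitrary illustration.

```yaml
models:
  - name: orders
    columns:
      - name: discount
        data_tests:
          # Nulls are ignored here, so only non-null negative values fail.
          - dbt_checks.non_negative
          # Null presence is validated separately.
          - dbt_checks.null_ratio_below:
              arguments:
                threshold: 0.05
```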

# Available Checks

dbt-checks provides reusable data validation tests grouped by category.

## Numeric

Numeric checks validate numeric ranges and thresholds.

Check | Description
----- | ----------
`non_negative` | Ensures values are ≥ 0
`non_positive` | Ensures values are ≤ 0
`greater_than` | Ensures values are greater than a threshold
`greater_or_equal_than` | Ensures values are ≥ a threshold
`less_than` | Ensures values are less than a threshold
`less_or_equal_than` | Ensures values are ≤ a threshold
`between_values` | Ensures values fall within a numeric range

Example

``` yaml
columns:
  - name: value
    data_tests:
      - dbt_checks.between_values:
          arguments:
            min_value: 0
            max_value: 100
```

## String

String checks validate textual fields such as identifiers or formatted values.

Check | Description
----- | ----------
`not_blank` | Ensures strings are not empty or whitespace
`length_between` | Validates string length range
`matches_regex` | Validates a regex pattern
`starts_with` | Ensures string starts with prefix
`ends_with` | Ensures string ends with suffix
`contains` | Ensures string contains substring

Example

``` yaml
columns:
  - name: email
    data_tests:
      - dbt_checks.matches_regex:
          arguments:
            pattern: "^[^@]+@[^@]+\\.[^@]+$"
```

## Temporal

Temporal checks validate date and timestamp fields.

Check | Description
----- | ----------
`not_future_date` | Ensures date is not in the future
`not_before_date` | Ensures date is not before a minimum date
`between_dates` | Ensures date is within a range
`recent_date` | Ensures date is within the last N days
`date_diff_less_than` | Ensures difference between two dates is within threshold
`no_weekend_dates` | Ensures dates do not fall on weekends

Example

``` yaml
columns:
  - name: date
    data_tests:
      - dbt_checks.recent_date:
          arguments:
            max_age_days: 7
```

## Aggregation

Aggregation checks validate dataset-level metrics.
Nulls follow SQL behavior (ignored in aggregation).

Check | Description
----- | ----------
`row_count_greater_than` | Ensures model has more than N rows
`row_count_less_than` | Ensures model has fewer than N rows
`row_count_between` | Ensures row count falls within range
`sum_between` | Ensures column sum falls within range
`avg_between` | Ensures column average falls within range
`max_between` | Ensures column maximum falls within range
`min_between` | Ensures column minimum falls within range

**If all values are null → test fails**

Example

``` yaml
models:
  - name: orders
    data_tests:
      - dbt_checks.row_count_greater_than:
          arguments:
            value: 100
```

## Ratio

Ratio checks validate proportions of rows matching a condition.

Check | Description
----- | ----------
`null_ratio_below` | Ensures null ratio is below threshold
`null_ratio_between` | Ensures null ratio is within range
`positive_ratio_between` | Ensures positive value ratio within range
`negative_ratio_between` | Ensures negative value ratio within range
`value_ratio_between` | Ensures specific value ratio within range

**Null handling:**
- null_ratio_* explicitly evaluates nulls
- others use total row count as denominator
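
A small worked case may make the denominators concrete. The numbers and the `amount` column below are made up for illustration, and the arithmetic assumes the behavior described above (all rows in the denominator).

```yaml
# Hypothetical column `amount`: 100 rows = 10 NULL + 60 positive + 30 negative.
#   null ratio     = 10 / 100 = 0.10
#   positive ratio = 60 / 100 = 0.60  (not 60 / 90: nulls stay in the denominator)
#   negative ratio = 30 / 100 = 0.30
columns:
  - name: amount
    data_tests:
      # 0.10 is below 0.15, so this check would pass on the data above.
      - dbt_checks.null_ratio_below:
          arguments:
            threshold: 0.15
```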

Example

``` yaml
columns:
  - name: email
    data_tests:
      - dbt_checks.null_ratio_below:
          arguments:
            threshold: 0.05
```

# Supported Warehouses

`dbt-checks` is designed to work across common dbt adapters:

- Snowflake
- BigQuery
- Databricks
- Spark
- Redshift
- Postgres

Adapter-specific behavior is handled through dbt's `dispatch` mechanism.
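
If a macro needs adjusting for a specific adapter, dbt's project-level `dispatch` config lets a project place its own implementations ahead of a package's. The following is a sketch, assuming the package's macro namespace is `dbt_checks` (as the test prefix suggests) and using `my_project` as a placeholder for your project name.

```yaml
# dbt_project.yml
dispatch:
  - macro_namespace: dbt_checks
    # dbt searches these namespaces in order when resolving dispatched macros.
    search_order: ['my_project', 'dbt_checks']
```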

**Tested on DuckDB in CI.**

**Additional adapters are supported through dbt dispatch (best-effort compatibility).**

# Why dbt-checks?

Many dbt projects repeatedly implement the same validation logic.

`dbt-checks` provides:

- reusable checks
- simple configuration
- scoped checks with optional `where` filters
- standardized failure outputs
- CI-friendly debugging context
- predictable null handling
- consistent validation patterns
- cross-warehouse compatibility
- reusable internal helper architecture
- consistent SQL generation across checks
- centralized casting, predicates, ratios, and filtering logic

# Internal Architecture

`dbt-checks` uses reusable internal helper macros to standardize SQL generation across all checks.

Internal helpers include:

- casting helpers
- reusable predicates
- ratio utilities
- filter application helpers
- date utilities
- validation helpers

This improves:
- maintainability
- adapter compatibility
- consistency
- future extensibility

# Contributing

Contributions are welcome.

To add a new check:

1. Implement it in `macros/tests`
2. Reuse helper macros when possible
3. Add documentation
4. Add integration tests (including null behavior)

# License

This project is licensed under the MIT License — see the [LICENSE](LICENSE) file for details.