https://github.com/emilyriederer/dbtplyr
dbt package mimicking dplyr select-helpers semantics
- Host: GitHub
- URL: https://github.com/emilyriederer/dbtplyr
- Owner: emilyriederer
- License: apache-2.0
- Created: 2021-02-06T00:40:16.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-08-22T23:54:37.000Z (4 months ago)
- Last Synced: 2024-08-23T01:12:03.113Z (4 months ago)
- Topics: dbt, dplyr, macros, sql
- Homepage: https://emilyriederer.github.io/dbtplyr
- Size: 646 KB
- Stars: 136
- Watchers: 1
- Forks: 10
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - emilyriederer/dbtplyr - dbt package mimicking dplyr select-helpers semantics (Others)
README
## dbtplyr
This add-on package enhances `dbt` by providing macros that programmatically select columns
based on their names. It is inspired by the [`across()` function](https://www.tidyverse.org/blog/2020/04/dplyr-1-0-0-colwise/)
and the [`select helpers`](https://tidyselect.r-lib.org/reference/select_helpers.html) in the R package `dplyr`.

`dplyr` (>= 1.0.0) has helpful semantics for selecting and applying transformations to variables based on their names.
For example, to take the *sum* of all variables whose names start with `N` and the *mean* of all variables whose
names start with `IND` in the dataset `mydata`, one may write:

```
summarize(
  mydata,
  across(starts_with('N'), sum),
  across(starts_with('IND'), mean)
)
```

This package enables us to similarly write `dbt` data models with commands like:
```
{% set cols = dbtplyr.get_column_names( ref('mydata') ) %}
{% set cols_n = dbtplyr.starts_with('N', cols) %}
{% set cols_ind = dbtplyr.starts_with('IND', cols) %}

select
  {{ dbtplyr.across(cols_n, "sum({{var}}) as {{var}}_tot") }},
  {{ dbtplyr.across(cols_ind, "avg({{var}}) as {{var}}_avg") }}
from {{ ref('mydata') }}
```

which `dbt` then compiles to standard SQL.
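For intuition about what the compiled SQL looks like, here is a plain-Python model of the two steps. It assumes that `starts_with` keeps the names beginning with the given prefix (compared case-sensitively here) and that `across` replaces the literal `{{var}}` placeholder with each name and joins the results with commas. This is a sketch of the behavior, not the package's Jinja source, and the column names are hypothetical.

```python
# Plain-Python model of starts_with() + across(); a sketch, not dbtplyr's Jinja source.

def starts_with(prefix, cols):
    """Keep the column names that begin with `prefix` (case-sensitive in this sketch)."""
    return [c for c in cols if c.startswith(prefix)]


def across(var_list, script_string):
    """Replace the literal '{{var}}' placeholder with each name and join with commas."""
    return ",\n  ".join(script_string.replace("{{var}}", v) for v in var_list)


# Hypothetical names standing in for get_column_names(ref('mydata')).
cols = ["N_ORDERS", "N_ITEMS", "IND_ACTIVE", "IND_CHURNED", "CUSTOMER_ID"]
cols_n = starts_with("N", cols)
cols_ind = starts_with("IND", cols)

query = (
    "select\n  "
    + across(cols_n, "sum({{var}}) as {{var}}_tot")
    + ",\n  "
    + across(cols_ind, "avg({{var}}) as {{var}}_avg")
    # In dbt, ref('mydata') would compile to the model's full relation name.
    + "\nfrom mydata"
)
print(query)
# select
#   sum(N_ORDERS) as N_ORDERS_tot,
#   sum(N_ITEMS) as N_ITEMS_tot,
#   avg(IND_ACTIVE) as IND_ACTIVE_avg,
#   avg(IND_CHURNED) as IND_CHURNED_avg
# from mydata
```

Apart from whitespace, this is the shape of SQL the model above produces: one expression per matched column, with `CUSTOMER_ID` left out because it matches neither prefix.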
Alternatively, to protect against cases where no column names match the pattern provided
(e.g. no variables start with `N`, so `cols_n` is an empty list), one may instead internalize the final comma
so that it is only compiled to SQL when relevant, by using the `final_comma` parameter of `across`:

```
{{ dbtplyr.across(cols_n, "sum({{var}}) as {{var}}_tot", final_comma = true) }}
```

Note that, slightly more `dplyr`-like, you may also write:
```
select
  {{ dbtplyr.across(dbtplyr.starts_with('N', ref('mydata')), "sum({{var}}) as {{var}}_tot") }},
  {{ dbtplyr.across(dbtplyr.starts_with('IND', ref('mydata')), "avg({{var}}) as {{var}}_avg") }}
from {{ ref('mydata') }}
```

But, as each function call is a bit longer than the equivalent `dplyr` code, I personally find the first form more readable.
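The `final_comma` option described above matters when a selector matches nothing. The following plain-Python model (a sketch of the behavior, not the package source; column names and the `select_list` helper are hypothetical) compares a hand-written comma between two `across` calls with a comma emitted by `across` itself. It assumes the trailing comma is appended only when the list is non-empty.

```python
def across(var_list, script_string, final_comma=False):
    """Model of across(): one expression per name, comma-joined.

    With final_comma=True a trailing comma is appended only when the list is
    non-empty, so an empty selection contributes no text at all.
    """
    parts = [script_string.replace("{{var}}", v) for v in var_list]
    sql = ", ".join(parts)
    if final_comma and parts:
        sql += ","
    return sql


def select_list(cols_n, cols_ind, use_final_comma):
    """Hypothetical helper: build the select clause with either comma strategy."""
    first = across(cols_n, "sum({{var}}) as {{var}}_tot", final_comma=use_final_comma)
    second = across(cols_ind, "avg({{var}}) as {{var}}_avg")
    # Hand-written comma vs. letting across() decide whether a comma is needed.
    sep = " " if use_final_comma else ", "
    return "select " + first + sep + second


print(select_list([], ["IND_ACTIVE"], use_final_comma=False))
# select , avg(IND_ACTIVE) as IND_ACTIVE_avg      <- dangling comma, invalid SQL
print(select_list([], ["IND_ACTIVE"], use_final_comma=True))
# select  avg(IND_ACTIVE) as IND_ACTIVE_avg       <- valid
print(select_list(["N_ORDERS"], ["IND_ACTIVE"], use_final_comma=True))
# select sum(N_ORDERS) as N_ORDERS_tot, avg(IND_ACTIVE) as IND_ACTIVE_avg
```

Moving the comma inside the macro means the separator exists only when the expression before it does, which is what makes an empty selection harmless.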
## Macros
The complete list of included macros is:

**Functions to apply an operation across columns**
- `across(var_list, script_string, final_comma)`
- `c_across(var_list, script_string)`

**Functions to evaluate a condition across columns**
- `if_any(var_list, script_string)`
- `if_all(var_list, script_string)`

**Functions to subset columns by naming conventions**
- `starts_with(string, relation or list)`
- `ends_with(string, relation or list)`
- `contains(string, relation or list)`
- `not_contains(string, relation or list)`
- `one_of(string_list, relation or list)`
- `not_one_of(string_list, relation or list)`
- `matches(string, relation)`
- `everything(relation)`
- `where(fn, relation)`, where `fn` is the string name of a [Column type-checker](https://docs.getdbt.com/reference/dbt-classes/#column) (e.g. `"is_number"`)

Note that all of the select-helper functions that take a relation as an argument can optionally be passed a list of names instead.
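The name-based select helpers above all reduce to filtering a list of column names. The Python model below accepts only lists (standing in for the "relation or list" argument) and is not the package's implementation. It assumes plain, case-sensitive prefix, suffix, and substring tests, and an unanchored regular-expression search for `matches`; check the package documentation for the actual case handling and regex semantics.

```python
import re


def starts_with(s, cols):
    return [c for c in cols if c.startswith(s)]


def ends_with(s, cols):
    return [c for c in cols if c.endswith(s)]


def contains(s, cols):
    return [c for c in cols if s in c]


def not_contains(s, cols):
    return [c for c in cols if s not in c]


def one_of(names, cols):
    # Keep columns whose name is in `names`; this sketch preserves column order.
    wanted = set(names)
    return [c for c in cols if c in wanted]


def not_one_of(names, cols):
    unwanted = set(names)
    return [c for c in cols if c not in unwanted]


def matches(pattern, cols):
    # Assumption of this sketch: regex searched anywhere in the name.
    return [c for c in cols if re.search(pattern, c)]


def everything(cols):
    return list(cols)


cols = ["N_ORDERS", "N_ITEMS", "IND_ACTIVE", "AMT_TOTAL", "CUSTOMER_ID"]
print(ends_with("_ID", cols))         # ['CUSTOMER_ID']
print(contains("TOT", cols))          # ['AMT_TOTAL']
print(one_of(["IND_ACTIVE", "MISSING"], cols))  # ['IND_ACTIVE']
print(matches(r"^(N|IND)_", cols))    # ['N_ORDERS', 'N_ITEMS', 'IND_ACTIVE']
```

Because every helper returns a plain list of names, its output can be fed straight into `across`, `if_any`, or `if_all`, or into another helper that accepts a list.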
Documentation for these functions is available on the [package website](https://emilyriederer.github.io/dbtplyr/) and in the [`macros/macro.yml`](https://github.com/emilyriederer/dbtplyr/blob/main/macros/macro.yml) file.
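The row-wise macros `if_any()`, `if_all()`, and `c_across()` listed above each build a single expression from several columns. The Python model below assumes, by analogy with their `dplyr` namesakes, that `if_any` joins the per-column conditions with `or`, `if_all` joins them with `and`, and `c_across` substitutes the comma-separated column list into `{{var}}` once. The `_expand` helper and column names are hypothetical; consult the package documentation for the exact compiled SQL.

```python
def _expand(var_list, script_string):
    """Hypothetical helper: one copy of script_string per column, '{{var}}' replaced."""
    return [script_string.replace("{{var}}", v) for v in var_list]


def if_any(var_list, script_string):
    # True for a row if the condition holds for at least one column.
    return "(" + " or ".join(_expand(var_list, script_string)) + ")"


def if_all(var_list, script_string):
    # True for a row only if the condition holds for every column.
    return "(" + " and ".join(_expand(var_list, script_string)) + ")"


def c_across(var_list, script_string):
    # Pass all columns at once to a multi-argument SQL function such as greatest().
    return script_string.replace("{{var}}", ", ".join(var_list))


flags = ["IND_ACTIVE", "IND_CHURNED"]
print(if_any(flags, "{{var}} = 1"))  # (IND_ACTIVE = 1 or IND_CHURNED = 1)
print(if_all(flags, "{{var}} = 1"))  # (IND_ACTIVE = 1 and IND_CHURNED = 1)
print(c_across(["N_ORDERS", "N_ITEMS"], "greatest({{var}})"))  # greatest(N_ORDERS, N_ITEMS)
```

The parentheses keep the combined condition safe to embed in a larger `where` clause. In this sketch an empty column list would yield `()`, which is not valid SQL, so an empty selection would need guarding before use.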