https://github.com/emilyriederer/dbtplyr

dbt package mimicking dplyr select-helpers semantics

Host: GitHub
URL: https://github.com/emilyriederer/dbtplyr
Owner: emilyriederer
License: apache-2.0
Created: 2021-02-06T00:40:16.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2024-08-22T23:54:37.000Z (10 months ago)
Last Synced: 2024-08-23T01:12:03.113Z (10 months ago)
Topics: dbt, dplyr, macros, sql
Homepage: https://emilyriederer.github.io/dbtplyr
Size: 646 KB
Stars: 136
Watchers: 1
Forks: 10
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE


README

## dbtplyr

This add-on package enhances `dbt` by providing macros which programmatically select columns based on their column names. It is inspired by the [`across()` function](https://www.tidyverse.org/blog/2020/04/dplyr-1-0-0-colwise/) and the [`select helpers`](https://tidyselect.r-lib.org/reference/select_helpers.html) in the R package `dplyr`.

`dplyr` (>= 1.0.0) has helpful semantics for selecting and applying transformations to variables based on their names. For example, if one wishes to take the *sum* of all variables with name prefixes of `N` and the mean of all variables with name prefixes of `IND` in the dataset `mydata`, they may write:

```
summarize(
  mydata,
  across(starts_with('N'), sum),
  across(starts_with('IND'), mean)
)
```

This package enables us to similarly write `dbt` data models with commands like:

```
{% set cols = dbtplyr.get_column_names( ref('mydata') ) %}
{% set cols_n = dbtplyr.starts_with('N', cols) %}
{% set cols_ind = dbtplyr.starts_with('IND', cols) %}

select
  {{ dbtplyr.across(cols_n, "sum({{var}}) as {{var}}_tot") }},
  {{ dbtplyr.across(cols_ind, "avg({{var}}) as {{var}}_avg") }}
from {{ ref('mydata') }}
```

which `dbt` then compiles to standard SQL. 
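To make the compilation step concrete, here is a minimal Python sketch of the expansion. It is an emulation of the behavior described above, not the package's Jinja source. The `starts_with` and `across` functions are plain Python stand-ins, the columns `N_A`, `N_B`, `IND_X`, and `OTHER` are hypothetical, and case-sensitive prefix matching is an assumption. The sketch uses `avg`, the standard SQL aggregate for a mean.

```python
# Pure-Python emulation of the expansion (a sketch under stated assumptions,
# not dbtplyr's actual implementation). Column names are hypothetical.

def starts_with(prefix, cols):
    """Keep the column names that begin with `prefix` (case-sensitive here)."""
    return [c for c in cols if c.startswith(prefix)]

def across(var_list, script_string):
    """Substitute each name for the literal {{var}} and join with commas."""
    return ",\n  ".join(script_string.replace("{{var}}", v) for v in var_list)

cols = ["N_A", "N_B", "IND_X", "OTHER"]  # assumed columns of mydata
cols_n = starts_with("N", cols)
cols_ind = starts_with("IND", cols)

sql = (
    "select\n  "
    + across(cols_n, "sum({{var}}) as {{var}}_tot")
    + ",\n  "
    + across(cols_ind, "avg({{var}}) as {{var}}_avg")
    + "\nfrom mydata"
)
print(sql)
```

Under these assumptions the printed query selects `sum(N_A) as N_A_tot`, `sum(N_B) as N_B_tot`, and `avg(IND_X) as IND_X_avg` from `mydata`; the `OTHER` column is left out because it matches neither prefix.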

Alternatively, to protect against cases where no column names match the pattern provided (e.g. no variables start with `N`, so `cols_n` is an empty list), one may instead move the trailing comma inside the macro call with the `final_comma` parameter of `across`, so that the comma is only compiled to SQL when relevant.

```
  {{ dbtplyr.across(cols_n, "sum({{var}}) as {{var}}_tot", final_comma = true) }}
```
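The failure mode this guards against can be illustrated with a small Python emulation. This is a sketch of the behavior described above rather than the package's Jinja code: the `across` and `build_select` functions are stand-ins, the column names and the extra `count(*) as n_rows` column are hypothetical, and it assumes that an empty list produces no output at all.

```python
def across(var_list, script_string, final_comma=False):
    """Expand {{var}} per column and join with commas; append a trailing
    comma only when something was emitted and final_comma is set."""
    body = ", ".join(script_string.replace("{{var}}", v) for v in var_list)
    if body and final_comma:
        body += ","
    return body

def build_select(cols_n):
    # The comma after the first block comes from across(), not the template,
    # so it disappears together with the block when cols_n is empty.
    first = across(cols_n, "sum({{var}}) as {{var}}_tot", final_comma=True)
    parts = ["select", first, "count(*) as n_rows", "from mydata"]
    return " ".join(p for p in parts if p)

with_matches = build_select(["N_A", "N_B"])
without_matches = build_select([])
print(with_matches)
print(without_matches)
```

With a comma hard-coded in the template instead, the empty case would compile to `select , count(*) ...`, which is invalid SQL.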

For a slightly more `dplyr`-like style, you may also nest the calls and pass the relation directly to the select helper:

```
select
  {{ dbtplyr.across(dbtplyr.starts_with('N', ref('mydata')), "sum({{var}}) as {{var}}_tot") }},
  {{ dbtplyr.across(dbtplyr.starts_with('IND', ref('mydata')), "avg({{var}}) as {{var}}_avg") }}
from {{ ref('mydata') }}
```

But, as each function call is a bit longer than the equivalent `dplyr` code, I personally find the first form more readable.

## Macros

The complete list of included macros is:

**Functions to apply operation across columns**

- `across(var_list, script_string, final_comma)`

- `c_across(var_list, script_string)`

**Functions to evaluate conditions across columns**

- `if_any(var_list, script_string)`

- `if_all(var_list, script_string)`

**Functions to subset columns by naming conventions**

- `starts_with(string, relation or list)` 

- `ends_with(string, relation or list)`

- `contains(string, relation or list)`

- `not_contains(string, relation or list)`

- `one_of(string_list, relation or list)`

- `not_one_of(string_list, relation or list)`

- `matches(string, relation)`

- `everything(relation)`

- `where(fn, relation)` where `fn` is the string name of a [Column type-checker](https://docs.getdbt.com/reference/dbt-classes/#column) (e.g. "is_number")

Note that all of the select-helper functions that take a relation as an argument can optionally be passed a list of names instead.
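Because the helpers accept a list of names, the result of one helper can be fed into another. The following Python sketch emulates a few of them on a list to show that composition. The function bodies are stand-ins rather than the package's macros, the column names are hypothetical, and case-sensitive matching is an assumption of the sketch.

```python
# Python stand-ins for some select helpers, operating on a list of names.
# Matching is case-sensitive here, which is an assumption of this sketch.

def ends_with(suffix, cols):
    return [c for c in cols if c.endswith(suffix)]

def contains(substring, cols):
    return [c for c in cols if substring in c]

def not_contains(substring, cols):
    return [c for c in cols if substring not in c]

def one_of(names, cols):
    return [c for c in cols if c in names]

def not_one_of(names, cols):
    return [c for c in cols if c not in names]

cols = ["N_A", "N_B_DT", "IND_X", "ID"]  # hypothetical column names

dates = ends_with("_DT", cols)
non_keys = not_one_of(["ID"], cols)
# Chaining: drop the key column first, then keep names ending in "_X".
chained = ends_with("_X", not_one_of(["ID"], cols))
print(dates, non_keys, chained)
```

Each helper returns a plain list, so the output of any of them is a valid second argument for the next.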

Documentation for these functions is available on the [package website](https://emilyriederer.github.io/dbtplyr/) and in the [`macros/macro.yml`](https://github.com/emilyriederer/dbtplyr/blob/main/macros/macro.yml) file.
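As a rough illustration of the condition macros listed above, the Python sketch below assumes, by analogy with `dplyr`'s `if_any()` and `if_all()`, that the per-column conditions are combined with `or` and `and` respectively. That combination rule, the parenthesization, the `_combine` helper, and the column names are all assumptions of this sketch; consult the package documentation for the exact compiled output.

```python
def _combine(var_list, script_string, joiner):
    """Expand {{var}} per column, wrap each condition in parentheses,
    and join the conditions with the given boolean operator."""
    parts = ["(" + script_string.replace("{{var}}", v) + ")" for v in var_list]
    return joiner.join(parts)

def if_any(var_list, script_string):
    # Assumed semantics: true when the condition holds for at least one column.
    return _combine(var_list, script_string, " or ")

def if_all(var_list, script_string):
    # Assumed semantics: true only when the condition holds for every column.
    return _combine(var_list, script_string, " and ")

cols_ind = ["IND_X", "IND_Y"]  # hypothetical indicator columns
where_any = if_any(cols_ind, "{{var}} = 1")
where_all = if_all(cols_ind, "{{var}} = 1")
print("where " + where_any)
print("where " + where_all)
```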
