https://github.com/mjfrigaard/strutilities

String utilities (for testing)
https://github.com/mjfrigaard/strutilities

Last synced: 7 months ago
JSON representation

String utilities (for testing)

Host: GitHub
URL: https://github.com/mjfrigaard/strutilities
Owner: mjfrigaard
License: other
Created: 2023-11-09T21:31:20.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-11-10T14:14:37.000Z (over 1 year ago)
Last Synced: 2024-08-13T07:11:08.510Z (11 months ago)
Language: R
Size: 527 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

Awesome Lists containing this project

jimsghstars - mjfrigaard/strutilities - String utilities (for testing) (R)

README

        ---

output: github_document

---

```{r, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  out.width = "80%",

  fig.align = 'center'

)

```

# `strutilities`

The goal of `strutilities` is to perform obscure string manipulations.

## Installation

You can install the development version of strutilities like so:

``` r

install.packages('pak')

pak::pak('mjfrigaard/strutilities')

```

```{r example}

library(strutilities)

```

## process_text()

`process_text()` is designed to standardize the columns names and text contents in a dataset (sort of a low-budget combination of a `janitor::clean_names()` and `map(df, tolower)`):

```{r}

names(datasets::iris)

names(process_text(datasets::iris))

```

It has an optional `fct` argument that will convert factors to lowercase characters, too.

```{r}

str(datasets::InsectSprays)

str(process_text(datasets::InsectSprays, fct = TRUE))

```

## Testing 

Below you'll find the structure of the `tests/` folder:

```{r , echo=FALSE}

fs::dir_tree("tests")

```

### Helper with fixture (issue)

In the test below, the `process_text()` function uses the source .csv version of `palmerpenguins::penguins_raw` as a test fixture (loaded in from `tests/testthat/fixtures/make-test_data.R` and exported to `tests/testthat/fixtures/test_data.rds`)

The test helper function (`test_logger()`) is stored in `tests/testthat/helper.R`:

```{r , eval=FALSE}

describe(

"Feature: Process text from dataset

    As a ...

    I want to ...

    So that I ...", code = {

  it(

   "Scenario: scenario

     Given ...

     When ...

     Then ...", code = {

   # helper

   test_logger(start = "process_text()", msg = "names penguins_raw.csv")

   # fixture

    test_data <- readRDS(test_path("fixtures", "test_data.rds"))

    # observerd data

    processed_data <- process_text(raw_data = test_data, fct = TRUE)

    # expected names

    nms <- c("studyname",

            "sample_number",

            "species",

            "region",

            "island",

            "stage",

            "individual_id",

            "clutch_completion",

            "date_egg",

            "culmen_length_mm",

            "culmen_depth_mm",

            "flipper_length_mm",

            "body_mass_g",

            "sex",

            "delta_15_n_o_oo",

            "delta_13_c_o_oo",

            "comments")

    expect_equal(object = names(processed_data), expected = nms)

    test_logger(end = "process_text()", msg = "names penguins_raw.csv")

  })

})

```

#### devtools:::test_active_file()

As we can see below, the test runs fine (with the helper).

```

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]

INFO [2023-11-09 14:59:57] [ START process_text() = names penguins_raw.csv]

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]

INFO [2023-11-09 14:59:57] [ END process_text() = names penguins_raw.csv]

```

#### devtools:::test_coverage_active_file()

However, when I attempt to get the coverage for the test file, it shows 0.00% :(

```{r coverage_fixture.png, echo=FALSE}

knitr::include_graphics("man/figures/coverage_fixture.png")

```

I thought it might be `it()`, so I swapped it for `test_that()`, but 'same same' :(

### Helper without fixture (works!)

To make sure it wasn't the `process_text()` function or the helper, I also tested loading the `penguins_raw` data directly from the `palmerpenguins` package (i.e., not using the fixture): 

```{r , eval=FALSE}

describe(

"Feature: Process text from dataset

    As a ...

    I want to ...

    So that I ...", code = {

  it(

   "Scenario: scenario

     Given ...

     When ...

     Then ...", code = {

   # helper

   test_logger(start = "process_text()", msg = "names palmerpenguins::penguins_raw")

   # data frame package

    test_data <- palmerpenguins::penguins_raw

    # test

    processed_data <- process_text(raw_data = test_data, fct = TRUE)

    nms <- c("studyname",

            "sample_number",

            "species",

            "region",

            "island",

            "stage",

            "individual_id",

            "clutch_completion",

            "date_egg",

            "culmen_length_mm",

            "culmen_depth_mm",

            "flipper_length_mm",

            "body_mass_g",

            "sex",

            "delta_15_n_o_oo",

            "delta_13_c_o_oo",

            "comments")

    expect_equal(object = names(processed_data), expected = nms)

    test_logger(end = "process_text()", msg = "names palmerpenguins::penguins_raw")

  })

})

```

#### devtools:::test_active_file()

```

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]

INFO [2023-11-09 15:03:36] [ START process_text() = names palmerpenguins::penguins_raw]

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]

INFO [2023-11-09 15:03:36] [ END process_text() = names palmerpenguins::penguins_raw]

```

#### devtools:::test_coverage_active_file()

```{r coverage_helper.png, echo=FALSE}

knitr::include_graphics("man/figures/coverage_helper.png")

```

 

## Other functions 

`strutilities` has two other weird functions (`sep_cols_mult()` and `pivot_term_long()`) for manipulating strings/character columns (all written in base R to keep dependencies at a minimum).

### pivot_term_long()

This is an odd version of `pivot_wider()` that's been adapted for a vectors:

```{r}

pivot_term_long("A large size in stockings is hard to sell.")

```

You can pass multiple 'terms' and it returns a data.frame with each unique term:

```{r}

terms <- c("A large size in stockings is hard to sell.", "The first part of the plan needs changing.")

pivot_term_long(terms)

```

## sep_cols_mult()

The is *somewhat* similar to `tidyr::separate()`, but always uses `"[^[:alnum:]]+"` as the `sep` and keeps all the items resulting from the regex.

```{r}

d <- data.frame(value = c(29L, 91L, 39L, 28L, 12L),

                full_name = c("John", "John, Jacob",

                         "John, Jacob, Jingleheimer",

                         "Jingleheimer, Schmidt",

                         "JJJ, Schmidt"))

d

sep_cols_mult(data = d, col = "full_name", col_prefix = "name")

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mjfrigaard/strutilities

Awesome Lists containing this project

README