https://github.com/mjfrigaard/strutilities

String utilities (for testing)

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "80%",
  fig.align = 'center'
)
```

# `strutilities`

The goal of `strutilities` is to perform obscure string manipulations.

## Installation

You can install the development version of strutilities like so:

``` r
install.packages('pak')
pak::pak('mjfrigaard/strutilities')
```

```{r example}
library(strutilities)
```

## process_text()

`process_text()` is designed to standardize the column names and text contents in a dataset (sort of a low-budget combination of `janitor::clean_names()` and `map(df, tolower)`):

```{r}
names(datasets::iris)
names(process_text(datasets::iris))
```

It has an optional `fct` argument that will convert factors to lowercase characters, too.

```{r}
str(datasets::InsectSprays)
str(process_text(datasets::InsectSprays, fct = TRUE))
```
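
For readers who want a feel for what this kind of standardization involves, here is a minimal base R sketch. It is not the package's implementation: `clean_names_sketch()` and `process_text_sketch()` are hypothetical names, and the rules (lowercase everything, collapse runs of non-alphanumeric characters into one underscore) are assumptions inferred from the expected column names shown in the tests further down.

```r
# Hypothetical helper (not part of strutilities): lowercase names and
# collapse runs of non-alphanumeric characters into single underscores.
clean_names_sketch <- function(x) {
  x <- tolower(x)
  x <- gsub("[^[:alnum:]]+", "_", x)
  # drop any leading or trailing underscores left behind
  gsub("^_+|_+$", "", x)
}

# Hypothetical stand-in for process_text(): clean the names, lowercase
# character columns, and (optionally) convert factors to lowercase characters.
process_text_sketch <- function(raw_data, fct = FALSE) {
  stopifnot(is.data.frame(raw_data))
  names(raw_data) <- clean_names_sketch(names(raw_data))
  for (i in seq_along(raw_data)) {
    column <- raw_data[[i]]
    if (is.character(column)) {
      raw_data[[i]] <- tolower(column)
    } else if (fct && is.factor(column)) {
      raw_data[[i]] <- tolower(as.character(column))
    }
  }
  raw_data
}

sprays <- process_text_sketch(datasets::InsectSprays, fct = TRUE)
str(sprays)
```

The real `process_text()` may handle edge cases (duplicate names, non-ASCII text) differently, so treat this only as an illustration of the idea.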

## Testing

Below you'll find the structure of the `tests/` folder:

```{r , echo=FALSE}
fs::dir_tree("tests")
```

### Helper with fixture (issue)

In the test below, `process_text()` is run on a test fixture: the source .csv version of `palmerpenguins::penguins_raw`, loaded in by `tests/testthat/fixtures/make-test_data.R` and exported to `tests/testthat/fixtures/test_data.rds`.
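
The fixture workflow described above (read a .csv, save it as .rds, read it back in the test) can be sketched roughly as follows. This is not the contents of `make-test_data.R`: the temporary directory, the file name `penguins_raw_sample.csv`, and the three-column sample are stand-ins chosen so the example runs on its own.

```r
# Stand-in for tests/testthat/fixtures/ (assumption: a temp dir is used
# here only so the sketch runs outside the package).
fixture_dir <- file.path(tempdir(), "fixtures")
dir.create(fixture_dir, showWarnings = FALSE, recursive = TRUE)

# Stand-in for the source .csv: a tiny sample with raw-style column names.
csv_path <- file.path(fixture_dir, "penguins_raw_sample.csv")
writeLines(
  c("studyName,Sample Number,Species",
    "PAL0708,1,Adelie Penguin (Pygoscelis adeliae)",
    "PAL0708,2,Adelie Penguin (Pygoscelis adeliae)"),
  csv_path
)

# check.names = FALSE keeps names such as "Sample Number" untouched,
# so the fixture still exercises the name-cleaning step.
test_data <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)

# Export the fixture, then read it back the way a test would.
rds_path <- file.path(fixture_dir, "test_data.rds")
saveRDS(test_data, rds_path)
reloaded <- readRDS(rds_path)
```

Inside a test file, the last step would use `readRDS(test_path("fixtures", "test_data.rds"))`, as in the test shown next.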

The test helper function (`test_logger()`) is stored in `tests/testthat/helper.R` and is called at the start and end of the test:

```{r , eval=FALSE}
describe(
"Feature: Process text from dataset
As a ...
I want to ...
So that I ...", code = {
it(
"Scenario: scenario
Given ...
When ...
Then ...", code = {
# helper
test_logger(start = "process_text()", msg = "names penguins_raw.csv")
# fixture
test_data <- readRDS(test_path("fixtures", "test_data.rds"))
# observed data
processed_data <- process_text(raw_data = test_data, fct = TRUE)
# expected names
nms <- c("studyname",
"sample_number",
"species",
"region",
"island",
"stage",
"individual_id",
"clutch_completion",
"date_egg",
"culmen_length_mm",
"culmen_depth_mm",
"flipper_length_mm",
"body_mass_g",
"sex",
"delta_15_n_o_oo",
"delta_13_c_o_oo",
"comments")
expect_equal(object = names(processed_data), expected = nms)
test_logger(end = "process_text()", msg = "names penguins_raw.csv")
})
})
```
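
The body of `test_logger()` is not shown in this README. As a rough idea of what a helper with the `start`, `end`, and `msg` arguments used above could look like, here is a base R sketch. `test_logger_sketch()` is a hypothetical name, and the line format is an assumption modeled on the console output shown in the next section; the real helper may well rely on a logging package instead.

```r
# Hypothetical stand-in for the helper in tests/testthat/helper.R.
# Emits one INFO-style line marking the start or end of a test.
test_logger_sketch <- function(start = NULL, end = NULL, msg) {
  stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  if (!is.null(start)) {
    line <- sprintf("INFO [%s] [ START %s = %s]", stamp, start, msg)
  } else if (!is.null(end)) {
    line <- sprintf("INFO [%s] [ END %s = %s]", stamp, end, msg)
  } else {
    line <- sprintf("INFO [%s] [ %s]", stamp, msg)
  }
  # message() writes to stderr, so it does not interfere with test results
  message(line)
  invisible(line)
}

test_logger_sketch(start = "process_text()", msg = "names penguins_raw.csv")
test_logger_sketch(end = "process_text()", msg = "names penguins_raw.csv")
```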

#### devtools:::test_active_file()

As we can see below, the test runs fine (with the helper).

```
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
INFO [2023-11-09 14:59:57] [ START process_text() = names penguins_raw.csv]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
INFO [2023-11-09 14:59:57] [ END process_text() = names penguins_raw.csv]
```

#### devtools:::test_coverage_active_file()

However, when I attempt to get the coverage for the test file, it shows 0.00% :(

```{r coverage_fixture.png, echo=FALSE}
knitr::include_graphics("man/figures/coverage_fixture.png")
```

I thought it might be `it()`, so I swapped it for `test_that()`, but 'same same' :(

### Helper without fixture (works!)

To make sure it wasn't the `process_text()` function or the helper, I also tested loading the `penguins_raw` data directly from the `palmerpenguins` package (i.e., not using the fixture):

```{r , eval=FALSE}
describe(
"Feature: Process text from dataset
As a ...
I want to ...
So that I ...", code = {
it(
"Scenario: scenario
Given ...
When ...
Then ...", code = {
# helper
test_logger(start = "process_text()", msg = "names palmerpenguins::penguins_raw")
# data frame package
test_data <- palmerpenguins::penguins_raw
# test
processed_data <- process_text(raw_data = test_data, fct = TRUE)
nms <- c("studyname",
"sample_number",
"species",
"region",
"island",
"stage",
"individual_id",
"clutch_completion",
"date_egg",
"culmen_length_mm",
"culmen_depth_mm",
"flipper_length_mm",
"body_mass_g",
"sex",
"delta_15_n_o_oo",
"delta_13_c_o_oo",
"comments")
expect_equal(object = names(processed_data), expected = nms)
test_logger(end = "process_text()", msg = "names palmerpenguins::penguins_raw")
})
})
```

#### devtools:::test_active_file()

```
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
INFO [2023-11-09 15:03:36] [ START process_text() = names palmerpenguins::penguins_raw]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
INFO [2023-11-09 15:03:36] [ END process_text() = names palmerpenguins::penguins_raw]
```

#### devtools:::test_coverage_active_file()

```{r coverage_helper.png, echo=FALSE}
knitr::include_graphics("man/figures/coverage_helper.png")
```


## Other functions

`strutilities` has two other weird functions (`sep_cols_mult()` and `pivot_term_long()`) for manipulating strings/character columns (all written in base R to keep dependencies to a minimum).

### pivot_term_long()

This is an odd version of `pivot_wider()` that's been adapted for vectors:

```{r}
pivot_term_long("A large size in stockings is hard to sell.")
```

You can pass multiple 'terms' and it returns a data.frame with each unique term:

```{r}
terms <- c("A large size in stockings is hard to sell.", "The first part of the plan needs changing.")
pivot_term_long(terms)
```
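
Since the rendered output is not reproduced here, the following base R sketch shows one plausible reading of "long" output for terms: one row per whitespace-separated item, with the source term repeated alongside it. `pivot_term_long_sketch()`, the whitespace split, and the column names `term` and `item` are all assumptions for illustration, not the function's documented behavior.

```r
# Hypothetical stand-in (not the package's pivot_term_long()):
# split each term on whitespace and stack the pieces in long form.
pivot_term_long_sketch <- function(terms, sep = "[[:space:]]+") {
  pieces <- strsplit(terms, split = sep)
  data.frame(
    term = rep(terms, times = lengths(pieces)),  # source term, repeated per item
    item = unlist(pieces),                       # one item per row
    stringsAsFactors = FALSE
  )
}

long_terms <- pivot_term_long_sketch(
  c("A large size in stockings is hard to sell.",
    "The first part of the plan needs changing.")
)
head(long_terms)
```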

### sep_cols_mult()

This is *somewhat* similar to `tidyr::separate()`, but always uses `"[^[:alnum:]]+"` as the `sep` and keeps all the items resulting from the regex.

```{r}
d <- data.frame(value = c(29L, 91L, 39L, 28L, 12L),
                full_name = c("John", "John, Jacob",
                              "John, Jacob, Jingleheimer",
                              "Jingleheimer, Schmidt",
                              "JJJ, Schmidt"))
d
sep_cols_mult(data = d, col = "full_name", col_prefix = "name")
```
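
The splitting behavior described above can be approximated in base R as follows. This is a sketch under stated assumptions, not the package source: `sep_cols_mult_sketch()` is a hypothetical name, shorter rows are padded with `NA`, and the new columns are named `<col_prefix>_1`, `<col_prefix>_2`, and so on, which may differ from what `sep_cols_mult()` actually returns.

```r
# Hypothetical stand-in for sep_cols_mult(): split one column on runs of
# non-alphanumeric characters and keep every resulting piece.
sep_cols_mult_sketch <- function(data, col, col_prefix = "col") {
  pieces <- strsplit(as.character(data[[col]]), split = "[^[:alnum:]]+")
  n_max <- max(lengths(pieces))
  # pad shorter rows with NA so every row has n_max pieces
  padded <- lapply(pieces, function(p) {
    c(p, rep(NA_character_, n_max - length(p)))
  })
  new_cols <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(new_cols) <- paste(col_prefix, seq_len(n_max), sep = "_")
  cbind(data, new_cols)
}

people <- data.frame(
  value = c(29L, 91L, 39L),
  full_name = c("John", "John, Jacob", "John, Jacob, Jingleheimer"),
  stringsAsFactors = FALSE
)
sep_cols_mult_sketch(data = people, col = "full_name", col_prefix = "name")
```

Unlike `tidyr::separate()`, no target column names are supplied up front; the number of new columns is driven by the row with the most pieces.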