https://github.com/mjfrigaard/strutilities
String utilities (for testing)
- Host: GitHub
- URL: https://github.com/mjfrigaard/strutilities
- Owner: mjfrigaard
- License: other
- Created: 2023-11-09T21:31:20.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-10T14:14:37.000Z (over 1 year ago)
- Last Synced: 2024-08-13T07:11:08.510Z (8 months ago)
- Language: R
- Size: 527 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "80%",
  fig.align = 'center'
)
```

# `strutilities`
The goal of `strutilities` is to perform obscure string manipulations.
## Installation
You can install the development version of strutilities like so:
``` r
install.packages('pak')
pak::pak('mjfrigaard/strutilities')
```

```{r example}
library(strutilities)
```

## process_text()
`process_text()` is designed to standardize the column names and text contents in a dataset (sort of a low-budget combination of `janitor::clean_names()` and `map(df, tolower)`):
```{r}
names(datasets::iris)
names(process_text(datasets::iris))
```

It has an optional `fct` argument that will convert factors to lowercase characters, too.
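For readers who want a feel for the behaviour without installing the package, here is a minimal base R sketch. It is not the package's implementation: `process_text_sketch()` and its cleaning rules (lowercase names, runs of non-alphanumeric characters replaced by `_`) are assumptions chosen to match the names expected in the tests further down.

```r
# Hypothetical stand-in for process_text(); the real function may differ.
process_text_sketch <- function(raw_data, fct = FALSE) {
  # lowercase the names, then collapse non-alphanumeric runs to "_"
  nms <- tolower(names(raw_data))
  nms <- gsub("[^[:alnum:]]+", "_", nms)
  # drop leading/trailing underscores left behind by punctuation
  names(raw_data) <- gsub("^_+|_+$", "", nms)
  # lowercase character columns; factors only when fct = TRUE
  raw_data[] <- lapply(raw_data, function(x) {
    if (is.character(x)) {
      tolower(x)
    } else if (is.factor(x) && fct) {
      tolower(as.character(x))
    } else {
      x
    }
  })
  raw_data
}
```

The package function itself, applied to `InsectSprays`: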
```{r}
str(datasets::InsectSprays)
str(process_text(datasets::InsectSprays, fct = TRUE))
```

## Testing
Below you'll find the structure of the `tests/` folder:
```{r , echo=FALSE}
fs::dir_tree("tests")
```

### Helper with fixture (issue)
The test below for `process_text()` uses the source .csv version of `palmerpenguins::penguins_raw` as a test fixture (created by `tests/testthat/fixtures/make-test_data.R` and saved to `tests/testthat/fixtures/test_data.rds`).
The test helper function (`test_logger()`) is stored in `tests/testthat/helper.R` and is called at the start and end of the test:
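A fixture script of this kind usually just reads the raw file and saves the result as `.rds`. The sketch below is an assumption about what such a script might look like, not the contents of `make-test_data.R`: it writes to a temporary directory, and the tiny two-column .csv is a stand-in for the real source file.

```r
# Hypothetical fixture script; paths and file contents are assumptions.
fixture_dir <- file.path(tempdir(), "fixtures")
dir.create(fixture_dir, recursive = TRUE, showWarnings = FALSE)

# stand-in for the source .csv (two columns, two rows)
csv_path <- file.path(fixture_dir, "penguins_raw.csv")
writeLines(c("studyName,Sample Number", "PAL0708,1", "PAL0708,2"), csv_path)

# read the raw file with its column names untouched, then save as .rds
test_data <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)
rds_path <- file.path(fixture_dir, "test_data.rds")
saveRDS(test_data, rds_path)

# a test would later load the fixture like this
reloaded <- readRDS(rds_path)
```

Using `check.names = FALSE` keeps the messy original names, which is what a name-cleaning test needs as input.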
```{r , eval=FALSE}
describe(
  "Feature: Process text from dataset
      As a ...
      I want to ...
      So that I ...", code = {
    it(
      "Scenario: scenario
        Given ...
        When ...
        Then ...", code = {
        # helper
        test_logger(start = "process_text()", msg = "names penguins_raw.csv")
        # fixture
        test_data <- readRDS(test_path("fixtures", "test_data.rds"))
        # observed data
        processed_data <- process_text(raw_data = test_data, fct = TRUE)
        # expected names
        nms <- c("studyname",
                 "sample_number",
                 "species",
                 "region",
                 "island",
                 "stage",
                 "individual_id",
                 "clutch_completion",
                 "date_egg",
                 "culmen_length_mm",
                 "culmen_depth_mm",
                 "flipper_length_mm",
                 "body_mass_g",
                 "sex",
                 "delta_15_n_o_oo",
                 "delta_13_c_o_oo",
                 "comments")
        expect_equal(object = names(processed_data), expected = nms)
        test_logger(end = "process_text()", msg = "names penguins_raw.csv")
      })
  })
```

#### devtools:::test_active_file()
As we can see below, the test runs fine (with the helper).
```
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
INFO [2023-11-09 14:59:57] [ START process_text() = names penguins_raw.csv]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
INFO [2023-11-09 14:59:57] [ END process_text() = names penguins_raw.csv]
```

#### devtools:::test_coverage_active_file()
However, when I attempt to get the coverage for the test file, it shows 0.00% :(
```{r coverage_fixture.png, echo=FALSE}
knitr::include_graphics("man/figures/coverage_fixture.png")
```

I thought `it()` might be the cause, so I swapped it for `test_that()`, but the result was the same :(
### Helper without fixture (works!)
To make sure it wasn't the `process_text()` function or the helper, I also tested loading the `penguins_raw` data directly from the `palmerpenguins` package (i.e., not using the fixture):
```{r , eval=FALSE}
describe(
  "Feature: Process text from dataset
      As a ...
      I want to ...
      So that I ...", code = {
    it(
      "Scenario: scenario
        Given ...
        When ...
        Then ...", code = {
        # helper
        test_logger(start = "process_text()", msg = "names palmerpenguins::penguins_raw")
        # data frame package
        test_data <- palmerpenguins::penguins_raw
        # test
        processed_data <- process_text(raw_data = test_data, fct = TRUE)
        nms <- c("studyname",
                 "sample_number",
                 "species",
                 "region",
                 "island",
                 "stage",
                 "individual_id",
                 "clutch_completion",
                 "date_egg",
                 "culmen_length_mm",
                 "culmen_depth_mm",
                 "flipper_length_mm",
                 "body_mass_g",
                 "sex",
                 "delta_15_n_o_oo",
                 "delta_13_c_o_oo",
                 "comments")
        expect_equal(object = names(processed_data), expected = nms)
        test_logger(end = "process_text()", msg = "names palmerpenguins::penguins_raw")
      })
  })
```

#### devtools:::test_active_file()
```
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ]
INFO [2023-11-09 15:03:36] [ START process_text() = names palmerpenguins::penguins_raw]
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]
INFO [2023-11-09 15:03:36] [ END process_text() = names palmerpenguins::penguins_raw]
```

#### devtools:::test_coverage_active_file()
```{r coverage_helper.png, echo=FALSE}
knitr::include_graphics("man/figures/coverage_helper.png")
```
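Both tests above call `test_logger()`, whose definition is not shown in this README. The following is a base R sketch of a helper that would produce log lines in the same shape as the output above. `test_logger_sketch()` is a hypothetical name, and the real helper in `tests/testthat/helper.R` may be written differently (for example, with a logging package).

```r
# Hypothetical stand-in for test_logger(); the real helper may differ.
test_logger_sketch <- function(start = NULL, end = NULL, msg) {
  # pick the label and function name from whichever argument was supplied
  if (!is.null(start)) {
    label <- "START"
    fun <- start
  } else if (!is.null(end)) {
    label <- "END"
    fun <- end
  } else {
    stop("supply either `start` or `end`")
  }
  # build a timestamped line shaped like the log output shown above
  line <- sprintf("INFO [%s] [ %s %s = %s]",
                  format(Sys.time(), "%Y-%m-%d %H:%M:%S"), label, fun, msg)
  message(line)
  invisible(line)
}

test_logger_sketch(start = "process_text()", msg = "names penguins_raw.csv")
```

Because files named `helper*.R` in `tests/testthat/` are sourced by testthat before tests run, a function defined this way is available inside every test file.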
## Other functions

`strutilities` has two other weird functions (`sep_cols_mult()` and `pivot_term_long()`) for manipulating strings/character columns (all written in base R to keep dependencies at a minimum).
### pivot_term_long()
This is an odd version of `pivot_wider()` that's been adapted for vectors:
```{r}
pivot_term_long("A large size in stockings is hard to sell.")
```

You can pass multiple 'terms' and it returns a data.frame with each unique term:
```{r}
terms <- c("A large size in stockings is hard to sell.", "The first part of the plan needs changing.")
pivot_term_long(terms)
```

### sep_cols_mult()
This is *somewhat* similar to `tidyr::separate()`, but it always uses `"[^[:alnum:]]+"` as the `sep` and keeps all the items resulting from the regex.
```{r}
d <- data.frame(value = c(29L, 91L, 39L, 28L, 12L),
                full_name = c("John", "John, Jacob",
                              "John, Jacob, Jingleheimer",
                              "Jingleheimer, Schmidt",
                              "JJJ, Schmidt"))
d
sep_cols_mult(data = d, col = "full_name", col_prefix = "name")
```
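The splitting idea described above can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: `split_col_sketch()` is a hypothetical name, and the `prefix_1`, `prefix_2`, ... column naming and `NA` padding are guesses about reasonable behaviour.

```r
# Hypothetical sketch of splitting one column into several;
# column naming and NA padding are assumptions.
split_col_sketch <- function(data, col, col_prefix) {
  # split each value on runs of non-alphanumeric characters
  parts <- strsplit(as.character(data[[col]]), "[^[:alnum:]]+")
  # the row with the most items decides how many new columns are needed
  n_max <- max(lengths(parts))
  # pad shorter rows with NA so every row has n_max items
  padded <- lapply(parts, function(p) {
    c(p, rep(NA_character_, n_max - length(p)))
  })
  new_cols <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(new_cols) <- paste(col_prefix, seq_len(n_max), sep = "_")
  # keep the original columns and append the new ones
  cbind(data, new_cols)
}

d2 <- data.frame(value = c(29L, 91L),
                 full_name = c("John", "John, Jacob"))
split_col_sketch(data = d2, col = "full_name", col_prefix = "name")
```

Keeping every item (rather than dropping or merging extras, as `tidyr::separate()` does by default with a warning) means the number of output columns depends on the data.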