Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/daranzolin/hacksaw

Extra tidyverse-like functionality
https://github.com/daranzolin/hacksaw

data-manipulation r

Last synced: about 2 months ago
JSON representation

Extra tidyverse-like functionality

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# hacksaw

![](https://camo.githubusercontent.com/ea6e0ff99602c3563e3dd684abf60b30edceaeef/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6c6966656379636c652d6578706572696d656e74616c2d6f72616e67652e737667)
![CRAN log](http://www.r-pkg.org/badges/version/hacksaw)
![](http://cranlogs.r-pkg.org/badges/grand-total/hacksaw)
[![Travis build status](https://travis-ci.org/daranzolin/hacksaw.svg?branch=master)](https://travis-ci.com/daranzolin/hacksaw)

hacksaw is as an adhesive between various dplyr and purrr operations, with some extra tidyverse-like functionality (e.g. keeping NAs, shifting row values) and shortcuts (e.g. filtering patterns, casting, plucking, etc.).

## Installation

You can install the released version of hacksaw from CRAN with:

``` r
install.packages("hacksaw")
```

Or install the development version from GitHub with:

```r
remotes::install_github("daranzolin/hacksaw")
```

## Split operations

hacksaw's assortment of split operations recycle the original data frame. This is useful when you want to run slightly different code on the same object multiple times (e.g. assignment) or you want to take advantage of some list functionality (e.g. purrr, `lengths()`, `%->%`, etc.).

The useful`%<-%` and `%->%` operators are re-exported from [the zeallot package.](https://github.com/r-lib/zeallot)

### filter

```{r warning=FALSE, message=FALSE}
library(hacksaw)
library(tidyverse)

iris %>%
filter_split(
large_petals = Petal.Length > 5.1,
large_sepals = Sepal.Length > 6.4
) %>%
map(summary)
```

### select

Include multiple columns and select helpers within `c()`:

```{r}
iris %>%
select_split(
sepal_data = c(Species, starts_with("Sepal")),
petal_data = c(Species, starts_with("Petal"))
) %>%
str()
```

### count

Count across multiple variables:

```{r}
mtcars %>%
count_split(
cyl,
carb,
gear
)
```

### rolling_count_split

Rolling counts, left-to-right

```{r}
mtcars %>%
rolling_count_split(
cyl,
carb,
gear
)
```

### distinct

Easily get the unique values of multiple columns:

```{r}
starwars %>%
distinct_split(skin_color, eye_color, homeworld) %>%
str() # lengths() is also useful
```

### mutate

```{r}
iris %>%
mutate_split(
Sepal.Length2 = Sepal.Length * 2,
Sepal.Length3 = Sepal.Length * 3
) %>%
str()
```

### group_by

Separate groups:

```{r}
mtcars %>%
group_by_split(cyl, gear, am, across(c(cyl, gear))) %>%
map(tally, wt = vs)
```

### rolling_group_by_split

Rolling groups, left-to-right:

```{r}
mtcars %>%
rolling_group_by_split(
cyl,
carb,
gear
) %>%
map(summarize, mean_mpg = mean(mpg))
```

### nest_by

```{r}
mtcars %>%
nest_by_split(cyl, gear) %>%
map(mutate, model = list(lm(mpg ~ wt, data = data)))
```

### rolling_nest_by

```{r}
mtcars %>%
rolling_nest_by_split(cyl, gear) %>%
map(mutate, model = list(lm(mpg ~ wt, data = data)))
```

### transmute

```{r}
iris %>%
transmute_split(Sepal.Length * 2, Petal.Width + 5) %>%
str()
```

### slice

```{r}
iris %>%
slice_split(1:10, 11:15, 30:50) %>%
str()
```

Use the `var_max` and `var_min` helpers to easily get minimum and maximum values of a variable:

```{r}
iris %>%
slice_split(
largest_sepals = var_max(Sepal.Length, 4),
smallest_sepals = var_min(Sepal.Length, 4)
)#
```

### precision_split

`precision_split` splits the mtcars data frame into two: one with mpg greater than 20, one with mpg less than 20:

```{r}
mtcars %>%
precision_split(mpg > 20) %->% c(lt20mpg, gt20mpg)

str(gt20mpg)
str(lt20mpg)
```

### eval_split

Evaluate any expression:

```{r}
mtcars %>%
eval_split(
select(hp, mpg),
filter(mpg > 25),
mutate(pounds = wt*1000)
) %>%
str()
```

## Casting

Tired of `mutate(var = as.[character|numeric|logical](var))`?

```{r}
starwars %>% cast_character(height, mass) %>% str(max.level = 2)
iris %>% cast_character(contains(".")) %>% str(max.level = 1)
```

hacksaw also includes `cast_numeric` and `cast_logical`.

## Keeping NAs

The reverse of `tidyr::drop_na`, strangely omitted in the original tidyverse.

```{r}
df <- tibble(x = c(1, 2, NA, NA, NA), y = c("a", NA, "b", NA, NA))
df %>% keep_na()
df %>% keep_na(x)
df %>% keep_na(x, y)
```

## Coercive joins

I never care if my join keys are incompatible. The `*_join2` suite of functions coerce either the left or right table accordingly.

```{r}
df1 <- tibble(x = 1:10, b = 1:10, y = letters[1:10])
df2 <- tibble(x = as.character(1:10), z = letters[11:20])
left_join2(df1, df2)
```

## Shifting row values

Shift values across rows in either direction. Sometimes useful when importing irregularly-shaped tabular data.

```{r}
df <- tibble(
s = c(NA, 1, NA, NA),
t = c(NA, NA, 1, NA),
u = c(NA, NA, 2, 5),
v = c(5, 1, 9, 2),
x = c(1, 5, 6, 7),
y = c(NA, NA, 8, NA),
z = 1:4
)
df

shift_row_values(df)
shift_row_values(df, at = 1:3)
shift_row_values(df, at = 1:2, .dir = "right")

```

## Filtering, keeping, and discarding patterns

A wrapper around `filter(grepl(..., var))`:

```{r}
starwars %>%
filter_pattern(homeworld, "oo") %>%
distinct(homeworld)
```

Use `keep_pattern` and `discard_pattern` for lists and vectors.

## Plucking values

A wrapper around `x[p][i]`:

```{r}
df <- tibble(
id = c(1, 1, 1, 2, 2, 2, 3, 3),
tested = c("no", "no", "yes", "no", "no", "no", "yes", "yes"),
year = c(2015:2017, 2010:2012, 2019:2020)
)

df %>%
group_by(id) %>%
mutate(year_first_tested = pluck_when(year, tested == "yes"))
```