Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/daranzolin/hacksaw
Extra tidyverse-like functionality
https://github.com/daranzolin/hacksaw
data-manipulation r
Last synced: about 2 months ago
JSON representation
Extra tidyverse-like functionality
- Host: GitHub
- URL: https://github.com/daranzolin/hacksaw
- Owner: daranzolin
- License: other
- Created: 2020-05-25T07:04:21.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-04-08T01:51:01.000Z (almost 4 years ago)
- Last Synced: 2024-08-13T07:13:23.769Z (5 months ago)
- Topics: data-manipulation, r
- Language: R
- Homepage:
- Size: 187 KB
- Stars: 33
- Watchers: 2
- Forks: 3
- Open Issues: 4
-
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - daranzolin/hacksaw - Extra tidyverse-like functionality (R)
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```# hacksaw
![](https://camo.githubusercontent.com/ea6e0ff99602c3563e3dd684abf60b30edceaeef/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6c6966656379636c652d6578706572696d656e74616c2d6f72616e67652e737667)
![CRAN log](http://www.r-pkg.org/badges/version/hacksaw)
![](http://cranlogs.r-pkg.org/badges/grand-total/hacksaw)
[![Travis build status](https://travis-ci.org/daranzolin/hacksaw.svg?branch=master)](https://travis-ci.com/daranzolin/hacksaw)hacksaw is as an adhesive between various dplyr and purrr operations, with some extra tidyverse-like functionality (e.g. keeping NAs, shifting row values) and shortcuts (e.g. filtering patterns, casting, plucking, etc.).
## Installation
You can install the released version of hacksaw from CRAN with:
``` r
install.packages("hacksaw")
```Or install the development version from GitHub with:
```r
remotes::install_github("daranzolin/hacksaw")
```## Split operations
hacksaw's assortment of split operations recycle the original data frame. This is useful when you want to run slightly different code on the same object multiple times (e.g. assignment) or you want to take advantage of some list functionality (e.g. purrr, `lengths()`, `%->%`, etc.).
The useful`%<-%` and `%->%` operators are re-exported from [the zeallot package.](https://github.com/r-lib/zeallot)
### filter
```{r warning=FALSE, message=FALSE}
library(hacksaw)
library(tidyverse)iris %>%
filter_split(
large_petals = Petal.Length > 5.1,
large_sepals = Sepal.Length > 6.4
) %>%
map(summary)
```### select
Include multiple columns and select helpers within `c()`:
```{r}
iris %>%
select_split(
sepal_data = c(Species, starts_with("Sepal")),
petal_data = c(Species, starts_with("Petal"))
) %>%
str()
```### count
Count across multiple variables:
```{r}
mtcars %>%
count_split(
cyl,
carb,
gear
)
```### rolling_count_split
Rolling counts, left-to-right
```{r}
mtcars %>%
rolling_count_split(
cyl,
carb,
gear
)
```### distinct
Easily get the unique values of multiple columns:
```{r}
starwars %>%
distinct_split(skin_color, eye_color, homeworld) %>%
str() # lengths() is also useful
```### mutate
```{r}
iris %>%
mutate_split(
Sepal.Length2 = Sepal.Length * 2,
Sepal.Length3 = Sepal.Length * 3
) %>%
str()
```### group_by
Separate groups:
```{r}
mtcars %>%
group_by_split(cyl, gear, am, across(c(cyl, gear))) %>%
map(tally, wt = vs)
```### rolling_group_by_split
Rolling groups, left-to-right:
```{r}
mtcars %>%
rolling_group_by_split(
cyl,
carb,
gear
) %>%
map(summarize, mean_mpg = mean(mpg))
```### nest_by
```{r}
mtcars %>%
nest_by_split(cyl, gear) %>%
map(mutate, model = list(lm(mpg ~ wt, data = data)))
```### rolling_nest_by
```{r}
mtcars %>%
rolling_nest_by_split(cyl, gear) %>%
map(mutate, model = list(lm(mpg ~ wt, data = data)))
```### transmute
```{r}
iris %>%
transmute_split(Sepal.Length * 2, Petal.Width + 5) %>%
str()
```### slice
```{r}
iris %>%
slice_split(1:10, 11:15, 30:50) %>%
str()
```Use the `var_max` and `var_min` helpers to easily get minimum and maximum values of a variable:
```{r}
iris %>%
slice_split(
largest_sepals = var_max(Sepal.Length, 4),
smallest_sepals = var_min(Sepal.Length, 4)
)#
```### precision_split
`precision_split` splits the mtcars data frame into two: one with mpg greater than 20, one with mpg less than 20:
```{r}
mtcars %>%
precision_split(mpg > 20) %->% c(lt20mpg, gt20mpg)str(gt20mpg)
str(lt20mpg)
```### eval_split
Evaluate any expression:
```{r}
mtcars %>%
eval_split(
select(hp, mpg),
filter(mpg > 25),
mutate(pounds = wt*1000)
) %>%
str()
```## Casting
Tired of `mutate(var = as.[character|numeric|logical](var))`?
```{r}
starwars %>% cast_character(height, mass) %>% str(max.level = 2)
iris %>% cast_character(contains(".")) %>% str(max.level = 1)
```hacksaw also includes `cast_numeric` and `cast_logical`.
## Keeping NAs
The reverse of `tidyr::drop_na`, strangely omitted in the original tidyverse.
```{r}
df <- tibble(x = c(1, 2, NA, NA, NA), y = c("a", NA, "b", NA, NA))
df %>% keep_na()
df %>% keep_na(x)
df %>% keep_na(x, y)
```## Coercive joins
I never care if my join keys are incompatible. The `*_join2` suite of functions coerce either the left or right table accordingly.
```{r}
df1 <- tibble(x = 1:10, b = 1:10, y = letters[1:10])
df2 <- tibble(x = as.character(1:10), z = letters[11:20])
left_join2(df1, df2)
```## Shifting row values
Shift values across rows in either direction. Sometimes useful when importing irregularly-shaped tabular data.
```{r}
df <- tibble(
s = c(NA, 1, NA, NA),
t = c(NA, NA, 1, NA),
u = c(NA, NA, 2, 5),
v = c(5, 1, 9, 2),
x = c(1, 5, 6, 7),
y = c(NA, NA, 8, NA),
z = 1:4
)
dfshift_row_values(df)
shift_row_values(df, at = 1:3)
shift_row_values(df, at = 1:2, .dir = "right")```
## Filtering, keeping, and discarding patterns
A wrapper around `filter(grepl(..., var))`:
```{r}
starwars %>%
filter_pattern(homeworld, "oo") %>%
distinct(homeworld)
```Use `keep_pattern` and `discard_pattern` for lists and vectors.
## Plucking values
A wrapper around `x[p][i]`:
```{r}
df <- tibble(
id = c(1, 1, 1, 2, 2, 2, 3, 3),
tested = c("no", "no", "yes", "no", "no", "no", "yes", "yes"),
year = c(2015:2017, 2010:2012, 2019:2020)
)df %>%
group_by(id) %>%
mutate(year_first_tested = pluck_when(year, tested == "yes"))
```