https://github.com/moodymudskipper/fastpipe

A fast pipe implementation

---
output: github_document
---

[![Travis build status](https://travis-ci.org/moodymudskipper/fastpipe.svg?branch=master)](https://travis-ci.org/moodymudskipper/fastpipe)
[![Codecov test coverage](https://codecov.io/gh/moodymudskipper/fastpipe/branch/master/graph/badge.svg)](https://codecov.io/gh/moodymudskipper/fastpipe?branch=master)

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# fastpipe

This package proposes an alternative to the pipe from the *magrittr* package.

Its pipe has the same name and passes all of *magrittr*'s tests, so it can easily
serve as a drop-in replacement. Its main advantages are that it is faster and
solves most of *magrittr*'s issues.

Install with:

``` r
remotes::install_github("moodymudskipper/fastpipe")
```

## Issues solved and special features

*fastpipe* passes all of *magrittr*'s tests, but it doesn't stop there: it also
solves most of its issues, at least most of the open issues on GitHub.

```{r, include = FALSE}
library(fastpipe)
```

* Lazy evaluation

```{r}
# https://github.com/tidyverse/magrittr/issues/195
# Issue with lazy evaluation #195
gen <- function(x) {function() eval(quote(x))}
identical(
{fn <- gen(1); fn()},
{fn <- 1 %>% gen(); fn()})
```

``` {r, eval = FALSE}
# https://github.com/tidyverse/magrittr/issues/159
# Pipes of functions have broken environments
identical(
{
compose <- function(f, g) { function(x) g(f(x)) }
plus1 <- function(x) x + 1
compose(plus1, plus1)(5)},
{
plus2 <- plus1 %>% compose(plus1)
plus2(5)
})
#> [1] TRUE
# Note : copy and pasted because didn't work with knitr
```

* It considers `!!!.` as `.` when deciding whether to insert a dot

This was requested by Lionel and seems fairly reasonable, as it is unlikely that
a user will need to use both `.` and `!!!.` in a call.

```{r}
letters[1:3] %>% rlang::list2(!!!.)
```

* `:::`, `::` and `$` get a special treatment to solve a common issue

```{r}
iris %>% base::dim
iris %>% base:::dim
x <- list(y = dim)
iris %>% x$y
```

* It fails explicitly in some cases rather than allowing strange behavior silently

```{r, error = TRUE}
iris %>% head %<>% dim
. %<>% head
```

* The new pipe `%S>%` allows the use of *rlang*'s `!!!` operator to splice dots in
any function.

```{r}
c(a = 1, b = 2) %S>% data.frame(!!!.)
```

* The new pipe `%L>%` behaves like `%>%` except that it logs to the console the
calls and the time they took to run.

```{r}
1000000 %L>% rnorm() %L>% sapply(cos) %>% max
```
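
To give an idea of what a logging pipe involves, here is a minimal base R sketch.
It is not *fastpipe*'s code: the operator name `%log>%` is made up for the
illustration, it needs an explicit dot on the rhs, and it reports through `message()`.

```r
# Hypothetical logging pipe; `%log>%` is NOT a fastpipe operator.
`%log>%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)   # capture the rhs unevaluated
  caller <- parent.frame()      # where the pipe was called from
  force(lhs)                    # evaluate the lhs before starting the clock
  elapsed <- system.time(
    res <- eval(rhs_call, list(. = lhs), caller)
  )[["elapsed"]]
  message(paste(deparse(rhs_call), collapse = " "), "  ~", format(elapsed), " sec")
  res
}

# Each step logs its call and elapsed time, then passes its value on
out <- c(1, 4, 9) %log>% sqrt(.) %log>% sum(.)
out
```

The lhs is forced before timing so that each log line only measures its own step.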

## Families of pipes and performance

* A `%>%` family of pipes reproduces the pipes offered by *magrittr*, faster and more robustly,
and extends some of their features.
* A `%>>%` family of pipes offers very fast alternatives, provided the dots are given explicitly on the
rhs. These pipes support neither functional sequences (`. %>% foo()`) nor compound assignment (`bar %<>% baz()`). We call them *bare pipes*.

We provide a benchmark below. The last entry is given for context, as a reminder
that we are discussing small time intervals.

```{r}
`%.%` <- fastpipe::`%>%` # (Note that a *fastpipe*, unlike a *magrittr* pipe, can be copied)
`%>%` <- magrittr::`%>%`
bench::mark(check = FALSE,
'magrittr::`%>%`' =
1 %>% identity %>% identity() %>% (identity) %>% {identity(.)},
'fastpipe::`%>%`' =
1 %.% identity %.% identity() %.% (identity) %.% {identity(.)},
'fastpipe::`%>>%`' =
1 %>>% identity(.) %>>% identity(.) %>>% identity(.) %>>% identity(.),
base = identity(identity(identity(identity(1)))),
`median(1:3)` = median(1:3)
)
rm(`%>%`) # resetting `%>%` to fastpipe::`%>%`
```

We see that our `%>%` pipe is twice as fast as *magrittr*'s, and that our `%>>%` pipe
is 10 times faster (on a properly formatted call).

We'd like to stress that a pipe is unlikely to be the performance bottleneck in
a script or a function. If optimal performance is
critical, pipes are best avoided as they will always have an overhead.

In other cases, `%>>%` is fast, robust to program with, and keeps the main benefit of piping.

## Implementation

The implementation is completely different from *magrittr*'s. We can sum it up as
follows:

* `%>>%` pipes: evaluate the *rhs* in the parent environment, overriding `.` with the value
of the *lhs*. Their code is extremely straightforward.

```{r}
`%>>%`
```
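
For readers without the package at hand, the idea can be approximated in a few
lines of base R. This is a sketch under assumptions, not the function printed
above: the name `%bare>%` is made up, and the real operator may differ in details.

```r
# Hypothetical stand-in for a bare pipe; `%bare>%` is NOT part of fastpipe.
`%bare>%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)  # capture the rhs unevaluated
  caller <- parent.frame()     # environment the pipe was called from
  # Evaluate the rhs there, with `.` bound to the value of the lhs
  eval(rhs_call, envir = list(. = lhs), enclos = caller)
}

y <- 2
res <- 1:10 %bare>% head(., 3) %bare>% sum(., y)
res
```

Because the dot is never inserted automatically, there is nothing to guess and
very little work to do per call, which is where the speed comes from.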


* `%>%` pipes:
    * Use heuristics to insert dots, so `x %>% head` becomes `x %>% head(.)`
    * Support functional sequences, so pipe chains starting with `.` are functions
    * Support compound assignment, so pipe chains starting with `%<>%` assign in
    place to the lhs
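
The dot-insertion heuristics can be pictured as a rewrite of the quoted rhs. The
helper below is a simplified sketch, assuming the rules described in this README
(bare symbols, parenthesized expressions, `::`, `:::`, `$`, and calls already
containing `.` or `!!!.`). The name `insert_dot()` is hypothetical and the
package's real rules are more thorough.

```r
# Hypothetical helper, NOT fastpipe's internal code.
insert_dot <- function(rhs) {
  dot <- quote(.)
  # `x %>% head`: a bare function name becomes `head(.)`
  if (is.symbol(rhs)) return(as.call(list(rhs, dot)))
  if (!is.call(rhs)) return(rhs)
  head_fun <- rhs[[1]]
  # `(f)`, `pkg::f`, `pkg:::f` and `x$f` yield a function: call the result on `.`
  if (is.symbol(head_fun) &&
      as.character(head_fun) %in% c("(", "::", ":::", "$")) {
    return(as.call(list(rhs, dot)))
  }
  # `{ ... }` blocks are left as they are
  if (identical(head_fun, as.name("{"))) return(rhs)
  # A `.` or `!!!.` already among the arguments: leave the call alone
  args <- as.list(rhs)[-1]
  has_dot <- any(vapply(
    args,
    function(a) identical(a, dot) || identical(a, quote(!!!.)),
    logical(1)
  ))
  if (has_dot) return(rhs)
  # Otherwise insert `.` as the first argument
  as.call(c(list(head_fun, dot), args))
}

insert_dot(quote(head))       # head(.)
insert_dot(quote(head(2)))    # head(., 2)
insert_dot(quote(base::dim))  # base::dim(.)

# The rewritten call can then be evaluated with `.` bound to the lhs
dims <- eval(insert_dot(quote(base::dim)), list(. = matrix(0, 2, 3)))
dims
```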

To support the latter two, we use global variables stored in `globals`, a child
environment of our package's environment. It contains the values `master`, `is_fs`
and `is_compound`. Outside of pipes, `master` is always `TRUE` while the other two
are always `FALSE`. The alternative to global parameters was to play with
classes and attributes, but that was slower and more complex.
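
As an illustration of this kind of shared state, here is a minimal base R sketch.
The field names follow the text, but `reset_globals()` and `with_master()` are
hypothetical helpers and the package's actual code is organized differently.

```r
# Hypothetical state holder; only the field names come from the README.
globals <- new.env()
globals$master <- TRUE
globals$is_fs <- FALSE
globals$is_compound <- FALSE

reset_globals <- function() {
  assign("master", TRUE, envir = globals)
  assign("is_fs", FALSE, envir = globals)
  assign("is_compound", FALSE, envir = globals)
}

# Only the outermost call sees master == TRUE. It takes ownership of the state
# and registers the reset, which runs even if `body()` throws an error.
with_master <- function(body) {
  if (globals$master) {
    assign("master", FALSE, envir = globals)
    on.exit(reset_globals())
  }
  body()
}

inside <- with_master(function() globals$master)  # FALSE while inside
after <- globals$master                           # TRUE again afterwards
```

Using `on.exit()` rather than a reset at the end of the function is what keeps
the state consistent after a failed pipe call.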

Whenever we enter a pipe we check the value of `globals$master`.
If it is `TRUE` we enter a sequence where:

* We set `master` to `FALSE`, and use `on.exit()` to set up the restoration of
global parameters to their default values at the end of the call.
* We re-execute the whole call (which now won't enter this sequence).
    * If neither `.` nor `%<>%` is hit in the call recursion, we just evaluate a sequence
    of calls as described by `%>>%`'s code, making the implicit dots explicit as we go.
    * If the chain starts with `.`:
        * We switch on `is_fs`.
        * When `is_fs` is switched on we return the original call, unevaluated, substituting
        the pipe with its bare version (the name of the bare version is stored as
        an attribute of the pipe) and adding explicit dots (so `. %>% head(2)`
        becomes `. %>>% head(., 2)`).
        * It works recursively and we end up with a quoted pipe chain of bare pipes
        starting with a dot.
    * If the chain starts with a `%<>%` call:
        * We switch on `is_compound`.
        * We define a global variable named `compound_lhs`, which contains
        the quoted lhs of the `%<>%` call.
        * We re-evaluate the `%<>%` call, substituting `%<>%` with `%>>%`.
        * Subsequent pipe calls are then recognized as standard.
* Once we get the result:
    * If `is_fs` was switched on, we wrap our quoted pipe in a function and return it.
    * If `is_compound` was switched on, we assign the result, in the calling environment,
    to the expression stored in `globals$compound_lhs`.
    * Otherwise we just return the result.
* Default global values are restored, thanks to our `on.exit()` call.
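
The last two steps, wrapping a quoted chain in a function and assigning to a
quoted lhs, can be sketched in base R as follows. All names here (`%bare>%`,
`as_fs()`, `assign_compound()`) are hypothetical and only illustrate the
mechanism under the assumptions stated in the comments.

```r
# Hypothetical helpers, NOT fastpipe's internals.

# A stand-in bare pipe: evaluate the rhs with `.` bound to the lhs
`%bare>%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)
  caller <- parent.frame()
  eval(rhs_call, envir = list(. = lhs), enclos = caller)
}

# Functional sequence: wrap a quoted chain of bare pipes in a function of `.`
as_fs <- function(quoted_chain, env = parent.frame()) {
  f <- function(.) NULL
  body(f) <- quoted_chain
  environment(f) <- env
  f
}

# Compound assignment: assign `value` to the quoted lhs in the calling environment
assign_compound <- function(lhs_expr, value, env = parent.frame()) {
  eval(call("<-", lhs_expr, value), env)
  invisible(value)
}

chain <- quote(. %bare>% head(., 2) %bare>% rev(.))
first_two_reversed <- as_fs(chain)
fs_res <- first_two_reversed(letters)

x <- c(3, 1, 2)
assign_compound(quote(x), sort(x))  # like `x %<>% sort()`
assign_compound(quote(x[1]), 10)    # the lhs can be any assignable expression
```

Storing the lhs as a quoted expression rather than as a name is what lets the
assignment target things like `x[1]` and not only plain variables.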

Each *fastpipe* operator contains its own code, while in *magrittr*'s current
implementation all operators share the same main code and are recognized inside the function
by their name. Moreover, the pipe names are not hardcoded in the functions, which
prevents confusion such as the fact that `%$%` still sometimes works in *magrittr*
when only `%>%` is reexported (https://github.com/tidyverse/magrittr/issues/194).

The pipes have a class and a printing method; the functional sequences don't.

```{r}
`%T>>%`
. %>% sin %>% cos() %T>% tan(.)
```

## Benchmarks for functional sequences

An interesting and little-known fact is that using functional sequences in functionals
is much more efficient than using lambda functions that call pipes (though the gain will
generally be largely offset by the content of the function). For instance,
`purrr::map(foo, ~ .x %>% bar %>% baz)` is slower than
`purrr::map(foo, . %>% bar %>% baz)`. I tried to keep this nice feature but
didn't manage to make it as efficient as in *magrittr*. The difference shows only
for a large number of iterations and will probably be negligible in any realistic loop.

```{r}
`%.%` <- fastpipe::`%>%`
`%>%` <- magrittr::`%>%`
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# functional sequence with 5 iterations, equivalent speed
library(rlang) # to call `as_function()`
bench::mark(check = FALSE,
rlang_lambda =
vapply(1:5, as_function(~.x %>% identity %>% identity() %>% (identity) %>% {identity(.)}), integer(1)),
fastpipe =
vapply(1:5, . %.% identity %.% identity() %.% (identity) %.% {identity(.)}, integer(1)),
magrittr =
vapply(1:5, . %>% identity %>% identity() %>% (identity) %>% {identity(.)}, integer(1)),
base =
vapply(1:5, function(x) identity(identity(identity(identity(x)))) , integer(1)),
`median(1:3)` = median(1:3)
)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# functional sequence with 1000 iterations
bench::mark(check = FALSE,
rlang_lambda =
vapply(1:1000, as_function(~.x %>% identity %>% identity() %>% (identity) %>% {identity(.)}), integer(1)),
fastpipe =
vapply(1:1000, . %.% identity %.% identity() %.% (identity) %.% {identity(.)}, integer(1)),
magrittr =
vapply(1:1000, . %>% identity %>% identity() %>% (identity) %>% {identity(.)}, integer(1)),
base =
vapply(1:1000, function(x) identity(identity(identity(identity(x)))) , integer(1))
)
```

## Breaking changes

The fact that the tests of *magrittr* are passed by *fastpipe* doesn't mean
that *fastpipe* will necessarily behave as expected by *magrittr*'s users in
all circumstances.

Here are the main such cases:

* In *fastpipe*, functional sequences don't have a class and are not subsettable.
This is a feature that Stefan Bache wanted to weed out of his package, but its
absence can still break some code.

* We said earlier that the instances of using both `!!!.` and `.` are expected
to be rare, but they happen nonetheless. The call `mtcars[8:11] %>% dplyr::count(!!!.)`
works with *magrittr* but with *fastpipe* we need `mtcars[8:11] %>% dplyr::count(., !!!.)`.

* If `%T>%` or `%$%` are not reexported by a package which reexports `%>%`,
they will not be usable at all, while at the moment after attaching dplyr you can
do `cars %$% speed %>% head(2)` even if `%$%` is nowhere to be found. It's a feature
of *fastpipe* rather than a bug but will still break the code of those who forgot
to attach *magrittr* and had their code work by luck.

* If reexporting `%>%` from *fastpipe*, one must also reexport `%>>%` as it is
used by functional sequences.

* If you do strange things like ``cars %>% `$`("spee")``, it won't work
with *fastpipe*, because `$`, `::` and `:::` are special cases: *fastpipe*
would basically try to run `` `$`("spee", )(cars) `` and choke because there'd be
no second argument to `$`.

## Notes

* The package *magrittr* was created by Stefan Milton Bache and Hadley Wickham.
The design of the pipe's interface and most of the testing code of this package
are their work or the work of other *magrittr* contributors, while none of the
remaining code is.
* The package is young and might still change.
* *magrittr*'s pipe is widespread and reexported by prominent *tidyverse*
packages. It could have been annoying if they masked *fastpipe*'s operator(s)
each time, so I set a hook so that whenever one of those prominent packages is
attached, *fastpipe* will be detached and reattached at the end of
the search path. This assumes that if you attach *fastpipe* you want to use
its pipes by default, which I think is reasonable. A message makes it explicit,
for instance:

> Attaching package: ‘dplyr’
>
> The following object is masked \_by\_ ‘package:fastpipe’:
>
> %>%

The current list of packages subject to this hook is: *magrittr*, *dplyr*, *purrr*, *tidyr*, *stringr*, *forcats*, *rvest*, *modelr*, *testthat*. However, many more packages reexport
*magrittr*'s pipe, and the safest way to make sure you're using *fastpipe* is to
attach it after all other packages.