Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/renkun-ken/piper
Multi-Paradigm Pipeline Implementation
https://github.com/renkun-ken/piper
Last synced: 12 days ago
JSON representation
Multi-Paradigm Pipeline Implementation
- Host: GitHub
- URL: https://github.com/renkun-ken/piper
- Owner: renkun-ken
- License: other
- Created: 2014-04-07T13:16:35.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2016-08-26T00:50:14.000Z (about 8 years ago)
- Last Synced: 2024-10-19T05:59:15.478Z (20 days ago)
- Language: R
- Homepage:
- Size: 417 KB
- Stars: 169
- Watchers: 26
- Forks: 39
- Open Issues: 5
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
README
```{r knitsetup, echo=FALSE, results='hide', warning=FALSE, message=FALSE, cache=FALSE}
library(knitr)
opts_knit$set(base.dir='./', fig.path='', out.format='md')
opts_chunk$set(prompt=FALSE, comment='', results='markup')
library(pipeR)
```# pipeR
[![Linux Build Status](https://travis-ci.org/renkun-ken/pipeR.png?branch=master)](https://travis-ci.org/renkun-ken/pipeR)
[![Windows Build status](https://ci.appveyor.com/api/projects/status/github/renkun-ken/pipeR?svg=true)](https://ci.appveyor.com/project/renkun-ken/pipeR)
[![codecov.io](http://codecov.io/github/renkun-ken/pipeR/coverage.svg?branch=master)](http://codecov.io/github/renkun-ken/pipeR?branch=master)
[![CRAN Version](http://www.r-pkg.org/badges/version/pipeR)](https://cran.r-project.org/package=pipeR)pipeR provides various styles of function chaining methods:
* Pipe operator
* Pipe object
* pipeline functionEach of them represents a distinct pipeline model but they share almost a common set of features. A value can be piped to the next expression
* As the first unnamed argument of the function
* As dot symbol (`.`) in the expression
* As a named variable defined by a formula
* For side-effect that carries over the input to the next
* For assignment that saves an intermediate valueThe syntax is designed to make the pipeline more readable and friendly to
a wide variety of operations.**[pipeR Tutorial](http://renkun.me/pipeR-tutorial) is a highly recommended complete guide to pipeR.**
This document is also translated into [日本語](https://github.com/renkun-ken/pipeR/blob/master/README.ja.md) (by [@hoxo_m](https://github.com/hoxo-m/)).
## Installation
Install the latest development version from GitHub:
```r
devtools::install_github("renkun-ken/pipeR")
```Install from [CRAN](https://cran.r-project.org/package=pipeR):
```r
install.packages("pipeR")
```## Getting started
The following code is an example written in traditional approach:
It basically performs bootstrap on `mpg` values in built-in dataset `mtcars` and plots its density function estimated by Gaussian kernel.
```r
plot(density(sample(mtcars$mpg, size = 10000, replace = TRUE),
kernel = "gaussian"), col = "red", main="density of mpg (bootstrap)")
```The code is deeply nested and can be hard to read and maintain. In the following examples, the traditional code is rewritten by Pipe operator, `Pipe()` function and `pipeline()` function, respectively.
* Operator-based pipeline
```r
mtcars$mpg %>>%
sample(size = 10000, replace = TRUE) %>>%
density(kernel = "gaussian") %>>%
plot(col = "red", main = "density of mpg (bootstrap)")
```* Object-based pipeline (`Pipe()`)
```r
Pipe(mtcars$mpg)$
sample(size = 10000, replace = TRUE)$
density(kernel = "gaussian")$
plot(col = "red", main = "density of mpg (bootstrap)")
```* Argument-based pipeline
```r
pipeline(mtcars$mpg,
sample(size = 10000, replace = TRUE),
density(kernel = "gaussian"),
plot(col = "red", main = "density of mpg (bootstrap)"))
```* Expression-based pipeline
```r
pipeline({
mtcars$mpg
sample(size = 10000, replace = TRUE)
density(kernel = "gaussian")
plot(col = "red", main = "density of mpg (bootstrap)")
})
```## Usage
### `%>>%`
Pipe operator `%>>%` basically pipes the left-hand side value forward to the right-hand side expression which is evaluated according to its syntax.
#### Pipe to first-argument of function
Many R functions are pipe-friendly: they take some data by the first argument and transform it in a certain way. This arrangement allows operations to be streamlined by pipes, that is, one data source can be put to the first argument of a function, get transformed, and put to the first argument of the next function. In this way, a chain of commands are connected, and it is called a pipeline.
On the right-hand side of `%>>%`, whenever a function name or call is supplied, the left-hand side value will always be put to the first unnamed argument to that function.
```r
rnorm(100) %>>%
plot
``````r
rnorm(100) %>>%
plot(col="red")
```Sometimes the value on the left is needed at multiple places. One can use `.` to represent it anywhere in the function call.
```r
rnorm(100) %>>%
plot(col="red", main=length(.))
```There are situations where one calls a function in a namespace with `::`. In this case, the call must end up with `()`.
```r
rnorm(100) %>>%
stats::median()
rnorm(100) %>>%
graphics::plot(col = "red")
```#### Pipe to `.` in an expression
Not all functions are pipe-friendly in every case: You may find some functions do not take your data produced by a pipeline as the first argument. In this case, you can enclose your expression by `{}` or `()` so that `%>>%` will use `.` to represent the value on the left.
```r
mtcars %>>%
{ lm(mpg ~ cyl + wt, data = .) }
``````r
mtcars %>>%
( lm(mpg ~ cyl + wt, data = .) )
```#### Pipe by formula as lambda expression
Sometimes, it may look confusing to use `.` to represent the value being piped. For example,
```r
mtcars %>>%
(lm(mpg ~ ., data = .))
```Although it works perfectly, it may look ambiguous if `.` has several meanings in one line of code.
`%>>%` accepts lambda expression to direct its piping behavior. Lambda expression is characterized by a formula enclosed within `()`, for example, `(x ~ f(x))`. It contains a user-defined symbol to represent the value being piped and the expression to be evaluated.
```r
mtcars %>>%
(df ~ lm(mpg ~ ., data = df))
``````r
mtcars %>>%
subset(select = c(mpg, wt, cyl)) %>>%
(x ~ plot(mpg ~ ., data = x))
```#### Pipe for side effect
In a pipeline, one may be interested not only in the final outcome but sometimes also in intermediate results. To print, plot or save the intermediate results, it must be a side-effect to avoid breaking the mainstream pipeline. For example, calling `plot()` to draw scatter plot returns `NULL`, and if one directly calls `plot()` in the middle of a pipeline, it would break the pipeline by changing the subsequent input to `NULL`.
One-sided formula that starts with `~` indicates that the right-hand side expression will only be evaluated for its side-effect, its value will be ignored, and the input value will be returned instead.
```r
mtcars %>>%
subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
(~ cat("rows:",nrow(.),"\n")) %>>% # cat() returns NULL
summary
``````r
mtcars %>>%
subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
(~ plot(mpg ~ wt, data = .)) %>>% # plot() returns NULL
(lm(mpg ~ wt, data = .)) %>>%
summary()
```With `~`, side-effect operations can be easily distinguished from mainstream pipeline.
An easier way to print the intermediate value it to use `(? expr)` syntax like asking question.
```r
mtcars %>>%
(? ncol(.)) %>>%
summary
```#### Pipe with assignment
In addition to printing and plotting, one may need to save an intermediate value to the environment by assigning the value to a variable (symbol).
If one needs to assign the value to a symbol, just insert a step like `(~ symbol)`, then the input value of that step will be assigned to `symbol` in the current environment.
```r
mtcars %>>%
(lm(formula = mpg ~ wt + cyl, data = .)) %>>%
(~ lm_mtcars) %>>%
summary
```If the input value is not directly to be saved but after some transformation, then one can use `=`, `<-`, or more natural `->` to specify a lambda expression to tell what to be saved (thanks @yanlinlin82 for suggestion).
```r
mtcars %>>%
(~ summ = summary(.)) %>>% # side-effect assignment
(lm(formula = mpg ~ wt + cyl, data = .)) %>>%
(~ lm_mtcars) %>>%
summary
``````r
mtcars %>>%
(~ summary(.) -> summ) %>>%
mtcars %>>%
(~ summ <- summary(.)) %>>%
```An easier way to saving intermediate value that is to be further piped is to use `(symbol = expression)` syntax:
```r
mtcars %>>%
(~ summ = summary(.)) %>>% # side-effect assignment
(lm_mtcars = lm(formula = mpg ~ wt + cyl, data = .)) %>>% # continue piping
summary
```or `(expression -> symbol)` syntax:
```r
mtcars %>>%
(~ summary(.) -> summ) %>>% # side-effect assignment
(lm(formula = mpg ~ wt + cyl, data = .) -> lm_mtcars) %>>% # continue piping
summary
```#### Extract element from an object
`x %>>% (y)` means extracting the element named `y` from object `x` where `y` must be a valid symbol name and `x` can be a vector, list, environment or anything else for which `[[]]` is defined, or S4 object.
```r
mtcars %>>%
(lm(mpg ~ wt + cyl, data = .)) %>>%
(~ lm_mtcars) %>>%
summary %>>%
(r.squared)
```#### Compatibility
* Working with [dplyr](https://github.com/hadley/dplyr/):
```r
library(dplyr)
mtcars %>>%
filter(mpg <= mean(mpg)) %>>%
select(mpg, wt, cyl) %>>%
(~ plot(.)) %>>%
(model = lm(mpg ~ wt + cyl, data = .)) %>>%
(summ = summary(.)) %>>%
(coefficients)
```* Working with [ggvis](http://ggvis.rstudio.com/):
```r
library(ggvis)
mtcars %>>%
ggvis(~mpg, ~wt) %>>%
layer_points()
```* Working with [rlist](http://renkun.me/rlist/):
```r
library(rlist)
1:100 %>>%
list.group(. %% 3) %>>%
list.mapv(g ~ mean(g))
```### `Pipe()`
`Pipe()` creates a Pipe object that supports light-weight chaining without any external operator. Typically, start with `Pipe()` and end with `$value` or `[]` to extract the final value of the Pipe.
Pipe object provides an internal function `.(...)` that work exactly in the same way with `x %>>% (...)`, and it has more features than `%>>%`.
> NOTE: `.()` does not support assignment with `=` but supports `~`, `<-` and `->`.
#### Piping
```r
Pipe(rnorm(1000))$
density(kernel = "cosine")$
plot(col = "blue")
``````r
Pipe(mtcars)$
.(mpg)$
summary()
``````r
Pipe(mtcars)$
.(~ summary(.) -> summ)$
lm(formula = mpg ~ wt + cyl)$
summary()$
.(coefficients)
```#### Subsetting and extracting
```r
pmtcars <- Pipe(mtcars)
pmtcars[c("mpg","wt")]$
lm(formula = mpg ~ wt)$
summary()
pmtcars[["mpg"]]$mean()
```#### Assigning values
```r
plist <- Pipe(list(a=1,b=2))
plist$a <- 0
plist$b <- NULL
```#### Side effect
```r
Pipe(mtcars)$
.(? ncol(.))$
.(~ plot(mpg ~ ., data = .))$ # side effect: plot
lm(formula = mpg ~ .)$
.(~ lm_mtcars)$ # side effect: assign
summary()$
```#### Compatibility
* Working with dplyr:
```r
Pipe(mtcars)$
filter(mpg >= mean(mpg))$
select(mpg, wt, cyl)$
lm(formula = mpg ~ wt + cyl)$
summary()$
.(coefficients)$
value
```* Working with ggvis:
```r
Pipe(mtcars)$
ggvis(~ mpg, ~ wt)$
layer_points()
```* Working with rlist:
```r
Pipe(1:100)$
list.group(. %% 3)$
list.mapv(g ~ mean(g))$
value
```### `pipeline()`
`pipeline()` provides argument-based and expression-based pipeline evaluation mechanisms. Its behavior depends on how its arguments are supplied. If only the first argument is supplied, it expects an expression enclosed in `{}` in which each line represents a pipeline step. If, instead, multiple arguments are supplied, it regards each argument as a pipeline step. For all pipeline steps, the expressions will be transformed to be connected by `%>>%` so that they behave exactly the same.
One notable difference is that in `pipeline()`'s argument or expression, the special symbols to perform specially defined pipeline tasks (e.g. side-effect) does not need to be enclosed within `()` because no operator priority issues arise as they do in using `%>>%`.
```r
pipeline({
mtcars
lm(formula = mpg ~ cyl + wt)
~ lmodel
summary
? .$r.squared
coef
})
```Thanks [@hoxo_m](https://twitter.com/hoxo_m) for the idea presented in this [post](http://qiita.com/hoxo_m/items/3fd3d2520fa014a248cb).
## License
This package is under [MIT License](http://opensource.org/licenses/MIT).