Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.
Awesome Lists | Featured Topics | Projects
https://github.com/renkun-ken/piper

Multi-Paradigm Pipeline Implementation
https://github.com/renkun-ken/piper
Last synced: 4 months ago
JSON representation
Multi-Paradigm Pipeline Implementation
Host: GitHub
URL: https://github.com/renkun-ken/piper
Owner: renkun-ken
License: other
Created: 2014-04-07T13:16:35.000Z (almost 11 years ago)
Default Branch: master
Last Pushed: 2016-08-26T00:50:14.000Z (over 8 years ago)
Last Synced: 2024-10-19T05:59:15.478Z (4 months ago)
Language: R
Homepage:
Size: 417 KB
Stars: 169
Watchers: 26
Forks: 39
Open Issues: 5
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project

README

        ```{r knitsetup, echo=FALSE, results='hide', warning=FALSE, message=FALSE, cache=FALSE}

library(knitr)

opts_knit$set(base.dir='./', fig.path='', out.format='md')

opts_chunk$set(prompt=FALSE, comment='', results='markup')

library(pipeR)

```

# pipeR

[![Linux Build Status](https://travis-ci.org/renkun-ken/pipeR.png?branch=master)](https://travis-ci.org/renkun-ken/pipeR) 

[![Windows Build status](https://ci.appveyor.com/api/projects/status/github/renkun-ken/pipeR?svg=true)](https://ci.appveyor.com/project/renkun-ken/pipeR)

[![codecov.io](http://codecov.io/github/renkun-ken/pipeR/coverage.svg?branch=master)](http://codecov.io/github/renkun-ken/pipeR?branch=master)

[![CRAN Version](http://www.r-pkg.org/badges/version/pipeR)](https://cran.r-project.org/package=pipeR)

pipeR provides various styles of function chaining methods: 

* Pipe operator

* Pipe object

* pipeline function

Each of them represents a distinct pipeline model but they share almost a common set of features. A value can be piped to the next expression

* As the first unnamed argument of the function

* As dot symbol (`.`) in the expression

* As a named variable defined by a formula

* For side-effect that carries over the input to the next

* For assignment that saves an intermediate value

The syntax is designed to make the pipeline more readable and friendly to

a wide variety of operations.

**[pipeR Tutorial](http://renkun.me/pipeR-tutorial) is a highly recommended complete guide to pipeR.**

This document is also translated into [日本語](https://github.com/renkun-ken/pipeR/blob/master/README.ja.md) (by [@hoxo_m](https://github.com/hoxo-m/)).

## Installation

Install the latest development version from GitHub:

```r

devtools::install_github("renkun-ken/pipeR")

```

Install from [CRAN](https://cran.r-project.org/package=pipeR):

```r

install.packages("pipeR")

```

## Getting started

The following code is an example written in traditional approach:

It basically performs bootstrap on `mpg` values in built-in dataset `mtcars` and plots  its density function estimated by Gaussian kernel.

```r

plot(density(sample(mtcars$mpg, size = 10000, replace = TRUE), 

  kernel = "gaussian"), col = "red", main="density of mpg (bootstrap)")

```

The code is deeply nested and can be hard to read and maintain. In the following examples, the traditional code is rewritten by Pipe operator, `Pipe()` function and `pipeline()` function, respectively.

* Operator-based pipeline

```r

mtcars$mpg %>>%

  sample(size = 10000, replace = TRUE) %>>%

  density(kernel = "gaussian") %>>%

  plot(col = "red", main = "density of mpg (bootstrap)")

```

* Object-based pipeline (`Pipe()`)

```r

Pipe(mtcars$mpg)$

  sample(size = 10000, replace = TRUE)$

  density(kernel = "gaussian")$

  plot(col = "red", main = "density of mpg (bootstrap)")

```

* Argument-based pipeline

```r

pipeline(mtcars$mpg,

  sample(size = 10000, replace = TRUE),

  density(kernel = "gaussian"),

  plot(col = "red", main = "density of mpg (bootstrap)"))

```

* Expression-based pipeline

```r

pipeline({

  mtcars$mpg

  sample(size = 10000, replace = TRUE)

  density(kernel = "gaussian")

  plot(col = "red", main = "density of mpg (bootstrap)")  

})

```

## Usage

### `%>>%`

Pipe operator `%>>%` basically pipes the left-hand side value forward to the right-hand side expression which is evaluated according to its syntax.

#### Pipe to first-argument of function

Many R functions are pipe-friendly: they take some data by the first argument and transform it in a certain way. This arrangement allows operations to be streamlined by pipes, that is, one data source can be put to the first argument of a function, get transformed, and put to the first argument of the next function. In this way, a chain of commands are connected, and it is called a pipeline.

On the right-hand side of `%>>%`, whenever a function name or call is supplied, the left-hand side value will always be put to the first unnamed argument to that function.

```r

rnorm(100) %>>%

  plot

```

```r

rnorm(100) %>>%

  plot(col="red")

```

Sometimes the value on the left is needed at multiple places. One can use `.` to represent it anywhere in the function call.

```r

rnorm(100) %>>%

  plot(col="red", main=length(.))

```

There are situations where one calls a function in a namespace with `::`. In this case, the call must end up with `()`.

```r

rnorm(100) %>>%

  stats::median()

  

rnorm(100) %>>%

  graphics::plot(col = "red")

```

#### Pipe to `.` in an expression

Not all functions are pipe-friendly in every case: You may find some functions do not take your data produced by a pipeline as the first argument. In this case, you can enclose your expression by `{}` or `()` so that `%>>%` will use `.` to represent the value on the left.

```r

mtcars %>>%

  { lm(mpg ~ cyl + wt, data = .) }

```

```r

mtcars %>>%

  ( lm(mpg ~ cyl + wt, data = .) )

```

#### Pipe by formula as lambda expression

Sometimes, it may look confusing to use `.` to represent the value being piped. For example,

```r

mtcars %>>%

  (lm(mpg ~ ., data = .))

```

Although it works perfectly, it may look ambiguous if `.` has several meanings in one line of code. 

`%>>%` accepts lambda expression to direct its piping behavior. Lambda expression is characterized by a formula enclosed within `()`, for example, `(x ~ f(x))`. It contains a user-defined symbol to represent the value being piped and the expression to be evaluated.

```r

mtcars %>>%

  (df ~ lm(mpg ~ ., data = df))

```

```r

mtcars %>>%

  subset(select = c(mpg, wt, cyl)) %>>%

  (x ~ plot(mpg ~ ., data = x))

```

#### Pipe for side effect

In a pipeline, one may be interested not only in the final outcome but sometimes also in intermediate results. To print, plot or save the intermediate results, it must be a side-effect to avoid breaking the mainstream pipeline. For example, calling `plot()` to draw scatter plot returns `NULL`, and if one directly calls `plot()` in the middle of a pipeline, it would break the pipeline by changing the subsequent input to `NULL`.

One-sided formula that starts with `~` indicates that the right-hand side expression will only be evaluated for its side-effect, its value will be ignored, and the input value will be returned instead.

```r

mtcars %>>%

  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%

  (~ cat("rows:",nrow(.),"\n")) %>>%   # cat() returns NULL

  summary

```

```r

mtcars %>>%

  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%

  (~ plot(mpg ~ wt, data = .)) %>>%    # plot() returns NULL

  (lm(mpg ~ wt, data = .)) %>>%

  summary()

```

With `~`, side-effect operations can be easily distinguished from mainstream pipeline.

An easier way to print the intermediate value it to use `(? expr)` syntax like asking question.

```r

mtcars %>>% 

  (? ncol(.)) %>>%

  summary

```

#### Pipe with assignment

In addition to printing and plotting, one may need to save an intermediate value to the environment by assigning the value to a variable (symbol).

If one needs to assign the value to a symbol, just insert a step like `(~ symbol)`, then the input value of that step will be assigned to `symbol` in the current environment.

```r

mtcars %>>%

  (lm(formula = mpg ~ wt + cyl, data = .)) %>>%

  (~ lm_mtcars) %>>%

  summary

```

If the input value is not directly to be saved but after some transformation, then one can use `=`, `<-`, or more natural `->` to specify a lambda expression to tell what to be saved (thanks @yanlinlin82 for suggestion).

```r

mtcars %>>%

  (~ summ = summary(.)) %>>%  # side-effect assignment

  (lm(formula = mpg ~ wt + cyl, data = .)) %>>%

  (~ lm_mtcars) %>>%

  summary

```

```r

mtcars %>>%

  (~ summary(.) -> summ) %>>%

  

mtcars %>>%

  (~ summ <- summary(.)) %>>%

```

An easier way to saving intermediate value that is to be further piped is to use `(symbol = expression)` syntax:

```r

mtcars %>>%

  (~ summ = summary(.)) %>>%  # side-effect assignment

  (lm_mtcars = lm(formula = mpg ~ wt + cyl, data = .)) %>>%  # continue piping

  summary

```

or `(expression -> symbol)` syntax:

```r

mtcars %>>%

  (~ summary(.) -> summ) %>>%  # side-effect assignment

  (lm(formula = mpg ~ wt + cyl, data = .) -> lm_mtcars) %>>%  # continue piping

  summary

```

#### Extract element from an object

`x %>>% (y)` means extracting the element named `y` from object `x` where `y` must be a valid symbol name and `x` can be a vector, list, environment or anything else for which `[[]]` is defined, or S4 object.

```r

mtcars %>>%

  (lm(mpg ~ wt + cyl, data = .)) %>>%

  (~ lm_mtcars) %>>%

  summary %>>%

  (r.squared)

```

#### Compatibility

* Working with [dplyr](https://github.com/hadley/dplyr/):

```r

library(dplyr)

mtcars %>>%

  filter(mpg <= mean(mpg)) %>>%  

  select(mpg, wt, cyl) %>>%

  (~ plot(.)) %>>%

  (model = lm(mpg ~ wt + cyl, data = .)) %>>%

  (summ = summary(.)) %>>%

  (coefficients)

```

* Working with [ggvis](http://ggvis.rstudio.com/):

```r

library(ggvis)

mtcars %>>%

  ggvis(~mpg, ~wt) %>>%

  layer_points()

```

* Working with [rlist](http://renkun.me/rlist/):

```r

library(rlist)

1:100 %>>%

  list.group(. %% 3) %>>%

  list.mapv(g ~ mean(g))

```

### `Pipe()`

`Pipe()` creates a Pipe object that supports light-weight chaining without any external operator. Typically, start with `Pipe()` and end with `$value` or `[]` to extract the final value of the Pipe. 

Pipe object provides an internal function `.(...)` that work exactly in the same way with `x %>>% (...)`, and it has more features than `%>>%`.

> NOTE: `.()` does not support assignment with `=` but supports `~`, `<-` and `->`.

#### Piping

```r

Pipe(rnorm(1000))$

  density(kernel = "cosine")$

  plot(col = "blue")

```

```r

Pipe(mtcars)$

  .(mpg)$

  summary()

```

```r

Pipe(mtcars)$

  .(~ summary(.) -> summ)$

  lm(formula = mpg ~ wt + cyl)$

  summary()$

  .(coefficients)

```

#### Subsetting and extracting

```r

pmtcars <- Pipe(mtcars)

pmtcars[c("mpg","wt")]$

  lm(formula = mpg ~ wt)$

  summary()

pmtcars[["mpg"]]$mean()

```

#### Assigning values

```r

plist <- Pipe(list(a=1,b=2))

plist$a <- 0

plist$b <- NULL

```

#### Side effect

```r

Pipe(mtcars)$

  .(? ncol(.))$

  .(~ plot(mpg ~ ., data = .))$    # side effect: plot

  lm(formula = mpg ~ .)$

  .(~ lm_mtcars)$                  # side effect: assign

  summary()$

```

#### Compatibility

* Working with dplyr:

```r

Pipe(mtcars)$

  filter(mpg >= mean(mpg))$

  select(mpg, wt, cyl)$

  lm(formula = mpg ~ wt + cyl)$

  summary()$

  .(coefficients)$

  value

```

* Working with ggvis:

```r

Pipe(mtcars)$

  ggvis(~ mpg, ~ wt)$

  layer_points()

```

* Working with rlist:

```r

Pipe(1:100)$

  list.group(. %% 3)$

  list.mapv(g ~ mean(g))$

  value

```

### `pipeline()`

`pipeline()` provides argument-based and expression-based pipeline evaluation mechanisms. Its behavior depends on how its arguments are supplied. If only the first argument is supplied, it expects an expression enclosed in `{}` in which each line represents a pipeline step. If, instead, multiple arguments are supplied, it regards each argument as a pipeline step. For all pipeline steps, the expressions will be transformed to be connected by `%>>%` so that they behave exactly the same.

One notable difference is that in `pipeline()`'s argument or expression, the special symbols to perform specially defined pipeline tasks (e.g. side-effect) does not need to be enclosed within `()` because no operator priority issues arise as they do in using `%>>%`.

```r

pipeline({

  mtcars

  lm(formula = mpg ~ cyl + wt)

  ~ lmodel

  summary

  ? .$r.squared

  coef

})

```

Thanks [@hoxo_m](https://twitter.com/hoxo_m) for the idea presented in this [post](http://qiita.com/hoxo_m/items/3fd3d2520fa014a248cb).

## License

This package is under [MIT License](http://opensource.org/licenses/MIT).