https://github.com/Tazovsky/cachemeR

Convenient way of caching objects in R

---
output: github_document
---

# cachemeR

[![Build Status](https://travis-ci.org/Tazovsky/cachemeR.svg?branch=devel)](https://travis-ci.org/Tazovsky/cachemeR)
[![codecov](https://codecov.io/gh/Tazovsky/cachemeR/branch/devel/graph/badge.svg)](https://codecov.io/gh/Tazovsky/cachemeR)
[![Coverage Status](https://coveralls.io/repos/github/Tazovsky/cachemeR/badge.svg?branch=devel)](https://coveralls.io/github/Tazovsky/cachemeR?branch=devel)

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = ">",
  out.width = "100%"
)
```

## Overview {#overview}

`cachemeR` is a convenient way of caching functions in R.
From the beginning, the purpose of this package has been to make caching as easy as possible
and to require as little effort as possible to add it to existing projects :)

## Installation {#installation}

```{r eval=FALSE}
# install devtools only if it is not available yet
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")

devtools::install_github("Tazovsky/cachemeR@devel")
```

## Usage - `%c-%` operator {#usage}

The cache has to be initialized first by calling `cachemer$new` with a path to `config.yaml`.
After that, all you need to do to cache a function call is to use the pipe **`%c-%`** instead of the assignment operator `<-`.
In the following example, the function `doLm` fits a linear model:

```{r}
library(cachemeR)

doLm <- function(rows, cols, verbose = TRUE) {
  if (verbose)
    print("Function is run")
  set.seed(1234)
  X <- matrix(rnorm(rows*cols), rows, cols)
  b <- sample(1:cols, cols)
  y <- runif(1) + X %*% b + rnorm(rows)
  model <- lm(y ~ X)
}

# create dir where cache will be stored
dir.create(tmp.dir <- tempfile())

# create path to config.yaml - the yaml file must be in that dir
config.file <- file.path(tmp.dir, "config.yaml")

# initialize cachemeR
cache <- cachemer$new(path = config.file)

cache$setLogger(TRUE)

# cache function
result1 %c-% doLm(5, 5)
result1

# function is cached now so if you re-run function then
# output will be retrieved from cache instead of executing 'doLm' function again
result2 %c-% doLm(5, 5)
result2

```

```{r, include=FALSE}
on.exit(unlink(dirname(config.file), T, T))
```
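To give a feel for how an assignment-like infix operator can work at all, here is a minimal base R sketch. It is not cachemeR's implementation: the operator `%m-%`, the `memo_store` environment and the `slow_square` function are hypothetical names made up for this illustration, the store lives in memory only, and the key is simply the text of the call.

```r
# In-memory store used by this sketch only.
memo_store <- new.env(parent = emptyenv())

# Hypothetical operator `%m-%`: evaluates the right-hand call at most once
# per distinct call text, then binds the value to the left-hand name.
`%m-%` <- function(lhs, rhs) {
  caller <- parent.frame()
  target <- substitute(lhs)  # unevaluated symbol on the left
  call   <- substitute(rhs)  # unevaluated call on the right
  stopifnot(is.symbol(target), is.call(call))

  key <- paste(deparse(call), collapse = "")
  if (!exists(key, envir = memo_store, inherits = FALSE)) {
    # Cache miss: run the call in the caller's environment and store it.
    assign(key, eval(call, envir = caller), envir = memo_store)
  }
  # Bind the (possibly cached) value to the left-hand name in the caller.
  assign(as.character(target), get(key, envir = memo_store), envir = caller)
  invisible(NULL)
}

runs <- 0
slow_square <- function(x) {
  runs <<- runs + 1  # count real executions
  x^2
}

a %m-% slow_square(4)  # executes slow_square
b %m-% slow_square(4)  # same call text, value comes from memo_store
```

Because both sides are captured with `substitute()`, the right-hand side is only evaluated on a cache miss, which is why `runs` stays at 1 after the second line.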

The `%c-%` operator is sensitive to the function name, the function body, and the arguments (whether an argument is named or not,
whether it is a list or not, whether it is declared in the parent environment, etc.):

```{r}
library(cachemeR)

dir.create(tmp.dir <- tempfile())
config.file <- file.path(tmp.dir, "config.yaml")
cache <- cachemer$new(path = config.file)

testFun <- function(a, b) {
  (a+b) ^ (a*b)
}

cache$setLogger(TRUE)

result1 %c-% testFun(a = 2, b = 3)

testFun <- function(a, b) {
  (a+b) / (a*b)
}

# function name didn't change, but function body did, so the call is run and cached again:
result2 %c-% testFun(a = 2, b = 3)

result1
result2

```

```{r, include=FALSE}
on.exit(unlink(dirname(config.file), T, T))
```
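One way to picture this kind of sensitivity is a cache key that combines the function name, the function definition and the argument values. The base R sketch below is an assumption for illustration only, not how cachemeR builds its keys: `make_key` and `test_fun` are hypothetical names, and the sketch deliberately treats positional and named arguments as equivalent, which may differ from what cachemeR does.

```r
# Hypothetical helper: build a key from the function's name, its deparsed
# definition, and the argument values matched to their formal names.
make_key <- function(call, env = parent.frame()) {
  fun_name <- as.character(call[[1]])
  fun      <- get(fun_name, envir = env, mode = "function")
  matched  <- match.call(fun, call)     # normalise to named arguments
  args     <- lapply(as.list(matched)[-1], eval, envir = env)
  args     <- args[order(names(args))]  # ignore argument order
  paste(c(fun_name, deparse(fun), deparse(args)), collapse = "\n")
}

test_fun <- function(a, b) (a + b)^(a * b)
k1 <- make_key(quote(test_fun(a = 2, b = 3)))
k2 <- make_key(quote(test_fun(2, 3)))          # positional arguments
x  <- 3
k3 <- make_key(quote(test_fun(b = x, a = 2)))  # variable resolved to its value

test_fun <- function(a, b) (a + b) / (a * b)   # same name, new body
k4 <- make_key(quote(test_fun(a = 2, b = 3)))

identical(k1, k2)  # TRUE
identical(k1, k3)  # TRUE
identical(k1, k4)  # FALSE
```

Including the deparsed function in the key is what makes a redefinition under the same name produce a different key.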

## Share elements across all instances of a class

```{r}
library(cachemeR)

dir.create(tmp.dir <- tempfile())
config.file <- file.path(tmp.dir, "config.yaml")
on.exit(unlink(tmp.dir, TRUE, TRUE), add = TRUE)
cache <- cachemer$new(path = config.file)

cache$setLogger(TRUE)

lm_fit <- function(rows, cols) {
  print("Fitting linear model...")
  set.seed(1234)
  X <- matrix(rnorm(rows*cols), rows, cols)
  b <- sample(1:cols, cols)
  y <- runif(1) + X %*% b + rnorm(rows)
  model <- lm(y ~ X)
}

fun3 <- function() {
  print("> fun3")
  y %c-% lm_fit(123, 123)
  return(y)
}
fun2 <- function() {
  print("> fun2")
  fun3()
}
fun1 <- function() {
  print("> fun1")
  fun2()
}

x %c-% lm_fit(123, 123) # cache function at the top level
x2 %c-% fun1()
identical(x, x2)

```

The `%c-%` operator also **has some [limitations](#limitations)**.

## Use cases {#usecases}

1. shiny app example
2. calculate Fibonacci
3. ?
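As a rough illustration of why the Fibonacci use case benefits from caching, the sketch below memoises a recursive Fibonacci function in plain base R. It does not use cachemeR; `fib_store` and `fib_memo` are hypothetical names chosen for this example.

```r
# In-memory store for already computed Fibonacci numbers.
fib_store <- new.env(parent = emptyenv())

fib_memo <- function(n) {
  if (n < 2) return(n)
  key <- as.character(n)
  cached <- fib_store[[key]]
  if (!is.null(cached)) return(cached)  # reuse a stored result
  value <- fib_memo(n - 1) + fib_memo(n - 2)
  fib_store[[key]] <- value             # remember it for later calls
  value
}

fib_memo(30)  # 832040
```

Without the store, the naive recursion recomputes the same values over and over; with it, each `n` is computed once.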

## Limitations {#limitations}

Generally, `cachemeR` is designed to cache R functions only, and it won't work if:

- The cached function's argument value contains `$` or `[[]]`:

```{r, eval=FALSE}
args = list(a = 1, b = 2)
res %c-% fun(a = args$a, b = args[["b"]])
```

- You want to cache something other than a bare function call:

```{r, eval=FALSE}

res %c-% fun(a = 1, b = 2) + 1

# or expression in parenthesis:
res %c-% (fun(a = 1, b = 2) + 1)
res %c-% {fun(a = 1, b = 2) + 1}

# or simple value/variable:
arg <- 1
res %c-% arg
```

- You want to use it with `magrittr` pipes, for example with `%>%`:

```{r, eval=FALSE}
getDF <- function(nm) {
  switch(nm,
         iris = iris,
         mtcars = mtcars)
}

library(dplyr)
res %c-% getDF("iris") %>% summary()

```
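For the unparenthesised forms above, part of the picture is how R parses them. User-defined `%op%` operators bind tighter than `+` and are grouped left to right, so `%c-%` only ever receives the first call on its right-hand side. The base R snippet below just inspects the parse tree with `quote()`; it does not need `cachemeR` or `magrittr` to be loaded, and the names `e1` and `e2` are only for this illustration.

```r
# quote() only parses; neither operator has to be defined.
e1 <- quote(res %c-% fun(a = 1, b = 2) + 1)
e2 <- quote(res %c-% getDF("iris") %>% summary())

as.character(e1[[1]])  # "+"   : the outermost call is the addition
e1[[2]]                # left operand of `+` is the whole %c-% call

as.character(e2[[1]])  # "%>%" : the outermost call is the magrittr pipe
e2[[2]]                # left operand of %>% is the whole %c-% call
```

The parenthesised variants in the list are a separate restriction: there the right-hand side of `%c-%` is no longer a plain function call.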

## Microbenchmark {#microbenchmark}

```{r}
# microbenchmark
cache <- cachemer$new(path = config.file)
cache$setLogger(FALSE)

test_no_cache <- function(n) {
  result_no_cache <- doLm(n, n, verbose = FALSE)
}

test_cache <- function(n) {
  result_cache %c-% doLm(n, n, verbose = FALSE)
}

res1 <- microbenchmark::microbenchmark(
test_no_cache(400),
test_cache(400)
)

res1

# now run it again, this time with the calculation already cached
res2 <- microbenchmark::microbenchmark(
test_no_cache(400),
test_cache(400)
)

res2
```

## Dev environment

The package is developed in RStudio running in a container:

```bash
# R 3.6.1
docker build -f Dockerfile-R3.6.1 -t kfoltynski/cachemer:3.6.1 .
# or R 4.0.0
docker build -f Dockerfile-R4.0.0 -t kfoltynski/cachemer:4.0.0 .

# run container with RStudio listening on 8790

# R 3.6.1
docker run -d --name cachemer --rm -v "$(pwd)":/mnt/vol -w /mnt/vol -p 8790:8787 kfoltynski/cachemer:3.6.1
# R 4.0.0
docker run -d --name cachemer --rm -v "$(pwd)":/mnt/vol -w /mnt/vol -p 8790:8787 kfoltynski/cachemer:4.0.0
```