Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ColinFay/odds

On Disk Data Storage for Cross-Session Access in R
https://github.com/ColinFay/odds

Last synced: 8 days ago
JSON representation

On Disk Data Storage for Cross-Session Access in R

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# odds

The goal of `{odds}` is to provide an on-disk data-storage of native R object, for cross-session data access.

## Installation

You can install the dev version of `{odds}` from GitHub with:

``` r
remotes::install_github("colinfay/odds")
```

## Basic use

The main goal of `{odds}` is to create a data storage architecture, on disk, so that you can access the values from one session to another.

### How it works

By default, the storage is done at `~/.odds`, but it can be changed when creating the storage object.

__Note that the path is passed through `fs::path_norm()`, which doesn't treat `~` the same way as base R on Windows.__

```{r}
library(odds)
st <- Storage$new()
```

There are two main methods: `set()` and `get()`.
The first saves a value under a name on the disk, the second retrieve this value from the storage.

```{r}
st$set(head(iris), "a")
st$get("a")
```

Storages can be namespaced, and the default is "global",

```{r}
nsp <- paste(sample(letters, 3), collapse = "")

st$set(mtcars, "a", namespace = nsp)
st$get("a", namespace = nsp)
```

### Cross session access

Let's create an object in another R session:

```{r}
library(callr)
rx <- r_bg(
function(){
library(odds)
st <- Storage$new()
st$set(head(airquality), "ping", namespace = "blop")
}
)
```

```{r include = FALSE}
rx$wait()
```

It's now accessible in the first session:

```{r}
st$get("ping", namespace = "blop")
```

Values can be deleted:

```{r}
st$rm("ping", namespace = "blop")
```
Namespaces can be deleted:

```{r}
st$remove_namespace(nsp)
st$remove_namespace("blop")
```

## Overhead

Of course, reading from disk adds some overhead, but for small to medium size objects, the cost of `get`ting from disk instead of reading for RAM is pretty small.

```{r}
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
st$set(diamonds, "dm", "bench")
bench::mark(
ram = {
diamonds %>% filter(cut == "Ideal")
},
disk = {
st$get("dm", "bench") %>%
filter(cut == "Ideal")
}
)
```

```{r include = FALSE}
st$remove_namespace("bench")
```

`set()` and `get()` are powered by `{qs}` `qread()` and `qwrite()` and take the same arguments, so you can use parameters to these functions to speed up the read and write timing.

Read the `{qs}` benchmark [online](https://github.com/traversc/qs#summary-table).

## Acknowledgment

This package heavily relies on the `{qs}` package.
Thanks to the package authors for their work.

## Coc

Please note that the 'odds' project is released with a
[Contributor Code of Conduct](CODE_OF_CONDUCT.md).
By contributing to this project, you agree to abide by its terms.