https://github.com/ColinFay/odds

On Disk Data Storage for Cross-Session Access in R

Host: GitHub
URL: https://github.com/ColinFay/odds
Owner: ColinFay
License: other
Archived: true
Created: 2020-02-07T20:50:52.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2020-02-09T19:50:49.000Z (over 5 years ago)
Last Synced: 2024-08-13T07:15:11.189Z (10 months ago)
Language: R
Homepage:
Size: 19.5 KB
Stars: 16
Watchers: 5
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

Awesome Lists containing this project

jimsghstars - ColinFay/odds - On Disk Data Storage for Cross-Session Access in R (R)

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# odds

The goal of `{odds}` is to provide on-disk storage of native R objects for cross-session data access.

## Installation

You can install the dev version of `{odds}` from GitHub with:

``` r
remotes::install_github("colinfay/odds")
```

## Basic use

The main goal of `{odds}` is to create a data storage architecture on disk, so that values written in one R session can be read from another.

### How it works

By default, data is stored in `~/.odds`; a different location can be set when creating the storage object.

__Note that the path is passed through `fs::path_norm()`, which doesn't treat `~` the same way as base R on Windows.__

```{r}
library(odds)
st <- Storage$new()
```
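
To see why the note about `~` matters, the sketch below compares how base R and `{fs}` expand the home directory shortcut. It is only an illustration and assumes `{fs}` is installed; it does not reproduce the exact call made inside `{odds}`. On Windows, base R's `path.expand()` and `fs::path_expand()` can resolve `~` to different folders, so the default storage may not land where a base R user expects.

``` r
# Compare how base R and {fs} resolve the home directory shortcut `~`.
base_home <- path.expand("~")      # base R's interpretation
fs_home   <- fs::path_expand("~")  # {fs}'s interpretation

# Print both so any difference is visible on the current platform.
cat("base R:", base_home, "\n")
cat("fs:    ", as.character(fs_home), "\n")

# TRUE when both agree on this machine (they may not on Windows).
same_home <- identical(
  normalizePath(base_home, winslash = "/", mustWork = FALSE),
  normalizePath(as.character(fs_home), winslash = "/", mustWork = FALSE)
)
same_home
```

If the two differ on your machine, passing an explicit absolute path when creating the storage avoids the ambiguity.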

There are two main methods: `set()` and `get()`.

The first saves a value under a name on disk; the second retrieves that value from the storage.

```{r}
st$set(head(iris), "a")
st$get("a")
```

Storage can be namespaced; the default namespace is `"global"`. The following example uses a random three-letter namespace:

```{r}
nsp <- paste(sample(letters, 3), collapse = "")
st$set(mtcars, "a", namespace = nsp)
st$get("a", namespace = nsp)
```
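
To make the idea of named values inside namespaces more concrete, here is a minimal sketch of a namespaced on-disk key-value store. It is not the `{odds}` implementation: the helpers `kv_path()`, `kv_set()` and `kv_get()` are hypothetical, the layout (one directory per namespace, one file per name) is an assumption, and base R's `saveRDS()`/`readRDS()` stand in for `{qs}` so the example runs without extra packages.

``` r
# Hypothetical helpers, NOT part of {odds}: one directory per namespace,
# one file per name, serialized with base R's saveRDS()/readRDS().
kv_path <- function(root, name, namespace = "global") {
  file.path(root, namespace, paste0(name, ".rds"))
}

kv_set <- function(root, value, name, namespace = "global") {
  path <- kv_path(root, name, namespace)
  # Create the namespace directory on first use.
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(value, path)
  invisible(path)
}

kv_get <- function(root, name, namespace = "global") {
  readRDS(kv_path(root, name, namespace))
}

# Use a temporary directory so the demo leaves nothing behind in `~`.
root <- file.path(tempdir(), "kv-demo")

# The same name can hold different values in different namespaces.
kv_set(root, head(iris), "a")
kv_set(root, head(mtcars), "a", namespace = "cars")

from_global <- kv_get(root, "a")
from_cars   <- kv_get(root, "a", namespace = "cars")
```

Because each value lives in its own file, any R process that knows `root` can read it, which is the property the next section relies on.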

### Cross session access

Let's create an object in another R session: 

```{r}
library(callr)
# Write a value from a separate background R process
rx <- r_bg(
  function(){
    library(odds)
    st <- Storage$new()
    st$set(head(airquality), "ping", namespace = "blop")
  }
)
```

```{r include = FALSE}
rx$wait()
```

It's now accessible in the first session:

```{r}
st$get("ping", namespace = "blop")
```

Values can be deleted: 

```{r}
st$rm("ping", namespace = "blop")
```

Namespaces can be deleted: 

```{r}
st$remove_namespace(nsp)
st$remove_namespace("blop")
```

## Overhead

Of course, reading from disk adds some overhead, but for small to medium-sized objects, the cost of `get`ting from disk instead of reading from RAM is fairly small.

```{r}
library(dplyr, warn.conflicts = FALSE)
library(ggplot2) # provides the `diamonds` dataset
st$set(diamonds, "dm", "bench")
# Compare filtering an in-memory object with filtering one read from disk
bench::mark(
  ram = {
    diamonds %>% filter(cut == "Ideal")
  },
  disk = {
    st$get("dm", "bench") %>%
      filter(cut == "Ideal")
  }
)
```

```{r include = FALSE}
st$remove_namespace("bench")
```

`set()` and `get()` are powered by `{qs}`'s `qsave()` and `qread()` and take the same arguments, so you can pass the parameters of these functions to speed up reads and writes.

Read the `{qs}` benchmark [online](https://github.com/traversc/qs#summary-table).
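
As a rough illustration of the kind of parameter involved, the sketch below calls `{qs}` directly rather than going through `{odds}`, and assumes `{qs}` is installed. `qsave()` accepts a `preset` argument (for example `"fast"` or `"high"`) that trades compression ratio for speed. How such arguments are forwarded by `set()` and `get()` is not shown here; check the package source before relying on it.

``` r
# Direct use of {qs}; this does not go through {odds}.
path <- tempfile(fileext = ".qs")
x <- data.frame(id = seq_len(1e5), value = rnorm(1e5))

# `preset = "fast"` favours write speed over compression ratio.
qs::qsave(x, path, preset = "fast")

# Read the object back from disk.
y <- qs::qread(path)

# Check that the round trip preserved the data.
round_trip_ok <- isTRUE(all.equal(x, y))
round_trip_ok
```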

## Acknowledgment 

This package heavily relies on the `{qs}` package. 

Thanks to the package authors for their work.

## Code of Conduct

Please note that the 'odds' project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md).

By contributing to this project, you agree to abide by its terms.
