Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/nbenn/prt

Tabular Data Backed by Partitioned `fst` Files
https://github.com/nbenn/prt

Last synced: 3 months ago
JSON representation

Tabular Data Backed by Partitioned `fst` Files

Awesome Lists containing this project

README

        

---
output:
github_document:
html_preview: false
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
library(prt)
```

# prt

[![Lifecycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![Codecov test coverage](https://app.codecov.io/gh/nbenn/prt/branch/master/graph/badge.svg?token=HvOM3yosW3)](https://app.codecov.io/gh/nbenn/prt)
[![R build status](https://github.com/nbenn/prt/workflows/build/badge.svg)](https://github.com/nbenn/prt/actions?query=workflow%3Abuild)
[![pkgdown build status](https://github.com/nbenn/prt/workflows/pkgdown/badge.svg)](https://github.com/nbenn/prt/actions?query=workflow%3Apkgdown)
[![covr status](https://github.com/nbenn/prt/workflows/coverage/badge.svg)](https://github.com/nbenn/prt/actions?query=workflow%3Acoverage)

Building on `data.frame` serialization provided by [`fst`](https://www.fstpackage.org), `prt` offers an interface for working with partitioned `data.frame`s, saved as individual `fst` files.

## Installation

You can install the development version of [prt](https://nbenn.github.io/prt/) from GitHub by running

```{r gh-dev, eval = FALSE}
source("https://install-github.me/nbenn/prt")
```

Alternatively, if you have the `remotes` package available, the latest release is available by calling `install_github()` as

```{r gh-rel, eval = FALSE}
# install.packages("remotes")
remotes::install_github("nbenn/prt@*release")
```

## Short demo

Creating a `prt` object can be done either by calling `new_prt()` on a list of previously created `fst` files or by coercing a `data.frame` object to `prt` using `as_prt()`.

```{r create}
tmp <- tempfile()
dir.create(tmp)

flights <- as_prt(nycflights13::flights, n_chunks = 2L, dir = tmp)

print(flights)
```

In case a `prt` object is created from a `data.frame`, the specified number of files is written to the directory of choice (a newly created directory within `tempdir()` by default).

```{r inspect}
list.files(tmp)
```

Subsetting and printing is closely modeled after `tibble` and behavior that deviates from that of `tibble` will most likely be considered a bug (please [report](https://github.com/nbenn/prt/issues/new)). Some design choices that do set a `prt` object apart from a `tibble` include the use of `data.table`s for any result of a subsetting operation and the complete disregard for `row.names`.

In addition to standard subsetting operations involving the functions &grave;[&grave;(), &grave;[[&grave;() and &grave;$&grave;(), the base generic function `subset()` is implemented for the `prt` class, enabling subsetting operations using non-standard evaluation. Combined with random access to tables stored as `fst` files, this can make data access more efficient in cases where only a subset of the data is of interest.

```{r subset}
jan <- flights[flights$month == 1, ]
identical(jan, subset(flights, month == 1))
print(jan)
```

A subsetting operation on a `prt` object yields a `data.table`. If the full table is of interest, a `prt`-specific implementation of the `as.data.table()` generic is available.

```{r cleanup}
unlink(tmp, recursive = TRUE)
```