https://github.com/nbenn/prt
Tabular Data Backed by Partitioned `fst` Files
- Host: GitHub
- URL: https://github.com/nbenn/prt
- Owner: nbenn
- License: gpl-3.0
- Created: 2019-10-24T16:45:04.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2023-04-08T14:40:25.000Z (almost 2 years ago)
- Last Synced: 2024-11-02T12:28:09.226Z (3 months ago)
- Language: R
- Homepage: https://nbenn.github.io/prt/
- Size: 390 KB
- Stars: 6
- Watchers: 3
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE.md
---
output:
  github_document:
    html_preview: false
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(prt)
```

# prt

[![Lifecycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![Codecov test coverage](https://app.codecov.io/gh/nbenn/prt/branch/master/graph/badge.svg?token=HvOM3yosW3)](https://app.codecov.io/gh/nbenn/prt)
[![R build status](https://github.com/nbenn/prt/workflows/build/badge.svg)](https://github.com/nbenn/prt/actions?query=workflow%3Abuild)
[![pkgdown build status](https://github.com/nbenn/prt/workflows/pkgdown/badge.svg)](https://github.com/nbenn/prt/actions?query=workflow%3Apkgdown)
[![covr status](https://github.com/nbenn/prt/workflows/coverage/badge.svg)](https://github.com/nbenn/prt/actions?query=workflow%3Acoverage)

Building on `data.frame` serialization provided by [`fst`](https://www.fstpackage.org), `prt` offers an interface for working with partitioned `data.frame`s, saved as individual `fst` files.

## Installation
You can install the development version of [prt](https://nbenn.github.io/prt/) from GitHub by running
```{r gh-dev, eval = FALSE}
source("https://install-github.me/nbenn/prt")
```

Alternatively, if you have the `remotes` package available, you can install the latest release by calling `install_github()` as

```{r gh-rel, eval = FALSE}
# install.packages("remotes")
remotes::install_github("nbenn/prt@*release")
```

## Short demo

Creating a `prt` object can be done either by calling `new_prt()` on a list of previously created `fst` files or by coercing a `data.frame` object to `prt` using `as_prt()`.
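
The first route, `new_prt()` on existing files, is not exercised in this demo. A minimal sketch of what it might look like follows, assuming that `new_prt()` accepts a character vector of `fst` file paths; the `mtcars` data, the `part_dir` directory and the file names are hypothetical and only serve as illustration.

```r
library(fst)
library(prt)

# Hypothetical setup: split a small data.frame into two row-wise partitions
# and write each partition to its own fst file.
part_dir <- tempfile()
dir.create(part_dir)

parts <- split(mtcars, rep(1:2, each = 16))
files <- file.path(part_dir, paste0(seq_along(parts), ".fst"))

for (i in seq_along(parts)) {
  write_fst(parts[[i]], files[i])
}

# Assumption: new_prt() takes the paths of the partition files.
cars <- new_prt(files)
is_prt_obj <- inherits(cars, "prt")

print(cars)

unlink(part_dir, recursive = TRUE)
```

The demo that follows takes the second route and lets `as_prt()` write the partition files itself.
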
```{r create}
tmp <- tempfile()
dir.create(tmp)

flights <- as_prt(nycflights13::flights, n_chunks = 2L, dir = tmp)
print(flights)
```

When a `prt` object is created from a `data.frame`, the specified number of files is written to the directory of choice (a newly created directory within `tempdir()` by default).

```{r inspect}
list.files(tmp)
```

Subsetting and printing are closely modeled after `tibble`, and behavior that deviates from that of `tibble` will most likely be considered a bug (please [report](https://github.com/nbenn/prt/issues/new)). Some design choices that do set a `prt` object apart from a `tibble` include the use of `data.table`s for any result of a subsetting operation and the complete disregard for `row.names`.

In addition to standard subsetting operations involving the functions `` `[`() ``, `` `[[`() `` and `` `$`() ``, the base generic function `subset()` is implemented for the `prt` class, enabling subsetting operations using non-standard evaluation. Combined with random access to tables stored as `fst` files, this can make data access more efficient in cases where only a subset of the data is of interest.

```{r subset}
jan <- flights[flights$month == 1, ]
identical(jan, subset(flights, month == 1))
print(jan)
```

A subsetting operation on a `prt` object yields a `data.table`. If the full table is of interest, a `prt`-specific implementation of the `as.data.table()` generic is available.

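
As a rough, self-contained sketch of both paths, the snippet below uses `mtcars` as a small stand-in for the flights data; the `demo_dir` directory and the object names are hypothetical.

```r
library(prt)

demo_dir <- tempfile()
dir.create(demo_dir)

# Small stand-in data set, written as two fst partitions
cars <- as_prt(mtcars, n_chunks = 2L, dir = demo_dir)

# A row subset comes back as a data.table
six <- subset(cars, cyl == 6)

# Read every partition and combine the result into one in-memory data.table
full <- data.table::as.data.table(cars)

data.table::is.data.table(six)
data.table::is.data.table(full)

unlink(demo_dir, recursive = TRUE)
```

Converting with `as.data.table()` loads the complete table into memory, so it is best reserved for cases where all rows are actually needed.
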
```{r cleanup}
unlink(tmp, recursive = TRUE)
```
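
The random access mentioned in the subsetting discussion above is provided by `fst` itself. The sketch below calls `fst::read_fst()` directly on a single file to read a column and row range without loading the rest; it illustrates the underlying mechanism and is not a description of how `prt` dispatches its reads internally. The file path and the choice of `mtcars` are hypothetical.

```r
library(fst)

path <- tempfile(fileext = ".fst")
write_fst(mtcars, path)

# Read only two columns and rows 5 through 10 from disk
slice <- read_fst(path, columns = c("mpg", "cyl"), from = 5, to = 10)

dim(slice)
#> [1] 6 2

unlink(path)
```
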