Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fstpackage/fsttable
An interface to fast on-disk data tables stored with the fst format
https://github.com/fstpackage/fsttable
Last synced: 9 days ago
JSON representation
An interface to fast on-disk data tables stored with the fst format
- Host: GitHub
- URL: https://github.com/fstpackage/fsttable
- Owner: fstpackage
- License: agpl-3.0
- Created: 2018-01-17T21:55:50.000Z (almost 7 years ago)
- Default Branch: develop
- Last Pushed: 2019-09-10T08:39:21.000Z (over 5 years ago)
- Last Synced: 2024-08-13T07:15:36.335Z (4 months ago)
- Language: R
- Size: 86.9 KB
- Stars: 27
- Watchers: 11
- Forks: 4
- Open Issues: 22
-
Metadata Files:
- Readme: README.Rmd
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - fstpackage/fsttable - An interface to fast on-disk data tables stored with the fst format (R)
README
---
output: github_document
editor_options:
chunk_output_type: console
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```# fsttable
[![Linux/OSX Build Status](https://travis-ci.org/fstpackage/fsttable.svg?branch=develop)](https://travis-ci.org/fstpackage/fsttable)
[![Windows Build status](https://ci.appveyor.com/api/projects/status/nrjyuihxtx9amgpl/branch/develop?svg=true
)](https://ci.appveyor.com/project/fstpackage/fsttable)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)R package `fsttable` aims to provide a fully functional `data.table` interface to on-disk `fst` files. The focus of the package is on keeping memory usage as low as possible woithout sacrificing features of in-memory `data.table` operations.
## Installation
You can install the latest package version with:
``` r
devtools::install_github("fstpackage/fsttable")
```## Example
First, we create a on-disk _fst_ file containing a medium sized dataset:
```{r data}
library(fsttable)# write some sample data to disk
nr_of_rows <- 1e6
x <- data.table::data.table(X = 1:nr_of_rows, Y = LETTERS[1 + (1:nr_of_rows) %% 26])
fst::write_fst(x, "1.fst")
```Then we define our _fst\_table_ by using:
```{r proxy}
ft <- fst_table("1.fst")
```This _fst\_table_ can be used as a regular _data.table_ object. For example, we can print:
```{r print}
ft
```we can select columns:
```{r columns}
ft[, .(Y)]
```and rows:
```{r rows}
ft[1:4,]
```Or both at the same time:
```{r cols_and_rows}
ft[1:4, .(X)]
```# Memory
During the operations shown above, the actual data was never fully loaded from the file. That's because of `fsttable`'s philosophy of keeping RAM usage as low as possible. Printing a few lines of a table doesn't require knowlegde of the remaining lines, so `fsttable` will never actualy load them.
Even when you create a new set:
```{r copy}
ft2 <- ft[1:4, .(X)]
```No actual data is being loaded into RAM. The copy still uses the original _fst_ file to keep the data on-disk:
```{r internals}
# small size because actual data is still on disk
object.size(ft2)
```