{"id":13665940,"url":"https://github.com/fstpackage/fst","last_synced_at":"2026-01-14T20:01:38.119Z","repository":{"id":41086418,"uuid":"78566495","full_name":"fstpackage/fst","owner":"fstpackage","description":"Lightning Fast Serialization of Data Frames for R","archived":false,"fork":false,"pushed_at":"2024-09-26T11:56:12.000Z","size":2456,"stargazers_count":625,"open_issues_count":134,"forks_count":41,"subscribers_count":39,"default_branch":"develop","last_synced_at":"2025-05-23T11:30:23.899Z","etag":null,"topics":["compression","data-frame","data-storage","r"],"latest_commit_sha":null,"homepage":"http://www.fstpackage.org/fst/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fstpackage.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-01-10T19:30:05.000Z","updated_at":"2025-04-17T14:35:04.000Z","dependencies_parsed_at":"2023-12-05T16:44:19.977Z","dependency_job_id":"63bf5d86-f554-4f8f-8bbb-113da16b4047","html_url":"https://github.com/fstpackage/fst","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/fstpackage/fst","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fstpackage%2Ffst","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fstpackage%2Ffst/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fstpackage%2Ffst/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fstpackage%2Ffst/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fstpackage","download_url":"https://codeload.github.com/fstpackage/fst/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fstpackage%2Ffst/sbom","scorecard":{"id":412879,"data":{"date":"2025-08-11","repo":{"name":"github.com/fstpackage/fst","commit":"b35606332483699aef44b6edbcf38de68bd7fa9e"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.4,"checks":[{"name":"Code-Review","score":0,"reason":"Found 1/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively 
maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/R-CMD-check.yaml:34: update your workflow using https://app.stepsecurity.io/secureworkflow/fstpackage/fst/R-CMD-check.yaml/develop?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/R-CMD-check.yaml:36: update your workflow using https://app.stepsecurity.io/secureworkflow/fstpackage/fst/R-CMD-check.yaml/develop?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/R-CMD-check.yaml:38: update your workflow using https://app.stepsecurity.io/secureworkflow/fstpackage/fst/R-CMD-check.yaml/develop?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/R-CMD-check.yaml:44: update your workflow using https://app.stepsecurity.io/secureworkflow/fstpackage/fst/R-CMD-check.yaml/develop?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/R-CMD-check.yaml:49: update your workflow using https://app.stepsecurity.io/secureworkflow/fstpackage/fst/R-CMD-check.yaml/develop?enable=pin","Info:   0 out of   1 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   4 third-party GitHubAction dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/R-CMD-check.yaml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's 
workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: GNU Affero General Public License v3.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a 
license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'develop'","Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 1 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-18T23:09:08.100Z","repository_id":41086418,"created_at":"2025-08-18T23:09:08.100Z","updated_at":"2025-08-18T23:09:08.100Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28434130,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T18:57:19.464Z","status":"ssl_error","status_checked_at":"2026-01-14T18:52:48.501Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","data-frame","data-storage","r"],"created_at":"2024-08-02T06:00:54.405Z","updated_at":"2026-01-14T20:01:38.102Z","avatar_url":"https://github.com/fstpackage.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"---\noutput:\n  github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, echo = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\"\n)\n```\n\n\u003cimg src=\"man/figures/fst.png\" align=\"right\" height=\"196\" width=\"196\" /\u003e\n\n[![Build Status](https://github.com/fstpackage/fst/actions/workflows/R-CMD-check.yaml/badge.svg?branch=develop)](https://github.com/fstpackage/fst/actions/workflows/R-CMD-check.yaml)\n[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)\n[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/fst)](https://cran.r-project.org/package=fst)\n[![fastverse status badge](https://fastverse.r-universe.dev/badges/fst)](https://fastverse.r-universe.dev/ui#package:fastverse)\n[![codecov](https://codecov.io/gh/fstpackage/fst/branch/develop/graph/badge.svg)](https://app.codecov.io/gh/fstpackage/fst)\n[![downloads](https://cranlogs.r-pkg.org/badges/fst)](http://cran.rstudio.com/web/packages/fst/index.html)\n[![total_downloads](https://cranlogs.r-pkg.org/badges/grand-total/fst)](http://cran.rstudio.com/web/packages/fst/index.html)\n\n## Overview\n\nThe [_fst_ package][fstRepo] for R provides a fast, easy and flexible way to serialize data frames. With access speeds of multiple GB/s, _fst_ is specifically designed to unlock the potential of the high-speed solid-state disks found in most modern computers. 
Data frames stored in the _fst_ format have full random access, both in columns and rows.\n\nThe table below compares the read and write performance of the _fst_ package to various alternatives.\n\n```{r speedTable, results = 'asis', message = FALSE, echo = FALSE}\nrequire(ggplot2)\nrequire(data.table)\nrequire(fst)\nrequire(knitr)\n\nspeed_results \u003c- read_fst(\"res_readme.fst\", as.data.table = TRUE)\n\nspeeds \u003c- speed_results[NrOfRows == 5e7, list(\n  `Time (ms)` = as.integer(median(Time)),\n  `Size (MB)` = format(median(FileSize), digits = 2),\n  `Speed (MB/s)` = as.integer(median(Speed)),\n  N = .N),\n  by = \"Package,Mode\"]\n\nspeeds[Package == \"data.table\" \u0026 Mode == \"Write\", Method := \"fwrite\"]\nspeeds[Package == \"data.table\" \u0026 Mode == \"Read\", Method := \"fread\"]\nspeeds[Package == \"fst\" \u0026 Mode == \"Write\", Method := \"write_fst\"]\nspeeds[Package == \"fst\" \u0026 Mode == \"Read\", Method := \"read_fst\"]\nspeeds[Package == \"baseR\" \u0026 Mode == \"Write\", Method := \"saveRDS\"]\nspeeds[Package == \"baseR\" \u0026 Mode == \"Read\", Method := \"readRDS\"]\nspeeds[Package == \"feather\" \u0026 Mode == \"Write\", Method := \"write_feather\"]\nspeeds[Package == \"feather\" \u0026 Mode == \"Read\", Method := \"read_feather\"]\nspeeds[, Format := \"bin\"]\nspeeds[Package == \"data.table\", Format := \"csv\"]\nsetkey(speeds, Package, Mode)\n\ndisplay_speeds \u003c- copy(speeds)\nfor (col in colnames(display_speeds)[2:ncol(display_speeds)]) {\n  setnames(display_speeds, col, \"CurCol\")\n  display_speeds[, CurCol := as.character(CurCol)]\n  display_speeds[Package == \"fst\", CurCol := paste0(\"**\", CurCol, \"**\")]\n  setnames(display_speeds, \"CurCol\", col)\n}\n\nkable(display_speeds[, c(7, 8, 3, 4, 5, 6)])\n```\n\nThese benchmarks were performed on a laptop (i7 4710HQ @2.5 GHz) with a reasonably fast SSD (M.2 Samsung SM951) using the dataset defined below. 
Parameter *Speed* was calculated by dividing the in-memory size of the data frame by the measured time. These results are also visualized in the following graph:\n\n```{r speed-bench, echo = FALSE, message = FALSE, results = 'hide', fig.width = 8.5, fig.height = 6}\nggplot(speed_results[NrOfRows == 5e7, .(Speed = median(Speed)), by = \"Package,Mode\"]) +\n  geom_bar(aes(Mode, Speed, fill = Mode), colour = \"darkgrey\", stat = \"identity\") +\n  facet_wrap(~ Package, 1) +\n  ylim(0, NA) +\n  ylab(\"Read or write speed (MB/s)\") +\n  xlab(\"\") +\n  theme(legend.position = \"none\")\n```\n\nAs can be seen from the figure, the measured speeds for the _fst_ package are very high and even exceed the maximum drive speed of the SSD used. The package accomplishes this through an effective combination of multi-threading and compression. The _fst_ files are also much smaller on disk than those of the other formats tested. This is an added benefit of _fst_'s use of type-specific compressors on each stored column.\n\nIn addition to methods for data frame serialization, _fst_ also provides methods for multi-threaded in-memory compression with the popular LZ4 and ZSTD compressors and an extremely fast multi-threaded hasher.\n\n## Multi-threading\n\nThe _fst_ package relies heavily on multi-threading to boost the read and write speed of data frames. To maximize throughput, _fst_ compresses and decompresses data _in the background_ and tries to keep the disk busy writing and reading data at the same time.\n\n## Installation\n\nThe easiest way to install the package is from CRAN:\n\n```{r, eval = FALSE}\ninstall.packages(\"fst\")\n```\n\nYou can also install the development version from GitHub:\n\n```{r, eval = FALSE}\n# install.packages(\"devtools\")\ndevtools::install_github(\"fstpackage/fst\", ref = \"develop\")\n```\n\n## Basic usage\n\n```{r, results = 'hide', echo = FALSE, message = FALSE}\nrequire(fst)\n```\n\nUsing _fst_ is simple. 
Data can be stored and retrieved using the functions _write\\_fst_ and _read\\_fst_:\n\n```{r}\n# Generate a random data frame with 10 million rows and various column types\nnr_of_rows \u003c- 1e7\n\ndf \u003c- data.frame(\n  Logical = sample(c(TRUE, FALSE, NA), nr_of_rows, replace = TRUE, prob = c(0.85, 0.1, 0.05)),\n  Integer = sample(1L:100L, nr_of_rows, replace = TRUE),\n  Real = sample(sample(1:10000, 20) / 100, nr_of_rows, replace = TRUE),\n  Factor = as.factor(sample(labels(UScitiesD), nr_of_rows, replace = TRUE))\n)\n\n# Store the data frame to disk\nwrite_fst(df, \"dataset.fst\")\n\n# Retrieve the data frame again\ndf \u003c- read_fst(\"dataset.fst\")\n```\n\n_Note: the dataset defined in this example code was also used to obtain the benchmark results shown in the introduction._\n\n## Random access\n\nThe _fst_ file format provides full random access to stored datasets. You can retrieve a selection of columns and rows with:\n\n```{r, results = 'hide'}\ndf_subset \u003c- read_fst(\"dataset.fst\", c(\"Logical\", \"Factor\"), from = 2000, to = 5000)\n```\n\nThis reads rows 2000 to 5000 from columns _Logical_ and _Factor_ without touching any other data in the stored file. That means that a subset can be read from file **without reading the complete file first**. This differs from, say, _readRDS_ or _read\\_feather_, where you have to read the complete file or column before you can make a subset.\n\n## Compression\n\nFor compression, _fst_ uses the excellent and speedy [LZ4][lz4Repo] and [ZSTD][zstdRepo] algorithms. These compressors, in combination with type-specific bit filters, enable _fst_ to achieve high compression speeds at reasonable compression factors. The compression factor can be tuned with the _compress_ argument, from 0 (minimum) to 100 (maximum):\n\n```{r, results = 'hide'}\nwrite_fst(df, \"dataset.fst\", compress = 100)  # use maximum compression\n```\n\nCompression reduces the size of the _fst_ file that holds your data. 
But because (de)compression is done _on background threads_, it can also increase the total read and write speed. The graph below shows how the use of multiple threads enhances the read and write speed of our sample dataset.\n\n```{r multi-threading, fig.width = 10, fig.height = 8, echo = FALSE}\nspeed_results[Package %in% c(\"baseR\", \"feather\"), Threads := 1]\nspeed_results[, Threads := as.factor(Threads)]\nspeed_results[, Speed := 0.001 * Speed]\n# label each facet with its actual row count, e.g. 1e7 -\u003e \"10 million rows\"\nspeed_results[, NrOfRows := paste(NrOfRows / 1e6, \"million rows\")]\n\nfig_results \u003c- speed_results[, .(Speed = median(Speed)), by = \"Package,Mode,NrOfRows,Threads\"]\n\nggplot(fig_results) +\n  geom_bar(aes(Package, Speed, fill = Threads),\n    stat = \"identity\", position = \"dodge\", width = 0.8) +\n  facet_grid(Mode ~ NrOfRows, scales = \"free_y\") +\n  scale_fill_manual(values = colorRampPalette(c(\"#FBE3A5\", \"#433B54\"))(10)[10:3]) +\n  ylab(\"Read or write speed in GB/s\") +\n  theme_minimal()\n```\n\nThe _csv_ format used by the _fread_ and _fwrite_ functions of package _data.table_ is a human-readable text format rather than a binary format. Normally, binary formats would be much faster than _csv_, because _csv_ takes more space on disk, is row-based and uncompressed, and needs to be parsed into a computer-native format before it can be used. So any serializer that works with _csv_ is at an enormous disadvantage compared to binary formats. Yet the results show that _data.table_ is on par with the binary formats and, when more threads are used, can even be faster. Because of this impressive performance, it was included in the graph for comparison.\n\n## Bindings in other languages\n\n**Julia**: [**`FstFileFormat.jl`**][fstformatRepo], a naive Julia binding using RCall.jl\n\n\u003e **Note to users**: Starting with CRAN release v0.8.0, the _fst_ format is stable and backwards compatible. 
That means that all _fst_ files generated with package v0.8.0 or later can be read by future versions of the package.\n\n```{r, echo = FALSE, results = 'hide'}\n# cleanup\nfile.remove(\"dataset.fst\")\n```\n\n[fstRepo]: https://github.com/fstpackage/fst\n[lz4Repo]: https://github.com/lz4/lz4\n[zstdRepo]: https://github.com/facebook/zstd\n[fstformatRepo]: https://github.com/xiaodaigh/FstFileFormat.jl\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffstpackage%2Ffst","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffstpackage%2Ffst","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffstpackage%2Ffst/lists"}