{"id":14068580,"url":"https://github.com/hope-data-science/tidyfst","last_synced_at":"2025-05-11T17:52:30.782Z","repository":{"id":40633940,"uuid":"240626994","full_name":"hope-data-science/tidyfst","owner":"hope-data-science","description":"Tidy Verbs for Fast Data Manipulation","archived":false,"fork":false,"pushed_at":"2025-05-07T10:32:28.000Z","size":19619,"stargazers_count":104,"open_issues_count":0,"forks_count":7,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-05-11T11:41:29.031Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://hope-data-science.github.io/tidyfst/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hope-data-science.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/CONTRIBUTING.html","funding":null,"license":"LICENSE","code_of_conduct":"docs/CODE_OF_CONDUCT.html","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":"docs/SUPPORT.html","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-02-15T01:25:55.000Z","updated_at":"2025-05-08T08:10:49.000Z","dependencies_parsed_at":"2025-04-14T16:41:53.115Z","dependency_job_id":"93d9a8c8-f6b9-439e-b294-4a758a2898d2","html_url":"https://github.com/hope-data-science/tidyfst","commit_stats":{"total_commits":315,"total_committers":2,"mean_commits":157.5,"dds":0.01904761904761909,"last_synced_commit":"204b1961386282878fab6594f69734c6e83954f9"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hope-data-science%2Ftidyfst","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hope-data-science%2Ftidyfst/tags","releases_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/hope-data-science%2Ftidyfst/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hope-data-science%2Ftidyfst/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hope-data-science","download_url":"https://codeload.github.com/hope-data-science/tidyfst/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253561077,"owners_count":21927761,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-13T07:06:17.258Z","updated_at":"2025-05-11T17:52:30.759Z","avatar_url":"https://github.com/hope-data-science.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"# tidyfst: Tidy Verbs for Fast Data Manipulation\u003cimg src=\"man/figures/hex-tidyfst_url.png\" align=\"right\" alt=\"\" width=\"120\" /\u003e\r\n\r\n [![](https://www.r-pkg.org/badges/version/tidyfst?color=orange)](https://cran.r-project.org/package=tidyfst) [![](https://img.shields.io/badge/devel%20version-2.0.0-purple.svg)](https://github.com/hope-data-science/tidyfst) ![](https://img.shields.io/badge/lifecycle-stable-green.svg)  [![downloads](http://cranlogs.r-pkg.org/badges/grand-total/tidyfst?color=yellow)](https://r-pkg.org/pkg/tidyfst)\r\n\r\n [![download](https://cranlogs.r-pkg.org/badges/tidyfst?color=red)](https://rdrr.io/cran/tidyfst/) [![downloads](https://cranlogs.r-pkg.org/badges/last-week/tidyfst?color=ff69b4)](https://cran.r-project.org/package=tidyfst) 
[![downloads](https://cranlogs.r-pkg.org/badges/last-day/tidyfst?color=9cf)](https://cran.r-project.org/package=tidyfst)\r\n\r\n [![ZENODO DOI](https://zenodo.org/badge/240626994.svg)](https://zenodo.org/badge/latestdoi/240626994) [![JOSS DOI](http://joss.theoj.org/papers/10.21105/joss.02388/status.svg)](https://joss.theoj.org/papers/10.21105/joss.02388)\r\n\r\n\r\n\r\n## Overview\r\n\r\n*tidyfst* is a toolkit of tidy data manipulation verbs with *data.table* as the backend. Combining the syntactic elegance of *dplyr* with the computing performance of *data.table*, *tidyfst* aims to provide users with state-of-the-art data manipulation tools at minimal cost. This package is an extension of *data.table*: while offering a tidy syntax, it also wraps combinations of efficient functions to facilitate frequently used data operations. *tidyfst* also plans to bring in more tidy data verbs from other packages, including but not limited to *tidyverse* and *data.table*. If you are a *dplyr* user who has to use *data.table* for speedy computation, or a *data.table* user looking for a more readable coding syntax, *tidyfst* is designed for you (and me, of course). For further details and tutorials, see the [vignettes](https://hope-data-science.github.io/tidyfst/). Both [Chinese](https://hope-data-science.github.io/tidyfst/articles/chinese_tutorial.html) and [English](https://hope-data-science.github.io/tidyfst/articles/english_tutorial.html) tutorials can be found there.\r\n\r\nSo far, *tidyfst* offers an API that in some respects goes beyond its predecessors (e.g. [`select_dt`](https://hope-data-science.github.io/tidyfst/reference/select.html) accepts nearly anything for flexible column selection). Enjoy efficient data operations with *tidyfst*!\r\n\r\nPS: For extreme performance in tidy syntax, try *tidyfst*'s mirror package [tidyft](https://github.com/hope-data-science/tidyft).
\r\n\r\n\r\n\r\n## Features\r\n\r\n- Receives any data.frame (tibble/data.table/data.frame) and returns a data.table.\r\n- Shows the class of each column when printing a data.table by default.\r\n- Never modifies data in place (also known as modification by reference), so the original variable is never changed without notice.\r\n- Uses a suffix (\"_dt\") rather than a prefix, which makes typing more efficient (especially in an IDE with automatic code completion).\r\n- Provides more flexible verbs (e.g. [pairwise_count_dt](https://hope-data-science.github.io/tidyfst/reference/pairwise.html)) for big data manipulation.\r\n- Supports data importing and parsing with *fst*, which saves both time and memory. For details, see [parse_fst/select_fst/filter_fst](https://hope-data-science.github.io/tidyfst/reference/fst.html) and [import_fst/export_fst](https://hope-data-science.github.io/tidyfst/reference/fst_io.html).\r\n- Has few, stable dependencies on mature packages (data.table, fst, stringr).\r\n\r\n\r\n\r\n## Installation\r\n\r\n```R\r\ninstall.packages(\"tidyfst\")\r\n```\r\n\r\n\r\n\r\n## Example\r\n\r\n```R\r\nlibrary(tidyfst)\r\n\r\niris %\u003e%\r\n  mutate_dt(group = Species,sl = Sepal.Length,sw = Sepal.Width) %\u003e%\r\n  select_dt(group,sl,sw) %\u003e%\r\n  filter_dt(sl \u003e 5) %\u003e%\r\n  arrange_dt(group,sl) %\u003e%\r\n  distinct_dt(sl,.keep_all = TRUE) %\u003e%\r\n  summarise_dt(sw = max(sw),by = group)\r\n#\u003e         group  sw\r\n#\u003e        \u003cfctr\u003e \u003cnum\u003e\r\n#\u003e 1:     setosa 4.4\r\n#\u003e 2: versicolor 3.4\r\n#\u003e 3:  virginica 3.8\r\n\r\niris %\u003e%\r\n  count_dt(Species) %\u003e%\r\n  add_prop()\r\n#\u003e       Species     n      prop prop_label\r\n#\u003e        \u003cfctr\u003e \u003cint\u003e     \u003cnum\u003e     \u003cchar\u003e\r\n#\u003e 1:     setosa    50 0.3333333      33.3%\r\n#\u003e 2: versicolor    50 0.3333333      33.3%\r\n#\u003e 3:  virginica    50 0.3333333      33.3%\r\n\r\niris[3:8,] %\u003e%
\r\n  mutate_when(Petal.Width == .2,\r\n              one = 1,Sepal.Length=2)\r\n#\u003e    Sepal.Length Sepal.Width Petal.Length Petal.Width Species one\r\n#\u003e          \u003cnum\u003e       \u003cnum\u003e        \u003cnum\u003e       \u003cnum\u003e  \u003cfctr\u003e \u003cnum\u003e\r\n#\u003e 1:          2.0         3.2          1.3         0.2  setosa   1\r\n#\u003e 2:          2.0         3.1          1.5         0.2  setosa   1\r\n#\u003e 3:          2.0         3.6          1.4         0.2  setosa   1\r\n#\u003e 4:          5.4         3.9          1.7         0.4  setosa  NA\r\n#\u003e 5:          4.6         3.4          1.4         0.3  setosa  NA\r\n#\u003e 6:          2.0         3.4          1.5         0.2  setosa   1\r\n\r\n\r\n```\r\n\r\n\r\n\r\n## Future plans\r\n\r\n*tidyfst* will keep up with the [updates](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) of *data.table* and, as a next step, will introduce new features that improve performance and flexibility for fast data manipulation in tidy syntax.\r\n\r\n\r\n\r\n## Vignettes\r\n- [Example 1: Basic usage](https://hope-data-science.github.io/tidyfst/articles/example1_intro.html)\r\n- [Example 2: Join tables](https://hope-data-science.github.io/tidyfst/articles/example2_join.html)\r\n- [Example 3: Reshape](https://hope-data-science.github.io/tidyfst/articles/example3_reshape.html)\r\n- [Example 4: Nest](https://hope-data-science.github.io/tidyfst/articles/example4_nest.html)\r\n- [Example 5: Fst](https://hope-data-science.github.io/tidyfst/articles/example5_fst.html)\r\n- [Example 6: Dt](https://hope-data-science.github.io/tidyfst/articles/example6_dt.html)\r\n\r\n## Cheat sheet\r\n\r\n\u003ca href=\"https://github.com/hope-data-science/tidyfst/blob/master/docs/tidyfst_cheatsheet.pdf\"\u003e\u003cimg src=\"tidyfst_cheatsheet.png\"/\u003e\u003c/a\u003e\r\n\r\n## Suggested citation\r\n\r\n
Huang et al., (2020). tidyfst: Tidy Verbs for Fast Data Manipulation. Journal of Open Source Software, 5(52), 2388, https://doi.org/10.21105/joss.02388\r\n\r\n\r\n\r\n## Related work\r\n\r\n- [data.table](https://github.com/Rdatatable/data.table)\r\n- [fst](https://github.com/fstpackage/fst)\r\n- [tidyr](https://github.com/tidyverse/tidyr)\r\n- [dplyr](https://github.com/tidyverse/dplyr)\r\n- [dtplyr](https://github.com/tidyverse/dtplyr)\r\n\r\n\r\n\r\n## Acknowledgement\r\n\r\nThe author of [maditr](https://github.com/gdemin/maditr), [Gregory Demin](https://github.com/gdemin), and the author of [fst](https://github.com/fstpackage/fst), [Marcus Klik](https://github.com/MarcusKlik), have helped me a lot in developing this work. I am very lucky to have them (and many other selfless contributors) in the same R open-source community.\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhope-data-science%2Ftidyfst","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhope-data-science%2Ftidyfst","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhope-data-science%2Ftidyfst/lists"}