{"id":14069144,"url":"https://github.com/ropensci/git2rdata","last_synced_at":"2026-04-09T09:22:13.485Z","repository":{"id":34930093,"uuid":"147685405","full_name":"ropensci/git2rdata","owner":"ropensci","description":"An R package for storing and retrieving data.frames in git repositories.","archived":false,"fork":false,"pushed_at":"2025-02-09T15:41:39.000Z","size":3028,"stargazers_count":100,"open_issues_count":6,"forks_count":13,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-14T13:06:32.699Z","etag":null,"topics":["r","r-package","reproducible-research","rstats","version-control"],"latest_commit_sha":null,"homepage":"https://ropensci.github.io/git2rdata/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ropensci.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json","zenodo":".zenodo.json"}},"created_at":"2018-09-06T14:22:19.000Z","updated_at":"2025-03-17T13:16:53.000Z","dependencies_parsed_at":"2025-04-14T13:06:39.066Z","dependency_job_id":null,"html_url":"https://github.com/ropensci/git2rdata","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Fgit2rdata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Fgit2rdata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Fgit2rdata/releases","manifests_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/ropensci%2Fgit2rdata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ropensci","download_url":"https://codeload.github.com/ropensci/git2rdata/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252908937,"owners_count":21823522,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["r","r-package","reproducible-research","rstats","version-control"],"created_at":"2024-08-13T07:06:38.919Z","updated_at":"2026-04-09T09:22:13.479Z","avatar_url":"https://github.com/ropensci.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- badges: start --\u003e\n[![CRAN status](https://www.r-pkg.org/badges/version/git2rdata)](https://cran.r-project.org/package=git2rdata)\n[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)\n[![lifecycle](https://img.shields.io/badge/lifecycle-stable-green.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)\n[![minimal R version](https://img.shields.io/badge/R%3E%3D-3.5.0-6666ff.svg)](https://cran.r-project.org/)\n[![DOI](https://zenodo.org/badge/147685405.svg)](https://zenodo.org/badge/latestdoi/147685405)\n[![ROpenSci 
review](https://badges.ropensci.org/263_status.svg)](https://github.com/ropensci/software-review/issues/263)\n[![GPL-3](https://img.shields.io/badge/License-GPL-3-brightgreen)](https://raw.githubusercontent.com/inbo/checklist/refs/heads/main/inst/generic_template/gplv3.md)\n[![Release](https://img.shields.io/github/release/ropensci/git2rdata.svg)](https://github.com/ropensci/git2rdata/releases)\n![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/ropensci/git2rdata/check-package)\n![GitHub repo size](https://img.shields.io/github/repo-size/ropensci/git2rdata)\n![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/ropensci/git2rdata.svg)\n![GitHub forks](https://img.shields.io/github/forks/ropensci/git2rdata.svg?style=social)\n![GitHub stars](https://img.shields.io/github/stars/ropensci/git2rdata.svg?style=social)\n![r-universe name](https://ropensci.r-universe.dev/badges/:name?color=c04384)\n![r-universe package](https://ropensci.r-universe.dev/badges/git2rdata)\n[![Codecov test coverage](https://codecov.io/gh/ropensci/git2rdata/branch/main/graph/badge.svg)](https://app.codecov.io/gh/ropensci/git2rdata?branch=main)\n\u003c!-- badges: end --\u003e\n\n# git2rdata: Store and Retrieve Data.frames in a Git Repository\n\n[Onkelinx, Thierry![ORCID logo](https://info.orcid.org/wp-content/uploads/2019/11/orcid_16x16.png)](https://orcid.org/0000-0001-8804-4216)[^aut][^cre][^INBO]\n[Vanderhaeghe, Floris![ORCID logo](https://info.orcid.org/wp-content/uploads/2019/11/orcid_16x16.png)](https://orcid.org/0000-0002-6378-6229)[^ctb][^INBO]\n[Desmet, Peter![ORCID logo](https://info.orcid.org/wp-content/uploads/2019/11/orcid_16x16.png)](https://orcid.org/0000-0002-8442-8025)[^ctb][^INBO]\n[Lommelen, Els![ORCID logo](https://info.orcid.org/wp-content/uploads/2019/11/orcid_16x16.png)](https://orcid.org/0000-0002-3481-5684)[^ctb][^INBO]\n[Research Institute for Nature and Forest (INBO)](mailto:info%40inbo.be)[^cph][^fnd]\n\n[^aut]: 
author\n[^cre]: contact person\n[^INBO]: Research Institute for Nature and Forest (INBO)\n[^ctb]: contributor\n[^cph]: copyright holder\n[^fnd]: funder\n\n**keywords**: git; version control; plain text data\n\n\n\u003c!-- description: start --\u003e\nThe `git2rdata` package is an R package for writing and reading dataframes as plain text files. \nA separate metadata file stores the information needed to restore the dataframe, such as the class of each variable and the levels of each factor.\n\n1. Storing metadata preserves the classes of variables. \n  By default, `git2rdata` optimizes the data for file storage. \n  The optimization is most effective on data containing factors. \n  The optimization makes the data less human-readable.\n  The user can turn the optimization off when they prefer a human-readable format over smaller files.\n  Details on the implementation are available in `vignette(\"plain_text\", package = \"git2rdata\")`.\n2. Storing metadata also allows smaller row-based [diffs](https://en.wikipedia.org/wiki/Diff) between two consecutive [commits](https://en.wikipedia.org/wiki/Commit_(version_control)). \n  This is a useful feature when storing data as plain text files under version control. \n  Details on this part of the implementation are available in `vignette(\"version_control\", package = \"git2rdata\")`. \n  Although we designed `git2rdata` with a [git](https://git-scm.com/) workflow in mind, you can use it in combination with other version control systems such as [Subversion](https://subversion.apache.org/) or [Mercurial](https://www.mercurial-scm.org/).\n3. `git2rdata` is a useful tool in a reproducible and traceable workflow. \n  `vignette(\"workflow\", package = \"git2rdata\")` gives a toy example.\n4. 
`vignette(\"efficiency\", package = \"git2rdata\")` provides some insight into the efficiency of file storage, git repository size, and the speed of writing and reading.\n\u003c!-- description: end --\u003e\n\n## Why Use Git2rdata?\n\n- You can store dataframes as plain text files.\n- The dataframe you read contains the same information as the one you wrote.\n    - No changes in data type.\n    - Factors keep their original levels, including their order.\n    - Date and date-time formats are unambiguous and documented in the metadata.\n- The data and the metadata are in a standard and open format, making them readable by other software.\n- `git2rdata` checks the data and metadata while reading. \n`read_vc()` informs the user when the data or metadata have been tampered with.\n- Git2rdata integrates with the [`git2r`](https://cran.r-project.org/package=git2r) package for working with git repositories from R.\n    - Alternatively, use git2rdata solely for writing to disk and handle the plain text files with your favourite version control system outside of R.\n- The optimization reduces the required disk space by about 30% for both the working directory and the git history. \n- Reading data from an HDD is 30% faster than `read.table()`; writing to an HDD takes about 70% more time than `write.table()`.\n- Git2rdata is useful as a tool in a reproducible and traceable workflow. \nSee `vignette(\"workflow\", package = \"git2rdata\")`.\n- You can detect when a file was last modified in the git history. \nUse this to check whether an existing analysis is obsolete due to new data. 
\nThis avoids rerunning up-to-date analyses, saving resources.\n\n## Talk About `git2rdata` at useR!2019\u003c!-- spell-check: ignore --\u003e in Toulouse, France\n\n\u003ciframe width=\"560\" height=\"315\" src=\"https://www.youtube-nocookie.com/embed/sbRPmakBFqo\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen\u003e\u003c/iframe\u003e\u003c!-- spell-check: ignore --\u003e\n\n## Installation\n\nInstall from CRAN\n\n```r\ninstall.packages(\"git2rdata\")\n```\n\nInstall the development version from GitHub\n\n```r\n# installation requires the \"remotes\" package\n# install.packages(\"remotes\")\n\n# install with vignettes (recommended)\nremotes::install_github(\n  \"ropensci/git2rdata\", \n  build = TRUE, \n  dependencies = TRUE, \n  build_opts = c(\"--no-resave-data\", \"--no-manual\")\n)\n# install without vignettes\nremotes::install_github(\"ropensci/git2rdata\")\n```\n\n## Usage in Brief\n\nThe user stores dataframes with `write_vc()` and retrieves them with `read_vc()`. \nBoth functions share the arguments `root` and `file`. \n`root` is the base location where the dataframe is stored. \nIt points to either a local directory or a local git repository. \n`file` is the file name to use and can include a path relative to `root`. 
\nMake sure the relative path stays within `root`.\n\n```r\n# using a local directory\nlibrary(git2rdata)\nroot \u003c- \"~/myproject\" \nwrite_vc(my_data, file = \"rel_path/filename\", root = root)\nread_vc(file = \"rel_path/filename\", root = root)\n# root can also be a git repository:\n# root \u003c- git2r::repository(\"~/my_git_repo\")\n```\n\nMore details on storing dataframes as plain text files are available in `vignette(\"plain_text\", package = \"git2rdata\")`.\n\n```r\n# using a git repository\nlibrary(git2rdata)\nrepo \u003c- repository(\"~/my_git_repo\")\npull(repo)\nwrite_vc(my_data, file = \"rel_path/filename\", root = repo, stage = TRUE)\ncommit(repo, \"My message\")\npush(repo)\nread_vc(file = \"rel_path/filename\", root = repo)\n```\n\nPlease read `vignette(\"version_control\", package = \"git2rdata\")` for more details on using git2rdata in combination with version control.\n\n## What Data Sizes Can Git2rdata Handle?\n\nThe recommendations for git repositories are files smaller than 100 MiB, a repository smaller than 1 GiB, and fewer than 25k files. \nThe individual file size is the limiting factor. \nStoring the airbag dataset ([`DAAG::nassCDS`](https://cran.r-project.org/package=DAAG)) with `write_vc()` requires on average 68 (optimized) or 97 (verbose) bytes per record. \nThe file reaches the 100 MiB limit for this data after about 1.5 million (optimized) or 1 million (verbose) observations. \n\nStoring a 90% random subset of the airbag dataset requires 370 KiB (optimized) or 400 KiB (verbose) storage in the git history. \nUpdating the dataset with other 90% random subsets requires on average 60 KiB (optimized) to 100 KiB (verbose) per commit. 
\nThe git history reaches the limit of 1 GiB after 17k (optimized) to 10k (verbose) commits.\n\nYour mileage might vary.\n\n## Citation\n\nPlease use the output of `citation(\"git2rdata\")`.\n\n## Folder Structure\n\n- `R`: The source scripts of the [R](https://cran.r-project.org/) functions with documentation in [Roxygen](https://CRAN.R-project.org/package=roxygen2) format\n- `man`: The help files in [Rd](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Rd-format) format\n- `inst/efficiency`: pre-calculated data to speed up `vignette(\"efficiency\", package = \"git2rdata\")`\n- `tests/testthat`: R scripts with unit tests using the [testthat](https://CRAN.R-project.org/package=testthat) framework\n- `vignettes`: source code for the vignettes describing the package\n- `man-roxygen`: templates for documentation in Roxygen format\n- `pkgdown`: source files for the `git2rdata` [website](https://ropensci.github.io/git2rdata/)\n- `.github`: guidelines and templates for contributors\n\n```\ngit2rdata\n├── .github \n├─┬ inst\n│ └── efficiency\n├── man \n├── man-roxygen \n├── pkgdown\n├── R\n├─┬ tests\n│ └── testthat\n└── vignettes\n```\n\n## Contributions\n\n`git2rdata` welcomes contributions. \nPlease read our [Contributing guidelines](https://github.com/ropensci/git2rdata/blob/master/.github/CONTRIBUTING.md) first. \nThe `git2rdata` project has a [Contributor Code of Conduct](https://github.com/ropensci/git2rdata/blob/master/.github/CODE_OF_CONDUCT.md). \nBy contributing to this project, you agree to abide by its terms.\n\n[![rOpenSci footer](http://ropensci.org/public_images/github_footer.png)](https://ropensci.org)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci%2Fgit2rdata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fropensci%2Fgit2rdata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci%2Fgit2rdata/lists"}