{"id":16277164,"url":"https://github.com/eliocamp/ggdatasaver","last_synced_at":"2025-06-12T08:37:04.210Z","repository":{"id":45684973,"uuid":"514043478","full_name":"eliocamp/ggdatasaver","owner":"eliocamp","description":"Automatically save data associated with a 'ggplot2' plot","archived":false,"fork":false,"pushed_at":"2022-10-06T19:01:39.000Z","size":606,"stargazers_count":23,"open_issues_count":3,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-03T08:42:48.773Z","etag":null,"topics":["ggplot2","r","r-package","rstats"],"latest_commit_sha":null,"homepage":"https://eliocamp.github.io/ggdatasaver","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eliocamp.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-07-14T20:47:07.000Z","updated_at":"2024-03-22T10:17:48.000Z","dependencies_parsed_at":"2023-01-19T13:45:16.075Z","dependency_job_id":null,"html_url":"https://github.com/eliocamp/ggdatasaver","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/eliocamp/ggdatasaver","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliocamp%2Fggdatasaver","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliocamp%2Fggdatasaver/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliocamp%2Fggdatasaver/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliocamp%2Fggdatasaver/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eliocamp","download_url":"https://codeload.github.com/eliocam
p/ggdatasaver/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eliocamp%2Fggdatasaver/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259431181,"owners_count":22856451,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ggplot2","r","r-package","rstats"],"created_at":"2024-10-10T18:53:00.873Z","updated_at":"2025-06-12T08:37:04.166Z","avatar_url":"https://github.com/eliocamp.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"50%\"\n)\n```\n\n# ggdatasaver\n\n\u003c!-- badges: start --\u003e\n[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)\n[![CRAN status](https://www.r-pkg.org/badges/version/ggdatasaver)](https://CRAN.R-project.org/package=ggdatasaver)\n[![R-CMD-check](https://github.com/eliocamp/ggdatasaver/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/eliocamp/ggdatasaver/actions/workflows/R-CMD-check.yaml)\n\u003c!-- badges: end --\u003e\n\nThe goal of ggdatasaver is to automatically save the data associated with your plots for you to share as supplementary material. \nOther people can then use that data instead of digitising your plots. 
\nBecause only the data already being published as a plot is saved, there should be fewer privacy or legal complications.\n\n## Installation\n\nYou can install the development version of ggdatasaver like so:\n\n``` r\nremotes::install_github(\"eliocamp/ggdatasaver\")\n```\n\n## Example\n\nggdatasaver works automatically with knitr. \nThe only thing you need to do is to define the directory where the data is saved with\n\n```{r}\nggdatasaver::save_plot_data_in(\"plot-data\")\n```\n\nThen, just create your ggplot2 figures as always. \nUsing a chunk label is encouraged because it will be used to name the file. \n\n```{r mpg, fig.alt = \"Scatterplot of mpg vs disp with a fitted smooth line showing a decreasing relationship.\"}\nlibrary(ggplot2)\n\nggplot(mtcars, aes(mpg, disp)) +\n  geom_point() +\n  geom_smooth()\n```\n\n\nAfter you knit, you will have a (possibly new) directory containing a zip file with the data of each plot. \n\n```{r}\nfs::dir_tree(\"plot-data\")\n```\n\nInside each zip file there will be a csv file for each layer. \n\n```{r}\n# Unzip the contents of mpg-1.zip into a temporary directory. \ndir \u003c- file.path(tempdir(), \"mpg\")\nutils::unzip(\"plot-data/mpg-1.zip\", exdir = dir)\nfs::dir_tree(dir)\n```\n\nThe data saved for each layer is only what is used to draw the geometry. \nFor example, GeomSmooth.csv has the coordinates of the fit and some other aesthetic information.\n\n```{r}\nsmooth \u003c- read.csv(file.path(dir, \"GeomSmooth.csv\"))\nknitr::kable(head(smooth))\n```\n\nAnd the line can be reconstructed exactly from these data. 
\n\n```{r, plot_data_dir = NULL, fig.alt = \"The same figure from before but only the smooth fit.\"}\nggplot(smooth, aes(x, y)) + \n  geom_ribbon(aes(ymin = ymin, ymax = ymax, fill = I(fill), alpha = I(alpha))) +\n  geom_line(aes(colour = I(colour), size = I(size)))\n```\n\n(Setting `plot_data_dir` to `NULL` will suppress data-saving for that chunk.)\n\nAs you can see, only the coordinates of each geom are saved, not the underlying data. \nFor a more dramatic example, take this contour plot of the Old Faithful Geyser Data.\n\n\n```{r faithful-density, fig.alt = \"2D density contours of eruptions vs. waiting showing two distinct areas of high density, one centered at ~4.5 eruptions and ~80 waiting and one at 2 eruptions and 55 waiting.\"}\nggplot(faithful, aes(x = eruptions, y = waiting)) +\n  geom_density_2d()\n```\n\n(Now there are two zip files in the `plot-data` directory\n```{r}\nfs::dir_tree(\"plot-data\")\n```\n.)\n\nggdatasaver will save the coordinates that define the contours, not the observations from which they were computed. \n\n```{r, plot_data_dir = NULL, fig.alt = \"The same plot from before.\"}\ndir \u003c- file.path(tempdir(), \"faithful-density\")\nutils::unzip(\"plot-data/faithful-density-1.zip\", exdir = dir)\n\ndensity \u003c- read.csv(file.path(dir, \"GeomDensity2d.csv\"))\n\nggplot(density, aes(x, y)) +\n  geom_path(aes(group = group))\n```\n\nThis makes it safe to share these data, as it doesn't include any more information than what's in the plot you are already sharing. 
\n\n\nThe panel specification of each plot is saved in layout.csv, which holds the location (ROW and COLumn) information of each panel as well as the value of the variables \n\n```{r mpg-facets, fig.alt = \"Scatterplot of displ vs cty with 12 panels organised in 3 rows and 4 columns according to the values of drv and cyl.\"}\nggplot(mpg, aes(displ, cty)) +\n  geom_point() +\n  facet_grid(drv ~ cyl)\n```\n\n```{r}\ndir \u003c- file.path(tempdir(), \"mpg-facets\")\nutils::unzip(\"plot-data/mpg-facets-1.zip\", exdir = dir)\n\nlayout \u003c- read.csv(file.path(dir, \"layout.csv\"))\n\nhead(layout)\n```\n\n\n```{r, include = FALSE}\nunlink(\"plot-data\", recursive = TRUE)\n```\n\n## Use cases\n\n### Accessibility\n\nAcademic journals almost never have any infrastructure that allows for alt text for figures. \nFor blind people, having access to the raw data is better than nothing. \n\nWith the data they could print a tactile version (for simple plots), compute statistics to get a better sense of the relationships, or just read the raw data. \nFor fitted curves, which usually are not adequately described in text, they could get the data, fit the curve and read the curve parameters. \n\n### Reproducibility\n\nAn important aspect of reproducibility is having access to data, but this is easier said than done. \nHuge data is expensive to store and serve, and many types of data carry privacy concerns (such as patient data) or licencing issues (like secret data). \nAnother barrier to data sharing is organising it in a useful way (see [The Turing Way's Guide to Reproducible Research](https://the-turing-way.netlify.app/reproducible-research/open/open-data.html#barriers-to-data-sharing)). \n\nWhile not perfect, sharing the small snippets of data that are the coordinates of plot geometries can be a good compromise. \nThese data are generally small and already in a tabular format, so it's technically easy to share in a repository or as supplemental material. 
\nAnd because it is data that is already implicitly shared as an image, it doesn't carry privacy and licencing concerns. \n(I'm not a lawyer, so don't take that as legal advice.)\n\nAnd even when the raw data is shared, also sharing the plot data can be useful for researchers who want to reproduce or reanalyse small chunks of your results but don't want to, or can't, download the original data and run the code. \n\n## Limitations\n\nggdatasaver has only been tested on simple plots, although there's no reason it shouldn't work with more complicated ones. \n[patchwork](https://patchwork.data-imaginist.com/) is supported but not [cowplot](https://wilkelab.org/cowplot/). \n\nWhen using ggdatasaver, plots are built twice: once when saving the data and once when drawing the plot. \nThis shouldn't be an issue most of the time unless your plot requires heavy computation.\n\nOnly data from ggplot2 plots are exported.\nBase plots or lattice plots are not supported, only because I don't know how to go about it. \nIf you have any idea of how to implement ggdatasaver for base plots, open an issue and let's talk about it!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feliocamp%2Fggdatasaver","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feliocamp%2Fggdatasaver","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feliocamp%2Fggdatasaver/lists"}