{"id":14063274,"url":"https://github.com/karthik/rstudio2019","last_synced_at":"2025-12-29T13:04:09.637Z","repository":{"id":144899924,"uuid":"165291841","full_name":"karthik/rstudio2019","owner":"karthik","description":"Resources from my Rstudio::conf 2019 talk","archived":false,"fork":false,"pushed_at":"2019-01-30T18:38:38.000Z","size":5312,"stargazers_count":217,"open_issues_count":0,"forks_count":23,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-01-21T03:08:32.027Z","etag":null,"topics":["rstudioconf","talks"],"latest_commit_sha":null,"homepage":"http://inundata.org/talks/rstd19/#/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/karthik.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2019-01-11T18:37:07.000Z","updated_at":"2025-01-18T08:25:03.000Z","dependencies_parsed_at":null,"dependency_job_id":"3ef68d4e-ff34-43b7-ae86-9edd40b20bb8","html_url":"https://github.com/karthik/rstudio2019","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karthik%2Frstudio2019","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karthik%2Frstudio2019/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karthik%2Frstudio2019/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karthik%2Frstudio2019/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/karthik","download_url":"https://codeload.github.com/karthik/rstudio2019
/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243545778,"owners_count":20308398,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["rstudioconf","talks"],"created_at":"2024-08-13T07:03:14.305Z","updated_at":"2025-12-29T13:04:04.598Z","avatar_url":"https://github.com/karthik.png","language":null,"funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"# How To Make Your Data Analysis Notebooks More Reproducible\n\n[![rstudio_talk_slides](https://i.imgur.com/fYGze6k.png)](http://inundata.org/talks/rstd19/#/)\n\n[Slide deck](http://inundata.org/talks/rstd19/#/) | Slide deck as [PDF](https://github.com/karthik/rstudio2019/blob/master/reproducible-data-analysis.pdf) \n\n[🎥 Video of talk at rstudio::conf(2019)](https://resources.rstudio.com/rstudio-conf-2019/a-guide-to-modern-reproducible-data-science-with-r) \n\n## Resources\nI have included a handful of links to papers, software packages and tutorials/manuals about some tools I mention in my talk. Pull requests or issues on additional ones to include are welcome.\n\n### Research Compendia\n\n- [Statistical Analysis and reproducible research ](https://biostats.bepress.com/bioconductor/paper2/)\n- [Packaging Data Analytical Work Reproducibly Using R (and Friends)](https://www.tandfonline.com/doi/abs/10.1080/00031305.2017.1375986) ([OA preprint](https://peerj.com/preprints/3192/)). A practical introduction to setting up a research compendium in R. 
\n- [The rOpenSci reproducibility guide](https://ropensci.github.io/reproducibility-guide/) *Slightly dated but still very useful*\n\n**Examples of Research Compendia on GitHub**\nBelow are a few links to real-world examples of research compendia in R. To have a minimal compendium, all you really need is a valid [`DESCRIPTION`](https://github.com/boettiger-lab/pomdp-intro/blob/master/DESCRIPTION) file containing a handful of fields such as type, name, version and dependencies. See Marwick et al. 2017 for a detailed description of the different types of compendia.\n\n**Small**\n- [Code and data associated with Duffy, James, and Longworth Applied and Environmental Microbiology paper describing the ecology, virulence, and phylogeny of a brood parasite of Daphnia, Blastulidium paedophthorum](https://github.com/duffymeg/BroodParasiteDescription)\n \n**Medium**\n- [Resolving the measurement uncertainty paradox in ecological management](https://github.com/boettiger-lab/pomdp-intro)\n\n**Large**\n\n- [Non-parametric Bayesian Inference for Conservation Decisions](https://github.com/cboettig/nonparametric-bayes)\n\n- Find various other compendia on [GitHub](https://github.com/topics/research-compendium) and [Zenodo](https://zenodo.org/communities/research-compendium?page=1\u0026size=20) using the `research-compendium` tag.\n\n**Software packages related to research compendia**\n\n- 📦 [`rrtools`](https://github.com/benmarwick/rrtools)  by Ben Marwick (also the author of the packaging data analysis paper mentioned above) *extends functions in `devtools` and provides instructions, templates, and functions to make a basic compendium suitable for doing reproducible research with R.* \n\t- Also see 📦 [workflowr](https://jdblischak.github.io/workflowr/) by John Blischak and the [task view](https://github.com/jdblischak/ctv-project-workflows) on R-based data analysis projects maintained by John Blischak, Anna Krystalli, Ben Marwick, and Daniel Nüst.\n- 📦 
[`usethis`](https://github.com/r-lib/usethis) *Many of the major functions in `rrtools` are imported from `usethis`. A savvy user can get by setting up and maintaining a compendium purely with `usethis` functions.*\n- 📦 [`goodpractice`](https://github.com/MangoTheCat/goodpractice) - Designed to help you build more robust packages, the package does a deep dive on your package contents, provides advice on syntax pitfalls to avoid and code formatting suggestions, and helps you improve overall package structure.\n- The 📦 [`rticles`](https://github.com/rstudio/rticles) package by JJ has numerous journal templates and can be used together with RStudio addins like [`wordcountaddin`](https://github.com/benmarwick/wordcountaddin) and [`citr`](https://github.com/crsh/citr) + [`knitcitations`](https://github.com/cboettig/knitcitations).\n \n\n### 📈 Data management \n\n- 📦 [`piggyback`](https://github.com/ropensci/piggyback), [[docs]](https://ropensci.github.io/piggyback/):  This clever R package allows you to attach arbitrary data (or other) files (up to 2 GB each) to a GitHub release. Given GitHub's fast [CDN](https://en.wikipedia.org/wiki/Content_delivery_network), this is an easy way to attach large files to a compendium and read them back in a local/collaborator/remote environment. As always, be sure to archive a long-term copy on [Zenodo](https://zenodo.org/).\n- 📦 [`arkdb`](https://github.com/ropensci/arkdb) [[docs]](https://ropensci.github.io/arkdb/):  This package allows you to archive and unarchive databases as flat text files.\n-  🎥 For more on setting up data packages, see this [excellent talk by Noam Ross](https://www.youtube.com/watch?v=zsEsh5QpN0U) at New York R.\n\n### Computational environments: Binder and friends\n\n- [My Binder](https://mybinder.org/) is a free binderhub deployment that turns any Git repo into a collection of interactive notebooks. 
Now with better R support!\n- For instructions on how to set this up for your R project, see [my notes here](https://github.com/karthik/rstudio2019/blob/master/binder-notes.md)\n- [Introducing Binder 2.0 — share your interactive research environment](https://elifesciences.org/labs/8653a61d/introducing-binder-2-0-share-your-interactive-research-environment) A paper describing the architecture of Binder, in case you are interested in what happens under the hood.\n- 🎥 [A talk about Binder at Scipy 2018](https://www.youtube.com/watch?v=KcC0W5LP9GM). Also see [conference proceedings PDF](http://conference.scipy.org/proceedings/scipy2018/pdfs/project_jupyter.pdf).\n- [`repo2docker`](https://github.com/jupyter/repo2docker) A Python module that will turn any repo (or local folder) into a Docker image.  \n\n**Other hosted Binder hubs**\n\n- [Pangeo binder](https://binder.pangeo.io/) *Pangeo encourages everyone to use it.*\n- [gesis](https://notebooks.gesis.org/)\n- [Syzygy](http://syzygy.ca/) *Binder + JupyterHub for Compute Canada*\n\n**Setting up Binder for your analysis**\n\nI have captured all the various ways to set up mybinder with an R project in a [separate document](binder-notes.md). \n\nAre you interested in setting up or hosting a binderhub for the R community? Get in touch via the issues.\n\n\n**Also see**\n- [Whole Tale](https://wholetale.org/) \n- [Computing environments for reproducibility: Capturing the “Whole Tale”](https://www.sciencedirect.com/science/article/pii/S0167739X17310695) - OA paper describing the Whole Tale project.\n- [Code Ocean](https://codeocean.com/) - A commercial, black-box, full-stack service that will accomplish something similar to the above two projects. Code Ocean links will likely start appearing in papers soon.\n\n\n**Software packages related to setting up computational environments**\n\n- 📦 [`containerit`](https://github.com/o2r-project/containerit). 
[Detailed blog post](https://o2r.info/2017/05/30/containerit-package/) This sweet package will generate a Dockerfile for you by examining the code inside a folder or just from your session info. This is analogous to `repo2docker` but is very R-centric.\n- [`stevedore`](https://github.com/richfitz/stevedore) Although there are a few docker clients (docker, harbor), this is my recommendation for managing docker containers from inside R. \n\n\n### 🔨 Workflows: drake and friends\n\n- 📦 [`drake`](https://github.com/ropensci/drake) - An R-focused pipeline toolkit for reproducibility and high-performance computing. Install the package from here or CRAN.\n- [The prequel to the drake R package](https://ropensci.org/blog/2018/02/06/drake/) *A blog post by the creator of drake describing his motivation for the package.*\n- [drake manual](https://ropenscilabs.github.io/drake-manual/) A detailed `bookdown` guide on how to set up and use drake for projects of varying levels of complexity.\n- [Presentation on drake](https://wlandau.github.io/drake-datafest-2019/#/) Slides from a talk by Will Landau (who is here at the conference so go pick his brain if you want to learn more!)\n\n**Real-world drake examples**\n- [Pathogen modeling study](https://github.com/pat-s/pathogen-modeling)\n\n**Miscellaneous**\n- IKEA diagram inspired by [IDEA instructions](https://idea-instructions.com/)\n\n---\n\n### Acknowledgments\n\nMany thanks to [Chris Holdgraf](https://bids.berkeley.edu/people/chris-holdgraf), [Carl Boettiger](https://www.carlboettiger.info/), [Will Landau](https://wlandau.github.io/), and [Ben Marwick](http://faculty.washington.edu/bmarwick/) for various discussions on these topics. Also thanks to Ciera Martinez, Kara Woo, and Nick Tierney for comments on the presentation. 
\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkarthik%2Frstudio2019","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkarthik%2Frstudio2019","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkarthik%2Frstudio2019/lists"}