https://github.com/leipzig/awesome-reproducible-research

        

# Awesome Reproducible Research [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3564746.svg)](https://doi.org/10.5281/zenodo.3564746)

> A curated list of reproducible research case studies, projects, tutorials, and media

## Contents

- [Case studies](#case-studies)
- [Ad-hoc reproductions](#ad-hoc-reproductions)
- [Theory papers](#theory-papers)
- [Theses and dissertations](#theses-and-dissertations)
- [Tool reviews](#tool-reviews)
- [Courses](#courses)
- [Development Resources](#development-resources)
- [Literature tools](#literature-tools)
- [Scientific Data Management Systems](#scientific-data-management-systems)
- [Books](#books)
- [Databases](#databases)
- [Data Repositories](#data-repositories)
- [Exemplar Portals](#exemplar-portals)
- [Runnable Papers](#runnable-papers)
- [Journals](#journals)
- [Ontologies](#ontologies)
- [Minimal Standards](#minimal-standards)
- [Organizations](#organizations)
- [Awesome Lists](#awesome-lists)

## Case studies
The term "case studies" is used here in a general sense to describe any study of reproducibility. A _reproduction_ is an attempt to arrive at comparable results with identical data using the computational methods described in a paper. A _refactor_ restructures existing code to follow frameworks and other reproducibility best practices while preserving the original data. A _replication_ involves generating new data and applying existing methods to achieve comparable results. A _robustness test_ applies various protocols, workflows, statistical models or parameters to a given data set to study their effect on results, either as a follow-up to an existing study or as a "bake-off". A _census_ is a high-level tabulation conducted by a third party. A _survey_ is a questionnaire sent to practitioners. A _case narrative_ is an in-depth first-person account. An _independent discussion_ uses a second, independent author to interpret the results of a study as a means to improve inferential reproducibility.
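
The data and method distinctions above can be read as a small decision rule. The sketch below is illustrative only: `Attempt` and `classify` are hypothetical names, and the rule covers just the four computational categories (reproduction, refactor, replication, robustness test). Censuses, surveys, case narratives and independent discussions are study designs rather than data/method combinations, so they are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """Hypothetical description of one reproducibility study (illustrative only)."""
    same_data: bool               # reuses the original study's data
    same_methods: bool            # applies the computational methods described in the paper
    refactors_code: bool = False  # restructures the original code into frameworks/best practices
    varies_methods: bool = False  # deliberately swaps protocols, workflows, models or parameters


def classify(a: Attempt) -> str:
    """Map an attempt onto this list's terms for computational case studies."""
    if a.same_data and a.varies_methods:
        return "robustness test"   # given data set, different methods or parameters
    if a.same_data and a.refactors_code:
        return "refactor"          # original data, code reorganized
    if a.same_data and a.same_methods:
        return "reproduction"      # identical data, methods as described
    if not a.same_data and a.same_methods:
        return "replication"       # new data, existing methods
    return "unclassified"


print(classify(Attempt(same_data=False, same_methods=True)))  # replication
```

The order of the checks matters: a study that reuses the data but varies the methods is a robustness test even if it also follows the original methods for a baseline.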





| Study | Field | Approach | Size |
|---|---|---|---|
| Glasziou et al 2008 | Medicine | Census | 80 studies |
| Baggerly & Coombes 2009 | Cancer biology | Refactor | 8 studies |
| Hothorn et al. 2009 | Biostatistics | Census | 56 studies |
| Ioannidis et al 2009 | Genetics | Reproduction | 18 studies |
| Anda et al 2009 | Software engineering | Replication | 4 companies |
| Vandewalle et al 2009 | Signal processing | Census | 134 papers |
| Prinz 2011 | Biomedical sciences | Survey | 23 PIs |
| Hothorn & Leisch 2011 | Bioinformatics | Census | 100 studies |
| Begley & Ellis 2012 | Cancer biology | Replication | 53 studies |
| Collberg et al 2014; Collberg & Proebsting 2016 | Computer science | Census | 613 papers |
| OSC 2015 | Psychology | Replication | 100 studies |
| Bandrowski et al 2015 | Biomedical sciences | Census | 100 papers |
| Patel et al 2015 | Epidemiology | Robustness test | 417 variables |
| Chang et al 2015 | Economics | Reproduction | 67 papers |
| Iqbal et al 2016 | Biomedical sciences | Census | 441 papers |
| Baker 2016 | Science | Survey | 1,576 researchers |
| Névéol et al 2016 | NLP | Replication | 3 studies |
| Reproducibility Project 2017 | Cancer biology | Replication | 9 studies |
| Vasilevsky et al 2017 | Biomedical sciences | Census | 318 journals |
| Kitzes et al 2017 | Science | Case narrative | 31 PIs |
| Barone et al 2017 | Biological sciences | Survey | 704 PIs |
| Kim & Dumas 2017 | Bioinformatics | Refactor | 1 study |
| Camerer 2017 | Economics | Replication | 18 studies |
| Olorisade 2017 | Machine learning | Census | 30 studies |
| Strupler & Wilkinson 2017 | Archaeology | Case narrative | 1 survey |
| Danchev et al 2017 | Comparative toxicogenomics | Census | 51,292 claims in 3,363 papers |
| Kjensmo & Gundersen 2018 | Artificial intelligence | Census | 400 papers |
| Gertler et al 2018 | Economics | Census | 203 papers |
| Stodden et al 2018 | Computational science | Reproduction | 204 papers, 180 authors |
| Madduri et al 2018 | Genomics | Case narrative | 1 study |
| Camerer et al 2018 | Social sciences | Replication | 21 papers |
| Silberzahn et al 2018 | Psychology | Robustness test | One data set, 29 analyst teams |
| Boulesteix et al 2018 | Medicine and health sciences | Census | 30 papers |
| Eaton et al 2018 | Microbiome immuno oncology | Replication | 1 paper |
| Vaquero-Garcia et al 2018 | Bioinformatics | Refactor and test of robustness | 1 paper |
| Wallach et al 2018 | Biomedical Sciences | Census | 149 papers |
| Miller et al 2018 | Bioinformatics | Synthetic replication & refactor | 1 paper |
| Konkol et al 2018 | Geosciences | Survey, Reproduction | 146 scientists, 41 papers |
| Rahtz 2018 | Reinforcement Learning | Reproduction, case narrative | 1 paper |
| Stodden et al 2018 | Computational physics | Census | 306 papers |
| AlNoamany & Borghi 2018 | Science & Engineering | Survey | 215 participants |
| Li et al 2018 | Nephrology | Robustness test | 1 paper |
| Chen 2018 | Social sciences & other | Census | 810 Dataverse studies |
| Trisovic et al 2021 | Social sciences & other | Census, Survey | 2109 replication datasets |
| Nüst et al 2018 | GIScience/Geoinformatics | Census, Survey | 32 papers, 22 participants |
| Raman et al 2018 | Genomics | Robustness test | 8 studies |
| Stagge et al 2019 | Geosciences | Survey | 360 papers |
| Bizzego et al 2019 | Deep learning | Robustness test | 1 analysis |
| Madduri et al 2019 | Genomics | Case narrative | 1 analysis |
| Mammoliti et al 2019 | Pharmacogenomics | Case narrative | 2 analyses |
| Allen & Mehler 2019 | Biomedical sciences and Psychology | Census | 127 registered reports |
| Pimentel et al 2019 | All | Census | 1,159,166 Jupyter notebooks |
| Fergusson et al 2019 | Virology | Census | 236 papers |
| Vlisides et al 2019; Sieber et al 2019 | Anaesthesia | Independent discussion | 1 study |
| Bakker et al 2019 | Psychology | Replication | 1 paper |
| Niepel et al 2019 | Cell pharmacology | Robustness test | 5 labs |
| Dacrema et al 2019 | Machine learning | Reproduction | 18 conference papers |
| Eran et al 2019 | Experimental archaeology | Replication | 1 theory |
| Rauh et al 2019 | Neurology | Census | 202 papers |
| Sætrevik & Sjåstad 2019 | Psychology | Replication | 2 experiments |
| Feng et al. 2019 | Ecology and Evolution | Census | 163 papers |
| Botvinik-Nezer et al. 2019 | Neuroimaging | Robustness test | 1 data set, 70 teams |
| Klein et al. 2019 | Psychology | Replication | 1 experiment, 21 labs, 2,220 participants |
| Obels et al. 2019 | Psychology | Census | 62 papers |
| Wayant et al 2019 | Oncology | Census | 154 meta-analyses |
| Simoneau et al. 2020 | Bioinformatics | Robustness test | 1 data set |
| Miyakawa 2020 | Neurobiology | Census | 41 papers |
| Thelwall et al 2020 | Genetics | Census | 1799 papers |
| Maassen et al 2020 | Psychology | Reproduction | 33 meta-analyses |
| Riedel et al 2020 | Biomedical science | Census | 792 papers |
| Culina et al 2020 | Ecology | Census | 346 papers |
| Clementi & Barba 2020 | Physics | Replication | 2 papers |
| Kemper et al 2020 | Reproductive endocrinology | Census | 222 papers |
| Marqués et al 2020 | Biomedical sciences | Census | 240 papers |
| Janssen et al 2020 | Environmental Modelling | Census | 7500 papers |
| Anderson et al 2020 | Cardiology | Census | 532 papers |
| Ostermann et al 2021 | GIS | Census | 75 papers |
| Samota & Davey 2020 | Life Sciences | Survey | 251 researchers |
| Bedford & Tzovaras 2020 | Genetics | Robustness test | 1 paper |
| Krassowski et al 2020 (repo) | Life Sciences | Census | 3377 articles |
| Boudreau et al 2021 | Computational Biology | Census | 622 papers |
| Heumos et al 2021 | Computational Biology | Robustness test | 6 studies |
| Hrynaszkiewicz et al 2021 | Computational Biology | Survey | 214 researchers |
| Päll et al 2021 | Differential expression | Census | 2109 GEO submissions |
| Wijesooriya et al 2021 | Computational biology | Census | 186 papers |
| Weisberg et al 2021 | Psychology | Robustness test | 1 study |
| Vanderaa & Gatto 2021 | Proteomics | Refactor | 1 analysis |
| Breznau et al 2021 | Social science | Robustness test | 73 teams |
| Roberts et al 2021 | Radiology | Census | 62 studies |
| McDermott et al 2021 | Clinical ML | Census | 511 papers |
| Tedersoo et al 2021 | 9 Fields | Census | 875 articles |
| Gabelica et al 2022 | Life Sciences | Census | 3556 papers |
| Samuel & Mietchen 2022 | Biomedical Sciences | Census | 9625 Jupyter notebooks |
| Zaorsky et al 2022 | Radiation oncology | Robustness test | 300k models |
| Kohrt et al 2022 | Behavioral sciences | Refactor | One study |
| Hamilton et al 2022 | Cancer biology | Census | 306 papers |
| Motoki & Iseki 2022 | Marketing | Replication | 10 papers |
| Gihawi et al 2023 | Bioinformatics | Refactor | 1 paper |
| Gould et al 2023 | Ecology | Robustness test | 2 datasets, 174 teams |
| Protzko et al 2023 | Psychology | Replication | 16 findings |
| Bochynska et al 2023 | Linguistics | Census | 600 articles |
| Kambouris et al 2024 | Ecology | Census | 177 papers |
| Standvoss et al 2024 | Biology | Census | 750 papers |
| Brodeur et al 2024 | Economics | Robustness test | 110 papers |






## Ad-hoc reproductions

These are one-off, unpublished attempts to reproduce individual studies.





| Reproduction | Original study |
|---|---|
| https://rdoodles.rbind.io/2019/06/reanalyzing-data-from-human-gut-microbiota-from-autism-spectrum-disorder-promote-behavioral-symptoms-in-mice/ and https://notstatschat.rbind.io/2019/06/16/analysing-the-mouse-autism-data/ | Sharon, G. et al. Human Gut Microbiota from Autism Spectrum Disorder Promote Behavioral Symptoms in Mice. Cell 2019, 177 (6), 1600–1618.e17. |
| https://github.com/sean-harrison-bristol/CCR5_replication | Wei, X.; Nielsen, R. CCR5-∆32 Is Deleterious in the Homozygous State in Humans. Nat. Med. 2019 DOI: 10.1038/s41591-019-0459-6. (retracted) |
| https://github.com/leipzig/placenta | Leiby et al "Lack of detection of a human placenta microbiome in samples from preterm and term deliveries" https://doi.org/10.1186/s40168-018-0575-4 |
| Heilbut et al "Rigor and Replication in Alzheimer’s Therapeutic Development: A Case Study" | Wang et al "Retraction: High-Affinity Naloxone Binding to Filamin A Prevents Mu Opioid Receptor–Gs Coupling Underlying Opioid Tolerance and Dependence" |






## Theory papers





| Authors/Date | Title | Field | Type |
|---|---|---|---|
| Ioannidis 2005 | Why most published research findings are false | Science | Statistical reproducibility |
| Noble 2009 | A Quick Guide to Organizing Computational Biology Projects | Bioinformatics | Best practices |
| Sandve et al 2013 | Ten Simple Rules for Reproducible Computational Research | Computational science | Best practices |
| Freedman et al 2015 | The Economics of Reproducibility in Preclinical Research | Preclinical research | Best practices |
| Yarkoni 2019 | The Generalizability Crisis | Psychology | Statistical reproducibility |
| Bouthillier et al 2019 | Unreproducible Research is Reproducible | Machine Learning | Methodology |
| Milton & Possolo 2019 | Trustworthy data underpin reproducible research | Physics | Scientific philosophy |
| Devezer et al 2019 | Scientific discovery in a model-centric framework: Reproducibility, innovation, and epistemic diversity | Science | Statistical reproducibility |
| Tierney et al 2020 | A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility | Science | Best practices |
| Haibe-Kains et al 2020 | The importance of transparency and reproducibility in artificial intelligence research | Artificial Intelligence | Critique |
| Nosek & Errington 2020 | What is replication? | Science | Scientific philosophy |
| Alston & Rick 2020 | A Beginner’s Guide to Conducting Reproducible Research | Ecology | Best practices |
| Hejblum et al 2020 | Realistic and Robust Reproducible Research for Biostatistics | Biostatistics | Best practices |
| Pawlik et al 2019 | A Link is not Enough – Reproducibility of Data | Databases | Best practices |
| Schriml et al 2020 | COVID-19 pandemic reveals the peril of ignoring metadata standards | Virology | Critique |
| Stoudt et al 2020 | Principles for data analysis workflows | Data science | Best practices |
| Peng & Hicks 2020 | Reproducible Research: A Retrospective | Public health | Review |
| Reiter et al 2020 | Streamlining Data-Intensive Biology With Workflow Systems | Biology | Best practices |
| Ulrich & Miller 2020 | Meta Research: Questionable research practices may have little effect on replicability | Science | Statistical reproducibility |
| Kasif & Roberts 2020 | We need to keep a reproducible trace of facts, predictions, and hypotheses from gene to function in the era of big data | Functional genomics | Critique |
| Raman 2021 | A research parasite's perspective on establishing a baseline to avoid errors in secondary analyses | Science | Best practices |
| Hoffmann et al 2021 | The multiplicity of analysis strategies jeopardizes replicability: lessons learned across disciplines | Science | Critique |
| Rosenberg et al 2020 | Reproducible Results Policy | Water Resources | Policy |
| Clary et al 2022 | 10 Things for Curating Reproducible and FAIR Research | Social sciences | Best practices |
| Mongan et al 2020 | Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A Guide for Authors and Reviewers | Medical imaging | Best practices |
| Orzechowski & Moore 2022 | Generative and reproducible benchmarks for comprehensive evaluation of machine learning classifiers | Machine Learning | Best practices |
| Ziemann et al 2023 | The five pillars of computational reproducibility: Bioinformatics and beyond | Bioinformatics | Best practices |
| Stefan & Schönbrodt 2023 | Big little lies: a compendium and simulation of p-hacking strategies | Research | Statistical reproducibility |
| Reinagel 2023 | Is N-Hacking Ever OK? The consequences of collecting more data in pursuit of statistical significance | Biology | Statistical reproducibility |
| Abdill et al 2024 | A how-to guide for code-sharing in biology | Biology | Best practices |
| Hassan et al 2024 | Characterising Reproducibility Debt in Scientific Software: A Systematic Literature Review | Reproducible Research | Review of reviews |





## Theses and dissertations





| Authors/Date | Title | Institution |
|---|---|---|
| Pham, Quan 2014 | A Framework for Reproducible Computational Research | University of Chicago |
| Wallach, Joshua 2016 | Reproducible Research Practices, Scientific Transparency, and Subgroup Claims: A Meta-Research Dissertation | Stanford University |
| Konkol, Markus 2019 | Publishing Reproducible Geoscientific Papers: Status quo, benefits, and opportunities | University of Münster |
| Trisovic, Ana 2018 | Data preservation and reproducibility at the LHCb experiment at CERN | University of Cambridge |
| Feger, Sebastian 2020 | Interactive Tools for Reproducible Science – Understanding, Supporting, and Motivating Reproducible Science Practices | University of Munich |
| Leipzig, Jeremy 2021 | Tests of Robustness in Peer Review | Drexel University |
| Nüst, Daniel 2022 | Infrastructures and Practices for Reproducible Research in Geography, Geosciences, and GIScience | University of Münster |
| Melcher, Wiebke 2019 | Free will in psychological research: considerations on methodic procedure and reproducibility of results | Leuphana University |
| Abang Ibrahim, Dayang 2016 | The exploitation of provenance and versioning in the reproduction of e-experiments | University of Newcastle Upon Tyne |
| Henderson, Peter 2018 | Reproducibility and Reusability in Deep Reinforcement Learning | McGill University |
| Drimer-Batca, Daniel 2018 | Reproducibility Crisis in Science: Causes and Possible Solutions | Boston University |
| Matheson, Granville 2018 | Reliability, Replicability and Reproducibility in PET Imaging | Karolinska Institutet |
| Patil, Prasad 2016 | Assessing reproducibility and value in genomic signatures | The Johns Hopkins University |
| Ahmad, MKH 2016 | Scientific workflow execution reproducibility using cloud-aware provenance | University of the West of England, Bristol |
| Samuel, Sheeba 2019 | A provenance-based semantic approach to support understandability, reproducibility, and reuse of scientific experiments | Friedrich-Schiller-Universität Jena |
| Wayant, Christian Cole 2021 | Rigor and reproducibility of cancer medicine evidence | Oklahoma State University |
| Vilaró Pacheco, Marta 2021 | Long-term impact of an editorial intervention to improve paper transparency and reproducibility | Universitat Politècnica de Catalunya |
| Sebastian, Aswathy 2023 | Advancing Genomic and Transcriptomic Knowledge Through Reproducible Bioinformatics Workflows | Penn State University |





## Tool reviews





| Authors/Date | Title | Tools |
|---|---|---|
| Isdahl & Gundersen 2019 | Out-of-the-box Reproducibility: A Survey of Machine Learning Platforms | MLflow, Polyaxon, StudioML, Kubeflow, CometML, Sagemaker, GCPML, AzureML, Floydhub, BEAT, Codalab, Kaggle |
| Pimentel et al 2019 | A Survey on Collecting, Managing, and Analyzing Provenance from Scripts | Astro-Wise, CPL, CXXR, Datatrack, ES3, ESSW, IncPy, Lancet, Magni, noWorkflow, Provenance Curious, pypet, RDataTracker, Sacred, SisGExp, SPADE, StarFlow, Sumatra, Variolite, VCR, versuchung, WISE, YesWorkflow |
| Leipzig et al 2021 (supplemental) | The Role of Metadata in Reproducible Computational Research | CellML, CIF2, DATS, DICOM, EML, FAANG, GBIF, GO, ISO/TC 276, MIAME, NetCDF, OGC, ThermoML, CRAN, Conda, pip setup.cfg, EDAM, CodeMeta, Biotoolsxsd, DOAP, ontosoft, SWO, OBCS, STATO, SDMX, DDI, MEX, MLSchema, MLFlow, Rmd, CWL, CWLProv, RO-Crate, RO, WICUS, OPM, PROV-O, ReproZip, ProvOne, WES, BagIt, BCO, ERC, BEL, DC, JATS, ONIX, MeSH, LCSH, MP, Open PHACTS, SWAN, SPAR, PWO, PAV, Manubot, ReScience, PandocScholar |
| Konkol, Nüst & Goulier 2020 | Publishing computational research - a review of infrastructures for reproducible and transparent scholarly communication | Authorea, Binder, CodeOcean, eLife RDS, Galaxy Project, Gigantum, Manuscript, o2r, REANA, ReproZip, Whole Tale |





## Courses
- MOOCs
  - [Coursera Reproducible Research](https://www.coursera.org/learn/reproducible-research) - Roger Peng et al JHU. Very popular course.
  - [edX Principles, Statistical and Computational Tools for Reproducible Science](https://www.edx.org/course/principles-statistical-computational-harvardx-ph527x) - John Quackenbush et al Harvard
  - [Reproducible research: methodological principles for transparent science](https://www.fun-mooc.fr/en/courses/reproducible-research-methodological-principles-transparent-scie/) - Beginner level. Note taking, version control, notebooks, reproducible data analysis. Bilingual English/French.
- Online course content
  - [Tools for Reproducible Research](http://kbroman.org/Tools4RR/) - Karl Broman UW, includes resources page
  - [R for Reproducible Scientific Analysis](https://swcarpentry.github.io/r-novice-gapminder/) - Software Carpentry workshop primer using Gapminder data
  - [R-DAVIS](https://gge-ucd.github.io/R-DAVIS/syllabus.html) - Student-developed computer literacy and data course in R
  - [AMIA2019](https://github.com/StatTag/amia-2019-spring-rr/) - Pragmatic RR for Analysis, Dissemination and Publication
  - [PSU-PSY525](https://github.com/psu-psychology/psy-525-reproducible-research-2020) - Transparent, Open, and Reproducible Research Practices in the Social and Behavioral Sciences
  - [Monash-RRR](https://monashdatafluency.github.io/r-rep-res/) - Reproducible Research in R workshop tutorial
  - [OSU-OSRR](https://github.com/cbahlai/OSRR_course) - An open science and reproducible research course targeted at organismal ecologists
  - [Reproducible-Science-Curriculum](https://github.com/Reproducible-Science-Curriculum) - A curriculum for teaching reproducible computational science bootcamps

## Development Resources
- R
  - [CRAN Task View - Reproducible Research](https://cran.r-project.org/web/views/ReproducibleResearch.html) - packages relevant to reproducible computational research (RCR) in R
  - [liftr](https://liftr.me/) - persistent reproducible reporting through containerized R Markdown documents
  - [repo](https://github.com/franapoli/repo) - provenance framework package
  - [orderly](https://vimc.github.io/orderly/articles/orderly.html) - R package that automates writing reproducible analyses
- Linux-related (polyglot)
  - [Reproducible Builds](https://reproducible-builds.org/) - a set of software development practices that create an independently-verifiable path from source to binary code
- Python
  - [mlf-core](https://mlf-core.com) - Framework for developing machine learning models that run deterministically on GPUs, with PyTorch, TensorFlow and XGBoost
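
Reproducible Builds and mlf-core share one idea: rerunning the same inputs should produce bit-identical outputs, which anyone can verify independently by comparing cryptographic hashes. The stdlib-only sketch below illustrates that check on a toy seeded bootstrap. `run_analysis` and `digest` are hypothetical names, this is not mlf-core's API, and real GPU determinism also depends on framework-specific settings that are not shown here.

```python
import hashlib
import json
import random


def run_analysis(seed: int) -> list[float]:
    """Hypothetical stochastic analysis: a seeded bootstrap of the mean of a toy sample."""
    rng = random.Random(seed)  # local generator, so no hidden global random state
    data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8]
    means = []
    for _ in range(100):
        resample = [rng.choice(data) for _ in data]
        means.append(sum(resample) / len(resample))
    return means


def digest(result: list[float]) -> str:
    """SHA-256 of a canonical JSON serialization, so runs can be compared byte for byte."""
    payload = json.dumps(result, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Two independent runs with the same seed should agree exactly.
    print(digest(run_analysis(seed=42)) == digest(run_analysis(seed=42)))
```

Publishing the digest alongside the result lets a third party rerun the analysis and confirm the match without diffing the full output.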

## Literature tools
- [Scite](https://scite.ai/) - Citation statement AI for discovering and evaluating scientific articles
- [SciScore](https://www.sciscore.com/) - SciScore evaluates methods sections against a variety of rigor criteria and analyzes sentences that contain research resources (antibodies, cell lines, plasmids and software tools), determining how uniquely identifiable each resource is based on the provided metadata.
- [Ripeta](https://www.ripeta.com/) - Ripeta quickly scans research manuscripts or articles to identify and record key reproducibility variables, such as data availability, code acknowledgements, and research analysis methods.

## Scientific Data Management Systems
- [DVC](https://dvc.org/) - DVC tracks machine learning models and data sets
- [DataLad](https://www.datalad.org/) - Git-based versioning for data and provenance
- [Overture](https://www.overture.bio/) - Portal, query interface, visualization and schema framework that powers ICGC, KFDC, GDC
- [Fairly Toolset](https://fairly.readthedocs.io) - Tools for preparing, publishing and downloading datasets from research data repositories directly into computing environments. It provides integration with [Zenodo](https://zenodo.org/) and [Figshare](https://figshare.com/).
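
Versioning tools such as DVC and DataLad track data by content rather than by filename, so any change to a file's bytes is detectable. The sketch below shows only that general idea with a SHA-256 manifest. It does not reproduce either tool's file formats or commands, and `build_manifest` and `changed_files` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a POSIX-style relative path) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Paths that were added, removed, or modified between two manifests."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "raw.csv").write_text("id,value\n1,0.5\n")
        before = build_manifest(root)
        (root / "raw.csv").write_text("id,value\n1,0.6\n")
        (root / "notes.txt").write_text("re-run with corrected value\n")
        after = build_manifest(root)
        print(changed_files(before, after))  # ['notes.txt', 'raw.csv']
```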

## Books
- [Reproducible Research with R and RStudio 2013](https://g.co/kgs/RxcFNm)
- [Implementing Reproducible Research 2014](https://osf.io/s9tya/) - Describes projects: Sumatra, Vistrails, CDE, SOLE, JUMBO, CML, knitr. Content available on OSF.
- [The Practice of Reproducible Research 2017](https://g.co/kgs/jZiMR7) - 31 first-person case narratives and introductory chapters
- [Dynamic Documents with R and knitr 2015](https://g.co/kgs/dpzkF4)
- [The Turing Way: A Handbook for Reproducible Data Science 2020](https://the-turing-way.netlify.com/introduction/introduction)
- [Reproducibility and Replicability in Science](https://www.nap.edu/catalog/25303/reproducibility-and-replicability-in-science)
- [Reproducibility: Principles, Problems, Practices, and Prospects](https://www.wiley.com/en-ec/Reproducibility:+Principles,+Problems,+Practices,+and+Prospects+-p-9781118864975)

## Databases
- [ReplicationWiki](http://replication.uni-goettingen.de/wiki/index.php) - Database for empirical studies with information about methods, data and software used, availability of replication material and whether replications, corrections or retractions are known. Mostly focused on social sciences.
- [ReproCrawl](https://crawl.reproduciblescience.org/)
- [ReplicationDatabase](https://metaanalyses.shinyapps.io/replicationdatabase/) - 1211 replication findings on 333 psychology studies

## Data Repositories
All of these services assign Digital Object Identifiers (DOIs) to data.
- [DataCite](https://datacite.org) - 12M+ DOIs registered for 46 allocators. Offers APIs and a metadata schema.
- [Data Dryad](https://datadryad.org) - curated, metadata-centric, focused on data associated with published articles, $120 submission fee (various waivers available)
- [Figshare](https://figshare.com) - 20 GB of free private space, unlimited public space, >2M articles, >5k projects
- [OSF](https://osf.io) - Project-oriented system with access control and integration with popular tools. Unlimited storage for projects, but individual files are limited to 5 gigabytes (GB) each.
- [Zenodo](https://zenodo.org/) - Allows embargoed, restricted access, metadata support. 50GB limit.
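
Because everything deposited in these repositories is addressed by a DOI, citation metadata can usually be requested from the `doi.org` resolver through HTTP content negotiation, for example with an `Accept: application/x-bibtex` header; DataCite and Crossref both support this mechanism. The sketch below assumes that mechanism. `doi_request` and its deliberately loose DOI regex are hypothetical helpers, and the request is built but not sent.

```python
import re
import urllib.request

# Simplified DOI shape: "10." + registrant code + "/" + suffix (an assumption, not the full spec).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_request(doi: str, accept: str = "application/x-bibtex") -> urllib.request.Request:
    """Build (but do not send) a content-negotiation request for a DOI's metadata."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]  # normalize to the bare "10.xxxx/..." form
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return urllib.request.Request(f"https://doi.org/{doi}", headers={"Accept": accept})


req = doi_request("https://doi.org/10.5281/zenodo.3564746")
print(req.full_url)  # https://doi.org/10.5281/zenodo.3564746
```

Passing the request to `urllib.request.urlopen` would perform the lookup over the network; `application/vnd.citationstyles.csl+json` can be passed as `accept` to ask for CSL-JSON instead of BibTeX.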

## Exemplar Portals
Places to find papers with code or portals to host them
- [Jupyter Gallery](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks) - Gallery of interesting Jupyter notebooks
- [Papers With Code](https://paperswithcode.com/) - ML papers with code
- [NARPS](https://github.com/poldrack/narps) - Code related to Neuroimaging Analysis Replication and Prediction Study
- [Codeocean](https://codeocean.com/explore) - A gallery of cloud-based containers with reproducible analyses

## Runnable Papers
Experimental papers that have associated notebooks
### Haibe-Kains lab




| Publication | CodeOcean link |
|---|---|
| Mer AS et al. Integrative Pharmacogenomics Analysis of Patient Derived Xenografts | codeocean.com/capsule/056639 |
| Gendoo, Zon et al. MetaGxData: Clinically Annotated Breast, Ovarian and Pancreatic Cancer Datasets and their Use in Generating a Multi-Cancer Gene Signature | codeocean.com/capsule/643863 |
| Yao et al. Tissue specificity of in vitro drug sensitivity | codeocean.com/capsule/550275 |
| Safikhani Z et al. Gene isoforms as expression-based biomarkers predictive of drug response in vitro | codeocean.com/capsule/000290 |
| El-Hachem et al. Integrative cancer pharmacogenomics to infer large-scale drug taxonomy | codeocean.com/capsule/425224 |
| Safikhani Z et al. Revisiting inconsistency in large pharmacogenomic studies | codeocean.com/capsule/627606 |
| Sandhu V et al. Meta-analysis of 1,200 transcriptomic profiles identifies a prognostic model for pancreatic ductal adenocarcinoma | codeocean.com/capsule/269362 |
| Sharifi-Noghabi et al Drug sensitivity prediction from cell line-based pharmacogenomics data: guidelines for developing machine learning models | codeocean.com/capsule/7358839 |
| Arrowsmith et al Automated detection of dental artifacts for large-scale radiomic analysis in radiation oncology | codeocean.com/capsule/2097894 |
| Mer et al Biological and therapeutic implications of a unique subtype of NPM1 mutated AML | codeocean.com/capsule/8791617 |
| Ortmann et al KuLGaP: A Selective Measure for Assessing Therapy Response in Patient-Derived Xenografts | codeocean.com/capsule/2817911 |
| Madani Tonekaboni et al Large organized chromatin lysine domains help distinguish primitive from differentiated cell populations | codeocean.com/capsule/6911149 |
| Seo et al SYNERGxDB: an integrative pharmacogenomic portal to identify synergistic drug combinations for precision oncology | codeocean.com/capsule/6322807 |
| Mammoliti et al Creating reproducible pharmacogenomic analysis pipelines | codeocean.com/capsule/6718332 |
| Manem et al Modeling Cellular Response in Large-Scale Radiogenomic Databases to Advance Precision Radiotherapy | codeocean.com/capsule/1166221 |
| Tonekaboni et al CREAM: Clustering of genomic REgions Analysis Method | codeocean.com/capsule/0002901 |
| Madani Tonekaboni et al SIGN: similarity identification in gene expression | codeocean.com/capsule/0544852 |
| Mer et al Integrative Pharmacogenomics Analysis of Patient-Derived Xenografts | codeocean.com/capsule/0566399 |
| Sandhu et al Applications of Computational Systems Biology in Cancer Signaling Pathways | codeocean.com/capsule/0795540 |
| Sandhu et al Meta-Analysis of 1,200 Transcriptomic Profiles Identifies a Prognostic Model for Pancreatic Ductal Adenocarcinoma | codeocean.com/capsule/7402260 |
| Gendoo et al MetaGxData: Clinically Annotated Breast, Ovarian and Pancreatic Cancer Datasets and their Use in Generating a Multi-Cancer Gene Signature | codeocean.com/capsule/6438633 |
| Yao et al Tissue specificity of in vitro drug sensitivity | codeocean.com/capsule/5502756 |
| Safikhani et al Gene isoforms as expression-based biomarkers predictive of drug response in vitro | codeocean.com/capsule/0002901 |
| El-Hachem et al Integrative Cancer Pharmacogenomics to Infer Large-Scale Drug Taxonomy | codeocean.com/capsule/4252248 |
| Safikhani et al Revisiting inconsistency in large pharmacogenomic studies | codeocean.com/capsule/6276064 |



### Pachter lab



| Publication | GitHub link |
|---|---|
| Pimentel et al 2017. Differential analysis of RNA-seq incorporating quantification uncertainty | sleuth_paper_analysis |
| Melsted et al 2019. Modular and efficient pre-processing of single-cell RNA-seq | MBGBLHGP_2019 |
| Chari et al 2021. Whole Animal Multiplexed Single-Cell RNA-Seq Reveals Plasticity of Clytia Medusa Cell Types | CWGFLHGCCHAP_2021 |

### Siepel lab



| Publication | CodeOcean link |
|---|---|
| Blumberg et al 2021. Characterizing RNA stability genome-wide through combined analysis of PRO-seq and RNA-seq data | https://codeocean.com/capsule/7351682 |


## Journals
- [ReScience](http://rescience.github.io/) - Journal dedicated to in silico reproductions and tests of robustness; lives on GitHub.
- [eLife](https://elifesciences.org/for-the-press/eb096af1/elife-launches-executable-research-articles-for-publishing-computationally-reproducible-results) - Executable Research Articles (ERA) with inline executable code blocks

## Ontologies
- [FAIRsharing](https://fairsharing.org) - standards, databases, and policies
- [BioPortal](https://bioportal.bioontology.org/) - 660 biomedical ontologies

## Minimal Standards
- [STORMS](https://www.stormsmicrobiome.org/) - Strengthening The Organization and Reporting of Microbiome Studies (STORMS) is a checklist for reporting on human microbiome studies. [Preprint](https://doi.org/10.1101/2020.06.24.167353)

## Organizations
- [ResearchObject.org](http://www.researchobject.org/) - RO specifications and publications
- [BioCompute](https://osf.io/zm97b/) - BCO specs
- [rOpenSci](https://ropensci.org) - Tools, conferences, and education
- [Open Science Framework](https://osf.io) - Open source project management
- [pyOpenSci](https://www.pyopensci.org/) - Promotes open and reproducible research through peer-review of scientific Python packages
- [Replication Network](https://replicationnetwork.com/) - Furthering the practice of replication in economics. Econ replication database.
- [repliCATS project](https://replicats.research.unimelb.edu.au/) - Estimating the replicability of research in the social sciences. [Paper](https://osf.io/preprints/metaarxiv/2pczv/)
- [ReproHack](https://reprohack.github.io/reprohack-hq/) - 1-day reproducibility hackathons held worldwide
- [CODECHECK](https://codecheck.org.uk/) - community for checking executability of scientific preprints and papers
- [CASCaD](https://www.cascad.tech/) - Certification Agency for Scientific Code and Data. Issues reproducibility certificates.
- [Reproducibility for Everyone](https://www.repro4everyone.org/) - Community-led reproducibility workshops
- [CUrating for REproducibility](https://curating4reproducibility.org/) - curation of research and code for digital preservation
- [Michigan Institute for Data Science Reproducibility Hub](https://midas.umich.edu/reproducibility-resources/) - [reproducibility challenge](https://hdsr.mitpress.mit.edu/pub/mlconlea/release/1) manuscripts & presentations
- [OpenMKT](https://openmkt.org/) - transparency and quality of marketing research published in academic journals
- [Many Co-Authors](https://manycoauthors.org/) - online platform designed to collect and share information on the provenance and availability of the data for all articles co-authored by Francesca Gino
- [FORRT](https://forrt.org/) - Framework for Open and Reproducible Research Training advancing research transparency, reproducibility, rigor, and ethics through pedagogical reform and meta-scientific research

## Awesome Lists
- [Awesome Pipeline](https://github.com/pditommaso/awesome-pipeline) - So many pipeline frameworks
- [Awesome Docker](https://github.com/veggiemonk/awesome-docker) - Everything related to the Docker containerization system
- [Awesome R](https://github.com/qinwf/awesome-R#reproducible-research) - Section on RR tools
- [Awesome Reproducible R](https://github.com/datasnakes/awesome-reproducible-R) - RRR tools
- [Awesome Jupyter](https://github.com/adebar/awesome-jupyter) - Jupyter projects, libraries and resources
- [Awesome Bioinformatics Benchmarks](https://github.com/j-andrews7/Awesome-Bioinformatics-Benchmarks) - Benchmarks are a related aspect of robustness testing
- [Awesome Open Science](https://github.com/ZoranPandovski/awesome-open-science) - Resources, data, tools, and scholarship
- [Awesome Public Datasets](https://github.com/awesomedata/awesome-public-datasets) - A topic-centric list of HQ open datasets
- [Awesome Semantic Web](https://github.com/semantalytics/awesome-semantic-web) - Semantic web and linked data resources.

## Contribute

Contributions welcome! Read the [contribution guidelines](contributing.md) first. You may find my `src/doi2md.py` script useful for quickly generating entries from a DOI.
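
The `src/doi2md.py` script itself is not reproduced here. As a rough sketch of the same idea, assume a DOI's metadata has already been retrieved as CSL-JSON (for example via DOI content negotiation). A formatter producing an entry in this list's `- [Title](url) - description` style might then look like the following. `csl_to_markdown` and the sample record are hypothetical.

```python
def csl_to_markdown(item: dict) -> str:
    """Format one CSL-JSON record as a Markdown list entry (hypothetical helper)."""
    families = [a["family"] for a in item.get("author", []) if a.get("family")]
    if not families:
        who = "Anonymous"
    elif len(families) == 1:
        who = families[0]
    elif len(families) == 2:
        who = f"{families[0]} & {families[1]}"
    else:
        who = f"{families[0]} et al"
    # CSL-JSON stores dates as nested lists, e.g. {"date-parts": [[2020, 5, 1]]}
    date_parts = item.get("issued", {}).get("date-parts", [[]])
    year = date_parts[0][0] if date_parts and date_parts[0] else None
    suffix = f" {year}" if year else ""
    title = item.get("title", "Untitled").strip()
    return f"- [{title}](https://doi.org/{item['DOI']}) - {who}{suffix}"


# Hypothetical record; 10.1234 is a placeholder prefix, not a real deposit.
record = {
    "title": "Example title",
    "DOI": "10.1234/example",
    "author": [{"family": "Doe"}, {"family": "Roe"}, {"family": "Poe"}],
    "issued": {"date-parts": [[2020, 5]]},
}
print(csl_to_markdown(record))  # - [Example title](https://doi.org/10.1234/example) - Doe et al 2020
```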

## License

[![CC0](http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg)](https://creativecommons.org/publicdomain/zero/1.0/)

To the extent possible under law, Jeremy Leipzig has waived all copyright and
related or neighboring rights to this work.