Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources
Unix, R and python tools for genomics and data science
- Host: GitHub
- URL: https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources
- Owner: crazyhottommy
- Created: 2015-09-14T02:50:30.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2024-12-19T17:45:37.000Z (4 days ago)
- Last Synced: 2024-12-19T18:35:56.075Z (4 days ago)
- Topics: bioinformatics, cancer-genomics, data-science
- Language: Shell
- Homepage:
- Size: 36.8 MB
- Stars: 1,195
- Watchers: 87
- Forks: 351
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-bioinformatics-education - List of training materials at GitHub.com/crazyhottommy
README
## Table of contents
- [General](#general)
- [Courses](#courses)
- [Some biology](#some-biology)
- [Some statistics](#some-statistics)
- [linear algebra](#linear-algebra)
- [Bayesian Statistics](#bayesian-statistics)
- [Learning Latex](#learning-latex)
- [Linux commands](#linux-commands)
- [Do not give me excel files!](#do-not-give-me-excel-files)
- [How to name files](#how-to-name-files)
- [parallelization](#parallelization)
- [Statistics](#statistics)
- [Data transfer](#data-transfer)
- [Website](#website)
- [profile R code](#profile-r-code)
- [updating R](#updating-r)
- [Better R code](#better-r-code)
- [Shiny App](#shiny-app)
- [R tools for data wrangling, tidying and visualizing.](#r-tools-for-data-wrangling-tidying-and-visualizing)
- [Genomic data visulization](#genomic-data-visulization)
- [Sankey graph](#sankey-graph)
- [Handling big data in R](#handling-big-data-in-r)
- [Write your own R package](#write-your-own-r-package)
- [Documentation](#documentation)
- [handling arguments at the command line](#handling-arguments-at-the-command-line)
- [visualization in general](#visualization-in-general)
- [Javascript](#javascript)
- [python tips and tools](#python-tips-and-tools)
- [machine learning](#machine-learning)
- [Amazon cloud computing](#amazon-cloud-computing)
- [Genomics-visualization-tools](#genomics-visualization-tools)
- [Databases](#databases)
- [Large data consortium data mining](#large-data-consortium-data-mining)
- [Integrative analysis](#integrative-analysis)
- [Interactive visualization](#interactive-visualization)
- [Tutorials](#tutorials)
- [MOOC(Massive Open Online Courses)](#moocmassive-open-online-courses)
- [git and version control](#git-and-version-control)
- [blogs](#blogs)
- [data management](#data-management)
- [Automate your workflow, open science and reproducible research](#automate-your-workflow-open-science-and-reproducible-research)
- [Survival curve](#survival-curve)
- [Organize research for a group](#organize-research-for-a-group)
- [Clustering](#clustering)
- [CRISPR related](#crispr-related)
- [vector arts for life sciences](#vector-arts-for-life-sciences)

### General
* [So you want to be a computational biologist?](http://www.nature.com/nbt/journal/v31/n11/full/nbt.2740.html)
* [Ten simple rules for biologists learning to program](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005871)
* [Scientific computing: Code alert](http://www.nature.com/naturejobs/science/articles/10.1038/nj7638-563a?WT.mc_id=TWT_NatureNews) Nature News.
* [Some drawings about programming](https://drawings.jvns.ca/) Very nice cartoon demonstrating useful concepts. https://wizardzines.com/
* [Practical Computing for Biologists](http://practicalcomputing.org/). One of the first books that got me started in coding.
* [ModernDive An Introduction to Statistical and Data Sciences via R](https://ismayc.github.io/moderndiver-book/index.html)
* [DevOps for Data Science](https://do4ds.com/) A free ebook on DevOps for data science.
* [Introduction to Data Science](https://rafalab.github.io/dsbook/) by Rafael A. Irizarry.
* [Learning Statistics with R](https://learningstatisticswithr.com/)
* [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)
* [Reproducible Research Workflows with Snakemake and R](https://lachlandeer.github.io/snakemake-econ-r-tutorial/)
* [The Biologist’s Guide to Computing](http://book.biologistsguide2computing.com/en/stable/) A book written by @tjelvar_olsson
* [A Primer for Computational Biology](http://library.open.oregonstate.edu/computationalbiology/) A nice book from Oregon State University. You can get a hard copy on Amazon https://www.amazon.com/Primer-Computational-Biology-Shawn-ONeil/dp/0870719262.
* [Computational Genomics With R](http://compgenomr.github.io/book/) A nice book from Altuna Akalin.
* [Modern Statistics for Modern Biology](http://web.stanford.edu/class/bios221/book/) written by Prof. Susan Holmes from Stanford. I plan to read through it. A nice book using R for modern biology! Looks awesome!
* [Introduction to Data Science](https://ubc-dsci.github.io/introduction-to-datascience/index.html) A book by Tiffany-Anne Timbers, Trevor Campbell and Melissa Lee.
* [An Introduction To Applied Bioinformatics](https://github.com/caporaso-lab/An-Introduction-To-Applied-Bioinformatics) Interactive lessons in bioinformatics. http://www.readiab.org/introduction.html
* [Feature Engineering and Selection: A Practical Approach for Predictive Models](https://github.com/topepo/FES) by Kuhn and Johnson https://bookdown.org/max/FES
* [Agile Data Science with R](https://edwinth.github.io/ADSwR/index.html)
* [Offensive programming book](https://neonira.github.io/offensiveProgrammingBook_v1.2.1/) in R.
* [The Biostar Handbook: A Beginner's Guide to Bioinformatics](http://read.biostarhandbook.com/?q=) I am honored to be a co-author of this book. My ChIP-seq section was released in mid-2017.
* [Beginner's Handbook to Next Generation Sequencing](https://genohub.com/next-generation-sequencing-handbook/) Everything you need to know about starting a sequencing project
* [Another Book on Data Science:Learn R and Python in Parallel](https://www.anotherbookondatascience.com/) compares R and python side by side.
* [A New Online Computational Biology Curriculum](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003662) PLOS Computational Biology paper.
* [Bioinformatics core competencies for undergraduate life sciences education](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196878)
* [PH525x series - Biomedical Data Science](http://genomicsclass.github.io/book/) The best course to get you started with genomics using R. I took the same course three times to get a deep understanding of the concepts and R commands. Now everything can be found here from the Rafael Irizarry lab: http://rafalab.github.io/pages/harvardx.html
* [The Bioconductor 2018 Workshop Compilation](https://bioconductor.github.io/BiocWorkshops/) very rich!
* [Bioconductor for Genomics Data sciences](https://kasperdanielhansen.github.io/genbioconductor/) Coursera course.
* [bioc workflow genomic annotation](https://www.bioconductor.org/packages/release/workflows/html/annotation.html)
* [Expanding the computational toolbox for mining cancer genomes](http://www.nature.com/nrg/journal/v15/n8/full/nrg3767.html) Nature Review.
* [some repos from command line to rstats and github](https://github.com/info-201)
* 2016 review [Coming of age: ten years of next-generation sequencing technologies](http://www.nature.com/nrg/journal/v17/n6/full/nrg.2016.49.html)
* [Cancer genomics — from bench to bedside: review papers from Nature](http://www.nature.com/collections/dswwtfkdty?BAN_NRG_1602_CANCERCOLLECTION_PORTFOLIO)
* [SequencEnG: an Interactive Knowledge Base of Sequencing Techniques](http://education.knoweng.org/sequenceng/)
* [Research Software Engineering with Python](https://merely-useful.tech/py-rse/) Building software that makes research possible. From Greg Wilson and Carpentries folks.
* [Research Software Engineering with R](https://merely-useful.tech/r-rse/index.html) Building software that makes research possible.

### Courses
* [The Missing Semester of Your CS Education](https://missing.csail.mit.edu/) These MIT Classes teach you all about advanced topics within CS, from operating systems to machine learning, but there’s one critical subject that’s rarely covered, and is instead left to students to figure out on their own: proficiency with their tools. We’ll teach you how to master the command-line, use a powerful text editor, use fancy features of version control systems, and much more!
* Bioinformatics training materials from https://bioinformatics.babraham.ac.uk/training.html I like the Inkscape tutorial too
* [applied computational genomics](https://github.com/quinlan-lab/applied-computational-genomics#course-lecture-slides) by Aaron Quinlan, the creator of bedtools and many other cool tools.
* [BMMB 852: Applied Bioinformatics (Fall, 2016)](https://www.ialbert.me/) by Istvan Albert, the creator of [biostars](https://www.biostars.org/).
* [JHU EN.600.649: Computational Genomics: Applied Comparative Genomics](https://github.com/schatzlab/appliedgenomics) by Michael Schatz.
* [Introduction to Computational Biology](https://biodatascience.github.io/compbio/) by Mike Love.
* [Advanced Data Science](http://jtleek.com/advdatasci/index.html) by Jeff Leek.
* [Data Science for Biological, Medical and Health Research: Notes for 431](https://thomaselove.github.io/2018-431-book/data-science.html): R focused
* Various [TeachingMaterial](https://github.com/lgatto/TeachingMaterial) collected by Laurent Gatto.
* [NGS sequence analysis](https://bioinf.comav.upv.es/courses/sequence_analysis/index.html)
* [bioinformatics-workbook](https://isugenomics.github.io/bioinformatics-workbook/)
* [Reproducible Quantitative Methods](https://cbahlai.github.io/rqm-template/) from Mozilla science lab.
* [bio-info courses](https://edu.t-bio.info/lp-courses/)
* [MIT Computational Biology: Genomes, Networks, Evolution, Health - Fall 2018 - 6.047/6.878/HST.507](https://www.youtube.com/playlist?list=PLypiXJdtIca6GBQwDTo4bIEDV8F4RcAgt) by Manolis Kellis.
* [MIT machine learning in Genomics](https://www.youtube.com/playlist?list=PLypiXJdtIca6U5uQOCHjP9Op3gpa177fK) by Manolis Kellis.
* [MIT linear algebra course by Gilbert Strang ](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/)
* [A 2020 Vision of Linear Algebra](https://ocw.mit.edu/resources/res-18-010-a-2020-vision-of-linear-algebra-spring-2020/index.htm) by Gilbert Strang
* [Generalized Additive Models in R](https://noamross.github.io/gams-in-r-course/) This short course will teach you how to use these flexible, powerful tools to model data and solve data science problems. GAMs offer a middle ground between simple linear models and complex machine-learning techniques, allowing you to model and understand complex systems.
* [Feature Engineering and Selection: A Practical Approach for Predictive Models](https://bookdown.org/max/FES/)
* [Tidy Modeling with R](https://www.tmwr.org/)

### Some biology
If you are from fields outside of biology, places to get you started:
* [An Owner's Guide to the Human Genome: an introduction to human population genetics, variation and disease](https://web.stanford.edu/group/pritchardlab/HGbook.html) by Jonathan Pritchard, Stanford University
* [Tales from the Genome](https://www.udacity.com/course/tales-from-the-genome--bio110) A course by Udacity and 23andMe.
* [The Biology of Cancer](http://garlandscience.com/product/isbn/9780815342205) A classic text book by Robert A. Weinberg. A must read for all cancer biologists.
* [Molecular Biology of the Cell](https://www.amazon.com/Molecular-Biology-Cell-Bruce-Alberts/dp/0815341059/ref=mt_hardcover?_encoding=UTF8&me=) A text book
* [Learn Genetics](http://learn.genetics.utah.edu/) from University of Utah learning center.
* [iBiology](https://www.ibiology.org) offers several different types of courses
* [courses](https://www.khanacademy.org/science/biology) from khanacademy.org
* [genomics for software engineers](https://learngenomics.dev/docs/biological-foundations/cells-genomes-dna-chromosomes)

### Some statistics
* [Elements of Statistical Modeling for Experimental Biology](https://www.middleprofessor.com/files/applied-biostatistics_bookdown/_book/) by Jeffrey A. Walker. I plan to read this one!!
* [seeing theory](http://students.brown.edu/seeing-theory/index.html) The goal of the project is to make statistics more accessible to a wider range of students through interactive visualizations.
* [Points of Significance: Interpreting P values](http://www.nature.com/nmeth/journal/v14/n3/full/nmeth.4210.html)
* [statistics for biologists](http://www.nature.com/collections/qghhqm/pointsofsignificance)
* [Advanced Statistical Computing](https://bookdown.org/rdpeng/advstatcomp/) by Roger Peng.
* [fiveMinuteStats](http://stephens999.github.io/fiveMinuteStats/index.html#inference)
* [Learning Statistics with R](https://learningstatisticswithr.com/)
* [Statistical Modeling of High Dimensional Counts](https://mikelove.github.io/counts-model/) by Mike Love on RNA-seq counts modeling.
* [Mixed models in R: a primer](https://arbor-analytics.com/post/mixed-models-a-primer/)
* [Introduction to linear mixed models](https://gkhajduk.github.io/2017-03-09-mixed-models/)
* [MIXED EFFECTS COX REGRESSION](https://stats.idre.ucla.edu/r/dae/mixed-effects-cox-regression/)
* [GLMM FAQ](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html)
* [mixed model/hierarchical model visualized](http://mfviz.com/hierarchical-models/)
* [A brief introduction to mixed effects modelling and multi-model inference in ecology](https://peerj.com/articles/4794/)

### linear algebra
* [Essence of linear algebra](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab) by 3Blue1Brown
* [18.06 from Gilbert Strang](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/)
* [Matrix Methods in Data Analysis, Signal Processing, and Machine Learning from Gilbert Strang](https://ocw.mit.edu/courses/mathematics/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/index.htm)
* [Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares](http://vmls-book.stanford.edu/) using the Julia language.
* [Common statistical tests are linear models (or: how to teach stats)](https://lindeloev.github.io/tests-as-linear/)
* [Course materials for applied regression STATS191 Stanford](https://pratheepaj.github.io/teaching/stats191/)

#### Bayesian Statistics
* [Bayes Rules! An Introduction to Bayesian Modeling with R](https://www.bayesrulesbook.com/)
* [Introduction to Bayesian Statistics](https://www.youtube.com/playlist?list=PLuRpZIQQRQedb2GM2WhKSEzojGN-BIIR9) STATS331 from Brendon Brewer.
* [Introduction to Empirical Bayes](http://varianceexplained.org/r/empirical-bayes-book/) by David Robinson using baseball examples.
* [Statistical Rethinking](http://xcelab.net/rm/statistical-rethinking/) github link https://github.com/rmcelreath/statrethinking_winter2019 Julia code https://shmuma.github.io/rethinking-2ed-julia/
* [Bayesian Data Analysis demos for R](https://github.com/avehtari/BDA_R_demos)
* [Doing Bayesian Data Analysis in brms and the tidyverse](https://bookdown.org/ajkurz/DBDA_recoded/) a book.
* [Probabilistic-Programming-and-Bayesian-Methods-for-Hackers](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers)

### Learning Latex
* [draw your symbols](http://detexify.kirelabs.org/classify.html)
* [The Best Way to Support LaTeX Math in Markdown with MathJax](https://yihui.org/en/2018/07/latex-math-markdown/)
* [TinyTeX](https://yihui.org/tinytex/) A lightweight, cross-platform, portable, and easy-to-maintain LaTeX distribution based on TeX Live
* Math expression http://www.math.mcgill.ca/yyang/regression/RMarkdown/example.html
* [intro to Latex2](http://ctan.mirrors.hoobly.com/info/lshort/english/lshort.pdf) chapter 3
* [The Bates LaTeX Manual](https://www.bates.edu/mathematics/resources/latex-manual/)

### Linux commands
* [Greg Wilson's youtube videos on unix shell](https://www.youtube.com/watch?v=U3iNcBtycaQ)
* [A Bioinformatician's UNIX Toolbox](http://lh3lh3.users.sourceforge.net/biounix.shtml#xargs) from Heng Li
* [Linux command line exercises for NGS data processing](http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/linux.html)
* [command line bootcamp](http://rik.smith-unna.com/command_line_bootcamp/?id=rca84m6nsz6c9ngnugt8uayvi) teaches you unix commands step by step
* [Unix in your browser](https://browsix.org/). Maybe useful for teaching bash?
* sshx A secure web-based, collaborative terminal https://sshx.io/ useful for teaching
* [A Book for Anyone to Get Started with Unix](https://github.com/seankross/the-unix-workbench)
* [bash one-liners for bioinformatics](https://github.com/crazyhottommy/oneliners)
* [some of my bash one-liner collections](https://github.com/crazyhottommy/scripts-general-use/blob/master/Shell/bioingformatics_one_liner.md)
* [Use the Unofficial Bash Strict Mode (Unless You Looove Debugging)](http://redsymbol.net/articles/unofficial-bash-strict-mode/)
* [Defensive BASH Programming](http://www.kfirlavi.com/blog/2012/11/14/defensive-bash-programming) very good read for bash programming.
* [Better Bash Scripting in 15 Minutes](http://robertmuth.blogspot.com/2012/08/better-bash-scripting-in-15-minutes.html?m=1)
* [bash pitfalls](http://mywiki.wooledge.org/BashPitfalls)
* [Advancing in the Bash Shell](http://samrowe.com/wordpress/advancing-in-the-bash-shell/)
* [Bash tips](http://jvns.ca/blog/2017/03/26/bash-quirks/)
![](https://cloud.githubusercontent.com/assets/4106146/24389198/68cee218-1345-11e7-98a1-93ba78542daf.jpg)
* [Bash by example](https://www.ibm.com/developerworks/library/l-bash/)
* process substitution: [Using Named Pipes and Process Substitution in Bioinformatics](http://vincebuffalo.org/blog/2013/08/08/using-names-pipes-and-process-substitution-in-bioinformatics.html) and [Handy Bash feature: Process Substitution](https://medium.com/@joewalnes/handy-bash-feature-process-substitution-8eb6dce68133#.uz5pj9yer)
* [NGS Advanced Beginner/Intermediate Shell](https://github.com/ngs-docs/2016-adv-begin-shell-genomics)
* Commonly used commands for PBS scheduler: [Monitoring and Managing Your Job](https://www.osc.edu/supercomputing/batch-processing-at-osc/monitoring-and-managing-your-job)
* test your unix skills at [cmd challenge](https://cmdchallenge.com)
* People say awk is not part of bioinformatics :) It is still very useful for parsing plain text files. [Steve's Awk Academy](http://troubleshooters.com/codecorn/awk/index.htm)
* [intro-bioinformatics: Website and slides for intro to bioinformatics class at Fred Hutch](https://github.com/fredhutchio/intro-bioinformatics)
![](https://cloud.githubusercontent.com/assets/4106146/17654247/872f8716-6266-11e6-887d-cebd009dde6a.png)
* [tmate](https://tmate.io/): Instant terminal sharing
* [tmux](https://tmux.github.io/) is a terminal multiplexer similar to [`screen`](https://www.gnu.org/software/screen/manual/screen.html) but has more features.
  * [tmux cheatsheet](https://gist.github.com/MohamedAlaa/2961058)
  * [tmux config](https://github.com/tony/tmux-config)
  * [tmux install without root](https://gist.github.com/ryin/3106801)
* [All about redirection](http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-3.html)
**Theory and quick reference**

There are 3 standard file descriptors: stdin, stdout and stderr (std = standard). Basically you can:

1. redirect stdout to a file
2. redirect stderr to a file
3. redirect stdout to stderr
4. redirect stderr to stdout
5. redirect stderr and stdout to a file
6. redirect stderr and stdout to stdout
7. redirect stderr and stdout to stderr

1 'represents' stdout and 2 stderr.

A little note for seeing these things: with the `less` command you can view both stdout (which will remain in the buffer) and stderr (which will be printed on the screen, but erased as you try to 'browse' the buffer).

* stdout to file
This will cause the output of a program to be written to a file.

`ls -l > ls-l.txt`

Here, a file called 'ls-l.txt' will be created and it will contain what you would see on the screen if you typed the command `ls -l` and executed it.

* stderr to file

This will cause the stderr output of a program to be written to a file.

`grep da * 2> grep-errors.txt`

Here, a file called 'grep-errors.txt' will be created and it will contain the stderr portion of the output of the `grep da *` command.

* stdout to stderr

This will cause the stdout output of a program to be written to the same file descriptor as stderr.

`grep da * 1>&2`

Here, the stdout portion of the command is sent to stderr; you may notice that in different ways.

* stderr to stdout

This will cause the stderr output of a program to be written to the same file descriptor as stdout.

`grep da * 2>&1`

Here, the stderr portion of the command is sent to stdout. If you pipe to `less`, you'll see that lines that normally 'disappear' (as they are written to stderr) are now kept (because they're on stdout).

* stderr and stdout to file

This will place all output of a program in a file. This is sometimes suitable for cron entries, if you want a command to run in absolute silence.

`rm -f $(find / -name core) &> /dev/null`

This (thinking of the cron entry) will delete every file called 'core' in any directory. Notice that you should be pretty sure of what a command is doing if you are going to discard its output.
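The redirection forms above can be tried out safely with a small script. This is only a minimal sketch and not part of the HOWTO: `emit` is a hypothetical helper that writes one line to stdout and one to stderr, and the file names in the temporary directory are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes one line to stdout and one line to stderr.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

tmp=$(mktemp -d)   # scratch directory for the demo files

# stdout to one file, stderr to another
emit > "$tmp/out.txt" 2> "$tmp/err.txt"

# stderr and stdout to the same file; `> file 2>&1` is the portable
# spelling of bash's `&> file`
emit > "$tmp/both.txt" 2>&1

# Order matters: redirections are applied left to right. Here stderr is
# first pointed at the command substitution, then stdout is discarded,
# so only the stderr line is captured.
only_err=$(emit 2>&1 > /dev/null)

# stdout to stderr: inside the group both lines travel on stderr, and the
# group's stderr is discarded, so nothing is left to capture.
nothing=$( { emit 1>&2; } 2> /dev/null )

echo "captured from stderr only: $only_err"
```

Writing `2>&1 > file` instead of `> file 2>&1` is a common slip: in the first form stderr still points at the terminal (or the pipe), because it was duplicated before stdout was moved.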
* change permissions of files

  Each digit is for: user, group and other. `chmod 754 myfile` means the user has read, write and execute permission; members of the same group have read and execute permission but no write permission; everyone else has only read permission.

  * 4 stands for "read",
  * 2 stands for "write",
  * 1 stands for "execute", and
  * 0 stands for "no permission."
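The digit values above add up per position, which can be checked with a few lines of bash. This is a minimal sketch: `digit_to_rwx` and `mode_to_rwx` are hypothetical helper names, not standard commands, and they only decode the number without touching any file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn one octal permission digit (0-7) into an rwx triplet.
digit_to_rwx() {
  local d=$1 out=""
  if (( d & 4 )); then out+="r"; else out+="-"; fi   # 4 = read
  if (( d & 2 )); then out+="w"; else out+="-"; fi   # 2 = write
  if (( d & 1 )); then out+="x"; else out+="-"; fi   # 1 = execute
  printf '%s' "$out"
}

# Hypothetical helper: decode a three-digit mode (user, group, other).
mode_to_rwx() {
  local mode=$1 out="" i
  for (( i = 0; i < ${#mode}; i++ )); do
    out+=$(digit_to_rwx "${mode:i:1}")
  done
  printf '%s\n' "$out"
}

mode_to_rwx 754   # prints rwxr-xr--
```

The printed string uses the same user/group/other layout that `ls -l` shows after the file-type character.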
So 7 is the combination of permissions 4+2+1 (read, write, and execute), 5 is 4+0+1 (read, no write, and execute), and 4 is 4+0+0 (read, no write, and no execute).

This is sometimes hard to remember, so you can use letters instead: u, g, and o stand for "user", "group", and "other"; r, w, and x stand for "read", "write", and "execute", respectively.

`chmod u+x myfile`

`chmod g+r myfile`

### Do not give me excel files!
* [scary-excel-stories](https://github.com/jennybc/scary-excel-stories/blob/master/README.md)
* [VisiData](https://www.visidata.org/) is an interactive multitool for tabular data. It combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python, into a lightweight utility which can handle millions of rows with ease.
* [convert xlsx to csv: xlsx2csv](https://github.com/dilshod/xlsx2csv)
* [csvkit](http://csvkit.readthedocs.io/en/latest/index.html#)
* [csvtk](https://bioinf.shenwei.me/csvtk/usage/) A complete .csv/.tsv toolkit including join command.
* [GNU datamash](https://www.gnu.org/software/datamash/)
* [tabtk](https://github.com/lh3/tabtk) Toolkit for processing TAB-delimited format from Heng Li, the author of `Samtools`, `BWA` and many others.
* [miller](https://miller.readthedocs.io/en/latest/) is a command-line tool for querying, shaping, and reformatting data files in various formats including CSV, TSV, JSON, and JSON Lines.
* [xsv](https://github.com/BurntSushi/xsv) A fast CSV toolkit written in Rust.
* [Going from a human readable Excel file to a machine-readable csv with {tidyxl}](https://www.brodrigues.co/blog/2018-09-11-human_to_machine/)
* eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more. https://ebay.github.io/tsv-utils/

### How to name files
It is really important to name your files correctly! see a [ppt](https://rawgit.com/Reproducible-Science-Curriculum/rr-organization1/master/organization-01-slides.html) by Jenny Bryan.
Three principles for (file) names:
* Machine readable (do not put special characters and space in the name)
* Human readable (Easy to figure out what the heck something is, based on its name, add slug)
* Plays well with default ordering:
  1. Put something numeric first
  2. Use the ISO 8601 standard for dates (YYYY-MM-DD)
  3. Left pad other numbers with zeros
![](https://cloud.githubusercontent.com/assets/4106146/17389870/5dfc54c4-59cd-11e6-9293-a1f8789c8352.png)
![](https://cloud.githubusercontent.com/assets/4106146/17389869/5df7f6f4-59cd-11e6-8715-86645243d70c.png)
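The three principles can be sketched in shell. This is a minimal illustration, not taken from the slides: `make_name` is a hypothetical helper, and the assay, slug and index values are made up. It only demonstrates `printf` zero padding and underscore-delimited fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "<ISO date>_<assay>_<slug>_<zero-padded index>.csv".
# Underscores separate fields; hyphens are only used inside a field.
# Pass the index without leading zeros (printf would read "08" as octal).
make_name() {
  local iso_date=$1 assay=$2 slug=$3 index=$4
  printf '%s_%s_%s_%03d.csv\n' "$iso_date" "$assay" "$slug" "$index"
}

# `date +%F` would give today's date in the same YYYY-MM-DD format.
name=$(make_name 2016-04-01 BRAFWTNEG FFPEDNA-CRC 7)
echo "$name"   # 2016-04-01_BRAFWTNEG_FFPEDNA-CRC_007.csv

# Machine readable: the fields split back out on "_" without any regex.
IFS=_ read -r part_date part_assay part_slug part_index <<< "${name%.csv}"
echo "date=$part_date assay=$part_assay slug=$part_slug index=$part_index"

# Zero padding keeps lexical order equal to numeric order (002 before 010).
sorted=$(printf '%s\n' "$(make_name 2016-04-01 A s 10)" "$(make_name 2016-04-01 A s 2)" | sort)
echo "$sorted"
```

Without the padding, a plain `sort` or `ls` would list index 10 before index 2.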
**If you have to rename the files...**
* [brename](https://github.com/shenwei356/brename) A cross-platform command-line tool for safely batch renaming files/directories via regular expression (supporting Windows, Linux and OS X), from Shen Wei. Very useful!
**Good naming of your files can help you to extract metadata from the file name**
* [dirdf](https://github.com/ropenscilabs/dirdf) Create tidy data frames of file metadata from directory and file names.

```r
> dir("examples/dataset_1/")
[1] "2013-06-26_BRAFWTNEG_Plasmid-Cellline-100_A01.csv"
[2] "2013-06-26_BRAFWTNEG_Plasmid-Cellline-100_A02.csv"
[3] "2014-02-26_BRAFWTNEG_FFPEDNA-CRC-1-41_D08.csv"
[4] "2014-03-05_BRAFWTNEG_FFPEDNA-CRC-REPEAT_H03.csv"
[5] "2016-04-01_BRAFWTNEG_FFPEDNA-CRC-1-41_E12.csv"

> library("dirdf")
> dirdf("examples/dataset_1/", template="date_assay_experiment_well.ext")
        date     assay           experiment well ext                                          pathname
1 2013-06-26 BRAFWTNEG Plasmid-Cellline-100  A01 csv 2013-06-26_BRAFWTNEG_Plasmid-Cellline-100_A01.csv
2 2013-06-26 BRAFWTNEG Plasmid-Cellline-100  A02 csv 2013-06-26_BRAFWTNEG_Plasmid-Cellline-100_A02.csv
3 2014-02-26 BRAFWTNEG     FFPEDNA-CRC-1-41  D08 csv     2014-02-26_BRAFWTNEG_FFPEDNA-CRC-1-41_D08.csv
4 2014-03-05 BRAFWTNEG   FFPEDNA-CRC-REPEAT  H03 csv   2014-03-05_BRAFWTNEG_FFPEDNA-CRC-REPEAT_H03.csv
```

### parallelization
Using these tools will greatly improve your working efficiency and get rid of most of your `for` loops.
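As a rough sketch of what replacing a loop looks like, the example below compresses a few toy files with `xargs` and shows the serial `for` loop it replaces in a comment. The file names and the choice of 4 parallel jobs are made up for illustration; `-0` and `-P` are available in GNU and BSD `xargs` but are not required by POSIX.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)   # scratch directory with a few toy input files
for i in 1 2 3 4; do
  seq 1 1000 > "$workdir/sample_$i.txt"
done

# Serial version, one gzip at a time:
#   for f in "$workdir"/sample_*.txt; do gzip "$f"; done

# Parallel version: NUL-delimited names survive spaces in paths,
# -n 1 passes one file per gzip call, -P 4 runs up to 4 calls at once.
printf '%s\0' "$workdir"/sample_*.txt | xargs -0 -n 1 -P 4 gzip

n_gz=$(find "$workdir" -name 'sample_*.txt.gz' | wc -l | tr -d ' ')
echo "compressed files: $n_gz"
```

The same pattern works for any command that takes one input file per call; only the command name after the `xargs` options changes.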
1. [xargs](http://www.cyberciti.biz/faq/linux-unix-bsd-xargs-construct-argument-lists-utility/)
2. [GNU parallel](https://www.gnu.org/software/parallel/). One of my posts is [here](http://crazyhottommy.blogspot.com/2016/03/the-most-powerful-uniux-commands-i.html).
3. [gargs](https://github.com/brentp/gargs) by Brent Pedersen. Written in Go.
4. [rush](https://github.com/shenwei356/rush) A cross-platform command-line tool for executing jobs in parallel by Shen Wei. I use his other tools such as `brename` and `csvtk`.
5. [future: Unified Parallel and Distributed Processing in R for Everyone](https://cran.r-project.org/web/packages/future/index.html)
6. [furrr](https://github.com/DavisVaughan/furrr) Apply Mapping Functions in Parallel using Futures

### Statistics
* [Essence of linear algebra](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)
* [statistics for biologists](http://www.nature.com/collections/qghhqm) A collection of Nature articles on statistics in biology.

### Data transfer
* keep an eye on the [dat project](https://datproject.org/)! https://blog.datproject.org/2018/04/24/data-sharing-at-institutions-and-beyond-with-dat/
a blog post by Mark Ziemann http://genomespot.blogspot.com/2018/03/share-and-backup-data-sets-with-dat.html
### Website
* [rmarkdown website](https://rmarkdown.rstudio.com/rmarkdown_websites.html)
* [A step by step tutorial](https://gupsych.github.io/acadweb/index.html)
* [Up and running with blogdown](https://alison.rbind.io/post/up-and-running-with-blogdown/)
* [summer of blogdown](https://summer-of-blogdown.netlify.com/)
* [bookdown advanced slide](https://arm.rbind.io/slides/bookdown.html#1)
* [make a hugo blog from scratch](https://zwbetz.com/make-a-hugo-blog-from-scratch/) to understand Hugo if you use blogdown.
* [Tips for using the Hugo academic theme](https://lmyint.github.io/post/hugo-academic-tips/)
* [Custom domain hosting with Github and Namecheap](http://blog.brooke.science/posts/custom-domain-hosting-with-github-and-namecheap/)
* [MkDocs](https://www.mkdocs.org/) is a fast, simple and downright gorgeous static site generator that's geared towards building project documentation. Documentation source files are written in Markdown, and configured with a single YAML configuration file.

### updating R
* [R upgrading can be a smooth process](http://dscinomics.com/post/2017-04-28-upgrade-to-r-3-4-0/)
* [updating R](http://lcolladotor.github.io/2017/05/04/Updating-R/#.WSEQPrzyuqA) a blog post by L. Collado-Torres.
* [update your R version in a breeze ( on OSX)](https://github.com/AndreaCirilloAC/updateR)
* [updating R](https://whattheyforgot.org/maintaining-r.html#how-to-transfer-your-library-when-updating-r)

```r
# Install new version of R (let's say 3.5.0 in this example)

# Create a new directory for the version of R
fs::dir_create("~/Library/R/3.5/library")

# Re-start R so the .libPaths are updated

# Lookup what packages were in your old package library
# (path_file() keeps only the package directory names)
pkgs <- fs::path_file(fs::dir_ls("~/Library/R/3.4/library"))

# Filter these packages as needed

# Install the packages in the new version
install.packages(pkgs)
```
### Better R code
* [assertr](https://github.com/ropensci/assertr)
* [Tools for Working with ...](https://ellipsis.r-lib.org)
* [here](https://github.com/jennybc/here_here)
* [Inline testthat tests with roxygen2:roxytest](https://github.com/mikldk/roxytest)
* [Non-invasive pretty printing of R code: styler](https://styler.r-lib.org)
* [Static Code Analysis for R: lintr](https://github.com/jimhester/lintr) It checks adherence to a given style, syntax errors and possible semantic issues
* [Make R a little bit stricter: strict](https://github.com/hadley/strict)
Also read the [Offensive Programming Book](https://neonira.github.io/offensiveProgrammingBook_v1.2.1/).

### Shiny App
* [Omicsplayground](https://github.com/bigomics/omicsplayground)
* A Framework for Building Robust Shiny Apps [golem](https://thinkr-open.github.io/golem/)
* [bootstraplib](https://rstudio.github.io/bootstraplib/) Tools for styling shiny and rmarkdown from R via Bootstrap (3 or 4) Sass
* [The Shiny AWS Book](https://business-science.github.io/shiny-production-with-aws-book/) How to set up Shiny in AWS
* [imola](https://github.com/pedrocoutinhosilva/imola) Bridging the gap between R/shiny and CSS layouts (grid and flexbox)!

### profile R code
* [profvis](https://rstudio.github.io/profvis/) Interactive Visualizations for Profiling R Code.
* [proffer](https://github.com/r-prof/proffer) The proffer package profiles R code to find bottlenecks.
* [rco - The R Code Optimizer](https://jcrodriguez1989.github.io/rco/index.html) Make your R code run faster! rco analyzes your code and applies different optimization strategies that return an R code that runs faster.

### R tools for data wrangling, tidying and visualizing.
* [Common statistical tests are linear models (or: how to teach stats)](https://lindeloev.github.io/tests-as-linear/)
* [What They Forgot to Teach You About R](https://whattheyforgot.org/) by Jennifer Bryan, Jim Hester. you know it is good.
Rstudio2020 https://rstudio-conf-2020.github.io/what-they-forgot/
* An R package for simple but efficient rowwise jobs https://courtiol.github.io/lay/
* [Fundamentals of Data Visualization](http://serialmentor.com/dataviz/) by Claus O. Wilke.
* [from data to vis](https://www.data-to-viz.com) From Data to Viz leads you to the most appropriate graph for your data. It links to the code to build it and lists common caveats you should avoid.
* [Rmarkdown cookbook](https://bookdown.org/yihui/rmarkdown-cookbook)
* [Data Visualization: A practical introduction](http://socviz.co/) A book by Kieran Healy from Duke University. Nice one to have!
* [Functional programming and unit testing for data munging with R](http://www.brodrigues.co/fput/)
* [R workshops](https://github.com/nuitrcs/rworkshops) some resources for R related materials.
* [RStartHere](https://github.com/rstudio/RStartHere) A guide to some of the most useful R Packages that we know about, organized by their role in data science.
* [biobroom](https://www.bioconductor.org/packages/release/bioc/html/biobroom.html): Turn Bioconductor objects into tidy data frames
* [readr](https://github.com/hadley/readr)
* [visdat](https://github.com/ropensci/visdat) visualizing your missing data and more.
* [tidyr](https://github.com/hadley/tidyr)
* [stringr](https://github.com/tidyverse/stringr)
* [glue](https://github.com/tidyverse/glue#usage) Glue strings to data in R. Small, fast, dependency free interpreted string literals
* [purrr tutorial](https://jennybc.github.io/purrr-tutorial/index.html) by Jenny Bryan. Functional programming in R.
* [Row-oriented workflows in R with the tidyverse](https://github.com/jennybc/row-oriented-workflows) `pmap` is your friend :)
* [janitor](https://github.com/sfirke/janitor) simple tools for data cleaning in R.
* [tidyeval resources](http://maraaverick.rbind.io/2017/08/tidyeval-resource-roundup/)
* Rstudio [tidyeval video](https://www.rstudio.com/resources/webinars/tidy-eval/)
* [Tidy evaluation, most common actions](https://edwinth.github.io/blog/dplyr-recipes/)
* [Tidy Eval Meets ggplot2](http://www.onceupondata.com/2018/07/06/ggplot-tidyeval/) a blog post.
* [Tidy evaluation in ggplot2](https://www.tidyverse.org/articles/2018/07/ggplot2-tidy-evaluation/) from tidyverse.
* [tidyeval patterns](https://www.tidyverse.org/articles/2019/06/rlang-0-4-0/)
* [Tidy eval now supports glue strings](https://www.tidyverse.org/blog/2020/02/glue-strings-and-tidy-eval/)
* [Non-standard evaluation, how tidy eval builds on base R](https://edwinth.github.io/blog/nse/)
* [My First Steps into The World of Tidy Eval](http://www.onceupondata.com/2017/08/12/my-first-steps-into-the-world-of-tidyeval/)
* [tidyeval shiny app](https://ijlyttle.shinyapps.io/tidyeval/)
* [tidyeval bookdown](https://tidyeval.tidyverse.org/)
* [reusing tidyverse code](https://speakerdeck.com/lionelhenry/reusing-tidyverse-code)
* [dplyr](https://github.com/hadley/dplyr)
* [set_na_where(): a nonstandard evaluation use case](https://tjmahr.github.io/set-na-where-nonstandard-evaluation-use-case/)
* [programming with dplyr](http://dplyr.tidyverse.org/articles/programming.html) A great read on non-standard evaluation, quoting and quasiquotation. The following two packages help you deal with that.
* [best practices for programming with ggplot2](https://fishandwhistle.net/slides/rstudioconf2020/#1)
* [replyr](https://github.com/WinVector/replyr) An R package for fluid use of dplyr.
* [Introduction of Parameterized dplyr expression](http://blog.eighty20.co.za//package%20exploration/2017/02/16/replyr-dplyr/) using replyr
* [wrapr](https://github.com/WinVector/wrapr) wraps R functions for debugging and better standard evaluation. `Let` function. Blog post: [wrapr: for sweet R code](http://www.win-vector.com/blog/2017/03/wrapr-for-sweet-r-code/)
* [Easy machine learning pipelines with pipelearner: intro and call for contributors](https://drsimonj.svbtle.com/easy-machine-learning-pipelines-with-pipelearner-intro-and-call-for-contributors?utm_content=buffer7ef93&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer) [github page](https://github.com/drsimonj/pipelearner)
* [plot ROC with tidyverse](https://sydykova.com/post/2019-03-12-make-roc-curves-tidyverse/)
* [csv fingerprint](http://setosa.io/blog/2014/08/03/csv-fingerprints/)
* [ggplot2](https://github.com/hadley/ggplot2)
* [ggplot2 tips](http://t-redactyl.io/tag/ggplot2.html)
* [Demystifying ggplot2](https://rud.is/books/creating-ggplot2-extensions/demystifying-ggplot2.html) Learn how to write ggplot2 extensions.
* [A List of ggplot2 extensions](https://www.ggplot2-exts.org/)
* [using ggplot2 in packages](https://ggplot2.tidyverse.org/dev/articles/ggplot2-in-packages.html)
>If you already know the mapping in advance (like the above example) you should use the `.data` pronoun from rlang to make it explicit that you are referring to the `drv` in the layer data and not some other variable named `drv` (which may or may not exist elsewhere). To avoid a similar note from the CMD check about `.data`, use `#' @importFrom rlang .data` in any roxygen code block (typically this should be in the package documentation as generated by `usethis::use_package_doc()`).
> * If you know the mapping or facet specification is `col` in advance, use `aes(.data$col)` or `vars(.data$col)`.
> * If `col` is a variable that contains the column name as a character vector, use `aes(.data[[col]])` or `vars(.data[[col]])`.
> * If you would like the behaviour of `col` to look and feel like it would within `aes()` and `vars()`, use `aes({{ col }})` or `vars({{ col }})`.

* [gghighlight](https://github.com/yutannihilation/gghighlight): Highlight ggplot's Lines and Points with Predicates
* [Anatomy of gghighlight](https://yutani.rbind.io/post/2018-06-03-anatomy-of-gghighlight/)
* [nice ggplot themes](https://github.com/hrbrmstr/hrbrthemes)
* [ggsci](https://ggsci.net/) offers a collection of ggplot2 color palettes inspired by scientific journals, data visualization libraries, science fiction movies, and TV shows.
* The goal of [paletteer](https://github.com/EmilHvitfeldt/paletteer) is to be a comprehensive collection (666!) of color palettes in R using a common interface
* [randomcoloR](https://github.com/ronammar/randomcoloR) An R package for generating attractive and distinctive colors.
* [colourpicker](https://github.com/daattali/colourpicker) A colour picker tool for Shiny and for selecting colours in plots (in R). [R blogger post](https://www.r-bloggers.com/plot-colour-helper-finally-an-easy-way-to-pick-colours-for-your-r-plots/amp/)
* [ggforce](https://github.com/thomasp85/ggforce/tree/facets): facet_zoom() to zoom in part of the figure! and many more.
* [ggpubr](http://www.sthda.com/english/rpkgs/ggpubr/): ‘ggplot2’ Based Publication Ready Plots. Adds p-values. This saves me from customizing my ggplot2 figures.
* [Top 50 ggplot2 Visualizations - The Master List (With Full R Code)](http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html)
* [kableExtra](https://github.com/haozhu233/kableExtra) Construct Complex Table with knitr::kable() + pipe.
* [ggedit](https://www.r-statistics.com/2016/11/ggedit-interactive-ggplot-aesthetic-and-theme-editor/?utm_content=buffer62da5&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer) – interactive ggplot aesthetic and theme editor.
* [trelliscopejs](http://ryanhafen.com/blog/trelliscopejs) is an R package that brings faceted visualizations to life while plugging in to common analytical workflows like ggplot2 or the “tidyverse”.
* [Plotting background data for groups with ggplot2](https://drsimonj.svbtle.com/plotting-background-data-for-groups-with-ggplot2?utm_campaign=Data%2BElixir&utm_medium=email&utm_source=Data_Elixir_92)
* [Ordering categories within ggplot2 facets](https://drsimonj.svbtle.com/ordering-categories-within-ggplot2-facets)
* [plotly for R](https://cpsievert.github.io/plotly_book/)
* [rematch2](https://github.com/MangoTheCat/rematch2#readme) Tidy output from regular expression matches
* [Make waffle (square pie) charts in R](https://github.com/hrbrmstr/waffle)
* Bring the power of R to the command line: [littler](http://dirk.eddelbuettel.com/blog/2016/08/07/#littler-0.3.1) [Rio](https://github.com/jeroenjanssens/data-science-at-the-command-line/blob/master/tools/Rio) A wrapper by Jeroen Janssens, the author of [data science at the command line](http://datascienceatthecommandline.com/)
* [ComplexHeatmap](https://jokergoo.github.io/ComplexHeatmap-reference/book/) my go-to package for static heatmaps.
* [tidyHeatmap](https://stemangiola.github.io/tidyHeatmap/articles/introduction.html) based on ComplexHeatmap.
* [htmlwidgets for R](http://www.htmlwidgets.org/showcase_d3heatmap.html) including `d3heatmap` for interactive heatmaps.
* [focus() on correlations of some variables with many others](https://drsimonj.svbtle.com/how-does-one-variable-correlate-with-all-others)
* Explore correlations in R with [corrr](https://github.com/drsimonj/corrr)
* [Unit test in R](http://www.johnmyleswhite.com/notebook/2010/08/17/unit-testing-in-r-the-bare-minimum/)
* [sinaplot](https://cran.r-project.org/web/packages/sinaplot/vignettes/SinaPlot.html): an enhanced chart for simple and truthful representation of single observations over multiple classes. `ggforce` has `geom_sina` for the same purpose.
* [ComplexHeatmap](https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html)
* [superheat](https://github.com/rlbarter/superheat) Another heatmap package worth learning besides `ComplexHeatmap`. Not as flexible as ComplexHeatmap, but can be handy when the function you want has been implemented.
* [iheatmapr](https://github.com/AliciaSchep/iheatmapr) is an R package for building complex, interactive heatmaps using modular building blocks.
* [heatmap:gapmap](https://cran.rstudio.com/web/packages/gapmap/vignettes/simple_example.html)
* [dendsort](https://cran.r-project.org/web/packages/dendsort/index.html): Modular Leaf Ordering Methods for Dendrogram Nodes
* [dendextend](https://cran.r-project.org/web/packages/dendextend/vignettes/introduction.html#changing-a-dendrograms-structure)
* [Interactive Heat Maps for R Using plotly](https://github.com/talgalili/heatmaply)
* [Multiple plots on a page](https://stat545-ubc.github.io/block020_multiple-plots-on-a-page.html)
* [ggExtra](http://deanattali.com/2015/03/29/ggExtra-r-package/)
* [cowplot](https://github.com/wilkelab/cowplot) -- An add-on to the ggplot2 plotting package
* [ggplot2 - Easy way to mix multiple graphs on the same page - R software and data visualization](http://www.sthda.com/english/wiki/ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page-r-software-and-data-visualization)
* [Extract Tables from PDFs](https://github.com/leeper/tabulizer)
* Alternative to Venn diagrams! [UpSetR](https://github.com/hms-dbmi/UpSetR)
* [hierarchicalSets](https://github.com/thomasp85/hierarchicalSets)
* [Intervene](https://bitbucket.org/CBGR/intervene) is a tool for intersection and visualization of multiple gene or genomic region sets.
* [In-depth introduction to machine learning in 15 hours of expert videos](http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/)
* [Data Analysis and Visualization Using R](http://varianceexplained.org/RData/) This is a course that combines video, HTML and interactive elements to teach the statistical programming language R.
* [dabestr](https://cran.r-project.org/web/packages/dabestr/vignettes/using-dabestr.html) dabestr is a package for Data Analysis using Bootstrap-Coupled ESTimation. https://github.com/ACCLAB/dabestr
* [These are the course notes for the Monash Bioinformatics Platform’s “R More” course](https://monashbioinformaticsplatform.github.io/r-more/)
* [gitbook: Getting used to R, RStudio, and R Markdown](https://ismayc.github.io/rbasics-book/index.html)
* [Technical Foundations of Informatics](https://info201-s17.github.io/book/) a free book to teach you R and many others.
* [Efficient R programming](https://csgillespie.github.io/efficientR/)
* [R for Data Science](http://r4ds.had.co.nz/) by Garrett Grolemund and Hadley Wickham
### Genomic data visulization
* [karyoploteR](https://bernatgel.github.io/karyoploter_tutorial/Tutorial/PlotCoverage/PlotCoverage.html) Really powerful and versatile tool.
* [Bentobox](https://phanstiellab.github.io/BentoBox/index.html) BentoBox empowers users to programmatically and flexibly generate multi-panel figures.
### Sankey graph
* [ggalluvial](http://corybrunson.github.io/ggalluvial/index.html)
* [ggforce](https://github.com/thomasp85/ggforce/tree/sankey) `geom_parallel_sets()`
* [easyalluvial](https://erblast.github.io/easyalluvial/)
### Handling big data in R
* [A data.table and dplyr tour](https://atrebas.github.io/post/2019-03-03-datatable-dplyr/) A blog post comparing dplyr and data.table side by side.
* [Lightning Fast Serialization of Data Frames for R](https://github.com/fstpackage/fst) faster than `data.table`, `feather`.
* [Rpub post: Handling large data sets in R](https://rpubs.com/msundar/large_data_analysis)
* [The disk.frame package aims to be the answer to the question: how do I manipulate structured tabular data that doesn’t fit into Random Access Memory (RAM)](https://github.com/xiaodaigh/disk.frame)
* [`dtplyr` and `tidyfast` are teaming up (well, at least in this blog post)](https://tysonbarrett.com//jekyll/update/2019/12/03/workflow_dtplyr_tidyfast/)
* [Fast reading of delimited files with vroom](https://vroom.r-lib.org) The fastest delimited reader for R, 1.40 GB/sec/sec.
* [stash: Naive on-disk caching in R](https://github.com/iqis/stash)
* [qs: Quick serialization of R objects](https://github.com/traversc/qs)
* [The fst package](https://www.fstpackage.org/) for R provides a fast, easy and flexible way to serialize data frames. With access speeds of multiple GB/s, fst is specifically designed to unlock the potential of high speed solid state disks that can be found in most modern computers. Data frames stored in the fst format have full random access, both in column and rows.
* The [arrow](https://github.com/apache/arrow/tree/master/r) package exposes an interface to the Arrow C++ library to access many of its features in R. This includes support for analyzing large, multi-file datasets (open_dataset()), working with individual Parquet (read_parquet(), write_parquet()) and Feather (read_feather(), write_feather()) files, as well as lower-level access to Arrow memory and messages.
* [dm: Working with relational data models in R](https://github.com/krlmlr/dm). Use it today (if only like a list of tables). Build data models tomorrow. Deploy the data models to your organization’s RDBMS the day after.
### Write your own R package
* [R packages](https://r-pkgs.org/) book from Hadley.
* [rpackages](https://github.com/jtleek/rpackages) from Jeff Leek.
* [WRITING R PACKAGES IN RSTUDIO TUTORIAL ADAPTED FROM STIRLINGCODINGCLUB.GITHUB.IO](https://ourcodingclub.github.io/tutorials/writing-r-package/)
* [R package development](https://combine-australia.github.io/r-pkg-dev/) This workshop was created by COMBINE, an association for Australian students in bioinformatics, computational biology and related fields.
* [usethis workflow for package development](https://www.hvitfeldt.me/2018/09/usethis-workflow-for-package-development/)
* [Developing R Packages with usethis and GitLab CI: Part I](https://blog.methodsconsultants.com/posts/developing-r-packages-using-gitlab-ci-part-i/)
* [Writing an R package from scratch](https://r-mageddon.netlify.com/post/writing-an-r-package-from-scratch/) a blog post.
* [available](https://github.com/ropenscilabs/available) helps you name your R package
* [goodpractice](https://cran.r-project.org/web/packages/goodpractice/index.html) An R package on Advice on R packages.
* [R package primer: a minimal tutorial](http://kbroman.org/pkg_primer/)
* [Write your own R package](http://stat545.com/packages06_foofactors-package.html)
* [R packages](http://r-pkgs.had.co.nz/) a book by Hadley Wickham.
* [Developing R packages](https://github.com/jtleek/rpackages/blob/master/README.md) from Jeff Leek.
* [Sinew](https://github.com/metrumresearchgroup/sinew) is an R package that generates a roxygen2 skeleton populated with information scraped from the function script.
* [Automatic tools for improving R packages](http://www.masalmon.eu/2017/06/17/automatictools/) `devtools::spell_check()`, `goodpractice::gp()` and `pkgdown::build_site()`.
* blog post [How to develop good R packages (for open science)](http://www.masalmon.eu/2017/12/11/goodrpackages/)
* Easy and efficient debugging for R packages: [debugme](https://github.com/r-lib/debugme)
* [Non-invasive pretty printing of R code](http://styler.r-lib.org)
* [usethis](https://github.com/r-lib/usethis) The goal of usethis is to automate many common package and analysis setup tasks.
* [Mastering Software Development in R](https://bookdown.org/rdpeng/RProgDA/) by Roger Peng et al.
* [The tidyverse style guide](http://style.tidyverse.org/) by Hadley Wickham.
* [submitting your package to Bioconductor](https://github.com/kuwisdelu/BiocMeetup/blob/master/2019-Jan/BioC-git-and-Github.pdf)
### Documentation
* This is a must read for writing good documentation: a blog [post](https://www.divio.com/blog/documentation/). I saved it as a PDF and uploaded it to this repo.
### handling arguments at the command line
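For comparison with the tools in this section, here is a minimal sketch using Python's standard-library `argparse`, which is not one of the listed tools. The script name `count_reads.py`, its options, and the FASTQ-counting helper are all hypothetical, chosen only to show a positional argument, a typed option with a default, and a boolean flag.

```python
import argparse
import sys


def build_parser():
    """Build the argument parser for a hypothetical `count_reads.py` script."""
    parser = argparse.ArgumentParser(
        prog="count_reads.py",
        description="Count FASTQ records that pass a length filter (illustrative).",
    )
    # Positional argument: required, identified by its position.
    parser.add_argument("fastq", help="path to the input FASTQ file")
    # Optional argument with a type conversion and a default value.
    parser.add_argument(
        "-m", "--min-length", type=int, default=0,
        help="keep reads with at least this many bases (default: 0)",
    )
    # Boolean flag: False unless given on the command line.
    parser.add_argument(
        "-v", "--verbose", action="store_true",
        help="print progress to stderr",
    )
    return parser


def count_records(lines, min_length=0):
    """Count 4-line FASTQ records whose sequence has >= min_length bases."""
    count = 0
    for i, line in enumerate(lines):
        # In 4-line FASTQ, the sequence is the second line of each record.
        if i % 4 == 1 and len(line.strip()) >= min_length:
            count += 1
    return count


def main(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and print the record count."""
    args = build_parser().parse_args(argv)
    if args.verbose:
        print(f"reading {args.fastq}", file=sys.stderr)
    with open(args.fastq) as handle:
        print(count_records(handle, args.min_length))
```

In a real script, `main()` would be called under `if __name__ == "__main__":`. Taking `argv` as a parameter keeps the parser testable without touching `sys.argv`.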
* [docopt.R](https://github.com/docopt/docopt.R) [tutorial](http://www.slideshare.net/EdwindeJonge1/docopt-user2014)
* [python version](http://docopt.org/)
* [Generate a CLI tool from a Python module/function](https://github.com/bharadwaj-raju/cligenerator)
* [Introducing Python Fire, a library for automatically generating command line interfaces](https://opensource.googleblog.com/2017/03/python-fire-command-line.html)
* [Patterns and anti-patterns for writing command-line bioinformatics software](https://github.com/ctb/titus-blog/blob/add/command_line_patterns/src/2018-our-command-line-patterns.md) by Titus.
### visualization in general
* [Nature Methods Points of View on data visualization](http://blogs.nature.com/methagora/2013/07/data-visualization-points-of-view.html)
* [A tutorial for the free Inkscape cross-platform vector graphics editor](https://github.com/fredhutchio/inkscape-tutorial)
* [gimp](https://www.gimp.org/downloads/) for bit-map based figures.
* [data vis resource from Sabah](https://sabahzero.github.io/dataviz/resources)
* [Ten simple rules to colorize biological data visualization](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008259)
### Javascript
* [JavaScript versus Research Computing](https://gvwilson.github.io/js-vs-rc/) from Greg Wilson, the founder of Software Carpentry.
### python tips and tools
* [some nice free python books: Think python etc](http://greenteapress.com/wp/)
* [python learning resources](https://learnbyexample.github.io/py_resources/)
* [Interactive python](http://interactivepython.org/runestone/default/user/login?_next=/runestone/default/index) nice interactive books help you learn python.
* [30 Python Language Features and Tricks You May Not Know About](http://sahandsaba.com/thirty-python-language-features-and-tricks-you-may-not-know.html#id1)
* [intermediatePython](https://github.com/crazyhottommy/intermediatePython)
* [The Hitchhiker’s Guide to Python!](http://docs.python-guide.org/en/latest/)
* [Python 3 for Scientists](http://python-3-for-scientists.readthedocs.io/en/latest/)
* [Python FAQ: Why should I use Python 3?](https://eev.ee/blog/2016/07/31/python-faq-why-should-i-use-python-3/)
* [gitbook: Computational and Inferential Thinking; The Foundations of Data Science](https://www.gitbook.com/book/ds8/textbook/details)
* [A collection of python courses online](http://bafflednerd.com/learn-python-online/)
* [tpot:A Python tool that automatically creates and optimizes machine learning pipelines using genetic programming.](https://github.com/rhiever/tpot)
* [Easy to use Python API wrapper to plot charts with matplotlib, plotly, bokeh and more](https://github.com/cuemacro/chartpy):chartpy creates a simple easy to use API to plot in a number of great Python chart libraries like plotly (via cufflinks), bokeh and matplotlib, with a unified interface. You simply need to change a single keyword to change which chart engine to use (see below), rather than having to learn the low level details of each library.
* [Top 8 resources for learning data analysis with pandas](http://www.dataschool.io/best-python-pandas-resources/)
* [Jupyter Notebooks for the Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook)
* [kite](https://kite.com/) The smart copilot for programmers. Works with atom, sublime, vim and emacs!
### machine learning
* [Practical Machine Learning with Python: Stanford crowd course](http://crowdcourse.stanford.edu/ml.html)
* [tidy modeling with R](https://www.tmwr.org/)
### Amazon cloud computing
[Intro to AWS Cloud Computing](https://github.com/griffithlab/rnaseq_tutorial/wiki/Intro-to-AWS-Cloud-Computing#necessary-steps-for-launching-an-instance)
### Genomics-visualization-tools
There are many online web-based tools for visualization of (cancer) genomic data. I put my collection here. I use R for visualization.
See a nice post on doing this in Python by Radhouane Aniba: [Genomic Data Visualization in Python](http://fullstackdatascientist.io/2016-03-15-genomic-data-visualization-using-python/)
* [UCSC cancer genome browser](https://genome-cancer.ucsc.edu/proj/site/hgHeatmap/) It has many data sets, including TCGA data, built in, and can be very handy for both bench scientists and bioinformaticians.
* [UCSC Xena](http://xena.ucsc.edu/). A new tool developed by the UCSC team as well. Potentially very useful, but it needs more tutorials to follow.
* [UCSC genome browser](http://genome.ucsc.edu/). One of the most famous genome browsers and my favorite. **Every person** studying genetics, genomics and molecular biology needs to know how to use it. [Tutorials from OpenHelix](http://blog.openhelix.eu/?p=22649).
* [Epiviz 3](http://epiviz.github.io/index.html) is an interactive visualization tool for functional genomics data. It supports genome navigation like other genome browsers, but allows multiple visualizations of data within genomic regions using scatterplots, heatmaps and other user-supplied visualizations.
* Mutation Annotation & Genome Interpretation TCGA: [MAGI](http://magi.brown.edu/)
* [GeneProteinViz (GPViz)](http://icbi.at/software/gpviz/gpviz.shtml) is a versatile Java-based software for dynamic gene-centered visualization of genomic regions and/or variants.
* [ProteinPaint: Web Application for Visualizing Genomic Data](https://pecan.stjude.org/proteinpaint/TP53/) The software developed for this project highlights critical attributes about the mutations, including the form of protein variant (e.g. the new amino acid as a result of missense mutation), the name of sample from which the mutation was identified, whether the mutation is somatic or germline,
### Databases
* [protein-protein interaction databases](http://startbioinfo.com/cgi-bin/simpleresources.pl?tn=PPI_AR)
* [A compilation of protein-protein interaction resources](http://startbioinfo.com/cgi-bin/simpleresources.pl?tn=PPI_AR) by Akhilesh Bajpai and Sravanthi Davuluri (correspondence: Acharya KK)
* [DisGeNET](http://www.disgenet.org/web/DisGeNET/menu/home?utm_source=twitterfeed&utm_medium=twitter) is a discovery platform integrating information on gene-disease associations (GDAs) from several public data sources and the literature
* [Cancer3D](http://cancer3d.org/) is a database that unites information on somatic missense mutations from TCGA and CCLE, allowing users to explore two different cancer-related problems at the same time: drug sensitivity/biomarker identification and prediction of cancer drivers
* [UCSCXenaTools](https://github.com/ropensci/UCSCXenaTools) An R package for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq
* [PharmacoGx](https://bioconductor.org/packages/release/bioc/html/PharmacoGx.html) Contains a set of functions to perform large-scale analysis of pharmacogenomic data. public data sets such as CCLE can be easily downloaded!
* [clinical interpretations of variants in cancer](https://civic.genome.wustl.edu/#/home)
* [R Wrapper for DGIdb](http://bioconductor.org/packages/devel/bioc/html/rDGIdb.html) Drug-gene interaction database.
* [BioGrid](http://thebiogrid.org/) Welcome to the Biological General Repository for Interaction Datasets
* [The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: towards curated quantitative interactions between 1300 protein targets and 6000 ligands](http://nar.oxfordjournals.org/content/early/2015/10/11/nar.gkv1037.short?rss=1)
* [Public data and open source tools for multi-assay genomic investigation of disease](http://bib.oxfordjournals.org/content/early/2015/10/10/bib.bbv080.long)
* [cancer cell metabolism genes](http://bioinfo.mc.vanderbilt.edu/ccmGDB/index.html)
* [oncogenes and tumor suppressors](https://www.biostars.org/p/15890/) biostar post and [TSgene](http://bioinfo.mc.vanderbilt.edu/TSGene/index.html)
* [DriverDB: A database for cancer driver gene/mutation](http://ngs.ym.edu.tw/driverdb)
* Interaction of genes: [GENEMANIA](http://genemania.org/)
* [DATA DISCOVERY PLATFORM:Designed for researchers who use, share and collaborate on human genomic data](http://discover.repositive.io/)
* [zenodo: research shared](https://zenodo.org/collection/datasets)
* [dataMed](https://datamed.org/) biomedical and healthCAre Data Discovery Index Ecosystem.
* [repositive](https://repositive.io/) Discover a better way of searching for genomic data.
* The NCI's [Genomic Data Commons](https://gdc.nci.nih.gov/) (GDC) provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine. A copy of TCGA and TARGET data? [Data Release Notes](https://gdc-docs.nci.nih.gov/Data/Release_Notes/Data_Release_Notes/?platform=hootsuite)
* [OASIS genomics](http://www.oasis-genomics.org/) from Pfizer. processed data from TCGA, CCLE, GTEx.
* [TCGA alternative splicing](http://bioinformatics.mdanderson.org/TCGASpliceSeq)
* [ISOexpresso](http://wiki.tgilab.org/ISOexpresso/): a web-based platform for isoform-level expression analysis in human cancer
* [omics database](http://www.omicsdi.org/#/) The Omics Discovery Index (OmicsDI) provides dataset discovery across a heterogeneous, distributed group of Transcriptomics, Genomics, Proteomics and Metabolomics data resources spanning eight repositories in three continents and six organisations, including both open and controlled access data resources. The resource provides a short description of every dataset: accession, description, sample/data protocols biological evidences, publication, etc. Based on these metadata, OmicsDI provides extensive search capabilities, as well as identification of related datasets by metadata and data content where possible. In particular, OmicsDI identifies groups of related, multi-omics datasets across repositories by shared identifiers.
* [MAGI](http://magi.brown.edu/) Mutation Annotation & Genome Interpretation for TCGA data.
* [How to successfully apply for access to dbGaP](http://blog.repositive.io/how-to-successfully-apply-for-access-to-dbgap/)
* [Human cell Atlas](https://www.humancellatlas.org/) some preview data sets https://preview.data.humancellatlas.org/
* [DepMap](https://depmap.org/portal/depmap/) A Cancer Dependency Map to systematically identify genetic and pharmacologic dependencies and the biomarkers that predict them.
### Large data consortium data mining
* [AnnotationHub](http://bioconductor.org/packages/devel/bioc/vignettes/AnnotationHub/inst/doc/AnnotationHub-HOWTO.html#roadmap-epigenomics-project) bioconductor package for TCGA and epigenome roadmap, ENCODE project.
* [TCGAbiolinks](http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/tcgaBiolinks.html) bioconductor package.
* [GenomicDataCommons](https://github.com/Bioconductor/GenomicDataCommons) Bioconductor package to access GDC.
* [RTCGA bioconductor](http://bioconductor.org/packages/devel/bioc/html/RTCGA.html)
* [f1000 workflow paper TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages](http://f1000research.com/articles/5-1542/v1)
* paper [Data mining The Cancer Genome Atlas in the era of precision cancer medicine](http://www.smw.ch/content/smw-2015-14183/)
* [CrossHub](http://sourceforge.net/p/crosshub/): a tool for multi-way analysis of The Cancer Genome Atlas (TCGA) in the context of gene expression regulation mechanisms.
* [Ferret, a User-Friendly Java Tool to Extract Data from the 1000 Genomes Project](http://limousophie35.github.io/Ferret/)
* [EGA:European Genome-phenome Archive](https://www.ebi.ac.uk/ega/datasets)
* [survival curves for TCGA data: a simple web tool](http://www.oncolnc.org/)
* Genetic determinants of cancer patient survival http://survival.cshl.edu/. https://twitter.com/jsheltzer/status/1150828456340574209?s=12
"..in some papers and presentations, biologists will use TCGA survival curves showing that their favorite gene is associated with poor prognosis to argue that their gene is super-important. This is weak evidence. **Prognostic biomarkers are not necessarily strong cancer drivers**"
* [AACR Project GENIE](https://www.synapse.org/#!Synapse:syn7222066/wiki/405659) [data guide](https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources/blob/master/GENIEDataGuide.pdf)
### Integrative analysis
* [High-dimensional genomic data bias correction and data integration using MANCIE](http://www.nature.com/ncomms/2016/160413/ncomms11305/full/ncomms11305.html) correct batch effects for data from different sequencing methods. (RNAseq vs ChIPseq)
### Interactive visualization
* [Vega-lite](https://github.com/vega/vega-lite) A high-level grammar for visual analysis, built on top of Vega. Looks awesome!
* [Introducing altair, an R interface to the Altair Python Package](https://vegawidget.rbind.io/post/2018-05-20-introducing-altair/) which you can use to build and render Vega-Lite chart-specifications.
* The goal of [g(r)osling](https://github.com/gosling-lang/grosling) is to help you build interactive genomics visualizations with [Gosling](https://github.com/gosling-lang). This package uses reticulate to provide an interface to the Gos Python package.
### Tutorials
* [Ten quick tips for effective dimensionality reduction](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006907) by Susan Holmes.
* [PH525x series - Biomedical Data Science](http://genomicsclass.github.io/book/). Learn R and bioconductor.
* [Principal Component Analysis Explained Visually](http://setosa.io/ev/principal-component-analysis/)
* [PCA, MDS, k-means, Hierarchical clustering and heatmap](https://rpubs.com/crazyhottommy/PCA_MDS). I wrote it.
* [A tale of two heatmaps](https://rpubs.com/crazyhottommy/a-tale-of-two-heatmap-functions). I wrote it.
* [Heatmap demystified](https://rpubs.com/crazyhottommy/heatmap_demystified). I wrote it.
* [Cluster Analysis in R - Unsupervised machine learning](http://www.sthda.com/english/wiki/cluster-analysis-in-r-unsupervised-machine-learning#at_pco=smlre-1.0&at_si=58765a95fcb21379&at_ab=per-2&at_pos=3&at_tot=4) very practical intro on STHDA website.
* [I wrote on PCA, and heatmaps on Rpub](https://rpubs.com/crazyhottommy)
* A must read for clustering analysis of high-dimensional biological data: [Avoiding common pitfalls when clustering biological data](http://stke.sciencemag.org/content/9/432/re6)
* [How does gene expression clustering work?](http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html) A must read for clustering.
* [How to read PCA plots for scRNAseq](http://www.nxn.se/valent/2017/6/12/how-to-read-pca-plots) by Valentine Svensson.
* [paper: Outlier Preservation by Dimensionality Reduction Techniques](http://oai.cwi.nl/oai/asset/22628/22628B.pdf)
>"MDS best choice for preserving outliers, PCA for variance, & T-SNE for clusters" (Rileen Sinha, @RileenSinha, August 25, 2016)

* [How to Use t-SNE Effectively](http://distill.pub/2016/misread-tsne/)
* [Rtsne](https://github.com/jkrijthe/Rtsne) R package for T-SNE
* [rtsne](https://github.com/jdonaldson/rtsne) An R package for t-SNE (t-Distributed Stochastic Neighbor Embedding)
a bug was in `rtsne`: https://gist.github.com/mikelove/74bbf5c41010ae1dc94281cface90d32
* [t-SNE-Heatmaps](https://github.com/KlugerLab/t-SNE-Heatmaps) Beta version of 1D t-SNE heatmaps to visualize expression patterns of hundreds of genes simultaneously in scRNA-seq.
* [PHATE dimensionality reduction method](https://github.com/KrishnaswamyLab/PHATE) paper: http://biorxiv.org/content/early/2017/03/24/120378
* [Uniform Manifold Approximation and Projection (UMAP)](https://github.com/lmcinnes/umap) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction. The algorithm is founded on three assumptions about the data. Run from R: https://gist.github.com/crazyhottommy/caa5a4a4b07ee7f08f7d0649780832ef
* [umapr](https://github.com/ropenscilabs/umapr) UMAP dimensionality reduction in R
* [Understanding UMAP](https://pair-code.github.io/understanding-umap/) A very nice one to read!
* [Survival analysis of TCGA patients integrating gene expression (RNASeq) data](https://www.biostars.org/p/153013/)
* [Tutorial: Machine Learning For Cancer Classification](https://www.biostars.org/p/85124/). It has four parts.
* [Learning bash scripting for beginners](http://www.cyberciti.biz/open-source/learning-bash-scripting-for-beginners/)
* [Bedtools tutorial](http://quinlanlab.org/tutorials/cshl2013/bedtools.html)
* [Gemini](http://gemini.readthedocs.org/en/latest/#tutorials) explores your vcf, and [slides](https://speakerdeck.com/arq5x).
* [GNU parallel](https://www.biostars.org/p/63816/)
* [A Tutorial on Principal Component Analysis](http://arxiv.org/abs/1404.1100)
* [StatQuest: PCA clearly explained](https://www.youtube.com/watch?v=fRiEZWQ-WT8)
* [Computing Workflows for Biologists: A Roadmap](http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002303)
* [Best Practices for Scientific Computing](http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745)
* [Google's R Style Guide](https://google.github.io/styleguide/Rguide.xml)
### MOOC (Massive Open Online Courses)
* [The Open Source Data Science Masters](http://datasciencemasters.org/)
* [Path to a free self-taught education in Data Science!](https://github.com/open-source-society/data-science)
* [Path to a free self-taught education in Bioinformatics!](https://github.com/open-source-society/bioinformatics)
* [CODING CLUB TUTORIALS](https://ourcodingclub.github.io/tutorials/)
* [Udacity](https://www.udacity.com/)
* [Coursera](https://www.coursera.org/)
* [edx](https://www.edx.org/)
### git and version control
* [git intro by github](https://github.github.io/on-demand/)
* [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/)
* [Happy Git and GitHub for the useR](http://happygitwithr.com/) A book by Jenny Bryan.
* [learn git branching](http://learngitbranching.js.org/)
* [A Git Workflow Walkthrough Series](http://vallandingham.me/git-workflow.html)
* [paper:A Quick Introduction to Version Control with Git and GitHub](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004668)
* [paper:Ten Simple Rules for Taking Advantage of Git and GitHub](http://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1004947)
* [software carpentry git novice lesson](http://swcarpentry.github.io/git-novice/)
* [git best practices](https://sethrobertson.github.io/GitBestPractices/)
* [GitHub cheat sheet](https://github.com/tiimgreen/github-cheat-sheet#readme)
* [oh shit git!](http://ohshitgit.com/) Git is hard: screwing up is easy, and figuring out how to fix your mistakes is fucking impossible. Git documentation has this chicken and egg problem where you can't search for how to get yourself out of a mess, unless you already know the name of the thing you need to know about in order to fix your problem.
* [How to undo (almost) anything with Git](https://github.com/blog/2019-how-to-undo-almost-anything-with-git)
* [A guide for astronauts (now, programmers using Git) about what to do when things go wrong: git flight rules](https://github.com/k88hudson/git-flight-rules)
* An opinionated intermediate/advanced Git book: [Git in Practice](https://github.com/GitInPractice/GitInPractice#readme)
* [shournal](https://github.com/tycho-kirchner/shournal) Save your bash history and record how a file is generated! For a more formal introduction, read "Bashing irreproducibility with shournal" on [bioRxiv](https://www.biorxiv.org/content/10.1101/2020.08.03.232843v1).
### blogs
* [blogdown](https://github.com/rstudio/blogdown) from Yihui Xie.
* [Jekyll Jupyter Notebook plugin](https://github.com/red-data-tools/jekyll-jupyter-notebook)
* [How to Use Plotly with Jekyll and Github Pages](http://ryankuhn.net/blog/How-To-Use-Plotly-With-Jekyll)
* [render Rmd pages into blog posts using updated rmarkdown::render function](https://sbamin.com/blog/2017/05/hello-r-jekyll/)
### data management
* [youtube video from softwarecarpentry](https://www.youtube.com/watch?v=3MEJ38BO6Mo)
* [research data management: the-turing-way](https://the-turing-way.netlify.app/reproducible-research/rdm/rdm-fair.html)
* [How to FAIR](https://howtofair.dk/)
* [The FAIR Guiding Principles for scientific data management and stewardship](https://www.nature.com/articles/sdata201618)
* [Developing a modern data workflow for living data](https://www.biorxiv.org/content/early/2018/06/12/344804)
* [online course CN-2559-BEST-PRACTICES-BIOMEDICAL-RESEARCH-DATA-MANAGEMENT](https://learn.canvas.net/courses/1854)
* [Ten Simple Rules for Creating a Good Data Management Plan](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004525)
* [Nine simple ways to make it easier to (re)use your data](https://ojs.library.queensu.ca/index.php/IEE/article/view/4608)
* [DataONE Best Practices](https://www.dataone.org/best-practices)
* [Research Data Management: A Primer Publication of the National Information Standards Organization](https://groups.niso.org/apps/group_public/download.php/15375/PrimerRDM-2015-0727%E2%80%A6)
* [Data management for biologists](https://www.tjelvarolsson.com/blog/data-management-for-biologists/) A blog post by Tjelvar Olsson. Also check his [dtool](https://dtool.readthedocs.io/en/latest/philosophy.html)
* [peppy](http://code.databio.org/peppy/) is a Python package that provides an API for handling standardized project and sample metadata for projects defined in Portable Encapsulated Project (PEP) format.
### Automate your workflow, open science and reproducible research
**Automation wins in the long run.**
![](https://cloud.githubusercontent.com/assets/4106146/20045655/b58467e6-a468-11e6-8d63-b44da6a276b1.png)
**STEP 6 is usually missing!**
![](https://cloud.githubusercontent.com/assets/4106146/19217628/807b5bba-8da5-11e6-8387-5f701d7a9ead.jpg)
The pic was downloaded from http://biobungalow.weebly.com/bio-bungalow-blog/everybody-knows-the-scientific-method
#### Workflow languages
##### Reviews
* [Streamlining Data-Intensive Biology With Workflow Systems](https://dib-lab.github.io/2020-workflows-paper/) Nice read from Titus Brown Group.
* A [blog post](https://jmazz.me/blog/NGS-Workflows) comparing bash script, make, snakemake and nextflow.
* [paper:A review of bioinformatic pipeline frameworks](http://bib.oxfordjournals.org/content/early/2016/03/23/bib.bbw020.long)
* [Building Infrastructure and Workflows for Clinical Bioinformatics Pipelines](https://www.sciencedirect.com/science/article/pii/S2589408020300156)
* [Existing Workflow systems](https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems)
* [Workflow management software for pipeline development in NGS](https://www.biostars.org/p/115745/)
* [Awesome pipeline toolkit list](https://github.com/pditommaso/awesome-pipeline)
##### Snakemake
* [Snakemake](https://github.com/snakemake/snakemake) [[Docs](https://snakemake.readthedocs.io)] [[Publication](https://academic.oup.com/bioinformatics/article/28/19/2520/290322)]
* [Snakemake tutorial from Titus Brown 2019](https://github.com/ctb/2019-snakemake-ucdavis)
* [Snakemake tutorial from Titus Brown 2020](https://hackmd.io/jXwbvOyQTqWqpuWwrpByHQ?view)
* [snakePipes: facilitating flexible, scalable and integrative epigenomic analysis](https://github.com/maxplanck-ie/snakepipes) [[Publication](https://academic.oup.com/bioinformatics/article/35/22/4757/5499080)]
* [Snk: A Snakemake CLI and Workflow Management System](https://joss.theoj.org/papers/10.21105/joss.07410)

I am using Snakemake and so far am very happy with it!
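
Make-style workflow tools re-run a step when its output is missing or older than any of its inputs. The snippet below is a minimal plain-Python sketch of that freshness check only. It is not Snakemake's API, and `needs_update` is a hypothetical helper name used for illustration.

```python
from pathlib import Path


def needs_update(target, sources):
    """Return True if `target` is missing or older than any file in `sources`.

    Hypothetical helper illustrating a Make-style freshness check;
    it is not part of Snakemake or any other tool listed here.
    """
    target = Path(target)
    if not target.exists():
        # No output yet, so the step has to run.
        return True
    target_mtime = target.stat().st_mtime
    # Re-run if any input was modified after the output was written.
    return any(Path(src).stat().st_mtime > target_mtime for src in sources)
```

Real workflow managers build on this by chaining such checks across a dependency graph, so that only the steps downstream of a changed file are re-run.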
##### Nextflow
* [Nextflow](https://www.nextflow.io/) [[Docs](https://www.nextflow.io/docs/latest/index.html)] [[Publication](https://www.nature.com/articles/nbt.3820)]
* [Nextflow DSL 2 modular syntax](https://www.nextflow.io/docs/latest/dsl2.html) [[Original GitHub issue](https://github.com/nextflow-io/nextflow/issues/984)]
* [Nextflow Camp DSL 2 tutorial 2019](https://github.com/nextflow-io/nfcamp-tutorial)
* [CZ Biohub Nextflow tutorial 2019](https://github.com/czbiohub/nextflow-tutorial-2019)
* [Nextflow workshop tutorial 2018](https://nextflow-io.github.io/nf-hack18/)
* [Nextflow pipeline examples](https://www.nextflow.io/example1.html)
* [The nf-core framework for community-curated bioinformatics pipelines](https://nf-co.re/) [[Existing Workflows](https://nf-co.re/pipelines)] [[Publication](https://rdcu.be/b1GjZ)]
* [Curated list of Nextflow pipelines](https://github.com/nextflow-io/awesome-nextflow)
* [NGS pipelines by nextflow core](https://github.com/nf-core)
* [nextflow tower](https://tower.nf/)
* [A Nextflow pipeline assembler for genomics](https://github.com/assemblerflow/assemblerflow) and [flowcraft](https://github.com/assemblerflow/flowcraft) Now you can track both the execution of a Nextflow pipeline AND the reports that it generates in real time! You can even follow the reports (https://tinyurl.com/y854vftf ) and the pipeline execution.
#### Reproducible research
* [Data Skills for Reproducible Research](https://psyteachr.github.io/reprores-v3/)
* [pracpac: Practical R Packaging with Docker](https://arxiv.org/abs/2303.07876)
* [rix: Reproducible Environments with Nix](https://docs.ropensci.org/rix/) not only for R package versions but also R versions and operating systems. I will try it!
* [Awesome youtube video for reproducible workflow](https://www.youtube.com/watch?v=s3JldKoA0zw&feature=youtu.be)
* A great book: [Building reproducible analytical pipelines with R](https://raps-with-r.dev/preface.html)
* [A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility](https://github.com/karthik/ddd)
* A must read: [Parallel sequencing lives, or what makes large sequencing projects successful ](https://academic.oup.com/gigascience/article/6/11/gix100/4557140)
* [A Reproducible Data Analysis Workflow with R Markdown, Git, Make, and Docker](https://psyarxiv.com/8xzqy/) https://github.com/aaronpeikert/reproducible-research
* [Reproducibility starts at home](http://www.jonzelner.net/statistics/make/docker/reproducibility/2016/05/31/reproducibility-pt-1/) A series of blog posts by Jon Zelner.
* [docker intro](https://staph-b.github.io/docker-builds/)
* [cyverse Reproducibility Tour](https://learning.cyverse.org/projects/cyverse-cyverse-reproducbility-tutorial/en/latest/index.html#)
* [Conda hacks for data science efficiency](http://ericmjl.com/blog/2018/12/25/conda-hacks-for-data-science-efficiency/)
* [Practical Computational Reproducibility in the Life Sciences](https://www.cell.com/cell-systems/fulltext/S2405-4712(18)30140-6?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2405471218301406%3Fshowall%3Dtrue) from Cell Systems.
* [Analysis validation has been neglected in the Age of Reproducibility](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000070)
* [The Life & Times of a Reproducible Clinical Project](https://github.com/jenniferthompson/RMedicine2018) https://jenthompson.me/slides/rmedicine2018/rmedicine2018#1
* [github Actions for R](https://speakerdeck.com/jimhester/github-actions-for-r)
* [Automate testing of your R package using Travis CI, Codecov, and testthat](https://jef.works/blog/2019/02/17/automate-testing-of-your-R-package/) by Jean Fan.
* [Reproducible computational environments using containers](https://dme26.github.io/docker-introduction/)
* [docker intro by Cyverse](https://cyverse-cybercarpentry-container-workshop-2018.readthedocs-hosted.com/en/latest/docker/dockerintro.html) and [singularity](https://cyverse-container-camp-workshop-2018.readthedocs-hosted.com/en/latest/index.html) by Upendra Devisetty. I met him at UC Davis during 2018 ANGUS :)
* [rocker/binder](https://github.com/rocker-org/binder) Adds binder abilities on top of the rocker/tidyverse images.
* [Embedding containerized workflows inside data science notebooks enhances reproducibility](https://www.biorxiv.org/content/early/2018/05/02/309567)
* [workflowr](https://jdblischak.github.io/workflowr/index.html): organized + reproducible + shareable data science in R
* [Singularity](http://singularity.lbl.gov/) Singularity enables users to have full control of their environment. Singularity containers can be used to package entire scientific workflows, software and libraries, and even data. This means that you don’t have to ask your cluster admin to install anything for you - you can put it in a Singularity container and run.
* [EMBL-bioIT singularity workshop](https://git.embl.de/grp-bio-it/singularity-training-2019)
* [continuous analysis](https://github.com/greenelab/continuous_analysis) [Reproducibility of computational workflows is automated using continuous analysis](http://www.nature.com/nbt/journal/v35/n4/full/nbt.3780.html)
* [The hard road to reproducibility](http://science.sciencemag.org/content/354/6308/142) A commentary in Science magazine.
* [Five selfish reasons to work reproducibly](http://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0850-7) Genome Biology paper.
* [Make lessons from software carpentry](http://swcarpentry.github.io/make-novice/)
* [biomake](https://github.com/evoldoers/biomake) GNU-Make-like utility for managing builds and complex workflows.
* [drake](https://github.com/ropensci/drake) An R-focused pipeline toolkit for reproducibility and high-performance computing. Snakemake in R.
* [STAT545 Automating data analysis pipelines](https://stat545-ubc.github.io/automation00_index.html)
* [biostar post:Job Manager to parallelize otherwise consecutive bash scripts](https://www.biostars.org/p/174468/)
* [initial steps toward reproducible research](http://kbroman.org/steps2rr/#TAGC16)
* [JupyterLab: the next generation of the Jupyter Notebook](http://blog.jupyter.org/2016/07/14/jupyter-lab-alpha/)
* [Deepnote](https://www.deepnote.com) - Better UI for Jupyter and enables collaboration & working online without installing anything.
* [R notebook](http://data-steve.github.io/setting-up-r-notebook/)
* [CoCalc](https://cocalc.com/) Collaborative Calculation in the Cloud
* [BEAKER THE DATA SCIENTIST'S LABORATORY](http://beakernotebook.com/)
* [nteract notebook](https://nteract.io/)
* A video by Dr. Keith A. Baggerly from MD Anderson: [The Importance of Reproducible Research in High-Throughput Biology](https://www.youtube.com/watch?v=7gYIs7uYbMo) Very interesting, and Keith is really a fun guy!
* [paper: Ten Simple Rules for Reproducible Computational Research](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285)
* [open-research](https://github.com/svaksha/aksh/blob/master/open-research.md#arr)
* [Best Practice Data Life Cycle Approaches for the Life Sciences](http://www.biorxiv.org/content/early/2017/07/24/167619)
* [Good Enough Practices in Scientific Computing](http://arxiv.org/abs/1609.00037) We present a set of computing tools and techniques that every researcher can and should adopt. These recommendations synthesize inspiration from our own work, from the experiences of the thousands of people who have taken part in Software Carpentry and Data Carpentry workshops over the past six years, and from a variety of other guides. Unlike some other guides, our recommendations are aimed specifically at people who are new to research computing. **Well worth reading!**
* [A Quick Guide to Organizing Computational Biology Projects](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424) A must read for computational biologists!
* [Ten Simple Rules for Digital Data Storage](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005097)
* Avoid `setwd()` in your R script. [`here_here()`](https://github.com/jennybc/here_here#readme) comes to the rescue.
* **Have you ever had a problem reusing one of your own published figures because of the journal's copyright?**
Here is the [solution](https://storify.com/LorenaABarba/reactions-to-my-tip-on-how-i-use-figshare)! from @LorenaABarba

>As an early adopter of the Figshare repository, I came up with a strategy that serves both our open-science and our reproducibility goals, and also helps with this problem: for the main results in any new paper, we would share the data, plotting script and figure under a CC-BY license, by first uploading them to Figshare.
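
The advice above to avoid `setwd()` carries over to Python scripts: resolve paths from the project root rather than from whatever the current working directory happens to be. A minimal sketch, assuming the project root is marked by a `.git` directory or a `.here` file. `find_project_root` is a hypothetical helper written for illustration, not the R `here` package or any existing library.

```python
from pathlib import Path


def find_project_root(start=None, markers=(".git", ".here")):
    """Walk upward from `start` until a directory containing a marker is found.

    Hypothetical helper; the marker names are assumptions and can be changed.
    """
    current = Path(start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"no project root marker {markers} found above {current}")


# Example use (the file names are placeholders):
# counts_file = find_project_root() / "data" / "counts.tsv"
```

Because every path is built from the detected root, the script behaves the same whether it is launched from the project directory, a subdirectory, or a scheduler.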
### Survival curve
* tidymodels survival analysis with [censored](https://censored.tidymodels.org/) https://hfrick.github.io/rstudio-conf-2022/#/section
* [Survival Analysis in R](http://www.emilyzabor.com/tutorials/survival_analysis_in_r_tutorial.html) This tutorial was originally presented at the Memorial Sloan Kettering Cancer Center R-Presenters series on August 30, 2018 by [Emily](http://www.emilyzabor.com)
* [Survival plots have never been so informative: survminer package](http://r-addict.com/2016/05/23/Informative-Survival-Plots.html)
* Posts for survival analysis:
    * [Survival Analysis - 1](http://justanotherdatablog.blogspot.com/2015/08/survival-analysis-1.html) KM estimator
    * [Survival Analysis - 2](http://justanotherdatablog.blogspot.com/2015/08/survival-analysis-2.html) Cox's proportional hazards model
    * [Overall Survival Curves for TCGA and Tothill by RD Status](http://bioinformatics.mdanderson.org/Supplements/ResidualDisease/Reports/osCurves.html)
    * [Survival analysis of TCGA patients integrating gene expression (RNASeq) data](https://www.biostars.org/p/153013/)
* [survminer](http://www.sthda.com/english/wiki/survminer-0-3-0)
* [survival analysis with TCGA](http://bioconnector.org/r-survival.html)
* [Kaplan Meier Mistakes](https://towardsdatascience.com/kaplan-meier-mistakes-48cd9e168b09) a blog post by https://twitter.com/BencomoTomas
* [TCGA survival](https://tcga-survival.com/)
### Organize research for a group
* [slack](https://slack.com/): A messaging app for teams.
* [Ryver](http://www.ryver.com/ryver-vs-slack/)
* [Trello](https://trello.com/) lets you work more collaboratively and get more done.
### Clustering
* [densityCut](http://m.bioinformatics.oxfordjournals.org/content/early/2016/04/23/bioinformatics.btw227.short?rss=1): an efficient and versatile topological approach for automatic clustering of biological data
* [Interactive visualisation and fast computation of the solution path: convex bi-clustering](https://www.youtube.com/watch?v=2g-akN6q8aI) by [Genevera Allen](http://www.stat.rice.edu/~gallen/software.html)
[cvxbiclustr](https://cran.r-project.org/web/packages/cvxbiclustr/index.html) and the clustRviz package coming.
* [optCluster](https://cran.r-project.org/web/packages/optCluster/index.html): An R Package for Determining the Optimal Clustering Algorithm.
* [iClusterPlus](https://www.bioconductor.org/packages/release/bioc/html/iClusterPlus.html) Integrative clustering of multiple genomic data using a joint latent variable model.
* [ConsensusClusterPlus](https://bioconductor.org/packages/release/bioc/html/ConsensusClusterPlus.html) algorithm for determining cluster count and membership by stability evidence in unsupervised analysis.
### CRISPR related
* [CRISPR GENOME EDITING MADE EASY](https://www.deskgen.com/landing/)
* [CRISPR design from Japan](http://crispr.dbcls.jp/)
* [CRISPResso](http://crispresso.rocks/): Analysis of CRISPR-Cas9 genome editing outcomes from deep sequencing data
* [CRISPR-DO](http://cistrome.org/crispr/): A whole genome CRISPR designer and optimizer in human and mouse
* [CCTop](http://crispr.cos.uni-heidelberg.de/) - CRISPR/Cas9 target online predictor
* [DESKGEN](https://horizon.deskgen.com/landing/#/)
* [Genome-wide Unbiased Identifications of DSBs Evaluated by Sequencing (GUIDE-seq) is a novel method the Joung lab has developed to identify the off-target sites of CRISPR-Cas RNA-guided Nucleases](http://www.jounglab.org/guideseq)
* [WTSI Genome Editing (WGE) is a website that provides tools to aid with genome editing of human and mouse genomes](http://www.sanger.ac.uk/htgt/wge/)
### vector arts for life sciences
* [biorender](https://biorender.io/)
* [The Noun Project](https://thenounproject.com/)
* https://bioicons.com/
* https://healthicons.org/
* [reactome icon](https://reactome.org/icon-lib)
* [Innovative Genomics Institute glossary](https://innovativegenomics.org/resources/educational-materials/glossary/)
* https://smart.servier.com/category/cellular-biology/nucleic-acids/
* https://www.vecteezy.com/
* https://www.freepik.com/
* https://pixabay.com/
* Make workflow diagrams: https://app.diagrams.net/