Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome-Bioinformatics
A curated list of awesome Bioinformatics libraries and software.
https://github.com/danielecook/Awesome-Bioinformatics
-
Package suites
- Bioconductor - A plethora of tools for analysis and comprehension of high-throughput genomic data, including 1500+ software packages. [ [paper-2004](https://link.springer.com/article/10.1186/gb-2004-5-10-r80) | [web](https://www.bioconductor.org) ]
- Bioconda - A channel for the [conda package manager](http://conda.pydata.org/docs/intro.html) specializing in bioinformatics software. Includes a repository with 3000+ ready-to-install (with `conda install`) bioinformatics packages. [ [paper-2018](https://pubmed.ncbi.nlm.nih.gov/29967506) | [web](https://bioconda.github.io) ]
- BioJulia - Bioinformatics and computational biology infrastructure for the Julia programming language. [ [web](https://biojulia.net) ]
- (Poly)merase - A Go library and command line utility for engineering organisms.
- Bioperl - International association of users & developers of open source Perl tools for bioinformatics, genomics and life sciences. [ [paper-2002](https://doi.org/10.1101%2Fgr.361602) | [web](https://bioperl.org) ]
- Biopython - Freely available tools for biological computing in Python, with included cookbook, packaging and thorough documentation. Part of the [Open Bioinformatics Foundation](http://open-bio.org/). Contains the very useful [Entrez](https://biopython.org/DIST/docs/api/Bio.Entrez-module.html) package for API access to the NCBI databases. [ [paper-2009](https://pubmed.ncbi.nlm.nih.gov/19304878) | [web](https://biopython.org) ]
- Rust-Bio - Rust implementations of algorithms and data structures useful for bioinformatics. [ [paper-2016](http://bioinformatics.oxfordjournals.org/content/early/2015/10/06/bioinformatics.btv573.short?rss=1) ]
- SeqAn - The modern C++ library for sequence analysis.
- Biocaml - Biocaml aims to be a high-performance user-friendly library for Bioinformatics.
-
Data Processing
-
Command Line Utilities
- datamash - Data transformations and statistics. [ [web](http://www.gnu.org/software/datamash) ]
- Bioinformatics One Liners - Git repo of useful single line commands.
- BioNode - Modular and universal bioinformatics, Bionode provides pipeable UNIX command line tools and JavaScript APIs for bioinformatics analysis workflows. [ [web](http://bionode.io) ]
- bioSyntax - Syntax Highlighting for Computational Biology file formats (SAM, VCF, GTF, FASTA, PDB, etc...) in vim/less/gedit/sublime. [ [paper-2018](https://pubmed.ncbi.nlm.nih.gov/30134911) | [web](http://www.bioSyntax.org) ]
- CSVKit - Utilities for working with CSV/Tab-delimited files. [ [web](https://csvkit.readthedocs.io/en/latest) ]
- csvtk - Another cross-platform, efficient, practical and pretty CSV/TSV toolkit. [ [web](https://bioinf.shenwei.me/csvtk) ]
- easy_qsub - Easily submit PBS jobs using a script template. Multiple input files supported.
- grabix - A wee tool for random access into BGZF files.
- gsort - Sort genomic files according to a specified order.
- tabix - Table file index. [ [paper-2011](https://pubmed.ncbi.nlm.nih.gov/21208982) ]
- wormtable - Write-once-read-many table for large datasets.
- zindex - Create an index on a compressed text file.
-
-
Next Generation Sequencing
-
Workflow Managers
- Galaxy - A popular open-source, web-based platform for data-intensive biomedical research. Offers features ranging from data analysis to workflow management to visualization tools. [ [paper-2018](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030816) | [web](https://galaxyproject.org) ]
- Snakemake - A workflow management system in Python that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment. [ [paper-2018](https://pubmed.ncbi.nlm.nih.gov/29788404) | [web](https://snakemake.readthedocs.io) ]
- BigDataScript - A cross-system scripting language for working with big data pipelines in computer systems of different sizes and capabilities. [ [paper-2014](https://pubmed.ncbi.nlm.nih.gov/25189778) | [web](https://pcingola.github.io/BigDataScript) ]
- Bpipe - A small language for defining pipeline stages and linking them together to make pipelines. [ [web](http://docs.bpipe.org) ]
- Common Workflow Language - a specification for describing analysis workflows and tools that are portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. [ [web](http://www.commonwl.org) ]
- Cromwell - A Workflow Management System geared towards scientific workflows. [ [web](https://cromwell.readthedocs.io) ]
- Nextflow - A fluent DSL modelled around the UNIX pipe concept, that simplifies writing parallel and scalable pipelines in a portable manner. [ [paper-2018](https://pubmed.ncbi.nlm.nih.gov/29412134) | [web](http://nextflow.io) ]
- redun - A Python-based workflow manager.
- Ruffus - Computation pipeline library for Python, widely used in science and bioinformatics. [ [paper-2010](https://pubmed.ncbi.nlm.nih.gov/20847218) | [web](http://www.ruffus.org.uk) ]
- SciPipe - Workflow library embedded in the Go programming language, focusing on supporting complex workflow constructs, compiling to a single binary, providing powerful file naming and comprehensive audit reports for every output. [ [paper-2019](https://pubmed.ncbi.nlm.nih.gov/31029061/) | [web](https://scipipe.org/) ]
- SeqWare - Hadoop Oozie-based workflow system focused on genomics data analysis in cloud environments. [ [paper-2010](https://pubmed.ncbi.nlm.nih.gov/21210981) | [web](https://seqware.github.io) ]
- Workflow Description Language - Workflow standard developed by the Broad Institute. [ [web](https://software.broadinstitute.org/wdl) ]
-
Pipelines
- bcbio-nextgen - Batteries included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction. [ [web](https://bcbio-nextgen.readthedocs.io) ]
- Awesome-Pipeline - A list of pipeline resources.
- Bactopia - A flexible pipeline, built with Nextflow, for the complete analysis of bacterial genomes. [ [web](https://bactopia.github.io/) ]
- Bacannot - A generic but comprehensive bacterial annotation pipeline, built with Nextflow, with nice graphical options for investigating results. [ [web](https://bacannot.readthedocs.io/en/latest/?badge=latest) ]
- R-Peridot - Customizable pipeline for differential expression analysis with an intuitive GUI. [ [web](http://www.bioinformatics-brazil.org/r-peridot) ]
- ngs-preprocess - A pipeline for preprocessing short and long sequencing reads, built with Nextflow. [ [web](https://ngs-preprocess.readthedocs.io/en/latest/?badge=latest) ]
-
GFF BED File Utilities
- BEDOPS - The fast, highly scalable and easily-parallelizable genome analysis toolkit. [ [paper-2012](https://academic.oup.com/bioinformatics/article/28/14/1919/218826) ]
- AGAT - Suite of tools to handle gene annotations in any GTF/GFF format. [ [web](https://agat.readthedocs.io/en/latest/?badge=latest) ]
- Bedtools2 - A Swiss Army knife for genome arithmetic. [ [paper-2010](https://pubmed.ncbi.nlm.nih.gov/20110278) | [paper-2014](https://pubmed.ncbi.nlm.nih.gov/25199790) | [web](https://bedtools.readthedocs.io) ]
- gffutils - GFF and GTF file manipulation and interconversion. [ [web](http://daler.github.io/gffutils) ]
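To make the "genome arithmetic" these toolkits perform more concrete, here is a minimal pure-Python sketch of intersecting two sets of BED-style intervals (0-based, half-open coordinates). The function name `intersect_intervals` and the sample coordinates are illustrative assumptions. This is not how BEDOPS or bedtools are implemented; real tools rely on sorted input and far more efficient algorithms.

```python
from typing import List, Tuple

# (chrom, start, end) using BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def intersect_intervals(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portions of intervals in a and b.

    Naive O(len(a) * len(b)) scan, intended only to illustrate the idea.
    """
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # half-open: intervals that merely touch do not overlap
                overlaps.append((chrom_a, start, end))
    return sorted(overlaps)


# Hypothetical example data
genes = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150)]
peaks = [("chr1", 150, 350), ("chr2", 150, 200)]
result = intersect_intervals(genes, peaks)
print(result)
```

For real datasets, the dedicated tools listed above are the better choice, since they stream sorted files instead of comparing every pair of intervals.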
-
Variant Prediction/Annotation
- Ensembl VEP - The VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. [ [paper-2016](https://doi.org/10.1186/s13059-016-0974-4) | [web](http://www.ensembl.org/info/docs/tools/vep/index.html) ]
- SIFT - Predicts whether an amino acid substitution affects protein function. [ [paper-2003](https://pubmed.ncbi.nlm.nih.gov/12824425) | [web](http://sift.jcvi.org) ]
- SnpEff - Genetic variant annotation and effect prediction toolbox. [ [paper-2012](https://www.tandfonline.com/doi/full/10.4161/fly.19695) | [web](https://pcingola.github.io/SnpEff) ]
-
Sequence Processing
- MultiQC - Aggregate results from bioinformatics analyses across many samples into a single report. [ [paper-2016](https://pubmed.ncbi.nlm.nih.gov/27312411) | [web](http://multiqc.info) ]
- AfterQC - Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data. [ [paper-2017](https://pubmed.ncbi.nlm.nih.gov/28361673) ]
- FastQC - A quality control tool for high throughput sequence data. [ [web](http://www.bioinformatics.babraham.ac.uk/projects/fastqc) ]
- Fastqp - FASTQ and SAM quality control using Python.
- Fastx Toolkit - FASTQ/A short-reads pre-processing tools: Demultiplexing, trimming, clipping, quality filtering, and masking utilities. [ [web](http://hannonlab.cshl.edu/fastx_toolkit) ]
- SeqFu - Sequence manipulation toolkit for FASTA/FASTQ files written in Nim. [ [paper-2021](https://www.mdpi.com/2306-5354/8/5/59) | [web](https://telatin.github.io/seqfu2/) ]
- SeqKit - A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang. [ [paper-2016](https://pubmed.ncbi.nlm.nih.gov/27706213) | [web](https://bioinf.shenwei.me/seqkit) ]
- seqmagick - Convenient sequence file format conversion using Biopython. [ [web](http://seqmagick.readthedocs.io) ]
- Seqtk - Toolkit for processing sequences in FASTA/Q formats.
- smof - UNIX-style FASTA manipulation tools.
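As a rough illustration of the kind of FASTA manipulation these toolkits automate, the following pure-Python sketch reads FASTA records and reports GC content per sequence. The helper names `read_fasta` and `gc_content` and the sample records are assumptions made for this example; they do not mirror the interface of any tool listed above.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA text into a dict mapping record ID to sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""  # ID is the first word of the header
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical in-memory FASTA data; a real file handle works the same way
example = io.StringIO(">seq1 demo record\nACGT\nGGCC\n>seq2\nATAT\n")
records = read_fasta(example)
stats = {name: gc_content(seq) for name, seq in records.items()}
print(stats)
```

The sketch loads every sequence into memory, which is fine for small files; the dedicated toolkits above are designed for large FASTA/FASTQ inputs.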
-
Sequence Alignment
- WFA - The wavefront alignment algorithm (WFA), which exploits sequence similarity to speed up alignment. [ [paper-2020](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa777/5904262) ]
- Bowtie 2 - An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. [ [paper-2012](https://pubmed.ncbi.nlm.nih.gov/22388286) | [web](http://bowtie-bio.sourceforge.net/bowtie2) ]
- BWA - Burrows-Wheeler Aligner for pairwise alignment between DNA sequences.
- Parasail - SIMD C library for global, semi-global, and local pairwise sequence alignments [ [paper-2016](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0930-z) ]
- MUMmer - A system for rapidly aligning entire genomes, whether in complete or draft form. [ [paper-1999](http://mummer.sourceforge.net/MUMmer.pdf) | [paper-2002](http://mummer.sourceforge.net/MUMmer2.pdf) | [paper-2004](http://mummer.sourceforge.net/MUMmer3.pdf) | [web](http://mummer.sourceforge.net) ]
- DIAMOND - An ultrafast protein aligner for `blastp` and `blastx` like searches. [ [paper-2021](https://www.nature.com/articles/s41592-021-01101-x) ]
- POA - Partial-Order Alignment for fast alignment and consensus of multiple homologous sequences. [ [paper-2002](https://academic.oup.com/bioinformatics/article/18/3/452/236691) ]
- MMseqs2 - Ultra-fast, sensitive search and clustering suite for protein and nucleotide sequence sets. [ [paper-2017](https://www.nature.com/articles/nbt.3988) | [paper-2018](https://www.nature.com/articles/s41467-018-04964-5) ]
-
Variant Calling
- freebayes - Bayesian haplotype-based polymorphism discovery and genotyping. [ [web](http://arxiv.org/abs/1207.3907) ]
- DeepVariant - Deep learning-based variant caller [ [paper-2018](https://rdcu.be/7Dhl) ]
- GATK - Variant Discovery in High-Throughput Sequencing Data. [ [web](https://software.broadinstitute.org/gatk) ]
- Octopus - A polymorphic Bayesian genotyping model with wide applicability. [ [paper-2021](https://www.nature.com/articles/s41587-021-00861-3) ]
- Delly - Structural variant discovery by integrated paired-end and split-read analysis. [ [paper-2012](https://pubmed.ncbi.nlm.nih.gov/22962449) ]
- lumpy - lumpy: a general probabilistic framework for structural variant discovery. [ [paper-2014](https://link.springer.com/article/10.1186/gb-2014-15-6-r84) ]
- manta - Structural variant and indel caller for mapped sequencing data. [ [paper-2015](https://pubmed.ncbi.nlm.nih.gov/26647377) ]
- gridss - GRIDSS: the Genomic Rearrangement IDentification Software Suite. [ [paper-2017](https://pubmed.ncbi.nlm.nih.gov/29097403) ]
- smoove - Structural variant calling and genotyping with existing tools, but, smoothly.
-
Data Analysis
-
Quantification
- Cufflinks - Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. [ [paper-2010](https://www.nature.com/articles/nbt.1621) ]
- RSEM - A software package for estimating gene and isoform expression levels from RNA-Seq data. [ [paper-2011](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-323) | [web](http://deweylab.github.io/RSEM/) ]
-
BAM File Utilities
- Bamtools - Collection of tools for working with BAM files. [ [paper-2011](https://academic.oup.com/bioinformatics/article/27/12/1691/255399) ]
- bam toolbox
- mergesam - Automate common SAM & BAM conversions.
- mosdepth - fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing. [ [paper-2017](https://pubmed.ncbi.nlm.nih.gov/29096012/) ]
- SAMstat - Displaying sequence statistics for next-generation sequencing. [ [paper-2010](https://academic.oup.com/bioinformatics/article/27/1/130/201972) | [web](http://samstat.sourceforge.net) ]
- Somalier - Fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs. [ [paper-2020](https://pubmed.ncbi.nlm.nih.gov/32664994) ]
- Telseq - Telseq is a tool for estimating telomere length from whole genome sequence data. [ [paper-2014](https://academic.oup.com/nar/article/42/9/e75/1249448) ]
-
VCF File Utilities
- vcfanno - Annotate a VCF with other VCFs/BEDs/tabixed files. [ [paper-2016](https://pubmed.ncbi.nlm.nih.gov/27250555) ]
- vcflib - A C++ library for parsing and manipulating VCF files.
- vcftools - VCF manipulation and statistics (e.g. linkage disequilibrium, allele frequency, Fst). [ [paper-2011](https://pubmed.ncbi.nlm.nih.gov/21653522) ]
- bcftools - Set of tools for manipulating VCF files. [ [paper-2016](https://pubmed.ncbi.nlm.nih.gov/26826718) | [paper-2017](https://pubmed.ncbi.nlm.nih.gov/28205675) | [web](http://samtools.github.io/bcftools) ]
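For readers new to the VCF format, this small pure-Python sketch shows how an alternate-allele frequency can be derived from the genotype (`GT`) fields of a single VCF data line. The function name `alt_allele_frequency` and the example record are assumptions for illustration only. It is not the method used by vcftools or bcftools, and it counts every non-reference allele as "alternate".

```python
def alt_allele_frequency(vcf_line: str) -> float:
    """Compute the fraction of called alleles that are non-reference.

    Expects a tab-separated VCF data line with a FORMAT column (index 8)
    that contains a GT key, followed by one column per sample.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")

    alt_count = 0
    called = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Phased (|) and unphased (/) genotypes are treated the same here
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":  # missing call
                continue
            called += 1
            if allele != "0":
                alt_count += 1
    return alt_count / called if called else 0.0


# Hypothetical record with four samples, one of them uncalled
line = "chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:20\t1|1:15\t./.:0\t0/0:30"
freq = alt_allele_frequency(line)
print(freq)
```

For real analyses, the libraries and tools listed above handle headers, multi-allelic sites, and compressed files, none of which this sketch attempts.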
-
Variant Simulation
- Bam Surgeon - Tools for adding mutations to existing `.bam` files, used for testing mutation callers. [ [web](https://popmodels.cancercontrol.cancer.gov/gsr/packages/bamsurgeon) ]
- wgsim - **Comes with samtools!** - Reads simulator. [ [web](https://popmodels.cancercontrol.cancer.gov/gsr/packages/wgsim) ]
-
Python Modules
- cruzdb - Pythonic access to the UCSC Genome database. [ [paper-2013](https://academic.oup.com/bioinformatics/article/29/23/3003/248468) ]
- pyensembl - Pythonic Access to the Ensembl database. [ [web](https://pyensembl.readthedocs.io/en/latest/pyensembl.html) ]
- bioservices - Access to Biological Web Services from Python. [ [paper-2013](https://academic.oup.com/bioinformatics/article/29/24/3241/194040) | [web](http://bioservices.readthedocs.io) ]
- cyvcf - A port of [pyVCF](https://github.com/jamescasbon/PyVCF) using Cython for speed.
- cyvcf2 - Cython + HTSlib == fast VCF parsing; even faster parsing than pyVCF. [ [paper-2017](https://pubmed.ncbi.nlm.nih.gov/28165109) | [web](https://brentp.github.io/cyvcf2) ]
- pyBedTools - Python wrapper for [bedtools](https://github.com/arq5x/bedtools). [ [paper-2011](https://pubmed.ncbi.nlm.nih.gov/21949271) | [web](http://daler.github.io/pybedtools) ]
- pyfaidx - Pythonic access to FASTA files.
- pysam - Python wrapper for [samtools](https://github.com/samtools/samtools). [ [web](https://pysam.readthedocs.io/en/latest/api.html) ]
- pyVCF - A VCF Parser for Python. [ [web](http://pyvcf.readthedocs.org/en/latest/index.html) ]
-
Assembly
- SPAdes - SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines and the de-facto standard for prokaryotic genome assemblies.
- SKESA - SKESA is a de-novo sequence read assembler for microbial genomes. It uses conservative heuristics and is designed to create breaks at repeat regions in the genome. This leads to excellent sequence quality without significantly compromising contiguity.
-
Annotation
- Prokka - Prokka: rapid prokaryotic genome annotation. Prokka is one of the most cited annotation command line tools for microbial genome annotations.
- Bakta - Bakta is a tool for the rapid & standardized annotation of bacterial genomes & plasmids. It provides dbxref-rich and sORF-including annotations in machine-readable JSON & bioinformatics standard file formats for automatic downstream analysis.
-
-
Visualization
-
Circos Related
- rCircos - R package for circular plots. [ [paper-2013](https://pubmed.ncbi.nlm.nih.gov/23937229) | [web](http://watson.nci.nih.gov/cran_mirror/web/packages/RCircos/index.html) ]
- Circos - Perl package for circular plots, which are well suited for genomic rearrangements. [ [paper-2009](https://pubmed.ncbi.nlm.nih.gov/19541911) | [web](http://circos.ca) ]
- fujiplot - A circos representation of multiple GWAS results. [ [paper-2018](https://www.nature.com/articles/s41588-018-0047-6) ]
-
Genome Browsers / Gene Diagrams
- Squiggle - Easy-to-use DNA sequence visualization tool that turns FASTA files into browser-based visualizations. [ [paper-2018](https://pubmed.ncbi.nlm.nih.gov/30247632) | [web](https://squiggle.readthedocs.io/en/latest/) ]
- biodalliance - Embeddable genome viewer. Integrates data from a wide variety of sources, and can load data directly from popular genomics file formats including bigWig, BAM, and VCF.
- BioJS - BioJS is a library of over a hundred JavaScript components enabling you to visualize and process data using current web technologies. [ [paper-2014](https://pubmed.ncbi.nlm.nih.gov/25075290/) | [web](http://biojs.net/) ]
- Circleator - Flexible circular visualization of genome-associated data with BioPerl and SVG. [ [paper-2014](https://pubmed.ncbi.nlm.nih.gov/25075113) ]
- DNAism - Horizon chart D3-based JavaScript library for DNA data. [ [paper-2016](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0891-2) | [web](http://drio.github.io/dnaism/) ]
- IGV js - Java-based browser. Fast, efficient, scalable visualization tool for genomics data and annotations. Handles a large variety of formats. [ [paper-2019](https://pubmed.ncbi.nlm.nih.gov/31099383) | [web](https://software.broadinstitute.org/software/igv) ]
- Island Plot - D3 JavaScript based genome viewer. Constructs SVGs. [ [paper-2015](https://pubmed.ncbi.nlm.nih.gov/25916842/) ]
- JBrowse - JavaScript genome browser that is highly customizable via plugins and track customizations. [ [paper-2016](https://pubmed.ncbi.nlm.nih.gov/27072794) | [web](http://jbrowse.org/) ]
- PHAT - Point and click, cross platform suite for analysing and visualizing next-generation sequencing datasets. [ [paper-2018](https://pubmed.ncbi.nlm.nih.gov/30561651) | [web](https://chgibb.github.io/PHATDocs) ]
- pileup.js - JavaScript library that can be used to generate interactive and highly customizable web-based genome browsers. [ [paper-2016](https://pubmed.ncbi.nlm.nih.gov/27153605) ]
- scribl - JavaScript library for drawing canvas-based gene diagrams. [ [paper-2012](https://pubmed.ncbi.nlm.nih.gov/23172864) | [web](http://chmille4.github.io/Scribl) ]
-
-
Database Access
-
- Entrez Direct: E-utilities on the UNIX command line - UNIX command line tools to access NCBI's databases programmatically. Instructions to install and examples are found in the link.
-
-
Resources
-
Becoming a Bioinformatician
- What is a bioinformatician
- Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies
- Top N Reasons To Do A Ph.D. or Post-Doc in Bioinformatics/Computational Biology
- A 10-Step Guide to Party Conversation For Bioinformaticians - Here is a step-by-step guide on how to convey concepts to people not involved in the field when asked the question: 'So, what do you do?'
- A History Of Bioinformatics (In The Year 2039) - A talk by C. Titus Brown that looks back at bioinformatics from the year 2039. His notes for this talk can be found [here](http://ivory.idyll.org/blog/2014-bosc-keynote.html).
- A farewell to bioinformatics - A critical view of the state of bioinformatics.
- A Series of Interviews with Notable Bioinformaticians - Dr. Keith Bradnam "thought it might be instructive to ask a simple series of questions to a bunch of notable bioinformaticians to assess their feelings on the current state of bioinformatics research, and maybe get any tips they have about what has been useful to their bioinformatics careers."
- Rosalind - Rosalind is a platform for learning bioinformatics through problem solving.
- A guide for the lonely bioinformatician - This guide is aimed at bioinformaticians, and is meant to guide them towards better career development.
- A brief history of bioinformatics
-
Sequencing
- Next-Generation Sequencing Technologies - Elaine Mardis (2014) - Excellent (technical) overview of next-generation and third-generation sequencing technologies, along with some applications in cancer research.
- Annotated bibliography of \*Seq assays - List of ~100 papers on various sequencing technologies and assays ranging from transcription to transposable element discovery.
- For all you seq... (PDF) - Massive infographic by Illumina illustrating how many sequencing techniques work. Techniques cover protein-protein interactions, RNA transcription, RNA-protein interactions, RNA low-level detection, RNA modifications, RNA structure, DNA rearrangements and markers, DNA low-level detection, epigenetics, and DNA-protein interactions. References included.
-
RNA-Seq
- Review papers on RNA-seq (Biostars) - Includes lots of seminal papers on RNA-seq and analysis methods.
- RNA-seqlopedia - RNA-seqlopedia provides an awesome overview of RNA-seq and of the choices necessary to carry out a successful RNA-seq experiment.
- A survey of best practices for RNA-seq data analysis - Gives an awesome roadmap for RNA-seq computational analyses, including challenges/obstacles and things to look out for, but also how you might integrate RNA-seq data with other data types.
- Stories from the Supplement - Dr. Lior Pachter shares his stories from the supplement for well-known RNA-seq analysis software CuffDiff and [Cufflinks](http://cole-trapnell-lab.github.io/cufflinks/) and explains some of their methodologies.
- List of RNA-seq Bioinformatics Tools - Extensive list on Wikipedia of RNA-seq bioinformatics tools, covering all parts of an analysis pipeline from quality control and alignment to splice analysis and visualization.
-
YouTube Channels and Playlists
- Current Topics in Genome Analysis 2016 - Excellent series of fourteen lectures given at NIH about current topics in genomics ranging from sequence analysis, to sequencing technologies, and even more translational topics such as genomic medicine.
- GenomeTV - "GenomeTV is NHGRI's collection of official video resources from lectures, to news documentaries, to full video collections of meetings that tackle the research, issues and clinical applications of genomic research."
- Leading Strand - Keynote lectures from Cold Spring Harbor Laboratory (CSHL) Meetings. More on [The Leading Strand](http://theleadingstrand.cshl.edu/).
- Genomics, Big Data and Medicine Seminar Series - "Our seminars are dedicated to the critical intersection of GBM, delving into 'bleeding edge' technology and approaches that will deeply shape the future."
- Rafael Irizarry's Channel - Dr. Rafael Irizarry's lectures and academic talks on statistics for genomics.
- NIH VideoCasting and Podcasting - "NIH VideoCast broadcasts seminars, conferences and meetings live to a world-wide audience over the Internet as a real-time streaming video." Not exclusively genomics and bioinformatics videos, but it includes many great talks on domain-specific uses of bioinformatics and genomics.
-
Blogs
- ACGT - Dr. Keith Bradnam writes about his "thoughts on biology, genomics, and the ongoing threat to humanity from the bogus use of bioinformatics acronyms."
- Opiniomics - Dr. Mick Watson writes on bioinformatics, genomes, and biology.
- Bits of DNA - Dr. Lior Pachter writes reviews and commentary on computational biology.
- it is NOT junk - Dr. Michael Eisen writes "a blog about genomes, DNA, evolution, open science, baseball and other important things"
- #!/perl/bioinfo - The Computational and Structural Biology group at EEAD-CSIC writes, in Spanish and English, about ideas and code for plant genomics, computational and structural biology problems.
-
Miscellaneous
- A New Online Computational Biology Curriculum - "This article introduces a catalog of several hundred free video courses of potential interest to those wishing to expand their knowledge of bioinformatics and computational biology. The courses are organized into eleven subject areas modeled on university departments and are accompanied by commentary and career advice."
- How Perl Saved the Human Genome Project - An anecdote by Lincoln D. Stein on the importance of the Perl programming language in the Human Genome Project.
- Educational Papers from Nature Biotechnology and PLoS Computational Biology - Page of links to primers and short educational articles on various methods used in computational biology and bioinformatics.
- The PeerJ Bioinformatics Software Tools Collection - Collection of tools curated by Keith Crandall and Claus White, aimed at collating the most interesting, innovative, and relevant bioinformatics tools articles in PeerJ.
-
-
Online networking groups
-
Miscellaneous
- Bioinformatics (on Discord) - a Discord server for general bioinformatics
- r-bioinformatics - the official Slack workspace of r/bioinformatics ([send a direct message to apfejes on reddit](https://www.reddit.com/message/compose/?to=apfejes&subject=Request%20to%20join%20the%20r/bioinformatics%20Slack%20group&message=I%20would%20like%20to%20request%20to%20join%20the%20r/bioinformatics%20Slack%20group))
- BioinformaticsGRX - A community of bioinformaticians based in Granada, Spain
- Comunidad de Desarrolladores de Software en Bioinformática - A community of bioinformaticians centered in Latin America
- COMBINE - An Australian group for bioinformatics students
-
-
Data Tools
-
Downloading
- GGD - Go Get Data; A command line interface for obtaining genomic data. [ [web](https://gogetdata.github.io) ]
- SRA-Explorer - Easily get SRA download links and other information. [ [web](https://sra-explorer.info) ]
-
Compressing
- Genozip - A compressor of common genomic file formats (BAM, CRAM, FASTQ, VCF etc). [ [web](https://genozip.com/?utm_source=Awesome-Bioinformatics) | [paper-2021](https://www.researchgate.net/publication/349347156_Genozip_-_A_Universal_Extensible_Genomic_Data_Compressor) ]
-
-
Long-read sequencing
-
Long-read Assembly
- canu - A single molecule sequence assembler for genomes large and small.
- flye - De novo assembler for single molecule sequencing reads using repeat graphs.
- hifiasm - A haplotype-resolved assembler for accurate HiFi reads.
- wtdbg2 - A fuzzy Bruijn graph approach to long noisy reads assembly