https://github.com/mikelove/bioc-refcard

Bioconductor cheat sheet
Homepage: http://mikelove.github.io/bioc-refcard/
Author: Michael Love

Install

For details go to http://bioconductor.org/install/

if (!requireNamespace("BiocManager"))

    install.packages("BiocManager")

BiocManager::install()

BiocManager::install(c("package1","package2"))

BiocManager::valid() # are packages up to date?

# which Bioc version is the current release?

# http://bioconductor.org/bioc-version

# which Bioc versions are release/devel?

# http://bioconductor.org/js/versions.js


help within R

Simple help:

?functionName

?"eSet-class" # classes need the '-class' on the end

help(package="foo",help_type="html") # launch web browser help

vignette("topic")

browseVignettes(package="package") # show vignettes for the package

Help for advanced users:

functionName # prints source code

getMethod(method,"class")  # prints source code for method

selectMethod(method, "class") # will climb the inheritance to find method

showMethods(classes="class") # show all methods for class

methods(class="GRanges") # this will work in R >= 3.2

?"functionName,class-method" # method help for S4 objects, e.g.:

?"plotMA,data.frame-method" # from library(geneplotter)

?"method.class" # method help for S3 objects e.g.:

?"plot.lm"

sessionInfo() # necessary info for getting help

packageVersion("foo") # what version of package 
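
The difference between getMethod and selectMethod is easiest to see on a toy class hierarchy. The following is a minimal sketch using only the base methods package; the classes Base and Derived and the generic describe are hypothetical names made up for this illustration.

```r
library(methods)

# Hypothetical toy classes and generic, defined only for this illustration
setClass("Base", representation(id = "character"))
setClass("Derived", contains = "Base")
setGeneric("describe", function(obj) standardGeneric("describe"))
setMethod("describe", "Base", function(obj) paste("object", obj@id))

# A method is defined directly for "Base" but not for "Derived"
direct_base <- existsMethod("describe", "Base")
direct_derived <- existsMethod("describe", "Derived")

# getMethod() only looks for a directly defined method, so it fails here
get_failed <- inherits(try(getMethod("describe", "Derived"), silent = TRUE),
                       "try-error")

# selectMethod() climbs the inheritance tree and finds the "Base" method
inherited <- selectMethod("describe", "Derived")
result <- inherited(new("Derived", id = "d1"))
```

This is why selectMethod is usually the more convenient choice when you only know the class of the object in hand and not where in the hierarchy the method was defined.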

Bioconductor support website: https://support.bioconductor.org

If you use RStudio, then you already get nicely rendered documentation using ? or help. If you are a command line person, then you can use this alias to pop up a help page in your web browser with rhelp functionName packageName.

alias rhelp="Rscript -e 'args <- commandArgs(TRUE); help(args[2], package=args[3], help_type=\"html\"); Sys.sleep(5)' --args"


debugging R

traceback() # what steps lead to an error

# debug a function

debug(myFunction) # step line-by-line through the code in a function

undebug(myFunction) # stop debugging

debugonce(myFunction) # same as above, but doesn't need undebug()

# when writing code, it also helps to put a call to browser()

# inside a function at a critical point;

# this plus devtools::load_all() can be useful for programming

# to jump into the function on error:

options(error=recover)

# turn that behavior off:

options(error=NULL)

# debug, e.g. estimateSizeFactors from DESeq2...

# debugging an S4 method is more difficult; this gives you a peek inside:

trace(estimateSizeFactors, browser, exit=browser, signature="DESeqDataSet")


Show package-specific methods for a class

These two long strings of R code do approximately the same thing: obtain the methods that operate on an object of a given class, which are defined in a specific package.

intersect(sapply(strsplit(as.character(methods(class="DESeqDataSet")), ","), `[`, 1), ls("package:DESeq2"))

sub("Function: (.*) \\(package .*\\)","\\1",grep("Function",showMethods(classes="DESeqDataSet", where=getNamespace("DESeq2"), printTo=FALSE), value=TRUE))
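
To see what the second one-liner is doing, here is a small sketch that runs the same showMethods/grep/sub steps on a toy S4 class. It assumes only the base methods package; the class Toy and the generics area and perimeter are hypothetical and stand in for a real package's class and methods.

```r
library(methods)

# Hypothetical toy class with two generics, defined only for this illustration
setClass("Toy", representation(side = "numeric"))
setGeneric("area", function(obj) standardGeneric("area"))
setGeneric("perimeter", function(obj) standardGeneric("perimeter"))
setMethod("area", "Toy", function(obj) obj@side^2)
setMethod("perimeter", "Toy", function(obj) 4 * obj@side)

# Capture what showMethods() would print, as a character vector of lines
out <- showMethods(classes = "Toy", printTo = FALSE)

# Keep the "Function: name (package pkg)" header lines and extract the names
headers <- grep("^Function", out, value = TRUE)
toy_methods <- sub("Function: (.*) \\(package .*\\)", "\\1", headers)
```

In the DESeq2 version above, the extra where=getNamespace("DESeq2") argument is what restricts the result to methods defined in that package.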


Annotations

For AnnotationHub examples, see:

https://www.bioconductor.org/help/workflows/annotation/Annotation_Resources

The following is how to work with the organism database packages, and biomart.

AnnotationDbi

# using one of the annotation packages

library(AnnotationDbi)

library(org.Hs.eg.db) # or, e.g. Homo.sapiens

columns(org.Hs.eg.db)

keytypes(org.Hs.eg.db)

head(keys(org.Hs.eg.db, keytype="ENTREZID"))

# returns a named character vector, see ?mapIds for multiVals options

res <- mapIds(org.Hs.eg.db, keys=k, column="ENSEMBL", keytype="ENTREZID")

# generates warning for 1:many mappings

res <- select(org.Hs.eg.db, keys=k,

  columns=c("ENTREZID","ENSEMBL","SYMBOL"),

  keytype="ENTREZID")

biomaRt

# map from one annotation to another using biomart

library(biomaRt)

m <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

map <- getBM(mart = m,

  attributes = c("ensembl_gene_id", "entrezgene_id"), # was "entrezgene" in older Ensembl releases

  filters = "ensembl_gene_id", 

  values = some.ensembl.genes)


Genomic ranges

GenomicRanges

library(GenomicRanges)

z <- GRanges("chr1",IRanges(1000001,1001000),strand="+")

start(z)

end(z)

width(z)

strand(z)

mcols(z) # the 'metadata columns', any information stored alongside each range

ranges(z) # gives the IRanges

seqnames(z) # the chromosome for each range

seqlevels(z) # the possible chromosomes

seqlengths(z) # the lengths for each chromosome


Intra-range methods

Affects ranges independently


function    description
--------    -----------
shift       moves left/right
narrow      narrows by relative position within range
resize      resizes to width, fixing start for +, end for -
flank       returns flanking ranges, to the left for +, to the right for -
promoters   similar to flank
restrict    restricts ranges to a start and end position
trim        trims out-of-bound ranges
+/-         expands/contracts by adding/subtracting fixed amount
*           zooms in (positive) or out (negative) by multiples
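
The strand-dependent behaviour of resize and flank is easier to remember with concrete numbers. The following is a minimal sketch on two made-up ranges with identical coordinates on opposite strands; the coordinates are arbitrary and chosen only for illustration.

```r
library(GenomicRanges)

# Two toy ranges with the same coordinates (101-200) on opposite strands
gr <- GRanges("chr1", IRanges(start = c(101, 101), end = c(200, 200)),
              strand = c("+", "-"))

# shift ignores strand: both ranges move right by 5 (106-205)
shifted <- shift(gr, 5)

# narrow uses positions relative to the start, ignoring strand (111-120)
narrowed <- narrow(gr, start = 11, end = 20)

# resize fixes the start for "+" (101-110) and the end for "-" (191-200)
resized <- resize(gr, width = 10)

# flank returns the upstream side: 91-100 for "+", 201-210 for "-"
flanked <- flank(gr, width = 10)
```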

Inter-range methods

Affects ranges as a group


function    description
--------    -----------
range       one range, leftmost start to rightmost end
reduce      merges overlapping or adjacent ranges, so each covered position falls in exactly one range
gaps        uncovered positions within range
disjoin     breaks into discrete ranges based on original starts/ends
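
A small worked example may help tell these four apart. This sketch uses plain IRanges (no chromosome or strand) to keep the output simple; the three input ranges are made up for illustration.

```r
library(IRanges)

# Three toy ranges: 1-10 and 5-15 overlap, 20-25 stands alone
x <- IRanges(start = c(1, 5, 20), end = c(10, 15, 25))

rng <- range(x)    # one range spanning everything: 1-25
red <- reduce(x)   # overlapping ranges merged: 1-15, 20-25
gp  <- gaps(x)     # uncovered stretch between them: 16-19
dj  <- disjoin(x)  # split at every original start/end: 1-4, 5-10, 11-15, 20-25
```

On GRanges the same functions work per chromosome and, by default, per strand, so gaps in particular returns more ranges than in this strand-free sketch.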

Nearest methods

Given two sets of ranges, x and subject, for each range in x, returns…


function            description
--------            -----------
nearest             index of the nearest neighbor range in subject
precede             index of the range in subject that is directly preceded by the range in x
follow              index of the range in subject that is directly followed by the range in x
distanceToNearest   distances to its nearest neighbor in subject (Hits object)
distance            pairwise distance to the corresponding (parallel) range in subject (integer vector)

A Hits object can be accessed with queryHits, subjectHits and mcols if a distance is associated.
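
The following sketch runs these functions on one query range and two subject ranges. It uses strand-free IRanges for simplicity (the GRanges methods additionally take strand into account), and the coordinates are made up for illustration.

```r
library(IRanges)

# One query range (10-12) sitting between two subject ranges (1-5 and 20-30)
x <- IRanges(start = 10, end = 12)
subject <- IRanges(start = c(1, 20), end = c(5, 30))

idx_nearest <- nearest(x, subject)  # 1: the gap to 1-5 is 4, the gap to 20-30 is 7
idx_precede <- precede(x, subject)  # 2: x comes directly before 20-30
idx_follow  <- follow(x, subject)   # 1: x comes directly after 1-5

# distanceToNearest returns a Hits object with a 'distance' metadata column
d2n <- distanceToNearest(x, subject)
nearest_subject <- subjectHits(d2n)
nearest_dist <- mcols(d2n)$distance

# distance is pairwise: x[i] against the range at the same position in the second argument
pair_dist <- distance(x, subject[1])
```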


set methods

If y is a GRangesList, then use punion, etc. All functions have default ignore.strand=FALSE, so are strand specific.

union(x,y) 

intersect(x,y)

setdiff(x,y)


Overlaps

x %over% y  # logical vector of which x overlaps any in y

fo <- findOverlaps(x,y) # returns a Hits object

queryHits(fo)   # which in x

subjectHits(fo) # which in y 
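
As a concrete illustration of the overlap functions, here is a minimal sketch on two small sets of made-up ranges on the same chromosome. countOverlaps is not listed above; it is included as a related function from the same family.

```r
library(GenomicRanges)

# Toy ranges: x has 1-10, 20-29, 50-59; y has 5-14, 55-64
x <- GRanges("chr1", IRanges(start = c(1, 20, 50), width = 10))
y <- GRanges("chr1", IRanges(start = c(5, 55), width = 10))

# Logical vector: does each range in x overlap anything in y?
hits_any <- x %over% y

# Hits object pairing up overlapping ranges
fo <- findOverlaps(x, y)
q_idx <- queryHits(fo)    # indices into x
s_idx <- subjectHits(fo)  # indices into y

# Number of ranges in y overlapped by each range in x
n_hits <- countOverlaps(x, y)
```

Here the first range of x overlaps the first range of y, the third overlaps the second, and the middle range of x overlaps nothing.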


Seqnames and seqlevels

GenomicRanges and GenomeInfoDb

gr.sub <- gr[seqnames(gr) == "chr1"] # seqnames() is per range; seqlevels() only lists the possible values

seqlevelsStyle(gr) <- "UCSC" # convert to 'chr1' style from "NCBI" style '1'


Sequences

Biostrings

see the Biostrings Quick Overview PDF

For naming, see cheat sheet for annotation

library(BSgenome.Hsapiens.UCSC.hg19)

dnastringset <- getSeq(Hsapiens, granges) # returns a DNAStringSet

# also Views() for Bioconductor >= 3.1

library(Biostrings)

dnastringset <- readDNAStringSet("transcripts.fa")

substr(dnastringset, 1, 10) # to character string

subseq(dnastringset, 1, 10) # returns DNAStringSet

Views(dnastringset, 1, 10) # lightweight views into object

complement(dnastringset)

reverseComplement(dnastringset)

matchPattern("ACGTT", dnastring) # also countPattern, also works on Hsapiens/genome

vmatchPattern("ACGTT", dnastringset) # also vcountPattern

letterFrequency(dnastringset, "CG") # how many C's or G's

# also letterFrequencyInSlidingView

alphabetFrequency(dnastringset, as.prob=TRUE)

# also oligonucleotideFrequency, dinucleotideFrequency, trinucleotideFrequency

# transcribe/translate for imitating biological processes


Sequencing data

Rsamtools scanBam returns lists of raw values from BAM files

library(Rsamtools)

which <- GRanges("chr1",IRanges(1000001,1001000))

what <- c("rname","strand","pos","qwidth","seq")

param <- ScanBamParam(which=which, what=what)

# for more BamFile functions/details see ?BamFile

# yieldSize for chunk-wise access

bamfile <- BamFile("/path/to/file.bam")

reads <- scanBam(bamfile, param=param)

res <- countBam(bamfile, param=param) 

# for more sophisticated counting modes

# see summarizeOverlaps() below

# quickly check chromosome names

seqinfo(BamFile("/path/to/file.bam"))

# DNAStringSet is defined in the Biostrings package

# see the Biostrings Quick Overview PDF

dnastringset <- scanFa(fastaFile, param=granges)

GenomicAlignments returns Bioconductor objects (GRanges-based)

library(GenomicAlignments)

ga <- readGAlignments(bamfile) # single-end

ga <- readGAlignmentPairs(bamfile) # paired-end


Transcript databases

GenomicFeatures

# get a transcript database, which stores exon, transcript, and gene information

library(GenomicFeatures)

library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# or build a txdb from GTF file (e.g. downloadable from Ensembl FTP site)

# (these were named makeTranscriptDbFrom* in old Bioconductor releases;

# in recent releases the makeTxDbFrom* functions are provided by the txdbmaker package)

txdb <- makeTxDbFromGFF("file.GTF", format="gtf")

# or build a txdb from Biomart (however, not as easy to reproduce later)

txdb <- makeTxDbFromBiomart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

# also makeTxDbFromGRanges

# saving and loading

saveDb(txdb, file="txdb.sqlite")

txdb <- loadDb("txdb.sqlite")

# extracting information from txdb

g <- genes(txdb) # GRanges, just start to end, no exon/intron information

tx <- transcripts(txdb) # GRanges, similar to genes()

e <- exons(txdb) # GRanges for each exon

ebg <- exonsBy(txdb, by="gene") # exons grouped in a GRangesList by gene

ebt <- exonsBy(txdb, by="tx") # similar but by transcript

# then get the transcript sequence

txSeq <- extractTranscriptSeqs(Hsapiens, ebt)


Summarizing information across ranges and experiments

The SummarizedExperiment is a storage class for high-dimensional information tied to the same GRanges or GRangesList across experiments (e.g., read counts in exons for each gene).

library(GenomicAlignments)

fls <- list.files(pattern="\\.bam$") # pattern is a regular expression, not a glob

library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

ebg <- exonsBy(txdb, by="gene")

# see yieldSize argument for restricting memory

bf <- BamFileList(fls)

library(BiocParallel)

register(MulticoreParam(4))

# lots of options in the man page

# singleEnd, ignore.strand, inter.features, fragments, etc.

se <- summarizeOverlaps(ebg, bf)

# operations on SummarizedExperiment

assay(se) # the counts from summarizeOverlaps

colData(se)

rowRanges(se)

My preferred quantification method is Salmon, with --gcBias option enabled unless you know there is no GC dependence in the data, followed by tximport. Here is an example of usage:

library(tximport)

library(DESeq2)

coldata <- read.table("samples.txt")

rownames(coldata) <- coldata$id

files <- coldata$files; names(files) <- coldata$id

txi <- tximport(files, type="salmon", tx2gene=tx2gene)

dds <- DESeqDataSetFromTximport(txi, coldata, ~condition)

Another fast Bioconductor read counting method is featureCounts in Rsubread.

library(Rsubread)

res <- featureCounts(files, annot.ext="annotation.gtf",

  isGTFAnnotationFile=TRUE,

  GTF.featureType="exon",

  GTF.attrType="gene_id")

res$counts


RNA-seq gene-wise analysis

DESeq2

My preferred pipeline for DESeq2 users is to start with a lightweight transcript abundance quantifier such as Salmon and to use tximport, followed by DESeqDataSetFromTximport.

Here, coldata is a data.frame with group as a column.

library(DESeq2)

# from tximport

dds <- DESeqDataSetFromTximport(txi, coldata, ~ group)

# from SummarizedExperiment

dds <- DESeqDataSet(se, ~ group)

# from count matrix

dds <- DESeqDataSetFromMatrix(counts, coldata, ~ group)

# minimal filtering helps keep things fast 

# one can set 'n' to e.g. min(5, smallest group sample size)

keep <- rowSums(counts(dds) >= 10) >= n 

dds <- dds[keep,]

dds <- DESeq(dds)

res <- results(dds) # no shrinkage of LFC, or:

res <- lfcShrink(dds, coef = 2, type="apeglm") # shrink LFCs
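
The filtering rule above keeps a gene if at least n samples have a count of 10 or more. The following base R sketch applies the same rule to a plain matrix so the logic can be checked by hand; the count values and the choice n = 2 are made up for illustration, and with a real DESeqDataSet one would use counts(dds) as shown above.

```r
# Toy count matrix: 4 genes (rows) x 4 samples (columns); values are made up
counts <- matrix(c(  0,   2,   1,   0,
                    15,  30,  12,   3,
                    10,   9,  50,   8,
                   100, 200, 150, 120),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:4)))

# Require a count of at least 10 in at least n samples
n <- 2
n_samples_passing <- rowSums(counts >= 10)  # gene1: 0, gene2: 3, gene3: 2, gene4: 4
keep <- n_samples_passing >= n

# Only gene1 is dropped
filtered <- counts[keep, , drop = FALSE]
```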

edgeR

# this chunk from the Quick start in the edgeR User Guide

library(edgeR) 

y <- DGEList(counts=counts,group=group)

keep <- filterByExpr(y)

y <- y[keep,]

y <- calcNormFactors(y)

design <- model.matrix(~group)

y <- estimateDisp(y,design)

fit <- glmFit(y,design)

lrt <- glmLRT(fit)

topTags(lrt)

# or use the QL methods:

qlfit <- glmQLFit(y,design)

qlft <- glmQLFTest(qlfit)

topTags(qlft)

limma-voom

library(limma)

library(edgeR) # provides DGEList, filterByExpr and calcNormFactors

design <- model.matrix(~ group)

y <- DGEList(counts)

keep <- filterByExpr(y)

y <- y[keep,]

y <- calcNormFactors(y)

v <- voom(y,design)

fit <- lmFit(v,design)

fit <- eBayes(fit)

topTable(fit)

Many more RNA-seq packages


Expression set

library(Biobase)

data(sample.ExpressionSet)

e <- sample.ExpressionSet

exprs(e)

pData(e)

fData(e)


Get GEO dataset

library(GEOquery)

e <- getGEO("GSE9514") # for a GSE accession this returns a list of ExpressionSets, e.g. e[[1]]


Microarray analysis

library(affy)

library(limma)

phenoData <- read.AnnotatedDataFrame("sample-description.csv")

eset <- justRMA(celfile.path="/celfile-directory", phenoData=phenoData)

design <- model.matrix(~ Disease, pData(eset))

fit <- lmFit(eset, design)

efit <- eBayes(fit)

topTable(efit, coef=2)


iCOBRA performance metrics

library(iCOBRA)

cd <- COBRAData(pval=pval.df, padj=padj.df, score=score.df, truth=truth.df)

cp <- calculate_performance(cd, binary_truth = "status", cont_truth = "logFC")

cobraplot <- prepare_data_for_plot(cp)

plot_fdrtprcurve(cobraplot)

# interactive shiny app:

COBRAapp(cd)

