https://github.com/vjcitn/brcaproteo

A package that illustrates use of CPTAC data on breast cancer in a multiomic context
https://github.com/vjcitn/brcaproteo

Last synced: 3 months ago
JSON representation

A package that illustrates use of CPTAC data on breast cancer in a multiomic context

Host: GitHub
URL: https://github.com/vjcitn/brcaproteo
Owner: vjcitn
Created: 2019-04-19T20:10:28.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2019-05-01T15:35:54.000Z (about 6 years ago)
Last Synced: 2025-01-09T13:47:25.326Z (5 months ago)
Language: R
Size: 10.3 MB
Stars: 2
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

        ---

title: "brcaProteo -- proteogenomics data archive for breast cancer"

author: "Vincent J. Carey, stvjc at channing.harvard.edu"

date: "`r format(Sys.time(), '%B %d, %Y')`"

vignette: >

  %\VignetteEngine{knitr::rmarkdown}

  %\VignetteIndexEntry{}

  %\VignetteEncoding{UTF-8}

output:

  BiocStyle::html_document:

    highlight: pygments

    number_sections: yes

    theme: united

    toc: yes

bibliography: brcaProteo.bib

---

# Introduction

@Mertins2016 unites information

developed in the NCI CPTAC with TCGA data.  The brcaProteo package

reviews approaches to interrogating this data with Bioconductor.

# Multiassay experiment and a sanity check

The MultiAssayExperiment instance `brProteoMAE` has two components

that use DelayedArray assay representation.  The assay data can

be retrieved using `assay` when the environment variable `CGC_BILLING` 

is set

to a valid Google Cloud Platform project id.  

```{r getpack}

library(brcaProteo)

data(brProteoMAE)

brProteoMAE

```

The proteomic results are available for a subset of the TCGA

contributions.

```{r lku}

upsetSamples(brProteoMAE)

```

We can see that there are 76 samples contributing to three

assays.

## Intersection of samples and features across assays

We would like to line up samples and features that are

common across assay types, in the simplest possible

way.  

### Harmonizing feature annotation

We will relabel the rows of the RNA-seq data, which

are given as Entrez identifiers.  The other two

assays use gene symbols.

We'll start out by isolating the

relevant SummarizedExperiment and forcing an authentication, by

querying for assay values.

```{r lkint}

library(restfulSE)

rna = experiments(brProteoMAE)[["rnaseq"]]

rna

assay(rna[1:3,1:4])

```

Now we use the orgDb interface to obtain gene symbols.

```{r lkorg}

library(org.Hs.eg.db)

id = mapIds(org.Hs.eg.db, keys=rownames(rna), 

       column="SYMBOL", keytype="ENTREZID") 

id[is.na(id)] = rownames(rna)[is.na(id)]

rownames(rna) = id  # if no symbol, just retain entrez id

```

### Reducing multiple instances of features

The proteomic data can have multiple feature values per gene;

for this initial view we'll omit all but the first when there

are multiplicities.

```{r lkdup}

rd = rowData(experiments(brProteoMAE)[["itraq"]])

ok = which(!is.na(rd$geneName)&!duplicated(rd$geneName))

ph = experiments(brProteoMAE)[["itraq"]][ok,]

rownames(ph) = rowData(ph)$geneName

```

### Using intersect*

The MultiAssayExperiment with common samples and features

is then produced as follows:

```{r produceIntersection}

slot(brProteoMAE, "ExperimentList")[["itraq"]] = ph

slot(brProteoMAE, "ExperimentList")[["rnaseq"]] = rna

brint = intersectColumns(intersectRows(brProteoMAE))

```

### Retrieval from BigQuery

We'll materialize the assay components for RPPA and RNA-seq, taking

logs of RNA-seq values:

```{r getm, cache=TRUE}

assay(experiments(brint)[["rnaseq"]]) = 

   log(as.matrix(assay(experiments(brint)[["rnaseq"]]))+1)

assay(experiments(brint)[["rppa"]]) = 

   as.matrix(assay(experiments(brint)[["rppa"]]))

brint

```

## Feature-specific correlation between assays

We extract a matrix for comparison of assay values over

samples, focusing here on gene BCL2.

```{r lkcomp}

mt = sapply(experiments(brint["BCL2",]), assay) 

pairs(mt)

```

For all the selected, coincident features, 

interassay correlation is computed as follows:

```{r getcorrs}

fixup = function(x) data.matrix(na.omit(data.frame(x)))

corm = function(x) cor(fixup(sapply(experiments(brint[x,]), assay)))

allc = sapply(rownames(brint)[[1]], corm)

dim(allc)

summary(allc[2,]) # itraq vs rppa

summary(allc[3,]) # itraq vs rnaseq

summary(allc[8,]) # rppa vs rnaseq

```

The distributions of correlations can be searched

to find assays that are not highly concordant across samples.

The pairs plot for BRCA2 is

```{r lkc2}

mt = sapply(experiments(brint["BRCA2",]), assay) 

pairs(mt)

```

# References

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/vjcitn/brcaproteo

Awesome Lists containing this project

README