netGO is an R/Shiny package for network-integrated pathway enrichment analysis.

netGO provides user-interactive visualization of enrichment analysis results and related networks.

Currently, netGO supports analysis for four species: *[Human](https://github.com/unistbig/netGO-Data/tree/master/Human), [Mouse](https://github.com/unistbig/netGO-Data/tree/master/Mouse), [Arabidopsis thaliana](https://github.com/unistbig/netGO-Data/tree/master/Arabidopsis), and [Yeast](https://github.com/unistbig/netGO-Data/tree/master/Yeast)*.

These data are available from the [netGO-Data](https://github.com/unistbig/netGO-Data) repository.

## :clipboard: Prerequisites
The following R packages must be installed before running netGO (listed in alphabetical order):

*devtools, doParallel, doSNOW, DT, foreach, googleVis, htmlwidgets, shiny, shinyCyJS, shinyjs, V8*

* Most of the packages are available from [CRAN](https://cran.r-project.org/), but [shinyCyJS](https://github.com/unistbig/shinyCyJS) must be installed from GitHub.

* Linux users should install V8 after installing the other packages.

* Note that netGO is not supported on CentOS 8, because V8 is not available on CentOS 8.

On Linux, the V8 R package needs a system library:

On Debian / Ubuntu : libv8-dev or libnode-dev

On Fedora : v8-devel

[more information](https://cran.r-project.org/web/packages/V8/index.html)

The required packages can be installed with the following code.

``` R
install.packages('devtools') # 2.2.1
library(devtools) # check Rcpp package is installed.
install_github('unistbig/shinyCyJS')
install.packages('doParallel') # 1.0.15
install.packages('doSNOW') # 1.0.18
install.packages('DT') # 0.11
install.packages('foreach') # 1.4.7
install.packages('googleVis') # 0.6.4
install.packages('htmlwidgets') # 1.5.1
install.packages('shiny') # 1.4.0
install.packages('shinyjs') # 1.0
install.packages('V8') # 2.3
```
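
Before installing everything, it can help to see which of the prerequisites are already present. The following is a minimal base-R sketch; the variable names are illustrative and not part of netGO.

```r
# Packages listed in the Prerequisites section (shinyCyJS comes from GitHub).
required_pkgs <- c("devtools", "doParallel", "doSNOW", "DT", "foreach",
                   "googleVis", "htmlwidgets", "shiny", "shinyCyJS",
                   "shinyjs", "V8")

# Names of all packages installed in the current library paths.
installed_pkgs <- rownames(installed.packages())

# Packages that still need to be installed.
missing_pkgs <- setdiff(required_pkgs, installed_pkgs)
print(missing_pkgs)
```

An empty result means all prerequisites are already installed.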

## :wrench: Running with example data

The following code runs netGO on the breast tumor dataset (*GEO [GSE3744](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3744)*).

```r
library(devtools)
install_github('unistbig/netGO') # install netGO library

library(netGO) # load netGO library
DownloadExampleData() # Download and load the breast tumor data
obj = netGO(genes = brca[1:30], genesets, network, genesetV)

# The user may also load the pre-calculated result using the following command
# load("brcaresult.RData")
```

For custom data analysis:

```r
library(netGO)
userGenesetV = BuildGenesetV(genesets = userGenesets, network = userNetwork)
obj = netGO(genes = userGenes, genesets = userGenesets, network = userNetwork, genesetV = userGenesetV)
```
Running the breast tumor example takes 5 to 25 minutes, depending on the system used.


The analysis result can be visualized using the following code:

```r
netGOVis(obj, genes = brca[1:30], genesets, network, R = 50, Q = 0.25 ) # visualize netGO's result
```



To access the results without the Shiny web application, the following functions can be used to export them:

```r
# exportGraphTxt
table = exportGraphTxt(gene = brca[1:30], geneset =
genesets[['SMID_BREAST_CANCER_NORMAL_LIKE_UP']], network) # table
head(table)

# exportGraph
graph = exportGraph(brca[1:30], geneset =
genesets[['SMID_BREAST_CANCER_NORMAL_LIKE_UP']], network) # shinyCyJS graph object
shinyCyJS(graph)

# exportTable
table = exportTable(obj, R = 50, Q = 0.25) # table
head(table)

dtable = exportTable(obj, type='D', R = 50, Q = 0.25) # data.table
dtable
```

## :memo: Data

### Example Datasets [(netGO-Data repository)](https://github.com/unistbig/netGO-Data)


#### Human

|Data|genes|genesets|network|genesetV|
|:---:|:---:|:---:|:---:|:---:|
|Breast Tumor|brca.RData|c2gs.RData|networkString.RData networkHumannet.RData|genesetVString1,2.RData genesetVHumannet1,2.RData|
|P53|p53.RData|c2gs.RData|networkString.RData networkHumannet.RData|genesetVString1,2.RData genesetVHumannet1,2.RData|
|Diabetes|dg.RData|cpGenesets.RData|networkString.RData networkHumannet.RData|cpgenesetV1,2.RData|

The user can download the breast tumor data using the *DownloadExampleData* function (recommended).

#### Arabidopsis thaliana

|Data|genes|genesets|network|genesetV|
|:---:|:---:|:---:|:---:|:---:|
|ShadowResponse|Aragenes.RData|KEGGara.RData|networkAranet.RData|AragenesetV.RData|

#### Mouse & Yeast (gene-sets and networks available)

|Species|genesets|network|
|:----:|:----:|:----:|
|Mouse|KEGGmouse.Rdata|networkMousenet.Rdata|
|Yeast|KEGGyeast.Rdata|networkYeastnet.Rdata|

### Data Formats

netGO requires the following four data types.

- *genes* : a character vector of input genes (e.g., differentially expressed genes).

- *genesets* : a named list of gene-sets consisting of groups of genes to be tested.

- *network* : a numeric matrix of network data. The network scores are normalized to the unit interval [0,1] by dividing each score by the maximum score.

- *genesetV* : a numeric matrix of pre-calculated interaction data between genes and gene-sets.

The dimension of the matrix must be [{number of genes} , {number of gene-sets}].

It can be built using the *BuildGenesetV* function with the network and genesets objects as input arguments.

```r
genesetV = BuildGenesetV(network, genesets)
```
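
To make the formats concrete, here is a small sketch with made-up data. All gene names, gene-set names, and scores are hypothetical; only the shapes and the divide-by-maximum normalization follow the description above.

```r
# Toy inputs illustrating the data types (all names and values are made up).
genes <- c("A", "B")

genesets <- list(
  GeneSet1 = c("A", "C"),
  GeneSet2 = c("B", "C", "D")
)

# Raw interaction scores between four genes (this toy matrix is symmetric).
gene_names <- c("A", "B", "C", "D")
raw_scores <- matrix(
  c(  0, 100, 760,  50,
    100,   0, 324,   0,
    760, 324,   0, 900,
     50,   0, 900,   0),
  nrow = 4, byrow = TRUE,
  dimnames = list(gene_names, gene_names)
)

# Normalize to the unit interval [0, 1] by dividing by the maximum score.
network <- raw_scores / max(raw_scores)

# Basic sanity checks on the formats described above.
stopifnot(
  is.character(genes),
  is.list(genesets), !is.null(names(genesets)),
  is.numeric(network), is.matrix(network),
  min(network) >= 0, max(network) == 1
)
```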

## :white_circle: Functions


### 1. netGO
The netGO function tests the significance of the gene-sets for the input gene list and returns a data frame of gene-sets with their *p*-values and *q*-values derived from netGO+, Fisher’s exact test, and (optionally) netGO, as well as the scores for the network interaction and overlap.

Input arguments

* genes: a character vector of input genes (e.g., differentially expressed genes).

* genesets: a list of gene-sets consisting of groups of genes.

* network: a numeric matrix of network data. The network scores are normalized to the unit interval [0,1], where 1 represents a strong interaction and 0 represents no interaction.


| |A|B|C|
|:--:|:--:|:--:|:--:|
|A|0|0.1|0.76|
|B|0.1|0|0.324|
|C|0.76|0.324|0|

* genesetV: a numeric matrix of pre-calculated interaction data between genes and gene-sets.

This object can be built with the *BuildGenesetV* function.

| |Gene-set1|Gene-set2|Gene-set3|
|:--:|:--:|:--:|:--:|
|A|0.837|1.647|0.074|
|B|0|1.75|0.113|
|C|0.464|0.486|2.442|

* alpha (optional): a numeric parameter ( ≥ 1; the default is 20) that weights the contribution of network connections in enrichment analysis.


* beta (optional): a numeric parameter (∈[0,1]; the default is 0.5) that balances the weights between the relative and absolute network scores.

* nperm (optional): a numeric parameter that determines the bin size (number of genes) used during resampling. The default is NULL, which assigns approximately 2000 genes to each bin.

* pvalue (optional): a boolean parameter that determines whether to return Q-values only (FALSE) or both P-values and Q-values (TRUE).

* plus (optional): a boolean parameter that determines whether to run both netGO and netGO+ (plus = FALSE) or netGO+ only (plus = TRUE, default).

* verbose (optional): a boolean parameter that determines whether to print more detailed progress messages while netGO runs.

**Note:** the input genes should be represented as **gene symbols** when using the default networks and gene-sets (STRING and MSigDB).

Other types of gene names are also allowed if the corresponding customized data (networks and gene-set data) are used.
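
netGO reports Fisher’s exact test results alongside its network-based scores. To illustrate what that overlap-only test measures, here is a sketch using base R's `fisher.test`. The background of 1000 genes and the gene names are made up, and this is not netGO's internal code.

```r
# Hypothetical background of 1000 genes, g1 ... g1000.
universe <- paste0("g", 1:1000)
input_genes <- universe[1:30]                     # e.g., differentially expressed genes
geneset <- c(universe[21:40], universe[501:520])  # 40 genes, 10 of them input genes

in_input <- universe %in% input_genes
in_set   <- universe %in% geneset

# 2 x 2 contingency table: input gene (yes/no) vs. gene-set member (yes/no).
tab <- table(input = in_input, geneset = in_set)
print(tab)

# One-sided test for over-representation of the gene-set among the input genes.
res <- fisher.test(tab, alternative = "greater")
print(res$p.value)
```

The test only sees the size of the overlap; network connections between the input genes and the rest of the gene-set play no role in it.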


### 2. netGOVis

The netGOVis function visualizes the analysis results in a web browser (Google Chrome is recommended).

The resulting graphs (svg format) and table are downloadable from the web browser.

Input arguments

* obj: the data frame of analysis results obtained by running the **netGO** function. Its columns include the gene-set name; the p- and q-values evaluated using netGO (optional), netGO+, and Fisher’s exact test; and the scores for the overlap and network.

* genes, genesets, network: the same as those in the *netGO* function.

* R (optional): gene-set rank threshold. The default is 50 (the top 50 gene-sets in either method will be shown).

* Q (optional): gene-set Q-value threshold. The default is 0.25 (gene-sets with Q-value ≤ 0.25 will be used).
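
How the two thresholds combine is not spelled out here, so the sketch below makes an explicit assumption: a gene-set is kept if, for at least one method, it ranks within the top R by Q-value and its Q-value is at most Q. The data frame and its column names are hypothetical and do not reflect the actual output of netGO.

```r
# Toy result table; the column names are hypothetical, not netGO's actual output.
result <- data.frame(
  geneset     = paste0("GS", 1:6),
  netGOplus_q = c(0.01, 0.30, 0.04, 0.20, 0.60, 0.26),
  fisher_q    = c(0.05, 0.02, 0.50, 0.40, 0.70, 0.24),
  stringsAsFactors = FALSE
)

R <- 3     # rank threshold
Q <- 0.25  # Q-value threshold

# Assumption: keep a gene-set if, for at least one method, it is within the
# top R by Q-value and its Q-value is at most Q.
keep_by <- function(q) rank(q, ties.method = "min") <= R & q <= Q

shown <- result[keep_by(result$netGOplus_q) | keep_by(result$fisher_q), ]
print(shown)
```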

After running the function, log messages appear in the R console, and the user's default web browser (netGO was built for a Chrome environment) opens the interactive visualization.




### 3. BuildGenesetV

The BuildGenesetV function builds a genesetV object from the given *network* and *genesets*.

genesetV holds pre-calculated interaction data and is used to reduce the running time of netGO.

Input arguments

* genesets, network: the same as those in the *netGO* function.
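
To give a feel for the shape of such an object, the sketch below computes a toy gene-by-gene-set matrix. It assumes that the interaction score of a gene with a gene-set is simply the sum of its network scores to the members of that gene-set; the actual computation in *BuildGenesetV* may differ, and all data here are made up.

```r
# Toy data (hypothetical): four genes and two gene-sets.
gene_names <- c("A", "B", "C", "D")
network <- matrix(
  c(0,   0.1, 0.8, 0,
    0.1, 0,   0.3, 0,
    0.8, 0.3, 0,   1,
    0,   0,   1,   0),
  nrow = 4, byrow = TRUE, dimnames = list(gene_names, gene_names)
)
genesets <- list(GeneSet1 = c("A", "C"), GeneSet2 = c("B", "C", "D"))

# Membership matrix: 1 if the gene (row) belongs to the gene-set (column).
membership <- sapply(genesets, function(gs) as.numeric(gene_names %in% gs))
rownames(membership) <- gene_names

# Assumption: score(gene, gene-set) = sum of network scores between the gene
# and the members of the gene-set.
toy_genesetV <- network %*% membership
print(toy_genesetV)
```

The result has one row per gene and one column per gene-set, which matches the [{number of genes} , {number of gene-sets}] dimension required for genesetV.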


### 4. DownloadExampleData

This function downloads the example data to the user's working directory and loads the data (breast tumor, [GSE3744](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3744)) into the user's R environment.

Note that if the objects already exist in the working directory, this function will not download the data again, so we recommend removing them and downloading them again whenever the netGO package is updated.

Input arguments
* none

R objects named *brca, genesets, genesetV, network,* and *obj* will be loaded.


### 5. exportGraph

The exportGraph function exports network data from the netGO analysis result as a graph object that can be displayed using the shinyCyJS function.

Input arguments
* genes, network : the same as those in the *netGO* function.

* geneset : a character vector of gene symbols (e.g., a member of the genesets object in *netGO*).

For example,

``` R
geneset = genesets[['SMID_BREAST_CANCER_NORMAL_LIKE_UP']] # select one gene-set
graph = exportGraph(brca[1:30], geneset = geneset, network) # shinyCyJS graph object

shinyCyJS(graph)
```

Note that the default viewer of R (as opposed to a web browser) does not apply the layout functions.



### 6. exportGraphTxt

The exportGraphTxt function exports network data from the netGO analysis result in table format.

Input arguments
* genes, network, geneset : the same as those in the *exportGraph* function.

For example,

``` R
table = exportGraphTxt(brca[1:30], geneset, network)
head(table)
```

The exported data have the following format:

|geneA|geneB|strength|type|
|:---:|:---:|:---:|:---:|
|A|B|0.1|Inter|
|C|D|0.82|Inner|

* 'Inter' means geneB belongs to the intersection of *genes* and *geneset*.
* 'Inner' means geneB belongs to the set difference of *geneset* and *genes* (in the gene-set but not among the input genes).
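
The two edge types can be reproduced on toy data with base R set operations. This is only a sketch of the labeling rule described above, not the implementation of *exportGraphTxt*; the network, genes, and gene-set are made up, and pairing every input gene with every gene-set member is an assumption.

```r
# Toy inputs (hypothetical values).
gene_names <- c("A", "B", "C", "D")
network <- matrix(
  c(0,   0.1, 0.8, 0,
    0.1, 0,   0.3, 0,
    0.8, 0.3, 0,   1,
    0,   0,   1,   0),
  nrow = 4, byrow = TRUE, dimnames = list(gene_names, gene_names)
)
genes   <- c("A", "B")        # input genes
geneset <- c("B", "C", "D")   # selected gene-set

# Candidate edges: every input gene paired with every gene-set member.
edges <- expand.grid(geneA = genes, geneB = geneset, stringsAsFactors = FALSE)
edges$strength <- network[cbind(edges$geneA, edges$geneB)]

# Keep only pairs that are actually connected.
edges <- edges[edges$strength > 0, ]

# Label each edge by where geneB falls.
edges$type <- ifelse(edges$geneB %in% intersect(genes, geneset), "Inter", "Inner")
print(edges)
```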


### 7. exportTable

exportTable exports the result object of netGO as a table or a data.table.

Input arguments
* obj, R, Q : the same as those in the *netGOVis* function.

For example,

```R
table = exportTable(obj, R = 50, Q = 0.25) # table
head(table)

dtable = exportTable(obj, type='D', R = 50, Q = 0.25) # data.table
dtable
```

The exported data have the format as follows:

|geneset name|netGO+ q-value| Fisher q-value |
|:---:|:---:|:---:|
|genesetA|0.11|0.2|


## :blue_book: Visualization and exploration of netGO analysis results

The netGO analysis results are visualized through three panels: interaction networks, list of significant gene-sets, and the bubble chart.

### Interaction Network

* The network panel displays the input genes, selected gene-set, and the network connections between the two.
* ![#48dbfb](https://placehold.it/15/48dbfb/000000?text=+) Sky blue nodes represent input genes (e.g., differentially expressed genes)
* ![#feca57](https://placehold.it/15/feca57/000000?text=+) Yellow nodes represent genes in the selected gene-set
* ![#1dd1a1](https://placehold.it/15/1dd1a1/000000?text=+) Green nodes represent the intersection of input genes and the gene-set.
* The edge width represents the strength of interaction between two nodes.
* Genes without edges will not be displayed.
* The gene-set can be selected by clicking on the gene-set name on the upper-right panel.
* The user can download the graph image in SVG format.

### Significant gene-sets

* This panel lists the significant gene-sets together with their Q-values (or P-values) evaluated from netGO, netGO+, and Fisher’s exact test. The table can be downloaded by clicking the ‘Download Table’ button in the upper-right corner of the table.



### Bubble chart

* This module plots the bubble chart of significant gene-sets for the netGO+ results.
* The overlap (x-axis) and network (y-axis) scores of the significant gene-sets are represented.
* The size of each bubble represents the significance level of the gene-set on the -log10 scale (Q-value).
* Hovering over or clicking a bubble shows the corresponding statistical values.
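
The -log10 scaling can be illustrated with a few made-up values. The column names and the choice to draw the radius as the square root of the size are assumptions for this sketch, not a description of how netGO renders the chart.

```r
# Toy netGO+ summary (hypothetical column names and values).
bubbles <- data.frame(
  geneset = c("GS1", "GS2", "GS3"),
  overlap = c(0.10, 0.25, 0.05),   # overlap score (x-axis)
  network = c(0.40, 0.15, 0.60),   # network score (y-axis)
  q_value = c(0.001, 0.05, 0.2)
)

# Bubble size on the -log10 scale: smaller Q-values give larger bubbles.
bubbles$size <- -log10(bubbles$q_value)
print(bubbles)

# Draw only in an interactive session; the radius is the square root of the
# size so that the bubble area grows with -log10(Q).
if (interactive()) {
  symbols(bubbles$overlap, bubbles$network, circles = sqrt(bubbles$size),
          inches = 0.3, xlab = "Overlap score", ylab = "Network score")
  text(bubbles$overlap, bubbles$network, labels = bubbles$geneset)
}
```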



## :blush: Contact

* Comments, suggestions, and questions are greatly appreciated.

* :octocat: Jinhwan Kim [@jhk0530](http://github.com/jhk0530) *kjh0530@unist.ac.kr*

* Prof. Dougu Nam *dougnam@unist.ac.kr*

## :memo: License

This project is [MIT](https://opensource.org/licenses/MIT) licensed.