https://github.com/sialindskrog/classifyNMIBC


Host: GitHub
URL: https://github.com/sialindskrog/classifyNMIBC
Owner: sialindskrog
Created: 2020-12-17T10:35:00.000Z (about 4 years ago)
Default Branch: master
Last Pushed: 2021-08-26T08:19:52.000Z (over 3 years ago)
Last Synced: 2024-10-12T21:31:39.171Z (4 months ago)
Language: R
Size: 2.37 MB
Stars: 7
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# classifyNMIBC

This package implements a Pearson nearest-centroid classifier that assigns class labels to single samples according to the four transcriptomic UROMOL2021 classes of non-muscle-invasive bladder cancer (NMIBC): class 1, class 2a, class 2b and class 3.

The classifier code was adapted from the consensusMIBC classifier: Kamoun A, et al. A Consensus Molecular Classification of Muscle-invasive Bladder Cancer. Eur Urol (2019). doi: https://doi.org/10.1016/j.eururo.2019.09.006

Both classifiers are available in our web application: http://nmibc-class.dk

A small example data set (`test_data`) is included so you can try the classifier.

## Citation 

Please cite Lindskrog and Prip et al. An integrated multi-omics analysis identifies prognostic molecular subtypes of non-muscle-invasive bladder cancer. Nat Commun. 2021. PMID: 33863885. DOI: 10.1038/s41467-021-22465-w

## Install

You may install this package with devtools:

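``` {r}
# Load devtools (install it first with install.packages("devtools") if needed)
library(devtools)

# Install classifyNMIBC from GitHub, building the vignettes as well
devtools::install_github("sialindskrog/classifyNMIBC", build_vignettes = TRUE)

# Load the package
library(classifyNMIBC)
```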

## Usage

``` {r}
classifyNMIBC(x, minCor = 0.2, gene_id = c("ensembl_gene_ID", "hgnc_symbol")[1])
```

'x': a data frame with unique genes in rows and the samples to be classified in columns (or a single named vector of gene expression values).

RNA-seq data must be log-transformed, and microarray data should be normalized. Gene names may be supplied as Ensembl gene IDs or HUGO gene symbols.

'minCor': a numeric value specifying the minimum Pearson correlation required between a sample's gene expression profile and its best-matching centroid profile. A sample with no correlation above this threshold remains unclassified, and its prediction results are set to NA. The default minCor value is 0.2.

'gene_id': a character value specifying the type of gene identifier used for the names/rownames of 'x': ensembl_gene_ID for Ensembl gene IDs (the default) or hgnc_symbol for HUGO gene symbols.
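To make the role of 'minCor' concrete, here is a minimal base-R sketch of a Pearson nearest-centroid assignment with a correlation threshold. It is not the package's implementation: the function `nearest_centroid()`, the toy centroid matrix, the class names `Class_A`/`Class_B` and the gene names `g1` to `g5` are all hypothetical and exist only for illustration.

``` {r}
# Hypothetical toy centroids: genes in rows, classes in columns.
centroids <- matrix(
  c(1, 2, 3, 4, 5,    # Class_A rises across the genes
    5, 4, 3, 2, 1),   # Class_B falls across the genes
  ncol = 2,
  dimnames = list(paste0("g", 1:5), c("Class_A", "Class_B"))
)

# Assign one sample (a named numeric vector) to the centroid with the highest
# Pearson correlation; return NA if that correlation is below minCor.
nearest_centroid <- function(sample, centroids, minCor = 0.2) {
  genes <- intersect(names(sample), rownames(centroids))
  cors <- cor(sample[genes], centroids[genes, , drop = FALSE])[1, ]
  best <- which.max(cors)
  if (cors[best] < minCor) NA_character_ else names(cors)[best]
}

s1 <- c(g1 = 0.5, g2 = 1.1, g3 = 2.0, g4 = 2.9, g5 = 4.2)  # follows Class_A
s2 <- c(g1 = 1, g2 = 3, g3 = 1, g4 = 3, g5 = 1.2)          # resembles neither class

nearest_centroid(s1, centroids)                # "Class_A"
nearest_centroid(s2, centroids, minCor = 0.2)  # NA: best correlation is below 0.2
```

The second sample shows the unclassified case: its best correlation (about 0.06) falls below the threshold, so no label is returned.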

## Example

``` {r}
data(test_data)

NMIBC_class <- classifyNMIBC(test_data)

head(NMIBC_class)
#       NMIBC_class cor_pval separationLevel   Class_1  Class_2a  Class_2b   Class_3
# U0026     Class_1        0       0.4596168 0.8722377 0.7244603 0.8402362 0.7649863
# U1270     Class_1        0       0.8715953 0.9076109 0.7632527 0.7878480 0.8151044
# U0062     Class_1        0       0.7756813 0.8935798 0.7522703 0.7440653 0.8040516
# U0268     Class_1        0       0.1017993 0.8979415 0.7236878 0.8106683 0.8932611
# U1031     Class_1        0       0.8399565 0.8918952 0.7297127 0.8244154 0.8430349
# U2111     Class_3        0       0.6588400 0.6799160 0.7345242 0.6853025 0.7820521
```

The classifier returns a data frame with 7 columns:

'NMIBC_class': the predicted class label of each sample.

'cor_pval': the p-value of the Pearson correlation between the sample and its nearest centroid.

'separationLevel': a measure (ranging from 0 to 1) of how representative a sample is of its predicted class. A value of 0 means the sample is too close to the other classes to be confidently assigned a single class label, and 1 means the sample is very representative of its class. It is calculated as: (correlation to nearest centroid - correlation to second nearest centroid) / median difference of sample-to-centroid correlation.

The remaining four columns ('Class_1', 'Class_2a', 'Class_2b' and 'Class_3') give the Pearson correlation between each sample and each class centroid.
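The separationLevel formula can be checked by hand against the example output. The sketch below is one reading of it, not the package's code: it assumes the "median difference" is the median, over all four classes, of (correlation to the nearest centroid minus correlation to that class), including the zero contributed by the nearest class itself. Under that assumption it reproduces the value printed for sample U0026. The helper name `separation_level()` is hypothetical.

``` {r}
# Hypothetical helper: one reading of the separationLevel formula.
separation_level <- function(cors) {
  delta <- max(cors) - cors        # gap from the nearest centroid to each class
  delta_second <- sort(delta)[2]   # gap to the second-nearest centroid
  unname(delta_second / median(delta))
}

# Correlations for sample U0026, copied from the example output.
u0026 <- c(Class_1 = 0.8722377, Class_2a = 0.7244603,
           Class_2b = 0.8402362, Class_3 = 0.7649863)

names(which.max(u0026))            # "Class_1", the nearest centroid
round(separation_level(u0026), 4)  # 0.4596, matching the separationLevel column
```

With four classes, the median gap is the mean of the second and third smallest gaps, so the ratio cannot exceed 1, which is consistent with the stated 0 to 1 range.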