Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sialindskrog/classifyNMIBC
https://github.com/sialindskrog/classifyNMIBC
Last synced: 4 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/sialindskrog/classifyNMIBC
- Owner: sialindskrog
- Created: 2020-12-17T10:35:00.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2021-08-26T08:19:52.000Z (over 3 years ago)
- Last Synced: 2024-10-12T21:31:39.171Z (4 months ago)
- Language: R
- Size: 2.37 MB
- Stars: 7
- Watchers: 1
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- Road2R - classifyNMIBC - This package implements a Pearson nearest-centroid classifier that assigns class labels to single samples according to the four transcriptomic UROMOL2021 classes of non-muscle-invasive bladder cancer (NMIBC): class 1, class 2a, class 2b and class 3. (Table of Contents / Bioinformatics)
README
# classifyNMIBC
This package implements a Pearson nearest-centroid classifier that assigns class labels to single samples according to the four transcriptomic UROMOL2021 classes of non-muscle-invasive bladder cancer (NMIBC): class 1, class 2a, class 2b and class 3.
The classifier code was adapted from the consensusMIBC classifier: Kamoun, A et. al. A Consensus Molecular Classification of Muscle-invasive Bladder Cancer, Eur Urol (2019), doi: https://doi.org/10.1016/j.eururo.2019.09.006
Both classifiers can be found on our online web application: http://nmibc-class.dkA smaller, example data set is provided to run the classifier.
## Citation
Please cite Lindskrog and Prip et al. An integrated multi-omics analysis identifies prognostic molecular subtypes of non-muscle-invasive bladder cancer. Nat Commun. 2021. PMID: 33863885. DOI: 10.1038/s41467-021-22465-w## Install
You may install this package with devtools:
``` {r}
library(devtools)
devtools::install_github("sialindskrog/classifyNMIBC", build_vignettes = TRUE)
library(classifyNMIBC)
```## Usage
``` {r}
classifyNMIBC(x, minCor = .2, gene_id = c("ensembl_gene_ID", "hgnc_symbol")[1])
```
'x': dataframe with unique genes in rows and samples to be classified in columns (or single named vector of gene expression values).
RNA-seq data needs to be log-transformed and micro-array data should be normalized. Gene names may be supplied as Ensembl gene IDs or HUGO gene symbols.'minCor': a numeric value specifying a minimal threshold for best Pearson's correlation between a sample's gene expression profile and centroids profiles. A sample showing no correlation above this threshold will remain unclassifed and prediction results will be set to NA. Default minCor value is 0.2.
'gene_id': a character value specifying the type of gene identifiers used for the names/rownames of 'x', ensembl_gene_ID for Ensembl gene IDs or hgnc_symbol for HUGO gene symbols.
## Example
``` {r}
data(test_data)NMIBC_class <- classifyNMIBC(test_data)
head(NMIBC_class)
# NMIBC_class cor_pval separationLevel Class_1 Class_2a Class_2b Class_3
# U0026 Class_1 0 0.4596168 0.8722377 0.7244603 0.8402362 0.7649863
# U1270 Class_1 0 0.8715953 0.9076109 0.7632527 0.7878480 0.8151044
# U0062 Class_1 0 0.7756813 0.8935798 0.7522703 0.7440653 0.8040516
# U0268 Class_1 0 0.1017993 0.8979415 0.7236878 0.8106683 0.8932611
# U1031 Class_1 0 0.8399565 0.8918952 0.7297127 0.8244154 0.8430349
# U2111 Class_3 0 0.6588400 0.6799160 0.7345242 0.6853025 0.7820521```
The classifier returns a dataframw with 7 columns:'NMIBC_Class': the predicted class labels of the samples.
'cor_pval': the p-value associated with the Pearson's correlation between the sample and the nearest centroid.
'separationLevel': gives a measure (ranging from 0 to 1) of how a sample is representative of its consensus class, with 0 meaning the sample is too close to the other classes to be confidently assigned to one class label, and 1 meaning the sample is very representative of its class. This separationLevel is measured as follows: (correlation to nearest centroid - correlation to second nearest centroid) / median difference of sample-to-centroid correlation.
The remaing four columns return the Pearson's correlation values for each sample and each centroid.