https://github.com/princethewinner/FiRE
- Host: GitHub
- URL: https://github.com/princethewinner/FiRE
- Owner: princethewinner
- License: gpl-3.0
- Created: 2018-10-07T07:44:38.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-08-09T04:25:48.000Z (almost 5 years ago)
- Last Synced: 2024-02-24T17:31:24.823Z (4 months ago)
- Language: C++
- Homepage: https://princethewinner.github.io/FiRE/
- Size: 13.4 MB
- Stars: 23
- Watchers: 5
- Forks: 8
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: COPYING
Lists
- awesome_single_cell - FiRE - [python, R, C++] - Finder of rare entities (FiRE) helps identify rare cell types in voluminous single-cell datasets. Design of FiRE is inspired by the observation that rareness estimation of a particular data point is the flip side of measuring the density around it. In principle, FiRE uses the Sketching technique, a variant of locality sensitive hashing, to assign rareness score to every cell. (Software packages / Rare cell detection)
- awesome-single-cell - FiRE - [python, R, C++] - Finder of rare entities (FiRE) helps identify rare cell types in voluminous single-cell datasets. Design of FiRE is inspired by the observation that rareness estimation of a particular data point is the flip side of measuring the density around it. In principle, FiRE uses the Sketching technique, a variant of locality sensitive hashing, to assign rareness score to every cell. (Software packages / Rare cell detection)
README
# FiRE - Finder of Rare Entities
**Update: FiRE is now available via CRAN. Install FiRE using**
```R
install.packages('FiRE')
```

## Contents
[Introduction](#introduction)
[External dependencies](#etc)
[Installation](#install)
[Python Package](#python-demo)
-[Prerequisites](#pre-python)
-[Installation Steps](#install-steps-python)
-[Usage](#usage-python)
[R Package](#r-demo)
-[Prerequisites](#pre-R)
-[Installation Steps](#install-steps-R)
-[Usage](#usage-R)
[Publication](#publication)
[Copyright](#copyright)
[Patent](#patent)

Tested on Ubuntu 14.04 and Ubuntu 16.04.
All results in the manuscript were generated using the `python` module.
FiRE is available for `python` and `R`. The required versions and modules for each are listed below. The `cpp` modules are required for both.
## External Dependencies
The following packages are required to run/install the FiRE software.

Required cpp modules
```cpp
g++ >= 4.8.4
boost >= 1.54.0
```

FiRE only needs `` from boost, so a full installation is not necessary. It can be downloaded from [boost.org](https://www.boost.org/) and used as is.
```bash
[sudo] ./INSTALL [ --boost-path | --log-file | --inplace | --py | --R | --help ]
[sudo] ./UNINSTALL_python
[sudo] ./UNINSTALL_R

--boost-path : python : Path to boost-library, if boost is not installed at default location, this value needs to be provided.
--inplace : python : Required only for python, if set, inplace build will be run and resulting lib will be stored in python/FiRE.
--log-file : python : Required only for python, ignored with --inplace set.
--py : python : Install FiRE in python environment.
--R : R : Install FiRE in R environment.
--help : python | R : Display this help.

Info:
UNINSTALL_[python | R] files are generated upon installation.
```

Typically, the FiRE module takes a few seconds to install. The installation time (in seconds) on a machine with Intel® Core™ i5-7200U (CPU @ 2.50GHz × 4), 8GB memory, and Ubuntu 16.04 LTS is as follows
```bash
real 2.92
user 2.73
sys 0.18
```

Required python modules
```python
python 2.7
# [EDIT] python 3 can also be used with standard installation.
# however, with --inplace option uninstallation may fail.
# As a workaround, the generated .so file can be removed manually if uninstallation of FiRE is desired.
```

For FiRE module
```python
cython >= 0.23.4
distutils >= 2.7.12
```

For preprocessing
```python
numpy >= 1.13.3
pandas >= 0.20.3
statsmodels >= 0.8.0
```

For demo
```python
gzip >= 1.2.11 (zlib)
scipy >= 1.1.0
matplotlib >= 2.1.0
cmocean >= 1.2
sklearn >= 0.19.1
```

When installing into a virtual environment, avoid using `sudo`. (Thanks to [chenxofhit](https://github.com/chenxofhit))
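Before installing, it can help to confirm that the Python modules listed above meet their minimum versions. The following is a minimal sketch, not part of FiRE: it assumes Python 3.8+ (for `importlib.metadata`), and the distribution names (for example `scikit-learn` for the `sklearn` module) and the helpers `version_tuple`, `meets_minimum`, and `check_requirements` are illustrative assumptions. `gzip` and `distutils` are omitted because they ship with Python rather than as separate distributions.

```python
import re
from importlib import metadata  # assumption: Python 3.8 or newer

# Minimum versions copied from the lists above. The keys are assumed
# distribution names as published on PyPI, not names defined by FiRE.
MINIMUMS = {
    "cython": "0.23.4",
    "numpy": "1.13.3",
    "pandas": "0.20.3",
    "statsmodels": "0.8.0",
    "scipy": "1.1.0",
    "matplotlib": "2.1.0",
    "cmocean": "1.2",
    "scikit-learn": "0.19.1",
}


def version_tuple(version):
    """Turn '1.13.3' into (1, 13, 3); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after zero-padding to equal length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums, lookup=metadata.version):
    """Return {name: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements(MINIMUMS).items():
        status = "ok" if ok else "missing or too old"
        print(f"{name:<14} {installed or '-':<10} {status}")
```

The `lookup` argument exists only so the check can be exercised without touching the real environment; by default it queries the installed distributions.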
```bash
[sudo] chmod +x ./INSTALL
```

If boost is installed at the default location
```bash
[sudo] ./INSTALL --py
```

If boost is installed at a custom location
```bash
[sudo] ./INSTALL --boost-path <path-to-boost> --py
```
Example:
```bash
[sudo] ./INSTALL --boost-path $HOME/boost/boost_1_54_0 --py
```
The installation steps above generate a `fireInstall.log` file. Keep this file, since it is needed for uninstallation. The name of the log file can be changed during installation.

```bash
./INSTALL --log-file <log-file-name> --py
```

The steps above install `FiRE` at the default location.
For inplace installation
```bash
./INSTALL --inplace --py
```

Uninstallation of FiRE Software.
```bash
[sudo] ./UNINSTALL_python
```

Run the demo from the FiRE directory as follows
```python
python example/jurkat_simulation.py
```

Since the data (`data/jurkat_two_species_1580.txt.gz`) is large, loading and pre-processing it may require a large amount of RAM. We have also provided pre-processed data (`data/preprocessedData_jurkat_two_species_1580.txt.gz`). Pre-processing was done using the script `utils/preprocess.py`. Run the demo on this data as follows
```python
python example/jurkat_simulation_small.py
```

The small demo takes seconds to complete. The exact time taken by the demo on a machine with Intel® Core™ i5-7200U (CPU @ 2.50GHz × 4), 8GB memory, and Ubuntu 16.04 LTS is as follows
```bash
Loading preprocessed Data : 1.850723s
Running FiRE : 1.134673s

Total Demo time:
real 4.33
user 3.55
sys 0.76
```
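Per-stage timings like the ones above can be collected with a small context manager. This is a minimal sketch under stated assumptions: `timed` is a hypothetical helper, not the code the demo script uses, and the two timed blocks are stand-ins for loading the data and running FiRE.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print '<label> : <seconds>s' for the enclosed block and optionally store it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label} : {elapsed:.6f}s")


timings = {}

with timed("Loading preprocessed Data", timings):
    # Stand-in for reading the pre-processed matrix from disk.
    data = [[float(i + j) for j in range(100)] for i in range(100)]

with timed("Running FiRE", timings):
    # Stand-in for model.fit(...) followed by model.score(...).
    total = sum(sum(row) for row in data)
```

The `real`/`user`/`sys` totals shown above come from timing the whole process externally, which this helper does not replace.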
Step-by-step description of the full demo (`example/jurkat_simulation.py`) is as follows

1. Load libraries
```python
import sys
sys.path.append('utils')

import numpy as np
import gzip
from scipy import stats

import preprocess as pp
import misc
import FiRE
```

2. Load data into the current environment.
```python
#Data matrix should only consist of values where rows represent cells and columns represent genes.
with gzip.GzipFile('data/jurkat_two_species_1580.txt.gz', 'r') as fid:
    data = np.genfromtxt(fid)

data = data.T #Samples * Features
#Cells with label '1' represent abundant, while cells with label '2' represent rare.
#dtype=int replaces np.int, which was removed in NumPy 1.24.
labels = np.genfromtxt('data/labels_jurkat_two_species_1580.txt', dtype=int)
```

3. Call function ranger_preprocess to select a thousand variable genes.
```python
#Genes
genes = np.arange(1, data.shape[1]+1) #It can be replaced with original gene names

#Filter top 1k genes
#outputFolder must be defined beforehand; it is the path where results are saved.
preprocessedData, selGenes = pp.ranger_preprocess(data, genes, optionToSave=True, dataSave=outputFolder)
```

|Parameter | Description | Required or Optional| Datatype | Default Value |
| -----:| -----:| -----:|-----:|-----:|
|data | Data for processing | Required | `np.array [nCells, nGenes]` | - |
|genes | Names of Genes | Required | `np.array [nGenes]` | - |
|ngenes_keep | Number of genes to keep | Optional | `integer` | 1000 |
|dataSave | Path to save results | Optional | `string` | Current working Directory (Used only when optionToSave is True) |
|optionToSave | Save processed output or not | Optional | `boolean` | False(Does not save) |
|minLibSize | Minimum number of expressed features | Optional | `integer` | 0 |
|verbose | Display progress | Optional | `boolean` | True(Prints intermediate results) |

```python
'''
Returned Value :
preprocessedData : processed data matrix (log2 transformed) : np.array [nCells, nVariableGenes]
selGenes : Names of thousand variable genes selected : np.array [nVariableGenes]
'''
```

4. Create model of FiRE.
```python
model = FiRE.FiRE(L=100, M=50, H=1017881, seed=5489, verbose=0)
```

|Parameter | Description | Required or Optional| Datatype | Default Value |
| -----:| -----:| -----:|-----:|-----:|
|L | Total number of estimators | Required | `int` | - |
|M | Number of features to be randomly sampled for each estimator | Required | `int` | - |
|H | Number of bins in hash table | Optional | `int` | 1017881|
|seed | Seed for random number generator | Optional | `unsigned int` | 5489|
|verbose | Controls verbosity of program at run time (0/1) | Optional | `int` | 0 (silent) |

5. Apply model to the above dataset.
```python
model.fit(preprocessedData)
```

6. Calculate FiRE score of every cell.
```python
score = np.array(model.score(preprocessedData))
'''
Returned Value :
score : FiRE score of every cell : np.array[nCells]

Higher values of FiRE score represent rare cells.
'''
```

7. Select cells with higher values of FiRE score that satisfy the IQR-based thresholding criterion.
```python
q3 = np.percentile(score, 75)
iqr = stats.iqr(score)
th = q3 + 1.5*iqr

indIqr = np.where(score >= th)[0]
dataSel = preprocessedData[indIqr,:] #Select subset of rare cells
#Create a file with binary predictions
predictions = np.zeros(data.shape[0])
predictions[indIqr] = 1 #Replace predictions for rare cells with '1'.
```

8. Access to model parameters.
Sampled dimensions can be accessed via
```python
# type : 2d list
# shape : L x M
model.dims
```
Chosen thresholds can be accessed via
```python
# type : 2d list
# shape : L x M
model.thresholds
```

Weights can be accessed via
```python
# type : 2d list
# shape : L X M
model.weights
```

Hash tables can be accessed via
```python
# type : 3d list
# shape : L x H x
# : as per number of samples in a bin (H) for a given estimator (L).
model.bins
```

9. FiRE recovers artificially planted rare cells (Figure).
(a) t-SNE based 2D embedding of the cells with color-coded identities (b) FiRE score intensities plotted on the t-SNE based 2D map. (c) Rare cells detected by FiRE.
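To make the roles of `L`, `M`, `H`, and the model attributes from step 8 (`dims`, `thresholds`, `weights`, `bins`) more concrete, here is a toy, NumPy-only illustration of sketching-based rareness estimation followed by the IQR rule from step 7. It is a simplified sketch under stated assumptions, not the FiRE implementation: the function `sketch_rareness`, its scoring formula (mean negative log of bin occupancy), the way thresholds are drawn, and the synthetic data are all assumptions made for illustration. `M` is set to 3 here only because the toy data has 10 features.

```python
import numpy as np


def sketch_rareness(data, L=100, M=3, H=1017881, seed=5489):
    """Toy rareness score: cells that fall into sparsely populated hash bins score high.

    Illustration only; this is not the scoring used by the FiRE package.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_features = data.shape
    lo, hi = data.min(axis=0), data.max(axis=0)
    score = np.zeros(n_cells)
    for _ in range(L):                                        # one pass per estimator
        dims = rng.integers(0, n_features, size=M)            # sampled feature indices
        thresholds = rng.uniform(lo[dims], hi[dims])          # one cut per sampled feature
        weights = rng.integers(1, H, size=M)                  # random hash weights
        bits = (data[:, dims] > thresholds).astype(np.int64)  # binary sketch of each cell
        bins = (bits @ weights) % H                           # hash bin index of each cell
        _, inverse, counts = np.unique(bins, return_inverse=True, return_counts=True)
        score += -np.log(counts[inverse] / n_cells)           # sparse bin -> large contribution
    return score / L


# Synthetic data: 500 abundant cells plus 10 planted rare cells (rows 500-509).
rng = np.random.default_rng(0)
abundant = rng.normal(0.0, 1.0, size=(500, 10))
rare = rng.normal(8.0, 0.5, size=(10, 10))
data = np.vstack([abundant, rare])

score = sketch_rareness(data)

# Same IQR-based rule as in step 7.
q1, q3 = np.percentile(score, [25, 75])
th = q3 + 1.5 * (q3 - q1)
ind_iqr = np.where(score >= th)[0]

# Fraction of planted rare cells that the rule flags.
planted = np.arange(500, 510)
recall = np.isin(planted, ind_iqr).mean()
print(f"threshold={th:.3f}, flagged={ind_iqr.size}, recall on planted cells={recall:.2f}")
```

The design point this illustrates is the one stated in the project description: estimating rareness is the flip side of estimating local density, so a cell that repeatedly lands in nearly empty bins across many random sketches is likely rare.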
Required R modules
```R
R >= 3.2.0
```

For FiRE module
```R
Rcpp >= 0.12.19
BH >= 1.66
```

For preprocessing and demo
```R
Matrix >= 1.2.14
plyr >= 1.8.4
```

```bash
[sudo] chmod +x ./INSTALL
```

Installation of FiRE Software.
```bash
[sudo] ./INSTALL --R
```

Uninstallation of FiRE Software.
```bash
[sudo] ./UNINSTALL_R
```

Run the demo from the FiRE directory as follows
```R
Rscript example/jurkat_simulation.R
```

Since the data (`data/jurkat_two_species_1580.txt.gz`) is large, loading and pre-processing it may require a large amount of RAM. We have also provided pre-processed data (`data/preprocessedData_jurkat_two_species_1580.txt.gz`). Pre-processing was done using the script `utils/preprocess.R`. Run the demo on this data as follows
```R
Rscript example/jurkat_simulation_small.R
```

The small demo takes seconds to complete. The exact time taken by the demo on a machine with Intel® Core™ i5-7200U (CPU @ 2.50GHz × 4), 8GB memory, and Ubuntu 16.04 LTS is as follows
```bash
Total Demo time:
real 4.11
user 3.16
sys 1.13
```
Step-by-step description of the full demo (`example/jurkat_simulation.R`) is as follows

1. Load libraries
```R
library('FiRE')
source('utils/preprocess.R')
```

2. Load data into the current environment.
```R
#Read data
data <- read.table(gzfile('data/jurkat_two_species_1580.txt.gz'))
data <- t(data) #Samples * Features

#Read Labels
labels <- read.table('data/labels_jurkat_two_species_1580.txt') #Cells with label '1' represent abundant, while cells with label '2' represent rare.

#Genes
genes <- c(1:dim(data)[2]) #It can be replaced with original gene names

data_mat <- list(mat=data, gene_symbols=genes)
```

3. Call function ranger_preprocess to select a thousand variable genes.
```R
preprocessedList <- ranger_preprocess(data_mat)
preprocessedData <- as.matrix(preprocessedList$preprocessedData)
```

|Parameter | Description | Required or Optional| Datatype | Default Value |
| -----:| -----:| -----:|-----:|-----:|
|data_mat | List consisting of data for processing and gene symbols | Required | `list(mat=data, gene_symbols=genes)` | - |
|ngenes_keep | Number of genes to keep | Optional | `integer` | 1000 |
|dataSave | Path to save results | Optional | `string` | Current working Directory (Used only when optionToSave is True) |
|optionToSave | Save processed output or not | Optional | `boolean` | False(Does not save) |
|minLibSize | Minimum number of expressed features | Optional | `integer` | 0 |
|verbose | Display progress | Optional | `boolean` | True(Prints intermediate results) |

4. Create model of FiRE.
```R
# model <- new(FiRE::FiRE, L, M, H, seed, verbose)
model <- new(FiRE::FiRE, 100, 50, 1017881, 5489, 0)
```

|Parameter | Description | Required or Optional| Datatype | Default Value |
| -----:| -----:| -----:|-----:|-----:|
|L | Total number of estimators | Required | `int` | - |
|M | Number of features to be randomly sampled for each estimator | Required | `int` | - |
|H | Number of bins in hash table | Optional | `int` | 1017881|
|seed | Seed for random number generator | Optional | `int` | 5489|
|verbose | Controls verbosity of program at run time (0/1) | Optional | `int` | 0 (silent) |

5. Apply model to the above dataset.
```R
model$fit(preprocessedData)
```
The input must be of class `matrix` and of type `double` (a numeric matrix).

6. Calculate FiRE score of every cell.
```R
# Returns a numeric vector
score <- model$score(preprocessedData)
```

7. Select cells with higher values of FiRE score that satisfy the IQR-based thresholding criterion.
```R
#Apply IQR-based criteria to identify rare cells for further downstream analysis.
q3 <- quantile(score, 0.75)
iqr <- IQR(score)
th <- q3 + (1.5*iqr)

#Select indexes that satisfy IQR-based thresholding criteria.
indIqr <- which(score >= th)

#Create a file with binary predictions
predictions <- integer(dim(data)[1])
predictions[indIqr] <- 1 #Replace predictions for rare cells with '1'.
```

8. Access to model parameters.
Sampled dimensions can be accessed via
```R
# type : Integer matrix
# shape : L x M
model$d
```
Chosen thresholds can be accessed via
```R
# type : Numeric matrix
# shape : L x M
model$ths
```

Weights can be accessed via
```R
# type : Numeric matrix
# shape : 0 x 0
model$w
```

Hash tables can be accessed via
```R
# type : List
# shape : L x H x
# : as per number of samples in a bin (H) for a given estimator (L).
model$b
```

Jindal, A., Gupta, P., Jayadeva and Sengupta, D., 2018. Discovery of rare cells from voluminous single cell expression data. Nature communications, 9(1), p.4719.
DOI: https://doi.org/10.1038/s41467-018-07234-6

This software package is distributed under GNU GPL v3.
This work is free to use for academic and research purposes. Please contact Dr. Debarka for commercial use of this work.