Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hmatsu1226/SCOUP
SCOUP is a probabilistic model to analyze single-cell expression data during differentiation
https://github.com/hmatsu1226/SCOUP
Last synced: 23 days ago
JSON representation
SCOUP is a probabilistic model to analyze single-cell expression data during differentiation
- Host: GitHub
- URL: https://github.com/hmatsu1226/SCOUP
- Owner: hmatsu1226
- Created: 2015-12-14T07:28:29.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-04-20T11:49:01.000Z (about 7 years ago)
- Last Synced: 2024-02-24T14:39:01.330Z (4 months ago)
- Language: C++
- Homepage:
- Size: 447 KB
- Stars: 9
- Watchers: 1
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Lists
- awesome_single_cell - SCOUP - [C++] - Uses probabilistic model based on the Ornstein-Uhlenbeck process to analyze single-cell expression data during differentiation. (Software packages / RNA-seq)
- awesome-single-cell - SCOUP - [C++] - Uses probabilistic model based on the Ornstein-Uhlenbeck process to analyze single-cell expression data during differentiation. (Software packages / RNA-seq)
- awesome-single-cell - SCOUP - [C++] - Uses probabilistic model based on the Ornstein-Uhlenbeck process to analyze single-cell expression data during differentiation. (Software packages / RNA-seq)
- roryk-awesome-single-cell - SCOUP - [C++] - Uses probabilistic model based on the Ornstein-Uhlenbeck process to analyze single-cell expression data during differentiation. (Software packages)
README
# SCOUP
SCOUP : a probabilistic model based on the Ornstein-Uhlenbeck process to analyze single-cell expression data during differentiation.
## Reference
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1109-3
## Requirements
The following two libraries are necessary for pseudo-time estimation based on the shortest path on the PCA space.
** This pseudo-time is only used for initialing SCOUP, and hence, pseudo-time estimates from other methods or experimental time can be substituted for initialization.*** LAPACK
* BLAS## How to build
```
git clone https://github.com/hmatsu1226/SCOUP
cd SCOUP
make
```
Or download from "Download ZIP" button and unzip it.## Running SP
Estimate pseudo-time based on shortest path on the PCA space.##### Usage
```
./sp
```* Input_file1 : G x C matrix of expression data
* Input_file2 : Initial distribution data
* Output_file1 : Pseudo-time estimates
* Output_file2 : Coordinates of PCA
* G : The number of genes
* C : The number of cells
* D : The number of PCA dimensions##### Format of Input_file1
The Input_file1 is the G x C matrix of expression data (separated with 'TAB').
Each row corresponds to each gene, and each column corresponds to each cell.##### Example of Input_file1
```
0.33 -4.95 -1.37 -4.07 ...
5.01 4.45 3.82 3.02 ...
.
.
.
```##### Format of Input_file2
The Input_file2 contains the mean and variance of the initial normal distribution.* Col1 : Index of a gene (0-origin)
* Col2 : Mean of the initial distribution for a gene
* Col3 : Variance of the initial distribution for a gene##### Example of Input_file2
```
0 0.0 1.7
1 1.0 2.3
2 -2.0 5.9
```##### Format of Output_file1
The Output_file1 contains the pseudo-time estimates.* Col1 : Index of a cell (0-origin)
* Col2 : Pseudo-time of a cell##### Example of Output_file1
```
0 0.826988
1 0.102140
2 0.758120
```##### Format of Output_file2
The Output_file2 contains the coordinates of PCA.* Col1 : Index of a cell (0-origin)
* Col2 - Col(D+1) : Coordinates of a cellThis file contain (C+1) lines and the last line corresponds to the root cell defined by the mean of the initial distribution.
##### Example of Output_file2
```
0 3.04 0.42
1 -21.21 -1.52
2 5.76 0.48
```## Running SCOUP
Estimate the parameters of Mixute Ornstein-Uhlenbeck process.##### Usage
```
./scoup
```* Input_file1 : G x C matrix of expression data
* Input_file2 : Initial distribution data
* Input_file3 : Initial pseudo-time data
* Output_file1 : Optimized parameters related to genes and lineages
* Output_file2 : Optimized parameters related to cells
* Output_file3 : Log-likelihood
* G : The number of genes
* C : The number of cells##### Options
* -k INT : The number of lineages (default is 1)
* -m INT : Upper bound of EM iteration (without pseudo-time optimization). The detailed explanation is described in the supplementary text. (default is 1,000)
* -M INT : Upper bound of EM iteration (including pseudo-time optimization) (default is 10,000).
* -a DOUBLE : Lower bound of alpha (default is 0.1)
* -A DOUBLE : Upper bound of alpha (default is 100)
* -t DOUBLE : Lower bound of pseudo-time (default is 0.001)
* -T DOUBLE : Upper bound of pseudo-time (default is 2.0)
* -s DOUBLE : Lower bound of sigma squared (default is 0.1)##### Example of running SCOUP
```
./scoup -k 2 data/data.txt data/init.txt out/time_sp.txt out/gpara.txt out/cpara.txt out/ll.txt 500 100
```##### Format of Input_file1
This is the expression data matrix data and is the same data as the Input_file1 of SP.##### Format of Input_file2
This is initial distribution and is the same data as the Input_file2 of SP.##### Format of Input_file3
This is the pseudo-time for initialization and is the same as the **Output_file1** of SP.##### Format of Output_file1
The Output_file1 contains the optimized parameters related to genes and lineages.* First line
* Col1 and Col2 : Space
* Col3 - Col(K+2) : The probability of each lineage (pi_k)
* After first line
* Col1 : alpha_g
* Col2 : sigma_g^2
* Col3 - Col(K+1) : theta_{gk}##### Example of Output_file1
```
0.509804 0.490196
0.501610 2.528400 -6.338714 -2.273163
0.309094 13.046904 3.545862 0.337260
0.223226 4.212808 -4.443503 9.629989
2.707472 14.221109 3.959898 -2.353994
4.361342 34.646044 1.392565 0.789397
```##### Format of Output_file2
* Col1 : Pseudo-time of a cell
* Col2 - Col(K) : Responsibility for each lineage##### Example of Output_file2
```
0.941979 0.990196 0.009804
2.000000 0.990196 0.009804
2.000000 0.990196 0.009804
1.102146 0.990196 0.009804
0.839387 0.990196 0.009804
```##### Format of Output_file3
The log-likelihood##### Exapmle of Output_file3
```
```## Running SCOUP from the middle of the activity
Re-estimate parameters from the middle of the activity.##### Usage
```
./scoup_resume
```* Input_file1 : G x C matrix of expression data
* Input_file2 : Initial distribution data
* Input_file3 : ** Semi-optimized gene and lineage parameters (Output_file1 of scoup) **
* Input_file4 : ** Semi-optimized cell parameters (Output_file2 of scoup) **
* Output_file1 : Optimized parameters related to genes and lineages
* Output_file2 : Optimized parameters related to cells
* Output_file3 : Log-likelihood
* G : The number of genes
* C : The number of cells##### Options
It is the same as the Options of "scoup".##### Example of running SCOUP
```
./scoup_resume -k 2 -e 0.0001 data/data.txt data/init.txt out/gpara.txt out/cpara.txt out/gpara_2.txt out/cpara_2.txt out/ll_2.txt 500 100
```##### Format of Input_file1
This is the same as the Input_file1 of "scoup".##### Format of Input_file2
This is the same as the Input_file2 of "scoup".##### Format of Input_file3
This is the parameters related to genes and lineages and is the same as the **Output_file1** of SCOUP.##### Format of Input_file4
This is the parameters related to cells and is the same as the **Output_file2** of "scoup".##### Format of Output_file1, 2, 3
These file are the same as the output files of SCOUP.## Running Correlation analysis
Calculate the correlation between genes after standardization.##### Usage
```
./cor
```* Input_file1 : G x C matrix of expression data
* Input_file2 : Initial distribution data
* Input_file3 : Optimized gene and lineage parameters (Output_file1 of scoup)
* Input_file4 : Optimized cell parameters (Output_file2 of scoup)
* Output_file1 : Standardized expression matrix
* Output_file2 : G x G correlation matrix
* G : The number of genes
* C : The number of cells##### Options
##### Example of running Correlation analysis
```
./cor data/data.txt data/init.txt out/gpara.txt out/cpara.txt out/nexp.txt out/cor.txt 500 100
```##### Format of Output_file1
The Output_file1 contains the standardized expression data.##### Format of Output_file2
The Output_file2 contains the correlation for the standardized expression data.## License
Copyright (c) 2015 Hirotaka Matsumoto
Released under the MIT license