Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/jw156605/SingleSplice

Algorithm for detecting alternative splicing in a population of single cells. See details in Welch et al., Nucleic Acids Research 2016: http://nar.oxfordjournals.org/content/early/2016/01/05/nar.gkv1525.full
https://github.com/jw156605/SingleSplice

Last synced: 7 days ago
JSON representation

Algorithm for detecting alternative splicing in a population of single cells. See details in Welch et al., Nucleic Acids Research 2016: http://nar.oxfordjournals.org/content/early/2016/01/05/nar.gkv1525.full

Lists

README

        

# SingleSplice
Algorithm for detecting alternative splicing in a population of single cells

## System Requirements
- R, Perl, gcc, and git
- Boost C++ library

## Installation
- Check that your system meets the requirements
- Navigate to the desired installation directory
- Clone the SingleSplice repository:
`git clone https://github.com/jw156605/SingleSplice`
- To install the Boost library, you can run the following commands:
```
cd SingleSplice/
wget https://sourceforge.net/projects/boost/files/boost/1.60.0/boost_1_60_0.tar.gz
tar -xvf boost_1_60_0.tar.gz
```
- If you install in a directory other than `SingleSplice/boost_1_60_0/`, you need to modify the `BOOST`
variable in the Makefile to point to the installation directory.
- Run the Makefile by simply typing: `make`

## Usage

Sample input and output files for SingleSplice are provided. The FASTQ and BAM files are not included due to their large size.
To reproduce these results, follow the set of steps below:

- Download the FASTQ files from the 80 E18.5 cells in GEO record GSE52583.
- Align the reads to mm10 using your favorite RNA-seq aligner (MapSplice, TopHat, STAR) and convert to indexed BAM.
- Run diffsplice on the indexed BAM files:
```
cd diffsplice
bin/diffsplice -o -m full -s ../sample_input/sample_config.txt ../sample_input/sample_datafile.txt
```
- Align the reads to the ERCC transcript sequences using your favorite unspliced aligner (e.g., bowtie2).
- Count the numbers of mapped biological and spike-in reads and use this information to compute "cell size" values.
- Compute normalized coverage in units of reads per kilobase per million reads (RPKM) for the ERCC transcripts. Normalize to median cell size
as described in the paper.
- Run SingleSplice using the ASM abundance estimates contained in result/asm/. Note that diffsplice outputs a file for each chromosome, so
to run on all chromosomes, these files must be concatenated.
```
perl SingleSplice.pl -a sample_input/asm_all_chr.txt -p 1000 -s 80 -t sample_input/ERCC_rpkms_size_norm_median.csv -r sample_input/total_reads_DistalLungEpithelium.csv -g sample_input/groups.csv
```
Note: The file "groups.csv" is a dummy file for illustrative purposes only. The actual cell cycle stages in this dataset are unknown. Accordingly, the group change p-values are not significant.