https://github.com/gagneurlab/fraser-analysis

Accompanying analysis code for the FRASER manuscript

outlier-detection r rare-disease rna-seq snakemake workflow

Host: GitHub
URL: https://github.com/gagneurlab/fraser-analysis
Owner: gagneurlab
License: mit
Created: 2019-12-20T07:32:24.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2020-08-27T05:28:59.000Z (almost 6 years ago)
Last Synced: 2025-03-25T14:38:46.251Z (about 1 year ago)
Topics: outlier-detection, r, rare-disease, rna-seq, snakemake, workflow
Language: R
Homepage: https://tinyurl.com/FRASER-paper
Size: 4.67 MB
Stars: 26
Watchers: 3
Forks: 7
Open Issues: 7
Metadata Files:
- Readme: readme.md
- License: LICENSE

# FRASER-analysis

This is the accompanying analysis repository of the paper:

`Detection of aberrant splicing events in RNA-seq data with FRASER`.

The paper can be found [on bioRxiv](https://www.biorxiv.org/content/10.1101/2019.12.18.866830v1).

This repository contains the full pipeline and code to reproduce the results published in the paper using [snakemake](https://snakemake.readthedocs.io/en/stable/) and [wBuild](https://github.com/gagneurlab/wBuild).

## Project structure

This project is set up as a [wBuild workflow](https://github.com/gagneurlab/wBuild). wBuild is an automatic build tool for R reports based on [snakemake](https://snakemake.readthedocs.io/en/stable/).

* The `wbuild.yaml` file is the main configuration file used to set up the workflow
* The `Scripts` folder contains scripts which will be rendered as HTML reports
* The `src` folder contains additional helper functions and scripts
* The `Output` folder will contain all files produced in the analysis pipeline
  * `Output/data` has all raw RDS output files
  * `Output/html` contains the final HTML report
  * `Output/paper_figures` has all paper figures
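
After a run, the output layout described above can be checked with a small shell sketch. The helper name `check_outputs` is hypothetical and not part of this repository; it only tests whether the three `Output` subfolders listed above exist under a given project root.

```shell
# Hypothetical helper (not part of the repository): report which of the
# expected output folders exist under a project root (default: current dir).
check_outputs() {
  root="${1:-.}"
  for d in Output/data Output/html Output/paper_figures; do
    if [ -d "$root/$d" ]; then
      echo "found: $d"
    else
      echo "missing: $d"
    fi
  done
}

# Check the current directory, e.g. from inside the cloned analysis repository.
check_outputs .
```

Folders reported as missing are expected before the pipeline has been run, since `Output` is filled by the workflow itself.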

## Data and prerequisites

This project depends on the Python package `wBuild` and the R package `FRASER`. In addition, we use the [Leafcutter](https://github.com/davidaknowles/leafcutter) adaptation from the [Kremer et al. paper](https://www-nature-com.eaccess.ub.tum.de/articles/ncomms15824), which can be found [here](https://i12g-gagneurweb.in.tum.de/gitlab/mertes/rare-disease-leafcutter).

The pipeline starts with the raw aligned GTEx V7P samples and their genotype calls, which can be downloaded from [dbGaP](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000424.v7.p2). Since the data are not publicly shareable, access has to be requested through [dbGaP](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000424.v6.p1).

## Repository setup

First download the repo and its dependencies:

```
# R package used throughout the workflow
git clone https://github.com/gagneurlab/FRASER
git clone https://i12g-gagneurweb.in.tum.de/gitlab/mertes/rare-disease-leafcutter

# download needed SRA annotation db
wget -O - 'https://s3.amazonaws.com/starbuck1/sradb/SRAmetadb.sqlite.gz' | gunzip -c > 'Data/filemapping/SRAmetadb.sqlite'

# analysis code
git clone https://github.com/gagneurlab/FRASER-analysis
cd FRASER-analysis
```

Then install wBuild using pip and initialize it:

```
pip install wBuild
wBuild init
```

Since `wBuild init` will reset the current `Snakefile`, `readme.md`, and `wbuild.yaml`, we have to restore them with git:

```
git checkout Snakefile
git checkout wbuild.yaml
git checkout readme.md
```

To make sure all packages needed for the analysis are installed, run the following script with R:

```
Rscript ./src/r/install_dependencies.R
```
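
Before starting the pipeline, it can help to confirm that the command-line tools used in this README are available on the `PATH`. The following is a minimal sketch; the helper name `missing_tools` is hypothetical and not part of this repository, and the tool list is simply the set of commands that appear in the steps above.

```shell
# Hypothetical helper (not part of the repository): print every tool that is
# not found on PATH and return the number of missing tools as exit status.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Commands used in the setup and run steps of this README.
missing_tools git wget pip wBuild Rscript snakemake \
  || echo "install the tools listed above before running the pipeline"
```

This only checks that the executables exist; it does not verify versions or that the R packages installed by `install_dependencies.R` load correctly.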

## Run the full pipeline

To run the full pipeline, execute the following commands. The first initializes the datasets; the second runs the analysis with 10 jobs and a maximum of 40 cores in parallel:

```
# init datasets to be used
snakemake -j 25 --cores 25 defineDatasets

# run full analysis on datasets
snakemake -j 10 --cores 40 Output/paper_figures/supplement_final.pdf
```

or to run it on a cluster with SLURM installed:

```
snakemake -k --restart-times 2 --cluster "sbatch -N 1 -n 10 --mem 80G" --jobs 20
```
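
The resource values in the cluster command usually need adapting to the local cluster. As a rough sketch, the command can be assembled from overridable values and printed for inspection before it is run. The helper name `print_cluster_cmd` and its argument order are hypothetical; the flags themselves are exactly those used in the command above.

```shell
# Hypothetical helper (not part of the repository): print, but do not run,
# the cluster invocation with adjustable resource values.
#   $1: tasks per job  (sbatch -n,     default 10)
#   $2: memory per job (sbatch --mem,  default 80G)
#   $3: parallel jobs  (snakemake --jobs, default 20)
print_cluster_cmd() {
  tasks="${1:-10}"
  mem="${2:-80G}"
  jobs="${3:-20}"
  printf 'snakemake -k --restart-times 2 --cluster "sbatch -N 1 -n %s --mem %s" --jobs %s\n' \
    "$tasks" "$mem" "$jobs"
}

# With no arguments the defaults reproduce the command shown above.
print_cluster_cmd

# A smaller request; these values are chosen only for illustration.
print_cluster_cmd 4 32G 8
```

Printing the command first makes it easy to review the `sbatch` request before submitting any jobs.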
