https://github.com/razielar/identification-of-de-novo-lncrna-genes

Host: GitHub
URL: https://github.com/razielar/identification-of-de-novo-lncrna-genes
Owner: razielar
License: mit
Created: 2019-05-03T13:14:02.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2020-02-17T14:34:49.000Z (over 5 years ago)
Last Synced: 2025-01-11T14:48:38.590Z (4 months ago)
Language: Python
Size: 87.9 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Identification of *de novo* lncRNA genes

## 1) Genome-guided transcriptome assembly

[CLASS (2013)](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-S5-S14): (Constraint-based Local Assembly and Selection of Splice variants) a transcript selection scheme that takes into account **contiguity constraints** from read pairs and spliced reads and, where available, knowledge about gene structure (cDNA sequence databases). CLASS does not estimate transcript abundance; its assembled transcripts can be passed to a quantification tool such as RSEM.

[StringTie (2015)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643835/): applies a **network flow algorithm**, together with optional *de novo* transcriptome assembly. The reference-based mode uses alignments of reads to the genome to identify clusters of reads that represent potential transcripts. When paired-end reads are available, they improve the ability of the assembler to link together exons belonging to the same transcript.

[Strawberry (2017)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005851): consists of two modules: assembly and quantification. The novelty is that the two modules use different optimization frameworks but utilize the same **data graph substructure**. The assembly module parses aligned reads into splicing graphs, and uses network flow algorithms. The quantification module corrects for sequencing bias through an EM algorithm.
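To make the quantification idea concrete, here is a minimal sketch of an EM loop that splits ambiguous reads among the transcripts they are compatible with. It is a generic textbook-style EM, not Strawberry's implementation: it ignores transcript length and the sequencing-bias correction, and the function name `em_abundance` and the toy read-compatibility lists are hypothetical.

```python
def em_abundance(compat, n_transcripts, n_iter=50):
    """Estimate relative transcript abundances with a plain EM loop.

    compat: one list per read, holding the indices of the transcripts
            that read is compatible with (assumed non-empty).
    """
    # Start from a uniform guess.
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: share each read among its compatible transcripts,
        # proportionally to the current abundance estimates.
        for transcripts in compat:
            total = sum(theta[t] for t in transcripts)
            for t in transcripts:
                counts[t] += theta[t] / total
        # M-step: new abundances are the normalised expected counts.
        theta = [c / len(compat) for c in counts]
    return theta


# Toy data (assumption): two reads unique to transcript 0,
# one ambiguous read, one read unique to transcript 1.
reads = [[0], [0], [0, 1], [1]]
theta = em_abundance(reads, n_transcripts=2)
print(theta)  # close to [0.667, 0.333]
```

The ambiguous read ends up assigned mostly to transcript 0 because the uniquely mapped reads already favour it.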

[Scallop (2017)](https://www.nature.com/articles/nbt.4020): a reference-based transcript assembler that improves reconstruction of **multiexon** and **lowly expressed transcripts**. Scallop minimizes the read coverage deviation and the number of expressed transcripts by iteratively decomposing vertices of the splice graph.
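Several of the assemblers above work on a splice graph whose edges carry read coverage, and recover transcripts as paths through it. The sketch below illustrates that idea with a deliberately simple greedy strategy: repeatedly take the source-to-sink path with the largest bottleneck coverage and subtract it. This is a stand-in for illustration only, not the network flow formulation of StringTie or Strawberry and not Scallop's vertex decomposition; the function names and the toy graph are assumptions.

```python
def widest_path(edges, order, source, sink):
    """Find the source->sink path whose weakest edge is as strong as possible.

    edges: {node: {successor: coverage}} for a DAG.
    order: the nodes in topological order.
    """
    best = {source: (float("inf"), [source])}
    for u in order:
        if u not in best:
            continue
        width, path = best[u]
        for v, cov in edges.get(u, {}).items():
            if cov <= 0:
                continue  # edge already used up
            cand = min(width, cov)
            if v not in best or cand > best[v][0]:
                best[v] = (cand, path + [v])
    return best.get(sink, (0, []))


def decompose(edges, order, source, sink):
    """Greedily peel paths off the graph (assumes source != sink)."""
    edges = {u: dict(vs) for u, vs in edges.items()}  # work on a copy
    transcripts = []
    while True:
        width, path = widest_path(edges, order, source, sink)
        if not path:
            break
        # Remove the coverage explained by this path.
        for u, v in zip(path, path[1:]):
            edges[u][v] -= width
        transcripts.append((path, width))
    return transcripts


# Toy splice graph (assumption): exon E1 splices to E2 or E3, both join E4.
graph = {"E1": {"E2": 8, "E3": 3}, "E2": {"E4": 8}, "E3": {"E4": 3}}
paths = decompose(graph, ["E1", "E2", "E3", "E4"], "E1", "E4")
print(paths)  # [(['E1', 'E2', 'E4'], 8), (['E1', 'E3', 'E4'], 3)]
```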

## 2) *De novo* transcriptome assembly

[Trinity (2011)](https://www.nature.com/articles/nbt.1883): efficiently constructs and analyses sets of **de Bruijn graphs**. Trinity fully reconstructs a large fraction of transcripts, including transcripts from recently duplicated genes, with a sensitivity similar to methods that rely on genome alignments.
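As a rough illustration of the de Bruijn graph idea, the sketch below breaks reads into k-mers, links each (k-1)-mer prefix to its suffix, and then follows unambiguous edges to rebuild a contig. It is a toy example under simplifying assumptions (error-free reads, a single strand, a tiny `k`), not Trinity's implementation; the function names and the reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return {prefix: {suffix: count}} over the (k-1)-mers of all k-mers."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1  # edge weight = k-mer abundance
    return {node: dict(succ) for node, succ in graph.items()}


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while node in graph and len(graph[node]) == 1:
        node = next(iter(graph[node]))
        if node in seen:  # stop on cycles
            break
        seen.add(node)
        contig += node[-1]
    return contig


# Two overlapping toy reads (assumption).
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)
print(walk_unambiguous(graph, "AC"))  # ACGTCA
```

Nodes with more than one successor are where real assemblers have to decide between alternative isoforms or paralogs; this sketch simply stops there.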

# Run the pipeline:

First, activate a conda environment that has **snakemake** and all its dependencies installed.

```bash
snakemake --configfile config.yaml --snakefile identification.smk
```

## Debugging:

```bash
# Dry run (-n) that prints the shell commands (-p) without executing anything
snakemake -np --configfile config.yaml --snakefile identification.smk
```
