https://github.com/razielar/identification-of-de-novo-lncrna-genes
- Host: GitHub
- URL: https://github.com/razielar/identification-of-de-novo-lncrna-genes
- Owner: razielar
- License: mit
- Created: 2019-05-03T13:14:02.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2020-02-17T14:34:49.000Z (over 5 years ago)
- Last Synced: 2025-01-11T14:48:38.590Z (4 months ago)
- Language: Python
- Size: 87.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Identification of *de novo* lncRNA genes
## 1) Genome-guided transcriptome assembly
[CLASS (2013)](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-S5-S14): (Constraint-based Local Assembly and Selection of Splice variants) a transcript selection scheme that takes into account **contiguity constraints** from read pairs and spliced reads and, where available, knowledge about gene structure (cDNA sequence databases). CLASS does not estimate transcript abundance itself; its assembled transcripts can be passed to a quantification tool such as RSEM.
[StringTie (2015)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643835/): applies a **network flow algorithm**, together with optional *de novo* transcriptome assembly. The reference-based mode uses read alignments to identify clusters of reads that represent potential transcripts. When paired-end reads are available, they improve the assembler's ability to link together exons that belong to the same transcript.
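
The read-clustering idea can be illustrated with a minimal sketch. This is not StringTie's implementation: it assumes each alignment is a plain `(chromosome, start, end)` interval with 0-based, half-open coordinates, ignores strand and splice junctions, and the function name `cluster_alignments` is hypothetical.

```python
from typing import List, Tuple

# Assumed toy representation of an aligned read: (chromosome, start, end),
# 0-based and half-open. Real aligners report much richer records.
Alignment = Tuple[str, int, int]
Cluster = Tuple[str, int, int, int]  # (chromosome, start, end, number of reads)


def cluster_alignments(alignments: List[Alignment]) -> List[Cluster]:
    """Merge overlapping alignments on the same chromosome into clusters."""
    clusters: List[Cluster] = []
    for chrom, start, end in sorted(alignments):
        last = clusters[-1] if clusters else None
        if last is not None and last[0] == chrom and start < last[2]:
            # The read overlaps the current cluster: extend it and count the read.
            clusters[-1] = (chrom, last[1], max(last[2], end), last[3] + 1)
        else:
            # No overlap: this read opens a new cluster.
            clusters.append((chrom, start, end, 1))
    return clusters


reads = [("chr1", 100, 150), ("chr1", 140, 190), ("chr1", 500, 550), ("chr2", 120, 170)]
clusters = cluster_alignments(reads)
for cluster in clusters:
    print(cluster)
```

Each resulting cluster is a candidate locus in which an assembler would then build a splice graph.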
[Strawberry (2017)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005851): consists of two modules: assembly and quantification. The novelty is that the two modules use different optimization frameworks but utilize the same **data graph substructure**. The assembly module parses aligned reads into splicing graphs, and uses network flow algorithms. The quantification module corrects for sequencing bias through an EM algorithm.
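
As a rough illustration of EM-based quantification, the sketch below estimates relative transcript abundances from reads that may be compatible with more than one transcript. It is a deliberately simplified toy and not Strawberry's model: it ignores transcript length and sequencing bias, and the names `em_abundance` and `read_compat` are hypothetical.

```python
from typing import Dict, List, Set


def em_abundance(read_compat: List[Set[str]], transcripts: List[str],
                 n_iter: int = 200) -> Dict[str, float]:
    """Estimate relative abundances when reads can map to several transcripts."""
    # Start from a uniform guess.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        # E-step: split each read among its compatible transcripts,
        # proportionally to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            if total == 0.0:
                continue
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the fractional read counts.
        n_assigned = sum(counts.values())
        if n_assigned == 0.0:
            break
        theta = {t: counts[t] / n_assigned for t in transcripts}
    return theta


# Toy data: 6 reads unique to transcript A, 2 unique to B, 4 compatible with both.
read_compat = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
theta = em_abundance(read_compat, ["A", "B"])
print(theta)
```

In this toy case the ambiguous reads end up split in the same 3:1 ratio as the unique reads, so the estimate for `A` converges to 0.75.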
[Scallop (2017)](https://www.nature.com/articles/nbt.4020): is a reference-based transcript assembler that improves reconstruction of **multiexon** and **lowly expressed transcripts**. Scallop minimizes the read coverage deviation and minimizes the number of expressed transcripts by iteratively decomposing vertices of the splice graph.
## 2) *De novo* transcriptome assembly
[Trinity (2011)](https://www.nature.com/articles/nbt.1883): efficiently constructs and analyses sets of **de Bruijn graphs**. Trinity fully reconstructs a large fraction of transcripts, including transcripts from recently duplicated genes, with a sensitivity similar to methods that rely on genome alignments.
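
To make the de Bruijn graph idea concrete, here is a minimal sketch that builds a graph from a few toy reads and walks a single unbranched path to recover a contig. It is a simplification for illustration only, not Trinity's implementation: it ignores sequencing errors, k-mer abundance and reverse complements, and the function names `build_de_bruijn` and `walk_unbranched` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_de_bruijn(reads: List[str], k: int) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; every k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


def walk_unbranched(graph: Dict[str, List[str]], start: str) -> str:
    """Follow edges from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:  # stop on cycles
            break
        seen.add(node)
        contig += node[-1]  # each step extends the contig by one base
    return contig


reads = ["ATGGCG", "GGCGTG", "GCGTGC"]  # toy overlapping reads
graph = build_de_bruijn(reads, k=4)
contig = walk_unbranched(graph, "ATG")
print(contig)
```

The three overlapping reads collapse into a single path, so the walk reconstructs the sequence they were drawn from without any reference genome.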
# Run the pipeline:
First, activate a conda environment that has **snakemake** and all its dependencies installed.
```bash
snakemake --configfile config.yaml --snakefile identification.smk
```
## Debugging:
```bash
# -n: dry run (nothing is executed); -p: print the shell commands of each rule
snakemake -np --configfile config.yaml --snakefile identification.smk
```