https://github.com/nanoporetech/pipeline-polya-ng
Pipeline for calling poly(A) tail lengths from nanopore direct RNA data using nanopolish
direct-rna nanopolish poly-a rna
- Host: GitHub
- URL: https://github.com/nanoporetech/pipeline-polya-ng
- Owner: nanoporetech
- License: other
- Created: 2019-03-25T16:29:05.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-07-06T14:43:24.000Z (over 5 years ago)
- Last Synced: 2025-04-06T08:02:15.455Z (6 months ago)
- Topics: direct-rna, nanopolish, poly-a, rna
- Language: Python
- Homepage:
- Size: 49.8 KB
- Stars: 10
- Watchers: 17
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README

Pipeline for calling poly(A) tail lengths from nanopore direct RNA data using nanopolish
========================================================================================

This pipeline uses [snakemake](https://snakemake.readthedocs.io/en/stable/), [minimap2](https://github.com/lh3/minimap2) and [nanopolish](https://github.com/jts/nanopolish) to call poly(A) tails from Oxford Nanopore direct RNA data.
The [pipeline-polya-diff](https://github.com/nanoporetech/pipeline-polya-diff) pipeline takes the output file `tails/filtered_tails.tsv` from multiple controls and treated samples and performs analysis of shifts in poly(A) tail lengths.
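
As a rough illustration of the kind of summary a downstream step might compute from `tails/filtered_tails.tsv`, the sketch below reports the median tail length per transcript. It assumes the file is tab-separated with a header row containing `contig` and `polya_length` columns, as in `nanopolish polya` output; the function name `median_tail_lengths` is hypothetical and is not part of this pipeline or of pipeline-polya-diff.

```python
import csv
import statistics
from collections import defaultdict


def median_tail_lengths(tsv_path):
    """Return {transcript name: median poly(A) length} for a tails TSV.

    Assumes a tab-separated file with a header row that includes
    'contig' (transcript name) and 'polya_length' (estimated length).
    """
    lengths = defaultdict(list)
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            lengths[row["contig"]].append(float(row["polya_length"]))
    # Collapse the per-read estimates into one median per transcript.
    return {name: statistics.median(vals) for name, vals in lengths.items()}
```

Calling `median_tail_lengths("tails/filtered_tails.tsv")` after a run would return a dictionary keyed by transcript name; how pipeline-polya-diff itself tests for shifts is not shown here.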

Getting Started
===============

## Input

The input files and parameters are specified in `config.yml`:
- `transcriptome` - the input transcriptome.
- `fast5_dir` - directory with pass FAST5 files.
- `fastq_dir` - directory with the fastq files.
- `summary_dir` - directory with the sequencing summary files.
- `spikein_fasta` - (optional) fasta file with spike-ins of known poly(A) tail length. The sequence names must end in an underscore followed by the tail length (for example "_50").
- `min_mapping_qual` - filter out reads with mapping quality less than this parameter.
- `per_transcript_plots` - plot the distribution of estimated tail lengths for all transcripts (true or false).
- `threads` - number of threads to use for the analyses.

## Output

- `alignment/`:
    - `aligned_reads_sorted.bam` - sorted indexed alignment of reads to the transcriptome.
- `input/`:
    - `reads.fastq*` - concatenated input reads and nanopolish index files.
    - `reference.fas` - reference fasta (including spike-ins).
    - `summaries.fofn` - list of sequencing summary files.
- `reports/`:
    - `filtering_report.pdf` and `filtering_report.tsv` - nanopolish QC statistics.
    - `spikein_medians.tsv` - expected and estimated medians of spike-ins.
    - `spikein_report.pdf` - plots of distribution of tail lengths in spike-ins.
    - `tails_report.pdf` - global and per-transcript poly(A) tail length distributions.
- `tails/`:
    - `all_tails.tsv` - raw nanopolish output.
    - `filtered_tails.tsv` - nanopolish output - PASS reads only.
    - `spikein_tails.tsv` - results for reads mapping to spike-ins.

## Dependencies

- [miniconda](https://conda.io/miniconda.html) - install it according to the [instructions](https://conda.io/docs/user-guide/install/index.html).
- [snakemake](https://anaconda.org/bioconda/snakemake) - install it using `conda`.
- The rest of the dependencies are automatically installed using the `conda` feature of `snakemake`.

## Layout

* `README.md`
* `Snakefile` - master snakefile
* `config.yml` - YAML configuration file
* `snakelib/` - snakefiles collection included by the master snakefile
* `lib/` - python files included by analysis scripts and snakefiles
* `scripts/` - analysis scripts
* `data/` - input data needed by pipeline - use with caution to avoid bloated repo
* `results/` - pipeline results to be committed - use with caution to avoid bloated repo

## Installation

Clone the repository:
```bash
git clone https://github.com/nanoporetech/pipeline-polya-ng
```

## Usage

Edit `config.yml` to set the input datasets and parameters, then run:
```bash
snakemake --use-conda -j all
```

Help
====

## Licence and Copyright

(c) 2019 Oxford Nanopore Technologies Ltd.
This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.

## FAQs and tips

## References and Supporting Information
### Research Release
Research releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However, much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resources for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.