https://github.com/rnajena/dynamont



![Dynamont](figures/logo.png)

A **Dynam**ic Programming Approach to Segment **ONT** Signals.

Dynamont is a segmentation/resquiggling tool for ONT signals. It has been tested on:

* RNA002
* RNA004
* DNA R10.4.1 5kHz (the DNA R10 models reuse the transition parameters trained for the RNA004 model; these should still be fine-tuned for DNA)

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dynamont)
[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-teal.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![PyPI](https://img.shields.io/pypi/v/dynamont) ![PyPI - Downloads](https://img.shields.io/pypi/dm/dynamont)](https://pypi.org/project/dynamont/)
[![Anaconda-Server Badge](https://anaconda.org/jannessp/dynamont/badges/version.svg)](https://anaconda.org/jannessp/dynamont) ![Conda](https://img.shields.io/conda/dn/jannessp/dynamont) [![Conda package](https://anaconda.org/jannessp/dynamont/badges/latest_release_date.svg)](https://anaconda.org/jannessp/dynamont) [![Conda package](https://anaconda.org/jannessp/dynamont/badges/platforms.svg)](https://anaconda.org/jannessp/dynamont)

[![DOI](https://zenodo.org/badge/608215683.svg)](https://zenodo.org/badge/latestdoi/608215683)

---

- [Installation](#installation)
  - [Pypi/pip](#pypipip)
  - [Conda](#conda)
- [Usage](#usage)
- [Default models:](#default-models)
- [Output](#output)
  - [Example Output](#example-output)
- [Exit-Codes](#exit-codes)

---

# Installation

## Pypi/pip

```bash
pip install dynamont
```

## Conda

```bash
conda config --add channels jannessp # to install all dependencies from the correct channel
conda create -n dynamont jannessp::dynamont
conda activate dynamont
```

# Usage

```bash
# segment a dataset
dynamont-resquiggle -r -b --mode basic -o -p

# train model
dynamont-train -r -b --mode basic -o -p

# choosing a pore automatically loads the default model for that pore; a custom model can be supplied with the parameter --pore_model
```

# Default models:

- [rna_r9](models/rna/r9.4.1/rna002_5mer.model) (tested)
- [rna_rp4](models/rna/rp4/rna004_9mer.model) (tested)
- dna_r9 (not available)
- [dna_r10.4.1 260 bps](models/dna/r10.4.1/dna_r10.4.1_e8.2_260bps.model) (not tested)
- [dna_r10.4.1 400 bps](models/dna/r10.4.1/dna_r10.4.1_e8.2_400bps.model) (tested)

# Output

Dynamont produces a tabular output with the following columns:

| Column Name | Description |
|-------------------------|-------------|
| **readid** | Unique identifier for the read. |
| **signalid** | Identifier for the signal corresponding to the read. |
| **start** | Start position of the signal segment in the read. |
| **end** | End position of the signal segment in the read. |
| **basepos** | Reference base position in the genomic sequence. |
| **base** | The detected base at this position. |
| **motif** | The surrounding sequence motif in which the base appears. |
| **state** | The methylation state (or modification state) of the base. |
| **posterior_probability** | Probability assigned to the predicted segment. |
| **polish** | Polished kmer, only available in resquiggle mode. |

## Example Output

Below is an example of the output generated by Dynamont:

```csv
readid,signalid,start,end,basepos,base,motif,state,posterior_probability,polish
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12762,12777,53,A,AAAAAAAAA,M,0.12434,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12777,12791,52,A,AAAAAAAAA,M,0.12146,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12791,12806,51,A,AAAAAAAAA,M,0.11881,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12806,12820,50,A,AAAAAAAAA,M,0.11665,NA
```
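The rows above can be read with Python's standard `csv` module. The following is a minimal sketch, not part of Dynamont: `load_segments` is a hypothetical helper name, and the `length` field assumes that `start` and `end` are sample indices of a half-open interval (consistent with each `end` in the example equalling the next `start`).

```python
import csv
import io

# Example rows copied from the output shown above.
EXAMPLE = """\
readid,signalid,start,end,basepos,base,motif,state,posterior_probability,polish
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12762,12777,53,A,AAAAAAAAA,M,0.12434,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12777,12791,52,A,AAAAAAAAA,M,0.12146,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12791,12806,51,A,AAAAAAAAA,M,0.11881,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12806,12820,50,A,AAAAAAAAA,M,0.11665,NA
"""


def load_segments(handle):
    """Parse Dynamont CSV rows into dicts with typed numeric fields."""
    segments = []
    for row in csv.DictReader(handle):
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row["basepos"] = int(row["basepos"])
        row["posterior_probability"] = float(row["posterior_probability"])
        # Assumption: segments are half-open [start, end), so length = end - start.
        row["length"] = row["end"] - row["start"]
        segments.append(row)
    return segments


segments = load_segments(io.StringIO(EXAMPLE))
for seg in segments:
    print(seg["basepos"], seg["base"], seg["length"], seg["posterior_probability"])
```

To read a real output file instead, pass an open file handle (for example `open(path, newline="")`) to `load_segments`.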

# Exit-Codes

- -11: Segmentation fault.
- -9: Out-of-memory error. Decrease the number of processes or move to a system with more memory.
- -6: `std::bad_alloc`.
- 1: `resquiggle` mode only: alignment score (Z) does not match between forward and backward run in preprocessing on signal (T) and read (N).
- 2: `resquiggle` mode only: alignment score (Z) does not match between forward and backward run in preprocessing on signal (T) and error correction (C).
- 3: Alignment score (Z) does not match between forward and backward pass or is -Infinity.
- 4: Input signal is missing or not found in stdin stream.
- 5: Input read is missing or not found in stdin stream.
- 6: Raw file does not exist.
- 7: Invalid model path was provided.
- 8: Provided ONT signal is too short.
- 9: Read is too short.
- 10: Signal is smaller than read.
- 11: Read is smaller than `kmerSize` of provided pore model.
- 20: Terminated using KeyboardInterrupt (Ctrl + C).
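
When Dynamont is called from a wrapper script, the codes above can be translated into readable messages. The following is a minimal sketch, not part of Dynamont: `EXIT_CODES` and `describe_exit_code` are hypothetical names, and treating `0` as success is an assumption based on the usual convention. The negative codes are consistent with how Python's `subprocess` module reports a child process terminated by signal N on POSIX (return code `-N`).

```python
# Messages copied from the exit-code list above.
EXIT_CODES = {
    -11: "Segmentation fault",
    -9: "Out of memory error",
    -6: "std::bad_alloc",
    1: "resquiggle mode: alignment score (Z) mismatch on signal (T) and read (N)",
    2: "resquiggle mode: alignment score (Z) mismatch on signal (T) and error correction (C)",
    3: "Alignment score (Z) does not match between forward and backward pass or is -Infinity",
    4: "Input signal is missing or not found in stdin stream",
    5: "Input read is missing or not found in stdin stream",
    6: "Raw file does not exist",
    7: "Invalid model path was provided",
    8: "Provided ONT signal is too short",
    9: "Read is too short",
    10: "Signal is smaller than read",
    11: "Read is smaller than kmerSize of provided pore model",
    20: "Terminated using KeyboardInterrupt (Ctrl + C)",
}


def describe_exit_code(returncode: int) -> str:
    """Return a human-readable message for a Dynamont return code."""
    if returncode == 0:
        # Assumption: 0 means success, as is conventional for command-line tools.
        return "Success"
    return EXIT_CODES.get(returncode, f"Unknown exit code {returncode}")


print(describe_exit_code(7))
print(describe_exit_code(-9))
```

In a wrapper, the value passed to `describe_exit_code` would typically be the `returncode` attribute of the object returned by `subprocess.run`.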