Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/relastle/halcyon
an accurate basecaller exploiting an encoder-decoder model with monotonic attention.
https://github.com/relastle/halcyon
Last synced: about 2 months ago
JSON representation
an accurate basecaller exploiting an encoder-decoder model with monotonic attention.
- Host: GitHub
- URL: https://github.com/relastle/halcyon
- Owner: relastle
- License: gpl-3.0
- Created: 2020-08-22T06:12:06.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-10-22T05:38:03.000Z (about 2 years ago)
- Last Synced: 2023-03-03T17:42:20.451Z (almost 2 years ago)
- Language: Python
- Homepage:
- Size: 221 KB
- Stars: 4
- Watchers: 2
- Forks: 3
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Halcyon
![halcyon](https://user-images.githubusercontent.com/6816040/100544654-9c166d80-329a-11eb-86e9-e1dd17a496a6.png)
[Halcyon: an accurate basecaller exploiting an encoder-decoder model with monotonic attention.](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa953/5962086?searchresult=1)
Halcyon incorporates neural-network techniques frequently used in the field
of machine translation, and employs monotonic-attention mechanisms to learn semantic
correspondences between nucleotides and signal levels without any pre-segmentation against input
signals.## Installation
### pip
```bash
pip install pyhalcyon
```
### bioconda```bash
conda install pyhalcyon
```### Docker
```bash
docker pull relastle/halcyon
```## Basic Usage
Halcyon basecalls from fast5 files in a single directory and outputs the basecalled sequences into a single fasta file.
Suppose that you have the following directory structure.
```
./test_data/
└── signals.fast5
```The commands below are for obtaining fasta file (`./test.fasta`) as following.
```
>./test_data/signals.fast5
TGG...TAT
```### pip /bioconda / from source
```bash
halcyon basecall -i ./test_data/ -o ./test.fasta
```### docker
```bash
docker run -it --rm -v : relastle/halcyon basecall -i ./test_data/ -o ./test.fasta
```⚠️ Please note that you should mount the host directory into the container appropriately.
## Road Map
- [x] Inference module.
- [ ] From-scratch training module.
- [ ] Retraining module.
- [ ] Visualizing a attention transition.## Citation
Below is bibtex format for citation.
```bibtex
@article{10.1093/bioinformatics/btaa953,
author = {Konishi, Hiroki and Yamaguchi, Rui and Yamaguchi, Kiyoshi and Furukawa, Youichi and Imoto, Seiya},
title = "{Halcyon: An Accurate Basecaller Exploiting An Encoder-Decoder Model With Monotonic Attention}",
journal = {Bioinformatics},
year = {2020},
month = {11},
abstract = "{In recent years, nanopore sequencing technology has enabled inexpensive long-read sequencing, which promises reads longer than a few thousand bases. Such long-read sequences contribute to the precise detection of structural variations and accurate haplotype phasing. However, deciphering precise DNA sequences from noisy and complicated nanopore raw signals remains a crucial demand for downstream analyses based on higher-quality nanopore sequencing, although various basecallers have been introduced to date.To address this need, we developed a novel basecaller, Halcyon, that incorporates neural-network techniques frequently used in the field of machine translation. Our model employs monotonic-attention mechanisms to learn semantic correspondences between nucleotides and signal levels without any pre-segmentation against input signals. We evaluated performance with a human whole-genome sequencing dataset and demonstrated that Halcyon outperformed existing third-party basecallers and achieved competitive performance against the latest Oxford Nanopore Technologies’ basecallers.The source code (halcyon) can be found at https://github.com/relastle/halcyon.}",
issn = {1367-4803},
doi = {10.1093/bioinformatics/btaa953},
url = {https://doi.org/10.1093/bioinformatics/btaa953},
note = {btaa953},
eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa953/34178609/btaa953.pdf},
}
```