Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/scutbioinformatic/causalcall
- Host: GitHub
- URL: https://github.com/scutbioinformatic/causalcall
- Owner: scutbioinformatic
- Created: 2019-11-27T14:53:33.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2019-11-28T07:40:11.000Z (over 4 years ago)
- Last Synced: 2023-10-20T01:52:53.494Z (9 months ago)
- Language: Python
- Size: 37.5 MB
- Stars: 9
- Watchers: 4
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
Lists
- awesome-nanopore - Causalcall - [Python] - [Causalcall: Nanopore Basecalling Using a Temporal Convolutional Network](https://www.frontiersin.org/articles/10.3389/fgene.2019.01332/full) (Software packages / Basecalling)
README
# Causalcall
Code for the paper *Causalcall: nanopore basecalling using a temporal convolutional network*.
# Environment
- Ubuntu 14.04
- Python 3.6
- TensorFlow 1.8
Dependencies:
numpy, collections, threading, tempfile, h5py, statsmodels, difflib, argparse, tqdm, multiprocessing, psutil
# Usage
### Preparing the training data:
Extract training and validation data (.tfrecords) from resquiggled fast5 files. Run the command once per data set, choosing the matching input folder and output file name:
```
python raw.py -i training/validation_fast5_files_folder -o tfrecords_files_folder -f train.tfrecords/validate.tfrecords
```
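The slash notation in the command above appears to stand for two separate runs, one for the training set and one for the validation set. A minimal sketch of that reading follows; the helper name `build_extract_commands` and the example folder names are hypothetical, and only the `-i`, `-o` and `-f` flags come from the command shown above.

```python
def build_extract_commands(train_fast5_dir, validate_fast5_dir, tfrecords_dir):
    """Return one raw.py command per data split (hypothetical helper).

    Both commands write into the same output folder so that training
    can later find train.tfrecords and validate.tfrecords side by side.
    """
    splits = [
        (train_fast5_dir, "train.tfrecords"),
        (validate_fast5_dir, "validate.tfrecords"),
    ]
    return [
        ["python", "raw.py", "-i", fast5_dir, "-o", tfrecords_dir, "-f", file_name]
        for fast5_dir, file_name in splits
    ]


# Example folder names are placeholders, not paths from the repository.
for command in build_extract_commands("fast5/train", "fast5/validate", "tfrecords"):
    print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` from the repository root.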
### Training:
Train the model with default parameters:
```
CUDA_VISIBLE_DEVICES=cuda_id python train.py -i tfrecords_files_folder -o model_folder -m model_name
```
Make sure that train.tfrecords and validate.tfrecords are both in tfrecords_files_folder.
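The requirement that both files are present can be checked before launching a long training run. This is a small sketch, not part of the repository; the function name `missing_tfrecords` is an assumption, and only the two file names come from the text above.

```python
from pathlib import Path

# File names that training expects to find in the tfrecords folder.
REQUIRED_FILES = ("train.tfrecords", "validate.tfrecords")


def missing_tfrecords(tfrecords_dir):
    """Return the required file names that are absent from tfrecords_dir."""
    folder = Path(tfrecords_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

An empty list means the folder is ready to be passed to `train.py` with `-i`.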
To change the training parameters, run `python train.py -h` for details.
### Basecalling:
```
CUDA_VISIBLE_DEVICES=cuda_id python basecall.py -i fast5_files_folder -o results_folder -m path_to_model
```
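When basecalling is driven from a Python script rather than the shell, the GPU selection can be passed through the environment instead of a command prefix. The sketch below assumes a hypothetical helper `build_basecall_call`; the flags and the default model path `./model/DNAmodel/` are the ones given in this README, and the GPU id `"0"` is only a placeholder.

```python
import os

# Default model location as stated in this README.
DEFAULT_MODEL = "./model/DNAmodel/"


def build_basecall_call(fast5_dir, results_dir, model_path=DEFAULT_MODEL, cuda_id="0"):
    """Return (command, environment) for one basecall.py run (hypothetical helper)."""
    command = ["python", "basecall.py", "-i", fast5_dir, "-o", results_dir, "-m", model_path]
    # Copy the current environment and restrict TensorFlow to the chosen GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(cuda_id))
    return command, env
```

The pair can then be handed to `subprocess.run(command, env=env, check=True)`.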
Path to default model: *./model/DNAmodel/*
### Acknowledgement
We thank the authors of Chiron for providing the [source code](https://github.com/haotianteng/Chiron). Causalcall is built on the basic framework of Chiron's code.
(The code for preprocessing input data and for converting the outputs of the TCN-based model into base sequences is adapted from Chiron's code under MPL 2.0.)