Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/mateidavid/nanocall

An Oxford Nanopore Basecaller
https://github.com/mateidavid/nanocall

Last synced: about 2 months ago
JSON representation

An Oxford Nanopore Basecaller

Lists

README

        

# -*- mode:org; mode:visual-line; coding:utf-8; -*-

** Nanocall: An Oxford Nanopore Basecaller

[[http://travis-ci.org/mateidavid/nanocall][http://travis-ci.org/mateidavid/nanocall.svg?branch=master]] [[https://tldrlegal.com/license/mit-license][http://img.shields.io/:license-mit-blue.svg]]

*** Introduction

Nanocall is an alternative, open source, MIT licensed, basecaller for Oxford Nanopore Technologies (ONT) sequencing data. Published in [[https://doi.org/10.1093/bioinformatics/btw569][Bioinformatics, 2016]].

For the official ONT basecaller, see [[https://metrichor.com/s/][Metrichor]].

**** Usefulness of Nanocall on recent ONT sequencing data

To understand the usefulness of Nanocall compared to Metrichor, some background is in order.

*Before summer 2016*: Before the summer of 2016, and the release of the R9 sequencing pore:

- Metrichor was the only available basecaller for ONT data.
- Metrichor's source code was closed.
- Metrichor was only available as a cloud service.

This state of affairs prompted us to develop Nanocall as an open-source local basecaller alternative to Metrichor.

*After summer 2016*: The summer of 2016 has brought along several significant developments from ONT:

- A new sequencing pore R9 was released: ([[https://nanoporetech.com/about-us/news/update-new-r9-nanopore-faster-more-accurate-sequencing-and-new-ten-minute-preparation][ONT Press Release, May 2016]]).
- The Metrichor source code was opened (under a development license).
- ONT provided an official option for local basecalling: ([[https://nanoporetech.com/about-us/news/local-basecalling-now-available-enabling-minion-usage-field][ONT Press Release, Aug 2016]]).

As a result, Nanocall's usefulness is now limited to:

- a platform for developing new basecalling ideas, and
- situations where, for various reasons, you do not have access to the official ONT basecaller(s).

If you want to use Nanocall on R9 data, Nanocall does support it directly, but its accuracy is significantly lower than that of Metrichor (unlike the case of R7.3, where the two had similar accuracy). The reason for the discrepancy is that Metrichor on R9 uses a more elaborate RNN-based approach, compared to the simple HMM-based one in Nanocall.

**** Levels of ONT sequencing data

Most people are only used to dealing with DNA bases. However, to understand where Nanocall fits in, we observe that there are 3 levels of ONT sequencing data:

- Raw samples. These are direct (picoamp) current measurements, taken at preset intervals as the DNA molecule is threaded through the pore. This data is passed through the USB cable from the MinION to the controlling laptop running MinKNOW. These are stored in =fast5= files at paths such as =/Raw/Reads/Read_29/Signal=.

- Events. Each event is an aggregation of multiple consecutive raw samples, (ideally) corresponding to a certain DNA context found in the pore. The process of computing events from raw samples is referred to as /event detection/. These are stored in =fast5= files at paths such as =/Analyses/EventDetection_000/Reads/Read_29/Events=.

- DNA bases. These are the usual, finished product. The process of computing DNA bases from events is referred to as /basecalling/. These are stored in =fast5= files at paths such as =/Analyses/Basecall_2D_000/BaseCalled_2D/Fastq=.

On R7.3, event detection was performed locally by MinKNOW, and events were passed on to, and used by Metrichor. Since Nanocall was developed as an alternative local basecaller for R7.3 data, /Nanocall is designed to work with events, not with raw samples/.

On (at least some versions of) R9, Metrichor would entirely redo the event detection directly from raw samples, disregarding any event detection done locally by MinKNOW. As such, it is less uncommon with R9 (than with R7.3) to see =fast5= files without events. Nanocall cannot be run directly on such files. To use Nanocall on R9 data, you must either configure MinKNOW to perform local event detection, or pass the files through Metrichor to use its event detection.

*** Installation

Nanocall can be built from source in a classical UNIX environment, or directly under [[https://www.docker.com/what-docker][Docker]]. The Docker build might run under Windows, though this is not tested.

**** Under a Classical UNIX Environment

Nanocall uses =cmake= for configuration and =make= for building. The prerequisites needed for building are =zlib= and =hdf5=. On UNIX systems, =hdf5= can be optionally built as a submodule.
Example build:

#+BEGIN_EXAMPLE
mkdir /some/source/dir && cd /some/source/dir
git clone --recursive https://github.com/mateidavid/nanocall.git
cd nanocall
mkdir build && cd build
cmake ../src [-DCMAKE_INSTALL_PREFIX=/some/install/dir] [-DBUILD_HDF5=1] [-DHDF5_ROOT=/path/to/hdf5]
make
make install
/some/install/dir/bin/nanocall --version
#+END_EXAMPLE

*Notes*:

- The default install prefix is =/usr/local=.

- Setting =BUILD_HDF5= will cause =hdf5= to be downloaded and built as a submodule.

- Setting =HDF5_ROOT= is only necessary if a copy of =hdf5= is installed in a non-standard location. This is not needed when =BUILD_HDF5= is used.

**** Under Docker

To avoid dealing with prerequisites, Nanocall can be conveniently built under Docker. The installation and configuration of Docker itself is outside of the scope of this document.

***** Simple "fat" build

The simplest way to run Nanocall under Docker is:

#+BEGIN_EXAMPLE
docker build -t nanocall https://github.com/mateidavid/nanocall.git
docker run --rm nanocall --version
docker run --rm -u $(id -u):$(id -g) -v /path/to/data:/data nanocall -t 4 . >output.fa
#+END_EXAMPLE

Howver, there are several problems with this build:

- The docker image is "fat", in that it contains all the build time dependencies of Nanocall, which are not needed at run time.

- Without using =-u=, the image will create files with a UID of 0 on the mounted volumes of the host. To remove them, you will have to use =sudo rm= or =sudo chown=.

- The timezone inside the image might be different from the host. This might confuse programs which depend on comparing modification times, most notably =make=.

***** Alternate "slim" build

To alleviate the problems mentioned above, you can build a "slim" Docker image as follows:

#+BEGIN_EXAMPLE
git clone --recursive --depth 1 https://github.com/mateidavid/nanocall.git
nanocall/script/build-slim-docker-image
docker run --rm nanocall --version
docker run --rm -v /path/to/data:/data nanocall -t 4 . >output.fa
#+END_EXAMPLE

*** Usage Examples

#+BEGIN_EXAMPLE
# Check version
nanocall --version

# Check command line parameters
nanocall --help

# Run on single file, save output and log
nanocall /path/to/file.fast5 >output.fa 2>log

# Run on directory, using 24 threads, discard log
nanocall -t 24 /path/to/data >output.fa 2>/dev/null

# Run on file-of-file-names
nanocall /path/to/files.fofn >output.fa

# Run Docker build on directory, using 4 threads
# Note: -u is not needed with the "slim" build
docker run --rm -u $(id -u):$(id -g) -v /path/to/data:/data nanocall -t 4 . >output.fa
#+END_EXAMPLE

*** License

Released under the [[file:LICENSE][MIT license]].