https://github.com/tseemann/nullarbor
:floppy_disk: :page_with_curl: "Reads to report" for public health and clinical microbiology
https://github.com/tseemann/nullarbor
bacteria denovo-assembly fastq genotyping phylogenomics public-health report resistome variant-calling virulome
Last synced: about 1 month ago
JSON representation
:floppy_disk: :page_with_curl: "Reads to report" for public health and clinical microbiology
- Host: GitHub
- URL: https://github.com/tseemann/nullarbor
- Owner: tseemann
- License: gpl-2.0
- Created: 2015-01-08T00:13:42.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2024-03-19T08:35:28.000Z (about 1 year ago)
- Last Synced: 2024-06-13T00:03:07.313Z (11 months ago)
- Topics: bacteria, denovo-assembly, fastq, genotyping, phylogenomics, public-health, report, resistome, variant-calling, virulome
- Language: Perl
- Homepage:
- Size: 11.3 MB
- Stars: 131
- Watchers: 20
- Forks: 37
- Open Issues: 65
-
Metadata Files:
- Readme: README.V1.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
# Nullarbor
Pipeline to generate complete public health microbiology reports from sequenced isolates
:warning: This documents the previous Nullarbor 1.x version. Version 2.x is [here](README.md)
## Motivation
Public health microbiology labs receive batches of bacterial isolates
whenever there is a suspected outbreak. In modernised labs, each of these
isolates will be whole genome sequenced, typically on an Illumina or Ion
Torrent instrument. Each of these WGS samples needs to quality checked for
coverage, contamination and correct species. Genotyping (eg. MLST) and
resistome characterisation is also required. Finally a phylogenetic tree
needs to be generated to show the relationship and genomic distance between
the strains. All this information is then combined with epidemiological
information (metadata for each sample) to assess the situation and inform
further action.## Example reports
Feel free to browse some [example reports](http://tseemann.github.io/nullarbor/).
## Pipeline
### Limitations
Nullarbor currently only supports Illumina paired-end sequencing data;
single end reads, from either Illumina or Ion Torrent are not supported.
All jobs are run on a single compute node; there is no support yet for
distributing the work across a high performance cluster.### Per isolate
1. Clean reads
* remove adaptors, low quality bases and reads (Trimmomatic)
2. Species identification
* k-mer analysis against known genome database (Kraken)
3. _De novo_ assembly
* Fast mostly-good-enough assembly (MEGA-HIT)
* More accurate, but slower assembly (SPAdes) using `--accurate`
4. Annotation
* Genome annotation (Prokka)
5. MLST
* From assembly w/ automatic scheme detection (mlst)
6. Resistome
* From assembly (abricate)
7. Variants
* From reads relative to reference (Snippy)### Per isolate set
1. Core genome SNPs
* From reads (Snippy-core)
2. Infer core SNP phylogeny
* Maximum likelihood (FastTree)
* SNP distance matrix (afa-pairwise)
3. Pan genome
* From annotated contigs (Roary)
4. Report
* Table of isolates, yield, coverage, species, MLST (HTML + Plotly.JS + DataTables)## Installation
### Warning
Installing Nullarbor is not easy. It is a complex pipeline, and depends on lots of external
tools and databases. If you have access to cloud or virtual machines you may wish to consider
using the [Genomics Virtual Lab image](http://genome.edu.au/) or the
[Ubuntu 14.04 installer](https://gist.github.com/stephenturner/005d4e4e322b8cf5b991d1d357527859)
by @stephenturner.### Local installation
Please first install the [Linuxbrew](https://github.com/Homebrew/linuxbrew) package manner, then:
brew tap homebrew/science
brew tap tseemann/bioinformatics-linux
brew install nullarbor --HEADYou need to install a [Kraken](https://ccb.jhu.edu/software/kraken/) database.
wget https://ccb.jhu.edu/software/kraken/dl/minikraken.tgz
Choose a folder (say `$HOME`) to put it in, you need ~4 GB free:tar -C $HOME -zxvf minikraken.tgz
Then add the following to your `$HOME/.bashrc` so Nullarbor can use it:
export KRAKEN_DB_PATH=$HOME/minikraken_20141208
You should be good to go now. When you first run Nullarbor it will let you
know of any missing dependencies or databases.## Usage
### Create a 'samples' file (TAB)
This is a file, one line per isolate, with 3 tab separated columns: ID, R1, R2.
Isolate1 /data/reads/Isolate1_R1.fq.gz /data/reads/Isolate2_R1.fq.gz
Isolate2 /data/reads/Isolate2_R1.fq /data/reads/Isolate2_R2.fq
Isolate3 /data/old/s_3_1_sequence.txt /data/old/s_3_2_sequence.txt
Isolate3b /data/reads/Isolate3b_R1.fastq /data/reads/Isolate3b_R2.fastq### Choose a reference genome (FASTA, GENBANK)
This is just a regular FASTA or GENBANK file. Try and choose a reference phylogenomically similar to your isolates.
If you use a GENBANK or EMBL file the annotations will be used to annotate SNPs by Snippy.### Generate the run folder
This command will create a new folder with a `Makefile` in it:
nullarbor.pl --name PROJNAME --mlst saureus --ref US300.fna --input samples.tab --outdir OUTDIR
This will check that everything is okay. One of the last lines it prints is the command you need to run
to actually perform the analysis _e.g._Run the pipeline with: nice make -j 4 -C OUTDIR
So you can just cut and paste that:
nice make -j 4 -C OUTDIR
The `-C` option just means to change into the `/home/maria/listeria/nullarbor` folder first, so you could
do this instead:cd OUTDIR
make -j 4### View the report
firefox OUTDIR/report/index.html
Here are some [example reports](http://tseemann.github.io/nullarbor/).
### See some options
Once set up, a Nullarbor folder can be used in a few different ways.
See what's available with this command:make help
## Etymology
The [Nullarbor](http://en.wikipedia.org/wiki/Nullarbor_Plain)
is a huge treeless plain that spans the area between south-west and
south-east Australia. It comes from the Latin "nullus" (no) and "arbor"
(tree), or "no trees". As this software will generate a tree, there is an
element of Australian irony in the name.## Issues
Submit problems to the [Issues Page](https://github.com/tseemann/nullarbor/issues)
## License
[GPL 2.0](https://raw.githubusercontent.com/tseemann/nullarbor/master/LICENSE)
## Citation
Seemann T, Goncalves da Silva A, Bulach DM, Schultz MB, Kwong JC, Howden BP.
*Nullarbor*
**Github** https://github.com/tseemann/nullarbor