{"id":27303095,"url":"https://github.com/tseemann/nullarbor","last_synced_at":"2025-04-12T02:50:01.689Z","repository":{"id":25508964,"uuid":"28940476","full_name":"tseemann/nullarbor","owner":"tseemann","description":":floppy_disk: :page_with_curl: \"Reads to report\" for public health and clinical microbiology","archived":false,"fork":false,"pushed_at":"2024-03-19T08:35:28.000Z","size":11880,"stargazers_count":131,"open_issues_count":65,"forks_count":37,"subscribers_count":20,"default_branch":"master","last_synced_at":"2024-06-13T00:03:07.313Z","etag":null,"topics":["bacteria","denovo-assembly","fastq","genotyping","phylogenomics","public-health","report","resistome","variant-calling","virulome"],"latest_commit_sha":null,"homepage":"","language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tseemann.png","metadata":{"files":{"readme":"README.V1.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-01-08T00:13:42.000Z","updated_at":"2024-05-20T23:13:39.000Z","dependencies_parsed_at":"2022-08-01T05:38:32.245Z","dependency_job_id":null,"html_url":"https://github.com/tseemann/nullarbor","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tseemann%2Fnullarbor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tseemann%2Fnullarbor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tseemann%2Fnullarbor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tseemann%2Fnullarbor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/tseemann","download_url":"https://codeload.github.com/tseemann/nullarbor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248509056,"owners_count":21115929,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bacteria","denovo-assembly","fastq","genotyping","phylogenomics","public-health","report","resistome","variant-calling","virulome"],"created_at":"2025-04-12T02:50:01.173Z","updated_at":"2025-04-12T02:50:01.680Z","avatar_url":"https://github.com/tseemann.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Nullarbor\n\nPipeline to generate complete public health microbiology reports from sequenced isolates\n\n:warning: This documents the previous Nullarbor 1.x version. Version 2.x is [here](README.md)\n\n## Motivation\n\nPublic health microbiology labs receive batches of bacterial isolates\nwhenever there is a suspected outbreak.  In modernised labs, each of these\nisolates will be whole genome sequenced, typically on an Illumina or Ion\nTorrent instrument.  Each of these WGS samples needs to be quality checked for\ncoverage, contamination and correct species.  Genotyping (e.g. MLST) and\nresistome characterisation are also required.  Finally, a phylogenetic tree\nneeds to be generated to show the relationship and genomic distance between\nthe strains.  
All this information is then combined with epidemiological\ninformation (metadata for each sample) to assess the situation and inform\nfurther action.\n\n## Example reports\n\nFeel free to browse some [example reports](http://tseemann.github.io/nullarbor/).\n\n## Pipeline\n\n### Limitations\n\nNullarbor currently only supports Illumina paired-end sequencing data;\nsingle end reads, from either Illumina or Ion Torrent are not supported.\nAll jobs are run on a single compute node; there is no support yet for\ndistributing the work across a high performance cluster.\n\n### Per isolate\n\n1. Clean reads\n   * remove adaptors, low quality bases and reads (Trimmomatic)\n2. Species identification\n   * k-mer analysis against known genome database (Kraken)\n3. _De novo_ assembly\n   * Fast mostly-good-enough assembly (MEGA-HIT)\n   * More accurate, but slower assembly (SPAdes) using `--accurate`\n4. Annotation\n   * Genome annotation (Prokka)\n5. MLST\n   * From assembly w/ automatic scheme detection (mlst)\n6. Resistome\n   * From assembly (abricate)\n7. Variants\n   * From reads relative to reference (Snippy)\n\n### Per isolate set\n\n1. Core genome SNPs\n   * From reads (Snippy-core)\n2. Infer core SNP phylogeny \n   * Maximum likelihood (FastTree)\n   * SNP distance matrix (afa-pairwise)\n3. Pan genome\n   * From annotated contigs (Roary)\n4. Report\n   * Table of isolates, yield, coverage, species, MLST (HTML + Plotly.JS + DataTables)\n\n## Installation\n\n### Warning\n\nInstalling Nullarbor is not easy. It is a complex pipeline, and depends on lots of external\ntools and databases. 
If you have access to cloud or virtual machines you may wish to consider\nusing the [Genomics Virtual Lab image](http://genome.edu.au/) or the \n[Ubuntu 14.04 installer](https://gist.github.com/stephenturner/005d4e4e322b8cf5b991d1d357527859)\nby @stephenturner.\n\n### Local installation\n\nPlease first install the [Linuxbrew](https://github.com/Homebrew/linuxbrew) package manager, then:\n\n    brew tap homebrew/science\n    brew tap tseemann/bioinformatics-linux\n    brew install nullarbor --HEAD\n\nYou need to install a [Kraken](https://ccb.jhu.edu/software/kraken/) database.\n\n    wget https://ccb.jhu.edu/software/kraken/dl/minikraken.tgz\n    \nChoose a folder (say `$HOME`) to put it in; you need ~4 GB free:\n\n    tar -C $HOME -zxvf minikraken.tgz\n\nThen add the following to your `$HOME/.bashrc` so Nullarbor can use it:\n\n    export KRAKEN_DB_PATH=$HOME/minikraken_20141208\n\nYou should be good to go now. When you first run Nullarbor it will let you\nknow of any missing dependencies or databases.\n\n## Usage\n\n### Create a 'samples' file (TAB)\n\nThis is a file, one line per isolate, with 3 tab-separated columns: ID, R1, R2.\n\n    Isolate1\t/data/reads/Isolate1_R1.fq.gz\t/data/reads/Isolate1_R2.fq.gz\n    Isolate2\t/data/reads/Isolate2_R1.fq\t/data/reads/Isolate2_R2.fq\n    Isolate3\t/data/old/s_3_1_sequence.txt\t/data/old/s_3_2_sequence.txt\n    Isolate3b\t/data/reads/Isolate3b_R1.fastq\t/data/reads/Isolate3b_R2.fastq\n\n### Choose a reference genome (FASTA, GENBANK)\n\nThis is just a regular FASTA or GENBANK file. Try to choose a reference phylogenomically similar to your isolates.    \nIf you use a GENBANK or EMBL file the annotations will be used to annotate SNPs by Snippy.\n\n### Generate the run folder\n\nThis command will create a new folder with a `Makefile` in it:\n\n    nullarbor.pl --name PROJNAME --mlst saureus --ref US300.fna --input samples.tab --outdir OUTDIR\n\nThis will check that everything is okay. 
One of the last lines it prints is the command you need to run\nto actually perform the analysis, _e.g._\n\n    Run the pipeline with: nice make -j 4 -C OUTDIR\n\nSo you can just cut and paste that:\n\n    nice make -j 4 -C OUTDIR\n\nThe `-C` option just means to change into the `OUTDIR` folder first, so you could \ndo this instead:\n\n    cd OUTDIR\n    make -j 4\n\n### View the report\n\n    firefox OUTDIR/report/index.html\n\nHere are some [example reports](http://tseemann.github.io/nullarbor/).\n\n### See some options\n\nOnce set up, a Nullarbor folder can be used in a few different ways. \nSee what's available with this command:\n\n    make help\n\n## Etymology\n\nThe [Nullarbor](http://en.wikipedia.org/wiki/Nullarbor_Plain) \nis a huge treeless plain that spans the area between south-west and\nsouth-east Australia.  The name comes from the Latin \"nullus\" (no) and \"arbor\"\n(tree), or \"no trees\".  As this software will generate a tree, there is an\nelement of Australian irony in the name.\n\n## Issues\n\nSubmit problems to the [Issues Page](https://github.com/tseemann/nullarbor/issues)\n\n## License\n\n[GPL 2.0](https://raw.githubusercontent.com/tseemann/nullarbor/master/LICENSE)\n\n## Citation\n\nSeemann T, Goncalves da Silva A, Bulach DM, Schultz MB, Kwong JC, Howden BP.\n*Nullarbor* \n**Github** https://github.com/tseemann/nullarbor\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftseemann%2Fnullarbor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftseemann%2Fnullarbor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftseemann%2Fnullarbor/lists"}