{"id":27415658,"url":"https://github.com/philres/ngmlr","last_synced_at":"2025-04-14T09:28:12.855Z","repository":{"id":47735745,"uuid":"67132304","full_name":"philres/ngmlr","owner":"philres","description":"NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) to a reference genome with a focus on reads that span structural variations","archived":false,"fork":false,"pushed_at":"2021-08-16T04:02:11.000Z","size":37167,"stargazers_count":277,"open_issues_count":48,"forks_count":40,"subscribers_count":22,"default_branch":"master","last_synced_at":"2023-10-25T18:33:57.415Z","etag":null,"topics":["alignment","bioconda","docker","long-read","mapper","next-generation-sequencing","oxford-nanopore","pacbio","structural-variations"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/philres.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-09-01T13:11:53.000Z","updated_at":"2023-10-08T09:32:15.000Z","dependencies_parsed_at":"2022-09-12T15:24:16.175Z","dependency_job_id":null,"html_url":"https://github.com/philres/ngmlr","commit_stats":null,"previous_names":[],"tags_count":12,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philres%2Fngmlr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philres%2Fngmlr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philres%2Fngmlr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philres%2Fngmlr/manifests","owner_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/owners/philres","download_url":"https://codeload.github.com/philres/ngmlr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248853538,"owners_count":21172159,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","bioconda","docker","long-read","mapper","next-generation-sequencing","oxford-nanopore","pacbio","structural-variations"],"created_at":"2025-04-14T09:28:11.990Z","updated_at":"2025-04-14T09:28:12.838Z","avatar_url":"https://github.com/philres.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"### Quick start\n\nDownload [binary](https://github.com/philres/ngmlr/releases/tag/v0.2.6) from github and unzip or [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square)](http://bioconda.github.io/recipes/ngmlr/README.html) or pull docker [![Docker Automated buil](https://img.shields.io/docker/automated/jrottenberg/ffmpeg.svg)](https://hub.docker.com/r/philres/ngmlr/). 
For updates follow [![Twitter URL](https://img.shields.io/twitter/url/http/shields.io.svg?style=social\u0026style=plastic)](https://twitter.com/philres1)\n\nDownload precompiled version:\n```bash\nwget https://github.com/philres/ngmlr/releases/download/v0.2.7/ngmlr-0.2.7-linux-x86_64.tar.gz\ntar xvzf ngmlr-0.2.7-linux-x86_64.tar.gz\ncd ngmlr-0.2.7/\n```\n\nFor PacBio data run:\n```bash\nngmlr -t 4 -r reference.fasta -q reads.fastq -o test.sam\n```\nFor Oxford Nanopore run:\n```bash\nngmlr -t 4 -r reference.fasta -q reads.fastq -o test.sam -x ont\n```\n\n### Introduction\n \nCoNvex Gap-cost alignMents for Long Reads (ngmlr) is a long-read mapper designed to sensitively align PacBio or Oxford Nanopore reads to (large) reference genomes. It was designed to quickly and correctly align the reads, including those spanning (complex) structural variations. Ngmlr uses an SV-aware k-mer search to find approximate mapping locations for a read and then a banded Smith-Waterman alignment algorithm to compute the final alignment. To compute precise alignments, ngmlr uses a convex gap-cost model that penalizes gap extensions less for longer gaps than for shorter ones. The gap model allows ngmlr to account for both sequencing errors and real genomic variation at the same time and makes it especially effective at precisely identifying the position of breakpoints stemming from structural variations. The k-mer search helps to detect and split reads that cannot be aligned linearly, enabling ngmlr to reliably align reads to a wide range of different structural variations including nested SVs (e.g. 
inversions flanked by deletions).\n\nWith 10 cores (AMD Opteron 6348), ngmlr currently takes about 90 minutes and 10 GB RAM for aligning 3Gbp (~ 1x human data) of PacBio reads.\n\n\n### Citation:\nPlease see and cite our paper:\nhttps://www.nature.com/articles/s41592-018-0001-7\n\n\n**Poster \u0026 Talks:**\n\n[Accurate and fast detection of complex and nested structural variations using long read technologies](http://schatzlab.cshl.edu/presentations/2016/2016.10.28.BIODATA.PacBioSV.pdf)\u003cbr\u003e\nBiological Data Science, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 26 - 29.10.2016\n\n[NGMLR: Highly accurate read mapping of third generation sequencing reads for improved structural variation analysis](http://www.cibiv.at/~philipp_/files/gi2016_poster_phr.pdf)\u003cbr\u003e \nGenome Informatics 2016, Wellcome Genome Campus Conference Centre, Hinxton, Cambridge, UK, 19.09.-2.09.2016\n\n### Parameters\n\n```\nUsage: ngmlr [options] -r \u003creference\u003e -q \u003creads\u003e [-o \u003coutput\u003e]\n\nInput/Output:\n    -r \u003cfile\u003e,  --reference \u003cfile\u003e\n        (required)  Path to the reference genome (FASTA/Q, can be gzipped)\n    -q \u003cfile\u003e,  --query \u003cfile\u003e\n        Path to the read file (FASTA/Q) [/dev/stdin]\n    -o \u003cstring\u003e,  --output \u003cstring\u003e\n        Path to output file [stdout]\n    --skip-write\n        Don't write reference index to disk [false]\n    --bam-fix\n        Report reads with \u003e 64k CIGAR operations as unmapped. 
Required to be compatible with the BAM format [false]\n    --rg-id \u003cstring\u003e\n        Adds RG:Z:\u003cstring\u003e to all alignments in SAM/BAM [none]\n    --rg-sm \u003cstring\u003e\n        RG header: Sample [none]\n    --rg-lb \u003cstring\u003e\n        RG header: Library [none]\n    --rg-pl \u003cstring\u003e\n        RG header: Platform [none]\n    --rg-ds \u003cstring\u003e\n        RG header: Description [none]\n    --rg-dt \u003cstring\u003e\n        RG header: Date (format: YYYY-MM-DD) [none]\n    --rg-pu \u003cstring\u003e\n        RG header: Platform unit [none]\n    --rg-pi \u003cstring\u003e\n        RG header: Median insert size [none]\n    --rg-pg \u003cstring\u003e\n        RG header: Programs [none]\n    --rg-cn \u003cstring\u003e\n        RG header: sequencing center [none]\n    --rg-fo \u003cstring\u003e\n        RG header: Flow order [none]\n    --rg-ks \u003cstring\u003e\n        RG header: Key sequence [none]\n\nGeneral:\n    -t \u003cint\u003e,  --threads \u003cint\u003e\n        Number of threads [1]\n    -x \u003cpacbio, ont\u003e,  --presets \u003cpacbio, ont\u003e\n        Parameter presets for different sequencing technologies [pacbio]\n    -i \u003c0-1\u003e,  --min-identity \u003c0-1\u003e\n        Alignments with an identity lower than this threshold will be discarded [0.65]\n    -R \u003cint/float\u003e,  --min-residues \u003cint/float\u003e\n        Alignments containing less than \u003cint\u003e or (\u003cfloat\u003e * read length) residues will be discarded [0.25]\n    --no-smallinv\n        Don't detect small inversions [false]\n    --no-lowqualitysplit\n        Split alignments with poor quality [false]\n    --verbose\n        Debug output [false]\n    --no-progress\n        Don't print progress info while mapping [false]\n\nAdvanced:\n    --match \u003cfloat\u003e\n        Match score [2]\n    --mismatch \u003cfloat\u003e\n        Mismatch score [-5]\n    --gap-open \u003cfloat\u003e\n        Gap open score [-5]\n    
--gap-extend-max \u003cfloat\u003e\n        Gap open extend max [-5]\n    --gap-extend-min \u003cfloat\u003e\n        Gap open extend min [-1]\n    --gap-decay \u003cfloat\u003e\n        Gap extend decay [0.15]\n    -k \u003c10-15\u003e,  --kmer-length \u003c10-15\u003e\n        K-mer length in bases [13]\n    --kmer-skip \u003cint\u003e\n        Number of k-mers to skip when building the lookup table from the reference [2]\n    --bin-size \u003cint\u003e\n        Sets the size of the grid used during candidate search [4]\n    --max-segments \u003cint\u003e\n        Max number of segments allowed for a read per kb [1]\n    --subread-length \u003cint\u003e\n        Length of fragments reads are split into [256]\n    --subread-corridor \u003cint\u003e\n        Length of corridor sub-reads are aligned with [40]\n```\n\n### Running with docker\n```bash\ndocker run -ti -v /home/user/data/:/home/user/data/ philres/ngmlr ngmlr -r /home/user/data/ref.fa -q /home/user/data/reads.fasta -o /home/user/data/output.sam\n```\n\n### Building ngmlr from source\nOS: Linux and Mac OSX:\nRequirements: zlib-dev, cmake, gcc/g++ (\u003e=4.8.2)\n\n```bash\ngit clone https://github.com/philres/ngmlr.git\ncd ngmlr/\nmkdir -p build\ncd build/\ncmake ..\nmake\n\ncd ../bin/ngmlr-*/\n./ngmlr\n```\n\n### Building ngmlr for linux with docker\n```bash\ngit clone https://github.com/philres/ngmlr.git\nmkdir -p ngmlr/build\ndocker run -v `pwd`/ngmlr:/ngmlr philres/nextgenmaplr-buildenv bash -c \"cd /ngmlr/build \u0026\u0026 cmake .. 
\u0026\u0026  make\"\n`pwd`/ngmlr/bin/ngmlr-*/ngmlr\n```\n\n### NGMLR progress information\nExample:\n```\nProcessed: 92198 (0.66), R/S: 37.44, RL: 8857, Time: 2.00 5.00 11.62, Align: 0.96, 490, 0.81\n```\n\n+ `Processed: 92198`: 92198 reads have been processed so far\n+ `(0.66)`: 66 % of the 92198 reads were mapped (with \u003e 25 % of their bp mapped)\n+ `R/S: 37.44`: 37.44 reads are mapped per second on average\n+ `RL: 8857`: 8857 is the average read length so far\n\n\"Time\" and \"Align\" are for debugging purposes and will be removed.\n\n### Datasets used in the manuscript:\nWe provide the NGMLR aligned reads and the Sniffles calls for the data sets used:  \n\nArabidopsis trio: [http://labshare.cshl.edu/shares/schatzlab/www-data/fsedlaze/Sniffles/Arabidopsis_trio](http://labshare.cshl.edu/shares/schatzlab/www-data/fsedlaze/Sniffles/Arabidopsis_trio)\n\nGenome in a Bottle trio:\n+ Mappings: [ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_MtSinai_NIST/Baylor_NGMLR_bam_GRCh37/](ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_MtSinai_NIST/Baylor_NGMLR_bam_GRCh37/)\n+ SV calls: [http://labshare.cshl.edu/shares/schatzlab/www-data/fsedlaze/Sniffles/GiaB/](http://labshare.cshl.edu/shares/schatzlab/www-data/fsedlaze/Sniffles/GiaB/)\n\nNA12878: [http://labshare.cshl.edu/shares/schatzlab/www-data/fsedlaze/Sniffles/NA12878/](http://labshare.cshl.edu/shares/schatzlab/www-data/fsedlaze/Sniffles/NA12878/)\n\nSKBR3: [http://labshare.cshl.edu/shares/schatzlab/www-data/fsedlaze/Sniffles/Skbr3/](http://labshare.cshl.edu/shares/schatzlab/www-data/fsedlaze/Sniffles/Skbr3/)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilres%2Fngmlr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphilres%2Fngmlr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilres%2Fngmlr/lists"}