{"id":13575582,"url":"https://github.com/lh3/minimap2","last_synced_at":"2025-04-23T23:16:57.729Z","repository":{"id":37597613,"uuid":"97612481","full_name":"lh3/minimap2","owner":"lh3","description":"A versatile pairwise aligner for genomic and spliced nucleotide sequences","archived":false,"fork":false,"pushed_at":"2025-04-18T17:43:30.000Z","size":1980,"stargazers_count":1934,"open_issues_count":54,"forks_count":432,"subscribers_count":85,"default_branch":"master","last_synced_at":"2025-04-23T23:16:52.420Z","etag":null,"topics":["bioinformatics","genomics","sequence-alignment","spliced-alignment"],"latest_commit_sha":null,"homepage":"https://lh3.github.io/minimap2","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lh3.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"code_of_conduct.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-07-18T15:04:53.000Z","updated_at":"2025-04-21T02:50:26.000Z","dependencies_parsed_at":"2024-01-13T11:57:55.738Z","dependency_job_id":"e8971b6d-fc08-4aa1-ad6d-df25815bff2b","html_url":"https://github.com/lh3/minimap2","commit_stats":{"total_commits":1053,"total_committers":44,"mean_commits":"23.931818181818183","dds":0.05982905982905984,"last_synced_commit":"f3e59fc2a09e89fdb2c166b6d02281e98288c326"},"previous_names":[],"tags_count":33,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fminimap2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fminimap2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/lh3%2Fminimap2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fminimap2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lh3","download_url":"https://codeload.github.com/lh3/minimap2/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250528902,"owners_count":21445519,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","genomics","sequence-alignment","spliced-alignment"],"created_at":"2024-08-01T15:01:02.343Z","updated_at":"2025-04-23T23:16:57.675Z","avatar_url":"https://github.com/lh3.png","language":"C","funding_links":[],"categories":["C","Variant Callers","Next Generation Sequencing","Ranked by starred repositories","Genomics Software","Software packages"],"sub_categories":["SV callers","Assembly","Articles and References","Alignment"],"readme":"[![GitHub Downloads](https://img.shields.io/github/downloads/lh3/minimap2/total.svg?style=social\u0026logo=github\u0026label=Download)](https://github.com/lh3/minimap2/releases)\n[![BioConda Install](https://img.shields.io/conda/dn/bioconda/minimap2.svg?style=flag\u0026label=BioConda%20install)](https://anaconda.org/bioconda/minimap2)\n[![PyPI](https://img.shields.io/pypi/v/mappy.svg?style=flat)](https://pypi.python.org/pypi/mappy)\n[![Build Status](https://github.com/lh3/minimap2/actions/workflows/ci.yaml/badge.svg)](https://github.com/lh3/minimap2/actions)\n## \u003ca name=\"started\"\u003e\u003c/a\u003eGetting Started\n```sh\ngit clone 
https://github.com/lh3/minimap2\ncd minimap2 \u0026\u0026 make\n# long sequences against a reference genome\n./minimap2 -a test/MT-human.fa test/MT-orang.fa \u003e test.sam\n# create an index first and then map\n./minimap2 -x map-ont -d MT-human-ont.mmi test/MT-human.fa\n./minimap2 -a MT-human-ont.mmi test/MT-orang.fa \u003e test.sam\n# use presets (no test data)\n./minimap2 -ax map-pb ref.fa pacbio.fq.gz \u003e aln.sam       # PacBio CLR genomic reads\n./minimap2 -ax map-ont ref.fa ont.fq.gz \u003e aln.sam         # Oxford Nanopore genomic reads\n./minimap2 -ax map-hifi ref.fa pacbio-ccs.fq.gz \u003e aln.sam # PacBio HiFi/CCS genomic reads (v2.19+)\n./minimap2 -ax lr:hq ref.fa ont-Q20.fq.gz \u003e aln.sam       # Nanopore Q20 genomic reads (v2.27+)\n./minimap2 -ax sr ref.fa read1.fa read2.fa \u003e aln.sam      # short genomic paired-end reads\n./minimap2 -ax splice ref.fa rna-reads.fa \u003e aln.sam       # spliced long reads (strand unknown)\n./minimap2 -ax splice -uf -k14 ref.fa reads.fa \u003e aln.sam  # noisy Nanopore direct RNA-seq\n./minimap2 -ax splice:hq -uf ref.fa query.fa \u003e aln.sam    # PacBio Kinnex/Iso-seq (RNA-seq)\n./minimap2 -ax splice --junc-bed=anno.bed12 ref.fa query.fa \u003e aln.sam  # use annotated junctions\n./minimap2 -ax splice:sr ref.fa r1.fq r2.fq \u003e aln.sam     # short-read RNA-seq (v2.29+)\n./minimap2 -ax splice:sr -j anno.bed12 ref.fa r1.fq r2.fq \u003e aln.sam\n./minimap2 -cx asm5 asm1.fa asm2.fa \u003e aln.paf             # intra-species asm-to-asm alignment\n./minimap2 -x ava-pb reads.fa reads.fa \u003e overlaps.paf     # PacBio read overlap\n./minimap2 -x ava-ont reads.fa reads.fa \u003e overlaps.paf    # Nanopore read overlap\n# man page for detailed command line options\nman ./minimap2.1\n```\n\n## Table of Contents\n\n- [Getting Started](#started)\n- [Users' Guide](#uguide)\n  - [Installation](#install)\n  - [General usage](#general)\n  - [Use cases](#cases)\n    - [Map long noisy genomic reads](#map-long-genomic)\n    
- [Map long mRNA/cDNA reads](#map-long-splice)\n    - [Find overlaps between long reads](#long-overlap)\n    - [Map short genomic reads](#short-genomic)\n    - [Map short RNA-seq reads](#short-rna-seq)\n    - [Full genome/assembly alignment](#full-genome)\n  - [Advanced features](#advanced)\n    - [Working with \u003e65535 CIGAR operations](#long-cigar)\n    - [The cs optional tag](#cs)\n    - [Working with the PAF format](#paftools)\n  - [Algorithm overview](#algo)\n  - [Getting help](#help)\n  - [Citing minimap2](#cite)\n- [Developers' Guide](#dguide)\n- [Limitations](#limit)\n\n## \u003ca name=\"uguide\"\u003e\u003c/a\u003eUsers' Guide\n\nMinimap2 is a versatile sequence alignment program that aligns DNA or mRNA\nsequences against a large reference database. Typical use cases include: (1)\nmapping PacBio or Oxford Nanopore genomic reads to the human genome; (2)\nfinding overlaps between long reads with error rate up to ~15%; (3)\nsplice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads\nagainst a reference genome; (4) aligning Illumina single- or paired-end reads;\n(5) assembly-to-assembly alignment; (6) full-genome alignment between two\nclosely related species with divergence below ~15%.\n\nFor ~10kb noisy read sequences, minimap2 is tens of times faster than\nmainstream long-read mappers such as BLASR, BWA-MEM, NGMLR and GMAP. It is more\naccurate on simulated long reads and produces biologically meaningful alignments\nready for downstream analyses. For \u003e100bp Illumina short reads, minimap2 is\nthree times as fast as BWA-MEM and Bowtie2, and as accurate on simulated data.\nDetailed evaluations are available from the [minimap2 paper][doi] or the\n[preprint][preprint].\n\n### \u003ca name=\"install\"\u003e\u003c/a\u003eInstallation\n\nMinimap2 is optimized for x86-64 CPUs. 
You can acquire precompiled binaries from\nthe [release page][release] with:\n```sh\ncurl -L https://github.com/lh3/minimap2/releases/download/v2.29/minimap2-2.29_x64-linux.tar.bz2 | tar -jxvf -\n./minimap2-2.29_x64-linux/minimap2\n```\nIf you want to compile from the source, you need to have a C compiler, GNU make\nand zlib development files installed. Then type `make` in the source code\ndirectory to compile. If you see compilation errors, try `make sse2only=1`\nto disable SSE4 code, which will make minimap2 slightly slower.\n\nMinimap2 also works with ARM CPUs supporting the NEON instruction set. To\ncompile for 32-bit ARM architectures (such as ARMv7), use `make arm_neon=1`. To\ncompile for 64-bit ARM architectures (such as ARMv8), use `make arm_neon=1\naarch64=1`.\n\nMinimap2 can use the [SIMD Everywhere (SIMDe)][simde] library to port its\nimplementation to different SIMD instruction sets. To compile using SIMDe,\nuse `make -f Makefile.simde`. To compile for ARM CPUs, use `Makefile.simde`\nwith the ARM related command lines given above.\n\n### \u003ca name=\"general\"\u003e\u003c/a\u003eGeneral usage\n\nWithout any options, minimap2 takes a reference database and a query sequence\nfile as input and produces approximate mappings, without base-level alignment\n(i.e. coordinates are only approximate and no CIGAR in output), in the [PAF format][paf]:\n```sh\nminimap2 ref.fa query.fq \u003e approx-mapping.paf\n```\nYou can ask minimap2 to generate CIGAR at the `cg` tag of PAF with:\n```sh\nminimap2 -c ref.fa query.fq \u003e alignment.paf\n```\nor to output alignments in the [SAM format][sam]:\n```sh\nminimap2 -a ref.fa query.fq \u003e alignment.sam\n```\nMinimap2 seamlessly works with gzip'd FASTA and FASTQ formats as input. You\ndon't need to convert between FASTA and FASTQ or decompress gzip'd files first.\n\nFor the human reference genome, minimap2 takes a few minutes to generate a\nminimizer index for the reference before mapping. 
To reduce indexing time, you\ncan optionally save the index with option **-d** and replace the reference\nsequence file with the index file on the minimap2 command line:\n```sh\nminimap2 -d ref.mmi ref.fa                     # indexing\nminimap2 -a ref.mmi reads.fq \u003e alignment.sam   # alignment\n```\n***Importantly***, it should be noted that once you build the index, indexing\nparameters such as **-k**, **-w**, **-H** and **-I** can't be changed during\nmapping. If you are running minimap2 for different data types, you will\nprobably need to keep multiple indexes generated with different parameters.\nThis makes minimap2 different from BWA which always uses the same index\nregardless of query data types.\n\n### \u003ca name=\"cases\"\u003e\u003c/a\u003eUse cases\n\nMinimap2 uses the same base algorithm for all applications. However, due to the\ndifferent data types it supports (e.g. short vs long reads; DNA vs mRNA reads),\nminimap2 needs to be tuned for optimal performance and accuracy. It is usually\nrecommended to choose a preset with option **-x**, which sets multiple\nparameters at the same time. The default setting is the same as `map-ont`.\n\n#### \u003ca name=\"map-long-genomic\"\u003e\u003c/a\u003eMap long noisy genomic reads\n\n```sh\nminimap2 -ax map-pb  ref.fa pacbio-reads.fq \u003e aln.sam   # for PacBio CLR reads\nminimap2 -ax map-ont ref.fa ont-reads.fq \u003e aln.sam      # for Oxford Nanopore reads\nminimap2 -ax map-iclr ref.fa iclr-reads.fq \u003e aln.sam    # for Illumina Complete Long Reads\n```\nThe difference between `map-pb` and `map-ont` is that `map-pb` uses\nhomopolymer-compressed (HPC) minimizers as seeds, while `map-ont` uses ordinary\nminimizers as seeds. Empirical evaluation suggests HPC minimizers improve\nperformance and sensitivity when aligning PacBio CLR reads, but hurt when aligning\nNanopore reads. 
`map-iclr` uses an adjusted alignment scoring matrix that\naccounts for the low overall error rate in the reads, with transversion errors\nbeing less frequent than transitions.\n\n#### \u003ca name=\"map-long-splice\"\u003e\u003c/a\u003eMap long mRNA/cDNA reads\n\n```sh\nminimap2 -ax splice:hq -uf ref.fa iso-seq.fq \u003e aln.sam       # PacBio Iso-seq/traditional cDNA\nminimap2 -ax splice ref.fa nanopore-cdna.fa \u003e aln.sam        # Nanopore 2D cDNA-seq\nminimap2 -ax splice -uf -k14 ref.fa direct-rna.fq \u003e aln.sam  # Nanopore Direct RNA-seq\nminimap2 -ax splice --splice-flank=no SIRV.fa SIRV-seq.fa    # mapping against SIRV control\n```\nThere are different long-read RNA-seq technologies, including traditional\nfull-length cDNA, EST, PacBio Iso-seq, Nanopore 2D cDNA-seq and Direct RNA-seq.\nThey produce data of varying quality and properties. By default, `-x splice`\nassumes the read orientation relative to the transcript strand is unknown. It\ntries two rounds of alignment to infer the orientation and writes the strand to\nthe `ts` SAM/PAF tag if possible. For Iso-seq, Direct RNA-seq and traditional\nfull-length cDNAs, it is advisable to apply `-u f` to force minimap2 to\nconsider the forward transcript strand only. This speeds up alignment with a\nslight improvement to accuracy. For noisy Nanopore Direct RNA-seq reads, it is\nrecommended to use a smaller k-mer size for increased sensitivity to the first\nor the last exons.\n\nMinimap2 rates an alignment by the score of the max-scoring sub-segment,\n*excluding* introns, and marks the best alignment as primary in SAM. When a\nspliced gene also has unspliced pseudogenes, minimap2 slightly prefers\nthe spliced alignment. By default, minimap2 outputs up to five secondary\nalignments (i.e. likely pseudogenes in the context of RNA-seq mapping). 
This\ncan be tuned with option **-N**.\n\nFor long RNA-seq reads, minimap2 may produce chimeric alignments potentially\ncaused by gene fusions/structural variations or by an intron longer than the\nmax intron length **-G** (200k by default). For now, it is not recommended to\napply an excessively large **-G** as this slows down minimap2 and sometimes\nleads to false alignments.\n\nIt is worth noting that by default `-x splice` prefers GT[A/G]..[C/T]AG\nover GT[C/T]..[A/G]AG, and then over other splicing signals. Considering\none additional base improves the junction accuracy for noisy reads, but\nreduces the accuracy when aligning against the widely used SIRV control data.\nThis is because SIRV does not honor the evolutionarily conserved splicing\nsignal. If you are studying SIRV, you may apply `--splice-flank=no` to let\nminimap2 only model GT..AG, ignoring the additional base.\n\nSince v2.17, minimap2 can optionally take annotated genes as input and\nprioritize annotated splice junctions. To use this feature, run:\n```sh\npaftools.js gff2bed anno.gff \u003e anno.bed\nminimap2 -ax splice --junc-bed anno.bed ref.fa query.fa \u003e aln.sam\n```\nHere, `anno.gff` is the gene annotation in the GTF or GFF3 format (`gff2bed`\nautomatically detects the format). The output of `gff2bed` is in the 12-column\nBED format, or the BED12 format. With the `--junc-bed` option, minimap2 adds a\nbonus score (tuned by `--junc-bonus`) if an aligned junction matches a junction\nin the annotation. Option `--junc-bed` also takes 5-column BED, including the\nstrand field. 
In this case, each line indicates an oriented junction.\n\n**Note:** `--junc-bed` is intended for long noisy RNA-seq reads only.\nApplying the option to short RNA-seq reads would increase run time with little\nimprovement to junction accuracy.\n\n#### \u003ca name=\"long-overlap\"\u003e\u003c/a\u003eFind overlaps between long reads\n\n```sh\nminimap2 -x ava-pb  reads.fq reads.fq \u003e ovlp.paf    # PacBio CLR read overlap\nminimap2 -x ava-ont reads.fq reads.fq \u003e ovlp.paf    # Oxford Nanopore read overlap\n```\nAs with `map-pb` and `map-ont`, `ava-pb` uses HPC minimizers while `ava-ont` uses ordinary\nminimizers. It is usually not recommended to perform base-level alignment in\nthe overlapping mode because it is slow and may produce false positive\noverlaps. However, if performance is not a concern, you may add `-a` or\n`-c` anyway.\n\n#### \u003ca name=\"short-genomic\"\u003e\u003c/a\u003eMap short genomic reads\n\n```sh\nminimap2 -ax sr ref.fa reads-se.fq \u003e aln.sam           # single-end alignment\nminimap2 -ax sr ref.fa read1.fq read2.fq \u003e aln.sam     # paired-end alignment\nminimap2 -ax sr ref.fa reads-interleaved.fq \u003e aln.sam  # paired-end alignment\n```\nWhen two read files are specified, minimap2 reads from each file in turn and\nmerges them into an interleaved stream internally. Two reads are considered to\nbe paired if they are adjacent in the input stream and have the same name (with\nthe `/[0-9]` suffix trimmed if present). 
Single- and paired-end reads can be\nmixed.\n\n#### \u003ca name=\"short-rna-seq\"\u003e\u003c/a\u003eMap short RNA-seq reads\n\n```sh\nminimap2 -ax splice:sr ref.fa reads-se.fq.gz \u003e aln.sam           # single-end\nminimap2 -ax splice:sr ref.fa r1.fq.gz r2.fq.gz \u003e aln.sam        # paired-end\nminimap2 -ax splice:sr -j anno.bed ref.fa r1.fq r2.fq \u003e aln.sam  # use annotation\n# 2-pass alignment\nminimap2 -x splice:sr -j anno.bed --write-junc ref.fa r1.fq r2.fq \u003e junc.bed\nminimap2 -ax splice:sr -j anno.bed --pass1=junc.bed ref.fa r1.fq r2.fq \u003e aln.sam\n```\nThe new preset `splice:sr` was added in v2.29. It functions similarly to `sr`\nexcept that it performs spliced alignment.\n\n#### \u003ca name=\"full-genome\"\u003e\u003c/a\u003eFull genome/assembly alignment\n\n```sh\nminimap2 -ax asm5 ref.fa asm.fa \u003e aln.sam       # assembly to assembly/ref alignment\n```\nFor cross-species full-genome alignment, the scoring system needs to be tuned\naccording to the sequence divergence.\n\n### \u003ca name=\"advanced\"\u003e\u003c/a\u003eAdvanced features\n\n#### \u003ca name=\"long-cigar\"\u003e\u003c/a\u003eWorking with \u003e65535 CIGAR operations\n\nDue to a design flaw, BAM does not work with CIGAR strings with \u003e65535\noperations (SAM and CRAM work). However, for ultra-long nanopore reads minimap2\nmay align ~1% of read bases with long CIGARs beyond the capability of BAM. If\nyou convert such SAM/CRAM to BAM, Picard and recent samtools will throw an\nerror and abort. Older samtools and other tools may create corrupted BAM.\n\nTo avoid this issue, you can add option `-L` at the minimap2 command line.\nThis option moves a long CIGAR to the `CG` tag and leaves a fully clipped CIGAR\nat the SAM CIGAR column. Current tools that don't read CIGAR (e.g. merging and\nsorting) still work with such BAM records; tools that read CIGAR will\neffectively ignore these records. 
It has been decided that future tools\nwill seamlessly recognize long-cigar records generated by option `-L`.\n\n**TL;DR**: if you work with ultra-long reads and use tools that only process\nBAM files, please add option `-L`.\n\n#### \u003ca name=\"cs\"\u003e\u003c/a\u003eThe cs optional tag\n\nThe `cs` SAM/PAF tag encodes bases at mismatches and INDELs. It matches the regular\nexpression `/(:[0-9]+|\\*[a-z][a-z]|[=\\+\\-][A-Za-z]+)+/`. Like CIGAR, `cs`\nconsists of a series of operations. Each leading character specifies the\noperation; the following sequence is the one involved in the operation.\n\nThe `cs` tag is enabled by command line option `--cs`. The following alignment,\nfor example:\n```txt\nCGATCGATAAATAGAGTAG---GAATAGCA\n||||||   ||||||||||   |||| |||\nCGATCG---AATAGAGTAGGTCGAATtGCA\n```\nis represented as `:6-ata:10+gtc:4*at:3`, where `:[0-9]+` represents an\nidentical block, `-ata` represents a deletion, `+gtc` an insertion and `*at`\nindicates reference base `a` is substituted with a query base `t`. It is\nsimilar to the `MD` SAM tag but is standalone and easier to parse.\n\nIf `--cs=long` is used, the `cs` string also contains identical sequences in\nthe alignment. The above example will become\n`=CGATCG-ata=AATAGAGTAG+gtc=GAAT*at=GCA`. The long form of `cs` encodes both\nreference and query sequences in one string. The `cs` tag also encodes intron\npositions and splicing signals (see the [minimap2 manpage][manpage-cs] for\ndetails).\n\n#### \u003ca name=\"paftools\"\u003e\u003c/a\u003eWorking with the PAF format\n\nMinimap2 also comes with a (java)script [paftools.js](misc/paftools.js) that\nprocesses alignments in the PAF format. It calls variants from\nassembly-to-reference alignment, lifts over BED files based on alignment,\nconverts between formats and provides utilities for various evaluations. 
For\ndetails, please see [misc/README.md](misc/README.md).\n\n### \u003ca name=\"algo\"\u003e\u003c/a\u003eAlgorithm overview\n\nIn the following, minimap2 command line options have a leading dash and are\nhighlighted in bold. The description may help to tune minimap2 parameters.\n\n1. Read **-I** [=*4G*] reference bases, extract (**-k**,**-w**)-minimizers and\n   index them in a hash table.\n\n2. Read **-K** [=*200M*] query bases. For each query sequence, do steps 3\n   through 7:\n\n3. For each (**-k**,**-w**)-minimizer on the query, check against the reference\n   index. If a reference minimizer is not among the top **-f** [=*2e-4*] most\n   frequent, collect its occurrences in the reference, which are called\n   *seeds*.\n\n4. Sort seeds by position in the reference. Chain them with dynamic\n   programming. Each chain represents a potential mapping. For read\n   overlapping, report all chains and then go to step 8. For reference mapping,\n   do steps 5 through 7:\n\n5. Let *P* be the set of primary mappings, which is an empty set initially. For\n   each chain from the best to the worst according to their chaining scores: if\n   on the query, the chain overlaps with a chain in *P* by **--mask-level**\n   [=*0.5*] or higher fraction of the shorter chain, mark the chain as\n   *secondary* to the chain in *P*; otherwise, add the chain to *P*.\n\n6. Retain all primary mappings. Also retain up to **-N** [=*5*] top secondary\n   mappings if their chaining scores are higher than **-p** [=*0.8*] of their\n   corresponding primary mappings.\n\n7. If alignment is requested, filter out an internal seed if it potentially\n   leads to both a long insertion and a long deletion. Extend from the\n   left-most seed. Perform global alignments between internal seeds.  Split the\n   chain if the cumulative score along the global alignment drops by **-z**\n   [=*400*], disregarding long gaps. Extend from the right-most seed.  Output\n   chains and their alignments.\n\n8. 
If there are more query sequences in the input, go to step 2 until no more\n   queries are left.\n\n9. If there are more reference sequences, reopen the query file from the start\n   and go to step 1; otherwise stop.\n\n### \u003ca name=\"help\"\u003e\u003c/a\u003eGetting help\n\nManpage [minimap2.1][manpage] provides detailed description of minimap2\ncommand line options and optional tags. The [FAQ](FAQ.md) page answers several\nfrequently asked questions. If you encounter bugs or have further questions or\nrequests, you can raise an issue at the [issue page][issue].  There is not a\nspecific mailing list for the time being.\n\n### \u003ca name=\"cite\"\u003e\u003c/a\u003eCiting minimap2\n\nIf you use minimap2 in your work, please cite:\n\n\u003e Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences.\n\u003e *Bioinformatics*, **34**:3094-3100. [doi:10.1093/bioinformatics/bty191][doi]\n\nand/or:\n\n\u003e Li, H. (2021). New strategies to improve minimap2 alignment accuracy.\n\u003e *Bioinformatics*, **37**:4572-4574. [doi:10.1093/bioinformatics/btab705][doi2]\n\n## \u003ca name=\"dguide\"\u003e\u003c/a\u003eDevelopers' Guide\n\nMinimap2 is not only a command line tool, but also a programming library.\nIt provides C APIs to build/load index and to align sequences against the\nindex. File [example.c](example.c) demonstrates typical uses of C APIs. Header\nfile [minimap.h](minimap.h) gives more detailed API documentation. Minimap2\naims to keep APIs in this header stable. File [mmpriv.h](mmpriv.h) contains\nadditional private APIs which may be subjected to changes frequently.\n\nThis repository also provides Python bindings to a subset of C APIs. File\n[python/README.rst](python/README.rst) gives the full documentation;\n[python/minimap2.py](python/minimap2.py) shows an example. 
This Python\nextension, mappy, is also [available from PyPI][mappypypi] via `pip install\nmappy` or [from BioConda][mappyconda] via `conda install -c bioconda mappy`.\n\n## \u003ca name=\"limit\"\u003e\u003c/a\u003eLimitations\n\n* Minimap2 may produce suboptimal alignments through long low-complexity\n  regions where seed positions may be suboptimal. This should not be a big\n  concern because even the optimal alignment may be wrong in such regions.\n\n* Minimap2 requires SSE2 instructions on x86 CPUs or NEON on ARM CPUs. It is\n  possible to add non-SIMD support, but it would make minimap2 slower by\n  several times.\n\n* Minimap2 does not work with a single query or database sequence ~2\n  billion bases or longer (2,147,483,647 to be exact). The total length of all\n  sequences can well exceed this threshold.\n\n* Minimap2 often misses small exons.\n\n\n\n[paf]: https://github.com/lh3/miniasm/blob/master/PAF.md\n[sam]: https://samtools.github.io/hts-specs/SAMv1.pdf\n[minimap]: https://github.com/lh3/minimap\n[smartdenovo]: https://github.com/ruanjue/smartdenovo\n[longislnd]: https://www.ncbi.nlm.nih.gov/pubmed/27667791\n[gaba]: https://github.com/ocxtal/libgaba\n[ksw2]: https://github.com/lh3/ksw2\n[preprint]: https://arxiv.org/abs/1708.01492\n[release]: https://github.com/lh3/minimap2/releases\n[mappypypi]: https://pypi.python.org/pypi/mappy\n[mappyconda]: https://anaconda.org/bioconda/mappy\n[issue]: https://github.com/lh3/minimap2/issues\n[k8]: https://github.com/attractivechaos/k8\n[manpage]: https://lh3.github.io/minimap2/minimap2.html\n[manpage-cs]: https://lh3.github.io/minimap2/minimap2.html#10\n[doi]: https://doi.org/10.1093/bioinformatics/bty191\n[doi2]: https://doi.org/10.1093/bioinformatics/btab705\n[simde]: https://github.com/nemequ/simde\n[unimap]: 
https://github.com/lh3/unimap\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flh3%2Fminimap2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flh3%2Fminimap2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flh3%2Fminimap2/lists"}