{"id":13751796,"url":"https://github.com/lh3/miniasm","last_synced_at":"2026-01-27T00:38:07.520Z","repository":{"id":139699888,"uuid":"45634074","full_name":"lh3/miniasm","owner":"lh3","description":"Ultrafast de novo assembly for long noisy reads (though having no consensus step)","archived":false,"fork":false,"pushed_at":"2025-07-19T03:08:43.000Z","size":876,"stargazers_count":345,"open_issues_count":57,"forks_count":69,"subscribers_count":27,"default_branch":"master","last_synced_at":"2025-12-08T01:22:48.448Z","etag":null,"topics":["bioinformatics","denovo-assembly","genomics"],"latest_commit_sha":null,"homepage":"","language":"TeX","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lh3.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-11-05T19:25:42.000Z","updated_at":"2025-12-04T13:14:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"7ed45736-7d08-49b4-822d-13dc0e0bb121","html_url":"https://github.com/lh3/miniasm","commit_stats":{"total_commits":169,"total_committers":4,"mean_commits":42.25,"dds":"0.017751479289940808","last_synced_commit":"ce615d1d6b8678d38f2f9d27c9dccd944436ae75"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/lh3/miniasm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fminiasm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fminiasm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fminiasm/releases","manifests_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/lh3%2Fminiasm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lh3","download_url":"https://codeload.github.com/lh3/miniasm/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fminiasm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28793959,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-26T21:49:50.245Z","status":"ssl_error","status_checked_at":"2026-01-26T21:48:29.455Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","denovo-assembly","genomics"],"created_at":"2024-08-03T09:00:54.913Z","updated_at":"2026-01-27T00:38:02.510Z","avatar_url":"https://github.com/lh3.png","language":"TeX","funding_links":[],"categories":["Ranked by starred repositories"],"sub_categories":[],"readme":"## Getting Started\n\n```sh\n# Download sample PacBio from the PBcR website\nwget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -\nln -s selfSampleData/pacbio_filtered.fastq reads.fq\n# Install minimap and miniasm (requiring gcc and zlib)\ngit clone https://github.com/lh3/minimap2 \u0026\u0026 (cd minimap2 \u0026\u0026 make)\ngit clone https://github.com/lh3/miniasm  \u0026\u0026 (cd miniasm  \u0026\u0026 make)\n# Overlap for PacBio reads (or use \"-x ava-ont\" for nanopore read 
overlapping)\nminimap2/minimap2 -x ava-pb -t8 reads.fq reads.fq | gzip -1 \u003e reads.paf.gz\n# Layout\nminiasm/miniasm -f reads.fq reads.paf.gz \u003e reads.gfa\n```\n\n## Introduction\n\nMiniasm is a very fast OLC-based *de novo* assembler for noisy long reads. It\ntakes all-vs-all read self-mappings (typically produced by [minimap][minimap]) as input\nand outputs an assembly graph in the [GFA][gfa] format. Unlike\nmainstream assemblers, miniasm does not have a consensus step. It simply\nconcatenates pieces of read sequences to generate the final [unitig][unitig]\nsequences. Thus the per-base error rate is similar to that of the raw input reads.\n\nMiniasm is still at an early stage of development. It has only been tested on\na dozen PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the\nmapping step, it takes about 3 minutes to assemble a bacterial genome. Under\nthe default setting, miniasm assembles 9 out of 12 PacBio datasets and 3 out of\n4 ONT datasets into a single contig. The 12 PacBio data sets are [PacBio E.\ncoli sample][PB-151103], [ERS473430][ERS473430], [ERS544009][ERS544009],\n[ERS554120][ERS554120], [ERS605484][ERS605484], [ERS617393][ERS617393],\n[ERS646601][ERS646601], [ERS659581][ERS659581], [ERS670327][ERS670327],\n[ERS685285][ERS685285], [ERS743109][ERS743109] and a [deprecated PacBio E.\ncoli data set][PB-deprecated]. ONT data were obtained from the [Loman\nLab][loman-ont].\n\nFor a *C. elegans* [PacBio data set][ce] (only 40X coverage is used, not the whole\ndataset), miniasm finishes the assembly, including read overlapping, in ~10\nminutes with 16 CPUs. The total assembly size is 105Mb; the N50 is 1.94Mb. In\ncomparison, [HGAP3][hgap] produces a 104Mb assembly with an N50 of 1.61Mb. [This\ndotter plot][ce-img] gives a global view of the miniasm assembly (on the X\naxis) and the HGAP3 assembly (on Y). They are broadly comparable. Of course,\nthe HGAP3 consensus sequences are much more accurate. 
In addition, on the whole\ndata set (assembled in ~30 min), the miniasm N50 drops to 1.79Mb. Miniasm\nstill needs improvement.\n\nMiniasm confirms that at least for high-coverage bacterial genomes, it is\npossible to generate long contigs from raw PacBio or ONT reads without error\ncorrection. It also shows that [minimap][minimap] can be used as a read\noverlapper, even though it is probably not as sensitive as more\nsophisticated overlappers such as [MHAP][mhap] and [DALIGNER][daligner].\nCoupled with long-read error correctors and consensus tools, miniasm\nmay also be useful for producing high-quality assemblies.\n\n## Algorithm Overview\n\n1. Crude read selection. For each read, find the longest contiguous region\n   covered by three good mappings. Get an approximate estimate of read\n   coverage.\n\n2. Fine read selection. Use the coverage information to find the good regions\n   again but with more stringent thresholds. Discard contained reads.\n\n3. Generate a [string graph][sg]. Prune tips, drop weak overlaps and collapse\n   short bubbles. These procedures are similar to those implemented in\n   short-read assemblers.\n\n4. Merge unambiguous overlaps to produce unitig sequences.\n\n## Limitations\n\n1. Consensus base quality is similar to input reads (may be fixed with a\n   consensus tool).\n\n2. Only tested on a dozen high-coverage PacBio/ONT data sets (more testing\n   needed).\n\n3. 
Prone to collapse repeats or segmental duplications longer than input reads\n   (hard to fix without error correction).\n\n\n\n[unitig]: http://wgs-assembler.sourceforge.net/wiki/index.php/Celera_Assembler_Terminology\n[minimap]: https://github.com/lh3/minimap\n[paf]: https://github.com/lh3/miniasm/blob/master/PAF.md\n[gfa]: https://github.com/pmelsted/GFA-spec/blob/master/GFA-spec.md\n[ERS473430]: http://www.ebi.ac.uk/ena/data/view/ERS473430\n[ERS544009]: http://www.ebi.ac.uk/ena/data/view/ERS544009\n[ERS554120]: http://www.ebi.ac.uk/ena/data/view/ERS554120\n[ERS605484]: http://www.ebi.ac.uk/ena/data/view/ERS605484\n[ERS617393]: http://www.ebi.ac.uk/ena/data/view/ERS617393\n[ERS646601]: http://www.ebi.ac.uk/ena/data/view/ERS646601\n[ERS659581]: http://www.ebi.ac.uk/ena/data/view/ERS659581\n[ERS670327]: http://www.ebi.ac.uk/ena/data/view/ERS670327\n[ERS685285]: http://www.ebi.ac.uk/ena/data/view/ERS685285\n[ERS743109]: http://www.ebi.ac.uk/ena/data/view/ERS743109\n[PB-151103]: https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-Bacterial-Assembly\n[PB-deprecated]: https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-20kb-Size-Selected-Library-with-P6-C4/ce0533c1d2a957488594f0b29da61ffa3e4627e8\n[ce]: https://github.com/PacificBiosciences/DevNet/wiki/C.-elegans-data-set\n[mhap]: https://github.com/marbl/MHAP\n[daligner]: https://github.com/thegenemyers/DALIGNER\n[sg]: http://bioinformatics.oxfordjournals.org/content/21/suppl_2/ii79.abstract\n[loman-ont]: http://lab.loman.net/2015/09/24/first-sqk-map-006-experiment/\n[hgap]: https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP\n[ce-img]: http://lh3lh3.users.sourceforge.net/download/ce-miniasm.png\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flh3%2Fminiasm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flh3%2Fminiasm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flh3%2Fminiasm/lists"}