{"id":32136904,"url":"https://github.com/bluenote-1577/skani","last_synced_at":"2026-02-19T13:34:03.756Z","repository":{"id":64071708,"uuid":"566050516","full_name":"bluenote-1577/skani","owner":"bluenote-1577","description":"Fast, robust ANI and aligned fraction for (metagenomic) genomes and contigs.","archived":false,"fork":false,"pushed_at":"2025-10-12T01:39:25.000Z","size":76490,"stargazers_count":224,"open_issues_count":5,"forks_count":14,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-10-21T04:49:09.561Z","etag":null,"topics":["average-nucleotide-identity","bioinformatics","metagenomics","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bluenote-1577.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-11-14T21:39:10.000Z","updated_at":"2025-10-20T15:58:33.000Z","dependencies_parsed_at":"2025-08-06T08:22:13.131Z","dependency_job_id":null,"html_url":"https://github.com/bluenote-1577/skani","commit_stats":{"total_commits":208,"total_committers":2,"mean_commits":104.0,"dds":0.009615384615384581,"last_synced_commit":"7f3207c1e07aefdf274eeb3f36d6c825d3cd5090"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/bluenote-1577/skani","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bluenote-1577%2Fskani","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bluenote-1577%2Fskani/tags","releases_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/bluenote-1577%2Fskani/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bluenote-1577%2Fskani/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bluenote-1577","download_url":"https://codeload.github.com/bluenote-1577/skani/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bluenote-1577%2Fskani/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29614974,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-19T13:04:20.082Z","status":"ssl_error","status_checked_at":"2026-02-19T13:03:33.775Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["average-nucleotide-identity","bioinformatics","metagenomics","rust"],"created_at":"2025-10-21T04:48:51.174Z","updated_at":"2026-02-19T13:34:03.750Z","avatar_url":"https://github.com/bluenote-1577.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# skani - accurate, fast nucleotide identity calculation for MAGs, genomes, and databases\n\n## Introduction\n\n**skani** is a program for calculating **average nucleotide identity** (ANI) and **aligned fraction** (AF) for DNA sequences (contigs/MAGs/genomes) and ANI \u003e ~80%.\n\nskani uses an approximate mapping method 
without base-level alignment to get ANI. It is orders of magnitude faster than BLAST-based methods and almost as accurate. skani offers:\n\n1. **Accurate ANI calculations for MAGs**. skani is accurate for incomplete and medium-quality metagenome-assembled genomes (MAGs). Pure sketching methods (e.g. Mash) may underestimate ANI for incomplete MAGs.\n\n2. **Aligned fraction results**. skani outputs the fraction of each genome that is aligned. \n\n3. **Fast computations**. Indexing/sketching is ~ 3x faster than Mash, and querying is about 25x faster than FastANI (but slower than Mash). \n\n4. **Efficient database search**. Querying a genome against a preprocessed database of \u003e65000 prokaryotic genomes takes seconds with a single processor and ~6 GB of RAM. Constructing a database from genome sequences takes minutes to an hour.\n\n##  Updates\n\n\u003e [!IMPORTANT]\n\u003e \n\u003e Skani v0.3.x is now released. v0.3 has breaking changes compared to versions \u003c= 0.2.x. \n\n### v0.3.0 - 2025-08-10\n\n* BREAKING: old `.sketch` files no longer work.\n* New `skani sketch` functionality. Creates a single database instead of individual `.sketch` files by default. The previous behaviour can be obtained via the `--separate-sketches` option.\n* skani should now use 30-40% less memory, at the cost of 5-10% longer runtimes.\n\nSee the [CHANGELOG](https://github.com/bluenote-1577/skani/blob/main/CHANGELOG.md) for skani's full version history. \n\n##  Install\n\n#### Option 1: Build from source\n\nRequirements:\n1. [rust](https://www.rust-lang.org/tools/install) programming language and associated tools such as cargo are required and assumed to be in PATH.\n2. A C compiler (e.g. GCC)\n3. make\n\nBuilding takes a few minutes (depending on # of cores).\n\n```sh\ngit clone https://github.com/bluenote-1577/skani\ncd skani\n\n# If default rust install directory is ~/.cargo\ncargo install --path . 
--root ~/.cargo\nskani dist refs/e.coli-EC590.fasta refs/e.coli-K12.fasta\n\n# If ~/.cargo doesn't exist use below commands instead\n#cargo build --release\n#./target/release/skani dist refs/e.coli-EC590.fasta refs/e.coli-K12.fasta\n```\n\nSee the [Releases](https://github.com/bluenote-1577/skani/releases) page for obtaining specific versions of skani.\n\n#### Option 2: Conda (source version: 0.3)\n[![Anaconda-Server Badge](https://anaconda.org/bioconda/skani/badges/version.svg)](https://anaconda.org/bioconda/skani)\n[![Anaconda-Server Badge](https://anaconda.org/bioconda/skani/badges/latest_release_date.svg)](https://anaconda.org/bioconda/skani)\n```sh\nconda install -c bioconda skani\n```\n\n#### Option 3: Pre-built x86-64 linux statically compiled executable\n\nWe offer a pre-built statically compiled executable for x86-64 Linux systems. That is, if you're on an x86-64 Linux system, you can just download the binary and run it without installing anything. \n\nFor using the latest version of skani: \n\n```sh\nwget https://github.com/bluenote-1577/skani/releases/download/latest/skani\nchmod +x skani\n./skani -h\n```\n\n**Important**: the binary runs slightly slower (3-10%) most of the time, but it can be drastically slower on some tasks. \n\n## Quick start\n\n```sh\n# compare two genomes for ANI. skani is symmetric, so order does not affect ANI\nskani dist genome1.fa genome2.fa \nskani dist genome2.fa genome1.fa \n\n# compare multiple genomes; all options take -t for multi-threading.\nskani dist -t 3 -q query1.fa query2.fa -r reference1.fa reference2.fa -o all-to-all_results.txt\n\n# compare individual fasta records (e.g. contigs)\nskani dist --qi -q assembly1.fa --ri -r assembly2.fa  \n\n# construct database and do memory-efficient search\nskani sketch genomes_to_search/* -o database\nskani search query1.fa query2.fa ... 
-d database\n\n# construct similarity matrix/edge list for all genomes in folder\nskani triangle genome_folder/* \u003e skani_ani_matrix.txt\nskani triangle genome_folder/* -E \u003e skani_ani_edge_list.txt\n\n# we provide a script in this repository for clustering/visualizing distance matrices.\n# requires python3, seaborn, scipy/numpy, and matplotlib.\npython scripts/clustermap_triangle.py skani_ani_matrix.txt \n\n```\n\n## Tutorials and manuals\n\n### [skani basic usage information](https://github.com/bluenote-1577/skani/wiki/skani-basic-usage-guide)\n\nFor more information about using the specific skani subcommands, see the [guide linked above](https://github.com/bluenote-1577/skani/wiki/skani-basic-usage-guide). \n\n### skani tutorials\n\n1. #### [Tutorial: setting up the GTDB prokaryotic genome database to search against](https://github.com/bluenote-1577/skani/wiki/Tutorial:-setting-up-the-GTDB-genome-database-to-search-against)\n2. #### [Tutorial: classifying entire assemblies against \u003e 85,000 genomes in under 2 minutes](https://github.com/bluenote-1577/skani/wiki/Tutorial:-classifying-entire-assemblies-(MAGs-or-contigs)-against-85,000-genomes-in-under-2-minutes)\n3. #### [Tutorial: strain-level clustering of MAGs using skani, and why Mash/FastANI have issues](https://github.com/bluenote-1577/skani/wiki/Tutorial:-strain-and-species-level-clustering-of-MAGs-with-skani-triangle)\n\n### [skani cookbook](https://github.com/bluenote-1577/skani/wiki/skani-cookbook)\n\nSome common use cases and parameter settings are outlined in the cookbook. \n\n### [Pre-sketched databases for searching](https://github.com/bluenote-1577/skani/wiki/Pre%E2%80%90sketched-databases)\n\nPre-sketched databases can be downloaded and quickly searched against. 
\n\n### [skani advanced usage information](https://github.com/bluenote-1577/skani/wiki/skani-advanced-usage-guide)\n\nSee the advanced usage guide linked above for more information about topics such as:\n\n* optimizing sensitivity/speed of skani\n* optimizing skani for long-reads or contigs\n* making skani more memory efficient for huge data sets\n\n## Output\n\nIf the resulting aligned fraction for the two genomes is \u003c 15%, no output is given. \n\n**In practice, this means that only results with \u003e ~82% ANI are reliably output** (with default parameters). See the [skani advanced usage guide](https://github.com/bluenote-1577/skani/wiki/skani-advanced-usage-guide) for information on how to compare lower ANI genomes. \n\nThe default output for `search` and `dist` looks like:\n```\nRef_file\tQuery_file\tANI\tAlign_fraction_ref\tAlign_fraction_query\tRef_name\tQuery_name\nrefs/e.coli-EC590.fasta\trefs/e.coli-K12.fasta\t99.39\t93.95\t93.37\tNZ_CP016182.2 Escherichia coli strain EC590 chromosome, complete genome\tNC_007779.1 Escherichia coli str. K-12 substr. W3110, complete sequence\n```\n- Ref_file: the filename of the reference.\n- Query_file: the filename of the query.\n- ANI: the ANI, as a percentage.\n- Align_fraction_ref/query: percentage of the reference/query covered by alignments.\n- Ref/Query_name: the id of the first record in the reference/query file.\n\nThe order of results depends on the command and is not guaranteed to be deterministic when \u003e 5000 query genomes are present. `dist` and `search` try to place the highest ANI results first. \n\n## Citation\n\nJim Shaw and Yun William Yu. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nature Methods (2023). https://doi.org/10.1038/s41592-023-02018-3\n\n## Feature requests, issues\n\nskani is actively being developed by me ([Jim Shaw](https://jim-shaw-bluenote.github.io/)). I'm more than happy to accommodate simple feature requests (different types of outputs, etc). 
Feel free to open an issue with your feature request on the GitHub repository. If you catch any bugs, please open an issue or e-mail me (e-mail on my website). \n\n## Calling skani from rust or python\n\n### Rust API\n\nIf you're interested in using skani as a rust library, check out the minimal example here: https://github.com/bluenote-1577/skani-lib-example. The documentation is currently minimal (https://docs.rs/skani/0.1.0/skani/) and I guarantee no API stability. \n\n### Python bindings \n\nIf you're interested in calling skani from python, see the [pyskani](https://github.com/althonos/pyskani) python interface and bindings to skani written by [Martin Larralde](https://github.com/althonos). Note: I am not personally involved in the pyskani project and do not offer guarantees on the correctness of the outputs. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbluenote-1577%2Fskani","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbluenote-1577%2Fskani","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbluenote-1577%2Fskani/lists"}