{"id":16916910,"url":"https://github.com/althonos/pyfastani","last_synced_at":"2026-02-26T11:33:25.893Z","repository":{"id":45843715,"uuid":"376186669","full_name":"althonos/pyfastani","owner":"althonos","description":"Cython bindings and Python interface to FastANI, a method for fast whole-genome similarity estimation.","archived":false,"fork":false,"pushed_at":"2025-10-15T18:54:29.000Z","size":445,"stargazers_count":23,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-11-12T06:20:01.002Z","etag":null,"topics":["ani","average-nucleotide-identity","bioinformatics","cython-library","metagenomes","python-bindings","python-library","taxonomy"],"latest_commit_sha":null,"homepage":"","language":"Cython","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/althonos.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-06-12T02:39:35.000Z","updated_at":"2025-10-15T18:54:34.000Z","dependencies_parsed_at":"2024-12-04T17:19:04.190Z","dependency_job_id":"981367a4-83a0-447e-bb62-81ed9f323a06","html_url":"https://github.com/althonos/pyfastani","commit_stats":{"total_commits":182,"total_committers":1,"mean_commits":182.0,"dds":0.0,"last_synced_commit":"87b0f905a121a1a2087aa5293308460a8ad19465"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/althonos/pyfastani","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fpyfastani","tags_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/althonos%2Fpyfastani/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fpyfastani/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fpyfastani/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/althonos","download_url":"https://codeload.github.com/althonos/pyfastani/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fpyfastani/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29857555,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-26T08:51:08.701Z","status":"ssl_error","status_checked_at":"2026-02-26T08:50:19.607Z","response_time":89,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ani","average-nucleotide-identity","bioinformatics","cython-library","metagenomes","python-bindings","python-library","taxonomy"],"created_at":"2024-10-13T19:31:19.597Z","updated_at":"2026-02-26T11:33:25.871Z","avatar_url":"https://github.com/althonos.png","language":"Cython","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🐍⏩🧬 PyFastANI [![Stars](https://img.shields.io/github/stars/althonos/pyfastani.svg?style=social\u0026maxAge=3600\u0026label=Star)](https://github.com/althonos/pyfastani/stargazers)\n\n*[Cython](https://cython.org/) bindings and 
Python interface to [FastANI](https://github.com/ParBLiSS/FastANI/), a method for fast whole-genome similarity estimation.\n**Now with multithreading!***\n\n[![Actions](https://img.shields.io/github/actions/workflow/status/althonos/pyfastani/test.yml?branch=main\u0026logo=github\u0026style=flat-square\u0026maxAge=300)](https://github.com/althonos/pyfastani/actions)\n[![Coverage](https://img.shields.io/codecov/c/gh/althonos/pyfastani/branch/main.svg?style=flat-square\u0026maxAge=3600)](https://codecov.io/gh/althonos/pyfastani/)\n[![License](https://img.shields.io/badge/license-MIT-blue.svg?style=flat-square\u0026maxAge=2678400)](https://choosealicense.com/licenses/mit/)\n[![PyPI](https://img.shields.io/pypi/v/pyfastani.svg?style=flat-square\u0026maxAge=3600)](https://pypi.org/project/pyfastani)\n[![Bioconda](https://img.shields.io/conda/vn/bioconda/pyfastani?style=flat-square\u0026maxAge=3600\u0026logo=anaconda)](https://anaconda.org/bioconda/pyfastani)\n[![AUR](https://img.shields.io/aur/version/python-pyfastani?logo=archlinux\u0026style=flat-square\u0026maxAge=3600)](https://aur.archlinux.org/packages/python-pyfastani)\n[![Wheel](https://img.shields.io/pypi/wheel/pyfastani.svg?style=flat-square\u0026maxAge=3600)](https://pypi.org/project/pyfastani/#files)\n[![Python Versions](https://img.shields.io/pypi/pyversions/pyfastani.svg?style=flat-square\u0026maxAge=600)](https://pypi.org/project/pyfastani/#files)\n[![Python 
Implementations](https://img.shields.io/pypi/implementation/pyfastani.svg?style=flat-square\u0026maxAge=600\u0026label=impl)](https://pypi.org/project/pyfastani/#files)\n[![Source](https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400\u0026style=flat-square)](https://github.com/althonos/pyfastani/)\n[![Mirror](https://img.shields.io/badge/mirror-EMBL-009f4d?style=flat-square\u0026maxAge=2678400)](https://git.embl.de/larralde/pyfastani/)\n[![Issues](https://img.shields.io/github/issues/althonos/pyfastani.svg?style=flat-square\u0026maxAge=600)](https://github.com/althonos/pyfastani/issues)\n[![Docs](https://img.shields.io/readthedocs/pyfastani/latest?style=flat-square\u0026maxAge=600)](https://pyfastani.readthedocs.io)\n[![Changelog](https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400\u0026style=flat-square)](https://github.com/althonos/pyfastani/blob/master/CHANGELOG.md)\n[![Downloads](https://img.shields.io/pypi/dm/pyfastani?style=flat-square\u0026color=303f9f\u0026maxAge=86400\u0026label=downloads)](https://pepy.tech/project/pyfastani)\n[![Paper](https://img.shields.io/badge/paper-nargab%2Flqaf095-darkblue?style=flat-square\u0026maxAge=2678400)](https://academic.oup.com/nargab/article/7/3/lqaf095/8196481)\n\n\n## 🗺️ Overview\n\nFastANI is a method published in 2018 by [Chirag Jain](https://github.com/cjain7)\n*et al.* for high-throughput computation of whole-genome\n[Average Nucleotide Identity (ANI)](https://img.jgi.doe.gov/docs/ANI.pdf).\nIt uses [MashMap](https://github.com/marbl/MashMap) to compute orthologous mappings\nwithout the need for expensive alignments.\n\n\n`pyfastani` is a Python module, implemented using the [Cython](https://cython.org/)\nlanguage, that provides bindings to FastANI. 
It directly interacts with the\nFastANI internals, which has the following advantages over CLI wrappers:\n\n- **simpler compilation**: FastANI requires several additional libraries,\n  which make compilation of the original binary non-trivial. In PyFastANI,\n  libraries that were needed for threading or I/O are provided as stubs,\n  and `Boost::math` headers are vendored so you can build the package without\n  hassle. Or even better, just install from one of the provided wheels!\n- **single dependency**: If your software or your analysis pipeline is\n  distributed as a Python package, you can add `pyfastani` as a dependency to\n  your project, and stop worrying about the FastANI binary being present on\n  the end-user machine.\n- **sans I/O**: Everything happens in memory, in Python objects you control,\n  making it easier to pass your sequences to FastANI\n  without needing to write them to a temporary file.\n- **multi-threading**: Genome query resolves the fragment mapping step in\n  parallel, leading to shorter querying times even with a single genome.\n\n*This library is still a work-in-progress, and in an experimental stage,\nbut it should already pack enough features to be used in a standard pipeline.*\n\n\n## 🔧 Installing\n\nPyFastANI can be installed directly from [PyPI](https://pypi.org/project/pyfastani/),\nwhich hosts some pre-built CPython wheels for x86-64 Unix platforms, as well\nas the code required to compile from source with Cython:\n```console\n$ pip install pyfastani\n```\n\nIn the event you have to compile the package from source, all the required\nlibraries are vendored in the source distribution, so you'll only need a\nC/C++ compiler.\n\nOtherwise, PyFastANI is also available as a [Bioconda](https://bioconda.github.io/)\npackage:\n```console\n$ conda install -c bioconda pyfastani\n```\n\n## 💡 Example\n\nThe following snippets show how to compute the ANI between two genomes,\nwith the reference being a draft genome. 
For one-to-many or many-to-many\nsearches, simply add additional references with `sketch.add_draft` before indexing.\n*Note that any name can be given to the reference sequences; this will only\naffect the `name` attribute of the hits returned for a query.*\n\n### 🔬 [Biopython](https://github.com/biopython/biopython)\n\nBiopython does not let us access the sequence directly, so we need to\nconvert it to bytes first with the `bytes` builtin function. For older\nversions of Biopython (earlier than 1.79), use `record.seq.encode()`\ninstead of `bytes(record.seq)`.\n\n```python\nimport pyfastani\nimport Bio.SeqIO\n\nsketch = pyfastani.Sketch()\n\n# add a single draft genome to the sketch\nref = list(Bio.SeqIO.parse(\"vendor/FastANI/data/Shigella_flexneri_2a_01.fna\", \"fasta\"))\nsketch.add_draft(\"S. flexneri\", (bytes(record.seq) for record in ref))\n\n# index the sketch and get a mapper\nmapper = sketch.index()\n\n# read the query and query the mapper\nquery = Bio.SeqIO.read(\"vendor/FastANI/data/Escherichia_coli_str_K12_MG1655.fna\", \"fasta\")\nhits = mapper.query_sequence(bytes(query.seq))\n\nfor hit in hits:\n    print(\"E. coli K12 MG1655\", hit.name, hit.identity, hit.matches, hit.fragments)\n```\n\n### 🧪 [Scikit-bio](https://github.com/biocore/scikit-bio)\n\nScikit-bio lets us access the sequence directly as a `numpy` array, but\nshows the values as byte strings by default. 
To make them readable as\n`char` (for compatibility with the C code), they must be cast with\n`seq.values.view('B')`.\n\n```python\nimport pyfastani\nimport skbio.io\n\nsketch = pyfastani.Sketch()\n\nref = list(skbio.io.read(\"vendor/FastANI/data/Shigella_flexneri_2a_01.fna\", \"fasta\"))\nsketch.add_draft(\"Shigella_flexneri_2a_01\", (seq.values.view('B') for seq in ref))\n\nmapper = sketch.index()\n\n# read the query and query the mapper\nquery = next(skbio.io.read(\"vendor/FastANI/data/Escherichia_coli_str_K12_MG1655.fna\", \"fasta\"))\nhits = mapper.query_genome(query.values.view('B'))\n\nfor hit in hits:\n    print(\"E. coli K12 MG1655\", hit.name, hit.identity, hit.matches, hit.fragments)\n```\n\n## ⏱️ Benchmarks\n\nIn the original FastANI tool, multi-threading was only used to improve the\nperformance of many-to-many searches: each thread would have a chunk of the\nreference genomes, and querying would be done in parallel for each reference.\nHowever, with a small set of reference genomes, there may not be enough for\nall the threads to work, so it cannot scale with a large number of threads. In\naddition, this causes the same query genome to be hashed several times, which\nis not optimal. In `pyfastani`, multi-threading is used to compute the hashes and mapping of query genome fragments. This allows parallelism to be useful even\nwhen only a few reference genomes are available.\n\nThe benchmarks below show the time for querying a single genome (with\n`Mapper.query_draft`) using a variable number of threads. *Benchmarks\nwere run on an [i7-8550U CPU](https://www.intel.fr/content/www/fr/fr/products/sku/122589/) running @1.80GHz with 4 physical / 8 logical\ncores, using 50 bacterial genomes from the [proGenomes](https://progenomes.embl.de/) database.\nFor clarity, only 5 randomly-selected genomes are shown on the second graph. 
Each run was repeated 3 times.*\n\n![Benchmarks](https://raw.githubusercontent.com/althonos/pyfastani/main/benches/mapping/v0.4.0.svg)\n\n## 🔖 Citation\n\nIf you found PyFastANI useful, please cite [our paper](https://academic.oup.com/nargab/article/7/3/lqaf095/8196481),\nas well as the original [FastANI paper](https://www.nature.com/articles/s41467-018-07641-9).\n\nTo cite PyFastANI:\n\n\u003e Martin Larralde, Georg Zeller, Laura M. Carroll. 2025. PyOrthoANI, PyFastANI, and Pyskani: a suite of Python libraries for computation of average nucleotide identity. *NAR Genomics and Bioinformatics* 7(3):lqaf095. doi:10.1093/nargab/lqaf095.\n\nTo cite FastANI:\n\n\u003e Chirag Jain, Luis M Rodriguez-R, Adam M Phillippy, Konstantinos T Konstantinidis, Srinivas Aluru. 2018. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. *Nature Communications* 9(1):5114. doi:10.1038/s41467-018-07641-9.\n\n## 🔎 See Also\n\nComputing ANI for metagenomic sequences? You may be interested in\n[`pyskani`, a Python package for computing ANI](https://github.com/althonos/pyskani)\nusing the [`skani` method](https://www.biorxiv.org/content/10.1101/2023.01.18.524587v1)\ndeveloped by [Jim Shaw](https://jim-shaw-bluenote.github.io/)\nand [Yun William Yu](https://github.com/yunwilliamyu).\n\n## 💭 Feedback\n\n### ⚠️ Issue Tracker\n\nFound a bug? Have an enhancement request? Head over to the [GitHub issue\ntracker](https://github.com/althonos/pyfastani/issues) if you need to report\nor ask something. If you are filing a bug, please include as much\ninformation as you can about the issue, and try to recreate the same bug\nin a simple, easily reproducible situation.\n\n### 🏗️ Contributing\n\nContributions are more than welcome! 
See\n[`CONTRIBUTING.md`](https://github.com/althonos/pyfastani/blob/master/CONTRIBUTING.md)\nfor more details.\n\n\n## ⚖️ License\n\nThis library is provided under the [MIT License](https://choosealicense.com/licenses/mit/).\n\nThe FastANI code was written by [Chirag Jain](https://github.com/cjain7)\nand is distributed under the terms of the\n[Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/),\nunless otherwise specified in vendored sources. See `vendor/FastANI/LICENSE`\nfor more information.\nThe `cpu_features` code was written by [Guillaume Chatelet](https://github.com/gchatelet)\nand is distributed under the terms of the [Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/).\nSee `vendor/cpu_features/LICENSE` for more information.\nThe `Boost::math` headers were written by [Boost Libraries](https://www.boost.org/) contributors\nand are distributed under the terms of the [Boost Software License](https://choosealicense.com/licenses/bsl-1.0/).\nSee `vendor/boost-math/LICENSE` for more information.\n\n*This project is in no way affiliated with, sponsored, or otherwise endorsed\nby the [original FastANI authors](https://github.com/cjain7). It was developed by\n[Martin Larralde](https://github.com/althonos/) during his PhD project\nat the [European Molecular Biology Laboratory](https://www.embl.de/) in\nthe [Zeller team](https://github.com/zellerlab).*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falthonos%2Fpyfastani","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falthonos%2Fpyfastani","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falthonos%2Fpyfastani/lists"}