{"id":16916920,"url":"https://github.com/althonos/pyfamsa","last_synced_at":"2025-04-09T16:17:21.858Z","repository":{"id":50569513,"uuid":"518887966","full_name":"althonos/pyfamsa","owner":"althonos","description":"Cython bindings and Python interface to FAMSA, an algorithm for ultra-scale multiple sequence alignments.","archived":false,"fork":false,"pushed_at":"2025-03-04T00:13:09.000Z","size":296,"stargazers_count":31,"open_issues_count":3,"forks_count":3,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-09T16:17:09.798Z","etag":null,"topics":["bioinformatics","cython-library","genomics","multiple-sequence-alignment","python-bindings","python-library","sequence-alignment"],"latest_commit_sha":null,"homepage":"","language":"Cython","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/althonos.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-28T14:54:04.000Z","updated_at":"2025-03-18T16:48:43.000Z","dependencies_parsed_at":"2024-08-27T18:14:58.697Z","dependency_job_id":"03aeb686-7e1d-437c-ae4e-2e82076d6f66","html_url":"https://github.com/althonos/pyfamsa","commit_stats":{"total_commits":124,"total_committers":1,"mean_commits":124.0,"dds":0.0,"last_synced_commit":"838a3aeb78f8db9ccdacbf1b14bfc6b2475f12c7"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fpyfamsa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fpyfamsa/tags","releases
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fpyfamsa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fpyfamsa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/althonos","download_url":"https://codeload.github.com/althonos/pyfamsa/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248065284,"owners_count":21041872,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","cython-library","genomics","multiple-sequence-alignment","python-bindings","python-library","sequence-alignment"],"created_at":"2024-10-13T19:31:23.016Z","updated_at":"2025-04-09T16:17:21.829Z","avatar_url":"https://github.com/althonos.png","language":"Cython","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🐍🧮 PyFAMSA [![Stars](https://img.shields.io/github/stars/althonos/pyfamsa.svg?style=social\u0026maxAge=3600\u0026label=Star)](https://github.com/althonos/pyfamsa/stargazers)\n\n*[Cython](https://cython.org/) bindings and Python interface to [FAMSA](https://github.com/refresh-bio/FAMSA), an algorithm for ultra-scale multiple sequence 
alignments.*\n\n[![Actions](https://img.shields.io/github/actions/workflow/status/althonos/pyfamsa/test.yml?branch=main\u0026logo=github\u0026style=flat-square\u0026maxAge=300)](https://github.com/althonos/pyfamsa/actions)\n[![Coverage](https://img.shields.io/codecov/c/gh/althonos/pyfamsa?style=flat-square\u0026maxAge=3600\u0026logo=codecov)](https://codecov.io/gh/althonos/pyfamsa/)\n[![License](https://img.shields.io/badge/license-GPLv3-blue.svg?style=flat-square\u0026maxAge=2678400)](https://choosealicense.com/licenses/gpl-3.0/)\n[![PyPI](https://img.shields.io/pypi/v/pyfamsa.svg?style=flat-square\u0026maxAge=3600\u0026logo=PyPI)](https://pypi.org/project/pyfamsa)\n[![Bioconda](https://img.shields.io/conda/vn/bioconda/pyfamsa?style=flat-square\u0026maxAge=3600\u0026logo=anaconda)](https://anaconda.org/bioconda/pyfamsa)\n[![AUR](https://img.shields.io/aur/version/python-pyfamsa?logo=archlinux\u0026style=flat-square\u0026maxAge=3600)](https://aur.archlinux.org/packages/python-pyfamsa)\n[![Wheel](https://img.shields.io/pypi/wheel/pyfamsa.svg?style=flat-square\u0026maxAge=3600)](https://pypi.org/project/pyfamsa/#files)\n[![Python Versions](https://img.shields.io/pypi/pyversions/pyfamsa.svg?style=flat-square\u0026maxAge=600\u0026logo=python)](https://pypi.org/project/pyfamsa/#files)\n[![Python 
Implementations](https://img.shields.io/pypi/implementation/pyfamsa.svg?style=flat-square\u0026maxAge=600\u0026label=impl)](https://pypi.org/project/pyfamsa/#files)\n[![Source](https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400\u0026style=flat-square)](https://github.com/althonos/pyfamsa/)\n[![Mirror](https://img.shields.io/badge/mirror-EMBL-009f4d?style=flat-square\u0026maxAge=2678400)](https://git.embl.de/larralde/pyfamsa/)\n[![Issues](https://img.shields.io/github/issues/althonos/pyfamsa.svg?style=flat-square\u0026maxAge=600)](https://github.com/althonos/pyfamsa/issues)\n[![Docs](https://img.shields.io/readthedocs/pyfamsa/latest?style=flat-square\u0026maxAge=600)](https://pyfamsa.readthedocs.io)\n[![Changelog](https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400\u0026style=flat-square)](https://github.com/althonos/pyfamsa/blob/main/CHANGELOG.md)\n[![Downloads](https://img.shields.io/pypi/dm/pyfamsa?style=flat-square\u0026color=303f9f\u0026maxAge=86400\u0026label=downloads)](https://pepy.tech/project/pyfamsa)\n\n\n***⚠️ This package is based on FAMSA 2.***\n\n## 🗺️ Overview\n\n[FAMSA](https://github.com/refresh-bio/FAMSA) is a method published in\n2016 by Deorowicz *et al.*[\\[1\\]](#ref1) for large-scale multiple sequence alignments.\nIt uses state-of-the-art time and memory optimizations as well as a fast\nguide tree heuristic to reach very high performance and accuracy.\n\nPyFAMSA is a Python module that provides bindings to [FAMSA](https://github.com/refresh-bio/FAMSA)\nusing [Cython](https://cython.org/). It implements a user-friendly, Pythonic\ninterface to align protein sequences using different parameters and access\nresults directly. 
It interacts with the FAMSA library interface, which has\nthe following advantages:\n\n- **single dependency**: PyFAMSA is distributed as a Python package, so you\n  can add it as a dependency to your project, and stop worrying about the\n  FAMSA binary being present on the end-user machine.\n- **no intermediate files**: Everything happens in memory, in a Python object\n  you control, so you don't have to invoke the FAMSA CLI using a\n  sub-process and temporary files.\n- **friendly interface**: The different guide tree build methods and\n  heuristics can be selected from the Python code with a simple keyword\n  argument when configuring a new [`Aligner`](https://pyfamsa.readthedocs.io/en/stable/api/aligner.html#pyfamsa.Aligner).\n- **custom scoring matrices**: You can use any custom scoring matrix from\n  the [`scoring-matrices`](https://pypi.org/project/scoring-matrices) library\n  in addition to the default MIQS matrix to score the alignment.\n\n## 🔧 Installing\n\nPyFAMSA can be installed directly from [PyPI](https://pypi.org/project/pyfamsa/),\nwhich hosts some pre-built wheels for the x86-64 and Aarch64 architectures\nfor Linux, macOS and Windows, as well as the code required to compile from\nsource with Cython:\n```console\n$ pip install pyfamsa\n```\n\nPyFAMSA is also available as a [Bioconda](https://bioconda.github.io/)\npackage:\n```console\n$ conda install -c bioconda pyfamsa\n```\n\nFor other installation methods, have a look at the [Installation page](https://pyfamsa.readthedocs.io/en/stable/guide/install.html) of the [online documentation](https://pyfamsa.readthedocs.io/).\n\n## 💡 Example\n\nLet's create some sequences in memory, align them using the UPGMA method\n(without any heuristic), and simply print the alignment on screen:\n\n```python\nfrom pyfamsa import Aligner, Sequence\n\nsequences = [\n    Sequence(b\"Sp8\",  b\"GLGKVIVYGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII\"),\n    Sequence(b\"Sp10\", b\"DPAVLFVIMLGTITKFSSEWFFAWLGLEINMMVII\"),\n    Sequence(b\"Sp26\", 
b\"AAAAAAAAALLTYLGLFLGTDYENFAAAAANAWLGLEINMMAQI\"),\n    Sequence(b\"Sp6\",  b\"ASGAILTLGIYLFTLCAVISVSWYLAWLGLEINMMAII\"),\n    Sequence(b\"Sp17\", b\"FAYTAPDLLLIGFLLKTVATFGDTWFQLWQGLDLNKMPVF\"),\n    Sequence(b\"Sp33\", b\"PTILNIAGLHMETDINFSLAWFQAWGGLEINKQAIL\"),\n]\n\naligner = Aligner(guide_tree=\"upgma\")\nmsa = aligner.align(sequences)\n\nfor sequence in msa:\n    print(sequence.id.decode().ljust(10), sequence.sequence.decode())\n```\n\nThis should output the following:\n```\nSp10       --------DPAVLFVIMLGTIT-KFS--SEWFFAWLGLEINMMVII\nSp17       ---FAYTAPDLLLIGFLLKTVA-TFG--DTWFQLWQGLDLNKMPVF\nSp26       AAAAAAAAALLTYLGLFLGTDYENFA--AAAANAWLGLEINMMAQI\nSp33       -------PTILNIAGLHMETDI-NFS--LAWFQAWGGLEINKQAIL\nSp6        ------ASGAILTLGIYLFTLCAVIS--VSWYLAWLGLEINMMAII\nSp8        ------GLGKVIVYGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII\n```\n\n## 🧶 Thread-safety\n\n`Aligner` objects are thread-safe, and the `align` method is re-entrant. You\ncan batch-process several alignments in parallel using a\n[`ThreadPool`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.ThreadPool) with a single\naligner object:\n```python\nimport glob\nimport multiprocessing.pool\nimport Bio.SeqIO\nfrom pyfamsa import Aligner, Sequence\n\nfamilies = [\n    [Sequence(r.id.encode(), r.seq.encode()) for r in Bio.SeqIO.parse(file, \"fasta\")]\n    for file in glob.glob(\"pyfamsa/tests/data/*.faa\")\n]\n\naligner = Aligner()\nwith multiprocessing.pool.ThreadPool() as pool:\n    alignments = pool.map(aligner.align, families)\n```\n\n\u003c!-- ## ⏱️ Benchmarks --\u003e\n\n## 🔎 See Also\n\nDone with your protein alignment? You may be interested in trimming it: in that\ncase, you could use the [`pytrimal`](https://github.com/althonos/pytrimal) Python\npackage, which wraps [trimAl](http://trimal.cgenomics.org/) 2.0. Or perhaps\nyou want to build an HMM from the alignment? 
Then maybe have a look at\n[`pyhmmer`](https://github.com/althonos/pyhmmer), a Python package which\nwraps [HMMER](http://hmmer.org/).\n\n## 💭 Feedback\n\n### ⚠️ Issue Tracker\n\nFound a bug? Have an enhancement request? Head over to the [GitHub issue tracker](https://github.com/althonos/pyfamsa/issues)\nif you need to report or ask something. If you are filing a bug report,\nplease include as much information as you can about the issue, and try to\nrecreate the same bug in a simple, easily reproducible situation.\n\n\n### 🏗️ Contributing\n\nContributions are more than welcome! See\n[`CONTRIBUTING.md`](https://github.com/althonos/pyfamsa/blob/main/CONTRIBUTING.md)\nfor more details.\n\n\n## 📋 Changelog\n\nThis project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)\nand provides a [changelog](https://github.com/althonos/pyfamsa/blob/main/CHANGELOG.md)\nin the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.\n\n\n## ⚖️ License\n\nThis library is provided under the [GNU General Public License v3.0](https://choosealicense.com/licenses/gpl-3.0/). FAMSA is developed by the\n[REFRESH Bioinformatics Group](https://refresh-bio.github.io/) and is\ndistributed under the terms of the GPLv3 as well. See `vendor/FAMSA/LICENSE`\nfor more information. In addition, FAMSA vendors several libraries for\ncompatibility, all of which are redistributed with PyFAMSA under their own\nterms: `atomic_wait` (MIT License), `mimalloc` (MIT License), `libdeflate`\n(MIT License), Boost (Boost Software License).\n\n*This project is in no way affiliated with, sponsored by, or otherwise endorsed\nby the [FAMSA authors](https://github.com/refresh-bio). 
It was developed\nby [Martin Larralde](https://github.com/althonos/) during his PhD project\nat the [European Molecular Biology Laboratory](https://www.embl.de/) in\nthe [Zeller team](https://github.com/zellerlab).*\n\n\n## 📚 References\n\n- \u003ca id=\"ref1\"\u003e\\[1\\]\u003c/a\u003e Deorowicz, Sebastian, Debudaj-Grabysz, Agnieszka \u0026 Gudyś, Adam. ‘FAMSA: Fast and accurate multiple sequence alignment of huge protein families’. Sci Rep 6, 33964 (2016). [doi:10.1038/srep33964](https://doi.org/10.1038/srep33964)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falthonos%2Fpyfamsa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falthonos%2Fpyfamsa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falthonos%2Fpyfamsa/lists"}