https://github.com/althonos/pyfamsa

Cython bindings and Python interface to FAMSA, an algorithm for ultra-scale multiple sequence alignments.
bioinformatics cython-library genomics multiple-sequence-alignment python-bindings python-library sequence-alignment

Host: GitHub
URL: https://github.com/althonos/pyfamsa
Owner: althonos
License: gpl-3.0
Created: 2022-07-28T14:54:04.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2025-03-04T00:13:09.000Z (3 months ago)
Last Synced: 2025-04-09T16:17:09.798Z (about 2 months ago)
Topics: bioinformatics, cython-library, genomics, multiple-sequence-alignment, python-bindings, python-library, sequence-alignment
Language: Cython
Homepage:
Size: 289 KB
Stars: 31
Watchers: 5
Forks: 3
Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: COPYING

# 🐍🧮 PyFAMSA [![Stars](https://img.shields.io/github/stars/althonos/pyfamsa.svg?style=social&maxAge=3600&label=Star)](https://github.com/althonos/pyfamsa/stargazers)

*[Cython](https://cython.org/) bindings and Python interface to [FAMSA](https://github.com/refresh-bio/FAMSA), an algorithm for ultra-scale multiple sequence alignments.*

[![Actions](https://img.shields.io/github/actions/workflow/status/althonos/pyfamsa/test.yml?branch=main&logo=github&style=flat-square&maxAge=300)](https://github.com/althonos/pyfamsa/actions)

[![Coverage](https://img.shields.io/codecov/c/gh/althonos/pyfamsa?style=flat-square&maxAge=3600&logo=codecov)](https://codecov.io/gh/althonos/pyfamsa/)

[![License](https://img.shields.io/badge/license-GPLv3-blue.svg?style=flat-square&maxAge=2678400)](https://choosealicense.com/licenses/gpl-3.0/)

[![PyPI](https://img.shields.io/pypi/v/pyfamsa.svg?style=flat-square&maxAge=3600&logo=PyPI)](https://pypi.org/project/pyfamsa)

[![Bioconda](https://img.shields.io/conda/vn/bioconda/pyfamsa?style=flat-square&maxAge=3600&logo=anaconda)](https://anaconda.org/bioconda/pyfamsa)

[![AUR](https://img.shields.io/aur/version/python-pyfamsa?logo=archlinux&style=flat-square&maxAge=3600)](https://aur.archlinux.org/packages/python-pyfamsa)

[![Wheel](https://img.shields.io/pypi/wheel/pyfamsa.svg?style=flat-square&maxAge=3600)](https://pypi.org/project/pyfamsa/#files)

[![Python Versions](https://img.shields.io/pypi/pyversions/pyfamsa.svg?style=flat-square&maxAge=600&logo=python)](https://pypi.org/project/pyfamsa/#files)

[![Python Implementations](https://img.shields.io/pypi/implementation/pyfamsa.svg?style=flat-square&maxAge=600&label=impl)](https://pypi.org/project/pyfamsa/#files)

[![Source](https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400&style=flat-square)](https://github.com/althonos/pyfamsa/)

[![Mirror](https://img.shields.io/badge/mirror-EMBL-009f4d?style=flat-square&maxAge=2678400)](https://git.embl.de/larralde/pyfamsa/)

[![Issues](https://img.shields.io/github/issues/althonos/pyfamsa.svg?style=flat-square&maxAge=600)](https://github.com/althonos/pyfamsa/issues)

[![Docs](https://img.shields.io/readthedocs/pyfamsa/latest?style=flat-square&maxAge=600)](https://pyfamsa.readthedocs.io)

[![Changelog](https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400&style=flat-square)](https://github.com/althonos/pyfamsa/blob/main/CHANGELOG.md)

[![Downloads](https://img.shields.io/pypi/dm/pyfamsa?style=flat-square&color=303f9f&maxAge=86400&label=downloads)](https://pepy.tech/project/pyfamsa)

***⚠️ This package is based on FAMSA 2.***

## 🗺️ Overview

[FAMSA](https://github.com/refresh-bio/FAMSA) is a method published in
2016 by Deorowicz *et al.*[\[1\]](#ref1) for large-scale multiple sequence alignments.
It uses state-of-the-art time and memory optimizations as well as a fast
guide tree heuristic to reach very high performance and accuracy.

PyFAMSA is a Python module that provides bindings to [FAMSA](https://github.com/refresh-bio/FAMSA)
using [Cython](https://cython.org/). It implements a user-friendly, Pythonic
interface to align protein sequences using different parameters and access
results directly. It interacts with the FAMSA library interface, which has
the following advantages:

- **single dependency**: PyFAMSA is distributed as a Python package, so you
  can add it as a dependency to your project, and stop worrying about the
  FAMSA binary being present on the end-user machine.
- **no intermediate files**: Everything happens in memory, in a Python object
  you control, so you don't have to invoke the FAMSA CLI using a
  sub-process and temporary files.
- **friendly interface**: The different guide tree build methods and
  heuristics can be selected from the Python code with a simple keyword
  argument when configuring a new [`Aligner`](https://pyfamsa.readthedocs.io/en/stable/api/aligner.html#pyfamsa.Aligner).
- **custom scoring matrices**: You can use any custom scoring matrix from
  the [`scoring-matrices`](https://pypi.org/project/scoring-matrices) library
  in addition to the default MIQS to score the alignment.

## 🔧 Installing

PyFAMSA can be installed directly from [PyPI](https://pypi.org/project/pyfamsa/),
which hosts some pre-built wheels for the x86-64 and Aarch64 architectures
for Linux, macOS and Windows, as well as the code required to compile from
source with Cython:

```console
$ pip install pyfamsa
```

PyFAMSA is also available as a [Bioconda](https://bioconda.github.io/)
package:

```console
$ conda install -c bioconda pyfamsa
```

Otherwise, have a look at the [Installation page](https://pyfamsa.readthedocs.io/en/stable/guide/install.html) of the [online documentation](https://pyfamsa.readthedocs.io/).
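
To confirm from Python that the package landed in the active environment, the standard library's `importlib.metadata` can be queried. This is only a minimal sketch: `installed_version` is a hypothetical helper written for illustration, not part of PyFAMSA.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if PyFAMSA is installed, None otherwise.
print(installed_version("pyfamsa"))
```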

## 💡 Example

Let's create some sequences in memory, align them using the UPGMA method
(without any heuristic), and simply print the alignment on screen:

```python
from pyfamsa import Aligner, Sequence

# Identifiers and residues are both given as bytes.
sequences = [
    Sequence(b"Sp8",  b"GLGKVIVYGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII"),
    Sequence(b"Sp10", b"DPAVLFVIMLGTITKFSSEWFFAWLGLEINMMVII"),
    Sequence(b"Sp26", b"AAAAAAAAALLTYLGLFLGTDYENFAAAAANAWLGLEINMMAQI"),
    Sequence(b"Sp6",  b"ASGAILTLGIYLFTLCAVISVSWYLAWLGLEINMMAII"),
    Sequence(b"Sp17", b"FAYTAPDLLLIGFLLKTVATFGDTWFQLWQGLDLNKMPVF"),
    Sequence(b"Sp33", b"PTILNIAGLHMETDINFSLAWFQAWGGLEINKQAIL"),
]

# Build the guide tree with UPGMA; no heuristic is enabled.
aligner = Aligner(guide_tree="upgma")
msa = aligner.align(sequences)

# Print each aligned row with its identifier padded to 10 columns.
for sequence in msa:
    print(sequence.id.decode().ljust(10), sequence.sequence.decode())
```

This should output the following:

```
Sp10       --------DPAVLFVIMLGTIT-KFS--SEWFFAWLGLEINMMVII
Sp17       ---FAYTAPDLLLIGFLLKTVA-TFG--DTWFQLWQGLDLNKMPVF
Sp26       AAAAAAAAALLTYLGLFLGTDYENFA--AAAANAWLGLEINMMAQI
Sp33       -------PTILNIAGLHMETDI-NFS--LAWFQAWGGLEINKQAIL
Sp6        ------ASGAILTLGIYLFTLCAVIS--VSWYLAWLGLEINMMAII
Sp8        ------GLGKVIVYGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII
```
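
The printed alignment can be sanity-checked with plain Python: every row should have the same number of columns, and removing the gap character `-` from a row should give back the corresponding input sequence. The sketch below works only on the text shown above and does not call PyFAMSA; the names `inputs`, `printed` and `rows` are chosen here for illustration.

```python
# Input sequences and printed alignment, copied from the example above.
inputs = {
    "Sp8":  "GLGKVIVYGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII",
    "Sp10": "DPAVLFVIMLGTITKFSSEWFFAWLGLEINMMVII",
    "Sp26": "AAAAAAAAALLTYLGLFLGTDYENFAAAAANAWLGLEINMMAQI",
    "Sp6":  "ASGAILTLGIYLFTLCAVISVSWYLAWLGLEINMMAII",
    "Sp17": "FAYTAPDLLLIGFLLKTVATFGDTWFQLWQGLDLNKMPVF",
    "Sp33": "PTILNIAGLHMETDINFSLAWFQAWGGLEINKQAIL",
}

printed = """\
Sp10       --------DPAVLFVIMLGTIT-KFS--SEWFFAWLGLEINMMVII
Sp17       ---FAYTAPDLLLIGFLLKTVA-TFG--DTWFQLWQGLDLNKMPVF
Sp26       AAAAAAAAALLTYLGLFLGTDYENFA--AAAANAWLGLEINMMAQI
Sp33       -------PTILNIAGLHMETDI-NFS--LAWFQAWGGLEINKQAIL
Sp6        ------ASGAILTLGIYLFTLCAVIS--VSWYLAWLGLEINMMAII
Sp8        ------GLGKVIVYGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII
"""

# Each printed line is an identifier, padding spaces, then the gapped row.
rows = dict(line.split() for line in printed.splitlines())

# All rows of an alignment share the same number of columns.
width = len(next(iter(rows.values())))
assert all(len(row) == width for row in rows.values())

# Stripping the gap character recovers the unaligned input sequence.
for name, row in rows.items():
    assert row.replace("-", "") == inputs[name]

print(width, "columns")  # prints "46 columns"
```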

## 🧶 Thread-safety

`Aligner` objects are thread-safe, and the `align` method is re-entrant. You
can batch-process several alignments in parallel using a
[`ThreadPool`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.ThreadPool) with a single
aligner object:

```python
import glob
import multiprocessing.pool

import Bio.SeqIO
from pyfamsa import Aligner, Sequence

# Load each FASTA file as one family of sequences.
families = [
    [ Sequence(r.id.encode(), r.seq.encode()) for r in Bio.SeqIO.parse(file, "fasta") ]
    for file in glob.glob("pyfamsa/tests/data/*.faa")
]

# A single aligner is shared by every worker thread.
aligner = Aligner()
with multiprocessing.pool.ThreadPool() as pool:
    alignments = pool.map(aligner.align, families)
```
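
The example above relies on Biopython to read the FASTA files. Where that dependency is unwanted, a small standard-library reader could produce the same `(identifier, sequence)` byte pairs. The following is a minimal sketch under that assumption: `read_fasta` is a hypothetical helper, not part of PyFAMSA or Biopython, and it handles only plain FASTA text.

```python
def read_fasta(text):
    """Yield (identifier, sequence) byte pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name.encode(), "".join(chunks).encode()
            # The identifier is the first word of the header line.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    # Emit the last record once the input is exhausted.
    if name is not None:
        yield name.encode(), "".join(chunks).encode()


example = ">Sp8 demo\nGLGKVIVYGI\nVLGTKSDQFS\n>Sp10\nDPAVLFVIML\n"
records = list(read_fasta(example))
print(records)
```

Each pair could then be passed on as `Sequence(identifier, sequence)`, in the same way the byte strings are passed in the examples above.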

## 🔎 See Also

Done with your protein alignment? You may be interested in trimming it: in that
case, you could use the [`pytrimal`](https://github.com/althonos/pytrimal) Python
package, which wraps [trimAl](http://trimal.cgenomics.org/) 2.0. Or perhaps
you want to build an HMM from the alignment? Then maybe have a look at
[`pyhmmer`](https://github.com/althonos/pyhmmer), a Python package which
wraps [HMMER](http://hmmer.org/).

## 💭 Feedback

### ⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the [GitHub issue tracker](https://github.com/althonos/pyfamsa/issues)
if you need to report or ask something. If you are filing a bug report,
please include as much information as you can about the issue, and try to
recreate the same bug in a simple, easily reproducible situation.

### 🏗️ Contributing

Contributions are more than welcome! See
[`CONTRIBUTING.md`](https://github.com/althonos/pyfamsa/blob/main/CONTRIBUTING.md)
for more details.

## 📋 Changelog

This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)
and provides a [changelog](https://github.com/althonos/pyfamsa/blob/main/CHANGELOG.md)
in the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.

## ⚖️ License

This library is provided under the [GNU General Public License v3.0](https://choosealicense.com/licenses/gpl-3.0/). FAMSA is developed by the
[REFRESH Bioinformatics Group](https://refresh-bio.github.io/) and is
distributed under the terms of the GPLv3 as well. See `vendor/FAMSA/LICENSE`
for more information. In addition, FAMSA vendors several libraries for
compatibility, all of which are redistributed with PyFAMSA under their own
terms: `atomic_wait` (MIT License), `mimalloc` (MIT License), `libdeflate`
(MIT License), Boost (Boost Software License).

*This project is in no way affiliated with, sponsored, or otherwise endorsed
by the [FAMSA authors](https://github.com/refresh-bio). It was developed
by [Martin Larralde](https://github.com/althonos/) during his PhD project
at the [European Molecular Biology Laboratory](https://www.embl.de/) in
the [Zeller team](https://github.com/zellerlab).*

## 📚 References

- \[1\] Deorowicz, Sebastian, Debudaj-Grabysz, Agnieszka & Gudyś, Adam. ‘FAMSA: Fast and accurate multiple sequence alignment of huge protein families’. Sci Rep 6, 33964 (2016). [doi:10.1038/srep33964](https://doi.org/10.1038/srep33964)
