
# pyMUSCLE5 [![Stars](https://img.shields.io/github/stars/althonos/pymuscle5.svg?style=social&maxAge=3600&label=Star)](https://github.com/althonos/pymuscle5/stargazers)

*Cython bindings and Python interface to [MUSCLE v5](https://www.drive5.com/muscle/), a highly efficient and accurate multiple sequence alignment software.*

[![Actions](https://img.shields.io/github/workflow/status/althonos/pymuscle5/Test/main?logo=github&style=flat-square&maxAge=300)](https://github.com/althonos/pymuscle5/actions)

## 🗺️ Overview

MUSCLE is widely used software for making multiple alignments of biological
sequences. Version 5 of MUSCLE achieves the highest scores on several benchmark
tests and scales to thousands of sequences on a commodity desktop computer.

pyMUSCLE5 is a Python module that provides bindings to MUSCLE v5 using
[Cython](https://cython.org/). It directly interacts with the MUSCLE
internals, which has the following advantages:

- **single dependency**: If your software or your analysis pipeline is
  distributed as a Python package, you can add `pymuscle5` as a dependency to
  your project, and stop worrying about the MUSCLE binaries being set up
  properly on the end-user machine.
- **no intermediate files**: Everything happens in memory, in a Python object
  you fully control, so you don't have to invoke the MUSCLE CLI using a
  sub-process and temporary files. Sequences can be passed directly as
  strings or bytes, which avoids the overhead of formatting your input to
  FASTA for MUSCLE.
- **no OpenMP**: The original MUSCLE code uses [OpenMP](https://www.openmp.org/)
  to parallelize embarrassingly parallel tasks. In pyMUSCLE5 the dependency on
  OpenMP has been removed in favor of the Python `threading` module for better
  portability.

*This library is in a very experimental stage at the moment, and consistency
of the results across versions or platforms is not guaranteed yet.*

## 🔧 Installing

At the moment pyMUSCLE5 is not available on PyPI. You can however install it
directly from GitHub with:

```console
$ pip install git+https://github.com/althonos/pymuscle5
```

## 💡 Example

Let's load some sequences from a FASTA file, use an `Aligner` to
align the proteins together, and print the alignment in two-line FASTA format.

### 🔬 [Biopython](https://github.com/biopython/biopython)

```python
import os

import Bio.SeqIO
import pymuscle5

# Parse every record of the FASTA file with Biopython
path = os.path.join("pymuscle", "tests", "data", "swissprot-halorhodopsin.faa")
records = list(Bio.SeqIO.parse(path, "fasta"))

# Wrap each record as a pymuscle5 sequence (name and residues as bytes)
sequences = [
    pymuscle5.Sequence(record.id.encode(), bytes(record.seq))
    for record in records
]

aligner = pymuscle5.Aligner()
msa = aligner.align(sequences)

# Print the alignment in two-line FASTA format
for seq in msa.sequences:
    print(f">{seq.name.decode()}")
    print(seq.sequence.decode())
```

### 🧪 [Scikit-bio](https://github.com/biocore/scikit-bio)

```python
import os

import skbio.io
import pymuscle5

# Parse every record of the FASTA file with scikit-bio
path = os.path.join("pymuscle", "tests", "data", "swissprot-halorhodopsin.faa")
records = list(skbio.io.read(path, "fasta"))

# Wrap each record as a pymuscle5 sequence, viewing the residues as bytes
sequences = [
    pymuscle5.Sequence(record.metadata["id"].encode(), record.values.view('B'))
    for record in records
]

aligner = pymuscle5.Aligner()
msa = aligner.align(sequences)

# Print the alignment in two-line FASTA format
for seq in msa.sequences:
    print(f">{seq.name.decode()}")
    print(seq.sequence.decode())
```

*We need to use the [`view`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.view.html)
method so that Cython can access the sequence as an array of `unsigned char`.*
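If neither Biopython nor scikit-bio is available, the name and sequence bytes that `pymuscle5.Sequence` receives in the examples above could also be produced with the standard library alone. The following is a minimal sketch, not part of pyMUSCLE5: `read_fasta` and `format_two_line_fasta` are hypothetical helper names, and the parser assumes plain FASTA input where the record identifier is the header text up to the first whitespace.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (name, sequence) pairs as bytes from the lines of a FASTA file."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                yield name.encode(), "".join(chunks).encode()
            # Keep only the identifier: header text up to the first whitespace
            fields = line[1:].split(None, 1)
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            # Sequence lines of a record may be wrapped over several lines
            chunks.append(line)
    if name is not None:
        yield name.encode(), "".join(chunks).encode()


def format_two_line_fasta(pairs: Iterable[Tuple[bytes, bytes]]) -> str:
    """Format (name, sequence) byte pairs as two-line FASTA text."""
    return "".join(f">{name.decode()}\n{seq.decode()}\n" for name, seq in pairs)


# Made-up records, used only to illustrate the helpers
example = """>sp|P1|A first protein
MTEV
AAL
>sp|P2|B
MSA
""".splitlines()

pairs = list(read_fasta(example))
print(format_two_line_fasta(pairs), end="")
```

Each `(name, sequence)` pair could then be wrapped as `pymuscle5.Sequence(name, sequence)` in the same way as in the Biopython example, and `format_two_line_fasta` mirrors the printing loop used there.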

## 💭 Feedback

### ⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the [GitHub issue
tracker](https://github.com/althonos/pymuscle5/issues) if you need to report
or ask something. If you are filing a bug report, please include as much
information as you can about the issue, and try to recreate the same bug
in a simple, easily reproducible situation.

### 🏗️ Contributing

Contributions are more than welcome! See
[`CONTRIBUTING.md`](https://github.com/althonos/pymuscle5/blob/main/CONTRIBUTING.md)
for more details.

## 📋 Changelog

This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)
and provides a [changelog](https://github.com/althonos/pymuscle5/blob/main/CHANGELOG.md)
in the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.

## ⚖️ License

This library is provided under the [GNU General Public License v3.0](https://choosealicense.com/licenses/gpl-3.0/).
The MUSCLE code was written by [Robert Edgar](https://github.com/rcedgar) and is distributed under the
terms of the GPLv3 as well. See `vendor/muscle/LICENSE` for more information.

*This project is in no way affiliated with, sponsored by, or otherwise endorsed
by the [original MUSCLE authors](https://github.com/rcedgar). It was developed
by [Martin Larralde](https://github.com/althonos/) during his PhD project
at the [European Molecular Biology Laboratory](https://www.embl.de/) in
the [Zeller team](https://github.com/zellerlab).*