https://github.com/althonos/pymuscle5
Cython bindings and Python interface to MUSCLE v5, a highly efficient and accurate multiple sequence alignment software.
bioinformatics cython-library genomics multiple-sequence-alignment muscle python-bindings python-library sequence-alignment
- Host: GitHub
- URL: https://github.com/althonos/pymuscle5
- Owner: althonos
- License: gpl-3.0
- Created: 2022-05-06T14:19:48.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-05-08T11:25:49.000Z (about 1 year ago)
- Last Synced: 2025-01-25T21:13:18.555Z (4 months ago)
- Topics: bioinformatics, cython-library, genomics, multiple-sequence-alignment, muscle, python-bindings, python-library, sequence-alignment
- Language: Cython
- Homepage:
- Size: 115 KB
- Stars: 20
- Watchers: 4
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: COPYING
# pyMUSCLE5
*Cython bindings and Python interface to [MUSCLE v5](https://www.drive5.com/muscle/), a highly efficient and accurate multiple sequence alignment software.*
## 🗺️ Overview
MUSCLE is widely-used software for making multiple alignments of biological
sequences. Version 5 of MUSCLE achieves the highest scores on several benchmark
tests and scales to thousands of sequences on a commodity desktop computer.

pyMUSCLE5 is a Python module that provides bindings to MUSCLE v5 using
[Cython](https://cython.org/). It directly interacts with the MUSCLE
internals, which has the following advantages:

- **single dependency**: If your software or your analysis pipeline is
distributed as a Python package, you can add `pymuscle5` as a dependency to
your project, and stop worrying about the MUSCLE binaries being properly
set up on the end-user machine.
- **no intermediate files**: Everything happens in memory, in a Python object
you fully control, so you don't have to invoke the MUSCLE CLI using a
sub-process and temporary files. Sequences can be passed directly as
strings or bytes, which avoids the overhead of formatting your input to
FASTA for MUSCLE.
- **no OpenMP**: The original MUSCLE code uses [OpenMP](https://www.openmp.org/)
to parallelize embarrassingly parallel tasks. In pyMUSCLE5 the dependency on
OpenMP has been removed in favor of the Python `threading` module for better
portability.

*This library is in a very experimental stage at the moment, and consistency
of the results across versions or platforms is not guaranteed yet.*

## 🔧 Installing
At the moment pyMUSCLE5 is not available on PyPI. You can however install it
directly from GitHub with:

```console
$ pip install git+https://github.com/althonos/pymuscle5
```

## 💡 Example
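Because sequences are passed as bytes, an alignment does not strictly need a FASTA file on disk. The snippet below is a minimal sketch of that idea, not an official recipe. It only relies on the `Sequence`, `Aligner`, `align` and `msa.sequences` names used elsewhere in this README. The helpers `to_byte_pairs`, `format_two_line_fasta` and `align_in_memory`, as well as the toy peptides, are hypothetical and exist only for illustration.

```python
from typing import Iterable, List, Tuple

BytePair = Tuple[bytes, bytes]


def to_byte_pairs(records: Iterable[Tuple[str, str]]) -> List[BytePair]:
    """Encode (name, residues) string pairs as bytes, dropping whitespace."""
    pairs = []
    for name, residues in records:
        cleaned = "".join(residues.split())  # remove spaces and newlines
        pairs.append((name.encode(), cleaned.encode()))
    return pairs


def format_two_line_fasta(pairs: Iterable[BytePair]) -> str:
    """Render (name, residues) byte pairs as two-line FASTA text."""
    return "".join(
        f">{name.decode()}\n{residues.decode()}\n" for name, residues in pairs
    )


def align_in_memory(records: Iterable[Tuple[str, str]]) -> List[BytePair]:
    """Align in-memory sequences (hypothetical helper, needs pymuscle5)."""
    # Imported lazily so the pure-Python helpers above work without it.
    import pymuscle5

    sequences = [
        pymuscle5.Sequence(name, residues)
        for name, residues in to_byte_pairs(records)
    ]
    msa = pymuscle5.Aligner().align(sequences)
    return [(seq.name, seq.sequence) for seq in msa.sequences]


# Toy input, made up for illustration only.
toy_records = [("seq1", "MKT AYIAK"), ("seq2", "MKTAYLAK\n")]
print(format_two_line_fasta(to_byte_pairs(toy_records)), end="")
```

With pyMUSCLE5 installed, `format_two_line_fasta(align_in_memory(toy_records))` would give the aligned sequences in the same two-line layout.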
Let's load some sequences from a FASTA file, use an `Aligner` to
align proteins together, and print the alignment in two-line FASTA format.

### 🔬 [Biopython](https://github.com/biopython/biopython)
```python
import os

import Bio.SeqIO
import pymuscle5

# Load the protein records from the FASTA file
path = os.path.join("pymuscle", "tests", "data", "swissprot-halorhodopsin.faa")
records = list(Bio.SeqIO.parse(path, "fasta"))

# Wrap each record as a sequence, with its name and residues as bytes
sequences = [
    pymuscle5.Sequence(record.id.encode(), bytes(record.seq))
    for record in records
]

# Align all the sequences together
aligner = pymuscle5.Aligner()
msa = aligner.align(sequences)

# Print the alignment in two-line FASTA format
for seq in msa.sequences:
    print(f">{seq.name.decode()}")
    print(seq.sequence.decode())
```

### 🧪 [Scikit-bio](https://github.com/biocore/scikit-bio)
```python
import os

import skbio.io
import pymuscle5

# Load the protein records from the FASTA file
path = os.path.join("pymuscle", "tests", "data", "swissprot-halorhodopsin.faa")
records = list(skbio.io.read(path, "fasta"))

# Wrap each record as a sequence, viewing its residues as unsigned bytes
sequences = [
    pymuscle5.Sequence(record.metadata["id"].encode(), record.values.view('B'))
    for record in records
]

# Align all the sequences together
aligner = pymuscle5.Aligner()
msa = aligner.align(sequences)

# Print the alignment in two-line FASTA format
for seq in msa.sequences:
    print(f">{seq.name.decode()}")
    print(seq.sequence.decode())
```

*We need to use the [`view`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.view.html)
method so that Cython can access the sequence as an array of `unsigned char`.*

## 💭 Feedback
### ⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the [GitHub issue
tracker](https://github.com/althonos/pymuscle5/issues) if you need to report
or ask something. If you are filing a bug report, please include as much
information as you can about the issue, and try to recreate the same bug
in a simple, easily reproducible situation.

### 🏗️ Contributing
Contributions are more than welcome! See
[`CONTRIBUTING.md`](https://github.com/althonos/pymuscle5/blob/main/CONTRIBUTING.md)
for more details.

## 📋 Changelog
This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)
and provides a [changelog](https://github.com/althonos/pymuscle5/blob/main/CHANGELOG.md)
in the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.

## ⚖️ License
This library is provided under the [GNU General Public License v3.0](https://choosealicense.com/licenses/gpl-3.0/).
The MUSCLE code was written by [Robert Edgar](https://github.com/rcedgar) and is distributed under the
terms of the GPLv3 as well. See `vendor/muscle/LICENSE` for more information.

*This project is in no way affiliated with, sponsored, or otherwise endorsed
by the [original MUSCLE authors](https://github.com/rcedgar). It was developed
by [Martin Larralde](https://github.com/althonos/) during his PhD project
at the [European Molecular Biology Laboratory](https://www.embl.de/) in
the [Zeller team](https://github.com/zellerlab).*