https://github.com/althonos/mini3di
A NumPy port of the foldseek code for encoding protein structures to 3di.
foldseek numpy-library protein-structure python-library
- Host: GitHub
- URL: https://github.com/althonos/mini3di
- Owner: althonos
- License: bsd-3-clause
- Created: 2023-11-25T17:40:05.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-07T13:47:39.000Z (5 months ago)
- Last Synced: 2025-02-10T00:11:13.156Z (11 days ago)
- Topics: foldseek, numpy-library, protein-structure, python-library
- Language: Python
- Homepage:
- Size: 441 KB
- Stars: 38
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: COPYING
# 🚀 `mini3di` [Stars](https://github.com/althonos/mini3di/stargazers)

*A [NumPy](https://numpy.org/) port of the [`foldseek`](https://github.com/steineggerlab/foldseek) code for encoding structures to 3di.*

[Actions](https://github.com/althonos/mini3di/actions)
[Coverage](https://codecov.io/gh/althonos/mini3di/)
[License](https://choosealicense.com/licenses/bsd-3-clause/)
[PyPI](https://pypi.org/project/mini3di)
[Bioconda](https://anaconda.org/bioconda/mini3di)
[Wheel](https://pypi.org/project/mini3di/#files)
[Python Versions](https://pypi.org/project/mini3di/#files)
[Python Implementations](https://pypi.org/project/mini3di/#files)
[Source](https://github.com/althonos/mini3di/)
[Mirror](https://git.embl.de/larralde/mini3di/)
[GitHub issues](https://github.com/althonos/mini3di/issues)
[Docs](https://mini3di.readthedocs.io)
[Changelog](https://github.com/althonos/mini3di/blob/master/CHANGELOG.md)
[Downloads](https://pepy.tech/project/mini3di)

## 🗺️ Overview
[`foldseek`](https://github.com/steineggerlab/foldseek) is a method developed
by van Kempen *et al.*[\[1\]](#ref1) for the fast and accurate search of
protein structures. In order to search protein structures at a large scale,
it first encodes the 3D structures into sequences over a structural alphabet,
3di, which captures tertiary amino acid interactions.

`mini3di` is a pure-Python package to encode 3D structures of proteins into
the 3di alphabet, using the trained weights from the `foldseek` VQ-VAE model.

This library only depends on NumPy and is available for all modern Python
versions (3.7+).

## 🔧 Installing
Install the `mini3di` package directly from [PyPI](https://pypi.org/project/mini3di),
which hosts universal wheels that can be installed with `pip`:
```console
$ pip install mini3di
```

## 💡 Example
`mini3di` provides a single `Encoder` class, which expects the 3D coordinates
of the **Cα**, **Cβ**, **N** and **C** atoms from each peptide residue. For
residues without **Cβ** (Gly), simply write the coordinates as `math.nan`.
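
When the coordinates come from a raw PDB file rather than from an existing array, the four atom lists can be gathered by hand. The following is a minimal sketch using only the standard library; `backbone_from_pdb` is a hypothetical helper that is not part of `mini3di`, and it assumes well-formed fixed-column `ATOM` records. Any atom missing from a residue, such as the **Cβ** of glycine, is filled with `nan`.

```python
from math import nan

# Atom names needed for every residue.
BACKBONE = ("CA", "CB", "N", "C")


def backbone_from_pdb(lines, chain_id="A"):
    """Collect CA, CB, N and C coordinates per residue from PDB ATOM records.

    Hypothetical helper for illustration, not part of mini3di.
    Returns a dict with keys "ca", "cb", "n" and "c", each holding one
    (x, y, z) tuple per residue, in order of first appearance.
    """
    residues = {}  # (residue number, insertion code) -> {atom name: (x, y, z)}
    for line in lines:
        # Skip anything that is not a complete ATOM record.
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[21] != chain_id:
            continue
        name = line[12:16].strip()
        if name not in BACKBONE:
            continue
        # Keep only the first alternate location, if any are present.
        if line[16] not in (" ", "A"):
            continue
        key = (int(line[22:26]), line[26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(key, {})[name] = xyz

    missing = (nan, nan, nan)
    return {
        name.lower(): [atoms.get(name, missing) for atoms in residues.values()]
        for name in BACKBONE
    }
```

The keys of the returned dictionary match the `ca`, `cb`, `n` and `c` keyword arguments of `encode_atoms`, so the result could presumably be passed as `encoder.encode_atoms(**coords)`, assuming the method accepts nested sequences as in the README example.
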
Call the `encode_atoms` method to get a sequence of 3di states:
```python
from math import nan

import mini3di

encoder = mini3di.Encoder()

# One [x, y, z] triplet per residue; `...` stands for the remaining residues.
states = encoder.encode_atoms(
    ca=[[32.9, 51.9, 28.8], [35.0, 51.9, 26.6], ...],
    cb=[[ nan,  nan,  nan], [35.3, 53.3, 26.4], ...],
    n=[ [32.1, 51.2, 29.8], [35.3, 51.5, 28.1], ...],
    c=[ [34.4, 51.7, 29.1], [36.1, 51.1, 25.8], ...],
)
```

The returned states are a NumPy array of state indices. To turn them into a
sequence, use the `build_sequence` method of the encoder:
```python
sequence = encoder.build_sequence(states)
print(sequence)
```

The encoder can work directly with Biopython objects, if Biopython is available.
The helper method `encode_chain` extracts the atom coordinates from
a [`Bio.PDB.Chain`](https://biopython.org/docs/latest/api/Bio.PDB.Chain.html)
and encodes them directly. For instance, to encode all the chains from a
[PDB file](https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format)):
```python
import pathlib

import mini3di
from Bio.PDB import PDBParser

encoder = mini3di.Encoder()
parser = PDBParser(QUIET=True)
struct = parser.get_structure("8crb", pathlib.Path("tests", "data", "8crb.pdb"))

for chain in struct.get_chains():
    states = encoder.encode_chain(chain)
    sequence = encoder.build_sequence(states)
    print(chain.get_id(), sequence)
```

## 💭 Feedback
### ⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the [GitHub issue
tracker](https://github.com/althonos/mini3di/issues) if you need to report
or ask something. If you are filing a bug report, please include as much
information as you can about the issue, and try to recreate the same bug
in a simple, easily reproducible situation.

### 🏗️ Contributing
Contributions are more than welcome! See
[`CONTRIBUTING.md`](https://github.com/althonos/mini3di/blob/main/CONTRIBUTING.md)
for more details.

## 📋 Changelog
This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)
and provides a [changelog](https://github.com/althonos/mini3di/blob/master/CHANGELOG.md)
in the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.

## ⚖️ License
This library is provided under the [BSD 3-clause license](https://choosealicense.com/licenses/bsd-3-clause/).
It includes some code ported from `foldseek`, which is licensed under the
[GNU General Public License v3.0](https://choosealicense.com/licenses/gpl-3.0/),
and relicensed with the permission of the authors.

*This project is in no way affiliated with, sponsored, or otherwise endorsed
by the [original `foldseek` authors](https://github.com/steineggerlab).
It was developed by [Martin Larralde](https://github.com/althonos/) during his
PhD project at the [European Molecular Biology Laboratory](https://www.embl.de/)
in the [Zeller team](https://github.com/zellerlab).*

## 📚 References
- \[1\] Kempen, Michel van, Stephanie S. Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron L. M. Gilchrist, Johannes Söding, and Martin Steinegger. ‘Fast and Accurate Protein Structure Search with Foldseek’. Nature Biotechnology, 8 May 2023, 1–4. [doi:10.1038/s41587-023-01773-0](https://doi.org/10.1038/s41587-023-01773-0).