Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/althonos/mini3di
A NumPy port of the foldseek code for encoding protein structures to 3di.
foldseek numpy-library protein-structure python-library
- Host: GitHub
- URL: https://github.com/althonos/mini3di
- Owner: althonos
- License: bsd-3-clause
- Created: 2023-11-25T17:40:05.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-07T13:47:39.000Z (5 months ago)
- Last Synced: 2025-02-10T00:11:13.156Z (11 days ago)
- Topics: foldseek, numpy-library, protein-structure, python-library
- Language: Python
- Homepage:
- Size: 441 KB
- Stars: 38
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: COPYING
README
# 🚀 `mini3di`
*A [NumPy](https://numpy.org/) port of the [`foldseek`](https://github.com/steineggerlab/foldseek) code for encoding structures to 3di.*
## 🗺️ Overview
[`foldseek`](https://github.com/steineggerlab/foldseek) is a method developed
by van Kempen *et al.*[\[1\]](#ref1) for the fast and accurate search of
protein structures. In order to search protein structures at a large scale,
it first encodes the 3D structure into sequences over a structural alphabet,
3di, which captures tertiary amino acid interactions.

`mini3di` is a pure-Python package to encode 3D structures of proteins into
the 3di alphabet, using the trained weights from the `foldseek` VQ-VAE model.

This library only depends on NumPy and is available for all modern Python
versions (3.7+).

## 🔧 Installing
Install the `mini3di` package directly from [PyPI](https://pypi.org/project/mini3di),
which hosts universal wheels that can be installed with `pip`:
```console
$ pip install mini3di
```

## 💡 Example
`mini3di` provides a single `Encoder` class, which expects the 3D coordinates
of the **Cα**, **Cβ**, **N** and **C** atoms from each peptide residue. For
residues without **Cβ** (Gly), simply write the coordinates as `math.nan`.
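
The input layout described above can be prepared with a small helper. The sketch below is illustrative only: `split_backbone` and the per-residue dictionary format are hypothetical and not part of `mini3di`. It assumes each residue is given as a mapping from atom name (`"CA"`, `"CB"`, `"N"`, `"C"`) to its `(x, y, z)` coordinates, and it fills in `nan` coordinates wherever **Cβ** is absent.

```python
from math import isnan, nan

# Placeholder coordinates for residues that lack a Cβ atom (e.g. glycine).
MISSING = (nan, nan, nan)


def split_backbone(residues):
    """Split per-residue atom mappings into four parallel coordinate lists.

    `residues` is a hypothetical input format: a list of dicts mapping an
    atom name ("CA", "CB", "N", "C") to its (x, y, z) coordinates.
    """
    ca, cb, n, c = [], [], [], []
    for residue in residues:
        ca.append(list(residue["CA"]))
        # No Cβ entry (glycine): substitute NaN coordinates.
        cb.append(list(residue.get("CB", MISSING)))
        n.append(list(residue["N"]))
        c.append(list(residue["C"]))
    return ca, cb, n, c


# Two residues; the first one has no Cβ.
residues = [
    {"CA": (32.9, 51.9, 28.8), "N": (32.1, 51.2, 29.8), "C": (34.4, 51.7, 29.1)},
    {"CA": (35.0, 51.9, 26.6), "CB": (35.3, 53.3, 26.4),
     "N": (35.3, 51.5, 28.1), "C": (36.1, 51.1, 25.8)},
]
ca, cb, n, c = split_backbone(residues)
print(all(isnan(x) for x in cb[0]))  # True
print(cb[1])                         # [35.3, 53.3, 26.4]
```

The four lists returned by this helper line up residue by residue, so they can be passed as the `ca`, `cb`, `n` and `c` keyword arguments of `encode_atoms`.
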
Call the `encode_atoms` method to get a sequence of 3di states:
```python
from math import nan
import mini3di

encoder = mini3di.Encoder()
states = encoder.encode_atoms(
    ca=[[32.9, 51.9, 28.8], [35.0, 51.9, 26.6], ...],
    cb=[[ nan,  nan,  nan], [35.3, 53.3, 26.4], ...],
    n=[ [32.1, 51.2, 29.8], [35.3, 51.5, 28.1], ...],
    c=[ [34.4, 51.7, 29.1], [36.1, 51.1, 25.8], ...],
)
```

The states returned as output will be a NumPy array of state indices. To turn
it into a sequence, use the `build_sequence` method of the encoder:
```python
sequence = encoder.build_sequence(states)
print(sequence)
```

The encoder can work directly with Biopython objects, if Biopython is available.
A helper method, `encode_chain`, extracts the atom coordinates from
a [`Bio.PDB.Chain`](https://biopython.org/docs/latest/api/Bio.PDB.Chain.html)
and encodes them directly. For instance, to encode all the chains from a
[PDB file](https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format)):
```python
import pathlib

import mini3di
from Bio.PDB import PDBParser

encoder = mini3di.Encoder()
parser = PDBParser(QUIET=True)
struct = parser.get_structure("8crb", pathlib.Path("tests", "data", "8crb.pdb"))

for chain in struct.get_chains():
    states = encoder.encode_chain(chain)
    sequence = encoder.build_sequence(states)
    print(chain.get_id(), sequence)
```

## 💭 Feedback
### ⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the [GitHub issue
tracker](https://github.com/althonos/mini3di/issues) if you need to report
or ask something. If you are filing a bug, please include as much
information as you can about the issue, and try to recreate the same bug
in a simple, easily reproducible situation.

### 🏗️ Contributing
Contributions are more than welcome! See
[`CONTRIBUTING.md`](https://github.com/althonos/mini3di/blob/main/CONTRIBUTING.md)
for more details.

## 📋 Changelog
This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)
and provides a [changelog](https://github.com/althonos/mini3di/blob/master/CHANGELOG.md)
in the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.

## ⚖️ License
This library is provided under the [BSD 3-clause license](https://choosealicense.com/licenses/bsd-3-clause/).
It includes some code ported from `foldseek`, which is licensed under the
[GNU General Public License v3.0](https://choosealicense.com/licenses/gpl-3.0/),
and relicensed with the permission of the authors.

*This project is in no way affiliated with, sponsored, or otherwise endorsed
by the [original `foldseek` authors](https://github.com/steineggerlab).
It was developed by [Martin Larralde](https://github.com/althonos/) during his
PhD project at the [European Molecular Biology Laboratory](https://www.embl.de/)
in the [Zeller team](https://github.com/zellerlab).*

## 📚 References
- \[1\] Kempen, Michel van, Stephanie S. Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron L. M. Gilchrist, Johannes Söding, and Martin Steinegger. ‘Fast and Accurate Protein Structure Search with Foldseek’. Nature Biotechnology, 8 May 2023, 1–4. [doi:10.1038/s41587-023-01773-0](https://doi.org/10.1038/s41587-023-01773-0).