{"id":16916919,"url":"https://github.com/althonos/mini3di","last_synced_at":"2025-04-09T09:04:49.792Z","repository":{"id":209194548,"uuid":"723447302","full_name":"althonos/mini3di","owner":"althonos","description":"A NumPy port of the foldseek code for encoding protein structures to 3di.","archived":false,"fork":false,"pushed_at":"2025-03-04T14:26:29.000Z","size":457,"stargazers_count":44,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-02T08:06:24.914Z","etag":null,"topics":["foldseek","numpy-library","protein-structure","python-library"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/althonos.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-25T17:40:05.000Z","updated_at":"2025-03-17T08:38:32.000Z","dependencies_parsed_at":"2024-10-27T12:15:34.888Z","dependency_job_id":"34a632a9-56b9-40f6-b3e8-d35d77390272","html_url":"https://github.com/althonos/mini3di","commit_stats":{"total_commits":48,"total_committers":2,"mean_commits":24.0,"dds":0.02083333333333337,"last_synced_commit":"5bc2fb0257e8d743326f74615ee2c1820c66e7c1"},"previous_names":["althonos/mini3di"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fmini3di","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fmini3di/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/althonos%2Fmini3di/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fmini3di/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/althonos","download_url":"https://codeload.github.com/althonos/mini3di/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248008631,"owners_count":21032556,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["foldseek","numpy-library","protein-structure","python-library"],"created_at":"2024-10-13T19:31:22.922Z","updated_at":"2025-04-09T09:04:49.753Z","avatar_url":"https://github.com/althonos.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🚀 `mini3di` [![Stars](https://img.shields.io/github/stars/althonos/mini3di.svg?style=social\u0026maxAge=3600\u0026label=Star)](https://github.com/althonos/mini3di/stargazers)\n\n*A [NumPy](https://numpy.org/) port of the [`foldseek`](https://github.com/steineggerlab/foldseek) code for encoding structures to 
3di.*\n\n[![Actions](https://img.shields.io/github/actions/workflow/status/althonos/mini3di/test.yml?branch=main\u0026logo=github\u0026style=flat-square\u0026maxAge=300)](https://github.com/althonos/mini3di/actions)\n[![Coverage](https://img.shields.io/codecov/c/gh/althonos/mini3di?style=flat-square\u0026maxAge=3600)](https://codecov.io/gh/althonos/mini3di/)\n[![License](https://img.shields.io/badge/license-BSD--3--Clause-blue.svg?style=flat-square\u0026maxAge=2678400)](https://choosealicense.com/licenses/bsd-3-clause/)\n[![PyPI](https://img.shields.io/pypi/v/mini3di.svg?style=flat-square\u0026maxAge=3600)](https://pypi.org/project/mini3di)\n[![Bioconda](https://img.shields.io/conda/vn/bioconda/mini3di?style=flat-square\u0026maxAge=3600\u0026logo=anaconda)](https://anaconda.org/bioconda/mini3di)\n[![Wheel](https://img.shields.io/pypi/wheel/mini3di.svg?style=flat-square\u0026maxAge=3600)](https://pypi.org/project/mini3di/#files)\n[![Python Versions](https://img.shields.io/pypi/pyversions/mini3di.svg?style=flat-square\u0026maxAge=3600)](https://pypi.org/project/mini3di/#files)\n[![Python Implementations](https://img.shields.io/badge/impl-universal-success.svg?style=flat-square\u0026maxAge=3600\u0026label=impl)](https://pypi.org/project/mini3di/#files)\n[![Source](https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400\u0026style=flat-square)](https://github.com/althonos/mini3di/)\n[![Mirror](https://img.shields.io/badge/mirror-EMBL-009f4d?style=flat-square\u0026maxAge=2678400)](https://git.embl.de/larralde/mini3di/)\n[![GitHub 
issues](https://img.shields.io/github/issues/althonos/mini3di.svg?style=flat-square\u0026maxAge=600)](https://github.com/althonos/mini3di/issues)\n[![Docs](https://img.shields.io/readthedocs/mini3di/latest?style=flat-square\u0026maxAge=600)](https://mini3di.readthedocs.io)\n[![Changelog](https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400\u0026style=flat-square)](https://github.com/althonos/mini3di/blob/master/CHANGELOG.md)\n[![Downloads](https://img.shields.io/pypi/dm/mini3di?style=flat-square\u0026color=303f9f\u0026maxAge=86400\u0026label=downloads)](https://pepy.tech/project/mini3di)\n\n## 🗺️ Overview\n\n[`foldseek`](https://github.com/steineggerlab/foldseek) is a method developed\nby van Kempen *et al.*[\\[1\\]](#ref1) for the fast and accurate search of\nprotein structures. To search protein structures at a large scale,\nit first encodes the 3D structures into sequences over a structural alphabet,\n3di, which captures tertiary amino acid interactions.\n\n`mini3di` is a pure-Python package to encode 3D structures of proteins into\nthe 3di alphabet, using the trained weights from the `foldseek` VQ-VAE model.\n\nThis library only depends on NumPy and is available for all modern Python\nversions (3.7+).\n\n\u003c!-- ### 📋 Features --\u003e\n\n\n## 🔧 Installing\n\nInstall the `mini3di` package directly from [PyPI](https://pypi.org/project/mini3di),\nwhich hosts universal wheels that can be installed with `pip`:\n```console\n$ pip install mini3di\n```\n\n\u003c!-- Otherwise, `mini3di` is also available as a [Bioconda](https://bioconda.github.io/)\npackage:\n```console\n$ conda install -c bioconda mini3di\n``` --\u003e\n\n\u003c!-- ## 📖 Documentation\n\nA complete [API reference](https://mini3di.readthedocs.io/en/stable/api.html)\ncan be found in the [online documentation](https://mini3di.readthedocs.io/),\nor directly from the command line using\n[`pydoc`](https://docs.python.org/3/library/pydoc.html):\n```console\n$ pydoc mini3di\n``` 
--\u003e\n\n## 💡 Example\n\n`mini3di` provides a single `Encoder` class, which expects the 3D coordinates\nof the **Cα**, **Cβ**, **N** and **C** atoms from each peptide residue. For\nresidues without **Cβ** (Gly), simply write the coordinates as `math.nan`.\nCall the `encode_atoms` method to get a sequence of 3di states:\n```python\nfrom math import nan\nimport mini3di\n\nencoder = mini3di.Encoder()\nstates = encoder.encode_atoms(\n    ca=[[32.9, 51.9, 28.8], [35.0, 51.9, 26.6], ...],\n    cb=[[ nan,  nan,  nan], [35.3, 53.3, 26.4], ...],\n    n=[ [32.1, 51.2, 29.8], [35.3, 51.5, 28.1], ...],\n    c=[ [34.4, 51.7, 29.1], [36.1, 51.1, 25.8], ...],\n)\n```\n\nThe states returned as output will be a NumPy array of state indices. To turn\nit into a sequence, use the `build_sequence` method of the encoder:\n```python\nsequence = encoder.build_sequence(states)\nprint(sequence)\n```\n\nThe encoder can work directly with Biopython objects, if Biopython is available.\nA helper method `encode_chain` is provided to extract the atom coordinates from\na [`Bio.PDB.Chain`](https://biopython.org/docs/latest/api/Bio.PDB.Chain.html)\nand encode them directly. For instance, to encode all the chains from a\n[PDB file](https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format)):\n```python\nimport pathlib\n\nimport mini3di\nfrom Bio.PDB import PDBParser\n\nencoder = mini3di.Encoder()\nparser = PDBParser(QUIET=True)\nstruct = parser.get_structure(\"8crb\", pathlib.Path(\"tests\", \"data\", \"8crb.pdb\"))\n\nfor chain in struct.get_chains():\n    states = encoder.encode_chain(chain)\n    sequence = encoder.build_sequence(states)\n    print(chain.get_id(), sequence)\n```\n\n## 💭 Feedback\n\n### ⚠️ Issue Tracker\n\nFound a bug? Have an enhancement request? Head over to the [GitHub issue\ntracker](https://github.com/althonos/mini3di/issues) if you need to report\nor ask something. 
If you are filing a bug, please include as much\ninformation as you can about the issue, and try to recreate the same bug\nin a simple, easily reproducible situation.\n\n### 🏗️ Contributing\n\nContributions are more than welcome! See\n[`CONTRIBUTING.md`](https://github.com/althonos/mini3di/blob/main/CONTRIBUTING.md)\nfor more details.\n\n## 📋 Changelog\n\nThis project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)\nand provides a [changelog](https://github.com/althonos/mini3di/blob/master/CHANGELOG.md)\nin the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.\n\n## ⚖️ License\n\nThis library is provided under the [BSD 3-clause license](https://choosealicense.com/licenses/bsd-3-clause/).\nIt includes some code ported from `foldseek`, which is licensed under the\n[GNU General Public License v3.0](https://choosealicense.com/licenses/gpl-3.0/),\nand relicensed with the permission of the authors.\n\n*This project is in no way affiliated, sponsored, or otherwise endorsed\nby the [original `foldseek` authors](https://github.com/steineggerlab).\nIt was developed by [Martin Larralde](https://github.com/althonos/) during his\nPhD project at the [European Molecular Biology Laboratory](https://www.embl.de/)\nin the [Zeller team](https://github.com/zellerlab).*\n\n\n## 📚 References\n\n- \u003ca id=\"ref1\"\u003e\\[1\\]\u003c/a\u003e Kempen, Michel van, Stephanie S. Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron L. M. Gilchrist, Johannes Söding, and Martin Steinegger. ‘Fast and Accurate Protein Structure Search with Foldseek’. Nature Biotechnology, 8 May 2023, 1–4. [doi:10.1038/s41587-023-01773-0](https://doi.org/10.1038/s41587-023-01773-0).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falthonos%2Fmini3di","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falthonos%2Fmini3di","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falthonos%2Fmini3di/lists"}