https://github.com/FreyrS/dMaSIF
- Host: GitHub
- URL: https://github.com/FreyrS/dMaSIF
- Owner: FreyrS
- License: other
- Created: 2021-02-26T13:50:15.000Z
- Default Branch: master
- Last Pushed: 2023-05-22T15:34:26.000Z
- Last Synced: 2024-11-28T02:34:50.120Z
- Language: Jupyter Notebook
- Size: 2.86 MB
- Stars: 195
- Watchers: 5
- Forks: 45
- Open Issues: 21
Metadata Files:
- Readme: README.md
- License: LICENSE.md
## dMaSIF - Fast end-to-end learning on protein surfaces

## Abstract
Proteins’ biological functions are defined by the geometric and chemical structure of their 3D molecular surfaces. Recent works have shown that geometric deep learning can be used on mesh-based representations of proteins to identify potential functional sites, such as binding targets for drugs. Unfortunately, using meshes as the underlying representation for protein structure has multiple drawbacks, including the need to pre-compute the input features and mesh connectivities. This becomes a bottleneck for many important tasks in protein science.

In this paper, we present a new framework for deep learning on protein structures that addresses these limitations. Among the key advantages of our method are the computation and sampling of the molecular surface on-the-fly from the underlying atomic point cloud, and a novel, efficient geometric convolutional layer. As a result, we are able to process large collections of proteins in an end-to-end fashion, taking as the sole input the raw 3D coordinates and chemical types of their atoms, and eliminating the need for any hand-crafted, pre-computed features.

To showcase the performance of our approach, we test it on two tasks in the field of protein structural bioinformatics: the identification of interaction sites and the prediction of protein-protein interactions. On both tasks, we achieve state-of-the-art performance with much faster run times and fewer parameters than previous models. These results will considerably ease the deployment of deep learning methods in protein science and open the door to end-to-end differentiable approaches in protein modeling tasks such as function prediction and design.
## Hardware requirements
Models have been trained on either a single NVIDIA RTX 2080 Ti or a single Tesla V100 GPU. Time and memory benchmarks were performed on a single Tesla V100.
## Software prerequisites
Scripts have been tested with the following two sets of core dependencies:

| Dependency | First Option | Second Option |
| ------------- | ------------- | ------------- |
| GCC | 7.5.0 | 8.4.0 |
| CMake | 3.10.2 | 3.16.5 |
| CUDA | 10.0.130 | 10.2.89 |
| cuDNN | 7.6.4.38 | 7.6.5.32 |
| Python | 3.6.9 | 3.7.7 |
| PyTorch | 1.4.0 | 1.6.0 |
| PyKeOps | 1.4 | 1.4.1 |
| PyTorch Geometric | 1.5.0 | 1.6.1 |
## Code overview
Usage:
- In order to **train models**, run `main_training.py` with the appropriate flags.
Available flags and their descriptions can be found in `Arguments.py`.
- The command line options needed to reproduce the **benchmarks** can be found in `benchmark_scripts/`.
- To run **inference** on the test set with pretrained models, use `main_inference.py` with the same flags that were used to train the models.
Note that the value of the `--experiment_name` flag should be modified to specify which training epoch to use.
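
Since inference reuses the training flags and only changes `--experiment_name`, the command can be derived from the training argument list. The sketch below is hypothetical: `inference_command` is not part of the repository, and because this README does not specify how the epoch is encoded in the experiment name, the new name is passed in verbatim.

```python
import shlex


def inference_command(training_args, experiment_name):
    """Reuse the training argv for main_inference.py, swapping --experiment_name.

    `experiment_name` is passed in verbatim: how the epoch is encoded in it
    is not specified here, so choose it to match your saved models.
    """
    args, replaced = [], False
    it = iter(training_args)
    for arg in it:
        if arg == "--experiment_name":
            next(it, None)  # skip the old value
            args += [arg, experiment_name]
            replaced = True
        elif arg.startswith("--experiment_name="):
            args.append("--experiment_name=" + experiment_name)
            replaced = True
        else:
            args.append(arg)
    if not replaced:
        args += ["--experiment_name", experiment_name]
    return ["python", "main_inference.py"] + args


if __name__ == "__main__":
    # Hypothetical names; any other training flags would be copied unchanged.
    cmd = inference_command(["--experiment_name", "my_run"], "my_run_chosen_epoch")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Both the `--flag value` and `--flag=value` spellings that `argparse` accepts are handled, and all other arguments keep their original order.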

Implementation:
- Our **surface generation** algorithm, **curvature** estimation method and **quasi-geodesic convolutions** are implemented in `geometry_processing.py`.
- The **definition of the neural network** along with surface and input features can be found in `model.py`. The convolutional layers are implemented in `benchmark_models.py`.
- The scripts used to **generate the figures** of the paper can be found in `data_analysis/`.
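
The quasi-geodesic convolutions compare surface points by a distance that also accounts for their normals. The sketch below is not the layer implemented in this repository: it assumes a distance of the form `||x_i - x_j|| * (2 - <n_i, n_j>)` with unit normals, and it only computes a Gaussian-windowed feature average, whereas a learned convolution would apply trainable filters inside that window. The names `quasi_geodesic`, `smooth_features` and `sigma` are hypothetical.

```python
import math


def quasi_geodesic(x_i, n_i, x_j, n_j):
    """Euclidean distance scaled by (2 - <n_i, n_j>) for unit normals (assumed form).

    The factor is 1 for parallel normals and 3 for opposite ones, so points on
    the two sides of a thin region look farther apart than they are in space.
    """
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))
    dot = sum(a * b for a, b in zip(n_i, n_j))
    return euclid * (2.0 - dot)


def smooth_features(points, normals, features, sigma=1.0):
    """Gaussian-windowed average of scalar features over quasi-geodesic distance."""
    out = []
    for x_i, n_i in zip(points, normals):
        weights = [math.exp(-quasi_geodesic(x_i, n_i, x_j, n_j) ** 2 / (2 * sigma ** 2))
                   for x_j, n_j in zip(points, normals)]
        total = sum(weights)  # includes the self-weight exp(0) = 1
        out.append(sum(w * f for w, f in zip(weights, features)) / total)
    return out
```

Because the window depends only on pairwise distances and normals, it can be evaluated directly on a sampled point cloud without building a mesh; two points 0.5 apart with opposite normals get a distance of 1.5, and therefore a much smaller weight, than two points the same distance apart with parallel normals.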
## License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
## Reference
Sverrisson, F., Feydy, J., Correia, B. E., & Bronstein, M. M. (2020). Fast end-to-end learning on protein surfaces. [bioRxiv](https://www.biorxiv.org/content/10.1101/2020.12.28.424589v1).