https://github.com/deeprank/ponder
PointNet for Deep Rank: protein-protein interaction scoring using neural networks
- Host: GitHub
- URL: https://github.com/deeprank/ponder
- Owner: DeepRank
- License: apache-2.0
- Created: 2018-06-21T08:27:26.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-05-27T10:59:31.000Z (almost 6 years ago)
- Last Synced: 2025-10-23T13:40:48.964Z (5 months ago)
- Topics: deep-learning, pdb, point-cloud, pointnet, protein-protein-interaction, structural-bioinformatics
- Language: Python
- Homepage:
- Size: 383 KB
- Stars: 9
- Watchers: 0
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# General
Experimental deep learning architecture for scoring protein-protein interactions.
See the [PointNet paper](https://arxiv.org/abs/1612.00593) for a description of the original architecture. This implementation contains two architectures, neither of which includes the transformation networks (T-Nets), so both can be considered variants of the *vanilla* version of PointNet. The first differs from vanilla PointNet only in its dropout rate (50%), whereas the second is a novel architecture called *Siamese PointNet*, shown in the image below.
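
The core idea shared by both variants can be illustrated with a small plain-Python sketch (no PyTorch, random weights, and a single layer instead of the real multi-layer network). Every atom passes through the same per-point perceptron, and a symmetric pooling step (max by default, or average as with `--avg_pool`) collapses the per-point features into one vector, so the result does not depend on atom order. The function names are hypothetical, and `siamese_embed` is only one possible reading of the Siamese idea: it assumes two separate point clouds (for example, the atoms from each side of the interface) encoded with one shared set of weights and concatenated. The actual inputs and merge step in this repository may differ.

```python
import random


def relu(x):
    return max(0.0, x)


def shared_layer(point, weights, biases):
    """Apply one perceptron layer (same weights for every point) to a single point."""
    return [relu(sum(w * p for w, p in zip(row, point)) + b)
            for row, b in zip(weights, biases)]


def pointnet_embed(cloud, weights, biases, pool="max"):
    """Vanilla PointNet core: shared per-point layer followed by symmetric pooling."""
    features = [shared_layer(point, weights, biases) for point in cloud]
    channels = list(zip(*features))  # regroup: one tuple of values per feature channel
    if pool == "max":
        return [max(values) for values in channels]
    if pool == "avg":
        return [sum(values) / len(values) for values in channels]
    raise ValueError(f"unknown pooling: {pool}")


def siamese_embed(cloud_a, cloud_b, weights, biases, pool="max"):
    """Hypothetical Siamese variant: one shared encoder applied to two clouds."""
    return (pointnet_embed(cloud_a, weights, biases, pool)
            + pointnet_embed(cloud_b, weights, biases, pool))


# Toy data: 16 "atoms" with 4 features each (e.g. x, y, z plus one atom feature).
rng = random.Random(0)
n_in, n_out = 4, 8
weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
biases = [rng.uniform(-0.1, 0.1) for _ in range(n_out)]
cloud = [[rng.uniform(-5, 5) for _ in range(n_in)] for _ in range(16)]

shuffled = cloud[:]
rng.shuffle(shuffled)
# Max pooling makes the embedding independent of atom order.
print(pointnet_embed(cloud, weights, biases) == pointnet_embed(shuffled, weights, biases))  # True
```

Sharing one set of weights across points (and, in the sketch, across both Siamese branches) keeps the parameter count independent of the number of atoms.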

Other adaptations include cosine annealing learning rate decay, implemented to improve the accuracy and generalizability of the trained network (see [Stochastic Gradient Descent with Warm Restarts](https://arxiv.org/abs/1608.03983)), and a custom loss function that biases learning towards higher-scoring decoys.
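
As a rough illustration of both adaptations, the sketch below computes the SGDR learning rate for a given epoch, following the paper's cosine formula with optional cycle lengthening, together with a *hypothetical* biased loss. The repository's loss function is not described beyond its bias towards higher-scoring decoys, so `biased_mse`, which weights each squared error by `1 + alpha * target`, is an assumption made for illustration, as are the names `sgdr_lr`, `period`, and `period_mult`.

```python
import math


def sgdr_lr(epoch, lr_max, lr_min=0.0, period=10, period_mult=1):
    """Cosine-annealed learning rate with warm restarts (Loshchilov & Hutter)."""
    if period < 1 or period_mult < 1:
        raise ValueError("period and period_mult must be >= 1")
    t_cur, t_i = epoch, period
    # Skip over finished cycles; each new cycle is period_mult times longer.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= period_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))


def biased_mse(preds, targets, alpha=1.0):
    """Hypothetical loss: squared errors on high-scoring decoys weigh more."""
    weights = [1.0 + alpha * t for t in targets]
    total = sum(w * (p - t) ** 2 for w, p, t in zip(weights, preds, targets))
    return total / sum(weights)


# Learning-rate schedule for the default lr of 0.0001 over two 10-epoch cycles.
for epoch in range(20):
    print(epoch, f"{sgdr_lr(epoch, lr_max=1e-4):.2e}")

# An error on the higher-scoring decoy dominates the weighted loss.
print(biased_mse([0.0, 0.0], [0.0, 1.0]))  # 2/3, versus 0.5 for an unweighted MSE
```

Each restart jumps the learning rate back to `lr_max`, which is what lets SGDR escape a sharp minimum before annealing again.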

# Dependencies
* Python 3.x
* [h5py](http://www.h5py.org/) for fast data retrieval
* [PyTorch <0.4](https://github.com/pytorch/pytorch) and its dependencies
* Data conversion uses [DeepRank](https://github.com/DeepRank/deeprank) and its dependencies
* [Seaborn](https://github.com/mwaskom/seaborn) for plotting
# Usage
*python train.py*
```
--batch_size BATCH_SIZE   Input batch size (default = 256)
--num_points NUM_POINTS   Number of points per point cloud (default = 1024)
--num_epoch NUM_EPOCH     Number of epochs to train for (default = 15)
--CUDA                    Train on GPU
--out_folder OUT_FOLDER   Model output folder
--model MODEL             Model input path
--data_path DATA_PATH     Path to HDF5 file
--lr LR                   Learning rate (default = 0.0001)
--optimizer OPTIMIZER     Optimizer to use. Options: Adam, SGD, SGD_cos
--avg_pool                Use average pooling for feature pooling (instead of the default max pooling)
--dual                    Use Siamese PointNet architecture
--metric METRIC           Metric to be used. Options: irmsd, lrmsd, fnat, dockQ (default)
--dropout DROPOUT         Dropout rate in the last layer; if 0, replaced by batch normalization (default = 0.5)
--root                    Apply square root to the metric (for DockQ score balancing)
--patience PATIENCE       Number of epochs to observe overfitting before early stopping
--classification          Classification instead of regression
```
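
For reference, DockQ (Basu & Wallner, 2016) combines the other three metrics into a single score between 0 and 1 by averaging Fnat with scaled versions of iRMSD and lRMSD. The sketch below shows that formula and a *hypothetical* `make_target` helper that picks the training target named by `--metric` and, with `--root`, takes its square root. It assumes the scores arrive as a dict keyed by the `--metric` names; the repository reads them from HDF5 attributes whose exact names are not given here. Because DockQ lies in [0, 1], the square root stretches the low end of the range, which is one plausible reading of "DockQ score balancing".

```python
import math


def dockq(fnat, irmsd, lrmsd):
    """DockQ: mean of Fnat and the scaled interface and ligand RMSDs (in Å)."""
    def scaled(rms, d):
        return 1.0 / (1.0 + (rms / d) ** 2)
    return (fnat + scaled(irmsd, 1.5) + scaled(lrmsd, 8.5)) / 3.0


def make_target(scores, metric="dockQ", root=False):
    """Hypothetical helper mirroring the --metric and --root options."""
    if metric not in ("irmsd", "lrmsd", "fnat", "dockQ"):
        raise ValueError(f"unknown metric: {metric}")
    value = scores[metric]
    # --root: for scores in [0, 1] such as DockQ, sqrt stretches the low end.
    return math.sqrt(value) if root else value


# Hypothetical scores for one decoy.
decoy = {"fnat": 0.5, "irmsd": 1.5, "lrmsd": 8.5}
decoy["dockQ"] = dockq(decoy["fnat"], decoy["irmsd"], decoy["lrmsd"])
print(decoy["dockQ"])                 # (0.5 + 0.5 + 0.5) / 3 = 0.5
print(make_target(decoy, root=True))  # sqrt(0.5), about 0.707
```
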
The network takes the atoms involved in an interaction as point cloud data. Data conversion can be performed with the *extract_pc.py* script.
Data is stored in HDF5 format with three groups: train, test, and holdout. Datasets within these groups contain atom features at *float32* precision, with the iRMSD, lRMSD, FNAT, and DockQ scores stored as attributes.
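
The conversion itself is done by *extract_pc.py* using DeepRank; the sketch below only illustrates two steps such a conversion plausibly involves, under stated assumptions. It selects interface atoms as those lying within a *hypothetical* 5 Å cutoff of the partner chain (each atom given as a tuple whose first three entries are x, y, z coordinates), then draws a fixed-size cloud of `num_points` atoms (1024 by default in `train.py`), padding small interfaces by repeating atoms. The function names and the padding strategy are assumptions, not the repository's code. Repeated atoms leave a max-pooled feature vector unchanged, although they do shift an average-pooled one.

```python
import math
import random


def interface_atoms(atoms_a, atoms_b, cutoff=5.0):
    """Keep atoms of each chain lying within `cutoff` Å of any atom of the other chain."""
    def near(atom, others):
        # Naive O(n * m) search; fine for a sketch, slow for large complexes.
        return any(math.dist(atom[:3], other[:3]) <= cutoff for other in others)
    return ([a for a in atoms_a if near(a, atoms_b)]
            + [b for b in atoms_b if near(b, atoms_a)])


def resample(points, num_points=1024, seed=0):
    """Return exactly num_points points: subsample large clouds, pad small ones."""
    if not points:
        raise ValueError("empty point cloud")
    rng = random.Random(seed)
    if len(points) >= num_points:
        return rng.sample(points, num_points)  # without replacement
    # Pad with randomly repeated atoms (duplicates do not change a max-pooled feature).
    return points + [rng.choice(points) for _ in range(num_points - len(points))]


# Two toy chains: (x, y, z, feature); only the first atom of each is in contact.
chain_a = [(0.0, 0.0, 0.0, 1.0), (20.0, 0.0, 0.0, 1.0)]
chain_b = [(3.0, 0.0, 0.0, 2.0), (40.0, 0.0, 0.0, 2.0)]
cloud = resample(interface_atoms(chain_a, chain_b), num_points=8)
print(len(cloud))  # 8
```
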
# Current state
* Architecture & training scripts have been fully implemented