{"id":13752195,"url":"https://github.com/benevolentAI/DeeplyTough","last_synced_at":"2025-05-09T18:33:33.918Z","repository":{"id":37733883,"uuid":"179281367","full_name":"BenevolentAI/DeeplyTough","owner":"BenevolentAI","description":"DeeplyTough: Learning Structural Comparison of Protein Binding Sites","archived":false,"fork":false,"pushed_at":"2023-04-07T09:33:44.000Z","size":33312,"stargazers_count":156,"open_issues_count":9,"forks_count":37,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-11-16T04:33:05.469Z","etag":null,"topics":["3d-models","deep-learning","drug-discovery","metric-learning","protein-structure"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BenevolentAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-04-03T12:03:29.000Z","updated_at":"2024-11-03T06:10:09.000Z","dependencies_parsed_at":"2023-10-20T18:21:43.238Z","dependency_job_id":null,"html_url":"https://github.com/BenevolentAI/DeeplyTough","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenevolentAI%2FDeeplyTough","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenevolentAI%2FDeeplyTough/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenevolentAI%2FDeeplyTough/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenevolentAI%2FDeeplyTough/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BenevolentAI","downlo
ad_url":"https://codeload.github.com/BenevolentAI/DeeplyTough/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253303310,"owners_count":21886919,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-models","deep-learning","drug-discovery","metric-learning","protein-structure"],"created_at":"2024-08-03T09:01:01.283Z","updated_at":"2025-05-09T18:33:28.865Z","avatar_url":"https://github.com/BenevolentAI.png","language":"Python","funding_links":[],"categories":["Ranked by starred repositories"],"sub_categories":[],"readme":"# DeeplyTough\n\nThis is the official PyTorch implementation of our paper *DeeplyTough: Learning Structural Comparison of Protein Binding Sites*, available from \u003chttps://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00554\u003e.\n\n![DeeplyTough overview figure](overview.png?raw=true \"DeeplyTough overview figure.\")\n\n## Setup\n\n### Code setup\n\nThe software is ready for Docker: the image can be created from `Dockerfile` by running `docker build -t deeplytough .` (image size ~4.7GB so you may have to increase the disk space available to docker). 
The DeeplyTough tool is then accessible within the `deeplytough` conda environment inside the container via `source activate deeplytough`.\n\nAlternatively, the `deeplytough` environment can be created in a local [conda](https://conda.io/en/latest/miniconda.html) installation by executing the following steps from the root of this repository (Linux only):\n\n```bash\n# create new python 3 env and activate\nconda create -y -n deeplytough python=3.6\nconda activate deeplytough\n\n# install legacy version of htmd from source\ncurl -LO https://github.com/Acellera/htmd/archive/refs/tags/1.13.10.tar.gz \u0026\u0026 \\\n    tar -xvzf 1.13.10.tar.gz \u0026\u0026 rm 1.13.10.tar.gz \u0026\u0026 cd htmd-1.13.10 \u0026\u0026 \\\n    python setup.py install \u0026\u0026 \\\n    cd .. \u0026\u0026 \\\n    rm -rf htmd-1.13.10;\n\n# install remaining python3 reqs\napt-get -y install openbabel\npip install --upgrade pip \u0026\u0026 pip install -r requirements.txt \u0026\u0026 pip install --ignore-installed llvmlite==0.28\n\n# install legacy se3cnn library from source\ngit clone https://github.com/mariogeiger/se3cnn \u0026\u0026 cd se3cnn \u0026\u0026 git reset --hard 6b976bea4ea17e1bd5655f0f030c6e2bb1637b57 \u0026\u0026 mv experiments se3cnn; sed -i \"s/exclude=\\['experiments\\*'\\]//g\" setup.py \u0026\u0026 python setup.py install \u0026\u0026 cd .. \u0026\u0026 rm -rf se3cnn\ngit clone https://github.com/AMLab-Amsterdam/lie_learn \u0026\u0026 cd lie_learn \u0026\u0026 python setup.py install \u0026\u0026 cd .. \u0026\u0026 rm -rf lie_learn\n\n# create python2 env used for protein structure preprocessing\nconda create -y -n deeplytough_mgltools python=2.7\nconda install -y -n deeplytough_mgltools -c bioconda mgltools=1.5.6\n```\n\n### Dataset setup\n\n#### Training and benchmark datasets\n\nThe tool comes with built-in support for three datasets: TOUGH-M1 (Govindaraj and Brylinski, 2018), Vertex (Chen et al., 2016), and ProSPECCTs (Ehrt et al., 2018). 
These datasets must be downloaded if one wishes to either retrain the network or evaluate on one of these benchmarks. The datasets can be prepared in two steps:\n\n1. Set the `STRUCTURE_DATA_DIR` environment variable to a directory that will contain the datasets (about 27 GB): `export STRUCTURE_DATA_DIR=/path_to_a_dir`\n2. Run `datasets_downloader.sh` from the root of this repository and get yourself a coffee\n\nThis will download PDB files and extracted pockets, and pre-process the input features. It will also download the lists of pocket pairs provided by the respective dataset authors. By downloading ProSPECCTs, you accept their [terms of use](http://www.ccb.tu-dortmund.de/ag-koch/prospeccts/license_en.pdf).\n\nNote that the download is merely a convenience: we also provide code for data pre-processing. To start from the respective base datasets instead, trigger pre-processing with the `--db_preprocessing 1` flag when running any of our training or evaluation scripts. For the TOUGH-M1 dataset in particular, fpocket2 is required and can be installed as follows:\n```bash\ncurl -O -L https://netcologne.dl.sourceforge.net/project/fpocket/fpocket2.tar.gz \u0026\u0026 tar -xvzf fpocket2.tar.gz \u0026\u0026 rm fpocket2.tar.gz \u0026\u0026 cd fpocket2 \u0026\u0026 sed -i 's/\\$(LFLAGS) \\$\\^ -o \\$@/\\$\\^ -o \\$@ \\$(LFLAGS)/g' makefile \u0026\u0026 make \u0026\u0026 mv bin/fpocket bin/fpocket2 \u0026\u0026 mv bin/dpocket bin/dpocket2 \u0026\u0026 mv bin/mdpocket bin/mdpocket2 \u0026\u0026 mv bin/tpocket bin/tpocket2\n```\n\n#### Custom datasets\n\nThe tool also provides an easy way to compute pocket distances for a user-defined set of pocket pairs. This requires i) a set of PDB structures, ii) pockets in PDB format (extracted around bound ligands or detected using any pocket detection algorithm), and iii) a CSV file defining the pairing. A toy custom dataset example is provided in `datasets/custom`. 
Each line of the CSV file contains a quadruplet defining a pair to evaluate: `relative_path_to_pdbA, relative_path_to_pocketA, relative_path_to_pdbB, relative_path_to_pocketB`, where paths are relative to the directory containing the CSV file and the `.pdb` extension may be omitted. The `STRUCTURE_DATA_DIR` environment variable must be set to the parent directory of the custom dataset (in the example, `/path_to_this_repository/datasets`).\n\n### Environment setup\n\nTo run the evaluation and training scripts, first set the `DEEPLYTOUGH` environment variable to the root directory of this repository and then update the `PYTHONPATH` and `PATH` variables:\n```bash\nexport DEEPLYTOUGH=/path_to_this_repository\nexport PYTHONPATH=$DEEPLYTOUGH/deeplytough:$PYTHONPATH\nexport PATH=$DEEPLYTOUGH/fpocket2/bin:$PATH\n```\n\n## Evaluation\n\nWe provide pre-trained networks in the `networks` directory of this repository. The following commands assume that a GPU and a 4-core CPU are available; use `--device 'cpu'` if there is no GPU, and set the `--nworkers` parameter accordingly if fewer cores are available.\n\n* Evaluation on TOUGH-M1: \n```bash\npython $DEEPLYTOUGH/deeplytough/scripts/toughm1_benchmark.py --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar\n```\n\n* Evaluation on Vertex: \n```bash\npython $DEEPLYTOUGH/deeplytough/scripts/vertex_benchmark.py --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_vertex.pth.tar\n```\n\n* Evaluation on ProSPECCTs: \n```bash\npython $DEEPLYTOUGH/deeplytough/scripts/prospeccts_benchmark.py --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_prospeccts.pth.tar\n```\n\n* Evaluation on a custom dataset located in the `$STRUCTURE_DATA_DIR/some_custom_name` directory: \n```bash\npython $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py 
--dataset_subdir 'some_custom_name' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar\n```\nNote that the networks `deeplytough_prospeccts.pth.tar` and `deeplytough_vertex.pth.tar` may also be used, producing different results.\n\nEach of these commands writes to `$DEEPLYTOUGH/results` a CSV file with the resulting similarity scores (negative distances) as well as a pickle file with more detailed results (please see the code). These CSV files are already provided in this repository for convenience.\n\n## Training\n\nTraining requires a GPU with \u003e=11GB of memory and takes about 1.5 days on recent hardware. In addition, at least a 4-core CPU is recommended because pre-processing the volumetric input is expensive.\n\n* Training for TOUGH-M1 evaluation: \n```bash\npython $DEEPLYTOUGH/deeplytough/scripts/train.py --output_dir $DEEPLYTOUGH/results/TTTT_forTough --device 'cuda:0' --seed 4\n```\n\n* Training for Vertex evaluation:\n```bash\npython $DEEPLYTOUGH/deeplytough/scripts/train.py --output_dir $DEEPLYTOUGH/results/TTTT_forVertex --device 'cuda:0' --db_exclude_vertex 'uniprot' --db_split_strategy 'none'\n```\n\n* Training for ProSPECCTs evaluation:\n```bash\npython $DEEPLYTOUGH/deeplytough/scripts/train.py --output_dir $DEEPLYTOUGH/results/TTTT_forProspeccts --device 'cuda:0' --db_exclude_prospeccts 'uniprot' --db_split_strategy 'none' --model_config 'se_4_4_4_4_7_3_2_batch_1,se_8_8_8_8_3_1_1_batch_1,se_16_16_16_16_3_1_2_batch_1,se_32_32_32_32_3_0_1_batch_1,se_256_0_0_0_3_0_2_batch_1,r,b,c_128_1'\n```\n\nNote that due to the non-determinism inherent in training deep networks, it is nearly impossible to exactly reproduce the pre-trained networks in the `networks` directory.\n\nAlso note that, for convenience, the substring \"TTTT\" in an output directory name is replaced by the current `datetime`.\n\n## Changelog\n\n- 23.02.2020: Updated 
code to follow our revised [JCIM paper](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00554), in particular moving away from the UniProt-based splitting strategy used in our [BioRxiv](https://www.biorxiv.org/content/10.1101/600304v1) paper to a sequence-based clustering approach, whereby protein structures sharing more than 30% sequence identity are always allocated to the same set (training or testing). We have also made data pre-processing more robust and frozen the versions of several dependencies. The old code is kept in the `old_bioarxiv_version` branch, though note that the legacy splitting behavior can also be enabled in the current `master` by setting the `--db_split_strategy` command line argument in the scripts to `uniprot_folds` instead of `seqclust`.\n- 08.12.2020: pinned versions of requirements and updated the Dockerfile and README to reflect the build instructions\n- 28.09.2021: replaced conda htmd with a source build in the Dockerfile to relieve the dependency solver (patched: 2.12.2021, also added a Biopython function to remove non-protein atoms instead of VMD, which is deprecated)\n\n## License Terms\n\n(c) BenevolentAI Limited 2019. All rights reserved.\u003cbr\u003e\nFor licensing enquiries, please contact hello@benevolent.ai\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FbenevolentAI%2FDeeplyTough","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FbenevolentAI%2FDeeplyTough","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FbenevolentAI%2FDeeplyTough/lists"}