# DeepFRI
Deep functional residue identification
<img src="figs/pipeline.png">

## Citing
```
@article {Gligorijevic2019,
    author = {Gligorijevic, Vladimir and Renfrew, P. Douglas and Kosciolek, Tomasz and Leman,
    Julia Koehler and Cho, Kyunghyun and Vatanen, Tommi and Berenberg, Daniel
    and Taylor, Bryn and Fisk, Ian M. and Xavier, Ramnik J. and Knight, Rob and Bonneau, Richard},
    title = {Structure-Based Function Prediction using Graph Convolutional Networks},
    year = {2019},
    doi = {10.1101/786236},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2019/10/04/786236},
    journal = {bioRxiv}
}
```

## Dependencies

*DeepFRI* is tested to work under Python 3.7.

The required dependencies for *DeepFRI* are [TensorFlow](https://www.tensorflow.org/), [Biopython](https://biopython.org/) and [scikit-learn](http://scikit-learn.org/).
To install all dependencies, run:

```
pip install .
```


# Protein function prediction
To predict protein functions, use the `predict.py` script with the following options:

* `seq`             str, Protein sequence as a string
* `cmap`            str, Name of a file storing a protein contact map and sequence in `*.npz` file format (with the following numpy array variables: `C_alpha`, `seqres`; see `examples/pdb_cmaps/`)
* `pdb`             str, Name of a PDB file (cleaned)
* `pdb_dir`         str, Directory with cleaned PDB files (see `examples/pdb_files/`)
* `cmap_csv`        str, Filename of the catalogue (in `*.csv` file format) containing the mapping between protein names and the directory with `*.npz` files (see `examples/catalogue_pdb_chains.csv`)
* `fasta_fn`        str, FASTA filename (see `examples/pdb_chains.fasta`)
* `model_config`    str, JSON file with model filenames (see `trained_models/`)
* `ont`             str, Ontology (`mf` - Molecular Function, `bp` - Biological Process, `cc` - Cellular Component, `ec` - Enzyme Commission)
* `output_fn_prefix`   str, Output filename prefix (the same prefix is used for the prediction and saliency files)
* `verbose`         bool, Whether or not to print function prediction results
* `saliency`        bool, Whether or not to compute class activation maps (outputs a `*.json` file)

Generated files (see `examples/outputs/`):
* `output_fn_prefix_MF_predictions.csv`   Predictions in the `*.csv` file format with columns: Protein, GO-term/EC-number, Score, GO-term/EC-number name
* `output_fn_prefix_MF_pred_scores.json`   Predictions in the `*.json` file format with keys: `pdb_chains`, `Y_hat`, `goterms`, `gonames`
* `output_fn_prefix_MF_saliency_maps.json` JSON file storing a dictionary of saliency maps for each predicted function of every protein

DeepFRI supports six ways of providing input for function prediction. See the examples below.

## Option 1: predicting functions of a protein from its contact map

Example: predicting MF-GO terms for Parvalbumin alpha protein using its sequence and contact map (PDB: [1S3P](https://www.rcsb.org/structure/1S3P)):

```
>> python predict.py --cmap ./examples/pdb_cmaps/1S3P-A.npz -ont mf --verbose
```

### Output:

```txt
Protein GO-term/EC-number Score GO-term/EC-number name
query_prot GO:0005509 0.99824 calcium ion binding
```

## Option 2: predicting functions of a protein from its sequence

Example: predicting MF-GO terms for Parvalbumin alpha protein using its sequence (PDB: [1S3P](https://www.rcsb.org/structure/1S3P)):

```
>> python predict.py --seq 'SMTDLLSAEDIKKAIGAFTAADSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKDGFIDEDELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES' -ont mf --verbose
```

### Output:

```txt
Protein GO-term/EC-number Score GO-term/EC-number name
query_prot GO:0005509 0.99769 calcium ion binding
```

## Option 3: predicting functions of proteins from a FASTA file

```
>> python predict.py --fasta_fn examples/pdb_chains.fasta -ont mf -v
```

### Output:

```txt
Protein GO-term/EC-number Score GO-term/EC-number name
1S3P-A GO:0005509 0.99769 calcium ion binding
2J9H-A GO:0004364 0.46937 glutathione transferase activity
2J9H-A GO:0016765 0.19910 transferase activity, transferring alkyl or aryl (other than methyl) groups
2J9H-A GO:0097367 0.10537 carbohydrate derivative binding
2PE5-B GO:0003677 0.53502 DNA binding
2W83-E GO:0032550 0.99260 purine ribonucleoside binding
2W83-E GO:0001883 0.99242 purine nucleoside binding
2W83-E GO:0005525 0.99231 GTP binding
2W83-E GO:0019001 0.99222 guanyl nucleotide binding
2W83-E GO:0032561 0.99194 guanyl ribonucleotide binding
2W83-E GO:0032549 0.99149 ribonucleoside binding
2W83-E GO:0001882 0.99135 nucleoside binding
2W83-E GO:0017076 0.98687 purine nucleotide binding
2W83-E GO:0032555 0.98641 purine ribonucleotide binding
2W83-E GO:0035639 0.98611 purine ribonucleoside triphosphate binding
2W83-E GO:0032553 0.98573 ribonucleotide binding
2W83-E GO:0097367 0.98168 carbohydrate derivative binding
2W83-E GO:0003924 0.52355 GTPase activity
2W83-E GO:0016817 0.36863 hydrolase activity, acting on acid anhydrides
2W83-E GO:0016818 0.36683 hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides
2W83-E GO:0017111 0.35465 nucleoside-triphosphatase activity
2W83-E GO:0016462 0.35303 pyrophosphatase activity
```

## Option 4: predicting functions of proteins from contact map catalogue

```
>> python predict.py --cmap_csv examples/catalogue_pdb_chains.csv -ont mf -v
```

### Output:

```txt
Protein GO-term/EC-number Score GO-term/EC-number name
1S3P-A GO:0005509 0.99824 calcium ion binding
2J9H-A GO:0004364 0.84826 glutathione transferase activity
2J9H-A GO:0016765 0.82014 transferase activity, transferring alkyl or aryl (other than methyl) groups
2PE5-B GO:0003677 0.89086 DNA binding
2PE5-B GO:0017111 0.12892 nucleoside-triphosphatase activity
2PE5-B GO:0004386 0.12847 helicase activity
2PE5-B GO:0032553 0.12091 ribonucleotide binding
2PE5-B GO:0097367 0.11961 carbohydrate derivative binding
2PE5-B GO:0016887 0.11331 ATPase activity
2W83-E GO:0097367 0.97069 carbohydrate derivative binding
2W83-E GO:0019001 0.96842 guanyl nucleotide binding
2W83-E GO:0017076 0.96737 purine nucleotide binding
2W83-E GO:0001882 0.96473 nucleoside binding
2W83-E GO:0035639 0.96439 purine ribonucleoside triphosphate binding
2W83-E GO:0032555 0.96294 purine ribonucleotide binding
2W83-E GO:0016818 0.96181 hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides
2W83-E GO:0032550 0.96142 purine ribonucleoside binding
2W83-E GO:0016817 0.96082 hydrolase activity, acting on acid anhydrides
2W83-E GO:0016462 0.95998 pyrophosphatase activity
2W83-E GO:0032553 0.95935 ribonucleotide binding
2W83-E GO:0032561 0.95930 guanyl ribonucleotide binding
2W83-E GO:0032549 0.95877 ribonucleoside binding
2W83-E GO:0003924 0.95453 GTPase activity
2W83-E GO:0001883 0.95271 purine nucleoside binding
2W83-E GO:0005525 0.94635 GTP binding
2W83-E GO:0017111 0.93942 nucleoside-triphosphatase activity
2W83-E GO:0044877 0.64519 protein-containing complex binding
2W83-E GO:0001664 0.31413 G protein-coupled receptor binding
2W83-E GO:0005102 0.20078 signaling receptor binding
```

## Option 5: predicting functions of a protein from a PDB file
```
>> python predict.py -pdb ./examples/pdb_files/1S3P-A.pdb -ont mf -v
```

### Output:

```txt
Protein GO-term/EC-number Score GO-term/EC-number name
query_prot GO:0005509 0.99824 calcium ion binding
```

## Option 6: predicting functions of a protein from a directory with PDB files
```
>> python predict.py --pdb_dir ./examples/pdb_files -ont mf --saliency --use_backprop
```

### Output:

See files in: `examples/outputs/`


# Training DeepFRI
To list the training options for *DeepFRI*, run the following command from the project directory:
```
>> python train_DeepFRI.py -h
```

or launch training jobs with the provided script:
```
>> ./run_train_DeepFRI.sh
```

## Output
Generated files:
* `model_name_prefix_ont_model.hdf5`   trained model with architecture and weights saved in HDF5 format
* `model_name_prefix_ont_pred_scores.pckl` pickle file with predicted GO term/EC number scores for test proteins
* `model_name_prefix_ont_model_params.json` JSON file with metadata (GO terms/names, architecture params, etc.)

See examples of pre-trained models (`*.hdf5`) and model params (`*.json`) in: `trained_models/`.


# Functional residue identification
To visualize class activation (saliency) maps, use the `viz_gradCAM.py` script with the following options:

* `saliency_fn` str, JSON filename with saliency maps generated by the `predict.py` script (see Option 6 above)
* `list_all`    bool, list all proteins and their predicted GO terms with corresponding class activation (saliency) maps
* `protein_id`  str, protein (PDB chain) whose saliency maps are to be visualized for each predicted function
* `go_id`       str, GO term whose saliency maps are to be visualized
* `go_name`     str, GO term name whose saliency maps are to be visualized

Generated files:
* `saliency_fig_PDB-chain_GOterm.png`  class activation (saliency) map profile over the sequence (see figure below, right)
* `pymol_viz.py` PyMOL script for mapping salient residues onto the 3D structure (PyMOL output is shown in the figure below, left)

## Example:

```
>>> python viz_gradCAM.py -i ./examples/outputs/DeepFRI_MF_saliency_maps.json -p 1S3P-A -go GO:0005509
```

### Output:
<img src="figs/saliency.png">


# Data

The training and validation data used to train the DeepFRI model are provided as TensorFlow `TFRecord` files and can be downloaded from:

| PDB | SWISS-MODEL |
| --- | --- |
| [Gene Ontology](https://users.flatironinstitute.org/~renfrew/DeepFRI_data/PDB-GO.tar.gz) (19 GB) | [Gene Ontology](https://users.flatironinstitute.org/~renfrew/DeepFRI_data/SWISS-MODEL-GO.tar.gz) (165 GB) |
| [Enzyme Commission](https://users.flatironinstitute.org/~renfrew/DeepFRI_data/PDB-EC.tar.gz) (13 GB) | [Enzyme Commission](https://users.flatironinstitute.org/~renfrew/DeepFRI_data/SWISS-MODEL-EC.tar.gz) (117 GB) |

# Pretrained models

Pretrained models can be downloaded from:
* [Models](https://users.flatironinstitute.org/~renfrew/DeepFRI_data/trained_models.tar.gz) (use these models if you run DeepFRI on GPU)
* [Newest Models](https://users.flatironinstitute.org/~renfrew/DeepFRI_data/newest_trained_models.tar.gz) (use these models if you run DeepFRI on CPU)

Uncompress the `tar.gz` file into the DeepFRI directory (`tar xvzf trained_models.tar.gz -C /path/to/DeepFRI`).
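If GNU `tar` is not available, the pretrained-model archive can also be unpacked from Python. This is a minimal sketch using only the standard-library `tarfile` module; the function name `extract_models` is our own (hypothetical), and the check that rejects member paths escaping the destination directory is an extra precaution, not a DeepFRI requirement.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_models(archive: str, dest: str) -> List[str]:
    """Hypothetical helper mirroring `tar xvzf <archive> -C <dest>`.

    Returns the names of the extracted archive members.
    """
    dest_path = Path(dest).resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for m in members:
            target = (dest_path / m.name).resolve()
            # Extra precaution (not required by DeepFRI): refuse "../" or absolute paths.
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {m.name}")
        tar.extractall(dest_path)
        return [m.name for m in members]
```

For example: `extract_models("trained_models.tar.gz", "/path/to/DeepFRI")`.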
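For Option 3 (`--fasta_fn`), it can be useful to list the records a FASTA file contains before running `predict.py`. The reader below is a hypothetical helper, `read_fasta`, that keys each sequence by the first word of its header line; we have not checked that `predict.py` names proteins the same way. Biopython, already a DeepFRI dependency, provides a full parser via `Bio.SeqIO.parse(path, "fasta")`.

```python
from typing import Dict, Optional


def read_fasta(path: str) -> Dict[str, str]:
    """Hypothetical minimal FASTA reader: record ID (first header word) -> sequence."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line:
                continue  # ignore blank lines between records
            if line.startswith(">"):
                # ASSUMPTION: the ID is the first whitespace-separated word after '>'.
                words = line[1:].split()
                current = words[0] if words else ""
                records[current] = ""
            elif current is not None:
                records[current] += line  # sequences may wrap over several lines
    return records
```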
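The `cmap` option expects an `*.npz` file holding `C_alpha` and `seqres` arrays. As a sketch of producing such a file for your own structure, the hypothetical helper `write_cmap_npz` below takes Cα coordinates you have already extracted (for example with Biopython) and computes pairwise distances with NumPy. It assumes that `C_alpha` holds the full L × L matrix of Cα-Cα distances in ångströms and that `seqres` holds the one-letter sequence; compare the result with the files in `examples/pdb_cmaps/` before relying on it.

```python
import numpy as np


def write_cmap_npz(path, ca_coords, sequence):
    """Hypothetical helper: save `C_alpha` and `seqres` arrays to an .npz file.

    ASSUMPTION: `C_alpha` is an L x L matrix of C-alpha distances (angstroms),
    not a binary contact map. Verify against examples/pdb_cmaps/.
    """
    ca = np.asarray(ca_coords, dtype=np.float64)  # shape (L, 3)
    if ca.ndim != 2 or ca.shape[1] != 3 or len(ca) != len(sequence):
        raise ValueError("need one (x, y, z) coordinate per residue")
    diff = ca[:, None, :] - ca[None, :, :]        # shape (L, L, 3)
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # shape (L, L)
    np.savez_compressed(path, C_alpha=dist, seqres=np.array(sequence))
    return dist
```

A binary contact map can be derived by thresholding the returned matrix, e.g. `dist < 10.0`; the 10 Å cutoff here is illustrative and not taken from this README.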
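The `*_predictions.csv` file described under *Protein function prediction* holds one (Protein, GO-term/EC-number, Score, name) row per prediction. As a sketch for post-filtering it, the hypothetical `load_predictions` below groups rows by protein and keeps scores at or above a cutoff. It assumes comma-separated rows in that column order, and it skips blank rows, rows starting with `#` and the header row (whose first cell is assumed to be `Protein`); the default cutoff of 0.5 is arbitrary.

```python
import csv
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[str, float, str]  # (term, score, name)


def load_predictions(path: str, min_score: float = 0.5) -> Dict[str, List[Hit]]:
    """Hypothetical helper: protein -> hits with score >= min_score, best first."""
    hits: Dict[str, List[Hit]] = defaultdict(list)
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            # ASSUMPTION: comment lines start with '#', header's first cell is 'Protein'.
            if not row or row[0].startswith("#") or row[0] == "Protein":
                continue
            protein, term, score = row[0], row[1], float(row[2])
            name = ",".join(row[3:])  # re-join in case the name had unquoted commas
            if score >= min_score:
                hits[protein].append((term, score, name))
    return {p: sorted(v, key=lambda h: h[1], reverse=True) for p, v in hits.items()}
```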
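The `*_pred_scores.json` file stores the raw scores under the keys `pdb_chains`, `Y_hat`, `goterms` and `gonames`. A sketch for pulling the top-scoring terms per chain follows; the helper name `top_terms` is hypothetical, and the layout of `Y_hat` (one row per entry of `pdb_chains`, one column per entry of `goterms`/`gonames`) is our assumption rather than something this README states.

```python
import json
from typing import Dict, List, Tuple


def top_terms(path: str, k: int = 3) -> Dict[str, List[Tuple[str, str, float]]]:
    """Hypothetical helper: chain -> k highest-scoring (term, name, score) triples."""
    with open(path) as fh:
        data = json.load(fh)
    result = {}
    # ASSUMPTION: Y_hat[i][j] is the score of goterms[j] for pdb_chains[i].
    for chain, scores in zip(data["pdb_chains"], data["Y_hat"]):
        triples = zip(data["goterms"], data["gonames"], scores)
        result[chain] = sorted(triples, key=lambda t: t[2], reverse=True)[:k]
    return result
```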
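Under *Functional residue identification*, `viz_gradCAM.py` already writes a `pymol_viz.py` script. For a quick ad hoc selection instead, the hypothetical helper below turns a per-residue saliency profile into a PyMOL `select` command. The input layout (one value per residue in sequence order, numbered from `first_resi`) and the cutoff of half the maximum saliency are assumptions for illustration, not DeepFRI's own procedure.

```python
from typing import List, Sequence


def salient_selection(saliency: Sequence[float], chain: str = "A",
                      frac: float = 0.5, first_resi: int = 1) -> str:
    """Hypothetical helper: PyMOL selection of residues with saliency >= frac * max."""
    if len(saliency) == 0:
        raise ValueError("empty saliency profile")
    cutoff = frac * max(saliency)
    # ASSUMPTION: saliency[i] belongs to residue number first_resi + i.
    resi: List[str] = [str(first_resi + i)
                       for i, s in enumerate(saliency) if s >= cutoff]
    return "select salient, chain {} and resi {}".format(chain, "+".join(resi))
```

For example, `salient_selection([0.1, 0.9, 0.6, 0.2])` returns `"select salient, chain A and resi 2+3"`, which can be pasted into the PyMOL command line.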