{"id":13958590,"url":"https://github.com/ProteinDesignLab/protein_seq_des","last_synced_at":"2025-07-21T00:31:19.896Z","repository":{"id":47656724,"uuid":"303550500","full_name":"ProteinDesignLab/protein_seq_des","owner":"ProteinDesignLab","description":"Code for our paper \"Protein sequence design with a learned potential\"","archived":false,"fork":false,"pushed_at":"2023-09-08T04:04:41.000Z","size":5171,"stargazers_count":77,"open_issues_count":0,"forks_count":20,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-11-28T02:34:50.580Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"nanand2/protein_seq_des","license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ProteinDesignLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-13T01:00:13.000Z","updated_at":"2024-10-15T14:25:45.000Z","dependencies_parsed_at":"2024-11-28T02:32:23.891Z","dependency_job_id":"89d44540-7c7c-43f6-a868-2ca3730acd9d","html_url":"https://github.com/ProteinDesignLab/protein_seq_des","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ProteinDesignLab/protein_seq_des","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProteinDesignLab%2Fprotein_seq_des","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProteinDesignLab%2Fprotein_seq_des/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProteinDesignLab%2Fprotein_seq_des/releases","manifests_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProteinDesignLab%2Fprotein_seq_des/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ProteinDesignLab","download_url":"https://codeload.github.com/ProteinDesignLab/protein_seq_des/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProteinDesignLab%2Fprotein_seq_des/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266221259,"owners_count":23894965,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-08T13:01:45.629Z","updated_at":"2025-07-21T00:31:15.358Z","avatar_url":"https://github.com/ProteinDesignLab.png","language":"Python","readme":"# Protein sequence design with a learned potential\n\nCode for the algorithm in our paper \n\n\u003e Namrata Anand-Achim, Raphael R. Eguchi, Alexander Derry, Russ B. Altman, and Possu Huang. 
\"Protein sequence design with a learned potential.\" bioRxiv (2020).\n\u003e [[biorxiv]](https://www.biorxiv.org/content/10.1101/2020.01.06.895466v1) [[cite]](#citation)\n\n![Model design trajectory](imgs/tim.gif)\n\nEntirely AI designed four-fold symmetric TIM-barrel\n\n## Requirements\n\n* Python 3\n* [PyTorch](https://pytorch.org)\n* [PyRosetta4](http://www.pyrosetta.org/dow)\n* Python packages in requirements.txt\n* Download pretrained models [here](https://drive.google.com/file/d/1X66RLbaA2-qTlJLlG9TI53cao8gaKnEt/view?usp=sharing) \n\nSee [here](https://github.com/nanand2/protein_seq_des/blob/master/SETUP.md) for set-up instructions on Ubuntu 18.04 with Miniconda, Python 3.7, PyTorch 1.1.0, CUDA 9.0. \n\n\n## Design\n\nIf you'd like to use the pre-trained models to run design, jump to [[this section]](#running-design)\n\n## Generating data\nData is available [here](https://drive.google.com/drive/folders/1MD-tu32SoYtZGag04HwntuxcuOnYPDXs). See the README in the drive for more information about the uploaded files. For the files used to generate the above coordinates, see the .txt files with the domain IDs (see data/train_domains_s95.txt and data/test_domain_s95.txt). These will be the inputs to regenerate the dataset. 
If you don't have PDB files downloaded, the script will download them and save them to pdb_dir.\n\nIf you'd like to generate the dataset or change the underlying data, run the following commands.\n\nTo load and save coordinates for the backbone (BB) only model:\n```\npython load_and_save_bb_coords.py --save_dir PATH_TO_SAVE_DATA --pdb_dir PATH_TO_PDB_FILES --log_dir PATH_TO_LOG_DIR --txt PATH_TO_DOMAIN_TXT_FILE\n```\n\nTo load and save coordinates for the main model:\n```\npython load_and_save_coords.py --save_dir PATH_TO_SAVE_DATA --pdb_dir PATH_TO_PDB_FILES --log_dir PATH_TO_LOG_DIR --txt PATH_TO_DOMAIN_TXT_FILE\n```\n\n## Training the models\n\nPretrained models are available [here](https://drive.google.com/file/d/1cHoyeI0H_Jo9bqgFH4z0dfx2s9as9Jp1/view?usp=sharing), but you can also use the available scripts to train from scratch.\n\nTo train the baseline model -- residue and autoregressive rotamer prediction conditioned on backbone (BB) atoms only (no side-chains):\n```\npython train_autoreg_chi_baseline.py --batchSize 4096 --workers 12 --lr 1.5e-4 --validation_frequency 100 --save_frequency 1000 --log_dir PATH_TO_LOG_DIR --data_dir PATH_TO_DATA\n```\n\nTo train the main model -- residue and autoregressive rotamer prediction conditioned on neighboring side-chains:\n```\npython train_autoreg_chi.py --batchSize 2048 --workers 12 --lr 7.5e-5 --validation_frequency 200 --save_frequency 2000 --log_dir PATH_TO_LOG_DIR --data_dir PATH_TO_DATA\n```\nNote that training was originally done across 8 V100 GPUs with DataParallel mode.\n\n\n  \n## Running design\n\nTo run a design trajectory, specify the starting backbone with an input PDB. 
\n\n```\npython run.py --pdb pdbs/3mx7_gt.pdb\n```\n\nTo run a rotamer repacking trajectory with the model, set the repack-only option\n```\npython run.py --pdb pdbs/3mx7_gt.pdb --repack_only 1\n```\n\nTo specify k-fold symmetry in design or packing, set the symmetry options \n```\npython run.py --pdb pdbs/tim10.pdb --symmetry 1 --k 4 [--repack_only 1]\n```\n\nTo constrain a subset of positions to remain fixed, point to a txt file with fixed residue indices, for example\n```\npython run.py --pdb pdbs/tim10.pdb --fixed_idx txt/test_idx.txt\n```\n\nAnd to constrain a subset of positions to be designed, keeping all others fixed, point to a txt file with variable residue indices, for example\n```\npython run.py --pdb pdbs/tim10.pdb --var_idx txt/test_idx.txt\n```\n\nSee [below](#design-parameters) for additional design parameters.\n  \n## Monitoring metrics\nDesign metrics can be monitored using TensorBoard\n\n```\ntensorboard --logdir='./logs'\n```\n\nNote that the input PDB sequence and rotamers are considered 'ground-truth' for sequence and rotamer recovery metrics.\n\n\n\n## Design parameters\n\n* Design inputs\n```\n  --pdb              Path to input PDB\n  --model_list       Paths to conditional models. (Default: ['models/conditional_model_0.pt', \n                     'models/conditional_model_0.pt', 'models/conditional_model_1.pt', \n                     'models/conditional_model_2.pt', 'models/conditional_model_3.pt'])\n  --init_model       Path to baseline model for sequence initialization.\n                     (Default: 'models/baseline_model.pt')\n```\n* Saving / logging\n```\n  --log_dir             Path to desired output log folder for designed\n                        structures.  (Default: ./logs)\n  --seed                Random seed. 
Design runs are non-deterministic.\n                        (Default: 2)\n  --save_rate           How often to save intermediate designed structures\n                        (Default: 10)\n\n```\n* Sequence initialization\n```\n  --randomize {0,1}     Randomize starting sequence/rotamers for design.\n                        Toggle to 0 to keep starting sequence and rotamers.\n                        (Default: 1)\n  --no_init_model {0,1} Do not use baseline model to predict initial sequence/rotamers.\n                        (Default: 0)\n  --ala {0,1}           Initialize sequence with poly-alanine. (Default: 0)\n  --val {0,1}           Initialize sequence with poly-valine. (Default: 0)\n```\n* Rotamer repacking parameters\n```\n  --repack_only {0,1}   Only run rotamer repacking.  (Default: 0)\n  --use_rosetta_packer {0,1}\n                        Use the Rosetta packer instead of the model for\n                        rotamer repacking during design.  If in symmetry \n                        mode, rotamers are not packed symmetrically. (Default: 0)\n  --pack_radius         Radius in angstroms for Rosetta rotamer packing after\n                        residue mutation. Must set --use_rosetta_packer 1\n                        (Default: 0)\n```\n* Design parameters\n```\n  --symmetry {0,1}      Enforce symmetry during design (Default: 0)\n  --k                   Enforce k-fold symmetry. Input pose length must be\n                        divisible by k. Requires --symmetry 1 (Default: 4)\n  --restrict_gly {0,1}  Enforce no glycines for non-loop backbone positions\n                        based on DSSP assignment. (Default: 1)\n  --no_cys {0,1}        Enforce no cysteines in design (Default: 0)\n  --no_met {0,1}        Enforce no methionines in design (Default: 0)\n  --var_idx             Path to txt file listing pose indices that should be\n                        designed/packed, all other side-chains will remain\n                        fixed. 
Cannot be specified if fixed_idx file given. \n                        Not supported with symmetry mode. 0-indexed\n  --fixed_idx           Path to txt file listing pose indices that should NOT\n                        be designed/packed, all other side-chains will be\n                        designed/packed. Cannot be specified if var_idx file given. \n                        Not supported with symmetry mode. 0-indexed \n  --resfile             Enforce resfile on particular residues. 0-indexed\n```\n\nLearn more about [resfile](https://github.com/ProteinDesignLab/protein_seq_des/tree/master/seq_des/util).\n\n* Sampling / optimization parameters\n```\n  --anneal {0,1}        Option to do simulated annealing of average negative\n                        model pseudo-log-likelihood. Toggle to 0 to do vanilla\n                        blocked sampling (Default: 1)\n  --step_rate           Multiplicative step rate for simulated annealing (Default: 0.995)\n  --anneal_start_temp   Starting temperature for simulated annealing (Default: 1)\n  --anneal_final_temp   Final temperature for simulated annealing (Default: 0)\n  --n_iters             Total number of iterations (Default: 2500)\n  --threshold           Threshold in angstroms for defining conditionally\n                        independent residues for blocked sampling (should be\n                        greater than ~17.3) (Default: 20)\n```\n\nAdditional information\n* Code expects single-chain PDB input.\n* Specifying fixed/variable indices is not currently supported in symmetry mode.\n* Model rotamer packing in symmetry mode does symmetric rotamer packing, but using the Rosetta packer does not.\n\n## Citation\nIf you find our work relevant to your research, please cite:\n```\n@article{anand2020protein,\n  title={Protein sequence design with a learned potential},\n  author={Anand, Namrata and Eguchi, Raphael Ryuichi and Derry, Alexander and Altman, Russ B and Huang, Possu},\n  journal={bioRxiv},\n  year={2020},\n  
publisher={Cold Spring Harbor Laboratory}\n}\n```\n","funding_links":[],"categories":["Protein structure"],"sub_categories":["Web services_Other"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FProteinDesignLab%2Fprotein_seq_des","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FProteinDesignLab%2Fprotein_seq_des","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FProteinDesignLab%2Fprotein_seq_des/lists"}