{"id":44964496,"url":"https://github.com/bioinfomachinelearning/cryo2strut2","last_synced_at":"2026-02-18T14:09:45.512Z","repository":{"id":278103674,"uuid":"929889335","full_name":"BioinfoMachineLearning/Cryo2Strut2","owner":"BioinfoMachineLearning","description":"The second version of Cryo2Struct for reconstructing protein structures from cryo-EM density maps","archived":false,"fork":false,"pushed_at":"2025-02-18T20:29:24.000Z","size":4081,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-09T16:34:16.302Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BioinfoMachineLearning.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-09T16:32:38.000Z","updated_at":"2025-06-10T07:57:14.000Z","dependencies_parsed_at":"2025-02-18T02:48:39.946Z","dependency_job_id":null,"html_url":"https://github.com/BioinfoMachineLearning/Cryo2Strut2","commit_stats":null,"previous_names":["bioinfomachinelearning/cryo2strut2"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/BioinfoMachineLearning/Cryo2Strut2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2FCryo2Strut2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2FCryo2Strut2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2FCryo2Strut2/releases","manifes
ts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2FCryo2Strut2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BioinfoMachineLearning","download_url":"https://codeload.github.com/BioinfoMachineLearning/Cryo2Strut2/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2FCryo2Strut2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29581628,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T13:56:48.962Z","status":"ssl_error","status_checked_at":"2026-02-18T13:54:34.145Z","response_time":162,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-18T14:09:41.271Z","updated_at":"2026-02-18T14:09:45.504Z","avatar_url":"https://github.com/BioinfoMachineLearning.png","language":"Python","readme":"# Cryo2Struct2\n\n\nCryo2Struct2 is a fully automated method for modeling 3D atomic structures from cryo-EM density maps, building on its predecessor, Cryo2Struct. It employs a multi-task deep learning model that integrates sequence-based features from a Protein Language Model (ESM) with cryo-EM density maps, merging feature representation across modalities. 
The predicted voxels are then used to construct a Hidden Markov Model (HMM), and a customized Viterbi algorithm aligns the sequences to generate initial protein backbone structures. These backbone models are used as templates for AlphaFold3, which further refines the structures for improved accuracy. By integrating cryo-EM data with AlphaFold3 predictions, Cryo2Struct2 improves structure refinement and helps AlphaFold3 predict accurate structures.\n\n## Setup Environment (Locally)\nTo set up Cryo2Struct2 locally, follow the steps below. It takes about 3-7 minutes to set up the environment to run Cryo2Struct2.\n\nClone this repository and `cd` into it:\n```\ngit clone https://github.com/BioinfoMachineLearning/Cryo2Strut2.git\ncd ./Cryo2Strut2\n```\n\nWe will set up the environment using Anaconda. This is an example of setting up a conda environment to run the code. Use the following command to create the conda environment using the ``cryo2struct2.yml`` file.\n\n```\nconda env create -f cryo2struct2.yml\nconda activate cryo2struct2\n```\n\n## Atomic structure modeling using Cryo2Struct2\n\n1. \u003cins\u003e**Input**\u003c/ins\u003e: **cryo-EM density map and sequence** : First, you need to prepare your own data or use our provided example data. The directory should be organized as follows:\n```text \ncryo2struct\n|── input\n    │── 34610\n        │-- emd_34610.map\n        |-- 8hb0.fasta\n        |-- 8hb0.pdb\n```\nThe `emd_34610.map` file is the density map with EMD ID 34610, downloaded from the EMDB website. The `8hb0.fasta` file is the corresponding sequence file.  \n\nThe `8hb0.pdb` file is a PDB structure file used in this test example to generate embeddings using ESM. Alternatively, users can use the `8hb0.fasta` file to generate embeddings from ESM.\n\nThe first step is to make the input cryo-EM map ready for Cryo2Struct2. 
We run [UCSF ChimeraX](https://www.cgl.ucsf.edu/chimerax/index.html) in non-GUI mode to resample the density map to 1 Angstrom, so please install it to preprocess the map. We used ChimeraX 1.4-1 on a CentOS 8 system. Once ChimeraX is installed, run the following:\n\n```\nbash preprocess/run_data_preparation.bash input/\n```\nIn the above example, ``input/`` is the absolute path of the directory where the maps are present.\n\n**Note**: For this example, the normalized map is provided, so there is no need to run the above bash command to prepare the map. Hence, the directory structure for this example looks like this:\n\n```text \ncryo2struct\n|── input\n    │── 34610\n        │-- emd_34610.map\n        |-- emd_normalized_map.mrc\n        |-- 8hb0.fasta\n        |-- 8hb0.pdb\n```\n2. \u003cins\u003e**Set Up ESM**\u003c/ins\u003e:\nSet up ESM on your system following the instructions provided at https://github.com/facebookresearch/esm . The `esm.pretrained` model we used is `esm2_t36_3B_UR50D()`. Change the path of the saved ESM model in [utils/grid_division.py](utils/grid_division.py).\n\n3. \u003cins\u003e**Running Cryo2Struct2**\u003c/ins\u003e:\nThe deep learning step requires trained atom and amino acid type models. The trained models are available in the [Cryo2Struct2 Harvard Dataverse](https://doi.org/10.7910/DVN/YYHWZO). Use the following to download the trained models. 
\n\n```\ncd models\nwget -O amino_acid_type.ckpt https://dataverse.harvard.edu/api/access/datafile/10888677\nwget -O atom_type.ckpt https://dataverse.harvard.edu/api/access/datafile/10888678\ncd ..\n```\n\nAfter downloading the models, the directory should be organized as follows:\n```text \ncryo2struct\n|── input\n    │── 34610\n        │-- emd_34610.map\n        |-- emd_normalized_map.mrc\n        |-- 8hb0.fasta\n        |-- 8hb0.pdb\n|── models\n    │-- amino_acid_type.ckpt\n    |-- atom_type.ckpt\n    |-- aa_regression_model.pkl\n    |-- ca_regression_model.pkl\n```\n\nUpdate the configurations in the [config/arguments.yml](config/arguments.yml) file, especially the input data directory, trained model checkpoint path, and density map name. By default, the program runs inference on the `CPU`; running it on the ``GPU`` speeds up prediction. To enable ``GPU`` processing, set ``infer_run_on`` in the configuration file to ``gpu`` and provide the GPU device ID in ``infer_on_gpu`` (example: 0). One way to update the configuration is with the visual editor (``vi``):\n\n```\nvi config/arguments.yml\n```\n\n\u003cins\u003e**Compile Modified Viterbi algorithm:**\u003c/ins\u003e\nThe Hidden Markov Model-guided carbon-alpha alignment programs are available in [viterbi/](viterbi/). The alignment algorithm is written in C++, so compile it using: \n\n```\ncd viterbi\ng++ -fPIC -shared -o viterbi.so viterbi.cpp -O3\ncd ..\n```\nDuring compilation, if the system asks for installation of the `gcc-c++` package, install it following the instructions. 
The GCC C++ compiler is required to compile `viterbi.cpp`.\n\nIf the compilation of the program fails due to library issues (which typically occurs when attempting to compile on older systems), you can try compiling using the following approach:\n```\ncd viterbi\nconda install -c conda-forge gxx\ng++ -fPIC -shared -o viterbi.so viterbi.cpp -O3\ncd ..\n```\nThe `conda install` command above installs the ``gxx`` package in the activated conda environment, which provides the GCC C++ compiler needed to compile C++ code on the system. The HMM alignment program runs on the ``CPU`` and is compiled with aggressive optimization using the ``-O3`` flag. We tested the above compilation successfully on CentOS 7 and 8 and on AlmaLinux OS 8.8 and 8.9. \n\nFinally, run the following:\n\n```\npython3 cryo2struct2.py --density_map_name 34610\n```\n\n4. \u003cins\u003e**Output**\u003c/ins\u003e:  **Modeled atomic structure**\nThe output model will be saved in the density map's directory. \n\n\n5. \u003cins\u003e**Integrating Cryo2Struct2 Models as Templates for AlphaFold3**\u003c/ins\u003e: \nThe models generated by Cryo2Struct2 are used as templates for AlphaFold3. Use the provided script [prepare_script_af3_multichain_multi_template.py](prepare_script_af3_multichain_multi_template.py) to generate `.json` files that will be used as input to run AlphaFold3.\n\n\n\n6. \u003cins\u003e**Set up AlphaFold3**\u003c/ins\u003e: \nRequest the AlphaFold3 parameters and follow the setup instructions at https://github.com/google-deepmind/alphafold3 .\n\n\n7. \u003cins\u003e**Run AlphaFold3**\u003c/ins\u003e: \nUse the script [run_af3_docker_all.py](run_af3_docker_all.py) to run AlphaFold3 and predict structures.\n\n\n## Training Cryo2Struct2 Deep Learning\nThe training programs are available in the [train/](train/) directory. Cryo2Struct2 was trained on Cryo2StructData, which is accessible on the [Cryo2StructData Dataverse](https://doi.org/10.7910/DVN/FCDG0W). 
Download the full dataset from [Cryo2Struct Full Dataset](https://doi.org/10.7910/DVN/FCDG0W) or a small subset from [Cryo2Struct Small Subsample Dataset](https://doi.org/10.7910/DVN/CGUENL). After downloading the dataset, `unzip` the compressed files. Each directory is named after the EMD ID of its cryo-EM density map.\n\nThe dataset contains preprocessed maps ready for deep learning training. However, the labels for each cryo-EM density map still need to be prepared. Run the following:\n\n\n```\npython3 label/get_atoms_label.py density_map_directory\npython3 label/get_amino_labels.py density_map_directory\n```\n\nThe `density_map_directory` is the absolute path of the directory where the unzipped cryo-EM density maps are present. The above scripts generate the atom and amino acid-type labels, which are used during the training of the deep learning model.\n\nSplit the data into training and validation sets. If you choose to use our predefined training and validation splits, refer to the Excel sheet in [Cryo2StructData Metadata](https://doi.org/10.7910/DVN/JMN60H), which contains the IDs for the training and validation cryo-EM density maps. Create separate directories for training and validation, and move the corresponding data to each directory.\n\n\nGenerate sub-grids of the cryo-EM density maps in the training and validation datasets. These sub-grids are used for training the model. Run the following:\n\n```\npython3 train/grid_division_train.py train_map_directory train_sub_grids\npython3 train/grid_division_train.py valid_map_directory valid_sub_grids\n```\n\nThe `train_map_directory` is the directory containing training cryo-EM density maps, and `train_sub_grids` is the directory where the training sub-grids will be generated. Similarly, `valid_map_directory` is the directory containing validation cryo-EM density maps, and `valid_sub_grids` is the directory where the validation sub-grids will be generated. 
After generating the sub-grids, run:\n\n```\nls train_sub_grids \u003e train_splits.txt\nls valid_sub_grids \u003e valid_splits.txt\n```\n\nWe used the distributed data parallel (DDP) technique to train the models on 24 compute nodes, each equipped with 6 NVIDIA V100 GPUs with 32 GB of memory. The training program can run on a single GPU, multiple GPUs, or a multi-node cluster with multiple GPUs. Finally, in the training script [train/train.py](train/train.py), set `AVAIL_GPUS` to the number of GPUs available in the compute node, `NUM_NODES` to the number of available compute nodes, `BATCH_SIZE` to the batch size, and `DATASET_DIR` to the path of the Cryo2Struct directory. Then, train the model by running:\n\n```\npython3 train/train.py    # trains both the amino acid-type and atom prediction models\n```\nMonitor the training progress in [Weights and Biases](https://wandb.ai/site).\n\n\nOptional: The source code for data preprocessing, label generation, and validation of training data is available at the [Cryo2StructData GitHub repository](https://github.com/BioinfoMachineLearning/cryo2struct).\n\n\n## Contact Information\nIf you have any questions, feel free to open an issue or reach out to us: [ngzvh@missouri.edu](mailto:ngzvh@missouri.edu), [chengji@missouri.edu](mailto:chengji@missouri.edu).\n\n## Acknowledgements\nWe thank the High-Performance Computing (HPC) resource, Hellbender, located at the University of Missouri, Columbia, MO, which was used for training, inference, and alignment.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbioinfomachinelearning%2Fcryo2strut2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbioinfomachinelearning%2Fcryo2strut2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbioinfomachinelearning%2Fcryo2strut2/lists"}