<div align=center>
<img src="./docs/parafoldlogo.png" width="400" >
</div>

# ParaFold

Author: Bozitao Zhong - zbztzhz@gmail.com

Homepage: https://parafold.sjtu.edu.cn

:bookmark_tabs: Please cite our [paper](https://arxiv.org/abs/2111.06340) if you use ParaFold (ParallelFold) in your research.

## Overview

Recent change: **ParaFold now supports AlphaFold 2.3.1**

This project is a modified version of DeepMind's [AlphaFold2](https://github.com/deepmind/alphafold) for high-throughput protein structure prediction.

ParaFold makes the following modification to the original AlphaFold pipeline:

- Divide the **CPU part** (MSA and template searching) from the **GPU part** (prediction model)

## How to install

We recommend installing AlphaFold locally rather than using **Docker**.

```bash
# clone this repo and enter it
git clone https://github.com/Zuricho/ParallelFold.git
cd ParallelFold

# Create and activate a miniconda environment for ParaFold/AlphaFold.
# Python 3.8 is recommended: versions older than 3.7 are missing packages,
# and versions newer than 3.8 have not been tested.
conda create -n parafold python=3.8
conda activate parafold

pip install py3dmol
# openmm 7.7 is recommended (the original AlphaFold uses 7.5.1, which is no longer supported)
conda install -c conda-forge openmm=7.7 pdbfixer

# use pip3 to install most of the packages
pip3 install -r requirements.txt

# install CUDA and cuDNN
# cudatoolkit 11.3.1 matches cuDNN 8.2.1
conda install cudatoolkit=11.3 cudnn

# downgrade jax/jaxlib to the versions that match the CUDA and cuDNN versions above
pip3 install --upgrade --no-cache-dir jax==0.3.25 jaxlib==0.3.25+cuda11.cudnn82 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

# install the tools for multiple sequence alignment
conda install -c bioconda hmmer=3.3.2 hhsuite=3.3.0 kalign2=2.04

chmod +x run_alphafold.sh
```

## Download the sequence database

You can download the databases with the [download script from the AlphaFold repository](https://github.com/google-deepmind/alphafold/blob/main/scripts/download_all_data.sh). The `data` directory should have the following structure. If your databases were downloaded for an older version of AlphaFold, some of them may be outdated; update them following the instructions in the AlphaFold repository.

```text
$DOWNLOAD_DIR/                             # Total: ~ 2.62 TB (download: 556 GB)
    bfd/                                   # ~ 1.8 TB (download: 271.6 GB)
        # 6 files.
    mgnify/                                # ~ 120 GB (download: 67 GB)
        mgy_clusters_2022_05.fa
    params/                                # ~ 5.3 GB (download: 5.3 GB)
        # 5 CASP14 models,
        # 5 pTM models,
        # 5 AlphaFold-Multimer models,
        # LICENSE,
        # = 16 files.
    pdb70/                                 # ~ 56 GB (download: 19.5 GB)
        # 9 files.
    pdb_mmcif/                             # ~ 238 GB (download: 43 GB)
        mmcif_files/
            # About 199,000 .cif files.
        obsolete.dat
    pdb_seqres/                            # ~ 0.2 GB (download: 0.2 GB)
        pdb_seqres.txt
    small_bfd/                             # ~ 17 GB (download: 9.6 GB)
        bfd-first_non_consensus_sequences.fasta
    uniref30/                              # ~ 206 GB (download: 52.5 GB)
        # 7 files.
    uniprot/                               # ~ 105 GB (download: 53 GB)
        uniprot.fasta
    uniref90/                              # ~ 67 GB (download: 34 GB)
        uniref90.fasta
```

## Details of the modified files

- `run_alphafold.py`: a modified version of the original `run_alphafold.py` with several additional features, such as skipping the featurization step when `feature.pkl` already exists in the output folder
- `run_alphafold.sh`: a bash script that runs `run_alphafold.py`

## How to run

### Run features

Run the featurization step on CPUs:

```bash
./run_alphafold.sh \
-d data \
-o output \
-p monomer_ptm \
-i input/GA98.fasta \
-t 1800-01-01 \
-m model_1 \
-f
```

`-f` runs only the featurization step, which writes a `feature.pkl` file, and skips all later steps.

> In my tests, 8 CPUs were enough; more CPUs did not make this step faster.

The featurization step writes `feature.pkl` and an MSA folder to your output folder: `./output/[FASTA_NAME]/`

Note: in these examples, input files are kept in an `input` folder to keep the project organized.

### Run monomer prediction

After the featurization step, run `run_alphafold.sh` on a GPU:

```bash
./run_alphafold.sh \
-d data \
-o output \
-m model_1,model_2,model_3,model_4,model_5 \
-p monomer_ptm \
-i input/GA98.fasta \
-t 1800-01-01
```

If `feature.pkl` was written successfully, the featurization step is skipped here, so it finishes almost immediately.

### Run multimer prediction

```bash
./run_alphafold.sh \
-d data \
-o output \
-m model_1_multimer,model_2_multimer,model_3_multimer,model_4_multimer,model_5_multimer \
-p multimer \
-i input/GA98.fasta \
-t 1800-01-01
```

For multimer prediction, the input FASTA file should contain one sequence per chain.

## What is this for

ParaFold (ParallelFold) accelerates AlphaFold when you need to predict many sequences. Because the CPU part and the GPU part are separated, the featurization step can run on many processors in parallel, and the GPU is used only for model inference. With ParaFold, AlphaFold runs 2 to 3 times faster than with DeepMind's original pipeline.

**If you have any questions, please open an issue.**
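The two-stage workflow described under "What is this for" can be scripted. Below is a minimal Python sketch that is not part of ParaFold. It assumes FASTA files live in `input/`, that `run_alphafold.sh` takes exactly the flags used in the examples above, and that features land in `./output/[FASTA_NAME]/feature.pkl`, with `[FASTA_NAME]` taken to be the file name without its extension (an assumption). The sketch featurizes several sequences in parallel on CPUs, sizing the pool at one job per 8 CPUs to follow the note in "Run features". It then runs GPU prediction for each sequence in turn. The helper names (`feature_cmd`, `predict_cmd`, `has_features`, `run_all`) are hypothetical.

```python
"""Hypothetical driver for ParaFold's two-stage workflow (not part of ParaFold).

Stage 1: featurize many FASTA files in parallel on CPUs (`-f`).
Stage 2: run GPU prediction for each sequence, one at a time.
"""
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List

SCRIPT = "./run_alphafold.sh"
# Flags shared by both stages, copied from the README examples.
COMMON = ["-d", "data", "-o", "output", "-p", "monomer_ptm", "-t", "1800-01-01"]
MODELS = "model_1,model_2,model_3,model_4,model_5"


def feature_cmd(fasta: Path) -> List[str]:
    # `-f` stops after featurization; `-m model_1` mirrors the README example.
    return [SCRIPT, *COMMON, "-i", str(fasta), "-m", "model_1", "-f"]


def predict_cmd(fasta: Path) -> List[str]:
    # No `-f`: the script reuses feature.pkl and runs the five models.
    return [SCRIPT, *COMMON, "-i", str(fasta), "-m", MODELS]


def has_features(fasta: Path, output_dir: str = "output") -> bool:
    # Assumption: [FASTA_NAME] is the file name without its extension.
    return (Path(output_dir) / fasta.stem / "feature.pkl").exists()


def run_all(fastas: List[Path], cpus_per_job: int = 8,
            dry_run: bool = True) -> List[List[str]]:
    """Featurize in parallel, then predict sequentially; return the commands."""
    workers = max(1, (os.cpu_count() or 1) // cpus_per_job)
    todo = [f for f in fastas if not has_features(f)]
    executed: List[List[str]] = []

    def run(cmd: List[str]) -> None:
        executed.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if the script fails

    # Threads are enough here: each one just waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run, [feature_cmd(f) for f in todo]))  # re-raises errors
    for fasta in fastas:
        run(predict_cmd(fasta))
    return executed


if __name__ == "__main__":
    for cmd in run_all(sorted(Path("input").glob("*.fasta")), dry_run=True):
        print(" ".join(cmd))
```

With `dry_run=True` the script only prints the commands; set `dry_run=False` to execute them. On a cluster you would more likely submit the two stages as separate CPU and GPU jobs; the single-machine loop here just keeps the sketch short.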