# **MULTICOM4 protein structure prediction system**

MULTICOM4 is an advanced protein structure prediction system built on AlphaFold2 and AlphaFold3.
It performed strongly in the 16th worldwide Critical Assessment of Techniques for Protein Structure Prediction (CASP16), which concluded in December 2024, ranking:

- 1st in protein complex structure prediction without stoichiometry information (Phase 0),
- 2nd in protein tertiary structure prediction,
- 2nd in estimating the global fold accuracy of protein complex structures,
- 3rd in protein complex structure prediction with stoichiometry information (Phase 1), and
- 5th in protein-ligand structure and binding affinity prediction.

# **Some CASP16 Prediction Examples**

**Colored Chains**: Model predicted by MULTICOM4  
**Brown**: True (experimental) structure

| **Target ID** | **Description**                                | **Visualization**              |
| ------------- | ---------------------------------------------- | ------------------------------- |
| **H0215**     | A1B1, mNeonGreen with Bound Nanobody           | ![H0215](imgs/H0215.gif)       |
| **H0233**     | A2B2C2, Antibody Fab 3H4 complex, virus capsid | ![H0233](imgs/H0233.gif)       |
| **H0245**     | A1B1, FUNComplex, Shallow MSA                  | ![H0245](imgs/H0245.gif)       |
| **T0234o**    | A3, Better Stoichiometry Prediction            | ![T0234](imgs/T0234.gif)       |

## **The workflow of the MULTICOM4 protein complex structure prediction system used in CASP16**

![MULTICOM4](imgs/MULTICOM4.png)

## **The workflow of the MULTICOM4 protein tertiary structure prediction system used in CASP16**

![MULTICOM4](imgs/TS_module.png)

# **Download the MULTICOM4 package**

```bash
git clone --recursive https://github.com/BioinfoMachineLearning/MULTICOM4
```

# **Installation (non-Docker version)**

## **Install AlphaFold/AlphaFold-Multimer and other required third-party packages (modified from [alphafold_non_docker](https://github.com/kalininalab/alphafold_non_docker))**

### **Install Miniconda**

```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh
```

### **Create a new conda environment and update conda**

```bash
conda create --name multicom4 python==3.8
conda update -n base conda
```

### **Activate the conda environment**

```bash
conda activate multicom4
```

### **Install dependencies**

- Change the `cudatoolkit==11.2.2` version if it is not supported on your system.

```bash
conda install -y -c conda-forge openmm==7.5.1 cudatoolkit==11.2.2 pdbfixer
conda install -y -c bioconda hmmer hhsuite==3.3.0 kalign2
```

- Change the `jaxlib==0.3.25+cuda11.cudnn805` version if it is not supported on your system.

```bash
pip install absl-py==1.0.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.9 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.3.25 ml-collections==0.1.0 numpy==1.21.6 pandas==1.3.4 protobuf==3.20.1 scipy==1.7.0 tensorflow-cpu==2.9.0

pip install --upgrade --no-cache-dir jax==0.3.25 jaxlib==0.3.25+cuda11.cudnn805 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```

### **Download chemical properties to the common folder**

```bash
# Replace $MULTICOM4_INSTALL_DIR with your MULTICOM4 installation directory
wget -q -P $MULTICOM4_INSTALL_DIR/tools/alphafold-v2.3.2/alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
```

### **Apply the OpenMM patch**

```bash
# Replace $MULTICOM4_INSTALL_DIR with your MULTICOM4 installation directory

# If conda is installed under ~/anaconda3:
cd ~/anaconda3/envs/multicom4/lib/python3.8/site-packages/ && patch -p0 < $MULTICOM4_INSTALL_DIR/tools/alphafold-v2.3.2/docker/openmm.patch

# Or, if conda is installed under ~/miniconda3:
cd ~/miniconda3/envs/multicom4/lib/python3.8/site-packages/ && patch -p0 < $MULTICOM4_INSTALL_DIR/tools/alphafold-v2.3.2/docker/openmm.patch
```

### **Install other required third-party packages**

```bash
conda install tqdm
conda install -c conda-forge -c bioconda foldseek
conda install scikit-learn

# If running jackhmmer fails with "libgsl.so.25: cannot open shared object file: No such file or directory":
conda install -c conda-forge gsl=2.5

pip install charset_normalizer==3.3.1
```

### **Install the environments for third-party packages**

```bash
# Run from your MULTICOM4 installation directory ($MULTICOM4_INSTALL_DIR)

# DHR
conda env create -f tools/Dense-Homolog-Retrieval/env.yml

# ESMFold
conda env create -f envs/esm.yml
conda activate esmfold
pip install "fair-esm[esmfold]"
pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@4b41059694619831a7db195b7e0988fc4ff3a307'
pip install transformers

# AFsample
conda env create -f tools/afsample/afsample.yml
```

### **Download the AlphaFold2/AlphaFold-Multimer genetic databases**

```bash
# Replace $MULTICOM4_INSTALL_DIR with your MULTICOM4 installation directory
bash $MULTICOM4_INSTALL_DIR/tools/alphafold-v2.3.2/scripts/download_all_data.sh <YOUR_ALPHAFOLD_DB_DIR>
```

AFsample also uses both the v2.1.0 and v2.2.0 AlphaFold-Multimer model weights.
Download them from the links below and extract them into the `params/` folder of `<YOUR_ALPHAFOLD_DB_DIR>`:

- v2.2.0 AlphaFold-Multimer model weights: https://storage.googleapis.com/alphafold/alphafold_params_2022-03-02.tar
- v2.1.0 AlphaFold-Multimer model weights: https://storage.googleapis.com/alphafold/alphafold_params_2022-01-19.tar

### **Install the MULTICOM4 add-on system and its databases**

```bash
# Run these commands from your MULTICOM4 installation directory; all path arguments must be absolute paths
# Replace <YOUR_MULTICOM4_DB_DIR> with the directory in which to store the MULTICOM4 databases
# Replace <YOUR_ALPHAFOLD_DB_DIR> with your downloaded AlphaFold database directory
python download_database_and_tools.py --multicom4db_dir <YOUR_MULTICOM4_DB_DIR>

# Configure the MULTICOM4 system (--envdir is the path of the multicom4 conda environment)
python configure.py --envdir ~/miniconda3/envs/multicom4 --multicom4db_dir <YOUR_MULTICOM4_DB_DIR> --afdb_dir <YOUR_ALPHAFOLD_DB_DIR>

# e.g.,
# python download_database_and_tools.py \
# --multicom4db_dir /home/multicom4/tools/multicom4_db

# python configure.py \
# --envdir ~/miniconda3/envs/multicom4 \
# --multicom4db_dir /home/multicom4/tools/multicom4_db \
# --afdb_dir /home/multicom4/tools/alphafold_databases/
```

The `configure.py` script will:

* copy the alphafold_addon scripts, and
* create the configuration file (`bin/db_option`) used to run the system.

# **Genetic databases used by MULTICOM4**

The following databases are assumed to have been installed as part of the AlphaFold2/AlphaFold-Multimer installation:

*   [BFD](https://bfd.mmseqs.com/)
*   [MGnify](https://www.ebi.ac.uk/metagenomics/)
*   [PDB70](http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/)
*   [PDB](https://www.rcsb.org/) (structures in the mmCIF format)
*   [PDB seqres](https://www.rcsb.org/)
*   [UniRef30](https://uniclust.mmseqs.com/)
*   [UniProt](https://www.uniprot.org/uniprot/)
*   [UniRef90](https://www.uniprot.org/help/uniref)

Additional databases are installed for the MULTICOM4 system by
`download_database_and_tools.py`:

*   [AlphaFoldDB](https://alphafold.ebi.ac.uk/): ~53 GB
*   [ESM Atlas](https://esmatlas.com/): ~99 GB
*   [Metaclust](https://metaclust.mmseqs.org/current_release/): ~114 GB
*   [STRING](https://string-db.org/cgi/download?sessionId=bgV6D67b9gi2): ~129 GB
*   [pdb_complex v2024](https://www.biorxiv.org/content/10.1101/2023.05.16.541055v1): ~38 GB
*   [pdb_sort90 v2024](https://www.biorxiv.org/content/10.1101/2023.05.01.538929v1): ~48 GB
*   [Uniclust30](https://uniclust.mmseqs.com/): ~87 GB
*   [DHR_DATABASE](https://github.com/ml4bio/Dense-Homolog-Retrieval/tree/v1): 1.5 TB
*   [JGIclust](https://zhanggroup.org/DeepMSA2/): ~1.1 TB

# **Key Parameters for Running MULTICOM4**

All parameters for running AlphaFold2/AlphaFold-Multimer are defined in `multicom4/common/config.py`. These are the configurations used for large-scale sampling in CASP16. The one exception is the number of models, which you should adjust to the size of the target protein. You may modify the following parameters as needed:

```python
# Number of models generated per model checkpoint for monomer predictions using AlphaFold2. Default: 100.
# Note: each AlphaFold2 predictor therefore generates 100 * 5 = 500 models.
MONOMER_PREDICTIONS_PER_MODEL = 100

# Number of models generated per model checkpoint for hetero-multimer predictions using AlphaFold-Multimer. Default: 100.
# Note: each AlphaFold-Multimer predictor therefore generates 100 * 5 = 500 models.
HETERO_MULTIMER_PREDICTIONS_PER_MODEL = 100

# Number of models generated per model checkpoint for homo-multimer predictions using AlphaFold-Multimer. Default: 100.
# Note: each AlphaFold-Multimer predictor therefore generates 100 * 5 = 500 models.
HOMO_MULTIMER_PREDICTIONS_PER_MODEL = 100
```

# **Before running the system (non-Docker version)**

## **Activate your Python environment and add the MULTICOM4 system path to PYTHONPATH**

```bash
conda activate multicom4

# Replace $MULTICOM4_INSTALL_DIR with your MULTICOM4 installation directory (absolute path)
export PYTHONPATH=$MULTICOM4_INSTALL_DIR

# e.g.,
# conda activate multicom4
# export PYTHONPATH=/home/multicom4/MULTICOM4
```

MULTICOM4 is now ready to make predictions.

# **Running the monomer/tertiary structure prediction pipeline**

Suppose we have a monomer with the sequence `<SEQUENCE>`. The input sequence file should be in FASTA format:

```fasta
>sequence_name
<SEQUENCE>
```

Note: We recommend giving the FASTA file the same name as the sequence.

Then run the following command from your MULTICOM4 installation directory, where:

- `option_file` (e.g., `bin/db_option`, created by `configure.py`) stores the paths of the databases and tools.
- `fasta_path` is the absolute path of the FASTA file containing the input protein sequence(s).
- `output_dir` is the directory where the prediction results are stored.

```bash
# Use absolute paths for all input arguments
sh bin/monomer/run_monomer.sh <option_file> <fasta_path> <output_dir>
```
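If you prepare many targets, you can script the two steps above: write a FASTA file named after its sequence, then call `run_monomer.sh` with absolute paths. The sketch below is not part of MULTICOM4. `write_fasta` and `build_monomer_command` are hypothetical helpers, and the `.fasta` extension and toy sequence are assumptions. Only `bin/monomer/run_monomer.sh` and `bin/db_option` come from this README.

```python
import os
import shlex
import tempfile
from typing import List


def write_fasta(name: str, sequence: str, out_dir: str) -> str:
    """Write a single-sequence FASTA file named after the sequence (hypothetical helper)."""
    # The file name mirrors the sequence name; the '.fasta' extension is an assumption.
    path = os.path.abspath(os.path.join(out_dir, f"{name}.fasta"))
    with open(path, "w") as handle:
        handle.write(f">{name}\n{sequence}\n")
    return path


def build_monomer_command(install_dir: str, fasta_path: str, output_dir: str) -> List[str]:
    """Assemble the run_monomer.sh invocation with absolute paths (hypothetical helper)."""
    install_dir = os.path.abspath(install_dir)
    script = os.path.join(install_dir, "bin", "monomer", "run_monomer.sh")
    option_file = os.path.join(install_dir, "bin", "db_option")  # written by configure.py
    return ["sh", script, option_file, os.path.abspath(fasta_path), os.path.abspath(output_dir)]


if __name__ == "__main__":
    work_dir = tempfile.mkdtemp()
    # Toy name and sequence, for illustration only.
    fasta = write_fasta("example_seq", "MKTAYIAKQRQISFVKSHFSRQ", work_dir)
    cmd = build_monomer_command("/home/multicom4/MULTICOM4", fasta, os.path.join(work_dir, "output"))
    # Print a copy-pasteable shell command instead of launching the pipeline.
    print(shlex.join(cmd))
```

Returning an argument list rather than a shell string avoids quoting problems with unusual paths. To launch the pipeline, you could call `subprocess.run(cmd, check=True)` from the activated `multicom4` environment, with `PYTHONPATH` exported as described above. Setting `cwd` to the MULTICOM4 installation directory is a sensible precaution, because this README invokes the script with a relative path.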

## **Output**

```
$OUTPUT_DIR/                                   # Your output directory
    N1_monomer_alignments_generation/          # Working directory for generating monomer MSAs
    N2_monomer_template_search/                # Working directory for searching monomer templates
    N3_monomer_structure_generation/           # Working directory for generating monomer structural predictions
    N4_monomer_structure_evaluation/           # Working directory for evaluating the monomer structural predictions
        - alphafold_ranking.csv                # AlphaFold2 pLDDT ranking
```

* The predictions and ranking files are saved in the `N4_monomer_structure_evaluation` folder. To find the structure with the highest pLDDT score, check the AlphaFold2 pLDDT ranking file (`alphafold_ranking.csv`).

# **Running the multimer/quaternary structure prediction pipeline**

## **Folding a multimer**

Suppose we have a homomer with four copies of the same sequence `<SEQUENCE>`.
The input file should be formatted as follows:

```fasta
>sequence_1
<SEQUENCE>
>sequence_2
<SEQUENCE>
>sequence_3
<SEQUENCE>
>sequence_4
<SEQUENCE>
```

Then run the following command from your MULTICOM4 installation directory:

```bash
# Use absolute paths for all input arguments
sh bin/multimer/run_multimer.sh <option_file> <fasta_path> <output_dir>
```

## **Output**

```
$OUTPUT_DIR/                                   # Your output directory
    N1_monomer_alignments_generation/          # Working directory for generating monomer MSAs
        - Subunit A
        - Subunit B
        - ...
    N1_monomer_alignments_generation_img/      # Working directory for generating IMG MSAs
        - Subunit A
        - Subunit B
        - ...
    N2_monomer_template_search/                # Working directory for searching monomer templates
        - Subunit A
        - Subunit B
        - ...
    N3_monomer_structure_generation/           # Working directory for generating monomer structural predictions
        - Subunit A
        - Subunit B
        - ...
    N4_monomer_alignments_concatenation/       # Working directory for concatenating the monomer MSAs
    N5_monomer_templates_search/               # Working directory for concatenating the monomer templates
    N6_multimer_structure_generation/          # Working directory for generating multimer structural predictions
    N7_monomer_only_structure_evaluation/      # Working directory for evaluating monomer structural predictions
        - Subunit A
            # Rankings for all the predictions
            - alphafold_ranking.csv            # AlphaFold2 pLDDT ranking
            - pairwise_ranking.tm              # Pairwise (APOLLO) ranking
            - pairwise_af_avg.ranking          # Average of the two rankings

            # Rankings for the predictions generated by monomer structure prediction
            - alphafold_ranking_monomer.csv    # AlphaFold2 pLDDT ranking
            - pairwise_af_avg_monomer.ranking  # Average ranking

            # Rankings for the predictions extracted from multimer predictions
            - alphafold_ranking_multimer.csv   # AlphaFold2 pLDDT ranking
            - pairwise_af_avg_multimer.ranking # Average ranking

        - Subunit B
        - ...
    N7_multimer_structure_evaluation/          # Working directory for evaluating multimer structural predictions
        - alphafold_ranking.csv                # AlphaFold-Multimer confidence ranking
        - multieva.csv                         # Pairwise ranking using MMalign
        - pairwise_af_avg.ranking              # Average of the two rankings
```

* The multimer predictions and ranking files are saved in `N7_multimer_structure_evaluation`. As in the monomer case, check the AlphaFold-Multimer confidence ranking file (`alphafold_ranking.csv`) to find the structure with the highest confidence score predicted by AlphaFold-Multimer.
  The other two ranking files are `multieva.csv` (pairwise ranking using MMalign) and `pairwise_af_avg.ranking` (average of the two rankings).

* The monomer (subunit) predictions and their rankings are saved in `N7_monomer_only_structure_evaluation`.

# **CASP16 Talks**

**Our CASP16 talk on protein complex structure prediction:**

https://predictioncenter.org/casp16/doc/presentations/Day-2/Day2-05-Cheng-CASP16_MULTICOM_redacted.pdf

**Our CASP16 talk on protein model quality assessment:**

https://predictioncenter.org/casp16/doc/presentations/Day-2/Day2-15-Neupane-CASP16_MULTICOM_EMA.pptx

**Our CASP16 talk on protein-ligand binding affinity prediction:**

https://predictioncenter.org/casp16/doc/presentations/Day-3/Day3-14-Morehead-MULTICOM_ligand.pptx

# **Citing this work**

```bibtex
@article{liu2025improving,
  title={Improving AlphaFold2- and AlphaFold3-Based Protein Complex Structure Prediction With MULTICOM4 in CASP16},
  author={Liu, Jian and Neupane, Pawan and Cheng, Jianlin},
  journal={Proteins: Structure, Function, and Bioinformatics},
  year={2025},
  publisher={Wiley Online Library}
}

@article{liu2025boosting,
  title={Boosting AlphaFold Protein Tertiary Structure Prediction through MSA Engineering and Extensive Model Sampling and Ranking in CASP16},
  author={Liu, Jian and Neupane, Pawan and Cheng, Jianlin},
  journal={Communications Biology},
  pages={1587},
  year={2025},
  publisher={Nature Springer}
}
```