{"id":44964477,"url":"https://github.com/bioinfomachinelearning/psbench","last_synced_at":"2026-02-18T14:09:30.075Z","repository":{"id":288160091,"uuid":"967037112","full_name":"BioinfoMachineLearning/PSBench","owner":"BioinfoMachineLearning","description":"A large and comprehensive benchmark for estimating the accuracy of protein complex structural models","archived":false,"fork":false,"pushed_at":"2025-10-14T23:41:13.000Z","size":21233,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-15T04:06:17.364Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BioinfoMachineLearning.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-04-15T20:39:29.000Z","updated_at":"2025-10-14T23:41:16.000Z","dependencies_parsed_at":null,"dependency_job_id":"2352397f-a9c6-42f4-b4ae-a1d3f967548b","html_url":"https://github.com/BioinfoMachineLearning/PSBench","commit_stats":null,"previous_names":["bioinfomachinelearning/psbench"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/BioinfoMachineLearning/PSBench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2FPSBench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2FPSBench/tags","releases_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/BioinfoMachineLearning%2FPSBench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2FPSBench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BioinfoMachineLearning","download_url":"https://codeload.github.com/BioinfoMachineLearning/PSBench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2FPSBench/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29581621,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T13:56:48.962Z","status":"ssl_error","status_checked_at":"2026-02-18T13:54:34.145Z","response_time":162,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-18T14:09:29.312Z","updated_at":"2026-02-18T14:09:30.069Z","avatar_url":"https://github.com/BioinfoMachineLearning.png","language":"C","readme":"\u003ch1 align=\"center\"\u003ePSBench\u003c/h1\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n  \u003c/a\u003e\n    \u003ca href=\"https://neurips.cc/virtual/2025/poster/121810\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/NeurIPS-2025-4b44ce.svg\" alt=\"Conference\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://arxiv.org/abs/2505.22674\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/badge/arXiv-2505.22674-B31B1B.svg\" alt=\"Paper\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/75SZ1U\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/dataverse-Dataset-blue\" alt=\"Dataverse\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://colab.research.google.com/github/BioinfoMachineLearning/PSBench/blob/main/PSBench_tutorial.ipynb\"\u003e\n    \u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://psbench-webserver.onrender.com/\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/web-PSBench-blue?style=flat\" alt=\"Visit PSBench Web Server\" height=\"20\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://www.repostatus.org/#active\"\u003e\n    \u003cimg src=\"https://www.repostatus.org/badges/latest/active.svg\" alt=\"Active\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/license-MIT-yellow.svg\" alt=\"MIT\"\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\n## Description\nA large-scale benchmark for developing and evaluating methods for estimating protein complex structural model accuracy (EMA). It includes five components: (I) datasets for training and evaluating EMA methods; (II) scripts to evaluate the prediction results of EMA methods on the datasets; (III) scripts to reproduce the benchmark results of the baseline EMA methods in PSBench; (IV) scripts to label new benchmark datasets; and (V) baseline EMA methods against which users can compare their own EMA methods. 
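As a quick orientation to component (II), the sketch below shows how the four evaluation metrics PSBench reports (Pearson correlation, Spearman correlation, top-1 ranking loss, and AUROC with the top 25% of models by true quality treated as positives) can be computed for one target. This is an illustrative toy example with invented scores, not PSBench's own `evaluate_QA.py`:

```python
import numpy as np

def pearson(x, y):
    # Pearson correlation between predicted and true quality scores
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    # Spearman correlation = Pearson correlation on the ranks (no ties here)
    rank = lambda v: np.argsort(np.argsort(np.asarray(v, float))).astype(float)
    return pearson(rank(x), rank(y))

def top1_loss(pred, true):
    # true quality of the actually-best model minus that of the model
    # ranked first by the predicted scores
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(true.max() - true[int(np.argmax(pred))])

def auroc(pred, true, top_frac=0.25):
    # positives = models in the top 25% by true quality;
    # AUROC via the rank-based Mann-Whitney formula
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    pos = true >= np.quantile(true, 1.0 - top_frac)
    ranks = np.argsort(np.argsort(pred)) + 1  # 1-based ranks of predictions
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

# toy example: four models of one hypothetical target
true_tm = [0.62, 0.85, 0.71, 0.90]   # e.g. tmscore_usalign labels
ema_pred = [0.55, 0.80, 0.70, 0.78]  # one EMA method's predicted scores

print(pearson(ema_pred, true_tm))
print(spearman(ema_pred, true_tm))
print(top1_loss(ema_pred, true_tm))  # 0.90 - 0.85: the best model was not ranked first
print(auroc(ema_pred, true_tm))
```

The real script aggregates these per-target numbers over a whole dataset and one column per EMA method; this sketch only fixes the intuition for what each metric measures.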
\n![PSBench Pipeline, Methods and Metrics](imgs/pipeline_methods_metrics.png)\n\n## Data Repository at Harvard Dataverse\nThe datasets in PSBench can be downloaded from the Harvard Dataverse repository here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/75SZ1U\n\nDOI: https://doi.org/10.7910/DVN/75SZ1U\n\nPSBench has also integrated four datasets, featuring both community and in-house structural models for monomeric protein targets from the CASP15 and CASP16 competitions. These datasets can be downloaded from another Harvard Dataverse repository here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/8GI1DP\n\nDOI: https://doi.org/10.7910/DVN/8GI1DP\n\n## Colab tutorial\nA Google Colab tutorial is provided for evaluating EMA methods, reproducing the main results table from the manuscript, and generating comparative performance plots:\n\n[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BioinfoMachineLearning/PSBench/blob/main/PSBench_tutorial.ipynb)\n\n## Webserver for third-party model upload\nFor third-party model uploads and future reference, a webserver is provided at:\n\n[![Visit PSBench Web Server](https://img.shields.io/badge/web-PSBench-blue?style=flat)](https://psbench-webserver.onrender.com/)\n\n## PSBench Installation (tested on Linux systems)\n\n### Clone the repository\n```\ngit clone https://github.com/BioinfoMachineLearning/PSBench.git\ncd PSBench/\n```\n### Setup the environment\n```\n# create the PSBench environment\nconda create -n PSBench python=3.10.12\n\n# activate PSBench\nconda activate PSBench\n\n# install the required packages\npip install -r scripts/requirements.txt\n```\n### Setup and test OpenStructure\n```\n# install OpenStructure\ndocker pull registry.scicore.unibas.ch/schwede/openstructure:latest\n\n# test the OpenStructure installation\ndocker run -it registry.scicore.unibas.ch/schwede/openstructure:latest 
--version\n```\n\n\n## I. Datasets for training and testing EMA methods\nPSBench consists of the following five complementary large datasets for training and testing EMA methods, which can be downloaded from the Harvard Dataverse repository above: \n1. CASP15_inhouse_dataset\n2. CASP15_community_dataset\n3. CASP16_inhouse_dataset\n4. CASP16_community_dataset\n5. Multimer_7_2024_8_2025_dataset\n\nIn addition, CASP15_inhouse_TOP5_dataset (a subset of CASP15_inhouse_dataset) and CASP16_inhouse_TOP5_dataset (a subset of CASP16_inhouse_dataset) are also included in PSBench. The two subsets were used to train and test GATE (a graph transformer EMA method), respectively. Users may train and test their machine learning methods on the same subsets and compare the results with GATE. \n\nUsers can put the downloaded datasets anywhere, or in the installation directory of PSBench (recommended for easy management). \n\n## The dataset directory structure\n\nAfter the datasets are downloaded from Harvard Dataverse and unzipped, the structure of the four main datasets should be:\n\n```text\n📁 PSBench/\n├── 📁 CASP15_inhouse_dataset/\n│   ├── 📄 CASP15_inhouse_dataset_summary.tab\n│   ├── 📁 AlphaFold_Features/\n│   ├── 📁 Fasta/\n│   ├── 📁 Predicted_Models/\n│   └── 📁 Quality_Scores/\n├── 📁 CASP15_inhouse_TOP5_dataset/\n│   ├── 📄 CASP15_inhouse_TOP5_dataset_summary.tab\n│   ├── 📁 AlphaFold_Features/\n│   ├── 📁 Fasta/\n│   └── 📁 Quality_Scores/\n├── 📁 CASP15_community_dataset/\n│   ├── 📄 CASP15_community_dataset_summary.tab\n│   ├── 📁 Fasta/\n│   ├── 📁 Predicted_Models/\n│   └── 📁 Quality_Scores/\n├── 📁 CASP16_inhouse_dataset/\n├── 📁 CASP16_inhouse_TOP5_dataset/\n├── 📁 CASP16_community_dataset/\n├── 📁 Multimer_7_2024_8_2025_dataset/\n└── 📄 README.md\n\n```\nIn the folder of each dataset, a summary file provides an overview of the dataset; the \"Fasta\" sub-folder contains the sequences of the protein complex targets in FASTA format; the \"Predicted_Models\" sub-folder contains 
the predicted structural models in the PDB format; the \"Quality_Scores\" sub-folder contains the labels (quality scores) of the structural models. CASP15_inhouse_dataset and CASP16_inhouse_dataset have an additional sub-folder \"AlphaFold_Features\" containing AlphaFold-based features for the structural models. \n\nNote: The two subsets (CASP15_inhouse_TOP5_dataset and CASP16_inhouse_TOP5_dataset) do not include the Predicted_Models directories to minimize redundancy. The structural models in the two subsets are already available in the \"Predicted_Models\" sub-folder in their respective full datasets (CASP15_inhouse_dataset and CASP16_inhouse_dataset). \n\n\n\n## Quality scores (labels)\nFor each structural model in the datasets, we provide the following 10 unique quality scores as labels:\n\n| Category | Quality scores |\n|:---------|:-------------------|\n| **Global Quality Scores** | tmscore (4 variants), rmsd |\n| **Local Quality Scores** | lddt |\n| **Interface Quality Scores** | ics, ics_precision, ics_recall, ips, qs_global, qs_best, dockq_wave |\n\n## Additional features\n\nFor CASP15_inhouse_dataset and CASP16_inhouse_dataset, as well as their subsets (i.e., CASP15_inhouse_TOP5_dataset and CASP16_inhouse_TOP5_dataset), the following additional features are provided for each model:\n\n| Feature               | Description                                                       |\n|-----------------------|-------------------------------------------------------------------|\n| `model_type`          | Indicates model type (AlphaFold2-multimer or AlphaFold3)     |\n| `afm_confidence_score`| AlphaFold2-multimer confidence score                               |\n| `af3_ranking_score`   | AlphaFold3 ranking score                                          |\n| `iptm`                | Interface predicted Template Modeling score                       |\n| `num_inter_pae`       | Number of inter-chain predicted aligned errors (\u003c5 Å)             |\n| 
`mpDockQ/pDockQ`      | Predicted multimer DockQ score                                    |\n\n\nFor detailed explanations of each quality score and feature, please refer to [Quality_Scores_Definitions](jsons/Quality_Scores_Definitions.json).\n\n\u003cdetails\u003e\n\nIn each figure below, there are three pieces of information: **(a) Model count.** Number of models per target in the dataset. **(b) Score Distribution.** Box plots of each of six representative quality scores of the models for each target. **(c) Example.** Three representative models (worst, average, best) in terms of the sum of the six representative quality scores for a target. Each model, with its two chains colored blue and red, is superimposed on the true structure in gray.\n\n## i. CASP15_inhouse_dataset\nCASP15_inhouse_dataset consists of a total of 7,885 models generated by MULTICOM3 during the 2022 CASP15 competition. Example target in Figure (c): H1143. \n![CASP15_inhouse_dataset](imgs/CASP15_inhouse_dataset.png)\n\n**CASP15_inhouse_TOP5_dataset** is a subset curated for GATE-AFM from the **CASP15_inhouse_dataset**, consisting of the top 5 models per predictor. Each predictor varies in its use of input multiple sequence alignments (MSAs), structural templates, and AlphaFold configuration parameters to generate structural models using AlphaFold. \n\n## ii. CASP15_community_dataset\nCASP15_community_dataset consists of a total of 10,942 models generated by all the participating groups during the 2022 CASP15 competition. Example target in Figure (c): H1135. \n![CASP15_community_dataset](imgs/CASP15_community_dataset.png)\n\n## iii. CASP16_inhouse_dataset\nCASP16_inhouse_dataset consists of a total of 1,009,050 models generated by MULTICOM4 during the 2024 CASP16 competition. Example target in Figure (c): T1235o. 
\n![CASP16_inhouse_dataset](imgs/CASP16_inhouse_dataset.png)\n\n**CASP16_inhouse_TOP5_dataset** is a subset curated for GATE-AFM from the **CASP16_inhouse_dataset**, consisting of the top 5 models per predictor. Each predictor varies in its use of input multiple sequence alignments (MSAs), structural templates, and AlphaFold configuration parameters to generate structural models using the AlphaFold program. \n\n## iv. CASP16_community_dataset\nCASP16_community_dataset consists of a total of 12,904 models generated by all the participating groups during the 2024 CASP16 competition. Example target in Figure (c): H1244. \n![CASP16_community_dataset](imgs/CASP16_community_dataset.png)\n\n## v. Multimer_7_2024_8_2025_dataset\nMultimer_7_2024_8_2025_dataset consists of a total of 400,400 AlphaFold3-generated models for 2,002 non-redundant multimeric protein entries deposited in the RCSB PDB between July 2024 and August 2025.\n\u003c/details\u003e\n\n## II. Script to evaluate EMA methods on a benchmark dataset\n\nThis script (evaluate_QA.py) is used to evaluate and compare how well different EMA methods perform. It calculates how closely the quality scores predicted by each EMA method for the structural models in a dataset match the true quality scores in terms of four metrics: Pearson correlation, Spearman correlation, top-1 ranking loss, and AUROC.\n\n### Command:\n\n```bash\npython scripts/evaluate_QA.py \\\n  --input_dir $INPUT_DIR \\\n  --native_dir $NATIVE_DIR \\\n  --true_score_field $TRUE_SCORE_FIELD\n```\n\n### Arguments:\n\n| Argument               | Description |\n|------------------------|-------------|\n| `--input_dir`              | Input directory with model quality prediction files that include predicted quality scores by one or more EMA methods for each model. The predicted quality scores will be evaluated against the true scores. 
|\n| `--native_dir`          | Directory containing the true model quality scores (labels) of structural models for each target. The true model quality scores are available in each of the benchmark datasets (CASP15_community_dataset, CASP15_inhouse_dataset, CASP16_inhouse_dataset, CASP16_community_dataset) downloaded from the Harvard Dataverse repository. |\n| `--true_score_field` | Name of the column in the true score file that contains the true quality score to be evaluated against. Default is `tmscore_usalign`. |\n| `--ema_method`        | (Optional) The name of the EMA method column in the prediction file that you want to evaluate. If not provided, the script will evaluate the model quality prediction scores of all the EMA methods. |\n| `--outfile`            | (Optional) The name of the CSV file where the evaluation results will be saved. Default is `evaluation_results.csv`. |\n\n#### Example:\nRun the command below in the installation directory of PSBench:\n```bash\npython scripts/evaluate_QA.py \\\n  --input_dir ./Predictions/CASP16_inhouse_TOP5_dataset/ \\\n  --native_dir ./true_scores \\\n  --true_score_field tmscore_usalign_aligned\n```\n\n### Format of model quality prediction files\n\nEach prediction file should be a CSV file in the following format:\n- The first row is a header row, containing the column names (e.g., model, EMA method 1, EMA method 2, ...).\n- Starting from the second row, the first column should contain the names of the structural models (e.g., model1, model2).\n- The remaining columns should be predicted quality scores for the models from one or more EMA methods (e.g., EMA method 1, EMA method 2).\n\n```\nmodel,EMA1,EMA2,...\nmodel1,0.85,0.79, ...\nmodel2,0.67,0.71, ...\n```\n\nExample of a model quality prediction file 
(./Predictions/CASP16_inhouse_TOP5_dataset/H1202.csv):\n```\nmodel,PSS,DProQA,VoroIF-GNN-score,VoroIF-GNN-pCAD-score,VoroMQA-dark,GCPNet-EMA,GATE-AFM,AFM-Confidence\ndeepmsa2_14_ranked_2.pdb,0.9545535989717224,0.02895,0.0,0.0,0.0,0.7771772742271423,0.5923953714036315,0.8254922444800168\nafsample_v2_ranked_2.pdb,0.8873916966580978,0.0066,0.0,0.0,0.0,0.7705466747283936,0.575105558750621,0.8153403780624796\ndef_mul_tmsearch_ranked_0.pdb,0.9609340102827764,0.02353,0.0,0.0,0.0,0.7641939520835876,0.5981529354257233,0.8133504286051549\ndeepmsa2_1_ranked_4.pdb,0.96272264781491,0.02055,0.0,0.0,0.0,0.7685595154762268,0.5959772306691834,0.8178802534659162\ndeepmsa2_1_ranked_2.pdb,0.9606568380462726,0.02318,0.0,0.0,0.0,0.7671180963516235,0.5983494717414063,0.8183128442689481\nafsample_v2_r21_not_ranked_1.pdb,0.9234104884318768,0.0192,0.0,0.0,0.0,0.7699458599090576,0.5879402631363266,0.8204161898081545\ndeepmsa2_0_ranked_3.pdb,0.9607991259640104,0.02123,0.0,0.0,0.0,0.7682469487190247,0.5953465198918304,0.8183400300533047\nafsample_v1_r21_not_ranked_1.pdb,0.9156177377892032,0.02246,0.0,0.0,0.0,0.7822033762931824,0.5772502685580536,0.8226690041151985\nfolds_iter_esm_1_ranked_1.pdb,0.9471744215938304,0.01475,0.0,0.0,0.0,0.7621756196022034,0.5904867273330673,0.8215535911325099\ndeepmsa2_15_ranked_3.pdb,0.956274524421594,0.02606,0.0,0.0,0.0,0.7756944894790649,0.5937158219111754,0.8243908296207267\n```\n\n### Output:\n\nThe script generates a CSV file summarizing the evaluation results. 
Each row corresponds to a different target (e.g., a protein complex), and for each EMA method, the following metrics are reported:\n- *_pearson: How strongly the predicted scores correlate with the true scores (Pearson correlation).\n- *_spearman: A rank-based version of correlation (Spearman correlation).\n- *_loss: The difference between the quality score of the truly best model of a target and that of the top-ranked model selected by the predicted quality scores.\n- *_auroc: AUROC from ROC analysis, measuring how well the EMA method distinguishes high-quality models (top 25%) from others.\n\nExample of an output file:\n\n```\ntarget,PSS_pearson,PSS_spearman,PSS_loss,PSS_auroc,DProQA_pearson,DProQA_spearman,DProQA_loss,DProQA_auroc,VoroIF-GNN-score_pearson,VoroIF-GNN-score_spearman,VoroIF-GNN-score_loss,VoroIF-GNN-score_auroc,VoroIF-GNN-pCAD-score_pearson,VoroIF-GNN-pCAD-score_spearman,VoroIF-GNN-pCAD-score_loss,VoroIF-GNN-pCAD-score_auroc,VoroMQA-dark_pearson,VoroMQA-dark_spearman,VoroMQA-dark_loss,VoroMQA-dark_auroc,GCPNet-EMA_pearson,GCPNet-EMA_spearman,GCPNet-EMA_loss,GCPNet-EMA_auroc,GATE-AFM_pearson,GATE-AFM_spearman,GATE-AFM_loss,GATE-AFM_auroc,AFM-Confidence_pearson,AFM-Confidence_spearman,AFM-Confidence_loss,AFM-Confidence_auroc\nH1202,0.0648102729743337,0.1695381876334342,0.028000000000000025,0.5,0.11790990707180772,-0.13413140758032455,0.0050000000000000044,0.5,-0.020603323167442494,-0.031220710744368295,0.016000000000000014,0.5095527954681397,-0.028282504057759977,-0.03428508620550331,0.016000000000000014,0.509130572464023,-0.028878249628187847,-0.034223373004110276,0.05900000000000005,0.5091129798388515,0.11945280299656955,0.04279138121740533,0.040000000000000036,0.5,-0.02194819533685046,0.10417440398523113,0.025000000000000022,0.5,-0.003912594407666805,-0.040927588302773835,0.040000000000000036,0.5\n```\n\n## III. 
Reproducing the evaluation results of GATE and other baseline EMA methods in PSBench\n\n### Blind Prediction Results of Estimating the Accuracy of CASP16 In-house Models\n\nNote: Replace $DATADIR with the path where the CASP16_inhouse_TOP5_dataset is stored.\n\n#### Evaluation in terms of TM-score\n```\npython scripts/evaluate_QA.py --input_dir ./Predictions/CASP16_inhouse_TOP5_dataset/ --native_dir $DATADIR/Quality_Scores/ --true_score_field tmscore_usalign_aligned\n```\n\n#### Evaluation in terms of DockQ score\n```\npython scripts/evaluate_QA.py --input_dir ./Predictions/CASP16_inhouse_TOP5_dataset/ --native_dir $DATADIR/Quality_Scores/ --true_score_field dockq_wave\n```\n\n### Blind Prediction Results in CASP16 EMA Competition\n\nNote: Replace $DATADIR with the path where the CASP16_community_dataset is stored.\n\n```\npython scripts/evaluate_QA.py --input_dir ./Predictions/CASP16_community_dataset/ --native_dir $DATADIR/Quality_Scores/ --true_score_field tmscore_mmalign\n```\n\n## IV. Scripts to generate labels for a new benchmark dataset\nUsers can use the tools in PSBench to create their own benchmark dataset. 
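Each target needs a sequence file, a directory of predicted models, and a native structure, laid out as described in this section. As a hypothetical convenience (not part of PSBench), a small pre-flight check can catch missing inputs before launching the labeling pipeline:

```python
from pathlib import Path

def check_target_layout(fasta_dir, predicted_dir, native_dir, targets):
    """Check the per-target inputs expected by the labeling pipeline:
    <fasta_dir>/<target>.fasta, <predicted_dir>/<target>/*.pdb, and
    <native_dir>/<target>.pdb. Returns a list of human-readable problems."""
    problems = []
    for t in targets:
        fasta = Path(fasta_dir) / f"{t}.fasta"
        models = Path(predicted_dir) / t
        native = Path(native_dir) / f"{t}.pdb"
        if not fasta.is_file():
            problems.append(f"{t}: missing sequence file {fasta}")
        if not models.is_dir() or not any(models.glob("*.pdb")):
            problems.append(f"{t}: no predicted models under {models}")
        if not native.is_file():
            problems.append(f"{t}: missing native structure {native}")
    return problems

# Example with hypothetical paths: print one line per problem found
for problem in check_target_layout("Fasta", "predicted_models",
                                   "native_models", ["H1204", "H1213"]):
    print(problem)
```

Running the pipeline on hundreds of targets fails late and expensively on a missing file, so a cheap layout check like this up front can save a wasted batch run.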
The following are the prerequisites for generating labels for a new benchmark dataset:\n### Required Input Data\n- Predicted protein complex structures (structural models)\n- Native (true) structures\n- Protein sequence files in FASTA format\n### Tools (downloaded or installed in the PSBench installation section)\n- OpenStructure\n- USalign\n- ClustalW\n\n### Generate True Quality Scores (Labels) for Predicted Structural Models\n\n#### Run the generate_quality_scores.sh pipeline using the command below\n\n```bash\nsh generate_quality_scores.sh \\\n  --fasta_dir $FASTA_DIR \\\n  --predicted_dir $PREDICTED_DIR \\\n  --native_dir $NATIVE_DIR \\\n  --outdir $OUTDIR \\\n  --usalign $USALIGN \\\n  --clustalw $CLUSTALW \\\n  --targets $TARGETS\n```\n\n#### Required Arguments:\n\n| Argument         | Description                                                                                      |\n|------------------|--------------------------------------------------------------------------------------------------|\n| `--fasta_dir`     | Path to the directory containing protein sequence files in FASTA format (named as `\u003ctarget\u003e.fasta`)                        |\n| `--predicted_dir` | Path to the base directory containing predicted models (subdirectory per target)                |\n| `--native_dir`    | Path to the directory containing native protein structure files in PDB format (named as `\u003ctarget\u003e.pdb`)                     |\n| 
`--outdir`        | Path to the base output directory for results                                                   |\n| `--usalign`       | Path to the USalign program (e.g., `tools/USalign`)                                              |\n| `--clustalw`      | Path to the ClustalW program (e.g., `tools/clustalw1.83/clustalw`)                               |\n| `--targets`       | Space-separated list of target names to process (e.g., `H1204 H1213`)                           |\n\nFor each target (e.g., `H1204`), ensure the following:\n\n- Sequence file in FASTA format: `/path/to/PSBench/Fasta/H1204.fasta`\n- Predicted structural models: `/path/to/PSBench/predicted_models/H1204/*.pdb`\n- Native (true) structure in PDB format: `/path/to/PSBench/native_models/H1204.pdb`\n\n#### Example:\n\n```bash\ncd scripts/\nsh generate_quality_scores.sh \\\n  --fasta_dir /path/to/PSBench/Fasta/ \\\n  --predicted_dir /path/to/PSBench/predicted_models/ \\\n  --native_dir /path/to/PSBench/native_models/ \\\n  --outdir /path/to/PSBench/output/ \\\n  --usalign /path/to/PSBench/scripts/tools/USalign \\\n  --clustalw /path/to/PSBench/scripts/tools/clustalw1.83/clustalw \\\n  --targets H1204 H1213\n```\n#### Output:\nThe output folder will contain a subdirectory for each target (e.g., /path/to/PSBench/output/ will contain H1204/ and H1213/). Each target subdirectory will contain the following:\n\n- filtered_pdbs/ : directory where aligned and filtered structural models and native structures are saved\n- H1204_quality_scores.csv : CSV file containing the true quality scores for each model of the target\n- results/ : directory where the outputs of the OpenStructure and USalign runs are saved\n- temp/ : temporary working directory for the structure alignment and filtering process\n\n#### Preprocessing, edge cases and solutions\n\u003cdetails\u003e\n\n1. 
**Preprocessing**:\n   - Native structures are converted from CIF to PDB when needed  \n   - Native PDB structures are reindexed to match full-length protein sequences\n   - Non-protein components (e.g., ligands, metal ions) are excluded\n   - Insertion codes in residue indices are replaced with monotonic numbering\n   \n   (See scripts/preprocessing for the scripts)\n\n2. **Alignment \u0026 Quality Score Computation**:  \n   - Structures are aligned based on sequence identity.\n   - Scores such as RMSD and lDDT are computed with no additional preprocessing.\n   - TM-scores are calculated using a few approaches:  \n     - `tmscore_mmalign` (CASP16-style parameters)  \n     - `tmscore_usalign` (CASP15-style parameters)  \n     - Residue reindexing is applied for `tmscore_usalign_aligned` to ensure correct residue-residue correspondence\n\n3. **Edge Cases**:  \n   - Some models from CASP community datasets may fail due to format inconsistencies or non-monotonic residue numbering (e.g., `H1272TS191_1` from the `CASP16_community_dataset`)\n   - OpenStructure refuses to align chains that have fewer than six valid residues\n\n4. **Fallbacks \u0026 Exclusions**:  \n   - If OpenStructure fails but US-align succeeds, the model is retained and the TM-score from US-align is reported\n   - For example, in the `Multimer_7_2024_8_2025_dataset`, some targets like `9DYY`, `9KAP`, and `9O7J` fail in OpenStructure due to having fewer than six valid residues. The USalign-based TM-scores are still included\n   - Models that fail in both frameworks are excluded. 
Such failures are rare (less than 0.002% of more than 1.4 million processed models).\n\n\n\u003c/details\u003e\n\n#### Optional: Generate AlphaFold features when available\n\n\u003cdetails\u003e\n\n#### Run the generate_af_features.sh pipeline\n\n##### Required Arguments:\n| Argument       | Description                                                                                      |\n|----------------|--------------------------------------------------------------------------------------------------|\n| `--fasta_dir`  | Directory containing sequence files of protein targets in FASTA format (e.g., `/path/to/fasta`)                        |\n| `--pdb_dir`    | Directory containing predicted structural model subfolders (e.g., `/path/to/pdbs`)                      |\n| `--pkl_dir`    | Directory containing AlphaFold pickle (.pkl) subfolders (e.g., `/path/to/pkls`)                 |\n| `--outdir` | Directory where output CSV files will be saved (e.g., `/path/to/outdir`)                    |\n| `--targets`    | List of target names (e.g., `H1204 H1213`)                                                       |\n\nFor each target (e.g., `H1204`), ensure the following:\n\n- FASTA file: `/path/to/PSBench/Fasta/H1204.fasta`\n- Predicted models: `/path/to/PSBench/predicted_models/H1204/*.pdb`\n- Pickle files: `/path/to/PSBench/predicted_models_pickles/H1204/*.pkl`\n\n#### Example:\n\n```bash\ncd scripts/\nsh generate_af_features.sh \\\n  --fasta_dir /path/to/PSBench/Fasta/ \\\n  --pdb_dir /path/to/PSBench/predicted_models/ \\\n  --pkl_dir /path/to/PSBench/predicted_models_pickles/ \\\n  --outdir /path/to/PSBench/output/ \\\n  --targets H1204 H1213\n```\n#### Output:\nThe output folder will contain `\u003ctarget\u003e_af_features.csv` for each target (e.g., `H1204_af_features.csv`).\n\n\u003c/details\u003e\n\n## V. 
Baseline EMA methods for comparison with a new EMA method\n\nHere are several publicly available baseline EMA methods against which users can compare their methods:\n\n- **GATE** [[Liu et al., 2025]](https://github.com/BioinfoMachineLearning/gate):  \n  A multi-model EMA approach leveraging graph transformers on pairwise similarity graphs. Combines single-model and multi-model features for TM-score prediction.  \n  🔗 GitHub: [https://github.com/BioinfoMachineLearning/gate](https://github.com/BioinfoMachineLearning/gate)  \n  - **GATE-AFM**: An enhanced version of GATE that incorporates AlphaFold2-Multimer features as node features.\n\n- **DProQA** [[Chen et al., 2023]](https://github.com/jianlin-cheng/DProQA):  \n  A single-model EMA method using a Gated Graph Transformer. Targets interface quality prediction (e.g., DockQ scores) using KNN-based structural graphs.  \n  🔗 GitHub: [https://github.com/jianlin-cheng/DProQA](https://github.com/jianlin-cheng/DProQA)\n\n- **VoroMQA-dark, VoroIF-GNN-score, VoroIF-GNN-pCAD-score** [[Olechnovič et al., 2023]](https://github.com/kliment-olechnovic/ftdmp):  \n  Interface-focused EMA methods using Voronoi-based atomic contact areas and GNNs.  \n  🔗 GitHub: [https://github.com/kliment-olechnovic/ftdmp](https://github.com/kliment-olechnovic/ftdmp)\n\n- **GCPNet-EMA** [[Morehead et al., 2024]](https://github.com/BioinfoMachineLearning/GCPNet-EMA):  \n  A 3D graph neural network predicting lDDT and global accuracy from atomic point clouds. Adaptable to protein complex structures.  \n  🔗 GitHub: [https://github.com/BioinfoMachineLearning/GCPNet-EMA](https://github.com/BioinfoMachineLearning/GCPNet-EMA)\n\n- **PSS (Pairwise Similarity Score)** [[Roy et al., 2023]](https://github.com/BioinfoMachineLearning/MULTICOM_qa):  \n  A multi-model consensus method using average pairwise TM-scores (via MMalign).  
\n  🔗 GitHub: [MULTICOM_qa](https://github.com/BioinfoMachineLearning/MULTICOM_qa)  \n  🔗 Simplified: [mmalign_pairwise.py](https://github.com/BioinfoMachineLearning/gate/blob/main/gate/feature/mmalign_pairwise.py)\n\nIt is worth noting that CASP15_inhouse_dataset and CASP16_inhouse_dataset contain AlphaFold2-Multimer self-estimated confidence scores for the structural models in the two datasets. They can also serve as a baseline to be compared with new EMA methods. \n\n## Acknowledgements\nPSBench builds upon the source code and data from the following projects:\n- [AlphaFold2](https://github.com/google-deepmind/alphafold)\n- [AlphaFold3](https://github.com/google-deepmind/alphafold3) \n- [USAlign](https://zhanggroup.org/US-align/)\n- [OpenStructure](https://git.scicore.unibas.ch/schwede/openstructure.git)\n- [CASP15](https://predictioncenter.org/casp15/)\n- [CASP16](https://predictioncenter.org/casp16/)\n\n\n## Reference\n```bibtex\n@inproceedings{neupane2025psbench,\n  title     = {PSBench: a Large-Scale Benchmark for Estimating the Accuracy of Protein Complex Structural Models},\n  author    = {Neupane, Pawan and Liu, Jian and Cheng, Jianlin},\n  booktitle = {The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS)},\n  year      = {2025}\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbioinfomachinelearning%2Fpsbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbioinfomachinelearning%2Fpsbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbioinfomachinelearning%2Fpsbench/lists"}