{"id":31727368,"url":"https://github.com/biocomputingup/af2_af3_idr_analysis","last_synced_at":"2025-10-09T06:21:09.394Z","repository":{"id":312991758,"uuid":"1007123355","full_name":"BioComputingUP/af2_af3_idr_analysis","owner":"BioComputingUP","description":null,"archived":false,"fork":false,"pushed_at":"2025-09-03T07:43:48.000Z","size":24009,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-03T09:24:27.227Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BioComputingUP.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-23T13:52:25.000Z","updated_at":"2025-09-03T07:43:51.000Z","dependencies_parsed_at":"2025-09-03T09:34:34.703Z","dependency_job_id":null,"html_url":"https://github.com/BioComputingUP/af2_af3_idr_analysis","commit_stats":null,"previous_names":["biocomputingup/af2_af3_idr_analysis"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/BioComputingUP/af2_af3_idr_analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2Faf2_af3_idr_analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2Faf2_af3_idr_analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2Faf2_af3_idr_analysis/releases","m
anifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2Faf2_af3_idr_analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BioComputingUP","download_url":"https://codeload.github.com/BioComputingUP/af2_af3_idr_analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2Faf2_af3_idr_analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279000849,"owners_count":26082950,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-09T06:21:06.623Z","updated_at":"2025-10-09T06:21:09.343Z","avatar_url":"https://github.com/BioComputingUP.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Modelling Intrinsically Disordered Regions from AlphaFold2 to AlphaFold3\n\nThis repository contains code for comparing the performance of AlphaFold3 and AlphaFold2 on intrinsic disorder prediction using the CAID3's Disorder-PDB dataset. 
\n\n## Data preparation\n\nAll data preparation steps (reading the FASTA file, parsing the data, downloading AF2 structures from AFDB, and generating ColabFold structures) can be run with: \n\n    cd src\n    ./prepare_data_pipeline.sh \\\n         \u003cinput_fasta\u003e # Input FASTA file with [disorder] labels\n         \u003coutput_dir\u003e # The directory for writing the results parsed from the FASTA file, the downloaded/generated AF2/AF3 structures, the CAID IDR evaluations, and the figures from the analysis. We used path/to/project/data as output_dir\n         \u003cdisprot_uniprot_mapping\u003e # A CSV file containing DisProt IDs and their corresponding UniProt IDs. This file was obtained from the DisProt curators; it can also be extracted from DisProt directly for proteins that have been released.\n\nThe individual steps can also be run separately, as described below:\n\n\n### 1. Parse the fasta file\nThe FASTA file contains the protein IDs, protein sequences, and disorder labels. The dataset used in this paper is the Disorder-PDB dataset from the CAID3 challenge (https://caid.idpcentral.org/challenge/results). \n\nThe file should have the following format: \n\n\n    \u003eDP0123\n    MKTFFVLLLCTFTVLSSGLTQGAE\n    111111000000------111100\n\nwhere `1` marks a disordered residue, `0` marks an ordered residue, and `-` marks residues that are not annotated and are excluded from the assessment. For CAID3 input and output formats, see https://caid.idpcentral.org/challenge.  \n\n    python3 parse_fasta.py --input_fasta \u003cinput.fasta\u003e --output_dir \u003coutput_dir\u003e --disprot_uniprot_mapping \u003cdisprot_uniprot_mapping.csv\u003e\n\n### 2. 
Obtain AlphaFold2 structures\n\nThe AlphaFold2 structures used for this paper are available at [https://biocomputingup.it/shared/af2_af3_idr_analysis](https://biocomputingup.it/shared/af2_af3_idr_analysis/) and can be obtained with: \n\n    mkdir -p \u003coutput_dir\u003e/AF2/\n    cd \u003coutput_dir\u003e/AF2/\n    wget https://biocomputingup.it/shared/af2_af3_idr_analysis/caid3_disorder_pdb_af2.zip -O temp.zip \u0026\u0026 unzip temp.zip \u0026\u0026 rm temp.zip\n\n\nTo obtain them for your own FASTA file, run:  \n\n    # Downloads AF2 structures from AlphaFoldDB and writes the entries that failed to a fasta file\n\n    python3 download_af2_files.py --input_fasta \u003cinput.fasta\u003e --disprot_uniprot_mapping \u003cdisprot_uniprot_mapping.csv\u003e --output_dir \u003coutput_dir\u003e --log_file \u003clog_dir/download_af2_files.log\u003e\n\n\nTo generate structures for the sequences that were not found in AlphaFoldDB, we ran ColabFold in a Singularity container. The installation guide is provided [here](https://github.com/sokrypton/ColabFold/wiki/Running-ColabFold-in-Docker). \n\n    cd \u003coutput_dir\u003e\n    mkdir -p AF2/colabfold\n    cd AF2/colabfold\n\n    singularity run --nv -B ~/.cache:/cache -B $(pwd):/work ../colabfold_1.5.5-cuda12.2.2.sif colabfold_batch --amber /work/afdb_failed.fasta /work/outputs/\n\n    cd outputs\n    find . -type f  -name \"*_relaxed_rank_*.pdb\" -exec cp {} \u003coutput_dir\u003e/AF2/structures/ \\; # copy the ColabFold structures to the directory with the AF2 structures\n    \n\n\n## Obtain AlphaFold3 structures\nAlphaFold3 structures were manually downloaded from https://alphafoldserver.com for the sequences in the Disorder-PDB dataset. The mmCIF files were saved in `\u003coutput_dir\u003e/AF3/structures`. 
The AlphaFold3 structures that we downloaded for the Disorder-PDB dataset are available at [https://biocomputingup.it/shared/af2_af3_idr_analysis](https://biocomputingup.it/shared/af2_af3_idr_analysis). They are shared in accordance with the [AlphaFold Server Terms of Service](https://alphafoldserver.com/terms) and are subject to the [AlphaFold Server Output Terms of Use](https://alphafoldserver.com/output-terms). \n\n    mkdir -p \u003coutput_dir\u003e/AF3/\n    cd \u003coutput_dir\u003e/AF3/\n\n    wget https://biocomputingup.it/shared/af2_af3_idr_analysis/caid3_disorder_pdb_af3.zip -O temp.zip \u0026\u0026 unzip temp.zip \u0026\u0026 rm temp.zip\n    \n## Running AlphaFold-disorder package\n\nThe AlphaFold-disorder package can be cloned separately from [AlphaFold-disorder](https://github.com/BioComputingUP/AlphaFold-disorder/tree/main). The package requires DSSP to be installed: https://github.com/PDB-REDO/dssp. If you run into problems with `mmcif_ma`, try the following in the directory where you cloned AlphaFold-disorder:\n\n    wget https://mmcif.wwpdb.org/dictionaries/ascii/mmcif_ma.dic -O mmcif_ma.dic\n    \n    os.environ[\"MMCIF_MA_DIC\"] = \"/path/to/AlphaFold-disorder/mmcif_ma.dic\"\n\n    sudo curl -o /var/cache/libcifpp/components.cif https://files.wwpdb.org/pub/pdb/data/monomers/components.cif\n    sudo curl -o /var/cache/libcifpp/mmcif_ma.dic https://github.com/ihmwg/ModelCIF/raw/master/dist/mmcif_ma.dic\n    sudo curl -o /var/cache/libcifpp/mmcif_pdbx.dic https://mmcif.wwpdb.org/dictionaries/ascii/mmcif_pdbx_v50.dic\n\nIf you have a problem with cmake: \n\n    pip install --upgrade cmake\n    /path/to/cmake -S . 
-B build\n    sudo /path/to/cmake --build build\n    sudo /path/to/cmake --install build\n\n    sudo curl -o /var/cache/libcifpp/components.cif https://files.wwpdb.org/pub/pdb/data/monomers/components.cif\n    sudo curl -o /var/cache/libcifpp/mmcif_ma.dic https://github.com/ihmwg/ModelCIF/raw/master/dist/mmcif_ma.dic\n    sudo curl -o /var/cache/libcifpp/mmcif_pdbx.dic https://mmcif.wwpdb.org/dictionaries/ascii/mmcif_pdbx_v50.dic\n\n\nThe AlphaFold-disorder package generates four outputs: \n1. Disorder prediction for the given structures, based on 1 - pLDDT\n2. Disorder prediction for the given structures, based on window-averaged RSA\n3. Binding prediction for the given structures, based on combining pLDDT and RSA\n4. An intermediate TSV file containing `aa, plddt, 1-plddt, rsa, ss`, where `ss` is the secondary structure generated by DSSP\n\nTo run the AlphaFold-disorder package on the AF2 and AF3 structures: \n\n    python3 alphafold_disorder.py -i \u003coutput_dir\u003e/AF2/structures -o \u003coutput_dir\u003e/disorder_prediction/AF2/AlphaFold2.tsv -dssp mkdssp -f caid\n    python3 alphafold_disorder.py -i \u003coutput_dir\u003e/AF3/structures -o \u003coutput_dir\u003e/disorder_prediction/AF3/AlphaFold3.tsv -dssp mkdssp -f caid\n\n    # Copy the outputs of the AlphaFold-disorder package to the folder used for assessment\n\n    cp \u003coutput_dir\u003e/disorder_prediction/AF2/AlphaFold2_disorder.dat \u003coutput_dir\u003e/disorder_prediction/predictions/disorder/AlphaFold2-plddt.caid\n    cp \u003coutput_dir\u003e/disorder_prediction/AF2/AlphaFold2_disorder-25.dat \u003coutput_dir\u003e/disorder_prediction/predictions/disorder/AlphaFold2-rsa.caid\n \n    cp \u003coutput_dir\u003e/disorder_prediction/AF3/AlphaFold3_disorder.dat \u003coutput_dir\u003e/disorder_prediction/predictions/disorder/AlphaFold3-plddt.caid\n    cp \u003coutput_dir\u003e/disorder_prediction/AF3/AlphaFold3_disorder-25.dat 
\u003coutput_dir\u003e/disorder_prediction/predictions/disorder/AlphaFold3-rsa.caid\n\n## Evaluation of Predictions as in CAID\nThe `caid.py` script evaluates the prediction files as in the CAID challenge and saves the results in the folder given by `--outputDir`. The evaluation code can be cloned from https://github.com/marnec/vectorized_cls_metrics. It is also included in this repository for easier access. \n\nFor `caid.py`, the disorder predictions (`.caid` files) must be in `\u003coutput_dir\u003e/disorder_prediction/predictions/disorder`, and the references must be in `\u003coutput_dir\u003e/disorder_prediction/references/disorder`. The references are the same labelled FASTA files that were used in the data preparation step. \n\n`caid.py` also needs a `--refList` file that declares which methods should be assessed against which references. For example: \n\n    Method,disorder,binding,linker\n    AlphaFold2-rsa,1,0,0\n    AlphaFold2-plddt,1,0,0\n    AlphaFold3-rsa,1,0,0\n    AlphaFold3-plddt,1,0,0\n    AlphaFold2-binding,0,1,0\n    AlphaFold3-binding,0,1,0\n\nTo execute the script: \n\n    cd src\n    python3 caid.py \\\n        \u003coutput_dir\u003e/disorder_prediction/references/disorder/ \\\n        \u003coutput_dir\u003e/disorder_prediction/predictions/disorder/ \\\n        --refList \u003coutput_dir\u003e/inputs/associations.csv \\\n        --outputDir \u003coutput_dir\u003e/disorder_prediction/caid_results/\n\n## Analysis\nTo reproduce the paper's figures and analysis, run the notebooks `src/figures.ipynb` and `src/caid_evaluations.ipynb`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiocomputingup%2Faf2_af3_idr_analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbiocomputingup%2Faf2_af3_idr_analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiocomputingup%2Faf2_af3_idr_analysis/lists"}