https://github.com/biocomputingup/alphafold-disorder
Predict disorder and disorder binding from AlphaFold structures
- Host: GitHub
- URL: https://github.com/biocomputingup/alphafold-disorder
- Owner: BioComputingUP
- License: gpl-3.0
- Created: 2021-09-03T10:27:24.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2024-11-25T13:59:33.000Z (over 1 year ago)
- Last Synced: 2024-11-25T14:49:25.213Z (over 1 year ago)
- Language: Python
- Size: 38.1 KB
- Stars: 14
- Watchers: 6
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# AlphaFold-disorder
#### Disorder and binding region detection from AlphaFold predicted structures
The script parses and processes PDB files generated by AlphaFold and expects the pLDDT score in the B-factor column. As an intermediate (mandatory) step, it calculates the Relative Solvent Accessibility (RSA) with DSSP through BioPython.
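A minimal, standard-library sketch of the first step, assuming AlphaFold DB-style PDB files in which every atom of a residue carries that residue's pLDDT (on a 0-100 scale) in the B-factor column. The helper name `read_plddt` and the choice of reading only CA atoms are our own, not the script's. The rescaling to [0, 1] and `disorder = 1 - pLDDT` are inferred from the `lddt` and `disorder` columns of the sample output further down (e.g. 0.688 and 0.312).

```python
# Hypothetical helper, not part of alphafold_disorder.py: reads per-residue
# pLDDT from the B-factor column of an AlphaFold PDB file.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def read_plddt(pdb_text):
    """Return (pos, aa, lddt, disorder) tuples, one per residue (CA atoms only)."""
    rows = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-based): atom name 13-16, residue name 18-20,
        # residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            aa = THREE_TO_ONE.get(line[17:20], "X")
            pos = int(line[22:26])
            # ASSUMPTION: pLDDT is stored on a 0-100 scale, as in AlphaFold DB files.
            lddt = float(line[60:66]) / 100.0
            # Disorder as the complement of pLDDT, matching the sample output.
            rows.append((pos, aa, lddt, 1.0 - lddt))
    return rows
```

For example, `read_plddt(open(path).read())` returns one tuple per residue, where `path` is a placeholder for any AlphaFold model file.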
#### Dependencies
- Python3
- NumPy
- Pandas
- BioPython
- DSSP 3.x ("mkdssp" executable)
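Before a first run it can help to confirm that the Python packages import and that `mkdssp` is on the `PATH`. The sketch below is our own convenience check, not part of the repository; it only tests availability and does not verify that the installed DSSP is a 3.x release.

```python
import importlib.util
import shutil


def check_dependencies():
    """Report which of the listed dependencies are available (True/False)."""
    # Import names differ from package names: BioPython is imported as "Bio".
    modules = {"NumPy": "numpy", "Pandas": "pandas", "BioPython": "Bio"}
    status = {name: importlib.util.find_spec(mod) is not None
              for name, mod in modules.items()}
    # DSSP is an external executable, so look it up on PATH instead of importing it.
    status["DSSP (mkdssp)"] = shutil.which("mkdssp") is not None
    return status


for name, ok in check_dependencies().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```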
#### Usage
The script takes as input a folder of PDB files and writes two TSV files.
```
python3 alphafold_disorder.py -i pdbs/ -o out.tsv
```
##### Additional parameters
- ***rsa_window*** (default 25) - RSA values are smoothed over a window of this many residues, centered on the residue being predicted
- ***rsa_threshold*** (default 0.581) - Binding predictions are overweighted when the RSA-based disorder prediction is above this threshold
Both parameters accept a space-separated list of values (floats). The program generates an output for every combination of the provided values.
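To make the two parameters concrete, here is a minimal sketch of a centered moving average over per-residue RSA and of how lists of parameter values expand into output columns: one `disorder-<w>` column per window and one `binding-<w>-<t>` column per window/threshold pair, following the column names in the sample header below. The function names are hypothetical, and the edge handling (shrinking the window at chain termini) is an assumption; the script may pad the sequence instead, so values near the ends can differ.

```python
from itertools import product


def smooth(values, window):
    """Centered moving average of per-residue values (e.g. RSA).

    ASSUMPTION: near the chain ends the window is truncated to the residues
    that exist; the real script may pad the sequence instead.
    """
    half = int(window) // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def output_columns(rsa_windows, rsa_thresholds):
    """Column names for every parameter combination (naming from the sample header)."""
    columns = [f"disorder-{w}" for w in rsa_windows]
    columns += [f"binding-{w}-{t}" for w, t in product(rsa_windows, rsa_thresholds)]
    return columns


print(smooth([1.0, 0.879, 0.854, 0.756], 3))
print(output_columns([15, 25], [0.5, 0.581]))
```

With two windows and two thresholds, `output_columns` yields two disorder columns and four binding columns, such as `binding-15-0.581`.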
##### Output format
###### TSV
By default, the program uses the TSV format and generates two files, ***out_data.tsv*** and ***out_pred.tsv***, holding the intermediate calculations (DSSP output) and the final prediction, respectively.
The last two columns, **`disorder-<rsa_window>`** and **`binding-<rsa_window>-<rsa_threshold>`** (e.g. `disorder-25` and `binding-25-0.581` below), are the relevant ones: they hold the disorder and binding propensities.
```
name pos aa lddt disorder rsa disorder-25 binding-25-0.581
P47710 1 M 0.688 0.312 1.000 0.680 0.869
P47710 2 R 0.832 0.168 0.879 0.691 0.929
P47710 3 L 0.850 0.150 0.854 0.696 0.937
P47710 4 L 0.863 0.137 0.756 0.705 0.943
...
Q5RJL0 67 V 0.502 0.498 0.951 0.896 0.791
Q5RJL0 68 L 0.511 0.489 1.000 0.881 0.795
Q5RJL0 69 P 0.449 0.551 0.787 0.866 0.769
Q5RJL0 70 R 0.514 0.486 1.000 0.864 0.796
...
```
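The `binding-25-0.581` values above can be reproduced from the `lddt` and `disorder-25` columns. Every row shown has `disorder-25` above 0.581, and for those rows the binding score equals `0.581 + (1 - 0.581) * lddt`, i.e. pLDDT rescaled into [threshold, 1], to within the three-decimal rounding of the printed inputs. The sketch below encodes that observation and reads the rows with the standard `csv` module, assuming tab-separated columns. The branch for residues at or below the threshold is not visible in the sample, so returning the smoothed RSA there is our assumption, not documented behavior; the function names are hypothetical.

```python
import csv
import io

# Rows copied from the sample output above; real files are assumed tab-separated.
SAMPLE_ROWS = [
    "name pos aa lddt disorder rsa disorder-25 binding-25-0.581",
    "P47710 1 M 0.688 0.312 1.000 0.680 0.869",
    "P47710 2 R 0.832 0.168 0.879 0.691 0.929",
    "P47710 3 L 0.850 0.150 0.854 0.696 0.937",
    "P47710 4 L 0.863 0.137 0.756 0.705 0.943",
    "Q5RJL0 67 V 0.502 0.498 0.951 0.896 0.791",
    "Q5RJL0 68 L 0.511 0.489 1.000 0.881 0.795",
    "Q5RJL0 69 P 0.449 0.551 0.787 0.866 0.769",
    "Q5RJL0 70 R 0.514 0.486 1.000 0.864 0.796",
]
SAMPLE_TSV = "\n".join("\t".join(row.split()) for row in SAMPLE_ROWS)


def binding_score(lddt, smoothed_rsa, threshold=0.581):
    """Hypothetical reconstruction of the binding score from the sample rows."""
    if smoothed_rsa > threshold:
        # Exposed residue: pLDDT rescaled into [threshold, 1] (matches the sample).
        return threshold + (1.0 - threshold) * lddt
    # ASSUMPTION: at or below the threshold, fall back to the smoothed RSA itself.
    return smoothed_rsa


def check_sample(tsv_text, window=25, threshold=0.581):
    """Return the largest gap between recomputed and reported binding scores."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    worst = 0.0
    for row in reader:
        predicted = binding_score(float(row["lddt"]),
                                  float(row[f"disorder-{window}"]), threshold)
        reported = float(row[f"binding-{window}-{threshold}"])
        worst = max(worst, abs(predicted - reported))
    return worst


print(f"max deviation: {check_sample(SAMPLE_TSV):.4f}")
```

The deviation stays below 0.001 for all eight rows, which is what the rounding of the printed `lddt` values allows.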
###### CAID
The CAID format can be generated with the command below.
```
python3 alphafold_disorder.py -i pdbs/ -o out.tsv -f caid
```
The program writes a separate file for each type of prediction and each combination of parameters:
- `out_disorder.dat`, disorder based on pLDDT
- `out_disorder-<rsa_window>.dat`, disorder based on RSA smoothed over a window
- `out_binding-<rsa_window>-<rsa_threshold>.dat`, binding prediction weighted by a threshold on the smoothed RSA
```
>P47710
1 M 0.68
2 R 0.691
3 L 0.696
4 L 0.705
...
67 V 0.896
68 L 0.881
69 P 0.866
70 R 0.864
...
```
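A minimal sketch of writing one score column in the layout of the CAID example above: a `>` header line with the protein name, then one `position residue score` line per residue. The function name is hypothetical, the tab separator is an assumption (whitespace in the example may have been collapsed), and rounding to three decimals mirrors the example, where 0.680 appears as 0.68.

```python
def to_caid(proteins):
    """proteins maps name -> list of (pos, aa, score); returns CAID-style text."""
    lines = []
    for name, residues in proteins.items():
        lines.append(f">{name}")
        for pos, aa, score in residues:
            # round() then str() drops trailing zeros, e.g. 0.680 -> "0.68"
            lines.append(f"{pos}\t{aa}\t{round(score, 3)}")
    return "\n".join(lines) + "\n"


print(to_caid({"P47710": [(1, "M", 0.680), (2, "R", 0.691)]}), end="")
```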
#### How to cite
Piovesan D, Monzon AM, Tosatto SCE.
Intrinsic protein disorder and conditional folding in AlphaFoldDB.
Protein Sci. 2022 Nov;31(11):e4466.
PMID: [36210722](https://pubmed.ncbi.nlm.nih.gov/36210722/)
PMCID: [PMC9601767](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9601767/).