https://github.com/biocomputingup/alphafold-disorder

Predict disorder and disorder binding from AlphaFold structures

# AlphaFold-disorder
#### Disorder and binding region detection from AlphaFold predicted structures

The script parses and processes PDB files generated by AlphaFold and expects the pLDDT score in the B-factor column. As an intermediate (mandatory) step, it calculates the Relative Solvent Accessibility (RSA) with DSSP, called through BioPython.
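To see what the script reads from each file, the sketch below extracts the per-residue pLDDT from the B-factor column of an AlphaFold PDB file using only the Python standard library. `read_plddt` is a hypothetical helper, not part of alphafold_disorder.py. It keeps one C-alpha atom per residue and assumes pLDDT is rescaled from 0-100 to 0-1, which matches the `lddt` column in the TSV example further down, where `disorder` equals 1 - `lddt`. The RSA step itself needs DSSP and is not reproduced here.

```python
# Hypothetical helper, not part of alphafold_disorder.py: standard-library
# reader for the fixed-column PDB format written by AlphaFold.
THREE_TO_ONE = dict(zip(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL".split(),
    "ARNDCQEGHILKMFPSTWYV",
))


def read_plddt(pdb_text):
    """Return (pos, aa, lddt, disorder) for each residue, using its C-alpha atom."""
    rows = []
    for line in pdb_text.splitlines():
        # Record name starts the line; the atom name sits in columns 13-16.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        aa = THREE_TO_ONE.get(line[17:20].strip(), "X")  # residue name, columns 18-20
        pos = int(line[22:26])                            # residue number, columns 23-26
        plddt = float(line[60:66])                        # B-factor (pLDDT), columns 61-66
        lddt = plddt / 100.0                              # assumption: rescale 0-100 to 0-1
        rows.append((pos, aa, lddt, 1.0 - lddt))          # disorder = 1 - lddt
    return rows
```

A residue with pLDDT 68.8 yields `lddt` 0.688 and `disorder` 0.312, as in the first row of the TSV example.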

#### Dependencies
- Python3
- NumPy
- Pandas
- BioPython
- DSSP 3.x ("mkdssp" executable)

#### Usage

The script takes as input a folder of PDB files and writes two TSV files.

    python3 alphafold_disorder.py -i pdbs/ -o out.tsv

##### Additional parameters

- ***rsa_window*** (default 25) - RSA values are smoothed over a window of residues centered on the residue being predicted
- ***rsa_threshold*** (default 0.581) - binding predictions are overweighted where the smoothed-RSA disorder prediction is above this threshold
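A minimal sketch of the smoothing controlled by ***rsa_window***, assuming a plain centered moving average that is truncated at the chain termini; the script may handle the edges differently (for example by padding). `smooth_rsa` is a hypothetical name, and the window is assumed to be odd so that it is symmetric around the residue.

```python
def smooth_rsa(rsa, window=25):
    """Centered moving average of per-residue RSA values (hypothetical helper).

    Assumption: near the termini the window is truncated, so each residue gets
    the mean of the RSA values that actually fall inside its window.
    """
    half = window // 2  # residues on each side of the centre (window assumed odd)
    n = len(rsa)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        chunk = rsa[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

For example, `smooth_rsa([0.0, 0.0, 1.0, 0.0, 0.0], window=3)` returns 0.0 at both ends and 1/3 for the three middle residues.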

Both parameters take a space-separated list of values (floats), and the program generates an output for every combination of the provided values.
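The combinations can be illustrated by enumerating the resulting column names with `itertools.product`. `prediction_columns` is a hypothetical helper and the column order is an assumption; the naming pattern follows the `disorder-25` and `binding-25-0.581` columns in the TSV example.

```python
from itertools import product


def prediction_columns(windows, thresholds):
    """Names of the prediction columns for every (window, threshold) combination.

    Hypothetical helper: the order of the columns is an assumption.
    """
    # One smoothed-RSA disorder column per window...
    columns = ["disorder-{}".format(w) for w in windows]
    # ...and one binding column per (window, threshold) pair.
    columns += ["binding-{}-{}".format(w, t) for w, t in product(windows, thresholds)]
    return columns
```

With two windows and two thresholds, this gives two disorder columns and four binding columns.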

##### Output format

###### TSV
By default, the program uses the TSV format and generates two files, ***out_data.tsv*** and ***out_pred.tsv***, containing the intermediate calculations (DSSP output) and the final prediction, respectively. The last two columns (`disorder-<window>` and `binding-<window>-<threshold>`) are the relevant ones, representing the disorder and binding propensities.
```
name pos aa lddt disorder rsa disorder-25 binding-25-0.581
P47710 1 M 0.688 0.312 1.000 0.680 0.869
P47710 2 R 0.832 0.168 0.879 0.691 0.929
P47710 3 L 0.850 0.150 0.854 0.696 0.937
P47710 4 L 0.863 0.137 0.756 0.705 0.943
...
Q5RJL0 67 V 0.502 0.498 0.951 0.896 0.791
Q5RJL0 68 L 0.511 0.489 1.000 0.881 0.795
Q5RJL0 69 P 0.449 0.551 0.787 0.866 0.769
Q5RJL0 70 R 0.514 0.486 1.000 0.864 0.796
...
```
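The binding column above can be reproduced with the formula sketched below. Where the smoothed-RSA disorder exceeds ***rsa_threshold***, the pLDDT-based `lddt` is rescaled into the interval [threshold, 1], i.e. lddt * (1 - threshold) + threshold. With the default threshold this matches every row shown to within 0.001 (the residual comes from the 3-decimal rounding of the printed inputs). The formula is inferred from the sample output rather than taken from alphafold_disorder.py, and returning the smoothed RSA unchanged below the threshold is an assumption.

```python
def binding_score(lddt, disorder_rsa, threshold=0.581):
    """Binding propensity of one residue (formula inferred from the sample output).

    lddt: pLDDT rescaled to [0, 1]; disorder_rsa: smoothed RSA (disorder-<window>).
    """
    if disorder_rsa > threshold:
        # Disordered by RSA: map the pLDDT-based score into [threshold, 1],
        # so a higher pLDDT gives a higher binding score.
        return lddt * (1.0 - threshold) + threshold
    # Assumption: below the threshold the smoothed RSA is returned unchanged.
    return disorder_rsa
```

For the first row, 0.688 * 0.419 + 0.581 = 0.869, the value in the `binding-25-0.581` column.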

###### CAID
The CAID format can be generated with the command below.

    python3 alphafold_disorder.py -i pdbs/ -o out.tsv -f caid

The program generates separate files for each type of prediction and each combination of parameters:
- `out_disorder.dat`, disorder based on pLDDT
- `out_disorder-<window>.dat`, disorder based on RSA smoothed over a window
- `out_binding-<window>-<threshold>.dat`, binding prediction weighted by a threshold on the smoothed RSA

The excerpt below matches the `disorder-25` column of the TSV example above.

```
>P47710
1 M 0.68
2 R 0.691
3 L 0.696
4 L 0.705
...
67 V 0.896
68 L 0.881
69 P 0.866
70 R 0.864
...
```
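A sketch of writing one protein in the CAID layout shown above: a `>` header line with the identifier, then one position, residue, score line per residue. `to_caid` is a hypothetical helper. Rounding the scores to three decimals (which turns 0.680 into `0.68`, as in the excerpt) is an assumption, and the separator is a parameter because the rendered excerpt does not show whether the columns are spaces or tabs.

```python
def to_caid(name, residues, sep=" "):
    """Format one protein as a CAID-style record (hypothetical helper).

    residues: iterable of (position, one-letter residue, score) tuples.
    """
    lines = [">" + name]
    for pos, aa, score in residues:
        # Assumption: scores are rounded to 3 decimals, so 0.680 prints as 0.68.
        lines.append(sep.join([str(pos), aa, str(round(score, 3))]))
    return "\n".join(lines) + "\n"
```

Concatenating the records of several proteins gives a multi-entry file like the excerpt.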

#### How to cite

Piovesan D, Monzon AM, Tosatto SCE.

Intrinsic protein disorder and conditional folding in AlphaFoldDB.
Protein Sci. 2022 Nov;31(11):e4466.

PMID: [36210722](https://pubmed.ncbi.nlm.nih.gov/36210722/)
PMCID: [PMC9601767](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9601767/).