https://github.com/msikorski93/protein-tertiary-structure
Performing a regression task for estimating residue size based on given physicochemical properties of protein tertiary structures (CASP 5-9).
bioinformatics gradient-boosting multilayer-perceptron-network protein-structure-prediction regression-algorithms scikit-learn tensorflow
- Host: GitHub
- URL: https://github.com/msikorski93/protein-tertiary-structure
- Owner: msikorski93
- Created: 2023-02-15T22:18:38.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-02-15T22:50:25.000Z (over 2 years ago)
- Last Synced: 2025-01-09T07:51:11.242Z (10 months ago)
- Topics: bioinformatics, gradient-boosting, multilayer-perceptron-network, protein-structure-prediction, regression-algorithms, scikit-learn, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 1.27 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Protein-Tertiary-Structure
One of the most important research topics in molecular biology is the 3D reconstruction of macromolecules such as proteins. Studying protein tertiary structure helps explain protein function and supports drug design and discovery. Predicting protein structure is one of the most challenging tasks in bioinformatics, and in recent years soft computing techniques have emerged as some of the most promising approaches to this type of task.
To estimate residue size for each protein, we used the following supervised learning algorithms:
* artificial neural network,
* random forest,
* multi-layer perceptron,
* gradient boosting.
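
As a rough illustration of how the scikit-learn regressors from this list can be fitted side by side, here is a minimal sketch. It is not the notebook's actual code: the synthetic data from `make_regression`, the train/test split, and all hyperparameters are assumptions chosen only to keep the example small, and the TensorFlow-based artificial neural network is left out.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): the real features and target
# come from the CASP 5-9 dataset used in the notebook.
X, y = make_regression(n_samples=300, n_features=9, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters below are illustrative assumptions, not the notebook's settings.
models = {
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
    # The MLP is wrapped in a pipeline so its inputs are standardized first.
    "MLP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    ),
}

# Fit every model on the training split and predict on the held-out split.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
```

Keeping the models in a dictionary makes it easy to evaluate all of them with the same loop and the same held-out data.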
| Model | MAE | RMSE | R2 Score |
|------------------|----------|----------|----------|
| RandomForest | 0.466395 | 0.641751 | 0.588156 |
| GradientBoosting | 0.485399 | 0.659230 | 0.565416 |
| MLP | 0.488006 | 0.667914 | 0.553890 |
| ANN | 0.505709 | 0.689411 | 0.524713 |
The table above shows the scores obtained when evaluating the models. Overall, the scores are moderate: the best R2 score is about 0.59. The dataset does not offer enough informative features to develop highly useful machine learning models. The best-performing regressor was the **random forest**. Estimating residue size with this dataset is possible, but not yet accurate.
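
The three metrics in the table (MAE, RMSE, R2) can be computed with `sklearn.metrics`. The sketch below is an assumption about how such an evaluation might look, not the notebook's code; the helper name `regression_scores` and the toy arrays are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_scores(y_true, y_pred):
    """Return MAE, RMSE and R2 for one set of predictions (hypothetical helper)."""
    mae = mean_absolute_error(y_true, y_pred)
    # RMSE is the square root of the mean squared error.
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    r2 = r2_score(y_true, y_pred)
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Toy values for illustration only (assumption), not results from the dataset.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
scores = regression_scores(y_true, y_pred)
print(scores)
```

MAE and RMSE are expressed in the units of the target, so lower is better, while an R2 score closer to 1 means the model explains more of the target's variance.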