https://github.com/raghavagps/nagbinder
A method for predicting NAG interacting residues in a protein from its primary sequence
- Host: GitHub
- URL: https://github.com/raghavagps/nagbinder
- Owner: raghavagps
- License: other
- Created: 2019-07-26T02:52:43.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2026-05-05T06:38:06.000Z (about 2 months ago)
- Last Synced: 2026-05-05T08:27:55.905Z (about 2 months ago)
- Topics: machine-learning-model, nag-binding, nag-interacting, prediction-algorithm, protein-bioinformatics
- Language: Python
- Homepage: http://webs.iiitd.edu.in/raghava/nagbinder
- Size: 10.6 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# NAGbinder
Prediction of NAG interacting residues
NAGbinder is a Python-based tool for predicting NAG-interacting residues in an uncharacterized protein chain. It includes several prediction models built with machine learning techniques such as Support Vector Classifier, Random Forest, and Artificial Neural Network, all implemented using the scikit-learn package. The models use two feature types: (i) binary profiles of patterns and (ii) evolutionary information in the form of a PSSM (Position-Specific Scoring Matrix) generated using PSI-BLAST.
Residues with a score equal to or above the selected threshold are labeled “Interacting”, whereas residues scoring below the threshold are labeled “Non-Interacting”. In our study, the Random Forest model built on binary profiles performed best.
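The thresholding rule can be sketched in a few lines of Python. This is only an illustration of the rule as described, not code from the NAGbinder package; the function name `label_residues` and the example scores are hypothetical.

```python
def label_residues(scores, threshold):
    """Label per-residue scores using the rule described above.

    A score equal to or above the threshold is "Interacting";
    anything below it is "Non-Interacting".
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return ["Interacting" if s >= threshold else "Non-Interacting" for s in scores]


# Hypothetical scores for a five-residue chain (not real NAGbinder output).
example_scores = [0.12, 0.50, 0.81, 0.49, 0.97]
labels = label_residues(example_scores, threshold=0.5)
print(labels)
```

Note that a score exactly equal to the threshold counts as "Interacting", so raising the threshold makes the prediction stricter.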
# Reference
Patiyal et al. (2020) An approach for identifying N-acetylglucosamine interacting residues of a protein from its primary sequence.
Protein Sci. 201-210. doi: 10.1002/pro.3761
# Zenodo
https://doi.org/10.5281/zenodo.20034155
## Web Server
https://webs.iiitd.ac.in/raghava/nagbinder/
# Installation
Command for downloading NAGbinder
```
git clone https://github.com/raghavagps/nagbinder
```
NAGbinder is open-source, Python-based software that requires Python 3.3 or above and runs on Windows, Linux, and macOS. Before running NAGbinder, make sure the following packages are available in your Python environment: sys, wget, os, shutil, scipy, numpy, pandas, sklearn (version 0.19.1), math, and re. Of these, sys, os, shutil, math, and re are part of the Python standard library. Installing Anaconda is recommended; it is freely available at https://www.anaconda.com/download/ .
The user also needs to download the blastpr folder to run the prediction. Run the commands below to download and unzip blastpr:
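As a quick sanity check before running the tool, the following sketch reports which of the third-party packages listed above cannot be found in the current environment. It is not part of NAGbinder; the helper name `missing_packages` is hypothetical, and the check only tests that each package is importable, not that its version matches.

```python
import importlib.util

# Third-party packages named in the requirements above. The remaining
# modules (sys, os, shutil, math, re) ship with Python itself.
REQUIRED = ["wget", "scipy", "numpy", "pandas", "sklearn"]


def missing_packages(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```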
**COMMANDS**
```
wget -c http://webs.iiitd.edu.in/gpsr2/blastpr.zip
unzip blastpr.zip
```
# For users who want to do prediction by using our NAGbinder package
```
cd nagbinder
unzip nag_models.zip
python3 nagbinder.py -h
```
# Examples for users to do NAG interacting residue prediction.
The input protein sequence for nagbinder.py should be in FASTA format; a sample input is provided in the example folder. nagbinder.py requires the following parameters:
**COMMAND**
```
python3 nagbinder.py -i <input_file> -o <output_file> -m <method> -t <threshold>
```
where,
- `-i`: Input file containing the protein sequence(s) in FASTA format
- `-o`: Output file generated by NAGbinder containing the prediction result
- `-t`: User-defined threshold score (between 0 and 1)
- `-m`: Machine learning method and the type of input feature it uses
> The value of method can be between 1-6 with each numeral representing the following prediction methods:
>1. Binary SVC
>2. Binary Random Forest
>3. Binary MLP
>4. Binary KNN
>5. PSSM SVC
>6. PSSM Random Forest
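For scripted runs, the command line can be assembled and validated in Python before it is executed. The sketch below is an illustration, not part of the package: `build_command` and the file names are hypothetical, and it assumes that `-m` takes the method number and `-t` the threshold, as the flag letters suggest.

```python
# Method numbers and names as listed above.
METHODS = {
    1: "Binary SVC",
    2: "Binary Random Forest",
    3: "Binary MLP",
    4: "Binary KNN",
    5: "PSSM SVC",
    6: "PSSM Random Forest",
}


def build_command(input_fasta, output_file, method, threshold):
    """Return the nagbinder.py invocation as an argument list."""
    if method not in METHODS:
        raise ValueError("method must be an integer from 1 to 6")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [
        "python3", "nagbinder.py",
        "-i", input_fasta,
        "-o", output_file,
        "-m", str(method),      # assumed: -m selects the method
        "-t", str(threshold),   # assumed: -t sets the threshold
    ]


# Hypothetical file names, using method 2 (Binary Random Forest).
cmd = build_command("protein.fasta", "result.csv", method=2, threshold=0.5)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from inside the nagbinder directory.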
### For more information type the following command
```
python3 nagbinder.py -h
```
In our package, we provide six machine learning models that use different input features.
- Method '1' is Support Vector Classifier which utilizes binary profile of the pattern as an input feature.
- Method '2' is Random Forest Classifier which also utilizes binary profile of the pattern as an input feature.
- Method '3' is Artificial Neural Network model developed using binary profile of the pattern as input feature.
- Method '4' is K Nearest Neighbor method developed using binary profile of the pattern as input feature.
- Method '5' is Support Vector Classifier which utilizes evolutionary information in the form of PSSM profile as an input feature.
- Method '6' is Random Forest classifier which also utilizes evolutionary information in the form of PSSM profile as an input feature. The PSSM profile is generated using PSI-BLAST by running against the SwissProt database.
## NAGbinder – Datasets
NAGbinder provides gold‑standard datasets of NAG ligand‑interacting protein chains derived from the PDB. Standard protocols were used for dataset generation. The datasets are non‑redundant (CD‑HIT at 40% sequence identity) and comprise 231 NAG‑binding protein chains, split into training and validation sets.
| Dataset | Protein chains | NAG‑interacting residues | Non‑interacting residues |
|---|---|---|---|
| Training | 186 | 1,335 | 47,198 |
| Validation | 45 | 650 | 27,733 |
| Total | 231 | 1,985 | 74,931 |
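The residue counts show a strong class imbalance, which is worth keeping in mind when choosing a threshold. The short sketch below recomputes the totals from the training and validation rows and prints the share of interacting residues in each set; it uses only the numbers reported above.

```python
# (interacting, non-interacting) residue counts from the dataset summary.
counts = {
    "Training": (1335, 47198),
    "Validation": (650, 27733),
}

total_pos = sum(pos for pos, _ in counts.values())
total_neg = sum(neg for _, neg in counts.values())

for name, (pos, neg) in counts.items():
    share = 100.0 * pos / (pos + neg)
    print(f"{name}: {pos} of {pos + neg} residues interacting ({share:.2f}%)")

overall = 100.0 * total_pos / (total_pos + total_neg)
print(f"Total: {total_pos} interacting, {total_neg} non-interacting ({overall:.2f}%)")
```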
To facilitate effective use, we provide three dataset types:
1. Protein chains with interaction annotation
2. Patterns of length 9 (binary profiles)
3. PSSM profiles of patterns (evolutionary information)
## 📁 Dataset Type 1 – Protein chains with interaction annotation
Contains full protein chains where interacting residues are marked with `+` and non‑interacting residues with `-`.

| Dataset | Description |
|---|---|
| Main | 186 NAG‑interacting protein chains with residue‑level annotations |
| Validation | 45 NAG‑interacting protein chains with residue‑level annotations |
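To show how the `+`/`-` annotation can be used, here is a minimal sketch that lists the interacting positions of a chain. The exact file layout is not described here, so the sketch assumes the sequence and its annotation are available as two strings of equal length; the function name and the example strings are hypothetical.

```python
def interacting_positions(sequence, annotation):
    """Return (1-based position, residue) pairs marked with '+'.

    Assumes one annotation character per residue.
    """
    if len(sequence) != len(annotation):
        raise ValueError("sequence and annotation must have the same length")
    return [
        (index + 1, residue)
        for index, (residue, mark) in enumerate(zip(sequence, annotation))
        if mark == "+"
    ]


# Hypothetical six-residue chain and annotation string.
positions = interacting_positions("MKTNWS", "--++-+")
print(positions)
```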
## 📁 Dataset Type 2 – Patterns (window length 9)
Contains sliding window patterns of length 9 generated from the PDB chains. Positive and negative patterns are provided separately for each chain.
| Dataset | Description |
|---|---|
| Main | Patterns (window length 9) from 186 NAG‑interacting chains, with separate positive/negative pattern files per chain |
| Validation | Patterns (window length 9) from 45 NAG‑interacting chains, with separate positive/negative pattern files per chain |
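The sketch below illustrates one common way to generate length-9 patterns and turn them into binary profiles. It is not the authors' code. Two details are assumptions: chain ends are padded with a dummy residue `X` so that every residue sits at the center of one window, and each position is one-hot encoded over the 20 standard amino acids plus `X`.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALPHABET = AMINO_ACIDS + "X"  # assumed: 'X' is the padding/unknown symbol
WINDOW = 9


def sliding_patterns(sequence, window=WINDOW):
    """Return one window per residue, centered on that residue."""
    half = window // 2
    padded = "X" * half + sequence.upper() + "X" * half
    return [padded[i:i + window] for i in range(len(sequence))]


def binary_profile(pattern):
    """One-hot encode a pattern into a flat list of 0s and 1s."""
    vector = []
    for residue in pattern:
        # Any symbol outside the alphabet is treated as 'X'.
        hot = ALPHABET.index(residue) if residue in ALPHABET else ALPHABET.index("X")
        vector.extend(1 if j == hot else 0 for j in range(len(ALPHABET)))
    return vector


# Hypothetical eight-residue sequence.
patterns = sliding_patterns("MKTNWSAC")
profile = binary_profile(patterns[0])
print(patterns[0], len(profile))
```

Under these assumptions each pattern yields a vector of 9 × 21 = 189 values with exactly nine 1s, one per window position.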
## 📁 Dataset Type 3 – PSSM profiles of patterns (window length 9)
Contains PSSM (Position‑Specific Scoring Matrix) profiles for each pattern of length 9, generated from the PDB chains. Positive and negative profiles are provided separately for each chain.
| Dataset | Description |
|---|---|
| Main | PSSM profiles for patterns from 186 NAG‑interacting chains, with separate positive/negative profile files per chain |
| Validation | PSSM profiles for patterns from 45 NAG‑interacting chains, with separate positive/negative profile files per chain |