{"id":32639480,"url":"https://github.com/raghavagps/hemopi2","last_synced_at":"2025-10-31T02:12:14.734Z","repository":{"id":244204226,"uuid":"814555176","full_name":"raghavagps/hemopi2","owner":"raghavagps","description":"HemoPI2: Prediction of hemolytic activity of peptides against mammalian RBCs","archived":false,"fork":false,"pushed_at":"2025-02-10T08:38:51.000Z","size":221,"stargazers_count":8,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-16T09:22:49.332Z","etag":null,"topics":["hemolytic-peptides","hemotoxicity","machine-learning","quantum-computing","red-blood-cells"],"latest_commit_sha":null,"homepage":"http://webs.iiitd.edu.in/raghava/hemopi2/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raghavagps.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-13T08:33:56.000Z","updated_at":"2025-07-31T08:14:19.000Z","dependencies_parsed_at":"2024-12-03T09:38:08.037Z","dependency_job_id":null,"html_url":"https://github.com/raghavagps/hemopi2","commit_stats":null,"previous_names":["anandr88/hemopi2","raghavagps/hemopi2"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/raghavagps/hemopi2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raghavagps%2Fhemopi2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raghavagps%2Fhemopi2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raghavagps%2Fhemopi2/releases","manifests_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raghavagps%2Fhemopi2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/raghavagps","download_url":"https://codeload.github.com/raghavagps/hemopi2/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raghavagps%2Fhemopi2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281914569,"owners_count":26583084,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-31T02:00:07.401Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hemolytic-peptides","hemotoxicity","machine-learning","quantum-computing","red-blood-cells"],"created_at":"2025-10-31T02:12:10.398Z","updated_at":"2025-10-31T02:12:14.719Z","avatar_url":"https://github.com/raghavagps.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HemoPI2\nA method for predicting the hemolytic activity of peptides\n# Introduction\nHemoPI2 provides classification and regression methods for identifying hemolytic peptides and quantifying their hemolytic concentration (HC50 value), specifically against mammalian red blood cells (RBCs). It uses various composition-based features to predict the hemolytic activity of peptides. 
The final model also includes a motif-based module implemented using MERCI. More information on HemoPI2.0 is available from its web server http://webs.iiitd.edu.in/raghava/hemopi2. Please read and cite the HemoPI2 publication for complete information, including the algorithm behind the approach.\n## Reference\nRathore et al. Prediction of hemolytic peptides and their hemolytic concentration. Commun Biol 8, 176 (2025). https://doi.org/10.1038/s42003-025-07615-w\n\n## PIP Installation\nA PIP package is also available for easy installation and use of this tool. Install it with the following command:\n```\npip install hemopi2\n```\nTo see the available options for the pip package, type the following commands:\n```\nhemopi2_regression.py -h\nhemopi2_classification.py -h\n```\n\n# Standalone\n\nThe standalone version of HemoPI2 is written in Python 3, and the following libraries are required for a successful run:\n\n- scikit-learn\n```\npip install scikit-learn==1.3.1\n```\n- Pandas\n- Numpy\n- PyTorch: PyTorch is an open-source machine learning library. You can install it using pip (Python’s package installer). Open your terminal and type:\n```\npip install torch\n```\n- Transformers: The Transformers library provides state-of-the-art machine learning models such as ESM. Install it with:\n```\npip install transformers\n```\n- ESM: ESM (Evolutionary Scale Modeling) is a library for protein sequence modeling. Install it with:\n```\npip install git+https://github.com/facebookresearch/esm.git\n```\n# Important Note\n\n- Due to the large size of the model files, the model directory has been compressed and uploaded to our web server: https://webs.iiitd.edu.in/raghava/hemopi2/download.html\n- Download this zip file.\n- It is crucial to unzip the file before attempting to use the code or model. 
The compressed file must be extracted to its original form for the code to function properly.\n\n\n# Regression\nPredicts the 50% hemolytic concentration (HC50) or half-maximal effective concentration (EC50) in μM, i.e., the concentration at which 50% of red blood cells (RBCs) are lysed. The model is based on the Random Forest Regressor (RFR) algorithm. \n\n**Minimum USAGE**\nTo see the available options for the standalone version, type the following command: \n```\nhemopi2_regrssion.py -h\n```\nTo run the example, type the following command:\n```\nhemopi2_regrssion.py -i peptide.fa\n```\n**Full Usage**: The following is the complete list of options:\n```\nusage: hemopi2_regrssion.py [-h] \n                     [-i INPUT]\n                     [-o OUTPUT]\n                     [-j {1,2,3,4}] \n                     [-d {1,2}]\n                     [-wd Working Directory]\n```\n```\nPlease provide following arguments\n\noptions:\n  -h, --help            show this help message and exit\n  -i INPUT, --input INPUT\n                        Input: protein or peptide sequence(s) in FASTA format or single sequence per line in single letter code\n  -o OUTPUT, --output OUTPUT\n                        Output: File for saving results by default outfile.csv\n  -j {1,2,3,4}, --job {1,2,3,4}\n                        Job Type: 1: Predict, 2: Protein Scanning, 3: Design, 4: Design all possible mutants,by default 1\n  -p POSITION, --Position POSITION\n                        Position of mutation (1-indexed)\n  -r RESIDUES, --Residues RESIDUES\n                        Mutated residues (one or two of the 20 essential amino acids in upper case)\n  -w {8,9,10,11,12,13,14,15,16,17,18,19,20}, --winleng {8,9,10,11,12,13,14,15,16,17,18,19,20}\n                        Window Length: 8 to 20 (scan mode only), by default 8\n  -d {1,2}, --display {1,2}\n                        Display: 1: Hemolytic, 2: All peptides, by default 2\n  -wd WORKING, --working WORKING\n   
                        Working Directory: Location for writing results\n\n```\n**Input File**: Input can be provided in two formats: i) FASTA format (standard) (e.g. peptide.fa) and ii) simple format. In simple format, the file should contain one peptide sequence per line in single-letter code (e.g. peptide.seq). \n\n**Output File**: The program saves results in CSV format; if no output file name is provided, results are stored in outfile.csv.\n\n**Jobs**:  The following jobs are available:  \n1) Prediction: predicts the hemolytic activity (HC50) of the input peptide sequences.\n2) Protein Scanning: predicts hemolytic regions in a protein sequence.\n3) Design: generates mutant peptides with a single amino acid or dipeptide at a user-specified position and predicts their hemolytic activity. Provide the residue (-r) and position (-p) when using this job.\n4) Design all possible mutants: designs all possible mutants and predicts their hemolytic activity.\n\n**Position**: The position (1-indexed) at which a single amino acid or dipeptide is introduced to create a mutation. This option is available only for the Design module.\n\n**Residue**: Mutated residues (one or two of the 20 standard amino acids in upper case, e.g., A for Alanine).\n\n**Window length**: Any window length between 8 and 20 can be chosen for scanning long sequences. This option is available only for the Protein Scanning module.\n\n**Working Directory**: Location for writing results\n\n# Classification\nDetermines whether peptides are hemolytic or non-hemolytic based on their primary sequence. We have employed machine learning models and protein language models. The available options include RF and ESM2-t6 models, as well as their hybrids with MERCI. You can select your preferred model for prediction. 
By default, it uses the Hybrid2 (ESM2-t6+MERCI) approach, which showed the best performance on the independent dataset in our evaluation while also being efficient at runtime.\n\n**Minimum USAGE**\nTo see the available options for the standalone version, type the following command: \n```\nhemopi2_classification.py -h\n```\nTo run the example, type the following command:\n```\nhemopi2_classification.py -i peptide.fa\n```\n**Full Usage**: The following is the complete list of options:\n```\nusage: hemopi2_classification.py [-h] \n                     [-i INPUT]\n                     [-o OUTPUT]\n                     [-t THRESHOLD]\n                     [-j {1,2,3,4,5}]\n                     [-m {1,2,3,4}] \n                     [-d {1,2}]\n                     [-wd Working Directory]\n```\n```\nPlease provide following arguments\n\noptions:\n  -h, --help            show this help message and exit\n  -i INPUT, --input INPUT\n                        Input: protein or peptide sequence(s) in FASTA format or single sequence per line in single letter code\n  -o OUTPUT, --output OUTPUT\n                        Output: File for saving results by default outfile.csv\n  -j {1,2,3,4,5}, --job {1,2,3,4,5}\n                        Job Type: 1: Predict, 2: Protein Scanning, 3: Design, 4: Design all possible mutants, 5: Motif Scanning, by default 1\n  -m {1,2,3,4}, --model {1,2,3,4}\n                        Model: 1: Random Forest, 2: Hybrid1 (RF+MERCI), 3: ESM2-t6, 4: Hybrid2 (ESM+MERCI) by default 4\n  -t THRESHOLD, --threshold THRESHOLD\n                        Threshold: Value between 0 to 1 by default 0.46 (For RF and Hybrid1) and 0.55 (For ESM and Hybrid2)\n  -p POSITION, --Position POSITION\n                        Position of mutation (1-indexed)\n  -r RESIDUES, --Residues RESIDUES\n                        Mutated residues (one or two of the 20 essential amino acids in upper case)\n  -w {8,9,10,11,12,13,14,15,16,17,18,19,20}, --winleng 
{8,9,10,11,12,13,14,15,16,17,18,19,20}\n                        Window Length: 8 to 20 (scan mode only), by default 8\n  -wd WORKING, --working WORKING\n                        Working Directory: Location for writing results\n  -d {1,2}, --display {1,2}\n                        Display: 1: Hemolytic, 2: All peptides, by default 2\n\n```\n\n**Input File**: Input can be provided in two formats: i) FASTA format (standard) (e.g. peptide.fa) and ii) simple format. In simple format, the file should contain one peptide sequence per line in single-letter code (e.g. peptide.seq). \n\n**Output File**: The program saves results in CSV format; if no output file name is provided, results are stored in outfile.csv.\n\n**Threshold**: Provide a threshold between 0 and 1. Note that the score is proportional to the hemolytic potential of the peptide.\n\n**Jobs**:  The following jobs are available:  \n1) Prediction: classifies the input peptide sequences as hemolytic or non-hemolytic.\n2) Protein Scanning: predicts hemolytic regions in a protein sequence.\n3) Design: generates mutant peptides with a single amino acid or dipeptide at a user-specified position and predicts their hemolytic activity. Provide the residue (-r) and position (-p) when using this job.\n4) Design all possible mutants: designs all possible mutants and predicts their hemolytic activity.\n5) Motif Scanning: scans or maps hemolytic motifs within the query sequence using MERCI. 
\n\n**Models**:  Four models have been incorporated in this program:  \ni) Model1 classifies the input peptide sequences as hemolytic or non-hemolytic using the Random Forest (RF) algorithm with various composition-based features of the peptide computed using the Pfeature tool; \n\nii) Model3 classifies the input peptide sequences as hemolytic or non-hemolytic using the protein language model ESM2-t6.\n\niii) Model2 \u0026 Model4 classify the input peptide sequences as hemolytic or non-hemolytic using a hybrid approach: Model2 combines RF with MERCI, and Model4 combines ESM2-t6 with MERCI. Each combines the score from the machine learning model (RF) or the protein language model (ESM2-t6) with the MERCI motif-based score into a Hybrid Score, and the prediction is based on this Hybrid Score.\n\n**Position**: The position (1-indexed) at which a single amino acid or dipeptide is introduced to create a mutation. This option is available only for the Design module.\n\n**Residue**: Mutated residues (one or two of the 20 standard amino acids in upper case, e.g., A for Alanine).\n\n**Window length**: Any window length between 8 and 20 can be chosen for scanning long sequences. 
This option is available only for the Protein Scanning module.\n\n**Working Directory**: Location for writing results\n\n\n\nHemoPI2.0 Package Files\n=======================\nThe package contains the following files; a brief description of each is given below.\n\nINSTALLATION  \t: Installation instructions\n\nLICENSE       \t: License information\n\nmerci : This folder contains the program to run MERCI\n\nREADME.md     \t: This file provides information about this package\n\nhemopi2_regrssion.py \t:  Python program for regression\n\nhemopi2_classification.py  :  Python program for classification\n\npeptide.fa\t: Example file containing peptide sequences in FASTA format\n\npeptide.seq\t: Example file containing peptide sequences in simple format\n\n## Installation via PIP\nUsers can also install HemoPI2.0 via PIP:\n```\npip install hemopi2\n```\nTo see the available options for the pip package, type the following commands:\n\nFor regression: \n```\nhemopi2_regression -h\n```\nFor classification: \n```\nhemopi2_classification -h\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraghavagps%2Fhemopi2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraghavagps%2Fhemopi2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraghavagps%2Fhemopi2/lists"}