{"id":20693433,"url":"https://github.com/nanxstats/protr","last_synced_at":"2026-03-14T22:11:21.608Z","repository":{"id":5544823,"uuid":"6748764","full_name":"nanxstats/protr","owner":"nanxstats","description":"🧬 Toolkit for generating various numerical features of protein sequences","archived":false,"fork":false,"pushed_at":"2025-08-25T20:29:39.000Z","size":11035,"stargazers_count":53,"open_issues_count":5,"forks_count":13,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-12-06T20:25:12.181Z","etag":null,"topics":["bioinformatics","feature-engineering","feature-extraction","machine-learning","peptides","protein-sequences","sequence-analysis"],"latest_commit_sha":null,"homepage":"https://nanx.me/protr/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nanxstats.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2012-11-18T16:52:47.000Z","updated_at":"2025-08-21T01:08:28.000Z","dependencies_parsed_at":"2022-09-23T22:00:38.581Z","dependency_job_id":"f0d8ed56-26ef-4928-8acd-846fee475eb5","html_url":"https://github.com/nanxstats/protr","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/nanxstats/protr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nanxstats%2Fprotr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nanxstats%2Fprotr/tags","releases_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nanxstats%2Fprotr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nanxstats%2Fprotr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nanxstats","download_url":"https://codeload.github.com/nanxstats/protr/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nanxstats%2Fprotr/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30519313,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-14T19:51:21.629Z","status":"ssl_error","status_checked_at":"2026-03-14T19:51:12.959Z","response_time":57,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","feature-engineering","feature-extraction","machine-learning","peptides","protein-sequences","sequence-analysis"],"created_at":"2024-11-16T23:26:38.323Z","updated_at":"2026-03-14T22:11:21.593Z","avatar_url":"https://github.com/nanxstats.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# protr  \u003cimg src=\"man/figures/logo.png\" align=\"right\" width=\"120\" /\u003e\n\n\u003c!-- badges: start 
--\u003e\n[![R-CMD-check](https://github.com/nanxstats/protr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/nanxstats/protr/actions/workflows/R-CMD-check.yaml)\n[![CRAN Version](https://www.r-pkg.org/badges/version/protr)](https://cran.r-project.org/package=protr)\n[![Downloads from the RStudio CRAN mirror](https://cranlogs.r-pkg.org/badges/protr)](https://cranlogs.r-pkg.org/badges/protr)\n\u003c!-- badges: end --\u003e\n\nComprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) ([PDF](https://nanx.me/papers/protr.pdf)).\n\n## Paper citation\n\nFormatted citation:\n\nNan Xiao, Dong-Sheng Cao, Min-Feng Zhu, Qing-Song Xu (2015). protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. _Bioinformatics_, **31**(11), 1857--1859.\n\nBibTeX entry:\n\n```\n@article{Xiao2015,\n  author  = {Xiao, Nan and Cao, Dong-Sheng and Zhu, Min-Feng and Xu, Qing-Song},\n  title   = {protr/{ProtrWeb}: {R} package and web server for generating various numerical representation schemes of protein sequences},\n  journal = {Bioinformatics},\n  year    = {2015},\n  volume  = {31},\n  number  = {11},\n  pages   = {1857--1859},\n  doi     = {10.1093/bioinformatics/btv042}\n}\n```\n\n## Installation\n\nTo install protr from CRAN:\n\n```r\ninstall.packages(\"protr\")\n```\n\nOr try the latest version on GitHub:\n\n```r\nremotes::install_github(\"nanxstats/protr\")\n```\n\n[Browse the package vignette](https://nanx.me/protr/articles/protr.html) for a quick-start.\n\n## Shiny app\n\nProtrWeb, the Shiny web application built on protr, can be accessed from [http://protr.org](http://protr.org).\n\nProtrWeb is a user-friendly web application for computing the protein sequence descriptors (features) presented in the protr package.\n\n## List of supported descriptors\n\n### Commonly used descriptors\n\n- Amino acid composition descriptors\n  - Amino acid 
composition\n  - Dipeptide composition\n  - Tripeptide composition\n\n- Autocorrelation descriptors\n  - Normalized Moreau-Broto autocorrelation\n  - Moran autocorrelation\n  - Geary autocorrelation\n\n- CTD descriptors\n  - Composition\n  - Transition\n  - Distribution\n\n- Conjoint Triad descriptors\n\n- Quasi-sequence-order descriptors\n  - Sequence-order-coupling number\n  - Quasi-sequence-order descriptors\n\n- Pseudo amino acid composition (PseAAC)\n  - Pseudo amino acid composition\n  - Amphiphilic pseudo amino acid composition\n\n- Profile-based descriptors\n  - Profile-based descriptors derived by PSSM (Position-Specific Scoring Matrix)\n\n### Proteochemometric (PCM) modeling descriptors\n\n- Scales-based descriptors derived by principal components analysis\n  - Scales-based descriptors derived by amino acid properties (AAindex)\n  - Scales-based descriptors derived by 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.)\n  - Scales-based descriptors derived by factor analysis\n  - Scales-based descriptors derived by multidimensional scaling\n  - BLOSUM and PAM matrix-derived descriptors\n\n### Similarity computation\n\nLocal and global pairwise sequence alignment for protein sequences:\n\n- Between two protein sequences\n- Parallelized pairwise similarity calculation with a list of protein sequences\n- Parallelized pairwise similarity calculation between two sets of protein sequences\n\nGO semantic similarity measures:\n\n- Between two groups of GO terms / two Entrez Gene IDs\n- Parallelized pairwise similarity calculation with a list of GO terms / Entrez Gene IDs\n\n### Miscellaneous tools and datasets\n\n- Retrieve protein sequences from UniProt\n- Read protein sequences in FASTA format\n- Read protein sequences in PDB format\n- Sanity check of the amino acid types appearing in the protein sequences\n- Protein sequence segmentation\n- Auto cross covariance (ACC) for generating scales-based descriptors of the same length\n- 20+ 
pre-computed 2D and 3D descriptor sets for the 20 amino acids to use with the scales-based descriptors\n- BLOSUM and PAM matrices for the 20 amino acids\n- Meta information of the 20 amino acids\n\n## Contribute\n\nTo contribute to this project, please take a look at the\n[Contributing Guidelines](https://nanx.me/protr/CONTRIBUTING.html) first.\nPlease note that the protr project is released with a\n[Contributor Code of Conduct](https://nanx.me/protr/CODE_OF_CONDUCT.html).\nBy contributing to this project, you agree to abide by its terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnanxstats%2Fprotr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnanxstats%2Fprotr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnanxstats%2Fprotr/lists"}