---
output: github_document
---

# AMPBenchmark

AMPBenchmark is part of our initiative to improve benchmarking standards in the field of antimicrobial peptide (AMP) prediction.

## How to use the public data?

1. Download the benchmark sequence data:
    - [Dropbox link](https://www.dropbox.com/scl/fi/6hxboi6xy1jm1q1ie6vyg/AMPBenchmark_public.fasta?rlkey=3egb368kyh347fdfamcfd75m0&st=ld02vyiv&dl=0).
    - [GitHub link](https://raw.githubusercontent.com/BioGenies/AMPBenchmark/main/data/AMPBenchmark_public.fasta?token=GHSAT0AAAAAABS4SIUMO3EI6JSQJJ2OC62WYUT5E6A).
2. Download the training sequence data for all methods and replications:
    - [Dropbox link](https://www.dropbox.com/scl/fo/f8kdfgoa8htsvpc79v0u2/ANOcYXz3fSRyE5kEumDDsVs?rlkey=a0su8jyn5nsjnzs2gkqya5n24&st=xd69dycx&dl=0).
3. Train your model on each of the training data sets. The class of a sequence is denoted by AMP=1 for AMPs and AMP=0 for negative samples (see the [Sequence data](https://github.com/BioGenies/AMPBenchmark#sequence-data) section for details).
4. Benchmark the trained models against our data. Make sure to use the subset of sequences that belongs to the matching replication. The replication number is encoded in the sequence ID, e.g., rep1 (see the [Sequence data](https://github.com/BioGenies/AMPBenchmark#sequence-data) section for details).
5. Submit the results, in the format described below, to the [AMPBenchmark web server](http://biogenies.info/AMPBenchmark/).

### Data submission format

| ID                      | training_sampling | AMP_probability |
|-------------------------|-------------------|-----------------|
| DBAASP_10018_AMP=1_rep1 | dbAMP             | 0.97            |
| DBAASP_3217_AMP=1_rep1  | dbAMP             | 0.61            |
| ...                     | ...               | ...             |

 - **ID**: the sequence ID, exactly as it appears in the FASTA headers of the input sequences.
 - **training_sampling**: the negative sampling method used to generate the data the model was trained on. Possible values are: *AMAP*, *AmpGram*, *ampir-mature*, *AMPlify*, *AMPScannerV2*, *CS-AMPPred*, *dbAMP*, *Gabere&Noble*, *iAMP-2L*, *Wang-et-al*, *Witten&Witten*. A proper benchmark requires you to train a model with every provided sampling method and to evaluate each trained model on the benchmark sequences of all sampling methods, always using the matching replication.
 - **AMP_probability**: the predicted probability that the sequence is an AMP; it must be a value between 0 and 1.

Example data for a random classifier can be downloaded from [Dropbox](https://www.dropbox.com/scl/fi/xqeqdsygkxjg5qt2b7ezg/sample_data.csv?rlkey=ql7gtoumuecwbg5tr0frl81bb&st=w7pdevvn&dl=0).

### Sequence data

The input data are hosted on [Dropbox](https://www.dropbox.com/scl/fi/6hxboi6xy1jm1q1ie6vyg/AMPBenchmark_public.fasta?rlkey=3egb368kyh347fdfamcfd75m0&st=wj8wc93f&dl=0) and [GitHub](https://raw.githubusercontent.com/BioGenies/AMPBenchmark/main/data/AMPBenchmark_public.fasta?token=GHSAT0AAAAAABS4SIUMO3EI6JSQJJ2OC62WYUT5E6A). Note that this single file contains the data for all replications. Each replication should be used separately, together with the corresponding replication of the training sets.

The training data sets are hosted on [Dropbox](https://www.dropbox.com/scl/fo/f8kdfgoa8htsvpc79v0u2/ANOcYXz3fSRyE5kEumDDsVs?rlkey=a0su8jyn5nsjnzs2gkqya5n24&st=vpcy0lyc&dl=0) and follow the same naming convention.

There are two types of input sequences:

 - positive sequences (e.g., **DBAASP_10718**\_*AMP=1*\_rep1): **IDinDBAASP**\_*class*\_replicateID.
 - negative sequences (e.g., **Seq1896_sampling\_method=Gabere&Noble**\_*AMP=0*\_rep4): **IDandSamplingMethod**\_*class*\_replicateID.

AMP sequences are derived from the [DBAASP database](https://dbaasp.org/).

md5 sum of **AMPBenchmark_public.fasta**: `58f1424c057aaeb64bc632cad6038cad`.


```{r echo = FALSE, results = 'asis'}
source("https://raw.githubusercontent.com/BioGenies/NegativeDatasets/main/docs/rmd_scripts.R")
cat(negative_sampling_citation())
```


```{r echo = FALSE, results = 'asis'}
cat(negative_sampling_links())
```


```{r echo = FALSE, results = 'asis'}
cat(negative_sampling_contact())
```

## Changelog

 - 2024/07/29: updated Dropbox links.
 - 2023/01/11: fixed data processing.
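## Example: checking and subsetting the benchmark FASTA file

The *Sequence data* section gives an md5 sum for **AMPBenchmark_public.fasta** and notes that one file holds every replication. Below is a minimal base-R sketch of both steps. It is not part of the AMPBenchmark tooling. The helper names `check_md5()`, `read_fasta()` and `subset_replication()` are hypothetical, and the sketch assumes the FASTA file has been downloaded to the working directory. It relies on the `_rep<number>` suffix at the end of every ID.

```r
# Hypothetical helpers (not provided by AMPBenchmark); base R only.

# TRUE if the file's md5 sum equals the expected value.
check_md5 <- function(path, expected) {
  unname(tools::md5sum(path)) == expected
}

# Read a FASTA file into a named character vector:
# names are the header IDs (without ">"), values are the full sequences.
read_fasta <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]                 # drop empty lines
  is_header <- startsWith(lines, ">")
  n <- sum(is_header)
  record <- cumsum(is_header)                   # record index of each line
  seq_lines <- split(lines[!is_header],
                     factor(record[!is_header], levels = seq_len(n)))
  seqs <- vapply(seq_lines, paste, character(1), collapse = "")
  setNames(seqs, sub("^>", "", lines[is_header]))
}

# Keep only the sequences of one replication; the "$" anchor stops
# rep1 from also matching rep10, rep11, ...
subset_replication <- function(seqs, rep) {
  seqs[grepl(paste0("_rep", rep, "$"), names(seqs))]
}

# Usage (assumes the file is in the working directory):
# check_md5("AMPBenchmark_public.fasta", "58f1424c057aaeb64bc632cad6038cad")
# benchmark_rep1 <- subset_replication(read_fasta("AMPBenchmark_public.fasta"), 1)
```

A package such as Biostrings would also read FASTA files. This sketch sticks to base R so that it runs without extra dependencies.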
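## Example: parsing sequence IDs

The *Sequence data* section describes IDs as **IDinDBAASP**\_*class*\_replicateID for positives and **IDandSamplingMethod**\_*class*\_replicateID for negatives. Below is a sketch of splitting such IDs into columns. The function name `parse_benchmark_id()` and the output column names are illustrative assumptions. The function stops on any ID that does not end in `_AMP=<0|1>_rep<number>`.

```r
# Hypothetical helper: split AMPBenchmark-style IDs into their parts.
parse_benchmark_id <- function(ids) {
  pattern <- "^(.*)_AMP=([01])_rep([0-9]+)$"
  ok <- grepl(pattern, ids)
  if (!all(ok)) stop("Unexpected ID format: ", paste(ids[!ok], collapse = ", "))
  prefix <- sub(pattern, "\\1", ids)            # e.g. "DBAASP_10718"
  has_method <- grepl("_sampling_method=", prefix, fixed = TRUE)
  data.frame(
    ID = ids,
    # Source ID with any "_sampling_method=..." part removed.
    source_id = sub("_sampling_method=.*$", "", prefix),
    # Sampling method is only encoded in negative IDs; NA for positives.
    sampling_method = ifelse(has_method,
                             sub("^.*_sampling_method=", "", prefix),
                             NA_character_),
    AMP = as.integer(sub(pattern, "\\2", ids)),
    replication = as.integer(sub(pattern, "\\3", ids)),
    stringsAsFactors = FALSE
  )
}

# Usage:
# parse_benchmark_id(c("DBAASP_10718_AMP=1_rep1",
#                      "Seq1896_sampling_method=Gabere&Noble_AMP=0_rep4"))
```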
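## Example: checking a submission table

Before uploading, a results table can be checked against the rules in the *Data submission format* section: required columns, allowed **training_sampling** values, and **AMP_probability** between 0 and 1. The sketch below is a local sanity check only and does not reproduce any validation done by the web server. The name `validate_submission()` is hypothetical. Writing the table as CSV is an assumption based on the `sample_data.csv` example file.

```r
# Sampling methods listed in the "Data submission format" section.
allowed_sampling <- c("AMAP", "AmpGram", "ampir-mature", "AMPlify",
                      "AMPScannerV2", "CS-AMPPred", "dbAMP", "Gabere&Noble",
                      "iAMP-2L", "Wang-et-al", "Witten&Witten")

# Hypothetical local check; returns a character vector of problems
# (empty if none were found).
validate_submission <- function(sub_df) {
  required <- c("ID", "training_sampling", "AMP_probability")
  missing_cols <- setdiff(required, names(sub_df))
  if (length(missing_cols) > 0)
    return(paste("missing column(s):", paste(missing_cols, collapse = ", ")))
  problems <- character(0)
  bad <- setdiff(unique(sub_df$training_sampling), allowed_sampling)
  if (length(bad) > 0)
    problems <- c(problems, paste("unknown training_sampling value(s):",
                                  paste(bad, collapse = ", ")))
  p <- sub_df$AMP_probability
  if (!is.numeric(p) || anyNA(p) || any(p < 0 | p > 1))
    problems <- c(problems, "AMP_probability must be numeric and within [0, 1]")
  if (anyDuplicated(sub_df[, c("ID", "training_sampling")]) > 0)
    problems <- c(problems, "duplicated ID/training_sampling pairs")
  # A complete benchmark covers every sampling method.
  absent <- setdiff(allowed_sampling, unique(sub_df$training_sampling))
  if (length(absent) > 0)
    problems <- c(problems, paste("no predictions for training_sampling:",
                                  paste(absent, collapse = ", ")))
  problems
}

# Usage (my_predictions is a hypothetical data frame built from your models):
# problems <- validate_submission(my_predictions)
# if (length(problems) == 0)
#   write.csv(my_predictions, "submission.csv", row.names = FALSE)
```

The function returns all problems at once instead of stopping at the first one, so a single run shows everything that needs fixing.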