{"id":19818913,"url":"https://github.com/lkytal/predfull","last_synced_at":"2025-05-01T11:32:36.847Z","repository":{"id":38398394,"uuid":"190229761","full_name":"lkytal/PredFull","owner":"lkytal","description":"This work was published on Analytical Chemistry: Full-Spectrum Prediction of Peptides Tandem Mass Spectra using Deep Neural Network","archived":false,"fork":false,"pushed_at":"2023-08-04T16:18:41.000Z","size":412351,"stargazers_count":27,"open_issues_count":1,"forks_count":12,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-04-14T06:21:18.022Z","etag":null,"topics":["mass-spectrometry","peptides","spectrum","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lkytal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-06-04T15:32:16.000Z","updated_at":"2024-04-01T12:32:22.000Z","dependencies_parsed_at":"2022-08-25T06:11:27.588Z","dependency_job_id":"0fffdc4d-634d-4c17-8412-9fb732c5c87e","html_url":"https://github.com/lkytal/PredFull","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lkytal%2FPredFull","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lkytal%2FPredFull/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lkytal%2FPredFull/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lkytal%2FPredFull/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lkytal","download_url":"https:
//codeload.github.com/lkytal/PredFull/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224253285,"owners_count":17280936,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mass-spectrometry","peptides","spectrum","tensorflow"],"created_at":"2024-11-12T10:17:14.115Z","updated_at":"2024-11-12T10:17:14.814Z","avatar_url":"https://github.com/lkytal.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PredFull\r\n\r\n__Visit [http://predfull.com/](http://predfull.com/) to try online prediction__\r\n\r\n\u003e This work was published in Analytical Chemistry: [`Full-Spectrum Prediction of Peptides Tandem Mass Spectra using Deep Neural Network`](https://pubs.acs.org/doi/10.1021/acs.analchem.9b04867)\r\n\u003e\r\n\u003e Kaiyuan Liu, Sujun Li, Lei Wang, Yuzhen Ye, Haixu Tang\r\n\r\nThe first model for predicting complete tandem mass spectra from peptide sequences, using a deep convolutional neural network trained on over 2 million experimental spectra.\r\n\r\nFree for academic use.\r\n\r\n## Update History\r\n\r\n* 2022.05.19: Support input peptides of any length.\r\n* 2021.05.18: Support predicting peptides with oxidized methionine.\r\n* 2021.01.01: Update example results.\r\n* 2020.08.22: Fixed performance issues.\r\n* 2020.05.25: Support predicting non-tryptic peptides.\r\n* 2019.09.01: First version.\r\n\r\n## Method\r\n\r\nThe model is based on the architecture of residual convolutional networks. 
Current precision (bin size): 0.1 Th.\r\n\r\n![model](imgs/model.png)\r\n\r\n## How to use\r\n\r\n__In addition to cloning this project, you must download `pm.h5` from [Google Drive](https://drive.google.com/drive/folders/1Ca3HdV-w8TZPRa9KhPBbjrTtGSmtEIsn?usp=sharing) and place it in this folder.__\r\n\r\n### Important Notes\r\n\r\n* The only modification (PTM) supported is **oxidation on methionine**; otherwise only UNMODIFIED peptides are allowed. To indicate an oxidized methionine, use the format \"M(O)\".\r\n* This model assumes a __FIXED__ carbamidomethyl modification on C.\r\n* The length of input peptides is __NOT__ limited; however, expect poor performance with peptides longer than 30 residues.\r\n* The prediction will NOT output peaks with m/z \u003e 2000.\r\n* Predicted peaks weaker than STRONGEST_PEAK / 1000 are regarded as noise and are omitted from the final output.\r\n\r\n### Required Packages\r\n\r\nWe recommend installing the dependencies via [Anaconda](https://www.anaconda.com/distribution/).\r\n\r\n* Python \u003e= 3.7\r\n* TensorFlow \u003e= 2.3.0\r\n* Pandas \u003e= 0.20\r\n* pyteomics\r\n* lxml\r\n\r\n__TensorFlow must be version 2.3.0 or newer! Because of a compatibility bug in TensorFlow, versions before 2.3.0 cannot load the model correctly. We'll release a new model once the TensorFlow team resolves this.__\r\n\r\n### Input format\r\n\r\nThe required input format is TSV, with the following columns:\r\n\r\nPeptide | Charge | Type | NCE\r\n------- | ------ | ---- | ---\r\nAAAAAAAAAVSR | 2 | HCD | 25\r\nAAGAAESEEDFLR | 2 | HCD | 25\r\nAAPAPTASSTININTSTSK | 2 | HCD | 25\r\nAAPAPM(O)NTSTSK | 2 | HCD | 25\r\n\r\nThe 'Peptide' and 'Charge' columns are self-explanatory. The 'Type' must be HCD or ETD (in uppercase). NCE is the normalized collision energy, 25 by default. Note that in the above examples the last peptide has an oxidized methionine, which is the only modification currently supported. 
Check `example.tsv` for examples.\r\n\r\n### Usage\r\n\r\nSimply run:\r\n\r\n`python predfull.py --input example.tsv --model pm.h5 --output example_prediction.mgf`\r\n\r\nThe output file is in MGF format.\r\n\r\n* `--input`: the input file\r\n* `--output`: the output path\r\n* `--model`: the pretrained model\r\n\r\n## Prediction Examples\r\n\r\n__Note that intensities are shown as square-rooted values.__\r\n\r\n![example 1](imgs/hcd2.png)\r\n\r\n![example 2](imgs/hcd1.png)\r\n\r\n## Performance Evaluation\r\n\r\nWe provide sample data on [Google Drive](https://drive.google.com/drive/folders/1Ca3HdV-w8TZPRa9KhPBbjrTtGSmtEIsn?usp=sharing) and code for evaluating the prediction performance. The `hcd_testingset.mgf` file on Google Drive contains ground truth spectra (randomly sampled from the [NIST Human Synthetic Peptide Spectral Library](https://chemdata.nist.gov/dokuwiki/doku.php?id=peptidew:lib:kustersynselected20170530)) that correspond to the items in `example.tsv`, while the `example_prediction.mgf` file contains pre-run predictions.\r\n\r\nTo evaluate the similarity, first download the ground truth reference file `hcd_testingset.mgf` from [Google Drive](https://drive.google.com/drive/folders/1Ca3HdV-w8TZPRa9KhPBbjrTtGSmtEIsn?usp=sharing), then run:\r\n\r\n`python compare_performance.py --real hcd_testingset.mgf --pred example_prediction.mgf`\r\n\r\n* `--real`: the ground truth file\r\n* `--pred`: the prediction file\r\n\r\nYou should get an average similarity of about 0.789 using these two provided MGF files.\r\n\r\n__Make sure that the items in `example.tsv` and `hcd_testingset.mgf` are in the same order! 
Don't reorder, add, or delete items unless you realign them yourself.__\r\n\r\n## How to build \u0026 train the model\r\n\r\nFor those interested in reproducing this model, we provide `train_model.py`, which contains example code to build and train the model.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flkytal%2Fpredfull","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flkytal%2Fpredfull","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flkytal%2Fpredfull/lists"}