https://github.com/bamescience/peptonizer2000
The Peptonizer 2000
Integrating PepGM and Unipept for probability-based taxonomic inference of metaproteomic samples
## About The Project
Introducing the Peptonizer2000, a tool that combines the capabilities of Unipept and PepGM to analyze
metaproteomic mass spectrometry-based samples. PepGM was originally designed for taxonomic inference of viral
mass spectrometry-based samples; we have extended its functionality to analyze metaproteomic samples by
retrieving taxonomic information from the Unipept database.

PepGM is a probabilistic graphical model developed by the eScience group at BAM (Federal Institute for Materials
Research and Testing) that uses belief propagation to infer the taxonomic origin of peptides and taxa in viral samples.
You can learn more about PepGM on its [GitHub](https://github.com/BAMeScience/PepGM) page.

Unipept, on the other hand, is a web-based metaproteomics analysis tool that provides taxonomic information for
identified peptides. To make it work seamlessly with PepGM, we've extended Unipept with new functionalities that
restrict the taxa queried and provide all potential taxonomic origins of the peptides queried. Check out more
information about Unipept [here](https://unipept.ugent.be/).

With the Peptonizer2000, you can look forward to a comprehensive and streamlined workflow that simplifies
the process of identifying peptides and their taxonomic origins in metaproteomic samples.

The Peptonizer2000 workflow comprises the following steps:
1. Query all identified peptides, provided by the user in a .tsv file, in the Unipept API,
and restrict the taxonomic range queried based on any prior knowledge of the sample.
2. Assemble the peptide-taxon associations provided by Unipept into a bipartite graph,
where peptides and taxa are represented by different nodes, and an edge is drawn between a peptide and a taxon
if the peptide is part of the taxon's proteome.
3. Transform the bipartite graph into a factor graph using convolution trees and conditional probability table
factors (CPD).
4. Run the belief propagation algorithm multiple times with different sets of CPD parameters until convergence,
to obtain posterior probabilities of candidate taxa.
5. Use an empirically deduced metric to determine the ideal graph parameter set.
6. Output the top scoring taxa as a results barchart. The results are also available as comma-separated files
for further downstream analysis or visualizations.
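
To make step 2 more concrete, here is a minimal sketch of how peptide-taxon associations could be assembled into a bipartite graph. It is not the Peptonizer2000 implementation: the function name `build_bipartite_graph`, the adjacency-map representation, and the example peptides and taxon IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def build_bipartite_graph(associations):
    """Build a peptide-taxon bipartite graph as two adjacency maps.

    `associations` is an iterable of (peptide, taxon_id) pairs, one pair
    for every taxon whose proteome contains the peptide.
    (Hypothetical helper, not part of the Peptonizer2000 code base.)
    """
    peptide_to_taxa = defaultdict(set)
    taxon_to_peptides = defaultdict(set)
    for peptide, taxon in associations:
        peptide_to_taxa[peptide].add(taxon)    # edge seen from the peptide node
        taxon_to_peptides[taxon].add(peptide)  # same edge seen from the taxon node
    return dict(peptide_to_taxa), dict(taxon_to_peptides)


# Hypothetical example data: peptide sequences paired with taxon IDs.
example_associations = [
    ("AAALEELLK", 562),
    ("AAALEELLK", 1280),
    ("GVTLDEAR", 562),
]

peptide_to_taxa, taxon_to_peptides = build_bipartite_graph(example_associations)

# Peptides connected to more than one taxon are the ambiguous ones that
# the probabilistic model has to resolve.
shared = [p for p, taxa in peptide_to_taxa.items() if len(taxa) > 1]
print(shared)  # ['AAALEELLK']
```

Keeping both directions of the adjacency makes it cheap to ask either "which taxa could explain this peptide?" or "which peptides support this taxon?".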
## Input
* A .tsv file of your peptides output from any proteomic peptide search method. The first column should be the peptide, the second column its score as assigned by the search engine. An example is provided in the test files.
* A config file with your parameters for the Peptonizer2000. A more detailed description of the configuration file can be found below. Additionally, an example config file is provided in this repository.

## Getting Started
### Prerequisites
Make sure you have git installed and clone the repo:
```sh
git clone https://github.com/BAMeScience/Peptonizer2000.git
```
The Peptonizer relies on a snakemake workflow developed with snakemake 5.10.0.
Installing snakemake requires mamba.

To install mamba:
```sh
# replace <your_env> with the name of your conda environment
conda install -n <your_env> -c conda-forge mamba
```

Alternatively, if you do not have conda installed, you can download mamba directly together with miniforge (instructions from the [mamba installation guide](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html)):
```sh
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
```

To install snakemake:
```sh
# replace <your_env> with the name of your conda environment
conda activate <your_env>
mamba install -c conda-forge -c bioconda -n <your_env> snakemake
```
In accordance with the Snakemake recommendations, we suggest saving your sample data
in the `resources` folder. All outputs will be saved in `results`.

Additional required dependencies are Java and GCC.
The Peptonizer2000 is tested on Linux.
All necessary binaries are automatically installed using conda.
### Configuration file
The Peptonizer2000 relies on a configuration file in `yaml` format to set up the workflow.
An example configuration file is provided in `config/config.yaml`.
Do not change the config file location.
Peptonizer parameters
- DataDir: Relative path to raw spectra
- ResultsDir: Relative path to results
- ResourcesDir: Relative path to resources
- ExperimentName: Name of subfolder in results
- TaxaInPlot: Number of inferred taxa shown in the barplot created from the results csv
- Alpha: Grid search increments for alpha
- Beta: Grid search increments for beta
- prior: Grid search increments for prior

Sample-specific parameters
- PeptidesAndScores: Path to your .tsv file of input peptides
- SampleName: Wildcard for spectra file and folder name

UniPept parameters
- TaxaNumber: Number of taxa
- targetTaxa: Comma-separated list of taxa included in the UniPept query. If querying all of Unipept, use '1'
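
The `Alpha`, `Beta` and `prior` entries define a grid of parameter combinations, and belief propagation is run once per combination. The sketch below only illustrates how quickly such a grid grows; the function `parameter_grid`, the lowercase dictionary keys, and the numeric values are assumptions chosen for the example, not values or formats taken from the Peptonizer2000 configuration.

```python
from itertools import product


def parameter_grid(alphas, betas, priors):
    """Return every (alpha, beta, prior) combination as a dictionary.

    Hypothetical helper for illustration only.
    """
    return [
        {"alpha": a, "beta": b, "prior": p}
        for a, b, p in product(alphas, betas, priors)
    ]


# Hypothetical grid values, not defaults from the repository.
alphas = [0.7, 0.8, 0.9]
betas = [0.2, 0.3]
priors = [0.3, 0.5]

grid = parameter_grid(alphas, betas, priors)
print(len(grid))  # 12 combinations, i.e. 12 belief propagation runs
print(grid[0])    # {'alpha': 0.7, 'beta': 0.2, 'prior': 0.3}
```

Because the number of runs is the product of the three list lengths, adding values to any one parameter multiplies the total runtime accordingly.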
### Output files
All Peptonizer2000 output files are saved into the results folder and include the following:
Main results:
- Peptonizer_Results.csv: Table with values ID, score, type (contains all taxids under 'ID' and all probabilities under 'score')
- Barplot of the posterior probabilities of the n (default: 15) highest scoring taxa
Additional (intermediate):
- Intermediate results folder sorted by their prior value for all possible grid search parameter combinations
- TaxaWeights.csv: csv file of all taxids that had at least one protein map to them and their weight
- PepGM_graph.graphml: graphml file of the graphical model (without convolution tree factors). Useful to visualize the graph structure and peptide-taxon connections
- paramcheck.png: barplot of the metric used to determine the graphical model parameters for n (default: 15) best performing parameter combinations
- additional .csv files resulting from the clustering of taxa by peptidome
- log files for bug fixing
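
For downstream analysis, the results table can be read with a few lines of Python. This is a sketch under assumptions: it takes the column headers to be exactly `ID` and `score` as described above, ignores the `type` column because its possible values are not documented here, and uses made-up rows in place of a real results file. The helper name `top_taxa` is hypothetical.

```python
import csv
import io


def top_taxa(csv_file, n=15):
    """Return the n highest scoring (ID, score) pairs from a results table.

    Assumes a header row containing at least the columns 'ID' and 'score'.
    """
    reader = csv.DictReader(csv_file)
    rows = [(row["ID"], float(row["score"])) for row in reader]
    rows.sort(key=lambda r: r[1], reverse=True)  # highest posterior first
    return rows[:n]


# Made-up example content standing in for Peptonizer_Results.csv.
example = io.StringIO(
    "ID,score,type\n"
    "562,0.98,taxon\n"
    "1280,0.12,taxon\n"
    "1423,0.55,taxon\n"
)
print(top_taxa(example, n=2))  # [('562', 0.98), ('1423', 0.55)]
```

For a real run, replace the `io.StringIO` object with `open(path_to_results, newline="")`, where `path_to_results` points to the file in your results folder.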
## Testing the Peptonizer
To test the Peptonizer2000 and see if it is set up correctly on your machine, we provide a test file under resources/test_files. This should be downloaded automatically if you follow the installation instructions above. The test file is a .tsv resulting from the sample S03 of the [CAMPI study](https://www.nature.com/articles/s41467-021-27542-8) searched against a sample-specific database using X!Tandem and MS2Rescore. The original files are available through [PRIDE under PXD023217](https://www.ebi.ac.uk/pride/archive/projects/PXD023217/).
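
Before starting a run, it can help to check that an input file matches the two-column layout described under Input (peptide first, score second). The following sketch is not part of the workflow; the function name `read_peptide_scores` and the example rows are assumptions, and whether a given file carries a header line is not known, so non-numeric scores are simply skipped.

```python
import csv
import io


def read_peptide_scores(tsv_file):
    """Return (peptide, score) tuples from a two-column, tab-separated file."""
    peptides = []
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        try:
            score = float(row[1])
        except ValueError:
            continue  # skip lines without a numeric score, e.g. a header
        peptides.append((row[0], score))
    return peptides


# Made-up example rows standing in for a real peptide .tsv file.
example = io.StringIO("AAALEELLK\t0.99\nGVTLDEAR\t0.87\n")
print(read_peptide_scores(example))  # [('AAALEELLK', 0.99), ('GVTLDEAR', 0.87)]
```

For a real file, pass `open(path_to_tsv, newline="")` instead of the `io.StringIO` object.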
To execute a test run of the Peptonizer2000 using the provided files:
1. Follow the installation instructions above
2. In your terminal, go to the folder resources/test_files
3. Execute the following command to copy the config file to the right directory
```sh
cp ./config.yaml ../../config/
```
4. Make the following alteration to the provided example config file:
- Enter the path to the S03 .tsv file. It should be something like 'path_to_workflow_directory/resources/SampleData/S03_test.tsv'

You should now be all set up to run the Peptonizer2000 on the test files. In your terminal, run
```sh
snakemake --use-conda --cores <n>
```
where `<n>` is the number of cores available on your machine to run this workflow. Make sure the mamba environment in which you installed snakemake is active.
## License
Distributed under the MIT License. See `LICENSE.txt` for more information.
## Contact
Tanja Holstein - [@HolsteinTanja](https://twitter.com/HolsteinTanja) - [email protected]
Pieter Verschaffelt - [email protected]