Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/biogenies/plastogram
Predicts subplastid localization of proteins along with sequence origin (plastid- or nuclear-encoded).
https://github.com/biogenies/plastogram
bioinformatics chloroplast hmm-profile hmmer plastid r-package random-forest subchloroplast-localization subplastid-localization
Last synced: about 1 month ago
JSON representation
Predicts subplastid localization of proteins along with sequence origin (plastid- or nuclear-encoded).
- Host: GitHub
- URL: https://github.com/biogenies/plastogram
- Owner: BioGenies
- Created: 2021-10-17T07:08:37.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-05-24T11:41:18.000Z (over 1 year ago)
- Last Synced: 2023-08-04T17:59:36.941Z (over 1 year ago)
- Topics: bioinformatics, chloroplast, hmm-profile, hmmer, plastid, r-package, random-forest, subchloroplast-localization, subplastid-localization
- Language: R
- Homepage: https://biogenies.info/PlastoGram
- Size: 11.7 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
Awesome Lists containing this project
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
out.width = "100%"
)
```## Web server
The Shiny web server is available under the address: https://biogenies.info/PlastoGram.
## Installation
### Installing HMMER
PlastoGram uses HMMER software to predict thylakoid lumen proteins imported via Sec and Tat pathways with HMM profiles.
You can install HMMER using the following instructions (adapted from [HMMER documentation](http://hmmer.org/documentation.html)):- installing with system packager
```
% brew install hmmer # OS/X, HomeBrew
% port install hmmer # OS/X, MacPorts
% apt install hmmer # Linux (Ubuntu, Debian...)
% dnf install hmmer # Linux (Fedora)
% yum install hmmer # Linux (older Fedora)
% conda install -c bioconda hmmer # Anaconda
```- compiling from source
```
% wget http://eddylab.org/software/hmmer/hmmer.tar.gz
% tar zxf hmmer.tar.gz
% cd hmmer-3.3.2
% ./configure --prefix /your/install/path
% make
% make check
% make install
% (cd easel; make install)
```For more information about HMMER please visit [HMMER website](http://hmmer.org/).
### Installing PlastoGram
You can install the latest development version of the package:
```R
if(!require("devtools")) install.packages("devtools")
devtools::install_github("https://github.com/BioGenies/PlastoGram/")
```## PlastoGram model
PlastoGram is a robust ensemble model for detailed and accurate prediction of subplastid localization and origin of analysed protein. PlastoGram classifies protein as nuclear- or plastid-encoded and predicts its most probable localization considering envelope, stroma, thylakoid membrane or thylakoid lumen. For the latter, also the import pathway (Sec or Tat) is predicted. We provide an additional function to differentiate nuclear-encoded inner and outer membrane proteins.
PlastoGram consists of eight lower-level models trained to tackle a specific labeling problem and a higher-level random forest model, which uses results from of lower-level models to provide the prediction of a protein localization and origin. The lower-level models used in the PlastoGram with their areas of competence are listed below:
- Nuclear_model - recognizes nuclear-encoded proteins
- Membrane_model - identifies membrane proteins
- N_E_vs_N_TM_model - differentiates between nuclear-encoded envelope proteins and nuclear-encoded thylakoid membrane proteins. Prediction values over 0.5 indicate envelope, whereas lower thylakoid membrane
- Plastid_membrane_model - distinguishes plastid-encoded proteins targeted to plastid inner and thylakoid membrane. Prediction values higher than 0.5 indicate inner membrane, whereas lower thylakoid membrane
- N_E_vs_N_S_model - differentiates nuclear-encoded proteins targeted to envelope from nuclear-encoded stromal proteins. Prediction values over 0.5 indicate envelope, whereas lower stroma
- Nuclear_membrane_model - distinguishes nuclear-encoded membrane proteins from all others
- Sec_model - identifies proteins targeted to the thylakoid lumen via Sec pathway
- Tat_model - recognizes proteins targeted to the thylakoid lumen via Tat pathway## Versions of the model
```{r, echo=FALSE, results='asis'}
cat(PlastoGram:::markdown_versions())
```## Predicting subplastid localization
Suppopse you wish to predict localization of proteins which sequences are stored in sequence_file.fa located in the working directory using the holdout version of PlastoGram model.
``` r
# load the package
library(PlastoGram)# read in sequences
sequences <- read_txt("sequence_file.fa")# run predictions
predictions <- predict(PlastoGram_H, sequences)# inspect results
# see predictions of origin and localization:
predictions[["Final_results"]]# save predictions of origin and localization to a csv file named 'PlastoGram_predictions.csv'
write.csv(predictions[["Final_results"]], "PlastoGram_predictions.csv")# get summary with number of proteins identified in each class
library(dplyr)
predictions[["Final_results"]] %>%
group_by(Localization) %>%
summarise(count = n())# see predictions from lower-level models
predictions[["Lower_level_preds"]]# see higher-level model predictions
predictions[["Higher_level_preds"]]# see predictions of detailed localization for N_E proteins (OM or IM)
predictions[["OM_IM_preds"]]```
## Citation
```{r, echo=FALSE, results='asis'}
cat(PlastoGram:::markdown_citation())
```## Code availability
Code necessary to reproduce the whole analysis described in the article and leading to the final PlastoGram algorithm is available at https://github.com/BioGenies/PlastoGram-analysis.
## Contact
```{r, echo=FALSE, results='asis'}
cat(PlastoGram:::markdown_contact())
```## Acknowledgements
```{r, echo=FALSE, results='asis'}
cat(PlastoGram:::markdown_acknowledgements())
```