Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/csbiology/chlamyatlas
Chlamy Atlas is a AI-powered web application which predicts the localizations of proteins from the Green Algae Chlamydomonas reinhardtii.
https://github.com/csbiology/chlamyatlas
chlamydomonas data-science ml plant science target tool
Last synced: 29 days ago
JSON representation
Chlamy Atlas is a AI-powered web application which predicts the localizations of proteins from the Green Algae Chlamydomonas reinhardtii.
- Host: GitHub
- URL: https://github.com/csbiology/chlamyatlas
- Owner: CSBiology
- Created: 2023-11-28T14:23:55.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-29T07:21:22.000Z (7 months ago)
- Last Synced: 2024-05-29T21:03:27.773Z (7 months ago)
- Topics: chlamydomonas, data-science, ml, plant, science, target, tool
- Language: F#
- Homepage: https://csb-chlamyatlas.bio.rptu.de
- Size: 135 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# ChlamyAtlas
A web UI for optimised versions of the models published in Wang et al. 2023.
![alt text](./assets/image2.png)
# Supported formats
ChlamyAtlas expects input in either the FASTA format or as pure amino acid sequence.
The FASTA format consists of two building blocks. The first is a description which explains the following sequence. This description starts with ">" and is written in a single line. The amino acid sequence follows in the next line and can span multiple lines. An example for this format is:
```
>sp|A0A178WF56|CSTM3_ARATH Protein CYSTEINE-RICH TRANSMEMBRANE MODULE 3 OS=Arabidopsis thaliana OX=3702 GN=CYSTM3 PE=1 SV=1
MAQYHQQHEMKQTMAETQYVTAPPPMGYPVMMKDSPQTVQPPHEGQSKGSGGFLRGCLAA
MCCCCVLDCVF
>sp|A1YKT1|TCP18_ARATH Transcription factor TCP18 OS=Arabidopsis thaliana OX=3702 GN=TCP18 PE=1 SV=1
MNNNIFSTTTTINDDYMLFPYNDHYSSQPLLPFSPSSSINDILIHSTSNTSNNHLDHHHQ
FQQPSPFSHFEFAPDCALLTSFHPENNGHDDNQTIPNDNHHPSLHFPLNNTIVEQPTEPS
ETINLIEDSQRISTSQDPKMKKAKKPSRTDRHSKIKTAKGTRDRRMRLSLDVAKELFGLQ
DMLGFDKASKTVEWLLTQAKPEIIKIATTLSHHGCFSSGDESHIRPVLGSMDTSSDLCEL
ASMWTVDDRGSNTNTTETRGNKVDGRSMRGKRKRPEPRTPILKKLSKEERAKARERAKGR
TMEKMMMKMKGRSQLVKVVEEDAHDHGEIIKNNNRSQVNRSSFEMTHCEDKIEELCKNDR
FAVCNEFIMNKKDHISNESYDLVNYKPNSSFPVINHHRSQGAANSIEQHQFTDLHYSFGA
KPRDLMHNYQNMY
```ChlamyAtlas was developed with the assumption that the description follows the standard used by the Universal Protein Resource ([Uniprot](https://www.uniprot.org/)) and only returns the Uniprot ID as description in the output table. This can be circumvented by removing the "|" in the description. In this case the complete description gets returned.
The only other supported format are pure amino acid sequences. An example for this format is:
```
MAQYHQQHEMKQTMAETQYVTAPPPMGYPVMMKDSPQTVQPPHEGQSKGSGGFLRGCLAA
MCCCCVLDCVF
```
This format can only be used for a single amino acid sequence. Multiple amino acid sequences must be in the following format:
```
>!MAQYHQQHEMKQTMAETQYVTAPPPMGYPVMMKDSPQTVQPPHEGQSKGSGGFLRGCLAA
MCCCCVLDCVF
>!MNNNIFSTTTTINDDYMLFPYNDHYSSQPLLPFSPSSSINDILIHSTSNTSNNHLDHHHQ
FQQPSPFSHFEFAPDCALLTSFHPENNGHDDNQTIPNDNHHPSLHFPLNNTIVEQPTEPS
ETINLIEDSQRISTSQDPKMKKAKKPSRTDRHSKIKTAKGTRDRRMRLSLDVAKELFGLQ
DMLGFDKASKTVEWLLTQAKPEIIKIATTLSHHGCFSSGDESHIRPVLGSMDTSSDLCEL
ASMWTVDDRGSNTNTTETRGNKVDGRSMRGKRKRPEPRTPILKKLSKEERAKARERAKGR
TMEKMMMKMKGRSQLVKVVEEDAHDHGEIIKNNNRSQVNRSSFEMTHCEDKIEELCKNDR
FAVCNEFIMNKKDHISNESYDLVNYKPNSSFPVINHHRSQGAANSIEQHQFTDLHYSFGA
KPRDLMHNYQNMY
```# Result
![alt text](./assets/image.png)
Explanations of Chloropred ,Qchloro, Mitopred,Qmito,Secrpred,Qsecr, and FinalPred.
### Chloropred
Prediction score indicating the likelihood of the protein being localized to the Chloroplast. A higher scores suggest a stronger prediction that the protein is localized in the Chloroplast.
### Qchloro
q-value associated with the Chloroplast prediction score. Provides a measure of statistical significance for the Chloroplast prediction. Lower q-values indicate higher statistical significance.
### Mitopred
Prediction score for the localization of the protein to the Mitochondria. A higher scores suggest a stronger prediction of Mitochondrial localization.### Qmito
q-value associated with the Mitochondria prediction score. Indicates the statistical significance of the Mitochondria localization prediction. Lower q-values suggest a more reliable prediction.### Secrpred
Prediction score for identifying the protein as a Secretory Protein.A higher scores indicate a stronger likelihood that the protein functions as a Secretory Protein.### Qsecr
q-value for the Secretory Protein prediction. Provides a measure of the statistical significance of the Secretory Protein prediction. Lower q-values are indicative of more statistically significant predictions.### FinalPred
Represents the model's final prediction of the protein's localization based on the highest score and its corresponding q-value. The final localization is determined by comparing the q-values and prediction scores against preset cutoffs. If all q-values exceed the cutoff, the protein is classified as "Cytoplasmic."### Cutoff
The threshold q-value below which a prediction is considered statistically significant. Set to 0.05 by default, meaning that predictions with q-values below this threshold are classified as significant. This parameter helps in distinguishing between statistically significant and non-significant predictions, reducing the chance of false-positive localizations.# Docker
## Environment Variables
- **NET_EMAIL_EMAIL**: Email address to send emails from
Default: Set via user secrets
- **NET_EMAIL_ACCOUNTNAME**: Email account name to send emails from
Default: Set via user secrets
- **NET_EMAIL_PASSWORD**: Email account password to send emails from
Default: Set via user secrets
- **NET_EMAIL_SERVER**: Email server to send emails from
Default: Set via user secrets
- **NET_EMAIL_PORT**: Email server port to send emails from
Default: Set via user secrets
- **PYTHON_SERVICE_TIMEOUT**: Time in minutes before conenction timeout between ui and api service
Default: 30 minutes
- **PYTHON_SERVICE_URL**: Sets the url for the api predictor backend.
Default: `http://localhost:8000`
*Remarks:* In docker compose this could be `http://host.docker.internal:8000`
*Remarks:* On Linux might require:
```
extra_hosts:
- "host.docker.internal:host-gateway"
```- **PYTHON_SERVICE_STORAGE_TIMESPAN**: How long the user data should be stored
Default: 1 Hour
## Docker Compose
```yaml
version: '3.7'name: chlamyatlas
services:
api:
image: csbdocker/chlamyatlas-api:latest
ports:
- 8000:80
environment:
GUNICORN_CMD_ARGS: "-k uvicorn.workers.UvicornWorker --preload"
MAX_WORKERS: "4"
TIMEOUT: "0"
ui:
image: csbdocker/chlamyatlas-ui:latest
environment:
PYTHON_SERVICE_URL: "http://host.docker.internal:8000"
PYTHON_SERVICE_STORAGE_TIMESPAN: "7"
ports:
- 5000:5000
# Use this to make host.docker.internal accessible on linux docker
extra_hosts:
- "host.docker.internal:host-gateway"
```# Local Development
## Install pre-requisites
You'll need to install the following pre-requisites in order to build SAFE applications
* [.NET SDK](https://www.microsoft.com/net/download) 8.0 or higher
* [Node 18](https://nodejs.org/en/download/) or higher
* [NPM 9](https://www.npmjs.com/package/npm) or higher
* [Python 3.11](https://www.python.org/downloads/) or higher## Install
- run `setup.cmd`
.. __or__ ..
1. `dotnet tool restore`
2. `py -m venv .venv`
3. `.\.venv\Scripts\python.exe -m pip install -r .\src\FastAPI\requirements.txt`## Run
- `.\build.cmd run` starts SAFE stack
plus in another terminal run:
1. activate local python environment: `.\.venv\Scripts\Activate.ps1`
2. navigate to fastapi folder: `cd .\src\FastAPI\`
3. start fastapi backend: `./run.cmd`## Activate Email notification (optional)
Set user-secrets in the following schema:
```json
{
"email": {
"NET_EMAIL_EMAIL": "[email protected]",
"NET_EMAIL_ACCOUNTNAME": "PlaceholderAccountName",
"NET_EMAIL_PASSWORD": "HelloWorld1234",
"NET_EMAIL_SERVER": "smtp.placeholdermail.de",
"NET_EMAIL_PORT": 587
}
}
```## Publish
### Test Publish
1. `.\build.cmd dockerbundle [--uionly]`, creates `:new` docker image(s). Skip fastapi image with `--uionly`
2. `.\build.cmd dockertest`, uses local docker-compose file to start `:new` images.### To docker-hub
1. Login to CSB-Docker
2. Ensure correct Versions, both for python and dotnet service.
- `.\build.cmd versions`
- *Remarks:* Versions are defined in project files. Paths can be found in build project `ProjectInfo.fs`. Accessed via regex parsing.
3. Run `Test Publish` steps. The following step requires built `:new` images.
4. `.\build.cmd dockerpublish`# Request Workflow
```mermaid
sequenceDiagram
participant py as Python ML
participant net as F#35; Server
participant c as Client
actor u as User
u -->> c: Gives data
c -->>+net: sends user data
par start analysis
loop
net-)+py: send sequence
py->py: predict target
py-)net: return predicted target
end
and return request information
net -) c: returns `request-ID`
end
critical ⚠️
u -->> c: copies and stores `request-ID`
end
opt email
u -->> c: give email address
c -->> net: give id + email to store
end
opt check status
u -->> c: use `request-ID` to check status
end
py-)net: send last package
deactivate py
net-->>net: run q-value calculation
net-->>net: store results
deactivate net
opt gave email
net-)u: send email
end
u -->> c: request data
c-->>net: get data
net-->>c: return data
c-->>u: download data
```