# IDSL_MINT<img src='MINT_educational_files/Figures/IDSL_MINT-logo.png' width="280px" align="right" />
<!-- badges: start -->
[![Developed-by](https://img.shields.io/badge/Developed_by-Sadjad_Fakouri_Baygi-blue)](https://github.com/sajfb)
[![Powered by RDKit](https://img.shields.io/badge/Powered%20by-RDKit-3838ff.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQBAMAAADt3eJSAAAABGdBTUEAALGPC/xhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAAFVBMVEXc3NwUFP8UPP9kZP+MjP+0tP////9ZXZotAAAAAXRSTlMAQObYZgAAAAFiS0dEBmFmuH0AAAAHdElNRQfmAwsPGi+MyC9RAAAAQElEQVQI12NgQABGQUEBMENISUkRLKBsbGwEEhIyBgJFsICLC0iIUdnExcUZwnANQWfApKCK4doRBsKtQFgKAQC5Ww1JEHSEkAAAACV0RVh0ZGF0ZTpjcmVhdGUAMjAyMi0wMy0xMVQxNToyNjo0NyswMDowMDzr2J4AAAAldEVYdGRhdGU6bW9kaWZ5ADIwMjItMDMtMTFUMTU6MjY6NDcrMDA6MDBNtmAiAAAAAElFTkSuQmCC)](https://www.rdkit.org/)
[![Python](https://img.shields.io/pypi/pyversions/d3blocks)](https://img.shields.io/pypi/pyversions/d3blocks)
[![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white)](https://github.com/pytorch)
<!-- badges: end -->

**IDSL_MINT: Mass spectra INTerpretation** by the [**Integrated Data Science Laboratory for Metabolomics and Exposomics (IDSL.ME)**](https://www.idsl.me) is a deep learning framework for interpreting mass spectrometry data. It is built on the transformer architecture introduced in the paper [*'Attention is all you need'*](https://arxiv.org/abs/1706.03762). **IDSL_MINT** predicts molecular fingerprint descriptors and molecular structures from MS/MS spectra, and it can also predict MS/MS spectra from canonical SMILES.
A key feature of **IDSL_MINT** is that its models can be trained on any reference MS/MS library in ***.msp*** format, so they can be tailored to different applications.

<img src='MINT_educational_files/Figures/IDSL_MINT-TOC_Art.PNG' align="center" width="750" />


## Table of Contents

- [Features of IDSL_MINT](https://github.com/idslme/idsl_mint#features-of-idsl_mint)
- [Installation](https://github.com/idslme/idsl_mint#installation)
- [Workflow](https://github.com/idslme/idsl_mint#workflow)
- [IDSL_MINT: Translating MS/MS Spectra into Molecular Fingerprints](https://github.com/idslme/idsl_mint#idsl_mint-translating-msms-spectra-into-molecular-fingerprints)
- [IDSL_MINT: Translating MS/MS Spectra into Canonical SMILES](https://github.com/idslme/idsl_mint#idsl_mint-translating-msms-spectra-into-canonical-smiles)
- [IDSL_MINT: Transforming Fingerprints into MS/MS Fragments](https://github.com/idslme/idsl_mint#idsl_mint-transforming-fingerprints-into-msms-fragments)
- [Citation](https://github.com/idslme/idsl_mint#citation)

## Features of IDSL_MINT

1) Parameter selection for training and prediction through user-friendly, well-documented [**YAML** files](https://github.com/idslme/IDSL_MINT/tree/main/YAML).
2) Compatibility with *.msp* file formats.
3) Compatibility with various fingerprint descriptor methods.
4) Support for beam search inference.
5) A transformer model architecture.
6) Device-agnostic processing.

## Installation

1. Install the prerequisites:

    a. Install [PyTorch](https://pytorch.org/get-started/locally) for your system configuration. **IDSL_MINT** is device-agnostic and fully supports `cuda` GPU processing.

    b. Install [RDKit](https://www.rdkit.org/docs/Install.html).

2. Install the package using one of the following options:

    2.1. Option 1: `pip` (run either command: the first installs from GitHub, the second from PyPI)

    - `pip install git+https://github.com/idslme/IDSL_MINT`
    - `pip install IDSL_MINT`

    2.2. Option 2: `conda`

    - `git clone https://github.com/idslme/IDSL_MINT.git`
    - `cd IDSL_MINT`
    - `conda env create -f environment.yml`
    - `conda activate IDSL_MINT`
    - `pip install -e .`

3. Add the directory that holds user-installed scripts to your shell `PATH` so that the `MINT_workflow` command can be found:

    `export PATH="$HOME/.local/bin:$PATH"`

    On Linux, `$HOME/.local/bin` is where `pip` places scripts for user installations (for example, `/root/.local/bin` when running as `root`, as on Google Colab).


## Workflow
The **IDSL_MINT** framework provides three approaches for interpreting mass spectrometry data, and each is controlled by a dedicated model configuration `yaml` file. During training, the **IDSL_MINT** model weights are saved to a designated directory whenever the training loss decreases, so the stored weights correspond to the lowest training loss reached. The [`yaml`](https://github.com/idslme/IDSL_MINT/tree/main/YAML) files are commented and easy to edit, which keeps model configuration simple. After configuring the model in its `yaml` file, run the bash command below to perform the calculations.
**IDSL_MINT** automatically detects the type of `yaml` file and runs training or inference accordingly.

    MINT_workflow --yaml /path/to/yaml/file

#### Important tips:
- **IDSL_MINT** can extract information from `comment: ` and `comments: ` entries in ***.msp*** files, which allows it to process MoNA, GNPS, and other public libraries without any pre-treatment.

- **IDSL_MINT** identifies chemical structures through `SMILES: ` or `InChI: ` labels, without case sensitivity.

- If the same header appears more than once in an MSP block, the entry with the longest content is parsed.

- MSP blocks must include a `PrecursorMZ: ` row.


## IDSL_MINT: Translating MS/MS Spectra into Molecular Fingerprints

<!-- badges: start -->
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/16A-Hw6S_04nxlopp7yefZkVB5Aakcodu#scrollTo=E4o1pG-tZNDR)
<!-- badges: end -->

**IDSL_MINT** includes a method to translate MS/MS spectra into molecular fingerprint descriptors. Fingerprints can be calculated from the InChI or SMILES entries using the RDKit implementations of [Extended-connectivity fingerprints (ECFPs)](https://doi.org/10.1021/ci100050t) or [MACCS Keys](https://doi.org/10.1021/ci200081k). Alternatively, user-provided fingerprints can be parsed directly from the MSP files.
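
The MSP header rules listed under *Important tips* can be illustrated with a short, simplified parser. This is a sketch, not the IDSL_MINT implementation: the function name `parse_msp_block`, the choice to match every header (not only `SMILES`/`InChI`) case-insensitively, and the assumption that comment entries hold MoNA-style `"key=value"` pairs are all illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical sketch, not the IDSL_MINT parser. Assumes comment entries
# hold MoNA-style "key=value" pairs, e.g. Comments: "SMILES=CCO" "cas=64-17-5".
COMMENT_PAIR = re.compile(r'"([^"=]+)=([^"]*)"')


def parse_msp_block(block: str) -> dict:
    """Parse one MSP block into {lower-case header: value} plus a peak list."""
    seen = defaultdict(list)  # every value found for each header
    peaks = []
    for raw_line in block.strip().splitlines():
        line = raw_line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep:
            # Header row: match names case-insensitively (SMILES, smiles, ...)
            key, value = key.strip().lower(), value.strip()
            if key in ("comment", "comments"):
                # Lift fields such as SMILES or InChI out of comment entries
                for c_key, c_value in COMMENT_PAIR.findall(value):
                    seen[c_key.strip().lower()].append(c_value.strip())
            else:
                seen[key].append(value)
        else:
            # Peak row: "m/z intensity", optionally followed by an annotation
            parts = line.split()
            if len(parts) >= 2:
                peaks.append((float(parts[0]), float(parts[1])))
    # If a header occurs more than once, keep the value with the longest content
    record = {name: max(values, key=len) for name, values in seen.items()}
    if "precursormz" not in record:
        raise ValueError("MSP block must include a PrecursorMZ row")
    record["peaks"] = peaks
    return record
```

Collecting every value per header before choosing one keeps the "longest content wins" rule independent of line order. In this sketch, structure fields found inside comment entries compete with regular `SMILES:` or `InChI:` rows under the same rule, which is also an assumption about how the two sources are reconciled.
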

The following is an example of an aspirin MSP block with custom fingerprint bits.


    Name: Aspirin
    Fingerprint: 15-53-85-157-246-322-329-343-444-464-553-708-763-785-799-821-847-1040-1139-1240-1250-1317-1348-1439-1450-1460-1475-1479-1502-1674-1693-1734-1841-1866-2046-2310-2329-2413-2627-2750-2755-2777-2782-2799-2901-2911-2915-3028-3049-3394-3412-3442-3514-3535-3557-3700-3737-3785-3972-3996
    Synon: Acetylsalicylic acid
    Synon: 2-acetyloxybenzoic acid
    InChI: InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)
    Precursor_type: [M+H]+
    Spectrum_type: MS2
    PrecursorMZ: 181.0495
    Instrument_type: LC-ESI-QFT
    Instrument: Q Exactive Plus Orbitrap Thermo Scientific
    Ion_mode: P
    Collision_energy: 15 (nominal)
    Formula: C9H8O4
    MW: 180
    ExactMass: 180.042258736
    Num Peaks: 10
    65.0385 0.217327
    76.0304 0.107699
    77.0383 0.124517
    92.0255 0.129908
    121.0283 0.125197
    133.028 0.149192
    149.0231 100.000000
    163.0386 63.824575
    167.0337 0.261816
    181.0493 0.613766


`Fingerprint` rows may appear on any line of an MSP block between the `Name` and `Num Peaks` rows, and the fingerprint bits must be dash-separated. This example lists Avalon fingerprint bits (`nBits = 4096`) for the aspirin MS/MS spectrum.

To train an **IDSL_MINT** model with molecular fingerprint descriptors, download and fill in a [MINT_MS2FP_trainer.yaml](https://github.com/idslme/IDSL_MINT/tree/main/YAML/MINT_MS2FP_trainer.yaml) file.
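
The dash-separated `Fingerprint` format described above maps directly onto a fixed-length bit vector. The sketch below shows one way to decode it; `fingerprint_to_bits` is a hypothetical helper, and it assumes 0-based bit indices and an `n_bits` equal to the length used when the fingerprint was generated (4096 for the Avalon example).

```python
def fingerprint_to_bits(fingerprint: str, n_bits: int = 4096) -> list:
    """Turn a dash-separated Fingerprint value (on-bit indices) into a 0/1 list.

    Hypothetical helper, not part of IDSL_MINT. Assumes 0-based indices and
    that n_bits matches the fingerprint method (4096 for the Avalon example).
    """
    bits = [0] * n_bits
    for token in fingerprint.strip().split("-"):
        if not token:
            continue  # tolerate stray dashes such as "15--53"
        index = int(token)
        if not 0 <= index < n_bits:
            raise ValueError(f"bit index {index} is outside 0..{n_bits - 1}")
        bits[index] = 1
    return bits
```

For example, `fingerprint_to_bits("15-53-85-157")` returns a 4096-element list in which only positions 15, 53, 85, and 157 are set to 1. Rejecting out-of-range indices surfaces a mismatch between the `Fingerprint` rows and the configured fingerprint length early, instead of silently dropping bits.
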

For model prediction, use the [MINT_MS2FP_predictor.yaml](https://github.com/idslme/IDSL_MINT/tree/main/YAML/MINT_MS2FP_predictor.yaml) file.

A [Colab notebook](https://colab.research.google.com/drive/16A-Hw6S_04nxlopp7yefZkVB5Aakcodu#scrollTo=E4o1pG-tZNDR) demonstrates how to train **IDSL_MINT** models and predict molecular fingerprint descriptors from MS/MS data.

## IDSL_MINT: Translating MS/MS Spectra into Canonical SMILES

<!-- badges: start -->
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1UUZwCpI4b0adHZ3y4JTRDPlin-KAWIvQ#scrollTo=RbAS-ZFPVOqM)
<!-- badges: end -->

In this approach, the InChI and SMILES entries in the MSP blocks are converted into canonical SMILES using [RDKit](https://www.rdkit.org). The canonical SMILES are then tokenized with a method similar to [RXNFP](https://rxn4chemistry.github.io/rxnfp). Any MSP library whose blocks contain InChI or SMILES entries can be used to train an **IDSL_MINT** model with this approach.

To train an **IDSL_MINT** model to predict molecular structures from MS/MS spectra, download and fill in a [MINT_MS2SMILES_trainer.yaml](https://github.com/idslme/IDSL_MINT/tree/main/YAML/MINT_MS2SMILES_trainer.yaml) file. Likewise, for model prediction, use the [MINT_MS2SMILES_predictor.yaml](https://github.com/idslme/IDSL_MINT/tree/main/YAML/MINT_MS2SMILES_predictor.yaml) file.

A [Colab notebook](https://colab.research.google.com/drive/1UUZwCpI4b0adHZ3y4JTRDPlin-KAWIvQ#scrollTo=RbAS-ZFPVOqM) demonstrates how to train **IDSL_MINT** models and predict canonical SMILES from MS/MS data.

## IDSL_MINT: Transforming Fingerprints into MS/MS Fragments

This method uses a transformer model to translate molecular fingerprints into MS/MS fragments, in contrast to previous methods that predict fragment masses from fingerprints.

To train an **IDSL_MINT** model to predict MS/MS spectra from molecular structures, download and fill in a [MINT_FP2MS_trainer.yaml](https://github.com/idslme/IDSL_MINT/tree/main/YAML/MINT_FP2MS_trainer.yaml) file. Likewise, for model prediction, use the [MINT_FP2MS_predictor.yaml](https://github.com/idslme/IDSL_MINT/tree/main/YAML/MINT_FP2MS_predictor.yaml) file.

## Citation

[1] Fakouri Baygi, S., Barupal, D.K. [IDSL_MINT: a deep learning framework to predict molecular fingerprints from mass spectra](https://doi.org/10.1186/s13321-024-00804-5). *Journal of Cheminformatics*, **2024**, *16(8)*.