Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/chembl/ChEMBL_Structure_Pipeline
ChEMBL database structure pipelines
- Host: GitHub
- URL: https://github.com/chembl/ChEMBL_Structure_Pipeline
- Owner: chembl
- License: mit
- Created: 2019-02-06T13:43:57.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2024-09-03T22:48:17.000Z (4 months ago)
- Last Synced: 2024-11-06T09:43:50.304Z (2 months ago)
- Language: Python
- Homepage:
- Size: 223 KB
- Stars: 192
- Watchers: 15
- Forks: 38
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- top-life-sciences - **chembl/ChEMBL_Structure_Pipeline** - 10-25 15:20:47 | (Ranked by starred repositories)
README
[![CI Testing](https://github.com/chembl/ChEMBL_Structure_Pipeline/workflows/CI/badge.svg)](https://github.com/chembl/ChEMBL_Structure_Pipeline/actions?query=workflow%3ACI+branch%3Amaster)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

# ChEMBL Structure Pipeline
ChEMBL protocols used to standardise and salt-strip molecules, first used in ChEMBL 26.

Check the [wiki](https://github.com/chembl/ChEMBL_Structure_Pipeline/wiki) and the paper [[1]](#1) for a detailed description of the different processes.
## Installation
From source:

```bash
git clone https://github.com/chembl/ChEMBL_Structure_Pipeline.git
pip install ./ChEMBL_Structure_Pipeline
```

With pip:

```bash
pip install chembl_structure_pipeline
```

With conda:

```bash
conda install -c conda-forge chembl_structure_pipeline
```

## Usage
### Standardise a compound [(info)](https://github.com/chembl/ChEMBL_Structure_Pipeline/wiki/Work-done-by-each-step#standardize_molblock)
```python
from chembl_structure_pipeline import standardizer

# V2000 molfiles are fixed-column, so the column alignment below must be kept intact
o_molblock = """
  Mrv1810 07121910172D

  4  3  0  0  0  0            999 V2000
   -2.5038    0.4060    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
   -2.5038    1.2310    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   -3.2182   -0.0065    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  4  0  0  0
M  CHG  2   2  -1   3   1
M  END
"""

# Returns the standardised structure as a new molblock string
std_molblock = standardizer.standardize_molblock(o_molblock)
```

### Get the parent compound [(info)](https://github.com/chembl/ChEMBL_Structure_Pipeline/wiki/Work-done-by-each-step#get_parent_molblock)
```python
from chembl_structure_pipeline import standardizer

# Methylammonium chloride: the chloride counterion should be stripped from the parent
o_molblock = """
  Mrv1810 07121910262D

  3  1  0  0  0  0            999 V2000
   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5186    1.5178    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -2.8647    1.5789    0.0000 Cl  0  5  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  CHG  2   2   1   3  -1
M  END
"""

# get_parent_molblock returns a tuple; only the parent molblock is used here
parent_molblock, _ = standardizer.get_parent_molblock(o_molblock)
```

### Check a compound [(info)](https://github.com/chembl/ChEMBL_Structure_Pipeline/wiki/Work-done-by-each-step#checkmolecule)
The checker assesses the quality of a structure. It highlights specific features or issues in the structure that may need to be revised. Together with a description of each issue, the checker returns a penalty score from 0 to 9 that reflects the seriousness of the issue: the higher the score, the more critical the issue.
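Because every issue carries a penalty score, a natural follow-up is to rank the issues and separate the critical ones from the minor ones. The sketch below is a minimal illustration, assuming `checker.check_molblock` returns an iterable of `(penalty, message)` pairs. The helper names `triage_issues` and `max_penalty` and the default cut-off of 5 are hypothetical choices, not part of the pipeline's API, and the sample issues are placeholders rather than real checker output.

```python
from typing import Iterable, List, Tuple

# Assumption: each issue is a (penalty, message) pair with an integer penalty from 0 to 9
Issue = Tuple[int, str]


def triage_issues(
    issues: Iterable[Issue], threshold: int = 5
) -> Tuple[List[Issue], List[Issue]]:
    """Split issues into (critical, minor) lists, each sorted by descending penalty.

    `threshold` is a hypothetical cut-off: issues with penalty >= threshold count as critical.
    """
    ordered = sorted(issues, key=lambda issue: issue[0], reverse=True)
    critical = [issue for issue in ordered if issue[0] >= threshold]
    minor = [issue for issue in ordered if issue[0] < threshold]
    return critical, minor


def max_penalty(issues: Iterable[Issue]) -> int:
    """Return the highest penalty among the issues, or 0 when there are no issues."""
    return max((penalty for penalty, _ in issues), default=0)


if __name__ == "__main__":
    # Hypothetical placeholder issues; real messages come from the checker
    sample = [(2, "placeholder minor issue"), (6, "placeholder serious issue")]
    critical, minor = triage_issues(sample)
    print(critical)  # [(6, 'placeholder serious issue')]
    print(minor)  # [(2, 'placeholder minor issue')]
    print(max_penalty(sample))  # 6
```

With the pipeline installed, the input would be the value returned by `checker.check_molblock(o_molblock)`. The threshold is kept as a parameter because what counts as critical depends on the curation policy of each project.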
```python
from chembl_structure_pipeline import checker

# Note the wedge (bond stereo flag 1) on the 1-2 bond, which the checker can flag
o_molblock = """
  Mrv1810 02151908462D

  4  3  0  0  0  0            999 V2000
    2.2321    4.4196    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0023    4.7153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4117    4.5059    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.9568    3.6420    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  1  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
M  END
"""

# Each reported issue pairs a penalty score with a description
issues = checker.check_molblock(o_molblock)
```

## References
<a id="1">[1]</a>
Bento, A.P., Hersey, A., Félix, E. et al. An open source chemical structure curation pipeline using RDKit. J Cheminform 12, 51 (2020). https://doi.org/10.1186/s13321-020-00456-1