https://github.com/brsynth/biorgroup

Systematic expansion of R-group for ChEBI molecules from the RHEA database
chebi r-group rhea

Host: GitHub
URL: https://github.com/brsynth/biorgroup
Owner: brsynth
License: mit
Created: 2025-06-13T08:40:47.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-09-03T06:58:59.000Z (11 months ago)
Last Synced: 2025-09-09T20:14:05.898Z (11 months ago)
Topics: chebi, r-group, rhea
Language: Jupyter Notebook
Homepage:
Size: 421 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# BioRGroup dataset

![BioRGroup Logo](.github/docs/logo.jpg)  

[https://doi.org/10.57745/V3URYA](https://doi.org/10.57745/V3URYA)

## Installation

```sh
conda env create --file recipes/worklow.yaml --name biorgroup
conda activate biorgroup
pip install --no-deps -e .
```

## Build dataset

### 1 - Download PubChem

```sh
python -m biorgroup.pubchem.download \
    --output-pubchem-dir  \
    --output-pubchem-db 
```
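
The README does not describe what the download step writes beyond a directory and a database path. If, as the `.db` extension suggests, the database is a SQLite file (an assumption, not confirmed here), a quick way to check that the download completed is to list its tables and row counts. `list_tables` is a hypothetical helper, not part of `biorgroup`:

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every table in a SQLite file.

    Assumption: the PubChem database written by the download step is SQLite.
    """
    path = Path(db_path).resolve()
    if not path.exists():
        raise FileNotFoundError(path)
    # Open read-only (mode=ro) so that inspecting the file never modifies it.
    with closing(sqlite3.connect(f"file:{path.as_posix()}?mode=ro", uri=True)) as conn:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            # Identifiers cannot be bound as parameters, so quote them explicitly.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

For example, `list_tables("pubchem.db")` inspects the same file name that the R-group step passes as `input_pubchem_db`. Counting rows scans each table, so on a full PubChem download this can take some time.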

### 2 - Download Rhea

```sh
python -m biorgroup.rhea.download \
    --output-rhea-dir  \
    --parameter-release-int 
```
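
The R-group step reads its ChEBI molecules from the file given as `input_chebi_csv` (`rhea-chebi-smiles.csv` in the example command). This README lists neither that file's columns nor whether the Rhea download writes it directly, so the following is only a generic sketch, assuming a comma-separated file with a header row. `peek_csv` is a hypothetical helper that reports the header, the first few rows and the row count:

```python
import csv
from pathlib import Path


def peek_csv(csv_path: str, n_rows: int = 3) -> tuple[list[str], list[list[str]], int]:
    """Return (header, first n_rows data rows, total number of data rows)."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # assumes the first line is a header row
        head: list[list[str]] = []
        total = 0
        for row in reader:
            if total < n_rows:
                head.append(row)
            total += 1
    return header, head, total
```

Usage: `header, rows, n = peek_csv("rhea-chebi-smiles.csv")` gives a quick check of the input before launching the workflow.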

### 3 - R-group search

```sh
snakemake \
    -p \
    -j 48 \
    -c 48 \
    --workflow-profile template/biorgroup \
    -s ./src/biorgroup/rgroup/Snakefile \
    --use-conda \
    --latency-wait 5 \
    --rerun-incomplete \
    --config input_depot_str=./src/biorgroup/rgroup input_chebi_csv=rhea-chebi-smiles.csv input_pubchem_db=pubchem.db output_dir_str=chebi parameter_search_timeout_int=10
```
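
When the workflow is driven from Python, the same invocation can be built from a dictionary, which keeps the `--config` values in one place. This is a sketch, not part of `biorgroup`: `build_snakemake_cmd` and `CONFIG` are hypothetical names, while the flags and config keys are copied from the command above. Setting `dry_run=True` adds Snakemake's `--dry-run` flag, a cheap way to check that the inputs resolve before committing 48 cores:

```python
import shlex


def build_snakemake_cmd(config: dict[str, object], cores: int = 48,
                        dry_run: bool = False) -> list[str]:
    """Assemble the R-group search command as an argument list."""
    cmd = [
        "snakemake", "-p",
        "-j", str(cores), "-c", str(cores),
        "--workflow-profile", "template/biorgroup",
        "-s", "./src/biorgroup/rgroup/Snakefile",
        "--use-conda", "--latency-wait", "5", "--rerun-incomplete",
    ]
    if dry_run:
        # --dry-run lists the jobs Snakemake would run without executing them.
        cmd.append("--dry-run")
    # --config takes a variable number of KEY=VALUE arguments, so keep it last.
    cmd.append("--config")
    cmd.extend(f"{key}={value}" for key, value in config.items())
    return cmd


# Hypothetical name; values copied from the command above.
CONFIG = {
    "input_depot_str": "./src/biorgroup/rgroup",
    "input_chebi_csv": "rhea-chebi-smiles.csv",
    "input_pubchem_db": "pubchem.db",
    "output_dir_str": "chebi",
    "parameter_search_timeout_int": 10,
}

if __name__ == "__main__":
    # Print a copy-pasteable shell command; pass the list itself to
    # subprocess.run(cmd, check=True) to actually launch the workflow.
    print(shlex.join(build_snakemake_cmd(CONFIG, dry_run=True)))
```

Passing the argument list to `subprocess.run` without `shell=True` avoids quoting problems if a path contains spaces.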

## Dataset overview

The Snakemake workflow writes a gzip-compressed CSV file (`csv.gz`) with the following columns:

| Column name | Type |
| --- | --- |
| smiles_rhea | `str` |
| chebi | `List[str]` |
| num_heavy_atoms | `int` |
| exact_mol_wt | `float` |
| core_superstructure_smiles | `List[str]` |
| core_superstructure_pubchem_cid | `List[List[str]]` |
| rgroup_extended_smiles | `List[str]` |
| rgroup_extended_pubchem_cid | `List[List[str]]` |
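
A minimal sketch for loading this file with the standard library, assuming comma-separated values with a header row and list-valued columns serialized as Python or JSON list literals (for example `['CHEBI:1', 'CHEBI:2']`). The README gives the column types but not their on-disk encoding, so adjust the parsing if the lists are stored differently. `read_biorgroup` is a hypothetical helper:

```python
import ast
import csv
import gzip

# Columns typed List[...] in the table above; assumed to be stored as
# list-literal strings such as "['CHEBI:1', 'CHEBI:2']".
LIST_COLUMNS = {
    "chebi", "core_superstructure_smiles", "core_superstructure_pubchem_cid",
    "rgroup_extended_smiles", "rgroup_extended_pubchem_cid",
}
NUMERIC_COLUMNS = {"num_heavy_atoms": int, "exact_mol_wt": float}


def read_biorgroup(path: str) -> list[dict[str, object]]:
    """Load the workflow's csv.gz output into a list of typed dicts."""
    records = []
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            record: dict[str, object] = dict(row)
            for col in LIST_COLUMNS.intersection(row):
                value = row[col]
                # literal_eval only parses literals; it does not execute code.
                record[col] = ast.literal_eval(value) if value else []
            for col, cast in NUMERIC_COLUMNS.items():
                if row.get(col):
                    record[col] = cast(row[col])
            records.append(record)
    return records
```

The output file name inside `output_dir_str` is not given in this README; pass its path to `read_biorgroup`. Empty list cells are read as `[]`.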
