https://github.com/thomas0809/rxnscribe
A Sequence Generation Model for Reaction Diagram Parsing
- Host: GitHub
- URL: https://github.com/thomas0809/rxnscribe
- Owner: thomas0809
- License: mit
- Created: 2023-02-23T05:09:33.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-09-18T20:15:03.000Z (over 2 years ago)
- Last Synced: 2025-03-25T22:35:32.534Z (about 1 year ago)
- Topics: chemistry, deep-learning, reaction
- Language: Jupyter Notebook
- Homepage:
- Size: 42 MB
- Stars: 69
- Watchers: 3
- Forks: 24
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# RxnScribe
This is the repository for RxnScribe, a sequence generation model for reaction diagram parsing.
Try our [demo](https://huggingface.co/spaces/yujieq/RxnScribe) on Hugging Face!

If you use RxnScribe in your research, please cite our [paper](https://pubs.acs.org/doi/10.1021/acs.jcim.3c00439).
```bibtex
@article{RxnScribe,
  author = {Qian, Yujie and Guo, Jiang and Tu, Zhengkai and Coley, Connor W. and Barzilay, Regina},
  title = {RxnScribe: A Sequence Generation Model for Reaction Diagram Parsing},
  journal = {Journal of Chemical Information and Modeling},
  doi = {10.1021/acs.jcim.3c00439}
}
```
Molecule structure recognition is supported by MolScribe
([paper](https://pubs.acs.org/doi/10.1021/acs.jcim.2c01480),
[code](https://github.com/thomas0809/MolScribe),
[demo](https://huggingface.co/spaces/yujieq/MolScribe)).
## Quick Start
Run the following commands to install the package and its dependencies:
```bash
git clone https://github.com/thomas0809/RxnScribe.git
cd RxnScribe
pip install .
```
Download the checkpoint and use RxnScribe to extract reactions from a diagram:
```python
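import torch
from rxnscribe import RxnScribe
from huggingface_hub import hf_hub_download

# Download the released checkpoint from the Hugging Face Hub (cached locally after the first call)
ckpt_path = hf_hub_download("yujieq/RxnScribe", "pix2seq_reaction_full.ckpt")
# Load the model on CPU; pass torch.device('cuda') instead to run on a GPU if one is available
model = RxnScribe(ckpt_path, device=torch.device('cpu'))

image_file = "assets/jacs.5b12989-Table-c3.png"
# molscribe=True converts molecule boxes to SMILES/molfile; ocr=True reads the text in text boxes
predictions = model.predict_image_file(image_file, molscribe=True, ocr=True)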
```
The predictions will be in the following format:
```python
[
{ # First reaction
'reactants': [
{
'category': '[Mol]', 'category_id': 1, 'bbox': (0.1550, 0.0246, 0.2851, 0.2614),
'smiles': '*OC(=O)c1ccccc1C#Cc1ccccc1', 'molfile': '(omitted)'
},
# ... more reactants
],
'conditions': [
{
'category': '[Txt]', 'category_id': 2, 'bbox': (0.2941, 0.0641, 0.3811, 0.1450),
'text': ['CIBcat', '(1.4 equiv)']
},
# ... more conditions
],
'products': [
# ...
]
},
# More reactions
]
```
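As an illustration of how this nested structure can be consumed, the sketch below flattens one predicted reaction into a reaction SMILES string of the form `reactants>agents>products`. It assumes the prediction was made with `molscribe=True`, so molecule entries carry a `smiles` key; text-only entries (such as OCR-read conditions) have no SMILES and are skipped. The helper name `to_reaction_smiles` is hypothetical and not part of the RxnScribe package.

```python
def to_reaction_smiles(reaction):
    """Build a 'reactants>agents>products' string from one predicted reaction.

    Hypothetical helper: only entries with a non-empty 'smiles' value are used;
    text entries (e.g. conditions read by OCR) are skipped.
    """
    def join(entries):
        # Molecules within the same role are separated by '.' in SMILES
        return ".".join(e["smiles"] for e in entries if e.get("smiles"))

    return ">".join([
        join(reaction.get("reactants", [])),
        join(reaction.get("conditions", [])),
        join(reaction.get("products", [])),
    ])


# Usage, reusing `predictions` from the snippet above:
# for reaction in predictions:
#     print(to_reaction_smiles(reaction))
```

For a reaction whose only molecule is the first reactant shown above, whose conditions are text, and whose product list is empty, this returns `'*OC(=O)c1ccccc1C#Cc1ccccc1>>'`.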
We provide a function to visualize the predictions:
```python
visualize_images = model.draw_predictions(predictions, image_file=image_file)
```
Each predicted reaction is visualized in a separate image, where red boxes mark reactants, green boxes mark reaction conditions, and blue boxes mark products.
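If you prefer to draw the boxes with your own plotting code, the bounding boxes first need to be mapped to pixel coordinates. The sketch below assumes (this is not confirmed here) that each `bbox` is `(x1, y1, x2, y2)` normalized to [0, 1] by the image width and height, which is consistent with the values in the example output above. `ROLE_COLORS` and `boxes_in_pixels` are hypothetical names that simply reuse the color scheme just described.

```python
# Role -> box color, following the color scheme described above
ROLE_COLORS = {
    "reactants": "red",
    "conditions": "green",
    "products": "blue",
}


def boxes_in_pixels(reaction, width, height):
    """Return (role, color, (x1, y1, x2, y2)) pixel boxes for one predicted reaction.

    ASSUMPTION: each 'bbox' is (x1, y1, x2, y2) normalized to [0, 1] by the
    image width and height; verify against draw_predictions before relying on it.
    """
    boxes = []
    for role, color in ROLE_COLORS.items():
        for entry in reaction.get(role, []):
            x1, y1, x2, y2 = entry["bbox"]
            # Scale x by the width and y by the height, rounding to whole pixels
            pixel_box = (round(x1 * width), round(y1 * height),
                         round(x2 * width), round(y2 * height))
            boxes.append((role, color, pixel_box))
    return boxes
```

Each pixel box can then be handed to a drawing library, for example Pillow's `ImageDraw.Draw(image).rectangle(pixel_box, outline=color)`.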
This [notebook](notebook/predict.ipynb) shows how to run RxnScribe and visualize the prediction.
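To run the model over many diagrams, one minimal approach is to loop over a folder and call `predict_image_file` on each file. In the sketch below, the function name `predict_directory` and the folder name in the usage comment are hypothetical; keyword arguments such as `molscribe` and `ocr` are passed through to `predict_image_file` unchanged.

```python
from pathlib import Path


def predict_directory(model, image_dir, pattern="*.png", **predict_kwargs):
    """Run model.predict_image_file on every file in image_dir matching pattern.

    Hypothetical helper: returns a dict mapping each file name to the list of
    reactions predicted for that image.
    """
    results = {}
    # sorted() makes the processing order deterministic across platforms
    for path in sorted(Path(image_dir).glob(pattern)):
        results[path.name] = model.predict_image_file(str(path), **predict_kwargs)
    return results


# Usage, reusing `model` from the Quick Start snippet:
# all_predictions = predict_directory(model, "my_diagrams", molscribe=True, ocr=True)
```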
For development or to reproduce the experiments, follow the instructions below.
## Requirements
Install the required packages:
```bash
pip install -r requirements.txt
```
## Data
Download the reaction diagrams from this [link](https://huggingface.co/yujieq/RxnScribe/blob/main/images.zip),
and save them to `data/parse/images/`.
The ground truth files can be found at [`data/parse/splits/`](data/parse/splits/).
We perform five-fold cross-validation in our experiments; the train/dev/test split for each fold is provided there.
This [notebook](notebook/visualize_data.ipynb) shows how to visualize the diagram and the ground truth.
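For readers unfamiliar with the protocol, the sketch below shows one common way to build k-fold train/dev/test splits. It is a generic illustration, not the procedure used to create the files in `data/parse/splits/`; to reproduce the reported results, use the provided splits as-is. The function `k_fold_splits` and its choice of the next fold as the dev set are assumptions made for illustration.

```python
import random


def k_fold_splits(items, k=5, seed=0):
    """Yield (train, dev, test) lists for each of k folds.

    Illustrative scheme only: fold i is the test set, fold (i + 1) % k is the
    dev set, and the remaining k - 2 folds form the training set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal-sized folds
    for i in range(k):
        dev_index = (i + 1) % k
        test = folds[i]
        dev = folds[dev_index]
        train = [x for j, fold in enumerate(folds) if j not in (i, dev_index) for x in fold]
        yield train, dev, test
```

Across the k folds, every item lands in exactly one test set, so the averaged test metrics cover the whole dataset.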
## Train and Evaluate RxnScribe
Run this script to train and evaluate RxnScribe with five-fold cross-validation:
```bash
bash scripts/train_pix2seq_cv.sh
```
Finally, we train RxnScribe on 90% of the dataset and use the remaining 10% as the dev set, with the script below.
The resulting [model checkpoint](https://huggingface.co/yujieq/RxnScribe/blob/main/pix2seq_reaction_full.ckpt) has been released.
```bash
bash scripts/train_pix2seq_full.sh
```