https://github.com/thomas0809/textreact

Predictive Chemistry Augmented with Text Retrieval
https://github.com/thomas0809/textreact

Last synced: over 1 year ago
JSON representation

Predictive Chemistry Augmented with Text Retrieval

Host: GitHub
URL: https://github.com/thomas0809/textreact
Owner: thomas0809
Created: 2023-02-23T19:08:01.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2024-02-20T19:34:44.000Z (over 2 years ago)
Last Synced: 2025-03-25T22:36:11.991Z (over 1 year ago)
Language: Python
Homepage:
Size: 11.2 MB
Stars: 21
Watchers: 1
Forks: 3
Open Issues: 1
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# TextReact

This repository contains the code for [TextReact](https://aclanthology.org/2023.emnlp-main.784/), a novel method that directly augments
predictive chemistry with text retrieval.

![](assets/textreact.png)

```
@inproceedings{TextReact,
author = {Yujie Qian and
Zhening Li and
Zhengkai Tu and
Connor W. Coley and
Regina Barzilay},
title = {Predictive Chemistry Augmented with Text Retrieval},
booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural
Language Processing, {EMNLP} 2023, Singapore, December 6-10, 2023},
pages = {12731--12745},
publisher = {Association for Computational Linguistics},
year = {2023},
url = {https://aclanthology.org/2023.emnlp-main.784}
}
```

## Requirements
We implement the code with `torch==1.11.0`, `pytorch-lightning==2.0.0`, and `transformers==4.27.3`.
To reproduce our experiments, we recommend creating a conda environment with the same dependencies:
```bash
conda env create -f environment.yml -n textreact
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
```

## Data

Run the following commands to download and unzip the preprocessed datasets:
```
git clone https://huggingface.co/datasets/yujieq/TextReact data
cd data
unzip '*'
```

## Training Scripts

TextReact consists of two modules: SMILES-To-text retriever and
text-augmented predictor. This repository only contains the code for
training the predictor, while the code for the retriever is available in
a separate repository: https://github.com/thomas0809/tevatron.

The training scripts are located under [`scripts`](scripts):
* [`train_RCR.sh`](scripts/train_RCR.sh) trains a model for reaction condition recommendation (RCR)
on the random split of the USPTO dataset.
* [`train_RetroSyn_tf.sh`](scripts/train_RetroSyn_tf.sh) trains a template-free model for retrosynthesis
on the random split of the USPTO-50K dataset.
* [`train_RetroSyn_tb.sh`](scripts/train_RetroSyn_tb.sh) trains a template-based model for retrosynthesis
on the random split of the USPTO-50K dataset.
In addition, [`train_RCR_TS.sh`](scripts/train_RCR_TS.sh), [`train_RetroSyn_tf_TS.sh`](scripts/train_RetroSyn_tf_TS.sh)
and [`train_RetroSyn_tb_TS.sh`](scripts/train_RetroSyn_tb_TS.sh) train the corresponding models
on the time-based split of the dataset.

If you're working on a distributed file system, it is recommended to
add to the script a `--cache_path` option specifying a local path to reduce network time.

To run the script `scripts/train_MODEL.sh`, run the following command at the root of the folder:
```
bash scripts/train_MODEL.sh
```

At the end of training, two dictionaries are printed with the top-k test accuracies.
The first one corresponds to retrieving from the full corpus
and the second one corresponds to retrieving from the gold-removed corpus.

Models and test predictions are stored under the path specified by the `SAVE_PATH` variable in the script.
* `best.ckpt` is the checkpoint with the highest validation accuracy so far, whereas
* `last.ckpt` is the last checkpoint.
* `prediction_test_0.json` contains the test predictions when retrieving from the full corpus.
* `prediction_test_1.json` contains the predictions when retrieving from the gold-removed corpus.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/thomas0809/textreact

Awesome Lists containing this project

README