{"id":20356186,"url":"https://github.com/thomas0809/textreact","last_synced_at":"2025-04-12T02:52:00.867Z","repository":{"id":211650231,"uuid":"605723489","full_name":"thomas0809/textreact","owner":"thomas0809","description":"Predictive Chemistry Augmented with Text Retrieval","archived":false,"fork":false,"pushed_at":"2024-02-20T19:34:44.000Z","size":11746,"stargazers_count":21,"open_issues_count":1,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T22:36:11.991Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thomas0809.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-02-23T19:08:01.000Z","updated_at":"2025-03-10T15:44:50.000Z","dependencies_parsed_at":"2024-02-20T20:47:18.217Z","dependency_job_id":null,"html_url":"https://github.com/thomas0809/textreact","commit_stats":null,"previous_names":["thomas0809/textreact"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas0809%2Ftextreact","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas0809%2Ftextreact/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas0809%2Ftextreact/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas0809%2Ftextreact/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thomas0809","download_url":"https://codeload.github.com/thomas0809/textreact/tar.gz/refs/heads/m
ain","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248509508,"owners_count":21116039,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T23:15:27.048Z","updated_at":"2025-04-12T02:52:00.843Z","avatar_url":"https://github.com/thomas0809.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TextReact\n\nThis repository contains the code for [TextReact](https://aclanthology.org/2023.emnlp-main.784/), a novel method that directly augments \npredictive chemistry with text retrieval.\n\n![](assets/textreact.png)\n\n```\n@inproceedings{TextReact,\n  author       = {Yujie Qian and\n                  Zhening Li and\n                  Zhengkai Tu and\n                  Connor W. Coley and\n                  Regina Barzilay},\n  title        = {Predictive Chemistry Augmented with Text Retrieval},\n  booktitle    = {Proceedings of the 2023 Conference on Empirical Methods in Natural\n                  Language Processing, {EMNLP} 2023, Singapore, December 6-10, 2023},\n  pages        = {12731--12745},\n  publisher    = {Association for Computational Linguistics},\n  year         = {2023},\n  url          = {https://aclanthology.org/2023.emnlp-main.784}\n}\n```\n\n## Requirements\nWe implement the code with `torch==1.11.0`, `pytorch-lightning==2.0.0`, and `transformers==4.27.3`. 
\nTo reproduce our experiments, we recommend creating a conda environment with the same dependencies:\n```bash\nconda env create -f environment.yml -n textreact\npip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113\n```\n\n## Data\n\nRun the following commands to download and unzip the preprocessed datasets:\n```\ngit clone https://huggingface.co/datasets/yujieq/TextReact data\ncd data\nunzip '*'\n```\n\n## Training Scripts\n\nTextReact consists of two modules: a SMILES-to-text retriever and a\ntext-augmented predictor. This repository only contains the code for \ntraining the predictor, while the code for the retriever is available in\na separate repository: https://github.com/thomas0809/tevatron.\n\nThe training scripts are located under [`scripts`](scripts):\n* [`train_RCR.sh`](scripts/train_RCR.sh) trains a model for reaction condition recommendation (RCR)\non the random split of the USPTO dataset.\n* [`train_RetroSyn_tf.sh`](scripts/train_RetroSyn_tf.sh) trains a template-free model for retrosynthesis\non the random split of the USPTO-50K dataset.\n* [`train_RetroSyn_tb.sh`](scripts/train_RetroSyn_tb.sh) trains a template-based model for retrosynthesis\non the random split of the USPTO-50K dataset.\n\nIn addition, [`train_RCR_TS.sh`](scripts/train_RCR_TS.sh), [`train_RetroSyn_tf_TS.sh`](scripts/train_RetroSyn_tf_TS.sh)\nand [`train_RetroSyn_tb_TS.sh`](scripts/train_RetroSyn_tb_TS.sh) train the corresponding models\non the time-based split of the dataset.\n\nIf you're working on a distributed file system, we recommend adding a `--cache_path` option to the script\nthat specifies a local path, to reduce network time.\n\nTo run a script `scripts/train_MODEL.sh`, where `MODEL` stands for one of the script names listed above (e.g. `RCR`), run the following command from the repository root:\n```\nbash scripts/train_MODEL.sh\n```\n\nAt the end of training, two dictionaries are printed with the top-k test accuracies.\nThe first one corresponds to retrieving from the 
full corpus\nand the second one corresponds to retrieving from the gold-removed corpus.\n\nModels and test predictions are stored under the path specified by the `SAVE_PATH` variable in the script.\n* `best.ckpt` is the checkpoint with the highest validation accuracy.\n* `last.ckpt` is the most recent checkpoint.\n* `prediction_test_0.json` contains the test predictions when retrieving from the full corpus.\n* `prediction_test_1.json` contains the test predictions when retrieving from the gold-removed corpus.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomas0809%2Ftextreact","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthomas0809%2Ftextreact","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomas0809%2Ftextreact/lists"}