{"id":20356181,"url":"https://github.com/thomas0809/rxnscribe","last_synced_at":"2025-04-12T02:51:11.270Z","repository":{"id":102423997,"uuid":"605413624","full_name":"thomas0809/RxnScribe","owner":"thomas0809","description":"A Sequence Generation Model for Reaction Diagram Parsing","archived":false,"fork":false,"pushed_at":"2023-09-18T20:15:03.000Z","size":44092,"stargazers_count":69,"open_issues_count":2,"forks_count":24,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-25T22:35:32.534Z","etag":null,"topics":["chemistry","deep-learning","reaction"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thomas0809.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-02-23T05:09:33.000Z","updated_at":"2025-03-25T09:40:55.000Z","dependencies_parsed_at":"2023-09-23T23:56:29.653Z","dependency_job_id":null,"html_url":"https://github.com/thomas0809/RxnScribe","commit_stats":{"total_commits":33,"total_committers":1,"mean_commits":33.0,"dds":0.0,"last_synced_commit":"ad6b1c75d40e563e68deca0491918885948d69c7"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas0809%2FRxnScribe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas0809%2FRxnScribe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas0809%2FRxnScribe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas0809%2FRxnScribe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thomas0809","download_url":"https://codeload.github.com/thomas0809/RxnScribe/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248509255,"owners_count":21115970,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chemistry","deep-learning","reaction"],"created_at":"2024-11-14T23:15:26.677Z","updated_at":"2025-04-12T02:51:11.252Z","avatar_url":"https://github.com/thomas0809.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RxnScribe\n\nThis is the repository for RxnScribe, a sequence generation model for reaction diagram parsing.\nTry our [demo](https://huggingface.co/spaces/yujieq/RxnScribe) on Hugging Face!\n\n![](assets/model.png)\n\nIf you use RxnScribe in your research, please cite our [paper](https://pubs.acs.org/doi/10.1021/acs.jcim.3c00439).\n```\n@article{\n    RxnScribe,\n    author = {Qian, Yujie and Guo, Jiang and Tu, Zhengkai and Coley, Connor W. 
and Barzilay, Regina},\n    title = {RxnScribe: A Sequence Generation Model for Reaction Diagram Parsing},\n    journal = {Journal of Chemical Information and Modeling},\n    doi = {10.1021/acs.jcim.3c00439}\n}\n```\n\nMolecule structure recognition is supported by MolScribe\n([paper](https://pubs.acs.org/doi/10.1021/acs.jcim.2c01480),\n[code](https://github.com/thomas0809/MolScribe),\n[demo](https://huggingface.co/spaces/yujieq/MolScribe)).\n\n## Quick Start\nRun the following commands to install the package and its dependencies:\n```\ngit clone git@github.com:thomas0809/RxnScribe.git\ncd RxnScribe\npython setup.py install\n```\n\nDownload the checkpoint and use RxnScribe to extract reactions from a diagram:\n```python\nimport torch\nfrom rxnscribe import RxnScribe\nfrom huggingface_hub import hf_hub_download\n\nckpt_path = hf_hub_download(\"yujieq/RxnScribe\", \"pix2seq_reaction_full.ckpt\")\nmodel = RxnScribe(ckpt_path, device=torch.device('cpu'))\n\nimage_file = \"assets/jacs.5b12989-Table-c3.png\"\npredictions = model.predict_image_file(image_file, molscribe=True, ocr=True)\n```\nThe predictions have the following format:\n```python\n[\n    {  # First reaction\n        'reactants': [\n            {\n                'category': '[Mol]', 'category_id': 1, 'bbox': (0.1550, 0.0246, 0.2851, 0.2614),\n                'smiles': '*OC(=O)c1ccccc1C#Cc1ccccc1', 'molfile': '(omitted)'\n            },\n            # ... more reactants\n        ],\n        'conditions': [\n            {\n                'category': '[Txt]', 'category_id': 2, 'bbox': (0.2941, 0.0641, 0.3811, 0.1450),\n                'text': ['CIBcat', '(1.4 equiv)']\n            },\n            # ... more conditions\n        ],\n        'products': [\n            # ...\n        ]\n    },\n    # More reactions\n]\n```\nWe provide a function to visualize the predictions:\n```python\nvisualize_images = model.draw_predictions(predictions, image_file=image_file)\n```\nEach predicted reaction is visualized in a separate image, where\n\u003cb style=\"color:red\"\u003ered boxes are \u003ci\u003e\u003cu style=\"color:red\"\u003ereactants\u003c/u\u003e\u003c/i\u003e,\u003c/b\u003e\n\u003cb style=\"color:green\"\u003egreen boxes are \u003ci\u003e\u003cu style=\"color:green\"\u003ereaction conditions\u003c/u\u003e\u003c/i\u003e,\u003c/b\u003e and\n\u003cb style=\"color:blue\"\u003eblue boxes are \u003ci\u003e\u003cu style=\"color:blue\"\u003eproducts\u003c/u\u003e\u003c/i\u003e.\u003c/b\u003e\n\n\u003cimg src=\"assets/output/output0.png\" width=\"384\"/\u003e \u003cimg src=\"assets/output/output1.png\" width=\"384\"/\u003e\n\nThis [notebook](notebook/predict.ipynb) shows how to run RxnScribe and visualize the predictions.\n\nFor development or to reproduce the experiments, follow the instructions below.\n\n## Requirements\nInstall the required packages:\n```\npip install -r requirements.txt\n```\n\n## Data\nDownload the reaction diagrams from this [link](https://huggingface.co/yujieq/RxnScribe/blob/main/images.zip)\nand save them to `data/parse/images/`.\n\nThe ground-truth files are in [`data/parse/splits/`](data/parse/splits/).\n\nWe perform five-fold cross-validation in our experiments. 
The train/dev/test split for each fold is provided as well.\n\nThis [notebook](notebook/visualize_data.ipynb) shows how to visualize the diagrams and their ground truth.\n\n## Train and Evaluate RxnScribe\nRun this script to train and evaluate RxnScribe with five-fold cross-validation:\n```bash\nbash scripts/train_pix2seq_cv.sh\n```\nFinally, we train RxnScribe on 90% of the dataset and use the remaining 10% as the dev set:\n```bash\nbash scripts/train_pix2seq_full.sh\n```\nWe have released the resulting [model checkpoint](https://huggingface.co/yujieq/RxnScribe/blob/main/pix2seq_reaction_full.ckpt).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomas0809%2Frxnscribe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthomas0809%2Frxnscribe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomas0809%2Frxnscribe/lists"}