{"id":15140781,"url":"https://github.com/zjunlp/molgen","last_synced_at":"2025-04-05T03:07:21.952Z","repository":{"id":107436020,"uuid":"584983454","full_name":"zjunlp/MolGen","owner":"zjunlp","description":"[ICLR 2024] Domain-Agnostic Molecular Generation with Chemical Feedback","archived":false,"fork":false,"pushed_at":"2024-12-17T15:33:14.000Z","size":17242,"stargazers_count":155,"open_issues_count":4,"forks_count":13,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-05T03:06:45.513Z","etag":null,"topics":["generation","huggingface","iclr2024","language-model","molecular-generation","molecular-optimization","molecule","molgen","multitask","pre-trained-language-models","pre-trained-model","pre-training","pytorch","selfies","targeted-molecular-generation"],"latest_commit_sha":null,"homepage":"https://huggingface.co/spaces/zjunlp/MolGen","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjunlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-04T02:38:28.000Z","updated_at":"2025-03-31T09:53:40.000Z","dependencies_parsed_at":null,"dependency_job_id":"80e17b5c-e5da-4735-a24c-fd3467b57a5c","html_url":"https://github.com/zjunlp/MolGen","commit_stats":{"total_commits":81,"total_committers":3,"mean_commits":27.0,"dds":"0.13580246913580252","last_synced_commit":"ad16142ad3b6f162126cae62f0bddbd314300dc8"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FMolGen","tags_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FMolGen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FMolGen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FMolGen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zjunlp","download_url":"https://codeload.github.com/zjunlp/MolGen/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247280264,"owners_count":20912967,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["generation","huggingface","iclr2024","language-model","molecular-generation","molecular-optimization","molecule","molgen","multitask","pre-trained-language-models","pre-trained-model","pre-training","pytorch","selfies","targeted-molecular-generation"],"created_at":"2024-09-26T08:41:13.504Z","updated_at":"2025-04-05T03:07:21.931Z","avatar_url":"https://github.com/zjunlp.png","language":"Python","readme":"\n\n\n\u003ch1 align=\"center\"\u003e  ⚗️ MolGen  \u003c/h1\u003e\n\u003ch3 align=\"center\"\u003e Domain-Agnostic Molecular Generation with Chemical Feedback \u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\n  📃 \u003ca href=\"https://arxiv.org/abs/2301.11259\" target=\"_blank\"\u003ePaper\u003c/a\u003e • 🤗 \u003ca href=\"https://huggingface.co/zjunlp/MolGen-large\" target=\"_blank\"\u003eModel\u003c/a\u003e  • 🔬 \u003ca href=\"https://huggingface.co/spaces/zjunlp/MolGen\" target=\"_blank\"\u003eSpace\u003c/a\u003e 
\u003cbr\u003e\n\u003c/p\u003e\n\n[![Pytorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?logo=PyTorch\u0026logoColor=white)](https://pytorch.org/)\n![](https://img.shields.io/badge/version-1.0.1-blue)\n[![license](https://img.shields.io/github/license/mashape/apistatus.svg?maxAge=2592000)](https://github.com/zjunlp/MolGen/blob/main/LICENSE)\n\n\n\u003cdiv align=center\u003e\u003cimg src=\"molgen.png\" width=\"100%\" height=\"100%\" /\u003e\u003c/div\u003e\n\n# 🔔 News \n\n- **`2024-2` We've released [ChatCell](https://huggingface.co/papers/2402.08303), a new paradigm that leverages natural language to make single-cell analysis more accessible and intuitive. Please visit our [homepage](https://www.zjukg.org/project/ChatCell) and [GitHub page](https://github.com/zjunlp/ChatCell) for more information.**\n- **`2024-1` Our paper [Domain-Agnostic Molecular Generation with Chemical Feedback](https://github.com/zjunlp/MolGen) has been accepted by ICLR 2024.**\n- **`2024-1` Our paper [Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models](https://github.com/zjunlp/Mol-Instructions) has been accepted by ICLR 2024.**\n- **`2023-10` We open-source [MolGen-7b](https://huggingface.co/zjunlp/MolGen-7b), which now supports de novo molecule generation!** \n- **`2023-6` We open-source [KnowLM](https://github.com/zjunlp/KnowLM), a knowledgeable LLM framework with pre-training and instruction fine-tuning code (supports multi-machine multi-GPU setup).**\n- **`2023-6` We release [Mol-Instructions](https://github.com/zjunlp/Mol-Instructions), a large-scale biomolecule instruction dataset for large language models.**\n- **`2023-5` We propose [Knowledge graph-enhanced molecular contrAstive learning with fuNctional prOmpt (KANO)](https://github.com/HICAI-ZJU/KANO) in `Nature Machine Intelligence`, exploiting fundamental domain knowledge in both pre-training and fine-tuning.**\n- **`2023-4` We provide an NLP-for-science paper list at 
[https://github.com/zjunlp/NLP4Science_Papers](https://github.com/zjunlp/NLP4Science_Papers).**\n- **`2023-3` We release our pre-trained and fine-tuned models on 🤗 **Hugging Face** at [MolGen-large](https://huggingface.co/zjunlp/MolGen-large) and [MolGen-large-opt](https://huggingface.co/zjunlp/MolGen-large-opt).**\n- **`2023-2` We provide a demo on 🤗 **Hugging Face** at [Space](https://huggingface.co/spaces/zjunlp/MolGen).**\n\n\n\n# 📕 Requirements\n\nTo run the code, you can configure dependencies by restoring our environment:\n```\nconda env create -f environment.yaml\n```\n\nand then:\n\n```\nconda activate my_env\n```\n\n# 📚 Resource Download\n\nYou can download the pre-trained and fine-tuned models via Hugging Face: [MolGen-large](https://huggingface.co/zjunlp/MolGen-large) and [MolGen-large-opt](https://huggingface.co/zjunlp/MolGen-large-opt).\n\nYou can also download the models using the following link: https://drive.google.com/drive/folders/1Eelk_RX1I26qLa9c4SZq6Tv-AAbDXgrW?usp=sharing\n\nMoreover, the datasets used for the downstream tasks can be found [here](https://github.com/zjunlp/MolGen/tree/main/moldata/finetune).\n\nThe expected file structure is:\n\n```\nmoldata\n├── checkpoint \n│   ├── molgen.pkl              # pre-trained model\n│   ├── syn_qed_model.pkl       # fine-tuned model for QED optimization on synthetic data\n│   ├── syn_plogp_model.pkl     # fine-tuned model for p-logP optimization on synthetic data\n│   ├── np_qed_model.pkl        # fine-tuned model for QED optimization on natural product data\n│   ├── np_plogp_model.pkl      # fine-tuned model for p-logP optimization on natural product data\n├── finetune\n│   ├── np_test.csv             # natural product test data\n│   ├── np_train.csv            # natural product train data\n│   ├── plogp_test.csv          # synthetic test data for p-logP optimization\n│   ├── qed_test.csv            # synthetic test data for QED optimization\n│   └── zinc250k.csv            # synthetic train 
data\n├── generate                    # generate molecules\n├── output                      # molecule candidates\n└── vocab_list\n    └── zinc.npy                # SELFIES alphabet\n``` \n\n# 🚀 How to run\n\n\n+ ## Fine-tune\n\n    - First, preprocess the finetuning dataset by generating candidate molecules using our pre-trained model. The preprocessed data will be stored in the folder ``output``.\n\n    ```shell\n        cd MolGen\n        bash preprocess.sh\n    ```\n\n    - Then utilize the self-feedback paradigm. The fine-tuned model will be stored in the folder ``checkpoint``.\n\n\n    ```shell\n        bash finetune.sh\n    ```\n\n+ ## Generate\n\n    To generate molecules, run this script. Please specify the ``checkpoint_path`` to determine whether to use the pre-trained model or the fine-tuned model.\n\n    ```shell\n    cd MolGen\n    bash generate.sh\n    ```\n\n#  🥽 Experiments\n\nWe conduct experiments on well-known benchmarks to confirm MolGen's optimization capabilities, encompassing penalized logP, QED, and molecular docking properties. 
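The released checkpoints can also be driven programmatically instead of through the shell scripts above. The following is a minimal, unofficial sketch: it assumes the standard 🤗 Transformers seq2seq API, with [MolGen-large](https://huggingface.co/zjunlp/MolGen-large) loaded as a BART-style model operating on SELFIES strings; the seed molecule (benzene in SELFIES) and the sampling parameters are illustrative only.\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n\n# Load the pre-trained checkpoint from the Hugging Face Hub\ntokenizer = AutoTokenizer.from_pretrained(\"zjunlp/MolGen-large\")\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"zjunlp/MolGen-large\")\n\n# Encode a seed molecule as SELFIES and sample candidate molecules\ninputs = tokenizer(\"[C][=C][C][=C][C][=C][Ring1][=Branch1]\", return_tensors=\"pt\")\noutputs = model.generate(**inputs, max_length=64, do_sample=True, num_return_sequences=4)\nprint(tokenizer.batch_decode(outputs, skip_special_tokens=True))\n```\n\nBecause the model generates SELFIES rather than SMILES, every decoded candidate corresponds to a syntactically valid molecule by construction.\n\n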
For detailed experimental settings and analysis, please refer to our [paper](https://arxiv.org/abs/2301.11259).\n\n+ ## MolGen captures real-world molecular distributions\n\n\u003cimg width=\"950\" alt=\"image\" src=\"https://github.com/zjunlp/MolGen/assets/61076726/c32bf106-d43c-4d1d-af48-8caed03305bc\"\u003e\n\n\n+ ## MolGen mitigates molecular hallucinations\n### Targeted molecule discovery\n\n\u003cimg width=\"480\" alt=\"image\" src=\"https://github.com/zjunlp/MolGen/assets/61076726/51533e08-e465-44c8-9e78-858775b59b4f\"\u003e\n\n\u003cimg width=\"595\" alt=\"image\" src=\"https://github.com/zjunlp/MolGen/assets/61076726/6f17a630-88e4-46f6-9cb1-9c3637a264fc\"\u003e\n\n\u003cimg width=\"376\" alt=\"image\" src=\"https://github.com/zjunlp/MolGen/assets/61076726/4b934314-5f23-4046-a771-60cdfe9b572d\"\u003e\n\n### Constrained molecular optimization\n\u003cimg width=\"350\" alt=\"image\" src=\"https://github.com/zjunlp/MolGen/assets/61076726/bca038cc-637a-41fd-9b53-48ac67c4f182\"\u003e\n\n\n# Citation\n\nIf you use or extend our work, please cite the paper as follows:\n\n```bibtex\n@inproceedings{fang2023domain,\n  author       = {Yin Fang and\n                  Ningyu Zhang and\n                  Zhuo Chen and\n                  Xiaohui Fan and\n                  Huajun Chen},\n  title        = {Domain-Agnostic Molecular Generation with Chemical Feedback},\n  booktitle    = {{ICLR}},\n  publisher    = {OpenReview.net},\n  year         = {2024},\n  url          = {https://openreview.net/pdf?id=9rPyHyjfwP}\n}\n```\n\n![Star History Chart](https://api.star-history.com/svg?repos=zjunlp/MolGen\u0026type=Date)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fmolgen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjunlp%2Fmolgen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fmolgen/lists"}