{"id":22270641,"url":"https://github.com/garywei944/chemflow","last_synced_at":"2025-09-11T05:15:11.087Z","repository":{"id":238736792,"uuid":"793735034","full_name":"garywei944/ChemFlow","owner":"garywei944","description":"Uncover meaningful structures of latent spaces learned by generative models with flows!","archived":false,"fork":false,"pushed_at":"2024-05-10T18:48:14.000Z","size":4977,"stargazers_count":42,"open_issues_count":0,"forks_count":9,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-07-28T13:37:48.242Z","etag":null,"topics":["ai4science","drug-discovery","machine-learning","optimal-transport"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/garywei944.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-29T19:11:56.000Z","updated_at":"2025-05-03T20:05:53.000Z","dependencies_parsed_at":"2024-05-07T20:27:47.627Z","dependency_job_id":"be13b4f6-e34c-46e7-b85e-b4620fcd3fd6","html_url":"https://github.com/garywei944/ChemFlow","commit_stats":null,"previous_names":["garywei944/chemflow"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/garywei944/ChemFlow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/garywei944%2FChemFlow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/garywei944%2FChemFlow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/garywei944%2FChemFlow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/garywei944%2FChemFlow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/garywei944","download_url":"https://codeload.github.com/garywei944/ChemFlow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/garywei944%2FChemFlow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270932581,"owners_count":24670241,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-17T02:00:09.016Z","response_time":129,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai4science","drug-discovery","machine-learning","optimal-transport"],"created_at":"2024-12-03T12:09:04.737Z","updated_at":"2025-08-18T01:38:04.383Z","avatar_url":"https://github.com/garywei944.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ChemFlow: Navigating Chemical Space with Latent Flows\n\nThis repo implements the paper 🔗: [Navigating Chemical Space with Latent Flows](https://arxiv.org/abs/2405.03987) by\nGuanghao Wei*, Yining Huang*, Chenru Duan, Yue Song, and Yuanqi Du.\n\nFlows can uncover meaningful structures of latent spaces learned by generative models!\nWe propose a unifying framework to characterize latent structures by flows/diffusions for optimization and traversal.\n\n![](ChemFlow.png)\n\n## Live Demo\n\nTry our live demo 
[here](https://colab.research.google.com/drive/1QAy_QoEnDRaiLF6kJ6RyhuGx1qCJXYKm?usp=sharing)!\n\n![](Demo.gif)\n\n\n## Quick Start\n\n* Install all dependencies with `conda env create -f environment.yml`.\n    * (Optional) Install [AutoDock-GPU](https://github.com/ccsb-scripps/AutoDock-GPU) for docking binding affinity.\n      See [Notes on Compiling AutoDock-GPU](#notes-on-compiling-autodock-gpu).\n    * (Recommended) Run `mv .env.defaults .env` and specify `PROJECT_PATH` in `.env`. This is later used to run the\n      experiments in the project root directory.\n* [Download data](#download-data--model-checkpoints) and put it in the `data` directory.\n* Train the VAE model by running `python experiments/train_vae.py`.\n    * (Optional) Download our pre-trained VAE model checkpoint;\n      see [Download Data \u0026 Model Checkpoints](#download-data--model-checkpoints).\n* For supervised learning\n    1. Prepare the data by running `python experiments/prepare_random_data.py`.\n    2. Train the supervised surrogate predictor by running `bash experiments/supervised/train_prop_predictor.sh`.\n    3. Train the energy network with supervised semantic guidance by\n       running `bash experiments/supervised/train_wavepde_prop.sh`.\n* For unsupervised learning\n    1. Train the energy network with unsupervised diversity guidance by running `python experiments/train_wavepde.py`.\n    2. Compute the Pearson correlation coefficient by running `python experiments/unsupervised/corr.py`. Refer\n       to [`notebooks/experiments/unsupervised/corr.ipynb`](notebooks/experiments/unsupervised/corr.ipynb) for more\n       details.\n    3. 
Modify [`experiments/utils/traversal_step.py`](experiments/utils/traversal_step.py) in place with the best\n       correlation coefficient index.\n* To reproduce the experimental results from the paper, run the following commands:\n    * `bash experiments/optimization/optimization.sh` for similarity-constrained optimization.\n    * `bash experiments/optimization/uc_optim.sh` for unconstrained optimization.\n    * `python experiments/optimization/optimization_multi.py` for multi-objective optimization.\n    * `bash experiments/success_rate/success_rate.sh` for molecule manipulation tasks.\n\n### Additional Arguments for the scripts\n\nWe use `lightning` ([doc](https://lightning.ai/docs/pytorch/stable/cli/lightning_cli.html))\nand `tap` ([doc](https://github.com/swansonk14/typed-argument-parser)) to parse the arguments.\nThe following example command passes in arguments configured by `lightning`:\n\n```bash\npython experiments/supervised/train_prop_predictor.py \\\n    -e 50 \\\n    --model.optimizer sgd \\\n    --data.n 11000 \\\n    --data.batch_size 100 \\\n    --data.binding_affinity true \\\n    --data.prop 1err\n```\n\n## Download Data \u0026 Model Checkpoints\n\nWe extracted 4,253,577 molecules from three commonly used datasets for drug discovery,\nincluding [MOSES](https://github.com/molecularsets/moses), [ZINC250K](https://zinc.docking.org/) ([download](https://www.kaggle.com/datasets/basu369victor/zinc250k/data)),\nand [ChEMBL](https://www.ebi.ac.uk/chembl/).\n\n* The processed dataset and VAE model checkpoints are available\n  at [Google Drive](https://drive.google.com/drive/folders/1_FykJJNq0Qun7_e8-hlg2zvfkNkWJhe9?usp=sharing).\n    * For the data processing steps, refer to [`notebooks/datasets.ipynb`](notebooks/datasets.ipynb).\n\n## Notes on Compiling [AutoDock-GPU](https://github.com/ccsb-scripps/AutoDock-GPU)\n\nThe conda version of `AutoDock-GPU` is not compatible with RTX 3080 \u0026 3090.\nSo don't use `environment.yml` to install 
`AutoDock-GPU`.\nMake sure to follow this [issue](https://github.com/ccsb-scripps/AutoDock-GPU/issues/172#issuecomment-1010263229) to\ncompile the source code.\nA good reference for the SM code\ncan be found [here](https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/).\n\nThe following commands may be useful:\n\n```bash\nexport GPU_INCLUDE_PATH=/usr/local/cuda/include\nexport GPU_LIBRARY_PATH=/usr/local/cuda/lib64\n\nmake DEVICE=CUDA NUMWI=128 TARGETS=86\n```\n\nTo test whether the compilation was successful, run the following commands:\n\n```bash\nobabel -:\"CCN(CCCCl)OC1=CC2=C(Cl)C1C3=C2CCCO3\" -O demo.pdbqt -p 7.4 --partialcharge gasteiger --gen3d\nautodock_gpu_128wi -M data/raw/1err/1err.maps.fld -L demo.pdbqt -s 0 -N demo\n```\n\n## Cite Us\n```bibtex\n@misc{wei2024navigating,\n      title={Navigating Chemical Space with Latent Flows}, \n      author={Guanghao Wei and Yining Huang and Chenru Duan and Yue Song and Yuanqi Du},\n      year={2024},\n      eprint={2405.03987},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgarywei944%2Fchemflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgarywei944%2Fchemflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgarywei944%2Fchemflow/lists"}