{"id":15140777,"url":"https://github.com/graph-0/jodo","last_synced_at":"2025-10-23T17:31:52.825Z","repository":{"id":165817400,"uuid":"639278099","full_name":"GRAPH-0/JODO","owner":"GRAPH-0","description":"Learning Joint 2D \u0026 3D Diffusion Models for Complete Molecule Generation","archived":false,"fork":false,"pushed_at":"2023-11-11T03:02:40.000Z","size":22318,"stargazers_count":42,"open_issues_count":1,"forks_count":8,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-15T22:35:13.859Z","etag":null,"topics":["diffusion-models","graph-neural-networks","molecule"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GRAPH-0.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-05-11T06:26:08.000Z","updated_at":"2025-01-15T21:00:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"5fe8f2b7-e7e2-4ad5-8023-f11735132bd1","html_url":"https://github.com/GRAPH-0/JODO","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GRAPH-0%2FJODO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GRAPH-0%2FJODO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GRAPH-0%2FJODO/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GRAPH-0%2FJODO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GRAPH-0","download_url":"https://codeload.github.com/GRAPH-0/JODO/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"ht
tps://github.com","kind":"github","repositories_count":237869064,"owners_count":19379259,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-models","graph-neural-networks","molecule"],"created_at":"2024-09-26T08:41:07.176Z","updated_at":"2025-10-23T17:31:42.813Z","avatar_url":"https://github.com/GRAPH-0.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# JODO\n\n----\n\nThe implementation of [Learning Joint 2D \u0026 3D Diffusion Models for\nComplete Molecule Generation](https://arxiv.org/abs/2305.12347).\n\nMolecules are represented as a 3D point cloud and a 2D bonding graph:\n\u003cp align=\"left\"\u003e\n  \u003cimg src=\"assets/exp_geom_ver.png\" width=\"800\"/\u003e \n\u003c/p\u003e\n\nThe generative diffusion process:\n\u003cp align=\"left\"\u003e\n  \u003cimg src=\"assets/sampling_exp.png\" width=\"1200\"/\u003e\n\u003c/p\u003e\n\n----\n\nVisualization of molecules generated by JODO trained on the GEOM-Drugs dataset:\n\u003cp align=\"left\"\u003e\n  \u003cimg src=\"assets/geom_vis_3_3d.png\" width=\"1500\"/\u003e\n\u003c/p\u003e\n\u003cp align=\"left\"\u003e\n  \u003cimg src=\"assets/geom_vis_3_2d.png\" width=\"1500\"/\u003e\n\u003c/p\u003e\n\u003cp align=\"left\"\u003e\n  \u003cimg src=\"assets/geom_vis_2_3d.png\" width=\"1500\"/\u003e\n\u003c/p\u003e\n\u003cp align=\"left\"\u003e\n  \u003cimg src=\"assets/geom_vis_2_2d.png\" width=\"1500\"/\u003e\n\u003c/p\u003e\n\nVisualization of molecules generated by JODO trained on the QM9 dataset with explicit hydrogen atoms:\n\u003cp align=\"left\"\u003e\n  \u003cimg 
src=\"assets/qm9_vis_5_3d.png\" width=\"1500\"/\u003e\n\u003c/p\u003e\n\u003cp align=\"left\"\u003e\n  \u003cimg src=\"assets/qm9_vis_5_2d.png\" width=\"1500\"/\u003e\n\u003c/p\u003e\n\u003cp align=\"left\"\u003e\n  \u003cimg src=\"assets/qm9_vis_4_3d.png\" width=\"1500\"/\u003e\n\u003c/p\u003e\n\u003cp align=\"left\"\u003e\n  \u003cimg src=\"assets/qm9_vis_4_2d.png\" width=\"1500\"/\u003e\n\u003c/p\u003e\n\n----\n\n## Dependencies\n* [PyTorch \u003e= 1.11](https://pytorch.org/)\n* [PyG 2.1](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html)\n* See requirements.txt for others.\n\n## Dataset\n\nWe recommend using our processed dataset files provided [here](https://zenodo.org/record/7966493).\n\nDownload datasets:\n```bash\n# 718MB\nwget https://zenodo.org/record/7966493/files/data.zip\nunzip data.zip\n```\n\nIf you want to construct the GEOM-Drugs dataset from scratch:\n* The raw GEOM dataset is available [here](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JNGTDF).\n* Download `rdkit_folder.tar.gz` and unpack it.\n* Run `python build_geom_dataset.py --data_dir YOUR_DATA_PATH`.\n\n\n## Generated Molecules  \nWe provide pickles of 10000 molecules generated by JODO on different datasets in `./rdkit_mols`. \nMolecules are saved as RDKit Mol objects. Load the list of molecules for further analysis.\n\n```python\n# Example for loading molecules generated from JODO trained on GEOM-Drugs dataset. 
\nimport pickle\n\n# Use a context manager so the file handle is closed after loading.\nwith open('rdkit_mols/geom_jodo_ancestral_ckpt_35.pkl', 'rb') as f:\n    mol_list = pickle.load(f)\n```\n\n## Evaluation\nWe construct a comprehensive evaluation pipeline for molecule generation, including 2D molecular graph metrics, \n3D geometry metrics, and substructure geometry alignment metrics.\n* For the 3D geometry metrics, we follow https://github.com/ehoogeboom/e3_diffusion_for_molecules and use a distance\nlookup table to predict bonds, reporting the same stability metrics for 3D geometry comparisons.\n* \u003cb\u003eHowever, stability metrics for 3D geometry can be misleading in some situations\u003c/b\u003e. Some methods achieve a high stability \nratio but perform poorly on FCD and alignment MMD, implying poor molecule generation quality. \nThis phenomenon is more pronounced on the GEOM-Drugs dataset because of its more atypical interatomic distances.\n* We recommend using the stability metric cautiously, preferably in combination with other metrics, to evaluate \nmolecular quality.\n\nTo evaluate your models with our pipeline, save your generated molecules as a list of RDKit Mol \nobjects and run `eval_rdkit_pkl.py`.\n\nTake QM9 as an example:\n```shell\n# Molecules with 3D positions and atom types, without bonds\npython eval_rdkit_pkl.py --dataset_name qm9 --type 3D --root_path YOUR_DATASET_PATH --pkl_path YOUR_MOL_PATH\n\n# Molecules with atom and bond types, without 3D positions\npython eval_rdkit_pkl.py --dataset_name qm9 --type 2D --root_path YOUR_DATASET_PATH --pkl_path YOUR_MOL_PATH\n\n# Molecules with atom types, bond types and 3D positions\npython eval_rdkit_pkl.py --dataset_name qm9 --type both --sub_geometry=True --root_path YOUR_DATASET_PATH --pkl_path YOUR_MOL_PATH\n```\n\n## Checkpoint\n\nOur checkpoints are provided [here](https://zenodo.org/record/8002902).\n\nDownload checkpoints:\n```bash\n# Unconditional Generation: QM9, GEOM-Drugs (2.8GB)\nwget https://zenodo.org/record/8002902/files/exp_uncond.zip\nunzip exp_uncond.zip\n\n# 
Conditional Generation: single quantum property on QM9 (3.1GB)\nwget https://zenodo.org/record/8002902/files/exp_cond.zip \nunzip exp_cond.zip\n\n# Conditional Generation: multiple properties (1.6GB)\nwget https://zenodo.org/record/8002902/files/exp_cond_multi.zip \nunzip exp_cond_multi.zip\n\n# Molecular Graph Generation: ZINC250k, MOSES (3.9GB)\nwget https://zenodo.org/record/8002902/files/exp_2d.zip \nunzip exp_2d.zip\n```\n\n## Unconditional Generation\n\nQM9 Training Example:\n```shell\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_uncond_jodo.py --mode train --workdir exp_uncond/vpsde_qm9_jodo\n```\n* Set the GPU ids with `CUDA_VISIBLE_DEVICES`; multiple GPUs are supported.\n\nQM9 Sampling Example:\n```shell\n# sample from our pretrained checkpoint\nCUDA_VISIBLE_DEVICES=2 python main.py --config configs/vpsde_qm9_uncond_jodo.py --mode eval --workdir exp_uncond/vpsde_qm9_jodo --config.eval.ckpts '30' --config.eval.batch_size 2500 --config.sampling.steps 1000\n```\n* Set `--config.eval.batch_size` to control GPU memory usage.\n* Set iteration steps via `--config.sampling.steps`. 
(Great results can be obtained from 1000 steps to 50 steps)\n\nGEOM-Drugs Training Example:\n```shell\n# Base\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_geom_uncond_jodo.py --mode train --workdir exp_uncond/vpsde_geom_jodo_base --config.model.n_layers 6 --config.model.nf 128\n\n# Medium\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_geom_uncond_jodo.py --mode train --workdir exp_uncond/vpsde_geom_jodo_media\n\n# Large\nCUDA_VISIBLE_DEVICES=0,1 python main.py --config configs/vpsde_geom_uncond_jodo.py --mode train --workdir exp_uncond/vpsde_geom_jodo_large --config.model.nf 384 --config.training.n_iters 1500000\n```\n\nGEOM-Drugs Sampling Example:\n```shell\n# Base\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_geom_uncond_jodo.py --mode eval --workdir exp_uncond/vpsde_geom_jodo_base --config.model.n_layers 6 --config.model.nf 128 --config.eval.ckpts '30' --config.eval.batch_size 800 --config.sampling.steps 1000\n\n# Medium\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_geom_uncond_jodo.py --mode eval --workdir exp_uncond/vpsde_geom_jodo_media --config.eval.ckpts '30' --config.eval.batch_size 1000 --config.sampling.steps 1000\n\n# Large\nCUDA_VISIBLE_DEVICES=0,1 python main.py --config configs/vpsde_geom_uncond_jodo.py --mode eval --workdir exp_uncond/vpsde_geom_jodo_large --config.model.nf 384 --config.eval.ckpts '30' --config.eval.batch_size 500 --config.sampling.steps 1000\n```\n\nUsing the simplified DGT without extra attention heads can also achieve relatively good performance:\n```shell\n# QM9 Training\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_uncond_jodo.py --mode train --workdir exp_uncond/vpsde_qm9_jodo_sim --config.model.name DGT_concat_sim\n\n# GEOM-Drugs Medium Training\nCUDA_VISIBLE_DEVICES=2,3 python main.py --config configs/vpsde_geom_uncond_jodo.py --mode train --workdir exp_uncond/vpsde_geom_jodo_media_sim --config.model.name DGT_concat_sim\n```\n\n## Conditional 
Generation\n\n```shell\n# Training\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_jodo.py --mode train --workdir exp_cond/vpsde_qm9_cond_jodo_gap --config.cond_property gap\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_jodo.py --mode train --workdir exp_cond/vpsde_qm9_cond_jodo_homo --config.cond_property homo\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_jodo.py --mode train --workdir exp_cond/vpsde_qm9_cond_jodo_lumo --config.cond_property lumo\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_jodo.py --mode train --workdir exp_cond/vpsde_qm9_cond_jodo_mu --config.cond_property mu\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_jodo.py --mode train --workdir exp_cond/vpsde_qm9_cond_jodo_Cv --config.cond_property Cv\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_jodo.py --mode train --workdir exp_cond/vpsde_qm9_cond_jodo_alpha --config.cond_property alpha\n\n# Sampling\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_jodo.py --mode eval --workdir exp_cond/vpsde_qm9_cond_jodo_gap --config.cond_property gap --config.eval.ckpts '40'\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_jodo.py --mode eval --workdir exp_cond/vpsde_qm9_cond_jodo_homo --config.cond_property homo --config.eval.ckpts '40'\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_jodo.py --mode eval --workdir exp_cond/vpsde_qm9_cond_jodo_lumo --config.cond_property lumo --config.eval.ckpts '40'\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_jodo.py --mode eval --workdir exp_cond/vpsde_qm9_cond_jodo_mu --config.cond_property mu --config.eval.ckpts '40'\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_jodo.py --mode eval --workdir exp_cond/vpsde_qm9_cond_jodo_Cv --config.cond_property Cv --config.eval.ckpts '40'\nCUDA_VISIBLE_DEVICES=0 python main.py 
--config configs/vpsde_qm9_cond_jodo.py --mode eval --workdir exp_cond/vpsde_qm9_cond_jodo_alpha --config.cond_property alpha --config.eval.ckpts '40'\n```\n\n* Set conditional property `alpha, gap, homo, lumo, mu, Cv` by `--config.cond_property`.\n\n```shell\n# Training\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_multi_jodo.py --mode train --workdir exp_cond_multi/vpsde_qm9_cond_jodo_Cv_mu --config.cond_property1 Cv --config.cond_property2 mu\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_multi_jodo.py --mode train --workdir exp_cond_multi/vpsde_qm9_cond_jodo_gap_mu --config.cond_property1 gap --config.cond_property2 mu\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_multi_jodo.py --mode train --workdir exp_cond_multi/vpsde_qm9_cond_jodo_alpha_mu --config.cond_property1 alpha --config.cond_property2 mu\n\n# Sampling\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_multi_jodo.py --mode eval --workdir exp_cond_multi/vpsde_qm9_cond_jodo_Cv_mu --config.cond_property1 Cv --config.cond_property2 mu --config.eval.ckpts '50'\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_multi_jodo.py --mode eval --workdir exp_cond_multi/vpsde_qm9_cond_jodo_gap_mu --config.cond_property1 gap --config.cond_property2 mu --config.eval.ckpts '50'\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_cond_multi_jodo.py --mode eval --workdir exp_cond_multi/vpsde_qm9_cond_jodo_alpha_mu --config.cond_property1 alpha --config.cond_property2 mu --config.eval.ckpts '50'\n```\n* Set multi conditional properties via `--config.cond_property1` and `--config.cond_property2`.\n\n\n## Molecular Graph Generation\n\nZINC250k:\n```shell\n# Training\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_zinc_2d_jodo.py --mode train --workdir exp_2d/vpsde_zinc_2d_jodo --config.model.nf 1024 --config.model.n_heads 64 --config.model.n_layers 6 --config.training.snapshot_freq 
300000\n\n# Sampling\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_zinc_2d_jodo.py --mode eval --workdir exp_2d/vpsde_zinc_2d_jodo --config.model.nf 1024 --config.model.n_heads 64 --config.model.n_layers 6 --config.training.snapshot_freq 300000 --config.eval.ckpts '5'\n```\n* You can train a smaller model by `--config.model.nf 256 --config.model.n_heads 16 --config.model.n_layers 8`.\n\nMOSES:\n```shell\n# Training\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_moses_2d_jodo.py --mode train --workdir exp_2d/vpsde_moses_2d_jodo --config.model.nf 1024 --config.model.n_heads 64 --config.model.n_layers 6 --config.training.snapshot_freq 300000\n\n# Sampling\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_moses_2d_jodo.py --mode eval --workdir exp_2d/vpsde_moses_2d_jodo --config.model.nf 1024 --config.model.n_heads 64 --config.model.n_layers 6 --config.training.snapshot_freq 300000 --config.eval.ckpts '4'\n```\n\nTraining CDGS on QM9 and GEOM-Drugs:\n```shell\n# QM9\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_qm9_2d_cdgs.py --mode train --workdir exp_2d/vpsde_qm9_2d_cdgs\n\n# GEOM-Drugs\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/vpsde_geom_2d_cdgs.py --mode train --workdir exp_2d/vpsde_geom_2d_cdgs\n```\n\n## Citation\n\n```bibtex\n@article{huang2023learning,\n  title={Learning Joint 2D \\\u0026 3D Diffusion Models for Complete Molecule Generation},\n  author={Huang, Han and Sun, Leilei and Du, Bowen and Lv, Weifeng},\n  journal={arXiv preprint arXiv:2305.12347},\n  year={2023}\n}\n\n@article{huang2023conditional,\n  title={Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation},\n  author={Huang, Han and Sun, Leilei and Du, Bowen and Lv, Weifeng},\n  journal={arXiv preprint arXiv:2301.00427},\n  
year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraph-0%2Fjodo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgraph-0%2Fjodo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraph-0%2Fjodo/lists"}