{"id":13752516,"url":"https://github.com/chao1224/Geom3D","last_synced_at":"2025-05-09T19:32:21.621Z","repository":{"id":173437016,"uuid":"650731419","full_name":"chao1224/Geom3D","owner":"chao1224","description":"Geom3D: Geometric Modeling on 3D Structures, NeurIPS 2023","archived":false,"fork":false,"pushed_at":"2024-06-05T03:18:58.000Z","size":891,"stargazers_count":118,"open_issues_count":4,"forks_count":13,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-28T22:46:28.601Z","etag":null,"topics":["3d","3d-structures","ai4science","biology","chemistry","crystals","drugs","equivariance","geometry","group","invariance","material","molecules","physics","proteins","symmetry"],"latest_commit_sha":null,"homepage":"https://openreview.net/forum?id=ygXSNrIU1p","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chao1224.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-07T17:27:56.000Z","updated_at":"2025-04-20T14:29:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"3dfede80-0c1c-4950-8665-7ab7e2fe6d68","html_url":"https://github.com/chao1224/Geom3D","commit_stats":{"total_commits":9,"total_committers":1,"mean_commits":9.0,"dds":0.0,"last_synced_commit":"e44b11d4959aeb757789a2bbe5080b4ccdb485a1"},"previous_names":["chao1224/geom3d"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chao1224%2FGeom3D","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chao1224%2FGeom3D/tags","release
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chao1224%2FGeom3D/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chao1224%2FGeom3D/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chao1224","download_url":"https://codeload.github.com/chao1224/Geom3D/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253312350,"owners_count":21888626,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d","3d-structures","ai4science","biology","chemistry","crystals","drugs","equivariance","geometry","group","invariance","material","molecules","physics","proteins","symmetry"],"created_at":"2024-08-03T09:01:06.921Z","updated_at":"2025-05-09T19:32:20.818Z","avatar_url":"https://github.com/chao1224.png","language":"Python","funding_links":[],"categories":["Ranked by starred repositories"],"sub_categories":[],"readme":"# Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials\n\nAuthors: Shengchao Liu, Weitao Du, Yanjing Li, Zhuoxinran Li, Zhiling Zheng, Chenru Duan, Zhiming Ma, Omar Yaghi, Anima Anandkumar, Christian Borgs, Jennifer Chayes, Hongyu Guo, Jian Tang\n\n[[ArXiv](https://arxiv.org/abs/2306.09375)]\n\nThis is **Geom3D**, a platform for geometric modeling on 3D structures:\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./figure/pipeline.jpg\" /\u003e \n\u003c/p\u003e\n\n## Environment\n\n### Conda\n\nSet up Anaconda:\n ```bash\nwget 
https://repo.continuum.io/archive/Anaconda3-2019.10-Linux-x86_64.sh\nbash Anaconda3-2019.10-Linux-x86_64.sh -b\nexport PATH=$PWD/anaconda3/bin:$PATH\n ```\n\n### Packages\nStart with some basic packages.\n```bash\nconda create -n Geom3D python=3.7\nconda activate Geom3D\nconda install -y -c rdkit rdkit\nconda install -y numpy networkx scikit-learn\nconda install -y -c conda-forge -c pytorch pytorch=1.9.1\nconda install -y -c pyg -c conda-forge pyg=2.0.2\npip install ogb==1.2.1\n\npip install sympy\n\npip install ase\n\npip install lie_learn # for TFN and SE3-Trans\n\npip install packaging # for SEGNN\npip3 install e3nn # for SEGNN\n\npip install transformers # for SMILES\npip install selfies # for SELFIES\n\npip install atom3d # for Atom3D\npip install cffi # for Atom3D\npip install biopython # for Atom3D\n\npip install cython # for pyximport \n\nconda install -y -c conda-forge py-xgboost-cpu # for XGB\n```\n\n## Datasets\n\nWe cover four types of datasets:\n- Small Molecules\n    - QM9\n    - MD17\n    - rMD17\n    - COLL\n- Proteins\n    - EC\n    - FOLD\n- Small Molecules and Proteins\n    - LBA\n    - LEP\n- Materials\n    - MatBench\n    - QMOF\n\nFor dataset acquisition:\n- We provide the raw and processed datasets on [HuggingFace](https://huggingface.co/datasets/chao1224/Geom3D_data). 
You can download the data by running `python download_data.py` under `./data`.\n- Please refer to the [data](./data) folder for more details.\n\n## Overview of Models\n\n### Representation Models\n\nGeom3D includes the following representation models:\n- [SchNet, NeurIPS'17](https://papers.nips.cc/paper_files/paper/2017/hash/303ed4c69846ab36c2904d3ba8573050-Abstract.html)\n- [TFN, NeurIPS'18 Workshop](https://arxiv.org/abs/1802.08219)\n- [DimeNet, ICLR'20](https://openreview.net/forum?id=B1eWbxStPH)\n- [SE(3)-Trans, NeurIPS'20](https://proceedings.neurips.cc//paper/2020/hash/15231a7ce4ba789d13b722cc5c955834-Abstract.html)\n- [EGNN, ICML'21](http://proceedings.mlr.press/v139/satorras21a.html)\n- [PaiNN, ICML'21](https://arxiv.org/abs/2102.03150)\n- [GemNet, NeurIPS'21](https://proceedings.neurips.cc/paper/2021/hash/35cf8659cfcb13224cbd47863a34fc58-Abstract.html)\n- [SphereNet, ICLR'22](https://openreview.net/forum?id=givsRXsOt9r)\n- [SEGNN, ICLR'22](https://openreview.net/forum?id=_xwr8gOBeV1)\n- [NequIP, Nature Communications'22](https://www.nature.com/articles/s41467-022-29939-5)\n- [Allegro, Nature Communications'23](https://www.nature.com/articles/s41467-023-36329-y)\n- [Equiformer, ICLR'23](https://openreview.net/forum?id=KwmPfARgOTD)\n- [GVP-GNN, ICLR'21](https://openreview.net/forum?id=1YLJDvSx6J4)\n- [IEConv, ICLR'21](https://openreview.net/forum?id=l0mSUROpwY)\n- [GearNet, ICLR'23](https://openreview.net/forum?id=to3qCB3tOh9)\n- [ProNet, ICLR'23](https://openreview.net/forum?id=9X-hgLDLYkQ)\n- [CDConv, ICLR'23](https://openreview.net/forum?id=P5Z-Zl9XJ7)\n\nWe also include the following seven 1D models and eleven 2D models (specifically for small molecules):\n- 1D Fingerprints: MLP, RF, XGB\n- 1D SMILES: CNN, BERT\n- 1D SELFIES: CNN, BERT\n- 2D topology:\n    - [GCN, NeurIPS'15](https://arxiv.org/abs/1509.09292)\n    - [ENN-S2S, ICML'17](https://arxiv.org/abs/1704.01212)\n    - [GraphSAGE, NeurIPS'17](https://arxiv.org/abs/1706.02216)\n    - [GAT, 
ICLR'18](https://openreview.net/forum?id=rJXMpikCZ)\n    - [GIN, ICLR'19](https://openreview.net/forum?id=ryGs6iA5Km)\n    - [D-MPNN, ACS-JCIM'19](https://pubs.acs.org/doi/10.1021/acs.jcim.9b00237)\n    - [N-Gram Graph, NeurIPS'19](https://arxiv.org/abs/1806.09206)\n    - [PNA, NeurIPS'20](https://arxiv.org/abs/2004.05718)\n    - [Graphormer, NeurIPS'21](https://openreview.net/forum?id=OeWooOxFwDa)\n    - [AWARE, TMLR'22](https://openreview.net/forum?id=TWSTyYd2Rl)\n    - [GraphGPS, NeurIPS'22](https://arxiv.org/abs/2205.12454)\n\nNote that no pretraining is involved for the models listed above. For geometric pretraining methods, see the following section.\n\n### Geometric Pretraining\n\nWe include the following 14 geometric pretraining methods:\n\n- Pure 3D:\n    - Supervised\n    - Atom Type Prediction\n    - Distance Prediction\n    - Angle Prediction\n    - 3D InfoGraph, from [GeoSSL, ICLR'23](https://openreview.net/forum?id=CjTHVo1dvR)\n    - GeoSSL-RR, from [GeoSSL, ICLR'23](https://openreview.net/forum?id=CjTHVo1dvR)\n    - GeoSSL-InfoNCE, from [GeoSSL, ICLR'23](https://openreview.net/forum?id=CjTHVo1dvR)\n    - GeoSSL-EBM-NCE, from [GeoSSL, ICLR'23](https://openreview.net/forum?id=CjTHVo1dvR)\n    - [GeoSSL-DDM, ICLR'23](https://openreview.net/forum?id=CjTHVo1dvR)\n    - [GeoSSL-DDM-1L, ICLR'23](https://openreview.net/forum?id=tYIMtogyee)\n    - [3D-EMGP, AAAI'23](https://arxiv.org/abs/2207.08824)\n- Joint 2D-3D:\n    - [GraphMVP, ICLR'22](https://openreview.net/forum?id=xQUe1pOKPam)\n    - [3D InfoMax, ICML'22](https://proceedings.mlr.press/v162/stark22a.html)\n    - [MoleculeSDE, ICML'23](https://arxiv.org/abs/2305.18407)\n\n## Scripts\n\nThe Python scripts are in `examples_3D`, and the corresponding bash scripts (with hyperparameters) are in `scripts`. 
For example, the bash script for SchNet on QM9 is:\n```bash\ncd examples_3D\n\nexport model_3d=SchNet\nexport dataset=QM9\nexport task_list=(mu alpha homo lumo gap r2 zpve u0 u298 h298 g298 cv)\n\nexport lr_list=(5e-4)\nexport lr_scheduler_list=(CosineAnnealingLR)\nexport split=customized_01\nexport seed=42\nexport emb_dim_list=(128 300)\nexport batch_size_list=(128)\n\nexport epochs=1000\n\nfor task in \"${task_list[@]}\"; do\nfor lr in \"${lr_list[@]}\"; do\nfor lr_scheduler in \"${lr_scheduler_list[@]}\"; do\nfor emb_dim in \"${emb_dim_list[@]}\"; do\nfor batch_size in \"${batch_size_list[@]}\"; do\n\n    export output_model_dir=output/random/\"$model_3d\"/\"$dataset\"/\"$task\"_\"$split\"_\"$seed\"/\"$lr\"_\"$lr_scheduler\"_\"$emb_dim\"_\"$batch_size\"_\"$epochs\"\n    export output_file=\"$output_model_dir\"/result.out\n    mkdir -p \"$output_model_dir\"\n\n    python finetune_QM9.py \\\n    --model_3d=\"$model_3d\" --dataset=\"$dataset\" --epochs=\"$epochs\" \\\n    --task=\"$task\" \\\n    --split=\"$split\" --seed=\"$seed\" \\\n    --batch_size=\"$batch_size\" \\\n    --emb_dim=\"$emb_dim\" \\\n    --lr=\"$lr\" --lr_scheduler=\"$lr_scheduler\" --no_eval_train --print_every_epoch=1 --num_workers=8 \\\n    --output_model_dir=\"$output_model_dir\" \\\n    \u003e \"$output_file\"\n    \ndone\ndone\ndone\ndone\ndone\n```\n\nCurrently, only the bash scripts for QM9 are available. The complete set of scripts will be released soon, together with a notebook demo. 
Please stay tuned.\n\n## Checkpoints\n\nCheckpoints for all the pretraining and downstream tasks will be released soon.\n\n## Cite us\n\nIf you find this work useful, please consider citing it:\n\n```bibtex\n@article{liu2023symmetry,\n    title={Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials},\n    author={Liu, Shengchao and Du, Weitao and Li, Yanjing and Li, Zhuoxinran and Zheng, Zhiling and Duan, Chenru and Ma, Zhiming and Yaghi, Omar and Anandkumar, Anima and Borgs, Christian and others},\n    journal={arXiv preprint arXiv:2306.09375},\n    year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchao1224%2FGeom3D","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchao1224%2FGeom3D","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchao1224%2FGeom3D/lists"}