{"id":19984570,"url":"https://github.com/intellabs/confflow","last_synced_at":"2025-05-04T06:33:25.405Z","repository":{"id":195674108,"uuid":"685129816","full_name":"IntelLabs/ConfFlow","owner":"IntelLabs","description":null,"archived":true,"fork":false,"pushed_at":"2024-09-02T19:26:23.000Z","size":45792,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-01T20:25:29.613Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IntelLabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"security.md","support":null,"governance":null}},"created_at":"2023-08-30T15:21:56.000Z","updated_at":"2024-11-25T08:25:39.000Z","dependencies_parsed_at":"2023-09-19T06:50:54.812Z","dependency_job_id":null,"html_url":"https://github.com/IntelLabs/ConfFlow","commit_stats":null,"previous_names":["intellabs/confflow"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FConfFlow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FConfFlow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FConfFlow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FConfFlow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IntelLabs","download_url":"https://codeload.github.com/IntelLabs/ConfFlow/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":25
2299432,"owners_count":21725716,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T04:19:33.147Z","updated_at":"2025-05-04T06:33:21.825Z","avatar_url":"https://github.com/IntelLabs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DISCONTINUATION OF PROJECT #  \nThis project will no longer be maintained by Intel.  \nIntel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.  \nIntel no longer accepts patches to this project.  \n If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.  
\n  \n# Conformation Generation using Transformer Flows #\n[Sohil Atul Shah](https://sites.google.com/site/sas21587/) and [Vladlen Koltun](http://vladlen.info/)\n\n[[`arXiv`](http://arxiv.org/abs/)] [[`BibTeX`](#CitingConfFlow)] \n\n![sequence](./assets/movie.gif)\n\n## Features\n* Fast synthesis of 3D molecular conformers for a given input 2D graph.\n* Highly interpretable procedure akin to force field updates in molecular dynamics simulation.\n\n## Requirements\n- Linux with Python 3.8\n- PyTorch \u003e= 1.13.1 and [torchvision](https://github.com/pytorch/vision/) that matches the PyTorch installation.\n  Install them together at [pytorch.org](https://pytorch.org) to make sure of this.\n- Create the conda environment and install all dependencies with `conda env create -f environment.yml`\n\n## Getting Started\nTo train a model, first set up the corresponding datasets by following [scripts/data_generation.sh](./scripts/data_generation.sh).\n### Datasets\nThe official raw GEOM dataset is available [[here]](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JNGTDF).\n\n#### Dataset split and preprocessing\nFor ease of benchmarking and comparison, we use the default split of GEOM as provided by [[ConfGF]](https://github.com/DeepGraphLearning/ConfGF/). For each category of GEOM, we first retrieve the corresponding compressed dataset file from this [[Google Drive folder]](https://drive.google.com/drive/folders/10dWaj5lyMY0VY4Zl0zDPCa69cuQUGb-6?usp=sharing) and unpack it into the corresponding folder under the [data](./data/) folder, e.g. `data/GEOM_QM9/raw/`. Following this, we pack all the molecules and their SMILES strings for all splits into a single raw dataset file as shown below.   
\n```python\nimport pickle\n\n# Concatenate the train, validation and test splits, in that order.\nmollist = []\nfor split in ('train_data_40k.pkl', 'val_data_5k.pkl', 'test_data_200.pkl'):\n    with open(split, 'rb') as f:\n        mollist += pickle.load(f)\n\n# Collect the molecule object and SMILES string of every entry.\nmols = [mol.rdmol for mol in mollist]\nsmiles = [mol.smiles for mol in mollist]\n\n# Write both lists to a single raw dataset file.\nwith open('GEOM_QM9_molset_all.p', 'wb') as f:\n    pickle.dump([mols, smiles], f)\n```\n\nWith the split taken care of, we generate the preprocessed data by running the featurization code under the [[data]](./data/) folder. You can run it with the [[data_generation]](./scripts/data_generation.sh) script:\n```commandline\nsh scripts/data_generation.sh GEOM_QM9\n```\n\nDuring the first training / evaluation run on the corresponding dataset, the `torch_geometric` package packs each split into its compressed form under the `./data/processed` folder for faster in-memory retrieval in later runs.\n\nThe final dataset folder structure will look like this:\n```\nGEOM_QM9\n|___train_data_40k.pkl\n|___val_data_5k.pkl\n|___test_data_200.pkl\n|___GEOM_QM9_molset_all.p\n|___raw\n|   |___GEOM_QM9_molvec_graph_29.p\n|   |___GEOM_QM9_molset_graph_29.p\n|___processed\n|   |___train_1_0.pt\n|   |___val_1_0.pt\n|   |___test_1_0.pt\n|   |___train_1_0_extra.npz\n|   |___val_1_0_extra.npz\n|   |___test_1_0_extra.npz\n|   |___pre_transform.pt\n|   |___pre_filter.pt\n|\n...\n```\n\n### Training\nAll hyper-parameters and training details are provided in the script files (`./scripts/train_*.sh`); feel free to tune them.\n\nYou can train the model with the following command:\n\n```commandline\nsh train_geom_qm9.sh output_dir_name\n```\nThe folder `output_dir_name` is created under the `./checkpoints/${dataset}/` directory. \n\n### Generation\nWe provide generation / evaluation scripts for the GEOM_Drugs data. The same can be replicated for other datasets by referring to the configs listed in the training scripts. 
\n\nThe 3D structures of the test set are generated with the following command: \n```commandline\nsh test_geom_drugs.sh output_dir_name\n```\nThese generated structures are stored in the file `generated.pkl` under the `output_dir_name` folder.\n\n### Evaluation\nFollowing generation, you can run the various evaluations with the following commands:\n```commandline\nsh evaluate.sh output_dir_name\nsh property_eval.sh output_dir_name\nsh distance_eval.sh output_dir_name test_data_200\n```\n`evaluate.sh` computes three scores, `COV, MAT, MIS`, based on the RMSD to the ground-truth structures (Table 1), using `evaluate.py`.\n`property_eval.sh` evaluates the median of the absolute prediction errors of various ensemble properties (Table 3) using `ensemble_property_pred.py`.\n\n### Visualization\nThe following commands generate the visual 3D structures of a few sampled test molecules. Feel free to modify the files to generate a different set of molecules and baselines.   \n\n```commandline\nsh extract_visual.sh output_dir_name\npython pymol_visual.py\n```\n\n## Model Zoo and Baselines\nWe provide trained models for both GEOM_Drugs and GEOM_QM9 in the [checkpoints](./checkpoints/) folder.\n\n\n## License\nShield: [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\nThe source code and dataset are licensed under the [MIT License](LICENSE). In general, you can use the code for any purpose with proper attribution. If you do something interesting with the code, we'll be happy to hear about it. Feel free to contact us.
\n\n## \u003ca name=\"CitingConfFlow\"\u003e\u003c/a\u003eCiting ConfFlow\nIf you use ConfFlow code or trained models in your research, please use the following BibTeX entry.\n\n```BibTeX\n@inproceedings{shah2023ConfFlow,\n  title={Conformation Generation using Transformer Flows},\n  author={Sohil Atul Shah and Vladlen Koltun},\n  journal={arXiv:},\n  year={2023}\n}\n```\n\n## Contact\n\n[Sohil Shah](mailto:sohil.iitb@gmail.com)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintellabs%2Fconfflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fintellabs%2Fconfflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintellabs%2Fconfflow/lists"}