{"id":28243773,"url":"https://github.com/membrizard/ml_conformer_generator","last_synced_at":"2026-04-11T13:21:37.103Z","repository":{"id":288441728,"uuid":"919955485","full_name":"Membrizard/ml_conformer_generator","owner":"Membrizard","description":"Shape-constrained molecule generation via Equivariant Diffusion and GCN","archived":false,"fork":false,"pushed_at":"2025-08-09T15:27:50.000Z","size":36364,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-08-09T17:28:13.171Z","etag":null,"topics":["chemistry","conformers","diffusion-models","graph-convolutional-networks","molecule-generation","rdkit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Membrizard.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-21T10:06:01.000Z","updated_at":"2025-08-09T15:27:53.000Z","dependencies_parsed_at":"2025-04-18T05:20:13.597Z","dependency_job_id":"b21c31a5-24fd-4fd8-b22e-80e165aaf68d","html_url":"https://github.com/Membrizard/ml_conformer_generator","commit_stats":null,"previous_names":["membrizard/ml_conformer_generator"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/Membrizard/ml_conformer_generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Membrizard%2Fml_conformer_generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Membrizard%2Fml_conformer_generator/tags","releases_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/Membrizard%2Fml_conformer_generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Membrizard%2Fml_conformer_generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Membrizard","download_url":"https://codeload.github.com/Membrizard/ml_conformer_generator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Membrizard%2Fml_conformer_generator/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269969197,"owners_count":24505424,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-11T02:00:10.019Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chemistry","conformers","diffusion-models","graph-convolutional-networks","molecule-generation","rdkit"],"created_at":"2025-05-19T07:07:58.930Z","updated_at":"2026-04-11T13:21:37.092Z","avatar_url":"https://github.com/Membrizard.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ML Conformer Generator\n[![DOI](https://img.shields.io/badge/DOI-10.1039%2FD5DD00318K-blue)](https://doi.org/10.1039/D5DD00318K)\n\n\u003cimg src=\"https://raw.githubusercontent.com/Membrizard/ml_conformer_generator/main/assets/logo/mlconfgen_logo.png\" width=\"120\" style=\"display: block; margin: 0 10%;\"\u003e\n\n**ML Conformer 
Generator** \nis a tool for spatially-aware molecule generation with an Equivariant Diffusion Model (EDM)\nand a Graph Convolutional Network (GCN). It is designed to generate 3D molecular conformations\nthat are both chemically valid and spatially similar to a reference shape.\n\n---\n\n## Molecule Generation in Action\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/Membrizard/ml_conformer_generator/main/assets/animations/rotating_animation_optim_560p_db.gif\" width=\"400\" style=\"margin-left: 12px;\" /\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/Membrizard/ml_conformer_generator/main/assets/animations/closeup_optim_640p_db.gif\" width=\"400\" /\u003e\n\u003c/p\u003e\n\n---\n\n## Supported features\n\n* **Shape-guided molecular generation**\n\n    Generate novel molecules that conform to arbitrary 3D shapes, such as protein binding pockets or custom-defined spatial regions.\n\n\n* **Objective-guided generation**\n\n    Use reinforcement learning (RL) to steer molecular generation toward higher-scoring candidates, with support for custom scoring functions.\n\n\n* **Reference-based conformer similarity**\n\n    Create molecules whose conformations closely resemble a reference structure, supporting scaffold-hopping and ligand-based design workflows.\n\n\n* **Fragment-based inpainting**\n\n    Fix specific substructures or fragments within a molecule and complete or grow the rest in a geometrically consistent manner.\n\n\n* **Inertial Fragment Matching**\n\n    Generate molecules fragment-by-fragment by leveraging the physical properties of the shape descriptor, improving both shape similarity and chemical validity.\n\n\n## Citation\n\nIf you use **MLConfGen** in your research, please cite:\n\nDenis Sapegin, Fedor Bakharev, Dmitry Krupenya, Azamat Gafurov, Konstantin Pildish, and Joseph C. Bear.  
\n*Moment of inertia as a simple shape descriptor for diffusion-based shape-constrained molecular generation.*  \nDigital Discovery, 2025.\nDOI: [10.1039/D5DD00318K](https://doi.org/10.1039/D5DD00318K)\n\n---\n## Installation\n\n1. Install the package for your preferred backend:\n\n   *  `pip install mlconfgen[torch]`: use the PyTorch-based inference pipeline\n\n   *  `pip install mlconfgen[onnx]`: use the torch-free ONNX runtime version\n\n\n2. Download the weights from Hugging Face:\n\u003e https://huggingface.co/Membrizard/ml_conformer_generator\n\n`edm_moi_chembl_15_39.pt`\n\n`adj_mat_seer_chembl_15_39.pt`\n\n---\n\n## 🐍 Python API\n\nSee interactive examples: `./python_api_demo.ipynb`\n\n```python\nfrom rdkit import Chem\nfrom mlconfgen import MLConformerGenerator, evaluate_samples\n\nmodel = MLConformerGenerator(\n                             edm_weights=\"./edm_moi_chembl_15_39.pt\",\n                             adj_mat_seer_weights=\"./adj_mat_seer_chembl_15_39.pt\",\n                             diffusion_steps=100,\n                            )\n\nreference = Chem.MolFromMolFile('./assets/demo_files/ceyyag.mol')\n\nsamples = model.generate_conformers(reference_conformer=reference, n_samples=20, variance=2)\n\naligned_reference, std_samples = evaluate_samples(reference, samples)\n```\n---\n\n## 🚀 Overview\n\nThis solution employs:\n\n- **Equivariant Diffusion Model (EDM) [[1]](https://doi.org/10.48550/arXiv.2203.17003)**: For generating atom coordinates and types under a shape constraint.\n- **Graph Convolutional Network (GCN) [[2]](https://doi.org/10.1039/D3DD00178D)**: For predicting atom adjacency matrices.\n- **Deterministic Standardization Pipeline**: For refining and validating generated molecules.\n\n---\n\n## 🧠 Model Training\n\n- Trained on **1.6 million** compounds from the **ChEMBL** database.\n- Filtered to molecules with **15–39 heavy atoms**.\n- Supported elements: `H, C, N, O, F, P, S, Cl, Br`.\n\n---\n\n## 🧪 Standardization Pipeline\n\nThe 
generated molecules are post-processed through the following steps:\n\n- Largest-fragment selection\n- Valence check\n- Kekulization\n- RDKit sanitization\n- Constrained geometry optimization via **MMFF94** molecular dynamics\n\n---\n\n## 📏 Evaluation Pipeline\n\nAligns generated molecules to a reference and evaluates their shape similarity using\n**Shape Tanimoto Similarity [[3]](https://doi.org/10.1007/978-94-017-1120-3_5)** via Gaussian Molecular Volume overlap.\n\n\u003e Hydrogens are ignored in both reference and generated samples for this metric.\n\n---\n\n## 📊 Performance (100 Denoising Steps)\n\n*Tested on 100,000 samples using 1,000 CCDC Virtual Screening [[4]](https://www.ccdc.cam.ac.uk/support-and-resources/downloads/) reference compounds.*\n\n### General Overview\n\n- ⏱ **Avg time to generate 50 valid samples**: 11.46 sec (NVIDIA H100, batch of 100 samples)\n- ⚡️ **Generation speed**: 4.18 valid molecules/sec (batch of 100 samples)\n- 💾 **GPU memory (per generation thread)**: Up to 14.0 GB (`float16`, 39 atoms, 100 samples)\n- 📐 **Avg Shape Tanimoto Similarity**: 53.32% (basic generation) to 69.97% (Inertial Fragment Matching)\n- 🎯 **Max Shape Tanimoto Similarity**: 99.69%\n- 🔬 **Avg Chemical Tanimoto Similarity (2-hop 2048-bit Morgan Fingerprints)**: 10.87%\n- 🧬 **% Chemically novel (vs. 
training set)**: 99.84%\n- ✔️ **% Valid molecules (post-standardization)**: 48% (ML Bond Prediction) - 93% (OpenBabel bond prediction)\n- 🔁 **% Unique molecules in generated set**: 99.94%\n- 📎 **Fréchet Fingerprint Distance (2-hop 2048-bit Morgan Fingerprints)**:  \n  - To ChEMBL: 4.13  \n  - To PubChem: 2.64  \n  - To ZINC (250k): 4.95\n\n### PoseBusters [[5]](https://doi.org/10.1039/D3SC04185A) validity check results:\n\n**Overall stats**:\n\n  - PB-valid molecules: **91.33 %**\n\n**Detailed Problems**:\n\n   - position: 0.01 %\n   - mol_pred_loaded: 0.0 %\n   - sanitization: 0.01 %\n   - inchi_convertible: 0.01 %\n   - all_atoms_connected: 0.0 %\n   - bond_lengths: 0.24 %\n   - bond_angles: 0.70 %\n   - internal_steric_clash: 2.31 %\n   - aromatic_ring_flatness: 3.34 %\n   - non-aromatic_ring_non-flatness: 0.27 %\n\n### Synthesizability of the generated compounds\n\n#### SA Score [[6]](https://doi.org/10.1186/1758-2946-1-8)\n\n*1 (easy to make) - 10 (very difficult to make)*\n\n**Average SA Score**: **3.18**\n\n\u003cimg src=\"https://raw.githubusercontent.com/Membrizard/ml_conformer_generator/main/assets/benchmarks/sa_score_dist.png\" width=\"300\"\u003e\n\n---\n\n## RL Fine Tuning\n\nMLConformerGenerator supports objective-guided reinforcement learning (RL) fine-tuning, allowing you to steer the generated molecular distribution toward molecules that better match your desired properties.\n\nScoring functions are fully customizable. 
The only requirement is that they accept a list of RDKit `Mol` objects and return a list of scores in the range `[0, 1]`.\n\nA scoring function should follow this interface:\n\n```python\nfrom rdkit import Chem\n\ndef scoring_function(mols: list[Chem.Mol | None]) -\u003e list[float]:\n    ...\n```\n### Example: RL fine-tuning\n\n\u003e [!NOTE]\n\u003e If `scoring_function` is None, a default scoring function enforcing validity is applied for RL.\n\n```python\nfrom rdkit import Chem\nfrom mlconfgen import MLConformerGenerator\n\nmodel = MLConformerGenerator(\n                             edm_weights=\"./edm_moi_chembl_15_39.pt\",\n                             adj_mat_seer_weights=\"./adj_mat_seer_chembl_15_39.pt\",\n                             diffusion_steps=10,\n                            )\n\nreference = Chem.MolFromMolFile('./assets/demo_files/ceyyag.mol')\n\nmodel.fine_tune(\n                  reference_conformer=reference,\n                  variance=1,\n                  n_epochs=20,\n                  sigma=60.0,\n                  lambda_edm_adapter=1.5,\n                  temperature=1.5,\n                  n_samples_per_mol=16,\n                  eval_every=5,\n                  save_dir=\"./rl_checkpoints\"\n)\n\n\n\n```\n\nFine-tuning produces both the best and the latest checkpoints, which can later be loaded into the model:\n\n```python\nfrom mlconfgen import MLConformerGenerator\n\nmodel = MLConformerGenerator(\n                             edm_weights=\"./edm_moi_chembl_15_39.pt\",\n                             adj_mat_seer_weights=\"./adj_mat_seer_chembl_15_39.pt\",\n                             finetune_checkpoint = \"./finetune_checkpoint.pt\",\n                             diffusion_steps=10,\n                            )\n\n# Or\n\nmodel.load_finetune_checkpoint(\"./finetune_checkpoint.pt\")\n\n```\n\n### REINVENT4 compatibility\n\nThe RL fine-tuning pipeline is compatible with scoring functions from 
[REINVENT4](https://github.com/MolecularAI/REINVENT4/tree/main).\nIf REINVENT4 is installed, you can use `ReinventScoreWrapper` to load a REINVENT4 scoring configuration and use MLConfGen as a spatially-aware molecule generator.\n\nFor working examples, see `rl_fine_tuning_demo.ipynb`.\n\n```python\nfrom rdkit import Chem\nfrom mlconfgen import MLConformerGenerator\nfrom mlconfgen.rl_fine_tuning.reinvent_score_wrapper import ReinventScoreWrapper\n\nmodel = MLConformerGenerator(\n                             edm_weights=\"./edm_moi_chembl_15_39.pt\",\n                             adj_mat_seer_weights=\"./adj_mat_seer_chembl_15_39.pt\",\n                             diffusion_steps=10,\n                            )\n\nreference = Chem.MolFromMolFile('./assets/demo_files/ceyyag.mol')\nscoring_function = ReinventScoreWrapper(\"./assets/demo_files/scoring_config.toml\")\n\nmodel.fine_tune(\n                  scoring_function=scoring_function,\n                  reference_conformer=reference,\n                  variance=1,\n                  n_epochs=100,\n                  train_batch_size=128,\n                  eval_batch_size=128,\n                  learning_rate=8e-5,\n                  sigma=128.0,\n                  lambda_edm_adapter=1.5,\n                  lambda_edm_reg=0.2,\n                  temperature=1.5,\n                  n_samples_per_mol=32,\n                  eval_every=5,\n                  save_dir=\"./rl_checkpoints_reinvent\",\n)\n\n```\n\n---\n\n## Generation Examples\n\n![ex1](https://raw.githubusercontent.com/Membrizard/ml_conformer_generator/main/assets/ref_mol/molecule_1.png)\n![ex2](https://raw.githubusercontent.com/Membrizard/ml_conformer_generator/main/assets/ref_mol/molecule_2.png)\n![ex3](https://raw.githubusercontent.com/Membrizard/ml_conformer_generator/main/assets/ref_mol/molecule_3.png)\n![ex4](https://raw.githubusercontent.com/Membrizard/ml_conformer_generator/main/assets/ref_mol/molecule_4.png)\n\n---\n\n## 💾 Access \u0026 
Licensing\n\nThe **Python package and inference code are available on GitHub** under the Apache 2.0 License:\n\u003e https://github.com/Membrizard/ml_conformer_generator\n\nThe trained model **weights** are available at:\n\n\u003e https://huggingface.co/Membrizard/ml_conformer_generator\n\nThey are licensed under CC BY-NC-ND 4.0.\n\nUse of the trained weights for any profit-generating activity is restricted.\n\nFor commercial licensing and inference-as-a-service, contact:\n[Denis Sapegin](https://github.com/Membrizard)\n\n---\n\n## ONNX Inference\nFor torch-free inference, an ONNX version of the model is provided.\n\nWeights of the model in ONNX format are available at:\n\u003e https://huggingface.co/Membrizard/ml_conformer_generator\n\n`egnn_chembl_15_39.onnx`\n\n`adj_mat_seer_chembl_15_39.onnx`\n\n\n```python\nfrom mlconfgen import MLConformerGeneratorONNX\nfrom rdkit import Chem\n\nmodel = MLConformerGeneratorONNX(\n                                 egnn_onnx=\"./egnn_chembl_15_39.onnx\",\n                                 adj_mat_seer_onnx=\"./adj_mat_seer_chembl_15_39.onnx\",\n                                 diffusion_steps=100,\n                                )\n\nreference = Chem.MolFromMolFile('./assets/demo_files/yibfeu.mol')\nsamples = model.generate_conformers(reference_conformer=reference, n_samples=20, variance=2)\n\n```\nInstall the ONNX GPU runtime (if needed):\n`pip install onnxruntime-gpu`\n\n---\n## Export to ONNX\nThe model can also be exported to ONNX.\n\nThis requires `onnxscript==0.2.2`:\n\n`pip install onnxscript==0.2.2`\n\n```python\nfrom mlconfgen import MLConformerGenerator\nfrom onnx_export import export_to_onnx\n\nmodel = MLConformerGenerator()\nexport_to_onnx(model)\n```\nThis compiles the model and saves the ONNX files to `./`.\n\n---\n## Testing\n\nTo run all tests (including the slow generation tests):\n\n`pytest -v tests`\n\nTo skip the slow generation tests:\n\n`pytest -v tests -m \"not slow\"`\n\n---\n\n## Streamlit 
App\n\n![streamlit_app](https://raw.githubusercontent.com/Membrizard/ml_conformer_generator/main/assets/app_ui/streamlit_app.png)\n\n### Running\n- Move the trained PyTorch weights into `./streamlit_app`\n\n`./streamlit_app/edm_moi_chembl_15_39.pt`\n\n`./streamlit_app/adj_mat_seer_chembl_15_39.pt`\n\n- Install the dependencies `pip install -r ./streamlit_app/requirements.txt`\n\n- Bring the app UI up:\n  ```commandline\n  cd ./streamlit_app\n  streamlit run app.py\n  ```\n\n### Streamlit App Development\n\n1. To enable development mode for the 3D viewer (`stspeck`), set `_RELEASE = False` in `./streamlit/stspeck/__init__.py`.\n\n2. Navigate to the 3D viewer frontend and start the development server:\n   ```commandline\n   cd ./frontend/speck/frontend\n   npm run start\n   ```\n   \n   This will launch the dev server at `http://localhost:3001`\n\n3. In a separate terminal, run the Streamlit app from the root frontend directory: \n   ```commandline\n   cd ./streamlit_app\n   streamlit run app.py\n   ```\n\n4. To build the production version of the 3D viewer, run:\n   ```commandline\n   cd ./streamlit_app/stspeck/frontend\n   npm run build\n   ```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmembrizard%2Fml_conformer_generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmembrizard%2Fml_conformer_generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmembrizard%2Fml_conformer_generator/lists"}