{"id":17596248,"url":"https://github.com/lamm-mit/moleculediffusiontransformer","last_synced_at":"2025-04-13T09:24:31.245Z","repository":{"id":215981674,"uuid":"623354846","full_name":"lamm-mit/MoleculeDiffusionTransformer","owner":"lamm-mit","description":"Molecular generation using diffusion models and autoregressive transformer models","archived":false,"fork":false,"pushed_at":"2024-01-28T10:40:08.000Z","size":8606,"stargazers_count":11,"open_issues_count":1,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-27T01:01:35.042Z","etag":null,"topics":["ai","chemistry","data","design","dft","generative","modeling","molecular-design","quantum-mechanics"],"latest_commit_sha":null,"homepage":"https://github.com/lamm-mit/MoleculeDiffusionTransformer","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lamm-mit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-04T07:53:13.000Z","updated_at":"2025-03-09T18:31:11.000Z","dependencies_parsed_at":"2024-10-23T03:21:30.075Z","dependency_job_id":null,"html_url":"https://github.com/lamm-mit/MoleculeDiffusionTransformer","commit_stats":null,"previous_names":["lamm-mit/moleculediffusiontransformer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lamm-mit%2FMoleculeDiffusionTransformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lamm-mit%2FMoleculeDiffusionTransformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/lamm-mit%2FMoleculeDiffusionTransformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lamm-mit%2FMoleculeDiffusionTransformer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lamm-mit","download_url":"https://codeload.github.com/lamm-mit/MoleculeDiffusionTransformer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248689846,"owners_count":21146016,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","chemistry","data","design","dft","generative","modeling","molecular-design","quantum-mechanics"],"created_at":"2024-10-22T08:24:34.271Z","updated_at":"2025-04-13T09:24:31.184Z","avatar_url":"https://github.com/lamm-mit.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MoleculeDiffusionTransformer\n\n![image](https://user-images.githubusercontent.com/101393859/230594718-d818b4a4-6af9-4df3-a7a9-ae2d86918c16.png)\n\n## Installation\n\n```\nconda create -n MoleculeDiffusionTransformer python=3.8\nconda activate MoleculeDiffusionTransformer\n```\nClone repository\n```\ngit clone https://github.com/lamm-mit/MoleculeDiffusionTransformer/\ncd MoleculeDiffusionTransformer\n```\nTo install MoleculeDiffusionTransformer:\n```\npip install -e .\n```\nStart Jupyter Lab (or Jupyter Notebook):\n```\njupyter-lab --no-browser\n```\n\n## Datasets\n\nThe QM9 dataset is used for training. 
Download it with the command below and place it in the home folder:\n\n```\nwget https://www.dropbox.com/s/gajj3euub7k9p9j/qm9_.csv?dl=0 -O qm9_.csv\n# In Python, load and inspect the dataset:\nimport pandas as pd\ndf=pd.read_csv(\"qm9_.csv\")\ndf.describe()\n```\n\n## Model overview\n\n1. Forward diffusion model (predicts molecular properties from SMILES input)\n2. Forward transformer model, using an encoder architecture (predicts molecular properties from SMILES input)\n3. Generative inverse diffusion model (predicts molecular designs via SMILES codes from molecular properties input, solving the inverse problem)\n4. Generative inverse transformer model, using an autoregressive decoder (predicts molecular designs via SMILES codes from molecular properties input, solving the inverse problem)\n\n### Pretrained weights\n\n- Weights for model 1: https://www.dropbox.com/s/wft4uhcj8287ojt/statedict_save-model-epoch_78.pt?dl=0 (place in ```diffusion_forward``` folder)\n\n- Weights for model 2: https://www.dropbox.com/s/6hkd5vpw738o4so/statedict_save-model-epoch_10.pt?dl=0 (place in ```transformer_forward``` folder)\n\n- Weights for model 3: https://www.dropbox.com/s/xzb2bb4eo1m859p/statedict_save-model-epoch_4851.pt?dl=0 (place in ```QM_generative_diffusion_inverse``` folder)\n\n- Weights for model 4: https://www.dropbox.com/s/fqu6mogj4yw2rcc/statedict_save-model-epoch_2861.pt?dl=0 (place in ```QM_generative_transformer_inverse``` folder)\n\nModels 1 and 2 solve the same task, as do models 3 and 4, albeit with distinct neural network architectures and strategies. 
\n\n#### Download all weights and place them in the proper folders:\n\n```\nwget https://www.dropbox.com/s/wft4uhcj8287ojt/statedict_save-model-epoch_78.pt?dl=0 -O ./diffusion_forward/statedict_save-model-epoch_78.pt\nwget https://www.dropbox.com/s/xzb2bb4eo1m859p/statedict_save-model-epoch_4851.pt?dl=0 -O ./QM_generative_diffusion_inverse/statedict_save-model-epoch_4851.pt\nwget https://www.dropbox.com/s/fqu6mogj4yw2rcc/statedict_save-model-epoch_2861.pt?dl=0 -O ./QM_generative_transformer_inverse/statedict_save-model-epoch_2861.pt\nwget https://www.dropbox.com/s/6hkd5vpw738o4so/statedict_save-model-epoch_10.pt?dl=0 -O ./transformer_forward/statedict_save-model-epoch_78.pt\n```\n\n## Sample results\n\nBoth generative models can generate novel molecular structures that meet a set of target properties. The image below compares the predicted properties with the required properties and shows the corresponding molecular structure. \n\n![image](https://user-images.githubusercontent.com/101393859/230594632-11e80aab-05ba-497e-9ed3-01d58b6c3d21.png)\n\n## General use of the models\n\nThe models are set up flexibly, so they can generate any kind of sequence data from conditioning, using either the diffusion or the transformer models. Below are examples of how this can be done. To apply them to systems other than, say, SMILES representations, the training, sampling and property prediction methods need to be rewritten. 
\n\n### Forward diffusion model (predicts molecular properties from SMILES input): Basic model setup \n\n```\nfrom MoleculeDiffusion import QMDiffusionForward,predict_properties_from_SMILES,ADPM2Sampler\n\ndevice='cpu' #or 'cuda'\npred_dim=1 #Prediction embedding dimension, 1 here since we're predicting a max_features x 1 tensor with the properties\nmax_length_forward=64\n \ncontext_embedding_max_length=y_data.shape[1] #y_data must be defined beforehand (assumed here to be the tensor of target properties)\nmodel_forward =QMDiffusionForward( \n        max_length=max_length_forward, #length of predicted data\n        pred_dim=pred_dim,\n        channels=64,\n        unet_type='cfg', #'base', #'cfg',\n        context_embedding_max_length=max_length_forward, #length of conditioning \n        pos_emb_fourier=True,\n        pos_emb_fourier_add=False,\n        text_embed_dim = 64,\n        embed_dim_position=64,\n        ) .to(device)  \n```\n\n### Generative inverse diffusion model: Basic model setup \n\n```\nimport torch\nfrom MoleculeDiffusion import QMDiffusion \n\ndevice='cpu'\nmax_length = 64\npred_dim=16 #dimension equals number of unique tokens\ncontext_embedding_max_length=12 #dimension equals length of conditioning, i.e. number of molecular features to be considered\n\nmodel =QMDiffusion( \n        max_length=max_length,#length of predicted results, i.e. 
max length of the SMILES string\n        pred_dim=pred_dim,\n        channels=64,\n        unet_type='cfg', #'base', #'cfg',\n        context_embedding_max_length=context_embedding_max_length,#length of conditioning \n        pos_emb_fourier=True,\n        pos_emb_fourier_add=False,\n        text_embed_dim = 64,\n        embed_dim_position=64,\n        )  .to(device)\n\nsequences= torch.randn(4, context_embedding_max_length ).to (device) #conditioning sequence; note, context_embedding_max_length=12\noutput=torch.randint (0,pred_dim, (4, pred_dim , max_length)).to(device).float() #batch, number of tokens, length (length is flexible)\n \nloss=model(sequences=sequences, #conditioning sequence (set of floating points)\n           output=output, #desired result (e.g. one-hot encoded sequence)\n        )\nloss.backward()\nloss\n\n#Generate\ngenerated=model.sample (sequences,\n              device,\n              cond_scale=1.,\n              timesteps=64,\n              clamp=False,\n              )\n \nprint (generated.shape) #(b, pred_dim, max_length)\n```\n\n### Generative inverse transformer model: Basic model setup \n\n#### Model that takes input in the form (batch, num_tokens, length); MSE loss\n\nIn this case, the input and output dimensions are the same.\n```\nimport torch\nfrom MoleculeDiffusion import MoleculeTransformer \n\nlogits_dim = 32 #number of tokens\nMolTrans = MoleculeTransformer(\n        dim=128,\n        depth=6,\n        logits_dim=logits_dim, #number of tokens, and also input/output dimension\n        dim_head = 16,\n        heads = 8,\n        dropout = 0.,\n        ff_mult = 4,\n        text_embed_dim = 32,\n        cond_drop_prob = 0.25,\n        max_text_len = 12, #max length of conditioning sequence\n        pos_fourier_graph_dim= 32, #entire graph fourier embedding, will be added to logits_dim\n        \n).cuda()\n\nsequences= torch.randn(4, 12 ).cuda() #conditioning sequence; note, max_text_len=12, \noutput=torch.randint (0,logits_dim, (4, logits_dim , 
128)).cuda().float() #batch, number of tokens, length (length is flexible)\n \nloss=MolTrans(\n        sequences=sequences,#conditioning sequence\n        output=output,\n        text_mask = None,\n        return_loss = True,\n)\nloss.backward()\nloss\n\n#Generate\ngenerated = MolTrans.generate (   sequences=sequences,#conditioning\n                                 tokens_to_generate=128, #can also generate fewer\n                                 cond_scale = 1., temperature=1,  \n                              )  \nprint (generated.shape) #(b, number_tokens, tokens_to_generate)\n```\n\n#### Model that takes input in the form of a sequence (batch, length); Cross Entropy loss (used in the paper)\n\n```\nimport torch\nfrom MoleculeDiffusion import MoleculeTransformerSequence, count_parameters\nlogits_dim = 32 #number of tokens\n\nmodel = MoleculeTransformerSequence(\n        dim=128,\n        depth=6,\n        logits_dim=logits_dim, #number of tokens  \n        dim_head = 16,\n        heads = 8,\n        dropout = 0.,\n        ff_mult = 4,\n        text_embed_dim = 32, # conditioning embedding\n        cond_drop_prob = 0.25,\n        max_text_len = 12, #max length of conditioning sequence\n        pos_fourier_graph_dim= 32, #entire graph fourier embedding, will be added to logits_dim\n              \n).cuda()\n\nsequences= torch.randn(4, 12 ).cuda() #conditioning sequence; note, max_text_len=12, \noutput=torch.randint (0,logits_dim, (4,  23)).cuda().long() #batch, length (length is flexible)\nprint (output.shape)\nloss=model(\n          sequences=sequences,#conditioning sequence\n          output=output,\n          text_mask = None,\n          return_loss = True,\n          )\nloss.backward()\nloss\n\n#if no start token is provided, the model will randomly select one\ngenerated = model.generate(    sequences=sequences,#conditioning\n        tokens_to_generate=32, #can also generate fewer\n        cond_scale = 1., #temperature=3,  \n        )  \n     \n#Generate start 
token\noutput_start=torch.randint (0,logits_dim, (4,  1)).cuda().long() #batch, length (length is flexible)\n\ngenerated = model.generate(sequences=sequences,#conditioning\n                           output=output_start, #this is the sequence to start with...\n                           tokens_to_generate=32, #can also generate fewer\n                           cond_scale = 1., temperature=1,  \n                           )  \nprint (generated.shape) #(b, tokens_to_generate+1) \n```\n\n#### More flexible model that takes input in the form of a sequence (batch, length), with different embedding/internal dimensions; Cross Entropy loss (used in the paper)\n\n```\nimport torch\nfrom MoleculeDiffusion import MoleculeTransformerSequenceInternaldim, count_parameters\nlogits_dim = 32 #number of tokens\n\nmodel = MoleculeTransformerSequenceInternaldim(\n        dim=128,\n        depth=6,\n        logits_dim=logits_dim, #number of tokens  \n        dim_head = 16,\n        heads = 8,\n        dropout = 0.,\n        ff_mult = 4,\n        embed_dim = 16, # input embedding \n        text_embed_dim = 32, # conditioning embedding\n        cond_drop_prob = 0.25,\n        max_text_len = 12, #max length of conditioning sequence\n        pos_fourier_graph_dim= 32, #entire graph fourier embedding, will be added to logits_dim\n              \n).cuda()\n\nsequences= torch.randn(4, 12 ).cuda() #conditioning sequence; note, max_text_len=12, \noutput=torch.randint (0,logits_dim, (4,  23)).cuda().long() #batch, length (length is flexible)\nprint (output.shape)\nloss=model(\n          sequences=sequences,#conditioning sequence\n          output=output,\n          text_mask = None,\n          return_loss = True,\n          )\nloss.backward()\nloss\n\n#if no start token is provided, the model will randomly select one\ngenerated = model.generate(    sequences=sequences,#conditioning\n        tokens_to_generate=32, #can also generate fewer\n        cond_scale = 1., #temperature=3,  \n        )  \n     \n#Generate 
start token\noutput_start=torch.randint (0,logits_dim, (4,  1)).cuda().long() #batch, length (length is flexible)\n\ngenerated = model.generate(sequences=sequences,#conditioning\n                           output=output_start, #this is the sequence to start with...\n                           tokens_to_generate=32, #can also generate fewer\n                           cond_scale = 1., temperature=1,  \n                           )  \nprint (generated.shape) #(b, tokens_to_generate+1) \n```\n\n### Forward transformer model (predicts molecular properties from SMILES input): Basic model setup \n\nThis model takes a tokenized sequence and produces an encoded output. \n\n```\nmax_length        =64\nlogits_dim_length =12\nlogits_dim        = 1           #output will be b, logits_dim, logits_dim_length\n\nmodel = MoleculeTransformerSequenceEncoder(\n        dim=64,\n        depth=3,\n        logits_dim=logits_dim,                            \n        logits_dim_length = logits_dim_length , #  OUTPUT: (b, logits_dim, logits_dim_length)\n        max_length = max_length, #  \n        dim_head = 8,\n        heads = 8,\n        dropout = 0.1,\n        ff_mult = 2.,\n        max_tokens= num_words, #num_words (vocabulary size) must be defined beforehand\n        embed_dim = 16,  #for sequence embedding\n        padding_token=0, #used for mask generation\n\n).cuda()\n\nseq_input=torch.randint (0,num_words, (4, max_length)).cuda()  #batch, max_length  \npred=model(seq_input) # 4, logits_dim, logits_dim_length\n``` \n\n### Multi-task transformer model, full multi-headed attention\n\nA full multi-headed autoregressive model that can be used to train a multi-task, fully text-based model. 
\n\n```\nfrom torch import optim\nfrom MoleculeDiffusion import MoleculeTransformerGPT\n\nlogits_dim = num_words #number of tokens\nmodel = MoleculeTransformerGPT(\n        dim=256,\n        depth=12,\n        logits_dim=logits_dim, #number of tokens \n        max_tokens = logits_dim,\n        dim_head = 16,\n        heads = 16,\n        dropout = 0.,\n        ff_mult = 2,\n        one_kv_head=False,\n        embed_dim = 8, #for input sequence\n        text_embed_dim = 8, #256, #for sequence conditioning\n).cuda()\noptimizer = optim.Adam(model.parameters() , lr=0.0002)\n```\n\n\n## Utility functions (e.g. drawing SMILES representations) \n\n```\nfrom MoleculeDiffusion import draw_and_save, draw_and_save_set\n#this function draws and saves a set of SMILES codes\ndraw_and_save_set (smiles = ['CC(=CC(C)(C)CCCCCCCC(=O)O)C1CCC2C(=CC=C3CC(O)CC(O)C3)CCCC21C',\n                             'CCCC(C)'],\n                   fname='out.png',\n                   plot_title=None,\n                   figsize=1,\n                   mols_per_row=2,\n                )\n\n#This function compares a predicted SMILES code with a ground truth one\ndraw_and_save (smi = 'CC=C', GTsmile = 'CNC=C', fname='out.png', add_Hs=True)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flamm-mit%2Fmoleculediffusiontransformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flamm-mit%2Fmoleculediffusiontransformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flamm-mit%2Fmoleculediffusiontransformer/lists"}