{"id":24824746,"url":"https://github.com/akensert/molcraft","last_synced_at":"2025-03-25T23:49:04.873Z","repository":{"id":274028471,"uuid":"921704737","full_name":"akensert/molcraft","owner":"akensert","description":"Generative deep learning for molecules using transformers.","archived":false,"fork":false,"pushed_at":"2025-02-07T18:17:40.000Z","size":25,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-07T19:24:14.110Z","etag":null,"topics":["computational-chemistry","deep-learning","machine-learning","molecules","smiles","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/akensert.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-24T13:08:26.000Z","updated_at":"2025-02-07T18:17:43.000Z","dependencies_parsed_at":"2025-01-24T13:29:17.910Z","dependency_job_id":null,"html_url":"https://github.com/akensert/molcraft","commit_stats":null,"previous_names":["akensert/molcraft"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akensert%2Fmolcraft","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akensert%2Fmolcraft/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akensert%2Fmolcraft/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akensert%2Fmolcraft/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/akensert","download_url":"https://codeload.github.com/akensert/molcraft/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245563040,"owners_count":20635907,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computational-chemistry","deep-learning","machine-learning","molecules","smiles","transformers"],"created_at":"2025-01-30T20:57:08.474Z","updated_at":"2025-03-25T23:49:04.868Z","avatar_url":"https://github.com/akensert.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"https://github.com/akensert/molcraft/blob/main/media/molcraft-logo.png\" alt=\"molcraft-logo\" width=\"90%\"\u003e\n\n**Transformers** with **TensorFlow** and **Keras**. 
Focused on **Molecule Generation** and **Chemistry Predictions**.\n\n\u003e [!NOTE]\n\u003e In progress.\n\n## Highlights\n\nAims to implement efficient models and samplers, and soon reinforcement learning, for [SMILES](https://en.wikipedia.org/wiki/Simplified_Molecular_Input_Line_Entry_System) generation and optimization.\n\n- [Models](https://github.com/akensert/molcraft/blob/main/molcraft/models.py) / [Layers](https://github.com/akensert/molcraft/blob/main/molcraft/layers.py)\n    - Implements **key-value caching** for efficient autoregressive decoding\n- [Samplers](https://github.com/akensert/molcraft/blob/main/molcraft/samplers.py)\n    - Samples next tokens from [Models](https://github.com/akensert/molcraft/blob/main/molcraft/models.py)\n    - Can **generate** a batch of **sequences** in parallel and **non-eagerly** (in graph mode)\n    - Can generate a batch of sequences conditioned on **initial sequences of varying lengths**\n- [Tokenizers](https://github.com/akensert/molcraft/blob/main/molcraft/tokenizers.py)\n    - Tokenizes input data for [Models](https://github.com/akensert/molcraft/blob/main/molcraft/models.py)\n    - Can be **adapted** to data via **tokenizer.adapt(ds)** to build a vocabulary\n    - Can be added as a layer to **keras.Sequential**\n    - Can both **tokenize** and **detokenize** data\n\n## Code Examples\n\n```python\nimport tensorflow as tf\nimport keras\nimport random\n\nfrom molcraft import tokenizers\nfrom molcraft import models\nfrom molcraft import samplers\n\nfilename = './data/zinc250K.txt'  # replace with the actual path\n\nwith open(filename, 'r') as fh:\n    smiles = fh.read().splitlines()\n\nrandom.shuffle(smiles)\n\n# Adapt tokenizer (create vocabulary)\ntokenizer = tokenizers.SMILESTokenizer(add_bos=True, add_eos=True)\ntokenizer.adapt(smiles)\n\n# Build dataset (input pipeline)\nds = tf.data.Dataset.from_tensor_slices(smiles)\nds = ds.shuffle(8192)\nds = ds.batch(256)\nds = ds.map(tokenizer)\n# Shift by one token: inputs x[:, :-1] are paired with next-token targets x[:, 1:]\nds = ds.map(lambda x: (x[:, :-1], x[:, 1:]))\nds = ds.prefetch(tf.data.AUTOTUNE)\n\n# Build, compile, and fit the model\nmodel = models.TransformerDecoder(\n    num_layers=4,\n    num_heads=8,\n    embedding_dim=512,\n    intermediate_dim=1024,\n    vocabulary_size=tokenizer.vocabulary_size,\n    sequence_length=tokenizer.sequence_length,\n    dropout=0,\n)\nmodel.compile(\n    optimizer=keras.optimizers.Adam(learning_rate=3e-4),\n    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n)\nmodel.fit(ds, epochs=1)\n\n# Generate 32 SMILES with the top-k sampler, starting from empty initial sequences\nsampler = samplers.TopKSampler(model, tokenizer)\ngenerated_smiles = sampler.sample([''] * 32)\n```\n\n## Installation\n\u003e [!NOTE]\n\u003e The project is under development, so it is incomplete and subject to breaking changes.\n\nFor GPU users:\n```bash\ngit clone git@github.com:akensert/molcraft.git\ncd molcraft\npip install -e \".[gpu]\"\n```\nFor CPU users:\n```bash\ngit clone git@github.com:akensert/molcraft.git\ncd molcraft\npip install -e .\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakensert%2Fmolcraft","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fakensert%2Fmolcraft","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakensert%2Fmolcraft/lists"}