{"id":29625160,"url":"https://github.com/novartis/pqsar2cpd","last_synced_at":"2025-07-21T06:07:29.666Z","repository":{"id":180419566,"uuid":"665104866","full_name":"Novartis/pqsar2cpd","owner":"Novartis","description":"pqsar2cpd is a deep learning algorithm for translation of activity profiles into novel molecules.","archived":false,"fork":false,"pushed_at":"2023-07-12T09:10:24.000Z","size":15,"stargazers_count":30,"open_issues_count":0,"forks_count":6,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-01-18T06:33:03.982Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Novartis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-11T12:52:38.000Z","updated_at":"2024-10-25T20:00:04.000Z","dependencies_parsed_at":null,"dependency_job_id":"f982f982-3974-423c-a747-f0900d4a762c","html_url":"https://github.com/Novartis/pqsar2cpd","commit_stats":null,"previous_names":["novartis/pqsar2cpd"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/Novartis/pqsar2cpd","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2Fpqsar2cpd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2Fpqsar2cpd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2Fpqsar2cpd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2Fpqsar2cpd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/Novartis","download_url":"https://codeload.github.com/Novartis/pqsar2cpd/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2Fpqsar2cpd/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266248501,"owners_count":23899056,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-21T06:07:29.203Z","updated_at":"2025-07-21T06:07:29.654Z","avatar_url":"https://github.com/Novartis.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pqsar2cpd - de novo generation of hit-like molecules from pQSAR pIC50 with AI-based generative chemistry\n\n[![python](https://img.shields.io/badge/Python-3.8-3776AB.svg?style=flat\u0026logo=python\u0026logoColor=white)](https://www.python.org) [![tensorflow](https://img.shields.io/badge/TensorFlow-2.8-FF6F00.svg?style=flat\u0026logo=tensorflow)](https://www.tensorflow.org) [![LICENSE](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/Novartis/pqsar2cpd/blob/main/LICENSE)\n\nThis repository contains the code of the conditional generative adversarial network capable of translating [pQSAR](https://github.com/Novartis/pQSAR) profiles of pIC50 values into novel chemical structures, as described in [[1]](https://www.biorxiv.org/content/10.1101/2021.12.10.472084v1)\n\nThe model itself operates entirely in the latent space. 
This means users can use any external molecular encoder/decoder to encode molecules into vectors for training and to decode the output back to SMILES after inference. This way, pqsar2cpd can be integrated seamlessly into any existing pipeline. We have successfully tested the approach with [CDDD](https://github.com/jrwnter/cddd), [JT-VAE](https://github.com/wengong-jin/icml18-jtnn), [HierVAE](https://github.com/wengong-jin/hgraph2graph), and [MoLeR](https://github.com/microsoft/molecule-generation).\n\nSince the model is input-agnostic, other property profiles, such as gene expression profiles or protein embeddings, could potentially be used instead of pQSAR to generate novel compounds.\n\n## Requirements\npqsar2cpd is implemented in TensorFlow. To make sure all your packages are compatible, you can install the dependencies using the provided requirements file:\n```\npip install -r requirements.txt\n```\n\n## Training\nTo train a new model, you need a set of compound vectors coming from a molecular encoder, and a matching set of property profiles. The compound and profile sets should be separate NumPy arrays containing n-dimensional vectors, one row per compound, with a 1:1 correspondence between the rows of the two arrays. If you're interested in using pQSAR profiles, you can follow the instructions in the [pQSAR](https://github.com/Novartis/pQSAR) repository.\n\nTo use the model out of the box, save the compounds and profiles as separate .npy files with NumPy.\n\nTo train the model, run:\n\n```\npython train.py --compounds='cpd.npy' --profiles='profiles.npy'\n```\nYou can also specify an optional argument for the number of epochs, e.g. 
`--epochs=400`.\n\nThe script trains the cGAN and saves the generator as `pqsar2cpd.h5`, ready for use in inference.\n\n## Inference\nTo generate novel molecules from a set of profiles, run:\n\n```\npython predict.py --model='pqsar2cpd.h5' --profiles='test.npy' --output='new_mols.h5' --n_samples=100\n```\nThis loads the profile NumPy array from `test.npy` and generates 100 samples for each profile in the set. The results are saved to `new_mols.h5` in HDF5 format, with the samples for each profile stored as a dataset keyed by the profile index. These vectors can then be passed to the molecular decoder to obtain SMILES.\n\n## Contact\nCode authored by [Michal Pikusa](mailto:michal.pikusa@novartis.com)\n\nContributions: **Florian Nigsch**, **W. Armand Guiguemde**, Eric Martin, William J. Godinez, Christian Kolter\n\n## References\n```\n[1] De-novo generation of novel phenotypically active molecules for Chagas disease from biological signatures using AI-driven generative chemistry\nMichal Pikusa, Olivier René, Sarah Williams, Yen-Liang Chen, Eric Martin, William J. Godinez, Srinivasa P S Rao, W. Armand Guiguemde, Florian Nigsch\nbioRxiv 2021.12.10.472084; doi: https://doi.org/10.1101/2021.12.10.472084\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnovartis%2Fpqsar2cpd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnovartis%2Fpqsar2cpd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnovartis%2Fpqsar2cpd/lists"}