Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/daoyuanli2816/molecule-generator
Variational Autoencoder (VAE)-based molecular SMILES string generator
https://github.com/daoyuanli2816/molecule-generator
ai4science chemistry generative-model molecular-simulation smiles-strings tokenizer vae-pytorch
Last synced: 6 days ago
JSON representation
Variational Autoencoder (VAE)-based molecular SMILES string generator
- Host: GitHub
- URL: https://github.com/daoyuanli2816/molecule-generator
- Owner: DaoyuanLi2816
- License: mit
- Created: 2024-06-20T21:59:49.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-06-21T03:24:18.000Z (5 months ago)
- Last Synced: 2024-06-22T17:08:26.656Z (5 months ago)
- Topics: ai4science, chemistry, generative-model, molecular-simulation, smiles-strings, tokenizer, vae-pytorch
- Language: Python
- Homepage:
- Size: 610 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# VAE-Based SMILES String Generator
This project is a Variational Autoencoder (VAE)-based molecular SMILES string generator. It generates molecules composed of CHOH/CH2OH (referred to as A) and CH/CH2/CH3 (referred to as B) repeat units. The generated molecules are saturated and contain no rings.
![Image](./molecule.png)
## Project Structure
The project consists of the following Python scripts:
- `VAE.py`: Defines the VAE model and includes functions for training and testing the model.
- `generate.py`: Generates new SMILES strings by perturbing the latent space of the trained VAE.
- `interpolate.py`: Generates interpolated SMILES strings between two given SMILES strings using the latent space of the trained VAE.
- `synthetic_dataset.py`: Generates a synthetic dataset of SMILES strings based on specified constraints.## Features
- Generates over 100,000 synthetic SMILES strings.
- Only A and B repeat units are included.
- No molecule contains more than six consecutive A repeat units.
- All molecules in the dataset are saturated and contain no rings.## Installation
1. Clone the repository:
```bash
git clone https://github.com/DaoyuanLi2816/Molecule-Generator.git
cd Molecule-Generator
```2. Install the required dependencies:
```bash
pip install -r requirements.txt
```3. Ensure you have RDKit installed. RDKit is required for molecular operations. Installation instructions can be found [here](https://www.rdkit.org/docs/Install.html).
## Usage
### Generating Synthetic Dataset
To generate a synthetic dataset of SMILES strings, run `synthetic_dataset.py`:
```bash
python synthetic_dataset.py
```
This will create a CSV file named `molecules.csv` containing the generated SMILES strings.### Training the VAE Model
To train the VAE model, run `VAE.py`:
```bash
python VAE.py
```
This will train the VAE model on the generated dataset and save the trained model as `beta_tc_vae_model.pth`.### Generating New SMILES Strings
To generate new SMILES strings using the trained VAE model, run `generate.py`:
```bash
python generate.py
```
This will output new SMILES strings generated by perturbing the latent space of the trained VAE.### Interpolating Between Two SMILES Strings
To generate interpolated SMILES strings between two given SMILES strings, run `interpolate.py`:
```bash
python interpolate.py
```
This will output SMILES strings that are interpolations between the two input SMILES strings in the latent space of the trained VAE.## Contributing
If you would like to contribute to this project, please open an issue or submit a pull request. We welcome contributions from the community.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.