https://github.com/unixjunkie/fasmifra
Molecular Generation by Fast Assembly of SMILES Fragments
cadd deepsmiles distribution-matching molecular-fragments molecular-generation smiles
- Host: GitHub
- URL: https://github.com/unixjunkie/fasmifra
- Owner: UnixJunkie
- License: gpl-3.0
- Created: 2021-07-08T01:28:43.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-09-24T08:33:31.000Z (about 1 month ago)
- Last Synced: 2024-10-15T07:11:14.558Z (22 days ago)
- Topics: cadd, deepsmiles, distribution-matching, molecular-fragments, molecular-generation, smiles
- Language: OCaml
- Homepage:
- Size: 1.71 MB
- Stars: 50
- Watchers: 3
- Forks: 8
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# FASMIFRA
Reference implementation for the article
"Molecular Generation by Fast Assembly of (Deep)SMILES Fragments".
Generate molecules fast from a molecular training set while also
doing training-set distribution matching.

# Installing the software

Run ./install.sh
It should install FASMIFRA and all its dependencies automatically,
without requiring any user interaction.

OCaml programmers can instead clone this repository
and type 'make && make install'.
Note that you need to have opam installed and configured.

install.sh does something like this:
```bash
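# Install opam: via Homebrew when /usr/local/bin/brew exists, otherwise via apt
(test -e /usr/local/bin/brew && brew install opam) || sudo apt install -y opam
# Initialize opam non-interactively
opam init -y
# Install RDKit for Python (presumably needed by fasmifra_fragment.py)
pip3 install rdkit
# Load the opam environment into the current shell
eval `opam config env`
# Register conf-rdkit as installed without running its install actions
opam install --fake conf-rdkit
opam install -y fasmifra
# Check that both executables are now on the PATH
which fasmifra_fragment.py
which fasmifra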
```

# Fragmenting molecules

The molecules you fragment are your "molecular training set".
```bash
fasmifra_fragment.py -i my_molecules.smi -o my_molecules_frags.smi
```

If you fragment rather small molecules, you might want to use the -w option
and pass a smaller recommended fragment weight than the default (150 Da).

```
usage: fasmifra_fragment.py [-h] [-i input.smi] [-o output.smi] [--seed SEED]
                            [-n NB_PASSES] [-w FRAG_WEIGHT]

fragment molecules (tag cleaved bonds)

optional arguments:
  -h, --help      show this help message and exit
  -i input.smi    molecules input file
  -o output.smi   fragments output file
  --seed SEED     RNG seed
  -n NB_PASSES    number of fragmentation passes
  -w FRAG_WEIGHT  fragment weight (default=150Da)
```

# Generating molecules from fragments
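
Generation consumes the fragments file written by the fragmentation step, so the two commands are often chained in a script. The following is a minimal Python sketch of that chaining, not part of FASMIFRA: the helper names `fragment_cmd` and `generate_cmd` are hypothetical, and only options listed in the two usage texts of this README are used. It assumes that `--seed` takes an integer value for both programs and that both executables are on the PATH.

```python
import shlex
from typing import List, Optional


def fragment_cmd(inp: str, out: str,
                 frag_weight: Optional[int] = None,
                 nb_passes: Optional[int] = None,
                 seed: Optional[int] = None) -> List[str]:
    """Build a fasmifra_fragment.py command line (hypothetical helper)."""
    cmd = ["fasmifra_fragment.py", "-i", inp, "-o", out]
    if frag_weight is not None:
        cmd += ["-w", str(frag_weight)]   # fragment weight, in Da
    if nb_passes is not None:
        cmd += ["-n", str(nb_passes)]     # number of fragmentation passes
    if seed is not None:
        cmd += ["--seed", str(seed)]      # RNG seed
    return cmd


def generate_cmd(frags: str, out: str, n: int,
                 seed: Optional[int] = None,
                 deep_smiles: bool = False) -> List[str]:
    """Build a fasmifra command line (hypothetical helper)."""
    cmd = ["fasmifra", "-n", str(n), "-i", frags, "-o", out]
    if seed is not None:
        cmd += ["--seed", str(seed)]      # assumed to take an integer
    if deep_smiles:
        cmd.append("--deep-smiles")
    return cmd


# Print the two commands in the order they would be run.
frags_file = "my_molecules_frags.smi"
for cmd in (fragment_cmd("my_molecules.smi", frags_file),
            generate_cmd(frags_file, "my_molecules_gen.smi", 100000)):
    print(shlex.join(cmd))
    # To execute for real: subprocess.run(cmd, check=True)
```

Building argument lists rather than shell strings avoids quoting problems with unusual file names. The equivalent plain shell invocation is: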
```bash
fasmifra -n 100000 -i my_molecules_frags.smi -o my_molecules_gen.smi
```

```
usage:
fasmifra
-n : how many molecules to generate
-i : smiles fragments input file
-o : output file
[--seed ]: RNG seed
[--deep-smiles]: input/output molecules in DeepSMILES no-rings format
```

# FASMIFRA in the GuacaMol benchmark

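The Uniqueness and Novelty rows of the table in this section are, roughly, set statistics over the generated molecules. The sketch below illustrates them on a toy example. It is not GuacaMol's code: it assumes the SMILES strings are already canonical (so that string equality means molecule equality), skips the validity check, and uses hypothetical function names and made-up input molecules.

```python
from typing import Iterable, List


def uniqueness(generated: List[str]) -> float:
    """Fraction of distinct SMILES among the generated ones (non-empty list)."""
    return len(set(generated)) / len(generated)


def novelty(generated: Iterable[str], training: Iterable[str]) -> float:
    """Fraction of distinct generated SMILES absent from the training set."""
    unique = set(generated)
    return len(unique - set(training)) / len(unique)


# Toy data, for illustration only.
training = ["CCO", "c1ccccc1", "CC(=O)O"]
generated = ["CCO", "CCN", "CCN", "c1ccccc1O"]

print(uniqueness(generated))         # 3 distinct out of 4 generated: 0.75
print(novelty(generated, training))  # 2 of the 3 distinct ones are new
```

Under these definitions, a generator that only replays training molecules gets a novelty of zero regardless of how unique its output is.
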
|Benchmark |Random sampler|SMILES LSTM|Graph MCTS|AAE |ORGAN|VAE |FASMIFRA|Negative control|
|-------------|--------------|-----------|----------|-----|-----|-----|--------|----------------|
|Validity |1.000 |0.959 |1.000 |0.822|0.379|0.870|1.000 |1.000 |
|Uniqueness |0.997 |1.000 |1.000 |1.000|0.841|0.999|0.994 |0.959 |
|Novelty |0.000 |0.912 |0.994 |0.998|0.687|0.974|0.702 |0.947 |
|KL_divergence|0.998 |0.991 |0.522 |0.886|0.267|0.982|0.959 |0.855 |
|FCD |0.929 |0.913 |0.015 |0.529|0.000|0.863|0.814 |0.397 |

# Bibliography

[1] Berenger, F., Tsuda, K.
Molecular generation by Fast Assembly of (Deep)SMILES fragments.
J Cheminform 13, 88 (2021). https://doi.org/10.1186/s13321-021-00566-4

[2] O'Boyle, N., & Dalke, A. (2018).
"DeepSMILES: an adaptation of SMILES for use in machine-learning of chemical structures".
chemrxiv.org

[3] Weininger, D. (1988). SMILES, a chemical language and information system.
"1. Introduction to methodology and encoding rules".
Journal of chemical information and computer sciences, 28(1), 31-36.
https://doi.org/10.1021/ci00057a005

[4] Klarich, K., Goldman, B., Kramer, T., Riley, P., & Walters, W. P. (2024). Thompson Sampling-An Efficient Method for Searching Ultralarge Synthesis on Demand Databases. Journal of Chemical Information and Modeling, 64(4), 1158-1171.
https://doi.org/10.1021/acs.jcim.3c01790

[5] Welford, B. P. (1962). Note on a method for calculating corrected sums of squares and products. Technometrics, 4(3), 419-420.
https://doi.org/10.1080/00401706.1962.10490022