{"id":17255094,"url":"https://github.com/unixjunkie/fasmifra","last_synced_at":"2025-07-16T09:38:00.627Z","repository":{"id":53898315,"uuid":"383967575","full_name":"UnixJunkie/FASMIFRA","owner":"UnixJunkie","description":"Molecular Generation by Fast Assembly of SMILES Fragments","archived":false,"fork":false,"pushed_at":"2024-10-31T09:27:26.000Z","size":1790,"stargazers_count":57,"open_issues_count":5,"forks_count":8,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-05T11:34:20.824Z","etag":null,"topics":["cadd","deepsmiles","distribution-matching","molecular-fragments","molecular-generation","smiles"],"latest_commit_sha":null,"homepage":"","language":"OCaml","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UnixJunkie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-07-08T01:28:43.000Z","updated_at":"2025-06-03T19:33:32.000Z","dependencies_parsed_at":"2023-11-27T07:28:18.026Z","dependency_job_id":"4bf28acf-2eab-437a-a389-5d05b0ca8d7b","html_url":"https://github.com/UnixJunkie/FASMIFRA","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/UnixJunkie/FASMIFRA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UnixJunkie%2FFASMIFRA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UnixJunkie%2FFASMIFRA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UnixJunkie%2FFASMIFRA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UnixJ
unkie%2FFASMIFRA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UnixJunkie","download_url":"https://codeload.github.com/UnixJunkie/FASMIFRA/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UnixJunkie%2FFASMIFRA/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265500438,"owners_count":23777484,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cadd","deepsmiles","distribution-matching","molecular-fragments","molecular-generation","smiles"],"created_at":"2024-10-15T07:10:41.362Z","updated_at":"2025-07-16T09:38:00.602Z","avatar_url":"https://github.com/UnixJunkie.png","language":"OCaml","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FASMIFRA\n\nReference implementation for the article\n\"Molecular Generation by Fast Assembly of (Deep)SMILES Fragments\".\nGenerate molecules fast from a molecular training set while also\ndoing training-set distribution matching.\n\n\u003cimg src=\"TOC.png\" alt=\"logo\" width=\"400\"/\u003e\n\n# Installing the software\n\nrun ./install.sh\nIt should install FASMIFRA with all its dependencies automatically\nwithout requiring any user interaction.\n\nFor OCaml programmers, you can clone this repository\nthen type 'make \u0026\u0026 make install'.\nNote that you need to have opam installed and configured.\n\ninstall.sh does something like this:\n```bash\n(test -e /usr/local/bin/brew \u0026\u0026 brew install opam) || sudo apt install -y opam\nopam init -y\npip3 install rdkit\neval `opam config env`\nopam install 
--fake conf-rdkit\nopam install -y fasmifra\nwhich fasmifra_fragment.py\nwhich fasmifra\n```\n\n# Fragmenting molecules\n\nThe input molecules are your \"molecular training set\".\n\n```bash\nfasmifra_fragment.py -i my_molecules.smi -o my_molecules_frags.smi\n```\n\nIf you fragment rather small molecules, you might want to use the -w option\nand pass a smaller recommended fragment weight than the default (150 Da).\n\n```\nusage: fasmifra_fragment.py [-h] [-i input.smi] [-o output.smi] [--seed SEED]\n                            [-n NB_PASSES] [-w FRAG_WEIGHT]\n\nfragment molecules (tag cleaved bonds)\n\noptional arguments:\n  -h, --help      show this help message and exit\n  -i input.smi    molecules input file\n  -o output.smi   fragments output file\n  --seed SEED     RNG seed\n  -n NB_PASSES    number of fragmentation passes\n  -w FRAG_WEIGHT  fragment weight (default=150Da)\n```\n\n# Generating molecules from fragments\n\n```bash\nfasmifra -n 100000 -i my_molecules_frags.smi -o my_molecules_gen.smi\n```\n\n```\nusage:\n  fasmifra\n  -n \u003cint\u003e: how many molecules to generate\n  -i \u003cfilename\u003e: smiles fragments input file\n  -o \u003cfilename\u003e: output file\n  [--seed \u003cint\u003e]: RNG seed\n  [--deep-smiles]: input/output molecules in DeepSMILES no-rings format\n```\n\n# FASMIFRA in the GuacaMol benchmark\n\n|Benchmark    |Random sampler|SMILES LSTM|Graph MCTS|AAE  |ORGAN|VAE  |FASMIFRA|Negative control|\n|-------------|--------------|-----------|----------|-----|-----|-----|--------|----------------|\n|Validity     |1.000         |0.959      |1.000     |0.822|0.379|0.870|1.000   |1.000           |\n|Uniqueness   |0.997         |1.000      |1.000     |1.000|0.841|0.999|0.994   |0.959           |\n|Novelty      |0.000         |0.912      |0.994     |0.998|0.687|0.974|0.702   |0.947           |\n|KL_divergence|0.998         |0.991      |0.522     |0.886|0.267|0.982|0.959   |0.855           |\n|FCD          |0.929         |0.913      |0.015    
 |0.529|0.000|0.863|0.814   |0.397           |\n\n# Bibliography\n\n[1] Berenger, F., Tsuda, K.\nMolecular generation by Fast Assembly of (Deep)SMILES fragments.\nJ Cheminform 13, 88 (2021). https://doi.org/10.1186/s13321-021-00566-4\n\n[2] O'Boyle, N., \u0026 Dalke, A. (2018).\n\"DeepSMILES: an adaptation of SMILES for use in machine-learning of chemical structures\".\nchemrxiv.org\n\n[3] Weininger, D. (1988). SMILES, a chemical language and information system.\n\"1. Introduction to methodology and encoding rules\".\nJournal of chemical information and computer sciences, 28(1), 31-36.\nhttps://doi.org/10.1021/ci00057a005\n\n[4] Klarich, K., Goldman, B., Kramer, T., Riley, P., \u0026 Walters, W. P. (2024). Thompson Sampling-An Efficient Method for Searching Ultralarge Synthesis on Demand Databases. Journal of Chemical Information and Modeling, 64(4), 1158-1171.\nhttps://doi.org/10.1021/acs.jcim.3c01790\n\n[5] Welford, B. P. (1962). Note on a method for calculating corrected sums of squares and products. Technometrics, 4(3), 419-420.\nhttps://doi.org/10.1080/00401706.1962.10490022\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funixjunkie%2Ffasmifra","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funixjunkie%2Ffasmifra","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funixjunkie%2Ffasmifra/lists"}