Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/llnl/smallmoleval
Using machine learning to score potential drug candidates may offer an advantage over traditional imprecise scoring functions because the parameters and model structure can be learned from the data. However, models may lack interpretability, are often overfit to the data, and are not generalizable to drug targets and chemotypes not in the training data. Benchmark datasets are prone to artificial enrichment and analogue bias due to the overrepresentation of certain scaffolds in experimentally determined active sets. Datasets can be evaluated using spatial statistics to quantify the dataset topology and better understand potential biases. Dataset clumping comprises a combination of self-similarity of actives and separation from decoys in chemical space and is associated with overoptimistic virtual screening results. This code explores methods of quantifying potential biases and examines some common benchmark datasets.
machine-learning python statistics
- Host: GitHub
- URL: https://github.com/llnl/smallmoleval
- Owner: LLNL
- License: mit
- Created: 2018-10-04T22:46:28.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-06-19T18:44:13.000Z (over 5 years ago)
- Last Synced: 2024-11-11T21:38:50.670Z (3 months ago)
- Topics: machine-learning, python, statistics
- Language: Python
- Homepage:
- Size: 24.4 KB
- Stars: 3
- Watchers: 7
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
SmallMolEval
----------------
Using machine learning to score potential drug candidates may offer an advantage over traditional imprecise scoring functions because the parameters and model structure can be learned from the data. However, models may lack interpretability, are often overfit to the data, and are not generalizable to drug targets and chemotypes not in the training data. Benchmark datasets are prone to artificial enrichment and analogue bias due to the overrepresentation of certain scaffolds in experimentally determined active sets. Datasets can be evaluated using spatial statistics to quantify the dataset topology and better understand potential biases. Dataset clumping comprises a combination of self-similarity of actives and separation from decoys in chemical space and is associated with overoptimistic virtual screening results. This code explores methods of quantifying potential biases and examines some common benchmark datasets.
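The clumping measure described above can be illustrated with a minimal, hypothetical sketch: compare each active's nearest-neighbor Tanimoto similarity to the other actives against its nearest-neighbor similarity to the decoys. This is not the code in main.py or DescriptorSets.py; the helper names, fingerprint settings, and example SMILES below are assumptions for illustration only.

```python
# Minimal sketch (not this repository's implementation): quantify dataset
# "clumping" by comparing nearest-neighbor Tanimoto similarities of actives
# to other actives versus actives to decoys.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def fingerprints(smiles_list):
    """Morgan fingerprints (radius 2, 2048 bits) for a list of SMILES."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    return [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048)
            for m in mols if m is not None]

def nn_similarity(query_fps, ref_fps, exclude_self=False):
    """Nearest-neighbor Tanimoto similarity of each query against a reference set."""
    nn = []
    for i, q in enumerate(query_fps):
        sims = DataStructs.BulkTanimotoSimilarity(q, ref_fps)
        if exclude_self:
            sims[i] = 0.0  # skip the self-comparison when query and reference sets coincide
        nn.append(max(sims))
    return np.array(nn)

# Hypothetical inputs; in practice these come from a benchmark such as DUDE or MUV.
actives = ["CCOc1ccc2nc(S(N)(=O)=O)sc2c1", "Cc1ccc(cc1)S(=O)(=O)N"]
decoys = ["c1ccccc1", "CCO", "CCN(CC)CC"]

act_fps, dec_fps = fingerprints(actives), fingerprints(decoys)
aa = nn_similarity(act_fps, act_fps, exclude_self=True)  # active-to-active self-similarity
ad = nn_similarity(act_fps, dec_fps)                     # active-to-decoy separation
print("mean NN(active, active):", aa.mean())
print("mean NN(active, decoy): ", ad.mean())
# A large gap (aa >> ad) indicates clumping that can inflate virtual screening metrics.
```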
Documentation
----------------
| File | Description |
| ---- | ----------- |
| remove_AVE_bias2.py | slight modification of the Atomwise script to split data |
| run_remove_AVE_bias.py | example of running remove_AVE_bias2.py on the DUDE dataset |
| main.py, main_activeonly.py, main.old.py | scripts that run the MUV spatial statistics |
| DescriptorSets.py | mostly contains functions used by the MUV statistics and called in the main files |
| gf.plot | plots the MUV statistics |
| makegraphs.py | uses gf.plots to make whole-dataset plots |
| analyze_AVE_bias.py | no revisions from Atomwise; computes the bias score and AUC of ligand-based models (see the sketch after this table) |
| aveanalyze.py | runs analyze_AVE_bias.py for a directory of multiple directories containing splits on different receptors |
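The bias score mentioned for analyze_AVE_bias.py follows the Atomwise AVE idea: a train/validation split is biased when validation actives sit closer to training actives than to training decoys, and validation decoys sit closer to training decoys than to training actives. The function below is a simplified sketch of that quantity using mean nearest-neighbor similarities and the hypothetical nn_similarity() helper from the sketch above; it is not the repository's or Atomwise's implementation, which aggregates threshold-based nearest-neighbor functions.

```python
# Simplified, illustrative AVE-style bias (not analyze_AVE_bias.py itself).
# Reuses the hypothetical nn_similarity() helper defined in the sketch above;
# all arguments are lists of RDKit fingerprints.
def ave_bias(val_act_fps, val_dec_fps, train_act_fps, train_dec_fps):
    """Bias ~ (AA - AD) + (DD - DA); values near zero suggest an unbiased split,
    large positive values suggest the split rewards memorization."""
    aa = nn_similarity(val_act_fps, train_act_fps).mean()  # val actives vs. train actives
    ad = nn_similarity(val_act_fps, train_dec_fps).mean()  # val actives vs. train decoys
    dd = nn_similarity(val_dec_fps, train_dec_fps).mean()  # val decoys vs. train decoys
    da = nn_similarity(val_dec_fps, train_act_fps).mean()  # val decoys vs. train actives
    return (aa - ad) + (dd - da)
```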
Authors
----------------

SmallMolEval was written by Dr. Sally Ellingson.
Release
----------------

SmallMolEval is released under an MIT license. For more details see the
NOTICE and LICENSE files.

``LLNL-CODE-759342``