https://github.com/ffstghc/caco2ml

Main code chunks used for models in the publication "Exploring the Potential of Adaptive, Local Machine Learning (ML) in Comparison to the Prediction Performance of Global Models: A Case Study from Bayer's Caco-2 Permeability Database"

caco-2 local-models machine-learning pharmacokinetics scikit-learn

Host: GitHub
URL: https://github.com/ffstghc/caco2ml
Owner: ffstghc
Created: 2024-10-16T21:21:26.000Z (4 months ago)
Default Branch: main
Last Pushed: 2024-11-21T10:11:09.000Z (3 months ago)
Last Synced: 2024-12-19T02:23:10.626Z (about 2 months ago)
Topics: caco-2, local-models, machine-learning, pharmacokinetics, scikit-learn
Language: Python
Homepage:
Size: 27.3 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# "Exploring the Potential of Adaptive, Local Machine Learning (ML) in Comparison to the Prediction Performance of Global Models: A Case Study from Bayer's Caco-2 Permeability Database"
## _American Chemical Society (ACS): Journal of Chemical Information and Modeling (JCIM)_
### **_Frank Filip Steinbauer, Thorsten Lehr, Andreas Reichel_**
### http://pubs.acs.org/doi/abs/10.1021/acs.jcim.4c01083

Repository archiving the main code chunks used for the local and global machine learning models in the publication **_"Exploring the Potential of Adaptive, Local Machine Learning (ML) in Comparison to the Prediction Performance of Global Models: A Case Study from Bayer's Caco-2 Permeability Database"_**, published in 2024 in the **_ACS Journal of Chemical Information and Modeling (JCIM)_** as the first publication of my doctoral studies at Bayer.

The five included files contain the main code chunks for:
1. Data preparation (SMILES/molecule object standardization; PaDEL descriptor calculation)
2. Global models (including other descriptor calculations, recursive feature elimination with cross-validation, and external TDC benchmarking^[1])
3. Local model (training data selection via a fixed Tanimoto similarity criterion)
4. Local model (training data selection via a fixed number of most similar structures)
5. Local model (training data selection via kNN^[2] as a control and proof of superiority of the chosen Tanimoto similarity approach)
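
The two Tanimoto-based selection strategies from items 3 and 4 can be illustrated with a minimal sketch. It is not the code from the repository: fingerprints are assumed to be represented as plain Python sets of on-bit indices, and the function names, molecule labels, threshold, and `n` are hypothetical values chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(
    query: Set[int], training: Dict[str, Set[int]]
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs, most similar training compound first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in training.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def select_by_threshold(
    query: Set[int], training: Dict[str, Set[int]], threshold: float
) -> List[str]:
    """Item 3: keep every training compound at or above a fixed similarity."""
    return [
        name
        for name, sim in rank_by_similarity(query, training)
        if sim >= threshold
    ]


def select_top_n(
    query: Set[int], training: Dict[str, Set[int]], n: int
) -> List[str]:
    """Item 4: keep a fixed number of the most similar training compounds."""
    return [name for name, _ in rank_by_similarity(query, training)[:n]]


# Toy fingerprints (hypothetical data, not taken from the publication).
training = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 9},
    "mol_c": {7, 8, 9},
}
query = {1, 2, 3, 4, 5}

print(select_by_threshold(query, training, threshold=0.7))
print(select_top_n(query, training, n=2))
```

In this sketch the threshold variant can return any number of compounds, including none, while the top-n variant always returns a training set of the same size regardless of how similar its members are to the query. A local model would then be fitted on the selected compounds only.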

If you have further questions or need additional parts of the utilized code for your own studies, feel free to contact [email protected].

[1]: https://tdcommons.ai/single_pred_tasks/adme#caco-2-cell-effective-permeability-wang-et-al
[2]: https://scikit-learn.org/dev/modules/generated/sklearn.neighbors.KNeighborsClassifier.html