https://github.com/filipspl/bayesian-svm-knime-scikit
Bayesian Optimization of SVM parameters with scikit-learn to be used in KNIME in Python-learner node
bayesian-optimization knime machine-learning python svm
- Host: GitHub
- URL: https://github.com/filipspl/bayesian-svm-knime-scikit
- Owner: filipsPL
- Created: 2019-10-22T08:40:28.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-10-22T13:57:57.000Z (over 6 years ago)
- Last Synced: 2025-01-11T12:49:18.047Z (over 1 year ago)
- Topics: bayesian-optimization, knime, machine-learning, python, svm
- Language: Python
- Size: 1.05 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# bayesian-svm-knime-scikit
Bayesian optimization of the SVM parameters C and gamma with scikit-learn, for use in the KNIME Python Learner node. Based on the [optimization functions by thuijskens](https://github.com/thuijskens/bayesian-optimization).
Why?
1. The Parameter Optimization Loop node(s), including the Bayesian optimization strategy, don't work as expected for some data.
2. You may want to use scikit-learn instead of the KNIME or Weka implementation.
3. You can adapt this workflow to optimize other parameters for many different scikit-learn algorithms.
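
As a rough illustration of point 3, the sketch below swaps the SVM for a random forest. It is not taken from this repository: the data are synthetic, and the names `rf_bounds` and `rf_loss` and the chosen parameter ranges are hypothetical. It assumes a current Python 3 environment with scikit-learn, run outside KNIME.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the KNIME input table (assumption, for illustration only)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical search space: n_estimators in [10, 100], max_depth in [2, 10]
rf_bounds = np.array([[10, 100], [2, 10]])


def rf_loss(params):
    """Mean cross-validated AUROC for params = [n_estimators, max_depth]."""
    model = RandomForestClassifier(
        n_estimators=int(round(params[0])),  # optimizer proposes floats; round to int
        max_depth=int(round(params[1])),
        random_state=0,
    )
    return cross_val_score(model, X, y, scoring="roc_auc", cv=3).mean()


# Evaluate one point inside the search space
score = rf_loss([50, 5])
print("CV AUROC", score)
```

Only the objective function and the bounds array change; the optimizer would call `rf_loss` in the same way it calls the SVM objective.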
## Setup
- in the Python node, select Python 2
- copy and paste the Python code into the code windows of the Python Learner (`python-learner.py`) and the Python Predictor (`python-predictor.py`)
- sample workflow:

- fine-tuning: edit the variables at the top of `python-learner.py`:
```python
# search bounds for log10(C) and log10(gamma)
# from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4534515/
# - log10(C) in [-2, 5]
# - log10(gamma) in [-10, 3]
bounds = np.array([[-2, 5], [-10, 3]])
# number of iterations of the Bayesian optimizer
n_iters = 50
# number of initial samples to evaluate before the optimization starts
n_pre_samples = 10
```
- please note: after slight modifications, the scripts can be run from the command line
- a sample data file is provided (`nr-ahr-lite.csv`, from my [tox21 dataset](https://github.com/filipsPL/tox21_dataset))
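
The bounds above are given in log10 space, so each candidate point has to be converted back to C and gamma before the SVM is fitted. The following is a minimal sketch of that step, not the code from `python-learner.py`: it uses synthetic data, the helper name `sample_loss` is an assumption, and only the random pre-sampling stage is shown (the Bayesian optimization loop itself is omitted). It assumes a current Python 3 environment with scikit-learn, run outside KNIME.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the KNIME input table (assumption, for illustration only)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

bounds = np.array([[-2, 5], [-10, 3]])  # rows: log10(C), log10(gamma)
n_pre_samples = 10


def sample_loss(params):
    """Mean cross-validated AUROC for params = [log10(C), log10(gamma)]."""
    C = 10.0 ** params[0]
    gamma = 10.0 ** params[1]
    model = SVC(C=C, gamma=gamma, random_state=12345)
    return cross_val_score(model, X, y, scoring="roc_auc", cv=3).mean()


# Draw the initial samples uniformly within the bounds and score each one
rng = np.random.RandomState(0)
pre_samples = rng.uniform(bounds[:, 0], bounds[:, 1],
                          size=(n_pre_samples, bounds.shape[0]))
scores = np.array([sample_loss(p) for p in pre_samples])

# Report the best point found so far, converted back from log10 space
best = pre_samples[np.argmax(scores)]
best_score = scores.max()
print("best C", 10.0 ** best[0])
print("best gamma", 10.0 ** best[1])
print("best AUROC", best_score)
```

A Bayesian optimizer would use these initial scores to fit its surrogate model and then propose `n_iters` further points within the same bounds.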
## Sample output
- standard output from the Python Learner gives you C, gamma and CV AUROC values:
```
best C 82404.4422051
best gamma 1.01295459839e-10
best AUROC 0.793847566575
```
- output ROC (from the ROC Curve node):
