Bayesian optimization of RF via scikit in KNIME
https://github.com/filipspl/bayesian-rf-knime-scikit

Host: GitHub
URL: https://github.com/filipspl/bayesian-rf-knime-scikit
Owner: filipsPL
Created: 2019-11-08T11:12:56.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2019-11-08T13:09:56.000Z (about 5 years ago)
Last Synced: 2024-11-12T13:24:59.695Z (3 months ago)
Topics: bayesian-optimization, knime, knime-analytics-platform, python, random-forest, scikit-learn
Language: Python
Size: 1.09 MB
Stars: 1
Watchers: 3
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# bayesian-rf-knime-scikit

Bayesian optimization of random forest hyperparameters with scikit-learn, for use in the KNIME Python Learner node. Based on https://github.com/fmfn/BayesianOptimization/ by fmfn.

Prerequisites:
- `pip install bayesian-optimization`

Why?
1. The KNIME Parameter Optimization Loop node(s), including the Bayesian optimization option, don't work as expected for some data.
2. You may want to use scikit-learn instead of the KNIME or Weka implementation.
3. You can adapt this workflow to optimize other parameters for many different scikit-learn algorithms.

## Setup

- in the Python node, select Python 2
- copy and paste the Python code into the code window of the Python Learner (`python-learner.py`) and Python Predictor (`python-predictor.py`) nodes
- sample workflow:

![](obrazki/README-94e22874.png)

- in the input table, the class should be in the last column
- fine-tuning: edit the variables at the top of `python-learner.py`:

```python
#
# Bounded region of parameter space
#

parameterDict = {
    'n_estimators': (100, 1200),
    'max_depth': (5, 30),
    'min_samples_split': (2, 100),
    'min_samples_leaf': (1, 10),
}

#
# bayesian configuration
#

init_points = 5
n_iter = 20
```
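
How these variables are consumed is not shown in this README. The following is a minimal sketch, assuming the `bayes_opt` API (`BayesianOptimization`, `maximize`, `max`) and a cross-validated scikit-learn score as the objective. The helper names `to_rf_params` and `run_search`, the 5-fold cross-validation, and the ROC AUC scoring are illustrative assumptions, not taken from `python-learner.py`. Because the optimizer samples continuous values, the sketch rounds each suggestion to an integer before building the forest.

```python
# Illustrative sketch only: helper names and scoring choices are assumptions.

parameterDict = {
    'n_estimators': (100, 1200),
    'max_depth': (5, 30),
    'min_samples_split': (2, 100),
    'min_samples_leaf': (1, 10),
}
init_points = 5
n_iter = 20


def to_rf_params(**suggested):
    """Round the optimizer's float suggestions to integers for the forest."""
    return {name: int(round(value)) for name, value in suggested.items()}


def run_search(X, y, bounds=parameterDict, init_points=init_points, n_iter=n_iter):
    """Maximize a cross-validated score over the bounded region (hypothetical helper)."""
    # Imported here so to_rf_params can be used without these packages installed.
    from bayes_opt import BayesianOptimization
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def objective(**suggested):
        model = RandomForestClassifier(**to_rf_params(**suggested))
        # Assumed metric: mean ROC AUC over 5 folds (binary class labels).
        return cross_val_score(model, X, y, cv=5, scoring='roc_auc').mean()

    optimizer = BayesianOptimization(f=objective, pbounds=bounds, random_state=1)
    optimizer.maximize(init_points=init_points, n_iter=n_iter)
    best = optimizer.max  # dict with the keys 'target' and 'params'
    return to_rf_params(**best['params']), best['target']
```

With `init_points = 5` and `n_iter = 20`, a search like this evaluates the objective 25 times: 5 random probes followed by 20 guided steps. Widening the bounds or raising `n_iter` trades run time for a more thorough search.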

- please note: after slight modifications, the scripts can also be run from the command line
- a sample data file is provided (`nr-ahr-lite.csv` from my [tox21 dataset](https://github.com/filipsPL/tox21_dataset))
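
For the command-line case, the input table has to be read from a file instead of the KNIME input port. Below is a minimal Python 3 sketch using only the standard library. It assumes a CSV file with a header row, numeric feature columns, and the class in the last column, as required above. The function name `load_table` and the header assumption are hypothetical; the actual layout of `nr-ahr-lite.csv` is not described here.

```python
import csv


def load_table(path):
    """Read a CSV file and split it into feature names, features, and class labels.

    Assumes a header row, numeric features, and the class in the last column.
    """
    with open(path, newline='') as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines
    X = [[float(value) for value in row[:-1]] for row in rows]
    y = [row[-1] for row in rows]
    return header[:-1], X, y
```

If a data file matches these assumptions, `feature_names, X, y = load_table('my-data.csv')` yields lists that scikit-learn estimators accept directly (`my-data.csv` is a placeholder name).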

## Standard output

Among the training progress output, a summary of the best parameters found is displayed:

```
Best params: {'min_samples_split': 2, 'n_estimators': 205, 'max_depth': 30, 'min_samples_leaf': 1}
Best target value: 0.837006427916
```

## ROC output (ROC curve node)

![](obrazki/README-5f63414c.png)