# Skperopt

A hyperopt wrapper - simplifying hyperparameter tuning with Scikit-learn style estimators.

Works with the classification evaluation metrics "f1", "auc" and "accuracy", as well as the regression metrics "rmse" and "mse".
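The "rmse" and "mse" options are the usual regression errors; a minimal sklearn-only sketch of what each scorer measures (assumption: skperopt computes them in the standard way):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# small synthetic regression problem
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pred = LinearRegression().fit(X_train, y_train).predict(X_test)

mse = mean_squared_error(y_test, pred)  # the "mse" scorer
rmse = np.sqrt(mse)                     # the "rmse" scorer
```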

## Installation:

```
pip install skperopt
```

## Usage:

Just pass in an estimator and a parameter grid, and hyperopt will do the rest. No need to define objectives or write hyperopt-specific parameter grids.

### Recipe (vanilla flavour):

- [x] Import skperopt
- [x] Initialize skperopt
- [x] Run skperopt.HyperSearch.search
- [x] Collect the results

Code example below.

```python
import numpy as np
import pandas as pd

import skperopt as sk

from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# generate classification data
data = make_classification(n_samples=1000, n_features=10, n_classes=2)
X = pd.DataFrame(data[0])
y = pd.DataFrame(data[1])

# init the classifier and the parameter search grid
kn = KNeighborsClassifier()
param = {"n_neighbors": [int(x) for x in np.linspace(1, 60, 30)],
         "leaf_size": [int(x) for x in np.linspace(1, 60, 30)],
         "p": [1, 2, 3, 4, 5, 10, 20],
         "algorithm": ["auto", "ball_tree", "kd_tree", "brute"],
         "weights": ["uniform", "distance"]}

# search the parameter grid
search = sk.HyperSearch(kn, X, y, params=param)
search.search()

# gather and apply the best parameters
kn.set_params(**search.best_params)

# view run results
print(search.stats)
```

## HyperSearch parameters

* **est** (*[sklearn estimator]* required)
> any sklearn-style estimator

* **X** (*[pandas DataFrame]* required)
> your training data (features)

* **y** (*[pandas DataFrame]* required)
> your training target values

* **params** (*[dictionary]* required)
> a parameter search grid

* **iters** (default 500 *[int]*)
> number of iterations to try before early stopping

* **time_to_search** (default None *[int]*)
> time in seconds to run for before early stopping (None = no time limit)

* **cv** (default 5 *[int]*)
> number of folds to use in cross-validation tests

* **cv_times** (default 1 *[int]*)
> number of times to perform cross-validation on a new random sample of the data; higher values decrease variance but increase run time

* **randomState** (default 10 *[int]*)
> random state for the data shuffling

* **scorer** (default "f1" *[str]*)
> type of evaluation metric to use - accepts classification "f1", "auc", "accuracy" or regression "rmse" and "mse"

* **verbose** (default 1 *[int]*)
> amount of verbosity: 0 = none, 1 = some, 2 = debug

* **random** (default False *[bool]*)
> whether the data should be shuffled during cross-validation

* **foldtype** (default "KFold" *[str]*)
> type of folds to use - accepts "KFold" and "Stratified"
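The `cv` and `cv_times` options combine as follows: each search iteration runs `cv`-fold cross-validation `cv_times` times, each time on freshly shuffled folds. A rough sklearn-only illustration of the idea (not skperopt's internal code):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=10)

cv, cv_times = 5, 2
scores = []
for repeat in range(cv_times):
    # reshuffle the folds each repeat, giving a new random split of the data
    folds = KFold(n_splits=cv, shuffle=True, random_state=repeat)
    scores.extend(cross_val_score(KNeighborsClassifier(), X, y,
                                  cv=folds, scoring="f1"))

# averaging over cv * cv_times folds reduces the variance of the estimate
mean_score = np.mean(scores)
```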

## HyperSearch methods

* **HyperSearch.search()** (None)
> Used to search the parameter grid using hyperopt. No parameters need to be passed to the function. All parameters are set during initialization.

# Testing

Tested with 100 runs of 150 search iterations each, for both RandomSearch and skperopt searches.

Skperopt (hyperopt) performs better than a RandomSearch, producing a higher average f1 score with a smaller standard deviation.

![alt chart](./images/skperopt.png "Chart")

### Skperopt Search Results

f1 score over 100 test runs:

> Mean **0.9340930**

> Standard deviation **0.0062275**

### Random Search Results

f1 score over 100 test runs:

> Mean **0.927461652**

> Standard deviation **0.0063314**
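For comparison, a RandomSearch baseline like the one benchmarked above can be run with sklearn's `RandomizedSearchCV`; a sketch using part of the grid from the usage example (the exact benchmark settings are an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=10)

param = {"n_neighbors": list(range(1, 61)),
         "p": [1, 2, 3, 4, 5, 10, 20],
         "weights": ["uniform", "distance"]}

# 150 random draws from the grid, scored by f1 - mirrors the 150-iteration setup
rand = RandomizedSearchCV(KNeighborsClassifier(), param, n_iter=150,
                          scoring="f1", cv=5, random_state=10)
rand.fit(X, y)

print(rand.best_score_, rand.best_params_)
```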

----------------------------------------------------------------------------

## Updates

### V0.0.73

* Added `cv_times` attribute - runs the cross-validation n times (e.g. cv of 5 with cv_times of 5 gives 25 folds), each iteration on a new randomly sampled data set; this should reduce overfitting

### V0.0.7

* Added **FIXED** RMSE eval metric

* Added MSE eval metric