https://github.com/lewis-morris/skperopt
A hyperopt wrapper - simplifying hyperparameter tuning with Scikit-learn style estimators.
- Host: GitHub
- URL: https://github.com/lewis-morris/skperopt
- Owner: lewis-morris
- License: MIT
- Created: 2020-01-19T11:44:45.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-04-04T04:07:28.000Z (over 4 years ago)
- Last Synced: 2024-10-29T07:08:30.130Z (about 2 months ago)
- Topics: accuracy, auc, f1-score, hyperopt, hyperopt-wrapper, pandas-dataframe, randomsearch, rmse, sklearn-estimator
- Language: Python
- Homepage:
- Size: 2.61 MB
- Stars: 5
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Skperopt
A hyperopt wrapper - simplifying hyperparameter tuning with scikit-learn style estimators. Works with the classification evaluation metrics "f1", "auc", and "accuracy", as well as the regression metrics "rmse" and "mse".
## Installation:
```
pip install skperopt
```

## Usage:
Just pass in an estimator and a parameter grid, and Hyperopt will do the rest. No need to define objectives or write hyperopt-specific parameter grids.
### Recipe (vanilla flavour):
- [x] Import skperopt
- [x] Initalize skperopt
- [x] Run skperopt.HyperSearch.search
- [x] Collect the results

Code example below.
```python
import skperopt as sk
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# generate classification data
data = make_classification(n_samples=1000, n_features=10, n_classes=2)
X = pd.DataFrame(data[0])
y = pd.DataFrame(data[1])

# init the classifier
kn = KNeighborsClassifier()
param = {"n_neighbors": [int(x) for x in np.linspace(1, 60, 30)],
         "leaf_size": [int(x) for x in np.linspace(1, 60, 30)],
         "p": [1, 2, 3, 4, 5, 10, 20],
         "algorithm": ['auto', 'ball_tree', 'kd_tree', 'brute'],
         "weights": ["uniform", "distance"]}

# search parameters
search = sk.HyperSearch(kn, X, y, params=param)
search.search()

# gather and apply the best parameters
kn.set_params(**search.best_params)

# view run results
print(search.stats)
```
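For comparison, the same kind of grid can be searched with scikit-learn's built-in `RandomizedSearchCV` - the random-search baseline that the Testing section below measures against. This is a minimal sketch, not part of skperopt itself; the data and grid are kept small for speed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# generate the same style of classification data
X, y = make_classification(n_samples=200, n_features=10, n_classes=2,
                           random_state=0)

kn = KNeighborsClassifier()
param = {"n_neighbors": [int(x) for x in np.linspace(1, 60, 30)],
         "p": [1, 2],
         "weights": ["uniform", "distance"]}

# random-search baseline: sample 20 parameter combinations, scored by f1
rs = RandomizedSearchCV(kn, param, n_iter=20, scoring="f1", cv=5,
                        random_state=0)
rs.fit(X, y)

# apply the best parameters found, as with skperopt above
kn.set_params(**rs.best_params_)
print(rs.best_score_)
```

Unlike hyperopt, `RandomizedSearchCV` samples the grid blindly rather than modelling which regions score well, which is the difference the benchmark below quantifies.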
## HyperSearch parameters

* **est** (*[sklearn estimator]* required)
> any sklearn style estimator

* **X** (*[pandas DataFrame]* required)
> your training data

* **y** (*[pandas DataFrame]* required)
> your training labels

* **params** (*[dictionary]* required)
> a parameter search grid

* **iters** (default 500 *[int]*)
> number of iterations to try before early stopping

* **time_to_search** (default None *[int]*)
> time in seconds to run for before early stopping (None = no time limit)

* **cv** (default 5 *[int]*)
> number of folds to use in cross-validation tests

* **cv_times** (default 1 *[int]*)
> number of times to perform cross validation on a new random sample of the data - higher values decrease variance but increase run time

* **randomState** (default 10 *[int]*)
> random state for the data shuffling

* **scorer** (default "f1" *[str]*)
> evaluation metric to use - accepts classification "f1", "auc", "accuracy" or regression "rmse" and "mse"

* **verbose** (default 1 *[int]*)
> amount of verbosity: 0 = none, 1 = some, 2 = debug

* **random** (default False *[bool]*)
> should the data be randomized during the cross validation

* **foldtype** (default "KFold" *[str]*)
> type of folds to use - accepts "KFold", "Stratified"

## HyperSearch methods
* **HyperSearch.search()** (None)
> Used to search the parameter grid using hyperopt. No parameters need to be passed to the function; all parameters are set during initialization.

# Testing
Benchmarked with 100 test runs of 150 search iterations each, for both RandomSearch and Skperopt searches.
Skperopt (hyperopt) performs better than RandomSearch, producing a higher average f1 score with a smaller standard deviation.
![alt chart](./images/skperopt.png "Chart")
### Skperopt Search Results
f1 score over 100 test runs:
> Mean **0.9340930**
> Standard deviation **0.0062275**
### Random Search Results
f1 score over 100 test runs:
> Mean **0.927461652**
> Standard deviation **0.0063314**
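Using only the means and standard deviations reported above, the gap between the two searches can be expressed as a rough effect size. This is a sketch based solely on those published numbers; it assumes the per-run scores are roughly comparable in spread.

```python
import math

# reported f1 statistics over 100 runs (from the tables above)
skperopt_mean, skperopt_std = 0.9340930, 0.0062275
random_mean, random_std = 0.927461652, 0.0063314

# difference in mean f1 and a pooled standard deviation
diff = skperopt_mean - random_mean
pooled = math.sqrt((skperopt_std ** 2 + random_std ** 2) / 2)

# Cohen's-d style effect size: the gap is about one standard deviation
d = diff / pooled
print(round(diff, 4), round(d, 2))
```

So while the absolute f1 improvement looks small, it is on the order of a full standard deviation of the run-to-run noise.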
----------------------------------------------------------------------------
## Updates
### V0.0.73
* Added cv_times attr - runs the cross validation n times (i.e. cv (5x5)) each iteration on a new randomly sampled data set; this should reduce overfitting

### V0.0.7
* Added **FIXED** RMSE eval metric
* Added MSE eval metric
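For reference, rmse is simply the square root of mse, reported in the same units as the target. A minimal sketch of how the two regression metrics relate, using scikit-learn's `mean_squared_error` (skperopt's internal implementation may differ):

```python
import math
from sklearn.metrics import mean_squared_error

# toy regression targets and predictions
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true, y_pred)  # mean of squared errors
rmse = math.sqrt(mse)                     # same units as the target
print(mse, rmse)
```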