Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ray-project/tune-sklearn
A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting edge hyperparameter tuning techniques.
automl bayesian-optimization gridsearchcv hyperparameter-tuning scikit-learn
- Host: GitHub
- URL: https://github.com/ray-project/tune-sklearn
- Owner: ray-project
- License: apache-2.0
- Archived: true
- Created: 2019-11-28T04:09:21.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-11-06T19:55:38.000Z (about 1 year ago)
- Last Synced: 2025-01-16T10:14:28.392Z (6 days ago)
- Topics: automl, bayesian-optimization, gridsearchcv, hyperparameter-tuning, scikit-learn
- Language: Python
- Homepage: https://docs.ray.io/en/master/tune/api_docs/sklearn.html
- Size: 13.8 MB
- Stars: 467
- Watchers: 17
- Forks: 51
- Open Issues: 35
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - ray-project/tune-sklearn - tune-sklearn is a drop-in replacement for Scikit-Learn's model selection module (GridSearchCV, RandomizedSearchCV) that uses cutting-edge hyperparameter tuning techniques. It is compatible with the Scikit-Learn API and requires only small code changes to use. tune-sklearn supports optimization techniques such as Bayesian optimization, HyperBand, and BOHB, and leverages Ray Tune for distributed hyperparameter tuning, parallelizing cross-validation across multiple cores and machines. tune-sklearn supports Scikit-Learn models as well as frameworks with Scikit-Learn wrappers such as Skorch (PyTorch), KerasClassifier (Keras), and XGBoostClassifier (XGBoost). For certain estimators, tune-sklearn can enable incremental training and early stopping, for example estimators that support `warm_start`, estimators that support partial fit, and XGBoost, LightGBM, and CatBoost models. (parameter optimization)
README
# tune-sklearn
[![pytest](https://github.com/ray-project/tune-sklearn/workflows/Development/badge.svg)](https://github.com/ray-project/tune-sklearn/actions?query=workflow%3A%22Development%22)

Tune-sklearn is a drop-in replacement for Scikit-Learn’s model selection module (GridSearchCV, RandomizedSearchCV) with cutting edge hyperparameter tuning techniques.
## ⚠️ `tune-sklearn` is no longer being maintained
The latest release `0.5.0` is the last version of the library that the Ray team will release, and it is compatible with `ray>=2.7.x, ray<=2.9.x`. The library is not guaranteed to work with future Ray versions.

The recommended alternative for keeping up with the latest version of Ray is to migrate `tune-sklearn` usage to the [Ray Tune APIs](https://docs.ray.io/en/latest/tune/getting-started.html), which accomplish the same thing.

Feel free to post an issue on the [Ray GitHub](https://github.com/ray-project/ray) if you run into any issues while migrating.
## Features
Here’s what tune-sklearn has to offer:

* **Consistency with Scikit-Learn API**: Change less than 5 lines in a standard Scikit-Learn script to use the API [[example](https://github.com/ray-project/tune-sklearn/blob/master/examples/random_forest.py)] (see the sketch just after this list).
* **Modern tuning techniques**: tune-sklearn allows you to easily leverage Bayesian Optimization, HyperBand, BOHB, and other optimization techniques by simply toggling a few parameters.
* **Framework support**: tune-sklearn is used primarily for tuning Scikit-Learn models, but it also supports and provides examples for many other frameworks with Scikit-Learn wrappers such as Skorch (Pytorch) [[example](https://github.com/ray-project/tune-sklearn/blob/master/examples/torch_nn.py)], KerasClassifier (Keras) [[example](https://github.com/ray-project/tune-sklearn/blob/master/examples/keras_example.py)], and XGBoostClassifier (XGBoost) [[example](https://github.com/ray-project/tune-sklearn/blob/master/examples/xgbclassifier.py)].
* **Scale up**: Tune-sklearn leverages [Ray Tune](http://tune.io/), a library for distributed hyperparameter tuning, to parallelize cross validation on multiple cores and even multiple machines without changing your code.

Check out our [API Documentation](docs) and [Walkthrough](https://docs.ray.io/en/master/tune/examples/tune-sklearn.html) (for the `master` branch).
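As a minimal sketch of that drop-in swap (toy data for illustration; full examples are below), only the import and class name change relative to a standard Scikit-Learn script:

```python
# from sklearn.model_selection import GridSearchCV  # scikit-learn version
from tune_sklearn import TuneGridSearchCV           # tune-sklearn version

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Same constructor arguments and fit/predict interface as GridSearchCV
search = TuneGridSearchCV(SGDClassifier(), {"alpha": [1e-4, 1e-2]})
search.fit(X, y)
print(search.best_params_)
```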
## Installation
### Dependencies
- numpy (>=1.16)
- [ray](http://docs.ray.io/) (>=2.7.0)
- scikit-learn (>=0.23)

### User Installation
`pip install tune-sklearn ray[tune]`
or
`pip install -U git+https://github.com/ray-project/tune-sklearn.git && pip install 'ray[tune]'`
### Tune-sklearn Early Stopping
For certain estimators, tune-sklearn can also immediately enable **incremental training and early stopping**. Such estimators include:
* Estimators that implement `warm_start` (except for ensemble classifiers and decision trees)
* Estimators that implement `partial_fit`
* [XGBoost](https://github.com/dmlc/xgboost/issues/1686), LightGBM and [CatBoost](https://catboost.ai/docs/concepts/python-reference_train.html?lang=en) models (via incremental learning)

To read more about compatible scikit-learn models, see [scikit-learn's documentation at section 8.1.1.3](https://scikit-learn.org/stable/modules/computing.html#strategies-to-scale-computationally-bigger-data).
Early stopping algorithms that can be enabled include HyperBand and Median Stopping (see below for examples).
If the estimator does not support `partial_fit`, a warning is shown saying that early stopping cannot be performed, and cross-validation simply runs on Ray's parallel back-end.
Apart from early stopping scheduling algorithms, tune-sklearn also supports passing custom stoppers to Ray Tune. These
can be passed via the `stopper` argument when instantiating `TuneSearchCV` or `TuneGridSearchCV`.
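For example, a minimal sketch using one of Ray Tune's built-in stoppers (`MaximumIterationStopper` here; toy data, and the set of available stoppers depends on your Ray version):

```python
from ray.tune.stopper import MaximumIterationStopper
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from tune_sklearn import TuneSearchCV

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

search = TuneSearchCV(
    SGDClassifier(),
    param_distributions={"alpha": [1e-4, 1e-2, 1e-1]},
    n_trials=3,
    max_iters=10,
    stopper=MaximumIterationStopper(max_iter=5),  # end each trial after 5 training iterations
)
search.fit(X, y)
```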
See [the Ray documentation for an overview of available stoppers](https://docs.ray.io/en/master/tune/api_docs/stoppers.html).

## Examples
#### [TuneGridSearchCV](docs/tune_gridsearch.md)
To start out, it’s as easy as changing our import statement to get Tune’s grid search cross validation interface, and the rest is almost identical!

`TuneGridSearchCV` accepts dictionaries in the format `{ param_name: str : distribution: list }` or a list of such dictionaries, just like scikit-learn's `GridSearchCV`. The distribution can also be the output of Ray Tune's [`tune.grid_search`](https://docs.ray.io/en/master/tune/api/search_space.html).
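For instance, the list-valued grid used in the example below could equivalently be written with `tune.grid_search`:

```python
from ray import tune

# Equivalent to {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}
parameters = {
    "alpha": tune.grid_search([1e-4, 1e-1, 1]),
    "epsilon": tune.grid_search([0.01, 0.1]),
}
```

The full example below uses the plain list format: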
```python
# from sklearn.model_selection import GridSearchCV
from tune_sklearn import TuneGridSearchCV

# Other imports
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier

# Set training and validation sets
X, y = make_classification(n_samples=11000, n_features=1000, n_informative=50, n_redundant=0, n_classes=10, class_sep=2.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000)

# Example parameters to tune from SGDClassifier
parameters = {
    'alpha': [1e-4, 1e-1, 1],
    'epsilon': [0.01, 0.1]
}

tune_search = TuneGridSearchCV(
    SGDClassifier(),
    parameters,
    early_stopping="MedianStoppingRule",
    max_iters=10
)

import time  # Just to compare fit times
start = time.time()
tune_search.fit(X_train, y_train)
end = time.time()
print("Tune Fit Time:", end - start)

pred = tune_search.predict(X_test)
accuracy = np.count_nonzero(np.array(pred) == np.array(y_test)) / len(pred)
print("Tune Accuracy:", accuracy)
```

If you'd like to compare fit times with sklearn's `GridSearchCV`, run the following block of code:
```python
from sklearn.model_selection import GridSearchCV
# n_jobs=-1 enables use of all cores like Tune does
sklearn_search = GridSearchCV(
SGDClassifier(),
parameters,
n_jobs=-1
)

start = time.time()
sklearn_search.fit(X_train, y_train)
end = time.time()
print("Sklearn Fit Time:", end - start)
pred = sklearn_search.predict(X_test)
accuracy = np.count_nonzero(np.array(pred) == np.array(y_test)) / len(pred)
print("Sklearn Accuracy:", accuracy)
```

#### [TuneSearchCV](docs/tune_search.md)
`TuneSearchCV` is an upgraded version of scikit-learn's `RandomizedSearchCV`.
It also provides a wrapper for several search optimization algorithms from Ray Tune's [searchers](https://docs.ray.io/en/master/tune/api/suggestion.html), which in turn are wrappers for other libraries. The selection of the search algorithm is controlled by the `search_optimization` parameter. In order to use other algorithms, you need to install the libraries they depend on (`pip install` column). The search algorithms are as follows:
| Algorithm | `search_optimization` value | Summary | Website | `pip install` |
|--------------------|-----------------------------|------------------------|---------------------------------------------------------|--------------------------|
| (Random Search) | `"random"` | Randomized Search | | built-in |
| SkoptSearch | `"bayesian"` | Bayesian Optimization | [[Scikit-Optimize](https://scikit-optimize.github.io/)] | `scikit-optimize` |
| HyperOptSearch | `"hyperopt"` | Tree-Parzen Estimators | [[HyperOpt](http://hyperopt.github.io/hyperopt)] | `hyperopt` |
| TuneBOHB | `"bohb"` | Bayesian Opt/HyperBand | [[BOHB](https://github.com/automl/HpBandSter)] | `hpbandster ConfigSpace` |
| Optuna             | `"optuna"`                  | Tree-Parzen Estimators | [[Optuna](https://optuna.readthedocs.io/en/stable/)]    | `optuna`                 |

All algorithms other than RandomListSearcher accept parameter distributions in the form of dictionaries in the format `{ param_name: str : distribution: tuple or list }`.
Tuples represent real distributions and should be two-element or three-element, in the format `(lower_bound: float, upper_bound: float, Optional: "uniform" (default) or "log-uniform")`. Lists represent categorical distributions. [Ray Tune Search Spaces](https://docs.ray.io/en/master/tune/api/search_space.html) are also supported and provide a rich set of potential distributions. Search spaces allow users to specify complex, potentially nested search spaces and parameter distributions. Furthermore, each algorithm also accepts parameters in its own specific format. More information is available in the [Tune documentation](https://docs.ray.io/en/master/tune/api/suggestion.html).
Random Search (default) accepts dictionaries in the format `{ param_name: str : distribution: list }` or a list of such dictionaries, just like scikit-learn's `RandomizedSearchCV`.
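As a minimal sketch of that default random search (toy data for illustration; `search_optimization="random"` is the default, so it is omitted here):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from tune_sklearn import TuneSearchCV

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

random_search = TuneSearchCV(
    SGDClassifier(),
    param_distributions={"alpha": [1e-4, 1e-3, 1e-2, 1e-1]},  # values sampled from the list
    n_trials=3,
)
random_search.fit(X, y)
```

The block below shows the tuple and Ray Tune search space formats with the BOHB and HyperOpt searchers: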
```python
from tune_sklearn import TuneSearchCV

# Other imports
from ray import tune
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier

# Set training and validation sets
X, y = make_classification(n_samples=11000, n_features=1000, n_informative=50, n_redundant=0, n_classes=10, class_sep=2.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000)

# Example parameter distributions to tune from SGDClassifier
# Note the use of tuples instead of lists if non-random optimization is desired
param_dists = {
    'loss': ['squared_hinge', 'hinge'],
    'alpha': (1e-4, 1e-1, 'log-uniform'),
    'epsilon': (1e-2, 1e-1)
}

bohb_tune_search = TuneSearchCV(
    SGDClassifier(),
    param_distributions=param_dists,
    n_trials=2,
    max_iters=10,
    search_optimization="bohb"
)

bohb_tune_search.fit(X_train, y_train)

# Define `param_dists` using the Ray Tune search space API.
# This allows sampling from categorical and continuous distributions
# (tune.choice, tune.loguniform, and tune.uniform below).
param_dists = {
    'loss': tune.choice(['squared_hinge', 'hinge']),
    'alpha': tune.loguniform(1e-4, 1e-1),
    'epsilon': tune.uniform(1e-2, 1e-1),
}

hyperopt_tune_search = TuneSearchCV(
    SGDClassifier(),
    param_distributions=param_dists,
    n_trials=2,
    early_stopping=True,  # uses Async HyperBand if set to True
    max_iters=10,
    search_optimization="hyperopt"
)

hyperopt_tune_search.fit(X_train, y_train)
```

### Other Machine Learning Libraries and Examples
Tune-sklearn also supports the use of other machine learning libraries such as Pytorch (using Skorch) and Keras. You can find these examples here:
* [Keras](https://github.com/ray-project/tune-sklearn/blob/master/examples/keras_example.py)
* [LightGBM](https://github.com/ray-project/tune-sklearn/blob/master/examples/lgbm.py)
* [Sklearn Random Forest](https://github.com/ray-project/tune-sklearn/blob/master/examples/random_forest.py)
* [Sklearn Pipeline](https://github.com/ray-project/tune-sklearn/blob/master/examples/sklearn_pipeline.py)
* [Pytorch (Skorch)](https://github.com/ray-project/tune-sklearn/blob/master/examples/torch_nn.py)
* [XGBoost](https://github.com/ray-project/tune-sklearn/blob/master/examples/xgbclassifier.py)

## [Documentation](docs)
See the auto-generated docs [here](docs).
These are generated by `lazydocs` and should be updated on every release:
```bash
pip install lazydocs
lazydocs /path/to/tune-sklearn/tune-sklearn --src-base-url="https://github.com/ray-project/tune-sklearn/blob/master" --overview-file="README.md"
```

## More information
[Ray Tune](https://docs.ray.io/en/latest/tune/index.html)