
# OptuML: Hyperparameter Optimization for Machine Learning Algorithms using Optuna

```
⣰⡁ ⡀⣀ ⢀⡀ ⣀⣀ ⢀⡀ ⣀⡀ ⣰⡀ ⡀⢀ ⣀⣀ ⡇ ⠄ ⣀⣀ ⣀⡀ ⢀⡀ ⡀⣀ ⣰⡀ ⡎⢱ ⣀⡀ ⣰⡀ ⠄ ⣀⣀ ⠄ ⣀⣀ ⢀⡀ ⡀⣀
⢸ ⠏ ⠣⠜ ⠇⠇⠇ ⠣⠜ ⡧⠜ ⠘⠤ ⠣⠼ ⠇⠇⠇ ⠣ ⠇ ⠇⠇⠇ ⡧⠜ ⠣⠜ ⠏ ⠘⠤ ⠣⠜ ⡧⠜ ⠘⠤ ⠇ ⠇⠇⠇ ⠇ ⠴⠥ ⠣⠭ ⠏
```

`OptuML` (*Optu*na + *ML*) is a Python module providing hyperparameter optimization for machine learning algorithms using the [Optuna](https://optuna.org/) framework. The module offers a scikit-learn compatible API with enhanced features for robust optimization.

[![Python manual install](https://github.com/filipsPL/optuml/actions/workflows/python-package.yml/badge.svg)](https://github.com/filipsPL/optuml/actions/workflows/python-package.yml) [![Python pip install](https://github.com/filipsPL/optuml/actions/workflows/python-pip.yml/badge.svg)](https://github.com/filipsPL/optuml/actions/workflows/python-pip.yml) [![pypi version](https://img.shields.io/pypi/v/optuml)](https://pypi.org/project/optuml/) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17305964.svg)](https://doi.org/10.5281/zenodo.17305963)

```
Input                     optuml train                          Predict
┌─────────────────┐      ┌──────────────────────────────────┐   ┌─────────────────────────────┐
│X_train, y_train ┼──────► clf = Optimizer(algorithm="SVC") ├───► y_pred = clf.predict(X_test)│
└─────────────────┘      │ clf.fit(X_train, y_train)        │   │                             │
┌─────────────────┐      └─▲────────────────────────────────┘   └─────────────────────────▲───┘
│ML algorithm     ├────────┘                                                              │
└─────────────────┘                                                             X_test────┘
```

## Key Features

- **Broad Algorithm Support**: A wide selection of scikit-learn classifiers and regressors, plus optional CatBoost and XGBoost
- **Scikit-learn Compatibility**: Drop-in integration with pipelines, cross-validation, and other scikit-learn tooling
- **Robust Optimization**: Powered by Optuna with early stopping, timeout protection, and parallel execution
- **Type-Safe Design**: Separate optimizers for classification and regression with proper type checking
- **Production Ready**: Cross-platform compatibility, comprehensive error handling, and extensive validation
- **Flexible Configuration**: Control every aspect of the optimization process

## Installation

### Option A: pip (recommended)

```bash
pip install optuml
```

or upgrade:

```bash
pip install optuml --upgrade
```

### Option B: Manual installation

```bash
# Install required dependencies
pip install optuna scikit-learn numpy pandas

# Optional: Install additional algorithms
pip install catboost xgboost

# Download the module
wget https://raw.githubusercontent.com/filipsPL/optuml/main/optuml/optuml.py
```

## Quick Start

### Classification Example

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from optuml import Optimizer

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train optimizer
clf = Optimizer(
    algorithm="RandomForestClassifier",
    n_trials=50,
    cv=5,
    scoring="accuracy",
    random_state=42,
    show_progress_bar=True
)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# View results
print(f"Accuracy: {accuracy:.3f}")
print(f"Best parameters: {clf.best_params_}")
print(f"Optimization took: {clf.study_time_:.2f} seconds")
print(f"Trials completed: {clf.n_trials_completed_}")
```

### Regression Example

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from optuml import Optimizer

# Load data
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train optimizer
reg = Optimizer(
    algorithm="XGBRegressor",
    n_trials=100,
    cv=5,
    scoring="r2",
    early_stopping_patience=10,  # Stop if no improvement for 10 trials
    n_jobs=-1,                   # Use all CPU cores for CV
    verbose=True
)
reg.fit(X_train, y_train)

# Evaluate
y_pred = reg.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R² Score: {r2:.3f}")
```

## Supported Algorithms

### Classification Algorithms

| Algorithm | Description | Key Features |
| ------------------------ | ------------------------------- | ----------------------------------------- |
| `SVC` | Support Vector Classifier | Non-linear kernels, probability estimates |
| `LogisticRegression` | Logistic Regression | L1/L2/Elastic-Net regularization |
| `KNeighborsClassifier` | k-Nearest Neighbors | Distance weighting, various metrics |
| `RandomForestClassifier` | Random Forest | Feature importance, OOB score |
| `AdaBoostClassifier` | AdaBoost | SAMME/SAMME.R algorithms |
| `MLPClassifier` | Neural Network | Multiple architectures, early stopping |
| `GaussianNB` | Gaussian Naive Bayes | Fast, probabilistic |
| `QDA` | Quadratic Discriminant Analysis | Non-linear boundaries |
| `DecisionTreeClassifier` | Decision Tree | Multiple criteria, pruning |
| `CatBoostClassifier`* | CatBoost | Categorical features, GPU support |
| `XGBClassifier`* | XGBoost | Regularization, missing values |

### Regression Algorithms

| Algorithm | Description | Key Features |
| ----------------------- | ------------------------- | ------------------------ |
| `SVR` | Support Vector Regression | Epsilon-insensitive loss |
| `LinearRegression` | Linear Regression | Simple, interpretable |
| `KNeighborsRegressor` | k-Nearest Neighbors | Local regression |
| `RandomForestRegressor` | Random Forest | Reduces overfitting |
| `AdaBoostRegressor` | AdaBoost | Sequential learning |
| `MLPRegressor` | Neural Network | Non-linear patterns |
| `DecisionTreeRegressor` | Decision Tree | Non-parametric |
| `CatBoostRegressor`* | CatBoost | Handles categoricals |
| `XGBRegressor`* | XGBoost | High performance |

*Optional dependencies (install separately)
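CatBoost and XGBoost are only usable when importable. A quick stdlib-only check (plain Python, not an OptuML API) confirms whether the optional back-ends are installed:

```python
import importlib.util

# Report which optional gradient-boosting back-ends are importable.
for pkg in ("catboost", "xgboost"):
    if importlib.util.find_spec(pkg) is None:
        print(f"{pkg}: not installed (pip install {pkg})")
    else:
        print(f"{pkg}: available")
```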

## Advanced Features

### Early Stopping

Stop optimization when no improvement is observed:

```python
optimizer = Optimizer(
    algorithm="XGBClassifier",
    n_trials=1000,
    early_stopping_patience=20  # Stop after 20 trials without improvement
)
```

### Parallel Cross-Validation

Speed up optimization using multiple CPU cores:

```python
optimizer = Optimizer(
    algorithm="RandomForestClassifier",
    n_trials=100,
    cv=10,
    n_jobs=-1  # Use all available cores
)
```

### Custom Scoring Metrics

Use any scikit-learn compatible scoring metric:

```python
optimizer = Optimizer(
    algorithm="SVC",
    scoring="roc_auc",                  # For classification
    # scoring="neg_mean_squared_error", # For regression
    # scoring="f1_weighted",            # For imbalanced classes
)
```

### Timeout Protection

Set time limits for optimization:

```python
optimizer = Optimizer(
    algorithm="MLPClassifier",
    timeout=300,    # Total optimization timeout (5 minutes)
    cv_timeout=30,  # Per-trial timeout (30 seconds)
    n_trials=1000   # Will stop at timeout even if trials remain
)
```

### Access to Optuna Study

Get detailed optimization information:

```python
# After fitting
optimizer.fit(X_train, y_train)

# Access the Optuna study object
study = optimizer.study_
print(f"Best trial: {study.best_trial.number}")
print(f"Best value: {study.best_value:.4f}")

# Plot optimization history (requires plotly)
import optuna.visualization as vis
fig = vis.plot_optimization_history(study)
fig.show()

# Plot parameter importances
fig = vis.plot_param_importances(study)
fig.show()
```

### Pipeline Integration

Full compatibility with scikit-learn pipelines:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Create pipeline with OptuML
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('optimizer', Optimizer(algorithm="SVC", n_trials=50))
])

# Use like any sklearn pipeline
pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)
```

### Type-Specific Optimizers

For more control, use the specific optimizer classes:

```python
from optuml import ClassifierOptimizer, RegressorOptimizer

# Classifier with all classifier-specific methods
clf = ClassifierOptimizer(
    algorithm="RandomForestClassifier",
    n_trials=100
)
clf.fit(X_train, y_train)
probas = clf.predict_proba(X_test)
decision = clf.decision_function(X_test)  # If supported

# Regressor with regression-specific defaults
reg = RegressorOptimizer(
    algorithm="RandomForestRegressor",
    n_trials=100,
    scoring="r2"  # Default for regressors
)
```

## API Reference

### Main Classes

#### `Optimizer`
Universal optimizer that automatically selects between classification and regression.

#### `ClassifierOptimizer`
Specialized optimizer for classification algorithms with methods like `predict_proba()` and `decision_function()`.

#### `RegressorOptimizer`
Specialized optimizer for regression algorithms with appropriate default scoring metrics.
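How the universal `Optimizer` could route an algorithm name to the right task type is sketched below. This is an illustration built from the supported-algorithm tables above, not OptuML's actual implementation:

```python
# Illustrative sketch only: map algorithm names (from this README's
# tables) to a task type, as a universal optimizer might do internally.
CLASSIFIERS = {
    "SVC", "LogisticRegression", "KNeighborsClassifier",
    "RandomForestClassifier", "AdaBoostClassifier", "MLPClassifier",
    "GaussianNB", "QDA", "DecisionTreeClassifier",
    "CatBoostClassifier", "XGBClassifier",
}
REGRESSORS = {
    "SVR", "LinearRegression", "KNeighborsRegressor",
    "RandomForestRegressor", "AdaBoostRegressor", "MLPRegressor",
    "DecisionTreeRegressor", "CatBoostRegressor", "XGBRegressor",
}

def resolve_task(algorithm: str) -> str:
    """Return the task type (and thus which specialized optimizer applies)."""
    if algorithm in CLASSIFIERS:
        return "classification"
    if algorithm in REGRESSORS:
        return "regression"
    raise ValueError(f"Unknown algorithm: {algorithm!r}")

print(resolve_task("SVC"))           # classification
print(resolve_task("XGBRegressor"))  # regression
```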

### Common Parameters

| Parameter | Type | Default | Description |
| ------------------------- | ---------- | ---------- | ------------------------------------------ |
| `algorithm` | str | required | ML algorithm to optimize |
| `n_trials` | int | 100 | Number of optimization trials |
| `cv` | int | 5 | Cross-validation folds |
| `scoring` | str/None | Auto* | Scoring metric for CV |
| `direction` | str | "maximize" | Optimization direction |
| `timeout` | float/None | None | Total optimization timeout (seconds) |
| `cv_timeout` | float | 120 | Single CV evaluation timeout |
| `random_state` | int/None | None | Random seed for reproducibility |
| `n_jobs` | int | 1 | Parallel jobs for CV (-1 for all cores) |
| `early_stopping_patience` | int/None | None | Trials without improvement before stopping |
| `verbose` | bool/int | False | Verbosity level |
| `show_progress_bar` | bool | False | Show optimization progress |

*Auto defaults: "accuracy" for classifiers, "r2" for regressors

### Methods

| Method | Description | Available For |
| ---------------------- | ---------------------------------- | ---------------- |
| `fit(X, y)` | Optimize hyperparameters and train | All |
| `predict(X)` | Make predictions | All |
| `score(X, y)` | Evaluate model performance | All |
| `predict_proba(X)` | Predict class probabilities | Classifiers |
| `decision_function(X)` | Get decision values | Some classifiers |
| `get_params()` | Get optimizer parameters | All |
| `set_params(**params)` | Set optimizer parameters | All |

### Attributes (after fitting)

| Attribute | Description |
| --------------------- | ---------------------------------- |
| `best_estimator_` | Trained model with best parameters |
| `best_params_` | Best hyperparameters found |
| `best_score_` | Best cross-validation score |
| `study_` | Optuna study object |
| `study_time_` | Total optimization time |
| `n_trials_completed_` | Number of completed trials |
| `classes_` | Class labels (classifiers only) |
| `n_features_in_` | Number of input features |
| `feature_names_in_` | Feature names (if available) |

## Troubleshooting

### Issue: "No successful trials completed"
**Solution**: Increase `cv_timeout` or reduce `cv` folds:
```python
optimizer = Optimizer(algorithm="SVC", cv_timeout=300, cv=3)
```

### Issue: CatBoost/XGBoost not available
**Solution**: Install optional dependencies:
```bash
pip install catboost xgboost
```

### Issue: Optimization takes too long
**Solutions**:
1. Use parallel CV: `n_jobs=-1`
2. Set timeout: `timeout=600`
3. Use early stopping: `early_stopping_patience=10`
4. Reduce trials: `n_trials=50`

### Issue: Memory errors with large datasets
**Solutions**:
1. Use algorithms with lower memory footprint (e.g., `LogisticRegression` instead of `SVC`)
2. Reduce CV folds
3. Consider lighter linear models such as `SGDClassifier`/`SGDRegressor` (not currently in OptuML's supported-algorithm list)

## Best Practices

1. **Start with fewer trials**: Begin with `n_trials` between 20 and 50 for exploration, then increase for the final optimization

2. **Use appropriate scoring metrics**:
- Imbalanced classification: `"f1_weighted"`, `"roc_auc"`
- Regression: `"r2"`, `"neg_mean_squared_error"`

3. **Enable early stopping** for large trial counts:
   ```python
   Optimizer(n_trials=1000, early_stopping_patience=20)
   ```

4. **Set random state** for reproducibility:
   ```python
   Optimizer(random_state=42)
   ```

5. **Use parallel processing** for faster optimization:
   ```python
   Optimizer(n_jobs=-1)
   ```
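
Putting these practices together in a single call (a configuration sketch using the parameters documented above; assumes `optuml` is installed and the metric suits your task):

```python
from optuml import Optimizer

optimizer = Optimizer(
    algorithm="RandomForestClassifier",
    n_trials=50,                 # start small, raise for the final run
    scoring="f1_weighted",       # metric chosen for the task at hand
    early_stopping_patience=20,  # stop once the study plateaus
    random_state=42,             # reproducible trials
    n_jobs=-1,                   # parallel cross-validation
)
```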

## Citation

If you use OptuML in your research, please cite:

```bibtex
@software{stefaniak_optuml_2024,
  author    = {Filip Stefaniak},
  title     = {OptuML: Hyperparameter Optimization for Multiple Machine Learning Algorithms using Optuna},
  year      = {2024},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17305963},
  url       = {https://doi.org/10.5281/zenodo.17305963}
}
```