https://github.com/filipspl/optuml

Optuna-optimized ML methods, with a scikit-learn-like API.
- Host: GitHub
- URL: https://github.com/filipspl/optuml
- Owner: filipsPL
- License: mit
- Created: 2024-09-12T08:09:13.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-09-27T13:31:35.000Z (3 months ago)
- Last Synced: 2024-10-12T23:43:23.130Z (3 months ago)
- Topics: hyperparameter-optimization, hyperparameter-tuning, machine-learning, optuna, python, python-module, scikit-learn
- Language: Python
- Size: 87.9 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
OptuML: Hyperparameter Optimization for Multiple Machine Learning Algorithms using Optuna
==========================================================================================

```
⣰⡁ ⡀⣀ ⢀⡀ ⣀⣀ ⡎⢱ ⣀⡀ ⣰⡀ ⡀⢀ ⡷⢾ ⡇ ⠄ ⣀⣀ ⣀⡀ ⢀⡀ ⡀⣀ ⣰⡀ ⡎⢱ ⣀⡀ ⣰⡀ ⠄ ⣀⣀ ⠄ ⣀⣀ ⢀⡀ ⡀⣀
⢸ ⠏ ⠣⠜ ⠇⠇⠇ ⠣⠜ ⡧⠜ ⠘⠤ ⠣⠼ ⠇⠸ ⠧⠤ ⠇ ⠇⠇⠇ ⡧⠜ ⠣⠜ ⠏ ⠘⠤ ⠣⠜ ⡧⠜ ⠘⠤ ⠇ ⠇⠇⠇ ⠇ ⠴⠥ ⠣⠭ ⠏
```

`OptuML` (for *Optu*na and *ML*) is a Python module that provides hyperparameter optimization for several machine learning algorithms using the [Optuna](https://optuna.org/) framework. The module supports a variety of algorithms and allows easy hyperparameter tuning through a scikit-learn-like API.
[![Python manual install](https://github.com/filipsPL/optuml/actions/workflows/python-package.yml/badge.svg)](https://github.com/filipsPL/optuml/actions/workflows/python-package.yml) [![Python pip install](https://github.com/filipsPL/optuml/actions/workflows/python-pip.yml/badge.svg)](https://github.com/filipsPL/optuml/actions/workflows/python-pip.yml) [![pypi version](https://img.shields.io/pypi/v/optuml)](https://pypi.org/project/optuml/)
```
Input OptuML train Predict
┌─────────────────┐ ┌──────────────────────────────────┐ ┌─────────────────────────────┐
│X_train, y_train ┼────► clf = Optimizer(algorithm="SVC") ├───► y_pred = clf.predict(X_test)│
└─────────────────┘ │ clf.fit(X_train, y_train) │ │ │
┌─────────────────┐ └─▲────────────────────────────────┘ └─────────────────────────▲───┘
│ML algorithm ├──────┘ │
└─────────────────┘ X_test───┘
```

## Features
- **Multiple Algorithms**: Supports hyperparameter optimization for the following algorithms:
  - a selection of scikit-learn estimators (AdaBoost, Gaussian Naive Bayes, k-nearest neighbors, MLP, random forest, SVC/SVR, QDA), plus:
  - CatBoost
  - XGBoost
- **Optuna Framework**: Leverages Optuna for powerful hyperparameter search.
- **Maximize or Minimize**: Allows setting the optimization direction (`maximize` or `minimize`).
- **Scikit-learn API**: Provides a consistent interface for `fit()`, `predict()`, `predict_proba()`, and `score()` methods.
- **Control Output**: Optionally run Optuna with granular verbosity settings (`verbose` as `bool` or `int`).
- **Cross-validation**: Easily integrate cross-validation with custom scoring metrics (e.g., accuracy, ROC AUC); see the sketch after this list.
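A minimal sketch of those features together, assuming the parameter names documented in the API reference at the end of this README (the dataset and values are illustrative):

```python
from sklearn.datasets import load_breast_cancer

from optuml import Optimizer

X, y = load_breast_cancer(return_X_y=True)

# Maximize 5-fold ROC AUC for a random forest.
opt = Optimizer(
    algorithm="RandomForestClassifier",
    direction="maximize",
    n_trials=20,
    cv=5,
    scoring="roc_auc",
)
opt.fit(X, y)
print(opt.best_params_)
```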
## Installation

### a) pip
`pip install optuml`
or upgrade with:
`pip install optuml --upgrade`
### b) Manual way
You can install the required packages via `pip`:
```bash
pip install optuna scikit-learn catboost xgboost numpy wrapt_timeout_decorator
```

Next, just fetch the `optuml.py` [file from the repo](optuml/optuml.py) and put it in the same directory as your script.
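With the file in place, it imports like any local module; a quick sketch:

```python
# optuml.py sits next to this script, so Python picks it up as a local module
from optuml import Optimizer

optimizer = Optimizer(algorithm="SVC")  # ready to fit, as in the examples below
```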
## Usage
### Basic Example
Here’s how you can use the `Optimizer` class to optimize hyperparameters for different machine learning algorithms using the **Iris** dataset:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from optuml import Optimizer

# Load the Iris dataset
X, y = load_iris(return_X_y=True)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Instantiate the optimizer for SVC
optimizer = Optimizer(algorithm="SVC", n_trials=50, cv=3, scoring="accuracy", verbose=True)

# Fit the optimizer to the training data
optimizer.fit(X_train, y_train)

# Predict on the test set
y_pred = optimizer.predict(X_test)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# Print the best hyperparameters found during optimization
print(f"Best Hyperparameters: {optimizer.best_params_}")
```

### Available Algorithms
The `Optimizer` class supports the following algorithms. You can specify the `algorithm` parameter to choose which one to use:
##### Classifiers
| Algorithm | Type |
|--------------------------|--------------------------------|
| `AdaBoostClassifier` | AdaBoost Classifier |
| `CatBoostClassifier` | CatBoost Classifier |
| `GaussianNB` | Gaussian Naive Bayes |
| `KNeighborsClassifier` | k-Nearest Neighbors Classifier |
| `MLPClassifier` | Multi-layer Perceptron Classifier |
| `RandomForestClassifier`| Random Forest Classifier |
| `SVC` | Support Vector Classifier |
| `XGBClassifier` | XGBoost Classifier |
| `QDA` | Quadratic Discriminant Analysis |

##### Regressors
| Algorithm | Type |
|--------------------------|--------------------------------|
| `AdaBoostRegressor` | AdaBoost Regressor |
| `CatBoostRegressor` | CatBoost Regressor |
| `KNeighborsRegressor` | k-Nearest Neighbors Regressor |
| `MLPRegressor` | Multi-layer Perceptron Regressor |
| `RandomForestRegressor` | Random Forest Regressor |
| `SVR` | Support Vector Regressor |
| `XGBRegressor` | XGBoost Regressor |
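Regressors follow the same pattern; a hedged sketch with `SVR` on the diabetes dataset, where `scoring` is set explicitly since the `"accuracy"` default only makes sense for classifiers:

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

from optuml import Optimizer

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Negated MSE is a "larger is better" metric, so the default
# direction="maximize" still applies.
reg = Optimizer(algorithm="SVR", n_trials=30, cv=3, scoring="neg_mean_squared_error")
reg.fit(X_train, y_train)
print(reg.score(X_test, y_test))
```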
### Controlling Verbosity

You can control the verbosity of Optuna's output by using the `verbose` parameter:
- Set `verbose=True` for standard logging.
- Use an `int` value to specify a more granular verbosity level (e.g., `optuna.logging.DEBUG`).

```python
optimizer = Optimizer(algorithm="SVC", n_trials=50, cv=3, scoring="accuracy", verbose=True)
```

You can additionally show a progress bar:
```python
optimizer = Optimizer(algorithm="SVC", n_trials=50, cv=3, scoring="accuracy", show_progress_bar=True)
```
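Both options can be combined; a sketch assuming `verbose` accepts the integer levels from `optuna.logging`:

```python
import optuna

from optuml import Optimizer

# DEBUG-level Optuna logging alongside the progress bar
optimizer = Optimizer(
    algorithm="SVC",
    n_trials=50,
    cv=3,
    scoring="accuracy",
    verbose=optuna.logging.DEBUG,
    show_progress_bar=True,
)
```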
## API Reference

### `Optimizer`
#### Parameters
- **`algorithm`** (`str`): The machine learning algorithm to optimize.
- **`direction`** (`str`, default `"maximize"`): Direction of optimization. Can be `"maximize"` or `"minimize"`.
- **`verbose`** (`bool` or `int`, default `False`): Controls Optuna's verbosity.
- **`n_trials`** (`int`, default `100`): Number of optimization trials to run.
- **`timeout`** (`float`, optional): Maximum time (in seconds) for the optimization process.
- **`cv`** (`int`, default `5`): Number of cross-validation folds.
- **`scoring`** (`str`, default `"accuracy"`): Scoring metric to use during cross-validation.
- **`random_state`** (`int`, optional): Seed for random number generation.
- **`cv_timeout`** (`int`, default `120`): Timeout (in seconds) for a single cross-validation run within a trial.

#### Methods
- **`fit(X, y)`**: Fit the model using hyperparameter optimization.
- **`predict(X)`**: Make predictions using the best model found during optimization.
- **`predict_proba(X)`**: Predict class probabilities (if supported by the model).
- **`score(X, y)`**: Score the best model on the given data.
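Putting the reference together, a sketch that touches each parameter and method above (the dataset and parameter values are illustrative, not recommendations):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

from optuml import Optimizer

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

opt = Optimizer(
    algorithm="XGBClassifier",
    direction="maximize",
    n_trials=25,
    timeout=300,        # stop the whole search after 5 minutes
    cv=5,
    scoring="accuracy",
    random_state=0,
    cv_timeout=60,      # cap each CV run within a trial at 60 s
)
opt.fit(X_train, y_train)

y_pred = opt.predict(X_test)           # labels from the best model
y_proba = opt.predict_proba(X_test)    # class probabilities (XGBoost supports them)
print("Test accuracy:", opt.score(X_test, y_test))
```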