{"id":15063905,"url":"https://github.com/filipspl/optuml","last_synced_at":"2026-04-04T20:06:20.285Z","repository":{"id":256779990,"uuid":"856230793","full_name":"filipsPL/optuml","owner":"filipsPL","description":"Optuna-optimized ML methods, with scikit-learn like API","archived":false,"fork":false,"pushed_at":"2024-09-27T13:31:35.000Z","size":90,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-19T13:18:44.547Z","etag":null,"topics":["hyperparameter-optimization","hyperparameter-tuning","machine-learning","optuna","python","python-module","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/filipsPL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-12T08:09:13.000Z","updated_at":"2025-02-13T13:21:34.000Z","dependencies_parsed_at":"2024-09-13T02:42:53.106Z","dependency_job_id":"ef22be9d-3839-494d-bacb-53b8ca4d3184","html_url":"https://github.com/filipsPL/optuml","commit_stats":null,"previous_names":["filipspl/optuml"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/filipsPL%2Foptuml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/filipsPL%2Foptuml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/filipsPL%2Foptuml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/filipsPL%2Foptuml/manifests","owner_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/filipsPL","download_url":"https://codeload.github.com/filipsPL/optuml/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248208621,"owners_count":21065203,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hyperparameter-optimization","hyperparameter-tuning","machine-learning","optuna","python","python-module","scikit-learn"],"created_at":"2024-09-25T00:08:36.020Z","updated_at":"2026-04-04T20:06:20.269Z","avatar_url":"https://github.com/filipsPL.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OptuML: Hyperparameter Optimization for Machine Learning Algorithms using Optuna\n\n```\n ⣰⡁ ⡀⣀ ⢀⡀ ⣀⣀    ⢀⡀ ⣀⡀ ⣰⡀ ⡀⢀ ⣀⣀  ⡇   ⠄ ⣀⣀  ⣀⡀ ⢀⡀ ⡀⣀ ⣰⡀   ⡎⢱ ⣀⡀ ⣰⡀ ⠄ ⣀⣀  ⠄ ⣀⣀ ⢀⡀ ⡀⣀\n ⢸  ⠏  ⠣⠜ ⠇⠇⠇   ⠣⠜ ⡧⠜ ⠘⠤ ⠣⠼ ⠇⠇⠇ ⠣   ⠇ ⠇⠇⠇ ⡧⠜ ⠣⠜ ⠏  ⠘⠤   ⠣⠜ ⡧⠜ ⠘⠤ ⠇ ⠇⠇⠇ ⠇ ⠴⠥ ⠣⠭ ⠏ \n```\n\n`OptuML` (*Optu*na + *ML*) is a Python module providing hyperparameter optimization for machine learning algorithms using the [Optuna](https://optuna.org/) framework. 
The module offers a scikit-learn compatible API with enhanced features for robust optimization.\n\n[![Python manual install](https://github.com/filipsPL/optuml/actions/workflows/python-package.yml/badge.svg)](https://github.com/filipsPL/optuml/actions/workflows/python-package.yml) [![Python pip install](https://github.com/filipsPL/optuml/actions/workflows/python-pip.yml/badge.svg)](https://github.com/filipsPL/optuml/actions/workflows/python-pip.yml) [![pypi version](https://img.shields.io/pypi/v/optuml)](https://pypi.org/project/optuml/) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17305964.svg)](https://doi.org/10.5281/zenodo.17305963)\n\n## tl;dr\n\n```python\nfrom sklearn.datasets import load_iris\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import accuracy_score\nfrom optuml import Optimizer\n\n# Load data\nX, y = load_iris(return_X_y=True)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)\n\n# Create and train optimizer\nclf = Optimizer(algorithm=\"RandomForestClassifier\", n_trials=50, cv=5, scoring=\"accuracy\")\nclf.fit(X_train, y_train)\n\n# Make predictions\ny_pred = clf.predict(X_test)\naccuracy = accuracy_score(y_test, y_pred)\n```\n\n## Key Features\n\n- **Comprehensive Algorithm Support**: A broad set of scikit-learn classifiers and regressors, plus optional CatBoost, XGBoost, and LightGBM\n- **Full Scikit-learn Compatibility**: Seamless integration with pipelines, cross-validation, and other sklearn tools\n- **Robust Optimization**: Powered by Optuna with early stopping, timeout protection, and parallel cross-validation\n- **Type-Safe Design**: Separate optimizers for classification and regression with proper type checking\n- **Production Ready**: Cross-platform compatibility, comprehensive error handling, and extensive validation\n- **Flexible Configuration**: Control every aspect of the optimization process\n\n## Installation\n\n### Option A: pip (recommended)\n\n```bash\npip install optuml\n```\n\nWith optional algorithm 
support:\n\n```bash\npip install optuml[all]          # CatBoost + XGBoost + LightGBM\npip install optuml[catboost]     # CatBoost only\npip install optuml[xgboost]      # XGBoost only\npip install optuml[lightgbm]     # LightGBM only\n```\n\nTo upgrade an existing installation:\n\n```bash\npip install optuml --upgrade\n```\n\n### Option B: Manual installation\n\n```bash\n# Install required dependencies\npip install optuna scikit-learn numpy\n\n# Optional: Install additional algorithms\npip install catboost xgboost lightgbm\n\n# Download the module\nwget https://raw.githubusercontent.com/filipsPL/optuml/main/optuml/optuml.py\n```\n\n## Quick Start\n\n### Classification Example\n\n```python\nfrom sklearn.datasets import load_iris\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import accuracy_score\nfrom optuml import Optimizer\n\n# Load data\nX, y = load_iris(return_X_y=True)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n\n# Create and train optimizer\nclf = Optimizer(\n    algorithm=\"RandomForestClassifier\",\n    n_trials=50,\n    cv=5,\n    scoring=\"accuracy\",\n    random_state=42,\n    show_progress_bar=True\n)\nclf.fit(X_train, y_train)\n\n# Make predictions\ny_pred = clf.predict(X_test)\naccuracy = accuracy_score(y_test, y_pred)\n\n# View results\nprint(f\"Accuracy: {accuracy:.3f}\")\nprint(f\"Best parameters: {clf.best_params_}\")\nprint(f\"Optimization took: {clf.study_time_:.2f} seconds\")\nprint(f\"Trials completed: {clf.n_trials_completed_}\")\n```\n\n### Regression Example\n\n```python\nfrom sklearn.datasets import load_diabetes\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import r2_score\nfrom optuml import Optimizer\n\n# Load data\nX, y = load_diabetes(return_X_y=True)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n\n# Create and train optimizer\nreg = Optimizer(\n    algorithm=\"XGBRegressor\",\n    n_trials=100,\n    cv=5,\n    
scoring=\"r2\",\n    early_stopping_patience=10,  # Stop if no improvement for 10 trials\n    n_jobs=-1,  # Use all CPU cores for CV\n    verbose=True\n)\nreg.fit(X_train, y_train)\n\n# Evaluate\ny_pred = reg.predict(X_test)\nr2 = r2_score(y_test, y_pred)\nprint(f\"R² Score: {r2:.3f}\")\n```\n\n## Supported Algorithms\n\n### Classification Algorithms\n\n| Algorithm                        | Description                     | Key Features                              |\n| -------------------------------- | ------------------------------- | ----------------------------------------- |\n| `SVC`                            | Support Vector Classifier       | Non-linear kernels, probability estimates |\n| `LogisticRegression`             | Logistic Regression             | L1/L2/Elastic-Net regularization          |\n| `RidgeClassifier`                | Ridge Classifier                | L2 regularization, fast linear model      |\n| `KNeighborsClassifier`           | k-Nearest Neighbors             | Distance weighting, various metrics       |\n| `RandomForestClassifier`         | Random Forest                   | Feature importance, OOB score             |\n| `ExtraTreesClassifier`           | Extremely Randomized Trees      | Faster than RF, reduced variance          |\n| `AdaBoostClassifier`             | AdaBoost                        | Boosted ensemble, learning rate tuning    |\n| `GradientBoostingClassifier`     | Gradient Boosting               | Sequential boosting, feature subsampling  |\n| `HistGradientBoostingClassifier` | Histogram Gradient Boosting     | Fast GBDT, native NaN support             |\n| `MLPClassifier`                  | Neural Network                  | Multiple architectures, early stopping    |\n| `GaussianNB`                     | Gaussian Naive Bayes            | Fast, probabilistic                       |\n| `QDA`                            | Quadratic Discriminant Analysis | Non-linear boundaries                     |\n| 
`DecisionTreeClassifier`         | Decision Tree                   | Multiple criteria, pruning                |\n| `SGDClassifier`                  | Stochastic Gradient Descent     | Multiple losses, L1/L2/ElasticNet, online |\n| `CatBoostClassifier`*            | CatBoost                        | Categorical features, GPU support         |\n| `XGBClassifier`*                 | XGBoost                         | Regularization, missing values            |\n| `LGBMClassifier`*                | LightGBM                        | Fast GBDT, leaf-wise growth               |\n\n### Regression Algorithms\n\n| Algorithm                       | Description                 | Key Features                             |\n| ------------------------------- | --------------------------- | ---------------------------------------- |\n| `SVR`                           | Support Vector Regression   | Epsilon-insensitive loss                 |\n| `LinearRegression`              | Linear Regression           | Simple, interpretable                    |\n| `Ridge`                         | Ridge Regression            | L2 regularization, stable on collinear   |\n| `Lasso`                         | Lasso Regression            | L1 regularization, feature selection     |\n| `ElasticNet`                    | Elastic Net                 | L1+L2 regularization, sparse solutions   |\n| `KNeighborsRegressor`           | k-Nearest Neighbors         | Local regression                         |\n| `RandomForestRegressor`         | Random Forest               | Reduces overfitting                      |\n| `ExtraTreesRegressor`           | Extremely Randomized Trees  | Faster than RF, reduced variance         |\n| `AdaBoostRegressor`             | AdaBoost                    | Sequential learning                      |\n| `GradientBoostingRegressor`     | Gradient Boosting           | Sequential boosting, feature subsampling |\n| `HistGradientBoostingRegressor` | Histogram Gradient Boosting | Fast 
GBDT, native NaN support            |\n| `MLPRegressor`                  | Neural Network              | Non-linear patterns                      |\n| `DecisionTreeRegressor`         | Decision Tree               | Non-parametric                           |\n| `SGDRegressor`                  | Stochastic Gradient Descent | Multiple losses, L1/L2/ElasticNet, online |\n| `CatBoostRegressor`*            | CatBoost                    | Handles categoricals                     |\n| `XGBRegressor`*                 | XGBoost                     | High performance                         |\n| `LGBMRegressor`*                | LightGBM                    | Fast GBDT, leaf-wise growth              |\n\n*Optional dependencies (install separately)\n\n## Advanced Features\n\n### Early Stopping\n\nStop optimization when no improvement is observed:\n\n```python\noptimizer = Optimizer(\n    algorithm=\"XGBClassifier\",\n    n_trials=1000,\n    early_stopping_patience=20  # Stop after 20 trials without improvement\n)\n```\n\n### Parallel Cross-Validation\n\nSpeed up optimization using multiple CPU cores:\n\n```python\noptimizer = Optimizer(\n    algorithm=\"RandomForestClassifier\",\n    n_trials=100,\n    cv=10,\n    n_jobs=-1  # Use all available cores\n)\n```\n\n### Custom Scoring Metrics\n\nUse any scikit-learn compatible scoring metric:\n\n```python\noptimizer = Optimizer(\n    algorithm=\"SVC\",\n    scoring=\"roc_auc\",  # For classification\n    # scoring=\"neg_mean_squared_error\",  # For regression\n    # scoring=\"f1_weighted\",  # For imbalanced classes\n)\n```\n\n### Timeout Protection\n\nSet time limits for optimization:\n\n```python\noptimizer = Optimizer(\n    algorithm=\"MLPClassifier\",\n    timeout=300,  # Total optimization timeout (5 minutes)\n    cv_timeout=30,  # Per-trial timeout (30 seconds)\n    n_trials=1000  # Will stop at timeout even if trials remain\n)\n```\n\n### Access to Optuna Study\n\nGet detailed optimization information:\n\n```python\n# After 
fitting\noptimizer.fit(X_train, y_train)\n\n# Access the Optuna study object\nstudy = optimizer.study_\nprint(f\"Best trial: {study.best_trial.number}\")\nprint(f\"Best value: {study.best_value:.4f}\")\n\n# Plot optimization history (requires plotly)\nimport optuna.visualization as vis\nfig = vis.plot_optimization_history(study)\nfig.show()\n\n# Plot parameter importances\nfig = vis.plot_param_importances(study)\nfig.show()\n```\n\n### Pipeline Integration\n\nFull compatibility with scikit-learn pipelines:\n\n```python\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.preprocessing import StandardScaler\nfrom optuml import Optimizer\n\n# Create pipeline with OptuML\npipe = Pipeline([\n    ('scaler', StandardScaler()),\n    ('optimizer', Optimizer(algorithm=\"SVC\", n_trials=50))\n])\n\n# Use like any sklearn pipeline\npipe.fit(X_train, y_train)\npredictions = pipe.predict(X_test)\n```\n\n### Type-Specific Optimizers\n\nFor more control, use the specific optimizer classes:\n\n```python\nfrom optuml.optuml import ClassifierOptimizer, RegressorOptimizer\n\n# Classifier with all classifier-specific methods\nclf = ClassifierOptimizer(\n    algorithm=\"RandomForestClassifier\",\n    n_trials=100\n)\nclf.fit(X_train, y_train)\nprobas = clf.predict_proba(X_test)\n# decision = clf.decision_function(X_test)  # Only for algorithms that expose it (e.g. SVC), not RandomForestClassifier\n\n# Regressor with regression-specific defaults\nreg = RegressorOptimizer(\n    algorithm=\"RandomForestRegressor\",\n    n_trials=100,\n    scoring=\"r2\"  # Default for regressors\n)\n```\n\n## API Reference\n\n### Main Classes\n\n#### `Optimizer`\nUniversal optimizer that automatically selects between classification and regression.\n\n#### `ClassifierOptimizer`\nSpecialized optimizer for classification algorithms with methods like `predict_proba()` and `decision_function()`.\n\n#### `RegressorOptimizer`\nSpecialized optimizer for regression algorithms, with `\"r2\"` as the default scoring metric.\n\n### Common Parameters\n\n| Parameter                 | Type       | Default    | 
Description                                |\n| ------------------------- | ---------- | ---------- | ------------------------------------------ |\n| `algorithm`               | str        | required   | ML algorithm to optimize                   |\n| `n_trials`                | int        | 100        | Number of optimization trials              |\n| `cv`                      | int        | 5          | Cross-validation folds                     |\n| `scoring`                 | str/None   | Auto*      | Scoring metric for CV                      |\n| `direction`               | str        | \"maximize\" | Optimization direction                     |\n| `timeout`                 | float/None | None       | Total optimization timeout (seconds)       |\n| `cv_timeout`              | float      | 120        | Single CV evaluation timeout               |\n| `random_state`            | int/None   | None       | Random seed for reproducibility            |\n| `n_jobs`                  | int        | 1          | Parallel jobs for CV (-1 for all cores)    |\n| `early_stopping_patience` | int/None   | None       | Trials without improvement before stopping |\n| `verbose`                 | bool/int   | False      | Verbosity level                            |\n| `show_progress_bar`       | bool       | False      | Show optimization progress                 |\n\n*Auto defaults: \"accuracy\" for classifiers, \"r2\" for regressors\n\n### Methods\n\n| Method                 | Description                        | Available For    |\n| ---------------------- | ---------------------------------- | ---------------- |\n| `fit(X, y)`            | Optimize hyperparameters and train | All              |\n| `predict(X)`           | Make predictions                   | All              |\n| `score(X, y)`          | Evaluate model performance         | All              |\n| `predict_proba(X)`     | Predict class probabilities        | Classifiers      |\n| `decision_function(X)` | Get 
decision values                | Some classifiers |\n| `get_params()`         | Get optimizer parameters           | All              |\n| `set_params(**params)` | Set optimizer parameters           | All              |\n\n### Attributes (after fitting)\n\n| Attribute             | Description                        |\n| --------------------- | ---------------------------------- |\n| `best_estimator_`     | Trained model with best parameters |\n| `best_params_`        | Best hyperparameters found         |\n| `best_score_`         | Best cross-validation score        |\n| `study_`              | Optuna study object                |\n| `study_time_`         | Total optimization time (seconds)  |\n| `n_trials_completed_` | Number of completed trials         |\n| `classes_`            | Class labels (classifiers only)    |\n| `n_features_in_`      | Number of input features           |\n| `feature_names_in_`   | Feature names (if available)       |\n\n## Troubleshooting\n\n### Issue: \"No successful trials completed\"\n**Solution**: Increase `cv_timeout` or reduce `cv` folds:\n```python\noptimizer = Optimizer(algorithm=\"SVC\", cv_timeout=300, cv=3)\n```\n\n### Issue: CatBoost/XGBoost/LightGBM not available\n**Solution**: Install optional dependencies:\n```bash\npip install optuml[all]\n# or individually:\npip install catboost xgboost lightgbm\n```\n\n### Issue: Optimization takes too long\n**Solutions**:\n1. Use parallel CV: `n_jobs=-1`\n2. Set timeout: `timeout=600`\n3. Use early stopping: `early_stopping_patience=10`\n4. Reduce trials: `n_trials=50`\n\n### Issue: Memory errors with large datasets\n**Solutions**:\n1. Use algorithms with lower memory footprint (e.g., `LogisticRegression`, `SGDClassifier`, or `SGDRegressor`)\n2. Reduce CV folds\n\n## Best Practices\n\n1. **Start with fewer trials**: Begin with 20 to 50 trials (for example `n_trials=30`) for exploration, then increase for final optimization\n\n2. 
**Use appropriate scoring metrics**:\n   - Imbalanced classification: `\"f1_weighted\"`, `\"roc_auc\"`\n   - Regression: `\"r2\"`, `\"neg_mean_squared_error\"`\n\n3. **Enable early stopping** for large trial counts:\n   ```python\n   Optimizer(algorithm=\"XGBClassifier\", n_trials=1000, early_stopping_patience=20)\n   ```\n\n4. **Set random state** for reproducibility:\n   ```python\n   Optimizer(algorithm=\"RandomForestClassifier\", random_state=42)\n   ```\n\n5. **Use parallel cross-validation** for faster optimization:\n   ```python\n   Optimizer(algorithm=\"RandomForestClassifier\", n_jobs=-1)\n   ```\n\n## Benchmark\n\nSee [this page](benchmark/README.md) for benchmark results.\n\n## Citation\n\nIf you use OptuML in your research, please cite:\n\n```bibtex\n@software{stefaniak_optuml_2024,\n  author       = {Filip Stefaniak},\n  title        = {OptuML: Hyperparameter Optimization for Multiple Machine Learning Algorithms using Optuna},\n  year         = {2024},\n  publisher    = {Zenodo},\n  doi          = {10.5281/zenodo.17305963},\n  url          = {https://doi.org/10.5281/zenodo.17305963}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffilipspl%2Foptuml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffilipspl%2Foptuml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffilipspl%2Foptuml/lists"}