# SigOpt + scikit-learn Interfacing
[![Build Status](https://travis-ci.org/sigopt/sigopt-sklearn.svg?branch=master)](https://travis-ci.org/sigopt/sigopt-sklearn)

This package implements interfaces and wrappers for using [SigOpt](https://sigopt.com) and [scikit-learn](http://scikit-learn.org/stable/) together.

## Getting Started

Install the sigopt_sklearn Python module with `pip install sigopt_sklearn`.

Sign up for an account at [https://sigopt.com](https://sigopt.com). To use the interfaces, you'll need your API token
from the [API tokens page](https://sigopt.com/tokens).

### SigOptSearchCV

The simplest use case for SigOpt in conjunction with scikit-learn is optimizing estimator hyperparameters using
cross-validation.
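
Conceptually, every candidate configuration is scored by its mean k-fold cross-validation score, and the search looks for the configuration that maximizes that score. The sketch below computes this objective with plain scikit-learn (`cross_val_score`) and does not call SigOpt. The names `cv_objective` and `candidates` are hypothetical. The fixed candidate list only stands in for the configurations SigOpt would suggest one at a time.

```python
from sklearn import svm, datasets
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

def cv_objective(params, cv=5):
    """Mean k-fold CV accuracy of an SVC built with `params` (higher is better)."""
    model = svm.SVC(**params)
    scores = cross_val_score(model, iris.data, iris.target, cv=cv)
    return scores.mean()

# Hypothetical stand-in for optimizer suggestions; SigOptSearchCV instead
# asks the SigOpt API which configuration to evaluate next.
candidates = [
    {'kernel': 'linear', 'C': 0.5},
    {'kernel': 'rbf', 'C': 1.0},
    {'kernel': 'rbf', 'C': 50.0},
]

# Score every candidate, then keep the one with the highest mean CV score
results = [(cv_objective(p), p) for p in candidates]
best_score, best_params = max(results, key=lambda r: r[0])
print(best_params, round(best_score, 3))
```

In `SigOptSearchCV`, `cv=5` likewise sets the number of folds. The difference is that SigOpt chooses the next configuration to try instead of the search walking a fixed list.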

A short example that tunes the parameters of an SVM on a small dataset is provided below:

```python
from sklearn import svm, datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here: https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'

iris = datasets.load_iris()

# define parameter domains: a list is categorical, a (min, max) tuple is a numeric range
svc_parameters = {'kernel': ['linear', 'rbf'], 'C': (0.5, 100)}

# define sklearn estimator
svr = svm.SVC()

# define SigOptCV search strategy
clf = SigOptSearchCV(svr, svc_parameters, cv=5,
    client_token=client_token, n_jobs=5, n_iter=20)

# perform CV search for the best parameters, then fit the estimator
# on all data using the best found configuration
clf.fit(iris.data, iris.target)

# clf.predict() now uses the best found estimator
# clf.best_score_ contains the CV score for the best found estimator
# clf.best_params_ contains the best found parameter configuration
```

By default, the optimized objective is the estimator's default `score` method. A custom objective can be used
by passing the `scoring` option to the SigOptSearchCV constructor. Below is an example that uses the `f1_score`
already implemented in sklearn:

```python
from sklearn.metrics import f1_score, make_scorer

# iris has three classes, so f1_score needs a multiclass `average`;
# its default, 'binary', raises an error on multiclass targets
f1_scorer = make_scorer(f1_score, average='macro')

# define SigOptCV search strategy
clf = SigOptSearchCV(svr, svc_parameters, cv=5, scoring=f1_scorer,
    client_token=client_token, n_jobs=5, n_iter=50)

# perform CV search for best parameters
clf.fit(iris.data, iris.target)
```

### XGBoostClassifier

SigOptSearchCV also works with XGBoost's scikit-learn wrapper, `XGBClassifier`.
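
SigOptSearchCV presumably works with `XGBClassifier` because that wrapper implements scikit-learn's estimator interface. Under that interface, hyperparameters are taken in `__init__` and exposed through `get_params`/`set_params`, and the estimator provides `fit`, `predict` and `score`. This contract is what lets a search wrapper clone an estimator and try new settings. The sketch below illustrates it with a hypothetical toy `CentroidClassifier`, which is not part of sigopt_sklearn or XGBoost. Its `metric_power` hyperparameter is made up for the example. The sketch then performs one clone, reconfigure and cross-validate step by hand.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

class CentroidClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy estimator: predicts the class with the nearest mean.

    `metric_power` is a made-up hyperparameter, the p of a Minkowski distance.
    """

    def __init__(self, metric_power=2.0):
        # sklearn convention: store constructor args unchanged, validate in fit
        self.metric_power = metric_power

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # shape (n_samples, n_classes, n_features) via broadcasting
        diffs = np.abs(X[:, None, :] - self.centroids_[None, :, :])
        # the 1/p root is skipped: it does not change which centroid is nearest
        dists = (diffs ** self.metric_power).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]

X, y = load_iris(return_X_y=True)
base = CentroidClassifier()

# One step of what a search wrapper does per configuration:
# clone the template, set the suggested params, cross-validate.
candidate = clone(base).set_params(metric_power=1.0)
score = cross_val_score(candidate, X, y, cv=5).mean()
print(candidate.get_params(), round(score, 3))
```

Storing constructor arguments unmodified matters because `clone` rebuilds the estimator from `get_params()`. Any validation or derived state therefore belongs in `fit`.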

A hyperparameter search over XGBClassifier models uses the same interface:

```python
from xgboost.sklearn import XGBClassifier
from sklearn import datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here: https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'
iris = datasets.load_iris()

xgb_params = {
  'learning_rate': (0.01, 0.5),
  'n_estimators': (10, 50),
  'max_depth': (3, 10),
  'min_child_weight': (6, 12),
  'gamma': (0, 0.5),
  'subsample': (0.6, 1.0),
  'colsample_bytree': (0.6, 1.0)
}

xgbc = XGBClassifier()

clf = SigOptSearchCV(xgbc, xgb_params, cv=5,
    client_token=client_token, n_jobs=5, n_iter=70, verbose=1)

clf.fit(iris.data, iris.target)
```

### SigOptEnsembleClassifier

This class concurrently trains and tunes several classification models within sklearn to facilitate model selection
when investigating new datasets.

For this part of the library to work, install sigopt_sklearn with the `ensemble` extra, which adds the xgboost
requirement:

```
pip install sigopt_sklearn[ensemble]
```

A short example using an activity recognition dataset is provided below. We also have a video tutorial that walks through this example:

[![SigOpt scikit-learn Tutorial](http://img.youtube.com/vi/9XZ3ihE7OjM/0.jpg)](http://www.youtube.com/watch?v=9XZ3ihE7OjM "SigOpt scikit-learn Hyperparameter Optimization Tutorial")

```
# Human Activity Recognition Using Smartphones
# https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
wget https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip
unzip UCI\ HAR\ Dataset.zip
cd UCI\ HAR\ Dataset
```

```python
import numpy as np
import pandas as pd
from sigopt_sklearn.ensemble import SigOptEnsembleClassifier

def load_datafile(filename):
  X = []
  with open(filename, 'r') as f:
    for line in f:
      X.append(np.array([float(v) for v in line.split()]))
  X = np.vstack(X)
  return X

X_train = load_datafile('train/X_train.txt')
y_train = load_datafile('train/y_train.txt').ravel()
X_test = load_datafile('test/X_test.txt')
y_test = load_datafile('test/y_test.txt').ravel()

# fit and tune several classification models concurrently
# find your SigOpt client token here: https://sigopt.com/tokens
sigopt_clf = SigOptEnsembleClassifier()
sigopt_clf.parallel_fit(X_train, y_train, est_timeout=(40 * 60),
    client_token='<YOUR_CLIENT_TOKEN>')

# compare model performance on the hold-out set
ensemble_train_scores = [est.score(X_train, y_train) for est in sigopt_clf.estimator_ensemble]
ensemble_test_scores = [est.score(X_test, y_test) for est in sigopt_clf.estimator_ensemble]
names = [est.__class__.__name__ for est in sigopt_clf.estimator_ensemble]
# sort by test accuracy, then train accuracy, best first
data = sorted(zip(names, ensemble_train_scores, ensemble_test_scores),
              reverse=True, key=lambda x: (x[2], x[1]))
pd.DataFrame(data, columns=['Classifier ALGO.', 'Train ACC.', 'Test ACC.'])
```

### CV Fold Timeouts

SigOptSearchCV evaluates CV folds in parallel using joblib. Timeouts are now supported in the master
branch of joblib, and SigOpt can use this timeout information to learn to avoid hyperparameter configurations that are
too slow.

```python
from sklearn import svm, datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here: https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'
dataset = datasets.fetch_20newsgroups_vectorized()
X = dataset.data
y = dataset.target

# define parameter domains; each (min, max) tuple lists the smaller bound first
svc_parameters = {
  'kernel': ['linear', 'rbf'],
  'C': (0.5, 100),
  'max_iter': (10, 200),
  'tol': (1e-6, 1e-2)
}
svr = svm.SVC()

# SVM fitting can be quite slow, so we set a timeout of 180 seconds
# for each fit.
# SigOpt will then avoid configurations that are too slow.
clf = SigOptSearchCV(svr, svc_parameters, cv=5, opt_timeout=180,
    client_token=client_token, n_jobs=5, n_iter=40)

clf.fit(X, y)
```

### Categoricals

SigOptSearchCV supports categorical parameters specified as a list of strings, like the `kernel` parameter in the SVM example:

```python
svc_parameters = {'kernel': ['linear', 'rbf'], 'C': (0.5, 100)}
```

SigOpt also supports non-string valued categorical parameters, such as the `hidden_layer_sizes` parameter
in the MLPRegressor example below. Each option gets a string name in a dictionary that maps the name to its value:

```python
from sklearn import datasets
from sklearn.neural_network import MLPRegressor
from sigopt_sklearn.search import SigOptSearchCV

client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'
# any regression dataset works here; diabetes ships with scikit-learn
X, y = datasets.load_diabetes(return_X_y=True)

parameters = {
  'activation': ['relu', 'tanh', 'logistic'],
  'solver': ['lbfgs', 'adam'],
  'alpha': (0.0001, 0.01),
  'learning_rate_init': (0.001, 0.1),
  'power_t': (0.001, 1.0),
  'beta_1': (0.8, 0.999),
  'momentum': (0.001, 1.0),
  'beta_2': (0.8, 0.999),
  'epsilon': (0.00000001, 0.0001),
  'hidden_layer_sizes': {
    'shallow': (100,),
    'medium': (10, 10),
    'deep': (10, 10, 10, 10)
  }
}
nn = MLPRegressor()
clf = SigOptSearchCV(nn, parameters, cv=5, cv_timeout=240,
    client_token=client_token, n_jobs=5, n_iter=40)

clf.fit(X, y)
```

General Information
=========
repository: 2016-2023