{"id":13595186,"url":"https://github.com/hyperopt/hyperopt-sklearn","last_synced_at":"2025-05-13T21:07:39.131Z","repository":{"id":39616228,"uuid":"8293893","full_name":"hyperopt/hyperopt-sklearn","owner":"hyperopt","description":"Hyper-parameter optimization for sklearn","archived":false,"fork":false,"pushed_at":"2025-04-15T17:15:54.000Z","size":2388,"stargazers_count":1622,"open_issues_count":78,"forks_count":277,"subscribers_count":58,"default_branch":"master","last_synced_at":"2025-04-28T11:55:23.682Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"hyperopt.github.io/hyperopt-sklearn","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hyperopt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-02-19T16:09:53.000Z","updated_at":"2025-04-19T22:49:11.000Z","dependencies_parsed_at":"2024-09-22T07:00:48.626Z","dependency_job_id":"1fee853f-98b0-49f8-9df6-c627a0e7cc40","html_url":"https://github.com/hyperopt/hyperopt-sklearn","commit_stats":{"total_commits":390,"total_committers":28,"mean_commits":"13.928571428571429","dds":"0.45641025641025645","last_synced_commit":"4bc286479677a0bfd2178dac4546ea268b3f3b77"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyperopt%2Fhyperopt-sklearn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyperopt%2Fhyperopt-sklearn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyperopt%2Fhyperopt-sklearn/re
leases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyperopt%2Fhyperopt-sklearn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hyperopt","download_url":"https://codeload.github.com/hyperopt/hyperopt-sklearn/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251311332,"owners_count":21569008,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:01:45.474Z","updated_at":"2025-04-28T11:55:32.319Z","avatar_url":"https://github.com/hyperopt.png","language":"Python","readme":"# hyperopt-sklearn\n\n[Hyperopt-sklearn](https://github.com/hyperopt/hyperopt-sklearn) is\n[Hyperopt](https://github.com/hyperopt/hyperopt)-based model selection among machine learning algorithms in\n[scikit-learn](https://scikit-learn.org/).\n\nSee how to use hyperopt-sklearn through the [examples](http://hyperopt.github.io/hyperopt-sklearn/#documentation).\nMore examples can be found in the Example Usage section of the SciPy paper:\n\nKomer B., Bergstra J., and Eliasmith C. \"Hyperopt-Sklearn: automatic hyperparameter configuration for Scikit-learn\" Proc. SciPy 2014. 
https://proceedings.scipy.org/articles/Majora-14bd3278-006\n\n## Installation\n\nInstallation from [PyPI](https://pypi.org/project/hyperopt-sklearn) is supported using pip:\n\n    pip install hyperopt-sklearn\n    \nOptionally, you can install a specific tag, branch, or commit from the GitHub repository:\n\n    pip install git+https://github.com/hyperopt/hyperopt-sklearn@1.0.3\n    pip install git+https://github.com/hyperopt/hyperopt-sklearn@master\n    pip install git+https://github.com/hyperopt/hyperopt-sklearn@fd718c44fc440bd6e2718ec1442b1af58cafcb18\n\n## Usage\n\nIf you are familiar with sklearn, adding the hyperparameter search with hyperopt-sklearn is only a one-line change from the standard pipeline.\n\n```python\nfrom hpsklearn import HyperoptEstimator, svc\nfrom sklearn import svm\n\n# Load Data\n# ...\n\nuse_hpsklearn = True  # set to False to use the plain sklearn estimator instead\n\nif __name__ == \"__main__\":\n    if use_hpsklearn:\n        estim = HyperoptEstimator(classifier=svc(\"mySVC\"))\n    else:\n        estim = svm.SVC()\n    \n    estim.fit(X_train, y_train)\n    \n    print(estim.score(X_test, y_test))\n# \u003c\u003cshow score here\u003e\u003e\n```\n\nEach component comes with a default search space.\nThe search space for each parameter can be changed or set constant by passing in keyword arguments.\nIn the following example, the `penalty` parameter is held constant during the search, and the `loss` and `alpha` parameters have their search space modified from the default.\n\n```python\nfrom hpsklearn import HyperoptEstimator, sgd_classifier\nfrom hyperopt import hp\nimport numpy as np\n\nsgd_penalty = \"l2\"\nsgd_loss = hp.pchoice(\"loss\", [(0.50, \"hinge\"), (0.25, \"log\"), (0.25, \"huber\")])\nsgd_alpha = hp.loguniform(\"alpha\", low=np.log(1e-5), high=np.log(1))\n\nif __name__ == \"__main__\":\n    estim = HyperoptEstimator(classifier=sgd_classifier(\"my_sgd\", penalty=sgd_penalty, loss=sgd_loss, alpha=sgd_alpha))\n    estim.fit(X_train, y_train)\n```\n\nComplete example using the Iris 
dataset:\n\n```python\nfrom hpsklearn import HyperoptEstimator, any_classifier, any_preprocessing\nfrom sklearn.datasets import load_iris\nfrom hyperopt import tpe\nimport numpy as np\n\n# Load the data and split into training and test sets\n\niris = load_iris()\n\nX = iris.data\ny = iris.target\n\ntest_size = int(0.2 * len(y))\nnp.random.seed(13)\nindices = np.random.permutation(len(X))\nX_train = X[indices[:-test_size]]\ny_train = y[indices[:-test_size]]\nX_test = X[indices[-test_size:]]\ny_test = y[indices[-test_size:]]\n\n\nif __name__ == \"__main__\":\n    # Instantiate a HyperoptEstimator with the search space and number of evaluations\n    estim = HyperoptEstimator(classifier=any_classifier(\"my_clf\"),\n                              preprocessing=any_preprocessing(\"my_pre\"),\n                              algo=tpe.suggest,\n                              max_evals=100,\n                              trial_timeout=120)\n    \n    # Search the hyperparameter space based on the data\n    estim.fit(X_train, y_train)\n    \n    # Show the results\n    print(estim.score(X_test, y_test))\n    # 1.0\n    \n    print(estim.best_model())\n    # {'learner': ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',\n    #           max_depth=3, max_features='log2', max_leaf_nodes=None,\n    #           min_impurity_decrease=0.0, min_impurity_split=None,\n    #           min_samples_leaf=1, min_samples_split=2,\n    #           min_weight_fraction_leaf=0.0, n_estimators=13, n_jobs=1,\n    #           oob_score=False, random_state=1, verbose=False,\n    #           warm_start=False), 'preprocs': (), 'ex_preprocs': ()}\n```\n\nHere's an example using the scikit-learn digits dataset (a small MNIST-style dataset) and being more specific about the classifier and preprocessing.\n\n```python\nfrom hpsklearn import HyperoptEstimator, extra_tree_classifier\nfrom sklearn.datasets import load_digits\nfrom hyperopt import tpe\nimport numpy as np\n\n# Load the data and split into training and test sets\n\ndigits = 
load_digits()\n\nX = digits.data\ny = digits.target\n\ntest_size = int(0.2 * len(y))\nnp.random.seed(13)\nindices = np.random.permutation(len(X))\nX_train = X[indices[:-test_size]]\ny_train = y[indices[:-test_size]]\nX_test = X[indices[-test_size:]]\ny_test = y[indices[-test_size:]]\n\n\nif __name__ == \"__main__\":\n    # Instantiate a HyperoptEstimator with the search space and number of evaluations\n    estim = HyperoptEstimator(classifier=extra_tree_classifier(\"my_clf\"),\n                              preprocessing=[],\n                              algo=tpe.suggest,\n                              max_evals=10,\n                              trial_timeout=300)\n\n    # Search the hyperparameter space based on the data\n    estim.fit(X_train, y_train)\n\n    # Show the results\n    print(estim.score(X_test, y_test))\n    # 0.962785714286\n\n    print(estim.best_model())\n    # {'learner': ExtraTreesClassifier(bootstrap=True, class_weight=None, criterion='entropy',\n    #           max_depth=None, max_features=0.959202875857,\n    #           max_leaf_nodes=None, min_impurity_decrease=0.0,\n    #           min_impurity_split=None, min_samples_leaf=1,\n    #           min_samples_split=2, min_weight_fraction_leaf=0.0,\n    #           n_estimators=20, n_jobs=1, oob_score=False, random_state=3,\n    #           verbose=False, warm_start=False), 'preprocs': (), 'ex_preprocs': ()}\n```\n\n## Available Components\n\nAlmost all classifiers/regressors/preprocessing scikit-learn components are implemented.\nIf there is something you would like that is not yet implemented, feel free to make an issue or a pull request!\n\n### 
Classifiers\n\n```\nrandom_forest_classifier\nextra_trees_classifier\nbagging_classifier\nada_boost_classifier\ngradient_boosting_classifier\nhist_gradient_boosting_classifier\n\nbernoulli_nb\ncategorical_nb\ncomplement_nb\ngaussian_nb\nmultinomial_nb\n\nsgd_classifier\nsgd_one_class_svm\nridge_classifier\nridge_classifier_cv\npassive_aggressive_classifier\nperceptron\n\ndummy_classifier\n\ngaussian_process_classifier\n\nmlp_classifier\n\nlinear_svc\nnu_svc\nsvc\n\ndecision_tree_classifier\nextra_tree_classifier\n\nlabel_propagation\nlabel_spreading\n\nelliptic_envelope\n\nlinear_discriminant_analysis\nquadratic_discriminant_analysis\n\nbayesian_gaussian_mixture\ngaussian_mixture\n\nk_neighbors_classifier\nradius_neighbors_classifier\nnearest_centroid\n\nxgboost_classification\nlightgbm_classification\n\none_vs_rest\none_vs_one\noutput_code\n```\n\nFor a simple generic search space across many classifiers, use `any_classifier`. \nIf your data is in a sparse matrix format, use `any_sparse_classifier`.\nFor a complete search space across all possible classifiers, use `all_classifiers`.\n\n### 
Regressors\n\n```\nrandom_forest_regressor\nextra_trees_regressor\nbagging_regressor\nisolation_forest\nada_boost_regressor\ngradient_boosting_regressor\nhist_gradient_boosting_regressor\n\nlinear_regression\nbayesian_ridge\nard_regression\nlars\nlasso_lars\nlars_cv\nlasso_lars_cv\nlasso_lars_ic\nlasso\nelastic_net\nlasso_cv\nelastic_net_cv\nmulti_task_lasso\nmulti_task_elastic_net\nmulti_task_lasso_cv\nmulti_task_elastic_net_cv\npoisson_regressor\ngamma_regressor\ntweedie_regressor\nhuber_regressor\nsgd_regressor\nridge\nridge_cv\nlogistic_regression\nlogistic_regression_cv\northogonal_matching_pursuit\northogonal_matching_pursuit_cv\npassive_aggressive_regressor\nquantile_regression\nransac_regression\ntheil_sen_regressor\n\ndummy_regressor\n\ngaussian_process_regressor\n\nmlp_regressor\n\ncca\npls_canonical\npls_regression\n\nlinear_svr\nnu_svr\none_class_svm\nsvr\n\ndecision_tree_regressor\nextra_tree_regressor\n\ntransformed_target_regressor\n\nhp_sklearn_kernel_ridge\n\nbayesian_gaussian_mixture\ngaussian_mixture\n\nk_neighbors_regressor\nradius_neighbors_regressor\n\nk_means\nmini_batch_k_means\n\nxgboost_regression\n\nlightgbm_regression\n```\n\nFor a simple generic search space across many regressors, use `any_regressor`. \nIf your data is in a sparse matrix format, use `any_sparse_regressor`. 
\nFor a complete search space across all possible regressors, use `all_regressors`.\n\n### Preprocessing\n\n```\nbinarizer\nmin_max_scaler\nmax_abs_scaler\nnormalizer\nrobust_scaler\nstandard_scaler\nquantile_transformer\npower_transformer\none_hot_encoder\nordinal_encoder\npolynomial_features\nspline_transformer\nk_bins_discretizer\n\ntfidf_vectorizer\nhashing_vectorizer\ncount_vectorizer\n\npca\n\nts_lagselector\n\ncolkmeans\n```\n\nFor a simple generic search space across many preprocessing algorithms, use `any_preprocessing`.\nIf your data is in a sparse matrix format, use `any_sparse_preprocessing`.\nFor a complete search space across all preprocessing algorithms, use `all_preprocessing`.\nIf you are working with raw text data, use `any_text_preprocessing`.\nCurrently, only TFIDF is used for text, but more may be added in the future.\n\nNote that the `preprocessing` parameter in `HyperoptEstimator` is expecting a list, since various preprocessing steps can be chained together.\nThe generic search space functions `any_preprocessing` and `any_text_preprocessing` already return a list, but the others do not, so they should be wrapped in a list.\nIf you do not want to do any preprocessing, pass in an empty list `[]`.\n","funding_links":[],"categories":["Python","Optimization","Uncategorized","Libraries","Tools and projects","Machine Learning Frameworks"],"sub_categories":["NLP","Others","General-Purpose Machine Learning","Uncategorized","LLM"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyperopt%2Fhyperopt-sklearn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhyperopt%2Fhyperopt-sklearn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyperopt%2Fhyperopt-sklearn/lists"}