{"id":18336046,"url":"https://github.com/maxim5/hyper-engine","last_synced_at":"2025-04-06T04:34:53.287Z","repository":{"id":62569943,"uuid":"79011885","full_name":"maxim5/hyper-engine","owner":"maxim5","description":"Python library for Bayesian hyper-parameters optimization","archived":false,"fork":false,"pushed_at":"2018-08-28T07:21:11.000Z","size":409,"stargazers_count":87,"open_issues_count":9,"forks_count":22,"subscribers_count":14,"default_branch":"master","last_synced_at":"2024-10-03T08:37:41.986Z","etag":null,"topics":["bayesian-optimization","big-data","convolutional-neural-networks","data-science","deep-learning","gaussian-processes","hyperparameter-optimization","machine-learning","model-selection","neural-network","optimization-algorithms","python","random-search","tensorflow"],"latest_commit_sha":null,"homepage":"https://pypi.python.org/pypi/hyperengine","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxim5.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-01-15T07:49:46.000Z","updated_at":"2024-09-24T05:00:12.000Z","dependencies_parsed_at":"2022-11-03T18:47:57.705Z","dependency_job_id":null,"html_url":"https://github.com/maxim5/hyper-engine","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim5%2Fhyper-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim5%2Fhyper-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim5%2Fhyper-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim5%2Fhyper-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maxim5","download_url":"https://codeload.github.com/maxim5/hyper-engine/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223238121,"owners_count":17111359,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-optimization","big-data","convolutional-neural-networks","data-science","deep-learning","gaussian-processes","hyperparameter-optimization","machine-learning","model-selection","neural-network","optimization-algorithms","python","random-search","tensorflow"],"created_at":"2024-11-05T20:05:44.987Z","updated_at":"2024-11-05T20:05:45.505Z","avatar_url":"https://github.com/maxim5.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"============================================\nHyper-parameters Tuning for Machine Learning\n============================================\n\n- `Overview \u003c#overview\u003e`__\n    - `About \u003c#about\u003e`__\n    - `Installation \u003c#installation\u003e`__\n    - `How to use \u003c#how-to-use\u003e`__\n- `Features \u003c#features\u003e`__\n    - `Straight-forward specification \u003c#straight-forward-specification\u003e`__\n    - `Exploration-exploitation trade-off \u003c#exploration-exploitation-trade-off\u003e`__\n    - `Learning Curve Estimation \u003c#learning-curve-estimation\u003e`__\n- `Bayesian Optimization \u003c#bayesian-optimization\u003e`__\n\n--------\nOverview\n--------\n\nAbout\n=====\n\n*HyperEngine* is a toolbox for `model selection and hyper-parameters tuning \u003chttps://en.wikipedia.org/wiki/Hyperparameter_optimization\u003e`__.\nIt aims to provide most state-of-the-art techniques via intuitive API and with minimum dependencies.\n*HyperEngine* is **not a framework**, which means it doesn't enforce any structure or design to the main code,\nthus making integration local and non-intrusive.\n\nInstallation\n============\n\n.. code-block:: shell\n\n    pip install hyperengine\n\nDependencies:\n\n-  ``six``, ``numpy``, ``scipy``\n-  ``tensorflow`` (optional)\n-  ``matplotlib`` (optional, only for development)\n\nCompatibility:\n\n.. image:: https://travis-ci.org/maxim5/hyper-engine.svg?branch=master\n    :target: https://travis-ci.org/maxim5/hyper-engine\n\n-  Python 2.7, 3.5, 3.6\n\nLicense:\n\n- `Apache 2.0 \u003cLICENSE\u003e`__\n\n*HyperEngine* is designed to be ML-platform agnostic, but currently provides only simple `TensorFlow \u003chttps://github.com/tensorflow/tensorflow\u003e`__ binding.\n\nHow to use\n==========\n\nAdapting your code to *HyperEngine* usually boils down to migrating hard-coded hyper-parameters to a dictionary (or an object)\nand giving names to particular tensors.\n\n**Before:**\n\n.. code-block:: python\n\n    def my_model():\n      x = tf.placeholder(...)\n      y = tf.placeholder(...)\n      ...\n      optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)\n      ...\n\n**After:**\n\n.. code-block:: python\n\n    def my_model(params):\n      x = tf.placeholder(..., name='input')\n      y = tf.placeholder(..., name='label')\n      ...\n      optimizer = tf.train.GradientDescentOptimizer(learning_rate=params['learning_rate'])\n      ...\n\n    # Now can run the model with any set of hyper-parameters\n\n\nThe rest of the integration code is isolated and can be placed in the ``main`` script.\nSee the examples of hyper-parameter tuning in `examples \u003chyperengine/examples\u003e`__ package.\n\n--------\nFeatures\n--------\n\nStraight-forward specification\n==============================\n\nThe crucial part of hyper-parameter tuning is the definition of a *domain*\nover which the engine is going to optimize the model. Some variables are continuous (e.g., the learning rate),\nsome variables are integer values in a certain range (e.g., the number of hidden units), some variables are categorical\nand represent architecture knobs (e.g., the choice of non-linearity).\n\nYou can define all these variables and their ranges in ``numpy``-like fashion:\n\n.. code-block:: python\n\n    hyper_params_spec = {\n      'optimizer': {\n        'learning_rate': 10**spec.uniform(-3, -1),          # makes the continuous range [0.1, 0.001]\n        'epsilon': 1e-8,                                    # constants work too\n      },\n      'conv': {\n        'filters': [[3, 3, spec.choice(range(32, 48))],     # an integer between [32, 48]\n                    [3, 3, spec.choice(range(64, 96))],     # an integer between [64, 96]\n                    [3, 3, spec.choice(range(128, 192))]],  # an integer between [128, 192]\n        'activation': spec.choice(['relu','prelu','elu']),  # a categorical range: 1 of 3 activations\n        'down_sample': {\n          'size': [2, 2],\n          'pooling': spec.choice(['max_pool', 'avg_pool'])  # a categorical range: 1 of 2 pooling methods\n        },\n        'residual': spec.random_bool(),                     # either True or False\n        'dropout': spec.uniform(0.75, 1.0),                 # a uniform continuous range\n      },\n    }\n\nNote that ``10**spec.uniform(-3, -1)`` is not the same *distribution* as ``spec.uniform(0.001, 0.1)``\n(though they both define the same *range* of values).\nIn the first case, the whole logarithmic spectrum ``(-3, -1)`` is equally probable, while in\nthe second case, small values around ``0.001`` are much less likely than the values around the mean ``0.0495``.\nSpecifying the following domain range for the learning rate - ``spec.uniform(0.001, 0.1)`` - will likely skew the results\ntowards higher learning rates. This outlines the importance of random variable transformations and arithmetic operations.\n\nExploration-exploitation trade-off\n==================================\n\nMachine learning model selection is expensive.\nEach model evaluation requires full training from scratch and may take minutes to hours to days, \ndepending on the problem complexity and available computational resources.\n*HyperEngine* provides the algorithm to explore the space of parameters efficiently, focus on the most promising areas,\nthus converge to the maximum as fast as possible.\n\n**Example 1**: the true function is 1-dimensional, ``f(x) = x * sin(x)`` (black curve) on [-10, 10] interval.\nRed dots represent each trial, red curve is the `Gaussian Process \u003chttps://en.wikipedia.org/wiki/Gaussian_process\u003e`__ mean,\nblue curve is the mean plus or minus one standard deviation.\nThe optimizer randomly chose the negative mode as more promising.\n\n.. image:: /.images/figure_1.png\n    :width: 80%\n    :alt: 1D Bayesian Optimization\n    :align: center\n\n**Example 2**: the 2-dimensional function ``f(x, y) = (x + y) / ((x - 1) ** 2 - sin(y) + 2)`` (black surface) on [0,9]x[0,9] square.\nRed dots represent each trial, the Gaussian Process mean and standard deviations are not shown for simplicity.\nNote that to achieve the maximum both variables must be picked accurately.\n\n.. image:: /.images/figure_2-1.png\n   :width: 100%\n   :alt: 2D Bayesian Optimization\n   :align: center\n\n.. image:: /.images/figure_2-2.png\n   :width: 100%\n   :alt: 2D Bayesian Optimization\n   :align: center\n\nThe code for these and others examples is `here \u003chttps://github.com/maxim5/hyper-engine/blob/master/hyperengine/tests/strategy_test.py\u003e`__.\n\nLearning Curve Estimation\n=========================\n\n*HyperEngine* can monitor the model performance during the training and stop early if it's learning too slowly.\nThis is done via *learning curve prediction*. Note that this technique is compatible with Bayesian Optimization, since\nit estimates the model accuracy after full training - this value can be safely used to update Gaussian Process parameters.\n\nExample code:\n\n.. code-block:: python\n\n    curve_params = {\n      'burn_in': 30,                # burn-in period: 30 models \n      'min_input_size': 5,          # start predicting after 5 epochs\n      'value_limit': 0.80,          # stop if the estimate is less than 80% with high probability\n    }\n    curve_predictor = LinearCurvePredictor(**curve_params)\n\nCurrently there is only one implementation of the predictor, ``LinearCurvePredictor``, \nwhich is very efficient, but requires relatively large burn-in period to predict model accuracy without flaws.\n\nNote that learning curves can be reused between different models and works quite well for the burn-in,\nso it's recommended to serialize and load curve data via ``io_save_dir`` and ``io_load_dir`` parameters.\n\nSee also the following paper:\n`Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks\nby Extrapolation of Learning Curves \u003chttp://aad.informatik.uni-freiburg.de/papers/15-IJCAI-Extrapolation_of_Learning_Curves.pdf\u003e`__\n\n---------------------\nBayesian Optimization\n---------------------\n\nImplements the following `methods \u003chttps://en.wikipedia.org/wiki/Bayesian_optimization\u003e`__:\n\n-  Probability of improvement (See H. J. Kushner. A new method of locating the maximum of an arbitrary multipeak curve in the presence of noise. J. Basic Engineering, 86:97–106, 1964.)\n-  Expected Improvement (See J. Mockus, V. Tiesis, and A. Zilinskas. Toward Global Optimization, volume 2, chapter The Application of Bayesian Methods for Seeking the Extremum, pages 117–128. Elsevier, 1978)\n-  `Upper Confidence Bound \u003chttp://www.jmlr.org/papers/volume3/auer02a/auer02a.pdf\u003e`__\n-  `Mixed / Portfolio strategy \u003chttp://mlg.eng.cam.ac.uk/hoffmanm/papers/hoffman:2011.pdf\u003e`__\n-  Naive random search.\n\nPI method prefers exploitation to exploration, UCB is the opposite. One of the best strategies we've seen is a mixed one:\nstart with high probability of UCB and gradually decrease it, increasing PI probability.\n\nDefault kernel function used is `RBF kernel \u003chttps://en.wikipedia.org/wiki/Radial_basis_function_kernel\u003e`__, but it is extensible.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxim5%2Fhyper-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxim5%2Fhyper-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxim5%2Fhyper-engine/lists"}