{"id":13757241,"url":"https://github.com/lucfra/FAR-HO","last_synced_at":"2025-05-10T05:31:53.224Z","repository":{"id":72343561,"uuid":"113098403","full_name":"lucfra/FAR-HO","owner":"lucfra","description":"Gradient based hyperparameter optimization \u0026 meta-learning package for TensorFlow","archived":false,"fork":false,"pushed_at":"2020-03-24T12:22:57.000Z","size":517,"stargazers_count":187,"open_issues_count":2,"forks_count":47,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-11-16T13:35:01.804Z","etag":null,"topics":["gradient-descent","hyperparameter-optimization","optimization","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucfra.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-12-04T21:42:43.000Z","updated_at":"2024-11-15T23:58:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"8e1fa8cc-8bb9-44a9-817c-e6b5067a35a2","html_url":"https://github.com/lucfra/FAR-HO","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucfra%2FFAR-HO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucfra%2FFAR-HO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucfra%2FFAR-HO/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucfra%2FFAR-HO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucfra","download_url":"https://codeload.github.com/lucfra/FAR-HO/tar.gz/r
efs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253371072,"owners_count":21897998,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gradient-descent","hyperparameter-optimization","optimization","tensorflow"],"created_at":"2024-08-03T12:00:30.340Z","updated_at":"2025-05-10T05:31:52.865Z","avatar_url":"https://github.com/lucfra.png","language":"Jupyter Notebook","funding_links":[],"categories":["2.For Experiment"],"sub_categories":["Hyperparameter Tuning"],"readme":"# FAR-HO\n\nGradient-based hyperparameter optimization and meta-learning package based on [TensorFlow](https://www.tensorflow.org/)\n\nThis is the new package that implements the algorithms presented in the paper\n [_Forward and Reverse Gradient-Based Hyperparameter Optimization_](http://proceedings.mlr.press/v70/franceschi17a). For the older package see [RFHO](https://github.com/lucfra/RFHO). FAR-HO [features simplified interfaces, additional\ncapabilities and a tighter integration with `tensorflow`](https://github.com/lucfra/FAR-HO#new-features-and-differences-from-rfho). \n\n- Reverse hypergradient (`ReverseHG`), generalization of algorithms presented in Domke [2012] and MacLaurin et Al. 
[2015] (without reversible dynamics and \"reversible dtype\"); including the \"truncated reverse version\" `ReverseHG.truncated`, see [_Truncated Back-propagation for Bilevel Optimization_](https://arxiv.org/abs/1810.10667)\n- Forward hypergradient (`ForwardHG`)\n- Online versions of the two previous algorithms: Real-Time HO (RTHO) and Truncated-Reverse HO (TRHO)\n- Implicit differentiation (`ImplicitHG`), which can be used to implement the [HOAG algorithm](http://proceedings.mlr.press/v48/pedregosa16) [Pedregosa, 2016] \n\nThese algorithms compute, with different procedures, the (approximate) gradient\n  of an outer objective such as a validation error with respect \n  to the outer variables (e.g. hyperparameters). \n  We call the gradient of the outer objective the _hypergradient_.\n  The \"online\" algorithms may perform several updates of the \n  outer variables before reaching the final iteration, and are in general much faster than their \"batch\" versions. This procedure is linked to warm restart for solving the inner optimization problem, but the hypergradient is, in general, biased.\n\n**IMPORTANT NOTE:** This is not a plug-and-play hyperparameter optimization package, but rather a research package that collects some useful methods that aim at simplifying the creation of experiments in gradient-based hyperparameter optimization and related areas. Compared with other HPO packages, a more specific problem structure is required here. Furthermore, depending on the specific problem, the performance may be somewhat sensitive to algorithmic parameters. As an important example, the inner optimization dynamics *should not diverge* in order for the hypergradients to yield useful information [ Troubleshooting section coming soon! ].  \n\n**NOTE II:** In Italian FARO means beacon or lighthouse (so... no \"H\", but the \"H\" in Italian is silent!). 
\n\n![alt text](https://github.com/lucfra/RFHO/blob/master/rfho/examples/0_95_crop.png \n\"Response surface of a small neural network and optimization trajectory in the hyperparameter space.\nThe arrows depict the negative hypergradient at the current point, computed with the Forward-HG algorithm.\")\n\nThese algorithms are also useful in meta-learning, where the parameters of various _meta-learners_ effectively play the role \nof outer variables, as explained in the workshop paper \n[_A Bridge Between Hyperparameter Optimization and Learning-to-learn_](https://arxiv.org/abs/1712.06283)\nand in [_Bilevel Programming for Hyperparameter Optimization and Meta-Learning_](http://proceedings.mlr.press/v80/franceschi18a/franceschi18a.pdf).\n\nThis package is also described in the workshop paper \n[_Far-HO: A Bilevel Programming Package for Hyperparameter Optimization and Meta-Learning_](https://arxiv.org/abs/1806.04941)\npresented at [AutoML 2018](https://sites.google.com/site/automl2018icml/) at ICML.\n\n\n## Installation \u0026 Dependencies\n\nClone the repository and run the setup script.\n\n```\ngit clone https://github.com/lucfra/FAR-HO.git\ncd FAR-HO\npython setup.py install\n```\n\nBesides \"usual\" packages (`numpy`), FAR-HO is built upon `tensorflow`. \nSome examples depend on the package [`experiment_manager`](https://github.com/lucfra/ExperimentManager)\nwhile automatic dataset download (Omniglot) requires `datapackage`.\n\nPlease note that required packages will not be installed automatically.\n\n## Overview\n\nThe aim of this package is to implement and develop gradient-based hyperparameter optimization (HO) techniques in\nTensorFlow, thus making them readily applicable to deep learning systems. \nThese optimization techniques also find natural applications in the field of meta-learning and\nlearning-to-learn. \nFeel free to open issues with comments, suggestions and feedback! 
You can email me at luca.franceschi@iit.it.\n\n\n#### Quick Start \n\n- [Self contained example](https://github.com/lucfra/FAR-HO/blob/master/far_ho/examples/Example_weighted_error(and_lr_and_w0).ipynb) on MNIST with `ReverseHG` for the optimization of the starting point (initial weights), the weights of each example and the learning rate. \n- [IPython notebook](https://github.com/lucfra/FAR-HO/blob/master/far_ho/examples/autoMLDemos/Far-HO%20Demo%2C%20AutoML%202018%2C%20ICML%20workshop.ipynb)\nthat showcases the usage of `ReverseHG`, `ForwardHG` and online `ForwardHG` (RTHO algorithm) in simple settings\n- _Coming soon_: What you can and cannot do with this package.\n- [Hyper-representation](https://github.com/lucfra/FAR-HO/blob/master/far_ho/examples/hyper_representation.py) and related [notebook](https://github.com/lucfra/FAR-HO/blob/master/far_ho/examples/Hyper%20Representation_experiments.ipynb): an example in the context of learning-to-learn. In this case the hyperparameters are some of the weights of a convolutional neural network (plus the learning rate!). \nThe idea is to learn a cross-episode shared representation by explicitly minimizing the mean generalization error over meta-training tasks. See [A bridge between hyperparameter optimization and learning-to-Learn](https://arxiv.org/abs/1712.06283) presented at the [Workshop on meta-learning](http://metalearning.ml/). _Note_: for the moment, running the code for this experiment requires installing the package https://github.com/lucfra/ExperimentManager for data management and statistics recording. 
\n- See also [this experiments package](https://github.com/prolearner/hyper-representation) for code for reproducing the few-shot experiments \npresented in the ICML 2018 paper.\n\n#### Core Steps\n\n- Create a model\u003csup\u003e1\u003c/sup\u003e with TensorFlow\n- Create the hyperparameters you wish to optimize\u003csup\u003e2\u003c/sup\u003e with the function `get_hyperparameter` (these could also be variables of your model)\n- Define an inner objective (e.g. a training error) and an outer objective (e.g. a validation error) as scalar `tensorflow.Tensor`\n- Create an instance of `HyperOptimizer` after choosing a hyper-gradient computation algorithm among\n`ForwardHG`, `ReverseHG` and `ImplicitHG` (see next section)\n- Call the function `HyperOptimizer.minimize`, passing the outer and inner objectives, \nas well as an optimizer for the outer problem (which can be any optimizer from `tensorflow`) \nand an optimizer for the inner problem (which must be an optimizer contained in this package; \nat the moment gradient descent, gradient descent with momentum and Adam algorithms are available, \nbut it should be quite straightforward to implement other optimizers, email me if you're interested!) \n- Execute the `HyperOptimizer.run(T, ...)` function inside a `tensorflow.Session` \nto optimize the inner variables (parameters) and perform a step of optimization of the outer variables (hyperparameters).\n\nTwo scripts in the folder [autoMLDemos](https://github.com/lucfra/FAR-HO/tree/master/far_ho/examples/autoMLDemos) \nshowcase typical usage of this package\n\n\n```python\nimport far_ho as far\nimport tensorflow as tf\n\nmodel = create_model(...)  
\n\n# Hyperparameters (outer variables) to be optimized\nlambda1 = far.get_hyperparameter('lambda1', ...)\nlambda2 = far.get_hyperparameter('lambda2', ...)\n# io: inner objective (e.g. training error), oo: outer objective (e.g. validation error)\nio, oo = create_objective(...)\n\n# The inner optimizer must come from far_ho; its learning rate is itself a hyperparameter\ninner_problem_optimizer = far.GradientDescentOptimizer(lr=far.get_hyperparameter('lr', 0.1))\n# The outer optimizer can be any TensorFlow optimizer\nouter_problem_optimizer = tf.train.AdamOptimizer()\n\nfarho = far.HyperOptimizer() \nho_step = farho.minimize(oo, outer_problem_optimizer,\n                     io, inner_problem_optimizer)\n\nT = 100  # number of inner iterations per hyper-iteration\nwith tf.Session().as_default():\n  tf.global_variables_initializer().run()  # standard TF1 variable initialization\n  for _ in range(100):\n    ho_step(T)    \n```\n____\n\u003csup\u003e1\u003c/sup\u003e This is gradient-based optimization, and the computation\nof the hyper-gradients involves second order derivatives of the training error\n(_even though no Hessian matrix is explicitly computed at any time_);\ntherefore, all the ops used\nin the model should have a second order derivative registered in `tensorflow`.\n\n\u003csup\u003e2\u003c/sup\u003e For the hyper-gradients to make sense, hyperparameters should be \nreal-valued. Moreover, while `ReverseHG` should handle generic r-rank tensor \nhyperparameters, `ForwardHG` requires scalar hyperparameters. Use the keyword argument `scalar=True` in `get_hyperparameter` to obtain a scalar splitting of a general tensor.\n\n#### Which Algorithm Do I Choose?\n\nForward and Reverse-HG compute the same hypergradient, so\nthe choice is a matter of time versus memory!\n\n![alt text](https://github.com/lucfra/RFHO/blob/master/rfho/examples/time_memory.png \"Time vs memory requirements\")\n\nThe online versions of the algorithms can dramatically speed up the optimization.\n\n#### The Idea Behind: Hyperparameter Optimization\n\nThe objective is to minimize some validation function _E_ with respect to\n a vector of hyperparameters _lambda_. The validation error depends on the model output and thus\n on the model parameters _w_. 
\n  _w_ should be a minimizer of the training error, so the hyperparameter optimization \n  problem can be naturally formulated as a __bilevel optimization__ problem.  \n   Since these problems are rather hard to tackle, we  \nexplicitly take into account the learning dynamics used to obtain the model  \nparameters (e.g. stochastic gradient descent with momentum),\nand we formulate\nHO as a __constrained optimization__ problem. See the [paper](http://proceedings.mlr.press/v70/franceschi17a) for details.\n\n#### New features and differences from RFHO\n\n- __Simplified interface__: optimize parameters and hyperparameters with \"just\" a call of `far.HyperOptimizer.minimize`, create variables designated as hyperparameters with `far.get_hyperparameter`, no more need to vectorize the model weights, `far.optimizers` only need to specify the update as a list of pairs (v, v_{k+1})\n- __Additional capabilities__: set an initialization dynamics and optimize the (distribution of) initial weights, explicit dependence of the outer objective on the hyperparameters is allowed, support for multiple outer objectives and multiple inner problems (episode batching, averaging the sampling from distributions, ...)\n- __Tighter integration__: collections for hyperparameters and hypergradients (use `far.GraphKeys`), use out-of-the-box models (no need to vectorize the model), use any TensorFlow optimizer for the outer objective (validation error)\n- Lighter package: only code for implementing the algorithms and running the examples\n- Forward hypergradient methods have been reimplemented with a [double reverse mode trick](https://j-towns.github.io/2017/06/12/A-new-trick.html), thanks to Jamie Townsend. 
\n\n### Citing \n\nIf you use this package please cite\n\n```latex\n@InProceedings{franceschi2017forward,\n  title = \t {Forward and Reverse Gradient-Based Hyperparameter Optimization},\n  author = \t {Luca Franceschi and Michele Donini and Paolo Frasconi and Massimiliano Pontil},\n  booktitle = \t {Proceedings of the 34th International Conference on Machine Learning},\n  pages = \t {1165--1173},\n  year = \t {2017},\n  volume = \t {70},\n  series = \t {Proceedings of Machine Learning Research},\n  publisher = \t {PMLR},\n  pdf = \t {http://proceedings.mlr.press/v70/franceschi17a/franceschi17a.pdf},\n}\n```\n\n##### Works on meta-learning\n\n\n```latex\n@InProceedings{franceschi2018bilevel,\n  title = \t {Bilevel Programming for Hyperparameter Optimization and Meta-learning},\n  author = \t {Luca Franceschi and Paolo Frasconi and Saverio Salzo and Riccardo Grazzi and Massimiliano Pontil},\n  booktitle = \t {Proceedings of the 35th International Conference on Machine Learning (ICML 2018)},\n  year = \t {2018},\n  series = \t {Proceedings of Machine Learning Research},\n  publisher = \t {PMLR},\n  pdf = \t {http://proceedings.mlr.press/v80/franceschi18a/franceschi18a.pdf},\n}\n```\n\n\n```latex\n@article{franceschi2017bridge,\n  title={A Bridge Between Hyperparameter Optimization and Learning-to-learn},\n  author={Franceschi, Luca and Frasconi, Paolo and Donini, Michele and Pontil, Massimiliano},\n  journal={arXiv preprint arXiv:1712.06283},\n  year={2017}\n}\n```\n\nThis package has been used for the project [LDS-GNN](https://github.com/lucfra/LDS-GNN): the code for the ICML 2019 paper [\"Learning Discrete Structures for Graph Neural Networks\"](https://arxiv.org/abs/1903.11960).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucfra%2FFAR-HO","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucfra%2FFAR-HO","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucfra%2FFAR-HO/lists"}