{"id":14989238,"url":"https://github.com/gngdb/pytorch-minimize","last_synced_at":"2025-04-15T21:48:15.812Z","repository":{"id":54365805,"uuid":"340420077","full_name":"gngdb/pytorch-minimize","owner":"gngdb","description":"Use scipy.optimize.minimize as a PyTorch Optimizer.","archived":false,"fork":false,"pushed_at":"2024-07-25T15:02:33.000Z","size":2176,"stargazers_count":72,"open_issues_count":1,"forks_count":11,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-15T21:48:10.430Z","etag":null,"topics":["python","pytorch","scipy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gngdb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-02-19T16:03:03.000Z","updated_at":"2025-03-27T09:14:51.000Z","dependencies_parsed_at":"2024-09-25T00:34:29.151Z","dependency_job_id":null,"html_url":"https://github.com/gngdb/pytorch-minimize","commit_stats":{"total_commits":66,"total_committers":2,"mean_commits":33.0,"dds":0.2727272727272727,"last_synced_commit":"01ce582f90b49b638cf77c88e75dd3868b5f3f95"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gngdb%2Fpytorch-minimize","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gngdb%2Fpytorch-minimize/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gngdb%2Fpytorch-minimize/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/gngdb%2Fpytorch-minimize/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gngdb","download_url":"https://codeload.github.com/gngdb/pytorch-minimize/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249161104,"owners_count":21222468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","pytorch","scipy"],"created_at":"2024-09-24T14:17:54.972Z","updated_at":"2025-04-15T21:48:15.791Z","avatar_url":"https://github.com/gngdb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003e [!IMPORTANT]\n\u003e This project is not in active development. [Functional updates to PyTorch would\n\u003e make everything here cleaner and more reliable][func_issue]. 
Also, I haven't\n\u003e tried it, but [`rfeinman`'s re-implementation][rfeinman] of\n\u003e `scipy.optimize` in PyTorch may be what you're looking for, as it should have\n\u003e the same functionality as this project in most cases.\n\n[func_issue]: https://github.com/gngdb/pytorch-minimize/issues/3\n\nPyTorch Minimize\n================\n\nA wrapper for [`scipy.optimize.minimize`][scipy] that makes it a PyTorch\nOptimizer implementing Conjugate Gradients, BFGS, L-BFGS, SLSQP, Newton\nConjugate Gradient, Trust Region methods and others.\n\n*Warning*: this project is a proof of concept and is not necessarily\nreliable, although [the code](./pytorch_minimize/optim.py) (that's all of\nit) is small enough to be readable.\n\n* [Quickstart](#quickstart)\n  * [Install](#install)\n  * [Using The Optimizer](#using-the-optimizer)\n* [Which Algorithms Are Supported?](#which-algorithms-are-supported)\n  * [Methods that require Hessian evaluations](#methods-that-require-hessian-evaluations)\n  * [Algorithms without gradients](#algorithms-without-gradients)\n  * [Algorithms you can choose but don't work](#algorithms-you-can-choose-but-dont-work)\n* [Global Optimizers](#global-optimizers)\n* [How Does it Work?](#how-does-it-work)\n  * [Other Implementations](#other-implementations)\n  * [Pure PyTorch Minimization](#pure-pytorch-minimization)\n* [How Does This Evaluate the Hessian?](#how-does-this-evaluate-the-hessian)\n* [Credits](#credits)\n\nQuickstart\n----------\n\n### Install\n\nDependencies:\n\n* `pytorch`\n* `scipy`\n\nThe install procedure below does not check that these are installed.\n\nThis package can be installed with `pip` directly from GitHub:\n\n```\npip install git+https://github.com/gngdb/pytorch-minimize.git\n```\n\nOr by cloning the repository and then installing:\n\n```\ngit clone https://github.com/gngdb/pytorch-minimize.git\ncd pytorch-minimize\npython -m pip install .\n```\n\n### Using The Optimizer\n\nThe Optimizer class is `MinimizeWrapper` in `pytorch_minimize.optim`.  
It\nhas the same interface as a [PyTorch Optimizer][optimizer], taking\n`model.parameters()`, and is configured by passing a dictionary of\narguments, here called `minimizer_args`, that will later be passed to\n[`scipy.optimize.minimize`][scipy]:\n\n```\nfrom pytorch_minimize.optim import MinimizeWrapper\nminimizer_args = dict(method='CG', options={'disp':True, 'maxiter':100})\noptimizer = MinimizeWrapper(model.parameters(), minimizer_args)\n```\n\nThe main difference between this optimizer and most PyTorch optimizers\nis that a [closure][] must be defined ([`torch.optim.LBFGS`][torch_lbfgs]\nalso requires this):\n\n```\n# model, input, target and loss_fn are assumed to be defined already\ndef closure():\n    optimizer.zero_grad()\n    output = model(input)\n    loss = loss_fn(output, target)\n    loss.backward()\n    return loss\noptimizer.step(closure)\n```\n\nThis optimizer is intended for **deterministic optimization problems**,\nsuch as [full batch learning problems][batch]. Because of this,\n`optimizer.step(closure)` should only need to be called **once**.\n\nCan `.step(closure)` be called more than once? Technically yes, but it\nshouldn't be necessary: multiple steps are run internally, up to the\n`maxiter` option in `minimizer_args`, and multiple calls are not\nrecommended. 
Each call to `optimizer.step(closure)` is an independent\ncall to `scipy.optimize.minimize`, so any internal state the\noptimization algorithm has built up is discarded between calls.\n\n[torch_lbfgs]: https://pytorch.org/docs/stable/optim.html#torch.optim.LBFGS\n\n\nWhich Algorithms Are Supported?\n-------------------------------\n\nUsing PyTorch to calculate the Jacobian, the following algorithms are\nsupported:\n\n* [Conjugate Gradients][conjugate]: `'CG'`\n* [Broyden-Fletcher-Goldfarb-Shanno (BFGS)][bfgs]: `'BFGS'`\n* [Limited-memory BFGS][lbfgs]: `'L-BFGS-B'` but **requires double precision**:\n    * the `nn.Module` containing the parameters must be cast to double, for example:\n`model = model.double()`\n* [Sequential Least Squares Programming][slsqp]: `'SLSQP'`\n* [Truncated Newton][tnc]: `'TNC'` but **also requires double precision**\n\nThe string after each algorithm name is the corresponding `method` name\nused by [scipy.optimize.minimize][scipy].\n\n### Methods that require Hessian evaluations\n\n**Warning**: this is experimental and probably unpredictable.\n\nTo use the methods that require evaluating the Hessian, a `Closure` object\nwith the following methods is required (full MNIST example\n[here](./mnist/hessian_logistic_regression.py)):\n\n```\nclass Closure():\n    def __init__(self, model):\n        self.model = model\n\n    @staticmethod\n    def loss(model):\n        # static so the loss can also be evaluated on a copy of the model\n        # when the Hessian is computed; data, target and loss_fn are\n        # assumed to be defined elsewhere\n        output = model(data)\n        return loss_fn(output, target)\n\n    def __call__(self):\n        optimizer.zero_grad()\n        loss = self.loss(self.model)\n        loss.backward()\n        return loss\nclosure = Closure(model)\n```\n\nThe following methods can then be used:\n\n* [Newton Conjugate Gradient](https://youtu.be/0qUAb94CpOw?t=30m41s): `'Newton-CG'`\n* [Newton Conjugate Gradient Trust-Region][trust]: `'trust-ncg'`\n* [Krylov Subspace Trust-Region][krylov]: `'trust-krylov'`\n* [Nearly Exact Trust-Region][trust]: `'trust-exact'`\n* [Constrained Trust-Region][trust]: `'trust-constr'`\n\nThe code 
contains hacks to make it possible to call\n[torch.autograd.functional.hessian][torchhessian] (which PyTorch itself\nonly provides as a beta feature).\n\n### Algorithms without gradients\n\nIf using the `scipy.optimize.minimize` algorithms that don't require\ngradients (such as `'Nelder-Mead'`, `'COBYLA'` or `'Powell'`), ensure that\n`minimizer_args['jac'] = False` when instantiating `MinimizeWrapper`.\n\n### Algorithms you can choose but don't work\n\nThe following algorithms either failed to converge on a toy problem or\nraised errors when I tested them. You can still select them, but they may\nnot work:\n\n* [Dogleg][]: `'dogleg'`\n\nAll the other methods that require gradients converged on a toy problem\nthat is tested in Travis-CI.\n\nGlobal Optimizers\n-----------------\n\nThere are a few [global optimization algorithms in\n`scipy.optimize`][global]. The following are supported via their own\nwrapper classes:\n\n* Basin Hopping via `BasinHoppingWrapper`\n* Differential Evolution via `DifferentialEvolutionWrapper`\n* Simplicial Homology Global Optimization via `SHGOWrapper`\n* Dual Annealing via `DualAnnealingWrapper`\n\nAn example of how to use one of these wrappers:\n\n```\nfrom pytorch_minimize.optim import BasinHoppingWrapper\nminimizer_args = dict(method='CG', options={'disp':True, 'maxiter':100})\nbasinhopping_kwargs = dict(niter=200)\noptimizer = BasinHoppingWrapper(model.parameters(), minimizer_args, basinhopping_kwargs)\n```\n\nThese are also illustrated in [this Colab notebook][colab], where the\nfollowing plots were generated:\n\n![Basin Hopping](images/rastrigin_BasinHoppingWrapper.png)\n\n![Differential Evolution](images/rastrigin_DifferentialEvolutionWrapper.png)\n\n![Dual Annealing](images/rastrigin_DualAnnealingWrapper.png)\n\n![Simplicial Homology Global Optimization](images/rastrigin_SHGOWrapper.png)\n\n[colab]: https://colab.research.google.com/drive/19hZSxw3ZT3IgWGD9ZOuOYryeJoOGenJU?usp=sharing\n[global]: 
https://docs.scipy.org/doc/scipy/reference/optimize.html#global-optimization\n\nHow Does it Work?\n-----------------\n\n[`scipy.optimize.minimize`][scipy] expects a function `fun` that\nreturns a scalar and an array of gradients the same size as the initial\ninput array `x0` (the form it accepts when `jac=True`). To accommodate this,\n`MinimizeWrapper` does the following:\n\n1. Create a wrapper function that will be passed as `fun`\n2. In that function:\n    1. Unpack the Numpy array into parameter tensors\n    2. Substitute each parameter in place with these tensors\n    3. Evaluate `closure`, which will now use these parameter values\n    4. Extract the gradients\n    5. Pack the gradients back into one 1D Numpy array\n    6. Return the loss value and the gradient array\n\nThen, all that's left is to call `scipy.optimize.minimize` and unpack the\noptimal parameters found back into the model.\n\nThis procedure involves unpacking and packing arrays, along with moving\nback and forth between Numpy and PyTorch, which may incur some overhead. 
I\nhaven't profiled it to find out whether this is a significant cost, but\noptimizing a logistic regression on MNIST by conjugate gradients completes\nin seconds.\n\n### Other Implementations\n\nThere are a few other projects that incorporate `scipy.optimize` and\nPyTorch:\n\n* [This gist][mygist], which I wrote in 2018 and then forgot about, creates an\nObjective object to pass into `scipy.optimize` but packs the arrays and\ngradients in approximately the same way.\n* [BoTorch's `gen_candidates_scipy`][botorch] wraps\n`scipy.optimize.minimize` and uses it to optimize acquisition functions as\npart of Bayesian Optimization.\n* [autograd-minimize][agmin] wraps the `minimize` function itself, allowing\nPyTorch or TensorFlow objectives to be passed directly to a function with\nthe same interface as `scipy.optimize.minimize`.\n\n[agmin]: https://github.com/brunorigal/autograd-minimize\n[botorch]: https://github.com/pytorch/botorch/blob/main/botorch/generation/gen.py\n[mygist]: https://gist.github.com/gngdb/a9f912df362a85b37c730154ef3c294b\n\n### Pure PyTorch Minimization\n\n`rfeinman` has implemented some of the algorithms available in `scipy.optimize`\nin a repository with [the same name as this repository][rfeinman]. 
That\nimplementation is much more efficient and avoids converting between\n32- and 64-bit floats when moving data between Numpy and PyTorch.\n\nThat repository also contains [a wrapper around scipy.optimize.minimize][rfeinmanwrapper].\n\n[rfeinman]: https://github.com/rfeinman/pytorch-minimize\n[rfeinmanwrapper]: https://github.com/rfeinman/pytorch-minimize/blob/15742bbc17999976e7e3268c9181dadad772698b/torchmin/optim/scipy_minimizer.py#L93-L291\n\nHow Does This Evaluate the Hessian?\n-----------------------------------\n\nTo evaluate the Hessian in PyTorch,\n[`torch.autograd.functional.hessian`][torchhessian] takes two arguments:\n\n* `func`: function that returns a scalar\n* `inputs`: variables to take the derivative with respect to\n\nIn most PyTorch code, the tensors we would want to pass as `inputs` are\nparameters embedded in the Modules that make up the `model`. They can't be\npassed as `inputs` because we typically don't have a `func` that takes the\nparameters as input, builds a network from them and then produces a scalar\noutput.\n\nAccording to a [discussion on the PyTorch forum][forum], the only way to\ncompute these derivatives with respect to the parameters is to monkey patch\n`inputs` into the model and then calculate the loss. I wrote a [recursive\nmonkey patch][monkey] that operates on a [deepcopy][] of the original\n`model`. Because this copies everything in the model, it's not very\nefficient.\n\nThe function passed to `scipy.optimize.minimize` as `hess` does the\nfollowing:\n\n1. [`copy.deepcopy`][deepcopy] the entire `model` Module\n2. Input `x` is a Numpy array, so cast it to a float32 tensor and set\n`requires_grad`\n3. Define a function `f` that unpacks this 1D array into parameter\ntensors\n    * [Recursively navigate][re_attr] the module object\n        - Deleting all existing parameters\n        - Replacing them with unpacked parameters from step 2\n    * Calculate the loss using the static method stored in the `closure` object\n4. 
Pass `f` and `x` to `torch.autograd.functional.hessian`, then cast the\nresult back into a Numpy array\n\nCredits\n-------\n\nIf you use this in your work, please cite this repository using the\nfollowing BibTeX entry, along with [Numpy][numpycite], [Scipy][scipycite]\nand [PyTorch][pytorchcite].\n\n```\n@misc{gray2021minimize,\n  author = {Gray, Gavia},\n  title = {PyTorch Minimize},\n  year = {2021},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/gngdb/pytorch-minimize}}\n}\n```\n\nThis package was created with [Cookiecutter][] and the\n[`audreyr/cookiecutter-pypackage`][audreyr] project template.\n\n[pytorchcite]: https://github.com/pytorch/pytorch/blob/master/CITATION\n[numpycite]: https://www.scipy.org/citing.html#numpy\n[scipycite]: https://www.scipy.org/citing.html#scipy-the-library\n[re_attr]: https://stackoverflow.com/a/31174427/6937913\n[deepcopy]: https://docs.python.org/3/library/copy.html#copy.deepcopy\n[monkey]: https://github.com/gngdb/pytorch-minimize/blob/master/pytorch_minimize/optim.py#L106-L122\n[forum]: https://discuss.pytorch.org/t/using-autograd-functional-jacobian-hessian-with-respect-to-nn-module-parameters/103994/3\n[dogleg]: https://en.wikipedia.org/wiki/Powell%27s_dog_leg_method\n[tnc]: https://en.wikipedia.org/wiki/Truncated_Newton_method\n[krylov]: https://epubs.siam.org/doi/abs/10.1137/1.9780898719857.ch5\n[trust]: https://en.wikipedia.org/wiki/Trust_region\n[torchhessian]: https://pytorch.org/docs/stable/autograd.html#torch.autograd.functional.hessian\n[slsqp]: https://en.wikipedia.org/wiki/Sequential_quadratic_programming\n[conjugate]: https://en.wikipedia.org/wiki/Conjugate_gradient_method\n[lbfgs]: https://en.wikipedia.org/wiki/Limited-memory_BFGS\n[bfgs]: https://en.wikipedia.org/wiki/Broyden%E2%80%93Fletcher%E2%80%93Goldfarb%E2%80%93Shanno_algorithm\n[batch]: https://towardsdatascience.com/batch-mini-batch-stochastic-gradient-descent-7a62ecba642a\n[closure]: 
https://pytorch.org/docs/stable/optim.html#optimizer-step-closure\n[optimizer]: https://pytorch.org/docs/stable/optim.html\n[scipy]: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html\n[Cookiecutter]: https://github.com/audreyr/cookiecutter\n[audreyr]: https://github.com/audreyr/cookiecutter-pypackage\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgngdb%2Fpytorch-minimize","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgngdb%2Fpytorch-minimize","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgngdb%2Fpytorch-minimize/lists"}