# `PyTorchHessianFree`

Here, we provide a PyTorch implementation of the Hessian-free optimizer as
described in [1] and [2] (see below).
This project is still under development, so changes may be made at any time.

**Core idea:** At each step, the optimizer computes a local quadratic
approximation of the target function and uses the conjugate gradient (cg) method
to approximate its minimizer (the Newton step). The cg method only requires
matrix-vector products with the curvature matrix, which can be computed without
ever storing this matrix in memory explicitly. This makes the Hessian-free
optimizer applicable to large problems with high-dimensional parameter spaces
(e.g. training neural networks).

**Credits:** The `pytorch-hessianfree`
[repo](https://github.com/fmeirinhos/pytorch-hessianfree/blob/master/hessianfree.py)
by GitHub user `fmeirinhos` served as a starting point. For the matrix-vector
products with the Hessian or the generalized Gauss-Newton matrix (GGN), we use
functionality from the BackPACK [package](https://backpack.pt/) [3].

**Table of contents:**
1. [Installation instructions](#installation)
2. [Example](#example)
3. [Structure of this repo](#structure)
4. [Implementation details](#details)
5. [Contributing](#contributing)
6. [References](#references)

---

## 1. Installation instructions <a name="installation"></a>

To use the optimizer, clone the repo from GitHub via
`git clone https://github.com/ltatzel/PyTorchHessianFree.git`, navigate to the
project folder via `cd PyTorchHessianFree`, and install it with `pip install -e .`.
Alternatively, install the package directly from GitHub via
`pip install git+https://github.com/ltatzel/PyTorchHessianFree.git@main`.

Additional requirements for the **tests, examples and dev** can be installed via
`pip install -e ".[tests]"`, `pip install -e ".[examples]"` or
`pip install -e ".[dev]"`, respectively. To run the tests, execute `pytest` from
the repo's root directory.


## 2. Example <a name="example"></a>

```python
"""A minimal working example using the `HessianFree` optimizer on a small neural
network and some dummy data.
"""

import torch

from hessianfree.optimizer import HessianFree

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
BATCH_SIZE = 16
DIM = 10

if __name__ == "__main__":

    # Set up model, loss-function and optimizer
    model = torch.nn.Sequential(
        torch.nn.Linear(DIM, DIM, bias=False),
        torch.nn.ReLU(),
        torch.nn.Linear(DIM, DIM),
    ).to(DEVICE)
    loss_function = torch.nn.MSELoss()
    opt = HessianFree(model.parameters(), verbose=True)

    # Training
    for step_idx in range(5):

        # Sample dummy data, define the `forward`-function
        inputs = torch.rand(BATCH_SIZE, DIM).to(DEVICE)
        targets = torch.rand(BATCH_SIZE, DIM).to(DEVICE)

        def forward():
            outputs = model(inputs)
            loss = loss_function(outputs, targets)
            return loss, outputs

        # Update the model's parameters
        opt.step(forward=forward)
```


## 3. Structure of this repo <a name="structure"></a>

The repo contains three folders:
- `hessianfree`: This folder contains all the optimizer's components (e.g. the
  line search, the cg method and preconditioners). The **Hessian-free
  optimizer** itself is implemented in the `optimizer.py` file.
- `tests`: Here, we **test** the functionality implemented in `hessianfree`.
- `examples`: This folder contains a few **basic examples** demonstrating how to
  use the optimizer for training neural networks (using the `step` and
  `acc_step` methods) and for optimizing deterministic functions (e.g. the
  Rosenbrock function).


## 4. Implementation details <a name="details"></a>

- **Hessian & GGN:** Our implementation allows using either the Hessian matrix
  or the GGN as curvature matrix via the argument `curvature_opt` to the
  optimizer's constructor. As recommended in [1, Section 4.2] and [2, e.g. p.
  10], the default is the symmetric positive semidefinite GGN. For the
  matrix-vector products with these matrices, we use functionality from the
  BackPACK package [3].

- **Damping:** As described in [1, Section 4.1], Tikhonov damping can be used
  to avoid overly large steps. Our implementation also features the
  Levenberg-Marquardt-style heuristic for adjusting the damping parameter; it
  can be turned on and off via the `adapt_damping` switch.

- **PCG:** Our implementation of the preconditioned conjugate gradient method
  features the termination criterion presented in [1, Section 4.4] via the
  argument `martens_conv_crit`.

  As suggested in [1, Section 4.5], we use the cg "solution" from the last step
  as the starting point for the next one. Via the argument `cg_decay_x0` to the
  optimizer's constructor, this starting point can be scaled by a constant
  factor. The default is `0.95`, as in [2, Section 10].

  The `get_preconditioner` method implements the preconditioner suggested in
  [1, Section 4.7]: the diagonal of the empirical Fisher matrix.

- **CG-backtracking & line search:** When cg-backtracking is used, the `cg`
  method returns not only the final "solution" to the linear system but also
  intermediate "solutions" for a subset of the iterations. This grid of
  iterations is generated using the approach from [1, Section 4.6]. In a
  subsequent step, this set of candidate update steps is searched for an
  "ideal" candidate.

  Next, this update step is iteratively scaled back by the line search until the
  target function decreases sufficiently (Armijo condition).
  This approach is described in [2, Section 8.8].

  Both of these modules are optional and can be turned on and off via the
  switches `use_cg_backtracking` and `use_linesearch`.

- **Computing parameter updates:** Our Hessian-free optimizer offers two methods
  for computing parameter updates: `step` and `acc_step`.

  The former, which is used in the example above, has only one required
  argument: the `forward` function. It represents the target function, and all
  quantities needed by the optimizer (e.g. the gradient and curvature
  information) are derived from it.

  You may want to use the latter method, `acc_step`, if you run out of memory
  when training your neural network with `step`, or if you want to evaluate the
  target function value (the loss), the gradient and the curvature on different
  data sets (this is actually recommended since it reduces mini-batch
  overfitting). The `acc_step` method allows you to specify (potentially
  different) lists of data for these three quantities. It evaluates, e.g., the
  gradient on one list entry (i.e. one mini-batch) at a time and `acc`umulates
  the individual gradients automatically. This iterative approach slows down the
  computations but makes it possible to work with very large data sets. A basic
  example can be found
  [here](https://github.com/ltatzel/PyTorchHessianFree/blob/740bd80346873a75f904bbba15f0737403a3d511/examples/run_small_nn_acc.py).


## 5. Contributing <a name="contributing"></a>

I would be very grateful for any feedback! If you have questions or feature
requests, have found a bug, or have comments on how to improve the code, please
don't hesitate to reach out to me.


## 6. References <a name="references"></a>

[1] "Deep learning via Hessian-free optimization" by James Martens.
    In Proceedings of the 27th International Conference on Machine Learning
    (ICML), 2010. Paper available at
    https://www.cs.toronto.edu/~jmartens/docs/Deep_HessianFree.pdf (accessed
    June 2022).

[2] "Training Deep and Recurrent Networks with Hessian-Free Optimization" by
    James Martens and Ilya Sutskever. Report available at
    https://www.cs.utoronto.ca/~jmartens/docs/HF_book_chapter.pdf (accessed June
    2022).

[3] "BackPACK: Packing more into Backprop" by Felix Dangel, Frederik Kunstner
    and Philipp Hennig. In International Conference on Learning Representations,
    2020. Paper available at https://openreview.net/forum?id=BJlrF24twB
    (accessed June 2022). Python package available at
    https://github.com/f-dangel/backpack.
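To make the *Core idea* from the top of this README concrete, here is a minimal, self-contained sketch of one damped Newton step computed with cg using only Hessian-vector products. It is not this package's implementation: it assumes exact Hessian-vector products via double backpropagation with `torch.autograd.grad` (instead of BackPACK), no preconditioning and a fixed damping value. The helper names `hvp`, `cg` and `newton_step` are hypothetical.

```python
import torch


def hvp(loss_fn, params, vec):
    """Hessian-vector product H @ vec via double backpropagation.

    The Hessian itself is never formed; each call needs one forward and
    two backward passes.
    """
    params = params.detach().requires_grad_(True)
    loss = loss_fn(params)
    (grad,) = torch.autograd.grad(loss, params, create_graph=True)
    (hv,) = torch.autograd.grad(grad, params, grad_outputs=vec)
    return hv


def cg(matvec, b, max_iter=50, tol=1e-10):
    """Conjugate gradient for A x = b, where the symmetric positive definite
    matrix A is only accessible through the function `matvec`."""
    x = torch.zeros_like(b)
    r = b - matvec(x)  # residual
    p = r.clone()  # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if rs_old.sqrt() < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


def newton_step(loss_fn, params, damping=1e-3):
    """Approximately solve (H + damping * I) step = -grad with cg."""
    params = params.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(loss_fn(params), params)

    def damped_matvec(v):
        # Tikhonov damping adds damping * I to the curvature matrix
        return hvp(loss_fn, params, v) + damping * v

    return cg(damped_matvec, -grad)


# Usage on a quadratic f(theta) = 0.5 theta^T A theta - b^T theta, whose
# exact Newton step from theta = 0 is A^{-1} b.
A = torch.tensor([[4.0, 1.0], [1.0, 3.0]], dtype=torch.float64)
b = torch.tensor([1.0, 2.0], dtype=torch.float64)


def quadratic(theta):
    return 0.5 * theta @ A @ theta - b @ theta


theta0 = torch.zeros(2, dtype=torch.float64)
step = newton_step(quadratic, theta0, damping=0.0)
print(step)  # approximately [1/11, 7/11]
```

Because cg solves an n-dimensional symmetric positive definite system in at most n iterations in exact arithmetic, the two-dimensional example converges after two iterations. For neural networks, the same loop works with the GGN in place of the Hessian, which stays positive semidefinite where the Hessian may not.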
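The Levenberg-Marquardt-style damping heuristic mentioned under *Damping* in Section 4 can be sketched as follows. It compares the actual change of the target function with the change predicted by the local quadratic model (the reduction ratio `rho`) and then increases or decreases the damping. The thresholds `0.25`/`0.75` and the factors `3/2` and `2/3` follow [1, Section 4.1]; the function names are hypothetical, and the package's `adapt_damping` option may differ in its details.

```python
def reduction_ratio(f_old, f_new, model_decrease):
    """rho = (f(theta + p) - f(theta)) / (q(p) - q(0)).

    `model_decrease` is q(p) - q(0), the change predicted by the local
    quadratic model q; it is negative for a useful step p.
    """
    return (f_new - f_old) / model_decrease


def adapt_damping(damping, rho, increase=1.5, decrease=2.0 / 3.0):
    """Levenberg-Marquardt-style update of the damping parameter."""
    if rho < 0.25:  # the model was too optimistic: damp more
        return damping * increase
    if rho > 0.75:  # the model was accurate: damp less
        return damping * decrease
    return damping  # otherwise, keep the damping unchanged


# Worked example: f(theta) = theta**2 at theta = 1 with gradient g = 2,
# curvature B = 2 and the exact Newton step p = -1.
g, B, p = 2.0, 2.0, -1.0
f_old, f_new = 1.0, 0.0
model_decrease = g * p + 0.5 * B * p**2  # -2.0 + 1.0 = -1.0
rho = reduction_ratio(f_old, f_new, model_decrease)  # 1.0
new_damping = adapt_damping(1.0, rho)  # decreased by the factor 2/3
```

Since the target function in the worked example is exactly quadratic, the model is perfect (`rho == 1.0`) and the damping is reduced.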
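The Armijo line search described under *CG-backtracking & line search* in Section 4 can be sketched as a standard backtracking loop: the step size `alpha` starts at 1 and is multiplied by `beta` until the sufficient-decrease condition f(theta + alpha * step) <= f(theta) + c * alpha * grad^T step holds. The constants `c=1e-2` and `beta=0.8` are illustrative assumptions, not necessarily the package's defaults, and `armijo_backtracking` is a hypothetical name.

```python
import torch


def armijo_backtracking(f, theta, step, grad, c=1e-2, beta=0.8, max_iter=20):
    """Shrink alpha until the Armijo (sufficient decrease) condition holds."""
    f0 = f(theta)
    slope = grad @ step  # directional derivative, negative for a descent direction
    alpha = 1.0
    for _ in range(max_iter):
        if f(theta + alpha * step) <= f0 + c * alpha * slope:
            return alpha
        alpha *= beta
    return 0.0  # no sufficient decrease found: reject the update


def f(theta):
    return (theta**2).sum()


theta = torch.tensor([1.0])
grad = 2 * theta  # gradient of f at theta
step = torch.tensor([-4.0])  # deliberately overshooting update step
alpha = armijo_backtracking(f, theta, step, grad)
```

Here, the full step overshoots the minimum at 0, so the loop shrinks it four times (to `alpha = 0.8**4`) before the decrease is large enough.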