{"id":24021035,"url":"https://github.com/dscamiss/generalized-newtons-method","last_synced_at":"2026-05-18T04:13:36.798Z","repository":{"id":266351070,"uuid":"897565030","full_name":"dscamiss/generalized-newtons-method","owner":"dscamiss","description":"PyTorch implementation of the generalized Newton's method for learning rate selection","archived":false,"fork":false,"pushed_at":"2025-02-10T23:18:33.000Z","size":5239,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-08T02:02:33.054Z","etag":null,"topics":["artificial-intelligence","learning-rate","learning-rate-scheduler","learning-rate-scheduling","machine-learning","neural-networks","newtons-method","pytorch"],"latest_commit_sha":null,"homepage":"https://dscamiss.github.io/blog/posts/generalized_newtons_method/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dscamiss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-12-02T21:08:38.000Z","updated_at":"2025-02-10T23:18:37.000Z","dependencies_parsed_at":"2024-12-28T20:25:12.395Z","dependency_job_id":"56305593-d5fb-4635-84c1-21d69c7a25dd","html_url":"https://github.com/dscamiss/generalized-newtons-method","commit_stats":null,"previous_names":["dscamiss/learning-rate-utils","dscamiss/lplr","dscamiss/generalized-newtons-method"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dscamiss/generalized-newtons-
method","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dscamiss%2Fgeneralized-newtons-method","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dscamiss%2Fgeneralized-newtons-method/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dscamiss%2Fgeneralized-newtons-method/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dscamiss%2Fgeneralized-newtons-method/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dscamiss","download_url":"https://codeload.github.com/dscamiss/generalized-newtons-method/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dscamiss%2Fgeneralized-newtons-method/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33164677,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-17T22:39:12.733Z","status":"online","status_checked_at":"2026-05-18T02:00:06.436Z","response_time":71,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","learning-rate","learning-rate-scheduler","learning-rate-scheduling","machine-learning","neural-networks","newtons-method","pytorch"],"created_at":"2025-01-08T12:20:42.022Z","updated_at":"2026-05-18T04:13:36.765Z","avatar_url":"https://github.com/dscamiss.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"re
adme":"# `generalized-newtons-method`\n\n![License](https://img.shields.io/badge/license-MIT-blue)\n![PyTorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?logo=PyTorch\u0026logoColor=white)\n![Python](https://img.shields.io/badge/python-3.9-blue.svg)\n![Python](https://img.shields.io/badge/python-3.10-blue.svg)\n![Python](https://img.shields.io/badge/python-3.11-blue.svg)\n![Build](https://github.com/dscamiss/generalized-newtons-method/actions/workflows/python-package.yml/badge.svg)\n[![codecov](https://codecov.io/gh/dscamiss/generalized-newtons-method/graph/badge.svg?token=ZWTBITN49T)](https://codecov.io/gh/dscamiss/generalized-newtons-method)\n\nA PyTorch implementation of the generalized Newton's method, first proposed in [1].\n\n# Brief background\n\nThe generalized Newton's method is a learning rate scheduler that uses second-order derivative data.\n\nAs a concrete example, suppose that our objective is to minimize the loss function $L: \\Theta \\to \\mathbf{R}$ using\nvanilla SGD and a static learning rate $\\alpha$.  One gradient descent iteration is $\\theta_{t+1} \\leftarrow \\theta_t - \\alpha \\nabla_\\theta L(\\theta_t)$.\nFor this iteration, introduce the \"loss per learning rate\" function \n\n$$g(\\alpha) = L(\\theta_t - \\alpha \\nabla_\\theta L(\\theta_t)).$$  \n\nTowards the objective of minimizing $L$, we can attempt to choose $\\alpha$ such that \n$g$ is (approximately) minimized.  
Provided that $g$ is well-approximated \nnear the origin by its second-order Taylor polynomial, and\nthat this polynomial is strictly convex, the generalized Newton's method chooses\n\n$$\\alpha_t = \\frac{d_\\theta L(\\theta_t) \\cdot \\nabla_\\theta L(\\theta_t)}{d_\\theta^2 L(\\theta_t) \\cdot (\\nabla_\\theta L(\\theta_t), \\nabla_\\theta L(\\theta_t))}.$$\n\nThis choice of $\\alpha_t$ minimizes the second-order Taylor polynomial, and therefore approximately minimizes $g$.\n\nMore theory and implementation notes can be found in [this blog post](https://dscamiss.github.io/blog/posts/generalized_newtons_method).\n\n# Caveats\n\nCurrently only the \"exact version\" of the method is implemented. A future version will implement the \"approximate \nversion\" of the method as well.  The difference between the two versions is that the \"approximate version\" trades off \naccuracy for efficiency, since it does not materialize the required Hessian-vector products.\n\n# Installation\n\n```bash\ngit clone https://github.com/dscamiss/generalized-newtons-method\npip install ./generalized-newtons-method\n```\n\n# Usage\n\n## Setup\n\n```python\nimport torch\nimport generalized_newtons_method as gen\nmodel = MyModel()\ncriterion = MyLossCriterion()\n```\n\n* Call `make_gen_optimizer()` to make a wrapped version of your desired optimizer:\n\n```python\noptimizer = gen.make_gen_optimizer(torch.optim.AdamW, model.parameters())\n```\n\n* Create the learning rate scheduler:\n\n```python\nlr_min, lr_max = 0.0, 1e-3  # Clamp learning rate between `lr_min` and `lr_max`\nscheduler = gen.ExactGen(optimizer, model, criterion, lr_min, lr_max)\n```\n\n## Training\n\n* Run standard training loop:\n\n```python\nfor x, y in dataloader:\n    optimizer.zero_grad()\n    loss = criterion(model(x), y)\n    loss.backward()\n    scheduler.step(x, y)  # \u003c-- Note additional arguments\n    optimizer.step()\n```\n\n# TODO\n\n- [x] Add test cases to verify second-order coefficients\n- [ ] Add \"approximate 
version\"\n- [ ] Add shallow CNN training example\n\n# References\n\n[1] Zhiqi Bu and Shiyun Xu, Automatic gradient descent with generalized Newton’s method, [arXiv:2407.02772](https://arxiv.org/abs/2407.02772)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdscamiss%2Fgeneralized-newtons-method","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdscamiss%2Fgeneralized-newtons-method","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdscamiss%2Fgeneralized-newtons-method/lists"}