{"id":28209058,"url":"https://github.com/dscamiss/newt","last_synced_at":"2025-06-10T13:35:30.832Z","repository":{"id":290456258,"uuid":"966991372","full_name":"dscamiss/newt","owner":"dscamiss","description":"The Newton-like learning rate scheduler","archived":false,"fork":false,"pushed_at":"2025-04-29T18:07:56.000Z","size":188,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-17T15:14:00.071Z","etag":null,"topics":["artificial-intelligence","learning-rate","learning-rate-scheduler","machine-learning","machine-learning-algorithms","second-order-optimization"],"latest_commit_sha":null,"homepage":"https://dscamiss.github.io/blog/posts/newton-like-method/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dscamiss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-15T18:57:16.000Z","updated_at":"2025-04-29T18:07:59.000Z","dependencies_parsed_at":"2025-04-28T23:43:33.407Z","dependency_job_id":"b7acd6d8-1857-4091-b75e-d2f0aa027eab","html_url":"https://github.com/dscamiss/newt","commit_stats":null,"previous_names":["dscamiss/newt"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dscamiss%2Fnewt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dscamiss%2Fnewt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dscamiss%2Fnewt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/dscamiss%2Fnewt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dscamiss","download_url":"https://codeload.github.com/dscamiss/newt/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dscamiss%2Fnewt/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259085448,"owners_count":22803203,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","learning-rate","learning-rate-scheduler","machine-learning","machine-learning-algorithms","second-order-optimization"],"created_at":"2025-05-17T15:13:22.196Z","updated_at":"2025-06-10T13:35:30.820Z","avatar_url":"https://github.com/dscamiss.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# `newt` :lizard:\n\n![License](https://img.shields.io/badge/license-MIT-blue)\n![PyTorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?logo=PyTorch\u0026logoColor=white)\n![Python](https://img.shields.io/badge/python-3.9-blue.svg)\n![Python](https://img.shields.io/badge/python-3.10-blue.svg)\n![Python](https://img.shields.io/badge/python-3.11-blue.svg)\n![Build](https://github.com/dscamiss/newt/actions/workflows/python-package.yml/badge.svg)\n[![codecov](https://codecov.io/gh/dscamiss/newt/graph/badge.svg?token=Z3CGGZJ70B)](https://codecov.io/gh/dscamiss/newt)\n\n# Introduction\n\nThis package provides a PyTorch implementation of the Newton-like learning rate scheduler.\n\nThe general approach [1] is to attempt to minimize the loss function $L : 
\\Theta \\to \\mathbb{R}$ by iterating\n\n$$\n\\begin{align*}\n    \\theta_{t+1} = \\theta_t - \\alpha_t u_t \\\\\n    \\alpha_{t+1} = \\alpha_t - \\frac{g'_t(\\alpha_t)}{g''_t(\\alpha_t)},\n\\end{align*}\n$$\n\nwhere\n\n* $\\alpha_t$ is the learning rate at iteration $t$,\n* $u_t$ is the $\\theta$ update vector at iteration $t$, and\n* $g_t(\\alpha) = L(\\theta_t - \\alpha u_t)$.\n\nIn other words, we simultaneously run a gradient descent update on $\\theta$ (using an arbitrary\noptimizer to produce the update vectors) and a Newton update on $\\alpha$.\n\nThe implementation details primarily concern the Newton update, since directly computing $g''_t(\\alpha_t)$\nrequires an expensive Hessian-vector product.  To work around this, we must use an approximation.\nThe approximations available in this package are described [here](https://dscamiss.github.io/blog/posts/newton-like-method/).\nThe package also adds heuristics on top of the approximations, for example to handle increasing loss values and to keep the\nlearning rate from vanishing or diverging.\n\n# Installation\n\nIn an existing Python 3.9+ environment:\n\n```shell\ngit clone https://github.com/dscamiss/newt\npip install ./newt\n```\n\n# Usage\n\nImport:\n\n```python\nfrom newt import Newt, NewtConfig\n```\n\nCreate your model and loss criterion:\n\n```python\nmodel = MyModel(...)\nloss_criterion = MyLossCriterion(...)\n```\n\nCreate the corresponding `Newt` instance, passing the optimizer used to train the model:\n\n```python\nnewt_config = NewtConfig(model=model, loss_criterion=loss_criterion)\nnewt = Newt(optimizer, newt_config)\n```\n\nAdd the LR scheduler step to the training loop:\n\n```python\nfor batch_idx, (x, y) in enumerate(train_data_loader):\n    x, y = x.to(device), y.to(device)\n    optimizer.zero_grad()\n    y_hat = model(x)\n    loss = loss_criterion(y_hat, y)\n    loss.backward()\n    optimizer.step()\n\n    newt.step_setup(loss, x, y)  # Computes lookahead gradients\n    newt.step()                  # Consumes lookahead gradients\n```\n\n# 
Example\n\nTraces for a simple MNIST example using the `AdamW` optimizer:\n\n![MNIST training traces with the AdamW optimizer](src/examples/plots/train_mnist_AdamW.png)\n\n# To-do\n\n- [ ] Support for multiple parameter groups\n- [ ] Add CIFAR example, larger models\n- [ ] EWMA alpha update instead of multiplicative\n\n# References\n\n1. G. Retsinas, G. Sfikas, P. Filntisis and P. Maragos, \"Newton-Based Trainable Learning Rate,\" ICASSP 2023.\n2. G. Retsinas, G. Sfikas, P. Filntisis and P. Maragos, \"Trainable Learning Rate,\" 2022, retracted.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdscamiss%2Fnewt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdscamiss%2Fnewt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdscamiss%2Fnewt/lists"}