{"id":20977053,"url":"https://github.com/githubharald/analyze_ada_hessian","last_synced_at":"2025-05-14T14:31:50.245Z","repository":{"id":107433298,"uuid":"289535095","full_name":"githubharald/analyze_ada_hessian","owner":"githubharald","description":"Analyze AdaHessian optimizer on 2D functions.","archived":false,"fork":false,"pushed_at":"2021-08-13T13:38:02.000Z","size":79,"stargazers_count":14,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-02T19:11:28.014Z","etag":null,"topics":["adahessian","optimizer","pytorch"],"latest_commit_sha":null,"homepage":"https://harald-scheidl.medium.com/2fc76b29bcbb","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/githubharald.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-22T17:28:03.000Z","updated_at":"2025-01-16T23:36:39.000Z","dependencies_parsed_at":"2023-04-22T01:48:47.064Z","dependency_job_id":null,"html_url":"https://github.com/githubharald/analyze_ada_hessian","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubharald%2Fanalyze_ada_hessian","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubharald%2Fanalyze_ada_hessian/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubharald%2Fanalyze_ada_hessian/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubharald%2Fanalyze_ad
a_hessian/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/githubharald","download_url":"https://codeload.github.com/githubharald/analyze_ada_hessian/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254160648,"owners_count":22024574,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adahessian","optimizer","pytorch"],"created_at":"2024-11-19T04:57:10.023Z","updated_at":"2025-05-14T14:31:50.239Z","avatar_url":"https://github.com/githubharald.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Analyze AdaHessian\n\nThis repository allows analyzing the new AdaHessian optimizer and its parameters on 2D functions f(x, y).\n\n\n## Usage\n\nGo to `src/` and run the script `main.py`.\nAll parameters of this script have default values (so they are all optional):\n* --func: the function to be optimized, the 2D input vector (x, y) must be named \"v\", PyTorch functions can be used, e.g. \"v\\[0\\]\\*\\*2+v\\[1\\]\\*\\*2\" for a quadratic function x^2+y^2.\n* --start: start value v0=(x0, y0), e.g. 
\"1 2\" to start at (1, 2)\n* --num_iter: number of iterations\n* --lr: learning rate\n* --beta_g: momentum-parameter for the gradient\n* --beta_h: momentum-parameter for the Hessian\n* --hessian_pow: Hessian power, between 1 and 0, where 1 gives the Newton update direction (default), while 0 gives the gradient descent update direction\n* --num_samples: the Hessian diagonal is computed using random vectors, where the approximation of the Hessian gets better when taking more samples per iteration (default 1)\n* --window: the region shown in the plot, specified in the order left, right, bottom, top, with left \u003c right and bottom \u003c top, e.g. \"-3 1 -2 5\"\n\n\n## Examples\n\n### Quadratic functions\n\n\n* `python main.py --func \"v[0]**2+5*v[1]**2\" --beta_g 0 --beta_h 0 --lr 1`: here AdaHessian behaves like the \"vanilla\" Newton method (lr=1, betas=0, 0) on a quadratic function without mixed terms (x\\*y), for which a single step is enough to reach the minimum (left plot)\n* `python main.py --func \"v[0]**2+5*v[1]**2+v[0]*v[1]\" --beta_g 0 --beta_h 0 --lr 0.5`: quadratic function with mixed terms (x\\*y), for which AdaHessian with its diagonal-only Hessian is not able to jump directly to the minimum in contrast to \"vanilla\" Newton (right plot)\n\n![img1](doc/1.png)\n\n### Simulated noise\n\n* `python main.py --func \"v[0]**2+v[1]**2+v[0]*v[1]+0.5*torch.sum(torch.sin(5*v)**2)\" --beta_g 0 --beta_h 0 --lr 1 --start -2 -1 --num_iter 30`: there are multiple local minima due to the simulated noise, and AdaHessian gets stuck in one of them (left plot)\n* `python main.py --func \"v[0]**2+v[1]**2+v[0]*v[1]+0.5*torch.sum(torch.sin(5*v)**2)\" --beta_g 0.9 --beta_h 0.9 --lr 1 --start -2 -1 --num_iter 30`: by using momentum for both gradient and Hessian, the optimizer can \"average out\" the noise and find the global minimum (right plot)\n\n![img2](doc/2.png)\n\n## References\n\n* [Paper](https://arxiv.org/pdf/2006.00719.pdf)\n* Implementations for PyTorch\n  
* [Original implementation](https://github.com/amirgholami/adahessian) from the authors of the paper\n  * [Re-implementation](https://github.com/davda54/ada-hessian) by davda54 with a nicer interface, which I use for this repository\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgithubharald%2Fanalyze_ada_hessian","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgithubharald%2Fanalyze_ada_hessian","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgithubharald%2Fanalyze_ada_hessian/lists"}