{"id":13992680,"url":"https://github.com/dave-fernandes/SaddleFreeOptimizer","last_synced_at":"2025-07-22T16:31:09.578Z","repository":{"id":200451344,"uuid":"167580807","full_name":"dave-fernandes/SaddleFreeOptimizer","owner":"dave-fernandes","description":"A second order optimizer for TensorFlow that uses the Saddle-Free method.","archived":false,"fork":false,"pushed_at":"2019-02-04T17:12:37.000Z","size":62,"stargazers_count":19,"open_issues_count":1,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-08-10T14:11:17.375Z","etag":null,"topics":["krylov-dimension","lanczos","optimization-algorithms","optimizer","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dave-fernandes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-01-25T16:49:12.000Z","updated_at":"2024-03-13T13:00:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"16f04b4f-4f5c-41a3-93c4-c02ad0ee2d04","html_url":"https://github.com/dave-fernandes/SaddleFreeOptimizer","commit_stats":null,"previous_names":["dave-fernandes/saddlefreeoptimizer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dave-fernandes%2FSaddleFreeOptimizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dave-fernandes%2FSaddleFreeOptimizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dave-fernandes%2FSaddleFreeOptimizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/dave-fernandes%2FSaddleFreeOptimizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dave-fernandes","download_url":"https://codeload.github.com/dave-fernandes/SaddleFreeOptimizer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227133865,"owners_count":17735814,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["krylov-dimension","lanczos","optimization-algorithms","optimizer","tensorflow"],"created_at":"2024-08-09T14:02:05.405Z","updated_at":"2024-11-29T13:31:03.913Z","avatar_url":"https://github.com/dave-fernandes.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# SaddleFreeOptimizer\nA second order optimizer for TensorFlow that uses the Saddle-Free method of Dauphin _et al_. (2014) with some modifications.\n\n## Algorithm\nThe algorithm is described by [Dauphin, _et al_. \\(2014\\)](https://arxiv.org/abs/1406.2572). 
The implementation here follows this paper with the following exceptions:\n* The order of operations in the Lanczos method follows that recommended by [Paige \\(1972\\)](https://academic.oup.com/imamat/article-abstract/10/3/373/824284).\n* The type of damping applied to the curvature matrix in the Krylov subspace has 3 options that can be specified in the optimizer's constructor.\n* Instead of applying multiple damping coefficients and finding the result with the lowest loss, this implementation uses a Marquardt-style heuristic to update the damping coefficient as per [Martens \\(2010\\)](http://www.cs.toronto.edu/~jmartens/docs/Deep_HessianFree.pdf).\n* If you choose a Krylov dimension that is larger than the number of parameters in the model, then the algorithm will not perform the Lanczos method; it will essentially become a Levenberg-Marquardt method with multiple options for damping and a custom loss function. Obviously, this can only be done with very small models such as the XOR_Test example.\n\n## Files\n* `SFOptimizer.py` is the optimizer class.\n* `mnist/dataset.py` is a utility class from https://github.com/tensorflow/models.git used to obtain MNIST data.\n* `XOR_Test.ipynb` is a Jupyter notebook containing a simple network trained to an XOR function.\n* `AE_Test.ipynb` is a Jupyter notebook containing a deep autoencoder network trained with MNIST data.\n\n## Implementation Notes\n* The Lanczos iteration loop is unrolled into branches in the TensorFlow graph. This allows a full step to be taken in one TF operation. However, it means the graph can get large if you use a high Krylov dimension.\n* As in the original paper, no re-orthogonalization is used for the Lanczos vectors. This means that they will likely become linearly dependent if the Krylov dimension is high \\(\u003e 100?\\). 
There would thus be little benefit in using a Krylov dimension that high.\n* Tested with Python 3.6.7 and TensorFlow 1.12.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdave-fernandes%2FSaddleFreeOptimizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdave-fernandes%2FSaddleFreeOptimizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdave-fernandes%2FSaddleFreeOptimizer/lists"}