Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dave-fernandes/SaddleFreeOptimizer
A second order optimizer for TensorFlow that uses the Saddle-Free method.
- Host: GitHub
- URL: https://github.com/dave-fernandes/SaddleFreeOptimizer
- Owner: dave-fernandes
- License: apache-2.0
- Created: 2019-01-25T16:49:12.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-02-04T17:12:37.000Z (over 5 years ago)
- Last Synced: 2024-07-18T12:12:18.126Z (2 months ago)
- Topics: krylov-dimension, lanczos, optimization-algorithms, optimizer, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 60.5 KB
- Stars: 19
- Watchers: 3
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# SaddleFreeOptimizer
A second order optimizer for TensorFlow that uses the Saddle-Free method of Dauphin _et al_. (2014) with some modifications.

## Algorithm
The algorithm is described by [Dauphin, _et al_. \(2014\)](https://arxiv.org/abs/1406.2572). The implementation here follows this paper with the following exceptions:
* The order of operations in the Lanczos method follows that recommended by [Paige \(1972\)](https://academic.oup.com/imamat/article-abstract/10/3/373/824284).
* Three types of damping can be applied to the curvature matrix in the Krylov subspace; the choice is specified in the optimizer's constructor.
* Instead of applying multiple damping coefficients and finding the result with the lowest loss, this implementation uses a Marquardt-style heuristic to update the damping coefficient as per [Martens \(2010\)](http://www.cs.toronto.edu/~jmartens/docs/Deep_HessianFree.pdf).
* If you choose a Krylov dimension larger than the number of parameters in the model, the algorithm skips the Lanczos method entirely; it essentially becomes a Levenberg-Marquardt method with multiple damping options and a custom loss function. This is only practical for very small models such as the XOR_Test example.

## Files
* `SFOptimizer.py` is the optimizer class.
* `mnist/dataset.py` is a utility class from https://github.com/tensorflow/models.git used to obtain MNIST data.
* `XOR_Test.ipynb` is a Jupyter notebook containing a simple network trained to an XOR function.
* `AE_Test.ipynb` is a Jupyter notebook containing a deep autoencoder network trained with MNIST data.

## Implementation Notes
* The Lanczos iteration loop is unrolled into branches in the TensorFlow graph. This allows a full step to be taken in one TF operation. However, it means the graph can get large if you use a high Krylov dimension.
* As in the original paper, no re-orthogonalization is applied to the Lanczos vectors, so they will likely become numerically linearly dependent if the Krylov dimension is high \(> 100?\); there is thus little benefit in choosing a very large Krylov dimension.
* Tested with Python 3.6.7 and TensorFlow 1.12.0.
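
The core update described above can be sketched outside TensorFlow. The following is a minimal NumPy illustration, not the repository's `SFOptimizer` class: it runs Lanczos tridiagonalization using only Hessian-vector products (in the Paige ordering, with no re-orthogonalization, as noted above), then replaces the subspace eigenvalues with their absolute values plus damping before solving for the step. The function names and the fixed damping coefficient are illustrative assumptions.

```python
import numpy as np

def lanczos(hvp, dim, k, rng):
    # Lanczos tridiagonalization of the Hessian using only
    # Hessian-vector products (hvp); no re-orthogonalization.
    Q = np.zeros((dim, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    q = rng.standard_normal(dim)
    q /= np.linalg.norm(q)
    q_prev = np.zeros(dim)
    b = 0.0
    for j in range(k):
        Q[:, j] = q
        w = hvp(q)
        a = q @ w
        # Paige-recommended ordering: subtract both projections
        # before computing the new off-diagonal coefficient.
        w = w - a * q - b * q_prev
        alpha[j] = a
        b = np.linalg.norm(w)
        beta[j] = b
        if b < 1e-12:
            break  # Krylov subspace is exhausted
        q_prev, q = q, w / b
    # Tridiagonal curvature matrix in the Krylov subspace.
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q, T

def saddle_free_step(grad, hvp, k, damping, rng):
    # Saddle-free idea: take |eigenvalues| of the subspace curvature
    # so negative-curvature directions repel instead of attract,
    # add damping, and solve for the update in the subspace.
    Q, T = lanczos(hvp, grad.size, k, rng)
    evals, evecs = np.linalg.eigh(T)
    abs_T_inv = evecs @ np.diag(1.0 / (np.abs(evals) + damping)) @ evecs.T
    g_sub = Q.T @ grad  # project the gradient into the subspace
    return -(Q @ (abs_T_inv @ g_sub))
```

On a toy quadratic saddle, e.g. a Hessian `diag(2, -1)`, the resulting step descends along the negative-curvature direction rather than being attracted to the saddle point, which is the behavior the Saddle-Free method is designed to produce.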