Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/dave-fernandes/SaddleFreeOptimizer

A second order optimizer for TensorFlow that uses the Saddle-Free method.
https://github.com/dave-fernandes/SaddleFreeOptimizer

krylov-dimension lanczos optimization-algorithms optimizer tensorflow

Last synced: about 1 month ago
JSON representation

A second order optimizer for TensorFlow that uses the Saddle-Free method.

Awesome Lists containing this project

README

        

# SaddleFreeOptimizer
A second order optimizer for TensorFlow that uses the Saddle-Free method of Dauphin _et al_. (2014) with some modifications.

## Algorithm
The algorithm is described by [Dauphin, _et al_. \(2014\)](https://arxiv.org/abs/1406.2572). The implementation here follows this paper with the following exceptions:
* The order of operations in the Lanczos method follows that recommended by [Paige \(1972\)](https://academic.oup.com/imamat/article-abstract/10/3/373/824284).
* The type of damping applied to the curvature matrix in the Krylov subspace has 3 options that can be specified in the optimizer's constructor.
* Instead of applying multiple damping coefficients and finding the result with the lowest loss, this implementation uses a Marquardt-style heuristic to update the damping coefficient as per [Martens \(2010\)](http://www.cs.toronto.edu/~jmartens/docs/Deep_HessianFree.pdf).
* If you choose a Krylov dimension that is larger than the number of parameters in the model, then the algorithm will not perform the Lanczos method; it will essentially become a Levenberg-Marquardt method with multiple options for damping and a custom loss function. Obviously, this can only be done with very small models such as the XOR_Test example.

## Files
* `SFOptimizer.py` is the optimizer class.
* `mnist/dataset.py` is a utility class from https://github.com/tensorflow/models.git used to obtain MNIST data.
* `XOR_Test.ipynb` is a Jupyter notebook containing a simple network trained to an XOR function.
* `AE_Test.ipynb` is a Jupyter notebook containing a deep autoencoder network trained with MNIST data.

## Implementation Notes
* The Lanczos iteration loop is unrolled into branches in the TensorFlow graph. This allows a full step to be taken in one TF operation. However, it means the graph can get large if you use a high Krylov dimension.
* As in the original paper, no re-orthogonalization is used for the Lanczos vectors. This means that they will likely become linearly dependent if the Krylov dimension is high \(> 100?\). There would, thus, be little benefit in attempting this.
* Tested with Python 3.6.7 and TensorFlow 1.12.0