Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dave-fernandes/SaddleFreeOptimizer
A second order optimizer for TensorFlow that uses the Saddle-Free method.
- Host: GitHub
- URL: https://github.com/dave-fernandes/SaddleFreeOptimizer
- Owner: dave-fernandes
- License: apache-2.0
- Created: 2019-01-25T16:49:12.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-02-04T17:12:37.000Z (over 5 years ago)
- Last Synced: 2024-07-18T12:12:18.126Z (2 months ago)
- Topics: krylov-dimension, lanczos, optimization-algorithms, optimizer, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 60.5 KB
- Stars: 19
- Watchers: 3
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# SaddleFreeOptimizer
A second order optimizer for TensorFlow that uses the Saddle-Free method of Dauphin _et al_. (2014) with some modifications.

## Algorithm
The algorithm is described by [Dauphin, _et al_. \(2014\)](https://arxiv.org/abs/1406.2572). The implementation here follows this paper with the following exceptions:
* The order of operations in the Lanczos method follows that recommended by [Paige \(1972\)](https://academic.oup.com/imamat/article-abstract/10/3/373/824284).
* Three types of damping can be applied to the curvature matrix in the Krylov subspace; the choice is specified in the optimizer's constructor.
* Instead of applying multiple damping coefficients and finding the result with the lowest loss, this implementation uses a Marquardt-style heuristic to update the damping coefficient as per [Martens \(2010\)](http://www.cs.toronto.edu/~jmartens/docs/Deep_HessianFree.pdf).
* If you choose a Krylov dimension larger than the number of parameters in the model, the algorithm skips the Lanczos method entirely; it essentially becomes a Levenberg-Marquardt method with multiple damping options and a custom loss function. This is only practical for very small models such as the XOR_Test example.

## Files
* `SFOptimizer.py` is the optimizer class.
* `mnist/dataset.py` is a utility class from https://github.com/tensorflow/models.git used to obtain MNIST data.
* `XOR_Test.ipynb` is a Jupyter notebook containing a simple network trained to an XOR function.
* `AE_Test.ipynb` is a Jupyter notebook containing a deep autoencoder network trained with MNIST data.

## Implementation Notes
* The Lanczos iteration loop is unrolled into branches in the TensorFlow graph. This allows a full step to be taken in one TF operation. However, it means the graph can get large if you use a high Krylov dimension.
* As in the original paper, no re-orthogonalization is applied to the Lanczos vectors, so they will likely become numerically linearly dependent if the Krylov dimension is high \(> 100?\); there is thus little benefit in choosing a very large Krylov dimension.
* Tested with Python 3.6.7 and TensorFlow 1.12.0.
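
The core update described above can be sketched outside TensorFlow. The following is a minimal NumPy illustration, not the repository's `SFOptimizer` class: it runs Lanczos tridiagonalization using only Hessian-vector products (in the Paige ordering, with no re-orthogonalization, as noted above), then replaces the subspace eigenvalues with their absolute values plus damping before solving for the step. The function names and the fixed damping coefficient are illustrative assumptions.

```python
import numpy as np

def lanczos(hvp, dim, k, rng):
    # Lanczos tridiagonalization of the Hessian using only
    # Hessian-vector products (hvp); no re-orthogonalization.
    Q = np.zeros((dim, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    q = rng.standard_normal(dim)
    q /= np.linalg.norm(q)
    q_prev = np.zeros(dim)
    b = 0.0
    for j in range(k):
        Q[:, j] = q
        w = hvp(q)
        a = q @ w
        # Paige-recommended ordering: subtract both projections
        # before computing the new off-diagonal coefficient.
        w = w - a * q - b * q_prev
        alpha[j] = a
        b = np.linalg.norm(w)
        beta[j] = b
        if b < 1e-12:
            break  # Krylov subspace is exhausted
        q_prev, q = q, w / b
    # Tridiagonal curvature matrix in the Krylov subspace.
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q, T

def saddle_free_step(grad, hvp, k, damping, rng):
    # Saddle-free idea: take |eigenvalues| of the subspace curvature
    # so negative-curvature directions repel instead of attract,
    # add damping, and solve for the update in the subspace.
    Q, T = lanczos(hvp, grad.size, k, rng)
    evals, evecs = np.linalg.eigh(T)
    abs_T_inv = evecs @ np.diag(1.0 / (np.abs(evals) + damping)) @ evecs.T
    g_sub = Q.T @ grad  # project the gradient into the subspace
    return -(Q @ (abs_T_inv @ g_sub))
```

On a toy quadratic saddle, e.g. a Hessian `diag(2, -1)`, the resulting step descends along the negative-curvature direction rather than being attracted to the saddle point, which is the behavior the Saddle-Free method is designed to produce.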