https://github.com/titu1994/keras-lamb-optimizer
Implementation of the LAMB optimizer for Keras from the paper "Reducing BERT Pre-Training Time from 3 Days to 76 Minutes"
- Host: GitHub
- URL: https://github.com/titu1994/keras-lamb-optimizer
- Owner: titu1994
- License: mit
- Created: 2019-04-04T04:06:28.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-04-05T02:35:49.000Z (over 6 years ago)
- Last Synced: 2025-04-04T04:41:12.070Z (6 months ago)
- Language: Python
- Size: 13.7 KB
- Stars: 75
- Watchers: 4
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Keras LAMB Optimizer (Layer-wise Adaptive Moments optimizer for Batch training)
-----

Implementation of the LAMB optimizer from the paper [Reducing BERT Pre-Training Time from 3 Days to 76 Minutes](https://arxiv.org/abs/1904.00962).
Supports large-batch training with batch sizes up to 64k while using only the learning rate as a hyperparameter. Smaller batch sizes also work without changing any other hyperparameters.
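For intuition about why the learning rate can be the only tuned hyperparameter, here is a minimal NumPy sketch of one LAMB step for a single layer. It follows the general shape of the update described in the paper and is not taken from this repository's code. The function name `lamb_step`, the default values of `beta_1`, `beta_2` and `epsilon`, and the fallback to a trust ratio of 1 when a norm is zero are assumptions made for illustration.

```python
import numpy as np


def lamb_step(w, g, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999,
              epsilon=1e-6, weight_decay=0.01):
    """One illustrative LAMB update for a single layer's weights.

    Hypothetical helper, not part of keras_lamb. `t` is the 1-based step count.
    """
    # Adam-style running estimates of the first and second gradient moments.
    m = beta_1 * m + (1.0 - beta_1) * g
    v = beta_2 * v + (1.0 - beta_2) * np.square(g)

    # Bias correction for the zero-initialised moments.
    m_hat = m / (1.0 - beta_1 ** t)
    v_hat = v / (1.0 - beta_2 ** t)

    # Adam direction plus decoupled weight decay.
    update = m_hat / (np.sqrt(v_hat) + epsilon) + weight_decay * w

    # Layer-wise trust ratio ||w|| / ||update||; assumed to be 1 if either norm is zero.
    w_norm = np.linalg.norm(w)
    u_norm = np.linalg.norm(update)
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0

    # The step length for this layer becomes lr * ||w||, independent of gradient scale.
    w = w - lr * trust_ratio * update
    return w, m, v


# Example: a single step from zero-initialised moments.
w0 = np.array([1.0, -2.0, 3.0])
g0 = np.array([0.1, 0.1, -0.2])
w1, m1, v1 = lamb_step(w0, g0, np.zeros_like(w0), np.zeros_like(w0), t=1)
```

Because the update is rescaled per layer by the ratio of the weight norm to the update norm, each layer moves by roughly `lr` times its own weight norm, which is one way to read the claim that larger batches need no extra tuning.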
# Usage
```python
from keras_lamb import LAMBOptimizer
optimizer = LAMBOptimizer(0.001, weight_decay=0.01)
model.compile(optimizer, ...)
```

# Requirements
- Keras 2.2.4+
- TensorFlow 1.13+