https://github.com/lonepatient/novograd-pytorch
PyTorch implementation of the NovoGrad optimizer
- Host: GitHub
- URL: https://github.com/lonepatient/novograd-pytorch
- Owner: lonePatient
- License: mit
- Created: 2019-08-25T16:01:47.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2022-03-08T10:35:44.000Z (over 3 years ago)
- Last Synced: 2025-03-24T08:21:22.875Z (6 months ago)
- Topics: adam, adamw, alexnet, cifar10, novograd, optimizer, pytorch
- Language: Python
- Size: 215 KB
- Stars: 18
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
## NovoGrad PyTorch
This repository contains a PyTorch implementation of the NovoGrad Optimizer from the paper
[Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks](https://arxiv.org/abs/1905.11286)
by Boris Ginsburg, Patrice Castonguay, et al.
## Summary
NovoGrad is a first-order SGD method with gradients normalized per layer. Borrowing from ND-Adam, NovoGrad uses the second moment for normalization and decouples weight decay from the stochastic gradient for regularization, as in AdamW. NovoGrad has half the memory consumption of Adam (similar to AdaFactor, but with a simpler moment computation). Unlike AdaFactor, NovoGrad does not require learning rate warmup.
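
Concretely, the per-layer update can be sketched as follows. This is a minimal illustration following the paper's description, not the repository's `optimizer.py`; the function name, `eps` term, and initialization details here are assumptions.

```python
import torch

def novograd_step(param, grad, state, lr=0.01, betas=(0.95, 0.98),
                  weight_decay=0.001, eps=1e-8):
    """One NovoGrad-style update for a single layer (illustrative sketch).

    `state` holds the per-layer second moment `v` (a scalar, which is why the
    memory footprint is roughly half of Adam's) and the first moment `m`
    (same shape as the parameter).
    """
    beta1, beta2 = betas
    g_norm_sq = grad.pow(2).sum()  # squared L2 norm of the layer gradient

    if 'v' not in state:
        # first step: initialize moments from the current gradient
        state['v'] = g_norm_sq
        state['m'] = grad / (g_norm_sq.sqrt() + eps) + weight_decay * param
    else:
        # second moment is a single scalar per layer, not per element as in Adam
        state['v'] = beta2 * state['v'] + (1 - beta2) * g_norm_sq
        # gradient normalized by the layer norm, with decoupled weight decay
        normalized = grad / (state['v'].sqrt() + eps) + weight_decay * param
        state['m'] = beta1 * state['m'] + normalized

    # SGD-with-momentum style parameter update
    param.data.add_(state['m'], alpha=-lr)

# purely illustrative usage on a single tensor
w = torch.randn(10, 5, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()
state = {}
novograd_step(w, w.grad, state)
```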

## Dependencies
* PyTorch
* torchvision
* matplotlib

## Usage
The code in this repository implements both NovoGrad and Adam training, with examples on the CIFAR-10 dataset.
Add the `optimizer.py` script to your project, and import it.
To use NovoGrad, construct it like a standard PyTorch optimizer:
```python
from optimizer import NovoGrad
optimizer = NovoGrad(model.parameters(), lr=0.01, betas=(0.95, 0.98), weight_decay=0.001)
```
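
Once constructed, the optimizer is driven like any other PyTorch optimizer inside the training loop. A standard usage sketch (the `model`, `loss_fn`, and `loader` objects are placeholders, not names from this repository):

```python
for inputs, targets in loader:
    optimizer.zero_grad()                   # clear gradients from the previous step
    loss = loss_fn(model(inputs), targets)
    loss.backward()                         # compute gradients
    optimizer.step()                        # NovoGrad parameter update
```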
## Example
To produce the results below, we train AlexNet on the CIFAR-10 dataset.
```bash
# use adam
python run.py --optimizer=adam --model=alexnet
# use novograd
python run.py --optimizer=novograd --model=alexnet
# use adamw
python run.py --optimizer=adamw --model=alexnet
# use lr scheduler
python run.py --optimizer=adam --model=alexnet --do_scheduler
python run.py --optimizer=novograd --model=alexnet --do_scheduler
python run.py --optimizer=adamw --model=alexnet --do_scheduler
```
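
The `--do_scheduler` flag enables a learning-rate schedule. The README does not say which schedule the repository uses, but a hypothetical pairing of NovoGrad with a stock PyTorch scheduler could look like this (cosine annealing chosen only for illustration; `model`, `loader`, `num_epochs`, and `train_one_epoch` are placeholders):

```python
from torch.optim.lr_scheduler import CosineAnnealingLR
from optimizer import NovoGrad

optimizer = NovoGrad(model.parameters(), lr=0.01, betas=(0.95, 0.98), weight_decay=0.001)
scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs)  # decay lr over num_epochs

for epoch in range(num_epochs):
    train_one_epoch(model, loader, optimizer)  # placeholder training function
    scheduler.step()                           # advance the learning-rate schedule
```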
## AlexNet Results
Train loss of Adam, AdamW, and NovoGrad with AlexNet on CIFAR-10.

Validation loss of Adam, AdamW, and NovoGrad with AlexNet on CIFAR-10.

Validation accuracy of Adam, AdamW, and NovoGrad with AlexNet on CIFAR-10.
