https://github.com/dingo-actual/dropgrad

PyTorch Implementation of DropGrad
https://github.com/dingo-actual/dropgrad

neural-network pytorch regularization

Last synced: about 1 month ago
JSON representation

PyTorch Implementation of DropGrad

Host: GitHub
URL: https://github.com/dingo-actual/dropgrad
Owner: dingo-actual
License: mit
Created: 2023-12-10T21:43:28.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-04-27T05:40:08.000Z (about 2 years ago)
Last Synced: 2026-04-30T16:36:12.967Z (about 1 month ago)
Topics: neural-network, pytorch, regularization
Language: Python
Homepage:
Size: 20.5 KB
Stars: 4
Watchers: 1
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # DropGrad: A Simple Method for Regularization and Accelerated Optimization of Neural Networks

- [DropGrad: A Simple Method for Regularization and Accelerated Optimization of Neural Networks](#dropgrad-a-simple-method-for-regularization-and-accelerated-optimization-of-neural-networks)

  - [Installation](#installation)

    - [Requirements](#requirements)

    - [Using pip](#using-pip)

    - [Using git](#using-git)

  - [Usage](#usage)

    - [Basic Usage](#basic-usage)

    - [Use with Learning Rate Schedulers](#use-with-learning-rate-schedulers)

    - [Varying `drop_rate` per `Parameter`](#varying-drop_rate-per-parameter)

  - [TODO 🚧](#todo-)

DropGrad is a regularization method for neural networks that works by randomly (and independently) setting gradient values to zero before an optimization step. Similarly to Dropout, it has a single parameter, `drop_rate`, the probability of setting each parameter gradient to zero. In order to de-bias the remaining gradient values, they are divided by `1.0 - drop_rate`.

> To the best of my knowledge DropGrad is an original contribution. However, I have no plans of publishing a paper.

> If indeed, it is an original method, please feel free to publish a paper about DropGrad. If you do so, all I ask is

> that you mention me in your publication and cite this repository.

## Installation

The PyTorch implementation of DropGrad can be installed simply using pip or by cloning the current GitHub repo.

### Requirements

The only requirement for DropGrad is PyTorch. (Only versions of PyTorch >= 2.0 have been tested, although DropGrad should be compatible with any version of PyTorch)

### Using pip

To install using pip:

```bash

pip install dropgrad

```

### Using git

```bash

git clone https://github.com/dingo-actual/dropgrad.git

cd dropgrad

python -m build

pip install dist/dropgrad-0.1.0-py3-none-any.whl

```

## Usage

### Basic Usage

To use DropGrad in your neural network optimization, simply import the `DropGrad` class to wrap your optimizer.

```python

from dropgrad import DropGrad

```

Wrapping an optimizer is similar to using a learning rate scheduler:

```python

opt_unwrapped = Adam(net.parameters(), lr=1e-3)

opt = DropGrad(opt_unwrapped, drop_rate=0.1)

```

During training, the application of DropGrad is automatically handled by the wrapper. Simply call `.step()` on

the wrapped optimizer to apply DropGrad then `.zero_grad()` to reset the gradients.

```python

opt.step()

opt.zero_grad()

```

### Use with Learning Rate Schedulers

If you use a learning rate scheduler as well as DropGrad, simply pass the base optimizer to both the DropGrad

wrapper and the learning rate scheduler:

```python

opt_unwrapped = Adam(net.parameters(), lr=1e-3)

lr_scheduler = CosineAnnealingLR(opt_unwrapped, T_max=100)

opt = DropGrad(opt_unwrapped, drop_rate=0.1)

```

During the training loop, you call `.step()` on the DropGrad wrapper before calling `.step()` on the learning rate

scheduler, similarly to using an optimizer without DropGrad:

```python

for epoch_n in range(n_epochs):

    for x_batch, y_batch in dataloader:

        pred_batch = net(x_batch)

        loss = loss_fn(pred_batch, y_batch)

        loss.backward()

        opt.step()

        opt.zero_grad()

    lr_scheduler.step()

```

### Varying `drop_rate` per `Parameter`

DropGrad allows the user to set a different drop rate for each `Parameter` under optimization. To do this, simply

pass a dictionary mapping `Parameters` to drop rates to the `drop_rate` argument of the DropGrad wrapper. If a dictionary

is passed to DropGrad during initialization, all optimized `Parameter`s that are not present in that dictionary will have

the drop rate passed to the DropGrad wrapper at initialization (if `drop_rate=None` then drop grad simply won't be applied

to `Parameter`s that are not present in the dictionary).

The example below will apply a `drop_rate` of 0.1 to all optimized weights and a `drop_rate` of 0.01 to all optimized biases,

with no DropGrad applied to any other optimized `Parameter`s:

```python

drop_rate_weights = 0.1

drop_rate_biases = 0.01

params_weights = [p for name, p in net.named_parameters() if p.requires_grad and 'weight' in name]

params_biases = [p for name, p in net.named_parameters() if p.requires_grad and 'bias' in name]

param_drop_rates = {p: drop_rate_weights for p in params_weights}

param_drop_rates.update({p: drop_rate_biases for p in params_biases})

opt_unwrapped = Adam(net.parameters(), lr=1e-3)

opt = DropGrad(opt_unwrapped, drop_rate=None, params=param_drop_rates)

```

## TODO 🚧

- [ ] Write analysis of DropGrad

- [ ] Implement drop rate schedulers

- [ ] Implement option to apply "full" update drop by interrupting `.step()`

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/dingo-actual/dropgrad

Awesome Lists containing this project

README