https://github.com/athemathmo/maxmin_cuda

CUDA implementation of the MaxMin activation

Host: GitHub
URL: https://github.com/athemathmo/maxmin_cuda
Owner: AtheMathmo
Created: 2019-03-16T21:34:40.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2021-07-06T16:58:15.000Z (almost 5 years ago)
Last Synced: 2025-02-25T08:19:48.679Z (over 1 year ago)
Language: Python
Size: 16.6 KB
Stars: 0
Watchers: 5
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# CUDA implementation of the MaxMin activation function

This repository provides a CUDA implementation of the MaxMin activation function, packaged as a PyTorch extension.

The MaxMin activation was first introduced as the [OPLU unit](https://arxiv.org/abs/1604.02313), and later as a special case of the [GroupSort activation function](https://arxiv.org/abs/1811.05381). It is especially useful for neural networks which incorporate norm constraints (e.g. Lipschitz-constrained neural networks or orthogonal RNNs).

_The original PyTorch GroupSort implementation is available in [LNets](https://github.com/cemanil/LNets), with a slightly newer version in [LConvNet](https://github.com/ColinQiyangLi/LConvNet)._
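To make the definition concrete, here is a minimal, dependency-free sketch of MaxMin as GroupSort with a group size of 2: each pair of inputs is replaced by its maximum followed by its minimum. It is not the kernel shipped in this repository. The function names `maxmin` and `l2_norm` are made up for illustration, and pairing adjacent elements is an assumption; the extension may group elements differently.

```python
import math


def maxmin(values):
    """Apply MaxMin to a flat list: each adjacent pair (a, b) becomes (max, min).

    Pairing adjacent elements is an assumption made for this sketch;
    the extension in this repository may group elements differently.
    """
    if len(values) % 2 != 0:
        raise ValueError("MaxMin needs an even number of elements")
    out = []
    for i in range(0, len(values), 2):
        a, b = values[i], values[i + 1]
        out.extend((max(a, b), min(a, b)))
    return out


def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


x = [0.5, -1.0, 2.0, 3.0, -4.0, -4.0]
y = maxmin(x)

# MaxMin only reorders entries within each pair, so the norm is unchanged.
assert math.isclose(l2_norm(x), l2_norm(y))
```

Because the output is a permutation of the input, the activation preserves the norm of its input vector, which is what makes it a natural fit for the norm-constrained networks mentioned above.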

## Installation

Install the extension with `python setup.py install`, then import it as:

```python
from maxmin import MaxMin
```

An implementation using standard PyTorch ops is also available in `maxmin/maxmin_py`, and can be imported as:

```python
from maxmin import PyMaxMin
```

## Benchmarking

A quick benchmark: MaxMin is applied to a random matrix of size (50000, 5000), and the times are averaged over 5000 trials.

| Method          | CUDA? | Forward avg. time (µs) | Backward avg. time (µs) |
| --------------- |:-----:| ----------------------:| -----------------------:|
| Default PyTorch | No    |                    365 |                     286 |
| Default PyTorch | Yes   |                     93 |                     320 |
| This repo       | No    |                   1004 |                    1057 |
| This repo       | Yes   |                     37 |                     247 |

In short, on the GPU the implementation in this repo is moderately faster than the one built from default PyTorch ops (37 vs. 93 µs forward, 247 vs. 320 µs backward). On the CPU it is slower (1004 vs. 365 µs forward, 1057 vs. 286 µs backward), so use the default PyTorch implementation if you need good CPU performance. The default implementation is also plenty fast for most practical purposes.
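The averaging over trials described above can be sketched with a small timing helper. This is not the script used to produce the table, which is not shown here; `average_time_us` and `toy_workload` are hypothetical names, and the workload is a stand-in for a forward or backward pass.

```python
import time


def average_time_us(fn, trials=5000, warmup=10):
    """Return the mean wall-clock time of fn() in microseconds."""
    # Warm-up calls are not timed, so one-off setup costs are excluded.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(trials):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / trials * 1e6


def toy_workload():
    """Stand-in workload: pairwise max over a small list (hypothetical)."""
    data = [float(i % 7) for i in range(1000)]
    return [max(data[i], data[i + 1]) for i in range(0, len(data), 2)]


mean_us = average_time_us(toy_workload, trials=200)
print(f"mean time per call: {mean_us:.1f} us")
```

When timing GPU code, keep in mind that CUDA kernels launch asynchronously, so the device should be synchronized (for example with `torch.cuda.synchronize()`) before each clock reading; otherwise the measurement covers only the launch overhead.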

## Notes

- Improved error handling is needed on the C++ side.

- A specialized CUDA implementation of the more general GroupSort is harder to implement and would not give substantial gains over the default PyTorch operations, so it is not planned for now.
