# RaLU - A New Activation Function for Deep Neural Network
**RaLU** (not ReLU!) stands for **Rational Linear Unit**; it is a parametric, lightweight, and gradient-stable activation function designed for deep neural networks.
## Definition
```math
\begin{split}
\text{RaLU}_{\alpha}(x) &= x \frac{x^{2} + \alpha}{x^{2} + 1} \\
\frac{d}{dx}\text{RaLU}_{\alpha}(x) &= \frac{x^{4} + (3-\alpha)x^{2} + \alpha}{(x^{2} + 1)^{2}}
\end{split}
```
$`\alpha \in \mathbb{R}`$ is a learnable parameter.


* Gradient is $`\alpha`$ at $`x=0`$.
* It asymptotes to the identity function as $`x \to \pm \infty`$, regardless of $`\alpha`$.
* $`\alpha=1`$ gives the identity function.
* $`0 \le \alpha \le 9`$ gives a monotonically increasing function.
* Gradient is $0$ at $`x=0`$ when $`\alpha=0`$.
* Gradient is $0$ at $`x= \pm \sqrt{3}`$ when $`\alpha=9`$.
* It loses its monotonically increasing property when $`\alpha<0`$ or $`\alpha>9`$.
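Below is a minimal PyTorch sketch of a learnable RaLU module, written directly from the definition above; it is illustrative only and not necessarily how the repository implements it.

```python
import torch
import torch.nn as nn

class RaLU(nn.Module):
    """Rational Linear Unit: x * (x^2 + alpha) / (x^2 + 1), with learnable alpha."""

    def __init__(self, alpha: float = 1.0):
        super().__init__()
        # alpha = 1 starts the activation off as the identity function
        self.alpha = nn.Parameter(torch.tensor(float(alpha)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x2 = x * x
        return x * (x2 + self.alpha) / (x2 + 1.0)

# drop-in usage, like any other activation
act = RaLU(alpha=1.0)
y = act(torch.randn(4, 8))
```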
## Why for DNN?
### Resistant to vanishing/exploding gradient problems
It asymptotes to the identity function as $`x \to \pm \infty`$, where the gradient approaches $1$.
In other words, it is unlikely to cause vanishing or exploding gradients even when many layers are stacked, making it well suited to regression problems, computer vision (CNNs), NLP (RNNs, Transformers), and so on.
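This follows directly from the derivative above: the numerator and the expanded denominator are both monic quartics in $`x`$, so

```math
\lim_{x \to \pm\infty} \frac{d}{dx}\text{RaLU}_{\alpha}(x)
= \lim_{x \to \pm\infty} \frac{x^{4} + (3-\alpha)x^{2} + \alpha}{(x^{2} + 1)^{2}}
= 1
```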
### Beneficial for training
It is infinitely differentiable (smooth) for all $x$, for any parameter $\alpha$.
This avoids gradient discontinuities and can lead to more stable training under gradient descent.
Also, because it is a zero-centred odd function, its outputs have zero mean for symmetrically distributed inputs.
This helps prevent systematic bias in the activations.
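The zero-centred claim is immediate from the definition, since RaLU is odd:

```math
\text{RaLU}_{\alpha}(-x) = (-x)\,\frac{(-x)^{2} + \alpha}{(-x)^{2} + 1} = -x\,\frac{x^{2} + \alpha}{x^{2} + 1} = -\text{RaLU}_{\alpha}(x)
```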
### No "dead neuron problems"
The output range of RaLU is unbounded (it does not saturate), and its gradient vanishes only at isolated points.
This helps it avoid the "dying ReLU" problem.
### Learnable
It has a learnable parameter $`\alpha`$, so its shape can be adapted per unit or per layer.
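As one way this could be used (a hypothetical variant, assuming the `RaLU` sketch above, not something defined in the repository), $`\alpha`$ can be learned per channel so that each feature map adapts its own shape:

```python
import torch
import torch.nn as nn

class RaLUPerChannel(nn.Module):
    """RaLU with one learnable alpha per channel (hypothetical variant)."""

    def __init__(self, num_channels: int, alpha: float = 1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((num_channels,), float(alpha)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # broadcast alpha over an (N, C, H, W) tensor
        a = self.alpha.view(1, -1, 1, 1)
        x2 = x * x
        return x * (x2 + a) / (x2 + 1.0)

# usage on a convolutional feature map
act = RaLUPerChannel(num_channels=16)
y = act(torch.randn(8, 16, 32, 32))
```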
### Fast and lightweight
It uses only basic arithmetic operations: no exponentials, no trigonometric functions.
It is therefore fast, although not as fast as ReLU.
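For reference, the whole forward pass is a handful of elementwise multiplications, additions, and a single division; a plain NumPy sketch (illustrative, not the repository's code):

```python
import numpy as np

def ralu(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """RaLU forward pass using only multiply, add, and divide."""
    x2 = x * x
    return x * (x2 + alpha) / (x2 + 1.0)
```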