https://github.com/notedance/ranger25

TensorFlow implementation for Ranger25
https://github.com/notedance/ranger25

deep-learning deep-reinforcement-learning keras machine-learning optimizer reinforcement-learning tensorflow

Last synced: 4 months ago
JSON representation

TensorFlow implementation for Ranger25

Host: GitHub
URL: https://github.com/notedance/ranger25
Owner: NoteDance
License: apache-2.0
Created: 2025-05-05T14:20:49.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-05-05T14:24:36.000Z (9 months ago)
Last Synced: 2025-05-08T02:52:02.839Z (9 months ago)
Topics: deep-learning, deep-reinforcement-learning, keras, machine-learning, optimizer, reinforcement-learning, tensorflow
Language: Python
Homepage:
Size: 10.7 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # Ranger25

**Overview**:

`Ranger25` is an experimental composite optimizer that blends together seven advanced optimization techniques—ADOPT, AdEMAMix, Cautious updates, StableAdamW/Adam‑atan2, OrthoGrad, adaptive gradient clipping (AGC), and Lookahead—to achieve more reliable convergence, improved stability, and faster training across a wide range of deep‑learning tasks. By combining theoretical convergence fixes (ADOPT) with enhanced utilization of past gradients (AdEMAMix), directional masking (Cautious), numerical stability (Adam‑atan2), gradient decorrelation (OrthoGrad), unit‑wise clipping (AGC), and periodic weight averaging (Lookahead), Ranger25 aims to deliver the best of each world in a single optimizer.

**Parameters**:

- **`learning_rate`** *(float, default=1e-3)*: Base step size for parameter updates.

- **`betas`** *(tuple of three floats, default=(0.9, 0.98, 0.9999))*:

  * `beta1` for first‑moment EMA (momentum)

  * `beta2` for second‑moment EMA (RMS scaling)

  * `beta3` for slow EMA used in the “mix” component (AdEMAMix).

- **`epsilon`** *(float, default=1e-8)*: Small constant for numerical stability in denominator (and in StableAdamW/Adam‑atan2 branch).

- **`weight_decay`** *(float, default=1e-3)*: Coefficient for decoupled weight‑decay regularization (AdamW style).

- **`alpha`** *(float, default=5.0)*: Mixing coefficient magnitude for the slow EMA in AdEMAMix.

- **`t_alpha_beta3`** *(int or None, default=None)*: Number of steps over which to warm up `alpha` and `beta3`; if `None`, no warmup.

- **`lookahead_merge_time`** *(int, default=5)*: Number of steps between Lookahead slow‑weight merges.

- **`lookahead_blending_alpha`** *(float, default=0.5)*: Interpolation factor between fast and slow weights at each Lookahead merge.

- **`cautious`** *(bool, default=True)*: Enable Cautious updates—masking out parameter updates whose sign conflicts with the raw gradient.

- **`stable_adamw`** *(bool, default=True)*: Use StableAdamW variant, which rescales step size by measured gradient variance for numerical stability.

- **`orthograd`** *(bool, default=True)*: Enable OrthoGrad, projecting each gradient to be orthogonal to its parameter vector before update.

- **`weight_decouple`** *(bool, default=True)*: Apply weight decay in a decoupled fashion (AdamW) rather than via loss augmentation.

- **`fixed_decay`** *(bool, default=False)*: Use fixed weight‑decay (not scaled by learning rate) when `weight_decouple` is True.

- **`clipnorm`** *(float or None)*: Clip gradients by global L‑2 norm.

- **`clipvalue`** *(float or None)*: Clip gradients by value.

- **`global_clipnorm`** *(float or None)*: Alias for clipping by global norm across all parameters.

- **`use_ema`** *(bool, default=False)*: Maintain an exponential moving average of model weights.

- **`ema_momentum`** *(float, default=0.99)*: Decay rate for weight EMA.

- **`ema_overwrite_frequency`** *(int or None)*: How often to overwrite model weights with EMA weights.

- **`loss_scale_factor`** *(float or None)*: Static loss‑scaling factor for mixed‑precision training.

- **`gradient_accumulation_steps`** *(int or None)*: Number of steps to accumulate gradients before applying an update.

- **`name`** *(str, default="ranger25")*: Name identifier for the optimizer instance.

**Example Usage**:

```python

import tensorflow as tf

from ranger25 import Ranger25

# Instantiate the Ranger25 optimizer with custom settings

optimizer = Ranger25(

    learning_rate=3e-4,

    betas=(0.9, 0.98, 0.9999),

    epsilon=1e-8,

    weight_decay=1e-4,

    alpha=4.0,

    t_alpha_beta3=10000,

    lookahead_merge_time=6,

    lookahead_blending_alpha=0.6,

    cautious=True,

    stable_adamw=True,

    orthograd=True,

    fixed_decay=False,

    clipnorm=1.0,

    use_ema=True,

    ema_momentum=0.995,

    gradient_accumulation_steps=2,

    name="ranger25_custom"

)

# Compile a Keras model

model = tf.keras.models.Sequential([

    tf.keras.layers.Dense(128, activation="relu"),

    tf.keras.layers.Dense(10, activation="softmax")

])

model.compile(

    optimizer=optimizer,

    loss="sparse_categorical_crossentropy",

    metrics=["accuracy"]

)

# Train the model

model.fit(train_dataset, validation_data=val_dataset, epochs=20)

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/notedance/ranger25

Awesome Lists containing this project

README