https://github.com/jelhamm/overview-gradient-descent-optimization-by-sebastian-ruder
"Simulations for the paper 'A Review Article On Gradient Descent Optimization Algorithms' by Sebastian Roeder"
"Simulations for the paper 'A Review Article On Gradient Descent Optimization Algorithms' by Sebastian Roeder"
- Host: GitHub
- URL: https://github.com/jelhamm/overview-gradient-descent-optimization-by-sebastian-ruder
- Owner: jElhamm
- License: bsd-3-clause
- Created: 2023-12-15T17:33:10.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-19T19:53:45.000Z (12 months ago)
- Last Synced: 2024-06-20T08:40:25.396Z (12 months ago)
- Topics: adadelta, adagrad, adam, amsgrad, artificial-intelligence-algorithms, artificial-neural-networks, batch, damax, gradient, gradient-descent, gradient-descent-algorithm, momentum-gradient-descent, nadam, nesterov-accelerated-sgd, numpy-library, python, rmsprop, scipy-library, sebastian, sympy-library
- Language: Jupyter Notebook
- Homepage: https://www.ruder.io/optimizing-gradient-descent/
- Size: 7.57 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# A Review Article On Gradient Descent Optimization Algorithms
This repository contains the complete implementation of the algorithms reviewed in the article *"A Review Article On Gradient Descent Optimization Algorithms"* by Sebastian Ruder.
It includes implementations of a variety of existing gradient descent optimization algorithms.

## Table of Contents
- [Introduction](#introduction)
- [Algorithms](#algorithms)
- [Usage](#usage)
- [License](#license)

## Introduction
This repository serves as a comprehensive resource for understanding and implementing gradient descent optimization algorithms discussed
in the article "A Review Article On Gradient Descent Optimization Algorithms" by Sebastian Ruder.
The implementations cover a range of algorithms used in machine learning and optimization.

## Algorithms
Each algorithm is implemented as a separate module in this repository, accompanied by documentation and code examples.
The following optimization algorithms have been implemented (minimal update-rule sketches follow this list):

1. [*Adam*](Source%20Code/Adam.py): Combines the benefits of momentum and RMSprop, using adaptive learning rates and momentum to converge faster.
    - Usage: Widely used and effective for a wide range of optimization problems.
2. [*Nadam*](Source%20Code/Nadam.py): Combines Nesterov accelerated gradient and Adam, benefiting from both lookahead updates and adaptive learning rates.
    - Usage: A more advanced variant of Adam that offers improved convergence properties.
3. [*Adamax*](Source%20Code/Adamax.py): A variant of Adam that incorporates the maximum norm of the past gradients for adaptive learning rates.
    - Usage: Effective for models with different ranges of parameter magnitudes.
4. [*Amsgrad*](Source%20Code/Amsgrad.py): A modification of Adam that addresses cases where Adam's adaptive learning rate can prevent convergence for some objective functions.
    - Usage: Helps avoid overshooting in non-convex optimization problems.
5. [*AdaGrad*](Source%20Code/Adagrad.py): Adapts the learning rate of each parameter based on the historical gradients, performing larger updates for infrequent features.
    - Usage: Suitable for sparse datasets, where some features occur infrequently.
6. [*RMSprop*](Source%20Code/RmsProp.py): A variation of AdaGrad that addresses its aggressive, monotonically decreasing learning rate.
    - Usage: Effective for non-stationary (changing) optimization problems.
7. [*Momentum*](Source%20Code/Momentum.py): Adds momentum to the gradient descent update by accumulating a moving average of past gradients.
    - Usage: Accelerates convergence, especially in the presence of sparse gradients or noisy data.
8. [*AdaDelta*](Source%20Code/Adadelta.py): An extension of AdaGrad that further improves the learning rate adaptation by eliminating the need for a manually chosen learning rate.
    - Usage: Overcomes the learning rate decay problem of AdaGrad.
9. [*Batch Gradient Descent*](Source%20Code/BatchGradientDescent.py): A basic optimization algorithm that updates the model parameters using the gradients of the entire training dataset.
    - Usage: Suitable for small to medium-sized datasets.
10. [*Nesterov Accelerated Gradient*](Source%20Code/NesterovAccelarated.py): A modification of momentum that improves convergence by using a lookahead update.
    - Usage: Helps achieve faster convergence by reducing oscillations.
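
As a quick illustration of the momentum-family methods above (batch gradient descent, Momentum, and Nesterov accelerated gradient), here is a minimal NumPy sketch of the update rules on a toy quadratic. It is independent of the modules in `Source Code/`; the objective, hyperparameters, and variable names are illustrative only.

```python
import numpy as np

# Toy ill-conditioned quadratic: f(theta) = 0.5 * theta^T A theta, minimum at the origin.
A = np.diag([1.0, 10.0])
grad = lambda theta: A @ theta

lr, gamma, steps = 0.05, 0.9, 200
start = np.array([2.0, 2.0])

# 1) Batch gradient descent: theta <- theta - lr * grad(theta)
theta = start.copy()
for _ in range(steps):
    theta -= lr * grad(theta)
print("batch GD:", theta)

# 2) Momentum: accumulate a decaying velocity of past gradients.
theta, v = start.copy(), np.zeros(2)
for _ in range(steps):
    v = gamma * v + lr * grad(theta)
    theta -= v
print("momentum:", theta)

# 3) Nesterov accelerated gradient: evaluate the gradient at the lookahead point.
theta, v = start.copy(), np.zeros(2)
for _ in range(steps):
    v = gamma * v + lr * grad(theta - gamma * v)
    theta -= v
print("nesterov:", theta)   # all three runs end near the minimum at [0, 0]
```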
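
And a corresponding sketch of the adaptive-learning-rate updates (AdaGrad, RMSprop, and Adam) as they are described in the article; the step functions and hyperparameter defaults below are assumptions for illustration, not the repository's API.

```python
import numpy as np

def adagrad_step(theta, g, G, lr=0.01, eps=1e-8):
    # AdaGrad: accumulate squared gradients; the per-parameter step shrinks over time.
    G = G + g**2
    return theta - lr * g / (np.sqrt(G) + eps), G

def rmsprop_step(theta, g, Eg2, lr=0.001, rho=0.9, eps=1e-8):
    # RMSprop: replace AdaGrad's running sum with an exponentially decaying average.
    Eg2 = rho * Eg2 + (1 - rho) * g**2
    return theta - lr * g / (np.sqrt(Eg2) + eps), Eg2

def adam_step(theta, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: bias-corrected first and second moment estimates of the gradient.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Example: minimize f(theta) = ||theta||^2 with Adam.
theta = np.array([1.0, -3.0])
m, v = np.zeros(2), np.zeros(2)
for t in range(1, 3001):
    g = 2 * theta                          # gradient of the squared norm
    theta, m, v = adam_step(theta, g, m, v, t, lr=0.01)
print(theta)                               # ends near the minimum at the origin
```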
## Usage

To use the implemented algorithms, follow these steps:
1. Clone this repository to your local machine.
2. Navigate to the respective algorithm module of interest.
3. Read the provided documentation to understand the algorithm's theory, parameters, and usage.
4. Refer to the code examples to see how the algorithm is applied in practical scenarios.
5. Integrate the algorithms into your own machine learning or optimization projects by importing the necessary modules (see the example below).
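
For example, plugging a gradient descent step into your own problem typically looks like the sketch below: a toy linear regression solved with plain batch gradient descent. The data and hyperparameters are made up for illustration and do not come from this repository.

```python
import numpy as np

# Synthetic linear-regression data as a stand-in for your own model and dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
    w -= lr * grad                          # batch gradient descent update
print(w)                                    # should be close to true_w
```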
## References

[An overview of gradient descent optimization algorithms](https://www.ruder.io/optimizing-gradient-descent/)
## License
This repository is licensed under the BSD-3-Clause License.
See the [LICENSE](./LICENSE) file for more details.