https://github.com/murrellgroup/cannotwaitfortheseoptimisers.jl
- Host: GitHub
- URL: https://github.com/murrellgroup/cannotwaitfortheseoptimisers.jl
- Owner: MurrellGroup
- License: mit
- Created: 2024-12-21T20:22:38.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-12-27T18:09:19.000Z (5 months ago)
- Last Synced: 2025-12-29T16:46:11.795Z (5 months ago)
- Language: Julia
- Size: 482 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# CannotWaitForTheseOptimisers
[Stable docs](https://MurrellGroup.github.io/CannotWaitForTheseOptimisers.jl/stable/)
[Dev docs](https://MurrellGroup.github.io/CannotWaitForTheseOptimisers.jl/dev/)
[Build status](https://github.com/MurrellGroup/CannotWaitForTheseOptimisers.jl/actions/workflows/CI.yml?query=branch%3Amain)
[Coverage](https://codecov.io/gh/MurrellGroup/CannotWaitForTheseOptimisers.jl)
A collection of experimental optimizers implemented according to the [Optimisers.jl](https://github.com/FluxML/Optimisers.jl) interface. We intend to use this package as a testing ground for new optimization algorithms, and then possibly get them incorporated into the main Optimisers.jl package. As such, please do not expect much stability from this package.
## Installation
```julia
using Pkg
Pkg.add("Optimisers")
Pkg.add(url="https://github.com/MurrellGroup/CannotWaitForTheseOptimisers.jl")
```
## Usage
```julia
using CannotWaitForTheseOptimisers, Optimisers
```
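Because the optimizers follow the Optimisers.jl interface, they should plug into the usual `setup`/`update` workflow. The sketch below uses the stock `Optimisers.Descent` rule as a stand-in so that it runs with Optimisers.jl alone; swapping in a rule from this package is assumed to work the same way, but its constructor names and arguments are not shown here and should be checked against the package docs.

```julia
using Optimisers

# Toy "model": Optimisers.jl works on any nested structure of arrays.
model = (W = [1.0 2.0; 3.0 4.0], b = [0.5, -0.5])

# Stand-in rule from Optimisers.jl (assumption: a rule exported by
# CannotWaitForTheseOptimisers would be passed to `setup` in its place).
rule = Optimisers.Descent(0.1)
state = Optimisers.setup(rule, model)

# Hand-written gradient with the same structure as the model.
grads = (W = ones(2, 2), b = ones(2))

# One non-mutating optimization step; returns the new state and model.
state, model = Optimisers.update(state, model, grads)
```

In practice the gradient would come from an automatic differentiation package rather than being written by hand.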
## Description
This package currently includes attempts at implementing:
- [x] [Muon](https://kellerjordan.github.io/posts/muon/), which performs an orthogonalization step before the parameter update, and seems excellent for training transformers.
- [x] [Apollo](https://arxiv.org/abs/2412.05270), which tracks low-rank moments using a random projection, reducing the memory footprint of the optimizer.
- [x] [NormGrowthCap](https://arxiv.org/abs/2410.01623), which limits how quickly the gradient norm can grow from one step to the next.
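To illustrate the orthogonalization step mentioned for Muon, here is a minimal sketch of the quintic Newton-Schulz iteration described in the linked Muon post. It is not this package's implementation: the function name `newton_schulz`, its keyword arguments, and the example matrix are hypothetical, and only the coefficients are taken from the post.

```julia
using LinearAlgebra

# Approximately orthogonalize G, pushing its singular values toward 1.
# `newton_schulz` is a hypothetical name used for illustration only.
function newton_schulz(G::AbstractMatrix; steps::Int = 5, eps_norm = 1e-7)
    T = float(eltype(G))
    # Quintic coefficients from the Muon blog post.
    a, b, c = T(3.4445), T(-4.7750), T(2.0315)
    # Normalize by the Frobenius norm so all singular values are at most 1.
    X = G ./ (norm(G) + T(eps_norm))
    # Work on the wide orientation so X * X' is the smaller Gram matrix.
    transposed = size(X, 1) > size(X, 2)
    if transposed
        X = Matrix(X')
    end
    for _ in 1:steps
        A = X * X'
        B = b .* A .+ c .* (A * A)
        X = a .* X .+ B * X
    end
    return transposed ? Matrix(X') : X
end

# Example: a tall matrix with singular values 2, 1 and 0.5.
G = [2.0 0.0 0.0; 0.0 1.0 0.0; 0.0 0.0 0.5; 0.0 0.0 0.0]
O = newton_schulz(G)
```

With these coefficients the iteration does not converge to exactly orthogonal output; the singular values of `O` end up in a band around 1, which the Muon post reports is sufficient in practice.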