{"id":46059383,"url":"https://github.com/nvidia-nemo/emerging-optimizers","last_synced_at":"2026-03-01T11:01:10.854Z","repository":{"id":315495507,"uuid":"1046464070","full_name":"NVIDIA-NeMo/Emerging-Optimizers","owner":"NVIDIA-NeMo","description":null,"archived":false,"fork":false,"pushed_at":"2026-02-23T17:15:01.000Z","size":587,"stargazers_count":156,"open_issues_count":7,"forks_count":15,"subscribers_count":6,"default_branch":"main","last_synced_at":"2026-02-23T22:51:17.662Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVIDIA-NeMo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-28T18:16:41.000Z","updated_at":"2026-02-23T17:15:30.000Z","dependencies_parsed_at":"2025-09-19T00:38:41.929Z","dependency_job_id":"66a3e061-f7c9-410b-b593-a622f4a39ae0","html_url":"https://github.com/NVIDIA-NeMo/Emerging-Optimizers","commit_stats":null,"previous_names":["nvidia-nemo/emerging-optimizers"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/NVIDIA-NeMo/Emerging-Optimizers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-NeMo%2FEmerging-Optimizers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-NeMo%2FEmerging-Optimizers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-NeMo%2FEmerging-Optimizers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-NeMo%2FEmerging-Optimizers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVIDIA-NeMo","download_url":"https://codeload.github.com/NVIDIA-NeMo/Emerging-Optimizers/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-NeMo%2FEmerging-Optimizers/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29967930,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T10:55:55.490Z","status":"ssl_error","status_checked_at":"2026-03-01T10:55:55.175Z","response_time":124,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-03-01T11:01:09.897Z","updated_at":"2026-03-01T11:01:10.849Z","avatar_url":"https://github.com/NVIDIA-NeMo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# Emerging Optimizers\n\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n\u003c!-- Get the codecov badge with a token direct from https://app.codecov.io/gh/NVIDIA-NeMo --\u003e\n[![codecov](https://codecov.io/gh/NVIDIA-NeMo/Emerging-Optimizers/graph/badge.svg?token=IQ6U7IFYN0)](https://codecov.io/gh/NVIDIA-NeMo/Emerging-Optimizers)\n[![CICD NeMo](https://github.com/NVIDIA-NeMo/Emerging-Optimizers/actions/workflows/cicd-main.yml/badge.svg?branch=main)](https://github.com/NVIDIA-NeMo/Emerging-Optimizers/actions/workflows/cicd-main.yml)\n[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/release/python-3120/)\n![GitHub Repo stars](https://img.shields.io/github/stars/NVIDIA-NeMo/Emerging-Optimizers)\n[![Documentation](https://img.shields.io/badge/api-reference-blue.svg)](https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html)\n\n\u003c/div\u003e\n\n## Overview\n\nEmerging Optimizers is a research project focused on understanding and optimizing the algorithmic behavior of emerging optimizers (including Shampoo, SOAP, Muon, and others) and their implications for the performance of GPU systems in LLM training.\n\n\u003e ⚠️ Note: Emerging-Optimizers is under active development. All APIs are experimental and subject to change. New features, improvements, and documentation updates are released regularly. Your feedback and contributions are welcome, and we encourage you to follow along as new updates roll out.\n\n## Background\n\n### What are Emerging Optimizers?\n\nEmerging optimizers are a class of optimization algorithms that go beyond the element-wise (diagonal) updates of widely used optimizers such as Adam and SGD. They include methods that use matrix-based (non-diagonal) preconditioning, orthogonalization, and related techniques to achieve faster convergence and improved training efficiency.\n\nExamples include Shampoo, which uses Kronecker-factored preconditioning ([arXiv:1802.09568](https://arxiv.org/abs/1802.09568)), and Muon, which uses Newton-Schulz orthogonalization ([arXiv:2502.16982](https://arxiv.org/abs/2502.16982)).\n\n### Why They Matter\n\nEmerging optimizers have already shown practical impact in large-scale language model training. Most notably, **Muon was used to train the Kimi K2 model** ([arXiv:2507.20534](https://arxiv.org/abs/2507.20534)), demonstrating that these approaches work at scale. These optimizers can:\n\n- Achieve faster convergence, reducing the number of training steps required\n- Improve final model quality through better-conditioned parameter updates\n- Simplify hyperparameter tuning by reducing sensitivity to the learning rate\n\n## Installation\n\n### Prerequisites\n\n- Python 3.12 or higher (release v0.1.0 is the last version that supports Python 3.10)\n- PyTorch 2.0 or higher\n\n### Install from Source\n\n```bash\ngit clone https://github.com/NVIDIA-NeMo/Emerging-Optimizers.git\ncd Emerging-Optimizers\npip install .\n```\n\n## Usage\n\n### Example\n\nRefer to the tests for examples of using the different optimizers, e.g. [`tests/test_orthogonalized_optimizer.py::MuonTest`](tests/test_orthogonalized_optimizer.py).\n\n### Integration with Megatron Core\n\nIntegration with Megatron Core is available in the **dev** branch of Megatron-LM, e.g. [muon.py](https://github.com/NVIDIA/Megatron-LM/blob/dev/megatron/core/optimizer/muon.py).\n\n## Benchmarks\n\nComing soon.\n\n## License\n\nThis project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvidia-nemo%2Femerging-optimizers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnvidia-nemo%2Femerging-optimizers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvidia-nemo%2Femerging-optimizers/lists"}
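The README's Background section says Muon relies on Newton-Schulz orthogonalization but does not show what that computes. As an illustration only, the sketch below approximates the orthogonal polar factor U V^T of a gradient matrix G = U diag(S) V^T in PyTorch. It is not this project's implementation: the function name `newton_schulz_orthogonalize` and its defaults are hypothetical, and it uses the textbook cubic iteration with many float64 steps, whereas Muon implementations commonly use a tuned quintic polynomial with only a few steps in reduced precision.

```python
import torch


def newton_schulz_orthogonalize(grad: torch.Tensor, steps: int = 30, eps: float = 1e-7) -> torch.Tensor:
    """Approximate the polar factor U @ V^T of a 2-D matrix G = U @ diag(S) @ V^T.

    Hypothetical helper for illustration. It applies the classic cubic
    Newton-Schulz iteration X <- 1.5 X - 0.5 (X X^T) X, which maps every
    singular value s to 1.5 s - 0.5 s^3 and drives nonzero ones toward 1.
    """
    if grad.ndim != 2:
        raise ValueError("expected a 2-D matrix")
    x = grad.to(torch.float64)
    # Iterate on the wide orientation so that X X^T is the smaller Gram matrix.
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    # Dividing by the Frobenius norm scales every singular value to at most 1,
    # inside the (0, sqrt(3)) range where the cubic iteration converges.
    x = x / (x.norm() + eps)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    if transposed:
        x = x.T
    return x.to(grad.dtype)


if __name__ == "__main__":
    torch.manual_seed(0)
    g = torch.randn(16, 4, dtype=torch.float64)
    q = newton_schulz_orthogonalize(g)
    # Reference polar factor from a reduced SVD.
    u, _, vh = torch.linalg.svd(g, full_matrices=False)
    print("max abs error vs SVD:", (q - u @ vh).abs().max().item())
```

Because Frobenius normalization can leave some singular values small, and the cubic map grows a small singular value by only about a factor of 1.5 per step, this sketch defaults to many iterations; comparing the result against `torch.linalg.svd` is a quick sanity check.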