{"id":15116150,"url":"https://github.com/lucidrains/grokfast-pytorch","last_synced_at":"2025-04-05T01:05:22.715Z","repository":{"id":244558746,"uuid":"815598095","full_name":"lucidrains/grokfast-pytorch","owner":"lucidrains","description":"Explorations into the proposal from the paper \"Grokfast, Accelerated Grokking by Amplifying Slow Gradients\"","archived":false,"fork":false,"pushed_at":"2024-12-22T16:48:15.000Z","size":170,"stargazers_count":98,"open_issues_count":4,"forks_count":7,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-29T00:06:04.451Z","etag":null,"topics":["artificial-intelligence","deep-learning","grokking"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucidrains.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-15T15:27:33.000Z","updated_at":"2025-03-14T11:03:01.000Z","dependencies_parsed_at":"2024-12-29T06:06:27.330Z","dependency_job_id":"88328007-d3d4-4fc2-b4d0-71a1081cb802","html_url":"https://github.com/lucidrains/grokfast-pytorch","commit_stats":{"total_commits":20,"total_committers":1,"mean_commits":20.0,"dds":0.0,"last_synced_commit":"f757a671ddfc7984113b693acf3a832429b41468"},"previous_names":["lucidrains/grokfast-pytorch"],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fgrokfast-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fgrokfast-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fgrokfast-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fgrokfast-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucidrains","download_url":"https://codeload.github.com/lucidrains/grokfast-pytorch/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247271522,"owners_count":20911587,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","deep-learning","grokking"],"created_at":"2024-09-26T01:44:11.350Z","updated_at":"2025-04-05T01:05:22.692Z","avatar_url":"https://github.com/lucidrains.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cimg src=\"./grokfast.png\" width=\"400px\"\u003e\u003c/img\u003e\n\n## Grokfast - Pytorch (wip)\n\nExplorations into \u003ca href=\"https://arxiv.org/html/2405.20233v2\"\u003e\"Grokfast, Accelerated Grokking by Amplifying Slow Gradients\"\u003c/a\u003e, out of Seoul National University in Korea. In particular, will compare it with NAdam on modular addition as well as a few other tasks, since I am curious why those experiments are left out of the paper. If it holds up, will polish it up into a nice package for quick use.\n\nThe official repository can be found \u003ca href=\"https://github.com/ironjr/grokfast\"\u003ehere\u003c/a\u003e\n\n## Install\n\n```bash\n$ pip install grokfast-pytorch\n```\n\n## Usage\n\n```python\nimport torch\nfrom torch import nn\n\n# toy model\n\nmodel = nn.Linear(10, 1)\n\n# import GrokFastAdamW and instantiate with parameters\n\nfrom grokfast_pytorch import GrokFastAdamW\n\nopt = GrokFastAdamW(\n    model.parameters(),\n    lr = 1e-4,\n    weight_decay = 1e-2\n)\n\n# forward and backwards\n\nloss = model(torch.randn(10))\nloss.backward()\n\n# optimizer step\n\nopt.step()\nopt.zero_grad()\n```\n\n## Todo\n\n- [ ] run all experiments on small transformer\n    - [ ] modular addition\n    - [ ] pathfinder-x\n    - [ ] run against nadam and some other optimizers\n    - [ ] see if `exp_avg` could be repurposed for amplifying slow grads\n- [ ] add the foreach version only if above experiments turn out well\n\n## Citations\n\n```bibtex\n@inproceedings{Lee2024GrokfastAG,\n    title   = {Grokfast: Accelerated Grokking by Amplifying Slow Gradients},\n    author  = {Jaerin Lee and Bong Gyun Kang and Kihoon Kim and Kyoung Mu Lee},\n    year    = {2024},\n    url     = {https://api.semanticscholar.org/CorpusID:270123846}\n}\n```\n\n```bibtex\n@misc{kumar2024maintaining,\n    title   = {Maintaining Plasticity in Continual Learning via Regenerative Regularization},\n    author  = {Saurabh Kumar and Henrik Marklund and Benjamin Van Roy},\n    year    = {2024},\n    url     = {https://openreview.net/forum?id=lyoOWX0e0O}\n}\n```\n\n```bibtex\n@inproceedings{anonymous2024the,\n    title   = {The Complexity Dynamics of Grokking},\n    author  = {Anonymous},\n    booktitle = {Submitted to The Thirteenth International Conference on Learning Representations},\n    year    = {2024},\n    url     = {https://openreview.net/forum?id=07N9jCfIE4},\n    note    = {under review}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Fgrokfast-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucidrains%2Fgrokfast-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Fgrokfast-pytorch/lists"}