{"id":13576506,"url":"https://github.com/yangluo7/CAME","last_synced_at":"2025-04-05T05:32:00.866Z","repository":{"id":185850635,"uuid":"668735812","full_name":"yangluo7/CAME","owner":"yangluo7","description":"The official implementation of \"CAME: Confidence-guided Adaptive Memory Optimization\"","archived":false,"fork":false,"pushed_at":"2025-03-22T08:39:50.000Z","size":804,"stargazers_count":87,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-22T09:26:46.184Z","etag":null,"topics":["deep-learning","diffusion-transformer","large-language-models","memory-efficient","optimizer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yangluo7.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-20T13:28:45.000Z","updated_at":"2025-03-22T08:39:53.000Z","dependencies_parsed_at":null,"dependency_job_id":"4f45e0ab-85f0-4cb4-8ba4-a32a00239d3d","html_url":"https://github.com/yangluo7/CAME","commit_stats":null,"previous_names":["yangluo7/came"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yangluo7%2FCAME","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yangluo7%2FCAME/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yangluo7%2FCAME/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yangluo7%2FCAME/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yangluo7","download_url":"https://codeload.github.com/yangluo7/CAME/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247294448,"owners_count":20915335,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","diffusion-transformer","large-language-models","memory-efficient","optimizer"],"created_at":"2024-08-01T15:01:10.849Z","updated_at":"2025-04-05T05:31:55.856Z","avatar_url":"https://github.com/yangluo7.png","language":"Python","readme":"\u003ch1 align=\"center\"\u003eCAME Optimizer\u003c/h1\u003e\n\u003ch3 align=\"center\"\u003eACL 2023 Outstanding Paper Award\u003cbr/\u003eConfidence-guided Adaptive Memory Efficient Optimization\u003c/h3\u003e\n\n\nThis is an official implementation of **CAME** optimizer in the \"[Confidence-guided Adaptive Memory Efficient Optimization](https://arxiv.org/abs/2307.02047)\". Please cite the paper and star this repo if you find CAME useful. 
## Install

```
pip install came-pytorch
```

## Usage

```python
from came_pytorch import CAME
optimizer = CAME(
    model.parameters(),
    lr=2e-4,
    weight_decay=1e-2,
    betas=(0.9, 0.999, 0.9999),
    eps=(1e-30, 1e-16)
)
```

## Hyper-parameter Tuning

* Pre-training: based on our experiments on BERT-Large, GPT-2, and T5, a suitable learning rate for CAME is 0.5-0.9x the learning rate you would use for AdamW.
* Set $\beta_1$ and $\beta_2$ to the same values used in AdamW, and choose $\beta_3$ to be larger than $\beta_2$. For example, consider choosing $\beta_3$ in $[0.9995, 0.99995]$ when $\beta_1, \beta_2 = 0.9, 0.999$, and $\beta_3$ in $[0.99, 0.999]$ when $\beta_1, \beta_2 = 0.9, 0.95$. Due to computational resource constraints, we did not explore more combinations of the three betas; different training tasks may require different combinations for optimal performance.
* If you have any feedback or comments regarding hyper-parameter tuning, please do not hesitate to share them with us!

## Experiments

Apart from the BERT and T5 experiments reported in the paper, we have run additional experiments and record the results here.

### Fine-tuning LLaMA-7B

|                | MMLU      | WikiText | HellaSwag | TruthfulQA (MC) | BoolQ     | COPA      | WSC       | WIC       |
| -------------- | --------- | -------- | --------- | --------------- | --------- | --------- | --------- | --------- |
| Alpaca-7B      | 40.21     | 6.74     | 59.76     | **38.89**       | **79.57** | **88.00** | 46.15     | 49.84     |
| Alpaca-7B-CAME | **40.59** | **6.38** | **59.80** | 38.61           | 79.08     | **88.00** | **49.04** | **50.78** |

We fine-tuned LLaMA-7B with [stanford-alpaca](https://github.com/tatsu-lab/stanford_alpaca) (the 52k instruction-tuning dataset). To replicate our result, first register the CAME optimizer with the transformers package, then change the default optimizer in the Alpaca training script from "adamw" to "came" (a minimal alternative is sketched below).

Alpaca-7B and Alpaca-7B-CAME are evaluated using [Instruct-eval](https://github.com/declare-lab/instruct-eval) and [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
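As an alternative to patching the transformers package, the sketch below passes a CAME instance directly to the Hugging Face `Trainer` through its `optimizers` argument. It is only an illustration: `model`, `train_dataset`, and the hyper-parameter values stand in for whatever your Alpaca-style training script already builds, and the exact Alpaca setup differs.

```python
from came_pytorch import CAME
from transformers import Trainer, TrainingArguments

# `model` and `train_dataset` are assumed to be prepared elsewhere,
# e.g. as in the stanford-alpaca training script.
args = TrainingArguments(output_dir="alpaca-came",
                         per_device_train_batch_size=4,
                         num_train_epochs=3,
                         logging_steps=10)

# Illustrative hyper-parameters; tune lr and weight_decay for your setup.
optimizer = CAME(model.parameters(), lr=1e-5, weight_decay=1e-2,
                 betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16))

trainer = Trainer(model=model, args=args, train_dataset=train_dataset,
                  optimizers=(optimizer, None))  # None: Trainer builds its default LR scheduler
trainer.train()
```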
### Pre-training GPT-2

![CAME_gpt2](assets/gpt-2_came.png)

The pre-training of GPT-2 (Medium, 345M) is based on [Megatron-LM](https://github.com/NVIDIA/Megatron-LM). To replicate our result, add the CAME optimizer in [`megatron/optimizer/__init__.py`](https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/optimizer/__init__.py) and set *args.optimizer* to "came".

## Memory Usage Comparison

To ensure a fair comparison, we set the batch size to 1 for the pre-training of GPT-2 (Medium) when measuring the memory footprint of CAME and AdamW.

|              | AdamW | CAME     |
|--------------|-------|----------|
| Memory (GiB) | 8.77  | **7.44** |

## Citation

```bibtex
@inproceedings{luo2023came,
  title={CAME: Confidence-guided Adaptive Memory Efficient Optimization},
  author={Luo, Yang and Ren, Xiaozhe and Zheng, Zangwei and Jiang, Zhuo and Jiang, Xin and You, Yang},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={4442--4453},
  year={2023}
}
```