{"id":21164214,"url":"https://github.com/xichen-fy/Fira","last_synced_at":"2025-07-09T16:33:15.444Z","repository":{"id":257811610,"uuid":"866535732","full_name":"xichen-fy/Fira","owner":"xichen-fy","description":"Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?","archived":false,"fork":false,"pushed_at":"2024-10-21T01:16:38.000Z","size":556,"stargazers_count":66,"open_issues_count":1,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-10-21T04:43:05.526Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xichen-fy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-02T12:46:18.000Z","updated_at":"2024-10-21T01:16:42.000Z","dependencies_parsed_at":null,"dependency_job_id":"ef3c5972-c581-4adf-9a62-0c14d1e96d4d","html_url":"https://github.com/xichen-fy/Fira","commit_stats":null,"previous_names":["xichen-fy/fira"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xichen-fy%2FFira","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xichen-fy%2FFira/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xichen-fy%2FFira/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xichen-fy%2FFira/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xichen-fy","download_url":"https://codeload.github.com/xichen-fy/
Fira/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225570420,"owners_count":17489885,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-20T14:01:17.766Z","updated_at":"2024-11-20T14:01:24.732Z","avatar_url":"https://github.com/xichen-fy.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"readme":"# Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?\n\n[![arXiv](https://img.shields.io/badge/arxiv-2410.01623-b31b1b)](https://arxiv.org/abs/2410.01623) [![blog-cn](https://img.shields.io/badge/%E9%87%8F%E5%AD%90%E4%BD%8D-%E7%AE%80%E4%BB%8B-brightgreen)](https://mp.weixin.qq.com/s/gTj3VAhnJOJbl1_Nqfs0Fw) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Space-blue)](https://huggingface.co/papers/2410.01623)\n\n\n\n![](./assests/framework.png)\n\n## Introduction\n\nWe introduce [Fira](https://arxiv.org/abs/2410.01623), a plug-and-play memory-efficient training framework of LLMs. 
\n\nDifferent from LoRA and GaLore, we realize training with full-rank gradients of full-rank weights, constituting the first attempt to achieve full-rank training consistently under the low-rank constraint.\n\nOur method is easy to implement, relying on essentially just two equations.\n\n\n## TODOs\n\n- [x] Release the pre-training code\n- [x] Release the fine-tuning code\n- [x] Package our Fira into a Python library for easy use\n- [x] Release the code for quantitative analysis of scaling factor and provide further analysis on it\n\n\n\n## Usage\n\n### Install Fira optimizer\n```bash\npip install fira\n```\n\n### Example\n\n```python\nfrom fira import FiraAdamW, divide_params\nparam_groups = divide_params(model, target_modules_list=[\"attn\", \"mlp\"], rank=8)\noptimizer = FiraAdamW(param_groups, lr=learning_rate)\n```\nList the names of the modules that should use Fira in `target_modules_list` (substrings are acceptable).\n### Quick Start\n\nWe also provide a quick-start tutorial for the Fira optimizer. You can find it in `./quick_start`.\n\n### Notice\nIn Fira, Adam is used by default with `weight_decay=0`.\nIf you want to enable weight decay for AdamW, set it as follows:\n```python\noptimizer = FiraAdamW(param_groups, lr=learning_rate, weight_decay=0.01)\n```\nYou can also adjust the learning rate for different tasks, with a recommended range of $10^{-5}$ to $10^{-2}$.\n\n## Pre-training LLaMA (60M~7B) on the C4 dataset\n\n`./pre_training_c4` includes the code for pre-training LLaMA models on the C4 dataset.\n\n### Set up the environment\n```bash\ncd pre_training_c4\npip install -r requirements.txt\n```\nOur experiment scripts are validated on Python 3.9 with PyTorch 2.2.2.\n\n### Code Structure\nThe `./pre_training_c4/torchrun_main.py` script is used for pre-training LLaMA models on the C4 dataset. 
\nThe `./pre_training_c4/scripts` directory stores the benchmark scripts across different LLaMA model sizes (60M, 130M, 350M, 1B, 7B).\n\nFor instance, to pre-train a 60M model on the C4 dataset, execute the following command:\n```bash\n# LLaMA-60M, Fira-Adam, 1 A100, 1 Node\ntorchrun --standalone --nproc_per_node 1 torchrun_main.py \\\n    --model_config llama_configs/llama_60m.json \\\n    --lr 0.01 \\\n    --alpha 0.25 \\\n    --rank 128 \\\n    --update_proj_gap 200 \\\n    --batch_size 256 \\\n    --total_batch_size 512 \\\n    --num_training_steps 10000 \\\n    --warmup_steps 1000 \\\n    --weight_decay 0 \\\n    --dtype bfloat16 \\\n    --eval_every 1000 \\\n    --optimizer fira_adamw \n```\n\n### Notice\nThis script directly accesses [Hugging Face](https://huggingface.co/) to load the C4 dataset, so please ensure a stable internet connection.\n\nAlternatively, you can refer to the tutorials in `./download_use_c4` for using a local dataset.\n\n## Fine-tuning LLaMA-7B\n\n`./fine_tuning` includes the code for fine-tuning LLaMA-7B with Fira.\n\n### Set up the environment\n\n```bash\ncd fine_tuning\npip install -r requirements.txt\n```\n\n### Download Datasets\nDownload the commonsense 170k fine-tuning dataset from [LLM-Adapters](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/ft-training_set/commonsense_170k.json). Then, place it as `./fine_tuning/commonsense_170k.json`. \nDownload the full dataset directory from [LLM-Adapters](https://github.com/AGI-Edgerunners/LLM-Adapters/blob/main/ft-training_set/commonsense_170k.json). Then, place it as `./fine_tuning/dataset`.\n\n### Code Structure\n`./finetune.py` is used for fine-tuning LLaMA-7B on the commonsense reasoning tasks. 
\n`./commonsense_evaluate.py` is used for evaluating the fine-tuned LLaMA-7B model on 8 sub-tasks of the commonsense reasoning tasks.\n\n### Finetuning\nFor instance, to fine-tune LLaMA-7B with Fira on the commonsense reasoning tasks on a single GPU, execute the following command:\n```bash\n# LLaMA-7B, Fira-Adam, 1 4090\nCUDA_VISIBLE_DEVICES=0 python finetune.py \\\n  --base_model 'yahma/llama-7b-hf' \\\n  --data_path 'commonsense_170k.json' \\\n  --output_dir './result/fira' \\\n  --batch_size 16 \\\n  --micro_batch_size 4 \\\n  --num_epochs 3 \\\n  --learning_rate 1e-4 \\\n  --cutoff_len 256 \\\n  --val_set_size 120 \\\n  --adapter_name lora \\\n  --lora_r 32 \\\n  --lora_alpha 64 \\\n  --use_gradient_checkpointing \\\n  --target_modules '[\"q_proj\", \"k_proj\", \"v_proj\", \"up_proj\", \"down_proj\"]' \\\n  --save_step 15000 \\\n  --eval_step 1000 \\\n  --optimizer_name fira_adamw \n```\n\n### Evaluating\nFor instance, to evaluate the fine-tuned LLaMA-7B model on the BoolQ sub-task, execute the following command:\n```bash\n# LLaMA-7B, Fira-Adam, 1 4090\nCUDA_VISIBLE_DEVICES=0 python commonsense_evaluate.py \\\n    --model LLaMA-7B \\\n    --adapter LoRA \\\n    --dataset boolq \\\n    --batch_size 1 \\\n    --base_model 'yahma/llama-7b-hf' \\\n    --lora_weights './result/fira' | tee -a './result/fira/boolq.txt'\n```\n\n## Further Analysis of Scaling Factor Similarities \n\n\nTo further substantiate our findings on the scaling factor, we conduct a more quantitative analysis of scaling factor similarities between low-rank and full-rank LLM training. 
Specifically, we assess scaling factor similarities at both matrix and column level for pre-training LLaMA models ranging from 60M to 1B, averaged over 10,000 steps.\n\u003ctable style=\"margin: auto; width: 90%\"\u003e\n    \u003ctr\u003e\n        \u003cth rowspan=\"3\" style=\"font-weight: normal;\"\u003eSize\u003c/th\u003e\n        \u003cth colspan=\"4\" style=\"font-weight: normal;\"\u003eMatrix Level\u003c/th\u003e\n        \u003cth colspan=\"4\" style=\"font-weight: normal;\"\u003eColumn Level\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003cth colspan=\"2\" style=\"font-weight: normal;\"\u003eSpearman\u003c/th\u003e\n        \u003cth colspan=\"2\" style=\"font-weight: normal;\"\u003eKendall\u003c/th\u003e\n        \u003cth colspan=\"2\" style=\"font-weight: normal;\"\u003eSpearman\u003c/th\u003e\n        \u003cth colspan=\"2\" style=\"font-weight: normal;\"\u003eKendall\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003cth style=\"font-weight: normal;\"\u003eCoefficient\u003c/th\u003e\n        \u003cth style=\"font-weight: normal;\"\u003eP-value\u003c/th\u003e\n        \u003cth style=\"font-weight: normal;\"\u003eCoefficient\u003c/th\u003e\n        \u003cth style=\"font-weight: normal;\"\u003eP-value\u003c/th\u003e\n        \u003cth style=\"font-weight: normal;\"\u003eCoefficient\u003c/th\u003e\n        \u003cth style=\"font-weight: normal;\"\u003eP-value\u003c/th\u003e\n        \u003cth style=\"font-weight: normal;\"\u003eCoefficient\u003c/th\u003e\n        \u003cth style=\"font-weight: normal;\"\u003eP-value\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e60M\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.9972\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e2e-62\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.9662\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e7e-26\u003c/td\u003e\n        \u003ctd 
style=\"text-align: center;\"\u003e0.9372\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.0\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.7942\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e130M\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.9925\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e2e-76\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.9409\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e9e-37\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.8698\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.0\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.6830\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e350M\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.9770\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e3e-113\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.8848\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e5e-65\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.9091\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.0\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.7400\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e1B\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.9469\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e1e-83\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.8249\u003c/td\u003e\n        \u003ctd style=\"text-align: 
center;\"\u003e1e-56\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.8331\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.0\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.6513\u003c/td\u003e\n        \u003ctd style=\"text-align: center;\"\u003e0.0\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n\n\nSpearman and Kendall correlation coefficients range from -1 to +1, where +1 signifies a perfect positive correlation and -1 signifies a perfect negative correlation. Generally, a p-value below 0.05 suggests that a significant correlation exists. As shown in the table above, both Spearman and Kendall correlation coefficients indicate a strong positive relationship at the matrix and column levels across all sizes of the LLaMA models, with all p-values below 0.05. \n\nTherefore, it is likely that the observed behavior is an inherent feature of LLM training, manifesting across a broad range of scenarios. This insight provides a robust experimental basis for our proposed norm-based scaling in Fira and helps explain its effectiveness. Code for this analysis is provided in `./similarity`.\n\n\u003c!-- Code can be found in `./similarity`. 
--\u003e\n## Acknowledgement\nThis implementation is based on code from several repositories.\n* [GaLore](https://github.com/jiaweizzhao/GaLore)\n* [LLM-Adapters](https://github.com/AGI-Edgerunners/LLM-Adapters)\n\n## Citation\n\n```\n@article{chen2024firaachievefullranktraining,\n      title={Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?}, \n      author={Xi Chen and Kaituo Feng and Changsheng Li and Xunhao Lai and Xiangyu Yue and Ye Yuan and Guoren Wang},\n      journal={arXiv},\n      year={2024},\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxichen-fy%2FFira","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxichen-fy%2FFira","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxichen-fy%2FFira/lists"}