{"id":18832919,"url":"https://github.com/declare-lab/della","last_synced_at":"2025-04-14T04:31:15.973Z","repository":{"id":244883645,"uuid":"816311866","full_name":"declare-lab/della","owner":"declare-lab","description":"DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling","archived":false,"fork":false,"pushed_at":"2024-07-12T13:05:55.000Z","size":164,"stargazers_count":29,"open_issues_count":1,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-03-27T18:21:28.656Z","etag":null,"topics":["huggingface","llm","llms","model-merging","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/declare-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-17T13:37:21.000Z","updated_at":"2025-03-15T15:21:01.000Z","dependencies_parsed_at":"2024-06-18T04:12:05.694Z","dependency_job_id":"c541d114-5d58-4fe5-96fc-f9f8d4704419","html_url":"https://github.com/declare-lab/della","commit_stats":null,"previous_names":["declare-lab/della"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/declare-lab%2Fdella","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/declare-lab%2Fdella/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/declare-lab%2Fdella/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/declare-lab%2Fdella/manifests","owner_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/owners/declare-lab","download_url":"https://codeload.github.com/declare-lab/della/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248821691,"owners_count":21166931,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["huggingface","llm","llms","model-merging","transformer"],"created_at":"2024-11-08T01:59:32.704Z","updated_at":"2025-04-14T04:31:15.953Z","avatar_url":"https://github.com/declare-lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling\n\n[Read the paper](https://arxiv.org/abs/2406.11617)\n\nWith the proliferation of domain-specific models, model merging has emerged as a set of techniques that combine the capabilities of multiple models into one that can multitask without the cost of additional training. In this paper, we propose a new model merging technique, **D**rop and r**E**sca**L**e via samp**L**ing with m**A**gnitude (DELLA-Merging), that employs a novel pruning technique, MagPrune, which shows significant advantages over DARE and TIES. MagPrune first ranks the parameters in order of their magnitude and assigns higher dropout probabilities ($p$) to parameters with lower ranks corresponding to lower magnitudes. To approximate the original embeddings, MagPrune employs a rescaling operation on the parameters that survive the random dropping by $1/(1-p)$. 
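The MagPrune step described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name `magprune` is hypothetical, its argument names mirror the `merge.py` flags, and the linear schedule that spreads drop probabilities over a window of width `window_size` centred on `drop_rate` is an assumption about how the window is applied. Smaller-magnitude entries get higher drop probabilities, and each survivor is rescaled by 1/(1-p_i).

```python
import numpy as np


def magprune(delta, drop_rate=0.3, window_size=0.14, rescale=True, seed=42):
    # Hypothetical sketch of MagPrune on a 1-D array of delta parameters
    # (fine-tuned weights minus base weights); not the repository's code.
    delta = np.asarray(delta, dtype=np.float64)
    lo, hi = drop_rate - window_size / 2, drop_rate + window_size / 2
    if lo < 0.0 or hi >= 1.0:
        raise ValueError('drop probability window must lie inside [0, 1)')
    n = delta.size
    # Rank by magnitude: rank 0 is the smallest |delta|, rank n - 1 the largest.
    ranks = np.empty(n, dtype=np.int64)
    ranks[np.argsort(np.abs(delta), kind='stable')] = np.arange(n)
    # Assumed schedule: probabilities fall linearly from hi (smallest magnitude)
    # to lo (largest magnitude), so their mean equals drop_rate.
    rel = ranks / (n - 1) if n > 1 else np.full(n, 0.5)
    p = hi - window_size * rel
    # Drop each parameter independently with its own probability p_i.
    rng = np.random.default_rng(seed)
    kept = rng.random(n) >= p
    # Rescale survivors by 1 / (1 - p_i) so each entry keeps its expected value.
    scale = 1.0 / (1.0 - p) if rescale else np.ones(n)
    return np.where(kept, delta * scale, 0.0), p
```

With the values used in the `merge.py` example command (`drop_rate=0.3`, `window_size=0.14`), this sketch assigns probabilities from 0.23 for the largest-magnitude entry up to 0.37 for the smallest, averaging 0.3.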
On three different expert models considered for merging (LM, Math, Code) and the corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), DELLA shows an average improvement of 2.4 points over baseline methods employing delta parameter pruning (3.6 points over TIES, 1.2 points over DARE), and 11.1 points over the no-pruning baseline (TA).\n\n## Setting up Environment\n```bash\nconda create -n della python=3.9.18\nconda activate della\n\npip install -r requirements.txt\npip install -e ./mergekit/\n```\n\n### Installing HumanEval\n```bash\ngit clone https://github.com/openai/human-eval\npip install -e human-eval\n```\n\n### Installing lm-evaluation-harness\n```bash\ngit clone https://github.com/EleutherAI/lm-evaluation-harness\ncd lm-evaluation-harness\npip install -e .\n```\n\n## Merging and Pruning models\nBefore performing pruning or merging, add the paths to the following model checkpoints in merge.py:\n\n```py\n# Expert model paths\nWIZARDMATH13B_PATH = \"\u003cPath to WizardMath-13B-V1.0\u003e\"\nWIZARDCODER13B_PATH = \"\u003cPath to WizardCoder-Python-13B-V1.0\u003e\"\nWIZARDLM13B_PATH = \"\u003cPath to WizardLM-13B-V1.2\u003e\"\nLLAMA2_13B_CODE_ALPACA = \"\u003cPath to llama-2-13b-code-alpaca\u003e\"\n\n# Base model paths\nCODELLAMA_PATH = \"\u003cPath to CodeLlama-13b-Python-hf\u003e\"\nLLAMA2_13B_PATH = \"\u003cPath to Llama-2-13b-hf\u003e\"\n```\n\nThen run the merge, for example:\n\n```bash\npython merge.py \\\n    --drop_rate 0.3 \\\n    --merge_method della \\\n    --models LM_math_code \\\n    --weights 1.0 \\\n    --lambda_factor 1.1 \\\n    --window_size 0.14 \\\n    --rescale 1 \\\n    --seed 42\n```\n\nThe main arguments are:\n\n- `--drop_rate`: drop rate of the delta parameters.\n- `--weights`: weight assigned to each model's delta parameters.\n- `--lambda_factor`: lambda scaling factor applied after Step 3 (Merge).\n- `--window_size`: window size for the drop probabilities; does not affect DARE and TIES.\n- `--rescale`: whether to rescale in Step 1; accepts only `1` or `0`.\n- `--seed`: random seed.\n\nTo perform other merge combinations, replace `LM_math_code` in the command with `LM_math`, `LM_code` or `math_code`. For pruning experiments, pass `LM`, `math`, `code` or `Coder` to the `--models` argument. Since WizardCoder (`Coder`) uses a different base model from the other three models, we are not able to merge it with them effectively. Refer to `mergekit/mergekit/mergemethods/__init__.py` for a list of all the implemented merge methods using DARE, TIES and DELLA.\n\n## Generating Responses for Evaluation\nAfter merging, use the scripts we provide to run inference on two evaluation datasets: AlpacaEval for instruction following and MBPP for code generation.\n\nFor GSM8K, we generate responses with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).\n\n### Script for generating AlpacaEval responses\n```bash\nCUDA_VISIBLE_DEVICES=0 python generate.py \\\n  --model_path Path_to_Model_Checkpoint_folder \\\n  --dataset alpaca_eval \\\n  --full\n```\nAfter running the script, the model's generations will be saved to `./save_gen_instruct_responses_results/alpaca_eval/{model_name}.json`.\n\n### Script for generating MBPP responses\n```bash\nCUDA_VISIBLE_DEVICES=0 python generate.py \\\n  --model_path Path_to_Model_Checkpoint_folder \\\n  --dataset mbpp \\\n  --full\n```\nAfter running the script, the model's generations will be saved to `./save_gen_codes_results/mbpp/{model_name}.json`.\n\n### Generating GSM8K Responses\nBefore running the script, copy the lm_eval config file `./lm_eval_task_config/gsm8k_cot_zeroshot_alpaca.yaml` into the `./lm-evaluation-harness/lm_eval/tasks/gsm8k` directory of the lm-evaluation-harness repository. 
Since the WizardMath model uses the Alpaca prompt template, we use this new task config, which applies the Alpaca prompt template, to obtain the GSM8K generations.\n\n```bash\nlm_eval --model hf \\\n      --model_args pretrained=Path_to_Model_Checkpoint_folder \\\n      --tasks gsm8k_cot_zeroshot_alpaca \\\n      --batch_size 8 \\\n      --output_path ./save_gen_math_results/gsm8k/{model_name}/ \\\n      --log_samples \\\n      --seed 42 \\\n      --device cuda:0\n```\n\nAfter running the script, you will find the model's generations and the hard-coded parser evaluations under the output path specified in the command.\n\n## Performing Evaluation\nThe scripts above store the model's generated completions for all three tasks. Run the following commands to evaluate them and obtain the final metrics.\n\n### AlpacaEval\nWe use the `alpaca_eval_gpt4` evaluator from the [alpaca_eval repository](https://github.com/tatsu-lab/alpaca_eval) to compute the win rate. Please refer to the [alpaca_eval repository](https://github.com/tatsu-lab/alpaca_eval) to install the environment. Then, to perform the evaluation, run the following command:\n```bash\nalpaca_eval \\\n  --model_outputs save_gen_instruct_responses_results/alpaca_eval/{model_name}.json \\\n  --name {model_name} \\\n  --output_path alpaca_eval_results/ \\\n  --is_overwrite_leaderboard True \\\n  --annotators_config alpaca_eval_gpt4\n```\nThis will create a CSV file `./alpaca_res_full/alpaca_eval_gpt4/leaderboard.csv` containing a leaderboard that ranks models by their evaluated win rate.\n\n### MBPP\n\nTo perform evaluation on MBPP, please refer to the [bigcode-evaluation-harness repository](https://github.com/bigcode-project/bigcode-evaluation-harness) to install the environment. Then run the following command to perform the evaluation:\n\n```bash\naccelerate launch ../bigcode-evaluation-harness/main.py \\\n    --tasks mbpp \\\n    --allow_code_execution \\\n    --model {model_name} \\\n    --load_generations_path ./save_gen_codes_results/mbpp/{model_name}.jsonl \\\n    --metric_output_path ./save_codes_results/mbpp/{model_name}_eval_metrics.json\n```\n\n### GSM8K\nWe use GPT-4 to evaluate the model's math responses by prompting it with the question, the reference solution, and the model-generated answer, and asking it to judge the answer's correctness. This gives a more comprehensive automatic evaluation than rigid, hard-coded parsing approaches, which are suboptimal math evaluators. GPT-4 acts as a smart parser: it can correctly identify the final answer instead of just taking the last number in the generation as the answer. GPT-4 can also evaluate the intermediate steps of the solution, giving a more accurate assessment of the model's mathematical reasoning.\n\nTo perform evaluation with GPT-4, first open gpt4_as_judge_gsm8k.py and add the path to a JSON file containing your GPT-4 API key. 
\n```python\nkey_path = \"\u003cPath to GPT4 API KEY JSON\u003e\"\n```\nThen run the following command:\n\n```bash\npython gpt4_as_judge_gsm8k.py \\\n  --response_file Path_to_lm-eval_gsm8k_response_file \\\n  --save_path save_gsm8k_results/\n```\n\n## Citation\n\n```\n@misc{deep2024dellamerging,\n      title={DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling}, \n      author={Pala Tej Deep and Rishabh Bhardwaj and Soujanya Poria},\n      year={2024},\n      eprint={2406.11617},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeclare-lab%2Fdella","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeclare-lab%2Fdella","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeclare-lab%2Fdella/lists"}