{"id":37762010,"url":"https://github.com/nik-dim/tall_masks","last_synced_at":"2026-01-16T14:38:01.603Z","repository":{"id":239647809,"uuid":"793715559","full_name":"nik-dim/tall_masks","owner":"nik-dim","description":"Official repository of \"Localizing Task Information for Improved Model Merging and Compression\" [ICML 2024]","archived":false,"fork":false,"pushed_at":"2025-12-22T18:03:40.000Z","size":555,"stargazers_count":51,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-12-24T07:30:37.418Z","etag":null,"topics":["model-merging","task-arithmetic"],"latest_commit_sha":null,"homepage":"https://tall-masks.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nik-dim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-04-29T18:26:20.000Z","updated_at":"2025-12-22T18:03:44.000Z","dependencies_parsed_at":"2024-10-24T20:27:10.128Z","dependency_job_id":"f87d31e9-e023-478d-ae3c-5a51bf905e2c","html_url":"https://github.com/nik-dim/tall_masks","commit_stats":null,"previous_names":["nik-dim/tall_masks"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nik-dim/tall_masks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nik-dim%2Ftall_masks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nik-dim%2Ftall_masks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/nik-dim%2Ftall_masks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nik-dim%2Ftall_masks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nik-dim","download_url":"https://codeload.github.com/nik-dim/tall_masks/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nik-dim%2Ftall_masks/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28479399,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T11:59:17.896Z","status":"ssl_error","status_checked_at":"2026-01-16T11:55:55.838Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["model-merging","task-arithmetic"],"created_at":"2026-01-16T14:38:01.512Z","updated_at":"2026-01-16T14:38:01.583Z","avatar_url":"https://github.com/nik-dim.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TALL Masks\n\nThis is the source code to reproduce the experiments for \"[Localizing Task Information for Improved Model Merging and Compression](https://arxiv.org/abs/2405.07813)\" by Ke Wang*, Nikolaos Dimitriadis*, Guillermo Ortiz-Jimenez, Francois Fleuret, and Pascal Frossard.\n\nOur paper shows that task-specific knowledge is preserved after merging, and proposes a method named TALL mask to localize it.\nBased on TALL mask, we 
propose:\n1) a compression scheme which utilizes TALL mask to recover single-task fine-tuned performance for each task\n2) a merging algorithm which removes catastrophic and selfish weights to improve model merging performance\n\nYou can find more information on the [project website](https://tall-masks.github.io/).\n\n![](figures/illustration.png)\n\n## Updates\n\nOur new work \"[LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging](https://arxiv.org/abs/2410.17146)\" has been accepted at ICLR 2025. Check out the [LiNeS repo](https://github.com/wang-kee/LiNeS)!\n\n## Dependencies\n\nTo run the code, please install all its dependencies:\n```sh\nconda env create\nconda activate tall-masks\n```\n\n## Checkpoints\nWe provide the checkpoints, as well as the task-specific masks used in the paper, at [this link](https://drive.google.com/drive/folders/15ParSng4d5xSdaWdBFsg1617zPXT8Dae?usp=sharing). Alternatively, you can download the checkpoints and masks by running the following script:\n```sh\n# model options --model {ViT-B-32,ViT-L-14} \n# kind options --kind {checkpoints,tall_masks}\n# use python download_checkpoints.py --help for more information\npython download_checkpoints.py --model='ViT-B-32' --kind=checkpoints\n```\n\nThe script downloads *all* the checkpoints for one model, corresponding to 40 files (a fine-tuned checkpoint and a classification head for each of the 20 tasks). The script uses the `gdown` package to download the files. If you encounter any issues, please refer to the [gdown documentation](https://github.com/wkentaro/gdown?tab=readme-ov-file#faq). 
A common issue is that the download quota is exceeded, in which case you can download the files manually from the [Google Drive folder](https://drive.google.com/drive/folders/15ParSng4d5xSdaWdBFsg1617zPXT8Dae?usp=sharing) or modify your local cookies file as described in the gdown documentation.\n\nAlternatively, the checkpoints can be downloaded from the HuggingFace repo [`nik-dim/tall_masks`](https://huggingface.co/nik-dim/tall_masks). See the [`snapshot_download documentation`](https://huggingface.co/docs/huggingface_hub/v0.26.0/en/package_reference/file_download#huggingface_hub.snapshot_download) for more details.\n\n```python\nfrom huggingface_hub import snapshot_download\n\n# download the ViT-B-32 checkpoints including backbone, classification heads and tall masks\nsnapshot_download(repo_id=\"nik-dim/tall_masks\", allow_patterns=\"*32*\")\n\n# download the ViT-B-16 checkpoints including backbone, classification heads and tall masks\nsnapshot_download(repo_id=\"nik-dim/tall_masks\", allow_patterns=\"*16*\")\n\n# download the ViT-L-14 checkpoints including backbone, classification heads and tall masks\nsnapshot_download(repo_id=\"nik-dim/tall_masks\", allow_patterns=\"*14*\")\n\n# download everything\nsnapshot_download(repo_id=\"nik-dim/tall_masks\")\n```\n\n## Datasets\nMost of the datasets used should be downloaded automatically with torchvision or huggingface. For the datasets requiring manual preparation, please follow the instructions in [this issue](https://github.com/mlfoundations/task_vectors/issues/1). Depending on the torchvision version, some issues might arise when downloading specific datasets, like [here](https://github.com/basveeling/pcam/issues/4) or [here](https://github.com/pytorch/vision/issues/5662). In this case, using a different torchvision version might solve the issue. 
\n\n\n## Localizing Task Information with TALL Masks\n\nThe pseudo-code below shows how to use a TALL mask to localize task information in the multi-task vector and reconstruct an individual checkpoint.\n\nTo create a task vector, you will need a pre-trained checkpoint and a fine-tuned checkpoint:\n```python\nfrom task_vectors import TaskVector\ntask_vector_A = TaskVector(pretrained_checkpoint, finetuned_checkpoint_A)\n```\nCreate a multi-task vector:\n```python\nmulti_task_vector = task_vector_A + task_vector_B + task_vector_C\n```\nConstruct the TALL mask:\n```python\n# lambda_ is the mask-threshold hyperparameter (`lambda` itself is a reserved word in Python)\ntall_mask_A = task_vector_A.abs() \u003e (multi_task_vector - task_vector_A).abs() * lambda_\n```\nReconstruct the fine-tuned model with the TALL mask:\n```python\n# the reconstructed finetuned_checkpoint_A has nearly the same performance as the original finetuned_checkpoint_A\nreconstructed_finetuned_checkpoint_A = pretrained_checkpoint + multi_task_vector * tall_mask_A\n```\n## Finetuning\nThe script `finetune.py` can be used to reproduce the training protocol we used to fine-tune our models on all our downstream tasks.\n```sh \n# Finetune on 2 GPUs\npython finetune.py --model=ViT-B-32 --world-size=2 \n```\n\n## Evaluation\n\n### Model merging evaluation\n\nEvaluation is configured with Hydra; please set `model_location` and `data_location` in `config/config.yaml` before evaluation. 
\n\n##### Evaluate with baseline model merging methods:\n```bash\n# Evaluate with Task Arithmetic\npython main.py model=ViT-B-32 method=\"sum\" \n\n# Evaluate with Ties-merging\npython main.py model=ViT-B-32 method=\"ties\" method.k=20\n```\n##### Evaluate with TALL mask + model merging methods:\n```bash\n# Evaluate with Tall mask + Task Arithmetic (load tall masks from storage)\npython main.py model=ViT-B-32 method=\"tall_mask\" method.load_mask=True\n\n# Evaluate with Tall mask + Task Arithmetic (construct tall masks from scratch)\npython main.py model=ViT-B-32 method=\"tall_mask\"\n\n# Evaluate with Tall mask + Ties-merging (load tall masks from storage)\npython main.py model=ViT-B-32 method=\"tall_mask\" method.use_ties=True method.load_mask=True\n\n# Evaluate with Tall mask + Ties-merging (construct tall masks from scratch)\npython main.py model=ViT-B-32 method=\"tall_mask\" method.use_ties=True \n```\n##### Evaluate with Consensus Merging (after constructing TALL masks):\n```bash\n# Evaluate with Consensus Task Arithmetic\npython main.py model=ViT-B-32 method=\"consensus\" method.prun_thre_k=2\n\n# Evaluate with Consensus Ties-merging\npython main.py model=ViT-B-32 method=\"consensus\" method.prun_thre_k=2 method.use_ties=True\n```\n\nNote that you can change the number of tasks by setting `num_tasks`. The first `num_tasks` tasks are then selected from the list defined in `src/utils/variables_and_paths.py`. Alternatively, you can directly specify the tasks as a list of strings (e.g. `DATASETS=[MNIST,Cars]`). 
The results in the paper can be reproduced by setting `num_tasks` to 8, 14 and 20 for the corresponding experiments.\n\n### Single-task evaluation\nYou can evaluate the performance of the fine-tuned weights on each single task by running\n```sh \n# Evaluate pre-trained models.\npython eval_single_task.py --model=ViT-B-32 --finetuning-mode=none\n\n# Evaluate non-linearly fine-tuned models.\npython eval_single_task.py --model=ViT-B-32 --finetuning-mode=standard\n```\n\nThe results are saved in the `results/` folder. \n\n## Reference\nIf you find this code useful, please cite the following paper:\n```bibtex\n@inproceedings{wang2024localizing,\n  title={Localizing Task Information for Improved Model Merging and Compression},\n  author={Wang, Ke and\n    Dimitriadis, Nikolaos and\n    Ortiz{-}Jim{\\'{e}}nez, Guillermo and\n    Fleuret, Fran\\c{c}ois and\n    Frossard, Pascal},\n  booktitle={International Conference on Machine Learning},\n  year={2024}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnik-dim%2Ftall_masks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnik-dim%2Ftall_masks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnik-dim%2Ftall_masks/lists"}