{"id":18322505,"url":"https://github.com/tencentarc/pi-tuning","last_synced_at":"2025-04-05T23:31:05.944Z","repository":{"id":182736299,"uuid":"632265433","full_name":"TencentARC/pi-Tuning","owner":"TencentARC","description":"Official code for \"pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation\", ICML 2023.","archived":false,"fork":false,"pushed_at":"2023-07-21T23:54:55.000Z","size":8131,"stargazers_count":32,"open_issues_count":2,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-21T13:23:10.751Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TencentARC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-25T03:57:19.000Z","updated_at":"2024-04-19T12:31:05.000Z","dependencies_parsed_at":"2024-11-05T18:44:43.061Z","dependency_job_id":null,"html_url":"https://github.com/TencentARC/pi-Tuning","commit_stats":null,"previous_names":["tencentarc/pi-tuning"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2Fpi-Tuning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2Fpi-Tuning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2Fpi-Tuning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2Fpi-Tuning/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/TencentARC","download_url":"https://codeload.github.com/TencentARC/pi-Tuning/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247415783,"owners_count":20935383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T18:24:54.009Z","updated_at":"2025-04-05T23:31:00.936Z","avatar_url":"https://github.com/TencentARC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# $\\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation\n\n\u003e Chengyue Wu, Teng Wang, Yixiao Ge, Zeyu Lu, Ruisong Zhou, Ping Luo, Ying Shan\n\nThis repo is the official implementation of the paper \u003ca href=\"https://arxiv.org/abs/2304.14381\"\u003e $\\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation \u003c/a\u003e.\n\n![Overview](./imgs/overview.png)\n\n## News\n\n+ **[2023.04]** Our paper is accepted by ICML 2023.\n\n+ **[2023.07]** The official code is released.\n\n## Main Results\n\n### Vision-Language Benchmarks\n\n![Tab1](imgs/Tab1.png)\n\n### Vision Benchmarks\n\n![Tab2](imgs/Tab2.png)\n\n### Language Benchmarks\n\n![Tab3](imgs/Tab3.png)\n\n## Instruction\n\n### Dataset and Checkpoints Preparation\n\nSee [datasets.md](datasets.md) for dataset preparation. As for the checkpoints, please see [checkpoints](checkpoints.md).\n\n### Installation\n```bash\npip install -r OFA/requirements.txt\n```\n### Training and Evaluation\n\nWe use NVIDIA A100 GPUs for training and evaluation. 
The detailed hyper-parameters can be found in the Appendix of the paper.\n\n#### Step 1: PETL training\nWe provide several demo scripts that have all the required parts for PETL training:\n* OFA/run_scripts/refcoco/train_refcoco_adapter.sh\n* OFA/run_scripts/refcoco/train_refcoco_prefix.sh\n* OFA/run_scripts/refcoco/train_refcoco_lora.sh\n\nUsage:\n```bash\ncd OFA\nbash ./run_scripts/refcoco/train_refcoco_adapter.sh\n```\nA few options of note:\n*   `--encoder-prompt` :: whether to insert prompts into the encoder\n*   `--decoder-prompt` :: whether to insert prompts into the decoder\n*   `--encoder-prompt-length` :: encoder prompt length\n*   `--decoder-prompt-length` :: decoder prompt length\n*   `--bitfit` :: whether to use bitfit\n*   `--adapter` :: whether to use adapter\n*   `--adapter-dim` :: adapter projection dim\n*   `--lora` :: whether to use lora\n*   `--lora-r` :: lora rank\n\n#### Step 2: Task similarity measurement\nWe provide a demo script to calculate the task embedding of RefCOCO based on the Fisher Information Matrix (FIM) with a diagonal approximation: `OFA/run_scripts/refcoco/refcoco_task_emb.sh`\n\nUsage:\n```bash\ncd OFA\nbash ./run_scripts/refcoco/refcoco_task_emb.sh\n```\n\nA few options of note:\n* `--task-emb` :: task embedding calculation\n* `--task-emb-file-path` :: directory to save the task embedding result (we recommend saving it under OFA/results/task_name/)\n\nAfter obtaining the embedding of each task, use [task_emb_post_process.ipynb](./OFA/results/task_emb_post_process.ipynb) to calculate the similarity between tasks.\n\n#### Step 3: Expert interpolation\nWe provide a demo script to interpolate 3 experts (RefCOCO, RefCOCO+, RefCOCOg) for the target task, RefCOCO: `OFA/run_scripts/refcoco/train_refcoco_adapter_interpolation.sh`\n\nUsage:\n```bash\ncd OFA\nbash ./run_scripts/refcoco/train_refcoco_adapter_interpolation.sh\n```\n\n#### Evaluation\nAfter the above steps, you can use `OFA/run_scripts/refcoco/evaluate_refcoco.sh` to evaluate the final checkpoint. 
Remember to change the checkpoint path in the script.\n\nUsage:\n```bash\ncd OFA\nbash ./run_scripts/refcoco/evaluate_refcoco.sh\n```\n\nWe recommend organizing your workspace directory like this:\n```\nOFA/\n├── checkpoints/\n│   ├── ofa_base.pt\n│   ├── ofa_large.pt\n│   └── ...\n├── criterions/\n├── data/\n├── dataset/\n│   ├── caption_data/\n│   ├── refcoco_data/\n│   └── ...\n├── fairseq/\n├── models/\n├── run_scripts/\n├── tasks/\n├── train.py\n├── trainer.py\n└── utils/\n```\n### Acknowledgement\n\nThe code is based on the official implementation of [OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework](https://github.com/OFA-Sys/OFA).\n\n\u003c!-- \n### Citation\n\nIf you find our work helps, please cite our paper.\n\n```tex\n@article{zeng2022learning,\n  title={Learning Transferable Spatiotemporal Representations from Natural Script Knowledge},\n  author={Zeng, Ziyun and Ge, Yuying and Liu, Xihui and Chen, Bin and Luo, Ping and Xia, Shu-Tao and Ge, Yixiao},\n  journal={arXiv preprint arXiv:2209.15280},\n  year={2022}\n}\n``` --\u003e\n\n### License\n\nThis research paper makes references to some open-source projects. Credits are given to these projects. See [License.txt](License.txt) for details.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fpi-tuning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftencentarc%2Fpi-tuning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fpi-tuning/lists"}