Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/declare-lab/robustmift
[Arxiv 2024] Official Implementation of the paper: "Towards Robust Instruction Tuning on Multimodal Large Language Models"
instruction-following large-language-models llm llm-robustness multimodal-instruction-tuning multimodal-large-language-models
Last synced: 6 days ago
- Host: GitHub
- URL: https://github.com/declare-lab/robustmift
- Owner: declare-lab
- License: MIT
- Created: 2024-02-22T11:14:17.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-06-22T08:46:38.000Z (5 months ago)
- Last Synced: 2024-06-22T18:28:24.261Z (5 months ago)
- Topics: instruction-following, large-language-models, llm, llm-robustness, multimodal-instruction-tuning, multimodal-large-language-models
- Language: Jupyter Notebook
- Homepage:
- Size: 67.5 MB
- Stars: 8
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Robust Instruction Tuning on MLLMs
Official Implementation of the paper: [InstrAug: Automatic Instruction Augmentation for Multimodal Instruction Fine-tuning](https://arxiv.org/pdf/2402.14492.pdf)

## Introduction
InstrAug is a framework for instruction augmentation. It can expand an existing small instruction set into one __up to 30x__ larger.
The InstrAug pipeline consists of the following steps, as illustrated in the figure below (a toy code sketch follows the list):
1. Meta-prompt Generation
2. Augmented Instruction Generation and Rule-based Filtering
   * Multi-temperature sampling ($\mathrm{MINS+}_{\mathrm{MT}}$)
   * Iterative rephrasing ($\mathrm{MINS+}_{\mathrm{Iter}}$)
3. Instruction-following Dataset Construction
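To make steps 1 and 2 concrete, here is a minimal Python sketch of the augmentation loop. It is an illustration only: `generate`, `passes_rules`, and `augment` are names invented here, the dummy `generate` stands in for the LLaMA calls used in the paper, and the filter is a placeholder for the paper's actual rule set.

```python
# Toy sketch of an InstrAug-style augmentation loop. Every name below is a
# hypothetical placeholder invented for illustration, not this repo's API.
import random

def generate(prompt: str, temperature: float) -> str:
    """Stand-in for an LLM call (the paper uses LLaMA). Returns a dummy
    variant so the sketch runs end to end."""
    return f"{prompt} [T={temperature:.1f}/{random.randint(0, 999)}]"

def passes_rules(candidate: str, seed: str, seen: set) -> bool:
    """Toy rule-based filter: drop duplicates, copies of the seed, and
    degenerate (too short) outputs."""
    return candidate not in seen and candidate != seed and len(candidate.split()) >= 3

def augment(seeds, per_temp=5, temps=(0.7, 1.0, 1.3), rephrase_rounds=1):
    dataset = []
    for seed in seeds:
        # Step 1: meta-prompt generation -- wrap the seed instruction in a
        # rephrasing prompt for the generator model.
        meta = f"Rewrite this instruction, keeping its meaning: {seed}"

        # Step 2a: multi-temperature sampling -- draw candidates at several
        # temperatures to trade fidelity against diversity.
        pool = [generate(meta, t) for t in temps for _ in range(per_temp)]

        # Step 2b: iterative rephrasing -- feed outputs back through the
        # generator to drift further from the seed wording.
        for _ in range(rephrase_rounds):
            pool += [generate(f"Rewrite again: {c}", random.choice(temps))
                     for c in pool]

        # Rule-based filtering, then step 3: keep each surviving variant as a
        # dataset entry (a real pipeline would pair it with multimodal samples).
        seen = set()
        for cand in pool:
            if passes_rules(cand, seed, seen):
                seen.add(cand)
                dataset.append({"seed": seed, "instruction": cand})
    return dataset

print(len(augment(["Describe the image in one sentence."])))
```

With its default settings the sketch yields roughly 30 variants per seed, mirroring the up-to-30x expansion quoted above.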
We apply InstrAug to Multimodal Instruction Fine-tuning (MIFT) benchmarks and test on 12 downstream tasks from __MultiInstruct__ and __InstructBLIP-Bench__, as well as the whole __MMMU__ benchmark.
The results show that models trained on the instruction-augmented dataset (59K) are competitive with, or even exceed, those trained on non-augmented but larger datasets (564K).

## Repo Hierarchy
The file structure of this repository is shown below; only the important folders/files are listed.
```
├── IBLIP               # Implementation code on Instruct-BLIP
├── OFA                 # Implementation code on OFA
├── MultiInstruct       # Code to create MINS+
│   ├── llama             # Code to generate augmented instructions using LLaMA
│   ├── mminstr_dataset   # Folder to store the MINS and MINS+ datasets
│   └── instruction_data  # Folder to store the original and generated instruction sets
├── LICENSE
└── README.md
```

## Usage
Please refer to the README.md under each individual folder for more details.

## Results
### 1. Results on MultiInstruct
### 2. Results on IBLIP-Bench
### 3. Results on MMMU
## Citation
Please cite our paper if you find this work useful for your research and applications.

```bibtex
@misc{han2024robust,
  title={Towards Robust Instruction Tuning on Multimodal Large Language Models},
  author={Wei Han and Hui Chen and Soujanya Poria},
  year={2024},
  eprint={2402.14492},
  archivePrefix={arXiv},
}
```