Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/declare-lab/robustmift
[Arxiv 2024] Official Implementation of the paper: "Towards Robust Instruction Tuning on Multimodal Large Language Models"
instruction-following large-language-models llm llm-robustness multimodal-instruction-tuning multimodal-large-language-models
Last synced: 6 days ago
- Host: GitHub
- URL: https://github.com/declare-lab/robustmift
- Owner: declare-lab
- License: MIT
- Created: 2024-02-22T11:14:17.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-06-22T08:46:38.000Z (5 months ago)
- Last Synced: 2024-06-22T18:28:24.261Z (5 months ago)
- Topics: instruction-following, large-language-models, llm, llm-robustness, multimodal-instruction-tuning, multimodal-large-language-models
- Language: Jupyter Notebook
- Homepage:
- Size: 67.5 MB
- Stars: 8
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Robust Instruction Tuning on MLLMs
Official Implementation of the paper: [InstrAug: Automatic Instruction Augmentation for Multimodal Instruction Fine-tuning](https://arxiv.org/pdf/2402.14492.pdf)

## Introduction
InstrAug is a framework for instruction augmentation. It can expand an existing small instruction set into one __up to 30x__ larger.
The InstrAug pipeline consists of the following steps, as illustrated in the figure below (a toy code sketch follows the list):
1. Meta-prompt Generation
2. Augmented Instruction Generation and Rule-based Filtering
   * Multi-temperature sampling ($\mathrm{MINS+}_{\mathrm{MT}}$)
   * Iterative rephrasing ($\mathrm{MINS+}_{\mathrm{Iter}}$)
3. Instruction-following Dataset Construction
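To make steps 1 and 2 concrete, here is a minimal Python sketch of the augmentation loop. It is an illustration only: `generate`, `passes_rules`, and `augment` are names invented here, the dummy `generate` stands in for the LLaMA calls used in the paper, and the filter is a placeholder for the paper's actual rule set.

```python
# Toy sketch of an InstrAug-style augmentation loop. Every name below is a
# hypothetical placeholder invented for illustration, not this repo's API.
import random

def generate(prompt: str, temperature: float) -> str:
    """Stand-in for an LLM call (the paper uses LLaMA). Returns a dummy
    variant so the sketch runs end to end."""
    return f"{prompt} [T={temperature:.1f}/{random.randint(0, 999)}]"

def passes_rules(candidate: str, seed: str, seen: set) -> bool:
    """Toy rule-based filter: drop duplicates, copies of the seed, and
    degenerate (too short) outputs."""
    return candidate not in seen and candidate != seed and len(candidate.split()) >= 3

def augment(seeds, per_temp=5, temps=(0.7, 1.0, 1.3), rephrase_rounds=1):
    dataset = []
    for seed in seeds:
        # Step 1: meta-prompt generation -- wrap the seed instruction in a
        # rephrasing prompt for the generator model.
        meta = f"Rewrite this instruction, keeping its meaning: {seed}"

        # Step 2a: multi-temperature sampling -- draw candidates at several
        # temperatures to trade fidelity against diversity.
        pool = [generate(meta, t) for t in temps for _ in range(per_temp)]

        # Step 2b: iterative rephrasing -- feed outputs back through the
        # generator to drift further from the seed wording.
        for _ in range(rephrase_rounds):
            pool += [generate(f"Rewrite again: {c}", random.choice(temps))
                     for c in pool]

        # Rule-based filtering, then step 3: keep each surviving variant as a
        # dataset entry (a real pipeline would pair it with multimodal samples).
        seen = set()
        for cand in pool:
            if passes_rules(cand, seed, seen):
                seen.add(cand)
                dataset.append({"seed": seed, "instruction": cand})
    return dataset

print(len(augment(["Describe the image in one sentence."])))
```

With its default settings the sketch yields roughly 30 variants per seed, mirroring the up-to-30x expansion quoted above.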
We apply InstrAug to Multimodal Instruction Fine-tuning (MIFT) benchmarks and test on 12 downstream tasks from __MultiInstruct__ and __InstructBLIP-Bench__, as well as the whole __MMMU__ benchmark.
The results show that models trained on the instruction-augmented dataset (59K) are competitive with, or even exceed, those trained on non-augmented but larger datasets (564K).

## Repo Hierarchy
The file structure of this repository is shown below; only the important folders/files are listed.
```
├── IBLIP               # Implementation code on Instruct-BLIP
├── OFA                 # Implementation code on OFA
├── MultiInstruct       # Code to create MINS+
│   ├── llama             # Code to generate augmented instructions using LLaMA
│   ├── mminstr_dataset   # Folder to store the MINS and MINS+ datasets
│   └── instruction_data  # Folder to store the original and generated instruction sets
├── LICENSE
└── README.md
```

## Usage
Please refer to the README.md under each individual folder for more details.

## Results
### 1. Results on MultiInstruct
### 2. Results on IBLIP-Bench
### 3. Results on MMMU
## Citation
Please cite our paper if you find this work useful for your research and applications.

```bibtex
@misc{han2024robust,
  title={Towards Robust Instruction Tuning on Multimodal Large Language Models},
  author={Wei Han and Hui Chen and Soujanya Poria},
  year={2024},
  eprint={2402.14492},
  archivePrefix={arXiv},
}
```