https://github.com/amazon-science/mix-generation

MixGen: A New Multi-Modal Data Augmentation

data-augmentation data-efficiency multimodal pretraining vision-language


Host: GitHub
URL: https://github.com/amazon-science/mix-generation
Owner: amazon-science
License: apache-2.0
Created: 2022-06-30T07:33:41.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-01-09T21:46:27.000Z (over 2 years ago)
Last Synced: 2025-06-26T11:04:15.933Z (17 days ago)
Topics: data-augmentation, data-efficiency, multimodal, pretraining, vision-language
Language: Python
Homepage:
Size: 9.09 MB
Stars: 123
Watchers: 3
Forks: 7
Open Issues: 2
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md


## MixGen: A New Multi-Modal Data Augmentation

This is the official PyTorch implementation of [MixGen](https://arxiv.org/abs/2206.08358), a joint data augmentation technique that improves data efficiency in vision-language representation learning.



Figure: example image-text pairs generated by MixGen.



## How to use

MixGen is an input-level data augmentation technique that plugs into existing vision-language learning methods with minimal code changes.

Here we use [ALBEF, NeurIPS'21](https://arxiv.org/abs/2107.07651) as an illustrative example. Only one line needs to be added, between the dataloader and the model's forward pass, [here](https://github.com/salesforce/ALBEF/blob/main/Pretrain.py#L54).

That is, change from

```
for i, (image, text) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
    optimizer.zero_grad()
```

to

```
import mixgen as mg

for i, (image, text) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
    image, text = mg.mixgen(image, text, num=16)
    optimizer.zero_grad()
```

And that's it! No other changes are needed. You can kick off training just as ALBEF does:

```
python -m torch.distributed.launch --nproc_per_node=8 --use_env Pretrain.py
```
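For intuition about what the added `mg.mixgen(image, text, num=16)` call does to a batch, here is a minimal sketch. As described in the MixGen paper, a new pair is formed by linearly interpolating two images and concatenating their two captions. The sketch is not the repository's `mixgen.py`: the name `mixgen_sketch`, the `lam` mixing weight, the choice to pair sample `i` with sample `i + num`, and the use of NumPy arrays in place of PyTorch tensors are all assumptions made for illustration.

```python
import numpy as np

def mixgen_sketch(image, text, num, lam=0.5):
    """Hypothetical illustration: mix the first `num` pairs with the next `num` pairs."""
    # Assumed pairing scheme (i with i + num), so the batch must hold 2 * num samples.
    if 2 * num > len(text):
        raise ValueError("num must be at most half the batch size")
    image = image.copy()  # leave the caller's batch untouched
    text = list(text)
    for i in range(num):
        # Image side: linear interpolation between two images.
        image[i] = lam * image[i] + (1 - lam) * image[i + num]
        # Text side: concatenate the two captions.
        text[i] = text[i] + " " + text[i + num]
    return image, text

# Toy batch: four constant "images" of shape (3, 2, 2) with values 0, 1, 2, 3.
images = np.stack([np.full((3, 2, 2), float(k)) for k in range(4)])
captions = ["a dog", "a cat", "on grass", "on a sofa"]

mixed_images, mixed_captions = mixgen_sketch(images, captions, num=2)
print(mixed_captions)
```

Under this assumed pairing, `num` can be at most half the per-GPU batch size, so a call with `num=16` needs batches of at least 32 samples.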

## Citation

If you find MixGen useful in your research, please consider citing the following paper.

```
@InProceedings{Hao_2023_WACV,
    author    = {Hao, Xiaoshuai and Zhu, Yi and Appalaraju, Srikar and Zhang, Aston and Zhang, Wanqian and Li, Bo and Li, Mu},
    title     = {MixGen: A New Multi-Modal Data Augmentation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {January},
    year      = {2023},
    pages     = {379-389}
}
```

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This project is licensed under the Apache-2.0 License.
