https://github.com/amazon-science/mix-generation
MixGen: A New Multi-Modal Data Augmentation
- Host: GitHub
- URL: https://github.com/amazon-science/mix-generation
- Owner: amazon-science
- License: apache-2.0
- Created: 2022-06-30T07:33:41.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-09T21:46:27.000Z (almost 2 years ago)
- Last Synced: 2024-07-28T19:19:42.280Z (4 months ago)
- Topics: data-augmentation, data-efficiency, multimodal, pretraining, vision-language
- Language: Python
- Homepage:
- Size: 9.09 MB
- Stars: 109
- Watchers: 3
- Forks: 5
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
## MixGen: A New Multi-Modal Data Augmentation
This is the official PyTorch implementation of [MixGen](https://arxiv.org/abs/2206.08358), a joint data augmentation technique that improves data efficiency in vision-language representation learning.
(Figure: example image-text pairs generated by MixGen.)
## How to use
MixGen is an input-level data augmentation technique that can be plugged into existing vision-language learning methods with minimal code changes.
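To make "input-level" concrete, here is a minimal sketch of this kind of augmentation. It assumes the scheme described in the MixGen paper: two images are linearly interpolated and their captions are concatenated. The function name `mixgen_sketch`, the `lam` mixing weight, the input check, and the use of NumPy arrays in place of PyTorch tensors are all assumptions made for illustration; this is not the repository's code.

```python
import numpy as np

def mixgen_sketch(image, text, num, lam=0.5):
    """Illustrative only: mix the first `num` samples with the next `num`.

    image: float array of shape (batch, channels, height, width)
    text:  list of `batch` caption strings
    """
    if 2 * num > len(text):
        raise ValueError("batch must hold at least 2 * num samples")
    image = image.copy()  # work on copies so the caller's batch is untouched
    text = list(text)
    for i in range(num):
        # Image side: linear interpolation of sample i and sample i + num.
        image[i] = lam * image[i] + (1 - lam) * image[i + num]
        # Text side: concatenate the two captions.
        text[i] = text[i] + " " + text[i + num]
    return image, text

# Tiny demo batch: four constant-valued "images" and four captions.
images = np.stack([np.full((3, 2, 2), float(v)) for v in range(4)])
captions = ["a dog", "a cat", "on grass", "on a sofa"]
mixed_images, mixed_captions = mixgen_sketch(images, captions, num=2)
print(mixed_captions[:2])        # ['a dog on grass', 'a cat on a sofa']
print(mixed_images[:, 0, 0, 0])  # [1. 2. 2. 3.]
```

Because only the batch contents change, not the batch size or tensor shapes, the model and loss code downstream need no modification. In this sketch, `num` must be at most half the batch size.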
Here we use [ALBEF, NeurIPS'21](https://arxiv.org/abs/2107.07651) as an illustrative example. We only need to add one line between the dataloader and the model's forward pass [here](https://github.com/salesforce/ALBEF/blob/main/Pretrain.py#L54).
That is, change from
```
for i, (image, text) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
    optimizer.zero_grad()
```
to
```
import mixgen as mg
for i, (image, text) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
    # The only added line: augment the batch before it reaches the model.
    image, text = mg.mixgen(image, text, num=16)
    optimizer.zero_grad()
```
And that's it! No further changes are needed. You can kick off training just as ALBEF does:
```
python -m torch.distributed.launch --nproc_per_node=8 --use_env Pretrain.py
```
## Citation
If you find MixGen useful in your research, please consider citing the following paper.
```
@InProceedings{Hao_2023_WACV,
author = {Hao, Xiaoshuai and Zhu, Yi and Appalaraju, Srikar and Zhang, Aston and Zhang, Wanqian and Li, Bo and Li, Mu},
title = {MixGen: A New Multi-Modal Data Augmentation},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
month = {January},
year = {2023},
pages = {379-389}
}
```
## Security
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
## License
This project is licensed under the Apache-2.0 License.