https://github.com/togethercomputer/smir
synthetic data pipeline for multi-image reasoning
https://github.com/togethercomputer/smir
Last synced: 4 months ago
JSON representation
synthetic data pipeline for multi-image reasoning
- Host: GitHub
- URL: https://github.com/togethercomputer/smir
- Owner: togethercomputer
- License: apache-2.0
- Created: 2024-10-16T22:18:48.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-04T21:45:48.000Z (over 1 year ago)
- Last Synced: 2025-03-04T22:29:55.648Z (over 1 year ago)
- Language: Python
- Size: 939 MB
- Stars: 3
- Watchers: 4
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# SMiR
Synthetic data pipeline for multi-image reasoning
## Overview
This repository contains the official implementation of our paper: [Efficient Synthetic Data Pipeline to Improve Multi-Image Reasoning](https://arxiv.org/abs/2501.03675).
## 🏆 Credits
We would like to acknowledge the following resources that were instrumental in the development of SMIR:
- [Meta Llama 3.1](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct): We utilized the Llama 3.1 model as our foundational language model via ["Together AI"](https://www.together.ai/models/llama-3-1-70b).
- [SigLIP](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384): We utilized a SigLIP model as our embedding model from Google.
- [CLIP](https://github.com/facebookresearch/MetaCLIP/blob/main/src/open_clip/model_configs/ViT-H-14-quickgelu.json): We utilized MetaCLIP, Meta's implementation of CLIP, as our embedding model.
- We used training and evaluation code from the following repositories:
- [MANTIS: Interleaved Multi-Image Instruction Tuning](https://github.com/TIGER-AI-Lab/Mantis)
- [From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline](https://github.com/lmarena/arena-hard-auto)
## 📚 BibTeX
```bibtex
@misc{li2025smirefficientsyntheticdata,
title={SMIR: Efficient Synthetic Data Pipeline To Improve Multi-Image Reasoning},
author={Andrew Li and Rahul Thapa and Rahul Chalamala and Qingyang Wu and Kezhen Chen and James Zou},
year={2025},
eprint={2501.03675},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2501.03675},
}
```