https://github.com/togethercomputer/smir

synthetic data pipeline for multi-image reasoning
https://github.com/togethercomputer/smir

Last synced: 4 months ago
JSON representation

synthetic data pipeline for multi-image reasoning

Host: GitHub
URL: https://github.com/togethercomputer/smir
Owner: togethercomputer
License: apache-2.0
Created: 2024-10-16T22:18:48.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-04T21:45:48.000Z (over 1 year ago)
Last Synced: 2025-03-04T22:29:55.648Z (over 1 year ago)
Language: Python
Size: 939 MB
Stars: 3
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # SMiR

Synthetic data pipeline for multi-image reasoning

## Overview

This repository contains the official implementation of our paper: [Efficient Synthetic Data Pipeline to Improve Multi-Image Reasoning](https://arxiv.org/abs/2501.03675).

## 🏆 Credits

We would like to acknowledge the following resources that were instrumental in the development of SMIR:

- [Meta Llama 3.1](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct): We utilized the Llama 3.1 model as our foundational language model via ["Together AI"](https://www.together.ai/models/llama-3-1-70b).

- [SigLIP](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384): We utilized a SigLIP model as our embedding model from Google.

- [CLIP](https://github.com/facebookresearch/MetaCLIP/blob/main/src/open_clip/model_configs/ViT-H-14-quickgelu.json): We utilized MetaCLIP, Meta's implementation of CLIP, as our embedding model.

- We used training and evaluation code from the following repositories:

  - [MANTIS: Interleaved Multi-Image Instruction Tuning](https://github.com/TIGER-AI-Lab/Mantis)

  - [From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline](https://github.com/lmarena/arena-hard-auto)



## 📚 BibTeX

```bibtex

@misc{li2025smirefficientsyntheticdata,

      title={SMIR: Efficient Synthetic Data Pipeline To Improve Multi-Image Reasoning}, 

      author={Andrew Li and Rahul Thapa and Rahul Chalamala and Qingyang Wu and Kezhen Chen and James Zou},

      year={2025},

      eprint={2501.03675},

      archivePrefix={arXiv},

      primaryClass={cs.CV},

      url={https://arxiv.org/abs/2501.03675}, 

}

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/togethercomputer/smir

Awesome Lists containing this project

README