Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/showlab/movieseq

[ECCV2024] Learning Video Context as Interleaved Multimodal Sequences
https://github.com/showlab/movieseq

Last synced: about 2 months ago
JSON representation

[ECCV2024] Learning Video Context as Interleaved Multimodal Sequences

Awesome Lists containing this project

README

        

# MovieSeq (ECCV'24)

![overview](./assets/teaser.png)

MovieSeq is a method designed to enhance Large Multimodal Models for improved video in-context learning using interleaved multimodal sequences (e.g., character photo, human dialogues, etc).

We have developed a lightweight practical code that can be easily integrated into existing LMMs (e.g., GPT-4o) for easy usage.

## Environments
```
conda create --name movieseq python=3.10
conda activate movieseq
conda install pytorch==2.0.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia

pip install git+https://github.com/m-bain/whisperx.git
pip install tqdm moviepy openai opencv-python
```

## Guideline
Please refer to `example.ipynb` to learn how MovieSeq works.
Have fun!