https://github.com/willard-yuan/video-text-retrieval-papers

Host: GitHub
URL: https://github.com/willard-yuan/video-text-retrieval-papers
Owner: willard-yuan
Created: 2021-04-12T12:39:04.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2021-09-16T12:48:04.000Z (almost 4 years ago)
Last Synced: 2025-01-12T00:26:35.420Z (5 months ago)
Size: 12.7 KB
Stars: 15
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

# Awesome Text to Video papers

This list collects text-to-video retrieval work from academia and industry.

[![Awesome](https://awesome.re/badge.svg)](https://awesome.re)

## Papers

- [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918), arXiv 2021.

- [Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval](https://arxiv.org/abs/2104.00650), arXiv 2021, [code](https://github.com/m-bain/frozen-in-time).

- [MDMMT: Multidomain Multimodal Transformer for Video Retrieval](https://arxiv.org/abs/2103.10699), arXiv 2021.

- [Multi-modal Transformer for Video Retrieval](https://hal.inria.fr/hal-02903209/document), ECCV 2020, [code](https://github.com/gabeur/mmt).

- [VisRel: Media Search at Scale](https://research.fb.com/wp-content/uploads/2021/08/VisRel-Media-Search-at-Scale.pdf), SIGKDD 2021.

- [Que2Search: Fast and Accurate Query and Document Understanding for Search at Facebook](https://research.fb.com/wp-content/uploads/2021/08/Que2Search-Fast-and-Accurate-Query-and-Document-Understanding-for-Search-at-Facebook.pdf), SIGKDD 2021.

- [CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval](https://arxiv.org/abs/2104.08860), arXiv 2021, [code](https://github.com/ArrowLuo/CLIP4Clip).

## Datasets

- [Microsoft Research Video Description Corpus (MSVD)](https://paperswithcode.com/dataset/msvd): Also known as the YouTube2Text dataset and provided by Microsoft Research. It contains 1,970 YouTube video clips (each 10 to 25 seconds long), and each clip is annotated with roughly 40 English sentences.

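- [MSR-VTT (Microsoft Research Video to Text)](https://paperswithcode.com/dataset/msr-vtt): This dataset was released for the Microsoft Research - Video to Text (MSR-VTT) Challenge at ACM Multimedia 2016, hosted as the Microsoft Multimedia Challenge. It contains 10,000 video clips, split into training, validation, and test sets. Each clip is annotated with roughly 20 English sentences. MSR-VTT also provides a category label for each video (20 categories in total); this label is treated as prior information and is also known for the test set. All videos include audio. The benchmark uses four machine-translation evaluation metrics: METEOR, BLEU@1-4, ROUGE-L, and CIDEr.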

- [LSMDC (Large Scale Movie Description Challenge)](https://paperswithcode.com/dataset/lsmdc): This dataset contains 118,081 short video clips extracted from 202 movies. Each video has a caption, either extracted from the movie script or from transcribed DVS (descriptive video services) for the visually impaired. The validation set contains 7,408 clips, and evaluation is performed on a test set of 1,000 videos from movies disjoint from the training and validation sets.

## Benchmarks

- [Text-To-Video Benchmarks](https://paperswithcode.com/task/video-retrieval).
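
Text-to-video retrieval leaderboards such as the one linked above typically report Recall@K (R@1, R@5, R@10): the fraction of text queries whose correct video appears among the top K results. The following is a minimal sketch of that metric, assuming one ground-truth video per caption with matching indices. The names `recall_at_k` and `toy_scores` are hypothetical and do not come from any of the repositories listed here, and the scores are made up for illustration.

```python
from typing import Dict, List, Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Text-to-video Recall@K for a square similarity matrix.

    similarity[i][j] is the score between text query i and video j.
    Assumption: the ground-truth video for query i is video i.
    """
    if not similarity:
        raise ValueError("similarity matrix must not be empty")

    ranks: List[int] = []
    for i, row in enumerate(similarity):
        target = row[i]
        # Rank = 1 + number of videos scored strictly higher than the correct one.
        rank = 1 + sum(1 for score in row if score > target)
        ranks.append(rank)

    n = len(ranks)
    # Recall@K = share of queries whose correct video is ranked within the top K.
    return {k: sum(r <= k for r in ranks) / n for k in ks}


# Toy example: 3 captions vs. 3 videos (illustrative scores only).
toy_scores = [
    [0.9, 0.1, 0.3],  # caption 0 ranks its own video first
    [0.8, 0.5, 0.2],  # caption 1 ranks its own video second
    [0.4, 0.6, 0.1],  # caption 2 ranks its own video third
]
result = recall_at_k(toy_scores, ks=(1, 2, 3))
print(result)
```

In practice the similarity matrix would come from a model's text and video embeddings (for example, cosine similarity), and datasets with several captions per video need a caption-to-video index map instead of the diagonal assumption used here.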
