https://github.com/willard-yuan/video-text-retrieval-papers
- Host: GitHub
- URL: https://github.com/willard-yuan/video-text-retrieval-papers
- Owner: willard-yuan
- Created: 2021-04-12T12:39:04.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2021-09-16T12:48:04.000Z (almost 4 years ago)
- Last Synced: 2025-01-12T00:26:35.420Z (5 months ago)
- Size: 12.7 KB
- Stars: 15
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Awesome Text to Video papers
This list collects text-to-video retrieval papers, datasets, and benchmarks from academia and industry.
## Papers
- [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918), arxiv 2021.
- [Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval](https://arxiv.org/abs/2104.00650), arxiv 2021, [code](https://github.com/m-bain/frozen-in-time).
- [MDMMT: Multidomain Multimodal Transformer for Video Retrieval](https://arxiv.org/abs/2103.10699), arxiv 2021.
- [Multi-modal Transformer for Video Retrieval](https://hal.inria.fr/hal-02903209/document), ECCV 2020, [code](https://github.com/gabeur/mmt).
- [VisRel: Media Search at Scale](https://research.fb.com/wp-content/uploads/2021/08/VisRel-Media-Search-at-Scale.pdf), SIGKDD 2021.
- [Que2Search: Fast and Accurate Query and Document Understanding for Search at Facebook](https://research.fb.com/wp-content/uploads/2021/08/Que2Search-Fast-and-Accurate-Query-and-Document-Understanding-for-Search-at-Facebook.pdf), SIGKDD 2021.
- [CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval](https://arxiv.org/abs/2104.08860), [code](https://github.com/ArrowLuo/CLIP4Clip).
## Datasets
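- [Microsoft Research Video Description Corpus (MSVD)](https://paperswithcode.com/dataset/msvd): Also known as the YouTube2Text dataset. It is provided by Microsoft Research and is available from the Microsoft Research Video Description Corpus page. The dataset contains 1,970 YouTube video clips, each 10 to 25 seconds long, and each clip is annotated with roughly 40 English sentences.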
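- [MSR-VTT (Microsoft Research Video to Text)](https://paperswithcode.com/dataset/msr-vtt): This dataset was used for the Microsoft Research - Video to Text (MSR-VTT) Challenge at ACM Multimedia 2016 and is available from the Microsoft Multimedia Challenge page. It contains 10,000 video clips, split into training, validation, and test sets. Each clip is annotated with roughly 20 English sentences. MSR-VTT also provides a category label for each video (20 categories in total); this category information is treated as prior knowledge and is also known for the test set. All videos include audio. The challenge uses four machine-translation evaluation metrics: METEOR, BLEU@1-4, ROUGE-L, and CIDEr.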
- [LSMDC (Large Scale Movie Description Challenge)](https://paperswithcode.com/dataset/lsmdc): This dataset contains 118,081 short video clips extracted from 202 movies. Each video has a caption, either extracted from the movie script or from transcribed DVS (descriptive video services) for the visually impaired. The validation set contains 7,408 clips, and evaluation is performed on a test set of 1,000 videos from movies disjoint from the training and validation sets.
## Benchmarks
- [Text-To-Video Benchmarks](https://paperswithcode.com/task/video-retrieval).