# Awesome-Video-Grounding
A reading list of papers about Video Grounding.


## Table of Contents
* [Temporal Video Grounding Papers](#temporal-video-grounding-papers)
  * [Datasets](#datasets)
  * [2022 Papers](#2022-papers)
  * [2021 Papers](#2021-papers)
  * [2020 Papers](#2020-papers)
  * [2019 Papers](#2019-papers)
  * [2018 Papers](#2018-papers)
  * [2017 Papers](#2017-papers)
* [Spatial-Temporal Video Grounding Papers](#spatial-temporal-video-grounding-papers)
  * [Datasets](#datasets-1)
  * [2022 Papers](#2022-papers-1)
  * [2021 Papers](#2021-papers-1)
  * [2020 Papers](#2020-papers-1)
  * [2019 Papers](#2019-papers-1)
  * [2018 Papers](#2018-papers-1)
  * [2017 Papers](#2017-papers-1)


## Temporal Video Grounding Papers
### Datasets
1. **Charades-STA** [2017][ICCV] TALL: Temporal Activity Localization via Language Query.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Gao_TALL_Temporal_Activity_ICCV_2017_paper.pdf)][[dataset](https://github.com/jiyanggao/TALL)][[Charades](https://prior.allenai.org/projects/charades)]
2. **ActivityNet Captions** [2017][ICCV] Dense-Captioning Events in Videos.[[paper](https://openaccess.thecvf.com/content_ICCV_2017/papers/Krishna_Dense-Captioning_Events_in_ICCV_2017_paper.pdf)][[dataset](https://cs.stanford.edu/people/ranjaykrishna/densevid/)]
3. **DiDeMo** [2017][ICCV] Localizing Moments in Video with Natural Language.[[paper](https://openaccess.thecvf.com/content_ICCV_2017/papers/Hendricks_Localizing_Moments_in_ICCV_2017_paper.pdf)][[dataset](https://github.com/LisaAnne/TemporalLanguageRelease)]
4. **TACoS** [2013][TACL] Grounding Action Descriptions in Videos.[[paper](https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00207/1566623/tacl_a_00207.pdf)][[dataset](http://www.coli.uni-saarland.de/projects/smile/page.php?id=tacos)]
5. **CD** [2021][arXiv] A Closer Look at Temporal Sentence Grounding in Videos: Datasets and Metrics.[[paper](https://arxiv.org/pdf/2101.09028)][[dataset](https://github.com/yytzsy/grounding_changing_distribution)]
6. **CG** [2022][CVPR] Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning.[[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Li_Compositional_Temporal_Grounding_With_Structured_Variational_Cross-Graph_Correspondence_Learning_CVPR_2022_paper.pdf)][[dataset](https://github.com/YYJMJC/Compositional-Temporal-Grounding)]
7. **MAD** [2022][CVPR] MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions.[[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Soldan_MAD_A_Scalable_Dataset_for_Language_Grounding_in_Videos_From_CVPR_2022_paper.pdf)][[dataset](https://github.com/Soldelli/MAD)]

### 2022 Papers
1. [2022][AAAI] Explore Inter-Contrast Between Videos via Composition for Weakly Supervised Temporal Sentence Grounding.[[paper](https://www.aaai.org/AAAI22Papers/AAAI-2108.ChenJ.pdf)]
2. [2022][AAAI] Exploring Motion and Appearance Information for Temporal Sentence Grounding.[[paper](https://www.aaai.org/AAAI22Papers/AAAI-112.LiuD.pdf)]
3. [2022][AAAI] Memory-Guided Semantic Learning Network for Temporal Sentence Grounding.[[paper](https://www.aaai.org/AAAI22Papers/AAAI-111.LiuD.pdf)]
4. [2022][AAAI] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/20163/19922)]
5. [2022][AAAI] Unsupervised Temporal Video Grounding with Deep Semantic Clustering.[[paper](https://www.aaai.org/AAAI22Papers/AAAI-110.LiuD.pdf)]
6. [2022][CVPR] Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning.[[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Li_Compositional_Temporal_Grounding_With_Structured_Variational_Cross-Graph_Correspondence_Learning_CVPR_2022_paper.pdf)][[code](https://github.com/YYJMJC/Compositional-Temporal-Grounding)]
7. [2022][CVPR] MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions.[[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Soldan_MAD_A_Scalable_Dataset_for_Language_Grounding_in_Videos_From_CVPR_2022_paper.pdf)][[code](https://github.com/Soldelli/MAD)]
8. [2022][IJCV] Weakly Supervised Moment Localization with Decoupled Consistent Concept Prediction.[[paper](https://link.springer.com/article/10.1007/s11263-022-01600-0)]
9. [2022][TIP] Video Moment Retrieval with Cross-Modal Neural Architecture Search.[[paper](https://ieeexplore.ieee.org/abstract/document/9677948/)]
10. [2022][TIP] Exploring Language Hierarchy for Video Grounding.[[paper](https://ieeexplore.ieee.org/abstract/document/9817030/)]
11. [2022][TMM] Cross-modal Dynamic Networks for Video Moment Retrieval with Text Query.[[paper](https://ieeexplore.ieee.org/abstract/document/9681153/)]

### 2021 Papers
1. [2021][ACL] Parallel Attention Network with Sequence Matching for Video Grounding.[[paper](https://arxiv.org/pdf/2105.08481)]
2. [2021][ACMMM] AsyNCE: Disentangling False-Positives for Weakly-Supervised Video Grounding.[[paper](https://dl.acm.org/doi/pdf/10.1145/3474085.3481539)]
3. [2021][CVPR] Cascaded Prediction Network via Segment Tree for Temporal Video Grounding.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhao_Cascaded_Prediction_Network_via_Segment_Tree_for_Temporal_Video_Grounding_CVPR_2021_paper.pdf)]
4. [2021][CVPR] Context-aware Biaffine Localizing Network for Temporal Sentence Grounding.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_Context-Aware_Biaffine_Localizing_Network_for_Temporal_Sentence_Grounding_CVPR_2021_paper.pdf)][[code](https://github.com/liudaizong/CBLN)]
5. [2021][CVPR] Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhou_Embracing_Uncertainty_Decoupling_and_De-Bias_for_Robust_Temporal_Grounding_CVPR_2021_paper.pdf)]
6. [2021][CVPR] Interventional Video Grounding with Dual Contrastive Learning.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Nan_Interventional_Video_Grounding_With_Dual_Contrastive_Learning_CVPR_2021_paper.pdf)][[code](https://github.com/nanguoshun/IVG)]
7. [2021][CVPR] Multi-stage Aggregated Transformer Network for Temporal Language Localization in Videos.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_Multi-Stage_Aggregated_Transformer_Network_for_Temporal_Language_Localization_in_Videos_CVPR_2021_paper.pdf)]
8. [2021][CVPR] Structured Multi-Level Interaction Network for Video Moment Localization via Language Query.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_Structured_Multi-Level_Interaction_Network_for_Video_Moment_Localization_via_Language_CVPR_2021_paper.pdf)]
9. [2021][ICCV] Zero-shot Natural Language Video Localization.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Nam_Zero-Shot_Natural_Language_Video_Localization_ICCV_2021_paper.pdf)]
10. [2021][ICCV] Boundary-sensitive Pre-training for Temporal Localization in Videos.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Xu_Boundary-Sensitive_Pre-Training_for_Temporal_Localization_in_Videos_ICCV_2021_paper.pdf)]
11. [2021][ICCV] Support-Set Based Cross-Supervision for Video Grounding.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Ding_Support-Set_Based_Cross-Supervision_for_Video_Grounding_ICCV_2021_paper.pdf)]
12. [2021][ICCVW] VLG-Net: Video-Language Graph Matching Network for Video Grounding.[[paper](https://openaccess.thecvf.com/content/ICCV2021W/CVEU/papers/Soldan_VLG-Net_Video-Language_Graph_Matching_Network_for_Video_Grounding_ICCVW_2021_paper.pdf)][[code](https://github.com/Soldelli/VLG-Net)]
13. [2021][TMM] Weakly Supervised Temporal Adjacent Network for Language Grounding.[[paper](https://arxiv.org/pdf/2106.16136)]
14. [2021][arXiv] A Closer Look at Temporal Sentence Grounding in Videos: Datasets and Metrics.[[paper](https://arxiv.org/pdf/2101.09028)][[code](https://github.com/yytzsy/grounding_changing_distribution)]

### 2020 Papers
1. [2020][AAAI] Weakly-Supervised Video Moment Retrieval via Semantic Completion Network.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/6820/6674)]
2. [2020][AAAI] Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/6924/6778)]
3. [2020][AAAI] Temporally Grounding Language Queries in Videos by Contextual Boundary-Aware Prediction.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/6897/6751)]
4. [2020][AAAI] Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language.[[paper](https://ojs.aaai.org/index.php/AAAI/article/view/6984/6838)][[code](https://github.com/microsoft/2D-TAN)]
5. [2020][ACMMM] Fine-grained Iterative Attention Network for Temporal Language Localization in Videos.[[paper](https://arxiv.org/pdf/2008.02448)]
6. [2020][ECCV] Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos.[[paper](https://arxiv.org/pdf/2007.14164)]
7. [2020][CVPR] Local-Global Video-Text Interactions for Temporal Grounding.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Mun_Local-Global_Video-Text_Interactions_for_Temporal_Grounding_CVPR_2020_paper.pdf)][[code](https://github.com/JonghwanMun/LGI4temporalgrounding)]
8. [2020][CVPR] Dense Regression Network for Video Grounding.[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Zeng_Dense_Regression_Network_for_Video_Grounding_CVPR_2020_paper.pdf)]

### 2019 Papers
1. [2019][AAAI] Localizing Natural Language in Videos.[[paper](https://ojs.aaai.org/index.php/AAAI/article/view/4827/4700)]
2. [2019][AAAI] Multilevel Language and Vision Integration for Text-to-Clip Retrieval.[[paper](https://ojs.aaai.org/index.php/AAAI/article/view/4938/4811)]
3. [2019][AAAI] Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/4854/4727)]
4. [2019][AAAI] Semantic Proposal for Activity Localization in Videos via Sentence Query.[[paper](https://ojs.aaai.org/index.php/AAAI/article/view/4830/4703)]
5. [2019][AAAI] To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/4950/4823)]
6. [2019][CVPR] Language-driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model.[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Language-Driven_Temporal_Activity_Localization_A_Semantic_Matching_Reinforcement_Learning_Model_CVPR_2019_paper.pdf)]
7. [2019][CVPR] MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment.[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_MAN_Moment_Alignment_Network_for_Natural_Language_Moment_Retrieval_via_CVPR_2019_paper.pdf)]
8. [2019][CVPR] Weakly Supervised Video Moment Retrieval From Text Queries.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Mithun_Weakly_Supervised_Video_Moment_Retrieval_From_Text_Queries_CVPR_2019_paper.pdf)]
9. [2019][EMNLP] WSLLN: Weakly Supervised Natural Language Localization Networks.[[paper](https://arxiv.org/pdf/1909.00239)]
10. [2019][NeurIPS] Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos.[[paper](http://papers.neurips.cc/paper/8344-semantic-conditioned-dynamic-modulation-for-temporal-sentence-grounding-in-videos.pdf)]
11. [2019][WACV] MAC: Mining Activity Concepts for Language-based Temporal Localization.[[paper](https://arxiv.org/pdf/1811.08925)]

### 2018 Papers
1. [2018][EMNLP] Localizing Moments in Video with Temporal Language.[[paper](https://arxiv.org/pdf/1809.01337)]
2. [2018][EMNLP] Temporally Grounding Natural Sentence in Video.[[paper](https://www.aclweb.org/anthology/D18-1015.pdf)]
3. [2018][SIGIR] Attentive Moment Retrieval in Videos.[[paper](https://www.researchgate.net/profile/Meng-Liu-67/publication/326141659_Attentive_Moment_Retrieval_in_Videos/links/6052a32f299bf173674e0c03/Attentive-Moment-Retrieval-in-Videos.pdf)]

### 2017 Papers
1. [2017][ICCV] TALL: Temporal Activity Localization via Language Query.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Gao_TALL_Temporal_Activity_ICCV_2017_paper.pdf)]
2. [2017][ICCV] Dense-Captioning Events in Videos.[[paper](https://openaccess.thecvf.com/content_ICCV_2017/papers/Krishna_Dense-Captioning_Events_in_ICCV_2017_paper.pdf)]
3. [2017][ICCV] Localizing Moments in Video with Natural Language.[[paper](https://openaccess.thecvf.com/content_ICCV_2017/papers/Hendricks_Localizing_Moments_in_ICCV_2017_paper.pdf)]


## Spatial-Temporal Video Grounding Papers
### Datasets
TODO

### 2022 Papers
1. [2022][AAAI] End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding.[[paper]()]

### 2021 Papers
1. [2021][CVPR] Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Song_Co-Grounding_Networks_With_Semantic_Attention_for_Referring_Expression_Comprehension_in_CVPR_2021_paper.pdf)]
2. [2021][ICCV] STVGBert: A Visual-linguistic Transformer based Framework for Spatio-temporal Video Grounding.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Su_STVGBert_A_Visual-Linguistic_Transformer_Based_Framework_for_Spatio-Temporal_Video_Grounding_ICCV_2021_paper.pdf)]

### 2020 Papers
TODO

### 2019 Papers
TODO

### 2018 Papers
TODO

### 2017 Papers
TODO