# Awesome-Video-Captioning
A curated list of research papers in video captioning (from 2015 to 2020), with links to code and project websites where available.

# Contents
- [2015](#2015)
- [2016](#2016)
- [2017](#2017)
- [2018](#2018)
- [2019](#2019)
- [2020](#2020)
- [Dense Captioning](#Dense-Captioning)
- [Grounded Captioning](#Grounded-Captioning)

# Paper List
## 2015
1. **LSTM-P**: [Translating Videos to Natural Language Using Deep Recurrent Neural Networks](https://www.cs.utexas.edu/users/ml/papers/venugopalan.naacl15.pdf)
*Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko*
NAACL, 2015. [[caffe-code]](https://gist.github.com/vsubhashini/3761b9ad43f60db9ac3d)
2. **LRCN**: [Long-term Recurrent Convolutional Networks for Visual Recognition and Description](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Donahue_Long-Term_Recurrent_Convolutional_2015_CVPR_paper.pdf)
*Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell*
CVPR, 2015. [[website]](http://jeffdonahue.com/lrcn/)
3. **S2VT**: [Sequence to Sequence – Video to Text](https://www.cs.utexas.edu/users/ml/papers/venugopalan.iccv15.pdf)
*Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko*
ICCV, 2015. [[caffe-code]](https://gist.github.com/vsubhashini/38d087e140854fee4b14)
4. **SA**: [Describing Videos by Exploiting Temporal Structure](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Yao_Describing_Videos_by_ICCV_2015_paper.pdf)
*Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville*
ICCV, 2015. [[theano-code]](https://github.com/yaoli/arctic-capgen-vid) [[tf-code]](https://github.com/tsenghungchen/SA-tensorflow)
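All four 2015 entries above share one recipe: pre-extracted CNN frame features condition a recurrent decoder that emits the caption word by word. As orientation only (this is none of the authors' released code; every class name and dimension below is an illustrative assumption), a mean-pooling variant of that encoder-decoder can be sketched in PyTorch:

```python
import torch
import torch.nn as nn

class MeanPoolCaptioner(nn.Module):
    """Hypothetical mean-pool baseline in the spirit of the 2015 entries; sizes assumed."""
    def __init__(self, feat_dim=2048, embed_dim=512, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.visual_proj = nn.Linear(feat_dim, embed_dim)  # pooled video feature -> word space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)       # per-step word logits

    def forward(self, frame_feats, captions):
        # frame_feats: (B, T, feat_dim) CNN features; captions: (B, L) token ids
        video = self.visual_proj(frame_feats.mean(dim=1))  # temporal mean pooling
        words = self.embed(captions)
        # Feed the video vector as a first pseudo-word, then the caption prefix
        inputs = torch.cat([video.unsqueeze(1), words], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                            # (B, L+1, vocab_size)

# e.g. logits = MeanPoolCaptioner()(torch.randn(2, 26, 2048), torch.randint(0, 10000, (2, 12)))
```

**S2VT** instead feeds the frames through the LSTM one at a time, and **SA** replaces the mean pool with temporal attention (see the sketch after the 2017 section).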
## 2016
1. **LSTM-E**: [Jointly Modeling Embedding and Translation to Bridge Video and Language](http://openaccess.thecvf.com/content_cvpr_2016/papers/Pan_Jointly_Modeling_Embedding_CVPR_2016_paper.pdf)
*Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui*
CVPR, 2016.
2. **HRNE**: [Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning](http://zhongwen.ai/pdf/HRNE.pdf)
*Pingbo Pan, Zhongwen Xu, Yi Yang, Fei Wu, Yueting Zhuang*
CVPR, 2016.
3. **h-RNN**: [Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks](https://arxiv.org/pdf/1510.07712)
*Haonan Yu, Jiang Wang, Zhiheng Huang, Yi Yang, Wei Xu*
CVPR, 2016.
4. **MSR-VTT**: [MSR-VTT: A Large Video Description Dataset for Bridging Video and Language](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/cvpr16.msr-vtt.tmei_-1.pdf)
*Jun Xu, Tao Mei, Ting Yao and Yong Rui*
CVPR, 2016. [[website]](https://www.microsoft.com/en-us/research/publication/msr-vtt-a-large-video-description-dataset-for-bridging-video-and-language/)
5. **BiLSTM**: [Video Description using Bidirectional Recurrent Neural Networks](https://arxiv.org/pdf/1604.03390)
*Álvaro Peris, Marc Bolaños, Petia Radeva, Francisco Casacuberta*
ICANN, 2016.

## 2017
1. **DenseVidCap**: [Weakly Supervised Dense Video Captioning](https://arxiv.org/pdf/1704.01502)
*Zhiqiang Shen, Jianguo Li, Zhou Su, Minjun Li, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue*
CVPR, 2017. [[tf-code]](https://github.com/SCLinDennis/Weakly-Supervised-Dense-Video-Captioning)
2. **LSTM-TSA**: [Video Captioning with Transferred Semantic Attributes](https://arxiv.org/pdf/1611.07675)
*Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei*
CVPR, 2017.
3. **SCN**: [Semantic Compositional Networks for Visual Captioning](https://arxiv.org/pdf/1611.08002)
*Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng*
CVPR, 2017. [[theano-code]](https://github.com/zhegan27/Semantic_Compositional_Nets)
4. **StyleNet**: [StyleNet: Generating Attractive Visual Captions with Styles](http://openaccess.thecvf.com/content_cvpr_2017/papers/Gan_StyleNet_Generating_Attractive_CVPR_2017_paper.pdf)
*Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, Li Deng*
CVPR, 2017. [[pytorch-code]](https://github.com/kacky24/stylenet)
5. **CT-SAN**: [End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering](https://zpascal.net/cvpr2017/Yu_End-To-End_Concept_Word_CVPR_2017_paper.pdf)
*Youngjae Yu, Hyungjin Ko, Jongwook Choi, Gunhee Kim*
CVPR, 2017. [[tf-code]](https://gitlab.com/fodrh1201/CT-SAN/tree/master)
6. **CGVS**: [Top-down Visual Saliency Guided by Captions](http://zpascal.net/cvpr2017/Ramanishka_Top-Down_Visual_Saliency_CVPR_2017_paper.pdf)
*Vasili Ramanishka, Abir Das, Jianming Zhang, Kate Saenko*
CVPR, 2017. [[tf-code]](https://github.com/VisionLearningGroup/caption-guided-saliency)
7. **HBA**: [Hierarchical Boundary-Aware Neural Encoder for Video Captioning](http://openaccess.thecvf.com/content_cvpr_2017/papers/Baraldi_Hierarchical_Boundary-Aware_Neural_CVPR_2017_paper.pdf)
*Lorenzo Baraldi, Costantino Grana, Rita Cucchiara*
CVPR, 2017. [[pytorch-code]](https://github.com/Yugnaynehc/banet)
8. **TDDF**: [Task-Driven Dynamic Fusion: Reducing Ambiguity in Video Description](https://www.zpascal.net/cvpr2017/Zhang_Task-Driven_Dynamic_Fusion_CVPR_2017_paper.pdf)
*Xishan Zhang, Ke Gao, Yongdong Zhang, Dongming Zhang, Jintao Li, and Qi Tian*
CVPR, 2017.
9. **GEAN**: [Supervising Neural Attention Models for Video Captioning by Human Gaze Data](http://zpascal.net/cvpr2017/Yu_Supervising_Neural_Attention_CVPR_2017_paper.pdf)
*Youngjae Yu, Jongwook Choi, Yeonhwa Kim, Kyung Yoo, Sang-Hun Lee, Gunhee Kim*
CVPR, 2017. [[tf-code]](https://github.com/yj-yu/Recurrent_Gaze_Prediction)
10. **MM-Att**: [Attention-Based Multimodal Fusion for Video Description](http://openaccess.thecvf.com/content_ICCV_2017/papers/Hori_Attention-Based_Multimodal_Fusion_ICCV_2017_paper.pdf)
*Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. Hershey, Tim K. Marks*
ICCV, 2017.
11. **Tessellation**: [Temporal Tessellation: A Unified Approach for Video Analysis](http://openaccess.thecvf.com/content_ICCV_2017/papers/Kaufman_Temporal_Tessellation_A_ICCV_2017_paper.pdf)
*Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf*
ICCV, 2017. [[tf-code]](https://github.com/dot27/temporal-tessellation)
12. **MTEG**: [Multi-Task Video Captioning with Video and Entailment Generation](https://arxiv.org/pdf/1704.07489)
*Ramakanth Pasunuru, Mohit Bansal*
ACL, 2017.
13. **MAM-RNN**: [MAM-RNN: Multi-level Attention Model Based RNN for Video Captioning](https://www.ijcai.org/proceedings/2017/0307.pdf)
*Xuelong Li, Bin Zhao, Xiaoqiang Lu*
IJCAI, 2017.
14. **hLSTMat**: [Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning](https://www.ijcai.org/proceedings/2017/0381.pdf)
*Jingkuan Song, Lianli Gao, Zhao Guo, Wu Liu, Dongxiang Zhang, Heng Tao Shen*
IJCAI, 2017. [[theano-code]](https://github.com/zhaoluffy/hLSTMat)
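Several of the 2017 entries (**MAM-RNN**, **hLSTMat**) refine the soft temporal attention that **SA** introduced in 2015: at each decoding step the decoder state scores every frame and the next word is conditioned on the weighted sum. A minimal sketch under assumed feature shapes, without any paper's specific gating or multi-level additions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAttention(nn.Module):
    """Hypothetical Bahdanau-style attention over frame features; dims are assumptions."""
    def __init__(self, feat_dim=2048, hidden_dim=512, attn_dim=256):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, attn_dim)      # per-frame score term
        self.w_hidden = nn.Linear(hidden_dim, attn_dim)  # decoder-state score term
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, frame_feats, dec_hidden):
        # frame_feats: (B, T, feat_dim); dec_hidden: (B, hidden_dim)
        scores = self.v(torch.tanh(
            self.w_feat(frame_feats) + self.w_hidden(dec_hidden).unsqueeze(1)
        )).squeeze(-1)                       # (B, T) relevance of each frame
        alpha = F.softmax(scores, dim=1)     # attention weights sum to 1 over time
        context = (alpha.unsqueeze(-1) * frame_feats).sum(dim=1)  # (B, feat_dim)
        return context, alpha                # context conditions the next word
```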
## 2018
1. **Survey**: [Study of Video Captioning Problem](https://www.cs.princeton.edu/courses/archive/spring18/cos598B/public/projects/LiteratureReview/COS598B_spr2018_VideoCaptioning.pdf)
*Jiaqi Su*
COS 598B, 2018.
2. [Fine-grained Video Captioning for Sports Narrative](http://openaccess.thecvf.com/content_cvpr_2018/papers/Yu_Fine-Grained_Video_Captioning_CVPR_2018_paper.pdf)
*Huanyu Yu, Shuo Cheng, Bingbing Ni, Minsi Wang, Jian Zhang, Xiaokang Yang*
CVPR, 2018.
3. **TSA-ED**: [Interpretable Video Captioning via Trajectory Structured Localization](http://openaccess.thecvf.com/content_cvpr_2018/papers/Wu_Interpretable_Video_Captioning_CVPR_2018_paper.pdf)
*Xian Wu, Guanbin Li, Qingxing Cao, Qingge Ji, Liang Lin*
CVPR, 2018.
4. **RecNet**: [Reconstruction Network for Video Captioning](https://www.zpascal.net/cvpr2018/Wang_Reconstruction_Network_for_CVPR_2018_paper.pdf)
*Bairui Wang, Lin Ma, Wei Zhang, Wei Liu*
CVPR, 2018. [[pytorch-code]](https://github.com/hobincar/reconstruction-network-for-video-captioning)
5. **M3**: [M3: Multimodal Memory Modelling for Video Captioning](http://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_M3_Multimodal_Memory_CVPR_2018_paper.pdf)
*Junbo Wang, Wei Wang, Yan Huang, Liang Wang, Tieniu Tan*
CVPR, 2018.
6. **PickNet**: [Less Is More: Picking Informative Frames for Video Captioning](https://eccv2018.org/openaccess/content_ECCV_2018/papers/Yangyu_Chen_Less_is_More_ECCV_2018_paper.pdf)
*Yangyu Chen, Shuhui Wang, Weigang Zhang, Qingming Huang*
ECCV, 2018.
7. **ECO-SCN**: [ECO: Efficient Convolutional Network for Online Video Understanding](http://openaccess.thecvf.com/content_ECCV_2018/papers/Mohammadreza_Zolfaghari_ECO_Efficient_Convolutional_ECCV_2018_paper.pdf)
*Mohammadreza Zolfaghari, Kamaljeet Singh, Thomas Brox*
ECCV, 2018. [[caffe-code]](https://github.com/mzolfaghari/ECO-efficient-video-understanding) [[pytorch-code]](https://github.com/zhang-can/ECO-pytorch)
8. **SibNet**: [SibNet: Sibling Convolutional Encoder for Video Captioning](https://cse.buffalo.edu/~jsyuan/papers/2018/SibNet__Sibling_Convolutional_Encoder_for_Video_Captioning.pdf)
*Sheng Liu, Zhou Ren, Junsong Yuan*
ACM MM, 2018.
9. **TubeNet**: [Video Captioning with Tube Features](https://www.ijcai.org/proceedings/2018/0164.pdf)
*Bin Zhao, Xuelong Li, Xiaoqiang Lu*
IJCAI, 2018.
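**RecNet** above trains the captioner with a reconstruction objective alongside the usual word-level cross-entropy: the decoder's hidden states must allow the video features to be rebuilt, penalizing captions that discard visual content. A deliberately simplified sketch of the global-reconstruction idea (the mean-feature target and all shapes are my assumptions, not the paper's exact formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def reconstruction_loss(dec_hidden, frame_feats, reconstructor):
    # dec_hidden: (B, L, hidden_dim) caption-decoder states
    # frame_feats: (B, T, feat_dim) the original video features
    rebuilt = reconstructor(dec_hidden.mean(dim=1))      # (B, feat_dim) rebuilt video vector
    return F.mse_loss(rebuilt, frame_feats.mean(dim=1))  # compare with pooled original

# e.g. total = caption_cross_entropy + 0.2 * reconstruction_loss(
#     torch.randn(2, 12, 512), torch.randn(2, 26, 2048), nn.Linear(512, 2048))
```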
## 2019
1. **Survey**: [Video Description: A Survey of Methods, Datasets and Evaluation Metrics](https://arxiv.org/pdf/1806.00186)
*Nayyer Aafaq, Ajmal Mian, Wei Liu, Syed Zulqarnain Gilani, Mubarak Shah*
ACM Computing Surveys, 2019.
2. **GRU-EVE**: [Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning](https://zpascal.net/cvpr2019/Aafaq_Spatio-Temporal_Dynamics_and_Semantic_Attribute_Enriched_Visual_Encoding_for_Video_CVPR_2019_paper.pdf)
*Nayyer Aafaq, Naveed Akhtar, Wei Liu, Syed Zulqarnain Gilani, Ajmal Mian*
CVPR, 2019.
3. **MARN**: [Memory-Attended Recurrent Network for Video Captioning](https://arxiv.org/pdf/1905.03966)
*Wenjie Pei, Jiyuan Zhang, Xiangrong Wang, Lei Ke, Xiaoyong Shen, Yu-Wing Tai*
CVPR, 2019.
4. **OA-BTG**: [Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_Object-Aware_Aggregation_With_Bidirectional_Temporal_Graph_for_Video_Captioning_CVPR_2019_paper.pdf)
*Junchao Zhang, Yuxin Peng*
CVPR, 2019.
5. **VATEX**: [VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research](http://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_VaTeX_A_Large-Scale_High-Quality_Multilingual_Dataset_for_Video-and-Language_Research_ICCV_2019_paper.pdf)
*Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang*
ICCV, 2019. [[website]](https://vatex.org/main/index.html#)
6. **POS**: [Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning](http://openaccess.thecvf.com/content_ICCV_2019/papers/Hou_Joint_Syntax_Representation_Learning_and_Visual_Cue_Translation_for_Video_ICCV_2019_paper.pdf)
*Jingyi Hou, Xinxiao Wu, Wentian Zhao, Jiebo Luo, Yunde Jia*
ICCV, 2019.
7. **POS-CG**: [Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network](http://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Controllable_Video_Captioning_With_POS_Sequence_Guidance_Based_on_Gated_ICCV_2019_paper.pdf)
*Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu*
ICCV, 2019. [[pytorch-code]](https://github.com/vsislab/Controllable_XGating)
8. **WIT**: [Watch It Twice: Video Captioning with a Refocused Video Encoder](https://arxiv.org/pdf/1907.12905)
*Xiangxi Shi, Jianfei Cai, Shafiq Joty, Jiuxiang Gu*
ACM MM, 2019.
9. **MGSA**: [Motion Guided Spatial Attention for Video Captioning](https://www.aaai.org/ojs/index.php/AAAI/article/view/4829/4702)
*Shaoxiang Chen and Yu-Gang Jiang*
AAAI, 2019.
10. **TDConvED**: [Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning](https://arxiv.org/pdf/1905.01077)
*Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Hongyang Chao, Tao Mei*
AAAI, 2019.
11. **FCVC-CF&IA**: [Fully Convolutional Video Captioning with Coarse-to-Fine and Inherited Attention](https://aaai.org/ojs/index.php/AAAI/article/view/4839)
*Kuncheng Fang, Lian Zhou, Cheng Jin, Yuejie Zhang, Kangnian Weng, Tao Zhang, Weiguo Fan*
AAAI, 2019.
12. **TAMoE**: [Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning](https://arxiv.org/pdf/1811.02765)
*Xin Wang, Jiawei Wu, Da Zhang, Yu Su, William Yang Wang*
AAAI, 2019. [[code]](https://github.com/eric-xw/Zero-Shot-Video-Captioning)
13. **VIC**: [Video Interactive Captioning with Human Prompts](https://www.ijcai.org/proceedings/2019/0135.pdf)
*Aming Wu, Yahong Han and Yi Yang*
IJCAI, 2019. [[code]](https://github.com/ViCap01/ViCap)

## 2020
1. [Spatio-Temporal Graph for Video Captioning with Knowledge Distillation](https://openaccess.thecvf.com/content_CVPR_2020/papers/Pan_Spatio-Temporal_Graph_for_Video_Captioning_With_Knowledge_Distillation_CVPR_2020_paper.pdf)
*Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles*
CVPR, 2020.
2. **SAAT**: [Syntax-Aware Action Targeting for Video Captioning](https://openaccess.thecvf.com/content_CVPR_2020/papers/Zheng_Syntax-Aware_Action_Targeting_for_Video_Captioning_CVPR_2020_paper.pdf)
*Qi Zheng, Chaoyue Wang, Dacheng Tao*
CVPR, 2020. [[pytorch-code]](https://github.com/SydCaption/SAAT)
3. **ORG-TRL**: [Object Relational Graph with Teacher-Recommended Learning for Video Captioning](https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Object_Relational_Graph_With_Teacher-Recommended_Learning_for_Video_Captioning_CVPR_2020_paper.pdf)
*Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zhengjun Zha*
CVPR, 2020.
4. **PMI-CAP**: [Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos](https://arxiv.org/pdf/2007.14164.pdf)
*Shaoxiang Chen, Wenhao Jiang, Wei Liu, Yu-Gang Jiang*
ECCV, 2020. [[pytorch-code]](https://github.com/xuewyang/Fashion_Captioning)
5. **RMN**: [Learning to Discretely Compose Reasoning Module Networks for Video Captioning](https://www.ijcai.org/Proceedings/2020/0104.pdf)
*Ganchao Tan, Daqing Liu, Meng Wang and Zheng-Jun Zha*
IJCAI, 2020. [[pytorch-code]](https://github.com/tgc1997/RMN)
6. **SBAT**: [SBAT: Video Captioning with Sparse Boundary-Aware Transformer](https://www.ijcai.org/Proceedings/2020/0088.pdf)
*Tao Jin, Siyu Huang, Yingming Li, Zhongfei Zhang, Ming Chen*
IJCAI, 2020.
7. [Joint Commonsense and Relation Reasoning for Image and Video Captioning](https://wuxinxiao.github.io/assets/papers/2020/C-R_reasoning.pdf)
*Jingyi Hou, Xinxiao Wu, Xiaoxun Zhang, Yayun Qi, Yunde Jia, Jiebo Luo*
AAAI, 2020.
8. **SMCG**: [Controllable Video Captioning with an Exemplar Sentence](https://dl.acm.org/doi/abs/10.1145/3394171.3413908)
*Yitian Yuan, Lin Ma, Jingwen Wang, Wenwu Zhu*
ACM MM, 2020.
9. **Poet**: [Poet: Product-oriented Video Captioner for E-commerce](https://arxiv.org/pdf/2008.06880.pdf)
*Shengyu Zhang, Ziqi Tan, Jin Yu, Zhou Zhao, Kun Kuang, Jie Liu, Jingren Zhou, Hongxia Yang, Fei Wu*
ACM MM, 2020.
10. [Learning Semantic Concepts and Temporal Alignment for Narrated Video Procedural Captioning](https://dl.acm.org/doi/abs/10.1145/3394171.3413498)
*Botian Shi, Lei Ji, Zhendong Niu, Nan Duan, Ming Zhou, Xilin Chen*
ACM MM, 2020.
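The graph-based 2020 entries (the spatio-temporal graph paper and **ORG-TRL**) encode pairwise relations between detected objects before decoding. As a generic stand-in for such an object relational graph (not either paper's architecture; all names and sizes below are assumptions), scaled dot-product attention over region features gives each object a relation-aware update:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectRelationLayer(nn.Module):
    """Generic relational update over detected-object features; all sizes assumed."""
    def __init__(self, obj_dim=1024):
        super().__init__()
        self.query = nn.Linear(obj_dim, obj_dim)
        self.key = nn.Linear(obj_dim, obj_dim)
        self.value = nn.Linear(obj_dim, obj_dim)

    def forward(self, objects):
        # objects: (B, N, obj_dim) region features from an off-the-shelf detector
        q, k, v = self.query(objects), self.key(objects), self.value(objects)
        # Pairwise weights form a learned, fully connected graph over the N objects
        relation = F.softmax(q @ k.transpose(1, 2) / objects.size(-1) ** 0.5, dim=-1)
        return objects + relation @ v        # residual, relation-aware object update
```

ORG-TRL additionally distills word distributions from an external language model (the "teacher-recommended learning" in its title); that part is orthogonal to the graph and omitted here.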
## Dense-Captioning
1. [Dense-Captioning Events in Videos](http://openaccess.thecvf.com/content_ICCV_2017/papers/Krishna_Dense-Captioning_Events_in_ICCV_2017_paper.pdf)
*Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles*
ICCV, 2017. [[code]](https://github.com/ranjaykrishna/densevid_eval) [[website]](https://cs.stanford.edu/people/ranjaykrishna/densevid/)
2. [End-to-End Dense Video Captioning with Masked Transformer](http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_End-to-End_Dense_Video_CVPR_2018_paper.pdf)
*Luowei Zhou, Yingbo Zhou, Jason J. Corso, Richard Socher, Caiming Xiong*
CVPR, 2018. [[pytorch-code]](https://github.com/salesforce/densecap)
3. [Attend and Interact: Higher-Order Object Interactions for Video Understanding](http://openaccess.thecvf.com/content_cvpr_2018/CameraReady/0330.pdf)
*Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, and Hans Peter Graf*
CVPR, 2018.
4. [Jointly Localizing and Describing Events for Dense Video Captioning](http://openaccess.thecvf.com/content_cvpr_2018/papers/Li_Jointly_Localizing_and_CVPR_2018_paper.pdf)
*Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei*
CVPR, 2018.
5. [Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning](http://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_Bidirectional_Attentive_Fusion_CVPR_2018_paper.pdf)
*Jingwen Wang, Wenhao Jiang, Lin Ma, Wei Liu, Yong Xu*
CVPR, 2018. [[tf-code]](https://github.com/JaywongWang/DenseVideoCaptioning)
6. [Move Forward and Tell: A Progressive Generator of Video Descriptions](http://openaccess.thecvf.com/content_ECCV_2018/papers/Yilei_Xiong_Move_Forward_and_ECCV_2018_paper.pdf)
*Yilei Xiong, Bo Dai, Dahua Lin*
ECCV, 2018.
7. [Adversarial Inference for Multi-sentence Video Description](http://openaccess.thecvf.com/content_CVPR_2019/papers/Park_Adversarial_Inference_for_Multi-Sentence_Video_Description_CVPR_2019_paper.pdf)
*Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach*
CVPR, 2019. [[pytorch-code]](https://github.com/jamespark3922/adv-inf)
8. [Dense Relational Captioning: Triple-stream Networks for Relationship-based Captioning](http://openaccess.thecvf.com/content_CVPR_2019/papers/Kim_Dense_Relational_Captioning_Triple-Stream_Networks_for_Relationship-Based_Captioning_CVPR_2019_paper.pdf)
*Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, In So Kweon*
CVPR, 2019. [[torch-code]](https://github.com/Dong-JinKim/DenseRelationalCaptioning)
9. [Streamlined Dense Video Captioning](http://openaccess.thecvf.com/content_CVPR_2019/papers/Mun_Streamlined_Dense_Video_Captioning_CVPR_2019_paper.pdf)
*Jonghwan Mun, Linjie Yang, Zhou Ren, Ning Xu, Bohyung Han*
CVPR, 2019.
10. [Watch, Listen and Tell: Multi-Modal Weakly Supervised Dense Event Captioning](http://openaccess.thecvf.com/content_ICCV_2019/papers/Rahman_Watch_Listen_and_Tell_Multi-Modal_Weakly_Supervised_Dense_Event_Captioning_ICCV_2019_paper.pdf)
*Tanzila Rahman, Bicheng Xu, Leonid Sigal*
ICCV, 2019.
11. [An Efficient Framework for Dense Video Captioning](https://www.aaai.org/Papers/AAAI/2020GB/AAAI-SuinM.7561.pdf)
*Maitreya Suin, A. N. Rajagopalan*
AAAI, 2020.
12. [MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning](https://arxiv.org/pdf/2005.05402.pdf)
*Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal*
ACL, 2020. [[pytorch-code]](https://github.com/jayleicn/recurrent-transformer)
13. [Identity-Aware Multi-Sentence Video Description](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123660358.pdf)
*Jae Sung Park, Trevor Darrell, Anna Rohrbach*
ECCV, 2020.
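A practical note on this section: dense-captioning evaluation (e.g., the toolkit linked in entry 1) matches predicted event segments to ground-truth segments by temporal IoU before scoring the attached sentences. The helper below implements that overlap measure from its standard definition; it is a sketch, not code taken from `densevid_eval`:

```python
def temporal_iou(pred, gt):
    """pred, gt: (start, end) times in seconds; returns overlap-over-union in [0, 1]."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

# Example: a predicted event at 5-15s vs. a ground-truth event at 10-20s
assert abs(temporal_iou((5.0, 15.0), (10.0, 20.0)) - 1 / 3) < 1e-9
```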
## Grounded-Captioning
1. **GVD**: [Grounded Video Description](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhou_Grounded_Video_Description_CVPR_2019_paper.pdf)
*Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach*
CVPR, 2019. [[pytorch-code]](https://github.com/facebookresearch/grounded-video-description)
2. [Relational Graph Learning for Grounded Video Description Generation](https://dl.acm.org/doi/abs/10.1145/3394171.3413746)
*Wenqiao Zhang, Xin Eric Wang, Siliang Tang, Haizhou Shi, Haochen Shi, Jun Xiao, Yueting Zhuang, William Yang Wang*
ACM MM, 2020.