Awesome-Video-Captioning
A curated list of research papers in Video Captioning
https://github.com/tgc1997/Awesome-Video-Captioning
-
2015
- [code](https://github.com/tsenghungchen/SA-tensorflow)
- Translating Videos to Natural Language Using Deep Recurrent Neural Networks
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description
- Describing Videos by Exploiting Temporal Structure
- Sequence to Sequence – Video to Text
-
2017
- Video Captioning with Transferred Semantic Attributes
- Semantic Compositional Networks for Visual Captioning
- StyleNet: Generating Attractive Visual Captions with Styles
- End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering
- Top-down Visual Saliency Guided by Captions
- Hierarchical Boundary-Aware Neural Encoder for Video Captioning
- Task-Driven Dynamic Fusion: Reducing Ambiguity in Video Description
- Supervising Neural Attention Models for Video Captioning by Human Gaze Data
- Attention-Based Multimodal Fusion for Video Description
- Multi-Task Video Captioning with Video and Entailment Generation
- MAM-RNN: Multi-level Attention Model Based RNN for Video Captioning
- Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
- Weakly Supervised Dense Video Captioning
-
2019
- Video Description: A Survey of Methods, Datasets and Evaluation Metrics
- Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
- Memory-Attended Recurrent Network for Video Captioning
- Watch It Twice: Video Captioning with a Refocused Video Encoder
- Motion Guided Spatial Attention for Video Captioning
- Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
- Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning
- Video Interactive Captioning with Human Prompts
-
2018
- M3: Multimodal Memory Modelling for Video Captioning
- Study of Video Captioning Problem
- Interpretable Video Captioning via Trajectory Structured Localization
- Reconstruction Network for Video Captioning
- Less Is More: Picking Informative Frames for Video Captioning
- ECO: Efficient Convolutional Network for Online Video Understanding
- SibNet: Sibling Convolutional Encoder for Video Captioning
- Video Captioning with Tube Features
-
2016
- Jointly Modeling Embedding and Translation to Bridge Video and Language
- Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
- MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
- Video Description using Bidirectional Recurrent Neural Networks
- Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning
-
2020
- Spatio-Temporal Graph for Video Captioning with Knowledge Distillation
- Syntax-Aware Action Targeting for Video Captioning
- Object Relational Graph with Teacher-Recommended Learning for Video Captioning
- Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
- Learning to Discretely Compose Reasoning Module Networks for Video Captioning
- SBAT: Video Captioning with Sparse Boundary-Aware Transformer
- Joint Commonsense and Relation Reasoning for Image and Video Captioning
- Poet: Product-oriented Video Captioner for E-commerce
-
Dense-Captioning
- End-to-End Dense Video Captioning with Masked Transformer
- Attend and Interact: Higher-Order Object Interactions for Video Understanding
- Jointly Localizing and Describing Events for Dense Video Captioning
- Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
- Move Forward and Tell: A Progressive Generator of Video Descriptions
- An Efficient Framework for Dense Video Captioning
- MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning