Projects in Awesome Lists tagged with video-language
A curated list of projects in awesome lists tagged with video-language.
https://github.com/showlab/VLog
[CVPR 2025] Video Narration as Vocabulary & Video as Long Document
chatgpt langchain large-language-model video-language vocabulary whisper
Last synced: 07 Apr 2025
https://github.com/microsoft/univl
The official implementation of "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
alignment caption caption-task coin joint localization msrvtt multimodal-sentiment-analysis multimodality pretrain pretraining retrieval-task segmentation video video-language video-text video-text-retrieval youcookii
Last synced: 05 Apr 2025
https://github.com/showlab/univtg
[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding
highlight-detection moment-retrieval pretraining video-grounding video-language video-summarization
Last synced: 05 Apr 2025
https://github.com/showlab/all-in-one
[CVPR 2023] All in One: Exploring Unified Video-Language Pre-training
codebase pre-training pytorch video-language
Last synced: 09 Apr 2025
https://github.com/showlab/egovlp
[NeurIPS 2022] Egocentric Video-Language Pretraining
egocentric-vision pretraining pytorch video-language
Last synced: 09 Apr 2025
https://github.com/junchen14/multi-modal-transformer
This repository collects various multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these related domains.
efficiency-transformer image-transformer language mlp-mixer multi-modal multi-modal-cvpr2021 transformer-readling-list video-language video-transformer vision-transformer
Last synced: 13 Apr 2025
https://github.com/salesforce/alpro
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
prompt-learning representation-learning video-language video-question-answering video-text-retrieval vision-and-language
Last synced: 19 Dec 2024
https://github.com/bytedance/shot2story
Shot2Story20K, a new multi-shot video understanding benchmark with detailed shot-level captions and comprehensive video summaries.
benchmark dataset large-language-models video-captioning video-language video-language-pretraining video-question-answering video-story video-story-generation video-summarization vision-language
Last synced: 13 Apr 2025
https://github.com/showlab/region_learner
The PyTorch implementation of "Video-Text Pre-training with Learned Regions"
Last synced: 22 Apr 2025
https://github.com/showlab/videogui
[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Last synced: 22 Apr 2025
https://github.com/zinengtang/perceiver_vl
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
efficiency retrieval scalability video-language vision-and-language
Last synced: 10 Apr 2025
https://github.com/zinengtang/decembert
PyTorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
video video-language video-language-understanding vision-language
Last synced: 10 Apr 2025