An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with video-language

A curated list of projects in awesome lists tagged with video-language.

https://github.com/showlab/VLog

[CVPR 2025] Video Narration as Vocabulary & Video as Long Document

chatgpt langchain large-language-model video-language vocabulary whisper

Last synced: 07 Apr 2025

https://github.com/showlab/vlog

[CVPR 2025] Video Narration as Vocabulary & Video as Long Document

chatgpt langchain large-language-model video-language vocabulary whisper

Last synced: 11 Apr 2025

https://github.com/microsoft/univl

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

alignment caption caption-task coin joint localization msrvtt multimodal-sentiment-analysis multimodality pretrain pretraining retrieval-task segmentation video video-language video-text video-text-retrieval youcookii

Last synced: 05 Apr 2025

https://github.com/showlab/univtg

[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding

highlight-detection moment-retrieval pretraining video-grounding video-language video-summarization

Last synced: 05 Apr 2025

https://github.com/showlab/UniVTG

[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding

highlight-detection moment-retrieval pretraining video-grounding video-language video-summarization

Last synced: 29 Nov 2024

https://github.com/showlab/all-in-one

[CVPR2023] All in One: Exploring Unified Video-Language Pre-training

codebase pre-training pytorch video-language

Last synced: 09 Apr 2025

https://github.com/showlab/egovlp

[NeurIPS 2022] Egocentric Video-Language Pretraining

egocentric-vision pretraining pytorch video-language

Last synced: 09 Apr 2025

https://github.com/junchen14/multi-modal-transformer

The repository collects a variety of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects many useful tutorials and tools in these related domains.

efficiency-transformer image-transformer language mlp-mixer multi-modal multi-modal-cvpr2021 transformer-readling-list video-language video-transformer vision-transformer

Last synced: 13 Apr 2025

https://github.com/salesforce/alpro

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

prompt-learning representation-learning video-language video-question-answering video-text-retrieval vision-and-language

Last synced: 19 Dec 2024

https://github.com/bytedance/shot2story

A new multi-shot video understanding benchmark Shot2Story20K with detailed shot-level captions and comprehensive video summaries.

benchmark dataset large-language-models video-captioning video-language video-language-pretraining video-question-answering video-story video-story-generation video-summarization vision-language

Last synced: 13 Apr 2025

https://github.com/showlab/region_learner

The PyTorch implementation for "Video-Text Pre-training with Learned Regions"

video-language

Last synced: 22 Apr 2025

https://github.com/showlab/videogui

[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos

gui llm-agent video-language

Last synced: 22 Apr 2025

https://github.com/zinengtang/perceiver_vl

PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)

efficiency retrieval scalability video-language vision-and-language

Last synced: 10 Apr 2025

https://github.com/zinengtang/decembert

PyTorch version of "DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization" (NAACL 2021)

video video-language video-language-understanding vision-language

Last synced: 10 Apr 2025