Projects in Awesome Lists tagged with video-captioning

https://github.com/yehli/xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

cross-modal-retrieval image-captioning pretraining tden video-captioning vision-and-language visual-question-answering

Last synced: 16 Dec 2024

https://github.com/xiadingZ/video-caption.pytorch

PyTorch implementation of video captioning

deep-learning pytorch video-captioning

Last synced: 14 Nov 2024

https://github.com/scopeInfinity/Video2Description

Video to Text: generates a natural-language description of a given video. [Video Captioning]

audio-processing cnn-keras deep-neural-networks image-captioning lstm-neural-networks video-captioning video-processing video-to-text

Last synced: 06 Nov 2024

https://github.com/tomchang25/whisper-auto-transcribe

Automatic transcription tool based on Whisper

asr deep-learning gradio gradio-interface language-model pytorch speech-processing speech-recognition speech-to-text text-to-speech video-captioning voice-activity-detection

Last synced: 20 Nov 2024

https://github.com/jpthu17/emcl

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

cross-modal-retrieval neurips video-captioning video-question-answering video-retrieval

Last synced: 17 Nov 2024

https://github.com/ParitoshParmar/MTL-AQA

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

action-quality-assessment action-recognition c3d captioning dilated-c3d dilated-convolution fine-grained-action-recognition fine-grained-classification lstm mtl-aqa multitask-learning pytorch representation-learning video-captioning video-processing video-understanding

Last synced: 03 Nov 2024

https://github.com/bytedance/shot2story

A new multi-shot video understanding benchmark Shot2Story20K with detailed shot-level captions and comprehensive video summaries.

benchmark dataset large-language-models video-captioning video-language video-language-pretraining video-question-answering video-story video-story-generation video-summarization vision-language

Last synced: 15 Nov 2024

https://github.com/amazon-science/crossmodal-contrastive-learning

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021

computer-vision contrastive-learning multi-modality natural-language-processing transformers video video-captioning video-text-retrieval

Last synced: 12 Nov 2024

https://github.com/aimagelab/mvad-names-dataset

M-VAD Names Dataset. Multimedia Tools and Applications (2019)

captioning-videos mvad-names-dataset video-captioning

Last synced: 07 Nov 2024

https://github.com/ai-forever/aggme

Aggregation framework for annotating datasets in computer vision tasks (detection, segmentation, video captioning, etc.)

aggregation-pipleline annotation-tool computer-vision crowdsourcing image-segmentation object-detection video-captioning

Last synced: 16 Nov 2024

https://github.com/fork123aniket/encoder-decoder-based-video-captioning

Implementation of an Encoder-Decoder Model for Video Captioning in TensorFlow

encoder-decoder encoder-decoder-model keras-model keras-tensorflow tensorflow video-caption video-captioning

Last synced: 15 Nov 2024

https://github.com/willyfh/msvd-indonesian

MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian (Bahasa Indonesia).

bahasa-indonesia deep-learning indonesian-dataset msvd msvd-indonesian multimodal-dataset neural-network video-captioning video-description video-retrieval video-text

Last synced: 21 Dec 2024

https://github.com/akagawatsurunaki/zerolan-core

ZerolanCore aims to integrate a range of open-source, locally deployable AI models, such as large language models (LLMs), automatic speech recognition (ASR), text-to-speech (TTS), image captioning, optical character recognition (OCR), and video captioning.

anaconda asr cv docker image-captioning llm nlp ocr python tts video-captioning

Last synced: 14 Dec 2024