Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with video-captioning
A curated list of projects in awesome lists tagged with video-captioning.
https://github.com/yehli/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
cross-modal-retrieval image-captioning pretraining tden video-captioning vision-and-language visual-question-answering
Last synced: 16 Dec 2024
https://github.com/xiadingZ/video-caption.pytorch
PyTorch implementation of video captioning
deep-learning pytorch video-captioning
Last synced: 14 Nov 2024
https://github.com/scopeInfinity/Video2Description
Video to Text: Natural-language description generator for a given video. [Video Captioning]
audio-processing cnn-keras deep-neural-networks image-captioning lstm-neural-networks video-captioning video-processing video-to-text
Last synced: 06 Nov 2024
https://github.com/tomchang25/whisper-auto-transcribe
Auto-transcription tool based on Whisper
asr deep-learning gradio gradio-interface language-model pytorch speech-processing speech-recognition speech-to-text text-to-speech video-captioning voice-activity-detection
Last synced: 20 Nov 2024
https://github.com/jpthu17/emcl
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
cross-modal-retrieval neurips video-captioning video-question-answering video-retrieval
Last synced: 17 Nov 2024
https://github.com/ParitoshParmar/MTL-AQA
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
action-quality-assessment action-recognition c3d captioning dilated-c3d dilated-convolution fine-grained-action-recognition fine-grained-classification lstm mtl-aqa multitask-learning pytorch representation-learning video-captioning video-processing video-understanding
Last synced: 03 Nov 2024
https://github.com/bytedance/shot2story
A new multi-shot video understanding benchmark Shot2Story20K with detailed shot-level captions and comprehensive video summaries.
benchmark dataset large-language-models video-captioning video-language video-language-pretraining video-question-answering video-story video-story-generation video-summarization vision-language
Last synced: 15 Nov 2024
https://github.com/amazon-science/crossmodal-contrastive-learning
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
computer-vision contrastive-learning multi-modality natural-language-processing transformers video video-captioning video-text-retrieval
Last synced: 12 Nov 2024
https://github.com/aimagelab/mvad-names-dataset
M-VAD Names Dataset. Multimedia Tools and Applications (2019)
captioning-videos mvad-names-dataset video-captioning
Last synced: 07 Nov 2024
https://github.com/ai-forever/aggme
Aggregation framework for annotating datasets in computer vision tasks (detection, segmentation, video captioning, etc.)
aggregation-pipleline annotation-tool computer-vision crowdsourcing image-segmentation object-detection video-captioning
Last synced: 16 Nov 2024
https://github.com/fork123aniket/encoder-decoder-based-video-captioning
Implementation of Encoder-Decoder Model for Video Captioning in TensorFlow
encoder-decoder encoder-decoder-model keras-model keras-tensorflow tensorflow video-caption video-captioning
Last synced: 15 Nov 2024
https://github.com/willyfh/msvd-indonesian
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian (Bahasa Indonesia).
bahasa-indonesia deep-learning indonesian-dataset msvd msvd-indonesian multimodal-dataset neural-network video-captioning video-description video-retrieval video-text
Last synced: 21 Dec 2024
https://github.com/akagawatsurunaki/zerolan-core
ZerolanCore integrates many open-source, locally deployable AI models and aims to bring together a range of them, such as large language models (LLMs), automatic speech recognition (ASR), text-to-speech (TTS), image captioning, optical character recognition (OCR), and video captioning.
anaconda asr cv docker image-captioning llm nlp ocr python tts video-captioning
Last synced: 14 Dec 2024