Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with video-captioning

A curated list of projects in awesome lists tagged with video-captioning.

https://github.com/yehli/xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

cross-modal-retrieval image-captioning pretraining tden video-captioning vision-and-language visual-question-answering

Last synced: 16 Dec 2024

https://github.com/YehLi/xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

cross-modal-retrieval image-captioning pretraining tden video-captioning vision-and-language visual-question-answering

Last synced: 03 Nov 2024

https://github.com/xiadingZ/video-caption.pytorch

PyTorch implementation of video captioning

deep-learning pytorch video-captioning

Last synced: 14 Nov 2024

https://github.com/scopeInfinity/Video2Description

Video to Text: Natural language description generator for a given video. [Video Captioning]

audio-processing cnn-keras deep-neural-networks image-captioning lstm-neural-networks video-captioning video-processing video-to-text

Last synced: 06 Nov 2024

https://github.com/jpthu17/emcl

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

cross-modal-retrieval neurips video-captioning video-question-answering video-retrieval

Last synced: 17 Nov 2024

https://github.com/bytedance/shot2story

A new multi-shot video understanding benchmark, Shot2Story20K, with detailed shot-level captions and comprehensive video summaries.

benchmark dataset large-language-models video-captioning video-language video-language-pretraining video-question-answering video-story video-story-generation video-summarization vision-language

Last synced: 15 Nov 2024

https://github.com/aimagelab/mvad-names-dataset

M-VAD Names Dataset. Multimedia Tools and Applications (2019)

captioning-videos mvad-names-dataset video-captioning

Last synced: 07 Nov 2024

https://github.com/ai-forever/aggme

Aggregation framework for annotating datasets in computer vision tasks (detection, segmentation, video captioning, etc.)

aggregation-pipleline annotation-tool computer-vision crowdsourcing image-segmentation object-detection video-captioning

Last synced: 16 Nov 2024

https://github.com/akagawatsurunaki/zerolan-core

ZerolanCore aims to integrate a range of open-source, locally deployable AI models, such as large language models (LLM), automatic speech recognition (ASR), text-to-speech (TTS), image captioning, optical character recognition (OCR), video captioning, etc.

anaconda asr cv docker image-captioning llm nlp ocr python tts video-captioning

Last synced: 14 Dec 2024