An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with captioning

A curated list of projects in awesome lists tagged with captioning.

https://github.com/facebookresearch/mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

captioning deep-learning dialog hateful-memes multi-tasking multimodal pretrained-models pytorch textvqa vqa

Last synced: 14 May 2025

https://github.com/roboflow/maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

captioning fine-tuning florence-2 multimodal objectdetection paligemma phi-3-vision qwen2-vl transformers vision-and-language vqa

Last synced: 14 May 2025

https://github.com/fpgaminer/joycaption

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training diffusion models.

captioning joycaption vlm

Last synced: 07 Oct 2025

https://github.com/wangleihitcs/medicalreportgeneration

A base TensorFlow project for medical report generation

captioning medical-report-generate tensorflow-models

Last synced: 02 May 2025

https://github.com/amanchadha/iperceive

Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention | Published in IEEE Winter Conference on Applications of Computer Vision (WACV) 2021

attention captioning captioning-videos causality common-sense convolutional-neural-networks dense-captioning distilling-the-knowledge lstm multi-modal python python3 pytorch question-answering reasoning resnets self-attention transformers video videoqa

Last synced: 05 Apr 2025

https://github.com/aimagelab/pacscore

[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

captioning captioning-images captioning-videos computer-vision cvpr cvpr2023 vision-and-language

Last synced: 18 Oct 2025

https://github.com/lucidrains/aoa-pytorch

A PyTorch implementation of the Attention on Attention module (both self and guided variants), for Visual Question Answering

attention attention-mechanism captioning visual-question-answering vqa

Last synced: 13 Dec 2025

https://github.com/davidmchan/caption-by-committee

Using LLMs and pre-trained caption models for super-human performance on image captioning.

ai captioning chatgpt deep-learning image machine-learning python

Last synced: 13 Apr 2025

https://github.com/aimagelab/camel

CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022

artificial-intelligence captioning captioning-images computer-vision image-captioning pytorch

Last synced: 11 Apr 2025

https://github.com/ebu/ebu-tt-live-toolkit

Toolkit for supporting the EBU-TT Live specification

broadcast captioning captions ebu-tt live python subtitles subtitling video

Last synced: 07 May 2025

https://github.com/aimagelab/pma-net

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023

captioning captioning-images iccv2023 image-captioning memory-augmented-neural-networks transformer vision-and-language vision-language

Last synced: 22 Jul 2025

https://github.com/rayandrew/indonesian-image-captioning

Indonesian Image Captioning using Attention-based Semantic Compositional Networks

attention captioning image-captioning indonesia indonesian pytorch resnet

Last synced: 12 Apr 2025

https://github.com/labbeti/aac-metrics

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

audio audio-captioning captioning metrics text

Last synced: 06 May 2025

https://github.com/imkett/zerogen

[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles (PyTorch implementation)

captioning controllable-text-generation decoding gpt2 multimodal nlpcc vision-language zero-shot

Last synced: 13 Apr 2025

https://github.com/naivehobo/smart-i

Smart-I is an Android application aimed at helping the visually impaired using artificial intelligence and cloud computing.

andorid android android-app android-application caption captioning captioning-images captions cloud cloud-computing deep-learning deep-neural-networks image-recognition visualization

Last synced: 07 May 2025

https://github.com/ebu/ebu-tt

A public repository with key information about the EBU Timed Text (EBU-TT) format.

broadcast captioning captions ebu-tt subtitles subtitling video

Last synced: 24 Jan 2026

https://github.com/wangleihitcs/imagecaptions

A base model for image captions.

captioning rnn-model tensorflow

Last synced: 16 May 2026

https://github.com/brayevalerien/recap

An image (re)captioning GUI for preparing datasets for image generation models, made for easy caption editing.

captioning captioning-images image-captioning image-dataset-management tkinter

Last synced: 24 Oct 2025

https://github.com/aavtic/parashu

A video subtitle editor written in Rust.

caption-effects captioning rust-lang

Last synced: 09 Apr 2026

https://github.com/basedrhys/text-od-robustness

Evaluating the robustness of text-conditioned object detection (OD) models such as MDETR

captioning deep-learning image-captioning machine-learning mdetr model object-detection transformers

Last synced: 10 Apr 2025

https://github.com/git-khandelwal/cnn-to-gpt2

Image Captioning using CNNs and Transformers

captioning cnn image transformer

Last synced: 07 Apr 2025

https://github.com/Damkohler/CaptionForge

CaptionForge creates stronger local dataset captions by combining multiple image-caption witnesses, distilling their agreements and contradictions, validating the result with a VLM, and exporting auditable LoRA-ready captions.

audit-trail captioning captioning-images comfyui comfyui-custom-nodes dataset-captions dataset-preparation image-captioning joy-caption joycaption jsonl local-ai lora lora-training multimodal-ai ollama qwen qwen2-5 vision-language-models vlm-validation

Last synced: 21 Jun 2026

https://github.com/petercorke/vtt-clean

Python script to clean VTT files generated by Microsoft Stream

captioning microsoft-stream timecode vtt vtt-subtitles

Last synced: 27 Mar 2025