Projects in Awesome Lists tagged with captioning
A curated list of projects in awesome lists tagged with captioning.
https://github.com/facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
captioning deep-learning dialog hateful-memes multi-tasking multimodal pretrained-models pytorch textvqa vqa
Last synced: 14 May 2025
https://github.com/roboflow/maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
captioning fine-tuning florence-2 multimodal objectdetection paligemma phi-3-vision qwen2-vl transformers vision-and-language vqa
Last synced: 14 May 2025
https://github.com/fpgaminer/joycaption
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
Last synced: 07 Oct 2025
https://github.com/wangleihitcs/medicalreportgeneration
A base TensorFlow project for medical report generation
captioning medical-report-generate tensorflow-models
Last synced: 02 May 2025
https://github.com/amanchadha/iperceive
Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention | Published in IEEE Winter Conference on Applications of Computer Vision (WACV) 2021
attention captioning captioning-videos causality common-sense convolutional-neural-networks dense-captioning distilling-the-knowledge lstm multi-modal python python3 pytorch question-answering reasoning resnets self-attention transformers video videoqa
Last synced: 05 Apr 2025
https://github.com/labbeti/aac-datasets
Audio Captioning datasets for PyTorch.
audio audio-captioning caption captioning dataset datasets deep-learning pytorch
Last synced: 06 Oct 2025
https://github.com/aimagelab/pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
captioning captioning-images captioning-videos computer-vision cvpr cvpr2023 vision-and-language
Last synced: 18 Oct 2025
https://github.com/ParitoshParmar/MTL-AQA
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
action-quality-assessment action-recognition c3d captioning dilated-c3d dilated-convolution fine-grained-action-recognition fine-grained-classification lstm mtl-aqa multitask-learning pytorch representation-learning video-captioning video-processing video-understanding
Last synced: 02 Apr 2025
https://github.com/lucidrains/aoa-pytorch
A PyTorch implementation of the Attention on Attention module (both self and guided variants), for Visual Question Answering
attention attention-mechanism captioning visual-question-answering vqa
Last synced: 13 Dec 2025
https://github.com/davidmchan/caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
ai captioning chatgpt deep-learning image machine-learning python
Last synced: 13 Apr 2025
https://github.com/aimagelab/camel
CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022
artificial-intelligence captioning captioning-images computer-vision image-captioning pytorch
Last synced: 11 Apr 2025
https://github.com/ebu/ebu-tt-live-toolkit
Toolkit for supporting the EBU-TT Live specification
broadcast captioning captions ebu-tt live python subtitles subtitling video
Last synced: 07 May 2025
https://github.com/aimagelab/pma-net
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023
captioning captioning-images iccv2023 image-captioning memory-augmented-neural-networks transformer vision-and-language vision-language
Last synced: 22 Jul 2025
https://github.com/rayandrew/indonesian-image-captioning
Indonesian Image Captioning using Attention-based Semantic Compositional Networks
attention captioning image-captioning indonesia indonesian pytorch resnet
Last synced: 12 Apr 2025
https://github.com/labbeti/aac-metrics
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
audio audio-captioning captioning metrics text
Last synced: 06 May 2025
https://github.com/imkett/zerogen
[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles (PyTorch implementation)
captioning controllable-text-generation decoding gpt2 multimodal nlpcc vision-language zero-shot
Last synced: 13 Apr 2025
https://github.com/nssharmaofficial/reddit-hole
Automated Reddit scraper and video creator
amazon-polly amazon-polly-api automation aws captioning openai openai-whisper reddit reddit-bot reddit-crawler reddit-scraper tts whisper
Last synced: 23 Oct 2025
https://github.com/naivehobo/smart-i
Smart-I is an Android application aimed at helping the visually impaired using artificial intelligence and cloud computing.
andorid android android-app android-application caption captioning captioning-images captions cloud cloud-computing deep-learning deep-neural-networks image-recognition visualization
Last synced: 07 May 2025
https://github.com/samuelbradshaw/text-to-timestamps
Python and command-line utility for aligning audio to a transcript.
batch-processing captioning command-line forced-alignment karaoke machine-learning mlx mps python speech-recognition speech-to-text subtitles transcription webvtt
Last synced: 06 Mar 2026
https://github.com/fofr/cog-batch-image-captioning
Caption images for LoRA training
ai anthropic captioning claude cog gemini openai replicate
Last synced: 10 Apr 2025
https://github.com/cd2bit/awesome-list-of-captioned-courses
Online professional courses that are captioned and/or subtitled
accessibility airtable captioning captioning-videos captions courses online-course subtitles subtitling
Last synced: 23 Jan 2026
https://github.com/ebu/ebu-tt
A public repository with key information about the EBU Timed Text (EBU-TT) format.
broadcast captioning captions ebu-tt subtitles subtitling video
Last synced: 24 Jan 2026
https://github.com/kennethwussmann/caption.now
Quickly and efficiently caption your image dataset for AI training
ai annotation annotations captioning captioning-images data-labeling dataset dataset-generation datasets image-classification image-labeling image-labelling-tool labeling labeling-tool offline-first progressive-web-app pwa
Last synced: 14 Jun 2025
https://github.com/wangleihitcs/imagecaptions
A base model for image captions.
captioning rnn-model tensorflow
Last synced: 16 May 2026
https://github.com/brayevalerien/recap
An image (re)captioning GUI for image generation models dataset preparation, made for easy caption editing.
captioning captioning-images image-captioning image-dataset-management tkinter
Last synced: 24 Oct 2025
https://github.com/aavtic/parashu
A video subtitle editor written in Rust.
caption-effects captioning rust-lang
Last synced: 09 Apr 2026
https://github.com/basedrhys/text-od-robustness
Evaluating the robustness of text-conditioned OD models such as MDETR
captioning deep-learning image-captioning machine-learning mdetr model object-detection transformers
Last synced: 10 Apr 2025
https://github.com/git-khandelwal/cnn-to-gpt2
Image Captioning using CNNs and Transformers
captioning cnn image transformer
Last synced: 07 Apr 2025
https://github.com/Damkohler/CaptionForge
CaptionForge creates stronger local dataset captions by combining multiple image-caption witnesses, distilling their agreements and contradictions, validating the result with a VLM, and exporting auditable LoRA-ready captions.
audit-trail captioning captioning-images comfyui comfyui-custom-nodes dataset-captions dataset-preparation image-captioning joy-caption joycaption jsonl local-ai lora lora-training multimodal-ai ollama qwen qwen2-5 vision-language-models vlm-validation
Last synced: 21 Jun 2026
https://github.com/petercorke/vtt-clean
Python script to clean VTT files generated by Microsoft Stream
captioning microsoft-stream timecode vtt vtt-subtitles
Last synced: 27 Mar 2025
https://github.com/awesome-webdevs/a11y
Tools, checklists, and best practices for building accessible, inclusive web experiences.
a11y-audits a11y-testing a11y-tools accessibility-linters accessible-components alt-text aria assistive-tech captioning cognitive-load color-contrast focus-management inclusive-design keyboard-navigation motion-accessibility readability screen-readers semantic-html voice-ui wcag
Last synced: 04 Apr 2026
https://github.com/ssube/label-prompt-caption
annotations captioning captioning-images dataset llama3 llm vlm
Last synced: 09 May 2026