Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with visual-question-answering
A curated list of projects in awesome lists tagged with visual-question-answering.
https://github.com/salesforce/blip
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
image-captioning image-text-retrieval vision-and-language-pre-training vision-language vision-language-transformer visual-question-answering visual-reasoning
Last synced: 17 Dec 2024
https://github.com/salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
image-captioning image-text-retrieval vision-and-language-pre-training vision-language vision-language-transformer visual-question-answering visual-reasoning
Last synced: 27 Oct 2024
https://github.com/ofa-sys/ofa
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
chinese image-captioning multimodal pretrained-models pretraining prompt prompt-tuning referring-expression-comprehension text-to-image-synthesis vision-language visual-question-answering
Last synced: 20 Dec 2024
https://github.com/OFA-Sys/OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
chinese image-captioning multimodal pretrained-models pretraining prompt prompt-tuning referring-expression-comprehension text-to-image-synthesis vision-language visual-question-answering
Last synced: 03 Nov 2024
https://github.com/peteanderson80/bottom-up-attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
caffe captioning-images faster-rcnn image-captioning mscoco mscoco-dataset visual-question-answering vqa
Last synced: 15 Dec 2024
https://github.com/lucidrains/flamingo-pytorch
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch
artificial-intelligence attention-mechanism deep-learning transformers visual-question-answering
Last synced: 21 Dec 2024
https://github.com/yehli/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
cross-modal-retrieval image-captioning pretraining tden video-captioning vision-and-language visual-question-answering
Last synced: 16 Dec 2024
https://github.com/YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
cross-modal-retrieval image-captioning pretraining tden video-captioning vision-and-language visual-question-answering
Last synced: 03 Nov 2024
https://github.com/jnhwkim/ban-vqa
Bilinear attention networks for visual question answering
attention bilinear-pooling pytorch-implmention visual-question-answering
Last synced: 07 Nov 2024
https://github.com/davidmascharka/tbd-nets
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
deep-learning machine-learning neural-networks pytorch visual-question-answering visualization vqa
Last synced: 17 Dec 2024
https://github.com/MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
computer-vision deep-learning deep-neural-networks evaluation foundation-models large-language-models large-multimodal-models llm llms machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality natural-language-processing question-answering stem visual-question-answering
Last synced: 08 Nov 2024
https://github.com/lupantech/mathvista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
ai4math large-language-models large-multimadality-models machine-learning mathematics mathqa science visual-question-answering
Last synced: 17 Dec 2024
https://github.com/cyanogenoid/pytorch-vqa
Strong baseline for visual question answering
baseline pytorch visual-question-answering vqa
Last synced: 17 Dec 2024
https://github.com/lupantech/MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
ai4math large-language-models large-multimadality-models machine-learning mathematics mathqa science visual-question-answering
Last synced: 27 Oct 2024
https://github.com/markdtw/vqa-winner-cvprw-2017
PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17
pytorch visual-question-answering
Last synced: 13 Nov 2024
https://github.com/qiantianwen/NuScenes-QA
[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
autonomous-driving vision-language visual-question-answering
Last synced: 28 Oct 2024
https://github.com/zhegan27/VILLA
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part
adversarial-training neurips-2020 pretraining vision-and-language visual-question-answering
Last synced: 28 Nov 2024
https://github.com/rentainhe/trar-vqa
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
attention clevr dynamic-network iccv2021 local-and-global multi-modal multi-modal-learning multi-modality multi-scale-features official pytorch transformer vision-and-language visual-question-answering visualization vqav2
Last synced: 07 Nov 2024
https://github.com/China-UK-ZSL/ZS-F-VQA
[Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph
commonsense commonsense-reasoning fvqa knowledge-graph visual-question-answering vqa zero-shot zs-f-vqa zsl
Last synced: 28 Nov 2024
https://github.com/showlab/lova3
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
benchmark data-asse multimodal-deep-learning multimodal-large-language-models visual-question-answering visual-question-generation
Last synced: 17 Nov 2024
https://github.com/ai-forever/fusion_brain_aij2021
Creating multimodal multitask models
bilingual handwritten-text-recognition java-to-python multimodal-fusion multitask visual-question-answering zero-shot-object-detection
Last synced: 16 Nov 2024
https://github.com/mapluisch/llava-cli-with-multiple-images
LLaVA inference with multiple images at once for cross-image analysis.
image-concatenation image-processing inference llama2 llama2-13b llava lmm lmms pillow python python3 pytorch visual-question-answering vqa
Last synced: 13 Nov 2024
https://github.com/lupantech/dual-mfa-vqa
Co-attending Regions and Detections for VQA.
aaai attention-mechanism caffe faster-rcnn multi-gpu multi-modal object-detection torch visual-question-answering vqa
Last synced: 05 Nov 2024
https://github.com/lucidrains/aoa-pytorch
A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering
attention attention-mechanism captioning visual-question-answering vqa
Last synced: 22 Oct 2024
https://github.com/cloud-cv/vilbert-multi-task
12-in-1: Multi-Task Vision and Language Representation Learning Web Demo
channels cnn deep-learning javascript machine-learning postgresql python3 rabbitmq redis visual-question-answering web-sockets
Last synced: 09 Nov 2024
https://github.com/junweiliang/fvta_memexqa
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
memex-question-answering memexqa-dataset multimodal-datasets multimodal-deep-learning multimodal-representation vision-and-language visual-question-answering
Last synced: 08 Nov 2024
https://github.com/adrianbzg/llama-multimodal-vqa
Multimodal Instruction Tuning for Llama 3
chatbot chatgpt gpt-4 huggingface instruction-tuning language-models llama llama2 llama3 multimodal multimodal-instruction-tuning visual-language-learning visual-question-answering vqa
Last synced: 10 Oct 2024
https://github.com/vzhou842/easy-vqa
The Easy Visual Question Answering dataset.
dataset easy-vqa visual-question-answering vqa vqa-dataset
Last synced: 30 Oct 2024
https://github.com/cdancette/detect-shortcuts
Repo for ICCV 2021 paper: Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
biases deep-learning visual-question-answering
Last synced: 07 Nov 2024
https://github.com/abachaa/VQA-Med-2021
VQA-Med 2021
medical-imaging radiology visual-question-answering visual-question-generation vqa vqa-dataset vqa-med
Last synced: 04 Nov 2024
https://github.com/sominw/vqamd_floyd
Visual Question Answering through modal dialogue + API
deep-learning deep-neural-networks floydhub machine-learning visual-question-answering
Last synced: 13 Nov 2024
https://github.com/mbzuai-oryx/camel-bench
CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.
arabic benchmark large-multimodal-models mbzuai multimodal-learning visual-question-answering vqa
Last synced: 12 Nov 2024
https://github.com/ailln/vqa-roadmap
Visual Question Answering Roadmap.
roadmap visual-question-answering vqa
Last synced: 18 Nov 2024
https://github.com/dinhanhx/visualroberta
The first public Vietnamese visual linguistic foundation model(s)
image-captioning image-text python python-3 python3 vietnamese-nlp visual-linguistic visual-question-answering
Last synced: 30 Nov 2024
https://github.com/fork123aniket/graph-neural-network-based-visual-question-answering
Implementation of GNNs for Visual Question Answering task in PyTorch
computer-vision encoder-decoder-architecture encoder-decoder-model graph-neural-networks natural-language-processing pytorch pytorch-geometric pytorch-implementation seq2seq-model visual-question-answering
Last synced: 15 Nov 2024
https://github.com/chen0040/mxnet-vqa
Yet Another Visual Question Answering in MXNet
image-encoding mxnet text-encoding visual-question-answering vqa
Last synced: 16 Dec 2024
https://github.com/amirshnll/persian-visual-question-answering
Visual Question Answering in Persian based on deep learning techniques (paper code)
deep-learning persian persian-vqa resnext resnext-101 visual-question-answering vqa
Last synced: 22 Nov 2024
https://github.com/letsdoitbycode/vixual-ai-suite
The Visual AI Suite is a comprehensive toolkit designed to deliver cutting-edge AI functionalities for processing and analyzing visual data combined with natural language tasks. The suite integrates three powerful models: Image Description, Question Answering, and Visual Question Answering.
artificial-intelligence bert-models machine-learning natural-language-processing visual-question-answering
Last synced: 15 Nov 2024
https://github.com/dinhanhx/vl-datasets
Some Python scripts to load Vietnamese visual linguistic data
image-captioning image-text python python-3 python3 vietnamese vietnamese-nlp visual-linguistic visual-question-answering
Last synced: 30 Nov 2024
https://github.com/simonesartoni/anndl-visual-questioning
Third project for the course "Artificial Neural Networks and Deep Learning", taken during the Master's degree at Polimi, on building a neural network for the visual question answering problem using the VQA dataset. Authors: Simone Sartoni, Mattia Surricchio
artificial-neural-networks ipynb-jupyter-notebook visual-question-answering
Last synced: 13 Nov 2024
https://github.com/nagababumo/open-source-models-with-hugging-face
asr audio-detection audio-processing automatic-speech-recognition blip clip huggingface huggingface-spaces huggingface-transformers image-captioning image-classification image-retrieval multi-modality object-detection open-source segementation sentence-embeddings transformers visual-question-answering zero-shot-learning
Last synced: 14 Nov 2024
https://github.com/0xnu/tiny_llm_trainer
The experiment implements a tiny language model trainer using PyTorch.
large-language-model large-language-models llm llm-training pytorch text-generation text-to-speech tts visual-question-answering vqa wiki wikipedia
Last synced: 15 Dec 2024
https://github.com/atharva-naik/mmml-termproject-vizwiz-vqa-challenge
VizWiz Challenge Term Project for Multi Modal Machine Learning @ CMU (11777)
carnegie-mellon-university computer-vision image-processing natural-language-processing open-source open-source-project opencv pytorch question-answering term-project vision-language vision-language-transformer visual-question-answering vizwiz vizwiz-vqa
Last synced: 10 Dec 2024
https://github.com/reshalfahsi/vqa-clip-lstm
Visual Question Answering Using CLIP + LSTM
clip lstm nlp pytorch pytorch-lightning visual-question-answering vizwiz-vqa vqa
Last synced: 15 Nov 2024
https://github.com/kritiksoman/relation-network
IPython Notebook showing pytorch implementation of Google DeepMind paper on Relation Network
ipynb-jupyter-notebook neural-networks nips-2017 pytorch-implementation visual-question-answering
Last synced: 05 Nov 2024