https://github.com/fork123aniket/graph-neural-network-based-visual-question-answering

Implementation of GNNs for Visual Question Answering task in PyTorch
https://github.com/fork123aniket/graph-neural-network-based-visual-question-answering

computer-vision encoder-decoder-architecture encoder-decoder-model graph-neural-networks natural-language-processing pytorch pytorch-geometric pytorch-implementation seq2seq-model visual-question-answering

Last synced: 6 months ago
JSON representation

Implementation of GNNs for Visual Question Answering task in PyTorch

Host: GitHub
URL: https://github.com/fork123aniket/graph-neural-network-based-visual-question-answering
Owner: fork123aniket
License: mit
Created: 2022-11-04T07:44:49.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-04-30T17:57:15.000Z (about 2 years ago)
Last Synced: 2024-11-15T18:09:59.843Z (8 months ago)
Topics: computer-vision, encoder-decoder-architecture, encoder-decoder-model, graph-neural-networks, natural-language-processing, pytorch, pytorch-geometric, pytorch-implementation, seq2seq-model, visual-question-answering
Language: Python
Homepage:
Size: 865 KB
Stars: 3
Watchers: 2
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Generating Multimodal Representations for VQA using Graph Neural Networks

This repository aims at providing a GNN-based implementation to reason over both input modalities and improve performance on a VQA dataset. Here, the model receives an image `im` and a text-based question `t` and outputs the answer to the question `t`. The following approach is being followed by the GNN model for the Visual Question Answering task:-

- Processing our input question into a graph G_t and image into a graph G_im using the Graph Parser.
- Passing the text graph G_t and image graph G_im into a graph neural network (GNN) to get the text and image node embeddings.
- Combining the embeddings using the Graph Matcher, which projects the text embeddings into the image embedding space and returns the combined multimodal representation of the input.
- Passing the joint representation through a sequence to sequence model to output the answer to the question.

## Requirements

- `PyTorch`
- `PyTorch Geometric`
- `Numpy`
- `rsmlkit`

## Usage

- This implementations trains the GNN model on the ***CLEVR*** dataset (a diagnostic dataset of `3D` shapes that tests visual and linguistic reasoning), which can be downloaded from [***here***](https://cs.stanford.edu/people/jcjohns/clevr/).
- To pre-process and prepare the dataset for training, run `Dataset.py`
- To see the GNN-based model implementation, check `Model.py`
- `Match.py` is responsible for matching nodes locally via a graph neural network and then updating correspondence scores iteratively
- To see RNN-based Encoder-Decoder implementation and how it interacts with GNN model, check `Encoder_Decoder.py`
- `Parser.py` is responsible for instantiating the models as well as taking care of loading and saving of model checkpoints
- To train the whole model pipeline, run `Train.py`

## Results

The predicted answer to each question alongside its corresponsing image can be seen in the following attached output images:-

| Image | Question | GNN-Generated Answer|
| ----- |:--------:|:-------------------:|
| ![alt text](https://github.com/fork123aniket/Graph-Neural-Network-based-Visual-Question-Answering/blob/main/Images/Shape1.PNG) | There is a tiny matte thing that is left of the big gray cylinder and in front of the cyan metal cylinder; what color is it? | Purple |
| ![alt text](https://github.com/fork123aniket/Graph-Neural-Network-based-Visual-Question-Answering/blob/main/Images/Shape2.PNG) | There is a small blue block; are there any spheres to the left of it? | Yes |

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/fork123aniket/graph-neural-network-based-visual-question-answering

Awesome Lists containing this project

README