https://github.com/qinzzz/multimodal-alignment-framework

Implementation for MAF: Multimodal Alignment Framework

Host: GitHub
URL: https://github.com/qinzzz/multimodal-alignment-framework
Owner: qinzzz
Created: 2020-03-16T21:42:38.000Z (about 6 years ago)
Default Branch: public
Last Pushed: 2020-11-25T17:26:37.000Z (over 5 years ago)
Last Synced: 2025-04-06T23:14:07.812Z (about 1 year ago)
Topics: localization, python, pytorch
Language: Python
Homepage:
Size: 291 KB
Stars: 46
Watchers: 0
Forks: 9
Open Issues: 4
Metadata Files:
- Readme: readme.md

# Multimodal Alignment Framework

Implementation of MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding.

Some of our code is based on [ban-vqa](https://github.com/jnhwkim/ban-vqa). Thanks!

**TODO**

Provide a Faster R-CNN feature extraction script.

## Prerequisites

- Python 3.7

- PyTorch 1.4.0

## Data

### Flickr30k Entities

We use the Flickr30k Entities dataset to train and validate our model.

The raw dataset can be found at [Flickr30k Entities Annotations](https://github.com/BryanPlummer/flickr30k_entities/blob/master/annotations.zip).

Run

```
sh tools/prepare_data.sh
```

to download and process the Flickr30k annotations, images, and GloVe word embeddings.

### Object proposals

#### Download object proposals

We use an off-the-shelf [faster-rcnn](https://github.com/jwyang/faster-rcnn.pytorch) pretrained on Visual Genome to generate object proposals and labels.

We use [Bottom-Up Attention](https://github.com/airsplay/py-bottom-up-attention) for visual features.

As [Issue#1](https://github.com/qinzzz/Multimodal-Alignment-Framework/issues/1#issue-727382153) pointed out, the features generated by our script (faster-rcnn) are not fully consistent with those from Bottom-Up Attention. We have therefore uploaded the features we generated.

Download [train_features_compress.hdf5](https://drive.google.com/file/d/1ABnF0SZMf6pOAC89LJXbXZLMW1X86O96/view?usp=sharing) (6GB), [val features_compress.hdf5](https://drive.google.com/file/d/1iK-yz6PHwRuAciRW1vGkg9Bkj-aBE8yJ/view?usp=sharing), and [test features_compress.hdf5](https://drive.google.com/file/d/1pjntkbr20l2MiUBVQLVV6rQNWpXQymFs/view?usp=sharing) to `data/flickr30k`.

Alternative links for train_feature.hdf5 (20GB, same features): [google drive](https://drive.google.com/file/d/1zxghit_mDyIKhZRemN6EDCZ3xMR4xPu5/view?usp=sharing); [baidu drive](https://pan.baidu.com/s/1cyiKNYZzpja-5brcn9QD1A), code: n1yd.

Download [train_detection_dict.json](https://drive.google.com/file/d/1_S-zyKF7F8SIEht6V66Sqbsz9TBqzY-P/view?usp=sharing), [val_detection_dict.json](https://drive.google.com/file/d/1KmyG0mghwydkb7pEwxDjItwZvNi_DRA4/view?usp=sharing), and [test_detection_dict.json](https://drive.google.com/file/d/1-r4u45EyxY7uaIk6VxCZxCiBxaOlaTC2/view?usp=sharing) to `data/`.
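Since the downloads go to two different folders, a quick check before training can save a failed run. The following is a minimal sketch, not part of this repository: `missing_files` and `EXPECTED_FILES` are hypothetical names, and the val/test feature file names are assumed to follow the same underscore pattern as the train file. Adjust the list if your downloaded files are named differently.

```python
from pathlib import Path

# File names taken from the download links above. The val/test feature
# names are ASSUMED to follow the train file's underscore pattern; edit
# them if your downloaded files are named differently.
EXPECTED_FILES = {
    "data/flickr30k": [
        "train_features_compress.hdf5",
        "val_features_compress.hdf5",
        "test_features_compress.hdf5",
    ],
    "data": [
        "train_detection_dict.json",
        "val_detection_dict.json",
        "test_detection_dict.json",
    ],
}


def missing_files(repo_root="."):
    """Return the expected data files that are not present under repo_root."""
    root = Path(repo_root)
    missing = []
    for folder, names in EXPECTED_FILES.items():
        for name in names:
            if not (root / folder / name).is_file():
                # Report paths relative to the repository root.
                missing.append((Path(folder) / name).as_posix())
    return missing


if __name__ == "__main__":
    for path in missing_files():
        print("missing:", path)
```

Running the script from the repository root prints one line per file that still needs to be downloaded, and prints nothing when everything is in place.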

#### Generate object proposals by yourself (TODO)

~~run ` sh tools/prepare_detection.sh ` to clone faster-rcnn code and download pre-trained models.~~

~~run ` sh tools/run_faster_rcnn.sh ` to run faster-rcnn detection on flickr30k dataset and generate features.~~

*You may have to customize your environment in order to run faster-rcnn successfully. See [prerequisites](https://github.com/jwyang/faster-rcnn.pytorch#prerequisites).*

## Training

```
python main.py [args]
```

In our experiments, we get ~61% accuracy with the default settings.
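The accuracy metric is not defined in this README. Phrase grounding on Flickr30k Entities is commonly scored as the fraction of phrases whose predicted box has an IoU of at least 0.5 with the annotated box. Assuming that convention applies here, the computation can be sketched as follows. The function names and the `(x1, y1, x2, y2)` box format are illustrative assumptions, not code from this repository.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(predicted, ground_truth, threshold=0.5):
    """Fraction of phrases whose predicted box reaches the IoU threshold.

    ASSUMPTION: one predicted box and one ground-truth box per phrase,
    with a hit counted when IoU >= threshold (0.5 by common convention).
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        return 0.0
    hits = sum(box_iou(p, g) >= threshold for p, g in zip(predicted, ground_truth))
    return hits / len(predicted)
```

Under this definition, an accuracy of about 61% would mean that roughly 61 out of every 100 phrases are matched to a box that overlaps the annotation by at least half.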

## Evaluating

Our trained model can be downloaded at [google drive](https://drive.google.com/file/d/1hVLDcsks2MuDJWpl2QB1H8DBCUefKCRY/view?usp=sharing).

```
python test.py --file 
```
