Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cshizhe/hgr_v2t
Code accompanying the paper "Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning".
https://github.com/cshizhe/hgr_v2t
Last synced: 17 days ago
JSON representation
Code accompanying the paper "Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning".
- Host: GitHub
- URL: https://github.com/cshizhe/hgr_v2t
- Owner: cshizhe
- License: mit
- Created: 2020-03-14T09:09:54.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-06-12T07:03:05.000Z (over 4 years ago)
- Last Synced: 2024-07-31T20:49:13.113Z (3 months ago)
- Language: Python
- Homepage:
- Size: 483 KB
- Stars: 206
- Watchers: 14
- Forks: 21
- Open Issues: 9
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
This repository contains PyTorch implementation of our paper [Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning (CVPR 2020)](https://arxiv.org/abs/2003.00392).
![Overview of HGR Model](figures/method_intro.png)
## Prerequisites
Python 3 and PyTorch 1.3.```
# clone the repository
git clone [email protected]:cshizhe/hgr_v2t.git
cd hgr_v2t
export PYTHONPATH=$(pwd):${PYTHONPATH}
```## Datasets
We provide annotations, pretrained features on MSRVTT, TGIF, VATEX and Youtube2Text video captioning datasets, which can be downloaded from [BaiduNetdisk](https://pan.baidu.com/s/1WuF0i1nd2uAtHzp8i8QLhA) (code: vxpi).### Annotations
- groundtruth: annotation/RET directory
1) ref_captions.json: dict, {videoname: [sent]}
2) sent2rolegraph.augment.json: {sent: (graph_nodes, graph_edges)}- vocabularies: annotation/RET directory
int2word.npy: [word]
word2int.json: {word: int}- data splits: public_split directory
trn_names.npy, val_names.npy, tst_names.npy### Features
For MSRVTT, TGIF and Youtube2Text datasets, we extract features with [Resnet152 pretrained on ImageNet](https://pytorch.org/docs/stable/torchvision/models.html).
For VATEX dataset, we use the [I3D features](http://vatex.org/main/download.html) released by VATEX challenge organizers.- mean pooling features: ordered_feature/MP directory
format: np array, shape=(num_fts, dim_ft) corresponding to the order in data_split names
- frame-level features: ordered_feature/SA directory
format: hdf5 file, {name: ft}, ft.shape=(num_frames, dim_ft)
### Fine-grained Binary Selection Annotation
We construct the fine-grained binary selection dataset based on the testing set of Youtube2Text dataset. The annotations are in the Youtube2Text/annotation/binary_selection directory.## Training & Inference
### Semantic Graph Construction
We provided constructed role graph annotations. If you want to generate role graphs for new datasets, please follow the following instructions.1. semantic role labeling:
```
python misc/semantic_role_labeling.py ref_caption_file out_file --cuda_device 0
```2. convert sentence into role graph:
```
cd misc
jupyter notebook
# open parse_sent_to_role_graph.ipynd
```### Training and Evaluation
1) The baseline [VSE++ model](https://arxiv.org/abs/1707.05612):
```
cd t2vretrieval/driver# setup config files
# you should modify data paths in configs/prepare_globalmatch_configs.py
python configs/prepare_globalmatch_cofig.py $datadir
resdir='' # copy the output string of the previous step# training
python global_match.py $resdir/model.json $resdir/path.json --is_train --resume_file $resdir/../../word_embeds.glove42b.th# inference
python global_match.py $resdir/model.json $resdir/path.json --eval_set tst
```2) Our HGR model:
```
cd t2vretrieval/driver# setup config files
# you should modify data paths in configs/prepare_mlmatch_configs.py
python configs/prepare_mlmatch_configs.py $datadir
resdir='' # copy the output string of the previous step# training
python multilevel_match.py $resdir/model.json $resdir/path.json --load_video_first --is_train --resume_file $resdir/../../word_embeds.glove42b.th# inference
python multilevel_match.py $resdir/model.json $resdir/path.json --load_video_first --eval_set tst
```## Citations
If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:
```text
@article{chen2020fine,
title={Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning},
author={Chen, Shizhe and Zhao, Yida and Jin, Qin and Wu, Qi},
journal={CVPR},
year={2020}
}
```## License
MIT License