https://github.com/idiap/multimodal_gaze_target_prediction
This repo provides the training and testing code for our paper "A Modular Multimodal Architecture for Gaze Target Prediction: Application to Privacy-Sensitive Settings" published at the GAZE workshop at CVPR 2022
- Host: GitHub
- URL: https://github.com/idiap/multimodal_gaze_target_prediction
- Owner: idiap
- Created: 2022-10-18T08:36:36.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-10-18T08:38:20.000Z (over 3 years ago)
- Last Synced: 2025-03-23T01:01:50.166Z (about 1 year ago)
- Topics: attention, cvpr, cvpr2022, gaze, gaze-estimation, pytorch
- Language: Python
- Homepage:
- Size: 48.8 KB
- Stars: 24
- Watchers: 5
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSES/GPL-3.0-only.txt
### Overview
This repo provides the training and testing code for our paper "A Modular Multimodal Architecture for Gaze Target Prediction: Application to Privacy-Sensitive Settings" published at the GAZE workshop at CVPR 2022.
[[paper]](https://openaccess.thecvf.com/content/CVPR2022W/GAZE/papers/Gupta_A_Modular_Multimodal_Architecture_for_Gaze_Target_Prediction_Application_to_CVPRW_2022_paper.pdf) [[video]](https://youtu.be/z-XSwLOpNzw)
### Setup
We use the GazeFollow and VideoAttentionTarget datasets for training and testing our models. Please download them from the following links, provided by ejcgt/attention-target-detection:
GazeFollow extended: [link](https://www.dropbox.com/s/3ejt9pm57ht2ed4/gazefollow_extended.zip?dl=0)
VideoAttentionTarget: [link](https://www.dropbox.com/s/8ep3y1hd74wdjy5/videoattentiontarget.zip?dl=0)
Next, extract the pose and depth modalities for both datasets following the instructions in [modality_extraction.md](modality_extraction.md).
Afterwards, update the dataset paths in the ```config.py``` file.
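Before launching a long training run, it can help to confirm that the dataset directories exist. The snippet below is a minimal sketch of such a check: the `DATASET_PATHS` dictionary, its keys, and the example locations are hypothetical and do not mirror the actual variable names in ```config.py```, so substitute the values you set there.

```python
from pathlib import Path

# Hypothetical locations: replace them with the paths you set in config.py.
# The dictionary and its keys are illustrative, not names taken from the repo.
DATASET_PATHS = {
    "gazefollow": "/data/gazefollow_extended",
    "videoattentiontarget": "/data/videoattentiontarget",
}


def missing_paths(paths):
    """Return the names of all entries whose directory does not exist."""
    return [name for name, location in paths.items() if not Path(location).is_dir()]


if __name__ == "__main__":
    for name in missing_paths(DATASET_PATHS):
        print(f"Missing dataset directory for {name}: {DATASET_PATHS[name]}")
```

The same check can be extended to the extracted pose and depth folders once you know where the extraction step wrote them.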
We use PyTorch for our experiments. Use the provided environment file to create the conda environment:
```
conda env create -f environment.yml
```
### Training
#### Training on GazeFollow
##### Step 1. Train the single modality models
```
python train_on_gazefollow.py --modality image --backbone_name efficientnet-b1 --log_dir
python train_on_gazefollow.py --modality depth --backbone_name efficientnet-b0 --log_dir
python train_on_gazefollow.py --modality pose --backbone_name efficientnet-b0 --log_dir
```
The trained model weights will be saved in the specified ```log_dir```.
##### Step 2. Initialize the weights for the attention model
```
python initialize_attention_model.py --image_weights --depth_weights --pose_weights --attention_weights
```
Provide the paths to the pretrained image, depth and pose models. The attention model with initialized weights will be saved in the path specified by the ```attention_weights``` argument.
##### Step 3. Train the attention model
```
python train_on_gazefollow.py --modality attention --init_weights --log_dir
```
Provide the path to the initialized attention model weights. The trained model weights will be saved in the specified ```log_dir```.
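The three steps above can be chained from a small driver script. The following is a sketch only: the scripts and flags are the ones shown above, but the `log_root` layout and the checkpoint paths in `weights` are hypothetical. The file names that training writes into ```log_dir``` are not documented here, so the paths must be supplied by hand.

```python
import shlex
import subprocess
import sys

# Backbone per modality, as in the Step 1 commands.
BACKBONES = {
    "image": "efficientnet-b1",
    "depth": "efficientnet-b0",
    "pose": "efficientnet-b0",
}


def gazefollow_commands(log_root, weights):
    """Build the Step 1-3 commands as argument lists.

    log_root: hypothetical root directory; each run gets its own sub-directory.
    weights:  hypothetical dict with keys "image", "depth", "pose" and
              "attention" holding checkpoint paths chosen by the user.
    """
    cmds = []
    # Step 1: one training run per single-modality model.
    for modality, backbone in BACKBONES.items():
        cmds.append([
            sys.executable, "train_on_gazefollow.py",
            "--modality", modality,
            "--backbone_name", backbone,
            "--log_dir", f"{log_root}/{modality}",
        ])
    # Step 2: initialize the attention model from the three trained models.
    cmds.append([
        sys.executable, "initialize_attention_model.py",
        "--image_weights", weights["image"],
        "--depth_weights", weights["depth"],
        "--pose_weights", weights["pose"],
        "--attention_weights", weights["attention"],
    ])
    # Step 3: train the attention model from the initialized weights.
    cmds.append([
        sys.executable, "train_on_gazefollow.py",
        "--modality", "attention",
        "--init_weights", weights["attention"],
        "--log_dir", f"{log_root}/attention",
    ])
    return cmds


def run_all(cmds):
    """Run the commands in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    example = {"image": "image.pt", "depth": "depth.pt",
               "pose": "pose.pt", "attention": "attention_init.pt"}
    for cmd in gazefollow_commands("logs/gazefollow", example):
        print(shlex.join(cmd))
```

Calling `run_all(...)` on the returned list executes the steps sequentially. In practice Step 2 needs the real checkpoint paths produced by Step 1, so you may prefer to run the steps one at a time.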
#### Training on VideoAttentionTarget
Set ```pred_inout=True``` in the ```config.py``` file.
##### Train the single modality models
```
python train_on_videoatttarget.py --modality image --init_weights --backbone_name efficientnet-b1 --log_dir
python train_on_videoatttarget.py --modality depth --init_weights --backbone_name efficientnet-b0 --log_dir
python train_on_videoatttarget.py --modality pose --init_weights --backbone_name efficientnet-b0 --log_dir
```
Provide the initial weights from training on GazeFollow. The trained model weights will be saved in the specified ```log_dir```.
##### Train the attention model
```
python train_on_videoatttarget.py --modality attention --init_weights --log_dir
```
Provide the initial weights from training on GazeFollow. The trained model weights will be saved in the specified ```log_dir```.
#### Training the privacy-sensitive models
Simply set ```privacy=True``` in the ```config.py``` file. Then follow the same steps as above to train the respective models.
### Testing
#### Testing on GazeFollow
```
python eval_on_gazefollow.py --model_weights
```
Provide the path to the model weights with the ```model_weights``` argument.
#### Testing on VideoAttentionTarget
```
python eval_on_videoatttarget.py --model_weights
```
Provide the path to the model weights with the ```model_weights``` argument.
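To evaluate several checkpoints in a row, the two evaluation commands can be generated programmatically. This is a minimal sketch under stated assumptions: the script names and the `--model_weights` flag come from the commands above, while the dataset keys and the example checkpoint path are hypothetical.

```python
import shlex
import sys

# Evaluation script per dataset; the dictionary keys are illustrative labels.
EVAL_SCRIPTS = {
    "gazefollow": "eval_on_gazefollow.py",
    "videoatttarget": "eval_on_videoatttarget.py",
}


def eval_command(dataset, model_weights):
    """Return the evaluation command for one checkpoint as an argument list."""
    if dataset not in EVAL_SCRIPTS:
        raise ValueError(f"Unknown dataset: {dataset!r}")
    return [sys.executable, EVAL_SCRIPTS[dataset], "--model_weights", model_weights]


if __name__ == "__main__":
    # Hypothetical checkpoint path, printed as a dry run.
    print(shlex.join(eval_command("gazefollow", "attention_gazefollow.pt")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` executes the evaluation.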
### Pre-trained models
Pre-trained human-centric module: [link](https://drive.switch.ch/index.php/s/5hDsBdP4OsLks5X)
Pre-trained attention model on GazeFollow: [link](https://drive.switch.ch/index.php/s/fJVjWSJWQtoJeT3)
Pre-trained attention model on VideoAttentionTarget: [link](https://drive.switch.ch/index.php/s/EjVQlvUDisvL1c4)
### Citation
If you use our code, please cite:
```bibtex
@inproceedings{gupta2022modular,
title={A Modular Multimodal Architecture for Gaze Target Prediction: Application to Privacy-Sensitive Settings},
author={Gupta, Anshul and Tafasca, Samy and Odobez, Jean-Marc},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
pages={5041--5050},
year={2022}
}
```
### References
Parts of the code have been adapted from ejcgt/attention-target-detection.