Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/cmhungsteve/ta3n

[ICCV 2019 (Oral)] Temporal Attentive Alignment for Large-Scale Video Domain Adaptation (PyTorch)
https://github.com/cmhungsteve/ta3n

action-recognition cvpr2019 domain-adaptation domain-discrepancy iccv2019 pytorch temporal-dynamics video video-classification video-da-datasets

Last synced: 3 days ago
JSON representation

[ICCV 2019 (Oral)] Temporal Attentive Alignment for Large-Scale Video Domain Adaptation (PyTorch)

Awesome Lists containing this project

README

        

# Temporal Attentive Alignment for Video Domain Adaptation

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/temporal-attentive-alignment-for-large-scale/domain-adaptation-on-hmdb-ucf-full)](https://paperswithcode.com/sota/domain-adaptation-on-hmdb-ucf-full?p=temporal-attentive-alignment-for-large-scale)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/temporal-attentive-alignment-for-large-scale/domain-adaptation-on-ucf-hmdb-full)](https://paperswithcode.com/sota/domain-adaptation-on-ucf-hmdb-full?p=temporal-attentive-alignment-for-large-scale)

---
[](https://ghassanalregib.info/)
This work was mainly done in the [**Omni Lab for Intelligent Visual Engineering and Science (OLIVES)**](https://ghassanalregib.info/) @ Georgia Tech.

Feel free to check our lab's [**Website**](https://ghassanalregib.info/) and [**GitHub**](https://github.com/olivesgatech) for other interesting work!!!

---
This is the official PyTorch implementation of our papers:

**Temporal Attentive Alignment for Large-Scale Video Domain Adaptation**
[__***Min-Hung Chen***__](https://www.linkedin.com/in/chensteven), [Zsolt Kira](https://www.cc.gatech.edu/~zk15/), [Ghassan AlRegib (Advisor)](https://ghassanalregib.info/), [Jaekwon Yoo](https://www.linkedin.com/in/jaekwon-yoo-8685862b/), [Ruxin Chen](https://www.linkedin.com/in/ruxin-chen-991477119/), [Jian Zheng](https://www.linkedin.com/in/jian-zheng/)
[*International Conference on Computer Vision (ICCV), 2019*](http://iccv2019.thecvf.com/) **[Oral (acceptance rate: 4.6%)]**
[[arXiv](https://arxiv.org/abs/1907.12743)][[Project](https://minhungchen.netlify.app/project/cdar/)][[Blog](https://mlatgt.blog/2019/09/10/overcoming-large-scale-annotation-requirements-for-understanding-videos-in-the-wild/)][[Presentation (officially recorded)](https://conftube.com/video/8oUPyhwzIDo?tocitem=146)][[Oral](https://youtu.be/j9cDuzmpYP8)][[Poster](webpage/ICCV2019_Steve_TA3N_poster_v1_2.pdf)][[Slides](https://www.dropbox.com/s/s9ud77a1zt0vqbn/Oral_TA3N_ICCV_2019_mute.pdf?dl=0)][[Open Access](http://openaccess.thecvf.com/content_ICCV_2019/html/Chen_Temporal_Attentive_Alignment_for_Large-Scale_Video_Domain_Adaptation_ICCV_2019_paper.html)][[IEEE Xplore](https://ieeexplore.ieee.org/document/9008391)]

**Temporal Attentive Alignment for Video Domain Adaptation**
[__***Min-Hung Chen***__](https://www.linkedin.com/in/chensteven), [Zsolt Kira](https://www.cc.gatech.edu/~zk15/), [Ghassan AlRegib (Advisor)](https://ghassanalregib.info/)
[*CVPR Workshop (Learning from Unlabeled Videos), 2019*](https://sites.google.com/view/luv2019)
[[arXiv](https://arxiv.org/abs/1905.10861)]



Although various image-based domain adaptation (DA) techniques have been proposed in recent years, domain shift in videos is still not well-explored. Most previous
works only evaluate performance on small-scale datasets which are saturated. Therefore, we first propose two largescale video DA datasets with much larger domain discrepancy: **UCF-HMDBfull** and **Kinetics-Gameplay**. Second, we investigate different DA integration methods for videos, and show that simultaneously aligning and learning temporal dynamics achieves effective alignment even without sophisticated DA methods. Finally, we propose **Temporal Attentive Adversarial Adaptation Network (TA3N)**, which explicitly attends to the temporal dynamics using domain
discrepancy for more effective domain alignment, achieving state-of-the-art performance on four video DA datasets.




---
## Contents
* [Requirements](#requirements)
* [Dataset Preparation](#dataset-preparation)
* [Data structure](#data-structure)
* [File lists for training/validation](#file-lists-for-trainingvalidation)
* [Input data](#input-data)
* [Usage](#usage)
* [Training](#training)
* [Testing](#testing)

* [Options](#options)
* [Domain Adaptation](#domain-adaptation)
* [More options](#more-options)
* [Citation](#citation)
* [Contact](#contact)

---
## Requirements
* support Python 3.6, PyTorch 0.4, CUDA 9.0, CUDNN 7.1.4
* install all the library with: `pip install -r requirements.txt`

---
## Dataset Preparation
### Data structure
You need to extract frame-level features for each video to run the codes. To extract features, please check [`dataset_preparation/`](dataset_preparation/).

Folder Structure:
```
DATA_PATH/
DATASET/
list_DATASET_SUFFIX.txt
RGB/
CLASS_01/
VIDEO_0001.mp4
VIDEO_0002.mp4
...
CLASS_02/
...

RGB-Feature/
VIDEO_0001/
img_00001.t7
img_00002.t7
...
VIDEO_0002/
...
```
`RGB-Feature/` contains all the feature vectors for training/testing. `RGB/` contains all the raw videos.

There should be at least two `DATASET` folders: source training set and validation set. If you want to do domain adaption, you need to have another `DATASET`: target training set.

### File lists for training/validation
The file list `list_DATASET_SUFFIX.txt` is required for data feeding. Each line in the list contains the full path of the video folder, video frame number, and video class index. It looks like:
```
DATA_PATH/DATASET/RGB-Feature/VIDEO_0001/ 100 0
DATA_PATH/DATASET/RGB-Feature/VIDEO_0002/ 150 1
......
```
To generate the file list, please check [`dataset_preparation/`](dataset_preparation/).

### Input data
Here we provide pre-extracted features and data list files, so you can skip the above two steps and directly try our training/testing codes. You may need to manually edit the path in the data list files.
* Features
* UCF: [download](https://www.dropbox.com/s/ebtc1hz1paz9bmz/ucf101-feat.zip?dl=0)
* HMDB: [download](https://www.dropbox.com/s/aiac0ytb9jt83a2/hmdb51-feat.zip?dl=0)
* Olympic: [training](https://www.dropbox.com/s/0ljfsp52hydyqht/olympic_train-feat.zip?dl=0) | [validation](https://www.dropbox.com/s/yh09a2th4hf8hqp/olympic_val-feat.zip?dl=0)
* Data lists
* UCF-Olympic
* UCF: [training list](https://www.dropbox.com/s/du8d3qrzs9h8phn/list_ucf101_train_ucf_olympic-feature.txt?dl=0) | [validation list](https://www.dropbox.com/s/0qrhuen3o27g9k5/list_ucf101_val_ucf_olympic-feature.txt?dl=0)
* Olympic: [training list](https://www.dropbox.com/s/0eafz1kjk71i0i9/list_olympic_train_ucf_olympic-feature.txt?dl=0) | [validation list](https://www.dropbox.com/s/ku27uniw4xm7wpm/list_olympic_val_ucf_olympic-feature.txt?dl=0)
* UCF-HMDBsmall
* UCF: [training list](https://www.dropbox.com/s/2g04infpxwysjfb/list_ucf101_train_hmdb_ucf_small-feature.txt?dl=0) | [validation list](https://www.dropbox.com/s/6fjour5n1dcabfy/list_ucf101_val_hmdb_ucf_small-feature.txt?dl=0)
* HMDB: [training list](https://www.dropbox.com/s/q6e7jwhr1ktmrrt/list_hmdb51_train_hmdb_ucf_small-feature.txt?dl=0) | [validation list](https://www.dropbox.com/s/qh3h619bdo2q3h1/list_hmdb51_val_hmdb_ucf_small-feature.txt?dl=0)
* UCF-HMDBfull
* UCF: [training list](https://www.dropbox.com/s/jrahoh6u8k90iec/list_ucf101_train_hmdb_ucf-feature.txt?dl=0) | [validation list](https://www.dropbox.com/s/7359sfsflfkf60c/list_ucf101_val_hmdb_ucf-feature.txt?dl=0)
* HMDB: [training list](https://www.dropbox.com/s/thj7mkzof6pgfmj/list_hmdb51_train_hmdb_ucf-feature.txt?dl=0) | [validation list](https://www.dropbox.com/s/s9yc43u87kjcdhx/list_hmdb51_val_hmdb_ucf-feature.txt?dl=0)

* Kinetics-Gameplay: please fill [**this form**](https://forms.gle/bziHhvQJGmi7hwF26) to request the features and data lists.

Creative Commons License
The Kinetics-Gameplay dataset is licensed under CC BY-NC-SA 4.0 for non-commercial purposes only.
[![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)

---
## Usage
* training/validation: Run `./script_train_val.sh`

All the commonly used variables/parameters have comments in the end of the line. Please check [Options](#options).

#### Training
All the outputs will be under the directory `exp_path`.
* Outputs:
* model weights: `checkpoint.pth.tar`, `model_best.pth.tar`
* log files: `train.log`, `train_short.log`, `val.log`, `val_short.log`

#### Testing
You can choose one of model_weights for testing. All the outputs will be under the directory `exp_path`.

* Outputs:
* score_data: used to check the model output (`scores_XXX.npz`)
* confusion matrix: `confusion_matrix_XXX.png` and `confusion_matrix_XXX-topK.txt`

---
## Options
#### Domain Adaptation

In `./script_train_val.sh`, there are several options related to our DA approaches.
* `use_target`: switch on/off the DA mode
* `none`: not use target data (no DA)
* `uSv`/`Sv`: use target data in a unsupervised/supervised way

#### More options
For more details of all the arguments, please check [opts.py](opts.py).

#### Notes
The options in the scripts have comments with the following types:
* no comment: user can still change it, but NOT recommend (may need to change the code or have different experimental results)
* comments with choices (e.g. `true | false`): can only choose from choices
* comments as `depend on users`: totally depend on users (mostly related to data path)

---
## Citation
If you find this repository useful, please cite our papers:
```
@inproceedings{chen2019temporal,
title={Temporal attentive alignment for large-scale video domain adaptation},
author={Chen, Min-Hung and Kira, Zsolt and AlRegib, Ghassan and Woo, Jaekwon and Chen, Ruxin and Zheng, Jian},
booktitle={IEEE International Conference on Computer Vision (ICCV)},
year={2019},
url={https://arxiv.org/abs/1907.12743}
}

@article{chen2019taaan,
title={Temporal Attentive Alignment for Video Domain Adaptation},
author={Chen, Min-Hung and Kira, Zsolt and AlRegib, Ghassan},
journal={CVPR Workshop on Learning from Unlabeled Videos},
year={2019},
url={https://arxiv.org/abs/1905.10861}
}
```

---
### Acknowledgments
This work was mainly done in [OLIVES](https://ghassanalregib.info/)@GT with the guidance of Prof. [Ghassan AlRegib](https://ghassanalregib.info/), and the collaboration with Prof. [Zsolt Kira](https://www.cc.gatech.edu/~zk15/) at Georgia Tech. Part of this work was done with the collaboration with [Jaekwon Yoo](https://www.linkedin.com/in/jaekwon-yoo-8685862b/), [Ruxin Chen](https://www.linkedin.com/in/ruxin-chen-991477119/) and [Jian Zheng](https://www.linkedin.com/in/jian-zheng/).

Some codes are borrowed from [TSN](https://github.com/yjxiong/temporal-segment-networks), [pytorch-tsn](https://github.com/yjxiong/tsn-pytorch), [TRN-pytorch](https://github.com/metalbubble/TRN-pytorch), and [Xlearn](https://github.com/thuml/Xlearn/tree/master/pytorch).

Special thanks to the development team for the product used in the Kinetics-Gameplay dataset:

**Detroit: Become Human™ ©Sony Interactive Entertainment Europe, developed by Quantic Dream**

---
### Contact
[Min-Hung Chen](https://www.linkedin.com/in/chensteven)

cmhungsteve AT gatech DOT edu

[](https://ghassanalregib.info/)