Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/facebookresearch/Motionformer
Code + pre-trained models for the paper Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Last synced: 24 days ago
- Host: GitHub
- URL: https://github.com/facebookresearch/Motionformer
- Owner: facebookresearch
- Archived: true
- Created: 2021-06-03T07:59:07.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-06-13T23:03:22.000Z (over 2 years ago)
- Last Synced: 2024-08-09T13:19:28.218Z (4 months ago)
- Language: Python
- Size: 1.42 MB
- Stars: 223
- Watchers: 12
- Forks: 29
- Open Issues: 8
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- StarryDivineSky - facebookresearch/Motionformer - Uses trajectory attention to achieve state-of-the-art results on video action recognition benchmarks such as Kinetics-400 and Something-Something V2. (Other_Machine Vision / Web Services_Other)
README
# Motionformer
This is an official PyTorch implementation of the paper [Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers](https://arxiv.org/abs/2106.05392). In this repository, we provide PyTorch code for training and testing our proposed Motionformer model. Motionformer uses the proposed *trajectory attention* to achieve state-of-the-art results on several video action recognition benchmarks such as Kinetics-400 and Something-Something V2.
If you find Motionformer useful in your research, please use the following BibTeX entry for citation.
```BibTeX
@inproceedings{patrick2021keeping,
title={Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers},
author={Mandela Patrick and Dylan Campbell and Yuki M. Asano and Ishan Misra and Florian Metze and Christoph Feichtenhofer and Andrea Vedaldi and Jo\~ao F. Henriques},
year={2021},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
}
```

# Model Zoo
We provide Motionformer models pretrained on Kinetics-400 (K400), Kinetics-600 (K600), Something-Something-V2 (SSv2), and Epic-Kitchens datasets.
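For example, any of the checkpoints in the tables below can be downloaded directly from the listed URLs; a rough sketch for the default K400 Motionformer model (the `checkpoints` directory name is just an illustration):
```
# Download a pretrained checkpoint into a local directory (URL taken from the K400 table below).
mkdir -p checkpoints
wget -P checkpoints https://dl.fbaipublicfiles.com/motionformer/k400_motionformer_224_16x4.pyth
```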
| name | dataset | # of frames | spatial crop | acc@1 | acc@5 | url |
| --- | --- | --- | --- | --- | --- | --- |
| Joint | K400 | 16 | 224 | 79.2 | 94.2 | [model](https://dl.fbaipublicfiles.com/motionformer/k400_joint_224_16x4.pyth) |
| Divided | K400 | 16 | 224 | 78.5 | 93.8 | [model](https://dl.fbaipublicfiles.com/motionformer/k400_divided_224_16x4.pyth) |
| Motionformer | K400 | 16 | 224 | 79.7 | 94.2 | [model](https://dl.fbaipublicfiles.com/motionformer/k400_motionformer_224_16x4.pyth) |
| Motionformer-HR | K400 | 16 | 336 | 81.1 | 95.2 | [model](https://dl.fbaipublicfiles.com/motionformer/k400_motionformer_336_16x8.pyth) |
| Motionformer-L | K400 | 32 | 224 | 80.2 | 94.8 | [model](https://dl.fbaipublicfiles.com/motionformer/k400_motionformer_224_32x3.pyth) |

| name | dataset | # of frames | spatial crop | acc@1 | acc@5 | url |
| --- | --- | --- | --- | --- | --- | --- |
| Motionformer | K600 | 16 | 224 | 81.6 | 95.6 | [model](https://dl.fbaipublicfiles.com/motionformer/k600_motionformer_224_16x4.pyth) |
| Motionformer-HR | K600 | 16 | 336 | 82.7 | 96.1 | [model](https://dl.fbaipublicfiles.com/motionformer/k600_motionformer_336_16x8.pyth) |
| Motionformer-L | K600 | 32 | 224 | 82.2 | 96.0 | [model](https://dl.fbaipublicfiles.com/motionformer/k600_motionformer_224_32x3.pyth) |

| name | dataset | # of frames | spatial crop | acc@1 | acc@5 | url |
| --- | --- | --- | --- | --- | --- | --- |
| Joint | SSv2 | 16 | 224 | 64.0 | 88.4 | [model](https://dl.fbaipublicfiles.com/motionformer/ssv2_joint_224_16x4.pyth) |
| Divided | SSv2 | 16 | 224 | 64.2 | 88.6 | [model](https://dl.fbaipublicfiles.com/motionformer/ssv2_divided_224_16x4.pyth) |
| Motionformer | SSv2 | 16 | 224 | 66.5 | 90.1 | [model](https://dl.fbaipublicfiles.com/motionformer/ssv2_motionformer_224_16x4.pyth) |
| Motionformer-HR | SSv2 | 16 | 336 | 67.1 | 90.6 | [model](https://dl.fbaipublicfiles.com/motionformer/ssv2_motionformer_336_16x4.pyth) |
| Motionformer-L | SSv2 | 32 | 224 | 68.1 | 91.2 | [model](https://dl.fbaipublicfiles.com/motionformer/ssv2_motionformer_224_32x3.pyth) |

| name | dataset | # of frames | spatial crop | A acc | N acc | url |
| --- | --- | --- | --- | --- | --- | --- |
| Motionformer | EK | 16 | 224 | 43.1 | 56.5 | [model](https://dl.fbaipublicfiles.com/motionformer/ek_motionformer_224_16x4.pyth) |
| Motionformer-HR | EK | 16 | 336 | 44.5 | 58.5 | [model](https://dl.fbaipublicfiles.com/motionformer/ek_motionformer_336_16x4.pyth) |
| Motionformer-L | EK | 32 | 224 | 44.1 | 57.6 | [model](https://dl.fbaipublicfiles.com/motionformer/ek_motionformer_224_32x3.pyth) |

# Installation
First, create a conda virtual environment and activate it:
```
conda create -n motionformer python=3.8.5 -y
source activate motionformer
```

Then, install the following packages:
- torchvision: `pip install torchvision` or `conda install torchvision -c pytorch`
- [fvcore](https://github.com/facebookresearch/fvcore/): `pip install 'git+https://github.com/facebookresearch/fvcore'`
- simplejson: `pip install simplejson`
- einops: `pip install einops`
- timm: `pip install timm`
- PyAV: `conda install av -c conda-forge`
- psutil: `pip install psutil`
- scikit-learn: `pip install scikit-learn`
- OpenCV: `pip install opencv-python`
- tensorboard: `pip install tensorboard`
- matplotlib: `pip install matplotlib`
- pandas: `pip install pandas`
- ffmpeg-python: `pip install ffmpeg-python`

Alternatively, you can simply create a conda environment with all of the packages from the provided YAML file:
`conda env create -f environment.yml`
Lastly, build the Motionformer codebase by running:
```
git clone https://github.com/facebookresearch/Motionformer
cd Motionformer
python setup.py build develop
```

# Usage
## Dataset Preparation
Please use the dataset preparation instructions provided in [DATASET.md](slowfast/datasets/DATASET.md).
## Training the Default Motionformer
Training the default Motionformer that uses trajectory attention, and operates on 16-frame clips cropped at 224x224 spatial resolution, can be done using the following command:
```
python tools/run_net.py \
--cfg configs/K400/motionformer_224_16x4.yaml \
DATA.PATH_TO_DATA_DIR path_to_your_dataset \
NUM_GPUS 8 \
TRAIN.BATCH_SIZE 8 \
```
You may need to pass the location of your dataset on the command line by adding `DATA.PATH_TO_DATA_DIR path_to_your_dataset`, or you can simply add
```
DATA:
  PATH_TO_DATA_DIR: path_to_your_dataset
```
to the YAML config file, so that you do not need to pass it on the command line every time.

We improved the trajectory attention over the original code; to use the improved version, set the `VIT.USE_ORIGINAL_TRAJ_ATTN_CODE` flag to `False` in the YAML config:
```
VIT:
  USE_ORIGINAL_TRAJ_ATTN_CODE: False
```
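Alternatively, this flag can presumably also be passed as a command-line override, in the same way as the other config options shown above:
```
# Assumed to work like the other command-line overrides in this README.
python tools/run_net.py \
  --cfg configs/K400/motionformer_224_16x4.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  VIT.USE_ORIGINAL_TRAJ_ATTN_CODE False
```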
## Using a Different Number of GPUs
If you want to use a smaller number of GPUs, you need to modify the .yaml configuration files in [`configs/`](configs/). Specifically, you need to modify the `NUM_GPUS`, `TRAIN.BATCH_SIZE`, `TEST.BATCH_SIZE`, and `DATA_LOADER.NUM_WORKERS` entries in each configuration file. The `BATCH_SIZE` entries should be the same as or higher than the `NUM_GPUS` entry.
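For instance, here is a rough sketch of a 4-GPU run that overrides the same entries from the command line instead of editing the YAML file (the batch sizes and worker count are illustrative and depend on your GPU memory):
```
# Hypothetical 4-GPU training run; adjust the values to fit your hardware.
python tools/run_net.py \
  --cfg configs/K400/motionformer_224_16x4.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 4 \
  TRAIN.BATCH_SIZE 4 \
  TEST.BATCH_SIZE 4 \
  DATA_LOADER.NUM_WORKERS 4
```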
## Using Different Self-Attention Schemes
If you want to experiment with different space-time self-attention schemes, e.g., joint space-time attention or divided space-time attention, use the following commands:
```
python tools/run_net.py \
--cfg configs/K400/joint_224_16x4.yaml \
DATA.PATH_TO_DATA_DIR path_to_your_dataset \
NUM_GPUS 8 \
TRAIN.BATCH_SIZE 8 \
```

and
```
python tools/run_net.py \
--cfg configs/K400/divided_224_16x4.yaml \
DATA.PATH_TO_DATA_DIR path_to_your_dataset \
NUM_GPUS 8 \
TRAIN.BATCH_SIZE 8 \
```

## Training Different Motionformer Variants
If you want to train more powerful Motionformer variants, e.g., Motionformer-HR (operating on 16-frame clips sampled at 336x336 spatial resolution), and Motionformer-L (operating on 32-frame clips sampled at 224x224 spatial resolution), use the following commands:
```
python tools/run_net.py \
--cfg configs/K400/motionformer_336_16x8.yaml \
DATA.PATH_TO_DATA_DIR path_to_your_dataset \
NUM_GPUS 8 \
TRAIN.BATCH_SIZE 8 \
```

and
```
python tools/run_net.py \
--cfg configs/K400/motionformer_224_32x3.yaml \
DATA.PATH_TO_DATA_DIR path_to_your_dataset \
NUM_GPUS 8 \
TRAIN.BATCH_SIZE 8 \
```

Note that for these models you will need a set of GPUs with ~32 GB of memory.
## Inference
Use `TRAIN.ENABLE` and `TEST.ENABLE` to control whether training or testing is required for a given run. When testing, you also have to provide the path to the model checkpoint via `TEST.CHECKPOINT_FILE_PATH`.
```
python tools/run_net.py \
--cfg configs/K400/motionformer_224_16x4.yaml \
DATA.PATH_TO_DATA_DIR path_to_your_dataset \
TEST.CHECKPOINT_FILE_PATH path_to_your_checkpoint \
TRAIN.ENABLE False \
```

Alternatively, you can modify the provided SLURM script and run the following:
```
sbatch slurm_scripts/test.sh configs/K400/motionformer_224_16x4.yaml path_to_your_checkpoint
```

## Single-Node Training via Slurm
To train Motionformer via Slurm, please check out our single node Slurm training script [`slurm_scripts/run_single_node_job.sh`](slurm_scripts/run_single_node_job.sh).
```
sbatch slurm_scripts/run_single_node_job.sh configs/K400/motionformer_224_16x4.yaml /your/job/dir/${JOB_NAME}/
```

## Multi-Node Training via Submitit
Distributed training is available via Slurm and submitit. First, install submitit:
```
pip install submitit
```

To train the Motionformer model on Kinetics using 8 nodes with 8 GPUs each, use the following command:
```
python run_with_submitit.py --cfg configs/K400/motionformer_224_16x4.yaml --job_dir /your/job/dir/${JOB_NAME}/ --partition $PARTITION --num_shards 8 --use_volta32
```

We provide a script for launching Slurm jobs in [`slurm_scripts/run_multi_node_job.sh`](slurm_scripts/run_multi_node_job.sh).
```
sbatch slurm_scripts/run_multi_node_job.sh configs/K400/motionformer_224_16x4.yaml /your/job/dir/${JOB_NAME}/
```

Please note that the hyper-parameters in the configs were used with 8 nodes of 8 GPUs (32 GB) each. Please scale the batch size and learning rate appropriately for your cluster configuration.
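As a hypothetical illustration of such scaling: if you train at a fraction of the reference scale (say, a single node with 8 GPUs), linear scaling suggests dividing both the total batch size and the base learning rate by the same factor. The `SOLVER.BASE_LR` key below is assumed from the underlying PySlowFast-style configs, and `your_scaled_base_lr` is a placeholder for the config's default value divided by your scaling factor; verify both against your YAML file:
```
# Hypothetical single-node run with batch size and learning rate scaled down together.
python tools/run_net.py \
  --cfg configs/K400/motionformer_224_16x4.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 8 \
  TRAIN.BATCH_SIZE 8 \
  SOLVER.BASE_LR your_scaled_base_lr
```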
## Finetuning
To finetune from an existing PyTorch checkpoint, add the following options on the command line, or add them to the YAML config:
```
TRAIN.CHECKPOINT_EPOCH_RESET: True
TRAIN.CHECKPOINT_FILE_PATH path_to_your_PyTorch_checkpoint
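
# As a full command, this might look like the following (a sketch; the checkpoint path is a placeholder):
python tools/run_net.py \
  --cfg configs/K400/motionformer_224_16x4.yaml \
  TRAIN.CHECKPOINT_FILE_PATH path_to_your_PyTorch_checkpoint \
  TRAIN.CHECKPOINT_EPOCH_RESET True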
```

# Environment
The code was developed using Python 3.8.5 on Ubuntu 20.04. For training, we used eight GPU compute nodes, each containing 8 Tesla V100 GPUs (64 GPUs in total). Other platforms or GPU cards have not been fully tested.
# License
The majority of this work is licensed under [CC-NC 4.0 International license](LICENSE). However, portions of the project are available under separate license terms: [SlowFast](https://github.com/facebookresearch/SlowFast) and [pytorch-image-models](https://github.com/rwightman/pytorch-image-models) are licensed under the Apache 2.0 license.
# Contributing
We actively welcome your pull requests. Please see [CONTRIBUTING.md](CONTRIBUTING.md) and [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) for more info.
# Acknowledgements
Motionformer is built on top of [PySlowFast](https://github.com/facebookresearch/SlowFast), [Timesformer](https://github.com/facebookresearch/TimeSformer) and [pytorch-image-models](https://github.com/rwightman/pytorch-image-models) by [Ross Wightman](https://github.com/rwightman). We thank the authors for releasing their code. If you use our model, please consider citing these works as well:
```BibTeX
@misc{fan2020pyslowfast,
author = {Haoqi Fan and Yanghao Li and Bo Xiong and Wan-Yen Lo and
Christoph Feichtenhofer},
title = {PySlowFast},
howpublished = {\url{https://github.com/facebookresearch/slowfast}},
year = {2020}
}
```

```BibTeX
@inproceedings{gberta_2021_ICML,
author = {Gedas Bertasius and Heng Wang and Lorenzo Torresani},
title = {Is Space-Time Attention All You Need for Video Understanding?},
booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
month = {July},
year = {2021}
}
```

```BibTeX
@misc{rw2019timm,
author = {Ross Wightman},
title = {PyTorch Image Models},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
doi = {10.5281/zenodo.4414861},
howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
}
```