https://github.com/difhnp/MAT
code for 'Representation Learning for Visual Object Tracking by Masked Appearance Transfer'
- Host: GitHub
- URL: https://github.com/difhnp/MAT
- Owner: difhnp
- License: gpl-3.0
- Created: 2023-03-07T04:55:55.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-06-10T02:48:22.000Z (almost 2 years ago)
- Last Synced: 2024-11-02T05:32:46.530Z (7 months ago)
- Language: Python
- Size: 4.42 MB
- Stars: 18
- Watchers: 3
- Forks: 4
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Visual-Object-Tracking
README
# MAT
[Representation Learning for Visual Object Tracking by Masked Appearance Transfer](./misc/CVPR_23_MAT_Final.pdf)

---

## **Installation**
- Clone the repository locally
- Create a conda environment
```shell
conda env create -f env.yaml
conda activate mat
pip install --upgrade git+https://github.com/got-10k/toolkit.git@master
# (You can use `pipreqs ./root` to analyse the requirements of this project.)
```
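To verify that the environment is ready for the two-GPU training commands below, a quick sanity check can help (this is only a suggestion and assumes `lmdb` is listed in `env.yaml`):

```python
# Optional sanity check after installation.
import torch
import lmdb  # assumed to be part of env.yaml; used for the training data below

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())  # the training commands below expect 2
```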
## **Training**
- Prepare the training data:
We use LMDB to store the training data. Please check `./data/parse_` and generate the LMDB datasets (a sketch of reading one record back follows this list).
- Specify the paths in `./lib/register/paths.py`.
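As a quick check that a generated LMDB dataset is readable, the sketch below opens one database and decodes a stored frame. The database path, key format, and JPEG encoding are assumptions (the actual layout is defined by the parsing scripts under `./data/parse_`), so adapt them to your generated data:

```python
# Minimal sketch: read one (assumed) JPEG-encoded frame back from an LMDB dataset.
# NOTE: the path and key below are hypothetical; check ./data/parse_ for the real layout.
import lmdb
import numpy as np
import cv2

env = lmdb.open('/path/to/your/lmdb_dataset', readonly=True, lock=False)
with env.begin(write=False) as txn:
    buf = txn.get(b'some_sequence/000001.jpg')  # hypothetical key
    if buf is not None:
        img = cv2.imdecode(np.frombuffer(buf, dtype=np.uint8), cv2.IMREAD_COLOR)
        print('decoded frame shape:', img.shape)
env.close()
```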
All of our models are trained on a single machine with two RTX 3090 GPUs. For distributed training on a single node with two GPUs:
- MAT pre-training
```shell
python -m torch.distributed.launch --nproc_per_node=2 train.py --experiment=translate_template --train_set=common_pretrain
```
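Note: `torch.distributed.launch` is deprecated in newer PyTorch releases. If your installed version ships `torchrun` and `train.py` reads the local rank from the `LOCAL_RANK` environment variable (an assumption -- if the script expects a `--local_rank` argument, keep the launcher above), an equivalent launch would be:

```shell
# Equivalent launch via torchrun (assumes train.py picks up LOCAL_RANK from the environment)
torchrun --nproc_per_node=2 train.py --experiment=translate_template --train_set=common_pretrain
```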
- Tracker training
Modify `cfg.model.backbone.weights` in `./config/cfg_translation_track.py` to be the last checkpoint of the MAT pre-training (a minimal sketch of this edit follows the command below).
```shell
python -m torch.distributed.launch --nproc_per_node=2 train.py --experiment=translate_track --train_set=common
```
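A minimal sketch of the config edit mentioned above, assuming `cfg` in `./config/cfg_translation_track.py` exposes the nested attribute used in this README (the checkpoint path and file name are placeholders -- point it at your actual last pre-training checkpoint):

```python
# ./config/cfg_translation_track.py (sketch; exact config structure may differ)
# Point the tracker backbone at the last MAT pre-training checkpoint.
cfg.model.backbone.weights = './checkpoints/translate_template_common_pretrain/checkpoint_300.pth'  # placeholder file name
```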
Arguments:
- `-e` or `--experiment`: the name of the experiment -- check `./lib/register/experiments.py` for more information about each experiment.
- `-t` or `--train_set`: the name of the train set -- check `./lib/register/dataset.py` for more information about each train set.
- `--resume_epoch`: resume from which epoch -- for example, `100` indicates we load `checkpoint_100.pth` and resume training.
- `--pretrain_name`: the full name of the pre-trained model file -- for example, `checkpoint_100.pth` indicates we load `./pretrain/checkpoint_100.pth`.
- `--pretrain_lr_mult`: pretrain_lr = pretrain_lr_mult * base_lr -- load pre-trained weights and fine-tune these parameters with `pretrain_lr`.
- `--pretrain_exclude`: the keyword in the names of pre-trained parameters that we want to discard -- for example, `head` indicates we do not load pre-trained weights whose names contain `head`.
- `--gpu_id`: CUDA_VISIBLE_DEVICES
- `--find_unused`: used in DDP mode
## **Evaluation**
We provide a multi-process testing script for evaluation on several benchmarks.
Please modify the paths to your dataset in `./lib/register/paths.py`.
Download the following checkpoints:
- [checkpoint for `translate_track_common`](https://drive.google.com/file/d/1rQ_hWsd0ZlBax224V443M42aNGasfKdR/view?usp=share_link) -- put it into `./checkpoints/translate_track_common/`
- [checkpoint for `translate_template_common_pretrain`](https://drive.google.com/file/d/1mWpQR_96GBcB4sAAcopkHJ9b8N9S1a3F/view?usp=sharing) -- put it into `./checkpoints/translate_template_common_pretrain/`
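If the `./checkpoints/` subdirectories do not exist yet, they can be created before placing the downloaded files (the downloaded `.pth` file names are whatever Google Drive provides; they are not listed in this README):

```shell
# Create the checkpoint directories expected by the test script (paths taken from this README).
mkdir -p ./checkpoints/translate_track_common/
mkdir -p ./checkpoints/translate_template_common_pretrain/
# Then move each downloaded checkpoint into its matching directory.
```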
```shell
python test.py --gpu_id=0,1 --num_process=0 --experiment=translate_track --train_set=common --benchmark=lasot --vis
```
Arguments:
- `-e` or `--experiment`: the name of the experiment -- check `./lib/register/experiments.py` for more information about each experiment.
- `-t` or `--train_set`: the name of the train set -- check `./lib/register/dataset.py` for more information about each train set.
- `-b` or `--benchmark`: the name of the benchmark -- check `./lib/register/benchmarks.py` for more information about each benchmark.
- `--test_epoch`: checkpoint of which epoch -- the default value `300` indicates we load weights from the last epoch.
- `--num_process`: max processes at a time; set `0` for a single-process test.
- `--gpu_id`: CUDA_VISIBLE_DEVICES
- `--vis`: show the tracking result.
---
## Citation
```bibtex
@InProceedings{Zhao_2023_CVPR,
author = {Zhao, Haojie and Wang, Dong and Lu, Huchuan},
title = {Representation Learning for Visual Object Tracking by Masked Appearance Transfer},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {18696-18705}
}
```

## **Acknowledgments**
- Thanks to the great [MAE](https://github.com/facebookresearch/mae),
  [MixFormer](https://github.com/MCG-NJU/MixFormer), and
  [pysot](https://github.com/STVIR/pysot) projects.
- For data augmentation, we use [Albumentations](https://github.com/albumentations-team/albumentations).

## **License**
This work is released under the **GPL 3.0 license**. Please see the
LICENSE file for more information.