- Host: GitHub
- URL: https://github.com/lsabrinax/VideoTextSCM
- Owner: lsabrinax
- Created: 2021-10-19T07:51:24.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-04-01T09:48:53.000Z (over 2 years ago)
- Last Synced: 2024-08-02T11:16:05.179Z (3 months ago)
- Language: Python
- Size: 74.2 KB
- Stars: 17
- Watchers: 1
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
# Introduction
This is a PyTorch implementation of [Video Text Tracking With a Spatio-Temporal Complementary Model](https://arxiv.org/abs/2111.04987). Part of the code is inherited from [DB](https://github.com/MhLiao/DB) and [SiamMask](https://github.com/foolwood/SiamMask).
## ToDo List
- [x] Release code
- [x] Document for Installation
- [x] Document for training and testing

## Installation
### Requirements:
- Python 3.6
- PyTorch >= 1.2
- GCC 5.5
- CUDA 9.2

```bash
conda create --name scm python=3.6
conda activate scm

# install PyTorch with cuda-9.2
conda install pytorch==1.5.0 torchvision==0.6.0 cudatoolkit=9.2 -c pytorch

# python dependencies
pip install -r requirement.txt

# clone repo
git clone https://github.com/lsabrinax/VideoTextSCM
cd VideoTextSCM/

# build deformable convolution operator
cd assets/ops/dcn/
python setup.py build_ext --inplace
```

## Datasets
The root of the dataset directory should be `VideoTextSCM/datasets/`.
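Before running the testing or training scripts, it can help to verify that this root is actually in place. A minimal sketch, assuming the default layout; `check_dataset_root` is an illustrative helper, not part of this repo:

```python
from pathlib import Path

def check_dataset_root(root: str = "datasets") -> bool:
    """Return True if the dataset root exists and contains at least one entry."""
    p = Path(root)
    return p.is_dir() and any(p.iterdir())

# run from the VideoTextSCM directory after downloading the data
print(check_dataset_root("datasets"))
```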
Download the converted ground truth and data lists from [Baidu Drive](https://pan.baidu.com/s/1-r084b6l58Rhe__1SCBo6Q) (download code: 0e8b) or [Google Drive](https://drive.google.com/drive/folders/13GkcaSLsXxTCbuFwUAHvBfbB6DB-5Fwq?usp=sharing). The images of each dataset can be obtained from the official websites.

## Testing
Run the command below to get the tracking results, then submit them to the official website to obtain the performance:
```bash
CUDA_VISIBLE_DEVICES=0 python demo_textboxPP.py --input-root path-to-test-dataset --output-root path-to-save-result --sub-res --dataset icdar --weight-path path-to-embedding-model --scm-config path-to-scm-config --scm-weight-path path-to-scm-model
```
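For reference, the command-line interface above can be mirrored with a small `argparse` sketch. This is only an illustration of the flags shown in the command, not the actual parser inside `demo_textboxPP.py` (whose types and defaults may differ):

```python
import argparse

# illustrative parser mirroring the flags of demo_textboxPP.py
parser = argparse.ArgumentParser(description="video text tracking demo (sketch)")
parser.add_argument("--input-root", help="directory of the test dataset")
parser.add_argument("--output-root", help="directory to save tracking results")
parser.add_argument("--sub-res", action="store_true", help="write submission-format results")
parser.add_argument("--dataset", default="icdar", help="dataset name")
parser.add_argument("--weight-path", help="embedding model weights")
parser.add_argument("--scm-config", help="SCM config file")
parser.add_argument("--scm-weight-path", help="SCM model weights")

# argparse exposes --input-root as args.input_root, and so on
args = parser.parse_args(["--input-root", "datasets/icdar", "--sub-res"])
print(args.dataset, args.sub_res)
```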
## Training
### SCM
```bash
# download the pre-trained model
cd VideoTextSCM/scm/experiments/siammask_sharp
wget http://www.robots.ox.ac.uk/~qwang/SiamMask_VOT.pth

# train the model
cd VideoTextSCM
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_scm.py --save-dir path-to-save-scm-model --pretrained \
./scm/experiments/siammask_sharp/SiamMask_VOT.pth --config ./scm/experiments/siammask_sharp/config_icdar.json \
--batch 256 --epochs 20
```

### Embedding
Download totaltext_resnet50 [Baidu Drive](https://pan.baidu.com/s/1vxcdpOswTK6MxJyPIJlBkA) (download code: p6u3), [Google Drive](https://drive.google.com/open?id=1T9n0HTP3X3Y_nJ0D1ekMhCQRHntORLJG).
```bash
cd db_model && mkdir weights
# put totaltext_resnet50 in db_model/weights

# train embedding
cd VideoTextSCM
CUDA_VISIBLE_DEVICES=0 python train_embedding.py --exp_name model-name --batch_size 3 --num_workers 8 --lr 0.0005
```

## Citing the related works
Please cite the related works in your publications if they help your research:
```
@article{gao2021video,
  title={Video Text Tracking With a Spatio-Temporal Complementary Model},
  author={Gao, Yuzhe and Li, Xing and Zhang, Jiajian and Zhou, Yu and Jin, Dian and Wang, Jing and Zhu, Shenggao and Bai, Xiang},
  journal={IEEE Transactions on Image Processing},
  volume={30},
  pages={9321--9331},
  year={2021},
  publisher={IEEE}
}
```