
# An implementation of the CVPR 2020 paper "SCATTER: Selective Context Attentional Scene Text Recognizer"

[Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Litman_SCATTER_Selective_Context_Attentional_Scene_Text_Recognizer_CVPR_2020_paper.pdf) | [Pretrained model](https://drive.google.com/drive/folders/1niuPM6otpSQFSai8Ft2bO0lhdqEjE96Z?usp=sharing)

## Introduction
This is an unofficial implementation of the paper "SCATTER: Selective Context Attentional Scene Text Recognizer", published at CVPR 2020.

## Getting Started

### Dependency
- This work was tested with PyTorch 1.6.0, CUDA 10.2, Python 3.6.10, and Ubuntu 18.04.
```
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
```
- requirements: lmdb, pillow, nltk, natsort
```
pip3 install lmdb pillow nltk natsort
```
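A quick way to confirm the dependencies above are importable (note that pillow imports as `PIL`) is a small sanity-check script; this is just a convenience sketch, not part of the repo:

```python
import importlib.util
import sys

# Check the environment described above (the repo was tested with
# Python 3.6.10). Each name is the import name of a pip package
# listed in the requirements; pillow imports as PIL.
print("Python", sys.version_info[:3])
for mod in ("torch", "lmdb", "PIL", "nltk", "natsort"):
    found = importlib.util.find_spec(mod) is not None
    print(mod, "found" if found else "MISSING")
```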

### Dataset
- training datasets: [MJSynth (MJ)](http://www.robots.ox.ac.uk/~vgg/data/text/)[1], [SynthText (ST)](http://www.robots.ox.ac.uk/~vgg/data/scenetext/)[2], and [SynthAdd (SA)](https://arxiv.org/pdf/1811.00751.pdf)[3]
- validation datasets: the union of the training sets of [IC13](http://rrc.cvc.uab.es/?ch=2)[4], [IC15](http://rrc.cvc.uab.es/?ch=4)[5], [IIIT](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html)[6], and [SVT](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset)[7]
- evaluation datasets: the benchmark evaluation sets, consisting of [IIIT](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html)[6], [SVT](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset)[7], [IC03](http://www.iapr-tc11.org/mediawiki/index.php/ICDAR_2003_Robust_Reading_Competitions)[8], [IC13](http://rrc.cvc.uab.es/?ch=2)[4], [IC15](http://rrc.cvc.uab.es/?ch=4)[5], [SVTP](http://openaccess.thecvf.com/content_iccv_2013/papers/Phan_Recognizing_Text_with_2013_ICCV_paper.pdf)[9], and [CUTE](http://cs-chan.com/downloads_CUTE80_dataset.html)[10]
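These datasets ship in lmdb format. As a rough sketch, assuming the key layout used by deep-text-recognition-benchmark (which this repo builds on), sample `i` is stored under byte keys `image-%09d` and `label-%09d`, with the total count under `num-samples`:

```python
# Sketch of the assumed lmdb key layout: 1-indexed samples under
# zero-padded 'image-%09d' / 'label-%09d' byte keys.
def sample_keys(index):
    """Return the (image, label) lmdb keys for a 1-based sample index."""
    return b"image-%09d" % index, b"label-%09d" % index

print(sample_keys(1))  # (b'image-000000001', b'label-000000001')
```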

### Pretrained Model

Two pretrained models are provided (will be updated when better models are trained):
1. non-sensitive: includes the ten digits (0-9) and the 26 lowercase letters (a-z).
2. sensitive: includes all readable characters.

Pretrained models can be downloaded [here](https://drive.google.com/drive/folders/1niuPM6otpSQFSai8Ft2bO0lhdqEjE96Z?usp=sharing).
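For concreteness, the two character sets can be sketched with Python's `string` constants, assuming the deep-text-recognition-benchmark convention (digits plus lowercase for the non-sensitive model; all printable ASCII except whitespace for the sensitive one):

```python
import string

# Assumed character sets, following the deep-text-recognition-benchmark
# convention this repo builds on (not verified against the checkpoints).
non_sensitive_charset = string.digits + string.ascii_lowercase  # "0..9a..z"
sensitive_charset = string.printable[:-6]  # digits, letters, punctuation

print(len(non_sensitive_charset))  # 36
print(len(sensitive_charset))      # 94
```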

### Run demo
- With the non-sensitive model (`<path_to_images>` is a placeholder for your image folder)
```
python demo.py --saved_model scatter-case-non-sensitive.pth --image_folder <path_to_images>
```

- With the sensitive model
```
python demo.py --saved_model scatter-case-sensitive.pth --sensitive --image_folder <path_to_images>
```

### Training and evaluation

Download the lmdb datasets for training and evaluation provided by [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark) from [here](https://drive.google.com/drive/folders/192UfE9agQUMNq6AgU3_E05_FcPZK4hyt).

Download the additional dataset SynthText_Add (SA) for training from [here](https://drive.google.com/drive/u/1/folders/1agZ9ufDNYfzdQe1fGWH3dSk6L5BQU0o0) (includes raw images and lmdb format).

Training
```
python3 train.py --train_data data_lmdb_release/training --valid_data data_lmdb_release/validation --select_data MJ-ST-SA --batch_ratio 0.4-0.4-0.2 --sensitive
```
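To clarify the flags above: `--select_data` and `--batch_ratio` together decide how much of each training batch comes from each dataset. A small sketch, assuming the deep-text-recognition-benchmark loader this repo builds on and its default batch size of 192:

```python
# Sketch of how --select_data / --batch_ratio split a training batch:
# each listed dataset fills a fixed fraction of every batch.
def per_dataset_batch_sizes(total_batch, select_data, batch_ratio):
    names = select_data.split("-")
    ratios = [float(r) for r in batch_ratio.split("-")]
    # At least one sample per dataset, mirroring the benchmark loader.
    return {n: max(round(total_batch * r), 1) for n, r in zip(names, ratios)}

print(per_dataset_batch_sizes(192, "MJ-ST-SA", "0.4-0.4-0.2"))
# {'MJ': 77, 'ST': 77, 'SA': 38}
```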

Testing

```
python3 test.py --eval_data data_lmdb_release/evaluation --saved_model scatter-case-sensitive.pth --sensitive --data_filtering_off
```

### Reported results

- Evaluated on the evaluation sets available [here](https://drive.google.com/drive/folders/192UfE9agQUMNq6AgU3_E05_FcPZK4hyt).

- Comparison with the results reported in the original paper and by the baseline model.

| Model | IIIT5K | SVT | IC03 | IC13 | **Regular Text** | IC15 | SVTP | CUTE | **Irregular Text** |
|:---------------------:|:----------:|:-------:|:-------:|:-------:|:----------------:|:-------:|:-------:|:-------:|:------------------:|
| Paper (non-sensitive)| 93.7 | 92.7 | 96.3 | 93.9 | 94.0 | 82.2 | 86.9 | 87.5 | 83.7 |
| Baseline | 87.9 | 87.5 | 94.9 | 92.3 | 89.8 | 71.8 | 79.2 | 74.0 | 73.6 |
| Our (sensitive) | 93.5 | 90.9 | 95.0 | 93.6 | 93.4 | 78.6 | 83.4 | 83.3 | 80.0 |
| Our (non-sensitive) | 93.8 | 90.9 | 95.3 | 93.8 | 93.7 | 79.7 | 85.0 | 86.1 | 81.5 |

## Acknowledgements
This code is built upon [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark).

## Reference
[1] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. Synthetic data and artificial neural networks for natural scene text recognition. In Workshop on Deep Learning, NIPS, 2014.

[2] A. Gupta, A. Vedaldi, and A. Zisserman. Synthetic data for text localisation in natural images. In CVPR, 2016.

[3] H. Li, P. Wang, C. Shen, and G. Zhang. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI, 2019.

[4] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. De Las Heras. ICDAR 2013 robust reading competition. In ICDAR, pages 1484–1493, 2013.

[5] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, et al. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015.

[6] A. Mishra, K. Alahari, and C. Jawahar. Scene text recognition using higher order language priors. In BMVC, 2012.

[7] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In ICCV, pages 1457–1464, 2011.

[8] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young. ICDAR 2003 robust reading competitions. In ICDAR, pages 682–687, 2003.

[9] T. Q. Phan, P. Shivakumara, S. Tian, and C. L. Tan. Recognizing text with perspective distortion in natural scenes. In ICCV, pages 569–576, 2013.

[10] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan. A robust arbitrary text detection system for natural scene images. In ESWA, volume 41, pages 8027–8048, 2014.

[11] B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. In TPAMI, volume 39, pages 2298–2304, 2017.

## Citation
Please consider citing this work in your publications if it helps your research.
```
@inproceedings{litman2020scatter,
title={SCATTER: selective context attentional scene text recognizer},
author={Litman, Ron and Anschel, Oron and Tsiper, Shahar and Litman, Roee and Mazor, Shai and Manmatha, R},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={11962--11972},
year={2020}
}
```