https://github.com/Charmve/awesome-scene-text-detection

Tracking the latest progress in Scene Text Detection and Recognition: Must-read papers well organized with code and dataset

List: awesome-scene-text-detection

charmve dataset datasets detection irregular-text-recognition level-annotation ocr recognition scene-text-detection scene-text-recognition text-detection text-detection-recognition



Host: GitHub
URL: https://github.com/Charmve/awesome-scene-text-detection
Owner: Charmve
License: mit
Created: 2020-11-04T04:04:16.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2021-06-26T04:16:02.000Z (almost 4 years ago)
Last Synced: 2024-05-23T08:02:40.419Z (about 1 year ago)
Topics: charmve, dataset, datasets, detection, irregular-text-recognition, level-annotation, ocr, recognition, scene-text-detection, scene-text-recognition, text-detection, text-detection-recognition
Homepage:
Size: 15.1 MB
Stars: 80
Watchers: 5
Forks: 17
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Scene-Text-Detection

Tracking the latest progress in Scene Text Detection and Recognition: Must-read papers well organized with code and dataset.

## Table of Contents
- [1. Datasets](#1-datasets)
  - [1.1 Horizontal-Text Datasets](#11-horizontal-text-datasets)
  - [1.2 Arbitrary-Quadrilateral-Text Datasets](#12-arbitrary-quadrilateral-text-datasets)
  - [1.3 Irregular-Text Datasets](#13-irregular-text-datasets)
  - [1.4 Synthetic Datasets](#14-synthetic-datasets)
  - [1.5 Comparison of Datasets](#15-comparison-of-datasets)
- [2. Survey](#2-survey)
- [3. Evaluation](#3-evaluation)
- [4. OCR Service](#4-ocr-service)
- [5. References and Code](#5-references-and-code)

------

## ✨ News! ✨

- 2020.11.04: 21 papers from CVPR 2020 were added! See [5. References and Code](#5-references-and-code).
- 2020.10.12: A detailed survey from IJCV 2020 was added! See [2. Survey](#2-survey).

## 1. Datasets

### 1.1 Horizontal-Text Datasets

- ICDAR 2003 (IC03):
  - **Introduction:** It contains 509 images in total, 258 for training and 251 for testing. The training set contains 1110 text instances and the test set contains 1156. It has word-level annotations and only considers English text.
  - **Link:** [IC03-download](http://www.iapr-tc11.org/mediawiki/index.php?title=ICDAR_2003_Robust_Reading_Competitions)

- ICDAR 2011 (IC11):
  - **Introduction:** IC11 is an English dataset for text detection. It contains 484 images, 229 for training and 255 for testing, with 1564 text instances in total. It provides both word-level and character-level annotations.
  - **Link:** [IC11-download](http://www.cvc.uab.es/icdar2011competition/?com=downloads)

- ICDAR 2013 (IC13):
  - **Introduction:** IC13 is almost the same as IC11. It contains 462 images in total, 229 for training and 233 for testing. The training set contains 849 text instances and the test set contains 1095.
  - **Link:** [IC13-download](http://dagdata.cvc.uab.es/icdar2013competition/?ch=2&com=downloads)

### 1.2 Arbitrary-Quadrilateral-Text Datasets

- USTB-SV1K:
  - **Introduction:** USTB-SV1K is an English dataset. It contains 1000 street images from Google Street View with 2955 text instances in total. It only provides word-level annotations.
  - **Link:** [USTB-SV1K-download](http://prir.ustb.edu.cn/TexStar/MOMV-text-detection/)

- SVT:
  - **Introduction:** It contains 350 images with 725 English text instances in total. SVT has both character-level and word-level annotations. The images are harvested from Google Street View and have low resolution.
  - **Link:** [SVT-download](http://vision.ucsd.edu/~kai/grocr/)

- SVT-P:
  - **Introduction:** It contains 639 cropped word images for testing. The images were selected from side-view snapshots in Google Street View, so most of them are heavily distorted by the non-frontal view angle. It is an improved version of SVT.
  - **Link:** [SVT-P-download](https://pan.baidu.com/s/1rhYUn1mIo8OZQEGUZ9Nmrg) (Password: vnis)

- ICDAR 2015 (IC15):
  - **Introduction:** It contains 1500 images in total, 1000 for training and 500 for testing, with 17548 text instances. It provides word-level annotations. IC15 is the first incidental scene text dataset and only considers English words.
  - **Link:** [IC15-download](http://rrc.cvc.uab.es/?ch=4&com=downloads)

- COCO-Text:
  - **Introduction:** It contains 63686 images in total, 43686 for training, 10000 for validation and 10000 for testing. It contains 145859 cropped word images, including handwritten and printed, clear and blurry, English and non-English text.
  - **Link:** [COCO-Text-download](https://vision.cornell.edu/se3/coco-text-2/)

- MSRA-TD500:
  - **Introduction:** It contains 500 images in total. It provides text-line-level rather than word-level annotations, and marks text regions with polygon boxes rather than axis-aligned rectangles. It contains both English and Chinese text instances.
  - **Link:** [MSRA-TD500-download](http://pages.ucsd.edu/~ztu/Download_front.htm)

- MLT 2017:
  - **Introduction:** It contains 18000 natural images in total. It provides word-level annotations and covers 9 languages. It is a more realistic and complex dataset for scene text detection and recognition.
  - **Link:** [MLT-download](http://rrc.cvc.uab.es/?ch=8)

- MLT 2019:
  - **Introduction:** It contains 20000 images in total. It provides word-level annotations. Compared to MLT 2017, this dataset covers 10 languages. It is a more realistic and complex dataset for scene text detection and recognition.
  - **Link:** [MLT-2019-download](http://rrc.cvc.uab.es/?ch=15)

- CTW:
  - **Introduction:** It contains 32285 high-resolution street view images of Chinese text, with 1018402 character instances in total. All images are annotated at the character level, including the underlying character type, the bounding box, and 6 other attributes. These attributes indicate whether the background is complex, and whether the character is raised, hand-written or printed, occluded, distorted, or rendered as word art.
  - **Link:** [CTW-download](https://ctwdataset.github.io/)

- RCTW-17:
  - **Introduction:** It contains 12514 images in total, 11514 for training and 1000 for testing. Most images in RCTW-17 were taken with cameras or mobile phones; the rest are generated images. Text instances are annotated with parallelograms. It was the first large-scale Chinese dataset and the largest published one at the time.
  - **Link:** [RCTW-17-download](http://rctw.vlrlab.net/dataset/)

- ReCTS:
  - **Introduction:** ReCTS is a large-scale Chinese street-view trademark dataset. It is labeled at the Chinese word level and the Chinese text-line level, using arbitrary quadrilaterals. It contains 20000 images in total.
  - **Link:** [ReCTS-download](http://rrc.cvc.uab.es/?ch=12)
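
Several of the quadrilateral datasets above ship their ground truth as plain-text files. As a minimal sketch, assuming the ICDAR 2015 layout of one instance per line (eight comma-separated corner coordinates followed by the transcription, with `###` marking a "do not care" region), a line could be parsed as follows. `TextInstance` and `parse_gt_line` are hypothetical names used only for illustration, and other datasets in this list use different formats.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextInstance:
    """One annotated word: four (x, y) corners plus its transcription."""
    quad: List[Tuple[int, int]]
    text: str
    ignore: bool  # True for "do not care" regions


def parse_gt_line(line: str) -> TextInstance:
    """Parse one 'x1,y1,x2,y2,x3,y3,x4,y4,transcription' line (assumed layout)."""
    # Some ground-truth files start with a UTF-8 byte order mark; drop it.
    line = line.strip().lstrip("\ufeff")
    # Split on the first 8 commas only, so commas inside the transcription survive.
    parts = line.split(",", 8)
    if len(parts) < 9:
        raise ValueError(f"expected 8 coordinates and a transcription: {line!r}")
    coords = [int(value) for value in parts[:8]]
    quad = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    text = parts[8]
    return TextInstance(quad=quad, text=text, ignore=(text == "###"))


if __name__ == "__main__":
    # Made-up example line, not taken from any dataset file.
    print(parse_gt_line("377,117,463,117,465,130,378,130,Genaxis Theatre"))
```

Splitting with a maximum of eight splits is a deliberate choice here: transcriptions may themselves contain commas, and a plain `split(",")` would truncate them.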

### 1.3 Irregular-Text Datasets

- CUTE80:
  - **Introduction:** It contains 80 high-resolution images taken in natural scenes, with 288 cropped word images for testing. The dataset focuses on curved text. No lexicon is provided.
  - **Link:** [CUTE80-download](http://cs-chan.com/downloads_CUTE80_dataset.html)

- Total-Text:
  - **Introduction:** It contains 1,555 images in total, with 11,459 cropped word images covering more than three different text orientations: horizontal, multi-oriented and curved.
  - **Link:** [Total-Text-download](https://github.com/cs-chan/Total-Text-Dataset)

- SCUT-CTW1500:
  - **Introduction:** It contains 1500 images in total, 1000 for training and 500 for testing, with 10751 cropped word images. Annotations in CTW-1500 are polygons with 14 vertices. The dataset mainly consists of Chinese and English text.
  - **Link:** [CTW-1500-download](https://github.com/Yuliang-Liu/Curve-Text-Detector)

- LSVT:
  - **Introduction:** LSVT consists of 20,000 testing images, 30,000 fully annotated training images and 400,000 weakly annotated training images, which are referred to as partial labels. The labeled text regions demonstrate the diversity of text: horizontal, multi-oriented and curved.
  - **Link:** [LSVT-download](https://rrc.cvc.uab.es/?ch=16)

- ArT:
  - **Introduction:** ArT consists of 10,166 images, 5,603 for training and 4,563 for testing. They were collected with text shape diversity in mind, and every text shape is well represented in ArT.
  - **Link:** [ArT-download](https://rrc.cvc.uab.es/?ch=14)
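
Curved-text datasets such as SCUT-CTW1500 describe each instance with a polygon rather than a quadrilateral. The following is a minimal sketch of two common operations on such polygons: the area (via the shoelace formula) and the enclosing axis-aligned box. It assumes the vertices are already loaded as a list of `(x, y)` pairs in boundary order; the function names are hypothetical, and the on-disk annotation format of each dataset is not covered here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: List[Point]) -> float:
    """Unsigned area of a simple polygon, computed with the shoelace formula."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    twice_area = 0.0
    for i, (x1, y1) in enumerate(vertices):
        # Wrap around so the last vertex connects back to the first.
        x2, y2 = vertices[(i + 1) % len(vertices)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def bounding_box(vertices: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # A made-up 14-vertex polygon: 7 points along the top edge, 7 along the bottom.
    top = [(float(x), 0.0) for x in range(0, 70, 10)]
    bottom = [(float(x), 10.0) for x in range(60, -10, -10)]
    polygon = top + bottom
    print(polygon_area(polygon), bounding_box(polygon))
```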

### 1.4 Synthetic Datasets

- Synth80k:
  - **Introduction:** It contains 800 thousand images with approximately 8 million synthetic word instances. Each text instance is annotated with its text string and with word-level and character-level bounding boxes.
  - **Link:** [Synth80k-download](http://www.robots.ox.ac.uk/~vgg/data/scenetext/)

- SynthText:
  - **Introduction:** It contains 6 million cropped word images. The generation process is similar to that of Synth90k. It is also annotated in horizontal style.
  - **Link:** [SynthText-download](https://github.com/ankush-me/SynthText)

### 1.5 Comparison of Datasets

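| Dataset | Language | Images (Total) | Images (Train) | Images (Test) | Text instances (Total) | Text instances (Train) | Text instances (Test) | Shape: Horizontal | Shape: Arbitrary-Quadrilateral | Shape: Multi-oriented | Annotation: Char | Annotation: Word | Annotation: Text-Line |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| IC03 | English | 509 | 258 | 251 | 2266 | 1110 | 1156 | ✓ | ✕ | ✕ | ✕ | ✓ | ✕ |
| IC11 | English | 484 | 229 | 255 | 1564 | ～ | ～ | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ |
| IC13 | English | 462 | 229 | 233 | 1944 | 849 | 1095 | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ |
| USTB-SV1K | English | 1000 | 500 | 500 | 2955 | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
| SVT | English | 350 | 100 | 250 | 725 | 211 | 514 | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ |
| SVT-P | English | 238 | ～ | ～ | 639 | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
| IC15 | English | 1500 | 1000 | 500 | 17548 | 12318 | 5230 | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
| COCO-Text | English | 63686 | 43686 | 20000 | 145859 | 118309 | 27550 | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
| MSRA-TD500 | English/Chinese | 500 | 300 | 200 | ～ | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✕ | ✓ |
| MLT 2017 | Multi-lingual | 18000 | 7200 | 10800 | ～ | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
| MLT 2019 | Multi-lingual | 20000 | 10000 | 10000 | ～ | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
| CTW | Chinese | 32285 | 25887 | 6398 | 1018402 | 812872 | 205530 | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ |
| RCTW-17 | English/Chinese | 12514 | 11514 | 1000 | ～ | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✕ | ✓ |
| ReCTS | Chinese | 20000 | ～ | ～ | ～ | ～ | ～ | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ |
| CUTE80 | English | 80 | ～ | ～ | ～ | ～ | ～ | ✕ | ✕ | ✓ | ✕ | ✓ | ✓ |
| Total-Text | English | 1555 | 1255 | 300 | 9330 | ～ | ～ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ |
| CTW-1500 | English/Chinese | 1500 | 1000 | 500 | 10751 | ～ | ～ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ |
| LSVT | English/Chinese | 450000 | 430000 | 20000 | ～ | ～ | ～ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ |
| ArT | English/Chinese | 10166 | 5603 | 4563 | ～ | ～ | ～ | ✓ | ✓ | ✓ | ✕ | ✓ | ✕ |
| Synth80k | English | 80k | ～ | ～ | 8M | ～ | ～ | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ |
| SynthText | English | 800k | ～ | ～ | 6M | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |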

## 2. Survey
**[A] \[IJCV-2020]** Shangbang Long, Xin He, Cong Yao. **Scene Text Detection and Recognition: The Deep Learning Era**[J]. International Journal of Computer Vision, 2020, 1--24. [arXiv](https://arxiv.org/abs/1811.04256)

**[B] \[TPAMI-2015]** Ye Q, Doermann D. **Text detection and recognition in imagery: A survey**[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6945320)

**[C] \[Frontiers-Comput. Sci-2016]** Zhu Y, Yao C, Bai X. **Scene text detection and recognition: Recent advances and future trends**[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. [paper](https://link.springer.com/article/10.1007/s11704-015-4488-0)

**[D] \[arXiv-2018]** Long S, He X, Yao C. **Scene Text Detection and Recognition: The Deep Learning Era**[J]. arXiv preprint arXiv:1811.04256, 2018. [paper](https://arxiv.org/pdf/1811.04256.pdf)

## 3. Evaluation

If you are interested in developing better scene text detection metrics, the references recommended here might be useful.

**[A]** Wolf, Christian, and Jean-Michel Jolion. "**Object count/area graphs for the evaluation of object detection and segmentation algorithms.**" International Journal of Document Analysis and Recognition (IJDAR) 8.4 (2006): 280-296. [paper](https://link.springer.com/article/10.1007/s10032-006-0014-0)

**[B]** D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. **ICDAR 2015 competition on robust reading**. In ICDAR, pages 1156–1160, 2015. [paper](https://ieeexplore.ieee.org/document/7333942)

**[C]** Calarasanu, Stefania, Jonathan Fabrizio, and Severine Dubuisson. "**What is a good evaluation protocol for text localization systems? Concerns, arguments, comparisons and solutions.**" Image and Vision Computing 46 (2016): 1-17. [paper](https://www.sciencedirect.com/science/article/pii/S0262885615001377)

**[D]** Shi, Baoguang, et al. "**ICDAR2017 competition on reading chinese text in the wild (RCTW-17).**" 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 1. IEEE, 2017. [paper](https://ieeexplore.ieee.org/abstract/document/8270164)

**[E]** Nayef, N; Yin, F; Bizid, I; et al. **ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt**. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, 1454–1459. IEEE. [paper](https://ieeexplore.ieee.org/document/8270168)

**[F]** Dangla, Aliona, et al. "**A first step toward a fair comparison of evaluation protocols for text detection algorithms.**" 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, 2018. [paper](https://ieeexplore.ieee.org/abstract/document/8395220)

**[G]** He, Mengchao and Liu, Yuliang, et al. **ICPR2018 Contest on Robust Reading for Multi-Type Web images.** ICPR 2018. [paper](https://www.researchgate.net/publication/329316151_ICPR2018_Contest_on_Robust_Reading_for_Multi-Type_Web_Images)

**[H]** Liu, Yuliang and Jin, Lianwen, et al. "**Tightness-aware Evaluation Protocol for Scene Text Detection**" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2019. [paper](https://arxiv.org/abs/1904.00813) [code](https://github.com/Yuliang-Liu/TIoU-metric)
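
As a starting point for experimenting with detection metrics, here is a minimal sketch of IoU-based matching with precision, recall and H-mean. It assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`, greedy one-to-one matching, and an IoU threshold of 0.5. The function names are hypothetical, and the official competition scripts (which work on polygons and treat "do not care" regions specially) differ in detail.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def evaluate(gt: List[Box], det: List[Box], threshold: float = 0.5):
    """Return (precision, recall, hmean) using greedy one-to-one matching."""
    matched_gt = set()
    true_positives = 0
    for d in det:
        best_iou, best_idx = 0.0, -1
        for idx, g in enumerate(gt):
            if idx in matched_gt:
                continue  # each ground-truth box can be matched only once
            score = iou(d, g)
            if score > best_iou:
                best_iou, best_idx = score, idx
        if best_idx >= 0 and best_iou >= threshold:
            matched_gt.add(best_idx)
            true_positives += 1
    precision = true_positives / len(det) if det else 0.0
    recall = true_positives / len(gt) if gt else 0.0
    total = precision + recall
    hmean = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, hmean


if __name__ == "__main__":
    # Made-up boxes: one detection overlaps a ground-truth box, one does not.
    ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
    detections = [(0, 0, 10, 8), (50, 50, 60, 60)]
    print(evaluate(ground_truth, detections))
```

A fixed IoU threshold treats a loose box and a tight box the same once both pass the cut-off, which is one of the shortcomings that the tightness-aware protocol in [H] sets out to address.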

## 4. OCR Service

| OCR | API | Free |
| :----------------------------------------------------------: | :--: | :--: |
| [Tesseract OCR Engine](https://github.com/tesseract-ocr/tesseract) | × | √ |
| [Azure](https://azure.microsoft.com/zh-cn/services/cognitive-services/computer-vision/#Analysis) | √ | √ |
| [ABBYY](https://www.abbyy.cn/real-time-recognition-sdk/technical-specifications/) | √ | √ |
| [OCR Space](https://ocr.space/) | √ | √ |
| [SODA PDF OCR](https://www.sodapdf.com/ocr-pdf/) | √ | √ |
| [Free Online OCR](https://www.newocr.com/) | √ | √ |
| [Online OCR](https://www.onlineocr.net/) | √ | √ |
| [Super Tools](https://www.wdku.net/) | √ | √ |
| [Online Chinese Recognition](http://chongdata.com/ocr/) | √ | √ |
| [Calamari OCR](https://github.com/Calamari-OCR/calamari) | × | √ |
| [Tencent OCR](https://cloud.tencent.com/product/ocr?lang=cn) | √ | × |

## 5. References and Code

## Scene Text Detection

### Deep Relational Reasoning Graph Network for Arbitrary-Shape Text Detection

[1]. Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection

Authors | Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Chang Liu, Chun Yang, Hongfa Wang, Xu-Cheng Yin

Affiliations | University of Science and Technology Beijing; Joint Laboratory of Artificial Intelligence, University of Science and Technology of China; Tencent Technology (Shenzhen)

Code | https://github.com/GXYM/DRRG

Note | CVPR 2020 Oral

Explainer | https://blog.csdn.net/SpicyCoder/article/details/105072570

[2]. ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

Authors | Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Mengting Xing, Zilong Fu, Yongdong Zhang

Affiliations | University of Science and Technology of China

Code | https://github.com/wangyuxin87/ContourNet

Explainer | https://zhuanlan.zhihu.com/p/135399747

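## Scene Text Recognition

### On Vocabulary Reliance in Scene Text Recognition

[3]. On Vocabulary Reliance in Scene Text Recognition

Authors | Zhaoyi Wan, Jielei Zhang, Liang Zhang, Jiebo Luo, Cong Yao

Affiliations | Megvii; China University of Mining and Technology; University of Rochester

[4]. SCATTER: Selective Context Attentional Scene Text Recognizer

Authors | Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, R. Manmatha

Affiliations | Amazon Web Services

### Semantic Reasoning Networks for Accurate Scene Text Recognition

[5]. Towards Accurate Scene Text Recognition With Semantic Reasoning Networks

Authors | Deli Yu, Xuan Li, Chengquan Zhang, Tao Liu, Junyu Han, Jingtuo Liu, Errui Ding

Affiliations | University of Chinese Academy of Sciences; Baidu; Chinese Academy of Sciences

Code | https://github.com/chenjun2hao/SRN.pytorch

### Semantics-Enhanced Encoder-Decoder Framework for Recognizing Scene Text in Low-Quality Images (Blur, Uneven Illumination, Incomplete Characters, etc.)

[6]. SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

Authors | Zhi Qiao, Yu Zhou, Dongbao Yang, Yucan Zhou, Weiping Wang

Affiliations | Chinese Academy of Sciences; University of Chinese Academy of Sciences

Code | https://github.com/Pay20Y/SEED (coming soon)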

## Handwritten Text Recognition

[7]. OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold

Authors | Mohamed Yousef, Tom E. Bishop

Affiliations | Intuition Machines, Inc

Code | https://github.com/IntuitionMachines/OrigamiNet

## Scene Text Spotting

### Real-Time End-to-End Scene Text Recognition

[8]. ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network

Authors | Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang

Affiliations | South China University of Technology; University of Adelaide

Code | https://github.com/Yuliang-Liu/bezier_curve_text_spotting

Note | CVPR 2020 Oral

Explainer | https://zhuanlan.zhihu.com/p/146276834

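## Handwritten Text Generation

Semi-supervised generation of variable-length handwritten text, used to enlarge text datasets and improve the accuracy of recognition algorithms.

[9]. ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation

Authors | Sharon Fogel, Hadar Averbuch-Elor, Sarel Cohen, Shai Mazor, Roee Litman

Affiliations | Amazon Rekognition, Israel; Cornell University

Code | https://github.com/amzn/convolutional-handwriting-gan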

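## Scene Text Synthesis

Synthesizes scene text with a rendering engine to add training samples and improve the accuracy of recognition algorithms.

[10]. UnrealText: Synthesizing Realistic Scene Text Images From the Unreal World

Authors | Shangbang Long, Cong Yao

Affiliations | Carnegie Mellon University; Megvii

Code | https://jyouhou.github.io/UnrealText/

Explainer | https://zhuanlan.zhihu.com/p/137406773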

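## Data Augmentation + Text Recognition

Image augmentation for handwritten and scene text recognition.

[11]. Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Authors | Canjie Luo, Yuanzhi Zhu, Lianwen Jin, Yongpan Wang

Affiliations | South China University of Technology; Alibaba

Code | https://github.com/Canjie-Luo/Text-Image-Augmentation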

## Scene Text Editing

[12]. STEFANN: Scene Text Editor Using Font Adaptive Neural Network

Authors | Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal

Affiliations | Indian Statistical Institute; Indian Institute of Technology

Code | https://github.com/prasunroy/stefann

Website | https://prasunroy.github.io/stefann/

## Shredded Document Reconstruction

Reconstructs documents from shredded paper, for forensic and criminal investigations.

[13]. Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning

Authors | Thiago M. Paixao, Rodrigo F. Berriel, Maria C. S. Boeres, Alessandro L. Koerich, Claudine Badue, Alberto F. De Souza, Thiago Oliveira-Santos

Affiliations | IFES, Brazil; UFES, Brazil; ETS, Canada

## Text Style Transfer

[14]. SwapText: Image Based Texts Transfer in Scenes

Authors | Qiangpeng Yang, Jun Huang, Wei Lin

Affiliations | Alibaba

## Scene Text Recognition + Adversarial Attacks

[15]. What Machines See Is Not What They Get: Fooling Scene Text Recognition Models With Adversarial Text Images

Authors | Xing Xu, Jiefu Chen, Jinhui Xiao, Lianli Gao, Fumin Shen, Heng Tao Shen

Affiliations | University of Electronic Science and Technology of China

## Handwriting Verification

[16]. Sequential Motif Profiles and Topological Plots for Offline Signature Verification

Authors | Elias N. Zois, Evangelos Zervas, Dimitrios Tsourounis, George Economou

Affiliations | University of West Attica; University of Patras

# Contribute & Acknowledge

## Contributing

Feel free to dive in! [Open an issue](https://github.com/Charmve/Scene-Text-Detection/issues/new) or submit PRs.

## Acknowledge

This project exists thanks to all the people who contribute.

My sincere thanks also go to @HCIILAB and @Jyouhou.

# License

# Copyright

*Last updated in July 2020.*
