https://github.com/HCIILAB/Scene-Text-Detection

Host: GitHub
URL: https://github.com/HCIILAB/Scene-Text-Detection
Owner: HCIILAB
Created: 2019-05-15T09:25:49.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2023-09-07T05:15:06.000Z (about 2 years ago)
Last Synced: 2024-11-03T09:33:37.708Z (about 1 year ago)
Size: 115 KB
Stars: 540
Watchers: 32
Forks: 129
Open Issues: 2

# Scene Text Detection Resources

Author: Chongyu Liu


# Updates

Dec 4, 2020: Add 2 papers from CVPR2020/ECCV2020 and update corresponding tables.

------

- [1. Datasets](#1-datasets)

    - [1.1 Horizontal-Text Datasets](#11-Horizontal-Text-Datasets)

    - [1.2 Arbitrary-Quadrilateral-Text Datasets](#12-Arbitrary-Quadrilateral-Text-Datasets)

    - [1.3 Irregular-Text Datasets](#13-Irregular-Text-Datasets)

    - [1.4 Synthetic Datasets](#14-synthetic-datasets)

    - [1.5 Comparison of Datasets](#15-comparison-of-datasets)

- [2. Summary of Scene Text Detection Resources](#2-summary-of-scene-text-detection-resources)

    - [2.1 Comparison of Methods](#21-comparison-of-methods)

        - [2.1.1 Traditional Methods](#211-traditional-methods)

        - [2.1.2 Segmentation-based Methods](#212-segmentation-based-methods)

        - [2.1.3 Regression-based Methods](#213-regression-based-methods)

        - [2.1.4 Hybrid Methods](#214-hybrid-methods)

    - [2.2 Detection Results](#22-detection-results)

        - [2.2.1 Detection Results on Horizontal-Text Datasets](#221-Detection-Results-on-Horizontal-Text-Datasets)

        - [2.2.2 Detection Results on Arbitrary-Quadrilateral-Text Datasets](#222-Detection-Results-on-Arbitrary-Quadrilateral-Text-Datasets)

        - [2.2.3 Detection Results on Irregular-Text Datasets](#223-Detection-Results-on-Irregular-Text-Datasets)

- [3. Survey](#3-survey)

- [4. Evaluation](#4-Evaluation)

- [5. OCR Service](#5-ocr-service)

- [6. References and Code](#6-references)

------



## 1. Datasets



### 1.1 Horizontal-Text Datasets

- ICDAR 2003 (IC03):

  * **Introduction:** It contains 509 images in total, 258 for training and 251 for testing. The training set contains 1110 text instances and the test set contains 1156. It has word-level annotations. IC03 only considers English text.

  * **Link:** [IC03-download](http://www.iapr-tc11.org/mediawiki/index.php?title=ICDAR_2003_Robust_Reading_Competitions)

- ICDAR 2011 (IC11):

  * **Introduction:** IC11 is an English dataset for text detection. It contains 484 images, 229 for training and 255 for testing, with 1564 text instances in total. It provides both word-level and character-level annotations.

  * **Link:** [IC11-download](http://www.cvc.uab.es/icdar2011competition/?com=downloads)   

- ICDAR 2013 (IC13):

  * **Introduction:** IC13 is almost the same as IC11. It contains 462 images in total, 229 for training and 233 for testing. The training set contains 849 text instances and the test set contains 1095.

  * **Link:** [IC13-download](http://dagdata.cvc.uab.es/icdar2013competition/?ch=2&com=downloads)



### 1.2 Arbitrary-Quadrilateral-Text Datasets

- USTB-SV1K:

  * **Introduction:** USTB-SV1K is an English dataset. It contains 1000 street images from Google Street View with 2955 text instances in total. It only provides word-level annotations.

  * **Link:** [USTB-SV1K-download](http://prir.ustb.edu.cn/TexStar/MOMV-text-detection/)

- SVT:

  * **Introduction:** It contains 350 images with 725 English text instances in total. SVT has both character-level and word-level annotations. The images of SVT are harvested from Google Street View and have low resolution.

  * **Link:** [SVT-download](http://vision.ucsd.edu/~kai/grocr/)

- SVT-P:

  - **Introduction:** It contains 639 cropped word images for testing. Images were selected from side-view snapshots in Google Street View, so most of them are heavily distorted by the non-frontal viewing angle. It is an improved version of the SVT dataset.

  - **Link:** [SVT-P-download](https://pan.baidu.com/s/1rhYUn1mIo8OZQEGUZ9Nmrg )  \(Password : vnis)

- ICDAR 2015 (IC15):

  - **Introduction:** It contains 1500 images in total, 1000 for training and 500 for testing, with 17548 text instances. It provides word-level annotations. IC15 is the first incidental scene text dataset and it only considers English words.

  - **Link:** [IC15-download](http://rrc.cvc.uab.es/?ch=4&com=downloads)

- COCO-Text:

  - **Introduction:** It contains 63686 images in total, 43686 for training, 10000 for validation and 10000 for testing. Specifically, it contains 145859 text instances, including handwritten and printed, clear and blurry, English and non-English text.

  - **Link:** [COCO-Text-download](https://vision.cornell.edu/se3/coco-text-2/)

- MSRA-TD500:

  - **Introduction:** It contains 500 images in total. It provides text-line-level rather than word-level annotations, and polygon boxes rather than axis-aligned rectangles for text region annotation. It contains both English and Chinese text instances.

  - **Link:** [MSRA-TD500-download](http://pages.ucsd.edu/~ztu/Download_front.htm)

- MLT 2017:

  - **Introduction:** It contains 10000 natural images in total. It provides word-level annotations. MLT covers 9 languages. It is a more realistic and complex dataset for scene text detection and recognition.

  - **Link:** [MLT-download](http://rrc.cvc.uab.es/?ch=8)

- MLT 2019:

  - **Introduction:** It contains 18000 images in total. It provides word-level annotations. Compared to MLT 2017, this dataset covers 10 languages. It is a more realistic and complex dataset for scene text detection and recognition.

  - **Link:** [MLT-2019-download](http://rrc.cvc.uab.es/?ch=15)

- CTW:

  - **Introduction:** It contains 32285 high-resolution street view images of Chinese text, with 1018402 character instances in total. All images are annotated at the character level, including each character's underlying type, bounding box, and 6 other attributes. These attributes indicate whether the background is complex and whether the character is raised, hand-written or printed, occluded, distorted, or rendered as word art.

  - **Link:** [CTW-download](https://ctwdataset.github.io/)

- RCTW-17:

  - **Introduction:** It contains 12514 images in total, 11514 for training and 1000 for testing. Images in RCTW-17 were mostly collected by camera or mobile phone, and the others were generated images. Text instances are annotated with parallelograms. It is the first large-scale Chinese dataset, and was also the largest published one at the time.

  - **Link:** [RCTW-17-download](http://rctw.vlrlab.net/dataset/)

- ReCTS:

  - **Introduction:** This is a large-scale Chinese street-view trademark dataset. It provides labels for Chinese words and Chinese text lines, annotated with arbitrary quadrilaterals. It contains 20000 images in total.

  - **Link:** [ReCTS-download](http://rrc.cvc.uab.es/?ch=12)
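
The CTW entry above lists what each character-level annotation carries: the character itself, its bounding box, and six yes/no attributes. One way to picture such a record is the Python sketch below. The class name, the field names and the (x, y, width, height) box layout are assumptions made for illustration; they are not the dataset's actual schema, so check the CTW documentation before parsing real files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CharAnnotation:
    """Hypothetical record for one character-level annotation (not the CTW schema)."""

    text: str                                 # the annotated character
    bbox: Tuple[float, float, float, float]   # assumed layout: x, y, width, height
    complex_background: bool = False
    raised: bool = False
    handwritten: bool = False                 # False means printed
    occluded: bool = False
    distorted: bool = False
    word_art: bool = False

    def active_attributes(self) -> List[str]:
        """Return the names of the six attributes that are set to True."""
        names = (
            "complex_background", "raised", "handwritten",
            "occluded", "distorted", "word_art",
        )
        return [name for name in names if getattr(self, name)]


# Example: an occluded, hand-written character in a 32x32 box.
ann = CharAnnotation(text="店", bbox=(10.0, 20.0, 32.0, 32.0),
                     handwritten=True, occluded=True)
print(ann.active_attributes())
```

Keeping the six attributes as separate boolean fields makes it easy to filter a dataset, for example to evaluate a detector only on occluded characters.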



### 1.3 Irregular-Text Datasets

- CUTE80:

  - **Introduction:** It contains 80 high-resolution images taken in natural scenes. Specifically, it contains 288 cropped word images for testing. The dataset focuses on curved text. No lexicon is provided.

  - **Link:** [CUTE80-download](http://cs-chan.com/downloads_CUTE80_dataset.html)

- Total-Text:

  - **Introduction:** It contains 1,555 images in total. Specifically, it contains 11,459 cropped word images with more than three different text orientations: horizontal, multi-oriented and curved.

  - **Link:** [Total-Text-download](https://github.com/cs-chan/Total-Text-Dataset)

- SCUT-CTW1500:

  - **Introduction:** It contains 1500 images in total, 1000 for training and 500 for testing. Specifically, it contains 10751 cropped word images for testing. Annotations in CTW-1500 are polygons with 14 vertices. The text is mainly Chinese and English.

  - **Link:** [CTW-1500-download](https://github.com/Yuliang-Liu/Curve-Text-Detector)

- LSVT:

  - **Introduction:** LSVT consists of 20,000 test images, 30,000 training images with full annotations and 400,000 training images with weak annotations, which are referred to as partial labels. The labeled text regions demonstrate the diversity of text: horizontal, multi-oriented and curved.

  - **Link:** [LSVT-download](https://rrc.cvc.uab.es/?ch=16)

- ArT:

  - **Introduction:** ArT consists of 10,166 images, 5,603 for training and 4,563 for testing. They were collected with text shape diversity in mind, and every text shape is well represented in ArT.

  - **Link:** [ArT-download](https://rrc.cvc.uab.es/?ch=14)
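
Sections 1.1 to 1.3 group datasets by the shape of their text annotations: horizontal boxes, arbitrary quadrilaterals, and irregular (curved) polygons. A minimal sketch of that grouping, applied to a single annotation polygon, is shown below. The function name `classify_text_shape`, the tolerance parameter and the vertex-count rule are assumptions for illustration only; each dataset defines its own annotation format.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def classify_text_shape(vertices: List[Point], tol: float = 1e-6) -> str:
    """Map one annotation polygon to 'horizontal', 'quadrilateral' or 'irregular'."""
    if len(vertices) < 4:
        raise ValueError("a text region needs at least four vertices")
    if len(vertices) > 4:
        # More than four vertices, e.g. the 14-vertex polygons of CTW-1500.
        return "irregular"
    xs = sorted(x for x, _ in vertices)
    ys = sorted(y for _, y in vertices)
    # An axis-aligned rectangle has two x values and two y values, each shared
    # by two vertices. Degenerate (zero-area) inputs are not checked here.
    axis_aligned = (
        abs(xs[0] - xs[1]) <= tol and abs(xs[2] - xs[3]) <= tol
        and abs(ys[0] - ys[1]) <= tol and abs(ys[2] - ys[3]) <= tol
    )
    return "horizontal" if axis_aligned else "quadrilateral"


print(classify_text_shape([(0, 0), (10, 0), (10, 5), (0, 5)]))     # horizontal
print(classify_text_shape([(0, 0), (10, 2), (9, 7), (-1, 5)]))     # quadrilateral
print(classify_text_shape([(i, (i % 2) * 3) for i in range(14)]))  # irregular
```

A detector evaluated on an irregular-text dataset has to output polygons of this third kind, which is why the method tables in Section 2 track horizontal, quadrilateral and irregular support separately.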



### 1.4 Synthetic Datasets

* Synth80k:

  * **Introduction:** It contains 800 thousand images with approximately 8 million synthetic word instances. Each text instance is annotated with its text string and with word-level and character-level bounding boxes.

  * **Link:** [Synth80k-download](http://www.robots.ox.ac.uk/~vgg/data/scenetext/)

* SynthText:

  * **Introduction:** It contains 6 million cropped word images. The generation process is similar to that of Synth90k. It is also annotated in horizontal style.

  * **Link:** [SynthText-download](https://github.com/ankush-me/SynthText)



### 1.5 Comparison of Datasets

	

	

	

	

	

	

	

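| Dataset | Language | Images: Total | Images: Train | Images: Test | Text instances: Total | Text instances: Train | Text instances: Test | Shape: Horizontal | Shape: Arbitrary-Quadrilateral | Shape: Multi-oriented | Annotation: Char | Annotation: Word | Annotation: Text-Line |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IC03 | English | 509 | 258 | 251 | 2266 | 1110 | 1156 | ✓ | ✕ | ✕ | ✕ | ✓ | ✕ |
| IC11 | English | 484 | 229 | 255 | 1564 | ～ | ～ | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ |
| IC13 | English | 462 | 229 | 233 | 1944 | 849 | 1095 | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ |
| USTB-SV1K | English | 1000 | 500 | 500 | 2955 | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
| SVT | English | 350 | 100 | 250 | 725 | 211 | 514 | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ |
| SVT-P | English | 238 | ～ | ～ | 639 | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
| IC15 | English | 1500 | 1000 | 500 | 17548 | 12318 | 5230 | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
| COCO-Text | English | 63686 | 43686 | 20000 | 145859 | 118309 | 27550 | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
| MSRA-TD500 | English/Chinese | 500 | 300 | 200 | ～ | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✕ | ✓ |
| MLT 2017 | Multi-lingual | 18000 | 7200 | 10800 | ～ | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
| MLT 2019 | Multi-lingual | 20000 | 10000 | 10000 | ～ | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
| CTW | Chinese | 32285 | 25887 | 6398 | 1018402 | 812872 | 205530 | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ |
| RCTW-17 | English/Chinese | 12514 | 11514 | 1000 | ～ | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✕ | ✓ |
| ReCTS | Chinese | 20000 | ～ | ～ | ～ | ～ | ～ | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ |
| CUTE80 | English | 80 | ～ | ～ | ～ | ～ | ～ | ✕ | ✕ | ✓ | ✕ | ✓ | ✓ |
| Total-Text | English | 1525 | 1225 | 300 | 9330 | ～ | ～ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ |
| CTW-1500 | English/Chinese | 1500 | 1000 | 500 | 10751 | ～ | ～ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ |
| LSVT | English/Chinese | 450000 | 430000 | 20000 | ～ | ～ | ～ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ |
| ArT | English/Chinese | 10166 | 5603 | 4563 | ～ | ～ | ～ | ✓ | ✓ | ✓ | ✕ | ✓ | ✕ |
| Synth80k | English | 80k | ～ | ～ | 8m | ～ | ～ | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ |
| SynthText | English | 800k | ～ | ～ | 6m | ～ | ～ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |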

	



## 2. Summary of Scene Text Detection Resources



### 2.1 Comparison of Methods

Scene text detection methods can be divided into four categories:

 **(a) Traditional methods;**

 **(b) Segmentation-based methods;**

 **(c) Regression-based methods;**

 **(d) Hybrid methods.**

Note that: (1) "Hori" stands for horizontal scene text datasets. (2) "Quad" stands for arbitrary-quadrilateral-text datasets. (3) "Irreg" stands for irregular scene text datasets. (4) "Traditional method" stands for methods that don't rely on deep learning.



#### 2.1.1 Traditional Methods

	

	

	

	

| Method | Model | Code | Hori | Quad | Irreg | Source | Time | Highlight |
|---|---|---|---|---|---|---|---|---|
| Yao et al. [1] | TD-Mixture | ✕ | ✓ | ✓ | ✕ | CVPR | 2012 | 1) A new dataset, MSRA-TD500, and an evaluation protocol. 2) A two-level classification scheme and two sets of feature extractors. |
| Yin et al. [2] | | ✕ | ✓ | ✕ | ✕ | TPAMI | 2013 | Extracts Maximally Stable Extremal Regions (MSERs) as character candidates and groups them together. |
| Le et al. [5] | HOCC | ✕ | ✓ | ✓ | ✕ | CVPR | 2014 | HOCC + MSERs |
| Yin et al. [7] | | ✕ | ✓ | ✓ | ✕ | TPAMI | 2015 | Presenting a unified distance metric learning framework for adaptive hierarchical clustering. |
| Wu et al. [9] | | ✕ | ✓ | ✓ | ✕ | TMM | 2015 | Exploring gradient directional symmetry at component level for smoothing edge components before text detection. |
| Tian et al. [17] | | ✕ | ✓ | ✕ | ✕ | IJCAI | 2016 | Scene text is first detected locally in individual frames and finally linked by an optimal tracking trajectory. |
| Yang et al. [33] | | ✕ | ✓ | ✓ | ✕ | TIP | 2017 | A text detector locates character candidates and extracts text regions, which are then linked by an optimal tracking trajectory. |
| Liang et al. [8] | | ✕ | ✓ | ✓ | ✓ | TIP | 2015 | Exploring maximally stable extremal regions along with the stroke width transform for detecting candidate text regions. |
| Michal et al. [12] | FASText | ✕ | ✓ | ✓ | ✕ | ICCV | 2015 | Stroke keypoints are efficiently detected and then exploited to obtain stroke segmentations. |

	



#### 2.1.2 Segmentation-based Methods

	

	

	

	

| Method | Model | Code | Hori | Quad | Irreg | Source | Time | Highlight |
|---|---|---|---|---|---|---|---|---|
| Li et al. [3] | | ✕ | ✓ | ✓ | ✕ | TIP | 2014 | (1) Develops three novel cues tailored for character detection and a Bayesian method for their integration; (2) designs a Markov random field model to exploit the inherent dependencies between characters. |
| Zhang et al. [14] | | ✕ | ✓ | ✓ | ✕ | CVPR | 2016 | Utilizing an FCN for salient map detection and for predicting the centroid of each character. |
| Zhu et al. [16] | | ✕ | ✓ | ✓ | ✕ | CVPR | 2016 | Performs a graph-based segmentation of connected components into words (Word-Graph). |
| He et al. [18] | Text-CNN | ✕ | ✓ | ✓ | ✕ | TIP | 2016 | Developing a new learning mechanism to train the Text-CNN with multi-level and rich supervised information. |
| Yao et al. [21] | | ✕ | ✓ | ✓ | ✕ | arXiv | 2016 | Proposing to localize text in a holistic manner, by casting scene text detection as a semantic segmentation problem. |
| Hu et al. [27] | WordSup | ✕ | ✓ | ✓ | ✕ | ICCV | 2017 | Proposing a weakly supervised framework that can utilize word annotations. The detected characters are then fed to a text structure analysis module. |
| Wu et al. [28] | | ✕ | ✓ | ✓ | ✕ | ICCV | 2017 | Introducing the border class to the text detection problem for the first time, and validating that the decoding process is largely simplified with the help of the text border. |
| Tang et al. [32] | | ✕ | ✓ | ✕ | ✕ | TIP | 2017 | A text-aware candidate text region (CTR) extraction model + CTR refinement model. |
| Dai et al. [35] | FTSN | ✕ | ✓ | ✓ | ✕ | arXiv | 2017 | Detecting and segmenting the text instance jointly and simultaneously, leveraging merits from both the semantic segmentation task and the region-proposal-based object detection task. |
| Wang et al. [38] | | ✕ | ✓ | ✕ | ✕ | ICDAR | 2017 | Proposing a novel character candidate extraction method based on super-pixel segmentation and hierarchical clustering. |
| Deng et al. [40] | PixelLink | ✓ | ✓ | ✓ | ✕ | AAAI | 2018 | Text instances are first segmented out by linking pixels within the same instance together. |
| Liu et al. [42] | MCN | ✕ | ✓ | ✓ | ✕ | CVPR | 2018 | Stochastic Flow Graph (SFG) + Markov Clustering. |
| Lyu et al. [43] | | ✕ | ✓ | ✓ | ✕ | CVPR | 2018 | Detects scene text by localizing corner points of text bounding boxes and segmenting text regions in relative positions. |
| Chu et al. [45] | Border | ✕ | ✓ | ✓ | ✕ | ECCV | 2018 | Presenting a novel scene text detection technique that makes use of semantics-aware text borders and bootstrapping-based text segment augmentation. |
| Long et al. [46] | TextSnake | ✕ | ✓ | ✓ | ✓ | ECCV | 2018 | Proposing TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms based on the symmetry axis. |
| Yang et al. [47] | IncepText | ✕ | ✓ | ✓ | ✕ | IJCAI | 2018 | Designing a novel Inception-Text module and introducing deformable PSROI pooling to deal with multi-oriented text detection. |
| Yue et al. [48] | | ✕ | ✓ | ✓ | ✕ | BMVC | 2018 | Proposing a general framework for text detection called Guided CNN to achieve the two goals simultaneously. |
| Zhong et al. [53] | AF-RPN | ✕ | ✓ | ✓ | ✕ | arXiv | 2018 | Presenting AF-RPN, an anchor-free and scale-friendly region proposal network for the Faster R-CNN framework. |
| Wang et al. [54] | PSENet | ✓ | ✓ | ✓ | ✓ | CVPR | 2019 | Proposing a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance. |
| Xu et al. [57] | TextField | ✕ | ✓ | ✓ | ✓ | arXiv | 2018 | Presenting a novel direction field which can represent scene texts of arbitrary shapes. |
| Tian et al. [58] | FTDN | ✕ | ✓ | ✓ | ✕ | ICIP | 2018 | FTDN is able to segment text regions and simultaneously regress text boxes at pixel level. |
| Tian et al. [83] | | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | Constraining the embedding features of pixels inside the same text region to share similar properties. |
| Huang et al. [4] | MSERs-CNN | ✕ | ✓ | ✕ | ✕ | ECCV | 2014 | Combining MSERs with CNN. |
| Sun et al. [6] | | ✕ | ✓ | ✕ | ✕ | PR | 2015 | Presenting a robust text detection approach based on color-enhanced CER and neural networks. |
| Baek et al. [62] | CRAFT | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | Proposing CRAFT, which effectively detects text areas by exploring each character and the affinity between characters. |
| Richardson et al. [87] | | ✕ | ✓ | ✓ | ✕ | WACV | 2019 | Presenting an additional scale predictor to estimate a better scale of text regions for testing. |
| Wang et al. [88] | SAST | ✕ | ✓ | ✓ | ✓ | ACM MM | 2019 | Presenting a context-attended multi-task learning framework for scene text detection. |
| Wang et al. [90] | PAN | ✕ | ✓ | ✓ | ✓ | ICCV | 2019 | Proposing an efficient and accurate arbitrary-shaped text detector called Pixel Aggregation Network (PAN). |

	



#### 2.1.3 Regression-based Methods

	

	

	

  

| Method | Model | Code | Hori | Quad | Irreg | Source | Time | Highlight |
|---|---|---|---|---|---|---|---|---|
| Gupta et al. [15] | FCRN | ✓ | ✓ | ✕ | ✕ | CVPR | 2016 | (a) Proposing a fast and scalable engine to generate synthetic images of text in clutter; (b) FCRN. |
| Zhong et al. [20] | DeepText | ✕ | ✓ | ✕ | ✕ | arXiv | 2016 | (a) Inception-RPN; (b) Utilizing ambiguous text category (ATC) information and multilevel region-of-interest pooling (MLRP). |
| Liao et al. [22] | TextBoxes | ✓ | ✓ | ✕ | ✕ | AAAI | 2017 | Mainly based on the SSD object detection framework. |
| Liu et al. [25] | DMPNet | ✕ | ✓ | ✓ | ✕ | CVPR | 2017 | Quadrilateral sliding windows + shared Monte-Carlo method for fast and accurate computing of the polygonal areas + a sequential protocol for relative regression. |
| He et al. [26] | DDR | ✕ | ✓ | ✓ | ✕ | ICCV | 2017 | Proposing an FCN that has bi-task outputs, where one is pixel-wise classification between text and non-text, and the other is direct regression to determine the vertex coordinates of quadrilateral text boundaries. |
| Jiang et al. [36] | R2CNN | ✕ | ✓ | ✓ | ✕ | arXiv | 2017 | Using the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose texts with different orientations. |
| Xing et al. [37] | ArbiText | ✕ | ✓ | ✓ | ✕ | arXiv | 2017 | Adopting circle anchors and incorporating a pyramid pooling module into the Single Shot MultiBox Detector framework. |
| Zhang et al. [39] | FEN | ✕ | ✓ | ✕ | ✕ | AAAI | 2018 | Proposing a refined scene text detector with a novel Feature Enhancement Network (FEN) for Region Proposal and Text Detection Refinement. |
| Wang et al. [41] | ITN | ✕ | ✓ | ✓ | ✕ | CVPR | 2018 | ITN is presented to learn the geometry-aware representation encoding the unique geometric configurations of scene text instances with in-network transformation embedding. |
| Liao et al. [44] | RRD | ✕ | ✓ | ✓ | ✕ | CVPR | 2018 | The regression branch extracts rotation-sensitive features, while the classification branch extracts rotation-invariant features by pooling the rotation-sensitive features. |
| Liao et al. [49] | TextBoxes++ | ✓ | ✓ | ✓ | ✕ | TIP | 2018 | Mainly based on the SSD object detection framework; it replaces the rectangular box representation of a conventional object detector with a quadrilateral or oriented rectangle representation. |
| He et al. [50] | | ✕ | ✓ | ✓ | ✕ | TIP | 2018 | Proposing a scene text detection framework based on a fully convolutional network with a bi-task prediction module. |
| Ma et al. [51] | RRPN | ✓ | ✓ | ✓ | ✕ | TMM | 2018 | RRPN + RRoI Pooling. |
| Zhu et al. [55] | SLPR | ✕ | ✓ | ✓ | ✓ | arXiv | 2018 | SLPR regresses multiple points on the edge of the text line and then utilizes these points to sketch the outline of the text. |
| Deng et al. [56] | | ✓ | ✓ | ✓ | ✕ | arXiv | 2018 | CRPN employs corners to estimate the possible locations of text instances. It also designs an embedded data augmentation module inside the region-wise subnetwork. |
| Cai et al. [59] | FFN | ✕ | ✓ | ✕ | ✕ | ICIP | 2018 | Proposing a Feature Fusion Network to deal with text regions that differ enormously in size. |
| Sabyasachi et al. [60] | RGC | ✕ | ✓ | ✓ | ✕ | ICIP | 2018 | Proposing a novel recurrent architecture to improve the learnings of a feature map at a given time. |
| Liu et al. [63] | CTD | ✓ | ✓ | ✓ | ✓ | PR | 2019 | CTD + TLOC + PNMS |
| Xie et al. [79] | DeRPN | ✓ | ✓ | ✕ | ✕ | AAAI | 2019 | DeRPN utilizes an anchor string mechanism instead of the anchor box in RPN. |
| Wang et al. [82] | | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | Text-RPN + RNN |
| Liu et al. [84] | | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | CSE mechanism |
| He et al. [29] | SSTD | ✓ | ✓ | ✓ | ✕ | ICCV | 2017 | Proposing an attention mechanism, then developing a hierarchical inception module which efficiently aggregates multi-scale inception features. |
| Tian et al. [11] | | ✕ | ✓ | ✕ | ✕ | ICCV | 2015 | Cascade boosting detects character candidates, and a min-cost flow network model gets the final result. |
| Tian et al. [13] | CTPN | ✓ | ✓ | ✕ | ✕ | ECCV | 2016 | 1) RPN + LSTM. 2) The RPN incorporates a new vertical anchor mechanism and the LSTM connects the regions to get the final result. |
| He et al. [19] | | ✕ | ✓ | ✓ | ✕ | ACCV | 2016 | An ER detector detects regions to get a coarse prediction of text regions. Then the local context is aggregated to classify the remaining regions to obtain a final prediction. |
| Shi et al. [23] | SegLink | ✓ | ✓ | ✓ | ✕ | CVPR | 2017 | Decomposing text into segments and links. A link connects two adjacent segments. |
| Tian et al. [30] | WeText | ✕ | ✓ | ✕ | ✕ | ICCV | 2017 | Proposing a weakly supervised scene text detection method (WeText). |
| Zhu et al. [31] | RTN | ✕ | ✓ | ✕ | ✕ | ICDAR | 2017 | Mainly based on the CTPN vertical proposal mechanism. |
| Ren et al. [34] | | ✕ | ✓ | ✕ | ✕ | TMM | 2017 | Proposing a CNN-based detector. It contains a text structure component detector layer, a spatial pyramid layer, and a multi-input-layer deep belief network (DBN). |
| Zhang et al. [10] | | ✕ | ✓ | ✕ | ✕ | CVPR | 2015 | The proposed algorithm exploits the symmetry property of character groups and allows for direct extraction of text lines from natural images. |
| Wang et al. [86] | DSRN | ✕ | ✓ | ✓ | ✕ | IJCAI | 2019 | Presenting a scale-transfer module and a scale relationship module to handle the problem of scale variation. |
| Tang et al. [89] | Seglink++ | ✕ | ✓ | ✓ | ✓ | PR | 2019 | Presenting instance-aware component grouping (ICG) for arbitrary-shape text detection. |
| Wang et al. [92] | ContourNet | ✓ | ✓ | ✓ | ✓ | CVPR | 2020 | 1. A scale-insensitive Adaptive Region Proposal Network (AdaptiveRPN); 2. Local Orthogonal Texture-aware Module (LOTM). |

	



#### 2.1.4 Hybrid Methods

	

	

	

	

  

| Method | Model | Code | Hori | Quad | Irreg | Source | Time | Highlight |
|---|---|---|---|---|---|---|---|---|
| Tang et al. [52] | SSFT | ✕ | ✓ | ✕ | ✕ | TMM | 2018 | Proposing a novel scene text detection method that involves superpixel-based stroke feature transform (SSFT) and deep-learning-based region classification (DLRC). |
| Xie et al. [61] | SPCNet | ✕ | ✓ | ✓ | ✓ | AAAI | 2019 | Text Context module + Re-Score mechanism. |
| Liu et al. [64] | PMTD | ✓ | ✓ | ✓ | ✕ | arXiv | 2019 | Performs “soft” semantic segmentation. It assigns a soft pyramid label (i.e., a real value between 0 and 1) to each pixel within a text instance. |
| Liu et al. [80] | BDN | ✓ | ✓ | ✓ | ✕ | IJCAI | 2019 | Discretizing bounding boxes into key edges to address label confusion for text detection. |
| Zhang et al. [81] | LOMO | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | DR + IRM + SEM |
| Zhou et al. [24] | EAST | ✓ | ✓ | ✓ | ✕ | CVPR | 2017 | The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images with instance segmentation. |
| Yue et al. [48] | | ✕ | ✓ | ✓ | ✕ | BMVC | 2018 | Proposing a general framework for text detection called Guided CNN to achieve the two goals simultaneously. |
| Zhong et al. [53] | AF-RPN | ✕ | ✓ | ✓ | ✕ | arXiv | 2018 | Presenting AF-RPN, an anchor-free and scale-friendly region proposal network for the Faster R-CNN framework. |
| Xue et al. [85] | MSR | ✕ | ✓ | ✓ | ✓ | IJCAI | 2019 | Presenting a novel multi-scale regression network. |
| Liao et al. [91] | DB | ✓ | ✓ | ✓ | ✓ | AAAI | 2020 | Presenting a differentiable binarization module to adaptively set the thresholds for binarization, which simplifies the post-processing. |
| Xiao et al. [93] | SDM | ✕ | ✓ | ✓ | ✓ | ECCV | 2020 | 1. A novel sequential deformation method; 2. auxiliary character counting supervision. |

	



### 2.2 Detection Results
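
The result tables in this section report three numbers per dataset: P (precision), R (recall) and F (F-measure). The tables do not state how F is computed; the sketch below assumes the usual harmonic mean of precision and recall. The function name `f_measure` is chosen here for illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Sanity check against one IC13 row: Yao et al. [1] report P=0.69, R=0.66, F=0.67.
print(round(f_measure(0.69, 0.66), 2))  # 0.67
```

Because the published P and R values are themselves rounded, a recomputed F can differ from the reported one in the last digit.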



#### 2.2.1 Detection Results on Horizontal-Text Datasets

	

	

	

	

	

		Method               

		Model

		Source

		Time

		Method Category

		IC11[68]

		IC13 [69]

		IC05[67]

		

	

		P

		R

		F

		P

		R

		F

		P

		R

		F

	

	

		Yao et al. [1]

		TD-Mixture

		CVPR

		2012

		Traditional 

		~

		~

		~

		0.69

		0.66

		0.67

		~

		~

		~

	

	

		Yin et al. [2]

		


		TPAMI

		2013

		0.86

		0.68

		0.76

		~

		~

		~

		~

		~

		~

	

	

		Yin et al. [7]

		


		TPAMI

		2015

		0.838

		0.66

		0.738

		~

		~

		~

		~

		~

		~

	

	

		Wu et al. [9]

		


		TMM

		2015

		~

		~

		~

		0.76

		0.70

		0.73

		~

		~

		~

	

	

		Liang et al. [8]

		


		TIP

		2015

		0.77

		0.68

		0.71

		0.76

		0.68

		0.72

		~

		~

		~

	

	

		Michal et al.[12]

		FASText

		ICCV

		2015

		~

		~

		~

		0.84

		0.69

		0.77

		~

		~

		~

	

	

		Li et al. [3]

		


		TIP

		2014

		Segmentation

		0.80

		0.62

		0.70

		~

		~

		~

		~

		~

		~

	

	

		Zhang et al. [14]

		


		CVPR

		2016

		~

		~

		~

		0.88

		0.78

		0.83

		~

		~

		~

	

	

		He et al. [18]

		Text-CNN

		TIP

		2016

		0.91

		0.74

		0.82

		0.93

		0.73

		0.82

		0.87

		0.73

		0.79

	

	

		Yao et al. [21]

		


		arXiv

		2016

		~

		~

		~

		0.889

		0.802

		0.843

		~

		~

		~

	

	

		Hu et al. [27]

		WordSup

		ICCV

		2017

		~

		~

		~

		0.933

		0.875

		0.903

		~

		~

		~

	

	

		Tang et al.[32]

		


		TIP

		2017

		0.90

		0.86

		0.88

		0.92

		0.87

		0.89

		~

		~

		~

	

	

		Wang et al. [38]

		


		ICDAR

		2017

		0.87

		0.78

		0.82

		0.87

		0.82

		0.84

		~

		~

		~

	

	

		Deng et al. [40]

		PixelLink

		AAAI

		2018

		~

		~

		~

		0.886

		0.875

		0.881

		~

		~

		~

	

	

		Liu et al. [42]

		MCN

		CVPR

		2018

		~

		~

		~

		0.88

		0.87

		0.88

		~

		~

		~

	

	

		Lyu et al. [43]

		


		CVPR

		2018

		~

		~

		~

		0.92

		0.844

		0.880

		~

		~

		~

	

	

		Chu et al. [45]

		Border

		ECCV

		2018

		~

		~

		~

		0.915

		0.871

		0.892

		~

		~

		~

	

	

		Wang et al. [54]

		PSENet

		CVPR

		2019

		~

		~

		~

		0.94

		0.90

		0.92

		~

		~

		~

	

	

		Huang et al. [4]

		MSERs-CNN

		ECCV

		2014

		0.88

		0.71

		0.78

		~

		~

		~

		0.84

		0.67

		0.75

	

	

		Sun et al. [6]

		


		PR

		2015

		0.92

		0.91

		0.91

		0.94

		0.92

		0.93

		~

		~

		~

	

	

		Gupta et al. [15]

		FCRN

		CVPR

		2016

		Regression

		0.94

		0.77

		0.85

		0.938

		0.764

		0.842

		~

		~

		~

	

	

		Zhong et al. [20]

		DeepText

		arXiv

		2016

		0.87

		0.83

		0.85

		0.85

		0.81

		0.83

		~

		~

		~

	

	

		Liao et al. [22]

		TextBoxes

		AAAI

		2017

		0.89

		0.82

		0.86

		0.89

		0.83

		0.86

		~

		~

		~

	

	

		Liu et al. [25]

		DMPNet

		CVPR

		2017

		~

		~

		~

		0.93

		0.83

		0.870

		~

		~

		~

	

	

		Jiang et al. [36]

		R2CNN

		arXiv

		2017

		~

		~

		~

		0.92

		0.81

		0.86

		~

		~

		~

	

	

		Xing et al. [37]

		ArbiText

		arXiv

		2017

		~

		~

		~

		0.826

		0.936

		0.877

		~

		~

		~

	

	

		Wang et al. [41]

		ITN

		CVPR

		2018

		0.896

		0.889

		0.892

		0.941

		0.893

		0.916

		~

		~

		~

	

	

		Liao et al. [49]

		TextBoxes++

		TIP

		2018

		~

		~

		~

		0.92

		0.86

		0.89

		~

		~

		~

	

	

		He et al. [50]

		


		TIP

		2018

		~

		~

		~

		0.91

		0.84

		0.88

		~

		~

		~

	

	

		Ma et al. [51]

		RRPN

		TMM

		2018

		~

		~

		~

		0.95

		0.89

		0.91

		~

		~

		~

	

	

		Zhu et al. [55]

		SLPR

		arXiv

		2018

		~

		~

		~

		0.90

		0.72

		0.80

		~

		~

		~

	

	

		Cai et al. [59]

		FFN

		ICIP

		2018

		~

		~

		~

		0.92

		0.84

		0.876

		~

		~

		~

	

	

		Mohanty et al. [60]

		RGC

		ICIP

		2018

		~

		~

		~

		0.89

		0.77

		0.83

		~

		~

		~

	

	

		Wang et al. [82]

		


		CVPR

		2019

		~

		~

		~

		0.937

		0.878

		0.907

		~

		~

		~

	

	

		Liu et al. [84]

		


		CVPR

		2019

		~

		~

		~

		0.937

		0.897

		0.917

		~

		~

		~

	

	

		He et al. [29]

		SSTD

		ICCV

		2017

		~

		~

		~

		0.89

		0.86

		0.88

		~

		~

		~

	

	

		Tian et al. [11]

		


		ICCV

		2015

		0.86

		0.76

		0.81

		0.852

		0.759

		0.802

		~

		~

		~

	

	

		Tian et al. [13]

		CTPN

		ECCV

		2016

		~

		~

		~

		0.93

		0.83

		0.88

		~

		~

		~

	

	

		He et al. [19]

		


		ACCV

		2016

		~

		~

		~

		0.90

		0.75

		0.81

		~

		~

		~

	

	

		Shi et al. [23]

		SegLink

		CVPR

		2017

		~

		~

		~

		0.877

		0.83

		0.853

		~

		~

		~

	

	

		Tian et al. [30]

		WeText

		ICCV

		2017

		~

		~

		~

		0.911

		0.831

		0.869

		~

		~

		~

	

	

		Zhu et al. [31]

		RTN

		ICDAR

		2017

		~

		~

		~

		0.94

		0.89

		0.91

		~

		~

		~

	

	

		Ren et al. [34]

		


		TMM

		2017

		0.78

		0.67

		0.72

		0.81

		0.67

		0.73

		~

		~

		~

	

	

		Zhang et al. [10]

		


		CVPR

		2015

		0.84

		0.76

		0.80

		0.88

		0.74

		0.80

		~

		~

		~

	

	

		Tang et al. [52]

		SSFT

		TMM

		2018

		Hybrid

		0.906

		0.847

		0.876

		0.911

		0.861

		0.885

		~

		~

		~

	

	

		Xie et al. [61]

		SPCNet

		AAAI

		2019

		~

		~

		~

		0.94

		0.91

		0.92

		~

		~

		~

	

	

		Liu et al. [80]

		BDN

		IJCAI

		2019

		~

		~

		~

		0.887

		0.894

		0.89

		~

		~

		~

	

	

		Zhou et al. [24]

		EAST

		CVPR

		2017

		~

		~

		~

		0.93

		0.83

		0.870

		~

		~

		~

	

	

		Yue et al. [48]

		


		BMVC

		2018

		~

		~

		~

		0.885

		0.846

		0.870

		~

		~

		~

	

	

		Zhong et al. [53]

		AF-RPN

		arXiv

		2018

		~

		~

		~

		0.94

		0.90

		0.92

		~

		~

		~

	

	

		Xue et al. [85]

		MSR

		IJCAI

		2019

		~

		~

		~

		0.918

		0.885

		0.901

		~

		~

		~

	



#### 2.2.2 Detection Results on Arbitrary-Quadrilateral-Text Datasets

	

	

	

	

	

	

  

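| Method | Model | Source | Time | Method Category | IC15 [70] P | IC15 [70] R | IC15 [70] F | MSRA-TD500 [71] P | MSRA-TD500 [71] R | MSRA-TD500 [71] F | USTB-SV1K [65] P | USTB-SV1K [65] R | USTB-SV1K [65] F | SVT [66] P | SVT [66] R | SVT [66] F |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| Kang et al. [5] | HOCC | CVPR | 2014 | Traditional | ~ | ~ | ~ | 0.71 | 0.62 | 0.66 | ~ | ~ | ~ | ~ | ~ | ~ |
| Yin et al. [7] |  | TPAMI | 2015 |  | ~ | ~ | ~ | 0.81 | 0.63 | 0.71 | 0.499 | 0.454 | 0.475 | ~ | ~ | ~ |
| Wu et al. [9] |  | TMM | 2015 |  | ~ | ~ | ~ | 0.63 | 0.70 | 0.66 | ~ | ~ | ~ | ~ | ~ | ~ |
| Tian et al. [17] |  | IJCAI | 2016 |  | ~ | ~ | ~ | 0.95 | 0.58 | 0.721 | 0.537 | 0.488 | 0.51 | ~ | ~ | ~ |
| Yang et al. [33] |  | TIP | 2017 |  | ~ | ~ | ~ | 0.95 | 0.58 | 0.72 | 0.54 | 0.49 | 0.51 | ~ | ~ | ~ |
| Liang et al. [8] |  | TIP | 2015 |  | ~ | ~ | ~ | 0.74 | 0.66 | 0.70 | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhang et al. [14] |  | CVPR | 2016 | Segmentation | 0.71 | 0.43 | 0.54 | 0.83 | 0.67 | 0.74 | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhu et al. [16] |  | CVPR | 2016 |  | 0.81 | 0.91 | 0.85 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| He et al. [18] | Text-CNN | TIP | 2016 |  | ~ | ~ | ~ | 0.76 | 0.61 | 0.69 | ~ | ~ | ~ | ~ | ~ | ~ |
| Yao et al. [21] |  | arXiv | 2016 |  | 0.723 | 0.587 | 0.648 | 0.765 | 0.753 | 0.759 | ~ | ~ | ~ | ~ | ~ | ~ |
| Hu et al. [27] | WordSup | ICCV | 2017 |  | 0.793 | 0.77 | 0.782 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Wu et al. [28] |  | ICCV | 2017 |  | 0.91 | 0.78 | 0.84 | 0.77 | 0.78 | 0.77 | ~ | ~ | ~ | ~ | ~ | ~ |
| Dai et al. [35] | FTSN | arXiv | 2017 |  | 0.886 | 0.80 | 0.841 | 0.876 | 0.771 | 0.82 | ~ | ~ | ~ | ~ | ~ | ~ |
| Deng et al. [40] | PixelLink | AAAI | 2018 |  | 0.855 | 0.820 | 0.837 | 0.830 | 0.732 | 0.778 | ~ | ~ | ~ | ~ | ~ | ~ |
| Liu et al. [42] | MCN | CVPR | 2018 |  | 0.72 | 0.80 | 0.76 | 0.88 | 0.79 | 0.83 | ~ | ~ | ~ | ~ | ~ | ~ |
| Lyu et al. [43] |  | CVPR | 2018 |  | 0.895 | 0.797 | 0.843 | 0.876 | 0.762 | 0.815 | ~ | ~ | ~ | ~ | ~ | ~ |
| Xue et al. [45] | Border | ECCV | 2018 |  | ~ | ~ | ~ | 0.830 | 0.774 | 0.801 | ~ | ~ | ~ | ~ | ~ | ~ |
| Long et al. [46] | TextSnake | ECCV | 2018 |  | 0.849 | 0.804 | 0.826 | 0.832 | 0.739 | 0.783 | ~ | ~ | ~ | ~ | ~ | ~ |
| Yang et al. [47] | IncepText | IJCAI | 2018 |  | 0.938 | 0.873 | 0.905 | 0.875 | 0.790 | 0.830 | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [54] | PSENet | CVPR | 2019 |  | 0.8692 | 0.845 | 0.8569 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Xu et al. [57] | TextField | arXiv | 2018 |  | 0.843 | 0.805 | 0.824 | 0.874 | 0.759 | 0.813 | ~ | ~ | ~ | ~ | ~ | ~ |
| Tian et al. [58] | FTDN | ICIP | 2018 |  | 0.847 | 0.773 | 0.809 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Tian et al. [83] |  | CVPR | 2019 |  | 0.883 | 0.850 | 0.866 | 0.842 | 0.817 | 0.829 | ~ | ~ | ~ | ~ | ~ | ~ |
| Baek et al. [62] | CRAFT | CVPR | 2019 |  | 0.898 | 0.843 | 0.869 | 0.882 | 0.782 | 0.829 | ~ | ~ | ~ | ~ | ~ | ~ |
| Richardson et al. [87] |  | IJCAI | 2019 |  | 0.853 | 0.83 | 0.827 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [88] | SAST | ACM MM | 2019 |  | 0.8755 | 0.8734 | 0.8744 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [90] | PAN | ICCV | 2019 |  | 0.84 | 0.819 | 0.829 | 0.844 | 0.838 | 0.821 | ~ | ~ | ~ | ~ | ~ | ~ |
| Gupta et al. [15] | FCRN | CVPR | 2016 | Regression | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 0.651 | 0.599 | 0.624 |
| Liu et al. [25] | DMPNet | CVPR | 2017 |  | 0.732 | 0.682 | 0.706 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| He et al. [26] | DDR | ICCV | 2017 |  | 0.82 | 0.80 | 0.81 | 0.77 | 0.70 | 0.74 | ~ | ~ | ~ | ~ | ~ | ~ |
| Jiang et al. [36] | R2CNN | arXiv | 2017 |  | 0.856 | 0.797 | 0.825 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Xing et al. [37] | ArbiText | arXiv | 2017 |  | 0.792 | 0.735 | 0.759 | 0.78 | 0.72 | 0.75 | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [41] | ITN | CVPR | 2018 |  | 0.857 | 0.741 | 0.795 | 0.903 | 0.723 | 0.803 | ~ | ~ | ~ | ~ | ~ | ~ |
| Liao et al. [44] | RRD | CVPR | 2018 |  | 0.88 | 0.8 | 0.838 | 0.876 | 0.73 | 0.79 | ~ | ~ | ~ | ~ | ~ | ~ |
| Liao et al. [49] | TextBoxes++ | TIP | 2018 |  | 0.878 | 0.785 | 0.829 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| He et al. [50] |  | TIP | 2018 |  | 0.85 | 0.80 | 0.82 | 0.91 | 0.81 | 0.86 | ~ | ~ | ~ | ~ | ~ | ~ |
| Ma et al. [51] | RRPN | TMM | 2018 |  | 0.822 | 0.732 | 0.774 | 0.821 | 0.677 | 0.742 | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhu et al. [55] | SLPR | arXiv | 2018 |  | 0.855 | 0.836 | 0.845 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Deng et al. [56] |  | arXiv | 2018 |  | 0.89 | 0.81 | 0.845 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Mohanty et al. [60] | RGC | ICIP | 2018 |  | 0.83 | 0.81 | 0.82 | 0.85 | 0.76 | 0.80 | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [82] |  | CVPR | 2019 |  | 0.892 | 0.86 | 0.876 | 0.852 | 0.821 | 0.836 | ~ | ~ | ~ | ~ | ~ | ~ |
| He et al. [29] | SSTD | ICCV | 2017 |  | 0.80 | 0.73 | 0.77 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Tian et al. [13] | CTPN | ECCV | 2016 |  | 0.74 | 0.52 | 0.61 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| He et al. [19] |  | ACCV | 2016 |  | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 0.87 | 0.73 | 0.79 |
| Shi et al. [23] | SegLink | CVPR | 2017 |  | 0.731 | 0.768 | 0.75 | 0.86 | 0.70 | 0.77 | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [86] | DSRN | IJCAI | 2019 |  | 0.832 | 0.796 | 0.814 | 0.876 | 0.712 | 0.785 | ~ | ~ | ~ | ~ | ~ | ~ |
| Tang et al. [89] | SegLink++ | PR | 2019 |  | 0.837 | 0.803 | 0.820 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [92] | ContourNet | CVPR | 2020 |  | 0.876 | 0.861 | 0.869 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Tang et al. [52] | SSFT | TMM | 2018 | Hybrid | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 0.541 | 0.758 | 0.631 |
| Xie et al. [61] | SPCNet | AAAI | 2019 |  | 0.89 | 0.86 | 0.87 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Liu et al. [64] | PMTD | arXiv | 2019 |  | 0.913 | 0.874 | 0.893 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Liu et al. [80] | BDN | IJCAI | 2019 |  | 0.881 | 0.846 | 0.863 | 0.87 | 0.815 | 0.842 | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhang et al. [81] | LOMO | CVPR | 2019 |  | 0.878 | 0.876 | 0.877 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhou et al. [24] | EAST | CVPR | 2017 |  | 0.833 | 0.783 | 0.807 | 0.873 | 0.674 | 0.761 | ~ | ~ | ~ | ~ | ~ | ~ |
| Yue et al. [48] |  | BMVC | 2018 |  | 0.866 | 0.789 | 0.823 | ~ | ~ | ~ | ~ | ~ | ~ | 0.691 | 0.660 | 0.675 |
| Zhong et al. [53] | AF-RPN | arXiv | 2018 |  | 0.89 | 0.83 | 0.86 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Xue et al. [85] | MSR | IJCAI | 2019 |  | ~ | ~ | ~ | 0.874 | 0.767 | 0.817 | ~ | ~ | ~ | ~ | ~ | ~ |
| Liao et al. [91] | DB | AAAI | 2020 |  | 0.918 | 0.832 | 0.873 | 0.915 | 0.792 | 0.849 | ~ | ~ | ~ | ~ | ~ | ~ |
| Xiao et al. [93] | SDM | ECCV | 2020 |  | 0.9196 | 0.8922 | 0.9057 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |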

	

	

	

	

	

	

  

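| Method | Model | Source | Time | Method Category | IC15 [70] P | IC15 [70] R | IC15 [70] F | MSRA-TD500 [71] P | MSRA-TD500 [71] R | MSRA-TD500 [71] F | USTB-SV1K [65] P | USTB-SV1K [65] R | USTB-SV1K [65] F | SVT [66] P | SVT [66] R | SVT [66] F |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| Kang et al. [5] | HOCC | CVPR | 2014 | Traditional | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 0.80 | 0.73 | 0.76 |
| Yao et al. [21] |  | arXiv | 2016 | Segmentation | 0.432 | 0.27 | 0.333 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Hu et al. [27] | WordSup | ICCV | 2017 |  | 0.452 | 0.309 | 0.368 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Lyu et al. [43] |  | CVPR | 2018 |  | 0.351 | 0.348 | 0.349 | ~ | ~ | ~ | 0.743 | 0.706 | 0.724 | ~ | ~ | ~ |
| Xue et al. [45] | Border | ECCV | 2018 |  | ~ | ~ | ~ | 0.782 | 0.588 | 0.671 | 0.777 | 0.621 | 0.690 | ~ | ~ | ~ |
| Yang et al. [47] | IncepText | IJCAI | 2018 |  | ~ | ~ | ~ | 0.785 | 0.569 | 0.660 | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [54] | PSENet | CVPR | 2019 |  | ~ | ~ | ~ | ~ | ~ | ~ | 0.7535 | 0.6918 | 0.7213 | ~ | ~ | ~ |
| Baek et al. [62] | CRAFT | CVPR | 2019 |  | ~ | ~ | ~ | ~ | ~ | ~ | 0.806 | 0.682 | 0.739 | ~ | ~ | ~ |
| He et al. [29] | SSTD | ICCV | 2017 | Regression | 0.46 | 0.31 | 0.37 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Gupta et al. [15] | FCRN | CVPR | 2016 |  | ~ | ~ | ~ | ~ | ~ | ~ | 0.844 | 0.763 | 0.801 | ~ | ~ | ~ |
| Liao et al. [49] | TextBoxes++ | TIP | 2018 |  | 0.61 | 0.57 | 0.59 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Ma et al. [51] | RRPN | TMM | 2018 |  | ~ | ~ | ~ | ~ | ~ | ~ | 0.7669 | 0.5794 | 0.6601 | ~ | ~ | ~ |
| Deng et al. [56] |  | arXiv | 2018 |  | 0.555 | 0.633 | 0.591 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Cai et al. [59] | FFN | ICIP | 2018 |  | 0.43 | 0.35 | 0.39 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Xie et al. [79] | DeRPN | AAAI | 2019 |  | 0.586 | 0.557 | 0.571 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Liao et al. [44] | RRD | CVPR | 2018 |  | ~ | ~ | ~ | 0.591 | 0.775 | 0.670 | ~ | ~ | ~ | ~ | ~ | ~ |
| Richardson et al. [87] |  | IJCAI | 2019 |  | ~ | ~ | ~ | ~ | ~ | ~ | 0.729 | 0.618 | 0.669 | ~ | ~ | ~ |
| Wang et al. [88] | SAST | ACM MM | 2019 |  | ~ | ~ | ~ | ~ | ~ | ~ | 0.7935 | 0.6653 | 0.7237 | ~ | ~ | ~ |
| Xie et al. [61] | SPCNet | AAAI | 2019 | Hybrid | ~ | ~ | ~ | ~ | ~ | ~ | 0.806 | 0.686 | 0.741 | ~ | ~ | ~ |
| Liu et al. [64] | PMTD | arXiv | 2019 |  | ~ | ~ | ~ | ~ | ~ | ~ | 0.844 | 0.763 | 0.801 | ~ | ~ | ~ |
| Liu et al. [80] | BDN | IJCAI | 2019 |  | ~ | ~ | ~ | ~ | ~ | ~ | 0.791 | 0.698 | 0.742 | ~ | ~ | ~ |
| Zhang et al. [81] | LOMO | CVPR | 2019 |  | ~ | ~ | ~ | 0.791 | 0.602 | 0.684 | 0.802 | 0.672 | 0.731 | ~ | ~ | ~ |
| Zhou et al. [24] | EAST | CVPR | 2017 |  | 0.504 | 0.324 | 0.395 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhong et al. [53] | AF-RPN | arXiv | 2018 |  | ~ | ~ | ~ | ~ | ~ | ~ | 0.75 | 0.66 | 0.70 | ~ | ~ | ~ |
| Liao et al. [91] | DB | AAAI | 2020 |  | ~ | ~ | ~ | ~ | ~ | ~ | 0.831 | 0.679 | 0.747 | ~ | ~ | ~ |
| Xiao et al. [93] | SDM | ECCV | 2020 |  | ~ | ~ | ~ | ~ | ~ | ~ | 0.8679 | 0.7526 | 0.8061 | ~ | ~ | ~ |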

	



#### 2.2.3 Detection Results on Irregular-Text Datasets

This section lists only the methods that are suitable for irregular text detection.

	

	

	

	

  

| Method | Model | Source | Time | Method Category | Total-text [74] P | Total-text [74] R | Total-text [74] F | SCUT-CTW1500 [75] P | SCUT-CTW1500 [75] R | SCUT-CTW1500 [75] F |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| Baek et al. [62] | CRAFT | CVPR | 2019 | Segmentation | 0.876 | 0.799 | 0.836 | 0.860 | 0.811 | 0.835 |
| Long et al. [46] | TextSnake | ECCV | 2018 |  | 0.827 | 0.745 | 0.784 | 0.679 | 0.853 | 0.756 |
| Tian et al. [83] |  | CVPR | 2019 |  | ~ | ~ | ~ | 0.817 | 0.842 | 0.801 |
| Wang et al. [54] | PSENet | CVPR | 2019 |  | 0.840 | 0.779 | 0.809 | 0.848 | 0.797 | 0.822 |
| Wang et al. [88] | SAST | ACM MM | 2019 |  | 0.8557 | 0.7549 | 0.802 | 0.8119 | 0.8171 | 0.8145 |
| Wang et al. [90] | PAN | ICCV | 2019 |  | 0.893 | 0.81 | 0.85 | 0.864 | 0.812 | 0.837 |
| Zhu et al. [55] | SLPR | arXiv | 2018 | Regression | ~ | ~ | ~ | 0.801 | 0.701 | 0.748 |
| Liu et al. [63] | CTD+TLOC | PR | 2019 |  | ~ | ~ | ~ | 0.774 | 0.698 | 0.734 |
| Wang et al. [82] |  | CVPR | 2019 |  | ~ | ~ | ~ | 0.801 | 0.802 | 0.801 |
| Liu et al. [84] |  | CVPR | 2019 |  | 0.814 | 0.791 | 0.802 | 0.787 | 0.761 | 0.774 |
| Tang et al. [89] | SegLink++ | PR | 2019 |  | 0.829 | 0.809 | 0.815 | 0.828 | 0.798 | 0.813 |
| Wang et al. [92] | ContourNet | CVPR | 2020 |  | 0.869 | 0.839 | 0.854 | 0.837 | 0.841 | 0.839 |
| Zhang et al. [81] | LOMO | CVPR | 2019 | Hybrid | 0.876 | 0.793 | 0.833 | 0.857 | 0.765 | 0.808 |
| Xie et al. [61] | SPCNet | AAAI | 2019 |  | 0.83 | 0.83 | 0.83 | ~ | ~ | ~ |
| Xue et al. [85] | MSR | IJCAI | 2019 |  | 0.852 | 0.73 | 0.768 | 0.838 | 0.778 | 0.807 |
| Liao et al. [91] | DB | AAAI | 2020 |  | 0.871 | 0.825 | 0.847 | 0.869 | 0.802 | 0.834 |
| Xiao et al. [93] | SDM | ECCV | 2020 |  | 0.9085 | 0.8603 | 0.8837 | 0.884 | 0.8442 | 0.8636 |
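
The P, R and F columns in the result tables are precision, recall and F-measure. As a minimal sketch, assuming F is the usual harmonic mean of P and R (the individual papers may round or compute it slightly differently), a row can be sanity-checked like this. The function name `f_measure` is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (often called F1 or H-mean)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was detected
    return 2 * precision * recall / (precision + recall)


# CRAFT [62] on Total-text, taken from the table above: P = 0.876, R = 0.799.
print(round(f_measure(0.876, 0.799), 3))  # prints 0.836
```

The printed value agrees with the F column reported for that row.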

	



## 3. Survey

**[A] \[TPAMI-2015]** Ye Q, Doermann D. **Text detection and recognition in imagery: A survey**[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6945320)

**[B] \[Frontiers-Comput. Sci-2016]** Zhu Y, Yao C, Bai X. **Scene text detection and recognition: Recent advances and future trends**[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. [paper](https://link.springer.com/article/10.1007/s11704-015-4488-0)

**[C] \[arXiv-2018]** Long S, He X, Yao C. **Scene Text Detection and Recognition: The Deep Learning Era**[J]. arXiv preprint arXiv:1811.04256, 2018. [paper](https://arxiv.org/pdf/1811.04256.pdf)



## 4. Evaluation

If you are interested in developing better scene text detection metrics, the references below may be useful.

**[A]** Wolf, Christian, and Jean-Michel Jolion. "**Object count/area graphs for the evaluation of object detection and segmentation algorithms.**" International Journal of Document Analysis and Recognition (IJDAR) 8.4 (2006): 280-296. [paper](https://link.springer.com/article/10.1007/s10032-006-0014-0)

**[B]** D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. **ICDAR 2015 competition on robust reading**. In ICDAR, pages 1156–1160, 2015. [paper](https://ieeexplore.ieee.org/document/7333942)

**[C]** Calarasanu, Stefania, Jonathan Fabrizio, and Severine Dubuisson. "**What is a good evaluation protocol for text localization systems? Concerns, arguments, comparisons and solutions.**" Image and Vision Computing 46 (2016): 1-17. [paper](https://www.sciencedirect.com/science/article/pii/S0262885615001377)

**[D]** Shi, Baoguang, et al. "**ICDAR2017 competition on reading chinese text in the wild (RCTW-17).**" 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 1. IEEE, 2017. [paper](https://ieeexplore.ieee.org/abstract/document/8270164)

**[E]** Nayef, N; Yin, F; Bizid, I; et al. **ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification - RRC-MLT**. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, 1454–1459. IEEE. [paper](https://ieeexplore.ieee.org/document/8270168)

**[F]** Dangla, Aliona, et al. "**A first step toward a fair comparison of evaluation protocols for text detection algorithms.**" 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, 2018. [paper](https://ieeexplore.ieee.org/abstract/document/8395220)

**[G]** He, Mengchao and Liu, Yuliang, et al. **ICPR2018 Contest on Robust Reading for Multi-Type Web Images.** ICPR 2018. [paper](https://www.researchgate.net/publication/329316151_ICPR2018_Contest_on_Robust_Reading_for_Multi-Type_Web_Images)

**[H]** Liu, Yuliang and Jin, Lianwen, et al. "**Tightness-aware Evaluation Protocol for Scene Text Detection**" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2019. [paper](https://arxiv.org/abs/1904.00813) [code](https://github.com/Yuliang-Liu/TIoU-metric)
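
To make the idea behind IoU-based protocols concrete, here is a simplified sketch of how precision, recall and F-measure can be derived from box overlap. It rests on several assumptions that are not taken from the references above: boxes are axis-aligned rectangles (real protocols usually work with quadrilaterals or polygons), a detection counts as correct when its IoU with an unmatched ground-truth box is at least 0.5, matching is greedy and one-to-one, and "don't care" regions are ignored. The names `iou` and `evaluate` are illustrative only.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def evaluate(detections: List[Box], ground_truths: List[Box],
             iou_threshold: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching; returns (precision, recall, f_measure)."""
    matched = set()  # indices of ground-truth boxes already claimed
    true_positives = 0
    for det in detections:
        best_index, best_iou = -1, iou_threshold
        for index, gt in enumerate(ground_truths):
            if index in matched:
                continue
            overlap = iou(det, gt)
            if overlap >= best_iou:
                best_index, best_iou = index, overlap
        if best_index >= 0:
            matched.add(best_index)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truths) if ground_truths else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy example: two ground-truth boxes, three detections (one is a false alarm).
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(1, 1, 10, 10), (50, 50, 60, 60), (20, 20, 30, 31)]
print(tuple(round(v, 3) for v in evaluate(dets, gts)))  # prints (0.667, 1.0, 0.8)
```

A single hard threshold treats a detection with IoU 0.51 the same as one with IoU 0.99, which is one of the limitations that tightness-aware protocols such as [H] aim to address.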



## 5. OCR Service

| OCR | API | Free |
| :--: | :--: | :--: |
| [Tesseract OCR Engine](https://github.com/tesseract-ocr/tesseract) | × | √ |
| [Azure](https://azure.microsoft.com/zh-cn/services/cognitive-services/computer-vision/#Analysis) | √ | √ |
| [ABBYY](https://www.abbyy.cn/real-time-recognition-sdk/technical-specifications/) | √ | √ |
| [OCR Space](https://ocr.space/) | √ | √ |
| [SODA PDF OCR](https://www.sodapdf.com/ocr-pdf/) | √ | √ |
| [Free Online OCR](https://www.newocr.com/) | √ | √ |
| [Online OCR](https://www.onlineocr.net/) | √ | √ |
| [Super Tools](https://www.wdku.net/) | √ | √ |
| [Online Chinese Recognition](http://chongdata.com/ocr/) | √ | √ |
| [Calamari OCR](https://github.com/Calamari-OCR/calamari) | × | √ |
| [Tencent OCR](https://cloud.tencent.com/product/ocr?lang=cn) | √ | × |



## 6. References and Code

|                                                     |

| -- |

| **[1]** Yao C, Bai X, Liu W, et al. **Detecting texts of arbitrary orientations in natural images**. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012: 1083-1090. [Paper](https://ieeexplore.ieee.org/document/6247787) |

| **[2]** Yin X C, Yin X, Huang K, et al. **Robust text detection in natural scene images**. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 36(5): 970-83. [Paper](https://arxiv.org/pdf/1301.2628.pdf) |

| **[3]** Li Y, Jia W, Shen C, et al. **Characterness: An indicator of text in the wild**. IEEE transactions on image processing, 2014, 23(4): 1666-1677. [Paper](https://arxiv.org/pdf/1309.6691.pdf) |

| **[4]** Huang W, Qiao Y, Tang X. **Robust scene text detection with convolution neural network induced mser trees**. European Conference on Computer Vision(ECCV), 2014: 497-511. [Paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.724.6859&rep=rep1&type=pdf) |

| **[5]** Kang L, Li Y, Doermann D. **Orientation robust text line detection in natural images**. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 4034-4041. [Paper](https://ieeexplore.ieee.org/document/6909910) |

| **[6]** Sun L, Huo Q, Jia W, et al. **A robust approach for text detection from natural scene images**. Pattern Recognition, 2015, 48(9): 2906-2920. [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0031320315001260) |

| **[7]** Yin X C, Pei W Y, Zhang J, et al. **Multi-orientation scene text detection with adaptive clustering**. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015 (9): 1930-1937. [Paper](https://ieeexplore.ieee.org/document/7001081) |

| **[8]** Liang G, Shivakumara P, Lu T, et al. **Multi-spectral fusion based approach for arbitrarily oriented scene text detection in video images**. IEEE Transactions on Image Processing, 2015, 24(11): 4488-4501. [Paper](https://ieeexplore.ieee.org/document/7180356) |

| **[9]** Wu L, Shivakumara P, Lu T, et al. **A New Technique for Multi-Oriented Scene Text Line Detection and Tracking in Video**. IEEE Trans. Multimedia, 2015, 17(8): 1137-1152. [Paper](https://ieeexplore.ieee.org/document/7121019) |

| **[10]** Zhang Z, Shen W, Yao C, et al. **Symmetry-based text line detection in natural scenes**. IEEE Conference on Computer Vision & Pattern Recognition (CVPR), 2015. [Paper](https://ieeexplore.ieee.org/document/7298871) |

| **[11]** Tian S, Pan Y, Huang C, et al. **Text flow: A unified text detection system in natural scene images**. Proceedings of the IEEE international conference on computer vision(ICCV). 2015: 4651-4659. [Paper](https://arxiv.org/pdf/1604.06877.pdf) |

| **[12]** Busta M, Neumann L, Matas J. **FASText: Efficient unconstrained scene text detector**. 2015 IEEE International Conference on Computer Vision (ICCV). 2015: 1206-1214. [Paper](https://ieeexplore.ieee.org/document/7410500) |

| **[13]** Tian Z, Huang W, He T, et al. **Detecting text in natural image with connectionist text proposal network**. European conference on computer vision (ECCV), 2016: 56-72. [Paper](https://arxiv.org/pdf/1609.03605.pdf) [Code](https://github.com/tianzhi0549/CTPN) |

| **[14]** Zhang Z, Zhang C, Shen W, et al. **Multi-oriented text detection with fully convolutional networks.** Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2016: 4159-4167. [Paper](https://arxiv.org/pdf/1604.04018.pdf) |

| **[15]** Gupta A, Vedaldi A, Zisserman A. **Synthetic data for text localisation in natural images**. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2016: 2315-2324. [Paper](https://arxiv.org/pdf/1604.06646.pdf) [Code](https://github.com/ankush-me/SynthText) |

| **[16]** S. Zhu and R. Zanibbi, **A Text Detection System for Natural Scenes with Convolutional Feature Learning and Cascaded Classification**, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 625-632. [Paper](https://ieeexplore.ieee.org/document/7780443) |

| **[17]** Tian S, Pei W Y, Zuo Z Y, et al. **Scene Text Detection in Video by Learning Locally and Globally**. IJCAI. 2016: 2647-2653. [Paper](https://www.ijcai.org/Proceedings/16/Papers/376.pdf) |

| **[18]** He T, Huang W, Qiao Y, et al. **Text-attentional convolutional neural network for scene text detection**. IEEE transactions on image processing, 2016, 25(6): 2529-2541. [Paper](https://arxiv.org/pdf/1510.03283.pdf) |

| **[19]** He, Dafang and Yang, Xiao and Huang, Wenyi and Zhou, Zihan and Kifer, Daniel and Giles, C Lee. **Aggregating local context for accurate scene text detection**. ACCV, 2016. [Paper](http://www.cse.psu.edu/~duk17/papers/accv2016.pdf) |

| **[20]** Zhong Z, Jin L, Zhang S, et al. **Deeptext: A unified framework for text proposal generation and text detection in natural images**. arXiv preprint arXiv:1605.07314, 2016. [Paper](https://arxiv.org/pdf/1605.07314.pdf) |

| **[21]** Yao C, Bai X, Sang N, et al. **Scene text detection via holistic, multi-channel prediction**. arXiv preprint arXiv:1606.09002, 2016. [Paper](https://arxiv.org/pdf/1606.09002.pdf) |

| **[22]** Liao M, Shi B, Bai X, et al. **TextBoxes: A Fast Text Detector with a Single Deep Neural Network**. AAAI. 2017: 4161-4167. [Paper](https://arxiv.org/abs/1611.06779) [Code](https://github.com/MhLiao/TextBoxes) |

| **[23]** Shi B, Bai X, Belongie S. **Detecting Oriented Text in Natural Images by Linking Segments**. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 3482-3490. [Paper](https://arxiv.org/abs/1703.06520) [Code](https://github.com/dengdan/seglink) |

| **[24]** Zhou X, Yao C, Wen H, et al. **EAST: an efficient and accurate scene text detector**. CVPR, 2017: 2642-2651. [Paper](https://arxiv.org/abs/1704.03155) [Code](https://github.com/argman/EAST) |

| **[25]** Liu Y, Jin L. **Deep matching prior network: Toward tighter multi-oriented text detection**. CVPR, 2017: 3454-3461. [Paper](https://arxiv.org/abs/1703.01425) |

| **[26]** He W, Zhang X Y, Yin F, et al. **Deep Direct Regression for Multi-Oriented Scene Text Detection**. Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2017: 745-753. [Paper](https://arxiv.org/abs/1703.08289) |

| **[27]** Hu H, Zhang C, Luo Y, et al. **Wordsup: Exploiting word annotations for character based text detection**. ICCV, 2017. [Paper](https://arxiv.org/abs/1708.06720) |

| **[28]** Wu Y, Natarajan P. **Self-organized text detection with minimal post-processing via border learning**. ICCV, 2017. [Paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Wu_Self-Organized_Text_Detection_ICCV_2017_paper.pdf) |

| **[29]** He P, Huang W, He T, et al. **Single shot text detector with regional attention**. The IEEE International Conference on Computer Vision (ICCV). 2017, 6(7). [Paper](https://arxiv.org/abs/1709.00138) [Code](https://github.com/BestSonny/SSTD) |

| **[30]** Tian S, Lu S, Li C. **Wetext: Scene text detection under weak supervision**. ICCV, 2017. [Paper](https://arxiv.org/abs/1710.04826) |

| **[31]** Zhu, Xiangyu and Jiang, Yingying et al. **Deep Residual Text Detection Network for Scene Text**. ICDAR, 2017. [Paper](https://arxiv.org/abs/1711.04147) |

| **[32]** Tang Y, Wu X. **Scene Text Detection and Segmentation Based on Cascaded Convolution Neural Networks**. IEEE Transactions on Image Processing, 2017, 26(3): 1509-1520. [Paper](https://ieeexplore.ieee.org/document/7828014) |

| **[33]** Yang C, Yin X C, Pei W Y, et al. **Tracking Based Multi-Orientation Scene Text Detection: A Unified Framework with Dynamic Programming**. IEEE Transactions on Image Processing, 2017. [Paper](https://ieeexplore.ieee.org/document/7903596) |

| **[34]** X. Ren, Y. Zhou, J. He, K. Chen, X. Yang and J. Sun, **A Convolutional Neural Network-Based Chinese Text Detection Algorithm via Text Structure Modeling**. in IEEE Transactions on Multimedia, vol. 19, no. 3, pp. 506-518, March 2017. [Paper](https://ieeexplore.ieee.org/document/7733055) |

| **[35]** Dai Y, Huang Z, Gao Y, et al. **Fused text segmentation networks for multi-oriented scene text detection**. arXiv preprint arXiv:1709.03272, 2017. [Paper](https://arxiv.org/abs/1709.03272) |

| **[36]** Jiang Y, Zhu X, Wang X, et al. **R2CNN: rotational region CNN for orientation robust scene text detection**. arXiv preprint arXiv:1706.09579, 2017. [Paper](https://arxiv.org/abs/1706.09579) |

| **[37]** Xing D, Li Z, Chen X, et al. **ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene**. arXiv preprint arXiv:1711.11249, 2017. [Paper](https://arxiv.org/abs/1711.11249) |

| **[38]** C. Wang, F. Yin and C. Liu, **Scene Text Detection with Novel Superpixel Based Character Candidate Extraction**. in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017, pp. 929-934. [Paper](https://ieeexplore.ieee.org/document/8270087) |

| **[39]** Sheng Zhang, Yuliang Liu, Lianwen Jin et al. **Feature Enhancement Network: A Refined Scene Text Detector**. In AAAI 2018. [Paper](https://arxiv.org/abs/1711.04249) |

| **[40]** Dan Deng et al. **PixelLink: Detecting Scene Text via Instance Segmentation**. In AAAI 2018. [Paper](https://arxiv.org/abs/1801.01315) [Code](https://github.com/ZJULearning/pixel_link) |

| **[41]** Fangfang Wang, Liming Zhao, Xi Li et al. **Geometry-Aware Scene Text Detection with Instance Transformation Network**. In CVPR 2018. [Paper](http://openaccess.thecvf.com/content_cvpr_2018/CameraReady/1653.pdf) |

| **[42]** Zichuan Liu, Guosheng Lin, Sheng Yang et al. **Learning Markov Clustering Networks for Scene Text Detection**. In CVPR 2018. [Paper](https://arxiv.org/abs/1805.08365) |

| **[43]** Pengyuan Lyu, Cong Yao, Wenhao Wu et al. **Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation**. In CVPR 2018. [Paper](https://arxiv.org/abs/1802.08948) |

| **[44]** Minghui Liao, Zhen Zhu, Baoguang Shi, et al. **Rotation-Sensitive Regression for Oriented Scene Text Detection**. In CVPR 2018. [Paper](https://arxiv.org/abs/1803.05265) |

| **[45]** Chuhui Xue et al. **Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping**. In ECCV 2018. [Paper](https://arxiv.org/abs/1807.03547) |

| **[46]** Long, Shangbang and Ruan, Jiaqiang, et al. **TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes**. In ECCV, 2018. [Paper](https://arxiv.org/abs/1807.01544) |

| **[47]** Qiangpeng Yang, Mengli Cheng et al. **IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection**. In IJCAI 2018. [Paper](https://arxiv.org/abs/1805.01167) |

| **[48]** Xiaoyu Yue et al. **Boosting up Scene Text Detectors with Guided CNN**. In BMVC 2018. [Paper](https://arxiv.org/abs/1805.04132) |

| **[49]** Liao M, Shi B, Bai X. **TextBoxes++: A Single-Shot Oriented Scene Text Detector**. IEEE Transactions on Image Processing, 2018, 27(8): 3676-3690. [Paper](https://arxiv.org/abs/1801.02765) [Code](https://github.com/MhLiao/TextBoxes_plusplus) |

| **[50]** W. He, X. Zhang, F. Yin and C. Liu, **Multi-Oriented and Multi-Lingual Scene Text Detection With Direct Regression**, in IEEE Transactions on Image Processing, vol. 27, no. 11, pp. 5406-5419, 2018. [Paper](https://ieeexplore.ieee.org/document/8410577) |

| **[51]** Ma J, Shao W, Ye H, et al. **Arbitrary-oriented scene text detection via rotation proposals**. IEEE Transactions on Multimedia, 2018. [Paper](https://arxiv.org/abs/1703.01086) [Code](https://github.com/mjq11302010044/RRPN) |

| **[52]** Youbao Tang and Xiangqian Wu. **Scene Text Detection Using Superpixel-Based Stroke Feature Transform and Deep Learning Based Region Classification**. In TMM, 2018. [Paper](https://ieeexplore.ieee.org/document/8281640) |

| **[53]** Zhuoyao Zhong, Lei Sun and Qiang Huo. **An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches**. arXiv preprint arXiv:1804.09003. 2018. [Paper](https://arxiv.org/abs/1804.09003) |

| **[54]** Wenhai W, Enze X, et al. **Shape Robust Text Detection with Progressive Scale Expansion Network**. In CVPR 2019. [Paper](https://arxiv.org/abs/1903.12473) [Code](https://github.com/whai362/PSENet) |

| **[55]** Zhu Y, Du J. **Sliding Line Point Regression for Shape Robust Scene Text Detection**. arXiv preprint arXiv:1801.09969, 2018. [Paper](https://arxiv.org/abs/1801.09969) |

| **[56]** Linjie D, Yanxiang Gong, et al. **Detecting Multi-Oriented Text with Corner-based Region Proposals**. arXiv preprint arXiv:1804.02690, 2018. [Paper](https://arxiv.org/abs/1804.02690) [Code](https://github.com/xhzdeng/crpn) |

| **[57]** Yongchao Xu, Yukang Wang, Wei Zhou, et al. **TextField: Learning A Deep Direction Field for Irregular Scene Text Detection**. arXiv preprint arXiv:1812.01393, 2018. [Paper](https://arxiv.org/abs/1812.01393) |

| **[58]** Xiaowei Tian, Dao Wu, Rui Wang, Xiaochun Cao. **Focal Text: an Accurate Text Detection with Focal Loss**. In ICIP 2018. [Paper](https://ieeexplore.ieee.org/document/8451241) |

| **[59]** Chenqin C, Pin L, Bing S. **Feature Fusion Network for Scene Text Detection**. In ICIP, 2018. [Paper](https://ieeexplore.ieee.org/document/8451402) |

| **[60]** Sabyasachi Mohanty et al. **Recurrent Global Convolutional Network for Scene Text Detection**. In ICIP 2018. [Paper](https://ieeexplore.ieee.org/document/8451058) |

| **[61]** Enze Xie, et al. **Scene Text Detection with Supervised Pyramid Context Network**. In AAAI 2019. [Paper](https://arxiv.org/abs/1811.08605) |

| **[62]** Youngmin Baek, Bado Lee, et al. **Character Region Awareness for Text Detection**. In CVPR 2019. [Paper](https://arxiv.org/abs/1904.01941) |

| **[63]** Yuliang L, Lianwen J, Shuaitao Z, et al. **Curved Scene Text Detection via Transverse and Longitudinal Sequence Connection**. Pattern Recognition, 2019. [Paper](https://arxiv.org/abs/1712.02170) [Code](https://github.com/Yuliang-Liu/Curve-Text-Detector) |

| **[64]** Jingchao Liu, Xuebo Liu, et al, **Pyramid Mask Text Detector**. arXiv preprint arXiv:1903.11800, 2019. [Paper](https://arxiv.org/abs/1903.11800) [Code](https://github.com/STVIR/PMTD) |

| **[79]** Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, **DeRPN: Taking a further step toward more general object detection**. In AAAI, 2019. [Paper](https://arxiv.org/abs/1811.06700) [Code](https://github.com/HCIILAB/DeRPN)|

| **[80]** Yuliang Liu, Lianwen Jin, et al, **Omnidirectional Scene Text Detection with Sequential-free Box Discretization**. In IJCAI, 2019. [Paper](https://arxiv.org/abs/1906.02371) [Code](https://github.com/Yuliang-Liu/Box_Discretization_Network)|

| **[81]** Chengquan Zhang, Borong Liang, et al, **Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes**. In CVPR, 2019. [Paper](https://arxiv.org/abs/1904.06535)|

| **[82]** Xiaobing Wang, Yingying Jiang, et al, **Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation**. In CVPR, 2019. [Paper](https://arxiv.org/abs/1905.05980?context=cs.CV)|

| **[83]** Zhuotao Tian, Michelle Shu, et al, **Learning Shape-Aware Embedding for Scene Text Detection**. In CVPR, 2019. [Paper](http://jiaya.me/papers/textdetection_cvpr19.pdf)|

| **[84]** Zichuan Liu, Guosheng Lin, et al, **Towards Robust Curve Text Detection with Conditional Spatial Expansion**. In CVPR, 2019. [Paper](https://arxiv.org/abs/1903.08836)|

| **[85]** Xue C, Lu S, Zhang W. **MSR: multi-scale shape regression for scene text detection**. In IJCAI, 2019. [Paper](https://arxiv.org/abs/1901.02596)|

| **[86]** Wang Y, Xie H, Fu Z, et al. **DSRN: a deep scale relationship network for scene text detection.** In IJCAI, 2019: 947-953. [Paper](https://pdfs.semanticscholar.org/cc51/d4756494ffea379aba095c41bde77c61f65c.pdf)|

| **[87]** Elad Richardson, et al, **It's All About The Scale -- Efficient Text Detection Using Adaptive Scaling**. In WACV, 2020. [Paper](https://arxiv.org/abs/1907.12122)|

| **[88]** Pengfei Wang, et al, **A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning**. In ACM MM, 2019. [Paper](https://arxiv.org/abs/1908.05498)|

| **[89]** Jun Tang, et al, **SegLink ++: Detecting Dense and Arbitrary-shaped Scene Text by Instance-aware Component Grouping**. In PR, 2019. [Paper](https://www.researchgate.net/publication/334015431_Detecting_Dense_and_Arbitrary-shaped_Scene_Text_by_Instance-aware_Component_Grouping)|

| **[90]** Wenhai Wang, et al, **Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network**. In ICCV, 2019. [Paper](https://arxiv.org/abs/1908.05900)|

| **[91]** Minghui Liao, et al, **Real-time Scene Text Detection with Differentiable Binarization**. In AAAI, 2020. [Paper](https://arxiv.org/abs/1911.08947) [Code](https://github.com/MhLiao/DB)|

| **[92]** Wang, Yuxin, et al. **ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection.** In CVPR, 2020. [Paper](https://openaccess.thecvf.com/content_CVPR_2020/html/Wang_ContourNet_Taking_a_Further_Step_Toward_Accurate_Arbitrary-Shaped_Scene_Text_CVPR_2020_paper.html) [Code](https://github.com/wangyuxin87/ContourNet)|

| **[93]** Xiao, et al, **Sequential Deformation for Accurate Scene Text Detection**. In ECCV, 2020. [Paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123740103.pdf)|

|                                                                                    **Datasets** |

| **USTB-SV1K[65]**：Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, **Robust text detection in natural scene images**, IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI), preprint, 2013. [Paper](https://ieeexplore.ieee.org/document/6247787) |

| **SVT[66]**: Wang, Kai, and S. Belongie. **Word Spotting in the Wild**. European Conference on Computer Vision (ECCV), 2010: 591-604. [Paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.168.4897&rep=rep1&type=pdf) |

| **ICDAR2005[67]**: Lucas, S: **ICDAR 2005 text locating competition results**. In: ICDAR, 2005. [Paper](https://ieeexplore.ieee.org/document/1575514) |

| **ICDAR2011[68]**:  Shahab, A, Shafait, F, Dengel, A: **ICDAR 2011 robust reading competition challenge 2: Reading text in scene images**. In: ICDAR, 2011. [Paper](https://ieeexplore.ieee.org/document/6065556) |

| **ICDAR2013[69]**：D. Karatzas, F. Shafait, S. Uchida, et al. **ICDAR 2013 robust reading competition**. In ICDAR, 2013. [Paper](https://ieeexplore.ieee.org/document/6628859) |

| **ICDAR2015[70]**：D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. **ICDAR 2015 competition on robust reading**. In ICDAR, pages 1156–1160, 2015. [Paper](https://ieeexplore.ieee.org/document/7333942) |

| **MSRA-TD500[71]**：C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, **Detecting texts of arbitrary orientations in natural images**. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2012, pp. 1083–1090. [Paper](http://pages.ucsd.edu/~ztu/publication/cvpr12_textdetection.pdf) |

| **COCO-Text[72]**：Veit A, Matera T, Neumann L, et al. **Coco-text: Dataset and benchmark for text detection and recognition in natural images**. arXiv preprint arXiv:1601.07140, 2016. [Paper](https://arxiv.org/abs/1601.07140) |

| **RCTW-17[73]**：Shi B, Yao C, Liao M, et al. **ICDAR2017 competition on reading chinese text in the wild (RCTW-17)**. Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 1429-1434. [Paper](https://arxiv.org/abs/1708.09585) |

| **Total-Text[74]**：Chee C K, Chan C S. **Total-text: A comprehensive dataset for scene text detection and recognition**. Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 935-942. [Paper](https://arxiv.org/abs/1710.10400) |

| **SCUT-CTW1500[75]**：Yuliang L, Lianwen J, Shuaitao Z, et al. **Curved Scene Text Detection via Transverse and Longitudinal Sequence Connection**. Pattern Recognition, 2019. [Paper](https://arxiv.org/abs/1712.02170) |

| **MLT 2017[76]**: Nayef, N; Yin, F; Bizid, I; et al. **ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt**. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, 1454–1459. IEEE. [Paper](https://ieeexplore.ieee.org/document/8270168) |

| **OSTD[77]**: Chucai Yi and YingLi Tian, **Text string detection from natural scenes by structure-based partition and grouping**, In IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2594–2605, 2011. [Paper](https://ieeexplore.ieee.org/abstract/document/5729827) |

| **CTW[78]**: Yuan T L, Zhu Z, Xu K, et al. **Chinese Text in the Wild**. arXiv preprint arXiv:1803.00085, 2018. [Paper](https://arxiv.org/abs/1803.00085) |

If you find any problems in our resources, or know of good papers or code we have missed, please let us know at **liuchongyu1996@gmail.com**. Thank you for your contribution.

### Copyright

Copyright © 2019 SCUT-DLVC. All Rights Reserved.