Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jiangxiluning/MASTER-TF
MASTER
https://github.com/jiangxiluning/MASTER-TF
cv deep-learning ocr ocr-recognition scene-text-recognition transformer
Last synced: 10 days ago
JSON representation
MASTER
- Host: GitHub
- URL: https://github.com/jiangxiluning/MASTER-TF
- Owner: jiangxiluning
- License: mit
- Created: 2020-07-05T10:17:30.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-24T22:51:27.000Z (over 1 year ago)
- Last Synced: 2024-08-02T11:14:48.885Z (3 months ago)
- Topics: cv, deep-learning, ocr, ocr-recognition, scene-text-recognition, transformer
- Language: Python
- Homepage:
- Size: 71.3 KB
- Stars: 138
- Watchers: 7
- Forks: 43
- Open Issues: 7
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# MASTER-TensorFlow ![](https://img.shields.io/badge/license-MIT-blue)
TensorFlow reimplementation of ["MASTER: Multi-Aspect Non-local Network for Scene Text Recognition"](https://arxiv.org/abs/1910.02562)
(Pattern Recognition 2021). This project is different from our original implementation that builds on the privacy codebase FastOCR of the company.
You can also find PyTorch reimplementation at [MASTER-pytorch](https://github.com/wenwenyu/MASTER-pytorch) repository,
and the performance is almost identical. (PS. Logo inspired by the Master Oogway in Kung Fu Panda)## News
* 2021/07: [MASTER-mmocr](https://github.com/JiaquanYe/MASTER-mmocr), reimplementation of MASTER by mmocr. [@Jiaquan Ye](https://github.com/JiaquanYe)
* 2021/07: [TableMASTER-mmocr](https://github.com/JiaquanYe/TableMASTER-mmocr), 2nd solution of ICDAR 2021 Competition on Scientific Literature Parsing Task B based on MASTER. [@Jiaquan Ye](https://github.com/JiaquanYe)
* 2021/07: Talk can be found at [here](https://www.bilibili.com/video/BV1T44y1m7vc) (Chinese).
* 2021/05: [Savior](https://github.com/novioleo/Savior), which aims to provide a simple, lightweight, fast integrated, pipelined deployment framework for RPA,
is now integrated MASTER for captcha recognition. [@Tao Luo](https://github.com/novioleo)
* 2021/04: Slides can be found at [here](https://github.com/wenwenyu/MASTER-pytorch/blob/main/assets/MASTER.pdf).## Honors based on MASTER
* 1st place (2021/05) solution to [ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX (Subtask I: Table structure reconstruction)](https://competitions.codalab.org/competitions/26979)
* 1st place (2021/05) solution to [ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX (Subtask II: Table content reconstruction)](https://competitions.codalab.org/competitions/26979)
* 2nd place (2021/05) solution to [ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table recognition](https://icdar2021.org/program-2/competitions/competition-on-scientific-literature-parsing/)
* 1st place (2020/10) solution to [ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard (task2)](https://rrc.cvc.uab.es/?ch=12&com=evaluation&task=2)
* 2nd and 5th places (2020/10) in [The 5th China Innovation Challenge on Handwritten Mathematical Expression Recognition](https://www.heywhale.com/home/competition/5f703ac023f41e002c3ed5e4/content/6)
* 4th place (2019/08) of [ICDAR 2017 Robust Reading Challenge on COCO-Text (task2)](https://rrc.cvc.uab.es/?ch=5&com=evaluation&task=2)
* More will be released## Introduction
MASTER is a self-attention based scene text recognizer that (1) not only encodes the input-output attention,
but also learns self-attention which encodes feature-feature and target-target relationships inside the encoder
and decoder and (2) learns a more powerful and robust intermediate representation to spatial distortion and
(3) owns a better training and evaluation efficiency. Overall architecture shown follows.
This repo contains the following features.- [x] Multi-gpu Training
- [x] Greedy Decoding
- [x] Single image inference
- [x] Eval iiit5k
- [x] Convert Checkpoint to SavedModel format
- [x] Refactory codes to be more tensorflow-style and be more consistent to graph mode
- [x] Support tensorflow serving mode## Preparation
It is highly recommended that install tensorflow-gpu using conda.Python3.7 is preferred.
```bash
pip install -r requirements.txt
```## Dataset
I use Clovaai's MJ training split for training.
please check `src/dataset/benchmark_data_generator.py` for details.
Eval datasets are some real scene text datasets. You can downloaded directly from [here](https://drive.google.com/drive/folders/1OG4ufr-kj2jFLmM4gyFEI0tMGYZrz8HI).
## Training
```bash
# training from scratch
python train.py -c [your_config].yaml# resume training from last checkpoint
python train.py -c [your_config].yaml -r# finetune with some checkpoint
python train.py -c [your_config].yaml -f [checkpoint]
```## Eval
**Since I made change to the usage of gcb block, the weight could not be suitable to HEAD. If you want to test the model, please use https://github.com/jiangxiluning/MASTER-TF/commit/85f9217af8697e41aefe5121e580efa0d6d04d92**
Currently, you can download checkpoint from [here](https://pan.baidu.com/s/1ijpo8WRZHR-AyDclxQVDiw) with code **o6g9**, or from [Google Driver](https://drive.google.com/file/d/1gpfMvnQWZimogQLFM_teOwiLNz-ZEF02/view?usp=sharing), this checkpoint was trained with MJ and selected
for the best performance of iiit5k dataset. Below is the comparision between pytorch version and tensorflow version.| Framework | Dataset | Word Accuracy | Training Details |
| --- | --- | --- | --- |
| Pytorch | MJ | 85.05% | 3 V100 4 epochs Batch Size: 3*128|
| Tensorflow | MJ | 85.53% | 2 2080ti 4 epochs Batch Size: 2 * 50 |Please download the checkpoint and model config from [here](https://pan.baidu.com/s/1ijpo8WRZHR-AyDclxQVDiw) with code **o6g9** and unzip it, and you can get this metric by running:
```bash
python eval_iiit5k.py --ckpt [checkpoint file] --cfg [model config] -o [output dir] -i [iiit5k lmdb test dataset]
```
The checkpoint file argument should be `${where you unzip}/backup/512_8_3_3_2048_2048_0.2_0_Adam_mj_my/checkpoints/OCRTransformer-Best`## Tensorflow Serving
For tensorflow serving, you should use savedModel format, I provided test case to show you how to convert a checkpoint to savedModel and how to use it.
```bash
pytest -s tests/test_units::test_savedModel #check the test case test_savedModel in tests/test_units
pytest -s tests/test_units::test_loadModel # call decode to inference and get predicted transcript and logits out.
```## Citations
If you find MASTER useful please cite our [paper](https://arxiv.org/abs/1910.02562):
```bibtex
@article{Lu2021MASTER,
title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
journal={Pattern Recognition},
year={2021}
}
```## License
This project is licensed under the MIT License. See LICENSE for more details.## Acknowledgements
Thanks to the authors and their repo:
- [SAR_TF](https://github.com/Pay20Y/SAR_TF)
- [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark)