- Host: GitHub
- URL: https://github.com/Canjie-Luo/MORAN_v2
- Owner: Canjie-Luo
- License: MIT
- Created: 2019-01-09T02:43:29.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2024-07-25T10:12:40.000Z (4 months ago)
- Last Synced: 2024-07-31T21:54:22.703Z (3 months ago)
- Topics: attention-mechanism, image-deformation, image-rectification, scene-text, scene-text-recognition
- Language: Python
- Size: 2.63 MB
- Stars: 628
- Watchers: 24
- Forks: 152
- Open Issues: 26
Metadata Files:
- Readme: README.md
- License: LICENSE
# MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition
![](https://img.shields.io/badge/version-v2-brightgreen.svg)
| Python 2.7 | Python 3.6 |
| :---: | :---: |
| [![Build Status](https://travis-ci.org/Canjie-Luo/MORAN_v2.svg?branch=master)](https://travis-ci.org/Canjie-Luo/MORAN_v2) | [![Build Status](https://travis-ci.org/Canjie-Luo/MORAN_v2.svg?branch=master)](https://travis-ci.org/Canjie-Luo/MORAN_v2) |

MORAN is a network with a rectification mechanism for general scene text recognition. The paper (accepted by Pattern Recognition, 2019) is available now: [arXiv](https://arxiv.org/abs/1901.03003) preprint and [final](https://www.sciencedirect.com/science/article/pii/S0031320319300263) version.
[Here is a brief introduction in Chinese.](https://mp.weixin.qq.com/s/XbT_t_9C__KdyCCw8CGDVA)
![](demo/MORAN_v2.gif)
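The core idea of the rectification network is to predict a sampling offset for every output pixel and resample the distorted image before recognition. The toy sketch below illustrates that resampling with integer offsets and border clamping (MORAN itself learns fractional offsets and uses bilinear sampling; the function and its arguments are illustrative, not the repo's API):

```python
def rectify(image, offsets):
    """Toy rectification: resample `image` (an H x W list of lists) at
    per-pixel integer offsets (dy, dx), clamping samples to the image
    border. MORAN's rectification network predicts fractional offsets
    and interpolates bilinearly, but the resampling idea is the same."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = offsets[y][x]
            sy = min(max(y + dy, 0), h - 1)  # clamp to valid rows
            sx = min(max(x + dx, 0), w - 1)  # clamp to valid columns
            row.append(image[sy][sx])
        out.append(row)
    return out
```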
## Recent Update
- 2019.03.21 Fix a bug about Fractional Pickup.
- Support [Python 3](https://www.python.org/).

## Improvements of MORAN v2:
- More stable rectification network for one-stage training
- Replace the VGG backbone with ResNet
- Use a bidirectional decoder (a trick borrowed from [ASTER](https://github.com/bgshih/aster))

| Version | IIIT5K | SVT | IC03 | IC13 | SVT-P | CUTE80 | IC15 (1811) | IC15 (2077) |
| :---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:|
| MORAN v1 (curriculum training)\* | 91.2 | **88.3** | **95.0** | 92.4 | 76.1 | 77.4 | 74.7 | 68.8 |
| MORAN v2 (one-stage training) | **93.4** | **88.3** | 94.2 | **93.2** | **79.7** | **81.9** | **77.8** | **73.9** |

\*The results of v1 were reported in our paper. If this project is helpful for your research, please [cite](https://github.com/Canjie-Luo/MORAN_v2/blob/master/README.md#citation) our Pattern Recognition paper.
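The bidirectional-decoder trick can be illustrated with a small sketch: decode the text in both reading directions, then keep the hypothesis with the higher cumulative log-probability (a hypothetical simplification; MORAN/ASTER actually run two attention decoders and compare their sequence scores):

```python
import math

def bidirectional_decode(l2r, r2l):
    """Pick between a left-to-right and a right-to-left decoding.
    Each argument is a list of (char, prob) pairs emitted in that
    decoder's own reading order; the higher-scoring hypothesis wins,
    and the right-to-left one is reversed back into reading order."""
    def score(hyp):
        return sum(math.log(p) for _, p in hyp)  # cumulative log-prob
    if score(l2r) >= score(r2l):
        chosen = l2r
    else:
        chosen = r2l[::-1]  # reverse r2l output into reading order
    return ''.join(c for c, _ in chosen)
```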
## Requirements
(Welcome to develop MORAN together.)
We recommend using [Anaconda](https://www.anaconda.com/) to manage your libraries.
- [Python 2.7 or Python 3.6](https://www.python.org/) (Python 3 is faster than Python 2)
- [PyTorch](https://pytorch.org/) 0.3.* (higher versions cause slow training; see [issue#8](https://github.com/Canjie-Luo/MORAN_v2/issues/8#issuecomment-455416756))
- [TorchVision](https://pypi.org/project/torchvision/)
- [OpenCV](https://opencv.org/)
- [PIL (Pillow)](https://pillow.readthedocs.io/en/stable/#)
- [Colour](https://pypi.org/project/colour/)
- [LMDB](https://pypi.org/project/lmdb/)
- [matplotlib](https://pypi.org/project/matplotlib/)

Alternatively, use [pip](https://pypi.org/project/pip/) to install the libraries. (The pip build of torch may differ from the Anaconda version; please check carefully and fix any warnings in the training stage if necessary.)
```bash
pip install -r requirements.txt
```

## Data Preparation
Please convert your own dataset to **LMDB** format using the [tool](https://github.com/bgshih/crnn/blob/master/tool/create_dataset.py) (run in **Python 2.7**) provided by [@Baoguang Shi](https://github.com/bgshih).

You can also download the training ([NIPS 2014](http://www.robots.ox.ac.uk/~vgg/data/text/), [CVPR 2016](http://www.robots.ox.ac.uk/~vgg/data/scenetext/)) and testing datasets prepared by us.
- [BaiduCloud (about 20G training datasets and testing datasets in **LMDB** format)](https://pan.baidu.com/s/1TqZfvoEhyv57yf4YBjSzFg), password: l8em
- [Google Drive (testing datasets in **LMDB** format)](https://drive.google.com/open?id=1NAs78a38xkl1MhodoD7BM0Lh3v_sFwYs)
- [OneDrive (testing datasets in **LMDB** format)](https://1drv.ms/f/s!Am3wqyDHs7r0hkHUYy0edaC2UC3c)

The raw pictures of the testing datasets can be found [here](https://github.com/chengzhanzhan/STR).
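To my understanding, the CRNN conversion tool stores samples under 1-indexed `image-%09d` / `label-%09d` keys plus a `num-samples` counter. A stdlib-only sketch of that key/value layout (the helper name is made up; the real `create_dataset.py` writes the same pairs into an `lmdb` environment via `txn.put`):

```python
def build_lmdb_cache(samples):
    """Build the key/value layout commonly used by CRNN-style LMDB
    datasets: 1-indexed b'image-%09d' / b'label-%09d' keys plus a
    b'num-samples' entry. Returns a plain dict for illustration;
    samples is a list of (image_bytes, label_str) pairs."""
    cache = {}
    for i, (image_bytes, label) in enumerate(samples, start=1):
        cache[b'image-%09d' % i] = image_bytes
        cache[b'label-%09d' % i] = label.encode('utf-8')
    cache[b'num-samples'] = str(len(samples)).encode('utf-8')
    return cache
```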
## Training and Testing
Modify the path to dataset folder in `train_MORAN.sh`:
```bash
--train_nips path_to_dataset \
--train_cvpr path_to_dataset \
--valroot path_to_dataset \
```

Then start training (manually decrease the learning rate for your task):
```bash
sh train_MORAN.sh
```
- The training process should take **less than 20s** for 100 iterations on a 1080Ti.

## Demo
Download the model parameter file `demo.pth`.
- [BaiduCloud](https://pan.baidu.com/s/1TqZfvoEhyv57yf4YBjSzFg) (password: l8em)
- [Google Drive](https://drive.google.com/file/d/1IDvT51MXKSseDq3X57uPjOzeSYI09zip/view?usp=sharing)
- [OneDrive](https://1drv.ms/u/s!Am3wqyDHs7r0hkAl0AtRIODcqOV3)

Put it into the root folder. Then execute `demo.py` for more visualizations.
```bash
python demo.py
```

![](demo/demo.png)
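Under the hood, the demo resizes the input to a small grayscale crop and scales its pixels before feeding the network; CRNN/MORAN-style recognizers commonly take 32x100 inputs normalized to [-1, 1]. A minimal sketch of that normalization step (an assumption about the preprocessing; check `demo.py` for the exact transform):

```python
def normalize_pixels(gray_pixels):
    """Scale 8-bit grayscale values (0..255) to the [-1, 1] range
    commonly fed to CRNN/MORAN-style recognizers: x / 127.5 - 1.
    Illustrative only; the actual demo.py transform may differ."""
    return [p / 127.5 - 1.0 for p in gray_pixels]
```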
## Citation
```
@article{cluo2019moran,
author = {Canjie Luo and Lianwen Jin and Zenghui Sun},
title = {MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition},
journal = {Pattern Recognition},
volume = {90},
pages = {109--118},
year = {2019},
publisher = {Elsevier}
}
```

## Acknowledgment
This repo is developed based on [@Jieru Mei's](https://github.com/meijieru) [crnn.pytorch](https://github.com/meijieru/crnn.pytorch) and [@marvis'](https://github.com/marvis) [ocr_attention](https://github.com/marvis/ocr_attention). Thanks for your contributions.

## Attention
This project is free for academic research purposes only.