https://github.com/Saeed-Biabani/Scene-Text-Recognition

Text recognition (optical character recognition) with deep learning methods in farsi.

crnn deep-learning farsi ocr persian persian-ocr python pytorch text-recognition



Scene Text Recognition

Scene Text Recognition With Deep Learning Methods In Farsi.

#### **Quick Links**
- [Dependencies](#Dependencies)
- [Getting Started](#Getting-Started)
- [Overview](#Overview)
- [Training](#Training)
- [Samples](#Samples)
- [References](#References)
- [License](#License)

## Dependencies
- Install Dependencies `$ pip install -r requirements.txt`
- Download Pretrained Weights [Here](https://huggingface.co/ordaktaktak/Scene-Text-Recognition)

## Getting Started



Fig. 1: Model architecture.

- Project Structure
```
.
├── src
│   ├── nn
│   │   ├── feature_extractor.py
│   │   ├── layers.py
│   │   └── ocr_model.py
│   └── utils
│       ├── dataset.py
│       ├── labelConverter.py
│       ├── loss_calculator.py
│       ├── misc.py
│       ├── trainUtils.py
│       └── transforms.py
├── config.py
└── train.py
```

- Set the train and test dataset paths in `config.py`.
```python
ds_path = {
    "train_ds": "path/to/train/dataset",
    "test_ds": "path/to/test/dataset",
}
```

- Dataset structure (each image should contain a single word)
```
.
├── Images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   ├── img_5.jpg
│   └── ...
└── labels.json
```

- `labels.json` Contents
```json
{"img_1": "بالا", "img_2": "و", "img_3": "بدانند", "img_4": "چندین", "img_5": "به", ...}
```
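
The dataset layout above can be read with a few lines of standard-library Python. This is only a minimal sketch: `load_samples` is a hypothetical helper, not the repository's `dataset.py`, and it assumes that every key in `labels.json` is an image file name without its `.jpg` extension.

```python
import json
from pathlib import Path


def load_samples(ds_root):
    """Pair every image under <ds_root>/Images with its word label.

    Assumes labels.json maps an image stem (e.g. "img_1") to its word.
    """
    root = Path(ds_root)
    with open(root / "labels.json", encoding="utf-8") as f:
        labels = json.load(f)

    samples = []
    for img_path in sorted((root / "Images").glob("*.jpg")):
        word = labels.get(img_path.stem)
        if word is None:
            # Skip images that have no label instead of failing.
            continue
        samples.append((img_path, word))
    return samples
```

Calling `load_samples("path/to/train/dataset")` would return a list of `(image_path, word)` pairs. Opening `labels.json` with `encoding="utf-8"` matters here because the labels are Farsi text.
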
## Overview



## Training

### Objective Function
Denote the training dataset by $\ TD = \lbrace (X_i , Y_i) \rbrace$, where $\ X_i$ is a training image and $\ Y_i$ is its word label. Training minimizes the objective function, which is the negative log-likelihood of the conditional probability of the word label:
```math
O = -\sum_{(X_i, Y_i) \in TD} \log P(Y_i|X_i)
```
This function computes a cost from an image and its word label, and all modules in the framework are trained in an end-to-end manner.
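
As a rough numeric illustration of the objective, the sketch below evaluates $\ O$ for a handful of label probabilities. The function name `objective` and the probability values are made up for this example; in the actual training loop the probabilities come from the model.

```python
import math


def objective(label_probs):
    """Negative log-likelihood O = -sum_i log P(Y_i | X_i).

    label_probs holds one value P(Y_i | X_i) per training pair, i.e. the
    probability the model assigns to the correct word label of image X_i.
    """
    return -sum(math.log(p) for p in label_probs)


# Toy values (made up for illustration, not model outputs).
confident = objective([0.9, 0.8, 0.95])
uncertain = objective([0.5, 0.4, 0.6])
print(f"confident: {confident:.4f}  uncertain: {uncertain:.4f}")
```

The objective is smaller when the model assigns higher probability to the correct labels, and it reaches zero only when every label has probability 1.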



Fig. 2: Model training history.

### CTC Loss
CTC takes a sequence $\ H = h_1 , \dots , h_T$, where $\ T$ is the sequence length, and outputs the probability of a path $\ \pi = \pi_1 , \dots , \pi_T$ (one character or blank per time step), which is defined as
```math
P(\pi|H) = \prod_{t = 1}^T y_{{\pi}_t}^t
```
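where $\ y_{{\pi}_t}^t$ is the probability of generating character $\ \pi_t$ at time step $\ t$. The probability of a word label $\ Y$ is then the sum of $\ P(\pi|H)$ over all paths $\ \pi$ that map to $\ Y$ once repeated characters are merged and blanks are removed.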
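
The path probability can be checked by hand on a tiny example. The sketch below is a brute-force illustration in plain Python, not the repository's `loss_calculator.py`: the blank index `BLANK = 0`, the helper names, and the numbers in `y` are all assumptions made for this example. It also shows the standard CTC step of summing over every path that collapses to the same label.

```python
from itertools import product

BLANK = 0  # assumed index of the CTC blank symbol


def path_probability(y, path):
    """P(pi | H) = prod_t y[t][pi_t], with y[t][k] the probability of class k at step t."""
    prob = 1.0
    for t, k in enumerate(path):
        prob *= y[t][k]
    return prob


def collapse(path):
    """Map a path to a label: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return tuple(out)


def label_probability(y, label):
    """Sum P(pi | H) over every path that collapses to `label` (brute force)."""
    num_steps, num_classes = len(y), len(y[0])
    return sum(
        path_probability(y, path)
        for path in product(range(num_classes), repeat=num_steps)
        if collapse(path) == tuple(label)
    )


# Toy per-step distributions over {blank, class 1, class 2} for T = 3 (made-up numbers).
y = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.1, 0.3],
]
print(path_probability(y, (1, 1, 0)))  # a single path that collapses to (1,)
print(label_probability(y, [1]))       # every path that collapses to (1,)
```

Enumerating all paths grows exponentially with $\ T$, so this is only suitable for toy inputs. Practical implementations, such as PyTorch's `torch.nn.CTCLoss`, compute the same sum with dynamic programming.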




| Model | Input Size | Recall | Precision | F1 | Params | Speed (img/s) |
| --- | --- | --- | --- | --- | --- | --- |
| OCR-Base | 1 × 64 × 192 | 0.993 | 0.997 | 0.997 | 35,023,143 | 89.24 |


## Samples



## References
- [What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis](https://arxiv.org/abs/1904.01906)
- [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/abs/1507.05717)
- [Text recognition (optical character recognition) with deep learning methods, ICCV 2019](https://github.com/clovaai/deep-text-recognition-benchmark)

## 🛡️ License
This project is distributed under the [MIT License](https://github.com/Saeed-Biabani/Scene-Text-Recognition/blob/main/LICENSE).