Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/Saeed-Biabani/Scene-Text-Recognition

Text recognition (optical character recognition) with deep learning methods in farsi.

crnn deep-learning farsi ocr persian persian-ocr python pytorch text-recognition

Last synced: 3 months ago


Host: GitHub
URL: https://github.com/Saeed-Biabani/Scene-Text-Recognition
Owner: Saeed-Biabani
License: mit
Created: 2024-03-17T18:34:34.000Z (8 months ago)
Default Branch: main
Last Pushed: 2024-07-11T17:17:51.000Z (4 months ago)
Last Synced: 2024-07-11T19:51:19.380Z (4 months ago)
Topics: crnn, deep-learning, farsi, ocr, persian, persian-ocr, python, pytorch, text-recognition
Language: Python
Homepage:
Size: 575 KB
Stars: 7
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

        


  
Scene Text Recognition


Scene Text Recognition With Deep Learning Methods In Farsi.

#### **Quick Links**

- [Dependencies](#Dependencies)

- [Getting Started](#Getting-Started)

- [Overview](#Overview)

- [Training](#Training)

- [Samples](#Samples)

- [References](#References)

- [License](#License)

## Dependencies

- Install the dependencies: `pip install -r requirements.txt`

- Download the pretrained weights [here](https://huggingface.co/ordaktaktak/Scene-Text-Recognition)

## Getting Started



  


  Fig. 1: Model architecture.


- Project Structure

```
.
├── src
│   ├── nn
│   │   ├── feature_extractor.py
│   │   ├── layers.py
│   │   └── ocr_model.py
│   └── utils
│       ├── dataset.py
│       ├── labelConverter.py
│       ├── loss_calculator.py
│       ├── misc.py
│       ├── trainUtils.py
│       └── transforms.py
├── config.py
└── train.py
```

- Set the dataset paths in the `config.py` file.

```python
ds_path = {
    "train_ds" : "path/to/train/dataset",
    "test_ds" : "path/to/test/dataset",
}
```

- Dataset structure (each image must contain a single word)

```
.
├── Images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   └── img_5.jpg
│   ...
└── labels.json
```

- `labels.json` Contents

```json
{"img_1": "بالا", "img_2": "و", "img_3": "بدانند", "img_4": "چندین", "img_5": "به", ...}
```
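
A minimal sketch of how the layout above could be read, assuming each key in `labels.json` is an image file name without its `.jpg` extension. The helper name `load_samples` is hypothetical and is not taken from the repository's `src/utils/dataset.py`.

```python
import json
from pathlib import Path


def load_samples(ds_root):
    """Return (image_path, word) pairs for a dataset directory.

    Assumes an `Images` folder next to `labels.json`, with each JSON key
    equal to an image file name minus the `.jpg` suffix.
    """
    root = Path(ds_root)
    with open(root / "labels.json", encoding="utf-8") as f:
        labels = json.load(f)

    samples = []
    for stem, word in labels.items():
        image_path = root / "Images" / f"{stem}.jpg"
        if image_path.is_file():  # skip labels whose image is missing
            samples.append((image_path, word))
    return samples
```

With the paths from `config.py`, a call might look like `load_samples(ds_path["train_ds"])`.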

## Overview



  



## Training

### Objective Function

Denote the training dataset by $TD = \langle X_i, Y_i \rangle$, where $X_i$ is a training image and $Y_i$ is its word label. Training minimizes the objective function, which is the negative log-likelihood of the conditional probability of the word label.

```math

O = -\sum_{(X_i, Y_i) \in TD} \log P(Y_i|X_i)

```

This function calculates a cost from an image and its word label, and the modules in the framework are trained in an end-to-end manner.
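
As a small numeric illustration of the objective, the sketch below sums the negative log-probabilities that a model assigns to the correct labels. The function name `objective` and the example probabilities are made up for illustration; during training, $P(Y_i|X_i)$ would come from the network.

```python
import math


def objective(label_probs):
    """Negative log-likelihood O = -sum(log P(Y_i | X_i)) over the dataset.

    `label_probs` holds P(Y_i | X_i) for each training pair (hypothetical values).
    """
    return -sum(math.log(p) for p in label_probs)


# A model that assigns high probability to the correct labels gets a lower cost.
confident = objective([0.9, 0.8, 0.95])
uncertain = objective([0.5, 0.4, 0.6])
print(confident < uncertain)
```

The cost is zero only when every label receives probability 1, and it grows as the model becomes less certain about the correct words.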



  


  Fig. 2: Model training history.


### CTC Loss

CTC takes a sequence $H = h_1, \ldots, h_T$, where $T$ is the sequence length, and outputs the probability of a path $\pi$, which is defined as

```math

P(\pi|H) = \prod_{t = 1}^T y_{{\pi}_t}^t

```

where $y_{\pi_t}^t$ is the probability of generating character $\pi_t$ at time step $t$.
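
The path probability can be sketched in a few lines. This only illustrates the product formula and is not the repository's loss code; `path_probability`, the three-symbol alphabet, and the probability values are hypothetical.

```python
import math


def path_probability(y, path):
    """P(pi | H) = product over t of y[t][pi_t].

    y    : list of T rows, each a probability distribution over the alphabet
    path : list of T symbol indices (pi_1, ..., pi_T)
    """
    if len(y) != len(path):
        raise ValueError("path must have one symbol per time step")
    return math.prod(y[t][symbol] for t, symbol in enumerate(path))


# Hypothetical per-step distributions over 3 symbols, T = 3.
# Treating index 0 as the CTC blank is an assumption made for this example.
y = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
print(path_probability(y, [1, 0, 2]))  # multiplies 0.7 * 0.6 * 0.7
```

The full CTC loss sums this quantity over every path that maps to the target label; that step is omitted here.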



  

    

| Model | Input Size | Recall | Precision | F1 | Params | Speed (img/s) |
| --- | --- | --- | --- | --- | --- | --- |
| OCR-Base | $1 \times 64 \times 192$ | 0.993 | 0.997 | 0.997 | 35,023,143 | 89.24 |

    

   



## Samples



  



## References

- [What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis](https://arxiv.org/abs/1904.01906)

- [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/abs/1507.05717)

- [Text recognition (optical character recognition) with deep learning methods, ICCV 2019](https://github.com/clovaai/deep-text-recognition-benchmark)

## 🛡️ License 

This project is distributed under the [MIT License](https://github.com/Saeed-Biabani/Scene-Text-Recognition/blob/main/LICENSE).