https://github.com/Saeed-Biabani/Scene-Text-Recognition
Text recognition (optical character recognition) with deep learning methods in farsi.
crnn deep-learning farsi ocr persian persian-ocr python pytorch text-recognition
- Host: GitHub
- URL: https://github.com/Saeed-Biabani/Scene-Text-Recognition
- Owner: Saeed-Biabani
- License: mit
- Created: 2024-03-17T18:34:34.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-07-11T17:17:51.000Z (4 months ago)
- Last Synced: 2024-07-11T19:51:19.380Z (4 months ago)
- Topics: crnn, deep-learning, farsi, ocr, persian, persian-ocr, python, pytorch, text-recognition
- Language: Python
- Homepage:
- Size: 575 KB
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Scene Text Recognition
Scene Text Recognition With Deep Learning Methods In Farsi.
#### **Quick Links**
- [Dependencies](#Dependencies)
- [Getting Started](#Getting-Started)
- [Overview](#Overview)
- [Training](#Training)
- [Samples](#Samples)
- [References](#References)
- [License](#License)

## Dependencies
- Install Dependencies `$ pip install -r requirements.txt`
- Download Pretrained Weights [Here](https://huggingface.co/ordaktaktak/Scene-Text-Recognition)

## Getting Started
Fig. 1: Model architecture.

- Project Structure
```
.
├── src
│   ├── nn
│   │   ├── feature_extractor.py
│   │   ├── layers.py
│   │   └── ocr_model.py
│   └── utils
│       ├── dataset.py
│       ├── labelConverter.py
│       ├── loss_calculator.py
│       ├── misc.py
│       ├── trainUtils.py
│       └── transforms.py
├── config.py
└── train.py
```

- Set the dataset paths in `config.py`:
```python
ds_path = {
    "train_ds": "path/to/train/dataset",
    "test_ds": "path/to/test/dataset",
}
```

- Dataset Structure (each image must contain a single word)
```
.
├── Images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   └── img_5.jpg
│   ...
└── labels.json
```

- `labels.json` Contents
```json
{"img_1": "بالا", "img_2": "و", "img_3": "بدانند", "img_4": "چندین", "img_5": "به", ...}
```
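
The layout above can be read into (image path, label) pairs with a few lines of Python. This is a minimal sketch, not the code in `src/utils/dataset.py`: the function name `load_samples` and the `image_ext` parameter are hypothetical, and it assumes every key in `labels.json` is an image file name without its extension.

```python
import json
from pathlib import Path


def load_samples(ds_root, image_ext=".jpg"):
    """Return (image_path, label) pairs for a dataset laid out as shown above.

    Hypothetical helper: assumes <ds_root>/Images/<key><image_ext> exists
    for each key in <ds_root>/labels.json.
    """
    root = Path(ds_root)
    with open(root / "labels.json", encoding="utf-8") as f:
        labels = json.load(f)  # maps image stem -> word label

    samples = []
    for stem, word in labels.items():
        image_path = root / "Images" / f"{stem}{image_ext}"
        if image_path.is_file():  # skip labels whose image is missing
            samples.append((image_path, word))
    return samples
```

Calling `load_samples("path/to/train/dataset")` would then give the pairs a dataset class could iterate over. Reading the file with `encoding="utf-8"` matters here because the labels are Farsi text.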
## Overview
## Training
### Objective Function
Denote the training dataset by $TD = \{(X_i, Y_i)\}$, where $X_i$ is a training image and $Y_i$ is its word label. Training is conducted by minimizing the objective function, which is the negative log-likelihood of the conditional probability of the word label:
```math
O = -\sum_{(X_i, Y_i) \in TD} \log P(Y_i|X_i)
```
This function calculates a cost from an image and its word label, and the modules in the framework are trained in an end-to-end manner.
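
As a small numeric illustration of the objective, the sketch below sums the negative log-probabilities that a model assigns to the correct labels. The function name `objective` and the example probabilities are made up for illustration; they are not taken from the repository.

```python
import math


def objective(label_probs):
    """Negative log-likelihood O = -sum_i log P(Y_i | X_i).

    `label_probs` holds P(Y_i | X_i) for each training pair
    (hypothetical values, each in the interval (0, 1]).
    """
    return -sum(math.log(p) for p in label_probs)


# Illustrative probabilities for three training pairs.
confident = objective([0.9, 0.8, 0.95])
uncertain = objective([0.5, 0.4, 0.6])
print(confident < uncertain)  # higher probabilities give a lower cost
```

The cost is zero only when every correct label receives probability 1, and it grows as the model assigns less probability to the correct labels.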
Fig. 1: Model Training History.

### CTC Loss
CTC takes a sequence $H = h_1, \ldots, h_T$, where $T$ is the sequence length, and outputs the probability of a path $\pi$, which is defined as
```math
P(\pi|H) = \prod_{t = 1}^T y_{{\pi}_t}^t
```
where $y_{\pi_t}^t$ is the probability of generating character $\pi_t$ at time step $t$.
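
The path probability can be sketched in plain Python. The code below is a brute-force illustration, not the repository's `loss_calculator.py`: it assumes `y[t][c]` is the probability of class `c` at time step `t` and that index `0` is the CTC blank. The collapsing rule (merge repeats, then drop blanks) and the sum over paths come from the standard CTC formulation and are not described in the text above.

```python
from itertools import product
from math import prod

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def path_probability(y, path):
    """P(pi | H): product over time steps t of y[t][pi_t]."""
    return prod(y[t][c] for t, c in enumerate(path))


def collapse(path, blank=BLANK):
    """Map a path to a label: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for c in path:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return tuple(out)


def label_probability(y, label, blank=BLANK):
    """Sum P(pi | H) over every path that collapses to `label`.

    Enumerates all paths, so it is only usable for tiny examples.
    """
    num_classes = len(y[0])
    return sum(
        path_probability(y, path)
        for path in product(range(num_classes), repeat=len(y))
        if collapse(path, blank) == tuple(label)
    )


# Two time steps, two classes (blank and one character); values are made up.
y = [[0.6, 0.4], [0.5, 0.5]]
p_path = path_probability(y, (1, 1))
p_label = label_probability(y, (1,))
```

Enumerating every path costs exponential time in $T$. Practical implementations such as `torch.nn.CTCLoss` compute the same sum with dynamic programming and return its negative logarithm.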

| Model | Input Size | Recall | Precision | F1 | Params | Speed (img/s) |
| --- | --- | --- | --- | --- | --- | --- |
| OCR-Base | $1 \times 64 \times 192$ | 0.993 | 0.997 | 0.997 | 35,023,143 | 89.24 |

## Samples
## References
- [What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis](https://arxiv.org/abs/1904.01906)
- [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/abs/1507.05717)
- [Text recognition (optical character recognition) with deep learning methods, ICCV 2019](https://github.com/clovaai/deep-text-recognition-benchmark)

## 🛡️ License
This project is distributed under the [MIT License](https://github.com/Saeed-Biabani/Scene-Text-Recognition/blob/main/LICENSE).