https://github.com/annndruha/ocr-munji

Text detection of printed text in Munji language.

lingustics ocr ocr-python ocr-recognition


Host: GitHub
URL: https://github.com/annndruha/ocr-munji
Owner: annndruha
Archived: true
Created: 2023-01-28T21:03:05.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-02-13T22:42:52.000Z (over 3 years ago)
Last Synced: 2025-03-22T02:42:26.313Z (over 1 year ago)
Topics: lingustics, ocr, ocr-python, ocr-recognition
Language: Python
Homepage:
Size: 9.23 MB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

## OCR-Munji

Munji language text detection.

The detector was built for the printed text of the book "Грюнберг А.Л. — Мунджанский язык Тексты" (A. L. Grünberg, *The Munji Language: Texts*).

![readme.png](readme.png)

### Algorithm

The detector is built on Google Cloud Vision text detection, followed by heuristics that recover the characters of the Munji language. These heuristics include correlating special characters or signs and replacing some of the letters returned by Google text detection (see `detector/mapping.py`).
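The repository's actual rules live in `detector/mapping.py` and are not reproduced here. As a rough illustration of the letter-replacement part of the heuristics, the sketch below applies a substitution table to text returned by Google text detection. `HYPOTHETICAL_MAPPING` and `apply_mapping` are invented names, and the character pairs are illustrative assumptions, not the project's real mapping.

```python
# Hypothetical sketch of letter-replacement post-processing.
# The real rules are in detector/mapping.py; the pairs below are illustrative only.
HYPOTHETICAL_MAPPING = {
    "c'": "č",  # two-character sequence collapsed into one letter
    "y": "γ",   # Latin letter mistaken for a similar-looking Greek one
    "8": "δ",   # digit mistaken for a letter
}


def apply_mapping(text: str, mapping: dict[str, str]) -> str:
    """Scan left to right, replacing the longest matching key at each position."""
    # Longer keys are tried first; empty keys are skipped to avoid an endless loop.
    keys = sorted((k for k in mapping if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mapping[key])
                i += len(key)
                break
        else:
            # No rule matched at this position: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(apply_mapping("c'ay8", HYPOTHETICAL_MAPPING))  # čaγδ
```

Trying longer keys first means a multi-character rule wins over any single-character rule that shares its prefix.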

### Usage

#### Step 0
Install the requirements:

```commandline
pip install -r requirements.txt
```

#### Step 1

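To use Google Cloud Vision, you need a credentials JSON file, typically a service account key (see [GOOGLE_APPLICATION_CREDENTIALS](https://cloud.google.com/vision/docs/detect-labels-image-client-libraries#before-you-begin)), and you must set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of that file.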
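Before running Step 2, it can help to fail early if the variable is missing or points to a nonexistent file. This is a minimal sketch using only the standard library; `check_credentials` is a hypothetical helper, not part of the repository.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def check_credentials(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the key-file path from GOOGLE_APPLICATION_CREDENTIALS, or raise."""
    # Read the real process environment unless a mapping is supplied (handy for tests).
    env = os.environ if env is None else env
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value)
    if not path.is_file():
        raise RuntimeError(f"Credentials file not found: {path}")
    return path
```

For example, set the variable with `export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json` (bash) or `$env:GOOGLE_APPLICATION_CREDENTIALS="C:\path\to\key.json"` (PowerShell), then call `check_credentials()`.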

#### Step 2

Request a Google Cloud Vision text detection response for an image:
```commandline
python detector/google_ocr.py --path tests/page148/img.png
```
If the command succeeds, the response is saved as a `.pickle` file.
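In outline, this step sends the image bytes to Cloud Vision and pickles the response. The sketch below is an assumption about the flow, not the code in `detector/google_ocr.py`. `save_response` and `google_text_detection` are hypothetical names. The output is assumed to go next to the image with a `.pickle` extension, which is the layout Step 3 expects by default. The response object is assumed to be picklable, as the repository's `.pickle` output implies. The detection function is passed in as a parameter so the file handling can run without network access.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def google_text_detection(content: bytes) -> Any:
    """Call Cloud Vision text detection (needs google-cloud-vision and credentials)."""
    # Imported lazily so the rest of the sketch loads without the library installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    response = client.text_detection(image=vision.Image(content=content))
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response


def save_response(
    image_path: str,
    detect: Callable[[bytes], Any] = google_text_detection,
) -> Path:
    """Run detection on the image and pickle the result next to it (assumed naming)."""
    image = Path(image_path)
    response = detect(image.read_bytes())
    out = image.with_suffix(".pickle")  # e.g. img.png -> img.pickle
    with out.open("wb") as f:
        pickle.dump(response, f)
    return out
```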

#### Step 3

Extract the Munji text from the image and the Google response:

```commandline
python -m detector tests/page148/img.png tests/page148/img.pickle
```

or simply
```commandline
python -m detector tests/page148/img.png
```
if the response is located in the same directory and has the same base filename as the image.

**Result**

The detected text is written to a `.txt` file next to the `.pickle` file.
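The default file locations used by Step 3 can be summarized with a small helper. This is a hypothetical sketch (`resolve_paths` is not part of the repository), and it assumes the `.txt` file shares the `.pickle` file's base name; the README only says the text file is written next to the `.pickle` file.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_paths(image: str, response: Optional[str] = None) -> Tuple[Path, Path]:
    """Return (pickle_path, txt_path) for a run of `python -m detector`."""
    image_path = Path(image)
    # Default: same directory and base filename as the image, with a .pickle extension.
    pickle_path = Path(response) if response else image_path.with_suffix(".pickle")
    # Assumption: the text output reuses the pickle's base name.
    txt_path = pickle_path.with_suffix(".txt")
    return pickle_path, txt_path


for p in resolve_paths("tests/page148/img.png"):
    print(p.as_posix())
# tests/page148/img.pickle
# tests/page148/img.txt
```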

### Tested on
| Component | Version |
|---------------------|----------|
| Windows | 11 |
| Python | 3.11 |
| pip | 23.0 |
| numpy | 1.24.1 |
| opencv-python | 4.7.0.68 |
| google | 3.0.0 |
| google-cloud-vision | 3.3.1 |
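To compare a local environment against this table, one option is to read installed package versions with the standard `importlib.metadata` module. This is only a sketch. `TESTED_VERSIONS`, `installed_version`, and `compare_versions` are hypothetical names, and the keys are assumed to be the PyPI distribution names listed above. The Windows and Python rows are not checked. A version difference does not necessarily mean the detector will fail.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Package versions from the "Tested on" table (assumed to be PyPI distribution names).
TESTED_VERSIONS = {
    "numpy": "1.24.1",
    "opencv-python": "4.7.0.68",
    "google": "3.0.0",
    "google-cloud-vision": "3.3.1",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def compare_versions(
    tested: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (tested, installed)."""
    mismatches = {}
    for name, want in tested.items():
        have = lookup(name)
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


for name, (want, have) in compare_versions(TESTED_VERSIONS).items():
    print(f"{name}: tested {want}, installed {have}")
```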
