https://github.com/annndruha/ocr-munji
Text detection of printed text in Munji language.
lingustics ocr ocr-python ocr-recognition
- Host: GitHub
- URL: https://github.com/annndruha/ocr-munji
- Owner: annndruha
- Archived: true
- Created: 2023-01-28T21:03:05.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-02-13T22:42:52.000Z (about 3 years ago)
- Last Synced: 2025-03-22T02:42:26.313Z (12 months ago)
- Topics: lingustics, ocr, ocr-python, ocr-recognition
- Language: Python
- Homepage:
- Size: 9.23 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
## OCR-Munji
Munji language text detection.
The detector was created for the printed text of A. L. Grünberg's book "Мунджанский язык Тексты" ("The Munji Language: Texts").

### Algorithm
The detector builds on Google Cloud Vision text detection and adds heuristics that recognize the characters of the Munji language. These heuristics include correlating special characters or signs and replacing some of the letters returned by Google text detection (see `detector/mapping.py`).
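The letter-replacement idea can be illustrated with a minimal sketch. The table below is hypothetical (it maps a few Cyrillic look-alike letters to Latin ones) and is not taken from `detector/mapping.py`, whose actual contents may differ entirely; `REPLACEMENTS` and `normalize` are names made up for this example.

```python
# Hypothetical replacement table: Cyrillic look-alikes -> Latin letters.
# The real rules live in detector/mapping.py and may be quite different.
REPLACEMENTS = {
    "\u0430": "a",  # Cyrillic "а" -> Latin "a"
    "\u0435": "e",  # Cyrillic "е" -> Latin "e"
    "\u043e": "o",  # Cyrillic "о" -> Latin "o"
    "\u0441": "c",  # Cyrillic "с" -> Latin "c"
    "\u0445": "x",  # Cyrillic "х" -> Latin "x"
}
_TABLE = str.maketrans(REPLACEMENTS)


def normalize(text: str) -> str:
    """Replace every character found in the table; leave the rest untouched."""
    return text.translate(_TABLE)
```

A one-to-one table like this only covers single-character substitutions; heuristics that combine a base letter with a nearby sign would need position information from the OCR response.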
### Usage
#### Step 0
Install the requirements:
```commandline
pip install -r requirements.txt
```
#### Step 1
To use Google Cloud Vision, obtain [GOOGLE_APPLICATION_CREDENTIALS](https://cloud.google.com/vision/docs/detect-labels-image-client-libraries#before-you-begin) and set the corresponding environment variable.
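Before running the detector, it can help to confirm that the variable is set and points to an existing file. The helper below is a small sketch for that check; `credentials_path` is a hypothetical name and is not part of this repository.

```python
import os
from pathlib import Path
from typing import Mapping


def credentials_path(env: Mapping[str, str] = os.environ) -> Path:
    """Return the credentials file path, or raise if it is missing."""
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    return path
```

Calling `credentials_path()` with no arguments checks the current process environment.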
#### Step 2
Get the Google Cloud Vision text detection response for an image:
```commandline
python detector/google_ocr.py --path tests/page148/img.png
```
If the command succeeds, the response is saved as a `.pickle` file.
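For orientation, this step can be sketched roughly as follows. This is not the code of `detector/google_ocr.py`: the function names `save_response` and `detect_with_vision` are hypothetical, and it is an assumption that the script calls `text_detection` and pickles the whole response object. The Vision call is kept separate from the file handling so the latter can be exercised without network access.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def detect_with_vision(content: bytes) -> Any:
    """Send image bytes to Cloud Vision (needs GOOGLE_APPLICATION_CREDENTIALS)."""
    # Imported lazily so the rest of the module works without the client library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    response = client.text_detection(image=vision.Image(content=content))
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response


def save_response(image_path: str, detect: Callable[[bytes], Any]) -> Path:
    """Run `detect` on the image and pickle its result next to the image."""
    path = Path(image_path)
    response = detect(path.read_bytes())
    out_path = path.with_suffix(".pickle")  # img.png -> img.pickle
    with out_path.open("wb") as fh:
        pickle.dump(response, fh)
    return out_path
```

With credentials configured, `save_response("tests/page148/img.png", detect_with_vision)` would write `tests/page148/img.pickle`.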
#### Step 3
Get the Munji text from the image and the Google response:
```commandline
python -m detector tests/page148/img.png tests/page148/img.pickle
```
or simply
```commandline
python -m detector tests/page148/img.png
```
if the response is located in the same directory and has the same filename as the image.
**Result**
The detected text is saved to a `.txt` file next to the `.pickle` file.
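The file-naming convention described in Step 3 can be summarized with a short sketch. `resolve_paths` is a hypothetical helper, not the detector's own code, and it assumes that the `.txt` output shares the base name of the `.pickle` file.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_paths(image: str, response: Optional[str] = None) -> Tuple[Path, Path]:
    """Return (response_path, text_path) for an image, following the defaults above."""
    image_path = Path(image)
    # Default: same directory and base name as the image, with a .pickle suffix.
    response_path = Path(response) if response else image_path.with_suffix(".pickle")
    # Assumption: the text output sits next to the .pickle file with a .txt suffix.
    text_path = response_path.with_suffix(".txt")
    return response_path, text_path
```

For `tests/page148/img.png` this yields `tests/page148/img.pickle` and `tests/page148/img.txt`.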
### Tested on
| Component           | Version  |
|---------------------|----------|
| Windows             | 11       |
| Python              | 3.11     |
| pip                 | 23.0     |
| numpy               | 1.24.1   |
| opencv-python       | 4.7.0.68 |
| google              | 3.0.0    |
| google-cloud-vision | 3.3.1    |