https://github.com/zabir-nabil/autoocr

Python wrapper for the cross-platform Tesseract OCR engine, with support for multiple languages (e.g. Bangla)

bangla-ocr image-to-text multi-language-ocr ocr python-ocr tesseract

Host: GitHub
URL: https://github.com/zabir-nabil/autoocr
Owner: zabir-nabil
License: mit
Created: 2019-05-12T03:57:54.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2023-01-30T08:59:32.000Z (about 3 years ago)
Last Synced: 2025-06-12T11:04:23.506Z (9 months ago)
Topics: bangla-ocr, image-to-text, multi-language-ocr, ocr, python-ocr, tesseract
Language: Python
Homepage: https://pypi.org/project/autoocr/
Size: 1.13 MB
Stars: 17
Watchers: 1
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# autoocr

> A Python wrapper for the cross-platform Tesseract OCR engine, with support for multiple languages (e.g. Bangla)

## Installation

```
pip3 install autoocr
```

## Usage

### macOS

* Import the library

```
from autoocr import AutoOCR  # import the AutoOCR class
```

* Specify the language

```
oa = AutoOCR(lang='bangla')  # specify the language code
```

* Set the tessdata folder. On macOS, run `brew list tesseract` to find the path. This is only needed once.

```
oa.set_datapath('/usr/local/Cellar/tesseract/4.0.0_1/share/tessdata')
```

* Get the text from an image by passing the path to the image file

```
out_text = oa.get_text('image_ocr.jpg')
```

[![demo of autoocr on mac](demo.gif)](https://www.youtube.com/channel/UCVaObCskAlvvctDP9vZvW6w)
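
If you would rather not read the `brew list tesseract` output by eye, the tessdata folder can be picked out of it programmatically. The following is a minimal sketch, not part of autoocr: `find_tessdata_dir` is a hypothetical helper, and the sample listing is illustrative rather than real output from your machine.

```python
from pathlib import PurePosixPath
from typing import Optional


def find_tessdata_dir(listing: str) -> Optional[str]:
    """Return the first directory named 'tessdata' found in a file listing.

    Hypothetical helper (not part of autoocr). Assumes one path per line
    and no spaces inside the path itself.
    """
    for line in listing.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Keep only the path; drop any trailing summary such as "(35 files)".
        parts = PurePosixPath(tokens[0]).parts
        if "tessdata" in parts:
            idx = parts.index("tessdata")
            return str(PurePosixPath(*parts[: idx + 1]))
    return None


if __name__ == "__main__":
    # Illustrative listing; on a real machine, capture the command output instead.
    sample = (
        "/usr/local/Cellar/tesseract/4.0.0_1/bin/tesseract\n"
        "/usr/local/Cellar/tesseract/4.0.0_1/share/tessdata/ (35 files)\n"
    )
    print(find_tessdata_dir(sample))
```

The returned string is what you would pass to `oa.set_datapath(...)`.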

### Windows

* Install the Tesseract engine

* Import the library

```
from autoocr import AutoOCR  # import the AutoOCR class
```

* Specify the language

```
oa = AutoOCR(lang='bangla')  # specify the language code
```

* Set the tessdata folder. This is only needed once.

```
oa.set_datapath('/path/to/tessdata')
```

* Get the text from an image by passing the path to the image file

```
out_text = oa.get_text('image_ocr.jpg')
```
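
Windows paths contain backslashes, which Python treats as escape characters in ordinary string literals (for example, `\t` becomes a tab). A minimal sketch of writing the tessdata path safely is shown below. The install location and the `to_forward_slashes` helper are assumptions for illustration, and the README does not say whether autoocr prefers backslashes or forward slashes.

```python
from pathlib import PureWindowsPath

# Hypothetical install location; substitute the folder your installer used.
# The r"" prefix keeps every backslash literal.
raw_path = r"C:\Program Files\Tesseract-OCR\tessdata"


def to_forward_slashes(windows_path: str) -> str:
    """Return the same Windows path written with forward slashes."""
    return PureWindowsPath(windows_path).as_posix()


datapath = to_forward_slashes(raw_path)
print(datapath)
```

Either `raw_path` or `datapath` can then be tried as the argument to `oa.set_datapath(...)`.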

### Linux

* Install the Tesseract engine by following the [tesseract-ocr](https://tesseract-ocr.github.io/) documentation

* Import the library

```
from autoocr import AutoOCR  # import the AutoOCR class
```

* Specify the language

```
oa = AutoOCR(lang='bangla')  # specify the language code
```

* Set the tessdata folder. This is only needed once. On yum-based distributions, run `rpm -ql tesseract` to find the location.

```
oa.set_datapath('/path/to/tessdata')
```

* Get the text from an image by passing the path to the image file

```
out_text = oa.get_text('image_ocr.jpg')
```
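
Before pointing autoocr at a tessdata folder, it can help to confirm which language models the folder actually contains. Tesseract stores each model as a `<code>.traineddata` file (its own file for Bengali is `ben.traineddata`). The sketch below only lists those files. `available_languages` is a hypothetical helper, not part of autoocr, and how `lang='bangla'` maps onto a file name is not documented in this README.

```python
from pathlib import Path
from typing import List


def available_languages(tessdata_dir: str) -> List[str]:
    """List the language codes that have a .traineddata file in tessdata_dir.

    Hypothetical helper (not part of autoocr).
    """
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


if __name__ == "__main__":
    # Placeholder path from the step above; replace it with your own folder.
    print(available_languages("/path/to/tessdata"))
```

If the language you need is missing from the list, install the corresponding Tesseract language data before calling `get_text`.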

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

[![MIT License](https://opensource.org/files/CDPost.png)](https://opensource.org/)
