https://github.com/zabir-nabil/autoocr
Python wrapper for cross platform tesseract OCR engine with multiple languages (e.g. Bangla)
- Host: GitHub
- URL: https://github.com/zabir-nabil/autoocr
- Owner: zabir-nabil
- License: mit
- Created: 2019-05-12T03:57:54.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-01-30T08:59:32.000Z (over 2 years ago)
- Last Synced: 2025-06-12T11:04:23.506Z (4 months ago)
- Topics: bangla-ocr, image-to-text, multi-language-ocr, ocr, python-ocr, tesseract
- Language: Python
- Homepage: https://pypi.org/project/autoocr/
- Size: 1.13 MB
- Stars: 17
- Watchers: 1
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# autoocr
> A Python wrapper for the cross-platform Tesseract OCR engine, supporting multiple languages (e.g. Bangla).

## Installation
```
pip3 install autoocr
```

## Usage
### macOS
* Import the library
```
from autoocr import AutoOCR # import the AutoOCR class
```

* Specify the language
```
oa = AutoOCR(lang='bangla') # specify the language code
```
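Tesseract keeps one model file per language in its tessdata folder, named `<code>.traineddata`; the Bengali model, for example, is `ben.traineddata`. This README does not say how `AutoOCR` maps `lang='bangla'` to a Tesseract code, so the sketch below only assumes that the Bengali model has to be present as `ben`. The helpers `available_languages` and `has_language` are hypothetical and not part of autoocr; they just inspect a folder before you start OCR.

```python
from pathlib import Path
from typing import List


def available_languages(tessdata_dir: str) -> List[str]:
    """Return the model codes found in a tessdata folder (hypothetical helper).

    Each model is stored as "<code>.traineddata", e.g. "ben.traineddata".
    Non-language models such as "osd" (orientation/script detection) are
    listed too, because they use the same file naming.
    """
    folder = Path(tessdata_dir)
    # Path.stem strips the ".traineddata" suffix, leaving only the code.
    return sorted(p.stem for p in folder.glob("*.traineddata"))


def has_language(tessdata_dir: str, code: str) -> bool:
    """True if the model file for `code` (e.g. "ben") exists in the folder."""
    return (Path(tessdata_dir) / f"{code}.traineddata").is_file()
```

If `has_language('/path/to/tessdata', 'ben')` returns `False`, the Bengali model is probably not installed yet, and OCR in that language is unlikely to work until it is.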
* Set the tessdata folder. On macOS, run `brew list tesseract` to find its path. This is only needed once.
```
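# Example path: the version folder (4.0.0_1) and the Homebrew prefix
# (/usr/local on Intel Macs, /opt/homebrew on Apple Silicon) may differ.
oa.set_datapath('/usr/local/Cellar/tesseract/4.0.0_1/share/tessdata')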
```
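`brew list tesseract` prints the files the formula installed, and the tessdata folder is among them; Homebrew may collapse a directory into one line with a file count in parentheses after it. Instead of copying the path by hand, a small parser can pull it out of that output. This is a minimal sketch assuming one path per line; `tessdata_from_listing` is a hypothetical helper, not part of autoocr.

```python
from pathlib import PurePosixPath
from typing import Optional


def tessdata_from_listing(listing: str) -> Optional[str]:
    """Return the first tessdata directory found in a file listing.

    Assumes one path per line, as printed by `brew list tesseract`.
    A trailing summary such as " (12 files)" is ignored if present.
    """
    for line in listing.splitlines():
        # Drop a "(N files)" summary suffix and surrounding whitespace.
        path = line.split(" (")[0].strip()
        parts = PurePosixPath(path).parts
        if "tessdata" in parts:
            # Keep everything up to and including the "tessdata" component.
            cut = parts.index("tessdata") + 1
            return str(PurePosixPath(*parts[:cut]))
    return None
```

One way to feed it, assuming Homebrew is installed, is `listing = subprocess.run(["brew", "list", "tesseract"], capture_output=True, text=True, check=True).stdout`, followed by `oa.set_datapath(tessdata_from_listing(listing))`. Because it only looks for a `tessdata` path component, the same parser should also handle the one-path-per-line output of `rpm -ql tesseract` mentioned in the Linux section.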
* Get the text from an image by passing the path to the image
```
out_text = oa.get_text('image_ocr.jpg')
```
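The README shows `get_text` taking an image path and returning the recognized text. Assuming that is its whole contract, processing a folder of scans is just a loop over image files. In the sketch below, `ocr_folder` and `IMAGE_SUFFIXES` are hypothetical names; the OCR function is passed in as a callable, so the loop does not depend on any autoocr internals and can be tried with a stand-in function.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# File extensions treated as images (an assumption; adjust to what your
# Tesseract build can read).
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp")


def ocr_folder(
    folder: str,
    get_text: Callable[[str], str],
    suffixes: Iterable[str] = IMAGE_SUFFIXES,
) -> Dict[str, str]:
    """Run `get_text` on every image in `folder`; map file name -> text."""
    wanted = {s.lower() for s in suffixes}
    results: Dict[str, str] = {}
    # Sorting keeps the output order stable across runs.
    for image in sorted(Path(folder).iterdir()):
        if image.is_file() and image.suffix.lower() in wanted:
            results[image.name] = get_text(str(image))
    return results
```

With an `AutoOCR` instance configured as above, a call would look like `texts = ocr_folder('scans', oa.get_text)`.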
### Windows
* Install the Tesseract engine
* Import the library
```
from autoocr import AutoOCR # import the AutoOCR class
```

* Specify the language
```
oa = AutoOCR(lang='bangla') # specify the language code
```
* Set the tessdata folder. This is only needed once.
```
oa.set_datapath('/path/to/tessdata')
```
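On Windows there is no package listing to search, but two places usually point to the tessdata folder: the `TESSDATA_PREFIX` environment variable (assuming Tesseract 4 or later, where it names the tessdata folder itself) and a `tessdata` subfolder next to the `tesseract` executable, which is how a typical installer lays things out. The sketch below checks both; `find_tessdata_windows` is a hypothetical helper, and the folder layout is an assumption about your install rather than something autoocr guarantees.

```python
import os
import shutil
from typing import Mapping, Optional


def find_tessdata_windows(
    env: Optional[Mapping[str, str]] = None,
    search_path: Optional[str] = None,
) -> Optional[str]:
    """Guess the tessdata folder (hypothetical helper).

    1. Use TESSDATA_PREFIX if it is set and names an existing folder.
    2. Otherwise find the tesseract executable on PATH (or `search_path`)
       and look for a "tessdata" folder beside it (assumed installer layout).
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if prefix and os.path.isdir(prefix):
        return prefix
    # shutil.which honours PATHEXT on Windows, so "tesseract" finds tesseract.exe.
    exe = shutil.which("tesseract", path=search_path)
    if exe:
        candidate = os.path.join(os.path.dirname(exe), "tessdata")
        if os.path.isdir(candidate):
            return candidate
    return None
```

If it returns a path, pass it to `oa.set_datapath`; if it returns `None`, fall back to locating the folder by hand.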
* Get the text from an image by passing the path to the image
```
out_text = oa.get_text('image_ocr.jpg')
```

### Linux
* Install the Tesseract engine by following the [tesseract-ocr](https://tesseract-ocr.github.io/) documentation.
* Import the library
```
from autoocr import AutoOCR # import the AutoOCR class
```

* Specify the language
```
oa = AutoOCR(lang='bangla') # specify the language code
```
* Set the tessdata folder. This is only needed once. On yum-based distributions, run `rpm -ql tesseract` to find its location.
```
oa.set_datapath('/path/to/tessdata')
```
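`rpm -ql` only helps on RPM-based distributions. A more general fallback is to check a few typical locations for folders that actually contain `.traineddata` files. The patterns below are assumptions based on common packaging conventions and vary by distribution and Tesseract version, so treat them as a starting list; `find_tessdata_dirs` is a hypothetical helper, not part of autoocr.

```python
import glob
import os
from typing import List, Sequence

# Typical tessdata locations (assumptions; extend for your system).
DEFAULT_PATTERNS = (
    "/usr/share/tesseract-ocr/*/tessdata",  # Debian/Ubuntu style, e.g. .../4.00/tessdata
    "/usr/share/tesseract/tessdata",        # Fedora/RHEL style
    "/usr/share/tessdata",                  # Arch style
    "/usr/local/share/tessdata",            # default for builds from source
)


def find_tessdata_dirs(patterns: Sequence[str] = DEFAULT_PATTERNS) -> List[str]:
    """Return existing folders matching `patterns` that hold .traineddata files."""
    found: List[str] = []
    for pattern in patterns:
        for folder in sorted(glob.glob(pattern)):
            if not os.path.isdir(folder) or folder in found:
                continue
            # Escape the folder so characters like "[" are not read as wildcards.
            models = glob.glob(os.path.join(glob.escape(folder), "*.traineddata"))
            if models:
                found.append(folder)
    return found
```

If the list is non-empty, its first entry is a reasonable candidate for `oa.set_datapath`; if several folders show up, more than one Tesseract version may be installed.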
* Get the text from an image by passing the path to the image
```
out_text = oa.get_text('image_ocr.jpg')
```

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.