https://github.com/codedotjs/pytesser-module
Python wrapper for the tesseract OCR engine. The module is based on OpenCV.
- Host: GitHub
- URL: https://github.com/codedotjs/pytesser-module
- Owner: CodeDotJS
- Created: 2015-05-18T20:52:42.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2015-05-29T10:39:32.000Z (over 10 years ago)
- Last Synced: 2025-03-20T00:41:18.980Z (8 months ago)
- Language: Python
- Size: 141 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Pytesser-module
Python wrapper for the tesseract OCR engine. The module is based on OpenCV.
Pytesser
========
Information
-----------
There are already multiple tesseract Python modules, but none of them satisfied me. This one differs on the following points:
* All the classes are in a single file, and every inessential class has been removed
* Uses OpenCV instead of PIL (not really an advantage, because PIL is far more widespread, but it fits my needs better ;))
* Uses Popen.communicate instead of Popen.wait (from subprocess) to capture tesseract's output, so nothing is printed to the shell or to programs that use the module.
* Handles different languages via the '-l' option. The original pytesser always uses the default language, which is English, so detection of French text, for instance, is totally inaccurate.
* Handles the page segmentation mode (pagesegmode), which changes tesseract's behavior when we want, for instance, to detect only one character, a word, or a line.
* The code is far more straightforward (in my opinion)
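
The points above about `communicate`, the `-l` option, and the page segmentation mode can be sketched as follows. This is only an illustration, not the module's actual code: the helper names `build_tesseract_command` and `run_quietly` are hypothetical, and the `--psm` spelling is an assumption that matches Tesseract 4 and later (3.x releases spell it `-psm`).

```python
import subprocess
import sys


def build_tesseract_command(image_path, output_base, lang="eng", psm=3, psm_flag="--psm"):
    """Build the argument list for a tesseract call (hypothetical helper).

    psm follows tesseract's numbering, e.g. 3 = automatic, 7 = single line,
    8 = single word, 10 = single character.
    psm_flag is "--psm" on Tesseract 4+, "-psm" on 3.x (assumption: pick per install).
    """
    return ["tesseract", image_path, output_base, "-l", lang, psm_flag, str(psm)]


def run_quietly(command):
    """Run a command and capture its output instead of letting it reach the shell."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() waits for the process to exit and drains both pipes,
    # which avoids the deadlock wait() can cause when a pipe buffer fills up.
    stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr


if __name__ == "__main__":
    # Demonstrated with the Python interpreter so the sketch runs without tesseract.
    code, out, err = run_quietly([sys.executable, "-c", "print('hello')"])
    print(code, out.decode().strip())
```

With tesseract installed, `run_quietly(build_tesseract_command("myimage.jpg", "out", "fra", 8))` would ask it to read the image as a single French word and write the result to `out.txt`, without printing anything to the terminal.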
How to use it?
--------------
There are two ways to use it: give it either a filename or an image that is already loaded. For a filename you can do:

    import pytesser

    # By default the language is "eng" and the page segmentation mode is automatic
    txt = pytesser.image_to_string("myimage.jpg")

    # To give specific parameters (here: analyse the image as a single French word)
    txt = pytesser.image_to_string("myimage.jpg", "fra", pytesser.PSM_SINGLE_WORD)

Or you can give it an IplImage directly, like this:

    import cv2.cv as cv  # legacy OpenCV 2.x bindings (assumed import; removed in OpenCV 3)
    image = cv.LoadImage("myimage.jpg")
    txt = pytesser.iplimage_to_string(image)

Or give it a Mat (the array returned by cv2.imread):

    import cv2
    image = cv2.imread("myimage.jpg")  # imread loads the image; imwrite would save one
    txt = pytesser.mat_to_string(image)