https://github.com/codedotjs/pytesser-module

Python wrapper for the tesseract OCR engine. The module is based on OpenCV.

Host: GitHub
URL: https://github.com/codedotjs/pytesser-module
Owner: CodeDotJS
Created: 2015-05-18T20:52:42.000Z (about 11 years ago)
Default Branch: master
Last Pushed: 2015-05-29T10:39:32.000Z (about 11 years ago)
Last Synced: 2025-03-20T00:41:18.980Z (over 1 year ago)
Language: Python
Size: 141 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Pytesser-module

Pytesser
========

Python wrapper for the tesseract OCR engine. The module is based on OpenCV.

Information
-----------

Several Python modules for tesseract already exist, but none of them satisfied me. This one differs on the following points:

* All the classes are in a single file, and all inessential classes have been removed.

* It uses OpenCV instead of PIL (not really an advantage, because PIL is far more widespread, but it fits my needs better ;)).

* It uses subprocess.communicate instead of subprocess.wait to avoid any output in the shell or in programs that use the module.

* It handles different languages via the '-l' option, because the original pytesser uses the default language, which is English. As a result, its detection of French, for instance, is totally inaccurate.

* It handles the page segmentation mode (pagesegmode), which changes the behavior of tesseract when we want, for instance, to detect only one character, a word, or a line.

* The code is far more straightforward (in my opinion).
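The subprocess, language, and page segmentation points above can be illustrated with a minimal sketch of how a wrapper might call the tesseract command-line tool. This is not the module's actual code: the function names `build_tesseract_command` and `run_tesseract` are hypothetical, and the sketch assumes a `tesseract` binary on the PATH that accepts `--psm` (Tesseract 3.x spells the flag `-psm`).

```python
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng", psm=3):
    """Return the argument list for one tesseract CLI call (hypothetical helper)."""
    return [
        "tesseract", image_path, output_base,
        "-l", lang,            # recognition language, e.g. "eng" or "fra"
        "--psm", str(psm),     # page segmentation mode; "-psm" on Tesseract 3.x
    ]


def run_tesseract(image_path, output_base, lang="eng", psm=3):
    """Run tesseract quietly and return the recognized text (hypothetical helper)."""
    command = build_tesseract_command(image_path, output_base, lang, psm)
    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    # communicate() drains both pipes and waits for the process to exit,
    # so nothing tesseract prints reaches the caller's terminal.
    _, errors = process.communicate()
    if process.returncode != 0:
        raise RuntimeError(errors.decode("utf-8", "replace"))
    # tesseract writes its result to "<output_base>.txt".
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()
```

With these assumptions, `run_tesseract("myimage.jpg", "result", lang="fra", psm=8)` would analyse the image as a single French word, since mode 8 is tesseract's "single word" mode.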

How to use it?
--------------

There are two ways to use it: you can give it either a filename or an IplImage directly. For a filename you can do:

    import pytesser

    # By default the language is "eng" and the page segmentation mode is automatic
    txt = pytesser.image_to_string("myimage.jpg")

    # To give specific parameters (here: analyse the image as a single French word):
    txt = pytesser.image_to_string("myimage.jpg", "fra", pytesser.PSM_SINGLE_WORD)

Or you can directly give it an IplImage like this:

    import cv  # legacy OpenCV Python bindings (also available as cv2.cv in OpenCV 2.x)
    import pytesser

    image = cv.LoadImage("myimage.jpg")
    txt = pytesser.iplimage_to_string(image)

Or give it a mat:

    import cv2
    import pytesser

    image = cv2.imread("myimage.jpg")  # imread loads the file; imwrite would save one
    txt = pytesser.mat_to_string(image)
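Because the tesseract command-line tool reads its input from a file, a wrapper that accepts an in-memory image has to save it to disk before running OCR. The following is a minimal sketch of that pattern, not the module's actual implementation: `ocr_in_memory_image` is a hypothetical helper, and the writer and OCR functions are passed in by the caller (for example `cv2.imwrite` and `pytesser.image_to_string`).

```python
import os
import tempfile


def ocr_in_memory_image(image, write_image, ocr_file, suffix=".png"):
    """Save `image` to a temporary file, OCR that file, then clean up.

    `write_image(path, image)` and `ocr_file(path)` are supplied by the
    caller; both names are assumptions made for this illustration.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; the writer reopens the file
    try:
        write_image(path, image)
        return ocr_file(path)
    finally:
        # Remove the temporary file even if writing or OCR fails.
        os.remove(path)
```

Under these assumptions, a call could look like `ocr_in_memory_image(image, cv2.imwrite, pytesser.image_to_string)`.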
