https://github.com/digitronik/pytesser

Automatically exported from code.google.com/p/pytesser

Host: GitHub
URL: https://github.com/digitronik/pytesser
Owner: digitronik
License: other
Created: 2016-04-24T15:27:51.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2016-04-24T15:30:24.000Z (about 9 years ago)
Last Synced: 2025-02-08T08:10:00.682Z (5 months ago)
Language: Python
Size: 1.78 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README
- Changelog: ChangeLog
- License: LICENSE
- Authors: AUTHORS

README

Introduction:
=============

PyTesser is an Optical Character Recognition module for Python. It takes
an in-memory image or an image file as input and returns the recognized
text as a string.

PyTesser uses the Tesseract OCR engine (an Open Source project at Google).
It converts images to a format Tesseract accepts and calls the Tesseract
executable as an external program. A Windows executable is provided
along with the Python scripts. The scripts should work on Linux as well.

PyTesser:
http://code.google.com/p/pytesser/

Tesseract:
http://code.google.com/p/tesseract-ocr/
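
The "external program" approach described above can be sketched roughly as
follows. This is a minimal illustration, not PyTesser's own code: the function
names are hypothetical, it assumes a `tesseract` executable is on the PATH, it
is written for a current Python 3 rather than the Python 2.4 this README
targets, and it leaves out the image-conversion step. It relies only on the
standard Tesseract command form `tesseract <image> <output base>`, which writes
its result to `<output base>.txt`.

```python
import os
import shutil
import subprocess
import tempfile


def build_tesseract_command(image_path, output_base, exe="tesseract"):
    """Return the argument list for: tesseract <image> <output base>.

    Hypothetical helper; `exe` is assumed to name a Tesseract binary.
    """
    return [exe, image_path, output_base]


def ocr_file(image_path, exe="tesseract"):
    """Run Tesseract on an image file and return the recognized text."""
    scratch_dir = tempfile.mkdtemp()
    output_base = os.path.join(scratch_dir, "ocr_result")
    try:
        # Raises CalledProcessError if Tesseract exits with a non-zero status.
        subprocess.check_call(build_tesseract_command(image_path, output_base, exe))
        # Tesseract appends ".txt" to the output base name itself.
        with open(output_base + ".txt", encoding="utf-8") as handle:
            return handle.read()
    finally:
        # Remove the scratch directory whether or not OCR succeeded.
        shutil.rmtree(scratch_dir, ignore_errors=True)
```

With Tesseract installed, `ocr_file('phototest.tif')` would return the text of
the image as a string. Keeping the command construction in its own function
makes it easy to check without running the executable.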

Dependencies:
=============

PIL (the Python Imaging Library) is required to work with images in memory.
PyTesser has been tested with Python 2.4 on Windows XP.

http://www.pythonware.com/products/pil/

Installation:
=============

PyTesser has no installation functionality in this release. Extract
pytesser.zip into the directory that holds your other scripts. Necessary
files are listed in File Dependencies below.

Usage:
======

>>> from pytesser import *
>>> im = Image.open('phototest.tif')
>>> text = image_to_string(im)
>>> print text
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.
The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.

>>> try:
...     text = image_file_to_string('fnord.tif', graceful_errors=False)
... except errors.Tesser_General_Exception, value:
...     print "fnord.tif is incompatible filetype.  Try graceful_errors=True"
...     print value
...
fnord.tif is incompatible filetype.  Try graceful_errors=True
Tesseract Open Source OCR Engine
read_tif_image:Error:Illegal image format:Compression
Tessedit:Error:Read of file failed:fnord.tif
Signal_exit 31 ABORT. LocCode: 3  AbortCode: 3

>>> text = image_file_to_string('fnord.tif', graceful_errors=True)
>>> print "fnord.tif contents:", text
fnord.tif contents: fnord
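
The two fnord.tif calls above show that graceful_errors=True lets PyTesser
read a file that Tesseract rejects outright. One plausible way to get that
behaviour is a "try directly, then convert and retry" fallback, sketched below.
This is an assumption about the mechanism, not a copy of PyTesser's source:
`OcrError`, `ocr_with_fallback` and the two callables are hypothetical names,
and the sketch uses Python 3 syntax.

```python
class OcrError(Exception):
    """Hypothetical stand-in for errors.Tesser_General_Exception."""


def ocr_with_fallback(filename, ocr_direct, ocr_after_conversion,
                      graceful_errors=True):
    """Try OCR on the file as-is; optionally retry after converting it.

    ocr_direct and ocr_after_conversion are caller-supplied functions that
    take a filename and return text (assumed interfaces for this sketch).
    """
    try:
        return ocr_direct(filename)
    except OcrError:
        if not graceful_errors:
            # Caller asked to see the failure, so re-raise it unchanged.
            raise
        # Fall back to the slower path that converts the image first.
        return ocr_after_conversion(filename)


# Fake OCR functions so the control flow can be followed without Tesseract.
def fails_on_compressed_tiff(filename):
    raise OcrError("Illegal image format: Compression")


def converts_then_reads(filename):
    return "fnord"


print(ocr_with_fallback("fnord.tif", fails_on_compressed_tiff,
                        converts_then_reads))  # prints "fnord"
```

Passing graceful_errors=False to the same call re-raises the OcrError, which
mirrors the try/except session shown earlier.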

>>> text = image_file_to_string('fonts_test.png', graceful_errors=True)
>>> print text
12 pt
And Arnazwngw few dwscotheques provwde jukeboxes
Tames Amazmgly few dnscotheques pmvxde Jukeboxes
24 pt:
Arial: Amazingly few discotheques
provide jul
