https://github.com/edsu/ocropy
minimalist wrapper around ocropus for generating hOCR documents from images
- Host: GitHub
- URL: https://github.com/edsu/ocropy
- Owner: edsu
- Created: 2010-09-28T19:38:06.000Z (over 15 years ago)
- Default Branch: master
- Last Pushed: 2010-09-28T20:27:41.000Z (over 15 years ago)
- Last Synced: 2025-07-26T22:18:19.914Z (11 months ago)
- Language: Python
- Homepage:
- Size: 641 KB
- Stars: 9
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README
README
ocropy is a minimal wrapper around the optical character recognition software
ocropus [1] for doing quick and dirty OCR jobs. It converts TIFF, JPEG, and
PNG images to grayscale before attempting OCR, and it manages the temporary
files it needs internally.
Usage:
    import ocropy
    hocr = ocropy.hocr("image.png")
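Once you have the hOCR string, you will usually want the recognized text and
its position on the page. The sketch below is not part of ocropy. It assumes
the output marks each text line with class "ocr_line" and carries a
"bbox x0 y0 x1 y1" property in the title attribute, as the hOCR format
describes. The names HocrLineParser, parse_bbox and hocr_lines are
hypothetical helpers made up for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrLineParser(HTMLParser):
    """Collect (bbox, text) for every element with class "ocr_line"."""

    def __init__(self):
        super().__init__()
        self.lines = []    # finished (bbox, text) tuples
        self._depth = 0    # nesting depth inside the current ocr_line
        self._bbox = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self._depth:
            self._depth += 1
        elif "ocr_line" in (attrs.get("class") or "").split():
            self._depth = 1
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._chunks).split())
            self.lines.append((self._bbox, text))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def hocr_lines(hocr):
    """Return a list of (bbox, text) tuples, one per ocr_line element."""
    parser = HocrLineParser()
    parser.feed(hocr)
    parser.close()
    return parser.lines
```

With the helper above, hocr_lines(hocr) would give you one (bbox, text) pair
per recognized line, which you could print or use to crop regions of the page.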
If you would like to process the image further and don't want to read it
again, you can also get the PIL Image object that was used to convert the
image to grayscale:
    hocr, image = ocropy.hocr_with_image("page.png")
And you can pass it a URL for an image if that's more convenient:
    hocr = ocropy.hocr("http://example.com/image.tif")
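To illustrate how a wrapper can accept either a local path or a URL, here is
a minimal sketch using only the Python 3 standard library. It is not ocropy's
actual implementation. The function names is_url and local_copy are
hypothetical, and the list of accepted URL schemes is an assumption.

```python
import os
import shutil
import tempfile
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(source):
    """Return True if source looks like a URL rather than a file path."""
    return urlparse(source).scheme in ("http", "https", "ftp", "file")


def local_copy(source):
    """Return (path, is_temporary) for a local path or a URL.

    A URL is downloaded to a temporary file that keeps the original file
    extension; the caller is responsible for deleting it afterwards.
    """
    if not is_url(source):
        return source, False
    suffix = os.path.splitext(urlparse(source).path)[1]
    with urlopen(source) as response:
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
    return tmp.name, True
```

The second value in the returned pair tells the caller whether the file is a
temporary download that should be removed once OCR has finished.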
Requirements:
* ocropus (available on Ubuntu systems via apt-get)
* PIL
License:
Public Domain
Contributors:
Ed Summers