https://github.com/steventhanna/ocr

Java implementation of Optical Character Recognition

Host: GitHub
URL: https://github.com/steventhanna/ocr
Owner: steventhanna
License: MIT
Created: 2015-01-06T19:32:06.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2015-01-12T03:11:04.000Z (almost 10 years ago)
Last Synced: 2024-04-14T22:19:56.416Z (7 months ago)
Language: Java
Size: 5.49 MB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

OCR
===

Java implementation of Optical Character Recognition

How It Works
------------
At the character level, the core idea is image matching with automatic position and aspect-ratio correction: each scanned character is compared against the training characters, and the one with the smallest least-squares error is chosen.
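
As a rough illustration of least-squares matching, the sketch below compares a candidate glyph against a set of training glyphs and returns the character with the smallest mean squared error. It is not taken from this repository: the class `LeastSquaresMatcher`, its method names, and the nearest-neighbour `resize` used as a crude stand-in for aspect-ratio correction are all assumptions, and position correction (cropping each glyph to its bounding box) is assumed to have happened beforehand.

```java
import java.util.Map;

/** Hypothetical sketch of least-squares glyph matching; not the repository's actual code. */
final class LeastSquaresMatcher {

    /** Mean squared error between two equally sized grayscale glyphs (values 0..255). */
    static double meanSquaredError(int[][] a, int[][] b) {
        double sum = 0;
        for (int y = 0; y < a.length; y++) {
            for (int x = 0; x < a[y].length; x++) {
                double d = a[y][x] - b[y][x];
                sum += d * d;
            }
        }
        return sum / (a.length * a[0].length);
    }

    /** Nearest-neighbour resampling to a fixed size (assumed stand-in for aspect-ratio correction). */
    static int[][] resize(int[][] src, int width, int height) {
        int[][] out = new int[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                out[y][x] = src[y * src.length / height][x * src[0].length / width];
            }
        }
        return out;
    }

    /** Returns the training character whose glyph has the smallest error against the candidate. */
    static char bestMatch(int[][] candidate, Map<Character, int[][]> training) {
        if (training.isEmpty()) {
            throw new IllegalArgumentException("no training glyphs loaded");
        }
        char best = '\0';
        double bestError = Double.POSITIVE_INFINITY;
        for (Map.Entry<Character, int[][]> entry : training.entrySet()) {
            int[][] glyph = entry.getValue();
            // Scale the candidate to the training glyph's size before comparing.
            int[][] scaled = resize(candidate, glyph[0].length, glyph.length);
            double error = meanSquaredError(scaled, glyph);
            if (error < bestError) {
                bestError = error;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

Scaling the candidate to each training glyph's dimensions keeps the comparison pixel-for-pixel, at the cost of one resample per training character; a real engine might normalise every glyph to a single fixed size once instead.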

Phases
------

### Training Phase
1. Print the characters that the engine is expected to recognize
2. Scan the printed characters into an image
3. Crop the image so that it contains only the training characters
4. Tell the OCR engine to use the resulting training image, and specify which characters the image contains
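
The last training step could look something like the following sketch, which takes a cropped training row as a grayscale array, splits it into glyphs wherever a blank column appears, and pairs each glyph with the character the caller says it represents. The `TrainingSet` class, its methods, and the `inkThreshold` parameter are hypothetical names chosen for illustration; the repository's actual API may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of registering a training image; not the repository's actual API. */
final class TrainingSet {
    private final Map<Character, int[][]> glyphs = new LinkedHashMap<>();

    /** Registers one cropped training row; characters lists its glyphs from left to right. */
    void addTrainingRow(int[][] image, String characters, int inkThreshold) {
        List<int[]> spans = columnSpans(image, inkThreshold);
        if (spans.size() != characters.length()) {
            throw new IllegalArgumentException(
                "found " + spans.size() + " glyphs but " + characters.length() + " characters were given");
        }
        for (int i = 0; i < spans.size(); i++) {
            int[] span = spans.get(i);
            int[][] glyph = new int[image.length][];
            for (int y = 0; y < image.length; y++) {
                glyph[y] = Arrays.copyOfRange(image[y], span[0], span[1]);
            }
            glyphs.put(characters.charAt(i), glyph);
        }
    }

    /** Returns the stored glyph for a character, or null if it was never trained. */
    int[][] glyphFor(char c) {
        return glyphs.get(c);
    }

    int size() {
        return glyphs.size();
    }

    /** Finds [start, end) column ranges containing at least one pixel darker than the threshold. */
    static List<int[]> columnSpans(int[][] image, int inkThreshold) {
        List<int[]> spans = new ArrayList<>();
        int width = image[0].length;
        int start = -1;
        // Iterate one column past the edge so a glyph touching the right border is closed.
        for (int x = 0; x <= width; x++) {
            boolean ink = false;
            if (x < width) {
                for (int[] row : image) {
                    if (row[x] < inkThreshold) { ink = true; break; }
                }
            }
            if (ink && start < 0) {
                start = x;
            } else if (!ink && start >= 0) {
                spans.add(new int[] {start, x});
                start = -1;
            }
        }
        return spans;
    }
}
```

A call such as `set.addTrainingRow(image, "ABC", 128)` would then associate the three glyphs found in `image` with `A`, `B`, and `C`; the count check guards against a crop that accidentally merges or splits characters.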

### Character Recognition
1. Load training images
2. Load the scanned image of the document to be converted to text
3. Convert the scanned image to grayscale
4. Filter the scanned image using a low-pass Finite Impulse Response (FIR) filter to remove dust
5. Break the document into lines of text, based on whitespace between the text lines
6. Break each line into characters, based on whitespace between the characters; using the average character width, determine where spaces occur within the line
7. For each character, determine the most closely matching character from the training images and append that to the output text; for each space, append a space character to the output text
8. Output the accumulated text
9. If there are any more scanned images to be converted to text, return to step 2
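
Steps 3 to 6 above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the repository's implementation: the class `RecognitionPreprocessing` and its method names are hypothetical, the grayscale weights are the common Rec. 601 luma coefficients, the low-pass FIR filter is a plain 3x3 box blur, and `inkThreshold` and `gapFactor` are made-up tuning parameters. Pixels are passed as packed `0xRRGGBB` integers, the layout returned by `BufferedImage.getRGB`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of steps 3 to 6; names and thresholds are illustrative assumptions. */
final class RecognitionPreprocessing {

    /** Step 3: packed 0xRRGGBB pixels to 0..255 gray values using Rec. 601 weights. */
    static int[][] toGrayscale(int[][] rgb) {
        int[][] gray = new int[rgb.length][rgb[0].length];
        for (int y = 0; y < rgb.length; y++) {
            for (int x = 0; x < rgb[y].length; x++) {
                int r = (rgb[y][x] >> 16) & 0xFF;
                int g = (rgb[y][x] >> 8) & 0xFF;
                int b = rgb[y][x] & 0xFF;
                gray[y][x] = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
            }
        }
        return gray;
    }

    /** Step 4: 3x3 box blur, one simple low-pass FIR kernel; borders average in-bounds pixels only. */
    static int[][] lowPass(int[][] src) {
        int h = src.length, w = src[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int sum = 0, count = 0;
                for (int yy = Math.max(0, y - 1); yy <= Math.min(h - 1, y + 1); yy++) {
                    for (int xx = Math.max(0, x - 1); xx <= Math.min(w - 1, x + 1); xx++) {
                        sum += src[yy][xx];
                        count++;
                    }
                }
                out[y][x] = sum / count;
            }
        }
        return out;
    }

    /** Step 5: [top, bottom) row ranges holding at least one pixel darker than inkThreshold. */
    static List<int[]> textLines(int[][] gray, int inkThreshold) {
        List<int[]> lines = new ArrayList<>();
        int top = -1;
        // Iterate one row past the end so a line touching the bottom edge is closed.
        for (int y = 0; y <= gray.length; y++) {
            boolean ink = false;
            if (y < gray.length) {
                for (int v : gray[y]) {
                    if (v < inkThreshold) { ink = true; break; }
                }
            }
            if (ink && top < 0) {
                top = y;
            } else if (!ink && top >= 0) {
                lines.add(new int[] {top, y});
                top = -1;
            }
        }
        return lines;
    }

    /** Step 6: indexes i where a space follows character i, judged against the average width. */
    static List<Integer> spacesAfter(List<int[]> charSpans, double gapFactor) {
        List<Integer> spaces = new ArrayList<>();
        if (charSpans.isEmpty()) {
            return spaces;
        }
        double totalWidth = 0;
        for (int[] span : charSpans) {
            totalWidth += span[1] - span[0];
        }
        double average = totalWidth / charSpans.size();
        for (int i = 0; i + 1 < charSpans.size(); i++) {
            int gap = charSpans.get(i + 1)[0] - charSpans.get(i)[1];
            if (gap > gapFactor * average) {
                spaces.add(i);
            }
        }
        return spaces;
    }
}
```

Splitting a line into characters works the same way as `textLines`, only scanning columns instead of rows. The blur spreads an isolated dark pixel across its neighbours so that it no longer crosses the ink threshold, which is how a low-pass filter suppresses dust before segmentation.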