Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/steventhanna/ocr
Java implementation of Optical Character Recognition
- Host: GitHub
- URL: https://github.com/steventhanna/ocr
- Owner: steventhanna
- License: mit
- Created: 2015-01-06T19:32:06.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2015-01-12T03:11:04.000Z (almost 10 years ago)
- Last Synced: 2024-04-14T22:19:56.416Z (7 months ago)
- Language: Java
- Size: 5.49 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
OCR
===
Java implementation of Optical Character Recognition

How It Works
------------
The core concept, at the character level, is image matching with automatic position and aspect ratio correction, using a least-square-error matching algorithm.

Phases
------

### Training Phase
1. Print the characters that the OCR engine is expected to recognize
2. Scan those characters into an image
3. Crop the image so that it includes only the training characters
4. Tell the OCR engine to use the resulting training image, and specify which characters the image contains

### Character Recognition
1. Load training images
2. Load the scanned image of the document to be converted to text
3. Convert the scanned image to grayscale
4. Filter the scanned image using a low-pass Finite Impulse Response (FIR) filter to remove dust
5. Break the document into lines of text, based on whitespace between the text lines
6. Break each line into characters, based on whitespace between the characters; using the average character width, determine where spaces occur within the line
7. For each character, determine the most closely matching character from the training images and append that to the output text; for each space, append a space character to the output text
8. Output the accumulated text
9. If there are any more scanned images to be converted to text, return to step 2
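
### Illustrative Sketch

The following is a minimal sketch of two of the recognition steps above: splitting a grayscale page into text lines by whitespace (step 5) and picking the closest training character by least squared error (step 7). It is not the code from this repository. The class name `OcrSketch`, its method names, the `darkThreshold` parameter, and the fixed comparison grid are all assumptions made for illustration. Nearest-neighbour resampling to a square grid stands in for the position and aspect ratio correction, and the FIR filter and space detection are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper class; images are int[row][col] with 0 = black, 255 = white. */
final class OcrSketch {

    /** Step 5: return {startRow, endRowExclusive} for each band of rows that contains ink. */
    static List<int[]> findTextLines(int[][] gray, int darkThreshold) {
        List<int[]> lines = new ArrayList<>();
        int start = -1; // -1 means we are currently inside whitespace
        for (int y = 0; y < gray.length; y++) {
            boolean hasInk = false;
            for (int v : gray[y]) {
                if (v < darkThreshold) { hasInk = true; break; }
            }
            if (hasInk && start < 0) {
                start = y;                        // a text line begins
            } else if (!hasInk && start >= 0) {
                lines.add(new int[] {start, y});  // a text line ends
                start = -1;
            }
        }
        if (start >= 0) lines.add(new int[] {start, gray.length});
        return lines;
    }

    /** Resample a glyph to size x size (nearest neighbour) so glyphs of any shape are comparable. */
    static double[] normalize(int[][] glyph, int size) {
        int h = glyph.length, w = glyph[0].length;
        double[] out = new double[size * size];
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                out[y * size + x] = glyph[y * h / size][x * w / size] / 255.0;
            }
        }
        return out;
    }

    /** Sum of squared per-pixel differences between two equally sized vectors. */
    static double squaredError(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Step 7: return the label of the training glyph with the smallest squared error. */
    static char classify(int[][] glyph, char[] labels, int[][][] templates, int size) {
        double[] g = normalize(glyph, size);
        char best = '?';
        double bestErr = Double.MAX_VALUE;
        for (int i = 0; i < labels.length; i++) {
            double err = squaredError(g, normalize(templates[i], size));
            if (err < bestErr) {
                bestErr = err;
                best = labels[i];
            }
        }
        return best;
    }
}
```

A caller would crop each character out of a line returned by `findTextLines`, then pass the crop to `classify` together with the cropped training glyphs and their labels. Comparing on a small fixed grid keeps the error measure independent of scan resolution, at the cost of discarding the original aspect ratio, which a fuller implementation might want to use as an extra feature.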