https://github.com/gnonio/korporize
OCR - Optical Character Recognition for any image you come across while browsing
- Host: GitHub
- URL: https://github.com/gnonio/korporize
- Owner: gnonio
- License: apache-2.0
- Created: 2020-05-06T21:46:42.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-05-20T22:28:25.000Z (over 4 years ago)
- Last Synced: 2024-08-01T16:44:46.372Z (3 months ago)
- Topics: javascript, ocr-recognition, webextensions
- Language: JavaScript
- Size: 35.5 MB
- Stars: 11
- Watchers: 1
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
[![korporize](./img/korporize.svg)](http://tesseract.projectnaptha.com)
## korporize - OCR - Optical Character Recognition
Offline text recognition from any image. This web extension adds a context-menu entry for extracting the text from any image while browsing. It builds upon [Tesseract.js](https://github.com/naptha/tesseract.js).
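The README does not show how the context-menu entry is wired up. A minimal sketch using the standard WebExtensions `contextMenus` API might look like this; the menu id, the `registerOcrMenu` helper and the `onImage` callback are hypothetical names, not korporize's actual code. The `browser` object is passed in as a parameter so the function can be tried with a stub.

```javascript
// Hypothetical id for the "Extract Text from Image" context-menu entry.
const MENU_ID = "extract-text-from-image";

/**
 * Adds an "Extract Text from Image" entry that appears only when right-clicking
 * an image, and forwards the clicked image's URL to `onImage`.
 * `browserApi` is normally the WebExtensions `browser` global; it is a parameter
 * so the function can be exercised with a stub outside the browser.
 */
function registerOcrMenu(browserApi, onImage) {
  browserApi.contextMenus.create({
    id: MENU_ID,
    title: "Extract Text from Image",
    contexts: ["image"], // only offer the entry on images
  });

  browserApi.contextMenus.onClicked.addListener((info, tab) => {
    // info.srcUrl is the URL of the image that was right-clicked.
    if (info.menuItemId === MENU_ID && info.srcUrl) {
      onImage(info.srcUrl, tab);
    }
  });
}
```

In a background script this could be called as `registerOcrMenu(browser, (url) => browser.windows.create({ url: "popup.html?src=" + encodeURIComponent(url), type: "popup" }))`, where `popup.html` is a hypothetical extension page. The manifest must request the `contextMenus` (or, in Firefox, `menus`) permission.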
****
### Install
- [Firefox Add-ons page](https://addons.mozilla.org/en-GB/firefox/addon/korporize/)
#### Alternative install for advanced users
- [Download this repository](https://github.com/gnonio/korporize/archive/master.zip)
- Follow the instructions for [Temporary installation in Firefox](./user-install.md)
### Usage
- Right-click an image on a web page
- Select "Extract Text from Image"
- A popup opens with the korporize interface
- Wait while Tesseract processes the image in the background
- Read the results in the korporize panel
- (Optional) Copy the results to the clipboard
To obtain good results:
- make sure the automatically detected language suits the characters in the loaded image
- if it does not, force another language on the Options page
- increase the quality on the Options page (try Normal or Best; both take longer)
- make sure the page segmentation mode suits the image (this choice will be easier to reach in future releases)
- use a high-resolution version of the image
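Several of these tips come down to two settings handed to Tesseract: a language code and a page segmentation mode (Tesseract's `tessedit_pageseg_mode` parameter). The sketch below shows one way an extension might derive the language from a page's `lang` attribute while letting a forced choice win. The mapping table, `detectOcrLanguage` and `resolveOcrSettings` are hypothetical, not korporize's actual detection logic, and only a handful of languages are listed.

```javascript
// Hypothetical mapping from the primary subtag of a page's `lang` attribute
// (e.g. <html lang="pt-BR">) to Tesseract language codes. Deliberately incomplete.
const TESSERACT_LANG = {
  en: "eng", pt: "por", de: "deu", fr: "fra", es: "spa",
  it: "ita", nl: "nld", ru: "rus", ja: "jpn", ko: "kor",
};

// A few Tesseract page segmentation modes (values of `tessedit_pageseg_mode`).
const PSM = {
  AUTO: "3",          // fully automatic page segmentation, no orientation detection
  SINGLE_COLUMN: "4", // a single column of text of variable sizes
  SINGLE_BLOCK: "6",  // a single uniform block of text
  SINGLE_LINE: "7",   // a single text line, e.g. a caption or banner
  SINGLE_WORD: "8",   // a single word
  SPARSE_TEXT: "11",  // as much text as possible, in no particular order
};

/** Forced language wins; otherwise map the page language; otherwise fall back. */
function detectOcrLanguage(pageLang, forcedLang = null, fallback = "eng") {
  if (forcedLang) return forcedLang;
  if (!pageLang) return fallback;
  const [primary, second = ""] = pageLang.toLowerCase().split(/[-_]/);
  if (primary === "zh") {
    // Traditional script for zh-Hant / Taiwan / Hong Kong / Macau, Simplified otherwise.
    return ["hant", "tw", "hk", "mo"].includes(second) ? "chi_tra" : "chi_sim";
  }
  return TESSERACT_LANG[primary] ?? fallback;
}

/** Bundles the language and segmentation choices into one settings object. */
function resolveOcrSettings({ pageLang, forcedLang = null, psm = PSM.AUTO }) {
  return { lang: detectOcrLanguage(pageLang, forcedLang), tessedit_pageseg_mode: psm };
}
```

With Tesseract.js, the `lang` value would select the language data to load, and `tessedit_pageseg_mode` could be passed to `worker.setParameters(...)`. For a one-line banner, for example, `PSM.SINGLE_LINE` can work better than automatic segmentation.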
### Features
- Extracts text from any image while browsing
- Works offline (a network connection is needed only the first time a language is used, to download and cache its dictionary)
- Automatic language detection (based on the visited web page)
- Avoids downloading images that are already loaded
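The README does not describe how duplicate downloads are avoided. One common approach, sketched here as an assumption, is to memoize the download promise per image URL so that repeated or concurrent requests share a single fetch. `createImageLoader` and the injected `fetchImage` function are hypothetical.

```javascript
/**
 * Wraps `fetchImage(url)` so that each URL is downloaded at most once.
 * Repeated or concurrent calls for the same URL get the same promise.
 * Hypothetical helper, not korporize's actual implementation.
 */
function createImageLoader(fetchImage) {
  const byUrl = new Map(); // url -> Promise resolving to the image data

  return function loadImage(url) {
    if (!byUrl.has(url)) {
      const promise = Promise.resolve(fetchImage(url)).catch((err) => {
        byUrl.delete(url); // forget failures so a later call can retry
        throw err;
      });
      byUrl.set(url, promise);
    }
    return byUrl.get(url);
  };
}
```

In an extension, `fetchImage` might be `(url) => fetch(url).then((r) => r.blob())`. This sketch never prunes the map, so a long session keeps every image in memory; a real implementation would want some eviction policy.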
### Notes
- Be mindful of the size of the language dictionaries
  - expect around 8 MB per language for Normal quality and 12 MB for Best quality
- Aside from these dictionaries, korporize never stores any other data
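Going by the figures above, caching three languages at Best quality takes roughly 3 × 12 MB = 36 MB. The offline behaviour listed under Features amounts to a cache-first lookup: read a dictionary from local storage and go to the network only on a miss. Tesseract.js can cache downloaded language data on its own, so this sketch only illustrates the idea; `createDictionaryCache`, `estimateStorageMB`, and the injected `store` and `download` functions are hypothetical.

```javascript
// Approximate dictionary size per language in MB, taken from the Notes above.
const APPROX_MB_PER_LANGUAGE = { normal: 8, best: 12 };

/** Rough storage needed to cache `languages` at the given quality. */
function estimateStorageMB(languages, quality) {
  const perLanguage = APPROX_MB_PER_LANGUAGE[quality];
  if (perLanguage === undefined) throw new Error(`unknown quality: ${quality}`);
  return languages.length * perLanguage;
}

/**
 * Cache-first loader for language dictionaries.
 * `store` is any async key-value store exposing get(key) and set(key, value)
 * (an IndexedDB wrapper in a browser, a Map-backed stub in tests);
 * `download(lang, quality)` fetches the raw dictionary bytes.
 */
function createDictionaryCache(store, download) {
  async function load(lang, quality) {
    const key = `${quality}/${lang}`; // Normal and Best data are cached separately
    const cached = await store.get(key);
    if (cached !== undefined) return cached; // offline path: no network access
    const data = await download(lang, quality); // network needed only on a miss
    await store.set(key, data);
    return data;
  }
  return { load };
}
```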
### Todo
- More ways to access Tesseract functionality (image from a link, PDF loading and saving, etc.)
- Preloading of language dictionaries (via the Options page)
- Cache management options
- An API for other web extensions