https://github.com/srirangav/vocr

MacOSX command line program to perform optical character recognition (OCR) on images and PDFs

Host: GitHub
URL: https://github.com/srirangav/vocr
Owner: srirangav
License: mit
Created: 2022-04-20T07:51:20.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2022-10-29T07:23:12.000Z (over 3 years ago)
Last Synced: 2025-03-26T19:36:54.749Z (over 1 year ago)
Topics: cli, macos, ocr
Language: Objective-C
Homepage:
Size: 254 KB
Stars: 7
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.txt
- License: LICENSE.txt

README
------

vocr v0.3.1

Homepage:

https://github.com/srirangav/vocr

About:

vocr is a MacOSX command line program that can perform optical
character recognition (OCR) on images and PDF files. It outputs
any text found in the input files to stdout. vocr relies on,
and derives its name from, the Vision framework (v for [V]ision).

Usage:

vocr [-v] [-f] [-p] [-a [accurate|fast]] [-i [no|tab]] [-l [lang]] [files]

If -v is specified, vocr runs in [v]erbose mode and outputs
errors and informational messages.

If -f is specified, vocr will perform OCR on every page of a
PDF, even if the page already contains text. By default, if a
PDF already contains a text representation of a given page,
vocr will output that text instead of OCR'ing the page.

If -p is specified, when OCR'ing a PDF, a page break (^L) will
be inserted at the end of each page.
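The per-page decision described by -f and -p can be modelled as
follows. This is a minimal Python sketch, not vocr's Objective-C
code: the (embedded_text, image) page representation and the
ocr callable are hypothetical stand-ins for PDFKit and Vision.
It also assumes that a page whose embedded text is empty or
whitespace-only counts as having no text.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

FORM_FEED = "\f"  # ^L, the page break that -p appends

# Hypothetical page model: (embedded_text_or_None, page_image)
Page = Tuple[Optional[str], object]


def page_texts(pages: Iterable[Page],
               ocr: Callable[[object], str],
               force_ocr: bool = False,
               page_breaks: bool = False) -> Iterator[str]:
    """Yield one string per PDF page, modelling the documented -f/-p behaviour."""
    for embedded, image in pages:
        # Default: reuse the PDF's own text when the page has any;
        # with -f (force_ocr) every page goes through OCR.
        if force_ocr or not (embedded and embedded.strip()):
            text = ocr(image)
        else:
            text = embedded
        if page_breaks:
            text += FORM_FEED  # -p: page break at the end of each page
        yield text
```

For example, with a fake OCR function, the pages
[("hello", "p1"), (None, "p2")] yield "hello" and the OCR result
for "p2"; with force_ocr=True both pages are OCR'ed. Since
vocr switched to PDFKit in v0.3.0, the embedded text in the real
program presumably comes from PDFKit, but that detail is an
assumption here.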

If -a is specified with the 'fast' option, vocr will use the
'fast' OCR algorithm, which may be useful for non-English
languages, such as German. If -a is specified with the 'accurate'
option, vocr will use the 'accurate' OCR algorithm (which is the
default).

If -i is specified with the 'no' option, vocr will not attempt
to indent any text that is OCR'ed. If -i is specified with the
'tab' option, vocr will indent using tabs instead of spaces (by
default vocr indents using spaces).

If -l is specified, on MacOSX 11.x (Big Sur) and newer, vocr
will ask the Vision framework to recognize the text in the
specified language. The supported language options are:

'de' - German
'en' - English
'fr' - French
'it' - Italian
'pt' - Portuguese
'es' - Spanish
'zh' - Chinese
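To make the option grammar above concrete, here is a hypothetical
argparse model of the documented synopsis. It is a Python
illustration only, not how vocr (an Objective-C program) parses
its arguments. It assumes that -a, -i and -l each take exactly one
value from the sets listed above, and it models the default
space indentation as indent=None.

```python
import argparse

# Language codes accepted by -l, as listed in this README.
LANGUAGES = {
    "de": "German", "en": "English", "fr": "French", "it": "Italian",
    "pt": "Portuguese", "es": "Spanish", "zh": "Chinese",
}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse model of vocr's documented options (not vocr's code)."""
    parser = argparse.ArgumentParser(
        prog="vocr", description="OCR images and PDFs (illustrative model)")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="print errors and informational messages")
    parser.add_argument("-f", dest="force_ocr", action="store_true",
                        help="OCR every PDF page, even pages that already contain text")
    parser.add_argument("-p", dest="page_breaks", action="store_true",
                        help="append a page break (^L) after each PDF page")
    parser.add_argument("-a", dest="algorithm", choices=["accurate", "fast"],
                        default="accurate", help="OCR algorithm (default: accurate)")
    # None models the default behaviour: indent with spaces.
    parser.add_argument("-i", dest="indent", choices=["no", "tab"], default=None,
                        help="'no' disables indentation, 'tab' indents with tabs")
    parser.add_argument("-l", dest="lang", choices=sorted(LANGUAGES),
                        help="recognition language (MacOSX 11+ only)")
    parser.add_argument("files", nargs="*", help="images and PDFs to OCR")
    return parser
```

For example, build_parser().parse_args(["-f", "-p", "-l", "fr",
"scan.pdf"]) selects forced OCR, page breaks, French and the
'accurate' algorithm, while an unsupported code such as "-l ja"
is rejected with a usage error.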

Build:

$ ./configure
$ make

Install:

$ ./configure
$ make
$ make install

By default, vocr is installed in /usr/local/bin. To install
it in a different location, the alternate installation prefix
can be supplied to configure:

$ ./configure --prefix=""

or, alternately to make:

$ make install PREFIX=""

For example, the following will install vocr under /opt/local:

$ make PREFIX=/opt/local install

A DESTDIR can also be specified for staging purposes (with or
without an alternate prefix):

$ make DESTDIR="" [PREFIX=""] install
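The way PREFIX and DESTDIR combine can be summarised with a small
Python sketch. It assumes that vocr's Makefile follows the common
$(DESTDIR)$(PREFIX)/bin convention; that assumption is not stated
in this README, and the install_path helper is hypothetical.

```python
def install_path(prefix: str = "/usr/local", destdir: str = "") -> str:
    """Where `make install` would put the vocr binary, ASSUMING the common
    $(DESTDIR)$(PREFIX)/bin layout (not verified against vocr's Makefile)."""
    # DESTDIR is glued on by plain string concatenation, not os.path.join:
    # PREFIX is absolute, and os.path.join would discard DESTDIR entirely.
    return destdir.rstrip("/") + prefix.rstrip("/") + "/bin/vocr"
```

Under that assumption, install_path() gives /usr/local/bin/vocr,
install_path("/opt/local") gives /opt/local/bin/vocr, and
install_path("/opt/local", "/tmp/stage") gives
/tmp/stage/opt/local/bin/vocr. The staged tree can then be
packaged without touching the live system.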

Dependencies:

vocr relies on VNRecognizeTextRequest in Apple's Vision
framework, which is available on MacOSX 10.15 (Catalina)
and newer:

https://developer.apple.com/documentation/vision/vnrecognizetextrequest
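The version requirements in this README (10.15+ for vocr itself,
11.x+ for -l) can be expressed as a small Python check. This is a
sketch for illustration only; the capabilities helper is
hypothetical and is not part of vocr.

```python
from typing import Dict, Tuple

MIN_VOCR = (10, 15)         # VNRecognizeTextRequest: Catalina and newer
MIN_LANG_OPTION = (11, 0)   # -l is honoured on Big Sur and newer


def major_minor(version: str) -> Tuple[int, int]:
    """'10.15.7' -> (10, 15); '12' -> (12, 0)."""
    parts = (version.split(".") + ["0"])[:2]
    return int(parts[0]), int(parts[1])


def capabilities(version: str) -> Dict[str, bool]:
    """Report which documented features a given MacOSX version supports."""
    v = major_minor(version)
    return {"ocr": v >= MIN_VOCR, "language_option": v >= MIN_LANG_OPTION}
```

On a Mac, capabilities(platform.mac_ver()[0]) would check the
running system. On other platforms mac_ver() returns an empty
release string, and this sketch does not handle that case.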

History:

v. 0.3.1 - updates for Monterey (MacOSX 12)
v. 0.3.0 - switch to PDFKit
v. 0.2.3 - fix manpage formatting
v. 0.2.2 - move source files into configure.ac
v. 0.2.1 - update configure with additional compiler options
           related to security
v. 0.2.0 - print text as soon as it has been recognized,
           default to quiet mode, add support for languages
           other than English
v. 0.1.0 - initial release

Platforms:

vocr has been tested on MacOSX 11 (Big Sur) and 12 (Monterey) on M1
and x86_64. It should also work on MacOSX 10.15+ (Catalina) x86_64.

License:

See LICENSE.txt
