Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/straussmaximilian/ocrmac
A python wrapper to extract text from images on a mac system. Uses the vision framework from Apple.
- Host: GitHub
- URL: https://github.com/straussmaximilian/ocrmac
- Owner: straussmaximilian
- License: mit
- Created: 2022-12-30T17:58:50.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-07T12:01:04.000Z (2 months ago)
- Last Synced: 2024-12-29T01:05:50.615Z (11 days ago)
- Language: Jupyter Notebook
- Size: 4.95 MB
- Stars: 291
- Watchers: 2
- Forks: 24
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: HISTORY.md
- Contributing: CONTRIBUTING.rst
- License: LICENSE
README
[![Downloads](https://static.pepy.tech/badge/ocrmac)](https://pepy.tech/project/ocrmac)
# ocrmac
A small Python wrapper to extract text from images on a Mac system. Uses the Vision framework from Apple. Simply pass a path to an image or a `PIL` image directly and get lists of texts, their confidences, and bounding boxes. This only works on macOS systems with newer macOS versions (10.15+).
## Example and Quickstart
Install via pip:
- `pip install ocrmac`
### Basic Usage
```python
from ocrmac import ocrmac
annotations = ocrmac.OCR('test.png').recognize()
print(annotations)
```
Output (Text, Confidence, BoundingBox):
```
[("GitHub: Let's build from here - X", 0.5, [0.16, 0.91, 0.17, 0.01]),
('github.com', 0.5, [0.174, 0.87, 0.06, 0.01]),
('Qi &0 O M #O', 0.30, [0.65, 0.87, 0.23, 0.02]),
[...]
('P&G U TELUS', 0.5, [0.64, 0.16, 0.22, 0.03])]
```
(BoundingBox precision capped for readability reasons)
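The bounding box values are fractions of the image size. As a minimal sketch of turning one annotation into pixel coordinates, assuming `[x, y, w, h]` is normalized with a bottom-left origin as in Apple's Vision framework (verify against your own outputs before relying on it):
```python
from PIL import Image
from ocrmac import ocrmac

image = Image.open('test.png')
annotations = ocrmac.OCR(image).recognize()

for text, confidence, (x, y, w, h) in annotations:
    # Scale the fractional box to pixels; the y-flip assumes Vision's
    # bottom-left origin, while PIL counts from the top-left corner.
    left, right = x * image.width, (x + w) * image.width
    top, bottom = (1 - y - h) * image.height, (1 - y) * image.height
    print(f'{text!r} ({confidence:.2f}) -> ({left:.0f}, {top:.0f}, {right:.0f}, {bottom:.0f})')
```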
### Create Annotated Images
```python
from ocrmac import ocrmac
ocrmac.OCR('test.png').annotate_PIL()
```
![Plot](https://github.com/straussmaximilian/ocrmac/blob/main/output.png?raw=true)
## Functionality
- You can pass the path to an image or a PIL image as an object
- You can use it as a class (`ocrmac.OCR`) or as a function (`ocrmac.text_from_image`); a sketch of the function-style call follows this list
- You can pass several arguments:
- `recognition_level`: `fast` or `accurate`
- `language_preference`: A list with languages for post-processing, e.g. `['en-US', 'zh-Hans', 'de-DE']`.
- You can get an annotated output either as PIL image (`annotate_PIL`) or matplotlib figure (`annotate_matplotlib`)
- You can use either the `vision` or the `livetext` framework as backend
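As a sketch of the function-style call with the arguments listed above (argument names follow this README; treat the exact signature as an assumption and check the source if anything differs):
```python
from ocrmac import ocrmac

# Function-style API: same arguments as the ocrmac.OCR class.
annotations = ocrmac.text_from_image(
    'test.png',
    recognition_level='fast',        # 'fast' or 'accurate'
    language_preference=['en-US'],   # post-processing language hint
)
print(annotations)
```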
#### Example: Select Language Preference
You can set a language preference like so:
```python
ocrmac.OCR('test.png',language_preference=['en-US'])
```
What abbreviation should you use for your language of choice? [Here](https://www.alchemysoftware.com/livedocs/ezscript/Topics/Catalyst/Language.htm) is an overview of language codes, e.g. `Chinese (Simplified)` -> `zh-Hans`, `English` -> `en-US`.
If you set a wrong language, you will see an error message listing the available languages. Note that the `recognition_level` affects which languages are available (`fast` supports fewer).
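If you want to check the supported codes up front rather than triggering the error, you can query the underlying Vision request through pyobjc. A hedged sketch (this bypasses `ocrmac`'s public API and the method requires a recent macOS):
```python
import Vision

# Ask the Vision text-recognition request which languages it supports.
request = Vision.VNRecognizeTextRequest.alloc().init()
languages, error = request.supportedRecognitionLanguagesAndReturnError_(None)
print(list(languages))  # e.g. ['en-US', 'fr-FR', 'zh-Hans', ...]
```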
See also this [Example Notebook](https://github.com/straussmaximilian/ocrmac/blob/main/ExampleNotebook.ipynb) for implementation details.
## Speed
Timings for the `recognize` call above:
MacBook Pro (Apple M3 Max):
- `accurate`: 207 ms ± 1.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- `fast`: 131 ms ± 702 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
- `livetext`: 174 ms ± 4.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
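These numbers come from `%timeit`-style runs; a minimal sketch for reproducing such timings on your own machine with the standard `timeit` module (absolute numbers will differ, and `test.png` is a placeholder):
```python
import timeit

from ocrmac import ocrmac

for level in ('accurate', 'fast'):
    ocr = ocrmac.OCR('test.png', recognition_level=level)
    # Average the wall-clock time of several recognize() calls.
    seconds = timeit.timeit(ocr.recognize, number=7) / 7
    print(f'{level}: {seconds * 1000:.0f} ms per loop')
```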
## About LiveText
Since macOS Sonoma, `LiveText` is supported as a backend; it is stronger than the `VisionKit` OCR. You can try this feature as follows:
```python
# Use the OCR class
from ocrmac import ocrmac
annotations = ocrmac.OCR('test.png', framework="livetext").recognize()
print(annotations)

# Or use the helper directly
annotations = ocrmac.livetext_from_image('test.png').recognize()
```
Note that when using this feature, the `recognition_level` and `confidence_threshold` arguments are not available, and the `confidence` output will always be 1.
## Technical Background & Motivation
If you want to do optical character recognition (OCR) with Python, widely used tools are [`pytesseract`](https://github.com/madmaze/pytesseract) or [`EasyOCR`](https://github.com/JaidedAI/EasyOCR). For me, tesseract never gave great results. EasyOCR did, but it is slow on CPU. While there is GPU acceleration with CUDA, this does not work on a Mac. *(Update from 9/2023: Apparently EasyOCR now has MPS support for Mac.)*
In any case, as a Mac user you might notice that newer macOS versions let you copy and paste text directly from images, and the built-in OCR functionality is quite good. The underlying functionality for this is [`VNRecognizeTextRequest`](https://developer.apple.com/documentation/vision/vnrecognizetextrequest) from Apple's Vision framework. Unfortunately it is a Swift API; luckily, a Python wrapper exists: [`pyobjc-framework-Vision`](https://github.com/ronaldoussoren/pyobjc). `ocrmac` utilizes this wrapper and provides an easy interface to use it for OCR (a rough sketch of the direct pyobjc route appears at the end of this section).
I found the following resources very helpful when implementing this:
- [Gist from RhetTbull](https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac587fc4c)
- [Apple Documentation](https://developer.apple.com/documentation/vision/recognizing_text_in_images/)
- [Using Pythonista with VNRecognizeTextRequest](https://forum.omz-software.com/topic/6016/recognize-text-from-picture)
I also wrote a small writeup about OCR on macOS in this blog post on [medium.com](https://betterprogramming.pub/a-practical-guide-to-extract-text-from-images-ocr-in-python-d8c9c30ae74b).
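For the curious, here is a rough sketch of what such a wrapper does under the hood, calling `VNRecognizeTextRequest` directly through pyobjc. This is an illustration of the mechanism, not `ocrmac`'s actual implementation (assumes `pyobjc-framework-Vision` is installed and a local `test.png` exists):
```python
import Vision
from Foundation import NSURL

# Point Vision at an image on disk and run a text-recognition request.
url = NSURL.fileURLWithPath_('test.png')
handler = Vision.VNImageRequestHandler.alloc().initWithURL_options_(url, {})
request = Vision.VNRecognizeTextRequest.alloc().init()
success, error = handler.performRequests_error_([request], None)

if success:
    for observation in request.results():
        # Each observation holds ranked candidate strings.
        candidate = observation.topCandidates_(1)[0]
        print(candidate.string(), candidate.confidence())
```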
## Contributing
If you have a feature request or a bug report, please post it either as an idea in the discussions or as an issue on the GitHub issue tracker. If you want to contribute, open a PR for it. Thanks!
If you like the project, consider starring it!