# LLMOCR

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

LLMOCR uses a local LLM to read text from images.

You can also change the instruction so the LLM uses the image in whatever way you prompt, not just plain text transcription.

![Screenshot](llmocr.png)
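Under the hood, the GUI sends the image and an instruction prompt to a locally running KoboldCpp instance and displays the model's reply. As a rough illustration only (not the project's actual code), the sketch below queries a KoboldCpp server directly over HTTP; it assumes a vision-capable model is already loaded and serving at `http://localhost:5001`, that KoboldCpp's `/api/v1/generate` endpoint accepts base64-encoded images, and that the prompt text is just an example.

```bash
# Rough sketch only -- not LLMOCR's actual code. Sends an image and an
# instruction to a KoboldCpp server assumed to be running on localhost:5001
# with a vision-capable model loaded.
IMG_B64=$(base64 < page.png | tr -d '\n')   # base64-encode the image

curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d "{\"prompt\": \"Transcribe all text in the image exactly as written.\",
       \"images\": [\"$IMG_B64\"],
       \"max_length\": 512}"
# The transcription is returned in the JSON response
# (under results[0].text in the standard Kobold API format).
```

Swapping out the prompt string here is the same idea as changing the instruction in the GUI.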

## Features

- **Local Processing**: All processing is done locally on your machine.
- **User-Friendly GUI**: Includes a simple GUI; all AI functionality is handled by KoboldCpp, a single executable.
- **GPU Acceleration**: Uses Apple Metal, NVIDIA CUDA, or AMD (Vulkan) hardware when available to greatly speed up inference.
- **Cross-Platform**: Supports Windows, macOS ARM, and Linux.

## Installation

### Prerequisites

- Python 3.8 or higher
- KoboldCPP

### Windows Installation

1. Clone the repository or download the [ZIP file](https://github.com/jabberjabberjabber/llmocr/archive/refs/heads/main.zip) and extract it.

2. Install [Python for Windows](https://www.python.org/downloads/windows/).

3. Download [KoboldCPP.exe](https://github.com/LostRuins/koboldcpp/releases) and place it in the LLMOCR folder. If the downloaded file has a different name, rename it to KoboldCPP.exe.

4. If you want the script to download a model for you and launch it with KoboldCpp automatically, open `llm_ocr.bat`.

5. If you want to load your own model with KoboldCpp, open `llm_ocr_no_kobold.bat` instead.

### Mac and Linux Installation

1. Clone the repository or download and extract the ZIP file.

2. Install Python 3.8 or higher if not already installed.

3. Create a new Python virtual environment and install the dependencies from `requirements.txt` (example commands for steps 3-6 are shown after this list).

4. Run KoboldCpp with the flag `--config llm-ocr.kcppt`.

5. Wait until the model weights finish downloading and the terminal window says `Please connect to custom endpoint at http://localhost:5001`.

6. Run `llm-ocr-gui.py` using Python.
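
For reference, here is an illustrative command sequence for steps 3-6, assuming a bash shell and that the KoboldCPP binary you downloaded is saved as `./koboldcpp` in the project folder (adjust the filename to match your download):

```bash
# Step 3: create and activate a virtual environment, then install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Step 4: start KoboldCpp with the bundled config; the model weights are
# downloaded on the first run. Leave this running in its own terminal.
./koboldcpp --config llm-ocr.kcppt

# Step 6: once KoboldCpp reports "Please connect to custom endpoint at
# http://localhost:5001", launch the GUI from the activated environment
python llm-ocr-gui.py
```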

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgements

- [KoboldCPP](https://github.com/LostRuins/koboldcpp) for local AI processing
- [PyQt6](https://www.riverbankcomputing.com/software/pyqt/) for the GUI framework