https://github.com/jabberjabberjabber/llmocr
Simple script that reads an image and extracts its text using a vision model and KoboldCpp
- Host: GitHub
- URL: https://github.com/jabberjabberjabber/llmocr
- Owner: jabberjabberjabber
- License: mit
- Created: 2024-09-10T20:54:17.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-03-16T12:12:33.000Z (2 months ago)
- Last Synced: 2025-03-25T07:22:18.661Z (2 months ago)
- Language: Python
- Homepage:
- Size: 435 KB
- Stars: 54
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# LLMOCR
[License: MIT](https://opensource.org/licenses/MIT)
LLMOCR uses a local LLM to read text from images.
You can also change the instruction so the LLM uses the image in whatever way you prompt.
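
Under the hood, the GUI talks to a locally running KoboldCpp server. The sketch below is a minimal, hypothetical illustration of that flow, not the project's actual code: it assumes KoboldCpp is listening on the default port 5001 and that its OpenAI-compatible `/v1/chat/completions` endpoint accepts base64-encoded image content; the request that `llm-ocr-gui.py` actually builds may differ.

```python
# Minimal sketch: send an image plus an instruction to a running KoboldCpp
# instance via its OpenAI-compatible chat endpoint. The endpoint path and
# payload shape are assumptions; check the KoboldCpp docs and llm-ocr-gui.py
# for the exact request the tool sends.
import base64
import requests

ENDPOINT = "http://localhost:5001/v1/chat/completions"  # default KoboldCpp port

def read_image_text(image_path: str,
                    instruction: str = "Transcribe all text in this image.") -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        "max_tokens": 1024,
        "temperature": 0.0,  # deterministic output suits OCR
    }
    response = requests.post(ENDPOINT, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(read_image_text("page.png"))
```

Changing the `instruction` string is how you repurpose the model, for example asking it to summarize or describe the image instead of transcribing it.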

## Features
- **Local Processing**: All processing is done locally on your machine.
- **User-Friendly GUI**: Ships with a simple GUI and relies on KoboldCpp, a single executable, for all AI functionality.
- **GPU Acceleration**: Uses Apple Metal, NVIDIA CUDA, or AMD (Vulkan) hardware when available to greatly speed up inference.
- **Cross-Platform**: Supports Windows, macOS ARM, and Linux.

## Installation
### Prerequisites
- Python 3.8 or higher
- KoboldCpp

### Windows Installation
1. Clone the repository or download the [ZIP file](https://github.com/jabberjabberjabber/llmocr/archive/refs/heads/main.zip) and extract it.
2. Install [Python for Windows](https://www.python.org/downloads/windows/).
3. Download [KoboldCPP.exe](https://github.com/LostRuins/koboldcpp/releases) and place it in the LLMOCR folder. If it is not named KoboldCPP.exe, rename it to KoboldCPP.exe.
4. If you want the script to download a model and launch KoboldCpp for you, run `llm_ocr.bat`.
5. If you want to load your own model in KoboldCpp yourself, run `llm_ocr_no_kobold.bat`.
### Mac and Linux Installation
1. Clone the repository or download and extract the ZIP file.
2. Install Python 3.8 or higher if not already installed.
3. Create a new Python virtual environment and install the dependencies from `requirements.txt`.
4. Run KoboldCpp with the flag `--config llm-ocr.kcppt`.
5. Wait until the model weights finish downloading and the terminal window says `Please connect to custom endpoint at http://localhost:5001` (you can also wait for the server programmatically, as sketched after this list).
6. Run `llm-ocr-gui.py` using Python.
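
Instead of watching the terminal for the ready message, you can poll the local server and launch the GUI once it responds. This is a minimal sketch, not part of the project: it assumes KoboldCpp's `GET /api/v1/model` status route and the default port 5001; adjust if your version differs.

```python
# Minimal sketch: wait for the local KoboldCpp server to finish loading,
# then start the GUI. Endpoint and timing values are assumptions.
import subprocess
import sys
import time

import requests

STATUS_URL = "http://localhost:5001/api/v1/model"  # assumed KoboldCpp status route

def wait_for_koboldcpp(timeout_s: int = 1800) -> None:
    """Poll the server until it answers, or raise after timeout_s seconds."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(STATUS_URL, timeout=5).ok:
                return
        except requests.RequestException:
            pass  # server not up yet, likely still downloading/loading weights
        time.sleep(5)
    raise TimeoutError("KoboldCpp did not become ready in time")

if __name__ == "__main__":
    wait_for_koboldcpp()
    subprocess.run([sys.executable, "llm-ocr-gui.py"], check=True)
```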
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgements
- [KoboldCPP](https://github.com/LostRuins/koboldcpp) for local AI processing
- [PyQt6](https://www.riverbankcomputing.com/software/pyqt/) for the GUI framework