https://github.com/bytefer/ollama-ocr
Implementing OCR with a local visual model run by ollama.
https://github.com/bytefer/ollama-ocr
llama llama-vision-model llama3 ollama ollama-ocr vison-models
Last synced: about 1 year ago
JSON representation
Implementing OCR with a local visual model run by ollama.
- Host: GitHub
- URL: https://github.com/bytefer/ollama-ocr
- Owner: bytefer
- License: mit
- Created: 2024-11-25T04:54:51.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-27T14:10:05.000Z (over 1 year ago)
- Last Synced: 2025-05-11T12:20:13.510Z (about 1 year ago)
- Topics: llama, llama-vision-model, llama3, ollama, ollama-ocr, vison-models
- Language: TypeScript
- Homepage:
- Size: 415 KB
- Stars: 278
- Watchers: 7
- Forks: 26
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Ollama OCR
An OCR tool based on Ollama-supported visual models such as [Llama 3.2-Vision](https://ollama.com/library/llama3.2-vision) or [MiniCPM-V 2.6](https://ollama.com/library/minicpm-v) accurately recognizes text in images while preserving the original formatting.
## Features
- 🚀 High accuracy text recognition using Llama 3.2-Vision/MiniCPM-V 2.6 model
- 📝 Preserves original text formatting and structure
- 🖼️ Supports multiple image formats: JPG, JPEG, PNG
- ⚡️ Customizable recognition prompts and models
- 🔍 Markdown output format option
- 💪 Robust error handling
> Accurate text recognition on macOS: [macos-vision-ocr](https://github.com/bytefer/macos-vision-ocr).
## System Requirements
- Node.js 18.0 or higher
- Local running [Ollama](https://ollama.com/) server
- [Llama 3.2-Vision](https://ollama.com/library/llama3.2-vision) model installed
## Important Notes
1. Ensure Ollama server is running before use
2. Make sure Llama 3.2-Vision model is downloaded
3. Currently supported image formats: .jpg, .jpeg, .png
## Installation
```bash
npm install ollama-ocr
# or using pnpm
pnpm add ollama-ocr
```
## Usage
### Basic Usage
```javascript
import { ollamaOCR, DEFAULT_OCR_SYSTEM_PROMPT } from "ollama-ocr";
async function runOCR() {
const text = await ollamaOCR({
filePath: "./test/images/handwriting.jpg",
systemPrompt: DEFAULT_OCR_SYSTEM_PROMPT,
});
console.log(text);
}
```
### Markdown Output
```javascript
import { ollamaOCR, DEFAULT_MARKDOWN_SYSTEM_PROMPT } from "ollama-ocr";
async function runOCR() {
const text = await ollamaOCR({
filePath: "./test/images/trader-joes-receipt.jpg",
systemPrompt: DEFAULT_MARKDOWN_SYSTEM_PROMPT,
});
console.log(text);
}
```
## Use MiniCPM-V 2.6 Vision Model
```javascript
async function runOCR() {
const text = await ollamaOCR({
model: "minicpm-v",
filePath: "./handwriting.jpg.jpg",
systemPrompt: DEFAULT_OCR_SYSTEM_PROMPT,
});
console.log(text);
}
```
## Error Handling
The tool provides comprehensive error handling:
```javascript
import { ollamaOCR, LlamaOCRError, ErrorCode } from "ollama-ocr";
async function runOCR() {
try {
const text = await ollamaOCR({
filePath: "./test/images/handwriting.jpg",
});
console.log(text);
} catch (error) {
if (error instanceof LlamaOCRError) {
switch (error.code) {
case ErrorCode.FILE_NOT_FOUND:
console.error("Image file not found");
break;
case ErrorCode.UNSUPPORTED_FILE_TYPE:
console.error("Unsupported image format");
break;
case ErrorCode.OLLAMA_SERVER_ERROR:
console.error("Ollama server connection failed");
break;
case ErrorCode.OCR_PROCESSING_ERROR:
console.error("OCR processing failed");
break;
}
}
}
}
```
## License
MIT