https://github.com/dwqs/ollama-ocr

A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models

Host: GitHub
URL: https://github.com/dwqs/ollama-ocr
Owner: dwqs
License: mit
Created: 2025-01-01T14:04:45.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-01-01T14:59:43.000Z (over 1 year ago)
Last Synced: 2025-01-01T15:29:27.828Z (over 1 year ago)
Language: Vue
Size: 64.5 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


> Inspired by [imanoop7/Ollama-OCR](https://github.com/imanoop7/Ollama-OCR).

## Ollama OCR for Web

A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images.
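Under the hood, text extraction comes down to sending an image and an instruction prompt to a locally running Ollama server. The sketch below is not taken from this repository's source. It assumes Ollama's default address `http://localhost:11434` and its `/api/generate` endpoint, which accepts base64-encoded images in an `images` array. The names `buildOcrRequest` and `ocrImage` are hypothetical.

```typescript
// Hypothetical sketch, not this project's actual implementation.
// Assumes Ollama is listening on its default address.
const OLLAMA_URL = "http://localhost:11434";

interface GenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // raw base64 image data, without a "data:" URL prefix
  stream: boolean;
}

// Build the JSON body for Ollama's /api/generate endpoint.
function buildOcrRequest(model: string, imageBase64: string, prompt: string): GenerateRequest {
  // FileReader.readAsDataURL in the browser yields "data:image/png;base64,...";
  // strip that prefix so only the base64 payload is sent.
  const payload = imageBase64.replace(/^data:[^;]+;base64,/, "");
  return { model, prompt, images: [payload], stream: false };
}

// Send the request and return the model's text output.
async function ocrImage(model: string, imageBase64: string, prompt: string): Promise<string> {
  const res = await fetch(`${OLLAMA_URL}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildOcrRequest(model, imageBase64, prompt)),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  // With stream: false, Ollama returns one JSON object whose
  // "response" field holds the generated text.
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

Disabling streaming keeps the sketch simple. A UI that shows text as it is generated would instead read the newline-delimited JSON chunks Ollama emits when `stream` is `true`.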

## Supported Models

- **LLaVA**: A multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities in the spirit of the multimodal GPT-4. (LLaVA can sometimes produce incorrect output.)
- **Llama 3.2 Vision**: Instruction-tuned models optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
- **MiniCPM-V 2.6**: A GPT-4V-level multimodal LLM (MLLM) for single-image, multi-image, and video understanding on your phone.

## Quick Start

#### Prerequisites

1. Install [Ollama](https://ollama.com/)
2. Pull the required models:

```sh
ollama pull llama3.2-vision:11b
ollama pull llava:13b
ollama pull minicpm-v:8b
```
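To confirm the models were pulled, you can run `ollama list`, or query the server's `/api/tags` endpoint, which lists locally available models. The sketch below assumes the default Ollama address. `REQUIRED_MODELS`, `missingModels` and `checkModels` are hypothetical names, not part of this project.

```typescript
// Hypothetical helper: reports which of the models pulled above are missing.
const REQUIRED_MODELS = ["llama3.2-vision:11b", "llava:13b", "minicpm-v:8b"];

interface TagsResponse {
  models: { name: string }[];
}

// Pure function: compare the /api/tags listing against the required tags.
function missingModels(tags: TagsResponse, required: string[] = REQUIRED_MODELS): string[] {
  const installed = new Set(tags.models.map((m) => m.name));
  return required.filter((name) => !installed.has(name));
}

// Query a local Ollama server (default address assumed) and log the result.
async function checkModels(baseUrl = "http://localhost:11434"): Promise<void> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Could not reach Ollama: ${res.status}`);
  const missing = missingModels((await res.json()) as TagsResponse);
  console.log(missing.length === 0 ? "All models available" : `Missing: ${missing.join(", ")}`);
}
```

Call `checkModels()` once Ollama is running, then pull any reported model with `ollama pull <name>`. Keeping the comparison in a pure function makes it easy to test without a server.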

Then clone the repository, install the dependencies, and start the development server:

```sh
git clone git@github.com:dwqs/ollama-ocr.git
cd ollama-ocr
# Install dependencies
yarn        # or: npm i
# Start the dev server
yarn dev    # or: npm run dev
```

## Docker Support

You can also run the demo with Docker, using the `debounce/ollama-ocr` image.

## Examples

#### Input Image 1

![input-image](https://image-static.segmentfault.com/149/814/1498143911-677575ecd6977_fix732)

#### Output Markdown

![output-markdown.png](https://image-static.segmentfault.com/338/339/3383395719-67757691e9b37_fix732)

#### Input Image 2

![input-image](https://image-static.segmentfault.com/257/222/2572220334-677579c2747c7_fix732)

#### Output JSON

![output-json.png](https://image-static.segmentfault.com/104/188/1041885248-677579f517f02_fix732)

## Output Format Details

- **Markdown Format**: The output is a Markdown string containing the extracted text from the image.
- **Text Format**: The output is a plain text string containing the extracted text from the image.
- **JSON Format**: The output is a JSON object containing the extracted text from the image.
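The three formats plausibly differ mainly in the instruction sent to the model and in how its reply is post-processed. The sketch below works under that assumption; the prompt strings and the `parseOcrOutput` helper are hypothetical, not taken from the project's source. Vision models often wrap JSON answers in a Markdown code fence, so the JSON branch strips one before parsing.

```typescript
type OutputFormat = "markdown" | "text" | "json";

// Hypothetical prompts; the project's real prompts may differ.
const PROMPTS: Record<OutputFormat, string> = {
  markdown: "Extract all text from the image and format it as Markdown, preserving headings, lists and tables.",
  text: "Extract all text from the image as plain text, without any formatting.",
  json: "Extract all text from the image and return it as a single JSON object. Return only the JSON.",
};

// Post-process the raw model reply according to the requested format.
function parseOcrOutput(format: OutputFormat, raw: string): unknown {
  if (format !== "json") return raw.trim();
  // If the reply contains a Markdown code fence (optionally tagged "json"),
  // parse only the fenced content; otherwise parse the whole reply.
  const fenced = raw.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/i);
  const body = (fenced ? fenced[1] : raw).trim();
  return JSON.parse(body); // throws SyntaxError if the model returned invalid JSON
}
```

Because models do not always return valid JSON, a caller might catch the `SyntaxError` and fall back to displaying the raw text.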

## License

MIT
