https://github.com/dwqs/ollama-ocr
A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models
- Host: GitHub
- URL: https://github.com/dwqs/ollama-ocr
- Owner: dwqs
- License: mit
- Created: 2025-01-01T14:04:45.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-01T14:59:43.000Z (over 1 year ago)
- Last Synced: 2025-01-01T15:29:27.828Z (over 1 year ago)
- Language: Vue
- Size: 64.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
> Inspired by [imanoop7/Ollama-OCR](https://github.com/imanoop7/Ollama-OCR)
## Ollama OCR for Web
A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images.
## Supported Models
- **LLaVA**: A multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities in the spirit of the multimodal GPT-4. (LLaVA can sometimes produce incorrect output.)
- **Llama 3.2 Vision**: Instruction-tuned models optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
- **MiniCPM-V 2.6**: A GPT-4V-level multimodal LLM (MLLM) for single-image, multi-image, and video understanding that can run on a phone.
## Quick Start
#### Prerequisites
1. Install [Ollama](https://ollama.com/)
2. Pull the required models:
```sh
ollama pull llama3.2-vision:11b
ollama pull llava:13b
ollama pull minicpm-v:8b
```
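Before starting the app, you may want to confirm that the models are actually available locally. Below is a minimal shell sketch, assuming `ollama list` prints a header row followed by one model per line with the model name in the first column. `missing_models` is a hypothetical helper for illustration, not part of this project.

```sh
# Hypothetical helper: print every requested model that is missing
# from `ollama list` output. The listing is read from stdin.
missing_models() {
  # Drop the header row and keep only the first (NAME) column.
  names=$(awk 'NR > 1 { print $1 }')
  for model in "$@"; do
    # -x matches whole lines only; -F treats the model name as a fixed string.
    if ! printf '%s\n' "$names" | grep -qxF "$model"; then
      echo "$model"
    fi
  done
}
```

Run it as `ollama list | missing_models llama3.2-vision:11b llava:13b minicpm-v:8b`. If it prints nothing, all three models are present. Reading from stdin rather than calling `ollama` inside the function keeps the check usable against a saved listing as well.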
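Then clone the repository, install the dependencies, and start the development server:
```sh
git clone git@github.com:dwqs/ollama-ocr.git
cd ollama-ocr
# Install dependencies with either package manager
yarn          # or: npm i
# Start the development server
yarn dev      # or: npm run dev
```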
## Docker Support
You can also run the demo with Docker, using the `debounce/ollama-ocr` image.
## Examples
#### Input Image 1

#### Output Markdown

#### Input Image 2

#### Output JSON

## Output Format Details
- **Markdown Format**: The output is a Markdown string containing the extracted text from the image.
- **Text Format**: The output is a plain text string containing the extracted text from the image.
- **JSON Format**: The output is a JSON object containing the extracted text from the image.
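The web UI sends requests to Ollama for you, but the same kind of request can be made directly against Ollama's REST API. The sketch below is a minimal shell helper, assuming Ollama is listening on its default address `http://localhost:11434` and using its `/api/generate` endpoint, which accepts base64-encoded images in an `images` array. The helper name `ocr_payload` and the prompt wording are hypothetical and are not taken from this project's source. Ollama's `"format": "json"` field constrains the reply to valid JSON, so the sketch sets it only for the JSON output mode.

```sh
# Hypothetical helper: build an Ollama /api/generate request body for OCR.
# $1 = model name, $2 = image path, $3 = output format (markdown|text|json)
ocr_payload() {
  model=$1
  image=$2
  fmt=$3
  # Ollama expects base64 image data; strip the line breaks base64 may insert.
  img_b64=$(base64 < "$image" | tr -d '\n')
  # Illustrative prompts, one per output format (markdown is the default).
  case "$fmt" in
    json) prompt='Extract all text from this image and return it as a JSON object.' ;;
    text) prompt='Extract all text from this image as plain text.' ;;
    *)    prompt='Extract all text from this image and format it as Markdown.' ;;
  esac
  # Only the JSON mode sets Ollama's "format" field.
  extra=''
  if [ "$fmt" = json ]; then
    extra=', "format": "json"'
  fi
  printf '{"model": "%s", "prompt": "%s", "images": ["%s"], "stream": false%s}\n' \
    "$model" "$prompt" "$img_b64" "$extra"
}
```

For example, `ocr_payload llama3.2-vision:11b receipt.png markdown | curl -s http://localhost:11434/api/generate -d @-` posts the request (`receipt.png` is a placeholder file name). Because `"stream": false` is set, Ollama returns a single JSON object whose `response` field holds the extracted text.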
## License
MIT