https://github.com/uk0/llmocr
Uses an LLM together with OCR to summarize OCR-recognized content and return the corresponding structured data.
- Host: GitHub
- URL: https://github.com/uk0/llmocr
- Owner: uk0
- License: apache-2.0
- Created: 2024-09-01T11:14:41.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-09-10T10:23:57.000Z (2 months ago)
- Last Synced: 2024-10-12T18:02:21.126Z (about 1 month ago)
- Topics: llm, ocr, ollama, pp-ocrv4
- Language: Python
- Homepage:
- Size: 47.5 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
### LLM + OCR
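* Plugins could be added later to handle data from unusual tables, and `openai` models could be used as well; for now this is a proof of concept (POC).
* The main goal is to make it easy to recognize small, simple images, for example when text or messages in an image need to be extracted or copied. It can also recognize some blurry content.
#### Quick Start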
* PP-OCR-V4.0
* Ollama (`gemma2:2b-instruct-q8_0`)
* flask
* chrome plugin

```shell
pip install -r requirements.txt
python app.py
```
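
To illustrate how the components listed above might fit together, here is a minimal sketch of the LLM step: OCR lines are joined into text, wrapped in a prompt, and sent to a local Ollama server that is asked to reply in JSON. It is not the repository's actual code. The helper names, the prompt wording, the confidence threshold, and the PaddleOCR-style input shape `[[box, (text, score)], ...]` are all assumptions made for illustration; the endpoint is Ollama's default local `/api/generate` address.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; the project's real settings may differ.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "gemma2:2b-instruct-q8_0"  # model named in the Quick Start list


def ocr_lines_to_text(ocr_lines, min_score=0.5):
    """Join OCR lines into plain text, skipping low-confidence ones.

    Assumed input shape (PaddleOCR-style): [[box, (text, score)], ...].
    """
    kept = [text for _box, (text, score) in ocr_lines if score >= min_score]
    return "\n".join(kept)


def build_prompt(ocr_text):
    """Hypothetical prompt; the project's real prompt may differ."""
    return (
        "The following text was extracted from an image by OCR. "
        "Return a JSON object whose keys are field names and whose values "
        "are the corresponding field values.\n\n" + ocr_text
    )


def parse_llm_json(raw):
    """Parse the model's reply, falling back to the raw text on invalid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"raw_text": raw}


def summarize(ocr_lines):
    """Send OCR text to a local Ollama server and return structured data."""
    payload = {
        "model": MODEL,
        "prompt": build_prompt(ocr_lines_to_text(ocr_lines)),
        "stream": False,   # one complete reply instead of streamed chunks
        "format": "json",  # ask Ollama to constrain the output to JSON
    }
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return parse_llm_json(body["response"])
```

Calling `summarize(lines)` requires a running Ollama server with the model already pulled. The other helpers are pure functions and can be tried without any server.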
### Install the Chrome plugin
* Open `chrome://extensions/`.
* Switch developer mode on.
* Load the unpacked extension.
![img.png](doc/img.png)
* Find an image and right-click it to open it with WiseRead (`Analyze Image`).
![img.png](doc/img_id_card.png)
* The result is shown in the Chrome tab, in the box on the right:
![img.png](doc/img1.png)
### Model
* llama3.1:8b-instruct-q8_0 `best results`
* gemma2:9b-instruct-q8_0
* Qwen1.5-MoE-A2.7B-Chat:latest
### TODO
* Optimize the prompts (in progress)
* Use RAG to improve the results and make them more stable (in progress)