https://github.com/rapidai/paddleocrmodelconvert

Convert the model in PaddleOCR to ONNX format

Host: GitHub
URL: https://github.com/rapidai/paddleocrmodelconvert
Owner: RapidAI
License: apache-2.0
Created: 2021-07-17T09:39:30.000Z (about 5 years ago)
Default Branch: main
Last Pushed: 2025-03-07T05:59:08.000Z (over 1 year ago)
Last Synced: 2025-04-24T01:47:16.033Z (over 1 year ago)
Topics: convert, onnx, paddle, paddleocr, rapidocr
Language: Python
Homepage:
Size: 58 MB
Stars: 82
Watchers: 2
Forks: 12
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

🔄 PaddleOCR Model Convert

### Introduction
- This repository converts [inference models from PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/models_list.md) into ONNX format.
- **Input**: the **URL** or local **tar** path of an inference model
- **Output**: the converted **ONNX** model
- For a recognition model, you also need to provide the raw txt path of the corresponding dictionary, which is written into the ONNX model. (**To get the raw path, open the txt file on GitHub and click "Raw" in the upper right corner; the result looks like [this](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/ppocr/utils/ppocr_keys_v1.txt)**.)
- ☆ The converted model is meant to be used with the inference code in [RapidOCR](https://github.com/RapidAI/RapidOCR).
- If a model fails to convert, go through the steps in the figure below one by one to find where it goes wrong.

### Overall framework
```mermaid
flowchart TD

A([PaddleOCR inference model]) --paddle2onnx--> B([ONNX])
B --> C([Change Dynamic Input]) --> D([Rec: save the character dict to onnx])
D --> E([Save])
```
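
The last two stages of the flowchart can be sketched with the `onnx` Python package. This is only an illustration under stated assumptions, not the code used inside `paddleocr_convert`: the helper names, the metadata key `character`, and the default choice of which axes to make dynamic are all assumptions made for the example.

```python
from pathlib import Path
from typing import Sequence


def dict_to_meta_value(dict_text: str) -> str:
    """Normalize dictionary text to one entry per line, LF-separated."""
    lines = [line.rstrip("\r") for line in dict_text.split("\n")]
    # Drop trailing empty lines so no empty entry is appended to the dictionary.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines)


def make_dynamic_and_embed(
    onnx_in: str,
    onnx_out: str,
    dict_path: str = "",
    dynamic_axes: Sequence[int] = (0, 2, 3),  # assumption: batch, height, width
) -> None:
    """Mark input axes as dynamic and optionally embed a character dictionary."""
    import onnx  # third-party: pip install onnx

    model = onnx.load(onnx_in)

    # Some exporters also list weights as graph inputs; skip those.
    initializer_names = {init.name for init in model.graph.initializer}
    for graph_input in model.graph.input:
        if graph_input.name in initializer_names:
            continue
        dims = graph_input.type.tensor_type.shape.dim
        for axis in dynamic_axes:
            if axis < len(dims):
                # Setting dim_param replaces any fixed dim_value on this axis.
                dims[axis].dim_param = f"dyn_{axis}"

    if dict_path:
        # Assumed key name; the reader of the model must look up the same key.
        entry = model.metadata_props.add()
        entry.key = "character"
        entry.value = dict_to_meta_value(Path(dict_path).read_text(encoding="utf-8"))

    onnx.save(model, onnx_out)
```

Which axes should be dynamic depends on the model, so treat `dynamic_axes` as something to adjust rather than a fixed rule.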

### Installation
```bash
pip install paddleocr_convert
```

### Usage
> [!WARNING]
>
> Only the **inference models** available from the download addresses in [this list](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/models_list.md) are supported. A training model must first be converted manually to inference format.
>
> The **slim quantized models** in PaddleOCR cannot be converted.
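
Before converting a local archive, it can help to check that it looks like an inference model at all. The sketch below is a hypothetical pre-flight check, not part of `paddleocr_convert`. It assumes that a release 2.6 inference archive contains a `.pdmodel` file and a `.pdiparams` file, and that training checkpoints do not.

```python
import tarfile

# Assumption: these two files identify a PaddleOCR 2.x inference model.
REQUIRED_SUFFIXES = (".pdmodel", ".pdiparams")


def looks_like_inference_tar(tar_path: str) -> bool:
    """Return True if the archive holds both inference files."""
    with tarfile.open(tar_path) as tar:
        names = [member.name for member in tar.getmembers() if member.isfile()]
    return all(
        any(name.endswith(suffix) for name in names)
        for suffix in REQUIRED_SUFFIXES
    )
```

A `False` result only suggests that the archive is not an inference model; it does not tell you whether a slim quantized model is involved.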

#### Using the command line
- Usage:
```bash
$ paddleocr_convert -h
usage: paddleocr_convert [-h] [-p MODEL_PATH] [-o SAVE_DIR]
                         [-txt_path TXT_PATH]

optional arguments:
  -h, --help            show this help message and exit
  -p MODEL_PATH, --model_path MODEL_PATH
                        The inference model url or local path of paddleocr.
                        e.g. https://paddleocr.bj.bcebos.com/PP-
                        OCRv3/chinese/ch_PP-OCRv3_det_infer.tar or
                        models/ch_PP-OCRv3_det_infer.tar
  -o SAVE_DIR, --save_dir SAVE_DIR
                        The directory of saving the model.
  -txt_path TXT_PATH, --txt_path TXT_PATH
                        The raw txt url or local txt path, if the model is
                        recognition model.
```
- Example:
```bash
# online
$ paddleocr_convert -p https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar \
    -o models

$ paddleocr_convert -p https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar \
    -o models \
    -txt_path https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/ppocr/utils/ppocr_keys_v1.txt

# offline
$ paddleocr_convert -p models/ch_PP-OCRv3_det_infer.tar \
    -o models

$ paddleocr_convert -p models/ch_PP-OCRv3_rec_infer.tar \
    -o models \
    -txt_path models/ppocr_keys_v1.txt
```

#### Using a Python script
- online mode
```python
from paddleocr_convert import PaddleOCRModelConvert

converter = PaddleOCRModelConvert()
save_dir = 'models'
url = 'https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar'
txt_url = 'https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/ppocr/utils/ppocr_keys_v1.txt'

converter(url, save_dir, txt_path=txt_url)
```
- offline mode
```python
from paddleocr_convert import PaddleOCRModelConvert

converter = PaddleOCRModelConvert()
save_dir = 'models'
model_path = 'models/ch_PP-OCRv3_rec_infer.tar'
txt_path = 'models/ppocr_keys_v1.txt'
converter(model_path, save_dir, txt_path=txt_path)
```
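
When several models need converting, the calls above can be wrapped in a loop that records failures instead of stopping at the first one. This is a minimal sketch: `convert_many` is a hypothetical helper, and it assumes that `txt_path` may be omitted for non-recognition models, as the command-line examples suggest.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# (model URL or local tar path, dictionary URL or path, or None)
Job = Tuple[str, Optional[str]]


def convert_many(
    jobs: Iterable[Job],
    save_dir: str,
    converter: Optional[Callable[..., object]] = None,
) -> Dict[str, str]:
    """Convert each job and return {model_path: error message} for failures."""
    if converter is None:
        # Imported lazily so the helper can be exercised with a stand-in.
        from paddleocr_convert import PaddleOCRModelConvert

        converter = PaddleOCRModelConvert()

    failures: Dict[str, str] = {}
    for model_path, txt_path in jobs:
        try:
            if txt_path is None:
                converter(model_path, save_dir)
            else:
                converter(model_path, save_dir, txt_path=txt_path)
        except Exception as exc:  # keep going; report the failure afterwards
            failures[model_path] = str(exc)
    return failures
```

For example, `convert_many([("models/ch_PP-OCRv3_det_infer.tar", None), ("models/ch_PP-OCRv3_rec_infer.tar", "models/ppocr_keys_v1.txt")], "models")` would convert both models from the offline example and return an empty dict if neither fails.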

### Use the model
Suppose you need a Japanese recognition model and have already converted it, with the result saved at `local/models/japan.onnx`.

1. Install the `rapidocr_onnxruntime` library
```bash
pip install rapidocr_onnxruntime
```
2. Use it from a script
```python
from rapidocr_onnxruntime import RapidOCR

model_path = 'local/models/japan.onnx'
engine = RapidOCR(rec_model_path=model_path)

img = '1.jpg'
result, elapse = engine(img)
```
3. Use it from the command line
```bash
rapidocr_onnxruntime -img 1.jpg --rec_model_path local/models/japan.onnx
```
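
The `result` returned in step 2 can then be reduced to plain text. The sketch below assumes that `result` is `None` when nothing is recognized and otherwise a list of `[box, text, score]` entries, with the score given as a float or a numeric string. Check this against the `rapidocr_onnxruntime` version you have installed; `extract_text` is a hypothetical helper.

```python
from typing import Any, List, Optional, Sequence, Tuple


def extract_text(
    result: Optional[Sequence[Sequence[Any]]],
    min_score: float = 0.5,
) -> List[Tuple[str, float]]:
    """Keep (text, score) pairs whose score is at least min_score."""
    if not result:
        return []
    kept: List[Tuple[str, float]] = []
    for _box, text, score in result:
        score = float(score)  # tolerate scores stored as strings
        if score >= min_score:
            kept.append((text, score))
    return kept
```

The `min_score` threshold of 0.5 is arbitrary and should be tuned for the images at hand.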

### Changelog

- 2023-09-22 v0.0.17 update:
  - Improve logging when an error occurs.
- 2023-07-27 v0.0.16 update:
  - Add an online conversion version hosted on ModelScope.
  - Change the supported Python versions to 3.6 ~ 3.11.
- 2023-04-13 update:
  - Add an online conversion program: [link](https://huggingface.co/spaces/SWHL/PaddleOCRModelConverter)
- 2023-03-05 v0.0.4~7 update:
  - Support conversion of local models and dictionaries.
  - Optimize internal logic and error feedback.
- 2023-02-28 v0.0.3 update:
  - Automatically switch models without dynamic input to dynamic input.
- 2023-02-27 v0.0.2 update:
  - Package the model conversion code so that users can convert models themselves.
- 2022-08-15 v0.0.1 update:
  - Write the dictionary of the recognition model into the metadata of the ONNX model for subsequent distribution.
