https://github.com/robbyzhaox/myocr
A highly extensible and customizable framework for building OCR systems.
ai cv machine-learning ocr
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/robbyzhaox/myocr
- Owner: robbyzhaox
- License: apache-2.0
- Created: 2025-03-13T03:27:27.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-05-25T06:37:34.000Z (5 months ago)
- Last Synced: 2025-05-25T07:35:38.697Z (5 months ago)
- Topics: ai, cv, machine-learning, ocr
- Language: Python
- Homepage: https://robbyzhaox.github.io/myocr/
- Size: 8.32 MB
- Stars: 254
- Watchers: 1
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome - robbyzhaox/myocr - A highly extensible and customizable framework for building OCR systems. (Python)
README
# MyOCR - Advanced OCR Pipeline Builder
[Documentation](https://robbyzhaox.github.io/myocr/) | [HuggingFace Demo](https://huggingface.co/spaces/robbyzhaox/myocr) | [Docker Hub](https://hub.docker.com/r/robbyzhaox/myocr) | [PyPI](https://pypi.org/project/myocr-kit/) | [License](LICENSE)

English | [简体中文](./README_zh.md)

MyOCR is a highly extensible and customizable framework for building OCR systems. Engineers can easily train deep learning models and integrate them into custom OCR pipelines for real-world applications.

Try the online demo on [HuggingFace](https://huggingface.co/spaces/robbyzhaox/myocr) or [ModelScope](https://modelscope.cn/studios/robbyzhao/myocr/summary).
## **Key Features**:
- **End-to-End OCR Development Framework** - Designed for developers to build and integrate detection, recognition, and custom OCR models in a unified and flexible pipeline.
- **Modular & Extensible** - Mix and match components: swap models, predictors, or input/output processors with minimal changes.
- **Developer-Friendly by Design** - Clean Python APIs, prebuilt pipelines and processors, and straightforward customization for training and inference.
- **Production-Ready Performance** - ONNX Runtime support for fast CPU/GPU inference, plus multiple deployment options.
## Updates
- **2025.05.17 MyOCR v0.1.1 released**
## Installation
### Requirements
- Python 3.11+
- CUDA: Version 12.6 or higher is recommended for GPU acceleration. CPU-only mode is also supported.
- Operating System: Linux, macOS, or Windows.
### Install Dependencies
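As a quick pre-install check, the Python requirement listed above can be verified from the interpreter itself. This is a minimal sketch; the helper name `meets_python_requirement` is illustrative and not part of MyOCR.

```python
import sys

# Minimum version taken from the requirements list above.
MIN_PYTHON = (3, 11)


def meets_python_requirement(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_python_requirement() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```
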
```bash
# Clone the code from GitHub
git clone https://github.com/robbyzhaox/myocr.git
cd myocr

# You can create your own venv before the following steps
# Install dependencies
pip install -e .

# Development environment installation
pip install -e ".[dev]"

# Download pre-trained model weights into the models directory
# for Linux, macOS
mkdir -p ~/.MyOCR/models/
# for Windows, the "models" directory can be created in the current path
# Download weights from: https://drive.google.com/drive/folders/1RXppgx4XA_pBX9Ll4HFgWyhECh5JtHnY
# Alternative download link: https://pan.baidu.com/s/122p9zqepWfbEmZPKqkzGBA?pwd=yq6j
```
## Quick Start
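Before running a pipeline, it can help to confirm that the downloaded weights are where the install steps put them. The sketch below assumes the Linux/macOS location `~/.MyOCR/models/`; the helper name `list_weight_files` is illustrative and not part of MyOCR.

```python
from pathlib import Path

# Default weights location on Linux and macOS, per the install steps.
DEFAULT_MODEL_DIR = Path.home() / ".MyOCR" / "models"


def list_weight_files(model_dir=DEFAULT_MODEL_DIR):
    """Return the sorted names of all files under `model_dir`, or [] if it is missing."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        return []
    return sorted(p.name for p in model_dir.rglob("*") if p.is_file())


if __name__ == "__main__":
    files = list_weight_files()
    if files:
        print(f"Found {len(files)} file(s) in {DEFAULT_MODEL_DIR}:")
        for name in files:
            print(f"  {name}")
    else:
        print(f"No weights found in {DEFAULT_MODEL_DIR}; download them first.")
```

On Windows, pass the path of the `models` directory you created instead of relying on the default.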
### Local Inference
#### Basic OCR Recognition
```python
from myocr.pipelines import CommonOCRPipeline

# Initialize common OCR pipeline (using GPU)
pipeline = CommonOCRPipeline("cuda:0")  # Use "cpu" for CPU mode

# Perform OCR recognition on an image
result = pipeline("path/to/your/image.jpg")
print(result)
```
#### Structured OCR Output (Example: Invoice Information Extraction)
Configure `chat_bot` in `myocr.pipelines.config.structured_output_pipeline.yaml`:
```yaml
chat_bot:
  model: qwen2.5:14b
  base_url: http://127.0.0.1:11434/v1
  api_key: 'key'
```
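Before running the structured pipeline, it may be worth confirming that the configured endpoint is reachable. The sketch below mirrors the `chat_bot` values in a plain dictionary and builds a request for the model listing route, assuming the server implements the OpenAI-compatible `GET /models` endpoint and its usual response shape. The helper name `build_models_request` and the dictionary are illustrative and not part of MyOCR; the request is only sent when the script is run directly.

```python
import json
import urllib.error
import urllib.request

# Values copied from the chat_bot section of the YAML config above.
CHAT_BOT = {
    "model": "qwen2.5:14b",
    "base_url": "http://127.0.0.1:11434/v1",
    "api_key": "key",
}


def build_models_request(config):
    """Build a GET request for the OpenAI-compatible model listing endpoint."""
    url = config["base_url"].rstrip("/") + "/models"
    headers = {"Authorization": f"Bearer {config['api_key']}"}
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    request = build_models_request(CHAT_BOT)
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            payload = json.load(response)
        # Assumes the OpenAI-style response shape: {"data": [{"id": ...}, ...]}.
        model_ids = [entry.get("id") for entry in payload.get("data", [])]
        print("Configured model available:", CHAT_BOT["model"] in model_ids)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        print(f"Could not reach {request.full_url}: {exc}")
```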
**Note:** the chat bot currently supports:
- Ollama API
- OpenAI API
```python
from pydantic import BaseModel, Field
from myocr.pipelines import StructuredOutputOCRPipeline

# Define the output data model as a pydantic model named InvoiceModel
# before this point; refer to InvoiceModel in main.py

# Initialize structured OCR pipeline
pipeline = StructuredOutputOCRPipeline("cuda:0", InvoiceModel)

# Process image and get structured data
result = pipeline("path/to/invoice.jpg")
print(result.to_dict())
```
### Docker Deployment
The framework supports Docker deployment. The published image can be run with the following commands:
#### Run the Docker Container
```bash
docker run -d -p 8000:8000 robbyzhaox/myocr:latest

# To use the StructuredOutputOCRPipeline, set the following environment variables with the -e option of docker run
docker run -d \
  -p 8000:8000 \
  -e CHAT_BOT_MODEL="qwen2.5:14b" \
  -e CHAT_BOT_BASEURL="http://127.0.0.1:11434/v1" \
  -e CHAT_BOT_APIKEY="key" \
  robbyzhaox/myocr:latest
```
#### Accessing API Endpoints (Docker)
```bash
IMAGE_PATH="your_image.jpg"

BASE64_IMAGE=$(base64 -w 0 "$IMAGE_PATH")  # Linux
#BASE64_IMAGE=$(base64 -i "$IMAGE_PATH" | tr -d '\n') # macOS

curl -X POST \
  -H "Content-Type: application/json" \
  -d "{\"image\": \"${BASE64_IMAGE}\"}" \
  http://localhost:8000/ocr
```
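The same request can be made from Python using only the standard library. This is a sketch that mirrors the curl call above (a JSON body with a base64-encoded `image` field posted to `/ocr`); the function names are illustrative, and because the response format is not described here, the response is returned as raw text.

```python
import base64
import json
import urllib.request


def build_ocr_request(image_path, url="http://localhost:8000/ocr"):
    """Build the same POST request as the curl example: a JSON body with a base64 image."""
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")
    body = json.dumps({"image": encoded}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_ocr(image_path, url="http://localhost:8000/ocr"):
    """Send the image to the running service and return the raw response text."""
    request = build_ocr_request(image_path, url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

With the container running, `print(run_ocr("your_image.jpg"))` sends the image and prints whatever the service returns.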
### Using the REST API
The framework provides a simple Flask API service that can be called over HTTP:
```bash
# Start the service (default port: 5000)
python main.py
```
API endpoints:
- `GET /ping`: Check if the service is running properly
- `POST /ocr`: Basic OCR recognition
- `POST /ocr-json`: Structured OCR output

A UI for these endpoints is also available; see [doc-insight-ui](https://github.com/robbyzhaox/doc-insight-ui).
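
A small health check against the locally started service might look like the sketch below. It assumes the default port 5000 and that `GET /ping` answers with HTTP 200 when the service is healthy; the helper names `endpoint` and `is_service_up` are illustrative and not part of MyOCR.

```python
import urllib.error
import urllib.request

# Port 5000 is the default mentioned for `python main.py`.
BASE_URL = "http://localhost:5000"


def endpoint(path, base_url=BASE_URL):
    """Join the service base URL and an endpoint path such as "/ping"."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def is_service_up(base_url=BASE_URL, timeout=3.0):
    """Return True if GET /ping answers with HTTP 200, False on any error or failure to connect."""
    try:
        with urllib.request.urlopen(endpoint("/ping", base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print("Service reachable:", is_service_up())
```
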
## Contribution Guidelines
We welcome any form of contribution, including but not limited to:
- Submitting bug reports
- Adding new features
- Improving documentation
- Optimizing performance
## License
This project is open-sourced under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.