https://github.com/ai-forever/readingpipeline
Text reading pipeline that combines segmentation and OCR-models.
object-detection ocr pytorch segmentation text-recognition
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/ai-forever/readingpipeline
- Owner: ai-forever
- License: mit
- Created: 2021-12-28T08:35:45.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-02-06T14:54:33.000Z (over 3 years ago)
- Last Synced: 2025-04-19T18:17:21.393Z (about 1 year ago)
- Topics: object-detection, ocr, pytorch, segmentation, text-recognition
- Language: Python
- Homepage:
- Size: 456 KB
- Stars: 26
- Watchers: 2
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Reading Pipeline
This is a pipeline for text detection and reading. It combines the [OCR](https://github.com/ai-forever/OCR-model) and [Segmentation](https://github.com/ai-forever/SEGM-model) models into a single pipeline that segments an input image, crops the text regions from it, and then reads the text in each region using OCR.
## Demo
Two web demos of ReadingPipeline are available on Hugging Face: a [web demo](https://huggingface.co/spaces/sberbank-ai/PeterRecognition) for the Peter the Great dataset and a [web demo](https://huggingface.co/spaces/sberbank-ai/NotebooksRecognition) for the school notebooks dataset.
There is also a [demo notebook](scripts/ReadPipeline-GoogleColab.ipynb) with an example of using ReadingPipeline, which you can run in Google Colab.
### Models
[Weights for reading manuscripts of Peter the Great](https://huggingface.co/sberbank-ai/ReadingPipeline-Peter) and the [Peter dataset](https://huggingface.co/datasets/sberbank-ai/Peter).
[Weights for reading handwritten school notebooks](https://huggingface.co/sberbank-ai/ReadingPipeline-notebooks) and the school notebook datasets themselves: [RU data](https://huggingface.co/datasets/sberbank-ai/school_notebooks_RU) and [EN data](https://huggingface.co/datasets/sberbank-ai/school_notebooks_EN).
## Quick setup and start
- NVIDIA drivers >= 470, CUDA >= 11.4
- [Docker](https://docs.docker.com/engine/install/ubuntu/), [nvidia-docker](https://github.com/NVIDIA/nvidia-docker)
The provided [Dockerfile](Dockerfile) builds an image with CUDA and cuDNN support.
## Preparations
- Clone the repo.
- Download the weights and config files of the segmentation and OCR models to the `data/` folder.
- Run `sudo make all` to build a Docker image and create a container.
Or run `sudo make all GPUS=device=0 CPUS=10` if you want to specify GPU devices and limit CPU resources.
If you don't want to use Docker, you can install the dependencies from `requirements.txt`.
## Configuring the pipeline
You can change the pipeline parameters in [pipeline_config.json](scripts/pipeline_config.json).
### Main pipeline loop
The `main_process` dict defines the order of the main processing methods that make up the pipeline loop. Classes are initialized with the parameters specified in the config and are called one after another in that order.
`PipelinePredictor` is the class responsible for assembling the pipeline; it is located in [ocrpipeline/predictor.py](ocrpipeline/predictor.py). To add a new class to the pipeline, add it to the `MAIN_PROCESS_DICT` dictionary in [ocrpipeline/predictor.py](ocrpipeline/predictor.py) and specify it in the `main_process` dict in the config, at the point in the chain where the class should be called.
```
"main_process": {
    "SegmPrediction": {...},
    "RestoreImageAngle": {...},
    "ClassContourPosptrocess": {...},
    "OCRPrediction": {...},
    "LineFinder": {...},
    ...
}
```
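The registry mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the code from `ocrpipeline/predictor.py`: the step classes `Uppercase` and `AddSuffix`, the helpers `build_pipeline` and `run_pipeline`, and the dict passed between steps are all hypothetical. Only the name `MAIN_PROCESS_DICT` and the `main_process` config key come from the text above.

```python
# Hypothetical step classes; the real pipeline steps work on images and predictions.
class Uppercase:
    def __call__(self, data):
        data["text"] = data["text"].upper()
        return data


class AddSuffix:
    def __init__(self, suffix):
        self.suffix = suffix

    def __call__(self, data):
        data["text"] = data["text"] + self.suffix
        return data


# Registry mapping config names to classes (the same idea as MAIN_PROCESS_DICT).
MAIN_PROCESS_DICT = {
    "Uppercase": Uppercase,
    "AddSuffix": AddSuffix,
}


def build_pipeline(main_process_config):
    """Instantiate each step with its config parameters, keeping config order."""
    steps = []
    for name, params in main_process_config.items():
        if name not in MAIN_PROCESS_DICT:
            raise KeyError(f"Unknown pipeline step: {name}")
        steps.append(MAIN_PROCESS_DICT[name](**params))
    return steps


def run_pipeline(steps, data):
    """Call the steps one after another, passing each result to the next step."""
    for step in steps:
        data = step(data)
    return data


config = {"main_process": {"Uppercase": {}, "AddSuffix": {"suffix": "!"}}}
steps = build_pipeline(config["main_process"])
result = run_pipeline(steps, {"text": "hello"})
print(result["text"])
```

The sketch relies on Python dicts preserving insertion order, so the order of keys in the config determines the order in which the steps run.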
### Models runtime, ONNX
You can specify the runtime for the OCR and segmentation models.
```
"main_process": {
    "SegmPrediction": {
        "model_path": "/path/to/model.ckpt",
        "config_path": "/path/to/config.json",
        "num_threads": 8,
        "device": "cuda",
        "runtime": "Pytorch"  # here you can choose the runtime
    },
    ...
}
```
You can choose the runtime from several options: "Pytorch" (cuda and cpu devices), "ONNX" (cpu only) or "OpenVino" (cpu only).
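The runtime and device combinations listed above can be checked before building the pipeline. The following is a minimal sketch: `ALLOWED_DEVICES` and `check_runtime` are hypothetical names, not part of the repository, and whether the pipeline performs a similar check itself is not stated here. Treating a device string such as `cuda:0` as `cuda` is also an assumption.

```python
# Allowed devices per runtime, as listed in the text above.
ALLOWED_DEVICES = {
    "Pytorch": {"cuda", "cpu"},
    "ONNX": {"cpu"},
    "OpenVino": {"cpu"},
}


def check_runtime(step_params):
    """Return (runtime, device) if the combination is allowed, else raise ValueError."""
    runtime = step_params["runtime"]
    device = step_params["device"]
    if runtime not in ALLOWED_DEVICES:
        raise ValueError(f"Unknown runtime: {runtime}")
    # Assumption: an index suffix such as "cuda:0" refers to the "cuda" device type.
    device_type = device.split(":")[0]
    if device_type not in ALLOWED_DEVICES[runtime]:
        raise ValueError(f"Runtime {runtime} does not support device {device}")
    return runtime, device


segm_params = {"device": "cuda", "runtime": "Pytorch"}
print(check_runtime(segm_params))
```

Calling `check_runtime` with `"runtime": "ONNX"` and `"device": "cuda"` raises a `ValueError`, which surfaces a misconfiguration before any model is loaded.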
### Class specific parameters
Parameters in the `classes` dict are set individually for each class. The class names must match the class names of the segmentation model.
The `contour_posptrocess` dict defines the order in which the contours predicted by the segmentation model are processed. Classes are initialized with the parameters specified in the config and are called one after another in that order.
`ClassContourPosptrocess` is the class responsible for assembling and calling the `contour_posptrocess` methods; it is located in [ocrpipeline/predictor.py](ocrpipeline/predictor.py). To add a new class to the pipeline, add it to the `CONTOUR_PROCESS_DICT` dictionary in [ocrpipeline/predictor.py](ocrpipeline/predictor.py) and specify it in the `contour_posptrocess` dict in the config, at the point in the chain where the class should be called.
```
"classes": {
    "shrinked_pupil_text": {
        "contour_posptrocess": {
            "BboxFromContour": {},
            "UpscaleBbox": {"upscale_bbox": [1.4, 2.3]}
        }
    },
    ...
}
```
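To give an idea of what the two steps in this example might do, here is a minimal sketch in plain Python. It is not the repository's implementation: the functions `bbox_from_contour` and `upscale_bbox` are hypothetical, and the sketch assumes that the two values in `upscale_bbox` are scale factors for the box width and height, applied around the box center. The real classes may behave differently, for example by clipping the box to the image bounds.

```python
def bbox_from_contour(contour):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a list of (x, y) points."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    return min(xs), min(ys), max(xs), max(ys)


def upscale_bbox(bbox, upscale_x, upscale_y):
    """Grow the box around its center by the given factors (assumed semantics)."""
    x_min, y_min, x_max, y_max = bbox
    dx = (x_max - x_min) * (upscale_x - 1) / 2
    dy = (y_max - y_min) * (upscale_y - 1) / 2
    return x_min - dx, y_min - dy, x_max + dx, y_max + dy


# A rectangular contour 20 pixels wide and 10 pixels high.
contour = [(10, 20), (30, 20), (30, 30), (10, 30)]
bbox = bbox_from_contour(contour)
print(bbox)                          # (10, 20, 30, 30)
print(upscale_bbox(bbox, 2.0, 3.0))  # (0.0, 10.0, 40.0, 40.0)
```

Enlarging the box this way leaves some margin around a tightly segmented text region before it is cropped and passed to the OCR model.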
## Inference
An example of pipeline inference can be found in [inference_pipeline.ipynb](scripts/inference_pipeline.ipynb).
To evaluate the accuracy of the pipeline (the OCR model combined with the SEGM model), you can use the [evaluate](scripts/evaluate.py) script. You first need to generate model predictions; see the example in [inference_pipeline_on_dataset.ipynb](scripts/inference_pipeline_on_dataset.ipynb).