https://github.com/vinodbaste/paddleocr_rec_dec
Optical Character Recognition (OCR) is a powerful technology that enables machines to recognize and extract text from images or scanned documents. OCR finds applications in various fields, including document digitization, text extraction from images, and text-based data analysis.
https://github.com/vinodbaste/paddleocr_rec_dec
detection image-processing ocr paddleocr recognition
Last synced: 9 months ago
JSON representation
Optical Character Recognition (OCR) is a powerful technology that enables machines to recognize and extract text from images or scanned documents. OCR finds applications in various fields, including document digitization, text extraction from images, and text-based data analysis.
- Host: GitHub
- URL: https://github.com/vinodbaste/paddleocr_rec_dec
- Owner: vinodbaste
- License: apache-2.0
- Created: 2023-07-29T08:41:20.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-30T19:51:32.000Z (almost 3 years ago)
- Last Synced: 2025-07-11T13:38:21.749Z (11 months ago)
- Topics: detection, image-processing, ocr, paddleocr, recognition
- Language: Python
- Homepage:
- Size: 49.8 KB
- Stars: 19
- Watchers: 1
- Forks: 4
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# paddleOCR for Det and Rec
Optical Character Recognition (OCR) is a powerful technology that enables machines to recognize and extract text from images or scanned documents. OCR finds applications in various fields, including document digitization, text extraction from images, and text-based data analysis. In this article, we will explore how to use PaddleOCR, an advanced OCR toolkit based on deep learning, for text detection and recognition tasks. We will walk through a code snippet that demonstrates the process step-by-step.
# Prerequisites
Before we dive into the code, let's ensure we have everything set up to run the PaddleOCR library. Make sure you have the following prerequisites installed on your machine:
Python (3.6 or higher)
PaddleOCR library
Other necessary dependencies (e.g., NumPy, pandas, etc)
Before running the code snippet, make sure you have the necessary libraries installed. You can find all the required packages and their versions in the requirements.txt file
```python
pip install -r requirements.txt
```
# Getting started
In the provided ```main.py```, we have an example usage of the RecMain class for performing text recognition (OCR) on a folder of images and generating an output Excel file with evaluation metrics:
```
# Example usage: Replace with your image folder path, label file, and desired output file name
image_folder = "path/to/image_folder"
label_file = "path/to/txt_file"
output_file = "raw_output.xlsx"
if __name__ == "__main__":
RecMain(image_folder=image_folder, rec_file=label_file, output_file=output_file).run_rec()
```
same can be initiated for detection
```
#Dec
# Example usage: Replace with your image folder path, label file, and desired output file name
image_folder = "path/to/image_folder"
label_file = "path/to/txt_file"
output_file = "raw_output.xlsx"
if __name__ == "__main__":
DecMain(image_folder_path=image_folder, label_file_path=label_file, output_file=output_file) \
.run_dec()
```
# 1. Text Detection
The code provided is a part of a class named DecMain, which seems to be designed for Optical Character Recognition (OCR) evaluation using ground truth data. It appears to use PaddleOCR to extract text from images and then calculates metrics like precision, recall, and Character Error Rate (CER) to evaluate the performance of the OCR system.
```python
class DecMain:
def __init__(self, image_folder_path, label_file_path, output_file):
self.image_folder_path = image_folder_path
self.label_file_path = label_file_path
self.output_file = output_file
def run_dec(self):
# Check and update the ground truth file
CheckAndUpdateGroundTruth(self.label_file_path).check_and_update_ground_truth_file()
df = OcrToDf(image_folder=self.image_folder_path, label_file=self.label_file_path, det=True, rec=True, cls=False).ocr_to_df()
ground_truth_data = ReadGroundTruthFile(self.label_file_path).read_ground_truth_file()
# Get the extracted text as a list of dictionaries (representing the OCR results)
ocr_results = df.to_dict(orient="records")
# Calculate precision, recall, and CER
precision, recall, total_samples = CalculateMetrics(ground_truth_data, ocr_results).calculate_precision_recall()
CreateSheet(dataframe=df, precision=precision, recall=recall, total_samples=total_samples,
file_name=self.output_file).create_sheet()
```
# Note: Format of the DET Ground Truth Label File
```
To perform OCR evaluation using the DecMain class and the provided code, it's crucial to format the ground truth label file correctly.
The label file should be in JSON format and follow the structure as shown below:
image_name.jpg [{"transcription": "215mm 18", "points": [[199, 6], [357, 6], [357, 33], [199, 33]], "difficult": False, "key_cls": "digits"}, {"transcription": "XZE SA", "points": [[15, 6], [140, 6], [140, 36], [15, 36]], "difficult": False, "key_cls": "text"}]
The label file should be in JSON format.
Each line of the file represents an image's OCR ground truth.
Each line contains the filename of the image, followed by the OCR results for that image in the form of a JSON object.
The JSON object should have the following keys:
"transcription": The ground truth text transcription of the image.
"points": A list of four points representing the bounding box coordinates of the text region in the image.
"difficult": A boolean value indicating whether the text region is difficult to recognize.
"key_cls": The class label of the OCR result, e.g., "digits" or "text".
Make sure to follow this format while creating the ground truth label file for accurate OCR evaluation.
```
# 2. Text Recognition
The code provided defines a class named RecMain, which is designed to run text recognition (OCR) using a pre-trained OCR model on a folder of images and generate an evaluation Excel sheet.
```python
class RecMain:
def __init__(self, image_folder, rec_file, output_file):
self.image_folder = image_folder
self.rec_file = rec_file
self.output_file = output_file
def run_rec(self):
image_paths = GetImagePathsFromFolder(self.image_folder, self.rec_file). \
get_image_paths_from_folder()
ocr_model = LoadRecModel().load_model()
results = ProcessImages(ocr=ocr_model, image_paths=image_paths).process_images()
ground_truth_data = ConvertTextToDict(self.rec_file).convert_txt_to_dict()
model_predictions, ground_truth_texts, image_names, precision, recall, \
overall_model_precision, overall_model_recall, cer_data_list = EvaluateRecModel(results,
ground_truth_data).evaluate_model()
# Create Excel sheet
CreateMetricExcel(image_names, model_predictions, ground_truth_texts,
precision, recall, cer_data_list, overall_model_precision, overall_model_recall,
self.output_file).create_excel_sheet()
```
# Note: Format of the Ground Truth Text File
```
To perform OCR evaluation using the RecMain class and the provided code, it's essential to format the ground truth (GT) text file correctly.
The GT text file should be in the following format:
image_name.jpg text
Each line of the file represents an image's GT text.
Each line contains the filename of the image, followed by a tab character (\t), and then the GT text for that image.
Ensure that the GT text file contains GT text entries for all the images present in the image folder specified in the RecMain class. The GT text should match the actual text content present in the images. This format is necessary for accurate evaluation of the OCR model's performance.
```
**If you find this library useful, please consider starring this repository from the top of this page.**
[](#)
# License
```
Copyright [2023] [Vinod Baste]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```