{"id":13574360,"url":"https://github.com/oneapi-src/historical-assets-document-process","last_synced_at":"2025-04-04T15:30:45.062Z","repository":{"id":66145924,"uuid":"574715802","full_name":"oneapi-src/historical-assets-document-process","owner":"oneapi-src","description":"AI Starter Kit for Historical Assets document processing using Intel® Extension for Pytorch","archived":true,"fork":false,"pushed_at":"2024-05-08T23:57:32.000Z","size":1359,"stargazers_count":7,"open_issues_count":0,"forks_count":7,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-05T09:44:29.393Z","etag":null,"topics":["ai-starter-kit","deep-learning","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oneapi-src.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-05T23:18:31.000Z","updated_at":"2024-10-27T18:48:37.000Z","dependencies_parsed_at":"2024-05-08T20:42:24.582Z","dependency_job_id":"1bc71cbe-ae86-4bda-ac0e-1b9dc6de47bc","html_url":"https://github.com/oneapi-src/historical-assets-document-process","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneapi-src%2Fhistorical-assets-document-process","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneapi-src%2Fhistorical-assets-document-process/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/on
eapi-src%2Fhistorical-assets-document-process/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneapi-src%2Fhistorical-assets-document-process/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oneapi-src","download_url":"https://codeload.github.com/oneapi-src/historical-assets-document-process/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247202528,"owners_count":20900794,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-starter-kit","deep-learning","pytorch"],"created_at":"2024-08-01T15:00:50.854Z","updated_at":"2025-04-04T15:30:40.052Z","avatar_url":"https://github.com/oneapi-src.png","language":"Python","readme":"PROJECT NOT UNDER ACTIVE MANAGEMENT\n\nThis project will no longer be maintained by Intel.\n\nIntel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.  \n\nIntel no longer accepts patches to this project.\n\nIf you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.  
\n\nContact: webadmin@linux.intel.com\n# Historical Assets Document Process\n\n## Introduction\n\nBuild an optimized Optical Character Recognition (OCR) solution to automate text detection and extraction from input document images using Intel® Extension for PyTorch\\*, Intel® Neural Compressor and the Intel® Distribution of OpenVINO\u003csup\u003eTM\u003c/sup\u003e toolkit. Check out the [Developer Catalog](https://developer.intel.com/aireferenceimplementations) for information about different use cases.\n\n## Solution Technical Overview\nHistorically, businesses and organizations have faced the need to manage a huge volume of printed documents for multiple purposes, like obtaining a customer’s credit history, collecting patients’ medical histories or accessing legal documents for judicial cases. Manually processing this enormous flow of paper-based documents represents a big challenge for any industry, since the manual procedure takes a lot of time to carry out, is prone to human error/bias, and requires considerable physical space to store hundreds or thousands of paper files.\n\nThe issue of using storage facilities to preserve the documents can be addressed by a paperless, digitized solution that offers a way to easily store the printed documents in a suitable database. However, a document scanned into an image of text is different from machine-encoded text, which allows, for example, efficiently using a text editor to modify an old file, or retrieving a document from a database by searching for a specific entity, like a client’s name. 
In this context, a large set of scanned files still requires domain specialists to manually extract useful information, which takes time, increases the cost of the process, and cannot eradicate the potential intentional or unintentional errors due to human intervention.\n\nOptical Character Recognition (OCR) systems emerge as an automated solution that generates machine-encoded text from input document images, making the processing of an increasing number of digital files more efficient, in addition to minimizing human intervention [[1]](#hegghammer_2021)[[2]](#li_2022).\n\nIn an OCR pipeline, an input document image flows into a text detection component and is then processed by a text recognition component. In the text detection stage, the objective is to localize all text regions within the input document images, where each of these text zones is known as a region of interest (ROI). Once the ROIs are detected, they are cropped from the input images and passed to the text recognition component, which is in charge of identifying the text contained in the ROIs and transcribing it into machine-encoded text. This process is illustrated in the following diagram:\n\n![ocr-flow](assets/ocr_flow_diagram_op.png)\n\nNowadays, AI (Artificial Intelligence) methods in the form of cutting-edge deep learning algorithms are commonly incorporated into OCR solutions to increase their efficiency in the processing of scanned files and their accuracy in the text recognition task [[3]](#memon_2020). 
Deep learning detection models like YOLO variants and CRAFT are frequently used in the text detection module to localize the ROIs, whereas models like Convolutional Recurrent Neural Networks (CRNN) and Transformers are implemented as part of the text recognition stage [[2]](#li_2022)[[4]](#faustomorales_2019).\n\nAlthough deep learning-based OCR systems effectively recognize and extract text from images, in a production environment, where a massive number of digitized documents may query an OCR engine, it becomes essential to scale compute resources while maintaining accuracy and speeding up the inference time of text extraction from document images.\n\nThis reference kit presents an OCR solution that features deep learning models for the text detection and recognition stages. For the text detection process, a Python\\* library called EasyOCR is used. For the text recognition stage, a CRNN architecture is implemented. Even though both detection and recognition components are contained in the proposed OCR pipeline, note that this OCR solution mainly focuses on the training and inference stages of the text recognition module. 
For more details about the text recognition component, please refer to [this section](#solution-technical-details).\n\nBesides offering an OCR system based on state-of-the-art deep learning techniques, the proposed OCR solution also considers the scale demands of a production environment by boosting the performance of the text recognition component using the following Intel® packages:\n\n* ***Intel® Extension for PyTorch\\****\n\n  With a few lines of code, you can use [Intel® Extension for PyTorch*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/optimization-for-pytorch.html#gs.5vjhbw) to:\n    * Take advantage of the most up-to-date Intel software and hardware optimizations for PyTorch.\n    * Automatically mix different precision data types to reduce the model size and computational workload for inference.\n    * Add your own performance customizations using APIs.\n\n* ***Intel® Neural Compressor***\n\n  [Intel® Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html#gs.5vjr1p) performs model compression to reduce the model size and increase the speed of deep learning inference for deployment on CPUs or GPUs. This open source Python* library automates popular model compression technologies, such as quantization, pruning, and knowledge distillation across multiple deep learning frameworks.\n\n* ***Intel® Distribution of OpenVINO\u003csup\u003eTM\u003c/sup\u003e Toolkit***\n\n  The [OpenVINO\u003csup\u003eTM\u003c/sup\u003e](https://www.intel.com/content/www/us/en/download/753640/intel-distribution-of-openvino-toolkit.html) toolkit:\n     * Enables the use of models trained with popular frameworks, such as TensorFlow* and PyTorch*.\n     * Optimizes inference of deep learning models with methods that do not require model retraining or fine-tuning, like post-training quantization. 
\n     * Supports heterogeneous execution across Intel hardware, using a common API for the Intel CPU, Intel® Integrated Graphics, Intel® Discrete Graphics, and other commonly used accelerators. \n\nIn particular, Intel® Neural Compressor functionalities are applied to compress the CRNN text extraction model via a post-training quantization procedure, which improves the inference performance of the model without compromising its accuracy and supports an efficient deployment of the quantized model on a wide range of Intel® CPUs and GPUs. Similarly, the Intel® Distribution of OpenVINO\u003csup\u003eTM\u003c/sup\u003e toolkit reduces model size by using quantization techniques, but also features an optimized deployment across Intel platforms, including edge devices and cloud environments. A detailed description of how this reference kit implements Intel® optimization packages can be found in this [section](#how-it-works).\n\nWith the aim of providing an accessible approach for conducting frequent re-training and analyzing the performance of multiple CRNN models for the text extraction component, this OCR solution enables hyperparameter tuning. 
Combined with the use of cutting-edge deep learning models and Intel® optimization packages, hyperparameter tuning makes it possible to leverage this reference kit as a useful resource for the machine learning practitioner looking to easily build and deploy a custom OCR system optimized to accurately extract text from document images.\n\nFurthermore, by avoiding the manual retrieval of specific information from a myriad of paper-based files and drastically reducing human bias in the process, this reference kit presents an OCR system that the machine learning practitioner can leverage to perform text extraction for multiple applications, including [[5]](#thompson_2016)[[6]](#shatri_2020)[[7]](#oucheikh_2022)[[8]](#arvindrao_2023):\n  * Preservation of data contained in historical texts dating back centuries.\n  * Recognition of musical notation within scanned sheet music.\n  * Extraction of text information from products to reduce shrinkage loss in grocery stores.\n  * Automated processing of financial documents to combat fraud, increase productivity and improve customer service.  \n\nFor more details, visit [Intel® Extension for PyTorch\\*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/optimization-for-pytorch.html#gs.5vjhbw), [Intel® Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html#gs.5vjr1p), [Intel® Distribution of OpenVINO\u003csup\u003eTM\u003c/sup\u003e Toolkit](https://www.intel.com/content/www/us/en/download/753640/intel-distribution-of-openvino-toolkit.html), the [Historical Assets Document Process](https://github.com/oneapi-src/historical-assets-document-process) GitHub repository, and the [EasyOCR](https://github.com/JaidedAI/EasyOCR) GitHub repository.\n\n## Solution Technical Details\nIn this section, the interested reader can find a more in-depth explanation of the text recognition component of the proposed OCR solution. 
A description of the dataset used to perform training and inference is also presented.\n\n### Text Recognition Component\nThis reference kit is mainly focused on the text recognition component of the OCR workflow. The motivation behind this operation mode is that in a typical OCR system, the deep learning model that performs the text detection is a pretrained model that offers enough generalization capability to effectively localize the ROIs in the image dataset of the current task, so the process of finetuning the text detection model on the given dataset can be skipped. As a reference, the finetuning procedure consists of adjusting the weights already learned by the pretrained deep learning model, in order to make them more relevant for the problem at hand [[9]](#chollet_2017). Using the given dataset, this weight adjustment is achieved by training some or all of the layers of the deep learning model.\n\nFor this reference kit, the text detection task is carried out by EasyOCR, a Python\\* library that uses the CRAFT (Character Region Awareness For Text Detection) framework, which incorporates a pretrained large scale convolutional neural network. \n\nOn the other hand, the text recognition component does implement a deep learning model that is finetuned on the ROIs detected in the given dataset, where this detection stage is first performed by EasyOCR. It is advisable to finetune the text recognition model on the given dataset because, for a particular text recognition problem, the corresponding text images could exhibit certain font properties or come from particular sources, like a scanned ID card or a photo of an old manuscript, where the text could have very specific characteristics. 
In this scenario, finetuning the recognition model on the given dataset is the appropriate approach to achieve high accuracy on the text recognition task.\n\nFor the recognition phase, this reference kit uses a CRNN model to identify text within the document images. A CRNN is a neural network architecture composed of a Convolutional Neural Network (CNN) followed by a certain type of Recurrent Neural Network (RNN), which for this project is a Long Short-Term Memory (LSTM) network. \n\nA CNN model is useful to automatically extract sequential hierarchical features (known as feature maps) from each input image, whereas an LSTM, like any RNN, exhibits great proficiency in capturing contextual information within a sequence, in addition to being a type of RNN specialized in capturing long-term dependencies [[10]](#choi_2016)[[11]](#shi_2017). In fact, in the context of an OCR pipeline, the text recognition task is framed as an encoder-decoder problem that leverages a CNN-based encoder for image understanding and an RNN-based decoder for text generation [[2]](#li_2022).\n\nThe LSTM model used by the CRNN in this project works under a bidirectional approach. A traditional LSTM is unidirectional and only uses contexts from the past, but for an image-based sequence problem, like the one addressed in this reference kit, a bidirectional LSTM is more appropriate, as contexts containing previous and subsequent data are useful and complementary to each other [[11]](#shi_2017). \n\nRegarding the workflow of the CRNN, it receives a cropped ROI image from EasyOCR as an input, and the convolutional component proceeds to extract a sequence of feature maps, which are then mapped into a sequence of feature vectors. Next, the bidirectional LSTM makes a prediction for each feature vector. Finally, a post-processing step is carried out to convert the LSTM predictions into a label sequence. 
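In the canonical CRNN formulation [[11]](#shi_2017), that final post-processing step is a CTC-style greedy decode: merge consecutive repeated predictions, then drop the blank token. A minimal, framework-free sketch (the frame indices and letter mapping below are illustrative only, not the kit's actual alphabet):

```python
def ctc_greedy_decode(frame_predictions, blank=0):
    """Collapse per-frame argmax predictions into a label sequence:
    merge consecutive repeats, then drop the blank token."""
    decoded, previous = [], None
    for label in frame_predictions:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded

# Hypothetical per-frame argmax indices from the bidirectional LSTM
# (0 is the blank; 1..26 map to 'a'..'z' for illustration only).
frames = [8, 8, 0, 5, 5, 12, 0, 12, 15]
labels = ctc_greedy_decode(frames)
print("".join(chr(ord("a") + i - 1) for i in labels))  # hello
```

Note how the blank between the two `12` frames is what allows the double letter to survive the repeat-merging step.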
The diagram below provides an illustrative reference of this process.\n\n![ocr-flow](assets/crnn_flow_diagram_op.png)\n\nIn terms of model architecture, the CRNN is composed of seven convolutional layers, interleaved with max pooling and batch normalization layers. As for the RNN, it consists of two bidirectional LSTM layers, each of them followed by a linear layer. The next table summarizes the structure of the CRNN model implemented in this reference kit. \"in_maps\" stands for \"input feature maps\", \"out_maps\" for \"output feature maps\", \"k\" for \"kernel size\", \"s\" for \"stride\", \"p\" for \"padding\", \"in_features\" is the size of each input instance and \"out_features\" is the size of each output instance.\n\n| **Layer** | Setup\n| :--- | :---\n| **Input** | Input ROI image\u003cbr\u003e\n| **Convolutional** | in_maps:1, out_maps:64, k:3 X 3, s:(1,1), p:(1,1)\n| **Max pooling** | k:2 X 2, s:2\n| **Convolutional** | in_maps:64, out_maps:128, k:3 X 3, s:(1,1), p:(1,1)\n| **Max pooling** | k:2 X 2, s:2\n| **Convolutional** | in_maps:128, out_maps:256, k:3 X 3, s:(1,1), p:(1,1)\n| **BatchNormalization** | -\n| **Convolutional** | in_maps:256, out_maps:256, k:3 X 3, s:(1,1), p:(1,1)\n| **Max pooling** | k:2 X 2, s:(2,1), p:(0,1)\n| **Convolutional** | in_maps:256, out_maps:512, k:3 X 3, s:(1,1), p:(1,1)\n| **BatchNormalization** | -\n| **Convolutional** | in_maps:512, out_maps:512, k:3 X 3, s:(1,1), p:(1,1)\n| **Max pooling** | k:2 X 2, s:(2,1), p:(0,1)\n| **Convolutional** | in_maps:512, out_maps:512, k:3 X 3, s:(1,1)\n| **BatchNormalization** | -\n| **Bidirectional LSTM** | hidden state:256\n| **Linear** | in_features: 512, out_features:256\n| **Bidirectional LSTM** | hidden state:256\n| **Linear** | in_features: 512, out_features:5835\n\nPlease see this [section](#download-the-crnn-model) to obtain more information on how to download the CRNN model used in this reference kit.\n\n### Dataset\nThis reference kit uses a synthetic image dataset of 3,356 labelled 
images created by the package [TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator). In this dataset, each image contains some text, so these images represent the ROIs that will feed the CRNN text recognition model. As ground truth, TextRecognitionDataGenerator also generates a text file containing each image path and the corresponding text that appears in the image. The dataset is automatically split into training and test sets. The next table describes the main attributes of the dataset.\n\n| **Use case** | OCR Text Extraction\n| :--- | :---\n| **Size** | Total 3,356 Labelled Images\n| **Train Images** | 3,256\n| **Test Images** | 100\n| **Input Size** | 32x280\n\nTo set up this dataset using TextRecognitionDataGenerator, follow the instructions listed [here](#download-the-datasets).\n\n\u003e *The data set used here was generated synthetically. Intel does not own the rights to this data set and does not confer any rights to it.*\n\n## Validated Hardware Details\nThere are workflow-specific hardware and software setup requirements depending on how the workflow is run. \n\n| Recommended Hardware                                            | Precision\n| ----------------------------------------------------------------|-\n| CPU: 2nd Gen Intel® Xeon® Platinum 8280 CPU @ 2.70GHz or higher | FP32, INT8\n| RAM: 187 GB                                                     |\n| Recommended Free Disk Space: 20 GB or more                      |\n\nCode was tested on Ubuntu\\* 22.04 LTS.\n\n## How it Works\nThe text recognition component supports both training and inference modalities. Furthermore, this reference kit provides the option to incorporate the trained CRNN text recognition model into an end-to-end OCR system to make predictions from a complete document image. All these procedures are optimized using Intel® specialized packages. 
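At a high level, the end-to-end system chains the two stages: EasyOCR detects the ROIs, each ROI is cropped, and the CRNN recognizes the text in each crop. A minimal sketch of that flow with stand-in detector and recognizer callables (both hypothetical; the kit's actual entry point is `src/ocr_pipeline.py`):

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) ROI coordinates

@dataclass
class OCRPipeline:
    """Two-stage OCR flow: detect ROIs, then recognize the text in each crop.

    `detect` and `recognize` are placeholders for EasyOCR's detector and the
    trained CRNN recognizer; both signatures are hypothetical."""
    detect: Callable[[list], List[Box]]
    recognize: Callable[[list], str]

    def run(self, image: list) -> List[str]:
        texts = []
        for left, top, right, bottom in self.detect(image):
            # Crop the detected region of interest from the page image.
            roi = [row[left:right] for row in image[top:bottom]]
            texts.append(self.recognize(roi))
        return texts

# Toy usage with stub components: one fake ROI covering the letters "h", "i".
page = [["h", "i"], ["!", "!"]]
pipeline = OCRPipeline(
    detect=lambda img: [(0, 0, 2, 1)],
    recognize=lambda roi: "".join(roi[0]),
)
print(pipeline.run(page))  # ['hi']
```

Swapping the stubs for the real detector and recognizer is all that separates this skeleton from the production pipeline.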
The next diagram illustrates the workflow of these processes and how the Intel® optimization features are applied in each stage. \n\n![ocr-flow](assets/e2e_flow_diagram_op.png)\n\n### Intel® Extension for PyTorch\\*\nTraining a CRNN model, and running inference with it, are usually compute-intensive tasks. To address these requirements and to gain a performance boost on Intel® hardware, in this reference kit the training and inference stages of the CRNN model include the implementation of Intel® Extension for PyTorch\\*.\n\nThe training step through Intel® Extension for PyTorch\\* has two operation modalities:\n   1)\tRegular training. In this modality, the CRNN text recognition model is finetuned, allowing the user to change the batch size. The resulting trained CRNN model is considered the pretrained model that will be used to apply hyperparameter tuning in the next modality.\n   2)\tHyperparameter tuning. For this approach, the pretrained CRNN model from the previous modality is finetuned via the grid search paradigm, which performs an exhaustive search for optimal hyperparameters over different values of batch size, learning rate and number of epochs. These values are defined by default, as shown in the following table.\n\n        | **Hyperparameter** | Values\n        | :--- | :---\n        | **Batch size** | 80\n        | **Number of epochs** | 5, 10\n        | **Learning rate** | 1e-3\n\nRegarding the inference phase, any of the CRNN models trained with Intel® Extension for PyTorch\\* can be used to conduct inference tests. 
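The grid search described above simply enumerates every combination of the default values and keeps the best-performing model. A sketch of that loop, where `train_and_eval` is a hypothetical stand-in for the kit's actual training-and-validation routine:

```python
from itertools import product

# Default search space from the table above.
GRID = {"batch_size": [80], "epochs": [5, 10], "learning_rate": [1e-3]}

def grid_search(train_and_eval):
    """Exhaustively try every hyperparameter combination, keep the best."""
    best_score, best_params = float("-inf"), None
    for combo in product(*GRID.values()):
        params = dict(zip(GRID.keys(), combo))
        score = train_and_eval(**params)  # e.g. validation accuracy
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Toy scoring function standing in for real training plus validation.
params, score = grid_search(lambda batch_size, epochs, learning_rate: epochs)
print(params)  # {'batch_size': 80, 'epochs': 10, 'learning_rate': 0.001}
```

With the default grid this is only two trainings (1 batch size x 2 epoch counts x 1 learning rate), which keeps the exhaustive search affordable.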
An additional advantage of this reference kit is that any of the trained CRNN models can be leveraged to make end-to-end predictions based on an input document image.\n\nAnother important aspect of the CRNN models trained with Intel® Extension for PyTorch\\* is that these models are trained using FP32 precision.\n\n### Intel® Neural Compressor\nOnce the CRNN models have been trained using Intel® Extension for PyTorch\\*, their inference efficiency can be accelerated even further with the Intel® Neural Compressor library. This project enables the use of Intel® Neural Compressor to convert any of the trained FP32 CRNN models into an INT8 CRNN model by implementing post-training quantization, which, apart from reducing model size, increases inference speed. \n\nJust like the CRNN models trained with Intel® Extension for PyTorch\\*, the CRNN model quantized with Intel® Neural Compressor can be used to carry out end-to-end predictions.\n\n### Intel® Distribution of OpenVINO™ Toolkit\nSimilar to Intel® Neural Compressor, the Intel® Distribution of OpenVINO™ toolkit makes it possible to reduce the model size with post-training quantization, which improves inference performance. By using the Intel® Distribution of OpenVINO™ toolkit post-training quantization, the FP32 CRNN model is converted to INT8. Moreover, the Intel® Distribution of OpenVINO™ toolkit optimizes the CRNN model for deployment in resource-constrained environments, like edge devices.\n\nIn order to quantize the FP32 CRNN model using the Intel® Distribution of OpenVINO™ toolkit, it is necessary to first convert the original FP32 CRNN model into the ONNX (Open Neural Network Exchange) model representation. After the model is converted to ONNX, it must be converted into an Intermediate Representation (IR) format, which is an internal Intel® Distribution of OpenVINO™ toolkit model representation. 
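Under the hood, post-training quantization (whether via Intel® Neural Compressor or the Intel® Distribution of OpenVINO™ toolkit) maps FP32 tensors to INT8 using a scale and zero-point. A simplified, framework-free illustration of that affine scheme (not the libraries' actual API; the weight values are toy numbers):

```python
def quantize_params(values, num_bits=8):
    """Derive an affine scale/zero-point pair covering the observed range."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(values + [0.0]), max(values + [0.0])  # range must include 0
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against an all-zero range
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, num_bits=8):
    """Map FP32 values onto the signed INT8 grid, clamping to its bounds."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale + zero_point))) for v in values]

def dequantize(q_values, scale, zero_point):
    """Recover approximate FP32 values from their INT8 representation."""
    return [(q - zero_point) * scale for q in q_values]

weights = [-0.51, 0.02, 0.37, 1.20]   # toy FP32 weights
scale, zp = quantize_params(weights)
q = quantize(weights, scale, zp)      # INT8 representation
restored = dequantize(q, scale, zp)   # differs from weights by small rounding error
```

The rounding error introduced here is the accuracy/size trade-off that both quantization paths manage through calibration on representative data.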
Once the CRNN model is in IR format, the Intel® Distribution of OpenVINO™ toolkit directly quantizes the IR model via the Post-training Optimization (POT) tool and transforms it into an INT8 model. These conversion stages are illustrated in the following diagram.\n\n![ocr-flow](assets/conversion_stages_op.png)\n\nAnother benefit of using the Intel® Distribution of OpenVINO™ toolkit is that it enables the use of the benchmark Python\\* tool, a feature that estimates the inference performance of the corresponding deep learning model on supported devices [[12]](#openvino). The estimated inference performance is calculated in terms of latency and throughput. For this use case, the benchmark Python\\* tool is applied on the ONNX, IR and quantized INT8 models.\n\nAs can be seen, this reference kit offers the option to optimize the inference performance of the CRNN model not just with Intel® Neural Compressor, but also with the Intel® Distribution of OpenVINO™ toolkit.\n\nPlease refer to the [Get Started](#get-started) section to see the instructions to implement the training, inference and end-to-end modalities using Intel® Extension for PyTorch\\*, Intel® Neural Compressor and the Intel® Distribution of OpenVINO™ toolkit.\n\n## Get Started\nStart by **defining an environment variable** that will store the workspace path; this can be an existing directory or one to be created in further steps. This ENVVAR will be used for all the commands executed using absolute paths.\n\n[//]: # (capture: baremetal)\n```bash\nexport WORKSPACE=$PWD/historical-assets-document-process\n```\n\nIt is also necessary to define the following environment variables to correctly set up this reference kit:\n\n[//]: # (capture: baremetal)\n```bash\nexport DATA_DIR=$WORKSPACE/data\nexport OUTPUT_DIR=$WORKSPACE/output \n```\n\n**DATA_DIR:** This path will contain all the directories and files required to manage the dataset. 
\n\n**OUTPUT_DIR:** This path will contain the multiple outputs generated by the workflow, e.g. the FP32 and INT8 CRNN models.\n\n### Download the Workflow Repository\nCreate the workspace directory for the workflow and clone the [Historical Assets Document Process](https://github.com/oneapi-src/historical-assets-document-process) repository inside it.\n\n[//]: # (capture: baremetal)\n```bash\nmkdir -p $WORKSPACE \u0026\u0026 cd $WORKSPACE\n```\n\n```bash\ngit clone https://github.com/oneapi-src/historical-assets-document-process.git $WORKSPACE\n```\n\n### Set Up Conda\nPlease follow the instructions below to download and install Miniconda.\n\n1. Download the required Miniconda installer for Linux.\n   \n   ```bash\n   wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\n   ```\n\n2. Install Miniconda.\n   \n   ```bash\n   bash Miniconda3-latest-Linux-x86_64.sh\n   ```\n\n3. Delete the Miniconda installer.\n   \n   ```bash\n   rm Miniconda3-latest-Linux-x86_64.sh\n   ```\n\nPlease visit [Conda Installation on Linux](https://docs.anaconda.com/free/anaconda/install/linux/) for more details. \n\n### Set Up Environment\nExecute the next commands to install and set up libmamba as conda's default solver.\n\n```bash\nconda install -n base conda-libmamba-solver\nconda config --set solver libmamba\n```\n\n| Packages | Version | \n| -------- | ------- |\n| python | 3.9 |\n| intelpython3_core | 2024.1.0 |\n| intel-extension-for-pytorch | 2.2.0 |\n| neural-compressor| 2.4.1 |\n| openvino-dev| 2023.2.0 |\n\nThe dependencies required to properly execute this workflow can be found in the yml file [$WORKSPACE/env/intel_env.yml](env/intel_env.yml).\n\nProceed to create the conda environment.\n\n```bash\nconda env create -f $WORKSPACE/env/intel_env.yml\n```\n\nEnvironment setup is required only once. This step does not clean up an existing environment with the same name, so make sure there is no conda environment with the same name before proceeding. 
During this setup, the `historical_assets_intel` conda environment will be created with the dependencies listed in the YAML configuration.\n\nActivate the `historical_assets_intel` conda environment as follows:\n\n```bash\nconda activate historical_assets_intel\n```\n\n### Download the Datasets\nThe next instructions generate and set up the synthetic image dataset:\n\n1. Create the `dataset` directory\n\n[//]: # (capture: baremetal)\n```bash\nmkdir -p $DATA_DIR/dataset\n```\n\n2. Generate the synthetic image dataset:\n\n[//]: # (capture: baremetal)\n```bash\nsh $WORKSPACE/src/dataset_gen.sh\n```\n\n\u003e *After running the above steps, the synthetic image dataset will be generated in `$DATA_DIR/dataset`. The files `train.txt` and `test.txt` will be created inside `$DATA_DIR/dataset`, and they will be used for training and inference, respectively. These files contain the path to each image and its associated label, which is the corresponding text that appears in the image. Finally, a directory located in `$DATA_DIR/pipeline_test_images` will be created to store some images useful to test the end-to-end performance of this reference kit.*\n\nIn this [section](#dataset), a detailed description of the properties of the dataset is provided.\n\n### Download the CRNN Model\nThe next instructions download and set up the CRNN model for further training and inference using Intel® optimization packages.\n\n1. Download the text recognition CRNN model called `CRNN-1010.pth`.\n   \n   [//]: # (capture: baremetal)\n   ```bash\n   wget --no-check-certificate 'https://docs.google.com/uc?export=download\u0026id=15TNHcvzNIjxCnoURkK1RGqe0m3uQAYxM' -O CRNN-1010.pth\n   ```\n\n2. Create the `output` directory, and inside it, the `crnn_models` folder, which will store the different CRNN models that the workflow will generate.\n  \n   [//]: # (capture: baremetal)\n   ```bash\n   mkdir -p $OUTPUT_DIR/crnn_models\n   ```\n\n3. 
Move the `CRNN-1010.pth` model to the `$OUTPUT_DIR/crnn_models` folder.\n   \n   [//]: # (capture: baremetal)\n   ```bash\n   mv $WORKSPACE/CRNN-1010.pth $OUTPUT_DIR/crnn_models/CRNN-1010.pth\n   ```\n\nIn this [section](#text-recognition-component), the interested reader can find technical information about the CRNN model.\n\n## Supported Runtime Environment\nThe execution of this reference kit is compatible with the following environments:\n* Bare Metal\n\n---\n\n### Run Using Bare Metal\nBefore executing the different stages of this use case, make sure your ``conda`` environment is already configured. If you don't already have ``conda`` installed, go to [Set Up Conda](#set-up-conda), or if the ``conda`` environment is not already activated, refer to [Set Up Environment](#set-up-environment).\n\n\u003e *Note: It is assumed that the present working directory is the root directory of this code repository. Use the following command to go to the root directory.*\n\n```bash\ncd $WORKSPACE\n```\n\n#### Run Workflow\nThe following subsections provide the commands to run an optimized execution of this OCR workflow based on Intel® Extension for PyTorch\\*, Intel® Neural Compressor and the Intel® Distribution of OpenVINO\u003csup\u003eTM\u003c/sup\u003e toolkit. As an illustrative guideline to understand how the Intel® specialized packages are used to optimize the performance of the text recognition CRNN model, please check the [How it Works](#how-it-works) section.\n\n---\n\n#### Optimizations with Intel® Extension for PyTorch\\*\nBy using Intel® Extension for PyTorch\\*, the performance of training, inference and end-to-end execution can be optimized.\n\n#### Regular training\nThe Python\\* script given below needs to be executed to start training the text recognition CRNN model previously downloaded in this [subsection](#download-the-crnn-model). 
About the training data, please check this [subsection](#download-the-datasets).\n\n```bash\nusage: python $WORKSPACE/src/ocr_train.py [-n] [-b] [-m]\n\noptional arguments:\n  --model_name, -n          Define the name for the model generated after training\n  --batch_size, -b          Give the required batch size\n  --model_path, -m          Give the path for the downloaded model \n```\n\nAs a reference to set the batch size, remember that the training data contains 3,256 images (see [here](#dataset)). The training data is located in `$DATA_DIR/dataset/train.txt`.\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\npython $WORKSPACE/src/ocr_train.py -n \"CRNN-1010_Intel_WOHP\" -b 80 -m $OUTPUT_DIR/crnn_models/CRNN-1010.pth\n```\n\nIn this example, Intel® Extension for PyTorch\\* is applied, the batch size is set to 80, and the generated model will be saved in the `$OUTPUT_DIR/crnn_models` folder with the name \"CRNN-1010_Intel_WOHP\".\n\n#### Hyperparameter tuning\nThe Python script given below needs to be executed to start hyperparameter-tuned training. The model generated using the regular training approach will be regarded as the pretrained model to which the fine-tuning process is applied. Hyperparameters considered for tuning are learning rate, epochs and batch size. The generated model is saved to the `$OUTPUT_DIR/crnn_models` folder. To obtain more details about the hyperparameter tuning modality, refer to [this section](#how-it-works). \n\n```bash\nusage: python $WORKSPACE/src/ocr_train_hp.py [-n] [-b] [-m]\n\noptional arguments:\n  --model_name, -n          Define the name for the model generated after hyperparameter tuning\n  --batch_size, -b          Give the required batch size\n  --model_path, -m          Give the path for the model pretrained using regular training\n```\n\nAs a reference to set the batch size, remember that the training data contains 3,256 images (see [here](#dataset)). 
The training data is located in `$DATA_DIR/dataset/train.txt`.\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\npython $WORKSPACE/src/ocr_train_hp.py -n \"CRNN-1010_Intel_WHP\" -b 80 -m $OUTPUT_DIR/crnn_models/CRNN-1010_Intel_WOHP.pth\n```\n\nIn this example, Intel® Extension for PyTorch\\* is applied, the batch size is set to 80, and the generated fine-tuned model will be saved in the `$OUTPUT_DIR/crnn_models` folder with the name \"CRNN-1010_Intel_WHP\". \n\n#### Inference\nThe Python script given below needs to be executed to perform inference based on any of the CRNN models trained with Intel® Extension for PyTorch\\*. \n\n```bash\nusage: python $WORKSPACE/src/inference.py [-m] [-b]\n\noptional arguments:\n  -m, --model_path          Absolute path to any of the CRNN models trained with Intel® Extension for PyTorch* \n  -b, --batchsize           Batchsize of input images\n```\n\nAs a reference to set the batch size, remember that the test data contains 100 images (see [here](#dataset)). The test data is located in `$DATA_DIR/dataset/test.txt`.\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\npython $WORKSPACE/src/inference.py -m $OUTPUT_DIR/crnn_models/CRNN-1010_Intel_WHP.pth -b 100\n```\n\nIn this example, Intel® Extension for PyTorch\\* is applied, the batch size is set to 100, and the trained CRNN model used to perform inference is the one fitted through hyperparameter tuning.\n\n#### End-to-end Inference Pipeline\nThe Python script given below needs to be executed to perform end-to-end inference on an input document image based on any of the CRNN models trained with Intel® Extension for PyTorch\\*. This pipeline will extract all the text from the given input document image. 
\n\n```bash\nusage: python $WORKSPACE/src/ocr_pipeline.py [-p] [-q] [-m] [-n]\n\noptional arguments:\n  -q, --inc                     Give 1 for enabling INC quantized model for inferencing, default is 0\n  -m, --crnn_model_path         Path to any of the CRNN FP32 models trained with Intel® Extension for PyTorch*\n  -n, --quantized_model_path    Path to the CRNN INT8 model, it must be given if -q parameter is set to True, default is None\n  -p, --test_dataset_path       Path to the test image directory \n```\n\nThe script performs inference on the set of test input document images given in `$DATA_DIR/pipeline_test_images`. The extracted text output will be saved in the `$OUTPUT_DIR/test_result` folder. The `--crnn_model_path` argument refers to the path of any of the CRNN models trained with Intel® Extension for PyTorch\\*, which in this use case are stored in `$OUTPUT_DIR/crnn_models`. For now, ignore the `--quantized_model_path` argument, as it will be used in later stages.\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\npython $WORKSPACE/src/ocr_pipeline.py -p $DATA_DIR/pipeline_test_images -m $OUTPUT_DIR/crnn_models/CRNN-1010_Intel_WHP.pth\n```\n\nIn this example, where Intel® Extension for PyTorch\\* is applied, the trained CRNN model used to perform end-to-end inference is the one fitted through hyperparameter tuning.\n\n---\n\n#### Optimizations with Intel® Neural Compressor\nBy using Intel® Neural Compressor, any of the FP32 CRNN models trained with Intel® Extension for PyTorch\\* can be converted into an INT8 model, which optimizes the inference and end-to-end performance.\n\n#### Quantization\nThe script below is used to convert any of the FP32 CRNN models trained with Intel® Extension for PyTorch\\* into an INT8 quantized model.\n\n```bash\nusage: python $WORKSPACE/src/neural_compressor_conversion.py [-m] [-o]\n\noptional arguments:\n  -m, --modelpath                 Path to any of the FP32 CRNN models trained with Intel® Extension for PyTorch*\n  -o, 
--outputpath                Output path where the quantized INT8 model will be stored\n```\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\npython $WORKSPACE/src/neural_compressor_conversion.py -m $OUTPUT_DIR/crnn_models/CRNN-1010_Intel_WHP.pth -o $OUTPUT_DIR/crnn_models\n```\n\nHere, `$OUTPUT_DIR/crnn_models/CRNN-1010_Intel_WHP.pth` is the path of the FP32 CRNN model trained with Intel® Extension for PyTorch\\* following the hyperparameter tuning approach. The converted model, that is, the quantized model, will be stored in the `$OUTPUT_DIR/crnn_models` folder with the name `best_model.pt`.\n\n#### Inference with Quantized Model \nThe Python script given below needs to be executed to perform inference based on the quantized CRNN model. \n\n```bash\nusage: python $WORKSPACE/src/inc_inference.py [-m] [-q] [-b]\n\noptional arguments:\n  -m, --fp32modelpath             Path to any of the FP32 CRNN models trained with Intel® Extension for PyTorch*\n  -q, --int8modelpath             Path where the quantized CRNN model is stored. This model is the quantized INT8 version of the FP32 model stored in -m\n  -b, --batchsize                 Batchsize of input images\n```\n\nAs a reference to set the batch size, remember that the test data contains 100 images (see [here](#dataset)). The test data is located in `$DATA_DIR/dataset/test.txt`. 
Please consider that the INT8 CRNN model stored in `-q` must correspond to the quantized version of the FP32 CRNN model stored in `-m`.\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\npython $WORKSPACE/src/inc_inference.py -m $OUTPUT_DIR/crnn_models/CRNN-1010_Intel_WHP.pth  -q $OUTPUT_DIR/crnn_models/best_model.pt -b 100\n```\n\nIn this example, the FP32 CRNN model corresponds to the one fitted through hyperparameter tuning.\n\n#### Comparison Between FP32 and INT8 Models\n\nExecute the script below to compare the performance of a CRNN FP32 model against its corresponding INT8 quantized model.\n\n```bash\nusage: python $WORKSPACE/src/performance_analysis.py [-m] [-q]\n\noptional arguments:\n  -m, --fp32modelpath             Path to any of the FP32 CRNN models trained with Intel® Extension for PyTorch*\n  -q, --int8modelpath             Path where the quantized CRNN model is stored. This model is the quantized INT8 version of the FP32 model stored in -m\n```\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\npython $WORKSPACE/src/performance_analysis.py -m $OUTPUT_DIR/crnn_models/CRNN-1010_Intel_WHP.pth  -q $OUTPUT_DIR/crnn_models/best_model.pt\n```\n\nIn this example, the FP32 CRNN model is the one fitted through hyperparameter tuning. Please consider that the INT8 CRNN model stored in `-q` must correspond to the quantized version of the FP32 CRNN model stored in `-m`.\n\n#### End-to-end Inference Pipeline\nThe Python script given below needs to be executed to perform end-to-end inference on an input document image based on the quantized INT8 model using Intel® Neural Compressor. This pipeline will extract all the text from the given input document image. 
\n\n```bash\nusage: python $WORKSPACE/src/ocr_pipeline.py [-p] [-q] [-m] [-n]\n\noptional arguments:\n  -q, --inc                     Give 1 for enabling INC quantized model for inferencing, default is 0\n  -m, --crnn_model_path         Path to any of the CRNN FP32 models trained with Intel® Extension for PyTorch*\n  -n, --quantized_model_path    Path to the CRNN INT8 model, it must be given if -q parameter is set to True, default is None. This model is the quantized INT8 version of the FP32 model stored in -m\n  -p, --test_dataset_path       Path to the test image directory \n```\n\nThe script performs inference on the set of test input document images given in `$DATA_DIR/pipeline_test_images`. The extracted text output will be saved in the `$OUTPUT_DIR/test_result` folder. The `--crnn_model_path` argument refers to the path of any of the FP32 CRNN models trained with Intel® Extension for PyTorch\\*, which by default are saved in `$OUTPUT_DIR/crnn_models`. \n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\npython $WORKSPACE/src/ocr_pipeline.py -q 1 -p $DATA_DIR/pipeline_test_images -m $OUTPUT_DIR/crnn_models/CRNN-1010_Intel_WHP.pth -n $OUTPUT_DIR/crnn_models/best_model.pt \n```\n\nIn this example, the FP32 CRNN model is the one fitted through hyperparameter tuning. Please consider that the INT8 CRNN model stored in `-n` must correspond to the quantized version of the FP32 CRNN model stored in `-m`.\n\n---\n\n#### Optimizations with the Intel® Distribution of OpenVINO\u003csup\u003eTM\u003c/sup\u003e Toolkit\nAnother option to quantize any of the FP32 CRNN models trained with Intel® Extension for PyTorch\\* is to use the Intel® Distribution of OpenVINO\u003csup\u003eTM\u003c/sup\u003e toolkit, which is specialized in optimizing the inference performance in constrained environments, like edge devices. 
However, in order to quantize the FP32 CRNN model using the Intel® Distribution of OpenVINO™ toolkit, it is required to first convert the FP32 CRNN model into an ONNX model representation; then, the ONNX model is converted into an Intermediate Representation (IR) format, and finally, the IR model can be quantized. For further details, check this [subsection](#intel®-distribution-of-openvino™-toolkit).\n\n#### Model Conversion to ONNX Format\nThe script below is used to convert the FP32 model to an ONNX model representation. The converted ONNX model file will be saved in `$WORKSPACE/src/openvino`.\n\n```bash\npython $WORKSPACE/src/onnx_convert.py [-m] [-output]\n\noptional arguments:\n  -m, --fp32modelpath             Path to any of the FP32 CRNN models trained with Intel® Extension for PyTorch*\n  -output, --onnxmodelpath        Give path in which you want the ONNX model to be stored\n```\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\npython $WORKSPACE/src/onnx_convert.py -m $OUTPUT_DIR/crnn_models/CRNN-1010_Intel_WHP.pth -output $WORKSPACE/src/openvino\n```\n\nIn this example, the FP32 CRNN model converted to ONNX format is the one fitted through hyperparameter tuning. \n\n#### Model Conversion to OpenVINO\u003csup\u003eTM\u003c/sup\u003e IR Format\nThe script below is used to convert the ONNX model into an IR model representation, which is an internal Intel® Distribution of OpenVINO™ Toolkit model representation. \n\n```bash\nmo --input_model \u003cONNX model\u003e --output_dir \u003cOutput directory path to save the IR model\u003e\n\noptional arguments:\n  --input_model             Path where the ONNX model is stored.\n  --output_dir              Path of the folder to save the OpenVINO IR model format\n```\n\nThe above command will generate `\u003cmodel-name\u003e.bin` and `\u003cmodel-name\u003e.xml`. These files will be used in later steps to finally quantize the model and perform predictions. 
Default precision is FP32.\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\nmo --input_model $WORKSPACE/src/openvino/test_model.onnx --output_dir $WORKSPACE/src/openvino\n```\n\n#### Model Inference Performance with the OpenVINO\u003csup\u003eTM\u003c/sup\u003e Benchmark Python\\* Tool\nBy using the benchmark Python\\* tool from the Intel® Distribution of OpenVINO™ toolkit, it is possible to estimate the inference performance of the ONNX, IR and quantized INT8 models.\n\n#### Inference Performance of ONNX Model\nThe command below runs the benchmark tool on the ONNX model.\n\n```bash\nbenchmark_app -m \u003cPath of ONNX model\u003e\n\noptional arguments: \n  -m,--modelpath   Path of the model in ONNX format\n```\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\nbenchmark_app -m $WORKSPACE/src/openvino/test_model.onnx\n```\n\n#### Inference Performance of OpenVINO\u003csup\u003eTM\u003c/sup\u003e IR Model\nThe command below runs the benchmark tool on the IR model.\n\n```bash\nbenchmark_app -m \u003cPath of the IR model in xml format\u003e\n\noptional arguments: \n  -m,--modelpath   Path of the IR model in xml format\n```\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\nbenchmark_app -m $WORKSPACE/src/openvino/test_model.xml\n```\n\n#### Model Conversion Using the OpenVINO\u003csup\u003eTM\u003c/sup\u003e Post-training Optimization Tool (POT)\nA configuration file is needed to set up the various parameters and apply quantization via the Post-training Optimization Tool (POT), which converts the IR FP32 model into an INT8 model. 
The same configuration file has already been provided in the repo at the following path:\n\n```\n$WORKSPACE/src/openvino/OCR_model_int8.json\n```\n\nUsers can update the following parameters in the JSON file using any text editor available on the computer.\n\n```\n\"model_name\"    : Name of the output model\n\"model\"         : Path to IR FP32 model (.xml) file\n\"weights\"       : Path to IR FP32 model weights (.bin) file\n```\n\nUse the command below to quantize the model and convert it into an INT8 model. When this command completes successfully, it generates a folder with the name `results` where the quantized model files will be placed.\n\n```bash\ncd $WORKSPACE/src/openvino\npot -c \u003cPath of the configuration file\u003e -d \n\noptional arguments: \n  -c,--configfile  Path of the configuration file\n```\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\ncd $WORKSPACE/src/openvino\npot -c OCR_model_int8.json -d\n```\n\nAfter running the above command, we can verify that the files \"OCR_model.bin\", \"OCR_model.xml\" and \"OCR_model.mapping\" (the quantized model) were generated in the `$WORKSPACE/src/openvino/results/optimized` path.\n\n#### Inference Performance of Quantized INT8 Model\nThe command below runs the benchmark tool on the quantized INT8 model.\n\n```bash\nbenchmark_app -m \u003cquantized POT model in xml format\u003e\n\noptional arguments: \n  -m,--quantizedmodelpath   Quantized POT model in xml format\n```\n\nExample:\n\n[//]: # (capture: baremetal)\n```bash\nbenchmark_app -m ./results/optimized/OCR_model.xml\n```\n\n#### Clean Up Bare Metal\nThe next commands are useful to remove the previously generated conda environment, as well as the dataset and the multiple models and files created during the workflow execution. 
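Since the cleanup commands below delete `$OUTPUT_DIR` entirely, anything worth keeping can be copied out first. A minimal sketch (the `BACKUP_DIR` location is hypothetical; adjust paths as needed):

```bash
# Hypothetical backup location; adjust to your environment
BACKUP_DIR=${BACKUP_DIR:-$HOME/ocr_backup}
mkdir -p "$BACKUP_DIR"

# Preserve the trained and quantized CRNN models before removing $OUTPUT_DIR
if [ -d "$OUTPUT_DIR/crnn_models" ]; then
  cp -r "$OUTPUT_DIR/crnn_models" "$BACKUP_DIR/"
fi
```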
Before proceeding with the cleanup process, it is recommended to back up the data you want to preserve.\n\n```bash\nconda deactivate #Run this line if the historical_assets_intel environment is still active\nconda env remove -n historical_assets_intel\nrm $OUTPUT_DIR $DATA_DIR $WORKSPACE -rf\n```\n\n---\n\n### Expected Output\nA successful execution of the different stages of this workflow should produce outputs similar to the following:\n\n#### Regular Training Output with Intel® Extension for PyTorch\\*\n\n```\nloading pretrained model from output/crnn_models/CRNN-1010.pth\nIntel Pytorch Optimizations has been Enabled!\nStart of Training\nepoch 0....\nbatch size: 80\nepoch: 0 iter: 0/41 Train loss: 39.135\nbatch size: 80\nepoch: 0 iter: 1/41 Train loss: 13.852\nbatch size: 80\nepoch: 0 iter: 2/41 Train loss: 8.824\nbatch size: 80\nepoch: 0 iter: 3/41 Train loss: 9.496\nbatch size: 80\nepoch: 0 iter: 4/41 Train loss: 8.571\nbatch size: 80\nepoch: 0 iter: 5/41 Train loss: 5.992\n```\n...\n```\nepoch: 24 iter: 35/41 Train loss: 0.053\nbatch size: 80\nepoch: 24 iter: 36/41 Train loss: 0.060\nbatch size: 80\nepoch: 24 iter: 37/41 Train loss: 0.127\nbatch size: 80\nepoch: 24 iter: 38/41 Train loss: 0.304\nbatch size: 80\nepoch: 24 iter: 39/41 Train loss: 0.039\nbatch size: 56\nepoch: 24 iter: 40/41 Train loss: 0.278\nTrain loss: 0.130145\nTrain time is 1022.9462730884552\nModel saving....\nStart val\n```\n\n#### Hyperparameter Tuning Output with Intel® Extension for PyTorch\\*\n\n```\nloading pretrained model from output/crnn_models/CRNN-1010_Intel_WOHP_best_840.pth\nCRNN(\n  (conv1): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n  (relu1): ReLU(inplace=True)\n  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n  (conv2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n  (relu2): ReLU(inplace=True)\n  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n  
(conv3_1): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n  (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n  (relu3_1): ReLU(inplace=True)\n  (conv3_2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n  (relu3_2): ReLU(inplace=True)\n  (pool3): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)\n  (conv4_1): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n  (bn4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n  (relu4_1): ReLU(inplace=True)\n  (conv4_2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n  (relu4_2): ReLU(inplace=True)\n  (pool4): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)\n  (conv5): Conv2d(512, 512, kernel_size=(2, 2), stride=(1, 1))\n  (bn5): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n  (relu5): ReLU(inplace=True)\n  (rnn): Sequential(\n    (0): BidirectionalLSTM(\n      (rnn): LSTM(512, 256, bidirectional=True)\n      (embedding): Linear(in_features=512, out_features=256, bias=True)\n    )\n    (1): BidirectionalLSTM(\n      (rnn): LSTM(256, 256, bidirectional=True)\n      (embedding): Linear(in_features=512, out_features=5835, bias=True)\n    )\n  )\n)\nIntel Pytorch Optimizations has been Enabled!\nStart of hp tuning\nepoch 0....\nbatch size: 80\nepoch: 0 iter: 0/41 Train loss: 0.016\nbatch size: 80\nepoch: 0 iter: 1/41 Train loss: 0.055\n```\n...\n```\nepoch: 9 iter: 35/41 Train loss: 0.070\nbatch size: 80\nepoch: 9 iter: 36/41 Train loss: 0.049\nbatch size: 80\nepoch: 9 iter: 37/41 Train loss: 0.006\nbatch size: 80\nepoch: 9 iter: 38/41 Train loss: 0.050\nbatch size: 80\nepoch: 9 iter: 39/41 Train loss: 0.047\nbatch size: 56\nepoch: 9 iter: 40/41 Train loss: 0.133\nTrain loss: 0.197468\nInferencing.................\nStart val\nocr_acc: 0.880000\nHyperparameter 
tuning time is 699.8766577243805\naccuracy list\n[0.87, 0.88]\nparameters list\n[(80, 0.001, 5), (80, 0.001, 10)]\nthe best parameters are\n(80, 0.001, 10)\n```\n\n#### Inference Output with Intel® Extension for PyTorch\\*\n\n```\nIntel Pytorch Optimizations has been Enabled!\nAverage Batch Inference time taken for in seconds ---\u003e  0.3702504992485046\nAccuracy is ---\u003e 0.88\n```\n\n#### End-to-end Inference Pipeline with Intel® Extension for PyTorch\\*\n\n```\nUsing CPU. Note: This module is much faster with a GPU.\n['data/pipeline_test_images/0_avoidable self-prescribed old-age Carlylese prioress.jpg', 'data/pipeline_test_images/0_mapau delitescency WW2 thunderer flukes.jpg']\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/intel_extension_for_pytorch/frontend.py:474: UserWarning: Conv BatchNorm folding failed during the optimize process.\n  warnings.warn(\"Conv BatchNorm folding failed during the optimize process.\")\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/intel_extension_for_pytorch/frontend.py:479: UserWarning: Linear BatchNorm folding failed during the optimize process.\n  warnings.warn(\"Linear BatchNorm folding failed during the optimize process.\")\nIntel Pytorch Optimizations has been Enabled!\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.005330908298492432\noutput from current roi: avoidable\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.006089353561401367\noutput from current roi: self-prescribed\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.004653286933898926\noutput from current roi: old-age\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.0050509095191955565\noutput from current roi: Carlylese\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.004717493057250976\noutput from current roi: prioress\nExtracetd text:\n avoidable 
self-prescribed old-age Carlylese prioress\nPrediction time for image:  0.025841951370239258\nIntel Pytorch Optimizations has been Enabled!\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.0045200705528259276\noutput from current roi: mapall\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.005540859699249267\noutput from current roi: delitescency\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.004426908493041992\noutput from current roi: WWz\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.005026328563690186\noutput from current roi: thunderer\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.004381299018859863\noutput from current roi: flukes\nExtracetd text:\n mapall delitescency WWz thunderer flukes\nPrediction time for image:  0.023895466327667234\nTotal pipeline prediction time for all the images:  0.04973741769790649\n```\n\n#### Quantization Output with Intel® Neural Compressor\n\n```\n2023-09-28 22:27:51 [INFO] Because both eval_dataloader_cfg and user-defined eval_func are None, automatically setting 'tuning.exit_policy.performance_only = True'.\n2023-09-28 22:27:51 [INFO] The cfg.tuning.exit_policy.performance_only is: True\n2023-09-28 22:27:51 [INFO] Attention Blocks: 0\n2023-09-28 22:27:51 [INFO] FFN Blocks: 0\n2023-09-28 22:27:51 [INFO] Pass query framework capability elapsed time: 159.7 ms\n2023-09-28 22:27:51 [INFO] Adaptor has 2 recipes.\n2023-09-28 22:27:51 [INFO] 0 recipes specified by user.\n2023-09-28 22:27:51 [INFO] 0 recipes require future tuning.\n2023-09-28 22:27:51 [INFO] Neither evaluation function nor metric is defined. 
Generate a quantized model with default quantization configuration.\n2023-09-28 22:27:51 [INFO] Force setting 'tuning.exit_policy.performance_only = True'.\n2023-09-28 22:27:51 [INFO] Fx trace of the entire model failed, We will conduct auto quantization\n2023-09-28 22:27:54 [INFO] |******Mixed Precision Statistics******|\n2023-09-28 22:27:54 [INFO] +----------------------+-------+-------+\n2023-09-28 22:27:54 [INFO] |       Op Type        | Total |  INT8 |\n2023-09-28 22:27:54 [INFO] +----------------------+-------+-------+\n2023-09-28 22:27:54 [INFO] | quantize_per_tensor  |   12  |   12  |\n2023-09-28 22:27:54 [INFO] |        Conv2d        |   7   |   7   |\n2023-09-28 22:27:54 [INFO] |      dequantize      |   12  |   12  |\n2023-09-28 22:27:54 [INFO] |     BatchNorm2d      |   3   |   3   |\n2023-09-28 22:27:54 [INFO] |        Linear        |   2   |   2   |\n2023-09-28 22:27:54 [INFO] +----------------------+-------+-------+\n2023-09-28 22:27:54 [INFO] Pass quantize model elapsed time: 3654.04 ms\n2023-09-28 22:27:54 [INFO] Save tuning history to /historical-assets-main-test/nc_workspace/2023-09-28_22-27-50/./history.snapshot.\n2023-09-28 22:27:54 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.\n2023-09-28 22:27:54 [INFO] Save deploy yaml to /historical-assets-main-test/nc_workspace/2023-09-28_22-27-50/deploy.yaml\n2023-09-28 22:27:55 [INFO] Save config file and weights of quantized model to /historical-assets-main-test/output/crnn_models.\n```\n\n#### Inference Output with Intel® Neural Compressor\n\n```\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/torch/_utils.py:335: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()\n  device=storage.device,\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/torch/ao/quantization/observer.py:1209: UserWarning: must run observer before calling calculate_qparams.\n     Returning default scale and zero point\n  warnings.warn(\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/intel_extension_for_pytorch/frontend.py:474: UserWarning: Conv BatchNorm folding failed during the optimize process.\n  warnings.warn(\"Conv BatchNorm folding failed during the optimize process.\")\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/intel_extension_for_pytorch/frontend.py:479: UserWarning: Linear BatchNorm folding failed during the optimize process.\n  warnings.warn(\"Linear BatchNorm folding failed during the optimize process.\")\nIntel Pytorch Optimizations has been Enabled!\nRunning Inference with INC Quantized Int8 Model\nAverage Batch Inference time taken in seconds ---\u003e  0.31022746562957765\nRunning Inference with FP32 Model\nAverage Batch Inference time taken in seconds ---\u003e  0.2863597750663757\n```\n\n#### Output of the Comparison Between FP32 and INT8 Models Using Intel® Neural Compressor\n\n```\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/torch/_utils.py:335: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()\n  device=storage.device,\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/torch/ao/quantization/observer.py:1209: UserWarning: must run observer before calling calculate_qparams.\n     Returning default scale and zero point\n  warnings.warn(\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/intel_extension_for_pytorch/frontend.py:474: UserWarning: Conv BatchNorm folding failed during the optimize process.\n  warnings.warn(\"Conv BatchNorm folding failed during the optimize process.\")\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/intel_extension_for_pytorch/frontend.py:479: UserWarning: Linear BatchNorm folding failed during the optimize process.\n  warnings.warn(\"Linear BatchNorm folding failed during the optimize process.\")\n**************************************************\nEvaluating the FP32 Model\n**************************************************\npredicted:  cimbalom\ntargeted: cimbalom\npredicted:  IKT\ntargeted: ITT\npredicted:  roughrider\ntargeted: roughrider\npredicted:  gramaphone\ntargeted: gramaphone\npredicted:  nonsolicitously\ntargeted: nonsolicitously\npredicted:  in-goal\ntargeted: in-goal\npredicted:  slaking\ntargeted: slaking\npredicted:  spani\ntargeted: spaid\npredicted:  cardioplegia\ntargeted: cardioplegia\npredicted:  rejuvenating\n```\n...\n```\ntargeted: H-steel\npredicted:  postexilian\ntargeted: postexilian\npredicted:  entasia\ntargeted: entasia\npredicted:  all-understanding\ntargeted: all-understanding\npredicted:  trademarks\ntargeted: trademarks\npredicted:  Darbyism\ntargeted: Darbyism\npredicted:  gluing\ntargeted: gluing\npredicted:  Scincus\ntargeted: Scincus\npredicted:  haeremai\ntargeted: haeremai\npredicted:  sassywood\ntargeted: sassywood\npredicted:  sabtaiaagyalanr\ntargeted: subtriangular\npredicted:  uncemented\ntargeted: 
uncemented\npredicted:  Thbuan\ntargeted: thuan\npredicted:  ideals\ntargeted: ideals\npredicted:  celadons\ntargeted: celadons\npredicted:  anosphresia\ntargeted: anosphresia\npredicted:  ralliance\ntargeted: ralliance\nAccuracy of INT8 model : 0.88\n```\n\n#### Output of End-to-end Inference Pipeline with Intel® Neural Compressor\n\n```\nUsing CPU. Note: This module is much faster with a GPU.\n['data/pipeline_test_images/0_avoidable self-prescribed old-age Carlylese prioress.jpg', 'data/pipeline_test_images/0_mapau delitescency WW2 thunderer flukes.jpg']\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/intel_extension_for_pytorch/frontend.py:474: UserWarning: Conv BatchNorm folding failed during the optimize process.\n  warnings.warn(\"Conv BatchNorm folding failed during the optimize process.\")\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/intel_extension_for_pytorch/frontend.py:479: UserWarning: Linear BatchNorm folding failed during the optimize process.\n  warnings.warn(\"Linear BatchNorm folding failed during the optimize process.\")\nIntel Pytorch Optimizations has been Enabled!\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.006333315372467041\noutput from current roi: avoidable\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.007137012481689453\noutput from current roi: self-prescribed\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.005516910552978515\noutput from current roi: old-age\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.005982935428619385\noutput from current roi: Carlylese\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.005561506748199463\noutput from current roi: prioress\nExtracetd text:\n avoidable self-prescribed old-age Carlylese prioress\nPrediction time for image:  0.030531680583953856\nIntel Pytorch Optimizations has been 
Enabled!\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.005342221260070801\noutput from current roi: mapall\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.006565296649932861\noutput from current roi: delitescency\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.005243873596191407\noutput from current roi: WWz\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.005867362022399902\noutput from current roi: thunderer\nProcessing roi\nAverage Batch Inference time taken for in seconds ---\u003e  0.0051540017127990724\noutput from current roi: flukes\nExtracetd text:\n mapall delitescency WWz thunderer flukes\nPrediction time for image:  0.02817275524139404\nTotal pipeline prediction time for all the images:  0.0587044358253479\n```\n\n#### Output From Model Conversion to ONNX Format Using the Intel® Distribution of OpenVINO\u003csup\u003eTM\u003c/sup\u003e Toolkit\n\n```\n/historical-assets-main-test/src/crnn.py:87: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!\n  assert h == 1, \"the height of conv must be 1\"  # nosec\n/root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py:4476: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. 
Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.\n  warnings.warn(\n============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============\nverbose: False, log level: Level.ERROR\n======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================\n```\n\n#### Output From Model Conversion to OpenVINO\u003csup\u003eTM\u003c/sup\u003e IR Format\n\n```\n[ INFO ] Generated IR will be compressed to FP16. If you get lower accuracy, please consider disabling compression explicitly by adding argument --compress_to_fp16=False.\nFind more information about compression to FP16 at https://docs.openvino.ai/2023.0/openvino_docs_MO_DG_FP16_Compression.html\n[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.\nFind more information about API v2.0 and IR v11 at https://docs.openvino.ai/2023.0/openvino_2_0_transition_guide.html\n[ SUCCESS ] Generated IR version 11 model.\n[ SUCCESS ] XML file: /historical-assets-main-test/src/openvino/test_model.xml\n[ SUCCESS ] BIN file: /historical-assets-main-test/src/openvino/test_model.bin\n```\n\n#### Output From Inference Performance of ONNX Model\n\n```\n[Step 1/11] Parsing and validating input arguments\n[ INFO ] Parsing input parameters\n[Step 2/11] Loading OpenVINO Runtime\n[ INFO ] OpenVINO:\n[ INFO ] Build ................................. 2023.1.0-12185-9e6b00e51cd-releases/2023/1\n[ INFO ]\n[ INFO ] Device info:\n[ INFO ] CPU\n[ INFO ] Build ................................. 2023.1.0-12185-9e6b00e51cd-releases/2023/1\n[ INFO ]\n[ INFO ]\n[Step 3/11] Setting device configuration\n[ WARNING ] Performance hint was not explicitly specified in command line. 
Device(CPU) performance hint will be set to PerformanceMode.THROUGHPUT.\n```\n...\n```\n[Step 9/11] Creating infer requests and preparing input tensors\n[ WARNING ] No input files were given for input 'input.1'!. This input will be filled with random values!\n[ INFO ] Fill input 'input.1' with random values\n[Step 10/11] Measuring performance (Start inference asynchronously, 56 inference requests, limits: 60000 ms duration)\n[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).\n[ INFO ] First inference took 48.80 ms\n[Step 11/11] Dumping statistics report\n[ INFO ] Execution Devices:['CPU']\n[ INFO ] Count:            154168 iterations\n[ INFO ] Duration:         60024.32 ms\n[ INFO ] Latency:\n[ INFO ]    Median:        21.10 ms\n[ INFO ]    Average:       21.20 ms\n[ INFO ]    Min:           15.20 ms\n[ INFO ]    Max:           62.28 ms\n[ INFO ] Throughput:   2568.43 FPS\n```\n\n#### Output From Inference Performance of OpenVINO\u003csup\u003eTM\u003c/sup\u003e IR Model\n\n```\n-[Step 1/11] Parsing and validating input arguments\n[ INFO ] Parsing input parameters\n[Step 2/11] Loading OpenVINO Runtime\n[ INFO ] OpenVINO:\n[ INFO ] Build ................................. 2023.1.0-12185-9e6b00e51cd-releases/2023/1\n[ INFO ]\n[ INFO ] Device info:\n[ INFO ] CPU\n[ INFO ] Build ................................. 2023.1.0-12185-9e6b00e51cd-releases/2023/1\n[ INFO ]\n[ INFO ]\n[Step 3/11] Setting device configuration\n[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to PerformanceMode.THROUGHPUT.\n```\n...\n```\n[Step 9/11] Creating infer requests and preparing input tensors\n[ WARNING ] No input files were given for input 'input.1'!. 
This input will be filled with random values!\n[ INFO ] Fill input 'input.1' with random values\n[Step 10/11] Measuring performance (Start inference asynchronously, 56 inference requests, limits: 60000 ms duration)\n[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).\n[ INFO ] First inference took 47.88 ms\n[Step 11/11] Dumping statistics report\n[ INFO ] Execution Devices:['CPU']\n[ INFO ] Count:            157080 iterations\n[ INFO ] Duration:         60023.72 ms\n[ INFO ] Latency:\n[ INFO ]    Median:        20.77 ms\n[ INFO ]    Average:       20.80 ms\n[ INFO ]    Min:           17.80 ms\n[ INFO ]    Max:           65.43 ms\n[ INFO ] Throughput:   2616.97 FPS\n```\n\n#### Output From Model Conversion Using OpenVINO\u003csup\u003eTM\u003c/sup\u003e Post-training Optimization Tool (POT)\n\n```\n[ WARNING ] /root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/openvino/tools/accuracy_checker/preprocessor/launcher_preprocessing/ie_preprocessor.py:21: FutureWarning: OpenVINO Inference Engine Python API is deprecated and will be removed in 2024.0 release.For instructions on transitioning to the new API, please refer to https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html\n  from openvino.inference_engine import ResizeAlgorithm, PreProcessInfo, ColorFormat, MeanVariant  # pylint: disable=import-outside-toplevel,package-absolute-imports\n\n[ WARNING ] /root/miniconda3/envs/historical_assets_intel/lib/python3.9/site-packages/openvino/tools/accuracy_checker/launcher/dlsdk_launcher.py:60: FutureWarning: OpenVINO nGraph Python API is deprecated and will be removed in 2024.0 release.For instructions on transitioning to the new API, please refer to https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html\n  import ngraph as ng\n\nPost-training Optimization Tool is deprecated and will be removed in the future. 
Please use Neural Network Compression Framework instead: https://github.com/openvinotoolkit/nncf\nNevergrad package could not be imported. If you are planning to use any hyperparameter optimization algo, consider installing it using pip. This implies advanced usage of the tool. Note that nevergrad is compatible only with Python 3.7+\nPost-training Optimization Tool is deprecated and will be removed in the future. Please use Neural Network Compression Framework instead: https://github.com/openvinotoolkit/nncf\nINFO:openvino.tools.pot.app.run:Output log dir: ./results\nINFO:openvino.tools.pot.app.run:Creating pipeline:\n Algorithm: DefaultQuantization\n Parameters:\n        preset                     : mixed\n        stat_subset_size           : 300\n        target_device              : ANY\n        model_type                 : None\n        dump_intermediate_model    : False\n        inplace_statistics         : True\n        exec_log_dir               : ./results\n ===========================================================================\nINFO:openvino.tools.pot.data_loaders.image_loader:Layout value is set [N,C,H,W]\nINFO:openvino.tools.pot.pipeline.pipeline:Inference Engine version:                2023.1.0-12185-9e6b00e51cd-releases/2023/1\nINFO:openvino.tools.pot.pipeline.pipeline:Model Optimizer version:                 2023.1.0-12185-9e6b00e51cd-releases/2023/1\nINFO:openvino.tools.pot.pipeline.pipeline:Post-Training Optimization Tool version: 2023.1.0-12185-9e6b00e51cd-releases/2023/1\nINFO:openvino.tools.pot.statistics.collector:Start computing statistics for algorithms : DefaultQuantization\nINFO:openvino.tools.pot.statistics.collector:Computing statistics finished\nINFO:openvino.tools.pot.pipeline.pipeline:Start algorithm: DefaultQuantization\nINFO:openvino.tools.pot.algorithms.quantization.default.algorithm:Start computing statistics for algorithm : ActivationChannelAlignment\nINFO:openvino.tools.pot.algorithms.quantization.default.algorithm:Computing 
statistics finished\nINFO:openvino.tools.pot.algorithms.quantization.default.algorithm:Start computing statistics for algorithms : MinMaxQuantization,FastBiasCorrection\nINFO:openvino.tools.pot.algorithms.quantization.default.algorithm:Computing statistics finished\nINFO:openvino.tools.pot.pipeline.pipeline:Finished: DefaultQuantization\n ===========================================================================\n```\n\n#### Output From Inference Performance of Quantized INT8 Model\n\n```\n[Step 1/11] Parsing and validating input arguments\n[ INFO ] Parsing input parameters\n[Step 2/11] Loading OpenVINO Runtime\n[ INFO ] OpenVINO:\n[ INFO ] Build ................................. 2023.1.0-12185-9e6b00e51cd-releases/2023/1\n[ INFO ]\n[ INFO ] Device info:\n[ INFO ] CPU\n[ INFO ] Build ................................. 2023.1.0-12185-9e6b00e51cd-releases/2023/1\n[ INFO ]\n[ INFO ]\n[Step 3/11] Setting device configuration\n[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to PerformanceMode.THROUGHPUT.\n```\n...\n```\n[Step 9/11] Creating infer requests and preparing input tensors\n[ WARNING ] No input files were given for input 'input.1'!. 
This input will be filled with random values!\n[ INFO ] Fill input 'input.1' with random values\n[Step 10/11] Measuring performance (Start inference asynchronously, 56 inference requests, limits: 60000 ms duration)\n[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).\n[ INFO ] First inference took 5.68 ms\n[Step 11/11] Dumping statistics report\n[ INFO ] Execution Devices:['CPU']\n[ INFO ] Count:            653688 iterations\n[ INFO ] Duration:         60008.15 ms\n[ INFO ] Latency:\n[ INFO ]    Median:        5.03 ms\n[ INFO ]    Average:       5.05 ms\n[ INFO ]    Min:           4.56 ms\n[ INFO ]    Max:           15.89 ms\n[ INFO ] Throughput:   10893.32 FPS\n```\n\n## Summary and Next Steps\n\nThis reference kit presents an OCR solution specialized in the text recognition task, implemented with a deep learning CRNN model. The CRNN text recognition model leverages the optimizations provided by Intel® Extension for PyTorch\*, Intel® Neural Compressor, and the Intel® Distribution of OpenVINO\u003csup\u003eTM\u003c/sup\u003e toolkit to accelerate its training, inference, and end-to-end processing capabilities while maintaining accuracy. 
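As a quick sanity check on the benchmark logs above, the throughput figures printed by benchmark_app can be turned into a relative speedup estimate. The snippet below is only an illustrative calculation using the numbers reported in this document; actual figures will vary with hardware and configuration:

```python
# Throughput values reported by benchmark_app in the logs above (CPU, THROUGHPUT hint).
fp32_onnx_fps = 2568.43   # FP32 ONNX model
fp32_ir_fps = 2616.97     # FP16-compressed OpenVINO IR model
int8_fps = 10893.32       # POT-quantized INT8 model

# Relative speedup of the quantized model over each FP32 baseline.
speedup_vs_onnx = int8_fps / fp32_onnx_fps
speedup_vs_ir = int8_fps / fp32_ir_fps

print(f"INT8 speedup vs. ONNX FP32: {speedup_vs_onnx:.2f}x")
print(f"INT8 speedup vs. IR FP32:   {speedup_vs_ir:.2f}x")
```

On these logged runs, INT8 quantization yields roughly a 4x throughput gain over both FP32 baselines at a comparable median latency, which is what makes the quantized model attractive for resource-constrained deployments.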
Based on this setup, this reference kit emerges as an efficient tool for building and deploying an OCR system able to meet the resource demands of different production environments, including edge devices.\n\nAs next steps, machine learning practitioners can adapt this OCR solution to train a different CRNN model on a custom dataset using Intel® Extension for PyTorch\*, quantize the trained model with either Intel® Neural Compressor or the Intel® Distribution of OpenVINO\u003csup\u003eTM\u003c/sup\u003e toolkit to assess its inference gains, and finally incorporate the trained or quantized model into an end-to-end pipeline to extract text from complex input document images.\n\n## Learn More\nFor more information about historical assets document processing or to read about other relevant workflow examples, see these guides and software resources:\n\n- [Intel® AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html)\n- [Intel® Extension for PyTorch\*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/optimization-for-pytorch.html#gs.5vjhbw)\n- [Intel® Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html#gs.5vjr1p)\n- [Intel® Distribution of OpenVINO\u003csup\u003eTM\u003c/sup\u003e Toolkit](https://www.intel.com/content/www/us/en/download/753640/intel-distribution-of-openvino-toolkit.html)\n\n## Troubleshooting\n1. Could not build wheels for pycocotools\n\n    **Issue:**\n      ```\n      ERROR: Could not build wheels for pycocotools, which is required to install pyproject.toml-based projects\n      ```\n\n    **Solution:**\n\n    Install gcc. For Ubuntu, this will be:\n\n      ```bash\n      apt install gcc\n      ```\n\n2. 
libGL.so.1/libgthread-2.0.so.0: cannot open shared object file: No such file or directory\n\n    **Issue:**\n      ```\n      ImportError: libGL.so.1: cannot open shared object file: No such file or directory\n      or\n      libgthread-2.0.so.0: cannot open shared object file: No such file or directory\n      ```\n\n    **Solution:**\n\n      Install the libgl1-mesa-glx and libglib2.0-0 libraries. For Ubuntu, this will be:\n\n      ```bash\n      apt install libgl1-mesa-glx\n      apt install libglib2.0-0\n      ```\n\n## Support\nIf you have questions or issues about this workflow, need help with troubleshooting, or want to report a bug or submit an enhancement request, please submit a GitHub issue.\n\n## Appendix\n\*Names and brands that may be claimed as the property of others. [Trademarks](https://www.intel.com/content/www/us/en/legal/trademarks.html).\n\n### Disclaimer\n\nTo the extent that any public or non-Intel datasets or models are referenced by or accessed using tools or code on this site, those datasets or models are provided by the third party indicated as the content source. Intel does not create the content and does not warrant its accuracy or quality. By accessing the public content, or using materials trained on or with such content, you agree to the terms associated with that content and that your use complies with the applicable license.\nIntel expressly disclaims the accuracy, adequacy, or completeness of any such public content, and is not liable for any errors, omissions, or defects in the content, or for any reliance on the content. Intel is not liable for any liability or damages relating to your use of public content.\n\n### References\n\n\u003ca id=\"hegghammer_2021\"\u003e[1]\u003c/a\u003e Hegghammer, T. (2021). OCR with Tesseract, Amazon Textract, and Google Document AI: a benchmarking experiment. Journal of Computational Social Science. 
https://doi.org/10.1007/s42001-021-00149-1\n\n\u003ca id=\"li_2022\"\u003e[2]\u003c/a\u003e Li, M., Lv, T., Cui, L., Lu, Y., Florencio, D., Zhang, C., Li, Z., \u0026 Wei, F. (n.d.). TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models. https://arxiv.org/pdf/2109.10282.pdf\n\n\u003ca id=\"memon_2020\"\u003e[3]\u003c/a\u003e Memon, J., Sami, M., Khan, R. A., \u0026 Uddin, M. (2020). Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR). IEEE Access, 8, 142642–142668. https://doi.org/10.1109/access.2020.3012542\n\n\u003ca id=\"faustomorales_2019\"\u003e[4]\u003c/a\u003e faustomorales. (2019, December 8). faustomorales/keras-ocr. GitHub. https://github.com/faustomorales/keras-ocr\n\n\u003ca id=\"thompson_2016\"\u003e[5]\u003c/a\u003e Thompson, P., Batista-Navarro, R. T., Kontonatsios, G., Carter, J., Toon, E., McNaught, J., Timmermann, C., Worboys, M., \u0026 Ananiadou, S. (2016). Text Mining the History of Medicine. PLOS ONE, 11(1), e0144717. https://doi.org/10.1371/journal.pone.0144717\n\n\u003ca id=\"shatri_2020\"\u003e[6]\u003c/a\u003e Shatri, E., \u0026 Fazekas, G. (n.d.). Optical Music Recognition: State of the Art and Major Challenges. https://arxiv.org/pdf/2006.07885.pdf\n\n\u003ca id=\"oucheikh_2022\"\u003e[7]\u003c/a\u003e Oucheikh, R., Pettersson, T., \u0026 Löfström, T. (2022). Product verification using OCR classification and Mondrian conformal prediction. Expert Systems with Applications, 188, 115942. https://doi.org/10.1016/j.eswa.2021.115942\n\n\u003ca id=\"arvindrao_2023\"\u003e[8]\u003c/a\u003e IJRASET. (n.d.). Automated Financial Documents Processing System Using Machine Learning. www.ijraset.com. Retrieved September 19, 2023, from https://www.ijraset.com/research-paper/automated-financial-documents-processing-system\n\n\u003ca id=\"chollet_2017\"\u003e[9]\u003c/a\u003e Chollet, F. (2017). Deep Learning with Python. 
Manning Publications.\n\n\u003ca id=\"choi_2016\"\u003e[10]\u003c/a\u003e Choi, K., Fazekas, G., Sandler, M., \u0026 Cho, K. (n.d.). Convolutional Recurrent Neural Networks for Music Classification. https://arxiv.org/pdf/1609.04243.pdf\n\n\u003ca id=\"shi_2017\"\u003e[11]\u003c/a\u003e Shi, B., Bai, X., \u0026 Yao, C. (2017). An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11), 2298–2304. https://doi.org/10.1109/tpami.2016.2646371\n\n\u003ca id=\"openvino\"\u003e[12]\u003c/a\u003e Benchmark Python Tool — OpenVINO\u003csup\u003eTM\u003c/sup\u003e documentation. (n.d.). Docs.openvino.ai. Retrieved September 28, 2023, from https://docs.openvino.ai/2023.1/openvino_inference_engine_tools_benchmark_tool_README.html