https://github.com/pcisha/ocr-opencv-tesseract

OCR Image to Text Converter using OpenCV and Tesseract

# 🖼️ OCR Image Processing Web Application

A Java Spring Boot web application that allows users to upload images (screenshots, scanned documents, etc.) and extract text using Tesseract OCR with advanced image preprocessing powered by OpenCV.

### 🚀 Features

- 📤 Upload Images: Supports multiple image formats (JPG, PNG, BMP).

- 🔍 Accurate OCR: Preprocesses images with OpenCV before text extraction to improve accuracy.

- 📏 File Size Validation: Accepts uploads of up to 5 MB.

- 🧪 Error Handling: Handles oversized files, invalid images, and processing failures gracefully.

- ⚙️ Dynamic Configurations: Library paths, the file size limit, and other properties are configurable.
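
The README does not show the preprocessing pipeline itself. As a rough illustration of the kind of step involved, the sketch below converts an image to grayscale and applies a fixed binary threshold. It is a plain-Java stand-in built on `java.awt.image.BufferedImage`, not the project's OpenCV code; the class name `PreprocessSketch`, the method `binarize`, and the choice of a fixed threshold are all assumptions made for illustration.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the project: a plain-Java stand-in for an
// OpenCV grayscale + binary-threshold step.
final class PreprocessSketch {

    /** Returns a black-and-white copy of src; pixels with luma >= threshold become white. */
    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the Rec. 601 luma weights (0.299, 0.587, 0.114).
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luma >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }
}
```

A real pipeline would likely use an adaptive threshold and noise removal rather than a single fixed cutoff, since scanned documents rarely have uniform lighting.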

### ⚙️ Technologies Used

- Java 17 (or higher)

- Spring Boot (REST API)
- OpenCV (Image Preprocessing)
- Tesseract OCR (Text Extraction)
- Maven (Dependency Management)

### 🚀 Getting Started

##### 1️⃣ Prerequisites

- Java 17+

- Maven

- OpenCV with Java Bindings

- Tesseract OCR

##### 2️⃣ Clone the Repository

`git clone https://github.com/pcisha/ocr-opencv-tesseract.git`

`cd ocr-opencv-tesseract`

##### 3️⃣ Install Dependencies

`mvn clean install`

##### 4️⃣ Run the Application

`mvn spring-boot:run`

##### 5️⃣ API Endpoint

`POST /api/ocr/upload`

`Content-Type: multipart/form-data`
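
To make the request shape concrete, here is a minimal client sketch using the JDK's `java.net.http.HttpClient`. Several things are assumptions rather than facts from the project: the class `OcrUploadClient` is hypothetical, the multipart field name `file` is a guess (check the controller for the real one), and the response is treated as opaque text because the README does not document its format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical client, not part of the project.
final class OcrUploadClient {

    // Assumed multipart field name; the README does not state it.
    static final String FIELD_NAME = "file";

    /** Builds a single-part multipart/form-data body for one file. */
    static byte[] multipartBody(String boundary, String fileName, String contentType, byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + FIELD_NAME
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + contentType + "\r\n\r\n";
        out.writeBytes(header.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(data);
        // The closing delimiter marks the end of the multipart body.
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Posts the image to /api/ocr/upload and returns the raw response body. */
    static String upload(String baseUrl, Path image) throws IOException, InterruptedException {
        String boundary = "----ocr-" + System.nanoTime();
        String type = Files.probeContentType(image);
        byte[] body = multipartBody(boundary, image.getFileName().toString(),
                type != null ? type : "application/octet-stream", Files.readAllBytes(image));
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/ocr/upload"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Assuming the application listens on Spring Boot's default port, a call would look like `OcrUploadClient.upload("http://localhost:8080", Path.of("scan.png"))`.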

### 📄 Configuration

Edit `src/main/resources/application.properties`:

#### File Upload Settings
`max.file.size.mb=5`

#### OpenCV and Tesseract Paths
OpenCV: `opencv.library.path=/usr/local/share/java/opencv4/libopencv_java4120.dylib`

Tesseract: `jna.library.path=/opt/homebrew/Cellar/tesseract/5.5.0/lib`
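
The sketch below shows how the three keys above map to typed values, using only `java.util.Properties`. It is illustrative: the class `OcrSettings` is hypothetical, and in the running application Spring Boot would normally bind these properties itself rather than parse the file by hand.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical settings holder, not part of the project.
final class OcrSettings {
    final long maxFileSizeBytes;
    final String opencvLibraryPath;
    final String jnaLibraryPath;

    OcrSettings(Properties props) {
        // Fall back to the documented 5 MB limit when the key is absent.
        int mb = Integer.parseInt(props.getProperty("max.file.size.mb", "5").trim());
        this.maxFileSizeBytes = mb * 1024L * 1024L;
        this.opencvLibraryPath = props.getProperty("opencv.library.path");
        this.jnaLibraryPath = props.getProperty("jna.library.path");
    }

    /** Parses text written in application.properties format. */
    static OcrSettings parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return new OcrSettings(props);
    }
}
```

The two paths are machine-specific (the examples are Homebrew locations on macOS), so they need to be adjusted to wherever OpenCV and Tesseract are installed locally.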

### 🚩 Error Handling

- 413 Payload Too Large: Returned when an uploaded file exceeds 5 MB.

- 400 Bad Request: Returned when the uploaded file is not a valid image.

- 500 Internal Server Error: Returned for unexpected failures during processing.
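
The size and validity checks behind the first two status codes can be sketched in plain Java as follows. This is not the project's controller code: `UploadValidator` is a hypothetical class, the limit is hard-coded here instead of read from configuration, and the JDK's `ImageIO` stands in for whatever image decoding the application actually performs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical validator, not the project's controller code.
final class UploadValidator {
    static final long MAX_BYTES = 5L * 1024 * 1024; // 5 MB limit from the README

    /** Returns the documented HTTP status for this upload: 413, 400, or 200. */
    static int statusFor(byte[] upload) {
        if (upload.length > MAX_BYTES) {
            return 413; // Payload Too Large
        }
        try {
            // ImageIO.read returns null when no registered reader recognizes the data.
            if (ImageIO.read(new ByteArrayInputStream(upload)) == null) {
                return 400; // not a valid image
            }
        } catch (IOException e) {
            return 400; // recognized format, but the content could not be decoded
        }
        return 200;
    }
}
```

Any other exception thrown during OCR processing would fall into the 500 case, which this sketch does not model.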

---
Date: February 5, 2025

Author: Prachi Shah @ https://pcisha.my.canva.site/

P.S. The default copyright laws apply.