https://github.com/pcisha/ocr-opencv-tesseract
OCR Image to Text Converter using OpenCV and Tesseract
- Host: GitHub
- URL: https://github.com/pcisha/ocr-opencv-tesseract
- Owner: pcisha
- Created: 2025-02-05T08:14:05.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-05T08:53:39.000Z (over 1 year ago)
- Last Synced: 2025-02-05T09:32:46.284Z (over 1 year ago)
- Language: Java
- Size: 6.59 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# OCR Image Processing Web Application
A Java Spring Boot web application that allows users to upload images (screenshots, scanned documents, etc.) and extract text using Tesseract OCR with advanced image preprocessing powered by OpenCV.
### Features
- Upload Images: Supports multiple image formats (JPG, PNG, BMP).
- Accurate OCR: Preprocesses images with OpenCV before text extraction to improve accuracy.
- File Size Validation: Accepts uploads up to 5 MB and rejects larger files.
- Error Handling: Handles oversized files, invalid images, and processing failures gracefully.
- Dynamic Configuration: Library paths, file size limits, and other properties are configurable.
### Technologies Used
- Java 17 (or higher)
- Spring Boot (REST API)
- OpenCV (Image Preprocessing)
- Tesseract OCR (Text Extraction)
- Maven (Dependency Management)
### Getting Started
##### 1. Prerequisites
- Java 17+
- Maven
- OpenCV with Java Bindings
- Tesseract OCR
##### 2. Clone the Repository
`git clone https://github.com/pcisha/ocr-opencv-tesseract.git`
`cd ocr-opencv-tesseract`
##### 3. Install Dependencies
`mvn clean install`
##### 4. Run the Application
`mvn spring-boot:run`
##### 5. API Endpoint
`POST /api/ocr/upload`
`Content-Type: multipart/form-data`
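The endpoint above can be exercised from plain Java without extra dependencies. The following is a minimal sketch, not the project's own client: the class name `OcrUploadClient`, the form field name `file`, and the base URL `http://localhost:8080` (Spring Boot's default port) are assumptions, since the README does not state them.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the repository.
class OcrUploadClient {

    /** Builds a multipart/form-data body that contains a single file part. */
    static byte[] buildMultipartBody(String boundary, String fieldName,
                                     String fileName, String mimeType,
                                     byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + mimeType + "\r\n\r\n";
        out.writeBytes(header.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        // The closing delimiter marks the end of the multipart body.
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Builds (but does not send) the POST request for the upload endpoint. */
    static HttpRequest buildRequest(String baseUrl, String boundary, byte[] body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/ocr/upload"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }
}
```

To send the request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The response format is not documented in the README, so treat the returned body as opaque text until you have checked it against the running application.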
### Configuration
Edit `src/main/resources/application.properties`:
#### File Upload Settings
`max.file.size.mb=5`
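To illustrate how a megabyte limit like this might be applied, here is a small sketch. The class `UploadSizeLimit` and its methods are hypothetical, and it assumes 1 MB means 1024 * 1024 bytes; the application's actual conversion is not described in this README.

```java
// Hypothetical helper; shows one way to turn the configured MB limit into a byte check.
class UploadSizeLimit {

    // Assumption: 1 MB is treated as 1024 * 1024 bytes.
    private static final long BYTES_PER_MB = 1024L * 1024L;

    /** Parses the raw property value, e.g. "5" from max.file.size.mb=5. */
    static int parseLimitMb(String propertyValue) {
        return Integer.parseInt(propertyValue.trim());
    }

    /** Converts the limit from megabytes to bytes. */
    static long toBytes(int maxFileSizeMb) {
        return maxFileSizeMb * BYTES_PER_MB;
    }

    /** Returns true when the file is strictly larger than the limit. */
    static boolean exceedsLimit(long fileSizeBytes, int maxFileSizeMb) {
        return fileSizeBytes > toBytes(maxFileSizeMb);
    }
}
```

With the default value of 5, a file of exactly 5,242,880 bytes passes this check and one byte more does not.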
#### OpenCV and Tesseract Paths
OpenCV: `opencv.library.path=/usr/local/share/java/opencv4/libopencv_java4120.dylib`
Tesseract: `jna.library.path=/opt/homebrew/Cellar/tesseract/5.5.0/lib`
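The two path properties could be applied at startup roughly as follows. This is a sketch under assumptions rather than the project's code: the class `NativeLibraryConfig` is hypothetical, and it assumes the Tesseract binding locates its native library through JNA's `jna.library.path` system property while OpenCV is loaded explicitly with `System.load`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical startup helper; not part of the repository.
class NativeLibraryConfig {

    /** Parses properties text such as the contents of application.properties. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Applies whichever of the two configured paths are present. */
    static void apply(Properties props) {
        String jnaPath = props.getProperty("jna.library.path");
        if (jnaPath != null) {
            // JNA consults this system property when searching for native libraries.
            System.setProperty("jna.library.path", jnaPath);
        }
        String opencvPath = props.getProperty("opencv.library.path");
        if (opencvPath != null) {
            // System.load needs an absolute path and throws UnsatisfiedLinkError
            // if the file does not exist or cannot be loaded.
            System.load(opencvPath);
        }
    }
}
```

The paths shown in this section are specific to one macOS and Homebrew installation, so expect to change both values on other machines.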
### Error Handling
- 413 Payload Too Large: Returned when a file exceeds the 5 MB limit.
- 400 Invalid Image: Returned when the uploaded file is not a valid image.
- 500 Server Error: Returned for unexpected issues during processing.
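The status codes above can be sketched as a single decision function. This is an illustration only: `UploadStatus` is a hypothetical class, and it uses the JDK's `ImageIO` to test whether the bytes decode as an image, which stands in for whatever check the application performs with OpenCV.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper; maps an upload to the HTTP status listed above.
class UploadStatus {

    static {
        // Decode in memory instead of using temporary cache files.
        ImageIO.setUseCache(false);
    }

    /** Returns 413, 400, 500, or 200 for the given upload and byte limit. */
    static int statusFor(byte[] upload, long maxBytes) {
        if (upload.length > maxBytes) {
            return 413; // Payload Too Large
        }
        try {
            // ImageIO.read returns null when no registered reader recognises the data.
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(upload));
            return image == null ? 400 : 200;
        } catch (IOException e) {
            return 400; // recognised format, but the data is corrupt or truncated
        } catch (RuntimeException e) {
            return 500; // unexpected failure during processing
        }
    }
}
```

Checking the size before decoding keeps oversized uploads from being parsed at all, which matches the order in which the errors are listed.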
#
Date: February 5, 2025
Author: Prachi Shah @ https://pcisha.my.canva.site/
P.S. The default copyright laws apply.