https://github.com/tsvetang2/advanced-local-ocr

Advanced local OCR is a project, inspired by the text extraction some AIs do. So instead of leaving people paying for such services, why not publish a open-source version, that keeps the privacy of each user. The app allows integration with LLMs via APIs.
https://github.com/tsvetang2/advanced-local-ocr

desktop-app easyocr gui-application image-processing llms local-first ocr ocr-engine ocr-python ocr-text-reader privacy pyqt5 pyqt5-desktop-application tesseract text-cleaning text-recognition-from-image

Last synced: 8 months ago
JSON representation

Host: GitHub
URL: https://github.com/tsvetang2/advanced-local-ocr
Owner: TsvetanG2
License: mit
Created: 2025-06-28T02:32:44.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-06-28T03:17:21.000Z (9 months ago)
Last Synced: 2025-06-28T03:29:02.788Z (9 months ago)
Topics: desktop-app, easyocr, gui-application, image-processing, llms, local-first, ocr, ocr-engine, ocr-python, ocr-text-reader, privacy, pyqt5, pyqt5-desktop-application, tesseract, text-cleaning, text-recognition-from-image
Language: Python
Homepage: https://github.com/TsvetanG2/Advanced-Local-OCR
Size: 0 Bytes
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

Awesome Lists containing this project

README

# 🔍 Advanced Local OCR Studio

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![PyQt5](https://img.shields.io/badge/GUI-PyQt5-green.svg)](https://pypi.org/project/PyQt5/)

A **modern, professional OCR application** with intelligent text cleaning and beautiful UI. Built for **local-first processing** with optional AI enhancements.

## ✨**Why This OCR App is Special**

**Solves Real OCR Problems** - Transforms garbled OCR output like `"cyoGuoyy pu"` into perfect text: `"Trailing only Facebook Messenger, WeChat is now the second most popular messaging platform in Bhutan and Mongolia."`

**Local-First Design** - Works completely offline, no API keys required, your images never leave your computer

**Beautiful Modern UI** - Professional design that rivals commercial software (9.2/10 visual rating)

**Smart Text Cleaning** - Advanced algorithms that reconstruct coherent text from fragmented OCR results

## **Key Features**

### **Dual OCR Engine Power**
- **EasyOCR + Tesseract** - Best of both worlds for maximum accuracy
- **Smart Fallback System** - Automatically chooses the best engine for each image
- **Advanced Preprocessing** - Adaptive enhancement, noise reduction, deskewing
- **Confidence-Based Processing** - Intelligent quality assessment

### **Revolutionary Text Cleaning**
- **Smart Fragment Reconstruction** - Rebuilds coherent sentences from OCR fragments
- **Duplicate Elimination** - Removes redundant and overlapping text
- **Error Pattern Recognition** - Fixes common OCR mistakes automatically
- **Context-Aware Processing** - Understands text patterns for better results

### **Professional User Interface**
- **Modern Design Language** - Beautiful gradients, rounded corners, professional styling
- **Intuitive Workflow** - Load → Process → Review → Compare
- **Real-Time Feedback** - Progress indicators, status updates, confidence scores
- **Responsive Layout** - Adapts to different screen sizes

### **Advanced Analysis**
- **Text Comparison Engine** - Detailed accuracy analysis with similarity scoring
- **Confidence Visualization** - Color-coded results (🟢🟡🔴)
- **Processing Insights** - Engine performance, timing, quality metrics
- **Error Categorization** - Detailed breakdown of text differences

### **Developer-Friendly**
- **Modular Architecture** - Clean, extensible codebase
- **Comprehensive Logging** - Detailed debugging information
- **Thread-Safe Design** - Proper cleanup, no memory leaks
- **Well-Documented** - Clear code comments and documentation

## **Quick Start**

### **Prerequisites**
- Python 3.8 or higher
- Tesseract OCR (recommended)

### **1. Clone & Install**
```bash
# Clone the repository
git clone https://github.com/yourusername/advanced-local-ocr-studio.git
cd advanced-local-ocr-studio

# Install core dependencies
pip install -r requirements.txt
```

### **2. Install OCR Engines**
```bash
# Install EasyOCR (recommended)
pip install easyocr

# Install Tesseract
# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
# macOS: brew install tesseract
# Linux: sudo apt-get install tesseract-ocr
pip install pytesseract
```

### **3. Run the Application**
```bash
# Simple start
python app.py

# Or directly
python enhanced_ocr_app.py
```

### **4. Basic Workflow**
1. **Load Image** - Click "📂 Load Image" or drag & drop
2. **Configure** - Enable preprocessing, choose OCR engine
3. **Extract** - Click "✨ Extract Text" to process
4. **Compare** - (Optional) Enter expected text for accuracy analysis
5. **Review** - Check cleaned results and raw OCR data

## 🎯 **Real-World Example**

### **The Problem This Solves**
Many OCR tools produce garbled, unusable output. Here's a real example:

**❌ Typical OCR Output:**
```
"cyoGuoyy pu"
```

**❌ Raw OCR with Artifacts:**
```
"Trailing only Facebook Messenger, WeChat is now the second most popular messaging platform in Bhutan and Mongolia. Bhutan and Mongolia. popular messaging platform in Trailing only Facebook Messenger, WeChat is now the second most Bhutanland Trailing only Facebook Messenger Trai €b: Més: WeChat'is:now:the:second most: popuilar:mess lattormin id Mon Mongoliax WeChatis now [he second mosti popular messaging platform;jn"
```

**✅ Our Smart Cleaned Result:**
```
"Trailing only Facebook Messenger, WeChat is now the second most popular messaging platform in Bhutan and Mongolia."
```

**🎯 100% Perfect Accuracy!**

## ⚙️ **Configuration**

The application uses smart defaults but can be customized via `config/settings.yaml`:

### **OCR Settings**
```yaml
ocr:
engines:
easyocr:
enabled: true
gpu: false # Set to true if you have CUDA GPU
text_threshold: 0.8
tesseract:
enabled: true
oem: 1 # LSTM OCR Engine
psm: 6 # Uniform block of text
```

### **Smart Cleaning**
```yaml
text_cleaning:
smart_cleaner: true # Use advanced text reconstruction
confidence_threshold: 0.3
min_text_length: 2
```

### **UI Preferences**
```yaml
ui:
theme: "modern" # Modern blue theme
window_size: [1200, 800]
auto_save_settings: true
```
## **Media**
### **Main window**
![Screenshot 2025-06-28 173301](https://github.com/user-attachments/assets/1e568fc3-731e-4189-8dd1-fc2867d927ce)

## 🏗️ **Architecture**

### **Local-First Design**
- **Offline Processing**: Works completely without internet
- **Privacy-Focused**: Images never leave your computer
- **Fast Performance**: No network latency or API limits
- **Optional AI**: LLM features are completely optional

### **🧩 Modular Components**
```
📁 Advanced Local OCR Studio
├── 🎨 enhanced_ocr_app.py # Beautiful main application
├── 🚀 app.py # Simple entry point
├── 📁 src/
│ ├── 🔍 core/ # OCR processing engines
│ │ ├── local_ocr.py # Dual OCR engine manager
│ │ ├── smart_text_cleaner.py # Revolutionary text cleaning
│ │ └── text_processors.py # Analysis and comparison
│ └── 🛠️ utils/ # Configuration and utilities
├── 📁 tests/ # Comprehensive test suite
└── 📁 docs/ # Documentation
```

## 🤝 **Contributing**

We welcome contributions! This project is designed to be **developer-friendly**.

### **🚀 Quick Start for Contributors**
```bash
# Fork and clone
git clone https://github.com/yourusername/advanced-local-ocr-studio.git
cd advanced-local-ocr-studio

# Set up development environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Run tests
python -m pytest tests/
python tests/test_installation.py
```

### **📋 Contribution Areas**
- **Language Support**: Add support for more languages
- **OCR Engines**: Integrate additional OCR engines
- **UI Improvements**: Enhance the beautiful interface
- **Smart Cleaning**: Improve text reconstruction algorithms
- **Documentation**: Help others understand and use the project

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

## 📄 **License**

This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.

## 🙏 **Acknowledgments**

- **EasyOCR** team for excellent neural OCR
- **Tesseract** community for robust traditional OCR
- **PyQt5** for powerful GUI framework
- **Open Source Community** for inspiration and support

## 📞 **Support & Community**

- **Documentation**: [docs/](docs/) folder
- **Bug Reports**: [GitHub Issues](https://github.com/TsvetanG2/advanced-local-ocr-studio/issues)
- **Feature Requests**: [GitHub Discussions](https://github.com/TsvetanG2/Advanced-Local-OCR/discussions)
- **Show Support**: Star the repository if you find it useful!

---

**🎉 Transform your OCR experience with intelligent text cleaning and beautiful design!**

[![Star this repo](https://img.shields.io/github/stars/yourusername/advanced-local-ocr-studio?style=social)](https://github.com/TsvetanG2/Advanced-Local-OCR)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tsvetang2/advanced-local-ocr

Awesome Lists containing this project

README