
https://github.com/varshneydevansh/tejocr



# TejOCR v0.1.5 - LibreOffice OCR Extension

🎉 **Phase 2 Complete: Professional UI/UX with Real Configurable Dialogs!**

TejOCR is a powerful LibreOffice extension that adds Optical Character Recognition (OCR) capabilities to your documents. Extract text from images directly within LibreOffice Writer.

## ✅ What's New in v0.1.5

**🎨 COMPLETE UI/UX OVERHAUL:**
- ✅ **Real Settings Dialog**: Configurable XDL-based settings with dependency checking
- ✅ **Professional OCR Options Dialog**: Language selection, output modes, advanced options
- ✅ **Smart Workflow Integration**: Seamless dialog flow for both OCR methods
- ✅ **Enhanced User Experience**: Grouped controls, helpful hints, and error guidance

**🔧 MAJOR IMPROVEMENTS:**
- **Dependency Status Dashboard**: Live status checking with installation guidance
- **Tesseract Path Configuration**: Browse, test, and validate Tesseract installation
- **Advanced OCR Options**: Page segmentation modes, engine modes, preprocessing
- **Multiple Output Modes**: Cursor, text box, replace image, clipboard
- **Smart Defaults**: Remembers your preferences between sessions

## 🎯 Current Status

**Phase 1 (Core Stability)**: ✅ **COMPLETE**
- Core OCR functionality fully working
- Multi-strategy error handling
- Robust dependency detection

**Phase 2 (Professional UI/UX)**: ✅ **COMPLETE**
- Real XDL-based dialogs
- Configurable settings system
- Professional user experience
- Advanced OCR options

**Phase 3 (Advanced Features)**: 🚧 **Next Priority**
- Batch processing capabilities
- Enhanced output formatting
- Performance optimizations

## 🚀 Quick Start

### Prerequisites

1. **Tesseract OCR** (Required):
```bash
# macOS
brew install tesseract

# Ubuntu/Debian
sudo apt install tesseract-ocr

# Windows
# Download from: https://github.com/UB-Mannheim/tesseract/wiki
```

2. **Python Dependencies** (for LibreOffice's Python):

**Automated Installation** (Recommended):
```bash
python3 install_dependencies.py
```

**Manual Installation**:
```bash
# macOS example: run pip with LibreOffice's bundled Python (the path differs on other platforms)
/Applications/LibreOffice.app/Contents/Frameworks/LibreOfficePython.framework/Versions/Current/bin/python3 -m pip install numpy pytesseract pillow
```

### Installation

1. **Download**: Get the latest `TejOCR-0.1.5.oxt` from releases
2. **Install**: LibreOffice → Tools → Extension Manager → Add → Select the .oxt file
3. **Restart**: Close and restart LibreOffice completely
4. **Verify**: Look for "TejOCR" in the top menu bar

### Usage

1. **Open LibreOffice Writer**
2. **Configure Settings**: Tools → TejOCR → Settings (first-time setup)
3. **For File OCR**: Tools → TejOCR → OCR Image from File → Select options → Start OCR
4. **For Selected Image**: Insert image → Select it → Tools → TejOCR → OCR Selected Image → Select options → Start OCR

## 🔧 Troubleshooting

### Check Dependencies
Go to **Tools → TejOCR → Settings** to see real-time status:
- ✅ Tesseract: Shows installed version and path
- ✅ Python packages: Shows NumPy, Pytesseract, Pillow status
- 📁 **Browse & Test**: Built-in path finder and validator
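
If the Settings dialog itself will not open, the same kind of status check can be approximated from a terminal. The following is a minimal sketch, not TejOCR's own implementation: the function name `check_dependencies` is hypothetical, and it only assumes that Tesseract is discoverable on `PATH` and that the packages are importable by the interpreter that runs it. Run it with LibreOffice's Python to see what the extension would see.

```python
import importlib.util
import shutil
import subprocess


def check_dependencies(modules=("numpy", "pytesseract", "PIL")):
    """Return a dict describing which OCR dependencies are available.

    Hypothetical helper for illustration; TejOCR's real checks may differ.
    """
    status = {}

    # Look for the tesseract binary on PATH; None means it was not found.
    tesseract_path = shutil.which("tesseract")
    status["tesseract_path"] = tesseract_path
    status["tesseract_version"] = None

    if tesseract_path:
        try:
            result = subprocess.run(
                [tesseract_path, "--version"],
                capture_output=True,
                text=True,
                timeout=10,
            )
            # Older Tesseract releases print the version to stderr.
            output = (result.stdout or result.stderr).strip()
            if output:
                status["tesseract_version"] = output.splitlines()[0]
        except (OSError, subprocess.SubprocessError):
            pass  # Leave the version as None if the binary cannot be run.

    # find_spec returns None when a top-level module cannot be imported.
    for name in modules:
        status[name] = importlib.util.find_spec(name) is not None

    return status


if __name__ == "__main__":
    for key, value in check_dependencies().items():
        print(f"{key}: {value}")
```

Note that Pillow is imported as `PIL`, which is why the module list uses that name rather than `pillow`.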

### Common Issues

**"Settings dialog won't open"**:
- Check LibreOffice version (4.0+ required)
- Restart LibreOffice completely
- Check that the extension is properly installed

**"OCR options not working"**:
- Use Settings dialog to verify all dependencies
- Check Tesseract path with built-in tester
- Ensure image is properly selected

### Advanced Configuration
- **Language Selection**: Choose from all installed Tesseract languages
- **Output Modes**: Customize where text appears
- **Page Segmentation**: Optimize for different image types
- **Preprocessing**: Enable image enhancement for better results
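
To make these options concrete, here is a rough sketch of how language, page segmentation mode, engine mode, and a simple preprocessing step could map onto a `pytesseract` call. It is an illustration under stated assumptions, not the extension's actual code: `build_tesseract_config` and `ocr_image` are hypothetical names, and grayscale conversion stands in for whatever preprocessing TejOCR really applies. The `--psm` and `--oem` flags are standard Tesseract options.

```python
def build_tesseract_config(psm=3, oem=3, extra=""):
    """Build a Tesseract config string from page segmentation and engine modes.

    Hypothetical helper: psm 0-13 and oem 0-3 are the ranges accepted by
    Tesseract 4 and later.
    """
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    parts = [f"--psm {psm}", f"--oem {oem}"]
    if extra:
        parts.append(extra)
    return " ".join(parts)


def ocr_image(path, lang="eng", psm=3, oem=3, grayscale=True):
    """Run OCR on an image file and return the recognized text.

    Requires Pillow, pytesseract, and a working Tesseract installation.
    """
    # Imported lazily so the config helper works without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(path) as source:
        # Grayscale conversion is an assumed stand-in for "preprocessing".
        image = source.convert("L") if grayscale else source.copy()

    return pytesseract.image_to_string(
        image, lang=lang, config=build_tesseract_config(psm, oem)
    )
```

For example, `ocr_image("scan.png", lang="eng", psm=6)` would treat the image as a single uniform block of text, which often suits screenshots of paragraphs; `scan.png` is a placeholder file name.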

## ๐Ÿ—๏ธ Development

### Building from Source
```bash
git clone
cd TejOCR
python3 build.py
```
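
For orientation, a LibreOffice `.oxt` extension is a ZIP archive, so a build step essentially collects the extension files into one. The sketch below shows that idea only; it is not the contents of this project's `build.py`. The function name `package_oxt` and the default `include` list are assumptions taken from the project layout, and a real extension typically also needs a `META-INF/manifest.xml` entry.

```python
import zipfile
from pathlib import Path


def package_oxt(
    source_dir,
    output_path,
    include=("description.xml", "Addons.xcu", "dialogs", "icons", "python"),
):
    """Zip selected files and folders from source_dir into an .oxt archive.

    Hypothetical sketch: entries that do not exist are silently skipped.
    """
    source_dir = Path(source_dir)
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for entry in include:
            path = source_dir / entry
            if path.is_file():
                archive.write(path, path.relative_to(source_dir).as_posix())
            elif path.is_dir():
                # Add every file below the folder, skipping bytecode caches.
                for child in sorted(path.rglob("*")):
                    if child.is_file() and "__pycache__" not in child.parts:
                        archive.write(
                            child, child.relative_to(source_dir).as_posix()
                        )
    return Path(output_path)
```

Archive member names are stored relative to the source folder with forward slashes, so the layout inside the `.oxt` mirrors the repository layout.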

### Project Structure
```
TejOCR/
├── python/tejocr/                     # Main Python package
│   ├── constants.py                   # Version and configuration constants
│   ├── tejocr_service.py              # Main UNO service with dialog integration
│   ├── tejocr_engine.py               # OCR processing engine
│   ├── tejocr_output.py               # Text insertion handling
│   ├── tejocr_dialogs.py              # Professional XDL dialog handlers
│   └── uno_utils.py                   # UNO utilities and helpers
├── dialogs/                           # XDL dialog definitions
│   ├── tejocr_settings_dialog.xdl     # Settings UI
│   └── tejocr_options_dialog.xdl      # OCR options UI
├── icons/                             # Extension icons
├── description.xml                    # Extension metadata
├── Addons.xcu                         # LibreOffice menu/toolbar integration
└── build.py                           # Build script
```

## 📝 License

This project is licensed under the Mozilla Public License 2.0 - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Tesseract OCR team for the excellent OCR engine
- LibreOffice community for extension development resources
- Python community for pytesseract and imaging libraries

---

**Note**: This is v0.1.5 with Phase 2 (Professional UI/UX) complete. Phase 3 (Advanced Features) is coming next!

For detailed changes and technical information, see [CHANGELOG.md](CHANGELOG.md).

## 🧠 About the Name

**Tej** (तेज) in Sanskrit and other Indian languages means *light*, *effulgence*, *sharpness*, or *brilliance*. **TejOCR** aims to bring clarity and insight to your documents by making the text within images accessible and editable.

## 📧 Contact

* Maintainer: **Devansh Varshney**
* GitHub: [varshneydevansh](https://github.com/varshneydevansh)
* Twitter: [@varshneydevansh](https://x.com/varshneydevansh)