<!-- This Source Code Form is subject to the terms of the Mozilla Public -->
<!-- License, v. 2.0. If a copy of the MPL was not distributed with this -->
<!-- file, You can obtain one at https://mozilla.org/MPL/2.0/. -->
<!-- © 2025 Devansh (Author of TejOCR) -->

<div align="center">
  <img src="icons/main_logo.png" alt="TejOCR Logo" width="360" style="margin-bottom: -20px;"/>
</div>

# TejOCR v0.1.5 - LibreOffice OCR Extension

🎉 **Phase 2 Complete: Professional UI/UX with Real Configurable Dialogs!**

TejOCR is a LibreOffice extension that adds Optical Character Recognition (OCR) to LibreOffice Writer, so you can extract text from images without leaving your document.

## ✅ What's New in v0.1.5

**🎨 COMPLETE UI/UX OVERHAUL:**
- ✅ **Real Settings Dialog**: Configurable XDL-based settings with dependency checking
- ✅ **Professional OCR Options Dialog**: Language selection, output modes, and advanced options
- ✅ **Smart Workflow Integration**: A seamless dialog flow for both OCR methods (image from file and selected image)
- ✅ **Enhanced User Experience**: Grouped controls, helpful hints, and error guidance

**🔧 MAJOR IMPROVEMENTS:**
- **Dependency Status Dashboard**: Live status checks with installation guidance
- **Tesseract Path Configuration**: Browse for, test, and validate your Tesseract installation
- **Advanced OCR Options**: Page segmentation modes, engine modes, and preprocessing
- **Multiple Output Modes**: Insert at the cursor, into a text box, in place of the image, or onto the clipboard
- **Smart Defaults**: Remembers your preferences between sessions

## 🎯 Current Status

**Phase 1 (Core Stability)**: ✅ **COMPLETE**
- Core OCR functionality fully working
- Multi-strategy error handling
- Robust dependency detection

**Phase 2 (Professional UI/UX)**: ✅ **COMPLETE**
- Real XDL-based dialogs
- Configurable settings system
- Professional user experience
- Advanced OCR options

**Phase 3 (Advanced Features)**: 🚧 **Next Priority**
- Batch processing capabilities
- Enhanced output formatting
- Performance optimizations

## 🚀 Quick Start

### Prerequisites

1. **Tesseract OCR** (Required):
   ```bash
   # macOS
   brew install tesseract

   # Ubuntu/Debian
   sudo apt install tesseract-ocr

   # Windows
   # Download the installer from: https://github.com/UB-Mannheim/tesseract/wiki
   ```

2. **Python Dependencies** (installed for the Python interpreter that LibreOffice uses):

   **Automated Installation** (Recommended):
   ```bash
   python3 install_dependencies.py
   ```

   **Manual Installation** (macOS path shown):
   ```bash
   # Run pip with LibreOffice's own Python interpreter
   /Applications/LibreOffice.app/Contents/Frameworks/LibreOfficePython.framework/Versions/Current/bin/python3 -m pip install numpy pytesseract pillow
   ```

### Installation

1. **Download**: Get the latest `TejOCR-0.1.5.oxt` from the releases page
2. **Install**: LibreOffice → Tools → Extension Manager → Add → select the `.oxt` file
3. **Restart**: Close and restart LibreOffice completely
4. **Verify**: Look for "TejOCR" in the top menu bar

### Usage

1. **Open LibreOffice Writer**
2. **Configure Settings** (first-time setup): Tools → TejOCR → Settings
3. **For File OCR**: Tools → TejOCR → OCR Image from File → select options → Start OCR
4. **For Selected Image**: Insert an image → select it → Tools → TejOCR → OCR Selected Image → select options → Start OCR

## 🔧 Troubleshooting

### Check Dependencies
Go to **Tools → TejOCR → Settings** to see the live status:
- ✅ Tesseract: Shows the installed version and path
- ✅ Python packages: Shows the status of NumPy, Pytesseract, and Pillow
- 📁 **Browse & Test**: Built-in path finder and validator

### Common Issues

**"Settings dialog won't open"**:
- Check your LibreOffice version (4.0+ required)
- Restart LibreOffice completely
- Check that the extension is properly installed (Tools → Extension Manager)

**"OCR options not working"**:
- Use the Settings dialog to verify all dependencies
- Check the Tesseract path with the built-in tester
- Make sure the image is actually selected before choosing OCR Selected Image

### Advanced Configuration
- **Language Selection**: Choose from all installed Tesseract languages
- **Output Modes**: Choose where the recognized text appears
- **Page Segmentation**: Tune recognition for different image layouts
- **Preprocessing**: Enable image enhancement for better results

## 🏗️ Development

### Building from Source
```bash
git clone <repository>
cd TejOCR
python3 build.py
```

### Project Structure
```
TejOCR/
├── python/tejocr/          # Main Python package
│   ├── constants.py        # Version and configuration constants
│   ├── tejocr_service.py   # Main UNO service with dialog integration
│   ├── tejocr_engine.py    # OCR processing engine
│   ├── tejocr_output.py    # Text insertion handling
│   ├── tejocr_dialogs.py   # Professional XDL dialog handlers
│   └── uno_utils.py        # UNO utilities and helpers
├── dialogs/                # XDL dialog definitions
│   ├── tejocr_settings_dialog.xdl   # Settings UI
│   └── tejocr_options_dialog.xdl    # OCR options UI
├── icons/                  # Extension icons
├── description.xml         # Extension metadata
├── Addons.xcu              # LibreOffice menu/toolbar integration
└── build.py                # Build script
```

## 📝 License

This project is licensed under the Mozilla Public License 2.0; see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- The Tesseract OCR team for the excellent OCR engine
- The LibreOffice community for extension development resources
- The Python community for pytesseract and imaging libraries

---

**Note**: This is v0.1.5, with Phase 2 (Professional UI/UX) complete. Phase 3 (Advanced Features) is coming next!

For detailed changes and technical information, see [CHANGELOG.md](CHANGELOG.md).

## 🧠 About the Name

**Tej** (तेज) in Sanskrit and other Indian languages means *light*, *effulgence*, *sharpness*, or *brilliance*. **TejOCR** aims to bring clarity and insight to your documents by making the text within images accessible and editable.

## 📧 Contact

*   Maintainer: **Devansh Varshney**
*   GitHub: [varshneydevansh](https://github.com/varshneydevansh)
*   Twitter: [@varshneydevansh](https://x.com/varshneydevansh)
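The live status that the TejOCR Settings dialog shows under Troubleshooting → Check Dependencies can be approximated from a terminal, which helps when the dialog itself will not open. This is a minimal sketch, not TejOCR's own detection code: `find_tesseract` and `check_python_packages` are hypothetical helpers, and the script assumes Tesseract is on `PATH` (a custom path set in the Settings dialog is not consulted). Run it with the same interpreter LibreOffice uses (on macOS, the `LibreOfficePython.framework` binary from the manual installation step), because that is where the packages must be importable.

```python
import importlib.util
import shutil
import subprocess


def find_tesseract():
    """Return (path, version line) for the tesseract binary on PATH, or (None, None)."""
    path = shutil.which("tesseract")
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Fall back to stderr in case this build prints its version banner there.
    output = (result.stdout or result.stderr).strip()
    return path, output.splitlines()[0] if output else ""


def check_python_packages(modules):
    """Map each import name to True if the running interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    path, version = find_tesseract()
    print(f"Tesseract: {version} ({path})" if path else "Tesseract: not found on PATH")
    # Import names differ from pip names: the Pillow package is imported as PIL.
    for name, ok in check_python_packages(["numpy", "pytesseract", "PIL"]).items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```

If a package shows as missing here even though `pip` reports it as installed, it was most likely installed for a different Python than the one LibreOffice runs.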
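The options listed under Troubleshooting → Advanced Configuration correspond to standard Tesseract settings: a language code (several can be combined with `+`, e.g. `eng+deu`), a page segmentation mode (`--psm`) and an OCR engine mode (`--oem`). The sketch below shows one way such options could be passed through pytesseract; it is an illustration under assumptions, not the code in `tejocr_engine.py`, and `parse_list_langs`, `installed_languages`, `build_tesseract_config` and `ocr_image` are hypothetical names. `pytesseract.image_to_string` with its `lang` and `config` arguments is pytesseract's real API, and `--list-langs`, `--psm` and `--oem` are real tesseract command-line flags; the 0-13 and 0-3 ranges are those of Tesseract 4 and later.

```python
import subprocess

_LIST_HEADER = "List of available languages"


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip() and not line.startswith(_LIST_HEADER)
    ]


def installed_languages(tesseract="tesseract"):
    """Ask the tesseract CLI which trained languages are installed."""
    result = subprocess.run([tesseract, "--list-langs"], capture_output=True, text=True)
    # Fall back to stderr in case this build prints the list there.
    return parse_list_langs(result.stdout or result.stderr)


def build_tesseract_config(psm=None, oem=None):
    """Build the extra CLI arguments that pytesseract forwards to tesseract."""
    parts = []
    if oem is not None:
        # OCR engine mode: 0 legacy, 1 LSTM, 2 legacy + LSTM, 3 default.
        if not 0 <= oem <= 3:
            raise ValueError(f"OCR engine mode must be 0-3, got {oem}")
        parts.append(f"--oem {oem}")
    if psm is not None:
        # Page segmentation mode, e.g. 3 = fully automatic, 6 = single text block.
        if not 0 <= psm <= 13:
            raise ValueError(f"page segmentation mode must be 0-13, got {psm}")
        parts.append(f"--psm {psm}")
    return " ".join(parts)


def ocr_image(path, lang="eng", psm=None, oem=None):
    """Run OCR on one image file; needs Tesseract, pytesseract and Pillow."""
    # Imported here so the helpers above work without these packages installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(
            image, lang=lang, config=build_tesseract_config(psm, oem)
        )
```

For example, `ocr_image("scan.png", lang="eng+deu", psm=6)` would read a hypothetical `scan.png` as a single block of English and German text, assuming both language packs are installed; `installed_languages()` returns the codes a language list could be populated from.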