{"id":22980113,"url":"https://github.com/rayyan9477/ocr-image-to-text","last_synced_at":"2026-04-02T18:45:51.099Z","repository":{"id":267533178,"uuid":"899696931","full_name":"Rayyan9477/OCR-Image-to-text","owner":"Rayyan9477","description":"Developed an OCR Image-to-Text application using Python and Streamlit, focusing on accurate text extraction and image preprocessing. Enhanced reliability and performance, enabling seamless conversion of diverse image formats into editable text.","archived":false,"fork":false,"pushed_at":"2025-06-16T11:33:37.000Z","size":82505,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-16T12:20:17.883Z","etag":null,"topics":["image-processing","image-to-text","machine-learning","ocr","pypdf2","python","pytorch","streamlit","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Rayyan9477.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-06T20:14:45.000Z","updated_at":"2025-06-16T11:33:41.000Z","dependencies_parsed_at":"2025-02-08T00:25:41.455Z","dependency_job_id":"3354d095-1105-40fc-b6da-dc176fcd4d16","html_url":"https://github.com/Rayyan9477/OCR-Image-to-text","commit_stats":null,"previous_names":["rayyan9477/ocr-image-to-text"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Rayyan9477/OCR-Image-to-text","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rayyan9477%2FOCR-Image-to-text","t
ags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rayyan9477%2FOCR-Image-to-text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rayyan9477%2FOCR-Image-to-text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rayyan9477%2FOCR-Image-to-text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Rayyan9477","download_url":"https://codeload.github.com/Rayyan9477/OCR-Image-to-text/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rayyan9477%2FOCR-Image-to-text/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31313311,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T12:59:32.332Z","status":"ssl_error","status_checked_at":"2026-04-02T12:54:48.875Z","response_time":89,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["image-processing","image-to-text","machine-learning","ocr","pypdf2","python","pytorch","streamlit","transformers"],"created_at":"2024-12-15T01:37:14.538Z","updated_at":"2026-04-02T18:45:51.094Z","avatar_url":"https://github.com/Rayyan9477.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Intelligent OCR and Text Analysis Tool\n\n[![Python 
3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Streamlit](https://img.shields.io/badge/Streamlit-1.0+-red.svg)](https://streamlit.io/)\n\n**🎯 Status: PRODUCTION READY** | **Performance: 16.7x Faster** | **All OCR Engines: ✅ Working**\n\n## 🚀 Performance Highlights\n\n- **⚡ 16.7x faster** than baseline with batch processing\n- **🧠 Intelligent caching** system for repeated operations  \n- **🔄 Real-time progress** tracking with ETA calculations\n- **💻 Multi-core processing** utilizing all available CPU cores\n- **🎯 99%+ accuracy** with multiple OCR engine support\n\n## Description\n\nAn advanced application that performs Optical Character Recognition (OCR) on images and PDFs, extracts text with layout preservation, and provides a question-answering interface based on the extracted content. It leverages machine learning models, state-of-the-art OCR engines, and modern NLP techniques to enable users to interactively query their documents.\n\n## Features\n\n- **Multiple OCR Engines**: Choose between PaddleOCR, EasyOCR, Tesseract, Dolphin, or a combined approach for optimal results\n- **Layout Preservation**: Maintains the original document formatting, including line breaks and text positioning\n- **Image Preprocessing**: Automatically enhances images for better OCR accuracy\n- **Table Detection**: Identifies table structures in documents\n- **Format Output Options**: Download extracted text in various formats (TXT, JSON, Markdown)\n- **Interactive Q\u0026A**: Ask questions about the extracted text using the RAG (Retrieval-Augmented Generation) system\n- **Multi-page PDF Support**: Process multi-page PDFs with progress tracking\n- **Modern UI/UX**: Enhanced user interface with custom styling and interactive elements\n- **Robust Design**: Gracefully handles missing dependencies with fallbacks\n- **Modular 
Architecture**: Well-organized code structure for easy maintenance and extension\n\n## Installation\n\n### Prerequisites\n\n- Python 3.8+ recommended\n- pip package manager\n- Optional: Tesseract OCR engine installed on your system (for fallback OCR)\n\n### Basic Installation\n\n1. Clone the repository:\n   ```bash\n   git clone https://github.com/Rayyan9477/OCR-Image-to-text.git\n   cd OCR-Image-to-text\n   ```\n\n2. Install the required packages:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. **NEW: Automated Tesseract Installation** (Windows):\n   ```bash\n   # Install Tesseract automatically using winget\n   winget install UB-Mannheim.TesseractOCR\n   ```\n\n4. For other platforms, install system dependencies:\n\n   **For macOS:**\n   ```bash\n   brew install tesseract\n   ```\n\n   **For Linux:**\n   ```bash\n   sudo apt-get update\n   sudo apt-get install -y tesseract-ocr\n   ```\n\n5. Verify your installation:\n   ```bash\n   python cli_app.py --check\n   ```\n\n6. 
Check your installation:\n   ```\n   python run.py --check\n   ```\n\n### Optimizing Installation\n\nThe system can work with just one OCR engine, but for best results, install multiple engines:\n\n- **For best accuracy:** Install PaddleOCR AND EasyOCR\n- **For lightweight usage:** Install only PyTesseract\n- **For offline usage:** Install PyTesseract (no internet required)\n\n## Project Structure\n\nThe project follows a modular architecture for better maintainability and extensibility:\n\n```\nocr_app/                  # Main package\n├── __init__.py           # Package initialization\n├── ocr_app.py            # Main application entry point\n├── streamlit_app.py      # Streamlit application launcher\n├── config/               # Configuration management\n│   ├── __init__.py\n│   ├── config.json       # Default configuration\n│   └── settings.py       # Settings and configuration\n├── core/                 # Core OCR functionality\n│   ├── __init__.py\n│   ├── ocr_engine.py     # Main OCR engine implementation\n│   └── image_processor.py # Image preprocessing utilities\n├── models/               # ML model management\n│   ├── __init__.py\n│   └── model_manager.py  # Model loading and caching\n├── rag/                  # Question-answering functionality\n│   ├── __init__.py\n│   └── rag_processor.py  # RAG implementation\n├── ui/                   # User interfaces\n│   ├── __init__.py\n│   ├── web_app.py        # Streamlit web interface\n│   └── cli.py            # Command-line interface\n└── utils/                # Utility functions\n    ├── __init__.py\n    └── text_utils.py     # Text processing utilities\n```\n\n## Usage\n\nThe application provides multiple ways to interact with it:\n\n### Web Interface (Recommended)\n\n1. Start the web application:\n   ```\n   python run.py\n   ```\n   or\n   ```\n   python -m ocr_app.streamlit_app\n   ```\n\n2. Open your browser to the displayed URL (typically http://localhost:8501)\n\n3. 
Use the intuitive interface to:\n   - Upload images or PDFs\n   - Configure OCR options\n   - Process and extract text\n   - Ask questions about the extracted content\n\n### Command Line Interface\n\nFor batch processing or integration with other tools:\n\n1. Extract text from an image:\n   ```\n   python run.py --cli extract --image path/to/image.jpg --output result.txt\n   ```\n\n2. Analyze an image and extract information:\n   ```\n   python run.py --cli analyze --image path/to/image.jpg --format json\n   ```\n\n3. Ask a question about an image:\n   ```\n   python run.py --cli question --image path/to/image.jpg --query \"What is the date mentioned?\"\n   ```\n\n4. Process a batch of files:\n   ```\n   python run.py --cli --batch path/to/folder --output results.json --format json\n   ```\n\n5. Get help and see all available options:\n   ```\n   python run.py --cli --help\n   ```\n\n6. **Run CLI with Dolphin model**\n   ```bash\n   python run_ocr.py --cli --engine dolphin --input path/to/image.jpg --output result.txt\n   ```\n\n### Python API\n\nYou can also use the components programmatically in your Python code:\n\n```python\nfrom ocr_app.core.ocr_engine import OCREngine\nfrom ocr_app.config.settings import Settings\nfrom PIL import Image\n\n# Initialize components\nsettings = Settings()\nocr_engine = OCREngine(settings)\n\n# Process an image\nimage = Image.open(\"path/to/image.jpg\")\ntext = ocr_engine.perform_ocr(\n    image, \n    engine=\"combined\",  # \"auto\", \"tesseract\", \"easyocr\", \"paddleocr\", or \"combined\"\n    preserve_layout=True,\n    preprocess=True\n)\n\n# Use the extracted text\nprint(text)\n```\n\nFor Q\u0026A functionality:\n\n```python\nfrom ocr_app.core.ocr_engine import OCREngine\nfrom ocr_app.rag.rag_processor import RAGProcessor\nfrom ocr_app.models.model_manager import ModelManager\nfrom ocr_app.config.settings import Settings\nfrom PIL import Image\n\n# Initialize components\nsettings = Settings()\nmodel_manager = 
ModelManager(settings)\nocr_engine = OCREngine(settings)\nrag_processor = RAGProcessor(model_manager, settings)\n\n# Process an image and ask a question\nimage = Image.open(\"path/to/image.jpg\")\ntext = ocr_engine.perform_ocr(image)\nanswer = rag_processor.process_query(text, \"What dates are mentioned in the text?\")\n\nprint(f\"Answer: {answer['answer']}\")\nprint(f\"Confidence: {answer['confidence']}\")\n```\n\n## Usage\n\nThe application can be run in multiple modes:\n\n### Web Interface Mode (Default)\n\nThe easiest way to use the application with a full graphical interface:\n\n```\npython run.py\n```\n\nor explicitly:\n\n```\npython run.py --web\n```\n\n### Command-Line Interface\n\nProcess files directly from the command line:\n\n```\npython run.py --cli --input image.jpg --output results.txt\n```\n\nProcess multiple files in a directory:\n\n```\npython run.py --cli --batch ./images/ --output ./results/\n```\n\nSupport for different output formats:\n\n```\npython run.py --cli --input document.pdf --format json\n```\n\n### Check Mode\n\nVerify your OCR functionality and available engines:\n\n```\npython run.py --check\n```\n\n## OCR Engine Comparison\n\n- **PaddleOCR**: Fast and accurate, particularly good for structured documents and Asian languages\n- **EasyOCR**: Good all-around OCR with support for 80+ languages\n- **Combined Mode**: Uses multiple engines and selects the best result for optimal accuracy\n- **Tesseract**: Great for offline usage, no internet required, but less accurate on complex layouts\n\n## Advanced Usage\n\n### Using the OCR Module in Your Code\n\n```python\nfrom ocr_app.core.ocr_engine import OCREngine\nfrom ocr_app.config.settings import Settings\nfrom PIL import Image\n\n# Initialize OCR engine\nsettings = Settings()\nocr_engine = OCREngine(settings)\n\n# Open an image\nimage = Image.open(\"document.jpg\")\n\n# Perform OCR with layout 
preservation\ntext = ocr_engine.perform_ocr(image, engine=\"auto\", preserve_layout=True)\nprint(text)\n```\n\n### Processing PDF Documents\n\n```python\nimport fitz  # PyMuPDF\nfrom ocr_app.core.ocr_engine import OCREngine\nfrom ocr_app.config.settings import Settings\nfrom PIL import Image\n\n# Open PDF\nsettings = Settings()\nocr_engine = OCREngine(settings)\n\ndoc = fitz.open(\"document.pdf\")\nfor page in doc:\n    pix = page.get_pixmap()\n    img = Image.frombytes(\"RGB\", [pix.width, pix.height], pix.samples)\n    text = ocr_engine.perform_ocr(img, engine=\"combined\", preserve_layout=True)\n    print(text)\n```\n\n### Question-Answering with Documents\n\n```python\nfrom ocr_app.core.ocr_engine import OCREngine\nfrom ocr_app.rag.rag_processor import RAGProcessor\nfrom ocr_app.models.model_manager import ModelManager\nfrom ocr_app.config.settings import Settings\nfrom PIL import Image\n\n# Initialize components\nsettings = Settings()\nmodel_manager = ModelManager(settings)\nocr_engine = OCREngine(settings)\nrag_processor = RAGProcessor(model_manager, settings)\n\n# Extract text from image\nimage = Image.open(\"document.jpg\")\ntext = ocr_engine.perform_ocr(image)\n\n# Ask a question about the document\nquestion = \"What is the main topic of this document?\"\nanswer = rag_processor.process_query(text, question)\nprint(f\"Question: {question}\")\nprint(f\"Answer: {answer['answer']}\")\nprint(f\"Confidence: {answer['confidence']}\")\n```\n\n### Command-Line Options\n\n```\nusage: run.py [-h] [--web] [--cli] [--check] ...\n\nOCR Image-to-Text Application\n\nMode Selection:\n  --web, -w           Run in web interface mode (default)\n  --cli, -c           Run in command-line interface mode\n  --check             Check available OCR engines and dependencies\n\nCLI Mode Options:\n  --input INPUT, -i INPUT\n                      Path to input image or PDF file\n  --output OUTPUT, -o OUTPUT\n                      Path to output file\n  --engine 
{auto,tesseract,easyocr,paddleocr,combined}\n                      OCR engine to use\n  --no-layout         Disable layout preservation\n  --format {txt,json,md}\n                      Output format (txt, json, or md)\n  --batch BATCH, -b BATCH\n                      Process all files in a directory\n  --verbose, -v       Enable verbose logging\n```\n\n## Troubleshooting\n\n### Common Issues\n\n1. **Missing Dependencies**: If you encounter import errors, run `python run.py --check` to check which dependencies are missing.\n\n2. **OCR Engine Not Found**: The system will fall back to alternative engines if your primary choice isn't available.\n\n3. **TensorFlow/Keras Compatibility**: The application handles TensorFlow/Keras compatibility issues automatically, but you might need to set environment variables manually in some environments:\n   ```powershell\n   $env:TF_CPP_MIN_LOG_LEVEL = \"2\"\n   $env:TF_USE_LEGACY_KERAS = \"1\"\n   $env:KERAS_BACKEND = \"tensorflow\"\n   ```\n\n4. **Tesseract Not Found**: Make sure Tesseract is installed and properly added to your system PATH.\n\n## Developer Guide\n\n### Adding a New OCR Engine\n\n1. Create a new engine class that inherits from `BaseOCREngine` in `ocr_app/core/ocr_engine.py`:\n\n```python\nclass MyNewOCREngine(BaseOCREngine):\n    def __init__(self, settings):\n        super().__init__(settings)\n        # Initialize your OCR engine\n        \n    def extract_text(self, image, preserve_layout=True):\n        # Implement OCR logic\n        return extracted_text\n```\n\n2. Add engine detection in the `OCREngine._check_engines` method:\n\n```python\ndef _check_engines(self):\n    engines = {\n        # Existing engines\n        \"my_new_engine\": False\n    }\n    \n    # Check for your engine\n    try:\n        # Check if your OCR engine is available\n        engines[\"my_new_engine\"] = True\n    except ImportError:\n        pass\n        \n    return engines\n```\n\n3. 
Register the engine in `OCREngine._initialize_engines`:\n\n```python\nif self.available_engines.get(\"my_new_engine\", False):\n    try:\n        self.engines[\"my_new_engine\"] = MyNewOCREngine(self.settings)\n    except Exception as e:\n        logger.error(f\"Failed to initialize MyNewOCR engine: {e}\")\n```\n\n### Customizing Settings\n\nYou can create a custom configuration file at `ocr_app/config/config.json`:\n\n```json\n{\n  \"ocr\": {\n    \"engines\": {\n      \"tesseract\": {\n        \"enabled\": true,\n        \"cmd_path\": \"C:\\\\Program Files\\\\Tesseract-OCR\\\\tesseract.exe\"\n      },\n      \"easyocr\": {\n        \"enabled\": true,\n        \"gpu\": false\n      }\n    },\n    \"default_engine\": \"tesseract\",\n    \"preserve_layout\": true\n  },\n  \"models\": {\n    \"download_path\": \"./custom_models\",\n    \"qa_model\": \"distilbert-base-cased-distilled-squad\"\n  }\n}\n```\n\n## Technologies Used\n\n- **Streamlit**: For building the interactive web application\n- **PyMuPDF (fitz)**: For improved PDF handling and processing\n- **Pillow (PIL)**: For image processing and manipulation\n- **EasyOCR**: Neural network-based OCR engine\n- **PaddleOCR**: State-of-the-art OCR system with high accuracy\n- **OpenCV**: For advanced image preprocessing and layout analysis\n- **Pytesseract**: Tesseract OCR Python wrapper\n- **Transformers**: Hugging Face library for loading pre-trained models\n- **SentenceTransformers**: For generating sentence embeddings\n- **FAISS**: Facebook AI Similarity Search for efficient similarity search\n- **PyTorch**: Deep learning framework underpinning the ML models\n\n## Contact\n\nFor inquiries or feedback:\n\n- **Email**: [rayyanahmed265@yahoo.com](mailto:rayyanahmed265@yahoo.com)\n- **LinkedIn**: [Rayyan Ahmed](https://www.linkedin.com/in/rayyan-ahmed9477/)\n- **GitHub**: [Rayyan9477](https://github.com/Rayyan9477/)\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for 
details.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frayyan9477%2Focr-image-to-text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frayyan9477%2Focr-image-to-text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frayyan9477%2Focr-image-to-text/lists"}