{"id":23557076,"url":"https://github.com/lahcenezzara/python-tesseract-explorer","last_synced_at":"2025-05-15T23:08:55.726Z","repository":{"id":264390654,"uuid":"892908417","full_name":"LahcenEzzara/python-tesseract-explorer","owner":"LahcenEzzara","description":"Python Tesseract Explorer","archived":false,"fork":false,"pushed_at":"2024-11-23T22:26:39.000Z","size":523,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-17T14:29:09.474Z","etag":null,"topics":["image-processing","optical-character-recognition","pytesseract","python","tesseract-ocr"],"latest_commit_sha":null,"homepage":"https://lahcenezzara.github.io/python-tesseract-explorer/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LahcenEzzara.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-23T02:59:20.000Z","updated_at":"2024-11-23T22:30:27.000Z","dependencies_parsed_at":"2024-11-23T23:24:01.960Z","dependency_job_id":"4d8ab38c-56c3-48ec-8f43-3f63e0076051","html_url":"https://github.com/LahcenEzzara/python-tesseract-explorer","commit_stats":null,"previous_names":["lahcenezzara/python-tesseract-explorer"],"tags_count":0,"template":false,"template_full_name":"LahcenEzzara/python-explorer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LahcenEzzara%2Fpython-tesseract-explorer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LahcenEzzara%2Fpython-tesseract-explorer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/LahcenEzzara%2Fpython-tesseract-explorer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LahcenEzzara%2Fpython-tesseract-explorer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LahcenEzzara","download_url":"https://codeload.github.com/LahcenEzzara/python-tesseract-explorer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254436949,"owners_count":22070947,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["image-processing","optical-character-recognition","pytesseract","python","tesseract-ocr"],"created_at":"2024-12-26T14:19:13.753Z","updated_at":"2025-05-15T23:08:49.937Z","avatar_url":"https://github.com/LahcenEzzara.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Python Tesseract Explorer\n\nThis repository demonstrates the use of Tesseract OCR in Python for text extraction from various image formats. It processes multiple images and extracts their textual content using the `pytesseract` library.\n\n## Features\n\n- Extract text from multiple image formats such as `.png`, `.jpg`, `.bmp`, `.gif`, `.webp`, and more.\n- Supports language-based OCR, including Arabic and other languages installed in Tesseract.\n- Organized directory structure for managing images and code.\n\n---\n\n## Installation\n\nFollow these steps to set up and run the project:\n\n### 1. 
Clone the Repository\n\n```bash\ngit clone https://github.com/LahcenEzzara/python-tesseract-explorer.git\ncd python-tesseract-explorer\n```\n\n### 2. Set Up Python Environment\n\nIt is recommended to use a virtual environment to manage dependencies.\n\n```bash\npython3 -m venv venv\nsource venv/bin/activate  # On Linux/Mac\n# OR\nvenv\\Scripts\\activate  # On Windows\n```\n\n### 3. Install Dependencies\n\nInstall the required Python libraries:\n\n```bash\npip install -r requirements.txt\n```\n\n### 4. Install Tesseract OCR\n\nEnsure Tesseract OCR is installed on your system. For Ubuntu, you can use:\n\n```bash\nsudo apt update\nsudo apt install tesseract-ocr\n```\n\n### 5. Install Additional Language Support\n\nTo process Arabic or other languages, install their respective Tesseract language data. For example:\n\n```bash\nsudo apt install tesseract-ocr-ara\n```\n\n---\n\n## Usage\n\nThe main script processes images in the `images/` folder and extracts text from each. To run the script:\n\n```bash\npython main.py\n```\n\nThe extracted text for each image will be printed in the terminal.\n\n---\n\n## Directory Structure\n\n```\npython-tesseract-explorer/\n├── images/                 # Folder containing test images\n│   ├── test_ar.png\n│   ├── test_la.png\n│   ├── test-european.jpg\n│   ├── test-small.jpg\n│   ├── test.bmp\n│   ├── test.gif\n│   ├── test.jpg\n│   ├── test.png\n│   ├── test.webp\n│   └── ...\n├── main.py                 # Python script for OCR\n├── requirements.txt        # Python dependencies\n├── README.md               # Project documentation\n└── LICENSE                 # License file\n```\n\n---\n\n## Example Output\n\nWhen running the script, you will see output similar to this:\n\n```\nProcessing: images/test_ar.png\nExtracted Text from test_ar.png:\nالسلام عليكم\n\n----------------------------------------\nProcessing: images/test_la.png\nExtracted Text from test_la.png:\nHello 
World!\n\n----------------------------------------\n...\n```\n\n---\n\n## Dependencies\n\n- **Python 3.8+**\n- **Tesseract OCR**\n- **Pytesseract**: Python wrapper for Tesseract OCR\n- **Pillow**: maintained fork of the Python Imaging Library (PIL), used for image processing\n\nInstall the Python packages with pip (Tesseract OCR itself is installed separately, as described under Installation):\n\n```bash\npip install pytesseract pillow\n```\n\n---\n\n## Contributing\n\nContributions are welcome! Feel free to open issues or submit pull requests.\n\n---\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\n---\n\n### Notes\n\n- Make sure your `tesseract` binary is properly installed and accessible from the command line.\n- If a language is missing, download the `.traineddata` file from the [Tesseract tessdata repository](https://github.com/tesseract-ocr/tessdata) and place it in your Tesseract `tessdata` folder.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flahcenezzara%2Fpython-tesseract-explorer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flahcenezzara%2Fpython-tesseract-explorer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flahcenezzara%2Fpython-tesseract-explorer/lists"}