{"id":25131990,"url":"https://github.com/taleblou/searchtextinimages_python","last_synced_at":"2025-04-03T00:11:49.961Z","repository":{"id":272733890,"uuid":"917581918","full_name":"taleblou/SearchTextInImages_Python","owner":"taleblou","description":"This script extracts text from images using EasyOCR, searches for specific predefined strings, and saves the results in a CSV file. It processes images in bulk from a specified directory, providing a streamlined way to analyze and search image text efficiently.","archived":false,"fork":false,"pushed_at":"2025-01-16T09:08:20.000Z","size":5,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-08T14:16:10.929Z","etag":null,"topics":["easyocr"],"latest_commit_sha":null,"homepage":"https://taleblou.ir/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/taleblou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-16T09:05:48.000Z","updated_at":"2025-01-16T09:08:43.000Z","dependencies_parsed_at":"2025-01-16T10:35:43.319Z","dependency_job_id":"d8d87e20-59fc-42de-88db-53dc70bd8b94","html_url":"https://github.com/taleblou/SearchTextInImages_Python","commit_stats":null,"previous_names":["taleblou/searchtextinimages_python"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taleblou%2FSearchTextInImages_Python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taleblou%2FSearchTextInImages_Pyth
on/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taleblou%2FSearchTextInImages_Python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taleblou%2FSearchTextInImages_Python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/taleblou","download_url":"https://codeload.github.com/taleblou/SearchTextInImages_Python/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246911470,"owners_count":20853657,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["easyocr"],"created_at":"2025-02-08T14:16:14.933Z","updated_at":"2025-04-03T00:11:49.942Z","avatar_url":"https://github.com/taleblou.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **Search Text in Images or Find Text in Images**\n\nThis script enables the user to extract text from images using Optical Character Recognition (OCR) and search for specific text patterns within the extracted text. It processes all images in a specified directory, identifies matches, and exports the results to a CSV file.\n\n## **Features**\n\n* **Text Extraction**: Uses the `EasyOCR` library to extract text from images.  \n* **Text Search**: Searches for predefined strings in the extracted text.  \n* **Batch Processing**: Processes all images in a given directory.  
\n* **CSV Output**: Saves the results, including filenames and matching texts, into a CSV file.\n\n## **Requirements**\n\nMake sure you have the following libraries installed before running the script:\n\n* `easyocr`  \n* `Pillow` (Python Imaging Library)  \n* `pandas`  \n* `tqdm`\n\nYou can install these packages using pip:\n\n`pip install easyocr pillow pandas tqdm`\n\n## **How to Use**\n\n1. **Set Up the Directory**:  \n   * Place all the images you want to process in a directory.  \n   * Update the `directory` variable in the script with the path to your image directory.  \n2. **Predefine Search Texts**:  \n   * Add the text strings you want to search for in the `texts` list.  \n3. **Run the Script**:  \n   * Execute the script in your Python environment: `python main.py`  \n4. **Check the Output**:  \n   * After running, the script generates a CSV file named `output.csv` containing the results:  \n     * **Filename**: Name of the image file.  \n     * **Matching\\_Text**: The text string that was matched.  \n     * **Extracted\\_Text**: Full text extracted from the image.\n\n## **Output Format**\n\nThe CSV file (`output.csv`) will have the following structure:\n\n| Filename | Matching\\_Text | Extracted\\_Text |\n| ----- | ----- | ----- |\n| image1.jpg | test1 | This is a test1 text |\n| image2.png | test2 | Sample text test2 |\n\n## **Error Handling**\n\n* If the script encounters an error while processing an image, it prints the error message and continues to the next image. This ensures the script doesn't stop due to a single problematic file.\n\n## **Customization**\n\n* **Languages**: Update the `reader` initialization line to support additional languages. 
For example:  \n`reader = easyocr.Reader(['en', 'fr'])  # English and French`\n\n* **File Extensions**: Modify the `if filename.endswith(...)` line to include additional image formats if needed.\n\n## **Example**\n\n`# Directory containing images`\n\n`directory = r'D:\\Images'`\n\n`# Text to search for`\n\n`texts = [\"example\", \"sample\", \"demo\"]`\n\n`# Output file`\n\n`output.csv`\n\nRun the script, and it will process all images in `D:\\Images`, searching for \"example\", \"sample\", or \"demo\" in the extracted text and saving the results to `output.csv`.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaleblou%2Fsearchtextinimages_python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftaleblou%2Fsearchtextinimages_python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaleblou%2Fsearchtextinimages_python/lists"}