{"id":28664184,"url":"https://github.com/b-luis/py-textractify","last_synced_at":"2026-05-11T07:07:18.872Z","repository":{"id":297455254,"uuid":"996821468","full_name":"b-luis/py-textractify","owner":"b-luis","description":"📸  Optical Character Recognition for scanned images using Tesseract and OpenCV.","archived":false,"fork":false,"pushed_at":"2025-06-05T14:20:44.000Z","size":635,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"dev","last_synced_at":"2025-06-05T15:29:30.428Z","etag":null,"topics":["opencv","python","tesseract-ocr"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/b-luis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-05T14:10:08.000Z","updated_at":"2025-06-05T14:21:02.000Z","dependencies_parsed_at":"2025-06-10T10:15:18.814Z","dependency_job_id":null,"html_url":"https://github.com/b-luis/py-textractify","commit_stats":null,"previous_names":["b-luis/py-textractify"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/b-luis/py-textractify","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b-luis%2Fpy-textractify","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b-luis%2Fpy-textractify/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b-luis%2Fpy-textractify/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b-luis%2Fpy-textractify/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/b-luis","download_url":"https://codeload.github.com/b-luis/py-textractify/tar.gz/refs/heads/dev","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b-luis%2Fpy-textractify/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259642474,"owners_count":22889000,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["opencv","python","tesseract-ocr"],"created_at":"2025-06-13T12:12:21.464Z","updated_at":"2026-05-11T07:07:13.843Z","avatar_url":"https://github.com/b-luis.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# textractify\n[![Maintenance](https://img.shields.io/badge/Maintained%3F-no-red.svg)](https://GitHub.com/Naereen/StrapDown.js/graphs/commit-activity) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/b-luis/textractify/blob/main/LICENSE) \n\n\nA website that uses Optical Character Recognition to extract text from scanned images. It is built as a Flask web app using OpenCV for image processing and tesseract for image recognition. It allows the user to upload an image, which is processed first to easily recognize text. The processed image is then passed onto tesseract which performs image recognition and outputs the text.\n\n## Features\nImage processing ensures that the text in that image is readable. It does so by first converting the image to grayscale and thresholding the image.\n\n## Limitations\nTesseract has its own set of limitations. 
It struggles with images that contain varied fonts or very small font sizes. Though image processing eliminates the noise in images to an extent, Tesseract still performs poorly on extremely noisy images. This app may not do well with bordered images either.\n\n## Demo\nUpload an image [here](textractify.herokuapp.com/upload) using one of the images in the [uploads folder](https://github.com/muichii/textractify/tree/main/static/uploads) to view a demo of the OCR in action!\n\n## Screenshots\n**Landing page**\n\n![image](https://user-images.githubusercontent.com/86459271/146956600-6dc3c21b-c553-4174-9393-b25f8942482e.png)\n\n**Upload section**\n\n![image](https://user-images.githubusercontent.com/86459271/146956983-9fc8c5f1-4e46-4df4-a80d-706e71aac544.png)\n\n**Uploading an image**\n\n![image](https://user-images.githubusercontent.com/86459271/146957298-b404f220-b171-45f2-8a1d-b9109b927819.png)\n\n\n**Result**\n\n![image](https://user-images.githubusercontent.com/86459271/146957461-de0a0eb3-9fa1-46b4-9c79-99108f8d8c43.png)\n\n## Getting Started \nThis installation procedure assumes you are on a Windows system and have `pip`, `bash`, and `python3.9` installed.\n\n### Requirements\n- [Git](https://git-scm.com)\n- [Python 3.6 (or higher)](https://www.python.org)\n- [Tesseract](https://github.com/UB-Mannheim/tesseract/wiki)\n\n  Download the Windows installer by clicking the link titled **tesseract-ocr-w64-setup-v4.1.0.20190314.exe** (the 64-bit version). You will be prompted to save an .exe file named “tesseract-ocr-w64-setup-v4.1.0.20190314.exe”. Save it wherever you have enough storage space, then open it. If Windows asks “Do you want to allow this software to make changes to your system”, click Yes. You will be taken to the installation section.\n\n### Python Packages\n- `Flask`\n- `pytesseract`\n- `Pillow`\n- `opencv-python`\n\n### **Steps to execute the app locally**\n1. 
Download the project files:\n   \n   On the GitHub repository page, click the green \"Clone or Download\" button at the top right of the page. If you want to get started with this script more quickly, click the \"Download ZIP\" button, and extract the ZIP somewhere on your computer.\n   \n   Alternatively, clone the repository with this command:\n    ```\n    git clone https://github.com/b-luis/py-textractify.git\n    cd py-textractify\n    ```\n    \n2. Create a new virtual env:\n\n    ```\n    python3.9 -m venv venv\n    ```\n    \n3. Activate the virtualenv:\n    \n    ```\n    source venv/bin/activate\n    ```\n    \n4. Install the project requirements:\n\n    ```\n    pip install -r requirements.txt\n    ```\n5. Run the script:\n    \n    ```\n    python app.py\n    ```\n\n6. Review the results:\n    \n   The `app.py` script starts the Flask server.\n   \n   ```\n   Serving on 127.0.0.1:5000\n   ```\n\nIf the above steps do not work, download the ZIP file and extract it. Open the extracted folder in your code editor, install the project dependencies as in step 4, and run the `app.py` script, which serves the app at `localhost:5000`.\n\n## Note\n- Only `JPG`, `JPEG`, and `PNG` file formats are supported.\n- All image files uploaded to the local server are stored in the `uploads` folder.\n\n\u003c!-- ## Website --\u003e\n\u003c!-- [textractify](textractify.herokuapp.com) is already deployed and running on Heroku. The website might go on maintenance from time to time.  
--\u003e\n\n\u003c!-- ## Technology Stack\n#### Backend\n![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge\u0026logo=python\u0026logoColor=ffdd54)\n\n#### Frontend\n![HTML5](https://img.shields.io/badge/html5-%23E34F26.svg?style=for-the-badge\u0026logo=html5\u0026logoColor=white) ![CSS3](https://img.shields.io/badge/css3-%231572B6.svg?style=for-the-badge\u0026logo=css3\u0026logoColor=white) ![JavaScript](https://img.shields.io/badge/javascript-%23323330.svg?style=for-the-badge\u0026logo=javascript\u0026logoColor=%23F7DF1E) ![Bootstrap](https://img.shields.io/badge/bootstrap-%23563D7C.svg?style=for-the-badge\u0026logo=bootstrap\u0026logoColor=white) \n\n#### Framework\n![Flask](https://img.shields.io/badge/flask-%23000.svg?style=for-the-badge\u0026logo=flask\u0026logoColor=white) \n\n#### Image Processing \u0026 OCR\n![OpenCV](https://img.shields.io/badge/opencv-%23white.svg?style=for-the-badge\u0026logo=opencv\u0026logoColor=white) \u003ca href='https://github.com/shivamkapasia0' target=\"_blank\"\u003e\u003cimg alt='Tesseract' src='https://img.shields.io/badge/Tesseract-100000?style=for-the-badge\u0026logo=Tesseract\u0026logoColor=white\u0026labelColor=0561FF\u0026color=26A8E4'/\u003e\u003c/a\u003e\n\n#### Web Hosting\n![Heroku](https://img.shields.io/badge/Heroku-430098?style=for-the-badge\u0026logo=heroku\u0026logoColor=white) --\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fb-luis%2Fpy-textractify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fb-luis%2Fpy-textractify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fb-luis%2Fpy-textractify/lists"}