{"id":25905576,"url":"https://github.com/nafisarkar/pdf_converter_ocr","last_synced_at":"2026-06-29T07:31:43.793Z","repository":{"id":256999268,"uuid":"857053100","full_name":"Nafisarkar/Pdf_Converter_OCR","owner":"Nafisarkar","description":"This is a graphical tool for performing Optical Character Recognition (OCR) on images and converting PDF files to images","archived":false,"fork":false,"pushed_at":"2024-10-22T14:50:19.000Z","size":214,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-10T06:00:45.799Z","etag":null,"topics":["image","image-processing","machine-learning","ocr","pdf","text-extraction","tkinter-gui"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Nafisarkar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-13T17:53:51.000Z","updated_at":"2025-01-19T06:49:35.000Z","dependencies_parsed_at":null,"dependency_job_id":"f1f26b6d-84f2-4d53-86ec-573178f93475","html_url":"https://github.com/Nafisarkar/Pdf_Converter_OCR","commit_stats":null,"previous_names":["nafisarkar/pdf_converter_ocr"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Nafisarkar/Pdf_Converter_OCR","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nafisarkar%2FPdf_Converter_OCR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nafisarkar%2FPdf_Converter_OCR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nafisa
rkar%2FPdf_Converter_OCR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nafisarkar%2FPdf_Converter_OCR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Nafisarkar","download_url":"https://codeload.github.com/Nafisarkar/Pdf_Converter_OCR/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nafisarkar%2FPdf_Converter_OCR/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34918101,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-29T02:00:05.398Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["image","image-processing","machine-learning","ocr","pdf","text-extraction","tkinter-gui"],"created_at":"2025-03-03T05:15:35.780Z","updated_at":"2026-06-29T07:31:43.766Z","avatar_url":"https://github.com/Nafisarkar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Alt text](Tools.png)\n\n\u003chr\u003e\u003ch1\u003eOCR and PDF Helper - SAKUNO\u003c/h1\u003e\u003cp\u003eThis is a graphical tool for performing Optical Character Recognition (OCR) on images and converting PDF files to images. Additionally, it allows for merging text files within a selected folder. 
The tool is built using \u003ccode\u003eCustomTkinter\u003c/code\u003e for the GUI, \u003ccode\u003eEasyOCR\u003c/code\u003e for OCR, \u003ccode\u003epypdfium2\u003c/code\u003e for PDF manipulation, and \u003ccode\u003ePillow\u003c/code\u003e for image handling.\u003c/p\u003e\u003ch2\u003eTable of Contents\u003c/h2\u003e\u003cul\u003e\u003cli\u003e\u003ca rel=\"noopener\" href=\"#features\"\u003eFeatures\u003c/a\u003e\u003c/li\u003e\u003cli\u003e\u003ca rel=\"noopener\" href=\"#installation\"\u003eInstallation\u003c/a\u003e\u003c/li\u003e\u003cli\u003e\u003ca rel=\"noopener\" href=\"#usage\"\u003eUsage\u003c/a\u003e\u003cul\u003e\u003cli\u003e\u003ca rel=\"noopener\" href=\"#pdf-conversion\"\u003ePDF Conversion\u003c/a\u003e\u003c/li\u003e\u003cli\u003e\u003ca rel=\"noopener\" href=\"#ocr-on-images\"\u003eOCR on Images\u003c/a\u003e\u003c/li\u003e\u003cli\u003e\u003ca rel=\"noopener\" href=\"#merging-text-files\"\u003eMerging Text Files\u003c/a\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/li\u003e\u003cli\u003e\u003ca rel=\"noopener\" href=\"#contributing\"\u003eContributing\u003c/a\u003e\u003c/li\u003e\u003cli\u003e\u003ca rel=\"noopener\" href=\"#license\"\u003eLicense\u003c/a\u003e\u003c/li\u003e\u003cli\u003e\u003ca rel=\"noopener\" href=\"#author\"\u003eAuthor\u003c/a\u003e\u003c/li\u003e\u003c/ul\u003e\u003ch2\u003eFeatures\u003c/h2\u003e\u003cul\u003e\u003cli\u003e\u003cstrong\u003ePDF to Image Conversion\u003c/strong\u003e: Convert PDF files into images, with adjustable DPI settings for image quality.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eOCR on Images\u003c/strong\u003e: Perform OCR on images in a selected folder to extract text and save it as \u003ccode\u003e.txt\u003c/code\u003e files.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eMerge Text Files\u003c/strong\u003e: Merge all text files in a folder into a single text file.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eUser-friendly GUI\u003c/strong\u003e: Built with 
\u003ccode\u003eCustomTkinter\u003c/code\u003e, making it easy to navigate.\u003c/li\u003e\u003c/ul\u003e\u003ch2\u003eInstallation\u003c/h2\u003e\u003cp\u003eTo run this project, you need to have Python installed. Follow these steps to set it up:\u003c/p\u003e\u003col\u003e\u003cli\u003e\u003cp\u003eClone the repository:\u003c/p\u003e\u003cpre class=\"!overflow-visible\"\u003e\u003cdiv class=\"dark bg-gray-950 contain-inline-size rounded-md border-[0.5px] border-token-border-medium relative\"\u003e\u003cdiv class=\"overflow-y-auto p-4\" dir=\"ltr\"\u003e\u003ccode class=\"!whitespace-pre hljs language-bash\"\u003egit \u003cspan 
class=\"hljs-built_in\"\u003eclone\u003c/span\u003e https://github.com/Nafisarkar/Pdf_Converter_OCR.git\n\u003cspan class=\"hljs-built_in\"\u003ecd\u003c/span\u003e Pdf_Converter_OCR\n\u003c/code\u003e\u003c/div\u003e\u003c/div\u003e\u003c/pre\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eInstall the required dependencies:\u003c/p\u003e\u003cpre class=\"!overflow-visible\"\u003e\u003cdiv class=\"dark bg-gray-950 contain-inline-size rounded-md border-[0.5px] border-token-border-medium relative\"\u003e\u003cdiv class=\"overflow-y-auto p-4\" dir=\"ltr\"\u003e\u003ccode class=\"!whitespace-pre hljs language-bash\"\u003epip install customtkinter pypdfium2 Pillow 
easyocr\n\u003c/code\u003e\u003c/div\u003e\u003c/div\u003e\u003c/pre\u003e\u003cp\u003eYou may need additional libraries like \u003ccode\u003epytorch\u003c/code\u003e for \u003ccode\u003eEasyOCR\u003c/code\u003e depending on your system.\u003c/p\u003e\u003c/li\u003e\u003c/ol\u003e\u003ch2\u003eUsage\u003c/h2\u003e\u003cp\u003eOnce installed, you can run the program directly using Python. The interface provides buttons and options for performing the tasks mentioned below.\u003c/p\u003e\u003ch3\u003ePDF Conversion\u003c/h3\u003e\u003col\u003e\u003cli\u003e\u003cstrong\u003eFile Selector\u003c/strong\u003e: Choose a PDF file that you want to convert into images.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eSet DPI\u003c/strong\u003e: Adjust the DPI (dots per inch) for image quality (default is 100%).\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eConvert\u003c/strong\u003e: Convert the PDF into images. The images will be saved in a new folder named after the PDF.\u003c/li\u003e\u003c/ol\u003e\u003ch3\u003eOCR on Images\u003c/h3\u003e\u003col\u003e\u003cli\u003e\u003cstrong\u003eFolder Selector\u003c/strong\u003e: Select a folder containing images on which OCR should be performed.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eSet OCR Language\u003c/strong\u003e: Input the languages for OCR in a comma-separated format (e.g., \u003ccode\u003eeng,bn\u003c/code\u003e for English and Bengali).\u003c/li\u003e\u003cli\u003e\u003cstrong\u003ePerform OCR\u003c/strong\u003e: The tool will scan each image, extract text, and save it as a \u003ccode\u003e.txt\u003c/code\u003e file in the same folder.\u003c/li\u003e\u003c/ol\u003e\u003ch3\u003eMerging Text Files\u003c/h3\u003e\u003col\u003e\u003cli\u003e\u003cstrong\u003eFolder Selector\u003c/strong\u003e: Select a folder that contains multiple \u003ccode\u003e.txt\u003c/code\u003e files.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eMerge All Text Files\u003c/strong\u003e: Click the \"Merge All the Text Files\" button to combine 
all the \u003ccode\u003e.txt\u003c/code\u003e files in the folder into one single file.\u003c/li\u003e\u003c/ol\u003e\u003ch3\u003eGUI Overview\u003c/h3\u003e\u003cul\u003e\u003cli\u003e\u003cstrong\u003ePDF Path\u003c/strong\u003e: Displays the selected PDF file path.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eImage Preview\u003c/strong\u003e: After PDF to image conversion, the preview of the first image will be displayed.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eOCR and Merge Options\u003c/strong\u003e: Available after selecting a folder for OCR and text merging.\u003c/li\u003e\u003c/ul\u003e\u003ch2\u003eContributing\u003c/h2\u003e\u003cp\u003eContributions are welcome! Feel free to fork this repository, make changes, and submit a pull request.\u003c/p\u003e\u003ch3\u003eSteps:\u003c/h3\u003e\u003col\u003e\u003cli\u003eFork the repository.\u003c/li\u003e\u003cli\u003eCreate a new branch (\u003ccode\u003egit checkout -b feature/your-feature-name\u003c/code\u003e).\u003c/li\u003e\u003cli\u003eCommit your changes (\u003ccode\u003egit commit -m 'Add some feature'\u003c/code\u003e).\u003c/li\u003e\u003cli\u003ePush to the branch (\u003ccode\u003egit push origin feature/your-feature-name\u003c/code\u003e).\u003c/li\u003e\u003cli\u003eOpen a pull request.\u003c/li\u003e\u003c/ol\u003e\u003ch2\u003eLicense\u003c/h2\u003e\u003cp\u003eThis project is licensed under the MIT License. See the \u003ca rel=\"noopener\"\u003eLICENSE\u003c/a\u003e file for details.\u003c/p\u003e\u003ch2\u003eAuthor\u003c/h2\u003e\u003cp\u003eDeveloped by \u003cstrong\u003eShaon An Nafi\u003c/strong\u003e.\u003cbr\u003eFeel free to reach out for any questions or suggestions.\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnafisarkar%2Fpdf_converter_ocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnafisarkar%2Fpdf_converter_ocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnafisarkar%2Fpdf_converter_ocr/lists"}