https://github.com/nafisarkar/pdf_converter_ocr

This is a graphical tool for performing Optical Character Recognition (OCR) on images and converting PDF files to images

Host: GitHub
URL: https://github.com/nafisarkar/pdf_converter_ocr
Owner: Nafisarkar
Created: 2024-09-13T17:53:51.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-10-22T14:50:19.000Z (over 1 year ago)
Last Synced: 2025-03-10T06:00:45.799Z (over 1 year ago)
Topics: image, image-processing, machine-learning, ocr, pdf, text-extraction, tkinter-gui
Language: Python
Homepage:
Size: 209 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

![Alt text](Tools.png)

OCR and PDF Helper - SAKUNO

This is a graphical tool for performing Optical Character Recognition (OCR) on images and converting PDF files to images. Additionally, it allows for merging text files within a selected folder. The tool is built using CustomTkinter for the GUI, EasyOCR for OCR, pypdfium2 for PDF manipulation, and Pillow for image handling.

Features

Installation

Usage

Contributing

License

Author

Features

PDF to Image Conversion: Convert PDF files into images, with adjustable DPI settings for image quality.

OCR on Images: Perform OCR on images in a selected folder to extract text and save it as .txt files.

Merge Text Files: Merge all text files in a folder into a single text file.

User-friendly GUI: Built with CustomTkinter, making it easy to navigate.

Installation

To run this project, you need to have Python installed. Follow these steps to set it up:

Clone the repository:

```bash
git clone https://github.com/nafisarkar/pdf_converter_ocr.git
cd pdf_converter_ocr
```

Install the required dependencies:
```bash
pip install customtkinter pypdfium2 Pillow easyocr
```
Depending on your system, EasyOCR may need additional libraries such as PyTorch.
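
To confirm that the dependencies are importable before launching the tool, a small check along these lines may help. This is a sketch, not part of the project: `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the list assumes the four packages named in the install command (Pillow is imported as `PIL`).

```python
from importlib.util import find_spec

# Import names for the packages in the pip command above.
# Pillow is the one case where the import name (PIL) differs from the pip name.
REQUIRED_MODULES = ["customtkinter", "pypdfium2", "PIL", "easyocr"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```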

Usage

Once installed, you can run the program directly using Python. The interface provides buttons and options for performing the tasks mentioned below.

PDF Conversion

File Selector: Choose a PDF file that you want to convert into images.

Set DPI: Adjust the DPI (dots per inch) for image quality (default is 100%).

Convert: Convert the PDF into images. The images will be saved in a new folder named after the PDF.
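
The conversion step can be sketched with pypdfium2 roughly as follows. This is an illustration under assumptions, not the tool's actual code: the function names, the `page_001.png` file naming, and the PNG format are hypothetical. PDF page sizes are measured in points (1/72 inch), so a target DPI corresponds to a render scale of `dpi / 72`.

```python
from pathlib import Path

POINTS_PER_INCH = 72  # PDF page sizes are expressed in points (1/72 inch)


def dpi_to_scale(dpi):
    """Convert a target DPI into a render scale factor (72 DPI -> 1.0)."""
    return dpi / POINTS_PER_INCH


def output_dir_for(pdf_path):
    """Return a folder next to the PDF, named after the PDF without its extension."""
    return Path(pdf_path).with_suffix("")


def convert_pdf_to_images(pdf_path, dpi):
    """Render every page of the PDF to a PNG file and return the written paths."""
    # Imported lazily so the helpers above work without the library installed.
    import pypdfium2 as pdfium

    out_dir = output_dir_for(pdf_path)
    out_dir.mkdir(parents=True, exist_ok=True)

    pdf = pdfium.PdfDocument(str(pdf_path))
    written = []
    try:
        for index in range(len(pdf)):
            page = pdf[index]
            # render() returns a bitmap; to_pil() converts it to a Pillow image
            image = page.render(scale=dpi_to_scale(dpi)).to_pil()
            target = out_dir / f"page_{index + 1:03d}.png"  # hypothetical naming
            image.save(target)
            written.append(target)
    finally:
        pdf.close()
    return written
```

For example, `convert_pdf_to_images("report.pdf", dpi=144)` would render each page at twice its base size and write the images to a `report` folder beside the PDF.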

OCR on Images

Folder Selector: Select a folder containing images on which OCR should be performed.

Set OCR Language: Enter the OCR languages as a comma-separated list of EasyOCR language codes (e.g., en,bn for English and Bengali).

Perform OCR: The tool will scan each image, extract text, and save it as a .txt file in the same folder.
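
A minimal sketch of the OCR step with EasyOCR might look like this. It is an assumption about how such a step could be written, not the project's implementation: `parse_languages`, `find_images`, `ocr_folder`, and the set of image extensions are hypothetical. EasyOCR expects its own language codes (for example `en` for English), and the first run downloads model files.

```python
from pathlib import Path

# Assumed set of image extensions; the tool may accept a different set.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def parse_languages(text):
    """Split a comma-separated language string into a clean list of codes."""
    return [code.strip() for code in text.split(",") if code.strip()]


def find_images(folder):
    """Return image files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_folder(folder, languages):
    """Run OCR on each image and write the text to a .txt file beside it."""
    # Imported lazily so the helpers above work without the library installed.
    import easyocr

    reader = easyocr.Reader(parse_languages(languages))
    written = []
    for image_path in find_images(folder):
        # detail=0 makes readtext return only the recognized strings
        lines = reader.readtext(str(image_path), detail=0)
        text_path = image_path.with_suffix(".txt")
        text_path.write_text("\n".join(lines), encoding="utf-8")
        written.append(text_path)
    return written
```

Creating the `Reader` once and reusing it for every image avoids reloading the models for each file.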

Merging Text Files

Folder Selector: Select a folder that contains multiple .txt files.

Merge All Text Files: Click the "Merge All the Text Files" button to combine all the .txt files in the folder into one single file.
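
The merge step needs only the standard library. The sketch below assumes the files are joined in name order and written to `merged.txt`; both the function name and the output file name are hypothetical, since the README does not state what the tool uses.

```python
from pathlib import Path


def merge_text_files(folder, output_name="merged.txt"):
    """Concatenate every .txt file in `folder` into one file and return its path."""
    folder = Path(folder)
    output_path = folder / output_name
    # Skip the output file itself so a second run does not merge it into itself.
    sources = sorted(
        p for p in folder.glob("*.txt") if p.name != output_name
    )
    with output_path.open("w", encoding="utf-8") as merged:
        for source in sources:
            merged.write(source.read_text(encoding="utf-8"))
            merged.write("\n")  # separate consecutive files with a newline
    return output_path
```

Name-order sorting places `page_10.txt` before `page_2.txt`, so zero-padded file names (such as `page_002.txt`) keep pages in reading order.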

GUI Overview

PDF Path: Displays the selected PDF file path.

Image Preview: After PDF-to-image conversion, a preview of the first image is displayed.

OCR and Merge Options: Available after selecting a folder for OCR and text merging.

Contributing

Contributions are welcome! Feel free to fork this repository, make changes, and submit a pull request.

Steps:

Fork the repository.

Create a new branch (git checkout -b feature/your-feature-name).

Commit your changes (git commit -m 'Add some feature').

Push to the branch (git push origin feature/your-feature-name).

Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Developed by Shaon An Nafi.
Feel free to reach out for any questions or suggestions.
