https://github.com/yjg30737/pyqt-pdf2text
Converting PDF or Images into text file from PyQt with Tesseract and PyPDF2
- Host: GitHub
- URL: https://github.com/yjg30737/pyqt-pdf2text
- Owner: yjg30737
- License: mit
- Created: 2023-09-14T05:07:39.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-14T05:16:32.000Z (over 1 year ago)
- Last Synced: 2024-01-26T10:12:17.544Z (11 months ago)
- Topics: ocr, pdf-converter, pdf2image, pypdf2, pyqt, pyqt5, pytesseract, tesseract
- Language: C
- Homepage:
- Size: 12.4 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# pyqt-pdf2text
Convert PDFs or images to text files from a PyQt GUI, using Tesseract and PyPDF2.

## Requirements
* PyPDF2
* pytesseract
* pdf2image
* PyQt5>=5.14
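
As a quick sanity check after installing, the sketch below reports which of the packages listed above cannot be imported. The import names in `REQUIRED_MODULES` are assumed to match the package names, and `missing_modules` is a hypothetical helper, not part of this repository. It does not check the `PyQt5>=5.14` version constraint.

```python
from importlib.util import find_spec

# Assumed top-level import names for the packages listed above.
REQUIRED_MODULES = ["PyPDF2", "pytesseract", "pdf2image", "PyQt5"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```
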
Poppler is already included. (As of September 14, 2020, it is the latest version.)

## Note
The current GUI uses Tesseract only for image-to-text conversion, not for PDF-to-text conversion. Tesseract-based PDF conversion is implemented in script.py, so feel free to use it from there if you'd like.

## How to install
1. Install Tesseract from Google.
2. Add the Tesseract installation path to your PATH environment variable.
3. git clone https://github.com/yjg30737/pyqt-pdf2text and cd into the cloned directory
4. pip install -r requirements.txt
5. python main.py

## Preview
![image](https://github.com/yjg30737/pyqt-pdf2text/assets/55078043/021bfa73-3d68-4f59-9fbf-b4a2b907d2c5)
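
The Note above says that Tesseract can also be applied to PDFs outside the GUI. The sketch below shows one common way to do that with the listed requirements: render each page with pdf2image, then run pytesseract on the page images. It is not the repository's script.py. The function names (`tesseract_available`, `join_pages`, `pdf_to_text_ocr`) and the defaults `dpi=300` and `lang="eng"` are assumptions made for illustration.

```python
import shutil


def tesseract_available() -> bool:
    """Return True if a `tesseract` executable is found on PATH."""
    return shutil.which("tesseract") is not None


def join_pages(page_texts):
    """Join per-page OCR output, putting a marker line before each page."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        parts.append(f"--- page {number} ---\n{text.strip()}")
    return "\n\n".join(parts)


def pdf_to_text_ocr(pdf_path, dpi=300, lang="eng"):
    """OCR every page of a PDF: render with pdf2image, read with pytesseract."""
    # Imported lazily so the helpers above work without the OCR packages.
    from pdf2image import convert_from_path
    import pytesseract

    if not tesseract_available():
        raise RuntimeError("tesseract was not found on PATH (see install step 2)")

    # convert_from_path relies on Poppler; pass poppler_path=... if the
    # Poppler binaries are not on PATH (hypothetical location, adjust as needed).
    images = convert_from_path(str(pdf_path), dpi=dpi)
    return join_pages(pytesseract.image_to_string(img, lang=lang) for img in images)
```

Calling `pdf_to_text_ocr("scan.pdf")` on a hypothetical file `scan.pdf` returns the recognized text as one string, which can then be written to a `.txt` file. OCR is slower than direct text extraction, so it is mainly worth using on scanned PDFs that contain no text layer.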