https://github.com/ranger-nf/page-reader
Simple python script to extract texts from images and PDFs using Tesseract-OCR
- Host: GitHub
- URL: https://github.com/ranger-nf/page-reader
- Owner: Ranger-NF
- License: gpl-3.0
- Created: 2022-04-30T16:48:33.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-01-16T14:50:04.000Z (about 2 years ago)
- Last Synced: 2025-01-23T14:42:17.032Z (about 1 year ago)
- Topics: cli, multiple-languages, python-script, tesseract-ocr
- Language: Python
- Homepage:
- Size: 6.9 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Page Reader :page_facing_up:
A Python script that helps you with text in **images & PDFs** :snake:
## Features :rocket:
- Extract text from images and PDFs and print it to the terminal
- Save extracted text as an audio file (requires an internet connection)
- More options are listed when you run the script
## Usage
1. Clone this repo
2. Install the dependencies listed in [requirements.txt](requirements.txt)
3. Install [Tesseract-OCR](https://github.com/tesseract-ocr/tesseract) on your system
4. Add the local path of your [Tesseract](https://github.com/tesseract-ocr/tesseract) installation to [config.ini](config.ini)
5. Run `python3 main.py` and choose from the options!
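
Step 4 is the one most likely to go wrong, so here is a minimal sketch of how a script could read the Tesseract path from `config.ini` and pass it to an OCR call. It is an illustration only, not the code in `main.py`: the `[tesseract]` section, the `path` key, and the helper names `read_tesseract_path` and `extract_text` are hypothetical, and the use of `pytesseract` and Pillow is an assumption (check `requirements.txt` for the packages the script really uses).

```python
import configparser


def read_tesseract_path(config_file="config.ini"):
    """Return the Tesseract path stored in config_file, or None if absent.

    The [tesseract] section and `path` key are hypothetical names; check the
    repository's config.ini for the ones it actually uses.
    """
    # interpolation=None keeps '%' characters in paths from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_file)  # a missing file is skipped silently
    value = parser.get("tesseract", "path", fallback=None)
    if value is None or not value.strip():
        return None
    return value.strip()


def extract_text(image_path, tesseract_path=None, lang="eng"):
    """OCR a single image via pytesseract (an assumed dependency)."""
    # Imported lazily so the config helper works without the OCR packages.
    import pytesseract
    from PIL import Image

    if tesseract_path:
        # Point pytesseract at the executable named in config.ini.
        pytesseract.pytesseract.tesseract_cmd = tesseract_path
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

With Tesseract installed, `extract_text("page.png", read_tesseract_path())` would return the recognized text as a string; `page.png` is a placeholder file name. If `read_tesseract_path` returns `None`, `pytesseract` falls back to looking for `tesseract` on your `PATH`.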
## Future Goals :dart:
- [ ] Make CLI more beautiful
- [ ] Refactor code to be more readable