
https://github.com/ranger-nf/page-reader

Simple Python script to extract text from images and PDFs using Tesseract-OCR



# Page Reader :page_facing_up:
A Python script that extracts text from **images & PDFs** :snake:

### Features :rocket:
- Extract text from images and PDFs and print it to the terminal
- Save extracted text as an audio file (needs internet)
- More options are offered in the menu shown when you run the script
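The features above come down to running Tesseract on each image (or on each rendered page of a PDF) and, optionally, passing the result to a text-to-speech service. The following is a minimal sketch, not the project's own code: it assumes the third-party packages `pytesseract`, `Pillow`, `pdf2image` (which needs Poppler) and `gTTS`. The choice of gTTS is only a guess prompted by the "needs internet" note, and the names `input_kind`, `extract_text`, `save_as_audio` and `IMAGE_SUFFIXES` are hypothetical.

```python
from pathlib import Path

# Hypothetical set of image types handed to Tesseract; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".gif", ".webp"}


def input_kind(path: str) -> str:
    """Classify an input file as 'pdf' or 'image' by its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"unsupported file type: {path}")


def extract_text(path: str, lang: str = "eng") -> str:
    """OCR an image, or every page of a PDF, and return the recognised text."""
    kind = input_kind(path)  # fail early on unsupported files

    import pytesseract  # third-party wrapper around the Tesseract binary
    from PIL import Image

    if kind == "image":
        with Image.open(path) as img:
            return pytesseract.image_to_string(img, lang=lang)

    from pdf2image import convert_from_path  # requires Poppler on the system

    # Render each PDF page to a PIL image, then OCR the pages in order.
    pages = convert_from_path(path)
    return "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)


def save_as_audio(text: str, out_file: str, lang: str = "en") -> None:
    """Convert text to speech with gTTS, which calls an online TTS service."""
    if not text.strip():
        raise ValueError("no text to convert")

    from gtts import gTTS  # third-party; needs an internet connection

    gTTS(text=text, lang=lang).save(out_file)
```

Third-party imports sit inside the functions so that `input_kind` works without the OCR stack installed. Note that Tesseract uses three-letter language codes (`eng`) while gTTS uses two-letter ones (`en`); the matching Tesseract language data must be installed for any `lang` you pass.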

## Usage

1. Clone this repo
2. Install the dependencies listed in [requirements.txt](requirements.txt), e.g. with `pip install -r requirements.txt`
3. Install [Tesseract-OCR](https://github.com/tesseract-ocr/tesseract) on your system
4. Add the local path to your [Tesseract](https://github.com/tesseract-ocr/tesseract) executable to [config.ini](config.ini)
5. Run `python3 main.py` and choose from the options!
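Step 4 only says that `config.ini` stores the Tesseract path. One way such a file could be read is sketched below with the standard-library `configparser`, handing the result to `pytesseract` through its `pytesseract.pytesseract.tesseract_cmd` setting. The `[tesseract]` section, the `path` key and the functions `parse_tesseract_path` and `configure_tesseract` are hypothetical; the actual layout of the project's `config.ini` may differ.

```python
import configparser
from pathlib import Path

# Hypothetical layout; the project's real config.ini may use other names.
SECTION = "tesseract"
KEY = "path"


def parse_tesseract_path(ini_text: str) -> str:
    """Return the Tesseract executable path from INI-formatted text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    try:
        return parser[SECTION][KEY]
    except KeyError as exc:
        raise KeyError(f"config needs a [{SECTION}] section with a '{KEY}' key") from exc


def configure_tesseract(config_file: str = "config.ini") -> str:
    """Read the config file and point pytesseract at the configured binary."""
    path = parse_tesseract_path(Path(config_file).read_text(encoding="utf-8"))

    import pytesseract  # imported lazily so parsing works without it installed

    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

For example, a file containing `[tesseract]` followed by `path = C:\Program Files\Tesseract-OCR\tesseract.exe` would make `configure_tesseract()` return that Windows path. `ConfigParser`'s default interpolation treats `%` specially, so a path containing `%` would need it written as `%%`.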

## Future Goals :dart:
- [ ] Make CLI more beautiful
- [ ] Refactor code to be more readable