https://github.com/ranger-nf/page-reader

Simple python script to extract texts from images and PDFs using Tesseract-OCR

cli multiple-languages python-script tesseract-ocr



Host: GitHub
URL: https://github.com/ranger-nf/page-reader
Owner: Ranger-NF
License: gpl-3.0
Created: 2022-04-30T16:48:33.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2024-01-16T14:50:04.000Z (over 2 years ago)
Last Synced: 2025-01-23T14:42:17.032Z (over 1 year ago)
Topics: cli, multiple-languages, python-script, tesseract-ocr
Language: Python
Homepage:
Size: 6.9 MB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# Page Reader :page_facing_up:

A Python script that helps you with text in **images & PDFs** :snake:

## Features :rocket:

- Extract text from images and PDFs and print it to the terminal

- Save extracted text as an audio file (needs internet)

- More options are offered when you run the script
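
As a rough illustration of the first feature, here is a minimal sketch of how text extraction from images and PDFs could be done with Tesseract from Python. It is not the script's actual code: the use of `pytesseract`, `Pillow`, and `pdf2image`, as well as the names `classify_input` and `extract_text`, are assumptions made for this example.

```python
from pathlib import Path

# Suffixes treated as images in this sketch (an assumption, not the project's list).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def classify_input(path):
    """Return 'pdf', 'image', or 'unsupported' based on the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    return "unsupported"


def extract_text(path, lang="eng"):
    """Run OCR on an image or on every page of a PDF and return the text.

    Third-party imports are deferred so that the module can be imported
    even when the OCR packages are not installed.
    """
    kind = classify_input(path)
    if kind == "unsupported":
        raise ValueError(f"Unsupported file type: {path}")

    import pytesseract  # assumed OCR wrapper around the Tesseract binary

    if kind == "image":
        from PIL import Image

        return pytesseract.image_to_string(Image.open(path), lang=lang)

    # PDF: render each page to an image first, then OCR page by page.
    from pdf2image import convert_from_path  # assumed PDF renderer

    pages = convert_from_path(path)
    return "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)
```

With the assumed packages and Tesseract installed, `print(extract_text("scan.png"))` would print the recognized text to the terminal; `lang` takes a Tesseract language code such as `eng`.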

## Usage

1. Clone this repo

2. Install all the dependencies listed in [requirements.txt](requirements.txt)

3. Install [Tesseract-OCR](https://github.com/tesseract-ocr/tesseract) on your system

4. Add the local path of your [tesseract](https://github.com/tesseract-ocr/tesseract) to [config.ini](config.ini)

5. Run `python3 main.py` and choose from the options!
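
Step 4 depends on the layout of `config.ini`, which is not documented here. The sketch below shows one way such a path could be read with the standard library's `configparser`. The section name `tesseract`, the option name `path`, and the function name `read_tesseract_path` are hypothetical; check the repository's `config.ini` for the real keys.

```python
import configparser


def read_tesseract_path(config_file="config.ini", section="tesseract", option="path"):
    """Return the configured Tesseract path, or None if it is not set.

    The section and option names are placeholders for this example,
    not the keys the project necessarily uses.
    """
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist.
    parser.read(config_file)
    return parser.get(section, option, fallback=None)
```

If the script uses `pytesseract` (an assumption), the returned value could be assigned to `pytesseract.pytesseract.tesseract_cmd` before running OCR.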

## Future Goals :dart:

- [ ] Make CLI more beautiful

- [ ] Refactor code to be more readable
