# allin1_ocr
Choose from PaddleOCR, python-doctr, or Tesseract to perform OCR.

**Installation:**

```bash
git clone https://github.com/maylad31/allin1_ocr.git
cd allin1_ocr
pip install -r requirements.txt
```

To use Tesseract, you also need to install the Tesseract engine itself (these commands are for Debian/Ubuntu):

```bash
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
```
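Before choosing `--ocr tesseract`, it can help to confirm that the `tesseract` binary is actually on your `PATH`. The snippet below is a small standard-library sketch for that check; it is not part of the repository, and the function name `tesseract_version` is purely illustrative.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if the binary is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    proc = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some Tesseract releases print the version banner to stderr, so fall back to it
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version or "tesseract not found on PATH")
```

If this prints "tesseract not found on PATH", the apt packages above are missing or the binary lives somewhere your shell does not search.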

Tested with Python 3.8 on Linux.

**How to run:**
```bash
python app.py --dir <directory_path> --ocr paddle
```

`--ocr` accepts `paddle`, `doctr`, or `tesseract`.
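`app.py` itself is not shown here, so the following is only a hypothetical sketch of how the `--dir` and `--ocr` options described above could be parsed with `argparse`. Both options are marked required in this sketch; the real script's defaults and help text may differ.

```python
import argparse

# The three backends the README lists for --ocr
OCR_CHOICES = ("paddle", "doctr", "tesseract")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Run OCR on every supported file in a directory."
    )
    parser.add_argument("--dir", required=True,
                        help="directory containing the files to OCR")
    # choices= makes argparse reject any backend name not in the list
    parser.add_argument("--ocr", required=True, choices=OCR_CHOICES,
                        help="OCR backend to use")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Running {args.ocr} on {args.dir}")
```

Using `choices=` means a typo such as `--ocr tesseact` fails immediately with a usage message instead of partway through a batch. If the directory path contains spaces, quote it on the command line.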

Performs OCR on every file in the directory and saves each result to a corresponding text file. Supported file types are `.pdf`, `.png`, `.jpeg`, and `.jpg`.
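The batch behaviour described above boils down to "find the supported files, then write one text file per input". Here is a minimal sketch of that file handling, assuming the output sits next to each input with the extension swapped for `.txt`; the repository may name or place its outputs differently, and `iter_ocr_jobs` / `save_text` are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterator, Tuple

# Extensions listed in the README; compared case-insensitively
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpeg", ".jpg"}


def iter_ocr_jobs(directory: str) -> Iterator[Tuple[Path, Path]]:
    """Yield (input_file, output_txt) pairs for supported files directly inside `directory`."""
    for path in sorted(Path(directory).iterdir()):
        # Skip subdirectories and unsupported file types
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            # Assumption: scan.pdf -> scan.txt in the same directory
            yield path, path.with_suffix(".txt")


def save_text(output_path: Path, text: str) -> None:
    """Write the extracted text as UTF-8."""
    output_path.write_text(text, encoding="utf-8")
```

One consequence of this naming scheme: `page.pdf` and `page.png` in the same folder would both map to `page.txt`, so the second result overwrites the first. Appending `.txt` to the full file name (`page.pdf.txt`) avoids that if it matters for your data.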

If you ask me, PaddleOCR is fast and reasonably accurate; doctr is good too.

You are welcome to add support for other OCR libraries.
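One possible way to make adding a library a small change, sketched here as a hypothetical pattern rather than the repository's actual layout, is a registry that maps each `--ocr` name to a function taking a file path and returning text. `OCR_BACKENDS`, `register_backend`, and `run_ocr` are illustrative names, and the Tesseract entry assumes the `pytesseract` and Pillow packages (it handles image files only; PDF pages would need to be rasterized first).

```python
from typing import Callable, Dict

OcrFunc = Callable[[str], str]

# Hypothetical registry: --ocr name -> function(path) returning extracted text
OCR_BACKENDS: Dict[str, OcrFunc] = {}


def register_backend(name: str) -> Callable[[OcrFunc], OcrFunc]:
    """Decorator that adds an OCR function to the registry under `name`."""
    def decorator(func: OcrFunc) -> OcrFunc:
        if name in OCR_BACKENDS:
            raise ValueError(f"backend {name!r} is already registered")
        OCR_BACKENDS[name] = func
        return func
    return decorator


def run_ocr(name: str, path: str) -> str:
    """Dispatch to the backend registered under `name`."""
    try:
        backend = OCR_BACKENDS[name]
    except KeyError:
        raise ValueError(
            f"unknown OCR backend {name!r}; choose from {sorted(OCR_BACKENDS)}"
        ) from None
    return backend(path)


@register_backend("tesseract")
def tesseract_backend(path: str) -> str:
    # Imported lazily so users without this backend can still run the others
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image)
```

With this shape, a new library is one decorated function; the CLI's list of valid `--ocr` values could then be built from `sorted(OCR_BACKENDS)` instead of being hard-coded.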

I'm always looking for opportunities to enhance my skills; contact me at mynameladdha@gmail.com.