https://github.com/maylad31/allin1_ocr
OCR
ocr paddleocr python tesseract
- Host: GitHub
- URL: https://github.com/maylad31/allin1_ocr
- Owner: maylad31
- Created: 2021-06-04T09:54:17.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2022-08-05T14:14:47.000Z (almost 4 years ago)
- Last Synced: 2025-01-11T13:54:07.494Z (over 1 year ago)
- Topics: ocr, paddleocr, python, tesseract
- Language: Python
- Homepage:
- Size: 42 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# allin1_ocr
Choose from PaddleOCR, python-doctr, or Tesseract to perform OCR.
**Installation:**
```bash
git clone https://github.com/maylad31/allin1_ocr.git
cd allin1_ocr
pip install -r requirements.txt
```
To use the `tesseract` backend, you also need to install Tesseract itself:
```bash
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
```
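Because the `tesseract` backend relies on the system binary installed with the apt commands above rather than on a pip package, it can help to confirm the binary is on `PATH` before running. This is a minimal sketch using only the standard library; the helper name `tesseract_available` is hypothetical and not part of this repository.

```python
import shutil
import subprocess


def tesseract_available() -> bool:
    """Hypothetical helper: True if a `tesseract` executable is found on PATH."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    if tesseract_available():
        # `tesseract --version` prints version info; some older releases may write it to stderr.
        result = subprocess.run(["tesseract", "--version"], capture_output=True, text=True)
        output = (result.stdout or result.stderr).strip()
        print(output.splitlines()[0] if output else "tesseract found")
    else:
        print("tesseract not found; install it with apt first")
```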
Tested with Python 3.8 on Linux.
**How to run:**
```bash
python app.py --dir <directory_path> --ocr paddle
```
`--ocr` accepts `paddle`, `doctr`, or `tesseract`.
The script performs OCR on every file in the directory and saves each result to a corresponding text file. Supported input formats are PDF, PNG, JPEG, and JPG.
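The behaviour described above (a `--dir`/`--ocr` command line that runs OCR on every supported file in a directory and writes one text file per input) can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's `app.py`: the OCR backend is passed in as a plain function so the sketch runs without any OCR library installed, PDF handling (for example rendering pages to images) is left to that function, and the names `SUPPORTED_EXTENSIONS`, `process_directory`, and the `<input name>.txt` output naming are hypothetical.

```python
import argparse
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Input formats listed in the README.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpeg", ".jpg"}
BACKENDS = ("paddle", "doctr", "tesseract")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the two flags shown in the README."""
    parser = argparse.ArgumentParser(description="Run OCR on every file in a directory.")
    parser.add_argument("--dir", required=True, help="directory containing the input files")
    parser.add_argument("--ocr", required=True, choices=BACKENDS, help="OCR backend to use")
    return parser.parse_args(argv)


def process_directory(directory: str, ocr_fn: Callable[[Path], str]) -> Dict[str, Path]:
    """Run ocr_fn on each supported file and write its text next to the input.

    Hypothetical naming: scan.png -> scan.png.txt, so scan.png and scan.jpg
    do not overwrite each other. Returns {input file name: output path}.
    """
    written: Dict[str, Path] = {}
    # sorted() materializes the listing, so .txt files written below are not revisited.
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            out_path = path.with_name(path.name + ".txt")
            out_path.write_text(ocr_fn(path), encoding="utf-8")
            written[path.name] = out_path
    return written
```

In a full script, `process_directory(args.dir, backend)` would be called with a `backend` function that wraps whichever library `args.ocr` selects.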
If you ask me, PaddleOCR is fast and reasonably accurate. docTR is good too.
Contributions adding support for other OCR libraries are welcome.
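One way to make adding another library straightforward is a small registry that maps each `--ocr` name to a backend function. The sketch below is an assumed design, not necessarily how this repository is structured; `register_backend`, `get_backend`, and the placeholder `echo` backend are hypothetical, and a real backend would wrap PaddleOCR, docTR, or Tesseract calls.

```python
from pathlib import Path
from typing import Callable, Dict

# Each backend takes a file path and returns the recognized text.
OcrBackend = Callable[[Path], str]
_REGISTRY: Dict[str, OcrBackend] = {}


def register_backend(name: str) -> Callable[[OcrBackend], OcrBackend]:
    """Decorator that stores a backend under the given --ocr name."""
    def decorator(fn: OcrBackend) -> OcrBackend:
        if name in _REGISTRY:
            raise ValueError(f"backend {name!r} is already registered")
        _REGISTRY[name] = fn
        return fn
    return decorator


def get_backend(name: str) -> OcrBackend:
    """Look up a registered backend, listing the valid names on failure."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown OCR backend {name!r}; choose from {sorted(_REGISTRY)}") from None


@register_backend("echo")
def echo_backend(path: Path) -> str:
    # Hypothetical placeholder: returns the file name instead of running OCR.
    return path.name
```

With this layout, supporting a new library means writing one decorated function; the command-line choices could then be derived from the registry keys instead of a hard-coded list.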
I'm always looking for opportunities to enhance my skills; contact me at mynameladdha@gmail.com.