https://github.com/rbhatia46/opticalcharacterrecognition

Using Tesseract, an open source library for performing Optical Character Recognition in Python.

Host: GitHub
URL: https://github.com/rbhatia46/opticalcharacterrecognition
Owner: rbhatia46
Created: 2018-07-02T14:06:33.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2018-10-23T16:43:10.000Z (almost 7 years ago)
Last Synced: 2025-01-24T18:37:03.467Z (9 months ago)
Language: Python
Size: 94.7 KB
Stars: 0
Watchers: 3
Forks: 3
Open Issues: 2
Metadata Files:
- Readme: README.md

README

# OpticalCharacterRecognition
Using Tesseract, an open source library for performing Optical Character Recognition in Python.

## How to use
This repository contains two Python scripts, one for each use case:
* Performing OCR on images
* Performing OCR on PDFs

Run the script that matches your use case.
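
The two use cases can be sketched roughly as follows. This is a minimal illustration, not the code of the repository's scripts: the function names (`pipeline_for`, `ocr_image`, `ocr_pdf`, `run_ocr`), the `lang` and `resolution` defaults, and the extension-based dispatch are all assumptions made for the example. It also assumes ImageMagick can read PDFs on your machine, which usually requires Ghostscript.

```python
from pathlib import Path


def pipeline_for(path):
    """Return "pdf" for PDF files and "image" for everything else."""
    return "pdf" if Path(path).suffix.lower() == ".pdf" else "image"


def ocr_image(image_path, lang="eng"):
    """OCR a single image file and return the recognized text."""
    # Imported lazily so this module loads even without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def ocr_pdf(pdf_path, lang="eng", resolution=300):
    """Rasterize each PDF page with wand (ImageMagick), then OCR it."""
    import io

    from PIL import Image
    import pytesseract
    from wand.image import Image as WandImage

    pages = []
    # resolution is the rasterization DPI; 300 is an assumed, commonly used value.
    with WandImage(filename=str(pdf_path), resolution=resolution) as pdf:
        for page in pdf.sequence:
            # Convert each page to PNG bytes that Pillow can open.
            with WandImage(image=page) as single:
                png_bytes = single.make_blob("png")
            with Image.open(io.BytesIO(png_bytes)) as img:
                pages.append(pytesseract.image_to_string(img, lang=lang))
    return "\n".join(pages)


def run_ocr(path, lang="eng"):
    """Send PDFs through ocr_pdf and every other file through ocr_image."""
    if pipeline_for(path) == "pdf":
        return ocr_pdf(path, lang=lang)
    return ocr_image(path, lang=lang)
```

With the dependencies installed, `print(run_ocr("scan.pdf"))` would print the text recognized on every page; `scan.pdf` is a placeholder file name.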

## Dependencies required
* Tesseract Core Library
* PyTesseract (Python wrapper for Tesseract Core)
* Pillow (for image processing)
* [ImageMagick](https://legacy.imagemagick.org/script/binary-releases.php#windows)
* wand (Python binding for ImageMagick)
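
Before running the scripts, a quick check like the one below can show which of these dependencies are missing. It is a rough sketch using only the standard library. The import names (`pytesseract`, `PIL`, `wand`) and the executable names (`tesseract`, and `magick` or `convert` for ImageMagick) are assumptions about a typical installation, and finding an executable on `PATH` does not guarantee that wand can load the ImageMagick library itself.

```python
import importlib.util
import shutil

# Assumed import names for PyTesseract, Pillow, and wand.
PYTHON_MODULES = ("pytesseract", "PIL", "wand")


def missing_dependencies():
    """Return a list naming each dependency that could not be found."""
    missing = [m for m in PYTHON_MODULES if importlib.util.find_spec(m) is None]

    # Tesseract Core is normally exposed as a "tesseract" executable.
    if shutil.which("tesseract") is None:
        missing.append("tesseract")

    # ImageMagick 7 installs "magick"; ImageMagick 6 installs "convert".
    # Heuristic only: on Windows, "convert" may be an unrelated system tool.
    if shutil.which("magick") is None and shutil.which("convert") is None:
        missing.append("imagemagick")

    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("Missing:", ", ".join(problems) if problems else "nothing")
```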
***
**[Tesseract](https://github.com/tesseract-ocr/tesseract)** is written in C++ and, as of version 4.0, uses an LSTM network behind the scenes. For further reading and an installation guide, see this helpful [blog post](https://appliedmachinelearning.blog/2018/06/30/performing-ocr-by-running-parallel-instances-of-tesseract-4-0-python/), which covers the essentials. I have also extended the approach to PDFs to make it more useful for real-world use cases.
