https://github.com/aryaminus/saram

Get OCR in txt form from an image or pdf extension supporting multiple files from directory using pytesseract with auto rotation for wrong orientation. PYPI:
https://github.com/aryaminus/saram

character-recognition chmod image ocr orientation-detection pdf pillow pyocr pytesseract python tesseract wand

# Saram - Image/PDF OCR detection system
Extract text (OCR) into `.txt` files from images or PDFs. Saram processes multiple files from a directory using `pytesseract` and rotates pages automatically when their orientation is wrong.
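To illustrate the "multiple files from a directory" idea, here is a minimal sketch that sorts a directory's files into images and PDFs by extension. The function name `classify_files` and the extension sets are assumptions made for this example, not taken from Saram's source.

```python
from pathlib import Path

# Assumed extension sets for illustration; Saram's actual list may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}
PDF_EXTENSIONS = {".pdf"}


def classify_files(directory):
    """Return (images, pdfs): the files in `directory` grouped by extension."""
    images, pdfs = [], []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip sub-directories
        suffix = path.suffix.lower()  # case-insensitive match, so .PNG counts
        if suffix in IMAGE_EXTENSIONS:
            images.append(path)
        elif suffix in PDF_EXTENSIONS:
            pdfs.append(path)
    return images, pdfs
```

Images could then go straight to OCR, while PDFs would first be converted to images.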

**Currently in beta state**

Follow: Demo run

[![Saram features](https://i.imgur.com/M9dAwPq.gif)](https://youtu.be/YF6Tf7qOXU4)

**Note:**
Make sure you have an OCR tool such as `tesseract` installed, together with the language data it needs, e.g. `tesseract-data-eng`. `Pillow` and `Wand`, which are used for image loading and conversion, are fetched during pip install.
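A quick way to verify these prerequisites is sketched below, using only the standard library. The helper name `check_environment` is hypothetical. It only checks that the `tesseract` binary is on `PATH` and that the Python packages are importable; it does not check for language data.

```python
import importlib.util
import shutil

# Package name -> importable module name (Pillow is imported as PIL).
REQUIRED_MODULES = {"Pillow": "PIL", "Wand": "wand", "pytesseract": "pytesseract"}


def check_environment(binary="tesseract"):
    """Return a dict mapping each requirement to True (found) or False."""
    status = {binary: shutil.which(binary) is not None}
    for package, module in REQUIRED_MODULES.items():
        status[package] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```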

**To use from Python**:
Refer to the `py-module` branch.
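For orientation only, the sketch below shows how OCR with automatic rotation can be done directly with `pytesseract` and `Pillow`. It is not the API of the `py-module` branch: the names `parse_rotation` and `image_to_text` are hypothetical. It assumes Tesseract's orientation data (`osd`) is installed, and that the reported `Rotate` value is the clockwise correction, which is worth verifying on your own samples.

```python
import re


def parse_rotation(osd_text):
    """Return the 'Rotate' angle in degrees from Tesseract OSD output, or 0."""
    match = re.search(r"Rotate:\s*(\d+)", osd_text)
    return int(match.group(1)) % 360 if match else 0


def image_to_text(image_path, lang="eng"):
    """OCR one image, rotating it first if Tesseract reports a wrong orientation."""
    # Imported lazily so parse_rotation stays usable without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        try:
            angle = parse_rotation(pytesseract.image_to_osd(image))
        except pytesseract.TesseractError:
            angle = 0  # OSD can fail on images with too little text
        if angle:
            # Assumption: 'Rotate' is a clockwise correction, while
            # Image.rotate turns counter-clockwise, hence the negation.
            image = image.rotate(-angle, expand=True)
        return pytesseract.image_to_string(image, lang=lang)
```

Calling `image_to_text("scan.png")` on a hypothetical file would return the recognised text as a string.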

## Installation

Install using pip:
```
$ pip install saram
$ saram
```
***else***

Clone the source locally:
```
$ git clone https://github.com/aryaminus/saram
$ cd saram
$ git checkout py-module
$ python main.py
```

## Todo
- [x] Add support for PDF by PDF -> Image -> Txt with converted image deletion after processing
- [x] Double check for orientation in case of image and PDF
- [x] Make a PIP package
- [ ] Add NLP to process the most frequently repeated characters to filter content
- [ ] Add Cloud Vision support for more effective character recognition
- [ ] Support for a GUI using tkinter
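
The PDF -> Image -> Txt flow from the first item, including deletion of the converted images, could look roughly like the sketch below. All function names are hypothetical and this is not Saram's actual implementation. It assumes `Wand` (with ImageMagick and Ghostscript) renders the pages and `pytesseract` reads them. A temporary directory takes care of removing the intermediate images.

```python
import os
import tempfile


def render_pages_with_wand(pdf_path, out_dir, resolution=300):
    """Render each PDF page to a PNG in out_dir and return the file paths."""
    from wand.image import Image as WandImage  # lazy: optional dependency

    paths = []
    with WandImage(filename=pdf_path, resolution=resolution) as pdf:
        for index, page in enumerate(pdf.sequence):
            with WandImage(image=page) as img:
                img.format = "png"
                path = os.path.join(out_dir, f"page-{index:04d}.png")
                img.save(filename=path)
                paths.append(path)
    return paths


def ocr_with_pytesseract(image_path, lang="eng"):
    """Run Tesseract on one image file and return the recognised text."""
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)


def pdf_to_text(pdf_path, txt_path, render=render_pages_with_wand,
                ocr=ocr_with_pytesseract):
    """Convert a PDF to page images, OCR them, and write the text to txt_path."""
    # The temporary directory and the converted images inside it are
    # deleted automatically when the with-block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        texts = [ocr(path) for path in render(pdf_path, tmp_dir)]
    with open(txt_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(texts))
    return txt_path
```

Passing `render` and `ocr` as parameters keeps the cleanup logic independent of the imaging libraries, so either step can be swapped out or tested in isolation.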

-----------------------------------------------------------------------------------------------------------

## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b feature/fooBar`)
3. Commit your changes (`git commit -am 'Add some fooBar'`)
4. Push to the branch (`git push origin feature/fooBar`)
5. Create a new Pull Request

**Enjoy!**