https://github.com/Monogramm/erpnext_ocr
:snake: :alembic: Optical Character Recognition using tesseract within Frappe.
- Host: GitHub
- URL: https://github.com/Monogramm/erpnext_ocr
- Owner: Monogramm
- License: mit
- Created: 2019-06-22T15:22:52.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-06-19T00:05:03.000Z (6 months ago)
- Last Synced: 2024-08-01T15:33:41.016Z (5 months ago)
- Topics: erpnext, frappe, ocr, python, tesseract
- Language: Python
- Size: 938 KB
- Stars: 81
- Watchers: 5
- Forks: 53
- Open Issues: 14
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-frappe - ERPNext OCR - Optical Character Recognition using Tesseract within Frappe. (Uncategorized / Uncategorized)
README
[![License: MIT][uri_license_image]][uri_license]
[![Managed with Taiga.io](https://img.shields.io/badge/managed%20with-TAIGA.io-709f14.svg)](https://tree.taiga.io/project/monogrammbot-monogrammerpnext_ocr/ "Managed with Taiga.io")
[![Build Status](https://travis-ci.org/Monogramm/erpnext_ocr.svg)](https://travis-ci.org/Monogramm/erpnext_ocr)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/e154ec72926346d4ba4951c25d906d33)](https://www.codacy.com/gh/Monogramm/erpnext_ocr?utm_source=github.com&utm_medium=referral&utm_content=Monogramm/erpnext_ocr&utm_campaign=Badge_Grade)
[![Coverage Status](https://coveralls.io/repos/github/Monogramm/erpnext_ocr/badge.svg?branch=master)](https://coveralls.io/github/Monogramm/erpnext_ocr?branch=master)

## ERPNext OCR
> :alembic: **Experimental** Frappe OCR application with [tesseract](https://github.com/tesseract-ocr/tesseract).
This project is a fork of [ERPNext-OCR](https://github.com/jvfiel/ERPNext-OCR) by [John Vincent Fiel](https://github.com/jvfiel). Its aim is to fix and clean up the original source code and add some new features.
Check out more on [ERPNext Discuss](https://discuss.erpnext.com/t/erpnext-ocr-app/33834/7).
## :chart_with_upwards_trend: Changes
See [CHANGELOG](./CHANGELOG.md)
## :bookmark: Roadmap
See [Taiga.io](https://tree.taiga.io/project/monogrammbot-monogrammerpnext_ocr/ "Taiga.io monogrammbot-monogrammerpnext_ocr")
## :construction: Install
### Pre-requisites: tesseract-python and imagemagick
Install tesseract-ocr, plus imagemagick and ghostscript (to work with pdf files) using this command on Debian:
```sh
sudo apt-get install tesseract-ocr imagemagick libmagickwand-dev ghostscript
```

### Install Frappe application
```sh
bench get-app --branch develop erpnext_ocr https://github.com/Monogramm/erpnext_ocr
bench install-app erpnext_ocr
```

When installing the Frappe app, the following Python requirements will be installed:
- python binding for tesseract, [tesserocr](https://pypi.org/project/tesserocr/)
- image processing library in python, [pillow](https://pypi.org/project/Pillow/)
- HTTP library in python, [requests](https://pypi.org/project/requests/)
- python binding for imagemagick, [wand](https://pypi.org/project/Wand/)
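
Before running the app, it can help to confirm that the prerequisites above are actually present. The following is a minimal sketch, not part of this project: the function name `missing_dependencies` is hypothetical, and it assumes the system packages expose the `tesseract` and `gs` (ghostscript) executables on `PATH` and that the Python packages are importable as `tesserocr`, `PIL`, `requests` and `wand`.

```python
import importlib.util
import shutil

# Assumed executable names for tesseract-ocr and ghostscript.
SYSTEM_BINARIES = ("tesseract", "gs")
# Import names of the Python requirements listed above.
PYTHON_MODULES = ("tesserocr", "PIL", "requests", "wand")


def missing_dependencies(binaries=SYSTEM_BINARIES, modules=PYTHON_MODULES):
    """Return a list of binaries and modules that cannot be found."""
    # shutil.which returns None when the executable is not on PATH.
    missing = [f"binary:{name}" for name in binaries if shutil.which(name) is None]
    # find_spec returns None when a top-level module is not installed.
    missing += [
        f"module:{name}"
        for name in modules
        if importlib.util.find_spec(name) is None
    ]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All dependencies found" if not problems else problems)
```

An empty list means everything was found. A `binary:gs` entry would point to the missing `ghostscript` dependency described under Known issues.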
## :rocket: Usage
**File Being Read**:
![File Being Read](./erpnext_ocr/tests/test_data/Picture_010.png)
**Sample Screenshot**:
![Sample Screenshot](./erpnext_ocr/tests/test_data/Picture_010_screenshot.png)
### Tesseract trained data
In order to use OCR with different languages, you need to install the appropriate trained data files.
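
To see which languages are already available, one option is to list the `*.traineddata` files in the tessdata directory. This is a minimal sketch under stated assumptions: the helper `installed_languages` is hypothetical, and the directory location varies by system, so it is read from the `TESSDATA_PREFIX` environment variable here with the current directory as a placeholder fallback.

```python
import os
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the sorted language codes found in a tessdata directory."""
    # Each language ships as "<code>.traineddata", e.g. "eng.traineddata".
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


if __name__ == "__main__":
    # Assumption: TESSDATA_PREFIX points at the directory holding the files.
    tessdata_dir = os.environ.get("TESSDATA_PREFIX", ".")
    print(installed_languages(tessdata_dir))
```

If a language code you need is missing from the result, its trained data file still has to be installed.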
Check the Tesseract wiki for details.

### Development
If you wish to develop or just test this application locally, you can run `docker-compose up -d` at the root of this repository.
You can then access your ERPNext OCR dev env at `http://localhost:8080`.

### Known issues
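
The `PolicyError` listed in this section comes from ImageMagick's `policy.xml`, where a `coder` policy can deny access to PDF files. The sketch below checks for such an entry. It is an illustration only: the helper `pdf_read_blocked` is hypothetical, it treats any matching entry without read rights as blocking (a simplification of how ImageMagick evaluates policies), and the file location in the usage note is an assumption that depends on your distribution.

```python
import xml.etree.ElementTree as ET


def pdf_read_blocked(policymap):
    """Return True if a coder policy for PDF lacks the "read" right."""
    for policy in policymap.iter("policy"):
        if policy.get("domain") != "coder":
            continue
        # Patterns may be a single name ("PDF") or a brace list ("{PS,PDF}").
        names = policy.get("pattern", "").strip("{}").split(",")
        if "PDF" not in [name.strip().upper() for name in names]:
            continue
        if "read" not in policy.get("rights", "").lower():
            return True
    return False


if __name__ == "__main__":
    sample = ET.fromstring(
        '<policymap><policy domain="coder" rights="none" pattern="PDF" /></policymap>'
    )
    print(pdf_read_blocked(sample))
```

To check a real configuration, parse the file instead, for example `pdf_read_blocked(ET.parse("/etc/ImageMagick-6/policy.xml").getroot())`, adjusting the path to wherever your installation keeps it.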
- `wand.exceptions.PolicyError: not authorized '/opt/sample.pdf' @ error/constitute.c/ReadImage/412`
  - This can happen when the ImageMagick security configuration prevents it from reading PDF files.
  - Reference:
- `wand.exceptions.WandRuntimeError: MagickReadImage returns false, but did raise ImageMagick exception. This can occurs when a delegate is missing, or returns EXIT_SUCCESS without generating a raster.`
  - This might happen if you're missing a dependency needed to convert PDF files, most often `ghostscript`.
  - References:
- `OSError: encoder error -2 when writing image file`
  - This might happen when trying to open a TIFF image, but the real error is "_hidden_" and only displayed in the console.
  - If the original error in the console is `Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.`, the TIFF image compression is usually invalid or not recognized.

## :white_check_mark: Run tests
```sh
bench run-tests --app erpnext_ocr
```

## :bust_in_silhouette: Authors
**Monogramm**
- Website:
- Github: [@Monogramm](https://github.com/Monogramm)

**John Vincent Fiel**
- Github: [@jvfiel](https://github.com/jvfiel)
## :handshake: Contributing
Contributions, issues and feature requests are welcome!
Feel free to check [issues page](https://github.com/Monogramm/erpnext_ocr/issues).
[Check the contributing guide](./CONTRIBUTING.md).

## :thumbsup: Show your support
Give a :star: if this project helped you!
## :page_facing_up: License
Copyright © 2019 [Monogramm](https://github.com/Monogramm).
This project is [MIT][uri_license] licensed.

* * *
_This README was generated with :heart: by [readme-md-generator](https://github.com/kefranabg/readme-md-generator)_
[uri_license]: https://opensource.org/licenses/MIT
[uri_license_image]: https://img.shields.io/badge/license-MIT-blue