https://github.com/Monogramm/erpnext_ocr
:snake: :alembic: Optical Character Recognition using tesseract within Frappe.
- Host: GitHub
- URL: https://github.com/Monogramm/erpnext_ocr
- Owner: Monogramm
- License: mit
- Created: 2019-06-22T15:22:52.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-09-06T01:38:17.000Z (10 months ago)
- Last Synced: 2025-03-31T05:25:49.815Z (3 months ago)
- Topics: erpnext, frappe, ocr, python, tesseract
- Language: Python
- Homepage:
- Size: 938 KB
- Stars: 100
- Watchers: 4
- Forks: 54
- Open Issues: 14
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-frappe - ERPNext OCR - Optical Character Recognition using Tesseract within Frappe. (Uncategorized / Uncategorized)
README
[![License: MIT][uri_license_image]][uri_license]
[Managed with Taiga.io](https://tree.taiga.io/project/monogrammbot-monogrammerpnext_ocr/ "Managed with Taiga.io")
[Travis CI](https://travis-ci.org/Monogramm/erpnext_ocr)
[Codacy](https://www.codacy.com/gh/Monogramm/erpnext_ocr?utm_source=github.com&utm_medium=referral&utm_content=Monogramm/erpnext_ocr&utm_campaign=Badge_Grade)
[Coveralls](https://coveralls.io/github/Monogramm/erpnext_ocr?branch=master)

## ERPNext OCR
> :alembic: **Experimental** Frappe OCR application with [tesseract](https://github.com/tesseract-ocr/tesseract).
This project is a fork of [ERPNext-OCR](https://github.com/jvfiel/ERPNext-OCR) by [John Vincent Fiel](https://github.com/jvfiel). It aims to fix and clean up the original source code and to add some new features.
Check out more on [ERPNext Discuss](https://discuss.erpnext.com/t/erpnext-ocr-app/33834/7).
## :chart_with_upwards_trend: Changes
See [CHANGELOG](./CHANGELOG.md)
## :bookmark: Roadmap
See [Taiga.io](https://tree.taiga.io/project/monogrammbot-monogrammerpnext_ocr/ "Taiga.io monogrammbot-monogrammerpnext_ocr")
## :construction: Install
### Pre-requisites: tesseract-python and imagemagick
Install tesseract-ocr, plus imagemagick and ghostscript (needed to work with PDF files), using this command on Debian:
```sh
sudo apt-get install tesseract-ocr imagemagick libmagickwand-dev ghostscript
```

### Install Frappe application
```sh
bench get-app --branch develop erpnext_ocr https://github.com/Monogramm/erpnext_ocr
bench install-app erpnext_ocr
```

When you install the Frappe app, the following Python requirements are installed as well:
- Python binding for tesseract, [tesserocr](https://pypi.org/project/tesserocr/)
- image processing library in Python, [pillow](https://pypi.org/project/Pillow/)
- HTTP library in Python, [requests](https://pypi.org/project/requests/)
- Python binding for imagemagick, [wand](https://pypi.org/project/Wand/)
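
If the install seems to have gone wrong, a quick check of the pre-requisites can narrow things down. The following is a minimal sketch, not part of this app: it looks up the import names of the four Python requirements above and a few system executables. The executable names (`tesseract`, `gs`, `convert`) are assumptions based on the Debian packages and may differ on your system, and the script should be run with the Python interpreter that your bench uses.

```python
import importlib.util
import shutil

# Import names of the Python requirements listed above
# (Pillow is imported as "PIL", Wand as "wand").
PYTHON_MODULES = ["tesserocr", "PIL", "requests", "wand"]

# Assumed executable names: "tesseract" for tesseract-ocr, "gs" for
# ghostscript, "convert" for imagemagick. Adjust them for your system.
SYSTEM_BINARIES = ["tesseract", "gs", "convert"]


def check_python_modules(names=PYTHON_MODULES):
    """Map each module name to True if it can be located (it is not imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def check_system_binaries(names=SYSTEM_BINARIES):
    """Map each executable name to True if it is found on PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    checks = (
        ("module", check_python_modules()),
        ("binary", check_system_binaries()),
    )
    for label, result in checks:
        for name, found in sorted(result.items()):
            print(f"{label:7} {name:10} {'ok' if found else 'MISSING'}")
```

Anything reported as `MISSING` points back to the corresponding `apt-get` package or Python requirement.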
## :rocket: Usage
**File Being Read**:

**Sample Screenshot**:

### Tesseract trained data
In order to use OCR with different languages, you need to install the appropriate trained data files.
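
To see which trained data files are already available, you can ask tesseract itself with `tesseract --list-langs`. The sketch below wraps that call in Python; the helper names `parse_langs` and `installed_langs` are hypothetical and not part of this app. On Debian, additional languages are typically packaged as `tesseract-ocr-<lang>` (for example `tesseract-ocr-fra`).

```python
import shutil
import subprocess


def parse_langs(output):
    """Extract language codes from `tesseract --list-langs` output.

    The first line is a header ("List of available languages ..."); every
    other non-empty line is treated as one language code.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return sorted(line for line in lines if not line.lower().startswith("list of"))


def installed_langs():
    """Return the installed language codes, or None if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    proc = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some tesseract versions print the list to stderr, so both streams are
    # parsed. Warnings printed by tesseract would show up as extra entries.
    return parse_langs(proc.stdout + "\n" + proc.stderr)


if __name__ == "__main__":
    langs = installed_langs()
    print("tesseract not found" if langs is None else ", ".join(langs))
```

If the language you need is not in the list, install its trained data file before running OCR with that language.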
Check the tesseract Wiki for details.

### Development
If you wish to develop or just test this application locally, you can run `docker-compose up -d` at the root of this repository.
You can then access your ERPNext OCR dev env at `http://localhost:8080`.

### Known issues
- `wand.exceptions.PolicyError: not authorized '/opt/sample.pdf' @ error/constitute.c/ReadImage/412`
  - This can happen when the security configuration of imagemagick prevents it from reading PDF files.
- `wand.exceptions.WandRuntimeError: MagickReadImage returns false, but did raise ImageMagick exception. This can occurs when a delegate is missing, or returns EXIT_SUCCESS without generating a raster.`
  - This might happen if you're missing a dependency needed to convert PDF files, most of the time `ghostscript`.
- `OSError: encoder error -2 when writing image file`
  - This might happen when trying to open a TIFF image; the real error is "_hidden_" and only displayed in the console.
  - If the original error in the console is `Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.`, the TIFF image compression is usually not valid or not recognized.

## :white_check_mark: Run tests
```sh
bench run-tests --app erpnext_ocr
```

## :bust_in_silhouette: Authors
**Monogramm**
- Website:
- Github: [@Monogramm](https://github.com/Monogramm)

**John Vincent Fiel**
- Github: [@jvfiel](https://github.com/jvfiel)
## :handshake: Contributing
Contributions, issues and feature requests are welcome!
Feel free to check [issues page](https://github.com/Monogramm/erpnext_ocr/issues).
[Check the contributing guide](./CONTRIBUTING.md).

## :thumbsup: Show your support
Give a :star: if this project helped you!
## :page_facing_up: License
Copyright © 2019 [Monogramm](https://github.com/Monogramm).
This project is [MIT][uri_license] licensed.

* * *

_This README was generated with :heart: by [readme-md-generator](https://github.com/kefranabg/readme-md-generator)_

[uri_license]: https://opensource.org/licenses/MIT
[uri_license_image]: https://img.shields.io/badge/license-MIT-blue