{"id":13579592,"url":"https://github.com/Monogramm/erpnext_ocr","last_synced_at":"2025-04-05T23:31:54.535Z","repository":{"id":36281033,"uuid":"193248656","full_name":"Monogramm/erpnext_ocr","owner":"Monogramm","description":":snake: :alembic: Optical Character Recognition using tesseract within Frappe.","archived":false,"fork":false,"pushed_at":"2024-09-06T01:38:17.000Z","size":961,"stargazers_count":100,"open_issues_count":14,"forks_count":54,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-31T05:25:49.815Z","etag":null,"topics":["erpnext","frappe","ocr","python","tesseract"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Monogramm.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-06-22T15:22:52.000Z","updated_at":"2025-03-15T04:41:10.000Z","dependencies_parsed_at":"2024-02-11T19:59:08.613Z","dependency_job_id":"806b0bae-2d70-49bf-920e-c96f15e3c857","html_url":"https://github.com/Monogramm/erpnext_ocr","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Monogramm%2Ferpnext_ocr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Monogramm%2Ferpnext_ocr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Monogramm%2Ferpnext_ocr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Monogramm%2Ferpnext_ocr/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Monogramm","download_url":"https://codeload.github.com/Monogramm/erpnext_ocr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247415783,"owners_count":20935383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["erpnext","frappe","ocr","python","tesseract"],"created_at":"2024-08-01T15:01:40.894Z","updated_at":"2025-04-05T23:31:49.524Z","avatar_url":"https://github.com/Monogramm.png","language":"Python","readme":"[![License: MIT][uri_license_image]][uri_license]\n[![Managed with Taiga.io](https://img.shields.io/badge/managed%20with-TAIGA.io-709f14.svg)](https://tree.taiga.io/project/monogrammbot-monogrammerpnext_ocr/ \"Managed with Taiga.io\")\n[![Build Status](https://travis-ci.org/Monogramm/erpnext_ocr.svg)](https://travis-ci.org/Monogramm/erpnext_ocr)\n[![Codacy Badge](https://api.codacy.com/project/badge/Grade/e154ec72926346d4ba4951c25d906d33)](https://www.codacy.com/gh/Monogramm/erpnext_ocr?utm_source=github.com\u0026utm_medium=referral\u0026utm_content=Monogramm/erpnext_ocr\u0026utm_campaign=Badge_Grade)\n[![Coverage Status](https://coveralls.io/repos/github/Monogramm/erpnext_ocr/badge.svg?branch=master)](https://coveralls.io/github/Monogramm/erpnext_ocr?branch=master)\n\n## ERPNext OCR\n\n\u003e :alembic: **Experimental** Frappe OCR application with [tesseract](https://github.com/tesseract-ocr/tesseract).\n\nThis project is a fork of [ERPNext-OCR](https://github.com/jvfiel/ERPNext-OCR) by [John Vincent 
Fiel](https://github.com/jvfiel). Its aim is to fix and clean up the original source code and add some new features.\n\nCheck out more on [ERPNext Discuss](https://discuss.erpnext.com/t/erpnext-ocr-app/33834/7).\n\n## :chart_with_upwards_trend: Changes\n\nSee [CHANGELOG](./CHANGELOG.md)\n\n## :bookmark: Roadmap\n\nSee [Taiga.io](https://tree.taiga.io/project/monogrammbot-monogrammerpnext_ocr/ \"Taiga.io monogrammbot-monogrammerpnext_ocr\")\n\n## :construction: Install\n\n### Pre-requisites: tesseract-python and imagemagick\n\nInstall tesseract-ocr, plus imagemagick and ghostscript (to work with pdf files) using this command on Debian:\n\n```sh\nsudo apt-get install tesseract-ocr imagemagick libmagickwand-dev ghostscript\n```\n\n### Install Frappe application\n\n```sh\nbench get-app --branch develop erpnext_ocr https://github.com/Monogramm/erpnext_ocr\nbench install-app erpnext_ocr\n```\n\nWhen installing the Frappe app, the following python requirements will be installed:\n\n-   python binding for tesseract, [tesserocr](https://pypi.org/project/tesserocr/)\n\n-   image processing library in python, [pillow](https://pypi.org/project/Pillow/)\n\n-   HTTP library in python, [requests](https://pypi.org/project/requests/)\n\n-   python binding for imagemagick, [wand](https://pypi.org/project/Wand/)\n\n## :rocket: Usage\n\n**File Being Read**:\n\n![File Being Read](./erpnext_ocr/tests/test_data/Picture_010.png)\n\n**Sample Screenshot**:\n\n![Sample Screenshot](./erpnext_ocr/tests/test_data/Picture_010_screenshot.png)\n\n### Tesseract trained data\n\nIn order to use OCR with different languages, you need to install the appropriate trained data files.\nCheck tesseract Wiki for details: \u003chttps://github.com/tesseract-ocr/tesseract/wiki/Data-Files\u003e\n\n### Development\n\nIf you wish to develop or just test this application locally, you can use `docker-compose up -d` at the root of this repository.\nYou can then access your ERPNext OCR dev env at 
`http://localhost:8080`.\n\n### Known issues\n\n-   `wand.exceptions.PolicyError: not authorized '/opt/sample.pdf' @ error/constitute.c/ReadImage/412`\n\n    -   This can happen due to security configuration in imagemagick preventing it from reading PDF files.\n\n    -   References:\n        -   \u003chttps://stackoverflow.com/questions/52699608/wand-policy-error-error-constitute-c-readimage-412\u003e\n        -   \u003chttps://stackoverflow.com/questions/42928765/convertnot-authorized-aaaa-error-constitute-c-readimage-453\u003e\n\n-   `wand.exceptions.WandRuntimeError: MagickReadImage returns false, but did raise ImageMagick exception. This can occurs when a delegate is missing, or returns EXIT_SUCCESS without generating a raster.`\n\n    -   This might happen if you're missing a dependency to convert PDF, most of the time `ghostscript`\n\n    -   References:\n        -   \u003chttps://stackoverflow.com/questions/57271287/user-wand-by-python-to-convert-pdf-to-jepg-raise-wand-exceptions-wandruntimeerr\u003e\n\n-   `OSError: encoder error -2 when writing image file`\n\n    -   This might happen when trying to open a TIFF image, but the real error is \"_hidden_\" and only displayed in the console.\n    -   If the original error in the console is `Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.`, the TIFF image compression is usually not valid or not recognized.\n\n## :white_check_mark: Run tests\n\n```sh\nbench run-tests --app erpnext_ocr\n```\n\n## :bust_in_silhouette: Authors\n\n**Monogramm**\n\n-   Website: \u003chttps://www.monogramm.io\u003e\n-   Github: [@Monogramm](https://github.com/Monogramm)\n\n**John Vincent Fiel**\n\n-   Github: [@jvfiel](https://github.com/jvfiel)\n\n## :handshake: Contributing\n\nContributions, issues and feature requests are welcome!\u003cbr /\u003eFeel free to check the [issues page](https://github.com/Monogramm/erpnext_ocr/issues).\n[Check the contributing guide](./CONTRIBUTING.md).\u003cbr /\u003e\n\n## :thumbsup: Show 
your support\n\nGive a :star: if this project helped you!\n\n## :page_facing_up: License\n\nCopyright © 2019 [Monogramm](https://github.com/Monogramm).\u003cbr /\u003e\nThis project is [MIT][uri_license] licensed.\n\n* * *\n\n_This README was generated with :heart: by [readme-md-generator](https://github.com/kefranabg/readme-md-generator)_\n\n[uri_license]: https://opensource.org/licenses/MIT\n\n[uri_license_image]: https://img.shields.io/badge/license-MIT-blue\n","funding_links":[],"categories":["Python","Uncategorized"],"sub_categories":["Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMonogramm%2Ferpnext_ocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMonogramm%2Ferpnext_ocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMonogramm%2Ferpnext_ocr/lists"}