https://github.com/ciur/papermerge-worker
papermerge worker - extracts (OCR) text from documents using tesseract.
- Host: GitHub
- URL: https://github.com/ciur/papermerge-worker
- Owner: ciur
- License: other
- Created: 2020-01-07T06:47:28.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-07-21T17:01:32.000Z (about 5 years ago)
- Last Synced: 2024-09-24T02:23:57.341Z (about 1 year ago)
- Language: Python
- Size: 2.39 MB
- Stars: 3
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: changelog.md
- License: LICENSE
README
Papermerge Worker
================

pmworker's main job is OCR processing. It extracts text from PDF, TIFF, JPEG and PNG files.
For a full project description, see [Papermerge Project](https://github.com/ciur/papermerge).

Requirements
=============

Python >= 3.6.

pmworker.wrapper uses the subprocess.run method, which was added in Python 3.5.
It also passes the encoding argument, as in subprocess.run(encoding='utf-8');
that argument was added in Python 3.6.

Dependencies
=============

Depends on Celery, Tesseract and ImageMagick.
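
The requirement and dependency notes above can be illustrated with a minimal sketch. It is not the actual pmworker.wrapper code: the helper names `missing_tools`, `run_text` and `ocr_image` are hypothetical, and it assumes that the ImageMagick binary is called `convert` (newer ImageMagick releases ship `magick` instead). It only shows why Python 3.6 is needed: with `encoding='utf-8'`, subprocess.run returns decoded text rather than bytes.

```python
import shutil
import subprocess
import sys


def missing_tools(tools=("tesseract", "convert")):
    """Return the names from `tools` that are not found on PATH.

    The default names are assumptions, not taken from pmworker.
    """
    return [name for name in tools if shutil.which(name) is None]


def run_text(cmd):
    """Run `cmd` and return its stdout as str.

    encoding='utf-8' (Python 3.6+) makes subprocess.run decode
    stdout and stderr to str instead of returning bytes.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


def ocr_image(image_path, lang="eng"):
    """Hypothetical helper: OCR one image with the tesseract CLI.

    Passing `stdout` as the output base makes tesseract print the
    recognized text instead of writing a file.
    """
    return run_text(["tesseract", image_path, "stdout", "-l", lang])


# Demo using the current Python interpreter, so the sketch runs
# even where tesseract is not installed.
print(run_text([sys.executable, "-c", "print('ok')"]))
```

Calling `missing_tools()` before starting a worker is one way to fail early when Tesseract or ImageMagick is not installed.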

Usage:

    > export CELERY_CONFIG_MODULE='pmworker.config'
    > celery -A pmworker.celery worker -l info

Run Tests
=============

Run all tests:

    python3 ./test/run.py

Run a specific test file:

    python3 ./test/run.py -p test_endpoint

This is the same as:

    python3 ./test/run.py -p test_endpoint.py
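
One way the two `-p` forms could end up equivalent is sketched below. This is an assumption about how such a runner might work, not the contents of test/run.py: the function names `normalize_pattern` and `build_suite` and the default `test` directory are hypothetical. The sketch appends `.py` when it is missing and hands the result to unittest discovery.

```python
import unittest


def normalize_pattern(pattern):
    """Accept 'test_endpoint' or 'test_endpoint.py'; return the '.py' form."""
    return pattern if pattern.endswith(".py") else pattern + ".py"


def build_suite(start_dir="test", pattern="test*.py"):
    """Discover tests under `start_dir` whose file names match `pattern`."""
    loader = unittest.TestLoader()
    return loader.discover(start_dir, pattern=normalize_pattern(pattern))
```

A runner built this way would pass the suite to `unittest.TextTestRunner().run(...)`.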