https://github.com/ciur/papermerge-worker

papermerge worker - extracts (OCR) text from documents using tesseract.

Host: GitHub
URL: https://github.com/ciur/papermerge-worker
Owner: ciur
License: other
Created: 2020-01-07T06:47:28.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2020-07-21T17:01:32.000Z (about 5 years ago)
Last Synced: 2024-09-24T02:23:57.341Z (about 1 year ago)
Language: Python
Size: 2.39 MB
Stars: 3
Watchers: 2
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: changelog.md
- License: LICENSE

README

Papermerge Worker
=================

pmworker's main job is OCR processing. It extracts text from PDF, TIFF, JPEG and PNG files.

For the full project description, please see the [Papermerge Project](https://github.com/ciur/papermerge).

Requirements
=============

python >= 3.6

pmworker.wrapper calls subprocess.run, which was added in Python 3.5, and passes it the encoding='utf-8' argument, which was added in Python 3.6.
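To illustrate why Python 3.6 is the floor, here is a minimal sketch of a command wrapper built on `subprocess.run` with `encoding='utf-8'`. It is not the actual code of pmworker.wrapper; the function name `run_command` is hypothetical and chosen only for this example.

```python
import subprocess
import sys


def run_command(cmd):
    """Run cmd (a list of arguments) and return (exit code, stdout, stderr).

    Hypothetical helper for illustration, not pmworker's real wrapper.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        # The encoding argument needs Python 3.6+. With it, stdout and
        # stderr come back as str instead of bytes.
        encoding="utf-8",
    )
    return result.returncode, result.stdout, result.stderr
```

A call such as `run_command(["tesseract", "page.png", "stdout"])` would then return the recognized text as a string, assuming tesseract is installed and `page.png` (a placeholder file name) exists.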

Dependencies
=============

Depends on celery, tesseract, imagemagick.
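Before starting a worker it can help to confirm that all three dependencies are present. The following is a rough sketch, not part of pmworker: the function name `check_dependencies` is hypothetical, and it assumes that ImageMagick is reachable as either `magick` (ImageMagick 7) or `convert` (ImageMagick 6) on the PATH.

```python
import importlib.util
import shutil


def check_dependencies():
    """Return a dict mapping each dependency name to True if it was found."""
    return {
        # celery is a Python package, so look for an importable module.
        "celery": importlib.util.find_spec("celery") is not None,
        # tesseract is an external executable looked up on the PATH.
        "tesseract": shutil.which("tesseract") is not None,
        # Assumption: either ImageMagick 7 (`magick`) or 6 (`convert`).
        "imagemagick": any(
            shutil.which(name) is not None for name in ("magick", "convert")
        ),
    }
```

Printing the result of `check_dependencies()` shows at a glance which of the three still needs to be installed.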

Usage:

> export CELERY_CONFIG_MODULE='pmworker.config'

> celery -A pmworker.celery worker -l info
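`CELERY_CONFIG_MODULE` tells Celery which Python module to import for its settings. For orientation only, a configuration module of that kind might look like the sketch below. Every value here is an assumption made for illustration (the broker URL in particular) and does not reflect the settings pmworker actually ships; only the setting names themselves are standard Celery options.

```python
# Hypothetical Celery configuration module (illustrative values only).

# Where the worker fetches tasks from. Assumption: a local Redis instance.
broker_url = "redis://localhost:6379/0"

# Where task results are stored. Assumption: the same Redis instance.
result_backend = "redis://localhost:6379/1"

# Exchange task messages as JSON and reject other content types.
task_serializer = "json"
result_serializer = "json"
accept_content = ["json"]
```

Celery reads module-level names like these as settings when the module is imported, so pointing the environment variable at a different module is enough to switch configurations.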

Run Tests
=============

Run all tests:

    python3 ./test/run.py

Run specific test file:

    python3 ./test/run.py -p test_endpoint

Which is the same as:

    python3 ./test/run.py -p test_endpoint.py
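One plausible way for a runner to treat `test_endpoint` and `test_endpoint.py` identically is to normalize the pattern before handing it to unittest discovery. The sketch below shows that idea under stated assumptions: it is not the contents of `test/run.py`, and the names `normalize_pattern`, `build_suite`, `main` and the `--pattern` long option are hypothetical.

```python
import argparse
import unittest


def normalize_pattern(pattern):
    """Append '.py' when missing, so both spellings select the same file."""
    return pattern if pattern.endswith(".py") else pattern + ".py"


def build_suite(start_dir, pattern="test*.py"):
    """Discover tests under start_dir whose file names match the pattern."""
    loader = unittest.TestLoader()
    return loader.discover(start_dir, pattern=normalize_pattern(pattern))


def main(argv=None):
    """Parse -p and run the matching tests; return a process exit code."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--pattern", default="test*.py")
    args = parser.parse_args(argv)
    # Assumption: the test files live in a directory named "test".
    suite = build_suite("test", args.pattern)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    return 0 if result.wasSuccessful() else 1
```

With this approach, calling `main(["-p", "test_endpoint"])` and `main(["-p", "test_endpoint.py"])` from the project root would discover the same file.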
