https://github.com/ocr-d/ocrmultieval
Extensible evaluation of (intermediate) results of an OCR workflow
ocr ocr-d ocr-evaluation
- Host: GitHub
- URL: https://github.com/ocr-d/ocrmultieval
- Owner: OCR-D
- Created: 2021-09-21T13:39:28.000Z (over 3 years ago)
- Default Branch: dev
- Last Pushed: 2022-02-23T16:07:35.000Z (about 3 years ago)
- Last Synced: 2025-02-03T01:33:59.770Z (4 months ago)
- Topics: ocr, ocr-d, ocr-evaluation
- Language: Python
- Homepage:
- Size: 12.8 MB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: README.md
# ocrmultieval
> Proof-of-concept for extensible evaluation of (intermediate) results of an OCR workflow
## Installation
```
make deps install
```

## Architecture
All evaluation functionality is provided by *backends*.
Every backend inherits from [`EvalBackend`](ocrmultieval/backend.py) and must
implement a `compare_files` method that accepts the paths and media types of
the Ground Truth and the detection results, performs the actual evaluation and
returns an [`EvalReport`](ocrmultieval/report.py).

An `EvalReport` is a map of metrics to their respective values and can be
serialized as JSON or CSV for further processing/analysis.

The glue code for running the backends is in
[`ocrmultieval/runner.py`](ocrmultieval/runner.py).

## Usage
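To make the backend contract from the Architecture section more concrete, here is a minimal, self-contained sketch. It does not import `ocrmultieval`: `SketchBackend`, `SketchReport` and `ExactLineMatchBackend` are hypothetical stand-ins, and the argument order of `compare_files` as well as the toy line-accuracy metric are assumptions made for illustration, not the project's actual signatures or metrics.

```python
import csv
import io
import json
from abc import ABC, abstractmethod


class SketchReport(dict):
    """Hypothetical stand-in for EvalReport: a map of metric name -> value."""

    def to_json(self):
        return json.dumps(self, sort_keys=True)

    def to_csv(self):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["metric", "value"])
        for name in sorted(self):
            writer.writerow([name, self[name]])
        return buf.getvalue()


class SketchBackend(ABC):
    """Hypothetical stand-in for EvalBackend (argument order is assumed)."""

    @abstractmethod
    def compare_files(self, gt_file, gt_mediatype, ocr_file, ocr_mediatype):
        """Evaluate OCR output against Ground Truth and return a report."""


class ExactLineMatchBackend(SketchBackend):
    """Toy backend: share of GT lines reproduced verbatim at the same position."""

    def compare_files(self, gt_file, gt_mediatype, ocr_file, ocr_mediatype):
        # The media types are accepted but ignored; this sketch reads plain text.
        with open(gt_file, encoding="utf-8") as f:
            gt_lines = f.read().splitlines()
        with open(ocr_file, encoding="utf-8") as f:
            ocr_lines = f.read().splitlines()
        matches = sum(1 for g, o in zip(gt_lines, ocr_lines) if g == o)
        accuracy = matches / len(gt_lines) if gt_lines else 0.0
        return SketchReport(
            gt_lines=len(gt_lines),
            matching_lines=matches,
            line_accuracy=accuracy,
        )
```

Calling `ExactLineMatchBackend().compare_files(...)` on two plain-text files yields a report whose `to_json()` and `to_csv()` outputs mirror the two serializations mentioned above. A real backend would wrap an external evaluation tool at the point where this sketch compares lines.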
### CLI
The `ocrmultieval compare` command line tool allows evaluating individual pages of GT
and detection with any of the available [backends](#backends).

```
Usage: ocrmultieval compare [OPTIONS]
                            {dinglehopper|ocrevalUAtion|PrimaTextEval|CorAsvAnnEval|CorAsvAnnCompare|OcrdSegmentEvaluate|IsriOcreval}
                            GT_FILE OCR_FILE

Options:
  --gt-mediatype TEXT
  --ocr-mediatype TEXT
  --format [csv|json|yaml|xml]
  -g, --pageId TEXT             pageId to uniquely identify pages in a work
  --help                        Show this message and exit.
```

### OCR-D processor
The `ocrd-ocrmultieval` command line tool implements the [OCR-D processor
API](https://ocr-d.de/en/spec/cli) and can be used to process complete
workspaces.

```
Usage: ocrd-ocrmultieval [OPTIONS]

  Evaluate

  > Eval processor

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME
  -L, --list-resources            List names of processor resources
  -J, --dump-json                 Dump tool description as JSON and exit
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
   "backend" [string - "PrimaTextEval"]
    Backend to use
    Possible values: ["PrimaTextEval", "ocrevalUAtion", "dinglehopper",
    "OcrdSegmentEvaluate", "IsriOcreval", "CorAsvAnnCompare"]
   "format" [string - "csv"]
    Output format
    Possible values: ["csv", "json", "yaml", "xml"]
   "config" [object]
    Configuration to override default

Default Wiring:
  ['GT,OCR1'] -> ['GT_VS_OCR1']
```
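
Both command line tools can also be driven from a script. The following sketch only assembles argument lists from the options shown in the two help texts above; the helper names `compare_command` and `processor_command`, the file paths and the `mets.xml` location are hypothetical, and nothing is executed until the resulting list is handed to something like `subprocess.run`.

```python
import shlex

# Backend names and formats copied from the `ocrmultieval compare` usage text.
CLI_BACKENDS = (
    "dinglehopper", "ocrevalUAtion", "PrimaTextEval", "CorAsvAnnEval",
    "CorAsvAnnCompare", "OcrdSegmentEvaluate", "IsriOcreval",
)
FORMATS = ("csv", "json", "yaml", "xml")


def compare_command(backend, gt_file, ocr_file, fmt="csv", page_id=None):
    """Build the argv for a single-page `ocrmultieval compare` call."""
    if backend not in CLI_BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    cmd = ["ocrmultieval", "compare", "--format", fmt]
    if page_id is not None:
        cmd += ["--pageId", page_id]
    return cmd + [backend, gt_file, ocr_file]


def processor_command(mets, input_grps=("GT", "OCR1"), output_grp="GT_VS_OCR1",
                      backend="PrimaTextEval", fmt="csv"):
    """Build the argv for a workspace-wide `ocrd-ocrmultieval` call.

    The defaults mirror the default wiring and parameter defaults listed above.
    """
    return [
        "ocrd-ocrmultieval",
        "-m", mets,
        "-I", ",".join(input_grps),
        "-O", output_grp,
        "-P", "backend", backend,
        "-P", "format", fmt,
    ]


# Hypothetical paths, printed as copy-pasteable shell commands.
print(shlex.join(compare_command("dinglehopper", "gt/page1.xml", "ocr/page1.xml", fmt="json")))
print(shlex.join(processor_command("mets.xml", backend="dinglehopper", fmt="json")))
```

Validating the backend name before building the command keeps typos from surfacing only as a usage error from the CLI. Note that the processor's `backend` parameter lists a slightly different set of values than the `compare` subcommand, so the sketch checks only the latter.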