https://github.com/tdiprima/trick-or-text
Generate OCR-challenging PNG and DICOM test images, then run engines and comparisons.
- Host: GitHub
- URL: https://github.com/tdiprima/trick-or-text
- Owner: tdiprima
- License: MIT
- Created: 2026-04-23T19:23:13.000Z
- Default Branch: main
- Last Pushed: 2026-05-06T15:23:11.000Z
- Last Synced: 2026-05-06T16:38:39.980Z
- Topics: dicom, document-analysis, gemma4, lightonocr, ocr, synthetic-data, tesseract
- Language: Python
- Size: 44.9 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Trick or Text 🎃 🦇
OCR stress demo for Chandra OCR 2, pytesseract, local Ollama vision models, and optional model adapters.
This repo creates two synthetic OCR-hostile images on Linux:
- a PNG
- a fake DICOM image

The text is deliberately packed with easily confused glyphs such as `I`, `l`, `1`, `O`, `0`, and `rn` (often misread as `m`), plus punctuation and symbols, so recognition errors are easy to spot.
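When inspecting results, it can help to separate pure glyph confusions from other errors. A minimal sketch, assuming you only care about the ambiguous glyphs named above: fold each one onto a single representative (and `rn` onto `m`, the glyph it is most often mistaken for), then compare. `CONFUSABLE_MAP`, `MULTI_CHAR_CONFUSIONS`, and `fold_confusables` are hypothetical names, not part of the repo.

```python
# Hypothetical folding table: each ambiguous glyph maps to one representative.
# This is an inspection aid only; the repo's own scoring may not do this.
CONFUSABLE_MAP = {"l": "I", "1": "I", "|": "I", "0": "O"}
# Multi-character confusions are folded first, before per-character mapping.
MULTI_CHAR_CONFUSIONS = {"rn": "m"}


def fold_confusables(text: str) -> str:
    """Collapse commonly confused glyphs so that only other errors remain."""
    for source, target in MULTI_CHAR_CONFUSIONS.items():
        text = text.replace(source, target)
    return "".join(CONFUSABLE_MAP.get(ch, ch) for ch in text)


truth = "Il1 rn O0"
ocr = "III m OO"
print(truth == ocr)  # False: the raw strings differ
print(fold_confusables(truth) == fold_confusables(ocr))  # True: every difference was a glyph confusion
```

If the folded strings still differ, the remaining mismatches are errors that ambiguous glyphs alone do not explain. Folding is lossy (a legitimate `rn` in a word like "modern" also becomes `m`), so use it for diagnosis rather than scoring.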
## Files
- `generate_fake_images.py`: builds the PNG and DICOM fixtures under `artifacts/inputs/`
- `run_chandra_ocr.py`: converts the DICOM into a PNG for OCR, runs Chandra OCR 2 with the Hugging Face backend, and writes the text Chandra recognized
- `compare_ocr_engines.py`: runs selected OCR engines, scores each against ground truth, records OCR timing, and writes machine-readable plus human-readable comparisons
- `install_and_run.sh`: creates `.venv`, installs dependencies, generates the fixtures, and runs the comparison pipeline
- `run_processing_and_comparison.sh`: skips installation, regenerates the fixtures, and runs the comparison pipeline with already-installed dependencies
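`compare_ocr_engines.py` scores each engine against ground truth, but the README does not spell out which metrics it uses. One common choice is character error rate (CER): the Levenshtein edit distance between the reference and the OCR text, divided by the reference length. The sketch below is illustrative; `levenshtein` and `character_error_rate` are hypothetical helpers, not the repo's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length; 0.0 means a perfect transcription."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Reading "rn" as "m" costs one substitution plus one deletion: 2 edits over 4 characters.
print(character_error_rate("rn0O", "m0O"))  # 0.5
```

CER is not capped at 1.0: an engine that emits much more text than the reference contains, which can happen with generative decoders, is penalized for every extra character.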
## Usage
Run the whole pipeline on Rocky Linux, RHEL, or Ubuntu:
```bash
./install_and_run.sh
```
After dependencies are installed, rerun only processing and comparison:
```bash
./run_processing_and_comparison.sh
```
Run only a specific set of engines by passing a comma-separated list of engine keys in `OCR_ENGINES`:
```bash
OCR_ENGINES=pytesseract,chandra ./install_and_run.sh
```
By default, the comparison also sends each image to a local Ollama server at `http://localhost:11434`
and uses `gemma4:latest` and `gemma4:26b` as image-to-text decoders, so the default engine set is equivalent to:
```bash
OCR_ENGINES=pytesseract,chandra,ollama-gemma4-latest,ollama-gemma4-26b ./install_and_run.sh
```
To use a different Ollama endpoint, set `OLLAMA_BASE_URL`:
```bash
OLLAMA_BASE_URL=http://127.0.0.1:11434 ./install_and_run.sh
```
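The Ollama engines talk to Ollama's HTTP API. A minimal sketch of a single OCR call, assuming Ollama's `/api/generate` endpoint with the image base64-encoded in the `images` array and streaming disabled. `build_ollama_request`, `ollama_ocr`, and the prompt text are hypothetical; the repo's adapter may phrase its request differently. The sketch reads the same `OLLAMA_BASE_URL` variable and falls back to the same default endpoint.

```python
import base64
import json
import os
import urllib.request
from typing import Optional

DEFAULT_BASE_URL = "http://localhost:11434"  # default endpoint from the README
# Hypothetical prompt; the repo's actual prompt is not documented in the README.
OCR_PROMPT = "Transcribe all text in this image exactly. Output only the text."


def build_ollama_request(
    model: str, image_bytes: bytes, base_url: Optional[str] = None
) -> urllib.request.Request:
    """Build a non-streaming POST to /api/generate carrying one base64-encoded image."""
    base_url = base_url or os.environ.get("OLLAMA_BASE_URL", DEFAULT_BASE_URL)
    payload = {
        "model": model,
        "prompt": OCR_PROMPT,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of streamed chunks
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_ocr(model: str, image_path: str, timeout: float = 600.0) -> str:
    """Send one image to a local Ollama model and return the generated text."""
    with open(image_path, "rb") as handle:
        request = build_ollama_request(model, handle.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["response"]


# Example (needs a running Ollama server with the model pulled):
# print(ollama_ocr("gemma4:26b", "artifacts/inputs/fake_ocr_hostile.png"))
```

With streaming off, Ollama returns one JSON object whose `response` field holds the generated text; saving that whole object is the kind of raw API response the comparison keeps for inspection.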
List available engine keys:
```bash
python compare_ocr_engines.py --list-engines
```
Install and run the optional LightOnOCR adapter:
```bash
INSTALL_LIGHTONOCR=1 ./install_and_run.sh
```
With `INSTALL_LIGHTONOCR=1`, the default engine set becomes
`pytesseract,chandra,ollama-gemma4-latest,ollama-gemma4-26b,lightonocr`.
You can also run it directly after installing optional dependencies:
```bash
python -m pip install torch transformers pillow
python compare_ocr_engines.py --engines pytesseract,chandra,lightonocr
```
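The engine set comes either from an explicit `OCR_ENGINES` list or from the defaults, which gain `lightonocr` when `INSTALL_LIGHTONOCR=1`. The sketch below models that precedence under the assumption that an explicit `OCR_ENGINES` value replaces the defaults entirely; `resolve_engines` is hypothetical, and the repo's own handling may differ in details such as whitespace or duplicate keys.

```python
import os
from typing import Mapping, Optional

# Default keys as listed in the README.
DEFAULT_ENGINES = ("pytesseract", "chandra", "ollama-gemma4-latest", "ollama-gemma4-26b")


def resolve_engines(environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Hypothetical: explicit OCR_ENGINES wins; otherwise defaults, plus lightonocr if requested."""
    env = os.environ if environ is None else environ
    explicit = env.get("OCR_ENGINES", "").strip()
    if explicit:
        keys = [key.strip() for key in explicit.split(",") if key.strip()]
        return list(dict.fromkeys(keys))  # drop duplicates, keep first-seen order
    engines = list(DEFAULT_ENGINES)
    if env.get("INSTALL_LIGHTONOCR") == "1":
        engines.append("lightonocr")
    return engines


print(resolve_engines({"INSTALL_LIGHTONOCR": "1"}))
# ['pytesseract', 'chandra', 'ollama-gemma4-latest', 'ollama-gemma4-26b', 'lightonocr']
```

Passing a plain dict instead of `os.environ` keeps the function easy to test without touching the real environment.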
To add another model or OCR library, add a class in `compare_ocr_engines.py` with:
- `key`: stable CLI/output identifier
- `display_name`: readable report name
- `prepare()`: one-time setup such as loading binaries, API clients, or model weights
- `recognize(sample, output_dir)`: per-sample OCR call that returns `OcrOutput`

Then register it in `build_engine_registry()`. The comparison pipeline automatically handles timing, metrics, summaries, and rankings for registered engines.
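The adapter contract can be sketched with a toy engine. Everything here is a stand-in under stated assumptions: `OcrOutput` is reduced to a single `text` field, `sample` is assumed to be an image `Path`, and `build_demo_registry` imitates, rather than reproduces, registration in `build_engine_registry()`. The hypothetical `SidecarTextEngine` returns the text stored in a `.txt` file next to each image, which makes it a handy baseline for checking that scoring is wired up correctly. `timed_recognize` is likewise hypothetical and only illustrates the kind of timing the pipeline adds.

```python
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class OcrOutput:
    """Stand-in for the repo's OcrOutput; the real class may carry more fields."""

    text: str


class SidecarTextEngine:
    """Hypothetical baseline engine that returns text stored next to each image."""

    key = "sidecar-text"  # stable CLI/output identifier
    display_name = "Sidecar text (baseline)"  # readable report name

    def prepare(self) -> None:
        # One-time setup; a real adapter would load binaries, API clients, or weights here.
        pass

    def recognize(self, sample: Path, output_dir: Path) -> OcrOutput:
        # Assumes `sample` is an image path whose sidecar is the same name with a .txt suffix.
        text = sample.with_suffix(".txt").read_text(encoding="utf-8")
        output_dir.mkdir(parents=True, exist_ok=True)
        (output_dir / "recognized.txt").write_text(text, encoding="utf-8")
        return OcrOutput(text=text)


def build_demo_registry() -> dict:
    """Hypothetical registry shape: engine key -> engine instance."""
    engines = [SidecarTextEngine()]
    return {engine.key: engine for engine in engines}


def timed_recognize(engine, sample: Path, output_dir: Path) -> tuple[OcrOutput, float]:
    """Run one recognize() call and return its output with elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = engine.recognize(sample, output_dir)
    return result, time.perf_counter() - start
```

Keeping expensive setup in `prepare()` keeps it out of per-sample timings, at least in a wrapper like `timed_recognize` that measures only `recognize()`.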
## Output
After the run, check:
- `artifacts/inputs/` for the generated `fake_ocr_hostile.png` and `fake_ocr_hostile.dcm`
- `artifacts/ocr/chandra/*/recognized.txt` for the OCR text Chandra produced
- `artifacts/ocr/comparison///recognized.txt` for comparison outputs written by engine adapters
- `artifacts/ocr/comparison///raw_response.json` for raw Ollama API responses
- `artifacts/ocr/comparison_summary.json` for machine-readable per-engine timing, text metrics, summaries, failures, and rankings
- `artifacts/ocr/comparison_report.txt` for the human-readable comparison and judgment
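To eyeball what Chandra read for each input, you can gather every file matching the `artifacts/ocr/chandra/*/recognized.txt` layout listed above. The helper below is hypothetical and assumes it runs from the repository root, where `artifacts/` is created.

```python
from pathlib import Path


def collect_chandra_outputs(artifacts_dir: Path = Path("artifacts")) -> dict[str, str]:
    """Map each Chandra run directory name to the text in its recognized.txt."""
    pattern = "ocr/chandra/*/recognized.txt"
    return {
        path.parent.name: path.read_text(encoding="utf-8")
        for path in sorted(artifacts_dir.glob(pattern))
    }


# Print a short preview of each recognized text (prints nothing if no run has happened yet).
for run_name, text in collect_chandra_outputs().items():
    print(f"{run_name}: {text[:60]!r}")
```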
## Notes
- The DICOM file is synthetic and only meant to carry image pixels for this demo.
- The Chandra step uses `chandra --method hf`, so the first run may download model weights and can take a while.
- The Ollama adapters require a running local Ollama server with the `gemma4:latest` and `gemma4:26b` models already pulled (for example, with `ollama pull gemma4:26b`).
- The LightOnOCR adapter uses the Hugging Face `lightonai/LightOnOCR-1B-1025` model by default and may download large model weights on first use.
- The scripts assume `python3`, `venv`, a working `pip`, and the `tesseract` system binary are available on the Linux host.
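Because the pipeline depends on several host prerequisites, a small preflight check can fail fast before a long model download. This is a hypothetical helper, not part of the repo; it checks `PATH` for the binaries, asks the current interpreter whether `venv` and `pip` are importable, and queries Ollama's `/api/tags` endpoint, which lists locally available models.

```python
import importlib.util
import json
import shutil
import sys
import urllib.request

REQUIRED_BINARIES = ("python3", "tesseract")
REQUIRED_MODULES = ("venv", "pip")
REQUIRED_OLLAMA_MODELS = ("gemma4:latest", "gemma4:26b")


def missing_binaries(names=REQUIRED_BINARIES) -> list[str]:
    """Executables that are not on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Modules the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_ollama_models(tags: dict, wanted=REQUIRED_OLLAMA_MODELS) -> list[str]:
    """Models from `wanted` that are absent from an Ollama /api/tags payload."""
    available = {model.get("name") for model in tags.get("models", [])}
    return [name for name in wanted if name not in available]


def fetch_ollama_tags(base_url: str = "http://localhost:11434") -> dict:
    """List locally available models via Ollama's /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=5) as response:
        return json.load(response)


def preflight(base_url: str = "http://localhost:11434") -> int:
    """Print every problem found to stderr; return a shell-style exit code."""
    problems = [f"missing binary: {name}" for name in missing_binaries()]
    problems += [f"missing Python module: {name}" for name in missing_modules()]
    try:
        tags = fetch_ollama_tags(base_url)
        problems += [f"Ollama model not available: {name}" for name in missing_ollama_models(tags)]
    except (OSError, ValueError) as exc:  # URLError subclasses OSError; bad JSON raises ValueError
        problems.append(f"cannot query Ollama at {base_url}: {exc}")
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

Run it with `raise SystemExit(preflight())` using the same interpreter the scripts will use; the Ollama check only matters when the Ollama engines are selected.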