{"id":50718190,"url":"https://github.com/aiptimizer/TurboOCR","last_synced_at":"2026-06-26T22:00:41.439Z","repository":{"id":350011331,"uuid":"1187099783","full_name":"aiptimizer/TurboOCR","owner":"aiptimizer","description":"Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.","archived":false,"fork":false,"pushed_at":"2026-05-11T21:45:46.000Z","size":79083,"stargazers_count":267,"open_issues_count":3,"forks_count":29,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-05-11T23:35:18.724Z","etag":null,"topics":["document-ai","document-parsing","easyocr","fastapi","fp16","gpu-ocr","grpc","inference-server","nvidia","ocr","paddleocr","pdf-extraction","qwen-vl","rag","tensorrt","text-detection","text-recognition"],"latest_commit_sha":null,"homepage":"https://turboocr.com","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aiptimizer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-20T10:42:34.000Z","updated_at":"2026-05-11T21:45:51.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/aiptimizer/TurboOCR","commit_stats":null,"previous_names":["aiptamize/turbo-ocr","aiptimizer/turboocr"],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/aiptimizer/TurboOCR","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aiptimizer%2FTurboOCR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/aiptimizer%2FTurboOCR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aiptimizer%2FTurboOCR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aiptimizer%2FTurboOCR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aiptimizer","download_url":"https://codeload.github.com/aiptimizer/TurboOCR/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aiptimizer%2FTurboOCR/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34834415,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-26T02:00:06.560Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["document-ai","document-parsing","easyocr","fastapi","fp16","gpu-ocr","grpc","inference-server","nvidia","ocr","paddleocr","pdf-extraction","qwen-vl","rag","tensorrt","text-detection","text-recognition"],"created_at":"2026-06-09T21:00:25.963Z","updated_at":"2026-06-26T22:00:41.432Z","avatar_url":"https://github.com/aiptimizer.png","language":"C++","funding_links":[],"categories":["Model Serving \u0026 Inference","*Ops for AI"],"sub_categories":["Inference Optimization","Model Serving \u0026 Inference"],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"tests/benchmark/comparison/images/banner.png\" 
alt=\"Turbo OCR — Fast GPU OCR server. 270 img/s on FUNSD.\" width=\"100%\"\u003e\n\u003c/p\u003e\n\n\u003c!--\nTurbo OCR — Fast GPU OCR server. C++ / CUDA / TensorRT. 270 img/s on FUNSD.\n--\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eGPU-accelerated OCR server. 50x faster than PaddleOCR Python.\u003c/strong\u003e\u003cbr\u003e\n  C++ / CUDA / TensorRT / PP-OCRv5 \u0026mdash; Linux + NVIDIA GPU\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/throughput-270_img%2Fs-blue?style=flat-square\u0026logo=speedtest\u0026logoColor=white\" alt=\"270 img/s\"\u003e\n  \u003ca href=\"https://turboocr.com\"\u003e\u003cimg src=\"https://img.shields.io/badge/website-turboocr.com-3B82F6?style=flat-square\u0026logo=googlechrome\u0026logoColor=white\" alt=\"turboocr.com\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/aiptimizer/TurboOCR/releases/latest\"\u003e\u003cimg src=\"https://img.shields.io/github/v/release/aiptimizer/TurboOCR?style=flat-square\u0026logo=github\u0026logoColor=white\" alt=\"Release\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://ghcr.io/aiptimizer/turboocr\"\u003e\u003cimg src=\"https://img.shields.io/badge/docker-ghcr.io-2496ED?style=flat-square\u0026logo=docker\u0026logoColor=white\" alt=\"Docker\"\u003e\u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/badge/C%2B%2B20-00599C?style=flat-square\u0026logo=cplusplus\u0026logoColor=white\" alt=\"C++20\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/CUDA-76B900?style=flat-square\u0026logo=nvidia\u0026logoColor=white\" alt=\"CUDA\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/TensorRT-10.16-76B900?style=flat-square\u0026logo=nvidia\u0026logoColor=white\" alt=\"TensorRT 10.16\"\u003e\n  \u003ca href=\"https://drogon.org\"\u003e\u003cimg src=\"https://img.shields.io/badge/Drogon-1.9-009688?style=flat-square\u0026logo=cplusplus\u0026logoColor=white\" alt=\"Drogon\"\u003e\u003c/a\u003e\n  \u003cimg 
src=\"https://img.shields.io/badge/nginx-009639?style=flat-square\u0026logo=nginx\u0026logoColor=white\" alt=\"nginx\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/gRPC-4285F4?style=flat-square\u0026logo=google\u0026logoColor=white\" alt=\"gRPC\"\u003e\n  \u003ca href=\"https://github.com/PaddlePaddle/PaddleOCR\"\u003e\u003cimg src=\"https://img.shields.io/badge/PP--OCRv5-PaddleOCR-0053D6?style=flat-square\u0026logo=paddlepaddle\u0026logoColor=white\" alt=\"PaddleOCR\"\u003e\u003c/a\u003e\n  \u003ca href=\"#monitoring\"\u003e\u003cimg src=\"https://img.shields.io/badge/Prometheus-E6522C?style=flat-square\u0026logo=prometheus\u0026logoColor=white\" alt=\"Prometheus\"\u003e\u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/badge/license-MIT-blue?style=flat-square\u0026logo=opensourceinitiative\u0026logoColor=white\" alt=\"MIT License\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#quick-start\"\u003eQuick Start\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#api\"\u003eAPI\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#benchmarks\"\u003eBenchmarks\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#monitoring\"\u003eMonitoring\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#configuration\"\u003eConfiguration\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#building-from-source\"\u003eBuild\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#roadmap\"\u003eRoadmap\u003c/a\u003e \u0026middot;\n  \u003ca href=\"https://turboocr.com\"\u003eWebsite\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n![Turbo-OCR vs alternatives on FUNSD](tests/benchmark/comparison/images/hero.png)\n\n\u003c!--\nThe image above is a chart that may be blocked by robots.txt. 
Below is the chart data as a text table for LLMs, scrapers, and other automated readers.\nTurbo-OCR vs PaddleOCR · EasyOCR · VLMs — FUNSD (50 pages, RTX 5090)\n| Engine | F1 (%) | Throughput (img/s) |\n|---|---:|---:|\n| Turbo-OCR (C++/TRT) | 90.2 | 268.4 |\n| PaddleOCR mobile latin | 86.6 | 5.1 |\n| Qwen3-VL-2B | 84.3 | 1.3 |\n| PaddleOCR-VL (pipeline) | 83.1 | 2.0 |\n| EasyOCR (Python) | 63.0 | 2.8 |\n--\u003e\n\n### Highlights\n\n- 🚀 **270 img/s** on FUNSD A4 forms (c=16) \u0026mdash; **1,200+ img/s** on sparse images\n- ⚡ **11 ms p50 latency**, single request\n- 🎯 **F1 = 90.2%** on FUNSD \u0026mdash; higher accuracy than PaddleOCR Python with the same weights\n- 🖨️ **Prints \u0026 handwriting** \u0026mdash; PP-OCRv5 handles both out of the box\n- 📄 **PDF native** \u0026mdash; pages rendered and OCR'd in parallel\n- 🔒 **4 PDF modes** \u0026mdash; pure OCR, native text layer, auto-dispatch, detection-verified hybrid\n- 🧩 **Layout detection** \u0026mdash; PP-DocLayoutV3 with 25 region classes, per-request `?layout=1` toggle\n- 📖 **Reading order** \u0026mdash; class-aware XY-cut (header → body → footer/reference), row-tolerant table-cell sort, orphan-aware placement, opt-in via `?reading_order=1`\n- 🌐 **HTTP + gRPC** from a single binary, sharing the same GPU pipeline pool\n- 🐳 **One-line Docker deploy** \u0026mdash; `docker run` with auto TRT engine build on first start\n- 📊 **Prometheus metrics** \u0026mdash; request counters, latency histograms, VRAM usage on `/metrics`\n- 🌐 Configurable languages (Latin, e.g. English, French, German, Spanish, Portuguese; Chinese, Greek, Russian, Arabic, Korean, Thai)\n\n*RTX 5090, PP-OCRv5 mobile latin, TensorRT FP16, pool=5. Prints, handwriting, layout detection. 
This is the fast lane.*\n\n### 🗺️ Roadmap\n- 🔍 Structured extraction\n- 📝 Markdown output\n- 📊 Table parsing\n\n---\n\n## Quick Start\n\n**Requirements:** Linux, NVIDIA driver 595+, Turing or newer GPU (RTX 20-series / GTX 16-series+).\n\n```bash\ndocker run --gpus all -p 8000:8000 -p 50051:50051 \\\n  -v trt-cache:/home/ocr/.cache/turbo-ocr \\\n  ghcr.io/aiptimizer/turboocr:v2.3.0\n```\n\nFirst startup builds TensorRT engines from ONNX. This takes about 90 seconds on an RTX 5090 and up to an hour on older GPUs. Set `TRT_OPT_LEVEL=3` to cut build time 3 to 5x with a small speed regression. The volume caches the engines, so subsequent starts are instant. During the build, requests will return a connection refused error from nginx until the backend is ready. nginx (port 8000) reverse-proxies to Drogon (port 8080), and both start automatically.\n\n```bash\ncurl -X POST http://localhost:8000/ocr/raw \\\n  --data-binary @document.png -H \"Content-Type: image/png\"\n```\n\n```json\n{\n  \"results\": [\n    {\"text\": \"Invoice Total\", \"confidence\": 0.97, \"bounding_box\": [[42,10],[210,10],[210,38],[42,38]]}\n  ]\n}\n```\n\n---\n\n## API\n\nHTTP on port 8000, gRPC on port 50051 — single binary, shared GPU pipeline pool.\n\n\u003e **Important:** Use persistent connections (HTTP keep-alive). Opening many short-lived connections (e.g. one `curl` per request in a loop) can overwhelm the server and cause it to stall. All standard HTTP client libraries (`requests.Session`, `aiohttp`, Go `http.Client`, etc.) reuse connections by default.\n\n### Endpoints\n\n| Endpoint | Input | Description |\n|----------|-------|-------------|\n| `/health` | — | Returns `\"ok\"` |\n| `/health/live` | — | Kubernetes liveness probe |\n| `/health/ready` | — | Readiness probe — verifies GPU pipeline is responsive |\n| `/ocr/raw` | Raw image bytes | Fastest path — PNG, JPEG, etc. 
|\n| `/ocr` | `{\"image\": \"\u003cbase64\u003e\"}` | For clients that can only send JSON |\n| `/ocr/batch` | `{\"images\": [\"\u003cb64\u003e\", ...]}` | Multiple images in one request |\n| `/ocr/pixels` | Raw BGR bytes + `X-Width` / `X-Height` / `X-Channels` headers | Zero-decode path — see [/ocr/pixels](#ocrpixels-zero-decode-path) |\n| `/ocr/pdf` | Raw bytes, `{\"pdf\": \"\u003cb64\u003e\"}`, or `multipart/form-data` | All pages OCR'd in parallel |\n| `/metrics` | — | Prometheus metrics (text exposition format) |\n| gRPC | Raw bytes (protobuf) | Port 50051 — see `proto/ocr.proto` |\n\n### Query Parameters\n\n| Parameter | Endpoints | Values | Default |\n|-----------|-----------|--------|---------|\n| `layout` | all | `0` / `1` | `0` — include [layout regions](#layout-detection) (~20% throughput cost) |\n| `reading_order` | image routes | `0` / `1` | `0` — emit `reading_order` array indexing `results` in proper reading order (auto-enables `layout=1`). Class-aware: header → body → footer/footnote/reference; XY-cut on body with row-tolerant table-cell sort and orphan placement |\n| `as_blocks` | image + PDF routes | `0` / `1` | `0` — when `1`, response includes a `blocks` array: paragraph-level aggregate, one entry per non-empty layout cell, in reading order. Auto-enables `layout=1` and `reading_order=1`. Each block has `{id, layout_id, class, bounding_box, content, order_index}`. Mirrors PaddleX PP-StructureV3 `parsing_res_list` granularity. |\n| `mode` | `/ocr/pdf` | `ocr` / `geometric` / `auto` / `auto_verified` | `ocr` — on the CPU binary, `auto_verified` is silently aliased to `auto` (no native text re-verifier on CPU). Inspect the per-page `mode` field in the response to see which path actually ran. |\n| `dpi` | `/ocr/pdf` | `50`–`600` | `100` — render resolution |\n\n**Parameter parsing rules.** Parameter *names* are case-sensitive: `?layout=1` works, `?Layout=1` is silently ignored. 
Boolean values for `layout`, `reading_order`, and `as_blocks` accept any case of `1/0`, `true/false`, `on/off`, `yes/no`, and reject anything else with `400 INVALID_PARAMETER`. Values for `mode=` are **case-sensitive and silently fall back to the configured default** when unrecognized — `?mode=Auto`, `?mode=AUTO`, or `?mode=foobar` all run as `mode=ocr` (or whatever `ENABLE_PDF_MODE` is set to) without error. Always pass exactly `ocr`, `geometric`, `auto`, or `auto_verified`.\n\n### Examples\n\n```bash\n# Image — raw bytes (fastest)\ncurl -X POST http://localhost:8000/ocr/raw \\\n  --data-binary @doc.png -H \"Content-Type: image/png\"\n\n# Image — base64 JSON\ncurl -X POST http://localhost:8000/ocr \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"image\":\"'$(base64 -w0 doc.png)'\"}'\n\n# PDF — raw bytes\ncurl -X POST http://localhost:8000/ocr/pdf \\\n  --data-binary @document.pdf\n\n# PDF — multipart (works from any client, including browsers)\ncurl -X POST http://localhost:8000/ocr/pdf \\\n  -F \"file=@document.pdf\"\n\n# PDF — with layout + auto mode\ncurl -X POST \"http://localhost:8000/ocr/pdf?layout=1\u0026mode=auto\" \\\n  --data-binary @document.pdf\n\n# gRPC (grpcurl uses base64 for CLI; real clients send raw bytes)\ngrpcurl -plaintext -d '{\"image\":\"'$(base64 -w0 doc.png)'\"}' \\\n  localhost:50051 ocr.OCRService/Recognize\n```\n\n### `/ocr/pixels` (zero-decode path)\n\nFor clients that already hold a decoded image in memory (NumPy, OpenCV, custom pipelines), `/ocr/pixels` skips the PNG/JPEG decode step entirely. 
The body is sent as raw pixel bytes; dimensions travel in HTTP headers.\n\n| Header | Required | Values | Meaning |\n|--------|:---:|--------|---------|\n| `X-Width` | yes | `1`–`MAX_IMAGE_DIM` (default `16384`) | Image width in pixels |\n| `X-Height` | yes | `1`–`MAX_IMAGE_DIM` (default `16384`) | Image height in pixels |\n| `X-Channels` | no | `1` or `3` (default `3`) | `3` = BGR (OpenCV order, **not** RGB), `1` = grayscale |\n\n- **Body:** raw pixel bytes, length must equal `width * height * channels` exactly. A mismatch returns `400 BODY_SIZE_MISMATCH`.\n- **Query parameters:** the same `?layout=` and `?reading_order=` as `/ocr` apply.\n- **Errors:** `MISSING_HEADER` (no `X-Width` / `X-Height`), `INVALID_HEADER` (unparseable values), `INVALID_DIMENSIONS` (non-positive size or channels other than 1/3), `DIMENSIONS_TOO_LARGE` (exceeds `MAX_IMAGE_DIM`).\n- **Use case:** the hot path when upstream code already has a decoded `cv::Mat` / `np.ndarray` and you don't want to round-trip through PNG.\n\n```bash\n# Python — send a decoded OpenCV image (BGR)\npython -c \"\nimport cv2, requests\nimg = cv2.imread('doc.png')        # BGR, HxWx3\nh, w, c = img.shape\nrequests.post('http://localhost:8000/ocr/pixels',\n              data=img.tobytes(),\n              headers={'X-Width': str(w), 'X-Height': str(h), 'X-Channels': str(c)})\n\"\n```\n\n### Response Format\n\n**Image endpoints** return:\n```json\n{\"results\": [{\"text\": \"Invoice Total\", \"confidence\": 0.97, \"bounding_box\": [[42,10],[210,10],[210,38],[42,38]]}]}\n```\n\n**With `?layout=1`**, a `layout` array is added. 
Each OCR result gets a `layout_id` linking it to the containing layout region:\n```json\n{\n  \"results\": [{\"text\": \"...\", \"confidence\": 0.97, \"id\": 0, \"layout_id\": 2, \"bounding_box\": [...]}],\n  \"layout\": [{\"id\": 0, \"class\": \"header\", \"confidence\": 0.91, \"bounding_box\": [...]},\n             {\"id\": 2, \"class\": \"table\", \"confidence\": 0.95, \"bounding_box\": [...]}]\n}\n```\n\n**PDF endpoint** wraps results per page:\n```json\n{\n  \"pages\": [{\n    \"page\": 1, \"page_index\": 0, \"dpi\": 100, \"width\": 1047, \"height\": 1389,\n    \"mode\": \"ocr\", \"text_layer_quality\": \"absent\", \"results\": [...]\n  }]\n}\n```\nCoordinate conversion: `x_pdf = x_px * 72 / dpi`.\n\nPer-page fields:\n- `mode` — the **resolved** mode that actually ran on this page (`ocr` / `geometric` / `auto_verified`). For `?mode=auto` requests, each page resolves to either `geometric` (text layer accepted) or `ocr` (fell back to OCR), never `auto`. On the CPU binary, `?mode=auto_verified` resolves to `auto` semantics, so per-page `mode` will be `geometric` or `ocr` — `auto_verified` only appears on the GPU binary.\n- `text_layer_quality` — assessment of the page's native text layer:\n  - `absent` — no usable text layer (image-only PDF, fewer than 10 chars, or empty lines)\n  - `rejected` — text layer present but failed sanity checks (non-zero rotation, \u003e5% replacement chars, \u003e10% non-printable chars)\n  - `trusted` — native text passed sanity checks and was used (`geometric` / `auto`) or considered for cross-check (`auto_verified`)\n  - For `mode=ocr` this is always `absent` (the text-layer pre-pass is skipped entirely).\n\n### PDF Extraction Modes\n\n| Mode | What it does | Speed |\n|------|-------------|-------|\n| `ocr` | Render + full OCR pipeline | Baseline |\n| `geometric` | PDFium text layer only, no rasterization | ~10x faster |\n| `auto` | Per-page: text layer if available, else OCR | Fastest for mixed PDFs |\n| `auto_verified` | Full 
pipeline + replace with native text where sanity check passes | Slightly slower than OCR |\n\n\u003e [!CAUTION]\n\u003e **PDF text-layer trust model.** Modes other than `ocr` read the PDF's native text layer, which the PDF author controls. A malicious PDF can embed invisible text, remap glyphs via ToUnicode, or inject arbitrary strings that differ from what's visually rendered.\n\u003e\n\u003e **When to use each mode:**\n\u003e | Scenario | Recommended mode | Why |\n\u003e |----------|-----------------|-----|\n\u003e | Untrusted uploads (user-submitted PDFs) | `ocr` | Only trusts pixel data — immune to text-layer manipulation |\n\u003e | Internal/trusted documents | `auto` or `geometric` | Safe when you control the PDF source; much faster |\n\u003e | High-accuracy with verification | `auto_verified` | OCR runs first, then results are cross-checked against the text layer. Accepts native text only if it passes heuristic validation (character count, non-printable ratio \u003c 10%, replacement char ratio \u003c 5%, no rotation) |\n\u003e\n\u003e **Default:** `mode=ocr` (safest). Override per-request via `?mode=` query parameter or globally via `ENABLE_PDF_MODE` env var.\n\u003e\n\u003e **Deployment recommendation:** If your service accepts PDFs from untrusted sources, do **not** set `ENABLE_PDF_MODE` to `geometric` or `auto` globally. 
Keep the default `ocr` and only use text-layer modes for trusted internal workflows.\n\n### Layout Detection\n\nAll endpoints accept `?layout=1` to detect document regions using [PP-DocLayoutV3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3) (25 classes):\n\n`abstract` · `algorithm` · `aside_text` · `chart` · `content` · `display_formula` · `doc_title` · `figure_title` · `footer` · `footer_image` · `footnote` · `formula_number` · `header` · `header_image` · `image` · `inline_formula` · `number` · `paragraph_title` · `reference` · `reference_content` · `seal` · `table` · `text` · `vertical_text` · `vision_footnote`\n\n#### Layout classes (reading-order buckets)\n\nWhen `?reading_order=1` is set, classes are partitioned into three strata before XY-cut runs, so common page furniture lands in the right slot regardless of where the layout model placed it spatially: `TOP` is read first, then `BODY` (sorted by XY-cut), then `BOTTOM`.\n\n| Class ID | Name | Bucket |\n|---:|---|---|\n| 0  | `abstract`           | BODY   |\n| 1  | `algorithm`          | BODY   |\n| 2  | `aside_text`         | BODY   |\n| 3  | `chart`              | BODY   |\n| 4  | `content`            | BODY   |\n| 5  | `display_formula`    | BODY   |\n| 6  | `doc_title`          | BODY   |\n| 7  | `figure_title`       | BODY   |\n| 8  | `footer`             | BOTTOM |\n| 9  | `footer_image`       | BOTTOM |\n| 10 | `footnote`           | BOTTOM |\n| 11 | `formula_number`     | BODY   |\n| 12 | `header`             | TOP    |\n| 13 | `header_image`       | TOP    |\n| 14 | `image`              | BODY   |\n| 15 | `inline_formula`     | BODY   |\n| 16 | `number`             | BODY   |\n| 17 | `paragraph_title`    | BODY   |\n| 18 | `reference`          | BOTTOM |\n| 19 | `reference_content`  | BOTTOM |\n| 20 | `seal`               | BODY   |\n| 21 | `table`              | BODY   |\n| 22 | `text`               | BODY   |\n| 23 | `vertical_text`      | BODY   |\n| 24 | `vision_footnote`    | BOTTOM 
|\n\nClass 16 (`number`, page numbers) deliberately stays in BODY because page numbers can appear at the top **or** the bottom of a page — XY-cut places them by geometry. Class IDs are pinned with `static_assert` against the PaddleX label list, so a future re-shuffle would fail the build rather than silently misroute classes.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"tests/benchmark/comparison/images/layout_example.png\" alt=\"Layout detection overlay\" width=\"500\"\u003e\n  \u003cbr\u003e\u003csub\u003eLayout detection overlay — color-coded regions: \u003cspan style=\"color:#9C27B0\"\u003eparagraph_title\u003c/span\u003e, \u003cspan style=\"color:#2196F3\"\u003etext\u003c/span\u003e, \u003cspan style=\"color:#00BCD4\"\u003echart\u003c/span\u003e, \u003cspan style=\"color:#FFC107\"\u003efigure_title\u003c/span\u003e, \u003cspan style=\"color:#F44336\"\u003eheader\u003c/span\u003e, \u003cspan style=\"color:#607D8B\"\u003efooter\u003c/span\u003e, \u003cspan style=\"color:#646464\"\u003enumber\u003c/span\u003e\u003c/sub\u003e\n\u003c/p\u003e\n\n---\n\n## Benchmarks\n\nFUNSD form-understanding dataset (50 pages, ~170 words/page). Same word-level F1 metric for all engines. 
Single RTX 5090.\n\n![Accuracy](tests/benchmark/comparison/images/accuracy_v2.png)\n\n\u003c!--\nOCR Accuracy — FUNSD · 50 images · ~174 words/img\n| Engine | F1 (%) | Recall (%) | Precision (%) |\n|---|---:|---:|---:|\n| Turbo-OCR (C++/TRT) | 90.2 | 91.6 | 88.8 |\n| PaddleOCR mobile latin | 86.6 | 85.5 | 88.2 |\n| Qwen3-VL-2B | 84.3 | 82.8 | 87.5 |\n| PaddleOCR-VL (pipeline) | 83.1 | 82.5 | 85.0 |\n| EasyOCR (Python) | 63.0 | 66.2 | 60.4 |\n--\u003e\n\n![Throughput](tests/benchmark/comparison/images/throughput_v2.png)\n\n\u003c!--\nOCR Throughput — FUNSD Dataset · Higher is Better\n| Engine | Throughput (img/s) |\n|---|---:|\n| Turbo-OCR (C++/TRT) | 268.4 |\n| PaddleOCR mobile latin | 5.1 |\n| EasyOCR (Python) | 2.8 |\n| PaddleOCR-VL (pipeline) | 2.0 |\n| Qwen3-VL-2B | 1.3 |\n--\u003e\n\n![Latency](tests/benchmark/comparison/images/latency_v2.png)\n\n\u003c!--\nOCR Latency — FUNSD Dataset · Lower is Better\n| Engine | p50 (ms) | p95 (ms) |\n|---|---:|---:|\n| Turbo-OCR (C++/TRT) | 11 | 16 |\n| PaddleOCR mobile latin | 182 | 352 |\n| Qwen3-VL-2B | 2859 | 6191 |\n| PaddleOCR-VL (pipeline) | 1513 | 6517 |\n| EasyOCR (Python) | 559 | 948 |\n--\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eBenchmark caveats\u003c/summary\u003e\n\n- **Crude accuracy metric.** Bag-of-words F1 ignores order and duplicate counts. CER or reading-order metrics would likely help VLM systems.\n- **VLMs could run faster.** Served via off-the-shelf vLLM in fp16. 
Quantization, speculative decoding, or a dedicated stack would push throughput higher.\n- **VLM prompts are untuned.** With prompt engineering both VLMs would likely surpass every CTC engine here.\n- **Single domain.** FUNSD is English business forms; other document types would look different.\n\nReproduce: `python tests/benchmark/comparison/bench_turbo_ocr.py` (requires running server + `datasets` library).\n\u003c/details\u003e\n\n---\n\n## Configuration\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `OCR_LANG` | *(unset = latin)* | Language bundle: `latin`, `chinese`, `greek`, `eslav`, `arabic`, `korean`, `thai`. All bundles are baked into the image at build time — no runtime download. |\n| `OCR_SERVER` | *(unset)* | With `OCR_LANG=chinese`, set to `1` to use the 84 MB PP-OCRv5 server rec instead of the 16 MB mobile rec. Ignored for other languages. |\n| `PIPELINE_POOL_SIZE` | auto | Concurrent GPU pipelines (~1.4 GB each) |\n| `DISABLE_LAYOUT` | `0` | Set to `1` to disable PP-DocLayoutV3 layout detection and save ~300-500 MB VRAM |\n| `ENABLE_PDF_MODE` | `ocr` | Default PDF mode: `ocr` / `geometric` / `auto` / `auto_verified` |\n| `DISABLE_ANGLE_CLS` | `0` | Skip angle classifier (~0.4 ms savings) |\n| `DET_MAX_SIDE` | `960` | Max detection input side (px). Bounds: 32–4096. The TRT engine profile is built to match this value; changing it invalidates the cached engine and triggers a one-time rebuild. |\n| `TRT_OPT_LEVEL` | `5` | TensorRT builder optimization level. Bounds: 0–5. Lower values trade runtime perf for faster cold builds (`3` typically builds ~3-5× faster with \u003c5% runtime regression). The cache key includes the level, so different values produce separate engines. |\n| `TRT_ENGINE_CACHE` | `~/.cache/turbo-ocr` | Directory for cached TensorRT engines. Set to a host-mounted path to share engines across container restarts. |\n| `TURBO_OCR_HOST` | `0.0.0.0` | Bind address for HTTP and gRPC listeners. 
Default binds every IPv4 interface; use `127.0.0.1` for loopback only, `::` for all interfaces incl. IPv6, or a specific interface IP. Equivalent CLI flag: `--host`. |\n| `PORT` / `GRPC_PORT` | `8080` / `50051` | Server ports. The binary listens on `PORT=8080` by default; the Docker image runs nginx in front of it on port `8000`, so external clients use `8000` and `PORT` only matters for direct/native runs. |\n| `PDF_DAEMONS` / `PDF_WORKERS` | `16` / `4` | PDF render parallelism |\n| `GRPC_BATCH_WORKERS` | `8` | Parallel workers in gRPC `RecognizeBatch` for fan-out across pipeline pool |\n| `HTTP_THREADS` | `pool * 32` | Work pool threads for blocking inference |\n| `MAX_PDF_PAGES` | `2000` | Maximum pages per PDF request |\n| `SHUTDOWN_GRACE_SECONDS` | `30` | Seconds to wait for inflight requests to drain on SIGTERM/SIGINT before tearing down. Set to stay below your orchestrator's SIGKILL grace (K8s default 30s). |\n| `GRPC_CQS` | `10` | Number of gRPC completion queues. Higher values trade memory for connection-handling parallelism on high-fanout deployments. |\n| `GRPC_RESPONSE_MODE` | `json_bytes` | gRPC response format: `json_bytes` (default — full JSON in `json_response` field) or `structured` (typed protobuf fields). |\n| `MAX_BODY_MB` | `100` | Max request body size in MB. Applied at all three layers: nginx (413 at proxy), Drogon HTTP (`setClientMaxBodySize`), and gRPC (`SetMaxReceive/SendMessageSize`). Bounds: 1–102400. |\n| `MAX_BODY_MEMORY_MB` | `min(1024, MAX_BODY_MB)` — effectively `100` with stock config | Per-request in-memory buffer threshold. Bodies up to this size stay in RAM; larger ones spill to a tempfile under `/tmp`. Always clamped to `[1, MAX_BODY_MB]`, so the effective default tracks `MAX_BODY_MB`. Raise `MAX_BODY_MB` first to unlock larger in-memory buffers. Lower on memory-constrained hosts (e.g. `MAX_BODY_MEMORY_MB=50` caps buffer RSS at ~50 MB × concurrent requests). 
|\n| `MAX_IMAGE_DIM` | `16384` | Max width or height (px) accepted on `/ocr/pixels` and image-decode routes. Bounds: 64–65535. |\n| `LOG_LEVEL` | `info` | Log level: `debug` / `info` / `warn` / `error` |\n| `LOG_FORMAT` | `json` | Log format: `json` (structured) / `text` (human-readable) |\n| `TOCR_LOG_RATELIMIT` | `10` | Max rate-limited logs per call site per 1s window (applies to per-request error paths). `0` disables. Format `N` or `N:W_MS` (e.g. `5:2000` = 5 logs / 2s). On window roll a single `[suppressed logs]` rollup line is emitted. |\n\nEvery knob above is also exposed as a CLI flag (`--http-port`, `--max-body-mb`, `--disable-layout`, `--det-max-side`, `--log-level`, etc.). The two exceptions, which remain env-only because their valid set is context-dependent, are `OCR_LANG` (validated against installed model bundles at first request) and `TOCR_LOG_RATELIMIT` (custom `N` or `N:W_MS` format). CLI flags override env vars when both are set. Useful flags for inspection:\n\n```\npaddle_highspeed_cpp --help            # full flag listing\npaddle_highspeed_cpp --print-config    # resolved JSON config; exit 0\npaddle_highspeed_cpp --check-config    # validate only; exit 0 on valid, 2 on errors\n```\n\nMalformed env vars or out-of-range values cause startup to fail with a clear error list — the server refuses to bind rather than silently coerce bad input (e.g. `PORT=abc` used to become `1`; it now exits with `[config error] PORT=\"abc\" is not a valid integer`). Validate config without booting the pipeline using `--check-config`.\n\nLayout detection is **enabled by default**. The model is loaded at startup but only runs when a request includes `?layout=1`. Requests without `?layout=1` have zero overhead. Requests with `?layout=1` reduce throughput by ~20%. Set `DISABLE_LAYOUT=1` to skip loading the model entirely and save ~300-500 MB VRAM.\n\n\u003e **Migration note (v2.3+):** The legacy `ENABLE_LAYOUT` env var has been removed. 
If set, startup fails with a clear error — use `DISABLE_LAYOUT=1` to disable layout, or remove the var (layout is on by default).\n\n```bash\ndocker run --gpus all -p 8000:8000 \\\n  -v trt-cache:/home/ocr/.cache/turbo-ocr \\\n  -e PIPELINE_POOL_SIZE=3 \\\n  ghcr.io/aiptimizer/turboocr:v2.3.0\n```\n\nAdd `MAX_PDF_PAGES` (default `2000`) to limit the number of pages processed per PDF request. `LOG_LEVEL` (`debug`/`info`/`warn`/`error`) and `LOG_FORMAT` (`json`/`text`) control structured logging output.\n\n---\n\n## Monitoring\n\n### Prometheus Metrics\n\nScrape `GET /metrics` for Prometheus-compatible metrics:\n\n```\nturbo_ocr_requests_total{route=\"/ocr/raw\",status=\"2xx\"} 1042\nturbo_ocr_request_duration_seconds_bucket{route=\"/ocr/raw\",le=\"0.025\"} 980\nturbo_ocr_request_duration_seconds_sum{route=\"/ocr/raw\"} 12.345\nturbo_ocr_request_duration_seconds_count{route=\"/ocr/raw\"} 1042\nturbo_ocr_gpu_vram_used_bytes 9052815360\nturbo_ocr_gpu_vram_total_bytes 33661911040\nturbo_ocr_pipeline_pool_size 5\nturbo_ocr_pool_exhaustions_total 0\nturbo_ocr_request_bytes_total 49493243\nturbo_ocr_request_body_avg_bytes 9407\n```\n\n### Response Headers\n\nEvery response includes:\n\n| Header | Description |\n|--------|-------------|\n| `X-Request-Id` | UUID v7 (or propagated from client `X-Request-Id` header) |\n| `X-Inference-Time-Ms` | End-to-end processing time in milliseconds |\n| `Retry-After` | Seconds to wait (only on 503 responses) |\n\n### Health Endpoints\n\n| Endpoint | Description |\n|----------|-------------|\n| `GET /health` | Basic liveness check |\n| `GET /health/live` | Kubernetes liveness probe |\n| `GET /health/ready` | Readiness probe \u0026mdash; verifies GPU pipeline is responsive |\n\n### Structured Errors\n\nAll error responses return JSON with `Content-Type: application/json`:\n\n```json\n{\"error\": {\"code\": \"EMPTY_BODY\", \"message\": \"Empty body\"}}\n```\n\nError codes: `EMPTY_BODY`, `INVALID_JSON`, `MISSING_IMAGE`, 
`BASE64_DECODE_FAILED`, `IMAGE_DECODE_FAILED`, `INVALID_PARAMETER`, `UNSUPPORTED_PARAMETER`, `INVALID_DPI`, `INVALID_DIMENSIONS`, `DIMENSIONS_TOO_LARGE`, `BODY_SIZE_MISMATCH`, `MISSING_HEADER`, `INVALID_HEADER`, `EMPTY_BATCH`, `MISSING_FILE`, `MISSING_PDF`, `INVALID_MULTIPART`, `PDF_RENDER_FAILED`, `PDF_TOO_LARGE`, `EMPTY_PDF`, `SERVER_BUSY`, `NOT_READY`, `INFERENCE_ERROR`.\n\n---\n\n## Building from Source\n\n| Dependency | GPU | CPU |\n|-----------|:---:|:---:|\n| GCC 13.3+ / C++20 | x | x |\n| CUDA + TensorRT 10.2+ | x | |\n| OpenCV 4.x | x | x |\n| Drogon 1.9+ | x | x |\n| gRPC + Protobuf | x | |\n| ONNX Runtime 1.22+ | | x |\n\nWuffs, Clipper, PDFium vendored in `third_party/`.\n\n```bash\n# Docker (recommended)\ndocker build -f docker/Dockerfile.gpu -t turboocr .\ndocker run --gpus all -p 8000:8000 -p 50051:50051 \\\n  -v trt-cache:/home/ocr/.cache/turbo-ocr turboocr\n\n# CPU only (Docker) — ~2-3 img/s, mainly for testing\ndocker build -f docker/Dockerfile.cpu -t turboocr-cpu .\ndocker run -p 8000:8000 turboocr-cpu\n\n# Native build — PP-OCRv5 models auto-fetched into ./models/ on first build\ncmake -B build -DTENSORRT_DIR=/usr/local/tensorrt\ncmake --build build -j$(nproc)\nLD_LIBRARY_PATH=/usr/local/tensorrt/lib ./build/paddle_highspeed_cpp\n\n# CPU-only native\ncmake -B build_cpu -DUSE_CPU_ONLY=ON\ncmake --build build_cpu -j$(nproc)\n./build_cpu/paddle_cpu_server\n\n# If your distro's gRPC CMake config conflicts with system protobuf,\n# add -DCMAKE_DISABLE_FIND_PACKAGE_gRPC=ON to fall back to pkg-config.\n# To skip the model auto-fetch (e.g. in CI), add -DFETCH_MODELS=OFF.\n\n# CUDA SM target. Native builds default to sm_120 (Blackwell, RTX 50-series)\n# only — the full multi-arch fat binary is ~12.5 GB and adds 10-15 s of\n# PTX-JIT to first-start on cold cache. 
To target other GPUs, opt back in:\n#   cmake -B build -DCMAKE_CUDA_ARCHITECTURES=\"86;89;120\" ...\n# Reference: 75=Turing, 80=A100, 86=Ampere consumer, 89=Ada, 90=Hopper,\n# 100=Blackwell DC, 120=Blackwell consumer.\n```\n\n---\n\n## Supported Languages\n\nSet via the `OCR_LANG` environment variable. Every supported language bundle is baked into the image at build time from the pinned PP-OCRv5 GitHub Release (SHA256-verified). No runtime downloads, no network dependency at container start.\n\n| `OCR_LANG` | Script / family | Notes |\n|---|---|---|\n| *(unset)* / `latin` | Latin + basic Greek (English, German, French, Italian, Polish, Czech, …) | 836-char dict; what powers the benchmarks above |\n| `chinese` | Simplified + Traditional Chinese | 18,385-class mobile rec (16 MB); set `OCR_SERVER=1` for the 84 MB server variant |\n| `greek` | dedicated Greek rec | 356-class Greek-specialized rec (7.8 MB) — higher accuracy than Latin's combined dict |\n| `korean` | Hangul + basic Latin | 11,947-class rec (13 MB) |\n| `arabic`, `eslav`, `thai` | per-script PP-OCRv5 | 7-8 MB each |\n\n```bash\n# Chinese\ndocker run --gpus all -p 8000:8000 -p 50051:50051 \\\n  -v trt-cache:/home/ocr/.cache/turbo-ocr \\\n  -e OCR_LANG=chinese \\\n  ghcr.io/aiptimizer/turboocr:v2.3.0\n```\n\n\u003e **Volume tip:** use a **named** volume (`trt-cache:`) as shown above, not a\n\u003e host bind-mount. Named volumes auto-populate from the image on first use,\n\u003e so the baked language bundles survive. 
A bind-mount of an empty host\n\u003e directory would shadow `/home/ocr/.cache/turbo-ocr` and leave the server\n\u003e with nothing to load.\n\nRun `tests/language_smoketest.py` to verify any language end-to-end on your\nhardware (renders a short phrase, OCRs it, checks char-recall against a\nper-language threshold).\n\n---\n\n## Acknowledgements\n\nThis project builds on the work of several open-source projects:\n\n- **[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)** (Baidu) — PP-OCRv5 detection, recognition, and classification models. PP-DocLayoutV3 layout detection model. This project would not exist without their research and pre-trained weights.\n- **[Drogon](https://drogon.org)** — high-performance async C++ HTTP framework\n- **[Wuffs](https://github.com/google/wuffs)** — fast PNG decoder by Google (vendored)\n- **[PDFium](https://pdfium.googlesource.com/pdfium/)** — PDF rendering and text extraction (vendored)\n- **[Clipper](http://www.angusj.com/delphi/clipper.php)** — polygon clipping for text detection post-processing (vendored)\n\n## License\n\nMIT. See [LICENSE](LICENSE).\n\n\u003cp align=\"center\"\u003e\n  \u003csub\u003eMain Sponsor: \u003ca href=\"https://miruiq.com\"\u003e\u003cstrong\u003eMiruiq\u003c/strong\u003e\u003c/a\u003e — AI-powered data extraction from PDFs and documents.\u003c/sub\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faiptimizer%2FTurboOCR","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faiptimizer%2FTurboOCR","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faiptimizer%2FTurboOCR/lists"}