{"id":51236748,"url":"https://github.com/kisaesdevlab/vibe-ocr-extractor","last_synced_at":"2026-06-28T21:01:19.035Z","repository":{"id":367137670,"uuid":"1279417966","full_name":"KisaesDevLab/Vibe-OCR-Extractor","owner":"KisaesDevLab","description":null,"archived":false,"fork":false,"pushed_at":"2026-06-24T18:16:41.000Z","size":36,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-24T19:08:23.942Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KisaesDevLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-24T17:07:06.000Z","updated_at":"2026-06-24T18:16:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/KisaesDevLab/Vibe-OCR-Extractor","commit_stats":null,"previous_names":["kisaesdevlab/vibe-ocr-extractor"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/KisaesDevLab/Vibe-OCR-Extractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KisaesDevLab%2FVibe-OCR-Extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KisaesDevLab%2FVibe-OCR-Extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KisaesDevLab%2FVibe-OCR-Extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KisaesDevLab%2FVibe-OCR-Extractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KisaesDevLab","download_url":"https://codeload.github.com/KisaesDevLab/Vibe-OCR-Extractor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KisaesDevLab%2FVibe-OCR-Extractor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34903523,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-28T02:00:05.809Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-28T21:01:17.949Z","updated_at":"2026-06-28T21:01:19.027Z","avatar_url":"https://github.com/KisaesDevLab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Vibe-OCR-Extractor\n\nA simple web GUI to upload a **PDF or image**, run it through your **local\nGLM-OCR** model (served with **llama.cpp**), view the extracted text, and\ndownload it as a `.txt` file.\n\n![flow](https://img.shields.io/badge/upload-%E2%86%92%20OCR%20%E2%86%92%20view%20%E2%86%92%20download-6c8cff)\n\n## Features\n\n- 📤 Drag-and-drop (or browse) upload for PDFs and common image formats\n- 📝 **Text-layer detection** — for PDFs that already contain a text layer, the\n  exact text is pulled with **pdf.js**, mirroring the\n  [Vibe-Transaction-Convertor](https://github.com/KisaesDevLab/Vibe-Transaction-Convertor)\n  so you see precisely what the converter will receive (no OCR needed)\n- 🧠 OCR via your **local GLM-OCR** server (OpenAI-compatible vision API)\n- ⚙️ **In-app Settings panel** — set the server IP/URL, model, API key, PDF\n  DPI, timeout, and OCR prompt right from the browser (with a *Test connection*\n  button). Changes are saved and persist across restarts.\n- 📄 Multi-page PDFs are rasterized and OCR'd page by page\n- 👀 View / edit the extracted text in the browser\n- 💾 One-click download to a `.txt` file (any edits you make are included)\n- 📋 Copy to clipboard\n- 🐳 Runs as a Docker image, published to **GHCR**\n\n## How it works\n\n```\nBrowser  ──upload──▶  Flask (app.py)\n                         │\n                         ├─ PDF → images (PyMuPDF)   image → normalized (Pillow)\n                         │\n                         └──▶  llama.cpp server (/v1/chat/completions, vision)\n                                        │\n                         ◀──────────  extracted text  ──────────┘\n```\n\nThis app is the front end — point it at any OpenAI-compatible endpoint with\nvision support. It is built and tested against **llama.cpp** (`llama-server`),\nwhich serves GLM-OCR on port `8080` by default.\n\n## Text layer vs OCR\n\nBank/credit-card PDFs come in two flavors: **digitally generated** (they carry a\nreal text layer) and **scanned** (just images). The converter extracts the text\nlayer when present and only OCRs scans. This app does the same so you can preview\nthe converter's exact input.\n\nDetection runs the same logic as the converter's `preprocess.ts` (via a small\nNode + `pdfjs-dist` helper in [`pdf_text/`](pdf_text/)):\n\n| Result | Meaning | What you get |\n| ------ | ------- | ------------ |\n| **text**   | \u003e50% of pages have text **and** \u003e100 avg chars/page | pdf.js text layer for every page |\n| **ocr**    | no text layer at all (a scan) | GLM-OCR for every page |\n| **hybrid** | some pages have text, others don't | text layer per text page, OCR for the scanned pages |\n\nThe **Extraction mode** setting controls this:\n\n- **Auto** *(default)* — detect the text layer and follow the routing above\n- **Text layer only** — always use pdf.js (errors if no extractor is installed)\n- **OCR only** — always rasterize and OCR, ignoring any text layer\n\nAfter extraction the UI shows which method was used, the detected route,\ntext-layer coverage, average chars/page, and a per-page method breakdown.\n\n\u003e **Node requirement:** text-layer extraction uses `pdfjs-dist` and needs\n\u003e Node.js. The Docker image bundles it. For a local Python run, install it once:\n\u003e `cd pdf_text \u0026\u0026 npm install`. If Node isn't available, **Auto** falls back to\n\u003e OCR and **Text layer only** reports an error.\n\n## Settings (configurable from the UI)\n\nClick the **⚙️ gear** in the top-right to open Settings. You can change:\n\n| Setting           | Description                                              |\n| ----------------- | -------------------------------------------------------- |\n| **Extraction mode** | Auto / Text layer only / OCR only (see above)          |\n| **GLM-OCR Base URL** | Your llama.cpp endpoint, e.g. `http://localhost:8080/v1` |\n| **Model name**    | Model id as registered on the server (`glm-ocr`)         |\n| **API key**       | Usually ignored by llama.cpp; leave as `EMPTY`           |\n| **PDF render DPI**| Rasterization quality for PDFs (50–600)                  |\n| **Timeout**       | Per-page request timeout in seconds (5–1800)             |\n| **OCR prompt**    | The instruction sent with each image                     |\n\nUse **🔌 Test connection** to confirm the server is reachable, or **Reset to\ndefaults** to revert. Settings are stored in a JSON file (`SETTINGS_FILE`) so\nthey survive restarts.\n\n## Quick start (local Python)\n\n```bash\n# 1. Install dependencies\npip install -r requirements.txt\n\n# 2. Install the pdf.js text-layer extractor (needs Node.js)\ncd pdf_text \u0026\u0026 npm install \u0026\u0026 cd ..\n\n# 3. (Optional) set defaults — or just configure them later in the UI\nexport GLM_OCR_BASE_URL=\"http://localhost:8080/v1\"   # llama.cpp default port\nexport GLM_OCR_MODEL=\"glm-ocr\"\n\n# 4. Run the web app\npython app.py\n\n# 5. Open the GUI\n#    http://127.0.0.1:5000\n```\n\n\u003e Step 2 is optional — the app runs without it, but text-layer detection is\n\u003e disabled (everything goes through OCR). The Docker image includes Node, so no\n\u003e extra step is needed there.\n\n## Run with Docker\n\n### Use the published GHCR image\n\n```bash\ndocker run --rm -p 5000:5000 \\\n  --add-host=host.docker.internal:host-gateway \\\n  -e GLM_OCR_BASE_URL=\"http://host.docker.internal:8080/v1\" \\\n  -e GLM_OCR_MODEL=\"glm-ocr\" \\\n  -v ocr-settings:/data \\\n  ghcr.io/kisaesdevlab/vibe-ocr-extractor:latest\n```\n\nThen open \u003chttp://localhost:5000\u003e.\n\n\u003e **Reaching llama.cpp on the host:** from inside the container, `localhost`\n\u003e is the container itself. Use `host.docker.internal` (enabled by the\n\u003e `--add-host` flag above) to reach a `llama-server` running on your host\n\u003e machine — e.g. `http://host.docker.internal:8080/v1`.\n\n### Or with Docker Compose\n\n```bash\ndocker compose up -d\n```\n\n`docker-compose.yml` wires up the port, the `host.docker.internal` mapping, and\na named volume for persisted settings. Edit the environment block to match your\nsetup (or set everything from the Settings UI afterwards).\n\n### Build the image yourself\n\n```bash\ndocker build -t vibe-ocr-extractor .\ndocker run --rm -p 5000:5000 vibe-ocr-extractor\n```\n\n## Container image (GHCR)\n\nEvery push to `main` (and every `v*` tag) builds and publishes a multi-tagged\nimage to the GitHub Container Registry via\n[`.github/workflows/docker-publish.yml`](.github/workflows/docker-publish.yml):\n\n```\nghcr.io/kisaesdevlab/vibe-ocr-extractor:latest\nghcr.io/kisaesdevlab/vibe-ocr-extractor:main\nghcr.io/kisaesdevlab/vibe-ocr-extractor:\u003cgit-sha\u003e\nghcr.io/kisaesdevlab/vibe-ocr-extractor:\u003cversion\u003e   # on v* tags\n```\n\nThe workflow authenticates with the built-in `GITHUB_TOKEN` (no extra secrets\nneeded). After the first successful publish, set the package visibility to\n*public* in the repo's **Packages** settings if you want to pull it without\nauthentication.\n\n## Environment variables\n\nThese set the **defaults** (the UI can override most of them at runtime):\n\n| Variable             | Default                       | Description                                          |\n| -------------------- | ----------------------------- | ---------------------------------------------------- |\n| `GLM_OCR_BASE_URL`   | `http://localhost:8080/v1`    | Base URL of the llama.cpp OpenAI-compatible API      |\n| `GLM_OCR_MODEL`      | `glm-ocr`                     | Model name as registered on your server              |\n| `GLM_OCR_API_KEY`    | `EMPTY`                       | API key (llama.cpp ignores it)                       |\n| `GLM_OCR_PROMPT`     | *(OCR instruction)*           | Instruction sent with each image                     |\n| `GLM_OCR_TIMEOUT`    | `180`                         | Per-image request timeout in seconds                 |\n| `PDF_RENDER_DPI`     | `200`                         | DPI used to rasterize PDF pages                      |\n| `EXTRACTION_MODE`    | `auto`                        | `auto` / `text` / `ocr` (text-layer routing)         |\n| `NODE_BIN`           | `node`                        | Node binary used for the pdf.js text-layer extractor |\n| `TEXT_LAYER_TIMEOUT` | `120`                         | Timeout (s) for the text-layer extractor             |\n| `SETTINGS_FILE`      | `settings.json` (`/data/...` in Docker) | Where UI settings are persisted          |\n| `MAX_CONTENT_LENGTH` | `52428800` (50 MB)            | Max upload size in bytes                             |\n| `HOST` / `PORT`      | `127.0.0.1` / `5000`          | Where the dev server listens (Docker uses `0.0.0.0`) |\n| `FLASK_DEBUG`        | `0`                           | Set to `1` for Flask debug mode                      |\n\n## Example: serving GLM-OCR with llama.cpp\n\n```bash\n# Vision models need both the model and its multimodal projector (mmproj).\nllama-server \\\n  -m GLM-OCR.gguf \\\n  --mmproj GLM-OCR-mmproj.gguf \\\n  --host 0.0.0.0 --port 8080 \\\n  --alias glm-ocr\n```\n\nThen run this app and (if needed) set the Base URL to `http://localhost:8080/v1`\nin Settings.\n\n## Supported file types\n\n`pdf`, `png`, `jpg`, `jpeg`, `webp`, `bmp`, `tif`, `tiff`, `gif`\n\n## Try it quickly\n\nThe [`samples/`](samples/) folder contains a `sample.pdf` and `sample.png` you\ncan upload to verify the end-to-end flow once your llama.cpp server is running.\n\n## Development\n\n```bash\npip install -r requirements-dev.txt\ncd pdf_text \u0026\u0026 npm install \u0026\u0026 cd ..   # for text-layer tests (else they skip)\nruff check .      # lint\npytest            # run the test suite\n```\n\nLinting (ruff) and tests (pytest) also run in CI on every push and pull request\nvia [`.github/workflows/ci.yml`](.github/workflows/ci.yml).\n\n## Project layout\n\n```\napp.py            Flask routes (/, /api/extract, /api/download, /api/config, /api/test-connection)\nextract.py        Routing: PDF text layer vs OCR (text / ocr / hybrid)\ntextlayer.py      Python wrapper around the pdf.js text-layer extractor\npdf_text/         Node + pdfjs-dist helper (extract.mjs) mirroring the converter\nocr.py            PDF/image → GLM-OCR → text\nsettings.py       Runtime-editable settings, persisted to JSON\nconfig.py         Defaults and static configuration\ntemplates/        index.html (the GUI + settings dialog)\nstatic/           style.css, app.js\nDockerfile        Container image (gunicorn)\ndocker-compose.yml\nsamples/          sample.pdf / sample.png for quick manual testing\ntests/            pytest suite\n.github/workflows/ci.yml               Lint (ruff) + tests (pytest)\n.github/workflows/docker-publish.yml   Build \u0026 publish to GHCR\nrequirements.txt  Python dependencies\nrequirements-dev.txt  Dev/test dependencies\n```\n\n## Notes\n\n- The GLM-OCR server is **not** bundled — start `llama-server` separately and\n  point the app at it.\n- Uploaded files are processed in memory and never written to disk.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkisaesdevlab%2Fvibe-ocr-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkisaesdevlab%2Fvibe-ocr-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkisaesdevlab%2Fvibe-ocr-extractor/lists"}