{"id":50932042,"url":"https://github.com/kisaesdevlab/vibe-glm-ocr","last_synced_at":"2026-06-17T05:04:43.610Z","repository":{"id":351100442,"uuid":"1209575993","full_name":"KisaesDevLab/Vibe-GLM-OCR","owner":"KisaesDevLab","description":null,"archived":false,"fork":false,"pushed_at":"2026-04-13T15:31:13.000Z","size":17,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-13T17:27:40.855Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KisaesDevLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-13T15:07:20.000Z","updated_at":"2026-04-13T15:31:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/KisaesDevLab/Vibe-GLM-OCR","commit_stats":null,"previous_names":["kisaesdevlab/vibe-glm-ocr"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/KisaesDevLab/Vibe-GLM-OCR","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KisaesDevLab%2FVibe-GLM-OCR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KisaesDevLab%2FVibe-GLM-OCR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KisaesDevLab%2FVibe-GLM-OCR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KisaesDevLab%2FVibe-GLM-OCR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KisaesDevLab","download_url":"https://codeload.github.com/KisaesDevLab/Vibe-GLM-OCR/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KisaesDevLab%2FVibe-GLM-OCR/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34434498,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-17T02:00:05.408Z","response_time":127,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-17T05:04:28.539Z","updated_at":"2026-06-17T05:04:43.604Z","avatar_url":"https://github.com/KisaesDevLab.png","language":"Dockerfile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kisaes OCR Server\n\nSelf-contained Docker image running [llama.cpp](https://github.com/ggml-org/llama.cpp) server with [GLM-OCR](https://huggingface.co/ggml-org/GLM-OCR-GGUF) (0.9B parameter multimodal OCR model). Provides an OpenAI-compatible `/v1/chat/completions` endpoint that accepts base64-encoded images and returns recognized text or structured Markdown tables.\n\n**No HuggingFace downloads at runtime. No Ollama dependency. No model management. Pull the image, run it, send images.**\n\n## Quick Start\n\n```bash\n# Pull and run\ndocker pull ghcr.io/kisaesdevlab/vibe-glm-ocr:latest\ndocker run -p 8090:8090 ghcr.io/kisaesdevlab/vibe-glm-ocr:latest\n\n# Check health\ncurl http://localhost:8090/health\n\n# OCR a document\nBASE64=$(base64 -w0 document.png)\ncurl -s http://localhost:8090/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d \"{\n    \\\"model\\\": \\\"GLM-OCR\\\",\n    \\\"messages\\\": [{\n      \\\"role\\\": \\\"user\\\",\n      \\\"content\\\": [\n        {\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": \\\"data:image/png;base64,$BASE64\\\"}},\n        {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"Text Recognition:\\\"}\n      ]\n    }],\n    \\\"temperature\\\": 0.02\n  }\"\n```\n\n## Build from Source\n\n```bash\n# Clone and build\ngit clone https://github.com/KisaesDevLab/Vibe-GLM-OCR.git\ncd Vibe-GLM-OCR\ndocker compose -f docker-compose.dev.yml build\n\n# Run locally\ndocker compose -f docker-compose.dev.yml up\n```\n\n## OCR Prompts\n\nGLM-OCR supports two primary prompt modes:\n\n| Prompt | Use Case |\n|--------|----------|\n| `Text Recognition:` | General text extraction — receipts, forms, letters, any unstructured document |\n| `Table Recognition:` | Structured table extraction — returns Markdown or HTML tables |\n\n## API\n\n### Request\n\n```json\n{\n  \"model\": \"GLM-OCR\",\n  \"messages\": [{\n    \"role\": \"user\",\n    \"content\": [\n      {\n        \"type\": \"image_url\",\n        \"image_url\": {\n          \"url\": \"data:image/png;base64,{base64data}\"\n        }\n      },\n      {\n        \"type\": \"text\",\n        \"text\": \"Table Recognition:\"\n      }\n    ]\n  }],\n  \"temperature\": 0.02\n}\n```\n\n### Response\n\nStandard OpenAI chat completion format. OCR text is in `choices[0].message.content`:\n\n```json\n{\n  \"choices\": [{\n    \"message\": {\n      \"role\": \"assistant\",\n      \"content\": \"| Date | Description | Amount | Balance |\\n|---|---|---|---|\\n| 01/15 | Direct Deposit | 3,500.00 | 4,200.50 |\"\n    },\n    \"finish_reason\": \"stop\"\n  }],\n  \"usage\": {\n    \"prompt_tokens\": 1842,\n    \"completion_tokens\": 156,\n    \"total_tokens\": 1998\n  }\n}\n```\n\n### Endpoints\n\n| Path | Method | Description |\n|------|--------|-------------|\n| `/health` | GET | Returns `{\"status\":\"ok\"}` when model is loaded |\n| `/v1/chat/completions` | POST | OpenAI-compatible chat endpoint (OCR requests) |\n| `/metrics` | GET | Prometheus metrics (request count, latency, tokens) |\n\n## Configuration\n\nAll configuration is via environment variables:\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `OCR_PORT` | `8090` | Server listen port |\n| `OCR_THREADS` | `4` | CPU threads for inference |\n| `OCR_CTX_SIZE` | `32768` | Context window (must be \u003e= 16384 for GLM-OCR images) |\n| `OCR_PARALLEL` | `2` | Concurrent request slots |\n| `OCR_TEMPERATURE` | `0.02` | Sampling temperature (keep low for OCR) |\n| `OCR_API_KEY` | *(empty)* | Bearer token for endpoint protection (optional) |\n\n### Example with custom config\n\n```bash\ndocker run -p 9090:9090 \\\n  -e OCR_PORT=9090 \\\n  -e OCR_THREADS=8 \\\n  -e OCR_PARALLEL=4 \\\n  -e OCR_API_KEY=my-secret-key \\\n  ghcr.io/kisaesdevlab/vibe-glm-ocr:latest\n```\n\n## Architecture\n\n```\n                POST /v1/chat/completions\n                (base64 image + prompt)\n                        |\n                +-------v--------+\n                |  ocr-server    |\n                |  :8090         |\n                |                |\n                |  llama-server  |\n                |  GLM-OCR F16   |\n                |  ~1.8 GB model |\n                |  ~2-3 GB RAM   |\n                +----------------+\n                        |\n                OCR text / Markdown table\n```\n\n## Resource Requirements\n\n| Metric | Value |\n|--------|-------|\n| RAM (idle, model loaded) | ~2 GB |\n| RAM (peak, during inference) | ~3 GB |\n| CPU (during inference) | All configured threads saturated |\n| Disk (image) | ~2.1 GB |\n| Startup time (model load) | ~5-10s |\n| Inference time per page | ~40-60s CPU, ~2-3s with GPU |\n\n## Why llama.cpp Instead of Ollama\n\n- **Smaller image**: No Ollama runtime, no model registry, no Go binary\n- **More control**: Direct access to `--cache-type-k`, `--flash-attn`, `--temperature` flags\n- **Slight speedup**: llama.cpp direct is marginally faster than Ollama for the same model\n- **Simpler healthcheck**: `curl /health` on a single-purpose server\n- **Appliance model**: One image, one model, one purpose\n\n## Model Details\n\n| File | Size | Purpose |\n|------|------|---------|\n| `GLM-OCR-F16.gguf` | 1.79 GB | Language decoder (GLM-0.5B) — F16 for max OCR accuracy |\n| `mmproj-GLM-OCR-Q8_0.gguf` | ~160 MB | CogViT visual encoder + projection |\n\nF16 is chosen for the decoder because at only 0.9B parameters, the size difference vs Q8_0 is negligible, while F16 preserves full precision for financial documents where a single misread digit matters.\n\n### Slim variant (Q8_0 decoder)\n\nFor bandwidth-constrained deployments, a `:slim` tag using the Q8_0 decoder (~950 MB vs 1.79 GB) reduces the compressed image by roughly 700 MB. Accuracy loss is minimal for printed documents; for handwritten notes or faint scans, stick with the default F16 tag.\n\n```bash\ndocker pull ghcr.io/kisaesdevlab/vibe-glm-ocr:slim\n```\n\nTo build slim locally, override the decoder filename in a fork of the Dockerfile's `model-fetcher` stage (`GLM-OCR-Q8_0.gguf`) and update the entrypoint's `--model` path.\n\n## Operations\n\n### Log rotation\n\n`llama-server` logs request lines and token counts to stdout. Under sustained traffic, unbounded Docker logs will eventually fill the host disk. Configure the `json-file` driver with rotation, or switch to `journald` / a remote syslog sink.\n\nPer-container (Docker CLI):\n\n```bash\ndocker run -p 8090:8090 \\\n  --log-driver json-file \\\n  --log-opt max-size=50m \\\n  --log-opt max-file=5 \\\n  ghcr.io/kisaesdevlab/vibe-glm-ocr:latest\n```\n\nCompose:\n\n```yaml\nservices:\n  ocr-server:\n    image: ghcr.io/kisaesdevlab/vibe-glm-ocr:latest\n    logging:\n      driver: json-file\n      options:\n        max-size: \"50m\"\n        max-file: \"5\"\n```\n\nHost-wide default lives in `/etc/docker/daemon.json`:\n\n```json\n{\n  \"log-driver\": \"json-file\",\n  \"log-opts\": { \"max-size\": \"50m\", \"max-file\": \"5\" }\n}\n```\n\n## License\n\nMIT (Dockerfile, entrypoint scripts, and repository code). GLM-OCR model is MIT licensed. llama.cpp is MIT licensed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkisaesdevlab%2Fvibe-glm-ocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkisaesdevlab%2Fvibe-glm-ocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkisaesdevlab%2Fvibe-glm-ocr/lists"}