https://github.com/kisaesdevlab/vibe-glm-ocr
https://github.com/kisaesdevlab/vibe-glm-ocr
Last synced: 12 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/kisaesdevlab/vibe-glm-ocr
- Owner: KisaesDevLab
- License: other
- Created: 2026-04-13T15:07:20.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-13T15:31:13.000Z (3 months ago)
- Last Synced: 2026-04-13T17:27:40.855Z (3 months ago)
- Language: Dockerfile
- Size: 16.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Kisaes OCR Server
Self-contained Docker image running [llama.cpp](https://github.com/ggml-org/llama.cpp) server with [GLM-OCR](https://huggingface.co/ggml-org/GLM-OCR-GGUF) (0.9B parameter multimodal OCR model). Provides an OpenAI-compatible `/v1/chat/completions` endpoint that accepts base64-encoded images and returns recognized text or structured Markdown tables.
**No HuggingFace downloads at runtime. No Ollama dependency. No model management. Pull the image, run it, send images.**
## Quick Start
```bash
# Pull and run
docker pull ghcr.io/kisaesdevlab/vibe-glm-ocr:latest
docker run -p 8090:8090 ghcr.io/kisaesdevlab/vibe-glm-ocr:latest
# Check health
curl http://localhost:8090/health
# OCR a document
BASE64=$(base64 -w0 document.png)
curl -s http://localhost:8090/v1/chat/completions \
-H "Content-Type: application/json" \
-d "{
\"model\": \"GLM-OCR\",
\"messages\": [{
\"role\": \"user\",
\"content\": [
{\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/png;base64,$BASE64\"}},
{\"type\": \"text\", \"text\": \"Text Recognition:\"}
]
}],
\"temperature\": 0.02
}"
```
## Build from Source
```bash
# Clone and build
git clone https://github.com/KisaesDevLab/Vibe-GLM-OCR.git
cd Vibe-GLM-OCR
docker compose -f docker-compose.dev.yml build
# Run locally
docker compose -f docker-compose.dev.yml up
```
## OCR Prompts
GLM-OCR supports two primary prompt modes:
| Prompt | Use Case |
|--------|----------|
| `Text Recognition:` | General text extraction — receipts, forms, letters, any unstructured document |
| `Table Recognition:` | Structured table extraction — returns Markdown or HTML tables |
## API
### Request
```json
{
"model": "GLM-OCR",
"messages": [{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,{base64data}"
}
},
{
"type": "text",
"text": "Table Recognition:"
}
]
}],
"temperature": 0.02
}
```
### Response
Standard OpenAI chat completion format. OCR text is in `choices[0].message.content`:
```json
{
"choices": [{
"message": {
"role": "assistant",
"content": "| Date | Description | Amount | Balance |\n|---|---|---|---|\n| 01/15 | Direct Deposit | 3,500.00 | 4,200.50 |"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 1842,
"completion_tokens": 156,
"total_tokens": 1998
}
}
```
### Endpoints
| Path | Method | Description |
|------|--------|-------------|
| `/health` | GET | Returns `{"status":"ok"}` when model is loaded |
| `/v1/chat/completions` | POST | OpenAI-compatible chat endpoint (OCR requests) |
| `/metrics` | GET | Prometheus metrics (request count, latency, tokens) |
## Configuration
All configuration is via environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `OCR_PORT` | `8090` | Server listen port |
| `OCR_THREADS` | `4` | CPU threads for inference |
| `OCR_CTX_SIZE` | `32768` | Context window (must be >= 16384 for GLM-OCR images) |
| `OCR_PARALLEL` | `2` | Concurrent request slots |
| `OCR_TEMPERATURE` | `0.02` | Sampling temperature (keep low for OCR) |
| `OCR_API_KEY` | *(empty)* | Bearer token for endpoint protection (optional) |
### Example with custom config
```bash
docker run -p 9090:9090 \
-e OCR_PORT=9090 \
-e OCR_THREADS=8 \
-e OCR_PARALLEL=4 \
-e OCR_API_KEY=my-secret-key \
ghcr.io/kisaesdevlab/vibe-glm-ocr:latest
```
## Architecture
```
POST /v1/chat/completions
(base64 image + prompt)
|
+-------v--------+
| ocr-server |
| :8090 |
| |
| llama-server |
| GLM-OCR F16 |
| ~1.8 GB model |
| ~2-3 GB RAM |
+----------------+
|
OCR text / Markdown table
```
## Resource Requirements
| Metric | Value |
|--------|-------|
| RAM (idle, model loaded) | ~2 GB |
| RAM (peak, during inference) | ~3 GB |
| CPU (during inference) | All configured threads saturated |
| Disk (image) | ~2.1 GB |
| Startup time (model load) | ~5-10s |
| Inference time per page | ~40-60s CPU, ~2-3s with GPU |
## Why llama.cpp Instead of Ollama
- **Smaller image**: No Ollama runtime, no model registry, no Go binary
- **More control**: Direct access to `--cache-type-k`, `--flash-attn`, `--temperature` flags
- **Slight speedup**: llama.cpp direct is marginally faster than Ollama for the same model
- **Simpler healthcheck**: `curl /health` on a single-purpose server
- **Appliance model**: One image, one model, one purpose
## Model Details
| File | Size | Purpose |
|------|------|---------|
| `GLM-OCR-F16.gguf` | 1.79 GB | Language decoder (GLM-0.5B) — F16 for max OCR accuracy |
| `mmproj-GLM-OCR-Q8_0.gguf` | ~160 MB | CogViT visual encoder + projection |
F16 is chosen for the decoder because at only 0.9B parameters, the size difference vs Q8_0 is negligible, while F16 preserves full precision for financial documents where a single misread digit matters.
### Slim variant (Q8_0 decoder)
For bandwidth-constrained deployments, a `:slim` tag using the Q8_0 decoder (~950 MB vs 1.79 GB) reduces the compressed image by roughly 700 MB. Accuracy loss is minimal for printed documents; for handwritten notes or faint scans, stick with the default F16 tag.
```bash
docker pull ghcr.io/kisaesdevlab/vibe-glm-ocr:slim
```
To build slim locally, override the decoder filename in a fork of the Dockerfile's `model-fetcher` stage (`GLM-OCR-Q8_0.gguf`) and update the entrypoint's `--model` path.
## Operations
### Log rotation
`llama-server` logs request lines and token counts to stdout. Under sustained traffic, unbounded Docker logs will eventually fill the host disk. Configure the `json-file` driver with rotation, or switch to `journald` / a remote syslog sink.
Per-container (Docker CLI):
```bash
docker run -p 8090:8090 \
--log-driver json-file \
--log-opt max-size=50m \
--log-opt max-file=5 \
ghcr.io/kisaesdevlab/vibe-glm-ocr:latest
```
Compose:
```yaml
services:
ocr-server:
image: ghcr.io/kisaesdevlab/vibe-glm-ocr:latest
logging:
driver: json-file
options:
max-size: "50m"
max-file: "5"
```
Host-wide default lives in `/etc/docker/daemon.json`:
```json
{
"log-driver": "json-file",
"log-opts": { "max-size": "50m", "max-file": "5" }
}
```
## License
MIT (Dockerfile, entrypoint scripts, and repository code). GLM-OCR model is MIT licensed. llama.cpp is MIT licensed.