{"id":50687913,"url":"https://github.com/mrtcode/line-seg","last_synced_at":"2026-06-09T00:32:47.830Z","repository":{"id":325633625,"uuid":"1101874434","full_name":"mrtcode/line-seg","owner":"mrtcode","description":"Layout-Aware Transformer for Fast PDF Line-Type Classification in the Browser","archived":false,"fork":false,"pushed_at":"2025-11-22T13:45:55.000Z","size":470,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-22T14:21:21.996Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mrtcode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-22T12:00:04.000Z","updated_at":"2025-11-22T13:45:58.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mrtcode/line-seg","commit_stats":null,"previous_names":["mrtcode/line-seg"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/mrtcode/line-seg","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrtcode%2Fline-seg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrtcode%2Fline-seg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrtcode%2Fline-seg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrtcode%2Fline-seg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mrtcode","download_url":"https://codeload.github.com/mrtcode/line-seg/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrtcode%2Fline-seg/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34086664,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-08T02:00:07.615Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-09T00:32:44.561Z","updated_at":"2026-06-09T00:32:47.825Z","avatar_url":"https://github.com/mrtcode.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Layout-Aware Transformer-CRF for Fast PDF Line-Type Classification in the Browser\n\nA lightweight model built on a Transformer encoder with a linear-chain CRF decoder. It classifies lines on PDF pages and groups them into logical text blocks using only line geometry and simple text features, relying on the existing PDF text layer. This design allows for a much smaller model than image-based OCR and vision–language models, which naturally limits its use cases but greatly expands deployment options and enables fast CPU-only inference in the browser via ONNX WebAssembly.\n\n## Browser demo\n\nYou can try the demo directly in the [browser](https://mrtcode.github.io/line-seg/demo/index.html).\n\n## Demo model summary\n\n- **Inference time:** 1–6 ms per PDF page using ONNX WebAssembly on a CPU with SIMD acceleration.\n- **Training data:** ~25k PDF pages.\n- **Training setup:** ~40 hours on MPS (Apple M1 Pro).\n- **Evaluation:** 91% macro F1 score on a held-out test set.\n- **Model size:** ~100k parameters.\n\n## Supported block types\n\n- `frame` – headers, footers, page numbers, margin text\n- `title` – titles and table/figure captions\n- `body` – body text\n- `list_item` – individual list items\n- `equation`\n- `other`\n\nA `table` type was also evaluated in alternative model configurations and performed surprisingly well.\n\n## Model architecture\n\n- **Per-line input (16D)**\n    - 4 geometry values: normalized bbox `[x1, y1, x2, y2]`\n    - 10 numeric layout/text statistics (size, area, ratios, deltas, flags)\n    - 2 categorical features: `first_char_cat`, `last_char_cat`\n\n- **Embeddings**\n    - Geometry is encoded with multi-resolution bbox embeddings (fine + coarse grids).\n    - Numeric + categorical features are passed through a small MLP.\n    - The geometry and feature embeddings are concatenated into a 64‑dimensional vector per line.\n\n- **Encoder**\n    - 2-layer Transformer encoder (`d_model = 64`, 2 attention heads)\n    - Learned 2D relative position bias over line centers\n\n- **Output head**\n    - Per-line MLP produces scores (emissions) for 11 labels\n        - start / continuation / singleton variants of the semantic block types\n\n- **Sequence layer**\n    - Linear-chain CRF on top of the emissions\n    - Uses transition priors to favor coherent multi-line blocks\n    - Trained with negative log-likelihood + regularizer; decoded with Viterbi","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrtcode%2Fline-seg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmrtcode%2Fline-seg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrtcode%2Fline-seg/lists"}