{"id":50925672,"url":"https://github.com/qrcommunication/gigapdf-lib","last_synced_at":"2026-06-16T23:00:35.929Z","repository":{"id":364883613,"uuid":"1269587283","full_name":"QrCommunication/gigapdf-lib","owner":"QrCommunication","description":"Zero-dependency Rust→WASM PDF engine + TypeScript SDK — read/edit/render/OCR/convert PDFs with no third-party libs (@qrcommunication/gigapdf-lib)","archived":false,"fork":false,"pushed_at":"2026-06-14T22:56:11.000Z","size":716,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-15T00:20:04.856Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/QrCommunication.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-14T22:33:49.000Z","updated_at":"2026-06-14T22:56:14.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/QrCommunication/gigapdf-lib","commit_stats":null,"previous_names":["qrcommunication/gigapdf-lib"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/QrCommunication/gigapdf-lib","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QrCommunication%2Fgigapdf-lib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QrCommunication%2Fgigapdf-lib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QrCommunication%2Fgigapdf-lib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QrCommunication%2Fgigapdf-lib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/QrCommunication","download_url":"https://codeload.github.com/QrCommunication/gigapdf-lib/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QrCommunication%2Fgigapdf-lib/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34426745,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-16T02:00:06.860Z","response_time":126,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-16T23:00:18.349Z","updated_at":"2026-06-16T23:00:35.922Z","avatar_url":"https://github.com/QrCommunication.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# gigapdf-lib\n\nA **zero-dependency** PDF engine, written from scratch in Rust and compiled to\nWebAssembly — read, edit, render, secure, **and convert** PDFs with no third-party\ncrates and no native libraries.\n\nThe TypeScript SDK is published as **[`@qrcommunication/gigapdf-lib`](https://www.npmjs.com/package/@qrcommunication/gigapdf-lib)**\n(see [`sdk/`](sdk/)); the self-contained `.wasm` ships inside it.\n\n\u003e Copyright 2025 Rony Licha / QR Communication.\n\u003e Licensed under the **PolyForm Noncommercial License 1.0.0** — see [`LICENSE`](LICENSE).\n\u003e Required Notice: Copyright 2025 Rony Licha / QR Communication.\n\n## Why it exists\n\nThe previous editor used a Fabric.js **overlay + cosmetic mask**, which cannot\nreconstruct a complex background (gradient, image, pattern) under edited text.\nThis engine edits the **real PDF content stream**: it physically removes/edits/adds\nthe page operators, so the background is preserved *by construction* and the\noriginal glyphs never leak. It then grew into a self-contained PDF toolkit so the\nproduct depends on **no** external PDF/Office/font library (no MuPDF, no\nLibreOffice, no fontkit) for its core flows.\n\n## Zero dependencies\n\n**None.** Everything is pure `std` and compiles straight to `wasm32`:\n\n- Lexer, object parser, xref-streams, object-streams.\n- `FlateDecode`/zlib **inflate *and* deflate** (RFC 1950/1951) from scratch.\n- Content-stream interpreter + editor; renumbering serializer.\n- Crypto from scratch: MD5, RC4, AES-128/256, SHA-256/384/512, big-integer\n  modular arithmetic (Montgomery), RSA, ASN.1 DER, X.509, CMS/PKCS#7.\n- Rasterizer: scanline fill (AA), PNG encoder, TrueType `glyf` + CFF Type2 glyph\n  outlines, image XObject blit.\n- ZIP reader/writer, OOXML/ODF builders, a from-scratch PDF page builder.\n\nThe WebAssembly sandbox has **no network and no entropy** — those come from the\nhost through a tiny port (the host supplies `crypto.getRandomValues` bytes and\nperforms Google-Fonts downloads). Everything else is in the engine.\n\n## Feature matrix\n\n| Area | Capabilities |\n|------|--------------|\n| **Read** | PDF 1.7, xref + object streams, FlateDecode, encrypted (RC4/AESV2/AESV3) |\n| **Write** | Renumbering serializer, `save`, `save_compressed` (Flate streams) |\n| **Edit content** | Text edit/remove, elements (text/image/shape) list/remove/move/duplicate/add; draw text/rect/line/ellipse/polygon/SVG-path/image (opacity + PNG alpha); hit-test |\n| **Text extraction** | Font-aware, zero-tofu via WinAnsi + `/ToUnicode` CMap (CID/Type0) |\n| **Annotations** | Highlight, underline, strike-out, free-text, square, line, ink, stamp, link; **flatten** |\n| **Forms (AcroForm)** | Text/checkbox/radio/combo/list/signature fields — **read · fill · create** (build widgets from scratch with appearance streams + `NeedAppearances`) |\n| **Pages** | Rotate, delete, move, extract, merge; bookmarks/outline; metadata |\n| **Security** | Encrypt/permissions, **self-signed digital signature** (RSA/X.509/CMS), **PKCS#12 signing** (import a user `.p12`/`.pfx` natively — PBES2 AES + PBES1 3DES/RC2, MAC-verified — no node-forge/@signpdf), **true redaction** (delete from stream, no opaque cover) |\n| **Render** | Rasterize a page to PNG (vector + TrueType/CFF glyphs + images) |\n| **Text intelligence** | Font-aware extraction, **structured text** (reading-order lines + boxes), **full-text search** with highlight boxes |\n| **OCR** | Built-in recognizer — Otsu → connected components → line/word segmentation → MLP trained on **EMNIST handwriting + synthetic font glyphs** (Latin + accents). No Tesseract, no model download at runtime |\n| **Convert →** | PDF → **TXT, HTML, DOCX, PPTX, ODP, ODT, XLSX, ODS, RTF** (real editable elements, not a page image) |\n| **Convert ←** | **TXT, HTML, RTF, DOCX, ODT, ODP, PPTX, XLSX, ODS** → PDF (ODF `.odt`/`.ods`/`.odp` are fully bidirectional) |\n| **HTML rendering** | Native **HTML + CSS → PDF** engine (parser, selector cascade, block / inline / table / **flex** (direction · justify-content · grow) / **grid** layout, pagination, **`page-break-*` + `\u003cpagebreak\u003e`**) — no headless browser. Text set in **embedded Google fonts** (real glyphs + metrics, identical or nearest match) |\n| **JavaScript** | Built-in zero-dependency **JS engine** that runs a document's inline `\u003cscript\u003e`s before layout — **no Chromium/Playwright**. Lexer → parser → tree-walking interpreter with **classes + `super`**, closures, destructuring, generators (`function*`/`yield`), **`async`/`await` + `Promise`** (microtask queue + `setTimeout`), and built-ins: `Object`/`Array`/`String`/`Number`/`Math`/`JSON`/`console`/`Map`/`Set`/**`RegExp`** + a backtracking regex engine. **DOM bindings**: `getElementById`, `querySelector(All)` (`#id`/`.class`/`tag`/`\u003e`/`+`/`~`/`[attr]`), `textContent`, `innerHTML`, `createElement`/`appendChild`, `classList`, `style`, … |\n| **Archival** | **PDF/A-2b** metadata (XMP + sRGB OutputIntent + ID) |\n| **Fonts** | Draw **and edit** real text in **every font source \u0026 any font file** — built-in **base-14 standard fonts** (no embedding), any family / **Google Font** (1951-family catalog + URL builder + **TrueType *and* OpenType-CFF embedding**: glyf→Type0/CIDFontType2+FontFile2, `.otf`/`OTTO`→Type0/CIDFontType0+FontFile3, Identity-H + full widths + ToUnicode), and the **document's own embedded faces** (`embeddedFonts` + `extractFont` → re-embed). `addText` **and** font-aware `replaceText` resolve any face's char→glyph map (`FontFile2`/`FontFile3`); needed-font detection |\n\nAll of it is exercised by `cargo test` (**284 tests**, incl. a 100-test pure-Rust\nJavaScript engine: lexer, parser, interpreter, built-ins, regex, DOM, and a\nsuspendable bytecode VM with lazy generators, spec-ordered async, and full\ncontrol-flow — `try`/`catch`/`finally`, `switch`, labels, destructuring,\nspread), a Node WASM smoke test\n(end-to-end, all green), and **validated externally**: generated Office files\n(DOCX/PPTX/XLSX **and ODT/ODS/ODP**) open and round-trip in LibreOffice; embedded\nfonts verify as `emb=yes` under poppler's `pdffonts`.\n\n## Honest scope\n\nConversions are **content-and-layout faithful**, not pixel-perfect re-typesetting.\nPDF→Office reconstructs **real, editable objects** (positioned text boxes,\nre-embedded images, table cells) the way an office suite's PDF import does — not a\nrendered page image. Office→PDF is **text-faithful** (all content, reading order,\npagination) using the standard-14 fonts; pixel-perfect re-layout of an arbitrary,\nrichly-styled document stays the job of a full layout engine. Full PDF/A\nconformance additionally requires every font embedded (the engine can do that).\n\nThe **JavaScript engine** targets the language used by templating/report scripts:\nclasses/`super`, closures, destructuring/spread, `RegExp`, `Map`/`Set`, `Symbol`\n(real, with the iterator protocol), `eval`/`Function`, tagged templates, and\n`import`/`export` (parsed transparently). `function*`/`async` bodies compile to a\n**suspendable bytecode VM**, so generators are **truly lazy** (infinite\n`while (true) { yield … }` works, `.next(v)` is bidirectional, `yield*` delegates\nlazily) and `await` **yields to the event loop** with spec microtask ordering.\nThe VM covers the full statement/expression language used by templates —\n`try`/`catch`/`finally`, `for…of`/`for…in`, `switch`, labelled `break`/\n`continue`, destructuring, compound assignment, and `...spread` — all able to\nspan a `yield`/`await`. A handful of corner cases (a `return`/`break` *through* a\n`finally`, a logical `\u0026\u0026=`/`||=`/`??=` with an awaited right-hand side, sparse\narray holes) transparently fall back to the eager generator / synchronous-await\nmodel — same results, just not lazy.\nBy design the sandbox has **no network and no real timers** (`setTimeout`\nresolves on the microtask queue). CSS **flex** supports `flex-direction`,\n`justify-content` and `flex-grow`; **grid** lays out `grid-template-columns`;\n**float** maps to inline-block.\n\n## OCR \u0026 text intelligence\n\nText already in a PDF is extracted **font-aware** (zero tofu) with reading-order\nlines and bounding boxes, and is searchable with highlight boxes. For **scanned,\nimage-only pages** the engine has a built-in OCR following the classic Tesseract\npipeline — Otsu binarization → connected-component blobs → line/word segmentation\n→ per-glyph classification — but with a from-scratch, dependency-free classifier:\n\n- The classifier is a small MLP **trained offline** on two public sources:\n  **EMNIST** (NIST handwritten digits + letters, public domain) for **handwriting**,\n  and **synthetic glyphs rendered from ~220 system fonts** (the Tesseract\n  `text2image` approach) for **printed text, punctuation and accented Latin**.\n- Training is build-time only (`tools/train_ocr.py`); the engine ships the\n  **int8-quantized weights** and runs a pure-`std` forward pass — no ML library,\n  no model download at runtime.\n- **Scripts/languages:** Latin — `0-9 A-Z a-z`, common punctuation, and accented\n  Latin (`é è à ç ñ ü …`) for French, Spanish, German, Portuguese, etc. Both\n  **printed and handwritten** Latin are recognized. Other scripts (Cyrillic,\n  Greek, CJK, Arabic) are not covered yet — they're a matter of adding classes +\n  data to the trainer, with **no runtime change**.\n- **Honest accuracy:** strong on clean machine print, decent on tidy handwriting\n  (EMNIST-grade); noisy scans and dense layouts are harder. Retrain with more data\n  to improve — the runtime never changes.\n\n## Layout\n\n```\ncrates/core   gigapdf-core  — the whole engine (parse, inflate, edit, render, crypto, convert)\ncrates/wasm   gigapdf-wasm  — extern \"C\" WebAssembly bindings (zero-dep ABI)\nfixtures/     test PDFs\ntest/         wasm-smoke.mjs — end-to-end Node harness\ntools/        catalog/ICC generators + snapshots\ndocs/         API.md · USAGE.md · INSTALL.md\n```\n\n## Quickstart\n\n### Rust\n\n```rust\nuse gigapdf_core::Document;\n\nlet mut doc = Document::open(\u0026bytes)?;\nlet docx = doc.to_docx();            // PDF → editable Word\nlet pdf  = gigapdf_core::convert::reverse::txt_to_pdf(\"Hello\\nWorld\"); // text → PDF\ndoc.embed_truetype_font(\"Roboto\", \u0026ttf)?; // host-downloaded font\nlet signed = doc.sign(\u0026signer, \"Me\", \"Approval\", \"D:20260614120000Z\")?;\nlet out = doc.save();\n```\n\n### Browser / Node (WebAssembly)\n\n```js\nconst { instance } = await WebAssembly.instantiate(wasmBytes, {});\nconst ex = instance.exports;\nconst handle = ex.gp_open(ptr, len);     // returns an opaque handle\nconst docx = callBuffer(() =\u003e ex.gp_to_docx(handle, lenPtr)); // → Uint8Array\nex.gp_close(handle);\n```\n\n### Documentation\n\n| Doc | What's in it |\n|-----|--------------|\n| [`docs/SDK.md`](docs/SDK.md) | **Complete TypeScript SDK reference** — every `GigaPdfEngine`/`GigaPdfDoc` method, grouped by domain, with parameters, returns and notes. |\n| [`docs/USAGE.md`](docs/USAGE.md) | Cookbook: the buffer ABI plus a worked example for every feature area. |\n| [`docs/API.md`](docs/API.md) | The Rust ↔ WASM ABI mapping (every `gp_*` export and its Rust method). |\n| [`docs/HTML-CSS.md`](docs/HTML-CSS.md) | The **exhaustive** list of supported HTML elements, CSS properties, units, colours, selectors and JS in the HTML→PDF renderer. |\n| [`docs/INSTALL.md`](docs/INSTALL.md) | Install, build-from-source, and Next.js (`output: \"standalone\"`) wiring. |\n\n## Build\n\n```bash\ncargo test -p gigapdf-core   # native tests (real fixtures)\ncargo wasm                   # build the WASM engine (alias, see .cargo/config.toml)\nnode test/wasm-smoke.mjs     # end-to-end WASM smoke test\n```\n\n`cargo wasm` is a repo alias for the full target build, so you never type the\ntarget triple by hand (`cargo wasm-dev` for a debug build).\n\nThe release `.wasm` is ~540 KB — **zero dependencies**, versus ~14 MB for MuPDF.\n\n## License \u0026 provenance\n\nPolyForm Noncommercial 1.0.0. Built clean-room from the ISO 32000 specification;\n**no AGPL code (e.g. MuPDF) was ever read or copied.** See [`LICENSE`](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqrcommunication%2Fgigapdf-lib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqrcommunication%2Fgigapdf-lib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqrcommunication%2Fgigapdf-lib/lists"}