{"id":50314396,"url":"https://github.com/openclaw/clawpdf","last_synced_at":"2026-06-04T05:00:42.534Z","repository":{"id":360894652,"uuid":"1252135795","full_name":"openclaw/clawpdf","owner":"openclaw","description":"Zero-dependency PDFium WebAssembly bindings for Node and browsers.","archived":false,"fork":false,"pushed_at":"2026-05-28T21:42:11.000Z","size":2229,"stargazers_count":70,"open_issues_count":0,"forks_count":4,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-03T04:19:08.896Z","etag":null,"topics":["node","pdf","wasm"],"latest_commit_sha":null,"homepage":"https://clawpdf.dev","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openclaw.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["moltbot"]}},"created_at":"2026-05-28T08:15:15.000Z","updated_at":"2026-06-02T17:22:14.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/openclaw/clawpdf","commit_stats":null,"previous_names":["openclaw/clawpdf"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/openclaw/clawpdf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw%2Fclawpdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw%2Fclawpdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw%2Fclawpdf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw%2Fclawpdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openclaw","download_url":"https://codeload.github.com/openclaw/clawpdf/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw%2Fclawpdf/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33890052,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-04T02:00:06.755Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["node","pdf","wasm"],"created_at":"2026-05-28T23:00:53.706Z","updated_at":"2026-06-04T05:00:42.495Z","avatar_url":"https://github.com/openclaw.png","language":"TypeScript","funding_links":["https://github.com/sponsors/moltbot"],"categories":[],"sub_categories":[],"readme":"# clawpdf\n\n![clawpdf banner](docs/assets/readme-banner.jpg)\n\n[![CI](https://github.com/openclaw/clawpdf/actions/workflows/ci.yml/badge.svg)](https://github.com/openclaw/clawpdf/actions/workflows/ci.yml)\n\nZero-dependency PDFium WebAssembly bindings for Node and browsers.\n\nDocs: \u003chttps://clawpdf.dev/\u003e\n\n`clawpdf` loads PDFs, extracts text, renders pages, and encodes PNG fallback\nimages without runtime dependencies, native addons, postinstall scripts, or a\ncanvas package.\n\n## Why\n\nOpenClaw needs a predictable local PDF path:\n\n- text extraction before model fallback\n- page rendering when a PDF has little extractable text\n- PNG output for multimodal model input\n- one dependency with no transitive package tree\n- current vendored PDFium provenance\n\nThis package currently vendors `pdfium-lib` release `7623`.\n\n## Install\n\n```bash\nnpm install clawpdf\n```\n\nESM-only. Node 20+ is supported.\n\n## Quick Start\n\n```ts\nimport { writeFile } from \"node:fs/promises\";\nimport { openPdf } from \"clawpdf\";\n\nawait using pdf = await openPdf(\"report.pdf\");\n\nconsole.log(pdf.pageCount);\nconsole.log(pdf.text({ maxPages: 5 }));\n\nconst png = await pdf.page(1).png({ dpi: 144, forms: true });\nawait writeFile(\"page-1.png\", png);\n```\n\nAll user-facing page numbers are one-based.\n\n## CLI\n\nThe package also installs a `clawpdf` command:\n\n```bash\nclawpdf report.pdf\ncat report.pdf | clawpdf -\nclawpdf report.pdf --json\nclawpdf render report.pdf --page 1 \u003e page.png\nclawpdf render report.pdf --page 1 --inline auto\n```\n\nUse `--password` or `--password-file` for encrypted PDFs. See the\n[CLI docs](https://clawpdf.dev/cli.html) for flags, JSON output, and exit codes.\n\n## Reuse an Engine\n\nServer code should create one PDFium engine and reuse it:\n\n```ts\nimport { createEngine } from \"clawpdf\";\n\nawait using engine = await createEngine();\n\nawait using pdf = await engine.open(pdfBytes);\n\nconsole.log(pdf.metadata.title);\nconsole.log(pdf.page(1).text());\n```\n\nUse `engine.extract(...)` when you want the same text-first fallback behavior\nwithout manually opening and closing a document:\n\n```ts\nconst result = await engine.extract(pdfBytes, { mode: \"auto\", maxPages: 20 });\n```\n\n## Text-First Extraction\n\n```ts\nimport { extractPdf } from \"clawpdf\";\nimport { toMessageContent } from \"clawpdf/adapters\";\n\nconst result = await extractPdf(\"report.pdf\", {\n  mode: \"auto\",\n  minTextChars: 200,\n  maxPages: 20,\n  image: {\n    dpi: 96,\n    maxPixels: 4_000_000,\n    maxDimension: 10_000,\n    forms: true,\n  },\n});\n\nconsole.log(result.text);\nconsole.log(result.images); // raw PNG bytes\nconsole.log(toMessageContent(result)); // transport-shaped blocks\n```\n\n`auto` always extracts text and renders PNG images only when extracted text is\nshorter than `minTextChars`.\n\n## Browser Usage\n\nUse `clawpdf/browser` in bundled browser code. It exports the same API and\npre-wires the packaged WASM URL.\n\n```ts\nimport { openPdf } from \"clawpdf/browser\";\n\nawait using pdf = await openPdf(file);\nconsole.log(pdf.text({ maxPages: 3 }));\n```\n\nCustom WASM hosting is still available:\n\n```ts\nimport { createEngine } from \"clawpdf/browser\";\n\nawait using engine = await createEngine({\n  wasmUrl: \"/assets/pdfium.esm.wasm\",\n});\n```\n\n## Passwords\n\n```ts\nimport { openPdf } from \"clawpdf\";\n\nawait using pdf = await openPdf(\"secret.pdf\", { password: \"secret\" });\nconsole.log(pdf.text());\n```\n\nWrong or missing passwords throw `PdfPasswordError`.\n\n## API\n\nFeature docs:\n\n- [Loading PDFs](https://clawpdf.dev/loading.html)\n- [CLI](https://clawpdf.dev/cli.html)\n- [Text extraction](https://clawpdf.dev/text-extraction.html)\n- [Page rendering](https://clawpdf.dev/page-rendering.html)\n- [PNG output](https://clawpdf.dev/png-output.html)\n- [Extraction fallback](https://clawpdf.dev/extraction-fallback.html)\n- [Password-protected PDFs](https://clawpdf.dev/passwords.html)\n- [Browser and bundlers](https://clawpdf.dev/browser-bundlers.html)\n- [PDFium provenance](https://clawpdf.dev/pdfium-provenance.html)\n- [Package shape](https://clawpdf.dev/package-shape.html)\n- [Performance](https://clawpdf.dev/performance.html)\n- [API reference](https://clawpdf.dev/api-reference.html)\n\nCore exports:\n\n- `extractPdf(input, options?)`: one-shot extraction with a shared engine.\n- `openPdf(input, options?)`: open one document with private lifetime.\n- `createEngine(options?)`: create a reusable PDFium engine.\n- `releaseExtractEngine()`: dispose the shared extraction engine after in-flight calls finish.\n- `encodePng(rgba, { width, height, compress })`: standalone RGBA to PNG.\n- `PdfError` subclasses for typed failures.\n- `PDFIUM_RELEASE` and `PDFIUM_WASM_SHA256`.\n\n## Performance Snapshot\n\nLocal Node benchmark on five sample PDFs, first page rendered at scale `2` with\ntext extraction and PNG encoding included.\n\n| Sample | previous stack total / RSS / PNG | clawpdf total / RSS / PNG |\n| --- | --- | --- |\n| Form | 95.4 ms / 174.9 MB / 114,930 B | 38.7 ms / 129.4 MB / 100,629 B |\n| Hello | 65.2 ms / 159.7 MB / 41,408 B | 27.2 ms / 124.1 MB / 47,106 B |\n| Scientific | 176.9 ms / 202.0 MB / 608,807 B | 66.0 ms / 137.8 MB / 321,122 B |\n| Magazine | 519.4 ms / 312.0 MB / 1,616,318 B | 255.9 ms / 179.5 MB / 1,930,947 B |\n| Checkmark | 2.6 ms / 128.1 MB / 589 B | 1.1 ms / 83.2 MB / 498 B |\n\n## Package Shape\n\nRuntime dependencies: none.\nRelease history: see `CHANGELOG.md`.\n\nPublished files:\n\n- `dist/index.js`\n- `dist/cli.d.ts`\n- `dist/cli.js`\n- `dist/browser.js`\n- `dist/adapters/index.js`\n- `dist/vendor/pdfium.esm.js`\n- `dist/vendor/pdfium.esm.wasm`\n- `CHANGELOG.md`\n- license/readme/notices\n\nCurrent vendored binary:\n\n- `pdfium-lib`: `7623`\n- WASM SHA-256: `14ca2adbe23b45dea57da28ae2746e376f1cddfb8e2d0b01b71dcc5cf227734e`\n\n## Refresh PDFium\n\n```bash\npnpm download:pdfium\npnpm test\n```\n\nTo move to a newer `pdfium-lib` release, update the release tag and hashes in:\n\n- `scripts/download-pdfium.mjs`\n- `src/constants.ts`\n- this README\n- `docs/pdfium-provenance.md`\n\n## License\n\nMIT for this wrapper. PDFium has upstream BSD-style and Apache-2.0 notices; see\n`THIRD_PARTY_NOTICES.md`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenclaw%2Fclawpdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenclaw%2Fclawpdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenclaw%2Fclawpdf/lists"}