https://github.com/openclaw/clawpdf
Zero-dependency PDFium WebAssembly bindings for Node and browsers.
- Host: GitHub
- URL: https://github.com/openclaw/clawpdf
- Owner: openclaw
- License: mit
- Created: 2026-05-28T08:15:15.000Z (17 days ago)
- Default Branch: main
- Last Pushed: 2026-05-28T21:42:11.000Z (16 days ago)
- Last Synced: 2026-06-03T04:19:08.896Z (11 days ago)
- Topics: node, pdf, wasm
- Language: TypeScript
- Homepage: https://clawpdf.dev
- Size: 2.13 MB
- Stars: 70
- Watchers: 0
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# clawpdf

[CI](https://github.com/openclaw/clawpdf/actions/workflows/ci.yml)
Zero-dependency PDFium WebAssembly bindings for Node and browsers.
Docs: https://clawpdf.dev
`clawpdf` loads PDFs, extracts text, renders pages, and encodes PNG fallback
images without runtime dependencies, native addons, postinstall scripts, or a
canvas package.
## Why
OpenClaw needs a predictable local PDF path:
- text extraction before model fallback
- page rendering when a PDF has little extractable text
- PNG output for multimodal model input
- one dependency with no transitive package tree
- current vendored PDFium provenance
This package currently vendors `pdfium-lib` release `7623`.
## Install
```bash
npm install clawpdf
```
ESM-only. Node 20+ is supported.
## Quick Start
```ts
import { writeFile } from "node:fs/promises";
import { openPdf } from "clawpdf";
await using pdf = await openPdf("report.pdf");
console.log(pdf.pageCount);
console.log(pdf.text({ maxPages: 5 }));
const png = await pdf.page(1).png({ dpi: 144, forms: true });
await writeFile("page-1.png", png);
```
All user-facing page numbers are one-based.
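As a sketch of what one-based numbering means for a loop, the helper below walks pages `1` through `pageCount`. `PdfLike`, `PageLike`, and `collectPageTexts` are hypothetical names for illustration; they only assume the `pageCount`, `page(n)`, and `text()` members used in the Quick Start, and that `text()` returns a string synchronously.

```typescript
// Structural stand-ins for the document and page objects (illustrative only).
interface PageLike {
  text(): string;
}
interface PdfLike {
  pageCount: number;
  page(pageNumber: number): PageLike;
}

// Collect text page by page. The first page is 1 and the last is pageCount.
function collectPageTexts(pdf: PdfLike, maxPages: number = Infinity): string[] {
  const lastPage = Math.min(pdf.pageCount, maxPages);
  const texts: string[] = [];
  for (let pageNumber = 1; pageNumber <= lastPage; pageNumber++) {
    texts.push(pdf.page(pageNumber).text());
  }
  return texts;
}
```

A document opened with `openPdf` should fit this shape if its members behave as shown above, so `collectPageTexts(pdf, 5)` would return up to five strings.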
## CLI
The package also installs a `clawpdf` command:
```bash
clawpdf report.pdf
cat report.pdf | clawpdf -
clawpdf report.pdf --json
clawpdf render report.pdf --page 1 > page.png
clawpdf render report.pdf --page 1 --inline auto
```
Use `--password` or `--password-file` for encrypted PDFs. See the
[CLI docs](https://clawpdf.dev/cli.html) for flags, JSON output, and exit codes.
## Reuse an Engine
Server code should create one PDFium engine and reuse it:
```ts
import { createEngine } from "clawpdf";
await using engine = await createEngine();
await using pdf = await engine.open(pdfBytes);
console.log(pdf.metadata.title);
console.log(pdf.page(1).text());
```
Use `engine.extract(...)` when you want the same text-first fallback behavior
without manually opening and closing a document:
```ts
const result = await engine.extract(pdfBytes, { mode: "auto", maxPages: 20 });
```
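One way to share a single engine across requests is to memoize its creation. This is a minimal sketch, not part of `clawpdf`: `once` is a hypothetical helper that caches the first promise and clears it if initialization fails so a later call can retry.

```typescript
// Memoize an async factory so concurrent callers share one in-flight promise.
function once<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = factory().catch((error: unknown) => {
        pending = undefined; // failed init: let the next caller try again
        throw error;
      });
    }
    return pending;
  };
}
```

Assuming `createEngine` behaves as shown above, `const getEngine = once(() => createEngine());` gives every handler the same engine via `await getEngine()`. A long-lived engine obtained this way should probably not be bound with `await using` inside a request handler, since that would dispose it when the handler's scope ends.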
## Text-First Extraction
```ts
import { extractPdf } from "clawpdf";
import { toMessageContent } from "clawpdf/adapters";
const result = await extractPdf("report.pdf", {
  mode: "auto",
  minTextChars: 200,
  maxPages: 20,
  image: {
    dpi: 96,
    maxPixels: 4_000_000,
    maxDimension: 10_000,
    forms: true,
  },
});
console.log(result.text);
console.log(result.images); // raw PNG bytes
console.log(toMessageContent(result)); // transport-shaped blocks
```
In `auto` mode, text is always extracted; PNG images are rendered only when the
extracted text is shorter than `minTextChars`.
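The threshold rule can be sketched as a single comparison. `needsImageFallback` is a hypothetical name, and this assumes the comparison uses the raw character count of the combined text; the README does not say whether whitespace is trimmed or whether the check is per page.

```typescript
// Decide whether to render images, mirroring the "shorter than" rule above.
function needsImageFallback(extractedText: string, minTextChars: number): boolean {
  return extractedText.length < minTextChars;
}
```

Under that assumption, text of exactly `minTextChars` characters would not trigger rendering.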
## Browser Usage
Use `clawpdf/browser` in bundled browser code. It exports the same API and
pre-wires the packaged WASM URL.
```ts
import { openPdf } from "clawpdf/browser";
await using pdf = await openPdf(file);
console.log(pdf.text({ maxPages: 3 }));
```
Custom WASM hosting is still available:
```ts
import { createEngine } from "clawpdf/browser";
await using engine = await createEngine({
  wasmUrl: "/assets/pdfium.esm.wasm",
});
```
## Passwords
```ts
import { openPdf } from "clawpdf";
await using pdf = await openPdf("secret.pdf", { password: "secret" });
console.log(pdf.text());
```
Wrong or missing passwords throw `PdfPasswordError`.
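When the password is not known up front, a caller might try a few candidates and stop at the first one that works. The sketch below is illustrative only: `openWithCandidates` is a hypothetical helper, and the opener and error check are passed in so it stays independent of the library.

```typescript
// Try opening without a password first, then with each candidate in order.
// Errors that are not password failures are rethrown immediately.
async function openWithCandidates<T>(
  open: (password?: string) => Promise<T>,
  candidates: readonly string[],
  isPasswordError: (error: unknown) => boolean,
): Promise<T> {
  let lastError: unknown;
  for (const password of [undefined, ...candidates]) {
    try {
      return await open(password);
    } catch (error) {
      if (!isPasswordError(error)) throw error;
      lastError = error;
    }
  }
  throw lastError; // every attempt failed with a password error
}
```

With `clawpdf` this might be wired as `openWithCandidates((password) => openPdf("secret.pdf", { password }), ["secret"], (e) => e instanceof PdfPasswordError)`, assuming `PdfPasswordError` is exported from the package root and that an undefined `password` is accepted.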
## API
Feature docs:
- [Loading PDFs](https://clawpdf.dev/loading.html)
- [CLI](https://clawpdf.dev/cli.html)
- [Text extraction](https://clawpdf.dev/text-extraction.html)
- [Page rendering](https://clawpdf.dev/page-rendering.html)
- [PNG output](https://clawpdf.dev/png-output.html)
- [Extraction fallback](https://clawpdf.dev/extraction-fallback.html)
- [Password-protected PDFs](https://clawpdf.dev/passwords.html)
- [Browser and bundlers](https://clawpdf.dev/browser-bundlers.html)
- [PDFium provenance](https://clawpdf.dev/pdfium-provenance.html)
- [Package shape](https://clawpdf.dev/package-shape.html)
- [Performance](https://clawpdf.dev/performance.html)
- [API reference](https://clawpdf.dev/api-reference.html)
Core exports:
- `extractPdf(input, options?)`: one-shot extraction with a shared engine.
- `openPdf(input, options?)`: open one document with private lifetime.
- `createEngine(options?)`: create a reusable PDFium engine.
- `releaseExtractEngine()`: dispose the shared extraction engine after in-flight calls finish.
- `encodePng(rgba, { width, height, compress })`: standalone RGBA to PNG.
- `PdfError` subclasses for typed failures.
- `PDFIUM_RELEASE` and `PDFIUM_WASM_SHA256`.
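For trying out `encodePng` on its own, a test image can be built by hand. This sketch assumes the `rgba` argument is tightly packed 8-bit RGBA in row-major order, which the list above does not state; `solidRgba` is a hypothetical helper.

```typescript
// Build a width x height buffer filled with one RGBA color (4 bytes per pixel).
function solidRgba(
  width: number,
  height: number,
  color: [number, number, number, number],
): Uint8Array {
  const pixels = new Uint8Array(width * height * 4);
  for (let offset = 0; offset < pixels.length; offset += 4) {
    pixels.set(color, offset);
  }
  return pixels;
}
```

A call such as `encodePng(solidRgba(2, 2, [255, 0, 0, 255]), { width: 2, height: 2, compress })` would then encode a 2x2 red square; the accepted values for `compress` are covered in the API reference rather than here.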
## Performance Snapshot
Local Node benchmark on five sample PDFs. For each sample, the first page is
rendered at scale `2`; totals include text extraction and PNG encoding.
| Sample | Previous stack (total time / RSS / PNG size) | clawpdf (total time / RSS / PNG size) |
| --- | --- | --- |
| Form | 95.4 ms / 174.9 MB / 114,930 B | 38.7 ms / 129.4 MB / 100,629 B |
| Hello | 65.2 ms / 159.7 MB / 41,408 B | 27.2 ms / 124.1 MB / 47,106 B |
| Scientific | 176.9 ms / 202.0 MB / 608,807 B | 66.0 ms / 137.8 MB / 321,122 B |
| Magazine | 519.4 ms / 312.0 MB / 1,616,318 B | 255.9 ms / 179.5 MB / 1,930,947 B |
| Checkmark | 2.6 ms / 128.1 MB / 589 B | 1.1 ms / 83.2 MB / 498 B |
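To read the totals as ratios, the snippet below divides the previous stack's time by the `clawpdf` time for each row. The numbers are copied from the table; `totalsMs` and `speedup` are names made up for this sketch. The ratios land between roughly 2.0x and 2.7x.

```typescript
// Total time in milliseconds per sample: [previous stack, clawpdf].
const totalsMs: Record<string, [previous: number, clawpdf: number]> = {
  Form: [95.4, 38.7],
  Hello: [65.2, 27.2],
  Scientific: [176.9, 66.0],
  Magazine: [519.4, 255.9],
  Checkmark: [2.6, 1.1],
};

// How many times faster the second measurement is than the first.
function speedup(previousMs: number, clawpdfMs: number): number {
  return previousMs / clawpdfMs;
}

for (const [sample, [previous, current]] of Object.entries(totalsMs)) {
  console.log(`${sample}: ${speedup(previous, current).toFixed(2)}x`);
}
```

PNG size does not follow the same pattern: the Hello and Magazine outputs are larger with `clawpdf`, while the other three are smaller.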
## Package Shape
Runtime dependencies: none.
Release history: see `CHANGELOG.md`.
Published files:
- `dist/index.js`
- `dist/cli.d.ts`
- `dist/cli.js`
- `dist/browser.js`
- `dist/adapters/index.js`
- `dist/vendor/pdfium.esm.js`
- `dist/vendor/pdfium.esm.wasm`
- `CHANGELOG.md`
- license/readme/notices
Current vendored binary:
- `pdfium-lib`: `7623`
- WASM SHA-256: `14ca2adbe23b45dea57da28ae2746e376f1cddfb8e2d0b01b71dcc5cf227734e`
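The hash can be checked locally with Node's built-in `crypto` module. This is a minimal sketch; `sha256Hex` and `matchesExpectedHash` are hypothetical helpers, not `clawpdf` exports.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of a byte buffer.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Compare a buffer against an expected hex digest, ignoring letter case.
function matchesExpectedHash(bytes: Uint8Array, expectedHex: string): boolean {
  return sha256Hex(bytes) === expectedHex.toLowerCase();
}
```

To verify an install, read `dist/vendor/pdfium.esm.wasm` from the installed package with `readFile` and pass the bytes along with `PDFIUM_WASM_SHA256`. The exact on-disk location depends on the package manager.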
## Refresh PDFium
```bash
pnpm download:pdfium
pnpm test
```
To move to a newer `pdfium-lib` release, update the release tag and hashes in:
- `scripts/download-pdfium.mjs`
- `src/constants.ts`
- this README
- `docs/pdfium-provenance.md`
## License
MIT for this wrapper. PDFium has upstream BSD-style and Apache-2.0 notices; see
`THIRD_PARTY_NOTICES.md`.