{"id":50555023,"url":"https://github.com/coroboros/pdf-cleaner","last_synced_at":"2026-06-04T06:02:51.944Z","repository":{"id":359298040,"uuid":"1245249518","full_name":"coroboros/pdf-cleaner","owner":"coroboros","description":"Strip metadata and links from PDFs locally — no upload, no tracking.","archived":false,"fork":false,"pushed_at":"2026-05-21T09:54:22.000Z","size":145,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-21T14:58:43.896Z","etag":null,"topics":["annotations","cli","coroboros","hyperlinks","metadata","node","pdf","pdf-cleaner","privacy","typescript"],"latest_commit_sha":null,"homepage":"https://github.com/coroboros/pdf-cleaner#readme","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coroboros.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-21T03:45:37.000Z","updated_at":"2026-05-21T09:54:25.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/coroboros/pdf-cleaner","commit_stats":null,"previous_names":["coroboros/pdf-cleaner"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/coroboros/pdf-cleaner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coroboros%2Fpdf-cleaner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coroboros%2Fpdf-cleaner/tags","releases_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/coroboros%2Fpdf-cleaner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coroboros%2Fpdf-cleaner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coroboros","download_url":"https://codeload.github.com/coroboros/pdf-cleaner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coroboros%2Fpdf-cleaner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33891733,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-04T02:00:06.755Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["annotations","cli","coroboros","hyperlinks","metadata","node","pdf","pdf-cleaner","privacy","typescript"],"created_at":"2026-06-04T06:02:48.901Z","updated_at":"2026-06-04T06:02:51.937Z","avatar_url":"https://github.com/coroboros.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cimg src=\"assets/logo.png\" width=\"288\" height=\"288\" alt=\"@coroboros/pdf-cleaner\"/\u003e\n\n\u003c!-- omit in toc --\u003e\n# @coroboros/pdf-cleaner\n\n**Strip metadata and links from PDFs locally — no upload, no tracking.**\n\nRemoves `/Link` annotations and wipes the Info dictionary plus any XMP metadata stream 
attached to the catalog. Ships as both a programmatic library and an `npx` CLI. One runtime dependency: `pdf-lib`.\n\n[![npm](https://img.shields.io/npm/v/@coroboros/pdf-cleaner?style=flat-square\u0026color=000000)](https://www.npmjs.com/package/@coroboros/pdf-cleaner)\n[![ci](https://img.shields.io/github/actions/workflow/status/coroboros/pdf-cleaner/ci.yml?branch=main\u0026style=flat-square\u0026label=ci\u0026color=000000)](https://github.com/coroboros/pdf-cleaner/actions/workflows/ci.yml)\n[![license](https://img.shields.io/badge/license-MIT-000000?style=flat-square)](https://opensource.org/licenses/MIT)\n[![stars](https://img.shields.io/github/stars/coroboros/pdf-cleaner?style=flat-square\u0026label=stars\u0026color=000000)](https://github.com/coroboros/pdf-cleaner)\n[![coroboros.com](https://img.shields.io/badge/coroboros.com-000000?style=flat-square\u0026logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0IiBmaWxsPSJub25lIiBzdHJva2U9IndoaXRlIiBzdHJva2Utd2lkdGg9IjIiIHN0cm9rZS1saW5lY2FwPSJyb3VuZCIgc3Ryb2tlLWxpbmVqb2luPSJyb3VuZCI+PGNpcmNsZSBjeD0iMTIiIGN5PSIxMiIgcj0iMTAiLz48cGF0aCBkPSJNMiAxMmgyME0xMiAyYTE1LjMgMTUuMyAwIDAgMSA0IDEwIDE1LjMgMTUuMyAwIDAgMS00IDEwIDE1LjMgMTUuMyAwIDAgMS00LTEwIDE1LjMgMTUuMyAwIDAgMSA0LTEweiIvPjwvc3ZnPg==)](https://coroboros.com)\n\n\u003c/div\u003e\n\n\u003c!-- omit in toc --\u003e\n## Contents\n\n- [Requirements](#requirements)\n- [Install](#install)\n- [Usage](#usage)\n- [Why this exists](#why-this-exists)\n- [CLI](#cli)\n- [API](#api)\n- [Limitations](#limitations)\n- [Compared to alternatives](#compared-to-alternatives)\n- [Contributing](#contributing)\n- [License](#license)\n\n## Requirements\n\n- Node.js `\u003e= 22 LTS`. 
Use [fnm](https://github.com/Schniz/fnm) for fast Rust-based version switching.\n- Any modern package manager: pnpm, npm, yarn, bun.\n\n## Install\n\n**As a library**\n\n```bash\npnpm add @coroboros/pdf-cleaner\n```\n\n```bash\nnpm install @coroboros/pdf-cleaner\n```\n\n```bash\nyarn add @coroboros/pdf-cleaner\n```\n\n```bash\nbun add @coroboros/pdf-cleaner\n```\n\n**As a CLI**\n\n```bash\n# Run without installing\nnpx @coroboros/pdf-cleaner cv.pdf\n```\n\n```bash\n# Install globally for repeated use\npnpm add -g @coroboros/pdf-cleaner\npdf-cleaner --help\n```\n\n## Usage\n\n**Programmatic**\n\n```ts\nimport { readFile, writeFile } from 'node:fs/promises';\nimport { clean } from '@coroboros/pdf-cleaner';\n\nconst cleaned = await clean(await readFile('cv.pdf'));\nawait writeFile('cv_clean.pdf', cleaned);\n```\n\n**CLI**\n\n```bash\nnpx @coroboros/pdf-cleaner cv.pdf\n```\n\n## Why this exists\n\nPDFs carry hidden authorship. The Info dictionary embeds `/Title`, `/Author`, `/Producer`, and creation and modification dates; an XMP metadata stream attached to the catalog can carry the same fields. Hyperlinks travel via `/Link` annotations on each page. Hosted cleaners strip both, but only after you upload the bytes. `@coroboros/pdf-cleaner` runs the same strips in-process on a single dependency ([`pdf-lib`](https://github.com/Hopding/pdf-lib)). No network calls, no telemetry. See [`bench/baseline.md`](bench/baseline.md) for the round-trip numbers and the regression budget.\n\n## CLI\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003epdf-cleaner \u0026lt;input\u0026gt; [options]\u003c/code\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\nStrip metadata and links from a PDF or a directory of PDFs. Writes the cleaned bytes alongside the input with a `_clean.pdf` suffix unless `--out` or `--in-place` is set.\n\n**Arguments**\n\n| Arg | Type | Description |\n| --- | --- | --- |\n| `\u003cinput\u003e` | `string` *(required)* | A `.pdf` file, or a directory of `.pdf` files. 
Directory mode is top-level only — subdirectories are not traversed. |\n\n**Options**\n\n| Flag | Type | Default | Description |\n| --- | --- | --- | --- |\n| `--out \u003cdir\u003e` | `string` | alongside input | Output directory for cleaned files. Created if missing. |\n| `--in-place` | `boolean` | `false` | Overwrite the input(s) in place. TTY prompts for confirmation; non-TTY contexts require `--yes`. |\n| `--yes`, `-y` | `boolean` | `false` | Skip the `--in-place` confirmation prompt. Required to run `--in-place` in CI, scripts, or any non-TTY context. |\n| `--keep-links` | `boolean` | `false` | Preserve `/Link` annotations. Other annotation subtypes are preserved regardless. |\n| `--keep-metadata` | `boolean` | `false` | Preserve the Info dictionary (`/Title`, `/Author`, `/Subject`, `/Keywords`, `/Creator`, `/Producer`, `/CreationDate`, `/ModDate`) and any XMP metadata stream. |\n| `--help`, `-h` | `boolean` | — | Print the usage block and exit `0`. |\n| `--version`, `-v` | `boolean` | — | Print the package version and exit `0`. |\n\n**Exit codes**\n\n| Code | Meaning |\n| --- | --- |\n| `0` | Success. Every input file produced a cleaned output. |\n| `1` | User error. Bad input path, unknown flag, or `--in-place` in a non-TTY context without `--yes`. |\n| `2` | Per-file cleaning error. At least one file failed. Other files in directory mode still complete. |\n| `3` | Unexpected error not classified above. 
|\n\n**Examples**\n\n```bash\n# Single file → cv_clean.pdf alongside the input\npdf-cleaner cv.pdf\n\n# Single file → custom output directory\npdf-cleaner cv.pdf --out ./out\n\n# Directory of PDFs (top-level only, non-recursive)\npdf-cleaner ./input --out ./output\n\n# Overwrite the originals — prompts in a TTY, requires --yes otherwise\npdf-cleaner cv.pdf --in-place\npdf-cleaner cv.pdf --in-place --yes\n\n# Granular opt-out\npdf-cleaner cv.pdf --keep-links\npdf-cleaner cv.pdf --keep-metadata\n```\n\n\u003c/details\u003e\n\n## API\n\n### Types\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eCleanInput\u003c/code\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\nThe bytes that [`clean`](#cleaning) accepts.\n\n```ts\ntype CleanInput = Uint8Array | ArrayBuffer;\n```\n\nNode `Buffer` is accepted via structural compatibility — `Buffer` extends `Uint8Array`.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eCleanOptions\u003c/code\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\nPer-call overrides for [`clean`](#cleaning). Every field is optional; the two boolean flags default to `false` so the defaults strip aggressively.\n\n| Option | Type | Default | Description |\n| --- | --- | --- | --- |\n| `keepLinks` | `boolean` | `false` | Preserve `/Link` annotations on every page. Other annotation subtypes (text notes, highlights, form widgets) are preserved regardless. |\n| `keepMetadata` | `boolean` | `false` | Preserve the Info dictionary (`/Title`, `/Author`, `/Subject`, `/Keywords`, `/Creator`, `/Producer`, `/CreationDate`, `/ModDate`) and any XMP metadata stream attached to the catalog. |\n| `signal` | `AbortSignal` | *(none)* | Cancel the operation cooperatively. Checked before pdf-lib `load`, after `load`, and after the strip phase. Aborting throws `CleanError` with `code: 'ABORTED'` and `cause = signal.reason`. 
The cancellation is non-cooperative inside pdf-lib itself — once `load` or `save` is entered, it runs to completion before the next check fires. |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eCleanError\u003c/code\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\nThrown by [`clean`](#cleaning) for inputs it cannot process. Inherits from `Error`, supports `Error.cause` for wrapping.\n\n```ts\nclass CleanError extends Error {\n  readonly name: 'CleanError';\n  readonly code: CleanErrorCode;\n  constructor(code: CleanErrorCode, message: string, options?: { cause?: unknown });\n}\n```\n\nThe `code` field is a stable string discriminant safe for runtime branching. See [Errors](#errors) for the code list.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eCleanErrorCode\u003c/code\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\n```ts\ntype CleanErrorCode = 'INVALID_INPUT' | 'PARSE_FAILED' | 'ENCRYPTED' | 'ABORTED';\n```\n\n\u003c/details\u003e\n\n### Cleaning\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eclean(input, options?)\u003c/code\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\nStrip metadata and links from a PDF and return the cleaned bytes.\n\n**Parameters**\n\n| Option | Type | Default | Description |\n| --- | --- | --- | --- |\n| `input` | [`CleanInput`](#types) | *(required)* | The PDF bytes. Must be non-empty. |\n| `options?` | [`CleanOptions`](#types) | `{}` | Per-call overrides. See the type for each field. |\n\n**Returns** — `Promise\u003cUint8Array\u003e`. The cleaned PDF bytes. `clean()` is idempotent on the observable surface — calling it on its own output is a no-op.\n\n**Throws** — [`CleanError`](#types). `INVALID_INPUT` when the input is not bytes, is null, or is empty. `PARSE_FAILED` when the bytes do not parse as a valid PDF; the underlying parser error is preserved on `Error.cause`. `ENCRYPTED` when the PDF carries an `/Encrypt` entry — decrypt before cleaning. 
`ABORTED` when `options.signal` fires; `signal.reason` is preserved on `Error.cause`.\n\n**Notes** — see [`bench/baseline.md`](bench/baseline.md) for the round-trip numbers and the regression budget.\n\n**Examples**\n\n```ts\n// Default — strip both links and metadata\nconst cleaned = await clean(bytes);\n```\n\n```ts\n// Wipe metadata, keep working hyperlinks\nconst cleaned = await clean(bytes, { keepLinks: true });\n```\n\n```ts\n// Pre-publish CV — strip metadata, keep links so the LinkedIn URL still clicks\nimport { readFile, writeFile } from 'node:fs/promises';\nconst original = await readFile('cv.pdf');\nconst cleaned = await clean(original, { keepLinks: true });\nawait writeFile('cv_public.pdf', cleaned);\n```\n\n```ts\n// Server-side use — bound the work with an AbortSignal\nconst cleaned = await clean(bytes, { signal: AbortSignal.timeout(5000) });\n```\n\n\u003c/details\u003e\n\n### Errors\n\n| Code | Description |\n| --- | --- |\n| `INVALID_INPUT` | `input` is missing, `null`, not a [`CleanInput`](#types), or empty. |\n| `PARSE_FAILED` | The bytes do not parse as a valid PDF. The original parser error is available via `Error.cause`. |\n| `ENCRYPTED` | The PDF carries an `/Encrypt` trailer entry. Decrypt before cleaning. |\n| `ABORTED` | `options.signal` fired during the operation. `signal.reason` is preserved on `Error.cause`. |\n\n## Limitations\n\n- Stripping is limited to `/Link` annotations and the standard metadata surfaces (Info dictionary plus any XMP metadata stream). Other annotation subtypes are preserved.\n- Encrypted PDFs are rejected with `ENCRYPTED`. 
Decrypt them first.\n- Directory mode walks the top level only — subdirectories are not traversed.\n- Text content, embedded images, page geometry, fonts, bookmarks, and form fields are preserved untouched.\n- Out of scope: text redaction, watermark removal, compression, OCR, JavaScript action stripping, attachment removal.\n\n## Compared to alternatives\n\n| Feature                              |   `pdf-lib` (raw)    | `qpdf` / `node-qpdf2` | `exiftool-vendored` |    `muhammara`     | **`@coroboros/pdf-cleaner`** |\n| ------------------------------------ | :------------------: | :-------------------: | :-----------------: | :----------------: | :--------------------------: |\n| Strip Info dictionary                | DIY                  | DIY (binary flags)    | yes (`-all=`)       | DIY                | yes                          |\n| Strip XMP metadata stream            | DIY                  | DIY (binary flags)    | yes (`-all=`)       | DIY                | yes                          |\n| Strip `/Link` annotations            | DIY                  | DIY                   | no                  | DIY                | yes                          |\n| Pure JS — no native binary           | yes                  | no (qpdf binary)      | no (Perl binary)    | no (C++ bindings)  | yes                          |\n| In-process — no network upload       | yes                  | yes                   | yes                 | yes                | yes                          |\n| CLI included                         | no                   | no (lib only)         | no (lib only)       | no                 | yes                          |\n| `AbortSignal` cancellation           | no                   | no                    | no                  | no                 | yes                          |\n| Coded `ENCRYPTED` rejection          | throws (no code)     | no                    | n/a                 | unknown            | yes                          |\n\nThe 
market gap is in-process strip plus a bundled CLI. `pdf-lib` ships the engine but no strip helper; every byte you remove, you write the code for. `qpdf` and `muhammara` carry native binaries, and the npm wrappers focus on encryption rather than metadata. `exiftool` clears the Info dict and XMP cleanly but never touches the annotation array, so `/Link` rectangles stay clickable in the output. Hosted cleaners cover everything except the one rule that mattered first: the file stays on your machine. `@coroboros/pdf-cleaner` runs the three strips in-process on `pdf-lib`. The same install ships a coded `CleanError` (including an `ENCRYPTED` code for password-protected PDFs), `AbortSignal` cancellation checked between phases, and an `npx` CLI.\n\n## Contributing\n\nBug reports and PRs welcome.\n\n- Open an issue before submitting non-trivial PRs.\n- Commits follow [Conventional Commits](https://www.conventionalcommits.org/).\n- Run `pnpm lint \u0026\u0026 pnpm typecheck \u0026\u0026 pnpm test` before pushing.\n- Run `pnpm bench` against `bench/baseline.md` when touching `src/clean.ts` — no regression \u003e 10 % at fixed feature set.\n- Target the `main` branch.\n\n## License\n\n[MIT](LICENSE.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoroboros%2Fpdf-cleaner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoroboros%2Fpdf-cleaner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoroboros%2Fpdf-cleaner/lists"}