# pdfbox

A tiny, self-contained **Spring Boot** service that turns **HTML into PDF, including PDF/A**
(PDF/A-1a, 1b, 2a, 2b, 2u, 3a, 3b, 3u). Push HTML, get the PDF binary back.
No external services, no runtime configuration, and fonts bundled for offline generation of exotic
scripts (Vietnamese, Hebrew, Arabic, Thai, Devanagari, Japanese/CJK…).

```
POST  your HTML  ──►  pdfbox  ──►  PDF/A binary
```

## Quick start (Docker)

```bash
docker run --rm -p 8080:8080 softwarity/pdfbox:latest
```

Then open <http://localhost:8080> for a small test page, or call the API directly:

```bash
# PDF/A-1b (default): HTML in the request body, standard as a query param
curl -X POST "http://localhost:8080/api/v1/pdf?standard=PDF_A_1B" \
     -H "Content-Type: text/html" \
     --data '<h1>Hello</h1><p>Tiếng Việt · עברית · 日本語</p>' \
     -o document.pdf
```

That is the whole contract: send HTML, pick a standard, receive `application/pdf`.

## Build & run locally

Requires JDK 21.

```bash
mvn spring-boot:run          # run from sources
# or
mvn package && java -jar target/pdfbox.jar
```

> A broad Noto font set is bundled **inside the jar**, so most multi-script rendering works offline,
> without system fonts, whether you run the jar, the IDE, or the Docker image. The large Korean and
> Chinese faces are the exception: only the Docker image ships them (see *Fonts & exotic scripts*).

## API

| Method & path | Description |
|---|---|
| `POST /api/v1/pdf` | Body = HTML. Query params: `standard` (see below, default `PDF_A_1B`), `filename` (default `document.pdf`). Returns `application/pdf`. |
| `POST /api/v1/pdf/upload` | `multipart/form-data`: `file` = a standalone HTML file, plus `standard` and `filename`. Returns `application/pdf`. This is the form Swagger UI shows as a file picker. |
| `GET /api/v1/standards` | Lists the supported standards. |
| `GET /v3/api-docs` | OpenAPI 3 description (JSON). |
| `GET /swagger-ui.html` | Interactive Swagger UI, **dev profile only** (see below). |
| `GET /actuator/health` | Health probe. |
| `GET /` | Browser test page. |

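For callers on the JVM, the `POST /api/v1/pdf` contract in the table can be exercised with the JDK's built-in `java.net.http` client. This is a minimal sketch, not something pdfbox ships: the `PdfboxClient` class and its method names are hypothetical, it assumes a running instance at the base URL you pass in, and it declares `charset=UTF-8` explicitly so non-Latin HTML is decoded as intended.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical client sketch for POST /api/v1/pdf (not part of pdfbox). */
final class PdfboxClient {
    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl;

    /** baseUrl = scheme, host, port and any base-path prefix, e.g. http://localhost:8080 */
    PdfboxClient(String baseUrl) {
        // Drop a trailing slash so "/api/v1/pdf" can be appended uniformly.
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    /** Builds the request: HTML as the body, standard and filename as query parameters. */
    HttpRequest buildRequest(String html, String standard, String filename) {
        String query = "standard=" + URLEncoder.encode(standard, StandardCharsets.UTF_8)
                + "&filename=" + URLEncoder.encode(filename, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/pdf?" + query))
                // State the charset explicitly so non-Latin scripts survive decoding.
                .header("Content-Type", "text/html; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(html, StandardCharsets.UTF_8))
                .build();
    }

    /** Sends the HTML and writes the returned PDF bytes to target, only on HTTP 200. */
    Path convert(String html, String standard, Path target) throws IOException, InterruptedException {
        HttpRequest request = buildRequest(html, standard, target.getFileName().toString());
        HttpResponse<byte[]> response = http.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("pdfbox returned HTTP " + response.statusCode());
        }
        return Files.write(target, response.body());
    }
}
```

With the Docker quick start running, `new PdfboxClient("http://localhost:8080").convert("<h1>Hello</h1>", "PDF_A_2B", Path.of("hello.pdf"))` would write a PDF/A-2b file. If the service was started with `PDFBOX_BASE_PATH`, include that prefix in the base URL.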

Supported `standard` values: `NONE`, `PDF_A_1A`, `PDF_A_1B`, `PDF_A_2A`, `PDF_A_2B`, `PDF_A_2U`,
`PDF_A_3A`, `PDF_A_3B`, `PDF_A_3U`.

```bash
# Upload an HTML file (JavaScript in it is not executed) and get a PDF/A-2b back:
curl -X POST "http://localhost:8080/api/v1/pdf/upload" \
     -F "file=@invoice.html;type=text/html" \
     -F "standard=PDF_A_2B" \
     -o invoice.pdf
```

### OpenAPI & Swagger UI

The OpenAPI 3 description is always served at **`/v3/api-docs`** (handy for generating clients).

The interactive **Swagger UI is a dev-only convenience** and is **disabled by default**. Enable it by
activating the `dev` profile, then browse to <http://localhost:8080/swagger-ui.html>:

```bash
mvn spring-boot:run -Dspring-boot.run.profiles=dev
# or, on the packaged jar / Docker:
SPRING_PROFILES_ACTIVE=dev java -jar target/pdfbox.jar
docker run --rm -p 8080:8080 -e SPRING_PROFILES_ACTIVE=dev softwarity/pdfbox:latest
```

In Swagger UI, `POST /api/v1/pdf/upload` renders a file picker: choose a standalone HTML file, pick a
`standard`, and the response download is your PDF.

### Fonts & exotic scripts

All glyphs are **embedded** (required for PDF/A), so output is reproducible offline. A very broad Noto
set ships with the service: `Noto Sans` (Latin/Vietnamese/Cyrillic/Greek), `Noto Serif`,
`Noto Sans Mono`, and per-script faces for Hebrew, Arabic, Thaana, all the major Indic scripts
(Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala), Thai,
Lao, Myanmar, Khmer, Georgian, Armenian, Ethiopic, Japanese, Korean, Chinese (Simplified &
Traditional), plus symbols/math and monochrome emoji.
The CSS generic families `sans-serif`/`serif`/`monospace` resolve
to `Noto Sans`/`Noto Serif`/`Noto Sans Mono`, so a stack ending in a generic family (the usual
`font-family: Arial, sans-serif`) always lands on a real embedded face.

You normally need no font CSS at all: a single document can mix scripts freely, and the right faces
are chosen automatically. To force a family explicitly, use CSS:

```html
<p style="font-family:'Noto Sans Hebrew'">שלום עולם</p>
<p style="font-family:'Noto Sans JP'">日本語のテキスト</p>
```

**Per-document font selection.** openhtmltopdf re-parses every font it is told about on *every*
render, so registering the whole set each time would re-parse tens of MB per request. Instead, the
service scans the source for the Unicode blocks it actually contains and registers only the matching
faces: a Latin-only PDF never pays to parse Thai, Devanagari or CJK. Han ideographs, which Chinese,
Japanese and Korean share, are disambiguated to Japanese / Korean / Simplified / Traditional from the
document `lang` (`ja` / `ko` / `zh-Hans` / `zh-Hant`), defaulting to Japanese. So **declare `lang`**
on elements containing Chinese, Japanese or Korean text to get the correct Han shapes.

**Where the fonts live.** The light faces are bundled in the jar. To keep the jar small, the big CJK
faces (Korean, Chinese SC/TC) ship instead as filesystem fonts in the Docker image, under
`/usr/share/fonts`; running the bare jar therefore has no Korean or Chinese faces unless you add those
`.ttf` files yourself. Need another script, or Korean/Chinese when running the jar directly? Drop extra
**`.ttf`** files into `/app/fonts` (Docker) or any directory listed in `PDFBOX_FONTS_DIRECTORIES`; they
are indexed at startup.

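The per-document selection described above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the service's code: it swaps in `Character.UnicodeScript` (scripts) for the Unicode blocks the service scans, the `FontSelector` class and its methods are hypothetical, and only `Noto Sans`, `Noto Sans Hebrew` and `Noto Sans JP` are family names taken from this README; the others (`Noto Sans KR`, `Noto Sans SC`, `Noto Sans TC`, `Noto Sans Thai`, and so on) are assumed. Han ideographs are resolved from the `lang` tag, falling back to Japanese.

```java
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: choose the font families to register for one document. */
final class FontSelector {
    // Script -> family. Names other than Noto Sans Hebrew / Noto Sans JP are assumptions.
    private static final Map<Character.UnicodeScript, String> FACES = Map.of(
            Character.UnicodeScript.HEBREW, "Noto Sans Hebrew",
            Character.UnicodeScript.ARABIC, "Noto Sans Arabic",
            Character.UnicodeScript.THAI, "Noto Sans Thai",
            Character.UnicodeScript.DEVANAGARI, "Noto Sans Devanagari",
            Character.UnicodeScript.HANGUL, "Noto Sans KR",
            Character.UnicodeScript.HIRAGANA, "Noto Sans JP",
            Character.UnicodeScript.KATAKANA, "Noto Sans JP");

    /** Scripts present in the text, iterated by code point so supplementary characters count. */
    static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> scripts.add(Character.UnicodeScript.of(cp)));
        return scripts;
    }

    /** Han is shared by Chinese, Japanese and Korean: pick the face from the lang tag. */
    static String hanFace(String lang) {
        String tag = lang == null ? "" : lang.toLowerCase(Locale.ROOT);
        if (tag.startsWith("ko")) return "Noto Sans KR";
        if (tag.startsWith("zh-hant")) return "Noto Sans TC";
        if (tag.startsWith("zh")) return "Noto Sans SC"; // simplification: any other zh -> Simplified
        return "Noto Sans JP"; // no or unknown lang: default to Japanese
    }

    /** Families to register: the base Latin face plus one face per script found. */
    static Set<String> facesFor(String text, String lang) {
        Set<String> faces = new LinkedHashSet<>();
        faces.add("Noto Sans");
        for (Character.UnicodeScript script : scriptsIn(text)) {
            if (script == Character.UnicodeScript.HAN) {
                faces.add(hanFace(lang));
            } else if (FACES.containsKey(script)) {
                faces.add(FACES.get(script));
            }
        }
        return faces;
    }
}
```

A Latin-only input yields just `Noto Sans`, so the work stays proportional to the scripts a document actually uses.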

Two caveats apply to font files:

- The PDFBox renderer only embeds TrueType outlines, so OpenType/CFF `.otf` files are skipped;
  convert them to `.ttf` first.
- A variable font is embedded at its *default* master, so freeze it to a static instance first. The
  Docker image does this for the Noto CJK variable fonts, pinning `wght=400`; otherwise they would
  embed as Thin.

### PDF/A conformance notes

- **`B` (visual) and `U` (Unicode) levels** are the recommended, fully supported targets. Output
  carries the proper PDF version (1.4 for part 1, 1.7 for parts 2/3), an embedded sRGB output intent,
  embedded fonts and a PDF/A XMP identification block; PDF/A-1 is written with a classic
  cross-reference table.
- **`A` (accessible/tagged) levels** additionally enable tagged PDF/UA output, whose validity depends
  on the *input* HTML being accessible (document language, image `alt` text, proper heading order…).
  As a safety net, the service injects a `<title>`, `<html lang>` and `<meta name="subject">` when
  the source omits them (see `PDFBOX_DEFAULT_LANG` / `PDFBOX_DEFAULT_TITLE`). The subject meta is used
  rather than `description` because the PDF/UA document description maps to the PDF *Subject* entry.
  Generic defaults satisfy the validator, not a screen-reader user, so for genuinely accessible output
  supply accurate values yourself.
- Always validate critical output with [veraPDF](https://verapdf.org/):
  ```bash
  verapdf --flavour 1b document.pdf
  ```

## Configuration

Sensible defaults mean you normally configure nothing. If needed, override them via environment variables:

| Variable | Default | Purpose |
|---|---|---|
| `PDFBOX_DEFAULT_STANDARD` | `PDF_A_1B` | Standard used when the request omits `standard`. |
| `PDFBOX_FONTS_DIRECTORIES` | `/app/fonts,/usr/share/fonts,fonts` | Comma-separated font scan directories. |
| `PDFBOX_DEFAULT_FONT_FAMILY` | Noto stack | CSS font stack applied when the HTML sets none. |
| `PDFBOX_DEFAULT_LANG` | `en` | `<html lang>` injected when the source omits it (required by the accessible "A" levels / PDF/UA). |
| `PDFBOX_DEFAULT_TITLE` | `Document` | `<title>` injected when the source has none and no usable `<h1>`/`<h2>`/`<h3>` (required by PDF/UA). |
| `PDFBOX_BASE_PATH` | _(empty)_ | Base path prefixed to **every** route, set at startup. Must start with `/`. |
| `SPRING_PROFILES_ACTIVE` | _(none)_ | Set to `dev` to enable Swagger UI. |
| `JAVA_OPTS` | _(empty)_ | Extra JVM flags. |

### Base path (set at launch)

The base path is a **startup parameter**, fixed for the lifetime of the process rather than set per
request. Provide it at launch as an environment variable or a Spring command-line argument:

```bash
# Environment variable
PDFBOX_BASE_PATH=/pdfbox java -jar target/pdfbox.jar
docker run --rm -p 8080:8080 -e PDFBOX_BASE_PATH=/pdfbox softwarity/pdfbox:latest

# ...or as a launch argument
java -jar target/pdfbox.jar --server.servlet.context-path=/pdfbox
```

Every route then lives under that prefix, e.g.
`POST http://localhost:8080/pdfbox/api/v1/pdf`,
`GET http://localhost:8080/pdfbox/v3/api-docs`, `GET http://localhost:8080/pdfbox/actuator/health`.

## How it works

```
HTML ──jsoup──► well-formed W3C DOM ──openhtmltopdf──► PDF (PDFBox) ──► PDF/A normalization ──► bytes
```

- **jsoup** parses arbitrary/malformed HTML into a clean DOM.
- **openhtmltopdf** renders the DOM to PDF/A on top of **Apache PDFBox**.
- A small PDFBox pass fixes the PDF/A-1 header version and cross-reference table.

## Built with open source

This project stands on the shoulders of these open-source projects (thank you):

- [openhtmltopdf](https://github.com/openhtmltopdf/openhtmltopdf): HTML/CSS → PDF/A & PDF/UA engine (LGPL-2.1)
- [Flying Saucer](https://github.com/flyingsaucerproject/flyingsaucer), from which openhtmltopdf is forked (LGPL-2.1)
- [Apache PDFBox](https://pdfbox.apache.org/): PDF library (Apache-2.0)
- [jsoup](https://jsoup.org/): HTML parser (MIT)
- [Spring Boot](https://spring.io/projects/spring-boot) (Apache-2.0)
- [Noto fonts](https://fonts.google.com/noto) (SIL Open Font License 1.1)
- PDF/A validation by [veraPDF](https://verapdf.org/)