{"id":21629729,"url":"https://github.com/harryr/pandapdf","last_synced_at":"2025-03-18T21:22:47.471Z","repository":{"id":139181315,"uuid":"42074012","full_name":"HarryR/PandaPDF","owner":"HarryR","description":"PDF to images with content / link extraction","archived":false,"fork":false,"pushed_at":"2015-12-10T02:03:51.000Z","size":77,"stargazers_count":3,"open_issues_count":2,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-24T23:27:28.415Z","etag":null,"topics":["ghostscript","pdf","pdf-converter","photoshop","poppler","webp"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HarryR.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-09-07T20:51:56.000Z","updated_at":"2023-04-06T18:13:21.000Z","dependencies_parsed_at":"2023-06-26T01:54:29.194Z","dependency_job_id":null,"html_url":"https://github.com/HarryR/PandaPDF","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarryR%2FPandaPDF","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarryR%2FPandaPDF/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarryR%2FPandaPDF/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarryR%2FPandaPDF/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HarryR","download_url":"https://codeload.github.com/HarryR/PandaPDF/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"githu
b","repositories_count":244306720,"owners_count":20431879,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ghostscript","pdf","pdf-converter","photoshop","poppler","webp"],"created_at":"2024-11-25T02:08:34.776Z","updated_at":"2025-03-18T21:22:47.446Z","avatar_url":"https://github.com/HarryR.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"```\n  ____                 _       ____  ____  _____\n |  _ \\ __ _ _ __   __| | __ _|  _ \\|  _ \\|  ___|\n | |_) / _` | '_ \\ / _` |/ _` | |_) | | | | |_\n |  __/ (_| | | | | (_| | (_| |  __/| |_| |  _|\n |_|   \\__,_|_| |_|\\__,_|\\__,_|_|   |____/|_|\n```\n\nPandaPDF was developed as one of the components used to fully automate the\nworkflow of a digital magazine publishing company. The aim was to convert PDF\nfiles into images so they can be distributed without requiring full PDF\nreading software on the client. This was necessary because delivering high-DPI,\nprint-quality PDF files to portable devices wasn't feasible.\n\nThis software converts a PDF file into individual image files\nat different resolutions, which can be streamed to clients on an as-needed basis.\nWhen using WEBP or lower-quality JPEG, the aim is for the total download size to\nbe lower than if the PDF were retrieved in full.\n\n*In almost all situations PandaPDF output, when used with the Poppler and Ghostscript\nbackends, will be visually identical to Photoshop's even with very complex PDFs.*\n\nSuper-sampling is used in an attempt to mitigate aliasing and float-rounding\nproblems that occur with highly 
complex graphical artwork. When this type of PDF\nis rendered by Photoshop there are no visual artefacts, but when rendered with\nAcrobat, Ghostscript or Poppler, small white gaps appear between elements. The\nsuper-sampling technique renders the PDF at a higher than required resolution\nthen down-samples it to produce the final images.\n\nFor best bandwidth efficiency *we highly recommend using WEBP output*. On average\nit is half the size of JPEG output at 80% quality, which results in significant\nbandwidth savings.\n\nPandaPDF software provides the following features:\n\n * High-Quality PDF rasterization\n * Thumbnail Generation\n * Content extraction\n * Interactive Region extraction\n * Manifest creator\n * Output to PNG, JPEG and WEBP\n\n[![Build Status](https://drone.io/github.com/HarryR/PandaPDF/status.png)](https://drone.io/github.com/HarryR/PandaPDF/latest)\n\nv3.0 Goals\n----------\n\n * Multiple PDF rasterizers, Photoshop and Ghostscript\n * Remove usage of Boost dependencies.\n * Remove GPLv2 code from project, open-source it\n * Decouple output format from rendering component.\n * Document problems with coordinate transforms etc.\n * Improved build system, builds on Win32, OSX and Linux\n * Fully modular architecture with cleaner code\n * Include 'poppler-data' package in distribution. (FAIL)\n * Regression tests. (FAIL)\n * Unit tests. 
(FAIL)\n\n\nVersion History\n---------------\n\n * 3.0 - Photoshop \u0026 Ghostscript support\n * 2.x - Refactoring, Modular code and backends\n * 2.0 - Improved Quality, Cairo backend\n * 1.x - Multi-threaded, Text Extraction\n * 1.0 - Production run, cloud enabled\n * 0.x - C++/Poppler Prototype\n * -1  - Java Prototype\n\n\nCommandline\n-----------\n```\nUsage: pandapdf \u003c-options\u003e [\u003cname,type@quality,size\u003e ...]\n\nOptions:\n -debug : Enable debug logging\n -quiet : Log only warnings \u0026 errors\n -pdf \u003cstr\u003e : PDF File\n -out \u003cstr\u003e : Output Directory\n -box \u003cstr\u003e : Which PDF box to use [crop,trim,media]\n -no-images : Disable Image Output\n -json-regions  : Enable JSON regions\n -json-words  : Enable JSON words\n -text-words  : Enable text words\n -opw \u003cstr\u003e : Owner/Modify Password\n -upw \u003cstr\u003e : User/Open Password\n -first \u003cint\u003e : First Page\n -last \u003cint\u003e  : Last Page\n -image-backend \u003cstr\u003e : Which backend to use\n -supersample \u003cfloat\u003e : Supersample at N times resolution [default 1.5]\n\nImage Backends, set with '-image-backend':\n - poppler-cairo (default)\n - poppler-splash\n - ghostscript\n\nExamples:\n  $ pandapdf -pdf test.pdf -out dir -json-regions large,jpeg@80,1500 medium,webp@90,720\n\nVersions:\n PandaPDF   : 3.0.1 (c) 2009-2015 PixelMags Inc., H Roberts\n Poppler    : 0.38.0\n Cairo      : 1.12.18\n Freetype   : 0e0fdc5dc89e5079898c5da67b56f994c439fee1\n FontConfig : 2.11.94\n Pixman     : 0.32.8\n libJBIG    : 2.1\n libPNG     : 1.2.54\n libJPEG    : 8d\n libTIFF    : 4.0.6\n libWEBP    : 0.4.4\n```\n\nExample Usage:\n\n    pandapdf -pdf test.pdf -out ~/test/ large,jpeg@80,800 medium,webp@80,800\n\nThis would extract images of all pages from `test.pdf` into the `~/test/` directory. The output files are named according to the format:\n\n    page_%04d_%s.%s - e.g. 
page_0023_large.jpg\n\nThe different profiles specify the maximum dimension and output quality for\neach image.\n\n\nBuilding\n--------\n\nThe software can be built using system dependencies, but also includes a build\nsystem that compiles all required libraries from scratch to produce a mostly\nstatic executable.\n\nBy default the `pandapdf` target produces a dynamically linked executable. If `make dependencies` is\nrun first, all libraries are compiled and statically linked into a single executable, leaving only the system libc dynamic. If `make dependencies` is not\nrun, all required dependencies are dynamically linked from the system.\n\n```\napt-get install autoconf automake make g++ gcc wget git libtool pkg-config xz-utils libexpat1-dev ghostscript upx cmake libbzip2-dev\nmake dependencies\nmake\n```\n\nOn OSX, Windows and Linux the executable is about 9 MB.\n\nAfter compression with UPX the executable is a little over 2 MB.\n\n\nLicensing\n---------\n\nThe software is inextricably tied to the Poppler library and includes one\nsource file (parseargs) from the project. As Poppler is released under the GPLv2\nlicense, this program must comply with its licensing restrictions. This means\nthat PandaPDF binaries cannot be distributed without releasing the source code,\nas it is considered a derivative work. As such, the source code has been released\nunder the terms of the GPLv2 (see the `COPYING` file) so that it can be further\ndeveloped, improved and distributed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharryr%2Fpandapdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fharryr%2Fpandapdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharryr%2Fpandapdf/lists"}