{"id":20381267,"url":"https://github.com/sigalor/poppler-native","last_synced_at":"2025-08-02T03:38:16.014Z","repository":{"id":42941306,"uuid":"234955503","full_name":"sigalor/poppler-native","owner":"sigalor","description":"A native interface to the Poppler PDF parser for NodeJS.","archived":false,"fork":false,"pushed_at":"2023-10-18T18:53:36.000Z","size":2943,"stargazers_count":5,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-10T14:34:54.901Z","etag":null,"topics":["nodejs","parser","pdf","poppler"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sigalor.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-01-19T19:45:00.000Z","updated_at":"2023-02-17T17:19:52.000Z","dependencies_parsed_at":"2025-04-12T09:06:03.545Z","dependency_job_id":null,"html_url":"https://github.com/sigalor/poppler-native","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sigalor/poppler-native","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sigalor%2Fpoppler-native","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sigalor%2Fpoppler-native/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sigalor%2Fpoppler-native/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sigalor%2Fpoppler-native/manifests","owner_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/owners/sigalor","download_url":"https://codeload.github.com/sigalor/poppler-native/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sigalor%2Fpoppler-native/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268331288,"owners_count":24233218,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-02T02:00:12.353Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nodejs","parser","pdf","poppler"],"created_at":"2024-11-15T02:12:47.759Z","updated_at":"2025-08-02T03:38:15.990Z","avatar_url":"https://github.com/sigalor.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Poppler for Node\n\n[![GitHub license](https://img.shields.io/github/license/sigalor/poppler-native)](https://github.com/sigalor/poppler-native/blob/master/LICENSE) [![npm](https://img.shields.io/npm/v/poppler-native)](https://www.npmjs.com/package/poppler-native) [![Unit tests workflow status](https://github.com/sigalor/poppler-native/actions/workflows/tests.yaml/badge.svg)](https://github.com/sigalor/poppler-native/actions/workflows/tests.yaml)\n\nAllows you to use the native Poppler C++ backend to efficiently parse PDF files from NodeJS. 
Outputs information similar to `pdftohtml -xml -stdout test.pdf` (with `pdftohtml` from the `poppler-utils` package), because it uses parts of the same codebase, which have been rewritten to output N-API objects instead of XML code. All contained functions return JavaScript promises.\n\n## Getting started\n\n1. Dependencies: `build-essential`; curl, PNG, JPEG and FreeType development headers; Ghostscript, Python, mupdf-tools, qpdf\n2. `npm install poppler-native` (only tested on Ubuntu 16.04 and 20.04 so far)\n\n```javascript\n// accepts a filename...\nconst pdf = require('poppler-native');\npdf.info('test.pdf').then(res =\u003e console.log(res));\n\n// ...or a buffer with the raw PDF bytes directly\nconst fs = require('fs-extra');\nfs.readFile('test.pdf')\n  .then(f =\u003e pdf.info(f))\n  .then(res =\u003e console.log(res));\n\n// optionally, convert the PDF to PS and back to PDF via Ghostscript before extracting data, which can sometimes help when the default method extracts wrong characters:\npdf.info('test.pdf', { reconvertThroughPS: true });\n```\n\nTo visualize the parsed text boxes and images, write the entire output of the `pdf.info` function to a JSON file, then open `misc/pdf-json-viewer.html` in any web browser and drag and drop the JSON file onto the page.\n\n## Contributing\n\n### Updating Poppler\n\nThis is only necessary when a new version of Poppler is released. According to the Poppler readme, the internal Poppler C++ API, which is the foundation of this project, may be subject to breaking changes, even in minor releases. Consequently, evaluate new Poppler versions thoroughly before updating.\n\n1. Download the Poppler sources from the [Poppler releases page](https://poppler.freedesktop.org/releases.html).\n2. Put all `*.h`, `*.c` and `*.cc` files from `poppler-20.11.0/goo` into `native/poppler/goo`. Do the same for `fofi` and `poppler`. Do not change the two existing config files.\n3. 
Remove the following files from the `native/poppler/poppler` directory: `CairoFontEngine.cc CairoFontEngine.h CairoOutputDev.cc CairoOutputDev.h CairoRescaleBox.cc CairoRescaleBox.h GlobalParamsWin.cc JPEG2000Stream.cc JPEG2000Stream.h SignatureHandler.cc SignatureHandler.h SplashOutputDev.cc SplashOutputDev.h`\n4. Remove the line `#include \"splash/SplashTypes.h\"` from `native/poppler/poppler/GfxState.cc`.\n\n## License\n\nGPLv2 or later, because the Poppler source is bundled.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsigalor%2Fpoppler-native","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsigalor%2Fpoppler-native","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsigalor%2Fpoppler-native/lists"}