{"id":19732087,"url":"https://github.com/ffalt/pdf.js-extract","last_synced_at":"2026-04-11T10:19:06.135Z","repository":{"id":37514072,"uuid":"71908973","full_name":"ffalt/pdf.js-extract","owner":"ffalt","description":"nodejs lib for extracting data from PDF files","archived":false,"fork":false,"pushed_at":"2026-03-22T20:43:59.000Z","size":16958,"stargazers_count":250,"open_issues_count":0,"forks_count":59,"subscribers_count":8,"default_branch":"main","last_synced_at":"2026-03-23T08:34:48.089Z","etag":null,"topics":["extracting-data","node-module","pdf"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ffalt.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2016-10-25T15:04:39.000Z","updated_at":"2026-03-22T20:44:02.000Z","dependencies_parsed_at":"2024-06-18T13:42:55.085Z","dependency_job_id":"35041bc2-0802-467b-a4c8-c43a9ca2f367","html_url":"https://github.com/ffalt/pdf.js-extract","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/ffalt/pdf.js-extract","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ffalt%2Fpdf.js-extract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ffalt%2Fpdf.js-extract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ffalt%2Fpdf.js-extract/releases","manifests_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/ffalt%2Fpdf.js-extract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ffalt","download_url":"https://codeload.github.com/ffalt/pdf.js-extract/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ffalt%2Fpdf.js-extract/sbom","scorecard":{"id":398132,"data":{"date":"2025-08-11","repo":{"name":"github.com/ffalt/pdf.js-extract","commit":"fb12fced426bdc4414e4cc68683f0fbb3d9e351c"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.7,"checks":[{"name":"Maintained","score":1,"reason":"2 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 1","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/nodejs.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows 
avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":0,"reason":"Found 2/26 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":2,"reason":"dependency not pinned by hash detected -- score normalized to 2","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/nodejs.yml:19: update your workflow using https://app.stepsecurity.io/secureworkflow/ffalt/pdf.js-extract/nodejs.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/nodejs.yml:21: update your workflow using https://app.stepsecurity.io/secureworkflow/ffalt/pdf.js-extract/nodejs.yml/main?enable=pin","Warn: npmCommand not pinned by hash: update_pdfjs.sh:3","Warn: npmCommand not pinned by hash: update_pdfjs.sh:4","Info:   0 out of   2 GitHub-owned GitHubAction dependencies pinned","Info:   1 out of   3 npmCommand dependencies 
pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":9,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Warn: project license file does not contain an FSF or OSI license."],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'main'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses 
fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 6 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":0,"reason":"11 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-968p-4wvh-cqc8","Warn: Project is vulnerable to: GHSA-67hx-6x53-jw92","Warn: Project is vulnerable to: GHSA-v6h2-p8h4-qcjw","Warn: Project is vulnerable to: GHSA-grv7-fg5c-xmjg","Warn: Project is vulnerable to: GHSA-3xgq-45jj-v275","Warn: Project is vulnerable to: GHSA-9c47-m6qq-7p4h","Warn: Project is vulnerable to: GHSA-952p-6rrq-rcjv","Warn: Project is vulnerable to: GHSA-c2qf-rxjj-qqgw","Warn: Project is vulnerable to: GHSA-72xf-g2v4-qvf3","Warn: Project is vulnerable to: GHSA-j8xg-fqg3-53r7","Warn: Project is vulnerable to: GHSA-3h5v-q93c-6h6q"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-18T19:30:18.484Z","repository_id":37514072,"created_at":"2025-08-18T19:30:18.484Z","updated_at":"2025-08-18T19:30:18.484Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31676837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-11T08:18:19.405Z","status":"ssl_error","status_checked_at":"2026-04-11T08:17:08.892Z","response_time":54,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extracting-data","node-module","pdf"],"created_at":"2024-11-12T00:24:40.417Z","updated_at":"2026-04-11T10:19:06.123Z","avatar_url":"https://github.com/ffalt.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pdf.js-extract\n\nExtracts text/annotations/attachments/images from PDF files\n\n\u003e [!NOTE]\n\u003e This library is for **Node.js**. It is not meant to be used in the browser.\n\nReads a PDF file and exports all pages and their text items with coordinates. \nFor example, this can be 
used to extract structured table data.\nOptions include extracting attachments and images as well.\n\nThis package includes a build of [pdf.js](https://github.com/mozilla/pdf.js).\n\n\u003e [!IMPORTANT]\n\u003e NO OCR! Only text stored as text in the PDF is extracted; text inside scanned page images is not recognized.\n\n## Install\n\n[![NPM](https://nodei.co/npm/pdf.js-extract.png?downloads=true\u0026downloadRank=true\u0026stars=true)](https://www.npmjs.com/package/pdf.js-extract)\n\n![test](https://github.com/ffalt/pdf.js-extract/workflows/test/badge.svg)\n[![license](https://img.shields.io/npm/l/pdf.js-extract.svg)](http://opensource.org/licenses/MIT) \n\n## Options\n```typescript\nexport interface PDFExtractOptions {\n  firstPage?: number; // default:`1` - page number to start extraction at\n  lastPage?: number; // page number to stop extraction at, no default value\n  password?: string; // for decrypting password-protected PDFs, no default value\n  verbosity?: number; // default:`-1` - log level of pdf.js\n  normalizeWhitespace?: boolean; // default:`false` - replaces all occurrences of whitespace with standard spaces (0x20).\n  disableCombineTextItems?: boolean; // default:`false` - do not attempt to combine same-line {@link TextItem}s.\n  includeAttachments?: boolean; // include attachments as base64. The default value is `false`.\n  includeImages?: boolean; // include images as base64. 
The default value is `false`.\n  includeColors?: boolean; // default:`false` - include font fill color (best effort, possibly incomplete).\n}\n```\n\n## Example Usage\n\n### Async JavaScript with Callback using Buffer\n\n```javascript\nimport { PDFExtract } from 'pdf.js-extract';\nimport fs from 'node:fs';\nconst pdfExtract = new PDFExtract();\nconst buffer = fs.readFileSync(\"./example.pdf\");\nconst options = {}; \npdfExtract.extractBuffer(buffer, options, (err, data) =\u003e {\n  if (err) return console.error(err);\n  console.log(data);\n});\n```\n\n### Async JavaScript with Callback\n\n```javascript\nimport { PDFExtract } from 'pdf.js-extract';\nconst pdfExtract = new PDFExtract();\nconst options = {}; \npdfExtract.extract('test.pdf', options, (err, data) =\u003e {\n  if (err) return console.error(err);\n  console.log(data);\n});\n```\n\n### Async TypeScript with Promise\n\n```typescript\nimport {PDFExtract, PDFExtractOptions} from 'pdf.js-extract';\nconst pdfExtract = new PDFExtract();\nconst options: PDFExtractOptions = {}; \npdfExtract.extract('test.pdf', options)\n  .then(data =\u003e console.log(data))\n  .catch(err =\u003e console.error(err));\n```\n\n### Extract Specific Pages\n\n```javascript\nimport { PDFExtract } from 'pdf.js-extract';\nconst pdfExtract = new PDFExtract();\n\n// Extract only pages 2 through 5\nconst data = await pdfExtract.extract('report.pdf', { firstPage: 2, lastPage: 5 });\nconsole.log(`Extracted ${data.pages.length} pages`);\n```\n\n### Password-Protected PDFs\n\n```javascript\nimport { PDFExtract } from 'pdf.js-extract';\nconst pdfExtract = new PDFExtract();\n\nconst data = await pdfExtract.extract('secure.pdf', { password: 'my-secret' });\nconsole.log(data.pages[0].content.map(item =\u003e item.str).join(' '));\n```\n\n### Collect All Text from a PDF\n\n```javascript\nimport { PDFExtract } from 'pdf.js-extract';\nconst pdfExtract = new PDFExtract();\nconst data = await pdfExtract.extract('document.pdf', { normalizeWhitespace: true 
});\n\nconst fullText = data.pages\n  .map(page =\u003e page.content.map(item =\u003e item.str).join(' '))\n  .join('\\n\\n');\n\nconsole.log(fullText);\n```\n\n### Extract Text as Lines and Rows (Table Data)\n\nThe static helpers in `PDFExtract.utils` convert raw text items into lines and table rows.\n\n```javascript\nimport { PDFExtract } from 'pdf.js-extract';\nconst pdfExtract = new PDFExtract();\nconst data = await pdfExtract.extract('table.pdf');\n\nconst page = data.pages[0];\n\n// Group text items into lines (items within 5 units of y are merged)\nconst lines = PDFExtract.utils.pageToLines(page, 5);\n\n// Get plain text rows\nconst rows = PDFExtract.utils.extractTextRows(lines);\nconsole.log(rows); // [['Name', 'Age', 'City'], ['Alice', '30', 'Berlin'], ...]\n\n// Or map to columns by x-positions with a tolerance of 10 units\nconst columns = [50, 200, 350]; // x-positions of each column\nconst tableRows = PDFExtract.utils.extractColumnRows(lines, columns, 10);\nconsole.log(tableRows);\n```\n\n### Extract All Pages as Text Rows\n\n```javascript\nimport { PDFExtract } from 'pdf.js-extract';\nconst pdfExtract = new PDFExtract();\nconst data = await pdfExtract.extract('multi-page.pdf');\n\n// Get text rows for every page at once (merge items within 5 y-units)\nconst allRows = PDFExtract.utils.extractAllPagesTextRows(data.pages, 5);\nallRows.forEach((pageRows, i) =\u003e {\n  console.log(`--- Page ${i + 1} ---`);\n  pageRows.forEach(row =\u003e console.log(row.join(' | ')));\n});\n```\n\n### Extract Links\n\n```javascript\nimport { PDFExtract } from 'pdf.js-extract';\nconst pdfExtract = new PDFExtract();\nconst data = await pdfExtract.extract('document.pdf');\n\ndata.pages.forEach((page) =\u003e {\n  if (!page.annotations) return;\n\n  const links = page.annotations.filter(a =\u003e a.subtype === 'Link' \u0026\u0026 a.url);\n  links.forEach(link =\u003e {\n    console.log(`Page ${page.pageInfo.num}: ${link.overlaidText || 'link'} -\u003e ${link.url}`);\n  
});\n});\n```\n\n### Extract Attachments\n\n```javascript\nimport { PDFExtract } from 'pdf.js-extract';\nimport fs from 'node:fs';\nimport path from 'node:path';\nconst pdfExtract = new PDFExtract();\nconst data = await pdfExtract.extract('document.pdf', { includeAttachments: true });\n\nif (data.attachments) {\n  data.attachments.forEach(att =\u003e {\n    if (att.base64data) {\n      // basename() keeps an untrusted attachment name from writing outside the current directory\n      const name = path.basename(att.filename || 'attachment.bin');\n      const buffer = Buffer.from(att.base64data, 'base64');\n      fs.writeFileSync(name, buffer);\n      console.log(`Saved ${name} (${buffer.length} bytes)`);\n    }\n  });\n}\n```\n\n### Extract Images\n\n```javascript\nimport { PDFExtract } from 'pdf.js-extract';\nimport fs from 'node:fs';\nconst pdfExtract = new PDFExtract();\nconst data = await pdfExtract.extract('document.pdf', { includeImages: true });\n\n// Access images for each page\ndata.pages.forEach((page) =\u003e {\n  if (page.images \u0026\u0026 page.images.length \u003e 0) {\n    console.log(`Page ${page.pageInfo.num} has ${page.images.length} images`);\n    \n    page.images.forEach((img) =\u003e {\n      console.log(`  Image ${img.index}: ${img.width}x${img.height}px (${img.colorSpace})`);\n      \n      // Save image if data available; the page number keeps file names unique across pages\n      if (img.base64data) {\n        const buffer = Buffer.from(img.base64data, 'base64');\n        // Note: the .jpg extension only fits JPEG data (filter DCTDecode); check img.filter for other formats\n        fs.writeFileSync(`image_${page.pageInfo.num}_${img.index}.jpg`, buffer);\n      }\n    });\n  }\n});\n```\n\n#### Image Properties\n\nEach extracted image contains:\n\n```typescript\ninterface PDFExtractImage {\n  index: number;              // Image index on the page\n  width: number;              // Image width in pixels\n  height: number;             // Image height in pixels\n  kind: number;               // Image type: 1=XObject, 2=Inline, 3=Form\n  base64data?: string;        // Base64-encoded image data\n  colorSpace?: string;        // Color space (DeviceRGB, DeviceGray, DeviceCMYK, etc.)\n  bitsPerComponent?: number;  // Bits per component (typically 8)\n  filter?: string;            // Compression filter (DCTDecode, FlateDecode, etc.)\n}\n```\n\n#### Image Types\n\n- 
**kind 1 - XObject**: Standard image objects from page resources (most common)\n- **kind 2 - Inline**: Images embedded directly in content streams\n- **kind 3 - Form**: Images contained within Form XObjects\n\n## Example Output\n\n```json\n{\n  \"filename\": \"helloworld.pdf\",\n  \"meta\": {\n    \"info\": {\n      \"PDFFormatVersion\": \"1.7\",\n      \"IsAcroFormPresent\": false,\n      \"IsCollectionPresent\": false,\n      \"IsLinearized\": true,\n      \"IsXFAPresent\": false\n    },\n    \"metadata\": {\n      \"dc:format\": \"application/pdf\",\n      \"dc:creator\": \"someone\",\n      \"dc:title\": \"This is a hello world PDF file\",\n      \"xmp:createdate\": \"2000-06-29T10:21:08+11:00\",\n      \"xmp:creatortool\": \"Microsoft Word 8.0\",\n      \"xmp:modifydate\": \"2013-10-28T15:24:13-04:00\",\n      \"xmp:metadatadate\": \"2013-10-28T15:24:13-04:00\",\n      \"pdf:producer\": \"Acrobat Distiller 4.0 for Windows\",\n      \"xmpmm:documentid\": \"uuid:0205e221-80a8-459e-a522-635ed5c1e2e6\",\n      \"xmpmm:instanceid\": \"uuid:68d6ae6d-43c4-472d-9b28-7c4add8f9e46\"\n    }\n  },\n  \"pages\": [\n    {\n      \"pageInfo\": {\n        \"num\": 1,\n        \"scale\": 1,\n        \"rotation\": 0,\n        \"offsetX\": 0,\n        \"offsetY\": 0,\n        \"width\": 200,\n        \"height\": 200,\n        \"view\": { \"minX\": 0, \"minY\": 0, \"maxX\": 200, \"maxY\": 200 }\n      },\n      \"annotations\": [\n        {\n          \"annotationType\": 2,\n          \"annotationFlags\": 0,\n          \"borderStyle\": {\n            \"width\": 0,\n            \"rawWidth\": 1,\n            \"style\": 1,\n            \"dashArray\": [3],\n            \"horizontalCornerRadius\": 0,\n            \"verticalCornerRadius\": 0\n          },\n          \"color\": \"#000000\",\n          \"borderColor\": \"#000000\",\n          \"rotation\": 0,\n          \"contentsObj\": {\n            \"str\": \"\",\n            \"dir\": \"ltr\"\n          },\n          
\"hasAppearance\": false,\n          \"id\": \"4R\",\n          \"rect\": [92.043, 771.389, 217.757, 785.189],\n          \"subtype\": \"Link\",\n          \"hasOwnCanvas\": false,\n          \"noRotate\": false,\n          \"noHTML\": false,\n          \"isEditable\": false,\n          \"structParent\": -1,\n          \"url\": \"https://example.com/\",\n          \"unsafeUrl\": \"https://example.com/\",\n          \"overlaidText\": \"a link to an awesome site\",\n          \"x\": 217.757,\n          \"y\": 785.189\n        }\n      ],\n      \"content\": [\n        {\n          \"x\": 70,\n          \"y\": 150,\n          \"str\": \"Hello, world!\",\n          \"dir\": \"ltr\",\n          \"width\": 64.656,\n          \"height\": 12,\n          \"transform\": [12, 0, 0, 12, 70, 50],\n          \"font\": {\n            \"size\": 12,\n            \"name\": \"TimesNewRomanPSMT\",\n            \"color\": \"#000000\",\n            \"family\": \"serif\",\n            \"vertical\": false,\n            \"ascent\": 0.891,\n            \"descent\": -0.216\n          },\n          \"hasEOL\": false\n        }\n      ],\n      \"images\": [\n        {\n          \"index\": 0,\n          \"width\": 16,\n          \"height\": 16,\n          \"kind\": 1,\n          \"base64data\": \"AAAAAAAAEAgAAAEAAQABAAEAAAAQEBAwD+AAAAAAAAA=\"\n        }\n      ]\n    }\n  ],\n  \"attachments\": [\n    {\n      \"filename\": \"My first attachment\",\n      \"base64data\": \"VGhpcyBpcyB0aGUgY29udGVudHMgb2YgYSBub24gb3Mgc3BlY2lmaWMgZW1iZWRkZWQgZmlsZQ==\"\n    }\n  ],\n  \"pdfInfo\": {\n    \"numPages\": 1,\n    \"fingerprint\": \"1ee9219eb9eaa49acbfc20155ac359c3\"\n  }\n}\n```\n\nNote: The `images` and `attachments` arrays are optional and only included when they are detected in the PDF. 
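\n\nBecause these fields can be missing, code that consumes the result should not assume they exist. A minimal sketch of one defensive pattern, assuming only the field names shown in the example output (`pages[].images`, `attachments[].filename`); the `summarize` helper is hypothetical and not part of the pdf.js-extract API:\n\n```javascript\n// Hypothetical helper, not part of pdf.js-extract:\n// treats a missing `images` or `attachments` array as empty.\nfunction summarize(data) {\n  const imageCount = data.pages.reduce(\n    (sum, page) =\u003e sum + (page.images ?? []).length,\n    0\n  );\n  const attachmentNames = (data.attachments ?? []).map(att =\u003e att.filename);\n  return { imageCount, attachmentNames };\n}\n\n// Usage, e.g. with the result of\n// pdfExtract.extract('document.pdf', { includeImages: true, includeAttachments: true })\n// const { imageCount, attachmentNames } = summarize(data);\n```\n\nFalling back to `?? []` avoids a separate `if` check for each optional array.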
\n\n## Limitations\n\n### Font Color\n\nFont color extraction is enabled by setting `includeColors: true`.\nThe `font.color` value is determined on a best-effort basis by correlating the rendering operator list with the text content items by position.\nWhen pdf.js merges adjacent text runs with different colors into a single content item (e.g. differently colored words on the same line), \nonly the first color is reported.\nThis is an inherent limitation of how pdf.js combines text during extraction and cannot be resolved without upstream changes to pdf.js.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fffalt%2Fpdf.js-extract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fffalt%2Fpdf.js-extract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fffalt%2Fpdf.js-extract/lists"}