{"id":13588968,"url":"https://github.com/Yuras/pdf-toolbox","last_synced_at":"2025-04-08T07:31:32.035Z","repository":{"id":6936952,"uuid":"8188354","full_name":"Yuras/pdf-toolbox","owner":"Yuras","description":"A collection of tools for processing PDF files in Haskell","archived":false,"fork":false,"pushed_at":"2024-05-29T18:41:25.000Z","size":582,"stargazers_count":182,"open_issues_count":12,"forks_count":26,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-04-04T14:08:04.791Z","etag":null,"topics":["haskell","pdf"],"latest_commit_sha":null,"homepage":"","language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Yuras.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-02-13T21:35:47.000Z","updated_at":"2025-03-29T14:41:49.000Z","dependencies_parsed_at":"2024-05-29T21:41:57.013Z","dependency_job_id":null,"html_url":"https://github.com/Yuras/pdf-toolbox","commit_stats":{"total_commits":245,"total_committers":7,"mean_commits":35.0,"dds":0.02857142857142858,"last_synced_commit":"e2ca024904869ed5996b4bd0a2ae5f21c31bb20d"},"previous_names":[],"tags_count":30,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yuras%2Fpdf-toolbox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yuras%2Fpdf-toolbox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yuras%2Fpdf-toolbox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yuras%2Fpdf-toolbox/manifests","owner_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Yuras","download_url":"https://codeload.github.com/Yuras/pdf-toolbox/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247796147,"owners_count":20997521,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["haskell","pdf"],"created_at":"2024-08-01T16:00:16.516Z","updated_at":"2025-04-08T07:31:31.684Z","avatar_url":"https://github.com/Yuras.png","language":"Haskell","readme":"pdf-toolbox\n===========\n\n[![Haskell CI](https://github.com/Yuras/pdf-toolbox/actions/workflows/build.yml/badge.svg)](https://github.com/Yuras/pdf-toolbox/actions/workflows/build.yml)\n\nA collection of tools for processing PDF files\n\n\nFeatures\n--------\n\n * Written in Haskell\n * Parsing on demand. You don't need to parse or load into memory\nthe entire PDF file just to extract one image\n * Different levels of abstraction. You can inspect the high-level (catalog, page tree, pages)\nor low-level (xref, trailer, object) structure of a PDF file.\nYou can even switch between levels of detail on the fly.\n * Extremely fast and memory efficient when you need to inspect only part of the document\n * Reasonably fast and memory efficient in the general case\n * Text extraction with exact glyph positions.\nIt can be used e.g. 
to implement text selection and copying in a PDF viewer\n * Full support for xref streams and object streams\n * Supports editing of PDF files (incremental updates)\n * Basic support for generating PDF files\n * Encrypted PDF documents are partially supported\n\nStill in TODO list\n------------------\n\n * Linearized PDF files\n * Higher level API for incremental updates and PDF generation\n\nExamples\n--------\n\n(Also see the `examples` and `viewer` directories)\n\nInspect high level structure:\n\n```haskell\nimport Control.Monad\nimport Pdf.Document\n\nmain =\n  withPdfFile \"input.pdf\" $ \\pdf -\u003e do\n    encrypted \u003c- isEncrypted pdf\n    when encrypted $ do\n      ok \u003c- setUserPassword pdf defaultUserPassword\n      unless ok $\n        fail \"need password\"\n    doc \u003c- document pdf\n    catalog \u003c- documentCatalog doc\n    rootNode \u003c- catalogPageNode catalog\n    count \u003c- pageNodeNKids rootNode\n    print count\n    -- the first page of the document\n    page \u003c- pageNodePageByNum rootNode 0\n    -- extract text\n    txt \u003c- pageExtractText page\n    print txt\n    ...\n```\n","funding_links":[],"categories":["HASKELL","Haskell","Libraries"],"sub_categories":["Misc/Multi-language"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYuras%2Fpdf-toolbox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FYuras%2Fpdf-toolbox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYuras%2Fpdf-toolbox/lists"}