https://github.com/Yuras/pdf-toolbox
A collection of tools for processing PDF files in Haskell
- Host: GitHub
- URL: https://github.com/Yuras/pdf-toolbox
- Owner: Yuras
- Created: 2013-02-13T21:35:47.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2024-05-29T18:41:25.000Z (5 months ago)
- Last Synced: 2024-07-20T21:43:50.866Z (4 months ago)
- Topics: haskell, pdf
- Language: Haskell
- Size: 568 KB
- Stars: 180
- Watchers: 16
- Forks: 25
- Open Issues: 11

pdf-toolbox
===========

[![Haskell CI](https://github.com/Yuras/pdf-toolbox/actions/workflows/build.yml/badge.svg)](https://github.com/Yuras/pdf-toolbox/actions/workflows/build.yml)

A collection of tools for processing PDF files

Features
--------

* Written in Haskell
* Parsing on demand. You don't need to parse or load into memory
  the entire PDF file just to extract one image
* Different levels of abstraction. You can inspect the high level (catalog, page tree, pages)
  or low level (xref, trailer, object) structure of a PDF file.
  You can even switch between levels of detail on the fly.
* Extremely fast and memory efficient when you need to inspect only part of the document
* Reasonably fast and memory efficient in the general case
* Text extraction with exact glyph positions.
  It can be used e.g. to implement text selection and copying in a PDF viewer
* Full support of xref streams and object streams
* Supports editing of PDF files (incremental updates)
* Basic support for PDF file generation
* Encrypted PDF documents are partially supported

Still in TODO list
------------------

* Linearized PDF files
* Higher level API for incremental updates and PDF generation

Examples
--------

(Also see `examples` and `viewer` directories)

Inspect high level structure:

```haskell
import Control.Monad
import Pdf.Document

main :: IO ()
main =
  withPdfFile "input.pdf" $ \pdf -> do
    -- encrypted files need a password before anything can be read
    encrypted <- isEncrypted pdf
    when encrypted $ do
      ok <- setUserPassword pdf defaultUserPassword
      unless ok $
        fail "need password"
    doc <- document pdf
    catalog <- documentCatalog doc
    rootNode <- catalogPageNode catalog
    count <- pageNodeNKids rootNode
    print count
    -- the first page of the document
    page <- pageNodePageByNum rootNode 0
    -- extract text
    txt <- pageExtractText page
    print txt
    -- ...
```
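
The same calls can be combined to walk the whole document. The following is a minimal sketch, not part of the original README. It uses only the functions shown in the example above. It assumes that `pageNodeNKids` on the root node returns the total number of pages as an `Int`, and that `pageNodePageByNum` takes zero-based page numbers, as the `0` for the first page suggests.

```haskell
import Control.Monad
import Pdf.Document

-- Print the extracted text of every page, one page at a time.
-- Assumption: pageNodeNKids on the root node is the total page count,
-- and page numbers passed to pageNodePageByNum are zero-based.
main :: IO ()
main =
  withPdfFile "input.pdf" $ \pdf -> do
    encrypted <- isEncrypted pdf
    when encrypted $ do
      ok <- setUserPassword pdf defaultUserPassword
      unless ok $
        fail "need password"
    doc <- document pdf
    catalog <- documentCatalog doc
    rootNode <- catalogPageNode catalog
    count <- pageNodeNKids rootNode
    -- an empty document gives the empty range [0 .. -1], so nothing is printed
    forM_ [0 .. count - 1] $ \n -> do
      page <- pageNodePageByNum rootNode n
      txt <- pageExtractText page
      putStrLn ("--- page " ++ show (n + 1) ++ " ---")
      print txt
```

Each page is looked up and processed inside the loop, so only one page is handled at a time, which fits the parsing-on-demand design listed under Features.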