https://github.com/Yuras/pdf-toolbox

A collection of tools for processing PDF files in Haskell

haskell pdf

Last synced: 10 months ago

Host: GitHub
URL: https://github.com/Yuras/pdf-toolbox
Owner: Yuras
Created: 2013-02-13T21:35:47.000Z (almost 13 years ago)
Default Branch: master
Last Pushed: 2024-05-29T18:41:25.000Z (over 1 year ago)
Last Synced: 2025-04-04T14:08:04.791Z (10 months ago)
Topics: haskell, pdf
Language: Haskell
Homepage:
Size: 568 KB
Stars: 182
Watchers: 15
Forks: 26
Open Issues: 12
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-pdf - toolbox
awesome-pdf - pdf-toolbox - Haskell PDF processing tools. (Libraries / Misc/Multi-language)

README

pdf-toolbox
===========

[![Haskell CI](https://github.com/Yuras/pdf-toolbox/actions/workflows/build.yml/badge.svg)](https://github.com/Yuras/pdf-toolbox/actions/workflows/build.yml)

A collection of tools for processing PDF files

Features
--------

 * Written in Haskell
 * Parsing on demand. You don't need to parse or load the entire PDF file
   into memory just to extract one image
 * Different levels of abstraction. You can inspect the high-level (catalog, page tree, pages)
   or low-level (xref, trailer, objects) structure of a PDF file.
   You can even switch between levels of detail on the fly.
 * Extremely fast and memory efficient when you need to inspect only part of the document
 * Reasonably fast and memory efficient in the general case
 * Text extraction with exact glyph positions.
   It can be used, e.g., to implement text selection and copying in a PDF viewer
 * Full support for xref streams and object streams
 * Supports editing of PDF files (incremental updates)
 * Basic support for PDF file generation
 * Encrypted PDF documents are partially supported
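
Glyph positions are what make text selection possible: a viewer can keep only the glyphs whose boxes overlap the area the user dragged over. This README doesn't show pdf-toolbox's glyph API, so the pure-Haskell sketch below uses hypothetical `Glyph` and `Rect` types and a hypothetical `selectText` function (none of them are pdf-toolbox names), assuming each glyph comes with its decoded text and an axis-aligned bounding box in page coordinates.

```haskell
-- Sketch only: Rect, Glyph, intersects and selectText are hypothetical,
-- not part of pdf-toolbox's API.
module Main (main) where

import Data.Maybe (mapMaybe)

-- | Axis-aligned rectangle in page coordinates, assuming x1 <= x2, y1 <= y2.
data Rect = Rect
  { rectX1, rectY1, rectX2, rectY2 :: Double }
  deriving (Show, Eq)

-- | Hypothetical glyph: decoded text (if known) plus its bounding box.
data Glyph = Glyph
  { glyphChar :: Maybe String
  , glyphBox  :: Rect
  }
  deriving (Show, Eq)

-- | True when two rectangles overlap; touching edges count as overlap.
intersects :: Rect -> Rect -> Bool
intersects a b =
  rectX1 a <= rectX2 b && rectX1 b <= rectX2 a &&
  rectY1 a <= rectY2 b && rectY1 b <= rectY2 a

-- | Concatenated text of the glyphs hit by the selection, in extraction order.
-- Glyphs without decoded text are skipped.
selectText :: Rect -> [Glyph] -> String
selectText sel = concat . mapMaybe pick
  where
    pick g
      | glyphBox g `intersects` sel = glyphChar g
      | otherwise                   = Nothing

-- | The word "Hi!" laid out left to right, each glyph 10 units wide.
sample :: [Glyph]
sample =
  [ Glyph (Just "H") (Rect 0 0 10 12)
  , Glyph (Just "i") (Rect 10 0 20 12)
  , Glyph (Just "!") (Rect 20 0 30 12)
  ]

main :: IO ()
main = do
  putStrLn (selectText (Rect 12 5 18 6) sample) -- only the "i" box overlaps
  putStrLn (selectText (Rect 5 5 25 6) sample)  -- overlaps all three glyphs
```

A real viewer would also have to order glyphs by reading direction and insert spaces or line breaks between them; the sketch simply keeps extraction order.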

Still on the TODO list
----------------------

 * Linearized PDF files
 * Higher-level API for incremental updates and PDF generation

Examples
--------

(See also the `examples` and `viewer` directories.)

Inspect the high-level structure:

```haskell
import Control.Monad

import Pdf.Document

main :: IO ()
main =
  withPdfFile "input.pdf" $ \pdf -> do
    -- encrypted documents: try the default user password first
    encrypted <- isEncrypted pdf
    when encrypted $ do
      ok <- setUserPassword pdf defaultUserPassword
      unless ok $
        fail "need password"

    doc <- document pdf
    catalog <- documentCatalog doc
    rootNode <- catalogPageNode catalog
    count <- pageNodeNKids rootNode
    print count

    -- the first page of the document
    page <- pageNodePageByNum rootNode 0

    -- extract text
    txt <- pageExtractText page
    print txt

    -- ...
```
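
The example above fetches a page by number from the root page node. In a PDF file, pages live in a tree of `/Pages` nodes whose `/Count` entry records how many leaf pages sit below each node, so a reader can skip whole subtrees and visit only one root-to-page path. The toy model below illustrates that idea; `PageTree`, `node`, `size` and `pageByNum` are hypothetical names, and this is not necessarily how pdf-toolbox implements `pageNodePageByNum`.

```haskell
-- Toy model of the PDF page tree; not pdf-toolbox's implementation.
module Main (main) where

-- | A leaf page (labelled by a name for the demo) or an intermediate node
-- carrying a cached leaf count, playing the role of /Count in a PDF.
data PageTree
  = Leaf String
  | Node Int [PageTree]
  deriving Show

-- | Number of leaf pages in a subtree, read from the cache, not recomputed.
size :: PageTree -> Int
size (Leaf _)   = 1
size (Node c _) = c

-- | Smart constructor that fills in the cached count from the kids.
node :: [PageTree] -> PageTree
node kids = Node (sum (map size kids)) kids

-- | Zero-based page lookup: subtrees that are entirely before the target
-- page are skipped using their cached counts.
pageByNum :: PageTree -> Int -> Maybe String
pageByNum (Leaf name) n
  | n == 0    = Just name
  | otherwise = Nothing
pageByNum (Node _ kids) n = go kids n
  where
    go [] _ = Nothing
    go (k : rest) i
      | i < size k = pageByNum k i
      | otherwise  = go rest (i - size k)

-- | Five pages spread over a tree of uneven depth.
sample :: PageTree
sample = node
  [ node [Leaf "p1", Leaf "p2"]
  , Leaf "p3"
  , node [node [Leaf "p4"], Leaf "p5"]
  ]

main :: IO ()
main = do
  print (size sample)
  print (map (pageByNum sample) [0 .. 5]) -- the last index is out of range
```

Because each step only compares the target index against cached counts, a lookup touches one node per tree level plus its siblings, rather than every page in the document.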
