https://github.com/py-pdf/awesome-pdf

A curated list of resources around PDF files
https://github.com/py-pdf/awesome-pdf

List: awesome-pdf

awesome-list pdf

Last synced: 3 months ago
JSON representation

A curated list of resources around PDF files

Host: GitHub
URL: https://github.com/py-pdf/awesome-pdf
Owner: py-pdf
License: bsd-3-clause
Created: 2022-04-14T04:14:27.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-08-02T05:35:42.000Z (11 months ago)
Last Synced: 2025-02-03T19:07:46.182Z (5 months ago)
Topics: awesome-list, pdf
Homepage:
Size: 24.4 KB
Stars: 118
Watchers: 5
Forks: 11
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

ultimate-awesome - awesome-pdf - A curated list of resources around PDF files. (Other Lists / Julia Lists)

README

        # Awesome PDF  [![Awesome](https://awesome.re/badge-flat.svg)](https://awesome.re)

A curated list of resources around PDF files

## The File Format

* PDF Association: [PDF Specification Index](https://www.pdfa.org/resource/pdf-specification-index/), 2021.

* Jindrich Kubec, Jiri Sejtko: [X is not enough! Grab the PDF by the tail!](https://www.virusbulletin.com/uploads/pdf/conference_slides/2011/Kubec-Sejtko-VB2011.pdf) at [Virus Bulletin](https://www.virusbulletin.com/), 2011.

* Selected compilation of PDF Standards from the [Adobe Open Source Reference](https://web.archive.org/web/20220827074128/https://opensource.adobe.com/dc-acrobat-sdk-docs/acrobatsdk/#pdf-reference), 2022.

  1. [PDF Reference 1.0](https://web.archive.org/web/20220827074128/https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.0.pdf)

  2. [PDF Reference 1.2](https://web.archive.org/web/20220827074128/https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.2.pdf)

  3. [PDF Reference 1.3](https://web.archive.org/web/20220827074128/https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.3.pdf)

  4. [PDF Reference 1.4](https://web.archive.org/web/20220827074128/https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.4.pdf)

  5. [PDF Reference 1.5 (v6)](https://web.archive.org/web/20220827074128/https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.5_v6.pdf)

  6. [PDF Reference 1.6](https://web.archive.org/web/20220827074128/https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.6.pdf)

  7. [PDF Reference 1.7 (ISO 32000, 2008)](https://web.archive.org/web/20220827074128/https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf)

  8. [PDF Reference 2.0 (ISO 32000-2:2020)](https://www.pdfa-inc.org/product/iso-32000-2-pdf-2-0-bundle-sponsored-access/) (freely available ISO standard due to corporate sponsorship)

* Adobe: [XMP Specification Part 3](https://github.com/adobe/xmp-docs/blob/master/XMPSpecifications/XMPSpecificationPart3.pdf), January 2020.

## Viewers

* [KOReader](https://github.com/koreader/koreader): a document viewer primarily aimed at e-ink readers

* [react-native-pdf](https://github.com/wonday/react-native-pdf): a react native PDF view component

* [PdfViewPager](https://github.com/voghDev/PdfViewPager): Android widget to display PDF documents in your Activities or Fragments

* [vue-pdf](https://github.com/FranckFreiburger/vue-pdf): vue.js pdf viewer

## Data Extraction

* [pdftotext](https://manpages.debian.org/stretch/poppler-utils/pdftotext.1.en.html): an application that converts Portable Document Format (PDF) files to plain text. Part of poppler-utils.

* [pdfminer.six](https://pypi.org/project/pdfminer.six/): a Python library for extracting information from PDF documents

    * [pdfplumber](https://github.com/jsvine/pdfplumber): Plumb a PDF for detailed information about each text character, rectangle, and line. Plus: Table extraction and visual debugging.

* [Tabula](https://github.com/tabulapdf/tabula): an application for extracting tables

* [camelot](https://github.com/atlanhq/camelot): PDF Table Extraction

* [awesome-document-understanding](https://github.com/tstanislawek/awesome-document-understanding): A curated list of resources for Document Understanding (DU) topic

## Generators

Anything that can produce PDF files from scratch:

* [fpdf2](https://pypi.org/project/fpdf2/): An Open Source Python library for generating PDFs

* pdflatex (e.g. in [TexLive](https://www.tug.org/texlive/)): A LaTeX-to-PDF converter

* [reportlab](https://pypi.org/project/reportlab/): An Open Source Python library for generating PDFs and graphics.

* [prawn](https://github.com/prawnpdf/prawn): a pure Ruby PDF generation library

* [react-pdf](https://github.com/diegomura/react-pdf): Create PDF files using React

* [markdown-pdf](https://github.com/alanshaw/markdown-pdf): Markdown to PDF converter

* [mpdf](https://github.com/mpdf/mpdf): PHP library generating PDF files from UTF-8 encoded HTML

## Manipulators

Anything that's used to edit an existing PDF file:

* [pdfarranger](https://github.com/pdfarranger/pdfarranger): a small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using a graphical interface

* [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF): adds an OCR text layer to scanned PDF files, allowing them to be searched

## File Analysis / Security

* [Pdfalyzer](https://github.com/michelcrypt4d4mus/pdfalyzer): PDF analysis tool to visualize the internal data structure of a PDF in large and colorful diagrams as well as scanning the binary streams embedded in the PDF against a collection of malicious PDF specific YARA rules.

* [Malicious PDF Generator](https://github.com/jonaslejon/malicious-pdf): generate a bunch of malicious pdf files with phone-home functionality

* [pdfbox](https://pdfbox.apache.org/1.8/commandline.html): tool in java to browse internally a pdf. [Download](https://pdfbox.apache.org/download.cgi) and use as `pdfbox-app-x.y.z.jar debug pdf_file`

## Multi-Purpose Libraries

* [pdftk](https://www.pdflabs.com/tools/pdftk-server/): command-line tool for working with PDFs. It is commonly used for client-side scripting or server-side processing of PDFs.

* [pypdf](https://pypi.org/project/pypdf/) ![](https://shields.io/badge/-extract-inactive) ![](https://shields.io/badge/-manipulate-inactive): a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

* [pikepdf](https://github.com/pikepdf/pikepdf) ![](https://shields.io/badge/-extract-inactive) ![](https://shields.io/badge/-manipulate-inactive): a Python library for reading and writing PDF, powered by qpdf

* [PyMuPDF](https://github.com/pymupdf/PyMuPDF) ![](https://shields.io/badge/-extract-inactive) ![](https://shields.io/badge/-manipulate-inactive) ![](https://shields.io/badge/-render-inactive): Python bindings to MuPDF.

* [pypdfium2](https://github.com/pypdfium2-team/pypdfium2) ![](https://shields.io/badge/-extract-inactive) ![](https://shields.io/badge/-manipulate-inactive) ![](https://shields.io/badge/-create-inactive) ![](https://shields.io/badge/-render-inactive): Python bindings to PDFium.

* [borb](https://github.com/jorisschellekens/borb) ![](https://shields.io/badge/-extract-inactive)  ![](https://shields.io/badge/-manipulate-inactive) ![](https://shields.io/badge/-create-inactive): reading, creating and manipulating PDF files in python

* [pdfcpu](https://github.com/pdfcpu/pdfcpu) ![](https://shields.io/badge/-extract-inactive)  ![](https://shields.io/badge/-manipulate-inactive) ![](https://shields.io/badge/-create-inactive): batch processing and scripting via a rich command line

* [pdf-lib](https://github.com/Hopding/pdf-lib)  ![](https://shields.io/badge/-manipulate-inactive) ![](https://shields.io/badge/-create-inactive): Create and modify PDF documents in any JavaScript environment

* [HexaPDF](https://hexapdf.gettalong.org): ![](https://shields.io/badge/-extract-inactive) ![](https://shields.io/badge/-manipulate-inactive) ![](https://shields.io/badge/-create-inactive): A pure Ruby PDF creation and manipulation library

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/py-pdf/awesome-pdf

Awesome Lists containing this project

README