https://github.com/smalot/pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.

Host: GitHub
URL: https://github.com/smalot/pdfparser
Owner: smalot
License: lgpl-3.0
Created: 2013-08-30T21:58:21.000Z (almost 13 years ago)
Default Branch: master
Last Pushed: 2025-03-31T14:34:42.000Z (over 1 year ago)
Last Synced: 2025-05-14T08:02:53.901Z (about 1 year ago)
Language: PHP
Homepage:
Size: 63.2 MB
Stars: 2,528
Watchers: 83
Forks: 557
Open Issues: 199
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt

Awesome Lists containing this project

awesome-pdf - pdfparser
php-awesome - PDFParser - PDF document parsing library (Libraries / PDF/Barcodes)
awesome-pdf - PdfParser - Data extraction library. (Libraries / PHP)
awesome-pdf - smalot/pdfparser - A standalone PHP library, provides various tools to extract data from a PDF file. (Parsers, OCR and extraction)

README

# PDF parser

[![Version](https://poser.pugx.org/smalot/pdfparser/v)](//packagist.org/packages/smalot/pdfparser)

![CI](https://github.com/smalot/pdfparser/workflows/CI/badge.svg)

![CS](https://github.com/smalot/pdfparser/workflows/CS/badge.svg)

[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/smalot/pdfparser/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/smalot/pdfparser/?branch=master)

[![Downloads](https://poser.pugx.org/smalot/pdfparser/downloads)](//packagist.org/packages/smalot/pdfparser)

`smalot/pdfparser` is a standalone PHP package that provides various tools to extract data from PDF files.

This library is under **active maintenance**.

There is no active development by the author of this library (at the moment), but we welcome any pull request adding/extending functionality!

See [CONTRIBUTING.md](./CONTRIBUTING.md) for further information about how to contribute.

## Features

- Load/parse objects and headers

- Extract metadata (author, description, ...)

- Extract text from ordered pages

- Support of compressed PDFs

- Support of Mac OS Roman charset encoding

- Handling of hexadecimal and octal encoding in text sections

- Create custom configurations (see [CustomConfig.md](/doc/CustomConfig.md)).

Currently, secured documents and extracting form data are not supported.
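
To make the "hexadecimal and octal encoding" feature concrete, here is a minimal, self-contained sketch of what decoding those two forms involves. The function names `decodeHexString` and `decodeOctalEscapes` are hypothetical and are not part of this library's API; the sketch ignores the other escape sequences a PDF literal string may contain (such as an escaped backslash) and is not how the library itself is implemented.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers for illustration only; not part of smalot/pdfparser.

/** Decode the body of a PDF hexadecimal string, e.g. "48656C6C6F". */
function decodeHexString(string $hex): string
{
    // Whitespace inside a hexadecimal string is ignored.
    $hex = (string) preg_replace('/\s+/', '', $hex);

    // An odd number of digits is read as if a trailing 0 were present.
    if (strlen($hex) % 2 === 1) {
        $hex .= '0';
    }

    return (string) hex2bin($hex);
}

/** Replace octal escapes such as "\110" (one to three digits) with their bytes. */
function decodeOctalEscapes(string $literal): string
{
    return (string) preg_replace_callback(
        '/\\\\([0-7]{1,3})/',
        static function (array $m): string {
            // Keep only the low-order byte of the octal value.
            return chr(octdec($m[1]) & 0xFF);
        },
        $literal
    );
}

echo decodeHexString('48656C6C6F'), "\n";        // Hello
echo decodeOctalEscapes('\\110\\145llo'), "\n"; // Hello
```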

## License

This library is under the [LGPLv3 license](https://github.com/smalot/pdfparser/blob/master/LICENSE.txt).

## Install

This library requires PHP 7.1+ since [v1](https://github.com/smalot/pdfparser/releases/tag/v1.0.0).

You can install it via [Composer](https://getcomposer.org/):

```bash
composer require smalot/pdfparser
```

In case you can't use Composer, you can include `alt_autoload.php-dist`. It will include all required files automatically.

## Quick example

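```php
<?php

// Load Composer's autoloader (adjust the path to your project layout).
require 'vendor/autoload.php';

// Parse the PDF file and build the necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('/path/to/document.pdf');

// Extract the text of all pages and print it.
$text = $pdf->getText();
echo $text;
```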

Further usage information can be found [here](/doc/Usage.md).
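
As a rough sketch of the "extract metadata" and "extract text from ordered pages" features, the helper below collects the document details and the text of each page. The function name `summarizePdf` and the shape of the returned array are assumptions made for illustration; the sketch also assumes the library's autoloader has already been loaded, and it relies only on the `getDetails()`, `getPages()` and `getText()` methods.

```php
<?php

// Assumes the autoloader was loaded elsewhere, e.g. require 'vendor/autoload.php';

/**
 * Hypothetical helper: return metadata and per-page text for one PDF.
 *
 * @return array{details: array, pages: array<int, string>}
 */
function summarizePdf(string $path): array
{
    $parser = new \Smalot\PdfParser\Parser();
    $pdf = $parser->parseFile($path);

    // Number pages from 1 in the order the document returns them.
    $pages = [];
    $number = 1;
    foreach ($pdf->getPages() as $page) {
        $pages[$number++] = $page->getText();
    }

    return [
        'details' => $pdf->getDetails(), // metadata such as author or title
        'pages'   => $pages,
    ];
}
```

Calling `summarizePdf('/path/to/document.pdf')` would then give one array to inspect or serialize; which metadata keys appear depends on the document.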

## Documentation

Documentation can be found in the [doc](/doc) folder.
