https://github.com/smalot/pdfparser
PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
- Host: GitHub
- URL: https://github.com/smalot/pdfparser
- Owner: smalot
- License: lgpl-3.0
- Created: 2013-08-30T21:58:21.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2025-03-31T14:34:42.000Z (2 months ago)
- Last Synced: 2025-05-14T08:02:53.901Z (22 days ago)
- Language: PHP
- Homepage:
- Size: 63.2 MB
- Stars: 2,528
- Watchers: 83
- Forks: 557
- Open Issues: 199
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-pdf - pdfparser
- awesome-pdf - PdfParser - Data extraction library. (Libraries / PHP)
- php-awesome - PDFParser - PDF document parsing library (Libraries / PDF/Barcode)
README
# PDF parser
`smalot/pdfparser` is a standalone PHP package that provides various tools to extract data from PDF files.
This library is under **active maintenance**.
There is no active development by the author of this library (at the moment), but we welcome any pull request adding/extending functionality!
See [CONTRIBUTING.md](./CONTRIBUTING.md) for further information about how to contribute.
## Features
- Load/parse objects and headers
- Extract metadata (author, description, ...)
- Extract text from ordered pages
- Support of compressed PDFs
- Support of Mac OS Roman charset encoding
- Handling of hexadecimal and octal encoding in text sections
- Create custom configurations (see [CustomConfig.md](/doc/CustomConfig.md)).

Secured documents and form data extraction are currently not supported.
## License
This library is under the [LGPLv3 license](https://github.com/smalot/pdfparser/blob/master/LICENSE.txt).
## Install
This library requires PHP 7.1+ since [v1](https://github.com/smalot/pdfparser/releases/tag/v1.0.0).
You can install it via [Composer](https://getcomposer.org/):
```bash
composer require smalot/pdfparser
```
In case you can't use Composer, you can include `alt_autoload.php-dist`. It will include all required files automatically.
## Quick example
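```php
<?php
// Parse the PDF file and build the necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('/path/to/document.pdf');
$text = $pdf->getText();
echo $text;
```
Further usage information can be found [here](/doc/Usage.md).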
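The feature list also mentions metadata extraction and text from ordered pages. The following is a minimal sketch of how both might be collected from a parsed document. It assumes the document object exposes `getDetails()` and `getPages()`, and that each page exposes `getText()`. The `summarizePdf()` helper is hypothetical and not part of the library.

```php
<?php

/**
 * Collect metadata and per-page text from a parsed PDF document.
 *
 * summarizePdf() is a hypothetical helper written for illustration.
 * $pdf is assumed to be the object returned by
 * \Smalot\PdfParser\Parser::parseFile(); only getDetails() and
 * getPages() are called on it, and getText() on each page.
 */
function summarizePdf($pdf): array
{
    $pages = [];
    $number = 0;

    // Walk the pages in the order the document object returns them
    // and store each page's text under a 1-based page number.
    foreach ($pdf->getPages() as $page) {
        $pages[++$number] = $page->getText();
    }

    return [
        'details' => $pdf->getDetails(), // metadata array (author, title, ...)
        'pages'   => $pages,
    ];
}

// Usage, assuming the library was installed with Composer:
//
// require 'vendor/autoload.php';
// $parser  = new \Smalot\PdfParser\Parser();
// $summary = summarizePdf($parser->parseFile('/path/to/document.pdf'));
// print_r($summary['details']);
// echo $summary['pages'][1];
```

Keeping the helper independent of the parser setup means the same function can be reused for documents loaded in other ways; which metadata keys are present depends on the PDF itself.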
## Documentation
Documentation can be found in the [doc](/doc) folder.