PDF Parser
- Host: GitHub
- URL: https://github.com/xp-forge/pdf-parser
- Owner: xp-forge
- Created: 2025-06-22T10:21:07.000Z
- Default Branch: main
- Last Pushed: 2025-06-22T11:56:08.000Z
- Last Synced: 2025-06-22T12:32:01.501Z
- Language: PHP
- Size: 9.77 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: ChangeLog.md
README
PDF Parser
==========
[Build status](https://github.com/xp-forge/pdf-parser/actions)
[XP Framework module](https://github.com/xp-framework/core)
[Licence](https://github.com/xp-framework/core/blob/master/LICENCE.md)
[PHP](http://php.net/)
[Latest stable version](https://packagist.org/packages/xp-forge/pdf-parser)
Parses PDF files to extract text and images.
Example
-------
Low-level usage:
```php
use com\adobe\pdf\PdfReader;
use util\cmd\Console;
use io\streams\FileInputStream;

$reader= new PdfReader(new FileInputStream($argv[1]));

// Create objects lookup table while streaming
$objects= $trailer= [];
foreach ($reader->objects() as $kind => $value) {
  if ('object' === $kind) {
    $objects[$value['id']->hashCode()]= $value['dict'];
  } else if ('trailer' === $kind) {
    $trailer+= $value;
  }
}
Console::writeLine('Trailer: ', $trailer);

// Optional meta information like author and creation date
if ($info= ($trailer['Info'] ?? null)) {
  Console::writeLine('Info: ', $objects[$info->hashCode()]);
}

// Root catalogue; the page tree is referenced from the catalogue's
// /Pages entry, not from the trailer
$root= $objects[$trailer['Root']->hashCode()];
Console::writeLine('Root: ', $root);
Console::writeLine('Pages: ', $objects[$root['Pages']->hashCode()]);
```
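The `Pages` entry points to the root of the page tree: inner nodes have `/Type /Pages` and list their children under `Kids`, while leaves have `/Type /Page`. The following is a minimal sketch that collects the leaf pages in document order from a lookup table shaped like `$objects` in the example above. It assumes dictionaries are PHP arrays and names such as `/Page` are plain strings; the `Ref` class and the `collectPages()` function are hypothetical stand-ins for illustration, not this library's API.

```php
<?php
// Hypothetical stand-in for an indirect reference such as "3 0 R". Only the
// hashCode() method used in the example above is assumed of the real type.
final class Ref {
  private $number, $generation;

  public function __construct(int $number, int $generation= 0) {
    $this->number= $number;
    $this->generation= $generation;
  }

  public function hashCode(): string {
    return $this->number.'_'.$this->generation;
  }
}

/**
 * Returns the leaf page dictionaries below the given page tree node in
 * document order. $node is left untyped so any reference offering
 * hashCode() can be passed in.
 */
function collectPages(array $objects, $node): array {
  $dict= $objects[$node->hashCode()];
  if ('Page' === $dict['Type']) return [$dict];

  // Inner /Pages node: descend into each child, preserving order
  $pages= [];
  foreach ($dict['Kids'] as $kid) {
    $pages= array_merge($pages, collectPages($objects, $kid));
  }
  return $pages;
}

// In-memory lookup table: a root Pages node holding one leaf and a nested
// Pages node with two more leaves
$objects= [
  '1_0' => ['Type' => 'Catalog', 'Pages' => new Ref(2)],
  '2_0' => ['Type' => 'Pages', 'Kids' => [new Ref(3), new Ref(4)], 'Count' => 3],
  '3_0' => ['Type' => 'Page', 'Contents' => 'first'],
  '4_0' => ['Type' => 'Pages', 'Kids' => [new Ref(5), new Ref(6)], 'Count' => 2],
  '5_0' => ['Type' => 'Page', 'Contents' => 'second'],
  '6_0' => ['Type' => 'Page', 'Contents' => 'third'],
];
$root= $objects[(new Ref(1))->hashCode()];
$pages= collectPages($objects, $root['Pages']);
echo count($pages), "\n"; // 3
```

Recursion mirrors the tree's shape directly; a malformed file with a cyclic `Kids` chain would need a visited set to avoid looping forever.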
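The Info dictionary stores dates such as `CreationDate` and `ModDate` as PDF date strings of the form `D:YYYYMMDDHHmmSSOHH'mm'` (ISO 32000), where every field after the year is optional. Below is a minimal sketch for turning one into a `DateTimeImmutable`, assuming the values come back from the lookup table as plain PHP strings. The `parsePdfDate()` helper is hypothetical and not part of this library, and treating a missing offset as UTC is an assumption of the sketch.

```php
<?php
/**
 * Parses a PDF date string (hypothetical helper). Returns null if the
 * input does not have the expected shape or holds an invalid date.
 */
function parsePdfDate(string $input): ?DateTimeImmutable {
  $pattern= '/^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:([Z+\-])(\d{2})?\x27?(\d{2})?\x27?)?$/';
  if (!preg_match($pattern, $input, $m)) return null;

  // Trailing unmatched groups are absent from $m, so fall back to defaults
  $part= function(int $i, string $default) use ($m): string {
    return isset($m[$i]) && '' !== $m[$i] ? $m[$i] : $default;
  };

  // "Z" means UT; a missing offset is treated as UTC (assumption)
  $sign= $part(7, '');
  $offset= '' === $sign || 'Z' === $sign
    ? '+00:00'
    : $sign.$part(8, '00').':'.$part(9, '00')
  ;

  try {
    return new DateTimeImmutable(sprintf(
      '%s-%s-%sT%s:%s:%s%s',
      $m[1], $part(2, '01'), $part(3, '01'), $part(4, '00'), $part(5, '00'), $part(6, '00'), $offset
    ));
  } catch (Exception $e) {
    return null;
  }
}

$created= parsePdfDate("D:20250622102107+02'00'");
echo $created->format('c'), "\n"; // 2025-06-22T10:21:07+02:00
```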
See also
--------
* https://pdfa.org/resource/iso-32000-2/
* https://github.com/pdf-association
* https://opensource.adobe.com/dc-acrobat-sdk-docs/pdflsdk/#pdf-reference