{"id":13404807,"url":"https://github.com/smalot/pdfparser","last_synced_at":"2026-01-08T10:12:49.390Z","repository":{"id":10357382,"uuid":"12496360","full_name":"smalot/pdfparser","owner":"smalot","description":"PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.","archived":false,"fork":false,"pushed_at":"2025-03-31T14:34:42.000Z","size":66233,"stargazers_count":2528,"open_issues_count":199,"forks_count":557,"subscribers_count":83,"default_branch":"master","last_synced_at":"2025-05-14T08:02:53.901Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/smalot.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2013-08-30T21:58:21.000Z","updated_at":"2025-05-13T19:08:27.000Z","dependencies_parsed_at":"2023-01-13T15:53:48.470Z","dependency_job_id":"9aea542e-cd21-4321-90ed-429bede10e10","html_url":"https://github.com/smalot/pdfparser","commit_stats":{"total_commits":380,"total_committers":72,"mean_commits":5.277777777777778,"dds":0.7973684210526316,"last_synced_commit":"f44ada017eac4f607ffeb1caca96a2347d48f38f"},"previous_names":[],"tags_count":72,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smalot%2Fpdfparser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smalot%2Fpdfparser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smalot%2Fpdfparser/releases","manifes
ts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smalot%2Fpdfparser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/smalot","download_url":"https://codeload.github.com/smalot/pdfparser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254101588,"owners_count":22014907,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T19:01:51.689Z","updated_at":"2026-01-08T10:12:49.349Z","avatar_url":"https://github.com/smalot.png","language":"PHP","funding_links":[],"categories":["PHP","类库","Libraries","Parsers, OCR and extraction"],"sub_categories":["PDF/条形码","PHP"],"readme":"# PDF parser\n\n[![Version](https://poser.pugx.org/smalot/pdfparser/v)](//packagist.org/packages/smalot/pdfparser)\n![CI](https://github.com/smalot/pdfparser/workflows/CI/badge.svg)\n![CS](https://github.com/smalot/pdfparser/workflows/CS/badge.svg)\n[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/smalot/pdfparser/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/smalot/pdfparser/?branch=master)\n[![Downloads](https://poser.pugx.org/smalot/pdfparser/downloads)](//packagist.org/packages/smalot/pdfparser)\n\nThe `smalot/pdfparser` is a standalone PHP package that provides various tools to extract data from PDF files.\n\nThis library is under **active maintenance**.\nThere is no active development by the author of this library (at the moment), but we welcome any pull request adding/extending functionality!\nSee [CONTRIBUTING.md](./CONTRIBUTING.md) for further 
information about how to contribute.\n\n## Features\n\n- Load/parse objects and headers\n- Extract metadata (author, description, ...)\n- Extract text from ordered pages\n- Support for compressed PDFs\n- Support for Mac OS Roman charset encoding\n- Handling of hexadecimal and octal encoding in text sections\n- Create custom configurations (see [CustomConfig.md](/doc/CustomConfig.md)).\n\nCurrently, secured documents and form data extraction are not supported.\n\n## License\n\nThis library is under the [LGPLv3 license](https://github.com/smalot/pdfparser/blob/master/LICENSE.txt).\n\n## Install\n\nThis library requires PHP 7.1+ since [v1](https://github.com/smalot/pdfparser/releases/tag/v1.0.0).\nYou can install it via [Composer](https://getcomposer.org/):\n\n```bash\ncomposer require smalot/pdfparser\n```\n\nIf you can't use Composer, you can include `alt_autoload.php-dist`, which loads all required files automatically.\n\n## Quick example\n\n```php\n\u003c?php\n\n// Parse PDF file and build necessary objects.\n$parser = new \\Smalot\\PdfParser\\Parser();\n$pdf = $parser-\u003eparseFile('/path/to/document.pdf');\n\n$text = $pdf-\u003egetText();\necho $text;\n```\n\nFurther usage information can be found [here](/doc/Usage.md).\n\n## Documentation\n\nDocumentation can be found in the [doc](/doc) folder.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmalot%2Fpdfparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmalot%2Fpdfparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmalot%2Fpdfparser/lists"}