{"id":18686529,"url":"https://github.com/ashutoshvarma/pyxpdf","last_synced_at":"2025-07-13T23:08:10.334Z","repository":{"id":45577627,"uuid":"250876246","full_name":"ashutoshvarma/pyxpdf","owner":"ashutoshvarma","description":"Fast and memory-efficient Python PDF Parser based on xpdf sources","archived":false,"fork":false,"pushed_at":"2023-12-15T08:43:40.000Z","size":12775,"stargazers_count":42,"open_issues_count":20,"forks_count":17,"subscribers_count":5,"default_branch":"dev","last_synced_at":"2025-06-17T12:44:58.539Z","etag":null,"topics":["cython","pdf","pdf-converter","pdf-parser","pdfparser","pdftohtml","pdftopng","pdftotext","python","xpdf","xpdf-reader"],"latest_commit_sha":null,"homepage":"https://pyxpdf.readthedocs.io/","language":"Cython","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ashutoshvarma.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.rst","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-28T19:25:53.000Z","updated_at":"2025-04-04T13:10:44.000Z","dependencies_parsed_at":"2024-06-21T20:21:18.121Z","dependency_job_id":"96fefa88-31ed-4aec-a47f-c4af61572db4","html_url":"https://github.com/ashutoshvarma/pyxpdf","commit_stats":{"total_commits":307,"total_committers":4,"mean_commits":76.75,"dds":0.009771986970684043,"last_synced_commit":"05527ec67ebb6b6c28dc24977ac7110f5ed63899"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/ashutoshvarma/pyxpdf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashutoshvarma%2Fpyxpdf","tags_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/ashutoshvarma%2Fpyxpdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashutoshvarma%2Fpyxpdf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashutoshvarma%2Fpyxpdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ashutoshvarma","download_url":"https://codeload.github.com/ashutoshvarma/pyxpdf/tar.gz/refs/heads/dev","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashutoshvarma%2Fpyxpdf/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265218748,"owners_count":23729527,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cython","pdf","pdf-converter","pdf-parser","pdfparser","pdftohtml","pdftopng","pdftotext","python","xpdf","xpdf-reader"],"created_at":"2024-11-07T10:27:59.375Z","updated_at":"2025-07-13T23:08:10.309Z","avatar_url":"https://github.com/ashutoshvarma.png","language":"Cython","funding_links":[],"categories":[],"sub_categories":[],"readme":"pyxpdf\n======\npyxpdf is a fast and memory efficient python module for parsing PDF documents based on xpdf reader sources.\n\n\n.. start-badges\n\n.. list-table::\n    :stub-columns: 1\n\n    * - docs\n      - |docs|\n    * - tests\n      - |azure| |travis| |codecov| \n    * - package\n      - |pypi| |pythonver| |wheel| |downloads|\n    * - license\n      - |license|\n\n.. 
end-badges\n\nFeatures\n--------\n- Almost 20 times faster than pure-Python PDF parsers (see `Speed Comparison`_)\n- Extract text while preserving the original document layout as closely as possible\n- Support almost all PDF encodings, CMaps and predefined CMaps.\n- Extract LZW, RLE, CCITTFax, DCT, JBIG2 and JPX compressed images and image masks along with their BBox.\n- Render PDF pages as images, with support for the '1', 'L', 'LA', 'RGB', 'RGBA' and 'CMYK' color modes.\n- No explicit dependencies (except optional ones, see `Installation`_)\n- Thread safe\n\nMore Information\n----------------\n\n- `Documentation \u003chttps://pyxpdf.readthedocs.io/\u003e`_\n\n  - `Installation`_\n  - `Quickstart \u003chttps://pyxpdf.readthedocs.io/en/latest/intro.html#quick-start\u003e`_\n\n- `Contribute \u003chttps://github.com/ashutoshvarma/pyxpdf/blob/master/.github/CONTRIBUTING.md\u003e`_\n\n  - `Build \u003chttps://github.com/ashutoshvarma/pyxpdf/blob/master/BUILD.rst\u003e`_\n  - `Issues \u003chttps://github.com/ashutoshvarma/pyxpdf/issues\u003e`_\n  - `Pull requests \u003chttps://github.com/ashutoshvarma/pyxpdf/pulls\u003e`_\n\n- `Speed Comparison`_\n\n- `Changelog \u003chttps://pyxpdf.readthedocs.io/en/latest/changelog.html\u003e`_\n\nLicense\n-------\n``pyxpdf`` is licensed under the GNU General Public License (GPL),\nversion 2 or 3. See the `LICENSE \u003chttps://github.com/ashutoshvarma/pyxpdf/blob/master/LICENSE\u003e`_ file for details.\n\nCredits\n-------\n- `xpdf reader \u003chttps://www.xpdfreader.com/\u003e`_ by Derek Noonburg\n- `lxml \u003chttps://www.github.com/lxml/lxml\u003e`_ - project structure and build adapted from lxml\n- `poppler \u003chttps://poppler.freedesktop.org/\u003e`_ project\n\n.. _`Speed Comparison`: https://pyxpdf.readthedocs.io/en/latest/compare.html\n.. _`Installation`: https://pyxpdf.readthedocs.io/en/latest/intro.html#installation\n\n.. 
|azure| image:: https://img.shields.io/azure-devops/build/ashutoshvarma/pyxpdf/1/master?label=Azure%20Pipelines\u0026style=for-the-badge   \n   :alt: Azure DevOps builds (branch)\n   :target: https://ashutoshvarma.visualstudio.com/pyxpdf/_build\n.. |travis| image:: https://img.shields.io/travis/com/ashutoshvarma/pyxpdf?label=Travis\u0026style=for-the-badge   \n   :alt: Travis (.com)\n   :target: https://travis-ci.com/github/ashutoshvarma/pyxpdf     \n.. |docs| image:: https://img.shields.io/readthedocs/pyxpdf?style=for-the-badge         \n   :alt: Read the Docs\n   :target: https://pyxpdf.readthedocs.io/en/latest/\n          \n.. |codecov| image:: https://img.shields.io/codecov/c/github/ashutoshvarma/pyxpdf?style=for-the-badge   \n   :alt: Codecov\n   :target: https://codecov.io/gh/ashutoshvarma/pyxpdf/\n             \n.. |license| image:: https://img.shields.io/github/license/ashutoshvarma/pyxpdf?style=for-the-badge   \n   :alt: GitHub\n   :target: https://github.com/ashutoshvarma/pyxpdf/blob/master/LICENSE\n             \n.. |pypi| image:: https://img.shields.io/pypi/v/pyxpdf?color=light\u0026style=for-the-badge   \n   :alt: PyPI\n   :target: https://pypi.org/project/pyxpdf/\n\n.. |pythonver| image:: https://img.shields.io/pypi/pyversions/pyxpdf?style=for-the-badge   \n   :alt: PyPI - Python Version\n   :target: https://pypi.org/project/pyxpdf/\n\n.. |wheel| image:: https://img.shields.io/pypi/wheel/pyxpdf?style=for-the-badge   \n   :alt: PyPI - Wheel\n   :target: https://pypi.org/project/pyxpdf/\n           \n.. 
|downloads| image:: https://img.shields.io/pypi/dm/pyxpdf?label=PyPI%20Downloads\u0026style=for-the-badge   \n   :alt: PyPI - Downloads\n   :target: https://pypi.org/project/pyxpdf/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashutoshvarma%2Fpyxpdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashutoshvarma%2Fpyxpdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashutoshvarma%2Fpyxpdf/lists"}