https://github.com/ashutoshvarma/pyxpdf
Fast and memory-efficient Python PDF Parser based on xpdf sources
cython pdf pdf-converter pdf-parser pdfparser pdftohtml pdftopng pdftotext python xpdf xpdf-reader
- Host: GitHub
- URL: https://github.com/ashutoshvarma/pyxpdf
- Owner: ashutoshvarma
- License: other
- Created: 2020-03-28T19:25:53.000Z (over 4 years ago)
- Default Branch: dev
- Last Pushed: 2023-12-15T08:43:40.000Z (11 months ago)
- Last Synced: 2024-11-01T16:39:48.342Z (15 days ago)
- Topics: cython, pdf, pdf-converter, pdf-parser, pdfparser, pdftohtml, pdftopng, pdftotext, python, xpdf, xpdf-reader
- Language: Cython
- Homepage: https://pyxpdf.readthedocs.io/
- Size: 12.2 MB
- Stars: 40
- Watchers: 5
- Forks: 16
- Open Issues: 19
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.rst
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
README
pyxpdf
======
pyxpdf is a fast and memory-efficient Python module for parsing PDF documents, based on xpdf reader sources.

.. start-badges

.. list-table::
    :stub-columns: 1

    * - docs
      - |docs|
    * - tests
      - |azure| |travis| |codecov|
    * - package
      - |pypi| |pythonver| |wheel| |downloads|
    * - license
      - |license|

.. end-badges

Features
--------
- Almost 20 times faster than pure-Python PDF parsers (see `Speed Comparison`_)
- Extract text while preserving the original document layout as closely as possible
- Support almost all PDF encodings, CMaps and predefined CMaps
- Extract LZW, RLE, CCITTFax, DCT, JBIG2 and JPX compressed images and image masks along with their BBox
- Render PDF pages as images, with support for the '1', 'L', 'LA', 'RGB', 'RGBA' and 'CMYK' color modes
- No explicit dependencies (except optional ones, see `Installation`_)
- Thread safe

More Information
----------------

- `Documentation <https://pyxpdf.readthedocs.io/en/latest/>`_

  - `Installation`_
  - Quickstart

- Contribute

  - Build
  - Issues
  - Pull requests

- `Speed Comparison`_
- Changelog

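
The text-extraction and thread-safety features listed above can be combined to process several documents at once. The following is a minimal sketch, not taken from this README: it assumes the ``Document`` class and its ``text()`` method as described in the project's Quickstart, and the helper names ``extract_text`` and ``extract_many`` are hypothetical. Whether threads give a real speedup depends on how much of the parsing runs outside the GIL, which this README does not state.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_text(path):
    """Return the text of one PDF file.

    Assumption: pyxpdf exposes ``Document`` with a ``text()`` method;
    check the project's Quickstart before relying on this.
    """
    # Imported lazily so extract_many() can be used with any extractor.
    from pyxpdf import Document

    return Document(path).text()


def extract_many(paths, extractor=extract_text, max_workers=4):
    """Run ``extractor`` over ``paths`` in a thread pool.

    Returns a dict mapping each path to the extractor's result.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path
        # with its own result.
        return dict(zip(paths, pool.map(extractor, paths)))
```

A call such as ``extract_many(["first.pdf", "second.pdf"])`` (file names are placeholders) would then return a dictionary keyed by path. The ``extractor`` parameter keeps the threading logic independent of pyxpdf, so it can be exercised with any callable.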
License
-------
``pyxpdf`` is licensed under the GNU General Public License (GPL),
version 2 or 3. See the `LICENSE <https://github.com/ashutoshvarma/pyxpdf/blob/master/LICENSE>`_.

Credits
-------
- xpdf reader by Derek Noonburg
- lxml - project structure and build adapted from lxml
- poppler project

.. _`Speed Comparison`: https://pyxpdf.readthedocs.io/en/latest/compare.html
.. _`Installation`: https://pyxpdf.readthedocs.io/en/latest/intro.html#installation

.. |azure| image:: https://img.shields.io/azure-devops/build/ashutoshvarma/pyxpdf/1/master?label=Azure%20Pipelines&style=for-the-badge
    :alt: Azure DevOps builds (branch)
    :target: https://ashutoshvarma.visualstudio.com/pyxpdf/_build
.. |travis| image:: https://img.shields.io/travis/com/ashutoshvarma/pyxpdf?label=Travis&style=for-the-badge
    :alt: Travis (.com)
    :target: https://travis-ci.com/github/ashutoshvarma/pyxpdf
.. |docs| image:: https://img.shields.io/readthedocs/pyxpdf?style=for-the-badge
    :alt: Read the Docs
    :target: https://pyxpdf.readthedocs.io/en/latest/
.. |codecov| image:: https://img.shields.io/codecov/c/github/ashutoshvarma/pyxpdf?style=for-the-badge
    :alt: Codecov
    :target: https://codecov.io/gh/ashutoshvarma/pyxpdf/
.. |license| image:: https://img.shields.io/github/license/ashutoshvarma/pyxpdf?style=for-the-badge
    :alt: GitHub
    :target: https://github.com/ashutoshvarma/pyxpdf/blob/master/LICENSE
.. |pypi| image:: https://img.shields.io/pypi/v/pyxpdf?color=light&style=for-the-badge
    :alt: PyPI
    :target: https://pypi.org/project/pyxpdf/
.. |pythonver| image:: https://img.shields.io/pypi/pyversions/pyxpdf?style=for-the-badge
    :alt: PyPI - Python Version
    :target: https://pypi.org/project/pyxpdf/
.. |wheel| image:: https://img.shields.io/pypi/wheel/pyxpdf?style=for-the-badge
    :alt: PyPI - Wheel
    :target: https://pypi.org/project/pyxpdf/
.. |downloads| image:: https://img.shields.io/pypi/dm/pyxpdf?label=PyPI%20Downloads&style=for-the-badge
    :alt: PyPI - Downloads
    :target: https://pypi.org/project/pyxpdf/