pyxpdf
======
pyxpdf is a fast, memory-efficient Python module for parsing PDF documents, based on the xpdf reader sources.

.. start-badges

.. list-table::
    :stub-columns: 1

    * - docs
      - |docs|
    * - tests
      - |azure| |travis| |codecov|
    * - package
      - |pypi| |pythonver| |wheel| |downloads|
    * - license
      - |license|

.. end-badges

Features
--------
- Almost 20 times faster than pure-Python PDF parsers (see `Speed Comparison`_)
- Extracts text while preserving the original document layout as closely as possible
- Supports almost all PDF encodings, CMaps and predefined CMaps
- Extracts LZW, RLE, CCITTFax, DCT, JBIG2 and JPX compressed images and image masks, along with their bounding boxes (BBox)
- Renders PDF pages as images, with support for the '1', 'L', 'LA', 'RGB', 'RGBA' and 'CMYK' color modes
- No explicit dependencies (except optional ones, see `Installation`_)
- Thread safe
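
Because the parser is described as thread safe, several documents can in principle be processed from worker threads. The following is a minimal sketch, not taken from the pyxpdf documentation: ``pyxpdf_text`` and ``extract_many`` are hypothetical helper names, and the ``Document(path).text()`` call is an assumption about the pyxpdf API that should be checked against the Quickstart. The extractor is passed in as a parameter so the pattern can be tried with any callable.

```python
from concurrent.futures import ThreadPoolExecutor


def pyxpdf_text(path):
    """Return all text of one PDF.

    Assumption: pyxpdf exposes Document(path).text(); verify against the docs.
    """
    # Imported lazily so extract_many() stays usable with other extractors
    # even when pyxpdf is not installed.
    from pyxpdf import Document
    return Document(path).text()


def extract_many(paths, extractor=pyxpdf_text, max_workers=4):
    """Run `extractor` on every path in worker threads.

    Returns a dict mapping each path to its result, in input order.
    """
    paths = list(paths)  # materialize once; the sequence is iterated twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return dict(zip(paths, pool.map(extractor, paths)))
```

Whether threads actually shorten wall-clock time depends on how much work the library does without holding the interpreter lock; the sketch only shows the calling pattern.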

More Information
----------------

- `Documentation <https://pyxpdf.readthedocs.io/en/latest/>`_

  - `Installation`_
  - Quickstart

- Contribute

  - Build
  - Issues
  - Pull requests

- `Speed Comparison`_

- Changelog

License
-------
``pyxpdf`` is licensed under the GNU General Public License (GPL),
version 2 or 3. See the `LICENSE <https://github.com/ashutoshvarma/pyxpdf/blob/master/LICENSE>`_ file.

Credits
-------
- xpdf reader by Derek Noonburg
- lxml - project structure and build adapted from lxml
- poppler project

.. _`Speed Comparison`: https://pyxpdf.readthedocs.io/en/latest/compare.html
.. _`Installation`: https://pyxpdf.readthedocs.io/en/latest/intro.html#installation

.. |azure| image:: https://img.shields.io/azure-devops/build/ashutoshvarma/pyxpdf/1/master?label=Azure%20Pipelines&style=for-the-badge
    :alt: Azure DevOps builds (branch)
    :target: https://ashutoshvarma.visualstudio.com/pyxpdf/_build

.. |travis| image:: https://img.shields.io/travis/com/ashutoshvarma/pyxpdf?label=Travis&style=for-the-badge
    :alt: Travis (.com)
    :target: https://travis-ci.com/github/ashutoshvarma/pyxpdf

.. |docs| image:: https://img.shields.io/readthedocs/pyxpdf?style=for-the-badge
    :alt: Read the Docs
    :target: https://pyxpdf.readthedocs.io/en/latest/

.. |codecov| image:: https://img.shields.io/codecov/c/github/ashutoshvarma/pyxpdf?style=for-the-badge
    :alt: Codecov
    :target: https://codecov.io/gh/ashutoshvarma/pyxpdf/

.. |license| image:: https://img.shields.io/github/license/ashutoshvarma/pyxpdf?style=for-the-badge
    :alt: GitHub
    :target: https://github.com/ashutoshvarma/pyxpdf/blob/master/LICENSE

.. |pypi| image:: https://img.shields.io/pypi/v/pyxpdf?color=light&style=for-the-badge
    :alt: PyPI
    :target: https://pypi.org/project/pyxpdf/

.. |pythonver| image:: https://img.shields.io/pypi/pyversions/pyxpdf?style=for-the-badge
    :alt: PyPI - Python Version
    :target: https://pypi.org/project/pyxpdf/

.. |wheel| image:: https://img.shields.io/pypi/wheel/pyxpdf?style=for-the-badge
    :alt: PyPI - Wheel
    :target: https://pypi.org/project/pyxpdf/

.. |downloads| image:: https://img.shields.io/pypi/dm/pyxpdf?label=PyPI%20Downloads&style=for-the-badge
    :alt: PyPI - Downloads
    :target: https://pypi.org/project/pyxpdf/