https://github.com/py-pdf/benchmarks
Benchmarking PDF libraries
- Host: GitHub
- URL: https://github.com/py-pdf/benchmarks
- Owner: py-pdf
- License: bsd-3-clause
- Created: 2022-05-08T13:18:00.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2023-10-31T21:45:39.000Z (over 2 years ago)
- Last Synced: 2025-05-20T03:07:19.811Z (10 months ago)
- Topics: benchmark, data-extraction, mupdf, pdf, poppler-utils, pypdf2, text-extraction
- Language: Python
- Homepage:
- Size: 3.73 MB
- Stars: 281
- Watchers: 5
- Forks: 15
- Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE
# PDF Library Benchmarks
This benchmark covers reading pure (born-digital) PDF files. It does not cover scanned documents or documents to which OCR has been applied.
## Benchmarking machine
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
## Input Documents
| # | Name | File Size | Pages |
| -: | :----------------------------------------------------------------------------------------------- | --------: | ----: |
| 1 | [2201.00214](https://arxiv.org/pdf/2201.00214.pdf) | 2.4MiB | 22 |
| 2 | [GeoTopo-book](https://github.com/py-pdf/sample-files/raw/main/009-pdflatex-geotopo/GeoTopo.pdf) | 5.1MiB | 117 |
| 3 | [2201.00151](https://arxiv.org/pdf/2201.00151.pdf) | 1.5MiB | 12 |
| 4 | [1707.09725](https://arxiv.org/pdf/1707.09725.pdf) | 7.0MiB | 134 |
| 5 | [2201.00021](https://arxiv.org/pdf/2201.00021.pdf) | 2.6MiB | 10 |
| 6 | [2201.00037](https://arxiv.org/pdf/2201.00037.pdf) | 2.9MiB | 33 |
| 7 | [2201.00069](https://arxiv.org/pdf/2201.00069.pdf) | 14.7MiB | 15 |
| 8 | [2201.00178](https://arxiv.org/pdf/2201.00178.pdf) | 2.3MiB | 16 |
| 9 | [2201.00201](https://arxiv.org/pdf/2201.00201.pdf) | 1.3MiB | 9 |
| 10 | [1602.06541](https://arxiv.org/pdf/1602.06541.pdf) | 2.9MiB | 16 |
| 11 | [2201.00200](https://arxiv.org/pdf/2201.00200.pdf) | 284.8KiB | 7 |
| 12 | [2201.00022](https://arxiv.org/pdf/2201.00022.pdf) | 1.2MiB | 14 |
| 13 | [2201.00029](https://arxiv.org/pdf/2201.00029.pdf) | 797.6KiB | 12 |
| 14 | [1601.03642](https://arxiv.org/pdf/1601.03642.pdf) | 1004.9KiB | 8 |
## Libraries
| Name | Last PyPI Release | License | Version | Dependencies |
| -----------: | :---------------- | ------------------------------: | -------: | :-------------------------------------------------------- |
| pypdfium2 | 2024-12-19 | Apache-2.0 or BSD-3-Clause | 4.30.1 | PDFium (Foxit/Google) |
| pdfminer.six | 2025-05-06 | MIT/X | 20250506 | |
| pdfplumber | 2025-06-12 | MIT | 0.11.7 | pdfminer.six |
| pdfrw | 2017-09-18 | MIT | 0.4 | |
| pdftotext | - | GPL | 0.86.1 | build-essential libpoppler-cpp-dev pkg-config python3-dev |
| PyMuPDF | 2025-06-12 | GNU AFFERO GPL 3.0 / Commercial | 1.26.1 | MuPDF |
| pypdf | 2025-06-29 | BSD 3-Clause | 5.7.0 | |
| Tika | 2025-03-26 | Apache v2 | 3.1.0 | Apache Tika |
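To compare a local environment against the Version column above, the installed versions can be queried from package metadata. This is a minimal sketch, not part of the benchmark itself. It assumes the table's names are the PyPI distribution names, with `tika` in lower case. For `pdftotext` it reports the version of the Python binding, which may not be the number shown in the table.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the libraries in the table above.
DISTRIBUTIONS = [
    "pypdfium2", "pdfminer.six", "pdfplumber", "pdfrw",
    "pdftotext", "PyMuPDF", "pypdf", "tika",
]


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(dist_names=DISTRIBUTIONS):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    for name, ver in report().items():
        print(f"{name:>12}: {ver or 'not installed'}")
```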
## Text Extraction Speed
| # | Library | Average | [ 1 ](https://arxiv.org/pdf/2201.00214.pdf) | [ 2 ](https://github.com/py-pdf/sample-files/raw/main/009-pdflatex-geotopo/GeoTopo.pdf) | [ 3 ](https://arxiv.org/pdf/2201.00151.pdf) | [ 4 ](https://arxiv.org/pdf/1707.09725.pdf) | [ 5 ](https://arxiv.org/pdf/2201.00021.pdf) | [ 6 ](https://arxiv.org/pdf/2201.00037.pdf) | [ 7 ](https://arxiv.org/pdf/2201.00069.pdf) | [ 8 ](https://arxiv.org/pdf/2201.00178.pdf) | [ 9 ](https://arxiv.org/pdf/2201.00201.pdf) | [ 10 ](https://arxiv.org/pdf/1602.06541.pdf) | [ 11 ](https://arxiv.org/pdf/2201.00200.pdf) | [ 12 ](https://arxiv.org/pdf/2201.00022.pdf) | [ 13 ](https://arxiv.org/pdf/2201.00029.pdf) | [ 14 ](https://arxiv.org/pdf/1601.03642.pdf) |
| :- | :-------------------------------------------------------- | :------ | :---------------------------------------------- | :------------------------------------------------------------------------------------------ | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- |
| 1 | [PyMuPDF ](https://pypi.org/project/PyMuPDF/) | 0.1s | 0.4s | 0.3s | 0.2s | 0.2s | 0.0s | 0.1s | 0.0s | 0.1s | 0.0s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s |
| 2 | [pypdfium2 ](https://pypi.org/project/pypdfium2/) | 0.1s | 0.5s | 0.3s | 0.2s | 0.2s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.0s |
| 3 | [Tika ](https://pypi.org/project/tika/) | 0.2s | 0.8s | 0.5s | 0.3s | 0.3s | 0.1s | 0.2s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.0s |
| 4 | [pdftotext ](https://poppler.freedesktop.org/) | 0.3s | 0.7s | 0.9s | 0.2s | 0.8s | 0.1s | 0.3s | 0.4s | 0.1s | 0.1s | 0.2s | 0.1s | 0.1s | 0.0s | 0.0s |
| 5 | [pypdf ](https://pypi.org/project/pypdf/) | 3.5s | 26.2s | 6.4s | 6.8s | 3.3s | 0.9s | 1.6s | 0.6s | 0.6s | 0.5s | 0.8s | 0.6s | 0.6s | 0.5s | 0.3s |
| 6 | [pdfminer.six ](https://pypi.org/project/pdfminer.six/) | 5.8s | 35.1s | 16.6s | 10.2s | 5.5s | 1.5s | 2.5s | 1.1s | 1.6s | 1.1s | 2.0s | 1.5s | 1.4s | 0.7s | 0.6s |
| 7 | [pdfplumber ](https://pypi.org/project/pdfplumber/) | 9.5s | 60.9s | 16.6s | 17.0s | 10.7s | 3.1s | 5.3s | 2.6s | 2.5s | 2.3s | 3.8s | 2.5s | 2.7s | 1.4s | 1.3s |
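Timings like those above could be collected with a small harness along the following lines. This is a sketch under stated assumptions, not the benchmark's actual code: the README does not say how many runs were taken or how they were aggregated, so the best-of-`repeats` strategy and the helper names `time_call` and `benchmark` are illustrative. Only the pypdf extractor is shown, and it is imported lazily so the harness itself runs without pypdf installed.

```python
import time
from statistics import mean


def extract_with_pypdf(path):
    """Extract the text of every page with pypdf."""
    from pypdf import PdfReader  # lazy import: only needed when this is called

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def time_call(func, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(extractors, paths, repeats=3):
    """Map library name -> (average seconds, list of per-document seconds)."""
    results = {}
    for name, func in extractors.items():
        per_doc = [time_call(func, path, repeats=repeats) for path in paths]
        results[name] = (mean(per_doc), per_doc)
    return results
```

A call such as `benchmark({"pypdf": extract_with_pypdf}, ["2201.00214.pdf"])` would time one library on one locally downloaded document. The local file name is hypothetical.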
## Image Extraction Speed
| # | Library | Average | [ 1 ](https://arxiv.org/pdf/2201.00214.pdf) | [ 2 ](https://github.com/py-pdf/sample-files/raw/main/009-pdflatex-geotopo/GeoTopo.pdf) | [ 3 ](https://arxiv.org/pdf/2201.00151.pdf) | [ 4 ](https://arxiv.org/pdf/1707.09725.pdf) | [ 5 ](https://arxiv.org/pdf/2201.00021.pdf) | [ 6 ](https://arxiv.org/pdf/2201.00037.pdf) | [ 7 ](https://arxiv.org/pdf/2201.00069.pdf) | [ 8 ](https://arxiv.org/pdf/2201.00178.pdf) | [ 9 ](https://arxiv.org/pdf/2201.00201.pdf) | [ 10 ](https://arxiv.org/pdf/1602.06541.pdf) | [ 11 ](https://arxiv.org/pdf/2201.00200.pdf) | [ 12 ](https://arxiv.org/pdf/2201.00022.pdf) | [ 13 ](https://arxiv.org/pdf/2201.00029.pdf) | [ 14 ](https://arxiv.org/pdf/1601.03642.pdf) |
| :- | :-------------------------------------------------------- | :------ | :---------------------------------------------- | :------------------------------------------------------------------------------------------ | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- |
| 1 | [PyMuPDF ](https://pypi.org/project/PyMuPDF/) | 0.5s | 0.3s | 0.5s | 0.0s | 1.6s | 0.4s | 0.0s | 2.9s | 0.4s | 0.4s | 0.1s | 0.0s | 0.3s | 0.2s | 0.0s |
| 2 | [pypdfium2 ](https://pypi.org/project/pypdfium2/) | 1.1s | 1.2s | 1.8s | 0.0s | 3.3s | 0.9s | 0.2s | 5.1s | 0.7s | 0.6s | 0.4s | 0.0s | 0.5s | 0.2s | 0.0s |
| 3 | [pypdf ](https://pypi.org/project/pypdf/) | 4.2s | 21.6s | 6.1s | 5.7s | 11.8s | 1.3s | 0.6s | 6.5s | 1.2s | 1.2s | 0.8s | 0.2s | 0.9s | 0.2s | 0.2s |
| 4 | [pdfminer.six ](https://pypi.org/project/pdfminer.six/) | 7.4s | 43.9s | 17.5s | 12.7s | 15.4s | 1.6s | 2.5s | 1.6s | 1.5s | 1.0s | 1.8s | 1.2s | 1.3s | 0.7s | 0.5s |
## Watermarking Speed
| # | Library | Average | [ 1 ](https://arxiv.org/pdf/2201.00214.pdf) | [ 2 ](https://github.com/py-pdf/sample-files/raw/main/009-pdflatex-geotopo/GeoTopo.pdf) | [ 3 ](https://arxiv.org/pdf/2201.00151.pdf) | [ 4 ](https://arxiv.org/pdf/1707.09725.pdf) | [ 5 ](https://arxiv.org/pdf/2201.00021.pdf) | [ 6 ](https://arxiv.org/pdf/2201.00037.pdf) | [ 7 ](https://arxiv.org/pdf/2201.00069.pdf) | [ 8 ](https://arxiv.org/pdf/2201.00178.pdf) | [ 9 ](https://arxiv.org/pdf/2201.00201.pdf) | [ 10 ](https://arxiv.org/pdf/1602.06541.pdf) | [ 11 ](https://arxiv.org/pdf/2201.00200.pdf) | [ 12 ](https://arxiv.org/pdf/2201.00022.pdf) | [ 13 ](https://arxiv.org/pdf/2201.00029.pdf) | [ 14 ](https://arxiv.org/pdf/1601.03642.pdf) |
| :- | :--------------------------------------------------- | :------ | :---------------------------------------------- | :------------------------------------------------------------------------------------------ | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- |
| 1 | [pdfrw ](https://pypi.org/project/pdfrw/) | 0.1s | 0.1s | 0.5s | 0.0s | 0.3s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s |
| 2 | [PyMuPDF ](https://pypi.org/project/PyMuPDF/) | 0.2s | 0.4s | 0.6s | 0.2s | 0.4s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s |
| 3 | [pypdf ](https://pypi.org/project/pypdf/) | 0.5s | 0.6s | 2.0s | 0.4s | 1.1s | 0.2s | 0.3s | 0.3s | 0.3s | 0.2s | 0.3s | 0.1s | 0.6s | 0.1s | 0.1s |
## Watermarking File Size
| # | Library | Average | [ 1 ](https://arxiv.org/pdf/2201.00214.pdf) | [ 2 ](https://github.com/py-pdf/sample-files/raw/main/009-pdflatex-geotopo/GeoTopo.pdf) | [ 3 ](https://arxiv.org/pdf/2201.00151.pdf) | [ 4 ](https://arxiv.org/pdf/1707.09725.pdf) | [ 5 ](https://arxiv.org/pdf/2201.00021.pdf) | [ 6 ](https://arxiv.org/pdf/2201.00037.pdf) | [ 7 ](https://arxiv.org/pdf/2201.00069.pdf) | [ 8 ](https://arxiv.org/pdf/2201.00178.pdf) | [ 9 ](https://arxiv.org/pdf/2201.00201.pdf) | [ 10 ](https://arxiv.org/pdf/1602.06541.pdf) | [ 11 ](https://arxiv.org/pdf/2201.00200.pdf) | [ 12 ](https://arxiv.org/pdf/2201.00022.pdf) | [ 13 ](https://arxiv.org/pdf/2201.00029.pdf) | [ 14 ](https://arxiv.org/pdf/1601.03642.pdf) |
| :- | :--------------------------------------------------- | :------ | :---------------------------------------------- | :------------------------------------------------------------------------------------------ | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- |
| 1 | [pypdf ](https://pypi.org/project/pypdf/) | 3.4MB | 2.5MB | 5.6MB | 1.6MB | 7.2MB | 2.7MB | 3.1MB | 15.4MB | 2.4MB | 1.3MB | 3.0MB | 0.3MB | 1.2MB | 0.8MB | 1.0MB |
| 2 | [pdfrw ](https://pypi.org/project/pdfrw/) | 3.5MB | 2.5MB | 5.7MB | 1.6MB | 7.3MB | 2.7MB | 3.1MB | 15.4MB | 2.4MB | 1.3MB | 3.0MB | 0.3MB | 1.2MB | 0.8MB | 1.0MB |
| 3 | [PyMuPDF ](https://pypi.org/project/PyMuPDF/) | 3.7MB | 2.7MB | 6.9MB | 1.7MB | 8.5MB | 2.8MB | 3.4MB | 15.5MB | 2.5MB | 1.4MB | 3.2MB | 0.3MB | 1.3MB | 0.9MB | 1.1MB |
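The input table reports sizes in MiB while this table uses MB, so the two are not directly comparable. The sketch below converts between them and estimates how much a watermarked file grew. It assumes the MB values here are decimal megabytes (10^6 bytes), which the README does not state. Both tables are rounded to one decimal place, so the result is only approximate.

```python
MIB = 1024 ** 2  # bytes in one mebibyte
MB = 1000 ** 2   # bytes in one decimal megabyte (assumed unit of this table)


def mib_to_mb(size_mib):
    """Convert a size in MiB to decimal MB."""
    return size_mib * MIB / MB


def growth_percent(input_mib, output_mb):
    """Relative size increase of the output over the input, in percent."""
    input_mb = mib_to_mb(input_mib)
    return (output_mb - input_mb) / input_mb * 100


if __name__ == "__main__":
    # Document 2 (GeoTopo-book): 5.1MiB input, 6.9MB after PyMuPDF watermarking.
    print(f"{growth_percent(5.1, 6.9):.0f}% larger")
```

With the rounded values from the tables, document 2 comes out roughly 29% larger after watermarking with PyMuPDF.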
## Text Extraction Quality
| # | Library | Average | [ 1 ](https://arxiv.org/pdf/2201.00214.pdf) | [ 2 ](https://github.com/py-pdf/sample-files/raw/main/009-pdflatex-geotopo/GeoTopo.pdf) | [ 3 ](https://arxiv.org/pdf/2201.00151.pdf) | [ 4 ](https://arxiv.org/pdf/1707.09725.pdf) | [ 5 ](https://arxiv.org/pdf/2201.00021.pdf) | [ 6 ](https://arxiv.org/pdf/2201.00037.pdf) | [ 7 ](https://arxiv.org/pdf/2201.00069.pdf) | [ 8 ](https://arxiv.org/pdf/2201.00178.pdf) | [ 9 ](https://arxiv.org/pdf/2201.00201.pdf) | [ 10 ](https://arxiv.org/pdf/1602.06541.pdf) | [ 11 ](https://arxiv.org/pdf/2201.00200.pdf) | [ 12 ](https://arxiv.org/pdf/2201.00022.pdf) | [ 13 ](https://arxiv.org/pdf/2201.00029.pdf) | [ 14 ](https://arxiv.org/pdf/1601.03642.pdf) |
| :- | :-------------------------------------------------------- | :------ | :---------------------------------------------- | :------------------------------------------------------------------------------------------ | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- | :---------------------------------------------- |
| 1 | [pypdfium2 ](https://pypi.org/project/pypdfium2/) | 97% | 99% | 97% | 94% | 99% | 98% | 96% | 99% | 99% | 99% | 99% | 98% | 78% | 99% | 99% |
| 2 | [pypdf ](https://pypi.org/project/pypdf/) | 96% | 99% | 95% | 93% | 98% | 99% | 96% | 97% | 99% | 99% | 99% | 99% | 78% | 100% | 99% |
| 3 | [PyMuPDF ](https://pypi.org/project/PyMuPDF/) | 96% | 98% | 96% | 93% | 97% | 98% | 95% | 99% | 98% | 98% | 98% | 97% | 77% | 98% | 99% |
| 4 | [Tika ](https://pypi.org/project/tika/) | 95% | 99% | 98% | 92% | 97% | 98% | 96% | 93% | 97% | 98% | 93% | 98% | 73% | 98% | 96% |
| 5 | [pdftotext ](https://poppler.freedesktop.org/) | 91% | 96% | 93% | 91% | 94% | 92% | 96% | 96% | 96% | 97% | 83% | 94% | 77% | 96% | 79% |
| 6 | [pdfminer.six ](https://pypi.org/project/pdfminer.six/) | 89% | 95% | 79% | 86% | 92% | 86% | 93% | 95% | 93% | 92% | 92% | 93% | 71% | 98% | 86% |
| 7 | [pdfplumber ](https://pypi.org/project/pdfplumber/) | 75% | 94% | 84% | 68% | 97% | 61% | 93% | 61% | 89% | 57% | 59% | 67% | 58% | 98% | 67% |
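The README does not state how the quality percentages are computed. One plausible way to score extracted text against a ground-truth transcription is a character-level similarity ratio. The sketch below uses Python's `difflib` for that. The whitespace normalization and the function names are assumptions made for illustration, not the benchmark's actual metric.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Collapse every run of whitespace into a single space."""
    return " ".join(text.split())


def similarity_percent(extracted, reference):
    """Character-level similarity between two texts, from 0 to 100."""
    matcher = SequenceMatcher(
        None, normalize(extracted), normalize(reference), autojunk=False
    )
    return 100 * matcher.ratio()


if __name__ == "__main__":
    reference = "Benchmarking PDF libraries"
    extracted = "Benchmarking  PDF\nlibrares"  # one dropped letter, odd spacing
    print(f"{similarity_percent(extracted, reference):.1f}%")
```

`SequenceMatcher` can take quadratic time, so whole-document comparisons may be slow. A dedicated edit-distance library might be a better fit for long texts.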