https://github.com/pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps

Host: GitHub
URL: https://github.com/pymupdf/PyMuPDF
Owner: pymupdf
License: agpl-3.0
Created: 2012-10-06T18:54:25.000Z (almost 14 years ago)
Default Branch: main
Last Pushed: 2025-02-05T15:53:32.000Z (over 1 year ago)
Last Synced: 2025-02-06T09:07:10.791Z (over 1 year ago)
Topics: data-science, epub, extract-data, font, mupdf, ocr, pdf, pdf-documents, pymupdf, python, table-extraction, tesseract, text-processing, text-shaping, xps
Language: Python
Homepage: https://pymupdf.readthedocs.io
Size: 322 MB
Stars: 6,368
Watchers: 66
Forks: 566
Open Issues: 41
Metadata Files:
- Readme: README.md
- Changelog: changes.txt
- License: COPYING
- Support: docs/supported-files-table.rst

Awesome Lists containing this project

awesome-scrapers - PyMuPDF
awesome-pdf - PyMuPDF
StarryDivineSky - pymupdf/PyMuPDF
awesome-python-fa - **PyMuPDF** - A fast and efficient library for processing PDF files and other document formats. PyMuPDF lets you easily extract text, images, and metadata from PDF files and edit PDF files. (📚 List / Working with PDF)
awesome - pymupdf/PyMuPDF - PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. (Python)
awesome-pdf - PyMuPDF - Python bindings to MuPDF; tagged extract, manipulate, render. (Multi-Purpose Libraries)
awesome-data-analysis - PyMuPDF - Advanced PDF manipulation library. (📦 Additional Python Libraries / Documentation & File Processing)

README

# PyMuPDF

**PyMuPDF** is a high performance **Python** library for data extraction, analysis, conversion & manipulation of [PDF (and other) documents](https://pymupdf.readthedocs.io/en/latest/the-basics.html#supported-file-types).

# Community

Join us on **Discord** here: [#pymupdf](https://discord.gg/TSpYGBW4eq)

# Installation

**PyMuPDF** requires **Python 3.9 or later**. Install it using **pip**:

`pip install PyMuPDF`

There are **no mandatory** external dependencies. However, some [optional features](#optional-features) become available only if additional packages are installed.

You can also try without installing by visiting [PyMuPDF.io](https://pymupdf.io/#examples).
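To confirm that an environment meets the requirement above before or after running pip, a small check along these lines may help. It is only a sketch: the helper names `python_is_supported` and `pymupdf_is_installed` are made up for illustration and are not part of PyMuPDF.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum Python version stated in this README


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version is 3.9 or later."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def pymupdf_is_installed() -> bool:
    """Return True if the pymupdf package can be found, without importing it."""
    return importlib.util.find_spec("pymupdf") is not None


print("Python version supported:", python_is_supported())
print("PyMuPDF installed:", pymupdf_is_installed())
```

If the second line reports `False`, running `pip install PyMuPDF` in the same environment should resolve it.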

# Usage

Basic usage is as follows:

```python
import pymupdf  # import the PyMuPDF library

doc = pymupdf.open("example.pdf")  # open a document
for page in doc:  # iterate over the document's pages
    text = page.get_text()  # get the page's plain text (a Python str)
```
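The loop above discards each page's text as soon as the next page is read. As a hedged sketch of one way to keep it, the following collects the text of all pages and writes it to a file. The function names `join_page_texts` and `pdf_to_text` and the form-feed separator are illustrative choices, not part of the PyMuPDF API; only `pymupdf.open`, `page_count`, page iteration, and `get_text()` come from the library.

```python
from typing import Iterable

PAGE_BREAK = "\f"  # form feed, a common page separator in plain text (an assumption)


def join_page_texts(pages: Iterable) -> str:
    """Concatenate the plain text of each page, separated by form feeds.

    Each item only needs a get_text() method, which PyMuPDF pages provide.
    """
    return PAGE_BREAK.join(page.get_text() for page in pages)


def pdf_to_text(pdf_path: str, txt_path: str) -> int:
    """Write the text of every page in pdf_path to txt_path; return the page count."""
    import pymupdf  # imported here so join_page_texts works without PyMuPDF

    with pymupdf.open(pdf_path) as doc:  # the document is closed on exit
        page_count = doc.page_count
        text = join_page_texts(doc)
    with open(txt_path, "w", encoding="utf-8") as out:
        out.write(text)
    return page_count
```

For example, `pdf_to_text("example.pdf", "example.txt")` would write the extracted text next to the input file, assuming `example.pdf` exists.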

# Documentation

Full documentation can be found on [pymupdf.readthedocs.io](https://pymupdf.readthedocs.io).

# Optional Features

* [fontTools](https://pypi.org/project/fonttools/) for creating font subsets.

* [pymupdf-fonts](https://pypi.org/project/pymupdf-fonts/) contains some nice fonts for your text output.

* [Tesseract-OCR](https://github.com/tesseract-ocr/tesseract) for optical character recognition in images and document pages.
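To see which of the optional packages listed above are present, a check like the following may be useful. It is a sketch under assumptions: `optional_feature_status` is a hypothetical helper, the module names `fontTools` and `pymupdf_fonts` are the import names of the two PyPI packages, and the Tesseract test (an executable on the PATH or a `TESSDATA_PREFIX` environment variable) is only a heuristic, not how PyMuPDF itself decides whether OCR is available.

```python
import importlib.util
import os
import shutil


def optional_feature_status() -> dict:
    """Report, per optional feature, whether its prerequisite appears to be present."""
    return {
        # fonttools installs the importable package "fontTools"
        "fontTools": importlib.util.find_spec("fontTools") is not None,
        # pymupdf-fonts installs the importable package "pymupdf_fonts"
        "pymupdf-fonts": importlib.util.find_spec("pymupdf_fonts") is not None,
        # heuristic: a tesseract executable or a configured tessdata location
        "Tesseract-OCR": (
            shutil.which("tesseract") is not None
            or "TESSDATA_PREFIX" in os.environ
        ),
    }


for name, available in optional_feature_status().items():
    print(f"{name}: {'found' if available else 'not found'}")
```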

# About

**PyMuPDF** adds **Python** bindings and abstractions to [MuPDF](https://mupdf.com/), a lightweight **PDF**, **XPS**, and **eBook** viewer, renderer, and toolkit. Both **PyMuPDF** and **MuPDF** are maintained and developed by [Artifex Software, Inc](https://artifex.com).

**PyMuPDF** was originally written by [Jorj X. McKie](mailto:jorj.x.mckie@outlook.de).

# License and Copyright

**PyMuPDF** is available under [open-source AGPL](https://www.gnu.org/licenses/agpl-3.0.html) and commercial license agreements. If you determine you cannot meet the requirements of the **AGPL**, please contact [Artifex](https://artifex.com/contact/pymupdf-inquiry.php) for more information regarding a commercial license.
