{"id":19003892,"url":"https://github.com/py-pdf/pdf","last_synced_at":"2025-04-22T18:19:44.262Z","repository":{"id":42446049,"uuid":"477843783","full_name":"py-pdf/pdf","owner":"py-pdf","description":"A modern pure-Python library for reading PDF files","archived":false,"fork":false,"pushed_at":"2022-04-05T19:29:48.000Z","size":4969,"stargazers_count":11,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-10T18:10:40.350Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/py-pdf.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-04T19:26:24.000Z","updated_at":"2024-08-20T06:18:24.000Z","dependencies_parsed_at":"2022-09-26T17:00:52.603Z","dependency_job_id":null,"html_url":"https://github.com/py-pdf/pdf","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/py-pdf%2Fpdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/py-pdf%2Fpdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/py-pdf%2Fpdf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/py-pdf%2Fpdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/py-pdf","download_url":"https://codeload.github.com/py-pdf/pdf/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250296438,"owners_count":21407037,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T18:20:39.295Z","updated_at":"2025-04-22T18:19:44.211Z","avatar_url":"https://github.com/py-pdf.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PyPI version](https://badge.fury.io/py/pdffile.svg)](https://badge.fury.io/py/pdffile)\n[![Code](https://img.shields.io/badge/code-GitHub-brightgreen)](https://github.com/py-pdf/pdf)\n[![Actions Status](https://github.com/py-pdf/pdf/workflows/Unit%20Tests/badge.svg)](https://github.com/py-pdf/pdf/actions)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\n# pdf\nA modern pure-Python library for reading PDF files.\n\nThe goal is a modern interface for handling PDF files that is internally consistent\nand follows idiomatic Python syntax.\n\nThe library should be pure Python (hence no C extensions) but allow the backend to\nbe swapped, similar in concept to [matplotlib backends](https://matplotlib.org/2.0.2/faq/usage_faq.html#what-is-a-backend) and [Keras backends](https://faroit.com/keras-docs/1.2.0/backend/).\n\nThe default backend could be PyPDF2.\n\nOther possible backends include [PyMuPDF](https://pymupdf.readthedocs.io/en/latest/)\n(using [MuPDF](https://mupdf.com/))\nand [PikePDF](https://github.com/pikepdf/pikepdf) (using [QPDF](https://github.com/qpdf/qpdf)).\n\n\u003e **WARNING**: This library is UNSTABLE at the moment! Expect many changes!\n\n## Installation\n\n```bash\npip install pdffile\n```\n\nNote that the package is installed as `pdffile` but imported as `pdf`.\n\n## Usage\n\n### Retrieve Metadata\n\n```pycon\n\u003e\u003e\u003e import pdf\n\n\u003e\u003e\u003e doc = pdf.PdfFile(\"001-trivial/minimal-document.pdf\")\n\u003e\u003e\u003e len(doc)\n1\n\n\u003e\u003e\u003e doc.metadata\nMetadata(\n    title=None,\n    producer='pdfTeX-1.40.23',\n    creator='TeX',\n    creation_date=datetime.datetime(2022, 4, 3, 18, 5, 42),\n    modification_date=datetime.datetime(2022, 4, 3, 18, 5, 42),\n    other={\n         '/CreationDate': \"D:20220403180542+02'00'\",\n         '/ModDate': \"D:20220403180542+02'00'\",\n         '/Trapped': '/False',\n         '/PTEX.Fullbanner': 'This is pdfTeX, V...'})\n\n```\n\n### Encrypted PDFs\n\nTo open an encrypted PDF, pass its password:\n\n```python\nimport pdf\ndoc = pdf.PdfFile(pdf_path, password=password)\n```\n\nAll other operations then work exactly as described in this section.\n\n### Get Outline\n\n```pycon\n\u003e\u003e\u003e import pdf\n\u003e\u003e\u003e doc = pdf.PdfFile(pdf_path, password=password)\n\u003e\u003e\u003e doc.outline\n[\n    Links(page=5, text='1 Header'),\n    Links(page=5, text='1.1 A section'),\n    Links(page=9, text='2 Foobar'),\n    Links(page=108, text='References')\n]\n```\n\n### Extract Text\n\nIndex the document to get a page, then read its `text` attribute:\n\n```pycon\n\u003e\u003e\u003e import pdf\n\u003e\u003e\u003e doc = pdf.PdfFile(\"001-trivial/minimal-document.pdf\")\n\u003e\u003e\u003e doc[0]\n\u003cpdf.PdfPage object at 0x7f72d2b04100\u003e\n\u003e\u003e\u003e doc[0].text\n'Loremipsumdolorsitamet,consetetursadipscingelitr,seddiamnonumyeirmod\\ntemporinviduntutlaboreetdoloremagnaaliquyamerat,seddiamvoluptua.Atvero\\neosetaccusametjustoduodoloresetearebum.Stetclitakasdgubergren,noseataki-\\nmatasanctusestLoremipsumdolorsitamet.Loremipsumdolorsitamet,consetetur\\nsadipscingelitr,seddiamnonumyeirmodtemporinviduntutlaboreetdoloremagna\\naliquyamerat,seddiamvoluptua.Atveroeosetaccusametjustoduodoloresetea\\nrebum.Stetclitakasdgubergren,noseatakimatasanctusestLoremipsumdolorsit\\namet.\\n1\\n'\n```\n\nTo get the text of all pages at once, use `doc.text` instead.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpy-pdf%2Fpdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpy-pdf%2Fpdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpy-pdf%2Fpdf/lists"}