{"id":20069766,"url":"https://github.com/lebedov/python-pdfbox","last_synced_at":"2025-04-09T19:17:58.045Z","repository":{"id":48480867,"uuid":"110066820","full_name":"lebedov/python-pdfbox","owner":"lebedov","description":"Python interface to Apache PDFBox command-line tools.","archived":false,"fork":false,"pushed_at":"2023-01-24T13:51:20.000Z","size":109,"stargazers_count":75,"open_issues_count":10,"forks_count":24,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-09T19:17:52.716Z","etag":null,"topics":["pdf","pdfbox","python","python3"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lebedov.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.rst","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-09T04:22:04.000Z","updated_at":"2024-10-11T08:34:02.000Z","dependencies_parsed_at":"2023-02-13T22:01:07.697Z","dependency_job_id":null,"html_url":"https://github.com/lebedov/python-pdfbox","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lebedov%2Fpython-pdfbox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lebedov%2Fpython-pdfbox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lebedov%2Fpython-pdfbox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lebedov%2Fpython-pdfbox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lebedov","download_url":"https://codeload.github.com/lebedov/python-pdfbox/tar.gz/refs/heads/master","host":{"name":"GitHub
","url":"https://github.com","kind":"github","repositories_count":248094988,"owners_count":21046770,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pdf","pdfbox","python","python3"],"created_at":"2024-11-13T14:16:08.958Z","updated_at":"2025-04-09T19:17:58.020Z","avatar_url":"https://github.com/lebedov.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":".. -*- rst -*-\n\npython-pdfbox\n=============\n\nPackage Description\n-------------------\nProvides a simple Python 3 interface to the \n`Apache PDFBox \u003chttps://pdfbox.apache.org/\u003e`_\ncommand-line tools.\n\n.. image:: https://img.shields.io/pypi/v/python-pdfbox.svg\n    :target: https://pypi.python.org/pypi/python-pdfbox\n    :alt: Latest Version\n          \nRequirements\n------------\nAside from Python 3 and those packages specified in\n`setup.py \u003chttps://github.com/lebedov/python-pdfbox/blob/master/setup.py\u003e`_,\npython-pdfbox requires ``java`` to be present in the system path.\n\nSome users have reported `issues on\nMacOS \u003chttps://github.com/lebedov/python-pdfbox/issues/14\u003e`_ with certain\nversions of Java. If you encounter such issues, try a recent release of OpenJDK\n(14 or later).\n\nInstallation\n------------\nThe package may be installed as follows: ::\n\n    pip install python-pdfbox\n\nOne may specify the location of the PDFBox jar file via the ``PDFBOX``\nenvironmental variable. 
If not set, python-pdfbox looks for the jar file\nin the platform-specific user cache directory; if no jar is found there, it\nautomatically downloads the latest available version below 3.0.0 and caches it.\n\nUsage\n-----\nThe interface currently exposes only a few PDFBox features (text extraction, conversion to images, and extraction\nof images): ::\n\n    import pdfbox\n    p = pdfbox.PDFBox()\n    p.extract_text('/path/to/my_file.pdf')   # writes text to /path/to/my_file.txt\n    p.pdf_to_images('/path/to/my_file.pdf')  # writes images to /path/to/my_file1.jpg, /path/to/my_file2.jpg, etc.\n    p.extract_images('/path/to/my_file.pdf') # writes images to /path/to/my_file-1.png, /path/to/my_file-2.png, etc.\n\nNotes\n-----\nOwing to a change in the PDFBox command-line interface, python-pdfbox cannot \ncurrently use PDFBox 3.0.0.\n\nDevelopment\n-----------\nThe latest release of the package may be obtained from\n`GitHub \u003chttps://github.com/lebedov/python-pdfbox\u003e`_.\n\nAuthor\n------\nSee the included `AUTHORS.rst \n\u003chttps://github.com/lebedov/python-pdfbox/blob/master/AUTHORS.rst\u003e`_ file for more \ninformation.\n\nLicense\n-------\nThis software is licensed under the\n`Apache 2.0 License \u003chttps://opensource.org/licenses/Apache-2.0\u003e`_.\nSee the included `LICENSE.rst \n\u003chttps://github.com/lebedov/python-pdfbox/blob/master/LICENSE.rst\u003e`_ file for more \ninformation.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flebedov%2Fpython-pdfbox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flebedov%2Fpython-pdfbox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flebedov%2Fpython-pdfbox/lists"}