{"id":15028970,"url":"https://github.com/belval/pdf2image","last_synced_at":"2025-05-14T00:07:19.483Z","repository":{"id":37390130,"uuid":"92681034","full_name":"Belval/pdf2image","owner":"Belval","description":"A python module that wraps the pdftoppm utility to convert PDF to PIL Image object","archived":false,"fork":false,"pushed_at":"2024-07-23T13:52:58.000Z","size":4816,"stargazers_count":1779,"open_issues_count":81,"forks_count":202,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-05-11T13:05:55.158Z","etag":null,"topics":["convert","pdf","pil","pil-image","poppler"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Belval.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"Belval"}},"created_at":"2017-05-28T19:00:59.000Z","updated_at":"2025-05-08T15:19:43.000Z","dependencies_parsed_at":"2024-11-19T05:37:00.822Z","dependency_job_id":null,"html_url":"https://github.com/Belval/pdf2image","commit_stats":{"total_commits":184,"total_committers":28,"mean_commits":6.571428571428571,"dds":0.375,"last_synced_commit":"1915dbd429cddfcd7d1f488526f9d5830d5698c6"},"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Belval%2Fpdf2image","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Belval%2Fpdf2image/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Belval%2Fpdf2image/releases","manifests
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Belval%2Fpdf2image/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Belval","download_url":"https://codeload.github.com/Belval/pdf2image/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254043736,"owners_count":22005007,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["convert","pdf","pil","pil-image","poppler"],"created_at":"2024-09-24T20:09:27.228Z","updated_at":"2025-05-14T00:07:19.461Z","avatar_url":"https://github.com/Belval.png","language":"Python","funding_links":["https://github.com/sponsors/Belval"],"categories":[],"sub_categories":[],"readme":"# pdf2image\n[![CircleCI](https://circleci.com/gh/Belval/pdf2image/tree/master.svg?style=svg)](https://circleci.com/gh/Belval/pdf2image/tree/master) [![PyPI version](https://badge.fury.io/py/pdf2image.svg)](https://badge.fury.io/py/pdf2image) [![codecov](https://codecov.io/gh/Belval/pdf2image/branch/master/graph/badge.svg)](https://codecov.io/gh/Belval/pdf2image) [![Downloads](https://pepy.tech/badge/pdf2image/month)](https://pepy.tech/project/pdf2image) [![GitHub CI](https://github.com/Belval/pdf2image/actions/workflows/documentation.yml/badge.svg)](https://belval.github.io/pdf2image)\n\nA python (3.7+) module that wraps pdftoppm and pdftocairo to convert PDF to a PIL Image object\n\n## How to install\n\n`pip install pdf2image`\n\n### Windows\n\nWindows users will have to build or download poppler for Windows. 
I recommend [@oschwartz10612 version](https://github.com/oschwartz10612/poppler-windows/releases/) which is the most up-to-date. You will then have to add the `bin/` folder to [PATH](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/) or use `poppler_path = r\"C:\\path\\to\\poppler-xx\\bin\"` as an argument in `convert_from_path`.\n\n### Mac\n\nMac users will have to install [poppler](https://poppler.freedesktop.org/).\n\nInstalling using [Brew](https://brew.sh/):\n\n```\nbrew install poppler\n```\n\n### Linux\n\nMost distros ship with `pdftoppm` and `pdftocairo`. If they are not installed, refer to your package manager to install `poppler-utils`\n\n### Platform-independent (Using `conda`)\n\n1. Install poppler: `conda install -c conda-forge poppler`\n2. Install pdf2image: `pip install pdf2image`\n\n## How does it work?\n\n\n```py\nfrom pdf2image import convert_from_path, convert_from_bytes\nfrom pdf2image.exceptions import (\n    PDFInfoNotInstalledError,\n    PDFPageCountError,\n    PDFSyntaxError\n)\n```\n\nThen simply do:\n\n```py\nimages = convert_from_path('/home/belval/example.pdf')\n```\n\nOR\n\n```py\nimages = convert_from_bytes(open('/home/belval/example.pdf', 'rb').read())\n```\n\nOR better yet\n\n```py\nimport tempfile\n\nwith tempfile.TemporaryDirectory() as path:\n    images_from_path = convert_from_path('/home/belval/example.pdf', output_folder=path)\n    # Do something here\n```\n\n`images` will be a list of PIL Image objects, one for each page of the PDF document.\n\nHere are the definitions:\n\n`convert_from_path(pdf_path, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', jpegopt=None, thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None, paths_only=False, use_pdftocairo=False, timeout=600, hide_attributes=False)`\n\n`convert_from_bytes(pdf_file, dpi=200, output_folder=None, 
first_page=None, last_page=None, fmt='ppm', jpegopt=None, thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None, paths_only=False, use_pdftocairo=False, timeout=600, hide_attributes=False)`\n\n## What's new?\n\n- Allow users to hide attributes when using pdftoppm with `hide_attributes` (Thank you @StaticRocket)\n- Fix console opening on Windows (Thank you @OhMyAgnes!)\n- Add `timeout` parameter which raises `PDFPopplerTimeoutError` after the given number of seconds.\n- Add `use_pdftocairo` parameter which forces `pdf2image` to use `pdftocairo`. Should improve performance.\n- Fixed a bug where using `pdf2image` with multiple threads (but not multiple processes) would cause an exception\n- `jpegopt` parameter allows for tuning of the output JPEG when using `fmt=\"jpeg\"` (`-jpegopt` in pdftoppm CLI) (Thank you @abieler)\n- `pdfinfo_from_path` and `pdfinfo_from_bytes` which expose the output of the pdfinfo CLI\n- `paths_only` parameter will return image paths instead of Image objects, to prevent OOM when converting a big PDF\n- `size` parameter allows you to define the shape of the resulting images (`-scale-to` in pdftoppm CLI)\n    - `size=400` will fit the image to a 400x400 box, preserving aspect ratio\n    - `size=(400, None)` will make the image 400 pixels wide, preserving aspect ratio\n    - `size=(500, 500)` will resize the image to 500x500 pixels, not preserving aspect ratio\n- `grayscale` parameter allows you to convert images to grayscale (`-gray` in pdftoppm CLI)\n- `single_file` parameter allows you to convert the first PDF page only, without adding digits at the end of the `output_file`\n- Allow the user to specify poppler's installation path with `poppler_path`\n\n## Performance tips\n\n- Using an output folder is significantly faster if you are using an SSD. 
Otherwise i/o usually becomes the bottleneck.\n- Using multiple threads can give you some gains but avoid more than 4 as this will cause i/o bottleneck (even on my NVMe SSD!).\n- If i/o is your bottleneck, using the JPEG format can lead to significant gains.\n- PNG format is pretty slow because of the compression.\n- If you want to know the best settings (most settings will be fine anyway) you can clone the project and run `python tests.py` to get timings.\n\n## Limitations / known issues\n\n- A relatively big PDF will use up all your memory and cause the process to be killed (unless you use an output folder)\n- Sometimes fails to read PDFs signed using DocuSign; see [Solution for DocuSign issue.](docs/installation.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbelval%2Fpdf2image","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbelval%2Fpdf2image","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbelval%2Fpdf2image/lists"}