https://github.com/robocorp/example-pdf-to-image
This robot converts PDF files to PNG images using Python.
- Host: GitHub
- URL: https://github.com/robocorp/example-pdf-to-image
- Owner: robocorp
- License: apache-2.0
- Created: 2021-05-20T10:13:24.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2024-01-02T17:21:02.000Z (over 2 years ago)
- Last Synced: 2025-05-13T00:47:03.238Z (about 1 year ago)
- Language: Python
- Size: 653 KB
- Stars: 2
- Watchers: 16
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Convert PDF files to PNG images
This robot converts [PDF](https://en.wikipedia.org/wiki/PDF) files to [PNG](https://en.wikipedia.org/wiki/Portable_Network_Graphics) images using Python. This is useful when you want to use [OCR (Optical Character Recognition)](https://en.wikipedia.org/wiki/Optical_character_recognition) and image recognition services to extract data from your documents.
There are two example PDF files:
- A single-page `example-invoice.pdf`
- A multipage `example-multipage.pdf`
The `pdf2image` library generates one image per PDF document page:
```bash
example-invoice.pdf-0.png
example-multipage.pdf-0.png
example-multipage.pdf-1.png
example-multipage.pdf-2.png
example-multipage.pdf-3.png
example-multipage.pdf-4.png
example-multipage.pdf-5.png
example-multipage.pdf-6.png
example-multipage.pdf-7.png
example-multipage.pdf-8.png
```
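The file names above follow the pattern `<PDF file name>-<zero-based page index>.png`. A minimal sketch of that naming rule, assuming a hypothetical helper `expected_image_names` that is not part of the robot or of `pdf2image`:

```python
def expected_image_names(pdf_name, page_count):
    """Return the PNG names expected for a PDF with `page_count` pages.

    Hypothetical helper for illustration only: it mirrors the naming
    pattern shown in the listing above and does not read any PDF.
    """
    return [f"{pdf_name}-{index}.png" for index in range(page_count)]


if __name__ == "__main__":
    for name in expected_image_names("example-multipage.pdf", 9):
        print(name)
```

Because the index starts at 0, a nine-page document ends with `-8.png`, which matches the listing.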
## Dependencies
`conda.yaml`:
```yaml
channels:
  - conda-forge
dependencies:
  - python=3.7.5
  - poppler=21.03.0
  - pdf2image=1.15.1
```
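`pdf2image` is a wrapper around the poppler command-line tools (`pdftoppm` and `pdftocairo`), which is why `poppler` is pinned alongside it. A small sketch for checking that poppler is reachable on `PATH` before converting; the helper name `poppler_available` is an assumption made for illustration:

```python
import shutil


def poppler_available():
    """Return True if poppler's `pdftoppm` executable is found on PATH."""
    return shutil.which("pdftoppm") is not None


if __name__ == "__main__":
    if poppler_available():
        print("poppler found")
    else:
        print("poppler missing: check that the conda environment is active")
```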
## The robot
`task.py`:
```py
from pdf2image import convert_from_path


def convert_pdf_to_images(pdf_path):
    # convert_from_path returns one PIL image per page of the PDF.
    images = convert_from_path(pdf_path)
    for index, image in enumerate(images):
        # Pages are numbered from 0; the output folder must already exist.
        image.save(f'output/{pdf_path}-{index}.png')


def task():
    convert_pdf_to_images('example-invoice.pdf')
    convert_pdf_to_images('example-multipage.pdf')


if __name__ == "__main__":
    task()
```
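The robot assumes that the `output` folder exists and that the PDF sits in the working directory. A possible variant that relaxes both assumptions is sketched below. The `output_dir`, `dpi`, and `converter` parameters are illustrative additions, not part of the original robot; `converter` defaults to `pdf2image.convert_from_path` and exists only so the function can be exercised with a stub when poppler is not installed.

```python
from pathlib import Path


def convert_pdf_to_images(pdf_path, output_dir="output", dpi=200, converter=None):
    """Save each page of `pdf_path` as a PNG in `output_dir`; return the paths.

    Sketch only: `output_dir`, `dpi`, and `converter` are assumptions added
    for illustration and do not appear in the original task.py.
    """
    if converter is None:
        # Imported lazily so this module loads even without pdf2image/poppler.
        from pdf2image import convert_from_path
        converter = convert_from_path

    out = Path(output_dir)
    # Create the folder up front so saving does not fail when it is missing.
    out.mkdir(parents=True, exist_ok=True)

    saved = []
    for index, image in enumerate(converter(pdf_path, dpi=dpi)):
        # Use only the file name so PDFs in subfolders do not produce nested paths.
        target = out / f"{Path(pdf_path).name}-{index}.png"
        image.save(target)
        saved.append(target)
    return saved
```

Called as `convert_pdf_to_images('example-multipage.pdf')`, this should write the same file names as the original robot, since the output folder and naming pattern are unchanged by default.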