https://github.com/merrvve/pdf-image-extract

Command-line tool to extract and save images (JPEG, PNG) from a PDF file or all PDFs in a directory based on the specific byte signatures.



# PDF IMAGE EXTRACTOR (JPEG AND PNG)

This script extracts and saves JPEG and PNG images embedded within PDF files.

The script reads each PDF file as raw bytes, searches for embedded JPEG and PNG images
by their characteristic byte signatures, and saves each detected image to a separate
file in an output directory. The output directory is named after the input PDF file
and is located in the `results` folder.
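The signature scan described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes images are carved from the first start marker to the next end marker (JPEG `FF D8 FF` to `FF D9`, PNG's 8-byte signature to the `IEND` chunk). The names `carve_images` and `save_images`, the `image_<n>.<ext>` file names, and using the PDF name without its extension for the folder are all hypothetical. One limitation of this simple approach: a JPEG that contains an embedded thumbnail would be cut off at the thumbnail's `FF D9`.

```python
from pathlib import Path

# Byte markers used for carving (standard JPEG/PNG values).
JPEG_START = b"\xff\xd8\xff"       # SOI marker followed by the next segment's 0xFF
JPEG_END = b"\xff\xd9"             # EOI marker
PNG_START = b"\x89PNG\r\n\x1a\n"   # 8-byte PNG file signature
PNG_END = b"IEND\xaeB`\x82"        # IEND chunk type plus its fixed CRC

SIGNATURES = (("jpg", JPEG_START, JPEG_END), ("png", PNG_START, PNG_END))


def carve_images(data: bytes) -> list[tuple[str, bytes]]:
    """Return (extension, image_bytes) pairs carved from raw PDF bytes."""
    found: list[tuple[str, bytes]] = []
    for ext, start, end in SIGNATURES:
        pos = data.find(start)
        while pos != -1:
            stop = data.find(end, pos + len(start))
            if stop == -1:
                break  # start marker with no matching end marker: stop scanning
            stop += len(end)
            found.append((ext, data[pos:stop]))
            pos = data.find(start, stop)  # resume scanning after this image
    return found


def save_images(pdf_path: Path, results_root: Path = Path("results")) -> int:
    """Write carved images to results/<pdf name>/ and return how many were saved."""
    out_dir = results_root / pdf_path.stem  # hypothetical naming: PDF name without extension
    out_dir.mkdir(parents=True, exist_ok=True)
    images = carve_images(pdf_path.read_bytes())
    for index, (ext, blob) in enumerate(images, start=1):
        (out_dir / f"image_{index}.{ext}").write_bytes(blob)  # hypothetical file names
    return len(images)
```

Carving on markers keeps the sketch dependency-free, but it only recovers images stored in the PDF as complete JPEG or PNG byte streams; it does not decode PDF objects or filters.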

## Usage:

```
python3 bin/main.py input_file.pdf
python3 bin/main.py path/to/input/files
```

## Arguments:
`input_file.pdf` or `path/to/input/files`: the path to a single PDF file, or to a directory whose PDF files will all be processed. Images are extracted from each PDF.
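How the single argument might be resolved into a list of PDFs is sketched below. This is an assumption about the behavior, not the contents of `bin/main.py`: the helper `collect_pdfs` is hypothetical, directories are scanned non-recursively, and `.pdf` is matched case-insensitively. The per-file extraction step is left as a placeholder that only prints the output folder under `results`.

```python
import sys
from pathlib import Path


def collect_pdfs(target: Path) -> list[Path]:
    """Return the PDF file(s) named by the command-line argument."""
    if target.is_dir():
        # Directory: every *.pdf directly inside it (non-recursive, any letter case).
        return sorted(p for p in target.iterdir()
                      if p.is_file() and p.suffix.lower() == ".pdf")
    if target.is_file() and target.suffix.lower() == ".pdf":
        return [target]
    raise ValueError(f"not a PDF file or directory: {target}")


def main(argv: list[str]) -> int:
    if len(argv) != 2:
        print("usage: python3 bin/main.py <input_file.pdf | path/to/input/files>",
              file=sys.stderr)
        return 2
    try:
        pdfs = collect_pdfs(Path(argv[1]))
    except ValueError as err:
        print(err, file=sys.stderr)
        return 1
    for pdf in pdfs:
        # Placeholder for extraction: show where this PDF's images would be saved.
        print(f"{pdf} -> {Path('results') / pdf.stem}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Returning distinct exit codes for bad usage (2) and a bad path (1) is a common CLI convention, chosen here for illustration.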