https://github.com/merrvve/pdf-image-extract
Command-line tool to extract and save images (JPEG, PNG) from a PDF file or all PDFs in a directory based on the specific byte signatures.
- Host: GitHub
- URL: https://github.com/merrvve/pdf-image-extract
- Owner: merrvve
- License: MIT
- Created: 2024-08-23T12:01:20.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-25T20:12:44.000Z (almost 2 years ago)
- Last Synced: 2025-03-14T14:27:55.837Z (about 1 year ago)
- Topics: command-line-tool, pdf-extractor, pdf-image-extractor, python
- Language: Python
- Homepage:
- Size: 4.14 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# PDF IMAGE EXTRACTOR (JPEG AND PNG)
This script extracts and saves JPEG and PNG images embedded within PDF files.
The script opens each PDF in binary mode, searches the raw bytes for embedded JPEG and PNG images
by their characteristic byte signatures, and saves each detected image to a separate file.
Images are written to an output directory named after the input PDF file,
inside the `results` folder.
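
As a rough illustration of the signature-based approach described above, the following sketch scans raw bytes for the standard JPEG and PNG start and end markers. It is not taken from `bin/main.py`: the function names (`find_images`, `extract_images`), the `image_N` file naming, and the choice to pair each start marker with the next end marker are assumptions made for this example.

```python
from pathlib import Path

# Standard byte signatures: JPEG start/end-of-image markers, and the
# PNG file signature plus the closing IEND chunk (type + CRC).
JPEG_START = b"\xff\xd8\xff"
JPEG_END = b"\xff\xd9"
PNG_START = b"\x89PNG\r\n\x1a\n"
PNG_END = b"IEND\xaeB`\x82"


def find_images(data: bytes):
    """Yield (extension, image_bytes) for each start/end signature pair.

    Assumption for this sketch: each start marker is paired with the
    next end marker that follows it.
    """
    signatures = (("jpg", JPEG_START, JPEG_END), ("png", PNG_START, PNG_END))
    for ext, start_sig, end_sig in signatures:
        pos = 0
        while True:
            start = data.find(start_sig, pos)
            if start == -1:
                break
            end = data.find(end_sig, start + len(start_sig))
            if end == -1:
                break
            end += len(end_sig)  # include the end marker in the image
            yield ext, data[start:end]
            pos = end


def extract_images(pdf_path, results_root="results"):
    """Save every detected image under <results_root>/<pdf name>/."""
    pdf_path = Path(pdf_path)
    out_dir = Path(results_root) / pdf_path.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    data = pdf_path.read_bytes()  # read the whole PDF as raw bytes
    saved = []
    for index, (ext, blob) in enumerate(find_images(data), start=1):
        target = out_dir / f"image_{index}.{ext}"  # hypothetical naming scheme
        target.write_bytes(blob)
        saved.append(target)
    return saved
```

Because this kind of matching looks only at byte patterns, a JPEG that itself contains an embedded thumbnail may be cut at the thumbnail's end marker; a real implementation might need extra checks for such cases.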
## Usage:
python3 bin/main.py input_file.pdf
python3 bin/main.py path/to/input/files
## Arguments:
`input_file.pdf` or `path/to/input/files`: Path to a single PDF file, or to a directory containing PDF files, from which images will be extracted.
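
One way a script could accept either a single file or a directory is sketched below. This is an illustration only, not the logic in `bin/main.py`: the helper name `collect_pdfs`, the case-insensitive `.pdf` suffix check, and the non-recursive directory listing are all assumptions.

```python
from pathlib import Path


def collect_pdfs(input_path):
    """Return the PDF files named by a file path or a directory path."""
    path = Path(input_path)
    if path.is_file():
        # A single file: accept it only if it looks like a PDF.
        return [path] if path.suffix.lower() == ".pdf" else []
    if path.is_dir():
        # A directory: take its PDF files (top level only, sorted by name).
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
    raise FileNotFoundError(f"No such file or directory: {input_path}")
```

Each returned path could then be processed in turn, so both forms of the command shown under Usage follow the same code path.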