https://github.com/adlerweb/pdfqrsplit

Split PDF-files into separate ones based on separator pages / Barcodes

pdf python zxing

Host: GitHub
URL: https://github.com/adlerweb/pdfqrsplit
Owner: adlerweb
License: mit
Created: 2020-05-30T17:40:13.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2023-07-19T20:16:23.000Z (about 2 years ago)
Last Synced: 2025-05-13T10:16:13.315Z (5 months ago)
Topics: pdf, python, zxing
Language: Python
Homepage:
Size: 22.5 KB
Stars: 22
Watchers: 3
Forks: 8
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

# PdfQRSplit

*PdfQRSplit* is a small utility that splits a multi-page PDF document into separate PDF files at pages containing a specified barcode. Such a page is known as a "separator page" and is used with high-volume document scanners to scan a large number of unrelated documents in bulk.

Although the name says "*QR*", the tool also works with most other barcode types.
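
The separator-page idea can be sketched in a few lines of plain Python. This is only an illustration, not the tool's actual code: the function name `group_pages`, the `keep` modes and the decision to skip empty documents are assumptions made for the example, and barcode detection is replaced by a ready-made list of booleans.

```python
from typing import List


def group_pages(is_separator: List[bool], keep: str = "drop") -> List[List[int]]:
    """Group 0-based page indices into documents (hypothetical helper).

    keep="drop"     discards separator pages,
    keep="previous" keeps a separator as the last page of the document it closes,
    keep="next"     keeps a separator as the first page of the document it opens.
    """
    if keep not in ("drop", "previous", "next"):
        raise ValueError(f"unknown keep mode: {keep}")

    documents: List[List[int]] = []
    current: List[int] = []
    for page, separator in enumerate(is_separator):
        if not separator:
            current.append(page)
            continue
        if keep == "previous":
            current.append(page)
        # Assumption: a separator with no pages before it creates no empty document.
        if current:
            documents.append(current)
        current = [page] if keep == "next" else []
    # Pages after the last separator form the final document.
    if current:
        documents.append(current)
    return documents


# Eight pages; pages 2 and 6 (0-based) carry the separator barcode.
flags = [False, False, True, False, False, False, True, False]
print(group_pages(flags))
print(group_pages(flags, keep="previous"))
```

In the real tool the boolean for each page would come from decoding a barcode on the rendered page and comparing its content with the configured separator text.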

## Installation and requirements

Python 3 is required. You also need **zxing** (barcode recognition), **pypdf4** (PDF handling) and **pillow** (image handling); all of them can be installed using pip:

```
pip install zxing pypdf4 pillow
```
or
```
pip install -r requirements.txt
```

## Usage
```
usage: PdfQRSplit.py [-h] [-p PREFIX] [-s SEPARATOR] [-k] [--keep-page-next] [-b BRIGHTNESS] [-v] [-d] inputfile

Split PDF-file into separate files based on a separator barcode

positional arguments:
  inputfile             Filename or glob to process

optional arguments:
  -h, --help            show this help message and exit
  -p PREFIX, --prefix PREFIX
                        Prefix for generated PDF files. Default: split
  -s SEPARATOR, --separator SEPARATOR
                        Barcode content used to find separator pages. Default: ADAR-NEXTDOC
  -k, --keep-page       Keep separator page in previous document
  --keep-page-next      Keep separator page in next document
  -b BRIGHTNESS, --brightness BRIGHTNESS
                        brightness threshold for barcode preparation (0-255). Default: 128
  -v, --verbose         Show verbose processing messages
  -d, --debug           Show debug messages
```
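
For readers who want to script around the tool, the help text above corresponds roughly to the following `argparse` definition. This is a reconstruction from the help output only, not a copy of the script: the function name `build_parser` and the `type=int` on the brightness option are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the command-line interface from the help text (sketch)."""
    parser = argparse.ArgumentParser(
        prog="PdfQRSplit.py",
        description="Split PDF-file into separate files based on a separator barcode",
    )
    parser.add_argument("inputfile", help="Filename or glob to process")
    parser.add_argument("-p", "--prefix", default="split",
                        help="Prefix for generated PDF files. Default: split")
    parser.add_argument("-s", "--separator", default="ADAR-NEXTDOC",
                        help="Barcode content used to find separator pages. "
                             "Default: ADAR-NEXTDOC")
    parser.add_argument("-k", "--keep-page", action="store_true",
                        help="Keep separator page in previous document")
    parser.add_argument("--keep-page-next", action="store_true",
                        help="Keep separator page in next document")
    # Assumption: the threshold is parsed as an integer in the range 0-255.
    parser.add_argument("-b", "--brightness", type=int, default=128,
                        help="brightness threshold for barcode preparation "
                             "(0-255). Default: 128")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Show verbose processing messages")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="Show debug messages")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["input.pdf", "-s", "SPLITME", "-k", "-v"])
print(args.inputfile, args.separator, args.keep_page, args.prefix, args.brightness)
```

Options that are not given fall back to the defaults listed in the help text, so here the prefix stays `split` and the brightness threshold stays 128.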

### Example

The command below takes the file **input.pdf** and searches every page for a barcode containing the text *"SPLITME"*. Whenever one is found (or the end of the input file is reached), the pages collected so far are written to a separate file; because of `-k`, that file includes the page carrying the separator barcode. Since no prefix was given, the first file is named "*split_0_0.pdf*": *split* is the default prefix, the first 0 indicates that it was generated from the first (and in this case only) input file, and the second 0 indicates that it is the first document extracted from that file.

```python .\test.py .\input.pdf -s "SPLITME" -k -v```

```
Processing file .\input.pdf containing 66 pages
Analyzing page 1
Analyzing page 2
[...]
Analyzing page 6
Found separator - writing 6 pages to split_0_0.pdf
Analyzing page 7
[...]
Analyzing page 13
Found separator - writing 7 pages to split_0_1.pdf
Analyzing page 14
[...]
Split 1 given files into 19 files
```
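
The output names seen in this log follow the pattern prefix, input file index, document index. A minimal sketch of that naming scheme, assuming both indices are zero-based as the example suggests (the helper `output_name` is hypothetical and not taken from the script):

```python
def output_name(prefix: str, file_index: int, doc_index: int) -> str:
    """Build an output filename such as split_0_0.pdf (hypothetical helper)."""
    return f"{prefix}_{file_index}_{doc_index}.pdf"


# First input file (index 0), first two extracted documents.
for doc_index in range(2):
    print(output_name("split", 0, doc_index))
```

With a custom prefix passed via `-p`, only the first part of the name would change.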

## Thanks

This script is based on ["pdf_split_tool" by Thiago Carvalho D'Ávila (staticdev)](https://github.com/staticdev/pdf-split-tool/).
