https://github.com/baltpeter/scanprep

Small utility to prepare scanned documents. Supports separating PDF files by separator pages and removing blank pages.

Host: GitHub
URL: https://github.com/baltpeter/scanprep
Owner: baltpeter
License: mit
Created: 2021-01-09T15:26:17.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2024-08-13T00:10:15.000Z (about 1 year ago)
Last Synced: 2025-05-01T00:00:00.203Z (5 months ago)
Topics: hacktoberfest, image-processing, pdf, scanned-documents, scanning
Language: Python
Homepage:
Size: 512 KB
Stars: 32
Watchers: 2
Forks: 11
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

# scanprep – Prepare scanned PDF documents

> Small utility to prepare scanned documents. Supports separating PDF files by separator pages and removing blank pages.

Scanprep can be used to prepare scanned documents for further processing with existing tools (like the great [OCRmyPDF](https://github.com/jbarlow83/OCRmyPDF)) or directly for archival. It allows splitting multiple documents that were scanned in a single batch into multiple files. In addition, it can also remove blank pages from the output (this is especially helpful if using a duplex scanner).

For document separation, separator pages need to be inserted between the different documents before scanning. These pages tell the program where to split. You can either use the [included separator page](/separator-page.pdf) or create your own. The separator page simply needs to have a barcode that encodes the text `SCANPREP_SEP` (you can use any [barcode type supported by zbar](http://zbar.sourceforge.net/about.html)).
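
The splitting idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not necessarily how scanprep implements it: the helper names (`page_payloads`, `is_separator`, `split_documents`) are hypothetical, and only `pyzbar.pyzbar.decode` and the `SCANPREP_SEP` payload come from the tools and text referenced here.

```python
from typing import Callable, Iterable, List, TypeVar

SEPARATOR_TEXT = b"SCANPREP_SEP"  # barcode payload named in this README

T = TypeVar("T")


def page_payloads(image) -> List[bytes]:
    """Decode all barcodes on one page image (assumes pyzbar and zbar are installed)."""
    from pyzbar.pyzbar import decode  # imported lazily so the rest runs without pyzbar
    return [symbol.data for symbol in decode(image)]


def is_separator(payloads: Iterable[bytes]) -> bool:
    """A page counts as a separator if any barcode on it encodes SCANPREP_SEP."""
    return any(payload == SEPARATOR_TEXT for payload in payloads)


def split_documents(pages: List[T],
                    payloads_for: Callable[[T], Iterable[bytes]]) -> List[List[T]]:
    """Group pages into documents, dropping separator pages and empty groups."""
    documents: List[List[T]] = []
    current: List[T] = []
    for page in pages:
        if is_separator(payloads_for(page)):
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


# Hypothetical stand-in for a scanned batch: page name -> decoded barcode payloads.
fake_scan = {"a1": [], "a2": [], "sep": [b"SCANPREP_SEP"], "b1": [b"other"]}
print(split_documents(list(fake_scan), fake_scan.__getitem__))
# -> [['a1', 'a2'], ['b1']]
```

With real scans, `payloads_for` could be `page_payloads` applied to rendered page images; rendering PDF pages to images is left out of this sketch.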

## Installation

### Via Snap

You can install scanprep from the [Snap Store](https://snapcraft.io/scanprep):

```sh
snap install scanprep

scanprep -h
```

### Via PyPI

You can install scanprep using `pip` (consider doing that in a venv):

```sh
pip3 install scanprep

# If you see an error like "ImportError: Unable to find zbar shared library", you need to install zbar yourself. See: https://pypi.org/project/pyzbar/
scanprep -h
```

### From source

To install scanprep from source, clone this repository and install the dependencies:

```sh
git clone https://github.com/baltpeter/scanprep.git
cd scanprep
pip3 install -r requirements.txt # You may want to do this in a venv.
# You may also need to install the zbar shared library. See: https://pypi.org/project/pyzbar/

python3 scanprep/scanprep.py -h
```

## Usage

Most simply, you can run scanprep via `scanprep <input_pdf>`. This processes the input file and writes the results to your current working directory. To specify a different output directory, use `scanprep <input_pdf> <output_dir>`.
The output file names are prefixed with `0-`, `1-`, and so on.

By default, both page separation and blank page removal will be performed. To turn them off, use `--no-page-separation` or `--no-blank-removal`, respectively.
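
If you want to call scanprep from a Python script, the options above map onto a command line as in the following sketch. It assumes the `scanprep` executable is on your `PATH`; the helper names `build_command` and `run_scanprep` and the example file names are hypothetical, while the flags are the ones documented in this section.

```python
import subprocess
from typing import List, Optional


def build_command(input_pdf: str, output_dir: Optional[str] = None,
                  page_separation: bool = True,
                  blank_removal: bool = True) -> List[str]:
    """Build the scanprep argument list; both features are on by default."""
    cmd = ["scanprep"]
    if not page_separation:
        cmd.append("--no-page-separation")
    if not blank_removal:
        cmd.append("--no-blank-removal")
    cmd.append(input_pdf)
    if output_dir is not None:
        cmd.append(output_dir)  # omitted: scanprep writes to the current directory
    return cmd


def run_scanprep(input_pdf: str, output_dir: Optional[str] = None,
                 page_separation: bool = True, blank_removal: bool = True) -> None:
    """Run scanprep and raise CalledProcessError if it exits with an error."""
    subprocess.run(
        build_command(input_pdf, output_dir, page_separation, blank_removal),
        check=True,
    )


# Example: keep blank pages, write the results to ./out (hypothetical paths).
print(build_command("scan.pdf", "out", blank_removal=False))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.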

Use `scanprep -h` to show the help:

```
usage: scanprep [-h] [--page-separation] [--blank-removal] input_pdf [output_dir]

positional arguments:
  input_pdf             The PDF document to process.
  output_dir            The directory where the output documents will be saved. (defaults to the
                        current directory)

optional arguments:
  -h, --help            show this help message and exit
  --page-separation, --no-page-separation
                        Do (or do not) split document into separate files by the included
                        separator pages. (default yes)
  --blank-removal, --no-blank-removal
                        Do (or do not) remove empty pages from the output. (default yes)
```

## License

Scanprep is licensed under the MIT license; see the [`LICENSE`](/LICENSE) file for details. Issues and pull requests are welcome!
