Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/unpaper/unpaper
A post-processing tool for scanned sheets of paper.
- Host: GitHub
- URL: https://github.com/unpaper/unpaper
- Owner: unpaper
- Created: 2011-08-14T22:25:27.000Z (about 13 years ago)
- Default Branch: main
- Last Pushed: 2024-07-11T20:44:57.000Z (4 months ago)
- Last Synced: 2024-08-02T15:30:08.558Z (3 months ago)
- Language: C
- Homepage:
- Size: 15.8 MB
- Stars: 999
- Watchers: 30
- Forks: 89
- Open Issues: 43
Metadata Files:
- Readme: README.md
- License: LICENSES/0BSD.txt
- Authors: AUTHORS
unpaper
=======

Originally written by Jens Gulden; see AUTHORS for more information.
The entire `unpaper` project is licensed under GNU GPL v2.
Some of the individual files are licensed under the MIT or Apache 2.0 licenses.
Each file contains an [SPDX license header](https://reuse.software/)
specifying its license. The text of all three licenses is available under
`LICENSES`.

Overview
--------

`unpaper` is a post-processing tool for scanned sheets of paper,
especially for book pages that have been scanned from previously
created photocopies. The main purpose is to make scanned book pages
more readable on screen after conversion to PDF. Additionally,
`unpaper` might be useful to enhance the quality of scanned pages
before performing optical character recognition (OCR).

`unpaper` tries to clean scanned images by removing dark edges that
appeared through scanning or copying on areas outside the actual page
content (e.g. dark areas between the left-hand-side and the
right-hand-side of a double-sided book-page scan).

The program also tries to detect misaligned centering and rotation of
pages and will automatically straighten each page by rotating it to
the correct angle. This process is called "deskewing".

Note that the automatic processing will sometimes fail. It is always a
good idea to manually check the results of `unpaper` and adjust the
parameter settings according to the requirements of the input. Each
processing step can also be disabled individually for each sheet.

See [further documentation][3] for notes on the supported file formats.
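
As a rough illustration of disabling a processing step, the sketch below
builds `unpaper` command lines and prints them without running anything.
The helper `unpaper_cmd` and the file names are hypothetical, and the
`--no-STEP` option pattern (for example `--no-deskew`) is an assumption
that should be checked against the option reference of your installed
version.

```shell
#!/bin/sh
# Hypothetical helper: print an unpaper command line for one sheet.
# usage: unpaper_cmd INPUT OUTPUT [STEP...]
# Assumption: each STEP (for example "deskew") maps to a --no-STEP option.
unpaper_cmd() {
    input=$1
    output=$2
    shift 2
    opts=""
    for step in "$@"; do
        opts="$opts --no-$step"
    done
    echo "unpaper$opts $input $output"
}

# Dry run: only print the commands. The file names are placeholders.
unpaper_cmd scan001.pgm clean001.pgm           # all default steps
unpaper_cmd scan002.pgm clean002.pgm deskew    # deskewing disabled
```

Printing the command first makes it easy to review the options for a
problematic sheet before running it by hand.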

Dependencies
------------

The only hard dependency of `unpaper` is [ffmpeg][4], which is used for
file input and output.

Building instructions
---------------------

`unpaper` uses [the Meson Build system](https://mesonbuild.com), which
can be installed using Python's package manager (`pip3` or `pip`):

    unpaper$ pip3 install --user 'meson >= 0.57' 'sphinx >= 3.4'
    unpaper$ CFLAGS="-march=native" meson --buildtype=debugoptimized builddir
    unpaper$ meson compile -C builddir

You can pass any required optimization flags in the `CFLAGS` environment
variable when creating the Meson build directory. Using Link-Time
Optimization (Meson option `-Db_lto=true`) is recommended if
available.

Further optimizations such as `-ftracer` and `-ftree-vectorize` are
thought to work, but their effect has not been evaluated so your
mileage may vary.

Tests depend on `pytest` and `pillow`, which will be auto-detected by
Meson.

Development Hints
-----------------

The project includes configuration for [pre-commit](https://pre-commit.com/)
which is integrated with GitHub Actions CI. If you're using git for
development, you can install it with
`pip install pre-commit && pre-commit install`.

Using [Sapling](https://sapling-scm.com/) with this repository is possible
and diffs can be reviewed as a stack.

Further Information
-------------------

You can find more information on the [basic concepts][1] and the
[image processing][2] in the available documentation.

[1]: doc/basic-concepts.md
[2]: doc/image-processing.md
[3]: doc/file-formats.md
[4]: https://www.ffmpeg.org/