https://github.com/wohali/hough
Skew detection in scanned images
- Host: GitHub
- URL: https://github.com/wohali/hough
- Owner: wohali
- License: gpl-2.0
- Fork: true (qu1j0t3/hough)
- Created: 2020-03-24T00:41:13.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2024-02-06T10:08:00.000Z (11 months ago)
- Last Synced: 2024-09-18T13:43:21.634Z (4 months ago)
- Topics: deskew, hough, image-processing, python, rotate
- Language: Python
- Size: 3.32 MB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 15
Metadata Files:
- Readme: README.md
- License: COPYING
# hough - Skew detection in scanned images
_Hough_ finds skew angles in scanned document pages, using the Hough transform.
It is oriented to batch processing, and can make use of multiple cores. (You'll
want this - analysis and image processing are very CPU intensive!)

# Installation and usage
## Installation
```
pip install -U pip
pip install hough
```

The first line is required to update `pip` to a new enough version to be
compatible with `manylinux` wheel packaging, required for PyMuPDF.

Older versions of `pip` are fine, but you'll have to install MuPDF, its
headers, and a compiler first, so PyMuPDF can be compiled locally.

## Usage
To get started right away, here are some examples.
Generate angles (in CSV form) for a bunch of TIFF page images, one page per file:
```
hough --csv in/*.tif
```

The same, but for a PDF file, and display a histogram at the end:
```
hough --histogram Able_Attach_Sep83.pdf
```

The same, but show progress while running:
```
hough -v --histogram Able_Attach_Sep83.pdf
```

The deskewing results are placed in the `results.csv` file. Example:
```csv
"Input File","Page Number","Computed angle","Variance of computed angles","Image width (px)","Image height (px)"
"/home/toby/my-pages/orig/a--0000.pgm.tif",,-0.07699791151672428,0.001073874144832815,5014,6659
"/home/toby/my-pages/orig/a--0001.pgm.tif",,,,5018,6630
"/home/toby/my-pages/orig/a--0002.pgm.tif",,0.24936351676615068,0.005137031681286154,5021,6629
"/home/toby/my-pages/orig/a--0003.pgm.tif",,,,5020,6608
"/home/toby/my-pages/orig/a--0004.pgm.tif",,-0.037485115754500545,0.025945115897015238,5021,6616
```

The program should work on various image input formats, and with both greyscale
and RGB images. _Hough_ works best with images ≥300dpi.

Here's a histogram sample:
```
=== Skew statistics ===
0.00° - 0.10° [57] ████████████████████████████████████████
0.10° - 0.20° [39] ███████████████████████████▍
0.20° - 0.30° [30] █████████████████████
0.30° - 0.40° [30] █████████████████████
0.40° - 0.50° [11] ███████▊
0.50° - 0.60° [11] ███████▊
0.60° - 0.70° [ 3] ██▏
0.70° - 0.80° [ 4] ██▊
0.80° - 0.90° [ 0]
0.90° - 1.00° [ 1] ▊
1.00° - 1.10° [ 1] ▊
1.10° - 1.20° [ 0]
1.20° - 1.30° [ 1] ▊
1.30° - 1.40° [ 1] ▊
1.40° - 1.50° [ 1] ▊
1.50° - 1.60° [ 2] █▍
1.60° - 1.70° [ 0]
1.70° - 1.80° [ 1] ▊
1.80° - 1.90° [ 2] █▍
1.90° - 2.00° [ 0]
Samples: 195
50th percentile: 0.20°
90th percentile: 0.55°
99th percentile: 1.77°
```

## Command line options
You can list them by running `hough --help`:
```
hough - straighten scanned pages using the Hough transform.

Usage:
  hough (-h | --help)
  hough [options] [FILE] ...
  hough [options] [--results=] [FILE] ...
  hough (-r | --rotate) [options] [--results=]
  hough (-r | --rotate) [options] [--results=] [FILE] ...

Arguments:
  FILE                  input files to analyze/rotate

Options:
  -h --help             display this help and exit
  --version             display the version number and exit
  -v --verbose          print status messages
  -d --debug            retain debug image output in debug/ dir
                        (also enables --verbose)
  --histogram           display rotation angle histogram summary
  -o DIR, --out=DIR     store output results/images in named
                        directory. Directory is created if it
                        does not exist [default: out/TIMESTAMP]
  --results=            save results in FILE under output path,
                        or specify path to results file for
                        rotation [default: results.csv]
  -w --workers=<#>      specify the number of workers to run
                        simultaneously. Default: total # of CPUs
  -r --rotate           rotates the files passed on the command
                        line, or if none given, those listed
                        in the results file.
```

# Examples
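
The `results.csv` file described under Usage can be post-processed with a few lines of Python, for instance to list pages that may need manual review. The following is a minimal sketch, not part of _Hough_ itself. It assumes the column names shown in the sample CSV above, and it assumes that an empty "Computed angle" field means no skew was determined for that page. The function name `split_results` and the `threshold_deg` cut-off are hypothetical and chosen only for illustration.

```python
import csv
import io

# Column names copied from the results.csv sample in this README.
ANGLE_COLUMN = "Computed angle"
FILE_COLUMN = "Input File"


def split_results(lines, threshold_deg=0.2):
    """Split result rows into (undetected, skewed) lists of input file names.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    `threshold_deg` is an illustrative cut-off, not a hough setting.
    """
    undetected, skewed = [], []
    for row in csv.DictReader(lines):
        angle_text = row[ANGLE_COLUMN].strip()
        if not angle_text:
            # Assumption: an empty angle field means no skew was determined.
            undetected.append(row[FILE_COLUMN])
        elif abs(float(angle_text)) >= threshold_deg:
            skewed.append(row[FILE_COLUMN])
    return undetected, skewed


if __name__ == "__main__":
    # Inline sample in the same shape as the CSV shown under Usage.
    sample = io.StringIO(
        '"Input File","Page Number","Computed angle","Variance of computed angles",'
        '"Image width (px)","Image height (px)"\n'
        '"a--0000.pgm.tif",,-0.07699791151672428,0.001073874144832815,5014,6659\n'
        '"a--0001.pgm.tif",,,,5018,6630\n'
        '"a--0002.pgm.tif",,0.24936351676615068,0.005137031681286154,5021,6629\n'
    )
    undetected, skewed = split_results(sample)
    print("no angle found:", undetected)
    print("skew at or above threshold:", skewed)
```

To run this against a real results file, pass an open file instead of the inline sample, for example `split_results(open("results.csv", newline=""))`, with the path adjusted to your own output directory.
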
Just about all of [these files](http://docs.telegraphics.com.au/) have been
deskewed with _Hough_.

# Getting the best results
### NOTE: This is a beta product!

There are a few guidelines you should follow to get the best deskewing results
from your document scans:

1. Bilevel (black-and-white) bitmaps will produce lower quality results.
   For best results, scan to greyscale or RGB first, deskew with _Hough_, then
   reduce the colour depth to bilevel.
1. Hough deskewing is an inexact process, with many heuristics discovered
   by trial and error. _Hough_ may not work well on your material without tuning
   and further modification. (We'd love your pull requests!)

## Debugging output
You can spy on _Hough_'s attempts to perform deskewing by passing the `--debug`
flag on the command line. The generated images, and any detected lines in them,
are placed in the `debug//` directory.

Note that _Hough_ cannot always determine a skew for a page (e.g. blank pages
in particular), and will very occasionally get the skew wrong (depending on
source material). It's worth reviewing these images if _Hough_ makes a bad
decision on your scans. Please submit these files along with the original image
when filing an issue!

## Recommended scanners
The authors have tested this software with output from the following scanners:

* Fujitsu fi-4530C, USB
  * Fast
  * Cheap on eBay
  * Requires a Windows XP VirtualBox for drivers
* Brother ADS-2700W, USB + Ethernet + WiFi
  * Fast
  * Can scan directly to the network or to a memory stick
  * Factory reconditioned models still available (March 2020)
  * Very low skew out of the box
* Epson WF-7610, USB + Ethernet + WiFi
  * 11"x17" and duplex capable
  * Can scan directly to the network or to a memory stick

# Developing
First, clone this repo.
You'll need to install [Poetry](https://python-poetry.org/docs/#installation),
then run:

```
poetry run pip install -U pip setuptools
poetry install
poetry shell
```

Do some work, then run the tests with:
```
pytest
pre-commit run
```

# License notice
```
This file is part of "hough", which detects skew angles in scanned images
Copyright (C) 2016-2020 Toby Thain
Copyright (C) 2020 Joan Touzet

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
```