{"id":18974497,"url":"https://github.com/m7a/lp-scanning","last_synced_at":"2026-04-08T15:30:19.341Z","repository":{"id":164554628,"uuid":"217778963","full_name":"m7a/lp-scanning","owner":"m7a","description":"Workflow-tools for performing (mass) document scans","archived":false,"fork":false,"pushed_at":"2024-04-28T19:57:45.000Z","size":13,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-01T09:07:59.569Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/m7a.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-26T22:55:04.000Z","updated_at":"2024-04-28T19:57:49.000Z","dependencies_parsed_at":"2023-07-03T18:46:07.396Z","dependency_job_id":null,"html_url":"https://github.com/m7a/lp-scanning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Flp-scanning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Flp-scanning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Flp-scanning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Flp-scanning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/m7a","download_url":"https://codeload.github.com/m7a/lp-scanning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://githu
b.com","kind":"github","repositories_count":239972037,"owners_count":19727290,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T15:15:14.089Z","updated_at":"2026-04-08T15:30:19.270Z","avatar_url":"https://github.com/m7a.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\nsection: 32\nx-masysma-name: scanning\ntitle: Scanning Workflow and Tools\ndate: 2020/03/19 20:34:10\nlang: en-US\nauthor: [\"Linux-Fan, Ma_Sys.ma (Ma_Sys.ma@web.de)\"]\nkeywords: [\"scanimgrename\", \"ma_scanner\", \"mdvl\", \"scanner\", \"ocr\"]\nx-masysma-version: 1.2.0\nx-masysma-repository: https://github.com/m7a/lp-scanning\nx-masysma-website: https://masysma.net/32/scanning.xhtml\nx-masysma-owned: 1\nx-masysma-copyright: |\n  Copyright (c) 2020 Ma_Sys.ma.\n  For further info send an e-mail to Ma_Sys.ma@web.de.\n---\nBackground\n==========\n\nThis document describes a workflow for scanning documents which is being used\nat the Ma_Sys.ma. Additionally, the repository provides tools to aid with this\nworkflow.\n\nBeing very specific, it is expected that the scripts are not immediately useful\nfor general-purpose usage. However, they may still serve as an inspiration for\ndeveloping an own workflow.\n\nTechnical Details\n=================\n\nScans are expected to be acquired in form of multiple PDF documents which\nmay potentially contain multiple pages (e.g. from the same scan run through an\nautomatic document feeder, short: ADF). 
Alternatively, scanning documents\none-by-one with a locally connected scanner is supported.\n\nFor storage, scans are converted to PNG files with a resolution of 150 DPI and\n8 colors. If the mode of scanning can be influenced, it is set to grayscale by\ndefault.\n\nAs an additional feature, scanned pages can be processed by OCR to retrieve\na six-digit number which is used for constructing a file name of the form\n`madocYYYYYY.png` where `YYYYYY` is the document's number.\n\nWorkflow\n========\n\nThe Ma_Sys.ma scanning workflow starts with (often hand-written) documents that\nare labelled by a _pagination stamp_ with consecutive six-digit numbers near the\nbeginning of the document.\n\nThese documents are then processed by either of these two “ways”:\n\n## Way A: Scanning one-by-one\n\n 1. Script `ma_scanner2 -n` is invoked.\n 2. The first document is put into the scanner.\n 3. With [ENTER] the scan is triggered.\n 4. After scanning, the next document is processed by putting it in the\n    scanner and continuing from step 3. If no more documents remain, the\n    scanning is completed by entering `q` followed by [ENTER].\n\nIn parallel, background processes convert the scan results to indexed grayscale\nwith 8 colors and OCR processes try to recognize the numbers and rename the\nfiles accordingly.\n\n## Way B: Scanning with ADF\n\n 1. A local FTP server is started.\n 2. Using a networked scanner with ADF, documents are scanned in multiple\n    batches and uploaded to the FTP server in the form of PDF documents.\n 3. In the directory with the PDF files, `ma_scanner -n .` is invoked.\n 4. Documents are converted and OCRed in parallel.\n\n## After the scanning\n\nAfter processing documents this way, two types of faults can be observed:\n\nWrongly named files\n:   These can often be identified by being numbered outside the interval of\n    scanned pages. For instance, if documents 002001 to 002300 were processed,\n    files named `madoc992022.png` are likely to be wrongly named. 
One can\n    identify these files by using a file manager's _sort by name_ function.\nUnidentified files\n:   If the OCR did not return any six-digit numbers for a file, it likely did\n    not recognize the stamped number. In this case, the file will be named\n    after its source PDF plus page number, or `scan_YYY` with `YYY` being a\n    consecutive number.\n\nIn both cases, the wrongly named files need to be renamed. To do this, the program\n`scanimgrename` is invoked on all the files whose names are incorrect.\nFor instance, a typical invocation is `scanimgrename scan_???.png` to process\nall the hand-scanned but unidentified files.\n\nThe `scanimgrename` tool\n========================\n\n## Name\n\n`scanimgrename` -- rename scanned image files\n\n## Synopsis\n\n\tUsage: scanimgrename [file...]\n\n## Description\n\n`scanimgrename` provides a minimalistic interface showing only the scanned\ndocument and a field to enter a number which will automatically be prefixed by\na suitable number of zeroes if it has fewer than six digits. Upon pressing\n[ENTER], the file is renamed and the next document is presented. Once all\nfiles have been processed, the interface closes.\n\nIn case a file with the target file name already exists, pressing [ENTER] will\nsave the file name to a stack and turn to the already existing file under the\nassumption that that file might have been mis-named. `scanimgrename` indicates this\nmode by showing the number being entered in red as opposed to the regular blue\ncolor. Once the rename conflict has been resolved, the color will turn blue\nagain.\n\n## Configuration\n\nIn order to configure a different file name scheme, the source code\n`ScanImgRename.java` needs to be changed and the `scanimgrename.jar` needs to be\nrebuilt, e.g. 
by invoking `ant jar`.\n\nThe `ma_scanner2` tool\n======================\n\n## Name\n\n`ma_scanner2` -- Ma_Sys.ma Scanning Helper Script\n\n## Synopsis\n\n\tWay A Usage ma_scanner2 [-n] [MODE RESOLUTION INDICES]\n\tWay B Usage ma_scanner2 [-n] DIRECTORY\n\n## Description\n\nTwo different modes of invocation, corresponding to the different ways\nexplained for the workflow, are available.\n\nWay A\n:   To perform scans of individual documents, the `scanimage` tool is used.\n    `ma_scanner2` provides an interactive interface prompting the user to press\n    [ENTER] to perform the next scan. `MODE` can be one of Color, Gray or\n    Lineart (default: Gray). `RESOLUTION` is the scanning resolution in DPI\n    (default: 150) and `INDICES` is the number of colors to use for the output\n    file. If `-1` is given, all the colors from the scanner are retained.\nWay B\n:   To process scanned documents from an ADF, all pages from PDF files in\n    `DIRECTORY` are processed using parallel processes.\n\n## Options\n\n`-n`\n:   If `-n` is given, the script attempts to assign _numbers_ to the files by\n    processing them with the Tesseract OCR.\n\n## Examples\n\n`ma_scanner2`\n:   A plain invocation allows scanning documents without numbers.\n`ma_scanner2 -n`\n:   Scan individual pages and attempt to recognize the numbers.\n`ma_scanner2 Color 300 -1`\n:   Scan images without reducing colors and at an elevated resolution.\n    This is useful for non-document scans.\n\n## See also\n\nHere are the links to the script's dependencies. 
Most of them are optional for\none of the ways described above, see their documentation to find out what they\nare useful for:\n\n[convert(1)](https://manpages.debian.org/buster/imagemagick-6.q16/convert-im6.q16.1.en.html)\n[gimp(1)](https://manpages.debian.org/buster/gimp/gimp.1.en.html)\n[parallel(1)](https://manpages.debian.org/buster/parallel/parallel.1.en.html)\n[pdfimages(1)](https://manpages.debian.org/buster/poppler-utils/pdfimages.1.en.html)\n[scanimage(1)](https://manpages.debian.org/buster/sane-utils/scanimage.1.en.html)\n[tesseract(1)](https://manpages.debian.org/buster/tesseract-ocr/tesseract.1.en.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fm7a%2Flp-scanning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fm7a%2Flp-scanning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fm7a%2Flp-scanning/lists"}