{"id":15343818,"url":"https://github.com/rocketraman/sane-scan-pdf","last_synced_at":"2025-10-23T16:47:45.098Z","repository":{"id":45113021,"uuid":"43214906","full_name":"rocketraman/sane-scan-pdf","owner":"rocketraman","description":"Sane command-line scan-to-pdf script on Linux with OCR and deskew support","archived":false,"fork":false,"pushed_at":"2024-03-10T14:18:58.000Z","size":57,"stargazers_count":142,"open_issues_count":4,"forks_count":32,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-31T10:22:33.342Z","etag":null,"topics":["deskew","linux","ocr","sane","scanner","scanning","unpaper"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rocketraman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-09-26T17:15:39.000Z","updated_at":"2024-12-28T15:54:48.000Z","dependencies_parsed_at":"2024-10-01T10:51:38.011Z","dependency_job_id":"2c255ebf-da2b-4cd7-ba35-7d5dbdb45299","html_url":"https://github.com/rocketraman/sane-scan-pdf","commit_stats":{"total_commits":88,"total_committers":6,"mean_commits":"14.666666666666666","dds":0.2272727272727273,"last_synced_commit":"019e95cb64bb31c60e6bf48c97b2f9f5728405ab"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rocketraman%2Fsane-scan-pdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rocketraman%2Fsane-scan-pdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/rocketraman%2Fsane-scan-pdf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rocketraman%2Fsane-scan-pdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rocketraman","download_url":"https://codeload.github.com/rocketraman/sane-scan-pdf/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252883397,"owners_count":21819170,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deskew","linux","ocr","sane","scanner","scanning","unpaper"],"created_at":"2024-10-01T10:51:31.153Z","updated_at":"2025-10-23T16:47:40.050Z","avatar_url":"https://github.com/rocketraman.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SANE Command-Line Scan to PDF\n\nSane command-line scanning bash shell script on Linux with OCR and deskew support. The script automates\ncommon scan-to-pdf operations for scanners with an automatic document feeder, such as the awesome Fujitsu\nScanSnap S1500, with output to PDF files.\n\nTested and run regularly on Fedora, but should work on other distributions with the requirements below.\n\n## Features\n\n* Join scanned pages into a single output file, or specify a name for each page\n* Deskew (if supported by scanner driver, or software-based via unpaper)\n* Crop (if supported by scanner driver)\n* Creates searchable PDFs (with tesseract)\n* Duplex (if scanner supports it)\n* Specify resolution\n* Truncate n pages explicitly from end of scan e.g. 
duplex scanning with last page truncated\n* Skip white-only pages automatically (with ImageMagick)\n* Specify page width and height for odd size pages, or common sizes (Letter, Legal, A4)\n* Performance: scanner runs in parallel with page post-processing\n* Limit parallel processing for very fast scanners or constrained environments (if sem is installed)\n* Open scan output(s) in a viewer after scanning\n* Configuration via default and named option groups\n\n## Requirements\n\nThe following dependencies are requirements of the script. See also [Dependencies\nInstallation](https://github.com/rocketraman/sane-scan-pdf/wiki/Dependencies-Installation).\n\n* bash\n* pnmtops (netpbm-progs)\n* ps2pdf (ghostscript)\n* pdfunite (poppler-utils)\n* units (units)\n* ImageMagick (if --skip-empty-pages or --ocr is used)\n\n### Optional\n\n* unpaper (for software deskew)\n* flock (usually provided by util-linux) (for properly ordered verbose logs)\n* tesseract (to make searchable PDFs)\n* sem (via GNU parallel, to constrain resource usage during page processing -- install this if you have a fast scanner)\n* bc (for white-page detection percentage calculations)\n* xdg-open (for opening scan after completion)\n\n## Getting Started\n\n```\n# scan --help\nscan [OPTIONS]... [OUTPUT]\n\nOPTIONS\n -v, --verbose\n   Verbose output (this will slow down the scan due to the need to prevent interleaved output)\n -d, --duplex\n   Duplex scanning\n -m, --mode\n   Mode e.g. Lineart (default), Halftone, Gray, Color, etc. Use --mode-hw-default to not set any mode\n --mode-hw-default\n   Do not set the mode explicitly, use the hardware default — ignored if --mode is set\n -r, --resolution\n   Resolution e.g 300 (default)\n -a, --append\n   Append output to existing scan\n -e, --max \u003cpages\u003e\n   Max number of pages e.g. 2 (default is all pages)\n -t, --truncate \u003cpages\u003e\n   Truncate number of pages from end e.g. 
1 (default is none) -- truncation happens after --skip-empty-pages\n -s, --size\n   Page Size as type e.g. Letter (default), Legal, A4, no effect if --crop is specified\n -ph, --page-height\n   Custom Page Height in mm\n -pw, --page-width\n   Custom Page Width in mm\n -x, --device\n   Override scanner device name, defaulting to \"fujitsu\", pass an empty value for no device arg\n -xo, --driver-options\n   Send additional options to the scanner driver e.g.\n   -xo \"--whatever bar --frobnitz baz\"\n --no-default-size\n   Disable default page size, useful if driver does not support page size/location arguments\n --crop\n   Crop to contents (driver must support this)\n --deskew\n   Run driver deskew (driver must support this)\n --unpaper\n   Run post-processing deskew and black edge detection (requires unpaper)\n --ocr\n   Run OCR to make the PDF searchable (requires tesseract)\n --language \u003clang\u003e\n   which language to use for OCR\n --skip-empty-pages\n   remove empty pages from resulting PDF document (e.g. one sided doc in duplex mode)\n --white-threshold\n   threshold to identify an empty page is a percentage value between 0 and 100. The default is 99.8\n --brightness-contrast-sw\n   Alter brightness and contrast via post-processing - prefer specifying brightness and/or\n   contrast via --driver-options if supported by your hardware.\n --open\n   After scanning, open the scan via xdg-open\n -og, --option-group\n   A named option group. Useful for saving collections of options under a name e.g. 'receipt' for easy reuse.\n   Use this option in combination with '--help' to show the location and content of the file and edit it manually.\n\nCONFIGURATION\n\u003cnot shown, system-specific, run `--help` locally\u003e\n```\n\n### Configuration\n\nUse `--help` locally to show the location of optional configuration and\npre-scan hook scripts. These scripts may contain environment variables to\npre-configure `scan`. 
For example, the contents of the `default` file may be\nsomething like:\n\n```\nDEVICE=something\nSEARCHABLE=1\nMODE_HW_DEFAULT=1\n```\n\nCommand line argument `--option-group foo` (or `-og foo`) will read the\n`foo` file in the standard XDG home config directory (use `-og foo --help`\nto see the exact location) for configuration.\n\nFor example, to always scan receipts with crop, deskew, and unpaper\npost-processing, and to make them searchable via OCR, create a `receipt` option\ngroup by writing the following to a file named `receipt` in the\nconfig directory:\n\n```\nCROP=1\nDESKEW=1\nUNPAPER=1\nSEARCHABLE=1\n```\n\nCommand-line arguments will override settings in the default and named\nconfigurations. All command line flags support prefixing with `no-` in order to\noverride configuration settings. For example, to scan receipts using the option\ngroup above, but without making it searchable, you would do:\n\n```\n--option-group receipt --no-searchable\n```\n\n### Tips\n\nThe default scanner device is set to `fujitsu`. If you have another scanner,\nyou will need to use the `-x`/`--device` argument to specify your scanner,\nor save a `DEVICE=something` line to a local config file as shown above.\nSee below for how to get the list of available devices.\n\nIf running via `scanbd`, scanning occurs via the `net` driver rather than the\nusual device driver. In this case, it may be necessary to specify the net\ndriver device in the `scanbd` script, or to specify no device at all, which\nlets the script choose the best device both when running via `scanbd` and\nwhen running outside of it. To do this, use an empty device\ni.e. `--device \"\"`.\n\nScanners and scanner drivers vary in the features they support. This script\nprovides several options to the underlying scanner driver by default, and\nthese options may not be supported by your scanner/scanner driver. 
If\nyou are receiving an error about `--page-width`/`--page-height` being\nunrecognized options, try the `--no-default-size` option. If you receive an\nerror about the `--mode` value being invalid, try `--mode-hw-default`\nand see below for how to retrieve the list of modes that your system understands.\n\n### Helpful Commands\n\nList available scanner devices (for `-x`/`--device` argument):\n\n```\nscanadf -L\n```\n\nList available device-specific options, including acceptable values for\n`-m`/`--mode` and `-r`/`--resolution`:\n\n```\nscanadf [-d \u003cdevice\u003e] --help\n```\n\n## Author(s)\n\n* [Raman Gupta](https://github.com/rocketraman/)\n\nWith assistance from\n[various other contributors](https://github.com/rocketraman/sane-scan-pdf/graphs/contributors)!\nThank you!\n\n## Blog Post Mentions\n\nThe following blog posts talk about scanner automation, and mention use of this\nscript. If you create a blog post, please submit a PR and add your link here!\n\n* [Stefan Armbruster - Jan 1, 2019 - Running Paperless on FreeNAS](https://blog.armbruster-it.de/2019/01/running-paperless-on-freenas/)\n* [Chris Schuld - Jan 8, 2020 - Network Scanner with Fujitsu ScanSnap and a Raspberry Pi](https://chrisschuld.com/2020/01/network-scanner-with-scansnap-and-raspberry-pi/)\n\n## Other Useful Software\n\n* [OCRmyPDF](https://github.com/jbarlow83/OCRmyPDF) - forgot to use the `--ocr` option at scanning time? use this\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frocketraman%2Fsane-scan-pdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frocketraman%2Fsane-scan-pdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frocketraman%2Fsane-scan-pdf/lists"}