{"id":40935319,"url":"https://github.com/gsauthof/adf2pdf","last_synced_at":"2026-01-22T04:15:24.647Z","repository":{"id":57408198,"uuid":"106207858","full_name":"gsauthof/adf2pdf","owner":"gsauthof","description":"automate the workflow around ADF scanning, OCR and PDF creation","archived":false,"fork":false,"pushed_at":"2025-10-11T11:17:56.000Z","size":36,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-12-21T23:46:20.337Z","etag":null,"topics":["adf","duplex-scanning","ocr","pdf","pdf-generation","sane","scanning","tesseract"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gsauthof.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-08T20:31:37.000Z","updated_at":"2025-10-11T11:18:00.000Z","dependencies_parsed_at":"2022-09-13T04:50:35.040Z","dependency_job_id":null,"html_url":"https://github.com/gsauthof/adf2pdf","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/gsauthof/adf2pdf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsauthof%2Fadf2pdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsauthof%2Fadf2pdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsauthof%2Fadf2pdf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsauthof%2Fadf2pdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gsauthof","download_url":"https://codeload.github.com/gsauthof/adf2pdf
/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsauthof%2Fadf2pdf/sbom","scorecard":{"id":447084,"data":{"date":"2025-08-11","repo":{"name":"github.com/gsauthof/adf2pdf","commit":"f9fa56c5b8ea4417eb7eff77ab70e25067e7e413"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Code-Review","score":0,"reason":"Found 0/27 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the 
project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities 
detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: COPYING:0","Info: FSF or OSI recognized license: GNU General Public License v3.0: COPYING:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}}]},"last_synced_at":"2025-08-19T07:05:06.126Z","repository_id":57408198,"created_at":"2025-08-19T07:05:06.126Z","updated_at":"2025-08-19T07:05:06.126Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28653970,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T01:17:37.254Z","status":"online","status_checked_at":"2026-01-22T02:00:07.137Z","response_time":144,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adf","duplex-scanning","ocr","pdf","pdf-generation","sane","scanning","tesseract"],"created_at":"2026-01-22T04:15:24.589Z","updated_at":"2026-01-22T04:15:24.638Z","avatar_url":"https://github.com/gsauthof.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"adf2pdf - a tool that turns a batch of paper pages into a PDF\nwith a text layer.  
By default, it detects empty pages (as they\nmay easily occur during duplex scanning) and excludes them from\nthe OCR and the resulting PDF.\n\nFor that, it uses [Sane's][5] [scanimage][6] for the scanning,\n[Tesseract][4] for the [optical character recognition][ocr] (OCR), and\nthe Python packages [img2pdf][9], [Pillow (PIL)][10] and\n[PyPDF2][11] for some image-processing tasks and PDF mangling.\n\n\nExample:\n\n    $ adf2pdf contract-xyz.pdf\n\n2017, Georg Sauthoff \u003cmail@gms.tf\u003e\n\n## Features\n\n- Automatic document feed (ADF) support\n- Fast empty page detection\n- Overlapping of scanning, image processing, OCR and PDF creation\n  to minimize the total runtime\n- Fast creation of small PDFs using the fine [img2pdf][9] package\n- Uses only safe compression methods, i.e. no error-prone\n  symbol-segmentation-style compression like [JBIG2][12] or JB2,\n  as used in [Xerox photocopiers][12] and the DjVu format.\n\n## Install Instructions\n\nAdf2pdf can be directly installed with [`pip`][13], e.g.\n\n    $ pip3 install --user adf2pdf\n\nor\n\n    $ pip3 install adf2pdf\n\nSee also the [PyPI adf2pdf project page][14].\n\nAlternatively, the Python file `adf2pdf.py` can be directly\nexecuted in a cloned repository, e.g.:\n\n    $ ./adf2pdf.py report.pdf\n\nIn addition to that, one can install the development version from\na cloned work-tree like this:\n\n    $ pip3 install --user .\n\n## Hardware Requirements\n\nA scanner with automatic document feed (ADF) that is supported by\nSane. For example, the [Fujitsu ScanSnap S1500][1] works\nwell. That model supports duplex scanning, which is quite\nconvenient.\n\n## Example continued\n\nRunning _adf2pdf_ for a 7 page example document takes 150 seconds\non an i7-6600U (Intel Skylake, 2 cores, 4 threads) CPU (using the\nADF of the Fujitsu ScanSnap S1500). With the defaults, _adf2pdf_ calls\n`scanimage` for duplex scanning into 600 dpi lineart (black and\nwhite) images. 
In this example, duplex scanning of the 7 sheets yields\n14 page sides, 6 of which are empty and thus automatically\nexcluded, i.e. the resulting PDF then contains just 8 pages.\n\nThe resulting PDF contains a text layer from the OCR such that\none can search and copy'n'paste some text. It is 1.1 MiB big,\ni.e. a page is stored in 132 KiB, on average.\n\n\n## Related Work\n\nIn case you have existing PDF files without a text layer or a scan\nappliance that spits out PDFs but doesn't support OCR,\n[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF)\nmay be a good fit.\nIt takes a PDF file, applies OCR to each page and adds the\nresult as a text layer to the input PDF file.\nIt's written in Python and also uses Tesseract for the OCR.\n\nFor users that prefer a GUI,\n[Skanpage](https://invent.kde.org/utilities/skanpage) may fit the bill.\nAs the name suggests, it's a KDE application that provides a\nclean and modern graphical interface to scanning.\nUnlike a few other GUI alternatives, it _does_ also integrate OCR\nvia Tesseract.\nFor example, 'Gnome Document Scanner' (a.k.a. simple-scan) and\nSkanlite (also KDE) do **not** support OCR, as of 2025.\n\n\n## Software Requirements\n\nThe script assumes Tesseract version 4, by default. Version 3 can\nbe used as well, but the [new neural network system in Tesseract\n4][8] just performs orders of magnitude better than the old OCR model.\nTesseract 4.0.0 was released in late 2018; thus, distributions\nreleased in that time frame may still just include version 3 in\ntheir repositories (e.g. Fedora 29 while Fedora 30 features version\n4). 
Since version 4 is so much better at OCR, I can't recommend it\nenough over the older version 3.\n\nTesseract 4 notes (in case you need to build it from the sources):\n\n- [Build instructions][2] - warning: if the\n  `autoconf-archive` dependency is missing you'll get weird autoconf\n  error messages\n- [Data files][3] - you need the training data for your\n  languages of choice and the OSD data\n\nPython packages:\n\n- [img2pdf][9] (Fedora package: python3-img2pdf)\n- [Pillow (PIL)][10] (Fedora package: python3-pillow-devel)\n- [PyPDF2][11] (Fedora package: python3-PyPDF2)\n\n[1]: http://www.fujitsu.com/us/products/computing/peripheral/scanners/product/eol/s1500/\n[2]: https://github.com/tesseract-ocr/tesseract/wiki/Compiling-–-GitInstallation\n[3]: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files\n[4]: https://en.wikipedia.org/wiki/Tesseract_(software)\n[5]: https://en.wikipedia.org/wiki/Scanner_Access_Now_Easy\n[6]: http://www.sane-project.org/man/scanimage.1.html\n[7]: https://en.wikipedia.org/wiki/Optical_character_recognition\n[8]: https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00\n[9]: https://pypi.org/project/img2pdf/\n[10]: http://python-pillow.github.io/\n[11]: https://github.com/mstamy2/PyPDF2\n[12]: https://en.wikipedia.org/wiki/JBIG2\n[13]: https://en.wikipedia.org/wiki/Pip_(package_manager)\n[14]: https://pypi.org/project/adf2pdf/\n[ocr]: https://en.wikipedia.org/wiki/Optical_character_recognition\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsauthof%2Fadf2pdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgsauthof%2Fadf2pdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsauthof%2Fadf2pdf/lists"}