https://github.com/nymann/pdf-scrub
- Host: GitHub
- URL: https://github.com/nymann/pdf-scrub
- Owner: nymann
- Created: 2022-08-27T17:36:38.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2022-08-28T14:17:38.000Z (almost 4 years ago)
- Last Synced: 2025-03-28T21:06:49.878Z (about 1 year ago)
- Language: Python
- Size: 11.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# PDF Scrub
_Scrubs text watermarks and metadata from encrypted, compressed PDF files._
1. Decrypts the PDF if it's encrypted
2. Uncompresses the PDF
3. Removes metadata (Xpacket)
4. Naively tries to remove text-based watermarks by matching objects whose number of occurrences equals the PDF page count. If multiple objects match, it produces one PDF for each.
5. Compresses the PDF again, unless `--no-compress` is given on the command line.
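
The heuristic in step 4 can be illustrated with a minimal sketch. This is not the project's actual implementation: the regular expressions, the function names (`count_pages`, `watermark_candidates`, `scrub_variants`), and the assumption that watermark text appears as plain `(...) Tj` operators in an uncompressed PDF are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumed patterns for an already uncompressed PDF; real files vary widely.
# "/Type /Page" marks a page object; the \b keeps "/Type /Pages" from matching.
PAGE_RE = re.compile(rb"/Type\s*/Page\b")
# A literal string followed by the Tj (show text) operator, without nested parentheses.
TEXT_RE = re.compile(rb"\(((?:[^()\\]|\\.)*)\)\s*Tj")


def count_pages(pdf_bytes: bytes) -> int:
    """Count page objects in the raw bytes."""
    return len(PAGE_RE.findall(pdf_bytes))


def watermark_candidates(pdf_bytes: bytes) -> list[bytes]:
    """Return text strings that occur exactly once per page."""
    pages = count_pages(pdf_bytes)
    if pages == 0:
        return []
    counts = Counter(TEXT_RE.findall(pdf_bytes))
    return sorted(text for text, n in counts.items() if n == pages)


def scrub_variants(pdf_bytes: bytes) -> dict[bytes, bytes]:
    """Produce one scrubbed byte string per candidate, as step 4 describes.

    Blanking a string changes byte offsets and stream lengths, so a real
    tool would need to repair the file afterwards.
    """
    variants = {}
    for text in watermark_candidates(pdf_bytes):
        pattern = rb"\(" + re.escape(text) + rb"\)(\s*Tj)"
        variants[text] = re.sub(pattern, rb"()\1", pdf_bytes)
    return variants
```

A string that legitimately appears once per page, such as a running header, would also match, which is why the approach is described as naive and why one output per candidate is useful.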
## Usage
```sh
$ pdf_scrub --help
Usage: pdf_scrub [OPTIONS] FILES...
Arguments:
FILES... [required]
Options:
--compress / --no-compress Compress the final pdf to reduce file size greatly [default: compress]
```
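
When calling the tool from a script, the command line can be assembled as in the following sketch. It uses only the `--compress` / `--no-compress` option and `FILES...` argument shown in the help output above; the helper name `build_command` is hypothetical, and the sketch assumes `pdf_scrub` is available on `PATH`.

```python
import subprocess


def build_command(files: list[str], compress: bool = True) -> list[str]:
    """Build the argument list for pdf_scrub (options first, then files)."""
    if not files:
        raise ValueError("pdf_scrub requires at least one file")
    flag = "--compress" if compress else "--no-compress"
    return ["pdf_scrub", flag, *files]


def run_pdf_scrub(files: list[str], compress: bool = True) -> None:
    """Run pdf_scrub and raise if it exits with a non-zero status."""
    subprocess.run(build_command(files, compress), check=True)
```

For example, `run_pdf_scrub(["book.pdf"], compress=False)` would invoke `pdf_scrub --no-compress book.pdf`.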
## Dependencies
Requires `qpdf` and `pdftk`.
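
A quick way to confirm that both tools are installed is to look them up on `PATH`. The sketch below is only a convenience check and not part of the project; the helper name `missing_tools` is hypothetical.

```python
import shutil

REQUIRED_TOOLS = ("qpdf", "pdftk")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which) -> list[str]:
    """Return the names of required executables not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```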
## Development
For help getting started with development, see [DEVELOPMENT.md](DEVELOPMENT.md).