{"id":40275628,"url":"https://github.com/lgbarn/pdf-cli","last_synced_at":"2026-02-01T05:03:42.764Z","repository":{"id":333569919,"uuid":"1137813371","full_name":"lgbarn/pdf-cli","owner":"lgbarn","description":"A fast, single-binary CLI tool for common PDF operations","archived":false,"fork":false,"pushed_at":"2026-01-20T04:08:58.000Z","size":249409,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-20T04:22:48.270Z","etag":null,"topics":["cli","command-line-tool","go","golang","pdf","pdf-compression","pdf-manipulation","pdf-merger","pdf-splitter","pdf-tools"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lgbarn.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-19T21:48:34.000Z","updated_at":"2026-01-20T04:09:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/lgbarn/pdf-cli","commit_stats":null,"previous_names":["lgbarn/pdf-cli"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/lgbarn/pdf-cli","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgbarn%2Fpdf-cli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgbarn%2Fpdf-cli/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgbarn%2Fpdf-cli/releases","manifests_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgbarn%2Fpdf-cli/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lgbarn","download_url":"https://codeload.github.com/lgbarn/pdf-cli/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgbarn%2Fpdf-cli/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28968721,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T04:44:20.970Z","status":"ssl_error","status_checked_at":"2026-02-01T04:44:19.994Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","command-line-tool","go","golang","pdf","pdf-compression","pdf-manipulation","pdf-merger","pdf-splitter","pdf-tools"],"created_at":"2026-01-20T03:06:07.099Z","updated_at":"2026-02-01T05:03:42.759Z","avatar_url":"https://github.com/lgbarn.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pdf-cli\n\n[![CI](https://github.com/lgbarn/pdf-cli/actions/workflows/ci.yaml/badge.svg)](https://github.com/lgbarn/pdf-cli/actions/workflows/ci.yaml)\n[![Go Report Card](https://goreportcard.com/badge/github.com/lgbarn/pdf-cli)](https://goreportcard.com/report/github.com/lgbarn/pdf-cli)\n[![License: 
MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Go Version](https://img.shields.io/github/go-mod/go-version/lgbarn/pdf-cli)](https://go.dev/)\n\nA fast, lightweight command-line tool for everyday PDF operations. No GUI needed, no complicated setup—just simple commands to merge, split, compress, encrypt, and manipulate PDF files.\n\n## Table of Contents\n\n- [Why pdf-cli?](#why-pdf-cli)\n- [Quick Start](#quick-start)\n- [Installation](#installation)\n- [Commands](#commands)\n- [Usage Examples](#usage-examples)\n- [Global Options](#global-options)\n- [Configuration](#configuration)\n- [Shell Completion](#shell-completion)\n- [Building from Source](#building-from-source)\n- [Troubleshooting](#troubleshooting)\n- [Contributing](#contributing)\n- [License](#license)\n\n## Why pdf-cli?\n\n- **Fast**: Single binary with no external dependencies, parallel processing for large operations\n- **Simple**: Intuitive commands that do one thing well\n- **Secure**: Supports encrypted PDFs with password protection\n- **Cross-platform**: Works on Linux, macOS, and Windows\n- **Scriptable**: Perfect for automation and batch processing with JSON/CSV/TSV output\n- **Unix-friendly**: Supports stdin/stdout for seamless pipelines\n- **OCR Support**: Extract text from scanned PDFs using native Tesseract (when installed) or built-in WASM fallback\n\n## Quick Start\n\n```bash\n# Install\ngo install github.com/lgbarn/pdf-cli/cmd/pdf@latest\n\n# Merge two PDFs\npdf merge -o combined.pdf file1.pdf file2.pdf\n\n# Extract pages 1-5 from a PDF\npdf extract document.pdf -p 1-5 -o pages.pdf\n\n# Compress a large PDF\npdf compress large.pdf -o smaller.pdf\n\n# Batch compress multiple PDFs\npdf compress *.pdf\n\n# Get PDF info\npdf info document.pdf\n\n# Get PDF info as JSON (for scripting)\npdf info document.pdf --format json\n\n# Extract text from a scanned PDF using OCR\npdf text scanned.pdf --ocr\n\n# Process PDF from stdin (Unix pipes)\ncat 
document.pdf | pdf text -\ncurl -s https://example.com/doc.pdf | pdf info -\n```\n\n## Installation\n\n### Prerequisites\n\n- Go 1.25 or later (for installation via `go install`)\n\n### Using Go (Recommended)\n\n```bash\ngo install github.com/lgbarn/pdf-cli/cmd/pdf@latest\n```\n\n### Pre-built Binaries\n\nDownload the latest release for your platform from the [Releases page](https://github.com/lgbarn/pdf-cli/releases).\n\nAvailable platforms:\n- Linux (amd64, arm64)\n- macOS (amd64, arm64)\n- Windows (amd64)\n\n### From Source\n\n```bash\ngit clone https://github.com/lgbarn/pdf-cli.git\ncd pdf-cli\nmake build\n```\n\n## Commands\n\n| Command | Description | Batch | stdin | stdout |\n|---------|-------------|:-----:|:-----:|:------:|\n| `info` | Display PDF information (pages, metadata, encryption status) | ✓ | ✓ | - |\n| `merge` | Combine multiple PDFs into a single file | - | - | - |\n| `split` | Split a PDF into individual pages or chunks | - | - | - |\n| `extract` | Extract specific pages into a new PDF | - | ✓ | ✓ |\n| `reorder` | Reorder, reverse, or duplicate pages | - | ✓ | ✓ |\n| `rotate` | Rotate pages by 90, 180, or 270 degrees | ✓ | ✓ | ✓ |\n| `compress` | Optimize and reduce PDF file size | ✓ | ✓ | ✓ |\n| `encrypt` | Add password protection to a PDF | ✓ | ✓ | ✓ |\n| `decrypt` | Remove password protection from a PDF | ✓ | ✓ | ✓ |\n| `text` | Extract text content (supports OCR for scanned PDFs) | - | ✓ | - |\n| `images` | Extract embedded images from a PDF | - | - | - |\n| `combine-images` | Create a PDF from multiple images | - | - | - |\n| `meta` | View or modify PDF metadata (title, author, etc.) 
| ✓ | - | - |\n| `watermark` | Add text or image watermarks | ✓ | - | - |\n| `pdfa` | PDF/A validation and conversion | - | ✓ | ✓ |\n\n## Usage Examples\n\n### Get PDF Information\n\n```bash\n# Single file - detailed output\npdf info document.pdf\n\n# Multiple files - summary table\npdf info *.pdf\n\n# Machine-readable output (JSON, CSV, TSV)\npdf info document.pdf --format json\npdf info *.pdf --format csv \u003e report.csv\npdf info *.pdf --format tsv\n\n# Process via jq\npdf info document.pdf --format json | jq '.pages'\n```\n\nSingle file output:\n```\nFile:       document.pdf\nSize:       2.45 MB\nPages:      42\nVersion:    1.7\nTitle:      Annual Report\nAuthor:     John Doe\nEncrypted:  No\n```\n\nJSON output (`--format json`):\n```json\n{\n  \"file\": \"document.pdf\",\n  \"size\": 2568192,\n  \"sizeHuman\": \"2.45 MB\",\n  \"pages\": 42,\n  \"version\": \"1.7\",\n  \"title\": \"Annual Report\",\n  \"author\": \"John Doe\",\n  \"encrypted\": false\n}\n```\n\nBatch output:\n```\nFILE                                        PAGES    VER       SIZE\n----------------------------------------------------------------------\ndocument1.pdf                                  42    1.7    2.45 MB\ndocument2.pdf                                  15    1.5  512.00 KB\nreport.pdf                                    128    1.7   10.23 MB\n```\n\n### Merge Multiple PDFs\n\n```bash\n# Merge two files\npdf merge -o combined.pdf file1.pdf file2.pdf\n\n# Merge all PDFs in a directory\npdf merge -o combined.pdf *.pdf\n```\n\n### Split a PDF\n\n```bash\n# Split into individual pages (creates page_001.pdf, page_002.pdf, etc.)\npdf split document.pdf -o output/\n\n# Split into chunks of 5 pages each\npdf split document.pdf -n 5 -o chunks/\n```\n\n### Extract Specific Pages\n\n```bash\n# Extract pages 1 through 5\npdf extract document.pdf -p 1-5 -o first-five.pdf\n\n# Extract specific pages and ranges\npdf extract document.pdf -p 1,3,5,10-15 -o selected.pdf\n```\n\n### Reorder 
Pages\n\n```bash\n# Move page 5 to position 2\npdf reorder document.pdf -s \"1,5,2,3,4\" -o reordered.pdf\n\n# Reverse all pages\npdf reorder document.pdf -s \"end-1\" -o reversed.pdf\n\n# Duplicate page 1 at the end\npdf reorder document.pdf -s \"1-end,1\" -o with-copy.pdf\n\n# Remove the first page\npdf reorder document.pdf -s \"2-end\" -o skip-first.pdf\n```\n\n### Rotate Pages\n\n```bash\n# Rotate all pages 90 degrees clockwise\npdf rotate document.pdf -a 90 -o rotated.pdf\n\n# Rotate only pages 1-5 by 180 degrees\npdf rotate document.pdf -a 180 -p 1-5 -o rotated.pdf\n```\n\n### Compress a PDF\n\n```bash\n# Compress a single file\npdf compress large.pdf -o smaller.pdf\n\n# Batch compress multiple PDFs (output: *_compressed.pdf)\npdf compress *.pdf\n\n# With progress bar for large files\npdf compress large.pdf -o smaller.pdf --progress\n\n# stdin/stdout support for pipelines\ncat large.pdf | pdf compress - --stdout \u003e compressed.pdf\ncurl -s https://example.com/doc.pdf | pdf compress - --stdout \u003e local.pdf\n```\n\n### Encrypt a PDF\n\n```bash\n# Add password protection (prompts interactively)\npdf encrypt document.pdf -o secure.pdf\n\n# Using a password file (recommended for scripts)\npdf encrypt document.pdf --password-file pass.txt -o secure.pdf\n\n# Using environment variable\nexport PDF_CLI_PASSWORD=mysecret\npdf encrypt document.pdf -o secure.pdf\n\n# Set separate user and owner passwords\npdf encrypt document.pdf --password-file user.txt --owner-password ownerpass -o secure.pdf\n\n# Batch encrypt multiple PDFs (output: *_encrypted.pdf)\npdf encrypt *.pdf --password-file pass.txt\n```\n\n### Decrypt a PDF\n\n```bash\n# Decrypt a single file (prompts interactively)\npdf decrypt secure.pdf -o unlocked.pdf\n\n# Using a password file (recommended for scripts)\npdf decrypt secure.pdf --password-file pass.txt -o unlocked.pdf\n\n# Using environment variable\nexport PDF_CLI_PASSWORD=mysecret\npdf decrypt secure.pdf -o unlocked.pdf\n\n# Batch decrypt 
multiple PDFs (output: *_decrypted.pdf)\npdf decrypt *.pdf --password-file pass.txt\n```\n\n### Extract Text\n\n```bash\n# Print text to terminal\npdf text document.pdf\n\n# Save to a file\npdf text document.pdf -o content.txt\n\n# Extract text from specific pages\npdf text document.pdf -p 1-5 -o chapter1.txt\n\n# With progress bar for large documents\npdf text large-document.pdf --progress\n\n# Read from stdin\ncat document.pdf | pdf text -\ncurl -s https://example.com/doc.pdf | pdf text -\n```\n\n### Extract Text with OCR (for scanned PDFs)\n\n```bash\n# Use OCR for scanned/image-based PDFs\npdf text scanned.pdf --ocr\n\n# OCR with specific language (downloads tessdata on first use for WASM)\npdf text scanned.pdf --ocr --ocr-lang eng\n\n# Multi-language OCR\npdf text scanned.pdf --ocr --ocr-lang eng+fra\n\n# OCR specific pages and save to file\npdf text scanned.pdf --ocr -p 1-10 -o content.txt\n\n# Force native Tesseract (if installed)\npdf text scanned.pdf --ocr --ocr-backend=native\n\n# Force WASM Tesseract (no system dependencies)\npdf text scanned.pdf --ocr --ocr-backend=wasm\n\n# Auto-select (native if available, else WASM) - this is the default\npdf text scanned.pdf --ocr --ocr-backend=auto\n```\n\n**OCR Backend Selection:**\n- `auto` (default): Uses native Tesseract if installed, otherwise falls back to WASM\n- `native`: Requires system Tesseract installation but provides better quality/speed\n- `wasm`: Built-in, no external dependencies, downloads tessdata on first use (~15MB/language)\n\n**OCR Reliability:**\n- Tessdata downloads include SHA256 checksum verification for integrity\n- Automatic retry with exponential backoff on network failures\n- Corrupted downloads are detected and re-attempted\n\n### Extract Images\n\n```bash\n# Extract all images\npdf images document.pdf -o images/\n\n# Extract images from specific pages\npdf images document.pdf -p 1-10 -o images/\n```\n\n### Using stdin/stdout Pipelines\n\npdf-cli supports Unix-style pipelines for 
processing PDFs without intermediate files:\n\n```bash\n# Download and extract text in one command\ncurl -s https://example.com/document.pdf | pdf text -\n\n# Download, compress, and save\ncurl -s https://example.com/large.pdf | pdf compress - --stdout \u003e compressed.pdf\n\n# Chain multiple operations\ncat input.pdf | pdf extract - -p 1-5 --stdout | pdf rotate - -a 90 --stdout \u003e output.pdf\n\n# Process PDF from another command\ngenerate-report | pdf compress - --stdout \u003e report.pdf\n\n# Get info from a remote PDF\ncurl -s https://example.com/doc.pdf | pdf info - --format json | jq '.pages'\n```\n\n**Notes:**\n- Use `-` as the input file to read from stdin\n- Use the `--stdout` flag to write binary output to stdout\n- Because pdfcpu needs the complete file rather than a stream, stdin input is first written to a temporary file\n\n### Combine Images into PDF\n\n```bash\n# Create PDF from multiple images\npdf combine-images photo1.jpg photo2.jpg -o album.pdf\n\n# Create PDF from all PNG files in current directory\npdf combine-images *.png -o scans.pdf\n\n# Create PDF with specific page size\npdf combine-images scan1.png scan2.png -o document.pdf --page-size A4\n```\n\n### View and Modify Metadata\n\n```bash\n# View metadata for a single file\npdf meta document.pdf\n\n# View metadata for multiple files\npdf meta *.pdf\n\n# Set metadata\npdf meta document.pdf --title \"My Document\" --author \"Jane Doe\" -o updated.pdf\n\n# Set multiple fields\npdf meta document.pdf \\\n  --title \"Annual Report\" \\\n  --author \"John Doe\" \\\n  --subject \"2024 Financial Summary\" \\\n  -o updated.pdf\n```\n\n### Add Watermarks\n\n```bash\n# Add text watermark\npdf watermark document.pdf -t \"CONFIDENTIAL\" -o marked.pdf\n\n# Add image watermark (logo)\npdf watermark document.pdf -i logo.png -o branded.pdf\n\n# Watermark specific pages only\npdf watermark document.pdf -t \"DRAFT\" -p 1-5 -o draft.pdf\n\n# Batch watermark multiple PDFs (output: *_watermarked.pdf)\npdf watermark *.pdf -t \"CONFIDENTIAL\"\n```\n\n### PDF/A Validation and Conversion\n\n```bash\n# Validate PDF/A compliance\npdf pdfa validate document.pdf\n\n# Validate against specific PDF/A level\npdf pdfa validate document.pdf --level 1b\n\n# Convert/optimize a PDF toward PDF/A format\npdf pdfa convert document.pdf -o archive.pdf\n\n# Convert with specific target level\npdf pdfa convert document.pdf --level 2b -o archive.pdf\n```\n\n\u003e **⚠️ PDF/A Limitations**\n\u003e\n\u003e This tool provides **basic** PDF/A validation and optimization, not full ISO compliance:\n\u003e\n\u003e | Feature | Status |\n\u003e |---------|--------|\n\u003e | Structure validation | ✓ Supported |\n\u003e | Encryption detection | ✓ Supported |\n\u003e | Font embedding check | ✗ Limited |\n\u003e | Color profile validation | ✗ Not supported |\n\u003e | Full ISO 19005 compliance | ✗ Not supported |\n\u003e\n\u003e For comprehensive PDF/A validation, use [veraPDF](https://verapdf.org/).\n\u003e For full PDF/A conversion, consider Ghostscript or Adobe Acrobat.\n\n## Global Options\n\nThese options work with all commands:\n\n| Option | Short | Description |\n|--------|-------|-------------|\n| `--verbose` | `-v` | Show detailed output during operations |\n| `--force` | `-f` | Overwrite existing files without prompting |\n| `--progress` | | Show progress bar for long operations |\n| `--password-file` | | Path to file containing password for encrypted PDFs |\n| `--password` | `-P` | Password for encrypted PDFs (deprecated, use --password-file) |\n| `--dry-run` | | Preview what would happen without making changes |\n| `--log-level` | | Set logging level: `debug`, `info`, `warn`, `error`, `silent` (default: silent) |\n| `--log-format` | | Set log format: `text` or `json` (default: text) |\n| `--help` | `-h` | Show help for any command |\n| `--version` | | Display version information |\n\n### Dry-Run Mode\n\nPreview operations without making any changes:\n\n```bash\n# See what files would be created\npdf compress *.pdf --dry-run\n\n# Preview merge operation\npdf merge -o combined.pdf *.pdf --dry-run\n\n# Preview encryption without modifying files\npdf encrypt document.pdf --password-file pass.txt --dry-run\n```\n\n### Logging\n\nEnable structured logging for debugging or monitoring:\n\n```bash\n# Debug logging to see detailed operations\npdf compress large.pdf --log-level debug\n\n# JSON logging for log aggregation\npdf merge -o out.pdf *.pdf --log-level info --log-format json\n```\n\n### Command-Specific Options\n\n| Option | Commands | Description |\n|--------|----------|-------------|\n| `--format` | info, meta, pdfa | Output format: `json`, `csv`, `tsv` (default: human-readable) |\n| `--stdout` | compress, extract, rotate, reorder, encrypt, decrypt, pdfa convert | Write binary output to stdout |\n| `-` (stdin) | text, info, compress, extract, rotate, reorder, encrypt, decrypt, pdfa convert | Read PDF from stdin |\n\n### Working with Encrypted PDFs\n\npdf-cli accepts the password for an encrypted PDF from several sources:\n\n**1. Interactive prompt (recommended for manual use):**\n```bash\npdf info secure.pdf\n# Prompts: Enter password:\n```\n\n**2. Password file (recommended for scripts/automation):**\n```bash\npdf info secure.pdf --password-file /path/to/password.txt\n```\n\n**3. Environment variable:**\n```bash\nexport PDF_CLI_PASSWORD=mysecret\npdf info secure.pdf\n```\n\n**4. Command-line flag (deprecated, shows warning):**\n```bash\npdf info secure.pdf --password mysecret\n# WARNING: --password flag exposes passwords in process listings\n```\n\nThe interactive prompt is used only when no password file, environment variable, or flag supplies a password; those sources are checked in the order listed.\n
The first available source is used.\n\n**Examples:**\n```bash\n# Read encrypted PDF info\npdf info secure.pdf --password-file pass.txt\n\n# Extract pages from encrypted PDF\npdf extract secure.pdf -p 1-5 -o pages.pdf --password-file pass.txt\n\n# Batch processing with environment variable\nexport PDF_CLI_PASSWORD=mysecret\npdf compress *.pdf\n```\n\n## Configuration\n\npdf-cli supports an optional configuration file for setting default values.\n\n### Config File Location\n\nThe config file is loaded from (in order of precedence):\n1. `$XDG_CONFIG_HOME/pdf-cli/config.yaml`\n2. `~/.config/pdf-cli/config.yaml`\n\n### Example Configuration\n\n```yaml\n# ~/.config/pdf-cli/config.yaml\n\ndefaults:\n  verbose: false\n  force: false\n  progress: true\n\ncompress:\n  # No specific defaults\n\nencrypt:\n  # Default encryption settings\n\nocr:\n  language: \"eng\"\n  backend: \"auto\"  # auto, native, or wasm\n```\n\n### Environment Variables\n\nAll config options can be overridden with environment variables using the `PDF_CLI_` prefix:\n\n```bash\n# Override verbose mode\nexport PDF_CLI_VERBOSE=true\n\n# Override OCR language\nexport PDF_CLI_OCR_LANGUAGE=eng+fra\n\n# Override OCR backend\nexport PDF_CLI_OCR_BACKEND=native\n\n# Password for encrypted PDFs\nexport PDF_CLI_PASSWORD=mysecret\n```\n\n**Performance tuning:**\n```bash\n# OCR threshold (parallel processing triggered when page count exceeds this)\nexport PDF_CLI_PERF_OCR_THRESHOLD=5\n\n# Text extraction threshold (parallel processing for pages above this)\nexport PDF_CLI_PERF_TEXT_THRESHOLD=5\n\n# Maximum concurrent workers (default: runtime.NumCPU())\nexport PDF_CLI_PERF_MAX_WORKERS=8\n```\n\nEnvironment variables take precedence over config file values.\n\n## Shell Completion\n\nEnable tab completion for your shell:\n\n### Bash\n\n```bash\n# Add to ~/.bashrc\necho 'source \u003c(pdf completion bash)' \u003e\u003e ~/.bashrc\n\n# Or install system-wide\npdf completion bash | sudo tee /etc/bash_completion.d/pdf \u003e 
/dev/null\n```\n\n### Zsh\n\n```bash\n# Add to ~/.zshrc\necho 'source \u003c(pdf completion zsh)' \u003e\u003e ~/.zshrc\n```\n\n### Fish\n\n```bash\npdf completion fish \u003e ~/.config/fish/completions/pdf.fish\n```\n\n### PowerShell\n\n```powershell\npdf completion powershell | Out-String | Invoke-Expression\n```\n\n## Building from Source\n\n### Prerequisites\n\n- Go 1.25 or later\n- Make (optional, for convenience commands)\n\n### Build Commands\n\n```bash\n# Clone the repository\ngit clone https://github.com/lgbarn/pdf-cli.git\ncd pdf-cli\n\n# Build for your current platform\nmake build\n\n# Run tests\nmake test\n\n# Run tests with coverage report\nmake test-coverage\n\n# Build for all platforms\nmake build-all\n\n# Clean build artifacts\nmake clean\n```\n\n### Project Structure\n\n```\npdf-cli/\n├── cmd/pdf/              # Application entry point\n├── internal/\n│   ├── cli/              # CLI framework and flags\n│   ├── commands/         # Individual command implementations\n│   │   └── patterns/     # Reusable command patterns (StdioHandler)\n│   ├── cleanup/          # Signal-based temp file cleanup registry\n│   ├── config/           # Configuration file support\n│   ├── fileio/           # File operations and stdio utilities\n│   ├── logging/          # Structured logging with slog\n│   ├── ocr/              # OCR text extraction (native + WASM backends)\n│   │   ├── backend.go    # Backend interface and types\n│   │   ├── detect.go     # Native Tesseract detection\n│   │   ├── native.go     # Native Tesseract backend\n│   │   ├── wasm.go       # WASM Tesseract backend\n│   │   └── ocr.go        # Engine with backend selection\n│   ├── output/           # Output formatting (JSON, CSV, TSV)\n│   ├── pages/            # Page range parsing and validation\n│   ├── pdf/              # PDF processing (modular design)\n│   │   ├── metadata.go   # Info, page count, metadata\n│   │   ├── transform.go  # Merge, split, rotate, compress\n│   │   ├── encryption.go # 
Encrypt, decrypt\n│   │   ├── text.go       # Text extraction\n│   │   ├── watermark.go  # Watermarking\n│   │   └── validation.go # PDF/A validation\n│   ├── pdferrors/        # Error handling with context\n│   ├── progress/         # Progress bar utilities\n│   ├── retry/            # Exponential backoff retry logic\n│   └── testing/          # Test infrastructure and mocks\n├── docs/\n│   └── architecture.md   # Architecture documentation\n├── testdata/             # Test PDF files\n├── .github/workflows/    # CI/CD pipelines\n├── Makefile              # Build automation\n├── CONTRIBUTING.md       # Contribution guidelines\n└── README.md\n```\n\nFor detailed architecture information, see [docs/architecture.md](docs/architecture.md).\n\n## Troubleshooting\n\n### \"command not found: pdf\"\n\nMake sure your Go bin directory is in your PATH:\n\n```bash\nexport PATH=$PATH:$(go env GOPATH)/bin\n```\n\nAdd this line to your `~/.bashrc`, `~/.zshrc`, or equivalent.\n\n### \"failed to open file: permission denied\"\n\nCheck file permissions:\n\n```bash\nls -la document.pdf\nchmod 644 document.pdf  # Make readable\n```\n\n### \"encrypted PDF requires password\"\n\nThe PDF is password-protected. Use one of the secure password methods:\n\n```bash\n# Interactive prompt (recommended)\npdf info document.pdf\n\n# Password file (recommended for scripts)\npdf info document.pdf --password-file pass.txt\n\n# Environment variable\nexport PDF_CLI_PASSWORD=yourpassword\npdf info document.pdf\n```\n\n### \"no text extracted\" from a PDF\n\nSome PDFs contain scanned images instead of actual text. 
Use the `--ocr` flag to extract text using OCR:\n\n```bash\npdf text scanned.pdf --ocr\n```\n\nThe OCR engine automatically uses native Tesseract if installed, or falls back to the built-in WASM version.\n\n### Native Tesseract not detected\n\nIf you have Tesseract installed but pdf-cli doesn't detect it:\n\n```bash\n# Check if Tesseract is in PATH\ntesseract --version\n\n# Force native backend to see the error\npdf text scanned.pdf --ocr --ocr-backend=native -v\n```\n\nCommon solutions:\n- Ensure `tesseract` is in your PATH\n- Set `TESSDATA_PREFIX` to your tessdata directory\n- Install Tesseract: `brew install tesseract` (macOS) or `apt install tesseract-ocr` (Linux)\n\n### WASM OCR tessdata download\n\nThe first time you use WASM OCR, pdf-cli will download the required language data (~15MB for English).\n\n### Large PDF processing is slow\n\nFor very large PDFs (hundreds of pages), operations may take time. Use `--progress` to see a progress bar:\n\n```bash\npdf text large.pdf --progress\npdf split large.pdf -o output/ --progress\npdf merge -o combined.pdf *.pdf --progress\n```\n\nNote: pdf-cli automatically uses parallel processing for:\n- File validation when merging more than 3 files\n- Text extraction when processing more than 5 pages\n- OCR processing when using native Tesseract backend with more than 5 images\n\nOn multi-core machines this can substantially speed up large documents and multi-file merges.\n\n## Contributing\n\nContributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.\n\nQuick start:\n\n1. Fork the repository\n2. Create a feature branch: `git checkout -b feature/amazing-feature`\n3. Make your changes and add tests\n4. Run the full check suite: `make check-all`\n5. Commit your changes: `git commit -m 'Add amazing feature'`\n6. Push to your fork: `git push origin feature/amazing-feature`\n7. Open a Pull Request\n\nCode requirements:\n- All tests pass (`make test`)\n- Linter passes (`make lint`)\n- Coverage meets 75% threshold (`make coverage-check`)\n- Documentation updated as needed\n\n## Dependencies\n\nThis project uses the following open-source libraries:\n\n- [pdfcpu](https://github.com/pdfcpu/pdfcpu) - PDF processing library\n- [ledongthuc/pdf](https://github.com/ledongthuc/pdf) - PDF text extraction\n- [gogosseract](https://github.com/danlock/gogosseract) - WASM-based OCR (no external dependencies)\n- [progressbar](https://github.com/schollz/progressbar) - Progress bar display\n- [cobra](https://github.com/spf13/cobra) - CLI framework\n- [yaml.v3](https://gopkg.in/yaml.v3) - YAML configuration parsing\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\n- [pdfcpu](https://github.com/pdfcpu/pdfcpu) for the excellent PDF processing library\n- [ledongthuc/pdf](https://github.com/ledongthuc/pdf) for reliable text extraction\n- The Go community for great tooling and libraries\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flgbarn%2Fpdf-cli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flgbarn%2Fpdf-cli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flgbarn%2Fpdf-cli/lists"}